
  • 7/28/2019 Satyam OpenAnalytics NYC

    BIG DATA ANALYTICS & PITFALLS TO AVOID

    Dr. Satyam Priyadarshy

    June 17, 2013 New York City

  • BIG DATA Buzz - Should Business Care?

    The future of Big Data is bright. Organizations that can effectively leverage Big Data without sinking in the Big Data Hole will realize additional business value, a loyal customer base and increased profits.

    2.5 exabytes of new data generated per day

    What do we know?

    A top business priority

    Big opportunities available

    Everyone is talking about it

    Emerging technology helps

    Definitely adds value

    But...

    Definition and leverage are not clear

    Big challenges for companies

    The path to execute is less understood

    Realization is complex but getting easier

    Expertise is in demand but supply is short

  • BIG DATA - 7 Vs that describe

    The definition of Big Data is evolving. The origin of the word dates back to 1990. Typically 4 Vs defined Big Data, but I strongly recommend the 7 Vs that describe Big Data. (Source: chiefknowledgeguru.com)

    80% of data generated is unstructured

    VELOCITY: Moving away from batch processing to real-time addition of massive data for near-real-time analysis

    VARIETY: Structured and unstructured data, e.g. POS data, sensor data, transaction data, call center data, supply chain data, new media data, etc.

    VERACITY: Reliability and predictability of not-so-precise data types, e.g. sentiment data, weather data and its impact on business

    VOLUME: The ever-growing data, from terabytes to petabytes to zettabytes

    VALUE: Unless value is realized, Big Data is just a Big Hole

    VIRTUAL: Data resides in virtual environments, e.g. POS, private and public clouds, geo-located, inside and outside firewalls

    VARIATION: No single configuration of the other 6 Vs fits everyone. There is variation for each business.

  • KARMA matters

    Knowledge: Business, Technology, People Strategy; Big Data Sources, Lifecycle

    Re-invest based on actions

    Action: Scalable Architecture, Infrastructure, Tools & Technology, Resources; Mining the Big Data with a targeted and open mind to find Gold and other items

    Recognition: Revenue by selling new insights; Increase profit margins; Add new features to products & services

    Market: Grow Share; Customer Centricity

    Advance: Innovate with the help of Big Analytics; Gather even more Big Data and keep going through this cycle

  • KARMA SCORE is calculated using the maturity level of these capabilities

    Big Data: Collection of Raw Data, Structured & Unstructured, Discovery, Staging; Extract, Load, Transform; Data Connectors, Access, Use, Move; Data Storage: Hadoop, NoSQL, Key-value, MPP, In-memory, blobs, etc.

    Big Math and Big Analytics: Parallel Processing, API, Query, Reporting; Data Mining, Analytics, Pattern, Statistics; Machine Learning, Inference, Predictions; Tools, Technologies, Human Resources

    Big Value, Big Actions: Service to support business (Data, Information, Knowledge, Process); Presentation (Visualization, Mobility, Collaboration, Exploration); Actions (Improve Product/Services, Grow Revenue/Profits, Agility)

    Data Governance and Management: Policy, Privacy, Security, Metadata, Risk, Total cost of ownership, Access control; Data Lifecycle, Data Assets, SLA, ROI, ROA, Data Quality; Physical Store, Virtual Storage, Encryption, Masking, Archive, Disaster Recovery
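
The slide says the KARMA Score is calculated from the maturity level of these capabilities but gives no formula. As a minimal sketch only, assuming each capability is rated on a 1-5 maturity scale and every layer counts equally (both are assumptions, as are the sample ratings and the function name `karma_score`), the idea could look like this in Python:

```python
from statistics import mean

# Hypothetical maturity ratings (1 = ad hoc, 5 = optimized), one per capability row.
# Layer names come from the slide; the ratings and the 1-5 scale are assumptions.
MATURITY = {
    "Big Data": [3, 2, 3, 4],
    "Big Math and Big Analytics": [2, 2, 1, 3],
    "Big Value, Big Actions": [2, 3, 2],
    "Data Governance and Management": [3, 2, 4],
}


def karma_score(maturity, scale_max=5):
    """Return (overall, per_layer) scores on a 0-100 scale.

    Each layer score is the mean rating as a percentage of scale_max;
    the overall score is the unweighted mean of the layer scores.
    """
    per_layer = {}
    for layer, ratings in maturity.items():
        if not ratings or any(not 1 <= r <= scale_max for r in ratings):
            raise ValueError(f"ratings for {layer!r} must be within 1..{scale_max}")
        per_layer[layer] = mean(ratings) / scale_max * 100
    overall = mean(list(per_layer.values()))
    return overall, per_layer


if __name__ == "__main__":
    overall, per_layer = karma_score(MATURITY)
    for layer, score in per_layer.items():
        print(f"{layer}: {score:.1f}")
    print(f"Overall: {overall:.1f}")
```

An unweighted mean keeps the sketch easy to read; a real assessment would likely weight the layers differently for each business.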

  • Whatever your KARMA Score is, one can leverage Big Data eventually

    The Great Enabler is the OPEN SOURCE Revolution of the last decade or so.

  • OPEN SOURCE Creates a HAPPY, FLOURISHING Environment

    In a Zoo

    In an Open Environment

  • Open Source Key Characteristics

    FREE (*)

    NOT CAGED, NOT BLACK BOX

    MODIFICATIONS ALLOWED

    MODIFIED VERSIONS REDISTRIBUTABLE

    LIVES IN HARMONY WITH OTHERS

  • Open Source BIG DATA PLAYERS

    THESE TOOLS ENABLE YOU TO DIG THE GOLD IN BIG DATA (This is not a comprehensive list of tools/technologies)

  • ACTION for finding the GOLD

    PROBLEM SOLVING

    OPERATIONAL

    STRATEGIC

    FUTURISTIC

    Basic Analytics

    Advanced Analytics

    Holistic Analytics

    GO FOR THE GOLD

    ADDRESSES CURRENT CONCERNS

    Reduce Costs

    Eliminate Issues

    ADDRESSES GROWTH

    Customer Centric

    Easily Incorporate New Data

    Innovation Related

    Emerging Trends Adoption

    BIG DATA, BIG MATH, BIG ANALYTICS

    Descriptive Statistics

    Inferential Statistics

  • THAT'S A GOLD MINE

  • WHAT'S IN A GOLD MINE?

    Gold Suite: Gold, Arsenic, Mercury, Tungsten, Silver

    Base Suite: Copper, Lead, Zinc, Bismuth, Cadmium, Molybdenum, Silver

    Iron-Manganese Suite: Iron, Manganese, Cobalt, Nickel, Yttrium

    TO GET GOLD ONE HAS TO DIG DEEPER

    IF YOU FOUND SILVER WHILE DIGGING FOR GOLD, WHAT WOULD YOU DO?

  • CASE STUDY - DDoS Attack

    PROBLEM: DNS servers are persistently attacked to create DDoS attacks. Can we predict them?

    CHALLENGES: 7+ TB / day; Varied formats based on request and type of attack

    BIG ANALYTICS: Hadoop-based data storage

    APPROACH: Hive / MapR queries and R for statistical analysis; Interconnection of data with known data sources for identification; Tableau and open source tools (D3.js and Ploticus) for visualization; Iteratively optimized queries for speed

    THE GOLD (KNOWLEDGE): Source of attacks identified after integrating; Distributed targets; Multiple attack types; Slow performance over binary data sets

    ACTIONS: A step closer to a solution, but more work is required to get it near real-time for actionable insights; Feedback loop to known datasets to enhance predictability and performance

    RECOGNITION: 45 days later; It's Science, not BI
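
The "interconnection of data with known data sources for identification" step can be pictured as counting requests per source and looking each heavy source up in a reference list. The case study did this with Hive queries and R. The Python sketch below only illustrates the idea, and every record, IP address (taken from the documentation-only ranges), label and function name in it is hypothetical:

```python
from collections import Counter

# Hypothetical parsed DNS query records: (source_ip, queried_zone).
QUERIES = [
    ("203.0.113.7", "abc.tld"),
    ("203.0.113.7", "abc.tld"),
    ("203.0.113.7", "gold.tld"),
    ("203.0.113.7", "abc.tld"),
    ("198.51.100.23", "abc.tld"),
    ("198.51.100.23", "gold.tld"),
    ("192.0.2.10", "abc.tld"),
]

# Hypothetical reference data set of sources identified earlier.
KNOWN_SOURCES = {"203.0.113.7": "previously identified attack source"}


def top_sources(queries, n=3):
    """Return the n busiest source IPs as (ip, request_count) pairs."""
    return Counter(src for src, _zone in queries).most_common(n)


def label_sources(queries, known, n=3):
    """Attach a label from the reference data to each of the n busiest sources."""
    return [
        (src, count, known.get(src, "unknown"))
        for src, count in top_sources(queries, n)
    ]


if __name__ == "__main__":
    for src, count, label in label_sources(QUERIES, KNOWN_SOURCES):
        print(f"{src}\t{count}\t{label}")
```

At 7+ TB per day the counting would run inside the cluster rather than in a single process. The sketch shows only the join logic.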

  • CASE STUDY - DDoS Attack Pattern Based Study

    [Figure: two plots, "Single Day - Outlier Events - 10K Size :: Zones Hit from Multiple Sources" and "Single Day - Outlier Events - 2K Size :: Zones Hit from Multiple Sources". Axis labels: Traffic Volume, Unique ZRatio. Point labels: ABC.TLD, SB, GOLD.TLD. Annotation: AFTER DIGGING FURTHER.]
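
The deck does not say how the outlier events in these plots were selected. One common option, shown here purely as an assumption, is the modified z-score based on the median and the median absolute deviation (MAD), which holds up against the very extremes it is meant to find. The zone names other than ABC.TLD and all traffic figures are hypothetical:

```python
from statistics import median

# Hypothetical single-day traffic volume per zone.
TRAFFIC = {
    "ABC.TLD": 820,
    "zone1.tld": 40,
    "zone2.tld": 55,
    "zone3.tld": 38,
    "zone4.tld": 61,
    "zone5.tld": 47,
}


def mad_outliers(volumes, threshold=3.5):
    """Return the keys whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * |x - median| / MAD. The 3.5 cut-off is a
    widely used rule of thumb, not a value taken from the slides.
    """
    values = list(volumes.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median, so no point can be scored.
        return []
    return [
        key
        for key, value in volumes.items()
        if 0.6745 * abs(value - med) / mad > threshold
    ]


if __name__ == "__main__":
    print(mad_outliers(TRAFFIC))
```

The same function could be applied to the second axis (Unique ZRatio) to flag zones that stand out on either measure.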

  • BIG DATA PITFALLS

    Big Data can help MOST BUSINESSES

    PITFALL: Lack of knowledge (Tools, Data Science)

    HOW TO OVERCOME: Expert, Education, Execution

    PITFALL: Too much data. Initially most of it was discarded

    HOW TO OVERCOME: Deploy Hadoop clusters with cheap storage and store with the best possible compression

    PITFALL: Executives not sure

    HOW TO OVERCOME: Education, best practices, and insights from mining; find useful patterns initially

    PITFALL: Belief that Big Data has all the answers

    HOW TO OVERCOME: The whole mine is NOT GOLD. Show insights and coach

  • BIG DATA PITFALLS

    Big Data can help MOST BUSINESSES

    PITFALL: Silo culture

    HOW TO OVERCOME: Devastating for companies. A single source of truth is key to success

    PITFALL: Multiple copies of the same data in different formats

    HOW TO OVERCOME: Keep raw data (along with a DR site), transform during analysis

    PITFALL: Well-established Enterprise Data Warehouse

    HOW TO OVERCOME: Keep it for simple, operational analytics; augment with Big Data for innovation and future growth

    PITFALL: Intuition-based culture

    HOW TO OVERCOME: If you focus only on Gold and then find Silver and other precious metals, you miss the mark. Show insights and move on to Gold

  • Simple way to see some Big Data Challenges

    1st: Data acquisition; Storage; Processing

    2nd: Data transport & dissemination; Data management & curation; Big Analytics Tools, Technology, Know-How

    3rd: Privacy, Security and Disaster Recovery; Technical/Scientific Talent; Cost of all of the above

  • KARMA matters

    Knowledge: Business, Technology, People Strategy; Big Data Sources, Lifecycle

    Re-invest based on actions

    Action: Scalable Architecture, Infrastructure, Tools & Technology, Resources; Mining the Big Data with a targeted and open mind to find Gold and other items

    Recognition: Revenue by selling new insights; Increase profit margins; Add new features to products & services

    Market: Grow Share; Customer Centricity

    Advance: Innovate with the help of Big Analytics; Gather even more Big Data and keep going through this cycle

  • THANK YOU

    UNDERSTAND YOUR BIG DATA KARMA SCORE, understand the Big Picture and the Direction, and LEAD

    Helps Build a Strong Foundation

    Focus on OUR MOST VALUED CUSTOMERS

    INCREASE PROFITABILITY

  • Appendix

  • The Pitfalls for Adopting Big Data

    The Big Data definition of 4 Vs (Velocity, Volume, Variety, Veracity) is incomplete.

    The belief that Big Data solves everything for everyone.

    Big Data abounds, but its dimensions need to be understood.

    The Loudest Often Wins (LOW), or the highest paid person's opinion (HIPPO) prevails.

    Getting a data-driven approach to trump intuition is a hard nut to crack. Really!!

    Data for Data's Sake

    Talent Gap

    Data, Data Everywhere

    Infighting

    Aiming Too High

    Reference: Wall Street Journal, March 11, 2013, page R4

  • Brief History of Analytics

    Timeline axis: 1890, 1920, 1950, 1980, 2010

    Time Management (by Frederick Winslow Taylor)

    Zero Defects Analysis and Pacing of Assembly Line (Ford)

    Statistical Process Control (Walter Shewhart)

    Operational Research Popularized (Royal Air Force)

    Social Network Analysis

    Business Intelligence Term Coined (H. P. Luhn)

    Artificial Intelligence (John McCarthy)

    Exploratory Data Analysis - visualization (John Tukey)

    Business Intelligence Popularized (Gartner)

    Expert Systems (using AI)

    The Visual Display of Quantitative Information (Edward Tufte)

    Data Mining (part of AI) and Web Analytics

    Big Analytics

  • DEFINITIONS of Analytics for Business

    ANALYTICS: Any data-driven process that provides insights

    ADVANCED ANALYTICS: Helps in understanding cause-effect relationships, predicting future events, and choosing the best possible action

    BIG ANALYTICS FOR BUSINESS: Relevant for the business; provides actionable insights for increasing revenue/profit and for value measurement; leverages Big Data