data-ed webinar: demystifying big data


Copyright 2013 by Data Blueprint

Demystifying Big Data

Date: May 14, 2013
Time: 2:00 PM ET/11:00 AM PT
Presenter: Peter Aiken, Ph.D.

• “Every century, a new technology - steam power, electricity, atomic energy, or microprocessors - has swept away the old world with a vision of a new one. Today, we seem to be entering the era of Big Data.” - Michael Coren

Get Social with Us!
• Live Twitter Feed: @datablueprint @paiken #dataed
• Like Us: www.facebook.com/datablueprint
• Join the Group: Data Management & Business Intelligence

Presented by Peter Aiken, Ph.D.

Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques

Peter Aiken, PhD
• 30+ years of experience in data management
• Multiple international awards & recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS, VCU (vcu.edu)
• Past President, DAMA International (dama.org)
• 9 books and dozens of articles
• Experienced w/ 500+ data management practices in 20 countries
• Multi-year immersions with organizations as diverse as the US DoD, Nokia, Deutsche Bank, Wells Fargo, and the Commonwealth of Virginia

Outline
• Big Data Context: Why the Big Deal about Big Data?
• Big Data Challenges: Historical Perspective
• Big Data Challenges: Today
• Big Data Approach: Crawl, Walk, Run
• Design Principles: Foundational & Technical
• Take Aways and Q&A

Why the Big Deal about Big Data?


• We are at an inflection point: The sheer volume of data generated, stored, and mined for insights has become economically relevant to businesses, government, and consumers (McKinsey)

• We believe the same important principles still apply:

– What problem are you trying to solve for your business? Your solution needs to fit your problem

– Doing data for (big) data’s sake is not going to solve any problems

– Risk of spending a lot of money on chasing Big Data that will realize little to no returns - especially at this hype cycle stage

http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1

Myth #1: Everyone should invest in Big Data
Fact:
• Not every company will benefit from Big Data
• It depends on your size and your ability
– Local pizza shop vs. state-wide or national chain

Big Data can create significant financial value across sectors


• Some (not all) companies can take advantage of Big Data to create value if they want to compete

5 Ways in which Big Data creates Big Business Value
1. Information is transparent and usable at much higher frequency
2. Expose variability and boost performance
3. Narrow segmentation of customers and more precisely tailored products or services
4. Sophisticated analytics and improved decision-making
5. Improved development of the next generation of products and services

http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1

Myth #2: Big Data has a clear definition
Fact:
• The term is used so often and in many contexts that its meaning has become vague and ambiguous
• Industry experts and scientists often disagree

http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics

Defining Big Data

• Gartner: High-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision-making, insight discovery and process optimization.

• IBM: Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

• NY Times: Shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions.

• McKinsey: Large pools of data that can be brought together and analyzed to discern patterns and make better decisions

Big Data Characteristics generally include:
1. Volume: The amount of data
2. Velocity: The speed of data going in and out
3. Variety: The range of data types & sources
4. Variability: Many options or variable interpretations confound analysis
Q: Would it be more useful to refer to "big data techniques"?

Big Data Gartner Hype Cycle


Some Big Data Limitations
• Data analysis struggles with social cognition
• Data struggles with context
• Data creates bigger haystacks
• Big data has trouble with big problems
• Data favors memes over masterpieces
• Data obscures values

David Brooks, New York Times: http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=0

Business Information Market: $1.1 Trillion a Year

• Enterprises spend an average of $38 million on information/year

• Small and medium sized businesses on average spend $332,000

http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data

Big Data = Big Spending
• Enterprises are spending wildly on Big Data but don’t know if it’s worth it yet (Business Insider, 2012)
• Big Data Technology Spending Trend:
– 83% increase over the next 3 years (worldwide):
    • 2012: $28 billion
    • 2013: $34 billion
    • 2016: $232 billion
• Caution:
– Don’t fall victim to SOS (Shiny Object Syndrome)
– A lot of money is being invested but is it generating the expected return?
– Gartner Hype Cycle suggests results are going to be disappointing

http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe
http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html
http://www.gartner.com/DisplayDocument?id=2195915&ref=clientFriendlyUrl

Myth #3: Big Data is just another IT project
Fact:
• Big Data is not your typical IT project
– Does not answer typical IT questions
– Trend analysis, agile, actionable, etc.
– Fundamentally different approach
• Big Data Projects are exploratory
• Big Data enables new capabilities
• Big Data can be a disruptive technology
• It might sound simple but that doesn’t mean it’s easy
• Beware of SOS (Shiny Object Syndrome)

Healthcare Example: Patient Data
• Clinical data:
– Diagnosis/prognosis/treatment
– Genetic data
• Patient demographic data
• Insurance data:
– Insurance provider
– Claims data
• Prescriptions & pharmacy information
• Physical fitness data
– Activity tracking through smartphone apps & social media
• Health history
• Medical research data

Retail Example: Loyalty Programs & Big Data
• Companies need to understand current wants and needs AND predict future tendencies
• Customer -> Repeat Customer -> Brand Advocate
• Customer loyalty programs & retention strategies
– Track what is being purchased and how often
– Coupons based on purchasing history
– Targeted communications, campaigns & special offers
– Social media for additional interactions
– Personalize consumer interactions
• Customer purchase history influences product placements
– Retailers rapidly respond to consumer demands
– Product placements, planogram optimization, etc.

http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/

Take Aways-Big Data Context
• Technology continues to evolve at increasing speeds
• Big Data is here
– We have the potential to create insights
• Spend wisely & strategically:
– Big Data is not going to solve all your problems.
• Fact:
– Big Data is not for everyone
• Fact:
– Lack of a clear definition
• Hype Cycle:
– Current: Peak of Inflated Expectations
– Soon: Trough of Disillusionment


Myth #4: Big Data is new
Fact:
• The term originated in Silicon Valley in the 1990s
• The concept has been used previously
– 800-year-old linguistic datasets
– Use in sciences in the 1600s
– Kepler, Sloan Digital Sky Survey, Statisticians’ view
• Much harder to leverage Big Data when you lack appropriate techniques

http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics


Bills of Mortality

“The Human Face of Big Data”, Rick Smolan & Jennifer Erwitt

Early Database


Mortality Geocoding

Where is it happening?

When is it happening?

Why is it happening?

“The Human Face of Big Data”, Rick Smolan & Jennifer Erwitt

Big Data Characteristics & the Plague
1. Volume
– Plague data collection points
2. Velocity
– Speed at which disease registers are updated
3. Variety
– Who is collecting plague data points, how, and where?
4. Variability
– Different ways of recording disease patterns and using that data
– No social media yet but gossip existed

John Snow’s 1854 Cholera Map of London

Take Aways-Historic Big Data Challenges
• Fact: Big Data is not new
• Foundational data management challenges remain similar
• Bills of Mortality by John Graunt
– First true health data set
– World’s first pattern of data
– Foundation for probability industry, statistics, insurance


Myth #5: Big Data is innovative
Fact:
• Big Data techniques are innovative
• ROI and insights depend on the size of the business and the amount of data used and produced, e.g.
– Local pizza place vs. Papa John’s
– Retail

Data Footprints
• SQL Server
– 47,000,000,000,000 bytes
– Largest table 34 billion records, 3.5 TBs
• Informix
– 1,800,000,000 queries/day
– 65,000,000 tables / 517,000 databases
• Teradata
– 117 billion records
– 23 TBs for one table
• DB2
– 29,838,518,078 daily queries


#1 VOLUME, The Amount of Data
2012 London Summer Games
• 60 GB of data/second
• 200,000 hours of big data will be generated testing systems
• 2,000 hours media coverage/daily
• 845 million Facebook users averaging 15 TB/day
• 13,000 tweets/second
• 4 billion watching
• 8.5 billion devices connected
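To put the per-second figures above on a daily scale, here is a small back-of-the-envelope sketch. It assumes decimal units (1 PB = 1,000,000 GB) and, more importantly, that the quoted rates are sustained for a full 24 hours. The slide does not say whether they are averages or peaks, so treat the results as rough upper-bound illustrations only.

```python
# Back-of-the-envelope scaling of the per-second figures quoted on the slide.
# Assumption (not from the slide): the rates hold constantly for 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60

gb_per_second = 60          # "60 GB of data/second"
tweets_per_second = 13_000  # "13,000 tweets/second"

gb_per_day = gb_per_second * SECONDS_PER_DAY
pb_per_day = gb_per_day / 1_000_000  # decimal units: 1 PB = 1,000,000 GB
tweets_per_day = tweets_per_second * SECONDS_PER_DAY

print(f"{gb_per_day:,} GB/day = {pb_per_day:.3f} PB/day")
print(f"{tweets_per_day:,} tweets/day")
```

Under those assumptions the script prints 5,184,000 GB/day (5.184 PB/day) and 1,123,200,000 tweets/day.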

#2 VELOCITY, The Speed of Data

http://www.youtube.com/watch?v=LrWfXn_mvK8

Nanex 1/2 Second Trading Data

May 2, 2013

Johnson & Johnson

The European Union last year approved a new rule mandating that all trades must exist for at least a half-second - in this instance 1,200 orders and 215 actual trades

#3 VARIETY, Range of Data Types & Sources
Increasingly, individuals make use of data-producing gadgets to perform services for them

#4 VARIABILITY, Many options or variable interpretations confound analysis

Historyflow-Wikipedia entry for the word “Islam”

Take Aways: Big Data Challenges Today
• Fact: Big Data techniques are innovative but “Big Data” is not
• Challenges are both foundational and technical, today as well as in the 1600s
• Technology continues to advance rapidly (4 Vs)
• Challenges associated with Big Data are not new:
– Well-known foundational data management issues
– Need to align data and business with rapidly changing environment
– Duplication, accessibility, availability
– Foundational business issues


Myth #6: Big Data provides all the Answers
Fact:
• Big Data does not mean the end of scientific theory
• Be careful or you’ll end up with spurious correlations
– Don’t just go fishing for correlations and hope they will explain the world
• To get to the WHY of things, you need ideas, hypotheses and theories
• Having more data does not substitute for thinking hard, recognizing anomalies and exploring deep truths
• You need the right approach

http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
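The warning about fishing for correlations can be illustrated with a minimal simulation. The sketch below is not from the webinar: it generates series of pure random noise that are unrelated by construction, then counts how many pairs still look strongly correlated. The function names, the series count, the series length, and the 0.8 threshold are all illustrative assumptions.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_strong_pairs(n_series, n_points, threshold, seed=0):
    """Count pairs of independent random series with |r| above threshold."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]
    pairs = list(itertools.combinations(series, 2))
    strong = sum(1 for a, b in pairs if abs(pearson(a, b)) > threshold)
    return strong, len(pairs)


# Hypothetical setup: 200 noise series with only 10 observations each.
strong, total = count_strong_pairs(n_series=200, n_points=10, threshold=0.8)
print(f"{strong} of {total} unrelated pairs have |r| > 0.8")
```

With many variables and few observations, some pairs clear the threshold by chance alone. This is why a hypothesis should come before the search, not after it.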

• Identify business opportunity
• How can data be leveraged in exploring
– External market place
    • Analyze opportunities and threats
– Internal efficiencies
    • Analyze strengths and weaknesses

Example: 2012 Olympic Summer Games
1. Volume: 845 million FB users averaging 15 TB+ of data/day
2. Velocity: 60 GB of data per second
3. Variety: 8.5 billion devices connected
4. Variability: Sponsor data, athlete data, etc.
5. Vitality: Data Art project “Emoto”
6. Virtual: Social media

• Based on my 6 V analysis, do I need a Big Data solution or does my current BI solution address my business opportunity?
– Do the 6 Vs indicate general Big Data characteristics?
– What are the limitations of my current BI environment? (Technology constraint)
– What are my budgetary restrictions? (Financial constraint)
– What is my current Big Data knowledge base? (Knowledge constraint)
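The questions above can be read as a simple checklist. The sketch below is one hypothetical way to encode it. The field names, the order in which the constraints are checked, and the wording of the recommendations are assumptions made for illustration, not a procedure given in the webinar.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Yes/no answers to the checklist questions (hypothetical field names)."""
    six_vs_indicate_big_data: bool  # do the 6 Vs show Big Data characteristics?
    current_bi_is_sufficient: bool  # technology constraint
    budget_available: bool          # financial constraint
    team_has_big_data_skills: bool  # knowledge constraint


def recommend(a: Assessment) -> str:
    """Return a coarse recommendation; the check order is an assumption."""
    if not a.six_vs_indicate_big_data or a.current_bi_is_sufficient:
        return "stay with current BI"
    if not a.budget_available:
        return "revisit budget before proceeding"
    if not a.team_has_big_data_skills:
        return "build knowledge first"
    return "explore a Big Data solution"


print(recommend(Assessment(True, False, True, False)))
```

The example call describes an organization whose data shows Big Data characteristics and whose BI falls short, but which lacks in-house knowledge, so the sketch suggests building knowledge first.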

• MUST have both Foundational and Technical practice expertise

• Data Strategy

• Data Governance

• Data Architecture

• Data Education


• Data Quality

• Data Integration

• Data Platforms

• BI/Analytics

• Needs to be actionable
• Generally well understood by business
• Document what has been learned

• Perfect results are not necessary
• Reiterate and refine
• Iterative process to reach decision point
• Use as feedback for next exploration

Take Aways-Approach: Crawl, Walk, Run
• Crawl:
– Identify business opportunity and determine whether you truly need a Big Data solution
• Walk:
– Apply a combination of foundational and technical data management practices. Document your insights and make sure they are actionable
• Run:
– Recycle and explore. Staying agile allows you to be exploratory.


Foundational Practice: Data Strategy
• Your data strategy must align to your organizational business strategy and operating model
• As the market place becomes more data-driven, a data-focused business strategy is an imperative
• Must have data strategy before you have a Big Data strategy

Data Strategy Case Study: Enterprise Information Management Maturity

Data Strategy Considerations
• What are the questions that you cannot answer today?
• Is there a direct reliance on understanding customer behavior to drive revenue?
• Do you have information overload and are you trying to find the signal in the noise?
• Which is more important:
– Establishing value from current data assets/data reporting?
– Exploring Big Data opportunities?

Myth #7: You need Big Data for Insights
Fact:
• Distinction between Big Data and doing analytics
– Big Data is defined by the technology stack that you use
– Big Data is used for predictive and prescriptive analytics
• Use existing data for reporting, figure out bottlenecks and optimize current business model
• Understand how your data is structured, architected and stored

Foundational Practice: Data Architecture
• Common vocabulary expressing integrated requirements ensuring that data assets are stored, arranged, managed, and used in systems in support of organizational strategy [Aiken 2010]
• Most organizations have data assets that are not supportive of strategies
• Big question:
– How can organizations more effectively use their information architectures to support strategy implementation?

Data Architecture Considerations
• Does your current architecture for BI and analytics support Big Data?
• Are you getting enough value out of your current architecture?
• Can you easily integrate and share information across your organization?
• Do you struggle to extract the value from your data because it is too cumbersome to navigate and access?
• Are you confident your data is organized to meet the needs of your business?

Technical Practice: Data Integration
• A data-centric organization requires unified data
• Integrating data across organizational silos creates new insights
• It is also the biggest challenge
• Big Data techniques can be used to complement existing integration efforts

Allowing connections between RDBMS and NoSQL data is beneficial
Examples:
1. Invoices
2. Passports
3. Stock shelving
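As a rough illustration of connecting relational and document-style data, the sketch below joins invoice rows from an in-memory SQLite table with JSON documents that stand in for records from a NoSQL store. The table layout, field names, and sample values are all hypothetical. A real deployment would query an actual document database rather than a Python list.

```python
import json
import sqlite3

# Relational side: invoice headers in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "Acme", 120.0), (2, "Globex", 75.5)],
)

# Document side: JSON strings standing in for records from a NoSQL store.
documents = [
    '{"invoice_id": 1, "scan": "inv-1.pdf", "notes": ["paid late"]}',
    '{"invoice_id": 2, "scan": "inv-2.pdf", "notes": []}',
]
docs_by_id = {doc["invoice_id"]: doc for doc in map(json.loads, documents)}

# Integrate: enrich each relational row with its matching document, if any.
combined = []
for invoice_id, customer, total in conn.execute(
    "SELECT invoice_id, customer, total FROM invoices ORDER BY invoice_id"
):
    record = {"invoice_id": invoice_id, "customer": customer, "total": total}
    record.update(docs_by_id.get(invoice_id, {}))
    combined.append(record)

for record in combined:
    print(record)
```

The shared key (`invoice_id` here) is what makes the connection possible. Agreeing on such keys across silos is a foundational practice, not a technology choice.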


Integration Data Vault 2.0 with Big Data


Data Integration Considerations
• The complexity of your data integration challenge depends on the questions you’re trying to answer
• Integration requirements for Big Data are dependent on the types of questions you’re asking:
– Integration here may be more fuzzy than discrete
– Integration is domain-based (based on time, customer concept, geographic distribution)
• Those requirements should evolve from your strategy

Technical Practice: Data Quality
• Quality is driven by fit-for-purpose considerations
• Big Data quality is different:
– Basic availability
– Soft-state
– Eventual consistency
• Directional accuracy is the goal
• Focus on your most important data assets and ensure your solutions address the root cause of any quality issues, so that your data is correct when it is first created
• Experience has shown that organizations can never get in front of their data quality issues if they only use the ‘find-and-fix’ approach

Data Quality Considerations
• Big Data is trying to be predictive
• What are the questions you are trying to answer?
– What level of accuracy are you looking for?
– What confidence levels?
– Example: Do I need to know exactly what the customer is going to buy or do I just need to know the range of products he/she is going to choose from?
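The difference between knowing exactly what a customer will buy and knowing the range they will choose from can be made concrete with a small sketch. Below, the same hypothetical predictions are scored two ways: against the single top prediction, and against the top three. The product names, the ranked lists, and the helper `hit_rate` are invented for illustration.

```python
def hit_rate(ranked_predictions, actual_purchases, k):
    """Share of customers whose actual purchase is among the top-k predictions."""
    hits = sum(
        1
        for ranked, actual in zip(ranked_predictions, actual_purchases)
        if actual in ranked[:k]
    )
    return hits / len(actual_purchases)


# Hypothetical data: one ranked prediction list per customer, best guess first.
ranked_predictions = [
    ["pizza", "salad", "soda"],
    ["soda", "pizza", "wings"],
    ["wings", "soda", "salad"],
    ["salad", "wings", "pizza"],
]
actual_purchases = ["pizza", "pizza", "salad", "soda"]

exact = hit_rate(ranked_predictions, actual_purchases, k=1)     # exact product
in_range = hit_rate(ranked_predictions, actual_purchases, k=3)  # product range

print(f"exact match: {exact:.0%}, within top 3: {in_range:.0%}")
```

On this toy data the exact-match rate is 25% while the within-range rate is 75%. If the business question only needs the range, the looser and cheaper target may be good enough.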

Myth #8: Bigger Data is Better
Fact:
• Better to have less data of good quality than more poor quality big data
• Analysis to reduce variables and increase manageability, otherwise Big Data = Quantity over Quality
• Beware of Shiny Object Syndrome
– What problem are we trying to solve?
– The solution needs to fit the problem
• Big Data may not be your answer, it may be your problem
• Investments in foundational and technical approaches result in better outcomes for Big Data

Technical Practice: Data Platforms
• Do you want to measure critical operational process performance?
• No one data platform can answer all your questions. This is commonly misunderstood and often leads to very expensive, bloated and ineffective data platforms.
• Understand the questions that need to be asked, then decide how to build the right data platform or how to optimize an existing one

The Big Data Landscape


Copyright Dave Feinleib, bigdatalandscape.com

Data Platforms Considerations
• Most big data stacks share common components: file storage, columnar store, querying engine, etc.
• Big data stacks generally look the same until you get into appliances
– Algorithms are built into the appliances themselves (e.g. Netezza, Teradata, etc.)
• Ask these questions:
– Do you want insights on your customer’s behavior?
– Do you need real-time customer transactional information?
– Do you need historical data or just access to the latest transactions?
– Where do you go to find the single version of the truth about your customers?

Take Aways-Design Principles: Foundational & Technical

• Foundational data management principles still apply

• Beware of SOS (Shiny Object Syndrome)

• You must have a data strategy before you can have a Big Data strategy

• Fact: You don’t need Big Data to gain insights

• Big Data integration requirements evolve from your strategy

• Fact: Bigger Data is not always better


Take Aways: In Summary
• Big data techniques are innovative but “Big Data” is not
• Big Data characteristics: 6 Vs
– Volume, Velocity, Variety, Variability, Vitality, Virtual
• Approach: Crawl-Walk-Run
• Big Data challenges require solutions that are based on foundational and technical data management practices
• Beware of SOS (Shiny Object Syndrome):
– Spend wisely and strategically
– Big Data is not going to solve all your problems

References
• The Human Face of Big Data, Rick Smolan & Jennifer Erwitt, First Edition (November 20, 2012)
• McKinsey: Big Data: The next frontier for innovation, competition and productivity (http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1)

• The Washington Post: Five Myths about Big Data (http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics)

• Gartner: Gartner’s 2013 Hype Cycle for Emerging Technologies Maps Out Evolving Relationship Between Humans and Machines (http://www.gartner.com/newsroom/id/2575515)

• The New York Times | Opinion Pages: What Data Can’t Do (http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=1&)

• CIO.com: Five Steps for How to Better Manage Your Data (http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/)

• Business Insider: Enterprises Are Spending Wildly on ‘Big Data’ But Don’t Know If It’s Worth It Yet (http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe)

• Inc.com: Big Data, Big Money: IT Industry to Increase Spending (http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html)

• Forbes: Big Data Boosts Customer Loyalty. No, Really. (http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/)


Questions?

It’s your turn! Use the chat feature or Twitter (#dataed) to submit your questions to Peter now.

Upcoming Events
• Data-Centric Strategy & Roadmap: February 11, 2014 @ 2:00 PM ET/11:00 AM PT
• Emerging Trends in Data Jobs: March 13, 2014 @ 2:00 PM ET/11:00 AM PT

Sign up here: www.datablueprint.com/webinar-schedule or www.dataversity.net

10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056