Big Data and the Data Quality Imperative

TRILLIUM SOFTWARE 2013 CUSTOMER CONFERENCE: (Who’s Afraid of…) The Big Bad Data Wolf? The Big Bad Data Challenge – Big Data & the Data Quality Imperative. Presented by: Nigel Turner, VP Information Management Strategy

Upload: trillium-software

Post on 22-Apr-2015

TRANSCRIPT

Page 1: Big data and the data quality imperative

TRILLIUM SOFTWARE 2013 CUSTOMER CONFERENCE

(Who’s Afraid of…) The Big Bad Data Wolf?

The Big Bad Data Challenge – Big Data & the Data Quality Imperative

Presented By: Nigel Turner, VP Information Management Strategy

Page 2: Big data and the data quality imperative

The tale of the Three Little Pigs

© Copyright 2013, Trillium Software, Inc. All rights reserved.

Page 3: Big data and the data quality imperative

Big Data – what is it?

Set of new concepts, practices & technologies to manage & exploit digital data

Can be defined as: “Data that exceeds the processing capability of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architecture” (Source: Edd Dumbill – O’Reilly Community)

Its key premise is that all data has potential value if it can be collected, analysed and used to generate actionable insight

Page 4: Big data and the data quality imperative

Where does Big Data come from?

SOCIAL MEDIA & SOCIAL NETWORKS

MACHINE GENERATED

WIDELY KNOWN SOURCES

Page 5: Big data and the data quality imperative

What’s different about Big Data?

• New technologies which enable distributed & highly scalable MPP (Massively Parallel Processing), e.g.
– Apache Hadoop
– MapReduce
– NoSQL databases

• Strong emphasis on analytical approaches
– Emergence of “data science”
– Predictive Analytics
– Data Mining

• The “democratisation” of data
– Data made available to all (cf Cloud Computing)
– Business and not IT led BI
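To make the MapReduce idea named above a little more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using a word count. It is an illustration only: the function names are invented for this example, nothing here uses the Hadoop API, and a real MPP system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence (hypothetical helper, one process)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Invented sample input, for illustration only.
counts = word_count(["big data", "bad data", "Big Bad Data Wolf"])
```

Because each record is mapped independently and each key is reduced independently, both steps can in principle be spread over many workers, which is the property that makes the approach scale.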

Page 6: Big data and the data quality imperative

Big Data & Data Quality – parallel worlds?

BIG DATA

DATA QUALITY

Page 7: Big data and the data quality imperative

Parallel worlds… or are they (1)?

Shared with 100,000+ others and counting…

Page 8: Big data and the data quality imperative

Parallel worlds… or are they (2)?

“I spend the vast majority of my time cleaning data systems… cleaning and preparing data sets makes everything I do better… it’s the highest value activity I do”

Josh Wills, Senior Director of Data Science, Cloudera (From “Training a new generation of Data Scientists” – Cloudera video)

Page 9: Big data and the data quality imperative

When Big Data & Data Quality worlds collide…

Big Data will expose Data Quality shortcomings

Poor Data Quality will undermine the value of Big Data investments

Page 10: Big data and the data quality imperative

Big Data – building on solid foundations

BIG DATA / ANALYTICS

DATA QUALITY FOUNDATION

Page 11: Big data and the data quality imperative

The 3Vs and the DQ challenge

Volume
• Exponential growth of data – predicted 40-60% per annum
• 2.5 quintillion bytes of data are created every day
• 90% of all digital data created in the last two years

Variety
• Data generated more varied and complex than before:
– Text, Audio, Images, Machine Generated etc.
• Much of this data is semi-structured or unstructured
• Traditional IT techniques ill equipped to process & analyse it

Velocity
• Data often generated in real time
• Analysis and response need to be rapid, often also real time
• Traditional BI / DW environments cannot cope – new approaches are needed
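As a rough worked example of what the predicted 40-60% annual growth means in practice, the sketch below computes the doubling time and the five-year growth factor. It assumes a constant compound growth rate, which is a simplification made for this illustration and not a claim from the slides; the function names are invented here.

```python
import math


def doubling_time_years(annual_growth):
    """Years for a quantity to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth, years):
    """How many times larger the quantity is after the given number of years."""
    return (1 + annual_growth) ** years


low_growth_doubling = doubling_time_years(0.40)   # roughly 2.1 years
high_growth_doubling = doubling_time_years(0.60)  # roughly 1.5 years

five_year_low = growth_factor(0.40, 5)   # about 5.4 times the starting volume
five_year_high = growth_factor(0.60, 5)  # about 10.5 times the starting volume
```

Under that assumption, data volumes double roughly every one and a half to two years, so any data quality process sized for today's volume would face five to ten times as much data within five years.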

Page 12: Big data and the data quality imperative

Big Data – Foundations of Success

• Identifying the right data to solve the business problem or opportunity

• The ability to integrate & match varied data from multiple data sources
– structured, semi-structured, unstructured

• Building the right IT infrastructure to support Big Data applications

• Having the right capabilities & skills to exploit the data

Page 13: Big data and the data quality imperative

Big Data – some vertical applications

• Retail: using point of sale & social media data to supplement & enrich traditional CRM / Marketing data
• Insurance & Banking: fraud detection
• Health: holistic patient analysis
• Utilities: consumption peaks & troughs & capacity planning
• Telcos: call routing optimisation & customer churn
• Manufacturing: predictive fault identification & supply chain optimisation
• Research: particle analysis, genomics etc.

Page 14: Big data and the data quality imperative

Example Big Data benefit: The Open Big Data Cloud

SOURCE: LINKED OPEN DATA (LOD) COMMUNITY

Page 15: Big data and the data quality imperative

Big Data in practice - Volvo

Every Volvo vehicle has hundreds of microprocessors / sensors

The data generated is used within the car itself but is also captured for analysis by Volvo and its dealers

All data is loaded into a centralised analysis hub & integrated with CRM, dealership, product & social network data

It is used to optimise design & manufacturing, enhance customer interaction, improve safety & act on customer feedback

Page 16: Big data and the data quality imperative

Big Data – Barriers & Pitfalls

• The sheer volume of data – what’s worth using?
• Data extraction challenges
• The ability to match data from disparate sources / formats / media
• The time taken to integrate new data sources
• The risks of mismatching and incorrect identification of individuals
• Legal & regulatory pitfalls
• Security concerns – corporate & individual
• Lack of skills & expertise

Page 17: Big data and the data quality imperative

Big Data – the data integration challenge

EXTERNAL DATA SOURCES: SOCIAL MEDIA, SENSORS, OPEN DATA, EMAIL, MOBILES

INTERNAL DATA SOURCES: CRM, BILLING, OPS, SALES, PRODS

ANALYTICS PLATFORMS 1 to n

ACTIONABLE INSIGHT & KNOWLEDGE

Page 18: Big data and the data quality imperative

Big Data – the Data Quality Imperative (1)

• Need to profile external and internal data sources
• Need to classify data to define what data really matters
• Need to assure the quality of internal (and some external) data sources for accuracy, completeness, consistency
• Need to define & apply business rules & metadata management to how the data will be defined and used
• Need for a data governance framework to ensure consistency & control
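The profiling and completeness checks listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any Trillium product: the `profile` function, the field names and the sample records are all invented for this example, and real profiling tools measure far more than completeness and key uniqueness.

```python
def profile(records, fields, key_field):
    """Return simple per-field completeness and key-uniqueness measures."""
    total = len(records)
    completeness = {}
    for field in fields:
        # A value counts as filled if it is present and not blank.
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        completeness[field] = filled / total if total else 0.0
    keys = [r.get(key_field) for r in records if r.get(key_field)]
    # Share of key values that are distinct; below 1.0 means duplicates exist.
    uniqueness = len(set(keys)) / len(keys) if keys else 0.0
    return {"rows": total, "completeness": completeness, "key_uniqueness": uniqueness}


# Hypothetical sample records, invented for illustration only.
customers = [
    {"id": "C1", "name": "Ann Lee", "email": "ann@example.com"},
    {"id": "C2", "name": "Bob Ray", "email": ""},
    {"id": "C2", "name": "", "email": "bob@example.com"},
    {"id": "C3", "name": "Cy Dunn", "email": None},
]

report = profile(customers, ["name", "email"], "id")
```

A report like this gives a first view of which sources and fields need attention before they are fed into an analytics platform.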

Page 19: Big data and the data quality imperative

Big Data – the Data Quality Imperative (2)

• Need processes & tools to enable:
– Source data profiling
– Data integration
– Data parsing
– Data standardisation
– Business rule creation & management
– Metadata management & a shared business / IT glossary
– Data de-duplication
– Data normalisation
– Data matching
– Data enrichment
– Data audit

• Many of these functions must be capable of being carried out in real time with zero lag
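Three of the functions above (standardisation, matching and de-duplication) can be illustrated with a small sketch. It uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for a purpose-built matching engine; the 0.85 threshold, the function names and the sample names are assumptions made for this example only.

```python
import re
from difflib import SequenceMatcher


def standardise(name):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio between 0 and 1 for two names after standardisation."""
    return SequenceMatcher(None, standardise(a), standardise(b)).ratio()


def deduplicate(names, threshold=0.85):
    """Keep the first of any group of names whose similarity meets the threshold."""
    kept = []
    for name in names:
        if not any(similarity(name, k) >= threshold for k in kept):
            kept.append(name)
    return kept


# Invented sample data: three spellings of one organisation plus a distinct one.
names = [
    "Trillium Software, Inc.",
    "trillium software inc",
    "Trilium Software Inc",
    "Volvo",
]
unique = deduplicate(names)
```

The threshold is the sensitive design choice: set it too low and distinct individuals or organisations are merged, set it too high and duplicates survive, which is the mismatching risk raised earlier in the deck.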

Page 20: Big data and the data quality imperative

Big Data – DQ as the key enabler

EXTERNAL DATA SOURCES: SOCIAL MEDIA, SENSORS, OPEN DATA, EMAIL, MOBILES

INTERNAL DATA SOURCES: CRM, BILLING, OPS, SALES, PRODS

DATA QUALITY PLATFORM: PROFILE, PARSE, STANDARDISE, MATCH, ENRICH

ANALYTICS PLATFORMS 1 to n

ACTIONABLE INSIGHT & KNOWLEDGE

Page 21: Big data and the data quality imperative

Big Data – some algorithms

1. BIG DATA + POOR DATA QUALITY = BIG PROBLEMS

2. DATA DEMOCRATISATION – DATA GOVERNANCE = ANARCHY

3. DATA MASH UPS – DATA QUALITY = DATA MESS

4. BIG DATA ANALYTICS + POOR DQ = WRONG RESULTS

5. BIG DATA – DATA ASSURANCE = JAIL

6. 3V + DATA QUALITY = 4V (VALIDITY)

Page 22: Big data and the data quality imperative

Big Data & Data Quality – summary

• Big Data will depend on data quality to reap its claimed benefits – the GIGO truism

• The democratisation of data will expose poor DQ

• The need for Data Governance increases as data becomes more accessible

• Data skills will become more valued for ‘data science’

• Big Data will increase the 3Vs of data

• Control of data becomes more difficult – scope and variety of use increases

• Data standards & business rules become more complex

• Potential legal & regulatory minefield

Page 23: Big data and the data quality imperative

What action should we take as data management / DQ professionals?

Identify and get involved in any current or planned Big Data initiatives within our organisations

Ensure that the Data Quality and Data Governance implications & imperatives of these initiatives are understood

Plan for the new Data Quality and Data Governance challenges that these trends will pose

Page 24: Big data and the data quality imperative

So who’s afraid of the Big Bad Data Wolf?

Page 25: Big data and the data quality imperative

Questions

(Who’s Afraid of…) The Big Bad Data Wolf?

The Big Bad Data Challenge – Big Data & the Data Quality Imperative