Big Data and the Data Quality Imperative
TRILLIUM SOFTWARE 2013 CUSTOMER CONFERENCE
(Who’s Afraid of…) The Big Bad Data Wolf?
The Big Bad Data Challenge – Big Data & the Data Quality Imperative
Presented by: Nigel Turner, VP Information Management Strategy
The tale of the Three Little Pigs
© Copyright 2013, Trillium Software, Inc. All rights reserved.
Big Data – what is it?
A set of new concepts, practices & technologies to manage & exploit digital data
Can be defined as: “Data that exceeds the processing capability of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architecture” (Source: Edd Dumbill, O’Reilly Community)
Its key premise is that all data has potential value if it can be collected, analysed and used to generate actionable insight
Where does Big Data come from?
• Social media & social networks
• Machine-generated data
• Widely known sources
What’s different about Big Data?
New technologies which enable distributed & highly scalable MPP (Massively Parallel Processing), e.g. Apache Hadoop, MapReduce, NoSQL databases
Strong emphasis on analytical approaches: the emergence of “data science”, predictive analytics, data mining
The “democratisation” of data: data made available to all (cf. Cloud Computing); business-led rather than IT-led BI
Big Data & Data Quality – parallel worlds?
[Figure: BIG DATA | DATA QUALITY]
Parallel worlds… or are they (1)?
Shared with 100,000+ others and counting…
Parallel worlds… or are they (2)?
“I spend the vast majority of my time cleaning data systems… cleaning and preparing data sets makes everything I do better… it’s the highest value activity I do”
Josh Wills, Senior Director of Data Science, Cloudera (from “Training a new generation of Data Scientists”, Cloudera video)
When Big Data & Data Quality worlds collide…
Big Data will expose Data Quality shortcomings
Poor Data Quality will undermine the value of Big Data investments
Big Data – building on solid foundations
[Figure: BIG DATA / ANALYTICS built on a DATA QUALITY FOUNDATION]
The 3Vs and the DQ challenge
Volume
• Exponential growth of data – predicted 40-60% per annum
• 2.5 quintillion bytes of data are created every day
• 90% of all digital data was created in the last two years
Variety
• Data generated is more varied and complex than before: text, audio, images, machine-generated data, etc.
• Much of this data is semi-structured or unstructured
• Traditional IT techniques are ill-equipped to process & analyse it
Velocity
• Data is often generated in real time
• Analysis and response need to be rapid, often also in real time
• Traditional BI / DW environments cannot cope – new approaches are needed
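The 40-60% annual growth forecast compounds quickly. As a rough illustration, and assuming constant annual compounding (a simplification; the forecast itself does not specify a growth model), the time for data volume to double at rate r is ln 2 / ln(1 + r). The function names below are illustrative only.

```python
import math

def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for data volume to double, assuming constant annual compounding."""
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    # log1p(r) computes ln(1 + r) accurately for small r.
    return math.log(2) / math.log1p(annual_growth_rate)

def projected_volume(initial: float, annual_growth_rate: float, years: float) -> float:
    """Volume after `years` of compound growth from `initial` (any unit)."""
    return initial * (1 + annual_growth_rate) ** years

# Compare the two ends of the forecast range.
for rate in (0.40, 0.60):
    print(f"{rate:.0%} per annum -> doubles every {doubling_time_years(rate):.2f} years")
```

Under this assumption, volumes double roughly every two years at 40% per annum and roughly every eighteen months at 60%.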
Big Data – Foundations of Success
Identifying the right data to solve the business problem or opportunity
The ability to integrate & match varied data from multiple data sources: structured, semi-structured, unstructured
Building the right IT infrastructure to support Big Data applications
Having the right capabilities & skills to exploit the data
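Integrating structured and semi-structured sources usually starts by mapping each one into a common record shape. The sketch below is a minimal illustration under assumed inputs: a CSV export standing in for a structured CRM source and JSON lines standing in for a semi-structured social feed. The field names and the `CustomerRecord` type are hypothetical; unstructured sources (free text, audio, images) would need extraction steps not shown here.

```python
import csv
import io
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical common shape that every source is mapped into."""
    source: str
    customer_id: str
    name: str
    email: str

def from_csv(text: str) -> list[CustomerRecord]:
    # Structured source: fixed columns, one row per customer.
    reader = csv.DictReader(io.StringIO(text))
    return [CustomerRecord("crm", row["id"], row["full_name"], row["email"]) for row in reader]

def from_json_lines(text: str) -> list[CustomerRecord]:
    # Semi-structured source: nested and optional fields, one JSON document per line.
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        doc = json.loads(line)
        profile = doc.get("profile", {})
        records.append(CustomerRecord(
            "social",
            str(doc["user_id"]),
            profile.get("display_name", ""),
            profile.get("contact", {}).get("email", ""),
        ))
    return records

crm_csv = "id,full_name,email\n101,Jane Smith,jane@example.com\n"
social_feed = '{"user_id": 9, "profile": {"display_name": "jane smith", "contact": {"email": "JANE@example.com"}}}\n'
records = from_csv(crm_csv) + from_json_lines(social_feed)
for r in records:
    print(r)
```

The two records for the same person still differ in name and email casing; reconciling them is the job of the standardisation and matching functions the presentation lists under the Data Quality Imperative.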
Big Data – some vertical applications
Retail: using point of sale & social media data to supplement & enrich traditional CRM / Marketing data
Insurance & Banking: fraud detection
Health: holistic patient analysis
Utilities: consumption peaks & troughs & capacity planning
Telcos: call routing optimisation & customer churn
Manufacturing: predictive fault identification & supply chain optimisation
Research: particle analysis, genomics etc.
Example Big Data benefit: The Open Big Data Cloud
[Figure source: Linked Open Data (LOD) community]
Big Data in practice - Volvo
Every Volvo vehicle has hundreds of microprocessors / sensors
Data generated is used within the car itself but is also captured for analysis by Volvo and its dealers
All data is loaded into a centralised analysis hub & integrated with CRM, dealership, product & social network data
Used to optimise design & manufacturing, enhance customer interaction, improve safety & act on customer feedback
Big Data – Barriers & Pitfalls
• The sheer volume of data – what’s worth using?
• Data extraction challenges
• The ability to match data from disparate sources / formats / media
• The time taken to integrate new data sources
• The risks of mismatching and incorrect identification of individuals
• Legal & regulatory pitfalls
• Security concerns – corporate & individual
• Lack of skills & expertise
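The risk of mismatching can be made concrete with a simple fuzzy matcher. This sketch uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for a real matching engine (production DQ tools typically use richer, field-aware matching); the names and thresholds are hypothetical. At a threshold of 0.9, "Joan Smith" scores about 0.95 against "jon smith" and is wrongly linked to the same social profile as "Jon Smith".

```python
from difflib import SequenceMatcher

def normalise(name: str) -> str:
    # Lower-case and collapse runs of whitespace before comparing.
    return " ".join(name.lower().split())

def similarity(a: str, b: str) -> float:
    # Character-level similarity ratio in [0, 1] from the standard library.
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def match_pairs(left, right, threshold):
    """Return (left, right, score) for every cross-source pair scoring >= threshold."""
    pairs = []
    for left_name in left:
        for right_name in right:
            score = similarity(left_name, right_name)
            if score >= threshold:
                pairs.append((left_name, right_name, round(score, 2)))
    return pairs

crm_names = ["Jon Smith", "Joan Smith"]      # hypothetical internal records
social_names = ["jon smith", "John Smyth"]   # hypothetical external profiles

print(match_pairs(crm_names, social_names, 0.9))
# [('Jon Smith', 'jon smith', 1.0), ('Joan Smith', 'jon smith', 0.95)]
print(match_pairs(crm_names, social_names, 0.99))
# [('Jon Smith', 'jon smith', 1.0)]
```

Raising the threshold removes the false match in this toy case but would also miss genuine variants such as typos, so threshold tuning alone cannot substitute for well-profiled, standardised input data.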
Big Data – the data integration challenge
[Figure: external data sources (social media, sensors, open data, mobiles) and internal data sources (CRM, billing, ops, sales, products) feeding analytics platforms 1 to n, which produce actionable insight & knowledge]
Big Data – the Data Quality Imperative (1)
• Need to profile external and internal data sources
• Need to classify data to define what data really matters
• Need to assure the quality of internal (and some external) data sources for accuracy, completeness and consistency
• Need to define & apply business rules and metadata management governing how the data will be defined and used
• Need for a data governance framework to ensure consistency & control
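Profiling and rule-based assurance can start very simply: measure how complete each field is, count distinct values, and count records that break agreed business rules. The sketch below is a minimal illustration with hypothetical fields and rules (the two-letter country check, for example, is only a crude proxy for validating an ISO 3166 code), not a description of any particular product.

```python
from collections import Counter

def profile(records, fields):
    """Per-field completeness (share of non-empty values) and distinct-value count."""
    total = len(records)
    report = {}
    for field in fields:
        filled = [r.get(field) for r in records if r.get(field) not in (None, "")]
        report[field] = {
            "completeness": len(filled) / total if total else 0.0,
            "distinct": len(set(filled)),
        }
    return report

# Hypothetical business rules: rule name -> predicate that a valid record satisfies.
RULES = {
    "email_has_at_sign": lambda r: "@" in (r.get("email") or ""),
    "country_is_iso2": lambda r: len(r.get("country") or "") == 2,  # crude length check only
}

def rule_failures(records, rules):
    """Count how many records fail each rule."""
    failures = Counter()
    for r in records:
        for name, check in rules.items():
            if not check(r):
                failures[name] += 1
    return dict(failures)

customers = [
    {"id": "1", "email": "a@example.com", "country": "GB"},
    {"id": "2", "email": "", "country": "United Kingdom"},
    {"id": "3", "email": "c@example.com", "country": "GB"},
]
report = profile(customers, ["email", "country"])
failures = rule_failures(customers, RULES)
print(report)
print(failures)
```

Here the email field is two-thirds complete, and the second record fails both rules: its empty email breaks the first rule, and its spelled-out country name is inconsistent with the two-letter codes used elsewhere.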
Big Data – the Data Quality Imperative (2)
Need processes & tools to enable:
• Source data profiling
• Data integration
• Data parsing
• Data standardisation
• Business rule creation & management
• Metadata management & a shared business / IT glossary
• Data de-duplication
• Data normalisation
• Data matching
• Data enrichment
• Data audit
Many of these functions must be capable of being carried out in real time with zero lag
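Parsing, standardisation and de-duplication build on one another: a free-text value is parsed into components, each component is standardised, and the standardised form becomes the key used to spot duplicates. The sketch below assumes a deliberately simplified address format (house number, street name, street type) and a hypothetical abbreviation table; real address parsing needs far richer reference data.

```python
import re

# Hypothetical standardisation table for abbreviated street types.
STREET_TYPES = {"st": "Street", "rd": "Road", "ave": "Avenue"}

# Simplified pattern: house number, street name, street type, optional trailing dot.
ADDRESS_PATTERN = re.compile(r"\s*(\d+)\s+(.+?)\s+(\w+)\.?\s*$")

def parse_address(raw: str):
    """Split a free-text address line into components, or return None if it does not fit."""
    m = ADDRESS_PATTERN.match(raw)
    if m is None:
        return None
    number, street, street_type = m.groups()
    return {"number": number, "street": street, "type": street_type}

def standardise(parsed: dict) -> tuple:
    """Apply consistent casing and expand known street-type abbreviations."""
    street_type = STREET_TYPES.get(parsed["type"].lower(), parsed["type"].title())
    return (parsed["number"], parsed["street"].title(), street_type)

def deduplicate(raw_addresses):
    """Keep the first occurrence of each standardised address; collect unparseable input."""
    seen, unique, rejected = set(), [], []
    for raw in raw_addresses:
        parsed = parse_address(raw)
        if parsed is None:
            rejected.append(raw)
            continue
        key = standardise(parsed)
        if key not in seen:
            seen.add(key)
            unique.append(" ".join(key))
    return unique, rejected

unique, rejected = deduplicate(["12 high st", "12 High Street", "12 HIGH ST.", "Flat 3"])
print(unique)    # ['12 High Street']
print(rejected)  # ['Flat 3']
```

Three differently written versions of the same address collapse to one, while "Flat 3" does not fit the assumed pattern; in practice such records would typically be routed for review rather than silently dropped.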
Big Data – DQ as the key enabler
[Figure: external data sources (social media, sensors, open data, mobiles) and internal data sources (CRM, billing, ops, sales, products), each passing through a data quality platform (profile, parse, standardise, match, enrich) before feeding analytics platforms 1 to n, which produce actionable insight & knowledge]
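The data quality platform in this architecture can be read as a chain of stages applied to every record before it reaches an analytics platform. The sketch below composes hypothetical profile, parse, standardise, match and enrich stages and runs them one record at a time, which is closer to the real-time processing the presentation calls for than a batch job. The master customer index and region lookup are invented reference data for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator

Record = dict
Stage = Callable[[Record], Record]

def run_pipeline(records: Iterable[Record], stages: list[Stage]) -> Iterator[Record]:
    """Push each record through every stage in order, one record at a time."""
    for record in records:
        for stage in stages:
            record = stage(record)
        yield record

# Hypothetical reference data, for illustration only.
MASTER_IDS = {("Smith", "SW1A1AA"): "CUST-001"}  # master customer index
REGIONS = {"SW1A": "London"}                     # postcode-prefix lookup

missing_counts = Counter()  # running profile: how often each field arrives empty

def profile(r: Record) -> Record:
    # Tally empty fields without changing the record.
    for field, value in r.items():
        if not value:
            missing_counts[field] += 1
    return r

def parse(r: Record) -> Record:
    # Split a raw name at the first space (simplified).
    first, _, last = r["raw_name"].strip().partition(" ")
    return {**r, "first": first, "last": last}

def standardise(r: Record) -> Record:
    # Consistent name casing; postcodes upper-cased with spaces removed.
    return {**r, "first": r["first"].title(), "last": r["last"].title(),
            "postcode": r["postcode"].replace(" ", "").upper()}

def match(r: Record) -> Record:
    # Exact lookup on standardised keys; None when no master record is found.
    return {**r, "customer_id": MASTER_IDS.get((r["last"], r["postcode"]))}

def enrich(r: Record) -> Record:
    # Region from the first four postcode characters (a simplification:
    # real UK outward codes vary in length).
    return {**r, "region": REGIONS.get(r["postcode"][:4], "Unknown")}

stream = [
    {"raw_name": " jane SMITH ", "postcode": "sw1a 1aa"},
    {"raw_name": "Bob", "postcode": ""},
]
results = list(run_pipeline(stream, [profile, parse, standardise, match, enrich]))
for r in results:
    print(r)
print(dict(missing_counts))
```

Keeping each stage a small record-to-record function makes it straightforward to reorder stages, insert new ones, or apply the same chain to both internal and external sources.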
Big Data – some algorithms
1. BIG DATA + POOR DATA QUALITY = BIG PROBLEMS
2. DATA DEMOCRATISATION – DATA GOVERNANCE = ANARCHY
3. DATA MASH-UPS – DATA QUALITY = DATA MESS
4. BIG DATA ANALYTICS + POOR DQ = WRONG RESULTS
5. BIG DATA – DATA ASSURANCE = JAIL
6. 3V + DATA QUALITY = 4V (VALIDITY)
Big Data & Data Quality – summary
• Big Data will depend on data quality to reap its claimed benefits – the GIGO (garbage in, garbage out) truism
• The democratisation of data will expose poor DQ
• The need for Data Governance increases as data becomes more accessible
• Data skills will become more valued for ‘data science’
• Big Data will increase the 3Vs of data
• Control of data becomes more difficult – scope and variety of use increase
• Data standards & business rules become more complex
• Potential legal & regulatory minefield
What action should we take as data management / DQ professionals?
Identify and get involved in any current or planned Big Data initiatives within our organisations
Ensure that the Data Quality and Data Governance implications & imperatives of these initiatives are understood
Plan for the new Data Quality and Data Governance challenges that these trends will pose
So who’s afraid of the Big Bad Data Wolf?
Questions