tdwi nyc chapter - tony baer ovum on big data, data quality, and bi convergence

© Copyright Ovum. All rights reserved. Ovum is a subsidiary of Informa plc. 1 Big Data and Business Intelligence Must Converge Tony Baer [email protected] March 6, 2013

DESCRIPTION

Intersecting with Neil Raden's keynote, Ovum Principal Analyst Tony Baer asks, “What does it take to turn the promise of Big Data into tangible results?” Big opportunities to benefit from new technology have come and gone, yet the consistent challenge has been translating new potential into concrete benefits. Mr. Baer shared a practical perspective on making Big Data manageable by understanding the key challenges you must overcome to leverage it, especially the unique data quality issues that Big Data sources introduce. Mr. Baer also shared his insight that while Business Intelligence and Big Data are viewed and managed separately, in reality "Big Data and Business Intelligence must converge." Big Data needs to be approached with "less of a silo mentality," and so does Business Intelligence.

TRANSCRIPT

Page 1: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

© Copyright Ovum. All rights reserved. Ovum is a subsidiary of Informa plc.1

Big Data and Business Intelligence Must Converge

Tony Baer

[email protected]

March 6, 2013

Page 2: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Agenda

Challenges traditional data stewardship practice

Privacy – is all the world a stage?

Limits to data lifecycle?

Data quality: the big, the bad, the ugly – and it all might be good!

Page 3: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Data stewardship challenges – What’s old is new

Remember?

Back to undifferentiated ‘gobblobs’ of data

Programmatic access reigns

File systems, not (always) tables

Batch is back

But…

Volume, variety, velocity, and where’s the value??

Just because you can, should you?

10.102.8.152 - - [05/Nov/2003:00:19:54 -0500] "GET /inventory/index.jsp HTTP/1.1" 200 4028 "http://www.mycompany.com/index.jsp" "Mozilla/4.08 [en] (Win98; I ;Nav)"

192.168.114.201, -, 03/20/01, 7:55:20, W3SVC2, SALES1, 172.21.13.45, 4502, 163, 3223, 200, 0, GET, /DeptLogo.gif, -,

172.16.255.255, anonymous, 03/20/01, 23:58:11, MSFTPSVC, SALES1, 172.16.255.255, 60, 275, 0, 0,

if index(tempvalue,'?') then tempvalue=scan(tempvalue,1,'?');

else if index(tempvalue,'&')>1 then tempvalue=scan(tempvalue,1,'&');
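The "programmatic access" point can be illustrated with a minimal Python sketch that turns the first sample log line on this slide into named fields. It assumes the line follows the Apache "combined" access-log layout. The names `COMBINED`, `parse_combined`, and `strip_query` are hypothetical and not from the talk. `strip_query` only roughly mirrors the SAS fragment above (it does not reproduce SAS `scan` behavior for values that begin with a delimiter).

```python
import re
from datetime import datetime

# Assumed layout: Apache "combined" access-log format.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def strip_query(value):
    """Keep the text before '?', else before a non-leading '&'."""
    if "?" in value:
        return value.split("?", 1)[0]
    if value.find("&") > 0:
        return value.split("&", 1)[0]
    return value


def parse_combined(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Month abbreviations are parsed with the current locale (English assumed).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["path"] = strip_query(record["path"])
    return record


SAMPLE = (
    '10.102.8.152 - - [05/Nov/2003:00:19:54 -0500] '
    '"GET /inventory/index.jsp HTTP/1.1" 200 4028 '
    '"http://www.mycompany.com/index.jsp" '
    '"Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

if __name__ == "__main__":
    print(parse_combined(SAMPLE))
```

Lines that do not match return `None` rather than raising, so malformed records can be counted and inspected instead of silently dropped.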

Page 4: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Data stewardship questions for Big Data

Can we, should we “control” this data?

Are there limits to how much we should know?

Can we just keep piling up data forever?

Can we cleanse terabytes of data?

Do we still need “good” data?

Page 5: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Agenda

Challenges traditional data stewardship practice

Privacy – is all the world a stage?

Limits to data lifecycle?

Data quality: the big, the bad, the ugly – and it all might be good!

Page 6: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Privacy – the more things change…

“You have zero privacy anyway…. Get over it”

-- Scott McNealy, 1999

Facebook does not actually delete images… but instead merely removes the links – a fix “is in sight”

-- ZDNet, 2/6/12

Facebook agrees to 20 years of federal privacy audits

-- NY Times, 11/29/11

Page 7: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

What privacy?

Florida made $63m last year by selling DMV information (name, date of birth, type of vehicle driven) to companies like LexisNexis & Shadow Soft.

-- Terence Craig & Mary Ludloff, Privacy and Big Data (O’Reilly Media, 2011)

Page 8: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Big Data privacy 101 – Don’t be creepy

Governance problem first, technology second

Understand the relationship with your customers & business partners

Keep communications in context

Don’t catch your customers by surprise

The law is still trying to catch up

How Companies Learn Your Secrets

“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

-- NY Times 2/16/12

Page 9: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Agenda

Challenges traditional data stewardship practice

Privacy – is all the world a stage?

Limits to data lifecycle?

Data quality: the big, the bad, the ugly – and it all might be good!

Page 10: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Data lifecycle – How long can this go on?

Google, Yahoo, Facebook, etc. don’t deprecate web data

Hadoop designed for economical scale-out

Moore’s Law, declining cost of storage

Is Hadoop Archive the answer?

Is Hadoop the new tape?

Management & skills will be the limit

[Image: aerial view of Quincy, WA data centers]

Page 11: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Agenda

Challenges traditional data stewardship practice

Privacy – is all the world a stage?

Limits to data lifecycle?

Data quality: the big, the bad, the ugly – and it all might be good!

Page 12: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Data Quality & Hadoop – Big Quality Questions

Can we cleanse terabytes of data?

Do we still need “good” data?

Are there new approaches to cleansing Big Data?

Page 13: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Framing the issue

“Garbage in, garbage out,” but DW forced the issue

Traditional approaches

Profiling, cleansing, MDM

DW vs. Hadoop data quality challenges

Known data sets & known criteria vs. vaguely known

Bounded vs. less bounded tasks

Limitations of MapReduce*

Cleansing & transformation within a single Map operation

Profiling & matching of unstructured data

Matching of data in operations without inter-process communications

*Source: David Loshin, "Hadoop and Data Quality, Data Integration, Data Analysis" at http://www.dataroundtable.com/?p=8841

Page 14: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Is data quality necessary for Hadoop?

The App

How mission-critical?

Regulatory compliance impacts?

What degree of business impact?

The Data

The 4V’s (volume, variety, velocity, value) determine what approaches to quality are feasible

Page 15: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Examples

Web ad placement optimization

Counter-party risk management for capital markets

Customer sentiment analysis

Managing smart utility grids or urban infrastructure

Page 16: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Bad data may be good

Sensory data

Outlier or drift?

Time to recalibrate devices?

Time to perform preventive maintenance?

Are new/unaccounted environmental factors skewing readings?

Human-readable data

Flawed concept of reality?

Flawed assumptions on data meaning?

Changes producing ‘new norm’
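One way to make the "outlier or drift?" question concrete is the following minimal sketch. It is an illustration under stated assumptions, not a method from the talk. It treats the first `baseline_n` readings as the known-good baseline and measures spread with the median absolute deviation. A sustained shift in the most recent readings is labelled "drift"; isolated excursions are labelled "outlier". The function names and the default window sizes and threshold `k` are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: a spread measure that resists outliers."""
    center = median(values)
    return median(abs(v - center) for v in values)


def classify(readings, baseline_n=20, recent_n=10, k=3.0):
    """Label a series of sensor readings as 'ok', 'outlier', or 'drift'.

    Assumes the first baseline_n readings represent normal behaviour.
    """
    baseline = readings[:baseline_n]
    recent = readings[-recent_n:]
    center = median(baseline)
    limit = k * (mad(baseline) or 1e-9)  # avoid a zero-width band

    # Indices of individual readings that fall outside the baseline band.
    flagged = [i for i, v in enumerate(readings) if abs(v - center) > limit]

    # Drift: the typical recent reading has itself left the band.
    if abs(median(recent) - center) > limit:
        return {"label": "drift", "flagged": flagged}
    if flagged:
        return {"label": "outlier", "flagged": flagged}
    return {"label": "ok", "flagged": flagged}


if __name__ == "__main__":
    steady = [10.0, 10.1, 9.9, 10.0] * 5
    spike = steady + [10.0, 25.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.0, 10.1]
    print(classify(spike)["label"])
```

A "drift" label points toward the recalibration and maintenance questions on this slide, while an "outlier" label points toward a one-off event worth inspecting rather than discarding.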

Page 17: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Big Data quality in Hadoop – Emergent approaches

Crowdsourcing data

Collect data far & wide from as many diverse sources as possible. Torrents of data overcome the noise.

Comparative trend analysis of incoming streams to dynamically ID the norm or sweet spot of “good” data

Apply data science to “correct the dots”

Don’t go record by record.

Statistically analyze the data set in aggregate.

Iteratively analyze & re-analyze nature of data, keep analyzing outliers

Apply off-the-wall approaches

Enterprise Architectural approach

Semantic (domain) model-driven

Apply cleansing logic at run time

Critical for sensitive, regulatory-driven apps
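The aggregate, crowd-sourced idea (many sources, no record-by-record cleansing, a dynamically identified "norm") can be sketched as follows. This is a hypothetical illustration, not an approach prescribed in the talk. Each source is summarized by its median, the norm is the median of those medians, and sources whose median falls outside a band around the norm are flagged for further analysis. The function name `consensus_band`, the threshold `k`, and the sample data are assumptions.

```python
from statistics import median


def consensus_band(streams, k=3.0):
    """Estimate the 'norm' across sources and list sources outside it.

    streams maps a source name to a list of numeric values.
    Returns (norm, (low, high), suspects).
    """
    # Summarize each source in aggregate instead of inspecting records.
    centers = {name: median(vals) for name, vals in streams.items() if vals}
    norm = median(centers.values())
    # Spread of the source medians around the norm (median absolute deviation).
    spread = median(abs(c - norm) for c in centers.values())
    low, high = norm - k * spread, norm + k * spread
    suspects = sorted(n for n, c in centers.items() if not low <= c <= high)
    return norm, (low, high), suspects


if __name__ == "__main__":
    feeds = {
        "a": [20.1, 19.9, 20.0],
        "b": [20.3, 20.2, 20.4],
        "c": [19.8, 19.7, 19.9],
        "d": [35.0, 34.8, 35.2],
        "e": [20.0, 20.1, 20.2],
    }
    norm, band, suspects = consensus_band(feeds)
    print(norm, band, suspects)
```

Flagged sources are candidates for the iterative re-analysis described above, not automatic deletions: as the "bad data may be good" slide argues, a suspect stream may be the interesting signal. If every source agrees exactly, the band collapses to a single value, so any differing source is flagged.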

Page 18: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Summary

Challenges traditional data stewardship practice

Combination of old & new

Privacy – is all the world a stage?

Best practices, legal requirements still in flux

Don’t be creepy!

Limits to data lifecycle?

Few enterprises are Google or Facebook

Ability to manage large infrastructure will be major limit

Data quality

Strategy depends on type of app & data set(s)

A spectrum of approaches -- from none to classic ETL to aggregate statistical

No single silver bullet

Page 19: TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence

Disclaimer

All Rights Reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher, Ovum (an Informa business).

The facts of this report are believed to be correct at the time of publication but cannot be guaranteed. Please note that the findings, conclusions and recommendations that Ovum delivers will be based on information gathered in good faith from both primary and secondary sources, whose accuracy we are not always in a position to guarantee. As such Ovum can accept no liability whatever for actions taken based on any information that may subsequently prove to be incorrect.