big data & analytics: moving beyond hype to insight

Big Data: Beyond Hype. Srinath Perera, VP Research, WSO2; Apache Member (@srinath_perera)

Upload: srinath-perera

Post on 08-Jan-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Big Data & Analytics: Moving Beyond Hype to Insight

Big Data: Beyond Hype
Srinath Perera
VP Research, WSO2; Apache Member (@srinath_perera)

Page 2: Big Data & Analytics: Moving Beyond Hype to Insight

(Figure: Operational Manager Dashboard)

Page 3: Big Data & Analytics: Moving Beyond Hype to Insight

Big Data Technology      | Works?                        | Understand how to use it | Understand how to run it in production
Search                   | +1 (Lucene)                   | +1                       | +1
NoSQL                    | 0+, but DBs are striking back | 0                        | 0-
Distributed File Systems | +1 (HDFS)                     | +1                       | +1
Batch Processing         | +1 (Hadoop, Spark)            | +1                       | +1
Realtime Analytics       | +1 (CEP, Storm, Flink)        | 0-                       | 0-
Predictive Analytics     | 0 (MLlib, R, GraphLab)        | 0-                       | 0-
Visualizations           | +1 (D3)                       | 0-                       | +1

Page 4: Big Data & Analytics: Moving Beyond Hype to Insight

Success Stories
• Moneyball (baseball drafting)
• Nate Silver predicted outcomes in 49 of the 50 states in the 2008 U.S. Presidential election
• Cancer detection from biopsy cells (Big Data found 12 patterns while we only knew 9), http://go.ted.com/CseS
• Bristol-Myers Squibb reduced the time it takes to run clinical trial simulations by 98%
• Xerox used big data to reduce the attrition rate in its call centers by 20%
• Kroger loyalty programs (growth in 45 consecutive quarters)

Page 5: Big Data & Analytics: Moving Beyond Hype to Insight

Premise of Big Data
If you collect data about your business and feed it to a Big Data system, you will find useful insights that provide a competitive advantage
– E.g. analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on" [Wikipedia]
The underlying assumption is that the way we operate, and our organizations, are inefficient.

Page 6: Big Data & Analytics: Moving Beyond Hype to Insight

Big Data as a Way to Optimize

• Assumption: once you identify your sickness, you are halfway cured
• You must know what is worth optimizing
Premature optimization is the root of all evil

Page 7: Big Data & Analytics: Moving Beyond Hype to Insight

“Big Data Washing”
You can tick "yes", but it is unlikely to make a difference

Page 8: Big Data & Analytics: Moving Beyond Hype to Insight

How to Big Data Wash your System in 24 hours?
• Collect whatever data you can with minimal effort
• Do a lot of simple aggregations
• Figure out which data combinations make the prettiest pictures
• Throw in some machine learning algorithms, predict something, but don’t compare the results against anything
• Create a cool dashboard, do a cool demo, and say that you are just scratching the surface!!

Page 9: Big Data & Analytics: Moving Beyond Hype to Insight

Are Insights Automatic?
• I wish
• Only if we have the right data
• Only if we look in the right place
• Only if such insights are there
• Only if we find the insights

Page 10: Big Data & Analytics: Moving Beyond Hype to Insight

What can Big Data do?

• Enterprise Performance Management
• Daily Operational Controls and Reports
• Operational Management (Logistics, Decision Support)
• Social and Community Intelligence (e.g. sentiments, finding champions)
• Sales and Marketing (targeting, channel analytics, SEO analytics, funnel analytics)
• Customer Service (segmentation, recommendations, and churn prediction)
• Preventative maintenance
• Fraud detection

Page 11: Big Data & Analytics: Moving Beyond Hype to Insight

Big Data Tools
• KPIs
• Analytics (Batch, Real-time, Interactive, Predictive)
• Visualizations, Dashboards
• Alerts
• Sensors (and other data collection plumbing)

Page 12: Big Data & Analytics: Moving Beyond Hype to Insight

KPIs and their Role
• KPIs (Key Performance Indicators) are numbers that give you an idea of how well something is performing
– E.g. countries have them (GDP, per capita income, HDI, etc.)
• Examples
– Company revenue
– Lifetime value of a customer
– Revenue per square foot (in the retail industry)
• The idea is to define them and monitor them. But defining them is hard work!!
• Often one indicator tells half the story, and you need several that cover different angles
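As a rough illustration of why one indicator tells only half the story, the sketch below computes revenue per square foot next to plain revenue for two stores. The `Store` type, the store names, and all figures are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Store:
    name: str
    revenue: float          # revenue for the period, in a single currency
    floor_area_sqft: float  # selling floor area in square feet


def revenue_per_sqft(store: Store) -> float:
    """Revenue per square foot, the retail KPI listed above."""
    if store.floor_area_sqft <= 0:
        raise ValueError("floor area must be positive")
    return store.revenue / store.floor_area_sqft


# Hypothetical figures, for illustration only.
stores = [
    Store("downtown", revenue=500_000.0, floor_area_sqft=2_000.0),
    Store("suburb", revenue=600_000.0, floor_area_sqft=4_000.0),
]

for s in stores:
    print(f"{s.name}: revenue={s.revenue:.0f}, per sqft={revenue_per_sqft(s):.1f}")
```

With these made-up numbers the suburb store leads on revenue while the downtown store leads on revenue per square foot, so the two KPIs rank the stores differently.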

Page 13: Big Data & Analytics: Moving Beyond Hype to Insight

What is a Dashboard?
• Think of a car dashboard
• It gives you an idea of the overall system at a glance
• It is boring when all is good, and grabs attention when something is wrong
• It often supports drill-down to find the root cause

Page 14: Big Data & Analytics: Moving Beyond Hype to Insight

Alerts
• Notifications (sent via email, SMS, pager, etc.)
• The goal is to give you peace of mind (not having to check all the time)
• They should be specific
• They should be infrequent
• They should have very low false positives
• Let users control sensitivity
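One possible way to read the alert properties above as code is a threshold check with a cooldown. This is a minimal sketch, not a description of any particular product: the `ThresholdAlert` class, its field names, and the sample readings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThresholdAlert:
    name: str          # makes the message specific
    threshold: float   # user-controlled sensitivity
    cooldown_s: float  # minimum gap between notifications (keeps them infrequent)
    _last_fired: Optional[float] = field(default=None, repr=False)

    def check(self, value: float, now: float) -> Optional[str]:
        """Return a message if the alert should fire at time `now`, else None."""
        if value <= self.threshold:
            return None
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            return None  # still inside the cooldown: suppress the repeat
        self._last_fired = now
        return f"[{self.name}] value {value} exceeded threshold {self.threshold}"


# Hypothetical readings as (time in seconds, error rate).
alert = ThresholdAlert("error-rate", threshold=0.05, cooldown_s=600)
readings = [(0, 0.01), (60, 0.09), (120, 0.10), (900, 0.12)]
messages = [alert.check(value, now) for now, value in readings]
```

The reading at 120 seconds is above the threshold but is suppressed because it falls inside the cooldown; raising `threshold` or `cooldown_s` is how a user would turn the sensitivity down.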

Page 15: Big Data & Analytics: Moving Beyond Hype to Insight

You need a Human in the Loop

Systems that digest your data, make decisions, and run by themselves can so far be used only in a limited set of applications (e.g. algorithmic trading, showing advertisements, or war)

Page 16: Big Data & Analytics: Moving Beyond Hype to Insight

Decisions, Actions, and Drill down

• Operators need to see the data in context, and drill down into detail to understand the root cause

• The typical model is to start from an alert or dashboard, see the data in context (other transactions around the same time, what the same user did before and after, etc.), and then let the user drill down

• For example, http://wso2.com/videos/wso2-fraud-detection-solution

Page 17: Big Data & Analytics: Moving Beyond Hype to Insight

Role of Realtime Analytics
• Used to detect something very fast, within a few milliseconds to a few seconds
• Very powerful for detecting conditions over time (e.g. ball possession in a football game)
• Alerts are done through realtime analytics
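To make "detecting conditions over time" concrete, here is a small sliding-window sketch in plain Python. It is not the API of a CEP engine, Storm, or Flink; the `SlidingWindowCounter` class and the sample timestamps are hypothetical, and it assumes events arrive in non-decreasing time order.

```python
from collections import deque


class SlidingWindowCounter:
    """Flags when at least `min_events` events fall within the last `window_s` seconds."""

    def __init__(self, window_s: float, min_events: int) -> None:
        self.window_s = window_s
        self.min_events = min_events
        self._times: deque = deque()

    def observe(self, timestamp: float) -> bool:
        """Record one event and report whether the condition holds now."""
        self._times.append(timestamp)
        # Drop events that have fallen out of the window.
        while self._times and timestamp - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.min_events


# Hypothetical event timestamps in seconds.
detector = SlidingWindowCounter(window_s=10.0, min_events=3)
flags = [detector.observe(t) for t in [0.0, 4.0, 20.0, 22.0, 25.0]]
```

Only the last event completes a run of three events within ten seconds, so only the last flag is true. A streaming engine would express the same idea declaratively and handle out-of-order events, which this sketch does not.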

Page 18: Big Data & Analytics: Moving Beyond Hype to Insight

Role of Predictive Analytics

• Predictive analytics learns a problem from examples
– E.g. learning to drive
• The two main cases are
– Predicting the next value or values (e.g. electricity load prediction)
– Predicting a category (e.g. spam or not, for an email)
• Used for grouping, to generate alerts, or to augment visualizations
• Needs a lot of expertise to create correct models and use them
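For the "predicting the next value" case, a minimal sketch is a least-squares trend line compared against a naive baseline. The technique here (ordinary least squares on the time index) is chosen only for brevity and is not claimed to be what the talk used; the load figures are made up.

```python
from statistics import mean


def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*x for x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict_next(ys):
    """Extrapolate the fitted line one step past the end of the series."""
    a, b = fit_line(ys)
    return a + b * len(ys)


# Hypothetical electricity load readings; hold out the last one for comparison.
load = [100.0, 110.0, 120.0, 130.0]
train, actual = load[:-1], load[-1]

model_error = abs(predict_next(train) - actual)
baseline_error = abs(train[-1] - actual)  # naive baseline: repeat the last value
```

Comparing against a baseline on held-out data is the step that separates a model from a demo; on real data the trend line will not always win.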

Page 19: Big Data & Analytics: Moving Beyond Hype to Insight

Big Data Pipeline

Page 20: Big Data & Analytics: Moving Beyond Hype to Insight

Doing it Once is Cheap, Setting up a system to do it continuously is Expensive

Do your scenarios ad-hoc first (hire some expertise if you must), before setting up a system that does it every day

Page 21: Big Data & Analytics: Moving Beyond Hype to Insight

Templates for Big Data Projects
• Use an existing dataset: you already have a data set and a list of potential problems; figure out how to fix them.
• Fix a known problem: find a problem, collect data about it, analyze, visualize, build a model, and improve. Then build a dashboard to monitor it.
• Improve the overall process: instrument processes (start with the most crucial), find KPIs, analyze and visualize the processes, and improve.
• Find correlations: collect all available data, mine or visualize it, and find interesting correlations.

Page 22: Big Data & Analytics: Moving Beyond Hype to Insight

Actionable Insights are the Key!!

• Insights are about significant events that warrant attention (e.g. more than two technical issues would lead a customer to churn)
• Decision makers can identify the context associated with the insight (e.g. operators can look through the history of customers who qualify)
• Decision makers can do something about the insight (e.g. work with customers to reassure them and fix the issues)

Page 23: Big Data & Analytics: Moving Beyond Hype to Insight

Think Deeply about Who Will Use Your System and How

Page 24: Big Data & Analytics: Moving Beyond Hype to Insight

Challenges: Keeping the System Running

● Incorporate continuous data
o Integrate data continuously
o We get feedback about the effectiveness of decisions (e.g. accuracy of fraud detection)
● Track and update models
o Trends change
o Generate models in batch mode and update

Page 25: Big Data & Analytics: Moving Beyond Hype to Insight

Challenges: Causality
• Correlation does not imply causality!! (the "send a book home" example [1])
• Causality
– Do a repeated experiment with an identical test
– If you can't, do a randomized test (A/B test)
– With Big Data we cannot do either
• Option 1: We can act on a correlation if we can verify the guess or if correctness is not critical (start an investigation, check for a disease, marketing)
• Option 2: We verify correlations using A/B testing or propensity analysis
[1] http://www.freakonomics.com/2008/12/10/the-blagojevich-upside/
[2] https://hbr.org/2014/03/when-to-act-on-a-correlation-and-when-not-to/
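As one way to verify a correlation with an A/B test, the sketch below runs a pooled two-proportion z-test using only the standard library. The function name and the conversion counts are hypothetical, and the normal approximation it relies on assumes reasonably large samples.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical experiment: 1000 users per arm, 100 vs 130 conversions.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Because users are assigned to the arms at random, a small p-value here supports a causal reading in a way that a correlation mined from logs does not.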

Page 26: Big Data & Analytics: Moving Beyond Hype to Insight

Curious Case of Missing Data

Source: http://www.fastcodesign.com/1671172/how-a-story-from-world-war-ii-shapes-facebook-today
Picture from: http://www.phibetaiota.net/2011/09/defdog-the-importance-of-selection-bias-in-statistics/
• WW II: data on where returned aircraft were hit
• How would you add armour?

Page 27: Big Data & Analytics: Moving Beyond Hype to Insight

Challenges: Taking Decisions (Context)

Page 28: Big Data & Analytics: Moving Beyond Hype to Insight

Summary
• Big Data provides a way to optimize
• Tools
– KPIs
– Analytics (Batch, Real-time, Interactive, Predictive)
– Visualizations, Dashboards
– Alerts
– Sensors (and other data collection plumbing)
• Start small
• Try things out with data sets before setting up a system
• Find a high-impact problem and make it work end to end
• Pay attention to user experience

Page 29: Big Data & Analytics: Moving Beyond Hype to Insight

Thank You