directions for hadoop innovation, yahoo
BD Hadoop SF 2013
Directions for Hadoop Innovation
Apr 2013
Eric Bax
Hadoop in Online Advertising at Yahoo!
Response Prediction – Clicks and Conversions
Allocation and Pricing – Guaranteed
Analytics – Marketplace Monitoring
Science – Value of Advertising
Marketplace Operations
Model Construction
Auction
Reconciliation
Analytics and Billing
Ad Calls
Auction Log
Ad Served
Clicks and Conversions
Response Frequencies
Predict Model
Ad + Response
ROI Evaluation
Online/Offline Sales $
Desiderata
Faster Answers
Fewer Computations per Datum
From Analytics to Active Monitoring
From Batch Cycles to Sense and Respond
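One way to read "fewer computations per datum" and "active monitoring" together is a single-pass monitor that does constant work per observation. The sketch below is an illustration, not something from the talk: it keeps a running mean and variance (Welford's method) and flags values far from the mean. The class name, the z-score threshold, and the warm-up length are all assumptions.

```python
import math


class StreamingMonitor:
    """Running mean/variance (Welford's method) with a simple z-score alert.

    Hypothetical helper: threshold and warmup are illustrative defaults.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.n = 0            # number of observations absorbed so far
        self.mean = 0.0       # running mean
        self.m2 = 0.0         # running sum of squared deviations
        self.threshold = threshold
        self.warmup = warmup

    def observe(self, x):
        """Return True if x looks anomalous versus data seen so far, then absorb x."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                anomalous = True
        # Welford update: O(1) work and O(1) memory per datum.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


monitor = StreamingMonitor()
for clicks_per_minute in [100, 102] * 10 + [500]:
    if monitor.observe(clicks_per_minute):
        print("anomaly:", clicks_per_minute)
```

The monitor reacts as each value arrives instead of waiting for a batch cycle, at the cost of a much cruder model than a full offline analysis.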
Faster Turnaround
9/18/2013
Act on the 80% of Data That Arrives Quickly
Then Correct as Late-Landing Data Arrive
Pull for Initial Result; Push for Updates?
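The slide poses pull-then-push as a question, so the following is only one possible reading. Consumers pull a provisional result built from the data that arrived quickly, and corrections are pushed to subscribers as late-landing data arrive. All names here (ProvisionalCounts, ingest, the late flag) are hypothetical.

```python
class ProvisionalCounts:
    """Provisional totals that are corrected as late data land (illustrative only)."""

    def __init__(self):
        self.counts = {}        # key -> running total
        self.subscribers = []   # callbacks notified when a late correction arrives

    def subscribe(self, callback):
        """Register callback(key, corrected_total) for pushed updates."""
        self.subscribers.append(callback)

    def pull(self, key):
        """Return the current (possibly provisional) total for key."""
        return self.counts.get(key, 0)

    def ingest(self, key, n, late=False):
        """Add n events; push the corrected total if the data landed late."""
        self.counts[key] = self.counts.get(key, 0) + n
        if late:
            for callback in self.subscribers:
                callback(key, self.counts[key])


totals = ProvisionalCounts()
totals.subscribe(lambda key, value: print("corrected", key, value))
totals.ingest("2013-04-01", 80)            # the data that arrives quickly
print(totals.pull("2013-04-01"))           # act on the provisional result
totals.ingest("2013-04-01", 20, late=True) # late-landing data trigger a push
```

Whether late data should push a full corrected value, as here, or only a delta is a design choice the slide leaves open.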
Online Updates to Models
Each day produces Big Data.
Whole history: HUMONGOUS DATA.
Update models based on new data only.
And perhaps exceptions / borderline cases from history.
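As a minimal sketch of updating a model from new data only, the code below applies stochastic gradient steps to a logistic click model, one example at a time, without revisiting history. The talk does not say which model or update rule is used; the model form, the learning rate, and the class name are assumptions for illustration.

```python
import math


class OnlineLogistic:
    """Logistic regression updated one example at a time (illustrative sketch)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features   # feature weights
        self.b = 0.0                  # bias term
        self.lr = lr                  # learning rate (assumed value)

    def predict(self, x):
        """Predicted response probability for feature vector x."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One gradient step on log loss for a single new (x, y) example."""
        err = self.predict(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.b -= self.lr * err


model = OnlineLogistic(n_features=1)
# Today's data only: feature 1.0 led to a click, feature 0.0 did not.
for _ in range(200):
    model.update([1.0], 1)
    model.update([0.0], 0)
print(model.predict([1.0]), model.predict([0.0]))
```

Replaying selected exceptions or borderline cases from history, as the slide suggests, would amount to extra `update` calls on those stored examples.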
“Embedded” Computation
Move Computation Closer to Where Data are Generated
Monitor for Anomalies Where they Occur
(Sometimes) Compress into Sketches before Transmitting Data
Hadoop as Part of Serving vs Isolated Clusters?
Propagate Data Among Logical Neighbors Quickly
Multi-Resolution Approach at Different Time Scales
Challenge: Clustering into Logical Neighborhoods to Fit Problem
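To make "compress into sketches before transmitting" concrete, here is a small Count-Min sketch, one common sketch type; the talk does not name a specific one. Each ingest point keeps a fixed-size table of counters, and tables from different points merge by addition, so only the table needs to be sent. The width, depth, and hashing scheme are assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size approximate counter; estimates never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by prefixing the row number (illustrative choice).
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        """Minimum over rows: an upper bound on the true count."""
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

    def merge(self, other):
        """Combine a sketch built elsewhere; dimensions must match."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Two ingest points sketch locally, then ship only their tables.
east, west = CountMinSketch(), CountMinSketch()
east.add("ad42", 3)
west.add("ad42", 2)
west.add("ad7", 1)
east.merge(west)
print(east.estimate("ad42"))  # at least 5; hash collisions can only inflate it
```

The table size is independent of the number of distinct items, which is what makes it cheap to transmit; the price is one-sided error that grows as the table fills.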
Localized / Contextual Computation
Search Clusters
Who Clicks?
Who Doesn’t?
Hadoop in Five Years
Will Hadoop grow by adding features / options?
Will it branch: faster, lighter, approximate, embedded versions?
Truly huge version? With approximation / sampling / multi-resolution?
The Right Fit
Multi-resolution sense and respond.
Details to neighbors, sketches and aggregates globally.
Migrate processes and storage to ingest points or logical neighbors.
Tune system-wide performance through human-machine dialog.
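A rough illustration of the multi-resolution idea: the same event stream is counted at several time scales at once, so fine-grained detail can stay near where the data are generated while coarse aggregates travel further. The bucket sizes and the class name are assumptions, not details from the talk.

```python
from collections import defaultdict


class MultiResolutionCounter:
    """Counts events in time buckets at several resolutions (in seconds)."""

    def __init__(self, resolutions=(60, 3600, 86400)):
        self.resolutions = resolutions
        self.buckets = {r: defaultdict(int) for r in resolutions}

    def add(self, timestamp, count=1):
        """Record count events at an integer timestamp, at every resolution."""
        for r in self.resolutions:
            self.buckets[r][timestamp - timestamp % r] += count

    def query(self, resolution, timestamp):
        """Count for the bucket of the given resolution containing timestamp."""
        start = timestamp - timestamp % resolution
        return self.buckets[resolution].get(start, 0)


counter = MultiResolutionCounter()
for t in (0, 59, 61, 3700):
    counter.add(t)
print(counter.query(60, 30))     # 2 events in the first minute
print(counter.query(3600, 100))  # 3 events in the first hour
print(counter.query(86400, 0))   # 4 events in the first day
```

In a deployment along the lines the slide describes, the minute-level buckets might be shared only with logical neighbors, with the hourly and daily buckets propagated globally.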