big bang to new economy - gateway analytics network 2015
TRANSCRIPT
FROM THE BIG BANG TO THE NEW ECONOMY, A JOURNEY IN MAKING SENSE OF BIG DATA
Patrick Deglon, Director of Engineering, Analytics Area Tech [email protected]/pdeglon
from the Big Bang…
Image: CERN
13.8 billion years
5 billion years
1 billion years
300,000 years
2 min
0.0000000001 sec
10^-34 sec = 0.0…001 sec (34 zeros)
10^-43 sec = 0.0…001 sec (43 zeros)
From 1996 to 2002, I worked at CERN (the European Laboratory for Particle Physics) for my MS and PhD at the University of Geneva
Geneva, Switzerland
Image: CERN
17-mile underground tunnel for the LEP & LHC accelerators
Source: CERN
Mont Blanc
Image: CERN. Source: CERN
Tape robot (Source: CERN)
PAW – Physics Analysis Workstation (Source: Wikipedia)
Data collection & analysis were done in Fortran. Advanced analysis/statistics was done through PAW. [1996-2002]
Example of a particle collision
Solving the puzzle… which particles go together?
[Diagram: four particles A, B, C and D; which ones go together (?)]
1. AB + CD?
2. AC + BD?
3. AD + BC?
Solution: Big Data infrastructure enables large-scale computations, such as combining all possibilities (cross-product)
Schematic View: Statistical Noise and Signal (particle resonance)
CERN Example (discovery of a new particle bb)
Source: http://www.atlas.ch/news/2011/ATLAS-discovers-its-first-new-particle.html
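The cross-product idea can be sketched in a few lines. This is only an illustration, not the analysis code used at CERN: the four-vectors (E, px, py, pz) and the event below are hypothetical, and units are arbitrary. Every pair is combined; over many events, true pairs pile up at the resonance mass while wrong pairs spread out as statistical noise.

```python
import itertools
import math

def invariant_mass(p1, p2):
    """Invariant mass of two particles given as (E, px, py, pz) four-vectors."""
    e = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors

def all_pair_masses(particles):
    """Cross-product approach: compute the mass of every possible pair."""
    return {
        (a, b): invariant_mass(particles[a], particles[b])
        for a, b in itertools.combinations(sorted(particles), 2)
    }

# Hypothetical event: A and B are the back-to-back decay products of a
# particle of mass 10 at rest; C and D are unrelated.
event = {
    "A": (5.0, 5.0, 0.0, 0.0),
    "B": (5.0, -5.0, 0.0, 0.0),
    "C": (3.0, 0.0, 3.0, 0.0),
    "D": (2.0, 0.0, 0.0, 2.0),
}
masses = all_pair_masses(event)
for pair, mass in sorted(masses.items()):
    print(pair, round(mass, 2))
```

Filling a histogram with all pair masses from many such events is what separates the smooth background from the peak.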
Size of the electron?
R < 5.1 x 10^-19 m ***
*** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3 au LEP, Th. phys. Genève, 2002; Sc. 3332
Extra dimension?
MS > 1.1 TeV ***
[Diagram labels: e-, e+, e+, e-; graviton; our universe in 4 dimensions; extra dimension]
*** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3 au LEP, Th. phys. Genève, 2002; Sc. 3332
… to the New Economy
Imagine a world...
… where information is ubiquitous (anytime & anywhere)
… where buildings can recognize your presence
… where even streetlights are connected to the Internet
Welcome to a connected world
Examples
#1 KPI reporting & Impact Measurement
#2 Marketing
#3 The cost of Big Data
#4 Human Resources
Example #1
KPI reporting & Impact Measurement
So, how is the business doing?
Key Performance Indicators
Simplified Business Flow
Motorola Factory: # Shipments
Distribution Channels: # Sales
First Usage: # Activations
Google BigQuery
Motorola Cloud
Insights
...
Google Spreadsheet as a Reporting Engine
Spreadsheet
Google BigQuery
HTML body in sheet
Google Mail
Google Apps Script
Google Scheduler
Google Charts
Enabling Self-Service Analytics
[Architecture diagram; components: Google Analytics data; Instrumentation (App Engine); ETL (App Engine); BigQuery datasets (SQL); Google App Engine; Users, Reports (Datastore); Machine Learned Models; Tableau reports; Google Drive; gChart + D3 + Tableau API; Internal Users]
42… so what?
Answer to the Ultimate Question of Life, The Universe, and Everything
Relativity
• Versus time (WoW, YoY, …)
• Versus plan (target, budget, forecast, …)
• Versus other products, customers, markets
• Versus competition
• Versus internal/external/social events
• Versus trend in other metrics
• …
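A number only means something next to a reference. As a small sketch with made-up weekly activation counts (all values below are hypothetical), the same metric can be expressed versus last week, versus last year and versus plan:

```python
def pct_change(current, reference):
    """Relative change of `current` versus `reference`, as a fraction."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return (current - reference) / reference

# Hypothetical weekly activations and plan
this_week, last_week, same_week_last_year, plan = 1150, 1100, 1000, 1250

comparisons = {
    "WoW": pct_change(this_week, last_week),
    "YoY": pct_change(this_week, same_week_last_year),
    "vs plan": pct_change(this_week, plan),
}
for label, value in comparisons.items():
    print(f"{label}: {value:+.1%}")
```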
Exception Reports (Illustrative Examples)
Data Issue
[Chart: Number of Active Users using their camera in US; x-axis: time (day); y-axis: # Active Users; with Normal Band]
Root Causes
● Some files don’t get loaded properly in BigQuery, creating gaps in user count.
● The instrumentation changed on the device
● Customer behavior
Business Issue
[Chart: Number of System Restarts; y-axis: # System Restarts]
Root Cause
A buggy Android app doesn’t handle the timezone change properly, crashing the devices.
Approach
1. Define a multi-dimensional cube with real data. For example: Product, Market, # Users taking a picture
2. Each cell then becomes a time series
3. Clean the data (remove seasonality, weekday cycle and any other known perturbation)
4. Fit the trend and establish a volatility band (2 standard deviations)
5. Measure the variance versus prediction for each cell (e.g. market/product/metric) and trigger an exception if outside the band
6. Collect all exceptions into a matrix and apply fuzzy logic* to propose potential root causes
* Note: Bayesian likelihood with knowledge base
[Cube diagram; axes: markets (e.g. BR), products, metrics]
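Steps 2 to 5 can be sketched for a single cell of the cube as follows. This is a simplified illustration under stated assumptions, not the production logic: the history is hypothetical, it starts on weekday index 0 and covers whole weeks, the weekday cycle is removed with simple multiplicative factors, and the trend is a straight line.

```python
import statistics

def weekday_factors(values):
    """Average of each weekday position divided by the overall average."""
    overall = statistics.fmean(values)
    return [statistics.fmean(values[d::7]) / overall for d in range(7)]

def fit_line(ys):
    """Ordinary least-squares line y = a + b*t for t = 0..n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(ys)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    b = sxy / sxx
    return y_mean - b * t_mean, b

def expected_band(history, n_sigma=2.0):
    """(low, high) band for the next day, in the original units."""
    factors = weekday_factors(history)
    clean = [y / factors[t % 7] for t, y in enumerate(history)]  # remove weekday cycle
    a, b = fit_line(clean)                                       # fit trend
    sigma = statistics.stdev([y - (a + b * t) for t, y in enumerate(clean)])
    t_next = len(history)
    scale = factors[t_next % 7]                                  # re-apply weekday effect
    center = (a + b * t_next) * scale
    return center - n_sigma * sigma * scale, center + n_sigma * sigma * scale

def is_exception(history, new_value, n_sigma=2.0):
    """True if the new observation falls outside the volatility band."""
    low, high = expected_band(history, n_sigma)
    return not (low <= new_value <= high)

# Hypothetical 4 weeks of daily active users: slow trend, weekend bump, small noise
history = [1000 + t + (80 if t % 7 >= 5 else 0) + ((t * 37) % 11 - 5) * 2
           for t in range(28)]
low, high = expected_band(history)
print(f"normal band for the next day: {low:.0f} to {high:.0f}")
```

Running the same check over every market/product/metric cell produces the exception matrix of step 6.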
Measuring impact of initiatives
Pre/Post analysis illustrative example (Simulation)
[Chart: Number of listings, Aug 1st to Oct 1st, 2012 versus 2011; pre and post periods around the initiative launch; points A, B, C, D; impact of the initiative]
• Used to measure the impact of an initiative in a full market or a market segment
A/B test illustrative example (Simulation)
[Chart: Number of purchases, Aug 1st to Oct 1st; test group versus control group; initiative launched; impact of the initiative]
• Randomized Test/Control group methodology is a gold standard in research
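Both measurements reduce to simple arithmetic. The sketch below is one plausible reading of the two charts, with hypothetical numbers: the pre/post version assumes that, without the initiative, this year would have followed last year's pre-to-post ratio, and the A/B version compares per-user rates between test and control.

```python
def pre_post_impact(pre_prev, post_prev, pre_curr, post_curr):
    """Year-over-year adjusted pre/post impact.

    Assumes that, without the initiative, this year's post period would have
    changed from its pre period by the same ratio as last year.
    """
    expected_post = pre_curr * (post_prev / pre_prev)
    return post_curr - expected_post

def ab_test_lift(test_total, control_total, test_size, control_size):
    """Relative lift of the test group over the control group, per user."""
    test_rate = test_total / test_size
    control_rate = control_total / control_size
    return (test_rate - control_rate) / control_rate

# Hypothetical numbers
impact = pre_post_impact(pre_prev=20_000, post_prev=22_000,
                         pre_curr=25_000, post_curr=30_000)
lift = ab_test_lift(test_total=330, control_total=300,
                    test_size=10_000, control_size=10_000)
print(f"pre/post impact: {impact:.0f} listings, A/B lift: {lift:.1%}")
```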
Campaign Measurement
Flow: Campaigns X KPI → Trend → Time Series Analysis → Summary
Campaigns
• Campaign Id
• Campaign Name
• Time range
• Set of Countries
• Set of Products
KPI
• Date
• Country
• Product
• KPI[]
Trend
• Campaign Id
• Date
• Total of KPI[]
Summary
• Campaign Id
• Campaign Name
• Impact Measurement[]
• Statistical Error[]
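The Campaigns X KPI step can be sketched as a cross join filtered by each campaign's scope. The campaign, rows and field names below are hypothetical. Dates outside the campaign time range are kept on purpose in this sketch, so that a later time series analysis has a baseline before and after the campaign window.

```python
from collections import defaultdict
from datetime import date

campaigns = [  # hypothetical campaign definition
    {"id": "c1", "name": "Back to school", "start": date(2014, 8, 15),
     "end": date(2014, 8, 31), "countries": {"US", "BR"}, "products": {"Moto X"}},
]
kpi_rows = [  # hypothetical daily KPI rows: (date, country, product, activations)
    (date(2014, 8, 14), "US", "Moto X", 100),
    (date(2014, 8, 15), "US", "Moto X", 120),
    (date(2014, 8, 15), "BR", "Moto X", 80),
    (date(2014, 8, 15), "US", "Moto G", 500),  # product not in campaign scope
    (date(2014, 8, 16), "DE", "Moto X", 60),   # country not in campaign scope
]

def campaign_trend(campaigns, kpi_rows):
    """Total KPI per (campaign id, date) over the campaign's countries and products."""
    trend = defaultdict(int)
    for c in campaigns:
        for day, country, product, value in kpi_rows:
            if country in c["countries"] and product in c["products"]:
                trend[(c["id"], day)] += value
    return dict(trend)

trend = campaign_trend(campaigns, kpi_rows)
for (campaign_id, day), total in sorted(trend.items()):
    print(campaign_id, day, total)
```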
Example of one campaign cell measurement
Campaign Window
Campaign Measurement
Define Campaign
Run Campaign
Measure Impacts
Drive Insights
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics
Holy Grail of Analytics
Example #2
Marketing
How many sales did my campaign generate?
Case study: Online Search
Natural/Organic Search (free)
Paid Search
Customer behaviors and Internet Marketing Investment
Which customer purchases are influenced by Marketing?
[Timeline diagram, Jan 1st to Mar 1st, showing purchases ($) and clicks. Labels: behavioral purchases (uncorrelated to Marketing); influenced purchase (correlated to Marketing); window of X days: 2 purchases missing; window of Y days: all purchases are incremental, 1 purchase is uncorrelated]
Remember this physics problem?
[Diagram: four particles A, B, C and D; which ones go together (?)]
1. AB + CD?
2. AC + BD?
3. AD + BC?
Solution: Big Data infrastructure enables large-scale computations, such as combining all possibilities (cross-product)
Schematic View: Statistical Noise and Signal (particle resonance)
Combining correlated events and uncorrelated events produces a system with statistical noise (which is simple enough to extract) and the signal being searched for
CERN Example (discovery of a new particle bb)
Source: http://www.atlas.ch/news/2011/ATLAS-discovers-its-first-new-particle.html
[Chart: Number of events (pairs click-purchase) versus Latency (days), from -14 to +14]
User clicks on an ad banner at time = 0
User makes a purchase X days later
Latency time for each pair click-purchase
Negative Latency: Purchase before Click (no causality). Behavior only
Positive Latency: Purchase after Click (potential causality). Behavior & Internet Marketing impact
Marketing incrementality (correlated purchases) is the excess above the level of behavioral purchases
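The same cross-product trick as in the physics problem can be sketched for clicks and purchases. This is an illustration only, with hypothetical data: every click of a user is paired with every purchase of that user, the negative-latency side gives the level of behavioral purchases, and the excess on the positive side is read as marketing incrementality. Same-day pairs (latency 0) are left out, which is an assumption of this sketch.

```python
from collections import Counter

def latency_histogram(clicks, purchases, window=14):
    """Count click-purchase pairs per latency (purchase day - click day).

    clicks and purchases map a user id to a list of day numbers.
    """
    hist = Counter()
    for user, click_days in clicks.items():
        for c in click_days:
            for p in purchases.get(user, []):
                latency = p - c
                if latency != 0 and abs(latency) <= window:
                    hist[latency] += 1
    return hist

def incremental_pairs(hist, window=14):
    """Excess of positive-latency pairs over the negative-latency baseline."""
    baseline = sum(hist[-d] for d in range(1, window + 1)) / window
    return sum(hist[d] - baseline for d in range(1, window + 1))

# Hypothetical data: day numbers of ad clicks and purchases per user
clicks = {"u1": [10], "u2": [10], "u3": [10]}
purchases = {"u1": [3, 11, 17], "u2": [12], "u3": [5, 11]}

hist = latency_histogram(clicks, purchases)
print(sorted(hist.items()))
print("incremental pairs:", incremental_pairs(hist))
```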
… So what?
Method 1
            Sales   ROI
Channel A   8%      +20%
Channel B   5%      -10%
Channel C   1%      +10%
• Reduce spend on channel B
• Invest in channel A
• When prioritizing, ignore channel C
<>
Method 2
            Sales   ROI
Channel A   7%      -20%
Channel B   6%      +30%
Channel C   12%     +60%
• Reduce spend on channel A
• Invest heavily on channel C
• Marketing actually accounts for 25% of the site
Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment, Thomas Blake, Chris Nosko, Steven Tadelis
Case study: Online Search
So, what’s next? Marketing 101
[2x2 matrix: Don’t Do Marketing versus Do Marketing, by No Purchase versus Purchase; legend: Cost, Direct Return, Incr Return]
Rule #1: Never, ever, spend money unless you really-really have to
So, what’s next?
[Chart: Output (Cost, Return (Revenues), Profit) versus Investment (costs)]
Max Profit: ΔReturn = ΔInvestment, i.e. marginal ROI = 0
Max Sales, No Profit: Total ROI = 0
Rule #2: If you have to spend, you spend to the point of marginal return = 0
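Rule #2 can be checked numerically. The return curve below is purely hypothetical (a square root, chosen only because it shows diminishing returns); the sketch finds the spend that maximizes profit and the spend where profit falls back to zero.

```python
import math

def revenue(spend):
    """Hypothetical concave return curve (diminishing returns)."""
    return 200.0 * math.sqrt(spend)

def best_spend(max_spend, step=1):
    """Grid-search the spend level that maximizes profit = revenue - spend."""
    levels = range(0, max_spend + 1, step)
    return max(levels, key=lambda s: revenue(s) - s)

optimal = best_spend(50_000)
break_even = 200.0 ** 2  # spend where revenue == spend, i.e. total ROI = 0

# At the optimum, one more dollar invested returns about one dollar
marginal_return = revenue(optimal + 1) - revenue(optimal)
print(optimal, break_even, round(marginal_return, 3))
```

Spending past the optimum still grows sales, but each extra dollar returns less than a dollar, until total profit reaches zero at the break-even point.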
Marginal Return Chart
[Chart: ROI versus Cumulative Cost, with spend buckets ordered from Spend Bucket 0 (most profitable) through Spend Bucket i to Spend Bucket N (least profitable); markers: Current Spend Level; Point of marginal return = 0 (maximum profit)]
• Area/initiatives/segments with negative profitability: cost reduction opportunity!
• In-depth analysis required to validate high ROI
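A marginal return chart can be built by ranking spend buckets by ROI and accumulating cost. The buckets below are hypothetical, and ROI is defined here as (return - cost) / cost, which is an assumption of this sketch.

```python
def rank_buckets(buckets):
    """Sort buckets from most to least profitable and accumulate cost and profit."""
    def roi(b):
        return (b["ret"] - b["cost"]) / b["cost"]

    rows, cum_cost, cum_profit = [], 0.0, 0.0
    for b in sorted(buckets, key=roi, reverse=True):
        cum_cost += b["cost"]
        cum_profit += b["ret"] - b["cost"]
        rows.append({"name": b["name"], "roi": roi(b),
                     "cum_cost": cum_cost, "cum_profit": cum_profit})
    return rows

def cut_candidates(rows):
    """Buckets with negative ROI: spending on them reduces total profit."""
    return [r["name"] for r in rows if r["roi"] < 0]

# Hypothetical spend buckets (cost and return in the same currency)
buckets = [
    {"name": "C", "cost": 150.0, "ret": 120.0},
    {"name": "A", "cost": 100.0, "ret": 300.0},
    {"name": "B", "cost": 200.0, "ret": 260.0},
]
rows = rank_buckets(buckets)
for r in rows:
    print(r["name"], f"{r['roi']:+.0%}", r["cum_cost"], r["cum_profit"])
print("cost reduction opportunity:", cut_candidates(rows))
```

Cumulative profit peaks at the last bucket with positive ROI, which corresponds to the point of marginal return = 0 on the chart.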
Example #3
The cost of Big Data
What is my share of the pile?
Google Cloud Platform Cost
~ 0 > 0
How to determine who is costing how much?
How to track BigQuery usage?
Google does not provide a data feed on its customers’ usage of BigQuery. However, three APIs can help us:
bigquery.projects.list
List all (visible) projects
bigquery.jobs.list
List all the Jobs in a specified project.
Note: use projection = full to get email of user
bigquery.jobs.get
Retrieve the specified job by ID.
The queries are parsed to extract the underlying tables used, and the data is stored in the App Engine datastore as well as in BigQuery through the streaming API (close to real-time).
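Once the job records are collected, the aggregation itself is simple. The sketch below works on plain dictionaries and makes no API call; the sample records are hypothetical, and the field names (`user_email`, `statistics.totalBytesProcessed`, `configuration.query.query`) follow my reading of the BigQuery job resource and should be checked against the API reference. Table extraction uses a deliberately naive regular expression for legacy-SQL style `[dataset.table]` references, not a real SQL parser.

```python
import re
from collections import defaultdict

TABLE_PATTERN = re.compile(r"\[([\w:.\-]+)\]")  # e.g. [project:dataset.table]

def summarize_jobs(jobs):
    """Aggregate bytes processed per user and collect the tables each user queried."""
    bytes_by_user = defaultdict(int)
    tables_by_user = defaultdict(set)
    for job in jobs:
        user = job.get("user_email", "unknown")
        stats = job.get("statistics", {})
        # int64 values arrive as strings in the JSON payload
        bytes_by_user[user] += int(stats.get("totalBytesProcessed", 0))
        sql = job.get("configuration", {}).get("query", {}).get("query", "")
        tables_by_user[user].update(TABLE_PATTERN.findall(sql))
    return dict(bytes_by_user), dict(tables_by_user)

# Hypothetical job records, shaped like items listed with projection=full
sample_jobs = [
    {"user_email": "ana@example.com",
     "statistics": {"totalBytesProcessed": "2000000000"},
     "configuration": {"query": {"query": "SELECT COUNT(*) FROM [analytics.events]"}}},
    {"user_email": "ana@example.com",
     "statistics": {"totalBytesProcessed": "500000000"},
     "configuration": {"query": {"query": "SELECT * FROM [analytics.devices] LIMIT 10"}}},
    {"user_email": "bob@example.com",
     "statistics": {"totalBytesProcessed": "1000"},
     "configuration": {"query": {"query": "SELECT 1"}}},
]
usage, tables = summarize_jobs(sample_jobs)
for user, total in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
    print(user, total, sorted(tables[user]))
```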
Beyond Queries, we also scan Tables
bigquery.projects.list
List projects visible
bigquery.datasets.list
List datasets within a project
bigquery.tables.list
List tables within a dataset
bigquery.tables.get
Get details about a table
datastore
queries information
Enables Enlightenment Questions for an Analyst
• When was this table last refreshed?
• How often is it refreshed?
• How was it created?
• Underlying data sources/tables?
• Who created this table?
• Who knows how to use this table?
• Where can I find this great query I ran?
• Who knows how to use this tag/metric?
• How much bandwidth am I using?
• How much space are my tables using?
• How much does my usage cost?
Rick Hotten
How much bandwidth am I using in BigQuery?
BigQuery Pricing
Storage Cost: $0.02 per GB per month, $6.83 per TB per day
Query Cost, On-demand: $5 per TB
Query Cost, Reserved capacity: $20,000 per month for 5 GB/s unit, i.e. $1.58 per TB*
* Note: for continuous usage of the 5 GB/s bandwidth
How much does my usage of BigQuery cost?
Assuming that the Motorola bandwidth is elastic, i.e. we always pay for the optimal number of units (5 GB/s), we can use $1.58 per TB as a proxy
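The $1.58 per TB proxy can be reproduced with a short calculation. The sketch assumes a 30-day month and binary units (1 TB = 1024 GB); both are assumptions chosen because they recover the quoted figure from the $20,000 per month price of a 5 GB/s unit.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month
GB_PER_TB = 1024                    # assumes binary units

def reserved_cost_per_tb(monthly_fee=20_000, gb_per_second=5):
    """Effective $/TB when a reserved unit is used continuously."""
    tb_per_month = gb_per_second * SECONDS_PER_MONTH / GB_PER_TB
    return monthly_fee / tb_per_month

def usage_cost(bytes_processed, price_per_tb):
    """Cost proxy for a given number of bytes processed."""
    return bytes_processed / 1024 ** 4 * price_per_tb

rate = reserved_cost_per_tb()
print(f"${rate:.2f} per TB")
print(f"${usage_cost(250 * 1024 ** 4, rate):.2f} for 250 TB scanned")
```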
Weekly Email to largest BQ users
Example #4
Human Resources
It’s time for your annual review
Annual Review Feedback
What is the optimal method to determine your key work partners for feedback, with objectivity and relevance?
Scraping the traces of your collaboration: Gmail and Google Calendar
gmail.users.messages.list
List user emails (by page of 100)
gmail.users.messages.get
Get the details of 1 email
calendar.events.list
List events & meta-data (by page of 100)
datastore
Scoring
1 pt = 30 min meeting = 10 emails
Weight is divided by number of participants
Example
Fred 34 pts
Nancy 24 pts
Daniel 17 pts
Wrapping Up… CERN vs New Economy
CERN
• Write kilometers-long Fortran code
• Analysis can run for many hours… before a batch robot error
• Study billions of collision data
• Great depth of data structure & complexity
• Know your local expert for questions – but try to find the solution by yourself… much quicker
• Remove “bad runs” (unclean data batch)
• Transform a complex system into insights
• Communicate findings to conferences
• Strong competitive landscape (4 distinct experiments competing to be the first to publish, or to publish better results)
New Economy
• Write miles-long SQL code
• Queries can run for many hours… before a spool space error
• Study billions of customer data
• Great depth of data structure & complexity
• Know your local expert for questions – but try to find the solution by yourself… much quicker
• Remove “wackos” (non-material transactions)
• Transform a complex system into insights
• Communicate recommendations to business reviews
• Strong competitive landscape