criteo tektosdata meetup
Copyright © 2015 Criteo
The Criteo Experience
Olivier Koch
Engineering Program Manager, Criteo
TektosData Meetup “Data Meets Business”
May 31, 2016
Outline
• What does Criteo do?
• Deep dive into our technical stack
• Delivery at scale
• A few lessons learned
Banners… what else?
Advertiser Publisher
Online advertising at scale
3B displays / day
40 PB of data
15,000 servers worldwide
• Deep dive into Criteo
Bidding
• Should we bid?
• At which price?
Recommendation
• Which products should we display?
Look & Feel
• Big image vs small image
• Background color, ...
Prediction
• Generic prediction engine
• Specific models trained on TBs of data
Bidding
As we sell performance, Criteo’s and its clients’ interests are aligned, so the engine aims at maximizing the value we generate for our clients
As the cost of a display is lower than the bid and independent of it (2nd price auction or floor), we should always bid the maximum value that the client is willing to pay for a display
We bid the expected value of the display for the client
Value = 1€
CPM = 0,6€ / 0,7€ / 0,75€ (below the value)
CPM = 1,1€ / 1,2€ / 1,3€ (above the value)
This bidding strategy is optimal: we are sure to buy all profitable displays and only them
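A minimal sketch of this argument, assuming a simplified second-price auction in which we win a display whenever our bid exceeds its clearing price (highest competing bid or floor) and then pay that clearing price. The function name and the auction model are illustrative assumptions; the values are the ones shown on the slide.

```python
def displays_bought(bid, clearing_prices):
    """Return the clearing prices we pay in a simplified second-price auction.

    We win a display when our bid is above its clearing price (highest
    competing bid or floor), and we pay that clearing price, not our bid.
    """
    return [price for price in clearing_prices if bid > price]


value = 1.0  # expected value of a display for the client, in euros
clearing_prices = [0.6, 0.7, 0.75, 1.1, 1.2, 1.3]  # values from the slide

# Bidding the value buys exactly the displays that cost less than they are worth.
bought = displays_bought(value, clearing_prices)
profit = sum(value - price for price in bought)

# Bidding lower misses profitable displays; bidding higher buys unprofitable ones.
bought_low = displays_bought(0.65, clearing_prices)
bought_high = displays_bought(1.15, clearing_prices)
```

With a lower bid, the displays priced at 0,7€ and 0,75€ would be lost even though they are worth 1€; with a higher bid, the display priced at 1,1€ would be bought at a loss.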
This value depends on the predicted performance and the client’s objective
Bid = CPC pClick pSale AOV
• 2012: ensures constant value allocation between Criteo and its clients
• 2013: CRO, the “Conversion Rate Optimizer”
• 2014: COS Optimizer
Revenue that the display will generate for the client
Maximum share that the client is willing to pay
We train our prediction models on our historical displays
Variables
• Level of engagement of the user
• Quality of inventory
• User fatigue
• For travel: time to check-in and number of nights
Our ability to predict relies greatly on the relevance of the variables we consider
[Figure: historical displays feed the machine learning algorithms; markers show clicked displays and converted displays (size = order value)]
Recommendation
Recommend products for a user
• What we want: reco(user) = products
• 1B users x 3B products!
• But we need to scale and keep it fresh
We use collaborative filtering to select candidate products
Candidate products for user X are:
• Historical: user X saw orange shoes
• Similar: products also seen by users who saw these same shoes
• Best-of: the most viewed products on the client’s site
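The three candidate sources can be sketched as follows. This is a toy co-view counter, a very simple form of item-based collaborative filtering, and not a description of Criteo's production system; the function name, the view log, and the product identifiers are all hypothetical.

```python
from collections import Counter


def candidate_products(user, views, top_n=2):
    """Build the three candidate sets for `user` from a {user: [product, ...]} view log."""
    historical = list(dict.fromkeys(views.get(user, [])))  # de-duplicated, order kept
    seen = set(historical)

    # Similar: products co-viewed by other users who saw at least one of the same products.
    co_views = Counter()
    for other, products in views.items():
        if other != user and seen & set(products):
            co_views.update(p for p in set(products) if p not in seen)
    similar = [p for p, _ in co_views.most_common(top_n)]

    # Best-of: most viewed products overall on the client's site.
    popularity = Counter(p for products in views.values() for p in products)
    best_of = [p for p, _ in popularity.most_common(top_n)]

    return {"historical": historical, "similar": similar, "best_of": best_of}


# Hypothetical view log for one client site.
views = {
    "user_x": ["orange_shoes"],
    "user_a": ["orange_shoes", "red_shoes", "blue_socks"],
    "user_b": ["orange_shoes", "red_shoes"],
    "user_c": ["green_hat", "red_shoes"],
}
candidates = candidate_products("user_x", views)
```

At the scale quoted earlier (1B users x 3B products), these counts would have to be precomputed and refreshed offline rather than scanned per request.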
Products delivering the best performance are displayed
Variables
• Products seen by the user
• Time since product event
• Level of similarity
• Product features
[Figure: historical displays feed the machine learning algorithms; markers show clicked products and converted products (size = order value)]
Products are selected based on their pClick x pSale x AOV
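A minimal sketch of selecting products by pClick x pSale x AOV, assuming the three predictions are already available per candidate product. The function name and all numbers are hypothetical.

```python
def rank_products(predictions, top_k):
    """Sort products by expected order value per display: pClick * pSale * AOV."""
    def expected_value(item):
        _, (p_click, p_sale, aov) = item
        return p_click * p_sale * aov

    ranked = sorted(predictions.items(), key=expected_value, reverse=True)
    return [product for product, _ in ranked[:top_k]]


# Hypothetical model outputs: product -> (pClick, pSale given click, AOV in euros).
predictions = {
    "orange_shoes": (0.010, 0.020, 80.0),  # expected value 0.0160
    "red_shoes": (0.006, 0.030, 120.0),    # expected value 0.0216
    "blue_socks": (0.020, 0.050, 10.0),    # expected value 0.0100
}
selected = rank_products(predictions, top_k=2)
```

Note that the most clicked product (here the socks) is not necessarily displayed: a low order value can outweigh a high click probability.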
Look & Feel
We train our prediction models on our historical displays
Variables
Some of which we control:
• How the user interacts with the banner
• Organization of information
• Colorset
Some of which we don’t:
• Zone format
• Publisher
Look and feel will be selected based on its pClick x pSale x AOV
[Figure: historical displays (color = look & feel) feed the machine learning algorithms; markers show clicked displays and converted displays (size = order value); sample banner “My company: BUY! BUY! BUY! BUY!”]
Prediction
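Optimizing for sales amount
Predict: $\mathbb{E}[\mathrm{SalesAmount}] = \mathbb{P}(\mathrm{Click}) \cdot \mathbb{P}(\mathrm{Sale} \mid \mathrm{Click}) \cdot \mathbb{E}[\mathrm{SalesAmount} \mid \mathrm{Sale}]$
Models: $\mathbb{P}(\mathrm{Click})$ (logistic), $\mathbb{P}(\mathrm{Sale} \mid \mathrm{Click})$ (logistic), $\mathbb{E}[\mathrm{SalesAmount} \mid \mathrm{Sale}]$ (log-normal); all regularized!
Each model is trained independently & refreshed as often as possible
Three sources of features: user, ad, page (mostly categorical).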
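A minimal sketch of how the three model outputs combine into one expected sales amount per display. It assumes the log-normal model returns the parameters (mu, sigma) of the log of the order value, in which case the mean order value is exp(mu + sigma²/2). The function name and the input values are hypothetical.

```python
import math


def expected_sales_amount(p_click, p_sale_given_click, mu, sigma):
    """Combine the three model outputs into an expected sales amount per display.

    The order value given a sale is modelled as log-normal with parameters
    (mu, sigma) on the log scale, so its mean is exp(mu + sigma**2 / 2).
    """
    expected_amount_given_sale = math.exp(mu + sigma ** 2 / 2)
    return p_click * p_sale_given_click * expected_amount_given_sale


# Hypothetical predictions for a single display.
value = expected_sales_amount(p_click=0.005, p_sale_given_click=0.02, mu=4.0, sigma=0.5)
```

Because the three factors are multiplied, each model can be trained and refreshed on its own schedule without retraining the others.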
Learn on huge volumes of data
10 000 displays lead to 50 clicks, which lead to 1 sale
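The funnel above translates into rates as follows. The helper function and the target of 1,000 sales are hypothetical, chosen only to illustrate why so many displays are needed to learn about sales.

```python
import math

displays, clicks, sales = 10_000, 50, 1  # funnel from the slide

click_rate = clicks / displays          # 0.5% of displays are clicked
sale_rate_given_click = sales / clicks  # 2% of clicks lead to a sale
sales_per_display = sales / displays    # 0.01% of displays lead to a sale


def displays_needed(target_sales):
    """Displays required, on average, to observe `target_sales` sales at these rates."""
    return math.ceil(target_sales * displays / sales)


needed = displays_needed(1_000)  # hypothetical target of 1,000 observed sales
```

Positive examples for the sale model are 10,000 times rarer than displays, which is why the models are trained on very large logs.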
In-house Machine Learning library: IRMA
We have our own large-scale distributed machine learning library on top of Hadoop, used for all models.
From an ML perspective we rely, in most cases, on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., “A Reliable Effective Terascale Linear Learning System”).
Learning duration: trading time and volume
Longer ⇒ Volume ↑ VS Shorter ⇒ Reactivity ↑
[Figure: sales amount (€) per day, 11/01/2014 to 20/02/2014, annotated at Valentine’s day eve]
[Figure: precision as a function of learning duration; legend: 12/02/2014 through 18/02/2014, and All]
Each model is trained on several TB of data and contains millions of features
We learn several hundred models, refreshed many times per day
How about large-scale distributed machine learning?
Wait a minute: how do you handle TBs of training data?
Distribution of L-BFGS & SGD
Hadoop AllReduce
L-BFGS, being a batch algorithm, is easy to distribute (by distributing the computation of the gradient). SGD is more difficult to distribute: we use parameter averaging, which needs some tweaking (learning rate, number of epochs, …). Within SGD, we use Hogwild! for multi-threading.
We use Zookeeper to ensure fault tolerance.
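The two distribution schemes can be illustrated on a single machine. This is a toy sketch, not IRMA code: the "workers" are plain Python lists, `all_reduce_sum` stands in for a real AllReduce, and the data, learning rate, and number of epochs are arbitrary assumptions. It shows that the batch gradient is a sum over shards (so it distributes exactly), whereas SGD models trained per shard have to be averaged.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient(weights, shard):
    """Sum of logistic-loss gradients over one shard of (features, label) rows."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
        for j, xi in enumerate(x):
            grad[j] += error * xi
    return grad


def all_reduce_sum(vectors):
    """Stand-in for AllReduce: every worker ends up with the element-wise sum."""
    return [sum(column) for column in zip(*vectors)]


def sgd(weights, shard, learning_rate=0.1, epochs=5):
    """Plain SGD on one shard, starting from `weights`."""
    weights = list(weights)
    for _ in range(epochs):
        for row in shard:
            step = gradient(weights, [row])
            weights = [w - learning_rate * g for w, g in zip(weights, step)]
    return weights


def average(models):
    """Parameter averaging: element-wise mean of the per-worker weights."""
    return [sum(column) / len(models) for column in zip(*models)]


random.seed(0)
rows = [([1.0, random.uniform(-1, 1)], random.randint(0, 1)) for _ in range(40)]
shards = [rows[i::4] for i in range(4)]  # 4 hypothetical workers
weights = [0.0, 0.0]

# Batch gradient (what L-BFGS needs): per-shard gradients summed by AllReduce.
distributed_grad = all_reduce_sum([gradient(weights, s) for s in shards])
full_grad = gradient(weights, rows)

# SGD: each worker trains on its own shard, then the models are averaged.
averaged_weights = average([sgd(weights, s) for s in shards])
```

The summed shard gradients match the single-machine gradient up to floating-point rounding, while the averaged SGD model is only an approximation of what sequential SGD over all rows would produce, hence the tuning mentioned above.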
A word on advanced techniques
IRMA is not only about vanilla logistic regression with L2 regularization; it contains more advanced techniques: transfer learning, factorization machines, learning to rank, …
For example, we use cost-sensitive learning for bidding.
Offline & online evaluation
Two steps:
• Offline testing is fast, cheap, and efficient for wide exploration
• Online testing is expensive but has the final word
The more data you have, the faster you can make decisions
Physical infrastructure
• 7 in-house data centers on 3 continents
• ~15,000 servers, largest Hadoop cluster in Europe
• More than 35 PB of Big Data storage
Traffic
• 800k HTTP requests / sec (peak activity)
• 29,000 impressions / sec (peak activity)
• <10 ms to process a bidding request
• <100 ms to process a reco request
Academic research @ Criteo
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
• New 1TB dataset released last year
• Recent publications:
Offline evaluation of response prediction in online advertising auctions, O. Chapelle, WWW’15.
Sources of variability in large-scale machine learning systems, D. Lefortier, A. Truchet, and M. de Rijke, NIPS workshop on ML systems, 2015.
Cost-sensitive learning for bidding in online advertising auctions, F. Vasile and D. Lefortier, NIPS workshop on ML for e-commerce, 2015.
New areas of research
• Counterfactual evaluation (offline A/B tests)
• Product embeddings for recommendation
• Policy learning
• Delivery at scale
The early days of Criteo
Single C# repository
Build in 90 minutes
Weekly merges
What could go wrong?
Delivery at scale at Criteo
Trunk-based development (TBD)
Fast commits
Code reviews with Gerrit
The MOAB
Deploy with scp / bittorrent
Automatic metrics checks
=> 200+ happy engineers!
The Criteo MOAB
• A few lessons learned
Start small
• If you can't build it with a few machines, it's likely you won't be able to do it with many
First Google computer
• Keep fancy algorithms for later
The PageRank algorithm
Iterate fast
• Easy access to data (20PB vs 4GB of clean, carefully selected data)
• Convenient technologies (e.g. Python & notebooks, scikit-learn)
• Make IT a non-problem
• Keep projects small (typical project size 3-9 months)
Talent magnet
Keep teams small
3 members: 3 channels
4 members: 6 channels
5 members: 10 channels
10 members: 45 channels
…
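The channel counts above follow the number of distinct pairs in a team, n(n-1)/2, which grows quadratically with team size. A short check, with a hypothetical helper name:

```python
def channels(members):
    """Number of pairwise communication channels in a team of `members` people."""
    return members * (members - 1) // 2


team_sizes = [3, 4, 5, 10]
channel_counts = [channels(n) for n in team_sizes]
```

Going from 5 to 10 members doubles the team but multiplies the channels by 4.5.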
Build the right team
• Variety of skills
• Software/ML engineers, ops/devops
• Analysts/BI
• Product
• Designers
• Managers
Make the team agile
• Use a flat, distributed hierarchy model and make people sit next to each other
[Figure: EPM, ENG LEAD, PM, MGR]
• Use the right tools
• Slack
• Jira
• Confluence
• Git
• Gerrit
• OKR
Build the culture
• Let ideas emerge bottom-up
• Hackathons (for real)
• 10% projects
• Transparency: make info available to all
• Use mature technologies
• You will fail. That’s OK!
Take-aways
• Start small
• Iterate fast
• Build the team
• Make the team agile
• Build the culture
• Thanks! Questions?