Boosting ad revenue using reinforcement learning (Robin Schuil, Technology stream)
Marktplaats.nl
• Largest classifieds site in the Netherlands
• One of the most visited websites in NL
• Founded in 1999, acquired by eBay in 2004
• Now headquarters to eBay Classifieds Group: 12 brands in 17 countries
@schuilr 3
Facts & Figures
• 1.3 million visitors / day
  – desktop: 34%, mobile: 49%, tablet: 18%
• 9 million live listings
  – 350,000 new items / day
• 6 million unique search requests / day
  – 70 searches per second (average)
Seasonal trends
Winter sports!
[Chart: demand ("Vraag", 0% to 8%) per week (1 to 51) for the search terms skibroek (ski pants), ski, skipak (ski suit) and snowboard]
Seasonal trends
Camping!
[Chart: demand ("Vraag", 0% to 4.5%) per week (1 to 51) for the search terms caravans, campers and vouwwagen (folding camper)]
Seasonal trends
Saint Nicholas & Christmas!
[Chart: demand ("Vraag", 0% to 14%) per week (1 to 51) for the search terms sinterklaas (Saint Nicholas) and kerst (Christmas)]
Weather, temperature, etc.
Fly curtains!
[Chart: temperature ("Temperatuur", 0 to 25) and demand ("Vraag", 0% to 7%) per week (1 to 51) for the search term vliegengordijn (fly curtain)]
Weather, temperature, etc.
Heaters!
Reversed
[Chart: temperature ("Temperatuur", 0 to 25) and demand ("Vraag", 0% to 4%) per week (1 to 51) for the search term kachel (heater)]
Special events
Orange (“oranje”)!
[Chart: demand ("Vraag", 0% to 6%) per week (1 to 51) for the search term oranje; annotations: World Cup Football, King’s Day]
During a football game
[Chart: time series from 20:45 to 23:15 in 3-minute steps, comparing "Last Friday" with "This Friday"; markers: Kick-off, 1 - 0, 1 - 1, 1 - 2, 1 - 3, 1 - 4, 1 - 5, Break, End]
“Juichpakken” (cheering suits)
[Chart: demand ("Vraag", 0% to 25%) per week (1 to 51) for the search terms roy donders and juichpak]
“Nieuw & populair”
• “Nieuw & populair” = trending products
• Pay-per-click advertising model
• Advertisers bid for clicks, similar to Google AdWords
• Metric to optimize: Revenue Per Mille (RPM) = CTR * bid * 1,000
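The RPM formula can be sketched in a few lines. The keyword names, click-through rates and bids are made-up illustrative values, not figures from the talk; the point is only that a keyword with a lower CTR can still earn more per 1,000 impressions if its bid is higher.

```python
def rpm(ctr: float, bid: float) -> float:
    """Revenue per mille: expected revenue per 1,000 impressions."""
    return ctr * bid * 1000


# Hypothetical keywords: CTR as a fraction, bid as price per click.
candidates = {
    "keyword_a": {"ctr": 0.020, "bid": 0.10},
    "keyword_b": {"ctr": 0.005, "bid": 0.50},
}

for name, c in candidates.items():
    print(name, round(rpm(c["ctr"], c["bid"]), 2))
```

Here `keyword_b` gets four times fewer clicks but pays five times more per click, so it ranks higher on RPM.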
First (minimal) version
• Find top 100 “trending” keywords using Spark
• Randomly pick one of those keywords
• Display top 4 results for the selected keyword
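A minimal sketch of the selection step in this first version, assuming the list of trending keywords has already been computed (the Spark job is not shown). `pick_ads` and the `search` callable are hypothetical names used for illustration only.

```python
import random


def pick_ads(trending_keywords, search, rng=random, n_results=4):
    """Pick one trending keyword at random and return its top results.

    `search` is a hypothetical callable: keyword -> ranked list of listings.
    """
    keyword = rng.choice(trending_keywords)
    return keyword, search(keyword)[:n_results]
```

Every keyword is equally likely to be shown, regardless of how well it performs.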
Can we do better?
• CTR and bid vary per keyword, so random selection only gives average performance
• Doesn’t consider the user’s personal preferences
One-armed bandit = slot machine
Problem: how to pick between slot machines so that you maximize profit?
Exploration – Exploitation
• Explore (learn): try out different candidates to learn how they perform over time
• Exploit (earn): take advantage of what you’ve learned to maximize payoff (your current best guess)
Many different approaches
• Epsilon First
• Epsilon Greedy
• Upper Confidence Bound
• Thompson Sampling
• LinUCB
Epsilon First
[Diagram: timeline with two phases. First "Random" (learn: collect data for each candidate; split testing, A/B testing), then "Best" (earn: show the best performer)]
Epsilon First
• Simple and intuitive
• Lots of tools available (VWO, Optimizely, …)

• Average reward until exploration is finished
• What if the best candidate is no longer the best?
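Epsilon First can be sketched as follows: explore at random for a fixed share of the rounds, then commit to the candidate with the best observed mean. The function name, the `pull` callable (arm index -> reward) and the default ε of 0.1 are assumptions for illustration.

```python
import random


def epsilon_first(pull, n_arms, horizon, epsilon=0.1, rng=None):
    """Explore at random for the first epsilon * horizon rounds, then commit."""
    rng = rng or random.Random()
    explore_rounds = int(epsilon * horizon)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    best = None
    for t in range(horizon):
        if t < explore_rounds:
            arm = rng.randrange(n_arms)      # learn: pick at random
        else:
            if best is None:                 # exploration has just ended
                means = [totals[a] / counts[a] if counts[a] else 0.0
                         for a in range(n_arms)]
                best = means.index(max(means))
            arm = best                       # earn: always show the winner
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return best, counts
```

Once `best` is fixed it is never revisited, which is exactly the weakness the second pair of bullets points at.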
Epsilon Greedy
• Very simple to implement and surprisingly effective
• Can deal with nonstationary problems

• How to determine the optimal value for ε?
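The transcript does not spell out the rule itself. In its usual textbook form, Epsilon Greedy explores a random candidate with probability ε on every round and otherwise shows the candidate with the best observed mean. A minimal sketch under that assumption (class and method names are hypothetical):

```python
import random


class EpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.means)), key=self.means.__getitem__)

    def update(self, arm, reward):
        # Incremental running mean of the rewards seen for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

Because exploration never stops, a candidate whose performance changes later still gets sampled from time to time.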
Upper Confidence Bound
Basic idea:
• Calculate mean and a measure of uncertainty (variance) for each candidate
• Pick current best performer based on mean + uncertainty bonus
Measuring uncertainty
Observed mean: 0.50
95% certain that true mean ≤ 0.76
Uncertainty bonus: 0.26
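The slide does not say how the bound was computed or how many observations it is based on. As one possible reading, a one-sided 95% bound under a normal approximation with 10 observations (both assumptions) gives the same numbers:

```python
import math
from statistics import NormalDist


def upper_bound(mean, n, confidence=0.95):
    """One-sided upper confidence bound for a Bernoulli mean (normal approx.)."""
    z = NormalDist().inv_cdf(confidence)
    bonus = z * math.sqrt(mean * (1 - mean) / n)
    return mean + bonus, bonus


upper, bonus = upper_bound(0.50, n=10)  # n=10 is an assumed sample size
print(round(upper, 2), round(bonus, 2))  # prints: 0.76 0.26
```

The bonus shrinks with the square root of `n`, so candidates that have been shown often get little extra credit.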
Upper Confidence Bound
• Selecting “A” reduces uncertainty
• Candidate “C” now has the highest score
[Chart: estimated reward with uncertainty for candidates A, B and C. Pick “C”!]
Upper Confidence Bound
• Uses variance measure to automatically balance exploration with exploitation

• Deterministic; requires online learning (not suited for small-batch mode)
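A minimal sketch of the selection rule, using the widely known UCB1 variant. Note the swap: UCB1 derives its bonus from the number of pulls rather than from an estimated variance as on the slide. The function name is hypothetical.

```python
import math


def ucb1_select(counts, means):
    """Return the arm with the highest mean + uncertainty bonus (UCB1)."""
    # Try every arm once before trusting any estimate.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [m + math.sqrt(2 * math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The rule has no randomness: given the same counts and means it always returns the same arm, so the statistics have to be updated after each impression for exploration to happen.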
Thompson Sampling
Basic idea:
• The number of pulls for a given lever should match its actual probability of being the optimal lever
• Sample from the posterior for the mean of each lever:
  p(λ|X) = Gamma(conv + prior_conv, impr + prior_impr)
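A minimal sketch of one selection step using the Gamma posterior above, read as shape = conversions + prior and rate = impressions + prior. The prior values (0.5 and 1.0) and the function name are assumptions, not values from the talk.

```python
import random


def thompson_select(stats, prior_conv=0.5, prior_impr=1.0, rng=random):
    """stats: list of (conversions, impressions), one tuple per lever."""
    samples = []
    for conv, impr in stats:
        shape = conv + prior_conv
        rate = impr + prior_impr
        # random.gammavariate takes (shape, scale), and scale = 1 / rate.
        samples.append(rng.gammavariate(shape, 1.0 / rate))
    # Play the lever whose sampled conversion rate is highest.
    return max(range(len(samples)), key=samples.__getitem__)
```

Levers with little data have wide posteriors, so they occasionally produce the highest sample and get explored without a separate exploration parameter.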
Few conversions
Candidate    Conversions    Impressions    Chance of being winner
A (3.9%)     11             282            42%
B (3.3%)     2              61             39%
C (2.8%)     4              143            19%

More conversions
Candidate    Conversions    Impressions    Chance of being winner
A (3.9%)     93             2,382          82%
B (3.3%)     66             2,011          13%
C (2.8%)     31             1,093          5%

Many conversions
Candidate    Conversions    Impressions    Chance of being winner
A (3.9%)     892            22,882         97%
B (3.3%)     174            5,261          2%
C (2.8%)     66             2,343          1%

Lots of conversions
Candidate    Conversions    Impressions    Chance of being winner
A (3.9%)     5,621          144,132        > 99%
B (3.3%)     256            7,761          < 1%
C (2.8%)     101            3,593          < 1%
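The slides do not say how the "chance of being winner" column was computed. One way to approximate such a column is Monte Carlo: draw from each candidate's posterior many times and count how often each one comes out on top. The sketch reuses the Gamma posterior from the Thompson Sampling slide; the prior values, number of draws and function name are assumptions, so the results will not match the slides exactly.

```python
import random


def chance_of_winning(stats, draws=20000, prior_conv=0.5, prior_impr=1.0, seed=0):
    """stats: list of (conversions, impressions). Returns win share per candidate."""
    rng = random.Random(seed)
    wins = [0] * len(stats)
    for _ in range(draws):
        samples = [rng.gammavariate(conv + prior_conv, 1.0 / (impr + prior_impr))
                   for conv, impr in stats]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


few = chance_of_winning([(11, 282), (2, 61), (4, 143)])
lots = chance_of_winning([(5621, 144132), (256, 7761), (101, 3593)])
print(few, lots)
```

With few conversions the win shares stay spread out; with lots of conversions nearly all of the mass moves to candidate A, which mirrors the progression in the tables.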
Thompson Sampling
• Weighted random sampling
• Works well in small-batch mode

• Doesn’t consider context (e.g. user’s personal preferences)
LinUCB
Basic idea:
• Define a “context”: information about the user
• Fit a per-candidate logistic regression model
• Apply the concept of Upper Confidence Bound (UCB)
  – mean + uncertainty bonus
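A minimal sketch of the idea. The slide mentions a logistic regression per candidate; this sketch swaps in the linear (ridge regression) model of the commonly described "disjoint" LinUCB formulation, because it has a simple closed-form update. `LinUCBArm`, `choose_arm` and `alpha` are hypothetical names, and the context is assumed to be a plain list of numeric features.

```python
import math


class LinUCBArm:
    """One candidate with its own ridge-regression model of reward vs. context."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # width of the uncertainty bonus
        # Inverse of A = I + sum(x x^T); starts as the identity matrix.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def score(self, x):
        theta = self._a_inv_times(self.b)  # ridge estimate A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))
        var = sum(xi * v for xi, v in zip(x, self._a_inv_times(x)))
        return mean + self.alpha * math.sqrt(max(var, 0.0))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 after adding x x^T to A.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms, x):
    """Index of the arm with the highest mean + uncertainty bonus for context x."""
    scores = [arm.score(x) for arm in arms]
    return scores.index(max(scores))
```

The bonus depends on the context: a candidate that has rarely been shown to similar users gets a larger bonus for that user than one that has.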
Context
• Gender
• Recently viewed categories
• Current date
• Weather forecast
• …
Principal Component Analysis (PCA) to reduce sparseness and computational complexity
Pruning
• Periodically remove weakest performers
• Replace with new, unexplored “trending keywords”
• Rinse and repeat
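A minimal sketch of one pruning pass. The slide does not say how "weakest" is measured or how often pruning runs; this sketch ranks by observed CTR and only considers keywords with a minimum number of impressions, both of which are assumptions. The function name and parameters are hypothetical.

```python
def prune(stats, new_keywords, drop=2, min_impressions=100):
    """stats: dict keyword -> (clicks, impressions). Returns a new dict."""
    # Only judge keywords that have been shown often enough.
    mature = [k for k, (_, impr) in stats.items() if impr >= min_impressions]
    mature.sort(key=lambda k: stats[k][0] / stats[k][1])  # lowest CTR first
    survivors = dict(stats)
    for keyword in mature[:drop]:
        del survivors[keyword]
    # New trending keywords start without any history.
    for keyword in new_keywords:
        survivors.setdefault(keyword, (0, 0))
    return survivors
```

Ranking by RPM instead of CTR would fit the metric from the “Nieuw & populair” slide more closely; CTR is used here only to keep the example short.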
Endless possibilities
• News homepage
• Online advertising
• Deciding which thumbnail to show on the SERP
• Etc.
Reading List
“Bandit Algorithms for Website Optimization”: http://bit.ly/bandits-book
“Reinforcement Learning”: http://bit.ly/rl-book