Multi-Armed Bandits: Intro, examples and tricks

Dr Ilias Flaounas, Senior Data Scientist at Atlassian
Data Science Sydney meetup, 22 March 2016

TRANSCRIPT

Page 1: Multi-Armed Bandits: Intro, examples and tricks

Multi-Armed Bandits: Intro, examples and tricks

Dr Ilias Flaounas, Senior Data Scientist at Atlassian

Data Science Sydney meetup, 22 March 2016

Page 2: Multi-Armed Bandits: Intro, examples and tricks

Motivation

Increase awareness of some very useful but lesser-known techniques

Demo some current work at Atlassian

Connect it with some research from my past

Hopefully, there will be something useful for everybody — apologies for the few equations and loose notation

Page 4: Multi-Armed Bandits: Intro, examples and tricks

http://www.nancydixonblog.com/2012/05/-why-knowledge-management-didnt-save-general-motors-addressing-complex-issues-by-convening-conversat.html

Page 5: Multi-Armed Bandits: Intro, examples and tricks

Each arm's estimated value is the average of the rewards observed when pulling it:

\mu_A = (r_{A,1} + r_{A,4} + r_{A,5} + r_{A,7}) / n_A

\mu_B = r_{B,3} / n_B

\mu_C = (r_{C,2} + r_{C,6} + r_{C,8}) / n_C
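As a concrete illustration (not on the slide), this bookkeeping is just a running average per arm. A minimal Python sketch; the 0/1 reward values are hypothetical, only the pull order A, C, B, A, A, C, A, C comes from the slide:

```python
from collections import defaultdict

totals = defaultdict(float)  # sum of rewards per arm
counts = defaultdict(int)    # number of pulls per arm

# (arm, reward) pairs; pull order from the slide, reward values made up
observations = [("A", 1), ("C", 0), ("B", 1), ("A", 0),
                ("A", 1), ("C", 1), ("A", 1), ("C", 0)]

for arm, reward in observations:
    totals[arm] += reward
    counts[arm] += 1

means = {arm: totals[arm] / counts[arm] for arm in counts}
print(means)  # {'A': 0.75, 'C': 0.333..., 'B': 1.0}
```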

Page 6: Multi-Armed Bandits: Intro, examples and tricks

Many solutions…

1. ε-greedy: the best arm is selected for a proportion 1 − ε of the trials and a random arm for a proportion ε.

2. ε-greedy with a variable ε.

3. Pure exploration first, then pure exploitation.

4. …

5. Thompson sampling: draw from the estimated Beta distribution of each arm and play the arm with the largest draw.

6. Upper Confidence Bound (UCB).
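A minimal sketch of two of these policies, for Bernoulli rewards; the ε default and the uniform Beta(1, 1) priors are assumptions, not from the slides:

```python
import random

def epsilon_greedy(means, epsilon=0.1):
    """Play the empirically best arm with probability 1 - epsilon,
    otherwise explore a uniformly random arm."""
    arms = list(means)
    if random.random() < epsilon:
        return random.choice(arms)       # explore
    return max(arms, key=means.get)      # exploit

def thompson_sample(successes, failures):
    """Draw from each arm's Beta posterior and play the largest draw."""
    draws = {arm: random.betavariate(successes[arm] + 1, failures[arm] + 1)
             for arm in successes}
    return max(draws, key=draws.get)
```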

Page 15: Multi-Armed Bandits: Intro, examples and tricks

Disadvantages

• Reaching significance for non-winning arms takes longer
• Unclear stopping criteria
• Hard to order non-winning arms and assess their impact reliably

Advantages

• Reaching significance for the winning arm is faster
• The best arm can change over time
• There are no false positives in the long term

Page 16: Multi-Armed Bandits: Intro, examples and tricks

Optimizely recently introduced MABs, rebranded as “Traffic auto-allocation”

Page 17: Multi-Armed Bandits: Intro, examples and tricks

Let’s add some context

What happens if we want to assess 100 variations?

How about 1,000 or 10,000 variations?

Page 18: Multi-Armed Bandits: Intro, examples and tricks

Contextual Multi-Armed Bandits

Each arm is described by a set of experiment parameters, e.g., price, #users, product, bundles, colour of UI elements…

A -> {x_{A,1}, x_{A,2}, x_{A,3}, …}
B -> {x_{B,1}, x_{B,2}, x_{B,3}, …}
C -> {x_{C,1}, x_{C,2}, x_{C,3}, …}

The reward of each arm is a function of its features:

r_{A,t} = f(x_{A,1}, x_{A,2}, x_{A,3}, …)
r_{B,t} = f(x_{B,1}, x_{B,2}, x_{B,3}, …)
r_{C,t} = f(x_{C,1}, x_{C,2}, x_{C,3}, …)
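In code, the context is just a feature vector per arm; the parameter values below (price, number of users, a UI-colour flag) are hypothetical stand-ins for the examples on the slide:

```python
import numpy as np

# Hypothetical feature vectors: [price, #users (thousands), blue-UI flag]
contexts = {
    "A": np.array([9.99, 1.2, 1.0]),
    "B": np.array([4.99, 0.8, 0.0]),
    "C": np.array([9.99, 0.8, 1.0]),
}
# The contextual bandit assumes r_{a,t} = f(x_a) for an unknown f,
# and learns f from the (context, reward) pairs it observes.
```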

Page 19: Multi-Armed Bandits: Intro, examples and tricks

Contextual Multi-Armed Bandits

We introduce a notion of proximity or similarity between arms:

A -> {x_{A,1}, x_{A,2}, x_{A,3}, …}
B -> {x_{B,1}, x_{B,2}, x_{B,3}, …}

Page 20: Multi-Armed Bandits: Intro, examples and tricks

LinUCB

L. Li, W. Chu, J. Langford, R. E. Schapire, “A Contextual-Bandit Approach to Personalized News Article Recommendation”, WWW, 2010.

The UCB is some expectation plus some confidence level:

\mu_a(t) + \sigma_a(t)

We assume there is some unknown vector \theta^*, the same for each arm, for which:

E[r_{a,t} \mid x_{a,t}] = x_{a,t}^T \theta^*

Page 21: Multi-Armed Bandits: Intro, examples and tricks

Using least squares:

\hat{\theta}_t := C_t^{-1} X_t^T y_t

where

X_t := [x_{a(1),1}, x_{a(2),2}, \ldots, x_{a(t),t}]^T
y_t := [r_{a(1),1}, r_{a(2),2}, \ldots, r_{a(t),t}]^T
C_t := X_t^T X_t

Substituting into E[r_{a,t} \mid x_{a,t}] = x_{a,t}^T \theta^* gives the estimate:

\hat{\mu}_a(t) := x_{a,t}^T \hat{\theta}_t = x_{a,t}^T C_t^{-1} X_t^T y_t

Page 22: Multi-Armed Bandits: Intro, examples and tricks

The upper confidence bound is some expectation plus some confidence level:

\mu_a(t) + \sigma_a(t)

where

\hat{\mu}_a := x_{a,t}^T C_t^{-1} X_t^T y_t
\hat{\sigma}(t) := \sqrt{x_{a,t}^T C_t^{-1} x_{a,t}}
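Putting the pieces together, a minimal sketch of one LinUCB decision; the ridge term lam (which keeps C_t invertible) and the exploration weight alpha are assumptions, not on the slides:

```python
import numpy as np

def linucb_choose(contexts, X, y, alpha=1.0, lam=1e-3):
    """One LinUCB step: least squares for theta_hat, then pick the arm
    with the largest mu_hat + alpha * sigma_hat.

    contexts: dict arm -> feature vector x_{a,t}
    X: (t, d) matrix of past contexts; y: (t,) vector of past rewards
    """
    d = X.shape[1]
    C_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))  # regularised C_t^{-1}
    theta = C_inv @ X.T @ y                           # theta_hat
    scores = {}
    for arm, x in contexts.items():
        mu = x @ theta                                # mu_hat_a
        sigma = np.sqrt(x @ C_inv @ x)                # sigma_hat
        scores[arm] = mu + alpha * sigma              # upper confidence bound
    return max(scores, key=scores.get)
```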

Page 23: Multi-Armed Bandits: Intro, examples and tricks

L. Li, W. Chu, J. Langford, R. E. Schapire, “A Contextual-Bandit Approach to Personalized News Article Recommendation”, WWW, 2010.

Page 24: Multi-Armed Bandits: Intro, examples and tricks

Product onboarding…

Which arm would you pull?

Page 25: Multi-Armed Bandits: Intro, examples and tricks

• How can we locate the city of Bristol from tweets?

• 10K candidate locations organised in a 100x100 grid

• At every step we get tweets from one location and count mentions of “Bristol”

• Challenge: find the target in sub-linear time complexity!

Page 26: Multi-Armed Bandits: Intro, examples and tricks

Linear methods fail on this problem.

How can we go non-linear?

Page 27: Multi-Armed Bandits: Intro, examples and tricks

John Shawe-Taylor & Nello Cristianini, “Kernel Methods for Pattern Analysis”, Cambridge University Press, 2004.

The kernel trick! (No, it’s not just for SVMs.)

Page 28: Multi-Armed Bandits: Intro, examples and tricks

LinUCB:

\hat{\mu}_a(t) := x_{a,t}^T \hat{\theta}_t
\hat{\sigma}(t) := \sqrt{x_{a,t}^T C_t^{-1} x_{a,t}}
C_t := X_t^T X_t

KernelUCB:

\hat{\mu}_a(t) = k_{x,t}^T K_t^{-1} y_t
\hat{\sigma}_a(t) = \sqrt{k_{x,t}^T K_t^{-2} k_{x,t}}
K_t = X_t X_t^T

M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, “Finite-Time Analysis of Kernelised Contextual Bandits”, UAI, 2013.
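A minimal sketch of the KernelUCB score under the same conventions, assuming an RBF kernel and a small regulariser lam on K_t (the slide uses K_t^{-1} and K_t^{-2} directly); the exploration weight eta is also an assumption here:

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernelucb_score(x, X_hist, y_hist, eta=1.0, lam=1e-3, gamma=1.0):
    """UCB score of a candidate context x given past contexts and rewards."""
    K = np.array([[rbf(u, v, gamma) for v in X_hist] for u in X_hist])
    K_inv = np.linalg.inv(K + lam * np.eye(len(X_hist)))  # regularised K_t^{-1}
    k_x = np.array([rbf(x, v, gamma) for v in X_hist])    # k_{x,t}
    mu = k_x @ K_inv @ y_hist                             # mu_hat_a(t)
    sigma = np.sqrt(k_x @ K_inv @ K_inv @ k_x)            # sqrt(k^T K^-2 k)
    return mu + eta * sigma
```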

Page 29: Multi-Armed Bandits: Intro, examples and tricks

• The last few steps of the algorithm before it locates Bristol.

• KernelUCB with RBF kernel converges after ~300 iterations (instead of >>10K).

Page 30: Multi-Armed Bandits: Intro, examples and tricks

Target is the red dot. We locate it using KernelUCB with RBF kernel.

KernelUCB code: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB

Page 31: Multi-Armed Bandits: Intro, examples and tricks

What if we have a high-dimensional space?

Hashing trick

Implementation in Vowpal Wabbit, by J. Langford, et al.
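For intuition, a minimal sketch of feature hashing in the spirit of Vowpal Wabbit; the dimension D and the md5 choice are arbitrary assumptions (VW uses its own, faster hash function):

```python
import hashlib
import numpy as np

def hash_features(features, D=2**10):
    """Map a {name: value} dict into a fixed D-dimensional vector,
    so the model size stays constant however many names appear."""
    v = np.zeros(D)
    for name, value in features.items():
        idx = int(hashlib.md5(name.encode()).hexdigest(), 16) % D
        v[idx] += value   # hash collisions are tolerated by design
    return v

x = hash_features({"price=9.99": 1.0, "colour=blue": 1.0, "users": 1.2})
```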

Page 33: Multi-Armed Bandits: Intro, examples and tricks

References

M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, “Finite-Time Analysis of Kernelised Contextual Bandits”, UAI, 2013.

L. Li, W. Chu, J. Langford, R. E. Schapire, “A Contextual-Bandit Approach to Personalized News Article Recommendation”, WWW, 2010.

J. Shawe-Taylor & N. Cristianini, “Kernel Methods for Pattern Analysis”, Cambridge University Press, 2004.

Implementation of KernelUCB in the CompLACS toolkit: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB

https://en.wikipedia.org/wiki/Multi-armed_bandit

https://github.com/JohnLangford/vowpal_wabbit/wiki/Contextual-Bandit-Example

Page 34: Multi-Armed Bandits: Intro, examples and tricks

Thank you - We are hiring!

Dr Ilias Flaounas, Senior Data Scientist, <first>.<last>@atlassian.com