Dynamic Learning Gaussian Process Bandits

Diogo Dubart Norinho
University College London, Computer Science Department

December 10, 2012




Outline

1 Introduction

2 Gaussian Process (GP)

3 Bandits

4 DLGP

5 DLGP vs. GP bandits

6 Conclusion


Introduction

Gaussian Process (GP)


What is a Gaussian Process? Intuitively...

A GP is a generalisation of a multivariate Gaussian distribution.

So instead of being a distribution over vectors, it is a distribution over functions.

Conceptually, GPs are based on the naive, yet effective, idea that a function f(x) can be regarded as an infinite vector whose values correspond to f(x) evaluated at every possible input x.


What is a Gaussian Process? Intuitively...

A GP is a random function f : X → R, where X is a non-empty index set, such that for any finite set of input points x_1, . . . , x_n,

[f(x_1), . . . , f(x_n)]^⊤ ∼ N( [m(x_1), . . . , m(x_n)]^⊤, K ),  with K_{ij} = k(x_i, x_j),

with the parameters being the mean function m(x) and the covariance kernel k(x, x′), for all x, x′ ∈ X. This is also known as the consistency property of GPs.


More formally...

Definition (Gaussian Process)

A GP {f(x), x ∈ X} is a family of random variables f(x), all defined on the same probability space. In addition, for any finite subset F ⊂ X, with F := {x_{π_1}, . . . , x_{π_n}}, the random vector f := [f(x_{π_1}), . . . , f(x_{π_n})]^⊤ has a (possibly degenerate) Gaussian distribution.

The GP is denoted as

f(x) ∼ GP(m(x), k(x, x′)).


Example of a kernel: the ARD kernel

The Automatic Relevance Determination kernel is defined as:

k_ARD(x, x′) := σ_0² exp( −(1/2) (x − x′)^⊤ M (x − x′) ), (1)

where M is a positive semidefinite matrix. If M is diagonal, its elements act as inverse squared length-scales, M = diag(ℓ_1, . . . , ℓ_d)^{−2}.
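A sketch of the ARD kernel as a Gram-matrix computation, assuming M is diagonal (our illustration; the inputs and the M_diag values are chosen only for the example):

```python
import numpy as np

def kernel_ard(X1, X2, M_diag, sigma0=1.0):
    """ARD kernel (1) with diagonal M: sigma0^2 exp(-0.5 (x-x')^T M (x-x')).

    The entries of M_diag are inverse squared length-scales: a larger
    entry makes the kernel decay faster along that dimension.
    """
    diff = X1[:, None, :] - X2[None, :, :]                 # (n1, n2, d)
    quad = np.einsum('ijk,k,ijk->ij', diff, M_diag, diff)  # (x-x')^T M (x-x')
    return sigma0**2 * np.exp(-0.5 * quad)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K = kernel_ard(X, X, M_diag=np.array([0.4, 1.0]))  # illustrative values
```

With M_diag = [0.4, 1.0], moving one unit along the second dimension decays the kernel faster than moving one unit along the first, as expected from the larger entry.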


ARD figure

Sample from a GP with an ARD kernel, with length-scale entry M_{1,1} = 0.4 along dimension x and M_{2,2} = 1 along dimension y.

[figure: 3D surface over x, y ∈ [0, 10], function values roughly in [−1.2, 0.8]]


Model Assumption

We assume that each observation of f(x_i) is tainted with some Gaussian noise ε_i ∼ N(0, σ²):

y_i = f(x_i) + ε_i. (2)


Predictive distribution

Consider that we have n input values drawn from a GP, gathered in a matrix X* = (x*_1, . . . , x*_n)^⊤, with * indicating that we have not observed f(x*_i) (with or without measurement noise). We compute the probability distribution of the unseen observations given the observed data (X, y), in other words the predictive probability:

f* | X, y, X* ∼ N(f̄*, Σ*), where (3)

f̄* := E[f* | X, y, X*] = K(X*, X)[K(X, X) + σ_n² I]^{−1} y, (4)

Σ* := K(X*, X*) − K(X*, X)[K(X, X) + σ_n² I]^{−1} K(X, X*). (5)
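Equations (4) and (5) translate directly into code. A sketch (not from the slides) that uses a Cholesky factorisation instead of an explicit matrix inverse, with an assumed squared-exponential kernel and 1-D toy data:

```python
import numpy as np

def rbf(X1, X2, ell=1.0):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell**2)

def gp_predict(X, y, Xs, kernel, noise_var):
    """Predictive mean and covariance of f* (eqs. (3)-(5)), via Cholesky."""
    K = kernel(X, X) + noise_var * np.eye(len(X))
    Ks = kernel(Xs, X)
    Kss = kernel(Xs, Xs)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha                              # eq. (4)
    cov = Kss - v.T @ v                            # eq. (5)
    return mean, cov

X = np.array([0.0, 1.0, 2.0])
y = np.sin(X)
mu, cov = gp_predict(X, y, np.array([0.5, 1.5]), rbf, noise_var=0.01)
```

The Cholesky route is the standard numerically stable way to apply [K + σ²I]^{−1}.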


Hyperparameter Learning: maximising the evidence

The evidence is the probability of observing the data under a model with parameters θ. Since f ∼ N(0, K) and ε ∼ N(0, σ²I) are independent,

P(y|X, θ) = N(y | 0, Σ), with Σ := K(X, X) + σ²I.

We aim at maximising the log evidence:

L := log P(y|X, θ) = −(1/2) log |Σ| − (1/2) y^⊤ Σ^{−1} y − (n/2) log(2π).

Here, we will do that by using the conjugate gradient algorithm.
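A sketch of the log evidence L (our illustration), again computed through a Cholesky factorisation so that log |Σ| and Σ^{−1}y are obtained stably:

```python
import numpy as np

def log_evidence(y, K, noise_var):
    """log P(y | X, theta) = -1/2 log|Sigma| - 1/2 y^T Sigma^{-1} y - n/2 log(2 pi),
    with Sigma = K + noise_var * I."""
    n = len(y)
    Sigma = K + noise_var * np.eye(n)
    L = np.linalg.cholesky(Sigma)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log|Sigma| from the factor
    return -0.5 * logdet - 0.5 * y @ alpha - 0.5 * n * np.log(2 * np.pi)

value = log_evidence(np.zeros(2), np.eye(2), 0.0)
```

In practice one would maximise this over θ with a gradient-based routine, e.g. scipy.optimize.minimize with method='CG', matching the conjugate-gradient choice above.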


Hyperparameter Learning: Leave one out crossvalidation

Li := logP(y∗i |x∗i , y−i ,X−i , θ)

= −12 log σ2i −

(y∗i − µi )2

2σ2i− 1

2 log(2π)

The hyperparameters are found by maximising the following,

LLOO(y ,X , θ) :=n∑

i=1

Li . (6)
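The n leave-one-out predictive densities need not be computed by refitting the GP n times: closed-form identities express µ_i and σ_i² through Σ^{−1} (Rasmussen and Williams, ch. 5). A sketch using that identity (our illustration, not part of the slides):

```python
import numpy as np

def loo_log_pseudolikelihood(y, K, noise_var):
    """Sum of leave-one-out log predictive densities, i.e. L_LOO in (6).

    Uses mu_i = y_i - [S^{-1} y]_i / [S^{-1}]_{ii} and
    sigma_i^2 = 1 / [S^{-1}]_{ii}, with S = K + noise_var * I,
    instead of refitting n times.
    """
    n = len(y)
    Sinv = np.linalg.inv(K + noise_var * np.eye(n))
    alpha = Sinv @ y
    var_i = 1.0 / np.diag(Sinv)       # sigma_i^2
    mu_i = y - alpha * var_i          # leave-one-out predictive means
    Li = (-0.5 * np.log(var_i)
          - (y - mu_i) ** 2 / (2 * var_i)
          - 0.5 * np.log(2 * np.pi))
    return np.sum(Li)
```

As before, this objective would be maximised over θ with a gradient-based optimiser.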

Bandits


Visual representation of a multi-armed bandit


The Bandit framework

It is a way of trading off exploitation and exploration.

Environment: N slot machines, each having an unknown but fixed mean µ_i and variance σ_i.

Goal: get rich.

1st approach: sample each machine uniformly until you find slot i* := argmax_{i∈N} µ_i. Probably not the best approach.

2nd approach: play more at the machines where you have earned so far, while trying others once in a while, just to be sure you did not underestimate them.


Formally our goal

Let the regret be R_T := T µ_{i*} − Σ_{t=1}^T r_t. Find a strategy such that, on average, the regret tends to zero:

lim_{T→∞} R_T / T = 0. (7)
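To illustrate the no-regret property (7) concretely, here is a sketch using UCB1, a classical index strategy for independent arms (a different, kernel-free algorithm from the GP bandits discussed next, chosen only because it is short); the means, horizon and noise level are illustrative:

```python
import numpy as np

def ucb1_average_regret(means, T, seed=0):
    """Run UCB1 on independent Gaussian-reward arms and return R_T / T."""
    rng = np.random.default_rng(seed)
    n = len(means)
    counts, sums, reward = np.zeros(n), np.zeros(n), 0.0
    for t in range(T):
        if t < n:
            arm = t  # pull each arm once to initialise
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        r = rng.normal(means[arm], 0.5)
        counts[arm] += 1
        sums[arm] += r
        reward += r
    return (T * max(means) - reward) / T

means = [0.2, 0.5, 0.9]
early = ucb1_average_regret(means, 100)
late = ucb1_average_regret(means, 5000)
```

The average regret R_T / T shrinks as the horizon grows, which is exactly the behaviour (7) asks for.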


GP bandits and GP-UCB

GP bandits are a framework that relies on the assumption that the arms of the bandit are dependent; hence the reward obtained by pulling one arm delivers information about neighbouring arms.

Gaussian Process Upper Confidence Bound (GP-UCB) is an algorithm that applies the principle of optimism in the face of uncertainty to trade off exploration and exploitation.


GP-UCB

The surrogate function at evaluation t:

x_t = argmax_{x∈D} { µ_{t−1}(x) + √β_t σ_{t−1}(x) },

where:
• D is the objective function's input space,
• A_t := {x_1, . . . , x_{t−1}} is the set of pulled arms,
• µ_{t−1} is the predictive mean based on A_t,
• σ_{t−1} is the predictive standard deviation based on A_t,
• β_t = 2 log(|D| t² π² / (6δ)), with δ ∈ (0, 1).


GP-UCB algorithm

input: Input space D; GP prior µ_0 = 0, σ_0, k_0(·, ·), θ_0
for t = 1, 2, . . . do
    Generate a set of m arms U := {u_1, . . . , u_m} ⊂ D
    Choose x_t ← argmax_{u∈U} { µ_{t−1}(u) + √β_t σ_{t−1}(u) }
    Sample y_t ← f(x_t) + ε_t
    µ_t ← GP posterior mean with θ_0 and inputs X_t using (4)
    σ_t² ← GP posterior variance with θ_0 and inputs X_t using (5)
end
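A compact one-dimensional sketch of the loop above (our illustration: an assumed squared-exponential kernel, a fixed finite grid as the arm set U, and a toy objective):

```python
import numpy as np

def rbf(X1, X2, ell=1.0):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell**2)

def gp_ucb(f, domain, T=30, noise_var=0.01, delta=0.1, seed=0):
    """GP-UCB sketch: pick the arm maximising mu_{t-1} + sqrt(beta_t) sigma_{t-1}."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for t in range(1, T + 1):
        beta = 2 * np.log(len(domain) * t**2 * np.pi**2 / (6 * delta))
        if not X:
            x = domain[rng.integers(len(domain))]  # first pull: random arm
        else:
            Xa, ya = np.array(X), np.array(y)
            K = rbf(Xa, Xa) + noise_var * np.eye(len(Xa))
            Ks = rbf(domain, Xa)
            mu = Ks @ np.linalg.solve(K, ya)                       # eq. (4)
            var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)  # eq. (5)
            ucb = mu + np.sqrt(beta) * np.sqrt(np.clip(var, 0, None))
            x = domain[np.argmax(ucb)]
        X.append(x)
        y.append(f(x) + rng.normal(0, np.sqrt(noise_var)))
    return np.array(X), np.array(y)

f = lambda x: -(x - 2.0) ** 2   # toy objective, maximum at x = 2
domain = np.linspace(0.0, 5.0, 101)
X, y = gp_ucb(f, domain, T=30)
```

Because β_t is large, the sketch explores broadly before concentrating its evaluations near the optimum.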

DLGP


Why learn hyperparameters?


DLGP algorithm

input: Input space D; GP prior µ_0 = 0, σ_0, k(·, ·), θ_0 and φ
θ* ← θ_0 and A_0 ← {∅}
for t = 1, 2, . . . do
    if t ∈ φ then
        θ_t ← h(θ_0, A_{t−1}, . . .)   select new hyperparameters
        θ* ← θ_t
    end
    Generate a set of m arms U := {u_1, . . . , u_m} ⊂ D
    Choose x_t ← argmax_{u∈U} { µ_{t−1}(u) + √β_t σ_{t−1}(u) }
    A_t ← {A_{t−1}, x_t}   add the last arm pulled to the set of pulled arms
    Sample y_t ← f(x_t) + ε_t
    µ_t ← GP posterior mean with θ* at inputs X_t using (4)
    σ_t² ← GP posterior variance with θ* at inputs X_t using (5)
end


DLGP algorithm: notation

φ is an increasing set of integers (ideally φ_1 > #(θ), the number of hyperparameters).

θ is the set of all hyperparameters.

A_t := {x_1, . . . , x_t} is the set of all pulled arms up to time t.


What about overfitting?

When using such flexible methods as GPs, there is always a risk of overfitting the data when computing θ. The model then explains the observed data very well but has low predictive power.

To avoid that we can use cross-validation. But that assumes i.i.d. data, which is not the case here.

So we need to find a method h(θ_0, A_{t−1}, . . .) that somehow manages to overcome this by removing the strong dependency within the dataset.


EM algorithm

E step (at recursion t):

r_{ik}^{(t)} := p_k^{(t−1)} N(x_i | µ_k^{(t−1)}, Σ_k^{(t−1)}) / Σ_{k′} p_{k′}^{(t−1)} N(x_i | µ_{k′}^{(t−1)}, Σ_{k′}^{(t−1)}).

M step (at recursion t):

µ_k^{(t)} = Σ_{i=1}^n r_{ik}^{(t)} x_i / Σ_{i=1}^n r_{ik}^{(t)},

Σ_k^{(t)} = Σ_{i=1}^n r_{ik}^{(t)} (x_i − µ_k^{(t)})(x_i − µ_k^{(t)})^⊤ / Σ_{i=1}^n r_{ik}^{(t)},

p_k^{(t)} = (1/n) Σ_{i=1}^n r_{ik}^{(t)}.
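A sketch of these two steps for a Gaussian mixture (our illustration; the deterministic initialisation and the small ridge added to each Σ_k are our choices, not part of the slides):

```python
import numpy as np

def em_gmm(X, K, iters=50):
    """EM for a Gaussian mixture: alternate the E step (responsibilities
    r_ik) and the M step (means, covariances, weights) shown above."""
    n, d = X.shape
    # Deterministic init (our choice): K points spread along the first coordinate.
    idx = np.argsort(X[:, 0])[np.linspace(0, n - 1, K).astype(int)]
    mu = X[idx].copy()
    Sigma = np.array([np.eye(d) for _ in range(K)])
    p = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E step: r_ik proportional to p_k N(x_i | mu_k, Sigma_k)
        r = np.zeros((n, K))
        for k in range(K):
            diff = X - mu[k]
            quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma[k]), diff)
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma[k]))
            r[:, k] = p[k] * np.exp(-0.5 * quad) / norm
        r /= r.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from the responsibilities
        Nk = r.sum(axis=0)
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        p = Nk / n
    return mu, Sigma, p

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])
mu, Sigma, p = em_gmm(X, K=2)
```

On two well-separated clusters the estimated means recover the cluster centres.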


Silhouette coefficient

Let us consider the K-means algorithm, which determines cluster means µ_1, . . . , µ_K such that

min_{µ_1,...,µ_K} Σ_{k=1}^K Σ_{i∈C_k} ‖x_i − µ_k‖².

We find the number of clusters K by using the silhouette coefficient s(i) obtained for each observation i in the sample. Let

a_i := (1/|C_i|) Σ_{j∈C_i} ‖x_i − x_j‖ and b_i := min_{C≠C_i} (1/|C|) Σ_{j∈C} ‖x_i − x_j‖.

Then

s(i) := (b_i − a_i) / max(a_i, b_i),

and by construction s(i) ∈ [−1, 1]. In the case where the cluster C_i only contains i, we set s(i) = 0.
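A direct implementation of s(i) as defined above, including the singleton convention (our illustration; note that a_i averages over all of C_i, including i itself, exactly as in the definition):

```python
import numpy as np

def silhouette(X, labels):
    """Silhouette coefficient s(i) = (b_i - a_i) / max(a_i, b_i) per point,
    with s(i) = 0 for singleton clusters."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    s = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s(i) stays 0
        a = D[i, same].mean()  # average distance within own cluster (incl. i)
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
good = silhouette(X, np.array([0, 0, 1, 1]))
bad = silhouette(X, np.array([0, 1, 0, 1]))
```

On two tight, well-separated clusters the coefficients are close to 1, while a deliberately wrong labelling scores lower on average.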


Algorithm for h(θ0,At−1, ...) (1)

input: A_t, θ_0, w
for k = 2, 3, . . . do
    run k-means
    s̄c(k) ← average silhouette s(i)
    if max{s̄c(k), . . . , s̄c(k − w + 1)} < s̄c(k − w) then
        K ← k − w
        leave loop
    end
end


Algorithm for h(θ0,At−1, ...) (2)

t ← 0
while EM not converged do
    t ← t + 1
    r_{ik}^{(t)} ← E step (for K clusters)
    (µ_k^{(t)}, Σ_k^{(t)}, p_k^{(t)}) ← M step (for K clusters)
end
for i = 1, 2, . . . , n do
    P(x_i) = Σ_{k=1}^K p_k^{(t)} N(x_i | µ_k^{(t)}, Σ_k^{(t)})
    γ ← U(0, 1)
    f(x_i) ← γ P(x_i)
end
Δ ← only keep the n − h lowest values in {f(x_1), . . . , f(x_n)}
perform LOO on Δ and use conjugate gradient to maximise L_LOO over θ

DLGP vs. GP bandits


Metrics: Euclidean distance

Euclidean distance of the j-th closest element to the optimum:

D_t^{(j)} := min{ ‖x_i − x_opt‖ : x_i ∈ A_t^{(j)} },

where A_t^{(j)} corresponds to A_t once the j − 1 closest elements to the optimum have been removed. Then D̄_t^{(j)} is the average of D_t^{(j)} over all generated sample paths under the same distribution.


Metrics: Average regret

The average over all sample paths of the individual average regret over time:

R̄_t := (1/(m t)) Σ_{ℓ=1}^m R_{t,ℓ}, where R_{t,ℓ} := t µ_{i*} − Σ_{j=1}^t r_j^ℓ,

with the additional index ℓ in the definition of the regret indicating which of the m sample paths it corresponds to. Note that again we average R̄_t over different runs to gain additional robustness.


Methodology

At every step t of every experiment, a set of arms to pull, U_t, is randomly generated and then U_t is presented to both algorithms.

Once they have chosen an arm u_t ∈ U_t, the same random noise value ε_t blurs the observation.

For the two-dimensional function simulations, Gaussian noise N(0, 0.3) was used.

The parameters of DLGP were set as follows: φ_1 = 20, then increasing in steps of 50, i.e. φ_{i+1} = φ_i + 50.


Rastrigin 2D

[figure: surface plot of f(x, y) on [−5, 5]², values from 0 to about 90, with the corresponding contour plot]

Rastrigin 2D

[figure: evaluated points by GP−UCB and by DLGP; average convergence speed (distance to optimum vs. number of function evaluations, 0–500); average regret for GP−UCB and DLGP]

Griewank 2D

[figure: surface plot of f(x, y) on [−100, 100]², values from 0 to about 7, with the corresponding contour plot]

Griewank 2D

[figure: evaluated points by GP−UCB and by DLGP; average convergence speed (distance to optimum vs. number of function evaluations, 0–500); average regret for GP−UCB and DLGP]

Ackley's 2D

[figure: surface plot of f(x, y) on [−100, 100]², values from 0 to about 25, with the corresponding contour plot]

Ackley's 2D

[figure: evaluated points by GP−UCB and by DLGP; average convergence speed (distance to optimum vs. number of function evaluations, 0–500); average regret for GP−UCB and DLGP]


Comparison of D_500^(5) for DLGP and GP–UCB on 2D functions

         DLGP                                        GP–UCB
     Best    Median  Worst   Mean    Std         Best   Median  Worst  Mean   Std
F1   0.0724  0.157   0.256   0.144   0.0757      0.429  0.454   1.27   0.607  0.368
F2   0.725   0.881   1.38    0.954   0.262       6.82   6.96    8.08   7.16   0.515
F3   0.291   0.529   0.773   0.521   0.176       8.74   10.2    10.4   9.75   0.821

Conclusion


Take Home Message

Bandits allow us to optimise easily without highly restrictive assumptions.

GP bandits exploit dependencies in the observations.

DLGP converges up to 20 times faster to the optimum.

Exciting opportunities to further improve DLGP:

Use other surrogate optimising algorithms.
Learn kernels (not only hyperparameters).
Use DLGP instead of GP bandits.


Thank you.


ANNEXES


The functions in our test suite include the Rastrigin (F1), Griewank (F2) and Ackley's (F3) functions. The search domain for F1 is [−5, 5]^D, while it is [−100, 100]^D for F2 and F3. All of them are multimodal and only F2 is non-separable.

F1(x) = 10D + Σ_{i=1}^D [ x_i² − 10 cos(2π x_i) ]

F2(x) = Σ_{i=1}^D x_i² / 4000 − Π_{i=1}^D cos(x_i / √i) + 1

F3(z) = −20 exp( −0.2 √( (1/D) Σ_{i=1}^D z_i² ) ) − exp( (1/D) Σ_{i=1}^D cos(2π z_i) ) + 20 + e
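The three test functions, implemented directly from their standard definitions (all have global minimum 0 at the origin):

```python
import numpy as np

def rastrigin(x):
    """F1: global minimum 0 at x = 0; domain [-5, 5]^D."""
    x = np.asarray(x, dtype=float)
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def griewank(x):
    """F2: global minimum 0 at x = 0; domain [-100, 100]^D."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return np.sum(x**2 / 4000) - np.prod(np.cos(x / np.sqrt(i))) + 1

def ackley(z):
    """F3: global minimum 0 at z = 0; domain [-100, 100]^D."""
    z = np.asarray(z, dtype=float)
    D = z.size
    return (-20 * np.exp(-0.2 * np.sqrt(np.sum(z**2) / D))
            - np.exp(np.sum(np.cos(2 * np.pi * z)) / D)
            + 20 + np.e)
```

These are the objectives one would pass to the bandit algorithms compared above.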
