Prediction, Learning and Games - Chapter 4: Randomized prediction

Walid Krichene

November 14, 2013


Experts framework

Recall. On iteration t:
experts reveal their advice f_{i,t}
the forecaster makes a decision p_t = Σ_{i=1}^N w_{i,t} f_{i,t}
the losses ℓ(f_{i,t}, y_t) and ℓ(p_t, y_t) are revealed
the forecaster updates the weights w_{i,t+1}
ℓ(·, y_t) is convex

Definition (Regret)
R_{i,T} = L_T − L_{i,T} = Σ_{t=1}^T [ℓ(p_t, y_t) − ℓ(f_{i,t}, y_t)]

Goal: R_T / T = o(1), where R_T = max_i R_{i,T}.
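As a concrete illustration of one round of this protocol, here is a tiny sketch (the numbers, the squared loss, and the variable names are arbitrary, not from the book):

import numpy as np

# One round of the experts protocol: combine the experts' advice with the
# forecaster's current (normalized) weights, then evaluate a convex loss.
w_t = np.array([0.2, 0.5, 0.3])      # weights w_{i,t}, already normalized
f_t = np.array([0.1, 0.7, 0.4])      # experts' advice f_{i,t}
p_t = w_t @ f_t                      # forecaster's decision p_t = Σ_i w_{i,t} f_{i,t}

def squared_loss(p, y):              # an example of a convex loss ℓ(·, y)
    return (p - y) ** 2

y_t = 1.0
print(squared_loss(p_t, y_t))                  # ℓ(p_t, y_t)
print([squared_loss(f, y_t) for f in f_t])     # ℓ(f_{i,t}, y_t) for each expert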


Multiplicative weight algorithms

Definition (Hedge algorithm)
w_{i,t+1} ∝ w_{i,t} exp(−γ ℓ(f_{i,t}, y_t))

Average regret: R_T / T ≤ ln N / (γT) + γ / 8

Taking γ = O(√(ln N / T)) yields R_T / T ≤ O(√(ln N / T)).

Small losses: R_T / T ≤ (1/T) (γ / (1 − e^{−γ}) − 1) L*_i + (1/T) · ln N / (1 − e^{−γ})
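A minimal NumPy sketch of this exponential-weights update (the function name, interface, and loss-matrix layout are illustrative assumptions, not the book's code):

import numpy as np

def hedge_weights(losses, gamma):
    """One pass of the Hedge update w_{i,t+1} ∝ w_{i,t} exp(-gamma * loss) over a
    T x N loss matrix, where losses[t, i] = ℓ(f_{i,t}, y_t).
    Returns the sequence of normalized weight vectors p_1, ..., p_T."""
    T, N = losses.shape
    w = np.ones(N)                               # uniform initial weights
    history = []
    for t in range(T):
        p = w / w.sum()                          # distribution used at round t
        history.append(p)
        w = w * np.exp(-gamma * losses[t])       # multiplicative update
    return np.array(history)

With γ of order √(ln N / T) (e.g. γ = √(8 ln N / T)), this is the tuning behind the O(√(ln N / T)) average-regret bound above.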


Motivation for randomization

What if the decision set is non-convex? E.g. {0, 1}.
Also: any deterministic algorithm can be forced to incur Ω(T) loss (linear regret).
Solution (to both): randomize.

Setting

Actions {1, 2, . . . , N} = [N]
The forecaster maintains a probability distribution (p_{i,t})_{i ∈ [N]}
and randomly picks an action I_t ∼ p_t.
The losses ℓ(i, y_t) are revealed.
The sequence y_1, . . . , y_T can be fixed a priori (oblivious opponent) or can depend on the player's decisions.
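A minimal sketch of this randomized protocol (variable names and the placeholder losses are illustrative):

import numpy as np

# Each round: keep a distribution p_t over the N actions, draw I_t ~ p_t,
# then the losses ℓ(i, y_t) of all actions are revealed.
rng = np.random.default_rng(0)
N, T = 5, 100
p_t = np.full(N, 1.0 / N)                 # initial distribution over actions
total_loss = 0.0
for t in range(T):
    I_t = rng.choice(N, p=p_t)            # randomly pick an action I_t ~ p_t
    losses_t = rng.uniform(size=N)        # stand-in for the revealed losses ℓ(i, y_t)
    total_loss += losses_t[I_t]           # loss actually incurred, ℓ(I_t, y_t)
    # ... update p_t from losses_t, e.g. with the Hedge update sketched above ...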


Regret

Definition (Expected loss, conditioned on past plays)
ℓ̄(p_t, Y_t) = Σ_{i=1}^N ℓ(i, Y_t) p_{i,t}

Definition (Expected regret)
R_{i,T} = Σ_{t=1}^T [ℓ̄(p_t, Y_t) − ℓ(i, Y_t)]

Regret

Note: p_{t+1} only depends on p_t and ℓ(i, Y_t) (not on the forecaster's randomization).

Strategy: bound the expected regret
R_T / T ≤ B(T)
then with high probability (≥ 1 − δ)
R_T / T ≤ B(T) + √(ln(1/δ) / T)
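The high-probability step is the usual martingale argument, not spelled out on the slide; a sketch, assuming losses in [0, 1] and an oblivious opponent: X_t = ℓ(I_t, Y_t) − ℓ̄(p_t, Y_t) is a bounded martingale difference sequence, so by the Hoeffding–Azuma inequality, for every δ > 0, with probability at least 1 − δ,

\frac{1}{T}\sum_{t=1}^{T}\Big(\ell(I_t, Y_t) - \bar{\ell}(p_t, Y_t)\Big) \;\le\; \sqrt{\frac{\ln(1/\delta)}{2T}}

which is the √(ln(1/δ) / T) term above, up to the constant.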


Back to experts

Can think of this as:
The strategy space is the simplex over actions: Δ_N.
Every expert has constant advice: f_{i,t} is the i-th vertex of the simplex.
The decision of the forecaster is a convex combination of vertices.
The loss function ℓ(·, Y_t) is linear: the (expected) loss incurred by the forecaster is
Σ_i p_{t,i} ℓ(f_{i,t}, Y_t) = ℓ(Σ_i p_{t,i} f_{i,t}, Y_t) = ℓ(p_t, Y_t)
Write the expected loss as 〈ℓ_t, p_t〉 where ℓ_{t,i} = ℓ(f_{i,t}, Y_t).
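A tiny numerical illustration of this linear-loss view (the numbers are arbitrary):

import numpy as np

loss_vector = np.array([0.3, 0.9, 0.1])   # ℓ_{t,i} = ℓ(i, Y_t) for N = 3 actions
p_t = np.array([0.5, 0.25, 0.25])         # forecaster's distribution over actions
expected_loss = loss_vector @ p_t         # = Σ_i p_{t,i} ℓ(i, Y_t) = 〈ℓ_t, p_t〉
print(expected_loss)                      # 0.4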


Hedge algorithm

Regularized greedy algorithm: Hedge is the solution to
p_{t+1} = argmin_{p ∈ Δ_N} 〈ℓ_t, p〉 + (1/γ) D_KL(p ‖ p_t)
where D_KL is the Kullback–Leibler divergence D_KL(p ‖ q) = Σ_i p_i ln(p_i / q_i).

Also
p_{t+1} = argmin_{p ∈ Δ_N} 〈L_t, p〉 − (1/γ) H(p)
where H is the entropy H(p) = −Σ_i p_i ln p_i.

Connection with stochastic optimization (last week).
The greedy (unregularized) strategy is called fictitious play.
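A small numerical sanity check that the multiplicative update solves this KL-regularized problem (the helper below and the use of scipy's generic solver are my own illustration, not the book's code):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, gamma = 4, 0.5
p_t = rng.dirichlet(np.ones(N))          # current distribution over actions
l_t = rng.uniform(size=N)                # current loss vector ℓ_t

# Closed form: p_{t+1,i} ∝ p_{t,i} exp(-gamma * ℓ_{t,i})  (the Hedge update).
closed_form = p_t * np.exp(-gamma * l_t)
closed_form /= closed_form.sum()

# Direct numerical minimization of <ℓ_t, p> + (1/gamma) KL(p || p_t) over the simplex.
def objective(p):
    return l_t @ p + (1.0 / gamma) * np.sum(p * np.log(p / p_t))

constraints = ({"type": "eq", "fun": lambda p: p.sum() - 1.0},)
res = minimize(objective, p_t, bounds=[(1e-9, 1.0)] * N, constraints=constraints)

print(np.allclose(res.x, closed_form, atol=1e-4))   # expected: True, up to solver tolerance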


Follow the perturbed leader

The regularized (perturbed) problem is called follow the perturbed leader.
The book proves it in a more general case.

Theorem (Theorem 4.2)
Let I_t = argmin_i (L_{i,t−1} + Z_{i,t}). Then

R_T / T ≤ (1/T) ( E[max_i Z_{i,1}] + E[max_i (−Z_{i,1})] + Σ_t ∫ F_t(z) (f_Z(z) − f_Z(z − ℓ_t)) dz )

where F_t(z) = ℓ(i_t(z), y_t) and f_Z is the density of Z.
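A minimal sketch of the forecaster in Theorem 4.2 (the function name, interface, and the choice of two-sided exponential perturbations are assumptions for illustration; the theorem covers general perturbation densities f_Z):

import numpy as np

def follow_the_perturbed_leader(losses, eta, rng=None):
    """Play I_t = argmin_i (L_{i,t-1} + Z_{i,t}) with fresh perturbations each round.
    losses[t, i] is the loss of action i at round t; the Z's are i.i.d. two-sided
    exponential (Laplace) with scale 1/eta, one common choice.
    Returns the sequence of played actions."""
    if rng is None:
        rng = np.random.default_rng()
    T, N = losses.shape
    cumulative = np.zeros(N)                           # L_{i,t-1}
    plays = []
    for t in range(T):
        z = rng.laplace(loc=0.0, scale=1.0 / eta, size=N)
        plays.append(int(np.argmin(cumulative + z)))   # perturbed leader
        cumulative += losses[t]
    return plays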


Internal regret

Regret analyzed so far: external regret. Compare your cumulative loss to the cumulative loss of a single expert / action.
Internal regret: swap all instances of action i with action j.

Definition (Internal regret)
R_{i,j,T} = Σ_{t=1}^T p_{i,t} (ℓ(i, Y_t) − ℓ(j, Y_t))

The internal regret is max_{i,j} R_{i,j,T}.
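A small utility sketch for computing the internal regret from the played distributions and the revealed losses (array layout and names are illustrative):

import numpy as np

def internal_regret(p, losses):
    """max_{i,j} R_{i,j,T} = max_{i,j} Σ_t p[t, i] * (losses[t, i] - losses[t, j]),
    where p[t, i] is the probability put on action i at round t and losses[t, i] = ℓ(i, Y_t)."""
    # r[i, j] = Σ_t p[t, i] * losses[t, i]  -  Σ_t p[t, i] * losses[t, j]
    r = (p * losses).sum(axis=0)[:, None] - p.T @ losses
    return r.max()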


Internal regret

In a sense, stronger than external regret:

Σ_i R_{i,j,T} = Σ_{t=1}^T Σ_i p_{t,i} (ℓ_{t,i} − ℓ_{t,j})
             = Σ_{t=1}^T (〈p_t, ℓ_t〉 − ℓ_{t,j})
             = R_{j,T}

so max_j R_{j,T} ≤ N max_{i,j} R_{i,j,T}.

The weighted average forecaster can have large internal regret, Ω(T).
But it can be adapted to have small internal regret.


Minimizing internal regret

Define modified strategies p^{i→j}_t:
p^{i→j}_{t,i} ← 0
p^{i→j}_{t,j} ← p_{t,i} + p_{t,j}
(all other components unchanged).

Apply an algorithm that minimizes external regret to the set of new experts i → j, i ≠ j: O(N²) experts for the new algorithm.
An action is a modified strategy p^{i→j}_t.
This results in a sequence of meta-strategies µ_t ∈ Δ_{N²}.
The fixed point p_t = Σ_{(i,j)} µ_{t,(i,j)} p^{i→j}_t minimizes internal regret;
it can be computed using Gaussian elimination (write p_t = A(µ_t) p_t); see the sketch below.
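A sketch of the fixed-point computation p_t = A(µ_t) p_t (the construction of A(µ) and the least-squares solve standing in for explicit Gaussian elimination are my reading of the slide, not the book's code):

import numpy as np

def internal_to_distribution(mu):
    """Given meta-weights mu[i, j] over the modified experts i -> j (zero diagonal,
    nonnegative, summing to 1), return the fixed point p = A(mu) p.
    A(mu) is column-stochastic, so p is its stationary distribution; we solve
    (A - I) p = 0 together with sum(p) = 1."""
    N = mu.shape[0]
    A = np.zeros((N, N))
    for m in range(N):
        A[m, m] = mu.sum() - mu[m, :].sum()   # pairs (i, j) with i != m leave coordinate m alone
        A[:, m] += mu[m, :]                   # pairs (m, j) move the mass of m onto j
    M = np.vstack([A - np.eye(N), np.ones((1, N))])
    b = np.zeros(N + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    return p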


Generalized regret

Experts react to the prediction: f_{i,t}(I_t).
Also, an activation function A_{i,t}(k).

Definition (Generalized regret)
r_{i,t} = Σ_{k=1}^N p_{t,k} A_{i,t}(k) (ℓ(k, Y_t) − ℓ(f_{i,t}(k), Y_t))

External regret is a special case: set A_{i,t}(k) = 1 identically, and f_{i,t}(k) = i.
Internal regret is a special case: consider experts (i, j), i ≠ j, and set
f_{(i,j),t}(k) = j if k = i, and f_{(i,j),t}(k) = k otherwise.
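A small illustrative helper showing how the two special cases fit the definition (names, interface, and the example numbers are assumptions, not from the book):

import numpy as np

def generalized_regret_step(p_t, loss_t, departure_fns, activations=None):
    """r_{i,t} = Σ_k p_{t,k} A_{i,t}(k) (ℓ(k, Y_t) - ℓ(f_{i,t}(k), Y_t)) for each expert i.
    departure_fns[i] is the map k -> f_{i,t}(k); activations[i][k] = A_{i,t}(k), default 1."""
    N = len(p_t)
    if activations is None:
        activations = [np.ones(N)] * len(departure_fns)
    regrets = []
    for f, a in zip(departure_fns, activations):
        mapped = np.array([loss_t[f(k)] for k in range(N)])
        regrets.append(float(np.sum(p_t * a * (loss_t - mapped))))
    return regrets

# External regret w.r.t. action 1: f(k) = 1 for every k.
# Internal regret for the pair (0, 1): f(k) = 1 if k == 0 else k.
p_t = np.array([0.5, 0.3, 0.2])
loss_t = np.array([0.9, 0.1, 0.4])
print(generalized_regret_step(p_t, loss_t, [lambda k: 1, lambda k: 1 if k == 0 else k]))  # ≈ [0.46, 0.40]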


Generalized regret

Given a potential φ, there exists a forecaster that satisfies the Blackwell condition (Theorem 4.3).

Recall the Blackwell condition:
〈r_t, ∇φ(R_{t−1})〉 ≤ 0

From the Blackwell condition one can derive a bound on the regret.



Next week

Chapter 5 (Efficient forecasting for special classes of experts) or Chapter 6 (Limited information: multi-armed bandit versions)
