Prediction, Learning and Games - Chapter 4: Randomized prediction

Walid Krichene

November 14, 2013


Experts framework

Recall. On iteration t:
experts reveal their advice f_{i,t}
the forecaster makes a decision p_t = Σ_{i=1}^N w_{i,t} f_{i,t}
the losses ℓ(f_{i,t}, y_t) and ℓ(p_t, y_t) are revealed
the forecaster updates the weights w_{i,t+1}
ℓ(·, y_t) is convex

Definition (Regret)
R_{i,T} = L_T − L_{i,T} = Σ_{t=1}^T [ℓ(p_t, y_t) − ℓ(f_{i,t}, y_t)]

Goal: R_T / T = o(1), where R_T = max_i R_{i,T}.
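As a concrete illustration of one round of this protocol, here is a tiny sketch (the numbers, the squared loss, and the variable names are arbitrary, not from the book):

import numpy as np

# One round of the experts protocol: combine the experts' advice with the
# forecaster's current (normalized) weights, then evaluate a convex loss.
w_t = np.array([0.2, 0.5, 0.3])      # weights w_{i,t}, already normalized
f_t = np.array([0.1, 0.7, 0.4])      # experts' advice f_{i,t}
p_t = w_t @ f_t                      # forecaster's decision p_t = Σ_i w_{i,t} f_{i,t}

def squared_loss(p, y):              # an example of a convex loss ℓ(·, y)
    return (p - y) ** 2

y_t = 1.0
print(squared_loss(p_t, y_t))                  # ℓ(p_t, y_t)
print([squared_loss(f, y_t) for f in f_t])     # ℓ(f_{i,t}, y_t) for each expert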


Multiplicative weight algorithms

Definition (Hedge algorithm)
w_{i,t+1} ∝ w_{i,t} exp(−γ ℓ(f_{i,t}, y_t))

Average regret: R_T / T ≤ ln N / (γT) + γ / 8

Taking γ = O(√(ln N / T)) yields R_T / T ≤ O(√(ln N / T)).

Small losses: R_T / T ≤ (1/T) (γ / (1 − e^{−γ}) − 1) L*_i + (1/T) · ln N / (1 − e^{−γ})
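A minimal NumPy sketch of this exponential-weights update (the function name, interface, and loss-matrix layout are illustrative assumptions, not the book's code):

import numpy as np

def hedge_weights(losses, gamma):
    """One pass of the Hedge update w_{i,t+1} ∝ w_{i,t} exp(-gamma * loss) over a
    T x N loss matrix, where losses[t, i] = ℓ(f_{i,t}, y_t).
    Returns the sequence of normalized weight vectors p_1, ..., p_T."""
    T, N = losses.shape
    w = np.ones(N)                               # uniform initial weights
    history = []
    for t in range(T):
        p = w / w.sum()                          # distribution used at round t
        history.append(p)
        w = w * np.exp(-gamma * losses[t])       # multiplicative update
    return np.array(history)

With γ of order √(ln N / T) (e.g. γ = √(8 ln N / T)), this is the tuning behind the O(√(ln N / T)) average-regret bound above.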


Motivation for randomization

What if the decision set is non-convex? E.g. {0, 1}.
Also: any deterministic algorithm can be forced to incur Ω(T) loss (linear regret).
Solution (to both): randomize.

Setting

Actions {1, 2, . . . , N} = [N]
The forecaster maintains a probability distribution (p_{i,t})_{i ∈ [N]}
and randomly picks an action I_t ∼ p_t.
The losses ℓ(i, y_t) are revealed.
The sequence y_1, . . . , y_T can be fixed a priori (oblivious opponent) or can depend on the player's decisions.
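A minimal sketch of this randomized protocol (variable names and the placeholder losses are illustrative):

import numpy as np

# Each round: keep a distribution p_t over the N actions, draw I_t ~ p_t,
# then the losses ℓ(i, y_t) of all actions are revealed.
rng = np.random.default_rng(0)
N, T = 5, 100
p_t = np.full(N, 1.0 / N)                 # initial distribution over actions
total_loss = 0.0
for t in range(T):
    I_t = rng.choice(N, p=p_t)            # randomly pick an action I_t ~ p_t
    losses_t = rng.uniform(size=N)        # stand-in for the revealed losses ℓ(i, y_t)
    total_loss += losses_t[I_t]           # loss actually incurred, ℓ(I_t, y_t)
    # ... update p_t from losses_t, e.g. with the Hedge update sketched above ...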


Regret

Definition (Expected loss, conditioned on past plays)
ℓ̄(p_t, Y_t) = Σ_{i=1}^N ℓ(i, Y_t) p_{i,t}

Definition (Expected regret)
R_{i,T} = Σ_{t=1}^T [ℓ̄(p_t, Y_t) − ℓ(i, Y_t)]

Regret

Note: p_{t+1} only depends on p_t and ℓ(i, Y_t) (not on the forecaster's randomization).

Strategy: bound the expected regret
R_T / T ≤ B(T)
then with high probability (≥ 1 − δ)
R_T / T ≤ B(T) + √(ln(1/δ) / T)
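The high-probability step is the usual martingale argument, not spelled out on the slide; a sketch, assuming losses in [0, 1] and an oblivious opponent: X_t = ℓ(I_t, Y_t) − ℓ̄(p_t, Y_t) is a bounded martingale difference sequence, so by the Hoeffding–Azuma inequality, for every δ > 0, with probability at least 1 − δ,

\frac{1}{T}\sum_{t=1}^{T}\Big(\ell(I_t, Y_t) - \bar{\ell}(p_t, Y_t)\Big) \;\le\; \sqrt{\frac{\ln(1/\delta)}{2T}}

which is the √(ln(1/δ) / T) term above, up to the constant.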


Back to experts

Can think of this as:
The strategy space is the simplex over actions: Δ_N.
Every expert has constant advice: f_{i,t} is the i-th vertex of the simplex.
The decision of the forecaster is a convex combination of vertices.
The loss function ℓ(·, Y_t) is linear: the (expected) loss incurred by the forecaster is
Σ_i p_{t,i} ℓ(f_{i,t}, Y_t) = ℓ(Σ_i p_{t,i} f_{i,t}, Y_t) = ℓ(p_t, Y_t)
Write the expected loss as 〈ℓ_t, p_t〉 where ℓ_{t,i} = ℓ(f_{i,t}, Y_t).
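A tiny numerical illustration of this linear-loss view (the numbers are arbitrary):

import numpy as np

loss_vector = np.array([0.3, 0.9, 0.1])   # ℓ_{t,i} = ℓ(i, Y_t) for N = 3 actions
p_t = np.array([0.5, 0.25, 0.25])         # forecaster's distribution over actions
expected_loss = loss_vector @ p_t         # = Σ_i p_{t,i} ℓ(i, Y_t) = 〈ℓ_t, p_t〉
print(expected_loss)                      # 0.4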


Hedge algorithm

Regularized greedy algorithm: Hedge is the solution to
p_{t+1} = argmin_{p ∈ Δ_N} 〈ℓ_t, p〉 + (1/γ) D_KL(p ‖ p_t)
where D_KL is the Kullback–Leibler divergence D_KL(p ‖ q) = Σ_i p_i ln(p_i / q_i).

Also
p_{t+1} = argmin_{p ∈ Δ_N} 〈L_t, p〉 − (1/γ) H(p)
where H is the entropy H(p) = −Σ_i p_i ln p_i.

Connection with stochastic optimization (last week).
The greedy (unregularized) strategy is called fictitious play.
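A small numerical sanity check that the multiplicative update solves this KL-regularized problem (the helper below and the use of scipy's generic solver are my own illustration, not the book's code):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, gamma = 4, 0.5
p_t = rng.dirichlet(np.ones(N))          # current distribution over actions
l_t = rng.uniform(size=N)                # current loss vector ℓ_t

# Closed form: p_{t+1,i} ∝ p_{t,i} exp(-gamma * ℓ_{t,i})  (the Hedge update).
closed_form = p_t * np.exp(-gamma * l_t)
closed_form /= closed_form.sum()

# Direct numerical minimization of <ℓ_t, p> + (1/gamma) KL(p || p_t) over the simplex.
def objective(p):
    return l_t @ p + (1.0 / gamma) * np.sum(p * np.log(p / p_t))

constraints = ({"type": "eq", "fun": lambda p: p.sum() - 1.0},)
res = minimize(objective, p_t, bounds=[(1e-9, 1.0)] * N, constraints=constraints)

print(np.allclose(res.x, closed_form, atol=1e-4))   # expected: True, up to solver tolerance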


Follow the perturbed leader

The regularized (perturbed) problem is called follow the perturbed leader.
The book proves it in a more general case.

Theorem (Theorem 4.2)
Let I_t = argmin_i (L_{i,t−1} + Z_{i,t}). Then

R_T / T ≤ (1/T) ( E[max_i Z_{i,1}] + E[max_i (−Z_{i,1})] + Σ_t ∫ F_t(z) (f_Z(z) − f_Z(z − ℓ_t)) dz )

where F_t(z) = ℓ(i_t(z), y_t) and f_Z is the density of Z.
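A minimal sketch of the forecaster in Theorem 4.2 (the function name, interface, and the choice of two-sided exponential perturbations are assumptions for illustration; the theorem covers general perturbation densities f_Z):

import numpy as np

def follow_the_perturbed_leader(losses, eta, rng=None):
    """Play I_t = argmin_i (L_{i,t-1} + Z_{i,t}) with fresh perturbations each round.
    losses[t, i] is the loss of action i at round t; the Z's are i.i.d. two-sided
    exponential (Laplace) with scale 1/eta, one common choice.
    Returns the sequence of played actions."""
    if rng is None:
        rng = np.random.default_rng()
    T, N = losses.shape
    cumulative = np.zeros(N)                           # L_{i,t-1}
    plays = []
    for t in range(T):
        z = rng.laplace(loc=0.0, scale=1.0 / eta, size=N)
        plays.append(int(np.argmin(cumulative + z)))   # perturbed leader
        cumulative += losses[t]
    return plays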


Internal regret

Regret analyzed so far: external regret. Compare your cumulative loss to the cumulative loss of a single expert / action.
Internal regret: swap all instances of action i with action j.

Definition (Internal regret)
R_{i,j,T} = Σ_{t=1}^T p_{i,t} (ℓ(i, Y_t) − ℓ(j, Y_t))

The internal regret is max_{i,j} R_{i,j,T}.
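A small utility sketch for computing the internal regret from the played distributions and the revealed losses (array layout and names are illustrative):

import numpy as np

def internal_regret(p, losses):
    """max_{i,j} R_{i,j,T} = max_{i,j} Σ_t p[t, i] * (losses[t, i] - losses[t, j]),
    where p[t, i] is the probability put on action i at round t and losses[t, i] = ℓ(i, Y_t)."""
    # r[i, j] = Σ_t p[t, i] * losses[t, i]  -  Σ_t p[t, i] * losses[t, j]
    r = (p * losses).sum(axis=0)[:, None] - p.T @ losses
    return r.max()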


Internal regret

In a sense, stronger than external regret:

Σ_i R_{i,j,T} = Σ_{t=1}^T Σ_i p_{t,i} (ℓ_{t,i} − ℓ_{t,j})
             = Σ_{t=1}^T (〈p_t, ℓ_t〉 − ℓ_{t,j})
             = R_{j,T}

so max_j R_{j,T} ≤ N max_{i,j} R_{i,j,T}.

The weighted average forecaster can have large internal regret, Ω(T).
But it can be adapted to have small internal regret.


Minimizing internal regret

Define modified strategies p^{i→j}_t:
p^{i→j}_{t,i} ← 0
p^{i→j}_{t,j} ← p_{t,i} + p_{t,j}
(all other components unchanged).

Apply an algorithm that minimizes external regret to the set of new experts i → j, i ≠ j: O(N²) experts for the new algorithm.
An action is a modified strategy p^{i→j}_t.
This results in a sequence of meta-strategies µ_t ∈ Δ_{N²}.
The fixed point p_t = Σ_{(i,j)} µ_{t,(i,j)} p^{i→j}_t minimizes internal regret;
it can be computed using Gaussian elimination (write p_t = A(µ_t) p_t); see the sketch below.
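A sketch of the fixed-point computation p_t = A(µ_t) p_t (the construction of A(µ) and the least-squares solve standing in for explicit Gaussian elimination are my reading of the slide, not the book's code):

import numpy as np

def internal_to_distribution(mu):
    """Given meta-weights mu[i, j] over the modified experts i -> j (zero diagonal,
    nonnegative, summing to 1), return the fixed point p = A(mu) p.
    A(mu) is column-stochastic, so p is its stationary distribution; we solve
    (A - I) p = 0 together with sum(p) = 1."""
    N = mu.shape[0]
    A = np.zeros((N, N))
    for m in range(N):
        A[m, m] = mu.sum() - mu[m, :].sum()   # pairs (i, j) with i != m leave coordinate m alone
        A[:, m] += mu[m, :]                   # pairs (m, j) move the mass of m onto j
    M = np.vstack([A - np.eye(N), np.ones((1, N))])
    b = np.zeros(N + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    return p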


Generalized regret

Experts react to the prediction: f_{i,t}(I_t).
Also, an activation function A_{i,t}(k).

Definition (Generalized regret)
r_{i,t} = Σ_{k=1}^N p_{t,k} A_{i,t}(k) (ℓ(k, Y_t) − ℓ(f_{i,t}(k), Y_t))

External regret is a special case: set A_{i,t}(k) = 1 identically, and f_{i,t}(k) = i.
Internal regret is a special case: consider experts (i, j), i ≠ j, and set
f_{(i,j),t}(k) = j if k = i, and f_{(i,j),t}(k) = k otherwise.
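A small illustrative helper showing how the two special cases fit the definition (names, interface, and the example numbers are assumptions, not from the book):

import numpy as np

def generalized_regret_step(p_t, loss_t, departure_fns, activations=None):
    """r_{i,t} = Σ_k p_{t,k} A_{i,t}(k) (ℓ(k, Y_t) - ℓ(f_{i,t}(k), Y_t)) for each expert i.
    departure_fns[i] is the map k -> f_{i,t}(k); activations[i][k] = A_{i,t}(k), default 1."""
    N = len(p_t)
    if activations is None:
        activations = [np.ones(N)] * len(departure_fns)
    regrets = []
    for f, a in zip(departure_fns, activations):
        mapped = np.array([loss_t[f(k)] for k in range(N)])
        regrets.append(float(np.sum(p_t * a * (loss_t - mapped))))
    return regrets

# External regret w.r.t. action 1: f(k) = 1 for every k.
# Internal regret for the pair (0, 1): f(k) = 1 if k == 0 else k.
p_t = np.array([0.5, 0.3, 0.2])
loss_t = np.array([0.9, 0.1, 0.4])
print(generalized_regret_step(p_t, loss_t, [lambda k: 1, lambda k: 1 if k == 0 else k]))  # ≈ [0.46, 0.40]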


Generalized regret

Given a potential φ, there exists a forecaster that satisfies the Blackwell condition (Theorem 4.3).

Recall the Blackwell condition:
〈r_t, ∇φ(R_{t−1})〉 ≤ 0

From the Blackwell condition one can derive a bound on the regret.



Next week

Chapter 5 (Efficient forecasting for special classes of experts) or Chapter 6 (Limited information: multi-armed bandit versions)
