r.us mean eexp
TRANSCRIPT
sub Gaussian R.US
A meanzero RV Z is d sub Gaussian if E expX2 Eexp O'a HER
Chernoff bound w sub Gaussian RU.s
If Xi Xa are IID RV w Elk M and X M is 82sub Gaussian then
IPC Xi p e e exp it S
If XE 0,13 then take a 0,6 1 X Ely is t sus GaussianHoeffding's Lemma
If XE La b the X Efx is bid sub GaussianUnion Bound or sub additivity of probability
Let A An be events in a welldefined probability space
00ther IPC A 3 E ÉPCA
Given n treatments Input horizon TEIN
for til 2 T
Patient t arrives w illness
prescribes treatment It Eln l inAgent
Observe XI t patient t recovers given It
Let Xi patient t recovers given i
Assume for each ie n Xi É are IID w E Xi O
Questionsperhaps Xia e
How long does it take to identify the best treatment 917 Oi
with probability at least I 8
If Rt MEEEEEXie Xia how can we minimize Rt
Expected Regret
Rt YES III Xie Xi É Ii ELTwhere Ai YE O Oi T É It i
Warm up n tonsi.de strategy that plays
action ieln round robin until Ti 3 then
plays 91 4 Gi for the remainder T ns times
T empirical mean of first 3 plays
ieCn
Define di 10 Oilers at
hemal PLAYED I IPC E 3 21 8
Prod Gi III Xis Xi e40,13 ae iid IELXi.itXi Oi is Isas Cess
with 02 1 4
PCE PAO Oil FIXE 2expt E.ltZexp log 2n 8
IPC id 3 eÉ IP d 12 4 8
Because IP f Ei I J assume I di occurs
Assume WLOG 02022 On
If 2 580 log 2m18 then on events d ME
8 O Fast Cd
O
Out
28 FÉ Ez
Oss
E
With probability at least I J strategy
identifies 937Y Oi0,2022 20h n 2 O Oz
Rt É Ai ELT A ELI since m2 and A O
A ELT HEE.n 3 1144 ve 3
Az Tolland A ELINE VE
E Az 3ELILENED A IT 3 ELISE VE 3
Note on d ME we proved above On
If Q then T J T T J
d nd 35 3 T T 33
IE To II nd 31K nd
3 ELI d ndI 3
Tell d Adz 31154 nd
It is true that IP d n dz 21 8 thus
W.p 21 8 Aztz E Az
I IeeeS
E A J t AZT 8
It 8A a log 2n 8 A TS
Idea choose 8 117 AEI
E 8A log antI
Ve wanted Rt oct we have it 105ft so
Note If treatments were nearly identical so that A 10
then our regret bound says R I 1010
But this is nonsense ble it shouldn't matter what
is prescribed they act nearly the same
We know RT A ELT E AT
Rtt min 2 88102 glut Azt
É J
I
maximize over Az ofRt 2 4 O
minlehtl.IM
A Fi2 Attend
Note R ITinstance indep regretbound
Multi armed Bandit problem
Imagine Casino
farmthat you pull to play I armed bandit
B DI postI 2 n
Each machine i pays out random amount w mean Oi
Elimination style algorithm
using pullsonlyd from round l
an
Assume observation Xi at time t is sub Gaussian
and iid and IELXi.AE oil
Assume WLOG O Or On Ai O Oi A Y'Is HiAz
hemmed With probability at least l S
arm IEXe and fixed I See KLEIN
Proof For all le IN and ie n define
die 18 0 see
Note that Eez feats Thus
IPCY.Yexed.ie E1P II dieEEFE.PCE.ie
fact É 2
III retrieslixen
P I I die 21 J thus assume these events hold
II ai is kicked out of Xe then 7 arm je Xe
Oj e Fi Zee
Assume IEXe and show I excel
For all j exe we have
One One Oe O 10 One A
E Ee t Ee to
ZeeI EXey arm I E Xe KLEIN
Fix any ie te s t Ai teeRecall I E Xe te EN
Want to show that arm it does not enter XeWe need to show YYxe
j.e Oi Zee
Yet Qe Qe One dieO e 0 10 One Ai
E Ee Ee t Ai
1 2 ee t 4 Ee
Z Z Ee
II One One 24 i X ee
Max
Jeter Aj 4 Ee 8Ey Ee Ze
Max
j exe Aj E 84 KLEINBY
Sample complexity Let To be the total pullsfrom arm i at the end of game
Total pulls II TiClaim Xe for all l See LA
85 Es 85 Eze l 1kg 817
Ee It He for l Tim 18s l
For any
in Ti É Yelliexe
e'If Je 111A see
Je flog a logicallog za
Éeilogite 17
E T2 log tee ItE zloglsloillft.tl EgI
Ec É log oshi
Thug 81 E logo 81 log 1615
For any C O log a I 7 C 20
S.t log ca E c'log a
T YY To EIT T ZITI EXIT
the Wip 21 8 after at most
c I log losidiff total pulls we
ha thatalyexits w best arm 1
ee due to eat doe
L i
antiQin
myX2
dit 10 0 EffinClaim p 11 di 21 8
Consider UCB upper confidencebound alg
Pull every arm once
for tiny l ntl
pull It 71592FÉ
i
i
t.ESIY
fyg
On yourhomework you show that Tilt Ees log E
FEETRt III ELT
it
We've demonstrated the existence of an algorithm
that satisfies Eltabectiloy T Rtt CÉÉ'doge
Lai Robbins 3
That Any strategy that satisfies E Told o ta tomato
w Ai O we have himto gtÉL
gg
It EI Ei at 2449ft
Thd As Tye a I UCB a
Thomasonsatistis i'III É Sampling
Thompson Sampling 1930s Thompson
Or Plo Observe Xi Xu N Ops
i
i
Plo Xiii P Exile 8 PCO
Given posterior distribution P Oilskin Xi toofor ith arm
I can answer the question P i 970 xi.d.ttThompson Sampling
for til
Pull arm Iii ul prob P i 970 xi.d.tt
I
Tequivalent
Of PLO xi.LI
Ie aroma Otit r
IP i as O xi.d.tt filling no dP O xi.d.tt
In MAB typically no sharing of information
Plo it Piloi s