xz s - university of texas at austinpstone/courses/394rfall19/resources/ch9-slide… · •...

23
X. Cs ) my s Xz C s ) # s VCs ) = w , X. C s ) t Wzxz ( s )

Upload: others

Post on 14-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

X. Cs )my

s

Xz C s ) #s

VCs ) = w, X. C s ) t Wzxz ( s )

Page 2: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

w,

= 2 Wz =D :

X. Cslmy M

s

Xz ( s ) # W,

=L Vz =L :

s ✓VCs ) = Willis ) twzxzls )

w,

, o wz ' 4 :

I

Page 3: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

w,

= 2 Wz = O :

X. Cslmy N

s

Xz ( s ) # W,

=L Vz = I :

s ✓VCs ) = w

, X. C s ) t Wzxzl s )w

,, o We24 :

Tabular equivalent : /x. Css = ( 'oitsews've

x. css

"

:{'oaetisesnnere

Thus,

Wi = Vlsi )

Page 4: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Linear Regression :

X ( s ) #dope =L

s

Xz ( s )I

.

s

Y = W,

X,

t Wz Xz

= w,

X,

t Wz . I

i Islope intercept

Page 5: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Linear Regression : Supervised learning :

slope =LGiven ( x

, y > pairs,

X ( s ) #learn fcx ) → y

Discrete Y → classifications

e. g .

"

is this image a cat or a dog ?"

Xz ( s ) Continuous Y → regression

I.

e. g .

"

predict height from weight"

s

Y = W,

X,

t Uz Xz

= w,

X,

t Wz . I

i Islope intercept

Page 6: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Etf -

- E. ( w'-

xi - Yi )'

EMEwW

'

Page 7: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

ECW) -

- E. ( wtxi - Yi )'

Etr ) ✓ w.÷÷W"

Batch least squares : Sol .vefor arswmin Ecw ) directly

Exact gradient descent ÷Compute ¥ exactly from all Xi

SGD : Approximate daw

with a smell number of Samplesof X

,often only one

.

Page 8: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Eh) -

- E. ( wtxi - Yi )'

EM ✓ w

" 9.⑧.÷÷W

Wi

Batch least squares : Sol .vefor arswmin Ecw ) directly

Exact gradient descent ÷Compute LET exactly from all Xi

SGD : Approximate DIaw

with a small number of Samplesof X

,often only one

.

Page 9: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

RL t Function Approximationupdate Rule :

w ← w t a ( Ge - T ( Se,

w ) ) TT ( se,

w )

Page 10: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

RL t Function Approximationupdate Rule :

w ← w t a ( Ge - T ( Se,

w ) ) TT ( se,

w )

when approximate is linear in features I bases

i. e. ICE

,w ) = w . X C s )

,then :

w ← w t a ( Ge - I ( St,

w ) ) x ( se )

since dd÷!s)

= Xi Ls )

Page 11: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

RL t Function Approximationupdate Rule :

w ← w t a ( Ge - I ( Se,

w ) ) TT ( se,

w )

when approximate is linear in features I bases

i. e. ICE

,w ) = w . X C s )

,then :

w ← w t a ( Ge - I ( St,

w ) ) x ( se )

since ddj) -

- Xi Ls ) fAssigns

"

credit"

to

most activated features

for success t failure

Page 12: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Fourier Basis

" Inf- Wz Mm

+ w, Nhhrrwhrwhr

:= whenParam s :

- Order

- Interaction terms ?

Page 13: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Fourier Basis Radial Basis Functions

wi -#

f- wz Mhm -1+ w

, Nhhvhlhrwhr #÷

÷ :

y

= Them Params :

- Kernel widthParams :

- Tiling density- Order

- Interaction terms ? ( both variableper dimension )

Page 14: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Fourier Basis Radial Basis Functions

wi -#

t Wz Mm -1+ w

, Mlhrvuhnvhr #÷

÷ :

y

= them Params :

- Kernel widthParams :

- Tiling density- Order

- Interaction terms ? ( both variableper dimension )

Hand -

Designed

Featuresa

i :÷::*:Iia :::"

:"

. s"

,

::p:L:*:h::I ?

" 1 if exactly this state,

0 otherwise" Locality ?

-

Page 15: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Perception

y

i

②YAY

④④④Y = X

,W

,t Xzwz t Xzwz

dy = X,

dw,Y is Linear in X

Page 16: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Perception Neural Network

y Y

qoutput layer

IO (e.g .

state valve )② w.fiO O Hidden layer ( s )144

w

Features"

)

④④④O O O O input layer

( usually raw stale vars )y = X. W,

t Xzwz t X3W3

dy

dI=x, d-= ?dw, Wi

,I

Y is Linear in X ⇒ Differentiation Voa Back propagation

Y is Nonlinear in X !

Page 17: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Nonparametric Methods

K - nearest neighbor

°

°

O 0

°

•On

itofo

°

,-

'

'

•I o

-o

-

yi

-- - -

-I '

Ilsa ) -

- IT :{ G Csi )

Page 18: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Nonparametric Methods

kernel MethodsK - nearest neighbor

Similarity"

kernel"

K ( s,

s'

)•

°

°

. . For data set of N States°

•On

o And query state Sqo

,

;'

'

'

•I •

•di-

- - -- -

'

• Ilse ) -

- E. klsa.si ) Gfi )Ilsa ) -

-

'

I :{ Glsi )

Page 19: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

Nonparametric Methods

Kernel MethodsK - nearest neighbor

Similarity"

kernel"

K ( s,

S'

)•

°

°

. . For data set of N States°

•On

.

o And query state Sq°

,'

'

'

• I do

- --

I• Ilse ) -

- E. klsa.si ) Gfi )jcsa ) -

- the Gbi )"

Kemal trick" eqinilent to

computingI in high . d. in space

of features

KNN special case kernel where :

→ Complexity of kernel methodsK ( s

,s

'

) = Yr,

if s'

is a KNN of Sgrows with number of data

0 otherwisepoints

,not features

Page 20: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

w ← w - I a 0 [ Va Cse ) - I Ge,

w ) )2

w ← w t a ( ⑥ e- J ( se

,

w ) ] . T i ( se,

w )

Page 21: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

we w - E a Of Valse ) - I Ge,

w ) )Z

w ← w t a ( ⑥ e- I ( se

,

w ) ] . T i (Se,

w )

-MC Estimate :

T

Ue = [ ji- t - '

Rii -

- E

- Unbiased

- fixed when W changes

converges near global opt

Page 22: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high

we w - E a Of Valse ) - Gcse,

w ) )Z

w ← w t a ( ⑥ e- I ( se

,

w ) ] . Of (Se,

w )

aMC Estimate : Bootstrap Estimate :

T

Ue = [ ji- e - '

RiUe = Re t 8V ( Sea

,

w )i -

. t- Biased

- Unbiased- Function of w → changes with w !

- fixed when W changes- Gradient calc treated Ue

converges near global opt as a constant !

Converges to TD fixed point ( local opt )

For TD fixed vet (was ) E Fr min ( VI ( w ) )Point Wto!

Page 23: Xz s - University of Texas at Austinpstone/Courses/394Rfall19/resources/Ch9-slide… · • Ilse)--E. klsa.si) Gfi) jcsa)--the Gbi) " Kemal trick " eqinilent to computing I in high