Approximate Models for Batch RL
Emma Brunskill
2/18/15
Image from David Silver
FVI / FQI
Policy iteration (PI) maintains both an explicit representation of a policy and the value of that policy.
Approximate model planners
Exact/Exhaustive Forward Search
Slide modified from David Silver
[Figure: forward search tree with max nodes over actions (a1, a2) and expectation nodes over next states (s1, s2)]
How many nodes are in an H-depth tree, as a function of state space size |S| and action space size |A|?
How many nodes in an H-depth tree (as a function of |S| and |A|)? (|S||A|)^H
Sparse Sampling: Don’t Enumerate All Next States; Instead Sample Next States s’ ~ P(s’|s,a)
Sample n next states, s_i ~ P(s’|s,a)
Compute (1/n) Σ_i V(s_i)
This converges to the expected future value: Σ_{s’} P(s’|s,a) V(s’)
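Applying this sampling step recursively gives the full sparse-sampling estimator. A minimal sketch, assuming a hypothetical generative-model interface `sample_next(s, a)` that returns one sampled `(s', r)` pair:

```python
def sparse_sampling_q(sample_next, actions, s, h, n, gamma=0.95):
    """Estimate Q(s, a) for each action by recursive sparse sampling.

    sample_next(s, a) -> (s', r): one draw from the generative model
    h: remaining horizon; n: samples per (state, action) node.
    """
    if h == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(n):  # s_i ~ P(s'|s,a)
            s2, r = sample_next(s, a)
            # V(s_i) approximated by the max over recursive Q estimates
            v = max(sparse_sampling_q(sample_next, actions,
                                      s2, h - 1, n, gamma).values())
            total += r + gamma * v
        q[a] = total / n  # (1/n) * sum_i [r_i + gamma * V(s_i)]
    return q
```

Each call fans out into n|A| recursive calls, which is where the (n|A|)^H node count on the next slide comes from.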
Sparse Sampling: how many nodes if we sample n states at each action node? (n|A|)^H, which is independent of |S|!
Upside: n can be chosen to achieve bounds on the accuracy of the value function at the root state, independent of state space size
Downside: still exponential in the horizon H, and n must still be large for good bounds
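A quick sanity check of the two node counts, with made-up sizes |S| = 1000, |A| = 4, n = 20, H = 3 (the numbers are purely illustrative):

```python
# Node counts for a depth-H lookahead: exhaustive vs. sparse sampling.
S, A, n, H = 1000, 4, 20, 3

exhaustive = (S * A) ** H   # (|S||A|)^H: enumerate every next state
sparse = (n * A) ** H       # (n|A|)^H: sample n next states per action

print(exhaustive)  # 64000000000 -- infeasible
print(sparse)      # 512000 -- independent of |S|
```

Even modest problems make the exhaustive tree hopeless, while the sparse tree stays the same size no matter how large |S| grows.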
Monte Carlo Tree Search
Combines the ideas of sparse sampling with an adaptive method for focusing on more promising parts of the tree
Here “more promising” means the actions that seem likely to yield higher long-term reward
Uses the idea of simulation search
Simple Monte Carlo Search
Estimate the value of each action by averaging the returns of simulated rollouts of a fixed rollout policy
Act by greedy improvement with respect to the fixed rollout policy
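A minimal sketch of this idea, again assuming the hypothetical `sample_next(s, a) -> (s', r)` generative-model interface (names and parameters are illustrative, not from the slides):

```python
def simple_mc_search(sample_next, actions, rollout_policy, s0,
                     n_rollouts=100, horizon=20, gamma=0.95):
    """Pick an action by simulating rollouts of a fixed rollout policy.

    For each candidate first action a, run n_rollouts simulations that
    take a and then follow rollout_policy; estimate Q(s0, a) as the mean
    return. Acting greedily on these estimates is a one-step improvement
    over the fixed rollout policy.
    """
    def rollout(s, a):
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            s, r = sample_next(s, a)
            ret += discount * r
            discount *= gamma
            a = rollout_policy(s)   # follow the fixed policy afterwards
        return ret

    q = {a: sum(rollout(s0, a) for _ in range(n_rollouts)) / n_rollouts
         for a in actions}
    return max(q, key=q.get)        # greedy w.r.t. the rollout estimates
```

Note the search is not adaptive: every action gets the same simulation budget, which is what MCTS improves on.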
Upper Confidence Tree (UCT) [Kocsis & Szepesvari, 2006]
Slide modified from Alan Fern
• Combines forward search and simulation search
• An instance of Monte Carlo Tree Search
• Repeated Monte Carlo simulation of a rollout policy
• Rollouts add one or more nodes to the search tree
• UCT uses the optimism-under-uncertainty idea
• Has some nice theoretical properties
• Much better real-time performance than sparse sampling
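The optimism-under-uncertainty idea in UCT is instantiated with the UCB1 score from bandits: at each tree node pick the action maximizing Q(s,a) + c * sqrt(ln N(s) / N(s,a)). A sketch (the constant c and the data layout are illustrative):

```python
import math

def uct_select(q, counts, c=1.4):
    """UCB1 action selection at a tree node, as used by UCT.

    q[a]: current mean-return estimate for action a
    counts[a]: visit count N(s, a); N(s) is their sum
    Untried actions (count 0) get an infinite bonus, so they go first.
    """
    n_s = sum(counts.values())
    def score(a):
        if counts[a] == 0:
            return float('inf')
        return q[a] + c * math.sqrt(math.log(n_s) / counts[a])
    return max(q, key=score)
```

Rarely-tried actions get a large exploration bonus, so simulation effort adaptively concentrates on promising actions without abandoning the rest.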
● Requires a simulator / generative model
● On each pass down the tree, follow the tree policy until reaching a leaf state where not all actions have been tried
● Then simulate, starting from that leaf state, the result of taking an untried action
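One such pass can be sketched as follows (undiscounted for simplicity; `sample_next(s, a) -> (s', r)` is the same hypothetical generative-model interface as before, and the tree layout is an assumption, not the authors' code):

```python
import math, random

def uct_pass(tree, sample_next, actions, rollout_policy, s0,
             horizon=20, c=1.4):
    """One simulation pass of UCT.

    tree: dict mapping state -> {'N': {a: visits}, 'Q': {a: mean return}}
    Selection: follow the UCB1 tree policy while all actions are tried.
    Expansion: at a node with untried actions, try one and stop descending.
    Rollout: finish the episode with the fixed rollout policy.
    Backup: update running means along the visited path.
    """
    path, rewards, s, t = [], [], s0, 0
    while t < horizon:
        if s not in tree:  # add a new leaf node to the tree
            tree[s] = {'N': {a: 0 for a in actions},
                       'Q': {a: 0.0 for a in actions}}
        node = tree[s]
        untried = [a for a in actions if node['N'][a] == 0]
        if untried:
            a = random.choice(untried)            # expansion
        else:
            n_s = sum(node['N'].values())         # UCB1 selection
            a = max(actions, key=lambda b: node['Q'][b]
                    + c * math.sqrt(math.log(n_s) / node['N'][b]))
        path.append((s, a))
        s, r = sample_next(s, a)
        rewards.append(r)
        t += 1
        if untried:
            break
    while t < horizon:                            # rollout phase
        s, r = sample_next(s, rollout_policy(s))
        rewards.append(r)
        t += 1
    # backup: each (s, a) on the path gets the return observed from there on
    g, returns = 0.0, []
    for r in reversed(rewards):
        g += r
        returns.append(g)
    returns.reverse()
    for (si, ai), gi in zip(path, returns):
        node = tree[si]
        node['N'][ai] += 1
        node['Q'][ai] += (gi - node['Q'][ai]) / node['N'][ai]
```

Repeating this pass many times from the current state, then acting greedily on the root Q estimates, is the full UCT planner.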
Slide modified from Alan Fern & David Silver
Computer Go
Previous game-tree approaches fared poorly
Monte Carlo Evaluation in Go: a planning problem, just a very, very hard one
Going Back to Batch RL...
• Use a supervised learning method to fit a model from the batch data
• Use the learned model with MCTS planning
• Note: error in the model will propagate into error in the estimated values!
• Compute an action for the current state, take that action, then replan for the next state
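Putting the pieces together, a sketch of this loop, where `fit_model`, `planner`, and the `env` interface are placeholder assumptions (e.g. a supervised regressor over the batch and UCT, respectively), not anything prescribed by the slides:

```python
def act_with_learned_model(fit_model, planner, dataset, env, n_steps=100):
    """Batch RL with a learned model plus replanning at every step.

    fit_model(dataset) -> sample_next(s, a) -> (s', r): a generative
        model fit to the batch data by supervised learning
    planner(sample_next, s) -> a: e.g. UCT run from the current state
    Note: error in the learned model propagates into the planner's
    value estimates and hence into the chosen actions.
    """
    sample_next = fit_model(dataset)   # supervised learning step (offline)
    s = env.reset()
    for _ in range(n_steps):
        a = planner(sample_next, s)    # replan from the current state
        s, r, done = env.step(a)
        if done:
            break
```

Because planning restarts from each visited state, model error only needs to be small along the trajectories the policy actually encounters.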