Classical Situation (hell / heaven) • World deterministic • State observable

Upload: irene-watson

Post on 18-Dec-2015

TRANSCRIPT

Page 1: Classical Situation

Classical Situation

[Figure: robot world with a "heaven" goal cell and a "hell" trap cell]

• World deterministic
• State observable

Page 2: MDP-Style Planning

MDP-Style Planning

[Figure: the same world, now with stochastic motion]

• World stochastic
• State observable

[Koditschek 87, Barto et al. 89]

• Policy
• Universal Plan
• Navigation function

Page 3: Stochastic, Partially Observable

Stochastic, Partially Observable

[Figure: world with a "sign" cell; it is unknown which of the two terminal cells is heaven and which is hell]

[Sondik 72] [Littman/Cassandra/Kaelbling 97]

Page 4: Stochastic, Partially Observable

Stochastic, Partially Observable

[Figure: two mirror-image worlds, each with a "sign"; in one the left cell is hell and the right is heaven, in the other the labels are swapped]

Page 5: Stochastic, Partially Observable

Stochastic, Partially Observable

[Figure: from "start" the robot faces one of the two mirror-image worlds with probability 50% / 50%; reading the "sign" reveals which side is heaven and which is hell]

Page 6: Robot Planning Frameworks

Robot Planning Frameworks

                          Classical AI / robot planning
State/actions             discrete & continuous
State                     observable
Environment               deterministic
Plans                     sequences of actions
Completeness              Yes
Optimality                Rarely
State space size          huge, often continuous, 6 dimensions
Computational complexity  varies

Page 7: MDP-Style Planning

MDP-Style Planning

[Figure: the same world, with stochastic motion]

• World stochastic
• State observable

[Koditschek 87, Barto et al. 89]

• Policy
• Universal Plan
• Navigation function

Page 8: Markov Decision Process (discrete)

Markov Decision Process (discrete)

[Figure: five-state MDP (s1 ... s5) with stochastic transitions (probabilities such as 0.7/0.3, 0.9/0.1, 0.99, 0.8) and state rewards r=10 at the goal, r=1 at one intermediate state, r=0 elsewhere]

[Bellman 57] [Howard 60] [Sutton/Barto 98]
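Such an MDP can be written down directly as transition and reward tables. A minimal Python sketch; the five states match the figure, but the specific transition structure and probabilities here are illustrative, not a transcription:

```python
# A discrete MDP as plain data: P[(s, a)] maps next states to transition
# probabilities, R maps states to immediate rewards. Numbers are illustrative.
states = ["s1", "s2", "s3", "s4", "s5"]
actions = ["a", "b"]

P = {
    ("s1", "a"): {"s2": 0.7, "s3": 0.3},
    ("s1", "b"): {"s4": 0.9, "s5": 0.1},
    ("s2", "a"): {"s4": 0.4, "s5": 0.6},
    ("s2", "b"): {"s1": 1.0},
    ("s3", "a"): {"s5": 0.8, "s1": 0.2},
    ("s3", "b"): {"s3": 1.0},
    ("s4", "a"): {"s5": 0.99, "s2": 0.01},
    ("s4", "b"): {"s4": 1.0},
    ("s5", "a"): {"s5": 1.0},
    ("s5", "b"): {"s5": 1.0},
}
R = {"s1": 0.0, "s2": 0.0, "s3": 0.0, "s4": 1.0, "s5": 10.0}

# Sanity check: each transition distribution is a proper distribution.
for dist in P.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

A dictionary keyed by (state, action) keeps the representation sparse: only reachable successors are stored.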

Page 9: Value Iteration

Value Iteration

• Value function of policy π:

  V^π(s) = E[ Σ_t γ^t r(s_t) | s_0 = s, a_t = π(s_t) ]

• Bellman equation for the optimal value function:

  V(s) = r(s) + max_a ∫ p(s'|s,a) V(s') ds'

• Value iteration: recursively estimating the value function:

  V(s) ← r(s) + max_a ∫ p(s'|s,a) V(s') ds'

• Greedy policy:

  π(s) = argmax_a ∫ p(s'|s,a) V(s') ds'

[Bellman 57] [Howard 60] [Sutton/Barto 98]
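The update above is a few lines of code in the discrete case. A sketch on a hypothetical two-state MDP (states, rewards, and dynamics are made up for illustration), iterating V(s) ← r(s) + γ max_a Σ_s' p(s'|s,a) V(s') to convergence:

```python
# Value iteration on a toy two-state MDP (all numbers are made up).
gamma = 0.9
states = ["s1", "s2"]
actions = ["stay", "go"]
P = {  # P[(s, a)] -> {next state: probability}
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s2": 0.8, "s1": 0.2},
    ("s2", "stay"): {"s2": 1.0},
    ("s2", "go"):   {"s1": 1.0},
}
R = {"s1": 0.0, "s2": 1.0}

def backup(V, s, a):
    # Expected next value: sum_s' p(s'|s,a) V(s')
    return sum(p * V[s2] for s2, p in P[(s, a)].items())

V = {s: 0.0 for s in states}
for _ in range(200):  # repeat until (approximately) converged
    V = {s: R[s] + gamma * max(backup(V, s, a) for a in actions)
         for s in states}

# Greedy policy: pick the action with the best expected next value.
policy = {s: max(actions, key=lambda a: backup(V, s, a)) for s in states}
```

With these numbers the reward state becomes an attractor: the greedy policy is "go" in s1 and "stay" in s2, and V converges geometrically at rate γ.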

Page 10: Value Iteration for Motion Planning

Value Iteration for Motion Planning

(assumes knowledge of robot’s location)

Page 11: Continuous Environments

Continuous Environments

From: A Moore & C.G. Atkeson “The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State spaces,” Machine Learning 1995

Page 12: Approximate Cell Decomposition

Approximate Cell Decomposition [Latombe 91]

From: A Moore & C.G. Atkeson “The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State spaces,” Machine Learning 1995

Page 13: Parti-Game

Parti-Game [Moore 96]

From: A Moore & C.G. Atkeson “The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State spaces,” Machine Learning 1995

Page 14: Robot Planning Frameworks

Robot Planning Frameworks

                          Classical AI /           Value Iteration   Parti-Game
                          robot planning           in MDPs
State/actions             discrete & continuous    discrete          continuous
State                     observable               observable        observable
Environment               deterministic            stochastic        stochastic
Plans                     sequences of actions     policy            policy
Completeness              Yes                      Yes               Yes
Optimality                Rarely                   Yes               No
State space size          huge, often continuous,  millions          n/a
                          6 dimensions
Computational complexity  varies                   quadratic         n/a

Page 15: Stochastic, Partially Observable

Stochastic, Partially Observable

[Figure: from "start" the two mirror-image worlds are equally likely (50% / 50%); reading the "sign" resolves which of the two terminal cells is heaven and which is hell]

Page 16: A Quiz

A Quiz: what is the size of the belief space?

# states            actions        sensors          size of belief space
3: s1, s2, s3       deterministic  perfect          3: s1, s2, s3
3: s1, s2, s3       stochastic     perfect          3: s1, s2, s3
3: s1, s2, s3       deterministic  abstract states  2^3 - 1: s1, s2, s3, s12, s13, s23, s123
3: s1, s2, s3       deterministic  stochastic       2-dim continuous*: p(S=s1), p(S=s2)
3: s1, s2, s3       stochastic     none             2-dim continuous*: p(S=s1), p(S=s2)
1-dim continuous    deterministic  stochastic       ∞-dim continuous*
1-dim continuous    stochastic     stochastic       ∞-dim continuous*
∞-dim continuous    stochastic     stochastic       aargh!

*) countable, but ∞-dimensional for all practical purposes

Page 17: Introduction to POMDPs (1 of 3)

Introduction to POMDPs (1 of 3)

[Figure: two-state POMDP (s1, s2). Left: immediate payoffs of actions a and b in each state (values 100, 80, 40, 0). Right: the value of each action as a linear function of the belief p(s1); the optimal value is their upper envelope]

[Sondik 72, Littman, Kaelbling, Cassandra ‘97]
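For a two-state POMDP the belief is a single number p1 = p(s1), and the one-step value of each action is linear in it; the optimal value is the upper envelope of those lines, hence piecewise linear and convex. A sketch with hypothetical payoffs (the 100/0 pattern loosely follows the slide; the exact payoff table there is not fully recoverable):

```python
# One-step value of each action as a linear function of the belief p1 = p(s1).
# Payoffs are illustrative: action a pays 100 in s1 and 0 in s2; action b
# pays 0 in s1 and 100 in s2.
payoff = {"a": {"s1": 100.0, "s2": 0.0},
          "b": {"s1": 0.0, "s2": 100.0}}

def action_value(p1, act):
    # Linear in the belief: p1 * payoff(s1) + (1 - p1) * payoff(s2)
    return p1 * payoff[act]["s1"] + (1.0 - p1) * payoff[act]["s2"]

def value(p1):
    # Upper envelope over actions: piecewise linear and convex in p1.
    return max(action_value(p1, a) for a in payoff)
```

The kink of the envelope (here at p1 = 0.5) is where the greedy action switches from b to a.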

Page 18: Introduction to POMDPs (2 of 3)

Introduction to POMDPs (2 of 3)

[Figure: the same two-state POMDP with a sensing action c whose observation is correct 80% of the time and wrong 20% of the time; sensing maps the belief p(s1) to a posterior belief p(s1')]

[Sondik 72, Littman, Kaelbling, Cassandra ‘97]
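The sensing action's effect on the belief is a Bayes update. A sketch with the slide's 80%/20% sensor model (the observation labels z1/z2 are made up for illustration):

```python
# Bayes update of the belief p1 = p(s1) after sensing: the sensor reports
# the true state with probability 0.8 and the wrong one with probability 0.2.
def belief_update(p1, obs):
    p_obs_s1 = 0.8 if obs == "z1" else 0.2  # p(obs | s1)
    p_obs_s2 = 0.2 if obs == "z1" else 0.8  # p(obs | s2)
    num = p_obs_s1 * p1
    return num / (num + p_obs_s2 * (1.0 - p1))
```

Starting from a uniform belief, observing z1 moves p(s1) from 0.5 to 0.8; a second, contradicting observation z2 moves it back to 0.5.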

Page 19: Introduction to POMDPs (3 of 3)

Introduction to POMDPs (3 of 3)

[Figure: observing A or B shifts the belief; with observation probabilities 50%/50% in one state and 30%/70% in the other, the posteriors are p(s1'|A) and p(s1'|B)]

The pre-observation value is the expectation of the post-observation value over the observations:

  V(p(s1)) = Σ_{z ∈ {A,B}} V(p(s1 | z)) p(z)

[Sondik 72, Littman, Kaelbling, Cassandra ‘97]
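This expectation over observations can be checked numerically: the value before observing is the p(z)-weighted average of the values at the posterior beliefs. A sketch with an assumed 80%/20% sensor and the same illustrative 100/0 payoffs as before:

```python
# V-bar(b) = sum_z p(z) * V(p(s1 | z)) for a two-state POMDP with a sensor
# that reports the true state with probability 0.8. Payoffs are illustrative.
def value(p1):
    # Upper envelope of two linear payoffs: a pays 100 in s1, b pays 100 in s2.
    return max(100.0 * p1, 100.0 * (1.0 - p1))

def expected_value_after_sensing(p1):
    total = 0.0
    for p_z_s1, p_z_s2 in [(0.8, 0.2), (0.2, 0.8)]:  # the two observations
        p_z = p_z_s1 * p1 + p_z_s2 * (1.0 - p1)      # p(z)
        p1_post = p_z_s1 * p1 / p_z                   # Bayes rule
        total += p_z * value(p1_post)
    return total
```

Because V is convex, sensing can only help: from a uniform belief the expected value after sensing is 80, versus value(0.5) = 50 without it.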

Page 20: Value Iteration in POMDPs

Value Iteration in POMDPs (substitute the belief b for the state s)

• Value function of policy π:

  V^π(b) = E[ Σ_t γ^t r(b_t) | b_0 = b, a_t = π(b_t) ]

• Bellman equation for the optimal value function:

  V(b) = r(b) + max_a ∫ p(b'|b,a) V(b') db'

• Value iteration: recursively estimating the value function:

  V(b) ← r(b) + max_a ∫ p(b'|b,a) V(b') db'

• Greedy policy:

  π(b) = argmax_a ∫ p(b'|b,a) V(b') db'

Page 21: Missing Terms: Belief Space

Missing Terms: Belief Space

• Expected reward:

  r(b) = ∫ r(s) b(s) ds

• Observation likelihood:

  p(o'|a,b) = ∫ p(o'|s') ∫ p(s'|s,a) b(s) ds ds'

• Next belief density:

  p(b'|b,a) = ∫ p(b'|b,a,o') p(o'|a,b) do'

where p(b'|b,a,o') is a Dirac distribution: the next belief is computed deterministically by the Bayes filter.
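For a discrete state space the integrals above become sums. A sketch with made-up two-state models (the dynamics and sensor probabilities are illustrative):

```python
# Discrete versions of the missing terms: expected reward r(b) and the
# observation likelihood p(o'|a,b). All model numbers are illustrative.
states = ["s1", "s2"]
reward = {"s1": 0.0, "s2": 1.0}

def p_trans(s_next, s, a):      # p(s'|s,a): stay put with probability 0.9
    return 0.9 if s_next == s else 0.1

def p_obs(o, s):                # p(o|s): sensor correct with probability 0.8
    return 0.8 if o == s else 0.2

def reward_of_belief(b):        # r(b) = sum_s r(s) b(s)
    return sum(reward[s] * b[s] for s in states)

def obs_likelihood(o, a, b):    # p(o'|a,b) = sum_s' p(o'|s') sum_s p(s'|s,a) b(s)
    return sum(p_obs(o, s_next) * sum(p_trans(s_next, s, a) * b[s]
                                      for s in states)
               for s_next in states)
```

The observation likelihoods sum to one over the possible observations, so they are exactly the weights p(z) needed for the expectation over next beliefs.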

Page 22: Value Iteration in Belief Space

Value Iteration in Belief Space

[Diagram: the underlying world moves from state s to next state s' with reward r'; the planner sees only an observation o and updates belief state b to next belief state b'; the value function assigns Q(b, a), and backups take the max over Q(b', a)]

Page 23: Why is This So Complex?

Why is This So Complex?

State Space Planning (no state uncertainty) vs. Belief Space Planning (full state uncertainty)?

Page 24: Augmented MDPs

Augmented MDPs

Compress the belief into the conventional state space augmented by an uncertainty (entropy) dimension:

  b̄ = ( argmax_s b(s) ; H_b[s] )

i.e. the most likely state together with the entropy of the belief.

[Roy et al, 98/99]
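The compression is easy to state in code: keep only the most likely state and the belief's entropy. A minimal sketch (a hypothetical helper, not code from Roy et al.):

```python
import math

# Compress a belief b (a dict mapping state -> probability) into the
# augmented state: (most likely state, entropy of the belief).
def augment(b):
    most_likely = max(b, key=b.get)
    entropy = -sum(p * math.log(p) for p in b.values() if p > 0.0)
    return most_likely, entropy
```

A fully certain belief has entropy 0; a uniform belief over n states has the maximum entropy log(n). Planning then runs over (state, entropy) pairs instead of full beliefs.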

Page 25: Path Planning with Augmented MDPs

Path Planning with Augmented MDPs

[Figure: a conventional planner's path vs. a probabilistic planner's path that detours for information gain]

[Roy et al, 98/99]

Page 26: Robot Planning Frameworks

Robot Planning Frameworks

                          Classical AI /           Value Iteration   Parti-Game   POMDP         Augmented MDP
                          robot planning           in MDPs
State/actions             discrete & continuous    discrete          continuous   discrete      discrete
State                     observable               observable        observable   partially     partially
                                                                                  observable    observable
Environment               deterministic            stochastic        stochastic   stochastic    stochastic
Plans                     sequences of actions     policy            policy       policy        policy
Completeness              Yes                      Yes               Yes          Yes           No
Optimality                Rarely                   Yes               No           Yes           No
State space size          huge, often continuous,  millions          n/a          dozens        thousands
                          6 dimensions
Computational complexity  varies                   quadratic         n/a          exponential   O(N^4)