Classical Situation
(Figure: grid world with "heaven" and "hell" goal cells.)
• World deterministic
• State observable
MDP-Style Planning
(Figure: grid world with "heaven" and "hell" goal cells.)
• World stochastic
• State observable
[Koditschek 87, Barto et al. 89]
• Policy
• Universal Plan
• Navigation function
Stochastic, Partially Observable
(Figure: grid world with a sign; the agent cannot tell which cell is heaven and which is hell.)
[Sondik 72] [Littman/Cassandra/Kaelbling 97]
Stochastic, Partially Observable
(Figure: two mirrored worlds, each with a sign; in one the left cell is hell and the right heaven, in the other the reverse.)
Stochastic, Partially Observable
(Figure: from the start state the agent faces a 50%/50% chance of being in either mirrored world; only reading the sign reveals which side is heaven and which is hell.)
Robot Planning Frameworks

|                          | Classical AI / robot planning        |
|--------------------------|--------------------------------------|
| State/actions            | discrete & continuous                |
| State                    | observable                           |
| Environment              | deterministic                        |
| Plans                    | sequences of actions                 |
| Completeness             | yes                                  |
| Optimality               | rarely                               |
| State space size         | huge, often continuous, 6 dimensions |
| Computational complexity | varies                               |
MDP-Style Planning
(Figure: grid world with "heaven" and "hell" goal cells.)
• World stochastic
• State observable
[Koditschek 87, Barto et al. 89]
• Policy
• Universal Plan
• Navigation function
Markov Decision Process (discrete)
(Figure: a five-state MDP s1–s5 with stochastic transitions, e.g. 0.7/0.3, 0.9/0.1, 0.8/0.2, and per-state rewards r = 10, r = 1, and r = 0.)
[Bellman 57] [Howard 60] [Sutton/Barto 98]
Value Iteration

• Value function of policy π:
  V^π(s) = E[ Σ_t γ^t r_t | s_0 = s, a_i = π(s_i) ]

• Bellman equation for the optimal value function:
  V(s) = γ [ r(s) + max_a ∫ p(s'|s,a) V(s') ds' ]

• Value iteration: recursively estimate the value function:
  V̂(s) ← γ [ r(s) + max_a ∫ p(s'|s,a) V̂(s') ds' ]

• Greedy policy:
  π(s) = argmax_a ∫ p(s'|s,a) V(s') ds'
[Bellman 57] [Howard 60] [Sutton/Barto 98]
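The backup above is straightforward to run on a small discrete MDP. A minimal sketch — the 3-state, 2-action model, rewards, and discount below are illustrative, not the ones from the figure:

```python
import numpy as np

# Illustrative MDP: P[a, s, s'] = p(s'|s,a), R[s] = r(s); s3 is absorbing with r = 10.
P = np.array([
    [[0.9, 0.1, 0.0],   # action 0
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]],
    [[0.2, 0.8, 0.0],   # action 1
     [0.1, 0.0, 0.9],
     [0.0, 0.0, 1.0]],
])
R = np.array([0.0, 1.0, 10.0])
gamma = 0.9

# Value iteration: V(s) <- gamma * ( r(s) + max_a sum_s' p(s'|s,a) V(s') )
V = np.zeros(3)
for _ in range(500):
    V_new = gamma * (R + (P @ V).max(axis=0))   # (P @ V)[a, s] = sum_s' p(s'|s,a) V(s')
    if np.abs(V_new - V).max() < 1e-10:
        V = V_new
        break
    V = V_new

# Greedy policy: pi(s) = argmax_a sum_s' p(s'|s,a) V(s')
policy = (P @ V).argmax(axis=0)
```

For the absorbing state the fixed point can be checked by hand: V(s3) = γ(10 + V(s3)), so V(s3) = 90 at γ = 0.9.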
Value Iteration for Motion Planning
(assumes knowledge of robot’s location)
Continuous Environments
From: A Moore & C.G. Atkeson “The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State spaces,” Machine Learning 1995
Approximate Cell Decomposition [Latombe 91]
From: A Moore & C.G. Atkeson “The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State spaces,” Machine Learning 1995
Parti-Game [Moore 96]
From: A Moore & C.G. Atkeson “The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State spaces,” Machine Learning 1995
Robot Planning Frameworks

|                          | Classical AI / robot planning        | Value iteration in MDPs | Parti-Game |
|--------------------------|--------------------------------------|-------------------------|------------|
| State/actions            | discrete & continuous                | discrete                | continuous |
| State                    | observable                           | observable              | observable |
| Environment              | deterministic                        | stochastic              | stochastic |
| Plans                    | sequences of actions                 | policy                  | policy     |
| Completeness             | yes                                  | yes                     | yes        |
| Optimality               | rarely                               | yes                     | no         |
| State space size         | huge, often continuous, 6 dimensions | millions                | n/a        |
| Computational complexity | varies                               | quadratic               | n/a        |
Stochastic, Partially Observable
(Figure: the same two mirrored worlds; from the start state the agent faces a 50%/50% chance of each, and must read the sign to find out which side is heaven and which is hell.)
A Quiz
| sensors         | actions       | # states         | size of belief space                    |
|-----------------|---------------|------------------|-----------------------------------------|
| perfect         | deterministic | 3: s1, s2, s3    | 3                                       |
| perfect         | stochastic    | 3                | 3                                       |
| abstract states | deterministic | 3                | 2^3 − 1: s1, s2, s3, s12, s13, s23, s123 |
| stochastic      | deterministic | 3                | 2-dim continuous*: p(S=s1), p(S=s2)     |
| none            | stochastic    | 3                | 2-dim continuous*: p(S=s1), p(S=s2)     |
| stochastic      | deterministic | 1-dim continuous | ∞-dim continuous*                       |
| stochastic      | stochastic    | 1-dim continuous | ∞-dim continuous*                       |
| stochastic      | stochastic    | ∞-dim continuous | aargh!                                  |

*) countable, but ∞ for all practical purposes
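The "abstract states" row can be checked directly: with deterministic actions and a sensor that reports only a subset of possible states, the belief stays set-valued, so the reachable beliefs are exactly the nonempty subsets of {s1, s2, s3}:

```python
from itertools import combinations

states = ["s1", "s2", "s3"]
# Every nonempty subset of the state set is a possible set-valued belief.
beliefs = [set(c) for r in range(1, len(states) + 1)
           for c in combinations(states, r)]
print(len(beliefs))  # 2^3 - 1 = 7
```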
Introduction to POMDPs (1 of 3)
(Figure: a two-state POMDP with states s1, s2 and actions a and b. The belief is summarized by p(s1) on the horizontal axis; each action's payoff — values such as 100, 80, 40, and 0 in the plot — is linear in p(s1).)
[Sondik 72, Littman, Kaelbling, Cassandra ‘97]
Introduction to POMDPs (2 of 3)
(Figure: the same two-state POMDP, now with a sensing action c that is 80% accurate (80%/20% observation likelihoods). The plot shows how the prior belief p(s1) is mapped to the posterior p(s1') after sensing.)
[Sondik 72, Littman, Kaelbling, Cassandra ‘97]
Introduction to POMDPs (3 of 3)
(Figure: the sensing action c returns observation A or B with state-dependent likelihoods, e.g. 50%/50% in one state and 30%/70% in the other; each observation maps the belief p(s1) to a different posterior, p(s1'|A) or p(s1'|B).)

Backing up the value through an observation averages over the resulting posteriors:

V(p(s1)) = Σ_{z ∈ {A,B}} V(p(s1|z)) p(z)
[Sondik 72, Littman, Kaelbling, Cassandra ‘97]
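That backup is easy to compute for a two-state belief. A minimal sketch — the observation likelihoods below are illustrative, not the slide's numbers:

```python
# Two-state POMDP; the belief is just p1 = p(s1).
# Illustrative observation likelihoods p(z|s).
p_z_given_s = {"A": {"s1": 0.5, "s2": 0.3},
               "B": {"s1": 0.5, "s2": 0.7}}

def p_obs(p1, z):
    """Marginal p(z) under the current belief p(s1) = p1."""
    return p_z_given_s[z]["s1"] * p1 + p_z_given_s[z]["s2"] * (1.0 - p1)

def posterior(p1, z):
    """Bayes update: p(s1 | z)."""
    return p_z_given_s[z]["s1"] * p1 / p_obs(p1, z)

def backup(p1, V):
    """The slide's backup: V(p(s1)) = sum_z V(p(s1|z)) p(z)."""
    return sum(V(posterior(p1, z)) * p_obs(p1, z) for z in ("A", "B"))

# A value function made of linear pieces, as in the plots:
V = lambda b: max(100.0 * b, 100.0 * (1.0 - b))
```

Because this V is convex in the belief, backup(p1, V) ≥ V(p1): in expectation, sensing can only help.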
Value Iteration in POMDPs

Substitute the belief b for the state s:

• Value function of policy π:
  V^π(b) = E[ Σ_t γ^t r_t | b_0 = b, a_i = π(b_i) ]

• Bellman equation for the optimal value function:
  V(b) = γ [ r(b) + max_a ∫ p(b'|b,a) V(b') db' ]

• Value iteration: recursively estimate the value function:
  V̂(b) ← γ [ r(b) + max_a ∫ p(b'|b,a) V̂(b') db' ]

• Greedy policy:
  π(b) = argmax_a ∫ p(b'|b,a) V(b') db'
Missing Terms: Belief Space
• Expected reward:
  r(b) = ∫ r(s) b(s) ds

• Next-belief density:
  p(b'|b,a) = ∫ p(b'|o',b,a) p(o'|b,a) do'
  p(o'|b,a) = ∫∫ p(o'|s') p(s'|s,a) b(s) ds ds'

  Here p(b'|o',b,a) is a Dirac distribution: given the observation o', the next belief is computed deterministically — by a Bayes filter!
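Both missing terms are cheap to evaluate for discrete beliefs. A minimal sketch with illustrative two-state models for a single fixed action (the matrices below are assumptions, not from the slides):

```python
import numpy as np

# Illustrative models: P[s, s'] = p(s'|s,a), O[s', o] = p(o|s'), r_s[s] = r(s).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])
r_s = np.array([0.0, 10.0])

def expected_reward(b):
    """r(b) = sum_s r(s) b(s)."""
    return float(r_s @ b)

def bayes_filter(b, o):
    """The Bayes filter behind p(b'|o',b,a):
    predict through the motion model, then weight by p(o|s')."""
    b_pred = P.T @ b              # prediction: sum_s p(s'|s,a) b(s)
    b_post = O[:, o] * b_pred     # correction: multiply by p(o|s')
    return b_post / b_post.sum()  # normalize
```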
Value Iteration in Belief Space
(Diagram: one backup step of value iteration in belief space. The agent is in state s with belief state b; an action leads to next state s' and reward r'; an observation o yields the next belief state b'. The value function Q(b, a) is backed up from max_a Q(b', a).)
Why is This So Complex?
State-space planning (no state uncertainty) vs. belief-space planning (full state uncertainty): what lies in between?
Augmented MDPs:
b̄ = ( argmax_s b(s), H_b[s] )
[Roy et al, 98/99]
The augmented state pairs the conventional state space (the most likely state) with the belief's uncertainty (its entropy).
Path Planning with Augmented MDPs
(Figure: a conventional planner's path vs. the probabilistic planner's path, which detours for information gain.)
[Roy et al, 98/99]
Robot Planning Frameworks

|                          | Classical AI / robot planning        | Value iteration in MDPs | Parti-Game | POMDP                | Augmented MDP        |
|--------------------------|--------------------------------------|-------------------------|------------|----------------------|----------------------|
| State/actions            | discrete & continuous                | discrete                | continuous | discrete             | discrete             |
| State                    | observable                           | observable              | observable | partially observable | partially observable |
| Environment              | deterministic                        | stochastic              | stochastic | stochastic           | stochastic           |
| Plans                    | sequences of actions                 | policy                  | policy     | policy               | policy               |
| Completeness             | yes                                  | yes                     | yes        | yes                  | no                   |
| Optimality               | rarely                               | yes                     | no         | yes                  | no                   |
| State space size         | huge, often continuous, 6 dimensions | millions                | n/a        | dozens               | thousands            |
| Computational complexity | varies                               | quadratic               | n/a        | exponential          | O(N^4)               |