Ames Research Center

Planning with Uncertainty in Continuous Domains

Richard Dearden (No fixed abode)
Joint work with: Zhengzhu Feng (U. Mass Amherst); Nicolas Meuleau, Dave Smith, Richard Washington (NASA Ames)
Motivation

[Figure: a rover panorama with candidate science targets — image rock, dig trench — and an open question mark.]

Problem: Scientists are interested in many potential targets. How do we decide which to pursue?
Motivation

[Figure: the same target map, annotated with the key uncertainties — Time? Power? Likelihood of success? Different-value targets.]
Outline

• Introduction
• Problem Definition
• A Classical Planning Approach
• The Markov Decision Problem Approach
• Final Comments
Problem Definition

Aim: to select a "plan" that "maximises" the long-term expected reward received, given:
• Limited resources (time, power, memory capacity).
• Uncertainty about the resources required to carry out each action ("how long will it take to drive to that rock?").
• Hard safety constraints on action applicability (must keep enough reserve power to maintain the rover).
• Uncertain action outcomes (some targets may be unreachable; instruments may be impossible to place).

Difficulties:
• Continuous resources.
• Actions have uncertain continuous outcomes.
• Goal selection and optimisation.
• Also possibly concurrency, …
Possible Approaches

Contingency Planning:
• Generate a single plan, but with branches.
• Branch based on the actual outcome of the actions performed so far in the plan.

Policy-based Planning:
• A plan is now a policy: a mapping from states to actions.
• There's something to do no matter what the outcome of the actions so far.
• More general, but harder to compute.

[Figure: a branching plan that tests whether Power > 5 Ah or Power ≤ 5 Ah.]
An Example Problem

[Figure: an example rover plan network over the actions Drive, Dig, Visual servo, Rock finder, Lo-res, Hi-res, and NIR. Each action is annotated with an energy precondition (e.g. E > 10 Ah for Dig), energy-usage and duration parameters (e.g. Dig: 1000 s and 500 s), and in some cases an execution time window (e.g. t ∈ [10:00, 14:00]). The goals carry values V = 100, 50, 10, and 5.]
Value Function

[Figure: 3-D surface of expected value as a function of power (5–20 Ah) and start time (13:20–14:40).]
Value Function

[Figure: the same expected-value surface over power and start time, shown alongside the example plan network from which it was computed.]
Plans

Contingency Planning:
[Figure: the example plan with a contingency branch taken when Time > 13:40 or Power < 10, choosing between Visual servo, Lo-res, and Hi-res.]

Policy-based Planning:
• Regions of state space have corresponding actions, e.g.:
  Time < 13:40 and Power > 10 : VisualServo
  Time > 14:15 and Time < 14:30 and Power > 10 : Hi-Res
  …
Contingency Planning

1. Seed plan
2. Identify best branch point
3. Generate a contingency branch
4. Evaluate & integrate the branch

[Figure: candidate branch points "?" along the seed plan. A branch's value Vb is compared with the main plan's value Vm as a function of remaining resource r. The evaluation proceeds by constructing a plangraph, back-propagating value tables, and computing the gain.]
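The four steps above form a loop that keeps adding the best contingency branch while it pays off. This skeleton is only a sketch of that control flow under stated assumptions: all five callables are hypothetical placeholders supplied by the caller, not functions from the talk.

```python
# A minimal skeleton of the four-step contingency-planning loop.
# find_branch_point, build_branch, expected_gain, and integrate are
# placeholder callables (assumptions) that make the control flow concrete.

def contingency_plan(seed_plan, find_branch_point, build_branch,
                     expected_gain, integrate, max_branches=3):
    """Grow a seed plan by repeatedly adding the best contingency branch."""
    plan = seed_plan
    for _ in range(max_branches):
        point = find_branch_point(plan)            # step 2
        if point is None:
            break
        branch = build_branch(plan, point)         # step 3
        if expected_gain(plan, point, branch) <= 0:
            break                                  # step 4: not worth adding
        plan = integrate(plan, point, branch)      # step 4: splice it in
    return plan

# Toy run: plans are action lists; one branch is worth adding.
plan = contingency_plan(
    ["drive", "dig"],
    find_branch_point=lambda p: 1 if len(p) < 3 else None,
    build_branch=lambda p, i: ["hi-res"],
    expected_gain=lambda p, i, b: 5.0,
    integrate=lambda p, i, b: p + b,
)
```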
Construct Plangraph

[Figure: a plangraph connecting actions to the four goals g1–g4.]
Add Resource Usages and Values

[Figure: the plangraph with values V1–V4 attached to goals g1–g4, and resource usages attached to the actions.]
Value Graphs

[Figure: each goal gi now carries a value graph: its expected value Vi as a function of remaining resource r.]
Propagate Value Graphs

[Figure: the value graphs v(r) are propagated backwards from the goals through the plangraph toward the initial state.]
Simple Back-propagation

[Figure: back-propagating a value table v(r) through an action with an uncertain resource-usage distribution p(r): the table's resource thresholds shift by the usage (e.g. from r ≥ 5 to r ≥ 10–25) and the values are probability-weighted.]
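On a discrete resource grid the back-propagation step can be sketched as follows. The usage distribution and table values here are illustrative, assumed for the example, and states where the usage exceeds the available resource contribute zero (the action cannot be applied there).

```python
# A sketch of simple back-propagation: push a value table backwards
# through an action whose resource usage is uncertain. The usage
# distribution and values are illustrative assumptions.

def backprop(value_after, usage_dist, levels):
    """Expected value of holding r resource units *before* the action.

    value_after maps remaining resource to value; usage_dist maps usage
    to probability. Usages exceeding r contribute zero."""
    return {
        r: sum(p * value_after.get(r - u, 0.0)
               for u, p in usage_dist.items() if u <= r)
        for r in levels
    }

levels = range(0, 31)
v_after = {r: (15.0 if r >= 5 else 0.0) for r in levels}  # goal's table
usage = {5: 0.8, 10: 0.2}    # action uses 5 or 10 units of resource
v_before = backprop(v_after, usage, levels)
```

For example, with 12 units in hand the cheap usage (probability 0.8) leaves enough resource to earn the value of 15, while the expensive one does not, so the back-propagated value is 0.8 × 15 = 12.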
Constraints

[Figure: the same back-propagation, but a resource constraint on the action (r > 15) truncates the resulting value table.]
Conjunctions

[Figure: back-propagation through an action with conjunctive preconditions (p and q, s and t): the value tables are annotated with the outstanding conditions ({q}, {t}) still to be achieved.]
Back-propagating Conditions

[Figure: the condition sets attached to the value tables are propagated backwards until the actions that achieve those conditions are reached.]
Back-propagating Conditions

[Figure: a further step of the same propagation: once their conditions are discharged, the tagged tables are combined into a single value table over r.]
Which Orderings

[Figure: four actions A–D admit many interleavings (ABCD, ACBD, ACDB, CABD, CADB, CDAB, …); the plangraph does not consider them all.]
Combining Tables

[Figure: two value tables v1 and v2 for alternative ways of achieving a goal are combined by taking the pointwise Max over r.]
Achieving Both Goals

[Figure: when both goals are wanted, the tables v1 and v2 are instead combined by pointwise addition, giving v1 + v2 (30 in the example).]
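The two combination rules can be sketched on a shared resource grid: pointwise Max when v1 and v2 are alternative ways to achieve one goal, pointwise sum when both goals are wanted. This toy version deliberately ignores the interaction between the subplans' resource usages, which the condition-tagged tables above are there to handle; the tables themselves are illustrative.

```python
# Combining two value tables over the same resource grid.
# The grids and values are illustrative assumptions.

def combine_max(v1, v2):
    """Alternative ways to achieve one goal: take the better table."""
    return {r: max(v1[r], v2[r]) for r in v1}

def combine_sum(v1, v2):
    """Achieving both goals: the values add (resource interaction ignored)."""
    return {r: v1[r] + v2[r] for r in v1}

v1 = {r: (10.0 if r >= 5 else 0.0) for r in range(0, 31)}
v2 = {r: (20.0 if r >= 10 else 0.0) for r in range(0, 31)}
either = combine_max(v1, v2)  # 10 on [5, 10), 20 from 10 upward
both = combine_sum(v1, v2)    # 30 once r >= 10
```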
Estimating Branch Value

[Figure: the value graphs V1–V4 reaching the candidate branch point are combined with a pointwise Max to give the branch's value function V(r).]
Estimating Branch Value

[Figure: at the branch point we have the main plan's value function Vm(r), the candidate branch's value function Vb(r), and the resource probability distribution P(r) over the resource remaining when the branch point is reached.]
Expected Branch Gain

[Figure: Vb(r), Vm(r) and the resource distribution P(r) at the branch point.]

Gain = ∫₀^∞ P(r) max{0, Vb(r) − Vm(r)} dr
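The gain integral is straightforward to approximate on a uniform grid. The density and value functions below are illustrative stand-ins, not the ones from the talk; the quadrature is a plain Riemann sum.

```python
# Numerically estimating the expected branch gain
#     Gain = integral over r of P(r) * max{0, Vb(r) - Vm(r)}.
# P, Vb, and Vm below are illustrative stand-ins.

def branch_gain(p, v_b, v_m, rs, dr):
    """Riemann-sum approximation of the gain integral."""
    return sum(p(r) * max(0.0, v_b(r) - v_m(r)) for r in rs) * dr

dr = 0.01
rs = [i * dr for i in range(3000)]          # r in [0, 30)
p = lambda r: 0.1 if 10 <= r < 20 else 0.0  # uniform density on [10, 20)
v_b = lambda r: 15.0 if r >= 12 else 0.0    # branch value function
v_m = lambda r: 10.0                        # main-plan value function
gain = branch_gain(p, v_b, v_m, rs, dr)     # analytically 0.1 * 5 * 8 = 4.0
```

Only the region where the branch beats the main plan (here r ≥ 12) and has probability mass contributes, which is exactly why a branch can look valuable on paper yet have negligible expected gain.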
Heuristic Guidance

Plangraphs are generally used as heuristics; the plans they produce may not be executable:
• Not all orderings considered.
• All the usual plangraph limitations:
  – Delete lists generally not considered.
  – No mutual-exclusion representation.
• Discrete outcomes not (currently) handled.
  – Action uncertainty is only in resource usage, not resulting state.

The output is used as heuristic guidance for a classical planner:
• Start state
• Goal(s) to achieve

Result is an executable plan of high value!

[Figure: the resulting plan over Drive (-1), Dig (5), Visual servo (.2, -.15), Rock finder, Lo-res, Hi-res, and NIR.]
Evaluating the final plan

[Figure: Monte-Carlo estimate of the expected-value surface over power (5–20 Ah) and start time (13:20–14:40).]

The plangraph gives a heuristic estimate of the value of the plan. A better estimate can be computed using Monte-Carlo techniques, but these are quite slow for a multi-dimensional continuous problem: the figure required 500 samples per point over a 4000 × 2000 grid, i.e. simulating every branch of the plan 4 billion times. Slow!
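The Monte-Carlo evaluation amounts to repeatedly sampling the uncertain resource usage of each action and scoring the run; the cost per grid point is what makes the full surface so expensive. The plan, cost model, and sample count below are illustrative assumptions, not the talk's rover model.

```python
# A toy Monte-Carlo plan evaluation: sample each action's uncertain
# resource usage and average the achieved value. The plan and cost
# model are illustrative assumptions.
import random

def simulate(plan, budget, rng):
    """Execute the plan once with sampled costs; stop when out of budget."""
    value = 0.0
    for mean_cost, reward in plan:
        cost = rng.uniform(0.5 * mean_cost, 1.5 * mean_cost)
        if cost > budget:
            break             # not enough resource left: abandon the rest
        budget -= cost
        value += reward
    return value

rng = random.Random(0)                 # fixed seed for repeatability
plan = [(2.0, 10.0), (5.0, 100.0)]     # (mean energy cost, goal value)
n = 10_000
estimate = sum(simulate(plan, 6.0, rng) for _ in range(n)) / n
```

Each grid point of the figure needs one such average, which is why the full multi-dimensional sweep runs into billions of simulations.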
Outline

• Introduction
• Problem Definition
• A Classical Planning Approach
• The Markov Decision Problem Approach
• Final Comments
MDP Approach: Motivation

[Figure: the expected-value surface over power (5–20 Ah) and start time (13:20–14:40); large regions of it are flat.]

The value function is constant throughout large regions. Wouldn't it be nice to compute the value only once per region!

Approach: exploit the structure in the problem to find constant (or linear) regions.
Continuous MDPs

States: X = {X1, X2, …, Xn}
Actions: A = {a1, a2, …, am}
Transition: Pa(X′ | X)
Reward: Ra(X)

Dynamic programming (Bellman backup):

  Vn+1(X) = max_a [ Ra(X) + ∫ Pa(X′ | X) Vn(X′) dX′ ]

This can't be computed in general without discretization.
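The naive discretized version of the Bellman backup looks like this: the baseline the structured approach improves on. The single action, its reward, and its usage distribution are illustrative assumptions (an imaging action that consumes 2 or 3 units of energy).

```python
# Naively discretized Bellman backup over one continuous variable.
# The action model is an illustrative assumption.

def bellman_backup(v, actions, levels):
    """One sweep of V(x) = max_a [ Ra(x) + sum_x' Pa(x'|x) V(x') ].
    Doing nothing (value 0) is always available."""
    new_v = {}
    for x in levels:
        best = 0.0
        for reward, trans in actions:
            best = max(best, reward(x) + sum(p * v[x2] for x2, p in trans(x)))
        new_v[x] = best
    return new_v

levels = range(0, 11)                        # energy grid, 0..10 units
reward = lambda x: 10.0 if x >= 3 else 0.0   # enough energy to image
trans = lambda x: [(max(x - 2, 0), 0.5), (max(x - 3, 0), 0.5)]
v = {x: 0.0 for x in levels}
for _ in range(2):                           # two value-iteration sweeps
    v = bellman_backup(v, [(reward, trans)], levels)
```

Every state on the grid is backed up individually, even where the value function is flat; that redundancy is exactly what the symbolic representation below avoids.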
Symbolic Dynamic Programming

• Special representation of the transition, reward and value functions: MTBDDs for discrete variables, kd-trees for continuous ones.
• The representation makes problem structure (if any) explicit.
• Dynamic programming is performed directly on the structured representation of the value function.
• The idea is to do all operations of the Bellman equation in MTBDD/kd-tree form.
Continuous State Abstraction

Requires rectangular transition and reward functions:
• Transition probabilities remain constant (relative to the current value) over each region.
• The transition function is discrete: continuous functions are approximated by discretizing. This is required so that the family of value functions is closed under the Bellman equation.
Continuous State Abstraction

Requires rectangular transition and reward functions:
• The reward function is piecewise constant or linear over each region.
• This, along with the discrete transition function, ensures that all value functions computed using the Bellman equation are also piecewise constant or linear.
• The approach is to compute an exact solution to an approximate model.
Value Iteration

Theorem: If Vn is rectangular piecewise constant (piecewise linear), then Vn+1 is rectangular piecewise constant (piecewise linear).

[Figure: applying the transition Pa to Vn yields Vn+1 on a refined rectangular partition.]

Rectangular partitions are represented using kd-trees.
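A one-dimensional illustration of the closure theorem: backing a piecewise-constant (PWC) value function up through a discrete usage distribution yields another PWC function, so only the breakpoints need to be stored. This list-of-thresholds representation is a stand-in I am assuming for illustration, not the talk's kd-tree code.

```python
# 1-D sketch of PWC closure under the backup. A PWC function is a list
# of (threshold, value) pairs, sorted by threshold, meaning V(r) is the
# value of the last piece with r >= threshold (0 below the first piece).

def shift(pwc, u):
    """V'(r) = V(r - u): shifting a PWC function stays PWC."""
    return [(t + u, val) for t, val in pwc]

def value_at(pwc, r):
    v = 0.0
    for t, val in pwc:          # pieces are sorted by ascending threshold
        if r >= t:
            v = val
    return v

def mix(weighted_pwcs):
    """Probability-weighted sum of PWC functions: still PWC, with
    breakpoints at the union of the inputs' breakpoints."""
    points = sorted({t for pwc, _ in weighted_pwcs for t, _ in pwc})
    return [(t, sum(p * value_at(pwc, t) for pwc, p in weighted_pwcs))
            for t in points]

v = [(5.0, 15.0)]               # value 15 once at least 5 units remain
usage = {2.0: 0.5, 4.0: 0.5}    # the action uses 2 or 4 units
backed_up = mix([(shift(v, u), p) for u, p in usage.items()])
# -> [(7.0, 7.5), (9.0, 15.0)]: 7.5 on [7, 9), 15 from 9 upward
```

The backed-up function needs only two breakpoints, however fine a uniform grid would have to be to represent it, which is the point of the theorem.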
Partitioning

[Figure: an example kd-tree partition of the continuous state space into rectangular regions.]
Performance: 2 Continuous Variables

[Figure: performance comparison on problems with two continuous variables.]
Performance: 3 Continuous Variables

[Figure: performance comparison on problems with three continuous variables.]

For the naïve algorithm, everything is discretized at the given input resolution. For the others, the transition functions are discretized at that resolution, but the algorithm may increase the resolution to represent the final value function accurately. This means the value function is actually more accurate than for the naïve algorithm.
Final Remarks

Plangraph-based approach:
• Produces "plans", which are easy for people to interpret.
• Fast heuristic estimate of the value of a plan or plan fragment.
• Needs an effective way to evaluate actual values to really know a branch is worthwhile.
• Efficient representation for problems with many goals.
• Still missing discrete action outcomes.

MDP-based approach:
• Produces optimal policies: the best you could possibly do.
• Faster, more accurate value-function computation (if there's structure).
• Some problems are hard to represent effectively (e.g. the fact that goals are worth something only before you reach them).
• Policies are hard for humans to interpret.

The two can be combined: use the MDP approach to evaluate the quality of plans and plan fragments.
Future Work

• We approximate by building an approximate model, then solving it exactly. One could also approximately solve the exact model.
• The plangraph approach takes advantage of the current system state when planning to narrow the search; the MDP policy probably includes value computation for many unreachable states.
• Preference elicitation is very important here: with many goals we need good estimates of their value.
• This is part of a greater whole, the rover planning problem:
  • Is the policy encoded efficiently enough to transmit to the rover?
  • How much more complex does the executive need to be to carry out a contingent plan?