![Page 1: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/1.jpg)
A Framework for Planning in Continuous-time Stochastic Domains
Håkan L. S. YounesCarnegie Mellon University
David J. Musliner Reid G. SimmonsHoneywell Laboratories Carnegie Mellon
University
![Page 2: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/2.jpg)
Introduction
Policy generation for complex domains Uncertainty in outcome and timing of
actions and events Time as a continuous quantity Concurrency
Rich goal formalism Achievement, maintenance, prevention Deadlines
![Page 3: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/3.jpg)
Motivating Example
Deliver package from CMU to Honeywell
CMU PIT
PittsburghMSP
Honeywell
Minneapolis
![Page 4: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/4.jpg)
Elements of Uncertainty
Uncertain duration of flight and taxi ride
Plane can get full without reservation
Taxi might not be at airport when arriving in Minneapolis
Package can get lost at airports
![Page 5: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/5.jpg)
Modeling Uncertainty
Associate a delay distribution F(t) with each action/event a
F(t) is the cumulative distribution function for the delay from when a is enabled until it triggers
driving at airport
arrive
U(20,40)
Pittsburgh Taxi
![Page 6: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/6.jpg)
Concurrency
Concurrent semi-Markov processes
driving at airport
U(20,40)
Pittsburgh Taxi
at airport moving
arrive
move
return
Exp(1/40)
U(10,20)Minneapolis Taxi
PT drivingMT at airport
PT drivingMT moving
t=0
t=24
MT move
Generalized semi-Markov process
![Page 7: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/7.jpg)
Rich Goal Formalism
Goals specified as CSL formulae ::= true | a | | | Pr≥ () ::= U≤t | ◊≤t | □≤t
![Page 8: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/8.jpg)
Goal for Motivating Example
Probability at least 0.9 that the package reaches Honeywell within 300 minutes without getting lost on the way Pr≥0.9(pkg lost U≤300 pkg@Honeywell)
![Page 9: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/9.jpg)
Problem Specification
Given: Complex domain model
Stochastic discrete event system Initial state Probabilistic temporally extended goal
CSL formula
Wanted: Policy satisfying goal formula in initial
state
![Page 10: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/10.jpg)
Generate, Test and Debug [Simmons 88]
Generate initial policy
Test if policy is good
Debug and repair policy
good
badrepeat
![Page 11: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/11.jpg)
Generate
Ways of generating initial policy Generate policy for relaxed problem Use existing policy for similar problem Start with null policy Start with random policy
Not focus of this talk! Generate
Test
Debug
![Page 12: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/12.jpg)
Test
Use discrete event simulation to generate sample execution paths
Use acceptance sampling to verify probabilistic CSL goal conditions
Generate
Test
Debug
![Page 13: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/13.jpg)
Debug
Analyze sample paths generated in test step to find reasons for failure
Change policy to reflect outcome of failure analysis
Generate
Test
Debug
![Page 14: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/14.jpg)
More on Test Step
Generate initial policy
Test if policy is good
Debug and repair policy
Test
![Page 15: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/15.jpg)
Error Due to Sampling
Probability of false negative: ≤ Rejecting a good policy
Probability of false positive: ≤ Accepting a bad policy
(1−)-soundness
![Page 16: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/16.jpg)
Acceptance Sampling
Hypothesis: Pr≥()
![Page 17: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/17.jpg)
Performance of Test
Actual probability of holding
Pro
bab
ility
of
acc
ep
tin
gPr ≥
(
) as
tru
e
1 –
![Page 18: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/18.jpg)
Ideal Performance
Actual probability of holding
Pro
bab
ility
of
acc
ep
tin
gPr ≥
(
) as
tru
e
1 –
False negatives
False positives
![Page 19: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/19.jpg)
Realistic Performance
– +
Indifference region
Actual probability of holding
Pro
bab
ility
of
acc
ep
tin
gPr ≥
(
) as
tru
e
1 –
False negatives
False positives
![Page 20: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/20.jpg)
SequentialAcceptance Sampling [Wald 45]
Hypothesis: Pr≥()
True, false,or anothersample?
![Page 21: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/21.jpg)
Graphical Representation of Sequential Test
Number of samples
Nu
mb
er
of
posi
tive s
am
ple
s
![Page 22: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/22.jpg)
Graphical Representation of Sequential Test
We can find an acceptance line and a rejection line given , , , and
Reject
Accept
Continue sampling
Number of samples
Nu
mb
er
of
posi
tive s
am
ple
s A,,,(n)
R,,,(n)
![Page 23: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/23.jpg)
Graphical Representation of Sequential Test
Reject hypothesis
Reject
Accept
Continue sampling
Number of samples
Nu
mb
er
of
posi
tive s
am
ple
s
![Page 24: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/24.jpg)
Graphical Representation of Sequential Test
Accept hypothesis
Reject
Accept
Continue sampling
Number of samples
Nu
mb
er
of
posi
tive s
am
ple
s
![Page 25: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/25.jpg)
Anytime Policy Verification
Find best acceptance and rejection lines after each sample in terms of and
Number of samples
Nu
mb
er
of
posi
tive s
am
ple
s
![Page 26: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/26.jpg)
Verification Example
Initial policy for example problem
CPU time (seconds)
Err
or
pro
babili
ty
0
0.1
0.2
0.3
0.4
0.5
0 0.02 0.04 0.06 0.08 0.1 0.12
rejectaccept
=0.01
![Page 27: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/27.jpg)
More on Debug Step
Generate initial policy
Test if policy is good
Debug and repair policyDebug
![Page 28: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/28.jpg)
Role of Negative Sample Paths
Negative sample paths provide evidence on how policy can fail “Counter examples”
![Page 29: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/29.jpg)
Generic Repair Procedure
1. Select some state along some negative sample path
2. Change the action planned for the selected state
Need heuristics to make informed state/action choices
![Page 30: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/30.jpg)
Scoring States
Assign –1 to last state along negative sample path and propagate backwards
Add over all negative sample pathss9 failures1 s5s2
s1 s5 s3 failure
−1−−2−3−4
−1−−2−3
s5
s5
![Page 31: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/31.jpg)
Example
Package gets lost at Minneapolis airport while waiting for the taxi
Repair: store package until taxi arrives
![Page 32: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/32.jpg)
Verification of Repaired Policy
CPU time (seconds)
Err
or
pro
babili
ty
=0.01
0
0.1
0.2
0.3
0.4
0.5
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45
rejectaccept
![Page 33: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/33.jpg)
Comparing Policies
Use acceptance sampling: Pair samples from the verification of
two policies Count pairs where policies differ Prefer first policy if probability is at
least 0.5 of pairs where first policy is better
![Page 34: A Framework for Planning in Continuous-time Stochastic Domains](https://reader031.vdocument.in/reader031/viewer/2022012922/5681351e550346895d9c7ea0/html5/thumbnails/34.jpg)
Summary
Framework for dealing with complex stochastic domains
Efficient sampling-based anytime verification of policies
Initial work on debug and repair heuristics