on time with minimal expected cost ! time with minimal expected cost ! alexandre david, peter g...
TRANSCRIPT
On Time with Minimal Expected Cost !
Alexandre David, Peter G Jensen, Kim G Larsen, Axel Legay, Didier Lime, Mathias G Sørensen, Jakob H Taankvist
– Aalborg University
DENMARK
TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAA
Motivation
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [2]
Optimal WC strategy:
E=WC=45 (bicycle)
Optimal Expected Strategy:
E=33, WC=71 (car)
Optimal Expected Strategy ensuring WC<=60:
try train upto 3 delays then bike
E=37.5, WC=59
Bruyere, V., Filiot, E., Randour, M., Raskin, J.F.: Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games. STACS14
2-Player Game (Antagonistic opponent)
Markov Decision Process (probabilistic opponent)
Duration Probabilistic Automata
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [3]
P2
P1
Pr[<=1000](<> P1.Done && P2.Done)
Kempf, J.F., Bozga, M., Maler, O.: As soon as probable: Optimal scheduling under stochastic uncertainty. In: TACAS. pp. 385{400 (2013)
Uni[2,6]
Race
Uni[2,4]
Strategy that that will minimize
expected completion time??
Motivation
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [4]
× Priced Timed Game
Timed Game Timed Automata Priced Timed Automata Stochastic (P)TA × Priced Timed MDP ∼ Decision Stochastic Priced Timed Automata
TIGA
UPPAAL
CORA
SMC
TIGA/SMC
Minimize expected cost subject to guaranteed time-bound
Uni[0,100]
Nondeterministic choice
Overview
Priced Timed Games
Time bounded reachability strategies
Priced Timed Markov Decision Processes
Minimal expected cost reachability strategy
Optimal Strategy Synthesis Using Reinforcement Learning
Representation of Stochastic Strategies
Experimental Results
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [5]
PTA and PTG
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [6]
PTA Semantics
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [7]
Strategies & Outcome
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [8]
Cost Bounded Reachability Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [9]
Motivation
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [10]
Objective: 𝐴⟨⟩(END ∧ time ≤ 210)
Deterministic, memoryless strategy:
100 200
x w w
𝝀 𝝀
time
100 200
time
x w w
𝝀
90 70
𝝀
a
𝝀
a b
Most permissive, memoryless strategy
Priced Timed MDPs
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [11]
Stochastic Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [12]
Induced Probability Measure
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [13]
Minimum Expected Cost
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [14]
Motivation
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [15]
Minimal Expected Cost Strategy (0,b) 2*80=160 Expected Cost for TIGA Strategy (100,w) 4*95=380
Minimal Expected Cost while
guaranteeing END is reached
within time 210:
Strat.: t>90 (100,w)
t>70 (0,b) ow (0,a) = 204
Reinforcement Learning
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [16]
Time Bounded Reachability (G,T)
TIGA
SMC
SMC
Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [17]
Nondeterministic Strategies (UPPAAL TIGA)
Stochastic Strategies (non-lazy *)
* Non-lazy strategies suffices for DPAs
Classes allowing for
efficient
representation and
learning
Learning
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [18]
Simulation of 𝑼𝒏𝒊 𝝈𝒑 for 𝐴⟨⟩(END ∧ time ≤ 210) time(𝝅)@Choice
C(𝝅) wait
a
b
Strategies – Representation & Manipulation
Covariance Matrices
Logistic Regression
Splitting
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [19]
Using Learning Determinization
Experiments
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [20]
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [21]
More plots of runs according to strategies learne.
Covariance
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [22]
More plots of runs according to strategies learne.
Covariance
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [23]
More plots of runs according to strategies learne.
Covariance
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [24]
More plots of runs according to strategies learne.
Covariance
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [25]
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [26]
More plots of runs according to strategies learne.
Regression
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [27]
More plots of runs according to strategies learne.
Regression
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [28]
More plots of runs according to strategies learne.
Regression
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [29]
More plots of runs according to strategies learne.
Regression
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [30]
More plots of runs according to strategies learne.
Regression
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [31]
More plots of runs according to strategies learne.
Regression
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [32]
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [33]
More plots of runs according to strategies learne.
Splitting
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [34]
More plots of runs according to strategies learne.
Splitting
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [35]
More plots of runs according to strategies learne.
Splitting
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [36]
More plots of runs according to strategies learne.
Splitting
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [37]
More plots of runs according to strategies learne.
Splitting
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [38]
More plots of runs according to strategies learne.
Splitting
Learned Strategies
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [39]
More plots of runs according to strategies learne.
Splitting
Experiments /DPA
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [40]
Kempf, J.F., Bozga, M., Maler, O.: As soon as probable: Optimal scheduling under stochastic uncertainty. In: TACAS. pp. 385{400 (2013)
http://www-verimag.imag.fr/PROJECTS/TEMPO/DATA/201304_dpa/
Experiments /DPA Random
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [41]
Experiments /DPA Random
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [42]
Conclusion & Future Work
Efficient synthesis of strategies for PTMDP ensuring time-bounds and minimizing expected cost.
If not time-bound needed we can omit the UPPAAL TIGA synthesis
Extension to Hybrid MDPs utilizing UPPAAL SMCs support for SHAs.
Make TIGA/SMC available to you! Datastructures supporting general stochastic
strategies – not just non-lazy ones. More clever filtrations of runs.
CASSTING, Brussels, May 21-23, 2014 Kim Larsen [43]