TRANSCRIPT
Reachability in MDPs: Refining Convergence
of Value Iteration
Serge Haddad (LSV, ENS Cachan, CNRS & Inria) and
Benjamin Monmege (ULB)
RP 2014, Oxford
2
Markov Decision Processes
• What?
✦ Stochastic process with non-deterministic choices
✦ Non-determinism resolved by policies/strategies
• Where?
✦ Optimization
✦ Program verification: reachability as the basis of PCTL model checking
✦ Game theory: 1½ players
3
MDPs: definition and objective
[Figure: example MDP with actions a, b, c, d, e and transition probabilities ½, ½, ⅓, ⅔]
Finite number of states
Probabilistic states
Actions to be selected by the policy
Reachability objective
M = (S, α, δ), with δ : S × α → Dist(S)
Policy: σ : (S·α)*·S → Dist(α)
Probability to reach the target s+: Pr_s^σ(F s+)
Maximal probability to reach: Pr_s^max(F s+) = sup_σ Pr_s^σ(F s+)
4
Optimal reachability probabilities of MDPs
• How?
✦ Linear programming
✦ Policy iteration
✦ Value iteration: numerical scheme that scales well and works in practice
(used in the PRISM model checker [Kwiatkowska, Norman, Parker, 2011])
5
Value iteration
[Figure: the example MDP from the previous slide]
x_s^(0) = 1 if s = s+, 0 otherwise
x_s^(n+1) = max_{a ∈ α} Σ_{s' ∈ S} δ(s,a)(s') · x_{s'}^(n)
Successive iterates (one column per state; the maximizing action in parentheses):
0        0           0       0
0        2/3 (b)     0       0
1/3      2/3 (b)     0       0
1/2      2/3 (b)     1/6     0
7/12     13/18 (b)   1/4     0
…        …           …       …
0.7969   0.7988 (b)  0.3977  0
0.7978   0.7992 (b)  0.3984  0
Stop: the difference between the last two iterates is ≤ 0.001.
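The update rule above can be sketched in a few lines of Python. The MDP below is a made-up toy example (not the one drawn on the slide): `delta` maps each (state, action) pair to a distribution over successors, and `goal` stands in for the target state s+.

```python
# Value iteration for maximal reachability, as on the slide:
#   x_s^(0)   = 1 if s is the target, else 0
#   x_s^(n+1) = max_a sum_{s'} delta(s, a)(s') * x_{s'}^(n)
# Hypothetical toy MDP (my own example, not the one in the figure):
delta = {
    ("s0", "a"): {"s1": 0.5, "s2": 0.5},
    ("s0", "b"): {"goal": 1/3, "sink": 2/3},
    ("s1", "a"): {"goal": 1.0},
    ("s2", "a"): {"sink": 1.0},
    ("goal", "a"): {"goal": 1.0},   # target is absorbing
    ("sink", "a"): {"sink": 1.0},   # sink is absorbing
}
states = {"s0", "s1", "s2", "goal", "sink"}
actions = {s: [a for (t, a) in delta if t == s] for s in states}

def value_iteration(eps=1e-6):
    x = {s: 1.0 if s == "goal" else 0.0 for s in states}
    while True:
        new = {s: max(sum(p * x[t] for t, p in delta[(s, a)].items())
                      for a in actions[s])
               for s in states}
        new["goal"], new["sink"] = 1.0, 0.0   # keep boundary values fixed
        if max(abs(new[s] - x[s]) for s in states) <= eps:
            return new
        x = new

x = value_iteration()
# From s0, action a gives 1/2, action b gives 1/3, so x["s0"] = 0.5.
```

On this toy the iteration stabilizes after a couple of steps; the slides' point is precisely that the stopping criterion used here is not always trustworthy.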
6
Value iteration: which guarantees?
[Figure: chain MDP with states 0, 1, …, k, …, 2k; state 0 is the target, state 2k a sink, and every interior state moves left or right with probability ½]
State:     0   1    2    3    …   k−1        k      k+1  …  2k
Step 1:    1   0    0    0    …   0          0      0    …  0
Step 2:    1   1/2  0    0    …   0          0      0    …  0
Step 3:    1   1/2  1/4  0    …   0          0      0    …  0
Step 4:    1   1/2  1/4  1/8  …   0          0      0    …  0
…          …   …    …    …    …   …          …      …    …  …
Step k:    1   1/2  1/4  1/8  …   1/2^(k−1)  0      0    …  0
Step k+1:  1   1/2  1/4  1/8  …   1/2^(k−1)  1/2^k  0    …  0
Difference between steps k and k+1: ≤ 1/2^k. Yet the real value at state k is 1/2 (by symmetry).
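This pitfall is easy to reproduce numerically. The sketch below builds the chain above for k = 100 (state 0 the target with value 1, state 2k the sink with value 0; interior states move left or right with probability ½, so the max over actions is trivial) and runs value iteration with the usual stopping criterion:

```python
# The chain from the slide: x_0 = 1 (target), x_2k = 0 (sink), and
# x_i^(n+1) = 1/2 * (x_{i-1}^(n) + x_{i+1}^(n)) for interior states.
# True values are x_i = 1 - i/(2k); in particular x_k = 1/2.

def chain_vi(k, eps):
    x = [1.0] + [0.0] * (2 * k)
    steps = 0
    while True:
        new = ([1.0]
               + [0.5 * (x[i - 1] + x[i + 1]) for i in range(1, 2 * k)]
               + [0.0])
        steps += 1
        if max(abs(a - b) for a, b in zip(new, x)) <= eps:
            return new, steps
        x = new

x, steps = chain_vi(k=100, eps=1e-3)
# The usual criterion fires while x[100] is still close to 0,
# although the true value at state 100 is 1/2.
```

The criterion measures progress between consecutive iterates, not distance to the limit, which is exactly the gap the slide exhibits.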
7
Contributions
1. Enhanced value iteration algorithm with strong guarantees
• performs two value iterations in parallel
• keeps an interval of possible optimal values
• uses the interval for the stopping criterion
2. Study of the speed of convergence
• also applies to classical value iteration
3. Improved rounding procedure for exact computation
8
Interval iteration
[Figure: the chain MDP again, and a plot of the iterates (values between 0 and 1 against the number of steps, 1 to 6)]
f_max(x)_s = max_{a ∈ α} Σ_{s' ∈ S} δ(s,a)(s') · x_{s'}
From below (usual stopping criterion):
x_s^(0) = 1 if s = s+, 0 otherwise        x^(n+1) = f_max(x^(n))
NEW stopping criterion: also iterate from above:
y_s^(0) = 0 if s = s−, 1 otherwise        y^(n+1) = f_max(y^(n))
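A sketch of the two parallel iterations on the chain from the figure (k = 10 here; state 0 the target, state 2k the sink; each interior state has a single action, so f_max is just the ½-½ average). x starts below, y starts above, and the new criterion stops only when the two sequences meet:

```python
# Interval iteration on the chain: x from below, y from above.
# x^(0): 1 at the target, 0 elsewhere.  y^(0): 0 at the sink, 1 elsewhere.
def interval_iteration_chain(k, eps):
    n = 2 * k
    x = [1.0] + [0.0] * n
    y = [1.0] * n + [0.0]
    while max(yi - xi for xi, yi in zip(x, y)) > eps:
        x = [1.0] + [0.5 * (x[i - 1] + x[i + 1]) for i in range(1, n)] + [0.0]
        y = [1.0] + [0.5 * (y[i - 1] + y[i + 1]) for i in range(1, n)] + [0.0]
    return x, y

x, y = interval_iteration_chain(k=10, eps=0.01)
# Each true value 1 - i/(2k) lies in [x[i], y[i]], an interval of width <= 0.01.
```

Unlike the usual criterion, the width of the interval [x_s, y_s] is an actual error bound, since x increases towards the limit and y decreases towards it.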
9
Fixed point characterization
(Pr_s^max(F s+))_{s∈S} is the smallest fixed point of f_max.
[Plot: the two sequences of iterates against the number of steps]
The sequence from below always converges towards (Pr_s^max(F s+))_{s∈S}; the sequence from above does not always!
[Figure: counterexample MDP with states a, b, c, d, e, f, g and transition probabilities ½, ½, ½, 0.2, 0.3; the vectors (1, 1, 0.7, 0, 0, 1) and (0.5, 0.5, 0.45, 0, 0, 1) are both fixed points of f_max]
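The failure from above is easy to see on a tiny hand-made example (my own, simpler than the MDP on the slide): two states u and v that cycle into each other form an end component, and u additionally has an action reaching the goal or the sink with probability ½ each.

```python
# States: u, v, plus an absorbing goal (value 1) and sink (value 0).
# Actions: "stay" cycles u -> v -> u; "go" from u hits goal/sink with prob 1/2.
# f_max on the vector (x_u, x_v), with goal and sink already fixed at 1 and 0:
def f_max(x):
    xu, xv = x
    return (max(xv, 0.5 * 1.0 + 0.5 * 0.0), xu)

# Both of these are fixed points of f_max:
least = (0.5, 0.5)      # the true maximal reachability probabilities
greatest = (1.0, 1.0)   # a spurious fixed point created by the u-v cycle
assert f_max(least) == least and f_max(greatest) == greatest

# Iterating from below reaches the least fixed point in two steps...
x = (0.0, 0.0)
for _ in range(10):
    x = f_max(x)
# ...but iterating from above never leaves (1, 1):
y = (1.0, 1.0)
for _ in range(10):
    y = f_max(y)
```

The cycle lets the states "promise" each other a value of 1 that no policy can realize, which is exactly what the upper iteration gets stuck on.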
10
Solution: ensure uniqueness!
The usual techniques for MDPs do not apply…
[Figure: the counterexample MDP with the states where Pr_s^max(F s+) = 1 and where Pr_s^max(F s+) = 0 marked]
Even after trivializing these states: still multiple fixed points!
NEW! Use Maximal End Components (computable in polynomial time [de Alfaro, 1997]) and trivialize them!
[Figure: decomposition into two Bottom Maximal End Components, a Trivial Maximal End Component and a non-trivial Maximal End Component; collapsing yields the max-reduced MDP]
Now the fixed point is unique: the max-reduced MDP.
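A sketch of the trivialization step (my own minimal version, assuming the MECs have already been computed, e.g. by de Alfaro's algorithm): each MEC is merged into one representative state, actions that stay entirely inside their MEC are dropped, and the remaining distributions are redirected through the representatives.

```python
def collapse_mecs(delta, mecs):
    """delta: dict mapping (state, action) to {successor: probability}.
    mecs: list of sets of states; states outside every MEC are untouched.
    Actions are re-tagged with their source state to avoid clashes after
    merging several states into one representative."""
    rep = {s: min(mec) for mec in mecs for s in mec}
    reduced = {}
    for (s, a), dist in delta.items():
        r = rep.get(s, s)
        # drop an action whose successors all stay inside the same MEC
        if s in rep and all(rep.get(t, t) == r for t in dist):
            continue
        merged = {}
        for t, p in dist.items():
            rt = rep.get(t, t)
            merged[rt] = merged.get(rt, 0.0) + p
        reduced[(r, (s, a))] = merged
    return reduced

# The cycle {u, v} with a single exit from u collapses to one state:
delta = {
    ("u", "stay"): {"v": 1.0},
    ("v", "stay"): {"u": 1.0},
    ("u", "go"):   {"goal": 0.5, "sink": 0.5},
}
reduced = collapse_mecs(delta, mecs=[{"u", "v"}])
# Only the exit survives: {("u", ("u", "go")): {"goal": 0.5, "sink": 0.5}}
```

On the collapsed MDP the spurious "cycle" fixed points disappear, which is the uniqueness the slide claims for the max-reduced MDP.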
11
An even smaller MDP for minimal probabilities
[Figure: the same decomposition into Bottom, Trivial and non-trivial Maximal End Components; the resulting min-reduced MDP]
Non-trivial (and non-accepting) MECs have null minimal probability!
12
Interval iteration algorithm in reduced MDPs

Algorithm 1: Interval iteration algorithm for minimum reachability
Input: min-reduced MDP M = (S, α_M, δ_M), convergence threshold ε
Output: under- and over-approximation of Pr_M^min(F s+)
1  x_{s+} := 1; x_{s−} := 0; y_{s+} := 1; y_{s−} := 0
2  foreach s ∈ S \ {s+, s−} do x_s := 0; y_s := 1
3  repeat
4    foreach s ∈ S \ {s+, s−} do
5      x'_s := min_{a ∈ A(s)} Σ_{s' ∈ S} δ_M(s,a)(s') · x_{s'}
6      y'_s := min_{a ∈ A(s)} Σ_{s' ∈ S} δ_M(s,a)(s') · y_{s'}
7    Δ := max_{s ∈ S} (y'_s − x'_s)
8    foreach s ∈ S \ {s+, s−} do x_s := x'_s; y_s := y'_s
9  until Δ ≤ ε
10 return (x_s)_{s∈S}, (y_s)_{s∈S}

– y_min^(0) = y^(0) and, for all n ∈ ℕ, y_min^(n+1) = f_min(y_min^(n));
– y_max^(0) = y^(0) and, for all n ∈ ℕ, y_max^(n+1) = f_max(y_max^(n)).
Because of the new choice of initial vector, the sequences y_min and y_max are non-increasing. Hence, by the same reasoning as above, they converge, and their limits, denoted y_min^(∞) and y_max^(∞) respectively, are the minimal (respectively, maximal) reachability probabilities. In particular, x_min and y_min, as well as x_max and y_max, are adjacent sequences, and
x_min^(∞) = y_min^(∞) = Pr_M^min(F s+)   and   x_max^(∞) = y_max^(∞) = Pr_M^max(F s+).
Let us first consider a min-reduced MDP M. Our new value iteration algorithm computes the sequences x_min and y_min at the same time and stops as soon as ‖y_min^(n) − x_min^(n)‖ ≤ ε. When this criterion is satisfied, which happens after a finite (yet possibly large and not a priori bounded) number of iterations, we are guaranteed to have obtained over- and under-approximations of Pr_M^min(F s+) with precision at least ε on every component. Because of the simultaneous computation of lower and upper bounds, we call this the interval iteration algorithm, specified in Algorithm 1. A similar algorithm can be designed for maximum reachability probabilities, by considering max-reduced MDPs and replacing the min operations of lines 5 and 6 by max operations.

Theorem 15. For every min-reduced (respectively, max-reduced) MDP M and convergence threshold ε, if the interval iteration algorithm returns the vectors x and y on those inputs, then for all s ∈ S, Pr_{M,s}^min(F s+) (respectively, Pr_{M,s}^max(F s+)) lies in the interval [x_s, y_s], of length at most ε.

We implemented a prototype of the algorithm in OCaml.

Example 16. For the same example as in Example 14, our tool converges after 10548 steps and outputs, for the initial state s = n, x_n = 0.4995 and y_n = 0.5005, giving good confidence to the user.

The sequences x and y converge towards the minimal probability to reach s+. Hence, the algorithm terminates and returns, for each state, an interval of length at most ε containing Pr_s^min(F s+).

Possible speed-up: only check the size of the interval for a given state…
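Algorithm 1 transcribes almost line by line into Python. The toy min-reduced MDP at the bottom is my own example (s+ the target, s− the losing sink, one state where the minimizing choice is the ½-½ lottery):

```python
# Interval iteration for minimum reachability (Algorithm 1).
# delta[(s, a)] is a distribution over successors; A[s] lists the actions.
def interval_iteration_min(S, A, delta, s_plus, s_minus, eps):
    x = {s: 0.0 for s in S}; x[s_plus] = 1.0          # lines 1-2
    y = {s: 1.0 for s in S}; y[s_minus] = 0.0
    while True:                                        # line 3
        xp, yp = dict(x), dict(y)
        for s in S - {s_plus, s_minus}:                # lines 4-6
            xp[s] = min(sum(p * x[t] for t, p in delta[(s, a)].items())
                        for a in A[s])
            yp[s] = min(sum(p * y[t] for t, p in delta[(s, a)].items())
                        for a in A[s])
        gap = max(yp[s] - xp[s] for s in S)            # line 7
        x, y = xp, yp                                  # line 8
        if gap <= eps:                                 # line 9
            return x, y                                # line 10

S = {"s0", "s1", "s+", "s-"}
A = {"s0": ["sure", "coin"], "s1": ["step"]}
delta = {
    ("s0", "sure"): {"s+": 1.0},
    ("s0", "coin"): {"s+": 0.5, "s-": 0.5},   # the minimizing choice at s0
    ("s1", "step"): {"s0": 0.5, "s-": 0.5},
}
x, y = interval_iteration_min(S, A, delta, "s+", "s-", eps=1e-6)
```

Replacing the two `min` calls by `max` gives the variant for maximum reachability on a max-reduced MDP, as the slide notes.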
13
Rate of convergence
x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations:
x_s = Pr_s^min(F^{≤n} s+)        y_s = Pr_s^min(G^{≤n}(¬s−))
[Figure: the chain MDP again: 2 BMECs and only trivial MECs]
attractor decomposition: length I
smallest positive probability: η
Leaking property: for all n ∈ ℕ, Pr_s^max(G^{≤nI} ¬(s+ ∨ s−)) ≤ (1 − η^I)^n
Since G^{≤n}(¬s−) ≡ G^{≤n} ¬(s+ ∨ s−) ⊕ F^{≤n} s+, with σ minimizing the safety probability and σ' minimizing the reachability probability:
y_s^(nI) − x_s^(nI) = Pr_s^σ(G^{≤nI}(¬s−)) − Pr_s^{σ'}(F^{≤nI} s+)
                    ≤ Pr_s^{σ'}(G^{≤nI}(¬s−)) − Pr_s^{σ'}(F^{≤nI} s+)
                    = Pr_s^{σ'}(G^{≤nI} ¬(s+ ∨ s−)) ≤ (1 − η^I)^n
The interval iteration algorithm converges in at most I·⌈log ε / log(1 − η^I)⌉ steps.
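The bound can be sanity-checked numerically on the symmetric chain used earlier (state 0 the target, state 2k the sink). There η = ½, and reading the attractor decomposition as peeling one layer per round gives I = k; both values are my assumptions about how the slide's parameters instantiate on this example.

```python
import math

# Interval iteration on the chain, counting steps until max(y - x) <= eps.
def chain_interval_steps(k, eps):
    n = 2 * k
    x = [1.0] + [0.0] * n          # from below
    y = [1.0] * n + [0.0]          # from above
    steps = 0
    while max(yi - xi for xi, yi in zip(x, y)) > eps:
        x = [1.0] + [0.5 * (x[i - 1] + x[i + 1]) for i in range(1, n)] + [0.0]
        y = [1.0] + [0.5 * (y[i - 1] + y[i + 1]) for i in range(1, n)] + [0.0]
        steps += 1
    return steps

k, eps = 5, 1e-3
eta, I = 0.5, k   # assumed instantiation for this chain
bound = I * math.ceil(math.log(eps) / math.log(1 - eta ** I))
steps = chain_interval_steps(k, eps)
# The observed step count stays well under the theoretical bound.
```

As expected for a worst-case bound, the observed convergence is considerably faster than I·⌈log ε / log(1 − η^I)⌉.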
14
Stopping criterion for exact computation
MDPs with rational probabilities:
d the largest denominator of transition probabilities
N the number of states
M the number of transitions with non-zero probability
[Chatterjee, Henzinger 2008] claim that exact computation is possible after d^{8M} iterations of value iteration.
Optimal probabilities and policies can be computed by the interval iteration algorithm in at most O((1/η)^N · N^3 · log d) steps.
Improvement, since 1/η ≤ d and N ≤ M!
Sketch of proof:
• use ε = 1/(2α) as threshold (with 1/α the gcd of the optimal probabilities)
• upper bound on α based on matrix properties of Markov chains: α = O(N^N · d^{3N²})
15
Conclusion and related work
• Framework allowing guarantees for the value iteration algorithm
• General results on the convergence rate
• Criterion for computation of the exact value
• Future work: test our preliminary implementation on real instances
Related work:
• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains
• [Kattenbelt, Kwiatkowska, Norman, Parker, 2010] CEGAR-based approach for stochastic games
• To be published at ATVA 2014 [Brázdil, Chatterjee, Chmelík, Forejt, Křetínský, Kwiatkowska, Parker, Ujma, 2014]: the same techniques in a machine-learning framework, with almost-sure convergence and on-the-fly computation of non-trivial end components