TRANSCRIPT
Reachability in MDPs: Refining Convergence
of Value Iteration
Serge Haddad (LSV, ENS Cachan, CNRS & Inria) and
Benjamin Monmege (ULB)
RP 2014, Oxford
2
Markov Decision Processes
• What?
✦ Stochastic process with non-deterministic choices
✦ Non-determinism resolved by policies/strategies
• Where?
✦ Optimization
✦ Program verification: reachability as the basis of PCTL model checking
✦ Game theory: 1½ players
3
MDPs: definition and objective
[Figure: example MDP with actions a, b, c, d, e and transition probabilities ½, ½, ⅓, ⅔]
Finite number of states
Probabilistic states
Actions to be selected by the policy
Reachability objective
M = (S, α, δ), with δ : S × α → Dist(S)
Policy: σ : (S·α)*·S → Dist(α)
Probability to reach the target s+: Pr_s^σ(F s+)
Maximal probability to reach: Pr_s^max(F s+) = sup_σ Pr_s^σ(F s+)
4
Optimal reachability probabilities of MDPs
• How?
✦ Linear programming
✦ Policy iteration
✦ Value iteration: numerical scheme that scales well and works in practice
(used in the PRISM model checker [Kwiatkowska, Norman, Parker, 2011])
5
Value iteration
[Figure: the example MDP from the previous slide]
x_s^(0) = 1 if s = s+, 0 otherwise
x_s^(n+1) = max_{a ∈ α} Σ_{s' ∈ S} δ(s,a)(s') · x_{s'}^(n)
Successive iterates (one column per state; the maximizing action in parentheses):
0        0           0       0
0        2/3 (b)     0       0
1/3      2/3 (b)     0       0
1/2      2/3 (b)     1/6     0
7/12     13/18 (b)   1/4     0
…        …           …       …
0.7969   0.7988 (b)  0.3977  0
0.7978   0.7992 (b)  0.3984  0
Stop: the difference between the last two iterates is ≤ 0.001.
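The update rule above can be sketched in a few lines of Python. The MDP below is a made-up toy example (not the one drawn on the slide): `delta` maps each (state, action) pair to a distribution over successors, and `goal` stands in for the target state s+.

```python
# Value iteration for maximal reachability, as on the slide:
#   x_s^(0)   = 1 if s is the target, else 0
#   x_s^(n+1) = max_a sum_{s'} delta(s, a)(s') * x_{s'}^(n)
# Hypothetical toy MDP (my own example, not the one in the figure):
delta = {
    ("s0", "a"): {"s1": 0.5, "s2": 0.5},
    ("s0", "b"): {"goal": 1/3, "sink": 2/3},
    ("s1", "a"): {"goal": 1.0},
    ("s2", "a"): {"sink": 1.0},
    ("goal", "a"): {"goal": 1.0},   # target is absorbing
    ("sink", "a"): {"sink": 1.0},   # sink is absorbing
}
states = {"s0", "s1", "s2", "goal", "sink"}
actions = {s: [a for (t, a) in delta if t == s] for s in states}

def value_iteration(eps=1e-6):
    x = {s: 1.0 if s == "goal" else 0.0 for s in states}
    while True:
        new = {s: max(sum(p * x[t] for t, p in delta[(s, a)].items())
                      for a in actions[s])
               for s in states}
        new["goal"], new["sink"] = 1.0, 0.0   # keep boundary values fixed
        if max(abs(new[s] - x[s]) for s in states) <= eps:
            return new
        x = new

x = value_iteration()
# From s0, action a gives 1/2, action b gives 1/3, so x["s0"] = 0.5.
```

On this toy the iteration stabilizes after a couple of steps; the slides' point is precisely that the stopping criterion used here is not always trustworthy.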
6
Value iteration: which guarantees?
[Figure: chain MDP with states 0, 1, …, k, …, 2k; state 0 is the target, state 2k a sink, and every interior state moves left or right with probability ½]
State:     0   1    2    3    …   k−1        k      k+1  …  2k
Step 1:    1   0    0    0    …   0          0      0    …  0
Step 2:    1   1/2  0    0    …   0          0      0    …  0
Step 3:    1   1/2  1/4  0    …   0          0      0    …  0
Step 4:    1   1/2  1/4  1/8  …   0          0      0    …  0
…          …   …    …    …    …   …          …      …    …  …
Step k:    1   1/2  1/4  1/8  …   1/2^(k−1)  0      0    …  0
Step k+1:  1   1/2  1/4  1/8  …   1/2^(k−1)  1/2^k  0    …  0
Difference between steps k and k+1: ≤ 1/2^k. Yet the real value at state k is 1/2 (by symmetry).
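This pitfall is easy to reproduce numerically. The sketch below builds the chain above for k = 100 (state 0 the target with value 1, state 2k the sink with value 0; interior states move left or right with probability ½, so the max over actions is trivial) and runs value iteration with the usual stopping criterion:

```python
# The chain from the slide: x_0 = 1 (target), x_2k = 0 (sink), and
# x_i^(n+1) = 1/2 * (x_{i-1}^(n) + x_{i+1}^(n)) for interior states.
# True values are x_i = 1 - i/(2k); in particular x_k = 1/2.

def chain_vi(k, eps):
    x = [1.0] + [0.0] * (2 * k)
    steps = 0
    while True:
        new = ([1.0]
               + [0.5 * (x[i - 1] + x[i + 1]) for i in range(1, 2 * k)]
               + [0.0])
        steps += 1
        if max(abs(a - b) for a, b in zip(new, x)) <= eps:
            return new, steps
        x = new

x, steps = chain_vi(k=100, eps=1e-3)
# The usual criterion fires while x[100] is still close to 0,
# although the true value at state 100 is 1/2.
```

The criterion measures progress between consecutive iterates, not distance to the limit, which is exactly the gap the slide exhibits.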
7
Contributions
1. Enhanced value iteration algorithm with strong guarantees
• performs two value iterations in parallel
• keeps an interval of possible optimal values
• uses the interval for the stopping criterion
2. Study of the speed of convergence
• also applies to classical value iteration
3. Improved rounding procedure for exact computation
8
Interval iteration
[Figure: the chain MDP again, and a plot of the iterates (values between 0 and 1 against the number of steps, 1 to 6)]
f_max(x)_s = max_{a ∈ α} Σ_{s' ∈ S} δ(s,a)(s') · x_{s'}
From below (usual stopping criterion):
x_s^(0) = 1 if s = s+, 0 otherwise        x^(n+1) = f_max(x^(n))
NEW stopping criterion: also iterate from above:
y_s^(0) = 0 if s = s−, 1 otherwise        y^(n+1) = f_max(y^(n))
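A sketch of the two parallel iterations on the chain from the figure (k = 10 here; state 0 the target, state 2k the sink; each interior state has a single action, so f_max is just the ½-½ average). x starts below, y starts above, and the new criterion stops only when the two sequences meet:

```python
# Interval iteration on the chain: x from below, y from above.
# x^(0): 1 at the target, 0 elsewhere.  y^(0): 0 at the sink, 1 elsewhere.
def interval_iteration_chain(k, eps):
    n = 2 * k
    x = [1.0] + [0.0] * n
    y = [1.0] * n + [0.0]
    while max(yi - xi for xi, yi in zip(x, y)) > eps:
        x = [1.0] + [0.5 * (x[i - 1] + x[i + 1]) for i in range(1, n)] + [0.0]
        y = [1.0] + [0.5 * (y[i - 1] + y[i + 1]) for i in range(1, n)] + [0.0]
    return x, y

x, y = interval_iteration_chain(k=10, eps=0.01)
# Each true value 1 - i/(2k) lies in [x[i], y[i]], an interval of width <= 0.01.
```

Unlike the usual criterion, the width of the interval [x_s, y_s] is an actual error bound, since x increases towards the limit and y decreases towards it.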
9
Fixed point characterization
(Pr_s^max(F s+))_{s∈S} is the smallest fixed point of f_max.
[Plot: the two sequences of iterates against the number of steps]
The sequence from below always converges towards (Pr_s^max(F s+))_{s∈S}; the sequence from above does not always!
[Figure: counterexample MDP with states a, b, c, d, e, f, g and transition probabilities ½, ½, ½, 0.2, 0.3; the vectors (1, 1, 0.7, 0, 0, 1) and (0.5, 0.5, 0.45, 0, 0, 1) are both fixed points of f_max]
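The failure from above is easy to see on a tiny hand-made example (my own, simpler than the MDP on the slide): two states u and v that cycle into each other form an end component, and u additionally has an action reaching the goal or the sink with probability ½ each.

```python
# States: u, v, plus an absorbing goal (value 1) and sink (value 0).
# Actions: "stay" cycles u -> v -> u; "go" from u hits goal/sink with prob 1/2.
# f_max on the vector (x_u, x_v), with goal and sink already fixed at 1 and 0:
def f_max(x):
    xu, xv = x
    return (max(xv, 0.5 * 1.0 + 0.5 * 0.0), xu)

# Both of these are fixed points of f_max:
least = (0.5, 0.5)      # the true maximal reachability probabilities
greatest = (1.0, 1.0)   # a spurious fixed point created by the u-v cycle
assert f_max(least) == least and f_max(greatest) == greatest

# Iterating from below reaches the least fixed point in two steps...
x = (0.0, 0.0)
for _ in range(10):
    x = f_max(x)
# ...but iterating from above never leaves (1, 1):
y = (1.0, 1.0)
for _ in range(10):
    y = f_max(y)
```

The cycle lets the states "promise" each other a value of 1 that no policy can realize, which is exactly what the upper iteration gets stuck on.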
10
Solution: ensure uniqueness!
The usual techniques for MDPs do not apply…
[Figure: the counterexample MDP with the states where Pr_s^max(F s+) = 1 and where Pr_s^max(F s+) = 0 marked]
Even after trivializing these states: still multiple fixed points!
NEW! Use Maximal End Components (computable in polynomial time [de Alfaro, 1997]) and trivialize them!
[Figure: decomposition into two Bottom Maximal End Components, a Trivial Maximal End Component and a non-trivial Maximal End Component; collapsing yields the max-reduced MDP]
Now the fixed point is unique: the max-reduced MDP.
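A sketch of the trivialization step (my own minimal version, assuming the MECs have already been computed, e.g. by de Alfaro's algorithm): each MEC is merged into one representative state, actions that stay entirely inside their MEC are dropped, and the remaining distributions are redirected through the representatives.

```python
def collapse_mecs(delta, mecs):
    """delta: dict mapping (state, action) to {successor: probability}.
    mecs: list of sets of states; states outside every MEC are untouched.
    Actions are re-tagged with their source state to avoid clashes after
    merging several states into one representative."""
    rep = {s: min(mec) for mec in mecs for s in mec}
    reduced = {}
    for (s, a), dist in delta.items():
        r = rep.get(s, s)
        # drop an action whose successors all stay inside the same MEC
        if s in rep and all(rep.get(t, t) == r for t in dist):
            continue
        merged = {}
        for t, p in dist.items():
            rt = rep.get(t, t)
            merged[rt] = merged.get(rt, 0.0) + p
        reduced[(r, (s, a))] = merged
    return reduced

# The cycle {u, v} with a single exit from u collapses to one state:
delta = {
    ("u", "stay"): {"v": 1.0},
    ("v", "stay"): {"u": 1.0},
    ("u", "go"):   {"goal": 0.5, "sink": 0.5},
}
reduced = collapse_mecs(delta, mecs=[{"u", "v"}])
# Only the exit survives: {("u", ("u", "go")): {"goal": 0.5, "sink": 0.5}}
```

On the collapsed MDP the spurious "cycle" fixed points disappear, which is the uniqueness the slide claims for the max-reduced MDP.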
11
An even smaller MDP for minimal probabilities
[Figure: the same decomposition into Bottom, Trivial and non-trivial Maximal End Components; the resulting min-reduced MDP]
Non-trivial (and non-accepting) MECs have null minimal probability!
12
Interval iteration algorithm in reduced MDPs

Algorithm 1: Interval iteration algorithm for minimum reachability
Input: min-reduced MDP M = (S, α_M, δ_M), convergence threshold ε
Output: under- and over-approximation of Pr_M^min(F s+)
1  x_{s+} := 1; x_{s−} := 0; y_{s+} := 1; y_{s−} := 0
2  foreach s ∈ S \ {s+, s−} do x_s := 0; y_s := 1
3  repeat
4    foreach s ∈ S \ {s+, s−} do
5      x'_s := min_{a ∈ A(s)} Σ_{s' ∈ S} δ_M(s,a)(s') · x_{s'}
6      y'_s := min_{a ∈ A(s)} Σ_{s' ∈ S} δ_M(s,a)(s') · y_{s'}
7    Δ := max_{s ∈ S} (y'_s − x'_s)
8    foreach s ∈ S \ {s+, s−} do x_s := x'_s; y_s := y'_s
9  until Δ ≤ ε
10 return (x_s)_{s∈S}, (y_s)_{s∈S}

– y_min^(0) = y^(0) and, for all n ∈ ℕ, y_min^(n+1) = f_min(y_min^(n));
– y_max^(0) = y^(0) and, for all n ∈ ℕ, y_max^(n+1) = f_max(y_max^(n)).
Because of the new choice of initial vector, the sequences y_min and y_max are non-increasing. Hence, by the same reasoning as above, they converge, and their limits, denoted y_min^(∞) and y_max^(∞) respectively, are the minimal (respectively, maximal) reachability probabilities. In particular, x_min and y_min, as well as x_max and y_max, are adjacent sequences, and
x_min^(∞) = y_min^(∞) = Pr_M^min(F s+)   and   x_max^(∞) = y_max^(∞) = Pr_M^max(F s+).
Let us first consider a min-reduced MDP M. Our new value iteration algorithm computes the sequences x_min and y_min at the same time and stops as soon as ‖y_min^(n) − x_min^(n)‖ ≤ ε. When this criterion is satisfied, which happens after a finite (yet possibly large and not a priori bounded) number of iterations, we are guaranteed to have obtained over- and under-approximations of Pr_M^min(F s+) with precision at least ε on every component. Because of the simultaneous computation of lower and upper bounds, we call this the interval iteration algorithm, specified in Algorithm 1. A similar algorithm can be designed for maximum reachability probabilities, by considering max-reduced MDPs and replacing the min operations of lines 5 and 6 by max operations.

Theorem 15. For every min-reduced (respectively, max-reduced) MDP M and convergence threshold ε, if the interval iteration algorithm returns the vectors x and y on those inputs, then for all s ∈ S, Pr_{M,s}^min(F s+) (respectively, Pr_{M,s}^max(F s+)) lies in the interval [x_s, y_s], of length at most ε.

We implemented a prototype of the algorithm in OCaml.

Example 16. For the same example as in Example 14, our tool converges after 10548 steps and outputs, for the initial state s = n, x_n = 0.4995 and y_n = 0.5005, giving good confidence to the user.

The sequences x and y converge towards the minimal probability to reach s+. Hence, the algorithm terminates and returns, for each state, an interval of length at most ε containing Pr_s^min(F s+).

Possible speed-up: only check the size of the interval for a given state…
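Algorithm 1 transcribes almost line by line into Python. The toy min-reduced MDP at the bottom is my own example (s+ the target, s− the losing sink, one state where the minimizing choice is the ½-½ lottery):

```python
# Interval iteration for minimum reachability (Algorithm 1).
# delta[(s, a)] is a distribution over successors; A[s] lists the actions.
def interval_iteration_min(S, A, delta, s_plus, s_minus, eps):
    x = {s: 0.0 for s in S}; x[s_plus] = 1.0          # lines 1-2
    y = {s: 1.0 for s in S}; y[s_minus] = 0.0
    while True:                                        # line 3
        xp, yp = dict(x), dict(y)
        for s in S - {s_plus, s_minus}:                # lines 4-6
            xp[s] = min(sum(p * x[t] for t, p in delta[(s, a)].items())
                        for a in A[s])
            yp[s] = min(sum(p * y[t] for t, p in delta[(s, a)].items())
                        for a in A[s])
        gap = max(yp[s] - xp[s] for s in S)            # line 7
        x, y = xp, yp                                  # line 8
        if gap <= eps:                                 # line 9
            return x, y                                # line 10

S = {"s0", "s1", "s+", "s-"}
A = {"s0": ["sure", "coin"], "s1": ["step"]}
delta = {
    ("s0", "sure"): {"s+": 1.0},
    ("s0", "coin"): {"s+": 0.5, "s-": 0.5},   # the minimizing choice at s0
    ("s1", "step"): {"s0": 0.5, "s-": 0.5},
}
x, y = interval_iteration_min(S, A, delta, "s+", "s-", eps=1e-6)
```

Replacing the two `min` calls by `max` gives the variant for maximum reachability on a max-reduced MDP, as the slide notes.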
13
Rate of convergence
x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations:
x_s = Pr_s^min(F^{≤n} s+)        y_s = Pr_s^min(G^{≤n}(¬s−))
[Figure: the chain MDP again: 2 BMECs and only trivial MECs]
attractor decomposition: length I
smallest positive probability: η
Leaking property: for all n ∈ ℕ, Pr_s^max(G^{≤nI} ¬(s+ ∨ s−)) ≤ (1 − η^I)^n
Since G^{≤n}(¬s−) ≡ G^{≤n} ¬(s+ ∨ s−) ⊕ F^{≤n} s+, with σ minimizing the safety probability and σ' minimizing the reachability probability:
y_s^(nI) − x_s^(nI) = Pr_s^σ(G^{≤nI}(¬s−)) − Pr_s^{σ'}(F^{≤nI} s+)
                    ≤ Pr_s^{σ'}(G^{≤nI}(¬s−)) − Pr_s^{σ'}(F^{≤nI} s+)
                    = Pr_s^{σ'}(G^{≤nI} ¬(s+ ∨ s−)) ≤ (1 − η^I)^n
The interval iteration algorithm converges in at most I·⌈log ε / log(1 − η^I)⌉ steps.
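The bound can be sanity-checked numerically on the symmetric chain used earlier (state 0 the target, state 2k the sink). There η = ½, and reading the attractor decomposition as peeling one layer per round gives I = k; both values are my assumptions about how the slide's parameters instantiate on this example.

```python
import math

# Interval iteration on the chain, counting steps until max(y - x) <= eps.
def chain_interval_steps(k, eps):
    n = 2 * k
    x = [1.0] + [0.0] * n          # from below
    y = [1.0] * n + [0.0]          # from above
    steps = 0
    while max(yi - xi for xi, yi in zip(x, y)) > eps:
        x = [1.0] + [0.5 * (x[i - 1] + x[i + 1]) for i in range(1, n)] + [0.0]
        y = [1.0] + [0.5 * (y[i - 1] + y[i + 1]) for i in range(1, n)] + [0.0]
        steps += 1
    return steps

k, eps = 5, 1e-3
eta, I = 0.5, k   # assumed instantiation for this chain
bound = I * math.ceil(math.log(eps) / math.log(1 - eta ** I))
steps = chain_interval_steps(k, eps)
# The observed step count stays well under the theoretical bound.
```

As expected for a worst-case bound, the observed convergence is considerably faster than I·⌈log ε / log(1 − η^I)⌉.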
14
Stopping criterion for exact computation
MDPs with rational probabilities:
d the largest denominator of transition probabilities
N the number of states
M the number of transitions with non-zero probability
[Chatterjee, Henzinger 2008] claim that exact computation is possible after d^{8M} iterations of value iteration.
Optimal probabilities and policies can be computed by the interval iteration algorithm in at most O((1/η)^N · N^3 · log d) steps.
Improvement, since 1/η ≤ d and N ≤ M!
Sketch of proof:
• use ε = 1/(2α) as threshold (with 1/α the gcd of the optimal probabilities)
• upper bound on α based on matrix properties of Markov chains: α = O(N^N · d^{3N²})
15
Conclusion and related work
• Framework allowing guarantees for the value iteration algorithm
• General results on the convergence rate
• Criterion for computation of the exact value
• Future work: test our preliminary implementation on real instances
Related work:
• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains
• [Kattenbelt, Kwiatkowska, Norman, Parker, 2010] CEGAR-based approach for stochastic games
• To be published at ATVA 2014 [Brázdil, Chatterjee, Chmelík, Forejt, Křetínský, Kwiatkowska, Parker, Ujma, 2014]: the same techniques in a machine-learning framework, with almost-sure convergence and on-the-fly computation of non-trivial end components