
Modeling and Analysis of Markov Chains Using Decision Diagrams

Gianfranco Ciardo

Department of Computer Science and Engineering

University of California at Riverside

Riverside, CA 92521, USA


(partially based on work with Andrew S. Miner and Paul L. E. Grieco)

Outline

• Background on CTMCs

◦ Continuous–time Markov chains and high–level CTMC models

◦ Matrix–vector description and operations for CTMC solution

◦ Sparse matrices

• Structured models

◦ High–level CTMC models structured into submodels

◦ State space and state indexing: potential vs. actual indices

◦ Multiway decision diagrams (MDDs)

• Decision–diagram–based CTMC encodings

◦ Multiterminal multiway decision diagrams (MTMDDs)

◦ Kronecker descriptors

◦ Matrix diagrams (MXDs)

◦ Edge–valued multiway decision diagrams (EVMDDs)

• Solution algorithms

◦ Vector–matrix multiplication with Kronecker–based encodings

◦ Jacobi vs. Gauss–Seidel style iterations

Background on CTMCs

Discrete-state models

A discrete state model is fully specified by:

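• a potential state space $\hat{\mathcal S}$: the “type” of the states

• a set of initial states $\mathcal S_{init} \subseteq \hat{\mathcal S}$: often there is a single initial state $\mathbf s_{init}$

• a next-state function $\mathcal N : \hat{\mathcal S} \to 2^{\hat{\mathcal S}}$, naturally extended to sets: $\mathcal N(\mathcal X) = \bigcup_{\mathbf i \in \mathcal X} \mathcal N(\mathbf i)$

The state space $\mathcal S$ of the model is the smallest set containing $\mathcal S_{init}$ and satisfying:

• the recursive definition $\mathbf i \in \mathcal S \wedge \mathbf j \in \mathcal N(\mathbf i) \Rightarrow \mathbf j \in \mathcal S$

• or, equivalently, the fixed-point equation $\mathcal X = \mathcal X \cup \mathcal N(\mathcal X)$

$$\mathcal S = \mathcal S_{init} \cup \mathcal N(\mathcal S_{init}) \cup \mathcal N^2(\mathcal S_{init}) \cup \mathcal N^3(\mathcal S_{init}) \cup \cdots = \mathcal N^*(\mathcal S_{init})$$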
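The fixed-point characterization translates directly into an iteration. Below is a minimal Python sketch, assuming states are hashable values and $\mathcal N$ is supplied as an ordinary function (the names `reachable` and `next_states` are illustrative, not from the original). Since $\mathcal N$ distributes over union, it only needs to be applied to the states found in the previous round.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Set

def reachable(init: Iterable[Hashable],
              next_states: Callable[[Hashable], Iterable[Hashable]]) -> FrozenSet[Hashable]:
    """Compute N*(S_init) by iterating X <- X ∪ N(X) until X stops growing."""
    X: Set[Hashable] = set(init)
    frontier = set(X)                       # states whose successors are not yet known
    while frontier:
        # N(frontier) minus what we already have; applying N again to older
        # states would add nothing new, because N distributes over union.
        new = {j for i in frontier for j in next_states(i)} - X
        X |= new
        frontier = new
    return frozenset(X)

# Hypothetical toy model: states 0..9, N(i) = {2i mod 10}
print(sorted(reachable([1], lambda i: {(2 * i) % 10})))   # [1, 2, 4, 6, 8]
```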

Definition of continuous-time Markov chain

A stochastic process {X(t) : t ≥ 0} is a collection of r.v.’s indexed by a time parameter t

We say that X(t) is the state of the process at time t

The set of values that X(t) can assume, over all t, is (a subset of) the state space $\mathcal S$

$\{X(t) : t \ge 0\}$ over a discrete $\mathcal S$ is a continuous-time Markov chain (CTMC) if

$$\Pr\{X(t_{n+1}) = i_{n+1} \mid X(t_n) = i_n \wedge X(t_{n-1}) = i_{n-1} \wedge \dots \wedge X(t_0) = i_0\} = \Pr\{X(t_{n+1}) = i_{n+1} \mid X(t_n) = i_n\}$$

for any times $0 \le t_0 \le \dots \le t_{n-1} \le t_n \le t_{n+1}$ and states $\{i_0, \dots, i_{n-1}, i_n, i_{n+1}\} \subseteq \mathcal S$

Markov property:

“given the present state, the future is independent of the past”

“the most recent knowledge about the state is all we need”

Markov chain description and analysis

A continuous-time Markov chain (CTMC) {X(t) : t ≥ 0} with state space S is described by

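• its infinitesimal generator $\mathbf Q = \mathbf R - \mathrm{diag}(\mathbf R \cdot \mathbf 1) = \mathbf R - \mathrm{diag}(\mathbf h)^{-1} \in \mathbb R^{|\mathcal S| \times |\mathcal S|}$

• its initial probability vector $\pi(0) \in \mathbb R^{|\mathcal S|}$

where

• $\mathbf R$ is the transition rate matrix: $\mathbf R[i, j]$ is the rate of going to state $j$ when in state $i$

• $\mathbf h$ is the expected holding time vector: $\mathbf h[i] = 1 / \sum_{j \in \mathcal S} \mathbf R[i, j]$

• $\pi(0)[i] = \Pr\{\text{chain is in state } i \text{ at time } 0\}$, i.e., initially

Transient probability vector $\pi(t) \in \mathbb R^{|\mathcal S|}$: $\pi(t)[i] = \Pr\{X(t) = i\}$

• $\pi(t)$ is the solution of $\frac{d\pi(t)}{dt} = \pi(t) \cdot \mathbf Q$ with initial condition $\pi(0)$

Steady-state probability vector $\pi \in \mathbb R^{|\mathcal S|}$: $\pi[i] = \lim_{t \to \infty} \Pr\{X(t) = i\}$

• $\pi$ is the solution of $\pi \cdot \mathbf Q = \mathbf 0$ subject to $\sum_{i \in \mathcal S} \pi[i] = 1$ (the CTMC must be ergodic)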
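For a small, explicitly stored chain, these definitions can be checked numerically. The following is a minimal NumPy sketch, assuming $\mathbf R$ has a zero diagonal and the CTMC is ergodic. Because $\mathbf Q$ then has rank $|\mathcal S|-1$, one balance equation is redundant and can be replaced by the normalization $\sum_i \pi[i] = 1$. A direct solve like this only suits tiny chains; large ones need iterative methods.

```python
import numpy as np

def generator(R: np.ndarray) -> np.ndarray:
    """Q = R - diag(R·1): each diagonal entry is minus the total exit rate of its state."""
    return R - np.diag(R.sum(axis=1))

def holding_times(R: np.ndarray) -> np.ndarray:
    """h[i] = 1 / sum_j R[i, j] (assumes no absorbing states)."""
    return 1.0 / R.sum(axis=1)

def steady_state(R: np.ndarray) -> np.ndarray:
    """Solve pi·Q = 0 subject to sum(pi) = 1, assuming an ergodic CTMC."""
    Q = generator(R)
    A = Q.T.copy()            # pi·Q = 0  <=>  Q^T pi^T = 0
    A[-1, :] = 1.0            # replace one redundant equation by the normalization
    b = np.zeros(Q.shape[0])
    b[-1] = 1.0
    return np.linalg.solve(A, b)

R = np.array([[0.0, 2.0],
              [3.0, 0.0]])    # hypothetical two-state chain
print(steady_state(R))        # approximately [0.6 0.4]
```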

Storing a CTMC explicitly

We need to store:

• R a real matrix of size |S| × |S|

• h a real vector of size |S|

We focus on R, for which we can employ sparse storage:

• Requires memory proportional to η(R), the number of nonzeros in R, instead of $|\mathcal S|^2$

• Allows iterative numerical methods to run in time proportional to η(R), instead of $|\mathcal S|^2$

[Figure: an example five-state CTMC (states 0 to 4, transition rates α, β, γ, δ, ε) and the sparse row-wise storage of its matrix R]
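One common sparse layout is row-wise compressed storage (often called CSR); the slides do not prescribe a specific layout, so this choice is an assumption. The minimal sketch below (hypothetical helper names `to_csr` and `vec_mat`) shows why both memory and the cost of one vector-matrix product are proportional to η(R): each nonzero is stored once and touched once.

```python
from typing import List, Sequence, Tuple

def to_csr(n: int, entries: Sequence[Tuple[int, int, float]]):
    """Build (row_ptr, col_idx, val) for an n x n matrix from (i, j, rate) triples."""
    rows: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for i, j, r in entries:
        rows[i].append((j, r))
    row_ptr, col_idx, val = [0], [], []
    for row in rows:
        for j, r in sorted(row):
            col_idx.append(j)
            val.append(r)
        row_ptr.append(len(col_idx))         # row i occupies positions row_ptr[i] .. row_ptr[i+1]-1
    return row_ptr, col_idx, val

def vec_mat(x: Sequence[float], csr) -> List[float]:
    """y = x·R in time proportional to eta(R)."""
    row_ptr, col_idx, val = csr
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[col_idx[p]] += x[i] * val[p]
    return y

csr = to_csr(3, [(0, 1, 2.0), (1, 2, 3.0), (2, 0, 1.0), (1, 0, 0.5)])   # hypothetical rates
print(vec_mat([1.0, 1.0, 1.0], csr))          # [1.5, 2.0, 3.0]
```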

Exploiting common entries with explicit storage

We can map the real values in R to (small) integer indices:

• Causes a small runtime overhead

• Reduces memory by a constant factor if R contains several entries with the same value

• If all nonzero entries have the same value (e.g., 1), this is essentially the reachability graph

[Figure: the same CTMC, with each distinct rate (α, β, γ, δ, ε) kept once in a value table and the sparse structure holding small integer indices into that table]
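A minimal sketch of the value-to-index mapping, assuming the nonzeros are available in storage order (the function name `index_values` is hypothetical). With at most 256 distinct rates, each index could be stored in one byte instead of an 8-byte double; the inner loop of a multiplication then reads `table[idx[p]]` instead of `val[p]`, which is the small runtime overhead mentioned above.

```python
from typing import Dict, List, Sequence, Tuple

def index_values(val: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Replace each stored rate by a small integer index into a table of distinct rates."""
    table: List[float] = []
    where: Dict[float, int] = {}
    idx: List[int] = []
    for v in val:
        if v not in where:                 # first occurrence of this rate
            where[v] = len(table)
            table.append(v)
        idx.append(where[v])
    return table, idx

rates = [2.0, 0.5, 2.0, 2.0, 0.5, 7.0]     # hypothetical nonzeros of R, in storage order
table, idx = index_values(rates)
print(table, idx)                          # [2.0, 0.5, 7.0] [0, 1, 0, 0, 1, 2]
```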

Structured models

Structured discrete-state models

A structured discrete state model is specified by

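• a potential state space $\hat{\mathcal S} = \mathcal S_K \times \cdots \times \mathcal S_1 = \times_{K \ge k \ge 1} \mathcal S_k$

◦ the “type” of the (global) state

◦ $\mathcal S_k$ is the (discrete) local state space for submodel k

• a set of initial states $\mathcal S_{init} \subseteq \hat{\mathcal S}$

◦ often there is a single initial state $\mathbf s_{init}$

• a set of events $\mathcal E$ defining a disjunctively-partitioned next-state function

◦ $\mathcal N_\alpha : \hat{\mathcal S} \to 2^{\hat{\mathcal S}}$: $\mathbf j \in \mathcal N_\alpha(\mathbf i)$ iff state $\mathbf j$ can be reached by firing event α in state $\mathbf i$

◦ $\mathcal N : \hat{\mathcal S} \to 2^{\hat{\mathcal S}}$ is defined by $\mathcal N(\mathbf i) = \bigcup_{\alpha \in \mathcal E} \mathcal N_\alpha(\mathbf i)$

◦ we can extend $\mathcal N$ to take sets of states as argument: $\mathcal N(\mathcal X) = \bigcup_{\mathbf i \in \mathcal X} \mathcal N(\mathbf i)$

◦ α is enabled in $\mathbf i$ iff $\mathcal N_\alpha(\mathbf i) \neq \emptyset$, otherwise it is disabled

◦ $\mathbf i$ is absorbing, or a trap, or dead iff $\mathcal N(\mathbf i) = \emptyset$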

Petri nets and their state space $\mathcal S$: finite case

[Figure: a Petri net with places p, q, r, s, t and transitions a, b, c, d, e, shown in successive markings as a, c, d, b, and e fire]

If the initial state is $\mathbf s_{init} = (N, 0, 0, 0, 0)$, $\mathcal S$ contains $\frac{(N+1)(N+2)(2N+3)}{6}$ states
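The state-space size can be checked by explicit enumeration. The sketch below assumes the arcs of the net are those implied by the Kronecker matrices $\mathbf R_{k,\alpha}$ given for this same net later in the slides: a: p → q, s; b: r → q; c: q → r; d: s → t; e: r, t → p, all with weight 1. This reading of the figure is a reconstruction, so `PRE` and `POST` should be adjusted if the original net differs.

```python
from collections import deque
from typing import Optional, Set, Tuple

PLACES = ("p", "q", "r", "s", "t")
# Input and output places of each transition (arc weight 1); reconstructed, see above.
PRE = {"a": ("p",), "b": ("r",), "c": ("q",), "d": ("s",), "e": ("r", "t")}
POST = {"a": ("q", "s"), "b": ("q",), "c": ("r",), "d": ("t",), "e": ("p",)}

Marking = Tuple[int, ...]

def fire(m: Marking, t: str) -> Optional[Marking]:
    """Marking reached by firing t in m, or None if t is disabled in m."""
    tokens = dict(zip(PLACES, m))
    if any(tokens[pl] == 0 for pl in PRE[t]):
        return None
    for pl in PRE[t]:
        tokens[pl] -= 1
    for pl in POST[t]:
        tokens[pl] += 1
    return tuple(tokens[pl] for pl in PLACES)

def state_space(n: int) -> Set[Marking]:
    """Explicit breadth-first exploration from s_init = (N, 0, 0, 0, 0)."""
    init = (n, 0, 0, 0, 0)
    seen, queue = {init}, deque([init])
    while queue:
        m = queue.popleft()
        for t in PRE:
            m2 = fire(m, t)
            if m2 is not None and m2 not in seen:
                seen.add(m2)
                queue.append(m2)
    return seen

print([len(state_space(n)) for n in range(1, 5)])   # [5, 14, 30, 55]
```

The counts agree with the closed formula because this net preserves p + q + r = N and q + r = s + t, so $|\mathcal S| = \sum_{m=0}^{N} (m+1)^2$.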

State indexing

Let the size of the kth local state space be $n_k = |\mathcal S_k|$ and map $\mathcal S_k$ to $\{0, 1, \dots, n_k - 1\}$

$\hat{\mathcal S}$ contains $|\hat{\mathcal S}| = n_K \cdot n_{K-1} \cdots n_1$ states, but not all of them are actually reachable

A potential indexing of a (global) state $\mathbf i = (i_K, \dots, i_1)$ is simply the mixed-base value of $\mathbf i$:

$$\hat\psi(\mathbf i) = \sum_{K \ge k \ge 1} i_k \prod_{k > l \ge 1} n_l$$

An actual indexing for the reachable states is instead harder to define, store, and compute:

$$\psi : \hat{\mathcal S} \to \{0, 1, \dots, |\mathcal S| - 1\} \cup \{null\}$$

In reality, we often have no a priori knowledge of the local state spaces $\mathcal S_k$
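The two indexing functions can be contrasted in a few lines. In this sketch (hypothetical names), $\hat\psi$ is computed with Horner's rule, while $\psi$ is obtained by binary search in an explicit sorted list of the reachable states, which yields the lexicographic rank. The explicit list is only a stand-in here; storing it is exactly the cost that the MDD-based encodings discussed next aim to reduce.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

def potential_index(state: Sequence[int], sizes: Sequence[int]) -> int:
    """psi_hat: mixed-base value of (i_K, ..., i_1) for sizes (n_K, ..., n_1), via Horner's rule."""
    idx = 0
    for i_k, n_k in zip(state, sizes):
        if not 0 <= i_k < n_k:
            raise ValueError("local state out of range")
        idx = idx * n_k + i_k
    return idx

def actual_index(state: Tuple[int, ...], reachable_sorted: List[Tuple[int, ...]]) -> Optional[int]:
    """psi: rank of state among the sorted reachable states, or None (null) if unreachable."""
    pos = bisect_left(reachable_sorted, state)
    if pos < len(reachable_sorted) and reachable_sorted[pos] == state:
        return pos
    return None

print(potential_index((2, 1, 1, 0), (4, 3, 2, 3)))   # 45
```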

Explicit generation of $\mathcal S$ and $\mathbf R$

Explore(in: S_init, N; out: S, R, ψ) is

    n ← 0;                                   // state indices start at 0
    S ← ∅;                                   // S contains the states explored so far
    U ← S_init;                              // U contains the unexplored states known so far
    for each i ∈ S_init do
        ψ(i) ← n++;                          // assign to i the next available index and increment n
    end for
    while U ≠ ∅ do
        choose a state i in U and move it from U to S;
        for each event α ∈ E and each state j ∈ N_α(i) do
            if j ∉ S ∪ U then                // search to determine whether j is a new state
                ψ(j) ← n++;                  // assign to j the next available index and increment n
                U ← U ∪ {j};                 // remember to explore j later
            end if;
            R[ψ(i), ψ(j)] ← R[ψ(i), ψ(j)] + λ_α(i)·Δ_α(i, j);   // ψ is used to index R
        end for;
    end while;

$\psi : \hat{\mathcal S} \to \{0, \dots, |\mathcal S| - 1\} \cup \{null\}$ is a state indexing function (e.g., discovery order)

$\lambda_\alpha(\mathbf i)$ is the rate at which event α fires in state $\mathbf i$

$\Delta_\alpha(\mathbf i, \mathbf j)$ is the probability that, if event α fires in state $\mathbf i$, the next state is $\mathbf j$
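A minimal Python rendering of Explore, under two illustrative assumptions: each event is a function returning the pairs $(\mathbf j, \lambda_\alpha(\mathbf i)\Delta_\alpha(\mathbf i, \mathbf j))$, and the "choose a state in U" step uses FIFO order (the pseudocode allows any order). A dictionary plays the role of both ψ and the search in $\mathcal S \cup U$.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# Hypothetical event interface: state i -> list of (j, lambda_alpha(i) * Delta_alpha(i, j)).
Event = Callable[[Hashable], List[Tuple[Hashable, float]]]

def explore(init: Iterable[Hashable], events: Dict[str, Event]):
    psi: Dict[Hashable, int] = {}              # psi(i), assigned in discovery order
    R: Dict[Tuple[int, int], float] = {}       # sparse R, indexed through psi
    U = deque()                                # unexplored states known so far
    for i in init:
        if i not in psi:
            psi[i] = len(psi)                  # next available index
            U.append(i)
    while U:
        i = U.popleft()                        # move i from U to the explored set
        for fire in events.values():
            for j, w in fire(i):
                if j not in psi:               # j not in S ∪ U: a new state
                    psi[j] = len(psi)
                    U.append(j)                # remember to explore j later
                key = (psi[i], psi[j])
                R[key] = R.get(key, 0.0) + w
    return psi, R

# Hypothetical example: a queue with capacity 3, arrivals at rate 2, services at rate 5
events = {
    "arrive": lambda n: [(n + 1, 2.0)] if n < 3 else [],
    "serve": lambda n: [(n - 1, 5.0)] if n > 0 else [],
}
psi, R = explore([0], events)
print(psi)                                     # {0: 0, 1: 1, 2: 2, 3: 3}
```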

(Quasi–reduced ordered) multiway decision diagrams

• Nodes are organized into K + 1 levels

◦ Level K contains only one root node

◦ Levels K−1 through 1 contain one or more nodes, NO DUPLICATES

◦ Level 0 contains only the two terminal nodes, 0 and 1 (false and true).

• For k > 0, a node at level k has |Sk| arcs pointing to nodes at level k−1

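$\mathcal S_4 = \{0, 1, 2, 3\}$, $\mathcal S_3 = \{0, 1, 2\}$, $\mathcal S_2 = \{0, 1\}$, $\mathcal S_1 = \{0, 1, 2\}$

[Figure: quasi-reduced MDD encoding the set $\mathcal S$ below, with terminal nodes 0 and 1]

$\mathcal S$ = {(0,2,1,0), (1,0,0,0), (1,0,1,0), (1,1,0,0), (1,1,1,0), (1,2,1,0), (2,0,0,0), (2,0,1,0), (2,1,0,0), (2,1,1,0), (2,2,1,0), (3,0,1,0), (3,1,1,0), (3,2,0,0), (3,2,0,1), (3,2,0,2), (3,2,1,0), (3,2,1,1), (3,2,1,2)}, each state written as $(i_4, i_3, i_2, i_1)$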

[Kam et al. 1998] defined fully–reduced ordered MDDs as an interface to BDDs
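The structural rules above can be sketched in a few lines of Python. This is an illustrative construction, not SMART's implementation: nodes are tuples of child ids stored per level, a per-level unique table forbids duplicates, and, as is common, an arc leading to the empty set points directly to terminal 0 (`None` here) while terminal 1 is `1`. Applied to the 19-state set $\mathcal S$ above, it yields 1, 3, 3, and 2 nodes at levels 4 to 1.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Sequence, Tuple

Child = Optional[int]   # node id at level k-1; at level 1: 1 = terminal 1, None = terminal 0

class QuasiReducedMDD:
    def __init__(self, sizes: Sequence[int]):
        self.K = len(sizes)                                    # sizes = (n_K, ..., n_1)
        self.n = {self.K - pos: size for pos, size in enumerate(sizes)}
        self.nodes: Dict[int, List[Tuple[Child, ...]]] = {k: [] for k in self.n}
        self.unique: Dict[int, Dict[Tuple[Child, ...], int]] = {k: {} for k in self.n}

    def _make(self, k: int, children: Tuple[Child, ...]) -> int:
        """Id of the level-k node with these children; created only if it does not exist."""
        table = self.unique[k]
        if children not in table:
            table[children] = len(self.nodes[k])
            self.nodes[k].append(children)
        return table[children]

    def build(self, states: Iterable[Tuple[int, ...]]) -> Child:
        return self._build(self.K, frozenset(states))

    def _build(self, k: int, suffixes: FrozenSet[Tuple[int, ...]]) -> Child:
        if not suffixes:
            return None                                        # empty set: terminal 0
        if k == 0:
            return 1                                           # only the empty suffix: terminal 1
        children = tuple(
            self._build(k - 1, frozenset(s[1:] for s in suffixes if s[0] == v))
            for v in range(self.n[k]))
        return self._make(k, children)

    def contains(self, root: Child, state: Sequence[int]) -> bool:
        node = root
        for k, v in zip(range(self.K, 0, -1), state):
            if node is None:
                return False
            node = self.nodes[k][node][v]
        return node == 1

    def count(self, root: Child) -> int:
        memo: Dict[Tuple[int, int], int] = {}
        def cnt(k: int, node: Child) -> int:
            if node is None:
                return 0
            if k == 0:
                return 1
            if (k, node) not in memo:
                memo[(k, node)] = sum(cnt(k - 1, c) for c in self.nodes[k][node])
            return memo[(k, node)]
        return cnt(self.K, root)

S = [(0,2,1,0), (1,0,0,0), (1,0,1,0), (1,1,0,0), (1,1,1,0), (1,2,1,0), (2,0,0,0),
     (2,0,1,0), (2,1,0,0), (2,1,1,0), (2,2,1,0), (3,0,1,0), (3,1,1,0), (3,2,0,0),
     (3,2,0,1), (3,2,0,2), (3,2,1,0), (3,2,1,1), (3,2,1,2)]
mdd = QuasiReducedMDD((4, 3, 2, 3))
root = mdd.build(S)
print(mdd.count(root), [len(mdd.nodes[k]) for k in (4, 3, 2, 1)])   # 19 [1, 3, 3, 2]
```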

How to define and store the indexing function ψ using MDDs


To compute the index of a state, use edge–value offsets:

• Sum the offsets found on the corresponding path:

ψ(2, 1, 1, 0) = 6 + 2 + 1 + 0 = 9

• A state is unreachable if the path is not complete:

ψ(0, 2, 0, 0) = 0 + 0 + ? + ? = null

Note that ψ follows the lexicographic order of the states, not their discovery order!
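A minimal sketch of how the offsets can be computed and used, assuming the set $\mathcal S$ is given explicitly (hypothetical names `build_offsets` and `psi`). The offset of the arc labelled v is the number of states reachable through the arcs with smaller labels, so summing offsets along a path gives the lexicographic rank; memoizing on the suffix set makes equal subsets share a node, as in an MDD.

```python
from functools import lru_cache
from typing import FrozenSet, Iterable, Optional, Sequence, Tuple

def build_offsets(states: Iterable[Tuple[int, ...]]):
    """MDD whose arcs carry offsets; a node is a dict: local value -> (offset, child)."""
    states = frozenset(states)
    K = len(next(iter(states)))

    @lru_cache(maxsize=None)                    # equal suffix sets share a single node
    def node(k: int, suffixes: FrozenSet[Tuple[int, ...]]):
        if k == 0:
            return None, 1                      # terminal: one (empty) path
        arcs, total = {}, 0
        for v in sorted({s[0] for s in suffixes}):
            child, count = node(k - 1, frozenset(s[1:] for s in suffixes if s[0] == v))
            arcs[v] = (total, child)            # offset = number of states on smaller arcs
            total += count
        return arcs, total

    return node(K, states)[0]

def psi(root, state: Sequence[int]) -> Optional[int]:
    """Sum the offsets along the path of state; None (null) if the path is incomplete."""
    arcs, index = root, 0
    for v in state:
        if arcs is None or v not in arcs:
            return None
        offset, arcs = arcs[v]
        index += offset
    return index

S = [(0,2,1,0), (1,0,0,0), (1,0,1,0), (1,1,0,0), (1,1,1,0), (1,2,1,0), (2,0,0,0),
     (2,0,1,0), (2,1,0,0), (2,1,1,0), (2,2,1,0), (3,0,1,0), (3,1,1,0), (3,2,0,0),
     (3,2,0,1), (3,2,0,2), (3,2,1,0), (3,2,1,1), (3,2,1,2)]
root = build_offsets(S)
print(psi(root, (2, 1, 1, 0)), psi(root, (0, 2, 0, 0)))   # 9 None
```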

[Figure: the MDD for $\mathcal S$ with an offset on each arc]

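State indexing options: potential $\hat\psi$ vs. actual $\psi$

Once we know $\mathcal S$:

• We can store the original $\mathcal N : \hat{\mathcal S} \to 2^{\hat{\mathcal S}}$ or its restriction $\mathcal N : \mathcal S \to 2^{\mathcal S}$

• We can store $\hat{\mathbf R} : \hat{\mathcal S} \times \hat{\mathcal S} \to \mathbb R$ or $\mathbf R : \mathcal S \times \mathcal S \to \mathbb R$

• We can choose algorithms that use $\hat\pi : \hat{\mathcal S} \to \mathbb R$ or $\pi : \mathcal S \to \mathbb R$

With strictly explicit methods, using the actual $\mathbf R$ and $\pi$ works best

With (even just partially) implicit methods, there are tradeoffs:

• Storing $\pi$ instead of $\hat\pi$ is often unavoidable if we employ a full vector

• Symbolic storage of $\hat{\mathbf R}$ is much cheaper than that of $\mathbf R$ in terms of memory requirements

• However, using $\hat{\mathbf R}$ in conjunction with $\pi$ complicates indexing...

• ...at the very least, it forces us to store $\psi : \hat{\mathcal S} \to \{0, 1, \dots, |\mathcal S| - 1\} \cup \{null\}$, hence $\mathcal S$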

Saturation: an iteration strategy based on the model structure

MDD node p at level k is saturated if the set of states it encodes is a fixed point w.r.t. any α s.t. Top(α) ≤ k, where Top(α) is the highest level that event α depends on or modifies (thus, all nodes below p are also saturated)

• build the K-level MDD encoding of $\mathcal S_{init}$ (if $|\mathcal S_{init}| = 1$, there is one node per level)

• saturate each node at level 1: fire in them all events α s.t. Top(α) = 1

• saturate each node at level 2: fire in them all events α s.t. Top(α) = 2 (if this creates nodes at level 1, saturate them immediately upon creation)

• saturate each node at level 3: fire in them all events α s.t. Top(α) = 3 (if this creates nodes at levels 2 or 1, saturate them immediately upon creation)

• . . .

• saturate the root node at level K: fire in it all events α s.t. Top(α) = K (if this creates nodes at levels K−1, K−2, . . . , 1, saturate them immediately upon creation)

Note: states are not discovered in breadth-first order

Solution requirements: SMART vs. NuSMV (800MHz P-III)

Time and memory to generate S using saturation in SMART vs. breadth–first iterations in NuSMV

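| Model | N | States | Final memory, SMART (kB) | Final memory, NuSMV (kB) | Peak memory, SMART (kB) | Peak memory, NuSMV (kB) | Time, SMART (s) | Time, NuSMV (s) |
|---|---|---|---|---|---|---|---|---|
| Dining Philosophers (K = N) | 50 | $2.23\times10^{31}$ | 18 | 10,800 | 22 | 10,819 | 0.15 | 5.9 |
| | 200 | $2.47\times10^{125}$ | 74 | 27,155 | 93 | 72,199 | 0.68 | 12,905.7 |
| | 10,000 | $4.26\times10^{6269}$ | 3,749 | n/a | 4,686 | n/a | 877.82 | n/a |
| Slotted Ring Network (K = N) | 10 | $8.29\times10^{9}$ | 4 | 5,287 | 28 | 10,819 | 0.13 | 5.5 |
| | 15 | $1.46\times10^{15}$ | 10 | 9,386 | 80 | 13,573 | 0.39 | 2,039.5 |
| | 200 | $8.38\times10^{211}$ | 1,729 | n/a | 120,316 | n/a | 902.11 | n/a |
| Round Robin Mutual Exclusion (K = N + 1) | 20 | $4.72\times10^{7}$ | 18 | 7,300 | 20 | 7,306 | 0.07 | 0.8 |
| | 100 | $2.85\times10^{32}$ | 356 | 16,228 | 372 | 26,628 | 3.81 | 2,475.3 |
| | 300 | $1.37\times10^{93}$ | 3,063 | n/a | 3,109 | n/a | 140.98 | n/a |
| Flexible Manufacturing System (K = 19) | 10 | $4.28\times10^{6}$ | 16 | 1,707 | 26 | 11,238 | 0.05 | 9.4 |
| | 20 | $3.84\times10^{9}$ | 55 | 14,077 | 101 | 31,718 | 0.20 | 1,747.8 |
| | 250 | $3.47\times10^{26}$ | 25,507 | n/a | 69,087 | n/a | 231.17 | n/a |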

Decision–diagram–based CTMC encodings

Multiterminal multiway decision diagrams

From BDDs to MDDs: allow multiway choices at each nonterminal node, not just binary choices

From BDDs to MTBDDs: allow multiple terminal nodes, not just 0 and 1

From BDDs to MTMDDs: combine both generalizations

We can use a quasi–reduced MTMDD to encode a real matrix $\mathbf A : \hat{\mathcal S} \times \hat{\mathcal S} \to \mathbb R$

• Nodes are organized into 2K + 1 levels

◦ Variables {iK , iK−1, ..., i1, jK , jK−1, ..., j1} are mapped onto {2K, 2K−1, ..., 1}

◦ Let v(l) be the variable corresponding to level l ∈ {2K, 2K−1, ..., 1}

◦ Level 2K contains only one root node

◦ Levels 2K−1 through 1 contain one or more nodes, no duplicate nodes allowed

◦ Level 0 contains as many nodes as the different entries in A

• A node at level l > 0, with v(l) = ik or jk, has |Sk| arcs pointing to nodes at level l−1

A[i, j] = x ⇔ path labelled {iK , iK−1, ..., i1, jK , jK−1, ..., j1} leads to node x at level 0
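A small Python sketch of such an encoding, under illustrative assumptions: the matrix is given as a dictionary of nonzero entries keyed by pairs of local-state tuples, the variable order is the interleaved $(i_K, j_K, \dots, i_1, j_1)$ discussed on the next slide, and terminals are represented by their values, so equal entries automatically share a terminal. All 2K levels are kept, as in a quasi-reduced diagram, and a per-level unique table removes duplicate nodes.

```python
from typing import Dict, List, Sequence, Tuple

Ref = Tuple   # ("T", value) for a terminal, ("N", pos, node_id) for an inner node (pos 0 = level 2K)

def interleave(i: Sequence[int], j: Sequence[int]) -> Tuple[int, ...]:
    """Variable order (i_K, j_K, i_{K-1}, j_{K-1}, ..., i_1, j_1)."""
    return tuple(x for pair in zip(i, j) for x in pair)

class MTMDD:
    def __init__(self, sizes: Sequence[int]):
        # sizes = (n_K, ..., n_1); each n_k is used twice, once for i_k and once for j_k
        self.dom = [n for n in sizes for _ in range(2)]
        self.unique: List[Dict[Tuple[Ref, ...], int]] = [{} for _ in self.dom]
        self.children: List[List[Tuple[Ref, ...]]] = [[] for _ in self.dom]

    def build(self, A: Dict[Tuple[Tuple[int, ...], Tuple[int, ...]], float]) -> Ref:
        paths = {interleave(i, j): x for (i, j), x in A.items() if x != 0}
        return self._build(0, paths)

    def _build(self, pos: int, paths: Dict[Tuple[int, ...], float]) -> Ref:
        if pos == len(self.dom):
            return ("T", paths.get((), 0.0))         # equal values give equal terminals
        kids = tuple(self._build(pos + 1, {p[1:]: x for p, x in paths.items() if p[0] == v})
                     for v in range(self.dom[pos]))
        table = self.unique[pos]
        if kids not in table:                         # no duplicate nodes at any level
            table[kids] = len(self.children[pos])
            self.children[pos].append(kids)
        return ("N", pos, table[kids])

    def lookup(self, root: Ref, i: Sequence[int], j: Sequence[int]) -> float:
        """Follow the path labelled (i_K, j_K, ..., i_1, j_1) down to a terminal."""
        ref = root
        for v in interleave(i, j):
            _, pos, nid = ref
            ref = self.children[pos][nid][v]
        return ref[1]

A = {((0, 0), (1, 1)): 3.0, ((0, 1), (1, 0)): 3.0, ((1, 0), (0, 0)): 5.0}   # hypothetical matrix
m = MTMDD((2, 2))
root = m.build(A)
print(m.lookup(root, (1, 0), (0, 0)), m.lookup(root, (0, 0), (0, 0)))       # 5.0 0.0
```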

MTMDD encoding of the transition rate matrix

In principle, the mapping v can be any permutation of the 2K variables

In practice, the order (iK , jK , iK−1, jK−1, ..., i1, j1) is usually the best choice

(we still need to decide a good order for the K “from-to” pairs)

When using MTMDDs to store the transition rate matrix, we have a choice:

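• Store $\hat{\mathbf R} : \hat{\mathcal S} \times \hat{\mathcal S} \to \mathbb R$ (note that $\hat{\mathbf R}[\mathbf i, \mathbf j] = 0$ if $\mathbf i \in \mathcal S$ and $\mathbf j \notin \mathcal S$)

◦ a natural choice if we use a compositional approach

• Store $\mathbf R : \mathcal S \times \mathcal S \to \mathbb R$ (actually, $\hat{\mathcal S} \times \hat{\mathcal S} \to \mathbb R$, but $\mathbf R[\mathbf i, \mathbf j] = 0$ if $\mathbf i \notin \mathcal S$ or $\mathbf j \notin \mathcal S$)

◦ usually requires more MTMDD nodes

◦ can be built by enumerating the entries explicitly and storing them implicitly in an MTMDD

◦ or by zeroing the rows corresponding to $\hat{\mathcal S} \setminus \mathcal S$ in the MTMDD encoding of $\hat{\mathbf R}$,

i.e., premultiplying $\hat{\mathbf R}$ by a filtering diagonal matrix $\mathbf F$ with $\mathbf F[\mathbf i, \mathbf i] = 1$ if $\mathbf i \in \mathcal S$ and $0$ if $\mathbf i \in \hat{\mathcal S} \setminus \mathcal S$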

An example of MTMDD

S4 :{p1,p0}≡{0,1} S3 :{q0r0,q1r0,q0r1}≡{0,1,2} S2 :{s0,s1}≡{0,1} S1 :{t0,t1}≡{0,1}

[Figure: the Petri net with places p, q, r, s, t and transitions a, b, c, d, e, and the MTMDD encoding its transition rate matrix, whose terminal nodes hold the rates µa, µb, µc, µd, µe]

Definition of Kronecker product

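Given K matrices $\mathbf A_k \in \mathbb R^{n_k \times n_k}$, their Kronecker product is

$$\mathbf A = \bigotimes_{K \ge k \ge 1} \mathbf A_k \in \mathbb R^{n_K \cdots n_1 \times n_K \cdots n_1}$$

where

• $\mathbf A[i, j] = \mathbf A_K[i_K, j_K] \cdot \mathbf A_{K-1}[i_{K-1}, j_{K-1}] \cdots \mathbf A_1[i_1, j_1]$

• using the mixed-base numbering scheme (indices start at 0)

$$i = (\cdots((i_K) \cdot n_{K-1} + i_{K-1}) \cdot n_{K-2} \cdots) \cdot n_1 + i_1 = \sum_{K \ge k \ge 1} i_k \cdot \prod_{k > l \ge 1} n_l$$

nonzeros: $\eta\left(\bigotimes_{K \ge k \ge 1} \mathbf A_k\right) = \prod_{K \ge k \ge 1} \eta(\mathbf A_k)$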
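The entry-wise definition can be checked against NumPy's `np.kron`. In this sketch, `kron_entry` (a hypothetical helper) decodes i and j in mixed base, peeling off the least significant digits $(i_1, j_1)$ first, and multiplies the corresponding local entries. Integer test matrices avoid accidental cancellations, so the nonzero counts multiply exactly as stated above.

```python
import numpy as np
from functools import reduce

def kron_all(mats):
    """A_K ⊗ ... ⊗ A_1 for mats = [A_K, ..., A_1]."""
    return reduce(np.kron, mats)

def kron_entry(mats, i: int, j: int) -> float:
    """A[i, j] computed as A_K[i_K, j_K] ··· A_1[i_1, j_1], decoding i and j in mixed base."""
    value = 1.0
    for A in reversed(mats):            # (i_1, j_1) are the least significant digits
        n = A.shape[0]
        i, i_k = divmod(i, n)
        j, j_k = divmod(j, n)
        value *= A[i_k, j_k]
    return value

A2 = np.array([[1, 0], [2, 3]])
A1 = np.array([[0, 4, 0], [5, 0, 6], [0, 0, 7]])
A = kron_all([A2, A1])                  # 6 x 6
print(np.count_nonzero(A), np.count_nonzero(A2) * np.count_nonzero(A1))   # 12 12
```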

Kronecker product by example

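Given the real matrices

$$\mathbf A = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix} \quad\text{and}\quad \mathbf B = \begin{bmatrix} b_{00} & b_{01} & b_{02} \\ b_{10} & b_{11} & b_{12} \\ b_{20} & b_{21} & b_{22} \end{bmatrix}$$

$$\mathbf A \otimes \mathbf B = \begin{bmatrix} a_{00}\mathbf B & a_{01}\mathbf B \\ a_{10}\mathbf B & a_{11}\mathbf B \end{bmatrix} = \begin{bmatrix}
a_{00}b_{00} & a_{00}b_{01} & a_{00}b_{02} & a_{01}b_{00} & a_{01}b_{01} & a_{01}b_{02} \\
a_{00}b_{10} & a_{00}b_{11} & a_{00}b_{12} & a_{01}b_{10} & a_{01}b_{11} & a_{01}b_{12} \\
a_{00}b_{20} & a_{00}b_{21} & a_{00}b_{22} & a_{01}b_{20} & a_{01}b_{21} & a_{01}b_{22} \\
a_{10}b_{00} & a_{10}b_{01} & a_{10}b_{02} & a_{11}b_{00} & a_{11}b_{01} & a_{11}b_{02} \\
a_{10}b_{10} & a_{10}b_{11} & a_{10}b_{12} & a_{11}b_{10} & a_{11}b_{11} & a_{11}b_{12} \\
a_{10}b_{20} & a_{10}b_{21} & a_{10}b_{22} & a_{11}b_{20} & a_{11}b_{21} & a_{11}b_{22}
\end{bmatrix}$$

Kronecker product expresses contemporaneity or synchronization

If A and B are the transition probability matrices of two independent discrete-time Markov chains, then A ⊗ B is the transition probability matrix of their composition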
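A quick numerical illustration of the last statement, using two hypothetical DTMCs. The joint state (i, j) gets index 3i + j, following the mixed-base convention above. Each row of the Kronecker product is again a probability distribution, and by the mixed-product property, one step of the joint chain from independent initial distributions equals the Kronecker product of the two one-step distributions.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])              # hypothetical DTMC on {0, 1}
Q = np.array([[0.2, 0.8, 0.0],
              [0.0, 0.3, 0.7],
              [0.6, 0.0, 0.4]])         # hypothetical DTMC on {0, 1, 2}

joint = np.kron(P, Q)                   # joint state (i, j) has index 3 * i + j

# Each row of the product is still a probability distribution
print(joint.sum(axis=1))

# Moving from (1, 2) to (0, 0) needs both chains to move, independently:
# joint[5, 0] = P[1, 0] * Q[2, 0]
print(joint[3 * 1 + 2, 3 * 0 + 0])
```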

Kronecker-consistent decomposition of a CTMC model

A decomposition of a discrete-state model describing a CTMC is Kronecker-consistent if:

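• the potential transition rate matrix $\hat{\mathbf R}$ is additively partitioned: $\hat{\mathbf R} = \sum_{\alpha \in \mathcal E} \hat{\mathbf R}_\alpha$

• $\hat{\mathcal S} = \mathcal S_K \times \cdots \times \mathcal S_1$, i.e., a global state $\mathbf i$ consists of K local states $\mathbf i = (i_K, \dots, i_1)$

• and, most importantly, we can multiplicatively partition each $\hat{\mathbf R}_\alpha$, that is, we can write

$$\lambda_\alpha(\mathbf i) = \lambda_{K,\alpha}(i_K) \cdots \lambda_{1,\alpha}(i_1) \quad\text{and}\quad \Delta_\alpha(\mathbf i, \mathbf j) = \Delta_{K,\alpha}(i_K, j_K) \cdots \Delta_{1,\alpha}(i_1, j_1)$$

$$\hat{\mathbf R}_\alpha = \mathbf R_{K,\alpha} \otimes \cdots \otimes \mathbf R_{1,\alpha}$$

We encode the potential transition rate matrix $\hat{\mathbf R}$ with $|\mathcal E| \times K$ small matrices $\mathbf R_{k,\alpha} \in \mathbb R^{n_k \times n_k}$

For stochastic Petri nets with transition rates depending on at most one place, any partition of the places into K subsets is consistent (even with inhibitor, reset, or probabilistic arcs)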

Kronecker description of the transition rate matrix of a CTMC

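• Parallel composition of K submodels with overall event set $\mathcal E$ (synchronizing vs. local events)

• Global state $\mathbf i$ is a K-tuple $(i_K, \dots, i_1)$ of local states; $\mathcal S \subseteq \hat{\mathcal S} = \mathcal S_K \times \cdots \times \mathcal S_1$

• Transition rate matrix $\mathbf R = \hat{\mathbf R}[\mathcal S, \mathcal S]$, where $\hat{\mathbf R} = \sum_{\alpha \in \mathcal E} \bigotimes_{K \ge k \ge 1} \mathbf R_{k,\alpha}$

• $\mathbf R_{k,\alpha}[i_k, j_k] = \begin{cases} \lambda_{k,\alpha}(i_k) \cdot \Delta_{k,\alpha}(i_k, j_k) & \text{if } \alpha \text{ is in submodel } k \\ 1 & \text{if } \alpha \text{ is not in submodel } k \text{ and } i_k = j_k \\ 0 & \text{if } \alpha \text{ is not in submodel } k \text{ and } i_k \neq j_k \end{cases}$

Encode a huge $\hat{\mathbf R}$ with $K \cdot |\mathcal E|$ “small” matrices:
“On the stochastic structure of parallelism and synchronisation models for distributed algorithms”, Plateau (SIGMETRICS 1985)

Factor K slowdown, but still needs a probability vector of size $|\mathcal S|$:
“Complexity of memory-efficient Kronecker operations with applications to the solution of Markov models”, Buchholz, Ciardo, Donatelli, Kemper (INFORMS J. Comp., 2000)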
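For checking small examples, the descriptor can be expanded into a dense matrix. The sketch below assumes a hypothetical input format: `local[alpha][k]` holds $\mathbf R_{k,\alpha}$, and a level missing for α means α is not in submodel k, so $\mathbf R_{k,\alpha} = \mathbf I$. Practical Kronecker solvers never form this matrix; that is precisely what the descriptor avoids.

```python
import numpy as np
from functools import reduce
from typing import Dict, Sequence

def descriptor_to_matrix(local: Dict[str, Dict[int, np.ndarray]], sizes: Sequence[int]) -> np.ndarray:
    """Expand R_hat = sum_alpha R_{K,alpha} ⊗ ... ⊗ R_{1,alpha}; sizes = (n_K, ..., n_1)."""
    K = len(sizes)
    total = int(np.prod(sizes))
    R = np.zeros((total, total))
    for mats in local.values():
        # a level absent from mats contributes an identity factor
        factors = [mats.get(K - pos, np.eye(n)) for pos, n in enumerate(sizes)]
        R += reduce(np.kron, factors)
    return R

A = np.array([[0.0, 1.0], [2.0, 0.0]])                                 # hypothetical submodel 2
B = np.array([[0.0, 3.0, 0.0], [0.0, 0.0, 4.0], [5.0, 0.0, 0.0]])      # hypothetical submodel 1
R_local = descriptor_to_matrix({"a": {2: A}, "b": {1: B}}, (2, 3))    # two local events
R_sync = descriptor_to_matrix({"s": {2: A, 1: B}}, (2, 3))            # one synchronizing event
```

With only local events the result is the Kronecker sum $\mathbf A \otimes \mathbf I_3 + \mathbf I_2 \otimes \mathbf B$, while a synchronizing event contributes the plain product $\mathbf A \otimes \mathbf B$.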

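Kronecker encoding of $\hat{\mathbf R}$ (K = 5)

$\mathcal S_5$ = ?, $\mathcal S_4$ = ?, $\mathcal S_3$ = ?, $\mathcal S_2$ = ?, $\mathcal S_1$ = ?

| Level ↓ / Event → | a | b | c | d | e |
|---|---|---|---|---|---|
| 5 | $\mathbf R_{5,a}$: ? | I | I | I | $\mathbf R_{5,e}$: ? |
| 4 | $\mathbf R_{4,a}$: ? | $\mathbf R_{4,b}$: ? | $\mathbf R_{4,c}$: ? | I | I |
| 3 | I | $\mathbf R_{3,b}$: ? | $\mathbf R_{3,c}$: ? | I | $\mathbf R_{3,e}$: ? |
| 2 | $\mathbf R_{2,a}$: ? | I | I | $\mathbf R_{2,d}$: ? | I |
| 1 | I | I | I | $\mathbf R_{1,d}$: ? | $\mathbf R_{1,e}$: ? |

We determine a priori, from the model, whether $\mathbf R_{k,\alpha} = \mathbf I$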

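Kronecker encoding of $\hat{\mathbf R} = \sum_{\alpha \in \{a,b,c,d,e\}} \bigotimes_{5 \ge k \ge 1} \mathbf R_{k,\alpha}$

$\mathcal S_5$: {p1, p0} ≡ {0, 1}, $\mathcal S_4$: {q0, q1} ≡ {0, 1}, $\mathcal S_3$: {r0, r1} ≡ {0, 1}, $\mathcal S_2$: {s0, s1} ≡ {0, 1}, $\mathcal S_1$: {t0, t1} ≡ {0, 1}

(every $\mathbf R_{k,\alpha}$ not listed is the identity $\mathbf I$)

• Level 5: $\mathbf R_{5,a} = \begin{bmatrix} 0 & \gamma^a_5 \\ 0 & 0 \end{bmatrix}$, $\mathbf R_{5,e} = \begin{bmatrix} 0 & 0 \\ \gamma^e_5 & 0 \end{bmatrix}$

• Level 4: $\mathbf R_{4,a} = \begin{bmatrix} 0 & \gamma^a_4 \\ 0 & 0 \end{bmatrix}$, $\mathbf R_{4,b} = \begin{bmatrix} 0 & \gamma^b_4 \\ 0 & 0 \end{bmatrix}$, $\mathbf R_{4,c} = \begin{bmatrix} 0 & 0 \\ \gamma^c_4 & 0 \end{bmatrix}$

• Level 3: $\mathbf R_{3,b} = \begin{bmatrix} 0 & 0 \\ \gamma^b_3 & 0 \end{bmatrix}$, $\mathbf R_{3,c} = \begin{bmatrix} 0 & \gamma^c_3 \\ 0 & 0 \end{bmatrix}$, $\mathbf R_{3,e} = \begin{bmatrix} 0 & 0 \\ \gamma^e_3 & 0 \end{bmatrix}$

• Level 2: $\mathbf R_{2,a} = \begin{bmatrix} 0 & \gamma^a_2 \\ 0 & 0 \end{bmatrix}$, $\mathbf R_{2,d} = \begin{bmatrix} 0 & 0 \\ \gamma^d_2 & 0 \end{bmatrix}$

• Level 1: $\mathbf R_{1,d} = \begin{bmatrix} 0 & \gamma^d_1 \\ 0 & 0 \end{bmatrix}$, $\mathbf R_{1,e} = \begin{bmatrix} 0 & 0 \\ \gamma^e_1 & 0 \end{bmatrix}$

$\hat{\mathbf R}[(0,0,i_3,0,i_1), (1,1,i_3,1,i_1)] = \gamma^a_5 \cdot \gamma^a_4 \cdot \gamma^a_2$

Not a canonical representation: changing $\gamma^a_5$ to $\gamma^a_5 \cdot 2$ and $\gamma^a_4$ to $\gamma^a_4 / 2$ would describe the same $\hat{\mathbf R}$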

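Kronecker encoding of $\hat{\mathbf R}$ (K = 4)

$\mathcal S_4$ = ?, $\mathcal S_3$ = ?, $\mathcal S_2$ = ?, $\mathcal S_1$ = ?

| Level ↓ / Event → | a | b | c | d | e |
|---|---|---|---|---|---|
| 4 | $\mathbf R_{4,a}$: ? | I | I | I | $\mathbf R_{4,e}$: ? |
| 3 | $\mathbf R_{3,a}$: ? | $\mathbf R_{3,b}$: ? | $\mathbf R_{3,c}$: ? | I | $\mathbf R_{3,e}$: ? |
| 2 | $\mathbf R_{2,a}$: ? | I | I | $\mathbf R_{2,d}$: ? | I |
| 1 | I | I | I | $\mathbf R_{1,d}$: ? | $\mathbf R_{1,e}$: ? |

The matrices for b and c are identities at every level except level 3: we can merge b and c into a single (local) event l

We determine automatically from the model whether $\mathbf R_{k,\alpha} = \mathbf I$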

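Kronecker encoding of $\hat{\mathbf R} = \sum_{\alpha \in \{a,l,d,e\}} \bigotimes_{4 \ge k \ge 1} \mathbf R_{k,\alpha}$

$\mathcal S_4$: {p1, p0} ≡ {0, 1}, $\mathcal S_3$: {q0r0, q1r0, q0r1} ≡ {0, 1, 2}, $\mathcal S_2$: {s0, s1} ≡ {0, 1}, $\mathcal S_1$: {t0, t1} ≡ {0, 1}

(every $\mathbf R_{k,\alpha}$ not listed is the identity $\mathbf I$)

• Level 4: $\mathbf R_{4,a} = \begin{bmatrix} 0 & \mu^a_4 \\ 0 & 0 \end{bmatrix}$, $\mathbf R_{4,e} = \begin{bmatrix} 0 & 0 \\ \mu^e_4 & 0 \end{bmatrix}$

• Level 3: $\mathbf R_{3,a} = \begin{bmatrix} 0 & \mu^a_3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $\mathbf R_{3,l} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \mu^c_3 \\ 0 & \mu^b_3 & 0 \end{bmatrix}$, $\mathbf R_{3,e} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ \mu^e_3 & 0 & 0 \end{bmatrix}$

• Level 2: $\mathbf R_{2,a} = \begin{bmatrix} 0 & \mu^a_2 \\ 0 & 0 \end{bmatrix}$, $\mathbf R_{2,d} = \begin{bmatrix} 0 & 0 \\ \mu^d_2 & 0 \end{bmatrix}$

• Level 1: $\mathbf R_{1,d} = \begin{bmatrix} 0 & \mu^d_1 \\ 0 & 0 \end{bmatrix}$, $\mathbf R_{1,e} = \begin{bmatrix} 0 & 0 \\ \mu^e_1 & 0 \end{bmatrix}$

We have merged b and c into a single (local) event l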

The matrix $\hat{\mathbf R}$ encoded by the Kronecker descriptor (K = 4)

[Figure: the 24 × 24 matrix $\hat{\mathbf R}$, with rows and columns labeled by the local states $(i_4, i_3, i_2, i_1)$ from 0000 to 1211 and each nonzero labeled by the event (a, b, c, d, e) that generates it; the reachable states 0000, 1101, 1110, 1201, 1210 are marked with •]
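The K = 4 descriptor can be expanded and explored numerically to recover the reachable states marked on the slide. In this sketch, the rate values are hypothetical ($\mu^a_4 = 2$, $\mu^a_3 = 3$, $\mu^a_2 = 5$, all other rates 1). Potential indices follow the mixed-base rule with sizes (2, 3, 2, 2), so 1101 ↦ 17, 1110 ↦ 18, 1201 ↦ 21, and 1210 ↦ 22.

```python
import numpy as np
from collections import deque
from functools import reduce

SIZES = [2, 3, 2, 2]                  # n_4, n_3, n_2, n_1
# Hypothetical rate values, chosen only for illustration.
mu = {"a4": 2.0, "a3": 3.0, "a2": 5.0, "b3": 1.0, "c3": 1.0,
      "d2": 1.0, "d1": 1.0, "e4": 1.0, "e3": 1.0, "e1": 1.0}

def m(rows):
    return np.array(rows, dtype=float)

LOCAL = {                             # R_{k,alpha}; a missing level means the identity
    "a": {4: m([[0, mu["a4"]], [0, 0]]),
          3: m([[0, mu["a3"], 0], [0, 0, 0], [0, 0, 0]]),
          2: m([[0, mu["a2"]], [0, 0]])},
    "l": {3: m([[0, 0, 0], [0, 0, mu["c3"]], [0, mu["b3"], 0]])},
    "d": {2: m([[0, 0], [mu["d2"], 0]]), 1: m([[0, mu["d1"]], [0, 0]])},
    "e": {4: m([[0, 0], [mu["e4"], 0]]),
          3: m([[0, 0, 0], [0, 0, 0], [mu["e3"], 0, 0]]),
          1: m([[0, 0], [mu["e1"], 0]])},
}

def potential_R() -> np.ndarray:
    """Dense R_hat = sum over events of the Kronecker product of the four factors."""
    K = len(SIZES)
    return sum(reduce(np.kron, [mats.get(K - p, np.eye(n)) for p, n in enumerate(SIZES)])
               for mats in LOCAL.values())

def reachable(R: np.ndarray, start: int = 0) -> set:
    """Breadth-first search on the nonzero pattern of R_hat."""
    seen, queue = {start}, deque([start])
    while queue:
        i = queue.popleft()
        for j in np.flatnonzero(R[i]):
            if int(j) not in seen:
                seen.add(int(j))
                queue.append(int(j))
    return seen

def digits(idx: int) -> tuple:
    """Potential index -> (i_4, i_3, i_2, i_1)."""
    out = []
    for n in reversed(SIZES):
        idx, d = divmod(idx, n)
        out.append(d)
    return tuple(reversed(out))

R = potential_R()
print(sorted(digits(i) for i in reachable(R)))
```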

Matrix diagrams (MXDs)

A generalization of the idea of Kronecker encoding of a matrix

Allows us to apply knowledge of the reachable states $\mathcal S$ to the potential transition rate matrix $\hat{\mathbf R}$

An example of (non–canonical) MXD:

[Figure: a three-level MXD over local state spaces {0, 1, 2}, {0, 1}, {0, 1, 2}, and the 18 × 18 matrix it encodes, with rows and columns indexed 000 through 212]

Each entry is the sum, over all matching paths, of the products of the values found along each path:

$\mathbf R[001, 210] = 5 \cdot 2 \cdot 9 = 90$

$\mathbf R[111, 211] = 2 \cdot (2 \cdot 5 + 1 \cdot 1) = 22$

How to build an MXD

Two methods for building MXDs in our tool SMART:

• From the Kronecker encoding

◦ Build the matrices for the Kronecker encoding

◦ Insert the matrices in an MXD, removing duplicate (and redundant?) ones

◦ Since we start from a Kronecker encoding, we have the same limitations

◦ Similar memory requirements to a Kronecker encoding in practice

◦ We can use them to store $\mathbf R$ instead of $\hat{\mathbf R}$

• From an explicit enumeration of the entries

◦ Requires the use of canonical MXDs [Miner PNPM’01]

◦ Entries are added individually (or in batches for greater efficiency)

◦ Requires more time (additional overhead prior to numerical solution) and memory

◦ They are fully general, can encode any matrix, do not need Kronecker consistency

MXDs exploit the presence of identity matrices $\mathbf R_{k,\alpha}$ (skipped levels)

EVBDDs by example

[Lai et al. 1992] defined edge–valued binary decision diagrams

i3 0 0 0 0 1 1 1 1

i2 0 0 1 1 0 0 1 1

i1 0 1 0 1 0 1 0 1

f 0 2 3 2 2 4 1 0

[Figure: three EVBDDs encoding f, with the same structure but different edge values]

Canonicity: all nodes have 0–value on the 0–arc (only the first EVBDD is canonical)

In canonical form, the root incoming edge has value f(0 · · · 0)

EV+MDDs by example

[Ciardo and Siminiceanu 2002] defined edge–valued positive multiway decision diagrams

• From BDD to MDD: the usual extension

• We allow ∞ edge values: can store partial arithmetic functions

• New canonization rule: essential to encode partial arithmetic functions

i3 0 0 0 0 1 1 1 1

i2 0 0 1 1 0 0 1 1

i1 0 1 0 1 0 1 0 1

f 0 2 3 2 2 4 1 0

[Figure: the canonical EV+MDD for f]

i3 0 0 0 0 1 1 1 1

i2 0 0 1 1 0 0 1 1

i1 0 1 0 1 0 1 0 1

f 0 2 3 ∞ ∞ 4 1 0

[Figure: the canonical EV+MDD for this partial function f]

Canonicity: all edge values are non–negative and at least one is zero

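In canonical form, the root incoming edge has value $\min_{\mathbf i \in \hat{\mathcal S}} f(\mathbf i)$

$f(1,0,0) = \infty$ but $f(1,0,1) = 4$: the traditional EVMDD normalization cannot represent this function, because requiring value 0 on the 0-arc of the node reached by $i_3 = 1, i_2 = 0$ would force an offset of ∞ on the edge entering that node, and $f(1,0,1) = 4$ could no longer be recovered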
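The canonization rule can be sketched by building the diagram bottom-up from a function table. This illustration enumerates all of $\hat{\mathcal S}$, so it is only for tiny examples. At each node, the minimum of the children's incoming values is moved to the edge above, leaving non-negative edge values with at least one 0. An ∞ edge keeps no child, and a unique table keyed by (level, edges) makes equal nodes shared. Class and method names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

INF = math.inf
TERMINAL = -1                                   # stands for the terminal node <0|T>
Edge = Tuple[float, Optional[int]]              # (value, child id); child is None when value is INF

class EVPlusMDD:
    def __init__(self, sizes: Sequence[int]):
        self.sizes = list(sizes)                # (n_K, ..., n_1)
        self.unique: Dict[Tuple[int, Tuple[Edge, ...]], int] = {}
        self.nodes: List[Tuple[Edge, ...]] = []

    def build(self, f: Callable[[Tuple[int, ...]], float]) -> Edge:
        """Return the dangling root edge (rho, root) of the canonical EV+MDD for f."""
        return self._build((), f)

    def _build(self, prefix: Tuple[int, ...], f) -> Edge:
        if len(prefix) == len(self.sizes):
            v = f(prefix)
            return (INF, None) if v == INF else (v, TERMINAL)
        edges = [self._build(prefix + (v,), f) for v in range(self.sizes[len(prefix)])]
        m = min(val for val, _ in edges)
        if m == INF:                            # f is INF on this whole subtree
            return (INF, None)
        # Canonization: subtract the minimum so edges are >= 0 and at least one is 0
        norm = tuple((INF, None) if val == INF else (val - m, c) for val, c in edges)
        key = (len(prefix), norm)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(norm)
        return (m, self.unique[key])

    def evaluate(self, root: Edge, state: Sequence[int]) -> float:
        """Add up the edge values along the path of state, starting from rho."""
        total, node = root
        for v in state:
            if total == INF:
                return INF
            val, node = self.nodes[node][v]
            total += val
        return total

g_table = [0, 2, 3, INF, INF, 4, 1, 0]          # the partial function above, order (i3, i2, i1)
mdd = EVPlusMDD((2, 2, 2))
rg = mdd.build(lambda s: g_table[4 * s[0] + 2 * s[1] + s[2]])
print(mdd.evaluate(rg, (1, 0, 0)), mdd.evaluate(rg, (1, 0, 1)))   # inf 4
```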

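Definition of EV+MDDs

(differences from the EVBDDs of [Lai et al. 1992]: multiway nodes, edge values in $\mathbb N \cup \{\infty\}$, and the new canonization rule)

Given $f : \hat{\mathcal S} \to \mathbb Z \cup \{\infty\}$, an EV+MDD encoding f is a DAG with labelled edges such that:

• There is a single terminal node $\langle 0|T\rangle$

• There is a single root $\langle K|r\rangle$ with a “dangling” incoming edge having value $\rho \in \mathbb Z \cup \{\infty\}$

• Non-terminal node $\langle k|p\rangle$ has $n_k$ edges; edge $\langle k|p\rangle[i_k]$ points to $\langle k|p\rangle[i_k].child$ and has value $\langle k|p\rangle[i_k].val \in \mathbb N \cup \{\infty\}$

• If $\langle k|p\rangle[i_k].val = \infty$, the value of $\langle k|p\rangle[i_k].child$ is irrelevant

• If $\langle k|p\rangle[i_k].val \in \mathbb N$, $\langle k|p\rangle[i_k].child$ is the index of a node at level $k - 1$

• Each non-terminal node has at least one outgoing edge labelled with 0 (unless all its edges are labelled ∞)

• All nodes are unique (taking into account both $\langle k|p\rangle[i_k].child$ and $\langle k|p\rangle[i_k].val$)

An analogous definition uses real values instead of integers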

Theorem: EV+MDDs are canonical

Using an EV+MDD to store the indexing function ψ

The “MDD with offsets” used to store and evaluate ψ can be formalized as an EV+MDD

[Figure: the MDD with offsets for $\mathcal S$ (left) is equivalent to an EV+MDD (right) with terminal node T, whose root edges for $i_4 = 0, 1, 2, 3$ have values 0, 1, 6, 11]

ψ(i) is the lexicographic index of $\mathbf i$ for $\mathbf i \in \mathcal S$, and $\psi(\mathbf i) = \infty \Leftrightarrow \mathbf i \notin \mathcal S$

EV∗MDDs by example

One way to think about EV∗MDDs is “EV+MDD =− log(EV∗MDD)”:

0 ⇔ 1

edge values ∈ [0,+∞] ⇔ edge values ∈ [0, 1]

root incoming edge ∈ (−∞,+∞] ⇔ root incoming edge ∈ [0,+∞)

values add along the path ⇔ values multiply along the path

i3 0 0 0 0 1 1 1 1

i2 0 0 1 1 0 0 1 1

i1 0 1 0 1 0 1 0 1

f 3.5 7 2.1 0 0 2.8 7 1.4

[Figure: the canonical EV∗MDD for f, with root incoming edge 7]

Canonicity: all edge values are in [0, 1] and at least one is 1

In canonical form, the root incoming edge has value $\max_{\mathbf i \in \hat{\mathcal S}} f(\mathbf i)$
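The multiplicative counterpart can be sketched the same way, assuming f ≥ 0 and enumerating $\hat{\mathcal S}$ (so only for tiny examples; names are hypothetical). Each node divides its children's incoming values by their maximum, which leaves edge values in [0, 1] with at least one equal to 1. A 0 edge keeps no child, and the root edge ends up carrying max f, which is 7 for the function in the table above.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

TERMINAL = -1                                   # stands for the terminal node T
Edge = Tuple[float, Optional[int]]              # (value, child id); child is None when value is 0

class EVStarMDD:
    def __init__(self, sizes: Sequence[int]):
        self.sizes = list(sizes)                # (n_K, ..., n_1)
        self.unique: Dict[Tuple[int, Tuple[Edge, ...]], int] = {}
        self.nodes: List[Tuple[Edge, ...]] = []

    def build(self, f: Callable[[Tuple[int, ...]], float]) -> Edge:
        """Return the dangling root edge (rho, root) of the canonical EV*MDD for f >= 0."""
        return self._build((), f)

    def _build(self, prefix: Tuple[int, ...], f) -> Edge:
        if len(prefix) == len(self.sizes):
            v = float(f(prefix))
            return (0.0, None) if v == 0 else (v, TERMINAL)
        edges = [self._build(prefix + (v,), f) for v in range(self.sizes[len(prefix)])]
        m = max(val for val, _ in edges)
        if m == 0:                              # f is 0 on this whole subtree
            return (0.0, None)
        # Canonization: divide by the maximum so edges lie in [0, 1] and one equals 1
        norm = tuple((0.0, None) if val == 0 else (val / m, c) for val, c in edges)
        key = (len(prefix), norm)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(norm)
        return (m, self.unique[key])

    def evaluate(self, root: Edge, state: Sequence[int]) -> float:
        """Multiply the edge values along the path of state, starting from rho."""
        total, node = root
        for v in state:
            if total == 0:
                return 0.0
            val, node = self.nodes[node][v]
            total *= val
        return total

f_table = [3.5, 7, 2.1, 0, 0, 2.8, 7, 1.4]      # the function above, order (i3, i2, i1)
mdd = EVStarMDD((2, 2, 2))
root = mdd.build(lambda s: f_table[4 * s[0] + 2 * s[1] + s[2]])
print(root[0])                                  # 7.0
```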

Encoding R with an EV∗MDD: initial non-canonical EV∗MDDs

We can store $\hat{\mathbf R}$ with a 2K-level EV∗MDD: consider the example of Kronecker encoding

$$\hat{\mathbf R} = \sum_{\alpha \in \{a,l,d,e\}} \hat{\mathbf R}_\alpha = \sum_{\alpha \in \{a,l,d,e\}} \bigotimes_{4 \ge k \ge 1} \mathbf R_{k,\alpha}$$

[Figure: four non-canonical EV∗MDDs encoding $\hat{\mathbf R}_a$, $\hat{\mathbf R}_l$, $\hat{\mathbf R}_d$, $\hat{\mathbf R}_e$, with the rates of the Kronecker matrices ($\mu^a_4$, $\mu^a_3$, $\mu^a_2$, $\mu^c_3$, $\mu^b_3$, $\mu^d_2$, $\mu^d_1$, $\mu^e_4$, $\mu^e_3$, $\mu^e_1$) as edge values]

Note the identity patterns at the levels where $\mathbf R_{k,\alpha} = \mathbf I$!

Encoding R with an EV∗MDD: canonical EV∗MDDs

(assume that $\mu^b_3$ is the largest rate in $\hat{\mathbf R}$)

[Figure: the canonical EV∗MDDs for $\hat{\mathbf R}_a$, $\hat{\mathbf R}_l$, $\hat{\mathbf R}_d$, $\hat{\mathbf R}_e$, with root incoming edges $\mu^a_4\mu^a_3\mu^a_2$, $\mu^b_3$, $\mu^d_1\mu^d_2$, and $\mu^e_1\mu^e_4\mu^e_3$, respectively; in $\hat{\mathbf R}_l$ the other nonzero edge value becomes $\mu^c_3/\mu^b_3$]

Encoding R with an EV∗MDD: final EV∗MDD

Use a recursive algorithm to compute $\hat{\mathbf R} = \sum_{\alpha \in \{a,l,d,e\}} \hat{\mathbf R}_\alpha$

[Figure: the final canonical EV∗MDD for $\hat{\mathbf R}$, with root incoming edge $\mu^b_3$ and edge values such as $\mu^a_4\mu^a_3\mu^a_2/\mu^b_3$, $\mu^d_1\mu^d_2/\mu^b_3$, and $\mu^c_3/\mu^b_3$]

Note: hidden identity patterns remain!

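Empirical comparison

Memory consumption in bytes for: $\mathcal S$ (MDD), $\mathbf R$ (Sparse), $\hat{\mathbf R}$ (Kronecker), $\hat{\mathbf R}$ and $\mathbf R$ (Pot/Act MXD), $\hat{\mathbf R}$ and $\mathbf R$ (Pot/Act MTMDD)

| Model | N | Potential states | Reachable states | MDD | Sparse | Kron | Pot MXD | Act MXD | Pot MTMDD | Act MTMDD |
|---|---|---|---|---|---|---|---|---|---|---|
| qn4 | 2 | 324 | 324 | 333 | 14256 | 772 | 586 | 722 | 22784 | 22784 |
| qn4 | 6 | 38416 | 38416 | 499 | 2524480 | 3092 | 2494 | 2870 | 36864 | 36864 |
| qn4 | 10 | 527076 | 527076 | 905 | 38524464 | 7076 | 5778 | 6522 | 62720 | 62720 |
| qn8 | 2 | 6561 | 324 | 681 | 14256 | 1204 | 738 | 1688 | 43776 | 49152 |
| qn8 | 6 | 5764801 | 38416 | 1119 | 2524480 | 2404 | 1674 | 5872 | 55040 | 70912 |
| qn8 | 10 | 214358881 | 527076 | 1953 | 38524464 | 3604 | 2610 | 12040 | 66304 | 98560 |
| mserv2 | 3 | 1485 | 495 | 705 | 23352 | 4124 | 3246 | 3952 | 34560 | 40704 |
| mserv2 | 6 | 6345 | 2115 | 3176 | 111408 | 17468 | 13998 | 16432 | 111104 | 135168 |
| mserv2 | 10 | 18495 | 6165 | 8846 | 342720 | 52228 | 42278 | 49032 | 306560 | 378460 |
| mserv4 | 3 | 14256 | 495 | 1174 | 23352 | 5568 | 4098 | 4916 | 68864 | 79616 |
| mserv4 | 6 | 106596 | 2115 | 8453 | 111408 | 22920 | 17502 | 20054 | 254360 | 298856 |
| mserv4 | 10 | 488268 | 6165 | 33739 | 342720 | 67560 | 52342 | 58934 | 873896 | 998552 |
| mserv6 | 3 | 32076 | 495 | 1333 | 23352 | 5724 | 4066 | 5316 | 86784 | 101376 |
| mserv6 | 6 | 239841 | 2115 | 8614 | 111408 | 23076 | 17470 | 20238 | 298596 | 347956 |
| mserv6 | 10 | 1098603 | 6165 | 33900 | 342720 | 67716 | 52310 | 59118 | 982396 | 1112684 |
| molloy4 | 5 | 4536 | 91 | 660 | 4204 | 1316 | 1148 | 2534 | 23552 | 28160 |
| molloy4 | 8 | 32805 | 285 | 1215 | 14676 | 2528 | 2300 | 5216 | 27648 | 38656 |
| molloy4 | 10 | 87846 | 506 | 1766 | 27104 | 3556 | 3288 | 7504 | 31232 | 47360 |
| molloy5 | 5 | 7776 | 91 | 846 | 4204 | 1100 | 792 | 4298 | 28416 | 37120 |
| molloy5 | 8 | 59049 | 285 | 1545 | 14676 | 1592 | 1188 | 9356 | 31232 | 50944 |
| molloy5 | 10 | 161051 | 506 | 2223 | 27104 | 1920 | 1452 | 13778 | 33280 | 61952 |
| kanban3 | 1 | 160 | 160 | 264 | 8032 | 500 | 412 | 544 | 18432 | 18432 |
| kanban3 | 3 | 58400 | 58400 | 937 | 5590400 | 7572 | 6786 | 8134 | 66816 | 67072 |
| kanban3 | 5 | 2546432 | 2546432 | 5646 | 303705920 | 45660 | 41816 | 48780 | 303776 | 303776 |
| kanban4 | 1 | 256 | 160 | 332 | 8032 | 420 | 354 | 602 | 23552 | 24576 |
| kanban4 | 3 | 160000 | 58400 | 628 | 5590400 | 2500 | 2216 | 3284 | 44032 | 50176 |
| kanban4 | 5 | 9834496 | 2546432 | 1532 | 303705920 | 7940 | 7118 | 9950 | 92928 | 110592 |
| kanban16 | 1 | 65536 | 160 | 1275 | 8032 | 2148 | 866 | 3000 | 95232 | 107520 |
| kanban16 | 3 | Overflow | 58400 | 1902 | 5590400 | 3236 | 1746 | 10566 | 115456 | 151808 |
| kanban16 | 5 | Overflow | 2546432 | 3149 | 303705920 | 4324 | 2626 | 24106 | 135168 | 216320 |
| fms5 | 1 | 2100 | 84 | 535 | 3228 | 1456 | 604 | 1808 | 36096 | 40960 |
| fms5 | 3 | 9432500 | 20600 | 3294 | 1554080 | 8304 | 5224 | 24320 | 151296 | 247040 |
| fms5 | 5 | 2016379008 | 852012 | 30490 | 82727748 | 34484 | 24664 | 138244 | 654892 | 1255108 |
| fms21 | 1 | 4194304 | 84 | 2050 | 3228 | 3132 | 1132 | 7396 | 126976 | 148224 |
| fms21 | 3 | Overflow | 20600 | 6777 | 1554080 | 5028 | 2328 | 68762 | 176896 | 437760 |
| fms21 | 5 | Overflow | 852012 | 22038 | 82727748 | 6924 | 3524 | 255988 | 235008 | 1393932 |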

Conclusions

MTMDDs work best when many nonzero entries in R have the same value

Extensive presence of state–dependent rates can make MTMDDs inefficient

Kronecker descriptors, MXDs, and EV∗MDDs instead remain efficient if the model is Kronecker-consistent

Size of Kronecker is not affected by variable ordering

MTMDDs, MXDs, EV∗MDDs, and MDDs are instead affected by variable ordering

Kronecker and MXDs are restricted to contiguous “(ik, jk)” or “(jk, ik)” variable ordering

MTMDDs and EV∗MDDs can instead have any ordering (but is this generality useful in practice?)

Kronecker is quite efficient but more restrictive

MXDs and EV∗MDDs are fully general, and their size is similar to that of the Kronecker encoding, when one exists

MTMDDs are also fully general, but their size tends to be larger than that of Kronecker, MXDs, and EV∗MDDs

MTMDDs, MXDs, and EV∗MDDs can encode $\mathbf R$ instead of $\hat{\mathbf R}$, but memory requirements increase

Kronecker must rely on an external (MDD) representation of $\mathcal S$ to zero unreachable rows

Fundamental advantage of Kronecker and MXDs over MTMDDs: they exploit the presence of numerous identity matrices in the description of $\hat{\mathbf R}$ (also true when encoding just $\mathcal N$, e.g., in symbolic model checking)

As presented, EV∗MDDs do not exploit identities, but we are working on that

Solution algorithms

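Potential Kronecker: the shuffle algorithm PSh

First algorithm to be proposed for Kronecker solution [Plateau SIGMETRICS 1985]

PSh computes $\hat{\mathbf y} \leftarrow \hat{\mathbf x} \cdot \bigotimes_{K \ge k \ge 1} \mathbf A_k$

PSh+ computes $\hat{\mathbf y} \leftarrow \hat{\mathbf x} \cdot (\mathbf I_{n_K \cdots n_{k+1}} \otimes \mathbf A_k \otimes \mathbf I_{n_{k-1} \cdots n_1})$

Based on the equality [Davio 1981]

$$\bigotimes_{K \ge k \ge 1} \mathbf A_k = \prod_{K \ge k \ge 1} \mathbf S^T_{(n_K \cdots n_k,\; n_{k-1} \cdots n_1)} \cdot (\mathbf I_{|\hat{\mathcal S}|/n_k} \otimes \mathbf A_k) \cdot \mathbf S_{(n_K \cdots n_k,\; n_{k-1} \cdots n_1)}$$

where $\mathbf S_{(a,b)} \in \{0,1\}^{a \cdot b \times a \cdot b}$ is an (a, b)-perfect shuffle permutation:

$$\mathbf S_{(a,b)}[i, j] = \begin{cases} 1 & \text{if } j = (i \bmod a) \cdot b + (i \,\mathrm{div}\, a) \\ 0 & \text{otherwise} \end{cases}$$

(for $\mathbf A \otimes \mathbf B$ with $n_2 = 2$, $n_1 = 3$, this gives $\mathbf S_{(2,3)}$ around $\mathbf I_3 \otimes \mathbf A$ and $\mathbf S_{(6,1)}$ around $\mathbf I_2 \otimes \mathbf B$, as in the example of shuffle computation)

Requires

• K vector permutations and

• K multiplications $\hat{\mathbf x} \cdot (\mathbf I_{|\hat{\mathcal S}|/n_k} \otimes \mathbf A_k)$.

Complexity of the k-th multiplication: $O(|\hat{\mathcal S}|/n_k \cdot \eta[\mathbf A_k])$.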

Potential Kronecker: the shuffle algorithm PSh

PSh(in: n_K, ..., n_1, A_K, ..., A_1; inout: x̂, ŷ)

    nleft ← 1;
    nright ← n_{K−1} ⋯ n_1;
    for k = K down to 1
        base ← 0;
        jump ← n_k · nright;
        if A_k ≠ I then
            for block = 0 to nleft − 1
                for offset = 0 to nright − 1
                    index ← base + offset;
                    for h = 0 to n_k − 1
                        z_h ← x̂_index;
                        index ← index + nright;
                    z′ ← z · A_k;
                    index ← base + offset;
                    for h = 0 to n_k − 1
                        ŷ_index ← z′_h;
                        index ← index + nright;
                base ← base + jump;
            x̂ ← ŷ;
        nleft ← nleft · n_k;
        nright ← nright / n_{k−1};          // let n_0 be 1
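A direct NumPy transcription of the PSh loop structure, with dense matrices for brevity (a real implementation would store each $\mathbf A_k$ sparsely). As in the pseudocode, no explicit permutation vectors are built: the $n_k$ entries that differ only in digit k sit `nright` positions apart, so a strided gather, a small vector-matrix product, and a scatter implement one factor $\mathbf I \otimes \mathbf A_k \otimes \mathbf I$. Identity factors are skipped.

```python
import numpy as np

def psh(mats, x):
    """Return x · (A_K ⊗ ... ⊗ A_1) for mats = [A_K, ..., A_1], following PSh."""
    sizes = [A.shape[0] for A in mats]
    x = np.asarray(x, dtype=float).copy()
    nleft, nright = 1, int(np.prod(sizes[1:]))      # n_{K-1} ... n_1 (1 if K = 1)
    for pos, A in enumerate(mats):                  # pos = 0 is level K
        nk = sizes[pos]
        if not np.array_equal(A, np.eye(nk)):       # identity factors are skipped
            y = np.empty_like(x)
            jump = nk * nright
            for block in range(nleft):
                base = block * jump
                for offset in range(nright):
                    # the n_k entries that differ only in digit k, stride nright apart
                    idx = base + offset + nright * np.arange(nk)
                    y[idx] = x[idx] @ A             # z' <- z · A_k
            x = y                                   # x_hat <- y_hat
        nleft *= nk
        nright //= sizes[pos + 1] if pos + 1 < len(sizes) else 1   # n_0 = 1
    return x

rng = np.random.default_rng(0)
A, B = rng.random((2, 2)), rng.random((3, 3))
x = rng.random(6)
print(np.allclose(psh([A, B], x), x @ np.kron(A, B)))   # True
```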

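Example of shuffle computation

$\mathbf y \leftarrow \mathbf x \cdot (\mathbf A \otimes \mathbf B)$, with A of size 2 × 2 and B of size 3 × 3: follow the entries marked with a diamond to obtain $y_2$

[Figure: the vectors x, u, v, w, y and the matrices $\mathbf S^T_{2,3}$, $\mathbf I_3 \otimes \mathbf A$, $\mathbf S_{2,3}$, $\mathbf I_2 \otimes \mathbf B$ used in the computation]

$$\mathbf y \leftarrow \underbrace{\underbrace{\underbrace{\mathbf x \cdot \mathbf S^T_{2,3}}_{\mathbf u} \cdot (\mathbf I_3 \otimes \mathbf A)}_{\mathbf v} \cdot \mathbf S_{2,3}}_{\mathbf w} \cdot \mathbf S^T_{6,1} \cdot (\mathbf I_2 \otimes \mathbf B) \cdot \mathbf S_{6,1}$$

Since $\mathbf S_{6,1}$ is the identity permutation, $\mathbf y = \mathbf w \cdot (\mathbf I_2 \otimes \mathbf B)$, hence

$$\begin{aligned}
y_2 &= B_{02}w_0 + B_{12}w_1 + B_{22}w_2 \\
&= B_{02}v_0 + B_{12}v_2 + B_{22}v_4 \\
&= B_{02}(u_0A_{00} + u_1A_{10}) + B_{12}(u_2A_{00} + u_3A_{10}) + B_{22}(u_4A_{00} + u_5A_{10}) \\
&= B_{02}(x_0A_{00} + x_3A_{10}) + B_{12}(x_1A_{00} + x_4A_{10}) + B_{22}(x_2A_{00} + x_5A_{10}) \\
&= A_{00}B_{02}x_0 + A_{00}B_{12}x_1 + A_{00}B_{22}x_2 + A_{10}B_{02}x_3 + A_{10}B_{12}x_4 + A_{10}B_{22}x_5
\end{aligned}$$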

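Complexity of PSh and PSh+

(assume each $\mathbf A_k$ has on average α nonzeros per row, i.e., $\eta[\mathbf A_k] = \alpha \cdot n_k$)

PSh has complexity $O\left(\sum_{K \ge k \ge 1} |\hat{\mathcal S}|/n_k \cdot \eta[\mathbf A_k]\right) = O(|\hat{\mathcal S}| \cdot K \cdot \alpha)$

Even when $\hat{\mathcal S} = \mathcal S$, PSh is faster than Ordinary explicit multiplication only if

$$|\mathcal S| \cdot K \cdot \alpha < |\mathcal S| \cdot \alpha^K \Leftrightarrow \alpha > K^{\frac{1}{K-1}}$$

PSh+ has complexity $O(|\hat{\mathcal S}|/n_k \cdot \eta[\mathbf A_k]) = O(|\hat{\mathcal S}| \cdot \alpha)$

Complexity of computing $\hat{\mathbf y} \leftarrow \hat{\mathbf y} + \hat{\mathbf x} \cdot \bigoplus_{K \ge k \ge 1} \mathbf A_k$:

$$O\left(\sum_{K \ge k \ge 1} |\hat{\mathcal S}|/n_k \cdot \eta[\mathbf A_k]\right) = O\left(|\hat{\mathcal S}| \sum_{K \ge k \ge 1} \frac{\eta[\mathbf A_k]}{n_k}\right) = O(|\hat{\mathcal S}| \cdot K \cdot \alpha)$$

Ordinary is faster than PSh if α ≤ 1; PSh+ saves space, but not time, w.r.t. Ordinary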

Potential Kronecker: PRw and PRw+

PRwEl(in: i, x, n_K, ..., n_1, A_K, ..., A_1; inout: ŷ)

    for each j_K s.t. A_K[i_K, j_K] > 0
        j′_K ← j_K;  a_K ← A_K[i_K, j_K];
        for each j_{K−1} s.t. A_{K−1}[i_{K−1}, j_{K−1}] > 0
            j′_{K−1} ← j′_K · n_{K−1} + j_{K−1};  a_{K−1} ← a_K · A_{K−1}[i_{K−1}, j_{K−1}];
            . . .
                for each j_1 s.t. A_1[i_1, j_1] > 0
                    j′_1 ← j′_2 · n_1 + j_1;  a_1 ← a_2 · A_1[i_1, j_1];
                    ŷ_{j′_1} ← ŷ_{j′_1} + x · a_1;

PRw(in: x̂, n_K, ..., n_1, A_K, ..., A_1; inout: ŷ)

    for i = 0 to |Ŝ| − 1
        PRwEl(i, x̂_i, n_K, ..., n_1, A_K, ..., A_1, ŷ);

PRwEl+(in: n_k, n_{k−1}⋯n_1, i⁻_k, i_k, i⁺_k, x, A_k; inout: ŷ)

    for each j_k s.t. A_k[i_k, j_k] > 0
        j′ ← (i⁻_k · n_k + j_k) · n_{k−1}⋯n_1 + i⁺_k;
        ŷ_{j′} ← ŷ_{j′} + x · A_k[i_k, j_k];

PRw+(in: x̂, n_K⋯n_{k+1}, n_k, n_{k−1}⋯n_1, A_k; inout: ŷ)

    for i ≡ (i⁻_k, i_k, i⁺_k) = 0 to n_K⋯n_{k+1} · n_k · n_{k−1}⋯n_1 − 1
        PRwEl+(n_k, n_{k−1}⋯n_1, i⁻_k, i_k, i⁺_k, x̂_i, A_k, ŷ);
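A compact Python version of PRw and PRwEl, assuming each $\mathbf A_k$ is first converted to a row-wise sparse format (a list of dictionaries, a hypothetical choice). The recursion mirrors the nested loops: at each level it extends the column prefix $j'$ and the partial product $a$. One small difference from the pseudocode: the partial product starts from 1.0, which costs one extra multiplication per top-level entry.

```python
import numpy as np
from typing import Dict, List, Sequence

def row_wise(A: np.ndarray) -> List[Dict[int, float]]:
    """Sparse row-wise format: for each row, a dict mapping column index -> nonzero value."""
    return [{int(j): float(A[i, j]) for j in np.flatnonzero(A[i])} for i in range(A.shape[0])]

def prw_el(i_digits: Sequence[int], x: float, rows, sizes: Sequence[int], y: np.ndarray) -> None:
    """PRwEl: add x * A[i, :] to y, for A = A_K ⊗ ... ⊗ A_1 and i = (i_K, ..., i_1)."""
    def visit(level: int, j_prefix: int, a: float) -> None:
        if level == len(sizes):
            y[j_prefix] += x * a                     # y_{j'_1} <- y_{j'_1} + x * a_1
            return
        for j, v in rows[level][i_digits[level]].items():
            # j'_k <- j'_{k+1} * n_k + j_k ;  a_k <- a_{k+1} * A_k[i_k, j_k]
            visit(level + 1, j_prefix * sizes[level] + j, a * v)
    visit(0, 0, 1.0)

def prw(x: np.ndarray, mats: Sequence[np.ndarray]) -> np.ndarray:
    """PRw: y = x · (A_K ⊗ ... ⊗ A_1), one PRwEl call per entry of x."""
    sizes = [A.shape[0] for A in mats]
    rows = [row_wise(A) for A in mats]
    y = np.zeros(len(x))
    for i in range(len(x)):
        # np.unravel_index decodes i in mixed base, most significant digit (i_K) first
        prw_el(np.unravel_index(i, sizes), x[i], rows, sizes, y)
    return y

rng = np.random.default_rng(3)
A3, A2, A1 = rng.random((2, 2)), rng.random((3, 3)), rng.random((2, 2))
A3[0, 1] = 0.0                                       # some zeros, to exercise the sparsity
A2[1, :] = 0.0
x = rng.random(12)
print(np.allclose(prw(x, [A3, A2, A1]), x @ np.kron(np.kron(A3, A2), A1)))   # True
```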

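Complexity of PRw and PRw+

PRw computes $\hat{\mathbf y} \leftarrow \hat{\mathbf y} + \hat{\mathbf x} \cdot \mathbf A$, according to the definition of Kronecker product

Requires sparse row-wise format for each $\mathbf A_k$

PRwEl computes the contribution of $\hat x_{\mathbf i}$ to each entry of $\hat{\mathbf y}$ as $\hat{\mathbf y} \leftarrow \hat{\mathbf y} + \hat x_{\mathbf i} \cdot \mathbf A_{\mathbf i, \hat{\mathcal S}}$

PRwEl reaches statement $a_k \leftarrow a_{k+1} \cdot \mathbf A_k[i_k, j_k]$ $O(\alpha^{K-k+1})$ times

PRw makes $|\hat{\mathcal S}|$ calls to PRwEl, hence has complexity

$$O\left(|\hat{\mathcal S}| \cdot \sum_{K \ge k \ge 1} \alpha^k\right) = \begin{cases} O(|\hat{\mathcal S}| \cdot K) = O(K \cdot \eta[\mathbf A]) & \text{if } \alpha \le 1 \\ O(|\hat{\mathcal S}| \cdot \alpha^K) = O(\eta[\mathbf A]) & \text{if } \alpha > 1 \end{cases}$$

PRw+ has complexity $O\left(|\hat{\mathcal S}| \cdot \frac{\eta[\mathbf A_k]}{n_k}\right) = O(|\hat{\mathcal S}| \cdot \alpha)$

Complexity of computing $\hat{\mathbf y} \leftarrow \hat{\mathbf y} + \hat{\mathbf x} \cdot \bigoplus_{K \ge k \ge 1} \mathbf A_k$ using PRw+: $O(|\hat{\mathcal S}| \cdot K \cdot \alpha)$

PRw amortizes the multiplications for $a_{K-1}, \dots, a_2$ only if $\alpha \gg 1$; PRw+ saves space, but not time, w.r.t. Ordinary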

Potential Kronecker: PRwCl and PRwCl+

PRwCl(in: x̂, n_K, ..., n_1, A_K, ..., A_1; inout: ŷ)

1. for i_K = 0 to n_K − 1
2.   for each j_K s.t. A_K[i_K, j_K] > 0
3.     j'_K ← j_K; a_K ← A_K[i_K, j_K];
4.     for i_{K−1} = 0 to n_{K−1} − 1
5.       for each j_{K−1} s.t. A_{K−1}[i_{K−1}, j_{K−1}] > 0
6.         j'_{K−1} ← j'_K · n_{K−1} + j_{K−1}; a_{K−1} ← a_K · A_{K−1}[i_{K−1}, j_{K−1}];
           . . .
7.           for i_1 = 0 to n_1 − 1
8.             for each j_1 s.t. A_1[i_1, j_1] > 0
9.               j'_1 ← j'_2 · n_1 + j_1; a_1 ← a_2 · A_1[i_1, j_1];
10.              ŷ_{j'_1} ← ŷ_{j'_1} + x̂_i · a_1;    (where i ≡ (i_K, ..., i_1))

The overall complexity is $O(|\mathcal{S}| \cdot \alpha^K)$

PRwCl+(in: x̂, n_K···n_{k+1}, n_k, n_{k−1}···n_1, A_k; inout: ŷ)

1. for i⁻_k = 0 to n_K···n_{k+1} − 1
2.   for i_k = 0 to n_k − 1
3.     for each j_k s.t. A_k[i_k, j_k] > 0
4.       j'_k ← i⁻_k · n_k + j_k;
5.       for i⁺_k = 0 to n_{k−1}···n_1 − 1
6.         j'_K ← j'_k · n_{k−1}···n_1 + i⁺_k;
7.         ŷ_{j'_K} ← ŷ_{j'_K} + x̂_{(i⁻_k, i_k, i⁺_k)} · A_k[i_k, j_k];
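PRwCl+ computes the same product as PRw+ but reorders the loops by blocks. Each nonzero A_k[i_k, j_k] is read once per i⁻_k and then applied to n_{k−1}···n_1 consecutive entries. Below is a minimal Python sketch with the same storage and indexing assumptions and hypothetical names (n_left = n_K···n_{k+1}, n_right = n_{k−1}···n_1). The cross-check builds I ⊗ A_k ⊗ I entry by entry from its definition.

```python
from typing import List, Tuple

SparseMatrix = List[List[Tuple[int, float]]]  # row r -> [(column, value), ...]


def prwcl_plus(x_hat: List[float], n_left: int, n_k: int, n_right: int,
               A_k: SparseMatrix, y_hat: List[float]) -> None:
    """PRwCl+: y_hat <- y_hat + x_hat * (I_{n_left} (x) A_k (x) I_{n_right}), by blocks."""
    for i_left in range(n_left):
        for i_k in range(n_k):
            for j_k, v in A_k[i_k]:
                j_k_prime = i_left * n_k + j_k              # statement 4
                i_k_prime = i_left * n_k + i_k              # row prefix (i_left, i_k)
                for i_right in range(n_right):              # n_right consecutive entries
                    j_full = j_k_prime * n_right + i_right  # statement 6
                    y_hat[j_full] += x_hat[i_k_prime * n_right + i_right] * v


def reference(x_hat, n_left, n_k, n_right, A_dense):
    """Dense check: (I (x) A_k (x) I)[i, j] = A_k[i_k, j_k] iff the outer digits agree."""
    n = n_left * n_k * n_right
    y = [0.0] * n
    for i in range(n):
        il, ik, ir = i // (n_k * n_right), (i // n_right) % n_k, i % n_right
        for j in range(n):
            jl, jk, jr = j // (n_k * n_right), (j // n_right) % n_k, j % n_right
            if il == jl and ir == jr:
                y[j] += x_hat[i] * A_dense[ik][jk]
    return y


if __name__ == "__main__":
    A_dense = [[0.0, 2.0], [3.0, 0.5]]
    A_sparse = [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in A_dense]
    x = [float(i + 1) for i in range(12)]                   # n_left=2, n_k=2, n_right=3
    y = [0.0] * 12
    prwcl_plus(x, 2, 2, 3, A_sparse, y)
    print(y == reference(x, 2, 2, 3, A_dense))             # True
```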

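Potential Kronecker: PRwCl

$x \cdot A = x \cdot \left( \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix} \otimes \begin{bmatrix} b_0 & b_1 \\ b_2 & b_3 \end{bmatrix} \otimes \begin{bmatrix} c_0 & c_1 \\ c_2 & c_3 \end{bmatrix} \right)$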

[Figure: panels "PRw" and "PRwCl" show the 8 × 8 matrix A as nested a/b/c boxes, with its 64 entries numbered 1 to 64 in the order in which each algorithm generates them]

Each “b” and “c” box corresponds to one multiplication; A has 8 × 8 = 64 entries of the form a_i b_j c_l

• Computing each entry from scratch: 64 × 2 = 128 multiplications

• Using PRw: 64 + 32 = 96 multiplications

• Using PRwCl: 64 + 16 = 80 multiplications: interleaving helps!

same complexity as Ordinary regardless of the sparsity level,

but the entries of A are not generated in column order
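The multiplication counts above can be reproduced by instrumenting the two loop orders. The following Python sketch uses hypothetical names. Full 2 × 2 factors with arbitrary prime entries stand in for a_i, b_j and c_l. The sketch counts only the multiplications that form partial products: a_K ← A_K[i_K, j_K] is treated as an assignment, and the final multiplication by x is excluded, as in the count above.

```python
from math import prod
from typing import Dict, List, Tuple

SparseMatrix = List[List[Tuple[int, int]]]  # row r -> [(column, value), ...]


def to_sparse(dense: List[List[int]]) -> SparseMatrix:
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]


def digits_of(i: int, sizes: List[int]) -> List[int]:
    digits = []
    for n in reversed(sizes):
        digits.append(i % n)
        i //= n
    return digits[::-1]


def prw_entries(sizes, factors, stats: Dict[str, int]):
    """Generate (i, j, a_1) row by row, as PRw does (one PRwEl per row i)."""
    out = []
    for i in range(prod(sizes)):
        idx = digits_of(i, sizes)

        def descend(level, j_prime, a):
            if level == len(factors):
                out.append((i, j_prime, a))
                return
            for j, v in factors[level][idx[level]]:
                if level > 0:
                    stats["mults"] += 1          # a_k <- a_{k+1} * A_k[i_k, j_k]
                descend(level + 1, j_prime * sizes[level] + j, v if level == 0 else a * v)

        descend(0, 0, None)
    return out


def prwcl_entries(sizes, factors, stats: Dict[str, int]):
    """Generate (i, j, a_1) in PRwCl order: loop over i_k and j_k at every level."""
    out = []

    def descend(level, i_prime, j_prime, a):
        if level == len(factors):
            out.append((i_prime, j_prime, a))
            return
        for i_k in range(sizes[level]):
            for j_k, v in factors[level][i_k]:
                if level > 0:
                    stats["mults"] += 1          # partial product reused for all deeper levels
                descend(level + 1, i_prime * sizes[level] + i_k,
                        j_prime * sizes[level] + j_k, v if level == 0 else a * v)

    descend(0, 0, 0, None)
    return out


if __name__ == "__main__":
    a = [[2, 3], [5, 7]]          # stand-ins for a0..a3
    b = [[11, 13], [17, 19]]      # stand-ins for b0..b3
    c = [[23, 29], [31, 37]]      # stand-ins for c0..c3
    F = [to_sparse(a), to_sparse(b), to_sparse(c)]
    s_row, s_cl = {"mults": 0}, {"mults": 0}
    e_row = prw_entries([2, 2, 2], F, s_row)
    e_cl = prwcl_entries([2, 2, 2], F, s_cl)
    print(len(e_row), 2 * len(e_row), s_row["mults"], s_cl["mults"])  # 64 128 96 80
    print(sorted(e_row) == sorted(e_cl))                               # True
```

Both orders produce the same 64 entries. PRwCl saves work because each product a·b is formed once and reused for every row index i_1, instead of once per row i.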

Actual Kronecker: ARw

ARw(in: x, A_K, ..., A_1, S; inout: y)

1. for each i ∈ S
2.   I ← ψ(i);
3.   for each j_K s.t. A_K[i_K, j_K] > 0
4.     a_K ← A_K[i_K, j_K];
5.     for each j_{K−1} s.t. A_{K−1}[i_{K−1}, j_{K−1}] > 0
6.       a_{K−1} ← a_K · A_{K−1}[i_{K−1}, j_{K−1}];
         . . .
7.         for each j_1 s.t. A_1[i_1, j_1] > 0
8.           a_1 ← a_2 · A_1[i_1, j_1];
9.           J ← ψ(j);
10.          y_J ← y_J + x_I · a_1;

Statement 9 computes the index J = ψ(j) of state j in the array y; each such lookup costs O(log |S|), which accounts for the log |S| factor below.

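$O\left(|\mathcal{S}| \cdot \left(\sum_{K\ge k\ge 1} \alpha^k + \alpha^K \cdot \log|\mathcal{S}|\right)\right) = \begin{cases} O(|\mathcal{S}| \cdot (K + \log|\mathcal{S}|)) & \text{if } \alpha \le 1 \\ O(|\mathcal{S}| \cdot \alpha^K \cdot \log|\mathcal{S}|) & \text{if } \alpha > 1 \end{cases}$

If K < log |S|, ARw has a log |S| overhead w.r.t. Ordinary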
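Below is a minimal Python sketch of ARw. It stores S as a sorted list of potential (mixed-radix) indices and implements ψ as a binary search with the standard `bisect` module. This is one way to get the O(log |S|) lookup cost assumed above. The slides instead index S with decision diagrams (EV+MDDs). The names digits_of, psi and arw are hypothetical. The sketch also assumes that S is closed under the nonzeros of A, as a reachable state space is, so ψ(j) never fails.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

SparseMatrix = List[List[Tuple[int, float]]]  # row r -> [(column, value), ...]


def to_sparse(dense: List[List[float]]) -> SparseMatrix:
    return [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in dense]


def digits_of(i: int, sizes: List[int]) -> List[int]:
    """Mixed-radix digits [i_K, ..., i_1] of potential index i, sizes = [n_K, ..., n_1]."""
    digits = []
    for n in reversed(sizes):
        digits.append(i % n)
        i //= n
    return digits[::-1]


def psi(states: List[int], j: int) -> Optional[int]:
    """Position of potential index j in the sorted list of reachable states, O(log |S|)."""
    pos = bisect_left(states, j)
    return pos if pos < len(states) and states[pos] == j else None


def arw(x: List[float], sizes: List[int], factors: List[SparseMatrix],
        states: List[int], y: List[float]) -> None:
    """ARw: y <- y + x * A[S, S], with x and y indexed by actual (reachable) state."""
    for I, i in enumerate(states):               # I == psi(states, i) since states is sorted
        idx = digits_of(i, sizes)

        def descend(level: int, j_prime: int, a: float) -> None:
            if level == len(factors):
                J = psi(states, j_prime)           # statement 9
                if J is None:                      # only possible if S is not closed under A
                    raise KeyError(f"state {j_prime} is not in S")
                y[J] += x[I] * a
                return
            for j, v in factors[level][idx[level]]:
                descend(level + 1, j_prime * sizes[level] + j, v if level == 0 else a * v)

        descend(0, 0, 1.0)


if __name__ == "__main__":
    A2 = [[0.0, 1.0], [0.0, 1.0]]
    A1 = [[0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [0.0, 4.0, 0.0]]
    S = [0, 4, 5]          # potential indices of (0,0), (1,1), (1,2) with sizes [2, 3]
    y = [0.0, 0.0, 0.0]
    arw([1.0, 2.0, 3.0], [2, 3], [to_sparse(A2), to_sparse(A1)], S, y)
    print(y)               # [0.0, 14.0, 6.0]
```

Rows of A for states outside S, such as (0, 1), are never visited. This is how ARw restricts the product to A_{S,S}.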

Actual Kronecker: ARwCl and ARwCl+

ARwCl(in: x, A_K, ..., A_1, S; inout: y)

1. for each i_K ∈ S_K    (all local states i_K)
2.   I_K ← ψ_K(i_K);
3.   for each j_K s.t. A_K[i_K, j_K] > 0
4.     J_K ← ψ_K(j_K);
5.     if J_K ≠ null then
6.       a_K ← A_K[i_K, j_K];
7.       for each i_{K−1} ∈ S_{K−1}(i_K)    (all i_{K−1} compatible with i_K)
8.         I_{K−1} ← ψ_{K−1}(I_K, i_{K−1});
9.         for each j_{K−1} s.t. A_{K−1}[i_{K−1}, j_{K−1}] > 0
10.          J_{K−1} ← ψ_{K−1}(J_K, j_{K−1});
11.          if J_{K−1} ≠ null then
12.            a_{K−1} ← a_K · A_{K−1}[i_{K−1}, j_{K−1}];
               . . .
13.              for each i_1 ∈ S_1(i_K, ..., i_2)    (all i_1 compatible with i_K, ..., i_2)
14.                I_1 ← ψ_1(I_2, i_1);
15.                for each j_1 s.t. A_1[i_1, j_1] > 0
16.                  J_1 ← ψ_1(J_2, j_1);
17.                  if J_1 ≠ null then
18.                    a_1 ← a_2 · A_1[i_1, j_1];
19.                    y_{J_1} ← y_{J_1} + x_{I_1} · a_1;

note the need for EV+MDDs to index the state space
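Below is a minimal Python sketch of ARwCl. It does not implement EV+MDDs. Instead, it uses a hypothetical dictionary-based prefix index as a stand-in: ψ_k maps (parent prefix id, local state) to a prefix id, and at the last level that id is the state's position in x and y. S_k(...) is the list of local states recorded under a prefix, and a missing key plays the role of null. The sketch assumes S is given as lexicographically sorted tuples (i_K, ..., i_1). All names are illustrative.

```python
from typing import Dict, List, Tuple

SparseMatrix = List[List[Tuple[int, float]]]  # row r -> [(column, value), ...]


def to_sparse(dense: List[List[float]]) -> SparseMatrix:
    return [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in dense]


def build_index(states: List[Tuple[int, ...]]):
    """Hypothetical stand-in for EV+MDD indexing of sorted states (i_K, ..., i_1)."""
    K = len(states[0])
    psi: List[Dict[Tuple[int, int], int]] = [{} for _ in range(K)]
    children: List[Dict[int, List[int]]] = [{} for _ in range(K)]
    for s in states:
        parent = 0                                  # id of the empty prefix
        for p in range(K):
            key = (parent, s[p])
            if key not in psi[p]:
                psi[p][key] = len(psi[p])           # ids follow the sorted order of S
                children[p].setdefault(parent, []).append(s[p])
            parent = psi[p][key]
    return psi, children


def arwcl(x: List[float], factors: List[SparseMatrix], index, y: List[float]) -> None:
    """ARwCl: y <- y + x * A[S, S], visiting S level by level (factors = [A_K, ..., A_1])."""
    psi, children = index
    last = len(factors) - 1

    def descend(p: int, I_parent: int, J_parent: int, a) -> None:
        for i_k in children[p].get(I_parent, []):   # local states compatible with the prefix
            I_k = psi[p][(I_parent, i_k)]
            for j_k, v in factors[p][i_k]:
                J_k = psi[p].get((J_parent, j_k))    # None plays the role of null
                if J_k is None:
                    continue
                a_k = v if p == 0 else a * v
                if p == last:
                    y[J_k] += x[I_k] * a_k
                else:
                    descend(p + 1, I_k, J_k, a_k)

    descend(0, 0, 0, None)


if __name__ == "__main__":
    A2 = [[0.0, 1.0], [0.0, 1.0]]
    A1 = [[0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [0.0, 4.0, 0.0]]
    F = [to_sparse(A2), to_sparse(A1)]
    y = [0.0, 0.0, 0.0]
    arwcl([1.0, 2.0, 3.0], F, build_index([(0, 0), (1, 1), (1, 2)]), y)
    print(y)    # [0.0, 14.0, 6.0]
    y2 = [0.0, 0.0]   # without (1, 2), the (1, 1) -> (1, 2) entry hits null and is skipped
    arwcl([1.0, 2.0], F, build_index([(0, 0), (1, 1)]), y2)
    print(y2)   # [0.0, 2.0]
```

Dictionary lookups replace the O(log n_k) searches assumed in the complexity analysis. The loop structure, and the pruning of null prefixes, follow the pseudocode.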

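Actual Kronecker: ARwCl

Complexity of ARwCl: $O\left(\sum_{K\ge k\ge 1} |\mathcal{S}_1| \cdots |\mathcal{S}_k| \cdot \alpha^k \cdot \log n_k\right) = O(|\mathcal{S}| \cdot \alpha^K \cdot \log n_K)$

(assuming that $|\mathcal{S}_1| \cdots |\mathcal{S}_{K-1}| \ll |\mathcal{S}|$)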

Complexity of ARwCl+: $O(|\mathcal{S}| \cdot \alpha \cdot \log n_K)$ regardless of k, and the resulting complexity of computing

$y \leftarrow y + x \cdot \left(\bigoplus_{K\ge k\ge 1} A_k\right)_{\mathcal{S},\mathcal{S}}$

using ARwCl+ is $O(K \cdot |\mathcal{S}| \cdot \alpha \cdot \log n_K)$

only a log n_K overhead w.r.t. Ordinary for any sparsity level

but it cannot be used in a Gauss–Seidel iteration

Results for the numerical solution

Matrix diagrams achieve the same efficiency as multiplication by blocks (ARwCl). . .

. . . but can also provide access by columns, as required by Gauss–Seidel

Time requirements for the Kanban model (450 MHz Pentium workstation with 384 MB RAM)

                            MXDs             Kronecker                         Explicit
                            Gauss–Seidel     Gauss–Seidel     Jacobi           Gauss–Seidel
N  |S|         arcs in R    Iters  sec/iter  Iters  sec/iter  Iters  sec/iter  Iters  sec/iter
2  4,600       28,120       40     0.11      55     0.17      134    0.09      55     0.02
3  58,400      446,400      67     1.46      97     2.56      240    1.34      97     0.34
4  454,475     3,979,850    99     12.33     149    23.69     370    11.99     149    3.04
5  2,546,432   24,460,016   139    73.09     214    147.70    527    74.09     214    18.51
6  11,261,376  115,708,992  185    336.21    289    723.30    713    359.15    —      —
7  41,644,800  450,455,040  238    1,289.91  374    2,922.80  —      —         —      —
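The table reports iterations and seconds per iteration separately. Multiplying the two gives a rough total solution time for each method. The Python sketch below does only that arithmetic on the table's values. The totals are estimates derived here, not timings reported in the table, and the method labels are hypothetical shorthands for the column groups.

```python
from typing import Dict, Optional, Tuple

# (iterations, seconds per iteration) from the Kanban table; None = no entry in the table
Entry = Optional[Tuple[int, float]]
TABLE: Dict[int, Dict[str, Entry]] = {
    2: {"MXD GS": (40, 0.11), "Kron GS": (55, 0.17), "Kron Jacobi": (134, 0.09), "Explicit GS": (55, 0.02)},
    3: {"MXD GS": (67, 1.46), "Kron GS": (97, 2.56), "Kron Jacobi": (240, 1.34), "Explicit GS": (97, 0.34)},
    4: {"MXD GS": (99, 12.33), "Kron GS": (149, 23.69), "Kron Jacobi": (370, 11.99), "Explicit GS": (149, 3.04)},
    5: {"MXD GS": (139, 73.09), "Kron GS": (214, 147.70), "Kron Jacobi": (527, 74.09), "Explicit GS": (214, 18.51)},
    6: {"MXD GS": (185, 336.21), "Kron GS": (289, 723.30), "Kron Jacobi": (713, 359.15), "Explicit GS": None},
    7: {"MXD GS": (238, 1289.91), "Kron GS": (374, 2922.80), "Kron Jacobi": None, "Explicit GS": None},
}


def total_seconds(entry: Entry) -> Optional[float]:
    """Estimated solution time = iterations * average seconds per iteration."""
    if entry is None:
        return None
    iters, sec_per_iter = entry
    return iters * sec_per_iter


if __name__ == "__main__":
    for n, row in TABLE.items():
        parts = []
        for method, entry in row.items():
            t = total_seconds(entry)
            parts.append(f"{method}: {t:,.0f} s" if t is not None else f"{method}: n/a")
        print(f"N = {n}: " + ", ".join(parts))
```

For N = 5, for example, this gives about 10,160 s for MXD-based Gauss–Seidel and about 31,608 s for Kronecker-based Gauss–Seidel. MXDs combine Gauss–Seidel's lower iteration count with a per-iteration cost close to that of the Kronecker Jacobi method.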
