
Institute for Applied Information Processing and Communications (IAIK)


TU Graz/Computer Science/IAIK Graz, 2009 AK Design and Verification

Presentation for the Lecture:

AK Design and Verification

by

Robert Könighofer

[email protected]

A Strategy Improvement Algorithm for Mean Payoff Games

http://www.iaik.tugraz.at



Contents

Main Source: H. Björklund, S. Sandberg, S. Vorobyov: A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. [1]

Recap: Mean Payoff Games

0-Mean Partition Problem: Longest Shortest Path Problem, Algorithm, Improvements

Ergodic Partition Problem

Complexity

Appendix: proof sketches of main theorems





Recap: Mean Payoff Games (MPG)

Given: Finite, directed, edge weighted, leafless graph:

G = (V, E, w), V = VMAX ∪ VMIN, w: E → {-W, …, 0, …, W}

Example:

[Figure: game graph with edge weights -1, 2, -8, 4 and 17; the legend distinguishes VMAX and VMIN vertices]





Recap: Mean Payoff Games (MPG)

Notation:

2 players: MIN and MAX

Play ρ = e0e1e2e3 …

payoff(ρ) = val(ρ) = average(w(ei)):

$\mathrm{val}(\rho) = \lim_{k \to \infty} \frac{1}{k} \sum_{i=0}^{k} w(e_i)$

Positional strategy for MAX: σMAX: VMAX → V so that (v, σ(v)) ∈ E

Goals: MAX: maximize val(ρ); MIN: minimize val(ρ)





MPG: Properties

Optimal strategies are positional

val(v) = val(ρ) when ρ starts in v and both players play optimally

Optimal σMAX: ensures payoff(ρ) ≥ val(v)

Optimal σMIN: ensures payoff(ρ) ≤ val(v)

Play ρ = finite stem + loop (for positional strategies), so val(ρ) = average(loop)
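The stem + loop property can be checked numerically. The following is a minimal sketch, assuming a play is given as two lists of edge weights; the function names and the sample weights are illustrative and not taken from the slides.

```python
from fractions import Fraction


def lasso_value(stem_weights, loop_weights):
    """Mean payoff of the play 'stem, then loop forever'.

    The finite stem has no influence on the limit, so only the
    loop weights are averaged.
    """
    if not loop_weights:
        raise ValueError("the loop must contain at least one edge")
    return Fraction(sum(loop_weights), len(loop_weights))


def prefix_average(stem_weights, loop_weights, k):
    """Average weight of the first k edges of the same play."""
    total = 0
    for i in range(k):
        if i < len(stem_weights):
            total += stem_weights[i]
        else:
            j = (i - len(stem_weights)) % len(loop_weights)
            total += loop_weights[j]
    return Fraction(total, k)
```

For growing k, prefix_average approaches lasso_value, which is why the stem can be ignored when evaluating a play.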





Computational Problems

Decision Problem: Can MAX guarantee payoff > p from v0 ?

p-Mean Partition: Divide V into V≤p and V>p

MAX can guarantee payoff >p from all v∈V>p

MIN can guarantee payoff ≤p from all v∈V≤p

Ergodic Partition: Compute val(v) for all v∈V





0-Mean Partition: Approach

MPG → LSP (Longest Shortest Path Problem)

Solve LSP by Strategy Improvement:

σ = σ0

while(σ changes):

σ = Improve(σ)





Longest Shortest Path Problem

Given: Finite, directed, edge weighted graph:

G = (V, t, E, w), V = VMAX ∪ VMIN

t = unique sink, t ∉ VMAX

here: initial strategy σ0, avoiding negative cycles

Find: positional σ such that the shortest path from every v to t is as long as possible in Gσ = G ∩ σ





Transformation MPG → LSP

Insert ‘retreat vertex’ t

For all vi ∈ VMAX: add edge ei = (vi,t), w(ei) = 0

Add edge (t,t) with w(t,t) = 0

Example:

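[Figure: the example game graph (edge weights -1, 2, -8, 4, 17) extended by the retreat vertex t and three zero-weight edges added by the transformation]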
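The transformation can be sketched in a few lines. This is a minimal sketch under assumptions: the graph is represented as a dictionary mapping edges (u, v) to weights, and the name mpg_to_lsp and the default sink name "t" are illustrative choices, not part of the slides.

```python
def mpg_to_lsp(edges, v_max, sink="t"):
    """Return the LSP edge map for an MPG.

    edges: dict {(u, v): weight} of the mean payoff game
    v_max: set of vertices owned by MAX
    The input dictionary is left unchanged.
    """
    vertices = {u for u, _ in edges} | {v for _, v in edges}
    if sink in vertices:
        raise ValueError("sink name is already used by a game vertex")
    lsp_edges = dict(edges)
    for v in v_max:
        lsp_edges[(v, sink)] = 0   # zero-weight retreat edge for MAX
    lsp_edges[(sink, sink)] = 0    # zero-weight loop at the sink
    return lsp_edges
```

MIN vertices get no retreat edge, so MIN can reach t only through a MAX vertex that chooses to retreat.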





Relation MPG ↔ LSP

| MPG | LSP |
| v ∈ V>0 | dist(v,t) = ∞ |
| MAX: enforce pos. loop | MAX: enforce pos. loop |
| MIN: enforce neg. loop | MAX: retreat, dist(v,t) < ∞ |

[Figures: the example game and its LSP version, once with edge weights -1, 2, -8, 4, 17 and once with the weight 4 replaced by -2]





Relation MPG ↔ LSP

Admissible strategy: enforces positive loops OR retreat

we iterate over admissible strategies only

σ0: go to t from every v∈VMAX





Remember our approach:

MPG → LSP (Longest Shortest Path Problem)

Solve LSP by Strategy Improvement:

σ = σ0

while(σ changes):

σ = Improve(σ)





Quality of a Strategy

Only admissible strategies

MIN: take shortest path to t

(any other loop is positive)

valσ(v): shortest distance from v to t in Gσ

σ is better than σ* (σ > σ*) iff: ∀v∈V: valσ(v) ≥ valσ*(v) AND

∃v∈V: valσ(v) > valσ*(v)





Computing valσ(v): Shortest Path Problem

Given: Finite, directed, edge weighted graph:

G = (V, t, E, w) t = unique sink

Find: shortest path from every v to t

Algorithms:

Dijkstra's algorithm: only non-negative weights

Bellman Ford algorithm: also negative weights





Bellman Ford Algorithm [3]

    foreach v in V:
        distance[v] = ∞
        succ[v] = None
    distance[t] = 0
    succ[t] = t
    do |V|-1 times:
        foreach (u,v) in E:
            if(distance[v] + w(u,v) < distance[u]):
                distance[u] = distance[v] + w(u,v)
                succ[u] = v

[Figure: relaxation of edge (u,v) with w(u,v) = 2 and distance[v] = 2: distance[u] drops from 5 to 4]
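The pseudocode above translates almost directly into runnable code. The following is a minimal sketch, assuming the graph is given as a vertex collection and a dictionary {(u, v): weight}; the function name and the early exit when nothing changes are additions for illustration.

```python
import math


def bellman_ford_to_sink(vertices, edges, sink):
    """Shortest distances from every vertex to the sink.

    Assumes there is no negative loop from which the sink is reachable.
    Returns (distance, succ); vertices that cannot reach the sink keep
    distance math.inf and successor None.
    """
    distance = {v: math.inf for v in vertices}
    succ = {v: None for v in vertices}
    distance[sink] = 0
    succ[sink] = sink
    for _ in range(len(vertices) - 1):
        changed = False
        for (u, v), w in edges.items():
            # Relax edge (u, v): going to v first may shorten u's path.
            if distance[v] + w < distance[u]:
                distance[u] = distance[v] + w
                succ[u] = v
                changed = True
        if not changed:
            break  # distances are stable, further rounds change nothing
    return distance, succ
```

Because math.inf + w is never smaller than math.inf, unreachable vertices need no special handling.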





Bellman Ford Algorithm

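Example:

[Figure: the LSP example graph with vertices t, v0, v1, v2, v3, and the subgraph Gσ with edges e1 to e5 (weights -1, -8, 7 and three zero-weight edges)]

distances (snapshots from left to right; the slide groups them under iterations 0 to 3):

| t  | 0 | 0 | 0 | 0  | 0  | 0  | 0  |
| v0 | ∞ | ∞ | 0 | 0  | 0  | 0  | 0  |
| v1 | ∞ | 0 | 0 | 0  | 0  | 0  | 0  |
| v2 | ∞ | ∞ | ∞ | ∞  | 6  | -8 | -8 |
| v3 | ∞ | ∞ | ∞ | -1 | -1 | -1 | -1 |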





Bellman Ford Algorithm

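Another Example:

[Figure: graph with vertices t, v0, v1; edges e1, e2, e3 carry the weights 12, -1 and -2, plus a zero-weight edge at t]

distances (snapshots from left to right; slide header: 0 1 2):

| t  | 0 | 0  | 0  | 0  |
| v0 | ∞ | 12 | 12 | 9  |
| v1 | ∞ | ∞  | 10 | 10 |

Bellman Ford does not work with negative loops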
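A common way to notice this situation is one extra relaxation pass after the |V|-1 rounds: if a distance can still be lowered, a negative loop is involved. The sketch below illustrates that standard check. The function name is illustrative, and the assignment of the weights 12, -2 and -1 to the edges (v0,t), (v1,v0) and (v0,v1) in the usage note is an assumption read off the distance table, not stated on the slide.

```python
import math


def has_negative_loop_to_sink(vertices, edges, sink):
    """True if some negative loop can reach the sink.

    Runs the usual |V|-1 relaxation rounds and then checks whether any
    edge could still be relaxed. Negative loops that cannot reach the
    sink keep infinite distances and are not reported.
    """
    distance = {v: math.inf for v in vertices}
    distance[sink] = 0
    for _ in range(len(vertices) - 1):
        for (u, v), w in edges.items():
            if distance[v] + w < distance[u]:
                distance[u] = distance[v] + w
    # Extra pass: a further improvement witnesses a negative loop.
    return any(distance[v] + w < distance[u]
               for (u, v), w in edges.items())
```

With edges {("v0","t"): 12, ("v1","v0"): -2, ("v0","v1"): -1, ("t","t"): 0} the check reports a negative loop, since v0 and v1 form a loop of weight -3 and the distances keep decreasing.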





Switching the strategy

Picking another successor in a VMAX vertex:

Notation: σ[x→y]

σ[x→y](x) = y, σ[x→y](a) = σ(a) for a ≠ x

Switch σ[v→u] is:

attractive iff: w(v,u) + valσ(u) > valσ(v)

profitable iff: σ[v→u] > σ (expensive to check)

[Figure: switch at v to successor u with w(v,u) = 3 and valσ(u) = 2: the value of v rises from 4 to 5]
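Attractiveness is a purely local test on the current values. The following is a minimal sketch, assuming edges are stored as {(v, u): weight}, the strategy as a dictionary sigma, and the values valσ as a dictionary val; all names are illustrative.

```python
def attractive_switches(edges, v_max, sigma, val):
    """List all switches sigma[v->u] with w(v,u) + val[u] > val[v].

    Only edges leaving MAX vertices that are not already chosen by
    sigma are candidates.
    """
    return [
        (v, u)
        for (v, u), w in edges.items()
        if v in v_max and sigma[v] != u and w + val[u] > val[v]
    ]
```

With w(v,u) = 3, val[u] = 2 and val[v] = 4 the switch is attractive, because 3 + 2 > 4. Checking profitability instead would require recomputing all values under the switched strategy.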





Main Theorems

Theorem 5.1: Switch is attractive ⇒ switch is profitable

Also holds for combinations of switches

Theorem 5.2: No more attractive switches ⇒ strategy is at least as good as any other admissible strategy.





Putting the pieces together

    solve_0-mean_partition(G’):
        G = MPGtoLSP(G’)
        σ0 = computeInitialAdmissableStrategy(G)
        σ = σ0
        while(σ changes):
            (σ, distance) = Improve(σ, G)
        VMAX = VMIN = emptySet
        foreach v in (G.V\t):
            distance[v] == ∞ ? VMAX.add(v) : VMIN.add(v)
        return (VMAX, VMIN, σ)

    Improve(σ, G):
        Gσ = restrictGraph(G, σ)
        distance = BellmanFord(Gσ)
        (v, u, failed) = findAttractiveSwitch(distance)
        if(failed):
            return (σ, distance)
        return (σ[v->u], None)

    findAttractiveSwitch(distance):
        foreach (v,u) in (G.E \ Gσ.E):
            if(w(v,u) + distance[u] > distance[v]):
                return (v,u,0)
        return (None, None,1)
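The pseudocode can be condensed into a small runnable sketch. It rests on assumptions that are not spelled out on the slide: the graph is a dictionary {(u, v): weight}, every loop of the game passes through at least one MAX vertex (for example, a bipartite game), so that σ0 is admissible, and one attractive switch is applied per iteration. All function and variable names are illustrative.

```python
import math


def shortest_distances(vertices, edges, sink):
    """Bellman-Ford distances to the sink (no negative loops assumed)."""
    dist = {v: math.inf for v in vertices}
    dist[sink] = 0
    for _ in range(len(vertices) - 1):
        for (u, v), w in edges.items():
            if dist[v] + w < dist[u]:
                dist[u] = dist[v] + w
    return dist


def solve_zero_mean_partition(mpg_edges, v_max, sink="t"):
    """Return (V_gt0, V_le0, sigma) for the game given by mpg_edges.

    Assumption: every loop contains a MAX vertex, so the initial
    'always retreat' strategy creates no loop besides the one at the sink.
    """
    # MPG -> LSP: retreat edges for MAX plus a zero-weight loop at the sink.
    lsp = dict(mpg_edges)
    for v in v_max:
        lsp[(v, sink)] = 0
    lsp[(sink, sink)] = 0
    vertices = {x for edge in lsp for x in edge}
    sigma = {v: sink for v in v_max}  # sigma_0: retreat everywhere
    while True:
        # G_sigma keeps all MIN edges and one chosen edge per MAX vertex.
        g_sigma = {(u, v): w for (u, v), w in lsp.items()
                   if u not in v_max or sigma[u] == v}
        dist = shortest_distances(vertices, g_sigma, sink)
        switch = next(
            ((v, u) for (v, u), w in lsp.items()
             if v in v_max and sigma[v] != u and w + dist[u] > dist[v]),
            None)
        if switch is None:
            break  # no attractive switch is left
        sigma[switch[0]] = switch[1]
    greater = {v for v in vertices if v != sink and dist[v] == math.inf}
    return greater, vertices - greater - {sink}, sigma
```

For a MAX vertex a and a MIN vertex b with edges a→b of weight 2 and b→a of weight -1, the only loop has weight +1, so both vertices end up with distance ∞ and are reported in the first set.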





Putting the pieces together

Example:

[Figure sequence on the example graph: panels labelled "MPG to LSP", "σ = σ0" and three times "σ = Improve(σ)"; the vertex values shown in the panels are (0, 0, -8, -1), (1, 0, -7, -1) and (∞, ∞, ∞, ∞)]





Improvements: Switches

Any combination of attractive switches improves the strategy

Multiple switches per iteration

Try heuristics for selecting single or multiple attractive switches: Random, all attractive switches, ...

Initial Multiple Switching

Proceeding in Stages





Improvements: Randomization

Order of switches is crucial for complexity

Facet F[u→v] = set of strategies where succ[u] = v, u ∈ VMAX

Randomization scheme [4]:

    find_best_strategy(σ, G):
        if(G == Gσ): return σ
        while(true):
            randomly pick some F[u->v] not containing σ
            σ* = find_best_strategy(σ, G\(u,v))
            if(σ* is optimal in G): return σ*
            G = F[u->v]
            σ = σ[u->v]






Example:

[Figure sequence on the example graph with vertices t, v0, v1, v2, v3:

(G, σ): pick F[v1→v0]

(G\(v1,v0), σ): recursive call σ* = find_best_strategy(σ, G\(v1,v0))

(G, σ*): σ* optimal in G? NO! There is an attractive switch!

(G, σ): σ = σ*, G = F[v1→v0]]






Example continued:

[Figure sequence on the example graph with vertices t, v0, v1, v2, v3:

(G, σ): pick F[v0→t]

(G\(v0,t), σ): recursive call σ* = find_best_strategy(σ, G\(v0,t))

(G, σ*): σ* optimal in G? YES! No more attractive switches!]





Improvements: Recomputing the measure

Switch from σ to σ* = σ[v→u]

valσ(x) = valσ*(x) for some vertices x

Compute nodes that change their value

Bellman Ford Algorithm only for these nodes





Which values change?

σ* = σ[u1→v1][u2→v2] ...

U = {u1, u2, ...}

    Mark all vertices in U
    while(U not empty):
        u = U.pop()
        foreach unmarked predecessor p of u in Gσ*:
            if w(p,x) + d[x] > d[p] for all unmarked succ x of p in Gσ*:
                mark p
                U.push(p)

[Figure: vertex u with predecessor p, which has further successors x1 and x2]





Which values change?

Example:

[Figure: (G, σ) with vertex values 0, 0, -8, -1; the switch σ* = σ[v0→v3] leads to (G, σ*) and the restricted graph (Gσ*, σ*)]

| marked | U | u | {pi} | {xj} | condition | TRUE for all xj? |
| v0 | {v0} | v0 | v2 | v3 | w(v2,v3)+d[v3] > d[v2]: 7-1 > -8 | YES |
| v0, v2 | {v2} | v2 | - | - | - | - |





Complexity: p-Mean Partition

Without Improvement:

n·W finite values ⇒ n²·W switches

Bellman Ford: O(n·m)

⇒ O(n³·m·W)

With Improvement:

Bellman Ford: O(ni · m), ni = |changing nodes|

Σ(ni) = n²·W

⇒ O(n²·m·W)





Complexity: p-Mean Partition

With Randomization [4]:

$2^{O(\sqrt{n \log n})}$

All together:

$\min\left( O(n^2 \cdot m \cdot W),\; 2^{O(\sqrt{n \log n})} \right)$





Ergodic Partition

w-, w+ : smallest and biggest edge weights

val(ρ) = average(w(ei)) in [w-, w+], denominator ≤ n

Repeated p-mean partitioning:

Break interval [w-, w+] in parts of length 1/n²

decide which v are in each interval

Min. difference between 2 values:

$\frac{1}{n-1} - \frac{1}{n} = \frac{1}{n(n-1)} > \frac{1}{n^2}$

Unique value in each interval
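Since an interval of length 1/n² contains at most one fraction with denominator ≤ n, the exact value can be recovered once the interval is known. The following is a minimal sketch of that last step using Python's standard fractions module; the function name and the interval representation are assumptions for illustration, and the repeated p-mean partitioning that finds the interval is not shown.

```python
from fractions import Fraction


def value_in_interval(lo, hi, n):
    """Return the unique fraction with denominator <= n in [lo, hi].

    Requires hi - lo <= 1/n^2. Returns None if the interval contains
    no such fraction.
    """
    lo, hi = Fraction(lo), Fraction(hi)
    if hi - lo > Fraction(1, n * n):
        raise ValueError("interval is longer than 1/n^2")
    # Closest fraction to the midpoint with denominator at most n.
    candidate = ((lo + hi) / 2).limit_denominator(n)
    return candidate if lo <= candidate <= hi else None
```

The midpoint works because any other fraction with denominator ≤ n is more than 1/n² away from the one inside the interval, and therefore further from the midpoint.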





Complexity: Comparison

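| | This Approach | Zwick / Paterson [2] |
| Ergodic Partition | $\min\left( O(n^3 \cdot m \cdot W \cdot \log(n \cdot W)),\; (\log W) \cdot 2^{O(\sqrt{n \log n})} \right)$ | $O(n^3 \cdot m \cdot W)$ |
| p-Mean Partition | $\min\left( O(n^2 \cdot m \cdot W),\; 2^{O(\sqrt{n \log n})} \right)$ | $O(n^2 \cdot m \cdot W)$ |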





Summary

The first strongly subexponential algorithm

Algorithm for p-Mean Partition problem: Longest Shortest Path Problem, Strategy improvement, Improvements

Extended to Ergodic Partition problem





References

[1] H. Björklund, S. Sandberg, S. Vorobyov. A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. In: Proc. 29th International Symposium on Mathematical Foundations of Computer Science (MFCS), Vol. 3153 of Lecture Notes in Computer Science, Springer-Verlag, 2004, pp. 673–685.

[2] U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theor. Comput. Sci., 158:343–359, 1996.

[3] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press and McGraw-Hill Book Company, Cambridge, MA, 2nd edition, 2001.

[4] J. Matousek, M. Sharir, and M. Welzl. A subexponential bound for linear programming. In 8th ACM Symp. on Computational Geometry, pages 1–8, 1992.





Questions / Discussion

... thank you for your attention





Appendix

Proof sketches for Theorem 5.1 and 5.2





Proof sketch: Theorem 5.1 (attractive ⇒ profitable)

Value increases at least in one vertex:

Attractive switch σ* = σ[v→u]: w(v,u) + valσ(u) > valσ(v) ⇒ valσ*(v) > valσ(v)

Values do not decrease:

New loops are positive

New paths to the sink are longer






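New loops are positive:

Switch σ* = σ[v→u]: New loop must contain switching vertex v

[Figure: new loop through v and u with total weight x; y = valσ(v) is the distance from v to the sink t]

y = valσ(v) < w(v,u) + valσ(u) ≤ x + y ⇒ x > 0

First inequality: switch is attractive

Second inequality: valσ(u) ≤ x – w(v,u) + y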






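New paths to the sink are longer:

Switch σ* = σ[v→u]: New path from any vertex n to t must contain switching vertex v

[Figure: path from n via a, v and u to the sink t; x is the length of the new path from v to t, y = valσ(v)]

y = valσ(v) < w(v,u) + valσ(u) ≤ x ⇒ y < x

First inequality: switch is attractive

Second inequality: valσ(u) ≤ x – w(v,u)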





Proof sketch: Theorem 5.2 (stable ⇒ optimal)

Proof for one-player games: MIN has no choices

Finite values cannot become infinite: no more attractive switches ⇒ no more new positive loops

Finite values do not improve finitely: no more attractive switches ⇒ no more new longer paths to t

Extension to two-player games: MIN does not need choices






No more new positive loops:

Assumption: there is a new positive loop

[Figure: loop through v and u with weight x; y is the distance from v to the sink t]

switch attractive: y < x + y

no attractive switches: y ≥ x + y ⇒ x ≤ 0






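No more new longer paths to t:

Assumption: there is a new longer path to t

[Figure: path from n via a, v and u to the sink t; x is the length of the new path from v to t, y the old distance from v to t]

switch attractive: y < x

no attractive switches: y ≥ x

⇒ cannot have better finite values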
