Uri Zwick Tel Aviv University Simple Stochastic Games Mean Payoff Games Parity Games CSR 2008 Moscow, Russia

Page 1: Uri Zwick Tel Aviv University

Uri Zwick
Tel Aviv University

Simple Stochastic Games
Mean Payoff Games
Parity Games

CSR 2008
Moscow, Russia

Page 2: Uri Zwick Tel Aviv University

Mean Payoff Games

Simple Stochastic Games

Parity Games

Randomized subexponential algorithm for SSG

Deterministic subexponential algorithm for PG

Page 3: Uri Zwick Tel Aviv University

Mean Payoff Games

Simple Stochastic Games

Parity Games

Page 4: Uri Zwick Tel Aviv University

A simple Simple Stochastic Game

[Figure: game graph with four vertices labeled R]

Page 5: Uri Zwick Tel Aviv University

Simple Stochastic Games (SSGs): Reachability version [Condon (1992)]

Objective: MAX maximizes, and min minimizes, the probability of reaching the MAX-sink

Two Players: MAX and min

[Figure legend: vertex types MAX, min, RAND (R), MAX-sink, min-sink]

Page 6: Uri Zwick Tel Aviv University

Simple Stochastic Games (SSGs): Strategies

A general strategy may be randomized and history dependent

A positional strategy is deterministic and history independent

Positional strategy for MAX: choice of an outgoing edge from each MAX vertex

Page 7: Uri Zwick Tel Aviv University

Simple Stochastic Games (SSGs): Values

Both players have positional optimal strategies

Every vertex i in the game has a value $v_i$

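$v_i = \max_{\sigma\ \mathrm{positional}} \min_{\tau\ \mathrm{general}} v_i(\sigma,\tau) = \min_{\tau\ \mathrm{positional}} \max_{\sigma\ \mathrm{general}} v_i(\sigma,\tau)$

where $v_i(\sigma,\tau)$ is the probability of reaching the MAX-sink from i when MAX plays σ and min plays τ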

There are strategies that are optimal for every starting position

Page 8: Uri Zwick Tel Aviv University

Simple Stochastic Games (SSGs) [Condon (1992)]

Terminating binary games:

● The outdegrees of all non-sinks are 2
● All probabilities are ½
● The game terminates with prob. 1

Easy reduction from general games to terminating binary games

Page 9: Uri Zwick Tel Aviv University

“Solving” terminating binary SSGs

The values $v_i$ of the vertices of a game are the unique solution of the following equations:
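For reference, the standard optimality equations for a terminating binary SSG can be written as below. This is a sketch under an assumption not stated on the slide: each non-sink vertex $i$ has exactly two successors, called $j$ and $k$ here for illustration, and $V_{\mathrm{MAX}}$, $V_{\min}$, $V_{\mathrm{RAND}}$ denote the three vertex classes.

```latex
v_i =
\begin{cases}
\max\{v_j, v_k\}          & \text{if } i \in V_{\mathrm{MAX}}, \\
\min\{v_j, v_k\}          & \text{if } i \in V_{\min}, \\
\tfrac{1}{2}(v_j + v_k)   & \text{if } i \in V_{\mathrm{RAND}}, \\
1                         & \text{if } i \text{ is the MAX-sink}, \\
0                         & \text{if } i \text{ is the min-sink}.
\end{cases}
```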

Corollary: Decision version in NP ∩ co-NP

The values are rational numbers requiring only a linear number of bits

Page 10: Uri Zwick Tel Aviv University

Value iteration (for binary SSGs)

Iterate the operator:

Converges to the unique solution

But, may require an exponential number of iterations just to get close
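A minimal sketch of value iteration for a terminating binary SSG, assuming the operator replaces each value by the max, min, or average of the two successors' current values. The encoding (`kind`, `succ`, the string labels) and the example game are hypothetical choices made for illustration.

```python
from fractions import Fraction

def value_iteration(kind, succ, rounds):
    """Apply the one-step operator `rounds` times, starting from all zeros.

    kind[i] is one of "MAX", "MIN", "RAND", "MAX_SINK", "MIN_SINK";
    succ[i] is a pair (j, k) of successors for non-sinks (ignored for sinks).
    """
    v = [Fraction(1) if k == "MAX_SINK" else Fraction(0) for k in kind]
    for _ in range(rounds):
        new = []
        for i, k in enumerate(kind):
            if k == "MAX_SINK":
                new.append(Fraction(1))
            elif k == "MIN_SINK":
                new.append(Fraction(0))
            else:
                a, b = v[succ[i][0]], v[succ[i][1]]
                if k == "MAX":
                    new.append(max(a, b))
                elif k == "MIN":
                    new.append(min(a, b))
                else:  # RAND: both successors with probability 1/2
                    new.append((a + b) / 2)
        v = new
    return v

# Hypothetical example: vertex 0 is a MAX vertex, 1 and 2 are random,
# 3 is the MAX-sink and 4 is the min-sink.
kind = ["MAX", "RAND", "RAND", "MAX_SINK", "MIN_SINK"]
succ = [(1, 2), (3, 4), (0, 3), None, None]
for rounds in (2, 4, 8, 16):
    print(rounds, value_iteration(kind, succ, rounds)[0])
```

In this small game the value of vertex 0 is 1, and the iterates approach it only in the limit, which illustrates why the number of iterations, rather than the cost of one iteration, is the concern.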

Page 11: Uri Zwick Tel Aviv University

Simple Stochastic Games (SSGs): Payoff version [Shapley (1953)]

[Figure legend: vertex types MAX, min, RAND (R)]

Limiting average version

Discounted version

Page 12: Uri Zwick Tel Aviv University

Markov Decision Processes (MDPs)

Theorem [d'Epenoux (1964)]: Values and optimal strategies of an MDP can be found by solving an LP

[Figure legend: vertex types MAX, min, RAND (R)]
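One possible form of such an LP, written as a sketch for a terminating binary MDP in the reachability setting. It assumes the MDP has only MAX and RAND vertices (no min vertices), that $E$ is the edge set, and that each RAND vertex $i$ has successors $j$ and $k$; these names are illustrative and not taken from the slide.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_i v_i \\
\text{subject to}\quad & v_i \ge v_j
                         && \text{for every edge } (i,j) \in E \text{ with } i \in V_{\mathrm{MAX}}, \\
                       & v_i = \tfrac{1}{2}(v_j + v_k)
                         && \text{for every } i \in V_{\mathrm{RAND}}, \\
                       & v_i = 1 \ \text{at the MAX-sink}, \quad v_i = 0 \ \text{at the min-sink}.
\end{aligned}
```

Under these assumptions the optimal solution is the value vector, and an optimal positional strategy picks, at each MAX vertex, an edge whose constraint is tight.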

Page 13: Uri Zwick Tel Aviv University

SSG ∈ NP ∩ co-NP: Another proof

Deciding whether the value of a game is at least (at most) v is in NP ∩ co-NP

To show that value ≥ v, guess an optimal strategy for MAX

Find an optimal counter-strategy for min by solving the resulting MDP.

Is the problem in P?

Page 14: Uri Zwick Tel Aviv University

Mean Payoff Games (MPGs) [Ehrenfeucht, Mycielski (1979)]

Non-terminating version

Discounted version

[Figure: MPGs, Payoff SSGs and Reachability SSGs; vertex types MAX, min, RAND (R)]

Pseudo-polynomial algorithm (PZ’96)

Page 15: Uri Zwick Tel Aviv University

Mean Payoff Games (MPGs) [Ehrenfeucht, Mycielski (1979)]

Value(σ,τ): average of the cycle formed

Again, both players have optimal positional strategies.
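A minimal sketch of how Value(σ,τ) can be computed once both positional strategies are fixed: the play from any start vertex eventually enters a cycle, and the value is the average edge weight on that cycle. The dictionaries `choice` (the combined strategies of both players) and `weight` are hypothetical encodings chosen for illustration.

```python
from fractions import Fraction

def play_value(choice, weight, start):
    """Average weight of the cycle reached from `start`.

    choice[v] is the successor picked at v by whichever player owns v;
    weight[(u, v)] is the payoff of edge (u, v).
    """
    first_seen = {}  # vertex -> position in `path` of its first visit
    path = []
    v = start
    while v not in first_seen:
        first_seen[v] = len(path)
        path.append(v)
        v = choice[v]
    cycle = path[first_seen[v]:]  # the vertices repeated forever
    total = sum(weight[(u, choice[u])] for u in cycle)
    return Fraction(total, len(cycle))

# Hypothetical example: 0 -> 1 -> 2 -> 1 -> 2 -> ...
choice = {0: 1, 1: 2, 2: 1}
weight = {(0, 1): 5, (1, 2): 3, (2, 1): -1}
print(play_value(choice, weight, 0))
```

The weight of the initial edge (0, 1) does not affect the result, since only the cycle contributes to the long-run average.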

Page 16: Uri Zwick Tel Aviv University

Selecting the second largest element with only four storage locations [PZ’96]

Page 17: Uri Zwick Tel Aviv University

Parity Games (PGs): A simple example

[Figure: example parity game with vertex priorities 2, 1, 4, 1, 3, 2]

EVEN wins if the largest priority seen infinitely often is even

Page 18: Uri Zwick Tel Aviv University

Parity Games (PGs)

[Figure: EVEN and ODD vertices, with example priorities 3 and 8]

EVEN wins if the largest priority seen infinitely often is even

Equivalent to many interesting problems in automata and verification:

Non-emptiness of ω-tree automata

modal μ-calculus model checking

Page 19: Uri Zwick Tel Aviv University

Parity Games (PGs)

[Figure: EVEN and ODD vertices, with example priorities 3 and 8]

Reduction from Parity Games (PGs) to Mean Payoff Games (MPGs):

Replace priority k by payoff $(-n)^k$

Move payoffs to outgoing edges

[Stirling (1993)] [Puri (1995)]
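The reduction can be sketched in a few lines, assuming n is the number of vertices and EVEN plays the role of MAX. The function name and the list/dict encoding are hypothetical. On any simple cycle, the largest priority dominates the sum, because at most n−1 other vertices each contribute at most $n^{k-1}$ in absolute value.

```python
def parity_to_mean_payoff(priority, edges):
    """Give every edge (u, v) the payoff (-n)**priority[u], where n = #vertices.

    priority is a list indexed by vertex; edges is an iterable of (u, v) pairs.
    """
    n = len(priority)
    return {(u, v): (-n) ** priority[u] for (u, v) in edges}

# Hypothetical example with three vertices of priorities 2, 1 and 3.
priority = [2, 1, 3]
edges = [(0, 1), (1, 0), (1, 2), (2, 0)]
weight = parity_to_mean_payoff(priority, edges)

# Cycle 0 -> 1 -> 0 has largest priority 2 (even): its total payoff is positive.
print(weight[(0, 1)] + weight[(1, 0)])
# Cycle 0 -> 1 -> 2 -> 0 has largest priority 3 (odd): its total payoff is negative.
print(weight[(0, 1)] + weight[(1, 2)] + weight[(2, 0)])
```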

Page 20: Uri Zwick Tel Aviv University

Switches

Value vector of strategy σ of MAX with respect to the optimal counter-strategy of min

Page 21: Uri Zwick Tel Aviv University

Strategy/Policy Iteration

Start with some strategy σ (of MAX)

While there are improving switches, perform some of them

As each step is strictly improving and as there is a finite number of strategies, the algorithm must end with an optimal strategy

SSG ∈ PLS (Polynomial Local Search)
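A minimal sketch of the iteration above, simplified to the case with no min vertices (an MDP), so that the value vector of σ comes from a plain linear system rather than from min's optimal counter-strategy. The encoding, the exact Gauss-Jordan evaluation, and the choice to perform all improving switches in each round are assumptions made for illustration.

```python
from fractions import Fraction

def evaluate(kind, succ, sigma):
    """Exact value vector when MAX follows sigma (sigma[i] in {0, 1} indexes succ[i]).

    Assumes the game terminates with probability 1 under sigma; otherwise the
    system is singular and the pivot search below raises StopIteration.
    """
    n = len(kind)
    rows = []
    for i in range(n):
        row = [Fraction(0)] * (n + 1)  # last column is the right-hand side
        row[i] = Fraction(1)
        if kind[i] == "MAX_SINK":
            row[n] = Fraction(1)       # v_i = 1
        elif kind[i] == "MAX":
            row[succ[i][sigma[i]]] -= 1            # v_i = v_{sigma(i)}
        elif kind[i] == "RAND":
            for j in succ[i]:
                row[j] -= Fraction(1, 2)           # v_i = (v_j + v_k) / 2
        rows.append(row)               # MIN_SINK: v_i = 0
    for c in range(n):                 # Gauss-Jordan elimination
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        pivot = rows[c][c]
        rows[c] = [x / pivot for x in rows[c]]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                f = rows[r][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[c])]
    return [rows[i][n] for i in range(n)]

def policy_iteration(kind, succ, sigma):
    """Repeat: evaluate sigma, then perform all improving switches."""
    sigma = dict(sigma)
    while True:
        v = evaluate(kind, succ, sigma)
        improving = [i for i in sigma
                     if v[succ[i][1 - sigma[i]]] > v[succ[i][sigma[i]]]]
        if not improving:
            return sigma, v
        for i in improving:
            sigma[i] = 1 - sigma[i]

# Hypothetical example: vertex 0 is MAX, 1 and 2 are random, 3 and 4 are sinks.
kind = ["MAX", "RAND", "RAND", "MAX_SINK", "MIN_SINK"]
succ = [(1, 2), (3, 4), (0, 3), None, None]
print(policy_iteration(kind, succ, {0: 0}))
```

With min vertices present, `evaluate` would instead have to compute the values of σ against min's optimal counter-strategy, for example by solving the resulting MDP as an LP.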

Page 22: Uri Zwick Tel Aviv University

Strategy/Policy Iteration: Complexity?

Performing only one switch at a time may lead to exponentially many improvements, even for MDPs [Condon (1992)]

What happens if we perform all profitable switches? [Hoffman-Karp (1966)]

Not known to be polynomial. Best upper bound: $O(2^n/n)$ [Mansour-Singh (1999)]

No non-linear examples. Best lower bound: $2n - O(1)$ [Madani (2002)]

Page 23: Uri Zwick Tel Aviv University

A randomized subexponential algorithm for simple stochastic games

Page 24: Uri Zwick Tel Aviv University

A randomized subexponential algorithm for binary SSGs
[Ludwig (1995)] [Kalai (1992)] [Matousek-Sharir-Welzl (1992)]

Start with an arbitrary strategy σ for MAX

Choose a random vertex $i \in V_{MAX}$

Find the optimal strategy σ’ for MAX in the game in which the only outgoing edge of i is (i, σ(i))

If switching σ’ at i is not profitable, then σ’ is optimal

Otherwise, let $\sigma \leftarrow (\sigma')^i$, the strategy obtained from σ’ by switching at i, and repeat

Page 25: Uri Zwick Tel Aviv University

A randomized subexponential algorithm for binary SSGs
[Ludwig (1995)] [Kalai (1992)] [Matousek-Sharir-Welzl (1992)]

There is a hidden order of MAX vertices under which the optimal strategy returned by the first recursive call correctly fixes the strategy of MAX at vertices 1,2,…,i

[Figure: MAX vertices in the hidden order; positions 1,2,…,i are marked "All correct! Would never be switched!"]

Page 26: Uri Zwick Tel Aviv University

The hidden order

$u_i(\sigma)$: the maximum sum of values of a strategy of MAX that agrees with σ on i

Page 27: Uri Zwick Tel Aviv University

The hidden order

Order the vertices such that

Positions 1,…,i were switched and would never be switched again

Page 28: Uri Zwick Tel Aviv University

SSGs are LP-type problems [Björklund-Sandberg-Vorobyov (2002)] [Halman (2002)]

General (non-binary) SSGs can be solved in time

AUSO – Acyclic Unique Sink Orientations

Page 29: Uri Zwick Tel Aviv University

Parity Games (PGs): A simple example

[Figure: example parity game with vertex priorities 2, 1, 4, 1, 3, 2]

EVEN wins if the largest priority seen infinitely often is even

Page 30: Uri Zwick Tel Aviv University

Exponential algorithm for PGs [McNaughton (1993)] [Zielonka (1998)]

Vertices of highest priority (even)

Vertices from which EVEN can force the game to enter A

First recursive call

Lemma: (i)

(ii)

Page 31: Uri Zwick Tel Aviv University

Exponential algorithm for PGs [McNaughton (1993)] [Zielonka (1998)]

Second recursive call

In the worst case, both recursive calls are on games of size n−1
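The two recursive calls can be sketched as follows. This is a textbook-style rendering of the McNaughton/Zielonka recursion, not code from the talk; the encoding (players 0 = EVEN and 1 = ODD, `owner`, `edges`, `priority` as dictionaries) is hypothetical, and every vertex is assumed to have at least one successor.

```python
def attractor(vertices, edges, owner, target, player):
    """Vertices of the subgame from which `player` can force a visit to `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in vertices - attr:
            succ = [w for w in edges[v] if w in vertices]
            if owner[v] == player:
                forced = any(w in attr for w in succ)
            else:
                forced = all(w in attr for w in succ)
            if forced:
                attr.add(v)
                changed = True
    return attr

def zielonka(vertices, edges, owner, priority):
    """Return (W0, W1), the winning sets of EVEN (0) and ODD (1)."""
    if not vertices:
        return set(), set()
    d = max(priority[v] for v in vertices)
    p = d % 2  # the player who likes the highest priority
    top = {v for v in vertices if priority[v] == d}
    a = attractor(vertices, edges, owner, top, p)
    # First recursive call: remove everything from which p can reach `top`.
    w = list(zielonka(vertices - a, edges, owner, priority))
    if not w[1 - p]:
        w[p], w[1 - p] = set(vertices), set()  # p wins everywhere
        return tuple(w)
    # Second recursive call: remove what the opponent certainly wins.
    b = attractor(vertices, edges, owner, w[1 - p], 1 - p)
    w2 = list(zielonka(vertices - b, edges, owner, priority))
    w2[1 - p] = w2[1 - p] | b
    return tuple(w2)

# Hypothetical example: EVEN owns vertices 0 and 2, ODD owns vertex 1.
owner = {0: 0, 1: 1, 2: 0}
priority = {0: 2, 1: 1, 2: 3}
edges = {0: [1], 1: [0], 2: [2, 0]}
print(zielonka({0, 1, 2}, edges, owner, priority))
```

Each call may recurse twice on a game with at least one vertex fewer, which matches the exponential worst case stated on the slide.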

Page 32: Uri Zwick Tel Aviv University

Deterministic subexponential algorithm for PGs [Jurdziński, Paterson, Zwick (2006)]

Second recursive call

Dominion

Idea: Look for small dominions!

Dominion: A (small) set from which one of the players can win without the play ever leaving this set

Dominions of size s can be found in $O(n^s)$ time

Page 33: Uri Zwick Tel Aviv University

Open problems

● Polynomial algorithms?
● Is the Policy Improvement algorithm polynomial?
● Faster subexponential algorithms for parity games?
● Deterministic subexponential algorithms for MPGs and SSGs?
● Faster pseudo-polynomial algorithms for MPGs?