csr2011 june14 16_30_ibsen-jensen

Post on 21-Oct-2014


Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen and Peter Bro Miltersen. The complexity of solving reachability games using value and strategy iteration


The complexity of solving reachability games using value and strategy iteration

Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen

Aarhus University, Denmark

CSR 2011, 14th June

Overview

What are concurrent reachability games?

Two standard algorithms for solving concurrent reachability games: the value iteration algorithm and the strategy iteration algorithm

Exemplify important facts for the proof of the time lower bound for both algorithms

1/42

Matrix games von Neumann 1928

0 -1 1

1 0 -1

-1 1 0

2/42
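A small check of this matrix game, assuming the usual convention (not stated on the slide) that the row player receives the entry and the column player pays it. The sketch computes what each player can guarantee with the uniform mixed strategy, using exact fractions; the function names are illustrative only.

```python
from fractions import Fraction

# The matrix game from the slide; assumed convention: rows maximize, columns minimize.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def row_guarantee(A, x):
    """Worst-case expected payoff of the row mix x over all pure columns."""
    return min(sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0])))


def column_guarantee(A, y):
    """Best expected payoff any pure row achieves against the column mix y."""
    return max(sum(A[i][j] * y[j] for j in range(len(A[0]))) for i in range(len(A)))


uniform = [Fraction(1, 3)] * 3
lower = row_guarantee(A, uniform)     # the row player can guarantee at least this
upper = column_guarantee(A, uniform)  # the column player can hold him to at most this
print(lower, upper)
```

Both guarantees come out as 0, so the value of this game is 0 and both uniform strategies are optimal.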


Concurrent reachability games Everett 1957/de Alfaro, Henzinger, Kupferman 1998

Dante* vs. Lucifer*

Each entry can be either 0, 1 or a pointer

[Figure: an example game with four 3×3 matrices. The start matrix is marked S:, and every entry S points back to it. One matrix is

1 S S
0 1 S
0 0 1

and the other three have S above the diagonal, 0 below it and pointers on the diagonal.]

* Naming convention from Hansen, Koucky and Miltersen, 2009

3/42

Histories

[Figure: the example game from slide 3.]

4/42

Histories and strategies

History: a sequence of positions, together with the choices of each player in each position.

Strategy: a map from histories to probability distributions over the choices in the position reached after the history.

S1: the set of strategies for Dante

S2: the set of strategies for Lucifer

H1/H2: the sets of stationary strategies (strategies that depend only on the position reached after the history)

5/42

Payoffs

v(i,σ,π): the probability of eventually reaching a 1 from position i when Dante plays strategy σ and Lucifer plays strategy π.
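For stationary strategies, v(i,σ,π) is an absorption probability of a Markov chain, and it can be approximated by the probability of reaching a 1 within n moves, which increases towards v(i,σ,π) as n grows. A minimal sketch, assuming a hypothetical encoding that is not from the slides: a game is a dict from position names to matrices whose entries are 0, 1 or the name of another position (rows are Dante's choices, columns are Lucifer's), and a stationary strategy is a dict from positions to probability vectors.

```python
def reach_probability(game, sigma, pi, start, n=1000):
    """Probability of reaching a 1 within n moves from `start` when Dante (rows)
    plays the stationary strategy sigma and Lucifer (columns) plays pi.
    These probabilities increase towards v(start, sigma, pi) as n grows."""
    v = {pos: 0.0 for pos in game}  # within 0 moves no 1 has been reached
    for _ in range(n):
        new_v = {}
        for pos, matrix in game.items():
            total = 0.0
            for i, row in enumerate(matrix):
                for j, entry in enumerate(row):
                    # Strings are pointers to positions; 0 and 1 are terminal.
                    payoff = v[entry] if isinstance(entry, str) else entry
                    total += sigma[pos][i] * pi[pos][j] * payoff
            new_v[pos] = total
        v = new_v
    return v[start]


game = {"S": [[1, "S"], [0, 1]]}  # hypothetical one-matrix example
uniform = {"S": [0.5, 0.5]}       # Dante picks each row with probability 1/2
print(reach_probability(game, uniform, {"S": [1.0, 0.0]}, "S"))
print(reach_probability(game, uniform, {"S": [0.0, 1.0]}, "S"))
```

In this example, Lucifer's first column ends the game at once (in 1 or 0, each with probability 1/2), while his second column only leads to 1 or back to S, so Dante eventually reaches 1 with probability 1.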

6/42

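Everett 1957

$\forall i:\ \sup_{\sigma \in S_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in S_2} \sup_{\sigma \in S_1} v(i,\sigma,\pi) =: v_i$ (the value of i)

$\forall i:\ \sup_{\sigma \in H_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in S_2} \sup_{\sigma \in S_1} v(i,\sigma,\pi) = v_i$

7/42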

Algorithmic problems

Quantitatively solving a game: Given the game, compute the value of all positions.

Strategically solving a game: Given the game and ε > 0, compute σ such that for all π and i: $v(i,\sigma,\pi) > v_i - \varepsilon$.

8/42

Value iteration Shapley 1953

9/42

Value iteration computes the value of each position in $G_t$ in iteration t, based on the value of each position in $G_{t-1}$.

$G_t$: a modified version of G in which Dante loses after t moves.

Our results: Lower bound for value iteration

There exists a concurrent reachability game G, with N matrices and m rows and columns in each matrix, such that:

$val(G) = 1$ and $val(G_t) = 3m^{-N/2}$ for $t = 2^{m^{N/2}}$

10/42

Our results: Upper bound for value iteration

For any concurrent reachability game G: $val(G) - val(G_t) < \varepsilon$ for $t = (1/\varepsilon)^{m^{O(N)}}$

11/42

[Figure: the example game from slide 3.]

12/42

Value iteration example, $G_0$ to $G_9$ (slides 12 to 21)

Values in $G_t$, listed from the start matrix S to the matrix with 1 on its diagonal:

t   S (start)   2nd       3rd       4th (1 on diagonal)
0   0           0         0         0
1   0           0         0         0.33333
2   0           0         0.11111   0.33333
3   0           0.03704   0.11111   0.33333
4   0.01235     0.03704   0.11111   0.33333
5   0.01754     0.04147   0.11533   0.33748
6   0.02172     0.04493   0.11855   0.33925
7   0.02519     0.04772   0.12064   0.34068
8   0.02815     0.04991   0.12388   0.34187
9   0.03070     0.05129   0.12517   0.34378

21/42
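The value iteration numbers can be recomputed with a short sketch. It assumes the structure suggested by the figure and by Generalized Purgatory later in the talk: matrix k (k = 0 is the start matrix S) has a pointer to matrix k + 1 on its diagonal (1 for the last matrix), S above the diagonal and 0 below it. For such a matrix with diagonal value d and S-value s, where $0 \le s \le d$, Dante can play row j with probability proportional to $(1 - s/d)^{j-1}$ and Lucifer column j with probability proportional to $(1 - s/d)^{m-j}$; both mixes equalize, so the value is $d / \sum_{j=0}^{m-1} (1 - s/d)^j$. The helper uses this closed form instead of a general matrix-game solver, so it only applies to matrices of this shape. With N = 4 and m = 3, it agrees with the values shown for $G_1$ to $G_7$ to within rounding.

```python
def purgatory_matrix_value(d, s, m):
    """Value and an optimal row mix of the m x m matrix game with d on the
    diagonal, s above it and 0 below it (closed form; assumes 0 <= s <= d)."""
    if d == 0:
        # Lucifer's first column gives 0 against every row, so the value is 0
        # and every row mix is optimal.
        return 0.0, [1.0 / m] * m
    if not 0 <= s <= d:
        raise ValueError("the closed form assumes 0 <= s <= d")
    q = 1.0 - s / d
    weights = [q ** j for j in range(m)]  # equalizing weights for rows 1..m
    total = sum(weights)
    return d / total, [w / total for w in weights]


def value_iteration(n_matrices, m, t):
    """Values of all matrices in G_t (index 0 is the start matrix S).

    Assumed structure: in matrix k the diagonal points to matrix k + 1
    (to 1 for the last matrix), entries above it point to S, entries below are 0.
    """
    v = [0.0] * n_matrices  # G_0: Dante loses after 0 moves
    for _ in range(t):
        # The comprehension reads only the previous iteration's values.
        v = [purgatory_matrix_value(v[k + 1] if k + 1 < n_matrices else 1.0, v[0], m)[0]
             for k in range(n_matrices)]
    return v


for t in range(8):
    print(t, [round(x, 5) for x in value_iteration(4, 3, t)])
```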

Strategy iteration Chatterjee, de Alfaro, Henzinger ’06

22/42

Was conjectured to be fast

Our results: Upper bound for strategy iteration

An ε-optimal strategy is computed after $t = (1/\varepsilon)^{m^{O(N)}}$ iterations of strategy iteration.

This follows from the corresponding results for value iteration.

23/42

Our results: Lower bound for strategy iteration

There exists a concurrent reachability game G, with N matrices (for large N) and m rows and columns in each matrix, such that:

$val(G) = 1$, and the strategy obtained by strategy iteration after $t = 2^{m^{N/4}}$ iterations guarantees winning probability at most $4m^{-N/2}$.

24/42

Strategy iteration, m = 2

N   Number of iterations needed to get over 1/2
7   18446744073709551617
8   340282366920938463463374607431768211457
9   115792089237316195423570985008687907853269984665640564039457584007913129639937

Strategy iteration example (slides 25 to 34), on the example game from slide 3

Before iteration 1: 1. Start strategy for Dante := Uniform (every row of every matrix has probability 0.33333).

In each iteration: 1. Best response for Lucifer. 2. Calculate values from those strategies. 3. Update strategy for Dante.

[Figure: the example game. The numbers on the edges are the probability that the edge is used. Edges without a number have probability 0.33333 to be used.]

Values after step 2, listed from the start matrix S to the matrix with 1 on its diagonal:

Iteration  S (start)  2nd      3rd      4th (1 on diagonal)
1          0.01235    0.03704  0.11111  0.33333
2          0.02065    0.04359  0.11677  0.33748
3          0.02676    0.04825  0.12067  0.34031
4          0.03154    0.05185  0.12360  0.34241
5          0.03544    0.05476  0.12593  0.34407
6          0.03873    0.05721  0.12786  0.34543
7          0.04156    0.05932  0.12950  0.34658
8          0.04404    0.06118  0.13093  0.34758
9          0.04624    0.06283  0.13219  0.34845

Dante's strategy after step 3 (probabilities of his rows, from top to bottom):

Iteration  S (start)                    2nd                          3rd                          4th (1 on diagonal)
1          (0.47368, 0.31579, 0.21053)  (0.37327, 0.33180, 0.29493)  (0.34599, 0.33317, 0.32084)  (0.33748, 0.33332, 0.32920)
2          (0.55453, 0.29186, 0.15361)  (0.39987, 0.33180, 0.32917)  (0.35458, 0.33289, 0.31253)  (0.34031, 0.33329, 0.32640)
3          (0.60831, 0.27098, 0.12071)  (0.41947, 0.32646, 0.25407)  (0.36097, 0.33259, 0.30644)  (0.34241, 0.33325, 0.32434)
4          (0.64720, 0.25350, 0.09930)  (0.43486, 0.32390, 0.24125)  (0.36601, 0.33230, 0.30169)  (0.34407, 0.33322, 0.32271)
5          (0.67692, 0.23882, 0.08426)  (0.44745, 0.32152, 0.23103)  (0.37015, 0.33202, 0.29783)  (0.34543, 0.33319, 0.32138)
6          (0.70055, 0.22633, 0.07312)  (0.45807, 0.31933, 0.22260)  (0.37366, 0.33177, 0.29457)  (0.34658, 0.33316, 0.32026)
7          (0.71988, 0.21557, 0.06455)  (0.46723, 0.31730, 0.21547)  (0.37670, 0.33153, 0.29177)  (0.34758, 0.33313, 0.31929)
8          (0.73606, 0.20618, 0.05776)  (0.47527, 0.31541, 0.20932)  (0.37937, 0.33130, 0.28933)  (0.34845, 0.33311, 0.31844)
9          (0.74985, 0.19791, 0.05224)  (0.48241, 0.31366, 0.20393)  (0.38176, 0.33109, 0.28715)  (0.34923, 0.33309, 0.31768)

34/42
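The three steps can be sketched for the example game under an assumed structure (pointer to the next matrix on the diagonal, S above it, 0 below it). This is a simplified sketch of the steps on the slides, not necessarily the exact procedure of Chatterjee, de Alfaro and Henzinger. For a fixed stationary strategy of Dante, Lucifer faces a Markov decision process, which has a pure stationary strategy that is optimal from every position at once, so steps 1 and 2 are done by brute force over Lucifer's pure stationary strategies. Step 3 replaces every pointer by the current value and plays, in each matrix, the equalizing optimal strategy, whose row j has probability proportional to $(1 - s/d)^{j-1}$ (d the diagonal value, s the value of S, assuming $0 \le s \le d$). With N = 4 and m = 3 the sketch should agree, within rounding, with the values and strategies shown for iteration 1 and the values shown for iteration 2; with N = 3 and m = 2 it should give the strategy iteration numbers for 3 matrices on slides 38 to 41.

```python
from itertools import product


def purgatory_matrix_value(d, s, m):
    """Closed-form value and optimal row mix of the m x m matrix with d on the
    diagonal, s above it and 0 below it (assumes 0 <= s <= d)."""
    if d == 0:
        return 0.0, [1.0 / m] * m  # value 0; every row mix is optimal
    if not 0 <= s <= d:
        raise ValueError("the closed form assumes 0 <= s <= d")
    q = 1.0 - s / d
    weights = [q ** j for j in range(m)]
    total = sum(weights)
    return d / total, [w / total for w in weights]


def values_against(x, columns):
    """Winning probability from every matrix when Dante plays the row mixes x
    and Lucifer always picks column columns[k] in matrix k."""
    n = len(x)
    advance = [x[k][columns[k]] for k in range(n)]        # correct guess
    restart = [sum(x[k][:columns[k]]) for k in range(n)]  # guess too low: back to S
    # One pass from S: `reach` ends as the probability of reaching 1 without a
    # restart, R is the probability of being sent back to S before that.
    reach, R = 1.0, 0.0
    for k in range(n):
        R += reach * restart[k]
        reach *= advance[k]
    v_start = reach / (1.0 - R) if R < 1.0 else 0.0
    v, after = [0.0] * n, 1.0  # `after` is the value of the next matrix (1 at the end)
    for k in reversed(range(n)):
        v[k] = advance[k] * after + restart[k] * v_start
        after = v[k]
    return v


def best_response_values(x, m):
    """Steps 1 and 2: minimum over Lucifer's pure stationary strategies,
    taken separately for every start matrix."""
    best = [1.0] * len(x)
    for columns in product(range(m), repeat=len(x)):
        best = [min(b, w) for b, w in zip(best, values_against(x, columns))]
    return best


def strategy_iteration(n_matrices, m, iterations):
    """Returns (values, Dante's updated strategies) for every iteration."""
    x = [[1.0 / m] * m for _ in range(n_matrices)]  # start: uniform
    history = []
    for _ in range(iterations):
        v = best_response_values(x, m)
        # Step 3: optimal play in each matrix with pointers replaced by values.
        x = [purgatory_matrix_value(v[k + 1] if k + 1 < n_matrices else 1.0, v[0], m)[1]
             for k in range(n_matrices)]
        history.append((v, x))
    return history


for i, (v, x) in enumerate(strategy_iteration(4, 3, 3), start=1):
    print(i, [round(a, 5) for a in v], [round(p, 5) for p in x[0]])
```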

Generalized Purgatory P(N,m)

Lucifer repeatedly hides a number between 1 and m. Dante must try to guess the number. If he guesses correctly N times in a row, he goes to heaven. If he ever guesses incorrectly by overshooting Lucifer’s number, he goes to hell. If he guesses too low, his count of correct guesses starts over.

35/42
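One way to write P(N,m) down explicitly, using a hypothetical encoding in which position "P<k>" means that Dante has guessed correctly k times in a row ("P0" is the start). With a guess that is too low sending Dante back to the start, the last matrix of P(4,3) is exactly the matrix 1 S S / 0 1 S / 0 0 1 from the example figure, with S written as "P0".

```python
def purgatory(n, m):
    """Hypothetical encoding of Generalized Purgatory P(n, m).

    Position "P<k>" means Dante has guessed correctly k times in a row
    ("P0" is the start). Rows are Dante's guesses 1..m, columns are Lucifer's
    hidden numbers 1..m. Entries are 0 (hell), 1 (heaven) or a position name.
    """
    game = {}
    for k in range(n):
        matrix = []
        for guess in range(1, m + 1):
            row = []
            for hidden in range(1, m + 1):
                if guess > hidden:
                    row.append(0)            # overshooting: hell
                elif guess < hidden:
                    row.append("P0")         # too low: start over
                elif k + 1 == n:
                    row.append(1)            # n-th correct guess in a row: heaven
                else:
                    row.append(f"P{k + 1}")  # correct: one more in a row
            matrix.append(row)
        game[f"P{k}"] = matrix
    return game


for row in purgatory(4, 3)["P3"]:
    print(row)
```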

Interesting fact

Dante can make the probability of going to heaven from Purgatory arbitrarily close to 1 if he plays well enough.

36/42
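For the simplest case N = 1 this is easy to see. Against a fixed stationary strategy x of Dante, a constant hiding choice j is a best response for Lucifer (there is only one position), and then each round ends in heaven with probability $x_j$, starts over with probability $x_1 + \dots + x_{j-1}$, and ends in hell otherwise, so Dante wins with probability $x_j / (x_j + \dots + x_m)$. The sketch uses a hypothetical strategy that guesses i with probability proportional to $\delta^{i-1}$; it guarantees $1 / (1 + \delta + \dots + \delta^{m-1})$, which tends to 1 as δ goes to 0. It says nothing about how Dante should play for larger N.

```python
def guaranteed_heaven_probability(x):
    """P(1, m) with Dante's stationary mix x over guesses 1..m (0-based list).
    If Lucifer always hides number j + 1, Dante wins with probability
    x[j] / sum(x[j:]); Lucifer picks the worst j."""
    return min(x[j] / sum(x[j:]) for j in range(len(x)))


def geometric_mix(m, delta):
    """Hypothetical strategy: guess i with probability proportional to delta**(i-1)."""
    weights = [delta ** i for i in range(m)]
    total = sum(weights)
    return [w / total for w in weights]


for delta in (0.5, 0.1, 0.01, 0.001):
    print(delta, guaranteed_heaven_probability(geometric_mix(3, delta)))
```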

Exemplifying important facts (slides 37 to 41)

Panels: Value iteration on 1 matrix; Strategy iteration on 1 matrix; Strategy iteration on 3 matrices

[Figure: a game with a single 2×2 matrix, used in the two 1-matrix panels, and a game with three 2×2 matrices, with entries 1, 0 and pointers.]

t := 0: value iteration starts with value 0, and strategy iteration starts with the uniform strategy (0.5, 0.5) in every matrix.

t := 1: Value iteration on 1 matrix: 0.5. Strategy iteration on 1 matrix: value 0.5, new strategy (0.66667, 0.33333). Strategy iteration on 3 matrices: values 0.125, 0.25, 0.5 (start matrix, middle matrix, matrix with 1 on its diagonal); new strategies (0.66667, 0.33333), (0.57143, 0.42857), (0.53333, 0.46667).

t := 2: Value iteration on 1 matrix: 0.66667. Strategy iteration on 1 matrix: value 0.66667, new strategy (0.75, 0.25). Strategy iteration on 3 matrices: values 0.20317, 0.30476, 0.53333; new strategies (0.75, 0.25), (0.61765, 0.38235), (0.55654, 0.44346).

t := 3: Value iteration on 1 matrix: 0.75. Strategy iteration on 1 matrix: value 0.75, new strategy (0.8, 0.2). Strategy iteration on 3 matrices: values 0.25781, 0.34374, 0.55654; new strategies (0.8, 0.2), (0.65072, 0.34928), (0.57399, 0.42601).

41/42
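The value iteration numbers for 1 matrix on these slides can be checked exactly, assuming (as the figure residue suggests) that the single matrix has rows (1, pointer back to itself) and (0, 1). Replacing the pointer by the previous value v gives a 2×2 matrix game in which both players can equalize, and its value is $1 / (2 - v)$. Starting from 0 this gives $t/(t+1)$, so value iteration needs about 1/ε iterations to get within ε of the value 1. The function names are illustrative.

```python
from fractions import Fraction


def one_matrix_value(t):
    """Exact value of G_t for the single 2x2 matrix (1, pointer to itself; 0, 1).

    With the pointer replaced by the previous value v, Dante's equalizing mix
    plays the first row with probability 1 / (2 - v), which is also the value.
    """
    v = Fraction(0)  # G_0
    for _ in range(t):
        v = 1 / (2 - v)
    return v


def iterations_needed(eps):
    """Smallest t with value(G_t) > 1 - eps."""
    t, v = 0, Fraction(0)
    while v <= 1 - eps:
        v = 1 / (2 - v)
        t += 1
    return t


print([str(one_matrix_value(t)) for t in range(5)])
print(iterations_needed(Fraction(1, 100)))
```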

The end

Open problem: Find a fast algorithm for the problem

There exists a PSPACE algorithm for the problem, but it is not fast.

Thanks for listening

42/42
