Stochastic Equilibria under Imprecise Deviations in Terminal-Reward Concurrent Games

Patricia Bouyer, Nicolas Markey and Daniel Stan

CNRS, LSV, ENS Cachan & Université Paris-Saclay

GandALF 2016

1 Concurrent framework

2 Existence of equilibria

3 Imprecise deviations

The framework

Games with mixed strategies

Concurrent non-zero-sum games allow:

modelling heterogeneous systems;

several events to occur simultaneously;

agents' goals that are not necessarily antagonistic;

whereas mixed strategies enable:

breaking the symmetry (by randomization);

equilibria that are more likely to exist.

Main goals:

Synthesizing strategies;

As simple as possible.

Concurrent games: an example

[Figure: example game graph with terminal payoffs (0, 3) and (3, 0); edge labels aa, bb and ab, ba.]

Game on graph

Several agents

For each state $s \in \mathrm{States}$ and each agent $i \in \mathrm{Agt}$, a set $\mathrm{Allow}_i(s)$ of allowed actions

Transitions played concurrently

Terminal rewards

Also: stochastic transitions (players and environment)

Non-zero sum (cycling yields reward 0)
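
A minimal sketch of the ingredients listed above, assuming a single non-terminal state `s0` and two agents. The names `ALLOW`, `DELTA`, `REWARD`, `joint_actions` and `step` are hypothetical, and the way edges are mapped to terminal states is a guess rather than a transcription of the pictured graph. Transitions are kept deterministic for brevity, although the framework also allows stochastic ones.

```python
from itertools import product

# Hypothetical two-agent game; names and shape are illustrative assumptions.
ALLOW = {"s0": (("a", "b"), ("a", "b"))}   # Allow_i(s), one tuple per agent

# Concurrent transition: one action per agent determines the successor.
DELTA = {
    ("s0", ("a", "a")): "t1", ("s0", ("b", "b")): "t1",
    ("s0", ("a", "b")): "t2", ("s0", ("b", "a")): "t2",
}

# Terminal rewards: one value per agent; an infinite play would yield 0.
REWARD = {"t1": (0, 3), "t2": (3, 0)}

def joint_actions(state):
    """All action profiles available in a non-terminal state."""
    return list(product(*ALLOW[state]))

def step(state, profile):
    """Successor reached when all agents play `profile` simultaneously."""
    return DELTA[(state, tuple(profile))]
```

For instance, `REWARD[step("s0", ("a", "b"))]` evaluates to `(3, 0)` in this toy encoding.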

Definition (Strategies)

A strategy for agent $i$ is a function $\sigma_i$ such that, for every history $h \in \mathrm{States}^+$,

$\sigma_i(h) \in \mathrm{Dist}(\mathrm{Allow}_i(\mathrm{last}(h)))$

It depends on the sequence of visited states.

$\sigma_i \in M$ is stationary if $\sigma_i(h) = \sigma_i(\mathrm{last}(h))$ for all $h$;

$\sigma_i \in S$ is pure if every distribution gives probability 1 to some action;

mixed otherwise;

finite memory if . . .
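
The definitions above can be sketched as follows, assuming distributions are stored as dictionaries from actions to probabilities. The helper names `is_distribution`, `is_pure` and `stationary` are hypothetical and only illustrate the vocabulary.

```python
def is_distribution(dist, tol=1e-9):
    """Check that probabilities are non-negative and sum to 1."""
    return all(p >= 0 for p in dist.values()) and abs(sum(dist.values()) - 1) <= tol

def is_pure(dist, tol=1e-9):
    """A distribution is pure if it puts probability 1 on some action."""
    return any(abs(p - 1) <= tol for p in dist.values())

def stationary(table):
    """Lift a per-state table to a strategy on histories: sigma(h) = table[last(h)]."""
    def sigma(history):
        return table[history[-1]]
    return sigma

# A stationary strategy ignores everything but the last visited state.
sigma = stationary({"s0": {"a": 0.5, "b": 0.5}, "s1": {"a": 1.0}})
```

Here `sigma(["s0", "s1", "s0"])` and `sigma(["s0"])` return the same mixed distribution, while `sigma(["s0", "s1"])` is pure.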

Nash Equilibrium

Definition

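We denote by $\phi_i$ the reward function of agent $i$ and by $\sigma = (\sigma_i)_{i \in \mathrm{Agt}}$ the strategy profile.

$\sigma$ is a Nash Equilibrium (NE) if, for every agent $i$ and any other strategy $\sigma'_i$ for $i$ (deviation),

$\mathbb{E}^{\sigma[i/\sigma'_i]}(\phi_i) \le \mathbb{E}^{\sigma}(\phi_i)$

No agent has an incentive to deviate in order to increase its own expected reward.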
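
As an illustration of the definition, here is a hedged sketch for the simplest case only: a one-stage, two-agent game given by two payoff matrices. In that case the expected reward is linear in the deviator's distribution, so it is enough to test pure deviations; this shortcut is not claimed for games on graphs. The matrices `PAY1`, `PAY2` and the functions `expected` and `is_nash` are hypothetical.

```python
def expected(payoff, x, y):
    """Expected reward of one agent; payoff[a][b] is its reward for actions (a, b)."""
    return sum(x[a] * y[b] * payoff[a][b]
               for a in range(len(x)) for b in range(len(y)))

def is_nash(pay1, pay2, x, y, tol=1e-9):
    """True if no agent gains by switching to any pure action."""
    base1, base2 = expected(pay1, x, y), expected(pay2, x, y)
    for a in range(len(x)):
        pure = [1.0 if k == a else 0.0 for k in range(len(x))]
        if expected(pay1, pure, y) > base1 + tol:
            return False          # agent 1 has a profitable deviation
    for b in range(len(y)):
        pure = [1.0 if k == b else 0.0 for k in range(len(y))]
        if expected(pay2, x, pure) > base2 + tol:
            return False          # agent 2 has a profitable deviation
    return True

# Hypothetical one-stage game whose outcomes pay (1, 0) or (0, 1).
PAY1 = [[1, 0], [0, 1]]
PAY2 = [[0, 1], [1, 0]]
```

With these matrices, the uniform profile passes the check, whereas the pure profile in which both agents play their first action does not.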

[Figure: example game with terminal payoffs (1, 0) and (0, 1). The profile $\sigma_1 = \sigma_2 = \frac{1}{3}(\cdot + \cdot + \cdot)$, uniform over three actions, is a NE; the action symbols and the value of its reward $\mathbb{E}^{\sigma}(\phi)$ were lost in extraction.]

Existence of equilibria under mixed strategies

Theorem (Nash 1950)

Every one-stage game has a Nash Equilibrium in mixed strategies.

Theorem (Secchi and Sudderth 2001)

A NE always exists for qualitative safety objectives, in finite-memory strategies.

Theorem (Chatterjee et al. 2004)

For every $\varepsilon > 0$, an ε-Nash Equilibrium always exists for terminal rewards, in stationary strategies:

$\mathbb{E}^{\sigma[i/\sigma'_i]}(\phi_i) \le \mathbb{E}^{\sigma}(\phi_i) + \varepsilon$

Does a mixed Nash Equilibrium always exist?

Idea: two-player concurrent zero-sum games may have no optimal strategies, but only ε-optimal strategies (for every $\varepsilon > 0$).

[Figure: Hide-or-Run game, with terminal payoffs (1, −1) and (−1, 1); edge labels hs, rw and rs.]

[Figure: shifted Hide-or-Run game, with payoffs (2, 0), (0, 2) and (1, 1); edge labels hs, rw and rs.]

The value problem in a zero-sum game is not a special case of the Nash Equilibrium problem with positive terminal rewards.

NE on graphs are harder

Theorem (Bouyer et al. [2014])

The existence problem is undecidable for 3-player concurrent games with non-negative terminal rewards and a constraint. It also holds for arbitrary terminal rewards without constraints.

Theorem (Ummels [2010])

There exist games where Nash equilibria require finite memory.

Existence of equilibria

Equilibria existence proof: general scheme.

Let M be the set of stationary strategy profiles.

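Consider the best-response function

$\mathrm{BR}_i : M \to 2^{M}$

mapping a profile to the best responses of agent $i$ against it.

Consider the global best response $\mathrm{BR} = (\mathrm{BR}_i)_{i \in \mathrm{Agt}}$ and show that it is continuous.

Apply Kakutani's fixed-point theorem: there exists $\sigma$ such that $\sigma \in \mathrm{BR}(\sigma)$.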

Continuity of BR is based on termination assumptions.

One-stage termination;

Safety condition, discounted reward;

Enforce terminating actions.

Limit behaviour

War of wits: the first player to play b gives up the game and loses.

Write $(x, y) = (\sigma_1(b \mid s_1), \sigma_2(b \mid s_2))$.

[Figure: game graph with terminal payoffs (1, 2) and (2, 1); edge labels a− and b−.]

NE strategy profiles: $(x, 0)$ and $(0, x)$ for any $0 < x \le 1$.

NE payoffs: $(1, 2)$ and $(2, 1)$, respectively.

The graph of the BR function is not continuous at $(0, 0)$.
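
The discontinuity can be checked numerically under an assumed shape for the game, which the extracted figure does not fully determine: in `s1` agent 1 either plays a (move to `s2`) or b (stop with payoff (1, 2)); in `s2` agent 2 either plays a (back to `s1`) or b (stop with payoff (2, 1)); cycling forever yields (0, 0). The function name `war_of_wits` is hypothetical.

```python
def war_of_wits(x, y):
    """Expected payoffs from s1 when agent 1 plays b with probability x in s1
    and agent 2 plays b with probability y in s2 (stationary strategies)."""
    stop = x + y - x * y          # probability that one round s1 -> s2 -> s1 stops
    if stop == 0:
        return (0.0, 0.0)         # nobody ever plays b: the play cycles forever
    p1 = x / stop                 # agent 1 is the first to play b
    p2 = (1 - x) * y / stop       # agent 2 is the first to play b
    return (p1 * 1 + p2 * 2, p1 * 2 + p2 * 1)
```

Under these assumptions `war_of_wits(x, 0)` equals `(1.0, 2.0)` for every `x > 0`, however small, while `war_of_wits(0, 0)` is `(0.0, 0.0)`: the payoff jumps at the origin, which is what breaks continuity of the best response.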

Forcing termination: non-cycling game

A state is cycling if the agents can enforce a cycle, with no terminating deviation available to any agent.

[Figure: game with terminal payoffs (1, 2) and (2, 1) and edge labels aa, ab, ba; ⇔ terminal payoff (0, 0).]

Any equilibrium in the reduced game is an equilibrium in the original game.

W.l.o.g., we assume that there is always a player who can ensure termination with positive probability.

Cycle constraints: sketch

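$C \subseteq \mathrm{States}$ is cycling if there exists a strategy profile that ensures staying in $C$ from any state of $C$.

[Figure: example game with terminal payoffs (0, 3) and (3, 0); edge labels aa, bb.]

Constraints: $\sigma_1(b \mid s_1) \ge \varepsilon$; $\sigma_2(b \mid s_2) \ge \varepsilon$; $\forall i, j, x\ \ \sigma_i(x \mid t_j) \ge \varepsilon$.

Denote by $\Delta_\varepsilon \subseteq M$ the set of stationary strategy profiles satisfying these constraints.

Remark: $\Delta_\varepsilon = \prod_i \Delta^{(i)}_\varepsilon$.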
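
A small sketch of why the constraint set is a product: each lower bound involves a single agent's strategy, so membership can be tested agent by agent. The data layout (`profile[i][s][a]` for the probability of action `a` in state `s`) and the names `agent_ok` and `in_delta_eps` are assumptions made for illustration.

```python
def agent_ok(strategy, constraints, eps):
    """Membership of one agent's stationary strategy in its own factor:
    every constrained (state, action) pair gets probability at least eps."""
    return all(strategy[s][a] >= eps for (s, a) in constraints)

def in_delta_eps(profile, constraints, eps):
    """A profile satisfies the constraints iff each agent's strategy does."""
    return all(agent_ok(profile[i], constraints[i], eps) for i in profile)

# Hypothetical instance of the first two constraints listed above.
constraints = {1: [("s1", "b")], 2: [("s2", "b")]}
profile = {
    1: {"s1": {"a": 0.9, "b": 0.1}},
    2: {"s2": {"a": 0.8, "b": 0.2}},
}
```

With `eps = 0.1` this profile is accepted; setting agent 2's probability of `b` to 0 makes the test fail without touching agent 1's factor.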

Bounding probability of termination

Theorem

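For every $\varepsilon > 0$, there exist $p > 0$ and $k \in \mathbb{N}$ such that for any $\sigma \in \Delta_\varepsilon$,

$\forall s \in \mathrm{States}\quad \mathbb{P}^{\sigma}(s \cdot \mathrm{States}^{\le k} \cdot F) \ge p$

That is, within $k$ iterations, a final state is reached with probability bounded from below by $p$.

This ensures almost-sure termination, and even more.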
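
Once a stationary profile is fixed, the game becomes a Markov chain, and the bounded-horizon probability in the theorem can be computed by a simple backward iteration. The sketch below assumes the chain is given as a dictionary of successor distributions; the function `reach_within` and the two-state loop used as an example are hypothetical.

```python
def reach_within(chain, final, k):
    """Probability, from each state, of reaching a state of `final`
    in at most k transitions of the Markov chain `chain`."""
    prob = {s: (1.0 if s in final else 0.0) for s in chain}
    for _ in range(k):
        prob = {
            s: 1.0 if s in final
            else sum(p * prob[t] for t, p in chain[s].items())
            for s in chain
        }
    return prob

# Hypothetical loop s1 <-> s2 where each state stops with probability eps = 0.1.
eps = 0.1
chain = {
    "s1": {"t1": eps, "s2": 1 - eps},
    "s2": {"t2": eps, "s1": 1 - eps},
    "t1": {"t1": 1.0},
    "t2": {"t2": 1.0},
}
```

In this example, two transitions already give a termination probability of about 0.19 from either non-final state, a bound that depends only on `eps` and not on the rest of the profile.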

Fixed-point

Theorem (Best response function)

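$\mathrm{BR}_\varepsilon(\sigma) = \left\{ \sigma' \in \Delta_\varepsilon \;\middle|\; \forall i \in \mathrm{Agt}\ \forall s \in \mathrm{States},\ \mathbb{E}^{\sigma[i/\sigma'_i]}(\phi_i \mid s) \ge \mathbb{E}^{\sigma}(\phi_i \mid s) \right\}$

For $0 < \varepsilon \le \frac{1}{|\mathrm{Act}|}$, $\mathrm{BR}_\varepsilon$ has a fixed point $\sigma \in \mathrm{BR}_\varepsilon(\sigma)$ in $\Delta_\varepsilon$.

Sketch.

The graph of $\mathrm{BR}_\varepsilon$ is continuous over $\Delta_\varepsilon$ and, for any $\sigma \in \Delta_\varepsilon$, $\mathrm{BR}_\varepsilon(\sigma)$ is a non-empty closed convex set, so Kakutani's fixed-point theorem (Kakutani [1941]) applies.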

Imprecise deviations

The previous fixed point is stationary;

it is not necessarily a NE.

Definition

$\sigma \in S$ is an equilibrium under ε-imprecise deviations if, for any player $i$ and any history $h$,

$\forall \sigma'_i\ \exists \sigma''_i\quad d(\sigma'_i, \sigma''_i) \le \varepsilon,\quad \mathbb{E}^{\sigma[i/\sigma''_i]}(\phi_i \mid h) \le \mathbb{E}^{\sigma}(\phi_i \mid h)$

with $d(\sigma, \sigma')$ the maximal distance between distributions.

Every NE is an equilibrium under ε-imprecise deviations.

ε-NE and equilibria under imprecise deviations are incomparable.
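
The slides do not spell out the distance $d$. One plausible reading, sketched here for stationary strategies only, is the largest pointwise gap between the two distributions over all states and actions. This interpretation, the data layout (`sigma[s][a]`) and the function name `distance` are assumptions.

```python
def distance(sigma1, sigma2):
    """Largest gap |sigma1(a | s) - sigma2(a | s)| over all states s and actions a.
    Both strategies are assumed to be defined on the same states and actions."""
    return max(
        abs(sigma1[s][a] - sigma2[s][a])
        for s in sigma1
        for a in sigma1[s]
    )

# A deviation and a nearby perturbation of it.
dev = {"s": {"a": 0.5, "b": 0.5}}
near = {"s": {"a": 0.6, "b": 0.4}}
```

Under this reading, `near` is an admissible imprecise version of `dev` for any ε of roughly 0.1 or more.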

Memoryless deviations

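Do fixed points of $\mathrm{BR}_\varepsilon$ fall into this category of equilibria?

(Restriction to memoryless deviations)

$\forall \sigma'_i \in M_i\ \exists \sigma''_i \in M_i\quad d(\sigma'_i, \sigma''_i) \le \varepsilon,\quad \mathbb{E}^{\sigma[i/\sigma''_i]}(\phi_i \mid h) \le \mathbb{E}^{\sigma}(\phi_i \mid h)$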

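Consider $\sigma \in M$ and an agent $i$; we construct a new game $G\langle\sigma\rangle_{-i}^{\varepsilon}$ of the following form:

[Figure: gadget for a move of probability $\sigma_i(a \mid s)$, with branches labelled by the intervals $[0, \varepsilon]$, $[0, 2\varepsilon]$, $[1 - 2\varepsilon, 1]$ and $[1 - \varepsilon, 1]$.]

The game is turn-based;

all players except $i$ have a fixed strategy;

$i$ plays against an antagonistic player $\bar{i}$;

$i$ suggests a new move;

then $\bar{i}$ has some latitude to deviate.

Conclusion: turn-based games are determined (Liggett and Lippman [1969]), hence we can assume deviations to be memoryless.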

Other consequence: computing a NE is in PSPACE

Theorem

Assume the number of actions is fixed. For every $\varepsilon > 0$ and $(x_i)_i, (y_i)_i \in \mathbb{R}^{\mathrm{Agt}}$, one can decide in polynomial space whether there exists a stationary equilibrium $\sigma$ under ε-imprecise deviations such that, for every $i \in \mathrm{Agt}$, $x_i \le \mathbb{E}^{\sigma}(\phi_i \mid s_0) \le y_i$.

Sketch.

As for classical NE (Ummels and Wojtczak [2011]), we first perform a non-deterministic polynomial-time pre-computation, which encodes a stability formula $\varphi \in \mathrm{FO}(\mathbb{R}, \le)$.

$\varphi$ encodes the existence of some kind of stable profile:

it relies on the construction of $G\langle\sigma\rangle_{-i}^{\varepsilon}$.

Overview

Getting closer to exact NE existence problem (2 players),

Use of linear constraints to enforce a non-linear property (termination),

New notion of equilibria, not equivalent to previous ones,

Still hope for exact NE in non-negative terminal reward games,

NP-hardness can be adapted to this new notion.

Thank you for your attention.

Bibliography

Patricia Bouyer, Nicolas Markey, and Daniel Stan. Mixed Nash equilibria in concurrent games. In Proceedings of the 34th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'14), volume 29 of Leibniz International Proceedings in Informatics, pages 351–363. Leibniz-Zentrum für Informatik, December 2014. doi: 10.4230/LIPIcs.FSTTCS.2014.351. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/BMS-fsttcs14.pdf.

Krishnendu Chatterjee, Marcin Jurdziński, and Rupak Majumdar. On Nash equilibria in stochastic games. In CSL'04, volume 3210 of LNCS, pages 26–40. Springer, 2004.

Shizuo Kakutani. A generalization of Brouwer's fixed point theorem. Duke Math. J., 8(3):457–459, 1941. doi: 10.1215/S0012-7094-41-00838-4. URL http://dx.doi.org/10.1215/S0012-7094-41-00838-4.

Thomas M. Liggett and Steven A. Lippman. Short notes: Stochastic games with perfect information and time average payoff. SIAM Review, 11(4):604–607, 1969. ISSN 00361445. URL http://www.jstor.org/stable/2029090.

John F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America, 36(1):48–49, 1950.

Piercesare Secchi and William D. Sudderth. Stay-in-a-set games. International Journal of Game Theory, 30:479–490, 2001.

Michael Ummels. Stochastic Multiplayer Games: Theory and Algorithms. Ph.D. Thesis, Department of Computer Science, RWTH Aachen, Germany, January 2010. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/ummels-phd10.pdf.

Michael Ummels and Dominik Wojtczak. The complexity of Nash equilibria in stochastic multiplayer games. Logical Methods in Computer Science, 7(3), 2011.
