Stochastic Equilibria under Imprecise Deviations in Terminal-Reward Concurrent Games

Patricia Bouyer, Nicolas Markey and Daniel Stan

CNRS, LSV, ENS Cachan & Université Paris Saclay

GandALF 2016


Slides: perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdf


1 Concurrent framework

2 Existence of equilibria

3 Imprecise deviations


The framework

Games with mixed strategies

Concurrent non-zero-sum games allow . . .

heterogeneous systems to be modelled;

several events to occur simultaneously;

agents' goals not to be necessarily antagonistic;

whereas mixed strategies enable . . .

breaking the symmetry (by randomization);

equilibria to be more likely to exist.

Main goals:

synthesizing strategies;

as simple as possible.


Concurrent games: an example

[Figure: game on states s1, s2, t1, t2 with terminal rewards (0, 3) and (3, 0); edges labelled a−, b−, −a, −b, and aa,bb / ab,ba.]

Game on a graph

Several agents

For each state $s \in \mathrm{States}$ and each agent $i \in \mathrm{Agt}$, a set $\mathrm{Allow}_i(s)$ of allowed actions

Transitions played concurrently

Terminal rewards

Also: stochastic transitions (players and environment)

Non-zero-sum (cycling forever yields reward 0)
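The ingredients listed above could be encoded along the following lines. This is only a minimal Python sketch, not the authors' formalization: the class `ConcurrentGame`, its fields `allow`, `delta` and `reward`, and the small example game are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class ConcurrentGame:
    agents: list   # Agt
    allow: dict    # allow[state][agent] -> list of allowed actions
    delta: dict    # delta[(state, joint_action)] -> {successor: probability}
    reward: dict   # reward[terminal_state] -> one reward per agent

    def is_terminal(self, state):
        return state in self.reward

    def joint_actions(self, state):
        """All action tuples the agents can play concurrently in `state`."""
        return list(product(*(self.allow[state][i] for i in self.agents)))

    def check(self):
        """Every joint action of every non-terminal state has a proper distribution."""
        for state in self.allow:
            for joint in self.joint_actions(state):
                dist = self.delta[(state, joint)]
                assert abs(sum(dist.values()) - 1.0) < 1e-9
        return True


# Hypothetical two-agent game (NOT the one drawn on the slide).
game = ConcurrentGame(
    agents=[0, 1],
    allow={"s": {0: ["a", "b"], 1: ["a", "b"]}},
    delta={
        ("s", ("a", "a")): {"s": 1.0},                      # stay in s
        ("s", ("b", "b")): {"s": 1.0},                      # stay in s
        ("s", ("a", "b")): {"win0": 1.0},                   # deterministic exit
        ("s", ("b", "a")): {"win0": 0.5, "win1": 0.5},      # stochastic exit
    },
    reward={"win0": (3, 0), "win1": (0, 3)},
)
```

A play that never reaches a key of `reward` would be given reward 0 for every agent, matching the convention for cycling plays.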


Definition (Strategies)

A strategy for agent $i$ is a function $\sigma_i$ such that, for all $h \in \mathrm{States}^+$,

$\sigma_i(h) \in \mathrm{Dist}(\mathrm{Allow}_i(\mathrm{last}(h)))$

A strategy depends on the sequence of visited states.

$\sigma_i \in M$ is stationary if $\forall h,\ \sigma_i(h) = \sigma_i(\mathrm{last}(h))$;

$\sigma_i \in S$ is pure if all its distributions give probability 1 to some action;

mixed otherwise;

finite memory if . . .
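The definitions above can be mirrored in a few lines of Python. This is a sketch under stated assumptions: a stationary strategy is represented as a dictionary from states to distributions, and the names `is_distribution`, `as_history_strategy`, `is_pure` and the sample data are hypothetical.

```python
def is_distribution(dist, allowed, tol=1e-9):
    """dist maps actions to probabilities and must be supported on `allowed`."""
    return (set(dist) <= set(allowed)
            and all(p >= 0 for p in dist.values())
            and abs(sum(dist.values()) - 1.0) < tol)


def as_history_strategy(stationary):
    """Lift a stationary strategy (state -> distribution) to histories in States+."""
    return lambda history: stationary[history[-1]]


def is_pure(stationary, tol=1e-9):
    """Pure: in every state, some action gets probability 1."""
    return all(any(abs(p - 1.0) < tol for p in dist.values())
               for dist in stationary.values())


# Hypothetical Allow_i and two stationary strategies for one agent.
allow_i = {"s1": ["a", "b"], "s2": ["a"]}
mixed = {"s1": {"a": 0.5, "b": 0.5}, "s2": {"a": 1.0}}
pure = {"s1": {"b": 1.0}, "s2": {"a": 1.0}}
sigma = as_history_strategy(mixed)
```

By construction `sigma` returns the same distribution for any two histories ending in the same state, which is the stationarity condition.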


Nash Equilibrium

Definition

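We denote by $\phi_i$ the reward function of agent $i$, and by $\sigma = (\sigma_i)_{i \in \mathrm{Agt}}$ the strategy profile. $\sigma$ is a Nash Equilibrium (NE) if, for every agent $i$ and any other strategy $\sigma'_i$ for $i$ (a deviation),

$E^{\sigma[i/\sigma'_i]}(\phi_i) \le E^{\sigma}(\phi_i)$

No agent has an incentive to deviate in order to increase its own expected reward.

[Figure: game from state s0 with terminal rewards (1, 0) and (0, 1); the action symbols labelling the edges were lost in extraction.]

$\sigma_1 = \sigma_2 = \frac{1}{3}(\cdot + \cdot + \cdot)$, the uniform distribution over the three actions (symbols lost in extraction), is a NE of reward $E^{\sigma}(\phi) = \left(\frac{1}{2}, \frac{1}{2}\right)$.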
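The claimed reward can be reproduced numerically under an explicit guess about the lost figure. The sketch below ASSUMES three actions that beat each other cyclically (rock-paper-scissors style), that equal actions loop back to s0, and that the winner's terminal reward is (1, 0) or (0, 1); none of this is confirmed by the transcript. It also only tests pure stationary deviations, so it is a sanity check rather than a proof.

```python
from fractions import Fraction

ACTIONS = [0, 1, 2]  # stand-ins for the three lost action symbols


def outcome(a1, a2):
    """Assumed rule: action k beats action (k - 1) mod 3; equal actions loop on s0."""
    if a1 == a2:
        return "loop"
    return "win1" if (a1 - a2) % 3 == 1 else "win2"


def expected_reward(sigma1, sigma2):
    """Expected terminal reward of a stationary profile, started in s0."""
    p = {"win1": Fraction(0), "win2": Fraction(0), "loop": Fraction(0)}
    for a1 in ACTIONS:
        for a2 in ACTIONS:
            p[outcome(a1, a2)] += sigma1[a1] * sigma2[a2]
    stop = p["win1"] + p["win2"]
    if stop == 0:  # the play cycles forever: reward 0 for both agents
        return (Fraction(0), Fraction(0))
    return (p["win1"] / stop, p["win2"] / stop)


def pure(k):
    """Stationary strategy playing action k with probability 1."""
    return [Fraction(int(j == k)) for j in ACTIONS]


uniform = [Fraction(1, 3)] * 3
```

Under these assumptions, every pure stationary deviation against the uniform strategy still yields exactly 1/2 to the deviating agent, so no such deviation is profitable.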


Existence of equilibria under mixed strategies

Theorem (Nash 1950)

Every one-stage game has a Nash Equilibrium in mixed strategies.

Theorem (Secchi and Sudderth 2001)

A NE always exists for qualitative safety objectives, in finite-memory strategies.

Theorem (Chatterjee et al. 2004)

For every ε > 0, an ε-Nash Equilibrium always exists for terminal rewards, in stationary strategies:

$E^{\sigma[i/\sigma'_i]}(\phi_i) \le E^{\sigma}(\phi_i) + \varepsilon$


Does a mixed Nash Equilibrium always exist?

Idea: two-player concurrent zero-sum games may not have optimal strategies, but only ε-optimal strategies (for any ε > 0).

[Figure: Hide-or-Run game, with terminal rewards (1, −1) and (−1, 1); edges labelled hs,rw / rs / hw.]

[Figure: Shifted Hide-or-Run game, with terminal rewards (2, 0), (0, 2) and (1, 1); edges labelled hs,rw / rs / hw.]

The value problem in a zero-sum game is not a special case of the Nash equilibrium problem with positive terminal rewards.
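The gap between optimal and ε-optimal strategies in the Hide-or-Run game can be observed numerically. The sketch below rests on assumptions: the edge labels are read as agent 1 hiding (h) or running (r) and agent 2 waiting (w) or shooting (s), with hs and rw leading to reward (1, −1), rs to (−1, 1), hw looping, and an infinite loop worth 0. Both agents are restricted to stationary strategies, and the function names are hypothetical.

```python
def reward_agent1(p_run, q_shoot):
    """Expected reward of agent 1 for stationary probabilities of run and shoot."""
    win = (1 - p_run) * q_shoot + p_run * (1 - q_shoot)   # hs or rw
    lose = p_run * q_shoot                                # rs
    stop = win + lose                                     # otherwise hw loops
    if stop == 0:
        return 0.0        # hide and wait forever: cycling reward 0
    return (win - lose) / stop


def guaranteed(p_run, grid=1000):
    """Worst case for agent 1 over a grid of stationary replies of agent 2."""
    return min(reward_agent1(p_run, k / grid) for k in range(grid + 1))
```

With these conventions, `guaranteed(p)` equals 1 − 2p for p > 0 but drops to 0 at p = 0: agent 1 can get arbitrarily close to reward 1 by running with a small probability, yet never running guarantees only 0.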


NE on graphs are harder

Theorem (Bouyer et al. [2014])

The existence problem is undecidable for 3-player concurrent games with non-negative terminal rewards and a constraint. It also holds for arbitrary terminal rewards without constraints.

Theorem (Ummels [2010])

There exist games where Nash equilibria require finite memory.


Existence of equilibria


Equilibria existence proof: general scheme.

Let $M$ be the set of stationary strategy profiles.

Consider the best-response function

$\mathrm{BR}_i : M \to 2^{M}$

mapping a profile to the best strategies of agent $i$ against it.

Consider the global best response $\mathrm{BR} = (\mathrm{BR}_i)_{i \in \mathrm{Agt}}$ and show that it is continuous.

Apply Kakutani's fixed-point theorem: $\exists \sigma,\ \sigma \in \mathrm{BR}(\sigma)$.

Continuity of BR relies on termination assumptions:

One-stage termination;

Safety condition, discounted reward;

Enforce terminating actions.
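In the one-stage case, the fixed-point condition σ ∈ BR(σ) can be tested directly: a profile is a fixed point exactly when each agent only plays actions that are best replies to the other agent's mix. The following Python sketch does this for two agents; the payoff matrices (matching pennies with rewards 0 and 1) and all function names are hypothetical illustrations, not taken from the slides.

```python
def pure_best_responses(values, tol=1e-9):
    """Indices of the actions whose expected reward is maximal."""
    best = max(values)
    return {k for k, v in enumerate(values) if v >= best - tol}


def is_fixed_point(pay1, pay2, sigma1, sigma2, tol=1e-9):
    """Test sigma in BR(sigma) for a one-stage two-agent game.

    pay1[a][b] and pay2[a][b] are the rewards of agents 1 and 2 when
    agent 1 plays a and agent 2 plays b.
    """
    n1, n2 = len(sigma1), len(sigma2)
    # Expected reward of each pure action against the opponent's mix.
    vals1 = [sum(sigma2[b] * pay1[a][b] for b in range(n2)) for a in range(n1)]
    vals2 = [sum(sigma1[a] * pay2[a][b] for a in range(n1)) for b in range(n2)]
    supp1 = {a for a in range(n1) if sigma1[a] > tol}
    supp2 = {b for b in range(n2) if sigma2[b] > tol}
    return (supp1 <= pure_best_responses(vals1, tol)
            and supp2 <= pure_best_responses(vals2, tol))


# Hypothetical one-stage game: agent 1 wins on a match, agent 2 on a mismatch.
PAY1 = [[1, 0], [0, 1]]
PAY2 = [[0, 1], [1, 0]]
```

Here the uniform profile passes the test, while any pure profile fails it, which is why mixed strategies are needed for the fixed point to exist.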


Limit behaviour

War of wits: the first player to play b gives up the game and loses.

Write $(x, y) = (\sigma_1(b \mid s_1), \sigma_2(b \mid s_2))$.

[Figure: game on states s1 and s2 with terminal rewards (1, 2) and (2, 1); edges labelled a−, b−, −a, −b.]

NE strategy profiles: $(x, 0)$ and $(0, x)$ for any $1 \ge x > 0$.

NE payoffs: (1, 2) and (2, 1), respectively.

The graph of the BR function is not continuous at (0, 0).
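The limit behaviour at (0, 0) shows up in the expected rewards themselves. The sketch below assumes a reading of the lost figure: the play starts in s1, action a moves to the other state, agent 1 playing b in s1 ends the game with reward (1, 2), agent 2 playing b in s2 ends it with reward (2, 1), and cycling forever is worth (0, 0). The function name `payoff` is hypothetical.

```python
from fractions import Fraction


def payoff(x, y):
    """Expected rewards from s1 when x = sigma_1(b | s1) and y = sigma_2(b | s2)."""
    if x == 0 and y == 0:
        return (Fraction(0), Fraction(0))   # nobody ever gives up: reward 0
    x, y = Fraction(x), Fraction(y)
    cont = (1 - x) * (1 - y)                # probability of coming back to s1
    quit1 = x / (1 - cont)                  # agent 1 is the first to play b
    quit2 = (1 - x) * y / (1 - cont)        # agent 2 is the first to play b
    return (quit1 * 1 + quit2 * 2, quit1 * 2 + quit2 * 1)
```

Along (x, 0) the payoff is constantly (1, 2) and along (0, y) it is constantly (2, 1), however small x or y is, while at (0, 0) it jumps to (0, 0).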


Forcing termination: non-cycling game

A state is cycling if the agents can enforce a cycle, without any agent having a terminating deviation.

[Figure: game on states s1 and s2 with terminal rewards (1, 2) and (2, 1), edges labelled aa,ab,ba and bb; marked as equivalent (⇔) to a terminal state with reward (0, 0).]

Any equilibrium in the reduced game is an equilibrium in the original game.

W.l.o.g., we assume that there is always a player who can ensure termination with positive probability.


Cycle constraints: sketch

C ⊆ States is cycling if there exists a strategy profile ensuring a stay in Cfrom any state in C .

[Figure: game with states s1, s2, t1, t2 and terminal payoffs (0, 3) and (3, 0); edges are labelled a−, b−, −a, −b, and aa, bb / ab, ba.]

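\[ \sigma_1(b \mid s_1) \ge \varepsilon; \qquad \sigma_2(b \mid s_2) \ge \varepsilon; \qquad \forall i, j, x \quad \sigma_i(x \mid t_j) \ge \varepsilon \]

Denote by $\Delta_\varepsilon \subseteq M$ the set of stationary strategy profiles satisfying these constraints.

Remark: $\Delta_\varepsilon = \prod_i \Delta^{(i)}_\varepsilon$.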
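
As a minimal sketch of what membership in ∆ε means, the Python snippet below checks a stationary profile against lower-bound constraints of the form σi(x | s) ≥ ε. The nested-dictionary layout, the tolerance parameter and the name `in_delta_eps` are assumptions made for illustration; they do not come from the talk.

```python
from typing import Dict, List, Tuple

# A stationary strategy maps each state to a distribution over actions.
Strategy = Dict[str, Dict[str, float]]
# A constraint (i, s, x) stands for sigma_i(x | s) >= eps.
Constraint = Tuple[int, str, str]


def in_delta_eps(profile: Dict[int, Strategy],
                 constraints: List[Constraint],
                 eps: float,
                 tol: float = 1e-9) -> bool:
    """Return True if all distributions are valid and every constraint holds."""
    for strategy in profile.values():
        for dist in strategy.values():
            if any(p < -tol for p in dist.values()):
                return False
            if abs(sum(dist.values()) - 1.0) > tol:
                return False
    return all(profile[i][s].get(x, 0.0) >= eps - tol
               for (i, s, x) in constraints)


# Constraints of the example: sigma_1(b | s1) >= eps, sigma_2(b | s2) >= eps,
# and every action of every player gets at least eps in t1 and t2.
constraints = [(1, "s1", "b"), (2, "s2", "b")] + [
    (i, t, x) for i in (1, 2) for t in ("t1", "t2") for x in ("a", "b")
]

uniform = {"a": 0.5, "b": 0.5}
profile = {i: {s: dict(uniform) for s in ("s1", "s2", "t1", "t2")}
           for i in (1, 2)}
ok = in_delta_eps(profile, constraints, eps=0.1)
```

Each constraint mentions a single player, so the check can be run player by player, in line with the product form of ∆ε noted in the remark.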

15 / 26

Page 65: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Existence of equilibria

Bounding probability of termination

Theorem

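For every $\varepsilon > 0$, there exist $p > 0$ and $k \in \mathbb{N}$ such that for any $\sigma \in \Delta_\varepsilon$,

\[ \forall s \in \mathrm{States} \quad \mathbb{P}^{\sigma}(s \cdot \mathrm{States}^{\le k} \cdot F) \ge p \]

That is to say, from every state, a final state is reached after at most k intermediate states with probability at least p, and this bound is uniform over $\Delta_\varepsilon$.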

Ensures almost-sure termination, and even more.
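
The remark can be unpacked with a short calculation. If, from every state, a final state is reached within one block of bounded length with probability at least p, then n consecutive blocks all fail with probability at most (1 − p)^n, which tends to 0 as n grows. The sketch below evaluates this bound; the function names and sample values are illustrative assumptions, not part of the talk.

```python
import math


def non_termination_bound(p: float, blocks: int) -> float:
    """Upper bound on the probability that `blocks` consecutive blocks all fail."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if blocks < 0:
        raise ValueError("blocks must be non-negative")
    return (1.0 - p) ** blocks


def blocks_needed(p: float, delta: float) -> int:
    """Smallest n >= 1 such that (1 - p)**n <= delta, for delta in (0, 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if p == 1.0:
        return 1
    return max(1, math.ceil(math.log(delta) / math.log(1.0 - p)))


# Sample values, chosen only for illustration.
bound_after_three = non_termination_bound(0.5, 3)
n_for_one_percent = blocks_needed(0.5, 0.01)
```

Since the bound decays geometrically, it also suggests that the expected number of blocks before termination is at most 1/p, which is stronger than almost-sure termination alone.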

16 / 26

Page 67: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Existence of equilibria

Fixed-point

Theorem (Best response function)

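Let

\[ BR_\varepsilon(\sigma) = \left\{ \sigma' \in \Delta^{(i)}_\varepsilon \;\middle|\; \forall i \in \mathrm{Agt}\ \forall s \in \mathrm{States},\ \mathbb{E}^{\sigma[i/\sigma'_i]}(\phi_i \mid s) \ge \mathbb{E}^{\sigma}(\phi_i \mid s) \right\} \]

For $0 < \varepsilon \le \frac{1}{|\mathrm{Act}|}$, $BR_\varepsilon$ has a fixed point $\sigma \in BR_\varepsilon(\sigma)$ in $\Delta_\varepsilon$.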

Sketch.

The graph of BRε is continuous over ∆ε, and for any σ ∈ ∆ε, BRε(σ) is a non-empty closed convex set, so the fixed-point theorem (Kakutani [1941]) applies.

17 / 26

Page 70: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Imprecise deviations

1 Concurrent framework

2 Existence of equilibria

3 Imprecise deviations

18 / 26

Page 71: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Imprecise deviations

The previous fixed point is stationary;

Not necessarily a NE.

Definition

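σ ∈ S is an equilibrium under ε-imprecise deviations if, for any player i,

\[ \forall \sigma'_i \ \exists \sigma''_i \quad d(\sigma'_i, \sigma''_i) \le \varepsilon \ \text{ and } \ \mathbb{E}^{\sigma[i/\sigma''_i]}(\phi_i \mid h) \le \mathbb{E}^{\sigma}(\phi_i \mid h), \]

where d(σ, σ′) is the maximal distance between distributions.

Every NE is an equilibrium under ε-imprecise deviations.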

ε-NE and equilibria under imprecise deviations are incomparable.
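
The slide does not spell out d beyond "the maximal distance between distributions". One possible reading, assumed here for illustration only, takes the largest absolute difference between action probabilities over all states. The sketch below implements that reading for stationary strategies; the dictionary layout and the name `max_distance` are hypothetical.

```python
from typing import Dict

Strategy = Dict[str, Dict[str, float]]  # state -> action -> probability


def max_distance(sigma: Strategy, sigma_prime: Strategy) -> float:
    """Largest |sigma(a | s) - sigma'(a | s)| over all states s and actions a.

    This is an assumed reading of d, not a definition taken from the talk.
    Missing states or actions are treated as probability 0.
    """
    worst = 0.0
    for state in set(sigma) | set(sigma_prime):
        d1 = sigma.get(state, {})
        d2 = sigma_prime.get(state, {})
        for action in set(d1) | set(d2):
            worst = max(worst, abs(d1.get(action, 0.0) - d2.get(action, 0.0)))
    return worst


# A deviation the player intends, and a nearby strategy actually played.
intended = {"s1": {"a": 1.0, "b": 0.0}}
played = {"s1": {"a": 0.95, "b": 0.05}}
is_eps_close = max_distance(intended, played) <= 0.1
```

Under this reading, a pure deviation can always be replaced by a slightly randomized one, which is the kind of imprecision the definition allows for.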

19 / 26

Page 74: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Imprecise deviations

Memoryless deviations

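Do fixed points of BRε fall into this category of equilibria? (Restriction to memoryless deviations)

\[ \forall \sigma'_i \in M_i \ \exists \sigma''_i \in M_i \quad d(\sigma'_i, \sigma''_i) \le \varepsilon \ \text{ and } \ \mathbb{E}^{\sigma[i/\sigma''_i]}(\phi_i \mid h) \le \mathbb{E}^{\sigma}(\phi_i \mid h) \]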

20 / 26

Page 76: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Imprecise deviations

Memoryless deviations

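Given $\sigma \in M$ and an agent $i$, we construct a new game $G\langle\sigma\rangle_{-i}^{\varepsilon}$ of the following form:

[Figure: gadget of the new game for a probability σi(a | s), with interval labels [0, ε], [0, 2ε], [1 − 2ε, 1] and [1 − ε, 1].]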

The game is turn-based;

All players except i have a fixed strategy;

i plays against an antagonistic player ī;

i suggests a new move;

Then ī has some latitude to deviate.

Conclusion: turn-based games are determined (Liggett and Lippman [1969]); hence we can assume deviations to be memoryless.
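
The interval labels in the figure ([0, ε], [0, 2ε], [1 − 2ε, 1], [1 − ε, 1]) are consistent with clipping an ε-neighbourhood of a suggested probability to [0, 1]. Under that assumption, which is a reading of the figure rather than something stated in the talk, the latitude around a suggested probability can be sketched as follows; `latitude_interval` is a hypothetical helper name.

```python
from typing import Tuple


def latitude_interval(suggested: float, eps: float) -> Tuple[float, float]:
    """Clip [suggested - eps, suggested + eps] to the unit interval.

    Assumed reading of the figure: the actual probability may differ from
    the suggested one by at most eps, while remaining a probability.
    """
    if not 0.0 <= suggested <= 1.0:
        raise ValueError("suggested must be a probability")
    if eps < 0.0:
        raise ValueError("eps must be non-negative")
    return (max(0.0, suggested - eps), min(1.0, suggested + eps))


# Suggested probabilities 0, eps, 1 - eps and 1, for a sample eps.
eps = 0.1
intervals = [latitude_interval(q, eps) for q in (0.0, eps, 1.0 - eps, 1.0)]
```

With these four suggested values, the clipped intervals match the four labels of the figure in order.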

21 / 26

Page 82: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Imprecise deviations

Other consequence: computing a NE is in PSPACE

Theorem

Assume the number of actions is fixed. For every $\varepsilon > 0$ and $(x_i)_i, (y_i)_i \in \mathbb{R}^{\mathrm{Agt}}$, one can decide in polynomial space whether there exists a stationary equilibrium σ under ε-imprecise deviations such that, for every $i \in \mathrm{Agt}$, $x_i \le \mathbb{E}^{\sigma}(\phi_i \mid s_0) \le y_i$.

Sketch.

As for classical NE (Ummels and Wojtczak [2011]), we first perform a non-deterministic polynomial-time pre-computation, in order to encode a stability formula $\varphi \in FO(\mathbb{R}, \le)$.

$\varphi$ encodes the existence of some kind of stable profile; it relies on the construction of $G\langle\sigma\rangle_{-i}^{\varepsilon}$.

22 / 26

Page 85: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Imprecise deviations

Overview

Getting closer to exact NE existence problem (2 players),

Use of linear constraints to enforce a non-linear property (termination),

New notion of equilibria, not equivalent to previous ones,

Still hope for exact NE in non-negative terminal reward games,

NP-hardness can be adapted to this new notion.

Thank you for your attention.

23 / 26

Page 88: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Imprecise deviations

Bibliography I

Patricia Bouyer, Nicolas Markey, and Daniel Stan. Mixed Nash equilibria in concurrent games. In Proceedings of the 34th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'14), volume 29 of Leibniz International Proceedings in Informatics, pages 351–363. Leibniz-Zentrum für Informatik, December 2014. doi: 10.4230/LIPIcs.FSTTCS.2014.351. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/BMS-fsttcs14.pdf.

Krishnendu Chatterjee, Marcin Jurdziński, and Rupak Majumdar. On Nash equilibria in stochastic games. In CSL'04, volume 3210 of LNCS, pages 26–40. Springer, 2004.

Shizuo Kakutani. A generalization of Brouwer's fixed point theorem. Duke Math. J., 8(3):457–459, September 1941. doi: 10.1215/S0012-7094-41-00838-4. URL http://dx.doi.org/10.1215/S0012-7094-41-00838-4.

24 / 26

Page 89: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Imprecise deviations

Bibliography II

John F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America, 36(1):48–49, 1950.

Piercesare Secchi and William D. Sudderth. Stay-in-a-set games. International Journal of Game Theory, 30:479–490, 2001.

Thomas M. Liggett and Steven A. Lippman. Short notes: Stochastic games with perfect information and time average payoff. SIAM Review, 11(4):604–607, 1969. ISSN 00361445. URL http://www.jstor.org/stable/2029090.

Michael Ummels. Stochastic Multiplayer Games: Theory and Algorithms. Ph.D. Thesis, Department of Computer Science, RWTH Aachen, Germany, January 2010. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/ummels-phd10.pdf.

25 / 26

Page 90: Stochastic Equilibria under Imprecise Deviations in ...perso.crans.org/dstan/talks/scientific/2016-09-14-GandALF.pdfPatricia Bouyer, Nicolas Markey and Daniel Stan CNRS, ... Main goals:

Imprecise deviations

Bibliography III

Michael Ummels and Dominik Wojtczak. The complexity of Nash equilibria in stochastic multiplayer games. Logical Methods in Computer Science, 7(3), 2011.

26 / 26