
Stochastic Equilibria under Imprecise Deviations in Terminal-Reward Concurrent Games

Patricia Bouyer, Nicolas Markey and Daniel Stan

CNRS, LSV, ENS Cachan & Université Paris-Saclay

GandALF 2016


1 Concurrent framework

2 Existence of equilibria

3 Imprecise deviations


The framework

Games with mixed strategies

Concurrent non-zero-sum games allow us to:

model heterogeneous systems;

let several events occur simultaneously;

express agents' goals that are not necessarily antagonistic;

whereas mixed strategies enable:

breaking the symmetry (by randomization);

equilibria that are more likely to exist.

Main goals: synthesizing strategies, as simple as possible.


Concurrent games: an example

[Figure: a game graph with states s1, s2, t1, t2, terminal rewards (0, 3) and (3, 0), and edges labelled by joint actions such as a−, b−, −a, −b, aa, bb, ab, ba.]

Game played on a graph;

Several agents;

For each state s ∈ States and each i ∈ Agt, a set Allowi(s) of allowed actions;

Transitions played concurrently;

Terminal rewards;

Also: stochastic transitions (players and environment);

Non-zero-sum (cycling forever yields reward 0).

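To make these ingredients concrete, here is a minimal Python sketch of such a terminal-reward concurrent arena. The encoding (dictionaries for Allow, the joint-action transition table, and the terminal rewards) is our own illustration; the transition table is only a plausible reconstruction of the pictured graph, not the authors' exact arena.

```python
from itertools import product

# Minimal sketch of a terminal-reward concurrent arena (our own encoding).
states = ["s1", "s2", "t1", "t2"]
agents = [1, 2]
allow = {  # allow[i][s]: actions allowed to agent i in state s
    1: {"s1": ["a", "b"], "s2": ["a", "b"], "t1": ["a"], "t2": ["a"]},
    2: {"s1": ["a", "b"], "s2": ["a", "b"], "t1": ["a"], "t2": ["a"]},
}
# Hypothetical transition table for the pictured graph (a guess reconstructed
# from the figure residue, kept only for illustration):
# trans[(state, joint_action)] = successor state.
trans = {
    ("s1", ("a", "a")): "s1", ("s1", ("b", "b")): "s1",
    ("s1", ("a", "b")): "s2", ("s1", ("b", "a")): "s2",
    ("s2", ("a", "a")): "s2", ("s2", ("b", "b")): "s2",
    ("s2", ("a", "b")): "t1", ("s2", ("b", "a")): "t2",
    ("t1", ("a", "a")): "t1", ("t2", ("a", "a")): "t2",  # terminal self-loops
}
reward = {"t1": (0, 3), "t2": (3, 0)}  # terminal rewards, one component per agent

def successors(state):
    """Enumerate the joint actions allowed in `state` and their successor states."""
    for joint in product(*(allow[i][state] for i in agents)):
        yield joint, trans[(state, joint)]

for joint, succ in successors("s2"):
    print("s2 --", joint, "->", succ, reward.get(succ, "non-terminal"))
```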

Definition (Strategies)

A strategy for agent i is a function σi such that for all h ∈ States+,

σi(h) ∈ Dist(Allowi(last(h)))

It depends on the sequence of visited states;

σi ∈ M is stationary if ∀h, σi(h) = σi(last(h));

σi ∈ S is pure if every distribution gives probability 1 to some action;

mixed otherwise;

finite-memory if . . .

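As a concrete reading of this definition (our own encoding, with hypothetical names): a general strategy maps a non-empty history of states to a distribution over the allowed actions of its last state, and a stationary one only looks at that last state.

```python
import random
from typing import Dict, Tuple

History = Tuple[str, ...]   # non-empty sequence of visited states
Dist = Dict[str, float]     # distribution over actions

# A stationary mixed strategy: only the last state of the history matters.
stationary: Dict[str, Dist] = {
    "s1": {"a": 0.5, "b": 0.5},   # mixed in s1
    "s2": {"a": 1.0},             # pure in s2
}

def sigma(history: History) -> Dist:
    """Stationary strategy: a distribution over the allowed actions of last(h).
    A general strategy could inspect the whole history instead."""
    return stationary[history[-1]]

def is_pure(strategy: Dict[str, Dist]) -> bool:
    """Pure iff every prescribed distribution puts probability 1 on one action."""
    return all(any(abs(p - 1.0) < 1e-9 for p in d.values()) for d in strategy.values())

def sample(dist: Dist) -> str:
    """Draw one action according to the prescribed distribution."""
    actions, weights = zip(*dist.items())
    return random.choices(actions, weights=weights)[0]

print(sigma(("s0", "s1")), is_pure(stationary), sample(sigma(("s2",))))
```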


Nash Equilibrium

Definition

We denote by φi the reward function for agent i and by σ = (σi)i∈Agt a strategy profile. σ is a Nash Equilibrium (NE) if for every agent i and any other strategy σ′i for i (a deviation),

Eσ[i/σ′i](φi) ≤ Eσ(φi)

No agent has an incentive to deviate in order to increase its own expected reward.

[Figure: a one-state example with initial state s0 and terminal rewards (1, 0) and (0, 1); the action labels are not recoverable from the transcript.]

The uniform profile σ1 = σ2 = 1/3 (· + · + ·) over the three available actions is a NE, of reward Eσ(φ) = (1/2, 1/2).
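Since the definition is stated in terms of expected rewards, a tiny one-stage check can illustrate it. The sketch below uses a 2-action matching-pennies-style payoff matrix of our own (not the 3-action game on the slide); in a one-stage game it suffices to test pure deviations, because the expected reward is linear in each agent's own mixed strategy.

```python
import itertools
import numpy as np

# Matching-pennies-style one-stage game (illustrative): phi[i][a1][a2] is
# agent i's terminal reward under joint action (a1, a2).
phi = np.array([
    [[1, 0],   # agent 0: rewarded when the actions match
     [0, 1]],
    [[0, 1],   # agent 1: rewarded when the actions differ
     [1, 0]],
])

def expected_reward(profile):
    """E_sigma(phi_i) for each agent i, for a mixed profile (one distribution per agent)."""
    p = np.outer(profile[0], profile[1])        # joint distribution over action pairs
    return [float(np.sum(p * phi[i])) for i in range(2)]

def is_nash(profile, eps=1e-9):
    """In a one-stage game it suffices to check pure deviations of each agent."""
    base = expected_reward(profile)
    for i, a in itertools.product(range(2), range(2)):
        deviation = list(profile)
        deviation[i] = np.eye(2)[a]             # agent i deviates to pure action a
        if expected_reward(deviation)[i] > base[i] + eps:
            return False
    return True

uniform = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
print(expected_reward(uniform), is_nash(uniform))   # [0.5, 0.5] True
```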

Existence of equilibria under mixed strategies

Theorem (Nash 1950)

Every one-stage game has a Nash Equilibrium in mixed strategies.

Theorem (Secchi and Sudderth 2001)

A NE always exists for qualitative safety objectives, in finite-memory strategies.

Theorem (Chatterjee et al. 2004)

For every ε > 0, an ε-Nash Equilibrium always exists with terminal rewards and stationary strategies:

Eσ[i/σ′i](φi) ≤ Eσ(φi) + ε


Does a mixed Nash Equilibrium always exist?

Idea: two-player concurrent zero-sum games may not have optimal strategies, but only ε-optimal strategies (for any ε > 0).

[Figure: the Hide-or-Run game, with terminal rewards (1, −1) and (−1, 1) and joint actions hs, rw, rs, hw, next to a shifted Hide-or-Run game with terminal rewards (2, 0), (0, 2) and (1, 1).]

The value problem in a zero-sum game is not a special case of the Nash Equilibrium problem with positive terminal rewards.


NE on graphs are harder

Theorem (Bouyer et al. [2014])

The existence problem is undecidable for 3-player concurrent games with non-negative terminal rewards and a constraint. It also holds for arbitrary terminal rewards without constraints.

Theorem (Ummels [2010])

There exist games where Nash equilibria require finite memory.


Existence of equilibria


Equilibria existence proof: general scheme

Let M be the set of stationary strategy profiles.

Consider the best-response function

BRi : M → 2^M

mapping a profile to the best responses of agent i against it.

Consider the global best response BR = (BRi)i∈Agt and show that it is continuous.

Apply Kakutani's fixed-point theorem: ∃σ, σ ∈ BR(σ).

Continuity of BR is based on termination assumptions:

one-stage termination;

safety condition, discounted reward;

enforced terminating actions.

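For reference, the fixed-point theorem invoked in the last step is the following standard statement (Kakutani [1941]); its hypotheses match what the later slides establish for BRε over ∆ε.

```latex
% Kakutani's fixed-point theorem (standard statement).
\textbf{Theorem (Kakutani, 1941).}
Let $S \subseteq \mathbb{R}^n$ be non-empty, compact and convex, and let
$F \colon S \to 2^{S}$ be a set-valued map such that
(i) $F(x)$ is non-empty and convex for every $x \in S$, and
(ii) the graph $\{(x, y) \mid x \in S,\ y \in F(x)\}$ is closed.
Then $F$ admits a fixed point: there exists $x^{*} \in S$ with $x^{*} \in F(x^{*})$.
```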

Limit behaviour

War of wits: the first player to play b gives up the game and loses. Write (x, y) = (σ1(b | s1), σ2(b | s2)).

[Figure: a two-state game graph with states s1, s2, terminal rewards (1, 2) and (2, 1), and edges labelled a−, b−, −a, −b.]

NE strategy profiles: (x, 0) and (0, x) for any 1 ≥ x > 0.

NE payoffs: respectively (1, 2) and (2, 1).

The graph of the BR function is not continuous at (0, 0).

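A quick sanity check of the claimed payoffs, under our own reading of the pictured arena (play starts in s1, action a passes the turn between s1 and s2, and the first player to play b terminates the game; this reading is an assumption, not spelled out in the transcript):

```latex
% Probability that player 1 (resp. player 2) is the first to play b, for (x, y) \neq (0, 0):
P_1 \;=\; \sum_{k \ge 0} \big((1-x)(1-y)\big)^{k}\, x \;=\; \frac{x}{1-(1-x)(1-y)},
\qquad
P_2 \;=\; \frac{(1-x)\,y}{1-(1-x)(1-y)}.
% Expected reward vector: E_\sigma(\varphi) = P_1 \cdot (1,2) + P_2 \cdot (2,1).
% For (x, 0) with x > 0: P_1 = 1, payoff (1, 2); for (0, y) with y > 0: P_2 = 1, payoff (2, 1);
% at (0, 0) the play cycles forever and the payoff is (0, 0), hence the discontinuity at (0, 0).
```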

Forcing termination: non-cycling game

A state is cycling if the agents can enforce a cycle from it, without any agent having a terminating deviation.

[Figure: the two cycling states s1, s2, with terminal rewards (1, 2) and (2, 1) and edges labelled aa, ab, ba, bb, are collapsed into a terminal state of reward (0, 0).]

Any equilibrium in the reduced game is an equilibrium in the original game.

W.l.o.g., we assume that there is always a player that can ensure termination with positive probability.


Cycle constraints: sketch

C ⊆ States is cycling if there exists a strategy profile ensuring a stay in C from any state in C.

[Figure: the example game graph with states s1, s2, t1, t2 and terminal rewards (0, 3) and (3, 0), annotated with the constraints below.]

σ1(b | s1) ≥ ε; σ2(b | s2) ≥ ε; ∀i, j, x: σi(x | tj) ≥ ε.

Denote by ∆ε ⊆ M the set of stationary strategy profiles satisfying these constraints.

Remark: ∆ε = ∏i ∆ε^(i).

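Checking membership in ∆ε then reduces to finitely many lower-bound constraints on the prescribed probabilities. The sketch below is our own encoding; the constrained state–action pairs mimic the example above.

```python
# Delta_eps membership as lower-bound constraints (illustrative encoding).
def in_delta_eps(sigma, constraints, eps):
    """sigma: {agent: {state: {action: prob}}}; constraints: (agent, state, action) triples."""
    return all(sigma[i][s].get(a, 0.0) >= eps for (i, s, a) in constraints)

# Constraints of the example: the terminating actions in s1, s2 and every action in t1, t2.
constraints = [(1, "s1", "b"), (2, "s2", "b"),
               (1, "t1", "a"), (1, "t2", "a"), (2, "t1", "a"), (2, "t2", "a")]
sigma = {1: {"s1": {"a": 0.9, "b": 0.1}, "t1": {"a": 1.0}, "t2": {"a": 1.0}},
         2: {"s2": {"a": 0.9, "b": 0.1}, "t1": {"a": 1.0}, "t2": {"a": 1.0}}}
print(in_delta_eps(sigma, constraints, eps=0.05))  # True: this profile lies in Delta_0.05
```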

Bounding probability of termination

Theorem

For ε > 0, there exist p > 0 and k ∈ N such that for any σ ∈ ∆ε,

∀s ∈ States, Pσ(s · States^≤k · F) ≥ p

That is to say, within k steps, a final state is reached with probability at least p, from every state.

This ensures almost-sure termination, and even more.

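The quantity bounded in the theorem can be computed explicitly for a fixed stationary profile: the profile induces a Markov chain, and a k-step value iteration gives, from every state, the probability of having reached a final state within k steps. The chain below is an illustrative stand-in, not the example's exact arena.

```python
import numpy as np

states = ["s1", "s2", "t1", "t2"]
final = {"t1", "t2"}

# Hypothetical Markov chain induced by some profile in Delta_eps (rows sum to 1).
P = np.array([
    [0.50, 0.50, 0.00, 0.00],   # from s1
    [0.00, 0.50, 0.25, 0.25],   # from s2
    [0.00, 0.00, 1.00, 0.00],   # t1 absorbing
    [0.00, 0.00, 0.00, 1.00],   # t2 absorbing
])

def reach_within(P, final_idx, k):
    """reach[s] = probability of visiting a final state within k steps, starting from s."""
    reach = np.zeros(len(P))
    reach[list(final_idx)] = 1.0          # within 0 steps: only if already final
    for _ in range(k):
        reach = P @ reach                 # take one step, then reach within k-1 steps
        reach[list(final_idx)] = 1.0      # final states always count as reached
    return reach

final_idx = [states.index(t) for t in sorted(final)]
print(reach_within(P, final_idx, k=4))    # every entry > 0: a uniform lower bound p
```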

Fixed point

Theorem (Best-response function)

Let

BRε(σ) = { σ′ ∈ ∆ε | ∀i ∈ Agt, ∀s ∈ States, Eσ[i/σ′i](φi | s) ≥ Eσ(φi | s) }

For 0 < ε ≤ 1/|Act|, BRε has a fixed point σ ∈ BRε(σ) in ∆ε.

Sketch.

The graph of BRε is continuous over ∆ε, and for any σ ∈ ∆ε, BRε(σ) is non-empty, closed and convex, so Kakutani's fixed-point theorem (Kakutani [1941]) applies.


Imprecise deviations


The previous fixed point is stationary;

it is not necessarily a NE.

Definition

σ ∈ S is an equilibrium under ε-imprecise deviations if for any player i,

∀σ′i ∃σ′′i: d(σ′i, σ′′i) ≤ ε and Eσ[i/σ′′i](φi | h) ≤ Eσ(φi | h)

where d(σ, σ′) is the maximal distance between the prescribed distributions.

A NE is an equilibrium under ε-imprecise deviations.

ε-NE and equilibria under imprecise deviations are incomparable.

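The slides only describe d as "the maximal distance between distributions". One plausible concrete choice (our assumption) is the sup-distance over states and actions between the prescribed probabilities, sketched below for stationary strategies.

```python
# d(sigma, sigma'): assumed here to be the sup over states and actions of the
# absolute difference of prescribed probabilities (our own concrete choice).
def strategy_distance(sigma, sigma_prime):
    """sigma, sigma_prime: {state: {action: probability}} (stationary strategies)."""
    dist = 0.0
    for state in sigma:
        actions = set(sigma[state]) | set(sigma_prime[state])
        for a in actions:
            dist = max(dist, abs(sigma[state].get(a, 0.0) - sigma_prime[state].get(a, 0.0)))
    return dist

sigma       = {"s1": {"a": 0.70, "b": 0.30}}
sigma_prime = {"s1": {"a": 0.65, "b": 0.35}}
print(strategy_distance(sigma, sigma_prime))  # ~0.05: an eps-imprecise variant for eps >= 0.05
```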

Memoryless deviations

Do fixed points of BRε fall into this category of equilibria? (Restriction to memoryless deviations:)

∀σ′i ∈ Mi ∃σ′′i ∈ Mi: d(σ′i, σ′′i) ≤ ε and Eσ[i/σ′′i](φi | h) ≤ Eσ(φi | h)


Memoryless deviations

Given σ ∈ M and an agent i, we construct a new game G⟨σ⟩−iε of the following form:

[Figure: a gadget in which the probability σi(a | s) suggested by i is perturbed within intervals such as [0, ε], [0, 2ε], [1 − 2ε, 1], [1 − ε, 1].]

The game is turn-based;

all players except i have a fixed strategy;

i plays against an antagonistic opponent;

i suggests a new move;

then the opponent has some latitude to deviate from it.

Conclusion: turn-based games are determined (Liggett and Lippman [1969]), hence we can assume deviations to be memoryless.


Other consequence: computing a NE is in PSPACE

Theorem

Assume the number of actions is fixed. For every ε > 0 and (xi)i, (yi)i ∈ R^Agt, one can decide in polynomial space whether there exists a stationary equilibrium under ε-imprecise deviations σ such that for every i ∈ Agt, xi ≤ Eσ(φi | s0) ≤ yi.

Sketch.

As for classical NE (Ummels and Wojtczak [2011]), an initial non-deterministic polynomial-time pre-computation encodes a stability formula ϕ ∈ FO(R, ≤). ϕ encodes the existence of some kind of stable profile; it relies on the construction of G⟨σ⟩−iε.


Overview

Getting closer to the exact NE existence problem (2 players);

use of linear constraints to enforce a non-linear property (termination);

a new notion of equilibrium, not equivalent to previous ones;

still hope for exact NE in non-negative terminal-reward games;

NP-hardness can be adapted to this new notion.

Thank you for your attention.


Bibliography

Patricia Bouyer, Nicolas Markey, and Daniel Stan. Mixed Nash equilibria in concurrent games. In Proceedings of the 34th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'14), volume 29 of Leibniz International Proceedings in Informatics, pages 351–363. Leibniz-Zentrum für Informatik, December 2014. doi: 10.4230/LIPIcs.FSTTCS.2014.351. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/BMS-fsttcs14.pdf.

Krishnendu Chatterjee, Marcin Jurdziński, and Rupak Majumdar. On Nash equilibria in stochastic games. In CSL'04, volume 3210 of LNCS, pages 26–40. Springer, 2004.

Shizuo Kakutani. A generalization of Brouwer's fixed point theorem. Duke Math. J., 8(3):457–459, 1941. doi: 10.1215/S0012-7094-41-00838-4. URL http://dx.doi.org/10.1215/S0012-7094-41-00838-4.

Thomas M. Liggett and Steven A. Lippman. Short notes: Stochastic games with perfect information and time average payoff. SIAM Review, 11(4):604–607, 1969. ISSN 0036-1445. URL http://www.jstor.org/stable/2029090.

John F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America, 36(1):48–49, 1950.

Piercesare Secchi and William D. Sudderth. Stay-in-a-set games. International Journal of Game Theory, 30:479–490, 2001.

Michael Ummels. Stochastic Multiplayer Games: Theory and Algorithms. Ph.D. thesis, Department of Computer Science, RWTH Aachen, Germany, January 2010. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/ummels-phd10.pdf.

Michael Ummels and Dominik Wojtczak. The complexity of Nash equilibria in stochastic multiplayer games. Logical Methods in Computer Science, 7(3), 2011.
