Stochastic Equilibria under Imprecise Deviations in Terminal-Reward Concurrent Games
Patricia Bouyer, Nicolas Markey and Daniel Stan
CNRS, LSV, ENS Cachan & Université Paris-Saclay
GandALF 2016
1 / 26
1 Concurrent framework
2 Existence of equilibria
3 Imprecise deviations
2 / 26
The framework
1 Concurrent framework
2 Existence of equilibria
3 Imprecise deviations
3 / 26
The framework
Games with mixed strategies
Concurrent non-zero-sum games allow...
To model heterogeneous systems;
Several events to occur simultaneously;
Agents' goals not to be necessarily antagonistic;
whereas mixed strategies enable...
Breaking the symmetry (by randomization);
Equilibria more likely to occur.
Main goals:
Synthesizing strategies;
As simple as possible.
4 / 26
The framework
Concurrent games: an example
[Figure: game graph with states s1, s2, t1, t2 and terminal rewards (0, 3) and (3, 0); edges labeled by joint actions a−, b−, −a, −b, aa, ab, ba, bb.]
Game on a graph;
Several agents;
For each state s ∈ States and each i ∈ Agt, a set Allowi(s) of allowed actions;
Transitions played concurrently;
Terminal rewards;
Also: stochastic transitions (players and environment);
Non-zero-sum (cycling yields reward 0).
5 / 26
The framework
Definition (Strategies)
A strategy for agent i is a function σi such that for all h ∈ States+,
σi(h) ∈ Dist(Allowi(last(h)))
It depends on the sequence of visited states.
σi ∈ M is stationary if ∀h, σi(h) = σi(last(h));
σi ∈ S is pure if all its distributions give probability 1 to some action;
mixed otherwise;
finite-memory if ...
6 / 26
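The taxonomy above can be made concrete with a small sketch. The encoding below (the `ALLOW` map standing for Allowi(s), the dict-of-dicts strategy) is purely illustrative and not taken from the paper:

```python
import random

# Hypothetical encoding: Allow_i(s) as a dict, and a stationary mixed
# strategy as a map from states to distributions over allowed actions.
ALLOW = {"s1": ["a", "b"], "s2": ["a", "b"]}
sigma = {"s1": {"a": 0.5, "b": 0.5},   # mixed at s1
         "s2": {"a": 1.0}}             # pure at s2

def is_stationary_profile(sigma, allow):
    """Check that sigma(s) is a distribution over Allow_i(s) for each s."""
    return all(set(dist) <= set(allow[s])
               and abs(sum(dist.values()) - 1.0) < 1e-9
               and min(dist.values()) >= 0
               for s, dist in sigma.items())

def is_pure(sigma):
    """Pure: every distribution gives probability 1 to some action."""
    return all(max(dist.values()) == 1.0 for dist in sigma.values())

def sample_action(sigma, history):
    """A stationary strategy depends on the history only via last(h)."""
    dist = sigma[history[-1]]
    return random.choices(list(dist), weights=list(dist.values()))[0]
```

A general (history-dependent) strategy would take the whole `history` list into account; the stationary case only reads its last element.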
The framework
Nash Equilibrium
Definition
We denote by φi the reward function of agent i and by σ = (σi)i∈Agt a strategy profile. σ is a Nash Equilibrium (NE) if for every agent i and every other strategy σ′i for i (a deviation),
Eσ[i/σ′i](φi) ≤ Eσ(φi)
No agent has an incentive to deviate in order to increase its own expected reward.
[Figure: a one-shot game from s0 with three actions per player (a rock-paper-scissors-like matrix) and terminal rewards (1, 0) and (0, 1).]
σ1 = σ2 = 1/3(· + · + ·) (the uniform profile) is a NE of reward Eσ(φ) = (1/2, 1/2).
7 / 26
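As an illustration of the definition, a hedged sketch of a rock-paper-scissors-style game: assuming a draw restarts the play, a player's expected reward v under a stationary profile satisfies v = P(win) + P(draw)·v, and we can check that no pure deviation beats the uniform profile. All names and the draw-restart assumption are illustrative, not the paper's construction:

```python
from fractions import Fraction

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
UNIFORM = {a: Fraction(1, 3) for a in ACTIONS}

def value_vs(sigma1, sigma2):
    """Player 1's expected terminal reward: a draw restarts the game,
    so v = P(win) + P(draw) * v, i.e. v = P(win) / (1 - P(draw))."""
    p_win = sum(sigma1[a] * sigma2[b]
                for a in ACTIONS for b in ACTIONS if BEATS[a] == b)
    p_draw = sum(sigma1[a] * sigma2[a] for a in ACTIONS)
    return p_win / (1 - p_draw)

# Uniform vs uniform yields 1/2, and no pure deviation of player 1
# improves on it: the uniform profile is a Nash Equilibrium.
assert value_vs(UNIFORM, UNIFORM) == Fraction(1, 2)
for a in ACTIONS:
    pure = {b: Fraction(1 if b == a else 0) for b in ACTIONS}
    assert value_vs(pure, UNIFORM) <= Fraction(1, 2)
```

Every pure deviation in fact also achieves exactly 1/2 here, which is why the equilibrium has to mix: against any pure strategy the opponent would have a strictly better reply.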
The framework
Existence of equilibria under mixed strategies
Theorem (Nash 1950)
Every one-stage game has a Nash Equilibrium in mixed strategies.
Theorem (Secchi and Sudderth 2001)
A NE always exists for qualitative safety objectives, in finite-memory strategies.
Theorem (Chatterjee et al. 2004)
For every ε > 0, an ε-Nash Equilibrium always exists with terminal rewards, in stationary strategies:
Eσ[i/σ′i](φi) ≤ Eσ(φi) + ε
8 / 26
The framework
Does a mixed Nash Equilibrium always exist?
Idea: two-player concurrent zero-sum games may not have optimal strategies, but only ε-optimal strategies (for any ε > 0).
[Figure: the Hide-or-Run game with terminal rewards (1, −1) and (−1, 1), and the shifted Hide-or-Run game with terminal rewards (2, 0), (0, 2) and (1, 1); transitions labeled by the joint actions hs, hw, rs, rw.]
The value problem in a zero-sum game is not a special case of the Nash Equilibrium problem with positive terminal rewards.
9 / 26
The framework
NE on graphs are harder
Theorem (Bouyer et al. [2014])
The existence problem is undecidable for 3-player concurrent games with non-negative terminal rewards and a constraint. This also holds for arbitrary terminal rewards without constraints.
Theorem (Ummels [2010])
There exist games where Nash equilibria require finite memory.
10 / 26
Existence of equilibria
1 Concurrent framework
2 Existence of equilibria
3 Imprecise deviations
11 / 26
Existence of equilibria
Equilibria existence proof: general scheme
Let M be the set of stationary strategy profiles.
Consider the best-response function
BRi : M → 2^M
mapping a profile to the best strategies of i against it.
Consider the global best response BR = (BRi)i∈Agt and show that it is continuous.
Apply Kakutani's fixed-point theorem: ∃σ, σ ∈ BR(σ).
Continuity of BR is based on termination assumptions:
One-stage termination;
Safety condition, discounted reward;
Enforce terminating actions.
12 / 26
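The fixed-point scheme is easiest to see in Nash's original one-stage setting. Below is a sketch (not the paper's construction) computing the fully mixed equilibrium of a 2×2 bimatrix game from the indifference conditions; `mixed_ne_2x2` and the payoff matrices are illustrative names:

```python
from fractions import Fraction

def mixed_ne_2x2(A, B):
    """Fully mixed NE of a one-stage 2x2 bimatrix game (A: row player's
    rewards, B: column player's). The row player's probability p of the
    first row makes the column player indifferent between her columns,
    and symmetrically for q; returns (p, q), or None when no interior
    equilibrium exists."""
    db = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    da = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if da == 0 or db == 0:
        return None
    p = Fraction(B[1][1] - B[1][0], db)   # column player indifferent
    q = Fraction(A[1][1] - A[0][1], da)   # row player indifferent
    if 0 <= p <= 1 and 0 <= q <= 1:
        return p, q
    return None

# Matching pennies: the unique equilibrium mixes uniformly; it is a
# fixed point of the best-response correspondence, as in the scheme above.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
p, q = mixed_ne_2x2(A, B)   # p = q = Fraction(1, 2)
```

At (p, q) each player's mix is a best response to the other's, so the profile lies in BR of itself; this is exactly the fixed-point property the general scheme establishes via Kakutani's theorem.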
Existence of equilibria
Limit behaviour
War of wits: the first player to play b gives up the game and loses.
Write (x, y) = (σ1(b | s1), σ2(b | s2)).
[Figure: game graph with states s1, s2 and terminal rewards (1, 2) and (2, 1); transitions labeled a−, b−, −a, −b.]
NE strategy profiles: (x, 0) and (0, x) for any 1 ≥ x > 0;
NE payoffs: respectively (1, 2) and (2, 1);
The graph of the BR function is not continuous at (0, 0).
13 / 26
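To make the discontinuity tangible, a sketch of the war-of-wits payoffs under the dynamics suggested by the figure (turns alternate s1, s2, s1, ...; playing b ends the game; cycling forever yields reward 0). The closed form conditions on where the first b is played and is an assumption for illustration, not taken from the paper:

```python
def payoff(x, y):
    """Expected rewards for the stationary profile (x, y) in the war of
    wits, assuming turns alternate s1, s2, s1, ... and that playing b
    ends the game: at s1 it yields (1, 2), at s2 it yields (2, 1);
    cycling forever yields (0, 0)."""
    stop = x + (1 - x) * y        # probability that one round ends
    if stop == 0:
        return (0.0, 0.0)         # x = y = 0: the play never terminates
    p1 = x / stop                 # P(the first b is played at s1)
    return (p1 * 1 + (1 - p1) * 2, p1 * 2 + (1 - p1) * 1)

# (x, 0) with x > 0 gives (1, 2); (0, y) with y > 0 gives (2, 1);
# but payoff(0, 0) = (0, 0): the payoff map (and hence the best-response
# correspondence) is discontinuous at (0, 0).
```

Along the diagonal x = y → 0 the payoff tends to (3/2, 3/2), not (0, 0), which is exactly the limit behaviour that breaks the continuity argument.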
Existence of equilibria
Forcing termination: non-cycling game
A state is cycling if the agents can enforce a cycle from it, with no terminating deviation by any single agent.
[Figure: a two-state cycling game with rewards (1, 2) and (2, 1), reduced to a single terminal state with reward (0, 0).]
Any equilibrium in the reduced game is an equilibrium in the original game.
W.l.o.g. we assume there is always a player that can ensure termination with positive probability.
14 / 26
Existence of equilibria
Cycle constraints: sketch
C ⊆ States is cycling if there exists a strategy profile ensuring a stay in C from any state in C.
[Figure: the example game again, with states s1, s2, t1, t2 and terminal rewards (0, 3) and (3, 0).]
Constraints: σ1(b | s1) ≥ ε; σ2(b | s2) ≥ ε; ∀i, j, x, σi(x | tj) ≥ ε.
Denote by ∆ε ⊆ M the set of stationary strategy profiles satisfying these constraints.
Remark: ∆ε = ∏i ∆ε^(i).
15 / 26
Existence of equilibria
Bounding the probability of termination
Theorem
For every ε > 0, there exist p > 0 and k ∈ N such that for any σ ∈ ∆ε,
∀s ∈ States, Pσ(s · States≤k · F) ≥ p
That is, within k steps there is probability at least p that a final state is reached.
This ensures almost-sure termination, and even more.
16 / 26
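The bound can be illustrated by dynamic programming on the Markov chain induced by a profile in ∆ε. The toy chain below (states s1, s2, a final state F, terminating move kept at probability ε per step) is an assumption for illustration only:

```python
def termination_prob(trans, final, k):
    """P(reach a final state within k steps), per state, by backward
    dynamic programming. trans[s] = list of (probability, successor)."""
    prob = {s: (1.0 if s in final else 0.0) for s in trans}
    for _ in range(k):
        prob = {s: 1.0 if s in final else
                   sum(p * prob[t] for p, t in trans[s])
                for s in trans}
    return prob

# Toy chain induced by a profile in Delta_eps: from s1 and s2 the
# terminating action keeps probability at least eps, so every state
# reaches the final state F within k = 1 step with probability >= eps.
eps = 0.1
trans = {"s1": [(eps, "F"), (1 - eps, "s2")],
         "s2": [(eps, "F"), (1 - eps, "s1")],
         "F":  []}
probs = termination_prob(trans, {"F"}, 2)   # s1: eps + (1-eps)*eps = 0.19
```

Here p = ε works with k = 1 uniformly over all profiles in ∆ε, which is the shape of the theorem's conclusion (the general proof must of course handle arbitrary graphs, not this toy chain).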
Existence of equilibria
Fixed point
Theorem (Best-response function)
Let
BRε(σ) = { σ′ ∈ ∆ε | ∀i ∈ Agt, ∀s ∈ States, Eσ[i/σ′i](φi | s) ≥ Eσ(φi | s) }
For 0 < ε ≤ 1/|Act|, BRε has a fixed point σ ∈ BRε(σ) in ∆ε.
Sketch.
The graph of BRε is continuous over ∆ε, and for any σ ∈ ∆ε, BRε(σ) is a non-empty closed convex set, so the fixed-point theorem of Kakutani [1941] applies.
17 / 26
Imprecise deviations
1 Concurrent framework
2 Existence of equilibria
3 Imprecise deviations
18 / 26
Imprecise deviations
The previous fixed point is stationary;
It is not necessarily a NE.
Definition
σ ∈ S is an equilibrium under ε-imprecise deviations if, for every player i,
∀σ′i ∃σ′′i, d(σ′i, σ′′i) ≤ ε and Eσ[i/σ′′i](φi | h) ≤ Eσ(φi | h)
where d(σ, σ′) is the maximal distance between the corresponding distributions.
A NE is an equilibrium under ε-imprecise deviations.
ε-NE and equilibria under imprecise deviations are incomparable.
19 / 26
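One natural reading of d for stationary strategies (an assumption for illustration; the paper's exact metric may differ) is the maximum, over states, of the L∞ distance between the two action distributions:

```python
def strategy_distance(sig, sig2, actions):
    """d(sig, sig2): max over states of the largest difference in the
    probability assigned to any action (L-infinity on distributions)."""
    return max(abs(sig[s].get(a, 0.0) - sig2[s].get(a, 0.0))
               for s in sig for a in actions)

# Two stationary strategies over one state, 0.05 apart:
sig_a = {"s0": {"a": 0.50, "b": 0.50}}
sig_b = {"s0": {"a": 0.55, "b": 0.45}}
# strategy_distance(sig_a, sig_b, ["a", "b"]) is 0.05 (up to rounding),
# so for eps >= 0.05, sig_b is an admissible imprecise version of sig_a.
```

Under this reading, a deviation σ′i is harmless as soon as some strategy within distance ε of it fails to improve on σ: the deviating player cannot control its randomization precisely enough to rule that neighbour out.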
Imprecise deviations
Memoryless deviations
Do fixed points of BRε fall into this category of equilibria? (Restriction to memoryless deviations:)
∀σ′i ∈ Mi ∃σ′′i ∈ Mi, d(σ′i, σ′′i) ≤ ε and Eσ[i/σ′′i](φi | h) ≤ Eσ(φi | h)
20 / 26
Imprecise deviations
Memoryless deviations
Given σ ∈ M and an agent i, we construct a new game G⟨σ⟩−iε of the following form:
[Figure: a turn-based gadget where a suggested probability σi(a | s) is replaced by an interval around it, e.g. [0, ε], [0, 2ε], [1 − 2ε, 1] or [1 − ε, 1].]
The game is turn-based;
All players except i have a fixed strategy;
i plays against an antagonistic player ī;
i suggests a new move;
then ī has some latitude to perturb it.
Conclusion: turn-based games are determined (Liggett and Lippman [1969]), hence we can assume deviations to be memoryless.
21 / 26
Imprecise deviations
Other consequence: computing a NE is in PSPACE
Theorem
Assume the number of actions is fixed. For every ε > 0 and (xi)i, (yi)i ∈ R^Agt, one can decide in polynomial space whether there exists a stationary equilibrium under ε-imprecise deviations σ such that for every i ∈ Agt, xi ≤ Eσ(φi | s0) ≤ yi.
Sketch.
As for classical NE (Ummels and Wojtczak [2011]), an initial non-deterministic polynomial-time pre-computation is used to build a stability formula ϕ ∈ FO(R, ≤). ϕ encodes the existence of a stable profile and relies on the construction of G⟨σ⟩−iε.
22 / 26
Imprecise deviations
Overview
Getting closer to the exact NE existence problem (2 players);
Use of linear constraints to enforce a non-linear property (termination);
A new notion of equilibria, not equivalent to previous ones;
Still hope for exact NE in non-negative terminal-reward games;
NP-hardness can be adapted to this new notion.
Thank you for your attention.
23 / 26
Imprecise deviations
Bibliography I
Patricia Bouyer, Nicolas Markey, and Daniel Stan. Mixed Nash equilibria in concurrent games. In Proceedings of the 34th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'14), volume 29 of Leibniz International Proceedings in Informatics, pages 351–363. Leibniz-Zentrum für Informatik, December 2014. doi: 10.4230/LIPIcs.FSTTCS.2014.351. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/BMS-fsttcs14.pdf.
Krishnendu Chatterjee, Marcin Jurdziński, and Rupak Majumdar. On Nash equilibria in stochastic games. In CSL'04, volume 3210 of LNCS, pages 26–40. Springer, 2004.
Shizuo Kakutani. A generalization of Brouwer's fixed point theorem. Duke Math. J., 8(3):457–459, 09 1941. doi: 10.1215/S0012-7094-41-00838-4. URL http://dx.doi.org/10.1215/S0012-7094-41-00838-4.
24 / 26
Imprecise deviations
Bibliography II
John F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America, 36(1):48–49, 1950.
Piercesare Secchi and William D. Sudderth. Stay-in-a-set games. International Journal of Game Theory, 30:479–490, 2001.
Thomas M. Liggett and Steven A. Lippman. Short notes: Stochastic games with perfect information and time average payoff. SIAM Review, 11(4):604–607, 1969. ISSN 0036-1445. URL http://www.jstor.org/stable/2029090.
Michael Ummels. Stochastic Multiplayer Games: Theory and Algorithms. Ph.D. Thesis, Department of Computer Science, RWTH Aachen, Germany, January 2010. URL http://www.lsv.ens-cachan.fr/Publis/PAPERS/PDF/ummels-phd10.pdf.
25 / 26
Imprecise deviations
Bibliography III
Michael Ummels and Dominik Wojtczak. The complexity of Nash equilibria in stochastic multiplayer games. Logical Methods in Computer Science, 7(3), 2011.
26 / 26