a lyapunov optimization approach to repeated stochastic games

A Lyapunov Optimization Approach to Repeated Stochastic Games

Michael J. NeelyUniversity of Southern California

http://www-bcf.usc.edu/~mjneelyProc. Allerton Conference on Communication, Control, and Computing, Oct. 2013

Game manager

Player 1

Player 2

Player 3

Player 4

Player 5

http://www-bcf.usc.edu/~mjneely

Game structure• Slotted time t in {0, 1, 2, …}.

• N players, 1 game manager.

• Slot t utility for each player depends on:(i) Random events ω(t) = (ω0(t), ω1(t),…,ωN(t))(ii) Control actions α(t) = (α1(t), … , αN(t))

• Players Maximize time average utility.

• Game manager Provides suggestions. Maintains fairness of utilities subject to equilibrium constraints.

Random events ω(t)

• Player i sees ωi(t).

• Manager sees: ω(t) = (ω0(t), ω1(t), … , ωN(t))

Game managerPlayer 1 ω1(t)

Player 2 ω2(t)

Player 3 ω3(t)

(ω0(t), ω1(t), …, ωΝ(t))

Only known to manager!

Random events ω(t)

• Player i sees ωi(t).

• Manager sees: ω(t) = (ω0(t), ω1(t), … , ωN(t))

• Vector ω(t) is i.i.d. over slots (components are possibly correlated)

Game managerPlayer 1 ω1(t)

Player 2 ω2(t)

Player 3 ω3(t)

(ω0(t), ω1(t), …, ωΝ(t))

Actions and utilities

• Manager sends suggested actions Mi(t).

• Players take actions αi(t) in Ai.

• Ui(t) = ui( α(t), ω(t) ).

Game managerPlayer 1 α1(t)

Player 2 α2(t)

Player 3 α3(t)

(ω0(t), ω1(t), …, ωΝ(t))

M1(t)

M2(t)

M3(t)

Example: Wireless MAC game

• Manager knows current channel conditions: ω0(t) = (C1(t), C2(t), … , CN(t))

• Users do not have this knowledge: ωi(t) = NULL

User 1

User 2

User 3

Access Point

C1(t)

C2(t)

C3(t)

Example: Economic market

• ω0(t) = vector of current prices.

• Prices are commonly known to everyone: ωi(t) = ω0(t) for all i.

Game managerPlayer 1

Player 2

Player 3

ω0(t) = [priceHAM(t)] [priceEGGS(t)]

ParticipationAt beginning of game, players choose either: (i) Participate: • Receive messages Mi(t).• Always choose αi(t) = Mi(t).

(ii) Do not participate: • Do not receive messages Mi(t).• Can choose αi(t) however they like.

Need incentives for participation…

ParticipationAt beginning of game, players choose either: (i) Participate: • Receive messages Mi(t).• Always choose αi(t) = Mi(t).

(ii) Do not participate: • Do not receive messages Mi(t).• Can choose αi(t) however they like.

Need incentives for participation…• Nash equilibrium (NE)• Correlated equilibrium (CE)• Coarse Correlated Equilibrium (CCE)

ΝΕ for Static Game • Consider special case with no ω(t) process.• Nash equilibrium (NE): Players actions are independent: Pr[α] = Pr[α1]Pr[α2]…Pr[αN]

Game manager not needed.

• Definition: Distribution Pr[α] is a Nash equilibrium (NE) if no player can benefit by unilaterally changing its action probabilities.

Finding a NE in a general game is a nonconvex problem!

CΕ for Static Game • Manager suggests actions α(t) i.i.d. Pr[α].

• Suppose all players participate.

• Definition: [Aumann 1974, 1987] Distribution Pr[α] is a Correlated Equilibrium (CE) if:

E[ Ui(t)| αi(t)=α ] ≥ E[ ui(β, α{-i}) | αi(t)=α] for all i in {1, …, N}, all pairs α, β in Ai.

LP with |A1|2 + |A2|2 + … + |AN|2 constraints

Criticism of CE• Manager gives suggestions Mi(t) to players even if

they do not participate.

• Without knowing message Mi(t) = αi : Player i only knows a-priori likelihood of other player actions via joint distribution Pr[α].

• Knowing Mi(t) = αi : Player i knows a-posteriori likelihood of other player actions via conditional distribution Pr[α | αi ]

CCΕ for Static Game • Manager suggests α(t) i.i.d. Pr[α]. • Gives suggestions only to participating players.• Suppose all players participate.

• Definition: [Moulin and Vial, 1978] Distribution Pr[α] is a Coarse Corr. Eq. (CCE) if:

E[ Ui(t) ] ≥ E[ ui(β, α{-i}) ] for all i in {1, …, N}, all pairs β in Ai.

LP with |A1| + |A2| + … + |AN| constraints.

( significantly less complex! )

Superset Theorem

The NE, CE, CCE definitions extend easily to the stochastic game.

Theorem:

{all NE} {all CE} {all CCE}

Example (static game)Pl

ayer

1Player 2

Utility function 1 Utility function 2

2 5253

4

Play

er 1

Player 2

50 1403

2

Avg.

Util

ity 2

Avg. Utility 1

(3.5, 2.4)

(3.5, 9.3)(3.87, 3.79)

NE and CE point

All players benefit if non-participants are denied access to the suggestions of the game manager.

CCE region

Pure strategies for stochastic games• Player i observes: ωi(t) in Ωi

• Player i chooses: αi(t) in Ai

• Definition: A pure strategy for player i is a function bi : Ωi Ai.

• There are |Ai||Ωi| pure strategies for player i.

• Define Si as this set of pure strategies.

Ωi Aibi(ωi)

Stochastic optimization problem

Subject to:

Ui ≥ Ui(s) for all i in {1, …, N}

for all s in Si

φ( U1, U2, …, UN )Maximize:

α(t) in A1 x A2 x … x AN for all t in {0, 1, 2, …}

1)

2)

Concave fairness function

CCE Constraints

Lyapunov optimization approach

Ui ≥ Ui(s) for all i in {1, …, N}, for all s in Si

Constraints:

Virtual queue:

Qi(s)(t)

ui(α(t), ω(t))ui(s)(α(t), ω(t))

Formally:ui

(s)(α(t), ω(t)) = ui((bi(s)(ωi(t)), α{-i}(t)), ω(t))

Online algorithm (main part):Every slot t: • Game manager observes queues and ω(t). • Chooses α(t) in A1 x A2 x … x AN to minimize:

• Do an auxiliary variable selection (omitted here).• Update virtual queues.

Knowledge of Pr[ω(t) = (ω0, ω1, …., ωN)] not required!

Conclusions:• CCE constraints are simpler and lead to improved

utilities.

• Online algorithm for the stochastic game.

• No knowledge of Pr[ω(t) = (ω0, ω1, …., ωN)] required!

• Complexity and convergence time is independent of size of Ω0.

• Scales gracefully with large N.

Aux variable update:• Choose xi(t) in [0, 1] to maximize:

Vφ(x1(t), …, xN(t)) – ∑ Zi(t)xi(t)

Where Zi(t) is another virtual queue, one for each player i in {1, …, N}. See paper for details: http://ee.usc.edu/stochastic-nets/docs/repeated-games-maxweight.pdf

a lyapunov optimization approach to repeated stochastic games

Documents

t process

game structureslotted

nt vector t

player actions

slot t utility

ntii control actions

static game manager

players actions