a lyapunov optimization approach to repeated stochastic games

21
Lyapunov Optimization Approa to Repeated Stochastic Gam Michael J. Neely University of Southern California http://www-bcf.usc.edu/~mjneely . Allerton Conference on Communication, Control, and Computing, Oct. Game manager Player 1 Player 2 Player 3 Player 4 Player 5

Upload: tarala

Post on 22-Feb-2016

61 views

Category:

Documents


0 download

DESCRIPTION

A Lyapunov Optimization Approach to Repeated Stochastic Games. Player 3. Player 1. Player 4. Game manager. Player 5. Player 2. Michael J. Neely University of Southern California http://www-bcf.usc.edu/~mjneely. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

A Lyapunov Optimization Approach to Repeated Stochastic Games

Michael J. NeelyUniversity of Southern California

http://www-bcf.usc.edu/~mjneelyProc. Allerton Conference on Communication, Control, and Computing, Oct. 2013

Game manager

Player 1

Player 2

Player 3

Player 4

Player 5

Page 2: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Game structure• Slotted time t in {0, 1, 2, …}.

• N players, 1 game manager.

• Slot t utility for each player depends on:(i) Random events ω(t) = (ω0(t), ω1(t),…,ωN(t))(ii) Control actions α(t) = (α1(t), … , αN(t))

• Players Maximize time average utility.

• Game manager Provides suggestions. Maintains fairness of utilities subject to equilibrium constraints.

Page 3: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Random events ω(t)

• Player i sees ωi(t).

• Manager sees: ω(t) = (ω0(t), ω1(t), … , ωN(t))

Game managerPlayer 1 ω1(t)

Player 2 ω2(t)

Player 3 ω3(t)

(ω0(t), ω1(t), …, ωΝ(t))

Only known to manager!

Page 4: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Random events ω(t)

• Player i sees ωi(t).

• Manager sees: ω(t) = (ω0(t), ω1(t), … , ωN(t))

• Vector ω(t) is i.i.d. over slots (components are possibly correlated)

Game managerPlayer 1 ω1(t)

Player 2 ω2(t)

Player 3 ω3(t)

(ω0(t), ω1(t), …, ωΝ(t))

Page 5: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Actions and utilities

• Manager sends suggested actions Mi(t).

• Players take actions αi(t) in Ai.

• Ui(t) = ui( α(t), ω(t) ).

Game managerPlayer 1 α1(t)

Player 2 α2(t)

Player 3 α3(t)

(ω0(t), ω1(t), …, ωΝ(t))

M1(t)

M2(t)

M3(t)

Page 6: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Example: Wireless MAC game

• Manager knows current channel conditions: ω0(t) = (C1(t), C2(t), … , CN(t))

• Users do not have this knowledge: ωi(t) = NULL

User 1

User 2

User 3

Access Point

C1(t)

C2(t)

C3(t)

Page 7: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Example: Economic market

• ω0(t) = vector of current prices.

• Prices are commonly known to everyone: ωi(t) = ω0(t) for all i.

Game managerPlayer 1

Player 2

Player 3

ω0(t) = [priceHAM(t)] [priceEGGS(t)]

Page 8: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

ParticipationAt beginning of game, players choose either: (i) Participate: • Receive messages Mi(t).• Always choose αi(t) = Mi(t).

(ii) Do not participate: • Do not receive messages Mi(t).• Can choose αi(t) however they like.

Need incentives for participation…

Page 9: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

ParticipationAt beginning of game, players choose either: (i) Participate: • Receive messages Mi(t).• Always choose αi(t) = Mi(t).

(ii) Do not participate: • Do not receive messages Mi(t).• Can choose αi(t) however they like.

Need incentives for participation…• Nash equilibrium (NE)• Correlated equilibrium (CE)• Coarse Correlated Equilibrium (CCE)

Page 10: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

ΝΕ for Static Game • Consider special case with no ω(t) process.• Nash equilibrium (NE): Players actions are independent: Pr[α] = Pr[α1]Pr[α2]…Pr[αN]

Game manager not needed.

• Definition: Distribution Pr[α] is a Nash equilibrium (NE) if no player can benefit by unilaterally changing its action probabilities.

Finding a NE in a general game is a nonconvex problem!

Page 11: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

CΕ for Static Game • Manager suggests actions α(t) i.i.d. Pr[α].

• Suppose all players participate.

• Definition: [Aumann 1974, 1987] Distribution Pr[α] is a Correlated Equilibrium (CE) if:

E[ Ui(t)| αi(t)=α ] ≥ E[ ui(β, α{-i}) | αi(t)=α] for all i in {1, …, N}, all pairs α, β in Ai.

LP with |A1|2 + |A2|2 + … + |AN|2 constraints

Page 12: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Criticism of CE• Manager gives suggestions Mi(t) to players even if

they do not participate.

• Without knowing message Mi(t) = αi : Player i only knows a-priori likelihood of other player actions via joint distribution Pr[α].

• Knowing Mi(t) = αi : Player i knows a-posteriori likelihood of other player actions via conditional distribution Pr[α | αi ]

Page 13: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

CCΕ for Static Game • Manager suggests α(t) i.i.d. Pr[α]. • Gives suggestions only to participating players.• Suppose all players participate.

• Definition: [Moulin and Vial, 1978] Distribution Pr[α] is a Coarse Corr. Eq. (CCE) if:

E[ Ui(t) ] ≥ E[ ui(β, α{-i}) ] for all i in {1, …, N}, all pairs β in Ai.

LP with |A1| + |A2| + … + |AN| constraints.

( significantly less complex! )

Page 14: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Superset Theorem

The NE, CE, CCE definitions extend easily to the stochastic game.

Theorem:

{all NE} {all CE} {all CCE}

Page 15: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Example (static game)Pl

ayer

1Player 2

Utility function 1 Utility function 2

2 5253

4

Play

er 1

Player 2

50 1403

2

Avg.

Util

ity 2

Avg. Utility 1

(3.5, 2.4)

(3.5, 9.3)(3.87, 3.79)

NE and CE point

All players benefit if non-participants are denied access to the suggestions of the game manager.

CCE region

Page 16: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Pure strategies for stochastic games• Player i observes: ωi(t) in Ωi

• Player i chooses: αi(t) in Ai

• Definition: A pure strategy for player i is a function bi : Ωi Ai.

• There are |Ai||Ωi| pure strategies for player i.

• Define Si as this set of pure strategies.

Ωi Aibi(ωi)

Page 17: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Stochastic optimization problem

Subject to:

Ui ≥ Ui(s) for all i in {1, …, N}

for all s in Si

φ( U1, U2, …, UN )Maximize:

α(t) in A1 x A2 x … x AN for all t in {0, 1, 2, …}

1)

2)

Concave fairness function

CCE Constraints

Page 18: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Lyapunov optimization approach

Ui ≥ Ui(s) for all i in {1, …, N}, for all s in Si

Constraints:

Virtual queue:

Qi(s)(t)

ui(α(t), ω(t))ui(s)(α(t), ω(t))

Formally:ui

(s)(α(t), ω(t)) = ui((bi(s)(ωi(t)), α{-i}(t)), ω(t))

Page 19: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Online algorithm (main part):Every slot t: • Game manager observes queues and ω(t). • Chooses α(t) in A1 x A2 x … x AN to minimize:

• Do an auxiliary variable selection (omitted here).• Update virtual queues.

Knowledge of Pr[ω(t) = (ω0, ω1, …., ωN)] not required!

Page 20: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Conclusions:• CCE constraints are simpler and lead to improved

utilities.

• Online algorithm for the stochastic game.

• No knowledge of Pr[ω(t) = (ω0, ω1, …., ωN)] required!

• Complexity and convergence time is independent of size of Ω0.

• Scales gracefully with large N.

Page 21: A  Lyapunov  Optimization Approach     to Repeated Stochastic Games

Aux variable update:• Choose xi(t) in [0, 1] to maximize:

Vφ(x1(t), …, xN(t)) – ∑ Zi(t)xi(t)

Where Zi(t) is another virtual queue, one for each player i in {1, …, N}. See paper for details: http://ee.usc.edu/stochastic-nets/docs/repeated-games-maxweight.pdf