Repeated Games
APEC 8205: Applied Game Theory
Fall 2007
Objectives
• Understand the Class of Repeated Games
• Understand Conditions Under Which Non-Nash Play Can Be Sustained as a Subgame Perfect Nash Equilibrium When a Game Is Repeated
– Multiple Nash Equilibria
– Infinite Repetition
Why study repeated games?
• Many interactions in life are repeated.
– Large retailers compete on a daily basis for customers.
– Dana and I compete on a daily basis to decide who will make dinner and who will pick up around the house.
– Mason and Spencer compete on a daily basis to see who gets to watch TV and who gets to play X-Box.
• What is of interest in these types of repeated interactions?
– Can players achieve better results than might occur in a single-shot game?
– Can players use the history of play to their advantage?
Some Terminology
• G: Stage game (usually thought of in normal form).
– Players: i = 1,…,N
– Ai: Strategy space for player i, with typical element ai ∈ Ai.
– a = (a1,…,aN) ∈ A = A1 × … × AN: Strategy profile for all players.
– ui(a): Player i’s payoff for strategy profile a.
– u(a) = (u1(a),…,uN(a)): Vector of player payoffs for strategy profile a.
• T: Number of times the stage game is repeated (could be infinite).
• ait ∈ Ai: Player i’s strategy choice at time t.
• at = (a1t,…,aNt) ∈ A: Strategy profile for all players at time t.
• ht = (a1,…,at−1) ∈ A × … × A (t − 1 copies): History of play at time t.
• sit(ht) ∈ Ai: History-dependent strategy for player i at time t.
• st(ht) = (s1t(ht),…,sNt(ht)) ∈ A: History-dependent strategy profile at time t.
• Ui(s1(h1),…,sT(hT)) = Σt=1,…,T wit ui(st(ht)): Player i’s payoff from the game, where wit is the weight player i places on stage t.
• U(s1(h1),…,sT(hT)) = (U1(s1(h1),…,sT(hT)),…,UN(s1(h1),…,sT(hT))): Payoffs for all players.
Consider an Example
Suppose this Prisoner’s Dilemma game is played twice and that wit = 1 for i = 1, 2 and t = 1, 2. Payoffs are listed as (Player 1, Player 2):

                 Player 2
                  C        D
Player 1   C    2, 2     0, 3
           D    3, 0     1, 1
Two Period Prisoner’s Dilemma Example: In Extensive Form
[Game tree: Player 1 moves (C or D), then Player 2, in each of two stages; each first-period outcome leads to a full second-period Prisoner’s Dilemma. Terminal payoffs (Player 1’s payoff, Player 2’s payoff) are the sums of the two stage payoffs, ranging from (0, 6) and (6, 0) on the one-sided defection paths to (4, 4) after (C, C) in both periods and (2, 2) after (D, D) in both periods.]
Two Period Prisoner’s Dilemma Example: After Solving Stage 2 Subgames
[Game tree: (D, D) is played in every second-period subgame, so each first-period outcome has the (1, 1) continuation payoff added to its stage payoffs, leaving a one-period game with payoffs (Player 1, Player 2) of (3, 3) after (C, C), (1, 4) after (C, D), (4, 1) after (D, C), and (2, 2) after (D, D).]
Two Period Prisoner’s Dilemma Example: After Solving Game As Whole
[Game tree: in the reduced one-period game, D strictly dominates C for both players, so both defect in period 1 as well, yielding total payoffs (2, 2).]
Therefore, the subgame perfect strategies are (strategy choice in stage 1,
strategy choice in stage 2 given (D,D) in stage 1, strategy choice in stage 2 given (D,C) in stage 1, strategy choice in stage 2 given (C,D) in stage 1, strategy choice in stage 2 given (C,C) in stage 1)
= (D, D, D, D, D)
for both players.
So, what is the point?
• If the stage game of a finitely repeated game has a unique Nash equilibrium, then there is a unique subgame perfect equilibrium where that Nash equilibrium is played in every stage of the game!
• But what can happen if there is not a unique equilibrium?
• Or what if the stage game can be infinitely repeated?
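The backward-induction argument above can be checked by brute force. The following is a minimal sketch (the function and variable names are mine, not from the slides): find the pure Nash equilibria of the stage game, then of the period-1 game augmented with the (1, 1) continuation payoff from second-period defection.

```python
from itertools import product

# Stage-game payoffs, (Player 1, Player 2): C = cooperate, D = defect.
STAGE = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}

def pure_nash(game):
    """Return the pure-strategy Nash equilibria of a 2x2 game dict."""
    eq = []
    for a1, a2 in product("CD", repeat=2):
        u1, u2 = game[(a1, a2)]
        if (all(u1 >= game[(b, a2)][0] for b in "CD") and
                all(u2 >= game[(a1, b)][1] for b in "CD")):
            eq.append((a1, a2))
    return eq

# Every period-2 subgame is just the stage game, whose unique equilibrium
# (D, D) is played after any history.  The period-1 game is then the stage
# game with (1, 1) added to every cell.
AUGMENTED = {p: (u1 + 1, u2 + 1) for p, (u1, u2) in STAGE.items()}
```

Both the stage game and the augmented period-1 game have (D, D) as their unique pure equilibrium, which is exactly the (D, D, D, D, D) subgame perfect strategy found above.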
What about multiple equilibria?
Consider this modified version of the Prisoner’s Dilemma and assume T = 2 and wit = 1 for i = 1, 2 and t = 1, 2. Payoffs are listed as (Player 1, Player 2):

                 Player 2
                  L        C        R
           U    4, 4     2, 5     0, 0
Player 1   M    5, 2     3, 3     0, 0
           D    0, 0     0, 0     1, 1
Starting with Period 2
• There are 9 possible histories for the 2nd period of this game:– (U,L), (U,C), (U,R), (M,L), (M,C), (M,R), (D,L), (D,C), and (D,R).
• For any subgame starting from one of these histories, there are two potential Nash equilibria: (M,C) or (D,R).
• Therefore, for an equilibrium strategy to be subgame perfect, it must specify (M,C) or (D,R) in response to every history (x, y), x = U, M, D and y = L, C, R, from the first period.
Now Period 1
With these strategies the players’ payoffs for the game starting in period 1 are:
                 Player 2
                  L        C        R
           U    7, 7     3, 6     1, 1
Player 1   M    6, 3     4, 4     1, 1
           D    1, 1     1, 1     2, 2
which yields a subgame perfect equilibrium with cooperative, non-Nash stage-game play in period 1!
Consider the strategies s12(h1) = M if h1 = (U,L) and D otherwise,
and s22(h1) = C if h1 = (U,L) and R otherwise.
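This construction is easy to verify numerically. A minimal sketch (names are mine): add the continuation payoffs the strategies prescribe — (3, 3) from (M,C) after (U,L), and (1, 1) from (D,R) after anything else — to each first-period cell, and confirm that (U,L) is a Nash equilibrium of the augmented game even though it is not one of the stage game.

```python
STAGE = {  # (Player 1 payoff, Player 2 payoff)
    ("U", "L"): (4, 4), ("U", "C"): (2, 5), ("U", "R"): (0, 0),
    ("M", "L"): (5, 2), ("M", "C"): (3, 3), ("M", "R"): (0, 0),
    ("D", "L"): (0, 0), ("D", "C"): (0, 0), ("D", "R"): (1, 1),
}

def continuation(profile):
    """Second-period payoffs under the proposed history-dependent strategies."""
    return (3, 3) if profile == ("U", "L") else (1, 1)

AUGMENTED = {p: (u1 + continuation(p)[0], u2 + continuation(p)[1])
             for p, (u1, u2) in STAGE.items()}

def is_nash(game, profile, rows="UMD", cols="LCR"):
    """True if neither player has a profitable unilateral deviation."""
    a1, a2 = profile
    u1, u2 = game[profile]
    return (all(u1 >= game[(r, a2)][0] for r in rows) and
            all(u2 >= game[(a1, c)][1] for c in cols))
```

The check shows the continuation payoffs are doing the work: (U,L) fails as a stage-game equilibrium but survives once deviations trigger the (D,R) punishment.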
What about infinite repetition?
• First, two definitions:
– Feasible payoff: any convex combination of the pure-strategy profile payoffs. That is, πi is feasible if πi = Σa∈A λa ui(a), where λa ≥ 0 for all a ∈ A and Σa∈A λa = 1.
– Average payoff: (1 − δ) Σt=1,…,∞ δ^(t−1) ui(ht), where 1 > δ ≥ 0 is the discount factor.
• Theorem (Friedman 1971): Let G be a finite, static game of complete information. Let (e1,…,eN) denote the payoffs from a Nash equilibrium of G, and let (x1,…,xN) denote any other feasible payoffs from G. If xi > ei for every i and if δ is sufficiently close to one, then there exists a subgame perfect Nash equilibrium of the infinitely repeated game G that achieves (x1,…,xN) as the average payoff.
• Often referred to as the Folk Theorem, but there are now lots of different versions of this Folk Theorem.
What does this result mean?
• In infinitely repeated games, we can get lots of subgame perfect equilibria.
• These equilibria can include actions in a stage game that are not Nash equilibrium actions for that stage game.
• You can get cooperative behavior in a Prisoner’s Dilemma!
Let’s see what I mean.
Consider the Prisoner’s Dilemma
Payoffs are listed as (Player 1, Player 2):

                 Player 2
                  C        D
Player 1   C    2, 2     0, 3
           D    3, 0     1, 1
Consider the strategy: Play C in period 1; play C in period t > 1 if at′ = (C, C) for all t′ < t; otherwise play D.
Can we find a discount factor such that this strategy is subgame perfect for this Prisoner’s Dilemma if it is repeated infinitely?
The answer to this question is yes!
• Suppose Player j is playing this type of strategy. At any point in time, Player j has either chosen D in the past in response to i’s choice of D or he has always chosen C because i has always chosen C. So, we must consider whether the strategy above is a best response for player i under both of these circumstances.
• If D has been chosen in the past, player j will always choose D in the future. What is optimal for i now will be optimal for i in the future due to infinite repetition.
– Let VC & VD be the current value to player i of playing C & D from here on out.
– If C is optimal, i’s payoff from here on out will be VC = 0 + δVC, such that VC = 0.
– If D is optimal, i’s payoff from here on out will be VD = 1 + δVD, such that VD = 1/(1 − δ).
– VD > VC, so D is optimal.
• If D has not been chosen in the past, player j will choose C in the immediate future and will continue to do so as long as i does. But if i chooses D, j will follow suit from here on out. Again, what is optimal for i now will be optimal for i in the future due to infinite repetition.
– If C is optimal, i’s payoff from here on out will be VC = 2 + δVC, such that VC = 2/(1 − δ).
– If D is optimal, i’s payoff from here on out will be VD = 3 + δ/(1 − δ).
– VC >/=/< VD when δ >/=/< ½.
To summarize
• As long as δ ≥ ½, this strategy will constitute a subgame perfect Nash strategy for the infinitely repeated Prisoner’s Dilemma.
• This type of strategy is often referred to as a trigger strategy.
– Bad behavior on the part of one player triggers bad behavior on the part of his opponent from here on after.
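The value comparison behind the δ ≥ ½ threshold can be checked numerically. A small sketch (function names are mine) of VC and VD from the no-defection-yet case:

```python
def cooperate_value(delta):
    # V_C = 2 + delta * V_C  =>  V_C = 2 / (1 - delta)
    return 2 / (1 - delta)

def defect_value(delta):
    # V_D = 3 + delta * 1 / (1 - delta): a one-shot gain of 3,
    # then mutual defection worth 1 per period forever.
    return 3 + delta / (1 - delta)
```

Evaluating both at a few discount factors confirms that cooperation is exactly break-even at δ = ½ and strictly better for any larger δ.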
Are there other trigger strategies that can work?
YES!
General Trigger Strategy
• Define:
– πi*: equilibrium payoff (per stage)
– πiD: defection payoff
– πiP: punishment payoff (Nash equilibrium payoff per stage)
• Assume πiD > πi* > πiP.
• Cheating doesn’t pay when:
  πi*/(1 − δ) ≥ πiD + δπiP/(1 − δ),
or
  δ ≥ (πiD − πi*)/(πiD − πiP).
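The critical discount factor from this inequality can be packaged as a small helper (the function name `critical_delta` is my own, purely illustrative); plugging in the Prisoner’s Dilemma values (πi* = 2, πiD = 3, πiP = 1) recovers the threshold of ½ found earlier.

```python
def critical_delta(pi_star, pi_defect, pi_punish):
    """Smallest discount factor at which a trigger strategy deters cheating:
    delta* = (pi_D - pi*) / (pi_D - pi_P)."""
    # The trigger-strategy logic requires this payoff ordering.
    assert pi_defect > pi_star > pi_punish
    return (pi_defect - pi_star) / (pi_defect - pi_punish)
```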
Are there other types of strategies that can work?
YES! LOTS MORE!
So what are we to make of all this?
• It does provide an explanation for cooperation in games where cooperation seems unlikely.
• However, the explanation tells us that almost anything is possible.
– So, what type of behavior can we expect?
– The theory provides few answers.
• There has been a lot of research on repeated Prisoner’s Dilemma games to understand the best way to play as well as how people actually play. Of particular interest is Axelrod (1984). Axelrod had researchers submit various strategies and had computers play them against one another to see which ones performed the best. Tit-for-Tat strategies tended to perform the best.
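A miniature version of such a tournament can be sketched as follows; the three strategies shown and the 200-round match length are my own illustrative choices, not Axelrod’s actual setup.

```python
# Stage-game payoffs, (row player, column player): C = cooperate, D = defect.
STAGE = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
         ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opp_history):
    # Cooperate first, then mirror the opponent's last move.
    return opp_history[-1] if opp_history else "C"

def always_defect(opp_history):
    return "D"

def always_cooperate(opp_history):
    return "C"

def play(s1, s2, rounds=200):
    """Play a repeated match; each strategy sees the opponent's past moves."""
    moves1, moves2, u1, u2 = [], [], 0, 0
    for _ in range(rounds):
        a1, a2 = s1(moves2), s2(moves1)
        p1, p2 = STAGE[(a1, a2)]
        u1, u2 = u1 + p1, u2 + p2
        moves1.append(a1)
        moves2.append(a2)
    return u1, u2
```

Against itself, Tit-for-Tat sustains mutual cooperation for all 200 rounds; against an unconditional defector it loses only the first round and then punishes forever after.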
Application: Cournot Duopoly with Repeated Play
• Who are the players?
– Two firms
• Who can do what when?
– Firms choose output each period (qit for i = 1, 2), to infinity & beyond
• Who knows what when?
– Firms know output choices for all previous periods
• How are players rewarded based on what they do?
– πit = Σs=t,…,∞ δ^(s−t) (a − qis − qjs) qis
Stage Game Output & Profit
• Cournot Nash equilibrium
– Output: q1C = q2C = qC = a/3
– Profit: π1C = π2C = πC = a²/9
• Collusive monopoly outcome
– Output: q1M = q2M = qM = a/4
– Profit: π1M = π2M = πM = a²/8
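These values follow from a linear inverse demand P = a − Q with zero marginal cost, the specification consistent with the profit expressions in this slide. A quick numerical check (function names are mine, and a = 12 is an arbitrary illustrative choice):

```python
def best_response(a, q_other):
    # Firm i maximizes (a - q_i - q_other) * q_i  =>  q_i = (a - q_other) / 2.
    return (a - q_other) / 2

def stage_profit(a, q_i, q_j):
    # Per-period profit with inverse demand P = a - Q and zero cost.
    return (a - q_i - q_j) * q_i

a = 12.0
q_cournot = a / 3    # fixed point of the best response: q = (a - q) / 2
q_monopoly = a / 4   # each firm produces half the monopoly quantity a / 2
```

The Cournot output is a fixed point of the best-response map, and the per-firm profits come out to a²/9 and a²/8 as stated.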
Is it possible to sustain the collusive Monopoly outcome as a subgame perfect Nash equilibrium with infinite repetition?
Consider the Strategy
• Period 1: qi1 = qM
• Period t > 1:
– qit = qM if qit′ = qjt′ = qM for all t′ < t
– qit = qC otherwise
Let’s check to see if this proposed strategy is a subgame perfect Nash equilibrium.
• To accomplish this, we need to show that the strategy is a Nash equilibrium in all possible subgames.
• Our task is simplified here by the fact that there are only two distinct types of subgames:
– qit′ ≠ qM or qjt′ ≠ qM for some t′ < t
– qit′ = qjt′ = qM for all t′ < t
First consider qit′ ≠ qM or qjt′ ≠ qM for some t′ < t
• With this history, the proposed strategy says both players should choose qC.
• So, let’s see what the optimal output in period t is for Firm i given Firm j will always choose qC. Firm i’s value in this subgame is
  Vit = max Σs=t,…,∞ δ^(s−t) (a − qis − qC) qis.
• Because Firm j’s output no longer depends on anything Firm i does, Firm i can maximize period by period:
  max over qit of (a − qit − qC) qit,
with first-order condition a − 2qit − qC = 0, such that
  qit = (a − qC)/2 = (a − a/3)/2 = a/3 = qC.
Firm i’s optimal strategy is to choose the Cournot output, just like the proposed strategy says!
Now consider qit′ = qjt′ = qM for all t′ < t
• With this history, the proposed strategy says both players should choose qM.
• So, let’s see what the optimal output in period t is for Firm i given Firm j will always choose qM as long as Firm i chooses qM.
First, suppose that Firm i chooses qM in period t and forever after. Then
  Vit = Σs=t,…,∞ δ^(s−t) (a − qM − qM) qM = (a − 2qM)qM/(1 − δ) = (a²/8)/(1 − δ).
Now, suppose Firm i chooses something other than qM in period t. A deviation triggers qjs = qC for all s > t, and recall that we have already solved the optimization problem for that punishment phase, which implies qis = qC for all s > t and a continuation value of δ(a²/9)/(1 − δ). So the best deviation solves
  Vit = max over qit of (a − qit − qM) qit + δ(a²/9)/(1 − δ),
with first-order condition a − 2qit − qM = 0, such that
  qit = (a − qM)/2 = 3a/8
and
  Vit = (3a/8)² + δ(a²/9)/(1 − δ) = 9a²/64 + δ(a²/9)/(1 − δ).
Finally, Firm i will prefer to choose the monopoly output forever after if
  (a²/8)/(1 − δ) ≥ 9a²/64 + δ(a²/9)/(1 − δ),
or
  δ ≥ 9/17.
Therefore, if the discount factor is high enough, the proposed strategy will constitute a subgame perfect Nash equilibrium in this infinitely repeated game!
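The δ ≥ 9/17 threshold can be confirmed numerically by comparing the collusion and deviation values derived above (function names are mine; a is the demand intercept):

```python
def collusion_value(a, delta):
    # Value of producing q_M forever: (a^2 / 8) per period, discounted.
    return (a**2 / 8) / (1 - delta)

def deviation_value(a, delta):
    # Best one-shot deviation profit 9a^2/64, then Cournot punishment a^2/9.
    return 9 * a**2 / 64 + delta * (a**2 / 9) / (1 - delta)
```

The two values cross exactly at δ = 9/17: collusion is strictly better above the threshold and strictly worse below it.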
Is this the only subgame perfect Nash equilibrium?
• Hardly!
• One criticism of trigger strategies like our proposed strategy is that they do not permit cooperation to be reestablished.
• It is possible to find subgame perfect Nash equilibrium strategies that allow cooperation to be reestablished:
– Period 1: qi1 = qM.
– Period t > 1:
  • qit = qM if qjt−1 = qM or qjt−1 = x
  • qit = x otherwise
• Though defining such strategies and proving they are subgame perfect can be an arduous task!