Repeated Games
APEC 8205: Applied Game Theory
Fall 2007
Objectives
• Understand the Class of Repeated Games
• Understand Conditions Under Which Non-Nash Play Can Be Sustained as a Subgame Perfect Nash Equilibrium When a Game Is Repeated
– Multiple Nash Equilibria
– Infinite Repetition
Why study repeated games?
• Many interactions in life are repeated.
– Large retailers compete on a daily basis for customers.
– Dana and I compete on a daily basis to decide who will make dinner and who will pick up around the house.
– Mason and Spencer compete on a daily basis to see who gets to watch TV and who gets to play X-Box.
• What is of interest in these types of repeated interactions?
– Can players achieve better results than might occur in a single-shot game?
– Can players use the history of play to their advantage?
Some Terminology
• G: Stage game (usually thought of in normal form).
– Players: i = 1,…,N
– Ai: Strategy space for player i, with typical element ai ∈ Ai.
– a = (a1,…,aN) ∈ A = A1 × … × AN: Strategy profile for all players.
– ui(a): Player i’s payoff for strategy profile a.
– u(a) = (u1(a),…,uN(a)): Vector of player payoffs for strategy profile a.
• T: Number of times the stage game is repeated (could be infinite).
• ait ∈ Ai: Player i’s strategy choice at time t.
• at = (a1t,…,aNt) ∈ A: Strategy profile for all players at time t.
• ht = (a1,…,at−1) ∈ A × … × A (t − 1 copies): History of play at time t.
• sit(ht) ∈ Ai: History-dependent strategy for player i at time t.
• st(ht) = (s1t(ht),…,sNt(ht)) ∈ A: History-dependent strategy profile at time t.
• Ui(s1(h1),…,sT(hT)) = Σt=1,…,T wit ui(st(ht)): Player i’s payoff from the game, where wit is the weight player i places on stage t.
• U(s1(h1),…,sT(hT)) = (U1(s1(h1),…,sT(hT)),…,UN(s1(h1),…,sT(hT))): Payoffs for all players.
Consider an Example
Suppose this Prisoner’s Dilemma game is played twice and that wit = 1 for i = 1, 2 and t = 1, 2. Payoffs are listed as (Player 1, Player 2):

                 Player 2
                  C        D
Player 1   C    2, 2     0, 3
           D    3, 0     1, 1
Two Period Prisoner’s Dilemma Example: In Extensive Form
[Game tree: Player 1 moves (C or D), then Player 2, in each of two stages; each first-period outcome leads to a full second-period Prisoner’s Dilemma. Terminal payoffs (Player 1’s payoff, Player 2’s payoff) are the sums of the two stage payoffs, ranging from (0, 6) and (6, 0) on the one-sided defection paths to (4, 4) after (C, C) in both periods and (2, 2) after (D, D) in both periods.]
Two Period Prisoner’s Dilemma Example: After Solving Stage 2 Subgames
[Game tree: (D, D) is played in every second-period subgame, so each first-period outcome has the (1, 1) continuation payoff added to its stage payoffs, leaving a one-period game with payoffs (Player 1, Player 2) of (3, 3) after (C, C), (1, 4) after (C, D), (4, 1) after (D, C), and (2, 2) after (D, D).]
Two Period Prisoner’s Dilemma Example: After Solving Game As Whole
[Game tree: in the reduced one-period game, D strictly dominates C for both players, so both defect in period 1 as well, yielding total payoffs (2, 2).]
Therefore, the subgame perfect strategies are (strategy choice in stage 1,
strategy choice in stage 2 given (D,D) in stage 1, strategy choice in stage 2 given (D,C) in stage 1, strategy choice in stage 2 given (C,D) in stage 1, strategy choice in stage 2 given (C,C) in stage 1)
= (D, D, D, D, D)
for both players.
So, what is the point?
• If the stage game of a finitely repeated game has a unique Nash equilibrium, then there is a unique subgame perfect equilibrium where that Nash equilibrium is played in every stage of the game!
• But what can happen if there is not a unique equilibrium?
• Or what if the stage game can be infinitely repeated?
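The backward-induction argument above can be checked by brute force. The following is a minimal sketch (the function and variable names are mine, not from the slides): find the pure Nash equilibria of the stage game, then of the period-1 game augmented with the (1, 1) continuation payoff from second-period defection.

```python
from itertools import product

# Stage-game payoffs, (Player 1, Player 2): C = cooperate, D = defect.
STAGE = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}

def pure_nash(game):
    """Return the pure-strategy Nash equilibria of a 2x2 game dict."""
    eq = []
    for a1, a2 in product("CD", repeat=2):
        u1, u2 = game[(a1, a2)]
        if (all(u1 >= game[(b, a2)][0] for b in "CD") and
                all(u2 >= game[(a1, b)][1] for b in "CD")):
            eq.append((a1, a2))
    return eq

# Every period-2 subgame is just the stage game, whose unique equilibrium
# (D, D) is played after any history.  The period-1 game is then the stage
# game with (1, 1) added to every cell.
AUGMENTED = {p: (u1 + 1, u2 + 1) for p, (u1, u2) in STAGE.items()}
```

Both the stage game and the augmented period-1 game have (D, D) as their unique pure equilibrium, which is exactly the (D, D, D, D, D) subgame perfect strategy found above.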
What about multiple equilibria?
Consider this modified version of the Prisoner’s Dilemma and assume T = 2 and wit = 1 for i = 1, 2 and t = 1, 2. Payoffs are listed as (Player 1, Player 2):

                 Player 2
                  L        C        R
           U    4, 4     2, 5     0, 0
Player 1   M    5, 2     3, 3     0, 0
           D    0, 0     0, 0     1, 1
Starting with Period 2
• There are 9 possible histories for the 2nd period of this game:– (U,L), (U,C), (U,R), (M,L), (M,C), (M,R), (D,L), (D,C), and (D,R).
• For any subgame starting from one of these histories, there are two potential Nash equilibria: (M,C) or (D,R).
• Therefore, for an equilibrium strategy to be subgame perfect, it must specify (M,C) or (D,R) in response to every history (x, y), x = U, M, D and y = L, C, R, from the first period.
Now Period 1
With these strategies the players’ payoffs for the game starting in period 1 are:
                 Player 2
                  L        C        R
           U    7, 7     3, 6     1, 1
Player 1   M    6, 3     4, 4     1, 1
           D    1, 1     1, 1     2, 2
which yields a subgame perfect equilibrium with cooperative, non-Nash stage-game play in period 1!
Consider the strategies s12(h1) = M if h1 = (U,L) and D otherwise,
and s22(h1) = C if h1 = (U,L) and R otherwise.
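This construction is easy to verify numerically. A minimal sketch (names are mine): add the continuation payoffs the strategies prescribe — (3, 3) from (M,C) after (U,L), and (1, 1) from (D,R) after anything else — to each first-period cell, and confirm that (U,L) is a Nash equilibrium of the augmented game even though it is not one of the stage game.

```python
STAGE = {  # (Player 1 payoff, Player 2 payoff)
    ("U", "L"): (4, 4), ("U", "C"): (2, 5), ("U", "R"): (0, 0),
    ("M", "L"): (5, 2), ("M", "C"): (3, 3), ("M", "R"): (0, 0),
    ("D", "L"): (0, 0), ("D", "C"): (0, 0), ("D", "R"): (1, 1),
}

def continuation(profile):
    """Second-period payoffs under the proposed history-dependent strategies."""
    return (3, 3) if profile == ("U", "L") else (1, 1)

AUGMENTED = {p: (u1 + continuation(p)[0], u2 + continuation(p)[1])
             for p, (u1, u2) in STAGE.items()}

def is_nash(game, profile, rows="UMD", cols="LCR"):
    """True if neither player has a profitable unilateral deviation."""
    a1, a2 = profile
    u1, u2 = game[profile]
    return (all(u1 >= game[(r, a2)][0] for r in rows) and
            all(u2 >= game[(a1, c)][1] for c in cols))
```

The check shows the continuation payoffs are doing the work: (U,L) fails as a stage-game equilibrium but survives once deviations trigger the (D,R) punishment.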
What about infinite repetition?
• First, two definitions:
– Feasible payoff: any convex combination of the pure-strategy profile payoffs. That is, πi is feasible if πi = Σa∈A λa ui(a), where λa ≥ 0 for all a ∈ A and Σa∈A λa = 1.
– Average payoff: (1 − δ) Σt=1,…,∞ δ^(t−1) ui(ht), where 1 > δ ≥ 0 is the discount factor.
• Theorem (Friedman 1971): Let G be a finite, static game of complete information. Let (e1,…,eN) denote the payoffs from a Nash equilibrium of G, and let (x1,…,xN) denote any other feasible payoffs from G. If xi > ei for every i and if δ is sufficiently close to one, then there exists a subgame perfect Nash equilibrium of the infinitely repeated game G that achieves (x1,…,xN) as the average payoff.
• Often referred to as the Folk Theorem, but there are now lots of different versions of this Folk Theorem.
What does this result mean?
• In infinitely repeated games, we can get lots of subgame perfect equilibria.
• These equilibria can include actions in a stage game that are not Nash equilibrium actions for that stage game.
• You can get cooperative behavior in a Prisoner’s Dilemma!
Let’s see what I mean.
Consider the Prisoner’s Dilemma
Payoffs are listed as (Player 1, Player 2):

                 Player 2
                  C        D
Player 1   C    2, 2     0, 3
           D    3, 0     1, 1
Consider the strategy: Play C in period 1; play C in period t > 1 if at′ = (C, C) for all t′ < t; otherwise play D.
Can we find a discount factor such that this strategy is subgame perfect for this Prisoner’s Dilemma if it is repeated infinitely?
The answer to this question is yes!
• Suppose Player j is playing this type of strategy. At any point in time, Player j has either chosen D in the past in response to i’s choice of D or he has always chosen C because i has always chosen C. So, we must consider whether the strategy above is a best response for player i under both of these circumstances.
• If D has been chosen in the past, player j will always choose D in the future. What is optimal for i now will be optimal for i in the future due to infinite repetition.
– Let VC & VD be the current value to player i of playing C & D from here on out.
– If C is optimal, i’s payoff from here on out will be VC = 0 + δVC, such that VC = 0.
– If D is optimal, i’s payoff from here on out will be VD = 1 + δVD, such that VD = 1/(1 − δ).
– VD > VC, so D is optimal.
• If D has not been chosen in the past, player j will choose C in the immediate future and will continue to do so as long as i does. But if i chooses D, j will follow suit from here on out. Again, what is optimal for i now will be optimal for i in the future due to infinite repetition.
– If C is optimal, i’s payoff from here on out will be VC = 2 + δVC, such that VC = 2/(1 − δ).
– If D is optimal, i’s payoff from here on out will be VD = 3 + δ/(1 − δ).
– VC >/=/< VD when δ >/=/< ½.
To summarize
• As long as δ ≥ ½, this strategy will constitute a subgame perfect Nash strategy for the infinitely repeated Prisoner’s Dilemma.
• This type of strategy is often referred to as a trigger strategy.
– Bad behavior on the part of one player triggers bad behavior on the part of his opponent from here on after.
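The value comparison behind the δ ≥ ½ threshold can be checked numerically. A small sketch (function names are mine) of VC and VD from the no-defection-yet case:

```python
def cooperate_value(delta):
    # V_C = 2 + delta * V_C  =>  V_C = 2 / (1 - delta)
    return 2 / (1 - delta)

def defect_value(delta):
    # V_D = 3 + delta * 1 / (1 - delta): a one-shot gain of 3,
    # then mutual defection worth 1 per period forever.
    return 3 + delta / (1 - delta)
```

Evaluating both at a few discount factors confirms that cooperation is exactly break-even at δ = ½ and strictly better for any larger δ.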
Are there other trigger strategies that can work?
YES!
General Trigger Strategy
• Define:
– πi*: equilibrium payoff (per stage)
– πiD: defection payoff
– πiP: punishment payoff (Nash equilibrium payoff per stage)
• Assume πiD > πi* > πiP.
• Cheating doesn’t pay when:
  πi*/(1 − δ) ≥ πiD + δπiP/(1 − δ),
or
  δ ≥ (πiD − πi*)/(πiD − πiP).
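The critical discount factor from this inequality can be packaged as a small helper (the function name `critical_delta` is my own, purely illustrative); plugging in the Prisoner’s Dilemma values (πi* = 2, πiD = 3, πiP = 1) recovers the threshold of ½ found earlier.

```python
def critical_delta(pi_star, pi_defect, pi_punish):
    """Smallest discount factor at which a trigger strategy deters cheating:
    delta* = (pi_D - pi*) / (pi_D - pi_P)."""
    # The trigger-strategy logic requires this payoff ordering.
    assert pi_defect > pi_star > pi_punish
    return (pi_defect - pi_star) / (pi_defect - pi_punish)
```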
Are there other types of strategies that can work?
YES! LOTS MORE!
So what are we to make of all this?
• It does provide an explanation for cooperation in games where cooperation seems unlikely.
• However, the explanation tells us that almost anything is possible.
– So, what type of behavior can we expect?
– The theory provides few answers.
• There has been a lot of research on repeated Prisoner’s Dilemma games to understand the best way to play as well as how people actually play. Of particular interest is Axelrod (1984). Axelrod had researchers submit various strategies and had computers play them against one another to see which ones performed the best. Tit-for-Tat strategies tended to perform the best.
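A miniature version of such a tournament can be sketched as follows; the three strategies shown and the 200-round match length are my own illustrative choices, not Axelrod’s actual setup.

```python
# Stage-game payoffs, (row player, column player): C = cooperate, D = defect.
STAGE = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
         ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opp_history):
    # Cooperate first, then mirror the opponent's last move.
    return opp_history[-1] if opp_history else "C"

def always_defect(opp_history):
    return "D"

def always_cooperate(opp_history):
    return "C"

def play(s1, s2, rounds=200):
    """Play a repeated match; each strategy sees the opponent's past moves."""
    moves1, moves2, u1, u2 = [], [], 0, 0
    for _ in range(rounds):
        a1, a2 = s1(moves2), s2(moves1)
        p1, p2 = STAGE[(a1, a2)]
        u1, u2 = u1 + p1, u2 + p2
        moves1.append(a1)
        moves2.append(a2)
    return u1, u2
```

Against itself, Tit-for-Tat sustains mutual cooperation for all 200 rounds; against an unconditional defector it loses only the first round and then punishes forever after.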
Application: Cournot Duopoly with Repeated Play
• Who are the players?
– Two firms
• Who can do what when?
– Firms choose output each period (qit for i = 1, 2), to infinity & beyond
• Who knows what when?
– Firms know output choices for all previous periods
• How are players rewarded based on what they do?
– πit = Σs=t,…,∞ δ^(s−t) (a − qis − qjs) qis
Stage Game Output & Profit
• Cournot Nash equilibrium
– Output: q1C = q2C = qC = a/3
– Profit: π1C = π2C = πC = a²/9
• Collusive monopoly outcome
– Output: q1M = q2M = qM = a/4
– Profit: π1M = π2M = πM = a²/8
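These values follow from a linear inverse demand P = a − Q with zero marginal cost, the specification consistent with the profit expressions in this slide. A quick numerical check (function names are mine, and a = 12 is an arbitrary illustrative choice):

```python
def best_response(a, q_other):
    # Firm i maximizes (a - q_i - q_other) * q_i  =>  q_i = (a - q_other) / 2.
    return (a - q_other) / 2

def stage_profit(a, q_i, q_j):
    # Per-period profit with inverse demand P = a - Q and zero cost.
    return (a - q_i - q_j) * q_i

a = 12.0
q_cournot = a / 3    # fixed point of the best response: q = (a - q) / 2
q_monopoly = a / 4   # each firm produces half the monopoly quantity a / 2
```

The Cournot output is a fixed point of the best-response map, and the per-firm profits come out to a²/9 and a²/8 as stated.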
Is it possible to sustain the collusive Monopoly outcome as a subgame perfect Nash equilibrium with infinite repetition?
Consider the Strategy
• Period 1: qi1 = qM
• Period t > 1:
– qit = qM if qit′ = qjt′ = qM for all t′ < t
– qit = qC otherwise
Let’s check to see if this proposed strategy is a subgame perfect Nash equilibrium.
• To accomplish this, we need to show that the strategy is a Nash equilibrium in all possible subgames.
• Our task is simplified here by the fact that there are only two distinct types of subgames:
– qit′ ≠ qM or qjt′ ≠ qM for some t′ < t
– qit′ = qjt′ = qM for all t′ < t
First consider qit′ ≠ qM or qjt′ ≠ qM for some t′ < t
• With this history, the proposed strategy says both players should choose qC.
• So, let’s see what the optimal output in period t is for Firm i given Firm j will always choose qC. Firm i’s value in this subgame is
  Vit = max Σs=t,…,∞ δ^(s−t) (a − qis − qC) qis.
• Because Firm j’s output no longer depends on anything Firm i does, Firm i can maximize period by period:
  max over qit of (a − qit − qC) qit,
with first-order condition a − 2qit − qC = 0, such that
  qit = (a − qC)/2 = (a − a/3)/2 = a/3 = qC.
Firm i’s optimal strategy is to choose the Cournot output, just like the proposed strategy says!
Now consider qit′ = qjt′ = qM for all t′ < t
• With this history, the proposed strategy says both players should choose qM.
• So, let’s see what the optimal output in period t is for Firm i given Firm j will always choose qM as long as Firm i chooses qM.
First, suppose that Firm i chooses qM in period t and forever after. Then
  Vit = Σs=t,…,∞ δ^(s−t) (a − qM − qM) qM = (a − 2qM)qM/(1 − δ) = (a²/8)/(1 − δ).
Now, suppose Firm i chooses something other than qM in period t. A deviation triggers qjs = qC for all s > t, and recall that we have already solved the optimization problem for that punishment phase, which implies qis = qC for all s > t and a continuation value of δ(a²/9)/(1 − δ). So the best deviation solves
  Vit = max over qit of (a − qit − qM) qit + δ(a²/9)/(1 − δ),
with first-order condition a − 2qit − qM = 0, such that
  qit = (a − qM)/2 = 3a/8
and
  Vit = (3a/8)² + δ(a²/9)/(1 − δ) = 9a²/64 + δ(a²/9)/(1 − δ).
Finally, Firm i will prefer to choose the monopoly output forever after if
  (a²/8)/(1 − δ) ≥ 9a²/64 + δ(a²/9)/(1 − δ),
or
  δ ≥ 9/17.
Therefore, if the discount factor is high enough, the proposed strategy will constitute a subgame perfect Nash equilibrium in this infinitely repeated game!
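The δ ≥ 9/17 threshold can be confirmed numerically by comparing the collusion and deviation values derived above (function names are mine; a is the demand intercept):

```python
def collusion_value(a, delta):
    # Value of producing q_M forever: (a^2 / 8) per period, discounted.
    return (a**2 / 8) / (1 - delta)

def deviation_value(a, delta):
    # Best one-shot deviation profit 9a^2/64, then Cournot punishment a^2/9.
    return 9 * a**2 / 64 + delta * (a**2 / 9) / (1 - delta)
```

The two values cross exactly at δ = 9/17: collusion is strictly better above the threshold and strictly worse below it.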
Is this the only subgame perfect Nash equilibrium?
• Hardly!
• One criticism of trigger strategies like our proposed strategy is that they do not permit cooperation to be reestablished.
• It is possible to find subgame perfect Nash equilibrium strategies that allow cooperation to be reestablished:
– Period 1: qi1 = qM.
– Period t > 1:
  • qit = qM if qjt−1 = qM or qjt−1 = x
  • qit = x otherwise
• Though defining such strategies and proving they are subgame perfect can be an arduous task!