Multiple Opponents and the Limits of Reputation
Sambuddha Ghosh*
January 30, 2009
Abstract
I consider a reputation game with perfect monitoring and multiple (two or more) long-lived
opponents indexed 1, 2, ..., n. Player 0, who attempts to build a reputation, could be of one of
many types. The normal type of player 0 and players 1, 2, ..., n maximise the discounted sum
of stage-game utilities; non-normal types of 0 are committed to particular strategies. Let v0^min
be the minimum equilibrium payoff of player 0 in the limit when all players are patient and 0
is patient relative to the rest. The previous literature finds that for a single opponent (n = 1)
we have v0^min ≥ L, where the lower bound L equals the maximum payoff feasible for 0 subject
to giving the opponent her minmax. In other words, 0 can appropriate 'everything'. For
n > 1, in contrast, I find an upper bound l such that v0^min ≤ l: Any payoff of the (complete-
information) repeated game in which 0 gets more than l can be sustained even when 0 has
the twin advantages of relative patience and one-sided incomplete information. Typically l is
a low number and could be as low as 0's minmax value. This result is robust to a very large
set of types: I start with the case where the type space may include any pure strategy of the
dynamic game; a later section extends the result to types that, in addition to playing arbitrarily
complex history-dependent actions, may be committed to mixing.
*I am grateful to Dilip Abreu for detailed comments and advice; Stephen Morris, without whose encouragement this might never have taken shape; and Faruk Gul and Eric Maskin, for their suggestions and guidance. Satoru Takahashi and Takuo Sugaya helped me check the proofs; discussions with Vinayak Tripathi sharpened my understanding of the literature. Several attendees at Princeton's theory workshop & theory seminar, especially Wolfgang Pesendorfer, and at the Penn State Theory Seminar offered valuable comments. Errors remain my responsibility.
1 Introduction and Literature Review
The literature on "reputation" starts with the premise that while we may be almost certain of
the payoffs and the structure of the game, we cannot be absolutely certain. The small room for
doubt, when exploited by a very patient player (sometimes referred to as a long-run player, or
LR for short), leads to "reputation" being built. The finite-horizon chain-store games of Kreps and
Wilson (1982) and Milgrom and Roberts (1982) introduced this idea. They show that arbitrarily small
amounts of incomplete information suffice to give an incumbent monopolist a rational motive to
fight early entrants in a finitely repeated entry game, even if in each stage it is better to acquiesce
once an entry has occurred; in contrast, the unique subgame-perfect equilibrium of the complete-
information game is marked by entry followed by acquiescence in every period (this is Selten's
well-known chain-store paradox). Another paper by the above-named four authors explains cooperation in
the finitely repeated prisoner's dilemma using slight incomplete information, although there is no
cooperation in the unique subgame-perfect equilibrium of the corresponding complete-information
game. The first general result on reputation, due to Fudenberg and Levine (FL, 1989), applies
to a long-lived player 0 playing an infinitely repeated simultaneous stage-game against a myopic
opponent: As long as there is a positive probability of a type that plays the Stackelberg action,¹ a
sufficiently patient LR can approximate his Stackelberg payoff.
Work on reputation has not looked at games with multiple long-lived opponents who interact
with one another, not just with the long-run player; my paper investigates how far the results for
a single opponent extend to settings with multiple non-myopic opponents (i = 1, 2, ..., n; n > 1)
and one LR player (0), who is very patient relative to the rest of the players. I deal with a model
of perfect monitoring. The basic issue in the 'reputation' literature is: How low can the payoff
of the LR player be when he has access to an appropriate set of types to mimic and is patient
relative to the other players? Fudenberg and Levine have shown that the answer to this question
is, in some sense, "very high": When a patient player 0 faces a single myopic opponent, there is
a discontinuity in the (limiting) set of payoffs that can be supported as we go from the complete-
information (c.i.) to the incomplete-information (i.i.) game, provided we allow a rich enough space of types.
Even small ex-ante uncertainties are magnified in the limit, and the effect on equilibria is drastic.
Does the same message hold in variants of the basic framework?
1. The Stackelberg action is the action to which 0 would have liked to commit if he could; the best response to it gives a higher payoff to player 0 than committing to any other action does.

Schmidt (Econometrica, 1994), Aoyagi (JET, 1996), and CFLP (Celentani, Fudenberg, Levine,
Pesendorfer; Econometrica, 1994) all consider only one non-myopic opponent. At the cost of some
additional notation, let us be precise about the discount factors δ0, δ1, ..., δn. All three papers deal
with the limiting case where all players are patient but 0 is relatively patient, i.e. δi → 1 ∀i > 0 and
(1 − δ0)/(1 − δi) → 0 ∀i > 0. One standard justification of a different discount factor for LR is that he is a large
player who plays with several opponents, who play relatively infrequently. My work will retain this
assumption. To see the importance of having a non-myopic opponent, recall FL’s argument for a
single myopic opponent: If the normal type mimics the type that plays the “Stackelberg action” in
every period, then eventually the opponent will play a best-response to the Stackelberg action if she
is myopic. Schmidt is the first to consider the case of one non-myopic opponent; he shows that this
natural modification introduces a twist in the tale: The opponent need not play a period-by-period
best response, because she fears that she is facing a perverse type that plays like the commitment
type on the path but punishes severely off-path. Since off-path strategies are not learnt in any equi-
librium with perfect monitoring, this could lead to very weak lower bounds, ones lower than FL's
bound in particular. Schmidt shows that the result in FL '89 extends only to "conflicting interest
games": games where the reputation builder would like to commit to an action that minmaxes
the other player. Roughly, this is because the SR player has no choice about how to respond:
she must play her static best response or get less than her minmax value, which is impossible in
equilibrium. Later, Cripps, Schmidt, and Thomas (CST) consider arbitrary stage-games and obtain
a tight bound that is strictly below the bound of FL. Their bound is "tight" in the sense that there
exist equilibria that give LR payoffs just above this bound.
The literature that follows argues that CST take a somewhat bleaker view of reputation effects
than warranted. Aoyagi and CFLP both differ from Schmidt and CST in that there is no issue of
whether strategies can be learnt: Trembles in the first paper and imperfect monitoring in the second
ensure that all possible triggers are pressed and latent fears do not remain. When strategies are
eventually learnt, reputation is once again a potent force. Here is their main result: As the (relatively
impatient) opponent also becomes patient in absolute terms, the payoff of the patient player tends
towards g0** = max{v0 | (v0, v1) ∈ F*},² where F* is the feasible and individually rational set of
payoffs; i.e. he gets the most that is consistent with the individual rationality of the opponent, player
1. A final decisive step towards restoring a high lower bound was taken by Evans and Thomas (ET,
Econometrica, 1997); they showed that the weakness of the bound in CST relies on the assumption
that all irrational types impose punishments of bounded length; under limiting patience they extend
the FL result to the case of a single long-lived opponent playing an arbitrary simultaneous stage-
game even under perfect monitoring. Assign a positive prior probability to a type that plays some
block of actions, expects a block of replies in return, and punishes for k periods the kth deviation
from the desired responses; then the lower bound, with a (sufficiently patient) long-lived opponent,
tends to the best feasible payoff for LR subject to the opponent getting her minmax payoff. ET
obtain the same bound g0** as Aoyagi and CFLP. In terms of the framework used in the literature,
my work adds more opponents to Schmidt's framework. To focus on a different issue, I shall
introduce a signalling stage that abstracts from learning problems that could arise under perfect
monitoring. In a later section I show that this modification does not distort results, although it
does greatly simplify the description of the strategies.

2. We adopt the convention that the generic value is v, w is the "worst" value, and b denotes the best. The mnemonic advantages hopefully justify any break with practice.
The immediate generalisation of the bound obtained by Aoyagi, CFLP, and ET for n = 1 to
multiple (n > 1) opponents is g0** = max{v0 | (v0, v1, ..., vn) ∈ F*}. One obvious case where this
holds is the one where the relatively patient player (0) plays a series of independent games
with the other n players.³ When players 1, ..., n are myopic, i.e. their discount factors are 0, the
above bound follows readily from the analysis of FL '89; if the other players are also patient, the
bound derives from an n-fold repetition of the analysis of ET '97. However, I find that this is not
true in general: The presence of additional opponents, each with the ability to punish and to play
out various repeated-game strategies, complicates the situation. Recall that the previous literature
obtains lower bounds and establishes that v0^min ≥ L, where L is a large lower bound that equals
the Stackelberg payoff of 0 in FL '89 and is g0** in the three other papers mentioned above. It
is interesting that, under some non-emptiness restrictions that rule out cases like the immediate
extension above, a world with multiple opponents gives an upper bound on how high a payoff
"reputation" can guarantee LR. In other words, I define a bound l such that all c.i. (i.e. without
the perturbed type space) equilibrium payoffs that give the LR player more than l can be sustained
in the limiting set of payoffs even under incomplete information, and even with all the CFLP and
the Schmidt types; furthermore, sequential rationality is satisfied in the construction of the above
equilibria. Finally, l could be much lower than g0**, and even as low as the minmax value of player
0. This means that while reputation is in general valuable to a player, its impact is less dramatic
when there are multiple opponents; above a certain value the perturbed game is no different from
the original repeated game of complete information.
Fudenberg and Kreps (1987) is the only paper to my knowledge that has multiple opponents playing
0, who is trying to build a reputation. Their framework is significantly different from mine; in
particular, the SR players do not have a direct effect on one another's payoffs through their actions.
Their paper is more concerned with the effect of 0's actions being observed publicly rather than privately
by each opponent when the basic "contest" is an entry-deterrence game. At this point I should also
note the literature on games in which all players have equal discount factors. Cripps et al. and
Chan have shown that, except in special cases, the reputation builder needs to be patient relative to
the others to be able to derive an advantage from the incomplete information. My main result applies
equally well when all players have equal discount factors, although it contrasts most sharply with
the previous literature when player 0 is more patient than the rest, which is also the case that has
received the most attention. There is yet another strand of the reputation literature, starting with
the work of Mailath and Samuelson in ??, that looks not at the payoff from building reputation,
but at the actual belief of the opponent about the type of the reputation-builder. CMS show that, in a
world with imperfect monitoring, even where a patient player can get high payoffs in equilibrium by
mimicking a committed type, the single opponent will eventually believe with very high probability
that she is indeed facing a normal type; in other words, reputation disappears in the long run under
imperfect monitoring. This question will not be relevant in our context.

3. By "independent" I mean that the payoff of any player i > 0 is independent of the actions of any player j > 0, j ≠ i. The game is then no more than the concatenation of n games, each of which has player 0 as one of the two players.

The plan of the paper is as follows. Section 2 has some motivating examples; section 3 lays out the formal details of the
model; the next deals with the upper bounds on the minimum equilibrium payoffs of LR. I extend
my main result to additional situations of interest in section 5, although at the cost of increasing
complexity of the equilibrium strategies and the analysis. Section 6 proves analogous results when
there are types that explicitly mix on the path. This calls for very different techniques because,
unlike with pure-strategy types, it is no longer the case that a deviation by the reputation-builder
is immediately detected; he could now conceivably get away with declaring one type and behaving
like yet another type, as long as he does not give himself away.
2 Motivating Examples
This section presents a few simple examples that place the current paper in context. In each
example we start with a benchmark case, and in turn illustrate the implications of the theory of
repeated games, then of the existing literature on reputation, and finally that of the current paper.
They try to weave stories, admittedly simplistic ones, to give a sense of the results. The first
example is meant to be the simpler of the two; the second is best postponed until the reader has
acquired familiarity with the concepts and quantities introduced in the main model.
Example 1 (A Skeletal Oligopoly Example). A caveat: this example has special features that do
not appear in the proof but permit a simpler analysis.
Consider a market of size 1 for a perishable good in which three firms 0, 1, 2 interact at times
t = 1, 2, ..., ∞. The demand curve is linear in the total output q:

p(q) := 1 − q if q ∈ [0, 1];  p(q) := 0 if q > 1.
I wish to consider a case where the products are slightly imperfect substitutes. However, to keep the
algebra to a minimum I use the shortcut of a side-market. There is one small side-market
of size ε shared by the SR firms 1 and 2: p(q′) := ε − q′ for q′ ∈ [0, ε], where ε is a very small
positive number and q′ is the total output in the side-market. This ensures that the two firms 1
and 2 cannot be minmaxed unilaterally by 0; the formal model clarifies this point further. Let us
make another simplifying assumption: fixed and marginal costs are 0. A firm's output is chosen
from a compact convex set; although, strictly speaking, my result deals with finite action spaces,
an appropriately fine grid of actions can approximate arbitrarily closely the results that follow. For
the infinitely repeated game, firm i discounts profits at rate δi; we shall consider the case when 0
is relatively patient, i.e. all δi are close to 1 but δ0 is relatively closer. (For the main result with
multiple opponents it is enough that all players are patient, whether or not 0 is relatively patient;
however, the results for n = 1 make critical use of the relative patience of 0.)
First we compute the following benchmark quantities for the main market. Suppose there is a
monopolist serving the main market. The monopolist's per-period profit function is π0(q) = (1 − q)q.
Calculate the following: π0′(q) = 1 − 2q; π0″(q) = −2 < 0. Therefore the maximum per-period profit
of the monopolist is πm = 1/4 at the output qm = 1/2. The maximum discounted and normalised
total profit is also 1/4 if the monopolist discounts using δ0 ∈ (0, 1). What is the Stackelberg profit
of firm 0 as leader and 1 as follower? First solve max_{q1} q1(1 − q0 − q1) to get q1* = (1 − q0)/2; the maximum
profit is max_{q0 ∈ [0,1]} q0(1 − q0)/2 = 1/8. The Cournot-Nash equilibrium will also be a useful benchmark. In
this equilibrium each firm produces 1/3, leading to a price of 1/3 and a profit of 1/9 for each of 0 and 1.
For use later in the example, note that in the side-market of size ε the monopoly profit is ε²/4, which
is obtained when the total output is ε/2; in the Cournot-Nash equilibrium each firm i = 1, 2 produces
ε/3 and earns a profit of ε²/9 < ε²/8, one-half the monopoly profit.
For now suppose that 0 and 1 are the only firms present. The main market is then a Cournot
duopoly with perfect monitoring and, most importantly, complete information. The Nash-threats
folk theorem of Friedman shows that the point (1/8, 1/8) obtained by a fair split of the monopoly profits
can be sustained in an equilibrium (indeed in an SPNE) if the players are patient enough, because
the Cournot-Nash equilibrium features a strictly lower profit of 1/9 for each. The following trigger
strategy accomplishes it: each firm produces 1/4, half the monopoly output; any deviation leads to both
firms switching to the Cournot-Nash equilibrium forever.
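How patient is "patient enough" here? A one-shot deviation argument under the stated trigger strategy pins this down; the threshold 9/17 below is my own back-of-the-envelope computation, not a number from the paper:

```python
from fractions import Fraction as F

coop = F(1, 8)     # per-period profit on the collusive path
punish = F(1, 9)   # Cournot-Nash profit after reversion
# Best reply to the rival's 1/4 is q = 3/8, earning (3/8)*(1 - 1/4 - 3/8) = 9/64.
deviate = F(3, 8) * (1 - F(1, 4) - F(3, 8))
assert deviate == F(9, 64)

def trigger_sustains(delta):
    """Cooperation pays 1/8 forever; a deviation pays 9/64 once, then 1/9 forever."""
    return coop / (1 - delta) >= deviate + delta * punish / (1 - delta)

# Rearranging the inequality gives the critical discount factor.
critical = (deviate - coop) / (deviate - punish)
assert critical == F(9, 17)
assert trigger_sustains(F(9, 17)) and not trigger_sustains(F(1, 2))
```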
Now consider the following thought experiment: Introduce one-sided incomplete information
about 0 through a type space Θ, which comprises the normal type θ* and finitely many crazy types
θ. The normal type discounts payoffs as usual. However, each 'crazy' type θ ≠ θ* is a choice
of output θ̂1 for t = 1 and, for each t ≥ 1, a mapping θ̂t+1(·) from the observed history of the
other players; given that the only opponent is player 1, θ̂t+1(·) maps from h1^t := (q1(s))_{s=1}^t to an
action/output. We start with the environment of FL: δ1 = 0. If there is a positive probability
of a type that selects the Stackelberg output irrespective of history, 0 can guarantee himself very
close to 1/8 if he is patient enough: If he mimics this type, FL show that player 1 must eventually
best respond with her "follower output" because she is myopic. Let us now make even player 1
patient, while 0 is patient relative to her: δ1 → 1 and (1 − δ0)/(1 − δ1) → 0. Define the limiting payoff set as
V := lim_{δ1→1} {lim_{δ0→1} V(δ0, δ1)}. Introduce a type of player 0 that produces almost the monopoly
output qm every period and, if 1 produces more than some very small quantity, punishes her by
flooding the market for k periods following the kth offence, returning to produce almost qm
afterwards. It follows from the analysis of ET⁴ that if there is a positive probability of this type and
0 mimics him, player 1 can induce no more than a finite number of punishment rounds before she
finds it worthwhile to experiment and see if she can escape punishment by producing very little and
letting LR earn almost the monopoly profit. In other words, the limiting payoff set is V = {(1/4, ε²/4)}! Thus
the combination of patience and reputation is enough for player 0 to extract the entire surplus:
in the limit he enjoys (almost) monopoly profits from the main market, while the other player gets
(almost) nothing in the main market and the small monopoly profit of ε²/4 from the side-market.
Now introduce the third player, making the main market an oligopoly and the side-market a
duopoly; continue to assume the same type space Θ, keeping in mind that now θ̂t+1(·) maps from
(h1^t, h2^t) := (q1(s), q2(s))_{s=1}^t to an action/output. The bound l that I refer to in the introduction can
be shown to be 0 in this example. Thus any payoff vector in the repeated game that gives player
0 more than 0 can be sustained. In particular, my results imply that as the perturbation becomes
small (i.e. the probability μ(θ*) of the normal type goes to 1), the limiting set of equilibrium
payoffs contains a point that gives each player an equal share of the monopoly profit πm from the
main market. This point could not be sustained with only players 0 and 1, where we have already
reasoned that 0 gets arbitrarily close to his monopoly profit of 1/4 when he has enough patience
and crazy types to mimic.

4. Once the formal model is in place I flesh out this argument in more general terms. For a fuller argument the reader is referred to their paper.
Why the marked change in result? Here is a sketch of the argument. Add an announcement stage
to this game, asking the patient player to declare his type by sending a message m ∈ Θ. Consider the
normal type θ* of player 0. If he declares m = θ*, then each firm produces qi(t) = 1/6 ∀i ≥ 0, ∀t ≥ 1
and makes a profit of 1/12 in the main market; in the side-market play starts at the point M (see figure
below): each SR firm produces half the monopoly output and makes half the monopoly profit, i.e.
ε²/8 each. Given that firms are patient, any deviation by 0 may be punished by a reversion to the
Cournot-Nash equilibrium (CNE) in the main market, in which each firm produces an output of 1/4 in
the main market and makes a profit of 1/16; deviations by i > 0 are punished by reverting to the CNE
in both markets. From the analysis of Friedman we already know that this is an SPNE. The trouble
is that θ* would in general want to declare a different type and mimic it. The following strategies
make it undesirable for him to mimic any other type: If m = θ ≠ θ*, the others lock themselves into
a bad equilibrium σ+(θ) as follows. At each t + 1, given any t-period history h^t and announcement
θ, players i > 0 are called upon to play some combination of actions/outputs (σ+(t+1)(h^t, θ)) so
as to eliminate all profits in the main market: θ̂t+1(h^t) + Σ_{i>0} σi(t+1)(h^t, θ) = 1; in the
side-market each firm i = 1, 2 produces ε/4 and earns a profit of ε²/8 as before. If 0 ever deviates from his
announced strategy, he reveals himself to be the normal type and, after he is minmaxed to
wipe out any gains resulting from the above deviation, play moves to the symmetric cooperative
equilibrium sustained by Cournot-Nash reversion as in the c.i. game. The following figure will be
useful in clarifying the strategies following a deviation by any i > 0 after m ≠ θ*.
A deviation by player 1 in either market at any time τ impacts play in both markets. In the
main market the other SR player (2) minmaxes 1 by producing a total output of 1 − θ̂t+1(h^t)
for all subsequent periods t + 1 = τ + 1, τ + 2, ..., ∞, while 1 is asked to produce 0. In the side-market
play moves, starting from τ + 1, to the point where firm 2 produces the Stackelberg output of ε/3
while 1 responds with her follower output of ε/6; this is the point R2 ≈ (ε²/12, ε²/6) in the payoff space
(see figure). If 2 deviates, interchange the two players 1 and 2 in the above construction: play
moves to the point where firms 1 and 2 produce (ε/3, ε/6) in the side-market, leading to the payoffs
R1 ≈ (ε²/6, ε²/12); now 2 is playing her best response to 1's Stackelberg output. Any subsequent
deviation by i > 0 results in play moving to Rj, j ≠ i. (In particular, if i ≠ j deviates from Rj then
play remains at Rj.) In our construction, the player who gets the lower profit at an asymmetric
point in the side-market is playing a best response. Player i does not deviate from Ri because doing so
results in a loss of profits equal to ε²/6 in the side-market as play switches from Ri to Rj forever,
a sufficient deterrent given that firms are patient. Player j does not deviate from Ri because that
is strictly worse: she is anyway playing her unique best response to i's output. Finally, note that
player i punishes 0 according to the prescription of σ+ because otherwise play transitions from M
to Rj in the side-market, resulting in lost profits worth ε²/36. Here lies the key difference between
the single- and multiple-opponent cases: Now player 0 cannot unilaterally give player 1 a (small)
reward for producing a low output while he takes a much larger share, for the other SR player 2 can
destroy these rewards. The ability of the SR players to inhibit rewards from LR and to punish each
other turns the tables on 0; the construction above translates this possibility into an equilibrium
where all players get low profits. We have thus answered the key question: Why does the patient
player not deviate and behave like some crazy type? As we reasoned, doing so would not guarantee
him anything at all, whereas the proposed equilibrium offers him 1/12. ∎
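The side-market payoffs at M, R1, and R2, and the comparison that makes punishing 0 worthwhile, can be checked numerically; a sketch with an illustrative ε = 1/10 (the helper names are mine):

```python
from fractions import Fraction as F

eps = F(1, 10)   # any small market size works; side-market profits scale with eps**2

def sm_profit(q_own, q_other):
    """Side-market profit with demand p = eps - total output and zero costs."""
    return max(eps - q_own - q_other, F(0)) * q_own

# Point M: each SR firm produces eps/4 and earns eps**2/8.
assert sm_profit(eps / 4, eps / 4) == eps**2 / 8

# Point R2 (after a deviation by firm 1): firm 2 produces eps/3, firm 1 eps/6.
r2 = (sm_profit(eps / 6, eps / 3), sm_profit(eps / 3, eps / 6))
assert r2 == (eps**2 / 12, eps**2 / 6)

# An SR firm that fails to punish 0 is sent from M to a point where it earns
# eps**2/12 < eps**2/8, so carrying out the punishment is incentive-compatible.
assert eps**2 / 12 < eps**2 / 8

# In the main market, the SR firms fill the gap left by 0's announced output q0,
# so total output is 1, the price is 0, and all main-market profits vanish.
q0 = F(1, 3)
assert max(1 - (q0 + (1 - q0)), F(0)) * q0 == 0
```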
Example 2 (Oligopoly with Capacity Constraints). This example is best read after the reader
has seen the formal model.
Consider an oligopoly with capacity constraints. At each point of time t = 1, 2, ... there is a
market of size 1 for a perishable good; to be specific, the demand function is

p(q) := 1 − q if q ∈ [0, 1];  p(q) := 0 if q > 1.

There are four firms that serve the market, 0, 1, 2, 3, with capacity constraints k*i = 0.5, 0.45, 0.45, 0.45
respectively, and zero marginal and fixed costs (adopted for simplicity). 0 is the reputation builder. First restrict
attention to the reputation game with n = 1. Player 1's minmax is 1/16. So in the c.i. repeated
game it is possible for 0 to get anything consistent with player 1 getting her minmax. In particular,
0 can get πm − 1/16 = 3/16. As I argued in example 1 above, player 0 can guarantee himself very close
to 3/16 in the i.i. game by mimicking a "crazy" type. The construction is the same as above; to keep
this example short I shall not mention the details of the type space. The assumption on discount
factors is also unchanged. Let us now compute the maxminmax of each firm i > 0, say firm 1.
This is the minmax value of i when 0 plays the action that is most favourable to i and all others try
to minmax her. Suppose that firm 0 produces an output of 0, while firms 2 and 3 produce 0.45 each.
The best firm 1 can achieve is obtained by solving:

max_q {1 − (0.9 + q)}q = 0.1q − q².

The first-order condition gives q* = 1/20, which generates a profit of (1/20) · (1/10 − 1/20) = 1/400 = Wi,
the maxminmax profit of i. This strictly exceeds the minmax value wi = 0 of any i > 0, obtained when
all other firms flood the entire market; in other words, 0's cooperation is needed to minmax any
i > 0. Now we need to check that the condition N introduced formally later holds; this requires us
to check that no matter what 0 does, the others can find an output vector giving each of them more
than Wi = 1/400. When 0's output is low, the other firms find it easier to attain any target level of
profit. So pick the worst slice, where 0 produces 0.5. We wish to find a symmetric point in this slice that gives each
player i > 0 strictly more than the maxminmax. Find the maximum symmetric point in the slice:

max_q (1/2 − 3q) · q  ⇒  1/2 = 6q*  ⇒  q* = 1/12.

The associated level of profit for each firm i > 0 is (1/2 − 1/4) · (1/12) = 1/48 > 1/400. Therefore the non-
emptiness assumption N that I formally introduce later on is satisfied. Now we have to calculate
the lowest profit that 0 could get in any slice subject to the others getting above 1/400. Even without
exact and painstaking calculations it can be shown that this profit is very close to 1/100; the argument
follows. Suppose the other firms almost flood the market so that the price is 1/50; the maximum 0
could be producing is 0.5, earning a profit of 1/100. Each of the three remaining firms can produce
at least 1/8, thereby making a profit of at least (1/8) · (1/50) = 1/400 = Wi. Thus all collusive outcomes of
the repeated game that give 0 more than some very low number (below 1/100) can be sustained even
in the reputational game. Recall that, in contrast, 0 can guarantee himself 3/16 ≫ 1/100 when only one
opponent is present, the presence of multiple opponents bringing about approximately a 20-fold
drop in the minimum assured profit of player 0. ∎
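The arithmetic of this example can be verified mechanically; a sketch in exact fractions (the grid search and `profit` helper are my own scaffolding):

```python
from fractions import Fraction as F

caps = {0: F(1, 2), 1: F(9, 20), 2: F(9, 20), 3: F(9, 20)}  # capacity constraints

def profit(q_own, q_rest):
    """Profit under p = 1 - total output with zero costs."""
    return max(1 - q_own - q_rest, F(0)) * q_own

# Maxminmax of firm 1: 0 produces 0, firms 2 and 3 flood at capacity (0.9 total);
# firm 1's best reply is q* = 1/20, earning W_i = 1/400.
grid = [F(k, 400) for k in range(int(caps[1] * 400) + 1)]
W = max(profit(q, F(9, 10)) for q in grid)
assert W == F(1, 400) and profit(F(1, 20), F(9, 10)) == W

# Condition N in the worst slice (0 at capacity 0.5): the symmetric point
# q = 1/12 gives each firm i > 0 a profit of 1/48 > 1/400.
assert profit(F(1, 12), F(1, 2) + 2 * F(1, 12)) == F(1, 48)

# Floor on 0's profit: at price 1/50, 0 at capacity 0.5 earns 1/100, while each
# rival producing at least 1/8 still clears W_i = 1/400.
assert F(1, 2) * F(1, 50) == F(1, 100)
assert F(1, 8) * F(1, 50) == F(1, 400)

# The drop from 3/16 (one opponent) to about 1/100 is roughly 20-fold.
assert 18 < F(3, 16) / F(1, 100) < 20
```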
3 The Model: Reputation Against Multiple Opponents
There are n + 1 players: 0, 1, 2, ..., n; 0 is the relatively patient, also referred to as long-run (LR),
player who attempts to build a reputation. We refer to players i > 0 as impatient⁵ or SR (for
short-run) players even when their discount factors are strictly greater than 0. Let us first describe
the temporal structure of the complete-information (c.i.) repeated game, which is perturbed to
obtain the "reputational" or incomplete-information (i.i.) game. At each time t = 1, 2, ... the players
play a simultaneous stage-game

G = ⟨N = {0, 1, ..., n}, (Ai)_{i=0}^n, (gi)_{i=0}^n⟩.

Ai is the finite set of pure actions of player i in each stage, and Δ(Ai) is the set of mixed actions αi of
player i. An action profile a of all players lies in A := ∏_{i=0}^n Ai; the action vector a+ of the players
i > 0 lies in A+ := ∏_{i>0} Ai. A comment on the use of subscripts is in order: Profiles are denoted
without a subscript; the ith element of a profile/vector is denoted by the same symbol with
the subscript i; and the subscript + denotes all players i > 0 collectively. The payoff function of agent
i ≥ 0 is gi, and g : A → R^{n+1}. For any E ⊆ R^n, the projection of E onto the plane formed by the
coordinates in J ⊆ {1, 2, ..., n} is denoted by E_J. For any E ⊆ R^k, define co E as the convex hull of
the set E. Let us recall the following definitions, which are standard in the literature: The minmax
value of any player i ≥ 0 in G is wi. The feasible set of payoffs in G is F := co{g(a) : a ∈ A},
using an RPD⁶ (Random Public Device) available to the agents; ω(t) is the observed value of the RPD in
period t, and ω^t denotes the vector of all realised values from period 1 to period t. The strictly
individually rational set is F* := {v ∈ F : vi > wi ∀i ≥ 0}. Monitoring is perfect.

5. The impatience is only relative.
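For concreteness, the standard objects recalled here can be written out explicitly; a sketch in the usual notation (the independent-mixing convention in the minmax is an assumption on my part, since the text does not spell it out):

```latex
% Minmax value of player i, with the other players mixing independently:
w_i := \min_{\alpha_{-i} \in \prod_{j \neq i} \Delta(A_j)} \; \max_{a_i \in A_i} \, g_i(a_i, \alpha_{-i}).

% Feasible set (with the RPD) and the strictly individually rational set:
F := \mathrm{co}\,\{ g(a) : a \in A \}, \qquad
F^{*} := \{ v \in F : v_i > w_i \ \ \forall i \geq 0 \}.
```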
Let us now add incomplete information to the repeated game, converting it into the reputational
game that we study in this paper. The players 1, 2, ..., n all maximise the discounted sum of per-
period payoffs; to cut down on notation and to facilitate comparison with the literature on repeated
games, I take a common discount factor: δi = δ ∀i > 0.⁷ However, the patient player could be
one of many types θ ∈ Θ. The prior on Θ is given by μ ∈ Δ(Θ). The type-space Θ contains θ*,
the normal type of the repeated game, who maximises the sum of per-period payoffs discounted by
δ0, the discount factor of agent 0. The other types are also expected-utility maximisers, but their
utility functions may not be representable as discounted sums of stage-game payoffs; consider,
for example, the "strong monopolist" in the chain-store game of KW, who always prefers to fight entry.
The normal type θ* might want to mimic these "crazy" types θ ≠ θ* in order to secure a higher
payoff. As Fudenberg and Levine show, with δ0 high enough the normal type of player 0 can get
away with the deception: If he is patient enough, mimicking a suitable type guarantees him very
high payoffs. It is clear that the type space Θ must include suitable types for the normal type to
mimic, a feature captured in reputation papers by means of some form of full-support assumption.
What types constitute a rich type-space? In order to investigate the maximal impact of reputation
I allow a rich perturbation in which the crazy types may use strategies with infinite memory, as in
ET.⁸ All strategies of bounded recall are subsumed in this set.

6. Given any payoff v in the convex hull of pure-action payoffs, Sorin constructed a pure strategy without public randomisation, alternating over the extreme points so as to achieve exactly v when players are patient enough. This does not immediately allow us to dispense with the RPD, because the construction of Sorin need not satisfy individual rationality: after some histories the continuation payoff could be below the IR level. Fudenberg and Maskin extended his arguments and showed that this could be done so that after any history all continuation payoffs lie arbitrarily close to v; if v is strictly individually rational, so are the continuation values lying close enough. Taken together, these papers showed that an RPD is without loss of generality when players are patient. I continue to make this assumption in the interests of expositional clarity; a section extending the results of Sorin and FM91 to this game is planned.

7. Lehrer and Pauzner (Econometrica, 1999) look at repeated games with differential discount factors, and find that the possibility of temporal trade expands the feasible set beyond the feasible set of the stage-game. This creates no problems for my results, because the stage-game feasible set continues to remain feasible even if all δi are not equal.

8. With two players, i.e. one opponent, it is enough (see Aoyagi and CFLP) to consider all possible types with the following bounded-recall strategies: each crazy type with bounded recall ℓ ∈ N plays in each period an action that depends on the actions played by the other player in the past ℓ periods.

To restate this more formally, each
12
type $ '= $% is identified with or defined by the following sequence
$1($) # A0,$t+1($) : At+ ! A0
, which gives an initial (t = 1) action and for each t > 1 maps any history of actions9 played by
players 1, 2...n into an action of player 0. I omit the time subscript when it is unlikely to cause
confusion. In what follows we fix (G, #, µ), where G is the stage-game, # = {$%, $1, ...,$K} is an
arbitrary finite set of types, and µ is the prior on #.
The dynamic game starts with an announcement phase at date 0: the LR player sends a mes-
sage m ∈ Θ, announcing his type. Then the usual repeated game with perfect monitoring is played
out over periods t = 1, 2, ..., ∞. Adding an announcement makes this a (still rather complicated)
signalling game. This assumption is also used by Abreu and Pearce (2002, 2007); that it permits
considerable expositional clarity will be seen from the complexity of the strategies and the analysis
when we later extend the main result without incorporating an announcement stage at time 0. As
in any other signalling game, the player is free to announce a type m ≠ θ, i.e. to misreport his true type.
In this framework an equilibrium comprises the following elements, defined recursively over the time
index:
(i) a messaging strategy m : Θ → Θ for player 0, mapping his true type into the announced type;
(ii) for each type θ of 0, a period-1 map σ_θ(1) : Θ × Ω_1 → A_0 from the announcement space to the
period-1 action, and for each t > 1 a map σ_θ(t) : Θ × A^{t-1} × Ω_t → A_0;¹⁰
(iii) for each i > 0 a map σ_i(1) : Θ × Ω_1 → A_i, and for each t > 1 maps σ_i(t) : Θ × A^{t-1} × Ω_t → A_i;
(iv) beliefs μ(·|h^{t-1}) ∈ Δ(Θ), obtained by updating via Bayes' rule wherever possible from the
(t−1)-period history h^{t-1}.
Here Ω_t denotes the set of realisations of the public randomisation device up to period t.
Additionally we stipulate that σ_θ(t) is identically the strategy given by θ whenever θ ≠ θ*. This
in effect is the definition of a "crazy" type. Let σ_θ := {σ_θ(t)}_{t≥1}, and let the collection of maps
{σ_θ}_{θ∈Θ} be denoted σ_0. The strategy σ_i of player i is the collection of maps {σ_i(t)}_{t≥1}. In what
follows u_i(·) refers to the discounted value of a strategy profile to player i, possibly contingent on
a certain history; thus u_i(σ | m, h^{t-1}) is the sum of the per-period payoffs of player i discounted
to the beginning of period t when players play in accordance with σ after player 0 announces a
type m and the history of actions is h^{t-1}. Even when m ≠ θ* we refer to the payoff of the normal
type of player 0 whenever we use u_0(·); since the "crazy" types do not have discounted payoffs,
we sometimes refer to this as the "dummy payoff to player 0" to be more accurate. We state the
following familiar definition.

⁹ Potentially player 0 could condition his play on the RPD up to period t + 1, i.e. on ω^{t+1}, in addition to a^t_{-0}. This would not change our results or proofs, though the notation would be more involved.
¹⁰ Time over which actions are taken is numbered from 1 onwards, since period 0 is just an announcement phase.
Definition 1: A tuple (m, σ*_0, σ*_1, ..., σ*_n, {μ(·|h^{t-1})}_{t>1}) defines a Perfect Bayesian Equilibrium
(PBE) if no player has a strictly profitable unilateral deviation: for player 0 we have
u_0(σ*|m(θ*)) ≥ u_0(σ*|θ) ∀ θ ∈ Θ, and for any i ≥ 0, any message m, and any (t−1)-period history h^{t-1} we also
have u_i(σ*|m, h^{t-1}) ≥ u_i(σ_i, σ*_{-i}|m, h^{t-1}) ∀ σ_i ∈ Σ_i. Furthermore, beliefs are updated using Bayes'
rule wherever possible, as in any PBE.
A subclass of equilibria are those in which the normal type of player 0 tells the truth:
Definition 2: A truthtelling equilibrium is a triple (m, σ, {μ(·|h^{t-1})}_{t>1}) such that m(θ) = θ ∀ θ ∈
Θ, and (m, σ) is a PBE.¹¹
With this notation in place, let us state the result with a single long-lived opponent and perfect
monitoring, due to Evans and Thomas (ET). Player 0 is the reputation builder, while 1 is the normal-
type opponent (n = 1). ET make the following simplifying assumption:
Assumption PAM (Pure Action Minmax): 0 can minmax 1 by playing a pure action.
As they point out, this assumption, while restrictive, is adopted for technical simplicity; otherwise
mixed strategies need to be learnt, as in FL(92). Suppose we wish to approximate the best payoff g_0**
for 0 to within a margin of error ε. Find a sequence of action profiles (a_0**(t), a_1**(t))_{t=1,...,T}
such that the average of 0's payoffs over this block of T action pairs, i.e. (1/T) Σ_{t=1}^T g_0(a_0**(t), a_1**(t)),
is within ε/3 of g_0**, while (1/T) Σ_{t=1}^T g_1(a_0**(t), a_1**(t)) is greater than 1's minmax value. Let θ̂
be the type that plays as follows: (a) 0 starts by playing a block of T actions (a_0**(t))_{t=1}^T; (b) if
player 1 responds with the string (a_1**(t))_{t=1}^T, he repeats this block; (c) when player 1 deviates
for the k-th time from the role prescribed above, player 0 minmaxes her for k periods using
the pure-strategy minmax from PAM above; (d) 0 returns to step (a) irrespective of the actions
player 1 took during the punishment. The key feature is that θ̂ metes out harsher punish-
ments as 1 continues to deviate. Before we state their result, let us define V(δ_0, δ_1) ⊆ R² as the
set of Bayes–Nash equilibrium payoffs for discount factors δ_0 and δ_1 respectively. The associated
payoff set for player 0 alone is given by the projection V_0(δ_0, δ_1) ⊆ R of this set onto dimension
0. It might be useful to remind the reader of the following notation, which we introduced earlier:
g_0** is the maximum payoff of LR consistent with player 1 getting at least her minmax; i.e.
g_0** := max{v_0 : (v_0, v_1) ∈ F}, where F is the feasible, individually rational set of the 2-player c.i. game.

¹¹ In other words, players 1, 2, ..., n and type θ* of player 0 do not want to deviate. For θ* we need to ensure both initial truthtelling and subsequent compliance.
Proposition 0 (Evans and Thomas, 1997, Econometrica): Suppose PAM holds and μ(θ̂) >
0, i.e. the prior μ places positive weight on θ̂. Given ε > 0 there exists a δ_1^min < 1 such that for
any δ_1 > δ_1^min, we have lim inf_{δ_0→1} inf V_0(δ_0, δ_1) > g_0** − ε.
Proof: See Evans and Thomas for details; a sketch follows. Fix ε > 0. This is the margin of error
we shall allow in approximating the best payoff g_0**.
Step 1: We have seen that the block of action profiles (a_0**(t), a_1**(t))_{t=1,...,T} has the property
that (1/T) Σ_{t=1}^T g_0(a_0**(t), a_1**(t)) is within ε/3 of g_0**, while (1/T) Σ_{t=1}^T g_1(a_0**(t), a_1**(t)) is greater
than 1's minmax value.
Step 2: Consider the type θ̂ defined above. If μ(θ̂) > 0, one available strategy for the normal type
of 0 is to declare and mimic θ̂; the payoff from doing this is a lower bound on his equilibrium payoff.
Step 3: Apply Lemma 1 of ET (the Finite Surprises Property of FL(89)) to show that if player 0
follows the above strategy, at most a certain finite number of punishment phases can be triggered
before player 1 believes with high probability that she is facing the type θ̂. Also note that, since
punishments get progressively tougher, staying off the prescribed path becomes almost as bad as being minmaxed
forever, whereas for a patient player 1 the discounted per-period payoff from (a_0**(t), a_1**(t))_{t=1,...,T}
exceeds her minmax value. Together these two observations imply that in any BN equilibrium a patient
player 1 must eventually (after triggering enough punishments) find it worthwhile to experiment
with the actions (a_1**(t))_{t=1,...,T}. Once she does so, by construction 0 gets a payoff which is within
ε/3 of (1/T) Σ_{t=1}^T g_0(a_0**(t), a_1**(t)), and therefore within 2ε/3 of g_0**. Finally we re-use the fact that
0 is very patient: if 0 is relatively patient, losses sustained while mimicking θ̂ cannot cost him
more than another ε/3 in terms of payoffs. Thus mimicking the type θ̂ assures player 0 payoffs
within ε of g_0**. ∎
The upshot is that the normal type of 0 can secure very high payoffs in the i.i.
game even when his opponent is also patient, as long as 0 is relatively patient and has the option
to mimic types that punish successive deviations with increasing harshness.
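The commitment type described in (a)–(d) above is a simple automaton: it tracks only the position within the block, the running deviation count k, and the remaining punishment periods. The sketch below is a hypothetical rendering with invented action labels and a two-period block; it is not ET's notation, only an illustration of the escalating-punishment scheme.

```python
class ETCommitmentType:
    """Sketch of the punishment scheme in (a)-(d): cycle a T-period action block;
    on player 1's k-th deviation from her prescribed string, minmax her for k
    periods; afterwards restart the block regardless of her play (step (d))."""

    def __init__(self, block0, block1, minmax_action):
        self.block0 = block0                # (a0**(1), ..., a0**(T)) for player 0
        self.block1 = block1                # prescribed replies (a1**(1), ..., a1**(T))
        self.minmax_action = minmax_action  # pure-action minmax from PAM
        self.pos, self.k, self.punish_left = 0, 0, 0

    def act(self):
        """Player 0's action this period."""
        return self.minmax_action if self.punish_left > 0 else self.block0[self.pos]

    def observe(self, a1):
        """Update the automaton after observing player 1's action."""
        if self.punish_left > 0:            # (d): ignore play during punishment
            self.punish_left -= 1
            if self.punish_left == 0:
                self.pos = 0                # return to the start of the block
        elif a1 == self.block1[self.pos]:   # (b): conformity, continue the cycle
            self.pos = (self.pos + 1) % len(self.block0)
        else:                               # (c): k-th deviation, punish k periods
            self.k += 1
            self.punish_left = self.k

theta = ETCommitmentType(block0=['C', 'D'], block1=['c', 'd'], minmax_action='M')
played = []
for reply in ['c', 'x', 'z', 'x', 'z', 'z', 'c']:
    played.append(theta.act())
    theta.observe(reply)
# punishments lengthen: one minmax period after the first deviation, two after the second
```

Running the loop above against a conforming, then twice-deviating opponent produces the escalation pattern block, one punishment period, block, two punishment periods, block.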
4 An Upper Bound
We start by introducing some notation that will prove useful later. For any choice of an action
a_0 ∈ A_0 by player 0, "slice-a_0" refers to the induced game among the n normal-type players; actions
in A_0 are in one-to-one correspondence with slices.
Definition 3: The slice-a_0 is formally the game G(a_0) induced from G by replacing A_0 with {a_0}
and restricting the domain of each g_i to {a_0} × A_{-0}, i.e.
G(a_0) := ⟨{0, 1, ..., n}; {a_0}, (A_i)_{i>0}; (ĝ_i)_{i≥0}⟩,
where each ĝ_i coincides with g_i on its domain of definition.
For each slice a_0, define the slice-feasible set of payoffs as
F(a_0) := co{g(a_0, a_{-0}) : a_{-0} ∈ A_{-0}} ⊆ R^{n+1}.
Notice that the set above lies in the payoff space of all n + 1 players, although no more than n players
have non-trivial moves in any slice.
Definition 4: The conditional minmax of i > 0 in the slice-a_0 is the minmax of i
conditional on the slice G(a_0); call it w_i(a_0).
This is defined exactly as in the usual theory once we replace the game among n + 1 players by a
slice played by the n SR players. Just as the minmax w_i of player i is delivered by the action profile
m^i, we define the conditionally minmaxing punishment of player i > 0 in slice-a_0 as m^i(a_0) ∈ A_{-0}
such that
g_i(a_0, m^i_{-i}(a_0), m^i_i(a_0)) = max_{a_i} g_i(a_0, m^i_{-i}(a_0), a_i) = w_i(a_0).
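To fix ideas, the conditional minmax of Definition 4 (and the maxminmax W_i defined below) can be computed mechanically for a small finite game. The sketch below uses an invented three-player payoff array (player 0 plus opponents 1 and 2, two actions each) and restricts attention to pure actions, so it computes the pure-action conditional minmax only.

```python
from itertools import product

# Hypothetical example: g[(a0, a1, a2)] = (payoff of 0, of 1, of 2); invented numbers.
A0, A1, A2 = range(2), range(2), range(2)
g = {(a0, a1, a2): (a0 + a1 - a2, 2 * a1 - a0 * a2, a2 + a0 * a1)
     for a0, a1, a2 in product(A0, A1, A2)}

def cond_minmax(i, a0):
    """Pure-action conditional minmax w_i(a0): within the slice a0, the other
    SR player chooses her action to minimise i's best-response payoff."""
    if i == 1:
        return min(max(g[(a0, a1, a2)][1] for a1 in A1) for a2 in A2)
    else:  # i == 2
        return min(max(g[(a0, a1, a2)][2] for a2 in A2) for a1 in A1)

def maxminmax(i):
    """W_i := max over a0 of w_i(a0): player 0 picks the slice most favourable to i."""
    return max(cond_minmax(i, a0) for a0 in A0)
```

In this example w_1(0) = 2 exceeds w_1(1) = 1: player 0's action changes how severely the remaining player can punish player 1, which is exactly the wedge the maxminmax captures.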
Now we come to the main result: Proposition 1 below establishes an upper bound on the minimum
payoff of player 0 across all equilibria. Note that this is not an upper bound on the payoffs of
player 0 in the i.i. game: the best possible payoff that LR can obtain in the repeated game is
g_0** := max{v_0 : (v_0, v_{-0}) ∈ F* for some v_{-0} ∈ R^n}; therefore it is an equilibrium payoff in any i.i.
game as well, since 0 will play along if this desirable equilibrium is proposed, there being nothing
better he could possibly obtain. So the upper bound on his payoffs in both the c.i. and the i.i.
games is g_0**, a quantity different from the bound l defined below: l is an upper bound on
the minimum equilibrium payoff of player 0 in BN equilibria. Putting this differently, if we pick a
v_0 ∈ (l, g_0**) there exists an equilibrium in the i.i. game that gives 0 a payoff of v_0. In what follows
the term "upper bound" should be understood in the sense above; the lax usage, I hope, will avoid
a tongue-twister without compromising clarity. I first define a new term, the maxminmax value,
which is useful when we state and prove the bound. Recall Definition 4: the conditional minmax
w_i(a_0) of i > 0 in the slice a_0 is the minmax of i in slice G(a_0).
Definition 5: The maxminmax of a player i > 0 with respect to 0 is defined as W_i :=
max_{a_0 ∈ A_0} w_i(a_0), the maximum among all conditional minmaxes.
It is thus defined like the usual minmax but under the additional assumption that when the others
(j ≠ i and j > 0) try to minmax i > 0, player 0 takes the action that is best for i. In general it is
strictly greater than the minmax value w_i, inasmuch as the others require the active cooperation of
player 0 to punish i most severely.¹² Truncate the set F(a_0) below W_i
to get
F̃(a_0) := {v ∈ F(a_0) : v_i ≥ W_i ∀ i > 0}.
Both sets above lie in (n+1)-dimensional Euclidean space, F(a_0), F̃(a_0) ⊆ R^{n+1}; the projection of F̃(a_0)
onto the 0th coordinate is denoted by the subscript 0, i.e. F̃_0(a_0) ⊆ R.
First, observe that the worst payoff for player 0 in the slice-a_0, subject to every player i > 0 getting at least
her maxminmax, is
w_0(a_0) := inf F̃_0(a_0) ≡ min{v_0 : (v_0, v_{-0}) ∈ F̃(a_0)}.
Now consider the maximum of these infima over all slices:
l := max_{a_0} w_0(a_0) ≡ max_{a_0} inf F̃_0(a_0).
This is the maximum among the worst payoffs in each slice for player 0, subject to all others getting
their maxminmax. Denote by B(W, r) the ball of radius r about the vector W := (W_1, ..., W_n). We
now introduce a non-emptiness assumption:¹³
Assumption N: ∩_{a_0 ∈ A_0} F_{-0}(a_0) ∩ B(W, r) ≠ ∅ for all r > 0,
where F_{-0}(a_0) denotes the projection of F(a_0) onto the coordinates of players 1, ..., n.
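To illustrate w_0(a_0) and l numerically, the sketch below uses an invented payoff array and approximates inf F̃_0(a_0) by searching over the pure action profiles of each slice only; the true definition takes the infimum over the convex hull F(a_0), so this is a coarse, self-contained illustration rather than the exact quantity.

```python
from itertools import product

# Invented payoff array: g[(a0, a1, a2)] = (payoff of 0, of 1, of 2).
A0, A1, A2 = range(2), range(2), range(2)
g = {(a0, a1, a2): (a0 + a1 - a2, 2 * a1 - a0 * a2, a2 + a0 * a1)
     for a0, a1, a2 in product(A0, A1, A2)}
W = {1: 2, 2: 1}  # maxminmax values of this example game (as in Definition 5)

def w0(a0):
    """Coarse version of w_0(a0) = inf of F~_0(a0): the lowest payoff of player 0
    among pure slice-a0 profiles giving every i > 0 at least her maxminmax W_i."""
    vals = [g[(a0, a1, a2)][0] for a1, a2 in product(A1, A2)
            if g[(a0, a1, a2)][1] >= W[1] and g[(a0, a1, a2)][2] >= W[2]]
    return min(vals) if vals else None  # None: constraint set empty in this slice

l = max(v for v in (w0(a0) for a0 in A0) if v is not None)
```

The bound l is a max over slices of a min within each slice, which is why it can be low without being the minmax of player 0 itself.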
N (Non-emptiness) says that in every slice it is possible to get close to the maxminmax
vector W for players i > 0. Returning to a model stated earlier, an undifferentiated-good monopoly
without capacity constraints does not satisfy N because the LR player can unilaterally minmax all
others. But an oligopoly with differentiated goods would, in general, satisfy it. N is both
intuitively plausible in a large class of games and easy to state; furthermore, under N there exists
an upper bound l on what reputation can guarantee player 0 across Bayes–Nash equilibria and even
PBE of the game. We shall have l < g_0**.
Let V^{ci}(δ) be the set of equilibrium¹⁴ payoffs of the c.i. repeated game defined by G. The

¹² w_i := min_{α_0} w_i(α_0) = min_{α_{-i}} max_{a_i} u_i(α_{-i}, a_i).
¹³ This stronger version (N) may be further weakened to requiring the intersection ∩_{a_0 ∈ A_0} F_{-0}(a_0) to have a non-empty interior. An analogous characterisation would work once we refine some of the quantities, but I adopt the slightly stronger N because it contributes to expositional clarity and is satisfied in the illustrative examples.
¹⁴ What concept of equilibrium do we use? When we use NE for the c.i. game, the corresponding i.i. game uses BNE; and when we use SPNE for the c.i. game, the right notion of equilibrium for the i.i. game is PBE. The context will make it clear which of the two we adopt.
limiting payoff set at the common discount factor δ is V^{ci} := lim_{δ→1} V^{ci}(δ). Similarly, V(δ_0, δ)
is the equilibrium payoff set of the i.i. game when 0's discount factor is δ_0 and the others have a
common discount factor δ. Let us be precise about how the limiting payoff set is defined for the
reputational game:
V := lim_{δ→1} lim_{δ_0→1} V(δ_0, δ).
As in Schmidt, Aoyagi, and CFLP, the order of limits in defining V is important: player 0 is
always patient relative to the others, regardless of how patient they are.
Before proceeding further we introduce a full-dimensionality assumption, which first appeared in
FM:
Assumption FD (Full Dimensionality): The set F has dimension n + 1.
We now state and prove a lemma that will be useful in proving the bound in Proposition 1 below.
In every slice we find an action profile α(a_0) for the SR players such that each i > 0 gets more than
her W_i and 0 gets less than, or very close to, w_0(a_0) when (a_0, α(a_0)) is played.
Lemma 1 (Threat Points Lemma): Fix ε > 0. For each slice a_0 there exists a probabil-
ity distribution over action profiles α(a_0) ∈ ΔA_{-0} such that g_i(a_0, α(a_0)) > W_i ∀ i > 0 and
g_0(a_0, α(a_0)) < w_0(a_0) + ε ≤ l + ε. (NB: ε is the same for all slices.)
Proof: See Appendix.
Proposition 1 below constructs an equilibrium of the i.i. game in which θ* can be held down
to payoffs arbitrarily close to l. Before formally stating the result, recall that V
is the limit of the PBE payoffs of the i.i. game as δ_i → 1 and (1−δ_0)/(1−δ_i) → 0 for all i > 0, and that v_0^min
denotes the infimum of 0's payoffs in V. The proof below assumes that mixing is ex post observable,
this assumption being required to punish the SR players. Equivalently, one could assume that each
i > 0 can be minmaxed in pure strategies. This is a technically convenient assumption and does not
affect the crux of the argument. In a later section I show how the argument of FM86 may be used
to dispose of the assumption that mixing by players i > 0 is observable even ex post. Suppose we
want to give player 0 a payoff of v_0 = l + ε, for any given ε > 0 small enough. The proof hinges on
our ability to force the normal and relatively very patient player 0 to reveal himself at the signalling
stage, for if he does so it is immediate that we can merely play out the repeated-game equilibrium
σ^{ci} and give him u_0(σ^{ci}) = v_0. I construct a PBE that gives 0 less than v_0 if he is θ* but announces
m ≠ θ*. The next figure should clarify the preceding lemma and the proof of the first proposition.
Proposition 1: Under assumptions N and FD and (ex-post) observable mixed strategies, v_0^min ≤ l.
Proof: Fix a small ε > 0. We assert that the following strategies are part of a PBE giving θ* a
payoff of v_0 := l + ε. Let σ^{ci} denote the equilibrium (SPNE) strategy profile of the c.i. repeated
game that gives θ* a payoff of v_0. If m(θ*) = θ*, then there is no problem in giving 0 a payoff of
v_0. Recall that the "commitment" strategy θ_t(m) maps A^{t-1}_{-0} into slices. Lemma 1 (TPL) above
asserts that in each slice we can construct a "threat point" α(·) giving 0 no more than l + ε and
each player i > 0 at least γ_i, which exceeds W_i. Given any announcement m ≠ θ* by player 0, we
shall specify the strategy σ_{-0}(·|m) of the players i > 0 to play the action α(θ_1(m)) in period 1 and
α(θ_t(m)(h^{t-1}_{-0})) in all periods t > 1 until a deviation. Notice that 0 cannot profit by behaving like
some θ ≠ θ* if all players i > 0 play in accordance with the strategy σ_{-0}.
Pick φ ∈ B(W, r) such that φ ∈ F_{-0}(a_0) ∀ a_0 and r is small enough¹⁵ that φ_i < γ_i. (This step
is possible since N holds.) Fix η > 0 such that γ_i > φ_i + η > φ_i > W_i. For each j > 0 define the
n-dimensional vector φ(j) by φ_j(j) = φ_j and φ_i(j) = φ_i + η < γ_i ∀ i ≠ j. The quantity η is a small
reward given to each SR player who carries out a punishment against another player.
If the announced type in stage 0 is θ*, then play the repeated-game strategies σ^{ci} that deliver
the payoff l + ε to player 0. If m = θ ≠ θ*, start Phase I, where the prescribed play at t conditional on
the (t−1)-period history h^{t-1} is α(θ_t(m)(h^{t-1}_{-0})), the mixed action profile that gives player 0 very
close to the lowest possible payoff in the slice that the announced crazy type would play. Fix a slice a_0.
Denote by w^i_j(a_0) := g_j(a_0, m^i(a_0)) the payoff to player j ≠ i when i is being conditionally minmaxed
in the slice a_0. Suppose player 0 has never deviated from his announced crazy strategy in the past
and that player i > 0 deviates from the prescribed path at time τ; then play enters Phase II(i),
where player i is conditionally minmaxed in slice θ_t(m)(h^{t-1}_{-0}) at times t = τ + 1, ..., τ + P, where
the length P of the punishment satisfies the following inequality:
P·W_i + max_a g_i(a) < P·φ_i + min_a g_i(a)  ∀ i > 0.   (⋆)

¹⁵ Any r < min_{i=1,...,n} {γ_i − W_i} suffices.
This condition is always satisfied for some large enough integer P, since W_i < φ_i. Once Phase II(i)
is over, play moves into Phase III(i), where play is such that φ(i) is delivered at t = τ + P +
1, τ + P + 2, ..., ∞; recall that the i-th component of the vector φ(i) is φ_i, and all other components
j > 0 are φ_j + η. So φ(i) incorporates a small reward of η for each SR player j other than the last
player (i) to be punished. If player j unilaterally deviates from Phase II(i) or Phase III(i), then
impose Phase II(j) followed by Phase III(j), and so on.
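The punishment length demanded by (⋆) can be found by direct search for the smallest qualifying integer. The sketch below uses invented scalar payoffs, not quantities from any particular game in the paper.

```python
def min_punishment_length(W_i, phi_i, g_max, g_min):
    """Smallest integer P with P*W_i + g_max < P*phi_i + g_min, i.e. inequality (*)
    for one player; such a P exists whenever phi_i > W_i."""
    assert phi_i > W_i, "(*) is solvable only when phi_i exceeds W_i"
    P = 1
    while P * W_i + g_max >= P * phi_i + g_min:
        P += 1
    return P

# Invented numbers: maxminmax 1, reward point 1.5, stage payoffs ranging over [0, 4].
P = min_punishment_length(W_i=1.0, phi_i=1.5, g_max=4.0, g_min=0.0)
```

The closed form is P > (g_max − g_min)/(φ_i − W_i), so the punishment must be long in games with large stage-payoff swings or a small wedge φ_i − W_i.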
Conditional on an announcement m = θ ≠ θ*, players believe that they are indeed facing type θ
until 0 deviates from θ(m) and reveals himself to be the normal type. When 0 deviates he may be
minmaxed as in the repeated game. If some i > 0 deviates after 0 has deviated in the past, then
we are in the c.i. game: minmaxing i for P periods (see (⋆) above) provides a sufficient deterrent.
Let us see more precisely why the above is an equilibrium that is sequentially rational. The
argument runs similarly to FM(86). Suppose m = θ ≠ θ* has been announced, and check that every
player i > 0's strategy is unimprovable, holding the strategies of the others fixed. Suppose that
he deviates when 0 has never deviated before. Then, if he is patient enough, his payoff is close to
φ_i < γ_i, the payoff if he does not deviate.
Step 1: i > 0 does not deviate from Phase II(j) or Phase III(j) because he would end up with φ_i rather
than φ_i + η.
Step 2: i > 0 does not deviate from Phase II(i) because he is in any case playing his best response;
anything else gives a lower payoff in the current period and also prolongs the punishment. Devi-
ating from Phase III(i) is also not profitable, because inequality (⋆) ensures that restarting the
punishment is costly for i.
Step 3: We now argue that θ* does not deviate. The normal type announces θ* truthfully because
his payoff is l + ε when he announces truthfully and sticks to the equilibrium, whereas it is less than
l + ε if he either announces anything else and faithfully mimics that type, or announces θ ≠ θ*
and does not play like the announced type. The moment player 0 deviates from his announced
strategy he reveals himself as the normal type; he is then minmaxed long enough to wipe out all
one-period gains from cheating, and then we play out a repeated-game strategy that gives player 0
close to his minmax. Players i > 0 minmax him as they would in the repeated-game equilibrium,
from which there is no profitable deviation; players i > 0 do not deviate from this part of the
proposed equilibrium by the same argument as in a pure repeated game. ∎
To summarise: if 0 declares a type other than θ*, the construction of the equilibrium keeps track
not only of what punishment, if any, is ongoing but also of what the next action of player 0 is. If the
normal type is declared, or revealed later through an off-path deviation, then play reverts to the
usual c.i. repeated-game strategies after all gains from cheating have been wiped out. The key to
the construction is to instruct all i > 0 to play an action profile (one that resolves into a pure action
contingent on the RPD) that gives the normal type of player 0, i.e. type θ*, the lowest payoff
in the slice that he is called upon to play as part of his declared type. Proposition 1 shows that,
even while respecting sequential rationality, we can impose upper bounds on what reputation can
achieve, delivering a payoff of no more than l + ε to the LR player while giving the others at least
their maxminmax values W_i. Proposition 2 below shows that if (v_0, v_{-0}) ∈ R^{n+1} is an equilibrium
payoff vector in the limiting set V^{ci} under c.i. and v_0 > l, then for any given pair (Θ, μ) the limiting
equilibrium payoff set V of the reputational game contains an (n+1)-dimensional vector that gives
player 0 the value v_0. Now we can restate the content of Proposition 1 in alternative notation as
[v_0 ∈ V_0^{ci} and v_0 > l] ⇒ v_0 ∈ V_0.
The next proposition extends this result to include all payoff vectors in which 0 gets strictly more than l and
all others get more than their minmax w_i rather than the maxminmax W_i; it is thus a quasi-
folk-theorem result.
Proposition 2: Under N, any payoff v ∈ F* that gives the LR player strictly more than l can be
sustained; i.e.
[(v_0, v_{-0}) ∈ V^{ci} and v_0 > l] ⇒ (v_0, v_{-0}) ∈ lim_{μ(θ*)→1} V.
Proof: Fix a probability μ(θ*) of the normal type. Consider an equilibrium σ^{ci} of the c.i. repeated
game in which the LR player gets u_0(σ^{ci}) > l. Using σ^{ci} we construct a truthtelling equilibrium σ
of the reputational game in which player 0 gets u_0(σ^{ci}) conditional on any draw by nature. If the
announced type is the normal type, then play the equilibrium of the repeated game with the usual checks
and balances. If m ≠ θ*, then play the equilibrium outlined in Proposition 1, giving the normal
type less than {l + u_0(σ^{ci})}/2. (Take the ε of Proposition 1 to be any positive number below {u_0(σ^{ci}) − l}/2.)
Our proposition now follows when we note that in this equilibrium, independently of μ, the other
players get v_{-0} if nature chooses the normal type of player 0; as this prior probability goes to 1,
their payoff vector tends to v_{-0}. ∎
5 Some Extensions
Unobservable Mixed Strategies
As mentioned earlier, a crucial assumption for the above proof is that the probabilities of mixing by
any player are ex post perfectly observable. While this might be a reasonable assumption in some
circumstances, one can equally easily come up with settings where it is less natural. This creates a
problem during the punishment phases of players i > 0. Note that punishing player 0 for declaring
a non-normal type does not make use of the observability of mixed strategies, but punishing any
other player does: if a player j ∉ {i, 0} is not indifferent between the myopic payoffs of all the actions
in the support of m^i_j, then j will not mix with the desired probabilities in Phase II(i); deviations
to actions outside the support of m^i_j are readily detected and deterred in the usual way. While the
minmax in the class of pure-strategy punishments is immune to this complication, it is in general
less severe than the mixed-action minmax and reduces the payoffs that may be supported. As
noted by FM(86), the trick is to make each player j > 0, j ≠ i, indifferent between all actions in the
support of m^i_j by adjusting the continuation values at the end of the minmax phase, depending on
the realised sequence of actions of player j during the minmax phase.
Proposition 3: Fix ε > 0. Under N and FD and unobservable mixing by i > 0, there exists an equilibrium
payoff of the incomplete-information game in which player 0 gets l + ε and every i > 0 gets at least
W_i.
Proof:¹⁶ Fix ε > 0. Set v_0 := l + ε as the target payoff. We shall construct strategies that are part of
a sequentially rational equilibrium. Pick φ as before and η > 0 such that W < φ_{-0} < φ_{-0} + η⃗ < γ and φ_0 < v_0,¹⁷
where η⃗ denotes the vector each of whose elements is η and γ = (γ_1, ..., γ_n). (This is the same

¹⁶ To prevent cluttering of notation we do not make explicit the presence of the RPD in our proofs.
¹⁷ Recall our convention for vector inequalities: x < y ⇔ x_i < y_i ∀ i; x ⪇ y ⇔ x_i ≤ y_i ∀ i and x ≠ y; x ≤ y ⇔ x_i ≤ y_i ∀ i.
construction as in Proposition 1.) If 0 declares himself the normal type (i.e. m = θ*), play the c.i.
repeated-game equilibrium σ^{ci}, giving l + ε to 0 and at least W_i to all others. Otherwise start
play in Phase I. In describing the phases below we repeatedly use terms of the form z^i_j,
the sum of realised utilities of player j during Phase II(i), discounted by the factor (1−δ)/(1−δ^P). After
player i has been minmaxed or conditionally minmaxed, play transitions to a point adjusted by the
quantities z^i_j. Also note that for high δ the magnitude of z^i_j cannot exceed η/3; this ensures that
none of the continuation values below goes outside the feasible set.
Phase I: Play the action n-tuple giving a payoff of l + ε to 0, this action profile depending on the
announced type and the history; i.e. in period t play the action n-tuple α(θ(h^{t-1}_{-0})), where h^{t-1}_{-0} is
the history of play by players 1, 2, ..., n up to and including period t − 1 and α(a_0) ∈ ΔA_{-0}
gives 0 no more than l + ε in slice a_0. If player 0 deviates from his announced strategy during any
phase, go to Phase II(0). If player i > 0 deviates unilaterally from any phase and Phase II(0)
has never been triggered, then switch to Phase II(i); if player 0 has deviated from his announced
strategy in the past, then switch to Phase III(i).
Phase II(0): Minmax 0 for P periods, where P is defined as before by (⋆). Then go to Phase
V(0, z^0_1, z^0_2, ..., z^0_n). Note that play transitions to a phase that is indexed by the actual payoffs
earned by each of the normal-type players during the minmaxing phase.
Phase II(i): Conditionally minmax i for P periods. Then go to Phase IV(i, z^i_1, z^i_2, ..., z^i_n). Since
we are in Phase II, player 0 has so far stuck to his announced strategy, and there is no need to worry
about the payoffs of 0 while i > 0 is being punished.
Phase III(i): Minmax i for P periods. Denote by z^i_j the discounted sum of realised utilities of player j
during Phase III(i). Then go to Phase V(i, z^i_0, z^i_1, z^i_2, ..., z^i_n).
Phase IV(i, z^i_1, z^i_2, ..., z^i_n): Play the action tuple that gives players 1, 2, ..., n the payoff vector (φ_1 +
η/2 − z^i_1, ..., φ_i, ..., φ_n + η/2 − z^i_n). Since i is the last player to have deviated, note that she does not get
the reward η/2.
Phase V(i, z^i_0, z^i_1, z^i_2, ..., z^i_n): Play the action vector that gives (φ_0, φ_1 + η/2 − z^i_1, ..., φ_i, ..., φ_n + η/2 − z^i_n).¹⁸
From the standard folk-theorem argument it follows immediately that the proposed strategies
constitute a sequentially rational equilibrium following the announcement m = θ*. All that remains
is to verify that the play following m ≠ θ* is also sequentially rational, and that the normal type
of player 0 has an incentive to tell the truth. This is done in a number of steps, checking that there
is no history at which a unilateral one-step deviation by any j ≥ 0 gives j
strictly greater utility in the continuation game than the proposed equilibrium strategies. That this
suffices follows from the definition of NE and well-known results in dynamic programming. We first
check the incentive constraints for the SR players and then for LR.

¹⁸ Note that play transitions to a phase that is indexed by the actual payoffs of all players, including 0, during the minmaxing phase. Since we are in Phase III, player 0 has revealed himself as the normal type, and his payoffs also need to be adjusted to ensure that he mixes with the appropriate probabilities.
Step 1: i > 0 has no incentive to deviate (always one-step and unilateral in what follows) from
Phase I.
If i deviates, the maximum gain is (1−δ)b_i + δ(1−δ^P)w_i + δ^{P+1}φ_i (where b_i is i's maximal one-shot
deviation payoff), which is less than v_i for δ close to 1, because it converges to φ_i in the limit and by construction φ_i < v_i.
Step 2: i > 0 has no incentive to deviate from Phase II(j), where j ≠ i.
If i deviates to an action outside the support, i's per-period payoff in the game converges to
φ_i < φ_i + η/2 in the long run. Thus she does not get the reward η/2, which is given for carrying
out the punishment. Given the definition of z^j_i, player i's utility is independent of the probabilities
of mixing.
Step 3: i has no incentive to deviate from Phase II(i).
If i deviates to an action outside the support, he not only plays a suboptimal response in the current
period but also restarts the punishment; this lowers the current and future utility stream.
Step 4: i > 0 has no incentive to deviate from Phase III(j), where j ≠ i.
Step 5: i has no incentive to deviate from Phase III(i).
If i deviates, the maximum gain is (1−δ)b_i + δ(1−δ^P)w_i + δ^{P+1}φ_i. The payoff from conformity with
the equilibrium is φ_i. Thus a sufficient condition to rule out any profitable deviation is
(1−δ)b_i + δ(1−δ^P)w_i + δ^{P+1}φ_i < φ_i.
As δ → 1 both sides converge to φ_i, so we divide through by 1 − δ to obtain the equivalent condition
b_i + δ(1 + δ + ... + δ^{P−1})w_i < (1 + δ + ... + δ^P)φ_i.
As δ → 1 the LHS converges to b_i + P·w_i and the RHS to (P + 1)·φ_i; the strict inequality
b_i + P·w_i < (P + 1)·φ_i holds by the definition of P.
∎
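As a numerical sanity check on Step 5 (with invented payoffs, not values derived from the model), the one-step deviation bound (1−δ)b_i + δ(1−δ^P)w_i + δ^{P+1}φ_i does fall below φ_i once δ is close to 1, provided b_i + P·w_i < (P+1)·φ_i; for low δ the deviation can still pay, which is why patience is needed.

```python
def deviation_payoff(delta, b, w, phi, P):
    """Upper bound on a one-step deviation in Phase III(i): one period of the best
    stage gain b, then P periods of minmax payoff w, then continuation value phi."""
    return (1 - delta) * b + delta * (1 - delta ** P) * w + delta ** (P + 1) * phi

# Invented numbers satisfying b + P*w < (P + 1)*phi, namely 4 + 0 < 10.
b, w, phi, P = 4.0, 0.0, 1.0, 9
patient = deviation_payoff(0.99, b, w, phi, P)   # below phi = 1: deviation deterred
impatient = deviation_payoff(0.50, b, w, phi, P)  # above phi: deterrence fails
```

The comparison of the two evaluations illustrates why the bound is stated only for δ close to 1.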
Long-run Player Moves First
This section extends the results of the previous section to the case where the long-run player moves
first. Given any simultaneous-move stage game G as above, define the extensive-form stage game G^{seq} in
which player 0 moves first and players 1, 2, ..., n move simultaneously after observing the action chosen
by player 0. There is an obvious and natural one-to-one mapping from the set of action (n+1)-
tuples of G to the set of terminal nodes of G^{seq}; use it to define utilities for G^{seq}.
Corollary 4: Even without announcements, there exist sequentially rational equilibria giving the (nor-
mal) player 0 a payoff of l + ε when the stage game is of the form G^{seq} and N holds.
Proof: Same as in Proposition 1 above.
Reputation without Announcements (A Sketch)
The earlier framework poses the problem as a signalling game, where 0 is first given the opportunity
to state his type. While adding an announcement phase is a natural modification, one might still
wonder whether the result goes through without announcements.¹⁹ Fortunately one can show that the main
result extends, although at the cost of added complexity in the strategies.²⁰
First, a few comments on the intuition and the strategy of proof of Proposition 5, which is the
counterpart of Proposition 1 when announcements are not available. Consider a strategy profile σ^{ci} of
players 0, 1, ..., n in the repeated game giving player 0 a payoff of l + ε and each i > 0 at least W_i.
We shall now construct an eq. of the i.i. game in which $% plays according to %ci0 . Consider the
following strategy in the i.i. game. Players start by playing according to %ci; as soon as 0 deviates
from %ci0 the prior µ is appropriately modified. Based on the updated beliefs find the type of player
0 that is most likely the eq. strategy the next action played is intended to keep 0 down to l + ".
It is important to note that if |supp(µ)| = K , then along any path there can be at most K points
where not all types that are currently in the support are expected to not behave identically. This
is the content of Lemma 2. At each such point at least one type is eliminated from the support of
the posterior derived from µ by Bayesian updating. Thus there can be no more than K mistakes in
predicting the strategy of 0. More than K mistakes is a 0-probability event under the equilibrum
strategy and players 1, 2..., n (rightly) believe that they are facing a normal type of player-0 who
has deviated. Although this means that normal type of player 0 cannot gain by deviating at more
than K points, this is not enough for my proof to go through. There are two potential problems
with this line of argument.
[19] The results of the previous section, titled "Long-run Player Moves First", go through even without announcements. As we reasoned above, announcements play no role when 0's action is visible before the normal players 1, 2, ..., n move.
[20] The critical value of the discount factor required for the proof to work would in general depend on the cardinality of the type space, but the limiting value is independent of the fine details.
First, since each deviation gives player 0 a one-shot gain, it might be
tempting for a normal-0 to deviate at points of time that are very far apart. It might take so long
for his true type to be discovered that punishments could become irrelevant from the point of view
of the current period. Since he can choose any strategy he wishes, there is nothing to prevent him
from deviating at points of time very far apart. To prevent this we apply a “real-time” correction.
When a mistake has been made in predicting the strategy of 0, the others are asked to apply an
immediate correction by minmaxing 0 to wipe out the gains that a normal type of 0 would have
accumulated by that deviation. This need not be viewed as a punishment but as a corrective. The
second question is: Why would i > 0 stick to their assigned role? The argument is the same as in
Proposition 1 above once we note that if i deviates she can escape punishment, and perhaps
even get good payoffs, for at most K periods; if δ is high enough, then punishments cannot be
avoided for too long; conditionally minmaxing her is enough. Before we state Lemma 2 formally,
we recall the following notation: μ(·|h^{t−1}, σ) denotes the posterior at the start of period t, derived from
the prior μ conditional on the history of play up to period t − 1 and the equilibrium strategy σ.
For convenience we sometimes denote this posterior by μ^{t−1}.
Lemma 2 (Finite Non-identical Play Property): Suppose θ* ∉ supp(μ^T) for some
T > 0. Along any path of play, equilibrium or otherwise, there can be at most |Θ − {θ*}| = K − 1
periods after T in which the types of player 0 still in the support of the posterior beliefs do not all
play the same action; consequently in at most K periods can the normal type of player 0 get more
than l + ε.
Proof: Let A(μ^{t−1}) denote the set of all actions that have positive probability of being played in period
t by player 0 given μ^{t−1}. Note that for all periods |A(μ^{t−1})| ≤ |supp(μ^{t−1})|: at most, each crazy
type plays a different action. Suppose that |A(μ^{t−1})| > 1. If a^t_0 ∉ A(μ^{t−1}), then Bayes'
rule imposes no restrictions on how μ^t is to be derived from μ^{t−1}; we set μ^t(θ*) = 1. Otherwise,
if a^t_0 ∈ A(μ^{t−1}), we must have |supp(μ^t)| ≤ |supp(μ^{t−1})| − 1, since all types in the support of
μ^{t−1} that were expected to play an action other than a^t_0 may now be dropped from the support of μ^t. For
any path of play where |A(μ^t)| > 1 for t = t_1, t_2, ..., t_K, we must have |A(μ^s)| = 1 ∀ s > t_K.
∎
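The counting argument in Lemma 2 can be illustrated with a small sketch (hypothetical types and actions, not from the paper): tracking the support of the posterior along a path, the number of periods in which the surviving committed types disagree never exceeds the number of committed types minus one, since each disagreement period eliminates at least one type.

```python
# Illustration: Bayesian support elimination for pure committed types.
# Each hypothetical type is a fixed sequence of actions; we count the
# periods in which the types still in the support disagree.
def count_disagreements(types, path):
    """types: dict name -> list of actions; path: realised actions of player 0."""
    support = set(types)              # all committed types start in the support
    disagreements = 0
    for t, action in enumerate(path):
        predicted = {types[k][t] for k in support}
        if len(predicted) > 1:        # support types do not all play the same action
            disagreements += 1
        # Bayesian updating: drop every type whose prescribed action differs
        support = {k for k in support if types[k][t] == action}
        if not support:               # zero-probability event: believe the normal type
            break
    return disagreements

# Three committed types: disagreement periods never exceed K - 1 = 2.
types = {"A": ["x", "x", "y", "y"], "B": ["x", "y", "y", "y"], "C": ["x", "x", "x", "y"]}
print(count_disagreements(types, ["x", "x", "y", "y"]))  # path of type A
```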
Proposition 5 below formalises the argument in the above paragraph using Lemma 2 (FNPP).
The proof continues to employ the more stringent requirement that, while the RPD is available,
mixing is not observable even ex-post. One could avoid the ungainly arguments by assuming
that mixing is observable.
Proposition 5 (Upper Bound Without Announcements): Fix ε > 0. Under assumptions N
and FD there exists an equilibrium of the I.I. game in which player 0 gets l + ε and each i > 0 gets at
least W_i.
Proof: Set v such that v_0 := l + ε is the target payoff. Pick v′ and β > 0 such that W < v′ <
v′ + β·1 < (v_0, π). In what follows a "deviation" by player 0 in period t from a strategy profile σ is
taken to mean that in period t player 0 plays an action a^t_0 ∉ A(μ^{t−1}), the set of actions predicted
under σ given the history h^{t−1}. Beliefs are updated using Bayes' Rule following any event that has
positive probability; following a 0-probability event under σ, the posterior puts probability 1 on the
normal type of player 0, as in Lemma 2 above. Play proceeds according to the following phases. If in
period t we have A(μ^{t−1}) = {a°}, then denote by α^t(μ^{t−1}) the n-tuple of actions that delivers l + ε
in the slice a°. Otherwise choose a° as the action on which μ^{t−1} places the highest probability.
(Indeed we could just as well choose it using any other deterministic algorithm.)
Phase I: In period t play the action n-tuple α^t(μ^{t−1}), this action vector being dependent on the
history through μ^{t−1}, provided A(μ^{t−1}) = {a^{t−1}_0}, i.e. provided the other
players correctly anticipated the action of player 0. If a^t_0 is not the predicted action but is
consistent with some type in supp(μ^{t−1}), then
switch to Phase Adjust. If player 0 deviates during any phase (this
includes the case where a^t_0 is consistent with no type in the support), go to Phase II(0). If player i > 0 deviates unilaterally from
any phase and Phase V has never been triggered, then switch to Phase II(i); if player 0 has never
deviated and Phase V has been triggered in the past, then switch to Phase III(i).
Phase Adjust: Minmax 0 for P periods, and transition to Phase I after readjusting the target
payoffs of all players i > 0 to take into account the actual utilities earned by the players during the
minmax phase. The following makes it possible to carry out this proof: if all players are patient
enough, the possible variation in the utility of any player i ≥ 0 during this stage cannot be more than
β/2K.
Phase II(0): Minmax 0 for P periods, where P is the same as before. Denote the discounted
sum of realised utilities of player j during Phase II(0) by z^0_j. Then go to Phase V(0, z^0_1, z^0_2, ..., z^0_n).
Phase II(i): Conditionally minmax i for P periods, where P is the same as before. Denote the discounted
sum of realised utilities of player j during Phase II(i) by z^i_j. Then go to Phase IV(i, z^i_1, z^i_2, ..., z^i_n).
Note that play transitions to a stage that is indexed by the actual payoffs earned by each
of the normal players during the minmaxing phase. Since we are in Phase II, player 0 has so far
stuck to his announced strategy and there is no need to worry about his payoffs while i > 0 is
being punished.
Phase III(i): Minmax i for P periods, where P is the same as before. Denote the discounted sum
of realised utilities of player j during Phase III(i) by z^i_j. Then go to Phase V(i, z^i_1, z^i_2, ..., z^i_n).
Phase IV(i, z^i_1, z^i_2, ..., z^i_n): Play the action tuple that gives players 1, 2, ..., n the payoff vector[21]

(v′_1 + β/3 − z^i_1, ..., v′_i − z^i_i, ..., v′_n + β/3 − z^i_n).

Phase V(i, z^i_0, z^i_1, z^i_2, ..., z^i_n): Play the action vector that gives

(v′_0, v′_1 + β/3 − z^i_1, ..., v′_i − z^i_i, ..., v′_n + β/3 − z^i_n).[22]

This may be verified to be a sequentially rational equilibrium. ∎
6 Behavioural Types that Mix
All previous results are proved for a type space Θ = {θ*, θ_1, ..., θ_K}, where the behavioural types
θ_1, ..., θ_K are committed to pure dynamic-game strategies, i.e. each type θ ∈ Θ − {θ*} plays in
each period t = 1, 2, ... a unique action θ_t(θ)(h^{t−1}) ∈ A_0 conditional on the history h^{t−1}_+ of action
profiles of players 1, 2, ..., n. Each type θ ≠ θ* is thus identified with, or defined by, the following
sequence of mappings

θ_1(θ) ∈ A_0,   θ_t(θ) : A^{t−1}_+ → A_0 (t > 1),

which specifies an initial action (for t = 1) and for each t > 1 maps any history of actions
played by players 1, 2, ..., n into an action of player 0. This notion of a type as a pure (rather than
mixed) strategy of the dynamic game is important both for the exact numerical bound and, more
substantively, for the technique of proof: at each t, given the history h^{t−1}, the pure action θ_t(θ)(h^{t−1})
that 0 plays is known to the other players 1, 2, ..., n; this enables them not only to mete out a
suitably tailored punishment, but also to detect deviations by the reputation builder as soon as they
occur. For the case of mixed-strategy crazy types we maintain the earlier notation: fix (G, Θ, μ),
where G is the stage game, Θ = {θ*, θ_1, ..., θ_K} is an arbitrary finite set of types, and μ is the prior
on Θ. Each type θ ≠ θ* is now identified with, or defined by, the sequence

θ_1(θ) ∈ Δ(A_0),   θ_t(θ) : A^{t−1} → Δ(A_0) (t > 1),

where each A_0 has been replaced by Δ(A_0), and type θ's period-t strategy maps from A^{t−1}
rather than A^{t−1}_+. The reason for the latter modification is that when a behavioural type plays a
pure strategy it is unnecessary to specify how he plays after having himself deviated in the past
and thereby revealed himself; with mixed strategies more actions than one are consistent with the
type, and there is scope for specifying how 0 plays conditional on his earlier realised actions.
[21] v′_i, i > 0, is the readjusted value after all previous Adjust phases, and not the original value. A similar statement holds for Phase V, ∀ i ≥ 0.
[22] Note that play transitions to a stage that is indexed by the actual payoffs of all players, including 0, during the minmaxing phase. Since we are in Phase III, player 0 has revealed himself as the normal type and his payoffs also need to be adjusted to ensure that he mixes with the appropriate probabilities.
Using relatively standard techniques I extend my results in Proposition 3 to the case where
mixing by players 1, ..., n is not observable even ex-post; however, doing the same for the reputation-
builder requires significantly different techniques. Before a formal statement of the proposition let
me try to explain the key issues involved, starting with the conceptually simpler case of mixed
strategies being ex-post observable. First, the presence of types that mix means that the previous
bound won't work, because at each step there is more freedom for player 0: instead of playing
exactly one action from A_0 he can mix among these actions, each such mix inducing a game among
the remaining n players. One could readily extend the result to mixed-strategy crazy types when
mixing is ex-post observable; the bound needs to be redefined to allow for the extra flexibility
available to player 0, but the proof remains unchanged in essence: the normal type of 0 cannot
get more than the bound if he sticks to his announced mixing probabilities; if he deviates he reveals
himself immediately to be a normal type. However, matters are more complicated when types
are committed to mixed strategies that are unobservable even ex-post, because deviations are not
readily detected. This point is explained in greater depth later in the section.
We now introduce some notation, which closely follows that for pure types. Recall that the
upper bound was earlier defined by taking the max of w_0(a_0) over all actions a_0 ∈ A_0. Now we
need to take the supremum over all probability distributions α_0 on A_0. This is clearly a larger set,
and consequently the upper bound cannot decrease. We now redefine the bound l. First, for each
mixed action α_0 ∈ Δ(A_0) we have an induced game G(α_0) as before.
Definition 4: The conditional minmax of i > 0 in the slice α_0 is the minmax of i > 0 conditional
on the slice G(α_0); call it w_i(α_0).
Definition 5: The maxminmax of a player i > 0 with respect to 0 is defined as W_i := sup_{α_0} w_i(α_0),
the supremum of the conditional minmaxes.
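As a toy illustration (hypothetical payoffs, not from the paper), Definitions 4 and 5 can be computed by brute force for a three-player game in which player 0 mixes over two actions: for each α_0 on a grid, player 2 minmaxes player 1 inside the slice G(α_0) while player 1 best-responds, and W_1 is the largest of these conditional minmaxes.

```python
# Hypothetical 3-player example (players 0, 1, 2; two actions each).
# g1[a0][a1][a2]: player 1's stage payoff (made-up numbers).
g1 = [[[3, 0], [1, 2]],   # player 0 plays action 0
      [[0, 2], [4, 1]]]   # player 0 plays action 1

def w1(p):
    """Conditional minmax of player 1 in the slice where 0 plays action 0
    with probability p: player 2 minimises player 1's best-response payoff,
    all payoffs taken in expectation over player 0's mix."""
    exp = lambda a1, a2: p * g1[0][a1][a2] + (1 - p) * g1[1][a1][a2]
    return min(max(exp(a1, a2) for a1 in (0, 1)) for a2 in (0, 1))

grid = [k / 100 for k in range(101)]
W1 = max(w1(p) for p in grid)   # maxminmax: supremum over the grid of slices
print(W1)
```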
Start by defining

𝓕(α_0) := {v ∈ F(α_0) : v_i ≥ W_i ∀ i > 0}.

While 𝓕(α_0) ⊆ R^{n+1}, the projection of 𝓕(α_0) onto the 0th coordinate is denoted by adding the
subscript 0, i.e. 𝓕_0(α_0) ⊆ R.
First, observe that the worst payoff for player 0 in the slice α_0, subject to each player i > 0
getting above her maxminmax, is

w_0(α_0) := inf 𝓕_0(α_0) ≡ inf {v_0 : (v_0, v_{−0}) ∈ 𝓕(α_0)}.

The new upper bound is given by the supremum of these infima over all slices:

l := sup_{α_0} w_0(α_0) ≡ sup_{α_0} inf 𝓕_0(α_0).
Even with this new bound, designing and enforcing a punishment for announcing a non-normal
type proves challenging. To avoid further complications at this point, we retain the announcement
phase. Since announcements are available we allow player 0 to send an initial message m and
subsequent messages m_t at each time t = 1, 2, .... We also make the simplifying assumption that a
RPD is available. (For simplicity assume that mixing by players i > 0 is observable ex-post, even
though we cannot observe the mixing probabilities chosen by 0; this simplifies the proof considerably
without affecting the essence of the result.[23]) With types that mix, the crux is that detecting
deviations is no longer immediate: θ* can announce some θ ≠ θ* but behave like some σ′ ≠ σ(θ);
there is no way to immediately detect a discrepancy between the initially announced type and the
actual play if, for example, σ(θ) and σ′ have the same support in A_0 at each t on the path of play.
The key to resolving this problem is that while there is no way of detecting a deviation period by
period, one may monitor the distribution of the actions of 0 over time. More precisely, given any
announced θ we can define for each period t an excess reward function ξ_t, which measures the excess
of the realised payoff of 0 over the expected payoff. Over time a Law of Large Numbers (henceforth
LLN) would, we might hope, assure us that player 0 won't be too much above his expected payoff
if he is playing according to the initially announced type. At a technical level, one should note that
classical LLNs use independent and identically distributed random variables. The requirement that
the random variables ξ_t be identically distributed is easily relaxed, since they are uniformly bounded in variance.
(This follows immediately from the payoff functions g_i : A → R being uniformly bounded by ±M,
where M := max_i max_a |g_i(a)|; it follows that |ξ_t| ≤ 2M ∀ t, which together with E ξ_t = 0 implies
that Var ξ_t ≤ 4M².) The first condition (independence) is unfortunately not met, because types
condition their period-t play on the history of actions up to t − 1; in other words, a type can play
a different mix at different histories. To avoid this problem we could assume that the type space
Θ contains only constant (mixed) action types, i.e. each θ ≠ θ* plays α(θ) ∈ Δ(A_0) after all histories.
Then players i > 0 also need only play a constant profile ψ(α(θ)) to punish 0 for declaring a
non-normal type θ. Since the mixed action profile is history-independent as long as there are no
deviations, the ξ_t's are indeed i.i.d.; this case is readily covered by classical LLNs.
[23] In other words, even if we assumed that mixing by all players 0, 1, ..., n is unobservable even ex-post, we can define and prove a non-trivial upper bound on reputation. We would then use the minmax (w^p_0) of 0 over A_+ rather than Δ(A_+) during the adjustment phase, to be defined shortly. The new bound would merely be max{w^p_0, l}, where l is the bound defined above. In particular the above argument works unchanged when (a) w^p_0 = w_0, or (b) w^p_0 ≤ l, where (a) implies (b). Since l is below 0's highest feasible payoff, so is max{w^p_0, l}, ensuring that the bound is non-trivial.
However reputational arguments
derive considerable power from the reputation builder's ability to condition his future actions on the
derive considerable power from the reputation builder’s ability to condition his future actions on the
past play of his opponents. Fortunately, even when one allows such permissive history-dependent
behavioural types that mix, we can show that an appropriate LLN (for dependent random variables)
applies to our construction. We now formally define the excess payoff functions and
state the appropriate LLN.
Definition: The excess payoff function ξ_α : A → R is defined by ξ_α(a) := g_0(a) − g_0(α), where
α is any mixed action profile.
This is the excess payoff to player 0 when the action profile played is a and the desired mixed action
was α. Using this we next define the excess payoff given any announced type θ and any (t − 1)-period
history h^{t−1} when players follow the punitive plan encoded in ψ(·).
Definition: ξ_t(θ, h^{t−1})(a) := ξ_α(a), where α = (θ_t(θ)(h^{t−1}), ψ(θ_t(θ)(h^{t−1}))).
Before proceeding further we state some results that will enable us to state an appropriate weak
law of large numbers. Let Y_1, Y_2, ... be a sequence of zero-mean random variables, and let I_{t−1} be
the information filtration at time t − 1. Then we define the following:
Definition: {Y_t}_{t≥1} is a Martingale Difference Sequence (MDS) if E(Y_t | I_{t−1}) = 0 ∀ t.
The next property records that the random variables ξ_t form an MDS, where I_{t−1} now denotes the
filtration generated by ξ_{t−1}, ..., ξ_1:
Property: E(ξ_t | I_{t−1}) = 0 and Var ξ_t ≤ 4M² ∀ θ, h^{t−1}.
Either of the following theorems may, for example, be used (for the first see Badi Baltagi, pg. 218).
Theorem 1: Let {Y_t}_{t≥1} be an MDS. If Σ_{t=1}^∞ σ²_t/t² < ∞, then (1/T) Σ_{t=1}^T Y_t → 0 in probability.
Fact 1: Σ_{t=1}^∞ 1/t² is a convergent series.
Theorem 2 (Chow): Let {Y_t}_{t≥1} be an MDS with respect to {I_t}_{t≥1}. If for some r ≥ 1,
Σ_{t=1}^∞ (E|Y_t|^{2r})/t^{r+1} < ∞, then (1/T) Σ_{t=1}^T Y_t → 0 almost surely.
Fact 2: Convergence almost surely implies convergence in probability.
Lemma 2: Along any equilibrium path and for any type θ, we have (1/T) Σ_{t=1}^T ξ_t → 0 in probability.
Proof: The proof follows from the properties of the excess payoff functions and either (a) Theorem
1 and Fact 1, or (b) Chow's theorem, Fact 1, and Fact 2, when we take r = 1. ∎
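A quick simulation (illustrative, not from the paper) of a bounded, history-dependent martingale difference sequence shows the behaviour this lemma relies on: even though the draws are dependent, the sample average shrinks towards zero.

```python
import random

# Simulate a bounded MDS: conditional on the history, Y_t is +c or -c with
# equal probability, where the scale c depends on the previous draw, so the
# Y_t are dependent but E[Y_t | past] = 0 and |Y_t| <= 2.
def simulate_average(T, seed=0):
    rng = random.Random(seed)
    total, prev = 0.0, 1.0
    for _ in range(T):
        c = 1.0 if prev > 0 else 2.0      # history-dependent scale
        y = c if rng.random() < 0.5 else -c
        total += y
        prev = y
    return total / T

print(abs(simulate_average(10)))      # short horizon: average can be sizeable
print(abs(simulate_average(100000)))  # long horizon: close to zero
```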
Lemma 3: For any θ and any history h^t, ∃ N and δ̄_0 ∈ (0, 1) such that if δ_0 > δ̄_0, the total
discounted sum of excess payoffs satisfies

(ξ_t + δ_0 ξ_{t+1} + · · · + δ_0^{N−1} ξ_{t+N−1}) / N < ε.
Before an exact statement of the proposition and its proof, let me present a sketch highlighting the
key features of the construction. Recall the structure of announcements: player 0 sends an initial
message m ∈ Θ at t = 0, and at each time t ≥ 1 a message m_t ∈ Θ just before the stage game G
is played. (Thus each t is a length rather than an instant of time.) Beliefs are formed in the natural
way: in keeping with Bayes' rule players i > 0 put probability 1 on the announced type m;
following any unexpected message or play, players put probability 1 on the normal type. As before,
the normal type is asked to send the message θ* at all t. Any type θ ≠ θ* sends the message θ
at all t. If m = θ*, then play moves to the SPNE of the c.i. game (see FM '86) giving player 0 a
payoff of v_0 = l + 4ε. If m = θ ≠ θ*, then play starts in Phase Ia, where for N periods play is
according to ψ(·), which tracks the (mixed) strategy of player 0 and punishes him on each slice. By
the definition of ψ(·) the expected payoff of 0 at each time t does not exceed l + ε. However, even if
0 mixes according to the initially announced type, there is a positive probability that the discounted
sum of actual excess payoffs during any Phase Ia exceeds the permissible limit of ε. In order to ensure that these excess
payoffs do not accrue, we need to conduct a review after the N periods of Phase Ia, which was not
necessary when θ_t(m)(h^{t−1}) was a degenerate distribution over the actions in A_0. If the realised
payoff of 0 during the block of N periods exceeds the target payoff (= l + ε) by more than ε, the
other players minmax 0 for exactly NP periods so as to wipe out any gain over and above the
permissible limit of ε. (If δ is high then players i > 0 would rather carry out a costly punishment
for NP periods than be punished into the future.) At the end of this adjustment phase we move
back to the start of Phase Ia for another cycle of N periods. This ensures that if we target l + ε,
then 0 cannot get more than l + 2ε. With types that mix it is thus a positive-probability event
that the adjustment phase is triggered even on the equilibrium path; we need to ensure that this does
not happen often enough to reduce the expected utility from the equilibrium path to the point where players 1, ..., n
would rather face punishment than carry out this expensive threat against player 0. However, if we
choose N to be a large enough integer this probability can be made arbitrarily small, say ρ; Lemma
2, together with the fact that players are patient, assures us of this. The proof makes the argument
exact by computing bounds on the payoffs of players i > 0 during on-path play, and verifies
that appropriate incentives are in place.
Since players i > 0 carry out this strategy, the normal player 0 cannot get more than v_0 − 2ε by
declaring m = m_t = θ ≠ θ* and following σ′; this is strictly worse than the payoff to declaring
m = m_t = θ* and following the SPNE σ^{ci}(v_0) of the c.i. game. What does the strategy ask θ* to
do after declaring m ≠ θ*? He is then asked to reveal himself at the first possible opportunity. For
example, if type θ* has until time τ mimicked θ, then he is asked to send the message m_{τ+1} = θ*
and reveal himself; if τ belongs to a Phase Ia that started at s < τ, then play switches to a game of
complete information that gives player 0 a payoff of v_0 − ε = l + 3ε from s onwards. (This can be
done in such a way that the others get payoffs above their respective w_i. Note that in order to give 0
a payoff of l + 3ε from the start of the current Phase Ia, players i > 0 might need to give him more
than l + 3ε in the continuation game when he deviates. If all players are patient this will cost them
no more than a small amount κ, which would still leave them above their maxminmax. Note: indeed all we
need is to give i > 0 a payoff above the minmax w_i, which is in general lower than the maxminmax
W_i.) The analogous result to Proposition 1 for the case of behavioural types that mix, when mixed
strategies are not ex-post observable, is:
Proposition 7: Under assumptions N and FD (with RPD and announcements), we have v^min_0 ≤ l.
Proof: Fix an arbitrarily small ε > 0. (Note: as we shall see, ε cannot be so large that the
target payoffs are infeasibly high.) It is sufficient to find a PBE in which the normal type gets an equilibrium
payoff of v_0 := l + 4ε. In what follows let σ^{ci}(x_0) denote an SPNE of the c.i. game that gives
player 0 a payoff of x_0. First we describe the strategy profile that gives θ* a payoff of v_0 in the
reputational game. We assume for convenience that a RPD is available.
Description of the Strategy Profile:
The normal type is asked to send the message θ* at all t, and to play according to σ^{ci}(v_0). Conditional
on having announced a non-normal type until date τ, θ* is expected to reveal himself at the start
of τ + 1 by sending the message θ*. (Any type θ ≠ θ* sends the message θ at all t and plays
according to σ(θ).) The path of play for players i > 0 depends on the initial announcement, as
detailed below.
Case I (m = θ*): If m = θ*, then play the strategy profile σ^{ci}(v_0) starting at time t = 0.
Case II (m ≠ θ*): If m ≠ θ* and no player has ever deviated, then we are in Phase I. In
what follows we use a RPD; therefore a "deviation" by i > 0 refers to deviation from a pure action
contingent on the realised draw; for player 0, a "deviation" is said to occur at period t only when
an action a_0(t) not in the support of θ_t(m)(h^{t−1}) is played.
We first define the following quantities. In each slice α_0 ∈ Δ(A_0), by Lemma 4 there exists ψ(α_0) ∈ Δ(A_+) such
that

g_0(α_0, ψ(α_0)) ≤ l + ε and π_i(α_0) := g_i(α_0, ψ(α_0)) > W_i + η for some η > 0,

where η depends on ε. This implies that π_i := sup_{α_0} π_i(α_0) > W_i + η. Now fix the n + 2 quantities
κ, (φ_i)_{i>0}, β such that W_i < φ_i < φ_i + β < π_i − κ < π_i. Since W_i < φ_i by construction, we can find a
large enough integer P, which will later be seen to be the length of the conditional minmaxing
phase, such that

P·W_i + max_a g_i(a) < P·φ_i + min_a g_i(a) ∀ i > 0. ... (*)
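Condition (*) pins down the punishment length: rearranging, any integer P > (max_a g_i(a) − min_a g_i(a))/(φ_i − W_i) works. A minimal sketch with hypothetical numbers:

```python
import math

# Smallest integer P satisfying P*W + max_g < P*phi + min_g, i.e.
# P > (max_g - min_g) / (phi - W). Hypothetical payoff range and targets.
def punishment_length(W, phi, g_min, g_max):
    assert phi > W
    return math.floor((g_max - g_min) / (phi - W)) + 1

P = punishment_length(W=1.0, phi=1.5, g_min=-3.0, g_max=3.0)
print(P)
```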
We start in Phase Ia: during the first N periods, where N is chosen as in Lemma 3 above,
the prescribed play at t conditional on the (t − 1)-period history h^{t−1} is ψ(θ_t(m)(h^{t−1})), the
mixed action profile that gives player 0 a low expected payoff (no more than l + ε) in the slice θ_t(m)(h^{t−1}).
After the N periods we conduct a review, which was not necessary when θ_t(m)(h^{t−1}) was a degenerate
distribution over A_0. If the realised payoff of 0 during the block of N periods does not exceed the
target payoff by more than ε, we start another Phase Ia; otherwise the other players minmax 0 for
NP periods, wiping out any gain over and above the permissible limit of ε, while the normal type of
0 is instructed to play some pure best reply during this stage. At the end of this adjustment phase
Ib we move back to the start of Phase Ia for another cycle of N periods. By Lemma 3 we can find
a large enough integer N, independent of the discount factors and the type θ, such that the
probability of the average discounted excess payoff being more than 2ε after N periods falls short
of ρ, where ρ > 0 will be chosen to be suitably small later on.
Suppose player 0 has never deviated from his announced strategy in the past and that player
i > 0 deviates from the prescribed path at time τ; then play enters Phase II_i, where player i is
conditionally minmaxed in the slice θ_t(m)(h^{t−1}) at times t = τ, τ + 1, ..., τ + P, where P is the length
of the punishment. Once Phase II_i is over, play moves into Phase III_i, where play is such
that φ(i) is delivered at t = τ + P + 1, τ + P + 2, ..., ∞; recall that the ith component of this vector
is φ_i, and all other components j > 0 are φ_j + β. If player j unilaterally deviates from Phase II_i
or Phase III_i, then impose Phase II_j followed by Phase III_j, and so on.
Conditional on an announcement m = θ ≠ θ*, players believe that they are indeed facing type
θ until 0 deviates from σ(m) or sends a message m_t ≠ θ and reveals himself to be the normal type.
Once the continuation game is a c.i. game, any player, including 0, may be punished by minmaxing
by the other players for P periods (see (*) above). If player 0 reveals himself as the normal type at the
start of a Phase Ia, then play re-adjusts to give him v_0 − ε = l + 3ε from the start of the current
Phase Ia. If he reveals normality during a Phase Ib, then the minmaxing continues until the current
phase is played out, and thereafter play switches to σ^{ci}(v_0 − ε).
Checking that the Strategy Profile Constitutes a PBE:
Step 1: Player 0 will not deviate:
(i) From standard results for the c.i. game it follows that 0 does not deviate after he reveals
himself to be a normal type.
(ii) θ* will reveal himself with m = θ* and get v_0, because the maximum payoff following any
announcement m ≠ θ* is below v_0 − 2ε.
(iii) Following m ≠ θ*, in Phase Ia the normal type is indifferent between revealing himself at
any point within the current phase, because it gives him the same utility from the start of the
current Ia; delaying revelation until after the current phase is strictly worse.
Step 2: Player i > 0 will not deviate:
(i) During Phase Ia: When Phase Ia has just started, let player i's utility be v^a_i. Let
T := N − 1 + NP. The following inequality follows from the recursion structure:

v^a_i/(1 − δ) ≥ π_i(1 + δ + δ² + · · · + δ^{N−1}) + ρ{(−M)(δ^N + · · · + δ^T) + δ^{T+1} v^a_i/(1 − δ)} + (1 − ρ)δ^N v^a_i/(1 − δ)

⟹ v^a_i ≥ [π_i(1 − δ^N) − Mρδ^N(1 − δ^{T−N+1})] / [1 − (1 − ρ)δ^N − ρδ^{T+1}].

As ρ → 0, the right-hand side tends to π_i(1 − δ^N)/(1 − δ^N) = π_i; thus ∃ ρ̄ such that ρ ≤ ρ̄ ⟹ v^a_i ≥ π_i − κ.
The total discounted value at the start of Phase Ia comprises the following terms: first, we have
N periods during each of which player i expects to get no less than π_i, given the announced
type and the construction of ψ(·); with probability ρ this is followed by an adjustment phase in
which player i gets no less than −M per period and subsequently play moves to the next Phase Ia, whereas
with the remaining probability 1 − ρ Phase Ia is repeated.
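The closed-form bound above is easy to evaluate numerically. With illustrative parameters (not taken from the paper), the right-hand side visibly approaches π_i as ρ → 0:

```python
# Phase-Ia lower bound on player i's value (hypothetical parameters):
#   v_a >= [pi*(1 - d^N) - M*rho*d^N*(1 - d^(T-N+1))]
#          / [1 - (1 - rho)*d^N - rho*d^(T+1)],  with  T = N - 1 + N*P.
def phase_ia_bound(pi, M, d, N, P, rho):
    T = N - 1 + N * P
    num = pi * (1 - d**N) - M * rho * d**N * (1 - d**(T - N + 1))
    den = 1 - (1 - rho) * d**N - rho * d**(T + 1)
    return num / den

pi, M, d, N, P = 2.0, 5.0, 0.99, 50, 3
for rho in (0.5, 0.1, 0.01, 0.001):
    print(rho, round(phase_ia_bound(pi, M, d, N, P, rho), 4))
```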
(ii) During Phase Ib: When Phase Ib (the adjustment phase) has just started, let player
i's utility be v^b_i. It is easy to see that if i does not have an incentive to deviate at the start of
the adjustment phase then she will not do so later, because it is at the start that the player has
to be ready to mete out a possibly costly punishment for the greatest number of periods, viz. NP.
The minimum utility from carrying this out is (−M)(1 + · · · + δ^{NP−1}) + δ^{NP} v^a_i/(1 − δ), where v^a_i is
as above. If δ is high enough, this is close to v^a_i/(1 − δ). If i deviates, play eventually settles down in an
equilibrium that gives φ_i to player i. By construction φ_i < π_i − κ, and π_i − κ ≤ v^a_i from the step
above. It therefore follows that a deviation by any i > 0 from Phase Ib proves costly.
(iii) Player i does not deviate from III_j or II_j because, in expected value, she ends up losing
the reward β and any gains from the deviation are wiped out.
(iv) Player i does not deviate from II_i because, in expectation, she is playing her best response
anyway; deviation worsens both the current and the future payoff. Likewise deviation from
III_i is unprofitable because of condition (*). ∎
7 A Lower Bound
Having already seen that the LR player can be forced down to payoffs arbitrarily close to l, we
end by looking at what payoffs LR can actually guarantee himself. Start by considering types that
have a dominant strategy to play a constant action a_0 in every period (bounded recall zero). Fix
any a_0 ∈ A_0, and ask the question: what can the LR player guarantee himself by mimicking a type
that plays this constant action each period, a^t_0 = a_0 ∀ t = 1, 2, ...? First define the corresponding
individually rational slice F*(a_0) as follows, where w_i is the minmax of i in G:

F*(a_0) := {v ∈ F(a_0) : v_i ≥ w_i ∀ i > 0}.

Thus both 𝓕(a_0) and F*(a_0) are defined by truncating the slice-feasible payoff set F(a_0) below
some level for each player i > 0. In the first case this level is the maxminmax of each player, and
in the second case it is the minmax. Note that F*(a_0) ⊆ R^{n+1}, (n + 1)-dimensional
Euclidean space. Take the infimum of the projection of this set onto dimension 0; that is the lowest
payoff that 0 can get in an equilibrium if he sticks to a_0 for all t. The rough reasoning is that if 0
continues to play a_0, the others cannot continue to play a strategy profile that gives them less than
their respective minmax values. If all he could do was to mimic a type that plays a constant action
every period (a bounded-recall strategy with zero memory), the most he could guarantee is the max
over a_0 ∈ A_0 of these infima. Thus we have a lower bound l_0 := max_{a_0∈A_0} inf F*_0(a_0); in any
equilibrium player 0 cannot get much less than l_0 when all players are patient, he is relatively
patient, and the prior places positive weight on all types that play a constant action a_0 for all t. A
formal statement follows. We make the following assumption:
Assumption TC : All “crazy” types declare their type truthfully.
This assumption, it should be remarked, does not impose truthtelling for the entire game. In no way
does it constrain the normal type θ*'s announcement.[24] This assumption has been used by Abreu
and Pearce (2002, 2007) as a shortcut to an explicit model with trembles or imperfect monitoring,
in which the strategies would eventually be learnt whether or not some irrational types declare
truthfully. However, such a model would of necessity be technically challenging to handle, especially
in view of the large type space I wish to support. One might thus justify the assumption as contributing to
technical simplicity.[25]
Next, a richness assumption captures the premise of reputational arguments: even when we
believe that a player is overwhelmingly likely to be of a given type, we can never be absolutely sure
that he is. Naturally, reputational arguments are interesting precisely because they work even when
one type, the normal type θ* ∈ Θ of the corresponding repeated game, has high probability
mass, i.e. µ(θ*) ≈ 1. The following assumes that the prior µ places a positive but
arbitrarily small weight on all types that play a constant action every period.
Assumption ACT (All Constant-action Types): µ(θ(a0)) > 0 ∀ a0 ∈ A0.
Under the rather weak assumptions ACT and TC we have the following result, which puts a lower
bound on 0's payoff across all Bayesian Nash (BN) equilibria.
Proposition 6 (Lower Bound): Under TC and ACT there exists a lower bound l_0(δ_0, δ) on the
payoffs of 0 in a BN equilibrium such that lim_{δ→1} lim_{δ_0→1} l_0(δ_0, δ) = l_0, where l_0 := max_{a0 ∈ A0} inf F′_0(a0).
Proof: This is done in the same way as in CST. □
The above step establishes the existence of a lower bound on 0's minimum payoff in BN equilibrium²⁶
using strategies that involve playing a single action at all times. Note that the definition of F′(a0)
uses the minmax value w_i, which would in general be less than the conditional minmax defined earlier.
In a game where strategies can be learnt because of trembles or imperfect monitoring, the set
F′(a0) would be replaced by the set H(a0), which uses the conditional minmax rather than the
minmax: H(a0) := co{v ∈ F(a0) : v_i ≥ w_i(a0) ∀ i > 0}. This would raise the lower bound, as
w_i(a0) > w_i in general. We use F′(a0), which truncates F(a0) below the minmax w_i rather than
below the conditional minmax w_i(a0), because there might exist equilibria in which, even if player
0 plays a0 always, it is not clear to the others that this is the case; consequently they perceive their
lowest equilibrium payoff as w_i rather than w_i(a0).
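To make these definitions concrete, the following sketch computes the objects above for a small invented example. Everything here is an assumption for illustration only: a 2×2 stage game with a single opponent, pure-action minmaxes, and a brute-force grid search standing in for the linear programme that defines inf F′_0(a0).

```python
# Hypothetical 2x2 stage game, invented purely for illustration: player 0
# chooses a row (T or B), the single opponent (player 1) a column (L or R).
# Entries are payoff vectors (g_0, g_1).
PAYOFFS = {
    ("T", "L"): (3.0, 1.0),
    ("T", "R"): (0.0, 0.0),
    ("B", "L"): (1.0, 2.0),
    ("B", "R"): (1.0, 1.0),
}
A0 = ["T", "B"]
A1 = ["L", "R"]

def minmax_1():
    """Unconditional (pure-action) minmax of player 1: player 0 picks the
    row that caps 1's best response."""
    return min(max(PAYOFFS[(a0, a1)][1] for a1 in A1) for a0 in A0)

def conditional_minmax_1(a0):
    """Minmax of player 1 conditional on 0 sticking to a0: with the row
    fixed, 1 can always best-respond within the slice."""
    return max(PAYOFFS[(a0, a1)][1] for a1 in A1)

def inf_slice_payoff_0(a0, w1, grid=10_001):
    """inf of 0's payoff over mixtures in slice a0 giving player 1 at least
    w1 -- a brute-force stand-in for the defining linear programme."""
    best = None
    for k in range(grid):
        lam = k / (grid - 1)  # weight on column L
        v0 = lam * PAYOFFS[(a0, "L")][0] + (1 - lam) * PAYOFFS[(a0, "R")][0]
        v1 = lam * PAYOFFS[(a0, "L")][1] + (1 - lam) * PAYOFFS[(a0, "R")][1]
        if v1 >= w1 - 1e-9 and (best is None or v0 < best):
            best = v0
    return best

w1 = minmax_1()  # unconditional minmax of the opponent
# l_0 = max over a0 of the infimum over the truncated slice (feasible slices only)
l0 = max(v for v in (inf_slice_payoff_0(a0, w1) for a0 in A0) if v is not None)

print("w_1 =", w1)
print("conditional minmaxes:", {a0: conditional_minmax_1(a0) for a0 in A0})
print("l_0 =", l0)
```

In this invented game w_1 = 1 (enforced by row T) while w_1(B) = 2, illustrating that the conditional minmax weakly exceeds the unconditional one; the bound works out to l_0 = 3, delivered by slice T.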
In general, it is clear that constant-action types constitute a small class of strategies; there are
uncountably many ways of switching between the various actions in A0, each action amounting to
the choice of a “slice” of the game G.

²⁴If truthtelling is to hold there it must be derived; otherwise studying reputation is pointless.
²⁵A later section extends the analysis to situations where announcements are unavailable; the accompanying proof would not need TC.
²⁶This bound thus applies to all BNE, not just the smaller subset of PBE.

The next step would be to look at increasingly long strategies
of bounded recall, perhaps using some kind of induction on the memory size, and see to what extent
they lead to further improvements. This unfortunately turns out to be a very hard problem to
solve, partly because there are uncountably many “crazy” strategies that 0 could potentially mimic.
In principle one could think of each “crazy” strategy as a rule for transitioning among the slices of
G; thus solving the incomplete-information game is akin to solving a class of stochastic games among
n rather than n + 1 players, defined using the repeated game. We would then need to find, for each
game in the class, the worst possible payoff of the normal type, and finally take the sup/max over
all these possible minima (or infima). Since stochastic games are notoriously hard to handle, this
compounds the difficulty. However, as we have seen, the long-run player 0 cannot guarantee himself
anything more than l no matter how patient he is relative to his opponents, who are also patient.
8 Conclusion
This paper attempts to contribute to the literature on reputation that starts with the work of Kreps,
Wilson, Milgrom and Roberts, and is sharpened into powerful and general theoretical insights by
Fudenberg and Levine and subsequent papers. My paper analyses reputation formation against
multiple patient opponents. I show that there are additional insights to be gained from this
case, over and above the elegant theoretical insights of the previous literature. While reputation
is in general valuable even against multiple players, it may not be possible for the patient player
to extract the entire surplus while leaving the others with barely their minmax values. Let v_0^min
be the minimum equilibrium payoff of player 0 in the limit when all players are patient and 0 is
patient relative to the rest. I find an upper bound l such that v_0^min ≤ l: any payoff of the
(complete-information) repeated game in which 0 gets more than l can be sustained. A single
opponent cannot credibly threaten to punish, and thereby thwart, a patient player building
reputation. But with more than one patient opponent, there might be ways to commit to punishing
even a patient player for not behaving like the normal type.
References
[1] Abreu, Dilip; On the Theory of Infinitely Repeated Games with Discounting; Econometrica, Vol. 56, No. 2 (Mar., 1988)
[2] Abreu, Dilip; Prajit K. Dutta; Lones Smith; The Folk Theorem for Repeated Games: A NEU Condition; Econometrica, Vol. 62, No. 4 (Jul., 1994)
[3] Abreu, Dilip; David Pearce; Bargaining, Reputation and Equilibrium Selection in Repeated Games with Contracts; Econometrica
[4] Aoyagi, Masaki; Reputation and Dynamic Stackelberg Leadership in Infinitely Repeated Games; Journal of Economic Theory, 1996, Vol. 71, Issue 2, pp. 378-393
[5] Celentani, Marco; Drew Fudenberg; David K. Levine; Wolfgang Pesendorfer; Maintaining a Reputation Against a Long-Lived Opponent; Econometrica, Vol. 64, No. 3 (May, 1996)
[6] Chan, Jimmy; On the Non-Existence of Reputation Effects in Two-Person Infinitely-Repeated Games; April 2000, Johns Hopkins University working paper
[7] Cripps, Martin W.; E. Dekel; W. Pesendorfer; Reputation with Equal Discounting in Repeated Games with Strictly Conflicting Interests; Journal of Economic Theory, 2005, Vol. 121, pp. 259-272
[8] Cripps, Martin W.; George J. Mailath; Larry Samuelson [CMS]; Imperfect Monitoring and Impermanent Reputations; Econometrica, Vol. 72, No. 2 (Mar., 2004), pp. 407-432
[9] Cripps, Martin W.; George J. Mailath; Larry Samuelson; Disappearing Private Reputations in Long-Run Relationships; Journal of Economic Theory, Vol. 134, Issue 1, May 2007, pp. 287-316
[10] Cripps, Martin W.; Klaus M. Schmidt; Jonathan P. Thomas [CST]; Reputation in Perturbed Repeated Games; Journal of Economic Theory, Vol. 69, Issue 2, May 1996, pp. 387-410
[11] Cripps, Martin W.; J. P. Thomas; Some Asymptotic Results in Discounted Repeated Games of One-Sided Incomplete Information; Mathematics of Operations Research, 2003, Vol. 28, pp. 433-462
[12] Evans, Robert; Jonathan P. Thomas; Reputation and Experimentation in Repeated Games with Two Long-Run Players; Econometrica, Vol. 65, No. 5 (Sep., 1997), pp. 1153-1173
[13] Fudenberg, Drew; David K. Levine; Reputation and Equilibrium Selection in Games with a Patient Player; Econometrica, Vol. 57, No. 4 (Jul., 1989)
[14] Fudenberg, Drew; David K. Levine; Maintaining a Reputation when Strategies are Imperfectly Observed; The Review of Economic Studies, Vol. 59, No. 3 (Jul., 1992)
[15] Friedman, James W.; A Non-cooperative Equilibrium for Supergames; The Review of Economic Studies, Vol. 38, No. 1 (Jan., 1971), pp. 1-12
[16] Fudenberg, Drew; David M. Kreps; Reputation in the Simultaneous Play of Multiple Opponents; The Review of Economic Studies, Vol. 54, No. 4 (Oct., 1987)
[17] Fudenberg, Drew; Eric Maskin; The Folk Theorem in Repeated Games with Discounting or with Incomplete Information; Econometrica, Vol. 54, No. 3 (May, 1986), pp. 533-554
[18] Kreps, David; Robert Wilson; Reputation and Imperfect Information; Journal of Economic Theory, 1982
[19] Schmidt, Klaus M.; Reputation and Equilibrium Characterization in Repeated Games with Conflicting Interests; Econometrica, Vol. 61, No. 2 (Mar., 1993)
Appendix
Proof of TPL: Fix ε > 0. Pick any slice a0. First we show that there exists a payoff vector
π(a0) ∈ F(a0) such that π_i(a0) > W_i ∀ i > 0 and π_0(a0) < w_0(a0) + ε ≤ l + ε. (ε is the same for
all slices.) Since inf{v_0 : (v_0, v_{−0}) ∈ F(a0) and v_i ≥ W_i ∀ i > 0} = w_0(a0), there exists a vector
π̂(a0) ∈ F(a0) such that π̂_i(a0) ≥ W_i ∀ i and π̂_0(a0) < w_0(a0) + ε/2 ≤ l + ε/2. By N, we can pick a
point π̄(a0) ∈ F(a0) where each i gets strictly more than W_i. Now define π(a0) := λ(a0)π̂(a0) + (1 − λ(a0))π̄(a0),
where λ(a0) is close enough to 1 so that π_i(a0) > W_i and π_0(a0) < w_0(a0) + ε. Since π(a0) ∈
F(a0) ⊆ co{F(a0)}, there exists a probability distribution α(a0) ∈ Δ(A_{−0}) such that

g(a0, α(a0)) ≡ Σ_{a_{−0} ∈ A_{−0}} g(a0, a_{−0}) α(a0)(a_{−0}) = π(a0). □

Define π_i := min_{a0} π_i(a0). Thus i gets at least π_i in every slice if all players j > 0 play α(a0)
and 0 induces slice a0. Since π_i(a0) > W_i ∀ a0, we also have π_i > W_i.
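The mixing step of this proof can be illustrated numerically. The vectors, the bound W_1, and ε below are all hypothetical: the point is only that a convex combination with weight λ close to 1 on the infimum-attaining vector keeps player 0's coordinate within ε of the infimum while borrowing a strict inequality for the opponent from the other vector.

```python
# Illustrative numbers only (coordinates are (v_0, v_1)); none of these
# values come from the paper.
W1 = 1.0               # opponent's bound W_1
pi_hat = (2.0, 1.0)    # attains the infimum: v_0 low, but v_1 = W_1 only weakly
pi_bar = (4.0, 3.0)    # a feasible point where the opponent gets strictly more
EPS = 0.5

def mix(lam):
    """Convex combination lam * pi_hat + (1 - lam) * pi_bar."""
    return tuple(lam * h + (1 - lam) * b for h, b in zip(pi_hat, pi_bar))

# Push lam toward 1: the mixture inherits strictness for the opponent from
# pi_bar while keeping 0's coordinate within EPS of the infimum.
lam = 0.9
pi = mix(lam)
assert pi[1] > W1               # strict inequality for the opponent
assert pi[0] < pi_hat[0] + EPS  # 0's payoff stays within EPS of the infimum
print("pi =", pi)
```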
Proof of Lemma 3: By Lemma 2 we know that for any ρ > 0 there exists N large enough so that for
any t and any initial t-period history h^{t−1} we have

Pr( | (ζ_t + ζ_{t+1} + ... + ζ_{t+N−1}) / N | > ε/2 ) ≤ ρ.

Now the absolute distance between the discounted and undiscounted means is, given N,

| (ζ_t + δ_0 ζ_{t+1} + ... + δ_0^{N−1} ζ_{t+N−1}) / N − (ζ_t + ζ_{t+1} + ... + ζ_{t+N−1}) / N |
≤ Σ_{k=1}^{N−1} (1 − δ_0^k) |ζ_{t+k}| / N
≤ (1 − δ_0^N)(N − 1) M / N
< ε/2,

when δ_0 > δ̄_0, where δ̄_0 solves the equation obtained when the last inequality is replaced by an
equality; thus the discounted excess payoff over any N periods will, with probability 1 − ρ, not exceed
ε/2 + ε/2 = ε. □
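As a sanity check on the chain of inequalities in this proof, the following sketch draws random bounded sequences and verifies numerically that the gap between the discounted and undiscounted N-period means never exceeds (1 − δ_0^N)(N − 1)M/N. All parameter values and variable names here are chosen arbitrarily for illustration.

```python
import random

# Numerical check of the bound from the proof of Lemma 3: if each per-period
# term zeta_k is bounded by M in absolute value, then the distance between
# the discounted and undiscounted N-period means is at most
# (1 - delta0**N) * (N - 1) * M / N.  Parameter values are illustrative.
random.seed(0)
M, N, delta0 = 2.0, 50, 0.99

def gap_and_bound(zeta):
    """Return (|discounted mean - undiscounted mean|, theoretical bound)."""
    disc = sum(delta0 ** k * zeta[k] for k in range(N)) / N
    undisc = sum(zeta) / N
    bound = (1 - delta0 ** N) * (N - 1) * M / N
    return abs(disc - undisc), bound

for _ in range(1000):
    zeta = [random.uniform(-M, M) for _ in range(N)]
    gap, bound = gap_and_bound(zeta)
    assert gap <= bound + 1e-12  # the inequality from the proof holds
print("all 1000 draws satisfied the bound; bound =", round(bound, 4))
```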