Multiple Opponents and the Limits of Reputation
Sambuddha Ghosh*
January 30, 2009
Abstract
I consider a reputation game with perfect monitoring and multiple (two or more) long-lived
opponents indexed 1, 2, ..., n. Player 0, who attempts to build a reputation, could be of one of
many types. The normal type of player 0 and players 1, 2, ..., n maximise the discounted sum
of stage-game utilities; non-normal types of 0 are committed to particular strategies. Let v0^min
be the minimum equilibrium payoff of player 0 in the limit when all players are patient and 0
is patient relative to the rest. The previous literature finds that for a single opponent (n = 1)
we have v0^min ≥ L, where the lower bound L equals the maximum payoff feasible for 0 subject
to giving the opponent her minmax. In other words, 0 can appropriate 'everything'. For
n > 1, in contrast, I find an upper bound l such that v0^min ≤ l: Any payoff of the (complete-
information) repeated game in which 0 gets more than l can be sustained even when 0 has
the twin advantages of relative patience and one-sided incomplete information. Typically l is
a low number and could be as low as 0's minmax value. This result is robust to a very large
set of types: I start with the case where the type space may include any pure strategy of the
dynamic game; a later section extends the result to types that, in addition to playing arbitrarily
complex history-dependent actions, may be committed to mixing.
*I am grateful to Dilip Abreu for detailed comments and advice; Stephen Morris, without whose encouragement this might never have taken shape; and Faruk Gul and Eric Maskin, for their suggestions and guidance. Satoru Takahashi and Takuo Sugaya helped me check the proofs; discussions with Vinayak Tripathi sharpened my understanding of the literature. Several attendees at Princeton's theory workshop & theory seminar, especially Wolfgang Pesendorfer, and at the Penn State Theory Seminar offered valuable comments. Errors remain my responsibility.
1 Introduction and Literature Review
The literature on "reputation" starts with the premise that while we may be almost certain of
the payoffs and the structure of the game, we cannot be absolutely certain. The small room for
doubt, when exploited by a very patient player (sometimes referred to as a long-run player, or
LR for short), leads to "reputation" being built. The finite-horizon chain-store games of Kreps and
Wilson (1982) and Milgrom and Roberts (1982) introduced this idea. They show that arbitrarily small
amounts of incomplete information suffice to give an incumbent monopolist a rational motive to
fight early entrants in a finitely repeated entry game, even if in each stage it is better to acquiesce
once an entry has occurred; in contrast, the unique subgame-perfect equilibrium of the complete-
information game is marked by entry followed by acquiescence in every period (this is Selten's
well-known chain-store paradox). Another paper by the above-named four authors explains cooperation in
the finitely repeated prisoner's dilemma using slight incomplete information, although there is no
cooperation in the unique subgame-perfect equilibrium of the corresponding complete-information
game. The first general result on reputation, due to Fudenberg and Levine (FL, 1989), applies
to a long-lived player 0 playing an infinitely repeated simultaneous stage-game against a myopic
opponent: As long as there is a positive probability of a type that plays the Stackelberg action,¹ a
sufficiently patient LR can approximate his Stackelberg payoff.
Work on reputation has not looked at games with multiple long-lived opponents who interact
with one another, not just with the long-run player; my paper investigates how far the results for
a single opponent extend to settings with multiple non-myopic opponents (i = 1, 2, ..., n; n > 1)
and one LR player (0), who is very patient relative to the rest of the players. I deal with a model
of perfect monitoring. The basic issue in the 'reputation' literature is: How low can the payoff
of the LR player be when he has access to an appropriate set of types to mimic and is patient
relative to the other players? Fudenberg and Levine have shown that the answer to this question
is, in some sense, "very high": When a patient player 0 faces a single myopic opponent, there is
a discontinuity in the (limiting) set of payoffs that can be supported as we go from the complete-
information (c.i.) to the incomplete-information (i.i.) game, provided we allow a rich enough space of types.
Even small ex-ante uncertainties are magnified in the limit, and the effect on equilibria is drastic.
Does the same message hold in variants of the basic framework?
1. The Stackelberg action is the action to which 0 would have liked to commit if he could; the best response to it gives a higher payoff to player 0 than committing to any other action does.

Schmidt (Econometrica, 1994), Aoyagi (JET, 1996), and CFLP (Celentani, Fudenberg, Levine,
Pesendorfer; Econometrica, 1994) all consider only one non-myopic opponent. At the cost of some
additional notation, let us be precise about the discount factors δ0, δ1, ..., δn. All three papers deal
with the limiting case where all players are patient but 0 is relatively patient, i.e. δi → 1 ∀i > 0 and
(1 − δ0)/(1 − δi) → 0 ∀i > 0. One standard justification of a different discount factor for LR is that he is a large
player who plays with several opponents, who play relatively infrequently. My work will retain this
assumption. To see the importance of having a non-myopic opponent, recall FL’s argument for a
single myopic opponent: If the normal type mimics the type that plays the “Stackelberg action” in
every period, then eventually the opponent will play a best-response to the Stackelberg action if she
is myopic. Schmidt is the first to consider the case of one non-myopic opponent; he shows that this
natural modification introduces a twist in the tale: The opponent need not play a period-by-period
best response, because she fears that she is facing a perverse type that plays like the commitment
type on the path but punishes severely off-path. Since off-path strategies are not learnt in any equi-
librium with perfect monitoring, this could lead to very weak lower bounds, ones lower than FL's
bound in particular. Schmidt shows that the result in FL '89 extends only to "conflicting interest
games": games where the reputation builder would like to commit to an action that minmaxes
the other player. Roughly, this is because the SR player has no choice about how to respond:
she must play her static best response or get less than her minmax value, which is impossible in
equilibrium. Later, Cripps, Schmidt, and Thomas (CST) consider arbitrary stage-games and obtain
a tight bound that is strictly below the bound of FL. Their bound is "tight" in the sense that there
exist equilibria that give LR payoffs just above this bound.
The literature that follows argues that CST take a somewhat bleaker view of reputation effects
than warranted. Aoyagi and CFLP both differ from Schmidt and CST in that there is no issue of
whether strategies can be learnt: Trembles in the first paper and imperfect monitoring in the second
ensure that all possible triggers are pressed and latent fears do not remain. When strategies are
eventually learnt, reputation is once again a potent force. Here is their main result: As the (relatively
impatient) opponent also becomes patient in absolute terms, the payoff of the patient player tends
towards g0** = max{v0 | (v0, v1) ∈ F*},² where F* is the feasible and individually rational set of
payoffs; i.e. he gets the most that is consistent with the individual rationality of the opponent, player
1. A final decisive step towards restoring a high lower bound was taken by Evans and Thomas (ET,
Econometrica, 1997); they showed that the weakness of the bound in CST relies on the assumption
that all irrational types impose punishments of bounded length; under limiting patience they extend
the FL result to the case of a single long-lived opponent playing an arbitrary simultaneous stage-
game even under perfect monitoring. Assign a positive prior probability to a type that plays some
block of actions, expects a block of replies in return, and punishes for k periods the kth deviation
from the desired responses; then the lower bound, with a (sufficiently patient) long-lived opponent,
tends to the best feasible payoff for LR subject to the opponent getting her minmax payoff. ET
obtain the same bound g0** as Aoyagi and CFLP. In terms of the framework used in the literature,
my work adds more opponents to Schmidt's framework. To focus on a different issue, I shall
introduce a signalling stage that abstracts from learning problems that could arise under perfect
monitoring. In a later section I show that this modification does not distort results, although it
does greatly simplify the description of the strategies.

2. We adopt the convention that the generic value is v, w is the "worst" value, and b denotes the best. The mnemonic advantages hopefully justify any break with practice.
The immediate generalisation of the bound obtained by Aoyagi, CFLP, and ET for n = 1 to
multiple (n > 1) opponents is g0** = max{v0 | (v0, v1, ..., vn) ∈ F*}. One obvious case where this
holds is the one where the relatively patient player (0) plays a series of independent games
with the other n players.³ When players 1, ..., n are myopic, i.e. their discount factors are 0, the
above bound follows readily from the analysis of FL '89; if the other players are also patient, the
bound derives from an n-fold repetition of the analysis of ET '97. However, I find that this is not
true in general: The presence of additional opponents, each with the ability to punish and to play
out various repeated-game strategies, complicates the situation. Recall that the previous literature
obtains lower bounds and establishes that v0^min ≥ L, where L is a large lower bound that equals
the Stackelberg payoff of 0 in FL '89 and is g0** in the three other papers mentioned above. It
is interesting that, under some non-emptiness restrictions that rule out cases like the immediate
extension above, a world with multiple opponents gives an upper bound on how high a payoff
"reputation" can guarantee LR. In other words, I define a bound l such that all c.i. (i.e. without
the perturbed type space) equilibrium payoffs that give the LR player more than l can be sustained
in the limiting set of payoffs even under incomplete information, and even with all the CFLP and
the Schmidt types; furthermore, sequential rationality is satisfied in the construction of the above
equilibria. Finally, l could be much lower than g0**, and even as low as the minmax value of player
0. This means that while reputation is in general valuable to a player, its impact is less dramatic
when there are multiple opponents; above a certain value the perturbed game is no different from
the original repeated game of complete information.
Fudenberg and Kreps (1987) is the only paper to my knowledge that has multiple opponents playing
0, who is trying to build a reputation. Their framework is significantly different from mine; in
particular, the SR players do not have a direct effect on one another's payoffs through their actions.
Their paper is more concerned with the effect of 0's actions being observed publicly rather than privately
by each opponent when the basic "contest" is an entry-deterrence game. At this point I should also
note the literature on games in which all players have equal discount factors. Cripps et al. and
Chan have shown that, except in special cases, the reputation builder needs to be patient relative to
the others to be able to derive an advantage from the incomplete information. My main result applies
equally well when all players have equal discount factors, although it contrasts most sharply with
the previous literature when player 0 is more patient than the rest, which is also the case that has
received the most attention. There is yet another strand of the reputation literature, starting with
the work of Mailath and Samuelson in ??, that looks not at the payoff from building reputation,
but at the actual belief of the opponent about the type of the reputation-builder. CMS show that, in a
world with imperfect monitoring, even where a patient player can get high payoffs in equilibrium by
mimicking a committed type, the single opponent will eventually believe with very high probability
that she is indeed facing a normal type; in other words, reputation disappears in the long run under
imperfect monitoring. This question will not be relevant in our context.

3. By "independent" I mean that the payoff of any player i > 0 is independent of the actions of any player j > 0, j ≠ i. The game is then no more than the concatenation of n games, each of which has player 0 as one of the two players.

The plan of the paper is as follows. Section 2 has some motivating examples; section 3 lays out the formal details of the
model; the next deals with the upper bounds on the minimum equilibrium payoffs of LR. I extend
my main result to additional situations of interest in section 5, although at the cost of increasing
complexity of the equilibrium strategies and the analysis. Section 6 proves analogous results when
there are types that explicitly mix on the path. This calls for very different techniques because,
unlike with pure-strategy types, it is no longer the case that a deviation by the reputation-builder
is immediately detected; he could now conceivably get away with declaring one type and behaving
like yet another type, as long as he does not give himself away.
2 Motivating Examples
This section presents a few simple examples that place the current paper in context. In each
example we start with a benchmark case, and in turn illustrate the implications of the theory of
repeated games, then of the existing literature on reputation, and finally that of the current paper.
They try to weave stories, admittedly simplistic ones, to give a sense of the results. The first
example is meant to be the simpler of the two; the second is best postponed until the reader has
acquired familiarity with the concepts and quantities introduced in the main model.
Example 1 (A Skeletal Oligopoly Example). A caveat: this example has special features that do
not appear in the proof but permit a simpler analysis.
Consider a market of size 1 for a perishable good in which three firms 0, 1, 2 interact at times
t = 1, 2, ..., ∞. The demand curve is linear in the total output q:

p(q) := 1 − q if q ∈ [0, 1];  p(q) := 0 if q > 1.
I wish to consider a case where the products are slightly imperfect substitutes. However, to keep the
algebra to a minimum I use the shortcut of a side-market. There is one small side-market
of size ε shared by the SR firms 1 and 2: p(q′) := ε − q′ for q′ ∈ [0, ε], where ε is a very small
positive number and q′ is the total output in the side-market. This ensures that the two firms 1
and 2 cannot be minmaxed unilaterally by 0; the formal model clarifies this point further. Let us
make another simplifying assumption: fixed and marginal costs are 0. A firm's output is chosen
from a compact convex set; although, strictly speaking, my result deals with finite action spaces,
an appropriately fine grid of actions can approximate arbitrarily closely the results that follow. For
the infinitely repeated game, firm i discounts profits at rate δi; we shall consider the case when 0
is relatively patient, i.e. all δi are close to 1 but δ0 is relatively closer. (For the main result with
multiple opponents it is enough that all players are patient, whether or not 0 is relatively patient;
however, the results for n = 1 make critical use of the relative patience of 0.)
First we compute the following benchmark quantities for the main market. Suppose there is a
monopolist serving the main market. The monopolist's per-period profit function is π0(q) = (1 − q)q.
Calculate the following: π0′(q) = 1 − 2q; π0″(q) = −2 < 0. Therefore the maximum per-period profit
of the monopolist is πm = 1/4 at the output qm = 1/2. The maximum discounted and normalised
total profit is also 1/4 if the monopolist discounts using δ0 ∈ (0, 1). What is the Stackelberg profit
of firm 0 as leader and 1 as follower? First solve max_{q1} q1(1 − q0 − q1) to get q1* = (1 − q0)/2; the maximum
profit is max_{q0 ∈ [0,1]} q0(1 − q0)/2 = 1/8. The Cournot-Nash equilibrium will also be a useful benchmark. In
this equilibrium each firm produces 1/3, leading to a price of 1/3 and a profit of 1/9 for each of 0 and 1.
For use later in the example, note that in the side-market of size ε the monopoly profit is ε²/4, which
is obtained when the total output is ε/2; in the Cournot-Nash equilibrium each firm i = 1, 2 produces
ε/3 and earns a profit of ε²/9 < ε²/8, one-half the monopoly profit.
For now suppose that 0 and 1 are the only firms present. The main market is then a Cournot
duopoly with perfect monitoring and, most importantly, complete information. The Nash-threats
folk theorem of Friedman shows that the point (1/8, 1/8) obtained by a fair split of the monopoly profits
can be sustained in an equilibrium (indeed in an SPNE) if the players are patient enough, because
the Cournot-Nash equilibrium features a strictly lower profit of 1/9 for each. The following trigger
strategy accomplishes it: each firm produces 1/4, half the monopoly output; any deviation leads to both
firms switching to the Cournot-Nash equilibrium forever.
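How patient is "patient enough" here? A one-shot deviation argument under the stated trigger strategy pins this down; the threshold 9/17 below is my own back-of-the-envelope computation, not a number from the paper:

```python
from fractions import Fraction as F

coop = F(1, 8)     # per-period profit on the collusive path
punish = F(1, 9)   # Cournot-Nash profit after reversion
# Best reply to the rival's 1/4 is q = 3/8, earning (3/8)*(1 - 1/4 - 3/8) = 9/64.
deviate = F(3, 8) * (1 - F(1, 4) - F(3, 8))
assert deviate == F(9, 64)

def trigger_sustains(delta):
    """Cooperation pays 1/8 forever; a deviation pays 9/64 once, then 1/9 forever."""
    return coop / (1 - delta) >= deviate + delta * punish / (1 - delta)

# Rearranging the inequality gives the critical discount factor.
critical = (deviate - coop) / (deviate - punish)
assert critical == F(9, 17)
assert trigger_sustains(F(9, 17)) and not trigger_sustains(F(1, 2))
```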
Now consider the following thought experiment: Introduce one-sided incomplete information
about 0 through a type space Θ, which comprises the normal type θ* and finitely many crazy types
θ. The normal type discounts payoffs as usual. However, each 'crazy' type θ ≠ θ* is a choice
of output θ̂1 for t = 1 and, for each t ≥ 1, a mapping θ̂t+1(·) from the observed history of the
other players; given that the only opponent is player 1, θ̂t+1(·) maps from h1^t := (q1(s))_{s=1}^t to an
action/output. We start with the environment of FL: δ1 = 0. If there is a positive probability
of a type that selects the Stackelberg output irrespective of history, 0 can guarantee himself very
close to 1/8 if he is patient enough: If he mimics this type, FL show that player 1 must eventually
best respond with her "follower output" because she is myopic. Let us now make even player 1
patient, while 0 is patient relative to her: δ1 → 1 and (1 − δ0)/(1 − δ1) → 0. Define the limiting payoff set as
V := lim_{δ1→1} {lim_{δ0→1} V(δ0, δ1)}. Introduce a type of player 0 that produces almost the monopoly
output qm every period and, if 1 produces more than some very small quantity, punishes her by
flooding the market for k periods following the kth offence, returning to produce almost qm
afterwards. It follows from the analysis of ET⁴ that if there is a positive probability of this type and
0 mimics him, player 1 can induce no more than a finite number of punishment rounds before she
finds it worthwhile to experiment and see if she can escape punishment by producing very little and
letting LR earn almost the monopoly profit. In other words, the limiting payoff set is V = {(1/4, ε²/4)}! Thus
the combination of patience and reputation is enough for player 0 to extract the entire surplus:
in the limit he enjoys (almost) monopoly profits from the main market, while the other player gets
(almost) nothing in the main market and the small monopoly profit of ε²/4 from the side-market.
Now introduce the third player, making the main market an oligopoly and the side-market a
duopoly; continue to assume the same type space Θ, keeping in mind that now θ̂t+1(·) maps from
(h1^t, h2^t) := (q1(s), q2(s))_{s=1}^t to an action/output. The bound l that I refer to in the introduction can
be shown to be 0 in this example. Thus any payoff vector in the repeated game that gives player
0 more than 0 can be sustained. In particular, my results imply that as the perturbation becomes
small (i.e. the probability μ(θ*) of the normal type goes to 1), the limiting set of equilibrium
payoffs contains a point that gives each player an equal share of the monopoly profit πm from the
main market. This point could not be sustained with only players 0 and 1, where we have already
reasoned that 0 gets arbitrarily close to his monopoly profit of 1/4 when he has enough patience
and crazy types to mimic.

4. Once the formal model is in place I flesh out this argument in more general terms. For a fuller argument the reader is referred to their paper.
Why the marked change in result? Here is a sketch of the argument. Add an announcement stage
to this game, asking the patient player to declare his type by sending a message m ∈ Θ. Consider the
normal type θ* of player 0. If he declares m = θ*, then each firm produces qi(t) = 1/6 ∀i ≥ 0, ∀t ≥ 1
and makes a profit of 1/12 in the main market; in the side-market play starts at the point M (see figure
below): each SR firm produces half the monopoly output and makes half the monopoly profit, i.e.
ε²/8 each. Given that firms are patient, any deviation by 0 may be punished by a reversion to the
Cournot-Nash equilibrium (CNE) in the main market, in which each firm produces an output of 1/4 in
the main market and makes a profit of 1/16; deviations by i > 0 are punished by reverting to the CNE
in both markets. From the analysis of Friedman we already know that this is an SPNE. The trouble
is that θ* would in general want to declare a different type and mimic it. The following strategies
make it undesirable for him to mimic any other type: If m = θ ≠ θ*, the others lock themselves into
a bad equilibrium σ+(θ) as follows. At each t + 1, given any t-period history h^t and announcement
θ, players i > 0 are called upon to play some combination of actions/outputs (σ+(t+1)(h^t, θ)) so
as to eliminate all profits in the main market: θ̂t+1(h^t) + Σ_{i>0} σi(t+1)(h^t, θ) = 1; in the
side-market each firm i = 1, 2 produces ε/4 and earns a profit of ε²/8 as before. If 0 ever deviates from his
announced strategy, he reveals himself to be the normal type and, after he is minmaxed to
wipe out any gains resulting from the above deviation, play moves to the symmetric cooperative
equilibrium sustained by Cournot-Nash reversion as in the c.i. game. The following figure will be
useful in clarifying the strategies following a deviation by any i > 0 after m ≠ θ*.
A deviation by player 1 in either market at any time τ impacts play in both markets. In the
main market the other SR player (2) minmaxes 1 by producing a total output of 1 − θ̂t+1(h^t)
for all subsequent periods t + 1 = τ + 1, τ + 2, ..., ∞, while 1 is asked to produce 0. In the side-market
play moves, starting from τ + 1, to the point where firm 2 produces the Stackelberg output of ε/3
while 1 responds with her follower output of ε/6; this is the point R2 ≈ (ε²/12, ε²/6) in the payoff space
(see figure). If 2 deviates, interchange the two players 1 and 2 in the above construction: play
moves to the point where firms 1 and 2 produce (ε/3, ε/6) in the side-market, leading to the payoffs
R1 ≈ (ε²/6, ε²/12); now 2 is playing her best response to 1's Stackelberg output. Any subsequent
deviation by i > 0 results in play moving to Rj, j ≠ i. (In particular, if i ≠ j deviates from Rj then
play remains at Rj.) In our construction, the player who gets the lower profit at an asymmetric
point in the side-market is playing a best response. Player i does not deviate from Ri because doing so
results in a loss of profits equal to ε²/6 in the side-market as play switches from Ri to Rj forever,
a sufficient deterrent given that firms are patient. Player j does not deviate from Ri because that
is strictly worse: she is anyway playing her unique best response to i's output. Finally, note that
player i punishes 0 according to the prescription of σ+ because otherwise play transitions from M
to Rj in the side-market, resulting in lost profits worth ε²/36. Here lies the key difference between
the single- and multiple-opponent cases: Now player 0 cannot unilaterally give player 1 a (small)
reward for producing a low output while he takes a much larger share, for the other SR player 2 can
destroy these rewards. The ability of the SR players to inhibit rewards from LR and to punish each
other turns the tables on 0; the construction above translates this possibility into an equilibrium
where all players get low profits. We have thus answered the key question: Why does the patient
player not deviate and behave like some crazy type? As we reasoned, doing so would not guarantee
him anything at all, whereas the proposed equilibrium offers him 1/12. ∎
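The side-market payoffs at M, R1, and R2, and the comparison that makes punishing 0 worthwhile, can be checked numerically; a sketch with an illustrative ε = 1/10 (the helper names are mine):

```python
from fractions import Fraction as F

eps = F(1, 10)   # any small market size works; side-market profits scale with eps**2

def sm_profit(q_own, q_other):
    """Side-market profit with demand p = eps - total output and zero costs."""
    return max(eps - q_own - q_other, F(0)) * q_own

# Point M: each SR firm produces eps/4 and earns eps**2/8.
assert sm_profit(eps / 4, eps / 4) == eps**2 / 8

# Point R2 (after a deviation by firm 1): firm 2 produces eps/3, firm 1 eps/6.
r2 = (sm_profit(eps / 6, eps / 3), sm_profit(eps / 3, eps / 6))
assert r2 == (eps**2 / 12, eps**2 / 6)

# An SR firm that fails to punish 0 is sent from M to a point where it earns
# eps**2/12 < eps**2/8, so carrying out the punishment is incentive-compatible.
assert eps**2 / 12 < eps**2 / 8

# In the main market, the SR firms fill the gap left by 0's announced output q0,
# so total output is 1, the price is 0, and all main-market profits vanish.
q0 = F(1, 3)
assert max(1 - (q0 + (1 - q0)), F(0)) * q0 == 0
```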
Example 2 (Oligopoly with Capacity Constraints). This example is best read after the reader
has seen the formal model.
Consider an oligopoly with capacity constraints. At each point of time t = 1, 2, ... there is a
market of size 1 for a perishable good; to be specific, the demand function is

p(q) := 1 − q if q ∈ [0, 1];  p(q) := 0 if q > 1.

There are four firms that serve the market, 0, 1, 2, 3, with capacity constraints k*i = 0.5, 0.45, 0.45, 0.45
respectively, and zero marginal and fixed costs (adopted for simplicity). 0 is the reputation builder. First restrict
attention to the reputation game with n = 1. Player 1's minmax is 1/16. So in the c.i. repeated
game it is possible for 0 to get anything consistent with player 1 getting her minmax. In particular,
0 can get πm − 1/16 = 3/16. As I argued in example 1 above, player 0 can guarantee himself very close
to 3/16 in the i.i. game by mimicking a "crazy" type. The construction is the same as above; to keep
this example short I shall not mention the details of the type space. The assumption on discount
factors is also unchanged. Let us now compute the maxminmax of each firm i > 0, say firm 1.
This is the minmax value of i when 0 plays the action that is most favourable to i and all others try
to minmax her. Suppose that firm 0 produces an output of 0, while firms 2 and 3 produce 0.45 each.
The best firm 1 can achieve is obtained by solving:

max_q {1 − (0.9 + q)}q = 0.1q − q².

The first-order condition gives q* = 1/20, which generates a profit of (1/20) · (1/10 − 1/20) = 1/400 = Wi,
the maxminmax profit of i. This strictly exceeds the minmax value wi = 0 of any i > 0, obtained when
all other firms flood the entire market; in other words, 0's cooperation is needed to minmax any
i > 0. Now we need to check that the condition N introduced formally later holds; this requires us
to check that no matter what 0 does, the others can find an output vector giving each of them more
than Wi = 1/400. When 0's output is low, the other firms find it easier to attain any target level of
profit. So pick the worst slice, where 0 produces 0.5. We wish to find a symmetric point in this slice that gives each
player i > 0 strictly more than the maxminmax. Find the maximum symmetric point in the slice:

max_q (1/2 − 3q) · q  ⇒  1/2 = 6q*  ⇒  q* = 1/12.

The associated level of profit for each firm i > 0 is (1/2 − 1/4) · (1/12) = 1/48 > 1/400. Therefore the non-
emptiness assumption N that I formally introduce later on is satisfied. Now we have to calculate
the lowest profit that 0 could get in any slice subject to the others getting above 1/400. Even without
exact and painstaking calculations it can be shown that this profit is very close to 1/100; the argument
follows. Suppose the other firms almost flood the market so that the price is 1/50; the maximum 0
could be producing is 0.5, earning a profit of 1/100. Each of the three remaining firms can produce
at least 1/8, thereby making a profit of at least (1/8) · (1/50) = 1/400 = Wi. Thus all collusive outcomes of
the repeated game that give 0 more than some very low number (below 1/100) can be sustained even
in the reputational game. Recall that, in contrast, 0 can guarantee himself 3/16 ≫ 1/100 when only one
opponent is present, the presence of multiple opponents bringing about approximately a 20-fold
drop in the minimum assured profit of player 0. ∎
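The arithmetic of this example can be verified mechanically; a sketch in exact fractions (the grid search and `profit` helper are my own scaffolding):

```python
from fractions import Fraction as F

caps = {0: F(1, 2), 1: F(9, 20), 2: F(9, 20), 3: F(9, 20)}  # capacity constraints

def profit(q_own, q_rest):
    """Profit under p = 1 - total output with zero costs."""
    return max(1 - q_own - q_rest, F(0)) * q_own

# Maxminmax of firm 1: 0 produces 0, firms 2 and 3 flood at capacity (0.9 total);
# firm 1's best reply is q* = 1/20, earning W_i = 1/400.
grid = [F(k, 400) for k in range(int(caps[1] * 400) + 1)]
W = max(profit(q, F(9, 10)) for q in grid)
assert W == F(1, 400) and profit(F(1, 20), F(9, 10)) == W

# Condition N in the worst slice (0 at capacity 0.5): the symmetric point
# q = 1/12 gives each firm i > 0 a profit of 1/48 > 1/400.
assert profit(F(1, 12), F(1, 2) + 2 * F(1, 12)) == F(1, 48)

# Floor on 0's profit: at price 1/50, 0 at capacity 0.5 earns 1/100, while each
# rival producing at least 1/8 still clears W_i = 1/400.
assert F(1, 2) * F(1, 50) == F(1, 100)
assert F(1, 8) * F(1, 50) == F(1, 400)

# The drop from 3/16 (one opponent) to about 1/100 is roughly 20-fold.
assert 18 < F(3, 16) / F(1, 100) < 20
```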
3 The Model: Reputation Against Multiple Opponents
There are n + 1 players: 0, 1, 2, ..., n; 0 is the relatively patient, also referred to as long-run (LR),
player who attempts to build a reputation. We refer to players i > 0 as impatient⁵ or SR (for
short-run) players even when their discount factors are strictly greater than 0. Let us first describe
the temporal structure of the complete-information (c.i.) repeated game, which is perturbed to
obtain the "reputational" or incomplete-information (i.i.) game. At each time t = 1, 2, ... the players
play a simultaneous stage-game

G = ⟨N = {0, 1, ..., n}, (Ai)_{i=0}^n, (gi)_{i=0}^n⟩.

Ai is the finite set of pure actions of player i in each stage, and Δ(Ai) is the set of mixed actions αi of
player i. An action profile a of all players lies in A := ∏_{i=0}^n Ai; the action vector a+ of the players
i > 0 lies in A+ := ∏_{i>0} Ai. A comment on the use of subscripts is in order: Profiles are denoted
without a subscript; the ith element of a profile/vector is denoted by the same symbol with
the subscript i; and the subscript + denotes all players i > 0 collectively. The payoff function of agent
i ≥ 0 is gi, and g : A → R^{n+1}. For any E ⊆ R^n, the projection of E onto the plane formed by the
coordinates in J ⊆ {1, 2, ..., n} is denoted by E_J. For any E ⊆ R^k, define co E as the convex hull of
the set E. Let us recall the following definitions, which are standard in the literature: The minmax
value of any player i ≥ 0 in G is wi. The feasible set of payoffs in G is F := co{g(a) : a ∈ A},
using an RPD⁶ (Random Public Device) available to the agents; ω(t) is the observed value of the RPD in
period t, and ω^t denotes the vector of all realised values from period 1 to period t. The strictly
individually rational set is F* := {v ∈ F : vi > wi ∀i ≥ 0}. Monitoring is perfect.

5. The impatience is only relative.
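For concreteness, the standard objects recalled here can be written out explicitly; a sketch in the usual notation (the independent-mixing convention in the minmax is an assumption on my part, since the text does not spell it out):

```latex
% Minmax value of player i, with the other players mixing independently:
w_i := \min_{\alpha_{-i} \in \prod_{j \neq i} \Delta(A_j)} \; \max_{a_i \in A_i} \, g_i(a_i, \alpha_{-i}).

% Feasible set (with the RPD) and the strictly individually rational set:
F := \mathrm{co}\,\{ g(a) : a \in A \}, \qquad
F^{*} := \{ v \in F : v_i > w_i \ \ \forall i \geq 0 \}.
```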
Let us now add incomplete information to the repeated game, converting it into the reputational
game that we study in this paper. The players 1, 2, ..., n all maximise the discounted sum of per-
period payoffs; to cut down on notation and to facilitate comparison with the literature on repeated
games, I take a common discount factor: δi = δ ∀i > 0.⁷ However, the patient player could be
one of many types θ ∈ Θ. The prior on Θ is given by μ ∈ Δ(Θ). The type-space Θ contains θ*,
the normal type of the repeated game, who maximises the sum of per-period payoffs discounted by
δ0, the discount factor of agent 0. The other types are also expected-utility maximisers, but their
utility functions may not be representable as discounted sums of stage-game payoffs; consider,
for example, the "strong monopolist" in the chain-store game of KW, who always prefers to fight entry.
The normal type θ* might want to mimic these "crazy" types θ ≠ θ* in order to secure a higher
payoff. As Fudenberg and Levine show, with δ0 high enough the normal type of player 0 can get
away with the deception: If he is patient enough, mimicking a suitable type guarantees him very
high payoffs. It is clear that the type space Θ must include suitable types for the normal type to
mimic, a feature captured in reputation papers by means of some form of full-support assumption.
What types constitute a rich type-space? In order to investigate the maximal impact of reputation
I allow a rich perturbation in which the crazy types may use strategies with infinite memory, as in
ET.⁸ All strategies of bounded recall are subsumed in this set.

6. Given any payoff v in the convex hull of pure-action payoffs, Sorin constructed a pure strategy without public randomisation, alternating over the extreme points so as to achieve exactly v when players are patient enough. This does not immediately allow us to dispense with the RPD, because the construction of Sorin need not satisfy individual rationality: after some histories the continuation payoff could be below the IR level. Fudenberg and Maskin extended his arguments and showed that this could be done so that after any history all continuation payoffs lie arbitrarily close to v; if v is strictly individually rational, so are the continuation values lying close enough. Taken together, these papers showed that an RPD is without loss of generality when players are patient. I continue to make this assumption in the interests of expositional clarity; a section extending the results of Sorin and FM91 to this game is planned.

7. Lehrer and Pauzner (Econometrica, 1999) look at repeated games with differential discount factors, and find that the possibility of temporal trade expands the feasible set beyond the feasible set of the stage-game. This creates no problems for my results, because the stage-game feasible set continues to remain feasible even if all δi are not equal.

8. With two players, i.e. one opponent, it is enough (see Aoyagi and CFLP) to consider all possible types with the following bounded-recall strategies: each crazy type with bounded recall ℓ ∈ N plays in each period an action that depends on the actions played by the other player in the past ℓ periods.

To restate this more formally, each
12
type $ '= $% is identified with or defined by the following sequence
$1($) # A0,$t+1($) : At+ ! A0
, which gives an initial (t = 1) action and for each t > 1 maps any history of actions9 played by
players 1, 2...n into an action of player 0. I omit the time subscript when it is unlikely to cause
confusion. In what follows we fix (G, #, µ), where G is the stage-game, # = {$%, $1, ...,$K} is an
arbitrary finite set of types, and µ is the prior on #.
The dynamic game starts with an announcement phase at date 0: the LR player sends a mes-
sage m ∈ Θ, announcing his type. Then the usual repeated game with perfect monitoring is played
out over periods t = 1, 2, ..., ∞. Adding an announcement makes this a (still rather complicated)
signalling game. This assumption is also used by Abreu and Pearce (2002, 2007); that it permits
considerable expositional clarity will be seen from the complexity of the strategies and the analysis
when we later extend the main result without incorporating an announcement stage at time 0. As
in any other signalling game, the player is free to announce a type m ≠ θ, i.e. to misreport his true type.
In this framework an equilibrium comprises the following elements, defined recursively over the time
index:
(i) a messaging strategy m : Θ → Θ for player 0, mapping his true type into the announced type;
(ii) for each type θ of 0, a period-1 map σ_θ(1) : Θ × Ω_1 → A_0 from the announcement space to the
period-1 action, and for each t > 1 a map σ_θ(t) : Θ × A^{t-1} × Ω_t → A_0;¹⁰
(iii) for each i > 0 a map σ_i(1) : Θ × Ω_1 → A_i, and for each t > 1 maps σ_i(t) : Θ × A^{t-1} × Ω_t → A_i;
(iv) beliefs μ(·|h^{t-1}) ∈ Δ(Θ), obtained by updating via Bayes' rule wherever possible from the
(t−1)-period history h^{t-1}.
Here Ω_t denotes the set of realisations of the public randomisation device up to period t.
Additionally we stipulate that σ_θ(t) is identically the strategy given by θ whenever θ ≠ θ*. This
in effect is the definition of a "crazy" type. Let σ_θ := {σ_θ(t)}_{t≥1}, and let the collection of maps
{σ_θ}_{θ∈Θ} be denoted σ_0. The strategy σ_i of player i is the collection of maps {σ_i(t)}_{t≥1}. In what
follows u_i(·) refers to the discounted value of a strategy profile to player i, possibly contingent on
a certain history; thus u_i(σ | m, h^{t-1}) is the sum of the per-period payoffs of player i discounted
to the beginning of period t when players play in accordance with σ after player 0 announces a
type m and the history of actions is h^{t-1}. Even when m ≠ θ* we refer to the payoff of the normal
type of player 0 whenever we use u_0(·); since the "crazy" types do not have discounted payoffs,
we sometimes refer to this as the "dummy payoff to player 0" to be more accurate. We state the
following familiar definition.

⁹ Potentially player 0 could condition his play on the RPD up to period t + 1, i.e. on ω^{t+1}, in addition to a^t_{-0}. This would not change our results or proofs, though the notation would be more involved.
¹⁰ Time over which actions are taken is numbered from 1 onwards, since period 0 is just an announcement phase.
Definition 1: A tuple (m, σ*_0, σ*_1, ..., σ*_n, {μ(·|h^{t-1})}_{t>1}) defines a Perfect Bayesian Equilibrium
(PBE) if no player has a strictly profitable unilateral deviation: for player 0 we have
u_0(σ*|m(θ*)) ≥ u_0(σ*|θ) ∀ θ ∈ Θ, and for any i ≥ 0, any message m, and any (t−1)-period history h^{t-1} we also
have u_i(σ*|m, h^{t-1}) ≥ u_i(σ_i, σ*_{-i}|m, h^{t-1}) ∀ σ_i ∈ Σ_i. Furthermore, beliefs are updated using Bayes'
rule wherever possible, as in any PBE.
A subclass of equilibria are those in which the normal type of player 0 tells the truth:
Definition 2: A truthtelling equilibrium is a triple (m, σ, {μ(·|h^{t-1})}_{t>1}) such that m(θ) = θ ∀ θ ∈
Θ, and (m, σ) is a PBE.¹¹
With this notation in place, let us state the result with a single long-lived opponent and perfect
monitoring, due to Evans and Thomas (ET). Player 0 is the reputation builder, while 1 is the normal-
type opponent (n = 1). ET make the following simplifying assumption:
Assumption PAM (Pure Action Minmax): 0 can minmax 1 by playing a pure action.
As they point out, this assumption, while restrictive, is adopted for technical simplicity; otherwise
mixed strategies need to be learnt, as in FL(92). Suppose we wish to approximate the best payoff g_0**
for 0 to within a margin of error ε. Find a sequence of action profiles (a_0**(t), a_1**(t))_{t=1,...,T}
such that the average of 0's payoffs over this block of T action pairs, i.e. (1/T) Σ_{t=1}^T g_0(a_0**(t), a_1**(t)),
is within ε/3 of g_0**, while (1/T) Σ_{t=1}^T g_1(a_0**(t), a_1**(t)) is greater than 1's minmax value. Let θ̂
be the type that plays as follows: (a) 0 starts by playing a block of T actions (a_0**(t))_{t=1}^T; (b) if
player 1 responds with the string (a_1**(t))_{t=1}^T, he repeats this block; (c) when player 1 deviates
for the k-th time from the role prescribed above, player 0 minmaxes her for k periods using
the pure-strategy minmax from PAM above; (d) 0 returns to step (a) irrespective of the actions
player 1 took during the punishment. The key feature is that θ̂ metes out harsher punish-
ments as 1 continues to deviate. Before we state their result, let us define V(δ_0, δ_1) ⊆ R² as the
set of Bayes–Nash equilibrium payoffs for discount factors δ_0 and δ_1 respectively. The associated
payoff set for player 0 alone is given by the projection V_0(δ_0, δ_1) ⊆ R of this set onto dimension
0. It might be useful to remind the reader of the following notation, which we introduced earlier:
g_0** is the maximum payoff of LR consistent with player 1 getting at least her minmax; i.e.
g_0** := max{v_0 : (v_0, v_1) ∈ F}, where F is the feasible, individually rational set of the 2-player c.i. game.

¹¹ In other words, players 1, 2, ..., n and type θ* of player 0 do not want to deviate. For θ* we need to ensure both initial truthtelling and subsequent compliance.
Proposition 0 (Evans and Thomas, 1997, Econometrica): Suppose PAM holds and μ(θ̂) >
0, i.e. the prior μ places positive weight on θ̂. Given ε > 0 there exists a δ_1^min < 1 such that for
any δ_1 > δ_1^min, we have lim inf_{δ_0→1} inf V_0(δ_0, δ_1) > g_0** − ε.
Proof: See Evans and Thomas for details; a sketch follows. Fix ε > 0. This is the margin of error
we shall allow in approximating the best payoff g_0**.
Step 1: We have seen that the block of action profiles (a_0**(t), a_1**(t))_{t=1,...,T} has the property
that (1/T) Σ_{t=1}^T g_0(a_0**(t), a_1**(t)) is within ε/3 of g_0**, while (1/T) Σ_{t=1}^T g_1(a_0**(t), a_1**(t)) is greater
than 1's minmax value.
Step 2: Consider the type θ̂ defined above. If μ(θ̂) > 0, one available strategy for the normal type
of 0 is to declare and mimic θ̂; the payoff from doing this is a lower bound on his equilibrium payoff.
Step 3: Apply Lemma 1 of ET (the Finite Surprises Property of FL(89)) to show that if player 0
follows the above strategy, at most a certain finite number of punishment phases can be triggered
before player 1 believes with high probability that she is facing the type θ̂. Also note that, since
punishments get progressively tougher, staying off the prescribed path becomes almost as bad as being minmaxed
forever, whereas for a patient player 1 the discounted per-period payoff from (a_0**(t), a_1**(t))_{t=1,...,T}
exceeds her minmax value. Together these two observations imply that in any BN equilibrium a patient
player 1 must eventually (after triggering enough punishments) find it worthwhile to experiment
with the actions (a_1**(t))_{t=1,...,T}. Once she does so, by construction 0 gets a payoff which is within
ε/3 of (1/T) Σ_{t=1}^T g_0(a_0**(t), a_1**(t)), and therefore within 2ε/3 of g_0**. Finally we re-use the fact that
0 is very patient: if 0 is relatively patient, losses sustained while mimicking θ̂ cannot cost him
more than another ε/3 in terms of payoffs. Thus mimicking the type θ̂ assures player 0 payoffs
within ε of g_0**. ∎
The upshot is that the normal type of 0 can secure very high payoffs in the i.i.
game even when his opponent is also patient, as long as 0 is relatively patient and has the option
to mimic types that punish successive deviations with increasing harshness.
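The commitment type described in (a)–(d) above is a simple automaton: it tracks only the position within the block, the running deviation count k, and the remaining punishment periods. The sketch below is a hypothetical rendering with invented action labels and a two-period block; it is not ET's notation, only an illustration of the escalating-punishment scheme.

```python
class ETCommitmentType:
    """Sketch of the punishment scheme in (a)-(d): cycle a T-period action block;
    on player 1's k-th deviation from her prescribed string, minmax her for k
    periods; afterwards restart the block regardless of her play (step (d))."""

    def __init__(self, block0, block1, minmax_action):
        self.block0 = block0                # (a0**(1), ..., a0**(T)) for player 0
        self.block1 = block1                # prescribed replies (a1**(1), ..., a1**(T))
        self.minmax_action = minmax_action  # pure-action minmax from PAM
        self.pos, self.k, self.punish_left = 0, 0, 0

    def act(self):
        """Player 0's action this period."""
        return self.minmax_action if self.punish_left > 0 else self.block0[self.pos]

    def observe(self, a1):
        """Update the automaton after observing player 1's action."""
        if self.punish_left > 0:            # (d): ignore play during punishment
            self.punish_left -= 1
            if self.punish_left == 0:
                self.pos = 0                # return to the start of the block
        elif a1 == self.block1[self.pos]:   # (b): conformity, continue the cycle
            self.pos = (self.pos + 1) % len(self.block0)
        else:                               # (c): k-th deviation, punish k periods
            self.k += 1
            self.punish_left = self.k

theta = ETCommitmentType(block0=['C', 'D'], block1=['c', 'd'], minmax_action='M')
played = []
for reply in ['c', 'x', 'z', 'x', 'z', 'z', 'c']:
    played.append(theta.act())
    theta.observe(reply)
# punishments lengthen: one minmax period after the first deviation, two after the second
```

Running the loop above against a conforming, then twice-deviating opponent produces the escalation pattern block, one punishment period, block, two punishment periods, block.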
4 An Upper Bound
We start by introducing some notation that will prove useful later. For any choice of an action
a_0 ∈ A_0 by player 0, "slice-a_0" refers to the induced game among the n normal-type players; actions
in A_0 are in one-to-one correspondence with slices.
Definition 3: The slice-a_0 is formally the game G(a_0) induced from G by replacing A_0 with {a_0}
and restricting the domain of each g_i to {a_0} × A_{-0}, i.e.
G(a_0) := ⟨{0, 1, ..., n}; {a_0}, (A_i)_{i>0}; (ĝ_i)_{i≥0}⟩,
where each ĝ_i coincides with g_i on its domain of definition.
For each slice a_0, define the slice-feasible set of payoffs as
F(a_0) := co{g(a_0, a_{-0}) : a_{-0} ∈ A_{-0}} ⊆ R^{n+1}.
Notice that the set above lies in the payoff space of all n + 1 players, although no more than n players
have non-trivial moves in any slice.
Definition 4: The conditional minmax of i > 0 in the slice-a_0 is the minmax of i
conditional on the slice G(a_0); call it w_i(a_0).
This is defined exactly as in the usual theory once we replace the game among n + 1 players by a
slice played by the n SR players. Just as the minmax w_i of player i is delivered by the action profile
m^i, we define the conditionally minmaxing punishment of player i > 0 in slice-a_0 as m^i(a_0) ∈ A_{-0}
such that
g_i(a_0, m^i_{-i}(a_0), m^i_i(a_0)) = max_{a_i} g_i(a_0, m^i_{-i}(a_0), a_i) = w_i(a_0).
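To fix ideas, the conditional minmax of Definition 4 (and the maxminmax W_i defined below) can be computed mechanically for a small finite game. The sketch below uses an invented three-player payoff array (player 0 plus opponents 1 and 2, two actions each) and restricts attention to pure actions, so it computes the pure-action conditional minmax only.

```python
from itertools import product

# Hypothetical example: g[(a0, a1, a2)] = (payoff of 0, of 1, of 2); invented numbers.
A0, A1, A2 = range(2), range(2), range(2)
g = {(a0, a1, a2): (a0 + a1 - a2, 2 * a1 - a0 * a2, a2 + a0 * a1)
     for a0, a1, a2 in product(A0, A1, A2)}

def cond_minmax(i, a0):
    """Pure-action conditional minmax w_i(a0): within the slice a0, the other
    SR player chooses her action to minimise i's best-response payoff."""
    if i == 1:
        return min(max(g[(a0, a1, a2)][1] for a1 in A1) for a2 in A2)
    else:  # i == 2
        return min(max(g[(a0, a1, a2)][2] for a2 in A2) for a1 in A1)

def maxminmax(i):
    """W_i := max over a0 of w_i(a0): player 0 picks the slice most favourable to i."""
    return max(cond_minmax(i, a0) for a0 in A0)
```

In this example w_1(0) = 2 exceeds w_1(1) = 1: player 0's action changes how severely the remaining player can punish player 1, which is exactly the wedge the maxminmax captures.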
Now we come to the main result: Proposition 1 below establishes an upper bound on the minimum
payoff of player 0 across all equilibria. Note that this is not an upper bound on the payoffs of
player 0 in the i.i. game: the best possible payoff that LR can obtain in the repeated game is
g_0** := max{v_0 : (v_0, v_{-0}) ∈ F* for some v_{-0} ∈ R^n}; therefore it is an equilibrium payoff in any i.i.
game as well, since 0 will play along if this desirable equilibrium is proposed, there being nothing
better he could possibly obtain. So the upper bound on his payoffs in both the c.i. and the i.i.
games is g_0**, a quantity different from the bound l defined below: l is an upper bound on
the minimum equilibrium payoff of player 0 in BN equilibria. Putting this differently, if we pick a
v_0 ∈ (l, g_0**) there exists an equilibrium in the i.i. game that gives 0 a payoff of v_0. In what follows
the term "upper bound" should be understood in the sense above; the lax usage, I hope, will avoid
a tongue-twister without compromising clarity. I first define a new term, the maxminmax value,
which is useful when we state and prove the bound. Recall Definition 4: the conditional minmax
w_i(a_0) of i > 0 in the slice a_0 is the minmax of i in slice G(a_0).
Definition 5: The maxminmax of a player i > 0 with respect to 0 is defined as W_i :=
max_{a_0 ∈ A_0} w_i(a_0), the maximum among all conditional minmaxes.
It is thus defined like the usual minmax but under the additional assumption that when the others
(j ≠ i and j > 0) try to minmax i > 0, player 0 takes the action that is best for i. In general it is
strictly greater than the minmax value w_i, inasmuch as the others require the active cooperation of
player 0 to punish i most severely.¹² Truncate the set F(a_0) below W_i
to get
F̃(a_0) := {v ∈ F(a_0) : v_i ≥ W_i ∀ i > 0}.
Both sets above lie in (n+1)-dimensional Euclidean space, F(a_0), F̃(a_0) ⊆ R^{n+1}; the projection of F̃(a_0)
onto the 0th coordinate is denoted by the subscript 0, i.e. F̃_0(a_0) ⊆ R.
First, observe that the worst payoff for player 0 in the slice-a_0, subject to every player i > 0 getting at least
her maxminmax, is
w_0(a_0) := inf F̃_0(a_0) ≡ min{v_0 : (v_0, v_{-0}) ∈ F̃(a_0)}.
Now consider the maximum of these infima over all slices:
l := max_{a_0} w_0(a_0) ≡ max_{a_0} inf F̃_0(a_0).
This is the maximum among the worst payoffs in each slice for player 0, subject to all others getting
their maxminmax. Denote by B(W, r) the ball of radius r about the vector W := (W_1, ..., W_n). We
now introduce a non-emptiness assumption:¹³
Assumption N: ∩_{a_0 ∈ A_0} F_{-0}(a_0) ∩ B(W, r) ≠ ∅ for all r > 0,
where F_{-0}(a_0) denotes the projection of F(a_0) onto the coordinates of players 1, ..., n.
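To illustrate w_0(a_0) and l numerically, the sketch below uses an invented payoff array and approximates inf F̃_0(a_0) by searching over the pure action profiles of each slice only; the true definition takes the infimum over the convex hull F(a_0), so this is a coarse, self-contained illustration rather than the exact quantity.

```python
from itertools import product

# Invented payoff array: g[(a0, a1, a2)] = (payoff of 0, of 1, of 2).
A0, A1, A2 = range(2), range(2), range(2)
g = {(a0, a1, a2): (a0 + a1 - a2, 2 * a1 - a0 * a2, a2 + a0 * a1)
     for a0, a1, a2 in product(A0, A1, A2)}
W = {1: 2, 2: 1}  # maxminmax values of this example game (as in Definition 5)

def w0(a0):
    """Coarse version of w_0(a0) = inf of F~_0(a0): the lowest payoff of player 0
    among pure slice-a0 profiles giving every i > 0 at least her maxminmax W_i."""
    vals = [g[(a0, a1, a2)][0] for a1, a2 in product(A1, A2)
            if g[(a0, a1, a2)][1] >= W[1] and g[(a0, a1, a2)][2] >= W[2]]
    return min(vals) if vals else None  # None: constraint set empty in this slice

l = max(v for v in (w0(a0) for a0 in A0) if v is not None)
```

The bound l is a max over slices of a min within each slice, which is why it can be low without being the minmax of player 0 itself.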
N (Non-emptiness) says that in every slice it is possible to get close to the maxminmax
vector W for players i > 0. Returning to a model stated earlier, an undifferentiated-good monopoly
without capacity constraints does not satisfy N because the LR player can unilaterally minmax all
others. But an oligopoly with differentiated goods would, in general, satisfy it. N is both
intuitively plausible in a large class of games and easy to state; furthermore, under N there exists
an upper bound l on what reputation can guarantee player 0 across Bayes–Nash equilibria and even
PBE of the game. We shall have l < g_0**.
Let V^{ci}(δ) be the set of equilibrium¹⁴ payoffs of the c.i. repeated game defined by G. The

¹² w_i := min_{α_0} w_i(α_0) = min_{α_{-i}} max_{a_i} u_i(α_{-i}, a_i).
¹³ This stronger version (N) may be further weakened to requiring the intersection ∩_{a_0 ∈ A_0} F_{-0}(a_0) to have a non-empty interior. An analogous characterisation would work once we refine some of the quantities, but I adopt the slightly stronger N because it contributes to expositional clarity and is satisfied in the illustrative examples.
¹⁴ What concept of equilibrium do we use? When we use NE for the c.i. game, the corresponding i.i. game uses BNE; and when we use SPNE for the c.i. game, the right notion of equilibrium for the i.i. game is PBE. The context will make it clear which of the two we adopt.
limiting payoff set at the common discount factor δ is V^{ci} := lim_{δ→1} V^{ci}(δ). Similarly, V(δ_0, δ)
is the equilibrium payoff set of the i.i. game when 0's discount factor is δ_0 and the others have a
common discount factor δ. Let us be precise about how the limiting payoff set is defined for the
reputational game:
V := lim_{δ→1} lim_{δ_0→1} V(δ_0, δ).
As in Schmidt, Aoyagi, and CFLP, the order of limits in defining V is important: player 0 is
always patient relative to the others, regardless of how patient they are.
Before proceeding further we introduce a full-dimensionality assumption, which first appeared in
FM:
Assumption FD (Full Dimensionality): The set F has dimension n + 1.
We now state and prove a lemma that will be useful in proving the bound in Proposition 1 below.
In every slice we find an action profile α(a_0) for the SR players such that each i > 0 gets more than
her W_i and 0 gets less than, or very close to, w_0(a_0) when (a_0, α(a_0)) is played.
Lemma 1 (Threat Points Lemma): Fix ε > 0. For each slice a_0 there exists a probabil-
ity distribution over action profiles α(a_0) ∈ ΔA_{-0} such that g_i(a_0, α(a_0)) > W_i ∀ i > 0 and
g_0(a_0, α(a_0)) < w_0(a_0) + ε ≤ l + ε. (NB: ε is the same for all slices.)
Proof: See Appendix.
Proposition 1 below constructs an equilibrium of the i.i. game in which θ* can be held down
to payoffs arbitrarily close to l. Before formally stating the result, recall that V
is the limit of the PBE payoffs of the i.i. game as δ_i → 1 and (1−δ_0)/(1−δ_i) → 0 for all i > 0, and that v_0^min
denotes the infimum of 0's payoffs in V. The proof below assumes that mixing is ex post observable,
this assumption being required to punish the SR players. Equivalently, one could assume that each
i > 0 can be minmaxed in pure strategies. This is a technically convenient assumption and does not
affect the crux of the argument. In a later section I show how the argument of FM86 may be used
to dispose of the assumption that mixing by players i > 0 is observable even ex post. Suppose we
want to give player 0 a payoff of v_0 = l + ε, for any given ε > 0 small enough. The proof hinges on
our ability to force the normal and relatively very patient player 0 to reveal himself at the signalling
stage, for if he does so it is immediate that we can merely play out the repeated-game equilibrium
σ^{ci} and give him u_0(σ^{ci}) = v_0. I construct a PBE that gives 0 less than v_0 if he is θ* but announces
m ≠ θ*. The next figure should clarify the preceding lemma and the proof of the first proposition.
Proposition 1: Under assumptions N and FD and (ex-post) observable mixed strategies, v_0^min ≤ l.
Proof: Fix a small ε > 0. We assert that the following strategies are part of a PBE giving θ* a
payoff of v_0 := l + ε. Let σ^{ci} denote the equilibrium (SPNE) strategy profile of the c.i. repeated
game that gives θ* a payoff of v_0. If m(θ*) = θ*, then there is no problem in giving 0 a payoff of
v_0. Recall that the "commitment" strategy θ_t(m) maps A^{t-1}_{-0} into slices. Lemma 1 (TPL) above
asserts that in each slice we can construct a "threat point" α(·) giving 0 no more than l + ε and
each player i > 0 at least γ_i, which exceeds W_i. Given any announcement m ≠ θ* by player 0, we
shall specify the strategy σ_{-0}(·|m) of the players i > 0 to play the action α(θ_1(m)) in period 1 and
α(θ_t(m)(h^{t-1}_{-0})) in all periods t > 1 until a deviation. Notice that 0 cannot profit by behaving like
some θ ≠ θ* if all players i > 0 play in accordance with the strategy σ_{-0}.
Pick φ ∈ B(W, r) such that φ ∈ F_{-0}(a_0) ∀ a_0 and r is small enough¹⁵ that φ_i < γ_i. (This step
is possible since N holds.) Fix η > 0 such that γ_i > φ_i + η > φ_i > W_i. For each j > 0 define the
n-dimensional vector φ(j) by φ_j(j) = φ_j and φ_i(j) = φ_i + η < γ_i ∀ i ≠ j. The quantity η is a small
reward given to each SR player who carries out a punishment against another player.
If the announced type in stage 0 is θ*, then play the repeated-game strategies σ^{ci} that deliver
the payoff l + ε to player 0. If m = θ ≠ θ*, start Phase I, where the prescribed play at t conditional on
the (t−1)-period history h^{t-1} is α(θ_t(m)(h^{t-1}_{-0})), the mixed action profile that gives player 0 very
close to the lowest possible payoff in the slice that the announced crazy type would play. Fix a slice a_0.
Denote by w^i_j(a_0) := g_j(a_0, m^i(a_0)) the payoff to player j ≠ i when i is being conditionally minmaxed
in the slice a_0. Suppose player 0 has never deviated from his announced crazy strategy in the past
and that player i > 0 deviates from the prescribed path at time τ; then play enters Phase II(i),
where player i is conditionally minmaxed in slice θ_t(m)(h^{t-1}_{-0}) at times t = τ + 1, ..., τ + P, where
the length P of the punishment satisfies the following inequality:
P·W_i + max_a g_i(a) < P·φ_i + min_a g_i(a)  ∀ i > 0.   (⋆)

¹⁵ Any r < min_{i=1,...,n} {γ_i − W_i} suffices.
This condition is always satisfied for some large enough integer P, since W_i < φ_i. Once Phase II(i)
is over, play moves into Phase III(i), where play is such that φ(i) is delivered at t = τ + P +
1, τ + P + 2, ..., ∞; recall that the i-th component of the vector φ(i) is φ_i, and all other components
j > 0 are φ_j + η. So φ(i) incorporates a small reward of η for each SR player j other than the last
player (i) to be punished. If player j unilaterally deviates from Phase II(i) or Phase III(i), then
impose Phase II(j) followed by Phase III(j), and so on.
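The punishment length demanded by (⋆) can be found by direct search for the smallest qualifying integer. The sketch below uses invented scalar payoffs, not quantities from any particular game in the paper.

```python
def min_punishment_length(W_i, phi_i, g_max, g_min):
    """Smallest integer P with P*W_i + g_max < P*phi_i + g_min, i.e. inequality (*)
    for one player; such a P exists whenever phi_i > W_i."""
    assert phi_i > W_i, "(*) is solvable only when phi_i exceeds W_i"
    P = 1
    while P * W_i + g_max >= P * phi_i + g_min:
        P += 1
    return P

# Invented numbers: maxminmax 1, reward point 1.5, stage payoffs ranging over [0, 4].
P = min_punishment_length(W_i=1.0, phi_i=1.5, g_max=4.0, g_min=0.0)
```

The closed form is P > (g_max − g_min)/(φ_i − W_i), so the punishment must be long in games with large stage-payoff swings or a small wedge φ_i − W_i.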
Conditional on an announcement m = θ ≠ θ*, players believe that they are indeed facing type θ
until 0 deviates from θ(m) and reveals himself to be the normal type. When 0 deviates he may be
minmaxed as in the repeated game. If some i > 0 deviates after 0 has deviated in the past, then
we are in the c.i. game: minmaxing i for P periods (see (⋆) above) provides a sufficient deterrent.
Let us see more precisely why the above is an equilibrium that is sequentially rational. The
argument runs similarly to FM(86). Suppose m = θ ≠ θ* has been announced, and check that every
player i > 0's strategy is unimprovable, holding the strategies of the others fixed. Suppose that
he deviates when 0 has never deviated before. Then, if he is patient enough, his payoff is close to
φ_i < γ_i, the payoff if he does not deviate.
Step 1: i > 0 does not deviate from Phase II(j) or Phase III(j) because he would end up with φ_i rather
than φ_i + η.
Step 2: i > 0 does not deviate from Phase II(i) because he is in any case playing his best response;
anything else gives a lower payoff in the current period and also prolongs the punishment. Devi-
ating from Phase III(i) is also not profitable, because inequality (⋆) ensures that restarting the
punishment is costly for i.
Step 3: We now argue that θ* does not deviate. The normal type announces θ* truthfully because
his payoff is l + ε when he announces truthfully and sticks to the equilibrium, whereas it is less than
l + ε if he either announces anything else and faithfully mimics that type, or announces θ ≠ θ*
and does not play like the announced type. The moment player 0 deviates from his announced
strategy he reveals himself as the normal type; he is then minmaxed long enough to wipe out all
one-period gains from cheating, and then we play out a repeated-game strategy that gives player 0
close to his minmax. Players i > 0 minmax him as they would in the repeated-game equilibrium,
from which there is no profitable deviation; players i > 0 do not deviate from this part of the
proposed equilibrium by the same argument as in a pure repeated game. ∎
To summarise: if 0 declares a type other than θ*, the construction of the equilibrium keeps track
not only of what punishment, if any, is ongoing but also of what the next action of player 0 is. If the
normal type is declared, or revealed later through an off-path deviation, then play reverts to the
usual c.i. repeated-game strategies after all gains from cheating have been wiped out. The key to
the construction is to instruct all i > 0 to play an action profile (one that resolves into a pure action
contingent on the RPD) that gives the normal type of player 0, i.e. type θ*, the lowest payoff
in the slice that he is called upon to play as part of his declared type. Proposition 1 shows that,
even while respecting sequential rationality, we can impose upper bounds on what reputation can
achieve, delivering a payoff of no more than l + ε to the LR player while giving the others at least
their maxminmax values W_i. Proposition 2 below shows that if (v_0, v_{-0}) ∈ R^{n+1} is an equilibrium
payoff vector in the limiting set V^{ci} under c.i. and v_0 > l, then for any given pair (Θ, μ) the limiting
equilibrium payoff set V of the reputational game contains an (n+1)-dimensional vector that gives
player 0 the value v_0. Now we can restate the content of Proposition 1 in alternative notation as
[v_0 ∈ V_0^{ci} and v_0 > l] ⇒ v_0 ∈ V_0.
The next proposition extends this result to include all payoff vectors in which 0 gets strictly more than l and
all others get more than their minmax w_i rather than the maxminmax W_i; it is thus a quasi-
folk-theorem result.
Proposition 2: Under N, any payoff v ∈ F* that gives the LR player strictly more than l can be
sustained; i.e.
[(v_0, v_{-0}) ∈ V^{ci} and v_0 > l] ⇒ (v_0, v_{-0}) ∈ lim_{μ(θ*)→1} V.
Proof: Fix a probability μ(θ*) of the normal type. Consider an equilibrium σ^{ci} of the c.i. repeated
game in which the LR player gets u_0(σ^{ci}) > l. Using σ^{ci} we construct a truthtelling equilibrium σ
of the reputational game in which player 0 gets u_0(σ^{ci}) conditional on any draw by nature. If the
announced type is the normal type, then play the equilibrium of the repeated game with the usual checks
and balances. If m ≠ θ*, then play the equilibrium outlined in Proposition 1, giving the normal
type less than {l + u_0(σ^{ci})}/2. (Take the ε of Proposition 1 to be any positive number below {u_0(σ^{ci}) − l}/2.)
Our proposition now follows when we note that in this equilibrium, independently of μ, the other
players get v_{-0} if nature chooses the normal type of player 0; as this prior probability goes to 1,
their payoff vector tends to v_{-0}. ∎
5 Some Extensions
Unobservable Mixed Strategies
As mentioned earlier, a crucial assumption for the above proof is that the probabilities of mixing by
any player are ex post perfectly observable. While this might be a reasonable assumption in some
circumstances, one can equally easily come up with settings where it is less natural. This creates a
problem during the punishment phases of players i > 0. Note that punishing player 0 for declaring
a non-normal type does not make use of the observability of mixed strategies, but punishing any
other player does: if a player j ∉ {i, 0} is not indifferent between the myopic payoffs of all the actions
in the support of m^i_j, then j will not mix with the desired probabilities in Phase II(i); deviations
to actions outside the support of m^i_j are readily detected and deterred in the usual way. While the
minmax in the class of pure-strategy punishments is immune to this complication, it is in general
less severe than the mixed-action minmax and reduces the payoffs that may be supported. As
noted by FM(86), the trick is to make each player j > 0, j ≠ i, indifferent between all actions in the
support of m^i_j by adjusting the continuation values at the end of the minmax phase, depending on
the realised sequence of actions of player j during the minmax phase.
Proposition 3: Fix ε > 0. Under N and FD and unobservable mixing by i > 0, there exists an equilibrium
payoff of the incomplete-information game in which player 0 gets l + ε and every i > 0 gets at least
W_i.
Proof:¹⁶ Fix ε > 0. Set v_0 := l + ε as the target payoff. We shall construct strategies that are part of
a sequentially rational equilibrium. Pick φ as before and η > 0 such that W < φ_{-0} < φ_{-0} + η⃗ < γ and φ_0 < v_0,¹⁷
where η⃗ denotes the vector each of whose elements is η and γ = (γ_1, ..., γ_n). (This is the same

¹⁶ To prevent cluttering of notation we do not make explicit the presence of the RPD in our proofs.
¹⁷ Recall our convention for vector inequalities: x < y ⇔ x_i < y_i ∀ i; x ⪇ y ⇔ x_i ≤ y_i ∀ i and x ≠ y; x ≤ y ⇔ x_i ≤ y_i ∀ i.
construction as in Proposition 1.) If 0 declares himself the normal type (i.e. m = θ*), play the c.i.
repeated-game equilibrium σ^{ci}, giving l + ε to 0 and at least W_i to all others. Otherwise start
play in Phase I. In describing the phases below we repeatedly use terms of the form z^i_j,
the sum of realised utilities of player j during Phase II(i), discounted by the factor (1−δ)/(1−δ^P). After
player i has been minmaxed or conditionally minmaxed, play transitions to a point adjusted by the
quantities z^i_j. Also note that for high δ the magnitude of z^i_j cannot exceed η/3; this ensures that
none of the continuation values below goes outside the feasible set.
Phase I: Play the action n-tuple giving a payoff of l + ε to 0, this action profile depending on the
announced type and the history; i.e. in period t play the action n-tuple α(θ(h^{t-1}_{-0})), where h^{t-1}_{-0} is
the history of play by players 1, 2, ..., n up to and including period t − 1 and α(a_0) ∈ ΔA_{-0}
gives 0 no more than l + ε in slice a_0. If player 0 deviates from his announced strategy during any
phase, go to Phase II(0). If player i > 0 deviates unilaterally from any phase and Phase II(0)
has never been triggered, then switch to Phase II(i); if player 0 has deviated from his announced
strategy in the past, then switch to Phase III(i).
Phase II(0): Minmax 0 for P periods, where P is defined as before by (⋆). Then go to Phase
V(0, z^0_1, z^0_2, ..., z^0_n). Note that play transitions to a phase that is indexed by the actual payoffs
earned by each of the normal-type players during the minmaxing phase.
Phase II(i): Conditionally minmax i for P periods. Then go to Phase IV(i, z^i_1, z^i_2, ..., z^i_n). Since
we are in Phase II, player 0 has so far stuck to his announced strategy, and there is no need to worry
about the payoffs of 0 while i > 0 is being punished.
Phase III(i): Minmax i for P periods. Denote by z^i_j the discounted sum of realised utilities of player j
during Phase III(i). Then go to Phase V(i, z^i_0, z^i_1, z^i_2, ..., z^i_n).
Phase IV(i, z^i_1, z^i_2, ..., z^i_n): Play the action tuple that gives players 1, 2, ..., n the payoff vector (φ_1 +
η/2 − z^i_1, ..., φ_i, ..., φ_n + η/2 − z^i_n). Since i is the last player to have deviated, note that she does not get
the reward η/2.
Phase V(i, z^i_0, z^i_1, z^i_2, ..., z^i_n): Play the action vector that gives (φ_0, φ_1 + η/2 − z^i_1, ..., φ_i, ..., φ_n + η/2 − z^i_n).¹⁸
From the standard folk-theorem argument it follows immediately that the proposed strategies
constitute a sequentially rational equilibrium following the announcement m = θ*. All that remains
is to verify that the play following m ≠ θ* is also sequentially rational, and that the normal type
of player 0 has an incentive to tell the truth. This is done in a number of steps, checking that there
is no history at which a unilateral one-step deviation by any j ≥ 0 gives j
strictly greater utility in the continuation game than the proposed equilibrium strategies. That this
suffices follows from the definition of NE and well-known results in dynamic programming. We first
check the incentive constraints for the SR players and then for LR.

¹⁸ Note that play transitions to a phase that is indexed by the actual payoffs of all players, including 0, during the minmaxing phase. Since we are in Phase III, player 0 has revealed himself as the normal type, and his payoffs also need to be adjusted to ensure that he mixes with the appropriate probabilities.
Step 1: i > 0 has no incentive to deviate (always one-step and unilateral in what follows) from
Phase I.
If i deviates, the maximum gain is (1−δ)b_i + δ(1−δ^P)w_i + δ^{P+1}φ_i (where b_i is i's maximal one-shot
deviation payoff), which is less than v_i for δ close to 1, because it converges to φ_i in the limit and by construction φ_i < v_i.
Step 2: i > 0 has no incentive to deviate from Phase II(j), where j ≠ i.
If i deviates to an action outside the support, i's per-period payoff in the game converges to
φ_i < φ_i + η/2 in the long run. Thus she does not get the reward η/2, which is given for carrying
out the punishment. Given the definition of z^j_i, player i's utility is independent of the probabilities
of mixing.
Step 3: i has no incentive to deviate from Phase II(i).
If i deviates to an action outside the support, he not only plays a suboptimal response in the current
period but also restarts the punishment; this lowers the current and future utility stream.
Step 4: i > 0 has no incentive to deviate from Phase III(j), where j ≠ i.
Step 5: i has no incentive to deviate from Phase III(i).
If i deviates, the maximum gain is (1−δ)b_i + δ(1−δ^P)w_i + δ^{P+1}φ_i. The payoff from conformity with
the equilibrium is φ_i. Thus a sufficient condition to rule out any profitable deviation is
(1−δ)b_i + δ(1−δ^P)w_i + δ^{P+1}φ_i < φ_i.
As δ → 1 both sides converge to φ_i, so we divide through by 1 − δ to obtain the equivalent condition
b_i + δ(1 + δ + ... + δ^{P−1})w_i < (1 + δ + ... + δ^P)φ_i.
As δ → 1 the LHS converges to b_i + P·w_i and the RHS to (P + 1)·φ_i; the strict inequality
b_i + P·w_i < (P + 1)·φ_i holds by the definition of P.
∎
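As a numerical sanity check on Step 5 (with invented payoffs, not values derived from the model), the one-step deviation bound (1−δ)b_i + δ(1−δ^P)w_i + δ^{P+1}φ_i does fall below φ_i once δ is close to 1, provided b_i + P·w_i < (P+1)·φ_i; for low δ the deviation can still pay, which is why patience is needed.

```python
def deviation_payoff(delta, b, w, phi, P):
    """Upper bound on a one-step deviation in Phase III(i): one period of the best
    stage gain b, then P periods of minmax payoff w, then continuation value phi."""
    return (1 - delta) * b + delta * (1 - delta ** P) * w + delta ** (P + 1) * phi

# Invented numbers satisfying b + P*w < (P + 1)*phi, namely 4 + 0 < 10.
b, w, phi, P = 4.0, 0.0, 1.0, 9
patient = deviation_payoff(0.99, b, w, phi, P)   # below phi = 1: deviation deterred
impatient = deviation_payoff(0.50, b, w, phi, P)  # above phi: deterrence fails
```

The comparison of the two evaluations illustrates why the bound is stated only for δ close to 1.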
Long-run Player Moves First
This section extends the results of the previous section to the case where the long-run player moves
first. Given any simultaneous-move stage game G as above, define the extensive-form stage game G^{seq} in
which player 0 moves first and players 1, 2, ..., n move simultaneously after observing the action chosen
by player 0. There is an obvious and natural one-to-one mapping from the set of action (n+1)-
tuples of G to the set of terminal nodes of G^{seq}; use it to define utilities for G^{seq}.
Corollary 4: Even without announcements, there exist sequentially rational equilibria giving the (nor-
mal) player 0 a payoff of l + ε when the stage game is of the form G^{seq} and N holds.
Proof: Same as in Proposition 1 above.
Reputation without Announcements (A Sketch)
The earlier framework poses the problem as a signalling game, where 0 is first given the opportunity
to state his type. While adding an announcement phase is a natural modification, one might still
wonder whether the result goes through without announcements.¹⁹ Fortunately one can show that the main
result extends, although at the cost of added complexity in the strategies.²⁰
First, a few comments on the intuition and the strategy of proof of Proposition 5, which is the
counterpart of Proposition 1 when announcements are not available. Consider a strategy profile σ^{ci} of
players 0, 1, ..., n in the repeated game giving player 0 a payoff of l + ε and each i > 0 at least W_i.
We shall now construct an eq. of the i.i. game in which $% plays according to %ci0 . Consider the
following strategy in the i.i. game. Players start by playing according to %ci; as soon as 0 deviates
from %ci0 the prior µ is appropriately modified. Based on the updated beliefs find the type of player
0 that is most likely the eq. strategy the next action played is intended to keep 0 down to l + ".
It is important to note that if |supp(µ)| = K , then along any path there can be at most K points
where not all types that are currently in the support are expected to not behave identically. This
is the content of Lemma 2. At each such point at least one type is eliminated from the support of
the posterior derived from µ by Bayesian updating. Thus there can be no more than K mistakes in
predicting the strategy of 0. More than K mistakes is a 0-probability event under the equilibrum
strategy and players 1, 2..., n (rightly) believe that they are facing a normal type of player-0 who
has deviated. Although this means that normal type of player 0 cannot gain by deviating at more
than K points, this is not enough for my proof to go through. There are two potential problems
with this line of argument.
[19] The results of the previous section, titled "Long-run Player Moves First", go through even without announcements. As we reasoned above, announcements play no role when 0's action is visible before the normal players 1, 2, ..., n move.
[20] The critical value of the discount factor required for the proof to work would in general depend on the cardinality of the type space, but the limiting value is independent of the fine details.
First, since each deviation gives player 0 a one-shot gain, it might be
tempting for a normal-0 to deviate at points of time that are very far apart. It might take so long
for his true type to be discovered that punishments could become irrelevant from the point of view
of the current period. Since he can choose any strategy he wishes, there is nothing to prevent him
from deviating at points of time very far apart. To prevent this we apply a “real-time” correction.
When a mistake has been made in predicting the strategy of 0, the others are asked to apply an
immediate correction by minmaxing 0 to wipe out the gains that a normal type of 0 would have
accumulated by that deviation. This need not be viewed as a punishment but as a corrective. The
second question is: Why would i > 0 stick to their assigned role? The argument is the same as in
Proposition 1 above once we note that if i deviates she can escape punishment, and perhaps
even get good payoffs, for at most K periods; if δ is high enough, then punishments cannot be
avoided for too long; conditionally minmaxing her is enough. Before we state Lemma 2 formally,
we recall the following notation: μ(·|h^{t−1}, σ) denotes the posterior at the start of period t, derived from
the prior μ conditional on the history of play up to period t − 1 and the equilibrium strategy σ.
For convenience we sometimes denote this posterior by μ^{t−1}.
Lemma 2 (Finite Non-identical Play Property): Suppose θ* ∉ supp(μ^T) for some
T > 0. Along any path of play, equilibrium or otherwise, there can be at most |Θ − {θ*}| = K − 1
periods after T in which the types of player 0 still in the support of the posterior beliefs do not all
play the same action; consequently in at most K periods can the normal type of player 0 get more
than l + ε.
Proof: Let A(μ^{t−1}) denote the set of all actions that have positive probability of being played in period
t by player 0 given μ^{t−1}. Note that for all periods |A(μ^{t−1})| ≤ |supp(μ^{t−1})|: at most, each crazy
type plays a different action. Suppose that |A(μ^{t−1})| > 1. If a^t_0 ∉ A(μ^{t−1}), then Bayes'
rule imposes no restrictions on how μ^t is to be derived from μ^{t−1}; we set μ^t(θ*) = 1. Otherwise,
if a^t_0 ∈ A(μ^{t−1}), we must have |supp(μ^t)| ≤ |supp(μ^{t−1})| − 1, since all types in the support of
μ^{t−1} that were expected to play an action other than a^t_0 may now be dropped from the support of μ^t. For
any path of play where |A(μ^t)| > 1 for t = t_1, t_2, ..., t_K, we must have |A(μ^s)| = 1 ∀ s > t_K.
∎
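The counting argument in Lemma 2 can be illustrated with a small sketch (hypothetical types and actions, not from the paper): tracking the support of the posterior along a path, the number of periods in which the surviving committed types disagree never exceeds the number of committed types minus one, since each disagreement period eliminates at least one type.

```python
# Illustration: Bayesian support elimination for pure committed types.
# Each hypothetical type is a fixed sequence of actions; we count the
# periods in which the types still in the support disagree.
def count_disagreements(types, path):
    """types: dict name -> list of actions; path: realised actions of player 0."""
    support = set(types)              # all committed types start in the support
    disagreements = 0
    for t, action in enumerate(path):
        predicted = {types[k][t] for k in support}
        if len(predicted) > 1:        # support types do not all play the same action
            disagreements += 1
        # Bayesian updating: drop every type whose prescribed action differs
        support = {k for k in support if types[k][t] == action}
        if not support:               # zero-probability event: believe the normal type
            break
    return disagreements

# Three committed types: disagreement periods never exceed K - 1 = 2.
types = {"A": ["x", "x", "y", "y"], "B": ["x", "y", "y", "y"], "C": ["x", "x", "x", "y"]}
print(count_disagreements(types, ["x", "x", "y", "y"]))  # path of type A
```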
Proposition 5 below formalises the argument in the above paragraph using Lemma 2 (FNPP).
The proof continues to employ the more stringent requirement that, while the RPD is available,
mixing is not observable even ex-post. One could avoid the ungainly arguments by assuming
that mixing is observable.
Proposition 5 (Upper Bound Without Announcements): Fix ε > 0. Under assumptions N
and FD there exists an equilibrium of the I.I. game in which player 0 gets l + ε and each i > 0 gets at
least W_i.
Proof: Set v such that v_0 := l + ε is the target payoff. Pick v′ and β > 0 such that W < v′ <
v′ + β·1 < (v_0, π). In what follows a "deviation" by player 0 in period t from a strategy profile σ is
taken to mean that in period t player 0 plays an action a^t_0 ∉ A(μ^{t−1}), the set of actions predicted
under σ given the history h^{t−1}. Beliefs are updated using Bayes' Rule following any event that has
positive probability; following a 0-probability event under σ, the posterior puts probability 1 on the
normal type of player 0, as in Lemma 2 above. Play proceeds according to the following phases. If in
period t we have A(μ^{t−1}) = {a°}, then denote by α^t(μ^{t−1}) the n-tuple of actions that delivers l + ε
in the slice a°. Otherwise choose a° as the action on which μ^{t−1} places the highest probability.
(Indeed we could just as well choose it using any other deterministic algorithm.)
Phase I: In period t play the action n-tuple α^t(μ^{t−1}), this action vector being dependent on the
history through μ^{t−1}, provided A(μ^{t−1}) = {a^{t−1}_0}, i.e. provided the other
players correctly anticipated the action of player 0. If a^t_0 is not the predicted action but is
consistent with some type in supp(μ^{t−1}), then
switch to Phase Adjust. If player 0 deviates during any phase (this
includes the case where a^t_0 is consistent with no type in the support), go to Phase II(0). If player i > 0 deviates unilaterally from
any phase and Phase V has never been triggered, then switch to Phase II(i); if player 0 has never
deviated and Phase V has been triggered in the past, then switch to Phase III(i).
Phase Adjust: Minmax 0 for P periods, and transition to Phase I after readjusting the target
payoffs of all players i > 0 to take into account the actual utilities earned by the players during the
minmax phase. The following makes it possible to carry out this proof: if all players are patient
enough, the possible variation in the utility of any player i ≥ 0 during this stage cannot be more than
β/2K.
Phase II(0): Minmax 0 for P periods, where P is the same as before. Denote the discounted
sum of realised utilities of player j during Phase II(0) by z^0_j. Then go to Phase V(0, z^0_1, z^0_2, ..., z^0_n).
Phase II(i): Conditionally minmax i for P periods, where P is the same as before. Denote the discounted
sum of realised utilities of player j during Phase II(i) by z^i_j. Then go to Phase IV(i, z^i_1, z^i_2, ..., z^i_n).
Note that play transitions to a stage that is indexed by the actual payoffs earned by each
of the normal players during the minmaxing phase. Since we are in Phase II, player 0 has so far
stuck to his announced strategy and there is no need to worry about his payoffs while i > 0 is
being punished.
Phase III(i): Minmax i for P periods, where P is the same as before. Denote the discounted sum
of realised utilities of player j during Phase III(i) by z^i_j. Then go to Phase V(i, z^i_1, z^i_2, ..., z^i_n).
Phase IV(i, z^i_1, z^i_2, ..., z^i_n): Play the action tuple that gives players 1, 2, ..., n the payoff vector[21]

(v′_1 + β/3 − z^i_1, ..., v′_i − z^i_i, ..., v′_n + β/3 − z^i_n).

Phase V(i, z^i_0, z^i_1, z^i_2, ..., z^i_n): Play the action vector that gives

(v′_0, v′_1 + β/3 − z^i_1, ..., v′_i − z^i_i, ..., v′_n + β/3 − z^i_n).[22]

This may be verified to be a sequentially rational equilibrium. ∎
6 Behavioural Types that Mix
All previous results are proved for a type space Θ = {θ*, θ_1, ..., θ_K}, where the behavioural types
θ_1, ..., θ_K are committed to pure dynamic-game strategies, i.e. each type θ ∈ Θ − {θ*} plays in
each period t = 1, 2, ... a unique action θ_t(θ)(h^{t−1}) ∈ A_0 conditional on the history h^{t−1}_+ of action
profiles of players 1, 2, ..., n. Each type θ ≠ θ* is thus identified with, or defined by, the following
sequence of mappings

θ_1(θ) ∈ A_0,   θ_t(θ) : A^{t−1}_+ → A_0 (t > 1),

which specifies an initial action (for t = 1) and for each t > 1 maps any history of actions
played by players 1, 2, ..., n into an action of player 0. This notion of a type as a pure (rather than
mixed) strategy of the dynamic game is important both for the exact numerical bound and, more
substantively, for the technique of proof: at each t, given the history h^{t−1}, the pure action θ_t(θ)(h^{t−1})
that 0 plays is known to the other players 1, 2, ..., n; this enables them not only to mete out a
suitably tailored punishment, but also to detect deviations by the reputation builder as soon as they
occur. For the case of mixed-strategy crazy types we maintain the earlier notation: fix (G, Θ, μ),
where G is the stage game, Θ = {θ*, θ_1, ..., θ_K} is an arbitrary finite set of types, and μ is the prior
on Θ. Each type θ ≠ θ* is now identified with, or defined by, the sequence

θ_1(θ) ∈ Δ(A_0),   θ_t(θ) : A^{t−1} → Δ(A_0) (t > 1),

where each A_0 has been replaced by Δ(A_0), and type θ's period-t strategy maps from A^{t−1}
rather than A^{t−1}_+. The reason for the latter modification is that when a behavioural type plays a
pure strategy it is unnecessary to specify how he plays after having himself deviated in the past
and thereby revealed himself; with mixed strategies more actions than one are consistent with the
type, and there is scope for specifying how 0 plays conditional on his earlier realised actions.
[21] v′_i, i > 0, is the readjusted value after all previous Adjust phases, and not the original value. A similar statement holds for Phase V, ∀ i ≥ 0.
[22] Note that play transitions to a stage that is indexed by the actual payoffs of all players, including 0, during the minmaxing phase. Since we are in Phase III, player 0 has revealed himself as the normal type and his payoffs also need to be adjusted to ensure that he mixes with the appropriate probabilities.
Using relatively standard techniques I extend my results in Proposition 3 to the case where
mixing by players 1, ..., n is not observable even ex-post; however, doing the same for the reputation-
builder requires significantly different techniques. Before a formal statement of the proposition let
me try to explain the key issues involved, starting with the conceptually simpler case of mixed
strategies being ex-post observable. First, the presence of types that mix means that the previous
bound won't work, because at each step there is more freedom for player 0: instead of playing
exactly one action from A_0 he can mix among these actions, each such mix inducing a game among
the remaining n players. One could readily extend the result to mixed-strategy crazy types when
mixing is ex-post observable; the bound needs to be redefined to allow for the extra flexibility
available to player 0, but the proof remains unchanged in essence: the normal type of 0 cannot
get more than the bound if he sticks to his announced mixing probabilities; if he deviates he reveals
himself immediately to be a normal type. However, matters are more complicated when types
are committed to mixed strategies that are unobservable even ex-post, because deviations are not
readily detected. This point is explained in greater depth later in the section.
We now introduce some notation, which closely follows that for pure types. Recall that the
upper bound was earlier defined by taking the max of w_0(a_0) over all actions a_0 ∈ A_0. Now we
need to take the supremum over all probability distributions α_0 on A_0. This is clearly a larger set,
and consequently the upper bound cannot decrease. We now redefine the bound l. First, for each
mixed action α_0 ∈ Δ(A_0) we have an induced game G(α_0) as before.
Definition 4: The conditional minmax of i > 0 in the slice α_0 is the minmax of i > 0 conditional
on the slice G(α_0); call it w_i(α_0).
Definition 5: The maxminmax of a player i > 0 with respect to 0 is defined as W_i := sup_{α_0} w_i(α_0),
the supremum of the conditional minmaxes.
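As a toy illustration (hypothetical payoffs, not from the paper), Definitions 4 and 5 can be computed by brute force for a three-player game in which player 0 mixes over two actions: for each α_0 on a grid, player 2 minmaxes player 1 inside the slice G(α_0) while player 1 best-responds, and W_1 is the largest of these conditional minmaxes.

```python
# Hypothetical 3-player example (players 0, 1, 2; two actions each).
# g1[a0][a1][a2]: player 1's stage payoff (made-up numbers).
g1 = [[[3, 0], [1, 2]],   # player 0 plays action 0
      [[0, 2], [4, 1]]]   # player 0 plays action 1

def w1(p):
    """Conditional minmax of player 1 in the slice where 0 plays action 0
    with probability p: player 2 minimises player 1's best-response payoff,
    all payoffs taken in expectation over player 0's mix."""
    exp = lambda a1, a2: p * g1[0][a1][a2] + (1 - p) * g1[1][a1][a2]
    return min(max(exp(a1, a2) for a1 in (0, 1)) for a2 in (0, 1))

grid = [k / 100 for k in range(101)]
W1 = max(w1(p) for p in grid)   # maxminmax: supremum over the grid of slices
print(W1)
```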
Start by defining

𝓕(α_0) := {v ∈ F(α_0) : v_i ≥ W_i ∀ i > 0}.

While 𝓕(α_0) ⊆ R^{n+1}, the projection of 𝓕(α_0) onto the 0th coordinate is denoted by adding the
subscript 0, i.e. 𝓕_0(α_0) ⊆ R.
First, observe that the worst payoff for player 0 in the slice α_0, subject to each player i > 0
getting above her maxminmax, is

w_0(α_0) := inf 𝓕_0(α_0) ≡ inf {v_0 : (v_0, v_{−0}) ∈ 𝓕(α_0)}.

The new upper bound is given by the supremum of these infima over all slices:

l := sup_{α_0} w_0(α_0) ≡ sup_{α_0} inf 𝓕_0(α_0).
Even with this new bound, designing and enforcing a punishment for announcing a non-normal
type proves challenging. To avoid further complications at this point, we retain the announcement
phase. Since announcements are available we allow player 0 to send an initial message m and
subsequent messages m_t at each time t = 1, 2, .... We also make the simplifying assumption that a
RPD is available. (For simplicity assume that mixing by players i > 0 is observable ex-post, even
though we cannot observe the mixing probabilities chosen by 0; this simplifies the proof considerably
without affecting the essence of the result.[23]) With types that mix, the crux is that detecting
deviations is no longer immediate: θ* can announce some θ ≠ θ* but behave like some σ′ ≠ σ(θ);
there is no way to immediately detect a discrepancy between the initially announced type and the
actual play if, for example, σ(θ) and σ′ have the same support in A_0 at each t on the path of play.
The key to resolving this problem is that while there is no way of detecting a deviation period by
period, one may monitor the distribution of the actions of 0 over time. More precisely, given any
announced θ we can define for each period t an excess reward function ξ_t, which measures the excess
of the realised payoff of 0 over the expected payoff. Over time a Law of Large Numbers (henceforth
LLN) would, we might hope, assure us that player 0 won't be too much above his expected payoff
if he is playing according to the initially announced type. At a technical level, one should note that
classical LLNs use independent and identically distributed random variables. The requirement that
the random variables ξ_t be identically distributed is easily relaxed, since they are uniformly bounded in variance.
(This follows immediately from the payoff functions g_i : A → R being uniformly bounded by ±M,
where M := max_i max_a |g_i(a)|; it follows that |ξ_t| ≤ 2M ∀ t, which together with E ξ_t = 0 implies
that Var ξ_t ≤ 4M².) The first condition (independence) is unfortunately not met, because types
condition their period-t play on the history of actions up to t − 1; in other words, a type can play
a different mix at different histories. To avoid this problem we could assume that the type space
Θ contains only constant (mixed) action types, i.e. each θ ≠ θ* plays α(θ) ∈ Δ(A_0) after all histories.
Then players i > 0 also need only play a constant profile ψ(α(θ)) to punish 0 for declaring a
non-normal type θ. Since the mixed action profile is history-independent as long as there are no
deviations, the ξ_t's are indeed i.i.d.; this case is readily covered by classical LLNs.
[23] In other words, even if we assumed that mixing by all players 0, 1, ..., n is unobservable even ex-post, we can define and prove a non-trivial upper bound on reputation. We would then use the minmax (w^p_0) of 0 over A_+ rather than Δ(A_+) during the adjustment phase, to be defined shortly. The new bound would merely be max{w^p_0, l}, where l is the bound defined above. In particular the above argument works unchanged when (a) w^p_0 = w_0, or (b) w^p_0 ≤ l, where (a) implies (b). Since l is below 0's highest feasible payoff, so is max{w^p_0, l}, ensuring that the bound is non-trivial.
However reputational arguments
derive considerable power from the reputation builder's ability to condition his future actions on the
derive considerable power from the reputation builder’s ability to condition his future actions on the
past play of his opponents. Fortunately, even when one allows such permissive history-dependent
behavioural types that mix, we can show that an appropriate LLN (for dependent random variables)
applies to our construction. We now formally define the excess payoff functions and
state the appropriate LLN.
Definition: The excess payoff function ξ_α : A → R is defined by ξ_α(a) := g_0(a) − g_0(α), where
α is any mixed action profile.
This is the excess payoff to player 0 when the action profile played is a and the desired mixed action
was α. Using this we next define the excess payoff given any announced type θ and any (t − 1)-period
history h^{t−1} when players follow the punitive plan encoded in ψ(·).
Definition: ξ_t(θ, h^{t−1})(a) := ξ_α(a), where α = (θ_t(θ)(h^{t−1}), ψ(θ_t(θ)(h^{t−1}))).
Before proceeding further we state some results that will enable us to state an appropriate weak
law of large numbers. Let Y_1, Y_2, ... be a sequence of zero-mean random variables, and let I_{t−1} be
the information filtration at time t − 1. Then we define the following:
Definition: {Y_t}_{t≥1} is a Martingale Difference Sequence (MDS) if E(Y_t | I_{t−1}) = 0 ∀ t.
The next property records that the random variables ξ_t form an MDS, where I_{t−1} now denotes the
filtration generated by ξ_{t−1}, ..., ξ_1:
Property: E(ξ_t | I_{t−1}) = 0 and Var ξ_t ≤ 4M² ∀ θ, h^{t−1}.
Either of the following theorems may, for example, be used (for the first see Badi Baltagi, pg. 218).
Theorem 1: Let {Y_t}_{t≥1} be an MDS. If Σ_{t=1}^∞ σ²_t/t² < ∞, then (1/T) Σ_{t=1}^T Y_t → 0 in probability.
Fact 1: Σ_{t=1}^∞ 1/t² is a convergent series.
Theorem 2 (Chow): Let {Y_t}_{t≥1} be an MDS with respect to {I_t}_{t≥1}. If for some r ≥ 1,
Σ_{t=1}^∞ (E|Y_t|^{2r})/t^{r+1} < ∞, then (1/T) Σ_{t=1}^T Y_t → 0 almost surely.
Fact 2: Convergence almost surely implies convergence in probability.
Lemma 2: Along any equilibrium path and for any type θ, we have (1/T) Σ_{t=1}^T ξ_t → 0 in probability.
Proof: The proof follows from the properties of the excess payoff functions and either (a) Theorem
1 and Fact 1, or (b) Chow's theorem, Fact 1, and Fact 2, when we take r = 1. ∎
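A quick simulation (illustrative, not from the paper) of a bounded, history-dependent martingale difference sequence shows the behaviour this lemma relies on: even though the draws are dependent, the sample average shrinks towards zero.

```python
import random

# Simulate a bounded MDS: conditional on the history, Y_t is +c or -c with
# equal probability, where the scale c depends on the previous draw, so the
# Y_t are dependent but E[Y_t | past] = 0 and |Y_t| <= 2.
def simulate_average(T, seed=0):
    rng = random.Random(seed)
    total, prev = 0.0, 1.0
    for _ in range(T):
        c = 1.0 if prev > 0 else 2.0      # history-dependent scale
        y = c if rng.random() < 0.5 else -c
        total += y
        prev = y
    return total / T

print(abs(simulate_average(10)))      # short horizon: average can be sizeable
print(abs(simulate_average(100000)))  # long horizon: close to zero
```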
Lemma 3: For any θ and any history h^t, ∃ N and δ̄_0 ∈ (0, 1) such that if δ_0 > δ̄_0, the total
discounted sum of excess payoffs satisfies

(ξ_t + δ_0 ξ_{t+1} + · · · + δ_0^{N−1} ξ_{t+N−1}) / N < ε.
Before an exact statement of the proposition and its proof, let me present a sketch highlighting the
key features of the construction. Recall the structure of announcements: player 0 sends an initial
message m ∈ Θ at t = 0, and at each time t ≥ 1 a message m_t ∈ Θ just before the stage game G
is played. (Thus each t is a length rather than an instant of time.) Beliefs are formed in the natural
way: in keeping with Bayes' rule players i > 0 put probability 1 on the announced type m;
following any unexpected message or play, players put probability 1 on the normal type. As before,
the normal type is asked to send the message θ* at all t. Any type θ ≠ θ* sends the message θ
at all t. If m = θ*, then play moves to the SPNE of the c.i. game (see FM '86) giving player 0 a
payoff of v_0 = l + 4ε. If m = θ ≠ θ*, then play starts in Phase Ia, where for N periods play is
according to ψ(·), which tracks the (mixed) strategy of player 0 and punishes him on each slice. By
the definition of ψ(·) the expected payoff of 0 at each time t does not exceed l + ε. However, even if
0 mixes according to the initially announced type, there is a positive probability that the discounted
sum of actual excess payoffs during any Phase Ia exceeds the permissible limit of ε. In order to ensure that these excess
payoffs do not accrue, we need to conduct a review after the N periods of Phase Ia, which was not
necessary when θ_t(m)(h^{t−1}) was a degenerate distribution over the actions in A_0. If the realised
payoff of 0 during the block of N periods exceeds the target payoff (= l + ε) by more than ε, the
other players minmax 0 for exactly NP periods so as to wipe out any gain over and above the
permissible limit of ε. (If δ is high then players i > 0 would rather carry out a costly punishment
for NP periods than be punished into the future.) At the end of this adjustment phase we move
back to the start of Phase Ia for another cycle of N periods. This ensures that if we target l + ε,
then 0 cannot get more than l + 2ε. With types that mix it is thus a positive-probability event
that the adjustment phase is triggered even on the equilibrium path; we need to ensure that this does
not happen often enough to reduce the expected utility from the equilibrium path to the point where players 1, ..., n
would rather face punishment than carry out this expensive threat against player 0. However, if we
choose N to be a large enough integer this probability can be made arbitrarily small, say ρ; Lemma
2, together with the fact that players are patient, assures us of this. The proof makes the argument
exact by computing bounds on the payoffs of players i > 0 during on-path play, and verifies
that appropriate incentives are in place.
Since players i > 0 carry out this strategy, the normal player 0 cannot get more than v_0 − 2ε by
declaring m = m_t = θ ≠ θ* and following σ′; this is strictly worse than the payoff to declaring
m = m_t = θ* and following the SPNE σ^{ci}(v_0) of the c.i. game. What does the strategy ask θ* to
do after declaring m ≠ θ*? He is then asked to reveal himself at the first possible opportunity. For
example, if type θ* has until time τ mimicked θ, then he is asked to send the message m_{τ+1} = θ*
and reveal himself; if τ belongs to a Phase Ia that started at s < τ, then play switches to a game of
complete information that gives player 0 a payoff of v_0 − ε = l + 3ε from s onwards. (This can be
done in such a way that the others get payoffs above their respective w_i. Note that in order to give 0
a payoff of l + 3ε from the start of the current Phase Ia, players i > 0 might need to give him more
than l + 3ε in the continuation game when he deviates. If all players are patient this will cost them
no more than a small amount κ, which would still leave them above their maxminmax. Note: indeed all we
need is to give i > 0 a payoff above the minmax w_i, which is in general lower than the maxminmax
W_i.) The analogous result to Proposition 1 for the case of behavioural types that mix, when mixed
strategies are not ex-post observable, is:
Proposition 7: Under assumptions N and FD (with RPD and announcements), we have v^min_0 ≤ l.
Proof: Fix an arbitrarily small ε > 0. (Note: as we shall see, ε cannot be so large that the
target payoffs are infeasibly high.) It is sufficient to find a PBE in which the normal type gets an equilibrium
payoff of v_0 := l + 4ε. In what follows let σ^{ci}(x_0) denote an SPNE of the c.i. game that gives
player 0 a payoff of x_0. First we describe the strategy profile that gives θ* a payoff of v_0 in the
reputational game. We assume for convenience that a RPD is available.
Description of the Strategy Profile:
The normal type is asked to send the message θ* at all t, and to play according to σ^{ci}(v_0). Conditional
on having announced a non-normal type until date τ, θ* is expected to reveal himself at the start
of τ + 1 by sending the message θ*. (Any type θ ≠ θ* sends the message θ at all t and plays
according to σ(θ).) The path of play for players i > 0 depends on the initial announcement, as
detailed below.
Case I (m = θ*): If m = θ*, then play the strategy profile σ^{ci}(v_0) starting at time t = 0.
Case II (m ≠ θ*): If m ≠ θ* and no player has ever deviated, then we are in Phase I. In
what follows we use a RPD; therefore a "deviation" by i > 0 refers to deviation from a pure action
contingent on the realised draw; for player 0, a "deviation" is said to occur at period t only when
an action a_0(t) not in the support of θ_t(m)(h^{t−1}) is played.
We first define the following quantities. In each slice α_0 ∈ Δ(A_0), by Lemma 4 there exists ψ(α_0) ∈ Δ(A_+) such
that

g_0(α_0, ψ(α_0)) ≤ l + ε and π_i(α_0) := g_i(α_0, ψ(α_0)) > W_i + η for some η > 0,

where η depends on ε. This implies that π_i := sup_{α_0} π_i(α_0) > W_i + η. Now fix the n + 2 quantities
κ, (φ_i)_{i>0}, β such that W_i < φ_i < φ_i + β < π_i − κ < π_i. Since W_i < φ_i by construction, we can find a
large enough integer P, which will later be seen to be the length of the conditional minmaxing
phase, such that

P·W_i + max_a g_i(a) < P·φ_i + min_a g_i(a) ∀ i > 0. ... (*)
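Condition (*) pins down the punishment length: rearranging, any integer P > (max_a g_i(a) − min_a g_i(a))/(φ_i − W_i) works. A minimal sketch with hypothetical numbers:

```python
import math

# Smallest integer P satisfying P*W + max_g < P*phi + min_g, i.e.
# P > (max_g - min_g) / (phi - W). Hypothetical payoff range and targets.
def punishment_length(W, phi, g_min, g_max):
    assert phi > W
    return math.floor((g_max - g_min) / (phi - W)) + 1

P = punishment_length(W=1.0, phi=1.5, g_min=-3.0, g_max=3.0)
print(P)
```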
We start in Phase Ia: during the first N periods, where N is chosen as in Lemma 3 above,
the prescribed play at t conditional on the (t − 1)-period history h^{t−1} is ψ(θ_t(m)(h^{t−1})), the
mixed action profile that gives player 0 a low expected payoff (no more than l + ε) in the slice θ_t(m)(h^{t−1}).
After the N periods we conduct a review, which was not necessary when θ_t(m)(h^{t−1}) was a degenerate
distribution over A_0. If the realised payoff of 0 during the block of N periods does not exceed the
target payoff by more than ε, we start another Phase Ia; otherwise the other players minmax 0 for
NP periods, wiping out any gain over and above the permissible limit of ε, while the normal type of
0 is instructed to play some pure best reply during this stage. At the end of this adjustment phase
Ib we move back to the start of Phase Ia for another cycle of N periods. By Lemma 3 we can find
a large enough integer N, independent of the discount factors and the type θ, such that the
probability of the average discounted excess payoff being more than 2ε after N periods falls short
of ρ, where ρ > 0 will be chosen to be suitably small later on.
Suppose player 0 has never deviated from his announced strategy in the past and that player
i > 0 deviates from the prescribed path at time τ; then play enters Phase II_i, where player i is
conditionally minmaxed in the slice θ_t(m)(h^{t−1}) at times t = τ, τ + 1, ..., τ + P, where P is the length
of the punishment. Once Phase II_i is over, play moves into Phase III_i, where play is such
that φ(i) is delivered at t = τ + P + 1, τ + P + 2, ..., ∞; recall that the ith component of this vector
is φ_i, and all other components j > 0 are φ_j + β. If player j unilaterally deviates from Phase II_i
or Phase III_i, then impose Phase II_j followed by Phase III_j, and so on.
Conditional on an announcement m = θ ≠ θ*, players believe that they are indeed facing type
θ until 0 deviates from σ(m) or sends a message m_t ≠ θ and reveals himself to be the normal type.
Once the continuation game is a c.i. game, any player, including 0, may be punished by minmaxing
by the other players for P periods (see (*) above). If player 0 reveals himself as the normal type at the
start of a Phase Ia, then play re-adjusts to give him v_0 − ε = l + 3ε from the start of the current
Phase Ia. If he reveals normality during a Phase Ib, then the minmaxing continues until the current
phase is played out, and thereafter play switches to σ^{ci}(v_0 − ε).
Checking that the Strategy Profile Constitutes a PBE:
Step 1: Player 0 will not deviate:
(i) From standard results for the c.i. game it follows that 0 does not deviate after he reveals
himself to be a normal type.
(ii) θ* will reveal himself with m = θ* and get v_0, because the maximum payoff following any
announcement m ≠ θ* is below v_0 − 2ε.
(iii) Following m ≠ θ*, in Phase Ia the normal type is indifferent between revealing himself at
any point within the current phase, because it gives him the same utility from the start of the
current Ia; delaying revelation until after the current phase is strictly worse.
Step 2: Player i > 0 will not deviate:
(i) During Phase Ia: When Phase Ia has just started, let player i's utility be v^a_i. Let
T := N − 1 + NP. The following inequality follows from the recursion structure:

v^a_i/(1 − δ) ≥ π_i(1 + δ + δ² + · · · + δ^{N−1}) + ρ{(−M)(δ^N + · · · + δ^T) + δ^{T+1} v^a_i/(1 − δ)} + (1 − ρ)δ^N v^a_i/(1 − δ)

⟹ v^a_i ≥ [π_i(1 − δ^N) − Mρδ^N(1 − δ^{T−N+1})] / [1 − (1 − ρ)δ^N − ρδ^{T+1}].

As ρ → 0, the right-hand side tends to π_i(1 − δ^N)/(1 − δ^N) = π_i; thus ∃ ρ̄ such that ρ ≤ ρ̄ ⟹ v^a_i ≥ π_i − κ.
The total discounted value at the start of Phase Ia comprises the following terms: first, we have
N periods during each of which player i expects to get no less than π_i, given the announced
type and the construction of ψ(·); with probability ρ this is followed by an adjustment phase in
which player i gets no less than −M per period and subsequently play moves to the next Phase Ia, whereas
with the remaining probability 1 − ρ Phase Ia is repeated.
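The closed-form bound above is easy to evaluate numerically. With illustrative parameters (not taken from the paper), the right-hand side visibly approaches π_i as ρ → 0:

```python
# Phase-Ia lower bound on player i's value (hypothetical parameters):
#   v_a >= [pi*(1 - d^N) - M*rho*d^N*(1 - d^(T-N+1))]
#          / [1 - (1 - rho)*d^N - rho*d^(T+1)],  with  T = N - 1 + N*P.
def phase_ia_bound(pi, M, d, N, P, rho):
    T = N - 1 + N * P
    num = pi * (1 - d**N) - M * rho * d**N * (1 - d**(T - N + 1))
    den = 1 - (1 - rho) * d**N - rho * d**(T + 1)
    return num / den

pi, M, d, N, P = 2.0, 5.0, 0.99, 50, 3
for rho in (0.5, 0.1, 0.01, 0.001):
    print(rho, round(phase_ia_bound(pi, M, d, N, P, rho), 4))
```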
(ii) During Phase Ib: When Phase Ib (the adjustment phase) has just started, let player
i's utility be v^b_i. It is easy to see that if i does not have an incentive to deviate at the start of
the adjustment phase then she will not do so later, because it is at the start that the player has
to be ready to mete out a possibly costly punishment for the greatest number of periods, viz. NP.
The minimum utility from carrying this out is (−M)(1 + · · · + δ^{NP−1}) + δ^{NP} v^a_i/(1 − δ), where v^a_i is
as above. If δ is high enough, this is close to v^a_i/(1 − δ). If i deviates, play eventually settles down in an
equilibrium that gives φ_i to player i. By construction φ_i < π_i − κ, and π_i − κ ≤ v^a_i from the step
above. It therefore follows that a deviation by any i > 0 from Phase Ib proves costly.
(iii) Player i does not deviate from III_j or II_j because, in expected value, she ends up losing
the reward β and any gains from the deviation are wiped out.
(iv) Player i does not deviate from II_i because, in expectation, she is playing her best response
anyway; deviation worsens both the current and the future payoff. Likewise deviation from
III_i is unprofitable because of condition (*). ∎
7 A Lower Bound
Having already seen that the LR player can be forced down to payoffs arbitrarily close to l, we
end by looking at what payoffs LR can actually guarantee himself. Start by considering types that
have a dominant strategy to play a constant action a_0 in every period (bounded recall zero). Fix
any a_0 ∈ A_0, and ask the question: what can the LR player guarantee himself by mimicking a type
that plays this constant action each period, a^t_0 = a_0 ∀ t = 1, 2, ...? First define the corresponding
individually rational slice F*(a_0) as follows, where w_i is the minmax of i in G:

F*(a_0) := {v ∈ F(a_0) : v_i ≥ w_i ∀ i > 0}.

Thus both 𝓕(a_0) and F*(a_0) are defined by truncating the slice-feasible payoff set F(a_0) below
some level for each player i > 0. In the first case this level is the maxminmax of each player, and
in the second case it is the minmax. Note that F*(a_0) ⊆ R^{n+1}, (n + 1)-dimensional
Euclidean space. Take the infimum of the projection of this set onto dimension 0; that is the lowest
payoff that 0 can get in an equilibrium if he sticks to a_0 for all t. The rough reasoning is that if 0
continues to play a_0, the others cannot continue to play a strategy profile that gives them less than
their respective minmax values. If all he could do was to mimic a type that plays a constant action
every period (a bounded-recall strategy with zero memory), the most he could guarantee is the max
over a_0 ∈ A_0 of these infima. Thus we have a lower bound l_0 := max_{a_0∈A_0} inf F*_0(a_0); in any
equilibrium player 0 cannot get much less than l_0 when all players are patient, he is relatively
patient, and the prior places positive weight on all types that play a constant action a_0 for all t. A
formal statement follows. We make the following assumption:
Assumption TC : All “crazy” types declare their type truthfully.
This assumption, it should be remarked, does not impose truthtelling for the entire game. In no way
does it constrain the normal type θ*'s announcement.[24] This assumption has been used by Abreu
and Pearce (2002, 2007) as a shortcut to an explicit model with trembles or imperfect monitoring,
in which the strategies would eventually be learnt whether or not some irrational types declare
truthfully. However, such a model would of necessity be technically challenging to handle, especially
in view of the large type space I wish to support. One might thus justify the assumption as contributing to
technical simplicity.[25]
Next, a richness assumption captures the premise of reputational arguments: even when we
believe that a player is overwhelmingly likely to be of a given type, we can never be absolutely sure
that he is. Naturally, reputational arguments are interesting precisely because they work even when
one type, the normal type θ* ∈ Θ of the corresponding repeated game, has high probability
mass, i.e. µ(θ*) ≈ 1. The following assumes that the prior µ places a positive but
arbitrarily small weight on all types that play a constant action every period.
Assumption ACT (All Constant-action Types): µ(θ(a0)) > 0 ∀ a0 ∈ A0.
Under the rather weak assumptions ACT and TC we have the following result, which puts a lower
bound on 0's payoff across all Bayesian Nash (BN) equilibria.
Proposition 6 (Lower Bound): Under TC and ACT there exists a lower bound l_0(δ_0, δ) on the
payoffs of 0 in a BN equilibrium such that lim_{δ→1} lim_{δ_0→1} l_0(δ_0, δ) = l_0, where l_0 := max_{a0 ∈ A0} inf F′_0(a0).
Proof: This is done in the same way as in CST. □
The above step establishes the existence of a lower bound on 0's minimum payoff in BN equilibrium²⁶
using strategies that involve playing a single action at all times. Note that the definition of F′(a0)
uses the minmax value w_i, which would in general be less than the conditional minmax defined earlier.
In a game where strategies can be learnt because of trembles or imperfect monitoring, the set
F′(a0) would be replaced by the set H(a0), which uses the conditional minmax rather than the
minmax: H(a0) := co{v ∈ F(a0) : v_i ≥ w_i(a0) ∀ i > 0}. This would raise the lower bound, as
w_i(a0) > w_i in general. We use F′(a0), which truncates F(a0) below the minmax w_i rather than
below the conditional minmax w_i(a0), because there might exist equilibria in which, even if player
0 plays a0 always, it is not clear to the others that this is the case; consequently they perceive their
lowest equilibrium payoff as w_i rather than w_i(a0).
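To make these definitions concrete, the following sketch computes the objects above for a small invented example. Everything here is an assumption for illustration only: a 2×2 stage game with a single opponent, pure-action minmaxes, and a brute-force grid search standing in for the linear programme that defines inf F′_0(a0).

```python
# Hypothetical 2x2 stage game, invented purely for illustration: player 0
# chooses a row (T or B), the single opponent (player 1) a column (L or R).
# Entries are payoff vectors (g_0, g_1).
PAYOFFS = {
    ("T", "L"): (3.0, 1.0),
    ("T", "R"): (0.0, 0.0),
    ("B", "L"): (1.0, 2.0),
    ("B", "R"): (1.0, 1.0),
}
A0 = ["T", "B"]
A1 = ["L", "R"]

def minmax_1():
    """Unconditional (pure-action) minmax of player 1: player 0 picks the
    row that caps 1's best response."""
    return min(max(PAYOFFS[(a0, a1)][1] for a1 in A1) for a0 in A0)

def conditional_minmax_1(a0):
    """Minmax of player 1 conditional on 0 sticking to a0: with the row
    fixed, 1 can always best-respond within the slice."""
    return max(PAYOFFS[(a0, a1)][1] for a1 in A1)

def inf_slice_payoff_0(a0, w1, grid=10_001):
    """inf of 0's payoff over mixtures in slice a0 giving player 1 at least
    w1 -- a brute-force stand-in for the defining linear programme."""
    best = None
    for k in range(grid):
        lam = k / (grid - 1)  # weight on column L
        v0 = lam * PAYOFFS[(a0, "L")][0] + (1 - lam) * PAYOFFS[(a0, "R")][0]
        v1 = lam * PAYOFFS[(a0, "L")][1] + (1 - lam) * PAYOFFS[(a0, "R")][1]
        if v1 >= w1 - 1e-9 and (best is None or v0 < best):
            best = v0
    return best

w1 = minmax_1()  # unconditional minmax of the opponent
# l_0 = max over a0 of the infimum over the truncated slice (feasible slices only)
l0 = max(v for v in (inf_slice_payoff_0(a0, w1) for a0 in A0) if v is not None)

print("w_1 =", w1)
print("conditional minmaxes:", {a0: conditional_minmax_1(a0) for a0 in A0})
print("l_0 =", l0)
```

In this invented game w_1 = 1 (enforced by row T) while w_1(B) = 2, illustrating that the conditional minmax weakly exceeds the unconditional one; the bound works out to l_0 = 3, delivered by slice T.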
In general, it is clear that constant-action types constitute a small class of strategies; there are
uncountably many ways of switching between the various actions in A0, each action amounting to
the choice of a “slice” of the game G.

²⁴If truthtelling is to hold there it must be derived; otherwise studying reputation is pointless.
²⁵A later section extends the analysis to situations where announcements are unavailable; the accompanying proof would not need TC.
²⁶This bound thus applies to all BNE, not just the smaller subset of PBE.

The next step would be to look at increasingly long strategies
of bounded recall, perhaps using some kind of induction on the memory size, and see to what extent
they lead to further improvements. This unfortunately turns out to be a very hard problem to
solve, partly because there are uncountably many “crazy” strategies that 0 could potentially mimic.
In principle one could think of each “crazy” strategy as a rule for transitioning among the slices of
G; thus solving the incomplete-information game is akin to solving a class of stochastic games among
n rather than n + 1 players, defined using the repeated game. We would then need to find, for each
game in the class, the worst possible payoff of the normal type, and finally take the sup/max over
all these possible minima (or infima). Since stochastic games are notoriously hard to handle, this
compounds the difficulty. However, as we have seen, the long-run player 0 cannot guarantee himself
anything more than l no matter how patient he is relative to his opponents, who are also patient.
8 Conclusion
This paper attempts to contribute to the literature on reputation that starts with the work of Kreps,
Wilson, Milgrom and Roberts, and is sharpened into powerful and general theoretical insights by
Fudenberg and Levine and subsequent papers. My paper analyses reputation formation against
multiple patient opponents. I show that there are additional insights to be gained from this
case, over and above the elegant theoretical insights of the previous literature. While reputation
is in general valuable even against multiple players, it may not be possible for the patient player
to extract the entire surplus while leaving the others with barely their minmax values. Let v_0^min
be the minimum equilibrium payoff of player 0 in the limit when all players are patient and 0 is
patient relative to the rest. I find an upper bound l such that v_0^min ≤ l: any payoff of the
(complete-information) repeated game in which 0 gets more than l can be sustained. A single
opponent cannot credibly threaten to punish, and thereby thwart, a patient player building
reputation. But with more than one patient opponent, there might be ways to commit to punishing
even a patient player for not behaving like the normal type.
References
[1] Abreu, Dilip; On the Theory of Infinitely Repeated Games with Discounting; Econometrica, Vol. 56, No. 2 (Mar., 1988)
[2] Abreu, Dilip; Prajit K. Dutta; Lones Smith; The Folk Theorem for Repeated Games: A NEU Condition; Econometrica, Vol. 62, No. 4 (Jul., 1994)
[3] Abreu, Dilip; David Pearce; Bargaining, Reputation and Equilibrium Selection in Repeated Games with Contracts; Econometrica
[4] Aoyagi, Masaki; Reputation and Dynamic Stackelberg Leadership in Infinitely Repeated Games; Journal of Economic Theory, 1996, Vol. 71, Issue 2, pp. 378-393
[5] Celentani, Marco; Drew Fudenberg; David K. Levine; Wolfgang Pesendorfer; Maintaining a Reputation Against a Long-Lived Opponent; Econometrica, Vol. 64, No. 3 (May, 1996)
[6] Chan, Jimmy; On the Non-Existence of Reputation Effects in Two-Person Infinitely-Repeated Games; April 2000, Johns Hopkins University working paper
[7] Cripps, Martin W.; E. Dekel; W. Pesendorfer; Reputation with Equal Discounting in Repeated Games with Strictly Conflicting Interests; Journal of Economic Theory, 2005, Vol. 121, pp. 259-272
[8] Cripps, Martin W.; George J. Mailath; Larry Samuelson [CMS]; Imperfect Monitoring and Impermanent Reputations; Econometrica, Vol. 72, No. 2 (Mar., 2004), pp. 407-432
[9] Cripps, Martin W.; George J. Mailath; Larry Samuelson; Disappearing Private Reputations in Long-Run Relationships; Journal of Economic Theory, Vol. 134, Issue 1, May 2007, pp. 287-316
[10] Cripps, Martin W.; Klaus M. Schmidt; Jonathan P. Thomas [CST]; Reputation in Perturbed Repeated Games; Journal of Economic Theory, Vol. 69, Issue 2, May 1996, pp. 387-410
[11] Cripps, Martin W.; J. P. Thomas; Some Asymptotic Results in Discounted Repeated Games of One-Sided Incomplete Information; Mathematics of Operations Research, 2003, Vol. 28, pp. 433-462
[12] Evans, Robert; Jonathan P. Thomas; Reputation and Experimentation in Repeated Games with Two Long-Run Players; Econometrica, Vol. 65, No. 5 (Sep., 1997), pp. 1153-1173
[13] Fudenberg, Drew; David K. Levine; Reputation and Equilibrium Selection in Games with a Patient Player; Econometrica, Vol. 57, No. 4 (Jul., 1989)
[14] Fudenberg, Drew; David K. Levine; Maintaining a Reputation when Strategies are Imperfectly Observed; The Review of Economic Studies, Vol. 59, No. 3 (Jul., 1992)
[15] Friedman, James W.; A Non-cooperative Equilibrium for Supergames; The Review of Economic Studies, Vol. 38, No. 1 (Jan., 1971), pp. 1-12
[16] Fudenberg, Drew; David M. Kreps; Reputation in the Simultaneous Play of Multiple Opponents; The Review of Economic Studies, Vol. 54, No. 4 (Oct., 1987)
[17] Fudenberg, Drew; Eric Maskin; The Folk Theorem in Repeated Games with Discounting or with Incomplete Information; Econometrica, Vol. 54, No. 3 (May, 1986), pp. 533-554
[18] Kreps, David; Robert Wilson; Reputation and Imperfect Information; Journal of Economic Theory, 1982
[19] Schmidt, Klaus M.; Reputation and Equilibrium Characterization in Repeated Games with Conflicting Interests; Econometrica, Vol. 61, No. 2 (Mar., 1993)
Appendix
Proof of TPL: Fix ε > 0. Pick any slice a0. First we show that there exists a payoff vector
π(a0) ∈ F(a0) such that π_i(a0) > W_i ∀ i > 0 and π_0(a0) < w_0(a0) + ε ≤ l + ε. (ε is the same for
all slices.) Since inf{v_0 : (v_0, v_{−0}) ∈ F(a0) and v_i ≥ W_i ∀ i > 0} = w_0(a0), there exists a vector
π̂(a0) ∈ F(a0) such that π̂_i(a0) ≥ W_i ∀ i and π̂_0(a0) < w_0(a0) + ε/2 ≤ l + ε/2. By N, we can pick a
point π̄(a0) ∈ F(a0) where each i gets strictly more than W_i. Now define π(a0) := λ(a0)π̂(a0) + (1 − λ(a0))π̄(a0),
where λ(a0) is close enough to 1 so that π_i(a0) > W_i and π_0(a0) < w_0(a0) + ε. Since π(a0) ∈
F(a0) ⊆ co{F(a0)}, there exists a probability distribution α(a0) ∈ Δ(A_{−0}) such that

g(a0, α(a0)) ≡ Σ_{a_{−0} ∈ A_{−0}} g(a0, a_{−0}) α(a0)(a_{−0}) = π(a0). □

Define π_i := min_{a0} π_i(a0). Thus i gets at least π_i in every slice if all players j > 0 play α(a0)
and 0 induces slice a0. Since π_i(a0) > W_i ∀ a0, we also have π_i > W_i.
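The mixing step of this proof can be illustrated numerically. The vectors, the bound W_1, and ε below are all hypothetical: the point is only that a convex combination with weight λ close to 1 on the infimum-attaining vector keeps player 0's coordinate within ε of the infimum while borrowing a strict inequality for the opponent from the other vector.

```python
# Illustrative numbers only (coordinates are (v_0, v_1)); none of these
# values come from the paper.
W1 = 1.0               # opponent's bound W_1
pi_hat = (2.0, 1.0)    # attains the infimum: v_0 low, but v_1 = W_1 only weakly
pi_bar = (4.0, 3.0)    # a feasible point where the opponent gets strictly more
EPS = 0.5

def mix(lam):
    """Convex combination lam * pi_hat + (1 - lam) * pi_bar."""
    return tuple(lam * h + (1 - lam) * b for h, b in zip(pi_hat, pi_bar))

# Push lam toward 1: the mixture inherits strictness for the opponent from
# pi_bar while keeping 0's coordinate within EPS of the infimum.
lam = 0.9
pi = mix(lam)
assert pi[1] > W1               # strict inequality for the opponent
assert pi[0] < pi_hat[0] + EPS  # 0's payoff stays within EPS of the infimum
print("pi =", pi)
```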
Proof of Lemma 3: By Lemma 2 we know that for any ρ > 0 there exists N large enough so that for
any t and any initial t-period history h^{t−1} we have

Pr( | (ζ_t + ζ_{t+1} + ... + ζ_{t+N−1}) / N | > ε/2 ) ≤ ρ.

Now the absolute distance between the discounted and undiscounted means is, given N,

| (ζ_t + δ_0 ζ_{t+1} + ... + δ_0^{N−1} ζ_{t+N−1}) / N − (ζ_t + ζ_{t+1} + ... + ζ_{t+N−1}) / N |
≤ Σ_{k=1}^{N−1} (1 − δ_0^k) |ζ_{t+k}| / N
≤ (1 − δ_0^N)(N − 1) M / N
< ε/2,

when δ_0 > δ̄_0, where δ̄_0 solves the equation obtained when the last inequality is replaced by an
equality; thus the discounted excess payoff over any N periods will, with probability 1 − ρ, not exceed
ε/2 + ε/2 = ε. □
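As a sanity check on the chain of inequalities in this proof, the following sketch draws random bounded sequences and verifies numerically that the gap between the discounted and undiscounted N-period means never exceeds (1 − δ_0^N)(N − 1)M/N. All parameter values and variable names here are chosen arbitrarily for illustration.

```python
import random

# Numerical check of the bound from the proof of Lemma 3: if each per-period
# term zeta_k is bounded by M in absolute value, then the distance between
# the discounted and undiscounted N-period means is at most
# (1 - delta0**N) * (N - 1) * M / N.  Parameter values are illustrative.
random.seed(0)
M, N, delta0 = 2.0, 50, 0.99

def gap_and_bound(zeta):
    """Return (|discounted mean - undiscounted mean|, theoretical bound)."""
    disc = sum(delta0 ** k * zeta[k] for k in range(N)) / N
    undisc = sum(zeta) / N
    bound = (1 - delta0 ** N) * (N - 1) * M / N
    return abs(disc - undisc), bound

for _ in range(1000):
    zeta = [random.uniform(-M, M) for _ in range(N)]
    gap, bound = gap_and_bound(zeta)
    assert gap <= bound + 1e-12  # the inequality from the proof holds
print("all 1000 draws satisfied the bound; bound =", round(bound, 4))
```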