reputation and the flow of information in repeated …ef253/2013_02_20_repflow.pdfand roberts(1982))...

Reputation and the Flow of Information in RepeatedGames with Frequent Interactions�

Eduardo FaingoldDepartment of Economics

Yale [email protected]

February 21, 2013

Abstract

This paper derives payoff bounds from reputation effects (akin to those of Fuden-berg and Levine (1992)) for repeated moral hazard games in which a long-run playerinteracts frequently with a population of short-run players.

Keywords: Reputation, commitment, imperfect monitoring, frequent interactions.

JEL Classification: C70, C72.

�I thank Olivier Gossner and George Mailath for stimulating discussions and helpful comments, and theNational Science Foundation (grant SES-1227593) for financial support. This paper subsumes and improvesupon the earlier analysis of reputations under frequent interactions in Faingold (2006).

1

1 Introduction

In this paper I study reputation effects (in the sense of Kreps and Wilson (1982), Milgromand Roberts (1982)) in repeated games in which a long-run player interacts frequently witha population of short-run players. I provide upper and lower bounds (akin to those ofFudenberg and Levine (1989, 1992)) on the set of Nash equilibrium payoffs of the long-runplayer in the limit as the discount rate and the period length tend to zero (in any order).

In typical studies of repeated games, it is common to interpret the comparative staticswith the players’ discount factor as an exercise of varying the length of the period of in-teraction. In particular, the limit when the discount factor tends to one is interpreted as thelimit when the period length tends to zero and the monitoring technology is fixed. However,when monitoring is imperfect so that players observe noisy signals about each other’s ac-tions, as in most cases of interest, it may be more natural to assume that the informativenessof the signals is constant over each unit of real time. But this requires the signal distri-butions to vary with the length of the period, as in the seminal paper of Abreu, Milgrom,and Pearce (1991), which leads to a comparative statics that is different from the traditionalone: as the periods shrink, the discount factor tends to one and the informativeness of thesignals (over the period of interaction) deteriorates. It turns out that allowing the monitor-ing technology to vary with the period length in this way can have a dramatic effect on theequilibrium analysis of repeated games: in somes cases, the set of perfect public equilibriapayoffs of the repeated game converges, as the period length tends to zero, to the convexhull of static Nash equilibrium payoffs (Abreu, Milgrom, and Pearce (1991), Faingold andSannikov (2011), Fudenberg and Levine (2007, 2009), Sannikov and Skrzypacz (2007)).

How about when the repeated game is perturbed by a small ex ante probability that thelong-run player may be a commitment type? Do reputation effects survive in the limit asthe period length shrinks to zero and the signal informativeness deteriorates? I show thatreputation effects remain powerful in the frequent interaction limit even under monitoringtechnologies that lead the equilibria of the underlying complete information game to col-lapse to static Nash equilibria. Specifically, I provide upper and lower bounds on the setof Nash equilibrium payoffs of the long-run player in the limit as the period length and theinstantaneous discount rate tend to zero (in any order). I then provide examples in which thebounds are tight and equal the Stackelberg payoff, whereas without uncertainty over typesthe equilibrium set would collapse to static Nash. I also discuss the connection between thefrequent-interaction bounds and the bounds that can be obtained from studying a limit gamethat is played directly in continuous time. In particular, I show that the set of equilibriumpayoffs can be discontinuous at the continuous-time limit game.

The intuition for the dramatic difference between the complete and incomplete infor-

2

mation results is rather simple. In the perturbed game, due to the learning process of theuninformed short-run players, the relevant time frame for reputation effects is the “longrun”—a reputation is not won in a day—and in the long run the many (poorly informative)public signals can be aggregated to become informative. By contrast, in the unperturbedcomplete information game, there is no learning about types and incentives can only becreated by using each period’s signals to adjust continuation play to deter deviations. Thus,unlike the perturbed game, the relevant period for incentives in the complete informationgame is the period of interaction itself, hence the possibility of collapse to static Nash equi-librium.1

In spite of this clean intuition, the result cannot be proved simply by taking the limitsof the Fudenberg-Levine (FL) bounds as the period length shrinks. To see why, let usassume (for simplicity) that there is a unique best-reply to the Stackelberg action and that thesignals statistically identify the action of the long-run player. Fix an " > 0 and an arbitraryNash equilibrium of the perturbed game and consider what happens under a deviation inwhich the normal type mimics the behavior of the Stackelberg type (i.e, the behavioraltype who is committed to playing the Stackelberg action in every period). Fudenberg andLevine (1992) show that, under this deviation, on a set of infinite histories with probabilityat least 1 � ", there is an upper bound K on the number of periods in which the short-run players are not playing a best-reply to the Stackelberg action, where K depends onthe monitoring technology and on ", but is independent of the discount factor and of theparticular equilibrium fixed. Due to this uniformity, by letting the discount factor tend to 1and " tend to zero in this order (assuming a fixed monitoring structure), the expected averagediscounted payoff of the normal type (under the deviation) converges to the Stackelbergpayoff. Hence, the Stackelberg payoff must be a lower bound on (the patient limit of) the setof Nash equilibrium payoffs of the normal type, because it is a lower bound on the expectedpayoff from a deviation. Now, this argument assumed a fixed monitoring technology, andthus a fixed period of interaction �. The reason why the payoff bounds for games withfrequent interactions do not follow by taking the limits of the FL bounds as � ! 0 has todo with the way the uniform bound K depends on � (through the monitoring technology).In a simple example with Poisson signals, I show that K does not scale with �. That is, ifthe period length is cut in half, one might expect the number of periods of non-best repliesto (at least approximately) double, but this is not the case for the uniform boundK providedby Fudenberg and Levine (1992).

To circumvent this problem I rely on a recent result of Gossner (2011), who uses en-tropy techniques to derive payoff bounds from reputation effects that are sharper than the FL

1See a recent paper by Bohren (2011) for an alternative channel by which intertemporal incentives can becreated in the frequent interaction limit which does not rely on uncertainty over types.

3

bounds for fixed discount factor and fixed monitoring. Although the Gossner bounds and theFL bounds coincide in the patient limit when the monitoring technology is fixed, the limitsturn out to be very different when the monitoring and the discount factor vary simultane-ously to adjust for the period length. Indeed, I derive the reputation bounds for games withfrequent interactions by considering the limits of the Gossner bounds as the period lengthvanishes. The frequent-interaction bounds rely on a notion of self-confirmed best-reply thatis based on an infinitesimal form of the relative entropy (also called Kullback-Leibler di-vergence), which can be interpreted as a measure of the flow of information in the game.I also provide examples to illustrate how this measure of information flow (and hence thebounds) can be calculated, including an example in which the upper and lower bounds con-verge to the Stackelberg payoff as the period length and the discount rate tend to zero inthis order, while applying the same order of limits to the Fundenberg-Levine discrete-timeresult yields completely uninformative payoff bounds.2 The latter example is not at all ex-ceptional or pathological: the gap between the frequent-interaction limit of the FL and theGossner bounds always arises for (nontrivial) games with signals that are sampled from aPoisson counting process (as in Abreu, Milgrom, and Pearce (1991)).

2 Baseline Model

2.1. Discrete-time repeated game. We recall the canonical reputation model of Fudenbergand Levine (1992), in which a long-run player faces a sequence of short-run players in aninfinitely repeated game with imperfect monitoring in discrete time. At the beginning ofperiod n D 0; 1; 2; : : :, the long-run player and the current short-run player simultaneously,and privately, choose actions an1 2 A1 and an2 2 A2 respectively, whereA1 andA2 are finiteaction spaces. A publicly observable signal yn is then drawn from a measurable space Yaccording to a probability distribution q.�jan1 ; a

n2/, where q W A! �.Y / andA WD A1�A2.

At the end of the period, the current short-run player departs from the game, being replacedby a new short-run player at the beginning of the following period. The new short-runplayer knows the entire past history of public signals.

The long-run player is privately informed of his type ! 2 �, where � is a countabletype space. The short-run players are uncertain as to which type of long-run player theyface and share a common prior � 2 �.�/, which satisfies �.!/ > 0 for all ! 2 �. Thetype space� contains strategic (or normal) types and behavioral types. The set of strategic

2That is, the FL upper and lower bounds converge to the greatest and least feasible payoff of the large playerin the stage game, respectively.

4

types is denoted �s � � and the overall payoff of a strategic type ! is

1∑nD0

.1 � ı/ıng1.an1 ; a

n2 ; !/;

where g1 W A1�A2��s ! R is the stage-game payoff function of the long-run player andı 2 .0; 1/ is his discount factor. A behavioral type is a mechanistic type of long-run playerwho is committed to playing some mixed action ˛1 2 �.A1/ in every period, irrespectiveof history.3 The set of behavioral types, denoted �b WD � n�s , is then naturally identifiedwith a countable subset of �.A1/. Finally, the payoff function of the short-run players iscommon knowledge and denoted g2 W A1 � A2 ! R.

2.2. Strategies and equilibrium. The space of n-period histories of the long-run player isdenoted Hn

1 WD �� .A1 � Y /n and is equipped with the product � -field, where H 0

1 WD �.Likewise, the space of n-period histories of the short-run players is denotedHn

2 WD Yn and

is endowed with the product � -field, where H 02 is the one-point set. A (behavior) strategy

of player i (i D 1; 2) is a sequence of measurable maps �ni W Hni ! �.Ai /, n > 0, such

that �1.!; �/ D ! for every behavioral type ! 2 �b . The set of strategies of player i isdenoted†i and the set of strategy profiles is denoted† WD †1�†2. Given the prior�, eachstrategy profile � 2 † induces a unique probability measure P�;� over the space of playsH1 WD ��.A1�A2�Y /

1 endowed with the product � -field. For each! 2 �, the symbolP!;� designates the induced probability measure over plays conditional on !. Expectationswith respect to P�;� and P!;� are denoted E�;� and E!;� , respectively. A profile � 2 †is a (Bayes-)Nash equilibrium of the repeated game �ı D

(.�;�/; .Ai ; gi /iD1;2; .Y; q/; ı

)if for every � 01 2 †1,

E�;�

[ 1∑nD1

.1 � ı/ıng1.an1 ; a

n2/]> E�;.� 01;�2/

[ 1∑nD1

.1 � ı/ıng1.an1 ; a

n2/];

and for every n > 0 and � 02 2 †2 such that � 0m2 D �m2 for every m ¤ n,

E�;�[g2.a

n1 ; a

n2/]> E�;.�1;�

02/

[g2.a

n1 ; a

n2/]:

For each ! 2 �s , let

N1.!; ı/ WD{E!;�

[ 1∑nD1

.1 � ı/ıng1.an1 ; a

n2/]W � is a Nash equilibrium of �ı

};

the set of Nash equilibrium interim payoffs of type !. Finally, define

N 1.!; ı/ WD infN1.!; ı/ and xN1.!; ı/ WD supN1.!; ı/:

3The payoff functions of behavioral types are left unmodeled

5

2.3. Reputation effects. We review the classical result of Fudenberg and Levine (1992),which provides upper and lower bounds on the set of Nash equilibrium payoffs of patientstrategic types. Recall that a mixed action ˛2 2 �.A2/ is a self-confirmed best-reply to ˛1 2�.A1/, written ˛2 2 B2.˛1/, if ˛2 is not weakly dominated and ˛2 2 arg max˛02 g2.˛

01; ˛02/

for some ˛01 2 �.A1/ with q.�j˛01; ˛2/ D q.�j˛1; ˛2/. For each ! 2 �s define

g1.!/ WD sup

˛12�.A1/

inf˛22B2.˛1/

g1.˛1; ˛2; !/

andxg1.!/ WD sup

˛12�.A1/

sup˛22B2.˛1/

g1.˛1; ˛2; !/:

The latter is called the generalized Stackelberg payoff, which can be greater than the tradi-tional Stackelberg payoff due to imperfect observability. Indeed the set of self-confirmedbest-replies can be greater than the set of best-replies, but the two concepts coincide whenthe long-run player’s actions are identified, i.e. when q.�j˛1; ˛2/ D q.�j˛01; ˛2/ implies˛1 D ˛01 for all ˛1, ˛01 2 �.A1/ and all ˛2 2 �.A2/ that are not weakly dominated.Finally, the stage game is called non-degenerate if there is no undominated pure actiona2 2 A2 with g1.�; a2/ D g1.�; ˛2/ for some ˛2 2 �.A2/ n fa2g.

Theorem 1 (Fudenberg and Levine (1992)). Suppose that behavioral types have full sup-port, i.e. �b is a dense subset of �.A1/. Then, for every strategic type ! with �.f!g/ > 0,

g1.!/ 6 lim inf

ı!1N 1.!; ı/ 6 lim sup

ı!1

xN1.!; ı/ 6 xg1.!/:

Moreover, if the stage game is non-degenerate and the long-run player’s action are identi-fied, then xg1.!/ D g1.!/ and hence,

limı!1

N1.!; ı/ D{xg1.!/

}:

3 Reputation under Frequent Interactions

Turning to the analysis of reputation effects in games with frequent interactions, embed thediscrete-time model of the previous section in continuous time, so each period of interac-tion has length � > 0. The long-run player discounts his flow payoffs exponentially atrate r > 0, so his discount factor across periods is ı D e�r�. As in Abreu, Milgrom, andPearce (1991), I allow the monitoring structure .Y �; q�/ to also vary with the period length�. In particular, the informativeness of each period’s signals may deteriorate as the peri-ods shrink, as when the signals are sampled from an underlying continuous-time process,such as a Poisson process (as in Abreu, Milgrom, and Pearce (1991)), a Brownian motion

6

(as in Sannikov and Skrzypacz (2007) and Fudenberg and Levine (2007, 2009)) or, moregenerally, a Lévy process (as in Sannikov and Skrzypacz (2010)).

The ensuing analysis relies on Gossner (2011), who uses methods from informationtheory (Cover and Thomas, 2006) to derive payoff bounds that are tighter than those ofFudenberg and Levine (1992) for fixed discount factor and monitoring structure. Recallthe definition of the relative entropy (also called Kullback-Leibler divergence) between twoprobability measures P and Q on a measurable space X :

H.QjjP / WD

{ ∫X ln.dQ

dP/ dQ; if Q << P;

1 otherwise,

where dQ=dP denotes the Radon-Nikodym derivative. The relative entropy is known tobe non-negative, equal to zero if Q D P and lower semi-continuous when viewed as afunction defined on �.X/ ��.X/ mapping into R [ fC1g.4

For each ˛1, ˛01 2 �.A1/ and ˛2 2 �.A2/ define

h.˛1; ˛01I˛2/ WD lim inf

�!0

1

�H(q�.�j˛1; ˛2/

∣∣∣∣q�.�j˛01; ˛2/); (1)

which inherits the properties of the relative entropy and is thus non-negative, equal to zeroif ˛1 D ˛01 for any ˛2, and lower semi-continuous when viewed as a function defined on�.A1/ ��.A1/ ��.A2/ taking values in R [ fC1g. The function h quantifies the flowof information that the short-run players receive about the actions of the long-run player inthe limit when the interactions become arbitrarily frequent.

A mixed action ˛2 2 �.A2/ is an information-flow confirmed best-reply to ˛1 2�.A1/, written ˛2 2 B�2 .˛1/, if ˛2 is not weakly dominated and ˛2 2 arg max˛02 g2.˛

01; ˛02/

for some ˛01 2 �.A1/with h.˛1; ˛01I˛2/ D 0. The long-run player’s actions are information-flow identified if h.˛1; ˛01I˛2/ D 0 implies ˛1 D ˛01 for all ˛1, ˛01 2 �.A1/ and undomi-nated ˛2 2 �.A2/. For each ! 2 �s define

g�1.!/ WD sup

˛12�.A1/

inf˛22B

�2 .˛1/

g1.˛1; ˛2; !/

andxg�1 .!/ WD sup

˛12�.A1/

sup˛22B

�2 .˛1/

g1.˛1; ˛2; !/:

Finally, denote by N 1.!; r;�/ and xN1.!; r;�/ the infimum and supremum, respectively,of the set of Nash equilibrium payoffs of type ! in the repeated game with discount rate

4The lower semi-continuity is relative to the norm topology on �.X/, not the weak*-topology.

7

r and period length � (i.e. the repeated game of the previous section with discount factore�r� and monitoring structure .Y �; q�/).

The following result is the analogue of Theorem 1 for games with frequent interactions.The proof, relegated to the appendix, is an immediate application of the main result inGossner (2011).

Theorem 2. Suppose that behavioral types have full support, i.e. �b is a dense subset of�.A1/. Then, for every strategic type ! with �.f!g/ > 0,

g�1.!/ 6 lim inf

r!0lim inf�!0

N 1.!; r;�// 6 lim supr!0

lim sup�!0

xN1.!; r;�// 6 xg�1 .!/:

Moreover, if the stage game is non-degenerate and the long-run player’s action are information-flow identified, then xg�1 .!/ D g

�

1.!/ and hence,

limr!0

lim�!0

N1.!; r;�/ D{xg�1 .!/

}:

Thus, what emerges from the analysis, and especially from the examples below, is thatthe appropriate notion of signal informativeness to study reputation effects in the frequentinteraction limit is the one given by the function h, the infinitesimal version of the relativeentropy that I called information flow. This is key, for if other notions of informativenesswere used, such as the one based on the total variation distance used by Fudenberg andLevine (1992) in discrete time, the resulting payoff bounds may become completely unin-formative in the frequent interaction limit, as demonstrated in the next section.

To illustrate the result, I provide explicit calculations of the information flow h in fiveexamples of monitoring structures with frequent interactions. The examples will also beuseful to clarify the connection between the frequent-interaction limit and repeated gamesthat are played directly in continuous time.

Example 1 (Sampling from a Poisson process). As in Abreu, Milgrom, and Pearce (1991),assume there is an underlying counting process, N.t/, whose instantaneous arrival rateis a function of the action profile. The players can adjust their actions at discrete timest D 0;�; 2�; : : :, and at any time t in this grid the value of N.t/ is publicly observedbefore the players get to choose their time t actions. Thus, in the formalism of Section 2,the public signals are the increments yn D N.n�/ �N..n � 1/�/, so that

Y � D f0; 1; 2; : : :g and q�.yja/ D e��.a/�.�.a/�/y=yŠ for all y 2 f0; 1; : : :g;

where the function � W A ! Œ0;1/ gives the arrival rate. Then, the information flow canbe explicitly calculated:

h.˛1; ˛01I˛2/ D �.˛

01; ˛2/ � �.˛1; ˛2/ � �.˛1; ˛2/ ln

(�.˛01; ˛2/�.˛1; ˛2/

);

8

for every ˛1, ˛01 2 �.A1/ and ˛2 2 �.A2/. By the concavity of the logarithm, h.˛1; ˛01I˛2/is always nonnegative, and it can be readily verified that h.˛1; ˛01I˛2/ D 0 if and only if�.˛01; ˛2/ D �.˛1; ˛2/. In particular, if the long-run player has two pure actions, then thegame is information-flow identified if and only if the arrival rates induced by the two pureactions are different regardless of the short-run player’s mixed action; if the long-run playerhas three or more pure actions, then the game cannot be information-flow identified.

Thus, if the long-run player has two pure actions, which induce different arrival rates,then the set of Nash equilibrium payoffs of a strategic type will converge to his Stackelbergpayoff as the period length and the discount rate tend to zero (assuming a non-degeneratestage game and a prior with full support). By contrast, consider the repeated product-choicegame where the long-run player is a firm who chooses whether to provide high or lowquality, and the short-run players are consumers who decide whether to buy the high-end orthe low-end product of the firm; the stage-game payoffs are:

High-end Low-endHigh 2; 3 0; 2

Low 3; 0 1; 1

Then, if the Poisson arrivals are “good news,” i.e. �.High; a2/ > �.Low; a2/ for everya2, then the only sequential equilibrium of the complete information game (when � issufficiently small) is the repetition of the static equilibrium .Low;Low-end/ after everyhistory, regardless of the discount rate, by an argument similar to that in Abreu, Milgrom,and Pearce (1991).5 �

Example 2 (Sampling from a Brownian motion). As in Sannikov and Skrzypacz (2007),assume there is an underlying diffusion process, X.t/, which evolves according to

dX.t/ D �.at / dt C dZ.t/; t 2 Œ0;1/;

where Z is a standard Brownian motion and the drift is given by function � W A ! R.Players can adjust their actions at times t D 0;�; 2�; : : :, and at any time t in this gridthe value of X.t/ is publicly observed before the players get to choose their actions at D.at1; a

t2/, so that

Y � D R and q�.�ja/ � N(�.a/�;�

):

Thus, for any � > 0 and any mixed-action profile, the induced distribution over publicsignals is a mixture of Gaussian distributions. The Kullback-Leibler divergence between

5Although Abreu, Milgrom, and Pearce (1991) only examine repeated games between two long-run players,their arguments also apply to games between a long-run player and a sequence of short-run players.

9

mixtures of Gaussians does not admit a tractable representation in terms of the divergencesbetween the pairs of Gaussians in the support of the mixtures. Indeed, there is no closed-form expression for the divergence between mixtures of Gaussians. Fortunately, we are onlyinterested in the flow of information h, that is, the limit as �! 0 of the quotient betweenthe divergence and the period length �, and that turns out to have a closed form. Indeed, asecond-order Taylor approximation argument can be used to show that the information flowis

h.˛1; ˛01I˛2/ D

1

2

(�.˛01; ˛2/ � �.˛1; ˛2/

)2;

for every ˛1, ˛01 2 �.A1/ and ˛2 2 �.A2/, so that h.˛1; ˛01I˛2/ D 0 if and only if�.˛01; ˛2/ D �.˛1; ˛2/. In particular, if the long-run player has two pure actions, then thegame is information-flow identified if and only if the drifts induced by the two pure actionsare different regardless of the mixed action of the short-run player; if the long-run playerhas three or more actions, then the game cannot be information-flow identified.

Thus, also in this example, the incomplete information game features strong reputationeffects, while the equilibrium of the complete information game becomes degenerate as theperiod length tends to zero, as shown in Fudenberg and Levine (2007, 2009). �

Example 3 (Binomial approximation of Brownian motion I). Consider the monitoring tech-nology .Y �; q�/, where

Y � D{p

�;�p�}

and q�(˙p�∣∣a) D 1

2˙ �.a/

p�;

for given functions � W A ! R and � W A2 ! .0;1/. This relates to Brownian motionvia Donsker’s Invariance Principle: For any fixed a 2 A, the family of stochastic processes.S�/�>0,

S�.t/ D

Œt=��∑nD0

yn�; t 2 Œ0;1/;

where y1�; y2�; : : : are i.i.d. random variables with distribution q�.�ja/, converges weakly

to a diffusion process with drift �.a/ and volatility 1, as � ! 0. Here, as expected, theinformation flow h takes exactly the same form as in the previous example, as can be readilyverified. However, this is not true for all discrete-time approximations of Brownian motion,as shown by the next two examples. �

Example 4 (Binomial approximation of Brownian motion II). Consider the monitoring tech-nology .Y �; q�/, where

Y � D{�.a/�˙

p� W a 2 A

}and q�

(�.a/�˙

p�ja

)D1

2

10

for given � W A ! R. Here, too, the cumulative process S� (defined as in the previousexample) satisfies the conditions of Donsker’s Theorem, and so it converges weakly to adiffusion with drift �.a/ and volatility 1. Suppose that the long-run player has only twoactions and that those two actions induce different values of � irrespective of the short-runplayer’s action. Then, for every positive �, the short-run players can perfectly monitor thelong-run player (so that h D 1). In particular, the long-run player’s actions are identifiedfor every positive �. Thus, given any non-degenerate stage game and any prior with fullsupport on behavioral types, the set of Nash equilibrium payoffs of any strategic type con-verges to his Stackelberg payoff as � tends to 0, irrespective of patience (i.e., for any fixedpositive discount rate r).

Nonetheless, the limit continuous-time game has full support imperfect monitoring:changes of drift induce absolutely continuous changes of the probability measure over thetrajectories of the diffusion (over any finite horizon). A companion paper, Faingold (2013),shows that in the continuous-time game the Nash equilibrium payoffs of a strategic typeconverge to his Stackelberg payoff as the discount rate r tends to 0. But, for fixed dis-count rate, the equilibrium payoffs in the continuous-time game are typically lower thanthe Stackelberg payoff (see Faingold and Sannikov (2011)), unlike the frequent interac-tion limit discussed in the previous paragraph, which does not require r ! 0 to yield areputation effect. �

Example 5 (Binomial approximation of Brownian motion III). Suppose A1 D f0; 1g andconsider the monitoring technology .Y �; q�/, where

Y � D{˙p�;˙

p�.1 ��/

}and q�

(˙p�.1 � a1�/ja

)D1

2for a1 2 f0; 1g:

Then, the cumulative process S�, defined as in Example 3, converges weakly to a standardBrownian motion irrespective of a.6 Thus, in the continuous time limit the signal is com-pletely uninformative, and therefore only static equilibria are possible. Nonetheless, foreach positive �, the signal distributions under the two pure actions have disjoint support,hence there is perfect monitoring for each � > 0. Just like in the previous example, herea reputation result obtains in the limit as the period length � tends to zero for any fixeddiscount rate. But, unlike the previous example, only static equilibria are possible in thecontinuous-time game, irrespective of patience. So, here, the discontinuity between discreteand continuous time is even more dramatic.7 �

6It can be readily verified that the conditions of Donsker’s Theorem are satisfied.7The reader might be, at first, unsatisfied with this example (and the previous one), because the approxima-

tion does not have full support. But the example can be slightly modified to deliver a full support approximationwith essentially the same features. Indeed, suppose that when player 1 chooses a1 2 f0; 1g then with proba-

11

4 Discussion

I discuss the connection between the frequent interation bounds based on the informationflow h and the limit of the FL bounds as the period length tends to zero. I argue that in thefrequent interaction limit, the FL bounds may become completely uninformative. Hence,the improvement provided by Gossner (2011) (on which Theorem 2 is based) turns out tobe necessary to obtain useful reputation bounds for frequent interaction games and patientlong-run player. This is unlike the discrete-time case with fixed period length, where theimprovement of the Gossner bounds over the FL bounds can only be useful when the long-run player is impatient.

I first review the main building block in Fudenberg and Levine’s (1992) proof of theirreputation bounds, which is a statistical learning result related to Blackwell and Dubins(1962)’s classical merging theorem.8 Let .�;F / be a measurable space and .Fn/n>0 afiltration. Given P and Q probability measures on .�;F / with Q << P , define

dn.Q;P / D ess supB2FnC1

jP.BjFn/ �Q.BjFn/j;

where ess sup denotes the essential supremum of a family of measurable functions.9

Uniform Merging Theorem (Fudenberg and Levine (1992)).10 For every � 2 .0; 1�, " >0 and � > 0, there exists a non-negative integer K such that, for every P , Q and Q0

probability measures on .�;F / with P D �QC .1 � �/Q0, one has

Q(

#fn > 0 W dn.Q;P / > "g > K)6 �: (2)

bility 1 � � the public signal is drawn from f˙p�.1 � a1�/g with equal probabilities, and with probability

� it is drawn from f˙p�.1 � .1 � a1/�/g also with equal probabilities. Such a quadrinomial approximation

also satisfies the conditions of Donsker’s Theorem, but it has support independent of a1 for any �. It can bereadily verified that h D 1 also in this case, and hence a reputation result obtains in the frequent-interactionlimit, although in the continuous-time game the monitoring is pure noise.

8See Sorin (1999) for further connections between merging and reputation.9The ess sup operator gives the smallest measurable function that is an almost sure upper bound on a given

family of measurable functions. The essential supremum above is well-defined because Q << P (Neveu,1975).

10The Q-almost sure convergence dn.Q;P / ! 0 is an immediate implication of Blackwell and Dubins’s(1962) merging theorem. The significance of the uniform merging theorem of Fudenberg and Levine (1992) isthat it strengthens the conclusion of the latter under a stronger form of absolute continuity, known as the grainof truth condition: the theorem establishes the uniformity of the rate of convergence K over all P , Q and Q0

that satisfy P D �QC .1� �/Q0 for a fixed size � of the “grain of truth.” Merging also plays a central role inthe literature on rational learning in games (Kalai and Lehrer (1993)).

12

For concreteness, let us review how this result relates to the FL bounds in the contextof the product-choice game with Poisson signals of Example 1. In fact, to make thingseven simpler, let us assume that only the long-run player controls the arrival rate �. Fixthe period length � > 0 and consider a behavioral type O! committed to playing High withprobability strictly greater than 1=2. Denote the mixed action of this behavioral type by O1.Let � denote the prior probability on type O!. Let � be an arbitrary Nash equilibrium of therepeated game with period length �. Thus, in the notation of Section 2,

P�;� D �P O!;� C .1 � �/P: O!;� :

Next, choose ".�/ > 0 small enough that supy jq�.yj˛1/ � q

�.yj O1/j 6 ".�/ implies˛1.High-end/ > 1=2. For future reference, I note that for any choice of ".�/ for which thelatter implication holds, one has lim sup�!0 ".�/=� <1.11

If the normal firm deviates and mimics the behavioral type O!, the payoff from thisdeviation must be lower than the payoff the firm receives in equilibrium. The uniformmerging lemma will help finding a lower bound on the firm’s payoff under this deviation.Fix � > 0 and let the integer K D K.�; �; O!/ be such that inequality (2) holds for " D".�/, P D P�;� , Q D P O!;� , Q0 D P: O!;� and Fn equal to the product � -field on Hn

2 .This implies that, under the deviation, with probability at least 1 � � there are at mostK.�; �; O!/ periods in which the consumers are expecting the firm to choose high qualitywith probability less than 1/2. Hence, with probability at least 1 � � under the deviation,there are also at most K.�; �; O!/ periods in which the consumers are not consuming thehigh-end product.12 Assume, conservatively, that such exceptional periods are precisely theK.�; �; O!/ initial periods. This yields the following lower bound for the payoff the firmreceives in any Nash equilibrium:

FL.r;�; �; O!/ WD .1 � �/.3 � O1.High//e�r�K.�;�; O!/: (3)

Notice that K.�; �; O!/ depends only on the period length � (through ".�/), on � and onthe behavioral type O!. In particular, it is independent of the discount rate r . Thus, lettingr ! 0 first, and then � ! 0 and O1.High/ ! 1=2, yields the Stackelberg payoff of 2:5.This is the argument of Fudenberg and Levine (1992).

Now, consider what happens to the FL bound, FL.r;�; �; O!/, when we send � to 0before sending r to 0. To figure out this limit, we need to understand how K.�; �; O!/

varies with �. Or, more generally, how K in the uniform merging theorem varies with ".The following result (proved in the appendix) provides sharp estimates:

11This follows from the fact that supy jq�.yj˛1/ � q

�.yj O1/j D j O1.High/ � ˛1.High/j � j�.Low/ ��.High/j�C o.�/ and O1.High/ > 1=2.

12Recall that whenever the consumers assign probability greater than 12 to high quality, their best-reply is to

consume the high-end product.

13

Lemma 1. For each � 2 .0; 1� and ", � > 0, let K�."; �; �/ be the smallest non-negativeinteger such that inequality (2) holds for all probability measures P;Q and Q0 with P D�QC .1 � �/Q0. Then, for every � 2 .0; 1� and � > 0,

lim sup"!0

"2K�."; �; �/ <1 :

Furthermore, if Fn ¤ FnC1 for all n, then for every > 0 and � 2 .0; 1�, there existsN� > 0 such that for every 0 < � < N�,

lim"!0

"2� K�."; �; �/ D1 :

Now, consider the Fudenberg-Levine bound (3) withK.�; �; O!/ D K�.".�/; �; �.f O!g//.Then, the second part of the lemma above (with D 1) yields ".�/K.�; �; O!/ ! 1 as� ! 0. But since lim sup�!0 ".�/=� < 1, we have that the real time �K.�; �; O!/ !1 as �! 0, and hence

lim�!0

FL.r;�; �; O!/ D 0;

for every � > 0 small enough and every O! with O1.High/ > 1=2. Thus, for the product-choice game with Poisson signals of Example 1, the FL bounds become uninformative inthe frequent-interaction limit.

A Appendix

A.1 Proof of Theorem 2

The main result of Gossner (2011) yields the following lower bound on N 1.!; r;�/:

supO!2�b

w�O! ..1 � e

�r�/ ln�. O!// D supO!2�b

w�O! .r� ln�. O!/C o.�//;

where, for each O! 2 �b , the function " 7! w�O!."/ is the pointwise supremum over all

convex functions that are below

" 7! inf{g1. O!; ˛2I!/ W ˛2 2 B

�2;". O!/

}and

B�2;". O!/ D{˛2 W ˛2 is undominated and 9˛1 s.t. H.q�.�j O!; ˛2/jjq�.�j˛1; ˛2// 6 "; ˛2 2 arg max

˛02

g2.˛1; ˛02/}:

Since for all O!, ˛1 and ˛2,

H.q�.�j O!; ˛2/jjq�.�j˛1; ˛2// > h. O!; ˛1I˛2/�C o.�/;

14

we must haveB�2;.1�e�r�/ ln�. O!/. O!/ � B

�2;r ln�. O!/Co.1/. O!/;

and hence,

supO!2�b

w�O! ..1 � e

�r�/ ln�. O!// > supO!2�b

w�O!.r ln�. O!/C o.1//;

where o.1/ is a function that converges to zero as �! 0, and " 7! w�O!."/ is the pointwise

supremum over all convex functions that are below

" 7! inf{u1. O!; ˛2/ W ˛2 2 B

�2;". O!/

}:

The result then follows from the lower semi-continuity of the function h, by taking the limitas �! 0 and r ! 0.13 The proof for the upper bound is analogous, and hence omitted.

A.2 Proof of Lemma 1

We begin with a few lemmas that are needed in the proof of Lemma 1. The first lemmabelow is related to the Borel-Cantelli lemma. Under a stronger assumption than the latter, itasserts that, in addition to the events happening infinitely often with probability one, thereis positive probability that the upper density of the indices for which the event obtains isbounded above 0.

Lemma 2. Let A1; A2; : : : be a sequence of independent events on a probability space.�;F ; P /. Suppose that c D lim infn!1 1

n

∑nkD1 PAk > 0. Then P.An i.o./ D 1 and,

in addition,

P

(lim supN!1

# fn 6 N W An obtainsgN

> c)> lim inf

N!1P

(# fn 6 N W An obtainsg

N> c

)>1

2:

Proof. Since the assumption implies that∑n PAn D 1, it follows from Borel-Cantelli

that P.An i.o./ D 1. Let us turn to the statement concerning the upper density. By theCentral Limit Theorem, ∑n

kD1 IAk� PAk√∑n

kD1 PAk

�! N .0; 1/ ;

13See Gossner (2011, Lemma 8) for details of a similar continuity argument.

15

where the convergence is in distribution. Let 0 < c < lim infn!1 1n

∑nkD1 PAk . We have:

P

(1

n

n∑kD1

IAk> c

)D P

∑nkD1 IAk

� PAk√∑nkD1 PAk

>cn �

∑nkD1 PAk√∑n

kD1 PAk

> P

∑nkD1 IAk

� PAk√∑nkD1 PAk

> 0

for n sufficiently large :

Therefore, denoting by ˆ the standard normal distribution function, we have

lim infn

P

(1

n

n∑kD1

IAk> c

)> 1 �ˆ.0/ D 1=2;

as required. �

Lemma 3. If pk D12.1C 1=k˛/ with ˛ > 1=2, then

1∑kD1

ln.1

4pk.1 � pk// <1 and

1∑kD1

.ln.pk

1 � pk//2 <1 :

Proof. Follows directly from the definition of pk , the inequality ln.1C x/ < x for x > 0,and the fact that

∑1kD1 k

�� <1 for all � > 1. �

Lemma 4. Let P;Q and Q0 be such that P D �Q C .1 � �/Q0. Let �n be the posteriorprobability on the event that the process follows Q, conditional on Fn. Then 8c > 0,

Q.�n 6 c�0/ 6 c :

Proof. Since the odds ratio .1 � �n/=�n is a martingale under Q, we have:

Q.�n 6 c�1/ D Q

(1 � �n

�n>1 � c�0

c�0

)6

EQŒ.1 � �n/=�n�

.1 � c�0/=.c�0/

D.1 � �0/=�0

.1 � c�0/=.c�0/6 c;

as was to be shown. �

16

Lemma 5. Suppose that .Fn/n > 1 is a strictly increasing sequence of atomic � -algebras.Then, 8� > 1=2 and 8� 2 .0; 1�, 9P;Q and Q0 with P D �Q C .1 � �/Q0, such that8c > 0,

lim infN!1

Q(#fn > 1 W dn.P;Q/ > c=N �

g > N)> 1=2 ;

Proof. First, we prove the result for the case in which the measurable space is the set of allinfinite sequences of Heads and Tails and the � -algebra Fn is that generated by all historiesof length n. Let Q be the probability measure corresponding to i.i.d. draws of an unbiasedcoin. Let Q0 be the probability corresponding to independent draws of a coin such thatthe probability of Heads in period n is pn D 1=2.1 C 1=n˛/, where 1=2 < ˛ < �. LetP D �QC.1��/Q0 and �n be the posterior probability that the stochastic process followsQ conditional on period n information.

Notice that dn.P;Q/ D .pn � 1=2/.1 � �n/. Denote by �n the f0; 1g-valued ran-dom variable that takes value 1 if and only if the coin turns out Heads at time n. Explicitcomputation of Bayes rule yields for all K > 0:

Q

(dn.P;Q/

�n>4K=�1

n�

)D Q

(1 � �n

�n>8K=�1

n��˛

)D Q

(n∏kD1

.2pk/�k .2.1 � pk//

.1��k/ >8K=.1 � �1/

n��˛

)

D Q

n∑kD1

�k ln.2pn/Cn∑kD1

.1 � �k/ ln.2.1 � pk// > ln.8K=.1 � �1//DWB

� .� � ˛/

DW >0

lnn

D Q

(n∑kD1

ln.pk

1 � pk/�k >

n∑kD1

ln.1

2.1 � pk//C B � lnn

)

D Q

(n∑kD1

ln.pk

1 � pk/.�k �

1

2/ >

1

2

n∑kD1

ln.1

4pk.1 � pk//C B � lnn

)

> Q

(j

n∑kD1

ln.pk

1 � pk/.�k �

1

2/j 6 lnn �

1

2

n∑kD1

ln.1

4pk.1 � pk// � B

)(4)

Chebishev’s inequality yields for all M > 0:

Q

(j

n∑kD1

ln.pk

1 � pk/.�k �

1

2/j > M

)6∑1kD1.ln.

pk

1�pk//2

4M 2

By Lemma 3,∑1kD1.ln.

pk

1�pk//2 <1, hence we can choose M > 0 large enough so that

Q

(j

n∑kD1

ln.pk

1 � pk/.�k �

1

2/j > M

)61

2: (5)

17

On the other hand, by Lemma 3, also∑1kD1 ln. 1

4pk.1�pk// < 1. Therefore, for n suffi-

ciently large,

lnn �1

2

n∑kD1

ln.1

4pk.1 � pk// � B > M :

Therefore, by the inequalities (4) and (5) above, for all n sufficiently large,

Q

(dn.P;Q/=�n >

4K=�1

n�

)>1

2

By Lemma 4, Q(�n 6 1

4�1)6 1

4. Thus, for n large enough,

Q(dn.P;Q/ > K=n�

)C1

4> Q

(dn.P;Q/ > K=n�

)CQ

(�n 6

1

4�1

)> Q

(dn.P;Q/=�n >

4K=�1

n�

)>1

2:

We have thus proved that,

lim infn!1

Q(dn.P;Q/ > K=n

)>1

4:

Therefore, the events fdn.P;Q/ > K=n�g satisfy the assumption of Lemma 2 and we have

lim infN!1

Q

(#fn 6 N W dn.P;Q/ > K=n�g

N>1

4

)>1

2:

Since

Q

(#fn 6 N W dn.P;Q/ > K=N �g

N>1

4

)> Q

(#fn 6 N W dn.P;Q/ > K=n�g

N>1

4

);

it follows that

lim infN

Q

(#fn W dn.P;Q/ > K=N �

g >1

4N

)>1

2;

which proves the desired result for the case in which .�;F ; .Fn/n > 1/ is the canonicalcoin tossing space.

If .�;F / is an arbitrary space with a strictly increasing atomic filtration .Fn/, then, forevery n, there are at least two disjoint atomic events An; Bn such that An; Bn 2 Fn nFn�1.Proceed as in the previous construction, identifying the event An with “Heads” and Bn with“Tails”. �

We are now ready to prove Lemma 1.

18

First, a close look at S. Sorin’s proof of the Fudenberg-Levine uniform merging the-orem (Sorin, 1999, Lemma 2.5) reveals that the integer K of inequality (2) can be takenas C="2, where C > 0 is a constant that depends only on � and �. Hence, we havelim sup"!0 "

2K�."; �; �/ 6 lim sup"!0 "2K D C <1, which is the first claim.

Consider the second claim. Assume the filtration is strictly increasing. Suppose, to-wards a contradiction, that for some q > 0 and � 2 .0; 1� there exist a function ."; �/ 7!N."; �/ and a sequence �k ! 0 with

lim sup"!0

"2�qN."; �k/ <1 ;

and such that for all P;Q and Q0 with P D �QC .1 � �/Q0 one has:

Q.#fn > 1 W dn.P;Q/ > "g > N."; �k// 6 �k

for all k > 1 and for all " > 0 sufficiently small.

For each k > 1, let Mk D 1 C lim sup"!0 "2�qN."; �k/. Define a double sequence

"m;k D .Mk=m/1=.2�q/ for all k;m > 1. Hence we have for all k > 1,

lim supm!1

N."m;k; �k/=m < 1

By Lemma 5, there exist probability measures P;Q and Q0, with P D �Q C .1 � �/Q0,such that for all k > 1,

lim infm!1

Q(#fn > 1 W dn.P;Q/ > "m;kg > N."m;k; �k/

)> 1=2

Since �k < 1=2 for k large enough, it follows that for every N" > 0 and every k sufficientlylarge 9" 2 .0; N"� such that:

Q.#fn > 1 W dn.P;Q/ > "g > N."; �k// > �k ;

which is a contradiction. The contradiction shows that

lim"!0

"2�qK�."; �; �/ D1

for every q > 0, � 2 .0; 1� and � sufficiently small, as was to be shown.

References

Abreu, Dilip, Paul Milgrom, and David Pearce (1991), “Information and timing in repeatedpartnerships.” Econometrica, 59, 1713–1733.

19

Blackwell, D. and L. Dubins (1962), “Merging of opinions with increasing information.”Ann. Math. Statist., 38, 882–886.

Bohren, Aislinn (2011), “Stochastic games in continuous time: Persistent actions in long-run relationships.” Univeristy of California, San Diego.

Cover, Thomas M. and Joy A. Thomas (2006), Elements of Iinformation Theory. Wiley,second edition.

Faingold, Eduardo (2006), “Building a reputation under frequent decisions.” Mimeo., Uni-versity of Pennsylvania.

Faingold, Eduardo (2013), “Maintaining a reputation in continuous time.” Yale University.

Faingold, Eduardo and Yuliy Sannikov (2011), “Reputation in continuous-time games.”Econometrica, 79, 773–876.

Fudenberg, Drew and David K. Levine (1989), “Reputation and equilibrium selection ingames with a single long-run player.” Econometrica, 57, 759–778.

Fudenberg, Drew and David K. Levine (1992), “Maintaining a reputation when strategiesare imperfectly observed.” Review of Economic Studies, 59, 561–579.

Fudenberg, Drew and David K. Levine (2007), “Continuous time models of repeated gameswith imperfect public monitoring.” Review of Economic Dynamics, 10, 173–192.

Fudenberg, Drew and David K. Levine (2009), “Repeated games with frequent signals.”Quarterly Journal of Economics, 124, 233–265.

Gossner, Olivier (2011), “Simple bounds on the value of a reputation.” Econometrica, 79,1627–1641.

Kalai, Ehud and Ehud Lehrer (1993), “Rational learning leads to nash equilibrium.” Econo-metrica, 61, 1019–1045.

Kreps, David and Robert Wilson (1982), “Reputation and imperfect information.” Journalof Economic Theory, 27, 253–279.

Milgrom, Paul and John Roberts (1982), “Predation, reputation and entry deterrence.” Jour-nal of Economic Theory.

Neveu, J. (1975), Discrete-Parameter Martingales. Elsevier Science.

Sannikov, Yuliy and Andrzej Skrzypacz (2007), “Impossibility of collusion under imperfectmonitoring with flexible production.” American Economic Review, 97, 1794–1823.

20

Sannikov, Yuliy and Andrzej Skrzypacz (2010), “The role of information in repeated gameswith frequent actions.” Econometrica, 78, 847–882.

Sorin, Sylvain (1999), “Merging, reputation and repeated games with incomplete informa-tion.” Games and Economic Behavior, 29, 274–308.

21

reputation and the flow of information in repeated …ef253/2013_02_20_repflow.pdfand roberts(1982))...

Documents