
ENDOGENOUS EXPERIMENTATION IN ORGANIZATIONS∗

Germán Gieczewski† Svetlana Kosterina‡

June 2020

Abstract

We study policy experimentation in organizations with endogenous membership. An organization initiates a policy experiment and then decides when to stop it based on its results. As information arrives, agents update their beliefs, and become more pessimistic whenever they observe bad outcomes. At the same time, the organization's membership adjusts endogenously: unsuccessful experiments drive out conservative members, leaving the organization with a radical median voter. We show that there are conditions under which the latter effect dominates. As a result, policy experiments, once begun, continue for too long. In fact, the organization may experiment forever in the face of mounting negative evidence. This result provides a rationale for obstinate behavior by organizations, and contrasts with models of collective experimentation with fixed membership, in which under-experimentation is the typical outcome.

JEL codes: D71, D83

Keywords: experimentation, dynamics, median voter, endogenous population

Word count: 11607

∗We would like to thank Daron Acemoglu, Avidit Acharya, Alessandro Bonatti, Glenn Ellison, Robert Gibbons, Faruk Gul, Matias Iaryczower, Navin Kartik, Wolfgang Pesendorfer, Nicola Persico, Andrea Prat, Caroline Thomas, Juuso Toikka, the audiences at the 2018 American Political Science Association Conference (Boston), the 2018 Columbia Political Economy Conference, the 2019 Canadian Economic Theory Conference (Montreal) and the 2020 Utah Winter Organizational and Political Economics Conference, as well as seminar participants at MIT, Princeton, Indiana University Kelley School of Business, University of Calgary and UCEMA for helpful comments. All remaining errors are our own.

†Department of Politics, Princeton University. Address: 311 Fisher Hall, Princeton NJ 08540. Phone number: 6172330642. E-mail: [email protected].

‡Department of Politics, Princeton University. Address: 300 Fisher Hall, Princeton NJ 08540. Phone number: 9179756491. Email: [email protected].


1 Introduction

Organizations frequently face opportunities to experiment with promising but

untested policies. Two conclusions follow from our understanding of experimentation

to date. First, experimentation should respond to information. That is, when a pol-

icy experiment performs badly, agents should become more pessimistic about it, and

if enough negative information accumulates, they should abandon it. Second, when

experimentation is collective, the temptation to free-ride and fears that information

will be misused by other agents lower incentives to experiment.1 Thus organizations

should experiment too little. However, history is littered with examples of organi-

zations that have stubbornly persisted with unsuccessful policies to the bitter end.

This presents a puzzle for our understanding of decision-making in organizations.

Consider, for example, the case of Theranos, a Silicon Valley start-up founded

by Elizabeth Holmes in 2003. Theranos sought to produce a portable machine capable

of running hundreds of medical tests on a single drop of blood. If successful, Theranos

would have revolutionized medicine, but its vision was exceedingly difficult to realize.

Over the course of ten years, the firm invested over a hundred million dollars into

trying to attain Holmes’s vision, while devoting little effort to developing a more

incremental improvement over existing technologies as a fall-back plan. Theranos

eventually launched in 2013 with a mixture of inaccurate and fraudulent tests, and

the ensuing scandal irreversibly damaged the company.

Up to the point where Theranos began to engage in outright fraud, a pattern

repeated itself. The company would bring in high-profile hires and create enthusiasm

with its promises, but once inside the organization, employees and board members

would gradually become disillusioned by the lack of progress.2 As a result, many

left the company,3 with those who were more pessimistic about Theranos’s prospects

being more likely to leave than those who saw Holmes as a visionary. While the board

came close to removing Holmes as CEO early on, she managed to retain control for

many years after, because too many of the people who had lost faith in her leadership

1 See, for example, Keller, Rady and Cripps (2005) and Strulovici (2010).

2 For instance, Theranos's lead scientist, Ian Gibbons, told his wife that nothing at Theranos was working, years after joining the company. See Carreyrou (2018).

3 For example, board member Avie Tevanian, while mulling over a decision to buy more shares of the company at a low price, was asked by a friend: "Given everything you now know about this company, do you really want to own more of it?" See Carreyrou (2018).


had left the organization before they could have a majority.

The takeaway from this example is that Theranos experimented for too long in

spite of not succeeding precisely because the pessimistic members of the organization

kept leaving. Motivated by this and similar examples, we propose an explanation for

obstinate behavior by organizations that rests on three key premises. First, agents

disagree about the merits of different policies, that is, they have heterogeneous prior

beliefs. Second, the membership of organizations is fluid: agents are free to enter and

leave in response to information. Third, the organization’s policies are responsive to

the opinions of its members.

We operationalize these assumptions in the following model. An organization

chooses between a safe policy and a risky policy in each period. The safe policy

yields a flow payoff known by everyone, while the risky policy yields an uncertain

flow payoff, which may be higher or lower than that of the safe policy depending

on its type. There is a continuum of agents. In every period, each agent decides

whether to work for the organization or independently. The payoff of working for

the organization depends on its policy. The outside option yields a guaranteed flow

payoff.

All agents want to maximize their returns but hold heterogeneous prior beliefs

about the quality of the risky policy. As long as agents work for the organization,

they remain voting members of the organization and vote on the policy it pursues. We

assume that the median voter—that is, the member with the median prior belief—

chooses the organization’s policy. Whenever the risky policy is used, the results are

publicly observed.

We show that experimentation in organizations is inefficient in two ways that

are novel to the literature. First, there is over-experimentation from the point of

view of all agents. Over-experimentation takes a particularly stark form: the organi-

zation experiments forever regardless of the outcome. Second, the organization may

experiment more if the risky policy is bad than if it is good. In other words, the

organization’s policy may respond to information in a perverse way.

Our main result provides a simple necessary and sufficient condition under which

perpetual experimentation is the unique equilibrium outcome. The condition requires

that, at each history, the pivotal agent prefers perpetual experimentation to no ex-


perimentation. We then analyze the comparative statics. We show that perpetual

experimentation is more likely when the outside option is more attractive, the orga-

nization’s safe policy is less attractive, agents are more patient, and the distribution

of prior beliefs contains more optimists in the MLRP sense.

Two forces affect the amount of experimentation in our model. On the one

hand, the median member of an organization is reluctant to experiment today if she

anticipates losing control of the organization tomorrow as a result. On the other

hand, if no successes are observed, as time passes, only the most optimistic mem-

bers remain in the organization, and these are precisely the members who want to

continue experimenting the most. The first force makes under-experimentation more

likely, while the second pushes the organization to over-experiment. Ex-ante, it is not

obvious which force will dominate. We show that the second force often dominates.

This underpins the main result of our paper.

We next show that our result of perpetual experimentation is robust. Our

baseline model features perfectly informative good news. We show that perpetual ex-

perimentation also obtains under bad news and imperfectly informative good news.

In addition, a novel result arises when news are imperfectly informative: for appro-

priately chosen parameter values, there is an equilibrium in which the organization

stops experimenting with a strictly positive probability only if enough successes are

observed. Hence, counterintuitively, the organization is more likely to experiment for-

ever if the technology is bad than if it is good.4 The implication is that self-selection of

agents into organizations may not only induce excessive experimentation overall—it

may also cause organizations to actively radicalize in the face of failure. Conversely,

success may make organizations more conservative and prone to backing away from

the very strategies that brought them success.

Finally, we show that our result is robust to general voting rules; settings in

which the members’ flow payoffs, or the learning rate, depend on the size of the

organization; an alternative specification in which agents have common values; and

a specification in which the size of the organization is fixed rather than variable and

the organization hires agents based on ability.

The rest of the paper proceeds as follows. Section 2 discusses the applications of

4 Approximately, for this to happen, it is sufficient for the distribution of prior beliefs to be single-dipped enough over some interval.


the model. Section 3 reviews the related literature. Section 4 introduces the baseline

model. Section 5 analyzes the set of the equilibria in the baseline model. Section 6

considers other learning processes. Section 7 develops other extensions of the model,

such as general voting rules and settings where the members’ flow payoffs depend on

the size of the organization.

2 Applications

Our model has a variety of applications besides the one discussed in the Intro-

duction. In this Section, we discuss how our assumptions map to several settings such

as cooperatives, non-profits, activist organizations, firms and political parties.

We first consider experimentation in a cooperative. Agents are individual pro-

ducers who own factors of production. In a dairy cooperative, for example, each

member owns a cow. The agent can manufacture and sell his own dairy products

independently or he can join the cooperative. If he joins, his milk will be processed

at the cooperative’s plants, which benefit from economies of scale. The cooperative

can choose from a range of dairy production policies, some of which are riskier than

others. For instance, it can limit itself to selling fresh milk and yogurt, or it can

develop a new line of premium cheeses that may or may not become profitable. Dairy

farmers have different beliefs about the market viability of the latter strategy. Should

this strategy be used, only the more optimistic farmers will choose to join or remain

in the cooperative. Moreover, cooperatives typically allow their members to elect

directors.

In the case of activist organizations, agents are citizens seeking to change the

government’s policy or the behavior of multinational corporations. Agents with envi-

ronmental concerns can act independently by writing to their elected representatives,

or they can join an organization, such as Greenpeace, that has access to strategies not

available to a citizen acting alone, such as lobbying, demonstrations, or direct action—

for instance, confronting whaling ships. While all members of the organization want

to bring about a policy change, their beliefs as to the best means of achieving this

goal differ. Some support safe strategies, such as lobbying, while others prefer riskier

ones, such as direct action.


An organization that employs direct action will drive away its moderate mem-

bers, increasingly so if its attempts are unsuccessful. The resulting self-selection can

sustain a base of support for extremist strategies. Our model can thus explain the

behavior of fringe environmental groups, such as Extinction Rebellion, Animal Lib-

eration Front and Earth Liberation Front that engage in ecoterrorism and economic

sabotage in spite of the apparent ineffectiveness of their approach.5 The same logic

applies to other forms of activism, as well as to charitable organizations choosing

between more or less widely understood poverty alleviation tactics, for example, cash

transfers as opposed to microcredit.6

Our model is also relevant to the functioning of political parties. Here agents are

potential politicians or active party members, and the party faces a choice between

a widely understood mainstream platform—for example, social democracy—and an

extremist one which may be vindicated or else fade into irrelevance. A communist

platform that claims the collapse of capitalism is imminent is an example of the latter.

Again, the selection of extremists into extremist parties, which intensifies when such

parties are unsuccessful, explains their rigidity in the face of setbacks. For example,

the French Communist Party declined from a base of electoral support of roughly

20% in the postwar period to less than 3% in the late 2010s.7 Despite this dramatic

decline, partly caused by the demise of the Soviet Union, they have preserved the

main tenets of their platform, such as the claim that the capitalist system is on the

verge of collapse.

3 Related Literature

This paper is related to the literature on strategic experimentation with multiple

agents (Keller et al. 2005, Keller and Rady 2010, Keller and Rady 2015, Strulovici

5See, for example, https://www.theguardian.com/commentisfree/2019/apr/19/extinction-rebellion-climate-change-protests-london.

6 Note that in these examples agents should be modeled as having common values, since agents benefit from a change in public policy regardless of how much their actions contributed to it. Although we write our main model for the case of private values, we show in Section 7 that our main results survive in the common values setting. Another way to accommodate common values in our model is to endow agents with expressive payoffs, whereby agents benefit not just from a policy change but also from having participated in the efforts that brought it about.

7See, for example, Bell (2003), Brechon (2011) and Damiani and De Luca (2016).


2010), as well as the literature on dynamic decision-making in clubs (Acemoglu et

al. (2008, 2012, 2015), Roberts 2015, Bai and Lagunoff 2011, Gieczewski 2019). The

idea that members of a declining organization may react by leaving or pushing for

policy changes has been discussed qualitatively by Hirschman (1970).

In Keller, Rady and Cripps (2005), multiple agents with common priors control

two-armed bandits of the same type which may have breakthroughs at different times.

In their model, there is under-experimentation due to free-riding. In contrast, we

study an organization making a single collective decision in each period about whether

to experiment. While the organization experiments, members can exit and collect an

outside option, but at the cost of their voting rights. The selection of optimists into

the organization causes excessive experimentation.8

In Strulovici (2010) a community of agents decides by voting whether to col-

lectively experiment with a risky technology. Agents have common priors, but ex-

perimentation gradually reveals some of them to be winners and others to be losers

from the risky technology. In equilibrium, there is too little experimentation because

agents fear being trapped into using a technology that turns out to be bad for them.

A similar motive to under-experiment is present in our model. Indeed, consider

an agent who would prefer to experiment today but not tomorrow. If she anticipates

that learning will result in an extreme optimist coming to power tomorrow, then she

may choose not to experiment today, lest she be forced to over-experiment or switch to

her inefficient outside option. However, there are two important differences between

our model and Strulovici’s. First, in our model, agents can switch to an outside

option at the cost of their voting rights. This novel assumption is what allows for the

organization to be captured by extremists. Second, in Strulovici’s model, learning

exacerbates the conflict between agents, while in our model learning helps agents

converge to a common belief.

The literature on decision-making in clubs studies dynamic policy-making in set-

tings where current policy choices determine current flow payoffs and future control

of the club. Some papers in this literature focus on discrete policy spaces (Acemoglu

et al. 2008, 2012, 2015, Roberts 2015), as we do, while others consider continuous

8 While there is free-riding in the sense that outsiders benefit from the option value of experimentation, it is not socially costly because we assume the learning rate to be independent of the size of the organization. See Section 7 for an extension with an endogenous learning rate.


policy spaces (Bai and Lagunoff 2011, Gieczewski 2019). In all these papers, tensions

arise due to conflicting preferences; in contrast, in our model, they are the result of

heterogeneous beliefs. Furthermore, we allow these beliefs to change endogenously

and stochastically due to experimentation, whereas the existing work considers deter-

ministic models.9 Like the present paper, Gieczewski (2019) studies a setting in which

agents are free to join or leave an organization, and only exert control over the policy

if they are members. The rest of the clubs literature, on the other hand, considers

settings in which the policy choice influences the set of the decision-makers directly.

Our main results can only be obtained in a model with learning and heterogeneous

beliefs, and are thus unique to our paper. In particular, in the equilibrium with

perpetual experimentation that we characterize, the organization persists in using a

policy that, in the limit, is preferred by almost nobody. The observation that such

an equilibrium can exist is novel to the literature.

4 The Model

Time t ∈ [0,∞) is continuous. There is an organization that has access to a

risky policy and a safe policy. The risky policy is either good or bad and its type is

persistent. We use the notation θ = G,B for each respective scenario.

The world is populated by a continuum of agents, represented by a continuous

density f over [0, 1]. The position of an agent in the interval [0, 1] indicates her beliefs:

an agent x ∈ [0, 1] has a prior belief that the risky policy is good with probability x.

All agents discount the future at rate γ.

At every instant, each agent chooses whether to be a member of the organization.

Agents can enter and leave the organization at no cost.10 Agents who choose not to

be members work independently and obtain a guaranteed autarkic flow payoff a. The

flow payoffs of members depend on the organization’s policy.

Whenever the organization uses the safe policy (πt = 0), all members receive a

guaranteed flow payoff s. When the risky policy is used (πt = 1), its payoffs depend

9 The only exception is Acemoglu et al. (2015), but they only consider exogenous stochastic shocks.

10Our main results survive if we assume that agents cannot reenter after exiting.


on the state of the world. If the risky policy is good, it succeeds according to a

Poisson process with rate r. If the risky policy is bad, it never succeeds. Each time

the risky policy succeeds, all members receive a lump-sum unit payoff. At all other

times, the members receive zero while the risky policy is used.

We assume that 0 < a < s < r. That is, the organization’s safe policy is

preferable to working independently.11 Moreover, the risky policy would be the best

choice were it known to be good, but the bad risky policy is the worst of all options.

When the risky policy is used, its successes are observed by everyone. By

Bayes’ rule, the posterior belief of an agent with prior x who has seen k successes

after experimenting for a length of time t is

$$\frac{x}{x + (1-x)\,L(k,t)},$$

where $L(k,t) = \mathbf{1}_{\{k=0\}}\, e^{rt}$. (Note that all posteriors jump to 1 after a success.) Since

L(k, t) serves as a sufficient statistic for the agents’ beliefs, suppressing the dependence

on k and t, we take L = L(k, t) to be the state variable in our model and define p(L, x)

as the posterior of an agent with prior x when the state variable is L. We use Lt to

denote the value of L at time t.
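To make the updating rule concrete, the following is a minimal numerical sketch (our own illustrative code, not part of the paper; parameter values and function names are ours) of the sufficient statistic L(k, t) and the posterior p(L, x) under perfectly informative good news.

```python
# Minimal sketch of belief updating under perfectly informative good news.
# All parameter values below are illustrative.

import math

def L_state(k: int, t: float, r: float) -> float:
    """Sufficient statistic L(k, t) = 1_{k=0} * e^{r t}: any success sends L to 0."""
    return math.exp(r * t) if k == 0 else 0.0

def posterior(x: float, L: float) -> float:
    """Posterior p(L, x) of an agent with prior x."""
    return x / (x + (1.0 - x) * L)

r, x = 1.0, 0.6
print(posterior(x, L_state(k=0, t=2.0, r=r)))  # no success by t = 2: belief falls
print(posterior(x, L_state(k=1, t=2.0, r=r)))  # a success: belief jumps to 1
```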

We focus on Markov Perfect Equilibria, that is, equilibria in which the strategies

condition only on the information about the risky policy revealed so far and on the

incumbent policy. We let m(L, π) denote the equilibrium median member given the

state variable L and the incumbent policy π.

The structure of the game is as follows. At each instant t > 0, policy and

membership decisions are made. That is, first the median member of the organization,

m(Lt, πt), chooses a policy to be used by the organization in the immediate future.12

After this, all agents are allowed to enter or leave the organization.

11 Our model features one organization with access to a risky technology. We can, however, allow for the existence of other organizations that only have access to a safe technology. a can be interpreted as the (maximal) productivity of these alternative organizations. The assumption a < s means that the organization with access to the risky technology also enjoys a competitive advantage in the use of the safe technology. If a ≥ s, then our main results still go through but become less interesting as there is no longer an opportunity cost to having the organization experiment.

12 For the median to be well-defined, the set of members must be Lebesgue-measurable. This will not be an issue, since in equilibrium the set of members is always an interval.


To simplify the presentation of the model, we will make the following assump-

tions. First, we assume that the organization is using the risky policy in the beginning

of the game, that is, π0 = 1.13 Second, we assume that a switch to the safe policy is

irreversible.14

Formally, a pure Markov strategy profile is given by a policy function α(L)

and a membership function β(x, L, π), where α(L) = 1 if the organization continues

experimenting in state L and α(L) = 0 if it stops, and β(x, L, π) = 1 if x is a member

in state (L, π).15

Definition 1. An equilibrium satisfies the following:

(i) Agents choose whether to be members based on their flow payoffs: x is a member

at time t if and only if s+ πt(p(Lt, x)r − s) > a.

(ii) The organization continues experimenting at time t if and only if m(Lt, 1)’s pay-

off from the equilibrium continuation is greater than the payoff from switching

to the safe policy, s/γ.

The reason that agents make membership decisions based on their flow payoffs

is that there is a continuum of agents, so an agent obtains no value from her ability to

vote. Part (ii) of the definition embeds an important additional assumption about the

timing of policy and membership decisions: it implies that, for the organization to stop

experimenting, a majority of those who chose to be members under experimentation

must be in favor of stopping. In other words, we rule out equilibria in which a large

number of agents who dislike the current policy join the organization and immediately

change its policy.

The equilibrium we define can be obtained as a limit of the equilibria of a

discrete-time game in which membership and policy decisions are made at times

13 Starting with the safe policy at t = 0 is equivalent to starting with the risky policy, unless the population median finds the continuation in the latter scenario inferior to the payoff from never experimenting, in which case the safe policy is used forever.

14 We show in the Appendix that this assumption is without loss of generality: in a more general model in which the organization can change policies any number of times, switches to the safe policy are permanent in every equilibrium. The reason is that switching to the safe policy brings in more pessimistic members, and hence yields control to an even more pessimistic median than the one who chose to stop experimenting.

15 We can also define non-Markov strategies by making α and β functions of the history h rather than the payoff-relevant state (L, π), but this will not be necessary for our analysis.


t ∈ {0, ε, 2ε, . . .} with ε > 0 small. In this game, at each time t in {0, ε, 2ε, . . .}, first

the incumbent median chooses a policy πt for time [t, t+ε), and then all agents choose

whether to be members. The agents who choose to be members at time t – and hence

accrue the flow payoffs generated by policy πt – are the incumbent members at time

t + ε. The median of this set of members then chooses πt+ε. In this game, equilibria

in which all agents coordinate to join the organization at a particular time and vote

to stop experimentation do not exist because each agent can profitably deviate by

waiting one more period to join.

An additional tie-breaking rule is required to eliminate undesirable equilibria of

the following variety. For example, α(L) = 0 for all L – that is, all pivotal agents

immediately switch to the safe policy – is always compatible with equilibrium: any

agent who deviates towards experimenting will see her decision immediately over-

turned, so Condition (ii) holds vacuously. To rule out such equilibria, we will require

optimal behavior even when the agent’s policy choice only affects the path of play for

an infinitesimal amount of time.16

(iii) If m(Lt, 1) expects experimentation to stop immediately regardless of her action

but strictly prefers experimentation to continue for any small enough length of

time ε > 0 rather than stop, then α(Lt) = 1.

5 Equilibria in the Baseline Model

In this section we characterize the equilibria of the model described above. The

presentation of the results is structured as follows. We first explain who the members

of the organization are depending on what has happened in the game so far. We then

reduce the problem of equilibrium characterization to finding an optimal stopping

time. Next, we state our first main result, which asserts that the equilibrium in

which the organization experiments forever is unique whenever it exists (Proposition

1). We present comparative statics in Proposition 2. Finally, in Proposition 3 we

characterize the set of equilibria in the alternative case where experimentation cannot

go on forever.

16 Condition (iii) is in the spirit of weak dominance: an agent who prefers experimentation should experiment if she expects her successors to tremble and continue experimenting with some positive probability.


We start with two useful observations. First, because the bad risky policy never

succeeds, the posterior belief of every agent with a positive prior jumps to 1 if a

success is observed. Since r > s, a, if a success is ever observed, the risky policy is

always used thereafter, all agents enter the organization and remain members forever.

Second, whenever the risky policy is being used, the set of members is the

set of agents for whom p(L, x)r ≥ a. It is clear that, for any L > 0 (that is, if

no successes have been observed), p(L, x) is increasing in x. That is, agents who

are more optimistic at the outset remain more optimistic after observing additional

information. Hence the set of members is an interval of the form [yt, 1], where yt is

defined by p(Lt, yt) = a/r. Note that yt is uniquely determined as a function of Lt, and

m(Lt, 1) can be calculated as the median of F restricted to [yt, 1].
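As an illustration of these two observations, the sketch below (our own code, with illustrative parameters; the uniform prior distribution is only an example) computes the marginal member y_t solving p(L_t, y_t) = a/r and the median member m(L_t, 1).

```python
# Illustrative sketch: marginal member y_t and median member m(L_t, 1).
# The uniform prior distribution is an example, not an assumption of the paper.

import math

def marginal_member(L: float, a: float, r: float) -> float:
    """Solve y / (y + (1 - y) L) = a / r for y."""
    return a * L / (r - a + a * L)

def median_member(L: float, a: float, r: float, F, F_inv) -> float:
    """Median of the prior distribution F restricted to [y_t, 1]."""
    y = marginal_member(L, a, r)
    return F_inv(0.5 * (F(y) + 1.0))

# Example with uniform priors on [0, 1]: F(x) = x and F_inv is the identity.
a, r, t = 0.5, 1.0, 2.0
L = math.exp(r * t)  # no successes observed up to time t
print(marginal_member(L, a, r))                                   # y_t
print(median_member(L, a, r, F=lambda x: x, F_inv=lambda q: q))   # m_t
```

Any other prior distribution F with a known inverse F_inv can be plugged in the same way.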

These observations, together with our assumption that switching to the safe

policy is irreversible, imply that any equilibrium path must have the following struc-

ture. The risky policy is used until some time T ∈ [0,∞]. If it succeeds by then,

it is used forever. Otherwise the organization switches to the safe policy at time

T .17 While no successes are observed, agents become more pessimistic over time and

the organization becomes smaller. As soon as a success occurs or the organization

switches to the safe policy, all agents join and remain members of the organization

forever, and no further learning occurs.

Proposition 1 states our first main result. Before stating it, we introduce several

definitions. We let V (x) denote the continuation utility of an agent with belief x,

provided that she expects experimentation to continue forever. This is the payoff

that agent x will get from staying in the organization until some time t(x) when her

posterior reaches a/r

(if a success is not observed by then), then exiting the organization

and working on her own, rejoining the organization if and only if a success is observed.

We let mt denote the median voter at time t provided that the organization has

experimented unsuccessfully up to time t (that is, mt = m(e^{rt}, 1)), and we let pt(mt) denote mt's posterior in this case (that is, pt(mt) = p(e^{rt}, mt)). Note that V (x), mt

and pt(mt) are all uniquely defined and exogenous functions of the primitives; explicit

formulas for them are given in Claims 9.1, 9.2 and 9.3 in the Appendix.
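For concreteness, the exit time t(x) at which agent x's posterior reaches a/r follows directly from the updating rule by setting the posterior equal to a/r; the code below is our own illustration, with arbitrary parameter values.

```python
# Minimal sketch: the time t(x) at which an agent with prior x reaches the
# posterior a / r under unsuccessful experimentation, obtained by solving
# x / (x + (1 - x) e^{r t}) = a / r. Parameter values are illustrative.

import math

def exit_time(x: float, a: float, r: float) -> float:
    """Time at which agent x leaves the organization if no success occurs."""
    if x <= a / r:
        return 0.0  # too pessimistic to be a member during experimentation
    return (1.0 / r) * math.log(x * (r - a) / (a * (1.0 - x)))

print(exit_time(x=0.8, a=0.5, r=1.0))  # leaves after a finite spell
print(exit_time(x=0.4, a=0.5, r=1.0))  # never a member while the risky policy is used
```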

It is clear that there exists an equilibrium in which the organization experiments

17If T =∞, the risky policy is used forever.


forever if and only if V (pt(mt)) ≥ s/γ

for all t. Indeed, for such an equilibrium to exist,

it must be that for all t the pivotal agent at time t, mt, weakly prefers to continue

experimenting. By construction, V (pt(mt)) and s/γ are her payoffs from continuing and from stopping experimentation, respectively, under the expectation that future pivotal agents

will never stop experimenting.

We can show that the set of parameters for which the condition V (pt(mt)) ≥ s/γ

is satisfied for all t is non-empty. In particular, the condition is satisfied if the payoff a

from the outside option is not too low compared to the payoff s from the safe policy.18

Proposition 1 shows that the equilibrium with perpetual experimentation is unique,

whenever it exists.

Proposition 1. Whenever there is an equilibrium in which the organization experi-

ments forever, this is the unique equilibrium.

[Figure 1: Median voter (mt), indifferent voter (xt), and marginal member (yt) on the equilibrium path, as functions of time t. Regions shown: pro-risky policy, pro-safe policy, non-members.]

The intuition for the equilibrium dynamics is illustrated in Figure 1. As the

organization experiments unsuccessfully on the equilibrium path, all agents become

more pessimistic. That is, pt(x) is decreasing in t for fixed x. Letting xt denote the

agent indifferent about continuing experimentation at time t, so that V (pt(xt)) = s/γ,

this implies that xt must be increasing in t. Thus there is a shrinking mass of agents

in favor of the risky policy (the agents shaded in blue in Figure 1) and a growing

mass of agents against it (the agents shaded in red and green). For high t, almost all

agents agree that experimentation should be stopped.

However, growing pessimism induces members to leave. Hence the marginal

18 An explicit threshold such that the condition holds if a exceeds this threshold is derived in Corollary 9.1 in the Appendix.


[Figure 2: Posterior beliefs pt(mt), pt(xt), and pt(yt) on the equilibrium path, as functions of time t.]

member becomes more extreme, and so does the median member. If mt ≥ xt for all

t, that is, if the prior of the median is always higher than the prior of the indifferent

agent, then the agents in favor of the risky policy always retain a majority within the

organization, due to most of their opposition forfeiting their voting rights.

Figure 2 shows the same result in the space of posterior beliefs. The accu-

mulation of negative information puts downward pressure on pt(mt) as t grows but

selection forces prevent it from converging to zero. Instead, pt(mt) converges to a be-

lief strictly between 0 and 1, which is above the critical value pt(xt) in this example.

Hence the median voter always remains optimistic enough to continue experimenting.

To establish whether this equilibrium entails over-experimentation, we need a

definition of over-experimentation in a setting with heterogeneous priors. We will use

the following notion. Consider an alternative model in which an agent with initial

belief x controls the policy at all times. It is well-known that whenever 0 < x < 1,

the agent would experiment until some finite time depending on x. We say that

an equilibrium of our model features over-experimentation from x’s point of view if

experimentation continues longer than that. By this definition, when the condition

V (pt(mt)) ≥ s/γ

for all t is satisfied, there is over-experimentation from the point of

view of all agents except those with prior belief exactly equal to 1.

The level of experimentation in equilibrium is determined by the interaction of

two opposing forces, in addition to the usual incentives present in the canonical single-

agent bandit problem. When the pivotal agent decides whether to stop experimenting


at time t, she takes into account the difference in the expected flow payoffs generated

by the safe policy and the risky one, as well as the option value of experimenting

further. However, because the identity of the median voter changes over time, the

pivotal agent knows that if she chooses to continue experimenting, the organization

will stop at a time chosen by some other agent, which she likely considers suboptimal.

This force encourages her to stop experimentation while the decision is still in her

hands, leading to under-experimentation. It is similar to the force behind the under-

experimentation result in Strulovici (2010) in that, in both cases, agents prefer a

sub-optimal amount of experimentation because they expect a loss of control over

future decisions if they allow experimentation to continue. It is also closely related

to the concerns about slippery slopes faced by agents in the clubs literature (see, for

example, Bai and Lagunoff (2011) and Acemoglu et al. (2015)).

The second force stems from the endogeneity of the median voter’s position in

the distribution. As discussed above, the more pessimistic a fixed observer becomes

about the risky policy, the more extreme the median voter is. This effect is so strong

that, as time passes, the posterior belief of the median after observing no successes

does not converge to zero, and the median voter may choose to continue experimenting

when no successes have been observed for an arbitrarily long time.

Next, we explain why the equilibrium with perpetual experimentation is unique.

The key here is that if an agent prefers to experiment forever rather than not at all,

then she also prefers to experiment for any finite amount of time T rather than

not at all. Thus if the median conjectured that the organization will experiment

for some finite time T instead of forever, the median still would not want to stop

experimentation.

The central result that we use in the proof here is that the value function

WT (x) of an agent with prior x who expects experimentation to continue for time T

is single-peaked in T . That is, there is a time T ∗ such that if the agent had control

over the policy at all times, the agent would experiment for time T ∗, and the farther

away the actual length of experimentation is from T ∗, the less happy the agent is.

Since W0(x) = s/γ

and WT (x) is increasing in T before the peak, the agent prefers

to experiment for time T before the peak rather than not at all. Moreover, since

limT→∞ WT (x) = V (x) and V (x) > s/γ

by our hypothesis, the fact that WT (x) is

decreasing in T after the peak implies that the agent also prefers to experiment for


time T after the peak rather than not at all. This establishes our result.

We can check whether there is perpetual experimentation in equilibrium by

calculating inft V (pt(mt)). For certain families of belief densities, either an exact

expression or a lower bound for inft V (pt(mt)) can be given.19 For instance, if the

density f is non-decreasing, then the following holds:

$$\gamma \inf_{t \ge 0} V(p_t(m_t)) \;=\; \gamma\, V\!\left(\frac{2a}{r+a}\right) \;=\; \frac{2ra}{r+a} \;+\; \left(\frac{1}{2}\right)^{\gamma/r} \frac{a(r-a)}{r+a}\cdot\frac{r}{\gamma+r}.$$

The value function of the pivotal agent has a simple interpretation. Under a non-

decreasing density f , as the organization experiments unsuccessfully, the posterior

belief of the median converges to 2ar+a

from above. The first term on the right-hand

side then represents the agent’s expected flow payoff from experimentation: it is the

product of the probability 2ar+a

that the asymptotic median assigns to the risky policy

being good, and the expected flow payoff r from the good risky policy. The second

term is the option value derived from the agent’s ability to leave the organization

when she becomes pessimistic enough, and to return if there is a success.
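As a quick numerical check of this condition (a sketch using the displayed formula for a non-decreasing density; parameter values and function names are ours, chosen only for illustration), one can compare γ inf_t V(pt(mt)) with s:

```python
# Numerical check of the perpetual-experimentation condition for a
# non-decreasing prior density, using the displayed formula for
# gamma * inf_t V(p_t(m_t)). Parameter values are illustrative only.

def gamma_inf_value(r: float, a: float, gamma: float) -> float:
    """Right-hand side of the displayed expression."""
    flow_term = 2.0 * r * a / (r + a)
    option_term = 0.5 ** (gamma / r) * (a * (r - a) / (r + a)) * (r / (gamma + r))
    return flow_term + option_term

def perpetual_experimentation(r: float, s: float, a: float, gamma: float) -> bool:
    """Condition gamma * inf_t V(p_t(m_t)) >= s, i.e. inf_t V >= s / gamma."""
    return gamma_inf_value(r, a, gamma) >= s

print(perpetual_experimentation(r=1.0, s=0.7, a=0.6, gamma=0.1))  # True here
print(perpetual_experimentation(r=1.0, s=0.9, a=0.2, gamma=0.5))  # False here
```

Consistent with the comparative statics discussed next, raising a or lowering s or γ makes the condition easier to satisfy in this sketch.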

Our next major result concerns the comparative statics of our model. We show

that perpetual experimentation is more likely when the payoffs from the risky policy

and the outside option are high, the payoff from the organization’s safe policy is low,

the agents are patient, and there are many optimists.

Proposition 2. If there is an equilibrium with perpetual experimentation under pa-

rameters (r, s, a, γ, f), then the same holds for any set of parameters (r, s′, a′, γ′, f ′)

such that s′ ≤ s, a′ ≥ a, γ′ ≤ γ and f ′ MLRP-dominates f .20

The intuition behind this result is as follows. An increase in the payoff s from

the safe policy makes the safe policy more attractive and has no effect on the expected

payoff from perpetual experimentation. An increase in patience is equivalent to an

increase in the rate of learning from experimentation, and a higher learning rate allows

agents to make better entry-exit decisions.

An increase in the number of optimists leaves the value function and the marginal

19See Proposition 9 in the Appendix for more details.20We say that g MLRP-dominates f if x 7→ g(x)

f(x) is non-decreasing for x ∈ [0, 1).


member unchanged but results in a more optimistic median who is more likely to

support experimentation.21 An increase in a has two effects that favor experimenta-

tion. First, it increases the payoff from perpetual experimentation because the agent

expects to quit the organization and collect the outside option payoff with some prob-

ability. Second, it induces agents to quit, which leaves the organization with a more

radical median voter.

Finally, an increase in r has three effects. On the one hand, it makes experi-

mentation more attractive, both by directly increasing the payoff from the good risky

policy and by increasing the learning rate. At the same time, it induces more agents

to stay in the organization, which makes the median more pessimistic. Hence the

overall effect is ambiguous. However, for a well-behaved family of densities, the first

two effects dominate, so that perpetual experimentation is also more likely for high

r.22

If there does not exist an equilibrium with perpetual experimentation, there

may be multiple equilibria featuring different levels of experimentation, supported by

different off-path behavior.23 To state the results, we let T denote the time such that

the initial median is indifferent between switching to the safe policy and continuing

to experiment if she anticipates that, should she continue, the organization will stop

experimentation at time T .

Remark. Any equilibrium stopping time must lie in [0, T].

Note that experimentation must stop by time T , as otherwise the initial median

would switch to the safe policy immediately. Proposition 3 speaks to the extent of

experimentation when there does not exist an equilibrium with perpetual experimen-

tation.

Proposition 3.

21 For example, if f is uniform, mt is the midpoint between yt and 1, while if f is increasing, then mt is closer to 1 than yt.

22 Specifically, for each ω ≥ 0, let fω(x) denote a density in the power-law family given by fω(x) = (ω + 1)(1 − x)^ω. If f = fω for some ω, then inf_t V (pt(mt)) is increasing in r. See Proposition 9 and the proof of Proposition 2 in the Appendix for details.

23 More generally, a pure strategy equilibrium can be described by a sequence t0 < t1 < t2 < . . . of stopping times as follows. For any t ∈ (tn−1, tn], if the risky policy was used in the period [0, t] and no successes were observed, the organization continues using it until time tn. If the risky policy has not succeeded by tn, the organization switches to the safe policy at tn.


1. There are parameters under which24 all t ∈ [0, T] are equilibrium stopping times.

2. If for all t ∈ [0, t], mt prefers to experiment forever rather than not at all, then

in any equilibrium the organization experiments up to at least t.

It can be shown that, because WT is single-peaked in T , the initial median’s ideal

stopping time lies between 0 and T . Then the first part of the Proposition implies that

both over- and under-experimentation are possible depending on which equilibrium is

played. Under-experimentation obtains if an early median voter expects that, should

she continue experimenting, the next stopping time will be too far in the future.

The second part of the Proposition obtains for the following reason. If all

pivotal agents up to some time t prefer experimenting forever to not at all, then,

because WT is single-peaked, these agents will never stop experimenting. Therefore,

the equilibrium stopping time must be at least t. This means that our result of

perpetual experimentation survives in an approximate form even when, for instance,

the support of the prior distribution is truncated away from 1.25

[Figure 3: Welfare gap between the equilibrium and the socially optimal stopping time, plotted against the outside option a (horizontal axis marked 0, s, r); the regions are labeled by whether the equilibrium stopping time is finite (t < ∞) or infinite (t = ∞).]

Figure 3 illustrates the welfare effect of varying the quality of the outside option.

The blue curve is the difference between the initial median’s equilibrium utility and

24 The condition on the parameters roughly amounts to requiring that pt(mt) does not decrease too steeply in t. For example, pt(mt) being constant in t would be sufficient. See Proposition 10 in the Appendix for the exact condition and further details.

25 Formally, suppose that for some density f there is perpetual experimentation, and let fy be f truncated at y, that is, fy(x) ∝ f(x) 1_{x ≤ y}. Denote by ty the minimal equilibrium stopping time as a function of y. Then ty → ∞ as y → 1.


her utility from experimentation at her most preferred time. The shaded blue region

represents the range of welfare outcomes that obtain when multiple equilibria exist.

The reason this region occupies only part of the space between the two dotted lines

is that, as a increases, some equilibria disappear, so the range of obtainable welfare

outcomes shrinks.

In this example, the welfare gap in the worst equilibrium, compared to the initial

median’s optimal stopping time, is U-shaped. The reason is as follows. When a ≥ s,

the organization experiments forever. Here perpetual experimentation is optimal from

the point of view of all agents. This is because, since the organization’s safe policy is

no better than the outside option, agents who lose faith in the risky policy prefer to

switch to the outside option rather than use the organization’s safe policy.

For a < s relatively close to s, the organization still experiments forever, but

this is now inefficient from the point of view of all agents, with the size of the welfare

loss increasing as the gap between s and a grows. The inefficiency comes from the

fact that pessimistic agents are denied access to the organization’s safe policy.

For even lower values of a, perpetual experimentation is impossible because

too few agents leave the organization. Here a range of outcomes is possible. These

outcomes may include the initial median’s preferred stopping time. Finally, when a is

very close to 0, we can show that the equilibrium is again unique. In this equilibrium,

few agents ever leave, but the median still becomes more optimistic over time as bad

news arrive. This means that, generally, the decision is still made by an agent more

optimistic than the initial median. Moreover, the gap between the initial median and

the median who stops is growing in the number of people who leave, which itself is

increasing in a.

6 Other Learning Processes

The baseline model presented above has two salient features. First, experimen-

tation has a low probability of generating a success, which increases agents’ posterior

beliefs substantially, and a high probability of generating no successes, which lowers

their posteriors slightly. In other words, the baseline model is a model of good news.

Second, because the risky policy can only succeed when it is good, good news are


perfectly informative.

In this Section, we relax these assumptions, presenting variants of the model

which allow for imperfectly informative good news and for bad news. In the first case,

we show that our finding of over-experimentation is robust to imperfectly informative

news. We also show that the organization may respond perversely to information,

becoming more reluctant to experiment after a success. In the case of perfectly

informative bad news, in contrast, there is typically under-experimentation.

6.1 A Model of Bad News

In this section we consider the same model as in Section 4, except that the risky

policy now generates different flow payoffs. In particular, if the risky policy is good,

it generates a guaranteed flow payoff r. If it is bad, it generates a guaranteed flow

payoff r but also experiences failures, which arrive according to a Poisson process

with rate r. Each failure lowers the payoffs of all members by 1. Thus, as in the

baseline model, the expected flow payoff from using the risky policy is r when it is

good and 0 when it is bad. The learning process, however, is different.
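To fix ideas, the updating implied by this bad news process can be sketched as follows (our own illustrative code, not the paper's): a single failure reveals the bad state, while a failure-free spell of length t multiplies the odds in favor of the good state by e^{rt}.

```python
# Illustrative sketch of belief updating under perfectly informative bad
# news: the good policy never fails, the bad one fails at Poisson rate r.
# Any failure sends the posterior to 0; no failures over [0, t] make agents
# more optimistic. Parameter values are illustrative only.

import math

def posterior_bad_news(x: float, t: float, failures: int, r: float) -> float:
    """Posterior of an agent with prior x after the risky policy has been
    used for time t with the given number of failures."""
    if failures > 0:
        return 0.0
    # P(no failure | good) = 1, P(no failure | bad) = e^{-r t}
    return x / (x + (1.0 - x) * math.exp(-r * t))

x, r = 0.6, 1.0
print(posterior_bad_news(x, t=2.0, failures=0, r=r))  # optimism grows
print(posterior_bad_news(x, t=2.0, failures=1, r=r))  # failure: posterior is 0
```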

The dynamics of organizations under bad news differ substantially from those

in the baseline model. As is usual in models of bad news, as long as no failures are

observed, all agents become more optimistic about the risky technology, so the orga-

nization expands over time instead of shrinking. This gradual expansion continues

either forever or until some time T unless a failure occurs, in which case the organiza-

tion switches to the safe technology and all agents previously outside the organization

become members. Interestingly, the switch to the safe technology must happen upon

observing a failure but may happen even if no failures are observed.

As before, mt is the median member at time t provided that the risky policy

has been used up to time t and no failures have been observed. pt(mt) is the median’s

posterior belief at time t, and V (pt(mt)) is her continuation value when experimen-

tation is expected to continue forever unless there is a failure. Let us use t to denote

the earliest time when an agent with V (pt(mt)) < s/γ

is pivotal. Proposition 4 provides

a characterization of the equilibria in this variant of the model.

Proposition 4.


1. If V (pt(mt)) > s/γ

for all t, then there is a unique equilibrium. In it, the organi-

zation experiments forever.

2. If V (pt(mt)) < s/γ

for some t, then in any equilibrium the organization stops

experimenting at a finite time T < t.26

Proposition 4 shows that perpetual experimentation is the unique equilibrium

outcome if all pivotal agents prefer it to the safe policy. If, however, some pivotal

agents are pessimistic enough to stop experimentation, the organization will always

switch to the safe policy even before any of these pessimists become pivotal. Note

that, even when perpetual experimentation arises in the bad news setting, it does

not constitute over-experimentation, as it is possible only when all agents agree that

perpetual experimentation is optimal.

To understand these results, it is instructive to consider the associated single-

agent bandit problem. In a model of bad news, the agent switches to the safe policy

permanently upon observing a failure, and becomes more optimistic over time if she

pursues the risky policy and observes no failures. The more optimistic the agent

becomes, the more she wants to continue using the risky policy. Hence the agent will

either want to experiment forever or not at all.

Two implications follow from this observation. First, pessimistic agents with

V (pt(mt)) < s/γ

always switch to the safe policy when they are pivotal: by assumption,

they prefer no experimentation to perpetual experimentation, and thus also to any

other continuation. Second, optimistic agents with V (pt(mt)) > s/γ

have stronger

incentives to experiment if they expect experimentation to continue in the future:

only then can they collect the option value of learning about the policy. For them,

current and future experimentation are strategic complements.

These implications lead to the organization stopping experimentation strictly before t, as follows. Optimistic agents with V (pt(mt)) > s/γ

are willing to experiment if they expect

perpetual experimentation in the continuation. However, agents who are pivotal

shortly before t know that any experimentation they attempt will be short-lived.

Thus they will choose to stop experimenting even if they are optimists. In turn, their

expected behavior may induce even earlier pivotal agents to switch to the safe policy

as well.

26This is true so long as t > 0. If t = 0, then T = 0.


To summarize, in a bad news setting, over-experimentation is never possible

from the point of view of any pivotal agent, while under-experimentation is possible,

and always obtains when experimentation does not continue forever. These results

stand in stark contrast to those of our previous models. The results depend on a

special feature of the perfectly informative bad news learning process: a failure creates common knowledge that the risky policy is bad. Because of this, there is no

room for organizational capture by optimists who would disagree with the majority.

6.2 A Model of Imperfectly Informative (Good) News

We now treat the case of imperfectly informative news, which allows for much richer dynamics than the baseline model: agents' beliefs, rather than decreasing monotonically or else jumping permanently to 1, can move in both directions, drifting down in the absence of successes and jumping up when successes arrive. For brevity, we consider the case of good news, but similar results

can be obtained for imperfectly informative bad news.

The model is the same as in Section 4 except for the payoffs generated by the

risky policy. If the risky policy is good, it generates successes according to a Poisson

process with rate r. If it is bad, it generates successes according to a Poisson process

with rate c. We now assume that r > s > a > c > 0.

As before, the effect of past information on the agents’ beliefs can be aggregated

into a one-dimensional sufficient statistic. Suppose the risky policy has been used for

a length of time t and k successes have occurred during that time. Define

\[
L(k, t) = \left(\frac{c}{r}\right)^{k} e^{(r-c)t}.
\]

Then the posterior of an agent with prior x at time t after observing the organization

use the risky policy for a length of time t and achieve k successes is

\[
\frac{x}{x + (1-x)\,L(k,t)}.
\]

We again suppress the dependence of L(k, t) on k and t and use L to denote

our sufficient statistic. Note that high L indicates bad news about the risky policy.


We let $V_x(L)$ denote the value function of an agent with prior $x$ given that the state is $L$ and the organization experiments forever in the continuation. In addition, we denote $x$'s ex ante utility in the same situation by $V(x) = V_x(1)$. The next Proposition shows that, as in Section 5, experimentation can continue forever regardless of how badly the risky policy performs.

Proposition 5. If $V_{m(L)}(L) > s/\gamma$ for all $L$, then there is a unique equilibrium. In this equilibrium, the organization experiments forever.

We can show that there exist parameters such that the inequality $V_{m(L)}(L) > s/\gamma$ is satisfied. For instance, if $f$ is non-decreasing, then the posterior of the median converges to $\frac{2(a-c)}{(r-c)+(a-c)}$, so $\inf_t V(p_t(m_t)) = V\!\left(\frac{2(a-c)}{(r-c)+(a-c)}\right)$. Then it is enough to verify that $V\!\left(\frac{2(a-c)}{(r-c)+(a-c)}\right) \ge s/\gamma$.^{27}

The following result illustrates the novel outcomes that can arise under imper-

fectly informative news.

Proposition 6. There exist parameters such that there is an equilibrium in which

the organization experiments more when the risky policy is bad than when it is good.

The intuition for the result in Proposition 6 is as follows. We first show that, for an appropriately chosen density $f$, an equilibrium of the following form exists: whenever $L = L^*$, the organization stops experimenting with probability $\varepsilon$, and at all other times the organization continues experimenting for sure. For this to work, $f$ must be such that the median is most pessimistic when $L = L^*$.^{28} Moreover, $s$ must be such that $V_{m(L^*)}(L^*) = s/\gamma$, so that the median is indifferent about experimentation at $L^*$, while other agents prefer to continue experimenting when they are pivotal.

The striking feature of this equilibrium is that stopping happens only at an intermediate value of $L$. In particular, if $L^*$ is smaller than the initial $L$, the only way experimentation will stop is if the risky policy succeeds enough times for $L$ to decrease all the way to $L^*$, which is more likely to happen when the risky policy is good.
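The mechanism can be illustrated numerically. The simulation below (a rough sketch, not the equilibrium construction itself) draws Poisson success times under the good and the bad risky policy and checks how often the statistic $L$ ever falls to a hypothetical threshold $L^* < 1$ within a fixed horizon; the threshold, horizon, and rates are arbitrary illustrative choices.

```python
import math
import random

r, c = 2.0, 0.5              # success rates under the good and bad risky policy (illustrative)
L_star, horizon = 0.5, 10.0  # hypothetical stopping threshold and simulation horizon

def hits_threshold(rate, r, c, L_star, horizon, rng):
    """Simulate success arrivals at the given Poisson rate and check whether
    L(k, t) = (c/r)^k * exp((r - c) * t) ever drops to L_star by the horizon."""
    t, k = 0.0, 0
    while True:
        t += rng.expovariate(rate)          # next success time
        if t > horizon:
            return False
        k += 1
        if (c / r) ** k * math.exp((r - c) * t) <= L_star:  # L is lowest right after a success
            return True

rng = random.Random(0)
for label, rate in [("good policy", r), ("bad policy", c)]:
    freq = sum(hits_threshold(rate, r, c, L_star, horizon, rng) for _ in range(20000)) / 20000
    print(f"{label}: P(L ever reaches L*) ~ {freq:.3f}")
# L reaches L* (triggering possible stopping) far more often when the policy is good,
# so experimentation lasts longer, on average, when the policy is bad.
```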

^{27} However, in this case it is not possible to give an exact expression for $V$, owing to the complicated behavior of $L$ over time.

^{28} This occurs, for instance, if $f$ is very high in a small neighborhood of $y(L^*)$. Then, when $L > L^*$, all the pessimists to the left of $y(L^*)$ leave, so that $m(L)$ is more optimistic, while when $L < L^*$, pessimists become members, yielding a lower $m(L)$. This construction is not possible for some densities of prior beliefs. In particular, if $f$ is uniform or follows a power law distribution, then $p(L, m(L))$ is decreasing in $L$ (Lemma 5).


7 Other Extensions

7.1 General Voting Rules

We assume throughout the paper that the median member of the organization is

pivotal. This assumption is not essential to our analysis: our results can be extended

to other voting rules under which the agent at the q-th percentile is pivotal.

It is instructive to consider how the results change as we vary $q$. Letting $q_t$ represent the pivotal agent at time $t$, it is clear that $q_t$ and $p_t(q_t)$ are increasing in $q$ for all $t$. To illustrate further, assume that $f$ is uniform. Then, as $t \to \infty$, the posterior belief of the pivotal agent converges to $\frac{a}{qa+(1-q)r}$ rather than $\frac{2a}{r+a}$. It follows that more stringent supermajority requirements are functionally equivalent to more optimistic leadership of the organization, and make it easier to sustain an equilibrium with excessive experimentation.
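To illustrate, the short computation below (a sketch under the uniform-$f$ assumption, with arbitrary values of $a$ and $r$) evaluates the asymptotic posterior $\frac{a}{qa+(1-q)r}$ of the pivotal agent for several supermajority thresholds $q$.

```python
a, r = 1.0, 3.0  # illustrative flow payoffs: outside option a and good-policy rate r

def asymptotic_pivotal_posterior(q, a, r):
    """Limit posterior of the agent at the q-th percentile of members when f is uniform."""
    return a / (q * a + (1 - q) * r)

for q in [0.5, 0.6, 0.75, 0.9]:
    p = asymptotic_pivotal_posterior(q, a, r)
    print(f"q = {q:.2f}: limiting posterior = {p:.3f}, expected flow payoff = {p * r:.3f}")
# Raising q makes the pivotal agent more optimistic (q = 0.5 recovers 2a/(r+a)),
# so supermajority rules make perpetual experimentation easier to sustain.
```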

7.2 Size-Dependent Payoffs

In some settings the payoffs that an organization generates may depend on its

size. In this section we discuss how different operationalizations of this assumption

affect our results. We show that our main result is robust to this extension, and

discuss how different kinds of size-dependent payoffs may exacerbate or prevent over-

experimentation.

We consider three types of size-dependent payoffs. For the first two, we suppose

that when the set of members of the organization has measure µ, the safe policy

yields a flow payoff sg(µ), the good risky policy yields instantaneous payoffs of size

g(µ) generated at rate r, and the bad risky policy yields zero payoffs. We assume

that g(1) = 1, so that r, s and 0 are the expected flow payoffs from the good risky

policy, the safe policy and the bad risky policy respectively when all agents are in

the organization. For the first type of payoffs we consider, g(µ) is increasing in µ, so

there are economies of scale. For the second type, g(µ) is decreasing in µ, so there is

a congestion effect.

In general, the effect of size-dependent payoffs on the level of experimentation


is ambiguous because of two countervailing effects. On the one hand, when there is a

congestion effect, as the organization shrinks, higher flow payoffs increase the benefits

from experimentation, which makes experimentation more attractive.29 We call this

the payoff effect. On the other hand, because increasing flow payoffs provide incentives

for agents to stay in the organization, the organization shrinks at a lower speed, which

causes the median voter in control of the organization to be more pessimistic about

the risky policy. We call this the control effect. When there are economies of scale,

these effects are reversed.

When there are economies of scale, the set of members may not be uniquely

determined as a function of the state at time t. This is because the more members

there are, the higher payoffs are, so the membership stage may have multiple equilib-

ria. We will assume, for simplicity, that the set of members is uniquely determined.30

It is sufficient to assume that g does not increase too fast.

The following Proposition presents our first result.

Proposition 7. Suppose that $f = f_\omega$.^{31} Let $\bar{g} = \lim_{\mu \to 0} g(\mu)$, and let $V_{g,t}(p_t(m_t))$ denote the utility of the pivotal agent at time $t$ if she expects experimentation to continue forever. If
\[
\lambda^{\gamma/r} a \, \frac{r}{\gamma + r} + \frac{a}{\lambda} \, \frac{\gamma}{\gamma + r} \;>\; r \, \frac{r}{\gamma + r} + a \, \frac{\gamma}{\gamma + r},
\]
then $\lim_{t\to\infty} V_{g,t}(p_t(m_t))$ is strictly increasing in $\bar{g}$ for all $\bar{g} \in [a, \infty)$. In this case, perpetual experimentation obtains for a greater set of parameter values with a congestion effect and for a smaller set of parameter values with economies of scale, relative to the baseline model.

Conversely, if the reverse inequality holds strictly, then $\lim_{t\to\infty} V_{g,t}(p_t(m_t))$ is strictly decreasing in $\bar{g}$ for all $\bar{g} \in [a, \infty)$.

The intuition for the Proposition is as follows. By the same argument as in the baseline model, Proposition 1 holds, and a sufficient condition to obtain experimentation forever is that $V_{g,t}(p_t(m_t)) \ge s/\gamma$ for all $t$. While it is difficult to calculate $V_{g,t}(p_t(m_t))$ explicitly for all $t$, calculating its limit as $t \to \infty$ is tractable and often allows us to determine whether the needed condition holds for all $t$. We show that the limit depends only on $\bar{g}$ rather than the entire function $g$. Moreover, it is a hyperbola in $\bar{g}$, so it is either increasing or decreasing in $\bar{g}$ everywhere. In the first case, size-dependent payoffs affect the equilibrium mainly through the payoff effect, so experimentation is more attractive with a congestion effect and less so with economies of scale. In the second case, the control effect dominates, and the comparative statics are reversed. These statements are precise as $t \to \infty$ (that is, conditional on the risky policy having been used for a long time). We can show that when congestion effects make experimentation more likely in the limit, they do so for all $t$.^{32}

The inequality in the Proposition determines which case we are in. Because $r > \lambda^{\gamma/r} a$ and $\frac{a}{\lambda} > a$, if $r$ is large enough relative to $\gamma$, then over-experimentation is easier to obtain with economies of scale than in the baseline model, and easier to obtain in the baseline model than with a congestion effect. The opposite happens if $\gamma$ is large relative to $r$. The reason is that, under economies of scale, the pivotal decision-maker is very optimistic about the risky policy but expects to receive a low payoff from the first success. If $r/\gamma$ is large, so that successes arrive at a high rate or the agent is very patient, the first success is expected to be one of many, while if $r/\gamma$ is small, further successes are expected to be heavily discounted. Conversely, with a congestion effect, for large $t$ the pivotal decision-maker is almost certain that the risky policy is bad but believes that, with a low probability, it will net a very large payoff before she leaves.
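As a numerical check of which regime applies (a sketch with arbitrary parameter values, using $\lambda = (1/2)^{1/(\omega+1)}$ as in Proposition 9 and the inequality as stated above), the snippet below evaluates both sides of the condition in Proposition 7.

```python
def proposition7_sides(r, a, gamma, omega):
    """Evaluate LHS and RHS of the inequality in Proposition 7:
    lambda^(gamma/r)*a*r/(gamma+r) + (a/lambda)*gamma/(gamma+r)  vs  r*r/(gamma+r) + a*gamma/(gamma+r)."""
    lam = 0.5 ** (1.0 / (omega + 1))
    lhs = lam ** (gamma / r) * a * r / (gamma + r) + (a / lam) * gamma / (gamma + r)
    rhs = r * r / (gamma + r) + a * gamma / (gamma + r)
    return lhs, rhs

for r, a, gamma in [(3.0, 1.0, 0.1), (1.2, 1.0, 5.0)]:
    lhs, rhs = proposition7_sides(r, a, gamma, omega=0)
    regime = "increasing in g-bar (congestion helps)" if lhs > rhs else "decreasing in g-bar (scale helps)"
    print(f"r={r}, a={a}, gamma={gamma}: LHS={lhs:.3f}, RHS={rhs:.3f} -> {regime}")
# Consistent with the discussion above: a large r relative to gamma puts us in the
# second case, while a large gamma relative to r puts us in the first.
```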

The third way to operationalize size-dependent payoffs that we consider deals

with changes to the learning rate rather than to flow payoffs. Here we suppose that

when the organization is of size $\mu$, the good risky policy generates successes at a rate $r\mu$. Each success pays a total of 1, which is split evenly among members, so that each member gets $1/\mu$. All other payoffs are the same as in the baseline model. An

example that fits this setting is a group of researchers trying to find a breakthrough. If

there are fewer researchers, breakthroughs are just as valuable but happen less often.

When $f$ is uniform, and using $V_t$ to denote the continuation utility under perpetual

^{32} It can be shown that when congestion effects make experimentation less likely in the limit, they may not do so for all $t$.


experimentation, we have

\[
\gamma \inf_t V_t(p_t(m_t)) = \gamma \lim_{t\to\infty} V_t\!\left(\frac{2a}{r+a}\right) = \frac{2ra}{r+a}.
\]

In other words, the asymptotic median's expected payoff from experimentation comes only from the flow payoff of the risky policy; the option value of experimentation vanishes as the learning rate converges to zero. It follows that perpetual experimentation is less likely to obtain here than in the baseline model, but is still the unique outcome if $\frac{2ra}{r+a} > s$. Note that, in this case, agents who join the organization increase the learning rate, and thereby confer a positive externality on outsiders which is not internalized. Hence, there is free-riding as in Keller et al. (2005). It is simultaneously possible that too few agents partake in experimentation (given that the organization's policy is risky) and that the risky policy is used for too long.

7.3 Organizations of Fixed Size

For the sake of simplicity and clarity, our main model assumes that the organi-

zation allows agents to enter and exit freely and adjusts in size to accommodate them.

While free exit is a reasonable assumption in all of our applications, the assumptions

of free entry and flexible size are often violated: in the short run, organizations may

need to maintain a stable size to sustain their operations. In this Section, we discuss

a variant of our model incorporating these concerns.

Assume now that agents differ in two dimensions: their prior belief $x \in [0,1]$ and their ability $\chi \in [\underline{\chi}, \overline{\chi}]$. Suppose that the density of agents at each pair $(x, \chi)$ is of the form $f(x)h(\chi)$, where $f$ is a probability density function and $h$ is a degenerate density such that, for any $\chi > \underline{\chi}$, $\int_{\underline{\chi}}^{\chi} h(\chi')\,d\chi' = \infty$ but $\int_{\chi}^{\overline{\chi}} h(\chi')\,d\chi' < \infty$. In other words, prior beliefs and ability are independently distributed, and for each belief $x$ there is a deep pool of agents, if candidates of low enough ability are considered.

Assume that the organization must maintain a fixed size µ; that it observes only

ability, and not beliefs, from its hiring process; and that it benefits from hiring high-

ability agents. Suppose that agents are compensated equally for their ability inside

or outside the organization (that is, their propensity to be members is independent of


ability). Then, in equilibrium, at time $t$ only candidates with prior belief $x \ge y_t$ are willing to work at the organization. Here $y_t$ is given by $p_t(y_t) = a/r$, as in the baseline model. The organization hires all candidates of ability at least $\chi_t$, where $\chi_t$ is chosen so that $(1 - F(y_t))(1 - H(\chi_t)) = \mu$.

Since x and χ are independently distributed, the median belief within the orga-

nization at time t is still pt(mt). From this fact we can derive an analog of Proposition

1 and show that perpetual experimentation can also obtain in this case.33 Moreover,

over-experimentation becomes even more likely if prior beliefs and ability are posi-

tively correlated, or if the organization is able to observe beliefs to some extent and

prefers optimistic agents.

7.4 Common Values

Although our model features agents with private values, our results can be

extended to a model with common values, which is more appropriate for some of our

applications, such as environmental organizations or civil rights activism.

We discuss this extension in the context of our example of civil rights activism.

Instead of each agent x generating some private income flow, she now makes a flow

contribution to a rate of change in the relevant laws which can be attributed to the

activism of agent x. The mapping from membership and policy decisions to the

outcomes is the same as in Section 4, but now agents care only about the overall rate of change in the law, and not about their own contribution.

Formally, we let $U_x(\sigma_y, \sigma)$ denote the private utility of agent $x$ when she plays the equilibrium strategy of agent $y$ and the equilibrium path is dictated by the strategy profile $\sigma$. Then in the private values case $x$'s equilibrium utility is $U_x(\sigma_x, \sigma)$, while in the common values case it is $\int_0^1 U_x(\sigma_y, \sigma) f(y)\,dy$. Note that, even though all agents share the objective of maximizing the aggregate rate of legal change, their utility functions still differ due to differences in prior beliefs. However, it is still optimal for

^{33} In fact, perpetual experimentation is easier to obtain in this case. Letting $\chi_0$ be the ability threshold when everyone wants to be a member, that is, when there has been a success or the safe policy is being used, we can show that the incentives to advocate for experimentation are the same as in the baseline model for agents $(x, \chi)$ with $\chi \ge \chi_0$. However, agents $(x, \chi)$ with $\chi < \chi_0$ have a dominant strategy to advocate for experimentation, because they know that a switch to the safe policy would see them fired immediately.


them to make membership decisions that maximize their flow contributions at each

point in time, just as in Section 4.34

Let us conjecture a strategy profile in which the organization experiments forever, and let $V_t(x)$ denote the continuation utility at time $t$ of an agent who has posterior belief $x$ at time $t$ under this strategy profile.^{35} Then Proposition 1 holds with a similar proof, replacing $V(p_t(m_t))$ in the original proof with $V_t(p_t(m_t))$. Moreover, the following lower bound for the value function holds:

Proposition 8. For any $x \in [0,1]$ and any $t \ge 0$,
\[
V(x) \;\ge\; V_t(x) \;\ge\; \min\left\{ \frac{xr}{\gamma},\; \frac{xr}{\gamma} + (1-x)\frac{a}{\gamma} - x\,\frac{r-a}{\gamma+r} \right\}.
\]

The lower bound on Vt(x) is the minimum of two expressions. The first one is the

expected payoff from the organization experimenting forever and all agents remaining

members forever. The second one is the payoff from the organization experimenting

forever and (almost) no one remaining a member.

Proposition 8 can be used to obtain a sufficient condition for perpetual ex-

perimentation. For instance, when the density of the prior beliefs f is uniform,

experimentation continues forever as long as

\[
\min\left\{ \frac{2ra}{r+a},\; \frac{2ra}{r+a} + \frac{(r-a)a}{r+a}\left(1 - \frac{2\gamma}{\gamma+r}\right) \right\} \;\ge\; s.
\]

In other words, over-experimentation can still occur in equilibrium for reason-

able parameter values – in particular, if a is close enough to s.
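The condition above is easy to check numerically; a minimal sketch with arbitrary parameter values (and $a$ close to $s$, as suggested):

```python
def common_values_bound(r, a, gamma):
    """Lower bound on gamma * V_t(p_t(m_t)) for a uniform prior density in the common values case."""
    term = 2 * r * a / (r + a)
    correction = (r - a) * a / (r + a) * (1 - 2 * gamma / (gamma + r))
    return min(term, term + correction)

r, gamma, s = 3.0, 5.0, 1.5  # illustrative parameters
for a in [1.0, 1.1, 1.3]:
    bound = common_values_bound(r, a, gamma)
    verdict = "sufficient condition holds" if bound >= s else "condition fails"
    print(f"a = {a}: bound = {bound:.3f} ({verdict})")
# As a gets close to s, the bound exceeds s and perpetual experimentation is guaranteed,
# consistent with the discussion above.
```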

The common values setting differs from the baseline model in two important

ways. First, the fact that Vt(x) ≤ V (x) means that agents’ payoffs from experimen-

tation are always weakly lower in the common values case than in the private values

case. As a result, over-experimentation occurs for a smaller set of parameter values in

^{34} Agents are now indifferent about their membership decisions: the membership status of a set of agents of measure zero has no impact on anyone's payoffs. However, it is natural to assume that each agent joins when doing so would be optimal if her behavior had a positive weight in her own utility function. For instance, this is the case in a model with a finite number of agents.

^{35} $t$ matters in this case, in contrast to the model in Section 4, because the membership strategies of other agents, which depend on $t$ rather than $x$, enter the agent's utility function.


the common values case. The reason is that, under common values, an agent consid-

ers the entry and exit decisions of other agents suboptimal, and her payoff is affected

by these decisions as long as experimentation continues. In contrast, in the private

values case, agents’ payoffs depend only on their own entry and exit decisions, which

are chosen optimally given their beliefs.

Second, in the private values case, experimentation can continue forever even if

agents are impatient, as long as the density f does not decrease too quickly near 1 and

other parameters are chosen appropriately (for example, a is close to s). This occurs

because the pivotal agent is optimistic enough that the expected flow payoff from

experimentation is higher than s, even without taking the option value into account.

In contrast, in the common values case, the expected flow payoff from experimentation

goes to a as t→∞ if there are no successes, no matter how optimistic the agent is.

Indeed, here agents care about the contributions of all players, and they understand

that for large t most players will become outsiders and generate a, regardless of the

quality of the policy. Thus perpetual experimentation is only possible if agents are

patient enough.


References

Acemoglu, Daron, Georgy Egorov, and Konstantin Sonin, "Coalition Formation in Non-Democracies," Review of Economic Studies, 2008, 75 (4), 987–1009.

Acemoglu, Daron, Georgy Egorov, and Konstantin Sonin, "Dynamics and Stability of Constitutions, Coalitions and Clubs," American Economic Review, 2012, 102 (4), 1446–1476.

Acemoglu, Daron, Georgy Egorov, and Konstantin Sonin, "Political Economy in a Changing World," Journal of Political Economy, 2015, 123 (5), 1038–1086.

Bai, Jinhui and Roger Lagunoff, "On the Faustian Dynamics of Policy and Political Power," Review of Economic Studies, 2011, 78 (1), 17–48.

Bell, David S., "The French Communist Party: from revolution to reform," in Jocelyn Evans, ed., The French party system, Manchester University Press, 2003, chapter 2.

Bréchon, Pierre, Les partis politiques français, La Documentation française, 2011.

Carreyrou, John, Bad Blood: Secrets and Lies in a Silicon Valley Startup, Penguin Random House, 2018.

Damiani, Marco and Marino De Luca, "From the Communist Party to the Front de gauche. The French radical left from 1989 to 2014," Communist and Post-Communist Studies, 2016, 49 (4), 313–321.

Durrett, Rick, Probability: Theory and Examples, 4th ed., Cambridge University Press, 2010.

Gieczewski, Germán, "Policy Persistence and Drift in Organizations," 2019.

Hirschman, Albert O., Exit, Voice, and Loyalty: Responses to Decline in Firms, Organizations, and States, Cambridge, Massachusetts: Harvard University Press, 1970.

Keller, Godfrey and Sven Rady, "Strategic experimentation with Poisson bandits," Theoretical Economics, 2010, 5 (2), 275–311.

Keller, Godfrey and Sven Rady, "Breakdowns," Theoretical Economics, 2015, 10 (1), 175–202.


Keller, Godfrey, Sven Rady, and Martin Cripps, "Strategic Experimentation with Exponential Bandits," Econometrica, 2005, 73 (1), 39–68.

Roberts, Kevin, "Dynamic voting in clubs," Research in Economics, 2015, 69 (3), 320–335.

Strulovici, Bruno, "Learning While Voting: Determinants of Collective Experimentation," Econometrica, 2010, 78 (3), 933–971.


A Appendix (For Online Publication)

A.1 Proofs

We begin with three useful Lemmas. Lemma 1 shows that agents with more

optimistic priors derive higher payoffs from any expected continuation that involves

experimentation. Lemma 2 proves several properties of the agents’ value function

when the policy path involves switching to the safe policy if there are no successes by

some time T . In particular, we prove that the value function is single-peaked in T .

Lemma 3 shows that increasing the number of optimists (in the MLRP sense) results

in a more optimistic pivotal agent after any history.

Lemma 1. Let $V_x(L, \pi)$ denote the value function of an agent with prior $x$ when the initial state is $(L, \pi)$. Then for all $(L, \pi)$, $x \mapsto V_x(L, \pi)$ is strictly and continuously increasing for all agents $x$ that are in the organization while it experiments, at a set of times of positive measure with a positive probability (on the equilibrium path).

Proof of Lemma 1.

Consider two agents $x' > x$. Let $V_{x'}^{x}(L, \pi)$ denote the payoff to agent $x'$ from copying the equilibrium strategy of agent $x$. When $x$ and $x'$ are outside the organization, their flow payoffs are equal to $a$ and do not depend on their priors.

When $x'$ is in the organization, if the organization is using the risky policy, at a continuation where the state variable is $L$, $x'$'s expected flow payoff is $p(L, x')r$. Because $x' > x$, we have $p(L, x') > p(L, x)$ and thus $p(L, x')r > p(L, x)r$, so $x'$'s flow payoff is higher than $x$'s when $x$ and $x'$ are members.

We then have $V_{x'}^{x}(L, \pi) > V_x(L, \pi)$ if $x$ is in the organization while it experiments, at a set of times of positive measure with a positive probability. Because $V_{x'}(L, \pi) \ge V_{x'}^{x}(L, \pi)$, we have $V_{x'}(L, \pi) > V_x(L, \pi)$, as required. The continuity is proved similarly. $\square$

Corollary 1. If the organization experiments forever on the equilibrium path, then $V_x(L, \pi) = V_{p(L,x)}(1, \pi)$. Moreover, $V_{p(L,x)}(1, \pi)$ is increasing in $p(L, x)$.

Let $W_T(x)$ denote the continuation value of an agent with current belief $x$ in


an equilibrium in which the organization stops experimenting after a length of time $T$.^{36} Let $T^* = \operatorname{argmax}_T W_T(x)$ denote the optimal amount of time that an agent with prior $x$ would want to experiment for if she were always in control of the organization.

Lemma 2. (i) $T \mapsto W_T(x)$ is differentiable for all $T \in (0, \infty)$ and right-differentiable at $T = 0$.

(ii) $W_0(x) = \frac{s}{\gamma}$ and $\left.\frac{\partial_+ W_T(x)}{\partial T}\right|_{T=0} = \max\{xr, a\} - s + \frac{xr(r-s)}{\gamma}$.

(iii) $T \mapsto W_T(x)$ is strictly increasing for $T \in [0, T^*]$ and strictly decreasing for $T > T^*$.

(iv) If $V(x) > \frac{s}{\gamma}$, then $W_T(x) > \frac{s}{\gamma}$ for all $T > 0$.

Proof of Lemma 2.

Fix $T_0 \ge 0$ and $\varepsilon > 0$. We can write
\[
W_{T_0+\varepsilon}(x) - W_{T_0}(x) = e^{-\gamma T_0} Q_{T_0}(x)\left(W_\varepsilon(p_{T_0}(x)) - W_0(p_{T_0}(x))\right) = e^{-\gamma T_0} Q_{T_0}(x)\left(W_\varepsilon(p_{T_0}(x)) - \frac{s}{\gamma}\right),
\]
where $Q_T(x)$ is the probability that there is no success up to time $T$, based on the prior $x$, and $p_T(x)$ is the posterior belief of an agent with prior $x$ in this case. (See Lemma 8 for details.) Then
\[
\left.\frac{\partial W_T(x)}{\partial T}\right|_{T=T_0} = \lim_{\varepsilon \searrow 0} \frac{W_{T_0+\varepsilon}(x) - W_{T_0}(x)}{\varepsilon} = \lim_{\varepsilon \searrow 0} e^{-\gamma T_0} Q_{T_0}(x)\, \frac{W_\varepsilon(p_{T_0}(x)) - \frac{s}{\gamma}}{\varepsilon} = e^{-\gamma T_0} Q_{T_0}(x) \left.\frac{\partial W_T(p_{T_0}(x))}{\partial T}\right|_{T=0}.
\]

The case with $\varepsilon < 0$ is analogous. Then it is enough to prove that $T \mapsto W_T(x)$ is right-differentiable at $T = 0$. This can be done by calculating $W_T(x)$ explicitly for small $T > 0$. We do this for $x > a/r$. In this case, for $T$ sufficiently small, $x$ is in the organization because the experiment with the risky policy is going to stop before she

^{36} This is defined for $T \in [0, \infty]$, where $W_\infty(x) = V(x)$.


wants to leave. Using that $1 - Q_T(x) = x(1 - e^{-rT})$,
\[
W_T(x) = \int_0^T x r e^{-\gamma t}\,dt + \int_T^\infty e^{-\gamma t}\left(Q_T(x)s + (1 - Q_T(x))r\right)dt = xr\,\frac{1 - e^{-\gamma T}}{\gamma} + \frac{e^{-\gamma T}}{\gamma}\left(s + x\left(1 - e^{-rT}\right)(r - s)\right).
\]
This implies that $\left.\frac{\partial_+ W_T(x)}{\partial T}\right|_{T=0} = xr - s + \frac{xr(r-s)}{\gamma}$.

Similarly, it can be shown that if $x \le a/r$, then $\left.\frac{\partial_+ W_T(x)}{\partial T}\right|_{T=0} = a - s + \frac{xr(r-s)}{\gamma}$. This proves (i) and (ii).

For (iii), note that, by our previous result, $\left.\frac{\partial W_T(x)}{\partial T}\right|_{T=T_0}$ is positive (negative) whenever $\left.\frac{\partial W_T(p_{T_0}(x))}{\partial T}\right|_{T=0}$ is positive (negative). In addition, it follows from our calculations that $y \mapsto \left.\frac{\partial W_T(y)}{\partial T}\right|_{T=0}$ is increasing and $T_0 \mapsto p_{T_0}(x)$ is decreasing. Moreover, for large $T_0$, $p_{T_0}(x)$ is close to zero, so $\left.\frac{\partial W_T(p_{T_0}(x))}{\partial T}\right|_{T=0}$ is negative. It follows that $T \mapsto W_T(x)$ is single-peaked. If $\left.\frac{\partial W_T(x)}{\partial T}\right|_{T=0} > 0$, then the peak is the unique $T^*$ satisfying $\left.\frac{\partial W_T(x)}{\partial T}\right|_{T=T^*} = 0$. If $\left.\frac{\partial W_T(x)}{\partial T}\right|_{T=0} \le 0$, then $T^* = 0$.

Hence if $0 < T \le T^*$, then $W_T(x) > W_0(x) = \frac{s}{\gamma}$, because in this case $T \mapsto W_T(x)$ is increasing by (iii); and if $T > T^*$, then $W_T(x) \ge \lim_{T\to\infty} W_T(x) = V(x) > \frac{s}{\gamma}$, because in this case $T \mapsto W_T(x)$ is decreasing by (iii) and $V(x) > \frac{s}{\gamma}$ by assumption. This proves (iv). $\square$
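The closed form for $W_T(x)$ derived above (valid for $x > a/r$ and $T$ small) makes the right-derivative in part (ii) easy to verify numerically; a minimal sketch with arbitrary parameter values:

```python
import math

a, r, s, gamma = 1.0, 3.0, 1.5, 0.5  # illustrative parameters
x = 0.7                              # a prior with x > a/r, so the agent stays in until the stop

def W(T, x):
    """W_T(x) for small T, from the proof of Lemma 2."""
    return (x * r * (1 - math.exp(-gamma * T)) / gamma
            + math.exp(-gamma * T) / gamma * (s + x * (1 - math.exp(-r * T)) * (r - s)))

h = 1e-6
finite_diff = (W(h, x) - s / gamma) / h          # W_0(x) = s / gamma
closed_form = x * r - s + x * r * (r - s) / gamma
print(finite_diff, closed_form)                  # the two values agree up to O(h)
```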

Lemma 3. Let $m(L)$ and $\bar{m}(L)$ denote the median voter when the state variable is $(L, 1)$ and the density is $f$ and $\bar{f}$ respectively. Suppose that $\bar{f}$ MLRP-dominates $f$. Then $\bar{m}(L) \ge m(L)$ for all $L$.

Proof of Lemma 3.

Let $y(L)$ denote the indifferent agent given information $L$ under either density (note that $y(L)$ is given by the condition $p(L, y(L)) = a/r$, which is independent of the density). By definition, we have $\int_{y(L)}^{m(L)} f(x)\,dx = \int_{m(L)}^{1} f(x)\,dx$. Suppose that $\bar{m}(L) < m(L)$. This is equivalent to
\[
\int_{y(L)}^{m(L)} \frac{\bar{f}(x)}{f(x)}\, f(x)\,dx = \int_{y(L)}^{m(L)} \bar{f}(x)\,dx > \int_{m(L)}^{1} \bar{f}(x)\,dx = \int_{m(L)}^{1} \frac{\bar{f}(x)}{f(x)}\, f(x)\,dx.
\]


Since $\bar{f}$ MLRP-dominates $f$, $\frac{\bar{f}(x)}{f(x)}$ is weakly increasing. Thus
\[
\int_{y(L)}^{m(L)} \frac{\bar{f}(m(L))}{f(m(L))}\, f(x)\,dx \ge \int_{y(L)}^{m(L)} \frac{\bar{f}(x)}{f(x)}\, f(x)\,dx > \int_{m(L)}^{1} \frac{\bar{f}(x)}{f(x)}\, f(x)\,dx \ge \int_{m(L)}^{1} \frac{\bar{f}(m(L))}{f(m(L))}\, f(x)\,dx,
\]
which is a contradiction. $\square$

Proof of Proposition 1.

We will prove the following statement, which encompasses Proposition 1 and the remark preceding it. If $V(p_t(m_t)) > s/\gamma$ for all $t$, there is an essentially unique equilibrium. In this equilibrium, the organization experiments forever. If $\inf_{t\ge 0} V(p_t(m_t)) < s/\gamma$, there is no equilibrium in which the organization experiments forever.

We first argue that if $\inf_{t\ge 0} V(p_t(m_t)) \ge s/\gamma$, then experimenting forever is an equilibrium if the organization is experimenting at time $t = 0$.

Consider the following strategy profile: α(L) = 1 for all L and β(x, L, π) is given

by Condition (i). We can check that Conditions (i)-(iii) hold, so this is an equilibrium

(in the class of equilibria that we restrict attention to).

Next, we argue that if $V(p_t(m_t)) > s/\gamma$ for all $t \ge 0$, then any equilibrium must be of this form. Let $\sigma$ be an equilibrium. Clearly, the risky policy must always be used after a success.^{37} By assumption, after a switch from the risky policy to the safe policy, the safe policy is used forever.

Now suppose for the sake of contradiction that $\sigma$ is not the equilibrium with perpetual experimentation given above; that is, suppose that for some $L$ we have $\alpha(L) \neq 1$. By Condition (ii), we must have $s/\gamma \ge V_{m(L,1)}(L)$.

Let $T$ be the (possibly random) time until the policy first switches to 0, starting in state $L$. Then $V_{m(L,1)}(L) = E[W_T(p(L, m(L,1)))]$, where $W_T(x)$ is as in Lemma 2 and the expectation is taken over $T$.

Recall that $V(p(L, m(L,1))) > s/\gamma$ by assumption. By Lemma 2, this implies that $W_T(p(L, m(L,1))) > s/\gamma$ for all $T > 0$. If $E[T] > 0$, it follows that $E[W_T(p(L, m(L,1)))] > s/\gamma$, implying that $V_{m(L,1)}(L) > s/\gamma$, a contradiction. If $E[T] = 0$, then $V_{m(L,1)}(L) = s/\gamma$ but, by the same argument, $W_\varepsilon(p(L, m(L,1))) > s/\gamma$ for all $\varepsilon > 0$, so $\alpha(L) = 1$ by Condition (iii), a contradiction.

^{37} This follows trivially from Conditions (ii) and (iii).

Finally, suppose that $\inf_{t\ge 0} V(p_t(m_t)) < s/\gamma$, so that $V(p_{t_0}(m_{t_0})) < s/\gamma$ for some $t_0$. Then an equilibrium in which $\alpha(L) = 1$ for all $L$ cannot exist, as by assumption we would have $\alpha(e^{rt_0}) = 0$ by Condition (ii), so experimentation would stop at time $t_0$ if unsuccessful. $\square$

Proposition 9. The value function V in Section 5 satisfies the following:

(i) If $f$ is non-decreasing, then
\[
\gamma \inf_{t\ge 0} V(p_t(m_t)) = \gamma V\!\left(\frac{2a}{r+a}\right) = \frac{2ra}{r+a} + \left(\frac{1}{2}\right)^{\gamma/r} \frac{a(r-a)}{r+a}\,\frac{r}{\gamma+r}.
\]

(ii) Given $\omega > 0$, let $f_\omega(x)$ denote a density with support $[0,1]$ such that $f_\omega(x) = (\omega+1)(1-x)^\omega$ for $x \in [0,1]$. Let $f$ be a density with support $[0,1]$ that MLRP-dominates $f_\omega$. Let $\lambda = \left(\frac{1}{2}\right)^{\frac{1}{\omega+1}}$. Then
\[
\gamma \inf_{t\ge 0} V(p_t(m_t)) \ge \gamma V\!\left(\frac{a}{\lambda r + (1-\lambda)a}\right) = \frac{ra}{\lambda r + (1-\lambda)a} + \lambda^{\frac{\gamma+r}{r}}\,\frac{a(r-a)}{\lambda r + (1-\lambda)a}\,\frac{r}{\gamma+r}.
\]

(iii) Let $f$ be any density with support $[0,1]$. Then
\[
\gamma \inf_{t\ge 0} V(p_t(m_t)) \ge \gamma V\!\left(\frac{a}{r}\right) = a + \frac{a(r-a)}{\gamma+r}.
\]

Proof of Proposition 9.

We prove each inequality in three steps.

First, we show that the median posterior belief is uniformly bounded below for all $t$, with different bounds depending on the density. When $f$ is uniform, we have $p_t(m_t) \ge \frac{2a}{r+a}$. For any $\omega > 0$, when $f = f_\omega$, we have $p_t(m_t) \ge \frac{a}{\lambda r + (1-\lambda)a}$ for $\lambda = \left(\frac{1}{2}\right)^{\frac{1}{\omega+1}}$. Finally, if $f$ has full support, we have $p_t(m_t) \ge \frac{a}{r}$. The first two assertions follow from the following Claim:


Claim 9.1. Suppose that the initial distribution of priors is $f_\omega$ for some $\omega \ge 0$. Then
\[
p_t(m_t) = \frac{a + (1-\lambda)(r-a)e^{-rt}}{\lambda(r-a) + a + (1-\lambda)(r-a)e^{-rt}}.
\]

Proof of Claim 9.1. The posterior belief of agent $x$ at time $t$ is given by $p_t(x) = \frac{x e^{-rt}}{x e^{-rt} + 1 - x}$. Using the fact that $p_t(y_t) = a/r$ for the marginal member $y_t$, we set $\frac{y_t e^{-rt}}{y_t e^{-rt} + 1 - y_t} = \frac{a}{r}$. Solving for $y_t$, we obtain
\[
y_t = \frac{\frac{a}{r}}{\frac{a}{r} + \left(1 - \frac{a}{r}\right)e^{-rt}} = \frac{a}{a + (r-a)e^{-rt}}.
\]
The median $m_t$ must satisfy the condition $2\int_{m_t}^{1} f_\omega(x)\,dx = \int_{y_t}^{1} f_\omega(x)\,dx$, so that $2(1-m_t)^{\omega+1} = (1-y_t)^{\omega+1}$. Hence $1 - m_t = \lambda(1-y_t)$, which implies that
\[
m_t = 1 - \lambda + \lambda y_t = 1 - \lambda + \frac{\lambda a}{a + (r-a)e^{-rt}} = \frac{a + (1-\lambda)(r-a)e^{-rt}}{a + (r-a)e^{-rt}}.
\]
Substituting the above expression into the formula for $p_t(x)$, we obtain
\[
p_t(m_t) = \frac{a + (1-\lambda)(r-a)e^{-rt}}{\lambda(r-a) + a + (1-\lambda)(r-a)e^{-rt}}.
\]
In particular, if $\omega = 0$, then $f$ is uniform and we have $p_t(m_t) = \frac{2a + (r-a)e^{-rt}}{r + a + (r-a)e^{-rt}}$. $\square$

Indeed, it then follows that $p_t(m_t) \searrow \frac{2a}{r+a}$ as $t \to \infty$ when $f$ is uniform, and $p_t(m_t) \searrow \frac{a}{\lambda r + (1-\lambda)a}$ when $f = f_\omega$. The third claim is implied by the fact that $m_t \ge y_t$ and $p_t(y_t) = a/r$.

Second, we argue that these bounds hold not just for the aforementioned densities but also for any that dominate them in the MLRP sense. This follows from Lemma 3 and the fact that the function $x \mapsto p_t(x)$ is strictly increasing.

Third, we observe that, since $V(x)$ is strictly increasing and continuous in $x$ (by Lemma 1), we have $\inf_{t\ge 0} V(p_t(m_t)) = V(\inf_{t\ge 0} p_t(m_t))$. Hence, to arrive at the bounds in the Proposition, it is enough to evaluate $V$ at the appropriate beliefs.

To calculate $V\!\left(\frac{a}{\lambda r + (1-\lambda)a}\right)$, we use the following two Claims:

Claim 9.2. Let t(x) denote the time it will take for an agent’s posterior belief to go


from $x$ to $a/r$ (provided that no successes are observed during this time), at which time she would leave the organization. Then
\[
V(x) = \frac{xr}{\gamma} + (1-x)e^{-\gamma t(x)}\,\frac{a}{\gamma} - x(r-a)\,\frac{e^{-(\gamma+r)t(x)}}{\gamma+r}.
\]

Proof of Claim 9.2.

Let $P_t = x(1 - e^{-rt})$ denote the probability that an agent with prior belief $x$ assigns to having a success by time $t$. Then
\[
V(x) = x\int_0^{t(x)} r e^{-\gamma\tau}\,d\tau + \int_{t(x)}^{\infty} \left(P_\tau r + (1 - P_\tau)a\right) e^{-\gamma\tau}\,d\tau.
\]

The first term is the payoff from time 0 to time t(x), when the agent stays in the

organization. The second term is the payoff after time t(x), when the agent leaves

the organization and obtains the flow payoff a thereafter, unless the risky technology

has had a success (in which case the agent returns to the organization and receives a

guaranteed expected flow payoff r). We have

\[
\begin{aligned}
V(x) &= x\int_0^{t(x)} r e^{-\gamma\tau}\,d\tau + \int_{t(x)}^{\infty} a e^{-\gamma\tau}\,d\tau + \int_{t(x)}^{\infty} P_\tau (r-a) e^{-\gamma\tau}\,d\tau \\
&= xr\,\frac{1 - e^{-\gamma t(x)}}{\gamma} + e^{-\gamma t(x)}\,\frac{a}{\gamma} + x(r-a)\left(\frac{e^{-\gamma t(x)}}{\gamma} - \frac{e^{-(\gamma+r)t(x)}}{\gamma+r}\right) \\
&= \frac{xr}{\gamma} + (1-x)e^{-\gamma t(x)}\,\frac{a}{\gamma} - x(r-a)\,\frac{e^{-(\gamma+r)t(x)}}{\gamma+r}.
\end{aligned}
\]

Claim 9.3. Let $t_y(x)$ denote the time it takes for an agent's posterior belief to go from $x$ to $y$. Then
\[
t_y(x) = -\frac{\ln\!\left(\frac{y}{1-y}\,\frac{1-x}{x}\right)}{r}, \qquad
t(x) = -\frac{\ln\!\left(\frac{a(1-x)}{(r-a)x}\right)}{r}.
\]
If $x = \frac{2a}{r+a}$, then $e^{-rt(x)} = \frac{1}{2}$. If $x = \frac{a}{\lambda r + (1-\lambda)a}$, then $e^{-rt(x)} = \lambda$.

Proof of Claim 9.3.

We solve $p_t(x) = \frac{x e^{-rt}}{x e^{-rt} + 1 - x} = y$ for $t$. Then we obtain $e^{-r t_y(x)} = \frac{y}{1-y}\,\frac{1-x}{x}$ or,


equivalently, $t_y(x) = -\frac{\ln\!\left(\frac{y}{1-y}\frac{1-x}{x}\right)}{r}$.

In particular, $t(x) = t_{a/r}(x) = -\frac{\ln\!\left(\frac{a(1-x)}{(r-a)x}\right)}{r}$. Substituting $x = \frac{2a}{r+a}$ into $e^{-rt(x)} = \frac{a(1-x)}{(r-a)x}$ and simplifying, we obtain $e^{-rt(x)} = \frac{1}{2}$. Substituting $x = \frac{a}{\lambda r + (1-\lambda)a}$ into $e^{-rt(x)} = \frac{a(1-x)}{(r-a)x}$ and simplifying, we obtain $e^{-rt(x)} = \lambda$. $\square$

Thus, taking $x = \frac{a}{\lambda r + (1-\lambda)a}$, we have $e^{-rt(x)} = \lambda$ and $e^{-\gamma t(x)} = \lambda^{\gamma/r}$. Substituting this value of $x$ into the formula for $V(x)$ from Claim 9.2, we obtain
\[
\gamma V\!\left(\frac{a}{\lambda r + (1-\lambda)a}\right) = \frac{ra}{\lambda r + (1-\lambda)a} + \frac{(r-a)a}{\lambda r + (1-\lambda)a}\,\frac{r}{\gamma+r}\,\lambda^{\frac{\gamma+r}{r}}.
\]

In particular, for $\omega = 0$, this becomes
\[
\gamma V\!\left(\frac{2a}{r+a}\right) = \frac{2ra}{r+a} + \left(\frac{1}{2}\right)^{\gamma/r} \frac{(r-a)a}{r+a}\,\frac{r}{\gamma+r}.
\]

On the other hand, for $x = a/r$, we have $t(x) = 0$. Substituting this in, we obtain
\[
\gamma V\!\left(\frac{a}{r}\right) = a + \frac{r-a}{r}\,a - \frac{a}{r}(r-a)\,\frac{\gamma}{\gamma+r} = a + \frac{(r-a)a}{\gamma+r}.
\]

An additional argument is required to show that the bound is tight in part (i).

Take $f$ to be any non-decreasing density. Let $\bar{m}_t$ denote the median at time $t$ under $f$, and let $m_t$ denote the median at time $t$ under the uniform density. It is sufficient to show that the asymptotic posterior of the median is $\frac{2a}{r+a}$ under $f$, that is, that $\lim_{t\to\infty} p_t(\bar{m}_t) = \lim_{t\to\infty} p_t(m_t) = \frac{2a}{r+a}$.

Claim 9.4. Let $m(L)$ and $\bar{m}(L)$ denote the median voters when the state variable is $L$ under the uniform density and a non-decreasing density $f$, respectively. Suppose that $y(L) \to 1$ as $L \to \infty$. Then $\frac{1-\bar{m}(L)}{1-m(L)} \to 1$ as $L \to \infty$.

Proof of Claim 9.4.

Given a state variable $L$ and the marginal member $y(L)$ corresponding to it, let $f_{0L} = f(y(L))$ and $f_1 = f(1)$. By Lemma 3, we have $m(L) \le \bar{m}(L) \le \hat{m}(L)$, where $\hat{m}(L)$ is the median corresponding to a density $\hat{f}$ such that $\hat{f}(x) = f_{0L}$ for $x \in [y(L), \hat{m}(L)]$ and $\hat{f}(x) = f_1$ for $x \in [\hat{m}(L), 1]$.


By construction, because $\hat{m}(L)$ is the median, we have $f_{0L}(\hat{m}(L) - y(L)) = f_1(1 - \hat{m}(L))$, so $\hat{m}(L) = \frac{f_{0L}\,y(L) + f_1}{f_{0L} + f_1}$. Thus $1 - \hat{m}(L) = \frac{f_{0L}(1 - y(L))}{f_{0L} + f_1}$ and, because $m(L) = \frac{y(L)+1}{2}$ so that $1 - m(L) = \frac{1 - y(L)}{2}$, we have $\frac{1 - \hat{m}(L)}{1 - m(L)} = \frac{2 f_{0L}}{f_{0L} + f_1}$.

Since $f$ is non-decreasing, using the fact that $f(x) \to \sup_{y\in[0,1)} f(y)$ as $x \to 1$, we find that $f(x) \to f(1)$ as $x \to 1$. Then, as $t \to \infty$, we have $y(L) \to 1$, $f_{0L} = f(y(L)) \to f_1$ and $\frac{1-\hat{m}(L)}{1-m(L)} \to 1$. Since $1 - \hat{m}(L) \le 1 - \bar{m}(L) \le 1 - m(L)$, it follows that $\frac{1-\bar{m}(L)}{1-m(L)} \to 1$ as well. $\square$

Claim 9.5. Let $x_t, \bar{x}_t$ be two time-indexed sequences of agents such that $x_t \le \bar{x}_t$ for all $t$ and $x_t \to 1$ as $t \to \infty$. If $\frac{1-\bar{x}_t}{1-x_t} \to 1$, then $\frac{p_t(\bar{x}_t)}{p_t(x_t)} \to 1$.

Proof of Claim 9.5.

Using the formula for the posterior beliefs, we have
\[
\frac{p_t(\bar{x}_t)}{p_t(x_t)} = \frac{\bar{x}_t}{\bar{x}_t + (1-\bar{x}_t)L_t}\cdot\frac{x_t + (1-x_t)L_t}{x_t} = \frac{\bar{x}_t}{x_t}\cdot\frac{x_t + (1-x_t)L_t}{\bar{x}_t + (1-\bar{x}_t)L_t}.
\]
Since $x_t \to 1$ and $\bar{x}_t \ge x_t$ for all $t$, $\bar{x}_t \to 1$, whence $\frac{\bar{x}_t}{x_t} \to 1$. In addition, since $\frac{1-\bar{x}_t}{1-x_t} \to 1$, $\frac{(1-\bar{x}_t)L_t}{(1-x_t)L_t} \to 1$. As a result, for all $t$,
\[
\min\left\{\frac{\bar{x}_t}{x_t}, \frac{(1-\bar{x}_t)L_t}{(1-x_t)L_t}\right\} \le \frac{\bar{x}_t + (1-\bar{x}_t)L_t}{x_t + (1-x_t)L_t} \le \max\left\{\frac{\bar{x}_t}{x_t}, \frac{(1-\bar{x}_t)L_t}{(1-x_t)L_t}\right\},
\]
so $\frac{\bar{x}_t + (1-\bar{x}_t)L_t}{x_t + (1-x_t)L_t} \to 1$, which concludes the proof. $\square$

Claim 9.4 shows that $\frac{1-\bar{m}_t}{1-m_t} \to 1$. Note that we have $\bar{m}_t \ge m_t$ for all $t$ by Lemma 3 and $m_t \to 1$ as $t \to \infty$. Then Claim 9.5 applies: Claim 9.5 applied to the sequences $m_t$ and $\bar{m}_t$ guarantees that the ratio $\frac{p_t(\bar{m}_t)}{p_t(m_t)}$ converges to 1. $\square$

Corollary 9.1. There exist parameters such that $\inf_t V(p_t(m_t)) \ge \frac{s}{\gamma}$.

Proof of Corollary 9.1. We can, in fact, prove that, holding the other parameters constant, there is $a^* < s$ such that, if $a \in [a^*, s)$, then $\inf_t V(p_t(m_t)) \ge \frac{s}{\gamma}$. This follows from Proposition 9(iii): indeed, we can take $a^*$ to be such that $a^* + \frac{a^*(r-a^*)}{\gamma+r} = s$. $\square$
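The threshold $a^*$ can be computed explicitly, since $a^* + \frac{a^*(r-a^*)}{\gamma+r} = s$ is a quadratic in $a^*$; a minimal sketch with arbitrary $r$, $\gamma$, $s$:

```python
import math

r, gamma, s = 3.0, 0.5, 1.5  # illustrative parameters

# a + a(r - a)/(gamma + r) = s  <=>  -a^2 + a*(gamma + 2r) - s*(gamma + r) = 0
A, B, C = -1.0, gamma + 2 * r, -s * (gamma + r)
a_star = (-B + math.sqrt(B ** 2 - 4 * A * C)) / (2 * A)  # the root below s
print("a* =", a_star)
print("check:", a_star + a_star * (r - a_star) / (gamma + r), "should equal s =", s)
```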

Proof of Proposition 2 and the assertions following it. First note that when $f = f_\omega$, the fact that $\inf_t V(p_t(m_t))$ is increasing in $r$ follows from the formula for $V\!\left(\frac{a}{\lambda r+(1-\lambda)a}\right)$ given in the proof of Proposition 9. Lemma 3 implies that an MLRP-increase in $f$ increases $\inf_t V(p_t(m_t))$. As for changes in $a$, note that an increase in $a$ clearly increases $V(x)$ for each $x$, and also increases $y_t$, and hence $m_t$, for each $t$. Finally, note that a decrease in $\gamma$ is equivalent to an increase in the learning rate.^{38} This leaves $y_t$ and $m_t$ unchanged but increases $V(x)$ for all $x$, as agents have strictly more information to base their entry and exit decisions on. $\square$

Next we will characterize equilibria without perpetual experimentation. To do

this, define a stopping function ϕ : [0,∞) → [0,∞] as follows. For each t ≥ 0,

ϕ(t) ≥ t is such that mt is indifferent about switching to the safe policy at time t if

she expects a continuation where experimentation will stop at time ϕ(t) should she

fail to stop at t. If the agent never wants to experiment regardless of the expected

continuation, then ϕ(t) = t, while if she always does, ϕ(t) =∞.

Proposition 10. Any pure strategy equilibrium σ in which the organization does not

experiment forever is given by a sequence of stopping times t0(σ) ≤ t1(σ) ≤ t2(σ) ≤ . . .

such that tn(σ) = ϕ(tn−1(σ)) for all n > 0 and t0(σ) ≤ ϕ(0).

There exists t ∈ [0, ϕ(0)] for which (t, ϕ(t), ϕ(ϕ(t)), . . .) constitutes an equilib-

rium. Moreover, if ϕ is weakly increasing, then (t, ϕ(t), ϕ(ϕ(t)), . . .) constitutes an

equilibrium for all t ∈ [0, ϕ(0)].

Proof of Proposition 10. We first argue that the stopping function ϕ is

well-defined. Let t be the current time and let t∗ be the time at which mt would

choose to stop experimenting if she had complete control over the policy. Recall the

definition of WT−t(x) from Lemma 2: WT−t(x) is the value function starting at time

t of an agent with belief x at time t given a continuation equilibrium path on which

the organization experiments until T and then switches to the safe technology. Then,

equivalently, t∗ = argmaxT WT−t(x).

There are three cases. If $t^* = t$, then $ϕ(t) = t$. If $t^* > t$, that is, if $m_t$ wants to experiment for a positive amount of time, and $V(p_t(m_t)) < s/\gamma$, then $W_{T-t}(p_t(m_t))$ is strictly increasing in $T$ for $T \in [t, t^*]$ and strictly decreasing in $T$ for $T > t^*$, as shown in Lemma 2, and there is a unique $ϕ(t) > t^*$ for which $W_{ϕ(t)-t}(p_t(m_t)) = s/\gamma$. Finally,

^{38} In other words, it is equivalent to increasing the success rate of the good risky policy to $rq$, for $q > 1$, and lowering the payoff per success to $\frac{1}{q}$.


if $t^* > t$ and $V(p_t(m_t)) \ge s/\gamma$, then $ϕ(t) = \infty$.

Next, note that ϕ is continuous. If $ϕ(t_0) \in (t_0, \infty)$, then for $t$ in a neighborhood of $t_0$, $ϕ(t)$ is defined by the condition $W_{ϕ(t)-t}(p_t(m_t)) = s/\gamma$, where $p_t(m_t)$ is continuous in $t$, and $W_T(x)$ is differentiable in $(T, x)$ at $(T, x) = (ϕ(t), p_t(m_t))$ and strictly decreasing in $T$,^{39} so the continuity of ϕ follows from the Implicit Function Theorem. The proofs for the cases when $ϕ(t_0) = t_0$ or $ϕ(t_0) = \infty$ are similar.

Consider a pure strategy equilibrium σ in which the organization does not exper-

iment forever on the equilibrium path. Let t0(σ) be the time at which experimentation

stops on the equilibrium path. Clearly, we have t0(σ) ≤ ϕ(0), as otherwise m0 would

switch to the safe policy at time 0. As before, if a success occurs or if the organization

switches to the safe policy, everyone joins the organization permanently.

Consider what happens at time t0(σ) if mt0(σ) deviates and continues experi-

menting. Suppose first that ϕ(t0(σ)) ∈ (t0(σ),∞). In a pure strategy equilibrium,

there must be a time t1(σ) ≥ t0(σ) for which experimentation stops in this continua-

tion, and it must satisfy t1(σ) = ϕ(t0(σ)). To see why, suppose that t1(σ) > ϕ(t0(σ)).

In this case, for ε > 0 sufficiently small, mt0(σ)+ε would strictly prefer to stop experi-

menting, a contradiction. On the other hand, if $t_1(σ) < ϕ(t_0(σ))$, then $m_{t_0(σ)}$ would strictly prefer to deviate from the equilibrium path and not stop.

Next, suppose that $ϕ(t_0(σ)) = \infty$, that is, $m_{t_0(σ)}$ weakly prefers to continue experimenting regardless of the continuation. Then it must be that $t_1(σ) = \infty$ and $V\!\left(p_{t_0(σ)}(m_{t_0(σ)})\right) = s/\gamma$, and in this case we must still have $t_1(σ) = ϕ(t_0(σ))$.

Now suppose that ϕ(t0(σ)) = t0(σ), that is, mt0(σ) weakly prefers to stop regard-

less of the continuation. In this case, the implied sequence of points is (t0(σ), t0(σ), . . .).

This does not fully describe the equilibrium, as it does not specify what happens con-

ditional on not experimentation by t0(σ), but still provides enough information to

characterize the equilibrium path fully, as in any equilibrium experimentation must

stop at t0(σ).

Next, we show that if ϕ is increasing and t ∈ [0, ϕ(0)], then (t, ϕ(t), ϕ(ϕ(t)), . . .)

constitutes an equilibrium. Our construction already shows that mtn(σ) is indifferent

between switching to the safe policy at time tn(σ) and continuing to experiment.

^{39} The fact that $W_T(x)$ is differentiable in $T$ at $(T, x) = (ϕ(t), p_t(m_t))$ and strictly decreasing in $T$ is implied by Lemma 2.


To finish the proof, we have to show that for t not in the sequence of the stopping

times, mt weakly prefers to continue experimenting. Fix t ∈ (tn(σ), tn+1(σ)). Since

t > tn(σ) and ϕ is increasing, we have ϕ(t) ≥ ϕ(tn(σ)) = tn+1(σ). Hence the

definition of ϕ(t) and the fact that T 7→ WT (x) is single-peaked by Lemma 2 imply

that Wtn+1(σ)−t(pt(mt)) ≥ sγ, and Conditions (ii) and (iii) imply that mt weakly prefers

to continue experimenting.

Finally, we show that even if ϕ is not increasing, this construction yields an

equilibrium for at least one value of t ∈ [0, ϕ(0)]. Note that our construction fails if

and only if there is t ∈ (tk(σ), tk+1(σ)) for which ϕ(t) < tk+1(σ). Motivated by this,

we say $t$ is valid if $ϕ(t) = \inf_{t' \ge t} ϕ(t')$, and say $t$ is $n$-valid if $t, ϕ(t), \ldots, ϕ^{(n-1)}(t)$ are

all valid. Let A0 = [0, ϕ(0)] and, for n ≥ 1, let An = {t ∈ [0, ϕ(0)] : t is n-valid}.

Suppose that ϕ(t) > t and ϕ(t) < ∞ for all t. Clearly, An ⊇ An+1 for all n,

and the continuity of ϕ implies that An is closed for all n. In addition, An must be

non-empty for all n by the following argument. Take t0 = t and define a sequence

$\{t_0, t_{-1}, t_{-2}, \ldots, t_{-k}\}$ by $t_i = \max\{ϕ^{-1}(t_{i+1})\}$ for $i \le -1$, and $t_{-k} \in [0, ϕ(0)]$. By

construction, t−k ∈ A0 is k-valid, and, because ϕ(t) < ∞ for all t, if we choose t

large enough, we can make $k$ arbitrarily large.^{40} Then $A = \bigcap_{n=0}^{\infty} A_n \neq \emptyset$ by Cantor's

intersection theorem, and any sequence (t, ϕ(t), . . .) with t ∈ A yields an equilibrium.

The same argument goes through if ϕ(t) = ∞ for some values of t but there are

arbitrarily large t for which ϕ(t) <∞.

If $ϕ(t) = t$ for some $t$, let $\hat{t} = \min\{t \ge 0 : ϕ(t) = t\}$. If there is $ε > 0$ such that $ϕ(t) \ge ϕ(\hat{t})$ for all $t \in (\hat{t}-ε, \hat{t})$, then we can find a finite equilibrium sequence of stopping times by setting $t_0 = \hat{t}$ and using the construction in the previous paragraph. If there is no such $ε$, then the previous argument works.^{41} The only difference is that, to show the non-emptiness of $A_n$, we take $t \to \hat{t}$ instead of making $t$ arbitrarily large.

If $ϕ(t) > t$ for all $t$ and there is $\tilde{t}$ for which $ϕ(t) = \infty$ for all $t \ge \tilde{t}$, without loss of generality, take $\tilde{t}$ to be minimal (that is, let $\tilde{t} = \min\{t \ge 0 : ϕ(t) = \infty\}$). Then we can find a finite sequence of stopping times compatible with equilibrium by taking

^{40} Under the assumption that $ϕ(t) < \infty$ for all $t$, since ϕ is continuous, the image of $ϕ^l$ restricted to the set $[0, ϕ(0)]$ is compact and hence bounded for all $l$. Thus for any $t$ larger than the supremum of this image, $k$ must be larger than $l$.

^{41} If there is $ε > 0$ with the aforementioned property, then $ϕ^{-1}(\hat{t})$ is strictly lower than $\hat{t}$ and reaching $[0, ϕ(0)]$ takes finitely many steps. If there is no such $ε$, then $ϕ^{-1}(\hat{t}) = \hat{t}$ and there exists a sequence converging to $\hat{t}$.


$t_0 = \tilde{t}$, assuming that $m_{t_0}$ stops at $t_0$ and using the above construction. $\square$

Proof of Proposition 3. Part 1 follows from the last claim in Proposition 10:

indeed, it holds whenever the parameters are such that τ is increasing (note also that

T = τ(0)). Part 2 follows from the same proof as Proposition 1. In addition, the

Remark preceding Proposition 3 follows from the first claim in Proposition 10. �

A.2 A Model of Bad News

Lemma 4. In a model of bad news, the value function of an agent with prior $x$ who is in the organization and expects the organization to continue experimenting forever unless a failure is observed is
\[
V(x) = (xr + (1-x)s)\,\frac{1}{\gamma} - (1-x)s\,\frac{1}{\gamma+r}.
\]

Proof of lemma 4. Note that an agent receives an expected flow payoff of

r only if the technology is good and the organization has not switched to the safe

technology upon observing a failure. Because a good technology cannot experience a

failure, as long as experimentation continues, an agent with posterior belief x receives

an expected flow payoff of r with probability x.

Let Pt = x + (1− x)e−rt denote the probability that an agent with prior belief

x assigns to not having a failure by time t. Then

\[
V(x) = \int_0^{\infty} \left(xr + (1-P_\tau)s\right) e^{-\gamma\tau}\,d\tau = \int_0^{\infty} \left(xr + (1-x)(1-e^{-r\tau})s\right) e^{-\gamma\tau}\,d\tau = (xr + (1-x)s)\,\frac{1}{\gamma} - (1-x)s\,\frac{1}{\gamma+r}. \qquad\square
\]

Assumption 1. The parameters $r, s, a, \gamma, f$ are such that for all $t' > t$, $\frac{\partial}{\partial t} W_{t'-t}(p_t(m_t)) \neq 0$ whenever $W_{t'-t}(p_t(m_t)) = \frac{s}{\gamma}$.

Assumption 1 guarantees that the agents' value functions are well-behaved: that is, for each $t'$, the function $t \mapsto W_{t'-t}(p_t(m_t))$ crosses the threshold $\frac{s}{\gamma}$ finitely many times, and is never tangent to it. Under this assumption, Proposition 11 characterizes the equilibrium in the bad news model.


Proposition 11. Under Assumption 1, there is a unique equilibrium. The equilibrium can be described by a finite, possibly empty set of intervals $I_0 = [t_0, t_1]$, $I_1 = [t_2, t_3]$, \ldots, $I_n$ such that $t_0 < t_1 < t_2 < \ldots$ as follows: conditional on the risky policy having been used during $[0, t]$ with no failures, the median $m_t$ switches to the safe policy at time $t$ if and only if $t \in I_k$ for some $k$.

Proof of Proposition 11. We first argue that there exists $T$ such that for all $t \ge T$, if no failures have been observed during $[0, t]$, then $V(p_t(m_t)) > s/\gamma$ and $p_t(m_t)r > s$. Note that, because in a model of bad news agents do not leave the organization, we have $\liminf_{t\to\infty} m_t > 0$. Moreover, $\lim_{t\to\infty} e^{-rt} = 0$. This implies that $\lim_{t\to\infty} p_t(m_t) = \lim_{t\to\infty} \frac{m_t}{m_t + e^{-rt}(1-m_t)} = 1$, so $\lim_{t\to\infty} p_t(m_t)r = r > s$. Provided that no failures have been observed during $[0, t]$, we have $\lim_{t\to\infty} V(p_t(m_t)) = V(1)$ because $V$ is continuous, and $V(1) = \frac{r}{\gamma} > \frac{s}{\gamma}$. Next, we argue that these agents will always experiment.

Claim 11.1. If $p_t(m_t)r > s$, then in any equilibrium $m_t$ continues experimenting.

Proof of Claim 11.1. Suppose for the sake of contradiction that this is not the case. Let $t + t^+$ denote the first time after $t$ when the equilibrium prescribes a switch to the safe policy.^{42} Because experimentation will stop after a period of length $t^+$, the payoff to $m_t$ from experimenting is

\[
\begin{aligned}
W_{t^+}(p_t(m_t)) &= \int_0^{t^+} \left(p_t(m_t)r + (1-P_\tau)s\right) e^{-\gamma\tau}\,d\tau + \int_{t^+}^{\infty} s e^{-\gamma\tau}\,d\tau \\
&\ge \int_0^{t^+} p_t(m_t) r e^{-\gamma\tau}\,d\tau + \int_{t^+}^{\infty} s e^{-\gamma\tau}\,d\tau = p_t(m_t)r\,\frac{1-e^{-\gamma t^+}}{\gamma} + s\,\frac{e^{-\gamma t^+}}{\gamma}.
\end{aligned}
\]

The payoff from stopping experimentation is $\frac{s}{\gamma}$. Then, since $p_t(m_t)r > s$ by assumption, we have $\frac{1-e^{-\gamma t^+}}{\gamma}\,p_t(m_t)r + \frac{e^{-\gamma t^+}}{\gamma}\,s > \frac{s}{\gamma}$, so $m_t$ strictly prefers to continue experimenting, a contradiction. $\square$

Our results so far are already enough to deal with one important case. If $V(p_t(m_t)) > s/\gamma$ for all $t$, then the organization experiments forever. The reason is as follows. For $t \ge T$, all pivotal agents $m_t$ continue experimenting by Claim 11.1. Let $\mathcal{T} \subseteq [0, T)$ be the set of times for which the pivotal agent at that time stops

^{42} We write the argument assuming that $t^+ > 0$. If $t^+ = 0$, the proof follows a similar argument leveraging Condition (iii).


experimenting in equilibrium, and assume $\mathcal{T}$ is nonempty. Let $t^* = \sup \mathcal{T}$. If $t^* \in \mathcal{T}$, then $m_{t^*}$ stops experimenting even though $V(p_{t^*}(m_{t^*})) > s/\gamma$ and $m_{t^*}$ gets perpetual experimentation by continuing, a contradiction. If $t^* \notin \mathcal{T}$, a similar argument can be made leveraging Condition (iii).

Suppose then that there exists $\underline{t} \le T$ such that $V(p_{\underline{t}}(m_{\underline{t}})) < s/\gamma$.

Claim 11.2. Suppose that on the equilibrium path, the organization continues experimenting for time $t^+$ unless a failure occurs and then switches to the safe policy. Then the value function of an agent with prior $x$ in this equilibrium is given by
\[
(xr + (1-x)s)\,\frac{1-e^{-\gamma t^+}}{\gamma} - (1-x)s\,\frac{1-e^{-(\gamma+r)t^+}}{\gamma+r} + e^{-\gamma t^+}\frac{s}{\gamma}.
\]

Proof of Claim 11.2. Because experimentation ends after a period of length $t^+$, the value function of agent $x$ is given by
\[
\int_0^{t^+} \left(xr + (1-P_\tau)s\right)e^{-\gamma\tau}\,d\tau + \int_{t^+}^{\infty} s e^{-\gamma\tau}\,d\tau = (xr + (1-x)s)\,\frac{1-e^{-\gamma t^+}}{\gamma} - (1-x)s\,\frac{1-e^{-(\gamma+r)t^+}}{\gamma+r} + e^{-\gamma t^+}\frac{s}{\gamma}. \qquad\square
\]

Claim 11.3. Suppose that in some equilibrium $m_{t_0}$ stops experimenting. If for all $t \in [\underline{t}, t_0)$ we have $p_t(m_t)r < s$, then for all $t \in [\underline{t}, t_0)$, $m_t$ stops experimenting.

Proof of Claim 11.3. Suppose for the sake of contradiction that this is not the case. Then there exists a non-empty subset $B \subseteq [\underline{t}, t_0)$ such that for all $t \in B$, $m_t$ continues experimenting.

There are two cases. In the first case, $B$ has a non-empty interior. In this case, for all $ε > 0$ small, there must exist $\tau \in [\underline{t}, t_0)$ such that, starting at time $\tau$, experimentation continues up to time $\tau + ε$ and then stops.^{43}

By Claim 11.2, the payoff to $m_\tau$ from continuing experimentation is
\[
W_ε(p_\tau(m_\tau)) = (p_\tau(m_\tau)r + (1 - p_\tau(m_\tau))s)\,\frac{1-e^{-\gamma ε}}{\gamma} - (1 - p_\tau(m_\tau))s\,\frac{1-e^{-(\gamma+r)ε}}{\gamma+r} + e^{-\gamma ε}\frac{s}{\gamma},
\]
which is of the form $\frac{s}{\gamma} + (p_\tau(m_\tau)r - s)ε + O(ε^2)$. The payoff from stopping experimentation is $\frac{s}{\gamma}$. Then, since

^{43} To find such $\tau$, let $t$ be in the interior of $B$, and let $\tilde{t} = \inf\{t' \ge t : t' \notin B\}$. Then $\tau = \tilde{t} - ε$ works for all $ε > 0$ small enough.


pτ (mτ )r < s by assumption, for ε small enough mτ strictly prefers to stop experi-

menting, a contradiction.

In the second case, the interior of B is empty. In this case, the proof follows a

similar argument leveraging Condition (iii). �

Let $t_{2n+1} = \sup\left\{t : V(p_t(m_t)) < \frac{s}{\gamma}\right\}$ denote the largest time for which the median stops experimenting.

Let $T_1 = \{t : p_t(m_t)r \le s\}$ and $T_2 = \{t : p_t(m_t)r > s\}$. Our genericity assumption (Assumption 1) implies that $T_1$ and $T_2$ are finite collections of intervals. Enumerate the intervals such that $T_1 = \cup_{i=0}^{n}[\underline{t}_i, \overline{t}_i]$.

Suppose first that $p_t(m_t)r \le s$ for all $t < t_{2n+1}$. In this case, by Claim 11.3, for all $t \le t_{2n+1}$, $m_t$ stops experimentation. Then we set $n = 0$, $t_0 = 0$ and $I_0 = [t_0, t_1]$.

Suppose next that there exists $t < t_{2n+1}$ such that $p_t(m_t)r > s$. Set $t_{2n} = \sup\{t < t_{2n+1} : p_t(m_t)r > s\}$. Note that, because $F$ admits a continuous density, $t \mapsto p_t(m_t)$ is continuous, which implies that we must have $p_{t_{2n}}(m_{t_{2n}})r - s = 0$. Then Claim 11.3 implies that for all $t \in [t_{2n}, t_{2n+1}]$, $m_t$ stops experimentation. Note also that $t_{2n} < t_{2n+1}$ as $s = \gamma V(p_{t_{2n+1}}(m_{t_{2n+1}})) > p_{t_{2n+1}}(m_{t_{2n+1}})r$.

Let us conjecture a continuation equilibrium path on which, starting at t, the organization experiments until t_{2n}. Recall that W_{t_{2n}−t}(x) denotes the value function of an agent with belief x (at time t) given this continuation equilibrium path. We then let t_{2n−1} = sup{t < t_{2n} : W_{t_{2n}−t}(p_t(m_t)) ≤ s/γ}.

Note that, because, by construction, for t ∈ (t_{2n−1}, t_{2n}) we have W_{t_{2n}−t}(p_t(m_t)) > s/γ, the median m_t continues experimentation for all t ∈ (t_{2n−1}, t_{2n}).

Since F admits a continuous density, t ↦ W_{t_{2n}−t}(p_t(m_t)) is continuous, which implies that we must have t_{2n−1} = max{t < t_{2n} : W_{t_{2n}−t}(p_t(m_t)) ≤ s/γ}. Note that it is then consistent with equilibrium for the median m_{t_{2n−1}} to stop experimenting.

Now note that if W_{t_{2n}−t_{2n−1}}(p_{t_{2n−1}}(m_{t_{2n−1}})) = s/γ, then p_{t_{2n−1}}(m_{t_{2n−1}})r < s. By continuity, there exists an interval [t̲_i, t̄_i] in T_1 such that t_{2n−1} ∈ [t̲_i, t̄_i] (and t̲_i satisfies t̲_i = min{t < t_{2n−1} : p_t(m_t)r ≤ s}).

Set t_{2n−2} = t̲_i. Because p_t(m_t)r ≤ s for all t ∈ [t_{2n−2}, t_{2n−1}], Claim 11.3 implies that, for all t ∈ [t_{2n−2}, t_{2n−1}], m_t stops experimenting.


We then proceed inductively in the above manner, finding the largest t strictly less than t_{2n−2} such that W_{t_{2n−2}−t}(p_t(m_t)) ≤ s/γ. Because T_1 is a finite collection of intervals, the induction terminates in a finite number of steps.

The equilibrium is generically unique for the following reason. Under Assumption 1, each t_{2k+1} satisfies not only W_{t_{2k+2}−t_{2k+1}}(p_{t_{2k+1}}(m_{t_{2k+1}})) = s/γ but also ∂/∂t W_{t_{2k+2}−t}(p_t(m_t))|_{t=t_{2k+1}} > 0, that is, W_{t_{2k+2}−t}(p_t(m_t)) < s/γ for all t < t_{2k+1} close enough to t_{2k+1}. Thus, even if we allow m_{t_{2k+1}} to continue experimenting, all agents in (t_{2k+1} − ε, t_{2k+1}) must stop as they strictly prefer to do so. Likewise, each t_{2k} satisfies not only p_{t_{2k}}(m_{t_{2k}})r − s = 0 but also ∂/∂t p_t(m_t)|_{t=t_{2k}} < 0, that is, p_t(m_t)r − s > 0 for all t < t_{2k} close enough to t_{2k}. Thus, even if we allow m_{t_{2k}} to stop experimenting, all agents in (t_{2k} − ε, t_{2k}) must continue as they strictly prefer to do so. □

Proof of Proposition 4. Part 1 is proved as part of Proposition 11. Part 2 follows from the characterization given in Proposition 11, in particular, from the observation that t_{2n} < t_{2n+1}. □

A.3 A Model of Imperfectly Informative (Good) News

Proposition 12. If V_{m(L)}(L) > s/γ for all L, then there is a unique equilibrium. In it, the organization experiments forever. Moreover, if f is non-decreasing, then

V_{m(L)}(L) ≥ V(2(a−c)/((r−c)+(a−c))) ≥ (1/γ)·((r−c)a + (a−c)r)/((r−c)+(a−c)),

so there exist parameter values such that V_{m(L)}(L) > s/γ for all L.

Proof of Proposition 12. The proof is similar to the proof for the baseline model (Propositions 1 and 9). If V_{m(L)}(L) > s/γ for all L, perpetual experimentation is clearly an equilibrium, as each pivotal agent m(L) has a choice between V_{m(L)}(L) and s/γ, and strictly prefers the former. The equilibrium is unique by the following argument. Suppose for the sake of contradiction that there is another equilibrium in which experimentation stops whenever L ∈ 𝓛, for some 𝓛 ≠ ∅. Let W_𝓛(x) denote the continuation utility of an agent with current belief x when she expects the organization to stop whenever L ∈ 𝓛.

For L close enough to 0, it can be shown that pivotal agents will prefer to experiment no matter what equilibrium continuation they expect. That is, W_𝓛(x) ≥ s/γ for all 𝓛 and x close enough to 1. In other words, there is L_0 > 0 such that 𝓛 ⊆ (L_0, +∞).

Let L_1 = inf 𝓛. It can be shown that, because m(L_1) would rather experiment forever than not at all, she would also prefer to experiment until L hits 𝓛. That is, if V_{m(L_1)}(L_1) > s/γ then W_𝓛(p(L_1, m(L_1))) > s/γ. To see why, suppose that W_𝓛(p(L_1, m(L_1))) ≤ s/γ. Clearly, this implies W_𝓛(p(L, m(L_1))) < s/γ for any L > L_1. Note that changing the set of stopping states from 𝓛 to ∅ changes the payoff m(L_1) gets from some continuations—namely, continuations starting at states L > L_1—from s/γ to objects of the form W_𝓛(p(L, m(L_1))) for L > L_1. Hence we must have V_{m(L_1)}(L_1) < s/γ, a contradiction.

This proves the first statement. Next, we provide an explicit bound on V when f is non-decreasing.

Claim 12.1. If the organization is experimenting at time t, then an agent with prior belief x is in the organization at time t if and only if L(k, t) ≤ x(r−a)/((1−x)(a−c)).

Proof of Claim 12.1. Because agents make their membership decisions based on expected flow payoffs, agent x is a member at time t if and only if p(L, x)r + (1 − p(L, x))c ≥ a, that is, if p(L, x) ≥ (a−c)/(r−c). Since p(L, x) = x/(x + (1−x)L(k, t)), this is equivalent to L(k, t) ≤ x(r−a)/((1−x)(a−c)). □

Claim 12.2. If f is uniform, lim_{L→∞} p(L, m(L)) = 2(a−c)/((r−c)+(a−c)).

Proof of Claim 12.2. By Claim 12.1, the marginal member y(L) satisfies L = y(L)(r−a)/((1−y(L))(a−c)), whence y(L) = (a−c)/((a−c) + (r−a)(1/L)). If f is uniform, we have m(L) = (1 + y(L))/2, so m(L) = (1/2)·(2L(a−c) + r − a)/(L(a−c) + r − a). Recall that p(L, m(L)) = 1/(1 + (1/m(L) − 1)L). Then p(L, m(L)) = (2L(a−c) + r − a)/(L(2(a−c) + r − a) + r − a), so lim_{L→∞} p(L, m(L)) = 2(a−c)/((r−c)+(a−c)). □

Assume that f is uniform. By Lemma 1, x ↦ V(x) is strictly increasing and, by Lemma 5, L ↦ p(L, m(L)) is decreasing, so L ↦ V(p(L, m(L))) is decreasing. Thus, for all L, V(p(L, m(L))) ≥ lim_{L′→∞} V(p(L′, m(L′))) ≥ s/γ. By Claim 12.2, lim_{L′→∞} V(p(L′, m(L′))) ≥ V(2(a−c)/((r−c)+(a−c))). By Lemma 3, this result extends to all non-decreasing densities f.

Next, we show that V(2(a−c)/((r−c)+(a−c))) ≥ (1/γ)·((r−c)a + (a−c)r)/((r−c)+(a−c)). Note that, in an equilibrium in which the organization experiments forever, the payoff of an agent x is bounded below by her payoff from staying in the organization forever. This is given by r/γ if the risky policy is good and c/γ if not. Then V(x) ≥ x·r/γ + (1 − x)·c/γ, which implies that V(2(a−c)/((r−c)+(a−c))) ≥ (1/γ)·((r−c)a + (a−c)r)/((r−c)+(a−c)).

Finally, note that there exist parameter values such that (1/γ)·((r−c)a + (a−c)r)/((r−c)+(a−c)) ≥ s/γ is satisfied. In general, for any values of r, a and c satisfying r > a > c > 0, there is s*(r, a, c) such that the condition holds if s ≤ s*(r, a, c), and, moreover, s*(r, a, c) ∈ (a, r). □

Proof of Proposition 5. Follows from Proposition 12. □

The following two results (Lemma 5 and Lemma 6) show how to construct prior distributions for which the mapping L ↦ p(L, m(L)) has a strict interior minimum, as needed for Proposition 13. Lemma 5 provides a tractable necessary and sufficient condition, and Lemma 6 gives an explicit construction.

Lemma 5. If the distribution of priors is power law, then L ↦ p(L, m(L)) is decreasing. Moreover, if L_0 m′(L_0) < m(L_0)(1 − m(L_0)), then L ↦ p(L, m(L)) is strictly decreasing at L = L_0, and if L_0 m′(L_0) > m(L_0)(1 − m(L_0)), then L ↦ p(L, m(L)) is strictly increasing at L = L_0.

Proof of Lemma 5. The density of the power law distribution is given by f_ω(x) = (1 − x)^ω (ω + 1). Hence we have F(z) = 1 − (1 − z)^{ω+1}, and the CDF of the distribution with support on [y, 1] is given by ((1 − y)^{ω+1} − (1 − z)^{ω+1})/(1 − y)^{ω+1}.

Recall that m(L) and y(L) denote the median and the marginal member of the organization, respectively, when the state variable is L. The above argument implies that the median must satisfy ((1 − y(L))^{ω+1} − (1 − m(L))^{ω+1})/(1 − y(L))^{ω+1} = 1/2. Equivalently, we must have (1 − m(L))^{ω+1} = (1/2)(1 − y(L))^{ω+1}. Then the median must satisfy 1 − m(L) = (1 − y(L))·2^{−1/(ω+1)}, or m(L) = 1 − κ + κy(L) for κ = 2^{−1/(ω+1)}.

Note that p(L, m(L)) = 1/(1 + (1/m(L) − 1)L). Then ∂/∂L p(L, m(L)) ∝ −∂/∂L (1 + (1/m(L) − 1)L), and

∂/∂L (1 + (1/m(L) − 1)L) = ∂/∂L ((1/m(L) − 1)L) = 1/m(L) − 1 − (L/m(L)²)·m′(L).

This implies that if L_0 m′(L_0) < m(L_0)(1 − m(L_0)), then L ↦ p(L, m(L)) is strictly decreasing at L = L_0, and if L_0 m′(L_0) > m(L_0)(1 − m(L_0)), then L ↦ p(L, m(L)) is strictly increasing at L = L_0.

After some algebra, using the fact that y(L) = (a−c)/((a−c) + (r−a)(1/L)), we get that if the distribution of priors is power law, then Lm′(L) < m(L)(1 − m(L)) is equivalent to 0 < (1 − κ)(1 − ζ), where ζ = (a−c)/(r−c). Since κ and ζ are between 0 and 1, this always holds. □
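As an informal check of the condition driving Lemma 5, the sketch below evaluates L m′(L) and m(L)(1 − m(L)) on a grid for a power-law prior, using the median relation m(L) = 1 − κ + κ y(L) derived above; the parameter values are arbitrary.

```python
# Sketch (illustrative parameters): with a power-law prior density
# f_w(x) = (w+1)(1-x)^w, check numerically that L*m'(L) < m(L)(1-m(L)) for L > 0,
# so that L -> p(L, m(L)) is decreasing (Lemma 5).
import numpy as np

r, a, c, w = 1.0, 0.5, 0.2, 2.0          # arbitrary values with r > a > c > 0
kappa = 2.0 ** (-1.0 / (w + 1.0))        # median relation: m = 1 - kappa + kappa*y

def y(L):
    return L * (a - c) / (L * (a - c) + (r - a))

def m(L):
    return 1.0 - kappa + kappa * y(L)

def m_prime(L, h=1e-6):
    return (m(L + h) - m(L - h)) / (2 * h)   # central-difference derivative

grid = np.logspace(-3, 3, 200)
ok = all(L * m_prime(L) < m(L) * (1 - m(L)) for L in grid)
print("condition holds on the grid:", ok)
```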

Lemma 6. There exist distributions for which there exist states L_1 < L_2 such that L_1 is a unique minimizer of p(L, m(L)) and L ↦ p(L, m(L)) is strictly increasing on (L_1, L_2).

Proof of Lemma 6. Consider a distribution with a density f(x) = d for x ∈ [0, b] and f(x) = d′ for x ∈ [b, 1]. Note that we must have db + d′(1 − b) = 1 so that f integrates to 1. Define r̄ = r − c, ā = a − c, y = y(L), m = m(L), z = p(L, m(L)). Let L_1 be such that m(L_1) = b and let L_2 be such that y(L_2) = b. Clearly, 0 < L_1 < L_2. For L > L_2, m(L) and p(L, m(L)) are the same as in the uniform case. In particular, p(L, m(L)) = (2Lā + r̄ − ā)/(L(ā + r̄) + r̄ − ā), which is decreasing in L. Moreover, with the notation we have defined, the formula for y(L) can be written as y = Lā/(Lā + r̄ − ā).

For L ∈ (L_1, L_2), we have d(b − y) + d′(m − b) = d′(1 − m), that is, m = (1 + b)/2 − db/(2d′) + (d/(2d′))y. Equivalently, m = (1 − 1/(2d′)) + (d/(2d′))y = (1 − 1/(2d′)) + (d/(2d′))·Lā/(Lā + r̄ − ā). Then

1/z − 1 = L(1 − m)/m = L·[L((1 − d)/(2d′))ā + (1/(2d′))(r̄ − ā)] / [(1 − 1/(2d′) + d/(2d′))Lā + (1 − 1/(2d′))(r̄ − ā)].

For L < L_1, we have d(m − y) = d(b − m) + d′(1 − b), that is, 2dm = db + d′(1 − b) + dy = 1 + dy, so m = 1/(2d) + y/2, and

1/z − 1 = L(1 − m)/m = L·[L(1/2 − 1/(2d))ā + (1 − 1/(2d))(r̄ − ā)] / [(1/(2d) + 1/2)Lā + (1/(2d))(r̄ − ā)].

Now take d′ = 1/2 and any d > 1 (note that choosing both pins down b = 1/(2d − 1)). Then we can verify that L ↦ 1/p(L, m(L)) − 1 is increasing on (0, L_1) and decreasing on (L_1, L_2). In other words, L ↦ p(L, m(L)) is decreasing on (0, L_1) and (L_2, ∞) but increasing on (L_1, L_2), so L_1 is a local minimizer for p(L, m(L)).

Moreover, we can verify that under some extra conditions L_1 is a global minimizer: note that lim_{L→∞} 1/p(L, m(L)) − 1 = (r̄ − ā)/(2ā), while 1/p(L_1, m(L_1)) − 1 = [L_1(1 − d)ā + r̄ − ā]/(dā).

Since m(L_1) = b, we have

1/p(L_1, m(L_1)) − 1 = L_1/m(L_1) − L_1 = L_1/b − L_1 = [L_1(1 − d)ā + r̄ − ā]/(dā),

which yields L_1 = (r̄ − ā)/(ā(d/b − 1)). Hence

1/p(L_1, m(L_1)) − 1 = L_1/b − L_1 = (r̄ − ā)/ā · (1/b − 1)/(d/b − 1) = (r̄ − ā)/ā · (1 − b)/(d − b) = (r̄ − ā)/ā · (2d − 2)/(2d² − d − 1) = (r̄ − ā)/ā · 1/(d + 1/2).

Since L_1 is a global minimizer of p(L, m(L)) exactly when 1/p(L_1, m(L_1)) − 1 exceeds lim_{L→∞} 1/p(L, m(L)) − 1 = (r̄ − ā)/(2ā), that is, when 1/(d + 1/2) > 1/2, L_1 is a global minimizer if we take d ∈ (1, 3/2). □
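The construction in Lemma 6 can be explored numerically. The sketch below (arbitrary parameter values with r > a > c > 0 and d ∈ (1, 3/2), not the paper's own computation) computes the median of the truncated two-piece density by bisection and checks that L ↦ p(L, m(L)) attains its minimum where m(L) = b, as the proof asserts.

```python
# Sketch: two-piece density of Lemma 6 with d' = 1/2, d in (1, 3/2), b = 1/(2d-1).
# Checks that L -> p(L, m(L)) is minimized at a point where m(L) is close to b.
import numpy as np

r, a, c, d = 1.0, 0.5, 0.2, 1.25
dp = 0.5                      # d'
b = 1.0 / (2 * d - 1.0)       # so that d*b + d'*(1-b) = 1

def mass(lo, hi):
    """Prior mass of [lo, hi] under the piecewise-constant density."""
    below = d * (min(hi, b) - min(lo, b)) if lo < b else 0.0
    above = dp * (max(hi, b) - max(lo, b)) if hi > b else 0.0
    return below + above

def y(L):
    return L * (a - c) / (L * (a - c) + (r - a))

def median(L):
    """Median of the prior distribution truncated to the members [y(L), 1]."""
    target = mass(y(L), 1.0) / 2.0
    lo, hi = y(L), 1.0
    for _ in range(60):                 # bisect for m with mass(y(L), m) = target
        mid = (lo + hi) / 2.0
        if mass(y(L), mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def p(L, x):
    return x / (x + (1.0 - x) * L)

grid = np.logspace(-2, 3, 400)
vals = np.array([p(L, median(L)) for L in grid])
L_star = grid[np.argmin(vals)]
print("approximate minimizer:", L_star, "  m(L*) close to b?", abs(median(L_star) - b) < 0.05)
```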

Proposition 13. There exist r, s, a, c, f, ε ∈ (0, 1] and L* > 0 such that an equilibrium of the following form exists: whenever L = L*, the organization stops experimenting with probability ε, and whenever L ≠ L*, the organization continues experimenting with probability one.

Proof of Proposition 13. For convenience, we multiply all the value functions in this proof by γ. Let V^ε_x(L) denote the value function of agent x given that the state is L(k, t) = L and the behavior on the equilibrium path is as described in the Proposition. Note that V^0_x(L) is the value function of agent x when the state is L and there is perpetual experimentation.

We claim that we can choose the density f such that there is a unique global minimum of L ↦ V^0_{m(L)}(L), which we will call L*, and in addition so that V^0_{m(L)}(L) has a kink at L*. By Corollary 1, this is equivalent to p(L, m(L)) being uniquely minimized at L* with a kink at L*. This claim then follows from Lemma 6.⁴⁴

Note that V^0_{m(L)}(L) does not depend on s and the density f constructed in the proof of Lemma 6 does not depend on s, which implies that L* does not depend on s. Then we can choose s such that

V^0_{m(L*)}(L*) = s     (1)

44 Technically, we also need the condition that V^0_{m(L*)}(L*) < lim_{L→∞} V^0_{m(L)}(L), but this is also satisfied by the construction in Lemma 6.


Then, because L* is the unique minimizer of L ↦ V^0_{m(L)}(L), we have V^0_{m(L)}(L) > s for all L ≠ L*.

We aim to show that if we change the equilibrium to require that experimentation stops at L = L* with an appropriately chosen probability ε > 0, the constraints V^ε_{m(L*)}(L*) = s and V^ε_{m(L)}(L) ≥ s for all L ≠ L* still hold.

It is useful at this point to write the value function recursively, as follows. For any strategy profile and any L, L′ ∈ ℝ, define

ζ_{x,L′}(L) = ∫_0^∞ γe^{−γt} Pr[∃ t′ ∈ [0, t] : L_{t′} = L′ | L_0 = L] dt

V_{x,L′}(L) = [∫_0^∞ γe^{−γt} E[u_x(h_t) 1{∄ t′ ∈ [0, t] : L_{t′} = L′} | L_0 = L] dt] / (1 − ζ_{x,L′}(L))

where u_x(h_t) is agent x's flow payoff at time t and history h_t, and the expectation is taken with respect to the stochastic process induced by the equilibrium strategy and the stochastic process (L_τ)_τ.

Intuitively, ζ_{x,L′}(L) is the weighted discounted probability that the stochastic process (L_t)_t hits the value L′ at least once, and the numerator of V_{x,L′}(L) is the expected utility of agent x starting with L_0 = L but setting the continuation value to zero when (L_t)_t hits L′. Then the value function can be written recursively as

V_x(L) = (1 − ζ_{x,L′}(L))V_{x,L′}(L) + ζ_{x,L′}(L)V_x(L′)

Taking L′ = L*, this implies that for any ε ∈ [0, 1], we have

V^ε_x(L) = (1 − ζ_{x,L*}(L))V_{x,L*}(L) + ζ_{x,L*}(L)V^ε_x(L*)     (2)

where ζ_{x,L*}(L) is independent of ε, since changing ε has no impact on the policy path except when L = L*. Let

ζ_{x,L′}(L+) = ∫_0^∞ γe^{−γt} Pr[∃ t′ ∈ (0, t] : L_{t′} = L′ | L_0 = L] dt

V_{x,L′}(L+) = [∫_0^∞ γe^{−γt} E[u_x(h_t) 1{∄ t′ ∈ (0, t] : L_{t′} = L′} | L_0 = L] dt] / (1 − ζ_{x,L′}(L+))


Observe that V^ε_{x,L*}(L*+) = lim_{L↘L*} V^ε_{x,L*}(L) and ζ_{x,L*}(L*+) = lim_{L↘L*} ζ_{x,L*}(L). Let W^ε_x = V^ε_{x,L*}(L*+) and W̄^ε_x = lim_{L↘L*} V^ε_x(L). W̄^ε_x is the expected continuation value of agent x when L = L* and the median member, m(L*), has just decided not to stop experimenting. This is closely related to V^ε_x(L*), which is the expected continuation value where the expectation is taken before m(L*) has decided whether to stop experimenting or not. Specifically, we have

V^ε_x(L*) = εs + (1 − ε)W̄^ε_x = εs + (1 − ε)((1 − ζ_{x,L*}(L*+))W^ε_x + ζ_{x,L*}(L*+)V^ε_x(L*))

Solving this for V^ε_x(L*), we obtain

V^ε_x(L*) = [εs + (1 − ε)(1 − ζ_{x,L*}(L*+))W^ε_x] / [1 − (1 − ε)ζ_{x,L*}(L*+)] = V^0_x(L*) + ε(s − V^0_x(L*))/(1 − (1 − ε)ζ_{x,L*}(L*+))     (3)

where the second equality follows from the fact that W^ε_x = V^0_x(L*), because W^ε_x is the continuation value of the agent conditional on the event that (L_t)_t never hits L* again, which means that in this case experimentation continues forever. Hence, substituting (3) into (2), we obtain

V^ε_x(L) = (1 − ζ_{x,L*}(L))V_{x,L*}(L) + ζ_{x,L*}(L)(V^0_x(L*) + ε(s − V^0_x(L*))/(1 − (1 − ε)ζ_{x,L*}(L*+)))
        = ζ_{x,L*}(L)·ε·(s − V^0_x(L*))/(1 − (1 − ε)ζ_{x,L*}(L*+)) + V^0_x(L)     (4)

At the same time, because we have assumed that V_{m(L)}(L) is minimized at L* with a kink at L*, there exist δ > 0 and K > 0 such that for all L ∈ (L* − δ, L* + δ)

V^0_{m(L)}(L) = V^0_{p(L,m(L))}(1) ≥ V^0_{p(L*,m(L*))}(1) + K|L − L*| = s + K|L − L*|     (5)

where the first equality follows from Corollary 1, and the last equality follows from (1). On the other hand, for L ∉ (L* − δ, L* + δ) there exists K′ > 0 such that

V^0_{m(L)}(L) = V^0_{p(L,m(L))}(1) ≥ V^0_{p(L*,m(L*))}(1) + K′ = s + K′     (6)

where the first equality follows from Corollary 1, the inequality follows from the fact that p(L, m(L)) − p(L*, m(L*)) is bounded away from zero in this case, and the second equality follows from (1).

By (4), V^ε_{m(L)}(L) ≥ s is equivalent to V^0_{m(L)}(L) ≥ s − ζ_{m(L),L*}(L)·ε·(s − V^0_{m(L)}(L*))/(1 − (1 − ε)ζ_{m(L),L*}(L*+)). If V^0_{m(L)}(L*) − s ≤ 0, then we are done, so suppose that V^0_{m(L)}(L*) − s > 0.

Suppose that L ∈ (L* − δ, L* + δ). Then, by (5), it is sufficient that

s + K|L − L*| ≥ s − ζ_{m(L),L*}(L)·ε·(s − V^0_{m(L)}(L*))/(1 − (1 − ε)ζ_{m(L),L*}(L*+))
⟺ K|L − L*| ≥ ζ_{m(L),L*}(L)·ε·(V^0_{m(L)}(L*) − s)/(1 − (1 − ε)ζ_{m(L),L*}(L*+))
⟸ K|L − L*| ≥ ε·(V^0_{m(L)}(L*) − s)/(1 − ζ_{m(L),L*}(L*+))
⟺ ε ≤ K|L − L*|·(1 − ζ_{m(L),L*}(L*+))/(V^0_{m(L)}(L*) − V^0_{m(L*)}(L*))

where we have used that ζ_{m(L),L*}(L) ∈ (0, 1).

Suppose next that L ∉ (L* − δ, L* + δ). Then, because V^0_{m(L)}(L) ≥ s + K′ by (6), it is sufficient that

s + K′ ≥ s − ζ_{m(L),L*}(L)·ε·(s − V^0_{m(L)}(L*))/(1 − (1 − ε)ζ_{m(L),L*}(L*+))
⟺ K′ ≥ ζ_{m(L),L*}(L)·ε·(V^0_{m(L)}(L*) − s)/(1 − (1 − ε)ζ_{m(L),L*}(L*+))
⟸ K′ ≥ ε·(r − s)/(1 − ζ_{m(L),L*}(L*+))
⟺ ε ≤ K′·(1 − ζ_{m(L),L*}(L*+))/(r − s)

where we have used that ζ_{m(L),L*}(L) ∈ (0, 1) and V^0_{m(L)}(L*) ≤ r.

Let ε_1 = inf_{L∈(L*−δ,L*+δ)} K(1 − ζ_{m(L),L*}(L*+))·|L − L*|/(V^0_{m(L)}(L*) − V^0_{m(L*)}(L*)) and ε_2 = inf_{L∉(L*−δ,L*+δ)} K′(1 − ζ_{m(L),L*}(L*+))/(r − s). To have min{ε_1, ε_2} > 0 we need to verify that sup_x ζ_{x,L*}(L*+) < 1 and that sup_{L∈(L*−δ,L*+δ)} |(V^0_{m(L)}(L*) − V^0_{m(L*)}(L*))/(L − L*)| is finite. That sup_x ζ_{x,L*}(L*+) < 1 is immediate. The fact that sup_{L∈(L*−δ,L*+δ)} |(V^0_{m(L)}(L*) − V^0_{m(L*)}(L*))/(L − L*)| is finite follows from the fact that ∂/∂x V^0_x(L*) and m′(L) are bounded.

Then choosing ε ∈ (0, min{ε_1, ε_2}) delivers the result. □


Proof of Proposition 6. Take the example constructed in Proposition 13, and assume that L_0 > L*.⁴⁵

Let P_θ(L_0) be the probability that, conditional on starting at L_0 and the state being θ ∈ {G, B}, the organization stops experimenting at any finite time t < ∞. We will show that P_G(L_0) > P_B(L_0) for L_0 large enough. In fact, we will prove a stronger result: we will show that there is C > 0 such that P_G(L_0) ≥ C > 0 for all L_0 > L*, but lim_{L_0→∞} P_B(L_0) = 0.

Let Q_θ(L_0, L*) denote the probability that there exists t < ∞ such that L_t ∈ ((c/r)L*, L*] when the state is θ ∈ {G, B}. Q_θ(L_0, L*) is the probability that L_t ever crosses over to the left of L*.

We claim that Q_G(L_0, L*) = 1 for all L_0 > L* but lim_{L_0→∞} Q_B(L_0, L*) = 0.

Let l(k, t) = ln L(k, t), and note that l(k, t) = ln((c/r)^k e^{(r−c)t}) = k ln(c/r) + ln(e^{(r−c)t}) = k(ln(c) − ln(r)) + (r − c)t. Let l_0 = ln(L_0).

When θ = G, we then have (l_t)_t = l_0 + (r − c)t − [ln(r) − ln(c)]N(t), where (N(t))_t is a Poisson process with rate r, that is, N(t) ∼ P(rt). This can be written as a random walk: for integer values of t, l_t − l_0 = Σ_{i=0}^t S_i, where S_i = r − c − [ln(r) − ln(c)]N_i, and N_i ∼ P(r) are iid. Note that E[S_i] = r − c − r(ln(r) − ln(c)) < 0.⁴⁶ Then, by the strong law of large numbers, we have l_t/t → E[S_i] < 0 a.s. as t → ∞, whence l_t → −∞ a.s., implying the first claim.

On the other hand, when θ = B, we have (l_t)_t = l_0 + (r − c)t − [ln(r) − ln(c)]N(t), where (N(t))_t is a Poisson process with rate c. This can be written as a random walk with positive drift: l_t − l_0 = Σ_{i=0}^t S_i, where S_i = r − c − [ln(r) − ln(c)]N_i, N_i ∼ P(c), and E[S_i] = r − c − c(ln(r) − ln(c)) > 0. As above, by the strong law of large numbers, we have l_t → ∞ a.s. as t → ∞.

45 Technically, our definition of L_t requires that L_0 = 1, but we can relax this assumption by considering a continuation of the game starting at some t_0 > 0, where, by assumption, the number of successes at time t_0 is such that the state variable at t_0 is L_0. This example can be fit into our original framework by redefining the density of prior beliefs f to be the density of the posteriors held by agents when L = L_0 and f is as in Proposition 13. With this relabeling, L_0 would equal 1 and L* would shift to some value less than 1. We find it easier to think in terms of shifting L_0 and leaving f unchanged.

46 Let r/c = 1 + x. Then r − c − r(ln(r) − ln(c)) = c(x − (1 + x)ln(1 + x)), where x − (1 + x)ln(1 + x) is negative for all x > 0. Similarly, r − c − c(ln(r) − ln(c)) = c(x − ln(1 + x)), where x − ln(1 + x) is positive for all x > 0.


Note that Q_B(L, (c/r)L) = q is independent of L because (l_t)_t follows a random walk. Now suppose for the sake of contradiction that lim sup_{L→∞} Q_B(L, L*) > 0. We claim that this implies q = 1. Suppose towards a contradiction that q < 1. Fix J ∈ ℕ. Then, for L_0 large enough that (c/r)^{2J+1}L_0 > L*,

Q_B(L_0, L*) ≤ Π_{j=0}^{J} Q_B((c/r)^{2j}L_0, (c/r)^{2j+1}L_0) = q^{J+1}

This implies that, whenever lim sup_{L→∞} Q_B(L, L*) > 0, we have q = 1, as the above equation must hold for arbitrarily large J. Hence (l_t)_t is recurrent, that is, it visits the neighborhood of every l ∈ ℝ infinitely often (Durrett 2010: pp. 190–201). However, this contradicts the fact that lim_{t→∞} l_t = ∞ a.s. Therefore, lim sup_{L→∞} Q_B(L, L*) = 0.

This implies that P_B(L_0) ≤ Q_B(L_0, L*) → 0 as L_0 → ∞. On the other hand, P_G(L_0) ≥ Q_G(L_0, L*)·inf_{L∈((c/r)L*, L*]} P_G(L) > 0. The first inequality holds for the following reason. With probability 1, if L_t = L* for some t, there must be t′ < t such that L_{t′} ∈ ((c/r)L*, L*), which happens with probability Q_G(L_0, L*). Conditional on this event, the probability of hitting state L* in the continuation is P_G(L_{t′}). Note that inf_{L∈((c/r)L*, L*]} P_G(L) > 0 because it is equal to P_G((c/r)L*). □
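The drift computations in footnote 46 and the contrast between P_G and P_B can be illustrated by simulation. The sketch below uses arbitrary values of r and c (it is not the paper's full construction, which also involves the density f and the threshold L* of Proposition 13) and compares how often the log-likelihood path falls below a fixed barrier under the two jump rates.

```python
# Sketch: drift of l_t = ln L_t under each state and a small Monte Carlo of the
# hitting behaviour used in the proof of Proposition 6 (illustrative parameters).
import numpy as np

rng = np.random.default_rng(0)
r, c = 1.0, 0.4                      # arbitrary values with r > c > 0
jump = np.log(r) - np.log(c)

drift_G = r - c - r * jump           # negative: l_t -> -infinity when theta = G
drift_B = r - c - c * jump           # positive: l_t -> +infinity when theta = B
print(drift_G, drift_B)

def hits_below(rate, l0, barrier, T=2000.0):
    """Does l_t = l0 + (r-c)t - jump*N(t), N a rate-`rate` Poisson process, fall below barrier by T?"""
    t, l = 0.0, l0
    while t < T:
        dt = rng.exponential(1.0 / rate)        # time to the next Poisson arrival
        if l + (r - c) * dt - jump < barrier:   # level right after the next downward jump
            return True
        t += dt
        l += (r - c) * dt - jump
    return False

l0, barrier, n = 10.0, 0.0, 300
print("hit frequency, rate r (theta = G):", np.mean([hits_below(r, l0, barrier) for _ in range(n)]))
print("hit frequency, rate c (theta = B):", np.mean([hits_below(c, l0, barrier) for _ in range(n)]))
```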

A.4 Other Extensions

Proof of Proposition 7. Fix an equilibrium σ in which the organization experiments forever and let μ_t be the size of the organization at time t on the equilibrium path. Let g_t = g(μ_t). The first success that happens at time t yields the per-capita payoff of g_t, and all further successes pay 1 (because all agents enter the organization after the first success).

Let P_t = 1 − e^{−rt} denote the probability that there is a success by time t given that the risky technology is good. In the above problem, an agent with belief x who expects experimentation to continue forever has utility

V_{(g_t)_t}(x) = x ∫_0^{t*} e^{−γt}(P_t r + (1 − P_t)g_t r) dt + x ∫_{t*}^∞ e^{−γt}(P_t r + (1 − P_t)a) dt + (1 − x) ∫_{t*}^∞ e^{−γt} a dt

where t* is the time at which the agent leaves, that is, when her posterior reaches a/(g_t r).

Now consider the case in which g_t = g for all t. Then the above expression equals

V_g(x) = x(r/γ − r/(γ + r) + gr(1 − e^{−(γ+r)t*})/(γ + r) + e^{−(γ+r)t*} a/(γ + r) − e^{−γt*} a/γ) + e^{−γt*} a/γ

Suppose that f = f_ω, as in Proposition 9. By the same arguments as in that Proposition, if y_t satisfies p_t(y_t) = a/(gr) for all t, then p_t(m_t) ↘ a/(λ(gr − a) + a) as t → ∞, and, by Claim 9.3 in the proof of Proposition 9, we have t* = −ln(λ)/r.⁴⁷ Then

V_g(a/(λ(gr − a) + a)) = [a/(λgr + (1 − λ)a)]·(r/γ − r/(γ + r) + gr(1 − λ^{(γ+r)/r})/(γ + r) + λ^{(γ+r)/r} a/(γ + r) − λ^{γ/r} a/γ) + λ^{γ/r} a/γ

Since this is a hyperbola in g, it is either increasing in g for all g > 0 or decreasing in g for all g > 0. In particular, when the congestion effect is maximal, that is, when g → ∞, we have

lim_{g→∞} γV_g(a/(λ(gr − a) + a)) = (a/λ)(1 − λ^{(γ+r)/r})·γ/(γ + r) + λ^{γ/r} a = (a/λ)·γ/(γ + r) + λ^{γ/r} a·r/(γ + r)

On the other hand, when the economies of scale are maximal, that is, as g → a/r,⁴⁸

lim_{g→a/r} γV_g(a/(λ(gr − a) + a)) = γ(r/γ − r/(γ + r) + a(1 − λ^{(γ+r)/r})/(γ + r) + λ^{(γ+r)/r} a/(γ + r)) = r·r/(γ + r) + a·γ/(γ + r)

Thus (a/λ)·γ/(γ + r) + λ^{γ/r} a·r/(γ + r) > r·r/(γ + r) + a·γ/(γ + r) is equivalent to lim_{g→∞} V_g(a/(λ(gr − a) + a)) > lim_{g→a/r} V_g(a/(λ(gr − a) + a)). Because V_g(a/(λ(gr − a) + a)) is either increasing in g for all g > 0 or decreasing in g for all g > 0, this condition implies that V_g(a/(λ(gr − a) + a)) is increasing in g for all g > 0. The argument in the case when the inequality is reversed is similar.

47 Note that this is the same t* as in the baseline model.

48 If g < a/r, we enter a degenerate case in which the organization becomes empty immediately.

In addition, note that if V_g(a/(λ(gr − a) + a)) is increasing in g, then we can guarantee that, with a congestion effect,

V_{(g_τ)_{τ≥t}}(p_t(m_t)) > V_{g_t}(p_t(m_t)) > V_{g_t}(a/(λ(g_t r − a) + a)) > V(a/(λ(r − a) + a))

Here the first inequality follows because g_τ ↦ V_{g_t,...,g_τ,...}(x) is increasing, under a congestion effect μ ↦ g(μ) is decreasing and under perpetual experimentation t ↦ μ_t is decreasing, so t ↦ g_t = g(μ_t) is increasing. The second inequality follows because x ↦ V_{g_t}(x) is strictly increasing and p_t(m_t) ↘ a/(λ(gr − a) + a) as t → ∞. The last inequality follows because g ↦ V_g(a/(λ(gr − a) + a)) is increasing, under a congestion effect μ ↦ g(μ) is decreasing, and in the baseline model we have g(μ) = g(1) = 1 for all μ.

Thus the condition to obtain experimentation forever is slacker with a congestion effect than in the baseline model at every t, not just in the limit. By the same argument, the condition for experimentation forever is tighter for all t under economies of scale.⁴⁹ □

49 If g ↦ V_g(a/(λ(gr − a) + a)) is decreasing, it is more difficult to make general statements about what happens away from the limit because in this case the effect of moving away from the limit goes against the result: for instance, under a congestion effect the condition becomes tighter in the limit but increasing g_t slackens the condition.
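The monotonicity-in-g claim and the two limits can be checked numerically. In the sketch below, λ is treated as a free parameter in (0, 1), standing in for the constant defined earlier in the paper, and the other parameter values are arbitrary.

```python
# Sketch: evaluate g -> gamma * V_g(a/(lam*(g*r - a) + a)) on a grid, check that it is
# monotone in g, and compare the endpoints with the two limits computed above.
import numpy as np

r, gamma, a, lam = 1.0, 0.3, 0.5, 0.6     # illustrative values; lam plays the role of lambda
t_star = -np.log(lam) / r

def V(g, x):
    e1 = np.exp(-(gamma + r) * t_star)    # = lam**((gamma + r)/r)
    e2 = np.exp(-gamma * t_star)          # = lam**(gamma/r)
    inner = (r / gamma - r / (gamma + r) + g * r * (1 - e1) / (gamma + r)
             + e1 * a / (gamma + r) - e2 * a / gamma)
    return x * inner + e2 * a / gamma

gs = np.linspace(a / r + 1e-3, 200.0, 2000)
vals = gamma * np.array([V(g, a / (lam * (g * r - a) + a)) for g in gs])
diffs = np.diff(vals)
print("monotone in g:", bool(np.all(diffs > 0) or np.all(diffs < 0)))
print("limit as g -> a/r:", r * r / (gamma + r) + a * gamma / (gamma + r), " value near a/r:", vals[0])
print("limit as g -> inf:", (a / lam) * gamma / (gamma + r) + lam ** (gamma / r) * a * r / (gamma + r),
      " value at g = 200:", vals[-1])
```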

Proof of Proposition 8. We have

V_t(x) = x ∫_t^∞ e^{−γ(τ−t)} [P_τ r + (1 − P_τ)(aF(y_τ) + r(1 − F(y_τ)))] dτ + (1 − x) ∫_t^∞ e^{−γ(τ−t)} aF(y_τ) dτ,

where P_τ = 1 − e^{−r(τ−t)} is the probability that there has been a success by time τ, conditional on the state being good and there being no success up to time t, and F(y_τ) is the fraction of the population that are outsiders at time τ, conditional on no successes. We can rewrite this equation as

V_t(x) = x ∫_t^∞ e^{−γ(τ−t)} [r − (1 − P_τ)(r − a)F(y_τ)] dτ + (1 − x) ∫_t^∞ e^{−γ(τ−t)} aF(y_τ) dτ
       = x r/γ − x ∫_t^∞ e^{−(γ+r)(τ−t)} (r − a)F(y_τ) dτ + (1 − x) ∫_t^∞ e^{−γ(τ−t)} aF(y_τ) dτ     (7)

Let F_τ = F(y_τ) and note that τ ↦ F_τ is weakly increasing.

The upper bound for V_t(x) is now obtained as follows. Note that, given τ ≥ t, the derivative of (7) with respect to F_τ is proportional to −xe^{−r(τ−t)}(r − a) + (1 − x)a. It follows that if the agent could choose F_τ everywhere at will to maximize her payoff, she would choose F_τ = 1 for τ ≥ t(x) and F_τ = 0 for τ < t(x), where t(x) is defined by the condition xe^{−r(t(x)−t)}(r − a) = (1 − x)a (obtained by setting the derivative equal to 0). The result of this choice is V(x), her utility in the private values case, in which she only cares about her own entry and exit decisions and gets to choose them optimally. Because in the common values case the entry and exit decisions of other agents are not optimal from x's point of view, V_t(x) must be weakly lower than V(x).

As for the lower bound, assume for the sake of argument that F_τ is constant for all τ and equal to F ∈ [0, 1]. Then the expression in (7) is

x r/γ − x(r − a)F/(γ + r) + (1 − x)aF/γ

which is linear in F and is minimized either when F = 0 or when F = 1. In the first case, the expression equals x r/γ. In the second case, it equals x r/γ − x(r − a)/(γ + r) + (1 − x)a/γ.

To finish the proof, we argue that whenever F_τ is weakly increasing in τ, the expression in (7) is higher than the expression that is obtained when F_τ is replaced by a suitably chosen constant F. Hence the lower bound obtained for constant F_τ applies in all cases.

The argument is as follows. Take F = F_{t(x)}. Then for τ > t(x), F_τ is weakly greater than F and V_t(x) is increasing in the value of F at τ. Conversely, for τ < t(x), F_τ is weakly lower than F and V_t(x) is decreasing in the value of F at τ. Hence the agent's utility is weakly higher under F_τ than under the constant F_{t(x)}. □


B Model with Unrestricted Policy Changes (For

Online Publication)

In this Section we present a more general model which relaxes the assumption

that switching to the safe policy is irreversible. Instead, we allow the organization to

change its policy πt any number of times. The main result is that, under a sensible

restriction on the behavior of agents under indifference (which is uniquely selected

by the discrete-time limit discussed in Section 4), switches to the safe policy are

permanent in every equilibrium. Hence our assumption of irreversible policy changes

is without loss of generality.

B.1 Definition of Equilibrium

We let πt− and πt+ denote the left and right limits of the policy path at time

t respectively, whenever the limits are well-defined. We require that πt, the current

policy at time t, is chosen by the decision-maker who is pivotal given the incumbent

policy πt− . Similarly, πt+ is chosen by the decision-maker who is pivotal given πt. That

is, for the policy to change from π to π′ along the path of play, the decision-maker

induced by π must be in favor of the change.

We define a membership function β so that β(x, L, π) = 1 if agent x chooses to

be a member of the organization given information L and policy π, and β(x, L, π) = 0

otherwise. We define a policy correspondence α so that α(L, π) is the set of policies

that the median voter, m(L, π), is willing to choose.50 We emphasize that α(L, π)

need not be the set of policies that the median voter finds optimal in the sense

of maximizing her utility given the behavior of the other agents – that is, α(L, π)

is not an equilibrium notion. Our notion of strategy profile summarizes the above

requirements:

Definition 2. A Markov strategy profile is given by a membership function β : [0, 1] × ℝ_+ × {0, 1} → {0, 1}, a policy correspondence α : ℝ_+ × {0, 1} → {{0}, {1}, {0, 1}},⁵⁰ and a stochastic path of play consisting of information and policy paths (L_t, π_t)_t satisfying the following:

(a) Conditional on the policy type θ, (L_t, π_t)_{t≥0} is a progressively measurable Markov process with paths that have left and right limits at every t ≥ 0, satisfying (L_0, π_0) = (1, 1).

(b) Letting (k_τ)_τ denote a Poisson process with rate r or 0 if θ = G or B respectively, letting (L̄_τ)_τ be given by L̄_τ = L(k_τ, τ), and letting n(t) = ∫_0^t π_{t′} dt′ denote the amount of experimentation up to time t, we have L_t = L̄_{n(t)}.

(c) π_t ∈ α(L_t, π_{t−}) for all t ≥ 0.

(d) π_{t+} ∈ α(L_t, π_t) for all t ≥ 0.

50 α(L, π) can take the values {0}, {1} and {0, 1}. Defining α(L, π) in this way is convenient because some paths of play cannot be easily described in terms of the instantaneous switching probabilities of individual agents. α should be understood as a choice rule in the decision-theoretic sense.

We define Vx(L, π) as the continuation utility of an agent with prior belief x

given information L and incumbent policy π. In other words, Vx(L, π) is the utility

agent x expects to get starting at time t0 when the state follows the process (Lt, πt)t≥t0

given that (Lt0 , πt0) = (L, π).

Definition 3. An equilibrium σ is a strategy profile such that:

(i) β(x, L, π) = 1 if s+ π(p(L, x)r − s) > a and β(x, L, π) = 0 otherwise.

(ii) If Vm(L,π)(L, π′) > Vm(L,π)(L, 1− π′), then α(L, π) = {π′}.

Part (i) of the definition of equilibrium says that agents make membership

decisions that maximize their flow payoffs. Part (ii) says that the pivotal agent chooses

her preferred policy based on her expected utility, assuming that the equilibrium

strategies are played in the continuation.

As in Section 4, an additional condition is needed to rule out undesirable equilibria that arise when V_{m(L,π)}(L, 1) = V_{m(L,π)}(L, 0) for the trivial reason that the continuation is independent of m(L, π)'s actions. To eliminate such equilibria, we will consider short-lived deviations optimal if they would be profitable when extended for a short amount of time. To formalize this, we define V̄_x(L, π, ε) as x's continuation utility under the following assumptions: the state is (L, π) at time t_0, the policy π is locked in for a length of time ε > 0 irrespective of the equilibrium path of play, and the equilibrium path of play continues at time t_0 + ε. We will impose the requirement that equilibria satisfy the following:

(iii) If V_{m(L,π)}(L, 1) = V_{m(L,π)}(L, 0) but V̄_{m(L,π)}(L, π′, ε) − V̄_{m(L,π)}(L, 1 − π′, ε) > 0 for all ε > 0 small enough, then α(L, π) = {π′}.

B.2 Analysis

The results are structured as follows. Lemmas 7, 8 and 10 are technical statements. Lemma 9 shows that agents strictly prefer the risky policy after a success. Proposition 14 shows that switches to the safe policy are permanent.

Lemma 7. For any policy path (π_t)_t with left and right limits everywhere, there is another policy path (π̄_t)_t such that π̄_0 = π_0, (π̄_t)_t is cadlag for all t > 0, and (π̄_t)_t is equal to (π_t)_t almost everywhere.⁵¹

Proof of Lemma 7. Define π̄_0 = π_0 and π̄_t = π_{t+} for all t > 0. Let T = ℝ_{≥0} \ {t ≥ 0 : π_{t−} = π_t = π_{t+}}. Because (π_t)_t has left and right limits everywhere, T must be countable—otherwise T would have an accumulation point t_0, and either the left limit or right limit of (π_t)_t at t_0 would not be well-defined. Then, since π̄_t = π_t for all t ∉ T, (π̄_t)_t and (π_t)_t only differ on a countable set. Moreover, it is straightforward to show that, for all t > 0, π̄_{t−} = π_{t−} and π̄_{t+} = π_{t+} = π̄_t, so (π̄_t)_t is cadlag. □

51 Hence it is payoff-equivalent to (π_t)_t and generates the same learning path (L_t)_t.

Corollary 2. For any strategy profile (β, α, (L_t, π_t)_t | (L, π, θ)), the stochastic process (L_t, π̄_t)_t | (L, π, θ) (where (π̄_t)_t is as in Lemma 7) has cadlag paths, satisfies Conditions (a) and (b), and induces a path of play that yields the same payoffs as the strategy.

Lemma 8 (Recursive Decomposition). Let Θ ⊆ ℝ^n be a closed set, let (θ_t)_t be a right-continuous progressively measurable Markov process with support contained in Θ, let f be a bounded function, and let

U(θ_0) = ∫_0^∞ e^{−γt} E_{θ_0}[f(θ_t)] dt

Let Ψ be a closed subset of Θ and define a stochastic process (ψ_t)_t with co-domain Ψ ∪ {∅} as follows: ψ_t = θ if there exists t′ ≤ t such that θ_{t′} = θ and θ_{t′′} ∉ Ψ for all t′′ < t′. If this is not true for any θ ∈ Ψ, then ψ_t = ∅.⁵² Then

U(θ_0) = ∫_0^∞ e^{−γt} E_{θ_0}[f(θ_t) 1{ψ_t = ∅}] dt + ∫_Ψ U(θ) dP

where P is defined as follows: P_{ψ_t} is the probability measure on Ψ ∪ {∅} induced by ψ_t, and P = γ ∫_0^∞ e^{−γt} P_{ψ_t} dt.

Proof of Lemma 8.

U(θ_0) = ∫_0^∞ e^{−γt} E_{θ_0}[f(θ_t)] dt = ∫_0^∞ e^{−γt} E_{θ_0}[f(θ_t) 1{ψ_t = ∅}] dt + ∫_0^∞ e^{−γt} E_{θ_0}[f(θ_t) 1{ψ_t ∈ Ψ}] dt

So it remains to show that

∫_Ψ U(θ) dP = ∫_0^∞ e^{−γt} E_{θ_0}[f(θ_t) 1{ψ_t ∈ Ψ}] dt

Define a random variable ξ with co-domain (Ψ × [0, ∞)) ∪ {∅} as follows: ξ = (θ, t) if θ_t = θ ∈ Ψ and θ_{t′} ∉ Ψ for all t′ < t. If this is not true for any θ ∈ Ψ and t ≥ 0, then ξ = ∅.⁵³ Let P_ξ be the probability measure on (Ψ × [0, ∞)) ∪ {∅} induced by ξ. Let θ(ξ) and t(ξ) be the random variables equal to the first and second coordinates of ξ, conditional on ξ ≠ ∅. Note that ψ_t = θ if and only if ξ = (θ, t′) for some t′ ≤ t. Then we can write

∫_0^∞ e^{−γt} E_{θ_0}[f(θ_t) 1{ψ_t ∈ Ψ}] dt = ∫_0^∞ e^{−γt} E_{θ_0}[f(θ_t) 1{ξ ∈ Ψ×[0,∞)} 1{t ≥ t(ξ)}] dt
= ∫_{Ψ×[0,∞)} (∫_{t(ξ)}^∞ e^{−γt} E_{θ_0}[f(θ_t) | ξ] dt) dP_ξ = ∫_{Ψ×[0,∞)} e^{−γt(ξ)} U(θ(ξ)) dP_ξ
= ∫_{Ψ×[0,∞)} (∫_{t(ξ)}^∞ γe^{−γt} dt) U(θ(ξ)) dP_ξ = ∫_0^∞ ∫_{Ψ×[0,∞)} γe^{−γt} 1{t ≥ t(ξ)} U(θ(ξ)) dP_ξ dt
= ∫_0^∞ γe^{−γt} (∫_{Ψ×[0,∞)} 1{t ≥ t(ξ)} U(θ(ξ)) dP_ξ) dt = ∫_0^∞ γe^{−γt} (∫_Ψ U(θ) dP_{ψ_t}) dt = ∫_Ψ U(θ) dP

as desired. □

52 In other words, ψ_t takes the value of the first θ ∈ Ψ that (θ_t)_t hits.

53 In other words, ξ takes the value of the first θ ∈ Ψ that (θ_t)_t hits, and the time when it hits.


Lemma 9. In any equilibrium, α(0, 1) = α(0, 0) = 1.

Proof of Lemma 9. Observe that L_{t_0} = 0 implies L_t = 0 for all t ≥ t_0 no matter what policy path is followed, and hence p(L_t, x) = 1 for all t and x. For the rest of the argument, we can then write V(0, π) instead of V_x(0, π). By Lemma 8, there is ρ ∈ [0, 1] such that

V(0, 0) = ρ s/γ + (1 − ρ)V(0, 1)     (8)

It follows that there exist η ∈ [0, 1] and η′ ∈ [0, 1] such that η ≥ η′⁵⁴ and

V(0, 0) = η s/γ + (1 − η) r/γ,    V(0, 1) = η′ s/γ + (1 − η′) r/γ

η and η′ are the discounted fractions of the expected time that the organization spends on the safe policy, when starting in states (0, 0) and (0, 1), respectively.

Observe that if η > η′, then V(0, 0) < V(0, 1). In particular, V_{m(0,π)}(0, 1) > V_{m(0,π)}(0, 0) for all π, which implies that α(0, π) = 1 for all π by Condition (ii). If η = η′, then V(0, 0) = V(0, 1). Because V̄(0, 0, ε) < V̄(0, 1, ε) for any ε > 0, by Condition (iii), in this case we must also have α(0, π) = 1 for all π. □

54 We have η ≥ η′ for the following reason. V(0, 1) = η′ s/γ + (1 − η′) r/γ and (8) imply that η s/γ + (1 − η) r/γ = V(0, 0) = ρ s/γ + (1 − ρ)V(0, 1) = (ρ + (1 − ρ)η′) s/γ + (1 − ρ)(1 − η′) r/γ. Then η = ρ + (1 − ρ)η′, which implies that η ≥ η′, as required.

Lemma 10. For any state (L, π), there is a CDF G with support⁵⁵ contained in [0, ∞] such that

V_x(L, π) = ∫_0^∞ W_T(p(L, x)) dG(T)

for all x ∈ [0, 1], where W_T(y) is as defined in Lemma 2.

Similarly, for any state (L, π) and any ε > 0, there is a distribution G_ε with support contained in [0, ∞] such that

V̄_x(L, π, ε) = ∫_0^∞ W_T(p(L, x)) dG_ε(T)

for any x ∈ [0, 1].

55 G is a degenerate CDF that can take the value ∞ with positive probability. Equivalently, G satisfies all the standard conditions for the definition of a CDF, except that lim_{T→∞} G(T) ≤ 1 instead of lim_{T→∞} G(T) = 1. This is needed to allow for the case where experimentation continues forever.

Proof of Lemma 10. We prove the first statement. The proof of the second statement is analogous.

Note that we can, without loss of generality, assume that the distribution over future states (L, π) induced by the continuation starting in state (L, π) satisfies the following: the policy is equal to 1 in the beginning and, if it ever changes from 1 to 0, it never changes back to 1. Indeed, suppose that π switches from 1 to 0 at time t and switches back at a random time t + ν, where ν is distributed according to some CDF H. Let p = ∫_0^∞ e^{−γν} dH(ν). Then a continuation path on which the policy only switches to 0 at time t with probability 1 − p and never returns to 1 after switching induces the same discounted distribution over future states.

Under the above assumption and given that the policy always remains at 1 after a success by Lemma 9, the path of play can be described as follows: experimentation continues uninterrupted until a success or a permanent stop. Then we can let G be the CDF of the stopping time, conditional on no success being observed. □

Proposition 14. In any equilibrium, for any L, if 0 ∈ α(L, 1), then α(L, 0) = 0.

Proof of Proposition 14. If L = 0, then α(0, π) = 1 for all π by Lemma 9, so the statement is vacuously true. Suppose then that L > 0. Suppose for the sake of contradiction that the statement is false.

Observe that for all L there is ρ_L ∈ [0, 1] independent of x such that

V_x(L, 0) = ρ_L s/γ + (1 − ρ_L)V_x(L, 1)     (9)

for all x. This follows from Lemma 8, with the added observation that ρ_L (equivalently, P in the notation of Lemma 8) is independent of x in this case because the stochastic process governing (L, π) is independent of x if π = 0.⁵⁶ We now consider three cases.

56 If (L_t, π_t) has cadlag paths, this follows from Lemma 8. If not, then Lemma 8 cannot be applied because the stochastic process in question is not necessarily right-continuous. However, we can use Corollary 2 of Lemma 7 to obtain a payoff-equivalent path of play with cadlag paths and then apply Lemma 8 to it.

Case 1: Suppose that ρ_L > 0, and that the expected amount of experimentation under the continuation starting in state (L, 1) is positive. By Lemma 1, V_x(L, 1) is strictly increasing in x. Then equation (9) implies that V_x(L, 1) − V_x(L, 0) is strictly increasing in x. Since m(L, 1) > m(L, 0), we have V_{m(L,1)}(L, 1) − V_{m(L,1)}(L, 0) > V_{m(L,0)}(L, 1) − V_{m(L,0)}(L, 0). Since 1 ∈ α(L, 0) implies that V_{m(L,0)}(L, 1) − V_{m(L,0)}(L, 0) ≥ 0, we have V_{m(L,1)}(L, 1) − V_{m(L,1)}(L, 0) > 0, and thus α(L, 1) = 1, a contradiction.

Case 2: Suppose that ρ_L = 0. We make two observations. First, V_x(L, 0) = V_x(L, 1) for all x. Second, the expected amount of experimentation under the continuation starting in state (L, 1) is positive. Indeed, ρ_L = 0 implies that, conditional on the state at t being (L_t, π_t) = (L, 0), we have inf{t′ > t : π_{t′} = 1} = t a.s. Since Condition (d) requires that π_{t+} exists and the result that inf{t′ > t : π_{t′} = 1} = t a.s. rules out that π_{t+} = 0 with positive probability, it must be that π_{t+} = 1 a.s. In turn, this implies that inf{t′ > t : π_{t′} = 0} > t a.s. Then E[inf{t′ > t : π_{t′} = 0}] − t > 0.

By definition, we have

V̄_x(L, 0, ε) = ρ_ε s/γ + (1 − ρ_ε)V_x(L, 1)     (10)

for ρ_ε = 1 − e^{−γε}.

In the following argument, for convenience, we subtract s/γ from every value function.⁵⁷ Note that, by definition, because we lock policy 1 in for time ε, G_ε satisfies 1 − G_ε(T) = min{(1 − G(T))/(1 − G(ε)), 1}. Then for T ∈ [0, ε], 1 − G_ε(T) = 1 and for T > ε, 1 − G_ε(T) = (1 − G(T))/(1 − G(ε)). Hence for ε > 0 sufficiently small we have

V̄_x(L, 1, ε) = ∫_0^∞ W_T(p(L, x)) dG_ε(T)
= ∫_0^ε W_T(p(L, x)) dG_ε(T) + ∫_ε^∞ W_T(p(L, x)) dG_ε(T)
= 0 + (1/(1 − G(ε))) ∫_ε^∞ W_T(p(L, x)) dG(T)
= V_x(L, 1)/(1 − G(ε)) − (1/(1 − G(ε))) ∫_0^ε W_T(p(L, x)) dG(T)
= V_x(L, 1)/(1 − G(ε)) + G(ε)O(ε)

The first part of the third equality follows from the fact that G_ε(T) = 0 for all T ∈ [0, ε] and the second part of the third equality follows from the fact that G_ε(T) = (G(T) − G(ε))/(1 − G(ε)) for T > ε. The last equality follows from the fact that lim_{ε→0} G(ε) = 0, since inf{t′ > t : π_{t′} = 1} = t a.s., and from the fact that ∂⁺W_T(x)/∂T|_{T=0} = max{xr, a} − s + xr(r − s)/γ by part (ii) of Lemma 2.⁵⁸

57 That is, we let V°_x(L, π) = V_x(L, π) − s/γ, W°_T(x) = W_T(x) − s/γ, V̄°_x(L, π, ε) = V̄_x(L, π, ε) − s/γ. For the rest of this proof, we work with the normalized functions V°, W°, V̄°, but drop the operator ° to simplify notation.

58 In greater detail, ∫_0^ε W_T(p(L, x)) dG(T) ≈ ∫_0^ε (k_0 T + k_1) dG(T) ≤ ∫_0^ε (k_0 ε + k_1) dG(T) = (k_0 ε + k_1)(G(ε) − G(0)) = (k_0 ε + k_1)G(ε) = k_0 ε G(ε) = G(ε)O(ε), where we have used the fact that we subtracted s/γ from every value function to get rid of the constant k_1.
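The truncation identity used in the display above holds for any bounded payoff profile and any stopping-time distribution; the sketch below checks it for an arbitrary choice of both (neither is the model's actual W or G) and confirms that the discrepancy is of order G(ε)·ε, as claimed.

```python
# Sketch: V_bar(eps) = integral of W dG_eps equals V/(1-G(eps)) up to a G(eps)*O(eps) term,
# checked for an arbitrary normalized W (W(0) = 0) and an arbitrary exponential G.
import numpy as np
from scipy.integrate import quad

beta, lam_G = 0.8, 1.5
W = lambda T: 1.0 - np.exp(-beta * T)          # normalized payoff profile, W(0) = 0, bounded
g = lambda T: lam_G * np.exp(-lam_G * T)       # density of G
G = lambda T: 1.0 - np.exp(-lam_G * T)

V = quad(lambda T: W(T) * g(T), 0, np.inf)[0]  # integral of W dG over [0, inf)

def V_bar(eps):                                # integral of W dG_eps (G truncated below eps)
    return quad(lambda T: W(T) * g(T), eps, np.inf)[0] / (1.0 - G(eps))

for eps in [0.5, 0.1, 0.01]:
    lhs, rhs = V_bar(eps), V / (1.0 - G(eps))
    print(eps, lhs, rhs, (rhs - lhs) / (G(eps) * eps))   # last ratio stays bounded as eps -> 0
```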

Suppose for the sake of contradiction that V_{m(L,0)}(L, 1) < 0. Note that (10) then implies that V_{m(L,0)}(L, 1) < V̄_{m(L,0)}(L, 0, ε). Therefore, we have V̄_{m(L,0)}(L, 1, ε) ≤ V_{m(L,0)}(L, 1) < V̄_{m(L,0)}(L, 0, ε) for all ε > 0 sufficiently small and hence α(L, 0) = 0, a contradiction. Hence V_{m(L,0)}(L, 1) ≥ 0. It follows that, because m(L, 1) > m(L, 0) and, by Lemma 1, x ↦ V_x(L, 1) is strictly increasing, we have V_{m(L,1)}(L, 1) > 0. Then V̄_x(L, 1, ε) = V_x(L, 1)/(1 − G(ε)) + G(ε)O(ε) implies that V̄_{m(L,1)}(L, 1, ε) ≥ V_{m(L,1)}(L, 1).⁵⁹ Moreover, because V_{m(L,1)}(L, 1) > 0, we have V_{m(L,1)}(L, 1) > V̄_{m(L,1)}(L, 0, ε), as V̄_{m(L,1)}(L, 0, ε) is a convex combination of V_{m(L,1)}(L, 1) and 0.⁶⁰ Then V̄_{m(L,1)}(L, 1, ε) ≥ V_{m(L,1)}(L, 1) > V̄_{m(L,1)}(L, 0, ε) for all ε > 0 sufficiently small. By Condition (iii), this implies that α(L, 1) = 1, a contradiction.

59 V̄_{m(L,1)}(L, 1, ε) ≥ V_{m(L,1)}(L, 1) is then equivalent to G(ε)(1 − G(ε))O(ε) ≥ −G(ε)V_{m(L,1)}(L, 1), which is satisfied for V_{m(L,1)}(L, 1) > 0.

60 Recall that we have subtracted s/γ from every value function.

Case 3: Suppose that the expected amount of experimentation starting in state (L, 1) is zero. In this case V_x(L, 0) = V_x(L, 1) = s/γ for all x, and V̄_x(L, 0, ε) = s/γ for all x and ε > 0. Again, we subtract s/γ from every value function for simplicity.

By definition, for all ε > 0 the path starting in state (L, 1, ε) has a positive expected amount of experimentation. Moreover, G_ε defined in Lemma 10 is FOSD-decreasing in ε (that is, if ε′ < ε, then G_{ε′} ≥ G_ε) and hence, taken as a function of ε, has a pointwise limit G (that is, G_ε(T) → G(T) as ε → 0 for all T ≥ 0). Then

V̄_x(L, 1, ε) → ∫_0^∞ W_T(p(L, x)) dG(T) as ε → 0

Since 1 ∈ α(L, 0), there exists a sequence ε_n ↘ 0 such that V̄_{m(L,0)}(L, 1, ε_n) ≥ 0 for all n,⁶¹ whence lim_{ε→0} V̄_{m(L,0)}(L, 1, ε) ≥ 0.

There are now two cases. First, if E_G[T] > 0, we can use the following argument. lim_{ε→0} V̄_{m(L,0)}(L, 1, ε) ≥ 0 implies that lim_{ε→0} V̄_{m(L,1)}(L, 1, ε) > 0 because m(L, 1) > m(L, 0) and x ↦ V_x(L, 1) is strictly increasing by Lemma 1 (note that we have used the fact that E_G[T] > 0 to apply Lemma 1 here). Because ε ↦ V̄_{m(L,1)}(L, 1, ε) is continuous, it follows that V̄_{m(L,1)}(L, 1, ε) > 0 for all ε > 0 sufficiently small. But then α(L, 1) = 1 by Condition (iii), a contradiction.

Second, if E_G[T] = 0, then we can employ a similar argument using the fact that, by part (ii) of Lemma 2, ∂W_ε(p(L, x))/∂ε|_{ε=0} is strictly increasing in x and that, by Lemma 2, we have

lim_{ε→0} V̄_x(L, 1, ε)/E_{G_ε}[T] = lim_{ε→0} W_ε(p(L, x))/ε = ∂W_ε(p(L, x))/∂ε|_{ε=0}

The remaining results of the paper can then be proved in this model.

61 Suppose for the sake of contradiction that 1 ∈ α(L, 0) and such a sequence does not exist. Then for all ε > 0 sufficiently small we have V̄_{m(L,0)}(L, 1, ε) < 0 (note that we have used the fact that we subtract s/γ from every value function here). Then V̄_{m(L,0)}(L, 1, ε) < V̄_{m(L,0)}(L, 0, ε) = 0, which contradicts 1 ∈ α(L, 0) by Condition (iii).
