learning from failures: optimal contract for
TRANSCRIPT
Learning from Failures:
Optimal Contract for Experimentation and Production*
Fahad Khalil, Jacques Lawarree
Alexander Rodivilov**
August 17, 2018
Abstract: Before embarking on a project, a principal must often rely on an agent to learn about its
profitability. We model this learning as a two-armed bandit problem and highlight the interaction
between learning (experimentation) and production. We derive the optimal contract for both
experimentation and production when the agent has private information about his efficiency in
experimentation. This private information in the experimentation stage generates asymmetric
information in the production stage even though there was no disagreement about the profitability
of the project at the outset. The degree of asymmetric information is endogenously determined by
the length of the experimentation stage. An optimal contract uses the length of experimentation,
the production scale, and the timing of payments to screen the agents. Due to the presence of an
optimal production decision after experimentation, we find over-experimentation to be optimal.
The asymmetric information generated during experimentation makes over-production optimal.
An efficient type is rewarded early since he is more likely to succeed in experimenting, while an
inefficient type is rewarded at the very end of the experimentation stage. This result is robust to
the introduction of ex post moral hazard.
Keywords: Information gathering, optimal contracts, strategic experimentation.
JEL: D82, D83, D86.
* We are thankful for the helpful comments of Nageeb Ali, Renato Gomes, Marina Halac, Navin Kartik, David
Martimort, Dilip Mookherjee, and Larry Samuelson. ** [email protected], [email protected], Department of Economics, University of Washington; [email protected],
School of Business, Stevens Institute of Technology
1
1. Introduction
Before embarking on a project, it is important to learn about its profitability to determine
its optimal scale. Consider, for instance, shareholders (principal) who hire a manager (agent) to
work on a new project.1 To determine its profitability, the principal asks the agent to explore
various ways to implement the project by experimenting with alternative technologies. Such
experimentation might demonstrate the profitability of the project. A longer experimentation
allows the agent to better determine its profitability but that is also costly and delays production.
Therefore, the duration of the experimentation and the optimal scale of the project are
interdependent.
An additional complexity arises if the agent is privately informed about his efficiency in
experimentation. If the agent is not efficient at experimenting, a poor result from his experiments
only provides weak evidence of low profitability of the project. However, if the owner
(principal) is misled into believing that the agent is highly efficient, she becomes more
pessimistic than the agent. A trade-off appears for the principal. More experimentation may
provide better information about the profitability of the project but can also increase asymmetric
information about its expected profitability, which leads to information rent for the agent in the
production stage.
In this paper, we derive the optimal contract for an agent who conducts both
experimentation and production. We model the experimentation stage as a two-armed bandit
problem.2 At the outset, the principal and agent are symmetrically informed that production cost
can be high or low. The contract determines the duration of the experimentation stage. Success
in experimentation is assumed to take the form of finding βgood newsβ, i.e., the agent finds out
that production cost is low.3 After success, experimentation stops, and production occurs. If
experimentation continues without success, the expected cost increases, and both principal and
agent become pessimistic about project profitability. We say that the experimentation stage fails
if the agent never learns the true cost.
1 Other applications are the testing of new drugs, the adoption of new technologies or products, the identification of
new investment opportunities, the evaluation of the state of the economy, consumer search, etc. See KrΓ€hmer and
Strausz (2011) and Manso (2011) for other relevant examples. 2 See, e.g., Bolton and Harris (1999), or Bergemann and VΓ€limΓ€ki (2008). 3 We present our main insights by assuming that the agentβs effort and success in experimentation are publicly
observed but show that our key results hold even if the agent could hide success. We also show our key insights
hold in the case of success being bad news.
2
In our model, the agentβs efficiency is determined by his probability of success in any
given period of the experimentation stage when cost is low. Since the agent is privately
informed about his efficiency, when experimentation fails, a lying inefficient agent will have a
lower expected cost of production compared to the principal. This difference in expected cost
implies that the principal (mistakenly believing the agent is efficient) will overcompensate him
in the production stage. Therefore, an inefficient agent must be paid a rent to prevent him from
overstating his efficiency.
A key contribution of our model is to study how the asymmetric information generated
during experimentation impacts production, and how production decisions affect
experimentation.4 At the end of the experimentation stage, there is a production decision, which
generates information rent as it depends on what is learned during experimentation. Relative to
the nascent literature on incentives for experimentation, reviewed below, the novelty of our
approach is to study optimal contracts for both experimentation and production. Focusing on
incentives to experiment, the literature has equated project implementation with success in
experimentation. In contrast, we study the impact of learning from failures on the optimal
contract for production and experimentation. Thus, our analysis highlights the impact of
endogenous asymmetric information on optimal decisions ex post, which is not present in a
model without a production stage.
First, in a model with experimentation and production, we show that over
experimentation relative to the first-best is an optimal screening strategy for the principal,
whereas under experimentation is the standard result in existing models of experimentation.5
Since increasing the duration of experimentation helps to raise the chance of success, by asking
the agent to over experiment, the principal makes it less likely for the agent to fail and exploit the
asymmetry of information about expected costs. Moreover, we find that the difference in
expected costs is non-monotonic in time: we prove that it is increasing for earlier periods but
converges to zero if the experimentation stage is sufficiently long. Intuitively, the updated
beliefs for each type initially diverge with successive periods without success, but they must
4 Intertemporal contractual externality across agency problems also plays an important role in Arve and Martimort
(2016). 5 To the best of our knowledge, ours is the first paper in the literature that predicts over experimentation. The reason
is that over-experimentation might reduce the rent in the production stage, non-existent in standard models of
experimentation.
3
eventually converge. As a result, increasing the duration of experimentation might help reducing
the asymmetric information after a series of failed experiments.
Second, we show that experimentation also influences the choice of output in the
production stage. We prove that if experimentation succeeds, the output is at the first best level
since there is no difference in beliefs regarding the true cost after success. However, if
experimentation fails, the output is distorted to reduce the rent of the agent. Since the inefficient
agent always gets a rent, we expect, and indeed find, that the output of the efficient agent is
distorted downward. This is reminiscent to a standard adverse selection problem.
Interestingly, we find another effect: the output of the inefficient agent is distorted
upward. This is the case when the efficient agent also commands a rent, which is a new result
due to the interaction between the experimentation and production stages. The efficient type
faces a gamble when misreporting his type as inefficient. While he has the chance to collect the
rent of the inefficient type, he also faces a cost if experimentation fails. Since he is then
relatively more pessimistic than the principal, he will be under-compensated at the production
stage relative to the inefficient type. The principal can increase the cost of lying by asking the
inefficient type to produce more. A higher output for the inefficient agent makes it costlier for
the efficient agent who must produce more output with higher expected costs.
Third, to screen the agents, the principal distributes the information rent as rewards to the
agent at different points in time. When both types obtain a rent, each typeβs comparative
advantage on obtaining successes or failures determines a unique optimal contract. Each type is
rewarded for events which are relatively more likely for him. It is optimal to reward the efficient
agent at the beginning and the inefficient agent at the very end of the experimentation stage.
Interestingly, the inefficient agent is rewarded after failure if the experimentation stage is
relatively short and after success in the last period otherwise.6 Our result suggests that the
principal is more likely to tolerate failures in industries where cost of an experiment is relatively
high; for example, this is the case in oil drilling. In contrast, if the cost of experimentation is low
(like on-line advertising) the principal will rely on rewarding the agent after success.
6 In an insightful paper, Manso (2011), argues that golden parachutes and managerial entrenchment, which seem to
reward or tolerate failure, can be effective for encouraging corporate innovation (see also, Ederer and Manso (2013),
and Sadler (2017)). Our analysis suggests that such practices may also have screening properties in situations where
innovators have differences in expertise.
4
We show that the relative likelihood of success for the two types is monotonic, and it
determines the timing of rewards. Given risk-neutrality and absence of moral hazard in our base
model, this property also implies that each type is paid a reward only once, whereas a more
realistic payment structure would involve rewards distributed over multiple periods. This would
be true in a model with moral hazard in experimentation, which is beyond the scope of this
paper.7 However, in an extension section, we do introduce ex post moral hazard simply in this
model by assuming that success is private. That leads to moral hazard rent in every period in
addition to the previously derived asymmetric information rent.8 By suppressing moral hazard,
our framework allows us to highlight the screening properties of the optimal contract that deals
with both experimentation and production in a tractable model.
Related literature. Our paper builds on two strands of the literature. First, it is related to
the literature on principal-agent contracts with endogenous information gathering before
production.9 It is typical in this literature to consider static models, where an agent exerts effort
to gather information relevant to production. By modeling this effort as experimentation, we
introduce a dynamic learning aspect, and especially the possibility of asymmetric learning by
different agents. We contribute to this literature by characterizing the structure of incentive
schemes in a dynamic learning stage. Importantly, in our model, the principal can determine the
degree of asymmetric information by choosing the length of the experimentation stage, and over
or under-experimentation can be optimal.
To model information gathering, we rely on the growing literature on contracting for
experimentation following Bergmann and Hege (1998, 2005). Most of that literature has a
different focus and characterizes incentive schemes for addressing moral hazard during
experimentation but does not consider adverse selection.10 Recent exceptions that introduce
adverse selection are Gomes, Gottlieb and Maestri (2016) and Halac, Kartik and Liu (2016).11 In
7 Halac et al. (2016) illustrate the challenges of having both hidden effort and hidden skill in experimentation in a
model without production stage. 8 The monotonic likelihood ratio of success continues to be the key determinant behind the screening properties of
the contract. It remains optimal to provide exaggerated rewards for the efficient type at the beginning and for the
inefficient type at the end of experimentation even under ex post moral hazard. 9 Early papers are Cremer and Khalil (1992), Lewis and Sappington (1997), and Cremer, Khalil, and Rochet (1998),
while KrΓ€hmer and Strausz (2011) contains recent citations. 10 See also Horner and Samuelson (2013). 11 See also Gerardi and Maestri (2012) for another model where the agent is privately informed about the quality
(prior probability) of the project.
5
Gomes, Gottlieb and Maestri, there is two-dimensional hidden information, where the agent is
privately informed about the quality (prior probability) of the project as well as a private cost of
effort for experimentation. They find conditions under which the second hidden information
problem can be ignored. Halac, Kartik and Liu (2016) have both moral hazard and hidden
information. They extend the moral hazard-based literature by introducing hidden information
about expertise in the experimentation stage to study how asymmetric learning by the efficient
and inefficient agents affects the bonus that needs to be paid to induce the agent to work.12
We add to the literature by showing that asymmetric information created during
experimentation affects production, which in turn introduces novel aspects to the incentive
scheme for experimentation. Unlike the rest of the literature, we find that over-experimentation
relative to the first best, and rewarding an agent after failure can be optimal to screen the agent.
The rest of the paper is organized as follows. In section 2, we present the base good-
news model under adverse selection with exogenous output and public success. In section 3, we
consider extensions and robustness checks. In particular, we allow the principal to choose output
optimally and use it as a screening variable, study ex post moral hazard where the agent can hide
success, and the case where success is bad news.
2. The Model (Learning good news)
A principal hires an agent to implement a project. Both the principal and agent are risk
neutral and have a common discount factor πΏ β (0,1]. It is common knowledge that the
marginal cost of production can be low or high, i.e., π β {π, π}, with 0 < π < π. The probability
that π = π is denoted by π½0 β (0,1). Before the actual production stage, the agent can gather
information regarding the production cost. We call this the experimentation stage.
The experimentation stage
During the experimentation stage, the agent gathers information about the cost of the
project. The experimentation stage takes place over time, π‘ β {1,2,3, β¦ . π}, where π is the
maximum length of the experimentation stage and is determined by the principal.13 In each
12 They show that, without the moral hazard constraint, the first best can be reached. In our model, we impose a
limited liability instead of a moral hazard constraint. 13 Modeling time as discrete is more convenient to study the optimal timing of payment (section 2.2.3).
6
period π‘ , experimentation costs πΎ > 0, and we assume that this cost πΎ is paid by the principal at
the end of each period. We assume that it is always optimal to experiment at least once.14
In the main part of the paper, information gathering takes the form of looking for good
news (see section 3.3 for the case of bad news). If the cost is low, the agent learns it with
probability π in each period π‘ β€ π. If the agent learns that the cost is low (good news) in a
period π‘, we will say that the experimentation was successful. To focus on the screening features
of the optimal contract, we assume for now that the agent cannot hide evidence of the cost being
low. In section 3.2, we will revisit this assumption and study a model with both adverse
selection and ex post moral hazard. We say that experimentation has failed if the agent fails to
learn that cost is low in all π periods. Even if the experimentation stage results in failure, the
expected cost is updated, so there is much to learn from failure. We turn to this next.
We assume that the agent is privately informed about his experimentation efficiency
represented by π. Therefore, the principal faces an adverse selection problem even though all
parties assess the same expected cost at the outset. The principal and agent may update their
beliefs differently during the experimentation stage. The agentβs private information about his
efficiency π determines his type, and we will refer to an agent with high or low efficiency as a
high or low-type agent. With probability π, the agent is a high type, π = π». With probability
(1 β π), he is a low type, π = πΏ. Thus, we define the learning parameter with the type
superscript:
ππ = ππ(π‘π¦ππ π ππππππ π = π|π = π),
where 0 < ππΏ < ππ» < 1.15 If experimentation fails to reveal low cost in a period, agents with
different types form different beliefs about the expected cost of the project. We denote by π½π‘π the
updated belief of a π-type agent that the cost is actually low at the beginning of period π‘ given
π‘ β 1 failures. For period π‘ > 1, we have π½π‘π =
π½π‘β1π (1βππ)
π½π‘β1π (1βππ)+(1βπ½π‘β1
π ) , which in terms of π½0 is
π½π‘π =
π½0(1βππ)π‘β1
π½0(1βππ)π‘β1
+(1βπ½0).
The π-type agentβs expected cost at the beginning of period π‘ is then given by:
14 In the optimal contract under asymmetric information, we allow the principal to choose zero experimentation for
either type. 15 If ππ = 1, the first failure would be a perfect signal regarding the project quality.
7
ππ‘π = π½π‘
ππ + (1 β π½π‘π) π.
Three aspects of learning are worth noting. First, after each period of failure during
experimentation, π½π‘π falls, there is more pessimism that the true cost is low, and the expected cost
ππ‘π increases and converges to π. Second, for the same number of failures during
experimentation, the expected cost is higher as both ππ‘π»and ππ‘
πΏ approach π. An example of how
the expected cost ππ‘π converges to π for each type is presented in Figure 1 below.
Figure 1. Expected cost with ππ» = 0.35, ππΏ = 0.2, π½0 = 0.7, π = 0.5, π = 5.
Third, we also note the important property that the difference in the expected cost, βππ‘ =
ππ‘π» β ππ‘
πΏ > 0, is a non-monotonic function of time: initially increasing and then decreasing.16
Intuitively, each type starts with the same expected cost, which initially diverge as each type of
the agent updates differently, but they eventually have to converge to π.
The production stage
After the experimentation stage ends, production takes place. The principalβs value of
the project is π(π), where π > 0 is the size of the project. The function π(β ) is strictly
16 There exists a unique time period π‘β such that βππ‘ achieves the highest value at this time period, where
π‘β = πππmax1β€π‘β€π
(1 β ππΏ)π‘ β (1 β ππ»)π‘
(1 β π½0 + π½0(1 β ππ»)π‘)(1 β π½0 + π½0(1 β ππΏ)π‘).
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
1 6 11 16π, amount of failures
πππ―
πππ³
π«ππ
8
increasing, strictly concave, twice differentiable on (0, +β), and satisfies the Inada conditions.17
The size of the project and the payment to the agent are determined in the contract offered by the
principal before the experimentation stage takes place. If experimentation reveals that cost is
low in a period π‘ β€ π, experimentation stops, and production takes place based on π = π.18 We
call ππ the output after success. If experimentation fails, i.e., there is no success during the
experimentation stage, production occurs based on the expected cost in period π + 1.19 We call
ππΉ the output after failure. We assume that ππ > ππΉ > 0. Since our main interest is to capture
the impact of asymmetric information after failure, it is enough to assume that ππ and ππΉ are
exogenously determined. We relax this assumption in section 3.1, where the output is optimally
chosen by the principal given her beliefs.
The contract
Before the experimentation stage takes place, the principal offers the agent a menu of
dynamic contracts. Without loss, we use a direct truthful mechanism, where the agent is asked to
announce his type, denoted by π. A contract specifies, for each type of agent, the length of the
experimentation stage, the size of the project, and a transfer as a function of whether or not the
agent succeeded while experimenting. In terms of notation, in the case of success we include π
as an argument in the wage and output for each π‘. In the case of failure, we include the expected
cost πποΏ½ΜοΏ½οΏ½ΜοΏ½ .20 A contract is defined formally by
ποΏ½ΜοΏ½ = (ποΏ½ΜοΏ½, {π€π‘οΏ½ΜοΏ½(π)}
π‘=1
ποΏ½ΜοΏ½
, π€οΏ½ΜοΏ½ (πποΏ½ΜοΏ½+1
οΏ½ΜοΏ½ )),
where ποΏ½ΜοΏ½ is the maximum duration of the experimentation stage for the announced type π,
π€π‘οΏ½ΜοΏ½(π) is the agentβs wage if he observed π = π in period π‘ β€ ποΏ½ΜοΏ½ and π€οΏ½ΜοΏ½ (π
ποΏ½ΜοΏ½+1
οΏ½ΜοΏ½ ) is the agentβs
wage if the agent fails ποΏ½ΜοΏ½ consecutive times.
17 Without the Inada conditions, it may be optimal to shut down the production of the high type after failure if
expected cost is high enough. In such a case, neither type will get a rent. 18 In this model, there is no reason for the principal to continue to experiment once she learns that cost is low. 19 We assume that the agent will learn the exact cost later, but it is not contractible. 20 Since the principal pays for the experimentation cost, the agent is not paid if he does not succeed in any π‘ < ποΏ½ΜοΏ½ .
9
An agent of type π, announcing his type as π, receives expected utility ππ(ποΏ½ΜοΏ½) at time
zero from a contract ποΏ½ΜοΏ½:
ππ(ποΏ½ΜοΏ½) = π½0 β πΏπ‘ποΏ½ΜοΏ½
π‘=1 (1 β ππ)π‘β1
ππ(π€π‘οΏ½ΜοΏ½(π) β πππ)
+πΏποΏ½ΜοΏ½(1 β π½0 + π½0(1 β ππ)
ποΏ½ΜοΏ½
) (π€οΏ½ΜοΏ½ (πποΏ½ΜοΏ½+1
οΏ½ΜοΏ½ ) β πποΏ½ΜοΏ½+1
π ππΉ).
Conditional on the actual cost being low, which happens with probability π½0, the
probability of succeeding for the first time in period π‘ β€ ποΏ½ΜοΏ½ is given by (1 β ππ)π‘β1
ππ.
Experimentation fails if the cost is high (π = πΜ ), which happens with probability 1 β π½0, or, if
the agent fails ποΏ½ΜοΏ½ times despite π = π, which happens with probability π½0(1 β ππ)ποΏ½ΜοΏ½.
The optimal contract will have to satisfy the following incentive compatibility constraints
for all π and π:
(πΌπΆ) ππ(ππ) β₯ ππ(ποΏ½ΜοΏ½).
We also assume that the agent must be paid his expected production costs whether
experimentation succeeds or fails.21 Therefore, the individual rationality constraints must be
satisfied ex post (i.e., after experimentation):
(πΌπ ππ‘π) π€π‘
π(π) β πππ β₯ 0 for π‘ β€ ππ,
(πΌπ πΉπππ ) π€π(π
ππ+1π ) β π
ππ+1π ππΉ β₯ 0,
where the π and πΉ are to denote success and failure.
The principalβs expected payoff at time zero from a contract ππ offered to the agent of
type π is
ππ(ππ) = π½0 β πΏπ‘ππ
π‘=1 (1 β ππ)π‘β1
ππ(π(ππ) β π€π‘π(π) β π€π‘)
+πΏππ(1 β π½0 + π½0(1 β ππ)
ππ
) (π(ππΉ) β π€π(πππ+1π ) β π€ππ),
where the cost of experimentation is π€π‘ =β πΏπ πΎπ‘
π =1
πΏπ‘. Thus, the principalβs objective function is:
πππ»(ππ») + (1 β π)ππΏ(ππΏ).
21 See the recent paper by KrΓ€hmer and Strausz (2015) on the importance of ex post participation constraints in a
sequential screening model. They provide multiple examples of legal restrictions on penalties on agents prematurely
terminating a contract. Our results remain intact as long as there are sufficient restrictions on penalties imposed on
the agent. If we assumed unlimited penalties, for example, with only an ex ante participation constraint, we can
apply well-known ideas from mechanisms Γ la CrΓ©mer-McLean (1985) that says the principal can still receive the
first best profit.
10
To summarize, the timing is as follows:
(1) The agent learns his type π.
(2) The principal offers a contract to the agent. In case the agent rejects the contract, the
game is over and both parties get payoffs normalized to zero; if the agent accepts the
contract, the game proceeds to the experimentation stage with duration as specified in
the contract.
(3) The experimentation stage begins.
(4) If the agent learns that π = π, the experimentation stage stops, and the production
stage starts with output and transfers as specified in the contract.
In case no success is observed during the experimentation stage, the production
occurs with output and transfers as specified in the contract.
Our focus is to study the interaction between endogenous asymmetric information due to
experimentation and optimal decisions that are made after the experimentation stage. The focus
of the existing literature on experimentation has been on providing incentives to experiment,
where success is identified as an outcome with a positive payoff. The decision ex post is not
explicitly modeled. In contrast, to highlight the role of asymmetric information on decisions ex
post, we model an ex post production stage that is performed by the same agent who
experiments. This is common in a wide range of applications such as the regulation of natural
monopolies or when a project relies on new, untested technologies.22
2.1 The First Best Benchmark
Suppose the agentβs type π is common knowledge before the principal offers the contract.
The first-best solution is found by maximizing the principalβs profit such that the wage to the
agent covers the cost in case of success and the expected cost in case of failure.
π½0 βπΏπ‘
ππ
π‘=1
(1 β ππ)π‘β1
ππ(π(ππ) β π€π‘π(π) β π€π‘)
+πΏππ(1 β π½0 + π½0(1 β ππ)
ππ
) (π(ππΉ) β π€π(πππ+1π ) β π€ππ)
22 As noted by Laffont and Tirole (1988), in the presence of cost uncertainty and risk aversion, separating the two
tasks may not be optimal. Moreover, hiring one agent for experimentation and another one for production might
lead to an informed principal problem. For example, in case the former agent provides negative evidence about the
projectβs profitability, the principal may benefit from hiding this information from the second agent to keep him
more optimistic about the project.
11
subject to
(πΌπ ππ‘π) π€π‘
π(π) β πππ β₯ 0 for π‘ β€ ππ,
(πΌπ πΉπππ ) π€π(π
ππ+1π ) β π
ππ+1π ππΉ β₯ 0.
The individual rationality constraints are binding. If the agent succeeds, the transfers
cover the actual cost with no rent given to the agent: π€π‘π(π) = πππ for π‘ β€ ππ. In case the agent
fails, the transfers cover the current expected cost and no expected rent is given to the
agent: π€π(πππ+1π ) = π
ππ+1π ππΉ. Since the expected cost is rising as long as success is not
obtained, the termination date ππΉπ΅π is bounded and it is the highest π‘π such that
πΏπ½π‘ππ ππ[π(ππ) β πππ] + πΏ(1 β π½
π‘ππ ππ)[π(ππΉ) β π
π‘π+1π ππΉ]
β₯ πΎ + [π(ππΉ) β ππ‘ππ ππΉ]
The intuition is that, by extending the experimentation stage by one additional period, the
agent of type π can learn that π = π with probability π½π‘ππ ππ.
Note that the first-best termination date of the experimentation stage ππΉπ΅π is a non-
monotonic function of the agentβs type. In the beginning of Appendix A, we formally prove that
there exists a unique value of ππ called οΏ½ΜοΏ½, such that:
πππΉπ΅π
πππ < 0 for ππ < οΏ½ΜοΏ½ and πππΉπ΅
π
πππ β₯ 0 for ππ β₯ οΏ½ΜοΏ½.
This non-monotonicity is a result of two countervailing forces.23 In any given period of
the experimentation stage, the high type is more likely to learn π = π (conditional on the actual
cost being low) since ππ» > ππΏ. This suggests that the principal should allow the high type to
experiment longer. However, at the same time, the high type agent becomes relatively more
pessimistic with repeated failures. This can be seen by looking at the probability of success
conditional on reaching period π‘, given by π½0(1 β ππ)π‘β1
ππ, over time. In Figure 2, we see that
this conditional probability of success for the high type becomes smaller than that for the low
type at some point. Given these two countervailing forces, the first-best stopping time for the
high type agent can be shorter or longer than that of the type πΏ agent depending on the
parameters of the problem.24 Therefore, the first-best stopping time is increasing in the agentβs
23 A similar intuition can be found in Halac et al. (2016) in a model without production. 24 For example, if ππΏ = 0.2, ππ» = 0.4, π = 0.5, π = 20, π½0 = 0.5, πΏ = 0.9, πΎ = 2, and π = 10βπ, then the first-
best termination date for the high type agent is ππΉπ΅π» = 4, whereas it is optimal to allow the low type agent to
experiment for seven periods, ππΉπ΅πΏ = 7. However, if we now change ππ» to 0.22 and π½0 to 0.4, the low type agent is
allowed to experiment less, that is, ππΉπ΅π» = 4 > ππΉπ΅
πΏ = 3.
12
type for small values of ππ when the first force (relative efficiency) dominates, but becomes
decreasing for larger values when the second force (relative pessimism) becomes dominant.
Figure 2. Probability of success with ππ» = 0.4, ππΏ = 0.2, π½0 = 0.5.
2.2 Asymmetric information
Assume now that the agent privately knows this type. Recall that all parties have the
same expected cost at the outset. Asymmetric information arises in our setting because the two
types learn asymmetrically in the experimentation stage, and not because there is any inherent
difference in their ability to implement the project. Furthermore, private information can exist
only if experimentation fails since the true cost π = π is revealed when the agent succeeds.
We now introduce some notation for ex post rent of the agent, which is the rent in the
production stage. Define by π¦π‘π the wage net of cost to the π type who succeeds in period π‘, and
by π₯π the wage net of the expected cost to the π type who failed during the entire
experimentation stage:
π¦π‘π β‘ π€π‘
π(π) β πππ for 1 β€ π‘ β€ ππ,
π₯π β‘ π€π(πππ+1π ) β π
ππ+1π ππΉ .
Therefore, the ex post (πΌπ ) constraints can be written as:
(πΌπ ππ‘π) π¦π‘
π β₯ 0 for π‘ β€ ππ,
(πΌπ πΉπππ ) π₯π β₯ 0,
where the π and πΉ are to denote success and failure.
0.00
0.05
0.10
0.15
0.20
0.25
1 3 5 7 9 11 13 15
π·π π β ππ³ πβπππ³
π·π π β ππ― πβπππ―
π, amount of failures
13
To simplify the notation, we denote with πππ the probability that an agent of type π does
not succeed during the π periods of the experimentation stage:
πππ = 1 β π½0 + π½0(1 β ππ)
π.
Using this notation, we can rewrite the two incentive constraints as:
(πΌπΆπΏ,π») π½0 β πΏπ‘ππΏ
π‘=1 (1 β ππΏ)π‘β1ππΏπ¦π‘πΏ + πΏππΏ
πππΏπΏ π₯πΏ
β₯ π½0 β πΏπ‘ππ»
π‘=1 (1 β ππΏ)π‘β1ππΏπ¦π‘π» + πΏππ»
πππ»πΏ [π₯π» + βπππ»+1ππΉ],
(πΌπΆπ»,πΏ) π½0 β πΏπ‘ππ»
π‘=1 (1 β ππ»)π‘β1ππ»π¦π‘π» + πΏππ»
πππ»π» π₯π»
β₯ π½0 β πΏπ‘ππΏ
π‘=1 (1 β ππ»)π‘β1ππ»π¦π‘πΏ + πΏππΏ
πππΏπ» [π₯πΏ β βπππΏ+1ππΉ],
Using our notation, the principal maximizes the following objective function subject to
(πΌπ ππ‘πΏ), (πΌπ πΉ
ππΏπΏ ), (πΌπ ππ‘
π»), (πΌπ πΉππ»π» ), (πΌπΆπΏ,π»), and (πΌπΆπ»,πΏ)
πΈπ {π½0 β πΏπ‘ππ
π‘=1 (1 β ππ)π‘β1
ππ[π(ππ) β πππ β π€π‘] + πΏπππ
πππ [π(ππΉ) β π
ππ+1π ππΉ β π€ππ]
βπ½0 β πΏπ‘ππ
π‘=1 (1 β ππ)π‘β1
πππ¦π‘π β πΏππ
ππππ π₯π
}
We start by analyzing the two (πΌπΆ) constraints and first show that the low type always
earns a rent.25 The reason that (πΌπΆπΏ,π») is binding is that since a high type must be given his
expected cost following failure, a low type will have to be given a rent to truthfully report his
type as his expected cost is lower. That is, the π π»π of (πΌπΆπΏ,π») is strictly positive since
βπππ»+1 = πππ»+1π» β π
ππ»+1πΏ > 0. We denote by ππΏ the rent to the low type by the πΏπ»π of the
(πΌπΆπΏ,π»):
ππΏ β‘ π½0 β πΏπ‘ππΏ
π‘=1 (1 β ππΏ)π‘β1ππΏπ¦π‘πΏ + πΏππΏ
πππΏπΏ π₯πΏ > 0.
2.2.1. (π°πͺπ―,π³) binding.
Interestingly, it is also possible that the high type wants to misreport his type such that
(πΌπΆπ»,πΏ) is binding too. While the low typeβs benefit from misreporting is positive for sure
(βπππ»+1 > 0), the high typeβs expected utility from misreporting his type is a gamble. There is a
positive part since he has a chance to claim the rent ππΏ of the low type. This part is positively
related to βπππ»+1 adjusted by relative probability of collecting the low typeβs rent. However,
there is a negative part as well since he runs the risk of having to produce while being
25 We prove this result in a Claim in Appendix A.
14
undercompensated since paid as a low type whose expected cost is lower when experimentation
fails. This term is positively related to βπππΏ+1 adjusted by probability of starting production
after failure. This is reflected in πΏππΏπ
ππΏπ» βπππΏ+1ππΉ on the π π»π of (πΌπΆπ»,πΏ). The (πΌπΆπ»,πΏ) is binding
only when the positive part of the gamble dominates the negative part.26
The complexity of the model calls for an illustrative example that demonstrates that
(πΌπΆπ»,πΏ) might be binding in equilibrium. Consider a case where the two types are significantly
different, e.g., ππΏ is close to zero and ππ» is close to one so that, in the first-best, ππΏ = 0 and
ππ» > 0. 27 Suppose the low type claims being high. Since his expected cost is lower than the
cost of the high type after ππ» unsuccessful experiments (πππ»πΏ < π
ππ»π» ), the low type must be given
a rent to induce truth-telling. Consider now the incentives of the high type to claim being low. In
this case, production starts immediately without experimentation under identical beliefs about
expected cost (π½0π + (1 β π½0) π). Therefore, the high type simply collects the rent of the low
type without incurring the negative part of the gamble when producing. And, (πΌπΆπ»,πΏ) is binding.
In our model, the exact value of the gamble depends on the difference in expected costs
and also the relative probabilities of success and failure. These, in turn, are determined by the
optimal durations of experimentation stage, ππΏ and ππ». To see how ππΏ and ππ» affect the value
of the gamble, consider again our simple example when the principal asks the low type to (over)
experiment (by) one period, ππΏ = 1, and look at the high-typeβs incentive to misreport again.
The high-type now faces a risk. If the project is bad, he will fail with probability (1 β π½0) and
have to produce in period π‘ = 2 knowing almost for sure that the cost is π, while the principal is
led to believe that the expected cost is π2πΏ = π½2
πΏπ + (1 β π½2πΏ) π < π. Therefore, by increasing the
low-typeβs duration of experimentation, the principal can use the negative part of the gamble to
mitigate the high-typeβs incentive to lie and, therefore, relax the (πΌπΆπ»,πΏ).
26 Suppose that the principal pays the rent to the low type after an early success. The high type may be interested in
claiming to be low type to collect the rent. Indeed, the high type is more likely to succeed early given that the
project is low cost. However, misreporting his type is risky for the high type. If he fails to find good news, the
principal, believing that he is a low type, will require the agent to produce based on a lower expected cost. Thus,
misreporting his type becomes a gamble for the high type: he has a chance to obtain the low-typeβs rent, but he will
be undercompensated relative to the low type in the production stage if he fails during the experimentation stage. 27 In this example, we emphasize the role of the difference in πs and suppress the impact of the relative probabilities
of success and failure.
15
Another way to illustrate the impact of ππΏ and ππ» on the gamble is to consider a case
where the principal must choose an identical length of the experimentation stage for both types
(ππ» = ππΏ = π). 28 We prove in Proposition 1 below that, in this scenario, the (πΌπΆπ»,πΏ) constraint
is not binding. Intuitively, since the relevant probabilities πππ and the difference in expected cost
βππ+1 are both identical in the positive and negative part of the gamble, they cancel each other.
This implies that misreporting his type will be unattractive for the high type.29
Proposition 1.
If the duration of experimentation must be chosen identical for both types, ππ» = ππΏ, then
the high type obtains no rent.
Proof: See Supplementary Appendix B.
Based on Proposition 1, we conclude that it is the principalβs choice to have different
lengths of the experimentation stage that results in (πΌπΆπ»,πΏ) being binding. Since the two types
have different efficiencies in experimentation (ππ» > ππΏ), the principal optimally chooses
different durations of experimentation for each type. This reveals that having both incentive
constraints binding might be in the interest of the principal.
In our model, the efficiency in experimentation (ππ» > ππΏ) is private information and the
principal chooses ππΏ and ππ» to screen the agents. This choice determines equilibrium values of
the relative probabilities of success and failure, and the difference in expected costs which
determine the gamble. The non-monotonicity of the first-best termination dates (section 2.1) and
also non-monotonicity in the difference in expected costs (Figure 1) make it difficult to provide a
simple characterization of the optimal durations. This indicates the challenge in deriving a
necessary condition for the sign of the gamble.
We provide below sufficient conditions for the (πΌπΆπ»,πΏ) constraint to be binding, which
are fairly intuitive given the challenges mentioned above. To determine the sufficient
conditions, we focus on the adverse selection parameter π. These conditions say that the
constraint is binding as long as the order of termination dates at the optimum remain unaltered
from that under the first best. Recalling the definition of οΏ½ΜοΏ½ from the discussion of first best, for
28 For example, the FDA requires all the firms to go through the same amount of trials before they are allowed to
release new drugs on the market. 29 We prove formally in Supplementary Appendix B that if π is the same for both types, the gamble is zero.
16
small values of π (ππΏ < ππ» < οΏ½ΜοΏ½) this means ππΏ < ππ», while the opposite is true for high values
of π (ππ» > ππΏ > οΏ½ΜοΏ½).30
Claim. Sufficient conditions for (πΌπΆπ»,πΏ) to be binding.
For any ππΏ β (0,1), there exists 0 < ππ»(ππΏ) < ππ»(ππΏ) < 1 such that the first best order
of termination dates is preserved in equilibrium and (πΌπΆπ»,πΏ) binds if
either i) ππ» < πππ{ππ»(ππΏ), οΏ½ΜοΏ½} for ππΏ < οΏ½ΜοΏ½ or ii) ππ» > ππ»(ππΏ) > ππΏ for ππΏ β₯ οΏ½ΜοΏ½.
Proof: See Appendix A.
The optimal contract is derived formally in Appendix A, and the key results are presented
in Propositions 2 and 3. The principal has two tools to screen the agent: the length of the
experimentation period and the timing of the payments for each type. We examine each of them
first, and later in section 3.2, we let the principal screening by choosing the optimal outputs
following both failure and success.
2.2.2. The length of the experimentation period: optimality of over-experimentation
While the standard result in the experimentation literature is under-experimentation, we
find that over-experimentation can also occur when there is a production stage following
experimentation. Experimenting longer increases the chance of success, and it can also help
reduce information rent. We explain this next.
To give some intuition, consider the case where only (πΌπΆπΏ,π») binds. The high type gets
no rent while rent of the low type is ππΏ = πΏππ»π
ππ»πΏ βπππ»+1ππΉ. In this case, there is no benefit
from distorting the duration of the experimentation for the low type (ππΏ = ππΉπ΅πΏ ). However, the
principal optimally distorts ππ» from its first-best level to mitigate rent of the low type. The
reason why the principal may decide to over-experiment is that it might reduce the rent in the
production stage, non-existent in standard models of experimentation. First, by extending the
experimentation period, the agent is more likely to succeed in experimentation. And, after
success, the cost of production is known, and no rent can originate from the production stage.
Second, even if experimentation fails, increasing the duration of experimentation can help reduce
30 As we will see below, when πs are high, the Ξππ‘ function is skewed to the left, and its shape largely determines
equilibrium properties as well as our sufficient condition. When πs are small, the Ξππ‘ function is relatively flat, and
the relative probabilities of success and failure play a more prominent role.
17
the asymmetric information and thus the agentβs rent in the production stage. This is because the
difference in expected cost βππ‘ is non-monotonic in π‘. We show that such over-experimentation
is more effective if the agents are sufficiently different in their learning abilities.31 This does not
depend on whether only one or both (πΌπΆ)s are binding.
When both (πΌπΆ)s are binding, there is another novel reason for over experimentation. By
increasing the duration of experimentation for the low type, ππΏ, the principal can increase the
high typeβs cost of lying. Recall that the negative part of the high-typeβs gamble when he lies is
represented by πΏππΏπ
ππΏπ» βπππΏ+1ππΉ on the π π»π of (πΌπΆπ»,πΏ). Since βππ‘ is non-monotonic, the
principal can increase the cost of lying by increasing ππΏ.
In Proposition 2, we provide sufficient conditions for over-experimentation. In Appendix
A, we also give sufficient conditions for under-experimentation to be optimal for the high type.
We also provide a numerical example of over-experimentation in Figure 4 below.
Proposition 2. Sufficient conditions for over-experimentation.
For any ππΏ β (0,1), there exists 0 < ππ»(ππΏ) < ππ»(ππΏ) < 1 such that the high type over-
experiments if ππ»is different enough from ππΏ:
ππ» > ππΉπ΅π» if ππ» > π
π»(ππΏ),
and the low type over-experiments if ππ» is not too different from ππΏ:
ππΏ β₯ ππΉπ΅πΏ if ππ» < ππ»(ππΏ).
Proof: See Appendix A.
In Figure 3 below, we use an example to illustrate that increasing ππ» is more effective
when ππ» is higher (relative to ππΏ). Start with case when the difference in expected cost is given
by the dashed line, and there is over-experimentation (ππΉπ΅π» = 10, while πππ΅
π» = 11). By
increasing ππ», the principal decreases πππ»πΏ βπππ»+1, which in turn decreases the positive part of
the gamble and makes lying less attractive for the high type. This effect is even stronger for a
higher ππ» (see plain line), where the difference in expected cost is skewed to the left with a
relatively high βπππ»+1 at the first best ππΉπ΅π» = 5. Now, increasing ππ» is even more effective
since the decrease in πππ»πΏ βπππ»+1 is even sharper.
31 The opposite is true when the πs are relatively small and close to each other. Then, the principal prefers to under-
experiment which reduces the difference in expected cost.
18
Figure 3. Difference in the expected cost with ππ» = 0.35(= 0.82), ππΏ = 0.2, π½0 = 0.7, π = 0.1, π = 10,
π(π) = 3.5βπ, πΏ = 0.9, and πΎ = 1.
Finally, as in the first best, either type may experiment longer, and ππΏ can be larger or
smaller than ππ».
2.2.3. The timing of the payments: rewarding failure or early/late success?
The principal chooses the timing of rewards and the duration of experimentation at the
same time as part of the contract, and, in this section, we analyze the principalβs choice of timing
of rewards to each type: should the principal reward early or late success in the experimentation
stage? Should she reward failure?
Recall that the low type receives a strictly positive rent, ππΏ > 0, and (πΌπΆπΏ,π») is binding.
The principal has to determine when to pay this rent to the low type while taking into account the
high typeβs incentive to misreport. This is achieved by rewarding the low type at events which
are relatively more likely for the low type. Since the high type is relatively more likely to
succeed early, he is optimally rewarded early, while the reward to the low type is optimally
postponed. Furthermore, since the high type is more likely to fail if experimentation lasts long
enough, rewarding the low type after late success or failure will depend on the length of the
experimentation stage, which is determined by the cost of experimentation (πΎ).
We will next characterize the optimal timing of payments. There are two cases
depending on whether only (πΌπΆπΏ,π») or both πΌπΆ constraints are binding.
0
1
2
3
4
5
6
1 π, amount of failures
ππΉπ΅π» =10
ππΉπ΅π» =5
π ππ
19
Proposition 3. The optimal timing of payments.
Case A: Only the low typeβs IC is binding.
The high type gets no rent. There is no restriction on when to reward the low type.
Case B: Both typesβ IC are binding.
The principal must reward the high-type for early success (in the very first period)
π¦1π» > 0 = π₯π» = π¦π‘
π» for all π‘ > 1.
The low type agent is rewarded
(i) after failure if the cost of experimentation is large (πΎ > πΎβ):
π₯πΏ > 0 = π¦π‘πΏ for all π‘ β€ ππΏ, and
(ii) after success in the last period if the cost of experimentation is small (πΎ < πΎβ):
π¦ππΏπΏ > 0 = π₯πΏ = π¦π‘
πΏ for all π‘ β€ ππΏ.
Proof: See Appendix A.
If (πΌπΆπ»,πΏ) is not binding, we show in Case A of Appendix A that the principal can use
any combination of π¦π‘πΏ and π₯πΏ to satisfy the binding (πΌπΆπΏ,π»): there is no restriction on when and
how the principal pays the rent to the low type as long as π½0 β πΏπ‘ππΏ
π‘=1 (1 β ππΏ)π‘β1ππΏπ¦π‘πΏ +
πΏππΏπ
ππΏπΏ π₯πΏ = πΏππ»
πππ»πΏ βπππ»+1ππΉ. Therefore, the principal can reward either early or late success,
or even failure.
If (πΌπΆπ»,πΏ) is binding, the high type has an incentive to claim to be a low type and we are
in Case B. This is the more interesting case and we focus on its characterization in this section.
The timing of rewards to the low type depends on which scheme yields a lower π π»π of (πΌπΆπ»,πΏ).
We start by analyzing the case where the principal rewards the agent after success. In the
optimal timing of payments is determined by the relative likelihood ratio of success in period π‘,
π½0(1βππ»)π‘β1
ππ»
π½0(1βππΏ)π‘β1ππΏ ,
which is strictly decreasing in π‘. Therefore, if the principal chooses to reward the low type for
success, she will optimally postpone this reward till the very last period, ππΏ, to minimize the
high typeβs incentive to misreport.
To see why the principal may want to reward the low type agent after failure, we need to
compare the relative likelihood of ratio of success (π½0(1βππ»)
π‘β1ππ»
π½0(1βππΏ)π‘β1ππΏ ) and failure (πππΏπ»
πππΏπΏ ). We show
20
in Appendix A that there is a unique period οΏ½ΜοΏ½πΏ such that the two relative probabilities are
equal:32
(1βππ»)οΏ½ΜοΏ½πΏβ1
ππ»
(1βππΏ)οΏ½ΜοΏ½πΏβ1ππΏβ‘
πππΏπ»
πππΏπΏ .
This critical value οΏ½ΜοΏ½πΏ (depicted in Figure 4 below) determines which type is relatively
more likely to succeed or fail during the experimentation stage. In any period π‘ < οΏ½ΜοΏ½πΏ, the high
type who chooses the contract designed for the low type is relatively more likely to succeed than
fail compared to the low type. For π‘ > οΏ½ΜοΏ½πΏ, the opposite is true. This feature plays an important
role in structuring the optimal contract. The critical value οΏ½ΜοΏ½πΏ determines whether the principal
will choose to reward success or failure in the optimal contract.
Figure 4. Relative probability of success/failure with ππ» = 0.4, ππΏ = 0.2, π½0 = 0.5.
If the principal wants to reward the low type after success, it will only be optimal if the
experimentation stage lasts long enough. If ππΏ > οΏ½ΜοΏ½πΏ, the principal can reward success in the last
period as the relative probability of success is declining over time and the rent is the smallest in
the last period ππΏ. If the experimentation stage is short, ππΏ < οΏ½ΜοΏ½πΏ, the principal will pay the rent
to the low type by rewarding failure since the high type is relatively more likely to succeed
during the experimentation stage.
The important question is therefore what the optimal value of ππΏ is relative to οΏ½ΜοΏ½πΏ and
consequently whether the principal should reward success or failure. The optimal value of ππΏ is
32 See Lemma 1 in Appendix A for the proof.
0.0
0.5
1.0
1.5
2.0
2.5
πππππ‘ππ£π ππππππππππ‘π¦ ππ ππππππππππ π» π‘π¦ππ
πππππ‘ππ£πππππππππππ‘π¦ππ πππππππfor H type
π, amount of failuresοΏ½ΜοΏ½
21
inversely related to the cost of experimentation πΎ. In Appendix A, we prove in Lemma 6 that
there exists a unique value of πΎβ such that ππΏ < οΏ½ΜοΏ½πΏ for any πΎ > πΎβ. Therefore, when the cost of
experimentation is high (πΎ > πΎβ), the length of experimentation will be short, and it will be
optimal for the principal to reward the low type after failure. Intuitively, failure is a better
instrument to screen out the high type when experimentation cost is high. So, it is the adverse
selection concern that makes it optimal to reward failure.
Finally, if the high type also gets positive rent, we show in Appendix A, that the principal
will reward him for success in the first period only. This is the period when success is most
likely to come from a high type than a low type.
3. Extensions
3.1. Over-production as a screening device
In this section, we allow the principal to choose output optimally after success and after
failure, and she can now use output as another screening variable. While our main findings
continue to hold, the key new results are that if the experimentation stage fails, the inefficient
type is asked to over-produce, while the efficient type under-produces. Just like over-
experimentation, over-production can be used to increase the cost of lying.
When output is optimally chosen by the principal in the contract, ππ is now replaced by
by ππ‘π(π) and is determined by πβ² (ππ‘
π(π)) = π. The main change from the base model is that
output after failure which is denoted by ππ(πππ+1π ), can vary continuously depending on the
expected cost. We can simply replace ππΉ by ππ(πππ+1π ) and ππ by ππ‘
π(π) in the principalβs
problem.
We derive the formal output scheme in Supplementary Appendix C but present the
intuition here. When experimentation is successful, there is no asymmetric information and no
reason to distort the output. Both types produce the first best output. When experimentation
fails to reveal the cost, asymmetric information will induce the principal to distort the output to
limit the rent. This is a familiar result in contract theory. In a standard second best contract Γ la
Baron-Myerson, the type who receives rent produces the first best level of output while the type
with no rent under-produces relative to the first best.
22
We find a similar result when only the low typeβs incentive constraint binds. The low
type produces the first best output while the high type under-produces relative to the first best.
To limit the rent of the low type, the high type is asked to produce a lower output.
However, we find a new result when both πΌπΆ are binding simultaneously. We give below
the sufficient conditions such that both incentive constraints are binding when output is variable,
and these conditions are similar to the ones identified earlier. When both incentive constraints
bind, to limit the rent of the high type, the principal will increase the output of the low type and
require over-production relative to the first best. To understand the intuition behind this result,
recall that the rent of the high type mimicking the low type is a gamble with two components.
The positive part is due to the rent promised to the low type after failure in the experimentation
stage which is increasing in ππ»(πππ»+1π» ). By making this output smaller, the principal can
decrease the positive component of the gamble. The negative part is now given by
πΏππΏπ
ππΏπ» βπππΏ+1π
πΏ(πππΏ+1πΏ ), and it comes from the higher expected cost of producing the output
required from the low type. By making this output higher, the principal can increase the cost of
lying and lower the rent of the high type. We summarize the results in Proposition 4 below.
Proposition 4. Optimal output.
After success, each type produces at the first best level:
πβ² (ππ‘π(π)) = π for π‘ β€ ππ.
After failure, the high type under-produces relative to the first best output:
πππ΅π» (π
ππ»+1π» ) < ππΉπ΅
π» (πππ»+1π» ).
After failure, the low type over-produces:
πππ΅πΏ (π
ππΏ+1πΏ ) β₯ ππΉπ΅
πΏ (πππΏ+1πΏ ).
Proof: See Supplementary Appendix C.
As in our main model, we now derive sufficient conditions for both πΌπΆ to bind and for
over-experimentation to occur and discuss them in turn. We show below that the sufficient
conditions for (πΌπΆπ»,πΏ) to be binding are stricter than in our main model with an exogenous
output. This is not surprising since the principal now has an additional screening instrument to
23
reduce the high typeβs incentives to misreport. We introduce οΏ½ΜοΏ½π»(ππΏ) < πππ{ππ»(ππΏ), οΏ½ΜοΏ½} and
οΏ½ΜοΏ½π»(ππΏ) > ππ»(ππΏ) to provide sufficient condition for both (πΌπΆ) to be binding simultaneously.
Claim. Sufficient condition for (πΌπΆπ»,πΏ) to be binding with endogenous output.
For any ππΏ β (0,1), there exists 0 < οΏ½ΜοΏ½π»(ππΏ) < οΏ½ΜοΏ½π»(ππΏ) < 1 such that
(πΌπΆπ»,πΏ) binds if either i) ππ» < οΏ½ΜοΏ½π»(ππΏ) for ππΏ < οΏ½ΜοΏ½ or ii) ππ» > οΏ½ΜοΏ½π»(ππΏ) > ππΏ for ππΏ β₯ οΏ½ΜοΏ½.
Proof: See Supplementary Appendix C.
The exact distortions in ππ»(πππ»+1π» ) and ππΏ(π
ππΏ+1πΏ ) are chosen by the principal to mitigate
the rent. Since the agentβs rent depends on both output and the difference in expected costs after
failure, distortions in output are proportional to βπππ»+1 and βπππΏ+1, which are non-monotonic in
time. This makes it challenging to derive necessary and sufficient conditions for over/under
experimentation when output is chosen optimally. We characterize sufficient conditions for over
and under experimentation for the high type below.
Claim. Sufficient conditions for over-experimentation.
There exist 0 < ππΏ
< οΏ½ΜΏοΏ½πΏ < 1 and οΏ½ΜΏοΏ½π» > ππ»(ππΏ) such that
ππ» > ππΉπ΅π» (over experimentation is optimal) if οΏ½ΜΏοΏ½π» > ππ» > π
π»(ππΏ) and ππΏ > π
πΏ;
ππ» < ππΉπ΅π» (under experimentation is optimal) if ππ» < ππ»(ππΏ) and ππΏ < οΏ½ΜΏοΏ½πΏ.
Proof: See Supplementary Appendix C.
3.2. Success might be hidden: ex post moral hazard
In the base model, we have suppressed moral hazard to highlight the screening properties
of the timing of rewards, which allowed us to isolate the importance of the monotonic likelihood
ratio of success in screening the two types. As we noted before, modeling both hidden effort and
privately known skill in experimentation is beyond the scope of this paper. However, we can
introduce ex post moral hazard by relaxing our assumption that the outcome of experiments in
each period is publicly observable. This introduces a moral hazard rent in each period, but our
key insights regarding the screening properties of the optimal contract remain intact. It remains
optimal to provide exaggerated rewards for the efficient type at the beginning and for the
24
inefficient type at the end of experimentation even under ex post moral hazard. Furthermore, the
agentβs rent is still determined by the difference in expected cost, which remains non-monotonic
in time. Thus, the reasons for over-experimentation also remain intact.
Specifically, we assume that success is privately observed by the agent, and that an agent
who finds success in some period π can choose to announce or reveal it at any period π‘ β₯ π.
Thus, we assume that success generates hard information that can be presented to the principal
when desired, but it cannot be fabricated. The agentβs decision to reveal success is affected not
only by the payment and the output tied to success/failure in the particular period π, but also by
the payment and output in all subsequent periods of the experimentation stage.
Note first that if the agent succeeds but hides it, the principal and the agentβs beliefs are
different at the production stage: the principalβs expected cost is given by πππ+1π while the agent
knows the true cost is π. In addition to the existing (πΌπ ) and (πΌπΆ) constraints, the optimal
scheme must now satisfy the following new ex post moral hazard constraints:
(πΈππ»π) π¦πππ β₯ π₯π + (π
ππ+1π β π)ππΉ for π = π», πΏ, and
(πΈπππ‘π) π¦π‘
π β₯ πΏπ¦π‘+1π for π‘ β€ ππ β 1.
The (πΈππ»π) constraint makes it unprofitable for the agent to hide success in the last
period. The (πΈπππ‘π) constraint makes it unprofitable to postpone revealing success in prior
periods. The two together imply that the agent cannot gain by postponing or hiding success.
The principalβs problem is exacerbated by having to address the ex post moral hazard constraints
in addition to all the constraints presented before. First, as formally shown in the Supplementary
Appendix D, both (πΌπΆπ»,πΏ) and (πΌπΆπΏ,π») may be slack, and either or both may be binding.33 Since
the ex post moral hazard constraints imply that both types will receive rent, these rents may be
sufficient to satisfy the (πΌπΆ) constraints. Second, private observation of success increases the
cost of paying a reward after failure. When the principal rewards failure with π₯π > 0, the
(πΈππ»π) constraint forces her to also reward success in the last period (π¦πππ > 0 because of
(πΈππ»π)) and in all previous periods (π¦π‘π > 0 because of (πΈπππ‘
π)). However, we show below
that it can still be optimal to reward failure.
33 Unlike the case when success is public, the (πΌπΆπΏ,π») may not always be binding.
25
Proposition 5.
When success can be hidden, the principal must reward success in every period for each type.
When both the (πΌπΆπΏ,π») and (πΌπΆπ»,πΏ) constraints bind and the optimal ππΏ β€ οΏ½ΜοΏ½πΏ, it is optimal to
reward failure for the low type.
Proof: See Supplementary Appendix D.
Details and the formal proof are in the Supplementary Appendix D. Here we provide
some intuition why rewarding failure or postponing rewards remain optimal even when the agent
privately observes success. We also provide an example below where this occurs in equilibrium.
The argument for postponing rewards to the low type to effectively screen the high type applies
even when success is privately observed. This is because the relative probability of success
between types is not affected by the two ex post moral hazard constraints above. An increase of
$1 in π₯π causes an increase of $1 in π¦πππ , which in turn causes an increase in all the previous π¦π‘
π
according to the discount factor. Therefore, the increases in π¦πππ and π¦π‘
π are not driven by the
relative probability of success between types. And, just as in Proposition 3, we again find that it
is optimal to postpone the reward for the low type if he experiments for a relatively brief length
of time and both (πΌπΆπ»,πΏ) and (πΌπΆπΏ,π») are binding. For example, when π½0 = 0.7 πΎ = 2, ππΏ =
0.28, ππ» = 0.7 the principal optimally chooses ππ» = 1, ππΏ = 2 and rewards the low type after
failure since οΏ½ΜοΏ½πΏ = 3.
While we have focused on how the ex post moral hazard affects the benefit of rewarding
failure, those constraints also affect the other optimal variables of the contract. For instance, the
constraint (πΈπππ‘π) can be relaxed by decreasing either ππ. So, we expect a shorter
experimentation stage and a lower output when success can he hidden.
3.3. Learning bad news
In this section, we show that our main results survive if the object of experimentation is
to seek bad news, where success in an experiment means discovery of high cost π = π. For
instance, stage 1 of a drug trial looks for bad news by testing the safety of the drug. Following
the literature on experimentation we call an event of observing π = π by the agent βsuccessβ
although this is bad news for the principal. If the agentβs type were common knowledge, the
principal and agent both become more optimistic if success is not achieved in a particular period
26
and relatively more optimistic when the agent is a high type than a low type. Also, as time goes
by without learning that the cost is high, the expected cost becomes lower due to Bayesian
updating and converges to π. In addition, the difference in the expected cost is now negative,
βππ‘ = ππ‘π» β ππ‘
πΏ < 0 since the π» type is relatively more optimistic after the same amount of
failures. However, βππ‘ remains non-monotonic in time and the reasons for over experimentation
remain unchanged.
Denoting by π½π‘π the updated belief of agent π that the cost is actually high, the type πβs
expected cost is then ππ‘π = π½π‘
ππ + (1 β π½π‘π) π. An agent of type π, announcing his type as π,
receives expected utility ππ(ποΏ½ΜοΏ½) at time zero from a contract ποΏ½ΜοΏ½, but now π¦π‘οΏ½ΜοΏ½ = π€π‘
οΏ½ΜοΏ½(π) β
πππ‘οΏ½ΜοΏ½(π) is a function of π.
Under asymmetric information about the agentβs type, the intuition behind the key
incentive problem is similar to that under learning good news. However, it is now the high type
who has an incentive to claim to be a low type. Given the same length of experimentation,
following failure, the expected cost is higher for the low type. Thus, a high type now has an
incentive to claim to be a low type: since a low type must be given his expected cost following
failure, a high type will have to be given a rent to truthfully report his type as his expected cost is
lower, that is, πππΏ+1π» < π
ππΏ+1πΏ . The details of the optimization problem mirror the case for good
news of Propositions 2, 3, and 4 and the results are similar. We present results formally in
Proposition 6 in Supplementary Appendix E.
We find similar restrictions when both (πΌπΆ) constraints bind as in Propositions 2, 3 and 4.
The type of news, however, determines length of experimentation decisions. The parallel
between good news and bad news is remarkable but not difficult to explain. In both cases, the
agent is looking for news. The types determine how good the agent is at obtaining this news.
The contract gives incentives for each type of agent to reveal his type, not the actual news.
4. Conclusions
In this paper, we have studied the interaction between experimentation and production
where the length of the experimentation stage determines the degree of asymmetric information
at the production stage. While there has been much recent attention on studying incentives for
experimentation in two-armed bandit settings, details of the optimal production decision are
27
typically suppressed to focus on incentives for experimentation. Each stage may impact the
other in interesting ways and our paper is a step towards studying this interaction.
There is also a significant literature on endogenous information gathering in contract
theory but typically relying on static models of learning. By modeling experimentation in a
dynamic setting, we have endogenized the degree of asymmetry of information in a principal
agent model and related it to the length of the learning stage.
When there is an optimal production decision after experimentation, we find a new result
that over-experimentation is a useful screening device. Likewise, over production is also useful
to mitigate the agentβs information rent. By analyzing the stochastic structure of the dynamic
problem, we clarify how the principal can rely on the relative probabilities of success and failure
of the two types to screen them. The rent to a high type should come after early success and to
the low type for late success. If the experimentation stage is relatively short, the principal has no
recourse but to pay the low typeβs rent after failure, which is another novel result.
While our main section relies on publicly observed success, we show that our key
insights survive if the agent can hide success. Then, there is ex post moral hazard, which implies
that the agent is paid a rent in every period, but the screening properties of the optimal contract
remain intact. Finally, we prove that our key insights do hold in both good and bad-news
models.
28
References
Arve M. and D. Martimort (2016). Dynamic procurement under uncertainty: Optimal design and
implications for incomplete contracts. American Economic Review 106 (11), 3238-3274.
Bergemann D. and U. Hege (1998). Venture capital financing, moral hazard, and learning.
Journal of Banking & Finance 22 (6-8), 703-735.
Bergemann D. and U. Hege (2005). The Financing of Innovation: Learning and Stopping. Rand
Journal of Economics 36 (4), 719-752.
Bergemann D. and J. VΓ€limΓ€ki (2008). Bandit Problems. In Durlauf, S., Blume, L. (Eds.), The
New Palgrave Dictionary of Economics. 2nd edition. Palgrave Macmillan Ltd,
Basingstoke, New York.
Bolton P. and Harris C. (1999). Strategic Experimentation. Econometrica 67(2), 349β374.
CrΓ©mer J. and Khalil F. (1992). Gathering Information before Signing a Contract. American
Economic Review 82(3), 566-578.
CrΓ©mer J., Khalil F. and Rochet J.-C. (1997). Contracts and productive information gathering.
Games and Economic Behavior 25(2), 174-193.
Ederer F. and Manso G. (2013). Is Pay-for-Performance Detrimental to Innovation? Management
Science, 59 (7), 1496-1513.
Gerardi D. and Maestri L. (2012). A Principal-Agent Model of Sequential Testing. Theoretical
Economics 7, 425β463.
Gomes R., Gottlieb D. and Maestri L. (2016). Experimentation and Project Selection: Screening
and Learning. Games and Economic Behavior 96, 145-169.
Halac M., N. Kartik and Q. Liu (2016). Optimal contracts for experimentation. Review of
Economic Studies 83(3), 1040-1091.
Horner J. and L. Samuelson (2013). Incentives for Experimenting Agents. RAND Journal of
Economics 44(4), 632-663.
KrΓ€hmer D. and Strausz R. (2011). Optimal Procurement Contracts with Pre-Project Planning.
Review of Economic Studies 78, 1015-1041.
KrΓ€hmer D. and Strausz R. (2015) Optimal Sales Contracts with Withdrawal Rights. Review of
Economic Studies, 82, 762β790.
Laffont J.-J. and Tirole J. (1998). The Dynamics of Incentive Contracts. Econometrica 56(5),
1153-1175.
Lewis T. and Sappington D. (1997) Information Management in Incentive Problems. Journal of
Political Economy 105(4), 796-821.
Manso G. (2011). Motivating Innovation. Journal of Finance 66 (2011), 1823β1860.
Rodivilov A. (2018). Monitoring Innovation. Working Paper.
Rosenberg, D., Salomon, A. and Vieille, N. (2013). On games of strategic experimentation.
Games and Economic Behavior 82, 31-51.
Sadler E. (2017). Dead Ends. Working Paper, April 2017.
29
Appendix A (Proof of Sufficient conditions for (π°πͺπ―,π³) to be binding,
Propositions 2 and 3)
First best: Characterizing οΏ½ΜοΏ½.
Claim. There exists οΏ½ΜοΏ½ β (0,1), such that πππΉπ΅
π
πππ < 0 for ππ < οΏ½ΜοΏ½ and πππΉπ΅
π
πππ β₯ 0 for ππ β₯ οΏ½ΜοΏ½.
Proof: The first-best termination date π‘ is such that
π½π‘π[π(ππ) β πππ] + (1 β π½π‘π)[π(ππΉ) β ππ‘+1ππΉ] = πΎ + [π(ππΉ) β ππ‘ππΉ].
Rewriting it next we have
π½π‘π[π(ππ) β πππ β π(ππΉ)] + ππΉ[ππ‘ β ππ‘+1(1 β π½π‘π)] = πΎ,
which given that ππ‘ β ππ‘+1(1 β π½π‘π) = π½π‘ππ, can be rewritten next as
π½π‘π =πΎ
(π(ππ)βπππ)β(π(ππΉ)βπππΉ),
which implicitly determines π‘ as a function of π, π‘(π). Using the Implicit Function Theorem
ππ‘
ππ= β
π[ππ½0(1βπ)π‘β1
π½0(1βπ)π‘β1+(1βπ½0)]
ππ
π[ππ½0(1βπ)π‘β1
π½0(1βπ)π‘β1+(1βπ½0)]
ππ‘
.
Since π[
ππ½0(1βπ)π‘β1
π½0(1βπ)π‘β1+(1βπ½0)]
ππ=
π½0((1 β π)π‘β1 + π(1 β π)π‘β2(π‘ β 1)(β1)) (π½
0(1 β π)π‘β1 + (1 β π½
0)) β π½
0π(1 β π)π‘β1 (π½
0(1 β π)π‘β1 + (1 β π½
0))
(π½0(1 β π)π‘β1 + (1 β π½
0))
2
=π½0(1βπ)π‘β1[1βπ½0+π½0(1βπ)π‘β1β
(1βπ½0)π(π‘β1)
1βπ]
(π½0(1βπ)π‘β1+(1βπ½0))2 ,
and π[
ππ½0(1βπ)π‘β1
π½0(1βπ)π‘β1+(1βπ½0)]
ππ‘=
π½0(1βπ½0)π(1βπ)π‘β1 ππ(1βπ)
(π½0(1βπ)π‘β1+(1βπ½0))2 , we have
ππ‘
ππ= β
π½0(1βπ)π‘β1[1βπ½0+π½0(1βπ)π‘β1β(1βπ½0)π(π‘β1)
1βπ]
(π½0(1βπ)π‘β1+(1βπ½0))2
π½0(1βπ½0)π(1βπ)π‘β1 ππ(1βπ)
(π½0(1βπ)π‘β1+(1βπ½0))2
= β(1βπ½0)(1βππ‘)+π½0(1βπ)π‘
(1βπ½0)π(1βπ) ππ(1βπ).
30
For ππ‘
ππ< 0 it is necessary and sufficient that (1 β π½0)(1 β ππ‘) + π½0(1 β π)π‘ < 0. Since
π[(1βπ½0)(1βππ‘)+π½0(1βπ)π‘]
ππ‘< 0 for any π it is sufficient to find οΏ½ΜοΏ½ such that (1 β π½0)(1 β 2π) +
π½0(1 β π)2 < 0 for any π > οΏ½ΜοΏ½. Since (1 β π½0)(1 β 2π) + π½0(1 β π)2 =
π½0 (π β1ββ1βπ½0
π½0) (π β
1+β1βπ½0
π½0), we define οΏ½ΜοΏ½ =
1ββ1βπ½0
π½0.34 Q.E.D.
Sufficient conditions for (π°πͺπ―,π³) to be binding.
Claim. For any ππΏ β (0,1), there exists 0 < ππ»(ππΏ) < ππ»(ππΏ) < 1 such that the first best order
of termination dates is preserved in equilibrium and (πΌπΆπ»,πΏ) binds if
either i) ππ» < πππ{ππ»(ππΏ), οΏ½ΜοΏ½} for ππΏ < οΏ½ΜοΏ½ or ii) ππ» > ππ»(ππΏ) > ππΏ for ππΏ β₯ οΏ½ΜοΏ½.
Proof: We will prove later in this Appendix A (see Optimal payment structure) that when the
(πΌπΆπ»,πΏ) binds, the principal will pay ππΏ by only rewarding failure (π₯πΏ > 0), or only rewarding
success in the last period ππΏ (π¦ππΏπΏ > 0). In step 1 below, we characterize a function ν(π‘) that
determines the sign of the high-typeβs gamble, i.e., if (πΌπΆπ»,πΏ) is binding, regardless of whether
the agent is optimally paid after failure or success. In step 2, we characterize values of ππΏ and ππ»
such that ν(π‘) is monotonic. These two steps together imply that the gamble is positive under
this set of ππΏ and ππ» if the order of optimal termination dates (ππΏ and ππ») is the same as in the
first best. In step 3, we prove that under this set of ππΏ and ππ» it is indeed optimal for the principal
to make the (πΌπΆπ»,πΏ) binding in equilibrium. Therefore, the sufficient conditions characterize
values of ππΏ and ππ» such that the principal finds it optimal to preserve the first best order of
termination dates in equilibrium.
We begin by deriving the gamble if the agent is optimally rewarded after failure and after
success in the last period ππΏ. If the principal rewards the low type after failure, the high typeβs
expected utility from misreporting (i.e., the π π»π of the (πΌπΆπ»,πΏ) constraint) is:
πππΏπ»
πππΏπΏ πΏππ»
πππ»πΏ βπππ»+1ππΉ β πΏππΏ
πππΏπ» βπππΏ+1ππΉ = ππΉπ
ππΏπ» (πΏππ» π
ππ»πΏ
πππΏπΏ βπππ»+1 β πΏππΏ
βπππΏ+1).
34 Note that 1ββ1βπ½0
π½0 is well defined and 0 <
1ββ1βπ½0
π½0< 1 for π½0 < 1.
31
If the principal rewards the low type only after success in period ππΏ, the high typeβs
expected utility from misreporting (i.e., the π π»π of the (πΌπΆπ»,πΏ) constraint) is:
π½0(1βππ»)ππΏβ1
ππ»
π½0(1βππΏ)ππΏβ1ππΏπΏππ»
πππ»πΏ βπππ»+1ππΉ β πΏππΏ
πππΏπ» βπππΏππΉ =
ππΉ ((1βππ»)
ππΏβ1ππ»
(1βππΏ)ππΏβ1ππΏπΏππ»
πππ»πΏ βπππ»+1 β πΏππΏ
πππΏπ» βπππΏ+1).
Step 1. The gamble for the high type is positive (regardless of the payment scheme) if
and only if ν(ππ») > ν(ππΏ), where ν(π‘) β‘ πΏπ‘ππ‘πΏ(π½π‘+1
πΏ β π½π‘+1π» ).
a) Consider the case of reward after failure first, ππΉπππΏπ» (πΏππ» π
ππ»πΏ
πππΏπΏ βπππ»+1 β πΏππΏ
βπππΏ+1). Given
that βππ‘ = (π β π)(π½π‘πΏ β π½π‘
π»), the gamble is positive if and only if
πΏππ» πππ»πΏ
πππΏπΏ βπππ»+1 β πΏππΏ
βπππΏ+1 > 0,
πΏππ»π
ππ»πΏ (π½
ππ»+1πΏ β π½
ππ»+1π» ) > πΏππΏ
πππΏπΏ (π½
ππΏ+1πΏ β π½
ππΏ+1π» ),
which can be re-written as ν(ππ») > ν(ππΏ).
b) Consider now the case of reward after success, ππΉ ((1βππ»)
ππΏβ1ππ»
(1βππΏ)ππΏβ1ππΏπΏππ»
πππ»πΏ βπππ»+1 β
πΏππΏπ
ππΏπ» βπππΏ+1). Similarly, the gamble is positive if and only if
(1βππ»)ππΏβ1
ππ»
(1βππΏ)ππΏβ1ππΏπΏππ»
πππ»πΏ βπππ»+1 β πΏππΏ
πππΏπ» βπππΏ+1 > 0,
(1βππ»)ππΏβ1
ππ»
(1βππΏ)ππΏβ1ππΏ
πππΏπΏ
πππΏπ» >
πΏππΏπππΏπΏ (π½
ππΏ+1πΏ βπ½
ππΏ+1π» )
πΏππ»πππ»πΏ (π½
ππ»+1πΏ βπ½
ππ»+1π» )
.
We will prove later in this Appendix A (see Optimal payment structure) that when gamble
for the high type is positive, the low type is rewarded for success if and only if (1βππ»)
ππΏβ1ππ»
(1βππΏ)ππΏβ1ππΏ>
πππΏπ»
πππΏπΏ or, alternatively,
(1βππ»)ππΏβ1
ππ»πππΏπΏ
(1βππΏ)ππΏβ1ππΏπππΏπ»
> 1. Therefore, a sufficient condition for the gamble to be
positive when the low type is rewarded for success is 1 >πΏππΏ
πππΏπΏ (π½
ππΏ+1πΏ βπ½
ππΏ+1π» )
πΏππ»πππ»πΏ (π½
ππ»+1πΏ βπ½
ππ»+1π» )
. Given that
πΏππΏπππΏπΏ (π½
ππΏ+1πΏ βπ½
ππΏ+1π» )
πΏππ»πππ»πΏ (π½
ππ»+1πΏ βπ½
ππ»+1π» )
=π(ππΏ)
π(ππ»), the condition above may be rewritten as 1 >
π(ππΏ)
π(ππ»).
32
Therefore, we have proved that a sufficient condition for the gamble to be positive
(regardless of the payment scheme) is ν(ππ») > ν(ππΏ).
We now explore the properties of ν(π‘) function.
Step 2. Next, we prove that for any ππΏ β (0,1), there exist ππ»(ππΏ) and ππ»(ππΏ), 1 > π
π» >
ππ» > 0 such that ππ(π‘)
ππ‘< 0 if ππ» > π
π» and
ππ(π‘)
ππ‘> 0 if ππ» < ππ».
To simplify, ν(π‘) = πΏπ‘(1 β π½0 + π½0(1 β ππΏ)π‘) (π½0(1βππΏ)
π‘
π½0(1βππΏ)π‘+(1βπ½0)β
π½0(1βππ»)π‘
π½0(1βππ»)π‘+(1βπ½0))
= πΏπ‘π½0((1βππΏ)
π‘(π½0(1βππ»)
π‘+(1βπ½0))β(1βππ»)
π‘(1βπ½0+π½0(1βππΏ)
π‘))
π½0(1βππ»)π‘+(1βπ½0)
= πΏπ‘π½0(1βπ½0)((1βππΏ)
π‘β(1βππ»)
π‘)
(π½0(1βππ»)π‘+(1βπ½0))= πΏπ‘
π½0(1βπ½0)((1βππΏ)π‘β(1βππ»)
π‘)
ππ‘π» .
πν(π‘)
ππ‘
= πΏπ‘((1 β ππΏ)π‘ππ(1 β ππΏ) β (1 β ππ»)π‘ππ(1 β ππ»))ππ‘
π» β π½0(1 β ππ»)π‘ππ(1 β ππ»)((1 β ππΏ)π‘ β (1 β ππ»)π‘)
(ππ‘π»)2 1
π½0(1 β π½
0)
+πΏπ‘ ππ πΏππ‘
π»((1 β ππΏ)π‘ β (1 β ππ»)π‘)
(ππ‘π»)2 1
π½0(1 β π½0)
= πΏπ‘ [ππ(1βππΏ)+πππΏ](1βππΏ)π‘ππ‘
π»β(1βππ»)π‘(ππ‘
πΏππ(1βππ»)+ππ‘π»πππΏ)
(ππ‘π»)
2 1
π½0(1βπ½0)
.
Function ν(π‘) decreases with π‘ if and only if π(ππ») < 0, where
π(ππ») = [ππ(1 β ππΏ) + ππ πΏ](1 β ππΏ)π‘ππ‘π» β (1 β ππ»)π‘(ππ‘
πΏππ(1 β ππ») + ππ‘π»πππΏ).
We prove first that π(ππ») < 0 for any π‘ if ππ» is sufficiently high.
Since both (1 β ππΏ)π‘ππ‘π» and (1 β ππ»)π‘(ππ‘
πΏππ(1 β ππ») + ππ‘π»πππΏ) are increasing in π‘, we have
π(ππ») < (1 β π½0)(1 β ππΏ)π ππ[πΏ(1 β ππΏ)] β (1 β ππ»)(π1πΏππ(1 β ππ») + π1
π»πππΏ),
where π = max{ππ», ππΏ}.
Next, (1 β ππ»)(π1πΏππ(1 β ππ») + π1
π»πππΏ) > (1 β ππ»)(1 β π½0ππΏ) ππ[πΏ(1 β ππ»)] and
ππ[πΏ(1βππ»)]
ππ[πΏ(1βππΏ)]> 1. Therefore,
(1βπ½0)(1βππΏ)π
(1βππ»)(1βπ½0ππΏ)> 1 βΉ π(ππ») < 0.
Rearranging the above, we have π(ππ») < 0 for any π‘ if ππ» > 1 β(1βπ½0)(1βππΏ)
π
(1βπ½0ππΏ).
33
Denote
ππ»
β‘ 1 β(1 β π½0)(1 β ππΏ)π
(1 β π½0ππΏ).
Therefore, for any π‘, π(ππ») < 0 if ππ» > ππ»
and, consequently, ππ(π‘)
ππ‘< 0 for any π‘ if ππ» > π
π».
We next prove π(ππ») > 0 for any π‘ if ππ» is sufficiently low.
Similarly, because both (1 β ππΏ)π‘ππ‘π» and (1 β ππ»)π‘(ππ‘
πΏππ(1 β ππ») + ππ‘π»πππΏ) are increasing in π‘,
π(ππ») > π1π»(1 β ππΏ) ππ[πΏ(1 β ππΏ)] β (1 β ππ»)π(π
ππΏππ(1 β ππ») + π
ππ»πππΏ).
Next, (1 β ππ»)ππππ» ππ[πΏ(1 β ππ»)] > (1 β ππ»)π(π
ππΏππ(1 β ππ») + π
ππ»πππΏ) and
ππ[πΏ(1βππ»)]
ππ[πΏ(1βππΏ)]> 1.
Therefore, (1 β π½0ππ»)(1 β ππΏ) < (1 β ππ»)π(1 β π½0 + π½0(1 β ππ»)π) βΉ π(ππ») > 0.
Rearranging the above, we have π(ππ») > 0 for any π‘ if
(1βπ½0ππ»)
(1βππ»)π(1 β ππΏ) < (1 β π½0 + π½0(1 β ππ»)π).
The left-hand side of the above inequality, (1βπ½0ππ»)
(1βππ»)π(1 β ππΏ), is increasing in ππ»:
π[(1βπ½0ππ»)
(1βππ»)π
]
πππ» =βπ½0(1βππ»)+π(1βπ½0ππ»)
(1βππ»)π+1> 0.
The right-hand side of the above inequality, 1 β π½0 + π½0(1 β ππ»)π, is decreasing in ππ»:
π[1βπ½0+π½0(1βππ»)π]
πππ» = βπ½0π(1 β ππ»)πβ1 < 0.
Therefore, there exists a unique value of ππ» = ππ», such that if ππ» < ππ»: (1βπ½0ππ»)
(1βππ»)π(1 β ππΏ) <
(1 β π½0 + π½0(1 β ππ»)π) and (1βπ½0ππ»)
(1βππ»)π(1 β ππΏ) > (1 β π½0 + π½0(1 β ππ»)π) if ππ» > ππ».
Denote ππ» as the unique value of ππ» that makes the πΏπ»π and π π»π equal
ππ»: (1βπ½0ππ»)
(1βππ»)π
(1 β ππΏ) = (1 β π½0 + π½0(1 β ππ»)π).
Therefore, for any π‘, π(ππ») > 0 if ππ» < ππ» and, consequently, ππ(π‘)
ππ‘> 0 for any π‘ if ππ» < ππ».
Step 3. We now prove by contradiction that for ππ» < πππ{ππ»(ππΏ), οΏ½ΜοΏ½} and οΏ½ΜοΏ½ < ππΏ < ππ»
<
ππ», it is optimal to have (πΌπΆπ»,πΏ) binding, i.e., pay rent to both types rather than the low type only.
34
We consider οΏ½ΜοΏ½ < ππΏ < ππ»
< ππ» and argue later that case 0 < ππΏ < ππ» < min{ππ», οΏ½ΜοΏ½} can
be proven by repeating the same procedure.
Step 3a. Assume (πΌπΆπ»,πΏ) is not binding and it is optimal to pay rent only to the low type.
Denote optimal duration of experimentation stage for both types πππ΅πΏ and πππ΅
π» , respectively. We
prove later in this Appendix A (see Optimal length of experimentation, Case A) that in this case
ππΉπ΅π» < πππ΅
π» and ππΉπ΅πΏ = πππ΅
πΏ .
Since gamble for the high type is positive if ν(ππ») > ν(ππΏ) and ππ(π‘)
ππ‘< 0 for ππ» > π
π», it
must be that πππ΅π» > ππΉπ΅
πΏ (otherwise the gamble for the high type would be positive and both types
would collect rent). Denote the expected surplus net of costs for π = π», πΏ by Ξ©π(ππ) =
π½0 β πΏπ‘ππ
π‘=1 (1 β ππ)π‘β1
ππ[π(ππ) β πππ β π€π‘] + πΏπππ
πππ [π(ππΉ) β π
ππ+1π ππΉ β π€ππ]. Since the
principal optimally distorts ππ», it must be that
πΞ©π»(πππ΅π» ) + (1 β π)Ξ©πΏ(ππΉπ΅
πΏ ) β (1 β π)ππΏ(πππ΅π» ) >
πΞ©π»(ππΉπ΅π» ) + (1 β π)Ξ©πΏ(ππΉπ΅
πΏ ) β πππ»(ππΉπ΅π» , ππΉπ΅
πΏ ) β (1 β π)ππΏ(ππΉπ΅π» , ππΉπ΅
πΏ ),
where ππ»(ππΉπ΅π» , ππΉπ΅
πΏ ) and ππΏ(ππΉπ΅π» , ππΉπ΅
πΏ ) appear on the RHS because the ππΉπ΅π» < ππΉπ΅
πΏ and, given and
ππ(π‘)
ππ‘< 0, the gamble is positive (so the principal has to pay rent to both types).
Given that ππΉπ΅πΏ = πππ΅
πΏ , the condition above simplifies to
πππ»(ππΉπ΅π» , ππΉπ΅
πΏ ) + (1 β π)ππΏ(ππΉπ΅π» , ππΉπ΅
πΏ ) β (1 β π)ππΏ(πππ΅π» ) > π[Ξ©π»(ππΉπ΅
π» ) β Ξ©π»(πππ΅π» )].
Step 3b. We now show that our assumption (that (πΌπΆπ»,πΏ) is not binding and it is optimal to pay
rent only to the low type) leads to a contradiction by proving that the principal can be better off
by paying rent to both types instead.
Consider another contract with ππΏ = ππΉπ΅πΏ and ππ» = ππΉπ΅
πΏ β ν, where agentβs rewards are
chosen optimally given these πs.35 With ππΏ = ππΉπ΅πΏ and ππ» = ππΉπ΅
πΏ β ν, the principalβs expected
profit becomes
πΞ©π»(ππΉπ΅πΏ β ν) + (1 β π)Ξ©πΏ(ππΉπ΅
πΏ ) β πππ»(ππΉπ΅πΏ β ν, ππΉπ΅
πΏ ) β (1 β π)ππΏ(ππΉπ΅πΏ β ν, ππΉπ΅
πΏ ).
Therefore, the principal is better off with the newly introduced contract (ππΏ = ππΉπ΅πΏ and
ππ» = ππΉπ΅πΏ β ν) than with (ππΉπ΅
πΏ , πππ΅π» ) if
πΞ©π»(ππΉπ΅πΏ β ν) + (1 β π)Ξ©πΏ(ππΉπ΅
πΏ ) β πππ»(ππΉπ΅πΏ β ν, ππΉπ΅
πΏ ) β (1 β π)ππΏ(ππΉπ΅πΏ β ν, ππΉπ΅
πΏ ) >
35 Since we consider ππ» sufficiently higher than ππΏ , we can always choose ππΉπ΅
π» < ππΉπ΅πΏ + 1 for οΏ½ΜοΏ½ < ππΏ and, there
always exists ν such that ππΉπ΅πΏ β ν > ππΉπ΅
π» .
35
πΞ©π»(πππ΅π» ) + (1 β π)Ξ©πΏ(ππΉπ΅
πΏ ) β (1 β π)ππΏ(πππ΅π» ) or, equivalently
π[Ξ©π»(ππΉπ΅πΏ β ν) β Ξ©π»(πππ΅
π» )] >
πππ»(ππΉπ΅πΏ β ν, ππΉπ΅
πΏ ) + (1 β π)ππΏ(ππΉπ΅πΏ β ν, ππΉπ΅
πΏ ) β (1 β π)ππΏ(πππ΅π» ).
Since πππ»(ππΉπ΅π» , ππΉπ΅
πΏ ) + (1 β π)ππΏ(ππΉπ΅π» , ππΉπ΅
πΏ ) β (1 β π)ππΏ(πππ΅π» ) > π[Ξ©π»(ππΉπ΅
π» ) β Ξ©π»(πππ΅π» )] by
assumption (see step 3a) and π[Ξ©π»(ππΉπ΅π» ) β Ξ©π»(πππ΅
π» )] > π[Ξ©π»(ππΉπ΅πΏ β ν) β Ξ©π»(πππ΅
π» )] because
πππ΅π» β ππΉπ΅
π» > πππ΅π» β (ππΉπ΅
πΏ β ν), the principal is better off with the newly introduced contract
(ππΏ = ππΉπ΅πΏ and ππ» = ππΉπ΅
πΏ β ν) than with (ππΉπ΅πΏ , πππ΅
π» ) if
πππ»(ππΉπ΅π» , ππΉπ΅
πΏ ) + (1 β π)ππΏ(ππΉπ΅π» , ππΉπ΅
πΏ ) β (1 β π)ππΏ(πππ΅π» ) >
πππ»(ππΉπ΅πΏ β ν, ππΉπ΅
πΏ ) + (1 β π)ππΏ(ππΉπ΅πΏ β ν, ππΉπ΅
πΏ ) β (1 β π)ππΏ(πππ΅π» ).
The above inequality holds for any ν < ππΉπ΅πΏ β ππΉπ΅
π» , since we prove later in this Appendix A (see
Sufficient conditions for over/under experimentation) that for ππ»
< ππ» both πππ»(ππ»,ππΏ)
πππ» < 0 and
πππΏ(ππ»,ππΏ)
πππ» < 0. Therefore, the principal is better off by paying rent to both types rather than to
the low type only.
Consider now 0 < ππΏ < ππ» < min{ππ», οΏ½ΜοΏ½}. Repeating the same procedure with ππΏ =
ππΉπ΅π» β ν and ππ» = ππΉπ΅
π» and given that for ππΏ < ππ» < min{ππ», οΏ½ΜοΏ½} both ππ(π‘)
ππ‘< 0 and ππΉπ΅
π» > ππΉπ΅πΏ a
similar conclusion follows.
Q.E.D.
Proof of Propositions 2 and 3
We first characterize the optimal payment structure given ππΏ and ππ», π₯πΏ, {π¦π‘πΏ}π‘=1
ππΏ, π₯π» and
{π¦π‘π»}π‘=1
ππ» (Proposition 3), then the optimal length of experimentation, ππΏ and ππ» (Proposition 2).
Denote the expected surplus net of costs for π = π», πΏ by Ξ©π(ππ) = π½0 β πΏπ‘ππ
π‘=1 (1 β
ππ)π‘β1
ππ[π(ππ) β πππ β π€π‘] + πΏπππ
πππ [π(ππΉ) β π
ππ+1π ππΉ β π€ππ]. The principalβs optimization
problem then is to choose contracts ππ» and ππΏ to maximize the expected net surplus minus rent
of the agent, subject to the respective πΌπΆ and πΌπ constraints given below:
πππ₯ πΈπ {Ξ©π(ππ) β π½0 β πΏπ‘ππ
π‘=1 (1 β ππ)π‘β1
πππ¦π‘π β πΏππ
ππππ π₯π} subject to:
(πΌπΆπ»,πΏ) π½0 β πΏπ‘ππ»
π‘=1 (1 β ππ»)π‘β1ππ»π¦π‘π» + πΏππ»
πππ»π» π₯π»
36
β₯ π½0 β πΏπ‘ππΏ
π‘=1 (1 β ππ»)π‘β1ππ»π¦π‘πΏ + πΏππΏ
πππΏπ» [π₯πΏ β βπππΏ+1ππΉ],
(πΌπΆπΏ,π») π½0 β πΏπ‘ππΏ
π‘=1 (1 β ππΏ)π‘β1ππΏπ¦π‘πΏ + πΏππΏ
πππΏπΏ π₯πΏ
β₯ π½0 β πΏπ‘ππ»
π‘=1 (1 β ππΏ)π‘β1ππΏπ¦π‘π» + πΏππ»
πππ»πΏ [π₯π» + βπππ»+1ππΉ],
(πΌπ ππ‘π») π¦π‘
π» β₯ 0 for π‘ β€ ππ»,
(πΌπ ππ‘πΏ) π¦π‘
πΏ β₯ 0 for π‘ β€ ππΏ,
(πΌπ πΉππ»π» ) π₯π» β₯ 0,
(πΌπ πΉππΏπΏ ) π₯πΏ β₯ 0.
We begin to solve the problem by first proving the following claim.
Claim: The constraint (πΌπΆπΏ,π») is binding and the low type obtains a strictly positive rent.
Proof: If the (πΌπΆπΏ,π») constraint was not binding, it would be possible to decrease the payment to
the low type until (πΌπ ππ‘πΏ) and (πΌπ πΉπ‘
πΏ) are binding, but that would violate (πΌπΆπΏ,π») since
βπππ»+1ππΉ > 0. Q.E.D.
I. Optimal payment structure, ππ³, {πππ³}π=π
π»π³, ππ― and {ππ
π―}π=ππ»π―
(Proof of Proposition 3)
First we show that if the high type claims to be the low type, the high type is relatively
more likely to succeed if experimentation stage is smaller than a threshold level, οΏ½ΜοΏ½πΏ. In terms of
notation, we define π2(π‘, ππΏ) =
πππΏπ»
πππΏπΏ (1 β ππΏ)π‘β1ππΏ β (1 β ππ»)π‘β1ππ» to trace difference in the
likelihood ratios of failure and success for two types.
Lemma 1: There exists a unique οΏ½ΜοΏ½πΏ > 1, such that π2(οΏ½ΜοΏ½πΏ , ππΏ) = 0, and
π2(π‘, ππΏ) {< 0 for π‘ < οΏ½ΜοΏ½πΏ
> 0 for π‘ > οΏ½ΜοΏ½πΏ.
Proof: Note that πππΏπ»
πππΏπΏ is a ratio of the probability that the high type does not succeed to the
probability that the low type does not succeed for ππΏ periods. At the same time,
π½0(1 β ππ)π‘β1
ππ is the probability that the agent of type π succeeds at period π‘ β€ ππΏ of the
experimentation stage and π½0(1βππ»)
π‘β1ππ»
π½0(1βππΏ)π‘β1ππΏ =(1βππ»)
π‘β1ππ»
(1βππΏ)π‘β1ππΏ is a ratio of the probabilities of success at
period π‘ by two types. As a result, we can rewrite π2(π‘, ππΏ) > 0 as
37
1βπ½0+π½0(1βππ»)ππΏ
1βπ½0+π½0(1βππΏ)ππΏ >(1βππ»)
π‘β1ππ»
(1βππΏ)π‘β1ππΏ for 1 β€ π‘ β€ ππΏ or, equivalently,
1βπ½0+π½0(1βππ»)ππΏ
(1βππ»)π‘β1ππ» >
1βπ½0+π½0(1βππΏ)ππΏ
(1βππΏ)π‘β1ππΏ for 1 β€ π‘ β€ ππΏ,
where 1βπ½0+π½0(1βππ)
ππΏ
(1βππ)π‘β1
ππ can be interpreted as a likelihood ratio.
We will say that when π2(π‘, ππΏ) > 0 (< 0) the high type is relatively more likely to fail
(succeed) than the low type during the experimentation stage if he chooses a contract designed
for the low type.
There exists a unique time period οΏ½ΜοΏ½πΏ(ππΏ , ππΏ , ππ», π½0) such that π2(οΏ½ΜοΏ½πΏ , ππΏ) = 0 defined as
οΏ½ΜοΏ½πΏ β‘ οΏ½ΜοΏ½πΏ(ππΏ , ππΏ , ππ», π½0) = 1 +
ln(π
ππΏπ»
πππΏπΏ
ππΏ
ππ»)
ln(1βππ»
1βππΏ) ,
where uniqueness follows from (1βππ»)
π‘β1ππ»
(1βππΏ)π‘β1ππΏ being strictly decreasing in π‘ and ππ»
ππΏ > 1 >πππΏπ»
πππΏπΏ .36 In
addition, for π‘ < οΏ½ΜοΏ½πΏ it follows that π2(π‘, ππΏ) < 0 and, as a result, the high type is relatively more
likely to succeed than the low type whereas for π‘ > οΏ½ΜοΏ½πΏ the opposite is true. Q.E.D.
We will show that the solution to the principalβs optimization problem depends on
whether the (πΌπΆπ»,πΏ) constraint is binding or not; we explore each case separately in what follows.
Case A: The (π°πͺπ―,π³) constraint is not binding.
In this case the high type does not receive any rent and it immediately follows that π₯π» =
0 and π¦π‘π» = 0 for 1 β€ π‘ β€ ππ», which implies that the rent of the low type in this case becomes
πΏππ»π
ππ»πΏ βπππ»+1ππΉ. Replacing π₯πΏ in the objective function, the principalβs optimization problem
is to choose ππ», ππΏ, {π¦π‘πΏ}π‘=1
ππΏ to
πππ₯ πΈπ{ππΉπ΅π (ππ) β (1 β π)πΏππ»
πππ»πΏ βπππ»+1ππΉ} subject to:
(πΌπ ππ‘πΏ) π¦π‘
πΏ β₯ 0 for π‘ β€ ππΏ,
and (πΌπ πΉππΏ) πΏππ»π
ππ»πΏ βπππ»+1ππΉ β π½0 β πΏπ‘ππΏ
π‘=1 (1 β ππΏ)π‘β1ππΏπ¦π‘πΏ β₯ 0.
36 To explain, π2(π‘, ππΏ) = 0 if and only if
1βπ½0+π½0(1βππ»)ππΏ
1βπ½0+π½0(1βππΏ)ππΏ =
(1βππ»)π‘β1
ππ»
(1βππΏ)π‘β1
ππΏ. Given that the right hand side of the
equation above is strictly decreasing since 1βππ»
1βππΏ < 1 and if evaluated at π‘ = 1 is equal to ππ»
ππΏ . Since
1βπ½0+π½0(1βππ»)ππΏ
1βπ½0+π½0(1βππΏ)ππΏ < 1 and
ππ»
ππΏ > 1 the uniqueness immediately follows. So οΏ½ΜοΏ½πΏ satisfies πππΏ π»
πππΏ πΏ =
(1βππ»)οΏ½ΜοΏ½πΏ β1
ππ»
(1βππΏ)οΏ½ΜοΏ½πΏ β1
ππΏ.
38
When the (πΌπΆπ»,πΏ) constraint is not binding, the claim below shows that there are no
restrictions in choosing {π¦π‘πΏ}π‘=1
ππΏ except those imposed by the (πΌπΆπΏ,π») constraint. In other words,
the principal can choose any combinations of nonnegative payments to the low type
(π₯πΏ , {π¦π‘πΏ}π‘=1
ππΏ) such that π½0 β πΏπ‘ππΏ
π‘=1 (1 β ππΏ)π‘β1ππΏπ¦π‘πΏ + πΏππΏ
πππΏπΏ π₯πΏ = πΏππ»
πππ»πΏ βπππ»+1ππΉ. Labeling
by {πΌπ‘πΏ}π‘=1
ππΏ, πΌπΏ the Lagrange multipliers of the constraints associated with (πΌπ ππ‘
πΏ) for π‘ β€ ππΏ,
and (πΌπ πΉππΏ) respectively, we have the following claim.
Claim A.1: If (πΌπΆπ»,πΏ) is not binding, we have πΌπΏ = 0 and πΌπ‘πΏ = 0 for all π‘ β€ ππΏ.
Proof: We can rewrite the Kuhn-Tucker conditions as follows:
πβ
ππ¦π‘πΏ = πΌπ‘
πΏ β πΌπΏπ½0πΏπ‘(1 β ππΏ)π‘β1ππΏ = 0 for 1 β€ π‘ β€ ππΏ;
πβ
ππΌπ‘πΏ = π¦π‘
πΏ β₯ 0; πΌπ‘πΏ β₯ 0; πΌπ‘
πΏπ¦π‘πΏ = 0 for 1 β€ π‘ β€ ππΏ.
Suppose to the contrary that πΌπΏ > 0. Then,
πΏππ»π
ππ»πΏ βπππ»+1ππΉ β π½0 β πΏπ‘ππΏ
π‘=1 (1 β ππΏ)π‘β1ππΏπ¦π‘πΏ = 0,
and there must exist π¦π πΏ > 0 for some 1 β€ π β€ ππΏ. Then, we have πΌπ
πΏ = 0, which leads to a
contradiction since πβ
ππ¦π‘πΏ = 0 cannot be satisfied unless πΌπΏ = 0.
Suppose to the contrary that πΌπ πΏ > 0 for some 1 β€ π β€ ππΏ. Then, πΌπΏ > 0, which leads to a
contradiction as we have just shown above. Q.E.D.
Case B: The (π°πͺπ―,π³) constraint is binding.
We will now show that when the (πΌπΆπ»,πΏ) becomes binding, there are restrictions on the
payment structure to the low type. Denoting by π = πππ»π» π
ππΏπΏ β π
ππΏπ» π
ππ»πΏ , we can re-write the
incentive compatibility constraints as:
π₯π»πΏππ»π = π½0 β πΏπ‘ππ»
π‘=1 [πππΏπ» (1 β ππΏ)π‘β1ππΏ β π
ππΏπΏ (1 β ππ»)π‘β1ππ»]π¦π‘
π»
+π½0 β πΏπ‘ππΏ
π‘=1 [πππΏπΏ (1 β ππ»)π‘β1ππ» β π
ππΏπ» (1 β ππΏ)π‘β1ππΏ]π¦π‘
πΏ
+πππΏπ» ππΉ(πΏππ»
πππ»πΏ βπππ»+1 β πΏππΏ
πππΏπΏ βπππΏ+1), and
π₯πΏπΏππΏπ = π½0 β πΏπ‘ππ»
π‘=1 [πππ»π» (1 β ππΏ)π‘β1ππΏ β π
ππ»πΏ (1 β ππ»)π‘β1ππ»]π¦π‘
π»
+π½0 β πΏπ‘ππΏ
π‘=1 [πππ»πΏ (1 β ππ»)π‘β1ππ» β π
ππ»π» (1 β ππΏ)π‘β1ππΏ]π¦π‘
πΏ
+πππ»πΏ ππΉ(πΏππ»
πππ»π» βπππ»+1 β πΏππΏ
πππΏπ» βπππΏ+1).
39
First, we consider the case when π β 0. This is when the likelihood ratio of reaching the
last period of the experimentation stage is different for both types i.e., when πππ»π»
πππ»πΏ β
πππΏπ»
πππΏπΏ (Case
B.1). We showed in Lemma 1 that there exists a time threshold οΏ½ΜοΏ½πΏ such that if type π» claims to
be type πΏ, he is more likely to fail (resp. succeed) than type πΏ if the experimentation stage is
longer (resp. shorter) than οΏ½ΜοΏ½πΏ. In Lemma 2 we prove that, if the principal rewards success, it is
at most once. In Lemma 3, we establish that the high type is never rewarded for failure. In
Lemma 4, we prove that the low type is rewarded for failure if and only if ππΏ β€ οΏ½ΜοΏ½πΏ and, in
Lemma 5, that he is rewarded for the very last success if ππΏ > οΏ½ΜοΏ½πΏ. In Lemma 6, we prove that
οΏ½ΜοΏ½πΏ > ππΏ(<) for high (small) values of πΎ. Therefore, if the cost of experimentation is large ( πΎ >
πΎβ), the principal must reward the low type after failure. If the cost of experimentation is small
(πΎ < πΎβ ) , the principal must reward the low type after late success (last period). We also show
that the high type may be rewarded only for the very first success.
Finally, we analyze the case when πππ»π»
πππ»πΏ =
πππΏπ»
πππΏπΏ (Case B.2). In this case, the likelihood ratio
of reaching the last period of the experimentation stage is the same for both types and π₯π» and π₯πΏ
cannot be used as screening variables. Therefore, the principal must reward both types for
success and she chooses ππΏ > οΏ½ΜοΏ½πΏ.
Case B.1: π = πππ»π» π
ππΏπΏ β π
ππΏπ» π
ππ»πΏ β 0.
Then π₯π» and π₯πΏ can be expressed as functions of {π¦π‘π»}π‘=1
ππ», {π¦π‘
πΏ}π‘=1ππΏ
, ππ», ππΏ only from the
binding (πΌπΆπ»,πΏ) and (πΌπΆπΏ,π»). The principalβs optimization problem is to choose ππ», {π¦π‘π»}π‘=1
ππ»,
ππΏ, {π¦π‘πΏ}π‘=1
ππΏ to
πππ₯ πΈπ {Ξ©π(ππ) β πΏππ
ππππ π₯π({π¦π‘
π»}π‘=1ππ»
, {π¦π‘πΏ}π‘=1
ππΏ, ππ» , ππΏ)
βπ½0 β πΏπ‘ππ
π‘=1 (1 β ππ)π‘β1
πππ¦π‘π
} subject to
(πΌπ ππ‘π) π¦π‘
π β₯ 0 for π‘ β€ ππ,
(πΌπ πΉππ) π₯π({π¦π‘π»}π‘=1
ππ», {π¦π‘
πΏ}π‘=1ππΏ
, ππ», ππΏ) β₯ 0 for π = π», πΏ.
Labeling {πΌπ‘π»}π‘=1
ππ», {πΌπ‘
πΏ}π‘=1ππΏ
, ππ» and ππΏ as the Lagrange multipliers of the constraints
associated with (πΌπ ππ‘π»), (πΌπ ππ‘
πΏ), (πΌπ πΉππ») and (πΌπ πΉππΏ) respectively, the Lagrangian is:
40
β = πΈπ {Ξ©π(ππ) β π½0 β πΏπ‘ππ
π‘=1 (1 β ππ)π‘β1
πππ¦π‘π β
πΏπππ
πππ π₯π({π¦π‘
π»}π‘=1ππ»
, {π¦π‘πΏ}π‘=1
ππΏ, ππ» , ππΏ)}
+βπΌπ‘π»π¦π‘
π»
ππ»
π‘=1
+ βπΌπ‘πΏπ¦π‘
πΏ
ππΏ
π‘=1
+ ππ»π₯π»({π¦π‘π»}π‘=1
ππ», {π¦π‘
πΏ}π‘=1ππΏ
, ππ», ππΏ)
+ππΏπ₯πΏ({π¦π‘π»}π‘=1
ππ», {π¦π‘
πΏ}π‘=1ππΏ
, ππ», ππΏ).
We assumed that ππΏ > 0 and ππ» > 0. The Kuhn-Tucker conditions with respect to π¦π‘π»
and π¦π‘πΏ are:
πβ
ππ¦π‘π» = βπ {π½0πΏ
π‘(1 β ππ»)π‘β1ππ» + πΏππ»π
ππ»π»
π½0πΏπ‘[π
ππΏπ» (1 β ππΏ)π‘β1ππΏ β π
ππΏπΏ (1 β ππ»)π‘β1ππ»]
πΏππ»(πππ»
π» πππΏπΏ β πππΏ
π» πππ»πΏ )
}
β(1 β π)πΏππΏπ
ππΏπΏ
π½0πΏπ‘[π
ππ»π» (1 β ππΏ)π‘β1ππΏ β π
ππ»πΏ (1 β ππ»)π‘β1ππ»]
πΏππΏ(πππ»
π» πππΏπΏ β πππΏ
π» πππ»πΏ )
+ πΌπ‘π»
+ππ»π½0πΏπ‘[π
ππΏπ» (1βππΏ)
π‘β1ππΏβπ
ππΏπΏ (1βππ»)
π‘β1ππ»]
πΏππ»(π
ππ»π» π
ππΏπΏ βπ
ππΏπ» π
ππ»πΏ )
+ ππΏπ½0πΏπ‘[π
ππ»π» (1βππΏ)
π‘β1ππΏβπ
ππ»πΏ (1βππ»)
π‘β1ππ»]
πΏππΏ(π
ππ»π» π
ππΏπΏ βπ
ππΏπ» π
ππ»πΏ )
;
πβ
ππ¦π‘πΏ = β(1 β π) {π½0πΏ
π‘(1 β ππΏ)π‘β1ππΏ + πΏππΏπ
ππΏπΏ
π½0πΏπ‘[π
ππ»πΏ (1 β ππ»)π‘β1ππ» β π
ππ»π» (1 β ππΏ)π‘β1ππΏ]
πΏππΏ(πππ»
π» πππΏπΏ β πππΏ
π» πππ»πΏ )
}
βππΏππ»π
ππ»π»
π½0πΏπ‘[π
ππΏπΏ (1 β ππ»)π‘β1ππ» β π
ππΏπ» (1 β ππΏ)π‘β1ππΏ]
πΏππ»(πππ»
π» πππΏπΏ β πππΏ
π» πππ»πΏ )
+ πΌπ‘πΏ
+ππ»π½0πΏπ‘[π
ππΏπΏ (1βππ»)
π‘β1ππ»βπ
ππΏπ» (1βππΏ)
π‘β1ππΏ]
πΏππ»(π
ππ»π» π
ππΏπΏ βπ
ππΏπ» π
ππ»πΏ )
+ ππΏπ½0πΏπ‘[π
ππ»πΏ (1βππ»)
π‘β1ππ»βπ
ππ»π» (1βππΏ)
π‘β1ππΏ]
πΏππΏ(π
ππ»π» π
ππΏπΏ βπ
ππΏπ» π
ππ»πΏ )
.
We can rewrite the Kuhn-Tucker conditions above as follows:
(π¨π) πβ
ππ¦π‘π» =
π½0πΏπ‘
π[π
ππ»π» π1(π‘) [ππ
ππΏπ» + (1 β π)π
ππΏπΏ β
ππΏ
πΏππΏ] +ππ»
πΏππ» πππΏπΏ π2(π‘) +
πΌπ‘π»π
π½0πΏπ‘] = 0,
(π¨π) πβ
ππ¦π‘πΏ =
π½0πΏπ‘
π[π
ππΏπΏ π2(π‘) [ππ
ππ»π» + (1 β π)π
ππ»πΏ β
ππ»
πΏππ»] +ππΏ
πΏππΏ πππ»π» π1(π‘) +
πΌπ‘πΏπ
π½0πΏπ‘] = 0,
where
π1(π‘, ππ») =
πππ»πΏ
πππ»π» (1 β ππ»)π‘β1ππ» β (1 β ππΏ)π‘β1ππΏ, and
π2(π‘, ππΏ) =
πππΏπ»
πππΏπΏ (1 β ππΏ)π‘β1ππΏ β (1 β ππ»)π‘β1ππ».
Next, we show that the principal will not commit to reward success in two different
periods for either type (the principal will reward success in at most one period).
Lemma 2. There exists at most one time period 1 β€ π β€ ππΏ such that π¦ππΏ > 0 and at most one
time period 1 β€ π β€ ππ» such that π¦π π» > 0.
41
Proof: Assume to the contrary that there are two distinct periods 1 β€ π,π β€ ππΏ such that π β π
and π¦ππΏ, π¦π
πΏ > 0. Then from the Kuhn-Tucker conditions (π΄1) and (π΄2) it follows that
πππΏπΏ π2(π, ππΏ) [ππ
ππ»π» + (1 β π)π
ππ»πΏ β
ππ»
πΏππ»] +ππΏ
πΏππΏ πππ»π» π1(π, ππ») = 0,
and, in addition, πππΏπΏ π2(π, ππΏ) [ππ
ππ»π» + (1 β π)π
ππ»πΏ β
ππ»
πΏππ»] +ππΏ
πΏππΏ πππ»π» π1(π, ππ») = 0.
Thus, π2(π,ππΏ)
π1(π,ππ»)=
π2(π,ππΏ)
π1(π,ππ»), which can be rewritten as follows:
(πππΏπ» (1 β ππΏ)πβ1ππΏ β π
ππΏπΏ (1 β ππ»)πβ1ππ»)(π
ππ»πΏ (1 β ππ»)πβ1ππ» β π
ππ»π» (1 β ππΏ)πβ1ππΏ)
= (πππΏπ» (1 β ππΏ)πβ1ππΏ β π
ππΏπΏ (1 β ππ»)πβ1ππ»)(π
ππ»πΏ (1 β ππ»)πβ1ππ» β π
ππ»π» (1 β ππΏ)πβ1ππΏ),
π[(1 β ππ»)πβ1(1 β ππΏ)πβ1 β (1 β ππΏ)πβ1(1 β ππ»)πβ1] = 0,
(1 β ππΏ)πβπ(1 β ππ»)πβπ = 1,
(1βππΏ
1βππ»)πβπ
= 1, which implies that π = π and we have a contradiction.
Following similar steps, one could show that there exists at most one time period 1 β€ π β€ ππ»
such that π¦π π» > 0. Q.E.D.
For later use, we prove the following claim:
Claim B.1. ππΏ
πΏππΏ β ππππΏπ» + (1 β π)π
ππΏπΏ and
ππ»
πΏππ» β ππππ»π» + (1 β π)π
ππ»πΏ .
Proof: By contradiction. Suppose ππΏ
πΏππΏ = ππππΏπ» + (1 β π)π
ππΏπΏ . Then combining conditions (π΄1)
and (π΄2) we have
πππΏπΏ π2(π‘, π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] +
ππΏ
πΏππΏ πππ»π» π1(π‘, π
π»)
= (πππΏπ» (1 β ππΏ)π‘β1ππΏ β π
ππΏπΏ (1 β ππ»)π‘β1ππ»)[ππ
ππ»π» + (1 β π)π
ππ»πΏ ]
+(πππ»πΏ (1 β ππ»)π‘β1ππ» β π
ππ»π» (1 β ππΏ)π‘β1ππΏ)[ππ
ππΏπ» + (1 β π)π
ππΏπΏ ]
= βπ((1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ»),
which implies that βπ((1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ») +πΌπ‘
πΏπ
π½0πΏπ‘ = 0 for 1 β€ π‘ β€ ππΏ.
Thus, πΌπ‘
πΏ
π½0πΏπ‘ = (1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ» > 0 for 1 β€ π‘ β€ ππΏ, which leads
to a contradiction since then π₯πΏ = π¦π‘πΏ = 0 for 1 β€ π‘ β€ ππΏ which implies that the low type does
not receive any rent.
Next, assume ππ»
πΏππ» = ππππ»π» + (1 β π)π
ππ»πΏ . Then combining conditions (π΄1) and (π΄2) gives
πππ»π» π1(π‘, π
π»)[ππππΏπ» + (1 β π)π
ππΏπΏ ] +
ππ»
πΏππ» πππΏπΏ π2(π‘, π
πΏ)
= (πππ»πΏ (1 β ππ»)π‘β1ππ» β π
ππ»π» (1 β ππΏ)π‘β1ππΏ)[ππ
ππΏπ» + (1 β π)π
ππΏπΏ ]
+(πππΏπ» (1 β ππΏ)π‘β1ππΏ β π
ππΏπΏ (1 β ππ»)π‘β1ππ»)[ππ
ππ»π» + (1 β π)π
ππ»πΏ ]
= βπ((1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ»),
which implies that βπ((1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ») +πΌπ‘
π»π
π½0πΏπ‘ = 0 for 1 β€ π‘ β€ ππ».
42
Then πΌπ‘
π»
π½0πΏπ‘ = (1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ» > 0 for 1 β€ π‘ β€ ππ», which leads to a
contradiction since then π₯π» = π¦π‘π» = 0 for 1 β€ π‘ β€ ππ» (which implies that the high type does not
receive any rent and we are back in Case A.) Q.E.D.
Now we prove that the high type may be only rewarded for success. Although the proof
is long, the result should appear intuitive: Rewarding high type for failure will only exacerbates
the problem as the low type is always relatively more optimistic in case he lies and
experimentation fails.
Lemma 3: The high type is not rewarded for failure, i.e., π₯π» = 0.
Proof: By contradiction. We consider separately Case (a) ππ» = ππΏ = 0, and Case (b) ππ» = 0 and
ππΏ > 0.
Case (a): Suppose that ππ» = ππΏ = 0, i.e., the (πΌπ πΉππ»π» ) and (πΌπ πΉ
ππΏπΏ ) constraints are not binding.
We can rewrite the Kuhn-Tucker conditions (π΄1) and (π΄2) as follows:
πβ
ππ¦π‘π» =
π½0πΏπ‘
π[π
ππ»π» π1(π‘, π
π»)[ππππΏπ» + (1 β π)π
ππΏπΏ ] +
πΌπ‘π»π
π½0πΏπ‘] = 0 for 1 β€ π‘ β€ ππ»;
πβ
ππ¦π‘πΏ =
π½0πΏπ‘
π[π
ππΏπΏ π2(π‘, π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] +
πΌπ‘πΏπ
π½0πΏπ‘] = 0 for 1 β€ π‘ β€ ππΏ.
Since π1(π‘, ππ») is strictly positive for all π‘ < οΏ½ΜοΏ½π» from π
ππ»π» π1(π‘, π
π»)[ππππΏπ» +
(1 β π)πππΏπΏ ] = β
πΌπ‘π»π
π½0πΏπ‘ it must be that πΌπ‘
π» > 0 for all π‘ < οΏ½ΜοΏ½π» and π < 0. In addition, since
π2(π‘, ππΏ) is strictly negative for π‘ < οΏ½ΜοΏ½πΏ from π
ππΏπΏ π2(π‘, π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] = β
πΌπ‘πΏπ
π½0πΏπ‘ it must
be that that πΌπ‘πΏ > 0 for π‘ < οΏ½ΜοΏ½πΏ and π > 0, which leads to a contradiction37.
Case (b): Suppose that ππ» = 0 and ππΏ > 0, i.e., the (πΌπ πΉππ»π» ) constraint is not binding but
(πΌπ πΉππΏπΏ ) is binding.
We can rewrite the Kuhn-Tucker conditions (π΄1) and (π΄2) as follows:
πβ
ππ¦π‘π» =
π½0πΏπ‘
π[π
ππ»π» π1(π‘, π
π») [ππππΏπ» + (1 β π)π
ππΏπΏ β
ππΏ
πΏππΏ] +πΌπ‘
π»π
π½0πΏπ‘] = 0 for 1 β€ π‘ β€ ππ»;
πβ
ππ¦π‘πΏ =
π½0πΏπ‘
π[π
ππΏπΏ π2(π‘, π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] +
ππΏ
πΏππΏ πππ»π» π1(π‘, π
π») +πΌπ‘
πΏπ
π½0πΏπ‘] = 0 for 1 β€ π‘ β€ ππΏ.
37 If there was a solution with ππ» = ππΏ = 0 then with necessity it would be possible only if ππ» and ππΏ are such that
it holds simultaneously πππ»π» π
ππΏπΏ β π
ππΏπ» π
ππ»πΏ > 0 and π
ππ»π» π
ππΏπΏ β π
ππΏπ» π
ππ»πΏ < 0, since the two conditions are mutually
exclusive the conclusion immediately follows. Recall that we assumed so far that π β 0; we study π = 0 in details
later in Case B.2.
43
If πΌπ π» = 0 for some 1 β€ π β€ ππ» then π
ππ»π» π1(π , π
π») [ππππΏπ» + (1 β π)π
ππΏπΏ β
ππΏ
πΏππΏ] = 0,
which implies that ππΏ
πΏππΏ = ππππΏπ» + (1 β π)π
ππΏπΏ 38. Since we rule out this possibility it immediately
follows that all πΌπ‘π» > 0 for all 1 β€ π‘ β€ ππ» which implies that π¦π‘
π» = 0 for 1 β€ π‘ β€ ππ».
Finally, from πππ»π» π1(π‘, π
π») [ππππΏπ» + (1 β π)π
ππΏπΏ β
ππΏ
πΏππΏ] = βπΌπ‘
π»π
π½0πΏπ‘ we conclude that ππ» β€
οΏ½ΜοΏ½π» and there can be one of two sub-cases:39 (b.1) π > 0 and ππΏ
πΏππΏ > ππππΏπ» + (1 β π)π
ππΏπΏ , or (b.2)
π < 0 and ππΏ
πΏππΏ < ππππΏπ» + (1 β π)π
ππΏπΏ . We consider each sub-case next.
Case (b.1): ππ» β€ οΏ½ΜοΏ½π», π > 0, ππΏ
πΏππΏ > ππππΏπ» + (1 β π)π
ππΏπΏ , ππ» = 0, πΌπ‘
π» > 0 for 1 β€ π‘ β€ ππ».
We know from Lemma 3 that there exists only one time period 1 β€ π β€ ππΏ such that
π¦ππΏ > 0 (πΌπ
πΏ = 0). This implies that
πππΏπΏ π2(π, π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] +
ππΏ
πΏππΏ πππ»π» π1(π, π
π») = 0
and πππΏπΏ π2(π‘, π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] +
ππΏ
πΏππΏ πππ»π» π1(π‘, π
π») = βπΌπ‘
πΏπ
π½0πΏπ‘ < 0 for 1 β€ π‘ β π β€ ππΏ.
Alternatively, π2(π‘, ππΏ) <
π1(π‘,ππ»)
π1(π,ππ»)π2(π, π
πΏ) for 1 β€ π‘ β π β€ ππΏ.
If π1(π, ππ») > 0 (π < οΏ½ΜοΏ½π») then
(πππΏπ» (1 β ππΏ)π‘β1ππΏ β π
ππΏπΏ (1 β ππ»)π‘β1ππ»)(π
ππ»πΏ (1 β ππ»)πβ1ππ» β π
ππ»π» (1 β ππΏ)πβ1ππΏ)
< (πππΏπ» (1 β ππΏ)πβ1ππΏ β π
ππΏπΏ (1 β ππ»)πβ1ππ»)(π
ππ»πΏ (1 β ππ»)π‘β1ππ» β π
ππ»π» (1 β ππΏ)π‘β1ππΏ).
π[(1 β ππ»)π‘β1(1 β ππΏ)πβ1 β (1 β ππΏ)π‘β1(1 β ππ»)πβ1] < 0 for 1 β€ π‘ β π β€ ππΏ.
π [1 β (1βππΏ
1βππ»)π‘βπ
] < 0, which implies that π‘ > π for all 1 β€ π‘ β π β€ ππΏ or, equivalently, π = 1.
If π1(π, ππ») < 0 (π > οΏ½ΜοΏ½π») then the opposite must be true and π‘ < π for all 1 β€ π‘ β π β€ ππΏ or,
equivalently, π = ππΏ.
For π > οΏ½ΜοΏ½π» we have π1(π, ππ») < 0 and it follows that π
ππΏπΏ π2(π‘, π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] +
ππΏ
πΏππΏ πππ»π» π1(π‘, π
π») < βπ((1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ») < 0, which implies that
π¦ππΏ > 0 is only possible for π < οΏ½ΜοΏ½π». Thus, this case is only possible if π = 1.
Case (b.2): ππ» β€ οΏ½ΜοΏ½π», π < 0, ππΏ
πΏππΏ < ππππΏπ» + (1 β π)π
ππΏπΏ , ππ» = 0, πΌπ‘
π» > 0 for 1 β€ π‘ β€ ππ».
As in the previous case, from Lemma 3 it follows that there exists only one time period
1 β€ π β€ ππΏ such that π¦π πΏ > 0 (πΌπ
πΏ = 0). This implies that πππΏπΏ π2(π , π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] +
38 If π = οΏ½ΜοΏ½π», then both π₯π» > 0 and π¦
οΏ½ΜοΏ½π»π» > 0 can be optimal.
39 If ππ» > οΏ½ΜοΏ½π» then there would be a contradiction since π1(π‘, ππ») must be of the same sign for all π‘ β€ ππ» .
44
ππΏ
πΏππΏ πππ»π» π1(π , π
π») = 0 and πππΏπΏ π2(π‘, π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] +
ππΏ
πΏππΏ πππ»π» π1(π‘, π
π») = βπΌπ‘
πΏπ
π½0πΏπ‘ > 0
for 1 β€ π‘ β π β€ ππΏ. Alternatively, π2(π‘, ππΏ) >
π1(π‘,ππ»)
π1(π ,ππ»)π2(π , π
πΏ).
If π1(π , ππ») > 0 (π < οΏ½ΜοΏ½π») then π2(π‘, π
πΏ)π1(π , ππ») > π1(π‘, π
π»)π2(π , ππΏ)
(πππΏπ» (1 β ππΏ)π‘β1ππΏ β π
ππΏπΏ (1 β ππ»)π‘β1ππ»)(π
ππ»πΏ (1 β ππ»)π β1ππ» β π
ππ»π» (1 β ππΏ)π β1ππΏ)
> (πππΏπ» (1 β ππΏ)π β1ππΏ β π
ππΏπΏ (1 β ππ»)π β1ππ»)(π
ππ»πΏ (1 β ππ»)π‘β1ππ» β π
ππ»π» (1 β ππΏ)π‘β1ππΏ).
π [1 β (1βππΏ
1βππ»)
π βπ‘
] < 0, which implies that π‘ > π for all1 β€ π‘ β π β€ ππΏ or, equivalently, π = 1.
If π1(π , ππ») < 0 (π > οΏ½ΜοΏ½π») then the opposite must be true and π‘ < π for all 1 β€ π‘ β π β€
ππΏ or, equivalently, π = ππΏ.
For π‘ > οΏ½ΜοΏ½π» it follows that πππΏπΏ π2(π‘, π
πΏ)[ππππ»π» + (1 β π)π
ππ»πΏ ] +
ππΏ
πΏππΏ πππ»π» π1(π‘, π
π»)
> βπ((1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ») > 0, which implies that π¦π πΏ > 0 is only
possible for π < οΏ½ΜοΏ½π», which is only possible if π = 1.
For both cases we just considered, we have
π₯π» =π½0πΏπ
ππΏπΏ (βπ2(1,ππΏ))π¦1
πΏ
πΏππ»π
+ ππΉ
πππΏπ» (πΏππ»
πππ»πΏ βπ
ππ»+1βπΏππΏ
πππΏπΏ βπ
ππΏ+1)
πΏππ»π
β₯ 0;
π₯πΏ =π½0πΏπ
ππ»π» π1(1,ππ»)π¦1
πΏ
πΏππΏπ
+ ππΉ
πππ»πΏ (πΏππ»
πππ»π» βπ
ππ»+1βπΏππΏ
πππΏπ» βπ
ππΏ+1)
πΏππΏπ
= 0.
Note that Case B.2 is possible only if πΏππ»π
ππ»πΏ βπππ»+1ππΉ β πΏππΏ
πππΏπΏ βπππΏ+1ππΉ > 040. This
fact together with π₯π» β₯ 0 implies that π > 0. Since π1(1, ππ») > 0, π₯πΏ = 0 is possible only if
πΏππ»π
ππ»π» βπππ»+1ππΉ β πΏππΏ
πππΏπ» βπππΏ+1ππΉ < 0. However, πΏππ»
πππ»πΏ βπππ»+1π
π»(πππ»+1π» ) >
πΏππΏπ
ππΏπΏ βπππΏ+1ππΉ implies that πΏππ»
πππ»π» βπππ»+1ππΉ > πΏππΏ π
ππ»π»
πππ»πΏ π
ππΏπΏ βπππΏ+1ππΉ. Note that π
ππ»π» π
ππΏπΏ β
πππΏπ» π
ππ»πΏ > 0 implies
πππ»π»
πππ»πΏ π
ππΏπΏ > π
ππΏπ» , and then πΏππ»
πππ»π» βπππ»+1ππΉ > πΏππΏ
πππΏπ» βπππΏ+1ππΉ, which
implies π₯πΏ > 0 and we have a contradiction. Thus, ππ» > 0 and the high type gets rent only after
success (π₯π» = 0). Q.E.D.
We now prove that the low type is rewarded for failure only if the duration of the
experimentation stage for the low type, ππΏ, is relatively short: ππΏ β€ οΏ½ΜοΏ½πΏ.
Lemma 4. ππΏ = 0 β ππΏ β€ οΏ½ΜοΏ½πΏ, πΌπ‘πΏ > 0 for π‘ β€ ππΏ (it is optimal to set π₯πΏ > 0, π¦π‘
πΏ = 0 for π‘ β€
ππΏ) and πΌπ‘π» > 0 for all π‘ > 1 and πΌ1
π» = 0 (it is optimal to set π₯π» = 0, π¦π‘π» = 0 for all π‘ > 1 and
π¦1π» > 0).
40 Otherwise the (πΌπΆπ»,πΏ) is not binding.
45
Proof: Suppose that ππΏ = 0, i.e., the (πΌπ πΉππΏπΏ ) constraint is not binding. We can rewrite the Kuhn-
Tucker conditions (π΄1) and (π΄2) as follows:
πβ
ππ¦π‘π» =
π½0πΏπ‘
π[π
ππ»π» π1(π‘, π
π»)[ππππΏπ» + (1 β π)π
ππΏπΏ ] +
ππ»
πΏππ» πππΏπΏ π2(π‘, π
πΏ) +πΌπ‘
π»π
π½0πΏπ‘] = 0 for 1 β€ π‘ β€ ππ»;
πβ
ππ¦π‘πΏ =
π½0πΏπ‘
π[π
ππΏπΏ π2(π‘, π
πΏ) [ππππ»π» + (1 β π)π
ππ»πΏ β
ππ»
πΏππ»] +πΌπ‘
πΏπ
π½0πΏπ‘] = 0 for 1 β€ π‘ β€ ππΏ.
If πΌπ πΏ = 0 for some 1 β€ π β€ ππΏ then π
ππΏπΏ π2(π‘, π
πΏ) [ππππ»π» + (1 β π)π
ππ»πΏ β
ππ»
πΏππ»] = 0,
which implies that ππ»
πΏππ» = ππππ»π» + (1 β π)π
ππ»πΏ 41. Since we already rule out this possibility it
immediately follows that πΌπ‘πΏ > 0 for all 1 β€ π‘ β€ ππΏ which implies that π¦π‘
πΏ = 0 for 1 β€ π‘ β€ ππΏ.
Finally, πππΏπΏ π2(π‘, π
πΏ) [ππππ»π» + (1 β π)π
ππ»πΏ β
ππ»
πΏππ»] = βπΌπ‘
πΏπ
π½0πΏπ‘ for 1 β€ π‘ β€ ππΏ and we
conclude that ππΏ β€ οΏ½ΜοΏ½πΏ and there can be one of two sub-cases:42 (a) π > 0 and
ππ»
πΏππ» < ππππ»π» +
(1 β π)πππ»πΏ , or (b) π < 0 and
ππ»
πΏππ» > ππππ»π» + (1 β π)π
ππ»πΏ . We consider each sub-case next.
Case (a): ππΏ β€ οΏ½ΜοΏ½πΏ, π > 0,
ππ»
πΏππ» < ππππ»π» + (1 β π)π
ππ»πΏ , ππΏ = 0, πΌπ‘
πΏ > 0 for 1 β€ π‘ β€ ππΏ.
From Lemma 2, there exists only one time period 1 β€ π β€ ππ» such that π¦ππ» > 0 (πΌπ
π» =
0). This implies that
πππ»π» π1(π, π
π»)[ππππΏπ» + (1 β π)π
ππΏπΏ ] +
ππ»
πΏππ» πππΏπΏ π2(π, π
πΏ) = 0 and
πππ»π» π1(π‘, π
π»)[ππππΏπ» + (1 β π)π
ππΏπΏ ] +
ππ»
πΏππ» πππΏπΏ π2(π‘, π
πΏ) = βπΌπ‘
π»π
π½0πΏπ‘ < 0 for 1 β€ π‘ β π β€ ππ».
Alternatively, π1(π‘, ππ») <
π1(π,ππ»)
π2(π,ππΏ)π2(π‘, π
πΏ) for 1 β€ π‘ β π β€ ππ».
If π2(π, ππΏ) > 0 (π > οΏ½ΜοΏ½
πΏ) then π1(π‘, π
π»)π2(π, ππΏ) < π1(π, π
π»)π2(π‘, ππΏ)
(πππΏπ» (1 β ππΏ)πβ1ππΏ β π
ππΏπΏ (1 β ππ»)πβ1ππ»)(π
ππ»πΏ (1 β ππ»)π‘β1ππ» β π
ππ»π» (1 β ππΏ)π‘β1ππΏ)
< (πππΏπ» (1 β ππΏ)π‘β1ππΏ β π
ππΏπΏ (1 β ππ»)π‘β1ππ»)(π
ππ»πΏ (1 β ππ»)πβ1ππ» β π
ππ»π» (1 β ππΏ)πβ1ππΏ),
π [1 β (1βππΏ
1βππ»)
πβπ‘
] < 0,
which implies that π‘ < π for all 1 β€ π‘ β π β€ ππ» or, equivalently, π = ππ».
If π2(π, ππΏ) < 0 (π < οΏ½ΜοΏ½
πΏ) then the opposite must be true and π‘ > π for all 1 β€ π‘ β π β€ ππ»
or, equivalently, π = 1.
For π‘ > οΏ½ΜοΏ½πΏ it follows that π
ππ»π» π1(π‘, π
π»)[ππππΏπ» + (1 β π)π
ππΏπΏ ] +
ππ»
πΏππ» πππΏπΏ π2(π‘, π
πΏ)
< βπ((1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ») < 0, which implies that π¦ππ» > 0 is only
possible for π < οΏ½ΜοΏ½πΏ
and we have π = 1.
Case (b): ππΏ β€ οΏ½ΜοΏ½πΏ, π < 0,
ππ»
πΏππ» > ππππ»π» + (1 β π)π
ππ»πΏ , ππΏ = 0, πΌπ‘
πΏ > 0 for 1 β€ π‘ β€ ππΏ.
41 If π‘ = οΏ½ΜοΏ½πΏ, then both π₯πΏ > 0 and π¦
οΏ½ΜοΏ½πΏπΏ > 0 can be optimal.
42 If ππΏ > οΏ½ΜοΏ½πΏ, then there would be a contradiction since π2(π‘, ππΏ) must be of the same sign for all π‘ β€ ππΏ ..
46
From Lemma 2, there exists only one time period 1 β€ π β€ ππ» such that π¦ππ» > 0 (πΌπ
π» =
0). This implies that
πππ»π» π1(π, π
π»)[ππππΏπ» + (1 β π)π
ππΏπΏ ] +
ππ»
πΏππ» πππΏπΏ π2(π, π
πΏ) = 0 and
πππ»π» π1(π‘, π
π»)[ππππΏπ» + (1 β π)π
ππΏπΏ ] +
ππ»
πΏππ» πππΏπΏ π2(π‘, π
πΏ) = βπΌπ‘
π»π
π½0πΏπ‘> 0 for 1 β€ π‘ β π β€ ππ».
Alternatively, π1(π‘, ππ») >
π1(π,ππ»)
π2(π,ππΏ)π2(π‘, π
πΏ) for 1 β€ π‘ β π β€ ππ».
If π2(π, ππΏ) > 0 (π > οΏ½ΜοΏ½
πΏ) then π [1 β (
1βππΏ
1βππ»)π‘βπ
] < 0, which implies that π‘ < π for all
1 β€ π‘ β π β€ ππ» or, equivalently, π = ππ».
If π2(π, ππΏ) < 0 (π < οΏ½ΜοΏ½πΏ) then the opposite must be true and π‘ > π for all 1 β€ π‘ β π β€ ππ»
or, equivalently, π = 1.
For π‘ > οΏ½ΜοΏ½πΏ (π2(π‘, ππΏ) > 0) it follows that
πππ»π» π1(π‘, π
π»)[ππππΏπ» + (1 β π)π
ππΏπΏ ] +
ππ»
πΏππ» πππΏπΏ π2(π‘, π
πΏ)
> βπ((1 β π)(1 β ππΏ)π‘β1ππΏ + π(1 β ππ»)π‘β1ππ») > 0,
which implies that π¦ππ» > 0 is only possible for π < οΏ½ΜοΏ½
πΏ and we have π = 1.
If ππΏ < οΏ½ΜοΏ½πΏ, from the binding incentive compatibility constraints, we derive the optimal
payments:
π¦1π» = ππΉ
πππΏπ» (πΏππΏ
πππΏπΏ βπ
ππΏ+1βπΏππ»
πππ»πΏ βπ
ππ»+1)
π½0πΏπππΏπΏ π2(1,ππΏ)
β₯ 0; π₯πΏ =πΏππΏ
ππΏπππΏπ» βπ
ππΏ+1βπΏππ»
ππ»πππ»πΏ βπ
ππ»+1
πΏππΏπππΏπΏ π2(1,ππΏ)
> 0.
Q.E.D.
We now prove that the low type is rewarded for success only if the duration of the
experimentation stage for the low type, ππΏ, is relatively long: ππΏ > οΏ½ΜοΏ½πΏ.
Lemma 5: ππΏ > 0 β ππΏ > οΏ½ΜοΏ½πΏ, πΌπ‘πΏ > 0 for π‘ < ππΏ, πΌ
ππΏπΏ = 0 and πΌπ‘
π» > 0 for π‘ > 1, πΌ1π» = 0 (it is
optimal to set π₯πΏ = 0, π¦π‘πΏ = 0 for π‘ < ππΏ, π¦
ππΏπΏ > 0 and π₯π» = 0, π¦π‘
π» = 0 for π‘ > 1, π¦1π» > 0)
Proof: Suppose that ππΏ > 0, i.e., the (πΌπ πΉππΏ) constraint is binding. We can rewrite the Kuhn-
Tucker conditions (π΄1) and (π΄2) as follows:
[πππ»π» π1(π‘, π
π») [ππππΏπ» + (1 β π)π
ππΏπΏ β
ππΏ
πΏππΏ] +ππ»
πΏππ» πππΏπΏ π2(π‘, π
πΏ) +πΌπ‘
π»π
π½0πΏπ‘] = 0 for 1 β€ π‘ β€ ππ»;
[πππΏπΏ π2(π‘, π
πΏ) [ππππ»π» + (1 β π)π
ππ»πΏ β
ππ»
πΏππ»] +ππΏ
πΏππΏ πππ»π» π1(π‘, π
π») +πΌπ‘
πΏπ
π½0πΏπ‘] = 0 for 1 β€ π‘ β€ ππΏ.
Claim: If both types are rewarded for success, it must be at extreme time periods, i.e. only at the
last or the first period of the experimentation stage.
Proof: Since (See Lemma 2) there exists only one time period 1 β€ π β€ ππΏ such that π¦ππΏ > 0
(πΌππΏ = 0) it follows that
47
πππΏπΏ π2(π, π
πΏ) [ππππ»π» + (1 β π)π
ππ»πΏ β
ππ»
πΏππ»] +ππΏ
πΏππΏ πππ»π» π1(π, π
π») = 0 and
πππΏπΏ π2(π‘, π
πΏ) [ππππ»π» + (1 β π)π
ππ»πΏ β
ππ»
πΏππ»] +ππΏ
πΏππΏ πππ»π» π1(π‘, π
π») = βπΌπ‘
πΏπ
π½0πΏπ‘ for 1 β€ π‘ β π β€ ππΏ.
Alternatively, ππΏ
πΏππΏ [π1(π‘, ππ») β
π2(π‘,ππΏ)π1(π,ππ»)
π2(π,ππΏ)] = β
πΌπ‘πΏπ
π½0πΏπ‘πππ»π» for 1 β€ π‘ β π β€ ππΏ.
Suppose π > 0. Then π1(π‘, ππ») β
π2(π‘,ππΏ)π1(π,ππ»)
π2(π,ππΏ)< 0 for 1 β€ π‘ β π β€ ππΏ.
If π2(π, ππΏ) > 0 (π > οΏ½ΜοΏ½πΏ) then π [1 β (
1βππΏ
1βππ»)πβπ‘
] < 0 which implies 1 β (1βππΏ
1βππ»)πβπ‘
< 0 or,
equivalently, π > π‘ for 1 β€ π‘ β π β€ ππΏ which implies that π = ππΏ > οΏ½ΜοΏ½πΏ.
If π2(π, ππΏ) < 0 (π < οΏ½ΜοΏ½πΏ) then π [1 β (
1βππΏ
1βππ»)πβπ‘
] > 0 which implies 1 β (1βππΏ
1βππ»)πβπ‘
> 0 or,
equivalently, π < π‘ for 1 β€ π‘ β π β€ ππΏ which implies that π = 1.
Suppose π < 0. Then π1(π‘, ππ») β
π2(π‘,ππΏ)π1(π,ππ»)
π2(π,ππΏ)> 0 for 1 β€ π‘ β π β€ ππΏ.
If π2(π, ππΏ) > 0 (π > οΏ½ΜοΏ½πΏ) then π [1 β (
1βππΏ
1βππ»)πβπ‘
] > 0 which implies 1 β (1βππΏ
1βππ»)πβπ‘
< 0 or,
equivalently, π > π‘ for 1 β€ π‘ β π β€ ππΏ which implies that π = ππΏ > οΏ½ΜοΏ½πΏ.
If π2(π, ππΏ) < 0 (π < οΏ½ΜοΏ½πΏ) then π [1 β (
1βππΏ
1βππ»)πβπ‘
] < 0 which implies 1 β (1βππΏ
1βππ»)πβπ‘
> 0 or,
equivalently, π < π‘ for 1 β€ π‘ β π β€ ππΏ which implies that π = 1.
Since (from Lemma 2) there exists only one time period 1 β€ π β€ ππ» such that π¦π π» > 0
(πΌπ π» = 0) it follows that
πππ»π» π1(π , π
π») [ππππΏπ» + (1 β π)π
ππΏπΏ β
ππΏ
πΏππΏ] +ππ»
πΏππ» πππΏπΏ π2(π , π
πΏ) = 0,
πππ»π» π1(π‘, π
π») [ππππΏπ» + (1 β π)π
ππΏπΏ β
ππΏ
πΏππΏ] +ππ»
πΏππ» πππΏπΏ π2(π‘, π
πΏ) = βπΌπ‘
π»π
π½0πΏπ‘ < 0 for 1 β€ π‘ β π β€ ππ».
Alternatively, ππ»
πΏππ» [π2(π‘, ππΏ) β
π2(π ,ππΏ)π1(π‘,ππ»)
π1(π ,ππ»)] = β
πΌπ‘π»π
π½0πΏπ‘πππΏπΏ for 1 β€ π‘ β π β€ ππ».
Suppose π > 0. Then π2(π‘, ππΏ) β
π2(π ,ππΏ)π1(π‘,ππ»)
π1(π ,ππ»)< 0 for 1 β€ π‘ β π β€ ππ».
If π1(π , ππ») > 0 (π < οΏ½ΜοΏ½π») then π [1 β (
1βππΏ
1βππ»)π‘βπ
] < 0 which implies 1 β (1βππΏ
1βππ»)π‘βπ
< 0 or,
equivalently, π‘ > π for 1 β€ π‘ β π β€ ππ» which implies that π = 1.
If π1(π , ππ») < 0 (π > οΏ½ΜοΏ½π») then π [1 β (
1βππΏ
1βππ»)π‘βπ
] > 0 which implies 1 β (1βππΏ
1βππ»)π‘βπ
> 0 or,
equivalently, π‘ < π for 1 β€ π‘ β π β€ ππ» which implies that π = ππ» > οΏ½ΜοΏ½π».
Suppose π < 0. Then π2(π‘, ππΏ) β
π2(π ,ππΏ)π1(π‘,ππ»)
π1(π ,ππ»)> 0 for 1 β€ π‘ β π β€ ππ».
If π1(π , ππ») > 0 (π < οΏ½ΜοΏ½π») then π [1 β (
1βππΏ
1βππ»)π‘βπ
] > 0 which implies 1 β (1βππΏ
1βππ»)π‘βπ
< 0 or,
equivalently, π‘ > π for 1 β€ π‘ β π β€ ππ» which implies that π = 1.
48
If π1(π , ππ») < 0 (π > οΏ½ΜοΏ½π») then π [1 β (
1βππΏ
1βππ»)π‘βπ
] < 0 which implies 1 β (1βππΏ
1βππ»)π‘βπ
> 0 or,
equivalently, π‘ < π for 1 β€ π‘ β π β€ ππ» which implies that π = ππ» > οΏ½ΜοΏ½π». Q.E.D.
The Lagrange multipliers are uniquely determined from (π΄1) and (π΄2) as follows:
ππΏ
πΏππΏ =π[π(1βππ»)
π β1ππ»+(1βπ)(1βππΏ)
π β1ππΏ]π2(π,ππΏ)
πππ»π» [π1(π,ππ»)π2(π ,ππΏ)βπ1(π ,ππ»)π2(π,ππΏ)]
> 0,
ππ»
πΏππ» =π[π(1βππ»)
πβ1ππ»+(1βπ)(1βππΏ)
πβ1ππΏ]π1(π ,ππ»)
πππΏπΏ [π1(π,ππ»)π2(π ,ππΏ)βπ1(π ,ππ»)π2(π,ππΏ)]
> 0,
which also implies that π2(π, ππΏ) and π1(π , π
π») must be of the same sign.
Assume π = ππ» > οΏ½ΜοΏ½π». Then π1(π , ππ») < 0 and the optimal contract involves
π₯π» =π½0πΏππ»
πππΏπΏ π2(ππ»,ππΏ)π¦
ππ»π» βπ½0πΏπ
ππΏπΏ π2(1,ππΏ)π¦1
πΏ
πΏππ»π
+ ππΉ
πππΏπ» (πΏππ»
πππ»πΏ βπ
ππ»+1βπΏππΏ
πππΏπΏ βπ
ππΏ+1)
πΏππ»π
= 0;
π₯πΏ =π½0π
ππ»π» πΏπ1(1,ππ»)π¦1
πΏβπ½0πΏππ»πππ»π» π1(ππ»,ππ»)π¦
ππ»π»
πΏππΏπ
+ ππΉ
πππ»πΏ (πΏππ»
πππ»π» βπ
ππ»+1βπΏππΏ
πππΏπ» βπ
ππΏ+1)
πΏππΏπ
= 0.
Since Case B.2 is possible only if πΏππ»π
ππ»πΏ βπππ»+1π
π»(πππ»+1π» ) β πΏππΏ
πππΏπΏ βπππΏ+1π
πΏ(πππΏ+1πΏ ) > 043,
we have a contradiction since βπ2(1, ππΏ) > 0 and π2(ππ», ππΏ) > 0 imply that π₯π» > 0. As a
result, π = 1. Since π2(π, ππΏ) and π1(π , π
π») must be of the same sign we have π = ππΏ > οΏ½ΜοΏ½πΏ.
If ππΏ > οΏ½ΜοΏ½πΏ, from the binding incentive compatibility constraints, we derive the optimal
payments:
π¦1π» = ππΉ
πΏππ»πππ»πΏ βπ
ππ»+1(1βππ»)
ππΏβ1ππ»βπΏππΏ
πππΏπ» βπ
ππΏ+1(1βππΏ)
ππΏβ1ππΏ
π½0πΏππΏππ»((1βππΏ)ππΏβ1β(1βππ»)ππΏβ1)β₯ 0;
π¦ππΏπΏ = ππΉ
(πΏππ»ππ»π
ππ»πΏ βπ
ππ»+1βπΏππΏ
ππΏπππΏπ» βπ
ππΏ+1)
π½0πΏππΏππΏππ»((1βππΏ)ππΏβ1β(1βππ»)ππΏβ1)
> 0. Q.E.D.
We now prove that οΏ½ΜοΏ½πΏ > ππΏ(<) for high (small) values of πΎ.
Lemma 6. There exists a unique value of πΎβ such that οΏ½ΜοΏ½πΏ > ππΏ (<) for any πΎ > πΎβ (<).
Proof: We formally defined οΏ½ΜοΏ½πΏ as: (1βππ»)
οΏ½ΜοΏ½πΏβ1ππ»
(1βππΏ)οΏ½ΜοΏ½πΏβ1ππΏβ‘
πππΏπ»
πππΏπΏ , for any ππΏ. This explicitly determines οΏ½ΜοΏ½πΏ
as a function of ππΏ:
οΏ½ΜοΏ½πΏ(ππΏ) = 1 + log(1βππ»
1βππΏ)
πππΏπ»
πππΏπΏ
ππ»
ππΏ .
43 Otherwise the (πΌπΆπ»,πΏ) is not binding.
49
We will prove next that there exist a unique value of οΏ½ΜοΏ½πΏ > 0 such that οΏ½ΜοΏ½πΏ > ππΏ (<) for
any ππΏ < οΏ½ΜοΏ½πΏ (>). With that aim, we define the function π(ππΏ) β‘ οΏ½ΜοΏ½πΏ(ππΏ) β ππΏ = 1 +
log(1βππ»
1βππΏ )
πππΏπ»
πππΏπΏ
ππ»
ππΏβ ππΏ
= 1 + log(1βππ»
1βππΏ )
ππ»
ππΏ + log(1βππ»
1βππΏ)
πππΏπ»
πππΏπΏ β ππΏ.
Then ππ
πππΏ=
(π½0(1βππ»)ππΏ
ln(1βππ»))πππΏπΏ βπ
ππΏπ» (π½0(1βππΏ)
ππΏln(1βππΏ))
πππΏπ»
πππΏπΏ ln(
1βππ»
1βππΏ)(πππΏπΏ )
2β 1
=(π½0(1βππ»)
ππΏln(1βππ»))π
ππΏπΏ βπ
ππΏπ» (π½0(1βππΏ)
ππΏln(1βππΏ))
πππΏπΏ π
ππΏπ» ln(
1βππ»
1βππΏ)β 1
=πππΏπΏ ln(1βππ»)(π½0(1βππ»)
ππΏβπ
ππΏπ» )+π
ππΏπ» ln(1βππΏ)(π
ππΏπΏ βπ½0(1βππΏ)
ππΏ)
πππΏπΏ π
ππΏπ» ln(
1βππ»
1βππΏ)=
(1βπ½0)(πππΏπ» ln(1βππΏ)βπ
ππΏπΏ ln(1βππ»))
πππΏπΏ π
ππΏπ» ln(
1βππ»
1βππΏ).
Since πππΏπ» < π
ππΏπΏ and |ln(1 β ππ»)| > |ln(1 β ππΏ)|, π
ππΏπ» ln(1 β ππΏ) β π
ππΏπΏ ln(1 β ππ») > 0
and, as a result, ππ
πππΏ < 0. Since π(0) > 0 there is only one point where π(οΏ½ΜοΏ½πΏ) = 0. Thus, there
exist a unique value of οΏ½ΜοΏ½πΏ such that οΏ½ΜοΏ½πΏ > ππΏ (<) for any ππΏ < οΏ½ΜοΏ½πΏ (>). Furthermore, οΏ½ΜοΏ½πΏ >0.
Finally, since the optimal ππΏ is strictly decreasing in πΎ, and π(β) is independent of πΎ, it follows
that there exists a unique value of πΎβ such that οΏ½ΜοΏ½πΏ > ππΏ (<) for any πΎ > πΎβ (<). Q.E.D.
Finally, we consider the case when the likelihood ratio of reaching the last period of the
experimentation stage is the same for both types, π
ππ»π»
πππ»πΏ =
πππΏπ»
πππΏπΏ .
Case B.2: knife-edge case when π = π·π»π―π― π·
π»π³π³ β π·
π»π³π― π·
π»π―π³ = π.
Define a οΏ½ΜοΏ½π» similarly to οΏ½ΜοΏ½πΏ, as done in Lemma 1, by πππ»πΏ
πππ»π» =
(1βππ»)οΏ½ΜοΏ½π»β1
ππ»
(1βππΏ)οΏ½ΜοΏ½π»β1ππΏ.
Claim B.2.1. πππ»π» π
ππΏπΏ β π
ππΏπ» π
ππ»πΏ = 0 βΊ οΏ½ΜοΏ½π» = οΏ½ΜοΏ½πΏ for any ππ», ππΏ .
Proof: Recall that οΏ½ΜοΏ½πΏ was determined by πππΏπ»
πππΏπΏ =
(1βππΏ)οΏ½ΜοΏ½πΏβ1
ππΏ
(1βππ»)οΏ½ΜοΏ½πΏβ1ππ». Next, π
ππ»π» π
ππΏπΏ β π
ππΏπ» π
ππ»πΏ = 0 βΊ
πππ»πΏ
πππ»π» =
πππΏπΏ
πππΏπ» , which immediately implies that
50
πππ»π» π
ππΏπΏ β π
ππΏπ» π
ππ»πΏ = 0 βΊ
(1βππ»)οΏ½ΜοΏ½π»β1
ππ»
(1βππΏ)οΏ½ΜοΏ½π»β1ππΏ=
(1βππ»)οΏ½ΜοΏ½πΏβ1
ππ»
(1βππΏ)οΏ½ΜοΏ½πΏβ1ππΏ;
(1βππ»
1βππΏ)οΏ½ΜοΏ½π»βοΏ½ΜοΏ½πΏ
= 1 or, equivalently οΏ½ΜοΏ½π» = οΏ½ΜοΏ½πΏ. Q.E.D.
We prove now that the principal will choose ππΏ and ππ» optimally such that π = 0 only if
ππΏ > οΏ½ΜοΏ½πΏ.
Lemma B.2.1. πππ»π» π
ππΏπΏ β π
ππΏπ» π
ππ»πΏ = 0 β ππΏ > οΏ½ΜοΏ½πΏ, ππ» > 0, ππΏ > 0, πΌπ‘
π» > 0 for π‘ > 1 and πΌπ‘πΏ >
0 for π‘ < ππΏ (it is optimal to set π₯πΏ = π₯π» = 0, π¦π‘π» = 0 for π‘ > 1 and π¦π‘
πΏ = 0 for π‘ < ππΏ).
Proof: Labeling {πΌπ‘π»}π‘=1
ππ», {πΌπ‘
πΏ}π‘=1ππΏ
, πΌπ», πΌπΏ, ππ» and ππΏ as the Lagrange multipliers of the
constraints associated with (πΌπ ππ‘π»), (πΌπ ππ‘
πΏ), (πΌπΆπ»,πΏ), (πΌπΆπΏ,π»), (πΌπ πΉππ») and (πΌπ πΉππΏ)
respectively, we can rewrite the Kuhn-Tucker conditions as follows:
πβ
ππ₯π» = βππΏππ»π
ππ»π» + ππ» = 0, which implies that ππ» > 0 and, as a result, π₯π» = 0;
πβ
ππ₯πΏ = β(1 β π)πΏππΏπ
ππΏπΏ + ππΏ = 0, which implies that ππΏ > 0 and, as a result, π₯πΏ = 0;
πβ
ππ¦π‘π» = βπ(1 β ππ»)π‘β1ππ» + πΌπ»π
ππΏπΏ π2(π‘, π
πΏ) β πΌπΏπππ»π» π1(π‘, π
π») +πΌπ‘
π»
πΏπ‘π½0= 0 for 1 β€ π‘ β€ ππ»;
πβ
ππ¦π‘πΏ = β(1 β π)(1 β ππΏ)π‘β1ππΏ β πΌπ»π
ππΏπΏ π2(π‘, π
πΏ) + πΌπΏπππ»π» π1(π‘, π
π») +πΌπ‘
πΏ
πΏπ‘π½0= 0 for 1 β€ π‘ β€ ππΏ.
Similar results to those from Lemma 2 hold in this case as well.
Lemma B.2.2. There exists at most one time period 1 β€ π β€ ππΏ such that π¦ππΏ > 0 and at most
one time period 1 β€ π β€ ππ» such that π¦π π» > 0.
Proof: Assume to the contrary that there are two distinct periods 1 β€ π,π β€ ππ» such that π β
π and π¦ππ», π¦π
π» > 0. Then from the Kuhn-Tucker conditions it follows that
βπ(1 β ππ»)πβ1ππ» + πΌπ»πππΏπΏ π2(π, ππΏ) β πΌπΏπ
ππ»π» π1(π, ππ») = 0,
and, in addition, βπ(1 β ππ»)πβ1ππ» + πΌπ»πππΏπΏ π2(π, ππΏ) β πΌπΏπ
ππ»π» π1(π, ππ») = 0.
Combining the two equations together, πΌπΏπππ»π» (π1(π, ππ»)π2(π, ππΏ) β π1(π, ππ»)π2(π, ππΏ))
+πππ»((1 β ππ»)πβ1π2(π, ππΏ) β (1 β ππ»)πβ1π2(π, ππΏ)) = 0, which can be rewritten as
follows44:
πππΏπ»
πππΏπΏ ππΏ((1 β ππ»)πβ1(1 β ππΏ)πβ1 β (1 β ππ»)πβ1(1 β ππΏ)πβ1) = 0,
44 After some algebra, one could verify that π1(π, ππ»)π2(π, ππΏ) β π1(π, ππ»)π2(π, ππΏ)
= πππ»ππΏ
πππ»π» π
ππΏπΏ [(1 β ππ»)πβ1(1 β ππΏ)πβ1 β (1 β ππΏ)πβ1(1 β ππ»)πβ1].
51
(1βππ»
1βππΏ)πβπ
= 1, which implies that π = π and we have a contradiction.
In the same way, there exists at most one time period 1 β€ π β€ ππΏ such that π¦ππΏ > 0. Q.E.D
Lemma B.2.3: Both types may be rewarded for success only at extreme time periods, i.e. only at
the last or the first period of the experimentation stage.
Proof: Since (See Lemma B.2.2) there exists only one time period 1 β€ π β€ ππ» such that π¦π π» > 0
(πΌπ π» = 0) it follows that βπ(1 β ππ»)π β1ππ» + πΌπ»π
ππΏπΏ π2(π , π
πΏ) β πΌπΏπππ»π» π1(π , π
π») = 0 and
βπ(1 β ππ»)π‘β1ππ» + πΌπ»πππΏπΏ π2(π‘, π
πΏ) β πΌπΏπππ»π» π1(π‘, π
π») = βπΌπ‘
π»
πΏπ‘π½0 for 1 β€ π‘ β π β€ ππ».
Combining the equations together, πΌπΏπππ»π» (π1(π , π
π»)π2(π‘, ππΏ) β π1(π‘, π
π»)π2(π , ππΏ))
+πππ»((1 β ππ»)π β1π2(π‘, ππΏ) β (1 β ππ»)π‘β1π2(π , π
πΏ)) = βπΌπ‘
π»
πΏπ‘π½0π2(π , π
πΏ), which can be rewritten
as follows:
πππΏπ» (1βππ»)
π‘β1(1βππΏ)
π‘β1
πππΏπΏ ((1 β ππ»)π βπ‘ β (1 β ππΏ)π βπ‘) = β
πΌπ‘π»
πΏπ‘π½0π2(π , π
πΏ) for 1 β€ π‘ β π β€ ππ».
If π2(π , ππΏ) > 0 (π > οΏ½ΜοΏ½π») then (1 β ππ»)π βπ‘ β (1 β ππΏ)π βπ‘ < 0, which implies that π‘ < π
for 1 β€ π‘ β π β€ ππ» and it must be that π = ππ» > οΏ½ΜοΏ½π». If π2(π , ππΏ) < 0 (π < οΏ½ΜοΏ½π») then
(1 β ππ»)π βπ‘ β (1 β ππΏ)π βπ‘ < 0, which implies that π‘ > π for 1 β€ π‘ β π β€ ππ» and it must be
that π = 1. In a similar way, for 1 β€ π β€ ππΏ such that π¦ππΏ > 0 it must be that either π = 1 or π =
ππΏ > οΏ½ΜοΏ½πΏ. Q.E.D.
Finally, from πβ
ππ¦1π» = βπππ» + πΌπ»π
ππΏπΏ π2(1, ππΏ) β πΌπΏπ
ππ»π» π1(1, ππ») = 0 when π¦1
π» > 0 and
πβ
ππ¦1πΏ = β(1 β π)ππΏ β πΌπ»π
ππΏπΏ π2(1, ππΏ) + πΌπΏπ
ππ»π» π1(1, ππ») = 0 when π¦1
πΏ > 0 we have a
contradiction. As a result, π¦1π» > 0 implies π¦
ππΏπΏ > 0 with ππΏ > οΏ½ΜοΏ½πΏ. Q.E.D.
II. Optimal length of experimentation
(Proposition 2)
Since ππΏ and ππ» affect the information rents, ππΏ and ππ», there will be a distortion in the
duration of the experimentation stage for both types:
πβ
πππ=
π(πΈπ Ξ©π(ππ)βπ ππ»β(1βπ) ππΏ)
πππ= 0.
The exact values of ππ» and ππΏ depend on whether we are in Case A ((πΌπΆπ»,πΏ) is slack) or Case B
(both (πΌπΆπΏ,π») and (πΌπΆπ»,πΏ) are binding.) In Case A, by Claim A.1, the low typeβs rent
πΏππ»π
ππ»πΏ βπππ»+1ππΉ is not affected by ππΏ. Therefore, the F.O.C. with respect to ππΏ is identical to
that under first best: πβ
πππΏ =ππΈπ Ξ©π(ππ)
πππΏ = 0, or, equivalently, πππ΅πΏ = ππΉπ΅
πΏ when (πΌπΆπ»,πΏ) is not
52
binding. However, since the low typeβs information rent depends on ππ», there will be a
distortion in the duration of the experimentation stage for the high type:
πβ
πππ» =π(πΈπ Ξ©π(ππ)β(1βπ) πΏππ»
πππ»πΏ βπ
ππ»+1ππΉ)
πππ» = 0.
Since the informational rent of the low-type agent, πΏππ»π
ππ»πΏ βπππ»+1ππΉ, is non-monotonic
in ππ», it is possible, in general, to have πππ΅π» > ππΉπ΅
π» or πππ΅π» < ππΉπ΅
π» .
In Case B, the exact values of ππ» and ππΏ depend on whether ππΏ < οΏ½ΜοΏ½πΏ (Lemma 4) or
ππΏ > οΏ½ΜοΏ½πΏ (Lemma 5), but in each case ππΏ > 0 and ππ» β₯ 0. It is possible, in general, to have
πππ΅π» > ππΉπ΅
π» or πππ΅π» < ππΉπ΅
π» and πππ΅πΏ > ππΉπ΅
πΏ or πππ΅πΏ < ππΉπ΅
πΏ .
We provide sufficient conditions for over and under experimentation next.
Sufficient conditions for over/under experimentation
Case A: If (πΌπΆπ»,πΏ) is not binding, ππΉπ΅π» < πππ΅
π» if ππ» > ππ»
and ππΉπ΅π» > πππ΅
π» if ππ» < ππ».
Proof: If (πΌπΆπ»,πΏ) is not binding, the rent to the low type is ππΏ = πΏππ»π
ππ»πΏ βπππ»+1ππΉ. Given that
βππ‘ = (π β π)(π½π‘πΏ β π½π‘
π»), the rent becomes ππΏ = πΏππ»π
ππ»πΏ (π½
ππ»+1πΏ β π½
ππ»+1π» )(π β π)ππΉ.
Recall the function ν(π‘) β‘ πΏπ‘ππ‘πΏ(π½π‘+1
πΏ β π½π‘+1π» ) from the proof for Sufficient Conditions
for (πΌπΆπ»,πΏ) to be binding. Rent to the low type can be rewritten as ππΏ = ν(ππ»)(π β π)ππΉ. In
proving sufficient condition for (πΌπΆπ»,πΏ) to be binding we proved that for any π‘, ππ(π‘)
ππ‘< 0 for
ππ» > ππ»
, and ππ(π‘)
ππ‘> 0 for ππ» < ππ». Therefore, if ππ» > π
π»(< ππ») then
πππΏ
πππ» < 0 (> 0) and we
have over (under) experimentation. Q.E.D.
Case B: If both (πΌπΆπ»,πΏ) and (πΌπΆπΏ,π») are binding, ππΉπ΅π» < πππ΅
π» if ππ» > ππ»
and ππΉπ΅π» > πππ΅
π» if ππ» <
ππ», and ππΉπ΅πΏ < πππ΅
πΏ if ππ» < ππ».
Proof: If both (πΌπΆπ»,πΏ) and (πΌπΆπΏ,π») are binding, rent paid to the low and high type is
ππΏ = ππΉ
(1βππΏ)ππΏβ1
(πΏππ»ππ»π
ππ»πΏ βπ
ππ»+1βπΏππΏ
ππΏπππΏπ» βπ
ππΏ+1)
ππ»((1βππΏ)ππΏβ1β(1βππ»)ππΏβ1) and
ππ» = ππΉ
πΏππ»πππ»πΏ βπ
ππ»+1(1βππ»)
ππΏβ1ππ»βπΏππΏ
βπππΏ+1
(1βππΏ)ππΏβ1
ππΏ
ππΏ((1βππΏ)ππΏβ1β(1βππ»)ππΏβ1), respectively.
Given that βππ‘ = (π β π)(π½π‘πΏ β π½π‘
π»), we have
53
ππΏ = ππΉ(π β π)(πΏππ»
ππ»πππ»πΏ (π½
ππ»+1πΏ βπ½
ππ»+1π» )βπΏππΏ
ππΏπππΏπ» (π½
ππΏ+1πΏ βπ½
ππΏ+1π» ))
ππ»(1β(1βππ»
1βππΏ )ππΏβ1
)
and ππ» = ππΉ(π β π)πΏππ»
πππ»πΏ (π½
ππ»+1πΏ βπ½
ππ»+1π» )(
1βππ»
1βππΏ )ππΏβ1
ππ»βπΏππΏπππΏπ» βπ
ππΏ+1ππΏ
ππΏ(1β(1βππ»
1βππΏ )ππΏβ1
)
.
Recall function ν(π‘) β‘ πΏπ‘ππ‘πΏ(π½π‘+1
πΏ β π½π‘+1π» ). Then ππΏ and ππ» can be rewritten as
ππΏ = ππΉ(π β π)
(ππ»π(ππ»)βππΏπ(ππΏ)π
ππΏπ»
πππΏπΏ )
ππ»(1β(1βππ»
1βππΏ )ππΏβ1
)
, and
ππ» = ππΉ(π β π)
π(ππ»)(1βππ»
1βππΏ )ππΏβ1
ππ»βπ(ππΏ)π
ππΏπ» ππΏ
πππΏπΏ
ππΏ(1β(1βππ»
1βππΏ )ππΏβ1
)
.
Consider first ππ». Both ππΏ and ππ» are increasing in ν(ππ»). In proving sufficient
condition for (πΌπΆπ»,πΏ) to be binding we proved that ππ(ππ»)
πππ» < 0 for ππ» > ππ»
, and ππ(ππ»)
πππ» < 0 for
ππ» < ππ». Therefore, it is optimal to let the high type over experiment (ππΉπ΅π» < πππ΅
π» ) if ππ» > ππ»
and under experiment (ππΉπ΅π» > πππ΅
π» ) if ππ» < ππ».
Consider now ππΏ. Since (1βππ»
1βππΏ)ππΏβ1
is decreasing in ππΏ, both ππΏ and ππ» will be
decreasing in ππΏ if π(ππΏ)π
ππΏπ»
πππΏπΏ is increasing in ππΏ. We next prove that
π(ππΏ)πππΏπ»
πππΏπΏ is increasing in ππΏ
for small values of ππ».
To simplify, π(ππΏ)π
ππΏπ»
πππΏπΏ =
πΏππΏπππΏπΏ (π½
ππΏ+1πΏ βπ½
ππΏ+1π» )π
ππΏπ»
πππΏπΏ = πΏππΏ
π½0(1βπ½0)((1βππΏ)ππΏ
β(1βππ»)ππΏ
)
πππΏπΏ .
π
[
πΏππΏ((1βππΏ)
ππΏ
β(1βππ»)ππΏ
)
πππΏπΏ
]
πππΏ =
πΏππΏ((1 β ππΏ)ππΏ
ππ(1 β ππΏ) β (1 β ππ»)ππΏππ(1 β ππ»))π
ππΏπΏ
β π½0(1 β ππΏ)ππΏ
ππ(1 β ππΏ)((1 β ππΏ)ππΏβ (1 β ππ»)ππΏ
)
(πππΏπΏ )
2
54
+πΏππΏππ πΏ
πππΏπΏ ((1 β ππΏ)ππΏ
β (1 β ππ»)ππΏ)
(πππΏπΏ )
2
= πΏππΏ ππ(1βππΏ)(1βππΏ)ππΏ
πππΏπ» β(1βππ»)
ππΏππ(1βππ»)π
ππΏπΏ
(πππΏπΏ )
2 + πΏππΏππ πΏ
πππΏπΏ ((1βππΏ)
ππΏβ(1βππ»)
ππΏ)
(πππΏπΏ )
2
= πΏππΏ (1βππΏ)ππΏ
[πππΏπ» ππ(1βππΏ)+π
ππΏπΏ ππ πΏ]β(1βππ»)
ππΏπππΏπΏ ππ[πΏ(1βππ»)]
(πππΏπΏ )
2 .
πΏππΏ((1βππΏ)
ππΏβ(1βππ»)
ππΏ)
πππΏπΏ increases with ππΏ if and only if π (ππ») > 0,
π (ππ») = (1 β ππΏ)ππΏ[π
ππΏπ» ππ(1 β ππΏ) + π
ππΏπΏ ππ πΏ] β (1 β ππ»)ππΏ
πππΏπΏ ππ[πΏ(1 β ππ»)].
We prove next that π (ππ») > 0 for any π‘ if ππ» is sufficiently low.
Since both (1 β ππ»)ππΏπ
ππΏπΏ and (1 β ππΏ)ππΏ
[πππΏπ» ππ(1 β ππΏ) + π
ππΏπΏ ππ πΏ] are increasing in π‘, we
have π (ππ») > (1 β ππΏ)[π1π»ππ(1 β ππΏ) + π1
πΏ ππ πΏ] β (1 β ππ»)ππππΏ ππ[πΏ(1 β ππ»)],
where π = max{ππ», ππΏ}.
Next, (1 β ππΏ)[π1π»ππ(1 β ππΏ) + π1
πΏ ππ πΏ] > (1 β ππΏ)π1πΏ ππ[πΏ(1 β ππΏ)] and
ππ[πΏ(1βππ»)]
ππ[πΏ(1βππΏ)]> 1. Therefore,
(1βππΏ)π1πΏ
(1βππ»)ππππΏ< 1 βΉ π (ππ») > 0.
Rearranging the above, we have π (ππ») > 0 for any π‘ if ππ» < 1 β ((1βππΏ)π1
πΏ
πππΏ )
1
π.
Denote
ππ» β‘ 1 β ((1 β ππΏ)π1
πΏ
πππΏ )
1
π.
Therefore, for any π‘, π (ππ») > 0 if ππ» < ππ» and, consequently, both ππΏ and ππ» are
decreasing in ππΏ. Consequently, it is optimal to let the low type over experiment (ππΉπ΅πΏ < πππ΅
πΏ ) if
ππ» < ππ». Q.E.D.