Download - Contract Manufacturing Relationships - Home | OpenScholar ... · Contract Manufacturing Relationships Can Urgun December 2017 Abstract ... droni, Niko Matouschek, Bruno Strulovici,

Contract Manufacturing Relationships

Can Urgun∗

December 2017

Abstract

Contract manufacturing enables intellectual property holders to

enjoy scale economies, reduce labor costs and free up capital. However,

in many scenarios contract manufacturing is a double-edged sword,

rife with entrenchments, threats of predation or hold up. I explore

contract manufacturing relationships in two scenarios: First, a setup

with a single agent that can learn, forget and threaten the principal

with predation. Second, a setup with multiple agents that have ca-

pacity constraints and can threaten the principal with hold up. The

analysis requires a novel methodological approach: in both settings

the principal optimal contract is characterized by a simple index rule.

JEL-Classification: D21, D86, L14, L24

∗Princeton University, e-mail: [email protected]. I am grateful to Alvaro San-

droni, Niko Matouschek, Bruno Strulovici, Dan Barron, Mike Powell, Ehud Kalai, Jin

Li, Elliot Lipnowski and Nick Buchholz for helpful comments and suggestions. I am also

thankful to seminar participants at Northwestern, Chicago, Princeton, UCSD, London

Business School, Indiana, UIUC and Rochester. The usual disclaimer applies.

1

1 Introduction

Firms often maintain relationships with trading partners to outsource pro-

duction. The main form of outsourcing in many industries is a contract man-

ufacturing agreement. However, these agreements are seldom complete and

are often backed with informal promises. Outsourcing the entire production

to a contract manufacturer (CM) allows a firm to concentrate on enhancing

products by focusing on R&D, marketing and design, while enjoying the cost

advantages brought in by the expertise of contract manufacturers.

In a contract manufacturing agreement, a client engages a contractor to

manufacture a product in exchange for a negotiated fee.12 Contract manu-

facturing agreements are used in a broad range of industries including elec-

tronics, pharmaceutical, automotive, food and beverages (Tully 1994). For

example, global contract manufacturing had an expected volume of $515 bil-

lion in electronics industry and $40.7 billion in pharmaceutical industry in

2015. It is expected to grow even more with projected annual growth rates

of 8.6% and 6.4% for the respective industries (Rajaram 2015, Pandya and

Shah 2013). These two industries also display the most distinct characteris-

tics. If a product is novel and complex as in electronics, a client will gravitate

towards a single source contract manufacturing agreement as switching be-

tween CMs becomes costly. On the other hand, as products commodify as

in production facilities for pharmaceuticals, clients gain a wide choice of in-

terchangeable CMs (Arrunada and Vazquez 2006).

Despite the cost advantages, contract manufacturing entails some in-

escapable hazards as evidenced by the rocky relationships between, firms

such as Cisco, Dell, Compaq, Gateway, IBM, Lucent, Hewlett-Packard,

Abbott Laboratories, Pfizer, Merck and their contract manufacturers (see,

1Instead of outsourcing parts of a product, in contract manufacturing the entire productis outsourced.

2For ease of referral, I address the principal/client as female and the agents/contractorsas male.

2

Casadesus-Masanell, Yoffie, and Mattu (2002), Lakenan, Boyd, and Frey

(2001), Arrunada and Vazquez (2006), Langer (2015a)). In many con-

tract manufacturing agreements parties soon find themselves immersed in

a “melodrama replete with promiscuity, infidelity, and betrayal” (Arrunada

and Vazquez 2006). On one hand, if the product is novel and sole sourced,

then the CM is in a prime position to compete or even overtake the client.

“Adding insult to injury, if the client had not given its business to the

traitorous contract manufacturer, the CM’s knowledge might have remained

sufficiently meager to prevent it from entering its patron’s market”.(Arrunada

and Vazquez 2006). Indeed, Intel, Cisco Systems and Alcatel retain some

plants despite outsourcing most of their production. These firms juggle their

production between the CM and their own inefficient, in-house production ca-

pabilities in order to curb the learning and efficiency of the CM. On the other

hand, if the product is commodified and in a mature market then another

problem arises. In the case of commodified products such as pharmaceuti-

cals, CM’s must be able to meet a client’s needs for flexible scheduling and

capacity (Langer 2015b). However, as McCoy (2003) notes, when a client ap-

proaches a contractor she may discover that he is entrenched with little flex

capacity. The relationships between a client and a potential contractor be-

come necessarily intertwined despite their bilateral nature as the client might

just want to cycle through CMs to avoid such entrenchments. Contractors

manage these diverse relationships by trying to keep their facilities running

at 70 − 80% capacity and they meet extra demands by working overtime

(Tully 1994). Thus, a contractor may have to over-utilize his assets, which

increases his contracting costs.

The main difference of contract manufacturing problems from other rela-

tional contracting problems is that utilizing a contract manufacturer has an

impact on future costs. I explore two main settings that have this feature:

sole sourcing and multiple sourcing. In the sole sourcing model, utilization

leads to learning by doing. The sole sourcing model is motivated by the elec-

3

tronics industry, where firms retain some in-house production capacity to

limit the efficiency gains of their contract manufacturer in order to prevent

the contract manufacturer from competing against them later. If the agent

is utilized, he gets more efficient and hence becomes a potential threat to

the principal, if the principal opts for in-house production the agent slowly

becomes less efficient as organizational forgetting takes place. In the multiple

sourcing model, utilization leads to entrenchment, hence a loss of efficiency.

Multiple sourcing model is motivated by the pharmaceutical industry. The

product is more commodified and an intellectual property holder wishes to

produce a drug via a facility that satisfies some requirements (e.g. FDA

regulations). The agents are sometimes entrenched with other obligations

and repeated utilization forces the agents to over-use their assets, increasing

their costs. If an agent who has other entrenchments is not utilized by the

principal, then he has an opportunity to catch-up to his other obligations,

rest, and decrease future costs.

Despite the complexities in these economies, the principal optimal uti-

lization schedule is achieved by a simple index rule and an accompanying

payment rule. The index of an agent depends only on the current cost of

that agent and captures the shadow value of utilizing the agent whenever his

cost is equal to the current level. The payments play a dual role of satisfying

incentive constraints as well as being embedded into the index.

The simplicity of this policy reveals striking characteristics of the optimal

contract. When making a utilization decision, the principal could potentially

rely on many factors. These include the entire history of relationships, all

the informal promises she made, or even calendar time. However, the index

does not depend on these factors, it simply depends on the current cost of

an agent and the mechanics i.e. the underlying law of motion governing the

cost.

The indices do not depend on calendar time, which is particularly signif-

icant in the sole sourcing model. Hence, the length of the relationship does

4

not effect the decisions of the principal. The level of efficiency, no matter

how long it took to reach, is the deciding factor.

The indices do not depend on history which has important ramifications

in the multiple sourcing model. In particular, the indices do not depend

on whether an agent has ever worked for the principal or not. This rules

out the insider-outsider phenomena, where preferential treatment is given to

agents with whom the principal has worked before. Loyalty in the form of

keeping promises occurs in the optimal contract, but loyalty in the form of

preferential treatment does not.

The analysis of utilization impacting future costs requires novel tech-

niques. In the existing literature, dependence of future costs on utilization

is limited as it turns the game into a reducible stochastic game. Unlike ir-

reducible stochastic games, there are no results such as Abreu, Pearce, and

Stacchetti (1990) for providing a recursive characterization the payoff space.3

Thus promise keeping constraints that are endogenous parts of the equilibria

are hard to tackle. Hence, a new methodology is required for this paper. I

show that a principal optimal contract in this game can be identified by index

policies and the principal’s problem is a relaxed version of a non-standard

bandit problem where I build upon the Whittle (1988) index. Despite the

various incentive frictions and complex relationships, the indices in this pa-

per share some of the characteristics of the Gittins (1979) index, which was

celebrated for its surprising simplicity.

The index policy approach has several advantages. First, although I fo-

cus on economies where costs are the only phenomena that are dependent on

utilization, the methodology is broadly applicable to other scenarios such as

persistent capital investments, liquidity constraints that are tied to perfor-

mance, or reputation build-up in different markets. In fact, the methodology

can be further generalized by using multi-mode bandit indices to capture dif-

3A stochastic game is irreducible if no player has a strategy that can reduce the payoffrelevant stochastic process.

5

ferent effort levels. Second, the optimal policy will always be time consistent.

Finally, the index policy is fully described and the indices are identified in

closed form. For example in the learning by doing model the closed form

enables comparative statics about the speed of learning and profits.

This paper is organized as follows. Section 1 continues with a short liter-

ature review followed by a brief outline of the model for general applications.

Section 2 2 investigates the sole sourcing model, inspired by electronics con-

tract manufacturing. Section 3 investigates the multiple sourcing model,

inspired by pharmaceutical contract manufacturing. Finally, section 4 con-

cludes. All proofs are relegated to the appendix.

1.1 Related Literature

This paper builds on a large number of relational contracting papers. This is

a vast literature that I do not survey here. Malcomson et al. (2010) provides

an excellent survey.

The sole-sourced contract manufacturing model explores learning-by-doing

and organizational forgetting in a dynamic setting. A comprehensive survey

for learning by doing is Thompson (2010), which also notes that dynamic

models quickly become intractable, hence much of the existing analysis has

been made in restricted settings. An important contribution that notes the

importance of learning in technology markets is Lewis and Yildirim (2002)

without organizational forgetting. However, in addition to learning organi-

zational forgetting is also an important phenomena in many industries as

evidenced by Argote, Beckman, and Epple (1990), Darr, Argote, and Ep-

ple (1995), Shafer, Nembhard, and Uzumeri (2001), Thompson (2007) and

Benkard (2000). Due to the technical challenges involved, as noted in Be-

sanko, Doraszelski, Kryukov, and Satterthwaite (2010) organizational for-

getting has been largely ignored in the theoretical literature. The closest

models that look at contract manufacturing in particular are Plambeck and

Taylor (2005) and Gray, Tomlin, and Roth (2009) as they explore learning by

6

doing but the analysis is limited to two periods and without organizational

forgetting. The sole-sourced model delivers the principal-optimal contract in

a fully dynamic model where both learning and organizational forgetting are

present.

The multiple-sourced contract manufacturing model explores dynamic

work allocation across multiple agents. The additional component is that the

principal has some control over how entrenched each agent is via utilization.

The classic reference to the entrenchment model is Levin (2002), however that

is a complete contracting setup thus the work allocation is not fully dynamic.

The effect of commitment is pretty severe in multiple suppliers setups and has

been highlighted in Cisternas and Figueroa (2015). The references to fully

dynamic work allocation are Board (2011), and Andrews and Barron (2013)

since they feature multiple agents in a relational contracting setting. The

main difference is that the control via utilization is absent in those settings

and the optimal contract turns out to be history dependent in both. The

multiple-sourced model delivers the principal-optimal contract in a dynamic

work allocation setup highlighting the effect of control and its effect on history

independence while simultaneously introducing bandits as a potential and

tractable tool to analyze such settings.

The critical problem for the principal in both settings is to find an opti-

mal utilization schedule even though there is no inherent recursive structure

in the game. Bandit problems are also scheduling problems which need not

recur thus I build upon techniques in the bandit literature. From a method-

ological perspective, approaching the principal’s problem as a bandit problem

is different from canonical papers in relational incentive contracting, such as

Levin (2003), Baker, Gibbons, and Murphy (2002), and Malcomson et al.

(2010). Most of the literature utilizes the inherent recursion in repeated

games, which provides a recursive characterization of the payoff space. The

main advantage of a bandit approach is that it allows for an easily imple-

mentable policy when there is no inherent recursion while still delivering the

7

principal optimal behavior.

Since the paper utilizes bandits, a technically close literature is the exper-

imentation literature. This is a vast literature that I do not survey here, but

some notable contributions are Bolton and Harris (1999), Bergemann and

Valimaki (1996), Keller, Rady, and Cripps (2005), Rosenberg, Solan, and

Vieille (2007), Strulovici (2010) , Klein and Rady (2011), Fryer and Harms

(2017). Large chunk of this literature uses standard bandits (a notable ex-

ception that also utilizes restless bandits is Fryer and Harms (2017)) in order

to answer questions of when to make a switch from experimentation to ex-

ploitation in various settings with beliefs about a project being the deciding

factor of experimentation. This paper interprets arms of a bandit as the

agents themselves directly thus introduces bandits as a potential framework

for dynamic contracting. The “arms” have their own incentive constraints

that need to be satisfied, and the state reflects the state of an agent that is

commonly known.

Within the bandit literature this paper builds upon restless bandit prob-

lems. Gittins, Glazebrook, and Weber (2011) provides an excellent treatment

of this literature, and Nino-Mora et al. (2001), Glazebrook, Nino-Mora, and

Ansell (2002), Nino-Mora (2002), Glazebrook, Ruiz-Hernandez, and Kirk-

bride (2006), Glazebrook, Hodge, Kirkbride, et al. (2013) are notable contri-

butions. Restless bandits are bandit problems where even the arms that are

not operated continue to give rewards and to change states, albeit at differ-

ent rates. The pioneering work in that literature is Whittle (1988), where

he derives a heuristic index based on a Lagrangian relaxation of the undis-

counted problem. Papadimitriou and Tsitsiklis (1999) showed that general

restless bandits are intractable and even indexability of the problem is hard

to ascertain. However, here I show that in this special case of bi-directional

restless bandits, the intractability can be bypassed via using a smaller and

more tractable space of policies to calculate indices in closed form. In order

to do this, I build upon the work of Glazebrook, Hodge, Kirkbride, et al.

8

(2013).

Finally, as a generalization of bandit problems this paper also utilizes

some general existence results on Markov decision problems. Markov decision

problems is a vast literature that I do not survey here. Blackwell (1965) is

an important pioneering work, and Puterman (2014) provides a remarkable

treatment of the literature.

1.2 A Short Discussion of the Methodology and Bi-

Directional Principal-Agent Relationships

In this section I give a broad description of the problems that can be tackled

via the techniques in this paper. Thus, this section is intended to flesh out

the overall approach without attempting to characterize a formal setup or

solutions. The techniques in this paper are extensions of the restless ban-

dits literature (see, Whittle (1988), Gittins, Glazebrook, and Weber (2011),

Nino-Mora et al. (2001), Glazebrook, Nino-Mora, and Ansell (2002), Nino-

Mora (2002), Glazebrook, Ruiz-Hernandez, and Kirkbride (2006), Glaze-

brook, Hodge, Kirkbride, et al. (2013)) into principal-agent setups, thus are

not limited to contract manufacturing.

The principal optimal contracts identified in this paper are mostly tied

to the optimal utilization schedules which are characterized by indices. In-

dex policies are unorthodox in relational contracting settings, but they are

prevalent in bandit problems. The pioneering work of Gittins (1979) showed

that some bandit scheduling problems can be solved by a rather simple index

policy, which is characterized arm by arm via identifying indifference points.

The methodology for solving the principal’s problem is slightly more involved

than a bandit problem but capitalizes on the same idea.

In its most basic form the environment consists of a bilateral relationship

between a principal and an agent. The principal has the opportunity to

interact with and utilize the agent at periods t = 0, 1, 2, . . . without a long

term contract. Utilization is binary and denoted by I t ∈ 0, 1. Roughly,

9

utilization could mean any binary choice that has an effect on the value of

the interaction in that period. For example, utilization could mean that the

agent exerts binary choice of effort or it could mean that agent is employed

by the principal. The agent also has a physical state st ∈ S where S =

s1, s2, . . . , sN with sk < sk+1 and sk ∈ R for k = 1, 2, . . . , N . The value

of the interaction at any period is jointly determined by a pair of functions

v : 0, 1 × S → R for the principal and c : 0, 1 × S → R for the agent,

which depend on the state of the agent and utilization. Roughly, the state

captures the history dependence in the value of the interaction, and could

potentially be how tired the agent is or how experienced the agent is.

In the environments that these techniques are applicable to, utilization

also controls the state of the agent. For example if the state of the agent is

how experienced the agent is, then the utilization decision not only effects

the value of the interaction but also effects how experienced the agent will

be in the next period. If the agent is utilized the state of the agent changes

according to a time homogenous Markov distribution Pa with P sk,sma denot-

ing the probability that the next state is sm given that the current state is

sk. If agent is not utilized, then the state of the agent does not remain at

rest, instead it changes according to another time homogenous Markov dis-

tribution Pp with P sk,smp denoting the probability that the next state is sm

given that the current state is sk. An agent is called restless if P sk,smp > 0

for at least one m 6= k. Let Pa(k) denote the set of states that are reachable

in the next period if the current state is sk and agent is utilized, that is

Pa(k) = sm ∈ S : P sk,sma > 0. Similarly, let Pp(k) denote the set of states

that are reachable in the next period if the current state is sk and agent is

not utilized, that is Pp(k) = sm ∈ S : P sk,smp > 0.

A particular form of restlessness that is natural in many economic phe-

nomena is utilization and non-utilization having opposite effects. For exam-

ple, if the state is how tired the agent is, and utilization is exerting effort

then it is easy to envision that exerting effort causes the agent to become

10

more tired, and not exerting effort causes the agent to become less tired. In

this vein, a restless agent is going to be called bi-directional if for all sk ∈ S,

either Pa(k) ≤SSO sk ≤SSO Pp(sk) or Pp(sk) ≤SSO sk ≤SSO Pa(k),

where ≤SSO denotes being smaller in the strong set order. Bi-directionality

means that utilization and no utilization have roughly opposite effects (the

magnitude can differ) on the agent’s state. If utilization causes the state to

go up (probabilistically), then non-utilization causes the state to go down; if

utilization causes the state to go down, then non-utilization causes the state

to go up. Bi-directionality does not rule out the potential to stay in the same

state, but it ensures that if effort is tiring, then the agent can not be less

tired after exerting effort.

The final notion that I will introduce is about “smoothness” of the tran-

sitions at least in one direction. This notion is based on a similar notion

introduced in Glazebrook, Hodge, Kirkbride, et al. (2013). A bi-directional

agent is active skip free if either Pa(k) ⊂ sk, sk+1 or Pa(k) ⊂ sk, sk−1for all states sk. Similarly if a bi-directional agent is passive skip free if ei-

ther Pp(k) ⊂ sk, sk+1 or Pp(k) ⊂ sk, sk−1 for all states sk. That is a

bi-directional agent is skip free if moving in at least one direction is gradual.

It is easy to see that relabeling the states and multiplying them with −1 and

relabeling utilization is without loss, thus throughout I will assume the agent

is active skip free.

With the basic environment identified, the mechanics of the interactions

can now be introduced. Since long term contracting is not possible the fol-

lowing events unfold at every period. The principal observes the state of the

agent and offers a contract (which can be spot or relational based on the

setting at hand, but not full commitment). The contract will specify a non-

negative payment from the principal to the agent (and inherent promises) for

the agent to abide by the principal’s desired level of utilization. As with most

contracting settings I assume the parties’ payoffs are quasi-linear in the pay-

ments, hence, in an interaction with payments pt ≥ 0, the principal receives

11

a payoff of v(I t, st)− pt, and the agent receives a payoff of pt + c(I t, st). The

agent can refuse the contract, in which case the parties take their outside

options v(st), c(st) which could potentially depend on the state.

Along with most of the dynamic contracting literature, this paper focuses

on constructing a plan of payments and utilization decisions to achieve the

best long term outcome for the principal. Of course the agent has to be

incentivized to follow through, which is done via the payments and promises

since there is no long term contracting. Moreover, the principal should also be

incentivized to follow through with future promises as well. I do not further

characterize the incentive constraints in this section as they depend on the

particular relationship of interest and two particular cases are developed

further in a contract manufacturing setting in sections 2 and 3.

Given that the physical, payoff relevant state is controlled by utilization

in a Markovian fashion, finding the principal optimal plan can be considered

as a Markov decision problem for the principal. In this Markov decision

problem there will be some forward looking constraints arising from incentive

constraints. The choice variables are the utilization decisions and payments.

Letting the incentive constraints be denoted by ICA and ICP respectively

for the agent and the principal, the problem of finding the optimal plan looks

as follows: 4

maxpt,Itt∈N

E

(∞∑t=0

δt(v(I t, st)− pt)

)subject to

ICA

ICP

As noted in Marcet and Marimon (2011) “in the presence of forward look-

4Notice that incentive constraints is used rather loosely here and could be a set ofconstraints involving participation, incentives, truth-telling etc. IC· here denotes theentire set of constraints that have to hold on all histories for the agent/principal.

12

ing constraints, optimal plans typically do not satisfy the Bellman equation

and the solution does not have a standard recursive form”. There are two

approaches that are generally taken when faced with this difficulty, the first

is using endogenous promise keeping constraints in the form of future pay-

offs, but in that case a recursive characterization of the payoff space (such as

Abreu, Pearce, and Stacchetti (1990) for repeated games) is necessary. Al-

ternatively a recursive saddle point operation can be done as in Marcet and

Marimon (2011) but in that case concavity of the problem and smoothness of

the forward looking variables are necessary to define a recursive Lagrangian

with additional co-states. The approach taken in this paper is a middle

ground of both approaches, relying on two key aspects: bi-directionality and

transferrable utility.

The first step is using bi-directionality to characterize a set of possible

solutions to a relaxed version of the principal’s problem. In general, incentive

constraints are forward looking, hence period 0 constraints incorporate fu-

ture constraints on a time discounted average criterion instead of holding at

every history. Using this observation a relaxed version of the problem where

only the period 0 constraints of the agent are taken into account is a rele-

vant upper bound for the principal’s problem. Let IC0A denote the period 0

constraints of the agent. Given some continuity and compactness conditions

on the actions and payoffs the relaxed problem is a fairly “standard” Markov

Decision Problem which has an optimal Markov Policy.5

maxpt,Itt∈N

E

(∞∑t=0

δt(v(I t, st)− pt)

)subject to

IC0A

5There are multiple different equivalent conditions for having a “well behaved” MarkovDecision problem, instead of trying to give a general treatment I refer interested readersto Puterman (2014) here, for the models in sections 2 and 3 a more precise statement ofthe relevant conditions is provided in the appendix.

13

Notice that a time-homogenous Markov Policy to this relaxed problem will

specify a payment and utilization for every state of the agent. However,

due to skip free bi-directionality, it can be shown that any time-homogenous

Markov Policy is equivalent to a threshold utilization policy.6 That is there

will be a state sk∗ , such that if the current state is below/above the threshold

the agent is utilized/not utilized. The intuition for this result is rather clear,

suppose for Pa(k) ≤SSO sk for all states k and consider any Markov Policy.

Since a Markov policy maps the states to utilization decisions, due to the

finite state space one of the following has to be true: Either the agent is

utilized on all states under this policy or there is a highest state sk where

the agent is not utilized. In the latter case, after a finite amount of time

the states below sk are never reached since utilization is stopped after sk

and the agent moves in the opposite direction when not utilized. But by

this argument any Markov Policy is equivalent to a threshold policy as far as

utilization decision is concerned. This intuition is not enough to tell what the

threshold is or what the payments are, just that the search for the optimal

policy can be reduced to policies where the utilization is done via a threshold.

Given the threshold form of utilization decisions the next step is to con-

struct a Markov payment scheme. This construction is ideally done for gen-

eral threshold utilization decisions and satisfies the incentive conditions of

the agent on all possible histories realized under threshold policies. Although

there is no general sufficient and necessary conditions for existence of such

payments, in practice it can be done via various methods depending on the

problem at hand. This payment will generally depend on both the state and

the utilization decision p(I t, st). 7 Once a payment scheme is identified, then

6The formal versions of this statement for each model along with their proofs can befound in the appendix

7The existence and properties of the such payments depends on the incentive frictionsat hand and in general depend on the entire structure of the problem including the lawsof motion at different states, furthermore there could be multiple payment schemes thatsatisfy the incentive constraints in a Markovian fashion. In sections 2 and 3 two suchconstructions are provided for two different setups.

14

the principal can focus on the returns from the agent, v(I t, st) − p(I t, st).

Since the returns only depend on the utilization decision they can be repre-

sented as Ra(st) = v(1, st)−p(1, st) and Rp(s

t) = v(0, st)−p(0, st). Utilizing

this notation, the relaxed principal’s problem can be written as

maxItt∈N

E

(∞∑t=0

δt(I tRa(st) + (1− I t)Rp(s

t))

).

This problem is a single-arm restless bandit problem. If the problem is

indexable then the optimal threshold can be characterized using the Whittle

Index (Whittle 1988), or more generally the marginal productivity index

(Nino-Mora et al. 2001). The indexability of a restless bandit is a large

literature that I do not discuss here, but I refer interested readers to Gittins,

Glazebrook, and Weber (2011) and the references therein. However, it is

important to notice that payments are baked into the bandit in these setups.

Thus, unlike standard restless bandit problems the properties of the bandit

can be altered directly using the payments if there are multiple payment

schemes that satisfy the incentive constraints. For the particular models in

sections 2 and 3, the bi-directional form allows me to capture indices in closed

form, and I use the sufficiency condition introduced in Nino-Mora (2007)

which enables ascertaining the indexability of the problem via monotonicity

of the indices.

The index has a natural interpretation in an economic setting. Consider

a hypothetical agent problem where utilization yields the same returns, and

non-utilization yields a subsidy equal to λ in addition to the returns from

non-utilization. Using threshold policies, the subsidy level for indifference

of utilization at particular states can be identified and yields the indices

much like Gittins (1979) index. The main characteristics of the indices are

quite similar, they provide a way to capture the marginal value of utilizing

an agent/activating an arm in a state, taking into account the entire law

15

of motion. They do not depend on other agents, they do not depend on

history, they only depend on the current state. The original Gittins index

policy is for a problem where arms that are not pulled remain the same,

thus the Gittins index has the natural interpretation as the time normalized

average returns of utilization and looks like a stopping problem. The index

identified here captures the time normalized marginal returns to utilization

and is a problem of selecting a threshold. The margins here are in the policy

space but have the added benefit that in cases where there are no payments

without utilization, positive indices also imply that the principal is willing

to follow the utilization decision.

Just like the transition from single arm to multiple arm bandits, if there

are multiple bilateral relationships that the principal maintains that are tied

by utilization the approach could be extended. An example could be the prin-

cipal choosing among multiple agents where the utilized agents become tired

and the non utilized agents become less tired. The caveat in this extension is

that the utilization constraints in a restless bandit in the most general form

are on average instead of exact. That is, if there are M agents and the uti-

lization constraint has to be such that the principal utilizes k < M agents on

average, sometimes less, sometimes more, instead of utilizing k agents every

period. The model in section 3 explores how such a setup arises in pharma-

ceutical contract manufacturing and formalizes the intuition provided in this

section.

The two different settings in this paper explore contract manufacturing,

an important form of outsourcing in many industries. In addition to exploring

the dynamics of contract manufacturing, the two models also highlight the

applicability of the methodology. Despite differences in incentive conditions,

differences in the number of agents and the non-recursive nature, this new

approach promises wide applications to economic problems where lack of

recursion causes technical challenges.

16

2 Single Sourcing and Learning by Doing

In this section, I investigate the optimal relational contract between a single

powerful contract manufacturer and a single client. This section is motivated

by electronics contract manufacturing, which was roughly $515 billion in 2015

(Rajaram 2015). As noted in Lewis and Yildirim (2002) learning through

production is very prominent in high technology markets. However as noted

in (Arrunada and Vazquez 2006), clients might sometimes take production

in-house as “an ambitious, upstart CM may decide to build its own brand

and forge its own relationships”. Thus the in-house production relies on

organizational forgetting, which as noted in Besanko, Doraszelski, Kryukov,

and Satterthwaite (2010) is well documented in many industries. This section

explores how a client can exploit the learning economies while still avoiding

potential competition arising from the CM itself.

Formally, there are two players, 1 principal (she) and 1 agent (he). Time

is discrete and the horizon is infinite, and both players discount future payoffs

with a discount factor δ ∈ (0, 1). At each period t the following events unfold:

• The agent’s production cost ct is realized and becomes publicly known.

• The principal makes an offer to the agent that specifies a production

source that is either in-house or the agent, and a payment to the agent.8

• The agent decides whether to accept the offer not. If the agent accepts,

production and payments happen as contractually specified, in addition

if the source of production is the agent, the principal also covers the

cost ct. Alternatively the agent can reject the contract and enter into

competition with the principal by paying a fixed fee F . If the agent

8The contract where the source of production is the principal reflects the case wherethe agent agrees not to enter that period and may receive payments in return

17

enters into competition the principal is forced to produce in-house from

that period onward.

A history at period t consists of all the utilization decisions and all

the payments made as well as all the costs, ht = ((I0, p0, c0), (I1, p1, c1),

. . . , (I t, pt, ct)) up to period t. Given that the strategic interaction ends once

the agent enters I implicitly assume that the agent has not entered until

period t while considering a period t history. The set of all histories at pe-

riod t, t + 1, . . . generate a growing sequence of σ-algebras, i.e. a filtration,

Ft. The probability triple (i.e. the filtered probability space) is given by

(Ω, Ft, P ) where, Ω is the set of all histories P is the probability measure

over Ω and Ft is the natural filtration. Let E denote the expectation oper-

ator associated with P . Throughout the analysis I focus on pure strategies.

The principal’s strategy denoted as π is a Ft measurable plan of utilization

decisions and payments I t, ptt∈N. The agent’s entry decision is an extended

stopping time τe on (Ω, Ft, P ), τe : Ω→ N∪∞ such that the event τe ≤ t

is Ft measurable. A tuple (π, τe) is called a relational contract.

Throughout I assume there is open book accounting and the costs of

production is borne by the agent but paid in full by the principal. In this

section only I abstract away from explicitly modeling potential frictions such

as hold up problems which are prevalent in contract manufacturing. However

hold up will be considered more explicitly in the following section. In this

section such cases can readily be encompassed into the model without much

alteration. Simply put, the fixed cost F in principle could take into account

future benefits and liabilities of holding up the principal as well.

2.1 Costs and Per Period Payoffs

2.1.1 Cost Structure

If the agent is utilized for production he may get more efficient. However,

when the principal opts for in-house production the agent organizational

18

forgetting takes place and the agent loses efficiency. Formally, the costs of

production for the agent is a controlled Markov chain with a finite state space.

For simplicity I assume the following cost structure and law of motion: The

cost of the agent in period t is ct ∈ C ≡ c1, c2, . . . , cn−1, cn ⊂ R+ with

n ∈ N. The costs are weakly increasing i.e., c1 ≤ c2 ≤ . . . cn and satisfy the

following convexity assumption: ck− ck−1 ≤ ck+1− ck for 1 < k < n. That is

the efficiency gains get smaller as the agent moves along the learning curve.

If the production is outsourced to the agent in period t, then the costs

decrease according to the following distribution:

(ct+1|I t = 1, ct = ck) =

ck with 1− r probability if ck > c1

ck−1 with r probability if ck > c1

ck if ck = c1

(CD)

Here a single parameter (r) captures the speed of learning. When r is higher

the agent is likely to experience cost reduction when being utilized for pro-

duction. When r is low the agent is likely to stay with the same cost even if

he is utilized. On the other hand when the principal produces in-house, the

agents cost increase according to the following distribution:

(ct+1|I t = 0, ct = ck) =

ck+1 with q probability if ck < cn

ck with 1− q probability if ck < cn

ck if ck = cn

(CU)

Similarly, the parameter (q) captures the speed of forgetting in this formula-

tion. When q is higher the agent is likely to lose efficiency when not working.

When q is low the agent is likely to stay with the same cost even if he doesn’t

work.

This cost structure captures learning by doing in a simple manner. As

the agent is utilized more by the principal he gets more efficient, when he

19

isn’t utilized he slowly loses this efficiency.

An alternative interpretation of the cost structure that fits the electronics

industry is as follows. Over time the production of cutting edge technology

incrementally becomes more demanding and technically involved, thus being

utilized for production enables the contract manufacturer to at least keep up

with the requirements of the current frontier, while a non-utilized agent may

fall behind if the frontier moves forward and become less proficient.

In order to capture learning by doing, I assume that at the beginning of

the game the agent is at the beginning of the learning curve, i.e. c0 = cn.

2.1.2 Actions and Per Period Payoffs

The profits from production depend on the market structure (i.e. the number

of producing firms) which, in this model is determined by whether the agent

has entered the market or not. I assume that there are no learning opportu-

nities in the in-house production as the principal has already done the R &

D. The inefficiency in the in-house production arises because the principal’s

production capabilities are small compared to the agent and hence, does not

have similar scale economies.

Given a relational contract (π, τe) the period t payoff of the principal

up(ct, π, τe) is given by:

up(ct, π, τe) =

vo − ct − pt if I t = 1 and P (τe > t|Ft) = 1

wo − pt if I t = 0 and P (τe > t|Ft) = 1

we otherwise

20

The period t payoff of the agent denoted by uao is as follows:

ua(ct, π, τe) =

pt if P (τe > t|Ft) = 1

ve − ct − F if P (τe = t|Ft) = 1

ve − ct if P (τe < t|Ft) = 1

Where w’s represent profits via in-house production and v’s represent

revenues from production by the agent. I assume we < wo and ve < vo so

competition reduces the profits for both parties.

2.2 Payoffs and Constraints

Given a relational contract (π, τe) the total discounted payoff to agent Ua(π, τe)

is as follows:

Ua(π, τe) = E(τe−1∑t=0

δtpt +∞∑t=τe

δt(ve − ct)− δτeF |π, τe) (2.1)

Given a relational contract (π, τe) the total discounted payoff of the prin-

cipal Up(π, τe) is as follows:

Up(π, τe) = E(τe−1∑t=0

δt(I t(vo − ct) + (1− I t)wo − pt) +∞∑t=τe

δtwe|π, τe) (2.2)

The payoff of the agent when he enters the market is dependent only on

his cost level when he enters but he continues to learn after entry. Thus the

payoff of the agent after he enters the market with a ck is:

21

U ea(k) = E(

∞∑t=0

δt(ve − ct)|c0 = ck)− F

=k−2∑n=0

(rδ

1− δ(1− r)

)nve − ck−n

1− δ(1− r)+

(rδ

1− δ(1− r)

)k−1ve − c1

1− δ− F

Finally the agents profits net of the fixed cost after entry with cost level ck

is denoted A(k) and is given by:

A(k) = U ea(k) + F

I assume the following:

Assumption 1.

(1− δ + δr)wo + δq(vo − cn)

(1− δ)(1− δ + δr + δq)− U e

a(n− 1) ≥ we1− δ

Assumption 1 is a sufficient condition for the principal to not compete

in the long run. That is the payoff from maintaining the relationships at its

most inefficient level net of the outside option for the agent is larger than

eventually allowing the agent to enter. Under assumption 1 the optimal

contract is one of infinite length since it should perform better than utilizing

the agent at the highest cost. I restrict attention to relationships of infinite

length, as relationships of finite length are just optimal control problems with

stopping. Optimal control problems with stopping have a well developed

literature hence not of interest in this paper.

Assumption 2.

U ea(n) ≥ 0

Assumption 2 guarantees that entry yields non-negative profits for the

agent.9

9Assumption 2 is for simplifying the strategy space. It is possible to consider an

22

In order to make sure that the agent doesn’t enter, it must be the case

that for any period the continuation utility for the agent from that period

onward is greater than entering the market. For any period t

E(∞∑t=t

δt−tpt|Ft) ≥ E(∞∑t=t

δt−tve − ct|Ft)− F. (ICA)

Analogously the incentive constraint of the principal takes the following

form for any period t:

E(∞∑t=t

δt−t(I t(vo − ct) + (1− I t)wo)−∞∑t=t

δt−τpt|Ft) ≥we

1− δ(ICP )

With all the constraints identified, the principal’s problem can be summed

up as follows:

Problem 1 (Principal’s Problem).

maxpt,Itt∈N

E(∞∑t=0

δtI t(v − ct + (1− I t)wo)−∞∑t=0

δtpt)

subject to ICA

ICP

2.3 The Principal Optimal Contract

Under assumption 1, in the principal optimal relational contract the agent

is sometimes utilized and sometimes is not, but the agent is paid enough so

that he never enters. In this section I first identify the payment rules that

extended model where the agent can reject the offer of the principal but not enter. Sincerejection only happens off-path, the assumption guarantees that the optimal punishmentis entry in this broader model. This broader model without the assumption also results ina qualitatively similar contract but with a slightly different payment scheme if it is everthe case that the agent can not threaten with entry for some high cost levels.

23

satisfy the agents incentive constraints, then I identify the optimal schedule

of utilization.

2.3.1 Monotone Payments

First, I introduce the following payment scheme, which is a Markovian pay-

ment scheme that depends on the cost of the agent and utilization.

Definition 1 (Monotone Payments). A monotone payment scheme is identi-

fied by Markovian payments that depends on the current efficiency level of the

agent and whether or not the agent is utilized, (pt|ct = ck, It = 1) = pa(ck)

and (pt|ct = ck, It = 0) = pp(ck) where,

pa(ck) =ve − ck − (1− δ)F

pp(ck) =A(k − 1) (1− 2δ + δ2 + δq + δr − δ2q − δ2r)

1− δ + δr

− F (1− 2δ + δ2 − δr − δ2r)− (ve − ck)δq1− δ + δr

.

The agent can threaten to enter into the market regardless of whether the

principal wants to outsource in that period or not. However, the threat de-

pends on the current efficiency level of the agent thus the monotone payments

when not utilized reflect this.

Before fully describing the optimal contract, it is useful to introduce the

following definition and highlight its importance.

Definition 2 (Monotone Utilization Policies). A policy is called monotone

utilization policy if ∃k ∈ 1, 2, . . . , n such that for all j < k ct = cj ⇒I(t) = 0 and for all l ≥ k ct = cl ⇒ I(t) = 1.

A policy is called monotone utilization policy with threshold k if the agent

is utilized whenever his costs are greater or equal to ck and he is not utilized

if his costs are below ck.

24

In the appendix, propositions 4 and 5 show that with a monotone utiliza-

tion policy, monotone payments satisfy the agent’s incentive constraint with

equality.

2.3.2 Optimal Utilization Rule

In this setting it turns out that a principal optimal utilization rule is an

index policy with a threshold. Each cost level of the agent is associated with

an index, the agent is utilized if and only if the index of the current cost

level is higher than the threshold. The indices are increasing as the cost

level increases thus they also pinpoint a monotone utilization policy. From

the above discussion monotone payments are an immediate candidate for

payments, and in fact, are an intrinsic part of the indices.

Definition 3. The index of state cx denoted by λ(cx) is given by

λ(cx) =fxx − fxx+1

gxx − gxx+1

,

where

fxx =(1− δ + δq)

(1− δ)(1− δ + δr + δq)(vo − cx − pa(cx))

+(δr)

(1− δ)(1− δ + δr + δq)(wo − pp(cx−1))

fxx+1 =δq

(1− δ)(1− δ + δr + δq)(vo − cx+1 − pa(cx+1))

+(1− δ + δr)

(1− δ)(1− δ + δr + δq)(wo − pp(cx))

gxx =(1− δ + δq)

(1− δ)(1− δ + δr + δq)

gxx+1 =δq

(1− δ)(1− δ + δr + δq).

The index of state cx captures the time normalized marginal change in

25

profits by adding the state cx to the monotone utilization policy x+ 1.

Theorem 1. The optimal utilization rule is characterized by a set of indices

λ(cx)cx∈C and a threshold λ∗ ≥ 0. For all t, I t = 1 ⇔ λ(cx) ≥ λ∗ and

I t = 0⇔ λ(cx) < λ∗, λ∗ is the smallest λ ∈ R+ that satisfies

fc(λ)c(λ) ≥

(1− δ)(vo − ve + (1− δ)F ) + δrwe(1− δ)(1− δ + δr)

.

With c(λ) = minc ∈ c1, c2, . . . , cn−1, cn : λ(c) ≥ λ

The dynamics of the relationship are driven by the learning opportunities

available to the agent hence comparative statics with respect to learning are

of particular importance. Due to the relatively simple policy and the closed

form of the index, it is possible to investigate these relationships.

Proposition 1. As the speed of learning, r, increases, the profits of the

principal and the expected discounted time the agent spends working, decrease.

Proposition 1 highlights the tension in the relationship as the learning

opportunities increase. The decreasing profits for the principal that arises

from higher speed of learning may at first seem surprising but has a very

clear intuition. A higher speed of learning increases the predatory threat

of the agent. That is, an agent who learns faster may be able to enter the

market and compete with the principal sooner. Thus, the principal has to

pay higher wages to prevent entry at the threshold level. Moreover, due

to faster learning, the time that the agent is utilized for production at the

threshold level also decreases, leading to loss of profits for the principal since

the agent is more efficient. Finally, due to monotone payments, the gains

arising from faster learning before the threshold is reached goes to the agent.

Thus, there is no change to the principal’s profits before the threshold is

reached. The three effects combined lead to a loss for the principal arising

from a faster learning agent.

26

When choosing the optimal threshold the principal must consider two

factors, first is the effect of utilizing the agent on profits, second the effect of

this utilization on the threat of entry. The index λ(cx) captures the marginal

gains in profits from a utilization decision. An important feature of the

optimal policy is that the payments become larger as the agent gets more

efficient. That is, as the agent gets more efficient, the payments necessary

to maintain the relationship increase. The driving factor for this result is

that a gain in efficiency increases the outside option of the agent. Thus the

efficiency gained precisely by working for the principal is used to threaten

the principal. At the most efficient level of the optimal contract the agent is

not utilized but he is paid just to make sure he does not enter.

3 Multiple Sourcing and Capacity Constraints

In this section I investigate the optimal relational contract between a single

principal and multiple imperfectly substitutable contract manufacturers. In

situations where the product of the principal is commodified, a principal usu-

ally maintains relationships with multiple contract manufacturers. This is

due to the fact that sometimes the contract manufacturers are capacity con-

strained and have to utilize their “flex capacity” which reflects on their costs

(Langer 2015a). In this case, a contract manufacturer can still work albeit

at higher costs due to the overtime required. Moreover since the product is

commodified, hold-up becomes a more relevant threat, whereas learning goes

out of the picture as CM’s are often at the forefront of technology adoption

(Langer 2015b).

I build upon the workhorse model of Board (2011) while adding the effect

of utilization. Formally, there are N + 1 players; 1 principal (she) and N

agents (he). Time is discrete and the horizon is infinite i.e., t ∈ N. All parties

share a common discount rate δ ∈ (0, 1). At each period t the following events

unfold:

27

• Agents’ production costs ct = (ct1, . . . , ctN) are realized and become

publicly known.

• The principal chooses on average one agent i ∈ N to utilize and pays

the cost cti, and implicitly promises pti out of production. Agents can

refuse the utilization before the cost is paid. Formally, let I ti ∈ 0, 1denote the principal utilizing agent i in period t.10

• The utilized agents manufacture products of value v and keep pti, giving

back v − pti to the principal. Alternatively the agents can hold up the

principal and keep up to v − li 6= pti, where li < v.

In case of contract manufacturing, firms often outsource the entire pro-

duction, and many aspects of the process are not contractible. The maximum

amount of hold up, v− li, captures the liabilities associated with intellectual

property rights of the principal and potential costs of litigation associated if

the agent tries to hold up the principal. The bargaining power of an agent

depends on the size of his liabilities li. The case of 0 liabilities becomes a

pure hold-up problem where agents enjoy full bargaining power.

A history at period t consists of all the utilization decisions and all the

payments made as well as all the costs, ht = ((I0i i∈N , p0

i i∈N , c0i i∈N),

(I1i i∈N , p1

i i∈N , c1i i∈N), . . . , (I tii∈N , ptii∈N , ctii∈N)) up to period t.

The set of all histories at period t, t + 1, . . . generate a growing sequence

of σ-algebras, i.e. a filtration, Ft. The probability triple (i.e. the filtered

probability space) is given by (Ω, Ft, P ) where, Ω is the set of all histo-

ries P is the probability measure over Ω and Ft is the natural filtration.

Let E denote the expectation operator associated with P . Principal’s strat-

egy denoted as π is a Ft measurable plan of utilization decisions. Formally,

I tii∈Nt∈N, where I ti = 1 if agent i is selected at period t and 0 otherwise.

Similarly pti denotes the non-contractible fee that i receives from the prin-

cipal. I assume that fees are withheld from the value of the product thus

10See equation 3.1 for the precise statement of the average constraint.

28

can only be made when an agent is producing for the principal. Formally,

I ti = 0 ⇒ pti = 0. An agents strategy denoted by σi is a F measurable

plan of fees kept out of production value, ptit∈N. A tuple (π, σ), where

σ = (σ1, . . . , σN) is called a relational contract.

3.1 Costs and Per Period Payoffs

3.1.1 Cost Structure and Average Demand

In the previous section utilization could lead to a decrease in costs due to

learning. However, in the case of a commodified product such in pharma-

ceuticals, there is no scope for learning. As noted in (Langer 2015a) CMs

are often at the forefront of new technology adoption, however, encounter-

ing capacity problems due to utilization by clients is a significant problem.

Thus, a frequent use of an agent may lead to higher costs and not lower,

since overuse of assets by the CM (e.g., pay overtime) will slowly cause his

costs to increase. When an agent is not utilized by the principal, he does

not encounter any capacity constraints. Thus his costs decrease to a lower

bound.

Formally, the principal’s costs of utilizing an agent is a controlled Markov

chain with a finite state space. For simplicity I assume the following cost

structure and law of motion: The cost of an agent i at period t is cti ∈ Ci ≡ci,1, ci,2, . . . , ci,ni−1, ci,ni

⊂ R+ with ni ∈ N. The costs are weakly increasing

i.e., ci,1 ≤ ci,2 ≤ . . . ci,niand satisfy the following convexity assumption:

ci,k− ci,k−1 ≤ ci,k+1− ci,k for 1 < k < ni. The whole vector of costs in period

t is denoted ct = (ct1, . . . , ctN) ∈ C1 × . . . CN .

I assume v > ci,nifor all i, so if there were no payments to the agents, the

principal would always like to produce. Similarly the liabilities are smaller

than the cost of production, thus for all i, ci,1 > li. If agent i is chosen

by the principal in period t, his costs increase according to the following

29

distribution:

(ct+1i |I ti = 1, cti = ci,k) =

ci,k+1 with qi probability if ci,k < ci,ni

ci,k with 1− qi probability if ci,k < ci,ni

ci,k if ci,k = ci,ni

(CU)

If agent i is not chosen by the principal in period t, he catches up with his

entrenchments and his costs decrease to their initial levels:

(ct+1i |I ti = 1, cti = ci,k) =

ci,1 with ri probability

ci,k with 1− ri probability(CD)

This cost structure displays entrenchments and spillovers in the economy in a

simple manner. CU captures the upward movements in costs due to reaching

capacity constraints after being utilized and CD captures the reinitialization.

So entrenching to an agent increases his costs (price of utilization), and the

spillover from demand for an agent is the opportunity generated for other

agents to catch up to their other work/rest so that their future costs are not

as high.

Finally I assume that the principal faces a long term average demand of 1,

with potential fluctuations hence the total amount produced via utilization

has to satisfy the following.

E(∞∑t=0

∑i∈N

I ti ) ≤ 1/(1− δ). (3.1)

This average demand can be interpreted as a complete markets assumption,

where the principal can shift production intertemporally. However, it is

also necessary since it reduces the frictions arising from the general discrete

modelling. A sharper utilization constraint such as∑

i∈N Iti ≤ 1 requires

30

more stringent assumptions on the structure due to the way the Whittle

(1988) index is defined. A particular assumption is not adopted in the bulk

of this paper as the assumptions in the restless bandits literature are only

of sufficient nature. One particular simplifying assumption is provided as an

extension at the end of this section.

3.1.2 Actions and Per Period Payoffs

The period t payoff of the principal denoted up(ct, π) given a relational con-

tract (π, σ) is given by

up(ct, π, σ) =

N∑i=1

I ti (v − cti − pti).

The period t payoff of an agent i denoted ui(ct, π) given a relational

contract (π, σ) is given by

ui(ct, π, σ) = I tip

ti.

3.2 Payoffs and Constraints

The total discounted payoff of agent i denoted Ui(π, σ) given (π, σ) is as

follows:

Ui(π, σ) = E(∞∑t=0

δtI tipti|π, σ). (3.2)

Similarly the profits of the principal Up(π, σ) given (π, σ) is as follows:

Up(π, σ) = E(∞∑t=0

N∑i=1

δtI ti (v − cti − pti)|π, σ). (3.3)

31

From the profits it is easy to isolate the profits raised from agent i as follows:

U ip(π, σ) = E(

∞∑t=0

δtI ti (v − cti − pti)|π, σ).

Since the principal only incentivizes agents when they are utilized, the

incentive constraint depends on the times she utilizes agent i. Let τi,1 =

inft ≥ 0 : I ti = 1 denote the first time agent i is utilized by the principal.

Inductively, let τi,n = inft > τi,n−1 : I ti = 1 denote the nth time agent i is

utilized by the principal. Thus the sequence of random variables τi,nn∈Ndenotes all the periods that agent i is utilized by the principal. Hence, the

incentive constraint takes the following form:

E(∞∑k=n

δτi,k−τi,npτi,ki |π, σ) ≥ E(I

τi,ni (v − li)) for all n. (ICi)

Analogously the incentive constraint of the principal takes the following

form:

E(∞∑k=n

δτi,k−τi,n(v − cτi,ki − pτi,ki )|π, σ) ≥ 0 for all n for all i. (ICP )

Notice that this incentive condition implies that the punishments are bilat-

eral. The main reason for this is to broaden the scope of industries captured

by the model. Depending on the industry, agents may or may not be able

to jointly punish a principal, but if a relational contract is sustainable under

bilateral punishments, then it is necessarily sustainable when multiple agents

cooperate to punish the principal.

With all the constraints identified, the principal’s problem can be summed

up as follows:

32

Problem 2 (Principal’s Problem).

maxptii∈N ,Iti i∈Nt∈N

E(∞∑t=0

N∑i=1

δtI ti (v − cti − pti))

subject to E(∞∑t=0

∑i∈N

I ti ) ≤ 1/(1− δ)

ICi ∀i

ICP

3.3 The Principal Optimal Contract

The principal optimal contract consists of utilization decisions and payments.

In this section, first I identify a simple payment rule that gives away min-

imal economic rents for any utilization rule. Then, I identify the optimal

utilization rule.

3.4 Fastest Prices

Once an utilization rule is chosen, payments can be made in a myriad of

ways while satisfying the incentive constraints of the agents. The fastest

prices were introduced in Board (2011) as a payment scheme that satisfies

the incentive conditions for all agents at every period with equality, i.e., while

giving minimum economic rents. The remarkable feature of fastest prices is

that they tie payments directly to utilization of the agent in question, without

relying on any other variables or agents. In addition, the payment scheme will

inherit any properties of the utilization scheme itself (such as being Markov).

Definition 4 (Fastest Prices). For any utilization rule I tit∈N, the fastest

prices are given by

pτi,ni = (v − li)E(1− δτi,n+1|τi,n) for all n ∈ N

33

Proposition 2 (Board). For any utilization rule I tit∈N, no pricing rule

can yield higher profits than fastest prices.

The proof of proposition 2 is identical to Board (2011), hence omitted.

One more helpful attribute of fastest prices is that even if the actual payment

times are not fully characterized, the delay arising from waiting until the next

state can be calculated in closed form, enabling identification of indifferences

very easily.

3.5 Optimal Utilization Rule

The optimal utilization rule is crucial since the payment rule can be readily

characterized for a given utilization rule using fastest prices. Hence, the

behavior of the entire principal optimal contract depends on the properties

of the utilization rule.

Despite the complex nature of the problem, the optimal utilization rule

is surprisingly simple. The optimal utilization rule is an index rule that is

augmented by a threshold. In particular, each agent is assigned an index that

is only dependent on his current cost, and the principal employs all agents

that have an index higher than the threshold.

Definition 5. The index of agent i at state ci,x is given by :

λi(ci,x) =fxi,x − fxi,x−1

gxi,x − gxi,x−1

,

34

where

gxi,x =

11−δ(1−qi) + δ qiδ

1−δ(1−qi)

[∑x−2n=0

11−δ(1−qi)

(δqi

1−δ(1−qi)

)n]1− δ

(δqi

1−δ(1−qi)

)x ,

gxi,x−1 =δ[∑x−2

n=01

1−δ(1−qi)

(δqi

1−δ(1−qi)

)n]1− δ

(δqi

1−δ(1−qi)

)x−1 ,

fxi,x =

v−ci,x1−δ(1−qi) + δ qiδ

1−δ(1−qi)

[∑x−2n=0

v−ci,n+1

1−δ(1−qi)

(δqi

1−δ(1−qi)

)n]− (v − l)

1− δ(

δqi1−δ(1−qi)

)x ,

fxi,x−1 =δ[∑x−2

n=0v−ci,n+1

1−δ(1−qi)

(δqi

1−δ(1−qi)

)n]− δ(v − l)

1− δ(

δqi1−δ(1−qi)

)x−1 .

Theorem 2. The optimal utilization rule is characterized by a set of indices

λi(ci,x)ci,x∈Cii∈N and a threshold λ∗ ≥ 0. For all t and all i, I ti = 1 ⇔

λi(ci,x) > λ∗ and I ti ∈ [0, 1]⇔ λi(ci,x) = λ∗, and I ti = 0⇔ λi(ci,x) < λ∗

λ∗ is the smallest λ ∈ R+ that satisfies

∑i∈N

[x−2∑n=0

1

1− δ(1− q)

(δq

1− δ(1− q)

)n+

(δq

1− δ(1− q)

)x−1

gci(λ)i,ci(λ)

]= 1/(1− δ).

With ci(λ) = maxci ∈ ci,1, ci,2, . . . , ci,ni−1, ci,ni : λi(ci) ≥ λ

The randomization at the exact threshold level is necessary to ensure that

the average demand is met. In general, when no two agents are identical,

only one agent will ever need to randomize.

When making utilization decisions, the principal could potentially be re-

lying on the entire history of all the relationships she maintains, as well as

all the promises she made. However, the optimal utilization scheme takes

a rather simple form. The threshold is fixed at the very beginning of the

game and the indices only depend on the current cost level of an agent. In

35

an economy with heterogenous agents and laws of motion, the indices pro-

vide a simple utilization scheme to maximize profits, where just utilizing the

cheapest agent is simple, but not necessarily profit maximizing.

An important feature of the index is that it does not depend on utilization

history. Thus, in an optimal utilization rule, whether an agent has worked

or not for the principal does not factor into the utilization decision. In

particular the index of an agent does not depend on the utilization history

of any agents, including himself. So, in the optimal contract an agent who

has worked for the principal does not receive preferential treatment over an

agent who has never worked for the principal before.

Moreover initial costs do not factor into the indices either. In particular,

at the start of a relationship an agent might have significant cost advantages

or disadvantages, resulting in very frequent or very rare utilization in the

early periods. However, these frequencies do not necessarily last throughout

the game. Where an agent starts from has no bearing on the relationship in

the long run.

Finally, the indices do not depend on other agents at all. Any utilization

decision necessarily implies that some agents are preferred over others. How-

ever, in the optimal contract this preference is not a cross comparison. The

principal does not compare agents and pick the best, she employs anyone that

is good enough. How “good” an agent is dependent on both the current cost

of an agent and his entrenchments (the law of motion for costs) which is cap-

tured by his index. The threshold represents the criterion for what is “good

enough”. This criterion for being good enough is fixed at the beginning of

the game and doesn’t change according to the realized costs or actions of

players throughout, although the initial determination does depend on the

total number of agents, their potential cost structures, and their starting

costs. In essence, who is good enough depends on what every agent should

do, but which history is actually realized does not change this criterion.

There are two kinds of loyalty that could be considered in this setting.

36

The first one is being loyal to promises, that is, if the principal promises

future work she will indeed employ the agents in the future. The principal

is loyal to her promises, in the sense that any utilization promise she makes

will be fulfilled in finite time. This is especially important since the entire

economy need not follow a recurring pattern, even in the very long run,

unless the strategies chosen by the players are precisely intended to cause

the economy to recur. When recursion is easily avoidable, breaking promises

is a very plausible strategy, unlike a repeated environment. However, even

though the optimal contract does not start with a cyclical pattern, a patient

principal will maintain her relationships by being loyal to her promises and

will eventually converge to a cyclical utilization pattern. The second kind of

loyalty that could be considered is preferential treatment of agents based on

a longer utilization history. As the indices do not depend on history at all,

this kind of loyalty is not present in the optimal contract.

3.6 A Simplification with Full Resetting

Although the setting explored here with different randomized recovery times

is broader, it comes with the cost of relaxing the utilization decisions to an

average. However if desired, by setting ri = 1 for all i and using the same

indices a sharper result can be obtained.

Proposition 3 (Jacko). If ri = 1 for all i, an index policy with fastest prices

and with indices identified as

λi(ci,x) =fxi,x − fxi,x−1

gxi,x − gxi,x−1

will yield at most one agent being employed every period. That is, the uti-

37

lization constraint can be reduced to∑i∈N

I ti ≤ 1.

The result of Jacko (2011) was to show that the marginal productivity

indices identified in a deterministically fully resetting restless bandit setting

is still optimal with the restricted utilization, which interpreted in this setup

yields the above proposition. However, this result also highlights two aspects

of the analysis. First, putting more structure onto the restless relationships

might deliver sharper results. Second, as the sufficient nature of the result

might imply, it is a daunting task to identify necessary conditions for par-

ticularly desirable results. In this paper I explored bi-directionality, which

I modeled as moving on a cost curve. However, the assumptions made, in-

cluding bi-directionality are sufficient (and not necessary) for indexability of

the problems.

4 Conclusion

Outsourcing the entire manufacturing of a product allows original equipment

manufacturers to reduce labor costs, free up capital, and improve worker

productivity (Arrunada and Vazquez 2006). Global contract manufacturing

had an expected volume of $515 billion in electronics industry and $40.7

billion in pharmaceutical industry (Rajaram 2015, Pandya and Shah 2013).

Under such potential gains contract manufacturing is inevitable, though it

entails inescapable hazards. (Arrunada and Vazquez 2006). This paper

explores two frequent problems, under different incentive conditions. First,

I explore potential predation by a single contract manufacturer when there

are opportunities for learning by doing that is tied to predation. Second, I

explore a setting with multiple contract manufacturers where entrenchments

and hold up opportunities need to be navigated.

38

In the first setting the opportunity for learning by doing has two effects,

while it becomes cheaper to employ the contract manufacturer as he gets more

efficient; under the threat of predation the efficiency gains are only enjoyed

by the contract manufacturer. The principal tries to limit the rents that need

to be paid by stopping utilization once the agent becomes too efficient. As

the contract manufacturer loses its edge by being kept out of production, he

is utilized again. The rents that need to be paid slowly increases as the agent

becomes more efficient and reaches a maximum when utilization is halted.

In the second setting a commodified product with multiple potential con-

tractors poses new incentive frictions. As McCoy (2003) notes, entrenchment

is a frequent problem, and changes the dynamics of not one, but multiple bi-

lateral relationships. This paper characterizes the optimal policy completely,

which utilizes any agent that is good enough and not only the best. The op-

timal policy is time consistent, and is in a simple closed form characterized

by indices. The principal keeps her promises, but she does not prefer one

agent over another just because she had employed one of them in the past.

So, past utilization does not lead to preferential treatment.

39

5 Appendix - For Online Publication

In most calculations it is necessary to use a common version (see Serfozo

(2009)) of Wald’s identity for discounted partial sums with stopping times.

For convenience I will include the identity here as well.

Identity 1 (Wald’s Identity for Discounted Sums). Suppose that X1, X2, . . .

are i.i.d. with mean x. Let δ ∈ (0, 1) and τ be a stopping time for X1, X2, . . .

with E(τ) <∞ and E(δτ ) exists. Then

E(τ∑t=0

δtXt) =x(1− δE(δτ ))

1− δ

5.1 Sole Sourcing Model

5.1.1 Proof Idea for Theorem 1

The proof of theorem 1 will be done as follows. First a relaxation of the

principal’s problem will be considered. Then a class of policies for the re-

laxed problem will be identified as candidates for an optimal solution to the

relaxed problem. The class identified will be used to calculate an appropri-

ate payment scheme that serves the dual purpose of satisfying the incentive

constraint of the agent and turning the relaxed problem into a single armed

restless bandit. The single arm bandit will have indices identified taking into

account the payment scheme and an optimal index rule will be delivered for

the relaxed problem. Finally, the return to the original problem is achieved

by observing the fact that the incentive constraint of the agent is already

incorporated to the indices. Principal’s incentive constraint will be satisfied

by making sure that indices are not only positive but above a threshold.

40

5.1.2 Proof of Theorem 1

This is the principals problem in the sole sourcing model(PPS).

maxpt,Itt∈N

E(∞∑t=0

δtI t(v − ct + (1− I t)wo)−∞∑t=0

δtpt) (PPS)

subject to

E(∞∑t=t

δt−tpt|Ft) ≥ E(∞∑t=t

δt−tve − ct|Ft)− F. for all t

E(∞∑t=t

δt−t(I t(vo − ct) + (1− I t)wo)−∞∑t=t

δt−τpt|Ft) ≥we

1− δfor all t

Consider the following relaxation of PPS:

maxpt,Itt∈N

E(∞∑t=0

δtI t(v − ct + (1− I t)wo)−∞∑t=0

δtpt) (RPPS)

subject to

E(∞∑t=0

δtpt|F0) ≥ E(∞∑t=0

δtve − ct|F0)− F.

Letting VPPS denote the optimal value of PPS and VRPPS denote the

optimal value of RPPS, it must be the case that

VRPPS ≥ VPPS

Focusing on RPPS, since the action space is compact and the returns

function is upper semi-continuous, Puterman (2014) shows that there is an

optimal pure Markovian solution.

Proposition 4. Any pure Markovian policy has an equivalent monotone uti-

lization policy.

41

Proof.

Observation 1. Any pure Markov policy, will map states into payments and

utilization decisions. Let π be any pure Markov policy and let Sπ denote its

active set, such that I t = 1⇔ ct ∈ Sπ.

Notice that the initial cost is cn, and consider any pure Markov policy

π, identified with its active set Sπ. Let cx = maxc ∈ C : c 6∈ Sπ. Then

by definition under policy π, for all t ct ∈ cx+1, . . . cn. Moreover, for all t,

I t = 1 ⇔ ct > cx. Now, consider the monotone utilization policy that has

the same payment rule as π, identified with cx+1 denoted by πx+1. Then, by

definition under policy πx+1 for all t, ct ∈ cx+1, . . . cn. Moreover, for all t,

I t = 1⇔ ct > cx. Thus the two policies are equivalent.

Thus for any pure Markov policy, there is an equivalent monotone uti-

lization policy.

Proposition 5. For a monotone utilization policy with threshold k, no pay-

ment scheme can yield higher profits than a monotone payments.

Proof. Consider any monotone utilization policy πk with threshold k. Under

a monotone utilization policy πk the CM will be utilized repeatedly until

his costs reach the threshold ck and after reaching the threshold he will be

rested whenever his costs hit ck−1 and will be utilized again when the costs

again rise to ck. Thus, under a monotone utilization policy, all states cl with

l > k are transient and the states ck and ck−1 are recurrent. The following

lemma’s will show that the incentive constraints hold with equality in both

the transient and recurrent states, implying that there are not payments that

can yield higher profits with a monotone utilization policy.

Lemma 1. Consider any monotone utilization policy πk with threshold k.

Incentive conditions hold with equality at the recurrent states with monotone

payments identified in definition 1.

Proof of Lemma 1. Let T (z)yx denote the expected discounted time spent in

cz starting from state cy under πx.

42

Starting from the recurrent states, ck and ck−1, the incentive conditions

holding with equality implies:

pa(ck)T (k)kk + pp(ck−1)T (k − 1)kk =E(∞∑t=0

δtve − ct|c0 = ck)− F

pp(ck−1)T (k − 1)k−1k + pa(ck)T (k)k−1

k =E(∞∑t=0

δtve − ct|c0 = ck−1)− F

Sublemma 1. For any 0 < r, q ≤ 1,

T (x)xx =(1− δ + δq)

(1− δ)(1− δ + δr + δq)(5.1)

T (x− 1)xx =δr

(1− δ)(1− δ + δr + δq)(5.2)

T (x)x−1x =

δq

(1− δ)(1− δ + δr + δq)(5.3)

T (x− 1)x−1x =

1− δ + δr

(1− δ)(1− δ + δr + δq)(5.4)

Proof of sublemma 1. T (x)xx is the unique solution to the following system

T (x)xx = 1 + δ(1− r)T (x)xx + δrT (x)x−1x

T (x)x−1x = δqT (x)xx + δ(1− q)T (x)x−1

x

Since expected discounted time is equal to 1/(1− δ), T (x− 1)xx satisfies

T (x− 1)xx = 1/(1− δ)− T (x)xx

43

Thus,

T (x)xx =(1− δ + δq)

(1− δ)(1− δ + δr + δq)

T (x− 1)xx =δr

(1− δ)(1− δ + δr + δq)

Similarly T (x− 1)x−1x is the unique solution to the following system

T (x− 1)xx = δrT (x− 1)x−1x + δ(1− r)T (x− 1)xx

T (x− 1)x−1x = 1 + δqT (x− 1)xx + δ(1− q)T (x− 1)x−1

x

Identically, T (x− 1)x−1x satisfies

T (x− 1)x−1x = 1/(1− δ)− T (x)x−1

x

Thus,

T (x)x−1x =

δq

(1− δ)(1− δ + δr + δq)

T (x− 1)x−1x =

1− δ + δr

(1− δ)(1− δ + δr + δq)

Letting A(k) denote E(∑∞

t=0 δt(ve − ct)|c0 = ck), by utilizing Strong

Markov property and Wald’s identity we have

A(k) =k−2∑n=0

(rδ

1− δ(1− r)

)nve − ck−n

1− δ(1− r)+

(rδ

1− δ(1− r)

)k−1ve − c1

1− δ

Finally, noticing A(k) and A(k − 1) satisfies the following identity

A(k) =δr

1− δ(1− r)A(k − 1) +

ve − ck1− δ(1− r)

44

A straightforward application of Farka’s Lemma implies that there are two

positive numbers pa(ck) and pp(ck−1) that satisfies the incentive conditions

with equality. Solving the system yields

pa(ck) = ve − ck − (1− δ)F

pp(ck−1) =A(k − 1) (1− 2δ + δ2 + δq + δr − δ2q − δ2r)

1− δ + δr

− F (1− 2δ + δ2 − δr − δ2r)− (ve − ck)δq1− δ + δr

Observe that monotone payments introduced in definition 1 will satisfy the

recurring states with equality.

Lemma 2. Consider any monotone utilization policy πk with threshold k.

Incentive conditions hold with equality at the transient states with monotone

payments identified in definition 1.

Proof of Lemma 2. The incentive condition is satisfied with equality starting

from the state ck, thus for pa(ck+1), letting τk denote inft≥0t : ct = ck for

the incentive condition to hold with equality we must have:

E(

τk−1∑t=0

δtpa(ck+1) + δτk(A(k)− F )) =rδ

1− δ(1− r)A(k) +

ve − ck+1

1− δ(1− r)− F

Plugging in the expectation yields:

(pa(ck+1)))

1− δ(1− r)+

δr

1− δ(1− r)(A(k)− F ) =

δr

1− δ(1− r)A(k) +

ve − ck+1

1− δ(1− r)− F

Simplifying yields:

pa(ck+1) = ve − ck+1 − (1− δ)F

45

Inductively for all x ≥ k + 1 we must have

pa(cx) = ve − cx − (1− δ)F

Where the incentive conditions hold with equality at every transient state.

Due to lemmas 1 and 2, the incentive conditions hold with equality at

every history implying that the principal can not to any better.

Now, with the monotone payments baked in I introduce the following

augmented return functions Ra(ct) and Rp(c

t) that is conditional on utiliza-

tion.

Ra(ct) = vo − ct − pa(ct)

Rp(ct) = wo − pp(ct)

Now, consider the following relaxed hypothetical problem with no con-

straints, and the law of motion for ct is identical to the principals problem.

maxIt

E(∞∑t=0

δt(I tRa(ct) + (1− I t)Rp(c

t)) (5.5)

Notice that this problem is a restriction of the problem RPPS where the

payments are assumed to be monotone payments. Since monotone payments

are Markovian as well, this problem also has a Markovian solution. Due to

proposition 4 any Markovian policy is equivalent to a monotone policy. Due

to proposition 5, monotone payments satisfy the constraint with equality,

thus an optimal solution to problem 5.5 is also an optimal solution to the

problem RPPS. Finally also notice that problem 5.5 is a one armed restless

bandit problem, explored in Nino-Mora (2007).

46

Let πx denote a monotone policy, such that I t = 1 ⇔ ct ≥ cx. Let

fkx denote expected discounted returns under policy πx with initial state ck.

Similarly let gkx denote expected discounted utilization under policy πx with

initial state ck. Formally:

fkx = E(∞∑t=0

δtR(ct|I t)|πx, c0 = ck)

gkx = E(∞∑t=0

δtI t|πx, c0 = ck)

Since monotone policies automatically induce a family of nested sets,

utilizing Nino-Mora (2007), the marginal productivity index for any state cx

denoted λ(cx) for the relaxed problem can be readily computed as

λ(ck) =fxx − fxx+1

gxx − gxx+1

Utilizing the strong Markov property along with Wald’s Identity, the

components of the index can be calculated in closed form.

fxx =(1− δ + δq)

(1− δ)(1− δ + δr + δq)(vo − cx − pa(cx))

+(δr)

(1− δ)(1− δ + δr + δq)(wo − pp(cx−1))

fxx+1 =δq

(1− δ)(1− δ + δr + δq)(vo − cx+1 − pa(cx+1))

+(1− δ + δr)

(1− δ)(1− δ + δr + δq)(wo − pp(cx))

47

Similarly,

gxx =(1− δ + δq)

(1− δ)(1− δ + δr + δq)

gxx+1 =δq

(1− δ)(1− δ + δr + δq)

Due to Gittins, Glazebrook, and Weber (2011) we know that the index

being monotone in the state is a sufficient condition for the problem to be

indexable, hence the index identified indeed captures the marginal produc-

tivity. Furthermore as there is just a single restless arm as shown by Jacko

(2009) the relaxed problem is optimally solved by indices.

The next step is to observe that the optimal solution of the relaxed prob-

lem is feasible in PPS. Due to proposition 5 the incentive constraint of the

agent binds with equality on every history that is reachable under the index

policy. The only thing that needs to be checked is the principals constraint

is satisfied.

Proposition 6. Under a monotone utilization policy with threshold λ∗ the

principal’s incentive condition is satisfied in all states.

Proof. It is helpful to introduce the following definitions

Definition 6 (Index policy augmented with threshold λ). A utilization pol-

icy identified by indices λ(cx)cx∈C and threshold λ is an index policy with

threshold λ if

I t = 1⇔ λ(ct) ≥ λ and λ(ct) ≥ 0

An index policy augmented with threshold λ chooses a monotone policy

via a single threshold and since the indices are increasing in the state, all

monotone policies can be identified by a threshold.

Definition 7 (Workload limit with threshold λ). For any index policy aug-

mented with threshold λ, the workload limit for an agent (when it exists) is

48

given by

c(λ) = maxc ∈ c1, c2, . . . , cn−1, cn : λ(c) ≥ λ.11

The workload limit for the agent indicates the lowest level of costs that

the agent is utilized at a given threshold λ. If the workload limit does not

exist for an agent at some threshold, then that agent is never utilized with a

threshold λ. Furthermore it is easy to see that c(λ) is increasing in λ. The

rest of the proof follows from the following two lemmas:

Lemma 3. fxx ≤ fx+1x+1 .

Proof of lemma 3. Observe that

fxx =(1− δ + δq)

(1− δ)(1− δ + δr + δq)(vo − ve + (1− δ)F )

+(δr)

(1− δ)(1− δ + δr + δq)(wo − pp(cx−1))

and

fx+1x+1 =

(1− δ + δq)

(1− δ)(1− δ + δr + δq)(vo − ve + (1− δ)F )

+(δr)

(1− δ)(1− δ + δr + δq)(wo − pp(cx))

But since pp(cx) ≤ pp(cx−1) we must have, fx+1x+1 ≥ fxx .

Lemma 4. fkx ≥ fxx for any k > x.

Proof of lemma 4. Observe that the principal’s expected profits with mono-

tone payments starting from any state ck is equal to:

fkx = gkx(vo − ve + (1− δ)F ) +

(1

1− δ− gkx

)(wo − pp(cx))

11Notice I used min here since indices are increasing in state.

49

Since (vo − ve + (1 − δ)F ) ≥ (wo − pp(cx)) for any x we only need to show

that. Letting τx denote inft≥0t : ct = cx by definition we also have

gkx = E(τx−1∑t=0

1|c0 = ck) + gxx

Thus, when k > x, gkx is increasing in k, which in turn implies fkx is increasing

in k for k > x.

Lemma 5. If fxx ≥(1−δ)(vo−ve+(1−δ)F )+δrwe

(1−δ)(1−δ+δr) , then fxx , fx−1x ≥ we/(1− δ).

Proof of lemma 5. Notice that

fxx =vo − ve + (1− δ)F

1− δ(1− r)+

δr

1− δ(1− r)fx−1x

Plugging in the above:

fxx ≥(1− δ)(vo − ve + (1− δ)F ) + δrwe

(1− δ)(1− δ + δr)

vo − ve + (1− δ)F1− δ(1− r)

+δr

1− δ(1− r)fx−1x ≥ (1− δ)(vo − ve + (1− δ)F ) + δrwe

(1− δ)(1− δ + δr)

Simplifying the terms yield

fx−1x ≥ we/(1− δ)

Noticing that gx−1x = gxx+1 we have gxx ≥ gx−1

x = gxx+1. Moreover we also have

fkx = gkx(vo − ve + (1− δ)F ) +

(1

1− δ− gkx

)(wo − pp(cx))

Since (vo − ve + (1 − δ)F ) ≥ (wo − pp(cx)) for any x, it must be the case

fxx ≥ fx−1x ≥ we/(1− δ).

Notice that due to lemma 3 fxx is increasing in state, thus a minima

50

is well defined. Due to lemma 4 we know that if the principal’s incentive

condition is satisfied at a threshold x, then it is also satisfied at any higher

state. Finally, due to lemma 5 we know that if x is a threshold that satisfies

the condition, then both at the threshold and the state immediately below

(where the agent is paid despite not being utilized) the incentive condition

of the principal is satisfied. Thus at the smallest positive threshold, both the

indices are positive and the principals’ incentive condition is satisfied.

5.1.3 Proof of Proposition 1

Proof of Proposition 1. The proposition follows directly from the following

lemmas and corollaries.

Lemma 6. For any x and −1 ≤ k ≤ n− x, as r increases gx+kx decreases.

Proof of lemma 6. Starting from the recurrent states:

gxx =(1− δ + δq)

(1− δ)(1− δ + δr + δq)

gx−1x =

δq

(1− δ)(1− δ + δr + δq)

are both decreasing in r. For any higher state x+ k for k ≤ n− x.

gx+kx =

1

1− δ(1− r)

1−(

rδ1−δ(1−r)

)k+1

1−(

rδ1−δ(1−r)

) +

(rδ

1− δ(1− r)

)k+1(δq)

(1− δ)(1− δ + δr + δq)

Differentiating yields

∂gx+kx

∂r=− k

δ(1− rδ1−δ(1−r))

(1− δ(1− r))2(

rδ

1− δ(1− r))k − qδ2

1− δ( rδ

1−δ(1−r))k+1

(1− δ + δq + δr)2

+kqδ2

(1− δ)(1− δ + δr + δq)

δ(1− rδ1−δ(1−r))

1− δ(1− r)(

rδ

1− δ(1− r))k ≤ 0

51

Lemma 7. For any k > 1 as r increases U ea(k) increases.

Proof of lemma 7. The proof is done by simple differentiation, observe that

for any state ck,

U ea(k) =

k−2∑l=0

(rδ

1− δ(1− r)

)lve − ck−l

1− δ(1− r)+

(rδ

1− δ(1− r)

)k−1ve − c1

1− δ− F

Differentiating yields

∂U ea(k)

∂r=

1

r2δ

[(k − 1)

((v − c1)

rδ

1− δ + δr

)k+ r2

]

+k−2∑l=0

rl−1δlv − ck−l

(1− δ + δr)l+2(l(1− δ) + rδ) ≥ 0

From lemma 7 it is easy to ascertain by a straightforward calculation, the

following corollary.

Corollary 1. For any x as r increases pp(cx) increases.

Lemma 8. Holding the threshold constant, the expected profits of the prin-

cipal decrease as r increases.

Proof of lemma 8. Let the threshold level be cx. First, observe that due to

monotone payments at any cost level ck ≥ cx higher than the threshold the

monotone payments to the agent for utilization is ve − cl − (1 − δ)F . But

this also implies that above the threshold level the profits of the principal is

independent of the cost level of the agent and equal to vo − ve + (1 − δ)F .

Thus the principal’s expected profits with monotone payments starting from

52

any state ck ≥ cx is equal to:

fkx = gkx(vo − ve + (1− δ)F ) +

(1

1− δ− gkx

)(wo − pp(cx))

But from lemma 6 and corollary 1 we know that gkx is decreasing in r, and

pp(cx) is increasing in r. Since (vo − ve + (1 − δ)F ) ≥ (wo − pp(cx)) for any

x, it must be the case that fkx is decreasing in r.

Lemma 9. Expected profits of the principal is weakly decreasing in r.

Proof of lemma 9. Suppose not, then there exists r′ ≥ r such that the profits

of the principal at the beginning of the game with r′ denoted by fnx∗(r′)(r′)

is greater than the expected profits of the principal at the beginning of the

game with r, denoted fnx∗(r)(r). Let cx∗(r′) denote the optimal threshold with

learning speed r′ and similarly let cx∗(r) denote the optimal threshold with

learning speed r. By assumption we must have

fnx∗(r′)(r′) > fnx∗(r)(r)

Now consider the expected payoffs of the principal with the suboptimal

policy, that adopts the threshold cx∗(r′) with learning speed r, denoted by

fnx∗(r′)(r). Since r′ ≥ r by lemma 8 it must be the case that:

fnx∗(r′)(r) ≥ fnx∗(r′)(r′) > fnx∗(r)(r)

But cx∗(r) is the optimal threshold for learning speed r, thus it must also be

the case that:

fnx∗(r)(r) ≥ fnx∗(r′)(r) ≥ fnx∗(r′)(r′) > fnx∗(r)(r)

leading to the desired contradiction.

53

5.2 Multiple Sourcing Model

5.2.1 Proof Idea for Theorem 2

The proof of theorem 2 will be done as follows. First, we will assume that

fastest prices are going to be utilized as a payment scheme to directly tie the

payment made to an agent to utilization of said agent. Then a relaxation

of the principal’s problem will be considered. Using the average utilization

constraint the relaxed problem will be decoupled as the fastest payments are

agent specific. The single agent problems are tied via a Lagrange multiplier,

but for any fixed Lagrange multiplier the single agent problems are also single

arm restless bandits. Once again single arm bandits will have indices iden-

tified taking into account the fastest payments and an optimal index rule

will be delivered for the relaxed problem. Finally, the return to the original

problem is achieved by observing the fact that the incentive constraint of the

agent is already incorporated to the indices due to the payments. Principal’s

incentive constraint will be satisfied by making sure that indices are posi-

tive. The average employment constraint will be satisfied by making sure

the indices are not only positive, but also are above a threshold.

5.2.2 Proof of Theorem 2

Fixing the payment rule to be the fastest prices, since there are no returns

when an agent is not utilized, with a slight abuse of notation let Ri(cti) =

v− cti− pti. Similar to the sole sourcing model, consider the following relaxed

version of the principal’s problem.

[max

Iti i∈Nt∈NE(

∞∑t=0

N∑i=1

δtI tiRi(cti))

]

subject to E(∞∑t=0

N∑i=1

δt(1− I ti )) ≥ 0

(5.6)

54

The Lagrangian for the relaxed problem is as follows

maxIti i∈Nt∈N

[E(

N∑i=1

∞∑t=0

δt(I tiRi(cti))|c0) + λ

(E(

N∑i=1

∞∑t=0

δt(1− I ti )|c0)

)]

Rearranging the terms yield,

maxIti i∈Nt∈N

E(N∑i=1

∞∑t=0

δt(I tiRi(cti) + (1− I ti )λ)|c0) (5.7)

Due to the linearity of expectations equation 5.7 can be further rearranged

as follows:

maxIti i∈Nt∈N

[N∑i=1

E(∞∑t=0

δt(I tiRi(cti) + (1− I ti )λ|c0))

](5.8)

The final iteration yields a relaxed restless bandit problem. Whittle

(1988) has shown that this relaxation is solved optimally arm by arm by

index policies if the problem is indexable. In particular, it is easy to see

that the entire sum is going to be maximized if each individual summand is

maximized. A single summand is a single arm restless bandit problem where

passive rewards are equal to λ.

5.2.2.1 Single Arm Problems

Each of the summands in problem 5.8 is as follows:

maxIti t∈N

E(∞∑t=0

δt(I tiRi(cti) + (1− I ti )λ)|c0

i ) (λ-passive problem for agent i)

Since the focus is on a single agent the notation regarding the agent is

suppressed whenever it is clear.

For this problem consider the following policies

55

Definition 8 (Monotone Policies). A policy is called monotone if ∃c ∈c1, c2, . . . , cn−1, cn such that for all x > c I t = 0 and for all z ≤ c I t = 1

A policy is called monotone if the agent is utilized whenever his costs are

below a level c and he is not utilized if his costs are higher than c.

Proposition 7. Any Markovian policy has an equivalent monotone policy.

Proof of proposition 7.

Observation 2. Any Markov utilization policy π can be identified by its

active set Sπ, such that I t = 1⇔ ct ∈ Sπ.

Let c0 = c denote the initial state and consider any Markov utilization

policy π, identified with its active set Sπ. Let cx = maxc ∈ Sπ : c ≤ c and

let cx = minc ∈ C \ Sπ : c ≥ c. There are two possible cases

Case 1. c ∈ Sπ. Under policy π for all t, ct ∈ c, . . . cx. Moreover, for all

t, I t = 1 ⇔ ct < cx. Now, consider the monotone policy, identified with cx

denoted by Px. Then, by definition under policy Px for all t, ct ∈ c, . . . , cx.Moreover, for all t, I t = 1⇔ ct < cx. Thus the two policies are equivalent.

Case 2. c 6∈ Sπ. Under policy π for all t, ct ∈ cx, . . . , c. Moreover, for all

t, I t = 0 ⇔ ct > cx. Now, consider the monotone policy, identified with cx

denoted by Px. Then, by definition under policy Px for all t, ct ∈ cx, . . . , c.Moreover, for all t, I t = 0⇔ ct > cx. Thus the two policies are equivalent.

Thus for any Markov policy, starting from any initial state, there is an

equivalent monotone policy.

Similar to the previous section the single agent problems can be solved

optimally via an index policy.

For x < y, let σyx denote the time when an agent i who starts in state x

at time 0, and who works every period reaches the state y. Formally:

σyx = inft > 0 : (cti) = y and c0i = x and Is = 1 ∀s < t.

56

The expected waiting just before changing state E(δσx+1x −1) is given by

E(δσx+1x −1) =

∞∑n=0

δn(1− q)nq

=q

1− δ(1− q)

Similarly for state any state let γ denote the time it takes an agent to reini-

tialize from any state ck > c1 at time 0, who rests every period. Formally:

γ = inft > 0 : (cti) = c1 and c0i 6= c1 and Is = 0 ∀s < t.

The expected discounted waiting time before reinitializing E(δγ−1) is

given by

E(δγ−1) =∞∑n=0

δn(1− r)nr

=r

1− δ(1− r)

Let πx denote a monotone policy, such that I t = 1 ⇔ ct ≥ cx. Let

fkx denote expected discounted returns under policy πx with initial state ck.

Similarly let gkx denote expected discounted utilization under policy πx with

initial state ck. Formally:

fkx = E(∞∑t=0

δtI tRi(ct)|πx, c0 = ck)

gkx = E(∞∑t=0

δtI t|πx, c0 = ck)

Since monotone policies automatically induce a family of nested sets,

utilizing Nino-Mora (2007), the marginal productivity index for any state cx

57

denoted λ(cx) for the relaxed problem can be readily computed as

λ(ck) =fxx − fxx−1

gxx − gxx−1

The nested sets are decreasing now since the law of motion under the

active action is reversed.

Finally, once again utilizing Wald’s identity along with strong markov

property the components of the index can be calculated as follows:

fxx = E

σx+1x −1∑t=0

δt(v − cx) + δσx+1x +γ

σx1−1∑n=0

δnv − cn+1 + δσx1 fxx

− (v − l)

=

v − cx1− δ(1− q)

− (v − l)

+δr

1− δ(1− r)qδ

1− δ(1− q)

[x−2∑n=0

v − cn+1

1− δ(1− q)

(δq

1− δ(1− q)

)n+

(δq

1− δ(1− q)

)x−1

fxx

]

=

v−cx1−δ(1−q) + δr

1−δ(1−r)qδ

1−δ(1−q)

[∑x−2n=0

v−cn+1

1−δ(1−q)

(δq

1−δ(1−q)

)n]− (v − l)

1− δr1−δ(1−r)

(δq

1−δ(1−q)

)x

fxx−1 =δr

1− δ(1− r)E

σx1−1∑n=0

δn(v − cn+1) + δσx1 fxx−1 − (v − l)

=

δr1−δ(1−r)

[∑x−2n=0

v−cn+1

1−δ(1−q)

(δq

1−δ(1−q)

)n]− δ(v − l)

1− δr1−δ(1−r)

(δq

1−δ(1−q)

)x−1

58

gxx = E

σx+1x −1∑t=0

δt1 + δσx+1x +γ

σx1−1∑n=0

δn1 + δσx1 gxx

=

1

1− δ(1− q)

+δr

1− δ(1− r)qδ

1− δ(1− q)

[x−2∑n=0

1

1− δ(1− q)

(δq

1− δ(1− q)

)n+

(δq

1− δ(1− q)

)x−1

gxx

]

=

11−δ(1−q) + δr

1−δ(1−r)qδ

1−δ(1−q)

[∑x−2n=0

11−δ(1−q)

(δq

1−δ(1−q)

)n]1− δr

1−δ(1−r)

(δq

1−δ(1−q)

)x

gxx−1 =δr

1− δ(1− r)E

σx1−1∑n=0

δn1 + δσx1 gxx−1

=

δr1−δ(1−r)

[∑x−2n=0

11−δ(1−q)

(δq

1−δ(1−q)

)n]1− δr

1−δ(1−r)

(δq

1−δ(1−q)

)x−1

5.2.2.2 Optimal Threshold

Given that each λ-passive problem has a solution in monotone policies, the

final step is to identify which monotone policy is optimal for each arm. In

that direction we first introduce the following notions

Definition 9 (Index policy augmented with threshold λ). A utilization policy

identified by indices λi(ci,x)ci,x∈Cii∈N and threshold λ is an index policy

with threshold λ if

I ti = 1⇔ λi(cti) ≥ λ and λi(c

ti) ≥ 0

An index policy augmented with threshold λ chooses a monotone policy

59

for each arm in 5.7 since the identified indices arise from the decoupling of

problem 5.7 the selection of a single threshold is without loss of generality.

Definition 10 (Workload limit with threshold λ). For any index policy aug-

mented with threshold λ, the workload limit for an agent (when it exists) is

given by

ci(λ) = maxci ∈ ci,1, ci,2, . . . , ci,ni−1, ci,ni : λi(ci) ≥ λ.12

Here it is helpful to introduce the following notation as well, k(λ) denotes

the ordinal level of ci(λ), i.e. ci(λ) = ci,k(λ) where k(λ) ∈ 1, 2, . . . , ni. The

workload limit for an agent indicates the highest level of costs that the agent

is utilized at a given threshold λ. If the workload limit does not exist for an

agent at some threshold, then that agent is never utilized with a threshold

λ. Furthermore it is easy to see that for each arm ci(λ) is decreasing in λ.

For identifying the expected discounted total utilization of a single arm,

we return back to the components of the index.

g1i,ci(λ) = E(

∞∑t=0

δtI ti |I ti = 1⇔ cti ≤ ci(λ))

=x−2∑n=0

1

1− δ(1− q)

(δq

1− δ(1− q)

)n+

(δq

1− δ(1− q)

)x−1

gci(λ)i,ci(λ)

Observe that for each arm since the workload limits are decreasing as the

threshold decreases, the sum of expected utilizations denoted as

∑i∈N

g1i,ci(λ) = E(

∞∑t=0

∑i∈N

I ti )

is also decreasing in λ in a monotone fashion. Thus by starting from a very

12Notice I used max here since indices are decreasing in state.

60

low λ and slowly increasing λ it is possible to make sure that the constraint

E(∞∑t=0

∑i∈N

I ti ) ≤ 1/(1− δ)

is satisfied, with one caveat, there might be a need to randomize for one agent

at the workload limit for a given λ if the constraint is to be satisfied with

equality since for a fixed discount factor the utilizations make discrete jumps

as λ changes. There are two potential cases that needs to be considered

for finding the optimal λ and whether the constraint will be satisfied with

equality, which ties the problem back to the principal’s incentive constraint.

Again to identify the connection between the indices and the principal’s

incentive constraint, we return back to another component of the index,

where

fci(λ)i,ci(λ) = E(

∞∑t=0

δtRi(cti)I

ti |I ti = 1⇔ cti ≤ ci(λ))

=

v−ci(λ)1−δ(1−q) + δr

1−δ(1−r)qδ

1−δ(1−q)

[∑k(λ)−2n=0

v−cn+1

1−δ(1−q)

(δq

1−δ(1−q)

)n]− (v − l)

1− δr1−δ(1−r)

(δq

1−δ(1−q)

)k(λ)

Notice that for any m ∈ 1, 2, . . . , k(λ) we also have fci(λ)−mi,ci(λ) ≥ f

ci(λ)i,ci(λ). Thus

the only point where the IC of the principal binds is at the workload limit.

However observe that by definition since indices are restricted to be positive,

fci(λ)i,ci(λ) ≥ 0 whenever I ti = 1, thus the principal’s IC will always be satisfied.

With this case ruled out the optimal threshold is purely driven by the total

utilization.

Observation 3. Optimal threshold λ∗ is the smallest λ ∈ R such that

∑i∈N

g1i,ci(λ) ≤ E(

∞∑t=0

∑i∈N

I ti )

61

Due to the discrete nature of the g1i,ci(λ)’s some randomization at exactly

the threshold level will be necessary, however it is easy to see if the agents

are not identical, only one agent will ever be randomizing.

References

Abreu, D., D. Pearce, and E. Stacchetti (1990): “Toward a Theory of

Discounted Repeated Games with Imperfect Monitoring,” Econometrica,

58(5), 1041–1063.

Andrews, I., and D. Barron (2013): “The allocation of future business:

Dynamic relational contracts with multiple agents,” American Economic

Review.

Argote, L., S. L. Beckman, and D. Epple (1990): “The persistence

and transfer of learning in industrial settings,” Management science, 36(2),

140–154.

Arrunada, B., and X. H. Vazquez (2006): “When your contract man-

ufacturer becomes your competitor,” Harvard business review, 84(9), 135.

Baker, G., R. Gibbons, and K. J. Murphy (2002): “Relational Con-

tracts and the Theory of the Firm,” Quarterly Journal of economics, pp.

39–84.

Benkard, C. L. (2000): “Learning and forgetting: The dynamics of aircraft

production,” American Economic Review, 90(4), 1034–1054.

Bergemann, D., and J. Valimaki (1996): “Learning and strategic pric-

ing,” Econometrica: Journal of the Econometric Society, pp. 1125–1149.

Besanko, D., U. Doraszelski, Y. Kryukov, and M. Satterthwaite

(2010): “Learning-by-doing, organizational forgetting, and industry dy-

namics,” Econometrica, 78(2), 453–508.

62

Blackwell, D. (1965): “Discounted Dynamic Programming,” The Annals

of Mathematical Statistics, 36(1), pp. 226–235.

Board, S. (2011): “Relational contracts and the value of loyalty,” The

American Economic Review, pp. 3349–3367.

Bolton, P., and C. Harris (1999): “Strategic experimentation,” Econo-

metrica, 67(2), 349–374.

Casadesus-Masanell, R., D. B. Yoffie, and S. Mattu (2002): “Intel

Corporation: 1968-2003,” .

Cisternas, G., and N. Figueroa (2015): “Sequential procurement auc-

tions and their effect on investment decisions,” The RAND Journal of

Economics, 46(4), 824–843.

Darr, E. D., L. Argote, and D. Epple (1995): “The acquisition, trans-

fer, and depreciation of knowledge in service organizations: Productivity

in franchises,” Management science, 41(11), 1750–1762.

Fryer, R., and P. Harms (2017): “Two-armed restless bandits with im-

perfect information: Stochastic control and indexability,” Mathematics of

Operations Research.

Gittins, J., K. Glazebrook, and R. Weber (2011): Multi-armed bandit

allocation indices. John Wiley & Sons.

Gittins, J. C. (1979): “Bandit processes and dynamic allocation indices,”

Journal of the Royal Statistical Society. Series B (Methodological), pp.

148–177.

Glazebrook, K., D. Hodge, C. Kirkbride, et al. (2013): “Monotone

policies and indexability for bidirectional restless bandits,” Advances in

Applied Probability, 45(1), 51–85.

63

Glazebrook, K., J. Nino-Mora, and P. Ansell (2002): “Index policies

for a class of discounted restless bandits,” Advances in Applied Probability,

pp. 754–774.

Glazebrook, K., D. Ruiz-Hernandez, and C. Kirkbride (2006):

“Some indexable families of restless bandit problems,” Advances in Ap-

plied Probability, pp. 643–672.

Gray, J. V., B. Tomlin, and A. V. Roth (2009): “Outsourcing to

a Powerful Contract Manufacturer: The Effect of Learning-by-Doing,”

Production and Operations Management, 18(5), 487–505.

Jacko, P. (2009): “Marginal productivity index policies for dynamic prior-

ity allocation in restless bandit models,” .

(2011): “Optimal index rules for single resource allocation to

stochastic dynamic competitors,” in Proceedings of the 5th International

ICST Conference on Performance Evaluation Methodologies and Tools,

pp. 425–433. ICST (Institute for Computer Sciences, Social-Informatics

and Telecommunications Engineering).

Keller, G., S. Rady, and M. Cripps (2005): “Strategic experimentation

with exponential bandits,” Econometrica, 73(1), 39–68.

Klein, N., and S. Rady (2011): “Negatively Correlated Bandits,” The

Review of Economic Studies.

Lakenan, B., D. Boyd, and E. Frey (2001): “Why Cisco fell: outsourc-

ing and its perils,” Strategy and Business, pp. 54–65.

Langer, E. S. (2015a): 12th Annual Report and Survey of Biopharmaceu-

tical Manufacturing Capacity and Production: A Study of Biotherapeu-

tic Developers and Contract Manufacturing Organizations. BioPlan Asso-

ciates, Incorporated.

64

(2015b): “CMOs Facing Significant Capacity Constraints,” Con-

tract Pharma.

Levin, J. (2002): “Multilateral contracting and the employment relation-

ship,” Quarterly Journal of economics, pp. 1075–1103.

(2003): “Relational incentive contracts,” The American Economic

Review, 93(3), 835–857.

Lewis, T. R., and H. Yildirim (2002): “Managing Dynamic Competi-

tion,” The American Economic Review, 92(4), 779–797.

Malcomson, J. M., et al. (2010): Relational incentive contracts. Depart-

ment of Economics, University of Oxford.

Marcet, A., and R. Marimon (2011): “Recursive contracts,” .

McCoy, M. (2003): “Serving emerging pharma,” Chemical & engineering

news, 81(16), 21–33.

Nino-Mora, J. (2002): “Dynamic allocation indices for restless projects

and queueing admission control: a polyhedral approach,” Mathematical

programming, 93(3), 361–413.

Nino-Mora, J. (2007): “Dynamic priority allocation via restless bandit

marginal productivity indices,” Top, 15(2), 161–198.

Nino-Mora, J., et al. (2001): “Restless bandits, partial conservation laws

and indexability,” Advances in Applied Probability, 33(1), 76–98.

Pandya, E., and K. Shah (2013): “CONTRACT MANUFACTURING IN

PHARMA INDUSTRY.,” Pharma Science Monitor, 4(3).

Papadimitriou, C. H., and J. N. Tsitsiklis (1999): “The complexity of

optimal queuing network control,” Mathematics of Operations Research,

24(2), 293–305.

65

Plambeck, E. L., and T. A. Taylor (2005): “Sell the plant? The im-

pact of contract manufacturing on innovation, capacity, and profitability,”

Management Science, 51(1), 133–150.

Puterman, M. L. (2014): Markov decision processes: discrete stochastic

dynamic programming. John Wiley & Sons.

Rajaram, S. (2015): “Electronics Contract Manufacturing and Design Ser-

vices: The Global Market,” Discussion paper, BCC Research.

Rosenberg, D., E. Solan, and N. Vieille (2007): “Social Learning in

One-Arm Bandit Problems,” Econometrica, 75(6), 1591–1611.

Serfozo, R. (2009): Basics of applied stochastic processes. Springer Science

& Business Media.

Shafer, S. M., D. A. Nembhard, and M. V. Uzumeri (2001): “The

effects of worker learning, forgetting, and heterogeneity on assembly line

productivity,” Management Science, 47(12), 1639–1653.

Strulovici, B. (2010): “Learning while voting: Determinants of collective

experimentation,” Econometrica, 78(3), 933–971.

Thompson, P. (2007): “How much did the Liberty shipbuilders forget?,”

Management science, 53(6), 908–918.

(2010): “Learning by doing,” Handbook of the Economics of Inno-

vation, 1, 429–476.

Tully, S. (1994): “Youll never guess who really makes,” Fortune, 130(7),

124–128.

Whittle, P. (1988): “Restless bandits: Activity allocation in a changing

world,” Journal of applied probability, pp. 287–298.

66

Download - Contract Manufacturing Relationships - Home | OpenScholar ... · Contract Manufacturing Relationships Can Urgun December 2017 Abstract ... droni, Niko Matouschek, Bruno Strulovici,

Top Related