differential games - university of texas at dallassethi/postscript/opre7320_ch_14.pdf ·...

21
Chapter 14 Differential Games In previous chapters, we were mainly concerned with the optimal control problems formulated by a single objective function (or a single decision maker). However, there are situations when there may be more than one decision maker, each having one’s own objective function that each is trying to maximize, subject to a set of differential equations. This extension of the optimal control theory is referred to as the theory of differential games. The study of differential games was initiated by Isaacs (1965). Af- ter the development of Pontryagin’s maximum principle, it became clear that there was a connection between differential games and optimal con- trol theory. In fact, differential game problems represent a generalization of optimal control problems in cases where there are more than one con- troller or player. However, differential games are conceptually far more complex than optimal control problems in the sense that it is no longer obvious what constitutes a solution; see Starr and Ho (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974), Case (1979), Selten (1975), Mehlmann (1988), Berkovitz (1994), Basar and Olsder (1999), Dockner, Jørgensen, Long, and Sorger (2000), and Basar, Bensoussan, and Sethi (2010). Indeed, there are a number of different types of so- lutions such as minimax, Nash, Stackelberg, along with possibilities of cooperation and bargaining; see, e.g., Tolwinski (1982) and Haurie, Tol- winski, and Leitmann (1983). We will discuss minimax solutions for zero- sum differential games in Section 14.1, Nash solutions for nonzero-sum games in Section 14.2, and Stackelberg differential games in Section 14.3. 385

Upload: phungkhue

Post on 25-Jul-2018

223 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

Chapter 14

Differential Games

In previous chapters, we were mainly concerned with the optimal controlproblems formulated by a single objective function (or a single decisionmaker). However, there are situations when there may be more thanone decision maker, each having one’s own objective function that eachis trying to maximize, subject to a set of differential equations. Thisextension of the optimal control theory is referred to as the theory ofdifferential games.

The study of differential games was initiated by Isaacs (1965). Af-ter the development of Pontryagin’s maximum principle, it became clearthat there was a connection between differential games and optimal con-trol theory. In fact, differential game problems represent a generalizationof optimal control problems in cases where there are more than one con-troller or player. However, differential games are conceptually far morecomplex than optimal control problems in the sense that it is no longerobvious what constitutes a solution; see Starr and Ho (1969), Ho (1970),Varaiya (1970), Friedman (1971), Leitmann (1974), Case (1979), Selten(1975), Mehlmann (1988), Berkovitz (1994), Basar and Olsder (1999),Dockner, Jørgensen, Long, and Sorger (2000), and Basar, Bensoussan,and Sethi (2010). Indeed, there are a number of different types of so-lutions such as minimax, Nash, Stackelberg, along with possibilities ofcooperation and bargaining; see, e.g., Tolwinski (1982) and Haurie, Tol-winski, and Leitmann (1983). We will discuss minimax solutions for zero-sum differential games in Section 14.1, Nash solutions for nonzero-sumgames in Section 14.2, and Stackelberg differential games in Section 14.3.

385

Page 2: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

386 14. Differential Games

14.1 Two-Person Zero-Sum Differential Games

Consider the state equation

x = f(x, u, v, t), x(0) = x0, (14.1)

where we may assume all variables to be scalar for the time being. Ex-tension to the vector case simply requires appropriate reinterpretationsof each of the variables and the equations. In this equation, we let uand v denote the controls applied by players 1 and 2, respectively. Weassume that

u(t) ∈ U, v(t) ∈ V, t ∈ [0, T ],

where U and V are convex sets in E1. Consider further the objectivefunction

J(u, v) = S[x(T )] +

∫ T

0F (x, u, v, t)dt, (14.2)

which player 1 wants to maximize and player 2 wants to minimize. Sincethe gain of player 1 represents a loss to player 2, such games are appro-priately termed zero-sum games. Clearly, we are looking for admissiblecontrol trajectories u∗ and v∗ such that

J(u∗, v) ≥ J(u∗, v∗) ≥ J(u, v∗). (14.3)

The solution (u∗, v∗) is known as the minimax solution. Here u∗ and v∗

stand for u∗(t), t ∈ [0, T ], and v∗(t), t ∈ [0, T ], respectively.The necessary conditions for u∗ and v∗ to satisfy (14.3) are given by

an extension of the maximum principle. To obtain these conditions, weform the Hamiltonian

H = F + λf (14.4)

with the adjoint variable λ satisfying the equation

λ = −Hx, λ(T ) = Sx[x(T )]. (14.5)

The necessary condition for trajectories u∗ and v∗ to be a minimax so-lution is that for t ∈ [0, T ],

H(x∗(t), u∗(t), v∗(t), λ(t), t) = minv∈V

maxu∈U

H(x∗(t), u, v, λ(t), t), (14.6)

which can also be stated, with suppression of (t), as

H(x∗, u∗, v, λ, t) ≥ H(x∗, u∗, v∗, λ, t) ≥ H(x∗, u, v∗, λ, t) (14.7)

Page 3: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

14.1. Nash Differential Games 387

for u ∈ U and v ∈ V. Note that (u∗, v∗) is a saddle point of the Hamil-tonian function H.

Note that if u and v are unconstrained, i.e., when, U = V = E1,condition (14.6) reduces to the first-order necessary conditions

Hu = 0 and Hv = 0, (14.8)

and the second-order conditions are

Huu ≤ 0 and Hvv ≥ 0. (14.9)

We now turn to the treatment of nonzero-sum differential games.

14.2 Nash Differential Games

In this section, let us assume that we have N players where N ≥ 2. Letui ∈ U i, i = 1, 2, . . . , N, represent the control variable for the ith player,where U i is the set of controls from which the ith player can choose. Letthe state equation be defined as

x = f(x, u1, u2, . . . , uN , t). (14.10)

Let J i, defined by

J i = Si[x(T )] +

∫ T

0F i(x, u1, u2, . . . , uN , t)dt, (14.11)

denote the objective function which the ith player wants to maximize. Inthis case, a Nash solution is defined by a set of N admissible trajectories

{u1∗, u2∗, . . . , uN∗}, (14.12)

which have the property that

J i(u1∗, u2∗, . . . , uN∗) =

maxui∈U i

J i(u1∗, u2∗, . . . , u(i−1)∗, ui, . . . , u(i+1)∗, . . . , uN∗) (14.13)

for i = 1, 2, . . . , N.

To obtain the necessary conditions for a Nash solution for nonzero-sum differential games, we must make a distinction between open-loopand closed-loop controls.

Page 4: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

388 14. Differential Games

14.2.1 Open-Loop Nash Solution

The open-loop Nash solution is defined when (14.12) is given as func-tions of time satisfying (14.13). To obtain the maximum principle typeconditions for such solutions to be a Nash solution, let us define theHamiltonian functions

H i = F i + λif (14.14)

for i = 1, 2, . . . , N, with λi satisfying

λi= −H i

x, λi(T ) = Si

x[x(T )]. (14.15)

The Nash control ui∗ for the ith player is obtained by maximizing theith Hamiltonian H i with respect to ui, i.e., ui∗ must satisfy

H i(x∗, u1∗, . . . , u(i−1)∗, ui∗, u(i+1)∗, . . . , uN∗, λ, t) ≥

H i(x∗, u1∗, . . . , u(i−1)∗, ui, u(i+1)∗, . . . , uN∗, λ, t), t ∈ [0, T ],

(14.16)for all ui ∈ U i, i = 1, 2, . . . , N.

Deal, Sethi, and Thompson (1979) formulated and solved an adver-tising game with two players and obtained the open-loop Nash solutionby solving a two-point boundary value problem. In Exercise 14.1, youare asked to obtain their boundary value problem. See also Deal (1979).

14.2.2 Feedback Nash Solution

A feedback Nash solution is obtained when (14.12) is defined in terms ofthe current state of the system. To avoid confusion, we let

ui∗(x, t) = ϕi(x, t), i = 1, 2, . . . , N. (14.17)

For these controls to represent a Nash strategy, we must recognize the de-pendence of the other players’ actions on the state variable x. Therefore,we need to replace the adjoint equation (14.15) by

λi= −H i

x −N∑j=1

H iujϕ

jx = −H i

x −N∑

j=1,j =i

H iujϕ

jx. (14.18)

The presence of the summation term in (14.18) makes the necessarycondition for the feedback solution virtually useless for deriving compu-tational algorithms; see Starr and Ho (1969). It is, however, possible

Page 5: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

14.1. Nash Differential Games 389

to use a dynamic programming approach for solving extremely simplenonzero-sum games, which require the solution of a partial differentialequation. We will use this approach in Section 14.3.

The troublesome summation term in (14.18) is absent in three im-portant cases: (a) in optimal control problems (N = 1) since Huux = 0,(b) in two-person zero-sum games because H1 = −H2 so that H1

u2u2x =

−H2u2u

2x = 0 and H2

u1u1x = −H1

u1u1x = 0, and (c) in open-loop nonzero-

sum games because ujx = 0. It certainly is to be expected, therefore, thatthe feedback and open-loop Nash solutions are going to be different, ingeneral. This can be shown explicitly for the linear-quadratic case.

We conclude this section by providing an interpretation to the adjointvariable λi. It is the sensitivity of the ith player’s profit to a perturbationin the state vector. If the other players are using closed-loop strategies,any perturbation δx in the state vector causes them to revise their con-trols by the amount ϕjxδx. If the ith Hamiltonian H i were extremizedwith respect to uj , j = i, this would not affect the ith player’s profit;but since ∂H i/∂uj = 0 for i = j, the reactions of the other players tothe perturbation influence the ith player’s profit, and the ith player mustaccount for this effect in considering variations of the trajectory.

14.2.3 An Application to the Common-Property FisheryResources

Consider extending the fishery model of Section 10.1 by assuming thatthere are two producers having unrestricted rights to exploit the fishstock in competition with each other. This gives rise to a nonzero-sumdifferential game analyzed by Clark (1976).

Equation (10.2) is modified by

x = g(x)− q1u1x− q2u2x, x(0) = x0, (14.19)

where ui(t) represents the rate of fishing effort and qiuix is the rate ofcatch for the ith producer, i = 1, 2. The control constraints are

0 ≤ ui(t) ≤ U i, i = 1, 2, (14.20)

the state constraints arex(t) ≥ 0, (14.21)

and the objective function for the ith producer is the total present valueof his profits, namely,

J i =

∫ ∞

0(piqix− ci)uie−ρtdt, i = 1, 2. (14.22)

Page 6: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

390 14. Differential Games

To find the Nash solution for this model, we let xi denote the turn-pike (or optimal biomass) level given by (10.12) on the assumption thatthe ith producer is the sole-owner of the fishery. Let the bionomic equi-librium xib and the corresponding control uib associated with producer ibe defined by (10.4), i.e.,

xib =ci

piqiand uib =

g(xib)pi

ci. (14.23)

As shown in Exercise 10.2, xib < xi, and we assume U i to be sufficientlylarge so that uib ≤ U i. We also assume that

x1b < x2b , (14.24)

which means that producer 1 is more efficient than producer 2, i.e.,producer 1 can make a positive profit at any level in the interval (x1b , x

2b ],

while producer 2 loses money in the same interval, except at x2b , wherehe breaks even. For x > x2b , both producers make positive profits.

Since U1 ≥ u1b by assumption, producer 1 has the capability of drivingthe fish stock to a level down to at least x1b which, by (14.24), is less thanx2b . This implies that producer 2 cannot operate at a sustained levelabove x2b ; and at a sustained level below x2b , he cannot make a profit.Hence, his optimal feedback policy is bang-bang:

u2∗(x) =

U2 if x > x2b ,

0 if x ≤ x2b .

(14.25)

As far as producer 1 is concerned, he wants to attain his turnpike levelx1 if x1 ≤ x2b . If x

1 > x2b and x0 ≥ x1, then from (14.25) producer 2 willfish at his maximum rate until the fish stock is driven to x2b . At thislevel, it is optimal for producer 1 to fish at a rate which maintains thefish stock at level x2b in order to keep producer 2 from fishing. Thus, theoptimal feedback policy for producer 1 can be stated as

u1∗(x) =

U1 if x > x1

u1 = g(x1)q1x1 if x = x1

0 if x < x1

, if x1 < x2b , (14.26)

Page 7: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

14.1. Nash Differential Games 391

u1∗(x) =

U1 if x > x2b

g(x2b)

q1x2b

if x = x2b

0 if x < x2b

, if x1 ≥ x2b . (14.27)

The formal proof that policies (14.25)-(14.27) give a Nash solutionrequires direct verification using the result of Section 10.1.2. The Nashsolution for this case means that for all feasible paths u1 and u2,

J1(u1∗, u2∗) ≥ J1(u1, u2∗), (14.28)

and

J2(u1∗, u2∗) ≥ J2(u1∗, u2). (14.29)

The direct verification involves defining a modified growth function

g1(x) =

g(x)− q2U2x if x > x2b ,

g(x) if x ≤ x2b ,

and using the Green’s theorem results of Section 10.1.2. Since U2 ≥ u2b byassumption, we have g1(x) ≤ 0 for x ≥ x2b . From (10.12) with g replacedby g1, it can be shown that the new turnpike level for producer 1 ismin(x1, x2b), which defines the optimal policy (14.26)-(14.27) for producer1. The optimality of (14.25) for producer 2 follows easily.

To interpret the results of the model, suppose that producer 1 orig-inally has sole possession of the fishery, but anticipates a rival entry.Producer 1 will switch from his own optimal sustained yield u1 to amore intensive exploitation policy prior to the anticipated entry.

We can now guess the results in situations involving N producers.The fishery will see the progressive elimination of inefficient producersas the stock of fish decreases. Only the most efficient producers willsurvive. If, ultimately, two or more maximally efficient producers exist,the fishery will converge to a classical bionomic equilibrium, with zerosustained economic rent.

We have now seen that a Nash competitive solution involving N ≥ 2producers results in the long-run dissipation of economic rents. Thisconclusion depends on the assumption that producers face an infinitelyelastic supply of all factors of production going into the fishing effort, but

Page 8: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

392 14. Differential Games

typically the methods of licensing entrants to regulated fisheries makesome attempt also to control the factors of production such as permittingthe licensee to operate only a single vessel of specific size.

In order to develop a model for licensing of fishermen, we let thecontrol variable vi denote the capital stock of the ith producer and letthe concave function f(vi), with f(0) = 0, denote the fishing mortalityfunction for i = 1, 2, . . . , N. This requires the replacement of qiui inthe previous model by f(vi). The extended model becomes nonlinear incontrol variables. You are asked in Exercise 14.3 to formulate this newmodel and develop necessary conditions for a feedback Nash solution forthis model with N producers. The reader is referred to Clark (1976) forfurther details.

For other papers on applications of differential games to fisherymanagement, see Hamalainen, Haurie, and Kaitala (1984, 1985) andHamalainen, Ruusunen, and Kaitala (1986, 1990). For applications toproblems in environmental management, see the edited volume by Car-raro and Filar (1995) on the topic.

14.3 A Stochastic Nash Differential Game inAdvertising

In this section, we will study a competitive extension of the Sethi ad-vertising model discussed in Section 13.3. This will give us a stochasticdifferential game, for which we aim to obtain a feedback Nash equilibriumby using a dynamic programming approach developed in Section 13.1.We should note that this approach can also be used to obtain feedbackNash equilibria in deterministic differential games as an alternative tothe maximum principle approach developed in Section 14.2.2.

Specifically, we consider a duopoly market in a mature product cate-gory where total sales are distributed between two firms, labeled as Firm1 and Firm 2, which compete for market share through advertising ex-penditures. We let Xt denote the market share of Firm 1 at time t, sothat the market share of Firm 2 is (1−Xt). Let U1t and U2t denote theadvertising effort rates of Firms 1 and 2, respectively, at time t. Usingthe subscript i ∈ {1, 2} to reference the two firms, let ri > 0 denotethe advertising effectiveness parameter, πi > 0 denote the sales margin,ρi > 0 denote the discount rate, and ci > 0 denote the cost parameter sothat the cost of advertising effort u by Firm i is ciu

2. Further, let δ > 0 bethe churn parameter, zt be the standard one-dimensional Wiener process,

Page 9: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

14.1. A Stochastic Nash Differential Game in Advertising 393

and σ(x) be the diffusion coefficient function as defined in Section 13.3.Then, in view of the competition between the firms, Prasad and Sethi(2004) extend the dynamics in (13.33) as the Ito stochastic differentialequation

dXt = [r1U1t

√1−Xt − δXt − r2U2t

√Xt + δ(1−Xt) + σ(Xt)dzt],

X(0) = x0 ∈ [0, 1]. (14.30)

We formulate the optimal control problem faced by the two firms as

maxu1≥0

{V1(x0) = E

∫ ∞

0e−ρ1t

[π1Xt − c1U

21t

]dt

}, (14.31)

maxu2≥0

{V2(x0) = E

∫ ∞

0e−ρ2t

[π2(1−Xt)− c2U

22t

]dt

}, (14.32)

subject to (14.30). Thus, each firm seeks to maximize its expected,discounted profit stream subject to the market share dynamics.

To find the feedback Nash equilibrium solution, we form theHamilton-Jacobi-Bellman (HJB) equations for the value function V1(x)and V2(x) :

ρ1V1 = maxu1≥0

{π1x− c1u21 + V1x[r1u1

√1− x− r2u2

√x− δ(2x− 1)]

+(σ(x))2V1xx/2}, (14.33)

ρ2V2 = maxu2≥0

{π2(1− x)− c2u22

+V2x[r1u1√1− x− r2u2

√x− δ(2x− 1)]

+(σ(x))2V2xx/2}. (14.34)

We use the first-order conditions to obtain the optimal feedback adver-tising decisions

u∗1(x) = V1x(x)r1√1− x/2c1 and u∗2(x) = −V2x(x)r2

√x/2c2. (14.35)

Since it is reasonable to expect that V1x ≥ 0 and V2x ≤ 0, these controlswill turn out to be nonnegative as we will see later.

Substituting (14.35) in (14.33) and (14.34), we obtain the Hamilton-Jacobi equations

ρ1V1 = π1x+ V 21xr

21(1− x)/4c1 + V1xV2xr

22x/2c2

−V1xδ(2x− 1) + (σ(x))2V1xx/2, (14.36)

Page 10: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

394 14. Differential Games

ρ2V2 = π2(1− x) + V 22xr

22x/4c2 + V1xV2xr

21(1− x)/2c1

−V2xδ(2x− 1) + (σ(x))2V2xx/2. (14.37)

As in Section 13.3, we look for the following forms for the value functions

V1 = α1 + β1x and V2 = α2 + β2(1− x). (14.38)

These are inserted into (14.36) and (14.37) to determine the unknowncoefficients α1, β1, α2, and β2. Equating the coefficients of x and theconstants on both sides of (14.36) and the coefficients of (1− x) and theconstants on both sides of (14.37), the following four equations emerge,which can be solved for the unknowns α1, β1, α2, and β2:

ρ1α1 = β21r21/4c1 + β1δ, (14.39)

ρ1β1 = π1 − β21r21/4c1 − β1β2r

22/2c2 − 2β1δ, (14.40)

ρ2α2 = β22r22/4c2 + β2δ, (14.41)

ρ2β2 = π2 − β22r22/4c2 − β1β2r

21/2c1 − 2β2δ. (14.42)

Let us first consider the special case of symmetric firms, i.e., whenα = α1 = α2, β = β1 = β2, π = π1 = π2, c = c1 = c2, r = r1 = r2, andρ = ρ1 = ρ2. The four equations in (14.39-14.42) reduce to the followingtwo:

ρα = β2r2/4c+ βδ and ρβ = π − 3β2r2/4c− 2βδ. (14.43)

There are two solution for β. One is negative, which clearly makes nosense. Thus, the remaining positive solution is the correct one. This alsoallows us to obtain the corresponding α. The solution is

α = [(ρ− δ)(W −√W 2 + 12Rπ) + 6Rπ]/18Rρ, (14.44)

β = (√W 2 + 12Rπ)−W/6R, (14.45)

where R = r2/4c and W = ρ + 2δ. With this the value functions in(14.38) are defined, and the controls in (14.35) can be written as

u∗1(x) =β1r1

√1− x

2c1and u∗2(x) =

β2r2√x

2c2,

which are clearly nonnegative as required.We return now to the general case of asymmetric firms. For this, we

re-express equations (14.39-14.42) in terms of a single variable β1, whichis determined by solving the quartic equation

3R21β

41 + 2R1(W1 +W2)β

31 + (4R2π2 − 2R1π1 −W 2

1 + 2W1W2)β21

+ 2π1(W1 −W2)β1 − π21 = 0. (14.46)

Page 11: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

14.1. A Stochastic Nash Differential Game in Advertising 395

This equation can be solved explicitly to give four roots. We will findthat only one of these is positive. We select that as our value of β1.With that, other coefficients can be obtained by solving for α1 and β2and then, in turn, α2, as follows:

α1 = β1(β1R1 + δ)/ρ1, (14.47)

β2 = (π1 − β21R1 − β1W1)/2β1R2, (14.48)

α2 = β2(β2R2 + δ)/ρ2, (14.49)

where R1 = r21/4c1, R2 = r22/4c2, W1 = ρ1 + 2δ, and W2 = ρ2 + 2δ.

In Exercise 14.4 you are asked to use Mathematica or another suit-able software program to solve (14.46) to obtain β1 and then obtainthe coefficients α1, α2, and β2 by using (14.47)-(14.49), when ρ1 = ρ2 =0.05, π1 = π2 = 1, δ = 0.01, R1 = 1, R2 = 4, x0 = 0.5, and σ(x) =√

0.5x(1− x). Figure 14.1 represents a sample path of the market shareof the two firms with this data.

X

Y

X

Y

Figure 14.1: A Sample Path of Optimal Market Share Trajectories

Page 12: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

396 14. Differential Games

14.4 A Stackelberg Differential Game of Coop-erative Advertising

The preceding sections in this chapter dealt with differential games inwhich all players make their decisions simultaneously. We now discussa differential game in which two players make their decisions in a hier-archical manner. The player having the right to move first is called theleader and the other player is called the follower. In case there are twoor more leaders, they play Nash, and the same goes for the followers.

In terms of solutions of Stackelberg differential games, we have open-loop and feedback solutions. An open-loop Stackelberg equilibrium spec-ifies, at the initial time (say, t = 0), the decisions over the entire horizon.As in Section 14.1, there is a maximum principle for open-loop solutions.Typically, open-loop solutions are not time consistent in the sense thatat any time t > 0, the remaining decision may no longer be optimal. Afeedback or Markovian Stackelberg equilibrium, on the other hand, con-sists of decisions expressed as functions of the current state and time.Such a solution is time consistent.

In this section, we will not develop the general theory, for which werefer the reader to Basar and Olsder (1999) and Dockner et al. (2000).Instead, we will formulate a Stackelberg differential game of cooperativeadvertising between a manufacturer as the leader and a retailer as thefollower, and obtain a feedback Stackelberg solution.

The manufacturer sells a product to end users through the retailer.The product is in a mature category where sales, expressed as a fractionof the potential market, is influenced through advertising expenditures.The manufacturer decides on an advertising support scheme via a sub-sidy rate, i.e., he will contribute a certain percentage of the advertisingexpenditure by the retailer. Specifically, the manufacturer decides on asubsidy rate Wt and the retailer decides on the advertising effort levelUt, t ≥ 0.

As in Section 13.3, the cost of advertising is quadratic in the adver-tising effort Ut. Then, with the advertising effort Ut and the subsidy rateWt, the manufacturer’s and the retailer’s advertising expenditures areWtU

2t and (1−Wt)U

2t , respectively.

The sequence of events is as follows. First, the manufacturer an-nounces the feedback subsidy rate w(x) ∈ [0, 1] as a function of the pre-vailing market share x ∈ [0, 1]. This means that at any time t ≥ 0, thesubsidy rate Wt will be given as w(Xt), where Xt represents the market

Page 13: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

14.1. A Stackelberg Differential Game of Cooperative Advertising 397

share at that time. Second, the retailer sets the advertising effort rate Ut

as its optimal response to the manufacturer’s announced decisions. Theretailer accomplishes this by solving an optimization problem to maxi-mize the present value of its profit stream over the infinite horizon. Giventhe manufacturer’s announced policy w(x), the retailer’s optimal controlproblem, using the notation of Section 13.3, is given by the followingobvious modification of the problem (13.33):

maxUt≥0

E

[∫ ∞

0e−ρt(πXt − (1− w(Xt))U

2t )dt

](14.50)

subject to

dXt = (rUt

√1−Xt − δXt)dt+ σ(Xt)dzt, X0 = x0. (14.51)

The retailer’s problem can be solved as in Section 13.3. With V R(x)denoting the retailer’s value function, the retailer’s HJB equation is

ρV R = maxu≥0

{πx− (1− w(x))u2 + V R

x

(ru

√1− x− δx

)+

(σ(x))2V Rxx

2

}.

(14.52)The first-order condition for a maximum gives the form of the retailer’sresponse to the manufacturer’s announced policy w(x) as

u∗(x|w(x)) = V Rx r

√1− x

2(1− w(x)). (14.53)

As in Section 13.3, we will see that this control will turn out to benonnegative. Substituting u∗(x|w) for u in (14.52), we get the retailer’sHamilton-Jacobi equation as

ρV R = πx+(V R

x )2r2(1− x)

4(1− w(x))− V R

x δx+(σ(x))2V R

xx

2. (14.54)

Next we consider the formulation of the manufacturer’s problem.The manufacturer anticipates the retailer’s reaction function (14.53) andincorporates this into his optimal control problem, which can be statedas follows:

max0≤Wt≤1

E

[∫ ∞

0e−ρt

(πMXt −Wt[u

∗(Xt|Wt)]2)dt

](14.55)

subject to

dXt = (ru∗(Xt|Wt)− δXt)dt+ σ(Xt)dzt, X0 = x0, (14.56)

Page 14: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

398 14. Differential Games

where πM denotes the manufacturer’s margin on sales and where wehave, from (14.53),

u∗(Xt|Wt) =V Rx (Xt)r

√1−Xt

2(1−Wt).

Letting VM (x) denote the manufacturer’s value function, we can writethe HJB equation it should satisfy:

ρVM = max0≤w≤1

{πMx− (V R

x )2r2(1− x)w

4(1− w)2

+ VMx

(V Rx r

2(1− x)

2(1− w)− δx

)+

(σ(x))2VMxx

2

}. (14.57)

The first-order condition for maximizing the RHS of (14.57) gives

1 + w

1− w=

2VMx

V Rx

. (14.58)

Since VMx and V R

x are expected to be positive, we see that (14.58) willalways have a solution w < 1. This makes intuitive sense because oth-erwise, w ≥ 1 and the retailer would indulge in an infinite amount ofadvertising, which would not be optimal for the leader. Since, we alsorestrict the subsidy rate to be nonnegative, we can conclude from (14.58)that

w∗(x) = max

{0,

2VMx − V R

x

2VMx + V R

x

}. (14.59)

Next, we investigate the two cases where the subsidy rate is (a) zeroor (b) positive, and determine the condition required for no subsidy tobe optimal.Case (a): No Co-op Advertising (w∗ = 0). Inserting w(x) = 0 into(14.53) gives

u∗(x) =rV R

x

√1− x

2. (14.60)

Inserting w(x) = 0 into (14.57) and (14.54), we have

ρVM = πMx+VMx V R

x r2(1− x)

2− VM

x δx+(σ(x))2VM

xx

2, (14.61)

ρV R = πx+(V R

x )2r2(1− x)

4− V R

x δx+(σ(x))2V R

xx

2. (14.62)

Page 15: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

14.1. A Stackelberg Differential Game of Cooperative Advertising 399

Let VM (x) = αM + βMx and V R(x) = α + βx. Then, VMx = βM and

V Rx = β. Substituting these into (14.61) and (14.62) and equating like

powers of x, we can express all the unknowns in terms of β, which itselfcan be explicitly solved. That is, we obtain

β =2π√

(ρ+ δ)2 + r2π + (ρ+ δ), βM =

2πM2(ρ+ δ) + βr2

,

α =β2r2

4ρ, αM =

ββMr2

2ρ. (14.63)

Using (14.63) in (14.60), we can write u∗(x) =√ρα(1− x). Finally, we

can derive the required condition from (14.59), which is 2VMx ≤ V R

x ,for no co-op advertising (w∗ = 0) in the equilibrium. This is given by2βM ≤ β, or

4πM

2(ρ+ δ) + 2πr2√(ρ+δ)2+r2π+(ρ+δ)

≤ 2π√(ρ+ δ)2 + r2π + (ρ+ δ)

. (14.64)

After a few steps of algebra, this yields the required condition

θ =πM√

(ρ+ δ)2 + r2π− π√

(ρ+ δ)2 + r2π + (ρ+ δ)≤ 0. (14.65)

Next, we obtain the solution when θ > 0.

Case (b): Co-op Advertising (w∗ > 0). Then, (14.59) reducesto

w∗(x) =2VM

x − V Rx

2VMx + V R

x

. (14.66)

Inserting this for w(x) into (14.57) and (14.54), we have

ρVM = πMx− r2(1− x)[4(VMx )2 − (V R

x )2]

16

+VMx r2(1− x)[2VM

x + V Rx ]

4

−VMx δx+

(σ(x))2VMxx

2, (14.67)

ρV R = πx+

[(V R

x )2r2(1− x)

4

] [2VM

x + V Rx

2V Rx

]− V R

x δx+(σ(x))2V R

xx

2.

(14.68)

Page 16: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

400 14. Differential Games

Once again, VM (x) = αM + βMx, VR = α + βx, VM

x = βM , VRx = β.

Substituting these into (14.67) and (14.68) and equating like powers ofx, we have

α =β(β + 2βM )r2

8ρ, (14.69)

(ρ+ δ)β = π − β(β + 2βM )r2

8, (14.70)

αM =(β + 2βM )2r2

16ρ, (14.71)

(ρ+ δ)βM = πM − (β + 2βM )2r2

16. (14.72)

From (14.53), (14.66), and (14.71), we can write

u∗(x) =r(V R

x + 2VMx )

√1− x

4=√ραM (1− x). (14.73)

The four equations (14.69)–(14.72) determine the solutions for thefour unknowns, α, β, αM , and βM . From (14.70) and (14.72), we canobtain

β3 +2πMρ+ δ

β2 +8π

r2β − 8π2

(ρ+ δ)r2= 0. (14.74)

If we denote

a1 =2πMρ+ δ

, a2 =8π

r2, and a3 =

8π2

(ρ+ δ)r2,

then a1 > 0, a2 > 0, and a3 < 0. From Descarte’s Rule of Signs, thereexists a unique, positive real root. The two remaining roots may beboth imaginary or both real and negative. Since this is a cubic equation,a complete solution can be obtained. Using Mathematica or followingSpiegel, Lipschutz, and Liu (2008), we can write down the three roots as

β(1) = S + T − 1

3a1,

β(2) = −1

2(S + T )− 1

3a1 +

√3

2i(S − T ),

β(3) = −1

2(S + T )− 1

3a1 −

√3

2i(S − T ),

with

S =3

√R+

√Q3 +R2, T =

3

√R−

√Q3 +R2, i =

√−1,

Page 17: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

14.1. A Stackelberg Differential Game of Cooperative Advertising 401

where

Q =3a2 − a21

9, R =

9a1a2 − 27a3 − 2a3154

.

Next, we identify the positive root in each of the following three cases:

Case 1: (Q > 0). We have S > 0 > T and Q3 + R2 > 0. Thereis one positive root and two imaginary roots. The positive root isβ = S + T − (1/3)a1.

Case 2: (Q < 0 and Q3 + R2 > 0.) There are three real rootswith one positive root, which is β = S + T − (1/3)a1.

Case 3: (Q < 0 and Q3 + R2 < 0). S and T are both imagi-nary. We have three real roots with one positive root. While subcasescan be given to identify the positive root, for our purposes, it is enoughto identify it numerically.

Finally, we can conclude that 2βM − β > 0 so that w∗ > 0, since ifthis were not the case, then w∗ would be zero, and we would once againbe in Case (a).

We can now summarize the optimal feedback Stackelberg equilibriumin Table 14.1. In Figure 14.2 we also graph the optimal subsidy rate w∗

against the retailer’s margin π, when the parameters πM = 0.5, r =.05, ρ = 2, δ = 1.

Figure 14.2: Optimal Subsidy Rate vs. the Retailer’s Margin

There have been many applications of differential games in market-ing in general and optimal advertising in particular. Some referencesare Bensoussan, Bultez, and Naert (1978), Deal, Sethi, and Thompson(1979), Deal (1979), Jørgensen (1982a), Rao (1984, 1990), Dockner andJørgensen (1986, 1992), Chintagunta and Vilcassim (1992), Chintaguntaand Jain (1994, 1995), Fruchter (1999), Jarrar, Martın-Herran and Zac-cour (2004), Martın-Herran, Taboubi and Zaccour (2005), Breton, Jarrarand Zaccour (2006), Jørgensen and Zaccour (2007), He and Sethi (2008),

Page 18: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

402 14. Differential Games

(a) if θ ≤ 0 (b) if θ > 0

No Co-op Equilibrium Co-op Equilibrium

Retailer’s

profit V R V R(x) = α+ βx V R(x) = α+ βx

Manufacturer’s

profit VM VM (x) = αM + βMx VM (x) = αM + βMx

Coefficients of β = 2π√(ρ+δ)2+rπ+(ρ+δ)

β = πρ+δ − β(β+2βM )r2

8(ρ+δ)

profit functions, βM = 2πM

2(ρ+δ)+βr2 βM = πM

ρ+δ − (β+2βM )2r2

16(ρ+δ)

α, β, αM , βM α = β2r2

4ρ α = β(β+2βM )r2

obtained from: αM = ββMr2

2ρ αM = (β+2βM )2r2

16ρ

Subsidy

rate w∗(x) = 0 2βM−β2βM+β = 1− α

αM

Advertising

effort u∗(x) = rβ√1−x2 =

√ρα(1− x) r(β+2βM )

√1−x

4 =√ραM (1− x)

Table 14.1: Optimal Feedback Stackelberg Solution

Naik, Prasad and Sethi (2008), Zaccour (2008a), Jørgensen, Kort andZaccour (2009), Prasad and Sethi (2009). The literature on advertisingdifferential games is surveyed by Jørgensen (1982a) and the literature onmanagement applications of Stackelberg differential games is reviewedby He et al. (2007). Monographs are written by Erickson (1991) andJørgensen and Zaccour (2004).

For applications of differential games to economics and managementscience in general, see the book by Dockner, Jørgensen, Long, and Sorger(2000).

EXERCISES FOR CHAPTER 14

E 14.1 A Bilinear Quadratic Advertising Model (Deal, Sethi, andThompson, 1979). Let xi be the market share of firm i and ui be its

Page 19: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

Exercises for Chapter 14 403

advertising rate, i = 1, 2. The state equations are

x1 = b1u1(1− x1 − x2) + e1(u1 − u2)(x1 + x2)− a1x1

x1(0) = x10,

x2 = b2u2(1− x1 − x2) + e2(u2 − u1)(x1 + x2)− a2x2

x2(0) = x20,

where bi, ei, and ai are given positive constants. Firm i wants to maxi-mize

Ji = wie−ρTxi(T ) +

∫ T

0(cixi − u2i )e

−ρtdt,

where wi, ci, and ρ are positive constants. Derive the necessary condi-tions for the open-loop Nash solution, and formulate the resulting bound-ary value problem. In a related paper, Deal (1979) provides a numericalsolution to this problem with e1 = e2 = 0.

E 14.2 Let x(t) denote the stock of pollution at time t ∈ [0, T ] thataffects the welfare of two countries. The state dynamics is

x = u+ v, x(0) = x0,

where u and v are emission rates of the leader and the follower, respec-tively. Let their instantaneous utility functions be

u− (u2 + x2)/2 and v − (v2 + x2)/2,

respectively. Obtain the open-loop Nash solution. By re-solving thisproblem at time τ , 0 < τ < T, show that the first solution obtained istime inconsistent.

E 14.3 Develop the nonlinear model for licensing of fisherman describedtoward the end of Section 14.2.3 by rewriting (14.19) and (14.22) for themodel. Derive the adjoint equation for λi for the ith producer, and showthat the feedback Nash policy for producer i is given by

f ′(vi) =ci

(pi − λi)x.

Page 20: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

404 14. Differential Games

E 14.4 Use Mathematica or another suitable software program to solvethe quartic equation (14.46). Show that for ρ1 = ρ2 = 0.05, π1 = π2 = 1,δ = 0.01, R1 = 1, R2 = 4, the only positive solution for β1 is 0.264545.Figure 14.1 gives a sample path of the optimal market shares of the twofirms for this problem. Draw another sample path.

E 14.5 Solve the Stackelberg differential game of Section 14.4 for π =0.25, πM = 0.5, r = 2, ρ = 0.05, δ = 1, and σ(x) =

√0.25x(1− x),

obtain the coefficients α, β, αM , βM , and show that v∗ = 0.58.

E 14.6 Assume no cooperative advertising in Exercise 14.5 and obtainthe value functions of the manufacturer and the retailer. Compare themanufacturer’s profit in this case with that in Exercise 14.5 when x0 =0.5, and obtain the manufacturer’s loss.

E 14.7 Suppose that the manufacturer and the retailer in the problemof Section 14.4 are integrated into a single firm. Then, formulate thestochastic optimal control problem of the integrated firm. Also, usingthe data in Exercise 14.5, obtain the value function V I(x) = αI + βIxof the integrated firm. Finally, compare the profit of the integratedchannel to the sum of the profits of the manufacturer and the retailerwhen x0 = 0.5.

E 14.8 Consider an N-firm oligopoly. Let Si(t) denote the cumulativesales by time t of firm i ∈ {1, 2, ..., N} and define S(t) =

∑Ni=1 Si(t). Let

Ai(t) denote firm i’s advertising rate. With positive constants a, b, andd, assume that the differential game has the diffusion dynamics

Si(t) = [a+ b logAi(t) + dS(t)][M − S(t)], Si(0) = Si0 ≥ 0,

which means that a firm can stimulate its sales through advertising (butsubject to decreasing returns) and that demand learning effects (imita-tion) are industry-wide. (If these effects were firm-specific we would haveSi instead of S in the brackets on the right-hand side of the dynamics.)Payoffs are given by

J i =

∫ T

0[(pi − ci)Si(t)−Ai(t)]dt,

in which prices and unit costs are constant. Since Si(t) in the expressionfor J i is stated in terms of the state variable S(t) and the control vari-ables Ai(t), i ∈ {1, 2, . . . , N}, formulate the differential game problem

Page 21: Differential Games - University of Texas at Dallassethi/Postscript/OPRE7320_Ch_14.pdf · Differential Games ... (1969), Ho (1970), Varaiya (1970), Friedman (1971), Leitmann (1974),

Exercises for Chapter 14 405

with S(t) as the state variable. In the open-loop Nash equilibrium,show that the advertising rates are monotonically decreasing over time.

HINT: Assume ∂2H i/∂S2 ≤ 0 so that H i is concave in S. Usethis condition to prove the monotone property.