Copyright (c) 2008 by Peter N. Ireland. Redistribution is permitted for educational and research purposes, so long as no changes are made. All copies must be provided free of charge and must include this copyright notice.
LECTURE NOTES ON ECONOMIC DYNAMICS
Peter N. Ireland
Department of Economics Boston College
[email protected] http://www2.bc.edu/~irelandp/ec720.html
Two Useful Theorems
Two theorems will prove quite useful in all of our discussions of dynamic optimization: the Kuhn-Tucker Theorem and the Envelope Theorem. Let’s consider each of these in turn.
1 The Kuhn-Tucker Theorem
References:
Dixit, Chapters 2 and 3.
Simon-Blume, Chapter 18.
Consider a simple constrained optimization problem:
x ∈ R choice variable
F : R → R objective function, continuously differentiable
c ≥ G(x) constraint, with c ∈ R and G : R → R, also continuously differentiable.
The problem can be stated as:
max_x F(x) subject to c ≥ G(x)
Probably the easiest way to solve this problem is via the method of Lagrange multipliers. The mathematical foundations that allow for the application of this method are given to us by Lagrange’s Theorem or, in its most general form, the Kuhn-Tucker Theorem.
To prove this theorem, begin by defining the Lagrangian:
L(x, λ) = F (x) + λ[c−G(x)]
for any x ∈ R and λ ∈ R.
Theorem (Kuhn-Tucker) Suppose that x∗ maximizes F(x) subject to c ≥ G(x), where F and G are both continuously differentiable, and suppose that G′(x∗) ≠ 0. Then there exists a value λ∗ of λ such that x∗ and λ∗ satisfy the following four conditions:
L1(x∗, λ∗) = F′(x∗) − λ∗G′(x∗) = 0, (1)
L2(x∗, λ∗) = c−G(x∗) ≥ 0, (2)
λ∗ ≥ 0, (3)
and

λ∗[c − G(x∗)] = 0. (4)
Proof Consider two possible cases, depending on whether or not the constraint is binding at x∗.
Case 1: Nonbinding constraint.
If c > G(x∗), then let λ∗ = 0. Clearly, (2)-(4) are satisfied, so it only remains to show that (1) must hold. With λ∗ = 0, (1) holds if and only if

F′(x∗) = 0. (5)

We can show that (5) must hold using a proof by contradiction. Suppose that instead of (5), it turns out that

F′(x∗) < 0.

Then, by the continuity of F and G, there must exist an ε > 0 such that

F(x∗ − ε) > F(x∗) and c > G(x∗ − ε).

But this result contradicts the assumption that x∗ maximizes F(x) subject to c ≥ G(x). Similarly, if it turns out that

F′(x∗) > 0,

then by the continuity of F and G there must exist an ε > 0 such that

F(x∗ + ε) > F(x∗) and c > G(x∗ + ε).

But, again, this result contradicts the assumption that x∗ maximizes F(x) subject to c ≥ G(x). This establishes that (5) must hold, completing the proof for case 1.
Case 2: Binding Constraint.
If c = G(x∗), then let λ∗ = F′(x∗)/G′(x∗). This is possible, given the assumption that G′(x∗) ≠ 0. Clearly, (1), (2), and (4) are satisfied, so it only remains to show that (3) must hold. With λ∗ = F′(x∗)/G′(x∗), (3) holds if and only if

F′(x∗)/G′(x∗) ≥ 0. (6)

We can show that (6) must hold using a proof by contradiction. Suppose that instead of (6), it turns out that

F′(x∗)/G′(x∗) < 0.
One way that this can happen is if F′(x∗) > 0 and G′(x∗) < 0. But if these conditions hold, then the continuity of F and G implies the existence of an ε > 0 such that

F(x∗ + ε) > F(x∗) and c = G(x∗) > G(x∗ + ε),

which contradicts the assumption that x∗ maximizes F(x) subject to c ≥ G(x). If, instead, F′(x∗)/G′(x∗) < 0 because F′(x∗) < 0 and G′(x∗) > 0, then the continuity of F and G implies the existence of an ε > 0 such that

F(x∗ − ε) > F(x∗) and c = G(x∗) > G(x∗ − ε),

which again contradicts the assumption that x∗ maximizes F(x) subject to c ≥ G(x). This establishes that (6) must hold, completing the proof for case 2.
Notes:
a) The theorem can be extended to handle cases with more than one choice variable
and more than one constraint: see Dixit or Simon-Blume.
b) Equations (1)-(4) are necessary conditions: if x∗ is a solution to the optimization problem, then there exists a λ∗ such that (1)-(4) must hold. But (1)-(4) are not sufficient conditions: if x∗ and λ∗ satisfy (1)-(4), it does not follow automatically that x∗ is a solution to the optimization problem.

Despite point (b) listed above, the Kuhn-Tucker theorem is extremely useful in practice. Suppose that we are looking for the solution x∗ to the constrained optimization problem
max_x F(x) subject to c ≥ G(x).
The theorem tells us that if we form the Lagrangian
L(x, λ) = F (x) + λ[c−G(x)],
then x∗ and the associated λ∗ must satisfy the first-order condition (FOC) obtained by differentiating L with respect to x and setting the result equal to zero:

L1(x∗, λ∗) = F′(x∗) − λ∗G′(x∗) = 0. (1)
In addition, we know that x∗ must satisfy the constraint:
c ≥ G(x∗). (2)
We know that the Lagrange multiplier λ∗ must be nonnegative:
λ∗ ≥ 0. (3)
And finally, we know that the complementary slackness condition
λ∗[c−G(x∗)] = 0, (4)
must hold: if λ∗ > 0, then the constraint must bind; if the constraint does not bind, then λ∗ = 0.

In searching for the value of x that solves the constrained optimization problem, we only need to consider values of x∗ that satisfy (1)-(4).
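The search procedure described above can be illustrated numerically. The sketch below is a hypothetical example, not from the notes: it takes F(x) = log(x) with c = 1 and G(x) = x, so that F′(x) > 0 everywhere, the constraint binds at the optimum, x∗ = 1, and λ∗ = F′(x∗)/G′(x∗) = 1. It solves the problem with SciPy and checks conditions (1)-(4).

```python
# A minimal numerical check of conditions (1)-(4), using the hypothetical
# problem max F(x) = log(x) subject to 1 >= x (so c = 1 and G(x) = x).
# Since F'(x) > 0 everywhere, the constraint binds: x* = 1 and
# lambda* = F'(x*)/G'(x*) = 1.
import numpy as np
from scipy.optimize import minimize

res = minimize(lambda x: -np.log(x[0]), x0=[0.5],
               constraints=[{"type": "ineq", "fun": lambda x: 1.0 - x[0]}],
               bounds=[(1e-6, None)])
x_star = res.x[0]
lam_star = 1.0 / x_star        # from (1): F'(x*) = lambda* G'(x*), G'(x) = 1

assert abs(x_star - 1.0) < 1e-4                  # constraint binds at x* = 1
assert lam_star >= 0                             # condition (3)
assert abs(lam_star * (1.0 - x_star)) < 1e-4     # complementary slackness (4)
print(x_star, lam_star)
```

Any candidate x that violated one of (1)-(4) could be discarded without ever evaluating the objective at competing points.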
Two pieces of terminology:
a) The extra assumption that G′(x∗) ≠ 0 is needed to guarantee the existence of a multiplier λ∗ satisfying (1)-(4). This extra assumption is called the constraint qualification, and almost always holds in practice.
b) Note that (1) is a FOC for x, while (2) is like a FOC for λ. In most applications, the second-order conditions (SOC) will imply that x∗ maximizes L(x, λ), while λ∗ minimizes L(x, λ). For this reason, (x∗, λ∗) is typically a saddle-point of L(x, λ).

Thus, in solving the problem in this way, we are using the Lagrangian to turn a constrained optimization problem into an unconstrained optimization problem, where we seek to maximize L(x, λ) rather than simply F(x).
One final note:
Our general constraint, c ≥ G(x), nests as a special case the nonnegativity constraint x ≥ 0, obtained by setting c = 0 and G(x) = −x.

So nonnegativity constraints can be introduced into the Lagrangian in the same way as all other constraints. If we consider, for example, the extended problem
max_x F(x) subject to c ≥ G(x) and x ≥ 0,
then we can introduce a second multiplier µ, form the Lagrangian as
L(x, λ, µ) = F(x) + λ[c − G(x)] + µx,
and write the first-order condition for the optimal x∗ as

L1(x∗, λ∗, µ∗) = F′(x∗) − λ∗G′(x∗) + µ∗ = 0. (1′)
In addition, analogs to our earlier conditions (2)-(4) must also hold for the second constraint: x∗ ≥ 0, µ∗ ≥ 0, and µ∗x∗ = 0.
Kuhn and Tucker’s original statement of the theorem, however, does not incorporate nonnegativity constraints into the Lagrangian. Instead, even with the additional nonnegativity constraint x ≥ 0, they continue to define the Lagrangian as
L(x, λ) = F (x) + λ[c−G(x)].
In this case, the first-order condition for x∗ must be modified to read

L1(x∗, λ∗) = F′(x∗) − λ∗G′(x∗) ≤ 0, with equality if x∗ > 0. (1′′)
Of course, in (1′), µ∗ ≥ 0 in general and µ∗ = 0 if x∗ > 0. So a close inspection reveals that these two approaches to handling nonnegativity constraints lead in the end to the same results.
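The equivalence of the two treatments can be checked on a small numerical example. The problem below is hypothetical, chosen so the nonnegativity constraint binds: max F(x) = −(x + 1)² subject to x ≥ 0, with corner solution x∗ = 0.

```python
# A sketch comparing the two treatments of a nonnegativity constraint, on the
# hypothetical corner-solution problem max F(x) = -(x + 1)**2 subject to
# x >= 0, whose solution is x* = 0.
from scipy.optimize import minimize

def Fp(x):
    return -2.0 * (x + 1.0)            # F'(x)

res = minimize(lambda x: (x[0] + 1.0) ** 2, x0=[1.0], bounds=[(0.0, None)])
x_star = res.x[0]

# Treatment 1: explicit multiplier mu on x >= 0; (1') gives mu* = -F'(x*).
mu_star = -Fp(x_star)
assert mu_star >= 0 and abs(mu_star * x_star) < 1e-8

# Treatment 2: Kuhn and Tucker's modified FOC (1''): F'(x*) <= 0,
# with equality required only if x* > 0.
assert Fp(x_star) <= 1e-8
print(x_star, mu_star)
```

Both treatments identify the same corner: the explicit multiplier µ∗ = 2 simply records by how much F′(x∗) falls short of zero in the modified FOC.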
2 The Envelope Theorem

References:
Dixit, Chapter 5.
Simon-Blume, Chapter 19.
In our discussion of the Kuhn-Tucker theorem, we considered an optimization problem of the form
max_x F(x) subject to c ≥ G(x)
Now, let’s generalize the problem by allowing the functions F and G to depend on a parameter θ ∈ R. The problem can now be stated as
max_x F(x, θ) subject to c ≥ G(x, θ)
For this problem, define the maximum value function V : R → R as

V(θ) = max_x F(x, θ) subject to c ≥ G(x, θ)
Note that evaluating V requires a two-step procedure:
First, given θ, find the value of x∗ that solves the constrained optimization problem.
Second, substitute this value of x∗, together with the given value of θ, into the objective function to obtain

V(θ) = F(x∗, θ)
Now suppose that we want to investigate the properties of this function V. Suppose, in particular, that we want to take the derivative of V with respect to its argument θ.
As the first step in evaluating V′(θ), consider solving the constrained optimization problem for any given value of θ by setting up the Lagrangian

L(x, λ) = F(x, θ) + λ[c − G(x, θ)]
We know from the Kuhn-Tucker theorem that the solution x∗ to the optimization problem and the associated value of the multiplier λ∗ must satisfy the complementary slackness condition:
λ∗[c−G(x∗, θ)] = 0
Use this last result to rewrite the expression for V as
V(θ) = F(x∗, θ) = F(x∗, θ) + λ∗[c − G(x∗, θ)]

So suppose that we tried to calculate V′(θ) simply by differentiating both sides of this equation with respect to θ:

V′(θ) = F2(x∗, θ) − λ∗G2(x∗, θ).
In principle, this formula may not be correct. The reason is that x∗ and λ∗ will themselves depend on the parameter θ, and we must take this dependence into account when differentiating V with respect to θ.

However, the envelope theorem tells us that our formula for V′(θ) is, in fact, correct. That is, the envelope theorem tells us that we can ignore the dependence of x∗ and λ∗ on θ in calculating V′(θ).
To see why, for any θ, let x∗(θ) denote the solution to the problem max_x F(x, θ) subject to c ≥ G(x, θ), and let λ∗(θ) be the associated Lagrange multiplier.
Theorem (Envelope) Let F and G be continuously differentiable functions of x and θ. For any given θ, let x∗(θ) maximize F(x, θ) subject to c ≥ G(x, θ), and let λ∗(θ) be the value of the associated Lagrange multiplier. Suppose, further, that x∗(θ) and λ∗(θ) are also continuously differentiable functions, and that the constraint qualification G1[x∗(θ), θ] ≠ 0 holds for all values of θ. Then the maximum value function defined by
V(θ) = max_x F(x, θ) subject to c ≥ G(x, θ)

satisfies

V′(θ) = F2[x∗(θ), θ] − λ∗(θ)G2[x∗(θ), θ]. (7)
Proof The Kuhn-Tucker theorem tells us that for any given value of θ, x∗(θ) and λ∗(θ) must satisfy

L1[x∗(θ), λ∗(θ)] = F1[x∗(θ), θ] − λ∗(θ)G1[x∗(θ), θ] = 0, (1)

and

λ∗(θ){c − G[x∗(θ), θ]} = 0. (4)

In light of (4),

V(θ) = F[x∗(θ), θ] = F[x∗(θ), θ] + λ∗(θ){c − G[x∗(θ), θ]}
Differentiating both sides of this expression with respect to θ yields

V′(θ) = F1[x∗(θ), θ]x∗′(θ) + F2[x∗(θ), θ]
+ λ∗′(θ){c − G[x∗(θ), θ]} − λ∗(θ)G1[x∗(θ), θ]x∗′(θ)
− λ∗(θ)G2[x∗(θ), θ],
which shows that, in principle, we must take the dependence of x∗ and λ∗ on θ into account when calculating V′(θ).
Note, however, that
V′(θ) = {F1[x∗(θ), θ] − λ∗(θ)G1[x∗(θ), θ]}x∗′(θ)
+ F2[x∗(θ), θ] + λ∗′(θ){c − G[x∗(θ), θ]} − λ∗(θ)G2[x∗(θ), θ],
which by (1) reduces to
V′(θ) = F2[x∗(θ), θ] + λ∗′(θ){c − G[x∗(θ), θ]} − λ∗(θ)G2[x∗(θ), θ]
Thus, it only remains to show that
λ∗′(θ){c − G[x∗(θ), θ]} = 0 (8)
Clearly, (8) holds for any θ such that the constraint is binding.
For θ such that the constraint is not binding, (4) implies that λ∗(θ) must equal zero. Furthermore, by the continuity of G and x∗, if the constraint does not bind at θ, there exists an ε∗ > 0 such that the constraint does not bind for all θ + ε with ε∗ > |ε|. Hence, (4) also implies that λ∗(θ + ε) = 0 for all ε∗ > |ε|. Using the definition of the derivative,

λ∗′(θ) = lim_{ε→0} [λ∗(θ + ε) − λ∗(θ)]/ε = lim_{ε→0} 0/ε = 0,
it once again becomes apparent that (8) must hold.
Thus,

V′(θ) = F2[x∗(θ), θ] − λ∗(θ)G2[x∗(θ), θ],
as claimed in the theorem.
Once again, this theorem is useful because it tells us that we can ignore the dependence of x∗ and λ∗ on θ in calculating V′(θ).
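As a concrete check of formula (7), the sketch below uses a hypothetical problem in which the constraint binds: max_x F(x, θ) = −(x − θ)² subject to 1 ≥ x, so for θ > 1 we have x∗(θ) = 1 and V(θ) = −(1 − θ)². It compares the envelope formula with a finite-difference derivative of V.

```python
# Numerical check of the envelope formula (7) on a hypothetical problem:
# max_x F(x, theta) = -(x - theta)**2 subject to 1 >= x. For theta > 1 the
# constraint binds, x*(theta) = 1, and V(theta) = -(1 - theta)**2.
from scipy.optimize import minimize

def solve(theta):
    res = minimize(lambda x: (x[0] - theta) ** 2, x0=[0.0],
                   constraints=[{"type": "ineq", "fun": lambda x: 1.0 - x[0]}])
    return -res.fun, res.x[0]          # (V(theta), x*(theta))

theta = 2.0
v, x_star = solve(theta)
F2 = 2.0 * (x_star - theta)            # partial of F with respect to theta
lam = -2.0 * (x_star - theta)          # lambda* = F1(x*)/G1(x*), with G1 = 1
G2 = 0.0                               # the constraint does not involve theta

# Finite-difference derivative of V versus the envelope formula:
h = 1e-5
dV = (solve(theta + h)[0] - solve(theta - h)[0]) / (2 * h)
assert abs(dV - (F2 - lam * G2)) < 1e-2
print(dV, F2 - lam * G2)
```

Even though x∗(θ) is held fixed in computing F2 and G2, the finite-difference derivative, which lets x∗ re-optimize at each θ, agrees, just as the theorem promises.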
But what is the intuition for why the envelope theorem holds? To obtain some intuition, begin by considering the simpler, unconstrained optimization problem:
max_x F(x, θ),
where x is the choice variable and θ is the parameter.
Associated with this unconstrained problem, define the maximum value function in the same way as before:
V(θ) = max_x F(x, θ).
To evaluate V for any given value of θ, use the same two-step procedure as before. First, find the value x∗(θ) that solves the unconstrained maximization problem for that value of θ. Second, substitute that value of x back into the objective function to obtain

V(θ) = F[x∗(θ), θ].
Now differentiate both sides of this expression with respect to θ, carefully taking the dependence of x∗ on θ into account:

V′(θ) = F1[x∗(θ), θ]x∗′(θ) + F2[x∗(θ), θ].
But, if x∗(θ) is the value of x that maximizes F given θ, we know that x∗(θ) must be a critical value of F:

F1[x∗(θ), θ] = 0.
Hence, for the unconstrained problem, the envelope theorem implies that

V′(θ) = F2[x∗(θ), θ],
so that, again, we can ignore the dependence of x∗ on θ in differentiating the maximum value function. And this result holds not because x∗ fails to depend on θ: to the contrary, in fact, x∗ will typically depend on θ through the function x∗(θ). Instead, the result holds because, since x∗ is chosen optimally, x∗(θ) is a critical point of F given θ.
Now return to the constrained optimization problem
max_x F(x, θ) subject to c ≥ G(x, θ)
and define the maximum value function as before:
V(θ) = max_x F(x, θ) subject to c ≥ G(x, θ).
The envelope theorem for this constrained problem tells us that we can also ignore the dependence of x∗ on θ when differentiating V with respect to θ, but only if we start by adding the complementary slackness condition to the maximized objective function to first obtain
V(θ) = F[x∗(θ), θ] + λ∗(θ){c − G[x∗(θ), θ]}.
In taking this first step, we are actually evaluating the entire Lagrangian at the optimum, instead of just the objective function. We need to take this first step because for the constrained problem, the Kuhn-Tucker condition (1) tells us that x∗(θ) is a critical point, not of the objective function by itself, but of the entire Lagrangian formed by adding the product of the multiplier and the constraint to the objective function.
And what gives the envelope theorem its name? The “envelope” theorem refers to a geometrical presentation of the same result that we’ve just worked through.
To see where that geometrical interpretation comes from, consider again the simpler, unconstrained optimization problem:
max_x F(x, θ),
where x is the choice variable and θ is a parameter.
Following along with our previous notation, let x∗(θ) denote the solution to this problem for any given value of θ, so that the function x∗(θ) tells us how the optimal choice of x depends on the parameter θ.
Also, continue to define the maximum value function V in the same way as before:
V(θ) = max_x F(x, θ).
Now let θ1 denote a particular value of θ, and let x1 denote the optimal value of x associated with this particular value θ1. That is, let
x1 = x∗(θ1).
After substituting this value of x1 into the function F, we can think about how F(x1, θ) varies as θ varies; that is, we can think about F(x1, θ) as a function of θ, holding x1 fixed.
In the same way, let θ2 denote another particular value of θ, with θ2 > θ1, let’s say. And following the same steps as above, let x2 denote the optimal value of x associated with this particular value θ2, so that
x2 = x∗(θ2).
Once again, we can hold x2 fixed and consider F (x2, θ) as a function of θ.
The geometrical presentation of the envelope theorem can be derived by thinking about the properties of these three functions of θ: V(θ), F(x1, θ), and F(x2, θ).
One thing that we know about these three functions is that for θ = θ1:
V (θ1) = F (x1, θ1) > F (x2, θ1),
where the first equality and the second inequality both follow from the fact that, by definition, x1 maximizes F(x, θ1) by choice of x.
Another thing that we know about these three functions is that for θ = θ2:
V (θ2) = F (x2, θ2) > F (x1, θ2),
because again, by definition, x2 maximizes F (x, θ2) by choice of x.
On a graph, these relationships imply that:
At θ1, V (θ) coincides with F (x1, θ), which lies above F (x2, θ).
At θ2, V (θ) coincides with F (x2, θ), which lies above F (x1, θ).
And we could find more and more values of V by repeating this procedure for more and more specific values of θi, i = 1, 2, 3, ....
In other words:
[Figure: “The Envelope Theorem.” The graph plots V(θ), F(x1, θ), and F(x2, θ) against θ; V(θ) coincides with F(x1, θ) at θ1 and with F(x2, θ) at θ2, tracing the upper envelope of the two curves.]
V(θ) traces out the “upper envelope” of the collection of functions F(xi, θ), formed by holding xi = x∗(θi) fixed and varying θ.
Moreover, V(θ) is tangent to each individual function F(xi, θ) at the value θi of θ for which xi is optimal, or equivalently:
V′(θ) = F2[x∗(θ), θ],
which is the same analytical result that we derived earlier for the unconstrained optimization problem.
To generalize these arguments so that they apply to the constrained optimization problem
max_x F(x, θ) subject to c ≥ G(x, θ),
simply use the fact that in most cases (where the appropriate second-order conditions hold) the value x∗(θ) that solves the constrained optimization problem for any given value of θ also maximizes the Lagrangian function

L(x, λ, θ) = F(x, θ) + λ[c − G(x, θ)],
so that
V(θ) = max_x F(x, θ) subject to c ≥ G(x, θ)
= max_x L(x, λ, θ)
Now just replace the function F with the function L in working through the arguments from above to conclude that
V′(θ) = L3[x∗(θ), λ∗(θ), θ] = F2[x∗(θ), θ] − λ∗(θ)G2[x∗(θ), θ],
which is exactly the same result that we derived before!
3 Two Examples
3.1 Utility Maximization
A consumer has a utility function defined over consumption of two goods: U(c1, c2)
Prices: p1 and p2
Income: I
Budget constraint: I ≥ p1c1 + p2c2 = G(c1, c2)
The consumer’s problem is:
max_{c1,c2} U(c1, c2) subject to I ≥ p1c1 + p2c2
The Kuhn-Tucker theorem tells us that if we set up the Lagrangian:
L(c1, c2, λ) = U(c1, c2) + λ(I − p1c1 − p2c2)
Then the optimal consumptions c∗1 and c∗2 and the associated multiplier λ∗ must satisfy the FOC:

L1(c∗1, c∗2, λ∗) = U1(c∗1, c∗2) − λ∗p1 = 0

and

L2(c∗1, c∗2, λ∗) = U2(c∗1, c∗2) − λ∗p2 = 0
Move the terms with minus signs to the other side, and divide the first of these FOC by the second to obtain
U1(c∗1, c∗2)/U2(c∗1, c∗2) = p1/p2,
which is just the familiar condition that says that the optimizing consumer should set the slope of his or her indifference curve, the marginal rate of substitution, equal to the slope of his or her budget constraint, the ratio of prices.
Now consider I as one of the model’s parameters, and let the functions c∗1(I), c∗2(I), and λ∗(I) describe how the optimal choices c∗1 and c∗2 and the associated value λ∗ of the multiplier depend on I.
In addition, define the maximum value function as
V(I) = max_{c1,c2} U(c1, c2) subject to I ≥ p1c1 + p2c2
The Kuhn-Tucker theorem tells us that
λ∗(I)[I − p1c∗1(I) − p2c∗2(I)] = 0
and hence
V(I) = U[c∗1(I), c∗2(I)] = U[c∗1(I), c∗2(I)] + λ∗(I)[I − p1c∗1(I) − p2c∗2(I)].
The envelope theorem tells us that we can ignore the dependence of c∗1 and c∗2 on I in calculating

V′(I) = λ∗(I),
which gives us an interpretation of the multiplier λ∗ as the marginal utility of income.
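These results can be verified numerically for a hypothetical Cobb-Douglas specification U(c1, c2) = a log(c1) + (1 − a) log(c2), whose known closed-form solutions are c∗1 = aI/p1, c∗2 = (1 − a)I/p2, and λ∗ = 1/I. The parameter values below are illustrative.

```python
# A worked numerical sketch of the utility maximization example, assuming a
# hypothetical Cobb-Douglas utility U(c1, c2) = a*log(c1) + (1 - a)*log(c2).
# Known closed forms: c1* = a*I/p1, c2* = (1 - a)*I/p2, lambda* = 1/I.
from math import log

a, p1, p2, I = 0.3, 2.0, 5.0, 100.0
c1, c2 = a * I / p1, (1 - a) * I / p2
lam = 1.0 / I

# FOC: U1 = lambda*p1 and U2 = lambda*p2, so MRS = U1/U2 = p1/p2.
U1, U2 = a / c1, (1 - a) / c2
assert abs(U1 / U2 - p1 / p2) < 1e-12
assert abs(p1 * c1 + p2 * c2 - I) < 1e-9        # budget constraint binds

# Envelope result V'(I) = lambda*(I), checked by central differences.
def V(income):
    return a * log(a * income / p1) + (1 - a) * log((1 - a) * income / p2)

h = 1e-4
assert abs((V(I + h) - V(I - h)) / (2 * h) - lam) < 1e-8
print(c1, c2, lam)
```

The last assertion is the marginal-utility-of-income interpretation in action: raising income by one unit raises maximized utility by λ∗ = 1/I.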
3.2 Cost Minimization
The Kuhn-Tucker and envelope conditions can also be used to study constrained minimization problems.
Consider a firm that produces output y using capital k and labor l, according to the technology described by
f (k, l) ≥ y.
r = rental rate for capital
w = wage rate
Suppose that the firm takes its output y as given, and chooses inputs k and l to minimize costs. Then the firm solves
min_{k,l} rk + wl subject to f(k, l) ≥ y
If we set up the Lagrangian as
L(k, l, λ) = rk + wl − λ[f(k, l) − y],
where the term involving the multiplier λ is subtracted rather than added in the case of a minimization problem, the Kuhn-Tucker conditions (1)-(4) continue to apply, exactly as before.
Thus, according to the Kuhn-Tucker theorem, the optimal choices k∗ and l∗ and the associated multiplier λ∗ must satisfy the FOC:
L1(k∗, l∗, λ∗) = r − λ∗f1(k∗, l∗) = 0 (9)
and

L2(k∗, l∗, λ∗) = w − λ∗f2(k∗, l∗) = 0 (10)
Move the terms with minus signs over to the other side, and divide the first FOC by the second to obtain
f1(k∗, l∗)/f2(k∗, l∗) = r/w,
which is another familiar condition that says that the optimizing firm chooses factor inputs so that the marginal rate of substitution between inputs in production equals the ratio of factor prices.
Now suppose that the constraint binds, as it usually will:
y = f (k∗, l∗) (11)
Then (9)-(11) represent three equations that determine the three unknowns k∗, l∗, and λ∗ as functions of the model’s parameters r, w, and y. In particular, we can think of the functions

k∗ = k∗(r, w, y)

and

l∗ = l∗(r, w, y)

as demand curves for capital and labor: strictly speaking, they are conditional (on y) factor demand functions.
Now define the minimum cost function as
C(r, w, y) = min_{k,l} rk + wl subject to f(k, l) ≥ y
= rk∗(r, w, y) + wl∗(r, w, y)
= rk∗(r, w, y) + wl∗(r, w, y) − λ∗(r, w, y){f[k∗(r, w, y), l∗(r, w, y)] − y}
The envelope theorem tells us that in calculating the derivatives of the cost function, we can ignore the dependence of k∗, l∗, and λ∗ on r, w, and y.
Hence:

C1(r, w, y) = k∗(r, w, y),

C2(r, w, y) = l∗(r, w, y),

and

C3(r, w, y) = λ∗(r, w, y).
The first two of these equations are statements of Shephard’s lemma; they tell us that the derivatives of the cost function with respect to factor prices coincide with the conditional factor demand curves. The third equation gives us an interpretation of the multiplier λ∗ as a measure of the marginal cost of increasing output.
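Shephard’s lemma can be checked numerically under a hypothetical Cobb-Douglas technology f(k, l) = k^α l^(1−α), for which the conditional factor demands and the cost function have known closed forms. The sketch below compares finite-difference derivatives of the cost function with those closed forms; the parameter values are illustrative.

```python
# A worked sketch of the cost minimization example, assuming a hypothetical
# Cobb-Douglas technology f(k, l) = k**alpha * l**(1 - alpha). Shephard's
# lemma (C1 = k*, C2 = l*) and C3 = lambda* are checked by central differences.
alpha, r, w, y = 0.4, 2.0, 3.0, 5.0

A = alpha**alpha * (1 - alpha)**(1 - alpha)
def C(r, w, y):
    # Minimum cost function for the Cobb-Douglas case.
    return y * r**alpha * w**(1 - alpha) / A

k_star = y * (alpha * w / ((1 - alpha) * r))**(1 - alpha)  # conditional demand for k
l_star = y * ((1 - alpha) * r / (alpha * w))**alpha        # conditional demand for l

h = 1e-5
C1 = (C(r + h, w, y) - C(r - h, w, y)) / (2 * h)
C2 = (C(r, w + h, y) - C(r, w - h, y)) / (2 * h)
C3 = (C(r, w, y + h) - C(r, w, y - h)) / (2 * h)   # marginal cost = lambda*

assert abs(C1 - k_star) < 1e-6     # Shephard's lemma for capital
assert abs(C2 - l_star) < 1e-6     # Shephard's lemma for labor
assert abs(r * k_star + w * l_star - C(r, w, y)) < 1e-9
print(k_star, l_star, C3)
```

With these illustrative parameters the demands happen to coincide (k∗ = l∗ = 5), and C3 recovers the marginal cost of output directly from the cost function, without re-solving the firm’s problem.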
Thus, our two examples illustrate how we can apply the Kuhn-Tucker and envelope theorems in specific economic problems.
The two examples also show how, in the context of specific economic problems, it is often possible to attach an economic interpretation to the multiplier λ∗.
The Maximum Principle
Here, we will explore the connections between two popular ways of solving dynamic optimization problems, that is, problems that involve optimization over time. The first solution method is just a straightforward application of the Kuhn-Tucker theorem; the second solution method relies on a result known as the maximum principle.
We’ll begin by briefly noting the basic features that set dynamic optimization problems apart from purely static ones. Then we’ll go on to consider the connections between the Kuhn-Tucker theorem and the maximum principle in both discrete and continuous time.
Reference:
Dixit, Chapter 10.
1 Basic Elements of Dynamic Optimization Problems
Moving from the static optimization problems that we’ve considered so far to the dynamic optimization problems that are of primary interest here involves only a few minor changes.
a) We need to index the variables that enter into the problem by t, in order to keep trackof changes in those variables that occur over time.
b) We need to distinguish between two types of variables:
stock variables - e.g., stock of capital, assets, or wealth
flow variables - e.g., output, consumption, saving, or labor supply per unit of time
c) We need to introduce constraints that describe the evolution of stock variables over time: e.g., larger flows of savings or investment today will lead to larger stocks of wealth or capital tomorrow.
2 The Maximum Principle: Discrete Time
2.1 A Dynamic Optimization Problem in Discrete Time
Consider a dynamic optimization problem in discrete time, that is, one in which time can be indexed by t = 0, 1, ..., T.
yt = stock variable
zt = flow variable
Objective function:

Σ_{t=0}^{T} β^t F(yt, zt; t)
Following Dixit, we can allow for a wider range of possibilities by letting the functions as well as the variables depend on the time index t.
1 ≥ β > 0 = discount factor
Constraint describing the evolution of the stock variable:
Q(yt, zt; t) ≥ yt+1 − yt

or

yt + Q(yt, zt; t) ≥ yt+1

for all t = 0, 1, ..., T
Constraint applying to variables within each period:
c ≥ G(yt, zt; t)
for all t = 0, 1,...,T
Constraints on initial and terminal values of stock:
y0 given
yT+1 ≥ y∗
The dynamic optimization problem can now be stated as: choose sequences {zt}_{t=0}^{T} and {yt}_{t=1}^{T+1} to maximize the objective function subject to all of the constraints.
Notes:
a) It is important for the application of the maximum principle that the problem be additively time separable: that is, the values of F, Q, and G at time t must depend on the values of yt and zt only at time t.
b) Although the constraints describing the evolution of the stock variable and applying to the variables within each period can each be written in the form of a single equation, it must be emphasized that these constraints must hold for all t = 0, 1, ..., T. That is, each of these equations actually describes T + 1 constraints.
2.2 The Kuhn-Tucker Formulation
Let’s begin by applying the Kuhn-Tucker Theorem to solve this problem. That is, let’s set up the Lagrangian and take first-order conditions.
Set up the Lagrangian, recognizing that the constraints must hold for all t = 0, 1,...,T :
L = Σ_{t=0}^{T} β^t F(yt, zt; t) + Σ_{t=0}^{T} πt+1[yt + Q(yt, zt; t) − yt+1] + Σ_{t=0}^{T} λt[c − G(yt, zt; t)] + φ(yT+1 − y∗)
The Kuhn-Tucker theorem tells us that the solution to this problem must satisfy the FOC for the choice variables zt for t = 0, 1, ..., T and yt for t = 1, 2, ..., T + 1.
FOC for zt, t = 0, 1, ..., T:

β^t Fz(yt, zt; t) + πt+1 Qz(yt, zt; t) − λt Gz(yt, zt; t) = 0 (1)

for all t = 0, 1, ..., T.
FOC for yt, t = 1, 2, ..., T:

β^t Fy(yt, zt; t) + πt+1 + πt+1 Qy(yt, zt; t) − λt Gy(yt, zt; t) − πt = 0

or

πt+1 − πt = −[β^t Fy(yt, zt; t) + πt+1 Qy(yt, zt; t) − λt Gy(yt, zt; t)] (2)

for all t = 1, 2, ..., T.
FOC for yT+1:

−πT+1 + φ = 0
Let’s assume that the problem is such that the constraint governing the evolution of the stock variable always holds with equality, as will typically be the case in economic applications. Then another condition describing the solution to the problem is
yt+1 − yt = Q(yt, zt; t) (3)
for all t = 0, 1,...,T .
Finally, let’s write down the initial condition for the stock variable and the complementary slackness condition for the constraint on the terminal value of the stock:
y0 given (4)
φ(yT+1 − y∗) = 0

or, using the FOC for yT+1:

πT+1(yT+1 − y∗) = 0 (5)
Notes:
a) Together with the complementary slackness condition

λt[c − G(yt, zt; t)] = 0,

which implies either

λt = 0 or c = G(yt, zt; t),

we can think of (1)-(3) as forming a system of four equations in four unknowns yt, zt, πt, λt. This system of equations determines the problem’s solution.
b) Equations (2) and (3), linking the values of yt and πt at adjacent points in time, are examples of difference equations. They must be solved subject to two boundary conditions:
The initial condition (4).
The terminal, or transversality, condition (5).
c) The analysis can also be applied to the case of an infinite time horizon, where T = ∞. In this case, (1) must hold for all t = 0, 1, 2, ..., (2) must hold for all t = 1, 2, 3, ..., (3) must hold for all t = 0, 1, 2, ..., and (5) becomes a condition on the limiting behavior of πt and yt:

lim_{T→∞} πT+1(yT+1 − y∗) = 0. (6)
2.3 An Alternative Formulation
Now let’s consider the problem in a slightly different way.
Begin by defining the Hamiltonian for time t:
H(yt, πt+1; t) = max_{zt} β^t F(yt, zt; t) + πt+1 Q(yt, zt; t) subject to c ≥ G(yt, zt; t) (7)
Note that the Hamiltonian is a maximum value function.
Note also that the maximization problem on the right-hand side of (7) is a static optimization problem, involving no dynamic elements.
By the Kuhn-Tucker theorem:
H(yt, πt+1; t) = max_{zt} β^t F(yt, zt; t) + πt+1 Q(yt, zt; t) + λt[c − G(yt, zt; t)]
And by the envelope theorem:
Hy(yt, πt+1; t) = β^t Fy(yt, zt; t) + πt+1 Qy(yt, zt; t) − λt Gy(yt, zt; t) (8)

and

Hπ(yt, πt+1; t) = Q(yt, zt; t) (9)

where zt solves the optimization problem on the right-hand side of (7) and must therefore satisfy the FOC:

β^t Fz(yt, zt; t) + πt+1 Qz(yt, zt; t) − λt Gz(yt, zt; t) = 0 (10)
Now notice the following:
a) Equation (10) coincides with (1).
b) In light of (8) and (9),

πt+1 − πt = −[β^t Fy(yt, zt; t) + πt+1 Qy(yt, zt; t) − λt Gy(yt, zt; t)] (2)

and

yt+1 − yt = Q(yt, zt; t) (3)

can be written more compactly as

πt+1 − πt = −Hy(yt, πt+1; t) (11)

and

yt+1 − yt = Hπ(yt, πt+1; t). (12)
This establishes the following result.
Theorem (Maximum Principle) Consider the discrete time dynamic optimization problem of choosing sequences {zt}_{t=0}^{T} and {yt}_{t=1}^{T+1} to maximize the objective function

Σ_{t=0}^{T} β^t F(yt, zt; t)

subject to the constraints

yt + Q(yt, zt; t) ≥ yt+1

for all t = 0, 1, ..., T,

c ≥ G(yt, zt; t)

for all t = 0, 1, ..., T,

y0 given

and

yT+1 ≥ y∗.
Associated with this problem, define the Hamiltonian
H(yt, πt+1; t) = max_{zt} β^t F(yt, zt; t) + πt+1 Q(yt, zt; t) subject to c ≥ G(yt, zt; t). (7)
Then the solution to the dynamic optimization problem must satisfy
a) The first-order condition for the static optimization problem on the right-hand side of (7):

β^t Fz(yt, zt; t) + πt+1 Qz(yt, zt; t) − λt Gz(yt, zt; t) = 0 (10)

for all t = 0, 1, ..., T.
b) The pair of difference equations:

πt+1 − πt = −Hy(yt, πt+1; t) (11)

for all t = 1, 2, ..., T and

yt+1 − yt = Hπ(yt, πt+1; t) (12)

for all t = 0, 1, ..., T, where the derivatives of H can be calculated using the envelope theorem.
c) The initial condition

y0 given (4)

d) The terminal, or transversality, condition

πT+1(yT+1 − y∗) = 0 (5)

in the case where T < ∞ or

lim_{T→∞} πT+1(yT+1 − y∗) = 0 (6)

in the case where T = ∞.
According to the maximum principle, there are two ways of solving discrete time dynamic optimization problems, both of which lead to the same answer:

a) Set up the Lagrangian for the dynamic optimization problem and take first-order conditions for all t = 0, 1, ..., T.

b) Set up the Hamiltonian for the problem and derive the first-order and envelope conditions (10)-(12) for the static optimization problem that appears in the definition of the Hamiltonian.
3 The Maximum Principle: Continuous Time
3.1 A Dynamic Optimization Problem in Continuous Time
Like the extension from static to dynamic optimization, the extension from discrete to continuous time requires no new substantive ideas, but does require some changes in notation.

Accordingly, suppose now that the variable t, instead of taking on discrete values t = 0, 1, ..., T, takes on continuous values t ∈ [0, T], where, as before, T can be finite or infinite.
It is most convenient now to regard the variables as functions of time:
y(t) = stock variable
z(t) = flow variable
The obvious analog to the objective function from before is:
∫_0^T e^(−ρt) F(y(t), z(t); t) dt
ρ ≥ 0 = discount rate
Example:
β = 0.95

ρ = 0.05

β^t for t = 1 is 0.95

e^(−ρt) for t = 1 is e^(−0.05) ≈ 0.951, or approximately 0.95
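The relationship between the discrete-time discount factor β and the continuous-time discount rate ρ in this example can be checked numerically; a minimal sketch (the horizons compared are arbitrary choices):

```python
import math

beta = 0.95   # discrete-time discount factor
rho = 0.05    # continuous-time discount rate

for t in [1, 5, 20]:
    # β^t and e^(−ρt) stay close at short horizons because e^(−0.05) ≈ 0.9512 ≈ β
    print(t, beta ** t, math.exp(-rho * t))

# The two discounting schemes coincide exactly when ρ = −ln(β):
rho_exact = -math.log(beta)   # ≈ 0.0513
assert abs(beta ** 10 - math.exp(-rho_exact * 10)) < 1e-12
```

At long horizons the two factors drift apart unless ρ is set to exactly −ln(β).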
Consider next the constraint describing the evolution of the stock variable.
In the discrete time case, the interval between time periods is just ∆t = 1.
Hence, the constraint might be written as

Q(y(t), z(t); t)∆t ≥ y(t + ∆t) − y(t)

or

Q(y(t), z(t); t) ≥ [y(t + ∆t) − y(t)]/∆t

In the limit as the interval ∆t goes to zero, this last expression simplifies to

Q(y(t), z(t); t) ≥ ẏ(t)

for all t ∈ [0, T], where ẏ(t) denotes the derivative of y(t) with respect to t.
The constraint applying to variables at a given point in time remains the same:
c ≥ G(y(t), z(t); t)
for all t ∈ [0, T ].
Note once again that these constraints must hold for all t ∈ [0, T]. Thus, each of the two equations from above actually represents an entire continuum of constraints.
Finally, the initial and terminal constraints for the stock variable remain unchanged:
y(0) given
y(T ) ≥ y∗
The dynamic optimization problem can now be stated as: choose functions z(t) for t ∈ [0, T] and y(t) for t ∈ (0, T] to maximize the objective function subject to all of the constraints.
3.2 The Kuhn-Tucker Formulation
Once again, let's begin by setting up the Lagrangian and taking first-order conditions:

L = ∫_0^T e^(−ρt) F(y(t), z(t); t) dt + ∫_0^T π(t)[Q(y(t), z(t); t) − ẏ(t)] dt + ∫_0^T λ(t)[c − G(y(t), z(t); t)] dt + φ[y(T) − y∗]
Now we are faced with a problem: y(t) is a choice variable for all t ∈ (0, T], but it is the derivative ẏ(t) that appears in the Lagrangian.
To solve this problem, use integration by parts:

∫_0^T {d/dt [π(t)y(t)]} dt = ∫_0^T π̇(t)y(t) dt + ∫_0^T π(t)ẏ(t) dt

π(T)y(T) − π(0)y(0) = ∫_0^T π̇(t)y(t) dt + ∫_0^T π(t)ẏ(t) dt

−∫_0^T π(t)ẏ(t) dt = ∫_0^T π̇(t)y(t) dt + π(0)y(0) − π(T)y(T)
Use this result to rewrite the Lagrangian as

L = ∫_0^T e^(−ρt) F(y(t), z(t); t) dt + ∫_0^T π(t)Q(y(t), z(t); t) dt + ∫_0^T π̇(t)y(t) dt + π(0)y(0) − π(T)y(T) + ∫_0^T λ(t)[c − G(y(t), z(t); t)] dt + φ[y(T) − y∗]
Before taking first-order conditions, note that the multipliers π(t) and λ(t) are functions of t and that the corresponding constraints appear in the form of integrals. These features of the Lagrangian reflect the fact that the constraints must hold for all t ∈ [0, T].
FOC for z(t), t ∈ [0, T]:

e^(−ρt) Fz(y(t), z(t); t) + π(t)Qz(y(t), z(t); t) − λ(t)Gz(y(t), z(t); t) = 0 (13)

for all t ∈ [0, T]

FOC for y(t), t ∈ (0, T):

e^(−ρt) Fy(y(t), z(t); t) + π(t)Qy(y(t), z(t); t) + π̇(t) − λ(t)Gy(y(t), z(t); t) = 0

or

π̇(t) = −[e^(−ρt) Fy(y(t), z(t); t) + π(t)Qy(y(t), z(t); t) − λ(t)Gy(y(t), z(t); t)]

for all t ∈ (0, T).
If we require all functions of t to be continuously differentiable, then this last equation will also hold for t = 0 and t = T, so that we can write

π̇(t) = −[e^(−ρt) Fy(y(t), z(t); t) + π(t)Qy(y(t), z(t); t) − λ(t)Gy(y(t), z(t); t)] (14)

for all t ∈ [0, T].

FOC for y(T):

0 = e^(−ρT) Fy(y(T), z(T); T) + π(T)Qy(y(T), z(T); T) + π̇(T) − π(T) − λ(T)Gy(y(T), z(T); T) + φ

or, using (14),

π(T) = φ
Assume, as before, that the constraint governing y(t) holds with equality:
ẏ(t) = Q(y(t), z(t); t) (15)
for all t ∈ [0, T ].
Finally, write down the initial condition
y(0) given (16)
and the complementary slackness, or transversality, condition

φ[y(T) − y∗] = 0

or

π(T)[y(T) − y∗] = 0 (17)

or in the infinite-horizon case

lim_{T→∞} π(T)[y(T) − y∗] = 0. (18)
Notes:
a) Together with the complementary slackness condition
λ(t)[c − G(y(t), z(t); t)] = 0,

we can think of (13)-(15) as a system of four equations in four unknowns y(t), z(t), π(t), and λ(t). This system of equations determines the problem's solution.

b) Equations (14) and (15), describing the behavior of y(t) and π(t), are examples of differential equations. They must be solved subject to two boundary conditions: (16) and either (17) or (18).
3.3 An Alternative Formulation
As before, define the Hamiltonian for this problem as
H(y(t), π(t); t) = max_{z(t)} e^(−ρt) F(y(t), z(t); t) + π(t)Q(y(t), z(t); t) (19)

st c ≥ G(y(t), z(t); t)
As before, the Hamiltonian is a maximum value function. And as before, the maximization problem on the right-hand side is a static one.
By the Kuhn-Tucker theorem:
H(y(t), π(t); t) = max_{z(t)} e^(−ρt) F(y(t), z(t); t) + π(t)Q(y(t), z(t); t) + λ(t)[c − G(y(t), z(t); t)]
And by the envelope theorem:
Hy(y(t), π(t); t) = e^(−ρt) Fy(y(t), z(t); t) + π(t)Qy(y(t), z(t); t) − λ(t)Gy(y(t), z(t); t) (20)

and

Hπ(y(t), π(t); t) = Q(y(t), z(t); t) (21)

where z(t) solves the optimization problem on the right-hand side of (19) and must therefore satisfy the FOC:

e^(−ρt) Fz(y(t), z(t); t) + π(t)Qz(y(t), z(t); t) − λ(t)Gz(y(t), z(t); t) = 0. (22)
Now notice the following:
a) Equation (22) coincides with (13).
b) In light of (20) and (21), (14) and (15) can be written more compactly as

π̇(t) = −Hy(y(t), π(t); t) (23)

and

ẏ(t) = Hπ(y(t), π(t); t). (24)
This establishes the following result.
Theorem (Maximum Principle) Consider the continuous time dynamic optimization problem of choosing continuously differentiable functions z(t) and y(t) for t ∈ [0, T] to maximize the objective function

∫_0^T e^(−ρt) F(y(t), z(t); t) dt

subject to the constraints

Q(y(t), z(t); t) ≥ ẏ(t)
for all t ∈ [0, T],

c ≥ G(y(t), z(t); t)

for all t ∈ [0, T],

y(0) given,

and y(T) ≥ y∗.

Associated with this problem, define the Hamiltonian

H(y(t), π(t); t) = max_{z(t)} e^(−ρt) F(y(t), z(t); t) + π(t)Q(y(t), z(t); t) (19)

st c ≥ G(y(t), z(t); t).
Then the solution to the dynamic optimization problem must satisfy
a) The first-order condition for the static optimization problem on the right-hand side of (19):

e^(−ρt) Fz(y(t), z(t); t) + π(t)Qz(y(t), z(t); t) − λ(t)Gz(y(t), z(t); t) = 0 (22)

for all t ∈ [0, T].

b) The pair of differential equations

π̇(t) = −Hy(y(t), π(t); t) (23)

and

ẏ(t) = Hπ(y(t), π(t); t) (24)

for all t ∈ [0, T], where the derivatives of H can be calculated using the envelope theorem.
c) The initial condition

y(0) given. (16)

d) The terminal, or transversality, condition

π(T)[y(T) − y∗] = 0 (17)

in the case where T < ∞ or

lim_{T→∞} π(T)[y(T) − y∗] = 0 (18)

in the case where T = ∞.
Once again, according to the maximum principle, there are two ways of solving continuous time dynamic optimization problems, both of which lead to the same answer:

a) Set up the Lagrangian for the dynamic optimization problem and take first-order conditions for all t ∈ [0, T].

b) Set up the Hamiltonian for the problem and derive the first-order and envelope conditions (22)-(24) for the static optimization problem that appears in the definition of the Hamiltonian.
4 Two Examples
4.1 Life-Cycle Saving
Consider a consumer who is employed for T + 1 years: t = 0, 1,...,T .
w = constant annual labor income
kt = stock of assets at the beginning of period t = 0, 1,...,T + 1
k0 = 0
kt can be negative for t = 1, 2, ..., T, so that the consumer is allowed to borrow.

However, kT+1 must satisfy

kT+1 ≥ k∗ > 0
where k∗ denotes saving required for retirement.
r = constant interest rate
total income during period t = w + rkt
ct = consumption
Hence,

kt+1 = kt + w + rkt − ct

or equivalently,

kt + Q(kt, ct; t) ≥ kt+1

where

Q(kt, ct; t) = Q(kt, ct) = w + rkt − ct

for all t = 0, 1, ..., T

Utility function:

Σ_{t=0}^T β^t ln(ct)
The consumer's problem: choose sequences {ct}_{t=0}^T and {kt}_{t=1}^{T+1} to maximize the utility function subject to all of the constraints.
For this problem:
kt = stock variable
ct = flow variable
To solve this problem, set up the Hamiltonian:
H(kt, πt+1; t) = max_{ct} β^t ln(ct) + πt+1(w + rkt − ct)
FOC for ct:

β^t/ct = πt+1 (25)

Difference equations for πt and kt:

πt+1 − πt = −Hk(kt, πt+1; t) = −πt+1 r (26)

and

kt+1 − kt = Hπ(kt, πt+1; t) = w + rkt − ct (27)

Equations (25)-(27) represent a system of three equations in the three unknowns ct, πt, and kt. They must be solved subject to the boundary conditions

k0 = 0 given (28)

and

πT+1(kT+1 − k∗) = 0 (29)
We can use (25)-(29) to deduce some key properties of the solution even without solving the system completely.

Note first that (25) implies that

πT+1 = β^T/cT > 0.

Hence, it follows from (29) that

kT+1 = k∗.

Thus, the consumer saves just enough for retirement and no more.
Next, note that (26) implies

πt+1 − πt = −πt+1 r

(1 + r)πt+1 = πt (30)

Use (25) to obtain

πt+1 = β^t/ct and πt = β^(t−1)/ct−1

and substitute these expressions into (30) to obtain

(1 + r)β^t/ct = β^(t−1)/ct−1

(1 + r)β/ct = 1/ct−1

ct/ct−1 = β(1 + r) (31)

Equation (31) reveals that the optimal growth rate of consumption is constant, and is faster for a more patient consumer, with a higher value of β, and for a consumer who faces a higher interest rate r.
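Conditions (25)-(29) pin down the entire path: (31) fixes the growth rate of consumption, and the initial level c0 is then determined by the requirement that assets accumulate via (27) from k0 = 0 to exactly kT+1 = k∗. A minimal numerical sketch of this logic (the values of β, r, w, T, and k∗ are illustrative assumptions, not taken from the text):

```python
beta, r, w, T, k_star = 0.96, 0.05, 1.0, 39, 2.0  # assumed parameter values

def terminal_assets(c0):
    """Terminal assets k_{T+1} implied by (27) when consumption follows (31)."""
    k, c = 0.0, c0                   # k0 = 0 given, as in (28)
    for t in range(T + 1):           # t = 0, 1, ..., T
        k = k + w + r * k - c        # (27): k_{t+1} = k_t + w + r*k_t − c_t
        c = beta * (1 + r) * c       # (31): c_{t+1} = β(1 + r) c_t
    return k

# Terminal assets fall as c0 rises, so bisect on c0 until k_{T+1} = k*.
lo, hi = 0.0, 2.0 * w
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if terminal_assets(mid) > k_star else (lo, mid)
c0 = 0.5 * (lo + hi)
print(c0, terminal_assets(c0))   # c0 and the implied k_{T+1} (= k*)
```

With these assumed parameters c0 comes out near 0.88, below labor income, so the consumer saves early in life and (29) holds with kT+1 = k∗ exactly, as in the text.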
4.2 Optimal Growth
Consider an economy in which output is produced with capital according to the production function

F(kt) = kt^α,

where 0 < α < 1.

ct = consumption

δ = depreciation rate for capital, 0 < δ < 1

Then the evolution of the capital stock is governed by

kt+1 = kt^α + (1 − δ)kt − ct

or

kt+1 − kt = kt^α − δkt − ct
Our first example had a finite horizon and was cast in discrete time. So for the sake of variety, make this second example have an infinite horizon in continuous time.
The continuous time analog to the capital accumulation constraint shown above is just

k(t)^α − δk(t) − c(t) ≥ k̇(t)

or

Q(k(t), c(t); t) ≥ k̇(t),

where

Q(k(t), c(t); t) = Q(k(t), c(t)) = k(t)^α − δk(t) − c(t)

for all t ∈ [0, ∞)
Initial condition:

k(0) given

Objective of a benevolent social planner or the utility of an infinitely-lived representative consumer:

∫_0^∞ e^(−ρt) ln(c(t)) dt,

where ρ > 0 is the discount rate.
The problem: choose continuously differentiable functions c(t) and k(t) for t ∈ [0, ∞) to maximize utility subject to all of the constraints.
For this problem:
k(t) = stock variable
c(t) = flow variable
To solve this problem, set up the Hamiltonian:

H(k(t), π(t); t) = max_{c(t)} e^(−ρt) ln(c(t)) + π(t)[k(t)^α − δk(t) − c(t)]

FOC for c(t):

e^(−ρt) = c(t)π(t) (32)

Differential equations for π(t) and k(t):

π̇(t) = −Hk(k(t), π(t); t) = −π(t)[αk(t)^(α−1) − δ] (33)

and

k̇(t) = Hπ(k(t), π(t); t) = k(t)^α − δk(t) − c(t). (34)
Equations (32)-(34) form a system of three equations in the three unknowns c(t), π(t), and k(t). How can we solve them?
Start by differentiating both sides of (32) with respect to t:

e^(−ρt) = c(t)π(t) (32)

−ρe^(−ρt) = ċ(t)π(t) + c(t)π̇(t)

−ρc(t)π(t) = ċ(t)π(t) + c(t)π̇(t)

Next, use (33)

π̇(t) = −π(t)[αk(t)^(α−1) − δ] (33)

to rewrite this last equation as

−ρc(t)π(t) = ċ(t)π(t) − c(t)π(t)[αk(t)^(α−1) − δ]

−ρc(t) = ċ(t) − c(t)[αk(t)^(α−1) − δ]

ċ(t) = c(t)[αk(t)^(α−1) − δ − ρ] (35)
Collect (34) and (35):

k̇(t) = k(t)^α − δk(t) − c(t) (34)

ċ(t) = c(t)[αk(t)^(α−1) − δ − ρ] (35)

and notice that these two differential equations depend only on k(t) and c(t).

Equation (35) implies that ċ(t) = 0 when

αk(t)^(α−1) − δ − ρ = 0

or

k(t) = [(δ + ρ)/α]^(1/(α−1)) = k∗
And since α − 1 < 0, (35) also implies that ċ(t) < 0 when k(t) > k∗ and ċ(t) > 0 when k(t) < k∗.

Equation (34) implies that k̇(t) = 0 when

k(t)^α − δk(t) − c(t) = 0

or

c(t) = k(t)^α − δk(t).

Moreover, (34) implies that k̇(t) < 0 when

c(t) > k(t)^α − δk(t)

and k̇(t) > 0 when

c(t) < k(t)^α − δk(t)
We can illustrate these conditions graphically using a phase diagram, which reveals that:
The economy has a steady state at (k∗, c∗).
For each possible value of k0, there exists a unique value of c0 such that, starting from (k0, c0), the economy converges to the steady state (k∗, c∗).

Starting from all other values of c0, either k becomes negative or c approaches zero.

Trajectories that lead to negative values of k violate the nonnegativity condition for capital, and hence cannot represent a solution.
Trajectories that lead towards zero values of c violate the transversality condition

lim_{T→∞} π(T)k(T) = lim_{T→∞} [e^(−ρT)/c(T)] k(T) = 0

and hence cannot represent a solution.
Hence, the phase diagram allows us to identify the model’s unique solution.
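These phase-diagram conclusions can be illustrated numerically by integrating (34) and (35) forward: starting at the steady state the economy stays put, while starting with consumption above the saddle path exhausts the capital stock in finite time. A rough sketch (the parameter values and the crude Euler step are illustrative assumptions, not a production-quality ODE solver):

```python
alpha, delta, rho, dt = 0.3, 0.1, 0.05, 0.01   # assumed parameter values

k_star = ((delta + rho) / alpha) ** (1 / (alpha - 1))   # ċ = 0 locus
c_star = k_star ** alpha - delta * k_star               # value of the k̇ = 0 locus at k*

def simulate(k0, c0, horizon=50.0):
    """Euler integration of (34)-(35); stops if the path leaves k, c > 0."""
    k, c, t = k0, c0, 0.0
    while t < horizon:
        dk = k ** alpha - delta * k - c                    # (34)
        dc = c * (alpha * k ** (alpha - 1) - delta - rho)  # (35)
        k, c, t = k + dt * dk, c + dt * dc, t + dt
        if k <= 0.0 or c <= 0.0:
            break
    return k, c, t

# At the steady state both time derivatives vanish, so the economy stays there;
# overconsuming initially (c0 above the saddle path) drives the capital stock to zero.
k_end, c_end, t_end = simulate(k_star, 1.2 * c_star)
```

Finding the saddle path itself amounts to a two-point boundary value problem: shoot on c0 until the path neither exhausts capital nor lets consumption die out.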
5 One Final Note on the Maximum Principle
In applying the maximum principle in discrete time, we defined the Hamiltonian as

H(yt, πt+1; t) = max_{zt} β^t F(yt, zt; t) + πt+1 Q(yt, zt; t) st c ≥ G(yt, zt; t) (7)

= max_{zt} β^t F(yt, zt; t) + πt+1 Q(yt, zt; t) + λt[c − G(yt, zt; t)]

and used this definition to derive the optimality conditions (10)-(12) and either (5) or (6), depending on whether the horizon is finite or infinite.

The Hamiltonian, when defined as above, is often called the present-value Hamiltonian, because β^t F(yt, zt; t) measures the present value at time t = 0 of the payoff F(yt, zt; t) received at time t > 0.
[Phase diagram for (34)-(35): the ċ = 0 isocline, or locus, is the vertical line k = k∗; the k̇ = 0 isocline, or locus, is the curve c = k^α − δk; the two intersect at the steady state (k∗, c∗). The saddle path, or stable manifold, shows that for each value of k0, there is a unique value of c0 that leads the system to converge to the steady state.]
The present-value Hamiltonian stands in contrast to the current-value Hamiltonian, defined by multiplying both sides of (7) by β^(−t):

β^(−t) H(yt, πt+1; t) = max_{zt} F(yt, zt; t) + β^(−t) πt+1 Q(yt, zt; t) + β^(−t) λt[c − G(yt, zt; t)]

= max_{zt} F(yt, zt; t) + θt+1 Q(yt, zt; t) + μt[c − G(yt, zt; t)]

= H(yt, θt+1; t),

where the last line states the definition of the current-value Hamiltonian H(yt, θt+1; t) and where

θt+1 = β^(−t) πt+1 ⇒ πt+1 = β^t θt+1

and

μt = β^(−t) λt ⇒ λt = β^t μt
Let's consider rewriting the optimality conditions (10)-(12) and (5) in terms of the current-value Hamiltonian H(yt, θt+1; t).

To do this, note first that by definition

H(yt, πt+1; t) = β^t H(yt, θt+1; t) = β^t H(yt, β^(−t) πt+1; t)

Hence

Hy(yt, πt+1; t) = β^t Hy(yt, θt+1; t)

and

Hπ(yt, πt+1; t) = ∂/∂πt+1 [β^t H(yt, β^(−t) πt+1; t)] = β^t β^(−t) Hθ(yt, β^(−t) πt+1; t) = Hθ(yt, θt+1; t)
In light of these results, (10) can be rewritten

β^t Fz(yt, zt; t) + πt+1 Qz(yt, zt; t) − λt Gz(yt, zt; t) = 0 (10)

Fz(yt, zt; t) + β^(−t) πt+1 Qz(yt, zt; t) − β^(−t) λt Gz(yt, zt; t) = 0

Fz(yt, zt; t) + θt+1 Qz(yt, zt; t) − μt Gz(yt, zt; t) = 0 (10')

(11) can be rewritten

πt+1 − πt = −Hy(yt, πt+1; t) (11)

β^t θt+1 − β^(t−1) θt = −β^t Hy(yt, θt+1; t)

θt+1 − β^(−1) θt = −Hy(yt, θt+1; t) (11')

(12) can be rewritten

yt+1 − yt = Hπ(yt, πt+1; t) (12)

yt+1 − yt = Hθ(yt, θt+1; t) (12')
(5) can be rewritten

πT+1(yT+1 − y∗) = 0 (5)

β^T θT+1(yT+1 − y∗) = 0 (5')

(6) can be rewritten

lim_{T→∞} πT+1(yT+1 − y∗) = 0 (6)

lim_{T→∞} β^T θT+1(yT+1 − y∗) = 0 (6')
Thus, when the maximum principle in discrete time is stated in terms of the current-value Hamiltonian instead of the present-value Hamiltonian, (10)-(12) and (5) or (6) are replaced by (10')-(12') and (5') or (6').
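In the life-cycle example of Section 4.1, these transformations are easy to verify directly: (25) gives πt+1 = β^t/ct, so θt+1 = β^(−t)πt+1 = 1/ct, and the y-derivative of the current-value Hamiltonian for that problem is rθt+1, so (11') reduces to the same Euler equation (31) as before. A quick numerical check (parameter values are illustrative assumptions):

```python
# Verify the current-value recursion (11'), θ_{t+1} − β^(−1) θ_t = −r θ_{t+1},
# along a consumption path satisfying (31): c_t / c_{t-1} = β(1 + r).
beta, r, c0 = 0.95, 0.04, 1.0

c = [c0 * (beta * (1 + r)) ** t for t in range(12)]  # (31): c_t = c0 [β(1+r)]^t
theta = [1.0 / ct for ct in c]                       # θ_{t+1} = β^(−t) π_{t+1} = 1/c_t

for t in range(1, 12):
    lhs = theta[t] - theta[t - 1] / beta   # θ_{t+1} − β^(−1) θ_t
    rhs = -r * theta[t]                    # minus the current-value derivative, −r θ_{t+1}
    assert abs(lhs - rhs) < 1e-12
print("current-value recursion (11') verified")
```

The same consumption path therefore satisfies (11) with πt+1 = β^t θt+1 and (11') with θt+1; the two formulations differ only in bookkeeping.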
We can use the same types of transformations in the case of continuous time, where the present-value Hamiltonian is defined by

H(y(t), π(t); t) = max_{z(t)} e^(−ρt) F(y(t), z(t); t) + π(t)Q(y(t), z(t); t) (19)

st c ≥ G(y(t), z(t); t)

= max_{z(t)} e^(−ρt) F(y(t), z(t); t) + π(t)Q(y(t), z(t); t) + λ(t)[c − G(y(t), z(t); t)]
Define the current-value Hamiltonian by multiplying both sides of (19) by e^(ρt):

e^(ρt) H(y(t), π(t); t) = max_{z(t)} F(y(t), z(t); t) + e^(ρt) π(t)Q(y(t), z(t); t) + e^(ρt) λ(t)[c − G(y(t), z(t); t)]

= max_{z(t)} F(y(t), z(t); t) + θ(t)Q(y(t), z(t); t) + μ(t)[c − G(y(t), z(t); t)]

= H(y(t), θ(t); t)

where the last line defines the current-value Hamiltonian H(y(t), θ(t); t) and where

θ(t) = e^(ρt) π(t) ⇒ π(t) = e^(−ρt) θ(t)

μ(t) = e^(ρt) λ(t) ⇒ λ(t) = e^(−ρt) μ(t)
In the case of continuous time, the optimality conditions derived from (19) are (22)-(24) and either (17) or (18). Let's rewrite these conditions in terms of the current-value Hamiltonian H(y(t), θ(t); t).
To begin, note that

H(y(t), π(t); t) = e^(−ρt) H(y(t), θ(t); t) = e^(−ρt) H(y(t), e^(ρt) π(t); t)
Hence

Hy(y(t), π(t); t) = e^(−ρt) Hy(y(t), θ(t); t)

and

Hπ(y(t), π(t); t) = ∂/∂π(t) [e^(−ρt) H(y(t), e^(ρt) π(t); t)] = e^(−ρt) e^(ρt) Hθ(y(t), e^(ρt) π(t); t) = Hθ(y(t), θ(t); t)

and, finally,

π̇(t) = ∂/∂t [e^(−ρt) θ(t)] = −ρe^(−ρt) θ(t) + e^(−ρt) θ̇(t)
In light of these results, (22) can be rewritten

e^(−ρt) Fz(y(t), z(t); t) + π(t)Qz(y(t), z(t); t) − λ(t)Gz(y(t), z(t); t) = 0 (22)

Fz(y(t), z(t); t) + e^(ρt) π(t)Qz(y(t), z(t); t) − e^(ρt) λ(t)Gz(y(t), z(t); t) = 0

Fz(y(t), z(t); t) + θ(t)Qz(y(t), z(t); t) − μ(t)Gz(y(t), z(t); t) = 0 (22')

(23) can be rewritten

π̇(t) = −Hy(y(t), π(t); t) (23)

−ρe^(−ρt) θ(t) + e^(−ρt) θ̇(t) = −e^(−ρt) Hy(y(t), θ(t); t)

θ̇(t) = ρθ(t) − Hy(y(t), θ(t); t) (23')

(24) can be rewritten

ẏ(t) = Hπ(y(t), π(t); t) (24)

ẏ(t) = Hθ(y(t), θ(t); t) (24')
(17) can be rewritten

π(T)[y(T) − y∗] = 0 (17)

e^(−ρT) θ(T)[y(T) − y∗] = 0 (17')

(18) can be rewritten

lim_{T→∞} π(T)[y(T) − y∗] = 0 (18)

lim_{T→∞} e^(−ρT) θ(T)[y(T) − y∗] = 0 (18')
Thus, when the maximum principle in continuous time is stated in terms of the current-value Hamiltonian instead of the present-value Hamiltonian, (22)-(24) and (17) or (18) are replaced by (22')-(24') and (17') or (18').
Dynamic Programming
We have now studied two ways of solving dynamic optimization problems, one based on the Kuhn-Tucker theorem and the other based on the maximum principle. These two methods both lead us to the same sets of optimality conditions; they differ only in terms of how those optimality conditions are derived.

Here, we will consider a third way of solving dynamic optimization problems: the method of dynamic programming. We will see, once again, that dynamic programming leads us to the same set of optimality conditions that the Kuhn-Tucker theorem does; once again, this new method differs from the others only in terms of how the optimality conditions are derived.

While the maximum principle lends itself equally well to dynamic optimization problems set in both discrete time and continuous time, dynamic programming is easiest to apply in discrete time settings. On the other hand, dynamic programming, unlike the Kuhn-Tucker theorem and the maximum principle, can be used quite easily to solve problems in which optimal decisions must be made under conditions of uncertainty.

Thus, in our discussion of dynamic programming, we will begin by considering dynamic programming under certainty; later, we will move on to consider stochastic dynamic programming.
Reference:
Dixit, Chapter 11.
1 Dynamic Programming Under Certainty
1.1 A Perfect Foresight Dynamic Optimization Problem in Dis-
crete Time
No uncertainty

Discrete time, infinite horizon: t = 0, 1, 2, ...

yt = stock, or state, variable

zt = flow, or control, variable

Objective function:

Σ_{t=0}^∞ β^t F(yt, zt; t)
1 > β > 0 discount factor

Constraint describing the evolution of the state variable:

Q(yt, zt; t) ≥ yt+1 − yt

or

yt + Q(yt, zt; t) ≥ yt+1

for all t = 0, 1, 2, ...

Constraint applying to variables within each period:

c ≥ G(yt, zt; t)

for all t = 0, 1, 2, ...

Constraint on initial value of the state variable:

y0 given

The problem: choose sequences {zt}_{t=0}^∞ and {yt}_{t=1}^∞ to maximize the objective function subject to all of the constraints.
Notes:
a) It is important for the application of dynamic programming that the problem is additively time separable: that is, the values of F, Q, and G at time t must depend only on the values of yt and zt at time t.

b) Once again, it must be emphasized that although the constraints describing the evolution of the state variable and that apply to the variables within each period can each be written in the form of a single equation, these constraints must hold for all t = 0, 1, 2, .... Thus, each equation actually represents an infinite number of constraints.
1.2 The Kuhn-Tucker Formulation
Let's begin our analysis of this problem by applying the Kuhn-Tucker theorem. That is, let's begin by setting up the Lagrangian and taking first order conditions.

Set up the Lagrangian, recognizing that the constraints must hold for all t = 0, 1, 2, ...:

L = Σ_{t=0}^∞ β^t F(yt, zt; t) + Σ_{t=0}^∞ μ̄t+1[yt + Q(yt, zt; t) − yt+1] + Σ_{t=0}^∞ λ̄t[c − G(yt, zt; t)]
It will be convenient to define

μt+1 = β^(−(t+1)) μ̄t+1 ⇒ μ̄t+1 = β^(t+1) μt+1

λt = β^(−t) λ̄t ⇒ λ̄t = β^t λt

and to rewrite the Lagrangian in terms of the new multipliers μt+1 and λt instead of μ̄t+1 and λ̄t:

L = Σ_{t=0}^∞ β^t F(yt, zt; t) + Σ_{t=0}^∞ β^(t+1) μt+1[yt + Q(yt, zt; t) − yt+1] + Σ_{t=0}^∞ β^t λt[c − G(yt, zt; t)]
FOC for zt, t = 0, 1, 2, ...:

β^t F2(yt, zt; t) + β^(t+1) μt+1 Q2(yt, zt; t) − β^t λt G2(yt, zt; t) = 0

FOC for yt, t = 1, 2, 3, ...:

β^t F1(yt, zt; t) + β^(t+1) μt+1[1 + Q1(yt, zt; t)] − β^t λt G1(yt, zt; t) − β^t μt = 0
Now, let's suppose that somehow we could solve for μt as a function of the state variable yt:

μt = W(yt; t)

μt+1 = W(yt+1; t + 1) = W[yt + Q(yt, zt; t); t + 1]

Then we could rewrite the FOCs as:

F2(yt, zt; t) + βW[yt + Q(yt, zt; t); t + 1]Q2(yt, zt; t) − λt G2(yt, zt; t) = 0 (1)

W(yt; t) = F1(yt, zt; t) + βW[yt + Q(yt, zt; t); t + 1][1 + Q1(yt, zt; t)] − λt G1(yt, zt; t) (2)

And together with the binding constraint

yt+1 = yt + Q(yt, zt; t) (3)

and the complementary slackness condition

λt[c − G(yt, zt; t)] = 0 (4)

we can think of (1) and (2) as forming a system of four equations in three unknown variables yt, zt, and λt and one unknown function W(·; t). This system of equations determines the problem's solution.
Note that since (3) is in the form of a difference equation, finding the problem's solution involves solving a difference equation.
1.3 An Alternative Formulation
Now let's consider the same problem in a slightly different way.
For any given value of the initial state variable y0, define the value function

v(y0; 0) = max_{{zt}_{t=0}^∞, {yt}_{t=1}^∞} Σ_{t=0}^∞ β^t F(yt, zt; t)

subject to

y0 given

yt + Q(yt, zt; t) ≥ yt+1 for all t = 0, 1, 2, ...

c ≥ G(yt, zt; t) for all t = 0, 1, 2, ...
More generally, for any period t and any value of yt, define

v(yt; t) = max_{{zt+j}_{j=0}^∞, {yt+j}_{j=1}^∞} Σ_{j=0}^∞ β^j F(yt+j, zt+j; t + j)

subject to

yt given

yt+j + Q(yt+j, zt+j; t + j) ≥ yt+j+1 for all j = 0, 1, 2, ...

c ≥ G(yt+j, zt+j; t + j) for all j = 0, 1, 2, ...
Note that the value function is a maximum value function.
Now consider expanding the definition of the value function by separating out the time t components:

v(yt; t) = max_{zt, yt+1} [F(yt, zt; t) + max_{{zt+j}_{j=1}^∞, {yt+j}_{j=2}^∞} Σ_{j=1}^∞ β^j F(yt+j, zt+j; t + j)]

subject to

yt given

yt + Q(yt, zt; t) ≥ yt+1

yt+j + Q(yt+j, zt+j; t + j) ≥ yt+j+1 for all j = 1, 2, 3, ...

c ≥ G(yt, zt; t)

c ≥ G(yt+j, zt+j; t + j) for all j = 1, 2, 3, ...
The inner maximization on the right-hand side is just β times the time t + 1 value function, evaluated at yt+1 = yt + Q(yt, zt; t). Hence, the value function must satisfy the Bellman equation

v(yt; t) = max_{zt} F(yt, zt; t) + βv[yt + Q(yt, zt; t); t + 1] st c ≥ G(yt, zt; t). (5)

By the Kuhn-Tucker theorem:

v(yt; t) = max_{zt} F(yt, zt; t) + βv[yt + Q(yt, zt; t); t + 1] + λt[c − G(yt, zt; t)]

FOC for zt:

F2(yt, zt; t) + βv'[yt + Q(yt, zt; t); t + 1]Q2(yt, zt; t) − λt G2(yt, zt; t) = 0 (6)
And by the envelope theorem:
v'(yt; t) = F1(yt, zt; t) + βv'[yt + Q(yt, zt; t); t + 1][1 + Q1(yt, zt; t)] − λt G1(yt, zt; t) (7)
Together with the binding constraint
yt+1 = yt + Q(yt, zt; t) (3)
and complementary slackness condition
λt[c − G(yt, zt; t)] = 0, (4)

we can think of (6) and (7) as forming a system of four equations in three unknown variables yt, zt, and λt and one unknown function v(·; t). This system of equations determines the problem's solution.

Note once again that since (3) is in the form of a difference equation, finding the problem's solution involves solving a difference equation.
But more important, notice that (6) and (7) are equivalent to (1) and (2) with

v'(yt; t) = W(yt; t).
Thus, we have two ways of solving this discrete time dynamic optimization problem, both of which lead us to the same set of optimality conditions:

a) Set up the Lagrangian for the dynamic optimization problem and take first order conditions for zt, t = 0, 1, 2, ... and yt, t = 1, 2, 3, ....

b) Set up the Bellman equation and take the first order condition for zt and then derive the envelope condition for yt.
One question remains: How, in practice, can we solve for the unknown value functions v(·; t)?
To see how to answer this question, consider two examples:
Example 1: Optimal Growth - Here, it will be possible to solve for v explicitly.
Example 2: Saving Under Certainty - Here, it will not be possible to solve for v explicitly, yet we can learn enough about the properties of v to obtain some useful economic insights.
2 Example 1: Optimal Growth
Here, we will modify the optimal growth example that we solved earlier using the maximum principle in two ways:

a) We will switch to discrete time in order to facilitate the use of dynamic programming.

b) Set the depreciation rate for capital equal to δ = 1 in order to obtain a very special case in which an explicit solution for the value function can be found.

Production function:

F(kt) = kt^α

where 0 < α < 1

kt = capital (state variable)

ct = consumption (control variable)

Evolution of the capital stock:

kt+1 = kt^α − ct

for all t = 0, 1, 2, ...

Initial condition:

k0 given

Utility or social welfare:

Σ_{t=0}^∞ β^t ln(ct)

The social planner's problem: choose sequences {ct}_{t=0}^∞ and {kt}_{t=1}^∞ to maximize the utility function subject to all of the constraints.
To solve this problem via dynamic programming, use
kt = state variable
ct = control variable
Set up the Bellman equation:

v(kt; t) = max_{ct} ln(ct) + βv(kt^α − ct; t + 1)

Now guess that the value function takes the time-invariant form

v(kt; t) = v(kt) = E + F ln(kt),
where E and F are constants to be determined.
Using the guess for v, the Bellman equation becomes

E + F ln(kt) = max_{ct} ln(ct) + βE + βF ln(kt^α − ct) (8)

FOC for ct:

1/ct − βF/(kt^α − ct) = 0 (9)

Envelope condition for kt:

F/kt = αβF kt^(α−1)/(kt^α − ct) (10)

Together with the binding constraint

kt+1 = kt^α − ct,

equations (8)-(10) form a system of four equations in four unknowns: ct, kt, E, and F.

Equation (9) implies

kt^α − ct = βF ct

or

ct = [1/(1 + βF)] kt^α (11)
Substitute (11) into the envelope condition (10):

F/kt = αβF kt^(α−1)/(kt^α − ct) (10)

F kt^α − F [1/(1 + βF)] kt^α = αβF kt^α

1 − 1/(1 + βF) = αβ

Hence

1/(1 + βF) = 1 − αβ (12)

Or, equivalently,

1 + βF = 1/(1 − αβ)

βF = 1/(1 − αβ) − 1 = αβ/(1 − αβ)

F = α/(1 − αβ) (13)
[Figure: Numerical solutions to the optimal growth model with complete depreciation, generated using equations (14) and (15). Each example sets α = 0.33 and β = 0.99; Example 1 starts from k(0) = 0.01 and Example 2 from k(0) = 1. Each example plots the paths of consumption and the capital stock. In both examples, c(t) converges to its steady-state value of 0.388 and k(t) converges to its steady-state value.]
Substitute (12) into (11) to obtain

ct = (1 − αβ)kt^α   (14)
which shows that it is optimal to consume the fixed fraction 1 − αβ of output.
Evolution of capital:
kt+1 = kt^α − ct = kt^α − (1 − αβ)kt^α = αβ kt^α   (15)
which is in the form of a difference equation for kt.
Equations (14) and (15) show how the optimal values of ct and kt+1 depend on the state variable kt and the parameters α and β. Given a value for k0, these two equations can be used to construct the optimal sequences {ct}_{t=0}^{∞} and {kt}_{t=1}^{∞}.
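As a numerical sketch (not part of the original notes), equations (14) and (15) can simply be iterated forward from a given k0; the parameter values α = 0.33 and β = 0.99 match the numerical examples.

```python
# A numerical sketch (not part of the original notes): iterate equations
# (14) and (15) forward from k0, using alpha = 0.33 and beta = 0.99 as in
# the numerical examples.

alpha, beta = 0.33, 0.99

def simulate(k0, T=200):
    """ct = (1 - alpha*beta)*kt^alpha, kt+1 = alpha*beta*kt^alpha."""
    k, c = [k0], []
    for _ in range(T):
        kt = k[-1]
        c.append((1 - alpha * beta) * kt ** alpha)   # equation (14)
        k.append(alpha * beta * kt ** alpha)         # equation (15)
    return c, k

# steady state of (15): k* = alpha*beta*(k*)^alpha => k* = (alpha*beta)^(1/(1-alpha))
k_star = (alpha * beta) ** (1 / (1 - alpha))
c_star = (1 - alpha * beta) * k_star ** alpha

c, k = simulate(k0=0.01)
print(round(k[-1], 3), round(c[-1], 3))   # -> 0.188 0.388
```

Because (15) is a contraction in ln(kt), the sequence converges to the same steady state from any positive k0.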
For the sake of completeness, substitute (14) and (15) back into (8) to solve for E:

E + F ln(kt) = max_{ct} ln(ct) + βE + βF ln(kt^α − ct)   (8)

E + F ln(kt) = ln(1 − αβ) + α ln(kt) + βE + βF ln(αβ) + αβF ln(kt)

Since (13) implies that

F = α + αβF,

this last equality reduces to

E = ln(1 − αβ) + βE + βF ln(αβ)

which leads directly to the solution

E = [ln(1 − αβ) + (αβ/(1 − αβ)) ln(αβ)]/(1 − β)
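For a quick consistency check (again not in the notes), one can confirm numerically that the guessed value function, with these values of E and F, satisfies the Bellman equation (8) when consumption follows the policy (14):

```python
# A consistency check (not in the notes): the guess v(k) = E + F*ln(k), with
# F and E as derived above, satisfies the Bellman equation (8) when
# consumption follows the policy (14).
import math

alpha, beta = 0.33, 0.99
F = alpha / (1 - alpha * beta)
E = (math.log(1 - alpha * beta)
     + (alpha * beta / (1 - alpha * beta)) * math.log(alpha * beta)) / (1 - beta)

def v(k):
    return E + F * math.log(k)

def bellman_rhs(k):
    c = (1 - alpha * beta) * k ** alpha            # policy (14)
    return math.log(c) + beta * v(k ** alpha - c)  # right-hand side of (8)

gap = max(abs(v(k) - bellman_rhs(k)) for k in [0.05, 0.2, 1.0, 3.0])
print(gap < 1e-10)   # True
```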
3 Example 2: Saving Under Certainty
Here, a consumer maximizes utility over an infinite horizon, t = 0, 1, 2, ..., earning income from labor and from investments.
At = beginning-of-period assets
At can be negative, that is, the consumer is allowed to borrow
yt = labor income (exogenous)
ct = consumption
saving = st = At + yt − ct
r = constant interest rate
Evolution of assets:

At+1 = (1 + r)st = (1 + r)(At + yt − ct)

Note:

At + yt − ct = (1/(1 + r))At+1

At = (1/(1 + r))At+1 + ct − yt
Similarly,

At+1 = (1/(1 + r))At+2 + ct+1 − yt+1
Combining these last two equalities yields

At = (1/(1 + r))^2 At+2 + (1/(1 + r))(ct+1 − yt+1) + (ct − yt)
Continuing in this manner yields

At = (1/(1 + r))^T At+T + Σ_{j=0}^{T−1} (1/(1 + r))^j (ct+j − yt+j).
Now assume that the sequence {At}_{t=0}^{∞} must remain bounded (while borrowing is allowed, unlimited borrowing is ruled out), and take the limit as T → ∞ to obtain

At = Σ_{j=0}^{∞} (1/(1 + r))^j (ct+j − yt+j)
or

At + Σ_{j=0}^{∞} (1/(1 + r))^j yt+j = Σ_{j=0}^{∞} (1/(1 + r))^j ct+j.   (16)
Equation (16) takes the form of an infinite-horizon budget constraint, indicating that over the infinite horizon beginning at any period t, the consumer’s sources of funds include assets At and the present value of current and future labor income, while the consumer’s use of funds is summarized by the present value of current and future consumption.
The consumer’s problem: choose the sequences {st}_{t=0}^{∞} and {At}_{t=1}^{∞} to maximize the utility function

Σ_{t=0}^{∞} β^t u(ct) = Σ_{t=0}^{∞} β^t u(At + yt − st)

subject to the constraints

A0 given

and

(1 + r)st ≥ At+1

for all t = 0, 1, 2, ...
To solve the problem via dynamic programming, note first that
At = state variable
st = control
Set up the Bellman equation

v(At; t) = max_{st} u(At + yt − st) + βv(At+1; t + 1) subject to (1 + r)st ≥ At+1

v(At; t) = max_{st} u(At + yt − st) + βv[(1 + r)st; t + 1]
FOC for st:

−u′(At + yt − st) + β(1 + r)v′[(1 + r)st; t + 1] = 0

Envelope condition for At:

v′(At; t) = u′(At + yt − st)
Use the constraints to rewrite these optimality conditions as
u′(ct) = β(1 + r)v′(At+1; t + 1)   (17)

and

v′(At; t) = u′(ct)   (18)

Since (18) must hold for all t = 0, 1, 2, ..., it implies

v′(At+1; t + 1) = u′(ct+1)

Substitute this result into (17) to obtain:

u′(ct) = β(1 + r)u′(ct+1)   (19)
Now make two extra assumptions:

a) β(1 + r) = 1 or 1 + r = 1/β: the interest rate equals the discount rate

b) u is strictly concave

Under these two additional assumptions, (19) implies

u′(ct) = u′(ct+1)

or

ct = ct+1
And since this last equation must hold for all t = 0, 1, 2,..., it implies
ct = ct+j for all j = 0, 1, 2,...
Now, return to (16):

At + Σ_{j=0}^{∞} (1/(1 + r))^j yt+j = Σ_{j=0}^{∞} (1/(1 + r))^j ct+j.   (16)

At + Σ_{j=0}^{∞} (1/(1 + r))^j yt+j = ct Σ_{j=0}^{∞} β^j   (20)
FACT: Since |β| < 1,

Σ_{j=0}^{∞} β^j = 1/(1 − β)

To see why this is true, multiply both sides by 1 − β:

1 = (1 − β)/(1 − β)

= (1 − β) Σ_{j=0}^{∞} β^j

= (1 + β + β^2 + ...) − β(1 + β + β^2 + ...)

= (1 + β + β^2 + ...) − (β + β^2 + β^3 + ...)

= 1
Use this fact to rewrite (20):

At + Σ_{j=0}^{∞} (1/(1 + r))^j yt+j = (1/(1 − β)) ct

or

ct = (1 − β)[At + Σ_{j=0}^{∞} (1/(1 + r))^j yt+j]   (21)
Equation (21) indicates that it is optimal to consume a fixed fraction 1 − β of wealth at each date t, where wealth consists of the value of current asset holdings and the present discounted value of future labor income. Thus, (21) describes a version of the permanent income hypothesis.
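A small sketch (not from the notes) of how (21) works in practice; the income path and numbers below are arbitrary illustrations, with β(1 + r) = 1 imposed as in assumption (a):

```python
# An illustrative sketch (not from the notes) of the permanent income rule
# (21). The asset level and income path are hypothetical.

beta = 0.95
r = 1 / beta - 1                 # assumption (a): 1 + r = 1/beta

A0 = 2.0
y = [1.0] * 10                   # hypothetical income: y_t = 1 for t < 10, then 0
pv_income = sum((1 / (1 + r)) ** j * yj for j, yj in enumerate(y))
wealth = A0 + pv_income
c = (1 - beta) * wealth          # equation (21): constant consumption

# check the budget constraint (16): PV of consumption equals total wealth
pv_consumption = c / (1 - beta)  # geometric-series fact, since c is constant
assert abs(pv_consumption - wealth) < 1e-12

# assets remain bounded: simulate At+1 = (1+r)(At + yt - ct) for 100 periods
A = A0
for t in range(100):
    A = (1 + r) * (A + (1.0 if t < 10 else 0.0) - c)
assert abs(A - c / (1 - beta)) < 1e-8  # settles at the PV of remaining consumption
```

The consumer saves while income is flowing and then lives off interest, so the asset sequence stays bounded, as the derivation of (16) required.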
4 Stochastic Dynamic Programming
4.1 A Dynamic Stochastic Optimization Problem
Discrete time, infinite horizon: t = 0, 1, 2,...
yt = state variable
zt = control variable
εt+1 = random shock, which is observed at the beginning of t + 1
Thus, when zt is chosen:
εt is known ...
... but εt+1 is still viewed as random.
The shock εt+1 may be serially correlated, but will be assumed to have the Markov property (i.e., to be generated by a Markov process): the distribution of εt+1 depends on εt, but not on εt−1, εt−2, εt−3, ....
For example, εt+1 may follow a first-order autoregressive process:
εt+1 = ρεt + ηt+1.
Now, the full state of the economy at the beginning of each period is described jointly by the pair of values for yt and εt, since the value of εt is relevant for forecasting, that is, forming expectations of, future values of εt+j, j = 1, 2, 3, ....
Objective function:

E0 Σ_{t=0}^{∞} β^t F(yt, zt, εt)

1 > β > 0 discount factor

E0 = expected value as of t = 0
Constraint describing the evolution of the state variable
yt + Q(yt, zt, εt+1) ≥ yt+1
for all t = 0, 1, 2,... and for all possible realizations of εt+1
Thus, the value of yt+1 does not become known until εt+1 is observed at the beginning of t + 1 for all t = 0, 1, 2,...
Constraint on initial value of the state variable:
y0 given
The problem: choose contingency plans for zt, t = 0, 1, 2, ..., and yt, t = 1, 2, 3, ..., to maximize the objective function subject to all of the constraints.
Notes:
a) In order to incorporate uncertainty, we have really only made two adjustments to the problem:
First, we have added the shock εt to the objective function for period t and the shock εt+1 to the constraint linking periods t and t + 1.
And second, we have assumed that the planner cares about the expected value of the objective function.
b) For simplicity, the functions F and Q are now assumed to be time-invariant, although they now depend on the shock as well as on the state and control variables.
c) For simplicity, we have also dropped the second set of constraints, c ≥ G(yt, zt). Adding them back is straightforward, but complicates the algebra.
d) In the presence of uncertainty, the constraint
yt + Q(yt, zt, εt+1) ≥ yt+1
must hold, not only for all t = 0, 1, 2, ..., but for all possible realizations of εt+1 as well. Thus, this single equation can actually represent a very large number of constraints.
e) The Kuhn-Tucker theorem can still be used to solve problems that feature uncertainty. But because problems with uncertainty can have a very large number of constraints, the Kuhn-Tucker theorem can become very difficult to apply in practice, since one may have to introduce a very large number of Lagrange multipliers. Dynamic programming, therefore, can be an easier and more convenient way to solve dynamic stochastic optimization problems.
4.2 The Dynamic Programming Formulation
Once again, for any values of y0 and ε0, define
v(y0, ε0) = max_{{zt}_{t=0}^{∞}, {yt}_{t=1}^{∞}} E0 Σ_{t=0}^{∞} β^t F(yt, zt, εt)

subject to

y0 and ε0 given

yt + Q(yt, zt, εt+1) ≥ yt+1 for all t = 0, 1, 2, ... and all εt+1
More generally, for any period t and any values of yt and εt, define
v(yt, εt) = max_{{zt+j}_{j=0}^{∞}, {yt+j}_{j=1}^{∞}} Et Σ_{j=0}^{∞} β^j F(yt+j, zt+j, εt+j)

subject to

yt and εt given

yt+j + Q(yt+j, zt+j, εt+j+1) ≥ yt+j+1 for all j = 0, 1, 2, ... and all εt+j+1
Note once again that the value function is a maximum value function.
Now separate out the time t components:
v(yt, εt) = max_{zt, yt+1} [F(yt, zt, εt) + max_{{zt+j}_{j=1}^{∞}, {yt+j}_{j=2}^{∞}} Et Σ_{j=1}^{∞} β^j F(yt+j, zt+j, εt+j)]

subject to

yt and εt given

yt + Q(yt, zt, εt+1) ≥ yt+1 for all εt+1

yt+j + Q(yt+j, zt+j, εt+j+1) ≥ yt+j+1 for all j = 1, 2, 3, ... and all εt+j+1
Relabel the time indices:
v(yt, εt) = max_{zt, yt+1} [F(yt, zt, εt) + β max_{{zt+1+j}_{j=0}^{∞}, {yt+1+j}_{j=1}^{∞}} Et Σ_{j=0}^{∞} β^j F(yt+1+j, zt+1+j, εt+1+j)]

subject to

yt and εt given

yt + Q(yt, zt, εt+1) ≥ yt+1 for all εt+1

yt+1+j + Q(yt+1+j, zt+1+j, εt+1+j+1) ≥ yt+1+j+1 for all j = 0, 1, 2, ... and all εt+1+j+1
FACT (Law of Iterated Expectations): For any random variable Xt+j, realized at time t + j, j = 0, 1, 2, ...:

Et Et+1 Xt+j = Et Xt+j.

To see why this fact holds true, consider the following example:

Suppose εt+1 follows the first-order autoregression:

εt+1 = ρεt + ηt+1, with Et ηt+1 = 0

Hence

εt+2 = ρεt+1 + ηt+2, with Et+1 ηt+2 = 0

or

εt+2 = ρ^2 εt + ρηt+1 + ηt+2.

It follows that

Et+1 εt+2 = Et+1(ρ^2 εt + ρηt+1 + ηt+2) = ρ^2 εt + ρηt+1

and therefore

Et Et+1 εt+2 = Et(ρ^2 εt + ρηt+1) = ρ^2 εt.

It also follows that

Et εt+2 = Et(ρ^2 εt + ρηt+1 + ηt+2) = ρ^2 εt.
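The AR(1) example above can also be illustrated by Monte Carlo simulation; this check is not in the notes, and the values ρ = 0.8 and εt = 1.5 are arbitrary:

```python
# A Monte Carlo illustration (not in the notes) of the law of iterated
# expectations, using the AR(1) example with assumed values rho = 0.8 and
# eps_t = 1.5.
import random

random.seed(0)
rho, eps_t = 0.8, 1.5
N = 200_000

total = 0.0
for _ in range(N):
    eta1 = random.gauss(0.0, 1.0)              # eta_{t+1}, mean zero
    # inner expectation: E_{t+1} eps_{t+2} = rho^2*eps_t + rho*eta_{t+1}
    total += rho ** 2 * eps_t + rho * eta1
estimate = total / N                           # approximates E_t E_{t+1} eps_{t+2}

direct = rho ** 2 * eps_t                      # E_t eps_{t+2}
print(abs(estimate - direct) < 0.02)           # True: the two agree
```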
The envelope condition for yt is:
v1(yt, εt) = F1(yt, zt, εt) + βEt{v1[yt + Q(yt, zt, εt+1), εt+1][1 + Q1(yt, zt, εt+1)]}   (24)
Equations (23)-(24) coincide exactly with the first-order conditions for zt and yt that we would have derived through a direct application of the Kuhn-Tucker theorem to the original, dynamic stochastic optimization problem.
Together with the binding constraint
yt+1 = yt + Q(yt, zt, εt+1) (25)
we can think of (23) and (24) as forming a system of three equations in two unknown variables yt and zt and one unknown function v. This system of equations determines the problem’s solution, given the behavior of the exogenous shocks εt.
Note that (25) is in the form of a difference equation; once again, solving a dynamic optimization problem involves solving a difference equation.
5 Example 3: Saving with Multiple Random Returns
This example extends example 2 by:
a) Introducing n ≥ 1 assets
b) Allowing returns on each asset to be random
As in example 2, we will not be able to solve explicitly for the value function, but we will be able to learn enough about its properties to derive some useful economic results.
Since we are extending the example in two ways, assume for simplicity that the consumer receives no labor income, and therefore must finance all of his or her consumption by investing.
At = beginning-of-period financial wealth
ct = consumption
sit = savings allocated to asset i = 1, 2,...,n
Hence,

At = ct + Σ_{i=1}^{n} sit
Rit+1 = random gross return on asset i, not known until t + 1
Hence, when sit is chosen:
Rit is known ...
... but Rit+1 is still viewed as random.
Hence

At+1 = Σ_{i=1}^{n} Rit+1 sit

does not become known until the beginning of t + 1, even though the sit must be chosen during t.
Utility:

E0 Σ_{t=0}^{∞} β^t u(ct) = E0 Σ_{t=0}^{∞} β^t u(At − Σ_{i=1}^{n} sit)
The problem can now be stated as: choose contingency plans for sit for all i = 1, 2, ..., n and t = 0, 1, 2, ... and At for all t = 0, 1, 2, ... to maximize

E0 Σ_{t=0}^{∞} β^t u(At − Σ_{i=1}^{n} sit)

subject to

A0 given

and

Σ_{i=1}^{n} Rit+1 sit ≥ At+1

for all t = 0, 1, 2, ... and all possible realizations of Rit+1 for each i = 1, 2, ..., n.
As in the general case, the returns can be serially correlated, but must have the Markov property.
To solve this problem via dynamic programming, let
At = state variable
sit, i = 1, 2,...n = control variables
Rt = [R1t, R2t,...Rnt] = vector of random returns
The Bellman equation is

v(At, Rt) = max_{{sit}_{i=1}^{n}} u(At − Σ_{i=1}^{n} sit) + βEt v(Σ_{i=1}^{n} Rit+1 sit, Rt+1)
FOC:

−u′(At − Σ_{i=1}^{n} sit) + βEt Rit+1 v1(Σ_{i=1}^{n} Rit+1 sit, Rt+1) = 0

for all i = 1, 2, ..., n
Envelope condition:

v1(At, Rt) = u′(At − Σ_{i=1}^{n} sit)
Use the constraints to rewrite the FOC and envelope conditions more simply as

u′(ct) = βEt Rit+1 v1(At+1, Rt+1)

for all i = 1, 2, ..., n and

v1(At, Rt) = u′(ct)

Since the envelope condition must hold for all t = 0, 1, 2, ..., it implies

v1(At+1, Rt+1) = u′(ct+1)

Hence, the FOC imply that

u′(ct) = βEt Rit+1 u′(ct+1)   (26)

must hold for all i = 1, 2, ..., n
Equation (26) generalizes (19) to the case where there is more than one asset and where the asset returns are random. It must hold for all assets i = 1, 2, ..., n, even though each asset may pay a different return ex post.
In example 2, we combined (19) with some additional assumptions to derive a version of the permanent income hypothesis. Similarly, we can use (26) to derive a version of the famous capital asset pricing model.
For simplicity, let

mt+1 = βu′(ct+1)/u′(ct)

denote the consumer’s intertemporal marginal rate of substitution.
Then (26) can be written more simply as
1 = Et Rit+1 mt+1   (27)
Keeping in mind that (27) must hold for all assets, suppose that there is a risk-free asset, with return Rf,t+1 that is known during period t. Then Rf,t+1 must satisfy

1 = Rf,t+1 Et mt+1

or

Et mt+1 = 1/Rf,t+1   (28)
FACT: For any two random variables x and y,

cov(x, y) = E[(x − μx)(y − μy)], where μx = E(x) and μy = E(y).

Hence,

cov(x, y) = E[xy − μx y − x μy + μx μy]

= E(xy) − μx μy − μx μy + μx μy

= E(xy) − μx μy

= E(xy) − E(x)E(y)

Or, by rearranging,

E(xy) = E(x)E(y) + cov(x, y)
Using this fact, (27) can be rewritten as

1 = Et Rit+1 mt+1 = Et Rit+1 Et mt+1 + covt(Rit+1, mt+1)

or, using (28),

Rf,t+1 = Et Rit+1 + Rf,t+1 covt(Rit+1, mt+1)

Et Rit+1 − Rf,t+1 = −Rf,t+1 covt(Rit+1, mt+1)   (29)
Equation (29) indicates that the expected return on asset i exceeds the risk-free rate only if Rit+1 is negatively correlated with mt+1.
Does this make sense?
Consider an asset that acts like insurance: it pays a high return Rit+1 during bad economic times, when consumption ct+1 is low. Therefore, for this asset:

covt(Rit+1, ct+1) < 0 ⇒ covt[Rit+1, u′(ct+1)] > 0 ⇒ covt(Rit+1, mt+1) > 0 ⇒ Et Rit+1 < Rf,t+1.

This implication seems reasonable: assets that work like insurance often have expected returns below the risk-free return.
Consider that common stocks tend to pay a high return Rit+1 during good economic times, when consumption ct+1 is high. Therefore, for stocks:

covt(Rit+1, ct+1) > 0 ⇒ covt[Rit+1, u′(ct+1)] < 0 ⇒ covt(Rit+1, mt+1) < 0 ⇒ Et Rit+1 > Rf,t+1.

This implication also seems to hold true: historically, stocks have had expected returns above the risk-free return.
Recalling once more that (29) must hold for all assets, consider in particular the asset whose return happens to coincide exactly with the representative consumer’s intertemporal marginal rate of substitution:

Rm,t+1 = mt+1.
For this asset, equation (29) implies

Et Rm,t+1 − Rf,t+1 = −Rf,t+1 covt(Rm,t+1, mt+1)

Et mt+1 − Rf,t+1 = −Rf,t+1 covt(mt+1, mt+1) = −Rf,t+1 vart(mt+1)

or

−Rf,t+1 = (Et mt+1 − Rf,t+1)/vart(mt+1)   (30)
Substitute (30) into the right-hand side of (29) to obtain

Et Rit+1 − Rf,t+1 = [covt(Rit+1, mt+1)/vart(mt+1)](Et mt+1 − Rf,t+1)

or

Et Rit+1 − Rf,t+1 = bit(Et mt+1 − Rf,t+1),   (31)

where

bit = covt(Rit+1, mt+1)/vart(mt+1)

is like the slope coefficient from a regression of Rit+1 on mt+1.
Equation (31) is a statement of the consumption-based capital asset pricing model, or consumption CAPM. This model links the expected return on each asset to the risk-free rate and the representative consumer’s intertemporal marginal rate of substitution.
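The chain from (27) to (31) can be checked numerically. The sketch below (not from the notes) uses three equally likely states with illustrative numbers; the IMRS values are rescaled so that Et(mt+1 mt+1) = 1, which makes mt+1 itself a traded return, as the derivation of (30) requires:

```python
# A numerical sketch (not from the notes) of equations (27)-(31), using
# three equally likely states and illustrative numbers.
probs = [1/3, 1/3, 1/3]

def E(x):  return sum(p * xi for p, xi in zip(probs, x))
def cov(x, y): return E([xi * yi for xi, yi in zip(x, y)]) - E(x) * E(y)

# illustrative IMRS values, rescaled so E[m*m] = 1: then m is itself a
# traded return, R^m_{t+1} = m_{t+1}, as the derivation of (30) requires
raw = [0.90, 0.95, 1.05]
m = [ri / E([rj * rj for rj in raw]) ** 0.5 for ri in raw]

Rf = 1 / E(m)                                   # equation (28)

# an arbitrary payoff, scaled so that 1 = E[R*m] holds -- equation (27)
x = [1.2, 1.0, 0.8]
scale = E([xj * mj for xj, mj in zip(x, m)])
R = [xi / scale for xi in x]

b = cov(R, m) / cov(m, m)                       # slope coefficient b_it
lhs = E(R) - Rf                                 # expected excess return, as in (29)
rhs = b * (E(m) - Rf)                           # right-hand side of (31)
print(abs(lhs - rhs) < 1e-12)                   # True: (31) holds exactly
```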
Eigenvalues and Eigenvectors
As we have seen, the equations characterizing the solution to a dynamic optimization problem usually take the form of differential or difference equations. Thus, it will be useful to consider differential and difference equations in more detail and to focus, in particular, on ways in which these equations can be solved. In solving differential and difference equations, some basic results from matrix algebra, having to do with eigenvalues and eigenvectors, will often be of use. So before moving on to explicitly consider differential and difference equations, we will review some definitions and facts about eigenvalues and eigenvectors.
Reference:
Simon and Blume, Chapter 23.
1 Definitions and Examples
Definition Let A be a square matrix. An eigenvalue of A is a number r that, when subtracted from each of the diagonal elements of A, converts A into a singular matrix.
Since A is singular if and only if its determinant is zero, we can calculate the eigenvalues of A by solving the characteristic equation

det(A − rI) = 0,   (1)

where I is the identity matrix and det(A − rI) is the characteristic polynomial of A.
Example: Consider the general 2 × 2 matrix

A = [ a11  a12 ; a21  a22 ].
For this matrix,

det(A − rI) = det[ a11 − r   a12 ; a21   a22 − r ]

= (a11 − r)(a22 − r) − a12 a21

= a11 a22 − (a11 + a22)r + r^2 − a12 a21

= r^2 − (a11 + a22)r + (a11 a22 − a12 a21),
so that the characteristic polynomial is a second-order polynomial. Thus, the characteristic equation (1) can be solved using the quadratic formula:

r = {(a11 + a22) ± [(a11 + a22)^2 − 4(a11 a22 − a12 a21)]^(1/2)}/2.
This example reveals that a 2 × 2 matrix has two eigenvalues. More generally, an n × n matrix has n eigenvalues.
Recall, next, that a square matrix B is singular if and only if there exists a nonzero vector x such that Bx = 0.
This fact tells us that if r is an eigenvalue of A, so that A − rI is singular, then there exists a nonzero vector v such that

(A − rI)v = 0.   (2)

This vector v is called an eigenvector of A corresponding to the eigenvalue r. Note that (2) is equivalent to

Av = rv,   (3)

so that an eigenvector must also satisfy (3).
Example: Consider the 2 × 2 matrix

A = [ −1  3 ; 2  0 ].
The characteristic equation is

0 = det[ −1 − r   3 ; 2   −r ] = r(1 + r) − 6 = r^2 + r − 6 = (r + 3)(r − 2),

so that the eigenvalues of A are r1 = −3 and r2 = 2.
The eigenvector v1 corresponding to the eigenvalue r1 = −3 must satisfy

[ −1 + 3   3 ; 2   3 ][ v11 ; v12 ] = [ 0 ; 0 ]

2v11 + 3v12 = 0

v12 = −(2/3)v11

Evidently, any vector of the form

[ v11 ; −(2/3)v11 ]

is an eigenvector of A. A simple vector of this form is

[ 3 ; −2 ]

but, as this example shows, eigenvectors are not uniquely determined.
In fact, we might have seen this earlier by examining equation (2), defining an eigenvector:

(A − rI)v = 0.   (2)

Clearly, if a vector v1 satisfies (2), then so does the vector αv1 for any α ≠ 0.
Exercise: Show that the eigenvector v2 corresponding to r2 = 2 must be of the form

[ v21 ; v21 ],

as is

[ 1 ; 1 ].
2 Using Eigenvalues and Eigenvectors to Diagonalize a Square Matrix

Eigenvalues and eigenvectors are useful in solving differential and difference equations because they can often be used to diagonalize a square matrix.
Let A be an n × n matrix, and consider the problem of finding a nonsingular matrix P such that

P⁻¹AP = D,   (4)

where D is a diagonal matrix.
To solve this problem, calculate the n eigenvalues of A, r1, r2, ..., rn, along with the corresponding eigenvectors v1, v2, ..., vn. Then form the matrix P using the eigenvectors as its columns:

P = [ v1  v2  ...  vn ].

And form the matrix D by placing the eigenvalues on the diagonal and zeros everywhere else:

D = [ r1  0  ...  0 ; 0  r2  ...  0 ; ... ; 0  0  ...  rn ].
P will be nonsingular if A has n linearly independent eigenvectors. In this case, equation (4) requires that

AP = PD

A[ v1  v2  ...  vn ] = [ v1  v2  ...  vn ][ r1  0  ...  0 ; 0  r2  ...  0 ; ... ; 0  0  ...  rn ]

[ Av1  Av2  ...  Avn ] = [ r1v1  r2v2  ...  rnvn ]   (5)
Since the ri’s are eigenvalues and the vi’s are eigenvectors, the definition (3) tells us that (5) must hold.
Thus, we can state the following theorem.
Theorem Let A be an n × n matrix, let r1, r2, ..., rn be the eigenvalues of A, and let v1, v2, ..., vn be the corresponding eigenvectors. Form the matrix

P = [ v1  v2  ...  vn ]

using the eigenvectors as columns. If A has n linearly independent eigenvectors, so that P is nonsingular, then

P⁻¹AP = D,   (4)

where

D = [ r1  0  ...  0 ; 0  r2  ...  0 ; ... ; 0  0  ...  rn ]

is a matrix with the n eigenvalues along the diagonal and zeros everywhere else. Conversely, if P⁻¹AP = D is diagonal, then the diagonal elements of D must be the eigenvalues of A, and the columns of P must be the corresponding eigenvectors.
Note: An n × n matrix that does not have n linearly independent eigenvectors is called nondiagonalizable or defective.
Let’s check that (4) holds for the matrix A that we considered in one of our earlier examples:

A = [ −1  3 ; 2  0 ].
For this choice of A, we’ve already found the eigenvalues r1 = −3 and r2 = 2, as well as the corresponding eigenvectors

v1 = [ 3 ; −2 ] and v2 = [ 1 ; 1 ].

Accordingly, form the matrix P using v1 and v2 as its columns:

P = [ 3  1 ; −2  1 ].
For this matrix P,

P⁻¹ = (1/5)[ 1  −1 ; 2  3 ].
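The diagonalization for this example can be verified numerically; this check is not part of the original notes:

```python
# A numerical verification (not part of the notes) that P^(-1)AP = D for
# the example with A = [-1 3; 2 0].
import numpy as np

A = np.array([[-1.0, 3.0],
              [2.0, 0.0]])
P = np.array([[3.0, 1.0],
              [-2.0, 1.0]])
D = np.diag([-3.0, 2.0])

P_inv = np.linalg.inv(P)
assert np.allclose(P_inv, np.array([[1.0, -1.0],
                                    [2.0, 3.0]]) / 5.0)  # matches the text
assert np.allclose(P_inv @ A @ P, D)                     # equation (4)
```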
Together r1 and v1 must satisfy

(A − r1I)v1 = 0

{ [ 1  −2 ; 2  1 ] − [ 1 + 2i  0 ; 0  1 + 2i ] }[ α11 + β11 i ; α12 + β12 i ] = [ 0 ; 0 ]

[ −2i  −2 ; 2  −2i ][ α11 + β11 i ; α12 + β12 i ] = [ 0 ; 0 ].
Hence, in particular,

−2i(α11 + β11 i) − 2(α12 + β12 i) = 0

−2iα11 + 2β11 − 2α12 − 2β12 i = 0 + 0i,

which requires that

2(β11 − α12) = 0 or α12 = β11

and

−2(α11 + β12)i = 0i or β12 = −α11.
Evidently, v1 takes the more specific form

v1 = [ α11 + β11 i ; β11 − α11 i ].

One particularly simple choice for v1 is therefore

v1 = [ 1 + i ; 1 − i ].
Now let’s find the eigenvector v2 corresponding to the eigenvalue r2. This second eigenvector will be of the general form

v2 = [ α21 + β21 i ; α22 + β22 i ].
Together r2 and v2 must satisfy

(A − r2I)v2 = 0

{ [ 1  −2 ; 2  1 ] − [ 1 − 2i  0 ; 0  1 − 2i ] }[ α21 + β21 i ; α22 + β22 i ] = [ 0 ; 0 ]

[ 2i  −2 ; 2  2i ][ α21 + β21 i ; α22 + β22 i ] = [ 0 ; 0 ].
Hence, in particular,

2i(α21 + β21 i) − 2(α22 + β22 i) = 0

2iα21 − 2β21 − 2α22 − 2β22 i = 0 + 0i,

which requires that

−2(β21 + α22) = 0 or α22 = −β21

and

2(α21 − β22)i = 0i or β22 = α21.
Evidently, v2 takes the more specific form

v2 = [ α21 + β21 i ; −β21 + α21 i ].

One particularly simple choice for v2 is therefore

v2 = [ 1 + i ; −1 + i ].
Based on these results, form the matrix P using v1 and v2 as its columns:

P = [ 1 + i  1 + i ; 1 − i  −1 + i ].

And form D using r1 and r2 as its diagonal elements:

D = [ 1 + 2i  0 ; 0  1 − 2i ].
Now let’s verify that

P⁻¹AP = D

AP = PD

[ 1  −2 ; 2  1 ][ 1 + i  1 + i ; 1 − i  −1 + i ] = [ 1 + i  1 + i ; 1 − i  −1 + i ][ 1 + 2i  0 ; 0  1 − 2i ]

[ 1 + i − 2 + 2i   1 + i + 2 − 2i ; 2 + 2i + 1 − i   2 + 2i − 1 + i ] = [ (1 + i)(1 + 2i)   (1 + i)(1 − 2i) ; (1 − i)(1 + 2i)   (−1 + i)(1 − 2i) ]

[ −1 + 3i   3 − i ; 3 + i   1 + 3i ] = [ −1 + 3i   3 − i ; 3 + i   1 + 3i ],

where the last line is exactly as required!
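The complex case can be verified numerically as well; this check is not part of the original notes:

```python
# A numerical verification (not in the notes) of the complex example:
# AP = PD and P^(-1)AP = D.
import numpy as np

A = np.array([[1.0, -2.0],
              [2.0, 1.0]])
P = np.array([[1 + 1j, 1 + 1j],
              [1 - 1j, -1 + 1j]])
D = np.diag([1 + 2j, 1 - 2j])

assert np.allclose(A @ P, P @ D)                 # the product computed above
assert np.allclose(np.linalg.inv(P) @ A @ P, D)  # equation (4)
```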
Differential Equations
Regardless of whether we use the Kuhn-Tucker theorem or the maximum principle to solve a dynamic optimization problem in continuous time, we must ultimately solve a system of differential equations. Thus, we will now consider differential equations and their solutions in more detail. We will begin by considering the solution to a single differential equation and then go on to consider the solution to systems of multiple differential equations.
1 Scalar Differential Equations

Reference:
Simon and Blume, Chapter 24.
1.1 Introduction
Let’s start by considering an example: a bank account with continuous compounding of interest.
y(t) = funds in the bank account at time t

r = interest rate
The evolution of y(t) is described by

dy(t)/dt = ẏ(t) = ry(t),   (1)

since at each point in time, the interest payment ry(t) is added to the account.

One solution to (1) is

y(t) = e^(rt).

To see this, note that

ẏ(t) = re^(rt) = ry(t).

But note also that

y(t) = ke^(rt)

also satisfies (1), since

ẏ(t) = rke^(rt) = ry(t)

for any constant k.
Thus, in general, (1) has many solutions.
Why isn’t the solution determined uniquely in this example? To see why, just note that in order to know the amount of funds in a bank account, one needs to know more than just the rate of interest; one also needs to know the amount that was initially deposited in the account.
Let y0 denote the amount initially deposited. Then y(t) must satisfy both (1) and the initial condition

y(0) = y0.   (2)
This initial condition determines a specific value for k:

y(t) = ke^(rt) =⇒ y(0) = k =⇒ k = y0.

Hence, the unique function that satisfies both (1) and (2) is

y(t) = y0 e^(rt).
Notes:
a) Obviously, knowing the value of y(t) at any date t would allow one to find the exact value of k. But problems like finding a function y(t) that satisfies both (1) and the initial condition (2) are so common that they have a special name: they are called initial value problems.

b) The many solutions to (1) and the particular solution to (1) and (2) are all functions y(t). This fact makes solving a differential equation more complicated than solving a simple algebraic equation that has as its solution an unknown variable.
Definition An ordinary differential equation is an equation

ẏ(t) = F[y(t), t]

relating the derivative ẏ(t) of an unknown function y(t) to an expression involving y(t) and t. If the expression does not specifically involve t, so that the equation can be written as

ẏ(t) = F[y(t)],

then the differential equation is said to be autonomous or time-independent. Otherwise, the differential equation is nonautonomous or time-dependent.
Notes:
a) The definition describes first-order differential equations that involve only the first derivative of the unknown function. More generally, an ith-order differential equation involves derivatives up to and including the ith derivative of y(t).

b) The definition describes ordinary differential equations that relate the value of a function of a single variable to the value of its derivative. Differential equations that link the value of a function of several variables to the values of its partial derivatives are called partial differential equations.
c) Again, it is important to emphasize that solving a differential equation involves finding an unknown function y(t).

d) Sometimes the solution to a differential equation will be a constant function of the form y(t) = c, where c is a constant. These constant solutions are called steady states, stationary solutions, stationary points, or equilibria. Since ẏ(t) = 0 when y(t) = c, the steady states of a differential equation such as

ẏ(t) = F[y(t)]

can be found by finding any constants c that satisfy

0 = F(c).

e) As in our first example, the full set of solutions y(t) can often be indexed by a parameter k, and written y(t, k). If every solution to the differential equation can be achieved by letting k take on different values, then y(t, k) is called a general solution of the differential equation.
1.2 Explicit Solutions

Most differential equations do not have solutions that can be written down explicitly. There are, however, some important exceptions, two of which are described in the examples below.
Example 1: Consider the differential equation

ẏ(t) = ay(t).

As we have seen, the general solution to this equation is

y(t) = ke^(at),

where different values of k correspond to different values of y(0) = y0.
Example 2: Generalize the first example by considering the differential equation

ẏ(t) = ay(t) + b.

Here, the general solution is

y(t) = −b/a + ke^(at),

since

ẏ(t) = ake^(at)

and

ay(t) + b = −b + ake^(at) + b = ake^(at).
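The general solution in Example 2 can be checked numerically by comparing it against a crude Euler integration of the equation; the parameter values below are arbitrary illustrations, and this check is not part of the notes:

```python
# A numerical sanity check (not in the notes): integrate ydot = a*y + b with
# small Euler steps and compare against the general solution
# y(t) = -b/a + k*e^(a*t). Parameter values here are arbitrary.
import math

a, b, y0 = -0.5, 1.0, 3.0
k = y0 + b / a                   # chosen so that y(0) = -b/a + k = y0

def exact(t):
    return -b / a + k * math.exp(a * t)

h, T = 1e-4, 2.0
y = y0
for _ in range(int(T / h)):      # Euler step: y_{n+1} = y_n + h*(a*y_n + b)
    y += h * (a * y + b)

assert abs(y - exact(T)) < 1e-3  # only a small discretization error remains
```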
These two examples are among the few for which explicit solutions are available. Since these examples are both ones in which y(t) enters the equation linearly, they are examples of linear differential equations.
Fortunately, there is a general result on the existence and uniqueness of solutions to initial value problems that is very easy to apply.
Theorem (Fundamental Theorem of Differential Equations) Consider the initial value problem described by the differential equation

ẏ(t) = F[y(t), t]

and the initial condition

y(0) = y0.

If F is continuous at (y0, 0), then there exists a continuously differentiable function y(t) defined on an interval I = (−a, a) for some a > 0 such that y(0) = y0 and ẏ(t) = F[y(t), t] for all t ∈ I. That is, the function y(t) solves the problem on I. Moreover, if F is continuously differentiable at (y0, 0), then the solution y(t) is unique.
In some cases, the number a > 0 referred to in the theorem can be arbitrarily large. For example, consider the initial value problem:

ẏ(t) = ry(t) and y(0) = y0.

We know that the solution is

y(t) = y0 e^(rt)

and this solution clearly applies for all t ∈ (−∞, ∞).
But in other cases, a > 0 must be finite. For example, consider the initial value problem

ẏ(t) = 1/(t² − 1) and y(0) = 0.
A solution to this problem is given by

y(t) = (1/2) ln[(1 − t)/(1 + t)],

since

y(0) = (1/2) ln(1) = 0

and

ẏ(t) = (1/2)[(1 + t)/(1 − t)]{[(1 + t)(−1) − (1 − t)]/(1 + t)²}
     = (1/2)[1/(1 − t)][−2/(1 + t)]
     = −1/[(1 − t)(1 + t)]
     = −1/(1 − t²)
     = 1/(t² − 1).

Note that this solution is defined only for t ∈ (−1, 1), so that here the interval I from the theorem cannot be extended beyond a = 1.
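A quick numerical check (a minimal sketch; the evaluation points are arbitrary choices) confirms both that the formula satisfies the differential equation in the interior of the interval and that it diverges as t approaches the endpoint t = 1, so no solution exists on a larger interval.

```python
import math

def y(t):
    """Explicit solution y(t) = (1/2) ln[(1 - t)/(1 + t)], valid on (-1, 1)."""
    return 0.5 * math.log((1 - t) / (1 + t))

def ydot(t, h=1e-6):
    """Central-difference derivative of y, for checking y'(t) = 1/(t^2 - 1)."""
    return (y(t + h) - y(t - h)) / (2 * h)

# The formula satisfies the differential equation in the interior...
t = 0.5
print(ydot(t), 1 / (t**2 - 1))  # both approximately -1.3333

# ...but diverges toward -infinity as t approaches the endpoint t = 1
print(y(0.9), y(0.99), y(0.999))  # increasingly large in magnitude
```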
Equation (3) also implies that ẏ(t) > 0 whenever y(t) < 0 and 2 < y(t), but these two conditions cannot hold simultaneously.

Thus, we know that (3) requires

ẏ(t) = 0 whenever y(t) = 0 or y(t) = 2,
ẏ(t) > 0 whenever 2 > y(t) > 0,
ẏ(t) < 0 whenever y(t) > 2 or 0 > y(t).
These results can be illustrated graphically using a one-dimensional phase diagram.
The phase diagram reveals that:
If y(0) = y0 < 0, then y(t) decreases forever.
If y(0) = y0 = 0, then y(t) remains at 0 forever.
If y(0) = y0 > 0, then y(t) converges to 2.
In this example, the stationary solution y(t) = 2 is asymptotically stable, since for all values of y0 close to 2, the solution converges to the steady state.
On the other hand, the stationary solution y(t) = 0 is unstable, since y(t) moves away from 0 whenever y0 ≠ 0, even when y0 is arbitrarily close to 0.
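These stability claims can be checked by simulation. This sketch (Euler integration; the starting values 0.1, 3, and −0.01 are illustrative choices) integrates ẏ(t) = y(t)[2 − y(t)] and shows paths starting above 0 converging to 2, while a path starting just below 0 moves away from the unstable steady state.

```python
def simulate(y0, T=20.0, n=200_000):
    """Euler-integrate y'(t) = y(t)[2 - y(t)] from y(0) = y0 up to time T."""
    dt = T / n
    y = y0
    for _ in range(n):
        y += dt * y * (2.0 - y)
    return y

print(simulate(0.1))           # converges to the stable steady state 2
print(simulate(3.0))           # also converges to 2
print(simulate(-0.01, T=2.0))  # moves away from the unstable steady state 0
```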
Example 2: Consider the more complicated differential equation

ẏ(t) = e^(y(t)){y(t)[1 − y(t)²]}.
Stationary solutions y(t) = c can be found by solving

0 = e^c[c(1 − c²)]

or, since e^c is always positive,

0 = c(1 − c²).
Evidently, there are three stationary solutions:

y(t) = 0, y(t) = 1, and y(t) = −1.

Next, consider that ẏ(t) > 0 whenever

y(t) > 0 and 1 > y(t)² ⟹ 1 > y(t) > 0.

Also, ẏ(t) > 0 whenever

y(t) < 0 and y(t)² > 1 ⟹ y(t) < −1.

Thus,

ẏ(t) = 0 whenever y(t) = 0, y(t) = 1, or y(t) = −1,
ẏ(t) > 0 whenever 1 > y(t) > 0 or y(t) < −1,
ẏ(t) < 0 whenever y(t) > 1 or 0 > y(t) > −1.
The phase diagram reveals that y(t) = −1 and y(t) = 1 are asymptotically stable, while y(t) = 0 is unstable.
[Figure: One-dimensional phase diagrams. Example 1: y'(t) = y(t)[2 − y(t)], with stationary points at 0 and 2. Example 2: y'(t) = exp[y(t)]{y(t)[1 − y(t)²]}, with stationary points at −1, 0, and 1.]
define the new function

v(t) = ẏ(t).

Then

v̇(t) = ÿ(t),

so that (5) is equivalent to

ẏ(t) = v(t)
v̇(t) = F[v(t), y(t), t].
Thus, it is without loss of generality that we've focused mainly on first order differential equations.
Fact 2) Every nonautonomous differential equation can be written as a system of two autonomous equations as follows. Starting from the nonautonomous equation

ẏ(t) = F[y(t), t], (6)

define the new function

v(t) = t.

Then

v̇(t) = 1,

so that (6) is equivalent to

v̇(t) = 1
ẏ(t) = F[y(t), v(t)].
Thus, it is also without loss of generality that we've focused mainly on autonomous differential equations.
Fact 3) The existence and uniqueness results for scalar differential equations also hold for systems of differential equations.
2.2 Explicit Solutions
As in the scalar case, explicit solutions are available for systems of linear differential equations.
A general system of linear equations can be written as

ẋ1(t) = a11 x1(t) + a12 x2(t) + ... + a1n xn(t)
ẋ2(t) = a21 x1(t) + a22 x2(t) + ... + a2n xn(t)
...
ẋn(t) = an1 x1(t) + an2 x2(t) + ... + ann xn(t)

or, more simply, as

ẋ(t) = Ax(t), (7)
where

x(t) = [x1(t), x2(t), ..., xn(t)]′, ẋ(t) = [ẋ1(t), ẋ2(t), ..., ẋn(t)]′,

and

A =
[ a11 a12 ... a1n ]
[ a21 a22 ... a2n ]
[ ... ... ... ... ]
[ an1 an2 ... ann ].
In the simplest case, A is diagonal, so that (7) is just a system of n self-contained equations, each of which takes the form

ẋi(t) = aii xi(t).

We already know that the general solution in this case is just

xi(t) = ki e^(aii t)

for i = 1, 2, ..., n.
But even in the more general case where A is not diagonal, we can often solve the linear system (7) almost as easily by drawing on our results having to do with eigenvalues, eigenvectors, and diagonalizable matrices.
Begin by calculating the eigenvalues r1, r2, ..., rn of A, together with the associated eigenvectors v1, v2, ..., vn. Then form the matrix

P = [ v1 v2 ... vn ]

using the eigenvectors as columns and the matrix

D =
[ r1 0  ... 0  ]
[ 0  r2 ... 0  ]
[ ... ... ... ... ]
[ 0  0  ... rn ]

with the eigenvalues along the diagonal and zeros everywhere else. If the eigenvectors are linearly independent, then we know from before that

P⁻¹AP = D.
Next, define a new vector of functions

z(t) = P⁻¹x(t),

so that

x(t) = Pz(t), ż(t) = P⁻¹ẋ(t), and ẋ(t) = Pż(t).

Now (7) can be rewritten as

ẋ(t) = Ax(t) (7)
Pż(t) = APz(t)
ż(t) = P⁻¹APz(t)
ż(t) = Dz(t). (8)
And since the matrix D is diagonal, (8) is just a system of n self-contained equations, each of which takes the form

żi(t) = ri zi(t)

and therefore has the familiar solution

zi(t) = ki e^(ri t).
Finally, with these solutions for the zi(t) in hand, undo the transformation to find the solutions for the xi(t):

x(t) = Pz(t)
     = [ v1 v2 ... vn ] [z1(t), z2(t), ..., zn(t)]′
     = [ v1 v2 ... vn ] [k1 e^(r1 t), k2 e^(r2 t), ..., kn e^(rn t)]′
     = k1 e^(r1 t) v1 + k2 e^(r2 t) v2 + ... + kn e^(rn t) vn.
Note that by setting k1 = k2 = ... = kn = 0, we obtain the stationary solution x(t) = 0 to (7).
Suppose now that the eigenvalues r1, r2, ..., rn are real numbers. If all of the ri are negative, then

lim_{t→∞} e^(ri t) = 0

for all i = 1, 2, ..., n, which implies that for any choice of k1, k2, ..., kn corresponding to any set of initial conditions x1(0), x2(0), ..., xn(0),

lim_{t→∞} x(t) = lim_{t→∞} [k1 e^(r1 t) v1 + k2 e^(r2 t) v2 + ... + kn e^(rn t) vn] = 0.
Thus, if all of the eigenvalues of A are real and negative, then the stationary solution x(t) = 0 of (7) is asymptotically stable.
On the other hand, even if just one of the eigenvalues ri is positive, so that

lim_{t→∞} e^(ri t) = ∞,

then the stationary solution x(t) = 0 will be unstable.
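The eigenvector construction and the stability criterion can be illustrated numerically. This sketch (assuming NumPy is available; the matrix A is an arbitrary 2 × 2 example with real negative eigenvalues, not from the text) builds x(t) = k1 e^(r1 t) v1 + k2 e^(r2 t) v2 from the eigen-decomposition and checks it against brute-force Euler integration of ẋ(t) = Ax(t).

```python
import numpy as np

# Arbitrary 2x2 example with real, negative eigenvalues (-1 and -2)
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
x0 = np.array([1.0, 2.0])

r, P = np.linalg.eig(A)     # eigenvalues r_i, eigenvectors as columns of P
k = np.linalg.solve(P, x0)  # constants: x(0) = P k  =>  k = P^(-1) x(0)

def x_eig(t):
    """x(t) = sum_i k_i e^(r_i t) v_i, the general solution evaluated at t."""
    return P @ (k * np.exp(r * t))

# Brute-force Euler integration as an independent check
t_end, n = 1.0, 100_000
dt = t_end / n
x = x0.copy()
for _ in range(n):
    x = x + dt * (A @ x)

print(np.max(np.abs(x - x_eig(t_end))))  # small discrepancy
print(np.all(r.real < 0))                # True: steady state x = 0 is stable
print(np.linalg.norm(x_eig(50.0)))       # essentially zero
```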
2.3 Phase Diagrams
As in the scalar case, explicit solutions are usually not available for systems of differential equations outside of the linear class. But again, even when an explicit solution is not available, the solution can be characterized graphically using a phase diagram.
Example 1: Consider the system

ẋ(t) = −x(t) (9a)
ẏ(t) = −y(t) (9b)

We already know that the solution to this system is just

x(t) = k1 e^(−t) and y(t) = k2 e^(−t),
so that for any choices of k1 and k2 corresponding to any initial conditions x(0) = x0 and y(0) = y0,

lim_{t→∞} x(t) = 0 and lim_{t→∞} y(t) = 0.
But let's draw the phase diagram anyway, to make sure that it illustrates these results.
Equation (9a) implies

ẋ(t) = 0 whenever x(t) = 0,
ẋ(t) > 0 whenever x(t) < 0,
ẋ(t) < 0 whenever x(t) > 0.

Equation (9b) implies

ẏ(t) = 0 whenever y(t) = 0,
ẏ(t) > 0 whenever y(t) < 0,
ẏ(t) < 0 whenever y(t) > 0.
These conditions can be illustrated using a two-dimensional phase diagram, which shows that the stationary solution with x(t) = 0 and y(t) = 0 is asymptotically stable.
Example 2: Now consider the more complicated system

ẋ(t) = y(t) − x(t)² (10a)
ẏ(t) = −y(t) (10b)
This system is nonlinear and is difficult to solve explicitly. But we can still characterize the solution using a phase diagram.
Begin by considering stationary solutions of the form x(t) = x and y(t) = y. Equations (10a) and (10b) imply that these solutions can be found by solving the equations

0 = y − x²
0 = −y.
Clearly, the only stationary solution has x(t) = 0 and y(t) = 0.
Next, note that (10a) implies

ẋ(t) = 0 whenever y(t) = x(t)²,
ẋ(t) > 0 whenever y(t) > x(t)²,
ẋ(t) < 0 whenever y(t) < x(t)².

Equation (10b) implies

ẏ(t) = 0 whenever y(t) = 0,
ẏ(t) > 0 whenever y(t) < 0,
ẏ(t) < 0 whenever y(t) > 0.
The phase diagram reveals that the stationary solution with x(t) = 0 and y(t) = 0 is unstable: although there are some initial conditions, such as those labelled as points A, B, and C, from which the system converges, there are other initial conditions, such as those labelled as D and E, which are close to the steady state but which do not lead to convergence.
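Both behaviors can be seen directly by simulation. In this sketch (Euler integration; the starting points (−0.1, 0) and (0.5, 0.5) are illustrative choices in the spirit of the labelled points, not taken from the figure), a path starting just to the left of the steady state is driven away by ẋ(t) = −x(t)², while another nearby path converges.

```python
def simulate(x0, y0, T, n=100_000):
    """Euler-integrate x'(t) = y(t) - x(t)^2, y'(t) = -y(t) up to time T."""
    dt = T / n
    x, y = x0, y0
    for _ in range(n):
        x, y = x + dt * (y - x * x), y + dt * (-y)
    return x, y

# Start close to the steady state (0, 0) but with x slightly negative:
# y stays at 0 while x'(t) = -x(t)^2 pushes x away from 0
x_div, y_div = simulate(-0.1, 0.0, T=9.0)
print(x_div, y_div)

# A nearby starting point from which the system does converge toward (0, 0)
x_conv, y_conv = simulate(0.5, 0.5, T=50.0)
print(x_conv, y_conv)
```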
3 A Linearized System
Before finishing with our discussion of differential equations, let's return once again to the system of differential equations that we derived when using the maximum principle to solve the optimal growth example in continuous time.
[Figure: Two-dimensional phase diagram for Example 1: x'(t) = −x(t), y'(t) = −y(t), showing the x' = 0 locus (x = 0).]
[Figure: Two-dimensional phase diagram for Example 2: x'(t) = y(t) − x(t)², y'(t) = −y(t), with initial conditions A, B, C, D, and E marked.]
That system consisted of two nonlinear differential equations: one for the capital stock,

k̇(t) = k(t)^α − δk(t) − c(t),

and the other for consumption,

ċ(t) = c(t)[αk(t)^(α−1) − δ − ρ].
Earlier, we used a phase diagram to characterize the solution to this system. The diagram showed us that for each possible value of k(0), there exists a unique value of c(0) such that the system converges to a steady state, with

lim_{t→∞} k(t) = k* = [(δ + ρ)/α]^(1/(α−1))

and

lim_{t→∞} c(t) = c* = k*^α − δk*.
There is an alternative way of analyzing this system that relies on algebra rather than geometry.

This alternative method involves taking a first order Taylor approximation around the steady state (k*, c*) to the expressions on the right hand side of each of the two equations, thereby approximating the nonlinear system by a linear system for which an explicit solution exists.
Start by considering the differential equation for k(t):

k̇(t) = k(t)^α − δk(t) − c(t) (11)
     ≈ (k*^α − δk* − c*) + (αk*^(α−1) − δ)[k(t) − k*] − [c(t) − c*]
     = (αk*^(α−1) − δ)[k(t) − k*] − [c(t) − c*]
     = ρ[k(t) − k*] − [c(t) − c*],

since

αk*^(α−1) − δ = α[(δ + ρ)/α] − δ = ρ.
Next, consider the differential equation for c(t):

ċ(t) = c(t)[αk(t)^(α−1) − δ − ρ] (12)
     ≈ c*[αk*^(α−1) − δ − ρ] + α(α − 1)c*k*^(α−2)[k(t) − k*] + [αk*^(α−1) − δ − ρ][c(t) − c*]
     = θ[k(t) − k*],

where

θ = α(α − 1)c*k*^(α−2) < 0.
Now define the new variables

x(t) = k(t) − k*

and

y(t) = c(t) − c*,

so that x(t) is the deviation of k(t) from its steady state level and y(t) is the deviation of c(t) from its steady state level. Note that these definitions imply that

ẋ(t) = k̇(t)

and

ẏ(t) = ċ(t).
In terms of these new variables, (11) and (12) can be rewritten as

k̇(t) = ρ[k(t) − k*] − [c(t) − c*] (11)
ẋ(t) = ρx(t) − y(t)

and

ċ(t) = θ[k(t) − k*] (12)
ẏ(t) = θx(t).
If we let

z(t) = [x(t), y(t)]′,

then these two equations can be written in matrix form as

ż(t) = [ẋ(t), ẏ(t)]′ = Az(t), where

A =
[ ρ −1 ]
[ θ  0 ].
We know that this system of linear differential equations has the general solution

z(t) = q1 e^(r1 t) v1 + q2 e^(r2 t) v2,

where r1 and r2 are the eigenvalues of A and v1 and v2 are the corresponding eigenvectors.
By definition, r is an eigenvalue of A if it satisfies

0 = det(A − rI)
  = det
[ ρ − r  −1 ]
[ θ      −r ]
  = r² − ρr + θ.
The quadratic formula then implies that the eigenvalues are

r1 = {ρ − [ρ² − 4θ]^(1/2)}/2

and

r2 = {ρ + [ρ² − 4θ]^(1/2)}/2.

Since ρ > 0 and θ < 0,

ρ² − 4θ > ρ²,

so that

r1 = {ρ − [ρ² − 4θ]^(1/2)}/2 < {ρ − (ρ²)^(1/2)}/2 = 0,

while

r2 = {ρ + [ρ² − 4θ]^(1/2)}/2 > {ρ + (ρ²)^(1/2)}/2 = ρ > 0.
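The sign pattern of the eigenvalues can be verified numerically. This sketch (pure Python; the parameter values α = 0.3, δ = 0.1, ρ = 0.05 are illustrative choices, not from the text) computes the steady state, the slope θ, and the two roots of r² − ρr + θ = 0.

```python
import math

# Illustrative parameter values (not from the text)
alpha, delta, rho = 0.3, 0.1, 0.05

# Steady state from the phase-diagram analysis
k_star = ((delta + rho) / alpha) ** (1 / (alpha - 1))
c_star = k_star ** alpha - delta * k_star

# Slope of the linearized consumption equation
theta = alpha * (alpha - 1) * c_star * k_star ** (alpha - 2)

# Eigenvalues of A = [[rho, -1], [theta, 0]] via the quadratic formula
disc = math.sqrt(rho ** 2 - 4 * theta)
r1 = (rho - disc) / 2
r2 = (rho + disc) / 2

print(theta < 0, r1 < 0, r2 > 0)  # one stable root, one unstable root
print(r1 * r2, theta)             # product of roots equals theta
print(r1 + r2, rho)               # sum of roots equals rho
```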
We now know that the general solution takes the form

z(t) = q1 e^(r1 t) v1 + q2 e^(r2 t) v2,

where r1 < 0 and r2 > 0. Thus, the requirement that the system converge to the steady state, so that

lim_{t→∞} z(t) = lim_{t→∞} [k(t) − k*, c(t) − c*]′ = 0,

amounts to the requirement that q2 = 0.
And if q2 = 0, the solution reduces to

z(t) = [k(t) − k*, c(t) − c*]′ = q1 e^(r1 t) v1.
Now consider

z(0) = [k(0) − k*, c(0) − c*]′ = q1 v1.

This equation shows that the constant q1 can be chosen to satisfy the initial condition k(0) given. This value of q1, in turn, determines the unique value of c(0) that puts the system on the saddle path towards (k*, c*).
Once again, we can conclude that for each possible value of k(0), there exists a unique value of c(0) such that the system converges to a steady state.
It turns out that many economic applications share the general structure of this example.
That is, a dynamic optimization problem with economic content will often give rise to a system of n nonlinear differential equations that can be approximated by a linear system of the form

ż(t) = Az(t),

where the n × 1 vector

z(t) = [x(t), y(t)]′

is made up of the n1 × 1 vector x(t) of predetermined variables, whose initial values x(0) are given, and the n2 × 1 vector y(t) of nonpredetermined, or jump, variables, whose initial values can adjust to place the system on the saddle path towards the steady state, where n1 + n2 = n.
Order the eigenvalues of A so that r1 < r2 < ... < rn, and write the general solution to the linear system as

z(t) = q1 e^(r1 t) v1 + q2 e^(r2 t) v2 + ... + qn e^(rn t) vn.
If the first n1 eigenvalues are negative and the remaining n2 eigenvalues of A are positive, then the requirement that the system converge to the steady state, so that

lim_{t→∞} z(t) = 0,

requires that q_{n1+1} = q_{n1+2} = ... = qn = 0, reducing the solution to

z(t) = [x(t), y(t)]′ = q1 e^(r1 t) v1 + q2 e^(r2 t) v2 + ... + q_{n1} e^(r_{n1} t) v_{n1}.

Using

z(0) = [x(0), y(0)]′ = q1 v1 + q2 v2 + ... + q_{n1} v_{n1},

the n1 constants q1, q2, ..., q_{n1} can be chosen to satisfy the n1 initial conditions given by x(0). These values of q1, q2, ..., q_{n1} then determine the unique values in y(0) that place the system on the saddle path towards the steady state.
Thus, in general, it is often said that an economic model has a unique solution if and only if the number of negative eigenvalues is exactly equal to the number of predetermined variables.
Difference Equations
Regardless of whether we use the Kuhn-Tucker theorem, the maximum principle, or dynamic programming to solve a dynamic optimization problem in discrete time, we must ultimately solve a system of difference equations. Thus, we will now consider difference equations and their solutions in more detail.
As we will see, the issues and concepts involved with solving difference equations are quite similar to those involved with solving differential equations. Thus, much of our discussion of difference equations will parallel our earlier discussion of differential equations. In particular, we will begin by focusing on linear difference equations, since linear difference equations, like linear differential equations, have solutions that can be characterized most sharply. But later in our discussion, we will consider some new tools that are especially useful in solving other types of difference equations.
Reference:
Simon and Blume, Chapter 23.
1 Linear Difference Equations
We have seen that in continuous time, explicit solutions exist for linear differential equations. Similarly, in discrete time, explicit solutions can be found for linear difference equations.
Let's start by recasting our familiar bank account example in discrete time:

yt = funds in the bank account at time t
r = net interest rate

The evolution of yt is described by

yt+1 = (1 + r)yt (1)

Starting from y0, use (1) to calculate:

y1 = (1 + r)y0
y2 = (1 + r)y1 = (1 + r)²y0
y3 = (1 + r)y2 = (1 + r)³y0
Hence, in general,

yt = (1 + r)^t y0.

Another way of deriving this particular solution is to note that the general solution to (1) takes the form

yt = k(1 + r)^t,

since this solution satisfies

yt+1 = k(1 + r)^(t+1) = (1 + r)k(1 + r)^t = (1 + r)yt.

A particular solution, that is, a specific value for k, can be found if we know a specific value of yt at some date t. For example, if we have the initial condition y0 given, then the general solution requires

y0 = k(1 + r)⁰ = k,

so that the particular solution is found once again to be

yt = (1 + r)^t y0.
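The closed form is easy to confirm by direct iteration — a minimal sketch (the values r = 0.05 and y0 = 100 are arbitrary illustrative choices):

```python
r, y0 = 0.05, 100.0  # illustrative net interest rate and initial balance

# Iterate the difference equation y_{t+1} = (1 + r) y_t forward 10 periods
y = y0
for t in range(10):
    y = (1 + r) * y

print(y, y0 * (1 + r) ** 10)  # iteration matches the closed form (1 + r)^t y0
```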
Equation (1) is an example of a first order difference equation, since only one past value of yt+1 appears on the right-hand side.
Next, consider a second order difference equation, with two past values of yt+1 on the right-hand side:

yt+1 = a1 yt + a2 yt−1 (2)
Suppose that we define the vector

zt+1 = [yt+1, yt]′.

Then we could rewrite (2) as

zt+1 = [yt+1, yt]′ = A[yt, yt−1]′ = Azt, where

A =
[ a1 a2 ]
[ 1  0  ].

Thus, we can always rewrite a second order difference equation as a system of two first order difference equations.
More generally, an nth order difference equation

yt+1 = a1 yt + a2 yt−1 + ... + an yt−n+1

can be rewritten as a system of n first order difference equations by defining the vector

zt+1 = [yt+1, yt, ..., yt−n+2]′
and writing

zt+1 = [yt+1, yt, ..., yt−n+2]′ = A[yt, yt−1, ..., yt−n+1]′ = Azt, where

A =
[ a1 a2 ... ... an ]
[ 1  0  ... ... 0  ]
[ ... ... ... ... ... ]
[ 0  0  ... 1  0  ].

Thus, as long as we are willing to consider systems of difference equations we can, without any loss of generality, confine our attention to first order difference equations.
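To illustrate, this sketch (pure Python, with arbitrary coefficients a1 = 0.5, a2 = 0.3 and arbitrary starting values) iterates a second order equation both directly and via its first order companion system, and confirms that the two formulations produce identical paths.

```python
# Illustrative second order difference equation y_{t+1} = a1*y_t + a2*y_{t-1}
a1, a2 = 0.5, 0.3
y_prev, y_curr = 1.0, 2.0  # arbitrary starting values y_{-1} and y_0

# Direct recursion
direct = [y_curr]
yp, yc = y_prev, y_curr
for _ in range(10):
    yp, yc = yc, a1 * yc + a2 * yp
    direct.append(yc)

# Companion-matrix recursion: z_{t+1} = A z_t with z_t = (y_t, y_{t-1})'
A = [[a1, a2],
     [1.0, 0.0]]
z = [y_curr, y_prev]
companion = [z[0]]
for _ in range(10):
    z = [A[0][0] * z[0] + A[0][1] * z[1],
         A[1][0] * z[0] + A[1][1] * z[1]]
    companion.append(z[0])

print(direct == companion)  # True: the two formulations are equivalent
```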
A general system of linear difference equations can be written as

xt+1 = Axt, (3)

where

xt+1 = [x1,t+1, x2,t+1, ..., xn,t+1]′, xt = [x1t, x2t, ..., xnt]′,

and

A =
[ a11 a12 ... a1n ]
[ a21 a22 ... a2n ]
[ ... ... ... ... ]
[ an1 an2 ... ann ].
Each equation in the system (3) takes the general form

xi,t+1 = ai1 x1t + ai2 x2t + ... + ain xnt.

In the simplest case, A is diagonal, so that (3) reduces to a system of n self-contained equations, each of which takes the form

xi,t+1 = aii xit.

We already know that the general solution in this case has

xit = ki aii^t

for all i = 1, 2, ..., n, where particular values for the constants ki, i = 1, 2, ..., n, can be found if, for example, the initial conditions x10, x20, ..., xn0 are given.
But even in the more general case where A is not diagonal, we can solve (3) almost as easily by drawing on our results having to do with eigenvalues, eigenvectors, and diagonalizable matrices.
Finally, with these solutions for the zit's in hand, undo the transformation to solve for the xit's:

xt = Pzt
   = [ v1 v2 ... vn ] [z1t, z2t, ..., znt]′
   = [ v1 v2 ... vn ] [k1 r1^t, k2 r2^t, ..., kn rn^t]′

or

xt = k1 r1^t v1 + k2 r2^t v2 + ... + kn rn^t vn, (5)

where particular values for the constants ki, i = 1, 2, ..., n can be pinned down, for example, by initial conditions x10, x20, ..., xn0 and the implied values of z10, z20, ..., zn0.
There is another way of solving systems of linear difference equations that also exploits the fact that

P⁻¹AP = D,

where D is diagonal. Note that this equality can be restated as

A = PDP⁻¹.

Now consider (3) once again:

xt+1 = Axt (3)
xt+1 = PDP⁻¹xt

Starting from x0, calculate

x1 = PDP⁻¹x0
x2 = PDP⁻¹x1 = PDP⁻¹PDP⁻¹x0 = PDDP⁻¹x0 = PD²P⁻¹x0
x3 = PDP⁻¹x2 = PDP⁻¹PD²P⁻¹x0 = PD³P⁻¹x0

Hence, in general,

xt = PD^t P⁻¹x0,

which suggests that (3) has the general solution

xt = PD^t P⁻¹q, (6)
where particular values of the constants in the vector

q = [q1, q2, ..., qn]′
can be pinned down, for example if the initial conditions

x0 = [x10, x20, ..., xn0]′

are given, since in that case:

x0 = PD⁰P⁻¹q = PIP⁻¹q = q.
To verify that (6) is the general solution, note that it satisfies

xt+1 = PD^(t+1)P⁻¹q
     = PDD^t P⁻¹q
     = PDID^t P⁻¹q
     = PDP⁻¹PD^t P⁻¹q
     = PDP⁻¹xt
     = Axt.
Now, in general, it is not the case that for an arbitrary square matrix B, B^t can be calculated by raising each element of B to the tth power. However, in the special case of a diagonal matrix, it does turn out that this is true.
FACT: For the diagonal matrix D:

D^t =
[ r1^t 0    ... 0    ]
[ 0    r2^t ... 0    ]
[ ...  ...  ... ...  ]
[ 0    0    ... rn^t ].
In light of this fact, (6) can be rewritten as

xt = PD^t P⁻¹q (6)
xt = [ v1 v2 ... vn ] D^t k,

where

k = [k1, k2, ..., kn]′ = P⁻¹q.
Thus, (6) is equivalent to

xt = [ v1 v2 ... vn ] [k1 r1^t, k2 r2^t, ..., kn rn^t]′

or

xt = k1 r1^t v1 + k2 r2^t v2 + ... + kn rn^t vn, (5)

which confirms that the two approaches lead us to the same general solution.
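Both routes can be checked at once. This sketch (assuming NumPy is available; the matrix A and initial vector are arbitrary examples with distinct real eigenvalues inside the unit circle, not from the text) compares direct iteration of xt+1 = Axt with the diagonalized form xt = P D^t P⁻¹ x0.

```python
import numpy as np

# Arbitrary example with distinct real eigenvalues (0.6 and 0.3)
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
x0 = np.array([1.0, 2.0])

r, P = np.linalg.eig(A)  # eigenvalues and eigenvector matrix
t = 25

# Route 1: iterate x_{t+1} = A x_t directly
x = x0.copy()
for _ in range(t):
    x = A @ x

# Route 2: x_t = P D^t P^(-1) x0, with D^t formed by powering the diagonal
Dt = np.diag(r ** t)
x_diag = P @ Dt @ np.linalg.solve(P, x0)

print(np.max(np.abs(x - x_diag)))  # tiny: the two routes agree
print(np.all(np.abs(r) < 1))       # True: both eigenvalues inside the unit circle
print(np.max(np.abs(x)))           # near zero: the path converges to the steady state
```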
Before moving on, note that a stationary, or steady state, solution to

xt+1 = Axt (3)

is a solution of the form xt = x̄ for all t = 0, 1, 2, ..., where the vector of constants x̄ must satisfy

x̄ = Ax̄.

If A − I is nonsingular, this requires that

x̄ = 0,

where 0 is an n × 1 vector of zeros. These calculations reveal that (3) has the steady state solution

xt = 0.
The general solution

xt = k1 r1^t v1 + k2 r2^t v2 + ... + kn rn^t vn (5)

reveals that the steady state xt = 0 will be asymptotically stable if

|ri| < 1

for all i = 1, 2, ..., n, for in this case

lim_{t→∞} ri^t = 0

for all i = 1, 2, ..., n and hence

lim_{t→∞} xt = 0

for any choice of k1, k2, ..., kn corresponding to any set of initial conditions x10, x20, ..., xn0.

If, on the other hand, one or more of the ri's is such that

|ri| ≥ 1,

then xt = 0 is unstable.
2 Lag Operators
Many economic applications, including some dynamic optimization problems, give rise to difference equations of the slightly more general form

yt = ayt−1 + xt, (7)
where xt is an exogenous, and possibly random, forcing variable.
In these cases, we often want to obtain solutions that express yt in terms of current, past, or future values of xt.
Suppose, therefore, that (7) holds for negative as well as positive values of t. And suppose that the time horizon extends into the infinite past as well as into the infinite future. Then t = ..., −2, −1, 0, 1, 2, ...
Now consider working backward:

yt = ayt−1 + xt

and

yt−1 = ayt−2 + xt−1

imply

yt = a²yt−2 + axt−1 + xt.

And since

yt−2 = ayt−3 + xt−2,

we can also write

yt = a³yt−3 + a²xt−2 + axt−1 + xt

or, more generally,

yt = a^T yt−T + Σ_{j=0}^{T−1} a^j xt−j. (8)
Now assume that:

a) The parameter a is less than one in absolute value: |a| < 1.
b) The exogenous sequence {xt}_{t=−∞}^{∞} is bounded.
c) The endogenous sequence {yt}_{t=−∞}^{∞} is also required to be bounded.

Then we might repeat our backward substitution an infinite number of times or, equivalently, take the limit in (8) as T → ∞.

Since |a| < 1 and {yt}_{t=−∞}^{∞} is bounded,

lim_{T→∞} a^T yt−T = 0.
And since |a| < 1 and {xt}_{t=−∞}^{∞} is bounded,

lim_{T→∞} Σ_{j=0}^{T−1} a^j xt−j = Σ_{j=0}^{∞} a^j xt−j < ∞.

Thus, if |a| < 1, (7) has the solution

yt = Σ_{j=0}^{∞} a^j xt−j (9)

in which yt is expressed as a weighted sum of current and past values of xt.
What happens when, instead, |a| > 1?
In this case, we can rewrite (7) as

yt = ayt−1 + xt (7)
yt+1 = ayt + xt+1
ayt = yt+1 − xt+1
yt = (1/a)yt+1 − (1/a)xt+1

Now work forward:

yt = (1/a)yt+1 − (1/a)xt+1

and

yt+1 = (1/a)yt+2 − (1/a)xt+2

imply

yt = (1/a)²yt+2 − (1/a)²xt+2 − (1/a)xt+1.

And since

yt+2 = (1/a)yt+3 − (1/a)xt+3,

we can also write

yt = (1/a)³yt+3 − (1/a)³xt+3 − (1/a)²xt+2 − (1/a)xt+1

or, more generally,

yt = (1/a)^T yt+T − Σ_{j=1}^{T} (1/a)^j xt+j. (10)
Since |a| > 1 and {yt}_{t=−∞}^{∞} is bounded,

lim_{T→∞} (1/a)^T yt+T = 0.
And since |a| > 1 and {xt}_{t=−∞}^{∞} is bounded,

lim_{T→∞} Σ_{j=1}^{T} (1/a)^j xt+j = Σ_{j=1}^{∞} (1/a)^j xt+j < ∞.

Thus, if we repeat our forward substitution an infinite number of times or, equivalently, take the limit in (10) as T → ∞, we obtain the solution

yt = −Σ_{j=1}^{∞} (1/a)^j xt+j (11)

in which yt is expressed as a weighted sum of future values of xt.
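Both formulas can be verified numerically. This sketch (pure Python; the forcing process xt = sin(t) and the coefficients 0.8 and 1.25 are arbitrary bounded examples, not from the text) truncates the infinite sums at a large horizon and checks that each candidate solution satisfies yt = a yt−1 + xt.

```python
import math

def x(t):
    """An arbitrary bounded forcing sequence."""
    return math.sin(t)

def y_backward(t, a, N=2000):
    """Truncated backward solution y_t = sum_{j>=0} a^j x_{t-j}, for |a| < 1."""
    return sum(a ** j * x(t - j) for j in range(N))

def y_forward(t, a, N=2000):
    """Truncated forward solution y_t = -sum_{j>=1} (1/a)^j x_{t+j}, for |a| > 1."""
    return -sum((1 / a) ** j * x(t + j) for j in range(1, N))

# Stable case: solve backwards
a = 0.8
back_resid = abs(y_backward(10, a) - (a * y_backward(9, a) + x(10)))
print(back_resid)  # essentially zero: (9) satisfies y_t = a*y_{t-1} + x_t

# Unstable case: solve forwards
a = 1.25
fwd_resid = abs(y_forward(10, a) - (a * y_forward(9, a) + x(10)))
print(fwd_resid)   # essentially zero: (11) satisfies y_t = a*y_{t-1} + x_t
```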
It turns out that we can derive the solutions (9) and (11) more quickly if we use an analytic tool called the lag operator.
The lag operator, denoted by L, is defined by

Lxt = xt−1 and Lyt = yt−1.

Note that

L²xt = LLxt = Lxt−1 = xt−2

and, more generally,

L^j xt = xt−j

for j = 0, 1, 2, ...

In addition,

Lxt+1 = xt

implies

L⁻¹xt = xt+1

and, more generally,

L^(−j) xt = xt+j

for j = 0, 1, 2, ...
Thus, the lag operator works to shift the time index on a variable backward when raised to positive powers and forward when raised to negative powers.

Using the lag operator, (7) can be written as

yt = ayt−1 + xt (7)
yt = aLyt + xt
(1 − aL)yt = xt
yt = (1 − aL)⁻¹xt. (12)
Now suppose that |a| < 1, and recall the following fact.

FACT: If |β| < 1, then

Σ_{j=0}^{∞} β^j = 1/(1 − β) = (1 − β)⁻¹.

It turns out that we can apply this fact to (12) as well, and write

yt = (1 − aL)⁻¹xt (12)
yt = Σ_{j=0}^{∞} (aL)^j xt
   = Σ_{j=0}^{∞} a^j L^j xt

or

yt = Σ_{j=0}^{∞} a^j xt−j, (9)

which is, of course, the same solution that we obtained by repeated backward substitution.
Alternatively, if |a| > 1, we can use the lag operator to rewrite (7) as

yt = ayt−1 + xt (7)
yt+1 = ayt + xt+1
L⁻¹yt = ayt + L⁻¹xt
ayt − L⁻¹yt = −L⁻¹xt
yt − (aL)⁻¹yt = −(aL)⁻¹xt
[1 − (aL)⁻¹]yt = −(aL)⁻¹xt
yt = −(aL)⁻¹[1 − (aL)⁻¹]⁻¹xt. (13)
Since |a| > 1, |1/a| < 1. Thus, applying the fact

(1 − β)⁻¹ = Σ_{j=0}^{∞} β^j
to (13) yields

yt = −(aL)⁻¹ Σ_{j=0}^{∞} (aL)^(−j) xt
   = −(aL)⁻¹ Σ_{j=0}^{∞} (1/a)^j L^(−j) xt
   = −Σ_{j=0}^{∞} (1/a)^(j+1) L^(−j−1) xt
   = −Σ_{j=1}^{∞} (1/a)^j L^(−j) xt

or

yt = −Σ_{j=1}^{∞} (1/a)^j xt+j, (11)

which is, of course, the same solution that we obtained using repeated forward substitution.
Note: Consider forming a matrix A with the parameter a from (7) as its only element:

A = [ a ].

Then A has a as its single eigenvalue: r1 = a.
Next, recall that for the linear system

xt+1 = Axt,

the steady state xt = 0 is asymptotically stable if the eigenvalues of A are all less than one in absolute value and unstable if one or more of the eigenvalues of A is greater than one in absolute value.
By analogy, the difference equation

yt = ayt−1 + xt (7)

is said to be stable if |a| < 1 and unstable if |a| > 1.
Since, when |a| < 1, we can use backward substitution to obtain

yt = Σ_{j=0}^{∞} a^j xt−j (9)

and, when |a| > 1, we can use forward substitution to obtain

yt = −Σ_{j=1}^{∞} (1/a)^j xt+j, (11)

it is often said that stable equations are solved backwards and unstable equations are solved forwards.
3 Four Examples
3.1 Example 1: Optimal Growth
Production function:

F(kt) = kt^α, where 0 < α < 1.

Evolution of the capital stock:

kt+1 = kt^α − ct.

Utility:

Σ_{t=0}^{∞} β^t ln(ct), where 0 < β < 1.
Earlier, we solved this problem via dynamic programming, by guessing that the value function takes the form

v(kt) = E + F ln(kt).

After deriving the first order and envelope conditions and solving for the unknown constants E and F, we concluded that the optimal capital stock follows the difference equation

kt+1 = αβ kt^α. (14)
So far, we have only discussed linear difference equations. But notice that if we take logs on both sides of (14), we obtain

ln(kt+1) = ln(αβ) + α ln(kt)

ln(kt+1) = (1 − α)[ln(αβ)/(1 − α)] + α ln(kt)

ln(kt+1) − ln(αβ)/(1 − α) = α[ln(kt) − ln(αβ)/(1 − α)].

Consider, therefore, defining the new variable

zt = ln(kt) − ln(αβ)/(1 − α)

and rewriting the difference equation more simply as

zt+1 = αzt.

Now we have a linear difference equation, which we know has the general solution

zt = qα^t.
Given the initial condition k0, we can calculate the initial condition

z0 = ln(k0) − ln(αβ)/(1 − α)

and thereby determine the particular solution

zt = α^t z0 = α^t [ln(k0) − ln(αβ)/(1 − α)].

Now undo the transformation to find the solution for kt:

ln(kt) − ln(αβ)/(1 − α) = zt = α^t [ln(k0) − ln(αβ)/(1 − α)].

Hence, given k0, (14) has the solution

ln(kt) = ln(αβ)/(1 − α) + α^t [ln(k0) − ln(αβ)/(1 − α)].

Since |α| < 1, this solution tells us that for any value of k0,

lim_{t→∞} ln(kt) = ln(αβ)/(1 − α) = ln[(αβ)^(1/(1−α))].

Hence, starting from any initial capital stock k0, the capital stock converges to a steady state level:

lim_{t→∞} kt = (αβ)^(1/(1−α)).
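This convergence is easy to see by brute-force iteration of (14). A minimal sketch (the values α = 0.3, β = 0.95, k0 = 0.05 are illustrative choices, not from the text):

```python
# Illustrative parameter values (not from the text)
alpha, beta = 0.3, 0.95
k = 0.05  # arbitrary initial capital stock

# Iterate k_{t+1} = alpha*beta*k_t^alpha
for _ in range(200):
    k = alpha * beta * k ** alpha

k_star = (alpha * beta) ** (1 / (1 - alpha))  # predicted steady state
print(k, k_star)  # the iteration has converged to the steady state
```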
3.2 Example 2: Stochastic Optimal Growth
Problem set 3 extended our optimal growth example by adding a random shock to productivity.

Production function:

F(kt, zt) = zt kt^α, where 0 < α < 1 and zt is random with E[ln(zt)] = 0 for all t = 0, 1, 2, ...

Evolution of the capital stock:

kt+1 = zt kt^α − ct.

Expected utility:

E0 Σ_{t=0}^{∞} β^t ln(ct), where 0 < β < 1.
Problem set 3 asked you to solve this problem via dynamic programming and to show that the optimal capital stock follows the first-order autoregressive process

ln(k_{t+1}) = ln(αβ) + α ln(k_t) + ln(z_t) (15)

Rewrite this difference equation as

ln(k_{t+1}) = (1−α)[ln(αβ)/(1−α)] + α ln(k_t) + ln(z_t)
or, more simply, as

x_{t+1} = α x_t + ε_t,

where

x_t = ln(k_t) − ln(αβ)/(1−α)

and

ε_t = ln(z_t)
Since |α| < 1, we can use the lag operator to solve for x_t in terms of past values of ε_t:

x_{t+1} = α x_t + ε_t

x_t = α x_{t−1} + ε_{t−1}

x_t = αL x_t + L ε_t

(1 − αL) x_t = L ε_t

x_t = L(1 − αL)^{−1} ε_t

x_t = L Σ_{j=0}^∞ (αL)^j ε_t

x_t = Σ_{j=0}^∞ α^j L^{j+1} ε_t

x_t = Σ_{j=0}^∞ α^j ε_{t−j−1}
Now undo the transformations to find the solution for k_t in terms of past z_t:

ln(k_t) − ln(αβ)/(1−α) = x_t = Σ_{j=0}^∞ α^j ε_{t−j−1}

ln(k_t) = ln(αβ)/(1−α) + Σ_{j=0}^∞ α^j ln(z_{t−j−1})
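As a sanity check on the lag-operator algebra, the moving-average form x_t = Σ_{j=0}^∞ α^j ε_{t−j−1} can be compared numerically with the recursion x_{t+1} = α x_t + ε_t. The sketch below assumes x_0 = 0, so the infinite sum truncates at the first shock ε_0; the value of α is illustrative.

```python
# Verify that the moving-average solution matches the AR(1) recursion.
import random

random.seed(0)
alpha = 0.5
T = 50
eps = [random.gauss(0.0, 1.0) for _ in range(T)]  # shocks ε_0, ..., ε_{T−1}

# Build x_t by direct recursion, starting from x_0 = 0.
x = [0.0]
for t in range(T - 1):
    x.append(alpha * x[t] + eps[t])

# Rebuild x_{T−1} from the moving-average sum over past shocks.
t = T - 1
x_ma = sum(alpha ** j * eps[t - j - 1] for j in range(t))

print(x[t] - x_ma)  # essentially zero: the two forms agree
```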
Use this solution to characterize the behavior of output:

y_t = z_t k_t^α

implies

ln(y_t) = ln(z_t) + α ln(k_t)

or

ln(y_t) = ln(z_t) + [α/(1−α)] ln(αβ) + α Σ_{j=0}^∞ α^j ln(z_{t−j−1}) (16)
Equation (16) reveals that y_t will be serially correlated even if z_t is serially uncorrelated. Thus, the process of optimal capital accumulation can transform serially uncorrelated shocks to productivity into serially correlated movements in output like those that occur over the business cycle.
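This claim is easy to see in a simulation. The sketch below draws i.i.d. shocks ln(z_t), so z_t is serially uncorrelated, iterates equation (15), and measures the first-order sample autocorrelation of ln(y_t); the parameter values are illustrative assumptions, not taken from the text.

```python
# Serially uncorrelated shocks produce serially correlated output via (15).
import math
import random

random.seed(1)
alpha, beta = 0.3, 0.95  # illustrative parameter values
T = 20000

ln_k = 0.0
ln_y = []
for t in range(T):
    ln_z = random.gauss(0.0, 1.0)                        # i.i.d. productivity shock
    ln_y.append(ln_z + alpha * ln_k)                     # ln(y_t) = ln(z_t) + α ln(k_t)
    ln_k = math.log(alpha * beta) + alpha * ln_k + ln_z  # equation (15)

# First-order sample autocorrelation of ln(y_t)
m = sum(ln_y) / T
num = sum((ln_y[t] - m) * (ln_y[t - 1] - m) for t in range(1, T))
den = sum((y - m) ** 2 for y in ln_y)
rho = num / den

print(rho)  # positive and close to α: output is serially correlated
```

In fact, (16) implies that the deviation of ln(y_t) from its mean is itself an AR(1) process with coefficient α, so the estimate should settle near 0.3 here.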
3.3 Example 3: Nonlinear Difference Equations

In our first two examples, we were able to solve nonlinear difference equations by transforming them into linear difference equations.
For this third example, consider a more general nonlinear difference equation of the form

x_{t+1} = g(x_t)

and suppose that the function g does not allow us to rewrite this equation as a linear difference equation.
In this more general case, we might not be able to find an explicit solution.
As in continuous time, however, we might still be able to characterize the solution graphically.
In case one, illustrated below, the graph of g(x) intersects the 45-degree line at x = 0 and x = x∗, revealing that there are two steady states.
But starting from x_0, which can be arbitrarily close to zero, the graph reveals that

lim_{t→∞} x_t = x∗,

implying that x_t = x∗ is asymptotically stable, while x_t = 0 is unstable.
In case two, once again, there are two steady states: x_t = 0 and x_t = x∗.

But in this second case, x_t = 0 is asymptotically stable, while x_t = x∗ is unstable.
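The two cases can be illustrated numerically with simple hypothetical maps that are not taken from the text: g(x) = x^{1/2} behaves like case one (x∗ = 1 is asymptotically stable, 0 is unstable), while g(x) = x^2 behaves like case two (0 is asymptotically stable, x∗ = 1 is unstable).

```python
# Iterate x_{t+1} = g(x_t) for two hypothetical maps with fixed points 0 and 1.

def iterate(g, x0, n=100):
    """Apply x_{t+1} = g(x_t) n times starting from x0."""
    x = x0
    for _ in range(n):
        x = g(x)
    return x

# Case one: even starting arbitrarily close to zero, iterates of the
# square-root map climb toward the steady state x* = 1.
x_case1 = iterate(lambda x: x ** 0.5, 0.001)

# Case two: starting just below x* = 1, iterates of the squaring map
# fall toward the steady state at zero.
x_case2 = iterate(lambda x: x ** 2, 0.9)

print(x_case1, x_case2)
```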
[Figure — Case 1: x_t = x∗ is asymptotically stable. The graph plots x_{t+1} = g(x_t) against the 45-degree line, with iterates x_0, x_1, x_2, ... converging to x∗.]
[Figure — Case 2: x_t = 0 is asymptotically stable. The graph plots x_{t+1} = g(x_t) against the 45-degree line, with iterates x_0, x_1, x_2, ... converging to 0.]
3.4 Example 4: Markov Processes
For our last example, consider an economy in which, during each period t, each individual agent experiences a random shock that places him or her into one of n distinct categories, or states.
Example: During any period, an individual worker might be employed (in state 1) or unemployed (in state 2).
For i = 1, 2, ..., n, let

x_{it} = probability that a representative agent will be in state i during period t

If the size of the population is large, then

x_{it} = fraction of the population in state i during period t

For all t = 0, 1, 2, ..., it must be that

Σ_{i=1}^n x_{it} = 1
Next, for i = 1, 2, ..., n and j = 1, 2, ..., n, let:

m_{ij} = probability that an agent in state j during period t will be in state i during period t + 1

Again, if the size of the population is large,

m_{ij} = fraction of the agents in state j during period t who will move to state i during period t + 1

Once again, for all t = 0, 1, 2, ... and j = 1, 2, ..., n, it must be that

Σ_{i=1}^n m_{ij} = 1
The random, or stochastic, process that allocates individual agents to individual states in this example is called a Markov process or a Markov chain.

The defining characteristic of a Markov process is that only the immediate past matters: the probability that an agent will be in state i during period t + 1 depends only on the state j that the agent is in during period t.

The probabilities m_{ij} are called transition probabilities. Since, in this example, the m_{ij} do not depend on time, the Markov process is stationary.
Suppose we collect all of the transition probabilities into a matrix:

M = [ m_{11} m_{12} ... m_{1n} ]
    [ m_{21} m_{22} ... m_{2n} ]
    [  ...    ...   ...   ...  ]
    [ m_{n1} m_{n2} ... m_{nn} ]

Then M is called a Markov matrix, and it has the special property that each of its columns has entries that sum to one.
In general, since the m_{ij} are probabilities, they must satisfy

m_{ij} ≥ 0

for all i = 1, 2, ..., n and j = 1, 2, ..., n. If the stronger condition

m_{ij} > 0

holds for all i = 1, 2, ..., n and j = 1, 2, ..., n, then the Markov matrix is said to be regular.
Note that if we know all of the fractions x_{jt}, j = 1, 2, ..., n, we can calculate the fractions x_{i,t+1} using

x_{i,t+1} = Σ_{j=1}^n m_{ij} x_{jt} (17)

for all i = 1, 2, ..., n.
Alternatively, if we define

x_t = [ x_{1t} ]
      [ x_{2t} ]
      [  ...   ]
      [ x_{nt} ] ,

then we can write the equations in (17) in matrix form as:

x_{t+1} = [ x_{1,t+1} ]   [ m_{11} m_{12} ... m_{1n} ] [ x_{1t} ]
          [ x_{2,t+1} ] = [ m_{21} m_{22} ... m_{2n} ] [ x_{2t} ] = M x_t, (18)
          [    ...    ]   [  ...    ...   ...   ...  ] [  ...   ]
          [ x_{n,t+1} ]   [ m_{n1} m_{n2} ... m_{nn} ] [ x_{nt} ]

which is just a system of linear difference equations.
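One step of this system can be sketched for the two-state worker example (employed = state 1, unemployed = state 2). The transition probabilities below are illustrative assumptions, not values from the text.

```python
# One step of x_{t+1} = M x_t for a hypothetical two-state Markov chain.

# M[i][j] = m_{ij} = probability of moving from state j+1 into state i+1;
# note that each column sums to one.
M = [[0.9, 0.4],
     [0.1, 0.6]]

x = [0.5, 0.5]  # initial distribution: half employed, half unemployed

# Equation (17): x_{i,t+1} = Σ_j m_{ij} x_{jt}
x_next = [sum(M[i][j] * x[j] for j in range(2)) for i in range(2)]

print(x_next)  # still a distribution: nonnegative entries summing to one
```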
FACTS: Let M be a regular Markov matrix. Then
a) M has one eigenvalue that is equal to one: r_1 = 1

b) Every other eigenvalue of M is less than one in absolute value: |r_i| < 1 for i = 2, 3, ..., n
c) For any constant k_1, we can find an eigenvector v_1 of M corresponding to the eigenvalue r_1 = 1, such that the elements of w_1 = k_1 v_1 all lie between zero and one. Moreover, the elements of w_1 sum to one. The elements of w_1 can therefore be interpreted as probabilities or, if the population is large, fractions of the population.
We know that the general solution to a system of linear difference equations like (18) takes the form

x_t = k_1 r_1^t v_1 + k_2 r_2^t v_2 + ... + k_n r_n^t v_n, (5)
We also know from our facts that if M is a regular Markov matrix,

lim_{t→∞} r_i^t = 0 for i = 2, 3, ..., n
Thus, so long as M is regular, (5) implies that starting from any initial x_0,

lim_{t→∞} x_t = k_1 v_1 = w_1,

where the elements of w_1 can be interpreted as fractions of the population.
These results tell us that if the Markov matrix M is regular, then starting from any initial distribution of the population into states,

x_0 = [ x_{10} ]
      [ x_{20} ]
      [  ...   ]
      [ x_{n0} ] ,

the economy will converge over time towards a steady state, in which the distribution of the population into states is given by

w_1 = [ w_{11} ]
      [ w_{12} ]
      [  ...   ]
      [ w_{1n} ] .
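A numerical sketch of this convergence result, for an illustrative two-state regular Markov matrix (the probabilities are assumptions, not from the text): iterating x_{t+1} = M x_t from any initial distribution should converge to w_1, the eigenvector of M for the eigenvalue r_1 = 1, scaled so its elements sum to one.

```python
# Convergence of a regular two-state Markov chain to its steady state w_1.

M = [[0.9, 0.4],
     [0.1, 0.6]]

# For a 2x2 Markov matrix, the stationary distribution solves w = Mw:
# w_1 = (m12/(m12 + m21), m21/(m12 + m21)) = (0.8, 0.2) here.
w1 = [0.4 / 0.5, 0.1 / 0.5]

x = [0.0, 1.0]  # start with the entire population unemployed
for _ in range(200):
    x = [M[0][0] * x[0] + M[0][1] * x[1],
         M[1][0] * x[0] + M[1][1] * x[1]]

print(x, w1)  # x has converged to the stationary distribution w_1
```

The speed of convergence is governed by the second eigenvalue (0.5 here), so 200 iterations leave no visible gap.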