
STATIC LECTURE 1

UNIVERSITY OF MARYLAND: ECON 600

1. Introduction

The first half of the course is about techniques for solving a class of constrained optimization problems that, in their most general form, can be described verbally as follows. There are some outcomes, x ∈ A, that a decision-maker cares about. The decision-maker can only choose from a subset, C ⊂ A. What is the best choice for the decision-maker?

Micro-economics is very much about problems of this sort:

• How does a consumer select a consumption bundle from his affordable set to maximize utility?
• How does a monopolist set prices to maximize profits given demand?
• How can a manager select inputs so as to minimize costs and achieve a target output?

And so on. Mathematically, this problem can be stated by assuming there exists a real-valued function, f : A → R, which characterizes the preferences of the decision-maker. The decision-maker then solves the problem

$$\max_{x \in C} f(x)$$

Although the theory allows us to deal with a wide variety of different types of x, in this course we will almost exclusively focus on the case where A = R^n and the decision-maker is selecting an n-dimensional vector.

Date: Summer 2013.
[1] The notes corresponding to the static part of the course were prepared by Prof. Vincent.
[2] Notes compiled on August 2, 2013.


From your initial experience with microeconomics, you may already have been confronted with concepts and issues in constrained optimization. Many of the techniques you acquired in coping with these problems are covered in more detail in this course.

The intent is two-fold: first, to give you a grounding in the theory that underlies many of the standard constrained optimization techniques you are already using; second, because the statements of some of the theorems are delicate (they require special conditions), to make you aware of when you can use them directly, when dangers arise, and of some ways of coping with those dangers.

The plan of the first half of the course is to establish some mathematical preliminaries, then to start with unconstrained optimization (Part II), next characteristics of constraint sets (Part III), and then the most important theory in constrained optimization, Kuhn-Tucker theory (Part IV). After that, applications are studied to appreciate where the theory works and where it fails. The last section (Part V) covers an important area of application of Kuhn-Tucker theory: how to use it to conduct comparative statics. The second half of the course then uses these techniques to examine dynamic optimization.

2. Preliminary Concepts

2.1. Some Examples. How do we actually go about solving an optimization problem of the form $\max_x f(x)$? One can imagine just programming the function and computing the solution. But what would be a more reliable analytic approach?
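As a quick illustration of the brute-force idea, here is a minimal Python sketch (the objective f is a hypothetical example of mine, and a grid search like this is only trustworthy when the grid is fine enough and the true maximizer lies inside it):

    import numpy as np

    # A hypothetical objective: f(x) = -(x - 2)^2 + 3, maximized at x = 2.
    f = lambda x: -(x - 2)**2 + 3

    # Brute force: evaluate f on a fine grid and keep the best grid point.
    grid = np.linspace(-10.0, 10.0, 100001)
    x_star = grid[np.argmax(f(grid))]
    print(x_star)  # approximately 2.0

The rest of this section builds the analytic machinery that replaces this kind of search.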

You are all probably fairly confident in your ability to solve many simple unconstrained optimization problems. Loosely speaking, you take a first derivative and find where the slope of the function is zero.

Obviously, this is too simple, as the next figures show (Figures 2.1, 2.2, 2.3, and 2.4).

The problem in Figure 2.1 is that the solution is unbounded. No matter how high we select x, we can still do better by choosing a higher one.


Figure 2.1. Function increases without bound

In Figure 2.2, the difficulty is that the function is not differentiable at the optimum.

In Figure 2.3, the function is not continuous and does not achieve an optimum at the candidate solution, x∗.

In Figure 2.4, there are multiple local optima. In fact, there are three solutions to the problem "take the derivative of f(x) and find where it is equal to zero."

Other problems may also arise. For example, while we can understand the approach for the single-dimensional problem, f : R → R, how does the technique work for the more interesting and more common multi-dimensional case, f : R^n → R?

2.2. Continuity and Linearity.

2.2.1. Metrics. If we are focusing on the set of problems where the choice variable is an n-dimensional real vector (x ∈ R^n), then we need to develop an idea of what it means for two different choices, x and y, to be "close" to each other. That is, we need an idea of distance, or norm, or metric.


Figure 2.2. Derivative does not exist at maximum

Figure 2.3. Function is discontinuous at maximum


Figure 2.4. Function has multiple local maxima

The notion of distance or "closeness" is the usual common-sense idea of Euclidean distance. For a vector x ∈ R^n, we say that the length of that vector (or equivalently, the distance of that vector from the origin), its norm, is defined by

$$\|x\| = \sqrt{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2}$$

That is, the square root of the sum of the squares of its components.

Remark 1. See SB 29.4

In fact, there are many different possible concepts of distance we could have used, even in R^n. If you notice what the norm is doing, you will see that it takes an element of our vector space and gives us back a real number, which we interpret as telling us the distance or size of that element.


Figure 2.5. The Triangle Inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖

More generally, then, a norm is any function that operates on a vector space and gives us back a real number, and which satisfies the following three properties:

(1) ‖x‖ ≥ 0, and furthermore ‖x‖ = 0 if and only if x = 0
(2) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (Triangle Inequality)
(3) ‖ax‖ = |a| ‖x‖ for all a ∈ R and all x ∈ V

Prove for yourself that the Euclidean norm satisfies these conditions. Observe that the triangle inequality is geometrically sensible in Figure 2.5.

Remark 2. Some other possible norms on R^n are:

(1) $\|x\|_p = (|x_1|^p + |x_2|^p + |x_3|^p + \cdots + |x_n|^p)^{1/p}$ ($L^p$ norm)
(2) $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ (taxicab norm)
(3) $\|x\|_\infty = \max\{|x_1|, |x_2|, |x_3|, \ldots, |x_n|\}$ (maximum norm)

Which norm is the appropriate choice can depend partly on the context (for example, suppose you had to choose among multiple-lane tunnels and you needed to make sure that the tunnel you selected had a lane wide enough to let your vehicle pass through; which would be the appropriate norm?) and partly on convenience: some norms have better features or are more tractable than others.
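A small Python sketch of the norms above (assuming NumPy; np.linalg.norm computes each of them directly):

    import numpy as np

    x = np.array([3.0, -4.0, 12.0])

    euclidean = np.sqrt(np.sum(x**2))        # L2 norm: 13.0
    taxicab   = np.sum(np.abs(x))            # L1 norm: 19.0
    maximum   = np.max(np.abs(x))            # maximum norm: 12.0
    p = 3
    lp = np.sum(np.abs(x)**p)**(1.0 / p)     # general Lp norm

    # The same quantities via the library routine:
    assert np.isclose(euclidean, np.linalg.norm(x, 2))
    assert np.isclose(taxicab, np.linalg.norm(x, 1))
    assert np.isclose(maximum, np.linalg.norm(x, np.inf))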


Note that if we think of the norm of x as denoting the distance of x from the origin, then for any two vectors x and y, the norm of x − y is a measure of the distance between x and y.

We have a good intuition about what continuity means for a function from the real line to the real line (no jumps or gaps). But since we can rarely draw the graphs of functions on more complicated spaces, we need a more precise definition. Roughly, we want to ensure that whenever x′ is close to x in the domain space, f(x′) is close to f(x) in the target space. Since we focus on R^n, we will typically just use the Euclidean norm as our idea of distance.

Definition 3. A sequence of elements, {x_n}, is said to converge to a point x ∈ R^n if for every δ > 0 there is a number N such that for all n > N, ‖x_n − x‖ < δ.

Definition 4. A function f : R^n → R^m is said to be continuous at a point x if for ALL sequences {x_n} converging to x, the derived sequence of points in the target space, {f(x_n)}, converges to the point f(x). We say that a function is continuous if it is continuous at all points in its domain.
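A numerical illustration of Definition 4, using a step function like the one in Figure 2.3 as a stand-in (one failing sequence is enough to demonstrate discontinuity; numerics alone can never prove continuity):

    # A step function, discontinuous at 0 (cf. Figure 2.3).
    f = lambda x: 0.0 if x < 0 else 1.0

    # A sequence x_n -> 0 from below: f(x_n) = 0 for every n, but f(0) = 1,
    # so {f(x_n)} does not converge to f(0) and f is not continuous at 0.
    x_n = [-1.0 / k for k in range(1, 10001)]
    print(f(x_n[-1]), f(0.0))  # 0.0 versus 1.0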

Observe why the example in Figure 2.3 above fails the condition of continuity at x∗.

Definition 5. A function f : V → W is linear if for any two real numbers a, b and for any two elements v, v′ ∈ V we have f(av + bv′) = af(v) + bf(v′).

Note that any linear function from R^n to R^m can be represented by an m × n matrix, A, such that f(x) = Ax. (You might also observe that this means that f(x) is the (column) vector of numbers that results when we take the inner product of every row of A with x.)
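A quick Python check of this matrix representation and of linearity, with a hypothetical A defining a map f : R^3 → R^2:

    import numpy as np

    # A 2 x 3 matrix represents a linear map f: R^3 -> R^2 via f(x) = Ax.
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, -1.0, 3.0]])
    f = lambda x: A @ x

    v = np.array([1.0, 0.0, 2.0])
    w = np.array([0.0, 1.0, -1.0])
    a, b = 2.0, -3.0

    # Linearity: f(av + bw) = a f(v) + b f(w)
    assert np.allclose(f(a*v + b*w), a*f(v) + b*f(w))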

Note that although we sometimes call functions from R to R of the form f(x) = mx + b "linear" functions, these are really "affine" functions. Why do these functions not generally satisfy the definition of linear functions?


Remark 6. In order to ensure that a solution exists to an optimization problem (that is, to rule out problems like that shown in Figure 2.3), we generally need to rule out decision problems where the objective function, f(x), is not continuous in x. The typical approach is simply to assume that the problem is such that the continuity of the objective function holds. Usually this is not controversial. However, there are cases where it is too strong. We can sometimes make a weaker assumption, namely that the objective function satisfies

$$\limsup_{n \to \infty} f(x_n) \le f(x) \quad \text{for all sequences } x_n \to x$$

In this case, we say f(x) is "upper semi-continuous". For an economic example where this became an issue, see Dasgupta, P. and E. Maskin, "The Existence of Equilibrium in Discontinuous Economic Games, I: Theory", Review of Economic Studies, 53, 1986, pp. 1-26.
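A simple example, using only the definition just given: let

$$f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$

For any sequence $x_n \to 0$, each $f(x_n)$ is either 0 or 1, so $\limsup_{n \to \infty} f(x_n) \le 1 = f(0)$. Hence f is upper semi-continuous at 0 even though it is not continuous there, and it still attains a maximum on, say, [−1, 1].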

2.3. Vector Geometry. The next problem, addressed in the next two subsections, is how to extend our intuition about an optimum being at a point where the objective function has slope zero to multiple dimensions. First we need some notions of vector geometry, and later some multidimensional calculus.

Definition 7. A set of vectors $\{v_1, v_2, \ldots, v_n\}$ is linearly independent if and only if the only set of numbers $\{a_1, a_2, \ldots, a_n\}$ that satisfies the equation

$$a_1 v_1 + a_2 v_2 + \cdots + a_n v_n = 0$$

is the trivial solution where all the $a_i$ are identically 0.

For x, y ∈ R^n, the inner product of x and y is $x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$. Note that there is a direct relationship between the inner product and the Euclidean norm, namely

$$\|x\|^2 = x \cdot x$$


Figure 2.6. Upper semi-continuous (bottom point open, top closed)

Two vectors are orthogonal to each other (geometrically, perpendicular to each other) if x · y = 0.

Suppose that x = (1, 0, 0). Find a vector which is orthogonal to x. Show that if y is orthogonal to x, then ay is also orthogonal to x. Show that there are two linearly independent vectors which are orthogonal to x.

Let v, w ∈ R^n. In matrix notation, v and w are n × 1 matrices. v′ is the transpose of v; that is, it is the 1 × n matrix derived from v. We can thus write the inner product of v and w as the transpose of v pre-multiplying w. That is,

$$v'w = \sum_{i=1}^{n} v_i w_i = v \cdot w$$


Note as well that in R^n, if we take any two vectors and join them "at the tail", the two vectors will define a plane (a two-dimensional flat surface) in R^n. How are the two vectors related?

Theorem 8. (SB 10.4)

• If v · w > 0, v and w form an acute angle with each other.
• If v · w < 0, they form an obtuse (greater than 90 degrees) angle with each other.
• If v · w = 0, then they are perpendicular to each other. (They are orthogonal to each other.)
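A quick numerical check of Theorem 8 in Python (the example vectors are mine):

    import numpy as np

    def angle_type(v, w):
        # Classify the angle between v and w by the sign of the inner product.
        d = np.dot(v, w)
        if np.isclose(d, 0.0):
            return "orthogonal"
        return "acute" if d > 0 else "obtuse"

    print(angle_type(np.array([1.0, 0.0]), np.array([1.0, 1.0])))   # acute
    print(angle_type(np.array([1.0, 0.0]), np.array([-1.0, 1.0])))  # obtuse
    print(angle_type(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # orthogonal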

2.4. Hyperplanes: Supporting and Separating.

Definition 9. A linear function is a transformation from R^n to R^m with the feature that f(ax + by) = af(x) + bf(y) for all x, y in R^n and for all a, b in R.

Fact 10. Every linear functional (a linear function with R as the range space) can itself be represented by an n-dimensional vector (call it $(f_1, f_2, \ldots, f_n)$) with the feature that

$$f(x) = \sum_{i=1}^{n} f_i x_i$$

That is, the value of the functional at x is just the inner product of this defining vector $(f_1, f_2, \ldots, f_n)$ with x.

Remark 11. Note that once we fix a domain space, for example R^n, we could ask the question, "What constitutes all of the possible linear functionals defined on that space?" Obviously this is a large set. The set of all such functionals for a given domain space V is called the dual space of V and is often denoted V∗.

Fact 12. The fact above implies that R^n is its own dual space. This symmetry, though, does not always hold (for example, if the domain space is the vector space of all continuous functions defined over [0, 1], the dual space is quite different), so it is mathematically correct to continue to carefully distinguish between a domain space and its dual space.

Definition 13. A hyperplane is the set of points given by {x : f(x) = c} where f is a linear functional and c is some real number. (We cannot have f be the trivial linear functional of all zeroes.)

Example. For R^2 a hyperplane is a straight line.

Example. For R^3 a hyperplane is a plane.

Intuition: Note that using the definition of a hyperplane, we can think of it as one of the many level sets of the linear functional f. As we vary c, we change level sets.

Intuition: Suppose that x, y are two points on a given hyperplane with defining vector $(f_1, f_2, \ldots, f_n)$. Note that the vector that joins x and y, x − y, lies along the hyperplane. Using the definition of the hyperplane, we can show that the defining vector $(f_1, f_2, \ldots, f_n)$ is orthogonal to the hyperplane, in the sense that it is orthogonal to any line that joins any two points on the hyperplane. (Prove this for yourself; a hint appears below.)
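(Hint, in case you get stuck: since x and y lie on the hyperplane, f(x) = f(y) = c, so by linearity

$$(f_1, f_2, \ldots, f_n) \cdot (x - y) = f(x) - f(y) = c - c = 0,$$

which is exactly the statement that the defining vector is orthogonal to x − y.)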

Definition 14. A half-space is the set of points on one side or the other of a hyperplane. It can be defined formally as HS(f) = {x : f(x) ≥ c} or HS(f) = {x : f(x) ≤ c}, where f is the linear functional that defines the hyperplane.

Now consider any two disjoint (non-intersecting) sets. When can we construct a hyperplane that goes in between them, that is, "separates" them?

Definition 15. A hyperplane separates two sets, C1 and C2, if for all x ∈ C1, f(x) ≤ c, and for all x ∈ C2, f(x) ≥ c. That is, the two sets lie completely in the two different half-spaces determined by the hyperplane.

In R^2, a separating hyperplane looks like Figure 2.7. But of course, it is not always possible to draw separating hyperplanes. Try doing it in Figure 2.8.


Figure 2.7. A Separating hyperplane

Definition 16. If C lies in a half-space defined by H and H contains a point on the boundary of C, then we say that H is a supporting hyperplane of C.

Recall the following important definition.

Definition 17. A set C ⊂ R^n is convex if for all x, y ∈ C and for all α ∈ [0, 1], αx + (1 − α)y ∈ C.

Since any given convex set can be represented as the intersection of the half-spaces defined by all of the supporting hyperplanes of the set, we may be able to anticipate the role of hyperplanes in optimization theory. The problem "find the point in C that minimizes the distance to x" yields the same answer as the problem "among all the separating hyperplanes between x and C, find the hyperplane that is farthest from x." Note that this hyperplane is a supporting hyperplane of C and is orthogonal to the vector joining x to its nearest point in C. (See Figure 2.9.)

Figure 2.8. No separating hyperplane exists

Figure 2.9. Supporting hyperplanes


Figure 2.10. Separating Hyperplane Examples

There are many versions of the separating hyperplane theorem, but I will give just one. (See also de la Fuente, pp. 241-244.) (See Lecture 3 for more details on Int and other set notation.)

Theorem 18. (Takayama pp. 39-49) Suppose X, Y are non-empty convex sets in R^n such that Int(Y) ∩ X = ∅ and the interior of Y is not empty. Then there exists a vector a in R^n which is the defining vector of a separating hyperplane between X and Y. That is, there is a c such that for all x ∈ X, a · x ≤ c, and for all y ∈ Y, c ≤ a · y.

Remark 19. The requirement that the interior of Y be disjoint from X allows the two sets to intersect on a boundary. The requirement that the interior of Y be nonempty rules out the counterexample of two intersecting lines (see Figure 2.10).

Definition 20. The graph of a function from V to W is the set of ordered pairs {(v, w) : v ∈ V, w = f(v)}.

Example. The graph of f(x) = x² is {(x, x²) : x ∈ R}. See Figure 2.11.

Remark 21. The graph of a function is what you normally see when you draw the function in a Cartesian diagram.


Figure 2.11. A Graph

2.5. Derivatives, Gradients, and Subgradients. You already know that, in well-behaved problems, a necessary condition for x∗ to be an unconstrained maximum of a function f is that its derivative (if it exists) be zero at x∗. Indeed, this notion generalizes: if the partial derivatives of a function f : R^n → R exist at x∗ and x∗ is an unconstrained maximum, then all the partial derivatives at x∗ must be zero.

The questions explored in this section are:

• Why is this true?
• What happens if the derivatives do not exist?
• What is the geometric interpretation of this?

2.5.1. Single Dimension. For f : R → R, the formal definition of the derivative of f at some point x is

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

where h represents any sequence going to zero.
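In numerical work this limit is typically approximated by a finite difference. A minimal Python sketch (the central difference used here is a standard, slightly more accurate variant of the forward difference in the definition above):

    def derivative(f, x, h=1e-6):
        # Central finite-difference approximation to f'(x).
        return (f(x + h) - f(x - h)) / (2 * h)

    f = lambda x: x**3
    print(derivative(f, 2.0))  # approximately 12.0 = 3 * 2**2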


Figure 2.12. The Derivative of f

Note that this object does not exist at every x for every function. Thus, we sometimes encounter functions (even continuous functions) which do not have a derivative at some x. Though we can often rule these out without any harm, it is also the case that non-differentiable functions arise naturally in economic problems, so we cannot always do this.

Informally, we think of the derivative of f at x as telling us about the slope of f. Note that this is really a notion about the graph of f. Another way to think about what the derivative does, which ties more directly into optimization theory and also gives us a better clue about how to extend it to many dimensions, is to see that it defines a "supporting hyperplane" to the graph of f at the point (x, f(x)).

To see this, consider the points in (x, y) space given by

$$H = \{(x, y) : (f'(x^*), -1) \cdot (x, y) = f'(x^*) x^* - f(x^*)\}$$

This is a hyperplane and exactly defines the tangent line drawn in the graph. It touches (is tangent to) the graph of f at (x∗, f(x∗)).


2.5.2. Multidimensional Derivatives. The extension of the derivative to functions f : R^n → R is fairly direct. For the ith component, the ith partial derivative of f at $x = (x_i, x_{-i})$ is computed by thinking of the function $f_{x_i} : R \to R$ given by

$$f_{x_i}(x_i) = f(x_i; x_{-i})$$

where we treat the components $x_{-i}$ as fixed and vary $x_i$. We then compute the partial derivative of f with respect to $x_i$ at x by computing the ordinary one-dimensional derivative of $f_{x_i}$. This is like taking a "slice" of the graph of f along the ith dimension:

$$\frac{\partial f(x)}{\partial x_i} = \lim_{h \to 0} \frac{f(x_i + h, x_{-i}) - f(x)}{h}$$

Definition 22. The gradient of f at x (written ∇f(x)) is the n-dimensional vector which lists all the partial derivatives of f, if they exist:

$$\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)'$$

Definition 23. The derivative of f at x, written

$$Df = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right),$$

is the 1 × n row vector of partial derivatives, if they exist. That is, it is the transpose of the gradient of f.

These objects are useful because if we take a small vector $v = (v_1, v_2, \ldots, v_n) \in R^n$, the gradient ∇f helps us determine approximately how f changes when we move from x in the direction of v. The sum over i = 1, …, n of $v_i \cdot \partial f(x)/\partial x_i$ is a very close estimate of the change in f when we move from x to x + v. That is,

$$f(x + v) \approx f(x) + \sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i} v_i$$
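A minimal Python check of this first-order approximation, with a hypothetical f : R^2 → R and the gradient approximated by finite differences:

    import numpy as np

    def grad(f, x, h=1e-6):
        # Gradient of f at x via central finite differences.
        g = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2 * h)
        return g

    f = lambda x: x[0]**2 + 3 * x[0] * x[1]   # a hypothetical f: R^2 -> R
    x = np.array([1.0, 2.0])
    v = np.array([0.01, -0.02])

    exact = f(x + v) - f(x)                   # true change in f
    approx = grad(f, x) @ v                   # first-order estimate
    print(exact, approx)                      # close for small v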


Figure 2.13. The two-dimensional gradient as a supporting hyperplane

Using the gradient of f, we can write this summation term more concisely as an inner product,

$$\sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i} v_i = \nabla f(x) \cdot v,$$

or as a matrix multiplication,

$$\sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i} v_i = v' \nabla f(x)$$

As in the one-dimensional case, the gradient can be interpreted as defining a supporting hyperplane of the graph of f. The hyperplane is defined as:

$$H = \{(x, y) : x \in \mathbb{R}^n,\ y \in \mathbb{R},\ (\nabla f(x^*), -1) \cdot (x, y) = \nabla f(x^*) \cdot x^* - f(x^*)\}$$

The object on the right side of the equality is just a real number that corresponds to the position of the hyperplane; the left side corresponds to the "slope" of the hyperplane.

The function in Figure 2.13 is the graph of f(x, y) = x² + y², together with its gradient hyperplane at the point (1, 1). The flat plane (the gradient hyperplane) just kisses the graph at the point (1, 1, 2).
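Concretely, for this example (writing points of the graph as (x, y, z) with z = f(x, y)): f(1, 1) = 2 and ∇f(1, 1) = (2, 2), so the hyperplane condition $(\nabla f(x^*), -1) \cdot (x, y, z) = \nabla f(x^*) \cdot x^* - f(x^*)$ becomes

$$2x + 2y - z = 2 \cdot 1 + 2 \cdot 1 - 2 = 2,$$

that is, z = 2x + 2y − 2, which indeed passes through (1, 1, 2). Since $x^2 + y^2 - (2x + 2y - 2) = (x - 1)^2 + (y - 1)^2 \ge 0$, the plane lies weakly below the graph everywhere, confirming that it is supporting.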


2.5.3. Second Order Derivatives. If f : R^n → R^m, then f is a vector-valued function (that is, for any x ∈ R^n, f gives a vector in R^m). Alternatively, we can think of f as consisting of a list of m functions, $f^i : R^n \to R$, i = 1, 2, …, m, and we can take derivatives of each of these functions as above. In this case, the gradient of f at x is actually an n × m matrix. Each column of the matrix, say column i, is the n-dimensional gradient of the function $f^i$.

This logic allows us to consider second derivatives of a function, f : R^n → R. If f is twice continuously differentiable (written as C²), then we can also differentiate the gradient of f (which is a function from R^n to R^n) to get an n × n matrix of functions:

$$\nabla^2 f(x) = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{pmatrix}$$

which is sometimes written as

$$\nabla^2 f(x) = \begin{pmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{pmatrix}$$

Definition 24. The derivative of the gradient (or Jacobian of the gradient) of f is called the Hessian of f.

Theorem 25. Young's Theorem (SB 14.5): If f is C², then $f_{ij}(x) = f_{ji}(x)$. That is, the Hessian of f is symmetric.
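A numerical illustration of Young's Theorem in Python, approximating the Hessian of a hypothetical C² function by finite differences:

    import numpy as np

    def hessian(f, x, h=1e-4):
        # Numerical Hessian of f at x via central finite differences.
        n = len(x)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei = np.zeros(n); ei[i] = h
                ej = np.zeros(n); ej[j] = h
                H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                           - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
        return H

    f = lambda x: x[0]**2 * x[1] + np.sin(x[1])   # a hypothetical C^2 function
    H = hessian(f, np.array([1.0, 2.0]))
    print(np.allclose(H, H.T, atol=1e-4))         # True, as Young's Theorem predicts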

2.6. Homogeneous and Homothetic Functions. Certain functions on R^n are especially well-behaved:

Definition 26. A function f : R^n → R is homogeneous of degree k if $f(tx_1, tx_2, \ldots, tx_n) = t^k f(x)$ for t ∈ R.


One reason why these functions are useful is that they are very easy to characterize. For example, suppose we only knew the values the function takes on some "ball" around the origin. Then we can use the homogeneity assumption to determine its value everywhere in R^n. That is because, for a ball that completely surrounds the origin, we can write any point x′ ∈ R^n as a scalar multiple of some point x on that ball, so x′ = tx. Then apply the definition.

Definition 27. If $f(tx_1, tx_2, \ldots, tx_n) = t f(x)$, we say that f is linearly homogeneous.

An important feature of homogeneous functions comes from the following theorem:

Theorem 28. Euler's Theorem (SB 20.4): If f is homogeneous of degree k, then

$$x \cdot \nabla f(x) = k f(x)$$

Proof. Use the chain rule of differentiation to get

$$\frac{d}{dt} f(tx_1, tx_2, \ldots, tx_n) = \frac{\partial f(tx)}{\partial x_1} x_1 + \frac{\partial f(tx)}{\partial x_2} x_2 + \cdots + \frac{\partial f(tx)}{\partial x_n} x_n = \sum_{i=1}^{n} \frac{\partial f(tx)}{\partial x_i} x_i = \nabla f(tx) \cdot x$$

where $\partial f(tx)/\partial x_i$ represents the partial derivative of f with respect to its ith argument, evaluated at tx. Now note that by homogeneity,

$$\frac{d}{dt} f(tx_1, tx_2, \ldots, tx_n) = \frac{d}{dt} t^k f(x_1, x_2, \ldots, x_n) = k t^{k-1} f(x)$$

Both of these results hold for any value of t, so in particular choose t = 1. Substituting and combining the two equations gives the result. □
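As a worked example (a Cobb-Douglas form, chosen only for familiarity): let $f(x, y) = x^a y^b$, which is homogeneous of degree k = a + b. Then

$$x f_x + y f_y = x \cdot a x^{a-1} y^b + y \cdot b x^a y^{b-1} = (a + b)\, x^a y^b = k f(x, y),$$

exactly as the theorem requires.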

Definition 29. A ray through x ∈ R^n is defined as the set {x′ ∈ R^n : x′ = tx, t ∈ R}. Geometrically, it is the line joining x and the origin, extending forever in both directions.


A useful feature of homogeneous functions is that the Jacobian or gradient of the function is (essentially) the same along any ray. Since for any two points x, x′ on a ray (x′ = tx) we have f(x′) = t^k f(x), the partial derivatives of f are homogeneous of degree k − 1, so ∇f(x′) = ∇f(tx) = t^{k−1} ∇f(x). That is, the gradient at x′ is just a scalar multiple of the gradient at x, and so the two vectors are linearly dependent.

This means that the level sets of the function, along any ray, have the same slope.

Application: An important application of this is that homogeneous utility functions rule out "income effects" on demand. (At constant prices, consumers demand goods in the same proportions as income changes.) This feature of proportional gradient vectors is not restricted to homogeneous functions; homothetic functions also exhibit it.

Definition 30. A function $f : R^n_+ \to R_+$ is homothetic if f(x) = h(v(x)), where $h : R_+ \to R_+$ is strictly increasing and $v : R^n_+ \to R_+$ is homogeneous of degree k.

2.7. Some More Geometry of Vectors in R^n.

Theorem 31. (SB Theorems 10.3 and 14.2) Consider a continuously differentiable function f : R^n → R. ∇f(x) is a vector in R^n which points in the direction of greatest increase of f moving from the point x.

Note that if we define a (small) vector v such that ∇f(x) · v = 0, then we know that v moves us away from x in a direction that adds zero to the value of f(x). Therefore, solving the equation v · ∇f(x) = 0 is a way of finding the level sets of f(x). Geometrically, the vector v is tangent to the level set of f at x. Also, we know that v and ∇f(x) are orthogonal (or normal) to each other (by definition, since two vectors w, v are orthogonal if w · v = 0). Thus, the direction of greatest increase of a function at a point x is at right angles to the level set at x.

Definition 32. Consider a function f : R^n → R. The level set of f, the set {x : f(x) = c}, is the set of points in R^n which yield the same value c for f. The set of points {x : f(x) ≥ c} is an upper contour set of f.


If n = 2 and f is C¹, we can solve for the level set of f by using the total differential. Let (dx, dy) satisfy

$$df = 0 = \frac{\partial f(x, y)}{\partial x} dx + \frac{\partial f(x, y)}{\partial y} dy$$

Then the differential equation

$$\frac{dy}{dx} = -\frac{\partial f(x, y)/\partial x}{\partial f(x, y)/\partial y},$$

along with an initial condition y(x₀) = y₀, will trace out the level set of f through the point (x₀, y₀).
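A rough Python sketch of this procedure for f(x, y) = x² + y² through (x₀, y₀) = (0, 1), using simple Euler steps (a proper ODE solver would be more accurate, and the method breaks down wherever ∂f/∂y = 0):

    # Trace the level set of f(x, y) = x^2 + y^2 through (0, 1) by
    # integrating dy/dx = -f_x / f_y = -x / y with Euler steps.
    fx = lambda x, y: 2 * x
    fy = lambda x, y: 2 * y

    x, y, dx = 0.0, 1.0, 1e-4
    for _ in range(5000):
        y += -fx(x, y) / fy(x, y) * dx
        x += dx

    print(x, y, x**2 + y**2)  # stays close to the circle x^2 + y^2 = 1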