
Chapter 4: Differentiation

Marianito R. Rodrigo

1 Differentiability and the derivative

Let us recall how we defined the derivative in elementary calculus. Suppose that $f : E \to \mathbb{R}$, where $E \subseteq \mathbb{R}$ is an open interval containing $x_0$. We defined the derivative of $f$ at $x_0$ to be the real number
$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}, \tag{1}$$
provided the limit on the right-hand side exists. Now suppose that $f : E \to \mathbb{R}^m$, where $E \subseteq \mathbb{R}^n$ is now an open set containing $x_0$. Then we cannot extend (1) straightforwardly, since $h$ would necessarily be a vector in $\mathbb{R}^n$ for $x_0 + h$ to make sense. But then we would be dividing by the vector $h$, which is undefined when $n > 1$. One way around this problem is to reformulate the definition of the derivative in the single-variable case.

Define a linear function $T : \mathbb{R} \to \mathbb{R}$ such that
$$T(h) = f'(x_0)h$$
for every $h \in \mathbb{R}$. Then (1) is equivalent to
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0) - T(h)}{h} = 0$$
or
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = 0. \tag{2}$$
Viewed in this way, we can alternatively say that a real-valued function of a real variable is differentiable at $x_0$ if we can find a linear function $T : \mathbb{R} \to \mathbb{R}$ such that (2) holds. The main difference is that (2) makes sense even when $f$ is a vector-valued function of a vector variable. In this case, however, we consider a linear transformation instead of a linear function.


Definition 1.1. Let $f : E \to \mathbb{R}^m$, where $E \subseteq \mathbb{R}^n$ is an open set containing $x_0$. Then $f$ is said to be differentiable at $x_0$ if there exists a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ such that
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = 0. \tag{3}$$
We say that $f$ is differentiable on $E$ if $f$ is differentiable at $x_0$ for every $x_0 \in E$.

Some remarks are in order here. Firstly, in (3) the norm in the numerator is taken in $\mathbb{R}^m$, while the norm in the denominator is taken in $\mathbb{R}^n$. Secondly, $x_0$ is an interior point of $E$ since $E$ is open. This implies that there exists $r > 0$ such that $B(x_0; r) \subseteq E$. If $h \in \mathbb{R}^n$ is taken in such a way that $0 < |h| < r$, then
$$|(x_0 + h) - x_0| = |h| < r,$$
i.e., $x_0 + h \in B(x_0; r) \subseteq E$. Therefore $f$ is always defined at $x_0 + h$ provided that $|h|$ is small but positive.

The next proposition asserts the uniqueness of the linear transformation satisfying (3).

Proposition 1.2. Let $f : E \to \mathbb{R}^m$, where $E \subseteq \mathbb{R}^n$ is an open set containing $x_0$. Suppose that there exist linear transformations $S : \mathbb{R}^n \to \mathbb{R}^m$ and $T : \mathbb{R}^n \to \mathbb{R}^m$ such that
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - S(h)|}{|h|} = 0 \quad\text{and}\quad \lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = 0.$$
Then $S = T$.

Proof. For $h \in \mathbb{R}^n \setminus \{0\}$, we have by the Triangle Inequality that
$$0 \le \frac{|S(h) - T(h)|}{|h|} = \frac{|S(h) - [f(x_0+h) - f(x_0)] + [f(x_0+h) - f(x_0)] - T(h)|}{|h|} \le \frac{|f(x_0+h) - f(x_0) - S(h)|}{|h|} + \frac{|f(x_0+h) - f(x_0) - T(h)|}{|h|}.$$
Taking the limit as $|h| \to 0$ and using the hypotheses on $S$ and $T$, we obtain
$$\lim_{|h| \to 0} \frac{|S(h) - T(h)|}{|h|} = 0.$$


Fix $x \in \mathbb{R}^n \setminus \{0\}$ and let $h = tx$, where $t \in \mathbb{R} \setminus \{0\}$. Then $|h| = |tx| = |t||x|$, and therefore $|h| \to 0$ is equivalent to $t \to 0$. Thus, using the linearity of $S$ and $T$,
$$0 = \lim_{t \to 0} \frac{|S(tx) - T(tx)|}{|tx|} = \lim_{t \to 0} \frac{|t|\,|S(x) - T(x)|}{|t|\,|x|} = \lim_{t \to 0} \frac{|S(x) - T(x)|}{|x|} = \frac{|S(x) - T(x)|}{|x|},$$
where the last equality holds since the expression inside the limit is independent of $t$. Then $|S(x) - T(x)| = 0$, or $S(x) = T(x)$ for every $x \in \mathbb{R}^n \setminus \{0\}$. Moreover, $S(0) = 0 = T(0)$ since $S$ and $T$ are linear transformations. This proves that $S(x) = T(x)$ for every $x \in \mathbb{R}^n$, i.e., $S = T$. □

Due to this uniqueness result, we may denote $T$ by $Df(x_0)$ and call it the derivative of $f$ at $x_0$. We denote the matrix of the linear transformation $Df(x_0)$ by $f'(x_0)$, and call it the Jacobian matrix of $f$ at $x_0$. This is also sometimes denoted by $Jf(x_0)$. We reiterate that $Df(x_0)$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$, while $f'(x_0)$ is an $m \times n$ matrix. When $m = n = 1$, $f'(x_0)$ is a $1 \times 1$ matrix whose single entry is the number denoted by $f'(x_0)$ in elementary calculus.

Example 1.3. Fix $c \in \mathbb{R}^m$. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be defined by $f(x) = c$ for every $x \in \mathbb{R}^n$. Define the linear transformation $T(h) = 0 \in \mathbb{R}^m$ for every $h \in \mathbb{R}^n$. Then
$$\lim_{|h| \to 0} \frac{|f(x_0+h) - f(x_0) - T(h)|}{|h|} = \lim_{|h| \to 0} \frac{|c - c - 0|}{|h|} = 0.$$
By uniqueness of the derivative, $Df(x_0)(h) = 0$ for every $h \in \mathbb{R}^n$.

Example 1.4. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Define $T$ such that $T(h) = f(h)$ for every $h \in \mathbb{R}^n$. It follows that $T$ is also a linear transformation. Then
$$\lim_{|h| \to 0} \frac{|f(x_0+h) - f(x_0) - T(h)|}{|h|} = \lim_{|h| \to 0} \frac{|f(x_0) + f(h) - f(x_0) - f(h)|}{|h|} = 0.$$
By uniqueness of the derivative, $Df(x_0)(h) = f(h)$ for every $h \in \mathbb{R}^n$.

Example 1.5. Let $f : \mathbb{R}^n \to \mathbb{R}$ be the scalar field defined by
$$f(x) = \langle c, x \rangle = c_1 x_1 + \cdots + c_n x_n,$$
where $c = (c_1, \dots, c_n)$ is a fixed $n$-vector and $x = (x_1, \dots, x_n)$. Using the properties of the inner product, it is easily verified that $f$ is a linear transformation. The previous example then gives $Df(x_0)(h) = f(h) = \langle c, h \rangle$. For each $j = 1, \dots, n$,
$$Df(x_0)(e_j) = \langle c, e_j \rangle = c_j.$$
Hence $f'(x_0)$ is the $1 \times n$ matrix
$$f'(x_0) = \begin{pmatrix} c_1 & \cdots & c_n \end{pmatrix}.$$

Example 1.6. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be the scalar field defined by
$$f(x) = f(x_1, x_2) = x_1^2 x_2, \qquad x = (x_1, x_2).$$
Suppose that $x_0 = (1, 1)$, and take the linear transformation (verify!)
$$T(h) = T(h_1, h_2) = 2h_1 + h_2, \qquad h = (h_1, h_2).$$
Then
$$f(x_0+h) - f(x_0) - T(h) = f(1+h_1, 1+h_2) - f(1,1) - T(h_1, h_2) = (1+h_1)^2(1+h_2) - 1 - (2h_1 + h_2) = h_1^2 + 2h_1 h_2 + h_1^2 h_2.$$
Recall that $|h_1| \le |h|$ and $|h_2| \le |h|$. Hence
$$|f(x_0+h) - f(x_0) - T(h)| = |h_1^2 + 2h_1 h_2 + h_1^2 h_2| \le |h_1|^2 + 2|h_1||h_2| + |h_1|^2|h_2| \le |h|^2 + 2|h|^2 + |h|^3,$$
so that
$$0 \le \frac{|f(x_0+h) - f(x_0) - T(h)|}{|h|} \le 3|h| + |h|^2.$$
Taking the limit as $|h| \to 0$ gives
$$\lim_{|h| \to 0} \frac{|f(x_0+h) - f(x_0) - T(h)|}{|h|} = 0.$$
By uniqueness of the derivative, $Df(x_0)(h) = Df(1,1)(h_1, h_2) = 2h_1 + h_2$. Since
$$Df(x_0)(e_1) = 2, \qquad Df(x_0)(e_2) = 1,$$
the Jacobian matrix of $f$ at $x_0$ is
$$f'(x_0) = f'(1,1) = \begin{pmatrix} 2 & 1 \end{pmatrix}.$$
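The limit computation in Example 1.6 can also be checked numerically. The following sketch (the function names are ours, not from the text) evaluates the ratio $|f(x_0+h) - f(x_0) - T(h)|/|h|$ for shrinking $h$ and watches it tend to zero:

```python
import math

# Numerical check of Example 1.6: f(x1, x2) = x1^2 * x2 at x0 = (1, 1)
# with candidate derivative T(h) = 2*h1 + h2.  The ratio
# |f(x0 + h) - f(x0) - T(h)| / |h| should shrink to 0 as |h| -> 0.

def f(x1, x2):
    return x1 ** 2 * x2

def T(h1, h2):
    return 2 * h1 + h2

for k in range(1, 6):
    h1 = h2 = 10.0 ** (-k)                 # shrink h along the diagonal
    norm_h = math.hypot(h1, h2)
    ratio = abs(f(1 + h1, 1 + h2) - f(1, 1) - T(h1, h2)) / norm_h
    print(f"|h| = {norm_h:.1e}, ratio = {ratio:.3e}")
```

The printed ratios shrink linearly in $|h|$, consistent with the bound $3|h| + |h|^2$ obtained above.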

Finding the derivative of a function using the definition, as well as the matrixassociated with it, is difficult in general. Fortunately, we will find an easier wayof doing it once we have studied the properties of the derivative.


2 Algebra of derivatives

In this section we look at the properties of the derivative. As is to be expected, these properties are generalizations of those obtained in elementary calculus.

Proposition 2.1. If f is differentiable at x0, then it is continuous at x0.

Proof. Suppose that $f$ is differentiable at $x_0$. Then
$$\lim_{|h| \to 0} \frac{|f(x_0+h) - f(x_0) - Df(x_0)(h)|}{|h|} = 0.$$
Also, since $Df(x_0)$ is a linear transformation, there exists $K > 0$ such that
$$|Df(x_0)(h)| \le K|h|$$
for every $h \in \mathbb{R}^n$. If $h \ne 0$, then the Triangle Inequality yields
$$0 \le |f(x_0+h) - f(x_0)| = |f(x_0+h) - f(x_0) - Df(x_0)(h) + Df(x_0)(h)| \le |f(x_0+h) - f(x_0) - Df(x_0)(h)| + |Df(x_0)(h)| = \frac{|f(x_0+h) - f(x_0) - Df(x_0)(h)|}{|h|}\,|h| + |Df(x_0)(h)| \le \frac{|f(x_0+h) - f(x_0) - Df(x_0)(h)|}{|h|}\,|h| + K|h|.$$
Taking the limit as $|h| \to 0$ shows that
$$\lim_{|h| \to 0} |f(x_0+h) - f(x_0)| = 0,$$
i.e., $f$ is continuous at $x_0$. □

Note that the converse is not always true, as we have seen in elementary calculus. For example, the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|$ is continuous at $x_0 = 0$ but is not differentiable there.

Proposition 2.2. Let $f : E \to \mathbb{R}^m$ and $g : E \to \mathbb{R}^m$, where $E \subseteq \mathbb{R}^n$ is an open set containing $x_0$, and suppose that $t \in \mathbb{R}$. If $f$ and $g$ are differentiable at $x_0$, then $f + g$ and $tf$ are also differentiable at $x_0$. Furthermore,
$$D(f+g)(x_0) = Df(x_0) + Dg(x_0) \quad\text{and}\quad D(tf)(x_0) = tDf(x_0).$$

Proof. If $f$ and $g$ are differentiable at $x_0$, then
$$\lim_{|h| \to 0} \frac{|f(x_0+h) - f(x_0) - S(h)|}{|h|} = 0 \quad\text{and}\quad \lim_{|h| \to 0} \frac{|g(x_0+h) - g(x_0) - T(h)|}{|h|} = 0,$$
where
$$S = Df(x_0) \quad\text{and}\quad T = Dg(x_0).$$
We claim that $S + T$ is the derivative of $f + g$ and $tS$ is the derivative of $tf$. Recall that $S + T$ and $tS$ are linear transformations if $S$ and $T$ are. For $h \ne 0$, we see that
$$0 \le \frac{|(f+g)(x_0+h) - (f+g)(x_0) - (S+T)(h)|}{|h|} = \frac{|f(x_0+h) - f(x_0) - S(h) + g(x_0+h) - g(x_0) - T(h)|}{|h|} \le \frac{|f(x_0+h) - f(x_0) - S(h)|}{|h|} + \frac{|g(x_0+h) - g(x_0) - T(h)|}{|h|}.$$
Taking the limit as $|h| \to 0$ gives
$$\lim_{|h| \to 0} \frac{|(f+g)(x_0+h) - (f+g)(x_0) - (S+T)(h)|}{|h|} = 0.$$
By uniqueness of the derivative, $D(f+g)(x_0) = S + T = Df(x_0) + Dg(x_0)$. Similarly, for $h \ne 0$,
$$0 \le \frac{|(tf)(x_0+h) - (tf)(x_0) - (tS)(h)|}{|h|} = \frac{|t f(x_0+h) - t f(x_0) - t S(h)|}{|h|} = |t|\,\frac{|f(x_0+h) - f(x_0) - S(h)|}{|h|},$$
which tends to zero as $|h| \to 0$. Again, by uniqueness of the derivative, we conclude that $D(tf)(x_0) = tS = tDf(x_0)$. □

Theorem 2.3 (Chain Rule). Let $E \subseteq \mathbb{R}^n$ and $F \subseteq \mathbb{R}^m$ be open sets with $x_0 \in E$. Let $f : E \to \mathbb{R}^m$ and $g : F \to \mathbb{R}^p$ be functions such that $f(E) \subseteq F$. If $f$ is differentiable at $x_0$ and $g$ is differentiable at $f(x_0)$, then the composition $g \circ f$ is differentiable at $x_0$ and
$$D(g \circ f)(x_0) = Dg(f(x_0)) \circ Df(x_0).$$

We remark that in terms of matrices of linear transformations, this result says that
$$(g \circ f)'(x_0) = g'(f(x_0))\, f'(x_0).$$
When $m = n = p = 1$, we recover the usual Chain Rule of elementary calculus.
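The matrix form of the Chain Rule is easy to check numerically. The sketch below uses one hypothetical pair of maps $f, g : \mathbb{R}^2 \to \mathbb{R}^2$ (our choice, not from the text) and compares a finite-difference Jacobian of $g \circ f$ against the product $g'(f(x_0))\,f'(x_0)$:

```python
import math

# A numerical check of (g ∘ f)'(x0) = g'(f(x0)) f'(x0) for one hypothetical
# pair of maps f, g : R^2 -> R^2.

def f(x, y):
    return (x * y, x + y ** 2)

def g(u, v):
    return (math.sin(u), u * v)

def jacobian(F, p, eps=1e-6):
    """Forward-difference m x n Jacobian of F at the point p."""
    base = F(*p)
    J = []
    for i in range(len(base)):
        row = []
        for j in range(len(p)):
            q = list(p)
            q[j] += eps
            row.append((F(*q)[i] - base[i]) / eps)
        J.append(row)
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x0 = (1.0, 2.0)
lhs = jacobian(lambda x, y: g(*f(x, y)), x0)        # (g ∘ f)'(x0)
rhs = matmul(jacobian(g, f(*x0)), jacobian(f, x0))  # g'(f(x0)) f'(x0)
print(lhs)
print(rhs)   # the two matrices agree up to finite-difference error
```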


Proof. Let $y_0 = f(x_0)$. For notational simplicity we introduce
$$S = Df(x_0) \quad\text{and}\quad T = Dg(y_0).$$
For $h \in \mathbb{R}^n$ and $k \in \mathbb{R}^m$, define the remainder functions
$$R_f(h) = f(x_0+h) - f(x_0) - S(h), \qquad R_g(k) = g(y_0+k) - g(y_0) - T(k).$$
Then
$$\lim_{|h| \to 0} \frac{|R_f(h)|}{|h|} = 0 \quad\text{and}\quad \lim_{|k| \to 0} \frac{|R_g(k)|}{|k|} = 0$$
since $f$ is differentiable at $x_0$ and $g$ is differentiable at $y_0$. We see that
$$(g \circ f)(x_0+h) = g(f(x_0+h)) = g(f(x_0) + S(h) + R_f(h)).$$
Taking
$$k = S(h) + R_f(h)$$
and recalling that $y_0 = f(x_0)$, the definition of $R_g$ gives
$$(g \circ f)(x_0+h) = g(f(x_0) + S(h) + R_f(h)) = g(y_0 + k) = g(y_0) + T(k) + R_g(k) = g(f(x_0)) + T(S(h) + R_f(h)) + R_g(S(h) + R_f(h)).$$
Since $T$ is linear, $T(S(h) + R_f(h)) = T(S(h)) + T(R_f(h))$. Hence
$$(g \circ f)(x_0+h) = g(f(x_0)) + T(S(h)) + T(R_f(h)) + R_g(S(h) + R_f(h)) = (g \circ f)(x_0) + (T \circ S)(h) + [T(R_f(h)) + R_g(S(h) + R_f(h))].$$
If we define $R_{g \circ f}$ by
$$R_{g \circ f}(h) = T(R_f(h)) + R_g(S(h) + R_f(h)),$$
then the above equation is equivalent to
$$(g \circ f)(x_0+h) = (g \circ f)(x_0) + (T \circ S)(h) + R_{g \circ f}(h).$$
Our task is to prove that
$$\lim_{|h| \to 0} \frac{|R_{g \circ f}(h)|}{|h|} = 0,$$


which would be equivalent to showing that
$$\lim_{|h| \to 0} \frac{|(g \circ f)(x_0+h) - (g \circ f)(x_0) - (T \circ S)(h)|}{|h|} = 0.$$
It would then follow from uniqueness of the derivative that
$$D(g \circ f)(x_0) = T \circ S = Dg(y_0) \circ Df(x_0) = Dg(f(x_0)) \circ Df(x_0).$$

We observe that since $S$ and $T$ are linear, there exist positive numbers $K$ and $L$ such that
$$|S(x)| \le K|x| \text{ for all } x \in \mathbb{R}^n \quad\text{and}\quad |T(y)| \le L|y| \text{ for all } y \in \mathbb{R}^m.$$
If $h \ne 0$, then using the Triangle Inequality gives
$$\frac{|R_{g \circ f}(h)|}{|h|} \le \frac{|T(R_f(h))|}{|h|} + \frac{|R_g(S(h) + R_f(h))|}{|h|} = \frac{|T(R_f(h))|}{|h|} + \frac{|R_g(k)|}{|h|} \le L\,\frac{|R_f(h)|}{|h|} + \frac{|R_g(k)|}{|k|}\,\frac{|k|}{|h|}.$$
Since $f$ is differentiable at $x_0$, it is continuous at $x_0$ and
$$\lim_{|h| \to 0} |k| = \lim_{|h| \to 0} |S(h) + R_f(h)| = \lim_{|h| \to 0} |f(x_0+h) - f(x_0)| = 0.$$
Moreover, the Triangle Inequality again gives
$$\frac{|k|}{|h|} = \frac{|S(h) + R_f(h)|}{|h|} = \frac{|f(x_0+h) - f(x_0)|}{|h|} = \frac{|f(x_0+h) - f(x_0) - S(h) + S(h)|}{|h|} \le \frac{|f(x_0+h) - f(x_0) - S(h)|}{|h|} + \frac{|S(h)|}{|h|} \le \frac{|R_f(h)|}{|h|} + K.$$
Combining the upper bounds yields
$$0 \le \frac{|R_{g \circ f}(h)|}{|h|} \le L\,\frac{|R_f(h)|}{|h|} + \frac{|R_g(k)|}{|k|}\left[\frac{|R_f(h)|}{|h|} + K\right].$$


Taking the limit as $|h| \to 0$ (which implies that $|k| \to 0$), we obtain
$$\lim_{|h| \to 0} \frac{|R_{g \circ f}(h)|}{|h|} = 0,$$
as required. □

Proposition 2.4. Let $f : E \to \mathbb{R}^m$ be a vector field, where $E \subseteq \mathbb{R}^n$ is an open set containing $x_0$. Suppose that $f$ has components
$$f(x) = (f_1(x), \dots, f_m(x)),$$
where $f_i : E \to \mathbb{R}$ are scalar fields for all $i = 1, \dots, m$. Then $f$ is differentiable at $x_0$ if and only if each $f_i$, $i = 1, \dots, m$, is differentiable at $x_0$, and
$$Df(x_0)(h) = (Df_1(x_0)(h), \dots, Df_m(x_0)(h))$$
for every $h \in \mathbb{R}^n$.

Proof. First, let us suppose that each $f_i$, $i = 1, \dots, m$, is differentiable at $x_0$. Note that $Df_i(x_0)$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}$. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be the linear transformation whose value at each $h \in \mathbb{R}^n$ is
$$T(h) = (Df_1(x_0)(h), \dots, Df_m(x_0)(h)).$$
Then
$$f(x_0+h) - f(x_0) - T(h) = (f_1(x_0+h) - f_1(x_0) - Df_1(x_0)(h), \dots, f_m(x_0+h) - f_m(x_0) - Df_m(x_0)(h)).$$
Before we proceed, note that
$$|x| \le \sum_{i=1}^m |x_i|$$
for any $x = (x_1, \dots, x_m) \in \mathbb{R}^m$. To prove this, we can show by induction on $m$ that
$$\sum_{i=1}^m x_i^2 = \sum_{i=1}^m |x_i|^2 \le \left(\sum_{i=1}^m |x_i|\right)^2.$$
Taking the square root of both sides yields
$$|x| = \left(\sum_{i=1}^m x_i^2\right)^{1/2} \le \sum_{i=1}^m |x_i|.$$
Using the above bound,
$$0 \le \frac{|f(x_0+h) - f(x_0) - T(h)|}{|h|} \le \sum_{i=1}^m \frac{|f_i(x_0+h) - f_i(x_0) - Df_i(x_0)(h)|}{|h|}.$$
Each of the terms on the right-hand side tends to zero as $|h| \to 0$ since each $f_i$ is differentiable at $x_0$. Therefore
$$\lim_{|h| \to 0} \frac{|f(x_0+h) - f(x_0) - T(h)|}{|h|} = 0,$$
i.e., $f$ is differentiable at $x_0$.

On the other hand, suppose that $f$ is differentiable at $x_0$. Then $f_i = \pi_i \circ f$, where $\pi_i$ is the $i$th projection function, since
$$(\pi_i \circ f)(x) = \pi_i(f(x)) = \pi_i(f_1(x), \dots, f_m(x)) = f_i(x).$$
Since $\pi_i$ is a linear transformation, it is differentiable at $f(x_0)$. By the Chain Rule, since $f$ is differentiable at $x_0$ and $\pi_i$ is differentiable at $f(x_0)$, it follows that $f_i = \pi_i \circ f$ is differentiable at $x_0$ for all $i = 1, \dots, m$. □

Lemma 2.5. Let $(h, k), (x_0, y_0) \in \mathbb{R}^2$. We have the following results:

(i) If $s : \mathbb{R}^2 \to \mathbb{R}$ is defined for every $(x, y) \in \mathbb{R}^2$ by
$$s(x, y) = x + y,$$
then
$$Ds(x_0, y_0)(h, k) = h + k,$$
i.e., $Ds(x_0, y_0) = s$.

(ii) If $p : \mathbb{R}^2 \to \mathbb{R}$ is defined for every $(x, y) \in \mathbb{R}^2$ by
$$p(x, y) = xy,$$
then
$$Dp(x_0, y_0)(h, k) = y_0 h + x_0 k.$$

Proof. Part (i) is a special case of Example 1.5 since
$$s(x, y) = \langle (1, 1), (x, y) \rangle.$$
Then
$$Ds(x_0, y_0)(h, k) = \langle (1, 1), (h, k) \rangle = h + k = s(h, k).$$


For (ii), we see that
$$\frac{|p(x_0+h, y_0+k) - p(x_0, y_0) - (y_0 h + x_0 k)|}{|(h, k)|} = \frac{|(x_0+h)(y_0+k) - x_0 y_0 - (y_0 h + x_0 k)|}{|(h, k)|} = \frac{|hk|}{|(h, k)|}.$$
But
$$|hk| \le \begin{cases} |h|^2 & \text{if } |k| \le |h|, \\ |k|^2 & \text{if } |h| \le |k|. \end{cases}$$
In either case, $|hk| \le |h|^2 + |k|^2$. Thus
$$\frac{|hk|}{|(h, k)|} \le \frac{h^2 + k^2}{\sqrt{h^2 + k^2}} = \sqrt{h^2 + k^2}$$
and
$$0 \le \frac{|p(x_0+h, y_0+k) - p(x_0, y_0) - (y_0 h + x_0 k)|}{|(h, k)|} \le \sqrt{h^2 + k^2}.$$
Taking the limit as $\sqrt{h^2 + k^2} \to 0$ implies that
$$\lim_{|(h,k)| \to 0} \frac{|p(x_0+h, y_0+k) - p(x_0, y_0) - (y_0 h + x_0 k)|}{|(h, k)|} = 0,$$
i.e.,
$$Dp(x_0, y_0)(h, k) = y_0 h + x_0 k. \qquad \Box$$

The last set of derivative properties is stated for scalar fields. We include the case of the derivative of a sum here for completeness, but it is also valid for vector fields (see Proposition 2.2).

Proposition 2.6. Let $f : E \to \mathbb{R}$ and $g : E \to \mathbb{R}$, where $E \subseteq \mathbb{R}^n$ is an open set containing $x_0$. If $f$ and $g$ are differentiable at $x_0$, then $f + g$, $fg$, and $f/g$ (if $g(x_0) \ne 0$) are also differentiable at $x_0$. Furthermore,
$$D(f+g)(x_0) = Df(x_0) + Dg(x_0),$$
$$D(fg)(x_0) = g(x_0)Df(x_0) + f(x_0)Dg(x_0),$$
and
$$D\!\left(\frac{f}{g}\right)(x_0) = \frac{g(x_0)Df(x_0) - f(x_0)Dg(x_0)}{g(x_0)^2}.$$


Proof. We use the previous lemma here. To prove the first result, we note that
$$f + g = s \circ (f, g).$$
Then the Chain Rule gives
$$D(f+g)(x_0) = Ds(f(x_0), g(x_0)) \circ D(f, g)(x_0) = s \circ (Df(x_0), Dg(x_0)) = Df(x_0) + Dg(x_0).$$
For the second result, we see that
$$fg = p \circ (f, g).$$
Again, the Chain Rule implies that
$$D(fg)(x_0) = Dp(f(x_0), g(x_0)) \circ D(f, g)(x_0) = Dp(f(x_0), g(x_0)) \circ (Df(x_0), Dg(x_0)) = Dp(f(x_0), g(x_0))(Df(x_0), Dg(x_0)) = g(x_0)Df(x_0) + f(x_0)Dg(x_0).$$
Finally, the third result is partially proved as follows. Let
$$h = \frac{f}{g}, \quad\text{so that}\quad f = gh.$$
Applying the second result,
$$Df(x_0) = g(x_0)Dh(x_0) + h(x_0)Dg(x_0).$$
But $h(x_0) = f(x_0)/g(x_0)$, so
$$D\!\left(\frac{f}{g}\right)(x_0) = Dh(x_0) = \frac{1}{g(x_0)}\left[Df(x_0) - \frac{f(x_0)}{g(x_0)}Dg(x_0)\right] = \frac{g(x_0)Df(x_0) - f(x_0)Dg(x_0)}{g(x_0)^2}.$$
This is only a partial proof since we implicitly assumed that $h = f/g$ was already shown to be differentiable at $x_0$. □


3 Directional derivatives and partial derivatives

Here we introduce the concepts of the "directional derivative" and the "partial derivative", the latter being a special case of the former. We shall see that the calculation of the Jacobian matrix $f'(x_0)$ is facilitated by calculating the partial derivatives of $f$ at $x_0$.

Definition 3.1. Let $f : E \to \mathbb{R}$, where $E \subseteq \mathbb{R}^n$ is an open set containing $x_0$. Let $v \in \mathbb{R}^n$. If
$$\lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t}$$
exists, then we call it the directional derivative of $f$ at $x_0$ in the direction of $v$ and denote it by $Df(x_0; v)$.

Note that since $f$ is a scalar field, the above limit is just the ordinary limit of a real-valued function of the real variable $t$. Also, $Df(x_0; v)$ is a real number, unlike $Df(x_0)$ (a linear transformation) or $f'(x_0)$ (a matrix). Some authors require that $v$ be a unit vector, i.e., $|v| = 1$, but we do not need to do so here.

Example 3.2. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is a linear transformation. Then
$$\frac{f(x_0 + tv) - f(x_0)}{t} = \frac{f(x_0) + t f(v) - f(x_0)}{t} = f(v).$$
Hence for every $v \in \mathbb{R}^n$,
$$Df(x_0; v) = \lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t} = \lim_{t \to 0} f(v) = f(v).$$

Example 3.3. Let $f : \mathbb{R}^n \to \mathbb{R}$ be defined by
$$f(x) = \tfrac{1}{2}|x|^2.$$
Find $Df(e_i; e_j)$. Note that if $x = (x_1, \dots, x_n)$, then this function is just
$$f(x) = f(x_1, \dots, x_n) = \tfrac{1}{2}(x_1^2 + \cdots + x_n^2).$$
For every $x_0, v \in \mathbb{R}^n$,
$$\frac{f(x_0 + tv) - f(x_0)}{t} = \frac{|x_0 + tv|^2 - |x_0|^2}{2t} = \frac{\langle x_0 + tv, x_0 + tv \rangle - \langle x_0, x_0 \rangle}{2t} = \frac{\langle x_0, x_0 \rangle + 2t\langle x_0, v \rangle + t^2 \langle v, v \rangle - \langle x_0, x_0 \rangle}{2t} = \langle x_0, v \rangle + \tfrac{1}{2}t|v|^2.$$
Hence
$$Df(x_0; v) = \lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t} = \langle x_0, v \rangle$$
and
$$Df(e_i; e_j) = \langle e_i, e_j \rangle = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j. \end{cases}$$

Example 3.4. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined for all $(x, y) \in \mathbb{R}^2$ by
$$f(x, y) = e^{xy}.$$
Find the directional derivative of $f$ at $(-1, 1)$ in the direction of $(-1, -1)$.

For any $(x_0, y_0), (a, b) \in \mathbb{R}^2$,
$$\frac{f(x_0 + at, y_0 + bt) - f(x_0, y_0)}{t} = \frac{e^{(x_0 + at)(y_0 + bt)} - e^{x_0 y_0}}{t}.$$
But
$$\lim_{t \to 0} \frac{e^{(x_0 + at)(y_0 + bt)} - e^{x_0 y_0}}{t} = \lim_{t \to 0} e^{(x_0 + at)(y_0 + bt)}(a y_0 + b x_0 + 2abt) = (a y_0 + b x_0)e^{x_0 y_0}$$
using L'Hopital's Rule. Therefore
$$Df((x_0, y_0); (a, b)) = (a y_0 + b x_0)e^{x_0 y_0}.$$
In particular,
$$Df((-1, 1); (-1, -1)) = [(-1)(1) + (-1)(-1)]\,e^{(-1)(1)} = 0.$$
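The closed-form answer in Example 3.4 can be sanity-checked with a finite-difference approximation of the defining limit (the helper name below is ours):

```python
import math

# Numerical check of Example 3.4: the directional derivative of
# f(x, y) = exp(x*y) at (x0, y0) in the direction (a, b) should equal
# (a*y0 + b*x0) * exp(x0*y0).

def f(x, y):
    return math.exp(x * y)

def directional_derivative(x0, y0, a, b, t=1e-7):
    # central-difference approximation of
    # lim_{t->0} [f(x0 + t*a, y0 + t*b) - f(x0, y0)] / t
    return (f(x0 + t * a, y0 + t * b) - f(x0 - t * a, y0 - t * b)) / (2 * t)

# The point and direction from the text: the exact value is 0.
print(directional_derivative(-1, 1, -1, -1))

# A generic (hypothetical) point, compared against the closed-form answer.
x0, y0, a, b = 0.5, 0.3, 2.0, -1.0
exact = (a * y0 + b * x0) * math.exp(x0 * y0)
print(directional_derivative(x0, y0, a, b), exact)   # the two agree
```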

If $v = e_i$, then we call $Df(x_0; e_i)$ the $i$th partial derivative of $f$ at $x_0$ and we write
$$D_i f(x_0) = Df(x_0; e_i).$$
Other notations for the $i$th partial derivative of $f$ at $x_0$ are
$$\frac{\partial f}{\partial x_i}(x_0) \quad\text{and}\quad f_{x_i}(x_0),$$
where $x = (x_1, \dots, x_n)$ and $f(x) = f(x_1, \dots, x_n)$.

Suppose that $D_i f(x_0)$ exists for all $x_0$ belonging to an open set $E \subseteq \mathbb{R}^n$. Then we can define a function $D_i f : E \to \mathbb{R}$ whose value is $D_i f(x)$ at each $x \in E$. In other words, $D_i f(x_0)$ is a real number for a fixed $x_0$, but if $x_0$ is allowed to vary over $E$, then $D_i f$ becomes a function.


For each $x = (x_1, \dots, x_n) \in E$,
$$D_i f(x) = \lim_{t \to 0} \frac{f(x + te_i) - f(x)}{t} = \lim_{t \to 0} \frac{f(x_1, \dots, x_i + t, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{t},$$
which is the ordinary derivative of the real-valued function $g$ defined by
$$g(x_i) = f(x_1, \dots, x_i, \dots, x_n).$$
Thus the problem of finding the $i$th partial derivative of $f$ reduces to that of finding the ordinary derivative of $f$ with respect to $x_i$ while keeping the other variables $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n$ fixed.

Example 3.5. Let
$$f(x, y, z) = e^{x^2 - y^2} \cos z$$
for every $(x, y, z) \in \mathbb{R}^3$. Find $D_1 f$, $D_2 f$, and $D_3 f$ at $(1, -2, \pi/2)$.

We have
$$D_1 f(x, y, z) = 2x e^{x^2 - y^2} \cos z, \qquad D_2 f(x, y, z) = -2y e^{x^2 - y^2} \cos z, \qquad D_3 f(x, y, z) = -e^{x^2 - y^2} \sin z.$$
Then $D_1 f(1, -2, \pi/2) = 0$, $D_2 f(1, -2, \pi/2) = 0$, and $D_3 f(1, -2, \pi/2) = -e^{-3}$.

Example 3.6. Verify that the scalar field
$$u(x, y) = \log\left(e^{x/2} + e^{y/3}\right)$$
satisfies
$$2\frac{\partial u}{\partial x} + 3\frac{\partial u}{\partial y} = 1$$
for all $(x, y) \in \mathbb{R}^2$. This is an example of a partial differential equation. We remark that it is a standard convention to suppress the arguments in the partial derivatives when writing a partial differential equation, e.g., $\partial u/\partial x$ instead of $(\partial u/\partial x)(x, y)$.

Keeping $y$ fixed and applying the ordinary Chain Rule in the variable $x$ gives
$$\frac{\partial u}{\partial x} = \frac{1}{2}\,\frac{e^{x/2}}{e^{x/2} + e^{y/3}}.$$
Similarly, keeping $x$ fixed and applying the ordinary Chain Rule in the variable $y$ gives
$$\frac{\partial u}{\partial y} = \frac{1}{3}\,\frac{e^{y/3}}{e^{x/2} + e^{y/3}}.$$
Hence
$$2\frac{\partial u}{\partial x} + 3\frac{\partial u}{\partial y} = \frac{e^{x/2}}{e^{x/2} + e^{y/3}} + \frac{e^{y/3}}{e^{x/2} + e^{y/3}} = \frac{e^{x/2} + e^{y/3}}{e^{x/2} + e^{y/3}} = 1$$
for all $(x, y) \in \mathbb{R}^2$.
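A PDE verification like the one in Example 3.6 also lends itself to a quick numerical spot-check. The sketch below (helper names are ours) approximates each partial derivative by a central difference and evaluates $2u_x + 3u_y$ at a few sample points:

```python
import math

# Numerical check of Example 3.6: for u(x, y) = log(exp(x/2) + exp(y/3)),
# the combination 2*u_x + 3*u_y should equal 1 at every point.

def u(x, y):
    return math.log(math.exp(x / 2) + math.exp(y / 3))

def partial(g, point, i, h=1e-6):
    # central-difference approximation of the i-th partial derivative of g
    p_plus, p_minus = list(point), list(point)
    p_plus[i] += h
    p_minus[i] -= h
    return (g(*p_plus) - g(*p_minus)) / (2 * h)

for (x, y) in [(0.0, 0.0), (1.0, -2.0), (-3.0, 5.0)]:
    value = 2 * partial(u, (x, y), 0) + 3 * partial(u, (x, y), 1)
    print(f"2*u_x + 3*u_y at ({x}, {y}) = {value:.6f}")   # close to 1
```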

Example 3.7. Verify that the scalar field
$$w(x, y, z) = \sqrt{x^2 + y^2 + z^2}$$
satisfies the partial differential equation
$$w_x^2 + w_y^2 + w_z^2 = 1$$
for all $(x, y, z) \in \mathbb{R}^3 \setminus \{(0, 0, 0)\}$.

It is clear that
$$w_x = \frac{2x}{2\sqrt{x^2 + y^2 + z^2}} = \frac{x}{\sqrt{x^2 + y^2 + z^2}}, \qquad w_y = \frac{y}{\sqrt{x^2 + y^2 + z^2}}, \qquad w_z = \frac{z}{\sqrt{x^2 + y^2 + z^2}}.$$
Therefore
$$w_x^2 + w_y^2 + w_z^2 = \frac{x^2}{x^2 + y^2 + z^2} + \frac{y^2}{x^2 + y^2 + z^2} + \frac{z^2}{x^2 + y^2 + z^2} = 1$$
for all $(x, y, z) \in \mathbb{R}^3 \setminus \{(0, 0, 0)\}$.

We have seen that if $f$ is differentiable at $x_0$, then it is continuous at $x_0$. It is natural to ask whether the same result holds for partial derivatives: if $D_1 f(x_0), \dots, D_n f(x_0)$ exist, is it necessarily true that $f$ is continuous at $x_0$? Consider $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\[4pt] 0 & \text{if } (x, y) = (0, 0). \end{cases} \tag{4}$$


Let us take $(x_0, y_0) = (0, 0)$ and calculate $D_1 f(0, 0)$ and $D_2 f(0, 0)$. By definition,
$$D_1 f(0, 0) = Df((0, 0); e_1) = \lim_{t \to 0} \frac{f(0 + t, 0) - f(0, 0)}{t} = \lim_{t \to 0} \frac{f(t, 0)}{t} = 0$$
and
$$D_2 f(0, 0) = Df((0, 0); e_2) = \lim_{t \to 0} \frac{f(0, 0 + t) - f(0, 0)}{t} = \lim_{t \to 0} \frac{f(0, t)}{t} = 0.$$
Note that for $f(t, 0)$ we have used the first part of the definition of $f$ since $t \ne 0$. Hence both partial derivatives exist at $(0, 0)$. However, we have seen in the previous chapter that this function is not continuous at $(0, 0)$. Thus we should not use the partial derivative as our generalization of the usual derivative, since it does not satisfy the analogous condition that differentiability implies continuity. In this sense, partial differentiability is weaker than differentiability as defined in terms of linear transformations.
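The pathology of (4) is easy to see computationally: the difference quotients along the axes vanish, yet the function is constantly $1/2$ along the line $y = x$, so it cannot be continuous at the origin. A brief sketch:

```python
# Illustration of the function in (4): both partial derivatives exist at the
# origin, yet f is not continuous there, since f approaches different values
# along different paths to (0, 0).

def f(x, y):
    return x * y / (x ** 2 + y ** 2) if (x, y) != (0, 0) else 0.0

# Difference quotients along the axes: both are 0, so D1 f(0,0) = D2 f(0,0) = 0.
t = 1e-8
print(f(t, 0) / t, f(0, t) / t)   # 0.0 0.0

# Along the line y = x the function is identically 1/2 away from the origin,
# so f does not tend to f(0, 0) = 0 at the origin.
for t in (0.1, 0.001, 1e-9):
    print(f(t, t))                # 0.5 each time
```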

4 Calculation of the derivative via partial derivatives

Although directional derivatives (hence partial derivatives as well) do not have some of the desired properties that we want for a derivative (e.g., differentiability implies continuity), they are still useful, since we can express the Jacobian matrix $f'(x_0)$ in terms of the partial derivatives of the component functions that make up the function $f$. Showing this result is one of the main goals of this section.

Theorem 4.1 (Necessary condition for differentiability). The following results hold for scalar and vector fields:

(i) Let $f : E \to \mathbb{R}$ be a scalar field, where $E \subseteq \mathbb{R}^n$ is an open set containing $x_0$. Assume that $f$ is differentiable at $x_0$, with derivative $Df(x_0)$. Then the directional derivative $Df(x_0; v)$ exists for every $v \in \mathbb{R}^n$ and
$$Df(x_0)(v) = Df(x_0; v).$$
In fact,
$$Df(x_0)(v) = \langle \nabla f(x_0), v \rangle,$$
where $\nabla f(x_0)$ is the gradient of $f$ at $x_0$ and is defined by the $n$-vector
$$\nabla f(x_0) = (D_1 f(x_0), \dots, D_n f(x_0)).$$
Moreover, $f'(x_0)$ is the $1 \times n$ matrix
$$f'(x_0) = \begin{pmatrix} D_1 f(x_0) & D_2 f(x_0) & \cdots & D_n f(x_0) \end{pmatrix}.$$

(ii) Let $f : E \to \mathbb{R}^m$ be a vector field, where $E \subseteq \mathbb{R}^n$ is an open set containing $x_0$. Suppose that $f$ is of the form
$$f(x) = (f_1(x), \dots, f_m(x))$$
for every $x$ in $E$, where $f_i : E \to \mathbb{R}$ are scalar fields for all $i = 1, \dots, m$. Assume that $f$ is differentiable at $x_0$, with derivative $Df(x_0)$. Then for all $i = 1, \dots, m$ and for all $v \in \mathbb{R}^n$, the directional derivative $Df_i(x_0; v)$ exists and
$$Df(x_0)(v) = (Df_1(x_0; v), \dots, Df_m(x_0; v)).$$
In fact,
$$Df(x_0)(v) = (\langle \nabla f_1(x_0), v \rangle, \dots, \langle \nabla f_m(x_0), v \rangle)$$
and $f'(x_0)$ is the $m \times n$ matrix
$$f'(x_0) = \begin{pmatrix} D_1 f_1(x_0) & D_2 f_1(x_0) & \cdots & D_n f_1(x_0) \\ D_1 f_2(x_0) & D_2 f_2(x_0) & \cdots & D_n f_2(x_0) \\ \vdots & \vdots & & \vdots \\ D_1 f_m(x_0) & D_2 f_m(x_0) & \cdots & D_n f_m(x_0) \end{pmatrix}. \tag{5}$$

The gradient of $f$ is also denoted by $\operatorname{grad} f$. The expression $\nabla f(x_0)$ is read as "del $f$ at $x_0$" or "grad $f$ at $x_0$".

Proof. We first consider (i). Suppose that $v = 0$. Then
$$Df(x_0; 0) = \lim_{t \to 0} \frac{f(x_0) - f(x_0)}{t} = 0.$$
On the other hand, $Df(x_0)$ is a linear transformation; hence $Df(x_0)(0) = 0$. Therefore
$$Df(x_0)(0) = Df(x_0; 0).$$


Now suppose that $v \ne 0$. Since $f$ is differentiable at $x_0$,
$$\lim_{|h| \to 0} \frac{|R_f(h)|}{|h|} = 0,$$
where
$$R_f(h) = f(x_0 + h) - f(x_0) - Df(x_0)(h).$$
Let $h = tv$, where $t \ne 0$. Then $|h| = |t||v|$ and $|h| \to 0$ is equivalent to $t \to 0$. Moreover,
$$f(x_0 + tv) = f(x_0) + Df(x_0)(tv) + R_f(tv) = f(x_0) + tDf(x_0)(v) + R_f(tv)$$
since $Df(x_0)$ is a linear transformation. Then
$$\left| \frac{f(x_0 + tv) - f(x_0)}{t} - Df(x_0)(v) \right| = \frac{|R_f(tv)|}{|t|} = \frac{|R_f(tv)|}{|t||v|}\,|v| = \frac{|R_f(h)|}{|h|}\,|v|$$
and
$$\lim_{t \to 0} \left| \frac{f(x_0 + tv) - f(x_0)}{t} - Df(x_0)(v) \right| = \lim_{t \to 0} \frac{|R_f(h)|}{|h|}\,|v| = |v| \lim_{|h| \to 0} \frac{|R_f(h)|}{|h|} = 0.$$
This implies that
$$\lim_{t \to 0} \left[ \frac{f(x_0 + tv) - f(x_0)}{t} - Df(x_0)(v) \right] = 0,$$
or
$$Df(x_0; v) = \lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t} = Df(x_0)(v)$$
for all $v \ne 0$.

If $v = v_1 e_1 + \cdots + v_n e_n$, then the linearity of $Df(x_0)$ implies that
$$Df(x_0)(v) = Df(x_0)(v_1 e_1 + \cdots + v_n e_n) = v_1 Df(x_0)(e_1) + \cdots + v_n Df(x_0)(e_n) = D_1 f(x_0) v_1 + \cdots + D_n f(x_0) v_n = \langle \nabla f(x_0), v \rangle.$$
For all $j = 1, \dots, n$,
$$Df(x_0)(e_j) = \langle \nabla f(x_0), e_j \rangle = D_j f(x_0).$$
Hence
$$f'(x_0) = \begin{pmatrix} D_1 f(x_0) & D_2 f(x_0) & \cdots & D_n f(x_0) \end{pmatrix}.$$


To prove (ii), suppose that $f$ is differentiable at $x_0$. By Proposition 2.4, each $f_i$ is differentiable at $x_0$ and
$$Df(x_0)(v) = (Df_1(x_0)(v), \dots, Df_m(x_0)(v))$$
for every $v \in \mathbb{R}^n$. Since each $f_i$ is a differentiable scalar field, we see from Part (i) that $Df_i(x_0; v)$ exists for all $v \in \mathbb{R}^n$. In particular, taking $v = e_j$ for $j = 1, \dots, n$ implies that the partial derivatives $D_j f_i(x_0)$ exist for all $i = 1, \dots, m$ and for all $j = 1, \dots, n$. Also from Part (i) we see that
$$Df_i(x_0)(v) = Df_i(x_0; v) = \langle \nabla f_i(x_0), v \rangle$$
for all $i = 1, \dots, m$ and for all $v \in \mathbb{R}^n$. Then
$$Df(x_0)(v) = (Df_1(x_0; v), \dots, Df_m(x_0; v)) = (\langle \nabla f_1(x_0), v \rangle, \dots, \langle \nabla f_m(x_0), v \rangle).$$
Finally, for each $j = 1, \dots, n$,
$$Df(x_0)(e_j) = (\langle \nabla f_1(x_0), e_j \rangle, \dots, \langle \nabla f_m(x_0), e_j \rangle) = (D_j f_1(x_0), \dots, D_j f_m(x_0)),$$
and the components of this vector form the $j$th column of $f'(x_0)$; thus (5) follows. □

Example 4.2. Let us revisit Example 1.6, where $f : \mathbb{R}^2 \to \mathbb{R}$ is the scalar field $f(x_1, x_2) = x_1^2 x_2$ and $x_0 = (1, 1)$. We have
$$D_1 f(x_1, x_2) = 2x_1 x_2, \qquad D_2 f(x_1, x_2) = x_1^2;$$
hence
$$\nabla f(x_1, x_2) = (2x_1 x_2, x_1^2) \quad\text{and}\quad \nabla f(1, 1) = (2, 1).$$
For any $v = (v_1, v_2) \in \mathbb{R}^2$,
$$Df(1, 1)(v) = Df(1, 1)(v_1, v_2) = \langle \nabla f(1, 1), (v_1, v_2) \rangle = 2v_1 + v_2$$
and
$$f'(1, 1) = \begin{pmatrix} D_1 f(1, 1) & D_2 f(1, 1) \end{pmatrix} = \begin{pmatrix} 2 & 1 \end{pmatrix}.$$

Example 4.3. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be given by
$$f(x, y) = (e^{3x+2y}, \sin(2x+3y))$$
for every $(x, y) \in \mathbb{R}^2$. Find $f'(0, \pi)$.

The component functions of $f$ are
$$f_1(x, y) = e^{3x+2y}, \qquad f_2(x, y) = \sin(2x+3y).$$
Then
$$D_1 f_1(x, y) = 3e^{3x+2y}, \qquad D_2 f_1(x, y) = 2e^{3x+2y}$$
and
$$D_1 f_2(x, y) = 2\cos(2x+3y), \qquad D_2 f_2(x, y) = 3\cos(2x+3y).$$
Thus
$$f'(0, \pi) = \begin{pmatrix} D_1 f_1(0, \pi) & D_2 f_1(0, \pi) \\ D_1 f_2(0, \pi) & D_2 f_2(0, \pi) \end{pmatrix} = \begin{pmatrix} 3e^{2\pi} & 2e^{2\pi} \\ -2 & -3 \end{pmatrix}.$$

The next proposition shows that the directional derivative is linear in its second argument.

Proposition 4.4. Let $f : E \to \mathbb{R}$ be a scalar field, where $E \subseteq \mathbb{R}^n$ is an open set containing $x_0$. Let $v, w \in \mathbb{R}^n$ and $t \in \mathbb{R}$. Then
$$Df(x_0; tv) = tDf(x_0; v).$$
Moreover, if $f$ is differentiable at $x_0$, then
$$Df(x_0; v + w) = Df(x_0; v) + Df(x_0; w).$$

Proof. Without loss of generality, we may assume that $t \ne 0$, since the equality clearly holds when $t = 0$. We see that
$$Df(x_0; tv) = \lim_{u \to 0} \frac{f(x_0 + u(tv)) - f(x_0)}{u} = t \lim_{u \to 0} \frac{f(x_0 + (tu)v) - f(x_0)}{tu} = t \lim_{s \to 0} \frac{f(x_0 + sv) - f(x_0)}{s} = tDf(x_0; v),$$
where we substituted $s = tu$. If $f$ is differentiable at $x_0$, then by Theorem 4.1,
$$Df(x_0; v) = Df(x_0)(v)$$
for every $v \in \mathbb{R}^n$. Since $Df(x_0)$ is a linear transformation,
$$Df(x_0)(v + w) = Df(x_0)(v) + Df(x_0)(w),$$
or
$$Df(x_0; v + w) = Df(x_0; v) + Df(x_0; w). \qquad \Box$$
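Both conclusions of Proposition 4.4 can be observed numerically for a concrete differentiable scalar field. The sketch below uses $f(x, y) = e^{xy}$ (our hypothetical choice) and a central-difference directional derivative:

```python
import math

# Numerical illustration of Proposition 4.4 for the differentiable scalar
# field f(x, y) = exp(x*y): the directional derivative scales with its
# second argument and is additive in it.

def f(x, y):
    return math.exp(x * y)

def ddf(p, v, t=1e-7):
    # central-difference directional derivative of f at p in direction v
    return (f(p[0] + t * v[0], p[1] + t * v[1])
            - f(p[0] - t * v[0], p[1] - t * v[1])) / (2 * t)

p = (0.4, -0.7)
v, w, c = (1.0, 2.0), (-3.0, 0.5), 2.5
print(ddf(p, (c * v[0], c * v[1])), c * ddf(p, v))                # homogeneity
print(ddf(p, (v[0] + w[0], v[1] + w[1])), ddf(p, v) + ddf(p, w))  # additivity
```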


5 Sufficient condition for differentiability

The main problem with the calculations in the previous two examples is that we implicitly assumed that $f$ was differentiable at $x_0$. But in general, given a function $f$, we do not know a priori whether it is differentiable at $x_0$, so the conclusions of Theorem 4.1 cannot be inferred. Another way of looking at this problem is as follows. Theorem 4.1 says that differentiability implies the existence of all the partial derivatives. Does the existence of all the partial derivatives imply the differentiability of $f$? The answer is 'No', as we can see from (4): the partial derivatives of $f$ exist at $(0, 0)$, but $f$ is not differentiable there since it is not continuous at $(0, 0)$. However, we now prove that if all the partial derivatives of $f$ exist and are continuous, then $f$ is differentiable.

Definition 5.1. Let $f : E \to \mathbb{R}^m$, where $E \subseteq \mathbb{R}^n$ is open. Then $f$ is said to be continuously differentiable (or of class $C^1$) on $E$ if $f$ is continuous on $E$ and the partial derivatives $D_j f_i$ ($i = 1, \dots, m$ and $j = 1, \dots, n$) exist and are continuous on $E$. In such a case we write $f \in C^1(E)$.

Theorem 5.2 (Sufficient condition for differentiability). Let $f : E \to \mathbb{R}^m$, where $E \subseteq \mathbb{R}^n$ is open. If $f \in C^1(E)$, then $f$ is differentiable on $E$.

Proof. It suffices to treat the case when $f$ is a scalar field, since the vector field case then follows from Proposition 2.4. We wish to prove that
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - \langle \nabla f(x_0), h \rangle|}{|h|} = 0$$
for every $x_0 \in E$. By uniqueness of the derivative, it would follow that $f$ is differentiable on $E$ and $Df(x_0)(v) = \langle \nabla f(x_0), v \rangle$ for every $v \in \mathbb{R}^n$.

Since $E$ is open, there exists $r > 0$ such that $B(x_0; r) \subseteq E$. Let $h = (h_1, \dots, h_n)$ with $0 < |h| < r/n$. Then
$$|(x_0 + h) - x_0| = |h| < \frac{r}{n} \le r$$
and $x_0 + h \in B(x_0; r) \subseteq E$. We can express $h$ in terms of the standard basis as
$$h = h_1 e_1 + \cdots + h_n e_n.$$
Let us construct a finite sequence of vectors
$$w_0 = 0, \quad w_1 = h_1 e_1, \quad w_2 = h_1 e_1 + h_2 e_2, \quad \dots, \quad w_n = h_1 e_1 + \cdots + h_n e_n = h.$$
Then
$$w_j = w_{j-1} + h_j e_j \tag{6}$$


for all $j = 1, \dots, n$. We saw above that $x_0 + w_n = x_0 + h \in B(x_0; r)$; now we want to verify that $x_0 + w_j \in B(x_0; r)$ for all $j = 1, \dots, n$. Since
$$w_j = h_1 e_1 + \cdots + h_j e_j,$$
and recalling that $|h_j| \le |h|$ for all $j = 1, \dots, n$, we obtain
$$|(x_0 + w_j) - x_0| = |w_j| \le |h_1||e_1| + \cdots + |h_j||e_j| = |h_1| + \cdots + |h_j| \le j|h| \le n|h| < r.$$
This verifies that $x_0 + w_j \in B(x_0; r)$ for all $j = 1, \dots, n$, and allows us to evaluate $f$ at each $x_0 + w_j$, since $f$ is defined on $E$ and $B(x_0; r) \subseteq E$.

We express $f(x_0 + h) - f(x_0)$ as a telescoping sum and use (6) to give
$$f(x_0 + h) - f(x_0) = f(x_0 + w_n) - f(x_0) = \sum_{j=1}^n [f(x_0 + w_j) - f(x_0 + w_{j-1})] = \sum_{j=1}^n [f(x_0 + w_{j-1} + h_j e_j) - f(x_0 + w_{j-1})].$$
For all $t \in [0, 1]$ and $j = 1, \dots, n$, define $g_j : [0, 1] \to \mathbb{R}$ by
$$g_j(t) = f(x_0 + w_{j-1} + t h_j e_j).$$
We claim that each $g_j$ is differentiable (in the sense of elementary calculus) and
$$g_j'(t) = h_j D_j f(x_0 + w_{j-1} + t h_j e_j).$$
We see from the definition of the usual derivative that
$$g_j'(t) = \lim_{s \to 0} \frac{g_j(t+s) - g_j(t)}{s} = \lim_{s \to 0} \frac{f(x_0 + w_{j-1} + (t+s) h_j e_j) - f(x_0 + w_{j-1} + t h_j e_j)}{s} = \lim_{s \to 0} \frac{f((x_0 + w_{j-1} + t h_j e_j) + s h_j e_j) - f(x_0 + w_{j-1} + t h_j e_j)}{s} = Df(x_0 + w_{j-1} + t h_j e_j;\, h_j e_j) = h_j Df(x_0 + w_{j-1} + t h_j e_j;\, e_j) = h_j D_j f(x_0 + w_{j-1} + t h_j e_j),$$


where we used the fact that the directional derivative “scales” its second argument. It is not difficult to see that

|w_{j−1}| = |h1e1 + · · · + h_{j−1}e_{j−1}| ≤ |h1| + · · · + |h_{j−1}|,

so that

|(x0 + w_{j−1} + t h_j e_j) − x0| = |w_{j−1} + t h_j e_j| ≤ |h1| + · · · + |h_{j−1}| + |h_j| ≤ j|h| < r

since j ≤ n and |h| < r/n. Hence x0 + w_{j−1} + t h_j e_j ∈ B(x0; r) ⊆ E, so the partial derivatives

D_j f (x0 + w_{j−1} + t h_j e_j), j = 1, . . . , n,

exist because the partial derivatives of f exist on E (being continuous on E). This implies that g′_j(t) exists for all j = 1, . . . , n, i.e., each g_j is differentiable on [0, 1], thus proving the claim.

Hence, since g_j is differentiable on [0, 1] (implying that g_j is continuous on [0, 1]), we conclude from the Mean Value Theorem that there exists t∗ = t∗(j) ∈ (0, 1) such that

g′_j(t∗) = g_j(1) − g_j(0),

or

f (x0 + w_{j−1} + h_j e_j) − f (x0 + w_{j−1}) = h_j D_j f (x0 + w_{j−1} + t∗ h_j e_j).

To finish off the proof, we deduce that

f (x0 + h) − f (x0) − ⟨∇ f (x0), h⟩ = ∑_{j=1}^{n} [ f (x0 + w_{j−1} + h_j e_j) − f (x0 + w_{j−1})] − ∑_{j=1}^{n} h_j D_j f (x0)
= ∑_{j=1}^{n} h_j [D_j f (x0 + w_{j−1} + t∗ h_j e_j) − D_j f (x0)].

The Triangle Inequality gives

| f (x0 + h) − f (x0) − ⟨∇ f (x0), h⟩| ≤ ∑_{j=1}^{n} |h_j| |D_j f (x0 + w_{j−1} + t∗ h_j e_j) − D_j f (x0)|
≤ |h| ∑_{j=1}^{n} |D_j f (x0 + w_{j−1} + t∗ h_j e_j) − D_j f (x0)|.


If h ≠ 0, then we get

0 ≤ | f (x0 + h) − f (x0) − ⟨∇ f (x0), h⟩| / |h| ≤ ∑_{j=1}^{n} |D_j f (x0 + w_{j−1} + t∗ h_j e_j) − D_j f (x0)|.

Recalling that

|w_j| ≤ |h1| + · · · + |h_j| ≤ j|h| ≤ n|h|

and denoting

v = w_{j−1} + t∗ h_j e_j,

we have

|v| ≤ |w_{j−1}| + t∗|h_j| ≤ n|h| + t∗|h| = (n + t∗)|h|.

Thus |h| → 0 implies that |v| → 0 and

lim_{|h|→0} |D_j f (x0 + w_{j−1} + t∗ h_j e_j) − D_j f (x0)| = lim_{|v|→0} |D_j f (x0 + v) − D_j f (x0)| = 0

since each D_j f is continuous on E by assumption. Finally, we are able to arrive at

lim_{|h|→0} | f (x0 + h) − f (x0) − ⟨∇ f (x0), h⟩| / |h| = 0,

proving that f is indeed differentiable at x0 for all x0 ∈ E provided that f ∈ C¹(E). □
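Although it plays no role in the proof, the conclusion is easy to observe numerically. The following sketch (with an arbitrarily chosen C¹ field f (x, y) = eˣ sin y, a hypothetical example not taken from the text) estimates the quotient | f (x0 + h) − f (x0) − ⟨∇ f (x0), h⟩|/|h| for shrinking h:

```python
import math

# Numeric illustration (not part of the proof): for the C^1 scalar field
# f(x, y) = exp(x) * sin(y), the quotient
#     |f(x0 + h) - f(x0) - <grad f(x0), h>| / |h|
# should shrink to 0 as |h| -> 0, which is what the theorem asserts.

def f(x, y):
    return math.exp(x) * math.sin(y)

def grad_f(x, y):
    # Gradient of f, computed by hand.
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y))

def quotient(x0, y0, h1, h2):
    gx, gy = grad_f(x0, y0)
    numerator = abs(f(x0 + h1, y0 + h2) - f(x0, y0) - (gx * h1 + gy * h2))
    return numerator / math.hypot(h1, h2)

x0, y0 = 0.3, 1.1
qs = [quotient(x0, y0, 10.0**-k, 10.0**-k) for k in range(1, 6)]
print(qs)
```

Each tenfold shrinkage of h shrinks the quotient by roughly a factor of ten, consistent with the first-order approximation error being of order |h|².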

6 Higher-order partial derivatives

Let f : E → ℝ be a scalar field, where E ⊆ ℝⁿ is open. Suppose that D_i f (x0) exists for all x0 ∈ E. We have seen that D_i f defines a function from E to ℝ. Then the jth partial derivative of D_i f , provided it exists, is given by D_j(D_i f ). There are several notations for this second-order partial derivative (also known as a mixed partial derivative), namely,

D_j(D_i f ) = D_{i,j} f = f_{x_i x_j} = ∂² f /∂x_j∂x_i = ∂/∂x_j (∂ f /∂x_i).

Note that the indices in the notations D_{i,j} f and f_{x_i x_j} appear in the reverse order of those in ∂² f /∂x_j∂x_i. When j = i, we write

∂² f /∂x_i²

instead of

∂² f /∂x_i∂x_i.

Higher-order partial derivatives can also be considered.


Example 6.1. Let f (x, y) = x²y³ for all (x, y) ∈ ℝ². Then

D1 f (x, y) = 2xy³, D2 f (x, y) = 3x²y².

The second-order partial derivatives are therefore

D1,1 f (x, y) = 2y³, D1,2 f (x, y) = 6xy²

and

D2,1 f (x, y) = 6xy², D2,2 f (x, y) = 6x²y.

A few third-order partial derivatives are

D1,1,1 f (x, y) = 0, D1,1,2 f (x, y) = 6y², D1,2,1 f (x, y) = 6y², D1,2,2 f (x, y) = 12xy.
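The computations in Example 6.1 can be confirmed with a computer algebra system. The following sketch (assuming SymPy is available; it is not part of the text) checks every derivative listed above:

```python
import sympy as sp

# Checking Example 6.1, f(x, y) = x**2 * y**3.  Here D1 differentiates
# in x, D2 in y, and D_{i,j} f = D_j(D_i f).
x, y = sp.symbols('x y')
f = x**2 * y**3

D1 = sp.diff(f, x)
D2 = sp.diff(f, y)
assert D1 == 2*x*y**3
assert D2 == 3*x**2*y**2

# Second order: D_{1,2} f = D2(D1 f), and so on.
assert sp.diff(D1, x) == 2*y**3        # D_{1,1} f
assert sp.diff(D1, y) == 6*x*y**2      # D_{1,2} f
assert sp.diff(D2, x) == 6*x*y**2      # D_{2,1} f
assert sp.diff(D2, y) == 6*x**2*y      # D_{2,2} f

# A few third-order partials.
assert sp.diff(f, x, x, x) == 0        # D_{1,1,1} f
assert sp.diff(f, x, x, y) == 6*y**2   # D_{1,1,2} f
assert sp.diff(f, x, y, y) == 12*x*y   # D_{1,2,2} f
```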

Example 6.2. Find nonzero real numbers a and b such that

u(x, t) = cos at sin bx

satisfies the partial differential equation

∂²u/∂t² = ∂²u/∂x²

for all (x, t) ∈ ℝ².

Substituting u gives

∂²u/∂t² − ∂²u/∂x² = −a² cos at sin bx + b² cos at sin bx = (b² − a²) cos at sin bx.

If we take b = ±a, where a is any nonzero real number, then

∂²u/∂t² − ∂²u/∂x² = 0

for all (x, t) ∈ ℝ².
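A quick symbolic check of this example (assuming SymPy is available; the code is illustrative, not part of the text):

```python
import sympy as sp

# Verifying Example 6.2: u(x, t) = cos(a*t) * sin(b*x) satisfies
# u_tt = u_xx exactly when b**2 = a**2, i.e. b = a or b = -a.
x, t, a = sp.symbols('x t a')

for b in (a, -a):
    u = sp.cos(a*t) * sp.sin(b*x)
    residual = sp.diff(u, t, 2) - sp.diff(u, x, 2)
    assert sp.simplify(residual) == 0
```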

In the previous examples we saw that D_j(D_i f ) = D_i(D_j f ), although this is not always the case. The next theorem gives sufficient conditions for the mixed partial derivatives to be the same. Before stating and proving it, let us first look at a useful lemma.

Lemma 6.3. Let f : E → ℝ be a scalar field, where E ⊆ ℝⁿ is an open set containing v. Let w ∈ ℝⁿ and t ∈ ℝ be such that v + tw ∈ E. Furthermore, suppose that D f (v + tw; w) exists and define

g(t) = f (v + tw).

Then

g′(t) = D f (v + tw; w).


Proof. We have

[g(t + u) − g(t)]/u = [ f (v + tw + uw) − f (v + tw)]/u = [ f ((v + tw) + uw) − f (v + tw)]/u.

Taking the limit as u → 0 gives

g′(t) = D f (v + tw; w). □
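For a smooth field the directional derivative is ⟨∇ f (x), w⟩, so the lemma can be sanity-checked numerically. A minimal sketch, with a hypothetical field f (x, y) = x² + 3xy chosen only for illustration:

```python
# Numeric illustration of Lemma 6.3 (hypothetical example, not from the
# text): for the smooth field f(x, y) = x**2 + 3*x*y we have
# Df(x; w) = <grad f(x), w>, so g(t) = f(v + t*w) should satisfy
# g'(t) = <grad f(v + t*w), w>.

def f(x, y):
    return x**2 + 3*x*y

def grad_f(x, y):
    # Gradient of f, computed by hand.
    return (2*x + 3*y, 3*x)

v = (1.0, 2.0)
w = (0.5, -1.0)

def g(t):
    return f(v[0] + t*w[0], v[1] + t*w[1])

t = 0.7
eps = 1e-6
# Central finite-difference estimate of g'(t).
approx = (g(t + eps) - g(t - eps)) / (2*eps)
gx, gy = grad_f(v[0] + t*w[0], v[1] + t*w[1])
exact = gx*w[0] + gy*w[1]
print(approx, exact)  # the two values agree to several decimal places
```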

Theorem 6.4 (Equality of mixed partial derivatives). Let f : E → ℝ be a scalar field, where E ⊆ ℝⁿ is an open set and n ≥ 2. For fixed i and j, with i ≠ j, suppose that D_i f , D_j f , D_{i,j} f , and D_{j,i} f all exist on E. If D_{i,j} f and D_{j,i} f are continuous on E, then

D_{i,j} f (x0) = D_{j,i} f (x0)

for every x0 ∈ E.

Proof. Let x0 ∈ E. Then there exists r > 0 such that B(x0; r) ⊆ E since E is open. Let h = (h1, . . . , hn) be such that 0 < |h| < r/n. Then

|(x0 + h) − x0| = |h| < r/n ≤ r,

implying that x0 + h ∈ B(x0; r) ⊆ E. Note also that

2|h| < 2r/n ≤ r, i.e., 2|h| < r,

since n ≥ 2.

Fix i and j with i ≠ j. We can write

h = h1e1 + · · · + hnen.

Then

|(x0 + h_i e_i + h_j e_j) − x0| = |h_i e_i + h_j e_j| ≤ |h_i| + |h_j| ≤ 2|h| < r.

This implies that x0 + h_i e_i + h_j e_j ∈ B(x0; r) ⊆ E. Similarly, we observe that x0 + h_i e_i and x0 + h_j e_j both belong to B(x0; r) ⊆ E. Thus the expression

∆ = f (x0 + h_i e_i + h_j e_j) − f (x0 + h_i e_i) − f (x0 + h_j e_j) + f (x0)

is well defined.

For s, t ∈ [0, 1], define the functions

F(s; t) = f (x0 + s h_i e_i + t h_j e_j) − f (x0 + s h_i e_i) (t fixed)


and

G(t; s) = f (x0 + s h_i e_i + t h_j e_j) − f (x0 + t h_j e_j) (s fixed).

It is easy to see that

F(1; 1) − F(0; 1) = ∆ = G(1; 1) − G(0; 1).

For a fixed t ∈ [0, 1],

[F(s + u; t) − F(s; t)]/u = [ f (x0 + s h_i e_i + u h_i e_i + t h_j e_j) − f (x0 + s h_i e_i + u h_i e_i)]/u − [ f (x0 + s h_i e_i + t h_j e_j) − f (x0 + s h_i e_i)]/u
= [ f ((x0 + s h_i e_i + t h_j e_j) + u h_i e_i) − f (x0 + s h_i e_i + t h_j e_j)]/u − [ f ((x0 + s h_i e_i) + u h_i e_i) − f (x0 + s h_i e_i)]/u.

Taking the limit as u → 0,

F′(s; t) = D f (x0 + s h_i e_i + t h_j e_j; h_i e_i) − D f (x0 + s h_i e_i; h_i e_i)

from the definition of the directional derivative. Using Proposition 4.4 and the definition of the partial derivative, we can rewrite this as

F′(s; t) = h_i D f (x0 + s h_i e_i + t h_j e_j; e_i) − h_i D f (x0 + s h_i e_i; e_i) = h_i [D_i f (x0 + s h_i e_i + t h_j e_j) − D_i f (x0 + s h_i e_i)].

Since

|(x0 + s h_i e_i + t h_j e_j) − x0| = |s h_i e_i + t h_j e_j| ≤ |s||h_i| + |t||h_j| ≤ |h_i| + |h_j| ≤ 2|h| < r

for any s, t ∈ [0, 1], we see that x0 + s h_i e_i + t h_j e_j and x0 + s h_i e_i belong to B(x0; r) ⊆ E, where D_i f exists by assumption. Thus F(·; t) is differentiable on [0, 1] for every fixed t ∈ [0, 1]. By the Mean Value Theorem, there exists s∗1 = s∗1(t) ∈ (0, 1) such that

F′(s∗1; t) = F(1; t) − F(0; t).

Similarly, for a fixed s ∈ [0, 1],

[G(t + u; s) − G(t; s)]/u = [ f (x0 + s h_i e_i + t h_j e_j + u h_j e_j) − f (x0 + t h_j e_j + u h_j e_j)]/u − [ f (x0 + s h_i e_i + t h_j e_j) − f (x0 + t h_j e_j)]/u
= [ f ((x0 + s h_i e_i + t h_j e_j) + u h_j e_j) − f (x0 + s h_i e_i + t h_j e_j)]/u − [ f ((x0 + t h_j e_j) + u h_j e_j) − f (x0 + t h_j e_j)]/u.


Taking the limit as u → 0,

G′(t; s) = D f (x0 + s h_i e_i + t h_j e_j; h_j e_j) − D f (x0 + t h_j e_j; h_j e_j)

from the definition of the directional derivative. Using Proposition 4.4 and the definition of the partial derivative, we can rewrite this as

G′(t; s) = h_j D f (x0 + s h_i e_i + t h_j e_j; e_j) − h_j D f (x0 + t h_j e_j; e_j) = h_j [D_j f (x0 + s h_i e_i + t h_j e_j) − D_j f (x0 + t h_j e_j)].

As before, x0 + s h_i e_i + t h_j e_j and x0 + t h_j e_j belong to B(x0; r) ⊆ E, where D_j f exists by assumption. Thus G(·; s) is differentiable on [0, 1] for every fixed s ∈ [0, 1]. By the Mean Value Theorem, there exists t∗1 = t∗1(s) ∈ (0, 1) such that

G′(t∗1; s) = G(1; s) − G(0; s).

Combining the above results yields

F′(s∗1; 1) = ∆ = G′(t∗1; 1),

where

F′(s∗1; 1) = F(1; 1) − F(0; 1) = h_i [D_i f (x0 + s∗1 h_i e_i + h_j e_j) − D_i f (x0 + s∗1 h_i e_i)]

and

G′(t∗1; 1) = G(1; 1) − G(0; 1) = h_j [D_j f (x0 + h_i e_i + t∗1 h_j e_j) − D_j f (x0 + t∗1 h_j e_j)].

Now, for each u ∈ [0, 1], define the one-variable functions

F(u) = D_i f (x0 + s∗1 h_i e_i + u h_j e_j) = D_i f ((x0 + s∗1 h_i e_i) + u(h_j e_j))

and

G(u) = D_j f (x0 + u h_i e_i + t∗1 h_j e_j) = D_j f ((x0 + t∗1 h_j e_j) + u(h_i e_i)).

Then

F′(s∗1; 1) = h_i [F(1) − F(0)], G′(t∗1; 1) = h_j [G(1) − G(0)]. (7)

Lemma 6.3 and Proposition 4.4 give

F′(u) = D(D_i f )((x0 + s∗1 h_i e_i) + u(h_j e_j); h_j e_j)
= h_j D(D_i f )((x0 + s∗1 h_i e_i) + u(h_j e_j); e_j)
= h_j D_j(D_i f )(x0 + s∗1 h_i e_i + u h_j e_j)


and

G′(u) = D(D_j f )((x0 + t∗1 h_j e_j) + u(h_i e_i); h_i e_i)
= h_i D(D_j f )((x0 + t∗1 h_j e_j) + u(h_i e_i); e_i)
= h_i D_i(D_j f )(x0 + u h_i e_i + t∗1 h_j e_j).

Then

|(x0 + s∗1 h_i e_i + u h_j e_j) − x0| ≤ s∗1|h_i| + u|h_j| ≤ 2|h| < r

and

|(x0 + u h_i e_i + t∗1 h_j e_j) − x0| ≤ u|h_i| + t∗1|h_j| ≤ 2|h| < r,

i.e., x0 + s∗1 h_i e_i + u h_j e_j and x0 + u h_i e_i + t∗1 h_j e_j both belong to B(x0; r) ⊆ E, where D_{i,j} f and D_{j,i} f both exist. This implies that F and G are both differentiable on [0, 1]. By the Mean Value Theorem there exist s∗2, t∗2 ∈ (0, 1) such that

F′(s∗2) = F(1) − F(0), G′(t∗2) = G(1) − G(0).

Recalling that F′(s∗1; 1) = ∆ = G′(t∗1; 1), we obtain from (7) that

h_i [F(1) − F(0)] = h_j [G(1) − G(0)]

or

h_i F′(s∗2) = h_j G′(t∗2).

But this is the same as

h_i h_j D_j(D_i f )(x0 + s∗1 h_i e_i + s∗2 h_j e_j) = h_j h_i D_i(D_j f )(x0 + t∗2 h_i e_i + t∗1 h_j e_j).

Since we may take h_i ≠ 0 and h_j ≠ 0 (we are free to choose h, and we will let |h| → 0 below), dividing by h_i h_j gives

D_j(D_i f )(x0 + s∗1 h_i e_i + s∗2 h_j e_j) = D_i(D_j f )(x0 + t∗2 h_i e_i + t∗1 h_j e_j).

Note that

0 ≤ |s∗1 h_i e_i + s∗2 h_j e_j| ≤ s∗1|h_i| + s∗2|h_j| ≤ |h_i| + |h_j| ≤ 2|h|

and

0 ≤ |t∗2 h_i e_i + t∗1 h_j e_j| ≤ t∗2|h_i| + t∗1|h_j| ≤ |h_i| + |h_j| ≤ 2|h|.

As |h| → 0,

|s∗1 h_i e_i + s∗2 h_j e_j| → 0 and |t∗2 h_i e_i + t∗1 h_j e_j| → 0.

Then the continuity of D_{i,j} f and D_{j,i} f at x0 implies that

D_j(D_i f )(x0) = D_i(D_j f )(x0). □


7 Weak form of the Chain Rule

The next result, sometimes called the Chain Rule, is actually a weaker form in the sense that the inner functions are assumed to be continuously differentiable. Recall that in the actual Chain Rule we only require the functions to be differentiable. Nevertheless, this weaker version is one of the most useful results in the calculus of several variables.

Theorem 7.1 (Weak form of the Chain Rule). Suppose that g_i : ℝⁿ → ℝ for all i = 1, . . . ,m are continuously differentiable at x0 ∈ ℝⁿ. Let f : ℝᵐ → ℝ be differentiable at (g1(x0), . . . , gm(x0)). Define the function h : ℝⁿ → ℝ by

h(x) = f (g1(x), . . . , gm(x)).

Then

D_jh(x0) = ∑_{i=1}^{m} D_i f (g(x0)) D_jg_i(x0) = ∑_{i=1}^{m} D_i f (g1(x0), . . . , gm(x0)) D_jg_i(x0).

Proof. We see that h = f ◦ g, where

g(x) = (g1(x), . . . , gm(x)).

Since g_i : ℝⁿ → ℝ is continuously differentiable at x0, g_i is differentiable at x0 for all i = 1, . . . ,m. Then g : ℝⁿ → ℝᵐ is also differentiable at x0 from Proposition 2.4. Hence by the Chain Rule

h′(x0) = ( f ◦ g)′(x0) = f ′(g(x0)) g′(x0),

where

h′(x0) = (D1h(x0) D2h(x0) · · · Dnh(x0)),

f ′(g(x0)) = (D1 f (g(x0)) D2 f (g(x0)) · · · Dm f (g(x0))),

and

g′(x0) =
[ D1g1(x0) D2g1(x0) · · · Dng1(x0) ]
[ D1g2(x0) D2g2(x0) · · · Dng2(x0) ]
[ ⋮ ⋮ ⋮ ]
[ D1gm(x0) D2gm(x0) · · · Dngm(x0) ].


The left- and right-hand sides are each 1 × n matrices. Equating entries gives

D1h(x0) = D1 f (g(x0)) D1g1(x0) + D2 f (g(x0)) D1g2(x0) + · · · + Dm f (g(x0)) D1gm(x0),
D2h(x0) = D1 f (g(x0)) D2g1(x0) + D2 f (g(x0)) D2g2(x0) + · · · + Dm f (g(x0)) D2gm(x0),
⋮
Dnh(x0) = D1 f (g(x0)) Dng1(x0) + D2 f (g(x0)) Dng2(x0) + · · · + Dm f (g(x0)) Dngm(x0).

Therefore the jth equation is

D_jh(x0) = ∑_{i=1}^{m} D_i f (g(x0)) D_jg_i(x0) = ∑_{i=1}^{m} D_i f (g1(x0), . . . , gm(x0)) D_jg_i(x0). □

Let us look at special cases of Theorem 7.1. Suppose that m = n = 2, so that

h(x, y) = f (u(x, y), v(x, y)).

Then

D1h(x, y) = D1 f (u(x, y), v(x, y)) D1u(x, y) + D2 f (u(x, y), v(x, y)) D1v(x, y),
D2h(x, y) = D1 f (u(x, y), v(x, y)) D2u(x, y) + D2 f (u(x, y), v(x, y)) D2v(x, y).

Alternatively, these are expressed as

∂h/∂x = (∂ f /∂u)(∂u/∂x) + (∂ f /∂v)(∂v/∂x), ∂h/∂y = (∂ f /∂u)(∂u/∂y) + (∂ f /∂v)(∂v/∂y).

Now suppose that m = 2 and n = 3, so that

h(x, y, z) = f (u(x, y, z), v(x, y, z)).

Then

D1h(x, y, z) = D1 f (u(x, y, z), v(x, y, z)) D1u(x, y, z) + D2 f (u(x, y, z), v(x, y, z)) D1v(x, y, z),
D2h(x, y, z) = D1 f (u(x, y, z), v(x, y, z)) D2u(x, y, z) + D2 f (u(x, y, z), v(x, y, z)) D2v(x, y, z),
D3h(x, y, z) = D1 f (u(x, y, z), v(x, y, z)) D3u(x, y, z) + D2 f (u(x, y, z), v(x, y, z)) D3v(x, y, z).


Alternatively,

∂h/∂x = (∂ f /∂u)(∂u/∂x) + (∂ f /∂v)(∂v/∂x), ∂h/∂y = (∂ f /∂u)(∂u/∂y) + (∂ f /∂v)(∂v/∂y), ∂h/∂z = (∂ f /∂u)(∂u/∂z) + (∂ f /∂v)(∂v/∂z).

Finally, suppose that m = 2 and n = 1, so that

h(t) = f (u(t), v(t)).

Then

D1h(t) = D1 f (u(t), v(t)) D1u(t) + D2 f (u(t), v(t)) D1v(t)

or

dh/dt = (∂ f /∂u)(du/dt) + (∂ f /∂v)(dv/dt).

Note that we replaced the appropriate partial derivatives by ordinary derivatives.
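The m = 2, n = 1 identity can be verified symbolically for a concrete (hypothetical) choice of f , u, and v, assuming SymPy is available:

```python
import sympy as sp

# Symbolic check of the m = 2, n = 1 case of Theorem 7.1 with an
# illustrative choice (not from the text):
#     h(t) = f(u(t), v(t)),  dh/dt = f_u * du/dt + f_v * dv/dt.
t, uu, vv = sp.symbols('t u v')

f = uu**2 * sp.sin(vv)          # f(u, v)
u_t = sp.exp(t)                 # u(t)
v_t = t**3                      # v(t)

# Left-hand side: differentiate the composition directly.
h = f.subs({uu: u_t, vv: v_t})
lhs = sp.diff(h, t)

# Right-hand side: the Chain Rule formula, with the partials of f
# evaluated at (u(t), v(t)).
rhs = (sp.diff(f, uu).subs({uu: u_t, vv: v_t}) * sp.diff(u_t, t)
       + sp.diff(f, vv).subs({uu: u_t, vv: v_t}) * sp.diff(v_t, t))

assert sp.simplify(lhs - rhs) == 0
```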

Example 7.2. Let f : ℝ² → ℝ be a scalar field whose value is f (x, y). Suppose that x = r cos θ and y = r sin θ. Define

g(r, θ) = f (r cos θ, r sin θ).

Express ∂²g/∂θ² in terms of the partial derivatives of f .

We have

∂x/∂r = cos θ, ∂x/∂θ = −r sin θ, ∂y/∂r = sin θ, ∂y/∂θ = r cos θ.

Therefore

∂g/∂r = (∂ f /∂x)(∂x/∂r) + (∂ f /∂y)(∂y/∂r) = (∂ f /∂x) cos θ + (∂ f /∂y) sin θ

and

∂g/∂θ = (∂ f /∂x)(∂x/∂θ) + (∂ f /∂y)(∂y/∂θ) = −(∂ f /∂x) r sin θ + (∂ f /∂y) r cos θ.

Note that this last equation is the same as

D2g(r, θ) = −D1 f (r cos θ, r sin θ) r sin θ + D2 f (r cos θ, r sin θ) r cos θ.

Hence

D2,2g(r, θ) = −D1 f (r cos θ, r sin θ)(r cos θ)
− [D1,1 f (r cos θ, r sin θ)(−r sin θ) + D1,2 f (r cos θ, r sin θ)(r cos θ)](r sin θ)
+ D2 f (r cos θ, r sin θ)(−r sin θ)
+ [D2,1 f (r cos θ, r sin θ)(−r sin θ) + D2,2 f (r cos θ, r sin θ)(r cos θ)](r cos θ),


which can also be expressed as

∂²g/∂θ² = −(∂ f /∂x) r cos θ − (−(∂² f /∂x²) r sin θ + (∂² f /∂y∂x) r cos θ) r sin θ
− (∂ f /∂y) r sin θ + (−(∂² f /∂x∂y) r sin θ + (∂² f /∂y²) r cos θ) r cos θ
= −(∂ f /∂x) r cos θ + (∂² f /∂x²) r² sin² θ − (∂² f /∂y∂x) r² sin θ cos θ
− (∂ f /∂y) r sin θ − (∂² f /∂x∂y) r² sin θ cos θ + (∂² f /∂y²) r² cos² θ.
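As a check of the final formula (not part of the text), one can pick a concrete scalar field, say f (x, y) = x³y, and compare ∂²g/∂θ² computed directly against the expression just derived, assuming SymPy is available:

```python
import sympy as sp

# Checking Example 7.2 for a hypothetical field f(x, y) = x**3 * y:
# compare d^2 g / d theta^2 computed directly with the Chain Rule formula.
r, th, x, y = sp.symbols('r theta x y')

f = x**3 * y
sub = {x: r*sp.cos(th), y: r*sp.sin(th)}  # polar substitution

g = f.subs(sub)
direct = sp.diff(g, th, 2)

# Partials of f, to be evaluated at (r cos(theta), r sin(theta)).
fx, fy = sp.diff(f, x), sp.diff(f, y)
fxx, fyy = sp.diff(f, x, 2), sp.diff(f, y, 2)
fxy = sp.diff(f, x, y)  # the mixed partials of f coincide here

formula = (-fx.subs(sub)*r*sp.cos(th) + fxx.subs(sub)*r**2*sp.sin(th)**2
           - fxy.subs(sub)*r**2*sp.sin(th)*sp.cos(th)
           - fy.subs(sub)*r*sp.sin(th)
           - fxy.subs(sub)*r**2*sp.sin(th)*sp.cos(th)
           + fyy.subs(sub)*r**2*sp.cos(th)**2)

assert sp.simplify(direct - formula) == 0
```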

Exercises

1. Let f : ℝⁿ → ℝ be the scalar field given by

f (x) = |x|⁴.

Compute D f (x0; v) for any x0, v ∈ ℝⁿ.

2. Let T : ℝⁿ → ℝⁿ be a given linear transformation. Let f : ℝⁿ → ℝ be the scalar field whose value is

f (x) = ⟨x,T (x)⟩.

Compute D f (x0; v) for any x0, v ∈ ℝⁿ.

3. A set E ⊆ ℝⁿ is said to be convex if for every x, y ∈ E, the line segment

{tx + (1 − t)y : 0 ≤ t ≤ 1}

is contained in E.

(a) Prove that every open n-ball is convex.

(b) Suppose that f : E → ℝ is a scalar field, where E ⊆ ℝⁿ is convex. Prove that if D f (x0; v) = 0 for every x0 ∈ E and for every v ∈ ℝⁿ, then f is constant on E.

4. For the following scalar fields defined on an appropriate subset of ℝ², find all the first-order partial derivatives and verify that D2(D1 f ) = D1(D2 f ):

(a) f (x, y) = (x² − y²)²;

(b) f (x, y) = x/√(x² + y²);

(c) f (x, y) = tan(x²/y);

(d) f (x, y) = xy;

(e) f (x, y) = tan⁻¹((x + y)/(1 − xy)).

5. Let

v(r, t) = t^n exp(−r²/(4t)).

Find a value of the constant n such that

∂v/∂t = (1/r²) ∂/∂r (r² ∂v/∂r).

6. Consider the vector field f : ℝ³ → ℝ³ defined by

f (x, y, z) = (x, y, z)

for every (x, y, z) ∈ ℝ³. Find the Jacobian matrix of f at any point (x, y, z).

7. Let f : ℝ² → ℝ² and g : ℝ³ → ℝ² be vector fields defined by

f (x, y) = (e^{x+2y}, sin(2x + y)), g(u, v, w) = (u + 2v² + 3w³, −u² + 2v),

respectively.

(a) Compute f ′(x, y) and g′(u, v, w);

(b) Find h(u, v, w) = f (g(u, v, w));

(c) Compute h′(1, −1, 1).

8. Define

f (x, y) = ∫_0^{√(xy)} e^{−t²} dt (x, y > 0).

Find fx and fy in terms of x and y.


9. A function u is defined by an equation of the form

u(x, y) = x f ((x + y)/(xy)).

Show that u satisfies a partial differential equation of the form

x² ∂u/∂x − y² ∂u/∂y = F(x, y) u

and find F(x, y).

10. The substitution x = e^s, y = e^t converts f (x, y) into g(s, t), where g(s, t) = f (e^s, e^t). If f is known to satisfy the partial differential equation

x² ∂² f /∂x² + y² ∂² f /∂y² + x ∂ f /∂x + y ∂ f /∂y = 0,

show that g satisfies the partial differential equation

∂²g/∂s² + ∂²g/∂t² = 0.

11. Let f : E → ℝ be a scalar field, where E ⊆ ℝⁿ is open. Assume that f is differentiable on E. We say that f is homogeneous of degree p over E if

f (tx) = t^p f (x)

for every t > 0 and for every x ∈ E for which tx ∈ E. For a homogeneous scalar field of degree p, show that

⟨x,∇ f (x)⟩ = p f (x)

for each x ∈ E. If x = (x1, . . . , xn), then this equation can be expressed as

x1 ∂ f /∂x1 + · · · + xn ∂ f /∂xn = p f (x1, . . . , xn).

(Hint: For a fixed x, define g(t) = f (tx) and compute g′(1).)

12. This is the converse of the previous problem. Prove that if f satisfies

⟨x,∇ f (x)⟩ = p f (x)

for all x in an open set E, then f must be homogeneous of degree p. (Hint: For a fixed x, define g(t) = f (tx) − t^p f (x) and compute g′(t).)
