Introduction to Real Analysis Spring 2014 Lecture Notes Vern I. Paulsen April 22, 2014



Introduction to Real Analysis

Spring 2014 Lecture Notes

Vern I. Paulsen

April 22, 2014


Contents

1 Sequences and Series of Functions
   1.1 Behavior of Riemann Integrals with Limits
   1.2 Uniform Convergence and Continuity
   1.3 Uniform Convergence and Derivatives
   1.4 Series of Functions
   1.5 An Increasing Function with a Dense Set of Discontinuities
   1.6 A Space Filling Curve
   1.7 Power Series
   1.8 Operations on Power Series
   1.9 Taylor Series
   1.10 Polynomial Approximation
   1.11 The Stone-Weierstrass Theorem
   1.12 Fourier Series
   1.13 Orthonormal Sets of Functions
   1.14 Fourier Series, Continued
   1.15 Equicontinuity and the Arzela-Ascoli Theorem

2 Multivariable Differential Calculus
   2.1 The Total Derivative
   2.2 Differentiability and Continuity
   2.3 The Chain Rule
   2.4 Multivariable Mean Value Theorem
   2.5 Continuously Differentiable Functions
   2.6 The Inverse Function Theorem
   2.7 The Multi-Variable Newton and Quasi-Newton Methods
   2.8 The Implicit Function Theorem
   2.9 Local Extrema and the Second Derivative Test
   2.10 Taylor Series in Several Variables


Chapter 1

Sequences and Series of Functions

In this chapter we introduce different notions of convergence for sequences and series of functions and then examine how integrals and derivatives behave upon taking limits of functions in these various senses. We then apply these results to power series and Fourier series.

Definition 1.1. Given a set X, a metric space (Y, ρ), functions fn : X → Y, n ∈ N, and f : X → Y, we say that the sequence of functions {fn} converges pointwise to f provided that for each x ∈ X, the sequence of points {fn(x)} converges to the point f(x) in the metric of Y, that is, provided that lim_n ρ(fn(x), f(x)) = 0. When this occurs we write fn −ptw→ f.

Note that the statement that {fn} converges pointwise to f is equivalent to the requirement that for each ε > 0 and x ∈ X, there is an Nx so that when n > Nx we have that ρ(fn(x), f(x)) < ε. That is, since pointwise convergence only requires convergence at each point of X, the value that we take for N may depend on the individual point x as well as on ε. When the value for N can be picked depending only on ε, independent of the point x, then we call the convergence uniform. The precise definition follows.

Definition 1.2. Given a set X, a metric space (Y, ρ), functions fn : X → Y, n ∈ N, and f : X → Y, we say that the sequence of functions {fn} converges uniformly to f provided that for each ε > 0, there is an N so that when n > N, then for every x ∈ X we have ρ(fn(x), f(x)) < ε. When this occurs we write fn −u→ f.

Note that uniform convergence always implies pointwise convergence, since it is the stronger condition in which N is independent of the point x.


Just as pointwise convergence is the requirement that lim_n ρ(fn(x), f(x)) = 0 for each x, uniform convergence is equivalent to the requirement that the sequence of numbers sn = sup{ρ(fn(x), f(x)) : x ∈ X} converges to 0, that is, provided that

lim_n [sup{ρ(fn(x), f(x)) : x ∈ X}] = 0.

We prove this below.

Proposition 1.3. Let X be a set, let (Y, ρ) be a metric space, let fn : X → Y, n ∈ N, and f : X → Y be functions, and set

sn = sup{ρ(fn(x), f(x)) : x ∈ X}.

Then fn −u→ f if and only if lim_{n→∞} sn = 0.

Proof. First assume that fn −u→ f. Let ε > 0 be given. By the definition of uniform convergence, there is an N so that when n > N, for every x ∈ X we have ρ(fn(x), f(x)) < ε/2. But this implies that for n > N we have

sn = sup{ρ(fn(x), f(x)) : x ∈ X} ≤ ε/2 < ε.

Thus, for n > N we have that |sn − 0| < ε and, since ε was arbitrary, this proves that lim_{n→∞} sn = 0.

Conversely, assume that lim_{n→∞} sn = 0. Then given ε > 0, there is an N so that n > N implies that 0 ≤ sn < ε, which implies that for every n > N and for every x ∈ X we have ρ(fn(x), f(x)) < ε. This proves that fn −u→ f.

Note that for both of these definitions we did not need X to be a metric space, although in many of the interesting examples X will also be a metric space.

Also note that if fn −u→ f, then fn −ptw→ f.

Example 1.4. Let X = Y = [0, 1] be endowed with the usual metric, let fn(x) = x^n and let

f(x) = 0 for 0 ≤ x < 1,   f(1) = 1.

Then it is easy to see that fn −ptw→ f. To decide if the convergence is also uniform, we compute

sn = sup{ρ(fn(x), f(x)) : 0 ≤ x ≤ 1} = sup{|x^n − f(x)| : 0 ≤ x ≤ 1} = sup{x^n : 0 ≤ x < 1} = 1.

Since lim_n sup{ρ(fn(x), f(x)) : 0 ≤ x ≤ 1} = 1 ≠ 0, we conclude that {fn} does not converge uniformly to f.


Thus, although uniform convergence implies pointwise convergence, pointwise convergence does not imply uniform convergence. For this reason we say that uniform convergence is a stronger mode of convergence than pointwise convergence.

In this example, each of the functions fn is continuous, but f is clearly not continuous. Thus, this example shows that the pointwise limit of continuous functions need not be a continuous function. Later we will see that uniform limits of continuous functions are again continuous.

Example 1.5. This is a slight modification of the first example. Let X = [0, B], Y = [0, 1] with 0 < B < 1, let fn(x) = x^n and let f(x) = 0. Now

sn = sup{|fn(x) − f(x)| : x ∈ X} = sup{x^n : 0 ≤ x ≤ B} = B^n.

Since lim_{n→∞} B^n = 0, we have that fn −u→ f.
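The same grid check as before confirms uniform convergence on the shrunken domain; here B = 0.9 is an arbitrary choice.

```python
def sup_dist(n, B=0.9, pts=10_000):
    # approximate s_n = sup{|x**n - 0| : 0 <= x <= B}; the max is at x = B
    return max((B * i / pts) ** n for i in range(pts + 1))

# s_n = B**n -> 0, so x**n -> 0 uniformly on [0, B]
assert abs(sup_dist(5) - 0.9 ** 5) < 1e-12
assert sup_dist(200) < 1e-9
```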

Example 1.6. Let X = Y = R with the usual metric, let fn(x) = x/n and let f(x) = 0. Then fn −ptw→ f but {fn} does not converge to f uniformly.

Example 1.7. Let X = Y = [−π, +π] with the usual metric, let fn(x) = sin(nx)/n and let f(x) = 0. Since sup{|fn(x) − f(x)| : −π ≤ x ≤ +π} = 1/n, we have that fn −u→ f, but f′n(0) = 1 for every n, while f′(0) = 0.

Problem 1.8. Let X = Y = [0, 1] with the usual metric, let fn : X → Y be defined by fn(x) = x^{1/n} and let

f(x) = 0 when x = 0,   f(x) = 1 when x ≠ 0.

Prove that fn −ptw→ f and that {fn} does not converge to f uniformly.

Problem 1.9. Let X ⊂ R be compact, let fn : X → R be defined by fn(x) = x/n and let f(x) = 0 be the 0 function. Prove that fn −u→ f.

Problem 1.10. Let f : R → R be continuous at 0. Set gn(x) = f(x/n) and let g be the constant function that is equal to f(0).

1. Prove that gn −ptw→ g as functions on R.

2. Give an example of a function f such that the convergence is not uniform as functions on R.

3. Prove that if X ⊆ R is any compact subset of R, then gn −u→ g as functions on X.


1.1 Behavior of Riemann Integrals with Limits

We now look at how Riemann integration behaves under pointwise and uniform convergence.

Example 1.11. Let X = Y = [0, 1] with the usual metric. Recall that the rationals are countable and let {rn}n∈N be an enumeration of the rational numbers in [0, 1]. We set

fn(x) = 1 if x = rk for some 1 ≤ k ≤ n,   fn(x) = 0 otherwise,

and

f(x) = 1 if x is rational,   f(x) = 0 if x is irrational.

It is easily checked that fn −ptw→ f, but for every n, sup{|fn(x) − f(x)| : 0 ≤ x ≤ 1} = 1, and so the sequence does not converge uniformly to f. Now verify that each function fn is Riemann integrable on [0, 1] with ∫_0^1 fn(x) dx = 0, but f is not Riemann integrable.

Thus, we see that if {fn} is a sequence of Riemann integrable functions that converges pointwise to a function f, then f might not even be Riemann integrable! In this case there is no chance that the limit of the integrals is the integral of the limit.

The next example shows that even when the limit function is Riemann integrable, pointwise convergence is not enough to guarantee that the limit of the integrals is the integral of the limit.

Example 1.12. Let X = [0, 1] and define fn : X → R by

fn(x) = 2n²x for 0 ≤ x ≤ 1/(2n),
fn(x) = 2n − 2n²x for 1/(2n) < x ≤ 1/n,
fn(x) = 0 for 1/n < x ≤ 1,

and let f(x) = 0. Then fn −ptw→ f, all of these functions are Riemann integrable on [0, 1], but ∫_0^1 fn(x) dx = 1/2 for all n, while ∫_0^1 f(x) dx = 0.
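The following sketch integrates these tent functions numerically (midpoint rule; the grid size is an arbitrary choice) and confirms that every ∫ fn is 1/2 while fn(x) → 0 at each fixed x > 0:

```python
def f_n(n, x):
    # tent of height n: rises on [0, 1/(2n)], falls on [1/(2n), 1/n], then 0
    if x <= 1 / (2 * n):
        return 2 * n * n * x
    if x <= 1 / n:
        return 2 * n - 2 * n * n * x
    return 0.0

def integral(n, pts=200_000):
    # midpoint-rule Riemann sum over [0, 1]
    h = 1 / pts
    return sum(f_n(n, (i + 0.5) * h) for i in range(pts)) * h

for n in (1, 5, 20):
    assert abs(integral(n) - 0.5) < 1e-3   # the triangle always has area 1/2
# pointwise limit is 0: once 1/n < x, f_n(x) = 0
assert f_n(100, 0.25) == 0.0
```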

These last two examples show that the formula

lim_n ∫_a^b fn(x) dx = ∫_a^b lim_n fn(x) dx

is not valid when the limit of the functions is meant in the pointwise sense. We now prove that all of these problems “go away” when one considers uniform convergence instead.


Lemma 1.13. Let α : [a, b] → R be an increasing function and let f, g : [a, b] → R be two bounded functions. If δ = sup{|f(x) − g(x)| : a ≤ x ≤ b}, then for any partition P = {x0, ..., xn} of [a, b] we have that U(f, P, α) ≤ U(g, P, α) + δ(α(b) − α(a)) and L(f, P, α) ≥ L(g, P, α) − δ(α(b) − α(a)).

Proof. For each x we have that g(x) − δ ≤ f(x) ≤ g(x) + δ. Hence, for each subinterval of the partition we have that mi(g) − δ = inf{g(x) : xi−1 ≤ x ≤ xi} − δ ≤ inf{f(x) : xi−1 ≤ x ≤ xi} = mi(f) and Mi(f) = sup{f(x) : xi−1 ≤ x ≤ xi} ≤ sup{g(x) : xi−1 ≤ x ≤ xi} + δ = Mi(g) + δ.

Thus, U(f, P, α) = ∑_{i=1}^n Mi(f)(α(xi) − α(xi−1)) ≤ ∑_{i=1}^n (Mi(g) + δ)(α(xi) − α(xi−1)) = U(g, P, α) + δ(α(b) − α(a)).

The proof of the other inequality is similar.

Theorem 1.14. Let α : [a, b] → R be an increasing function on [a, b] and let {fn} be a sequence of functions that are Riemann-Stieltjes integrable on [a, b] with respect to α and converge uniformly to a function f on [a, b]. Then f is Riemann-Stieltjes integrable on [a, b] with respect to α and

lim_n ∫_a^b fn dα = ∫_a^b f dα.

Proof. We first prove that f satisfies the Riemann-Stieltjes integrability criterion on [a, b]. The case that α(a) = α(b) is trivial, so we assume that α(b) − α(a) > 0.

Given ε > 0, set δ = ε/(4(α(b) − α(a))) and choose an integer N so that for n ≥ N we have that sup{|f(x) − fn(x)| : a ≤ x ≤ b} < δ.

Since fN is Riemann-Stieltjes integrable, we may choose a partition P so that U(fN, P, α) − L(fN, P, α) < ε/2. Applying the lemma, we have that

U(f, P, α) − L(f, P, α) ≤ [U(fN, P, α) + δ(α(b) − α(a))] − [L(fN, P, α) − δ(α(b) − α(a))] = [U(fN, P, α) − L(fN, P, α)] + 2δ(α(b) − α(a)) < ε/2 + ε/2.

Thus, f satisfies the Riemann-Stieltjes integrability criterion.

Also, for any n ≥ N, applying the lemma again, we have that

∫_a^b f dα = inf{U(f, P, α) : P} ≤ inf{U(fn, P, α) + δ(α(b) − α(a)) : P} = ∫_a^b fn dα + ε/4,

while

∫_a^b f dα = sup{L(f, P, α) : P} ≥ sup{L(fn, P, α) − δ(α(b) − α(a)) : P} = ∫_a^b fn dα − ε/4.

These two inequalities show that for n ≥ N we have that

|∫_a^b f dα − ∫_a^b fn dα| ≤ ε/4 < ε.

Since ε was arbitrary, we have that

lim_n ∫_a^b fn dα = ∫_a^b f dα.
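Theorem 1.14 with α(x) = x (ordinary Riemann integration) can be illustrated numerically. The example functions below are my own choice, not from the text: fn(x) = √(x² + 1/n²) → |x| uniformly on [−1, 1] with sup distance exactly 1/n, so the integrals must converge to ∫|x| dx = 1, and the gap is at most (sup distance) × (length of the interval) = 2/n.

```python
import math

def f_n(n, x):
    return math.sqrt(x * x + 1.0 / (n * n))

def integral(n, pts=100_000):
    # midpoint Riemann sum of f_n over [-1, 1]
    h = 2.0 / pts
    return sum(f_n(n, -1.0 + (i + 0.5) * h) for i in range(pts)) * h

# sup{|f_n(x) - |x||} = f_n(0) = 1/n -> 0: uniform convergence, so the
# integrals converge to the integral of the limit, which is 1
for n in (10, 1000):
    assert abs(integral(n) - 1.0) <= 2.0 / n + 1e-3
```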

Problem 1.15. Prove that the functions {fn} in Example 1.11 do converge pointwise to f, prove that each fn is Riemann integrable with Riemann integral equal to 0, and prove that f is not Riemann integrable.

Problem 1.16. Prove that the functions {fn} of Example 1.12 do converge pointwise to f and prove that each fn is Riemann integrable with integral equal to 1/2.

1.2 Uniform Convergence and Continuity

We have already seen that pointwise limits of continuous functions need not be continuous. In this section we will show that uniform limits preserve continuity. Some of these results have been seen in a different form in Chapter I.4.

Theorem 1.17. Let (X, d) and (Y, ρ) be metric spaces, let fn : X → Y, n ∈ N, be a sequence of functions from X to Y that converge uniformly to a function f : X → Y, and let x0 ∈ X. If fn is continuous at x0 for all n ∈ N, then f is continuous at x0.

Proof. Given ε > 0, we may pick N so that for n > N we have that sup{ρ(fn(x), f(x)) : x ∈ X} < ε/3. Fix any n > N; using the fact that fn is continuous at x0, we may pick δ > 0 so that d(x, x0) < δ implies that ρ(fn(x), fn(x0)) < ε/3.


Hence, for d(x, x0) < δ, by applying the triangle inequality twice, we have that

ρ(f(x), f(x0)) ≤ ρ(f(x), fn(x)) + ρ(fn(x), f(x0)) ≤ ρ(f(x), fn(x)) + ρ(fn(x), fn(x0)) + ρ(fn(x0), f(x0)) < ε/3 + ε/3 + ε/3,

and we conclude that f is continuous at x0.

Corollary 1.18. Let (X, d) and (Y, ρ) be metric spaces and let fn : X → Y, n ∈ N, be a sequence of functions from X to Y that converge uniformly to a function f : X → Y. If fn is continuous for all n ∈ N, then f is continuous.

We now consider the set of all continuous functions between two metric spaces and show that when the domain space is compact, this set can be endowed with a metric.

Lemma 1.19. Let (X, d) and (Y, ρ) be metric spaces and let f, g : X → Y be continuous functions. Then the function h : X → R defined by h(x) = ρ(f(x), g(x)) is continuous.

Proof. Fix a point x0 ∈ X; we will show that h is continuous at x0. Given ε > 0, since f and g are continuous at x0, there exist δ1 > 0 and δ2 > 0 so that d(x, x0) < δ1 implies that ρ(f(x), f(x0)) < ε/2, while d(x, x0) < δ2 implies that ρ(g(x), g(x0)) < ε/2. Let δ = min{δ1, δ2}; then for d(x, x0) < δ, using the reverse triangle inequality, we have that

|h(x) − h(x0)| = |ρ(f(x), g(x)) − ρ(f(x0), g(x0))| ≤ |ρ(f(x), g(x)) − ρ(f(x0), g(x))| + |ρ(f(x0), g(x)) − ρ(f(x0), g(x0))| ≤ ρ(f(x), f(x0)) + ρ(g(x), g(x0)) < ε/2 + ε/2.

Since ε > 0 was arbitrary, this proves that h is continuous at x0.

Definition 1.20. Given two metric spaces (X, d) and (Y, ρ), we let C(X;Y ) denote the set of all continuous functions from X to Y. Given f, g ∈ C(X;Y ) we set

γ(f, g) = sup{ρ(f(x), g(x)) : x ∈ X}.

When this supremum is unbounded, we set γ(f, g) = +∞.

Theorem 1.21. Let (X, d) and (Y, ρ) be metric spaces with X compact. Then

1. γ defines a metric on C(X;Y ),

2. a sequence {fn} ⊆ C(X;Y ) converges to f ∈ C(X;Y ) in the metric γ if and only if fn −u→ f,

3. if (Y, ρ) is complete, then (C(X;Y ), γ) is also a complete metric space.

Proof. Given f, g ∈ C(X;Y ), by the above lemma the function x → ρ(f(x), g(x)) is a continuous function from the compact metric space X to R. Hence, it is bounded and so γ(f, g) ≥ 0 is a finite real number. Note that γ(f, g) = 0 if and only if ρ(f(x), g(x)) = 0 for every x ∈ X, which holds if and only if f(x) = g(x) for every x ∈ X.

Clearly, γ(f, g) = γ(g, f). So all that remains to show that γ is a metric is to verify the triangle inequality. To this end let f, g, h ∈ C(X;Y ). Then

γ(f, g) = sup{ρ(f(x), g(x)) : x ∈ X} ≤ sup{ρ(f(x), h(x)) + ρ(h(x), g(x)) : x ∈ X} ≤ γ(f, h) + γ(h, g).

Thus, γ is a metric and 1) is proven.

Next we prove the second statement. To see this we refer back to Proposition 1.3. Note that the sn of that proposition is sn = γ(fn, f). Thus, {fn} converges to f in the metric γ iff lim_{n→∞} γ(fn, f) = 0 iff lim_{n→∞} sn = 0. By Proposition 1.3, this last statement is equivalent to requiring that fn −u→ f.

To prove the third statement, we let {fn} ⊆ C(X;Y ) be a sequence that is Cauchy in the metric γ, and we must prove that there exists a function f ∈ C(X;Y ) so that {fn} converges to f in the metric γ.

Since for each fixed x ∈ X we have that ρ(fn(x), fm(x)) ≤ γ(fn, fm), we see that the sequence of points {fn(x)} in Y is also Cauchy. Since (Y, ρ) is complete, this sequence converges to a point, and we set f(x) = lim_n fn(x). Since x ∈ X was an arbitrary point, we see that we can define a function f : X → Y by setting, for each point x ∈ X, f(x) = lim_n fn(x).

If we can prove that the sequence {fn} converges uniformly to f, then by the above theorem f will be a continuous function from X to Y, and by 2) we will have that {fn} converges to f in the metric γ, so our proof will be complete.

Given ε > 0, since our original sequence of functions is Cauchy, we may pick N so that for any n, m > N we have that γ(fn, fm) < ε/2. Now fix x0 ∈ X and n > N; then we have that

ρ(f(x0), fn(x0)) = lim_m ρ(fm(x0), fn(x0)) ≤ ε/2.

Since x0 was arbitrary, we have that

sup{ρ(f(x), fn(x)) : x ∈ X} ≤ ε/2 < ε,

for every n > N. This proves that the convergence is uniform. Thus, f ∈ C(X;Y ) and we see that this last inequality also shows that γ(f, fn) < ε for n > N. Thus the sequence {fn} converges to the continuous function f in the metric γ and our proof is complete.
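To make γ concrete, the sketch below approximates the sup metric on C([0, 1]; R) by a grid maximum and checks the metric axioms for three sample functions; the functions and grid size are arbitrary choices.

```python
import math

def gamma(f, g, pts=10_000):
    # grid approximation of gamma(f, g) = sup{|f(x) - g(x)| : 0 <= x <= 1}
    return max(abs(f(i / pts) - g(i / pts)) for i in range(pts + 1))

f, g, h = math.sin, math.cos, math.exp
assert gamma(f, f) == 0.0                                  # gamma(f, f) = 0
assert gamma(f, g) == gamma(g, f)                          # symmetry
assert gamma(f, g) <= gamma(f, h) + gamma(h, g) + 1e-12    # triangle inequality
```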

When Y = R with the usual metric, we simplify notation by setting C(X) = C(X;R). We can now see that Theorem I.4.7 is a special case of the above theorem.

When Y = Rk with the usual Euclidean metric, we can think of a function ~f ∈ C(X;Rk) as a vector-valued function ~f(x) = (f1(x), ..., fk(x)), where each fi : X → R. Using Theorem I.3.18, we see that ~f is continuous if and only if each fi ∈ C(X).

Problem 1.22. Let α : [a, b] → R be an increasing function and let R([a, b], α) denote the set of functions on [a, b] that are Riemann-Stieltjes integrable with respect to α. Given f, g ∈ R([a, b], α), define γ(f, g) = sup{|f(x) − g(x)| : a ≤ x ≤ b}. Prove that γ defines a metric on R([a, b], α) and that (R([a, b], α), γ) is a complete metric space.

Problem 1.23. Let ~fn = (f1,n, ..., fk,n) ∈ C(X;Rk) be a sequence of functions. Prove that ~fn converges uniformly to ~g = (g1, ..., gk) if and only if for each i, 1 ≤ i ≤ k, we have that fi,n converges uniformly to gi.

Problem 1.24. Let fn : [0, 1] → R be defined by fn(x) = nx/(1 + n²x²). Use standard results from calculus to prove that fn −ptw→ 0, where 0 denotes the function that is identically equal to 0, and that lim_{n→∞} ∫_0^1 fn(x) dx = 0. Prove or disprove that fn −u→ 0.

1.3 Uniform Convergence and Derivatives

In this section we look at one key theorem about the behaviour of derivatives under the taking of uniform limits. Given a function f : (a, b) → R, we will say that f is differentiable on (a, b) provided that it is differentiable at each point x, a < x < b.

Theorem 1.25. Let {fn} be a sequence of differentiable functions on (a, b). If fn −ptw→ f and f′n −u→ g on (a, b), then f is differentiable on (a, b), f′ = g, and fn −u→ f.


Proof. Fix c, a < c < b, and define

h(x) = (f(x) − f(c))/(x − c) for x ≠ c,   h(c) = g(c).

If we can show that h is continuous at c, then that will show that f′(c) exists and is equal to g(c). If we let

hn(x) = (fn(x) − fn(c))/(x − c) for x ≠ c,   hn(c) = f′n(c),

then since fn is differentiable, hn is continuous at c. We will prove that h is continuous at c by showing that the sequence {hn} converges uniformly to h.

Given ε > 0, we may choose N1 so that for n > N1 we have that sup{|f′n(x) − g(x)| : a < x < b} < ε/3. Then for any m, n > N1 and any x, we will have that |f′m(x) − f′n(x)| ≤ |f′m(x) − g(x)| + |g(x) − f′n(x)| < 2ε/3.

Thus, applying the Mean Value Theorem, for any m, n > N1 and x ≠ c we will have a point x1 between x and c so that

|hm(x) − hn(x)| = |((fm(x) − fn(x)) − (fm(c) − fn(c)))/(x − c)| = |(fm − fn)′(x1)| < 2ε/3.

Thus, for x ≠ c, we have that |h(x) − hn(x)| = lim_{m→+∞} |hm(x) − hn(x)| ≤ 2ε/3 < ε.

On the other hand, at the point c we have that h(c) = g(c) = lim_n f′n(c) = lim_n hn(c). So we may choose N2 so that for n > N2 we have that |h(c) − hn(c)| < ε. Setting N = max{N1, N2}, we have that for n > N and any a < x < b, |h(x) − hn(x)| < ε. This proves that the sequence {hn} converges uniformly to h, which completes the proof that f′ = g.

To see that fn −u→ f, given any δ > 0 we may choose N1 so that for n > N1, |hn(x) − h(x)| < δ for all x, a < x < b. Substituting in the definitions of these functions we see that

|(fn(x) − fn(c)) − (f(x) − f(c))| < δ|x − c|,

and hence,

|fn(x) − f(x)| < δ|x − c| + |fn(c) − f(c)|.

Now given ε > 0, choose δ so that δ max{|b − c|, |c − a|} = ε/2 and take the corresponding N1. Also, since fn −ptw→ f, we may choose N2 so that for n > N2 we have |fn(c) − f(c)| < ε/2.

Finally, taking N3 = max{N1, N2}, we have that for any n > N3 and any x, a < x < b,

|fn(x) − f(x)| < δ|x − c| + |fn(c) − f(c)| < ε/2 + ε/2.


The following problem generalizes Theorem 1.25.

Problem 1.26. Let {fn} be a sequence of differentiable functions on (a, b). If {f′n} converges uniformly to a function g on (a, b) and, for some given point x0, a < x0 < b, the sequence of real numbers {fn(x0)} converges, then prove that there exists a differentiable function f on (a, b) such that fn −u→ f with f′ = g. [Hint: Use the Mean Value Theorem to show {fn(x)} is Cauchy for any a < x < b.]

1.4 Series of Functions

Recall that given real numbers {ak}k∈N, we can form the infinite series

∑_{k=1}^{+∞} ak.

We say that this series converges to a real number s provided that, when we form the sequence of partial sums

sn = ∑_{k=1}^{n} ak,

then lim_n sn = s.

When each ak ≥ 0, the partial sums sn form an increasing sequence of non-negative reals. Recall that in this case either the sequence {sn} is bounded above and lim_n sn = sup{sn : n ∈ N}, or the sequence {sn} is not bounded above, in which case we write lim_n sn = +∞ and we will also write ∑_{k=1}^{+∞} ak = +∞.

Now suppose that we have a set X and functions fk : X → R, k ∈ N. We wish to define convergence of a series of functions,

∑_{k=1}^{+∞} fk.

To do this, note that for each x ∈ X we have real numbers {fk(x)}k∈N. If for each x ∈ X the series

∑_{k=1}^{+∞} fk(x)

converges, then we may define a function s : X → R by setting

s(x) = ∑_{k=1}^{+∞} fk(x).

In this case we say that ∑_{k=1}^{+∞} fk converges to the function s. Note that this is equivalent to saying that the sequence of partial sum functions,

sn = ∑_{k=1}^{n} fk,

converges pointwise to s. For this reason we will sometimes, for emphasis, say that the series converges pointwise to s and write

s = ptw−∑_{k=1}^{+∞} fk.

When the sequence {sn} of partial sum functions converges uniformly to s, then we say that the series ∑_{k=1}^{+∞} fk converges uniformly to s. To indicate this stronger convergence, we will write

s = u−∑_{k=1}^{+∞} fk.

Thus, when we write s = ∑_{k=1}^{+∞} fk we are only asserting that the partial sums converge pointwise. It is important to remember that the “default” notion of convergence for series of functions is pointwise convergence of the partial sums.

However, as we saw in the earlier sections, uniform convergence has many important properties, so we will often be interested in when we have this stronger form of convergence.

Example 1.27. Let X = [0, 1], set f1(x) = x and, for n > 1, set fn(x) = x^n − x^{n−1}. Then the series “telescopes” and the partial sums satisfy sn(x) = x^n. Thus, by our earlier examples, we see that sn −ptw→ s, where s(x) = 0 for 0 ≤ x < 1 and s(1) = 1, and the convergence of these partial sums is not uniform. Thus, we have

s = ∑_{k=1}^{+∞} fk,

but

s ≠ u−∑_{k=1}^{+∞} fk.

One of the best ways to determine whether a series of functions converges uniformly is given by Weierstrass' M-test.

Theorem 1.28 (Weierstrass' M-test). Let X be a set, let fk : X → R, k ∈ N, be a sequence of functions and let Mk ∈ R be a sequence satisfying sup{|fk(x)| : x ∈ X} ≤ Mk. If ∑_{k=1}^{+∞} Mk is finite, then there exists a function s : X → R such that s = u−∑_{k=1}^{+∞} fk.

Proof. Set A = ∑_{k=1}^{+∞} Mk. Since each Mk ≥ 0, the partial sums of the series ∑_{k=1}^{+∞} Mk increase to A. Thus, given ε > 0, if we choose an integer N so that 0 ≤ A − ∑_{k=1}^{N} Mk < ε, then ∑_{k=N+1}^{+∞} Mk < ε.

If we set sn(x) = ∑_{k=1}^{n} fk(x), then for any m > n > N we have that

|sm(x) − sn(x)| = |∑_{k=n+1}^{m} fk(x)| ≤ ∑_{k=n+1}^{m} |fk(x)| ≤ ∑_{k=n+1}^{m} Mk < ε.

This inequality shows that, for each x ∈ X, the sequence of real numbers {sn(x)} is Cauchy and so has a limit, s(x). Moreover, s(x) = ∑_{k=1}^{+∞} fk(x).

Finally, since this inequality is independent of x, we have that

|s(x) − sn(x)| = lim_m |sm(x) − sn(x)| ≤ ε,

for any n > N. Since ε > 0 was arbitrary and since N is independent of the point x ∈ X, this shows that the sequence of partial sums {sn} converges uniformly to s.

Thus, we have that s = u−∑_{k=1}^{+∞} fk.

Example 1.29. Let fk : R → R be defined by fk(x) = sin(kx)/k³. Since |fk(x)| ≤ 1/k³ and the series ∑_{k=1}^{+∞} 1/k³ converges (by the integral test), by the Weierstrass M-test we have that the partial sums

sn(x) = ∑_{k=1}^{n} fk(x)

converge uniformly to the function s(x) = ∑_{k=1}^{+∞} fk(x), i.e., s = u−∑_{k=1}^{+∞} fk. Since each fk is continuous, we have that s is continuous by Corollary 1.18.

Moreover, note that

s′n(x) = ∑_{k=1}^{n} cos(kx)/k²

and that |cos(kx)/k²| ≤ 1/k², the general term of another convergent series. Applying Weierstrass' M-test again, we have that s′n −u→ g where

g(x) = u−∑_{k=1}^{+∞} cos(kx)/k².

Thus, g is continuous, and by Theorem 1.25, s′(x) = g(x); in other words,

s′ = u−∑_{k=1}^{+∞} f′k

and we get that the “derivative of the sum is the sum of the derivatives” in this case.

Note that we run into difficulties if we try to differentiate again: we have no way yet to decide if the series

∑_{k=1}^{+∞} f′′k(x) = ∑_{k=1}^{+∞} −sin(kx)/k

even converges pointwise.
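The uniform error bound that the M-test provides for this example can be checked numerically: |s(x) − sn(x)| ≤ ∑_{k>n} 1/k³ < 1/(2n²) for every x, by the integral-test tail estimate. A sketch (the evaluation point and truncation level are arbitrary choices):

```python
import math

def s_n(x, n):
    # partial sum of sum_{k>=1} sin(k x)/k^3
    return sum(math.sin(k * x) / k ** 3 for k in range(1, n + 1))

x = 1.234                  # arbitrary evaluation point
s = s_n(x, 100_000)        # high partial sum, a proxy for the full sum s(x)
for n in (10, 100):
    # M-test tail bound: sum_{k>n} 1/k^3 < integral_n^inf t^-3 dt = 1/(2 n^2)
    assert abs(s - s_n(x, n)) < 1 / (2 * n * n)
```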

Problem 1.30. Let fn(x) = x/(1 + n³x²). Prove that the series ∑_{n=1}^{+∞} fn(x) converges uniformly on R.

1.5 An Increasing Function with a Dense Set of Discontinuities

The Weierstrass M-test can be used to construct many functions with interesting and surprising properties. Here we construct a function that is increasing, continuous at every irrational number and discontinuous at every rational number.

Recall that a function f : R → R is said to have a jump discontinuity at c provided that both of the one-sided limits as x approaches c exist, but the three numbers f(c), lim_{x→c−} f(x) and lim_{x→c+} f(x) do not all have the same value. A simple example of such a function is the “jump” function

Jc(x) = 0 for x < c,   Jc(x) = 1 for c ≤ x.

Also, recall that the rational numbers are countable and let {rn}n∈N be an enumeration of the rationals. We now define a rather “bizarre” function as follows. Set

B(x) = ∑_{k=1}^{+∞} (1/2^k) Jrk(x).

Since the functions fk(x) = (1/2^k) Jrk(x) satisfy

sup{|fk(x)| : x ∈ R} = 1/2^k

and ∑_{k=1}^{+∞} 1/2^k = 1, by the Weierstrass M-test we see that B = u−∑_{k=1}^{+∞} fk. We summarize some of the properties of the function B below. First, we will need a result many have probably seen.

Proposition 1.31. Let g : R → R be an increasing function and let c ∈ R. Then both of the one-sided limits exist at c, with lim_{x→c−} g(x) = sup{g(x) : x < c} and lim_{x→c+} g(x) = inf{g(x) : c < x}. Consequently, an increasing function can only have jump discontinuities.

Proof. We only do the case of the limit from the left; the other case is similar. Since g is increasing, for x < c we have g(x) ≤ g(c). Thus, g(c) is an upper bound for the set {g(x) : x < c}. This shows that the supremum exists and is not +∞. Set L− = sup{g(x) : x < c}. Now given ε > 0, L− − ε is no longer an upper bound, so there exists an x1 with L− − ε < g(x1) ≤ L−. Let δ = c − x1; then for any x with c − δ < x < c, we have x1 < x < c and hence L− − ε < g(x1) ≤ g(x) ≤ L−. Hence, c − δ < x < c implies that |L− − g(x)| < ε. This proves that lim_{x→c−} g(x) = L−.

Proposition 1.32. Let B be the function defined above. Then:

1. B is strictly increasing, that is, if x < y, then B(x) < B(y),

2. B is continuous at every irrational number,

3. B has a jump discontinuity at every rational number, with lim_{x→rn−} B(x) = B(rn) − 1/2^n and lim_{x→rn+} B(x) = B(rn),

4. lim_{x→−∞} B(x) = 0,

5. lim_{x→+∞} B(x) = 1.

Proof. To see the first statement, note that x ≤ y implies that fk(x) ≤ fk(y) for every k. Hence, B(x) = ∑_{k=1}^{+∞} fk(x) ≤ ∑_{k=1}^{+∞} fk(y) = B(y). On the other hand, if x < y, then there is a rational between them, so there exists n with x < rn < y. Thus, fn(x) = 0, while fn(y) = 1/2^n. This implies that B(x) + 1/2^n ≤ B(y), and the first statement follows.

To see the second, let I denote the set of irrational numbers. Then each function fk is continuous on I. Since B is the uniform sum of functions continuous on I, B is continuous on I.

Since B is increasing, the two one-sided limits exist and are given by the suprema and infima as in the above proposition. Given any ε > 0, choose K > n so that ∑_{k=K+1}^{+∞} 1/2^k < ε and let δ = min{|rn − rk| : 1 ≤ k ≤ K and k ≠ n}. Then for any x with rn < x < rn + δ, we have that fk(rn) = fk(x) for 1 ≤ k ≤ K. Hence,

|B(rn) − B(x)| = B(x) − B(rn) = ∑_{k=K+1}^{+∞} (fk(x) − fk(rn)) ≤ ∑_{k=K+1}^{+∞} fk(x) < ε.

This shows that limx→r+n B(X) = B(rn).On the other hand, if rn − δ < x < rn, then fk(x) = fk(rn), 1 ≤ k ≤

K, k 6= n, while fn(x) = 0 and fn(rn) = 12n . Hence,

|(B(rn)− 1

2n)−B(x)| =

+∞∑k=K+1

(fk(rn)− fk(x)) ≤+∞∑

k=K+1

1

2k< ε.

This proves that limx→r−n B(x) = B(rn)− 12n .

To prove statements 4 and 5, given ε > 0, let again∑+∞

k=K+112k< ε and

set m = min{rk : 1 ≤ k ≤ K} and M = max{rk : 1 ≤ k ≤ K}. Thenfor x < m, we have that 0 ≤ B(x) =

∑+∞k=K+1 fk(x) ≤

∑+∞k=K+1

12k

< ε,which proves that limx→−∞B(x) = 0. While for x > M, we have that1− ε <

∑Kk=1

12k

=∑K

k=1 fk(x) ≤∑+∞

k=1 fk(x) ≤∑+∞

k=112k

= 1, which provesthat limx→+∞B(X) = 1.

Since the function B is increasing, it is also possible to do Riemann-Stieltjes integration with respect to B. Because of the series representation of B, it is possible to write explicit formulas for this integration. We do the case when the interval has irrational endpoints to keep things simple.

Recall that continuous functions are always Riemann-Stieltjes integrable. We will also need a well-known estimate, whose proof we leave as an exercise:

Lemma 1.33. Let α : [a, b] → R be increasing and let f : [a, b] → R be Riemann-Stieltjes integrable with respect to α, with M = sup{|f(x)| : a ≤ x ≤ b}. Then

|∫_a^b f dα| ≤ M(α(b) − α(a)).

Proposition 1.34. Let a < b be irrational numbers, let f : [a, b] → R be continuous, and set f(x) = 0 for x ∉ [a, b]. Then

∫_a^b f dB = ∑_{k=1}^{+∞} f(r_k)/2^k.

Proof. Let B_K = ∑_{k=1}^{K} f_k and R_K = ∑_{k=K+1}^{+∞} f_k. Then B = B_K + R_K, and B_K and R_K are both increasing functions. By Proposition I.5.21, we have that

∫_a^b f dB = ∫_a^b f dB_K + ∫_a^b f dR_K = ∑_{k=1}^{K} ∫_a^b f df_k + ∫_a^b f dR_K = ∑_{k=1}^{K} f(r_k)/2^k + ∫_a^b f dR_K,

by Theorem I.5.23.

Now we want to show that ∫_a^b f dR_K → 0 as K → +∞. To see this, let M = sup{|f(x)| : a ≤ x ≤ b}; using the lemma, we have that

|∫_a^b f dR_K| ≤ M(R_K(b) − R_K(a)) ≤ M ∑_{k=K+1}^{+∞} 1/2^k = M/2^K,

which clearly tends to 0 as K → +∞. Thus, we have that

∫_a^b f dB = lim_{K→+∞} ∑_{k=1}^{K} f(r_k)/2^k,

and the result follows.
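Proposition 1.34 can be illustrated numerically by truncating B after finitely many terms and comparing a Riemann-Stieltjes sum over a fine partition against the series ∑ f(r_k)/2^k. Everything below is an illustrative assumption, not part of the text: the finite enumeration, the integrand cos, the endpoints 0.1 and 0.9, and the partition size.

```python
# Sketch of Proposition 1.34 for a truncated B (first ten terms of a
# hypothetical enumeration) and f = cos, via Riemann-Stieltjes sums.
import math

rationals = [0.5, 1/3, 2/3, 0.25, 0.75, 0.2, 0.4, 0.6, 0.8, 0.125]

def B(x):
    """Truncated B: sum of 2^{-k} over the listed r_k with r_k <= x."""
    return sum(2.0**-k for k, r in enumerate(rationals, start=1) if x >= r)

f = math.cos
a, b = 0.1, 0.9          # endpoints chosen to avoid the listed rationals

n = 20000                # Riemann-Stieltjes sum over a fine partition
rs_sum = 0.0
for i in range(n):
    x0 = a + (b - a) * i / n
    x1 = a + (b - a) * (i + 1) / n
    rs_sum += f(x1) * (B(x1) - B(x0))

series = sum(f(r) / 2**k
             for k, r in enumerate(rationals, start=1) if a < r <= b)
assert abs(rs_sum - series) < 1e-3
```

Since the truncated B is a pure jump function, the only error in the Riemann-Stieltjes sum comes from evaluating f near, rather than exactly at, each jump; it shrinks with the mesh.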

With a little care one can improve on Proposition 1.22. If a function f has a jump discontinuity at c, then we call

δ = |lim_{x→c−} f(x) − lim_{x→c+} f(x)|

the size of the jump discontinuity.

Problem 1.35. Let f : [a, b] → R be an increasing function. Prove that if f has jump discontinuities of size at least δ at points {c_1, ..., c_k}, with a < c_1 < ... < c_k < b, then

k ≤ (f(b) − f(a))/δ.

Problem 1.36. Use the result of the last problem to show that an increasing function f : [a, b] → R can have at most countably many discontinuities. Prove that an increasing function f : R → R can have at most countably many discontinuities.

Problem 1.37. Prove Lemma 1.33.

1.6 A Space Filling Curve

Given a continuous function f : R → R², we often imagine its graph as the path traced out by a moving point, and in calculus we are often interested in computing the length of the path traveled. We also have a naive concept of how dimension behaves. We think of the real line as being “one-dimensional” and imagine that the continuous image of a one-dimensional set should also be “one-dimensional”.

In this section we construct a continuous function f : [0, 1] → [0, 1] × [0, 1] that is onto, which shows how ungraphable a continuous function can be, and also shows that our “primitive” notions of dimension require some concrete hypotheses to make them precise. The construction that we present is due to I. J. Schoenberg.

First we define a continuous function φ : [0, 2] → [0, 1] by setting

φ(t) = 0 for 0 ≤ t ≤ 1/3,
φ(t) = 3t − 1 for 1/3 ≤ t ≤ 2/3,
φ(t) = 1 for 2/3 ≤ t ≤ 4/3,
φ(t) = −3t + 5 for 4/3 ≤ t ≤ 5/3,
φ(t) = 0 for 5/3 ≤ t ≤ 2.

We then extend φ to a periodic function on all of R (still denoted by φ) by setting φ(t + 2) = φ(t). Since φ(0) = φ(2), this periodic extension is continuous.

Now define functions by

f_1(t) = ∑_{k=1}^{+∞} φ(3^{2k−2} t)/2^k and f_2(t) = ∑_{k=1}^{+∞} φ(3^{2k−1} t)/2^k.

Since φ is bounded by 1, each of the functions in these series is bounded by M_k = 2^{−k}, and consequently the series converge uniformly to define continuous functions f_1, f_2 on R. Moreover, since |φ(t)| ≤ 1, we see that |f_i(t)| ≤ 1.

Thus, we may define a continuous function f : R → [0, 1] × [0, 1] by setting f(t) = (f_1(t), f_2(t)).

We will now prove that f maps [0, 1] onto the unit square [0, 1] × [0, 1]. To see this, pick any point (a, b) with 0 ≤ a, b ≤ 1. It is most convenient to use their binary expansions. That is, we choose sequences a_n, b_n ∈ {0, 1} such that

a = ∑_{n=1}^{+∞} a_n/2^n and b = ∑_{n=1}^{+∞} b_n/2^n.

Set c = 2 ∑_{n=1}^{+∞} c_n/3^n, where c_{2n−1} = a_n and c_{2n} = b_n. Clearly, 0 ≤ c ≤ 2 ∑_{n=1}^{+∞} 1/3^n = 1.

We claim that f_1(c) = a and f_2(c) = b. Note that

3^k c = 2 ∑_{n=1}^{k} c_n/3^{n−k} + 2 ∑_{n=k+1}^{+∞} c_n/3^{n−k}.

Since n − k ≤ 0 for 1 ≤ n ≤ k, the first sum is an even integer. Let d_k denote the second term; since φ has period 2, φ(3^k c) = φ(d_k).

Now if c_{k+1} = 0, then 0 ≤ d_k = 2 ∑_{n=k+2}^{+∞} c_n/3^{n−k} ≤ 2 ∑_{m=2}^{+∞} 1/3^m = 1/3. Hence φ(3^k c) = φ(d_k) = 0 = c_{k+1}. When c_{k+1} = 1, a similar calculation shows that 2/3 ≤ d_k ≤ 1, and hence φ(3^k c) = φ(d_k) = 1.

Thus, in either case, φ(3^k c) = c_{k+1}, and we have that

f_1(c) = ∑_{k=1}^{+∞} φ(3^{2k−2} c)/2^k = ∑_{k=1}^{+∞} c_{2k−1}/2^k = a,

while

f_2(c) = ∑_{k=1}^{+∞} φ(3^{2k−1} c)/2^k = ∑_{k=1}^{+∞} c_{2k}/2^k = b.
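The computation above can be carried out exactly for a target point whose binary expansions terminate. The sketch below is an assumption-laden check, not part of Schoenberg's argument: it truncates the two series at K terms, hand-picks the digits of a = 5/8 and b = 3/8, and works in exact rational arithmetic so the periodicity reduction φ(t) = φ(t mod 2) is exact.

```python
# Sketch of Schoenberg's construction with exact rational arithmetic.
from fractions import Fraction

def phi(t):
    """phi on [0,2], extended with period 2."""
    t = t % 2
    if t <= Fraction(1, 3): return Fraction(0)
    if t <= Fraction(2, 3): return 3 * t - 1
    if t <= Fraction(4, 3): return Fraction(1)
    if t <= Fraction(5, 3): return -3 * t + 5
    return Fraction(0)

a_digits = [1, 0, 1]     # a = 1/2 + 1/8 = 5/8
b_digits = [0, 1, 1]     # b = 1/4 + 1/8 = 3/8

c_digits = []            # interleave: c_{2n-1} = a_n, c_{2n} = b_n
for an, bn in zip(a_digits, b_digits):
    c_digits += [an, bn]
c = 2 * sum(Fraction(d, 3**n) for n, d in enumerate(c_digits, start=1))

K = 20                   # truncation of the two series
f1 = sum(phi(3**(2*k - 2) * c) / 2**k for k in range(1, K + 1))
f2 = sum(phi(3**(2*k - 1) * c) / 2**k for k in range(1, K + 1))

assert f1 == Fraction(5, 8) and f2 == Fraction(3, 8)
```

Because the chosen digit sequences terminate, every tail term vanishes and the truncated series hit (a, b) exactly, matching the identity φ(3^k c) = c_{k+1} from the proof.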

Problem 1.38. Show that when c_{k+1} = 1, we have 2/3 ≤ d_k ≤ 1.

Problem 1.39. Use the above result to construct a continuous function f : [0, 1] → [0, 1] × [0, 1] × [0, 1] that is onto.

1.7 Power Series

Given x_0 ∈ R, by a power series centered at x_0 we mean an expression of the form

∑_{k=0}^{+∞} a_k(x − x_0)^k,

where the a_k ∈ R are constants, and for convenience of notation we let (x − x_0)^0 denote the function that is constantly equal to 1. Thus, a power series is just a particular type of series of functions, where each function f_k(x) = a_k(x − x_0)^k is a function from R to R.

In this section we study convergence of power series, the behavior of their derivatives and integrals, and the behavior of sums and products of power series.

Note that the above power series always converges to a_0 when x = x_0. Thus every power series always converges at this one point.

Most theorems about power series rely on the root test for series, so we recall it here. Since the convergence or divergence of a series doesn't depend on the first few terms, we will write our series as ∑_k a_k.

Theorem 1.40 (Root Test). Given a series ∑_k a_k, set

A = lim sup_k |a_k|^{1/k}.

1. If A < 1, then ∑_k a_k converges absolutely.

2. If A > 1, then the series ∑_k a_k diverges.

3. There exist series with A = 1 which converge and series with A = 1 which diverge, i.e., A = 1 is inconclusive.

Proof. If A < 1, then pick r with A < r < 1. By the definition of the lim sup, there is a K so that for every k ≥ K we have |a_k|^{1/k} < r. Hence |a_k| < r^k for k ≥ K. But the series ∑_{k=K}^{+∞} r^k converges, and hence ∑_{k=K}^{+∞} |a_k| converges by the comparison test. Thus ∑ a_k converges absolutely.

If A > 1, then pick r with A > r > 1. Again by the definition of the lim sup, for every integer K there must exist k > K such that |a_k|^{1/k} > r. This implies that |a_k| > r^k > 1. Thus, there are infinitely many terms in the series that are greater than 1 in absolute value. This implies that lim_k a_k ≠ 0, and hence the series diverges.

Finally, ∑ 1/k and ∑ 1/k² are series for which A = 1; one diverges and the other converges.

Given a power series as above, we set A = lim sup_{n→+∞} |a_n|^{1/n}, where we allow the possibility A = +∞ to indicate that the sequence is unbounded. We set

R = +∞ if A = 0, R = 1/A if 0 < A < +∞, and R = 0 if A = +∞,

and we call R the radius of convergence of the power series. The following result explains this definition.

Recall that a series of numbers ∑_{k=0}^{+∞} b_k is said to converge absolutely provided that the series ∑_{k=0}^{+∞} |b_k| converges, and that absolute convergence of a series implies convergence of the series.

Theorem 1.41. Let ∑_{k=0}^{+∞} a_k(x − x_0)^k be a power series and let R be its radius of convergence. Then

1. for each fixed x satisfying |x − x_0| < R, the series converges absolutely, defining a continuous function s(x) on the set |x − x_0| < R,

2. for each x satisfying |x − x_0| > R, the series diverges,

3. for each r < R, the series converges uniformly on the closed interval [x_0 − r, x_0 + r] to the function s.

Proof. If we fix x with |x − x_0| = b < R, then we have that

lim sup_{k→+∞} |a_k(x − x_0)^k|^{1/k} = lim sup_{k→+∞} b|a_k|^{1/k} = bA < 1.

Thus, by the root test, the series of numbers ∑_{k=0}^{+∞} a_k(x − x_0)^k converges absolutely. This proves the series converges to define a function on the set |x − x_0| < R.

If b = |x − x_0| > R, then lim sup_{k→+∞} |a_k(x − x_0)^k|^{1/k} = bA > 1. By Theorem 1.40.2, the series diverges.

To prove the third statement, we apply the Weierstrass M-test. On the closed interval [x_0 − r, x_0 + r], the function f_k(x) = a_k(x − x_0)^k satisfies |f_k(x)| ≤ |a_k| r^k = M_k. Applying the root test to the series ∑_{k=0}^{+∞} M_k, we see that lim sup_{k→+∞} M_k^{1/k} = r (lim sup_{k→+∞} |a_k|^{1/k}) = rA < 1. Thus, by the root test, the series ∑_{k=0}^{+∞} M_k converges, and so by the Weierstrass M-test the series of functions converges uniformly on this interval.

Now we show that s is continuous on |x − x_0| < R. Given any x_1 with |x_1 − x_0| < R, choose r with |x_1 − x_0| < r < R. Since the convergence is uniform on |x − x_0| ≤ r, and each function a_k(x − x_0)^k is continuous, s(x) is continuous on [x_0 − r, x_0 + r] and hence at x_1. This finishes the proof of the theorem.

Corollary 1.42. Let ∑_{k=0}^{+∞} a_k(x − x_0)^k be a power series. If this series converges for a real number x_1 with |x_1 − x_0| = R_1, then R_1 ≤ R.

Example 1.43. Set a_0 = 1 and a_k = k^k for k ≥ 1. Then A = +∞, R = 0, and the power series ∑_{k=0}^{+∞} a_k x^k converges only for x = 0.

Example 1.44 (The Geometric Series). Set a_k = 1 for all k; then A = 1 = R, and the series ∑_{k=0}^{+∞} x^k converges for −1 < x < +1. This series does not converge at either of the endpoints x = +1 or x = −1.

Since ∑_{k=0}^{+∞} x^k = 1/(1 − x), we see that the partial sums of this series, s_N(x) = ∑_{k=0}^{N} x^k, converge to the function 1/(1 − x) uniformly on any interval of the form [−r, +r], r < 1.

Another method that is often useful for estimating radii of convergence is the ratio test. The following result shows that it always gives a lower estimate on the radius of convergence. We then give an example to show that it does not always give the actual radius of convergence.

Proposition 1.45. Let B = lim sup_{k→+∞} |a_{k+1}|/|a_k|. Then lim sup_{k→+∞} |a_k|^{1/k} ≤ B, and hence 1/B ≤ 1/A = R.

Proof. Let δ > 0. Since B = inf_m sup{|a_{k+1}|/|a_k| : k ≥ m}, there exists an m so that sup{|a_{k+1}|/|a_k| : k ≥ m} ≤ B + δ. This implies that

|a_{m+j}| ≤ (B + δ)^j |a_m|.

Hence, A = lim sup_{j→+∞} |a_{m+j}|^{1/(m+j)} ≤ lim sup_{j→+∞} (B + δ)^{j/(j+m)} |a_m|^{1/(m+j)} = B + δ. Since δ was arbitrary, A ≤ B.

The above result also shows that the root test is a more powerful test, in the following sense. If B < 1, then A < 1. So any time that the ratio test shows that a series converges, the root test would also show that it converges. However, since A ≤ B it is possible that A < 1 < B, in which case the series converges, but the ratio test would not detect this.

The advantage of the ratio test is that often the limit of the ratios is easier to compute.

Example 1.46. Let

a_k = 2^k for k even, a_k = 3^k for k odd,

and set f(x) = ∑_{k=0}^{+∞} a_k x^k. Since lim sup_{k→+∞} |a_k|^{1/k} = 3, this power series has radius of convergence 1/3. However, lim sup_{k→+∞} |a_{k+1}|/|a_k| = +∞.

Thus, the ratio test only allows you to deduce that the series converges for x = 0, while we know that it converges for any |x| < 1/3.

Example 1.47 (The Exponential Series). Let a_k = 1/k!, where we set 0! = 1. Then B = lim sup_{k→+∞} a_{k+1}/a_k = 0. Hence by Proposition 1.45, A = 0 too, and the series ∑_{k=0}^{+∞} x^k/k! has radius of convergence R = +∞. The function that it converges to is denoted e^x or exp(x). On every interval [−r, +r] the partial sums of this series converge uniformly to e^x.

The number

e = ∑_{k=0}^{+∞} 1/k!

plays an important role in mathematics, so it is often useful to know how accurately the partial sums of the series approximate e.

Proposition 1.48. Let s_n = ∑_{k=0}^{n} 1/k!. Then 0 < e − s_n < (1/(n+1)!) · (n+2)/(n+1).

Proof. We have that

e − s_n = 1/(n+1)! + 1/(n+2)! + ··· = (1/(n+1)!)[1 + 1/(n+2) + 1/((n+2)(n+3)) + ···]
< (1/(n+1)!)[1 + 1/(n+2) + 1/(n+2)² + ···] = (1/(n+1)!) · (n+2)/(n+1).
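The bound in Proposition 1.48 is easy to check numerically, comparing exact rational partial sums against the double-precision value of e. Restricting to n ≤ 12 is an illustrative choice that keeps e − s_n well above floating-point round-off.

```python
# Check of 0 < e - s_n < (n+2)/((n+1)!(n+1)) from Proposition 1.48.
from fractions import Fraction
from math import e, factorial

for n in range(1, 13):
    s_n = sum(Fraction(1, factorial(k)) for k in range(n + 1))
    bound = Fraction(n + 2, factorial(n + 1) * (n + 1))
    err = e - float(s_n)
    assert 0 < err < float(bound)
```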

Theorem 1.49. The number e is irrational.

Proof. Suppose instead that e = p/q with p, q positive integers. Then we have that 0 < q![e − s_q] < (q!/(q+1)!) · (q+2)/(q+1) = (q+2)/(q+1)² < 1. Since e = p/q, we have that q!·e is an integer. Also, q!·s_q = ∑_{k=0}^{q} q!/k! is a sum of integers. Hence q![e − s_q] is an integer strictly between 0 and 1, a contradiction.

Problem 1.50. Prove that on the interval (−1/3, +1/3) the power series of Example 1.46 converges to the function

1/(1 − 4x²) + 3x/(1 − 9x²).

Problem 1.51. Prove that if a_k ∈ R and ∑_{k=0}^{+∞} a_k converges absolutely, then ∑_{k=0}^{+∞} a_k x^k converges uniformly on [−1, +1].

Problem 1.52. Prove that if a_k ∈ R and ∑_{k=0}^{+∞} a_k converges, then ∑_{k=0}^{+∞} a_k x^k converges uniformly on [−r, +1] for any 0 ≤ r < 1.

1.8 Operations on Power Series

In this section we examine how power series behave under sums, products, differentiation and integration.

Lemma 1.53. For any A > 0 we have that lim_{k→+∞} A^{1/k} = 1, and lim_{k→+∞} (k+1)^{1/k} = 1.

Proof. We first prove the second statement. Write (k+1)^{1/k} = 1 + b_k for some b_k > 0. Then we have that k + 1 = (1 + b_k)^k = 1 + k b_k + (k(k−1)/2) b_k² + ···, where all the remaining terms are non-negative. Hence 1 + (k(k−1)/2) b_k² ≤ k + 1, and it follows that b_k² ≤ 2/(k−1) for k ≥ 2. Thus b_k² → 0 and b_k → 0, from which it follows that (k+1)^{1/k} → 1.

Now let A ≥ 1. Then for k large enough we have that A ≤ k + 1 and hence 1 ≤ A^{1/k} ≤ (k+1)^{1/k}. Thus, by the “Squeeze Theorem” and the fact that lim_k (k+1)^{1/k} = 1, we have that lim_k A^{1/k} = 1. Now if 0 < A < 1, then 1/A ≥ 1, and hence 1 = lim_k (1/A)^{1/k} = lim_k 1/A^{1/k}. Hence lim_k A^{1/k} = 1.

Theorem 1.54. Let the power series f(x) = ∑_{k=0}^{+∞} a_k(x − x_0)^k have radius of convergence R > 0. Then

1. the power series g(x) = ∑_{k=0}^{+∞} (k+1) a_{k+1}(x − x_0)^k has radius of convergence at least R,

2. the function f is differentiable on the interval |x − x_0| < R, and f′(x) = g(x),

3. the power series F(x) = ∑_{k=1}^{+∞} (a_{k−1}/k)(x − x_0)^k has radius of convergence at least R and F′(x) = f(x).

Proof. Let A = lim sup_{k→+∞} |a_k|^{1/k}. We only consider the case 0 < A < +∞, and leave the other cases to the reader. Given any δ > 0, we may choose K large enough so that sup{|a_k|^{1/k} : k ≥ K} ≤ A + δ. We have that sup{|(k+1) a_{k+1}|^{1/k} : k ≥ K} = sup{[|a_{k+1}|^{1/(k+1)}]^{(k+1)/k} (k+1)^{1/k} : k ≥ K} ≤ sup{(A + δ)^{(k+1)/k} (k+1)^{1/k} : k ≥ K}. Now by the above lemma we may choose K large enough so that for any k ≥ K, the above product is as close to A + δ as we wish. Since δ > 0 was arbitrary, we have that

lim sup_{k→+∞} |(k+1) a_{k+1}|^{1/k} ≤ lim sup_{k→+∞} |a_k|^{1/k},

and so the power series for g has radius of convergence at least that of f.

The second statement follows from the first by considering the partial sums s_n(x) = ∑_{k=0}^{n} a_k(x − x_0)^k of f. On any given closed interval [x_0 − r, x_0 + r] with r < R, these functions converge uniformly to f, and their derivatives s′_n are the partial sums of g, which converge uniformly to g. Hence, applying Theorem 1.20, we have that f′ = g.

The proof that the power series for F has radius of convergence at least R is similar, and then the fact that F′ = f follows by applying the first two results to the power series for F.

Corollary 1.55. Let f(x) = ∑_{k=0}^{+∞} a_k(x − x_0)^k have radius of convergence R. Then on |x − x_0| < R, f is infinitely differentiable, the derivatives of f can be obtained by formally differentiating the power series for f term-by-term, and these new power series all have radius of convergence exactly R.

Proof. By the previous theorem, each derivative has radius of convergence at least R, and so it can again be differentiated. The only thing that needs to be checked is that they all have radius of convergence exactly R. To see this, suppose that the power series for f′ had radius of convergence R_1 > R. Then by the third part of the theorem, the integral of f′ would also have radius of convergence at least R_1. But this function differs from f by at most a constant, and so f would have radius of convergence at least R_1, a contradiction. Hence every derivative has exactly the same radius of convergence as f.

The following is elementary so we omit its proof.

Proposition 1.56. Let f(x) = ∑_{k=0}^{+∞} a_k(x − x_0)^k and g(x) = ∑_{k=0}^{+∞} b_k(x − x_0)^k both have radius of convergence at least R. Then ∑_{k=0}^{+∞} (a_k + b_k)(x − x_0)^k has radius of convergence at least R and converges to f(x) + g(x) on this domain.

Products of power series are more interesting. Note that if we have two polynomials f(x) = ∑_{k=0}^{N} a_k x^k and g(x) = ∑_{k=0}^{M} b_k x^k, and we think of them as power series by setting all of their higher order coefficients equal to 0, then gathering terms we see that

f(x) g(x) = ∑_{n} c_n x^n,

where c_0 = a_0 b_0, c_1 = a_0 b_1 + a_1 b_0, and in general c_n = ∑_{k=0}^{n} a_k b_{n−k}. This motivates the following results:

Theorem 1.57 (Cauchy Product Theorem). Assume that ∑_{k=0}^{+∞} a_k converges absolutely to A and ∑_{k=0}^{+∞} b_k converges absolutely to B, and set c_n = ∑_{k=0}^{n} a_k b_{n−k}. Then ∑_{n=0}^{+∞} c_n converges absolutely to AB.

Proof. Set A_n = ∑_{k=0}^{n} a_k, B_n = ∑_{k=0}^{n} b_k and C_n = ∑_{k=0}^{n} c_k. Also let P = ∑_{k=0}^{+∞} |a_k| and Q = ∑_{k=0}^{+∞} |b_k|.

Given a finite set S ⊆ N × N, we shall adopt the notation ∑_S a_k b_j to denote the sum over all pairs (k, j) that belong to S.

First, to see the absolute convergence, note that

∑_{n=0}^{N} |c_n| = ∑_{n=0}^{N} |∑_{k=0}^{n} a_k b_{n−k}| ≤ ∑_{n=0}^{N} ∑_{k=0}^{n} |a_k||b_{n−k}| = ∑_{S_N} |a_k||b_j|,

where S_N = {(k, j) : 0 ≤ k + j ≤ N}. Now since S_N ⊆ R_N = {(k, j) : 0 ≤ k ≤ N, 0 ≤ j ≤ N}, we have that

∑_{n=0}^{N} |c_n| ≤ ∑_{R_N} |a_k||b_j| = (∑_{k=0}^{N} |a_k|)(∑_{j=0}^{N} |b_j|).

Thus, for every N, we have that

∑_{n=0}^{N} |c_n| ≤ (∑_{k=0}^{+∞} |a_k|)(∑_{j=0}^{+∞} |b_j|) = PQ,

and so ∑ c_n is absolutely convergent.

Next, note that A_N B_N − C_N = ∑_{R_N \ S_N} a_k b_j. Since lim_N A_N B_N = AB, to prove that ∑_{n=0}^{+∞} c_n = AB, it will be enough to show that lim_N (A_N B_N − C_N) = 0.

To this end, given ε > 0, choose L so that ∑_{j=L}^{+∞} |b_j| < ε/2P and ∑_{k=L}^{+∞} |a_k| < ε/2Q.

Now if N ≥ 2L, then R_N \ S_N ⊆ {(k, j) : 0 ≤ k ≤ N, L ≤ j ≤ N} ∪ {(k, j) : L ≤ k ≤ N, 0 ≤ j ≤ N}. Hence,

|A_N B_N − C_N| ≤ ∑_{R_N \ S_N} |a_k||b_j| ≤ ∑_{k=0}^{N} ∑_{j=L}^{N} |a_k||b_j| + ∑_{k=L}^{N} ∑_{j=0}^{N} |a_k||b_j| < P(ε/2P) + Q(ε/2Q) = ε.

This inequality shows that A_N B_N − C_N → 0 as N → +∞, and so the proof is complete.

As an application of this result we prove a familiar fact about the exponential function.

Proposition 1.58. Let a, b ∈ R. Then exp(a) · exp(b) = exp(a+ b).

Proof. Let a_k = a^k/k! and b_k = b^k/k!, so that exp(a) = ∑_{k=0}^{+∞} a_k, exp(b) = ∑_{k=0}^{+∞} b_k, and both series converge absolutely. Thus, by the Cauchy Product Theorem, if we set c_n = ∑_{k=0}^{n} a_k b_{n−k}, then exp(a) exp(b) = ∑_{n=0}^{+∞} c_n.

However, using the binomial formula,

c_n = (1/n!) ∑_{k=0}^{n} (n!/(k!(n−k)!)) a^k b^{n−k} = (1/n!) ∑_{k=0}^{n} C(n, k) a^k b^{n−k} = (1/n!)(a + b)^n.

Hence ∑_{n=0}^{+∞} c_n = ∑_{n=0}^{+∞} (a + b)^n/n! = exp(a + b).

We can now prove our main theorem about products of power series.

Theorem 1.59. Let f(x) = ∑_{k=0}^{+∞} a_k(x − x_0)^k and g(x) = ∑_{k=0}^{+∞} b_k(x − x_0)^k be two power series, both with radius of convergence at least R, and set c_n = ∑_{k=0}^{n} a_k b_{n−k}. Then the power series h(x) = ∑_{n=0}^{+∞} c_n(x − x_0)^n has radius of convergence at least R, and for any x with |x − x_0| < R, we have that h(x) = f(x) g(x).

Proof. For each fixed x with |x − x_0| < R, both series of numbers converge absolutely. Hence, by the above result, their Cauchy product converges absolutely. But when we fix x, the k-th term of each series is ã_k = a_k(x − x_0)^k and b̃_k = b_k(x − x_0)^k. Thus, the n-th term of the Cauchy product is

c̃_n = ∑_{k=0}^{n} ã_k b̃_{n−k} = ∑_{k=0}^{n} (a_k(x − x_0)^k)(b_{n−k}(x − x_0)^{n−k}) = c_n(x − x_0)^n.

Hence, by the Cauchy Product Theorem, for each x with |x − x_0| < R the series ∑_{n=0}^{+∞} c_n(x − x_0)^n converges absolutely to the number f(x) g(x). Thus the product formula holds, and by Corollary 1.42 this implies that the series ∑_{n=0}^{+∞} c_n(x − x_0)^n has radius of convergence at least R.

Corollary 1.60. Let c_n = ∑_{k=0}^{n} a_k b_{n−k}. Then

lim sup_n |c_n|^{1/n} ≤ max{lim sup_n |a_n|^{1/n}, lim sup_n |b_n|^{1/n}}.

Proof. Let A = lim sup_n |a_n|^{1/n} and B = lim sup_n |b_n|^{1/n}. We only do the case where 0 < A < +∞ and 0 < B < +∞. The power series ∑_{k=0}^{+∞} a_k x^k and ∑_{k=0}^{+∞} b_k x^k both have radius of convergence at least R = min{1/A, 1/B}, and hence the series ∑_{k=0}^{+∞} c_k x^k converges absolutely for |x| < R. By Theorem 1.40.2, this forces lim sup_n |c_n x^n|^{1/n} ≤ 1 for |x| < R. Hence lim sup_n |c_n|^{1/n} ≤ 1/R = max{A, B}.

The following result tells us a great deal about convergence of a power series at an endpoint of its interval of convergence.

Theorem 1.61. Let f(x) = ∑_{n=0}^{+∞} a_n(x − x_0)^n have radius of convergence at least R. If s = ∑_{n=0}^{+∞} a_n R^n converges, then

lim_{x→(x_0+R)−} f(x) = s.

Thus, if one sets f(x_0 + R) = s, then f is defined and continuous on the interval (x_0 − R, x_0 + R].

The case for the other endpoint of the interval is similar and is left to the exercises.

Proof. The continuity statement follows from the limit result. We only prove the case R = 1 and x_0 = 0; the general case follows easily from this case by substitution. Thus, we must prove that lim_{x→1−} f(x) = s.

For n ≥ 0, let s_n = ∑_{k=0}^{n} a_k denote the partial sums of the series, and for convenience of notation set s_{−1} = 0. Then we have a_n = s_n − s_{n−1} for n ≥ 0.

Since the sequence of partial sums is convergent, it is bounded, say |s_n| ≤ M, and we have lim sup_n |s_n|^{1/n} ≤ lim sup_n M^{1/n} = 1. Thus, the power series g(x) = ∑_{n=0}^{+∞} s_n x^n has radius of convergence at least 1. We have

∑_{n=0}^{K} a_n x^n = ∑_{n=0}^{K} (s_n − s_{n−1}) x^n = ∑_{n=0}^{K} s_n x^n − ∑_{n=1}^{K} s_{n−1} x^n = ∑_{n=0}^{K} s_n x^n − ∑_{n=0}^{K−1} s_n x^{n+1} = s_K x^K + (1 − x) ∑_{n=0}^{K−1} s_n x^n.

Thus, for |x| < 1, we have

f(x) = lim_{K→+∞} ∑_{n=0}^{K} a_n x^n = lim_{K→+∞} [ s_K x^K + (1 − x) ∑_{n=0}^{K−1} s_n x^n ] = (1 − x) g(x),

since s_K x^K → 0 for |x| < 1.

Note that for |x| < 1, we have (1 − x) ∑_{n=0}^{+∞} x^n = 1.

Given ε > 0, choose N so that n > N implies |s − s_n| < ε/2. Let C_N = ∑_{n=0}^{N} |s − s_n|, and set δ = ε/(2 C_N). Then for 1 − δ < x < 1, we have that

|f(x) − s| = |(1 − x) ∑_{n=0}^{+∞} s_n x^n − s(1 − x) ∑_{n=0}^{+∞} x^n| = (1 − x) |∑_{n=0}^{+∞} (s_n − s) x^n|
≤ (1 − x) ∑_{n=0}^{N} |s_n − s| x^n + (1 − x) ∑_{n=N+1}^{+∞} (ε/2) x^n
≤ δ ∑_{n=0}^{N} |s − s_n| + (1 − x)(ε/2) x^{N+1}/(1 − x) = δ C_N + (ε/2) x^{N+1} < ε.

This proves that lim_{x→1−} f(x) = s, which completes the proof.

Problem 1.62. Use the power series for 1/(1−x) to find a power series that converges for |x| < 1 to 1/(1−x)², in two different ways: first using differentiation, and second using the Cauchy product. Cite theorems to justify your calculations.

Problem 1.63. Use the fact that (ln(x))′ = 1/x, differentiation, and the formula 1/x = 1/(1 − (1 − x)) to find a power series for ln(x) that converges for |x − 1| < 1. Justify your calculations with theorems.

Problem 1.64. Give a careful proof that the series f(x) = ∑_{k=0}^{+∞} ((−1)^k/(2k)!) x^{2k} and g(x) = ∑_{k=0}^{+∞} ((−1)^k/(2k+1)!) x^{2k+1} both have radius of convergence +∞.

Problem 1.65. Let f, g be the functions given by the above power series. Prove that f′(x) = −g(x), that g′(x) = f(x), and that f²(x) + g²(x) = 1 for every x ∈ R.

Problem 1.66. Complete the proof of Theorem 1.61 by showing that the case R = 1 and x_0 = 0 implies the general case.

Problem 1.67. Let f(x) = ∑_{n=0}^{+∞} a_n(x − x_0)^n be a power series with radius of convergence at least R. Prove that if s = ∑_{n=0}^{+∞} a_n(−R)^n converges, then lim_{x→(x_0−R)+} f(x) = s. Conclude that setting f(x_0 − R) = s defines a function that is continuous on [x_0 − R, x_0 + R).

1.9 Taylor Series

In this section we take a careful look at Taylor polynomials and convergence of Taylor series.

When f is differentiable on an interval and the function f′ is also differentiable, then we write f′′ for the derivative of f′. This notation becomes tedious for more than a couple of derivatives, so we adopt the notation f^(n) for the function that is the result of performing the derivative operation n times. In particular, f^(1) = f′ and f^(0) = f.

We will say that a function is differentiable on a closed interval [a, b] provided that it is differentiable at each point of (a, b) and that the appropriate one-sided derivatives exist at the two endpoints.

Recall that if a function has a derivative, then it is continuous. So when we say that f is n-times differentiable on an interval [a, b], this means that f, f^(1), ..., f^(n−1) exist and are continuous on [a, b], and that the derivative of the function f^(n−1) exists at each point of [a, b], but we make no claims about its continuity.

Theorem 1.68 (Taylor's Theorem). Let f : [a, b] → R be (n−1)-times differentiable on [a, b], let f^(n) exist on (a, b), let x_0 ∈ [a, b], and set

P(t) = ∑_{k=0}^{n−1} (f^(k)(x_0)/k!)(t − x_0)^k.

Then for any x ∈ [a, b], x ≠ x_0, there exists c between x and x_0 such that

f(x) = P(x) + (f^(n)(c)/n!)(x − x_0)^n.

Note that when n = 1 this is the Mean Value Theorem. The requirement that x ≠ x_0 guarantees that f^(n) exists for every t between x and x_0. The polynomial P appearing in the statement of the theorem is called the (n−1)-th Taylor polynomial centered at x_0.

Proof. Set M = (f(x) − P(x))/(x − x_0)^n, so that we need to show that there exists c between x and x_0 with n! M = f^(n)(c). Let

g(t) = f(t) − P(t) − M(t − x_0)^n, a ≤ t ≤ b,

so that g^(n)(t) = f^(n)(t) − n! M for a < t < b. The proof will be complete if we show that there is c between x and x_0 with g^(n)(c) = 0.

Since P^(k)(x_0) = f^(k)(x_0) for 0 ≤ k ≤ n−1, we have that g^(k)(x_0) = 0 for 0 ≤ k ≤ n−1. Now, g(x_0) = g(x) = 0, so by the MVT there exists x_1 between x_0 and x with g′(x_1) = 0. Thus g′(x_0) = g′(x_1) = 0, and so applying the MVT again, there exists x_2 between x_0 and x_1 with g^(2)(x_2) = 0. Continuing in this fashion, we obtain points x_k with g^(k)(x_k) = 0 for 1 ≤ k ≤ n. Setting c = x_n completes the proof.

Corollary 1.69. Let 0 < R ≤ +∞ and let f be a real-valued function that is infinitely differentiable for |x − x_0| < R. If for each r < R we have that

lim_{n→+∞} (sup{|f^(n)(t)|/n! : |t − x_0| ≤ r}) r^n = 0,

then the Taylor series

∑_{n=0}^{+∞} (f^(n)(x_0)/n!)(x − x_0)^n

has radius of convergence at least R and converges to f(x) for |x − x_0| < R.

Proof. Given any x with |x − x_0| < R, pick r < R with |x − x_0| < r. By Taylor's Theorem,

|f(x) − ∑_{k=0}^{n−1} (f^(k)(x_0)/k!)(x − x_0)^k| ≤ sup{|f^(n)(t)|/n! : |t − x_0| ≤ r} · r^n,

so by the above estimate, the partial sums of the series converge to f(x).
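The remainder estimate can be checked numerically for f = cos at x_0 = 0, where every derivative is bounded by 1, so the remainder after the degree-(n−1) Taylor polynomial is at most r^n/n! whenever |x| ≤ r. The values x = 1.2 and r = 1.5 below are arbitrary choices for this sketch.

```python
# Check of the Taylor remainder bound (Theorem 1.68) for f = cos at 0.
from math import cos, factorial

def taylor_cos(x, n):
    """Taylor polynomial of cos of degree < n centered at 0."""
    return sum((-1)**(k // 2) * x**k / factorial(k)
               for k in range(n) if k % 2 == 0)

x, r = 1.2, 1.5
for n in range(1, 20):
    remainder = abs(cos(x) - taylor_cos(x, n))
    assert remainder <= r**n / factorial(n) + 1e-15
```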

The following famous example is a function that is infinitely differentiable, whose Taylor series centered at 0 has infinite radius of convergence, yet the Taylor series equals the function only at the point 0. To avoid too many calculations, we only sketch the details.

We define a function f : R → R by

f(x) = e^{−1/x²} for x ≠ 0, and f(0) = 0.

Proposition 1.70. The function f is infinitely differentiable on R, f^(n)(0) = 0 for every n, the Taylor series for f centered at 0 has infinite radius of convergence, but it converges to f(x) only for x = 0.

Proof. Once we verify the formula for the derivatives, we will have that all the coefficients of the Taylor series are 0, and hence the Taylor series converges everywhere to the function that is identically 0. But f(x) = 0 only when x = 0. Thus, all that remains is to verify the formula for the derivatives.

We compute f′(0) = lim_{x→0} (e^{−1/x²} − 0)/(x − 0) = lim_{x→0} (1/x)/e^{1/x²}. If we substitute y = 1/x², then as x → 0 we have y → +∞. Using the series expression for e^y, we see that for y ≥ 0 we have y^{n+1}/(n+1)! ≤ e^y. From this it follows that y^n/e^y ≤ (n+1)!/y, and so lim_{y→+∞} y^n/e^y = 0 for any n ∈ N. Consequently, lim_{y→+∞} y^r/e^y = 0 for any real number r.

This shows that f′(0) = lim_{x→0} ±√y/e^y = 0.

To compute higher derivatives, note that for x ≠ 0, f′(x) = 2x^{−3} e^{−1/x²} = ±2y^{3/2}/e^y. Hence,

f^(2)(0) = lim_{x→0} (f′(x) − 0)/(x − 0) = lim_{x→0} 2y²/e^y = 0.

The remaining calculations are similar. Inductively, we show that f^(n)(0) = 0 and that for x ≠ 0, f^(n)(x) = p(1/x) e^{−1/x²}, where p is some polynomial. Consequently,

f^(n+1)(0) = lim_{x→0} ±√y · p(±√y)/e^y = 0.

Problem 1.71. Use Corollary 1.69 to prove that the Taylor series centered at 0 for cos(x) and sin(x) converge to these functions on the whole real line, and show that these Taylor series are the power series f and g of Problem 1.64.

Earlier, we found a power series in (x − 1) that converged to ln(x) for |x − 1| < 1. In the following problems we will see that this is indeed the Taylor series for ln(x) centered at 1, and see what the above results say about convergence of this series.

Page 37: Introduction to Real Analysis Spring 2014 Lecture Notesvern/4332lec.pdf · Introduction to Real Analysis Spring 2014 Lecture Notes Vern I. Paulsen April 22, 2014


Problem 1.72. Compute the Taylor series for ln(x) centered at 1 and use Corollary 1.59 to prove that the Taylor series centered at 1 for ln(x) converges to ln(x) for |x − 1| < 1/2.

Problem 1.73. Use the estimate in Taylor's theorem to prove that for 1 < x < 2, the Taylor series for ln(x) centered at 1 converges to ln(x).

Problem 1.74. Find the Taylor series for arctan(x) = tan^{−1}(x) centered at 0 and prove that it converges to arctan(x) on (−1,+1). Use the fact that arctan(1/√3) = π/6 to give an infinite series representation for π.

Problem 1.75. Use Theorem 1.61, the fact that arctan(x) is continuous at +1, and the fact that arctan(+1) = π/4 to give another infinite series representation for π.

1.10 Polynomial Approximation

In this section we prove that every continuous function on an interval can be approximated uniformly by a polynomial. First we need a preliminary calculation.

Let Q_n(x) = c_n(1 − x^2)^n, where c_n is chosen so that ∫_{−1}^{+1} Q_n(x) dx = 1.

Note that g(x) = (1 − x^2)^n − 1 + nx^2 ≥ 0 for 0 ≤ x ≤ 1. To see this, check that g(0) = 0 and that the derivative, g'(x) = 2nx[1 − (1 − x^2)^{n−1}], is nonnegative on this interval. Since

c_n^{−1} = ∫_{−1}^{+1} (1 − x^2)^n dx = 2∫_0^1 (1 − x^2)^n dx ≥ 2∫_0^{1/√n} (1 − x^2)^n dx ≥ 2∫_0^{1/√n} (1 − nx^2) dx = 4/(3√n),

we get that c_n ≤ 3√n/4. Also, since Q_n is decreasing on [0,1], we have that for 0 < δ ≤ |x| ≤ 1,

Q_n(x) ≤ √n (1 − x^2)^n ≤ √n (1 − δ^2)^n → 0.

Thus, Q_n → 0 uniformly on δ ≤ |x| ≤ 1.
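The kernel computation above can be checked numerically. This sketch (our own, with a midpoint-rule integrator `c_inv` approximating 1/c_n) verifies the bound c_n ≤ 3√n/4 and the uniform decay of Q_n away from 0:

```python
import math

def c_inv(n, steps=100000):
    # Midpoint-rule approximation of the integral of (1 - x^2)^n over [-1, 1],
    # i.e. an approximation of 1/c_n.
    h = 2.0 / steps
    return sum((1.0 - (-1.0 + (k + 0.5) * h) ** 2) ** n for k in range(steps)) * h

# The bound c_n <= 3*sqrt(n)/4 derived above.
for n in (1, 2, 5, 10, 20):
    assert 1.0 / c_inv(n) <= 0.75 * math.sqrt(n) + 1e-9

# Away from 0 the kernel dies off uniformly: on delta <= |x| <= 1 the sup of
# Q_n is attained at |x| = delta, and it tends to 0 as n grows.
delta = 0.5
sups = [(1.0 / c_inv(n)) * (1.0 - delta**2) ** n for n in (5, 20, 80)]
assert sups[0] > sups[1] > sups[2] and sups[2] < 1e-6
```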

Theorem 1.76 (Weierstrass' Approximation Theorem). If f is a continuous function on [a,b], then there exists a sequence of polynomials p_n such that p_n → f uniformly on [a,b].

Proof. Suppose we prove the theorem for [0,1] and f is continuous on [a,b]. Setting g(t) = f(a + (b − a)t), we obtain a continuous function on [0,1], which we can approximate uniformly by polynomials q_n. Setting p_n(x) = q_n((x − a)/(b − a)), we obtain a sequence of polynomials that approximates f uniformly on [a,b]. Thus, it will be sufficient to consider the case of [0,1].



If we let g(x) = f(x) − f(0) − x[f(1) − f(0)], then g(0) = g(1) = 0 and f is equal to g plus a polynomial. Thus, if we can approximate g by a sequence of polynomials, then we can approximate f by a sequence of polynomials. Hence, it will be enough to prove the theorem under the extra assumption that f(0) = f(1) = 0.

Since f(0) = f(1) = 0, if we set f(x) = 0 for x < 0 and for x > 1, then this extends f to a continuous function on the whole real line.

For 0 ≤ x ≤ 1, set

p_n(x) = ∫_{−x}^{1−x} f(x + t) Q_n(t) dt = ∫_0^1 f(s) Q_n(s − x) ds,

where s = x + t; the second expression shows that p_n is a polynomial. Since f is uniformly continuous on [0,1], given ε > 0 we may choose δ > 0 so that |x − y| < δ, 0 ≤ x, y ≤ 1, implies that |f(x) − f(y)| < ε/2. Because f is identically 0 outside of this interval, this same ε-δ relation holds for any x, y ∈ R.

Let M = sup{|f(t)| : 0 ≤ t ≤ 1} and recall that Q_n(x) ≥ 0. Using the extended values of f we have that p_n(x) = ∫_{−x}^{1−x} f(x + t) Q_n(t) dt = ∫_{−1}^{+1} f(x + t) Q_n(t) dt. Hence,

|p_n(x) − f(x)| = |∫_{−1}^{+1} (f(x + t) − f(x)) Q_n(t) dt| ≤ ∫_{−1}^{+1} |f(x + t) − f(x)| Q_n(t) dt ≤ 2M ∫_{−1}^{−δ} Q_n(t) dt + ∫_{−δ}^{+δ} (ε/2) Q_n(t) dt + 2M ∫_{+δ}^{1} Q_n(t) dt.

The middle of these three integrals is less than ε/2, and since Q_n tends uniformly to 0 on the intervals of the first and third integrals, we may pick N so that for n > N each of those two integrals is at most ε/(8M). Thus, for n > N, we will have that |p_n(x) − f(x)| < ε for all 0 ≤ x ≤ 1.

Corollary 1.77. For any A > 0, there is a sequence of polynomials {p_n} with p_n(0) = 0 that converges uniformly on [−A,+A] to |x|.

Proof. By Weierstrass' theorem there is a sequence of polynomials q_n such that q_n → f uniformly on [−A,+A], where f(x) = |x|. Then we have that q_n(0) → f(0) = 0. Let p_n(t) = q_n(t) − q_n(0), so that p_n(0) = 0.

Given ε > 0, there is an N such that n > N implies |q_n(t) − f(t)| < ε/2 for all −A ≤ t ≤ +A. Then we have that |p_n(t) − f(t)| = |q_n(t) − q_n(0) − f(t) + f(0)| ≤ |q_n(t) − f(t)| + |q_n(0) − f(0)| < ε/2 + ε/2 = ε for all n > N and all −A ≤ t ≤ +A. Hence, p_n → f uniformly.



The above proof is an example of a "convolution" proof. Given continuous functions f, g on [−1,1], we can set

f ∗ g(x) = ∫_0^1 f(s) g(s − x) ds.

This behaves like a product in the sense that (f_1 + f_2) ∗ g = f_1 ∗ g + f_2 ∗ g, f ∗ (g_1 + g_2) = f ∗ g_1 + f ∗ g_2, and for any constant c, (cf) ∗ g = f ∗ (cg) = c(f ∗ g). In the above proof we showed that

f ∗ Q_n → f uniformly.

Thus, the functions Q_n behave "approximately" as an identity for the "convolution product". The concepts of convolution and approximate identities play an important role in signal processing and harmonic analysis.

There is another proof of the Weierstrass approximation theorem, due to Bernstein, that gives the approximating polynomials by "sampling" the function f. Although we will not prove it here, we wish to state the theorem. This type of "sampling theorem" is also important in signal processing and in the area known as "approximation theory".

Definition 1.78. Given a function f : [0,1] → R, the n-th Bernstein polynomial of f is the polynomial

B_n(x) = Σ_{k=0}^{n} f(k/n) (n choose k) x^k (1 − x)^{n−k}.

Theorem 1.79 (Bernstein's Approximation Theorem). If f : [0,1] → R is continuous, then B_n → f uniformly.
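Though we do not prove Bernstein's theorem, it is easy to watch it in action. The sketch below (our own helper names and choice of test function) samples f(x) = |x − 1/2| on [0,1] and checks that the uniform error shrinks:

```python
import math

def bernstein(f, n, x):
    # B_n(x) = sum_{k=0}^{n} f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))

target = lambda x: abs(x - 0.5)  # continuous, but not differentiable at 1/2

def sup_error(n, grid=200):
    # Approximate the sup-norm distance on [0, 1] over a fine grid.
    return max(abs(bernstein(target, n, j / grid) - target(j / grid))
               for j in range(grid + 1))

# The uniform error shrinks as n grows, as Bernstein's theorem predicts.
errs = [sup_error(n) for n in (4, 16, 64)]
assert errs[0] > errs[1] > errs[2]
```

Note that B_n literally "samples" f at the n + 1 points k/n, which is what makes this construction attractive in approximation theory.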

Problem 1.80. Let f be continuous on [a,b] and let M = sup{|f(t)| : a ≤ t ≤ b}. Prove that there exists a sequence of polynomials {p_n} that converges uniformly to f and satisfies sup{|p_n(t)| : a ≤ t ≤ b} ≤ M for all n.

Problem 1.81. Let f be continuous on [a,b] and let c be a point, a ≤ c ≤ b. Prove that there exists a sequence of polynomials {p_n} satisfying p_n(c) = f(c) that converges uniformly to f on [a,b].

Problem 1.82. Use induction and the previous problem to prove that if f is continuous on [a,b] and {x_1, ..., x_K} is a finite collection of points in [a,b], then there exists a sequence of polynomials {p_n} satisfying p_n(x_i) = f(x_i) for 1 ≤ i ≤ K that converges uniformly to f.



1.11 The Stone-Weierstrass Theorem

The Stone-Weierstrass theorem is a generalization of Weierstrass' theorem that applies to more general settings. We shall actually need this general form later for the study of Fourier series. First, some definitions.

Definition 1.83. Let E be a set and let A be a collection of functions fromE to R. We call A an algebra of functions provided that

• f, g ∈ A, implies that f + g ∈ A,

• f, g ∈ A, implies that fg ∈ A,

• f ∈ A and r ∈ R implies that rf ∈ A.

Notice that the first and third properties are the defining properties of a vector space. Some examples of algebras of functions that we know are the continuous real-valued functions on a metric space and the Riemann integrable functions on an interval.

Also, the set of polynomials satisfies the above three properties, so they may be regarded as an algebra of functions on any subset E ⊆ R.

Definition 1.84. A collection A of real-valued functions on a set E is said to separate points provided that given any x_1, x_2 ∈ E with x_1 ≠ x_2, there is f ∈ A with f(x_1) ≠ f(x_2). The collection is said to vanish at no point provided that for each x_1 ∈ E, there is f ∈ A with f(x_1) ≠ 0.

Proposition 1.85. Let (X, d) be a metric space and let C(X) denote the set of continuous functions from X to R. Then C(X) is an algebra of functions that separates points and vanishes at no point.

Proof. We have already noticed that it is an algebra of functions, since sums and products of continuous functions are continuous. Also, recall that if we fix any point x_1 ∈ X, then we proved last semester that f(x) = d(x, x_1) is continuous. Hence, if x_2 ≠ x_1, then f(x_2) ≠ 0 while f(x_1) = 0. Thus, f(x_1) ≠ f(x_2), and so C(X) separates points. Finally, if we set g(x) = d(x, x_1) + 1, then g(x_1) ≠ 0. So C(X) vanishes at no point.

Also, since every continuous function on [a,b] is Riemann integrable on [a,b], the algebra of Riemann integrable functions also separates points and vanishes at no point.

Finally, by just considering first degree polynomials, we can see that thealgebra of polynomials separates points and vanishes at no point.



For some examples that fail these properties, first consider the set of polynomials in x^2. Since p(−a) = p(+a), these polynomials do not separate points when viewed as functions on [−1,+1], but they vanish at no point. Also, if we consider the polynomials with constant term equal to 0, then these are an algebra of functions on [−1,+1] that separates points on [−1,+1], but every function in this set vanishes at 0.

Here is the main theorem:

Theorem 1.86 (Stone-Weierstrass Theorem). Let (K, d) be a compact metric space and let A be an algebra of continuous real-valued functions on K. If A separates points on K and vanishes at no point, then given any f ∈ C(K) there is a sequence of functions {f_n} in A such that f_n → f uniformly.

When every function in C(K) is the uniform limit of a sequence of functions in A, we say that A is uniformly dense in C(K). Thus, the Stone-Weierstrass Theorem is often summarized as saying that if an algebra of real-valued functions on a compact metric space separates points and vanishes at no point, then it is uniformly dense in C(K).

This theorem is also true for any compact, Hausdorff topological space (a concept that we haven't yet encountered). There is also a version for complex-valued functions. The only extra condition that is needed in the complex case is that if f ∈ A, then the complex conjugate f̄ ∈ A.

To prove this result, it will be convenient to first prove some lemmas.

Lemma 1.87. Let A be an algebra of functions on E that separates points and vanishes at no point. If x_1 ≠ x_2 are points in E and c_1, c_2 ∈ R, then there exists f ∈ A such that f(x_1) = c_1 and f(x_2) = c_2.

Proof. Since A separates points, there exists g ∈ A with g(x_1) ≠ g(x_2). Since A vanishes at no point, there exist h_i ∈ A, i = 1, 2, with h_i(x_i) ≠ 0.

Set

f(x) = c_1 · [(g(x) − g(x_2))/(g(x_1) − g(x_2))] · [h_1(x)/h_1(x_1)] + c_2 · [(g(x) − g(x_1))/(g(x_2) − g(x_1))] · [h_2(x)/h_2(x_2)];

then f ∈ A and f(x_i) = c_i, i = 1, 2.
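A concrete instance of the lemma's formula, with the illustrative choices E = R, g(x) = x, and h_1 = h_2 = 1 (these choices are our own; the constant function 1 need not lie in a general algebra A, but it makes the arithmetic transparent):

```python
def interpolant(x1, c1, x2, c2):
    # Build f via the lemma's formula with g(x) = x and h1 = h2 = 1.
    g = lambda x: x
    h1 = h2 = lambda x: 1.0
    def f(x):
        return (c1 * (g(x) - g(x2)) / (g(x1) - g(x2)) * h1(x) / h1(x1)
                + c2 * (g(x) - g(x1)) / (g(x2) - g(x1)) * h2(x) / h2(x2))
    return f

f = interpolant(1.0, 5.0, 3.0, -2.0)
assert abs(f(1.0) - 5.0) < 1e-12   # f(x1) = c1
assert abs(f(3.0) - (-2.0)) < 1e-12  # f(x2) = c2
```

Each summand is engineered to be c_i at x_i and 0 at the other point, which is exactly how the proof works.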

Lemma 1.88. Let A be an algebra of functions on a set E and let B be the set of functions that are uniform limits of functions in A. Then B is an algebra of functions on E, and if {g_n} is a sequence in B and g_n → g uniformly, then g ∈ B.



Proof. If f, g ∈ B and r ∈ R, then there are sequences {f_n} and {g_n} in A with f_n → f and g_n → g uniformly. We have that f_n + g_n → f + g uniformly, so that f + g ∈ B. Similarly, f_n g_n → fg and r f_n → rf uniformly, so fg ∈ B and rf ∈ B. Hence B is an algebra.

Finally, if g_n ∈ B and g_n → g uniformly, then we may pick f_n ∈ A with sup{|g_n(t) − f_n(t)| : t ∈ E} < 1/n. It then follows that f_n → g uniformly, and so g ∈ B.

We are now prepared to prove the Stone-Weierstrass Theorem.

Proof. For B as defined in the last lemma, we need to prove that B = C(K).

Let f ∈ A. Since f : K → R is continuous and K is compact, there is an A > 0 so that −A ≤ f(x) ≤ +A. By Corollary 1.66 there is a sequence of polynomials {p_n} so that p_n → |t| uniformly on [−A,+A] and p_n(0) = 0. Since these polynomials have no constant term and since A is an algebra, it follows that p_n(f) ∈ A.

Also, for x ∈ K, we have |p_n(f(x)) − |f(x)|| ≤ sup{|p_n(t) − |t|| : −A ≤ t ≤ +A}. Hence, p_n(f) → |f| uniformly, and we have that |f| ∈ B. By the above lemma, if f ∈ B, then |f| ∈ B.

Given any two functions f, g on K, we let min{f, g} be the function defined by min{f, g}(x) = min{f(x), g(x)}. We define max{f, g} similarly.

Since for any two real numbers, min{a, b} = (a + b)/2 − |a − b|/2 and max{a, b} = (a + b)/2 + |a − b|/2, we see that for f, g ∈ B,

max{f, g} = (f + g)/2 + |f − g|/2 ∈ B and min{f, g} = (f + g)/2 − |f − g|/2 ∈ B.
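The averaging identities can be spot-checked directly. This tiny sketch (our own helper names) builds max{f, g} and min{f, g} exactly as in the display above:

```python
# Lattice operations on functions via the averaging identities.
fmax = lambda f, g: (lambda x: (f(x) + g(x)) / 2 + abs(f(x) - g(x)) / 2)
fmin = lambda f, g: (lambda x: (f(x) + g(x)) / 2 - abs(f(x) - g(x)) / 2)

f = lambda x: x
g = lambda x: 1 - x
h_max, h_min = fmax(f, g), fmin(f, g)
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(h_max(x) - max(f(x), g(x))) < 1e-12
    assert abs(h_min(x) - min(f(x), g(x))) < 1e-12
```

The point of the identities in the proof is that they express max and min using only the operations (sums, scalar multiples, absolute value) already known to stay inside B.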

Note that if f_1, f_2, f_3 ∈ B, then max{f_1, f_2, f_3} = max{f_1, max{f_2, f_3}} ∈ B. Thus, by induction, if f_1, ..., f_n ∈ B, then max{f_1, ..., f_n} ∈ B, and similarly, min{f_1, ..., f_n} ∈ B.

Given f ∈ C(K), a point x ∈ K, and ε > 0, we claim that there is a function g_x ∈ B with g_x(x) = f(x) and g_x(t) > f(t) − ε for all t ∈ K.

To see why this is true, first, given y ≠ x, by the first lemma we can find h_y ∈ A with h_y(x) = f(x) and h_y(y) = f(y). Since h_y − f is continuous and (h_y − f)(y) = 0, we may find a small ball of radius, say, δ_y > 0, so that when d(t, y) < δ_y, then |h_y(t) − f(t)| < ε, which implies that h_y(t) > f(t) − ε for t ∈ B(y; δ_y). Now the set of all balls {B(y; δ_y) : y ∈ K} is an open cover of K, so we may pick a finite subcover, i.e., K = ∪_{i=1}^{n} B(y_i; δ_{y_i}). Since each h_{y_i} ∈ A ⊆ B, we have that g_x = max{h_{y_1}, ..., h_{y_n}} ∈ B. But given any t ∈ K, there is an i so that t ∈ B(y_i; δ_{y_i}), and so g_x(t) ≥ h_{y_i}(t) > f(t) − ε.

Now, in a similar way, since each g_x(x) = f(x), we may choose r_x > 0 so that d(x, t) < r_x implies that |g_x(t) − f(t)| < ε and so g_x(t) < f(t) + ε.



Again {B(x; r_x) : x ∈ K} is an open cover of K, and so we may choose a finite subcover, K = ∪_{j=1}^{m} B(x_j; r_{x_j}).

Set h = min{g_{x_1}, ..., g_{x_m}} ∈ B. Given any t ∈ K, there is a j so that t ∈ B(x_j; r_{x_j}), and hence h(t) ≤ g_{x_j}(t) < f(t) + ε. On the other hand, there must be an l so that h(t) = min{g_{x_1}(t), ..., g_{x_m}(t)} = g_{x_l}(t) > f(t) − ε.

These two inequalities show that we have produced an h ∈ B with |h(t) − f(t)| < ε for all t ∈ K. Taking ε = 1/n and the corresponding h_n ∈ B gives us a sequence {h_n} in B so that h_n → f uniformly. By the second lemma, we can instead choose f_n ∈ A with f_n → f uniformly. This completes the proof of the theorem.

Recall that when we are given a polynomial p(x_1, ..., x_n) in n variables, we regard p as a function on R^n.

Problem 1.89. Let K ⊆ R^n be a compact set and let A be the algebra of polynomials in n variables, regarded as functions on K. Prove that A is uniformly dense in C(K).

Problem 1.90. Let A be an algebra of functions on E that separates points and vanishes at no point. Let x_1, ..., x_n be n distinct points in E and let c_1, ..., c_n be real numbers. Prove that there is a function f ∈ A such that f(x_i) = c_i, i = 1, ..., n.

Problem 1.91. Let T = {(x, y) ∈ R^2 : x^2 + y^2 = 1}, a compact set, and let φ : [−π,+π] → T be the continuous function defined by φ(t) = (cos(t), sin(t)). If h ∈ C(T) and f(t) = h ∘ φ(t), then f ∈ C([−π,+π]) and f(−π) = f(+π). Prove the converse, that is, prove that if f ∈ C([−π,+π]) with f(−π) = f(+π), then there exists a continuous function h ∈ C(T) with f = h ∘ φ. [HINT: The proof of this problem does not use the Stone-Weierstrass theorem.]

1.12 Fourier Series

In addition to the Taylor series, another famous way to approximate functions is by means of their Fourier series. These series play an important role in signal processing and in many applications to finding solutions of differential equations.

Definition 1.92. A trigonometric polynomial is any finite sum of the form

f(x) = a_0 + Σ_{n=1}^{N} (a_n cos(nx) + b_n sin(nx)).

Page 44: Introduction to Real Analysis Spring 2014 Lecture Notesvern/4332lec.pdf · Introduction to Real Analysis Spring 2014 Lecture Notes Vern I. Paulsen April 22, 2014

44 CHAPTER 1. SEQUENCES AND SERIES OF FUNCTIONS

It is convenient to write a_0 for the constant coefficient since cos(0x) = 1. Also, since sin(0x) = 0, we could write f(x) = Σ_{n=0}^{N} (a_n cos(nx) + b_n sin(nx)) and nothing would be changed. This sometimes makes formulas easier, since we can avoid the "special case" of n = 0.

The idea of Fourier series is to try to represent more general functions as "trigonometric series", that is, as limits of infinite sums of trigonometric polynomials. The first question that we shall address is to determine what the coefficients should be if one wants to represent a function as such a series.

If one uses the trigonometric identities

cos(a)cos(b) = 1/2[cos(a − b) + cos(a + b)] and sin(a)sin(b) = 1/2[cos(a − b) − cos(a + b)],

then one obtains the following integral identities for integers m and n:

• ∫_{−π}^{+π} cos(nx) sin(mx) dx = 0, for all m, n,
• ∫_{−π}^{+π} cos(nx) cos(mx) dx = 0 for m ≠ n, and = π for m = n ≥ 1,
• ∫_{−π}^{+π} sin(nx) sin(mx) dx = 0 for m ≠ n, and = π for m = n ≥ 1.

Because of these formulas, if we have a trigonometric polynomial f as in the above definition, then we can recover the constants by integration, namely,

• a_0 = 1/(2π) ∫_{−π}^{+π} f(x) dx,
• a_n = 1/π ∫_{−π}^{+π} f(x) cos(nx) dx, for 1 ≤ n ≤ N,
• b_n = 1/π ∫_{−π}^{+π} f(x) sin(nx) dx, for 1 ≤ n ≤ N.
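These recovery formulas can be verified numerically. In the sketch below (our own, using a midpoint-rule integrator; the sample coefficients are an arbitrary choice), we build a trigonometric polynomial with known coefficients and integrate them back out:

```python
import math

A0, A2, B3 = 1.5, 2.0, -0.75  # our chosen coefficients
f = lambda x: A0 + A2 * math.cos(2 * x) + B3 * math.sin(3 * x)

def integrate(g, steps=20000):
    # Composite midpoint rule on [-pi, pi].
    h = 2 * math.pi / steps
    return sum(g(-math.pi + (k + 0.5) * h) for k in range(steps)) * h

# Recover each coefficient via the orthogonality relations.
a0 = integrate(f) / (2 * math.pi)
a2 = integrate(lambda x: f(x) * math.cos(2 * x)) / math.pi
b3 = integrate(lambda x: f(x) * math.sin(3 * x)) / math.pi
b1 = integrate(lambda x: f(x) * math.sin(x)) / math.pi  # absent mode -> 0

assert abs(a0 - A0) < 1e-6 and abs(a2 - A2) < 1e-6
assert abs(b3 - B3) < 1e-6 and abs(b1) < 1e-6
```

Note the recovered coefficient of the sin(x) mode comes out (numerically) zero, matching the remark below that the integrals vanish for n > N.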

Moreover, if n > N, then each of these integrals would be 0. Thus, if we "imagine" a more general function f corresponding to an infinite series of cosine and sine functions, then these formulas tell us the values of the coefficients of that series. This leads to the following definition.

Definition 1.93. Let f : [−π,+π] → R be a Riemann integrable function. Then we set

• a_0 = 1/(2π) ∫_{−π}^{+π} f(x) dx,
• a_n = 1/π ∫_{−π}^{+π} f(x) cos(nx) dx, for n ∈ N,
• b_n = 1/π ∫_{−π}^{+π} f(x) sin(nx) dx, for n ∈ N,

and we call these numbers the Fourier coefficients of f. The infinite series

a_0 + Σ_{n=1}^{+∞} (a_n cos(nx) + b_n sin(nx))

is called the Fourier series for f. We set

s_N(f, x) = a_0 + Σ_{n=1}^{N} (a_n cos(nx) + b_n sin(nx)),

and call this the N-th partial sum of the series.

Note that each of the functions in the above formulas is Riemann integrable, since the product of Riemann integrable functions is again Riemann integrable.

One problem that we wish to discuss is to what extent the Fourier series converges to the function f. Note that if we start with a Riemann integrable function f and we change its value at a single point x_0 to get a new function g, then g will still be Riemann integrable and all the Fourier coefficients of g will be equal to those of f. Hence, s_N(f, x) = s_N(g, x) for all N and x. So if the Fourier series does converge to a value at the point x_0, then there is no way that it could give both f(x_0) and g(x_0). Thus, in general, we cannot expect the Fourier series for f to converge pointwise to f.

Problem 1.94. Let h(x) = 0 for −π ≤ x ≤ 0 and h(x) = 1 for 0 < x ≤ +π. Compute the Fourier coefficients for h and s_N(h, x).

Problem 1.95. Let f(x) = |x|. Compute the Fourier coefficients for f andsN (f, x).

1.13 Orthonormal Sets of Functions

Many of the results that we shall need about trigonometric polynomials are true because they form an orthonormal set of functions. So we first prove the results in this more general setting.

Definition 1.96. A set I of Riemann integrable functions on [a,b] is called an orthogonal family provided that for any f, g ∈ I with f ≠ g,

∫_a^b f(x) g(x) dx = 0.

If, in addition, ∫_a^b f(x)^2 dx = 1 for every f ∈ I, then we call the set an orthonormal family.

By the formulas in the last section, we see that

I = {1, cos(nx), sin(nx) : n ∈ N}

is an orthogonal set of functions on [−π,+π] and that

I = {1/√(2π), cos(nx)/√π, sin(nx)/√π : n ∈ N}

is an orthonormal set of functions on [−π,+π]. We will see that there are some advantages to working with orthonormal sets of functions.

Definition 1.97. Let f be a Riemann integrable function on [a,b] and let I = {φ_α}_{α∈A}, where A is some index set, be an orthonormal set of functions. Then we call

c_α = ∫_a^b f(x) φ_α(x) dx

the (generalized) Fourier coefficients of f.

If f(x) = a_0 + Σ_{n=1}^{N} (a_n cos(nx) + b_n sin(nx)), then when we use the orthonormal set of functions I = {1/√(2π), cos(nx)/√π, sin(nx)/√π : n ∈ N}, we see that the generalized Fourier coefficients of f with respect to this set are â_0 = ∫_{−π}^{+π} f(x)·(1/√(2π)) dx = a_0√(2π), and similarly, â_n = a_n√π and b̂_n = b_n√π. Thus,

s_N(f; x) = a_0 + Σ_{n=1}^{N} (a_n cos(nx) + b_n sin(nx)) = â_0·(1/√(2π)) + Σ_{n=1}^{N} (â_n·(cos(nx)/√π) + b̂_n·(sin(nx)/√π)),

and we see that the partial sums of the Fourier series for f are exactly the sums that one obtains by summing the generalized Fourier coefficients with respect to this orthonormal family.

We will see that orthonormal sets make the problem of finding the best approximation to a function in something called the L^2-norm easy.

Definition 1.98. Let f be a Riemann integrable function on [a,b]. Then the L^2-norm of f is the number

‖f‖ = (∫_a^b f(x)^2 dx)^{1/2}.



Note that many functions have an L2-norm of 0. For example, any func-tion that is non-zero at only finitely many points. Exactly which functionshave L2-norm of 0 is examined more closely in courses on measure theory.

Theorem 1.99 (Theorem on Best Approximation). Let f be a Riemann integrable function on [a,b], let {φ_1, ..., φ_N} be a finite set of orthonormal functions on [a,b], let c_n = ∫_a^b f(x) φ_n(x) dx be the generalized Fourier coefficients of f, set s_N(x) = Σ_{n=1}^{N} c_n φ_n(x), let b_1, ..., b_N be arbitrary real numbers, and set t_N(x) = Σ_{n=1}^{N} b_n φ_n(x). Then

• ‖f − s_N‖ ≤ ‖f − t_N‖,
• ‖f − s_N‖ = ‖f − t_N‖ if and only if c_j = b_j for all 1 ≤ j ≤ N.

In summary, this theorem says that if one takes the vector space spanned by the functions {φ_1, ..., φ_N}, then among all functions in that vector space, the function given by using the generalized Fourier coefficients is the function that is closest to f in the L^2-norm, and it is the unique function in that subspace that is closest to f. Applying this theorem to the Fourier series, we see that s_N(f; x) is the unique function in the (2N + 1)-dimensional vector space spanned by {1, cos(nx), sin(nx) : 1 ≤ n ≤ N} that is closest to f in the L^2-norm.

Before proving the theorem, we prove some useful formulas.

Lemma 1.100. Let {φ_1, ..., φ_N} be an orthonormal set of functions on [a,b], and let t_N = Σ_{n=1}^{N} b_n φ_n. Then

‖t_N‖^2 = Σ_{n=1}^{N} b_n^2.

Proof. We have that

‖t_N‖^2 = ∫_a^b t_N(x)^2 dx = Σ_{i,j=1}^{N} ∫_a^b b_i b_j φ_i(x) φ_j(x) dx = Σ_{j=1}^{N} b_j^2.

We are now ready to prove the theorem.



Proof. For any real numbers b_1, ..., b_N we have that

‖f − t_N‖^2 = ∫_a^b (f(x)^2 − 2 f(x) t_N(x) + t_N(x)^2) dx
= ‖f‖^2 − 2 Σ_{n=1}^{N} ∫_a^b f(x) b_n φ_n(x) dx + Σ_{n=1}^{N} b_n^2
= ‖f‖^2 − 2 Σ_{n=1}^{N} c_n b_n + Σ_{n=1}^{N} b_n^2
= ‖f‖^2 − Σ_{n=1}^{N} c_n^2 + Σ_{n=1}^{N} (c_n^2 − 2 c_n b_n + b_n^2)
= ‖f‖^2 − Σ_{n=1}^{N} c_n^2 + Σ_{n=1}^{N} (c_n − b_n)^2.

Thus, when b_n = c_n for all n,

‖f − s_N‖^2 = ‖f‖^2 − Σ_{n=1}^{N} c_n^2,

and for any t_N ≠ s_N,

‖f − t_N‖^2 = ‖f − s_N‖^2 + Σ_{n=1}^{N} (c_n − b_n)^2 > ‖f − s_N‖^2.
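A numerical illustration of the theorem (our own sketch; the orthonormal pair φ_1 = cos(x)/√π, φ_2 = sin(x)/√π and the test function are arbitrary choices): perturbing the generalized Fourier coefficients can only increase the L^2 distance.

```python
import math, random

random.seed(0)
phis = [lambda x: math.cos(x) / math.sqrt(math.pi),
        lambda x: math.sin(x) / math.sqrt(math.pi)]
f = lambda x: x * x + x  # an arbitrary Riemann integrable test function

def integrate(g, steps=4000):
    # Composite midpoint rule on [-pi, pi].
    h = 2 * math.pi / steps
    return sum(g(-math.pi + (k + 0.5) * h) for k in range(steps)) * h

def l2_dist(g):
    return math.sqrt(integrate(lambda x: g(x) ** 2))

# Generalized Fourier coefficients and the corresponding best approximant s_N.
c = [integrate(lambda x, p=p: f(x) * p(x)) for p in phis]
sN = lambda x: sum(cn * p(x) for cn, p in zip(c, phis))
best = l2_dist(lambda x: f(x) - sN(x))

# Any other coefficient choice b gives a t_N that is strictly farther away.
for _ in range(5):
    b = [cn + random.uniform(-1, 1) for cn in c]
    tN = lambda x, b=b: sum(bn * p(x) for bn, p in zip(b, phis))
    assert best <= l2_dist(lambda x: f(x) - tN(x)) + 1e-9
```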

This theorem has two important consequences.

Corollary 1.101 (Bessel's Inequality). Let {φ_n : n ∈ N} be an orthonormal set of functions on [a,b], let f be Riemann integrable on [a,b], and let c_n = ∫_a^b f(x) φ_n(x) dx be the generalized Fourier coefficients of f. Then

Σ_{n=1}^{+∞} |c_n|^2 ≤ ‖f‖^2.

In particular, the series Σ_{n=1}^{+∞} |c_n|^2 of squares of Fourier coefficients is always summable for every Riemann integrable function.

Proof. In the proof of the above theorem, we saw that

Σ_{n=1}^{N} c_n^2 = ‖f‖^2 − ‖f − s_N‖^2 ≤ ‖f‖^2.



Corollary 1.102 (Parseval's Equation). Let {φ_n : n ∈ N} be an orthonormal set of functions on [a,b], let f be Riemann integrable on [a,b], let c_n = ∫_a^b f(x) φ_n(x) dx be the generalized Fourier coefficients of f, and let s_N(x) = Σ_{n=1}^{N} c_n φ_n(x). Then lim_{N→+∞} ‖f − s_N‖ = 0 if and only if Σ_{n=1}^{+∞} c_n^2 = ‖f‖^2.

Proof. In the proof of the above theorem, we saw that

‖f − s_N‖^2 = ‖f‖^2 − Σ_{n=1}^{N} c_n^2.

Thus, the left-hand side has limit zero if and only if the right-hand side has limit zero.

The statement ‖f‖^2 = Σ_{n=1}^{+∞} c_n^2 is generally referred to as Parseval's identity, and the above corollary is often summarized as saying that Parseval's identity holds if and only if lim_{N→+∞} ‖f − s_N‖ = 0.

1.14 Fourier Series, Continued

We now wish to prove that Parseval's identity holds for the Fourier series. We will first need some preliminary theorems. These theorems are both important approximation results. The first tells us that, in the L^2-norm, Riemann integrable functions can be approximated by continuous functions. The second says that 2π-periodic continuous functions can be approximated uniformly by trigonometric polynomials.

Theorem 1.103. Let f be Riemann integrable on [a, b]. Then for every ε > 0 there exists a continuous function g such that

‖f − g‖_2 = (∫_a^b |f(t) − g(t)|^2 dt)^{1/2} < ε.

Proof. Let M = sup{|f(t)| : a ≤ t ≤ b} and choose a partition P = {a = x_0 < ... < x_n = b} such that

U(f; P) − L(f; P) = Σ_{i=1}^{n} (M_i − m_i)(x_i − x_{i−1}) < ε^2/(2M),

where as usual m_i = inf{f(t) : x_{i−1} ≤ t ≤ x_i} and M_i = sup{f(t) : x_{i−1} ≤ t ≤ x_i}.



Define a function g : [a, b] → R by setting

g(t) = [(x_i − t)/(x_i − x_{i−1})] f(x_{i−1}) + [(t − x_{i−1})/(x_i − x_{i−1})] f(x_i), for x_{i−1} ≤ t ≤ x_i,

so that g is linear on each subinterval [x_{i−1}, x_i] and g(x_i) = f(x_i) for each i. Since the formulas for g agree at the endpoints of each subinterval, g is continuous on [a, b].

Also, since on each subinterval g is linear and agrees with f at the endpoints, we have that m_i ≤ min{f(x_{i−1}), f(x_i)} ≤ g(t) ≤ max{f(x_{i−1}), f(x_i)} ≤ M_i for x_{i−1} ≤ t ≤ x_i. This implies that |f(t) − g(t)| ≤ M_i − m_i for x_{i−1} ≤ t ≤ x_i.

Hence,

‖f − g‖_2^2 = Σ_{i=1}^{n} ∫_{x_{i−1}}^{x_i} |f(t) − g(t)|^2 dt ≤ Σ_{i=1}^{n} ∫_{x_{i−1}}^{x_i} (M_i − m_i)^2 dt ≤ Σ_{i=1}^{n} ∫_{x_{i−1}}^{x_i} 2M(M_i − m_i) dt = 2M(U(f; P) − L(f; P)) < ε^2,

and the result follows.

Before our next result we need a lemma.

Lemma 1.104. Every function of the form cos^n(t), sin^m(t), and cos^k(t) sin^j(t), for n, m, k, j ∈ N, is a trigonometric polynomial.

Proof. Using the trigonometric identities cos(at)cos(bt) = 1/2[cos((a + b)t) + cos((a − b)t)], sin(at)sin(bt) = 1/2[cos((a − b)t) − cos((a + b)t)], and cos(at)sin(bt) = 1/2[sin((a + b)t) − sin((a − b)t)], we can iteratively reduce products of trig functions into sums of trig functions of lower degree.
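A quick numerical spot-check (ours) of the product-to-sum reductions that drive the induction:

```python
import math

# Each assertion is one instance of the identities in the proof (a = b = 1).
for t in (0.0, 0.4, 1.3, 2.9):
    # cos^2(t) = 1/2 + (1/2) cos(2t)
    assert abs(math.cos(t) ** 2 - (0.5 + 0.5 * math.cos(2 * t))) < 1e-12
    # sin^2(t) = 1/2 - (1/2) cos(2t)
    assert abs(math.sin(t) ** 2 - (0.5 - 0.5 * math.cos(2 * t))) < 1e-12
    # cos(t) sin(t) = (1/2) sin(2t)
    assert abs(math.cos(t) * math.sin(t) - 0.5 * math.sin(2 * t)) < 1e-12
```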

Theorem 1.105. Let f : [−π,+π] → R be continuous with f(−π) = f(+π), and let ε > 0. Then there is a trigonometric polynomial g such that |f(t) − g(t)| < ε for −π ≤ t ≤ +π.

Proof. Let T = {(x, y) : x^2 + y^2 = 1}. By Problem 1.91 there is a continuous function h : T → R such that f(t) = h(cos(t), sin(t)). Since the polynomials in x and y are an algebra of functions on T that separates points and vanishes at no point, and since T is compact, by the Stone-Weierstrass theorem there is a polynomial p(x, y) such that |h(x, y) − p(x, y)| < ε for every (x, y) ∈ T.



Let g(t) = p(cos(t), sin(t)). By the lemma, g is a trigonometric polynomial and

sup{|f(t) − g(t)| : −π ≤ t ≤ +π} ≤ sup{|h(x, y) − p(x, y)| : x^2 + y^2 = 1} < ε.

We will also need two important inequalities.

Proposition 1.106. Let f, g be Riemann integrable on [a, b]. Then:

1. (Cauchy-Schwarz inequality) |∫_a^b f(t) g(t) dt| ≤ ‖f‖_2 ‖g‖_2,
2. (Minkowski inequality) ‖f + g‖_2 ≤ ‖f‖_2 + ‖g‖_2.

Proof. First check that when a, c ≥ 0, the polynomial at^2 + bt + c is nonnegative for all t if and only if it has at most one real root; this is because its graph is a parabola that opens upwards. But it has at most one real root if and only if b^2 − 4ac ≤ 0.

Now, since

0 ≤ ∫_a^b (t f(x) + g(x))^2 dx = ‖f‖_2^2 t^2 + 2t ∫_a^b f(x) g(x) dx + ‖g‖_2^2,

we have that

4(∫_a^b f(x) g(x) dx)^2 ≤ 4 ‖f‖_2^2 ‖g‖_2^2,

and Schwarz's inequality follows by taking square roots.

To see Minkowski's inequality, we note that

‖f + g‖_2^2 = ∫_a^b (f(x) + g(x))^2 dx = ∫_a^b f(x)^2 dx + 2∫_a^b f(x) g(x) dx + ∫_a^b g(x)^2 dx ≤ ‖f‖_2^2 + 2‖f‖_2‖g‖_2 + ‖g‖_2^2 = (‖f‖_2 + ‖g‖_2)^2,

and the result follows by taking square roots.

We leave the proof of the following lemma to the exercises.

Lemma 1.107. Let g be a continuous function on [a, b] and let ε > 0 be given. Then there exists a continuous function g_1 on [a, b] such that g_1(a) = g_1(b) = 0 and ‖g − g_1‖_2 < ε.

Now for the main theorem on Fourier series.



Theorem 1.108. Let f be a Riemann integrable function on [−π,+π] and let s_N(f; x) = a_0 + Σ_{n=1}^{N} (a_n cos(nx) + b_n sin(nx)) denote the partial sums of its Fourier series. Then:

1. lim_{N→+∞} ‖f − s_N‖_2 = 0,
2. ‖f‖_2^2 = ∫_{−π}^{+π} f(x)^2 dx = π[2a_0^2 + Σ_{n=1}^{+∞} (a_n^2 + b_n^2)].

Proof. Recall that the functions {1/√(2π), cos(nx)/√π, sin(nx)/√π : n ≥ 1} form an orthonormal family, and by the Theorem on Best Approximation,

s_N(f; x) = (√(2π) a_0)·(1/√(2π)) + Σ_{n=1}^{N} [(√π a_n)·(cos(nx)/√π) + (√π b_n)·(sin(nx)/√π)] = â_0·(1/√(2π)) + Σ_{n=1}^{N} (â_n·(cos(nx)/√π) + b̂_n·(sin(nx)/√π))

is the unique function in the span of these functions that is closest to f.

From this fact we see that ‖f − s_N‖_2 ≥ ‖f − s_{N+1}‖_2 for all N, since as the vector space grows larger the distance to the nearest function must decrease.

Now we prove that the limit in (1) is zero. Given ε > 0, by Theorem 1.103 we may pick a continuous function g on [−π,+π] so that ‖f − g‖_2 < ε/3. By Lemma 1.107, we may pick a continuous function g_1 with g_1(−π) = g_1(+π) = 0 such that ‖g − g_1‖_2 < ε/3. Now by Theorem 1.105, we can find a trigonometric polynomial p such that |g_1(t) − p(t)| < ε/√(18π).

This last inequality implies that

‖g_1 − p‖_2^2 = ∫_{−π}^{+π} (g_1(t) − p(t))^2 dt < ∫_{−π}^{+π} ε^2/(18π) dt = ε^2/9,

so that ‖g_1 − p‖_2 < ε/3.

Now p is a trigonometric polynomial of some degree M, and by Minkowski we have that

‖f − p‖_2 = ‖(f − g) + (g − g_1) + (g_1 − p)‖_2 ≤ ‖f − g‖_2 + ‖g − g_1‖_2 + ‖g_1 − p‖_2 < ε.

By the Theorem on Best Approximation, ‖f − s_M‖_2 ≤ ‖f − p‖_2 < ε, and hence for any N ≥ M we have ‖f − s_N‖_2 < ε, and so (1) follows.

Now, since ‖f − s_N‖_2 → 0, by Parseval we have that

‖f‖_2^2 = â_0^2 + Σ_{n=1}^{+∞} (â_n^2 + b̂_n^2) = π(2a_0^2 + Σ_{n=1}^{+∞} (a_n^2 + b_n^2)),

using the identities relating the "hatted" coefficients to their hatless cousins.
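As an illustration (our own, not one of the assigned problems), take f(x) = x, whose Fourier coefficients are a_n = 0 and b_n = 2(−1)^{n+1}/n. Then item (2) reads 2π^3/3 = π Σ 4/n^2, i.e. Σ 1/n^2 = π^2/6, which we can confirm numerically:

```python
import math

# Parseval for f(x) = x on [-pi, pi]: ||f||_2^2 = integral of x^2 = 2*pi^3/3,
# and pi * sum (a_n^2 + b_n^2) = pi * sum 4/n^2.
norm_sq = 2 * math.pi ** 3 / 3
N = 200000
partial = math.pi * sum(4.0 / n**2 for n in range(1, N + 1))
# The tail of sum 1/n^2 beyond N is about 1/N, so the gap is about 4*pi/N.
assert abs(norm_sq - partial) < 5 * math.pi / N
```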

Problem 1.109. Prove Lemma 1.107.

Problem 1.110. Use the above theorem together with Problems 1.94 and 1.95 to give some infinite series for π.

Problem 1.111. Prove that the set of trigonometric polynomials is an algebra of functions on [−π,+π] that vanishes at no point. Also prove that if −π ≤ x_1 < x_2 < +π, then there exists a trigonometric polynomial p with p(x_1) ≠ p(x_2).

1.15 Equicontinuity and the Arzela-Ascoli Theorem

The Heine-Borel theorem tells us that every bounded sequence of numbers has a convergent subsequence and that every bounded sequence of vectors in R^k has a convergent subsequence. This theorem also gave us a characterization of compact sets in R^k, namely, a set in R^k is compact if and only if it is closed and bounded.

In this section we consider the corresponding problem for sets of continuous real-valued functions whose domain is a compact set. The result is, formally, very similar to the Heine-Borel result with one extra ingredient, the concept of equicontinuity.

Definition 1.112. Let (X, d) and (Y, ρ) be metric spaces. A set F of functions from X to Y is called equicontinuous provided that for every ε > 0, there is δ > 0 so that for every f ∈ F and every x1, x2 ∈ X with d(x1, x2) < δ, we have ρ(f(x1), f(x2)) < ε.

Note that the definition implies that every function in F is uniformly continuous and that the same values of ε and δ work for every function in F.

Here is an example of an equicontinuous family. Let X = Y = R, fix a number M > 0 and let F denote the set of all differentiable functions on R with the property that |f′(x)| ≤ M for all x. Given any ε > 0, if we set δ = ε/M, then when |x1 − x2| < δ, by the Mean Value Theorem we will have that |f(x1) − f(x2)| = |f′(c)||x1 − x2| ≤ M|x1 − x2| < Mδ = ε.
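The Mean Value Theorem argument above is easy to test numerically. Below is a minimal sketch (the family fn(x) = sin(nx)/n and all helper names are our own illustration, not from the notes): every member has derivative cos(nx), bounded by M = 1, so the single choice δ = ε/M works for the whole family at once.

```python
import math

# Family f_n(x) = sin(n*x)/n; |f_n'(x)| = |cos(n*x)| <= 1 for every n,
# so the MVT argument gives the SAME delta = eps/M for all members.
M = 1.0
eps = 0.01
delta = eps / M

def f(n, x):
    return math.sin(n * x) / n

# Sample many members of the family and many pairs of points closer than delta.
worst = max(
    abs(f(n, x) - f(n, x + 0.9 * delta))
    for n in range(1, 200)
    for x in [k * 0.37 for k in range(-50, 50)]
)
assert worst < eps  # one delta works uniformly over the whole family
```

The assertion holds because |sin(nx)/n − sin(nx′)/n| ≤ |x − x′| for every n, exactly the estimate in the Mean Value Theorem computation above.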

Definition 1.113. Let E be a set and let F be a set of real-valued functions on E. We say that F is pointwise bounded provided that for each x ∈ E, there is a constant Mx such that for every f ∈ F, |f(x)| ≤ Mx.


The following result uses the idea of Cantor's diagonalization process that we encountered last semester.

Proposition 1.114. Let E = {xi}_{i∈N} be a countable set and let {fn} be a sequence of real-valued functions on E that is pointwise bounded. Then there is a subsequence {fnk} such that for every x ∈ E, the sequence {fnk(x)} converges.

Proof. Recall that a subsequence of a convergent sequence still converges to the same point. Since {fn} is pointwise bounded, the sequence {fn(x1)} of real numbers is bounded, and so we may choose a subsequence so that {fnk(x1)} is convergent (as k → +∞). Now because the sequence (indexed by k) {fnk(x2)} is bounded, we may choose a subsubsequence (indexed by, say, j) so that {fnkj(x2)} converges as j → +∞. This subsubsequence then converges at both of the points x1 and x2.

Because the language of "subsubsequences" is a bit confusing, it is best to remember that a subsequence is an infinite subset S1 ⊆ N and that we are listing the numbers S1 = {n1 < n2 < ...} in their natural order. So when we speak about a subsubsequence we are just taking an infinite subset S2 ⊆ S1.

Thus, we see that we can define infinite sets N ⊇ S1 ⊇ S2 ⊇ ... so that the set SJ = {n1 < n2 < ...} has the property that {fnk(xj)} converges for j = 1, ..., J. Now if the intersection ∩j Sj of all these sets were non-empty and infinite, then it would define a subsequence with the property that {fnk(xj)} converged for every j. The problem is that the intersection could be empty or finite.

However, using Cantor's diagonalization idea, we define a subsequence by keeping the k-th number in Sk. Note that this is a strictly increasing sequence of numbers. The sequence of numbers that we define in this manner then has the property that {n1, n2, ...} ⊆ S1, {n2, n3, ...} ⊆ S2, and in general {nk, nk+1, ...} ⊆ Sk for all k. Because the behavior of the first few terms of a sequence doesn't affect its convergence, we see that this subsequence has the property that {fnk(xj)} converges as k → +∞ for every j.
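The diagonal construction can be made concrete. The following sketch (the nested sets Sj of multiples of 2^j are our own toy example, not from the notes) builds the diagonal sequence n_k = (k-th element of Sk) and checks the two properties used in the proof: the indices are strictly increasing, and every tail {n_k, n_{k+1}, ...} lies in Sk.

```python
# Nested index sets S_1 ⊇ S_2 ⊇ ...; here S_j = multiples of 2**j.
def S(j):
    """Generate the j-th nested index set in increasing order."""
    k = 1
    while True:
        yield k * 2 ** j
        k += 1

def kth_element(gen, k):
    """Return the k-th value produced by a generator (1-indexed)."""
    for _ in range(k - 1):
        next(gen)
    return next(gen)

# Diagonal sequence: n_k = k-th element of S_k.
diag = [kth_element(S(k), k) for k in range(1, 8)]

assert all(a < b for a, b in zip(diag, diag[1:]))  # strictly increasing
# Every tail {n_j : j >= k} lies in S_k (each n_j is a multiple of 2**k).
assert all(diag[j - 1] % 2 ** k == 0 for k in range(1, 8) for j in range(k, 8))
```

Any other nested family of infinite sets would do; the point is only that the k-th element of Sk eventually lands (and stays) inside every fixed Sj.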

We can now state the main theorem of this section.

Theorem 1.115 (Arzela-Ascoli). Let (K, d) be a compact metric space and let fn : K → R, n ∈ N, be a sequence of continuous real-valued functions on K that is equicontinuous and pointwise bounded. Then there is a subsequence {fnk} and a continuous function f : K → R such that fnk → f uniformly.


Proof. Recall that every compact metric space has a countable dense set. Let E = {xj} be a countable dense set in K.

By the previous proposition, we may choose a subsequence {fnk} such that limk fnk(xj) exists for every j.

We claim that for every x ∈ K, the sequence {fnk(x)} is Cauchy. To see this, given ε > 0, by equicontinuity we may pick δ > 0 so that d(x, y) < δ implies, for every n, that |fn(x) − fn(y)| < ε/3. Now pick j so that d(x, xj) < δ and pick I so that i, k > I implies that |fnk(xj) − fni(xj)| < ε/3. Then for i, k > I we have that

|fnk(x) − fni(x)| ≤ |fnk(x) − fnk(xj)| + |fnk(xj) − fni(xj)| + |fni(xj) − fni(x)| < ε.

Now since this sequence is Cauchy and R is complete, we may define a function f : K → R by setting f(x) = limk fnk(x).

We now claim that {fnk} converges uniformly to f. To see this, given ε > 0, let δ be as before. Using the fact that K is compact, we may choose {x1, ..., xp} so that K = ∪_{j=1}^p B(xj; δ).

Now for each j = 1, ..., p we may pick Ij so that i, k > Ij implies that |fnk(xj) − fni(xj)| < ε/3. Let I = max{I1, ..., Ip}.

Then for k > I and any x ∈ K, we have that x ∈ B(xj; δ) for some value of j, 1 ≤ j ≤ p, and hence,

|f(x) − fnk(x)| = lim_i |fni(x) − fnk(x)| ≤ lim sup_i (|fni(x) − fni(xj)| + |fni(xj) − fnk(xj)| + |fnk(xj) − fnk(x)|) ≤ ε/3 + ε/3 + ε/3 = ε.

Corollary 1.116 (Arzela-Ascoli). Let (K, d) be a compact metric space and let C(K) denote the set of continuous functions from K to R, endowed with the uniform metric γ(f, g) = sup{|f(x) − g(x)| : x ∈ K}. Then F ⊆ C(K) is a compact subset of (C(K), γ) if and only if F is closed, equicontinuous and pointwise bounded.

Proof. Recall that a sequence converges in the metric γ if and only if it converges uniformly. If F is closed, equicontinuous and pointwise bounded and {fn} is a sequence in F, then by the Arzela-Ascoli theorem there is a subsequence {fnk} that converges uniformly, and hence in the metric γ, to a continuous function f. Since F is closed, f ∈ F. Thus, F is sequentially compact, which we have shown is the same as compact.

Conversely, assume that F is compact; then F is closed. Given ε > 0, since F is compact we may choose finitely many functions in F such that


F ⊆ ∪_{n=1}^N B(fn, ε/3). Since K is compact, each function fn is uniformly continuous and so there exists δn > 0 so that d(x, y) < δn implies that |fn(x) − fn(y)| < ε/3. Let δ = min{δ1, ..., δN} > 0.

Given any f ∈ F there is a j so that γ(f, fj) < ε/3. Hence, if d(x, y) < δ, then

|f(x)− f(y)| ≤ |f(x)− fj(x)|+ |fj(x)− fj(y)|+ |fj(y)− f(y)| < ε.

This proves that F is equicontinuous.

To see that F is pointwise bounded, let {f1, ..., fN} be chosen as above for the value ε = 1. Let Mj = sup{|fj(x)| : x ∈ K}, which is finite since K is compact, and let M = max{M1, ..., MN}. Given any f ∈ F there is a j so that γ(f, fj) < 1 and hence |f(x)| ≤ |f(x) − fj(x)| + |fj(x)| ≤ 1 + Mj ≤ M + 1. Thus, the functions in F are pointwise bounded, and even uniformly bounded.

Problem 1.117. Let F be a set of continuous functions from R to R that is equicontinuous. Prove that if sup{|f(0)| : f ∈ F} < +∞, then F is pointwise bounded.

Problem 1.118. Give an example of a metric space (X, d) and an equicontinuous set of continuous real-valued functions on X that is bounded at one point but not pointwise bounded.


Chapter 2

Multivariable Differential Calculus

We shall always endow Rn with the Euclidean metric. Given a set S ⊆ Rn, a function ~f : S → Rm has the form ~f(~x) = (f1(~x), ..., fm(~x)), where we call the functions fj : S → R the component functions. In this chapter we study the theory of differentiability of such functions. We begin with directional derivatives.

Definition 2.1. Let S ⊆ Rn, let ~x0 ∈ S be an interior point, let ~f : S → Rm and let ~u = (u1, ..., un) ∈ Rn. We say that ~f has a directional derivative in the direction ~u at ~x0 provided that the following vector-valued limit exists,

lim_{t→0} (~f(~x0 + t~u) − ~f(~x0))/t.

When this limit exists, we denote it by ~f′~u(~x0).

Many books make the requirement that ~u be of unit length. We do not make that requirement here.

Note that when ~u = ~0, then the numerator in the above limit is always ~0, hence the limit always exists and

~f′~0(~x0) = ~0.

For this reason we will generally be interested in the case ~u ≠ ~0.

Note that when ~f : S → Rm, then, provided that it exists, ~f′~u(~x0) ∈ Rm. Also, recall that the limit of a vector-valued function exists if and only if the limit of each component function exists. Thus, we see that ~f′~u(~x0) exists if and only if for each j, 1 ≤ j ≤ m, the real limit

bj = lim_{t→0} (fj(~x0 + t~u) − fj(~x0))/t

exists, and in this case ~f′~u(~x0) = (b1, ..., bm).

Let m = 1, so that S ⊆ Rn and f : S → R, and let ~x0 = (a1, ..., an). For i, 1 ≤ i ≤ n, we let ~ei denote the "standard" basis vector, that is, the vector that is 1 in the i-th entry and 0 in all other entries. In this case we have that

f′~ei(~x0) = lim_{t→0} (f(~x0 + t~ei) − f(~x0))/t = lim_{t→0} (f(a1, ..., ai−1, ai + t, ai+1, ..., an) − f(a1, ..., an))/t = ∂f/∂xi(~x0),

the partial derivative of f in the i-th direction. For this reason, when m ≥ 1 and ~f : S → Rm, we will often use the notation

~f′~ei(~x0) = ∂~f/∂xi(~x0),

and we still call this vector in Rm the partial derivative of ~f in the i-th direction.

Some examples are in order.

Example 2.2. Let f : R2 → R be defined by

f(x1, x2) = x1²x2/(x1⁴ + x2²) for (x1, x2) ≠ (0, 0), and f(0, 0) = 0.

Given ~u = (u1, u2) ≠ (0, 0), we have that

f′~u(0, 0) = lim_{t→0} (f(tu1, tu2) − f(0, 0))/t = lim_{t→0} (t³u1²u2/(t⁴u1⁴ + t²u2²) − 0)/t = lim_{t→0} u1²u2/(t²u1⁴ + u2²),

which equals 0 when u2 = 0 and equals u1²/u2 when u2 ≠ 0.

Thus, we see that the directional derivative of f exists at (0, 0) in every direction ~u.


However, we will now show that f is not even continuous at (0, 0). To see that f is not continuous, consider

lim_{t→0} f(t, t²) = lim_{t→0} t⁴/(t⁴ + t⁴) = 1/2 ≠ f(0, 0),

even though lim_{t→0} (t, t²) = (0, 0). Hence, f is not continuous at (0, 0).

Thus, unlike the one-variable case, where existence of the derivative guarantees continuity, existence of the directional derivatives at a point in every direction does not even guarantee that the function is continuous.
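A quick numerical check of Example 2.2 makes the phenomenon vivid (the sampling choices below are ours, not from the notes): the difference quotients in the direction ~u = (1, 2) settle down to u1²/u2 = 1/2, while along the curve (t, t²) the function values stay at 1/2, far from f(0, 0) = 0.

```python
# The function of Example 2.2.
def f(x1, x2):
    if (x1, x2) == (0, 0):
        return 0.0
    return x1 ** 2 * x2 / (x1 ** 4 + x2 ** 2)

# Directional difference quotients at (0,0) in direction u = (1, 2):
u1, u2 = 1.0, 2.0
q = [f(t * u1, t * u2) / t for t in (1e-2, 1e-4, 1e-6)]
assert abs(q[-1] - u1 ** 2 / u2) < 1e-6  # quotients approach u1**2/u2 = 0.5

# ...yet along the curve (t, t**2) the values do not approach f(0,0) = 0:
vals = [f(t, t ** 2) for t in (1e-2, 1e-4, 1e-6)]
assert all(abs(v - 0.5) < 1e-9 for v in vals)
```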

The next example shows that it is possible for a function to be continuous at (0, 0) and have both partial derivatives exist, yet the directional derivatives in other directions need not exist.

Example 2.3. Let f : R2 → R be defined by

f(x1, x2) = x1 − x2 when x1 ≥ 0 or x2 ≥ 0, and f(x1, x2) = 0 when x1 < 0 and x2 < 0.

It is easily checked that this function is continuous at (0, 0) and that ∂f/∂x1(0, 0) = 1 and ∂f/∂x2(0, 0) = −1.

If ~u = (a, b) lies in the 2nd or 4th quadrant, i.e., ab < 0, then

f′~u(0, 0) = lim_{t→0} f(ta, tb)/t = lim_{t→0} t(a − b)/t = a − b.

But if ~u = (a, b) with a > 0 and b > 0, then

f′~u(0, 0) = lim_{t→0} f(ta, tb)/t.

When t → 0−, the numerator is 0, and so this one-sided limit exists and is equal to 0. But

lim_{t→0+} f(ta, tb)/t = a − b,

and so the two one-sided limits are equal if and only if a = b. Thus, for a > 0, b > 0, f′~u(0, 0) exists only when a = b.

Thus, we see that the existence of both partial derivatives does not guarantee the existence of directional derivatives in all directions. In fact, we see that the set of directions for which directional derivatives exist may be a very surprising set.
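A similar numerical sketch for Example 2.3 (the direction (2, 1) is our choice, not from the notes) shows the two one-sided difference quotients disagreeing when a ≠ b, while both partial derivatives at (0, 0) exist.

```python
# The function of Example 2.3.
def f(x1, x2):
    if x1 >= 0 or x2 >= 0:
        return x1 - x2
    return 0.0

a, b = 2.0, 1.0  # a > 0, b > 0, a != b
t = 1e-8
right = f(t * a, t * b) / t        # t -> 0+ : equals a - b
left = f(-t * a, -t * b) / (-t)    # t -> 0- : numerator is 0
assert abs(right - (a - b)) < 1e-9
assert left == 0.0                  # one-sided limits disagree, so f'_u(0,0) fails

# The partial derivatives at (0,0) do exist:
assert abs(f(t, 0) / t - 1.0) < 1e-9      # df/dx1(0,0) = 1
assert abs(f(0, t) / t - (-1.0)) < 1e-9   # df/dx2(0,0) = -1
```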


2.1 The Total Derivative

To avoid all the problems associated with just partial derivatives, we use the total derivative, which is actually a linear map. This is motivated by the tangent line approximation in one variable.

Definition 2.4. Let E ⊆ Rn, let ~x0 be an interior point of E and let ~f : E → Rm. We say that ~f is differentiable at ~x0 with total derivative L, where L : Rn → Rm is a linear map, provided that

lim_{~h→~0} ‖~f(~x0 + ~h) − ~f(~x0) − L(~h)‖2 / ‖~h‖2 = 0.

When the total derivative exists, we set ~f′(~x0) = L.

We will denote the vector appearing in the numerator of the above limit by Ef(~h), that is,

Ef(~h) = ~f(~x0 + ~h) − ~f(~x0) − L(~h).

In this notation the condition for differentiability can be written as

lim_{~h→~0} ‖Ef(~h)‖2/‖~h‖2 = 0.

We shall often use the following reformulation.

Proposition 2.5. Let E ⊆ Rn, let ~x0 be an interior point of E, and let ~f : E → Rm. Then ~f is differentiable at ~x0 if and only if for every ε > 0 there exists r > 0 so that ‖~h‖2 < r implies that ‖Ef(~h)‖2 ≤ ε‖~h‖2.

Note that in the reformulation we have moved the ‖~h‖2 from the denominator to the right-hand side of the inequality and we are now allowing ~h = ~0.

Proof. Requiring the limit to be 0 is equivalent to requiring that for every ε > 0 there is an r > 0 so that when ‖~h‖2 < r and ~h ≠ ~0, we have

‖Ef(~h)‖2/‖~h‖2 < ε, i.e., ‖Ef(~h)‖2 < ε‖~h‖2.


Hence, ‖Ef(~h)‖2 ≤ ε‖~h‖2 for ‖~h‖2 < r and ~h ≠ ~0. Now that ‖~h‖2 is no longer in the denominator, we can allow ~h = ~0. When ~h = ~0, both sides of the above inequality are 0, so the inequality holds there as well.

Conversely, if we only have the condition with the less-than-or-equal-to, then given ε > 0 we may pick an r > 0 so that for ‖~h‖2 < r we have that ‖Ef(~h)‖2 ≤ (ε/2)‖~h‖2. Hence, for every ‖~h‖2 < r with ~h ≠ ~0, we have that

‖Ef(~h)‖2/‖~h‖2 ≤ ε/2 < ε.

Notice that the condition that ~f be differentiable at ~x0 says that not only does Ef(~h) tend to ~0 as ~h tends to ~0, but it does so rapidly enough that it still tends to ~0 when divided by ‖~h‖2.

Recall that every linear map L : Rn → Rm is given by multiplication by an m × n matrix A = (ai,j). Because matrix multiplication is best visualized by considering vectors as column vectors, and column vectors are hard to include in text, we will write a vector ~h ∈ Rn as ~h = (h1, ..., hn)^t, where the superscript t denotes transpose. Thus,

L(~h) = A · ~h = (Σ_{j=1}^n a1,j hj, ..., Σ_{j=1}^n am,j hj)^t.

We will often write L = LA when we want to emphasize that L is the linear map arising as multiplication by the matrix A. Other times we will simply identify the linear map with the matrix and write L = A.

We now look at implications of a function being differentiable.

Theorem 2.6. Let E ⊆ Rn, let ~x0 be an interior point of E and let ~f : E → Rm. If ~f is differentiable at ~x0 with ~f′(~x0) = L, then the directional derivatives also exist in all directions and

~f′~u(~x0) = L(~u).

Proof. The result is obvious when ~u = ~0, so we only consider ~u ≠ ~0. We have

~f′~u(~x0) = lim_{t→0} (~f(~x0 + t~u) − ~f(~x0))/t
= lim_{t→0} [(~f(~x0 + t~u) − ~f(~x0) − L(t~u))/t + L(t~u)/t]
= L(~u) + lim_{t→0} (~f(~x0 + t~u) − ~f(~x0) − L(t~u))/t
= L(~u) + lim_{t→0} Ef(t~u)/t.


When ~u ≠ ~0, then

‖Ef(t~u)/t‖2 = (‖Ef(t~u)‖2/‖t~u‖2) ‖~u‖2 → 0

by the differentiability condition. Thus, Ef(t~u)/t → ~0 as t → 0 and the result follows.

Theorem 2.7. Let E ⊆ Rn, let ~x0 be an interior point of E, let ~f : E → Rm and let fi : E → R, 1 ≤ i ≤ m, denote the components of ~f. If ~f is differentiable at ~x0, then ∂fi/∂xj(~x0) exists for all i and j and

~f′(~x0) = (∂fi/∂xj(~x0)).

Proof. Let ~ej denote the unit vector in the j-th direction. Then

(∂f1/∂xj(~x0), ..., ∂fm/∂xj(~x0))^t = ∂~f/∂xj(~x0) = ~f′~ej(~x0) = ~f′(~x0)(~ej).

Now note that if one multiplies a matrix by the vector ~ej, then the resulting vector is the j-th column of the matrix. The result follows.
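Theorem 2.7 suggests a practical check: approximate each entry ∂fi/∂xj by a finite difference and compare with the known Jacobian matrix. A small sketch, with a function of our own choosing (F, jacobian_fd and the test point are assumptions for illustration):

```python
import math

# F(x, y) = (x*y, sin(x)); its Jacobian at (x, y) is ((y, x), (cos(x), 0)).
def F(x, y):
    return (x * y, math.sin(x))

def jacobian_fd(F, x, y, h=1e-6):
    """Forward-difference approximation of the 2x2 matrix (df_i/dx_j)."""
    f0 = F(x, y)
    fx = F(x + h, y)
    fy = F(x, y + h)
    return [[(fx[i] - f0[i]) / h, (fy[i] - f0[i]) / h] for i in range(2)]

x0, y0 = 0.7, -1.3
J = jacobian_fd(F, x0, y0)
exact = [[y0, x0], [math.cos(x0), 0.0]]
assert all(abs(J[i][j] - exact[i][j]) < 1e-4 for i in range(2) for j in range(2))
```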

Some special cases of the above notation are useful. First we look at the case where the range is one dimensional, and then the case where the domain is one dimensional.

Let E ⊆ Rn, let ~x0 be an interior point of E, and let f : E → R be differentiable at ~x0. Then f′(~x0) is the 1 × n matrix

(∂f/∂x1(~x0), ..., ∂f/∂xn(~x0)).

In calculus, this (row) vector is usually denoted by ∇f. Thus, in our notation, for f : Rn → R we have that when f is differentiable,

f′ = ∇f.

We now look at the case when the domain is one dimensional. Let E ⊆ R, let x0 be an interior point of E, let ~f : E → Rm and let fi : E → R, 1 ≤ i ≤ m, denote the components of ~f. If ~f is differentiable at x0, then ~f′(x0) = (∂fi/∂x1(x0)) is an m × 1 matrix. Notice that since there is really only one variable, we usually omit the ∂ notation and use ordinary derivative signs, and this (column) vector, in calculus, is denoted by d~f/dx(x0). Thus, in this case,

~f′(x0) = (df1/dx(x0), ..., dfm/dx(x0))^t = d~f/dx(x0).

Proposition 2.8. Let E ⊆ Rn, let ~x0 be an interior point of E, let ~f : E → Rm and let fi : E → R, 1 ≤ i ≤ m, denote the components of ~f. Then ~f is differentiable at ~x0 if and only if fi is differentiable at ~x0 for all i = 1, ..., m.

Proof. Note that Ef(~h) is a vector whose components are Efi(~h), i = 1, ..., m. Thus, we have that |Efi(~h)| ≤ ‖Ef(~h)‖2 for i = 1, ..., m, while

‖Ef(~h)‖2² = |Ef1(~h)|² + · · · + |Efm(~h)|².

If ~f is differentiable at ~x0, then given ε > 0 we pick r > 0 so that ‖Ef(~h)‖2/‖~h‖2 < ε for ‖~h‖2 < r, ~h ≠ ~0, and then

|Efi(~h)|/‖~h‖2 < ε for ‖~h‖2 < r, ~h ≠ ~0,

so each fi is differentiable at ~x0.

Conversely, if each fi is differentiable at ~x0, then given ε > 0 we may pick ri > 0 so that |Efi(~h)|/‖~h‖2 < ε/√m for ‖~h‖2 < ri, ~h ≠ ~0. If we let r = min{r1, ..., rm}, then for ‖~h‖2 < r, ~h ≠ ~0, we have

‖Ef(~h)‖2/‖~h‖2 = √(|Ef1(~h)|² + · · · + |Efm(~h)|²)/‖~h‖2 = √((|Ef1(~h)|/‖~h‖2)² + · · · + (|Efm(~h)|/‖~h‖2)²) < √((ε/√m)² + · · · + (ε/√m)²) = ε.

Hence, ~f is differentiable at ~x0.

Problem 2.9. Let g, h : R → R. Prove that if g is differentiable at a (in the usual sense) and h is differentiable at b (in the usual sense), then f : R2 → R defined by f(x1, x2) = g(x1)h(x2) is differentiable at ~x0 = (a, b).

Problem 2.10. Use the above problem to give an example of a function f : R2 → R that is differentiable at (0, 0), such that ∂f/∂xj, j = 1, 2, both exist everywhere, but the partial derivatives are not continuous at (0, 0).


Problem 2.11. Let E ⊆ R, let x0 be an interior point of E, let ~f : E → Rm and let fi : E → R, 1 ≤ i ≤ m, denote the components of ~f. Prove that if dfi/dx exists at x0 for all 1 ≤ i ≤ m, then ~f satisfies the definition of differentiable at x0.

Problem 2.12. Let E ⊆ Rn, let ~x0 be an interior point of E, let ~f : E → Rm and let fi : E → R, 1 ≤ i ≤ m, denote the components of ~f. Fix a (row) vector ~a = (a1, ..., am) ∈ Rm and let g : E → R be the function g = a1f1 + · · · + amfm, i.e., g = ~a · ~f. Prove that if ~f is differentiable at ~x0, then g is differentiable at ~x0 and

g′(~x0) = ~a · ~f′(~x0),

where this last product is the product of a 1 × m matrix and an m × n matrix.

2.2 Differentiability and Continuity

We saw earlier that the existence of partial derivatives is not enough to imply continuity. Now we will prove that a differentiable function is continuous.

Definition 2.13. Let A = (ai,j) be an m × n matrix. Then we set

‖A‖2 = √(Σ_{i=1}^m Σ_{j=1}^n ai,j²).

For example, if we let In denote the n × n identity matrix, i.e., the matrix with 1's for its diagonal entries and 0's elsewhere, then ‖In‖2 = √n.

Proposition 2.14. Let A = (ai,j) be an m × n matrix and let ~h = (h1, ..., hn)^t ∈ Rn. Then

‖A(~h)‖2 ≤ ‖A‖2‖~h‖2.

Proof. Using the Schwarz inequality, we have that

‖A(~h)‖2² = ‖(Σ_{j=1}^n a1,j hj, ..., Σ_{j=1}^n am,j hj)‖2² = Σ_{i=1}^m [Σ_{j=1}^n ai,j hj]² ≤ Σ_{i=1}^m [(Σ_{j=1}^n ai,j²)(Σ_{j=1}^n hj²)] = Σ_{i=1}^m [(Σ_{j=1}^n ai,j²)‖~h‖2²] = ‖A‖2²‖~h‖2².
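Proposition 2.14 can be sanity-checked numerically. A minimal sketch (the sample matrix and helper function names are ours, not from the notes):

```python
import math

A = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]]  # a sample 2x3 matrix

def frobenius(A):
    """The norm ‖A‖2 of Definition 2.13 (square root of the sum of squares)."""
    return math.sqrt(sum(a * a for row in A for a in row))

def matvec(A, h):
    return [sum(a * x for a, x in zip(row, h)) for row in A]

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

# Check ‖A h‖2 <= ‖A‖2 ‖h‖2 for several sample vectors h.
for h in ([1.0, 2.0, 3.0], [-0.3, 0.0, 4.0], [1e-5, -1e-5, 2e-5]):
    assert norm2(matvec(A, h)) <= frobenius(A) * norm2(h) + 1e-12
```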


Theorem 2.15. Let E ⊆ Rn, let ~x0 be an interior point of E and let ~f : E → Rm. If ~f is differentiable at ~x0, then ~f is continuous at ~x0.

Proof. Let ~f′(~x0) = A. Since ~f is differentiable at ~x0, for ε1 = 1 there exists r1 > 0 such that ‖~h‖2 < r1, ~h ≠ ~0, implies that ‖Ef(~h)‖2/‖~h‖2 < 1. Hence, ‖Ef(~h)‖2 ≤ ‖~h‖2 for all ‖~h‖2 < r1.

Given ε > 0, set r2 = ε/(‖A‖2 + 1) and let r = min{r1, r2}. Then for ‖~h‖2 < r we have that

‖~f(~x0 + ~h) − ~f(~x0)‖2 = ‖A(~h) + Ef(~h)‖2 ≤ ‖A(~h)‖2 + ‖Ef(~h)‖2 ≤ ‖A‖2‖~h‖2 + ‖~h‖2 = (‖A‖2 + 1)‖~h‖2 < ε.

Hence, ~f is continuous at ~x0.

Since the function of Example 2.2 is not continuous at (0, 0), it is not differentiable at (0, 0). Thus, the function of Example 2.2 is a function that has directional derivatives in every direction but is not differentiable.

2.3 The Chain Rule

In calculus we often learn many multivariable versions of the chain rule, depending on the situation. But the most concise form involves composition of linear maps and matrix products.

Recall that given linear maps LB : Rn → Rm and LA : Rm → Rp, their composition is the linear map LA ◦ LB : Rn → Rp, and it is given by multiplication by the matrix that is the product of the matrices A and B, where the (i, j)-entry of the matrix product is given by

(AB)i,j = Σ_{k=1}^m ai,k bk,j.

In particular, LA ◦ LB = LAB.

Theorem 2.16. Let E ⊆ Rn, let ~x0 be an interior point of E, let ~g : E → Rm be differentiable at ~x0 with ~g′(~x0) = B, let Y ⊆ Rm with ~g(E) ⊆ Y, let ~y0 = ~g(~x0) be an interior point of Y and let ~f : Y → Rp be differentiable at ~y0 with ~f′(~y0) = A. Then the function ~f ◦ ~g : E → Rp is differentiable at ~x0 and

(~f ◦ ~g)′(~x0) = ~f′(~g(~x0)) ◦ ~g′(~x0) = AB.


Proof. Let ε > 0. Using the differentiability of ~g at ~x0, we may pick r1 > 0 so that ‖~h‖2 < r1 implies that ‖Eg(~h)‖2 ≤ ‖~h‖2. Using the differentiability of ~f at ~y0, we may choose r2 > 0 so that for ‖~k‖2 < r2 we have that ‖Ef(~k)‖2 ≤ ε/(2(‖B‖2 + 1)) ‖~k‖2. Finally, using the differentiability of ~g at ~x0 again, we may choose r3 > 0 so that for ‖~h‖2 < r3 we have that ‖Eg(~h)‖2 ≤ ε/(2(‖A‖2 + 1)) ‖~h‖2.

Let r = min{r1, r2/(‖B‖2 + 1), r3}. Then for ‖~h‖2 < r we have that

‖B(~h) + Eg(~h)‖2 ≤ ‖B‖2‖~h‖2 + ‖Eg(~h)‖2 ≤ (‖B‖2 + 1)‖~h‖2 < r2,

and hence,

‖~f ◦ ~g(~x0 + ~h) − ~f ◦ ~g(~x0) − AB(~h)‖2 = ‖~f(~g(~x0 + ~h)) − ~f(~g(~x0)) − AB(~h)‖2
= ‖~f(~g(~x0) + B(~h) + Eg(~h)) − ~f(~g(~x0)) − A(B(~h) + Eg(~h)) + A(Eg(~h))‖2
= ‖Ef(B(~h) + Eg(~h)) + A(Eg(~h))‖2
≤ ‖Ef(B(~h) + Eg(~h))‖2 + ‖A(Eg(~h))‖2
≤ ε/(2(‖B‖2 + 1)) ‖B(~h) + Eg(~h)‖2 + ‖A‖2‖Eg(~h)‖2
≤ ε/(2(‖B‖2 + 1)) (‖B‖2 + 1)‖~h‖2 + ‖A‖2 · ε/(2(‖A‖2 + 1)) ‖~h‖2 ≤ ε‖~h‖2.

This proves that the derivative exists and is equal to AB.

This proves that the derivative exists and is equal to AB.

We now look at a couple of applications of the Chain Rule. First notice that if we fix ~x0, ~u ∈ Rn and define ~g : R → Rn by ~g(t) = ~x0 + t~u, then ~g is differentiable by Problem 2.11 and

~g′(0) = d~g/dt(0) = ~u.

Hence, if E ⊆ Rn, ~x0 is an interior point of E and ~f : E → Rm is differentiable at ~x0, then by the Chain Rule

~f ◦ ~g(t) = ~f(~x0 + t~u)

is differentiable at 0, and

~f′~u(~x0) = (~f ◦ ~g)′(0) = ~f′(~x0)~u.

Thus, we see that Theorem 2.6 is also a consequence of the Chain Rule.
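The identity (~f ◦ ~g)′ = AB can also be checked numerically. A sketch with a one-dimensional domain (the choice of f and g is ours, not from the notes): here f ◦ g(t) = t³, and the finite-difference derivative should match the 1×2 times 2×1 matrix product.

```python
# g : R -> R^2, g(t) = (t, t**2); f : R^2 -> R, f(x, y) = x*y.
# Then (f∘g)(t) = t**3, A = f'(g(t0)) = (y0, x0), B = g'(t0) = (1, 2*t0)^t.
t0 = 0.9

def g(t):
    return (t, t * t)

def f(x, y):
    return x * y

h = 1e-6
fd = (f(*g(t0 + h)) - f(*g(t0))) / h       # finite-difference derivative of f∘g
x0, y0 = g(t0)
AB = y0 * 1.0 + x0 * (2 * t0)               # the matrix product A B (a 1x1 matrix)
assert abs(fd - AB) < 1e-4                  # chain rule: both are 3*t0**2 = 2.43
```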


2.4 Multivariable Mean Value Theorem

Recall that the one-variable Mean Value Theorem says that if f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), then there exists a < t0 < b such that f(b) − f(a) = f′(t0)(b − a). Now suppose that we have ~f : [a, b] → Rm, with component functions fi, that is differentiable on (a, b) and continuous on [a, b]. A direct extension of the mean value theorem would be for there to exist a < t0 < b so that ~f(b) − ~f(a) = ~f′(t0)(b − a). However, looking at each component, this would imply that for this one value t0 we had fi(b) − fi(a) = fi′(t0)(b − a) for all i = 1, ..., m. Thus, a single number t0 would be working, simultaneously, as the point for every one of the functions f1, ..., fm. This is just not true; in fact, it is quite easy to find differentiable functions f1, f2 : [0, 1] → R so that there are unique points 0 < t1, t2 < 1 with f1(1) − f1(0) = f1′(t1)(1 − 0) and f2(1) − f2(0) = f2′(t2)(1 − 0), but t1 ≠ t2.

Thus, the most obvious multivariable generalization of the mean value theorem is not true, even for just vector-valued functions. But there is a multivariable mean value theorem that is nearly as useful as the one-variable version.
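A concrete instance of this failure is easy to write down (the choice ~f(t) = (t², t³) on [0, 1] is ours, not from the notes): each component has its own mean value point, and the two points differ.

```python
import math

# f1(t) = t**2: f1(1) - f1(0) = 1 = 2*t1      =>  t1 = 1/2
# f2(t) = t**3: f2(1) - f2(0) = 1 = 3*t2**2   =>  t2 = 1/sqrt(3)
t1 = 0.5
t2 = 1 / math.sqrt(3)
assert abs(2 * t1 - 1) < 1e-12        # t1 is the MVT point for f1
assert abs(3 * t2 ** 2 - 1) < 1e-12   # t2 is the MVT point for f2
assert abs(t1 - t2) > 0.07            # but t1 != t2: no single point works
```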

Definition 2.17. A subset C ⊆ Rn is called convex if whenever ~x1, ~x2 ∈ C, then every point on the line segment that joins them,

~x1 + t(~x2 − ~x1), 0 ≤ t ≤ 1,

is also in C.

For some examples of convex sets, note that every ball B(~x0; r) is convex, and the set C = {(x1, ..., xn) ∈ Rn : xi > 0 for all i} is convex.

Theorem 2.18 (Multivariable Mean Value Theorem). Let C ⊆ Rn be an open convex set, let ~f : C → Rm and let fi : C → R, 1 ≤ i ≤ m, denote the components of ~f. If ~f is differentiable at each point of C, ~a = (a1, ..., am) ∈ Rm and ~x1, ~x2 ∈ C, then there exists ~z ∈ C such that

~a · (~f(~x2) − ~f(~x1)) = ~a · [~f′(~z)(~x2 − ~x1)].

Proof. Since the line segment ~x1 + t(~x2 − ~x1), 0 ≤ t ≤ 1, joining ~x1 to ~x2 stays inside C, we may define g : [0, 1] → R by

g(t) = a1 f1(~x1 + t(~x2 − ~x1)) + · · · + am fm(~x1 + t(~x2 − ~x1)).


By the chain rule, g is differentiable on (0, 1) and continuous on [0, 1], and so there is a point 0 < t0 < 1 with

~a · ~f(~x2) − ~a · ~f(~x1) = g(1) − g(0) = g′(t0) = a1 f1′(~x1 + t0(~x2 − ~x1)) · (~x2 − ~x1) + · · · + am fm′(~x1 + t0(~x2 − ~x1)) · (~x2 − ~x1) = ~a · [~f′(~x1 + t0(~x2 − ~x1))(~x2 − ~x1)] = ~a · [~f′(~z)(~x2 − ~x1)],

where ~z = ~x1 + t0(~x2 − ~x1) ∈ C.

For the next result we will need some sharper estimates.

Definition 2.19. Given an m × n matrix A, we define the norm of the matrix by

‖A‖ = sup{‖A~h‖2/‖~h‖2 : ~h ≠ ~0},

where the supremum is taken over all non-zero vectors ~h ∈ Rn.

Note that we have the inequality ‖A~h‖2 ≤ ‖A‖‖~h‖2, and another way to view the number ‖A‖ is that it is the smallest constant K satisfying ‖A~h‖2 ≤ K‖~h‖2. In fact, it is not hard to show that

‖A‖ = inf{K : ‖A~h‖2 ≤ K‖~h‖2 for all ~h}.

Since we have shown earlier that ‖A~h‖2 ≤ ‖A‖2‖~h‖2, we have that ‖A‖ ≤ ‖A‖2.

If we consider the n × n identity matrix In, then In~h = ~h, and so ‖In‖ = 1, but ‖In‖2 = √n.
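The gap between ‖A‖ and ‖A‖2 is already visible for the identity matrix, and ‖A‖ can be estimated by sampling unit vectors. A rough sketch (the sampling approach and helper names are ours; sampling only gives a lower estimate of the supremum):

```python
import math

I2 = [[1.0, 0.0], [0.0, 1.0]]  # the 2x2 identity matrix

def matvec(A, h):
    return [sum(a * x for a, x in zip(row, h)) for row in A]

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

def op_norm_estimate(A, samples=360):
    """Estimate ‖A‖ of Definition 2.19 by maximizing ‖A h‖2 over unit vectors h."""
    best = 0.0
    for k in range(samples):
        th = 2 * math.pi * k / samples
        h = [math.cos(th), math.sin(th)]  # a unit vector in R^2
        best = max(best, norm2(matvec(A, h)))
    return best

frob = math.sqrt(sum(a * a for row in I2 for a in row))  # ‖I_2‖2 = sqrt(2)
est = op_norm_estimate(I2)
assert abs(est - 1.0) < 1e-12  # ‖I_2‖ = 1
assert est <= frob              # consistent with ‖A‖ <= ‖A‖2
```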

Theorem 2.20. Let C ⊆ Rn be an open convex set and let ~f : C → Rm. If ~f is differentiable at every point of C and ‖~f′(~z)‖ ≤ M for every ~z ∈ C, then ‖~f(~x2) − ~f(~x1)‖2 ≤ M‖~x2 − ~x1‖2 for any ~x1, ~x2 ∈ C.

Proof. Let ~a = ~f(~x2) − ~f(~x1) and, as before, define g : [0, 1] → R by g(t) = ~a · ~f(~x1 + t(~x2 − ~x1)). As in the previous proof, there is 0 < t0 < 1 so that

‖~f(~x2) − ~f(~x1)‖2² = ~a · ~f(~x2) − ~a · ~f(~x1) = g(1) − g(0) = g′(t0) = ~a · [~f′(~x1 + t0(~x2 − ~x1))(~x2 − ~x1)] = ~a · [~f′(~z)(~x2 − ~x1)],

where ~z = ~x1 + t0(~x2 − ~x1).


Now applying the Schwarz inequality,

‖~f(~x2) − ~f(~x1)‖2² = ~a · [~f′(~z)(~x2 − ~x1)] ≤ ‖~a‖2 ‖~f′(~z)(~x2 − ~x1)‖2 ≤ ‖~a‖2 ‖~f′(~z)‖ ‖~x2 − ~x1‖2 ≤ M ‖~a‖2 ‖~x2 − ~x1‖2 = M ‖~f(~x2) − ~f(~x1)‖2 ‖~x2 − ~x1‖2,

and after cancelling ‖~f(~x2) − ~f(~x1)‖2 from both sides of the inequality, the result follows. (If ~f(~x2) = ~f(~x1), the inequality holds trivially.)

Theorem 2.21. Let U ⊆ Rn be an open connected set and let ~f : U → Rm. If ~f is differentiable at each point of U and ~f′(~x) = 0 for all ~x ∈ U, then ~f is constant.

Proof. Pick a point ~b ∈ U and let ~c = ~f(~b). We will prove that ~f(~x) = ~c for all ~x ∈ U. To this end, let V = {~x ∈ U : ~f(~x) = ~c}. We have that ~b ∈ V, so V is non-empty. We will prove that V is open and closed in the connected metric space (U, d2), and hence V = U and the theorem is proved.

It remains to verify the claim. Since ~f is differentiable it is continuous, and so V = ~f^{-1}({~c}) is closed in U. To see that V is open, if ~x0 ∈ V, then since ~x0 ∈ U there is an r > 0 so that B(~x0; r) ⊆ U. Now if ~x2 ∈ B(~x0; r), then by our last theorem, applied on the convex set B(~x0; r) with M = 0, we have ‖~f(~x2) − ~f(~x0)‖2 ≤ 0 · ‖~x2 − ~x0‖2. Thus, ~f(~x2) = ~f(~x0) = ~c, so B(~x0; r) ⊆ V.

2.5 Continuously Differentiable Functions

In this section we give a sufficient condition for a function to be differentiable.

Definition 2.22. Let U ⊆ Rn, let ~f : U → Rm and let fi, i = 1, ..., m, be the component functions. We say that ~f is C1 on U provided that the partial derivatives ∂fi/∂xj exist for 1 ≤ i ≤ m and 1 ≤ j ≤ n and are continuous on U.

Proposition 2.23. Let U ⊆ Rn be an open set and let f : U → R be C1 on U. Then f is differentiable on U.

Proof. For clarity of exposition we only do the case n = 3. Let ~x0 = (a, b, c) ∈ U. To prove that f is differentiable at ~x0 we must show that for ~h = (h1, h2, h3) we have

lim_{~h→~0} |f(~x0 + ~h) − f(~x0) − ∂f/∂x1(~x0)h1 − ∂f/∂x2(~x0)h2 − ∂f/∂x3(~x0)h3| / ‖~h‖2 = 0.


We regroup the numerator and then apply the one-variable mean value theorem three times to get

f(~x0 + ~h) − f(~x0) − ∂f/∂x1(~x0)h1 − ∂f/∂x2(~x0)h2 − ∂f/∂x3(~x0)h3
= f(a + h1, b + h2, c + h3) − f(a + h1, b + h2, c) − ∂f/∂x3(~x0)h3
+ f(a + h1, b + h2, c) − f(a + h1, b, c) − ∂f/∂x2(~x0)h2
+ f(a + h1, b, c) − f(a, b, c) − ∂f/∂x1(~x0)h1
= ∂f/∂x3(a + h1, b + h2, t3)h3 − ∂f/∂x3(a, b, c)h3
+ ∂f/∂x2(a + h1, t2, c)h2 − ∂f/∂x2(a, b, c)h2
+ ∂f/∂x1(t1, b, c)h1 − ∂f/∂x1(a, b, c)h1
= (∂f/∂x3(a + h1, b + h2, t3) − ∂f/∂x3(a, b, c))h3
+ (∂f/∂x2(a + h1, t2, c) − ∂f/∂x2(a, b, c))h2
+ (∂f/∂x1(t1, b, c) − ∂f/∂x1(a, b, c))h1,    (2.1)

where t3 is between c and c + h3, t2 is between b and b + h2, and t1 is between a and a + h1.

Thus, the distance from each of the points (t1, b, c), (a + h1, t2, c), and (a + h1, b + h2, t3) to (a, b, c) is less than √(h1² + h2² + h3²) = ‖~h‖2. By the continuity of the partial derivatives, given ε > 0 there is an r > 0 so that when ‖~h‖2 < r, the difference between each pair of partials in the parentheses above is at most ε1 = ε/√3.

Thus, if we apply the Schwarz inequality to (2.1), we get that the above quantity is less than

√(ε1² + ε1² + ε1²) √(h1² + h2² + h3²) = ε‖~h‖2.

Thus, for ‖~h‖2 < r, we have that the above fraction is at most ε.

Theorem 2.24. Let U ⊆ Rn be open and let ~f : U → Rm be C1 on U. Then ~f is differentiable on U.


Proof. By the above proposition each component function fi : U → R is differentiable. By Proposition 2.8, the fact that each component function is differentiable implies that ~f is differentiable.

Since ~f′ = (∂fi/∂xj) and each entry of this matrix is continuous, we see that ~f′ is continuous.

2.6 The Inverse Function Theorem

Loosely stated, the Inverse Function Theorem says that if U ⊆ Rn, ~f : U → Rn is C1, and the matrix ~f′(~x0) is invertible, then that is enough to guarantee that in a small neighborhood of ~x0 the function ~f is one-to-one, so that an inverse function ~g exists, and that inverse function will also be C1 on some neighborhood of ~y0 = ~f(~x0), with ~g′(~y0) = (~f′(~x0))⁻¹.

Before we can prove the inverse function theorem, we will need a few preliminary results about matrices and their norms.

Lemma 2.25. Let A be an m × n matrix and let B be an n × p matrix. Then ‖AB‖ ≤ ‖A‖‖B‖.

Proof. Given any ~h ∈ Rp, we have that

‖(AB)~h‖2 = ‖A(B~h)‖2 ≤ ‖A‖‖B~h‖2 ≤ ‖A‖‖B‖‖~h‖2,

and the result follows.

Lemma 2.26. If A is an n × n matrix that is invertible, then for any ~h ∈ Rn, we have

‖~h‖2/‖A⁻¹‖ ≤ ‖A~h‖2.

Proof. We have that

‖~h‖2 = ‖A−1(A~h)‖2 ≤ ‖A−1‖‖A~h‖2,

and the result follows by dividing this inequality by ‖A−1‖.

Lemma 2.27 (The Reverse Triangle Inequality). Let ~x, ~y ∈ Rn, then

|‖~x‖2 − ‖~y‖2| ≤ ‖~x+ ~y‖2.


Proof. By Minkowski’s inequality,

‖~x‖2 = ‖(~x+ ~y)− ~y‖2 ≤ ‖~x+ ~y‖2 + ‖~y‖2,

which implies that ‖~x‖2 − ‖~y‖2 ≤ ‖~x + ~y‖2. Reversing the roles of ~x and ~y yields that ‖~y‖2 − ‖~x‖2 ≤ ‖~x + ~y‖2. These two inequalities yield the result.

The following result uses norms of matrices (a metric concept) to guarantee invertibility (a linear algebra concept)! Roughly, it says that if a matrix is close enough to an invertible matrix, then it is also invertible.

Theorem 2.28. Let A and B be n × n matrices. If A is invertible and ‖A − B‖ < 1/‖A⁻¹‖, then B is invertible.

Proof. Suppose that B~x = ~0 with ~x ≠ ~0. Then we have that A~x = (A − B)~x, and hence, ~x = A⁻¹(A − B)~x.

Computing the norm of both sides, yields

‖~x‖2 = ‖A−1(A−B)~x‖2 ≤ ‖A−1‖‖A−B‖‖~x‖2 < ‖~x‖2.

Thus, ‖~x‖2 < ‖~x‖2, a contradiction. Hence, B~x = ~0 implies that ~x = ~0, and so B is one-to-one; from linear algebra, this means that B is also onto, and so invertible.
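The bound in Theorem 2.28 can be checked numerically. The sketch below is an added illustration, not part of the notes; it uses the Frobenius (entrywise) norm, which dominates the operator norm, so a perturbation of Frobenius norm below 1/‖A⁻¹‖ is still covered by the theorem. With A = diag(2, 3) we have ‖A⁻¹‖ = 1/2, so every perturbation of norm below 2 must leave the matrix invertible.

```python
import random

# A = diag(2, 3): the operator norm of A^{-1} is 1/2, so Theorem 2.28
# guarantees invertibility whenever ||A - B|| < 2.
A = [[2.0, 0.0], [0.0, 3.0]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

random.seed(0)
for _ in range(1000):
    # Random perturbation E scaled to Frobenius norm 1.9 < 2 = 1/||A^{-1}||.
    E = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    fro = sum(e * e for row in E for e in row) ** 0.5
    E = [[1.9 * e / fro for e in row] for row in E]
    B = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    # By Lemma 2.26, the smallest singular value of B is at least
    # 2 - 1.9 = 0.1, so |det B| >= 0.01; in particular B is invertible.
    assert abs(det2(B)) >= 0.0099
```

The asserted determinant bound is exactly the quantitative content of the proof: ‖B~x‖2 ≥ (‖A⁻¹‖⁻¹ − ‖A − B‖)‖~x‖2 stays bounded away from zero.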

Theorem 2.29 (Inverse Function Theorem). Let U ⊆ Rn be open, let ~x0 ∈ U, and let ~f : U → Rn be C1 on U. If ~f′(~x0) is invertible, then:

1. there exists r > 0 such that ~f is one-to-one on B(~x0; r),

2. ~f′(~x) is invertible for ~x ∈ B(~x0; r),

3. V = ~f(B(~x0; r)) is open,

4. if ~g = ~f⁻¹ denotes the inverse function, then ~g : V → B(~x0; r) is C1 on V and ~g′(~y) = (~f′(~g(~y)))⁻¹.

Proof. Let A = ~f′(~x0) and fix a number λ, 0 < λ < 1. Note that

‖~f′(~x) − A‖² ≤ ‖~f′(~x) − A‖2² = Σ_{i,j=1}^n ( ∂fi(~x)/∂xj − ∂fi(~x0)/∂xj )².


Thus, using the continuity of all the partial derivatives, we may choose r > 0 so that ‖~x − ~x0‖2 < r implies that

‖~f′(~x) − ~f′(~x0)‖ ≤ ‖~f′(~x) − A‖2 < λ/‖A⁻¹‖ < 1/‖A⁻¹‖.

Hence, ~f′(~x) is invertible by Theorem 2.28.

Let ~f1(~x) = ~f(~x) − A(~x − ~x0). Then ~f1′(~x) = ~f′(~x) − A, so for ~x ∈ B(~x0; r) we have that

‖~f1′(~x)‖ = ‖~f′(~x) − A‖ < λ/‖A⁻¹‖.

Using Theorem 2.20, we have that for ~x2, ~x1 ∈ B(~x0; r),

‖~f1(~x2) − ~f1(~x1)‖2 ≤ (λ/‖A⁻¹‖)‖~x2 − ~x1‖2.   (2.2)

Hence, for any ~x2, ~x1 ∈ B(~x0; r) we have

‖~f(~x2) − ~f(~x1)‖2 = ‖~f1(~x2) − ~f1(~x1) + A(~x2 − ~x1)‖2
 ≥ ‖A(~x2 − ~x1)‖2 − ‖~f1(~x2) − ~f1(~x1)‖2
 ≥ ‖~x2 − ~x1‖2/‖A⁻¹‖ − (λ/‖A⁻¹‖)‖~x2 − ~x1‖2
 ≥ ((1 − λ)/‖A⁻¹‖)‖~x2 − ~x1‖2.   (2.3)

Note that when ~x2 ≠ ~x1, the last term in this inequality is strictly positive, and so ‖~f(~x2) − ~f(~x1)‖2 ≠ 0, which implies that ~f(~x2) ≠ ~f(~x1). Thus, ~f is one-to-one on B(~x0; r) and we have proven (1).

We now prove that V = ~f(B(~x0; r)) is open. To this end pick ~y1 ∈ V, so that ~y1 = ~f(~x1) for some ~x1 ∈ B(~x0; r). Choose r1 > 0 sufficiently small that the closed ball

B⁻ = B(~x1; r1)⁻ ⊆ B(~x0; r).

We claim that if ‖~y − ~y1‖ < (1 − λ)r1/‖A⁻¹‖, then there is ~x ∈ B⁻ with ~f(~x) = ~y. This will prove that B(~y1; (1 − λ)r1/‖A⁻¹‖) ⊆ V and so V is open.

We now prove the claim. To this end we define a function φ : B⁻ → Rn by setting

φ(~x) = ~x + A⁻¹(~y − ~f(~x)).

We have that

φ′(~x) = In − A⁻¹ ~f′(~x) = A⁻¹(A − ~f′(~x)).


Hence,

‖φ′(~x)‖ ≤ ‖A⁻¹‖‖A − ~f′(~x)‖ ≤ λ,

for any ~x ∈ B(~x0; r), by the way that we chose r. Applying Theorem 2.20 again, we have that ‖φ(~x3) − φ(~x2)‖2 ≤ λ‖~x3 − ~x2‖2. In particular, φ is a contraction mapping.

We now wish to show that φ : B⁻ → B⁻. First note that

‖φ(~x1) − ~x1‖ = ‖A⁻¹(~y − ~f(~x1))‖ = ‖A⁻¹(~y − ~y1)‖ < (1 − λ)r1.

Thus, for any ~x ∈ B⁻, we have that

‖φ(~x) − ~x1‖ ≤ ‖φ(~x) − φ(~x1)‖2 + ‖φ(~x1) − ~x1‖2 ≤ λ‖~x − ~x1‖2 + (1 − λ)r1 ≤ λr1 + (1 − λ)r1 = r1,

and so φ(~x) ∈ B⁻.

Since B⁻ is closed and bounded, it is a compact set by the Heine-Borel theorem. Thus, φ is a contraction mapping of a compact set back into itself. By the Contraction Mapping Theorem, φ has a unique fixed point. Thus, there is a point ~x ∈ B⁻ with

~x = φ(~x) = ~x + A⁻¹(~y − ~f(~x)).

This implies that ~0 = A⁻¹(~y − ~f(~x)) and hence ~y = ~f(~x). This completes the proof of the claim, and hence V is open.

We now prove (4). Pick ~y ∈ V and ~y + ~k ∈ V. Then there exist ~x and ~x + ~h in B(~x0; r) with ~y = ~f(~x) and ~y + ~k = ~f(~x + ~h). By inequality (2.3), we have that

‖~k‖2 = ‖~f(~x + ~h) − ~f(~x)‖2 ≥ ((1 − λ)/‖A⁻¹‖)‖~h‖2.

Thus, if ~k → ~0, then ~h → ~0.

Let T = (~f′(~x))⁻¹. Then we have that

‖~g(~y + ~k) − ~g(~y) − T(~k)‖2 / ‖~k‖2 = ‖(~x + ~h) − ~x − T(~f(~x + ~h) − ~f(~x))‖2 / ‖~k‖2
 = ‖~h + T(~f(~x)) − T(~f(~x + ~h))‖2 / ‖~k‖2
 = ‖T(~f′(~x)(~h) + ~f(~x) − ~f(~x + ~h))‖2 / ‖~k‖2
 ≤ ‖T‖‖Ef(~h)‖2 / ‖~k‖2
 ≤ ‖T‖‖Ef(~h)‖2 / ((1 − λ)‖~h‖2/‖A⁻¹‖)
 = (‖A⁻¹‖‖T‖/(1 − λ)) · (‖Ef(~h)‖2/‖~h‖2).


As ~k → ~0, we have ~h → ~0, and so by the differentiability of ~f, as ~k → ~0 we also have that ‖Ef(~h)‖2/‖~h‖2 → 0. Thus, given ε > 0 we may choose ‖~k‖2 sufficiently small so that

‖Ef(~h)‖2/‖~h‖2 < (1 − λ)ε/(‖A⁻¹‖‖T‖).

This proves that ‖Eg(~k)‖2/‖~k‖2 → 0 as ~k → ~0, and so ~g is differentiable at ~y with

~g′(~y) = T = (~f′(~x))⁻¹ = (~f′(~g(~y)))⁻¹.

Finally, to see that ~g is C1 on V, we note that since ~g is differentiable at each point in V, it is continuous at each point in V.

Cramer's rule for the inverse of a matrix expresses each entry of the inverse as the quotient of a cofactor (up to sign, the determinant of a submatrix) by the determinant of the matrix. Since determinants are made up of sums and products of entries, if each entry of a matrix is a continuous function, then each of these determinants is a continuous function, and when the matrix is invertible the determinant in the denominator is non-zero.

Hence, we see by Cramer's rule that if we are given a matrix of continuous functions and we know that the matrix is invertible, then each entry of the inverse matrix is also a continuous function.

Thus, since ~g is continuous and ~f is C1, each entry of ~f′(~g(~y)) is a continuous function of ~y. By the Cramer's rule reasoning, each entry of ~g′(~y) = (~f′(~g(~y)))⁻¹ is also a continuous function of ~y. But the entries of ~g′(~y) are the partial derivatives ∂gi(~y)/∂yj. Thus, all the partials are continuous functions and so ~g is C1 on V. This completes the proof of the theorem.

Problem 2.30. Let ~f : R3 → R3 have component functions f1(x1, x2, x3) = 3x1 + 2x2 + x3, f2(x1, x2, x3) = 2x1x2 + x3², and f3(x1, x2, x3) = 4x1x2x3. Note that ~f(1, 2, 0) = (7, 4, 0). Use the inverse function theorem to compute a linear approximation to the vector ~x that solves ~f(~x) = (7.1, 4.2, 0.3).

As a quick summary, we will often say that the inverse function theorem says that if ~f(~x) = ~y with ~f C1 and ~f′(~x0) invertible, then locally we can solve for ~x as a function of ~y.
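A numerical sketch of the kind of computation Problem 2.30 asks for (an added illustration, not a worked solution from the notes): the linear approximation is ~x ≈ ~x0 + (~f′(~x0))⁻¹(~y − ~y0) with ~x0 = (1, 2, 0) and ~y − ~y0 = (0.1, 0.2, 0.3). The Jacobian rows below are the hand-computed gradients of f1, f2, f3, and the 3 × 3 linear system is solved by Gaussian elimination in plain Python.

```python
# Jacobian of f = (3x1 + 2x2 + x3, 2x1x2 + x3^2, 4x1x2x3) at (1, 2, 0);
# row i is the gradient of f_i evaluated there.
J = [[3.0, 2.0, 1.0],
     [4.0, 2.0, 0.0],   # (2x2, 2x1, 2x3) at (1, 2, 0)
     [0.0, 0.0, 8.0]]   # (4x2x3, 4x1x3, 4x1x2) at (1, 2, 0)
dy = [0.1, 0.2, 0.3]

def solve3(A, b):
    """Solve A h = b by Gaussian elimination with partial pivoting."""
    A = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    n = 3
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= m * A[c][k]
    h = [0.0] * n
    for r in reversed(range(n)):
        h[r] = (A[r][n] - sum(A[r][k] * h[k] for k in range(r + 1, n))) / A[r][r]
    return h

h = solve3(J, dy)
x_approx = [x0i + hi for x0i, hi in zip([1.0, 2.0, 0.0], h)]
print(x_approx)  # approximately [1.1375, 1.825, 0.0375]
```

Solving J~h = Δ~y rather than forming J⁻¹ explicitly is the standard design choice; both give the same linear approximation here.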


2.7 The Multi-Variable Newton and Quasi-Newton Methods

We start with the motivation, which is very much like the 1-variable case. Suppose that ~f : Rn → Rn and we wish to solve ~f(~x) = ~0. If we are given an approximate solution ~x1 and we wish to improve it to a “better” approximate solution ~x2, then we write

~f(~x2) = ~f(~x1 + (~x2 − ~x1)) ≈ ~f(~x1) + ~f′(~x1)(~x2 − ~x1).

Setting ~f(~x2) = ~0 and solving for ~x2 yields

~x2 = ~x1 − (~f′(~x1))⁻¹ ~f(~x1).

Thus, Newton's Method is to start with an initial guess for the solution ~x1 and form the sequence

~xn+1 = ~xn − (~f′(~xn))⁻¹ ~f(~xn).

Of course, to form this sequence we will need that the matrix ~f′(~xn) is invertible for all n.

Notice that at each step in this method, to compute ~xn+1 we need to compute the inverse of the matrix ~f′(~xn). This is time consuming.

The Quasi-Newton Method is simpler to implement because it uses ~f′(~x1) to approximate the remaining derivatives. Thus, this method only requires computing the inverse of a single matrix, namely, the inverse of the derivative at the initial point. Thus, the quasi-Newton method is to start with an initial “guess” ~x1 and form the sequence

~xn+1 = ~xn − (~f′(~x1))⁻¹ ~f(~xn).

Of course for both of these algorithms, success depends a lot on howgood the initial “guess” is.
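Both iterations can be sketched in a few lines of plain Python (an added illustration, not part of the notes; the test map ~f(x, y) = (x² − y, y² + 3x − 1) is the one from Problem 2.33 below, and for a 2 × 2 system the inverse Jacobian can be written out explicitly).

```python
def f(x, y):
    return (x * x - y, y * y + 3 * x - 1)

def jac(x, y):
    # f' = [[2x, -1], [3, 2y]]
    return ((2 * x, -1.0), (3.0, 2 * y))

def apply_inv(J, v):
    """Return J^{-1} v for a 2x2 matrix J (assumes det J != 0)."""
    (a, b), (c, d) = J
    det = a * d - b * c
    return ((d * v[0] - b * v[1]) / det, (-c * v[0] + a * v[1]) / det)

def newton(x, y, steps, frozen=False):
    A = jac(x, y)  # reused at every step when frozen (quasi-Newton)
    for _ in range(steps):
        J = A if frozen else jac(x, y)
        dx, dy = apply_inv(J, f(x, y))
        x, y = x - dx, y - dy
    return x, y

xn = newton(1.0, 1.0, 25)                # Newton's method
xq = newton(1.0, 1.0, 200, frozen=True)  # quasi-Newton with frozen Jacobian
```

Newton converges in a handful of steps; the frozen-Jacobian run needs many more iterations but only ever uses the one matrix ~f′(~x1), illustrating the cost trade-off described above. Both approach the root near (0.329, 0.109).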

Theorem 2.31 (Quasi-Newton Method). Let U ⊆ Rn be an open set, let ~f : U → Rn, and assume that there is ~z ∈ U such that ~f(~z) = ~0. If ~f is C1 on U and the matrix ~f′(~z) is invertible, then there exists an r > 0 so that for any ~x1 ∈ B(~z; r) the sequence {~xn} generated by the Quasi-Newton Method converges to ~z.

Proof. First, since ~f is C1 and ~f′(~z) is invertible, we may pick r1 > 0 such that for ~x ∈ B(~z; r1) the matrix ~f′(~x) is invertible. Also, since all of the entries of


this matrix are continuous functions of ~x, by applying Cramer's Rule again we see that the entries of (~f′(~x))⁻¹, and hence the norm of this matrix, will also vary continuously with ~x. Thus, we may find an r2 with 0 < r2 ≤ r1 such that for ~x ∈ B(~z; r2) we have a bound, say,

‖(~f′(~x))⁻¹‖ ≤ M.

Finally, fix some number λ, 0 < λ < 1, and again using the continuity of the partial derivatives, we may pick an r3 > 0 so that when ~x, ~y ∈ B(~z; r3) we have ‖~f′(~x) − ~f′(~y)‖ < M⁻¹λ.

Set r = min{r1, r2, r3}. We claim that for any initial ~x1 ∈ B(~z; r) the sequence generated by the quasi-Newton algorithm converges to ~z.

To simplify notation let A = ~f′(~x1), so that

~xn+1 = ~xn − A⁻¹ ~f(~xn).

We must show that this sequence converges to ~z. Now define ~g(~x) = ~x − A⁻¹ ~f(~x), so that

~xn+1 = ~g(~xn).

Note that ~g(~x) = ~x if and only if ~f(~x) = ~0. Then

~g′(~x) = I − A⁻¹ ~f′(~x) = A⁻¹[A − ~f′(~x)].

Hence, for any ~x ∈ B(~z; r),

‖~g′(~x)‖ ≤ ‖A⁻¹‖ ‖A − ~f′(~x)‖ ≤ M·M⁻¹λ = λ < 1.

Thus, by the corollary to the Multivariable Mean Value Theorem, for ~x, ~y ∈ B(~z; r) we have that

‖~g(~x) − ~g(~y)‖2 ≤ λ‖~x − ~y‖2,

that is, ~g is a strict contraction. Also,

‖~g(~x) − ~z‖2 = ‖~g(~x) − ~g(~z)‖2 ≤ λ‖~x − ~z‖2 < r.

So ~g maps B(~z; r) back into itself. Finally, as in the proof of the contraction mapping theorem, we have

‖~xn+1 − ~z‖2 = ‖~g(~xn) − ~g(~z)‖2 ≤ λ‖~xn − ~z‖2.

Thus, by induction, we have that

‖~xn+1 − ~z‖2 ≤ λⁿ‖~x1 − ~z‖2 → 0,

and the proof is complete.


Remark 2.32. Note that the proof also shows that ~z is the unique point in B(~z; r) that satisfies ~f(~x) = ~0. To see this, note that if ~x1 were any point satisfying ~f(~x1) = ~0, then using this point for our initial guess we would have that ~xn = ~x1 for all n, while at the same time ~xn → ~z.

Problem 2.33. Let ~f(x, y) = (x² − y, y² + 3x − 1). Prove that there is some point ~z = (a, b) with 0 ≤ a, b ≤ 1 satisfying ~f(~z) = (0, 0). Pick ~x1 = (1, 1) and compute ~x2, ~x3 for both the quasi-Newton method and the Newton method.

2.8 The Implicit Function Theorem

Given a set of n equations in n + m unknowns, we would like to be able to express n of the unknowns, called the dependent variables, in terms of m of the unknowns, called either the free variables or the independent variables. The implicit function theorem gives us conditions under which we can do this locally and gives us a formula for computing the partial derivatives of the dependent variables in terms of the independent variables.

For example, from calculus we know that on the sphere given by the equation

x² + y² + z² = 9,

we can define the dependent variable z as a function of x and y, except when z = 0. Our calculus approach would be to write

(∂/∂x)[x² + y² + z²] = (∂/∂x)[9]

and hence,

2x + 0 + 2z ∂z/∂x = 0,

implying

∂z/∂x = −x/z.

The reason that we get 0 for the second term in this sum was that x and y were both considered as independent variables and hence ∂y/∂x = 0. Similarly, ∂z/∂y = −y/z.

Thus, at the point (2, 1, −2), we would have that ∂z/∂x = 1 and ∂z/∂y = 1/2. What these numbers tell us is that for nearby points (2 + h1, 1 + h2, z) on the sphere a good linear approximation to z is

z ≈ −2 + 1·h1 + (1/2)h2.
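A quick numerical sanity check of this linear approximation (an added illustration, not from the notes): near (2, 1, −2) the exact solution on the lower hemisphere is z = −√(9 − x² − y²), and the tangent-plane estimate −2 + h1 + h2/2 should match it to second order in (h1, h2).

```python
import math

def z_exact(x, y):
    # Lower-hemisphere branch of the sphere through (2, 1, -2).
    return -math.sqrt(9 - x * x - y * y)

for h1, h2 in [(0.1, 0.05), (0.01, 0.02), (-0.05, 0.03)]:
    approx = -2 + 1.0 * h1 + 0.5 * h2   # dz/dx = 1, dz/dy = 1/2 at (2, 1, -2)
    exact = z_exact(2 + h1, 1 + h2)
    # A first-order approximation has error O(h^2); 2 is a generous constant.
    assert abs(exact - approx) <= 2 * (h1 * h1 + h2 * h2)
```

The loose constant 2 in the error bound is an assumption chosen for the illustration; the point is only that the error shrinks quadratically while the increments shrink linearly.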


In this case we can see directly why z = 0 is problematical. For z > 0, we can actually solve this equation explicitly and z = +√(9 − x² − y²), while for z < 0, z = −√(9 − x² − y²). Thus, in any neighborhood of a point of the form (x0, y0, 0), for each nearby (x, y) there are two nearby z's that satisfy the equation, and so z is not uniquely determined as a function of (x, y). In this case we can actually picture this quite easily, since (x0, y0, 0) is a point on the equator of this sphere and in any neighborhood of such a point, for a given nearby (x, y) there is a z > 0 “above” the equator and a z < 0 “below” the equator.

For a second example, we consider the pair of equations

f1(x, y, z) = x² + y² + z² = 9 and f2(x, y, z) = 3x + 4y + 2z = 6,

which represent the intersection of the sphere with a plane, yielding a circle on the sphere. This “one-dimensional” object should have only one “free variable”. So in this case we might like to regard z and y as dependent variables and compute their derivatives with respect to the independent variable x.

Computing d/dx of the pair of equations yields

2x + 2y dy/dx + 2z dz/dx = 0 and 3 + 4 dy/dx + 2 dz/dx = 0,

which can be expressed in matrix-vector form as

(2x, 3)ᵗ + [2y 2z; 4 2] (dy/dx, dz/dx)ᵗ = (0, 0)ᵗ,

where [2y 2z; 4 2] denotes the 2 × 2 matrix with rows (2y, 2z) and (4, 2).

Note that the first vector is d~f/dx, the matrix is ~f′ if we only regard ~f as a function of y and z, and the second vector is (d/dx)(y, z)ᵗ.

At the point (2, 1, −2) this matrix is invertible and we obtain

(dy/dx, dz/dx)ᵗ = −[2 −4; 4 2]⁻¹ (4, 3)ᵗ = (−1/20)[2 4; −4 2](4, 3)ᵗ = (−1/20)(20, −10)ᵗ = (−1, 1/2)ᵗ.

What these numbers tell us is that if near (2, 1, −2) we wish to find a point on this circle on the sphere of the form (2 + h, y, z), then a good linear approximation is given by

y ≈ 1 − h and z ≈ −2 + h/2.
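These first-order estimates can be sanity-checked numerically (an added illustration, not from the notes): substituting (2 + h, 1 − h, −2 + h/2) into the two defining equations leaves the plane equation satisfied exactly, since it is linear, while the sphere equation is off by only a quadratic error, which expands to exactly 2.25h².

```python
def sphere(x, y, z):
    return x * x + y * y + z * z   # equals 9 on the circle

def plane(x, y, z):
    return 3 * x + 4 * y + 2 * z   # equals 6 on the circle

for h in [0.1, 0.01, -0.05]:
    x, y, z = 2 + h, 1 - h, -2 + h / 2
    # The plane is linear, so the linear approximation satisfies it exactly.
    assert abs(plane(x, y, z) - 6) < 1e-12
    # (2+h)^2 + (1-h)^2 + (-2+h/2)^2 - 9 = (1 + 1 + 1/4) h^2 = 2.25 h^2.
    assert abs(sphere(x, y, z) - 9 - 2.25 * h * h) < 1e-12
```

This is the general pattern: a tangent-line approximation to a level set satisfies linear constraints exactly and quadratic ones to second order.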

Note that these calculations tell us a great deal about this circle that would be difficult to see directly. For example, the above matrix is not


invertible when y − 2z = 0, since its determinant is 4y − 8z. So perhaps if we tried to explicitly solve the above pair of equations for y and z as functions of x near a point of the form (x0, 2b, b), then we would find that there were multiple solutions near such points and so y and z cannot be expressed as functions, or perhaps the tangent line to the circle is vertical in either the y-direction or z-direction and so no derivative exists.

In the general case of the implicit function theorem, we will want to examine n equations f1, ..., fn in m + n variables and regard n of these variables as depending on m independent variables. To keep our notation close to what we encounter in calculus, we shall write ~y = (y1, ..., yn) for the dependent variables and ~x = (x1, ..., xm) for the independent variables. Our goal is to consider the equations

f1(~x, ~y) = c1, ..., fn(~x, ~y) = cn,

where c1, ..., cn are constants, and for a given point (~x0, ~y0) that satisfies these equations to compute

∂yi/∂xj, 1 ≤ i ≤ n, 1 ≤ j ≤ m.

If we write this in vector notation, with ~f = (f1, ..., fn), then we have that ~f : Rm+n → Rn, with a typical vector in Rm+n written as (~x, ~y), with ~x ∈ Rm and ~y ∈ Rn. Thus, our equation becomes ~f(~x, ~y) = ~c and we know that ~f(~x0, ~y0) = ~c.

Note that the derivative of ~f will be an n × (m + n) matrix. The first m columns of this matrix will involve derivatives with respect to the x variables and the next n columns will be derivatives with respect to the y variables. Thus,

~f′(~x, ~y) = [∂~f/∂x1, ..., ∂~f/∂xm, ∂~f/∂y1, ..., ∂~f/∂yn].

To simplify notation further, we will write

~f′(~x, ~y) = [∂~f/∂~x, ∂~f/∂~y],

where ∂~f/∂~x is the n × m matrix consisting of the first m columns of ~f′ and ∂~f/∂~y


is the n × n matrix consisting of the last n columns of ~f′. In this notation, our goal is to find the n × m matrix

∂~y/∂~x = (∂yi/∂xj).

This notation really carries the day, because what we will show is that, in matrix notation using matrix products,

∂~f/∂~x + (∂~f/∂~y)(∂~y/∂~x) = 0.

Hence, when the n × n matrix ∂~f/∂~y is invertible, we expect to find that

∂~y/∂~x = −(∂~f/∂~y)⁻¹ (∂~f/∂~x).

We are now prepared to prove the implicit function theorem. First, two lemmas about matrices.

Lemma 2.34. Let A = [A1,1 A1,2; A2,1 A2,2] be an (n1 + n2) × (p1 + p2) matrix partitioned into submatrices Ai,j of size ni × pj, and let B = [B1,1 B1,2; B2,1 B2,2] be a (p1 + p2) × (m1 + m2) matrix partitioned into submatrices Bi,j of size pi × mj (semicolons separate block rows). Then

A·B = [A1,1B1,1 + A1,2B2,1   A1,1B1,2 + A1,2B2,2; A2,1B1,1 + A2,2B2,1   A2,1B1,2 + A2,2B2,2].

Lemma 2.35. Let A = [A1,1 0; A2,1 A2,2] be an (n1 + n2) × (n1 + n2) matrix partitioned into submatrices of size ni × nj. If A1,1 and A2,2 are invertible, then A is invertible and

A⁻¹ = [A1,1⁻¹ 0; −A2,2⁻¹A2,1A1,1⁻¹   A2,2⁻¹].

Theorem 2.36 (Implicit Function Theorem). Let U ⊆ Rm+n and write points in U as (~x, ~y) with ~x ∈ Rm, ~y ∈ Rn, let ~f : Rm+n → Rn be C1 on U, and let (~x0, ~y0) ∈ U with ~f(~x0, ~y0) = ~c. If the n × n matrix

(∂~f/∂~y)(~x0, ~y0)


is invertible, then there is an r > 0 and a C1 function ~g : B(~x0; r) → Rn with ~g(~x0) = ~y0 such that

~f(~x, ~g(~x)) = ~c

and

~g′(~x) = −(∂~f/∂~y (~x, ~g(~x)))⁻¹ (∂~f/∂~x (~x, ~g(~x))),

for all ~x ∈ B(~x0; r).

Proof. Let A2,1 be the n × m matrix (∂~f/∂~x)(~x0, ~y0) and let A2,2 be the n × n matrix (∂~f/∂~y)(~x0, ~y0). Define ~F : U → Rm+n by

~F(~x, ~y) = (~x, ~f(~x, ~y)).

Then ~F is C1 on U and its derivative is an (m + n) × (m + n) matrix; writing this matrix in block form we have

~F′(~x, ~y) = [Im 0; ∂~f/∂~x  ∂~f/∂~y].

In particular,

~F′(~x0, ~y0) = [Im 0; A2,1 A2,2].

By Lemma 2.35 this matrix is invertible, and so we may apply the inverse function theorem to conclude that there exists r1 > 0 so that ~F is one-to-one on B = B((~x0, ~y0); r1) ⊆ Rm+n, V = ~F(B) is open, and the inverse of ~F, ~G : V → B, is C1 on V with the formula for the derivative of ~G as given in the theorem.

Note that since ~F(~x, ~y) = (~x, ~f(~x, ~y)), we have that if (~x, ~z) ∈ V, then

~G(~x, ~z) = (~x, ~g1(~x, ~z)),

for some function ~g1 : V → Rn. Moreover, ~g1 will also be C1 on V since it is just the last n components of ~G.

Since V is open and ~F(~x0, ~y0) = (~x0, ~c) ∈ V, there is r > 0 so that ‖~x − ~x0‖2 < r implies that (~x, ~c) ∈ V. Thus, on B(~x0; r) ⊆ Rm, we may define a function

~g : B(~x0; r) → Rn by ~g(~x) = ~g1(~x, ~c).


Since ~F ∘ ~G(~x, ~z) = (~x, ~z), we have for ~x ∈ B(~x0; r),

(~x, ~c) = ~F ∘ ~G(~x, ~c) = ~F(~x, ~g(~x)) = (~x, ~f(~x, ~g(~x))),

and it follows that

~f(~x, ~g(~x)) = ~c.

Finally, to find the derivative formula, first note that

~g′(~x) = ∂~g1/∂~x (~x, ~c),

and since ~F ∘ ~G(~x, ~z) = (~x, ~z), by the chain rule

[Im 0; 0 In] = ~F′(~G(~x, ~c)) ~G′(~x, ~c) = [Im 0; ∂~f/∂~x(~x, ~g1(~x, ~c))  ∂~f/∂~y(~x, ~g1(~x, ~c))] · [Im 0; ∂~g1/∂~x(~x, ~c)  ∂~g1/∂~z(~x, ~c)].

Examining the (2,1)-entries of these block matrices yields

0 = ∂~f/∂~x(~x, ~g1(~x, ~c)) · Im + ∂~f/∂~y(~x, ~g1(~x, ~c)) · ∂~g1/∂~x(~x, ~c)
 = ∂~f/∂~x(~x, ~g(~x)) + ∂~f/∂~y(~x, ~g(~x)) · ~g′(~x).

Solving for ~g′ yields

~g′(~x) = −(∂~f/∂~y(~x, ~g(~x)))⁻¹ · (∂~f/∂~x(~x, ~g(~x))).

Using our simplified notation, the function ~g is “solving” for the “y” variables, so that yi = gi(~x). Thus, this last equation (ignoring the variables) is

(∂yi/∂xj) = (∂gi/∂xj) = ~g′ = −(∂~f/∂~y)⁻¹ · (∂~f/∂~x).


Problem 2.37. Consider the equations

u⁵ + xv² − y + w = 0,
v⁵ + yu² − x + w = 0,
w⁴ + y⁵ − x⁴ − 1 = 0,

which are satisfied by x = 1, y = 1, u = 1, v = 1, w = −1. Show that the hypotheses of the implicit function theorem are met, allowing us to express u = g1(x, y), v = g2(x, y), w = g3(x, y) in a neighborhood of (x0, y0) = (1, 1).

• Use the implicit function theorem to compute the 6 partial derivatives of the g's with respect to x and y at (1, 1).

• Use these partials to give approximate values of u, v, w that satisfy these equations for x = 1.1 and y = 0.8.

Problem 2.38. Consider the equations

xu² + yv² + xy − 9 = 0,
xv² + yu² − xy − 7 = 0.

Assuming that a point (x0, y0, u0, v0) satisfies these equations, what conditions on x0, y0 are needed to guarantee that the hypotheses of the implicit function theorem are met, allowing us to express u, v as functions of x, y near (x0, y0)? Repeat the exercise for expressing x, y as functions of u, v.

2.9 Local Extrema and the Second Derivative Test

Definition 2.39. Let U ⊆ Rn and let f : U → R. A point ~x ∈ U is called a local maximum if there is an r > 0 so that if ‖~y − ~x‖ < r then f(~x) ≥ f(~y), and a strict local maximum if there is an r > 0 so that f(~x) > f(~y) for 0 < ‖~x − ~y‖ < r. Similarly, ~x is called a local minimum (respectively, strict local minimum) if f(~x) ≤ f(~y) for ‖~x − ~y‖ < r (respectively, f(~x) < f(~y) for 0 < ‖~x − ~y‖ < r). The point ~x is called a local extremum (respectively, strict local extremum) if it is either a local maximum or a local minimum (respectively, a strict local maximum or a strict local minimum).

Proposition 2.40 (1st Derivative Test). Let U ⊆ Rn be open, let f : U → R, and let ~x ∈ U be a local extremum. If f is differentiable at ~x then f′(~x) = ~0.


Proof. Say that f(~x) ≥ f(~y) for all ~y ∈ B(~x; r). Let ~u be any unit vector and define g : (−r, +r) → Rn by g(t) = ~x + t~u and h : (−r, +r) → R by h(t) = f(g(t)). Then h has a local max at t = 0 and so, by the one-variable theorem and the chain rule,

0 = h′(0) = f′(g(0)) · g′(0) = f′(~x) · ~u.

This shows that f′(~x) is a vector whose dot product with every unit vector is 0. Hence, f′(~x) = ~0.

We now wish to prove the analogue of the one-variable second derivative test. For this we will need a few new concepts.

Definition 2.41. Let P = (pi,j) = Pᵗ be an n × n matrix. Then P is called positive definite provided that for every non-zero vector ~v = (v1, ..., vn) we have that

(P~v) · ~v = Σ_{i,j=1}^n pi,j vi vj > 0.

One needs to be careful when using other books. Some only demand that the “>” be a “≥” and use the phrase “strictly positive definite” instead.

There are many theorems in linear algebra characterizing these matrices. The following will be useful for us.

Given an n × n matrix P = (pi,j), for each k, 1 ≤ k ≤ n, we let Pk be the k × k matrix

Pk = (pi,j)_{i,j=1}^k.

Theorem 2.42. Let P = Pᵗ be an n × n matrix. Then P is positive definite if and only if

det(Pk) > 0 for 1 ≤ k ≤ n.
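Theorem 2.42 translates directly into a finite test. The sketch below is an added illustration, not part of the notes: it checks the leading principal minors of a symmetric matrix with a hand-rolled cofactor-expansion determinant.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def is_positive_definite(P):
    """Theorem 2.42: P = P^t is positive definite iff all det(P_k) > 0."""
    n = len(P)
    return all(det([row[:k] for row in P[:k]]) > 0 for k in range(1, n + 1))

P = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]   # symmetric tridiagonal matrix
print(is_positive_definite(P))                          # True: minors 2, 3, 4
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))   # False: det(P_2) = -3
```

Cofactor expansion is exponential in n and only meant to mirror the statement of the theorem; for serious use one would test via a Cholesky factorization instead.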

Recall the idea of higher order partial derivatives. The order is the total number of partial derivatives that one takes.

Definition 2.43. Let U ⊆ Rn be an open set and let f : U → R. We say that f is Ck on U if all the partial derivatives up to order k exist and are continuous on U.

Theorem 2.44 (Equality of Mixed Partials). Let U ⊆ Rn and let f : U → R. If f is C2 on U, then for any 1 ≤ i, j ≤ n,

∂²f/∂xi∂xj = ∂²f/∂xj∂xi.


Proof. It is enough to consider the case n = 2, and to simplify we'll label the variables x and y. Suppose that there is a point where they are not equal. Then one is strictly larger, and by continuity of the partials, the larger one stays strictly larger on an open set. So say that ∂²f/∂x∂y > ∂²f/∂y∂x on an open set V ⊆ U.

Pick a closed rectangle contained in V, [a, b] × [c, d] ⊆ V. Then we have that

∫_a^b ∫_c^d ∂²f/∂x∂y dy dx > ∫_a^b ∫_c^d ∂²f/∂y∂x dy dx.

Starting with the 2nd integral we have that

∫_a^b ∫_c^d ∂²f/∂y∂x (x, y) dy dx = ∫_a^b ( ∫_c^d (∂/∂y)[∂f/∂x (x, y)] dy ) dx
 = ∫_a^b ( ∂f/∂x (x, d) − ∂f/∂x (x, c) ) dx
 = ( f(b, d) − f(a, d) ) − ( f(b, c) − f(a, c) ).

However, after first reversing the order of integration, one gets the same answer for the first integral, contradicting the strict inequality.

Definition 2.45. Let U ⊆ Rn and let f : U → R be twice differentiable. Then the n × n matrix

H(f) = ( ∂²f/∂xi∂xj )

is called the Hessian of f.

By the last result, if f is C2 on U then H(f) = H(f)ᵗ.

Theorem 2.46 (Second Derivative Test). Let U ⊆ Rn be open, let ~x0 ∈ U, and let f : U → R. If

• f ′(~x0) = ~0,

• f is C2 on a neighborhood of ~x0,

• H(f)(~x0) is positive definite,

then ~x0 is a strict local minimum.

Remark 2.47. To test for a local maximum, we just apply the above theorem to −f. Thus, ~x0 will be a local maximum if the first two conditions are met and −H(f)(~x0) is positive definite.
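As a small worked illustration of the test (added, with the function chosen for simplicity and its Hessian entered by hand, since it is constant): for f(x, y) = x² + xy + y² − 3y the gradient (2x + y, x + 2y − 3) vanishes only at (−1, 2), and the Hessian [[2, 1], [1, 2]] has leading minors 2 and 3, so Theorem 2.46 gives a strict local minimum there.

```python
def f(x, y):
    return x * x + x * y + y * y - 3 * y

# The gradient (2x + y, x + 2y - 3) vanishes only at (-1, 2).
x0, y0 = -1.0, 2.0

# Constant Hessian: [[f_xx, f_xy], [f_yx, f_yy]] = [[2, 1], [1, 2]].
H = [[2.0, 1.0], [1.0, 2.0]]
minors = [H[0][0], H[0][0] * H[1][1] - H[0][1] * H[1][0]]
assert all(m > 0 for m in minors)  # positive definite by Theorem 2.42

# Second derivative test: f is strictly larger at nearby points.
for dx, dy in [(0.1, 0.0), (0.0, -0.1), (-0.05, 0.05)]:
    assert f(x0 + dx, y0 + dy) > f(x0, y0)
```

Sampling a few nearby points does not prove anything by itself, of course; the point is that the minor computation is exactly the hypothesis the theorem asks us to verify.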


We need two lemmas.

Lemma 2.48. Let h : (−r, +r) → R be twice differentiable on (−r, +r) with h′(0) = 0. If h′′(s) > 0 for all |s| < r, then h(0) < h(t) for all 0 < |t| < r.

Proof. We apply Taylor's theorem to write

h(t) = h(0) + h′(0)t + (h′′(s)/2)t² = h(0) + (h′′(s)/2)t²,

for some s between 0 and t. By our hypotheses the 2nd term is positive and so h(t) > h(0).

Lemma 2.49. Let U ⊆ Rn be open, let f : U → R be C2 on U, and let ~x0 ∈ U. If H(f)(~x0) is positive definite, then there is r > 0 such that H(f)(~x) is positive definite for ~x ∈ B(~x0; r).

Proof. To check if a matrix A is positive definite it is enough to check that (A~u) · ~u > 0 for every unit vector ~u. Since the set of unit vectors is a compact set and ~u ↦ (A~u) · ~u is clearly a continuous function, there will exist a minimum value m > 0 so that (A~u) · ~u ≥ m. Now we may pick a δ > 0 so that if B = (bi,j) is a matrix with |ai,j − bi,j| < δ for all i, j, then (B~u) · ~u ≥ m/2 > 0 for every unit vector ~u.

Since f is C2, the entries of H(f) all vary continuously, and so we may pick an r > 0 so that ‖~x − ~x0‖ < r implies

| ∂²f/∂xi∂xj (~x) − ∂²f/∂xi∂xj (~x0) | < δ.

Proof (of Theorem 2.46). Pick an r as given by the last lemma so that for ‖~x − ~x0‖ < r we have that H(f)(~x) is positive definite.

Fix a unit vector ~u = (u1, ..., un), define g : (−r, +r) → Rn by g(t) = ~x0 + t~u, and let h : (−r, +r) → R be h(t) = f(g(t)) as earlier. We have that

h′(t) = f′(g(t)) · ~u = Σ_{j=1}^n ∂f/∂xj (g(t)) uj.

Now when we apply the chain rule to differentiate each function in this sum we see that

h′′(t) = Σ_{j=1}^n ( Σ_{i=1}^n ∂²f/∂xi∂xj (g(t)) ui uj ) = [H(f)(g(t))~u] · ~u.


Hence, h′′(t) > 0 for all |t| < r.

By the one-variable second derivative test (Lemma 2.48) we have that

f(~x0 + t~u) > f(~x0)

for 0 < |t| < r. Since this holds for every direction ~u, we have that

f(~x) > f(~x0)

whenever 0 < ‖~x − ~x0‖ < r.

Definition 2.50. A point ~x0 such that f′(~x0) = ~0 but neither H(f)(~x0) nor −H(f)(~x0) is positive definite is called a saddle point.

Problem 2.51. For f(x, y, z) = (x³ − 3x + 4)(y² + 1)(z² + 1), find and classify all points where the derivative vanishes.

Problem 2.52. Let f(x, y, z) = x² + (x − 1)²(y − 2)² + (z − 1)²y. Find and classify all the points where the derivative vanishes.

2.10 Taylor Series in Several Variables

First we introduce multi-index notation, which simplifies many formulas. By a multi-index in dimension n we mean a tuple I = (i1, ..., in) where each ik is a non-negative integer. We set

• |I| = i1 + · · · + in,

• x^I = x1^{i1} · · · xn^{in}, which is a monomial in the variables x1, ..., xn,

• I! = (i1!) · · · (in!),

• given a function f of n variables,

∂^{|I|}f/∂x^I = (∂^{i1}/∂x1^{i1}) · · · (∂^{in}/∂xn^{in}) f.

By a polynomial in n variables, we mean any finite sum of the form

p(x1, ..., xn) = p(~x) = Σ aI x^I,

where the aI are real numbers. The total degree, or more shortly the degree, of such a polynomial is the maximum of |I| over all non-zero terms. We can write the most general polynomial in n variables of degree at most M as

Σ_{|I|≤M} aI x^I.

Note that given a polynomial $p(\vec{x}) = \sum a_I x^I$, we obtain the coefficient of $x^I$ by
\[
a_I = \frac{1}{I!} \frac{\partial^{|I|} p}{\partial x^I}(\vec{0}).
\]
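The coefficient formula can be verified symbolically. A minimal sketch (ours; the notes contain no code, and sympy is our choice of tool): for the hypothetical polynomial $p = 5x_1^2 x_2 + 7x_2^3$ and $I = (2, 1)$, the formula should recover $a_I = 5$.

```python
from math import factorial

import sympy as sp

x1, x2 = sp.symbols("x1 x2")

# Hypothetical sample polynomial: p = 5*x1**2*x2 + 7*x2**3.
p = 5 * x1**2 * x2 + 7 * x2**3

def coeff_from_derivatives(p, I):
    """Recover a_I = (1/I!) * (d^|I| p / dx^I) evaluated at the origin."""
    i1, i2 = I
    d = sp.diff(p, x1, i1, x2, i2)           # mixed partial of order |I|
    I_fact = factorial(i1) * factorial(i2)   # I! = (i1!)(i2!)
    return d.subs({x1: 0, x2: 0}) / I_fact

print(coeff_from_derivatives(p, (2, 1)))  # 5, the coefficient of x1**2 * x2
print(coeff_from_derivatives(p, (0, 3)))  # 7, the coefficient of x2**3
```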

Thus, given an open set $U \subseteq \mathbb{R}^n$, a point $x_0 \in U$ and an infinitely differentiable function $f : U \to \mathbb{R}$, we define its Taylor series centered at $x_0$ to be
\[
\sum_{|I| \ge 0} \frac{1}{I!} \left( \frac{\partial^{|I|} f}{\partial x^I}(x_0) \right) (x - x_0)^I ,
\]
where if $x_0 = (a_1, \ldots, a_n)$ and $x = (x_1, \ldots, x_n)$ then
\[
(x - x_0)^I = (x_1 - a_1)^{i_1} \cdots (x_n - a_n)^{i_n}.
\]

The Taylor polynomial of degree $M$ is defined to be
\[
\sum_{|I| \le M} \frac{1}{I!} \left( \frac{\partial^{|I|} f}{\partial x^I}(x_0) \right) (x - x_0)^I .
\]
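The defining formula translates directly into code. The sketch below is our own (the notes contain no code): it builds the degree-$M$ Taylor polynomial of a two-variable function by summing over multi-indices with sympy; the name `taylor_poly` is ours. For $f(x, y) = e^{x+y}$ at the origin, every partial derivative equals 1 there, so the degree-2 polynomial should be $1 + x + y + x^2/2 + xy + y^2/2$.

```python
from itertools import product
from math import factorial

import sympy as sp

x, y = sp.symbols("x y")

def taylor_poly(f, center, M):
    """Degree-M Taylor polynomial of f(x, y) centered at `center`, computed
    term by term from sum_{|I| <= M} (1/I!) (d^|I| f / dx^I)(x0) (x - x0)^I."""
    a1, a2 = center
    poly = sp.Integer(0)
    for i1, i2 in product(range(M + 1), repeat=2):
        if i1 + i2 > M:          # keep only multi-indices with |I| <= M
            continue
        d = sp.diff(f, x, i1, y, i2).subs({x: a1, y: a2})
        I_fact = factorial(i1) * factorial(i2)
        poly += d / I_fact * (x - a1)**i1 * (y - a2)**i2
    return sp.expand(poly)

f = sp.exp(x + y)
p2 = taylor_poly(f, (0, 0), 2)
print(p2)  # equals 1 + x + y + x**2/2 + x*y + y**2/2
```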

The following theorem is called the multi-variable Taylor Theorem with Lagrange form of the remainder.

Theorem 2.53. Let $r > 0$, let $\vec{x}_0 \in \mathbb{R}^n$ and let $f : B(\vec{x}_0; r) \to \mathbb{R}$ be $C^{M+1}$ on $B(\vec{x}_0; r)$. Then for each $\vec{x} \in B(\vec{x}_0; r)$ there is $\vec{z}$ on the line segment joining $\vec{x}_0$ and $\vec{x}$ such that
\[
f(\vec{x}) = \sum_{|I| \le M} \frac{1}{I!} \left( \frac{\partial^{|I|} f}{\partial x^I}(\vec{x}_0) \right) (x - x_0)^I + \sum_{|I| = M+1} \frac{1}{I!} \left( \frac{\partial^{M+1} f}{\partial x^I}(\vec{z}) \right) (x - x_0)^I .
\]

Proof. Set
\[
g(t) = f(\vec{x}_0 + t(\vec{x} - \vec{x}_0)),
\]
so that $g(0) = f(\vec{x}_0)$ and $g(1) = f(\vec{x})$. Also $g$ is defined for $|t| < R = \frac{r}{\|\vec{x} - \vec{x}_0\|}$, where $R > 1$, and has $M + 1$ derivatives on the interval $(-R, +R)$.


Thus, by the one-variable Taylor theorem,
\[
f(\vec{x}) = g(1) = \sum_{j=0}^{M} \frac{g^{(j)}(0)}{j!} 1^j + \frac{g^{(M+1)}(s)}{(M+1)!},
\]
for some $0 < s < 1$.

The remainder of the proof is done by showing that
\[
g^{(j)}(t) = \sum_{|I| = j} \frac{j!}{I!} \left( \frac{\partial^{j} f}{\partial x^I}(\vec{x}_0 + t(\vec{x} - \vec{x}_0)) \right) (x - x_0)^I .
\]

This last fact is a messy counting argument. We only illustrate it in the case that $n = 2$ and $j = 3$. Set $\vec{x}_0 = (a_1, a_2)$. So first notice that by the chain rule
\[
g'(t) = \sum_{k=1}^{2} \left( \frac{\partial f}{\partial x_k}\bigl(a_1 + t(x_1 - a_1),\, a_2 + t(x_2 - a_2)\bigr) \right) (x_k - a_k).
\]

Repeating the chain rule three times yields
\[
g^{(3)}(t) = \sum_{i,j,k=1}^{2} \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k}\bigl(a_1 + t(x_1 - a_1),\, a_2 + t(x_2 - a_2)\bigr)\, (x_i - a_i)(x_j - a_j)(x_k - a_k).
\]

Now comes the counting part. For example, if $I = (2, 1)$ then $(\vec{x} - \vec{x}_0)^I = (x_1 - a_1)^2 (x_2 - a_2)^1$, and we need to count how many times this term appears in the above sum. Well, we have 3 factors and we need to pick 2 of them to be $(x_1 - a_1)$ and 1 of them to be $(x_2 - a_2)$; we can do this in exactly
\[
\binom{3}{2} = \frac{3!}{(2!)(1!)} = \frac{j!}{I!}
\]
ways!

In the general case our sum for the $j$th derivative will involve all possible products of $j$ terms involving the factors $(x_1 - a_1), \ldots, (x_n - a_n)$, and to compute the coefficient of $(x - x_0)^I$ we need to compute how many of these products will be equal to this term. So among the $j$ factors we need to pick $(x_1 - a_1)$ exactly $i_1$ times, $(x_2 - a_2)$ exactly $i_2$ times, etc. The answer to this combinatorial problem is exactly
\[
\frac{j!}{(i_1!) \cdots (i_n!)} = \frac{j!}{I!}.
\]
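The multinomial count can also be checked by brute force. A small sketch (ours, not from the notes): enumerate all $2^3 = 8$ ordered products of three factors drawn from $\{(x_1 - a_1), (x_2 - a_2)\}$ and count how many collect to the term for $I = (2, 1)$.

```python
from itertools import product
from math import factorial

# Each of the 3 factors in the triple sum for g'''(t) is either factor 1,
# i.e. (x1 - a1), or factor 2, i.e. (x2 - a2).
j = 3
I = (2, 1)  # the term (x1 - a1)**2 * (x2 - a2)**1

# Count tuples in {1, 2}**3 that use factor 1 exactly twice and factor 2 once.
count = sum(
    1
    for factors in product((1, 2), repeat=j)
    if factors.count(1) == I[0] and factors.count(2) == I[1]
)

multinomial = factorial(j) // (factorial(I[0]) * factorial(I[1]))  # j!/I!

print(count, multinomial)  # both are 3
assert count == multinomial == 3
```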


Problem 2.54. For $f(x, y) = \cos(x^2 y)$ compute the Taylor polynomial of degree 2 centered at $\vec{x}_0 = (0, 0)$ and at $\vec{x}_0 = (0, 1)$.

Problem 2.55. Repeat the above problem for $f(x, y) = e^x + \sin(y)$.