mathematics a real analysis · a course in real analysis provides a rigorous treatment of the...

A Course in Real Analysis provides a rigorous treatment of the foundations of differ-ential and integral calculus at the advanced undergraduate level.

The first part of the text presents the calculus of functions of one variable. This part covers traditional topics, such as sequences, continuity, differentiability, Riemann inte-grability, numerical series, and the convergence of sequences and series of functions. It also includes optional sections on Stirling’s formula, functions of bounded variation, Riemann–Stieltjes integration, and other topics.

The second part focuses on functions of several variables. It introduces the topological ideas needed (such as compact and connected sets) to describe analytical properties of multivariable functions. This part also discusses differentiability and integrability of multivariable functions and develops the theory of differential forms on surfaces in n.

The third part consists of appendices on set theory and linear algebra as well as solu-tions to some of the exercises.

Features• Provides a detailed axiomatic account of the real number system• Develops the Lebesgue integral on n from the beginning• Gives an in-depth description of the algebra and calculus of differential forms on

surfaces in n

• Offers an easy transition to the more advanced setting of differentiable manifolds by covering proofs of Stokes’s theorem and the divergence theorem at the concrete level of compact surfaces in n

• Summarizes relevant results from elementary set theory and linear algebra • Contains over 90 figures that illustrate the essential ideas behind a concept or

proof• Includes more than 1,600 exercises throughout the text, with selected solutions

in an appendix

With clear proofs, detailed examples, and numerous exercises, this book gives a thor-ough treatment of the subject. It progresses from single variable to multivariable func-tions, providing a logical development of material that will prepare readers for more advanced analysis-based studies.

K22153

w w w . c r c p r e s s . c o m

A COURSE IN

REAL ANALYSISA COURSE IN

REAL ANALYSISHUGO D. JUNGHENN

JUNGHENN• Access online or download to your smartphone, tablet or PC/Mac• Search the full text of this and other titles you own• Make and share notes and highlights• Copy and paste text and figures for use in your own documents• Customize your view by changing font size and layout

WITH VITALSOURCE®

EBOOK

Mathematics

A COURSE IN

REAL ANALYSIS

K22153_FM.indd 1 1/9/15 4:46 PM

K22153_FM.indd 2 1/9/15 4:46 PM

A COURSE IN

REAL ANALYSIS

HUGO D. JUNGHENNThe George Washington Universit y

Washington, D.C. , USA

K22153_FM.indd 3 1/9/15 4:46 PM

CRC PressTaylor & Francis Group6000 Broken Sound Parkway NW, Suite 300Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLCCRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government worksVersion Date: 20150109

International Standard Book Number-13: 978-1-4822-1928-9 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information stor-age or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copy-right.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that pro-vides licenses and registration for a variety of users. For organizations that have been granted a photo-copy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site athttp://www.taylorandfrancis.com

and the CRC Press Web site athttp://www.crcpress.com

TO THE MEMORY OF MYPARENTS

Rita and Hugo

Contents

Preface xi

List of Figures xiii

List of Tables xvii

List of Symbols xix

I Functions of One Variable 1

1 The Real Number System 31.1 From Natural Numbers to Real Numbers . . . . . . . . . . 31.2 Algebraic Properties of R . . . . . . . . . . . . . . . . . . . 41.3 Order Structure of R . . . . . . . . . . . . . . . . . . . . . 81.4 Completeness Property of R . . . . . . . . . . . . . . . . . 121.5 Mathematical Induction . . . . . . . . . . . . . . . . . . . . 191.6 Euclidean Space . . . . . . . . . . . . . . . . . . . . . . . . 24

2 Numerical Sequences 292.1 Limits of Sequences . . . . . . . . . . . . . . . . . . . . . . 292.2 Monotone Sequences . . . . . . . . . . . . . . . . . . . . . . 362.3 Subsequences and Cauchy Sequences . . . . . . . . . . . . . 382.4 Limits Inferior and Superior . . . . . . . . . . . . . . . . . 42

3 Limits and Continuity on R 473.1 Limit of a Function . . . . . . . . . . . . . . . . . . . . . . . 47*3.2 Limits Inferior and Superior . . . . . . . . . . . . . . . . . 553.3 Continuous Functions . . . . . . . . . . . . . . . . . . . . . 593.4 Properties of Continuous Functions . . . . . . . . . . . . . 633.5 Uniform Continuity . . . . . . . . . . . . . . . . . . . . . . . 67

4 Differentiation on R 734.1 Definition of Derivative and Examples . . . . . . . . . . . . 734.2 The Mean Value Theorem . . . . . . . . . . . . . . . . . . . 80*4.3 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . 854.4 Inverse Functions . . . . . . . . . . . . . . . . . . . . . . . 884.5 L’Hospital’s Rule . . . . . . . . . . . . . . . . . . . . . . . . 94

vii

viii Contents

4.6 Taylor’s Theorem on R . . . . . . . . . . . . . . . . . . . . 100*4.7 Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . 103

5 Riemann Integration on R 1075.1 The Riemann–Darboux Integral . . . . . . . . . . . . . . . . 1075.2 Properties of the Integral . . . . . . . . . . . . . . . . . . . 1165.3 Evaluation of the Integral . . . . . . . . . . . . . . . . . . . 120*5.4 Stirling’s Formula . . . . . . . . . . . . . . . . . . . . . . . 1295.5 Integral Mean Value Theorems . . . . . . . . . . . . . . . . . 131*5.6 Estimation of the Integral . . . . . . . . . . . . . . . . . . . 1345.7 Improper Integrals . . . . . . . . . . . . . . . . . . . . . . . 1435.8 A Deeper Look at Riemann Integrability . . . . . . . . . . . 151*5.9 Functions of Bounded Variation . . . . . . . . . . . . . . . 152*5.10 The Riemann–Stieltjes Integral . . . . . . . . . . . . . . . . 156

6 Numerical Infinite Series 1636.1 Definition and Examples . . . . . . . . . . . . . . . . . . . 1636.2 Series with Nonnegative Terms . . . . . . . . . . . . . . . . 1696.3 More Refined Convergence Tests . . . . . . . . . . . . . . . 1766.4 Absolute and Conditional Convergence . . . . . . . . . . . . 181*6.5 Double Sequences and Series . . . . . . . . . . . . . . . . . 188

7 Sequences and Series of Functions 1937.1 Convergence of Sequences of Functions . . . . . . . . . . . 1937.2 Properties of the Limit Function . . . . . . . . . . . . . . . 1997.3 Convergence of Series of Functions . . . . . . . . . . . . . . 2047.4 Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

II Functions of Several Variables 229

8 Metric Spaces 2318.1 Definitions and Examples . . . . . . . . . . . . . . . . . . . . 2318.2 Open and Closed Sets . . . . . . . . . . . . . . . . . . . . . 2388.3 Closure, Interior, and Boundary . . . . . . . . . . . . . . . 2438.4 Limits and Continuity . . . . . . . . . . . . . . . . . . . . . 2488.5 Compact Sets . . . . . . . . . . . . . . . . . . . . . . . . . 255*8.6 The Arzelà–Ascoli Theorem . . . . . . . . . . . . . . . . . . 2638.7 Connected Sets . . . . . . . . . . . . . . . . . . . . . . . . . 2688.8 The Stone–Weierstrass Theorem . . . . . . . . . . . . . . . 275*8.9 Baire’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . 282

9 Differentiation on Rn 2879.1 Definition of the Derivative . . . . . . . . . . . . . . . . . . . 2879.2 Properties of the Differential . . . . . . . . . . . . . . . . . 2959.3 Further Properties of the Differential . . . . . . . . . . . . . 3019.4 Inverse Function Theorem . . . . . . . . . . . . . . . . . . . 306

Contents ix

9.5 Implicit Function Theorem . . . . . . . . . . . . . . . . . . 3129.6 Higher Order Partial Derivatives . . . . . . . . . . . . . . . 3189.7 Higher Order Differentials and Taylor’s Theorem . . . . . . 323*9.8 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 330

10 Lebesgue Measure on Rn 34310.1 General Measure Theory . . . . . . . . . . . . . . . . . . . 34310.2 Lebesgue Outer Measure . . . . . . . . . . . . . . . . . . . . 34710.3 Lebesgue Measure . . . . . . . . . . . . . . . . . . . . . . . . 35110.4 Borel Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35610.5 Measurable Functions . . . . . . . . . . . . . . . . . . . . . 360

11 Lebesgue Integration on Rn 36711.1 Riemann Integration on Rn . . . . . . . . . . . . . . . . . . . 36711.2 The Lebesgue Integral . . . . . . . . . . . . . . . . . . . . . 36811.3 Convergence Theorems . . . . . . . . . . . . . . . . . . . . 37911.4 Connections with Riemann Integration . . . . . . . . . . . 38511.5 Iterated Integrals . . . . . . . . . . . . . . . . . . . . . . . . 38811.6 Change of Variables . . . . . . . . . . . . . . . . . . . . . . 398

12 Curves and Surfaces in Rn 40912.1 Parameterized Curves . . . . . . . . . . . . . . . . . . . . . 40912.2 Integration on Curves . . . . . . . . . . . . . . . . . . . . . 41212.3 Parameterized Surfaces . . . . . . . . . . . . . . . . . . . . 42212.4 m-Dimensional Surfaces . . . . . . . . . . . . . . . . . . . . 432

13 Integration on Surfaces 44713.1 Differential Forms . . . . . . . . . . . . . . . . . . . . . . . . 44713.2 Integrals on Parameterized Surfaces . . . . . . . . . . . . . . 46113.3 Partitions of Unity . . . . . . . . . . . . . . . . . . . . . . . 47213.4 Integration on Compact m-Surfaces . . . . . . . . . . . . . 47513.5 The Fundamental Theorems of Calculus . . . . . . . . . . . 478*13.6 Closed Forms in Rn . . . . . . . . . . . . . . . . . . . . . . 495

III Appendices 503

A Set Theory 505

B Linear Algebra 509

C Solutions to Selected Problems 517

Bibliography 581

Index 583

Preface

The purpose of this text is to provide a rigorous treatment of the foundationsof differential and integral calculus at the advanced undergraduate level. Itis assumed that the reader has had the traditional three semester calculussequence and some exposure to elementary set theory and linear algebra. Asregards the last two subjects, appendices provide a summary of most of theresults used in the text. Linear algebra will not be needed until Part II.

The book consists of three parts. Part I treats the calculus of functions ofone variable. Here, one can find the traditional topics: sequences, continuity,differentiability, Riemann integrability, numerical series, and convergence ofsequences and series of functions. Optional sections on Stirling’s formula,Riemann–Stieltjes integration, and other topics are also included. As the ideasinherent in these subjects ultimately rest on properties of real numbers, thebook begins with a careful treatment of the real number system. For this wetake an axiomatic rather than a constructive approach, guided as much bythe need for efficiency of exposition as by pedagogical preference. Of course,presenting the real number system in this way begs the excellent questionas to whether such a system exists. It is a question we do not answer, butthe interested reader may wish to consult a text on the construction of thereal number system from the natural numbers, or even on the philosophy ofmathematics.

Part II treats functions of several variables. Many of the results in Part I,such as the chain rule, the inverse function theorem, and the change of variablestheorem, have counterparts in Part II. The reader’s exposure to the one-variableresults should make the multivariable versions more meaningful and accessible.As might be expected, however, some results in Part II have no counterparts inPart I, the implicit function theorem and the iterated integral (Fubini–Tonelli)theorem being obvious examples.

Part II begins with a chapter on metric spaces. Here we introduce thetopological ideas needed to describe some of the analytical properties ofmultivariable functions. Primary among these are the notions of compact setand connected set, which, for example, allow the extension to higher dimensionsof the extreme value and intermediate value theorems. The remainder of Part IIcovers differentiability and integrability of multivariable functions. As regardsintegrability, we have chosen to develop from the beginning the Lebesgueintegral rather than to the extend the Riemann integral to higher dimensions.The additional time required for this approach is, in my view, more than offset

xi

xii Preface

by the enormous added utility of the Lebesgue integral. The last chapter ofPart II develops the theory of differential forms on surfaces in Rn. The chapterculminates with proofs of Stokes’s theorem and the divergence theorem forcompact surfaces. It is hoped that exposure to these topics at the concretelevel of surfaces in Rn will ease the transition to more advanced courses suchas calculus on differentiable manifolds.

Part III consists of the aforementioned appendices on set theory and linearalgebra, as well as solutions to some of the over 1600 exercises found in thetext. For convenience, exercises with solutions that appear in the appendixare marked with a superscript S. Exercises that will find important uses laterare marked with a downward arrow ⇓. Instructors with suitable bona fidesmay obtain from the publisher a manual of complete solutions to all of theexercises.

The book is an outgrowth of notes developed over many years of teachingreal analysis to undergraduates at George Washington University. The morerecent versions of these notes have been specifically tested in classes over thelast three years. During this period, the typical two-semester course closelyfollowed the non-starred sections of this text: Chapters 1–7 for the first semesterand 8–13 for the second. Given the wealth of material, it was necessary toleave some proofs for students to read on their own, a not wholly unfortunatecompromise. Material in some starred sections was assigned as optional reading.

I would like to express my gratitude to the many students whose criticaleyes caught errors before they made their way into these pages. Of course, anyremaining errors are my complete responsibility. Special thanks are due toZehua Zhang, whose enlightened comments have improved the exposition ofseveral topics.

Finally, to my wife Mary for her support and understanding during thewriting of this book: thank you!

Hugo D. JunghennWashington, D.C.September 2014

List of Figures

1.1 Supremum and infimum of A . . . . . . . . . . . . . . . . . 121.2 Greatest integer function . . . . . . . . . . . . . . . . . . . . 14

2.1 Convergence of a sequence . . . . . . . . . . . . . . . . . . . 302.2 Squeeze principle . . . . . . . . . . . . . . . . . . . . . . . . . 312.3 Interval halving process . . . . . . . . . . . . . . . . . . . . 392.4 Limits supremum and infimum . . . . . . . . . . . . . . . . 42

3.1 Limit of a function . . . . . . . . . . . . . . . . . . . . . . . 483.2 L can’t be greater than M . . . . . . . . . . . . . . . . . . . 533.3 One-to-one correspondence between D and Q . . . . . . . . . 613.4 Intermediate value property . . . . . . . . . . . . . . . . . . 64

4.1 Trigonometric inequality . . . . . . . . . . . . . . . . . . . . 744.2 Local extrema . . . . . . . . . . . . . . . . . . . . . . . . . . 804.3 Mean value theorems . . . . . . . . . . . . . . . . . . . . . . . 814.4 Convex function . . . . . . . . . . . . . . . . . . . . . . . . . 864.5 Convex function inequalities . . . . . . . . . . . . . . . . . . . 874.6 Intermediate value property implies monotonicity . . . . . . 894.7 Intermediate value property implies continuity . . . . . . . . 894.8 Newton’s method . . . . . . . . . . . . . . . . . . . . . . . . 104

5.1 Upper and lower sums . . . . . . . . . . . . . . . . . . . . . 1085.2 The partitions P and Q . . . . . . . . . . . . . . . . . . . . 1105.3 The partition Pn . . . . . . . . . . . . . . . . . . . . . . . . . 1115.4 The partitions P ′, P, and P ′′ . . . . . . . . . . . . . . . . . 1125.5 Riemann sum . . . . . . . . . . . . . . . . . . . . . . . . . . 1135.6 The partitions Px and Py . . . . . . . . . . . . . . . . . . . 1225.7 Trapezoidal rule approximation . . . . . . . . . . . . . . . . 1365.8 Midpoint rule approximation . . . . . . . . . . . . . . . . . . . 1375.9 Simpson’s rule approximation . . . . . . . . . . . . . . . . . 1395.10 The partition Q . . . . . . . . . . . . . . . . . . . . . . . . . 159

7.1 Uniform convergence . . . . . . . . . . . . . . . . . . . . . . 1937.2 Pointwise convergence insufficient . . . . . . . . . . . . . . . . 201

xiii

xiv List of Figures

8.1 An open ball is open . . . . . . . . . . . . . . . . . . . . . . 2398.2 The functions gn and g . . . . . . . . . . . . . . . . . . . . . 2408.3 Convex and non-convex sets . . . . . . . . . . . . . . . . . . . 2418.4 The neighborhoods Ux and Vx . . . . . . . . . . . . . . . . . 2558.5 A 2ε net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2568.6 A bounded set in Rn is totally bounded . . . . . . . . . . . . 2578.7 A separation (U, V ) of E . . . . . . . . . . . . . . . . . . . . 2688.8 C1(−1, 0) ∪ C1(1, 0) is path connected . . . . . . . . . . . . . 2718.9 E is path connected . . . . . . . . . . . . . . . . . . . . . . . 2728.10 A piecewise linear function . . . . . . . . . . . . . . . . . . . 2768.11 Sawtooth function . . . . . . . . . . . . . . . . . . . . . . . . 285

9.1 The domain of argθ0 . . . . . . . . . . . . . . . . . . . . . . 3109.2 Saddle point . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

10.1 Interval grid . . . . . . . . . . . . . . . . . . . . . . . . . . . 34810.2 Coverings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34910.3 Middle thirds . . . . . . . . . . . . . . . . . . . . . . . . . . 35310.4 Ternary expansion algorithm . . . . . . . . . . . . . . . . . . 35410.5 Decomposition into half-open intervals . . . . . . . . . . . . . 35710.6 K = cl(E) \ U . . . . . . . . . . . . . . . . . . . . . . . . . . 35810.7 The components of fk . . . . . . . . . . . . . . . . . . . . . 36310.8 The components of fk+1 . . . . . . . . . . . . . . . . . . . . 363

11.1 Partition of an n-dimensional interval . . . . . . . . . . . . . . 36711.2 Three-dimensional simplex . . . . . . . . . . . . . . . . . . . 39011.3 Concentric cube and ball . . . . . . . . . . . . . . . . . . . . 40211.4 The paving Qr . . . . . . . . . . . . . . . . . . . . . . . . . 40311.5 Theorem of Pappus . . . . . . . . . . . . . . . . . . . . . . . 408

12.1 Curves in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 40912.2 A piecewise smooth curve with tangent vectors . . . . . . . 41012.3 Inscribed polygonal line . . . . . . . . . . . . . . . . . . . . 41212.4 Vector field on E . . . . . . . . . . . . . . . . . . . . . . . . 41612.5 Closed curve ϕ . . . . . . . . . . . . . . . . . . . . . . . . . 41812.6 Concatenation of curves . . . . . . . . . . . . . . . . . . . . 41912.7 Tangent spaces at p . . . . . . . . . . . . . . . . . . . . . . . 42212.8 Affine space . . . . . . . . . . . . . . . . . . . . . . . . . . . 42412.9 The inward unit normal . . . . . . . . . . . . . . . . . . . . . 42712.10 Normal vector to S at p . . . . . . . . . . . . . . . . . . . . . 42712.11 Surface of revolution . . . . . . . . . . . . . . . . . . . . . . 42912.12 Möbius strip . . . . . . . . . . . . . . . . . . . . . . . . . . . 43012.13 The mapping G−1

a . . . . . . . . . . . . . . . . . . . . . . . . 43412.14 Transition mappings . . . . . . . . . . . . . . . . . . . . . . 43512.15 Stereographic projection . . . . . . . . . . . . . . . . . . . . 43612.16 The mapping dψx . . . . . . . . . . . . . . . . . . . . . . . . 438

List of Figures xv

12.17 Cylinder-with-boundary . . . . . . . . . . . . . . . . . . . . 44012.18 Surface element . . . . . . . . . . . . . . . . . . . . . . . . . . 44112.19 Induced orientation of T ∂Sa . . . . . . . . . . . . . . . . . . . 44312.20 Stereographic projection from p . . . . . . . . . . . . . . . . 444

13.1 Parallelogram approximation to ϕ(Q) . . . . . . . . . . . . . 46313.2 Two dimensional simplex . . . . . . . . . . . . . . . . . . . . 47013.3 A partition of unity subordinate to U1 and U2 . . . . . . . . 47213.4 The functions h and g . . . . . . . . . . . . . . . . . . . . . 47313.5 The cubes Wi and Vi . . . . . . . . . . . . . . . . . . . . . . 47413.6 Regular region E . . . . . . . . . . . . . . . . . . . . . . . . 48313.7 Annulus in R2 with exterior normal . . . . . . . . . . . . . . 48313.8 The case a ∈ E . . . . . . . . . . . . . . . . . . . . . . . . . 48413.9 The case a ∈ bd(E) . . . . . . . . . . . . . . . . . . . . . . . 48513.10 Regular region in R2 . . . . . . . . . . . . . . . . . . . . . . 48813.11 Piecewise smooth surfaces . . . . . . . . . . . . . . . . . . . 48913.12 Oriented cube without bottom face . . . . . . . . . . . . . . 49013.13 Closed polygon . . . . . . . . . . . . . . . . . . . . . . . . . 49013.14 Surfaces S1 and S2 with common boundary C . . . . . . . . 49413.15 Curves contracting to p must pass through q . . . . . . . . 49513.16 Boundary parametrization . . . . . . . . . . . . . . . . . . . . 49713.17 Star-shaped and non-star-shaped regions . . . . . . . . . . . 499

C.1 Open balls for Exercise 1 . . . . . . . . . . . . . . . . . . . . . 551

List of Tables

4.1 Newton’s method for ex + x− 2 = 0 . . . . . . . . . . . . . 105

5.1 Table for evaluating∫fh by parts . . . . . . . . . . . . . . . 125

5.2 Table for evaluating∫

(x+ 1)3e5x dx by parts . . . . . . . . 1255.3 A comparison of the methods . . . . . . . . . . . . . . . . . 143

9.1 Values of ∆ . . . . . . . . . . . . . . . . . . . . . . . . . . . 3339.2 Values of ∆ . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

xvii

List of Symbols

R real number system . . . . . . . . . . . . . 3∑summation symbol . . . . . . . . . . . . . 4∏product symbol . . . . . . . . . . . . . . . 4

N set of natural numbers . . . . . . . . . . . 6Z set of integers . . . . . . . . . . . . . . . . 6Q set of rational numbers . . . . . . . . . . . 6I set of irrational numbers . . . . . . . . . . 6n! n factorial . . . . . . . . . . . . . . . . . . . 7a a greater than . . . . . . . . . . . . . . . . . 8a ≤ b less than or equal . . . . . . . . . . . . . . 9b ≥ a greater than or equal . . . . . . . . . . . . 9|x| absolute value of x . . . . . . . . . . . . . 9maxS maximum of S . . . . . . . . . . . . . . . . 10minS minimum of S . . . . . . . . . . . . . . . . 10x+ positive part of x . . . . . . . . . . . . . . 10x− negative part of x . . . . . . . . . . . . . . 10supA supremum of A . . . . . . . . . . . . . . . 12inf A infimum of A . . . . . . . . . . . . . . . . . 12bxc greatest integer in x . . . . . . . . . . . . . 14R extended real number system . . . . . . . . 15+∞, −∞ positive infinity, negative infinity . . . . . . 15(a, b) open interval . . . . . . . . . . . . . . . . . 16(a, b] left-open interval . . . . . . . . . . . . . . 16[a, b) right-open interval . . . . . . . . . . . . . . 16[a, b] closed interval . . . . . . . . . . . . . . . . 16(nk

)binomial coefficient . . . . . . . . . . . . . . 21

Rn Euclidean space . . . . . . . . . . . . . . . 24x · y Euclidean inner product . . . . . . . . . . 25‖x‖2 Euclidean norm . . . . . . . . . . . . . . . 25‖x‖1 `1 norm . . . . . . . . . . . . . . . . . . . . 26‖x‖∞ max norm . . . . . . . . . . . . . . . . . . 26a× b cross product . . . . . . . . . . . . . . . . . 27limn an limit of a sequence . . . . . . . . . . . . . . 29an ↑ increasing sequence of real numbers . . . . 36

xix

xx List of Symbols

an ↓ decreasing sequence of real numbers . . . . 36an ↑ a sequence increases to a . . . . . . . . . . . 36an ↓ a sequence decreases to a . . . . . . . . . . . 36lim infn an limit infimum of a sequence . . . . . . . . 42lim supn an limit supremum of a sequence . . . . . . . 42N (a) = Nr(a) neighborhood of a . . . . . . . . . . . . . . . 47limx→ax∈E

f(x) limit of f along E . . . . . . . . . . . . . . . 47

limx→a−

f(x) left-hand limit . . . . . . . . . . . . . . . . 48limx→a+

f(x) right-hand limit . . . . . . . . . . . . . . . 48limx→a

f(x) two-sided limit . . . . . . . . . . . . . . . . 48lim

x→+∞f(x) limit at +∞ . . . . . . . . . . . . . . . . . 48

limx→−∞

f(x) limit at −∞ . . . . . . . . . . . . . . . . . 48lim infx→ax∈E

f(x) limit inferior of f along E . . . . . . . . . 56

lim supx→ax∈E

f(x) limit superior of f along E . . . . . . . . . 56

f ′ = Df = dfdx derivative of f . . . . . . . . . . . . . . . . 73

D`f(a) = f ′`(a) left-hand derivative at a . . . . . . . . . . 75Drf(a) = f ′r(a) right-hand derivative at a . . . . . . . . . . 75f (n) nth derivative of f . . . . . . . . . . . . . . 77Tn(x, a) Taylor polynomial . . . . . . . . . . . . . . . 101Rn(x, a) Taylor remainder . . . . . . . . . . . . . . . 101‖P‖ mesh of partition P . . . . . . . . . . . . . . 107S(f,P) lower Darboux sum . . . . . . . . . . . . . . 107S(f,P) upper Darboux sum . . . . . . . . . . . . . . 107∫ baf lower Darboux integral . . . . . . . . . . . 109∫ baf upper Darboux integral . . . . . . . . . . . 109∫ b

af Riemann–Darboux integral . . . . . . . . . 109

Rba set of Riemann integrable functions on [a, b] 110S(f,P, ξ) Riemann sum . . . . . . . . . . . . . . . . 113∫f indefinite integral of f . . . . . . . . . . . . 121

V ba (f) total variation of f on [a, b] . . . . . . . . 152Sw(f,P, ξ) Riemann-Stieltjes sum . . . . . . . . . . . 156∫ baf dw Riemann-Stieltjes integral . . . . . . . . . 156

Sw(f,P) upper Darboux–Stieltjes sum . . . . . . . . 160Sw(f,P) lower Darboux–Stieltjes sum . . . . . . . . 160∫ baf dw upper Darboux-Stieltjes integral . . . . . . 160∫ baf dw lower Darboux-Stieltjes integral . . . . . . 160

List of Symbols xxi∑n an =

∑∞n=1 an infinite series of real numbers . . . . . . . 163

limm limn am,n iterated limit . . . . . . . . . . . . . . . . . 188limm,n am,n double limit . . . . . . . . . . . . . . . . . 189∑m,n am,n double infinite series . . . . . . . . . . . . . 190∑∞j=1

∑∞k=1 aj,k iterated series . . . . . . . . . . . . . . . . 190∑

n fn =∑∞n=1 fn infinite series of functions . . . . . . . . . . 204∑∞

n=0 cn(x− a)n power series in x about a . . . . . . . . . . . 211R = ρ−1 radius of convergence . . . . . . . . . . . . . 211(an

)generalized binomial coefficient . . . . . . . 216

(X, d) metric space . . . . . . . . . . . . . . . . . . 231‖x‖ norm of x . . . . . . . . . . . . . . . . . . 233(X , ‖ · ‖) normed vector space . . . . . . . . . . . . . 233d2 Euclidean metric on Rn . . . . . . . . . . . 233d1 `1 metric on Rn . . . . . . . . . . . . . . . 233d∞ max metric on Rn . . . . . . . . . . . . . . 233B(S) space of bounded f : S → R . . . . . . . . 233‖f‖∞ supremum norm f . . . . . . . . . . . . . . 233`∞ set of bounded sequences . . . . . . . . . . 234`1 set of summable sequences . . . . . . . . . 234‖a‖1 `1 norm of a . . . . . . . . . . . . . . . . . 234d× ρ product metric . . . . . . . . . . . . . . . . 234Br(x) open ball . . . . . . . . . . . . . . . . . . . 238Cr(x) closed ball . . . . . . . . . . . . . . . . . . 238Sr(x) sphere . . . . . . . . . . . . . . . . . . . . 238C([a, b]) space of cont. f on [a, b] . . . . . . . . . . 240D([a, b]) space of diff. f on [a, b] . . . . . . . . . . . 240[a : b] line segment from a to b . . . . . . . . . . . 241cl(E) closure of E . . . . . . . . . . . . . . . . . 243int(E) interior of E . . . . . . . . . . . . . . . . . 243bd(E) boundary of E . . . . . . . . . . . . . . . . 243lim{x→a, x∈E} f(x) limit of f along E . . . . . . . . . . . . . . 248lim(x,y)→(a,b) f(x, y) double limit . . . . . . . . . . . . . . . . . 249limx→a limy→b f(x, y) iterated limit . . . . . . . . . . . . . . . . . 249d(A) diameter of A . . . . . . . . . . . . . . . . . 261d(A,B) distance between A and B . . . . . . . . . . 261C(X,Y ) set of cont. f : X → Y . . . . . . . . . . . 263ext(E) exterior of E . . . . . . . . . . . . . . . . . 274C(X) space of cont. f X :→ R . . . . . . . . . . 275∂jf = fxj = ∂f

∂xjpartial derivative of f . . . . . . . . . . . . 289

∇f or grad f gradient of f . . . . . . . . . . . . . . . . . 289dfa : Rn → Rm differential of f at a . . . . . . . . . . . . . . 291f ′(a) Jacobian matrix of f at a . . . . . . . . . . 291Jf (a) = ∂(f1,...,fn)

∂(x1,...,xn) (a) Jacobian of f . . . . . . . . . . . . . . . . 292

xxii List of Symbols∂mf

∂xm1i1···∂xmn

in

higher order partial derivative . . . . . . . 318

Dmfx mth total differential of f . . . . . . . . . . 324(m

m1,m2,...,mn

)multinomial coefficient . . . . . . . . . . . 325

Tm(x,a) Taylor polynomial . . . . . . . . . . . . . . . 327Rm(x,a) Taylor remainder term . . . . . . . . . . . . 327λ∗ Lebesgue outer measure . . . . . . . . . . . 349M =M(Rn) Lebesgue measurable sets . . . . . . . . . . . 351λ = λn Lebesgue measure on Rn . . . . . . . . . . . 351B = B(Rn) Borel measurable sets . . . . . . . . . . . . 3561A indicator function of A . . . . . . . . . . . 362S+(F) set of F-measurable simple functions ≥ 0 . 362∫f dλ Lebesgue integral of f . . . . . . . . . . . . 370∫Ef dλ Lebesgue integral of f on E . . . . . . . . 370

L1(E) space of integrable functions on E . . . . . 370∫Rp∫Rqf(x, z) dz dx iterated integral . . . . . . . . . . . . . . . 388

f ∗ g convolution of f and g . . . . . . . . . . . 389αn volume of unit ball in Rn . . . . . . . . . . 390length(ϕ) length of curve ϕ . . . . . . . . . . . . . . 412∫ϕf ds line integral over ϕ . . . . . . . . . . . . . 415

~F vector field . . . . . . . . . . . . . . . . . . 416~Tϕ unit tangent vector field along ϕ . . . . . . 416f1 dx1 + · · ·+ fn dxn 1-form in Rn . . . . . . . . . . . . . . . . . . 417ω · ~H inner product of a form and vector field . . . 417ϕ parameterized m-surface . . . . . . . . . . 422Tϕ(u) tangent space of ϕ . . . . . . . . . . . . . . 422sign(ϕ) sign of parametrization ϕ . . . . . . . . . . 424∂ϕ⊥ normal vector to surface ϕ . . . . . . . . . 425~Nϕ normal vector field . . . . . . . . . . . . . 426ϕa : Ua → Sa local parametrization of S . . . . . . . . . 435ϕab transition mapping . . . . . . . . . . . . . 435Sa surface element . . . . . . . . . . . . . . . 435Rn−1

+ Rn−1 with xn−1 > 0 . . . . . . . . . . . . . 440Hn−1 Rn−1 with xn−1 ≥ 0 . . . . . . . . . . . . . 440∂Hn−1 boundary of Hn−1 . . . . . . . . . . . . . . 440∂S boundary of S . . . . . . . . . . . . . . . . 440S \ ∂S interior of S . . . . . . . . . . . . . . . . . 440dxj multidifferential . . . . . . . . . . . . . . . 448ωx differential form . . . . . . . . . . . . . . . . 451ω ∧ η wedge product . . . . . . . . . . . . . . . . 452dω differential of ω . . . . . . . . . . . . . . . 454ϕ∗ω pullback of ω by ϕ . . . . . . . . . . . . . . . 457area(ϕ) area of ϕ . . . . . . . . . . . . . . . . . . . 463∫ϕf dS integral of f on a para. surface . . . . . . . 463

List of Symbols xxiii∫ϕω integral of a form on a para. surface . . . . 466

L(U ,V) set of linear transformations T : U → V . . 510At transpose of A . . . . . . . . . . . . . . . . . 511[T ] matrix of T . . . . . . . . . . . . . . . . . 513detA determinant of A . . . . . . . . . . . . . . 514

Part I

Functions of One Variable

Chapter 1The Real Number System

If the notion of limit is the cornerstone of analysis, then the real numbersystem is the bedrock. In this chapter we provide a description of the realnumber system that is sufficiently detailed to allow a careful development oflimit in the various forms that appear in this book.

The real number system is defined as a nonempty set R together withtwo algebraic operations, called addition and multiplication, and an orderingless than that collectively satisfy three sets of axioms: the algebraic or fieldaxioms, the order axioms, and the completeness axiom. These are discussed inSections 1.2–1.4. We begin, however, with a brief description of how the realnumber system may be constructed from a more fundamental number system.

1.1 From Natural Numbers to Real NumbersA rigorous construction of the real number system starts with the set of

natural numbers (positive integers) N and then proceeds to the set of integersZ, the rational number system Q, and, finally, the real number system R. Inthis approach the natural numbers are assumed to satisfy a set of axiomscalled the Peano Axioms. These are used to define the operations of additionand multiplication in N. Subtraction is introduced by enlarging the systemof natural numbers to Z, thereby allowing solutions of all equations of theform x+m = n, m, n ∈ Z. To obtain division, Z is enlarged to Q by formingall quotients m/n, where m, n ∈ Z, n 6= 0. In this system, one may solve allequations of the form ax+ b = c, a 6= 0. The final step, the construction of Rfrom Q, may be viewed as “filling in the gaps” of the rational number line,these gaps corresponding to the so-called irrational numbers.1

For the details of this “bottom up” approach, the interested reader isreferred to [7] or [10]. We shall instead take a “top down” approach, describingthe real number system axiomatically.

1This step results in a system that, while having the structure necessary to formulate arobust theory of limits, does not allow solutions of all polynomial equations. This shortcomingis removed by introducing complex numbers, a subject outside the scope of this book.

3

4 A Course in Real Analysis

1.2 Algebraic Properties of RIn this section we list the axioms that govern the use of addition (+) and

multiplication (·) in the real number system. These axioms lead to all of thefamiliar algebraic properties of real numbers.

The operations of addition and multiplication satisfythe following field axioms, where a, b, c denote arbitrarymembers of R:

• Closure under addition: a+ b ∈ R.• Associative law of addition: (a+ b) + c = a+ (b+ c).• Commutative law of addition: a+ b = b+ a.• Existence of an additive identity: There exists a member 0 of R such

that a+ 0 = a for all a ∈ R.• Existence of additive inverses: For each a ∈ R there exists a member −a

of R such that a+ (−a) = 0.• Closure under multiplication: a · b ∈ R.• Associative law of multiplication: (a · b) · c = a · (b · c).• Commutative law of multiplication: a · b = b · a.• Existence of a multiplicative identity: There exists a real number 1 6= 0

such that a · 1 = a for all a ∈ R.• Existence of multiplicative inverses: For each a 6= 0 there exists a membera−1 of R such that a · a−1 = 1.

• Distributive law: a · (b+ c) = a · b+ a · c.

We use the following standard notation:

a− b = a+ (−b), ab = a · b, a

b= a/b = ab−1,

a+ b+ c = (a+ b) + c = a+ (b+ c), abc = (ab)c = a(bc),an = aa · · · a︸︷︷︸

n

, a−n = 1/an (a 6= 0), and a0 = 1.

We also use the summation and product symbols∑

and∏

defined byn∑

j=maj = am + am+1 + · · ·+ an and

n∏j=m

aj = amam+1 · · · an.

The field axioms may be used to derive the standard rules of algebra.Some of these are given in the following proposition; others may be found inExercise 1.

The Real Number System 5

1.2.1 Proposition. The following algebraic properties hold in R:

(a) The additive identity is unique; that is, if 0′ is a real number such thata+ 0′ = a for all a ∈ R, then 0′ = 0.

(b) The additive inverse of a real number is unique; that is, if a+ b = 0, thenb = −a.

(c) The multiplicative identity is unique; that is, if 1′ is a real number suchthat a · 1′ = a for all a ∈ R, then 1′ = 1.

(d) a · 0 = 0 for all a ∈ R.

(e) The multiplicative inverse of a nonzero real number is unique; that is, ifab = 1, then b = 1/a.

(f) If ab = 0, then either a = 0 or b = 0.

(g) If ab = ac and a 6= 0, then b = c.

(h) If b 6= 0 and d 6= 0, then a/b = c/d if and only if ad = bc.

(i) If a 6= 0 and b 6= 0, then (ab)−1 = a−1b−1, or 1ab

= 1a

1b.

Proof. (a) If a + 0′ = a for all a then, in particular, 0 + 0′ = 0. But, bydefinition of 0 and commutativity of addition, 0 + 0′ = 0′. Therefore 0′ = 0.

(b) By associativity and commutativity of addition,

b = b+ 0 = 0 + b = (−a+ a) + b = −a+ (a+ b) = −a+ 0 = −a.

(c) If a · 1′ = a for all a then, in particular, 1 · 1′ = 1. But, by definition ofthe multiplicative identity and commutativity of multiplication, 1 · 1′ = 1′.Therefore 1′ = 1.

(d) By the distributive property,

a · 0 = a(0 + 0) = a · 0 + a · 0.

Adding −(a · 0) to both sides of this equation and using associativity ofaddition produces the desired equation.

(e) By associativity and commutativity of multiplication,

b = 1 · b = (a−1a)b = a−1(ab) = a−1 · 1 = a−1.

(f) Assume a 6= 0. By (d) and commutativity and associativity of multiplica-tion,

0 = a−1 · 0 = (a−1)(ab) = (a−1a)b = 1 · b = b.


(g) By commutativity and associativity of multiplication,

b = 1 · b = (a−1a)b = a−1(ab) = a−1(ac) = (a−1a)c = c.

(h) If a/b = c/d, then multiplying both sides by bd and using the commutativityand associativity of multiplication we obtain ad = bc. Conversely, if ad = bc,then multiplying both sides by 1/(bd) yields a/b = c/d.

(i) By associativity and commutativity of multiplication,

(ab)(a−1b−1) = (aa−1)(bb−1) = 1.

Now apply (e).

The reader will notice that the assertions in the proposition are implications,that is, statements of the form p implies q, frequently written p ⇒ q. Suchassertions may be proved directly by assuming p and then deducing q, orindirectly by assuming the negation of q and arguing to a contradiction orto the negation of p. Part (h) also contains an assertion of the form p if andonly if q (hereafter, shortened to p iff q). Such an assertion is established byproving both p⇒ q and q ⇒ p. Throughout the text, we shall encounter manyexamples of such proofs. The reader is advised that a careful proof requires thateach (nontrivial) step be justified by citing hypotheses, appropriate axioms, orpreviously proved results.

One more point of logic: To prove that a general statement involving theuniversal quantifier “for every” (or “for all”) is false, one must construct acounterexample. For example, the assertion that xy = x+y for all real numbersx and y is clearly false. For a proof, one need only find a single pair of numbersx and y such that xy 6= x + y, for example x = y = 1. On the other hand,to prove that x2 − y2 = (x − y)(x + y) for all real numbers x and y, it notsufficient to verify the statement for a specific pair of numbers; a general proofis needed here. For details on constructing proofs in mathematics, the readeris referred to [2].

The number systems described in Section 1.1 are summarized as follows:

• N = {1, 2 := 1 + 1, 3 := 2 + 1, . . .} (positive integers),• Z = {0, ±1, ±2, ±3, . . .} (integers),• Q = {m/n : m,n ∈ Z, n 6= 0} (rational numbers),• I = R \Q (irrational numbers).

An integer is said to be even (odd) if n = 2k (n = 2k + 1) for some k ∈ Z.A precise definition of N is given in Section 1.5. From this it is possible

to argue rigorously that the number system N is closed under addition andmultiplication. As a consequence, Z is closed under addition, subtraction, andmultiplication, and Q is closed under addition, subtraction, multiplication, anddivision (Exercise 2).


Exercises1. Prove the following properties of addition and multiplication in R:

(a) −(−a) = a. (b)S −(ab) = (−a)b = a(−b).(c)⇓2 (−a)(−b) = ab. (d)S (−1)a = −a.

(e) If b, d 6= 0, then a/b

c/d= ad

bc= a

b

d

c.

(f)S If b 6= 0 and d 6= 0, then a

b+ c

d= ad+ bc

bd.

2. Let r, s ∈ Q. Assuming that Z is closed under addition and multiplication,prove that r ± s, rs, r/s ∈ Q, the last provided that s 6= 0.

3.S If r 6= 0 ∈ Q and x ∈ I, prove that r ± x, rx, r/x ∈ I.

4. Let n ∈ N. Prove the following identities without using mathematicalinduction:

(a) ⇓3 xn − yn = (x− y)n∑j=1

xn−jyj−1.

(b) xn + yn = (x+ y)n∑j=1

(−1)j−1xn−jyj−1 if n is odd.

(c) x−n − y−n = (y − x)n∑j=1

xj−n−1y−j if x 6= 0 and y 6= 0.

5.S Define 0! = 1 and, for n ∈ N, define n! = n(n− 1) · · · 2 · 1 (n factorial).Prove the following:

(a) (1− 1/n)(1− 2/n) · · ·(1− (n− 1)/n

)= n!nn

.

(b) 1 · 3 · 5 · · · (2n− 1) = (2n)!2nn! .

6. ⇓4 For n ∈ Z+ and k = 0, 1, . . . , n, define the binomial coefficient(n

k

)= n!k!(n− k)!

(read “n choose k”). Prove that(n+ 1k

)=(

n

k − 1

)+(n

k

).

2This exercise will be used in 1.3.2.3This exercise will be used in 4.1.2.4This exercise will be used in 1.5.5.

7. Without using mathematical induction, prove that for any n ∈ N,

(a)n∑k=0

1(k + 1)(n− k + 1) = 2

n+ 2

n∑k=0

1k + 1 .

(b)n∑k=0

1(2k + 1)(2n− 2k + 1) = 1

n+ 1

n∑k=0

12k + 1 .

8.S Find a polynomial f(x) of degree 2 such that∑nk=1 f(k) = n3 for all n.

1.3 Order Structure of RThe order relation on R is derived from the following order axiom.

There exists a nonempty subset P of R, closed under addi-tion and multiplication, such that for each x ∈ R exactlyone of the following holds: x ∈ P, −x ∈ P, or x = 0.

The last part of the axiom is known as the trichotomy property. A real numberx is called positive if x ∈ P and negative if −x ∈ P.

1.3.1 Definition. Let a and b be real numbers. If b− a ∈ P, we write a < bor b > a and say that a is less than b or that b is greater than a. ♦

1.3.2 Proposition. The order relation < on R has the following properties:

(a) a −b (reflection property).(b) If a 0, then ac < bc (multiplication property).(e) For a, b ∈ R, exactly one of the following is true: a = b, a < b, or b < a

(trichotomy property).(f) If x 6= 0, then x2 > 0. In particular, 1 > 0.

Proof. (a) a −b.(b) By hypothesis, b− a ∈ P and c− b ∈ P, hence, by closure under addition,

c− a = (b− a) + (c− b) ∈ P, that is, a < c.(c) Similar to (b).(d) Since b− a, c ∈ P, bc− ac = (b− a)c ∈ P, that is, ac < bc.(e) This follows by applying the trichotomy property of P to a− b.(f) If x > 0, then, by closure of P under multiplication, x2 > 0 . If x < 0, then−x > 0 so, by Exercise 1.2.1(c), x2 = (−x)(−x) > 0.

1.3.3 Definition. Let a and b be real numbers. If either a < b or a = b, wewrite a ≤ b or b ≥ a and say that a is less than or equal to b or that b is greaterthan or equal to a. If A ⊆ R, we define A+ = {x ∈ A : x ≥ 0}. ♦

Note that by the trichotomy property,

a ≤ b and b ≤ a⇒ a = b. (1.1)

The inequality a ≤ b is sometimes called weak inequality in contrast tostrict inequality a < b. The reader may check that parts (a)–(d) of the aboveproposition are valid if strict inequality is replaced by weak inequality.

1.3.4 Definition. The absolute value of a real number x is defined by

|x| ={

x if x ≥ 0,−x if x < 0.

♦

For example, |0| = 0 and |2| = | − 2| = 2.

1.3.5 Proposition. Absolute value has the following properties:

(a) |x| ≥ 0. (b) |x| = 0 iff x = 0. (c) | − x| = |x|.

(d) − |x| ≤ x ≤ |x|. (e) |xy| = |x| |y|. (f)∣∣∣∣xy∣∣∣∣ = |x||y|

(y 6= 0).

(g) |x+ y| ≤ |x|+ |y|. (h)∣∣ |x| − |y| ∣∣ ≤ |x− y|. (triangle inequalities)

Proof. Properties (a)–(e) are easily established by considering cases. For ex-ample, in (e), if x ≥ 0 and y ≤ 0, then xy ≤ 0, hence

|xy| = −(xy) = x(−y) = |x| |y|.

For part (f), use (e) to obtain

|x| =∣∣∣∣xy y

∣∣∣∣ =∣∣∣∣xy∣∣∣∣ |y|,

and then divide both sides by |y|.For (g), we have ±x ≤ |x| and ±y ≤ |y| by (d), hence ±(x+ y) ≤ |x|+ |y|.

Since one of the signed quantities on the left is |x+ y|, the assertion follows.From (g) we have

|x| = |(x− y) + y| ≤ |x− y|+ |y|,

hence |x| − |y| ≤ |x− y|. Switching x and y and using (c) yields (h).

1.3.6 Definition. Let S be a nonempty set of real numbers. The largestelement or maximum of S is a member maxS of S that satisfies

maxS ≥ s for all s ∈ S.

The smallest element or minimum of S, denoted by minS, is defined analo-gously.

A set may not have a largest or smallest member. The existence ofmaxSand minS for a nonempty finite set may be established by mathematicalinduction. (See Exercise 1.5.2.)

1.3.7 Definition. The positive and negative parts of a real number x aredefined by

x+ = max{x, 0} and x− = max{−x, 0}. ♦

ExercisesProve the following:

1. (a) If ab > 0, then a and b have the same sign.(b) a > 0 iff 1/a > 0.(c)S Suppose either b, d < 0 or b, d > 0. Then a/b > c/d iff ad > bc.

2. If x > 1, then x2 > x. If 0 < x < 1, then x2 < x.

3. (a) If 0 < x < y and 0 < a < b, then 0 < ax < by.(b) If x < y < 0 and a 0. Then x < y iff x2 < y2.

4.S If either 0 < x < y or x < y < 0, then 1/y < 1/x.

5. If −1 < x < y or x < y < −1, then x/(x + 1) < y/(y + 1). What ifx < −1 < y?

6. If 0 < x < y and n ∈ N, then

(a)S 0 < yn − xn ≤ n(y − x)yn−1, (b) ny + 1nx+ 1 <

(n+ 1)y + 1(n+ 1)x+ 1 .

7. If x > 1, m, n ∈ N, and x− 1x

<m

n< 1, then n > x.

8.S If a 0, then a ≤ b.

11. If 0 < a ≤ bx for every x > 1, then a ≤ b.

12. If a/x ≤ x+ 1 for every x > 0, then a ≤ 0.

13. For all x, y, z, w ∈ R,

(a) 2xy ≤ x2 + y2. (b) S xy + yz + xz ≤ x2 + y2 + z2.

(c) (xy + zw)2 ≤ (x2 + z2)(y2 + w2). (d) (x+ y)2 ≤ 2(x2 + y2).

14.S If x, a > 0, then x+ a2/x ≥ 2a. Equality holds iff x = a.

15. (a) |x− y| ≤ |x− z|+ |z − y|.(b) |x− L| < ε iff L− ε < x < L+ ε.

16. Let S, T ⊆ R be finite and nonempty. Define −S := {−s : s ∈ S}. Then(a) max(−S) = −minS.(b) min(−S) = −maxS.(c) max(S ∪ T ) = max{maxS,max T}.(d) min(S ∪ T ) = min{minS,minT}.

17. For any x, y ∈ R,(a) x+ ≥ 0, x− ≥ 0, x = x+ − x−, and |x| = x+ + x−.(b) x+ =

(|x|+ x

)/2 and x− =

(|x| − x

)/2.

(c) x = y − z and |x| = y + z imply y = x+ and z = x−.(d) (x+ y)+ ≤ x+ + y+ and (x+ y)− ≤ x− + y−.(e) (x− y)− ≤ y, if x, y ≥ 0.

18.S If a ≤ x ≤ b, then |x| ≤ max{|a|, |b|}.

19. (a) max{x, y} =(x+ y + |x− y|

)/2.

(b) min{x, y} =(x+ y − |x− y|

)/2.

20. (a) max{a, b, c} = 14(a+ b+ 2c+ |a− b|+

∣∣ a+ b− 2c+ |a− b|∣∣).

(b) min{a, b, c} = 14(a+ b+ 2c− |a− b| −

∣∣ a+ b− 2c− |a− b|∣∣).

21.S Let S = {a1, . . . , an}, where a1 < · · · < an. Let 1 ≤ k < n and denoteby S1, . . . , Sm the subsets obtained by removing exactly k members fromS, where m =

(n

k

)is the binomial coefficient (see Theorem 1.5.5). Then

max{

minS1, . . . ,minSm}

= ak+1.

1.4 Completeness Property of RA system (F,+, ·, <) with the algebraic and order properties described

in Sections 1.2 and 1.3 is called an ordered field. By Exercise 1.2.2, Q is anordered field under the algebraic operations and order relation inherited fromR. The same is true for the set

Q(√

2) := {x+√

2 y : x, y ∈ Q}

(Exercise 19). This suggests that there are infinitely many ordered subfields of R.The property that distinguishes R from all other ordered fields is completeness,described in this section.

1.4.1 Definition. A nonempty subset A of an ordered field F is said to bebounded above if there exists a member u ∈ F, called an upper bound of A,such that a ≤ u for all a ∈ A. The notions of bounded below and lower boundare defined analogously. The set A is said to be bounded if it is bounded aboveand below. Any set that is not bounded is said to be unbounded (either aboveor below). ♦

The subsets Q and Z of R are neither bounded above nor bounded below;N is bounded below but not above. The set {n/(n+ 1) : n ∈ N} is boundedabove by 1 and below by 1/2.

1.4.2 Definition. Let A be a nonempty subset of an ordered field F. Anupper bound u0 of A with the property that u0 ≤ u for all upper bounds u ofA is called a least upper bound, or supremum, of A, and is denoted by supA.A lower bound `0 of A such that ` ≤ `0 for all lower bounds ` of A is called agreatest lower bound, or infimum, of A, and is denoted by inf A. If supA ∈ A,then supA is called the maximum of A. If inf A ∈ A, then inf A is called theminimum of A. ♦

inf A supAr aa

A

FIGURE 1.1: Supremum and infimum of A.

It follows from (1.1) that the supremum or infimum of a set, if it exists, isunique.

It is not necessarily the case that a nonempty bounded subset of an orderedfield has an infimum or supremum. For example, because

√2 is not rational

(1.4.11 below), the bounded set {x ∈ Q : x2 < 2} has neither an infimum nora supremum in Q.

The following proposition will be used frequently in ensuing discussionsinvolving suprema and infima.

1.4.3 Approximation Property. Let A be a nonempty subset of an orderedfield F.

(a) If supA exists, then for each r with r < supA there exists a ∈ A such thatr < a ≤ supA.

(b) If inf A exists, then for each r with inf A < r there exists a ∈ A such thatinf A ≤ a < r.

Proof. If r < supA, then r cannot be an upper bound for A, hence there existsa ∈ A with r < a. The proof of (b) is similar.

We may now state the property that distinguishes the real number systemfrom all other ordered fields.

Every nonempty subset of R that isbounded above has a least upper bound.

This axiom is known as the completeness property of R. It is the key ingredientneeded for the formulation of a useful and robust theory of limits. From thecompleteness property one may deduce (Exercise 1) the symmetrical property

Every nonempty subset of R that isbounded below has a greatest lower bound.

The real number system may now be described as a complete ordered field.It may be shown that (up to isomorphism) there is exactly one such structure.

The following important consequence of completeness is useful in determin-ing the infimum or supremum of certain sets. It asserts that positive integermultiples of a positive real number may be made arbitrarily large.

1.4.4 Archimedean Principle. For any real numbers a and b with a > 0there exists n ∈ N such that na > b.

Proof. Suppose, for a contradiction, that na ≤ b for all n ∈ N. The setS = {na : n ∈ N} is then bounded above and hence has a least upper boundu. Since u − a < u, the approximation property for suprema implies thatu− a < na for some n ∈ N. But then u < (n+ 1)a ∈ S, contradicting that uis an upper bound for S.

1.4.5 Example. Let

A ={

(−1)n n

n+ 1 : n ∈ N}

={− 1

2 ,23 ,−

34 , . . .

}.

Since A is bounded above by 1 and below by −1, −1 ≤ inf A ≤ supA ≤ 1. Let0 < r < 1. By the Archimedean principle we may choose an even integer nsuch that n > r/(1− r). Then r < n/(n+ 1) ∈ A, which shows that r cannotbe an upper bound of A. Therefore, supA = 1. Similarly, inf A = −1. ♦

1.4.6 Well-Ordering Principle. Every nonempty subset A of N has a small-est member.Proof. Since A is bounded below by 1, it has a greatest lower bound `. Thetheorem will follow if we show that ` ∈ A. Suppose, for a contradiction, that` 6∈ A. By the approximation property for infima, there exists a ∈ A suchthat ` < a < ` + 1. Choose any real number r with ` < r < a, for example,r = (a+ `)/2. By the approximation property again, there exists a′ ∈ A suchthat ` < a′ < r. We now have ` < a′ < a < `+ 1, which implies that a− a′ isan integer strictly between 0 and 1. As this is impossible,5 it follows that `must be a member of A.

1.4.7 Greatest Integer Function. For each x ∈ R there exists a uniqueinteger bxc such that x− 1 < bxc ≤ x.Proof. The uniqueness is clear. To prove existence, apply the Archimedeanprinciple twice: first to obtain an integer k such that x + k ≥ 1 and thento conclude that the set A := {n ∈ N : n > x + k} is nonempty. By thewell-ordering principle, A has a least member a. Since 1 ≤ x + k < a, a − 1is a positive integer. Since a− 1 < a, a− 1 cannot be in A so x+ k ≥ a− 1.Therefore, x− 1 < a− 1− k ≤ x, hence the integer bxc := a− 1− k has therequired property.

x

y

1 2 3−1−2

1

2

3

−1

−2

−3

−3

FIGURE 1.2: Greatest integer function.

The integer bxc is called the greatest integer in x or the floor of x. Thegreatest integer function allows a simple proof of the following importantresult:

5This is intuitively clear. The abstract definition of N given in Section 1.5 may be usedto give a rigorous proof.

1.4.8 Density of the Rationals. Between any pair of distinct real numbersthere is a rational number.Proof. Let a < b. By the Archimedean principle, n(b− a) > 1 for some n ∈ N.Let m := bnac+ 1. Then na < m ≤ na+ 1 < nb, hence a < m/n < b.

1.4.9 Definition. (nth roots). Let n be a positive integer and let b > 0. Theunique positive solution of the equation xn = b is called the positive nth rootof b and is denoted by b1/n. For m ∈ Z we define bm/n = (b1/n)m. As usual wewrite

√b for b1/2. ♦

The existence of b1/n is an easy consequence of the intermediate valuetheorem, proved in Chapter 3. Uniqueness follows from Exercise 1.2.4(a).We omit the straightforward (but admittedly tedious) proof of the followingtheorem that summarizes the familiar rules of rational exponentiation.1.4.10 Theorem. For r, s ∈ Q and positive real numbers a, b,

brbs = br+s,br

bs= br−s, (br)s = brs, and (ab)r = arbr.

The following proposition gives a simple way to generate irrational numbers.1.4.11 Proposition. If n is positive integer that is not a perfect square, then√n is irrational.

Proof. By definition of the greatest integer function,√n − 1 < b

√nc ≤

√n.

Since√n is assumed not to be an integer, the second inequality is strict,

hence 0 <√n− b

√nc < 1. Suppose, for a contradiction, that

√n is rational.

Then the set A := {m ∈ N : m√n ∈ N} is nonempty. By the well-ordering

principle, A has a least member m0. In particular, m0√n ∈ N, hence both of

the quantities

m := m0(√n− b

√nc)

and m√n = m0

(n−√nb√nc)

are positive integers. But then m ∈ A, which is impossible since m < m0.Therefore,

√n must be irrational.

In later chapters, we shall see other important examples of irrationalnumbers, notably the base e of the natural logarithm.1.4.12 Definition. The extended real number system is the set

R := R ∪ {−∞,+∞},

where +∞, −∞ are symbols with the following prescribed properties:

−∞ < x <∞ for all x ∈ R,x+∞ = +∞ if −∞ < x ≤ +∞, x−∞ = −∞ if −∞ ≤ x < +∞,x · (+∞) = +∞ if 0 < x ≤ +∞, x · (+∞) = −∞ if −∞ < x < 0,x · (−∞) = −∞ if 0 < x < +∞, x · (−∞) = +∞ if −∞ ≤ x < 0,

x

+∞ = x

−∞= 0 if −∞ < x < +∞. ♦

The above algebraic conventions are derived from limit considerations. Notethat the operations

±∞∓∞, (±∞) · (∓∞), ±∞±∞

,±∞∓∞

, and 0 · (±∞) (1.2)

are not defined.

1.4.13 Definition. If A 6= ∅ is not bounded above, we set supA = +∞.Similarly, if A is not bounded below, we set inf A = −∞. We also definesup ∅ = −∞ and inf ∅ = +∞. ♦

The reader may verify that the approximation properties for suprema andinfima given in 1.4.3 hold in the extended system R.

1.4.14 Definition. An interval in R is a nonempty set I with the propertythat a, b ∈ I and a < x < b imply that x ∈ I. An interval containing morethan one point is said to be nondegenerate. ♦

Arguing cases, one may show that the definition of interval reduces to thefollowing familiar subsets of R:

(a, b) := {x : a < x < b}, (a, b] := {x : a < x ≤ b},[a, b) := {x : a ≤ x < b}, [a, b] := {x : a ≤ x ≤ b}.

For example, if I is unbounded below and bounded above with b := sup I ∈I, then I = (−∞, b]. If, instead, I is bounded below and above such thata := inf I ∈ I and b := sup I 6∈ I, then I = [a, b). Intervals that contain theirendpoints are said to be closed; those that don’t contain any endpoints arecalled open.

The length |a− b| of a finite interval I with endpoints a, b will be denotedby |I|. Note that the length of a degenerate interval is zero.

Exercises1. Prove that inf (−A) = − supA, where −A := {−a : a ∈ A}. Conclude

that every nonempty subset of R that is bounded below has a greatestlower bound.

2. Find the supremum and infimum of the following sets, where rn denotesthe remainder on division of n ∈ N by 3. 6

(a) S {(−1)n(r2n + 3rn + 2) : n ∈ N}. (b) S

{(−1)nn(rn − 1)(n+ 1)(rn + 1) : n ∈ N

}.

(c){

(−1)n 3n+ 22n+ 3 : n ∈ N

}. (d)

{[(−1)bn/3c − 1

]n

n+ 1 : n ∈ N

}.

6For the existence of rn, see Exercise 1.5.15.

3. Find the supremum and infimum of the following sets.

(a) {x : x2 − 5x+ 6 < 0}. (b) {x : (x+ 3)(x− 4) < −6}.(c) S {x : (x− 4)/(x− 3) < −2}. (d) S {x : x− 2 < 1/(x− 1)}.(e) S {x : (x− 1)/x < 4}. (f) {x > 0 : x/(2− x) > 3}.(g) {x : |x2 − 3x+ 2| ≤ 1/4}. (h) S {x : |x− 1|+ |x− 2| ≤ 3}.(i) S {x :

√x− 1/8 > x}. (j) {x :

√x+ 1/8 > x}.

(k) {x : 2|x− 1|+ 3|y − 2| < 6 for some y ∈ R}.(l) {x : 2

∣∣x2 − 1∣∣+ 3

∣∣y2 − 2∣∣ < 6 for some y ∈ R}.

(m)S{

(−1)n(sin(nπ/2)− n−1) : n ∈ N

}.

(n){

(−1)n(sin(mπ/2)− n−1) : m, n ∈ N

}.

4. Let A ⊆ B be nonempty subsets of R. Prove that supA ≤ supB andinf A ≥ inf B.

5.S ⇓7 For a nonempty bounded set A define |A| := {|a| : a ∈ A}. Provethat sup |A| − inf |A| ≤ supA− inf A. Hint. Use |x| − |y| ≤ |x− y|.

6. For r ∈ Q, x ∈ R, and nonempty subsets A and B of R, define

xA = {xa : a ∈ A} A+B = {a+ b : a ∈ A, b ∈ B}AB = {ab : a ∈ A, b ∈ B} Ar = {ar : a ∈ A}, A ⊆ (0,+∞).

Under the conventions described in 1.4.12, prove that

(a) sup (A+B) ≤ supA+ supB, inf (A+B) ≥ inf A+ inf B.(b)S sup (xA) = x supA, inf (xA) = x inf A if x ≥ 0.(c) sup (AB) ≤ (supA)(supB) and inf (AB) ≥ (inf A)(inf B)

if A, B ⊆ (0,∞).(d) supAr = (supA)r, inf Ar = (inf A)r if A ⊆ (0,∞) and r > 0.(e) supA−1 = 1/ inf A, inf A−1 = 1/ supA if A ⊆ (0,∞).

7. Let A ⊆ R be nonempty such that inf{|x − y| : x, y ∈ A, x 6= y} > 0(for example, any set of integers). If A is bounded above, prove thatsupA ∈ A, that is, A has a maximum.

8. Let A be a nonempty bounded set and let r ∈ R such that x− y < r forall x, y ∈ A. Show that supA− inf A ≤ r.

9.S Prove that between any pair of distinct real numbers there is an irrationalnumber.

7This exercise will be used in 5.2.6.

10. Prove that between any pair of real numbers a < b there exist infinitelymany rational numbers and infinitely many irrational numbers.

11. (Density of the dyadic rationals). Prove that for each pair of real numbersa n, a consequence of the binomialtheorem, proved in the next section.) A number of the form m/2n iscalled a dyadic rational.

12. Prove:(a) bxc = b−xc iff x = 0. (b)S bxc = −b−xc iff x ∈ Z.(c)S −1 < x+ b−xc ≤ 0. (d) bxc+ bm− xc = m or m− 1.

13. Let m ∈ Z, n ∈ N, xj ∈ R, and define

s :=n∑j=0

xj and t :=n∑j=0bxjc.

Prove:(a) 0 ≤ bsc − t ≤ n. (b) k ≤ s− t < k + 1 for some k = 0, 1, . . . , n.

14.S Let b > 0. Prove that bm/n = (bm)1/n.

15. ⇓8 Prove that for a, b > 0 and n ∈ N,

a1/n − b1/n = (a− b)( n∑j=1

a1−j/nb(j−1)/n)−1

.

16. Show that if 0 ≤ a < b and n ∈ N, then a1/n < b1/n.

17.S Prove that if A is a bounded set, then there exists an integer N suchthat |x| ≤ N for all x ∈ A.

18. Let a, b ∈ Q \ {0} and n ∈ N. Prove that x := a+ b√n is irrational iff n

is not a perfect square.

19. Show that if x, y ∈ Q(√

2), then x±y, xy, x/y ∈ Q(√

2), the last providedthat y 6= 0. Conclude that Q(

√2) is an ordered subfield of R. Show that

Q(√

2) is not complete.

20.S (a) Find all n ∈ N such that√n+ 11 +

√n ∈ Q.

(b) Same question for√n+ 21 +

√n.

21. Let p ∈ N be prime, that is, divisible only by 1 and itself. Prove that(√n+ 1)(√n+ p+ 1)−1 ∈ Q iff n = (p− 1)2/4.

1.5 Mathematical InductionIn this section we give an abstract characterization of the natural number

system. This will lead directly to the principle of mathematical induction.

1.5.1 Definition. A set S of real numbers is said to be inductive if

• 1 ∈ S,

• x ∈ S implies x+ 1 ∈ S.

The set N of natural numbers is then defined as the intersection of all inductivesubsets of R. ♦

The sets (a,+∞), and (a,+∞) ∩ Q, a < 1, are clearly inductive. Moreimportantly, N itself is inductive. Indeed, since 1 is common to all inductivesets, 1 ∈ N, and if n is common to all inductive sets, then so is n + 1. Wemay therefore characterize N as the smallest inductive set (in the sense of setinclusion). The principle of mathematical induction follows immediately fromthis characterization:

1.5.2 Principle of Mathematical Induction. For each n ∈ N, let P (n) bea statement depending on n. Suppose that

(a) P (1) is true,

(b) P (n+ 1) is true whenever P (n) is true.

Then P (n) is true for all n.

Proof. Let S denote the set of n ∈ N for which P (n) is true. Then (a) and (b)imply that S is inductive and hence, as a subset of N, must in fact equal N.

In a particular application of 1.5.2, part (a) is called the base step and part(b) the inductive step. The assumption in (b) that P (n) is true is called theinduction hypothesis.

The principle of mathematical induction has been loosely described as the“domino principle”: If dominoes are lined up vertically in such a way that the(n+ 1)st domino will fall if the nth one falls, then, if the first domino is tipped,all the dominoes will fall.

Mathematical induction may be used to give a rigorous proof that N isclosed under addition: Let P (n) be the statement that n+m ∈ N for all m ∈ N.Then P (1) is true because N is inductive, and if, for some n, P (n) is true, thatis, if n+m ∈ N for all m, then clearly P (n+ 1) is true. A similar argumentshows that N is closed under multiplication.

Mathematical induction is indispensable in proving many useful inequalitiesand formulas. We offer two examples; others may be found in the exercises.

1.5.3 Example. We prove by induction that 3nn! > nn for all n ∈ N. Thisis obvious for n = 1. For the induction step, we need the fact (verified inExample 2.2.4) that (1 + 1/n)n < 3, or equivalently, (n+ 1)n < 3nn, for all n.Assuming this, we see that if 3nn! > nn, then

3n+1(n+ 1)! = 3(n+ 1)3nn! > 3(n+ 1)nn > (n+ 1)n+1. ♦

1.5.4 Example. We derive a closed formula for f(n) :=∑nk=1(3k − 1)2 and

then verify the result by induction. A little experimentation suggests that weshould try a polynomial in n of degree 3, say g(n) := An3 +Bn2 + Cn+D.Then

g(n+ 1)− g(n) = A[(n+ 1)3 − n3]+B

[(n+ 1)3 − n2]+ C

[(n+ 1)− n

]= 3An2 + (3A+ 2B)n+A+B + C

and

f(n+1)−f(n) =n+1∑k=1

(3k−1)2−n∑k=1

(3k−1)2 =[3(n+1)−1

]2 = 9n2 +12n+4.

Assuming that f(n) = g(n) for all n, we may equate coefficients to obtainA = 3, B = 3/2, and C = −1/2. Since f(1) = 4, we see that D = 0. Thus,under the assumption that the sum has a closed form that is a cubic polynomial,we obtain the formula

n∑k=1

(3k − 1)2 = 3n3 + 32n

2 − 12n.

To prove the validity of the formula we use induction. When n = 1, eachside equals 4. Assuming the formula holds for n, we haven+1∑k=1

(3k−1)2 =n∑k=1

(3k−1)2 +[3(n+1)−1

]2 =[3(n+1)−1

]2 +3n3 + 32n

2− 12n.

A little algebra shows that the last expression reduces to

3(n+ 1)3 + 32 (n+ 1)2 − 1

2 (n+ 1).

Thus the formula holds for n+ 1, completing the induction. ♦

The stalwart reader may wish to use the methods of the last example toderive and then verify by induction the formula

n∑k=1

k4 = n

30(6n4 + 15n3 + 10n2 − 1

).

There are many other types of applications of the principle of mathematicalinduction, some of which are given in the exercises. The following has importantconsequences in combinatorics, probability theory, and infinite series.

1.5.5 Binomial Theorem. Let a, b ∈ R and n ∈ N. Then

(a+ b)n =n∑k=0

(n

k

)akbn−k, where

(n

k

):= n!

k!(n− k)! .

Proof. For n = 1 the formula asserts that

a+ b =(

10

)a0b1 +

(11

)a1b0,

which follows from the convention 0! = 1. Suppose that the formula holds forsome n ≥ 1. Writing (a + b)n+1 as (a + b)(a + b)n and using the inductionhypothesis, we have

(a+ b)n+1 =n∑k=0

(n

k

)ak+1bn−k +

n∑k=0

(n

k

)akbn+1−k

=n∑k=1

(n

k − 1

)akbn+1−k +

n∑k=1

(n

k

)akbn+1−k + an+1 + bn+1

=n∑k=1

[(n

k − 1

)+(n

k

)]akbn+1−k + an+1 + bn+1

=n+1∑k=0

(n+ 1k

)akbn+1−k,

where, for the last step, we used Exercise 1.2.6. By induction, the formulaholds for all n.

Exercises1. ⇓9 Let 0 < a < x1, y1 < b := a+ 1 and define

xn+1 = a+√|xn − a| and yn+1 = b−

√|b− yn|.

Prove that a < xn < xn+1 < b and a < yn+1 < yn < b for all n ∈ N.

2. Use induction to prove that a nonempty finite set has a maximum and aminimum.

3.S ⇓10 Verify by induction that

2n∑k=1

(−1)k+1

k=

2n∑k=n+1

1k

for all n ≥ 1.

9This exercise will be used in 2.2.3.10This exercise will be used in 6.4.8.

4. Establish the following formulas by mathematical induction:

(a)n∑k=1

k = n(n+ 1)/2. (b)n∑k=1

k2 = n(n+ 1)(2n+ 1)/6.

(c)n∑k=1

k3 = [n(n+ 1)/2]2 . (d)n∑k=1

(2k − 1)2 = n(4n2 − 1)/3.

(e)n∑k=1

(2k − 1)3 = n2(2n2 − 1). (f)n∑k=1

1√k − 1 +

√k

=√n.

(g)n∑k=1

(4k3 − 6k2 + 4k − 1) = n4. (h)n∑k=1

2k +√k(k − 1)− 1√

k +√k − 1

= n√n.

5.S Use the methods of 1.5.4 to derive and verify a closed formula for∑nk=1(5k − 4)2.

6. Use known formulas to calculate

(a) 1 · 2 + 2 · 3 + 3 · 4 + · · ·+ 999 · 1000.(b)S 1 · 3 + 3 · 5 + 5 · 7 + · · ·+ 999 · 1001.(c) 1 · 3 + 5 · 7 + 9 · 11 + · · ·+ 1001 · 1003.

7.S Use the principle of mathematical induction to prove the followingvariant: Let n0 ∈ Z and let P (n) is a statement depending on integersn ≥ n0 such that(a) P (n0) is true,(b) if n ≥ n0 and P (n) is true, then P (n+ 1) is true.Then P (n) is true for every n ≥ n0.

8. Use the variant of mathematical induction in Exercise 7 to verify thefollowing inequalities. (For (e) use (1 + 1/n)n > 2, an easy consequenceof the binomial theorem.)

(a) S 2n+ 1 < 2n, n ≥ 3. (b) n2 < 2n, n ≥ 5.(c) 2n < n!, n ≥ 4. (d) 3n < n!, n ≥ 7.(e) S 2nn! < nn, n ≥ 6. (f) 8nn! < (2n)!, n ≥ 6.

9.S Use the variant of mathematical induction in Exercise 7 to prove thatn < ln(n!), n ≥ 6.

10. Prove Bernoulli’s inequality: (1 + x)n ≥ 1 + nx, n ∈ Z+, x ≥ −1.


11. Use the principle of mathematical induction to prove the following variant:Let n0 ∈ Z and let P (n) be a statement depending on integers n ≥ n0such that(a) P (n0) is true,(b) P (n+ 1) is true whenever P (j) is true for all n0 ≤ j ≤ n.Then P (n) is true for every n ≥ n0.

12. (Prime Factorization). Use the variant of induction in Exercise 11 toprove that every integer n ≥ 2 may be written as a product of powers ofprime numbers (for example, 72 = 23 · 32).

13.S The Fibonacci numbers fn are defined recursively by

f0 = f1 = 1 and fn+1 = fn + fn−1, n ≥ 1.

Use the variant of induction in Exercise 11 to prove that

fn = 1√5(an+1 − bn+1), a := 1 +

√5

2 , b := 1−√

52 ,

where a, b are the zeros of x2 − x− 1.

14. Let a0 and a1 be arbitrary and define

an+1 = 12 (an + an−1), n ≥ 1.

Use the variant of induction in Exercise 11 to prove that for all n ≥ 0,

an = (−1)n3 · 2n−1 (a0 − a1) + 1

3(a0 + 2a1).

15.S (Division algorithm). Prove that for each pair of integers m and n withn > 0 there exist unique integers q and r such that

m = qn+ r and 0 ≤ r ≤ n− 1.

(The integer q is called the quotient and r the remainder on division ofm by n.)

16. Use the variant of induction in Exercise 11 to prove that each n ∈ Nmay be uniquely expressed in the form

∑pk=0 dk10k for some p ∈ N and

dk ∈ {0, 1, . . . , 9}. The representation

n = dpdp−1 . . . d0

is called the decimal positional notation for n.


1.6 Euclidean SpaceThe real number system may be used to construct other important math-

ematical systems, such as n-dimensional Euclidean space and the complexnumber system. In this section we construct the former. The reader may delayreading this section, as the material will not be needed until Chapter 8.

For n ∈ N, let Rn denote the set of all n-tuples x := (x1, x2, . . . , xn),where xj ∈ R. Each such n-tuple is called a point or vector, depending oncontext. The distinction between points and vectors is important in physicsand geometry, as it allows one to refer to a vector at a point, a notion usefulin describing, say, forces or tangent vectors.

The set Rn has an algebraic structure which is defined as follows: Let

x = (x1, . . . , xn), y = (y1, . . . , yn), and t ∈ R.

The operations of addition x+ y and scalar multiplication tx in Rn are thendefined by

x+ y = (x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn), andtx = t(x1, . . . , xn) = (tx1, . . . , txn).

We also define

−x := (−x1, . . . ,−xn) and 0 := (0, . . . , 0).

The following theorem asserts that Rn is a vector space under these operations(see Appendix B). The straightforward proof is left to the reader.

1.6.1 Theorem. Addition and scalar multiplication on Rn have the followingproperties:

• associativity of addition: (x+ y) + z = x+ (y + z);

• commutativity of addition: x+ y = y + x;

• existence of an additive identity: x+ 0 = x;

• existence of additive inverses: x+ (−x) = 0;

• associativity of scalar multiplication: (st)x = s(tx);

• distributivity of a scalar over vector addition: s(x+ y) = sx+ sy;

• distributivity of a vector over scalar addition: (s+ t)x = sx+ tx;

• existence of a scalar multiplicative identity: 1x = x.


1.6.2 Definition. Let x = (x1, . . . , xn) and y = (y1, . . . , yn). The Euclideaninner product x · y of x and y and the Euclidean norm ‖x‖2 of x are definedby

x · y =n∑j=1

xjyj and ‖x‖2 =( n∑j=1

x2j

)1/2=√x · x.

The set Rn with its vector space structure and the Euclidean inner product iscalled n-dimensional Euclidean space. ♦

The structure of Euclidean space allows one to define lines, planes, length,perpendicularity, angle between vectors, etc. These ideas will be useful in laterchapters.

1.6.3 Theorem. The inner product in Rn has the following properties:

(a) x · x = ‖x‖22.

(b) x · y = y · x (commutativity).

(c) t(x · y) = (tx) · y = x · (ty) (associativity).

(d) x · (y + z) = (x · y) + (x · z) (additivity).

(e) |x · y| ≤ ‖x‖2 ‖y‖2 (Cauchy–Schwartz inequality).

Proof. Properties (a) and (b) are immediate and parts (c) and (d) followrespectively from the calculations

t

n∑j=1

xjyj =n∑j=1

(txj)yj =n∑j=1

xj(tyj) andn∑j=1

xj(yj + zj) =n∑j=1

xjyj +n∑j=1

xjzj .

The inequality in (e) holds trivially if y = 0. Suppose y 6= 0, so ‖y‖2 6= 0.By properties (a)–(d),

0 ≤ ‖x− ty‖22 = (x− ty) · (x− ty) = ‖x‖22 − 2t(x · y) + t2‖y‖22.

Setting t = (x · y)/‖y‖22, we obtain

0 ≤ ‖x‖22 − 2(x · y)2/‖y‖22 + (x · y)2/‖y‖22 = ‖x‖22 − (x · y)2/‖y‖22,

which implies that (x · y)2 ≤ ‖x‖22 ‖y‖22. Taking square roots yields (e).

1.6.4 Theorem. The Euclidean norm on Rn has the following properties:

(a) ‖x‖2 ≥ 0 (nonnegativity).

(b) ‖x‖2 = 0 iff x = 0 (coincidence).

(c) ‖tx‖2 = |t| ‖x‖2 (absolute homogeneity).

(d) ‖x+ y‖2 ≤ ‖x‖2 + ‖y‖2 (triangle inequality).


Proof. Parts (a) and (b) are clear, and (c) follows from

‖tx‖22 =n∑j=1

(txj)2 = t2n∑j=1

x2j = t2‖x‖22.

For (d) we use 1.6.3:

‖x+ y‖22 = (x+ y) · (x+ y)= ‖x‖22 + ‖y‖22 + 2(x · y)≤ ‖x‖22 + ‖y‖22 + 2‖x‖2 ‖y‖2= (‖x‖2 + ‖y‖2)2.

Exercises1.S Solve the following system of vector equations for x and y in terms of

a, b, c, d, and e, assuming that (a · b)(d · b) 6= 1.

x+ (y · b)a = c

y + (x · b)d = e.

2. Prove the following:(a) ‖x+ y‖22 − ‖x− y‖22 = 4(x · y) (polarization identity).(b) ‖x+ y‖22 + ‖x− y‖22 = 2

(‖x‖22 + ‖y‖22

)(parallelogram rule).

(c)S∣∣ ‖x‖2 − ‖y‖2 ∣∣ ≤ ‖x− y‖2.

(d) ‖x1 + · · ·+ xn‖2 ≤∑nj=1 ‖xj‖2 (generalized triangle inequality).

3.S Suppose that xi · xj = 0 for i 6= j. Prove that

‖x1 + · · ·+ xk‖22 =k∑j=1‖xj‖22.

4. ⇓11 For x = (x1, . . . , xn) define

‖x‖1 =n∑j=1|xj | and ‖x‖∞ = max{|x1|, . . . , |xn|}.

Verify that ‖ · ‖1 and ‖ · ‖∞ have the properties (a)–(d) of 1.6.4.

5. A nonempty subset C of Rn is said to be convex if x, y ∈ C and t ∈ [0, 1]imply that tx+(1− t)y ∈ C. Let r > 0. Prove that {x ∈ Rn : ‖x‖2 ≤ r}is convex. Is the set {x ∈ Rn : ‖x‖2 = r} convex? What about the sets{x ∈ Rn : ‖x‖1 ≤ r} and {x ∈ Rn : ‖x‖∞ ≤ r}?

11This exercise will be used in Section 8.1.


6. Find positive constants a, b, c such that for all x ∈ Rn,‖x‖2 ≤ a‖x‖1, ‖x‖1 ≤ b‖x‖∞, and ‖x‖∞ ≤ c‖x‖2.

7.S Prove that ‖x‖2 = ‖y‖2 = ‖(x+ y)/2‖2 = 1⇒ x = y. Is the same truefor ‖ · ‖∞ or ‖ · ‖1?

8. Show that in R3, a · b = ‖a‖ ‖b‖ cos θ, where θ is the (smaller) anglebetween a and b.

9. The cross product of vectors a = (a1, a2, a3) and b = (b1, b2, b3) in R3 isdefined by

a× b =⟨∣∣∣∣a2 a3b2 b3

∣∣∣∣ ,− ∣∣∣∣a1 a3b1 b3

∣∣∣∣ , ∣∣∣∣a1 a2b1 b2

∣∣∣∣⟩= 〈a2b3 − a3b2, a3b1 − a1b3, a1b2 − a2b1〉 .

Let θ be the (smaller) angle between a and b. Verify the following:

(a) (a× b) · a = (a× b) · b = 0.(b) b× a = −a× b.(c) a× (tx+ sy) = t(a× x) + s(a× y).(d) (a× b) · c = a · (b× c).(e) a× (b× c) = (a · c)b− (a · b)c.(f) ‖a× b‖ = ‖a‖ ‖b‖ sin θ.

Chapter 2Numerical Sequences

2.1 Limits of SequencesSimply stated, a sequence in a set E is a function from N to E. It is

more instructive, however, to think of a sequence as an infinite ordered list ofmembers of E. The list may be written out, for example, as

a1, a2, . . . , an, . . .

or abbreviated by {an}∞n=1 or simply by {an}. A sequence usually starts withthe index 1, although this is not necessary, 0 being a common alternative.

The set E in the definition of sequence is arbitrary. However, for Part I ofthe book, we consider only numerical sequences, that is, sequences containedin R.

Sequences may be defined by a closed formula, such as an = (−1)n, orrecursively, such as the Fibonacci sequence, defined by

a0 = a1 = 1 and an+1 = an + an−1, n ≥ 1

(see Exercise 1.5.13).The following notion will occasionally be useful. A property P of a sequence

{an} is said to hold eventually if there exists an index N such that an hasproperty P for all n ≥ N . For example, by the Archimedean principle, thesequence {1/n} is eventually less than .001. Or, consider the sequence definedby an = n2 + 100(−1)n; the reader may verify that eventually an < an+1.

Convergence of a sequence to a number a expresses the idea that eventuallythe terms of the sequence will be as close to a as desired. The followingdefinition makes this precise.

2.1.1 Definition. A sequence {an} in R is said to converge to a real number a,written

an → a or limnan = lim

n→+∞an = a,

if for each ε > 0 there exists N ∈ N such that

|an − a| < ε, (a− ε < an < a+ ε), for all n ≥ N.

If no such real number a exists, then the sequence is said to diverge. ♦

29

a+ ε

a

a− ε

1 2 N − 2 N3 N + 24 5

FIGURE 2.1: Convergence of a sequence to a

It follows immediately from the definition that an → a iff the terms ofsequence eventually lie in any open interval containing a. The definition alsoimplies that an → a iff |an − a| → 0.

Limits, if they exist, are unique. Indeed, if an → a and an → b, then bythe triangle inequality |a− b| ≤ |a− an|+ |b− an| → 0, hence a = b.

Examples. (a) The sequence {(−1)n} oscillates between −1 and 1 and socannot converge. For a rigorous proof, suppose (−1)n → a for some a ∈ R.Choose N such that a− 1 < (−1)n < a+ 1 for all n ≥ N . Thus, if n ≥ N iseven, then 1 < a + 1, and if n ≥ N is odd, then a − 1 < −1. Adding theseinequalities produces the absurdity a < a.(b) To show that

limn

(−1)nn

= 0,

let ε > 0 and choose an integer N > 1/ε (Archimedean principle). Then|(−1)n/n− 0| = 1/n < ε for all n ≥ N .(c) To verify that

limn

2n+ 13n+ 5 = 2

3 ,

note that ∣∣∣∣2n+ 13n+ 5 −

23

∣∣∣∣ = 73(3n+ 5) <

7n,

so any index N > 7/ε satisfies the condition in 2.1.1. ♦

2.1.2 Definition. A sequence {an} is said to be bounded (above, below ) ifthe set of its terms is bounded (above, below). ♦

2.1.3 Proposition. A convergent sequence in R is bounded.

Proof. Assume that an → a ∈ R. Choose N such that |an − a| < 1 for alln > N . Since |an| − |a| ≤ |an − a|, we see that |an| ≤ |an − a|+ |a| < 1 + |a|for all n > N . Thus |an| ≤ max{1 + |a|, |a1|, . . . , |aN |} for all n ∈ N.

Numerical Sequences 31

2.1.4 Theorem. Let {an} and {bn} be sequences with an → a and bn → b. Ifan ≤ bn for infinitely many n, then a ≤ b.

Proof. Suppose b < a. Then b < (a+ b)/2 < a, hence we may choose indicesN1 and N2 such that bn < (a + b)/2 for all n ≥ N1 and an > (a + b)/2 forall n ≥ N2. But then bn < an for all n ≥ max{N1, N2}, contradicting thehypothesis.

Note that, as a consequence of the preceding theorem, a convergent sequencein a closed interval I must have its limit in I.

2.1.5 Theorem (Squeeze principle). Let {an}, {bn}, and {cn} be sequencesin R such that an ≤ bn ≤ cn for all n. If limn an = limn cn = x ∈ R, thenlimn bn = x.

Proof. Given ε > 0, choose N1, N2 ∈ N such that |an − x| < ε for all n ≥ N1and |cn − x| < ε for all n ≥ N2. For n ≥ max{N1, N2}, the inequalities−ε < an − x ≤ bn − x ≤ cn − x < ε imply that |bn − x| < ε.

an bn cn

x

FIGURE 2.2: The squeeze principle.

2.1.6 Example. We show that limn nrn = 0 for any r ∈ (0, 1). Let h = r−1−1.

Then h > 0 and, by the binomial theorem,

r−n = (1 + h)n = 1 + nh+ 12n(n− 1)h2 + · · · > 1

2n(n− 1)h2,

hence0 < nrn <

2(n− 1)h2 , n > 1.

Since the term on the right tends to 0 as n→ +∞, the squeeze principle showsthat nrn → 0. (See Exercise 16 for an extension of this result.) ♦

For another illustration of the squeeze principle we prove

2.1.7 Proposition. For any real number x there exist sequences {an} in Qand {bn} in I such that limn an = limn bn = x.

Proof. By 1.4.8 and Exercise 1.4.9, for each n ∈ N we may choose pointsan ∈ (x − 1/n, x + 1/n) ∩ Q and bn ∈ (x − 1/n, x + 1/n) ∩ I. The squeezeprinciple then implies that an, bn → x.

2.1.8 Definition. (Infinite limits) A sequence {an} in R is said to divergeto +∞, written

an → +∞ or limnan = lim

n→+∞an = +∞,

if for each real number M there exists an index N such that an ≥M for alln ≥ N . Divergence to −∞ is defined analogously. ♦

2.1.9 Example. If r > 1, then rn/n→ +∞. This follows from 2.1.6: GivenM > 0 there exists N ∈ N such that n/rn < 1/M , hence rn/n > M , for alln ≥ N . ♦

2.1.10 Example. If r > 0, then an := rnn!→ +∞. Indeed, sinceanan−1

= rn→ +∞,

there exists N ∈ N such that an > 2an−1, for all n > N . Iterating, we see thatan > 2kan−k ≥ kan−k, so taking k = n−N we have an > (n−N)aN for alln > N . Since limn(n−N)aN = +∞ (Archimedean principle), the assertionfollows. ♦

For the following theorem, recall the conventions regarding addition andmultiplication in the extended real number system R (1.4.12).

2.1.11 Theorem. Let {an} and {bn} be sequences in R. The following limitproperties hold in R in the sense that if the expression on the right side of theequation exists in R, then the limit on the left side exists and equality holds.

(a) limn(san + tbn) = s limn an + t limn bn, s, t ∈ R.

(b) limn anbn = limn an limn bn.

(c) limn an/bn = limn an/ limn bn, if limn bn 6= 0.

(d) limn |an| = | limn an|.

(e) limn√an =

√limn an if an ≥ 0 for all n.

Proof. Let an → a, bn → b. We prove the theorem first for the case a, b ∈ R.Let ε > 0.

For (a) choose N1 and N2 so that

|an − a| <ε

2(|s|+ 1) for all n ≥ N1 and |bn − b| <ε

2(|t|+ 1) for all n ≥ N2.

If n ≥ N := max{N1, N2}, then both of these inequalities hold, hence, by thetriangle inequality,

|san + tbn − (sa+ tb)| ≤ |s| |an − a|+ |t| |bn − b| < ε/2 + ε/2 = ε.

To prove (b), choose M ≥ |a| so that |bn| ≤M for all n (2.1.3) and chooseN so that |an − a| < ε/2M and |bn − b| < ε/2M for all n ≥ N . For such n,

|anbn − ab| = |(an − a)bn + a(bn − b)| ≤ |an − a||bn|+ |a||bn − b|≤M |an − a|+M |bn − b| < ε/2 + ε/2 = ε.

For (c) it suffices to show that 1/bn → 1/b. Choose N such that

|bn − b| < min{|b|/2, εb2/2} for all n ≥ N.

For such n, |bn| ≥ |b| − |bn − b| > |b|/2, hence∣∣∣∣ 1bn− 1b

∣∣∣∣ = |bn − b||bbn|

≤ 2|bn − b|b2

< ε.

Part (d) follows from the inequality∣∣ |an| − |a| ∣∣ ≤ |an − a|.

For (e), observe first that a ≥ 0 (2.1.4). If a = 0, choose N such thatan < ε2 for all n ≥ N . If a > 0, choose N such that |an − a| < ε

√a for all

n ≥ N . For such n,

|√an −

√a| = |an − a|√

an +√a≤ |an − a|√

a< ε.

To illustrate the remaining cases a = ±∞ or b = ±∞, we prove part (b) forthe case −∞ < a < 0 and bn → +∞. To show that anbn → −∞, let M < 0and choose N so that

an < a/2 and bn > 2M/a for all n ≥ N .

For such n,−anbn > (−a/2)(2M/a) = −M,

hence anbn < M .

2.1.12 Example. To find

limn

√4n6 − 3n2 + 52n3 + 7n+ 3 ,

divide the numerator and denominator of the general term an by n3, thehighest power of n occurring in the denominator, to obtain

an =√

4− 3/n4 + 5/n6

2 + 7/n2 + 3/n3 .

The quotients in the numerator and denominator tend to 0, hence, by 2.1.11,an →

√4/2 = 1. ♦


Exercises1. Let a, b ∈ R. Find a closed formula for the nth term an of the sequences

(a)S a, b, a, b, . . . (b) a, a, b, b, a, a, . . . (c) a, a, a, b, b, b, a, a, a . . .(d) a, b, a, c, a, b, a, c . . . (e) 1, 2, 3, 4, 1, 2, 3, 4, . . .

2. Find a recursive formula for the sequence a, b, a, b, . . .

3. Use the ε, N definition of limit to prove that

(a) limn

4n− 12n+ 7 = 2. (b) S lim

n

2n2 − nn2 + 3 = 2. (c) lim

n

5√n+ 7

3√n+ 2 = 5

3 .

(d) limn

n− 1√n+ 1 = +∞. (e) S lim

n

(2 + 1

n

)3= 8. (f) limn

√n+ 2n+ 1 = 1.

4. Prove rigorously that the sequence {(−1)nn/(n+ 1)} has no limit.

5.S Find limn sin (n!rπ) for r ∈ Q.

6. Find limn1n

(n+ 1

n

)pfor all p ∈ R.

7.S Let {an} be contained in a finite set A. Prove that if an → a, then thereexists an index N such that an = a for all n ≥ N . In particular, a ∈ A.

8. Find limn bn if(a)S an → a and 3an + 2bn → c.(b) an → 2 and 3anbn + 5a2

n − 2bn → 1.

9. Let k ∈ N and a, b > 0. Evaluate limn an if an =

(a) S 2n+ 1(nk + 3n+ 1)1/k . (b)

(an− 1bn+ 1

)1/2.

(c)√n2 + kn− n. (d) S

√an+ b

√n−√an.

(e) (n+ k)!n!(n+ k)k . (f) nk

(√a2 + n−k − a

).

(g) S n[(a− 1/n)k − ak

]. (h) n

[1− (1− a/n)1/k

].

(i) (1− 1/2)(1− 1/3) · · · (1− 1/n). (j)n∑j=1

(n2 + j)−1.

(k) S (1− 1/22)(1− 1/32) · · · (1− 1/n2). (l)n∑j=1

(nk + j)−1/k, k > 1.

10. Let {an} be bounded and bn → 0. Prove that anbn → 0.

11.S Let an → a ∈ R, bn → b ∈ R, and r > 0 such that |an− bn| ≤ r for all n.Prove that |a− b| ≤ r.

12. Prove that if nan → a ∈ R, then√nan → 0. Show that the converse is

false.

13. Let an ≥ 0 for all n and an → a. Prove that a1/kn → a1/k, k ∈ N.

14. Let r > 0 and k ∈ N. Prove in each case that an → 1:(a)S an = r1/n. (b) an = n1/n.

(c) an =(r + nk

)1/n. (d) an =[

sin(1/n)]1/n.

15. Prove that an → a iff a+n → a+ and a−n → a−. (See 1.3.7.)

16. Let m ∈ N and r ∈ (−1, 1). Prove that limn nmrn = 0.

17.S Let 0 < r < 1, an > 0, and an+1/an < r for all n. Prove that an → 0.Construct a sequence {an} such that an > 0 and an+1/an < 1 for all nbut an 6→ 0.

18. Suppose that an → a ∈R. Prove that

limn

(a1 + · · ·+ an)/n = a.

Is the converse true?

19.S Let an → a ∈ R and let an ≥ a for all n. Prove that

limn

min{a1, · · · , an} = a.

Does min{a1, · · · , an} → a imply that an → a?

20. Show that if n−1an → 0, then n−1 max{a1, · · · , an} → 0. Prove that theconverse holds if {an} is bounded below. Give an example to show thatthe converse is not generally true.

21. Let 0 < x1 ≤ · · · ≤ xk. Prove that

limn

(xn1 + · · ·+ xnk )1/n = xk.

22.S Let f(x) be any real-valued function on R such that f(x)−x is boundedfor all x (for example, f(x) = bxc). Use Exercise 1.5.4 to prove that(a) (1/n2)

∑nj=1 f(jx)→ x/2. (b) (1/n3)

∑nj=1 f(j2x)→ x/3.

23. Let a0, a1 > 0 and an = √an−1an−2, n ≥ 2. Find limn an.

24. Let k ∈ N and let {an} be a sequence such that an+k − an → c ∈ R.Prove that an/n→ c/k. Suggestion. Consider first the case k = 1 to getthe general idea.

2.2 Monotone Sequences2.2.1 Definition. A sequence {an} in R is said to be increasing (strictlyincreasing) if an ≤ an+1 (an < an+1) for all n. Decreasing and strictly decreas-ing sequences are defined analogously. A sequence that is either increasing ordecreasing is called monotone. If {an} is increasing (decreasing), we write an ↑( an ↓). If an ↑ (an ↓) and an → a ∈ R, we write an ↑ a (an ↓ a). ♦

2.2.2 Monotone Sequence Theorem. If {an} is increasing (decreasing),then an ↑ supk ak (an ↓ infk ak). In particular, every bounded monotonesequence converges in R.

Proof. Assume {an} is increasing and let r < supk ak. By the approximationproperty of suprema, r < aN ≤ supk ak for some N . Since {an} is increasing,r < an ≤ supk ak for all n ≥ N . Therefore, an ↑ supk ak. The proof for thedecreasing case is similar.

2.2.3 Example. Let 0 < a < x1, y1 < b := a+ 1 and define {xn} and {yn}recursively by

xn+1 = a+√|xn − a| and yn+1 = b−

√|b− yn|.

By Exercise 1.5.1, {xn} is strictly increasing, {yn} is strictly decreasing, anda < xn, yn < b for all n. By 2.2.2, xn ↑ x and yn ↓ y for some x, y ∈ R. Tofind x, let n→∞ in the equation xn+1 = a+

√xn − a to obtain x = a+

√x− a.

This has solutions x = a and x = b. Since {xn} is increasing, x = b. Similarly,y = a. ♦

2.2.4 Example. We use the monotone sequence theorem to show that thesequence {(1 + 1/n)n} converges. By the binomial theorem (1.5.5) and theinequality k! ≥ 2k−1 (easily established by induction),

(1 + 1/n)n =n∑k=0

(n

k

)1/nk

= 2 +n∑k=2

(1− 1/n)(1− 2/n) · · · (1− (k − 1)/n)/k!

≤ 2 +n∑k=2

1/2k−1.

Since the sum in the last inequality is ≤ 1, {(1 + 1/n)n} is bounded aboveby 3. Now let m = n+ 1. Then

1− k/m ≥ 1− k/n ≥ 0, k = 1, . . . , n− 1,

hence

(1 + 1/m)m ≥ 2 +n∑k=2

(1− 1/m)(1− 2/m) · · · (1− (k − 1)/m)/k!

> 2 +n∑k=2

(1− 1/n)(1− 2/n) · · · (1− (k − 1)/n)/k!

= (1 + 1/n)n .

Thus {(1 + 1/n)n} is increasing. By 2.2.2, the sequence has a limit in R, whichis denoted by the letter e:

e := limn

(1 + 1/n)n = 2.71828182845905 . . . ♦

Exercises1.S Let 0 < a < 1 < b. Prove that a1/n ↑ 1 and b1/n ↓ 1.

2. Let an = an/nk and bn = bn/nk, where 0 < a < 1 0.

Prove: an ↓ 0 (eventually) and nan ↑ a/b.

4. Let xn > 0 and xn ↑ x. Prove that (xn1 + · · ·+ xnn)1/n → x.

5. Prove that for any nonempty set A of real numbers there exist sequences{an} and {bn} in A such that an ↑ supA and bn ↓ inf A.

6. Let {an} be monotone and set bn := (a1 + a2 + · · ·+ an)/n. Prove that{bn} is monotone. (Compare with Exercise 2.1.18.)

7.S Define a1 = 1 and an = 1 + (1 + an−1)−1. Find limn an by first showingthat 1 ≤ an ≤ 2, {a2n} is decreasing, and {a2n+1} is increasing.

8. Let r > 0, a0 =√r, and an = √r + an−1, n ≥ 1. Find limn an.

9.S Let r > 0, a1 > 0 and define

an = 12 (an−1 + r/an−1), n > 1.

Show that an ≥ an+1 ≥√r and find limn an.

10. Prove that e = limn (1− 1/n)−n.

11. Let < x0 < y0 and define

xn+1 = √xnyn and yn+1 = (xn + yn)/2.

Prove that 0 < xn < xn+1 < yn+1 < yn and that limn xn = limn yn.

2.3 Subsequences and Cauchy Sequences2.3.1 Definition. A subsequence of a sequence {an}∞n=1 in R is a sequence{ank}∞k=1, where the indices satisfy 1 ≤ n1 < n2 < · · · . The limit in R of asubsequence is called a cluster point of {an}. ♦

For example, in the following sequence the underlined terms define thebeginning of a subsequence {ank} with n1 = 3, n2 = 4, n3 = 6, etc.

a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a12, a13, a14, a15, . . .

Note that the indices nk of a subsequence satisfy nk ≥ k.

Examples. (a) The sequence{[1− (−1)b(n−1)/2c

]}= {0, 0, 2, 2, 0, 0, 2, 2 . . .}

is a subsequence of{1− (−1)n} = {2, 0, 2, 0, . . .},

which has cluster points 0 and 2.(b) The sequence {n sin (nπ/2)} has cluster points 0 and ±∞.(c) Let {r1, r2, . . .} be an arbitrary enumeration of the rational numbers

(see Appendix A). Then every real number is a cluster point of {rn}. Indeed,since every interval of the form (x − 1/n, x + 1/n) contains infinitely manyterms of the sequence, we may choose n1 ≥ 1 such that |x− rn1 | < 1, n2 > n1such that |x − rn2 | < 1/2, etc. In this way we may construct a subsequenceinductively such that |x− rnk | < 1/k for all k, hence rnk → x. ♦

Notation. It is occasionally convenient to use the following alternate methodto describe a subsequence: If we set bk = ank and then change the index in{bk}∞k=1 to n, then {bn} may be used to denote the subsequence {ank}. Thisprovides a convenient way to denote a subsequence of a subsequence. In thisregard, note that if {cn} is a subsequence of {bn} and {bn} is a subsequenceof {an}, then {cn} is a subsequence of {an}.

The following proposition shows that a convergent sequence has a singlecluster point.

2.3.2 Proposition. If {an} is a sequence in R and an → a ∈ R, then ank → afor any subsequence {ank} of {an}.

Proof. We prove the proposition for the case a ∈ R and leave the other casesfor the reader. Given ε > 0, choose N such that |an − a| < ε for all n ≥ N .Since nk ≥ k, |ank − a| < ε for all k ≥ N . Therefore, ank → a.


2.3.3 Example. We calculate limn(1 + 1/n2)3n2+5 by writing(1 + 1

n2

)3n2+5=[(

1 + 1n2

)n2 ]3(1 + 1

n2

)5.

The term in the square brackets is a subsequence of (1 + 1/n)n and henceconverges to e (see 2.2.4). It follows that (1 + 1/n2)3n2+5 → e3. ♦

The following result will have important consequences in later chapters.2.3.4 Bolzano–Weierstrass Theorem. Every bounded sequence in R hasa convergent subsequence.Proof. The proof is based on the observation that if a union of two sets containsinfinitely many terms of a sequence, then at least one of the sets must containinfinitely many of the terms of the sequence.

Let {an} be a bounded sequence, say c0 ≤ an ≤ d0 for all n. Bisect theinterval I0 := [c0, d0]. By the preceding observation, one of the resultingsubintervals, call it I1, contains infinitely many terms of the sequence. Chooseone such term, say an1 . Now bisect I1. Again, one of the resulting subintervals,call it I2, contains infinitely many terms of the sequence. Choose one such terman2 with n2 > n1. By repeating this procedure, we produce a subsequence{ank}∞k=1 of {an} and a sequence of intervals Ik = [ck, dk], k = 0, 1, . . ., suchthat

c0 ≤ ck−1 ≤ ck ≤ ank ≤ dk ≤ dk−1 ≤ d0, and dk+1 − ck+1 = 12 (dk − ck).

Since {ck} and {dk} are monotone and bounded ck → c and dk → d for somec, d ∈ R. Since dk − ck = 2−k(d0 − c0)→ 0, c = d. By the squeeze principle,ank → c.

I0

I1an1

I3

I2an2

an3

c1 d1

c2 d2

c3 d3

c0 d0

...

FIGURE 2.3: Interval halving process.

The Bolzano–Weierstrass theorem may be extended as follows:2.3.5 Theorem. Every sequence in R has a subsequence that converges in R.Proof. If {an} is bounded, then the Bolzano–Weierstrass theorem applies. Sup-pose that {an} is unbounded above. Then for each k ∈ N there exist infinitelymany indices n such that an > k. We may then construct a subsequence {ank}with ank > k for all k so ank → +∞.

2.3.6 Corollary. A sequence {an} in R has a limit in R iff it has exactly onecluster point in R.

Proof. The necessity is 2.3.2. For the sufficiency, suppose that {an} has exactlyone cluster point a ∈ R. Consider first the case a = +∞. We claim thatan → +∞. If not, then there exists M ∈ R such that an ≤ M for infinitelymany n, hence there exists a subsequence {ank} of {an} with ank ≤ M forall k. By 2.3.5, {ank} has a cluster point b ∈ R. But b ≤M < a, so {an} hasmore than one cluster point, a contradiction. Therefore, an → +∞, as claimed.The case a = −∞ is treated similarly.

Now suppose a ∈ R. Then an → a. If not, then there exists ε > 0 such that|an − a| ≥ ε for infinitely many n, so there is a subsequence {ank} of {an}with |ank − a| ≥ ε for all k. By 2.3.4, {ank} has a cluster point b in R. Butthen |b− a| ≥ ε, so again {an} has more than one cluster point.

2.3.7 Definition. A sequence {an} is said to be Cauchy if for each ε > 0there exists an index N such that |an − am| < ε for all m, n ≥ N . We expressthis condition by writing

limm,n

(an − am) = 0. ♦

The definition asserts that the terms of a Cauchy sequence get closer toone another. Thus the following result is not surprising.

2.3.8 Proposition. Every convergent sequence is Cauchy.

Proof. Let an → a. Given ε > 0, choose N such that |an − a| < ε/2 for alln ≥ N . Then for n,m ≥ N ,

|an − am| = |(an − a) + (a− am)| ≤ |an − a|+ |am − a| < ε.

It is of fundamental importance that the converse of 2.3.8 is true. To provethis, we need the following lemma.

2.3.9 Lemma. A Cauchy sequence is bounded.

Proof. Let {an} be a Cauchy sequence. Choose N such that |an − am| < 1 forall m,n ≥ N . Then |an| ≤ |an − aN |+ |aN | < 1 + |aN | for all n ≥ N , hence

|an| ≤ max{1 + |aN |, |a1|, |a2|, . . . , |aN−1|} for all n.

2.3.10 Cauchy Criterion. Every Cauchy sequence in R converges.

Proof. By 2.3.9 and the Bolzano–Weierstrass theorem, a Cauchy sequence{an} has a convergent subsequence, say ank → a ∈ R. We claim that an → a.Let ε > 0 and choose N such that |an−am| < ε for all m, n ≥ N . In particular,|an − ank | < ε for n, k ≥ N . Fixing n ≥ N and letting k → ∞ in the lastinequality yields |an − a| ≤ ε, verifying the claim.

Exercises1. Find all cluster points of {an}, where an =

(a)S (−1)n(

2n+ 14n+ 3

)sin2

(nπ3

). (b) (−1)n

(2n+ 1n+ 5

)cos2

(nπ4

).

(c)S (−1)bn/3c(1 + 1/n)2 + (−1)bn/4c(2 + 1/n)2 + (−1)bn/5c(3 + 1/n)2.(d) (−1)nrn + r2n, where rk is the remainder on division of k by 3.

2. Construct a sequence with precisely the cluster points 1, 2, 3, +∞.

3. Let k ∈ N. Use the fact that limn(1 + 1/n)n = e (2.2.4) to find limn anfor an =

(a)(

1 + 1kn

)n. (b)

(1 + 1

k + n

)n. (c)

(1k

+ 1n

)n.

(d)S

(1 + 1

2n+ k

)kn. (e)

(1 + 1

3n3 + 5

)7n3−4.

4. Let {an} and {bn} be bounded sequences. Show that there exist conver-gent subsequences of {an} and {bn} with the same indices.

5.S Prove that a sequence contained in a finite set has a constant subsequence.

6. Let −∞ < an < r ≤ +∞ with an → r. Show that {an} has a strictlyincreasing subsequence.

7. Show that every sequence of distinct real numbers has a strictly monotonesubsequence.

8.S Let k ∈ N and suppose that the series∑∞n=1 |an+k − an| converges (see

Chapter 6). Prove that {an} has a convergent subsequence.

9. Let a0, a1 be arbitrary and define an+1 = (an + an−1)/2, n ≥ 1. Showdirectly that {an} is a Cauchy sequence. (Its limit may be found fromExercise 1.5.14.)

10.S Let 0 0 for all n. Set bn = aqn/(1 + apn). Show thatan → 0 iff bn → 0. Is the assertion true if 0 < q < p?

11. Let I be an open interval and let {an} have the property that each opensubinterval J of I contains an for infinitely many n. Prove that everypoint of I is a cluster point of {an}. Give an example of such a sequence.

12. Suppose that the cluster points of {an} form a sequence {bn}. Show thatevery cluster point b of {bn} is a cluster point of {an}. Hint. Choose asubsequence {bnk} such that |bnk − b| < 1/k.


2.4 Limits Inferior and SuperiorFor an arbitrary sequence {an} in R, define

an = infk≥n

ak and an = supk≥n

ak, n = 1, 2, . . . .

Then {an} is increasing and {an} is decreasing, hence the limits

lim infn

an := limnan and lim sup

nan := lim

nan

exist in R. These limits are called, respectively, the limit inferior and limitsuperior of the sequence {an}.

an ana a

FIGURE 2.4: a = lim infn an and a = lim supn an.

Clearly,an ≤ an ≤ an and lim inf

nan ≤ lim sup

nan.

Furthermore, if {an} is unbounded below, then lim infn an = −∞, and if {an}is unbounded above, then lim supn an = +∞.

Here are some examples:

(a) lim infn(−1)nnn+ 1 = −1, lim supn

(−1)nnn+ 1 = 1,

(b) lim infn[(−1)n + 1]n = 0, lim supn[(−1)n + 1]n = +∞,(c) lim infn sinn = −1, lim supn sinn = 1.

Example (c) follows from Example 8.3.10. (See Exercise 8.3.15.)The next proposition shows that lim sup and lim inf have properties similar

to those of limits. Their usefulness derives from this fact together with theproperty that, in contrast to ordinary limits, the limits inferior and superiorof a sequence always exist (in R).

2.4.1 Proposition. For any sequences {an} and {bn} in R,

(a) lim supn(−an) = − lim infn an.

(b) lim supn(an + bn) ≤ lim supn an + lim supn bn if the right side is defined.

(c) lim infn(an + bn) ≥ lim infn an + lim infn bn if the right side is defined.

(d) lim supn can = c lim supn an, if c ≥ 0.

(e) lim infn can = c lim infn an, if c ≥ 0.

(f) lim supn(anbn) ≤ (lim supn an)(lim supn bn) if an, bn ≥ 0 for all n.

(g) lim infn(anbn) ≥ (lim infn an)(lim infn bn) if an, bn ≥ 0 for all n.

(h) lim infn an ≤ lim infn bn, lim supn an ≤ lim supn bn if an ≤ bn for all n.

Proof. Part (a) follows from supk≥n(−ak) = − infk≥n ak and part (h) is adirect consequence of the definitions. Part (b) follows by taking limits in theinequality

supk≥n

(ak + bk) ≤ supk≥n

ak + supk≥n

bk.

Part (f) follows similarly from

supk≥n

akbk ≤ supk≥n

ak supk≥n

bk.

Part (d) is a consequence of

supk≥n

cak = c supk≥n

ak, c ≥ 0.

Parts (c), (e), and (g) are proved in a similar manner.

2.4.2 Theorem. For any sequence {an} in R, the extended real numbersa := lim infn an and a := lim supn an are cluster points of {an}. All othercluster points of {an} in R lie between these.

Proof. We leave the case a = −∞ to the reader. Assume then that a > −∞and recall that an ↓ a. Choose a strictly increasing sequence of real numbersrn tending to a. Since r1 < a1, by the approximation property of supremathere exists an index n1 such that r1 < an1 ≤ an1 . Similarly, since r2 < an1+1,there exists an index n2 > n1 such that r2 < an2 ≤ an2 . In this way we mayconstruct inductively a subsequence {ank} such that rk < ank ≤ank . By thesqueeze principle, ank → a. The limit infimum case is treated similarly.

Now let {ank} be any subsequence of {an} with ank → a ∈ R. Then, forany m and k ≥ m, am ≤ ank ≤ am. Letting k → ∞ yields am ≤ a ≤ am.Letting m→∞ we obtain a ≤ a ≤ a.

Since limn an exists in R iff {an} has exactly one cluster point (2.3.6), thefollowing result is immediate.

2.4.3 Corollary. For any sequence {an} in R, limn an exists in R ifflim infn an = lim supn an. In this case, all three limits are equal.

Exercises1. Find lim infn an and lim supn an if

(a)S an = (−1)n5n+ 73n+ 5 .

(b) an = nsin(nπ/2) + (1/n) cos(n).(c)S an = (−1)bn/3c(1+1/n)2+(−1)bn/4c(2+1/n)2+(−1)bn/5c(3+1/n)2.

(d) an = 2nrn + 1nr2n + 1 , rk the remainder on division of k ∈ N by 3.

(e) an = (−1)rnxn + (−1)rn+1yn + (−1)rn+2zn, where xn → x, yn → y,zn → z, and x < y < z.

(f) a1 = 1, a2n = ra2n−1, a2n+1 = ar + a2n, 0 < r < 1, a > 0.(g) an = 2n + 2−n + (−1)n(2n − 2−n).

(h)S an = 3n cos (nπ/4) + 22n sin (nπ/4) + 3 .

2. Show by example that the inequalities (b), (c), (f), and (g) in 2.4.1 maybe strict.

3.S Let an > 0 for all n. Prove that

lim supn

(1/an) = 1/ lim infn

an and lim infn

(1/an) = 1/ lim supn

an.

4. Let {an} be bounded and nonnegative and let r ∈ Q+. Prove that

lim supn

arn =(

lim supn

an)r and lim inf

narn =

(lim inf

nan)r.

5.S Show that for any subsequence {ank} of {an},

lim supk→∞

ank ≤ lim supn

an and lim infk→∞

ank ≥ lim infn

an.

6. Let bn → b ∈ (0,+∞). Prove that

lim supn

(an+bn) = b+lim supn

an and lim infn

(an+bn) = b+lim infn

an.

7.S Let an ≥ 0 for all n and bn → b ∈ (0,+∞). Prove thatlim sup

nanbn = b lim sup

nan and lim inf

nanbn = b lim inf

nan.

8. Prove that∣∣ lim supn

an∣∣ ≤ lim sup

n|an| and

∣∣ lim infn

an∣∣ ≥ lim inf

n|an|.

Show by examples that the inequalities may be strict.


9. Let {nk} be a sequence of positive integers that contains each positiveinteger exactly once. Show that

lim supk

ank = lim supn

an and lim infk

ank = lim infn

an.

In particular, if an → a, then ank → a.(Note: {ank}∞k=1 is not necessarily

a subsequence {an}.)

10.S Let an → a > 0 and lim infn bn > 0. If b2n − anbn − 6a2n → 0, prove that

lim supn→∞ bn ≤ 3a.

11. Prove that for any sequence {an},

lim infn

an ≤ lim infn

1n

n∑j=1

aj ≤ lim supn

1n

n∑j=1

aj ≤ lim supn

an.

12.S ⇓1 Let an > 0 for all n. Prove that

lim infn

an+1

an≤ lim inf

na1/nn ≤ lim sup

na1/nn ≤ lim sup

n

an+1

an.

Use this to calculate limn n/(n!)1/n.


Chapter 3Limits and Continuity on R

3.1 Limit of a FunctionThe definition of limit of a function f given in 3.1.3 below is a precise

formulation of the intuitive idea that as x gets closer to a number a, the functionvalue f(x) approaches some fixed number L. This notion is convenientlydescribed in terms of certain subsets of R called neighborhoods.

3.1.1 Definition. Let r > 0. A neighborhood of a ∈ R is an interval of theform

N (a) = Nr(a) :=

(a− r, a+ r) if a ∈ R,(r,+∞) if a = +∞,(−∞,−r) if a = −∞.

If a ∈ R, the set N (a) \ {a} := (a − r, a) ∪ (a, a + r) is called a deletedneighborhood of a. ♦

The reader should verify that the intersection of finitely many neighbor-hoods of a is again a neighborhood of a and that neighborhoods separate points,that is, if a 6= b are extended real numbers, then there exist neighborhoodsN (a) and N (b) such that N (a) ∩N (b) = ∅.

3.1.2 Definition. An accumulation point of a nonempty set E of real numbersis an extended real number a such that every neighborhood of a contains apoint of E not equal to a. A member of E that is not an accumulation pointof E is called an isolated point of E. ♦

For example, the set of accumulation points of E :=[Q ∩ (−1, 0)

]∪ N is

[−1, 0] ∪ {+∞}, and the set of isolated points of E is N.The following definition of limit is sufficiently general to include the usual

limits encountered in calculus: one-sided limits, two-sided limits, limits atinfinity, and infinite limits.

3.1.3 Definition. Let E ⊆ R, let f be a real-valued function whose domainincludes E, and let a, L ∈ R, where either a ∈ E or a is an accumulation pointof E (not necessarily in the domain of f). We write

L = limx→ax∈E

f(x)

47

if, for each neighborhood N (L) of L, there is a neighborhood N (a) of a suchthat

x ∈ E ∩N (a) implies f(x) ∈ N (L). (3.1)

In this case we say that that f(x) approaches L as x tends to a along E ♦

The restrictions on a guarantee that E ∩N (a) 6= ∅, hence condition (3.1)is not vacuously satisfied. Note that if a ∈ E is not an accumulation point ofE, then it must be an isolated point, in which case lim{x→a, x∈E} f(x) triviallyexists and equals f(a).

We single out the following important special cases, where a ∈ R and s > 0:

(a) left-hand limit : limx→a−

f(x) := limx→ax∈E

f(x), E = (a− s, a).

(b) right-hand limit : limx→a+


f(x), E = (a, a+ s).

(c) two-sided limit : limx→a


f(x), E = (a− s, a+ s) \ {a}.

(d) limit at +∞ : limx→+∞

f(x) := limx→+∞x∈E

f(x), E = (s,+∞).

(e) limit at −∞ : limx→−∞

f(x) := limx→−∞x∈E

f(x), E = (−∞,−s).

f

L

L+ ε1

L− ε1

L+ ε2

L− ε2

xaa− δ a+ δ

FIGURE 3.1: δ works for ε1 but not for ε2.

Applying the definition of limit to the cases (a)–(e) above produces thestandard limit definitions encountered in beginning calculus. For example, ifthe limit L in (c) is finite, then, in the context of (c), 3.1.3 asserts that foreach ε > 0 there exists a δ ∈ (0, s) such that

|f(x)− L| < ε for all x with 0 < |x− a| < δ.

(See Figure 3.1.) For (e) and the case L = +∞, the definition asserts that foreach M ∈ R there exists an r > s such that

f(x) > M for all x with x < −r.

Limits and Continuity on R 49

The advantage of having a single definition of limit is that it provides aunified theory and allows for economy of thought and presentation.

As in the case of sequences, limits of functions are unique. Indeed, ifL1 6= L2 both satisfy criterion (3.1), then, given neighborhoods N (L1) andN (L2), there would exist a neighborhood N (a) such that

x ∈ E ∩N (a)⇒ f(x) ∈ N (L1) ∩N (L2).

However, N (L1) and N (L2) may be taken to be disjoint, and choosing anyx ∈ E ∩N (a) then results a contradiction.

In any discussion of limits we shall tacitly assume that a and E satisfy theconditions of 3.1.3.3.1.4 Example. Let f(x) = (3x+ 2)/(2x− 1). Then(a) limx→∞ f(x) = limx→−∞ f(x) = 3/2.

(b) limx→a f(x) = f(a), (a 6= 1/2).

(c) limx→1/2+ f(x) = +∞.

(d) limx→1/2− f(x) = −∞.To verify (a,) let ε > 0 and note that the quantity∣∣∣∣f(x)− 3

2

∣∣∣∣ = 72|(2x− 1)|

will be less than ε if |2x− 1| > 7/2ε. The latter inequality is satisfied if eitherx > (1 + 7/2ε)/2 or x < (1− 7/ε)/2.

For (b), observe first that

|f(x)− f(a)| =∣∣∣∣3x+ 22x− 1 −

3a+ 22a− 1

∣∣∣∣ = 7|x− a||2x− 1||2a− 1| .

By the triangle inequality,

|2x− 1| ≥ |2a− 1| − |(2a− 1)− (2x− 1)| = |2a− 1| − 2|a− x|.

Hence if |a− x| < |2a− 1|/4, then |2x− 1| > |2a− 1|/2 and therefore

|f(x)− f(a)| < 14|x− a||2a− 1|2 .

It follows that |f(x)− f(a)| will be less than ε if we require additionally that|x− a| < ε|2a− 1|2/14. Therefore, any δ < min{|2a− 1|/4, ε|2a− 1|2/14} willsatisfy criterion (3.1).

To prove (c), note that if 0 < |x− 1/2| < 1/2, then x > 0, hence

f(x) = 12

3x+ 2x− 1/2 >

1x− 1/2 .

Given M > 2, let δ = 1/M . Then |x − 1/2| < δ ⇒ 0 < x − 1/2 < 1/M ⇒f(x) > M , proving (c). The proof of part (d) is similar. ♦


3.1.5 Theorem. Let f be a function with domain D and let E = E1∪E2 ⊆ D.Suppose that one of the following holds:

• a is an accumulation point of both E1 and E2.• a is an isolated point of both E1 and E2.• a is an accumulation point of E1 and an isolated point of E2.• a is an accumulation point of E2 and an isolated point of E1.

Then lim{x→a, x∈E} f(x) exists in R iff both limits lim{x→a, x∈E1} f(x) andlim{x→a, x∈E2} f(x) exist in R and are equal. In this case all three limits areequal.

Proof. If a is an accumulation point of E1 or E2, then a is an accumulationpoint of E. If a is an isolated point of E1 and E2, then a is an isolated pointof E. This shows that in each case lim{x→a, x∈E} f(x) is at least defined.

Now suppose that L := lim{x→a, x∈E} f(x) exists. Then (3.1) holds forE, so it must hold for each of the subsets E1 and E2 as well. Therefore,lim{x→a, x∈E1} f(x) and lim{x→a, x∈E2} f(x) exist and equal L.

Conversely, suppose that the limits along E1 and E2 exist and equal K ∈ R.Then, given a neighborhood N (K), there exists a neighborhood N (a) of asuch that x ∈ Ej ∩ N (a) implies f(x) ∈ N (K), j = 1, 2. Thus x ∈ E ∩ N (a)implies f(x) ∈ N (K), proving that lim{x→a, x∈E} f(x) = K.

3.1.6 Example. Take E1 = N and E2 = (0, 2). Then 2 is an isolated point ofE1 and an accumulation point of E2, and lim{x→2, x∈E1} f(x) = f(2). Therefore,by the theorem, lim{x→2, x∈E} f(x) exists iff limx→2− f(x) = f(2). ♦

3.1.7 Example. (Dirichlet function). Let

d(x) ={

1 if x ∈ Q,0 otherwise.

Since lim{x→a, x∈Q} d(x) = 1 and lim{x→a, x∈I} d(x) = 0, limx→a d(x) cannotexist. ♦

The following is an immediate consequence of 3.1.5.

3.1.8 Corollary. limx→a f(x) exists iff limx→a− f(x) and limx→a+ f(x) existand are equal. In this case all three limits are equal.

The next result shows that function limits may be characterized in termsof limits of sequences.

3.1.9 Sequential Characterization of Limit. Let f be a function whosedomain includes E and let a ∈ R be an accumulation point of E. Thenlim{x→a, x∈E} f(x) exists in R and equals L iff f(an) → L for all sequences{an} in E with an → a.

Proof. Assume that lim{x→a, x∈E} f(x) = L and let {an} be a sequence inE with an → a. Given a neighborhood N (L), choose N (a) as in (3.1) andthen choose N such that an ∈ N (a) for all n ≥ N . For such n, f(an) ∈ N (L).Therefore, f(an)→ L.

Now suppose that lim{x→a, x∈E} f(x) 6= L. Then there is a neighborhoodof L such that (3.1) fails for each neighborhood N (a) of a. Consider the casea, L ∈ R. Then N (L) is of the form (L − r, L + r) for some r > 0. TakingN (a) = (a − 1/n, a + 1/n) we see that for each n ∈ N there exists an ∈ Ewith |an − a| < 1/n and |f(an)− L| ≥ r. Thus an → a and f(an) 6→ L, so thesequential condition does not hold. A similar argument works if either a or Lis infinite.

3.1.10 Example. Let f(x) = sin (1/x), x 6= 0. Since f(1/nπ

)= 0 and

f(2/(4n+ 1)π

)= 1, limx→0+ f(x) does not exist. ♦

3.1.11 Cauchy Criterion for Functions. Let a be an accumulation pointof E. Then lim{x→a, x∈E} f(x) exists in R iff given ε > 0 there exists δ > 0such that |f(x)− f(y)| < ε for all x, y ∈ E with |x− y| < δ.

Proof. If lim{x→a, x∈E} f(x) exists in R, then an application of the triangleinequality shows that the ε, δ-condition of the theorem holds.

Conversely, assume that the ε, δ-condition holds and let {an} be a sequencein E with an → a. By the hypothesis, {f(an)} is a Cauchy sequence andso converges to some real number L. Suppose {bn} is another sequence in Econverging to a. Then an−bn → 0 so, by the ε, δ-condition, f(an)−f(bn)→ 0.Therefore, f(bn)→ L. By 3.1.9, lim{x→a, x∈E} f(x) = L.

3.1.12 Theorem. Let f be a function whose domain includes E and let a ∈ Rbe an accumulation point of E. Then the following properties hold in the sensethat if the expressions on the right exist in R, then the limits on the left existand the equality holds.

(a) limx→ax∈E

[sf(x) + tg(x)] = s limx→ax∈E

f(x) + t limx→ax∈E

g(x), s, t ∈ R.

(b) limx→ax∈E

f(x)g(x) = limx→ax∈E

f(x) limx→ax∈E

g(x).

(c) limx→ax∈E

f(x)g(x) =

lim{x→a, x∈E} f(x)lim{x→a, x∈E} g(x) if lim

x→ax∈E

g(x) 6= 0.

(d) limx→ax∈E|f(x)| =

∣∣∣ limx→ax∈E

f(x)∣∣∣.

Proof. The assertions follow immediately from 2.1.11 and 3.1.9. However, itis instructive to formulate direct proofs. We do this for the finite version ofpart (c). Assume that the limits

L := limx→ax∈E

f(x) and M := limx→ax∈E

g(x) 6= 0

are finite and let ε > 0. Choose N1(a) such that

|g(x)−M | < |M |/2 for all x ∈ E ∩N1(a).

For such x, |g(x)| ≥ |M | − |M − g(x)| ≥ |M |/2, hence∣∣∣∣f(x)g(x) −

L

M

∣∣∣∣ = |Mf(x)− Lg(x)||Mg(x)|

= |M(f(x)− L) + L(M − g(x))||Mg(x)|

≤ |M | |f(x)− L|+ |L| |M − g(x)||M |2/2

= 2|M ||f(x)− L|+ 2|L|

M2 |M − g(x)|

≤ K(|f(x)− L|+ |M − g(x)|

), K := 2/|M |+ 2|L|/M2.

Now choose N2(a) so that |f(x)− L| < ε/2K and |M − g(x)| < ε/2K for allx ∈ E ∩N2(a). Then x ∈ E ∩N1(a) ∩N2(a)⇒ |f(x)/g(x)− L/M | < ε.

3.1.13 Example. (Limits of rational functions at infinity). Let f(x) =P (x)/Q(x), where

P (x) = a0 +a1x+ · · ·+anxn and Q(x) = b0 +b1x+ · · ·+bmx

m, an, bm 6= 0.

For any a, c ∈ R, limx→c a = a and limx→c x = c, hence, by 3.1.12,limx→c f(x) = f(c), provided Q(c) 6= 0. To calculate limits at +∞, write

f(x) = a0x−n + a1x

−n+1 + · · ·+ an−1x−1 + an

b0x−m + b1x−m+1 + · · ·+ bm−1x−1 + bmxn−m.

Since limx→+∞ x−j = 0 for j ∈ N, we see that

limx→+∞

f(x) =

0 if m > n,an/bn if m = n, and±∞ if m < n,

where the sign in the last case is that of an/bm. ♦

3.1.14 Theorem. Let f be a function whose domain includes E and leta ∈ R be an accumulation point of E. If f(x) ≤ g(x) for all x ∈ E and ifL := lim{x→a, x∈E} f(x) and M := lim{x→a, x∈E} g(x) exist in R, then L ≤M .

Proof. Assume, for a contradiction, that M < L. Choose any K ∈ (M,L)and then choose neighborhoods N (L) ⊆ (K,+∞) and N (M) ⊆ (−∞,K) (seeFigure 3.2). Then there exists a neighborhood N (a) such that f(x) ∈ N (L)and g(x) ∈ N (M) for all x ∈ E ∩ N (a). But for any such x, g(x) < f(x),contradicting the hypothesis.

K LM

N (M) N (L)

g(x) f(x)

FIGURE 3.2: L can’t be greater than M .

3.1.15 Theorem (Squeeze principle for functions). Let f be a functionwhose domain contains E and let a ∈ R be an accumulation point of E.If f(x) ≤ g(x) ≤ h(x) for all x ∈ E and if the limits lim{x→a, x∈E} f(x) andlim{x→a, x∈E} h(x) exist in R and are equal, then lim{x→a, x∈E} g(x) exists inR and all three limits are equal.

Proof. Let L denote the common limit. For the case L ∈ R, given ε > 0 thereexists a neighborhood N (a) of a such that

L− ε ≤ f(x) ≤ g(x) ≤ h(x) < L+ ε for all x ∈ E ∩N (a).

The cases L = ±∞ are proved similarly.

3.1.16 Definitions. A function f is said to be strictly increasing on E iff(x) < f(y) for all x, y ∈ E with x < y. Similarly, f is increasing on E iff(x) ≤ f(y) for all x, y ∈ E with x < y. The notions of strictly decreasing anddecreasing are defined analogously. If f is either (strictly) increasing or (strictly)decreasing on E, then f is said to be (strictly) monotone on E. Finally, f isbounded on E if there exists a real number M such that |f(x)| ≤ M for allx ∈ E. ♦

The reader should compare the following theorem with the monotonesequence theorem (2.2.2).

3.1.17 Monotone Function Theorem. Let a, b, c ∈ R with a < c < b.If f is monotone on (a, b), then limx→a+ f(x), limx→b− f(x) exist in R andlimx→c− f(x), limx→c+ f(x) exist in R.

Proof. Assume that f is increasing. Let s := supa<x<b f(x). By the approx-imation property of suprema, for each r < s there exists xr ∈ (a, b) suchthat r < f(xr) ≤ s. Since f(x) is increasing, r < f(x) ≤ s for all x ∈ (xr, b).Therefore

limx→b−

f(x) = supa<x<b

f(x).

Similarly,limx→a+

f(x) = infa<x<b

f(x).

The assertion regarding limits at c is proved in the same way noting that, sincef is bounded in a neighborhood of c, the limits are finite.


Exercises1.S Show that all points of a finite set E are isolated.

2. Find all the accumulation points of the set {2/m− 3/n : m,n ∈ N}.

3. Prove that if E has an accumulation point a, then there exists a sequence{an} of distinct points in E such that an → a.

4. Determine the limit and then use the definition to prove your result:

(a) limx→1

(3x2 − 2x+ 1). (b)S limx→1

x+ 33x+ 1 . (c) lim

x→4

√x+ 1√x− 1 .

(d)S limx→−∞

(x2 + x). (e) limx→−∞

√x2 + x− x. (f) lim

x→+∞

√x+ 1√x− 1 .

5. Discuss the possible values of limx→−∞

f(x) for the function f in 3.1.13.

6. Use the results of this section together with standard trig identities andthe limits

limx→0

cosx = limx→0

sin xx

= 1

to evaluate the following limits without using l’Hospital’s rule.

(a) S limx→0

sin (2x)sin (3x) . (b) lim

x→1

sin(x− 1)x3 − 1 . (c) lim

x→0+

sin x2√x.

(d) S limx→0+

sin√x

x2 . (e) limx→0

1− cos (3x)sin2 x

. (f) limx→0

3x+ 2 sin (5x)2x− 3 sin (7x) .

(g) S limx→+∞

1− sec(3/x)1− sec(5/x) . (h) lim

x→1

tan(x2 − 3x+ 2)tan(x2 − 4x+ 3) . (i) lim

x→+∞

sin(1/x)sin(1/

√x) .

7. Evaluate the following limits without using l’Hospital’s rule, wherem, n ∈ N, and a, b, c, d > 0.

(a) limx→3

1x− 3

[1 + (x− 15)

(x2 + x)

]. (b) S lim

x→0

1x

[1

x+ 1 −1√x+ 1

].

(c) limx→a

xn − an

xm − am. (d) lim

x→b

x1/n − b1/n

x1/m − b1/m.

(e) limx→2b+

√x− b−

√b

x− 2b . (f) S limx→0

√b+ x−

√b− x√

c+ x−√c− x

.

(g) limx→0

√(x+ 1)(x+ 2)−

√2

x. (h) S lim

x→0

√ax+ b−

√b

√cx+ d−

√d.

(i) limx→n−

x+ bxcx− bxc

. (j) limx→n+

x+ bxcx− bxc

.


8. Let aj ∈ R and ε > 0. Show that there exists a ∈ R such that

atn +n−1∑j=1

ajtj > −ε for all t ≥ 0.

9.S Define

f(x) ={

4x2 + 2x− 11 if x is rational3x2 + x− 5 if x is irrational.

Show that limx→a f(x) exists for precisely two values of a.

10. Let c ∈ R. Show that limx→c f(x) exists in R iff for each strictly increasingsequence {an} converging to c and each strictly decreasing sequence {bn}converging to c, limn f(an) = limn f(bn).

11. Evaluate the following limits without using l’Hospital’s rule, wherea, b, c > 0, m, n ∈ N.(a) S lim

x→+∞

baxcx

. (b) limx→+∞

√x

⌊b

x

⌋x.

(c) limx→+∞

bmxc+ n

bnxc+m. (d) lim

x→+∞

bax+ sin xcbbx+ cosxc .

(e) S limx→+∞

√ax[√

bx+ c−√bx]. (f) lim

x→+∞

√ax+ b−

√b

√cx+ d−

√d.

(g) limx→+∞

[√ax+

√bx−

√ax+

√cx

]. (h) lim

x→+∞

[x√ax2 + b− x2√a

].

12. ⇓1 Let f(x) = xn, n ∈ N. Use Exercise 1.2.4 to prove that f is strictlyincreasing on [0,+∞), and if n is odd then f is strictly increasing on R.

13.S Suppose that f is monotone on (0,+∞) and an → +∞. Prove thatlimx→∞ f(x) = limn f(an).

*3.2 Limits Inferior and SuperiorLet f be a function whose domain includes E ⊆ R, and let a ∈ R be either

a member of E or an accumulation point of E (not necessarily in the domainof f). For each neighborhood Nr(a) define

f(r) = inf{f(x) : x ∈ E ∩Nr(a)} and f(r) = sup{f(x) : x ∈ E ∩Nr(a)}.

If a ∈ R, then f(r) increases and f(r) decreases as r ↓ 0. Similarly, if a = ±∞,then f (r) increases and f(r) decreases as r ↑ +∞. By the monotone function



theorem, in the first case f and f have limits as r → 0+ and in the secondcase f and f have limits as r → +∞. We then define

lim infx→ax∈E

f(x) := limr→0+

f(r) and lim supx→ax∈E

f(x) := limr→0+

f(r) if a ∈ R,

lim infx→ax∈E

f(x) := limr→+∞

f(r) and lim supx→ax∈E

f(x) := limr→+∞

f(r) if a = ±∞.

The extended real numbers

lim infx→ax∈E

f(x) and lim supx→ax∈E

f(x)

are called, respectively, the limit inferior and limit superior of f , as x tendsto a along E. Clearly,

lim infx→ax∈E

f(x) ≤ lim supx→ax∈E

f(x).

The above definitions include all the standard formulations of limit superiorand limit inferior. For these, we shall use notation analogous to that for limits.For example, taking E = (0 +∞), we have

lim infx→0+

sin(1/x) = −1 and lim supx→+∞

sin x = 1.

Limits superior and inferior of a function have properties similar to ordinarylimits but unlike the latter, always exist (in R).

The following theorem establishes a connection with limits superior andinferior of sequences.

3.2.1 Theorem. Let a ∈ R be an accumulation point of E. Then there existsequences {xn} and {yn} in E tending to a such that

lim infx→ax∈E

f(x) = limnf(xn) and lim sup

x→ax∈E

f(x) = limnf(yn). (3.2)

Moreover, if A denotes the set of all sequences {an} in E tending to a, then

lim infx→ax∈E

f(x) = inf{an}∈A

lim infn f(an) and lim supx→ax∈E

f(x) = sup{an}∈A

lim supn f(an).

Proof. We prove only the lim sup case. Let

s := lim supx→ax∈E

f(x) and t := sup{an}∈A

lim supn→∞

f(an).

We assume that both a and s are finite; the proofs for the other cases aresimilar.

Choose a sequence rn → 0+ such that f(rn) → s. For each n, use the

approximation property of supremum to obtain a point yn ∈ E ∩Nrn(a) suchthat f(rn)− 1/n < f(yn) ≤ f(rn). Then yn → a and f(yn)→ s. This provesthe limsup part of (3.2) and also shows that s ≤ t.

For the reverse inequality, let {an} be in A and let L := lim supn→∞ f(an).Let r > 0 and choose N ∈ N such that an ∈ Nr(a) for all n ≥ N . For such n,f(an) ≤ f(r), hence L ≤ f(r). Letting r → 0+ yields L ≤ s. Therefore,t ≤ s.

3.2.2 Corollary. Let a ∈ R be an accumulation point of E. Thenlim{x→a, x∈E} f(x) exists in R iff

lim infx→ax∈E

f(x) = lim supx→ax∈E

f(x), (3.3)

in which case the three limits are equal.

Proof. Let (3.3) hold and denote the common value by L. By the theorem,for any sequence {an} in E tending to a, lim supn f(an) ≤ L ≤ lim infn f(an),hence limn f(an) exists and equals L. From the sequential characterization oflimit, lim{x→a, x∈E} f(x) exists and equals L.

Conversely, assume K := lim{x→a, x∈E} f(x) exists in R and let an ∈ Ewith an → a. Then K = limn f(an) hence lim supn f(an) = lim infn f(an). Bythe theorem, (3.3) holds and the common value is K.

3.2.3 Proposition. Let f and g have domain containing E ⊆ R and let a ∈ Rbe an accumulation point of E. Then

(a) lim supx→ax∈E

[−f(x)] = − lim infx→ax∈E

f(x).

(b) lim supx→ax∈E

[f(x) + g(x)] ≤ lim supx→ax∈E

f(x) + lim supx→ax∈E

g(x).

(c) lim infx→ax∈E

[f(x) + g(x]) ≥ lim infx→ax∈E

f(x) + lim infx→ax∈E

g(x).

(d) lim supx→ax∈E

cf(x) = c lim supx→ax∈E

f(x), lim infx→ax∈E

cf(x) = c lim infx→ax∈E

f(x) if c ≥ 0.

(e) lim supx→ax∈E

f(x)g(x) ≤(

lim supx→ax∈E

f(x))(

lim supx→ax∈E

g(x))if f, g ≥ 0.

(f) lim infx→ax∈E

f(x)g(x) ≥(

lim infx→ax∈E

f(x))(

lim infx→ax∈E

g(x))if f, g ≥ 0.

(g) lim infx→ax∈E

f(x) ≤ lim infx→ax∈E

g(x) and lim supx→ax∈E

f(x) ≤ lim supx→ax∈E

g(x) if f ≤ g.


Proof. In (b) and (c) it is assumed that the expressions on the right are definedin R. The assertions of the proposition may be proved directly or by using2.4.1 together with 3.2.1. We illustrate the latter approach in proving (b). Theproofs of the remaining parts are similar.

Let {an} be an arbitrary sequence in E tending to a. By part (b) of 2.4.1and by 3.2.1,

lim supn

[f(an) + g(an)

]≤ lim sup

nf(an) + lim sup

ng(an)

≤ lim supx→ax∈E

f(x) + lim supx→ax∈E

g(x).

Taking the supremum over all such sequences {an} yields (b).

Exercises1. Calculate lim infx→+∞ f(x) and lim supx→+∞ f(x) for each of the func-

tions f(x) below, where r(k) is the remainder on division of k ∈ N by 3and d(x) is the Dirichlet function (3.1.7).

(a) S sin[xd(x)]. (b) (−1)bxc3x+ 72x+ 5(−1)bxc . (c) S (−1)bxcb2xc+ 3

b3xc+ 2 .

(d) 4r(bxc)x+ 57r(bxc)x+ 1 . (e) S ex + e−x

(−1)bxc(ex − e−x) . (f) cosx sin x.

(g) 12 + cosx. (h) sin x+ cosx. (i) S 3 sin x

2 + sin x.

(j) cos(π

3 sin x). (k) 1

1 + sin2 x. (l) (−1)bxc−bx

2c.

2. Prove the remaining parts of 3.2.3. Give examples to show that theinequalities in (b), (c), (e), and (f) may be strict.

3.S Let E = E1 ∪ E2, where a is an accumulation point of both E1 and E2.Prove:

lim supx→ax∈E

f(x) = max{

lim supx→ax∈E1

f(x), lim supx→ax∈E2

f(x)}.

lim infx→ax∈E

f(x) = min{

lim infx→ax∈E1

f(x), lim infx→ax∈E2

f(x)}.

Conclude in particular that

lim supx→a f(x) = max{

lim supx→a− f(x), lim supx→a+ f(x)}.

lim infx→a f(x) = min{

lim infx→a− f(x), lim infx→a+ f(x)}.

4.S Let f(x) > 0 for all x ∈ E. Prove that

lim supx→ax∈E

1f(x) = 1

lim inf{x→a, x∈E} f(x) .

5. Prove that∣∣∣∣ lim supx→ax∈E

f(x)∣∣∣∣ ≤ lim sup

x→ax∈E

|f(x)| and∣∣∣∣ lim infx→ax∈E

f(x)∣∣∣∣ ≥ lim inf

x→ax∈E

|f(x)|.

Show by examples that the inequalities may be strict.

6. Let f : [a, b) → R and g(x) = supa≤t≤x f(t), a ≤ x < b. Prove thatg(x0) ≤ limx→x0+ g(x) for every x0 ∈ [a, b).

3.3 Continuous Functions3.3.1 Definition. A function f with domain D is said to be continuous at apoint a ∈ D if lim{x→a, x∈D} f(x) = f(a); that is, for each ε > 0 there exists aδ > 0 such that

|f(x)− f(a)| < ε for all x ∈ D with |x− a| < δ.

If f is continuous at each point of a subset E of D, then f is said to becontinuous on E. If f is continuous on D, then f is simply said to be continuous.A point in D at which f is not continuous is called a discontinuity of f . ♦

The definition of continuity implies that any function f : D → R iscontinuous at an isolated point ofD. For example, if D is a finite set or a setof integers, then every function f : D → R is continuous.

Continuity of f on E is not the same as continuity of the restriction f |E .For example, the function on R that is identically equal to one on Z and zeroelsewhere is not continuous on Z, yet its restriction to Z is continuous (as afunction with domain Z).

From the sequential characterization of limit we have

3.3.2 Sequential Characterization of Continuity. A function f withdomain D is continuous at a ∈ D iff f(an)→ f(a) for all sequences {an} inD with an → a.

3.3.3 Example. Let {r1, r2 . . .} be an enumeration of the rationals in (0, 1).Define f on (0, 1) by f(rn) = 1/n and f(x) = 0 if x is irrational. We use thesequential characterization of continuity to show that f is continuous preciselyat the irrational numbers in (0, 1).

Let x ∈ (0, 1) be rational. Choose a sequence {xn} of irrational numbersconverging to x. Since f(xn) = 0 for all n and f(x) 6= 0, f(xn) 6→ f(x).Therefore, f is not continuous at any rational.

Now let x ∈ (0, 1) be irrational and let {xn} be any sequence convergingto x. If f(xn) 6→ f(x), then there exists an N ∈ N and a subsequence {yn} of{xn} such that f(yn) ≥ 1/N for all n. By definition of f , yn ∈ {r1, r2, . . . , rN}.But this implies that x ∈ {r1, r2, . . . , rN}, contradicting that x is irrational.(For a variation of this example, see Exercise 10.) ♦

The following is an immediate consequence of 3.1.12.

3.3.4 Theorem. Let f and g be functions with domain D, let α, β ∈ R andlet a ∈ D. If f and g are continuous at a, then so are αf + βg, fg, f/g (thelast provided that g(a) 6= 0).

3.3.5 Theorem. Let g : D → R and f : E → R with g(D) ⊆ E. If g iscontinuous at a ∈ D and f is continuous at g(a), then f ◦ g is continuous at a.

Proof. Let b := g(a). Given ε > 0, choose η > 0 such that |f(y) − f(b)| < εfor all y ∈ E with |y − b| < η. Next, choose δ > 0 such that |g(x)− b| < η forall x ∈ D with |x− a| < δ. Then |x− a| < δ implies |f(g(x))− f(b)| < ε.

A more succinct proof uses the sequential characterization of continuity:an → a in D ⇒ g(an)→ g(a)⇒ f

(g(an)

)→ f

(g(a)

).

Constant functions and the function f(x) = x are clearly continuous. Itfollows from 3.3.4 that polynomials and rational functions are continuous.Continuity of trigonometric, logarithmic, and exponential functions will followfrom results in Chapter 4. Power functions xα := eα ln x are continuous as theyare compositions of continuous functions. Of course, in each case the domainof the function must be carefully specified.

It is possible that a function is nowhere continuous. The Dirichlet function(3.1.7) is an example. By contrast, we have

3.3.6 Theorem. A monotone function on an open interval I has at mostcountably many discontinuities.

Proof. Assume without loss of generality that f is increasing on I. Let Ddenote the set of discontinuities of f on I. For each t ∈ I, let

at = limx→t−

f(x) and bt = limx→t+

f(x)

and let It = (at, bt). Clearly, It 6= ∅ iff t ∈ D (see Figure 3.3). Furthermore,by monotonicity, s < t⇒ bs ≤ at. Therefore, the sets It are pairwise disjoint.For each t ∈ D, choose a rational number rt in It. Since the correspondencet→ rt is one-to-one and the set of rationals is countable, D is countable.

s t

as

bsrs

atrtbt

FIGURE 3.3: One-to-one correspondence between t ∈ D and rt ∈ Q.

Exercises1.S Define

f(x) ={mx+ 3 if x < 2,3x2 + 7 if x > 2.

If f is continuous at x = 2, find the values of f(2) and m.

2. Find all values of a for which the following function is continuous on R.

f(x) ={

3x2 + 5x− 7 if x < a

2x2 + 2x+ 3 if x ≥ a.

3. Let f : (a, b)→ R and g : (b, c)→ R be continuous and suppose that

limx→b−

f(x) = limx→b+

g(x).

Show that there exists a continuous function h : (a, c) → R such thath = f on (a, b) and h = g on (b, c).

4.S Let g be continuous on R and let d(x) be the Dirichlet function. Showthat f(x) := g(x)d(x) is continuous precisely at the zeros of g.

5. Let f be defined on an open interval I and let c ∈ I. Show that f iscontinuous at c iff for each strictly increasing sequence {an} converging toc and each strictly decreasing sequence {bn} converging to c, f(an)→ f(c)and f(bn)→ f(c).

6. Let f be a continuous function on [a, b] and let {an} be a sequence in[a, b]. Prove:

(a) f(

lim supn→∞

an

)≤ lim sup

n→∞f(an). (b) f

(lim infn→∞

an

)≥ lim inf

n→∞f(an).

Show that equality holds in each case if f is increasing. Give examplesto show that the inequalities may be strict.


7. Let f1, . . . , fn be continuous at x0. Prove that the functions

Mn(x) := max1≤j≤n

fj(x) and mn(x) := min1≤j≤n

fj(x)

are continuous at x0. Give examples to show that the correspondingresult is not true for infinitely many functions, where max is replacedsup and min by inf.

8.S Let f : R→ R be continuous at zero and satisfy f(x+ y) = f(x) + f(y)for all x, y ∈ R. Prove that f(tx) = tf(x) for all t, x ∈ R. Conclude thatf(x) = f(1)x for all x ∈ R.

9. A function f is right continuous at a if limx→a+ = f(a) and left continu-ous at a if limx→a− f(x) = f(a).(a) Prove that f is continuous at a iff f is both right and left continuousat a.(b) Prove that the greatest integer function bxc is right continuous on Rbut not left continuous at any integer.(c)S Let {cn} be any sequence in R. For x ∈ R define

f(x) =∑

n:cn≤x2−n,

where the notation indicates that the sum, possibly infinite, is taken overall indices n for which cn ≤ x. (If there are no such indices, the sumis defined to be 0.) Prove that f is right continuous everywhere. Provealso that f is left continuous at a iff a is not equal to any cn. (Notethat, because the series

∑∞n=1 2−n converges, the order of summation is

irrelevant (6.4.10). Thus f(x) is well-defined.)(d) Let f be increasing on an interval I. Define g on I by

g(x) = limt→x+

f(t) =(

inft>x

f(t).)

Prove that g is increasing and right continuous on I and that g iscontinuous at a iff f is continuous at a.

10. Define f : (0, 1)→ R by

f(x) ={

0 if x is irrational1/n if x = m/n, reduced.

Use the sequential characterization of continuity to show that f is con-tinuous precisely at the irrational numbers in (0, 1).

11.S Let f : [0, 1]→ R have the property that the limit g(x) := limt→x f(t)exists in R for all x ∈ [0, 1]. Prove that(a) g is continuous.(b) f has at most countably many discontinuities.Hint. For (a), use the sequential criterion. For (b), use ideas similar tothose used in the proof of 3.3.6.

3.4 Properties of Continuous Functions3.4.1 Extreme Value Theorem. If f is continuous on a closed boundedinterval [a, b], then f has a maximum and a minimum; that is, there existxm, xM ∈ [a, b] such that

f(xm) ≤ f(x) ≤ f(xM ) for all x ∈ [a, b].

Proof. We show first that f is bounded. Suppose, for instance, that f isnot bounded above. Then for each n ∈ N there exists an ∈ [a, b] such thatf(an) > n. On the other hand, by the Bolzano–Weierstrass theorem, {an} hasa convergent subsequence, say ank → x0. But then, by continuity,

nk < f(ank)→ f(x0) < +∞,

impossible. Thus f must be bounded above. Similarly, f is bounded below.Now let M := sup{f(x) : x ∈ [a, b]}. By the first paragraph, M is finite.

By the approximation property for suprema, there exists a sequence xn ∈ [a, b]such that f(xn)→M . By the Bolzano–Weierstrass theorem again, there existsa subsequence xnk converging to some xM ∈ [a, b]. By continuity, f(xM ) = MTherefore, f(xM ) is the maximum of f . The proof for the minimum case issimilar.

The examples f(x) = 1/x on (0, 1) and f(x) = x on [0,+∞) show that theinterval in the theorem must be both closed and bounded.

3.4.2 Definition. A function f is said to have the intermediate value propertyon an interval I if, for each a, b ∈ I with a < b and each y0 between f(a) andf(b), there exists an x0 ∈ (a, b) such that f(x0) = y0. ♦

The intermediate value property simply asserts that f(I) is an intervalwhenever I is an interval.

3.4.3 Intermediate Value Theorem. A continuous function f on an in-terval I has the intermediate value property.

Proof. Let a, b ∈ I with a < b and suppose that f(a) < y0 < f(b). Theset E := {x ∈ [a, b] : f(x) < y0} contains a and is bounded below, hencex0 := supE exists and lies in [a, b]. By continuity of f at a, E contains aninterval [a, a+ δ), hence x0 > a. Since f(x) < y0 for all x ∈ E, 3.1.14 and thecontinuity of f at x0 imply that

f(x0) = limx→x0x∈E

f(x) ≤ y0.

In particular, x0 6= b. Similarly, since f(x) ≥ y0 for all x ∈ (x0, b),

f(x0) = limx→x+

0

f(x) ≥ y0.

Therefore, y0 = f(x0). Figure 3.4 illustrates the proof.

f(a)

f(x)

f(x)

y0

f(b)

a x x0 x bE

FIGURE 3.4: y0 = f(x0).

Simple examples show that the continuity hypothesis is essential. Of course,there are many discontinuous functions that have the intermediate valueproperty (see Exercise 5). Interestingly, all derivatives have the intermediatevalue property, whether they are continuous or not (Exercise 4.2.25). Thus afunction without the intermediate value property cannot have an antiderivative.

Combining the extreme and intermediate value theorems we obtain

3.4.4 Corollary. If f is continuous on [a, b], then f([a, b]

)= [f(xm), f(xM )].

3.4.5 Corollary (Existence of nth roots). For each b > 0 and n ∈ N, theequation xn = b has a unique positive solution.

Proof. Let f(x) = xn. Since limx→+∞ xn = +∞, we may choose c > 0 suchthat f(c) > b > f(0) = 0. By the intermediate value theorem, the equationf(x) = b has a positive solution. By Exercise 3.1.12, xn is strictly increasingon (0 +∞), hence the solution is unique.

Here is another application of the intermediate value theorem.

3.4.6 Example. The equation

f(x) := 2√x+ sin (3x2)(x− 1)3 + 5x2 + e2x+7

(x− 2)5 = 0

has a solution x = x0 between 1 and 2. Indeed, since

limx→1+

f(x) = +∞ and limx→2−

f(x) = −∞,

there must exist 1 < a 0 > f(b). By the intermediatevalue theorem, f(x0) = 0 for some x0 ∈ (a, b). ♦

Remark. The zeros of a continuous function f may be approximated using theinterval halving method, reminiscent of the proof of the Bolzano–Weierstrasstheorem: Suppose f(a) < 0 < f(b) so that a zero of f lies in (a, b). Bisect theinterval [a, b] and compute the values of f at the endpoints of the resultingtwo intervals. If one of these values is zero, stop. If neither is zero, then forone of the intervals, denote it by [a1, b1], the values of f at the endpoints haveopposite signs. The intermediate value theorem then implies that a zero of flies in (a1, b1), and we may approximate the zero by either a1 or b1. Continuingthis process, we may (theoretically) approximate a zero of f to any desireddegree of accuracy. The procedure is easily programmable. ♦

Exercises1. Find an example of a bounded function on [0, 1] with a single discontinuity

that has no maximum or minimum.

2.S Let f be continuous and positive on R with limx→±∞

f(x) = 0. Prove thatf has a maximum value on R.

3. Let f be continuous on R with limx→±∞

f(x) = +∞. Prove that f has aminimum value on R.

4. A function f defined on an interval J and taking values in R is said tobe upper (lower) semicontinuous at x0 ∈ J if

f(x0) ≥ lim supx→x0

f(x)(f(x0) ≤ lim inf

x→x0f(x)

),

where the limits are one-sided if x0 is an endpoint of J . If f is upper(lower) semicontinuous at each point of J , then f is said to be upper(lower) semicontinuous on J

(a) Prove that f is upper semicontinuous at x0 iff −f is lower semicon-tinuous at x0.

(b) Prove that f is continuous at x0 iff it is both upper and lowersemicontinuous at x0.

(c) Show that, at any integer n, bxc is upper semicontinuous but notlower semicontinuous.

(d) Let f(x) = sin (1/x), x 6= 0, and f(0) = a. Show that f is upper(lower) semicontinuous at 0 iff a ≥ 1 (a ≤ −1).

(e)S Let fi be defined on J and upper semicontinuous at x0 for every iin some index set I. Define f(x) = inf i∈I fi(x), x ∈ J . Show that fis upper semicontinuous at x0. Give an example to show that f maynot be continuous at x0 even if each fi is continuous on J .

(f) (Semi-extreme value property) Prove: If f is upper (lower) semicontin-uous at each point of [a, b], then f is bounded above (below) on [a, b]and there exists x0 ∈ [a, b] such that f(x0) ≥ f(x) (f(x0) ≤ f(x))for all x ∈ [a, b].

5. Give an example of a function on [0, 1] with the intermediate valueproperty that is(a) discontinuous at precisely the points 1/n, n = 1, 2, . . . .(b)S discontinuous everywhere.

6. Prove that a polynomial P of odd degree maps R onto R. In particular,P has a real zero.

7. Use the intermediate value theorem to show that each of the followingequations has a solution in the indicated interval I.

(a) ln x+ x = e, I = (1, e).(b) sin x = ax, I = (π/2, π), 0 < a < 2/π.(c)S tan x = x, I = (nπ, (n+ 1/2)π), n ∈ N.(d) ex = 4.82 sin x, I = (0, π/2) and I = (π/2, π).

(e) x4 + x2 + 1x+ 1 + x3 + 1

x+ e−x + x

x− 1 = 0, I = (−1, 0) and I = (0, 1).

(f)S e1−x2 − x2

sin x = 2x2 − 5cosx , I = (0, π/2).

8. Prove that the equation ex = xn (n ∈ N) has a solution in R iff n ≥ 3.Hint. Find the minimum of ex/xn on (0,+∞).

9.S Let f : [a, b] → [a, b] be continuous. Prove that there exists x ∈ [a, b]such that f(x) = x.

10. Prove that if n ∈ N is odd, then every real number has a unique nthroot.

11. Let f be continuous and nonzero on R. Let a0 be arbitrary and define{an} recursively by an = an−1 + f(an−1), n ≥ 1. Show that eitheran ↑ +∞ or an ↓ −∞.

3.5 Uniform ContinuityRecall that a function f is continuous on a set E if for each y ∈ E and each

ε > 0 there exists δ > 0 such that |f(x)−f(y)| < ε for all x in the domain of fwith |x− y| < δ. The number δ typically depends on both ε and y. Removingthe dependence on y results in the notion of uniform continuity:

3.5.1 Definition. A function f is said to be uniformly continuous on a subsetE of the domain of f if for each ε > 0 there exists δ > 0 such that

|f(x)− f(y)| < ε for all x, y ∈ E with |x− y| < δ. ♦

The following result is frequently useful in determining whether or not afunction is uniformly continuous.

3.5.2 Sequential Characterization of Uniform Continuity. A functionf is uniformly continuous on E iff f(xn)− f(yn)→ 0 for all sequences {xn}and {yn} in E with xn − yn → 0.

Proof. Let f be uniformly continuous on E and let {xn} and {yn} be sequencesin E with xn − yn → 0. Given ε > 0, choose δ > 0 so that |f(x)− f(y)| < εfor all x, y ∈ E with |x− y| < δ. Next, choose N ∈ N such that |xn − yn| < δfor all n ≥ N . For such n, |f(xn)− f(yn)| < ε. Thus f(xn)− f(yn)→ 0.

Now assume that f is not uniformly continuous on E. Then there exists anε > 0 and sequences xn, yn ∈ E with |xn− yn| < 1/n and |f(xn)− f(yn)| ≥ ε.Then xn − yn → 0 but f(xn)− f(yn) 6→ 0, so f does not satisfy the sequentialcondition.

3.5.3 Example. The function f(x) = 1/x, x > 0, is uniformly continuous onintervals of the form [r,+∞), r > 0, as may be seen from the inequality

|f(x)− f(y)| = |x− y|xy

≤ |x− y|r2 , x, y ≥ r.

However, f is not uniformly continuous on (0,+∞). Indeed, if xn = 1/2n andyn = 1/n, then xn − yn → 0 yet f(xn)− f(yn) = n→ +∞. ♦

3.5.4 Theorem. Let f , g be uniformly continuous on E and let α, β ∈ R.Then

(a) αf + βg is uniformly continuous on E.

(b) If f and g are bounded, then fg is uniformly continuous on E.

(c) If g 6= 0 and 1/g is bounded on E, then 1/g is uniformly continuous on E.

Proof. Part (a) follows easily from the sequential characterization of uniformcontinuity. For (b), let M > 0 such that |f(x)|, |g(x)| ≤ M for all x ∈ E.Uniform continuity of fg then follows from the inequalities

|f(x)g(x)− f(y)g(y)| ≤ |f(x)g(x)− f(y)g(x)|+ |f(y)g(x)− f(y)g(y|≤M |f(x)− f(y)|+M |g(x)− g(y)|.

For (c), choose K > 0 such that 1/|g(x)| < K for all x ∈ E. Uniformcontinuity of 1/g then follows from∣∣∣∣ 1

g(x) −1

g(y)

∣∣∣∣ = |g(x)− g(y)||g(x)g(y)| ≤ K

2|g(x)− g(y)|, x, y ∈ E.

The following theorem may be given a short proof based on the sequentialcriterion for uniform continuity. We leave the details to the reader.

3.5.5 Theorem. Suppose that g is uniformly continuous on D, f is uniformlycontinuous on E, and g(D) ⊆ E. Then f ◦ g is uniformly continuous on D.

The next theorem shows that on closed and bounded intervals the notionsof continuity and uniform continuity coincide.

3.5.6 Theorem. If f is continuous on a closed bounded interval [a, b], thenf is uniformly continuous there.

Proof. We use the sequential characterization of uniform continuity. Let {xn}and {yn} be sequences in [a, b] with xn− yn → 0. Suppose, for a contradiction,that f(xn) − f(yn) 6→ 0. Then |f(xn) − f(yn)| > ε for some ε > 0 andinfinitely many n and hence for a subsequence of {n}. Changing notation ifnecessary, we may suppose that the inequality holds for all n. By the Bolzano–Weierstrass theorem, {xn} has a convergent subsequence, say xnk → x0. Sincexnk − ynk → 0, ynk → x0. But then, by continuity, |f(xnk) − f(ynk)| → 0,which is impossible.

The connection between continuity and uniform continuity on open intervalsis more complicated. For this, we need the following definitions.

3.5.7 Definition. A continuous function f on D is said to have a continuousextension to a set D1 ⊇ D if there exists a continuous function f1 : D1 → Rsuch that f1|D = f . In the special case D1 = D ∪ {a}, where a 6∈ D, f(x) issaid to have a removable discontinuity at x = a. ♦

3.5.8 Proposition. Let f be defined and continuous on D and let a be anaccumulation point of D, a 6∈ D. Then f has a removable discontinuity atx = a iff L := lim{x→a, x∈D} f(x) exists in R.

Proof. The necessity is clear. For the sufficiency, simply set f(a) = L to obtaina continuous extension of f to D ∪ {a}.


For example, the functions

x sin 1x,

sin xx

, and x√|x|

defined for x 6= 0, have removable discontinuities at x = 0 and hence haveunique continuous extensions to R. On the other hand, since limx→0+ sin(1/x)does not exist, the function sin(1/x) does not have a removable discontinuityat x = 0.

The following theorem is the main result regarding uniform continuity offunctions on bounded open intervals.

3.5.9 Theorem. Let f be continuous on the bounded interval (a, b). Thefollowing statements are equivalent:

(a) limx→a+ f(x) and limx→b− f(x) exist in R.

(b) f has a continuous extension to [a, b].

(c) f is uniformly continuous on (a, b).

Proof. (a) ⇒ (b) is immediate from 3.5.8.(b) ⇒ (c): By 3.5.6, a continuous extension g of f to [a, b] is uniformly

continuous. Therefore, f = g|(a,b) is uniformly continuous.(c) ⇒ (a): Let {an} be any sequence in (a, b) converging to a. Then {an}

is Cauchy and since f is uniformly continuous, {f(an)} is Cauchy (Exercise 7).Therefore, L := limn→∞ f(an) exists. We claim that limx→a+ f(x) exists andequals L. To see this, let {a′n} be any sequence in (a, b) converging to a. Thenan − a′n → 0, hence, by uniform continuity, f(an)− f(a′n)→ 0, so f(a′n)→ L.By the sequential characterization of limit (3.1.9), limx→a+ f(x) = L. A similarargument shows that limx→b− f(x) exists.

For example, since sin(1/x) has no continuous extension to [0, 1], it isnot uniformly continuous on (0, 1]. On the other hand, for any p > 0,limx→0+ xp sin(1/x) = 0, hence xp sin(1/x) is uniformly continuous on (0, 1].

For another example, consider f(x) = (1 − cosx)/x on R \ {0}. Byl’Hospital’s rule, proved in the next chapter, limx→0 f(x) = limx→0 sin x = 0,hence f has a continuous extension to R. Moreover, since limx→±∞ f(x) = 0,f is uniformly continuous on R (Exercise 5).

3.5.10 Corollary. A bounded, continuous, monotone function f on a boundedinterval (a, b) is uniformly continuous there.

Proof. By 3.1.17, limx→a+ f(x) and limx→b− f(x) exist in R.

The following result relies on the mean value theorem proved in the nextchapter.

3.5.11 Theorem. If f has a bounded derivative on an interval I, then f isuniformly continuous on I.

Proof. Let M be a bound for |f ′| on I. By the mean value theorem, for anyx, y ∈ I there exists a z between x and y such that f(x)− f(y) = f ′(z)(x− y).Thus |f(x)− f(y)| ≤M |x− y|, which implies uniform continuity.

For example, sinn x and cosn x have bounded derivatives for every n ∈ N,hence are uniformly continuous on R. This also follows from periodicity (seeExercise 11). On the other hand, xp is not uniformly continuous on (0,+∞)for p > 1. Indeed, if xn = n+ n(1−p)/2 and yn = n, then, by the mean valuetheorem, for each n there exists zn ∈ (yn, xn) such that

xpn − ypn = pzp−1n

n(p−1)/2 ≥ pn(p−1)/2 → +∞.

Since xn − yn → 0, 3.5.2 implies that xp is not uniformly continuous.

Exercises1.S Find functions f and g with f continuous and g uniformly continuous

such that neither f ◦ g nor g ◦ f is uniformly continuous.

2. Let r > 0. Show that the function f(x) = (3x+ 2)/(2x− 1) in 3.1.4 isuniformly continuous on Dr but not on its domain D, where

Dr := (−∞, 1/2− r]∪ [1/2 + r,+∞) and D = (−∞, 1/2)∪ (1/2,+∞).

3. Let a, b > 0. Give a careful ε, δ proof that each of the following functionsis uniformly continuous on R.

(a)S√ax2 + b. (b) 1/

√ax2 + b. (c) |ax+ b|.

4. Show that ln x is uniformly continuous on (r,+∞) for every r > 0 but isnot uniformly continuous on (0, 1).

5. Let f be continuous on [0,∞). Prove that if limx→+∞ f(x) exists andis finite, then f is uniformly continuous on [0,+∞). Give an exampleof a bounded continuous function on [0,+∞) that is not uniformlycontinuous.

6. Prove that each of the following functions is uniformly continuous on theindicated interval, where n ∈ N:

(a) sin(1/x), [r,+∞), r > 0. (b) x sin(1/x), [0,+∞).(c) arctan x, (−∞,+∞). (d) xne−x, [0,+∞).

(e) cos√x2 + 1, (−∞,+∞). (f) xp, 0 < p ≤ 1, [0,+∞).

(g) (1 + xn)1/n, [0,+∞). (h) (1 + xn)−1/n, [0,+∞).

7.S Let f be uniformly continuous on E and let {an} be a Cauchy sequencein E. Prove that {f(an)} is Cauchy.

8. Suppose that f(x) is uniformly continuous on [0,+∞). Prove that thefunction g is uniformly continuous on R, where

g(x) :={f(x) if x ≥ 0,f(−x) if x < 0.

9.S Let f be uniformly continuous on R. Prove that f(|x|), |f(x)|, and|f(|x|)| are uniformly continuous on R.

10. Let f be uniformly continuous on each of the intervals (a, b) and (c, d),where a 0, that is, f(x+ p) = f(x) forall x ∈ R. If f is continuous on [0, p], prove that f is uniformly continuousand bounded on R.

12. Let f1, . . . , fn be uniformly continuous on E. Prove that the functions

M(x) := max1≤j≤n

fj(x) and m(x) := min1≤j≤n

fj(x)

are uniformly continuous on E.

13.S Find all values of p > 0 for which the function f(x) = x−p sin x, x > 0,has a continuous extension to [0,+∞). Prove that for all such p theextension is uniformly continuous.

14. Let r > 0. Prove that f(x) := sin(xp) is uniformly continuous on (r,+∞)iff p ≤ 1.

15.S Prove that a uniformly continuous function f on a bounded interval(a, b) is bounded. Give examples to show that the result is not true if(a, b) is unbounded or if f is merely continuous.

16. Give examples to show that parts (b) and (c) of 3.5.4 are not necessarilytrue if the boundedness conditions are removed.

17. Let f be continuous on [a, b]. Prove that

g(x) := supa≤t≤x

f(t)

is continuous on [a, b].

18.S Letf(x) = (1− e1/x)−1, x 6= 0.

Is it possible to define f(0) so that f is continuous on R? What aboutfor the function

g(x) = x(1− e1/x)−1, x 6= 0?

Chapter 4Differentiation on R

The notion of rate of change of one quantity with respect to another is funda-mental to many disciplines. It is expressed mathematically as the derivative ofa function. In this chapter we establish the main properties of this importantconstruct.

4.1 Definition of Derivative and Examples4.1.1 Definition. A real-valued function f defined in a neighborhood of a ∈ Ris said to be differentiable at a if the limit

f ′(a) = Df(a) = df

dx

∣∣∣∣a

:= limx→a

f(x)− f(a)x− a

= limh→0

f(a+ h)− f(a)h

exists in R. The limit is then called the derivative of f at a. If f is differentiableat each member of a set E, then f is said to be differentiable on E and thefunction

f ′ = Df = df

dx

is called the derivative of f on E. If f ′ is continuous on E, then f is said tobe continuously differentiable on E. ♦

It follows immediately from the definition that the derivative of a constantfunction is 0. Here are some nontrivial examples.

4.1.2 Example. We prove the following special cases of the power rule (thegeneral power rule will be proved later): Let n ∈ N and r = n or 1/n. Then

Dxr = rxr−1.

(In the second case x 6= 0, and x > 0 if n is even.)The case r = n is obtained by letting h→ 0 in the identity

(x+ h)n − xnh

=n∑j=1

(x+ h)n−jxj−1

73

(Exercise 1.2.4.) Each term in the sum tends to xn−1, and since there are nterms the formula follows.

For the case r = 1/n we use the identity

(x+ h)1/n − x1/n

h=[ n∑j=1

(x+ h)1−j/nx(j−1)/n]−1

(Exercise 1.4.15). As h → 0, the term in square brackets tends to nx1−1/n,verifying the formula.

♦

For the next example, and indeed for the remainder of the book, we shalluse the standard definitions of cosine and sine as coordinates of points on theunit circle.1 From this one can derive the usual trigonometric identities, whichwe shall invoke as needed.

h

cosh

sinhtanh

h

1

1

FIGURE 4.1: sin h < h < tan h.

4.1.3 Example. D sin x = cosx. From the identity sin2 h + cos2 h = 1 andthe inequalities

sin h < h < tan h, 0 < h < π/2,which may be derived with the help of Figure 4.1, we see that√

1− h2 <√

1− sin2 h = cosh < sin hh

< 1, 0 < h < π/2. (4.1)

Since sin(−h) = − sin h and cos(−h) = cosh, (4.1) holds for −π/2 < h < 0 aswell. By the squeeze principle,

limh→0

cosh = limh→0

sin hh

= 1.

From this and the calculation

cosh− 1h

= cos2 h− 1h(cosh+ 1) = −

(sin hh

)2h

(cosh+ 1)

1A more rigorous approach to the calculus of trigonometric functions may be based onthe inverse sine function. This approach is described briefly in Section 4.4.

Differentiation on R 75

we see thatlimh→0

cosh− 1h

= 0.

Therefore,

sin(x+ h)− sin xh

= sin x cosh+ cosx sin h− sin xh

= cosh− 1h

sin x+ sin hh

cosx

→ cosx as h→ 0. ♦

It is occasionally necessary to consider one-sided derivatives, which aredefined by using one-sided limits in 4.1.1. Specifically, the left-hand and right-hand derivatives are, respectively,

D`f(a) = f ′`(a) := limx→a−

f(x)− f(a)x− a

and

Drf(a) = f ′r(a) := limx→a+

f(x)− f(a)x− a

.

From the general theory of limits, a function is differentiable at a iff it hasequal right-hand and left-hand derivatives at a. For example, at x = 0 thefunction f(x) = |x| has right-hand derivative 1 and left-hand derivative −1and so is not differentiable there.

Although we shall have no need to do so, one may even consider the moregeneral expressions

lim infx→ax∈E

f(x)− f(a)x− a

and lim supx→ax∈E

f(x)− f(a)x− a

,

where a is an accumulation point of E. The so-called Dini derivates areobtained by taking E to be intervals of the form (c, a) and (a, c).

The following proposition provides a useful characterization of differentia-bility. It asserts that for x near a, f(x) is approximated by the linear functiony = f(a) + f ′(a)(x− a), the equation of the tangent line at a.

4.1.4 Proposition. Let f be defined in a neighborhood N (a) of a. Then f isdifferentiable at a iff there exists a function η on N (a), continuous at a, suchthat

f(x) = f(a) + η(x)(x− a) for all x ∈ N (a).

In this case, f ′(a) = η(a).

Proof. If such a function η exists, then

f(x)− f(a)x− a

= η(x)→ η(a) as x→ a,


hence f ′(a) exists and equals η(a). Conversely, if f is differentiable at a, define

η(x) =

f(x)− f(a)

x− aif x ∈ N (a) \ {a},

f ′(a) if x = a.

Then η has the required properties.

4.1.5 Corollary. If f is differentiable at a, then f is continuous there.Proof. Simply note that f(x) = f(a) + η(x)(x− a)→ f(a) as x→ a.

The example |x| considered above shows that the converse of the corollaryis false: |x| is continuous at 0 but not differentiable there. It is a remarkablefact that there are continuous functions on R that are nowhere differentiable(see 8.9.7).4.1.6 Theorem. If c ∈ R and f and g are differentiable a, then so are f + g,cf , fg, and f/g, the last provided that g(a) 6= 0. Moreover, in this case,

(a) (f + g)′(a) = f ′(a) + g′(a), (b) (cf)′(a) = cf ′(a),

(c) (fg)′(a) = f(a)g′(a) + f ′(a)g(a), (d)(f

g

)′(a) = g(a)f ′(a)− f(a)g′(a)

g2(a) .

Proof. We prove only (d). Let h = f/g. Since g is continuous at a and g(a) 6= 0,h is defined in a neighborhood N (a) on which g is not 0. For x ∈ N (a) \ {a},a little algebra shows that

h(x)− h(a)x− a

=g(a)f(x)− f(a)

x− a− f(a)g(x)− g(a)

x− ag(x)g(a) .

Letting x→ a, using the continuity of g at a, yields (d).

The preceding theorem, together with 4.1.2 and 4.1.3, show that polyno-mials, rational functions, and trigonometric functions are differentiable. (SeeExercise 2.) The following important result will yield additional examples.4.1.7 Chain Rule. Let g be differentiable at a and let f be differentiable atg(a). Then f ◦ g is differentiable at a and (f ◦ g)′(a) = f ′(g(a))g′(a).Proof. Set b := g(a). By 4.1.4, there exists a function η, defined in a neighbor-hood N (b) of b and continuous at b with η(b) = f ′(b), such that

f(y) = f(b) + η(y)(y − b), y ∈ N (b). (4.2)

Since g is continuous at a, we may choose a neighborhood N (a) of a such thatg(N (a)) ⊆ N (b). Then f ◦ g is defined on N (a), and by (4.2)

f(g(x))− f(g(a))x− a

= η(g(x)) g(x)− g(a)x− a

, x ∈ N (a) \ {a}.

Letting x→ a produces the desired result.

The formula (f ◦ g)′(x) = f ′(g(x))g′(x) is sometimes easier to apply whenwritten in Leibniz notation as

dy

dx= dy

du

du

dx, where y = f(u) and u = g(x).

4.1.8 Example. The power rule

Dxr = rxr−1, r ∈ Q,

follows from 4.1.2 and the chain rule: Let r = m/n, m,n ∈ N, and set u = x1/n

and y = um. Then y = xr and

dy

dx= dy

du

du

dx= mum−1 1

nx1/n−1 = m

nxm/n−1 = rxr−1.

The case r < 0 may be verified using the quotient rule. ♦

Higher order derivatives of y = f(x) are defined inductively by

f ′′ = D2f = d2y

dx2 := d

dx

dy

dx,

...

f (n) = Dnf = dnf

dxn:= d

dx

dn−1f

dxn−1 .

By convention, we set f (0) = D0f := f .

Exercises1. Use the limit definition to find the derivative of

(a) x2 + x+ 1. (b)S√

2x+ 1. (c) 1x2 + 1 . (d)S 1√

3x+ 2.

2. Use the techniques of 4.1.3 to find the derivative of cosx. Use rules ofdifferentiation to obtain the derivatives of tan x, cotx, secx, and cscx.

3. Use rules of differentiation to find f ′ for each of the functions f :

(a) S 3√

5x+ 7 5√

3x+ 2. (b)(

2x+ 57x+ 2

)2/3. (c) S sin

(x2 − 1x2 + 1

).

(d) sin2 x− 1sin2 x+ 1

. (e) tan[

cos(1/x)]. (f)

√ax+

√bx+ c.

4. Assuming that y is a differentiable function of x that satisfies the givenequation, use the rules of differentiation to find dy

dx:

(a)x3 + y3 − xy = 1. (b) S sin(xy2) + x2 = 1. (c) tan(x+ y) + y2 = x.

5. Let f(x) = xn|x|, n ∈ N. Find f (n−1) and f (n).

6. Let f(x) = xmbxc, m ∈ N. Find f ′`(n) and f ′r(n), n ∈ Z.

7.S Find all values of a, b, such that f ′ exists on R, where

f(x) ={ax2 + bx+ a/x if x > 1,x3 if x ≤ 1.

8. Find all values of a, b, and c such that f ′ is continuous on (0,+∞), where

f(x) ={ax2 + bx if x > 1,c√x if 0 < x ≤ 1.

9. Let

f(x) ={ax2 + bx+ c if x > 1,x3 if x ≤ 1.

Find all values of a, b, and c such that

(a) f is continuous on R. (b) f is differentiable on R.(c) f ′ is continuous on R. (d) f ′′ exists on R.

10. Find all values of c such that f ′(c) exists, where

f(x) ={ax− 4 if x > c,

9x2 if x ≤ c.

Is f ′ continuous at these values?

11. Let f be differentiable at a. Use the limit definition of derivative tocalculate

(a) limh→0

f(a+ 5 sin h)− f(a+ 2 sin h)h

. (b)S limh→0

f(a+ h2)− f(a− h)h

.

12. Let g be differentiable on an open interval I and let f(x) = g(x)d(x),where d(x) is the Dirichlet function (3.1.7). Let a be a zero of g. Provethat f ′(a) exists iff a is a zero of g′.

13. Let f be differentiable at c and let {an} and {bn} be sequences such thatan < c < bn and an, bn → c. Prove that

limn→∞

f(bn)− f(an)bn − an

= f ′(c).

14.S Let f be differentiable and increasing on (a, b). Prove that f ′(x) ≥ 0 forall x ∈ (a, b).


15. Let f be differentiable at a and nonnegative in a neighborhood of a withf(a) = 0. Prove that f ′(a) = 0.

16.S Prove Leibniz’s rule: If f and g are n times differentiable, then

Dn(fg) =n∑k=0

(n

k

)(Dkf) (Dn−kg).

17. Prove that if f has right-hand and left-hand derivatives at a (not neces-sarily equal), then f is continuous at a.

18. Assuming that f , g, and h have the necessary differentiability, find generalformulas for(a) D

[f ◦ (gh)

]. (b) D

[f ◦ (g/h)

]. (c)S D2[f ◦ g]. (d) D

[f ◦ g ◦ h

].

19. Find a formula for the nth derivative of(a)S 1/x. (b) 1/

√x. (c) xex. (d) xe−x.

20. Find all values of p ∈ R for which the function

f(x) ={|x|p sin(1/x) if x 6= 0,0 otherwise

is (a) continuous, (b) differentiable, (c) continuously differentiable onR.

21.S Define f(0) = 0 and f(x) = xm sin xn, x 6= 0, where m ∈ Z, n ∈ N. Forwhat values of m and n does f ′(0) exist? For which of these values is f ′continuous on R?

22. A function f defined on a symmetric neighborhood (−a, a) of 0 is saidto be odd if f(−x) = −f(x) and even if f(−x) = f(x).(a) Prove that any function h : (−a, a) → R is the sum of an evenfunction f and an odd function g.(b) Prove that if f is differentiable and odd (even), then f ′ is even (odd).(c) Is the converse true? That is, if f ′ is even (odd), is f odd (even)?

23.S Let fj , gj , and hj be differentiable, j = 1, 2, 3. Prove that∣∣∣∣f1 f2g1 g2

∣∣∣∣′ =∣∣∣∣f ′1 f ′2g1 g2

∣∣∣∣+∣∣∣∣f1 f2g′1 g′2

∣∣∣∣ and∣∣∣∣∣∣f1 f2 f3g1 g2 g3h1 h2 g3

∣∣∣∣∣∣′

=

∣∣∣∣∣∣f ′1 f ′2 f ′3g1 g2 g3h1 h2 h3

∣∣∣∣∣∣+

∣∣∣∣∣∣f1 f2 f3g′1 g′2 g′3h1 h2 h3

∣∣∣∣∣∣+

∣∣∣∣∣∣f1 f2 f3g1 g2 g3h′1 h′2 h′3

∣∣∣∣∣∣ .

4.2 The Mean Value TheoremThe mean value theorem relates the average rate of change of a function

to its instantaneous rate of change. It is one of the most useful theorems inanalysis and will play a central role in the proof of the fundamental theoremof calculus in Chapter 5. The proof of the mean value theorem is based on theexistence of local extrema.

4.2.1 Definition. A function f is said to have a local maximum (local mini-mum) at c if f is defined on an open interval I containing c and f(x) ≤ f(c)(f(x) ≥ f(c)) for all x ∈ I. In either case, f is said to have a local extremumat c. ♦

c1x

f

c2

FIGURE 4.2: Local extrema of f .

4.2.2 Local Extremum Theorem. If f has a local extremum at c and if fis differentiable at c, then f ′(c) = 0.

Proof. Suppose that f has a local maximum at c. Let I be an open intervalcontaining c such that f(x) ≤ f(c) for all x ∈ I. Then

f(x)− f(c)x− c

{≥ 0 if x ∈ I and x < c

≤ 0 if x ∈ I and x > c.

It follows that the left-hand derivative of f at c is ≥ 0 and the right-handderivative is ≤ 0, hence f ′(c) = 0. The proof for the local minimum case issimilar.

4.2.3 Rolle’s Theorem. Let f be continuous on [a, b] and differentiable on(a, b). If f(a) = f(b), then there exists a point c ∈ (a, b) such that f ′(c) = 0.

Proof. By the extreme value theorem there exist xm, xM ∈ [a, b] such thatf(xm) ≤ f(x) ≤ f(xM ) for all x ∈ [a, b]. If f(xm) = f(xM ), then f is a constantfunction and the assertion of the theorem holds trivially. If f(xm) 6= f(xM ),then either xm ∈ (a, b) or xM ∈ (a, b), and the conclusion follows from thelocal extremum theorem.


The following result is the key ingredient in the proof of l’Hospital’s rulein Section 4.5.4.2.4 Cauchy Mean Value Theorem. Let f and g be continuous on [a, b]and differentiable on (a, b). Then there exists a point c ∈ (a, b) such that

[f(b)− f(a)]g′(c) = [g(b)− g(a)]f ′(c).

Proof. The function

h(x) := [f(b)− f(a)]g(x)− [g(b)− g(a)]f(x)

is continuous on [a, b], differentiable on (a, b), and satisfies h(a) = h(b). ByRolle’s theorem, h′(c) = 0 for some c ∈ (a, b), which is the assertion of thetheorem.

a c bx

y(f(c), g(c))

(f(b), g(b))

(f(a), g(a))

x

y

(a) (b)

FIGURE 4.3: (a) Cauchy mean value theorem. (b) Mean value theorem.

If f(a) 6= f(b) and f ′(x) 6= 0 on (a, b), then the conclusion of 4.2.4 may bewritten

g(b)− g(a)f(b)− f(a) = g′(c)

f ′(c) .

For smooth functions f and g, this equation asserts that at some point(f(c), g(c)

)on the curve given parametrically by the equations x = f(t)

and y = g(t), the line through the endpoints (f(a), g(a)) and (f(b), g(b)) isparallel to the line tangent to the curve at

(f(c), g(c)

). See Figure 4.3(a).

Taking g(x) = x in the Cauchy mean value theorem yields the standardmean value theorem (Figure 4.3(b)):4.2.5 Mean Value Theorem. If f is continuous on [a, b] and differentiableon (a, b), then there exists c ∈ (a, b) such that

f(b)− f(a)b− a

= f ′(c).

4.2.6 Corollary. Let f(x) and g(x) be differentiable on an open interval Isuch that f ′(x) = g′(x) for all x ∈ I. Then there exists a constant k such thatf = g + k on I.

Proof. Let a, b ∈ I. By the mean value theorem applied to h := f − g, thereexists c ∈ (a, b) such that h(a)− h(b) = h′(c)(a− b). Since h′ = 0, h(a) = h(b).Since a and b were arbitrary, h must be constant.

4.2.7 Corollary. Let f be differentiable on an open interval I.

(a) If f ′ ≥ 0 (f ′ > 0) on I, then f is increasing (strictly increasing) on I.

(b) If f ′ ≤ 0 (f ′ < 0) on I, then f is decreasing (strictly decreasing) on I.

Proof. We prove (a) for the strictly increasing case. Let a, b ∈ I, a < b. Bythe mean value theorem, f(b)− f(a) = f ′(c)(b− a) for some c ∈ (a, b). Sincef ′(c) > 0, f(b) > f(a).

Exercises1.S Show that cosx =

√x − 1 has exactly one solution x in the interval

(0, π/2).

2. Find an interval I such that for each c ∈ I, sin x = x2/2 + x + c hasexactly one solution x in the interval (0, π/2).

3.S Show that f(x) = x4−4x3 + 4x2 + c has at most one zero in the interval(1, 2). For what interval of values of c does f have exactly one zero in(1, 2)?

4. Let f have k derivatives and n distinct zeros on an interval I. Prove thatf (k) has at least n− k distinct zeros in I.

5. Let f have a continuous second derivative on [−1, 3], f(1) = 0, andset g(x) = x2f(x). Prove that g′′ has at least one zero in [−1, 2]. Hint.Consider the function gn(x) := x(x+ 1/n)f(x).

6. Let P (x) be a polynomial of degree n and let a 6= 0. Prove that theequation eax = P (x) has at most n+ 1 solutions.

7.S Let P (x) be a polynomial of degree n and let a 6= 0. Prove that theequation sin(ax) = P (x) has at most n+ 1 solutions.

8. Prove Bernoulli’s inequality: (1 + x)r ≥ 1 + rx for all x ≥ −1 and allrational numbers r ≥ 1. (Cf. Exercise 1.5.10.)

9.S Let f and g be continuous on [a, b] and differentiable on (a, b) such that|f ′| ≤ |g′|. If g′ is never zero on (a, b), prove that

|f(x)− f(y)| ≤ |g(x)− g(y)| for all x, y ∈ [a, b].

10. Let f and g be differentiable on an open interval I and let a, b ∈ I witha < b. Prove that if f(a) = g(a) and f ′ > g′ on (a, b), then f > g on(a, b). Use this to show that

(a) ln x < x− 1 on the interval (1,+∞).(b) sin x < x on the interval (0, π/2).(c) cosx > 1− x on the interval (0, π/2).(d) tan x > x on the interval (0, π/2).(e) ex > 1 + x + x2/2! + · · · + xn/n! on the interval (0,+∞). (Use

induction.)

11.S Show that sin xx

is a decreasing function on (0, π/2).

12. Show that on (0, π/2)

(a) x sin x+ cosx > 1. (b) x sin x+ p cosx < p, p ≥ 2.(c) x−1(1− cosx) is increasing. (d) x−2(1− cosx) is decreasing.

13. Let a, b, p > 0, and for x ≥ 0 define f(x) = ap + xp − (a + x)p. Showthat for x > 0,

f ′(x){> 0 if 0 1.

Conclude that

(a+ b)p{< ap + bp if 0 ap + bp if p > 1.

14. Let f and g have derivatives of order n on an open interval I and leta ∈ I. Suppose that

f (j)(a) = g(j)(a) = 0, j = 0, . . . , n− 1, and

f (j)(x)g(j)(x) 6= 0 for x > a and j = 0, . . . , n.

Prove that for any b ∈ I with b > a there exists c ∈ (a, b) such that

f(b)g(b) = f (n)(c)

g(n)(c) .

15. Suppose that f has a local maximum at c. Prove that

lim infx→c−

f(x)− f(c)x− c

≥ 0 ≥ lim supx→c+

f(x)− f(c)x− c

.

16. Let f and g be continuous on [a, b], differentiable on (a, b) and let f(a) =f(b) = 0. Show that there exists c ∈ (a, b) such that f ′(c) = g′(c)f(c).

17.S Show that for any polynomial P (x) there exist finitely many intervalswith union R such that P is strictly monotone on each interval.

18. Suppose that f has the property

|f(x)− f(y)| ≤ c|x− y|1+ε for all x, y ∈ R,

where c, ε > 0. Prove that f is constant.

19.S Let f have a bounded derivative on R. Prove that for sufficiently larger the function g(x) := rx+ f(x) is one-to-one and maps R onto R.

20. Suppose f > 0 on (1,+∞) and limx→+∞ xf ′(x)/f(x) ∈ (1,+∞). Provethat x/f(x) is decreasing on (b,+∞) for some b > 1.

21. Let f be twice differentiable on (0, a), f ′′ ≥ 0, and limx→0+ f(x) = 0.Prove that f(x)/x is increasing on (0, a). Show that the conclusion isfalse if the hypothesis f ′′ ≥ 0 is dropped.

22.S Let g(x) = x2 sin(1/x) if x 6= 0 and g(0) = 0. Set f(x) = x+ g(x). Showthat f ′(0) > 0 but f is not monotone on any neighborhood of 0.

23. Let limx→+∞ f ′(x) = 0. Prove that if g ≥ c > 0 on (a,+∞), then

limx→+∞

[f(x+ g(x)

)− f(x)

]= 0.

24. Let f be differentiable on R with supx∈R |f ′(x)| < 1. Prove that thesequence {xn} defined by xn+1 = f(xn) converges, where x1 is arbitrary.Conclude that f has a unique fixed point ; that is, there exists a uniquex ∈ R such that f(x) = x.

25.S Suppose f is differentiable on an open interval I. Show that f ′ has theintermediate value property. Conclude that if f ′(x) 6= 0 on I, then f isstrictly monotone on I. Hint. Apply the extreme value theorem to thefunction g(x) = f(x)− y0(x− a), a ≤ x ≤ b.

26. Let f be differentiable on I := (1,+∞). Prove that if f ′ has finitelymany zeros in I, then limx→+∞ f(x) exists in R.

27. Let f and g have continuous derivatives on an interval I with g′ 6= 0 andlet aj , bj ∈ I with aj < bj , j = 1, . . . , n. Prove that there exists c ∈ Isuch that

n∑j=1

[f(bj)− f(aj)]g′(c) =n∑j=1

[g(bj)− g(aj)]f ′(c).

28.S A function f is said to be uniformly differentiable on an open interval Iif, given ε > 0, there exists δ > 0 such that∣∣∣∣f(x)− f(y)

x− y− f ′(y)

∣∣∣∣ < ε

for all x and y in I with 0 < |x − y| < δ. Prove that f is uniformlydifferentiable on I iff f ′ exists and is uniformly continuous on I.

29. Generalize the preceding exercise as follows: Let f and g be differentiableon an open interval I with g′ 6= 0 on I. Prove that f ′/g′ is uniformlycontinuous on I iff, given ε > 0, there exists a δ > 0 such that∣∣∣∣f(x)− f(y)

g(x)− g(y) −f ′(y)g′(y)

∣∣∣∣ < ε,

for all x and y in I with 0 < |x− y| < δ.

30. Let f be differentiable on [a,+∞) and suppose that the zeros of f ′ forma strictly increasing sequence an ↑ +∞. Prove that if L := limn f(an)exists in R, then limx→+∞ f(x) = L.

31.S Prove that a function f is continuously differentiable on an open intervalI iff there exists a continuous function ϕ on I2 such that

f(x)− f(y) = ϕ(x, y)(x− y) for all x, y ∈ I.

32. Let f be continuous on (−r, r) and differentiable on (−r, 0) ∪ (0, r). Iflimx→0 f

′(x) exists, prove that f ′(0) exists and f ′ is continuous at 0.

*4.3 Convex Functions4.3.1 Definition. A function f is said to be convex on an interval (a, b) if

f((1− t)u+ tv

)≤ (1− t)f(u) + tf(v)

for all a < u < v < b and all t ∈ [0, 1]. f is concave if −f is convex. ♦

For example, |x| is convex on R, as is easily established using the triangleinequality.

To see the geometric significance of convexity, let Luv : [u, v]→ R denotethe function whose graph is the line segment from (u, f(u)) to (v, f(v)). Sincea typical point on the line segment may be written

(1− t)(u, f(u)) + t(v, f(v)

)=((1− t)u+ tv, (1− t)f(u) + tf(v)

), t ∈ [0, 1],

we see thatLuv

((1− t)u+ tv

)= (1− t)f(u) + tf(v).

This shows that f is convex iff the line segment connecting any two points onthe graph of f lies above the part of the graph between the two points. (SeeFigure 4.4.)

Luv

a u b

f

vx

FIGURE 4.4: Convex function.

Now let x ∈ (u, v). Then for some t ∈ (0, 1),

x = (1− t)u+ tv = t(v − u) + u = (1− t)(u− v) + v,

hencet = (x− u)/(v − u) and 1− t = (v − x)/(v − u).

It follows that f is convex on (a, b) iff

f(x) ≤ Luv(x) = f(u)v − xv − u

+ f(v)x− uv − u

for all a < u < x < v < b. (4.3)

4.3.2 Theorem. If f : (a, b) → R has an increasing derivative, then f isconvex. In particular, f is convex if f ′′ ≥ 0.

Proof. Let a < u < x < v < b. By the mean value theorem applied to f oneach of the intervals [u, x] and [x, v], there exist points y ∈

(u, x

)and z ∈

(x, v)

such thatf(x)− f(u)

x− u= f ′(y) ≤ f ′(z) = f(v)− f(x)

v − x.

Solving the inequality for f(x) yields (4.3).

Thus x2n is convex on R for any n ∈ N, ln(x) is concave on (0,+∞), andxp is convex on (0,+∞) if p ≥ 1 and concave if p < 1.

There is a partial converse to 4.3.2. For this we need following lemma.

4.3.3 Lemma. If f is convex and a < u < x ≤ y < v < b, then

(a) f(x)− f(u)x− u

≤ f(y)− f(u)y − u

≤ f(v)− f(y)v − y

, and

(b) f(v)− f(x)v − x

≤ f(v)− f(y)v − y

.

Proof. Referring to Figure 4.5, for (a) we have

f(x)− f(u)x− u

≤ Luy(x)− f(u)x− u

by convexity, since u < x < y,

= f(y)− f(u)y − u

by equality of slopes on Luy,

≤ Luv(y)− f(u)y − u

by convexity, since u < y < v,

= Luv(v)− Luv(y)v − y

by equality of slopes on Luv,

≤ f(v)− f(y)v − y

by convexity since u < y < v.

Luv

Luy

Lxv

vu x y

f

FIGURE 4.5: Convex function inequalities.

A similar calculation verifies (b):

f(v)− f(y)v − y

≥ Lxv(v)− Lxv(y)v − y

= Lxv(v)− Lxv(x)v − x

= f(v)− f(x)v − x

.

4.3.4 Theorem. If f is convex, then f ′r and f ′` exist, are increasing, andf ′`(x) ≤ f ′r(x).

Proof. Let a < u < x ≤ y < v < b. By (a) of the lemma, the differencequotients [f(x)− f(u)]/(x− u) decrease as x→ u+, so f ′r(u) exists in R and

f ′r(u) ≤ f(v)− f(y)v − y

< +∞.

Letting v → y+ shows that f ′r(u) ≤ f ′r(y). Therefore, f ′r is increasing. Similarly,by (b) the difference quotients [f(v) − f(y)]/(v − y) increase as y → v− sof ′`(v) exists in R and

f ′`(v) ≥ f(v)− f(x)v − x

> −∞.

Taking x = y in (a) of the lemma, we have

f(x)− f(u)x− u

≤ f(v)− f(x)v − x

.

Letting u ↑ x and v ↓ x, we obtain f ′`(x) ≤ f ′r(x). In particular, f ′`(x) andf ′r(x) are finite.

4.3.5 Corollary. A convex function f is continuous.

Proof. By the theorem, f has finite left-hand and right-hand derivatives andhence is left and right continuous.

4.3.6 Theorem. If a convex function f is differentiable at x ∈ (u, v), then

f ′(x)(t− x) + f(x) ≤ f(t) for all t ∈ (u, v).

That is, the tangent line at (x, f(x)) lies below the graph of f on (u, v).

Proof. Since the difference quotients[f(t)− f(x)

]/(t− x) decrease as t ↓ x,

f ′r(x) ≤ f(t)− f(x)t− x

, t > x.

The same difference quotients increase as t ↑ x, hence

f ′l (x) ≥ f(t)− f(x)t− x

, t < x.

Therefore, if f ′(x) exists, then f ′(x)(t− x) + f(x) ≤ f(t) for all t.

4.4 Inverse FunctionsIn this section we prove that under suitable conditions the inverse of a

one-to-one continuous (differentiable) function is continuous (differentiable).For this we need the following two lemmas. The proof of the first is illustratedin Figures 4.6 and 4.7.

4.4.1 Lemma. Let f be one-to-one on an interval I. If f has the intermediatevalue property, then f is strictly monotone and continuous on I.

Proof. Let a, b be arbitrary points in I with a < b. Assume, for definiteness,that f(a) < f(b). We claim that f(a) < f(x) < f(b) for all a < x < b.Indeed, if, say f(x) < f(a), then f(a) lies between f(x) and f(b), hence, bythe intermediate value property, there exists c ∈ (x, b) such that f(c) = f(a),contradicting that f is one-to-one.

Next we show that f is strictly increasing on [a, b]. Let a < x1 < x2 < band suppose that f(x2) < f(x1). Then f(x2) lies between f(a) and f(x1),hence there exists d ∈ (a, x1) such that f(d) = f(x2), again contradicting thatf is one-to-one. Thus f is strictly increasing on [a, b]. It follows that f mustbe strictly increasing on any closed and bounded subinterval of I containing

a bx c

f

f(a) = f(c)

f(b)

f(x)

a x1 x2d b

f

f(d) = f(x2)

f(x1)

f(a)

FIGURE 4.6: f(x) < f(a) or f(x1) > f(x2) violates one-to-one hypothesis.

x1 x x0

α = f(x) < f(x2) < α

f(x0)

x2x

f

β

α

FIGURE 4.7: Intermediate value property implies continuity.

[a, b]. Since every pair of points in I lies in such a subinterval, f is strictlyincreasing on I.

Now let x0 ∈ I. To verify continuity of f at x0, note that by monotonicity

α := limx→x−0

f(x) ≤ f(x0) ≤ β := limx→x+

0

f(x).

(If x0 is an endpoint, only one of these inequalities holds.) Continuity of f atx0 will then follow if we show that α = f(x0) = β. Suppose, for example, thatα < f(x0). Choose any x1 < x0 in I. Since f(x1) < α < f(x0), there existssome x ∈ (x1, x0) such that f(x) = α. But choosing x2 ∈ (x, x0) then producesthe contradiction f(x) = α < f(x2) < α.

4.4.2 Lemma. If f is strictly increasing (decreasing) on an interval I, thenf−1 is strictly increasing (decreasing) on f(I).

Proof. Assume that f is strictly increasing. If y1 = f(x1) < y2 = f(x2),then x1 < x2 (that is, f−1(y1) < f−1(y2)), since otherwise f(x1) ≥ f(x2).Therefore, f−1 is strictly increasing on I.


The next two theorems are the main results on inverse functions. Theyassert that the properties of continuity or differentiability of a one-to-onefunction are inherited by the inverse function.

4.4.3 Theorem. Let f be continuous and one-to-one on an interval I. ThenJ := f(I) is an interval and f−1 : J → I is continuous. Moreover, f and f−1

are strictly monotone.

Proof. Since f is continuous, it has the intermediate value property, hence Jis an interval. Moreover, by 4.4.1 and 4.4.2, f and f−1 are strictly monotone.Since I = f−1(J) is an interval, f−1 has the intermediate value property. Thecontinuity of f−1 now follows from 4.4.1.

4.4.4 Theorem. Let I be an open interval and let f : I → R be continuousand one-to-one on I. If f is differentiable at a ∈ I and f ′(a) 6= 0, then f−1 isdifferentiable at f(a), and (

f−1)′(f(a)) = 1f ′(a) .

Proof. Let y = f(x) and b = f(a). For x near a,

f−1(y)− f−1(b)y − b

= x− af(x)− f(a) =

[f(x)− f(a)

x− a

]−1.

Since f−1 is continuous, x = f−1(y)→ f−1(b) = a as y → b and the conclusionfollows.

If f is differentiable and nonzero on I and y = f−1(x), then x = f(y) andassertion of the theorem may be written in Leibniz notation as

dy

dx= 1

dx

dy

.

From 4.4.3 we obtain the following result, which will be generalized inChapter 9 to functions on open subsets of Rn.

4.4.5 Inverse Function Theorem. Let f be continuously differentiable onan open interval I. If f ′(a) 6= 0, then there exist open intervals Ia ⊆ I andJa = f(Ia) with a ∈ Ia such that f is one-to-one on Ia and f−1 : Ja → Ia iscontinuously differentiable.

Proof. Since f ′ is continuous and f ′(a) 6= 0, there exists an open interval Iacontaining a on which f ′ 6= 0. By the mean value theorem, f is one-to-one onIa, hence, by 4.4.3, Ja = f(Ia) is an interval, and, by 4.4.4, f−1 : Ja → Ia iscontinuously differentiable.

4.4.6 Global Inverse Function Theorem. Let f be continuously differ-entiable with f ′ nonzero on an open interval I. Then f is one-to-one on I,J := f(I) is an open interval, and f−1 : J → I is continuously differentiable.

Proof. That f is one-to-one follows from the mean value theorem. By 4.4.3, Jis an interval and f−1 : J → I is continuous. Since continuous differentiabilityis a local property, 4.4.5 implies that f−1 is continuously differentiable.

The following examples, as well as exercises below, establish the existenceand basic properties of several well-known functions.

4.4.7 Example. Since x = sin y is strictly increasing on [−π/2, π/2], theinverse function y = sin−1 x exists, is strictly increasing on [−1, 1], and

dy

dx=(dx

dy

)−1= 1

cos y = 1√1− x2

, −1 < x < 1.

Similarly, x = cos y is strictly decreasing on [0, π], hence y = cos−1 x exists, isstrictly decreasing on [−1, 1], and

dy

dx=(dx

dy

)−1= −1

sin y = −1√1− x2

, −1 < x < 1. ♦

An alternate approach to the preceding example is to define the inversesine by

sin−1 x =∫ x

0

dt√1− t2

, −1 < x < 1

and then obtain the sine function as the inverse of sin−1. This allows thederivation of the standard properties of sin x, and ultimately of the other trigfunctions, without relying on geometric arguments. The disadvantage of thisapproach is that verification of these properties is detailed and lengthy. Stillanother approach is based on complex infinite series. For the latter, the readermay wish to consult [7].

The following example illustrates the integral approach for the exponentialfunction. Some of the assertions in the example rely on results from Chapters 5and 6 but should be familiar to the reader.

4.4.8 Example. The natural logarithm function is defined by

ln x :=∫ x

1

1tdt, x > 0.

One may show that all the familiar algebraic properties of the natural logfollow from this definition. (See Exercise 5.) Since ln x is strictly increasing on(0,+∞), the inverse function

expx := ln−1 x


exists and is strictly increasing. Since ln 2 > 0,

ln 2n = n ln 2→ +∞ and ln 2−n = −n ln 2→ −∞,

hencelim

x→+∞ln x = +∞ and lim

x→0+ln x = −∞.

It follows from these limits and the intermediate value theorem that the rangeof ln x, that is, the domain of expx, must be R. Thus, by Exercise 4,

limx→−∞

expx = 0 and limx→+∞

expx = +∞.

From the fundamental theorem of calculus, proved in the next chapter,d ln ydy

= 1y, hence

d expxdx

=(d ln ydy

)−1= y = expx.

Moreover, since

1 = d ln ydy

∣∣∣∣y=1

= limn→+∞

ln(1 + 1/n)− ln 11/n = lim

n→+∞ln(1 + 1/n)n,

continuity of exp and 2.2.4 imply that

exp 1 = limn→+∞

exp(

ln(1 + 1/n)n)

= limn→+∞

(1 + 1/n)n = e.

Additional properties of expx may be found in the exercises, including theidentity exp r = er, r ∈ Q. Because of this identity, we frequently write exfor expx. Indeed, the function exp is the basis for rigorous definitions of thegeneral exponential function ax, a > 0, and the power function xa, x ≥ 0. (SeeExercises 8 and 9.) ♦

Exercises1. Find f−1 and its domain for each of the following functions f with the

given domain:

(a) x2 − 4x+ 5, [2,+∞). (b) S 3x+ 22x+ 3 , R \ {−3/2}.

(c) 5e−x + 23e−x + 7 , (−∞,+∞). (d) sin2 x− 4 sin x+ 3, [−π/2, π/2].

(e) ex − 2e−x, (−∞,+∞) (f) S 2 + cosx3 + cosx, (0, π).

2. Let f(x) = ax+ |x|+ |x− 1|. Find all values of a for which f−1 existson R. For these values, find f−1.

3. Give an example of a one-to-one continuous function on the union oftwo intervals that is (a) not monotone, (b) strictly monotone but withdiscontinuous inverse.

4. Let f be defined, continuous, and strictly increasing on (a, b), so thelimits

c := limx→a+

f(x) and d := limx→b−

f(x)

exist in R. Show that the domain of f−1 is (c, d) and that

limx→c+

f−1(x) = a and limx→d−

f−1(x) = b.

5. Verify the following properties of ln x, as defined in 4.4.8:

(a) ln 1 = 0, ln e = 1. (b) S ln(xy) = ln x+ ln y.(c) ln(x/y) = ln x− ln y. (d) ln xr = r ln x, r ∈ Q.

6. Prove that exp(x+ y) = exp(x) exp(y).

7. For c, d ∈ R with c > 0, define cd = exp(d ln c). Show that this definitionagrees with the usual one if d is rational and verify the following properties,where x, y ∈ R and a, b > 0.(a) ln ax = x ln a. (b)S axay = ax+y. (c) ax/ay = ax−y.(d)

(ax)y = axy. (e) (ab)x = axbx. (f) aln b = bln a.

8. Let a > 0, a 6= 1, and define ax as in Exercise 7. Find limx→−∞ ax,limx→+∞ ax, and (ax)′.

9.S Let a ∈ R and for x > 0 define xa as in Exercise 7. Prove the power rule(xa)′ = axa−1.

10. Prove that tan x restricted to (−π/2, π/2) has a differentiable inversedefined on R. Find limx→−∞ tan−1 x, limx→+∞ tan−1 x, and (tan−1 x)′.

11. Prove that secx restricted to [0, π/2) ∪ [π, 3π/2) has a continuous in-verse defined on (−∞,−1] ∪ [1,+∞). Show that sec−1 x is differen-tiable on (−∞,−1) ∪ (1,+∞) and compute its derivative. Also, findlimx→−∞ sec−1 x and limx→+∞ sec−1 x.

12. Verify the inequalities

(a) x− 1x

< ln x < x− 1, x > 1. (b) | tan−1 x− tan−1 y| ≤ |x− y|.

(c) y − x√1− x2

< | sin−1 y − sin−1 x| < y − x√1− y2

, −1 < x < y < 1.

13. Verify the identities

(a) tan(

sin−1 x)

= x√1− x2

, −1 < x < 1.

(b) sin−1 x+ cos−1 x = π/2, −1 ≤ x ≤ 1.

(c)S cos−1(x2 − 1x2 + 1

)+ 2 tan−1 x = π, x ≥ 0.

(d) cos−1 x = 2 sin−1√

1− x2 , −1 ≤ x ≤ 1.

(e) tan−1 x+ tan−1(2/x) + tan−1(x+ 2/x) = π, x 6= 0.

14.S Suppose f satisfies f(x+ y) = f(x)f(y) for all x, y ∈ R. Show that ifa := f ′(0) exists, then f(x) = f(0)eax.

15. Suppose that f : [0, 1] → [0, 1] is continuous, one-to-one, onto, andf = f−1. Prove that either f(x) = x for all x or f is monotone decreasing.

16. Suppose f ′ is one-to-one on an open interval I. Show that f ′ is continuousand strictly monotone on I. (See Exercise 4.2.25.)

17. Let f be differentiable on an open interval I with f ′ 6= 0. Let a, b ∈ Iwith a < b and suppose that f : [a, b] → [a, b] is one-to-one and onto.Prove that there exists c ∈ (a, b) such that

f(b)− f(a)f−1(b)− f−1(a) = f ′(c)f ′

(f−1(c)

).

18.S Let f be twice differentiable and f ′ 6= 0 on an open interval I. Showthat (f−1)′′(x) exists on f(I) and find a formula.

4.5 L’Hospital’s RuleThe rule for calculating the limit of a quotient of functions, namely,

limx→ax∈E

f(x)g(x) =

lim{x→a, x∈E} f(x)lim{x→a, x∈E} g(x) , (4.4)

requires that the limits on the right are finite and the denominator is not 0. If,instead, the limits in the quotient are both zero or ±∞, then the expression onthe left in (4.4) is called an indeterminate form of type 0

0 or ±∞±∞ , respectively.There are other types of indeterminate forms, but all may be converted to oneof these. The following theorem describes a method for evaluating these limits.

4.5.1 l’Hospital’s Rule. Let J be an open interval, finite or infinite, and leta ∈ R be an accumulation point of J . Suppose that f and g are differentiableon E := J \ {a} and that g(x)g′(x) 6= 0 for every x ∈ E. If the limits

A := limx→ax∈E

f(x), B := limx→ax∈E

g(x), and L := limx→ax∈E

f ′(x)g′(x)

exist in R and either A = B = 0 or B = ±∞, then

limx→ax∈E

f(x)g(x) = L.

Proof. There are a number of cases to consider, but the proofs of many ofthese are essentially the same. We prove the theorem for four fundamentallydifferent cases and for one-sided limits, so E = (a, c) or (c, a) for some c.

As a first step, we use the Cauchy mean value theorem to obtain, for everypair of distinct numbers x, b ∈ E, a number ξ = ξ(x, b) between x and b suchthat

[f(x)− f(b)]g′(ξ) = [g(x)− g(b)]f ′(ξ). (4.5)

Now seth(x) = f(x)

g(x) .

Case 1 : A = B = 0, a and L are finite, and E = (a, c). Extend f and gcontinuously to [a, c) by defining f(a) = g(a) = 0. Taking b = a and x ∈ (a, c)in (4.5) we see that

h(x) = f ′(ξ)g′(ξ) .

Since ξ → a as x→ a, limx→a+ h(x) = L, as required.For the remaining cases, we use the Cauchy mean value theorem in the

following form. Divide (4.5) by g′(ξ)g(x) and solve the resulting equation forh = f/g to obtain

h(x) = f(b)g(x) +

[1− g(b)

g(x)

]f ′(ξ)g′(ξ) , x, b ∈ E. (4.6)

Case 2 : A = B = 0, a = L = +∞, and E = (c,+∞). Let M > 0 andchoose x0 ∈ E such that

f ′(x)g′(x) > 2M for x > x0.

Let b > x > x0. For large b, g(b)/g(x) < 1/2, hence from (4.6)

h(x) ≥ f(b)g(x) + 1

2(2M) = f(b)g(x) +M.

Letting b→ +∞ we see that h(x) ≥M . Therefore, limx→+∞ h(x) = +∞.Case 3 : B = +∞, a and L are finite and E = (c, a). Given ε > 0, choose

b ∈ E such that ∣∣∣∣f ′(t)g′(t) − L∣∣∣∣ < ε/2 for all t ∈ (b, a).

Let x ∈ (b, a). By (4.6),

h(x)− f ′(ξ)g′(ξ) = f(b)

g(x) −g(b)g(x)

f ′(ξ)g′(ξ) .

Since the right side tends to 0 as x→ a,

|h(x)− L| ≤∣∣∣h(x)− f ′(ξ)

g′(ξ)

∣∣∣+∣∣∣f ′(ξ)g′(ξ) − L

∣∣∣ < ε/2 + ε/2 = ε

for all x near a. Therefore, limx→a− h(x) = L.Case 4 : B = +∞, a = L = +∞, and E = (c,+∞). Given M > 0, choose

b > c such thatf ′(t)g′(t) > 3M for all t > b.

Let x > b such that g(x) > g(b). By (4.6),

h(x) ≥ f(b)g(x) +

[1− g(b)

g(x)

]M.

Since the quotients on the right side tend to zero, for all sufficiently large xwe have

h(x) ≥ −M2 +[1− 1

2

](3M) = M

Therefore, limx→+∞ h(x) = +∞.

The following examples illustrate typical applications of l’Hospital’s rule.

Examples. (a) The limit

L := limx→0

x− tan xx3

is of the form 00 , hence

L = limx→0

1− sec2 x

3x2 = limx→0

2 sec2 x tan x−6x = lim

x→0

sec4 x+ 2 secx tan2 x

−3 = −13 .

Note that each step except the last produces a limit of the form 00 , allowing

another application of l’Hospital’s rule. The validity of each step is ultimatelyjustified by the existence of the final limit.


(b) The limit

L := limx→+∞

sin(1/x)e1/x − 1

is of the form 00 ; however, it is complicated to apply l’Hospital’s rule directly.

Making the substitution y = 1/x produces a more tractable problem:

L = limy→0+

sin yey − 1 = lim

y→0+

cos yey

= 1.

(c) The limitL := lim

x→+∞x sin(1/x)

is of the form ∞ · 0, but a simple algebraic manipulation produces the form 00 :

L = limx→+∞

sin(1/x)1/x = lim

y→0+

sin yy

= 1.

Here, l’Hospital’s rule was unnecessary, since we could use a known limit.(d) The limit L := limx→1+ x1/(xp−1), p > 0, is of the form 1∞, so we take

logarithms to obtain the form 00 :

limx→1+

ln[x1/(xp−1)

]= limx→1+

ln xxp − 1 = lim

x→1+

1/xpxp−1 = 1

p.

Thus L = e1/p.(e) The technique used in (d) shows that

limx→+∞

(1 + t

x

)x= et,

sincelim

x→+∞ln(

1 + t

x

)x= limy→0+

ln(1 + ty)y

= limy→0+

t

1 + ty= t.

(f) The limitL := lim

x→π/2+

[ 1x− π/2 + secx

]is of the form ∞−∞. Combining fractions we obtain a limit of the form 0

0 .Thus

L = limx→π/2+

cosx+ x− π/2(x− π/2) cosx

= limx→π/2+

1− sin x(π/2− x) sin x+ cosx

= limx→π/2+

− cosx(π/2− x) cosx− 2 sin x

= 0. ♦

Exercises1. Evaluate the following limits, where p, q > 0:

(a) S limx→0

epx − eqx

sin x (b) limx→1

epx − ep

tan(x− 1) (c) limx→+∞

ln(3x2 − 1)ln(5x2 − 1)

(d) S limx→+∞

x(

1− e1/x)

(e) limx→0+

ln(sin px)ln(sin qx) (f) lim

x→0

p sin(px)− p2x

x3

(g) S limx→0

x− tan−1 x

x− sin−1 x(h) lim

x→0+

(1x− 1

sin x

)(i) lim

x→1+ln(x− 1) ln x

(j) S limx→0+

(sin x)(ln x) (k) limx→0+

x(xp) (l) limx→0

(cosx)1/x2

(m) S limx→0+

[x

tan x −1x

](n) lim

x→1+

(x+ 1x− 1

)x−1(o) lim

x→1

√x ln xx− 1

(p) S limx→0

1− cosx2

x2 + x3 sin x (q) limx→0

sin x+ cosx− 1ln(1 + x) (r) lim

x→+∞

(x− 1x+ 1

)x(s) S lim

x→+∞x1/(ln ln x)p (t) lim

x→+∞

(1− 1√

x

)−x(u) lim

x→0+(sin x)x

(v) S limx→0+

xsin x (w) limx→0

x cosx− sin xx2 sin x (x) lim

x→0+

(1 + x)1/x − ex

2. For each function f : (0, 1]→ R below, define f(0) so that f continuouson [0, 1].

(a) 1− exx

. (b) ln(1 + x)x

. (c) S sin 5xsin 3x.

(d) x

tan x. (e) x ln x. (f) 1− cos 2x1− cos 3x.

3. Find limn an, where an =(a)S sin1/n(1/n). (b) n− n2 ln(1 + 1/n). (c) n [(1 + 1/n)n − e] .

4. Show that (n+ 1/n

)p − np →

0 if p < 2,2 if p = 2+∞ if p > 2

5. By considering the sequences {n} and {n+ 1/n}, use l’Hospital’s rule toprove that ex is not uniformly continuous on [0,+∞).

6.S Let f(x) = x1+1/x. Evaluate limn

[f(n+ 1)− f(n)

].

7. Let f be differentiable on (a, b) and suppose that limx→a+ f(x) andlimx→a+ f ′(x) exist in R. Find a continuous extension of f to [a, b) suchthat f ′ exists and is continuous at a.


8. Let g be differentiable on (1,+∞) and h differentiable on (−∞, 1] with

limx→1+

g(x) = h(1) and limx→1+

g′(x) = h′`(1). (†)

Define

f(x) ={g(x) if x > 1h(x) if x ≤ 1.

Show that f is differentiable at x = 1 and hence on R. Conversely,suppose that f ′(1) exists. Do the limit equations in (†) hold?

9.S Let f and g be differentiable on (0,+∞) with

limx→+∞

f(x) = limx→+∞

g(x) = +∞, and limx→+∞

f ′(x)g′(x) ∈ (0,+∞).

Evaluatelim

x→+∞

ln f(x)ln g(x) .

10. Let f be differentiable in a neighborhood of a and suppose that f ′′(a)exists. For α, β ∈ R calculate

(a)S limh→0

βf(a+ αh)− αf(a+ βh) + (α− β)f(a)h2 .

(b) limh→0

f(a+ αh) + f(a+ βh)− 2f(a)h2 if f ′(a) = 0.

11. Suppose that f has n derivatives on [a,+∞) and that limx→+∞ f (n)(x)exists in R. Prove that limx→+∞ f(x)/xn exists in R.

12.S Suppose that f has n derivatives on (0, a) and L := limx→0+

x2nf (n)(x)exists in R. Find limx→0+ xnf(x) in terms of L.

13. Let f be differentiable on (1,+∞) and limx→+∞ f(x) = 0. Prove thatif limx→+∞ x2f ′(x) exists in R, then limx→+∞ xf(x) also exists in R. Isthe converse true?

14. Suppose that, in a deleted neighborhood of 0, f is differentiable withf ′ 6= 0 and that limx→0 f(x) = 0. Prove that if limx→0 f(x)/f ′(x) exists,then it must equal 0.

15. Let g(x) be differentiable on (1,∞) with g and g′ nonzero and let f(x)be differentiable in a neighborhood of 0. Suppose that limx→+∞ g(x) = 0,f(0) = 0 and f ′ is continuous at 0. Find

limx→+∞

f(g(x))g(x) .

Give nontrivial examples of functions f and g that satisfy these condi-tions.

16.S Let f and g be differentiable on (1,+∞) with g′ 6= 0 and supposethat limx→+∞ f(x) = limx→+∞ g(x) = +∞ and that the limit L :=limx→+∞ f ′(x) exists in R. Find

limx→+∞

f(g(x))g(x) .

Give nontrivial examples that satisfy these conditions with L finite.

17. Let f be differentiable on (1,+∞) and suppose that limx→+∞ f(x) andlimx→+∞ f ′(x) exist in R. Prove that the second limit must be zero.Does the assertion still hold if limx→+∞ f(x) is infinite?

18.S Let f be differentiable on (1,+∞) and suppose that limx→+∞ f(x) andlimx→+∞ xf ′(x) exist in R. Prove that the second limit is zero. Does theassertion still hold if limx→+∞ f(x) is infinite?

19. Let f be differentiable in a deleted neighborhood of 0 and suppose thatlimx→0 f(x) and limx→0

[f ′(x) tan x

]exist in R. Prove that the second

limit must be 0. Does the assertion still hold if limx→0 f(x) is infinite?

20. Let f be differentiable on (0, b) and suppose that the limits limx→0+ f(x)and limx→0+ x2f ′(x) exist in R. Prove that one of these limits must bezero. Does the assertion still hold if limx→0+ f(x) is infinite?

21. Let f ′′ exist and be continuous on (−1, 1) and f(0) = f ′(0) = 0. Provethat there exists a continuous function g on (0, 1) such that f(x) = x2g(x).Must g be differentiable at 0?

4.6 Taylor’s Theorem on RTaylor’s theorem may be viewed as a generalization of the mean value

theorem. Its importance derives from its use in establishing various inequalitiesand from its fundamental connection with power series.

4.6.1 Taylor’s Theorem. Let f have n+ 1 derivatives in an open interval I.Then, for each x, a ∈ I with x 6= a, there exists a number c between x and asuch that

f(x) =n∑k=0

f (k)(a)k! (x− a)k + f (n+1)(c)

(n+ 1)! (x− a)n+1. (4.7)

Proof. Assume for definiteness that a < x. Define a function g on [a, x] by

g(t) =n∑k=0

f (k)(t)k! (x− t)k + α

(x− t)n+1

(n+ 1)! − f(x), (4.8)


where α is chosen so that g(a) = 0. Since g is continuous on [a, x], differentiableon (a, x) and g(x) = g(a), there exists, by Rolle’s theorem, c ∈ (a, x) such thatg′(c) = 0. From the calculations

d

dt

f (k)(t)k! (x− t)k =

f (k+1)(t)

k! (x− t)k − f (k)(t)(k − 1)! (x− t)

k−1 if k ≥ 1,

f ′(t) if k = 0,

we have

g′(t) =n∑k=0

f (k+1)(t)k! (x− t)k −

n−1∑k=0

f (k+1)(t)k! (x− t)k − α (x− t)n

n!

= f (n+1)(t)n! (x− t)n − α (x− t)n

n! .

In particular,

0 = g′(c) = f (n+1)(c)n! (x− c)n − α (x− c)n

n! ,

hence α = f (n+1)(c). Thus from (4.8),

0 = g(a) =n∑k=0

f (k)(a)k! (x− a)k + f (n+1)(c)

(n+ 1)! (x− a)n+1 − f(x),

which is (4.7).

Equation (4.7) is frequently written

f(x) = Tn(x, a) +Rn(x, a), where

Tn(x, a) =n∑k=0

f (k)(a)k! (x− a)k and Rn(x, a) = f (n+1)(c)

(n+ 1)! (x− a)n+1.

The expression Tn(x, a) is called the nth Taylor polynomial of f about a, andRn(x, a) is called the remainder. It may be shown that Tn(x, a) is the uniquepolynomial of degree ≤ n that best approximates f near a in the sense that

limx→a

f(x)− Tn(x, a)(x− a)n = 0.

(See Exercise 4.)The remainder term Rn(x, a) has other forms, one of which is given in

Exercise 3. Observe that if Rn(x, a)→ 0 as n→ +∞, then Tn(x, a)→ f(x),which implies that f(x) is expressible as a power series about a. We exploitthis idea in Section 7.4.

The following application of Taylor’s theorem is a generalization of thesecond derivative test.

4.6.2 nth Derivative Test. Let f have n continuous derivatives on an openinterval I and let a ∈ I with f (j)(a) = 0, 1 ≤ j ≤ n− 1, and fn(a) 6= 0.

(a) If n is even and f (n)(a) > 0 (f (n)(a) < 0), then f has a local minimum(local maximum) at a.

(b) If n is odd, then f has a neither a local minimum nor a local maximumat a.

Proof. Assume f (n)(a) > 0. By continuity, f (n)(x) > 0 for all x in an openinterval J containing a. Let x ∈ J , x 6= a. By Taylor’s theorem, there exists cbetween a and x such that

f(x) = f(a) + f (n)(c) (x− a)nn! .

Thus if n is even, then f(x) > f(a), hence f has a local minimum at a. If n isodd, then f(x) > f(a) if x > a and f(x) < f(a) if x < a, so f has a neither alocal maximum nor a local minimum at a. A similar argument works for thecase f (n)(a) < 0.

Note that the familiar second derivative test, obtained by taking n = 2 inthe theorem, is inconclusive for the function f(x) = x4 at a = 0. Here, onemust take n = 4.

Exercises1. Define f(0) = 0 and f(x) = e−1/x2

x 6= 0. Prove that f (n) exists on Rand f (n)(0) = 0 for all n. Conclude that every Taylor polynomial for fabout 0 is identically 0.

2. Verify the following inequalities:

(a)2n−1∑k=0

(−1)kxk < 11 + x

<

2n∑k=0

(−1)kxk, x > 0.

(b)S

2n−1∑k=0

(−1)kk! xk < e−x <

2n∑k=0

(−1)kk! xk, x > 0.

(c)2n∑k=1

(−1)k+1

(2k − 1)!x2k−1 < sin x <

2n+1∑k=1

(−1)k+1

(2k − 1)!x2k−1, 0 < x < π.

(d)2n−1∑k=0

(−1)k(2k)! x

2k < cosx <2n∑k=0

(−1)k(2k)! x

2k, 0 < x < π.

(e)n−1∑k=1

(−1)k−1

kxk < ln(1 + x) <

n∑k=1

(−1)k−1

kxk if n is odd,

the reverse inequalities if n is even.


3.S ⇓2 Show that if f (n+1) is continuous on I, then

Rn(x, a) = 1n!

∫ x

a

(x− t)nf (n+1)(t) dt.

Hint. Integrate by parts n times.

4. Prove that a polynomial Pn(x) =∑nk=0 ak(x− a)k satisfies

limx→a

f(x)− Pn(x)(x− a)n = 0

iff Pn = Tn, the nth Taylor polynomial of f about a.

5.S Let P (x) =∑nk=0 ak(x− a)k =

∑nk=0 bk(x− b)k. Show that

bk =n−k∑j=0

(j + k

k

)(b− a)jak+j .

6. Let P be a polynomial of degree n. Prove that the polynomials P (x± 1)may be written as linear combinations of P (k)(x), k = 0, . . . , n. Findsimplified expressions for P (x+ 1)± P (x− 1).

7. Let f have n derivatives on [0, 1]. Show that for each y 6= f(1) there existsan extension g of f to [0,+∞) with n derivatives such that g(b) = y forsome b > 1.

*4.7 Newton’s MethodA simple zero of a differentiable function f is a number z such that f(z) = 0

and f ′(z) 6= 0. Newton’s method is a rapidly converging recursion scheme forapproximating such a zero. The idea is to choose x1 near z and then define asequence {xn} recursively by

xn+1 = xn −f(xn)f ′(xn) , n = 1, 2, . . . , (4.9)

as illustrated in Figure 4.8. Under suitable conditions, the sequence is well-defined and converges to z, hence may be used to approximate z to (theoreti-cally) any desired degree of accuracy.

4.7.1 Newton’s Method. Let f ′′ be continuous on an open interval I andlet z be a simple zero of f in I. If x1 is chosen sufficiently near z, then thesequence {xn} lies in I and converges to z.


xnxn+1xn+2

y = f(xn) + f ′(xn)(x− xn)

xz

y = f(x)

FIGURE 4.8: Newton’s method.

Proof. Since f ′(z) 6= 0, there exists a neighborhood Iz of z contained in I onwhich |f ′| ≥ c > 0. Suppose that xn ∈ Iz. By Taylor’s theorem, for each x ∈ Ithere exists ξ between x and xn such that

f(x) = f(xn) + f ′(xn)(x− xn) + 12f′′(ξ)(x− xn)2.

In particular,

0 = f(z) = f(xn) + f ′(xn)(z − xn) + 12f′′(ξ)(z − xn)2.

Dividing by f ′(xn), we have

xn+1 − z = xn − z −f(xn)f ′(xn) = f ′′(ξ)

2f ′(xn) (z − xn)2.

Thus if d is the maximum of |f ′′| on Iz, then

|xn+1 − z| ≤ α|xn − z|2, α := d

2c .

Iterating, we have

|xn+1 − z| ≤ · · · ≤ α2k+1−1|xn−k − z|2k+1≤ · · · ≤ α2n−1|x1 − z|2

n

.

Thus if x1 is sufficiently near z, and in particular if α|x1− z| < 1, then xn ∈ Izfor all n and xn → z.

4.7.2 Example. Let f(x) = sin x− x/3. Since

f(3π/4) = 1/√

2− π/4 < 0 < 1− π/6 = f(π/2),

f has a zero in [3π/4, π/2] by the intermediate value theorem. Taking x1 = 3π/4yields the zero 2.27886266, accurate to eight decimal places. Taking x1 = 1produces the symmetric zero −2.27886266, while x1 = π/4 produces 0. ♦


If x1 is not sufficiently near z, then the sequence {xn} may converge moreslowly to z or may not converge at all (see Exercise 6).

4.7.3 Example. For an approximate solution of ex = 2−x we apply Newton’smethod to f(x) = ex + x− 2. By the intermediate value theorem, f has a zeroin (0, 1). The recursion formula for f is

xn+1 = xn − (exn + xn − 2)(exn + 1)−1.

Table 4.1 gives the first few terms of the sequence {xn} and the corresponding

TABLE 4.1: Newton’s method for ex + x− 2 = 0.

x1 x2 x3 x4 x5

1 .5378828 .4456167 .4428567 .4428544f(x1) f(x2) f(x3) f(x4) f(x5)

1.7182818 .2502604 .0070696 .0000059 .0000000x1 x2 x3 x4 x5

5 3.9866142 2.9686340 1.9701667 1.0961884f(x1) f(x2) f(x3) f(x4) f(x5)

151.4131591 55.8587993 20.4339472 7.1420387 2.0889256

values of f (accurate up to seven decimal places) for the initial values x1 = 1and x1 = 5. The convergence is significantly slower for the larger value. Thesolution, accurate to 10 decimal places, is .4428544010. ♦

Exercises1. Find a zero, accurate to eight decimal places, of the given polynomial in

the indicated interval.

(a) S x3 − x+ 2, [−2,−1]. (b) x3 + x+ 1, [−1, 0].(c) x3 − 2x+ 2, [−2,−1]. (d) Sx5 − 2x+ 3, [−2,−1].(e) x7 − x− 1, [1, 2]. (f) x4 − 2x3 + 5x2 − 8x− 6, [2, 3].(g)S 20x4 − 20x3 − 8x2 + 4x− 1, [1, 2].(h) 20x4 − 20x3 − 4x+ 1, [1, 2].

2. Find a solution of the given equation in the indicated interval, correct toeight decimal places.

(a) S sin x = x2, [.5, 1]. (b) sin x = x3, [.5, 1].(c) S ln x+ x = 2, [1, 2]. (d) 2 cosx = ex, [0, 1].(e) ln x = e−x, [1, 2]. (f) tan x+ x = 1, [0, 1].


3. Show that Newton’s method applied to the function x−1 − c producesthe equation xn+1 = 2xn − cx2

n. Use this to find 1/2.34567, correct toeight decimal places. Check your answer with a calculator.

4.S Use Newton’s method to find√

63 correct to eight decimal places. Checkyour answer with a calculator.

5. What happens when you apply Newton’s method with x1 = 1 to thepolynomial in part (c) of Exercise 1?

6. Show that the sequence generated by Newton’s method applied to f(x) =x1/3 cannot converge for any value of x1 6= 0.

Chapter 5Riemann Integration on R

5.1 The Riemann–Darboux IntegralThroughout this section, f denotes an arbitrary bounded,real-valued function on a closed and bounded interval [a, b].

The first step in the development of the Riemann–Darboux integral is topartition the interval [a, b] into finitely many subintervals, which are used toform upper and lower sums of f . Under suitable conditions, the sums convergeto the integral.

5.1.1 Definition. A partition of [a, b] is a set P = {x0, x1, . . . , xn−1, xn},where

x0 := a < x1 < · · · < xn−1 < xn := b.

The points x1, . . . , xn−1 are called the interior points of the partition. Themesh of the partition is defined as

‖P‖ := max1≤j≤n

∆xj , where ∆xj := xj − xj−1, 1 ≤ j ≤ n.

A refinement of P is a partition containing P. The common refinement ofpartitions P and Q is the partition P ∪Q. ♦

5.1.2 Example. Let p ∈ N. Then, for each n ∈ N,

Pn := {j/pn : j = 0, 1, . . . , pn}

is a partition of [0, 1], ‖Pn‖ = p−n, and Pn+1 is a refinement of Pn. ♦

5.1.3 Definition. The lower and upper (Darboux) sums of f over a partitionP of [a, b] are defined, respectively, by

S(f,P) :=n∑j=1

mj∆xj and S(f,P) :=n∑j=1

Mj∆xj ,

where

mj = mj(f) := infxj−1≤x≤xj

f(x) and Mj = Mj(f) := supxj−1≤x≤xj

f(x). ♦

107

A geometric interpretation of the upper and lower sums for a positivecontinuous function is given in Figure 5.1. The lower (upper) sum is the totalarea of the smaller (larger) rectangles.

a x1 x2 x3 x4 bx

f

FIGURE 5.1: Upper and lower sums of f .

The following proposition asserts that refinements increase lower sums anddecrease upper sums.

5.1.4 Proposition. If Q is a refinement of P, then

S(f,P) ≤ S(f,Q) ≤ S(f,Q) ≤ S(f,P).

Proof. The middle inequality is clear. To prove the rightmost inequality, letP = {x0 = a < x1 < · · · < xn−1 < xn = b} and assume first that Q = P ∪ {c}.Choose k so that xk−1 < c < xk and set

M ′k = supxk−1≤x≤c

f(x) and M ′′k = supc≤x≤xk

f(x).

Then M ′k, M ′′k ≤Mk, hence

S(f,Q) =k−1∑j=1

Mj∆xj +n∑

j=k+1Mj∆xj +M ′k(c− xk−1) +M ′′k (xk − c)

≤k−1∑j=1

Mj∆xj +n∑

j=k+1Mj∆xj +Mk(c− xk−1) +Mk(xk − c)

= S(f,P).

For the general case, observe that any refinement Q of P may be obtainedby successively adding points to P. At each step, the upper sum is decreasedso that ultimately one obtains the desired inequality. The proof for lower sumsis similar.

5.1.5 Corollary. For any partitions P and Q of [a, b],

S(f,Q) ≤ S(f,P). (5.1)

Proof. By 5.1.4, S(f,Q) ≤ S(f,P ∪Q) ≤ S(f,P ∪Q) ≤ S(f,P).

Riemann Integration on R 109

5.1.6 Definition. The lower and upper (Darboux ) integrals of f on [a, b] aredefined, respectively, by∫ b

a

f =∫ b

a

f(x) dx := supPS(f,P) and

∫ b

a

f =∫ b

a

f(x) dx := infPS(f,P),

where the supremum and infimum are taken over all partitions P of [a, b]. Ineach case, f is called the integrand and x the integration variable. ♦

5.1.7 Proposition. For any partition P of [a, b],

S(f,P) ≤∫ b

a

f ≤∫ b

a

f ≤ S(f,P).

Proof. The left and right inequalities are immediate from the definition oflower and upper integrals. The middle inequality follows by taking the infimumover Q and then the supremum over P in (5.1).

5.1.8 Proposition. The following statements are equivalent:

(a)∫ b

a

f =∫ b

a

f .

(b) For each ε > 0 there exists a partition Pε of [a, b] such that

S(f,Pε)− S(f,Pε) ≤ ε.

Proof. (a) ⇒ (b): Given ε > 0, there exist partitions P ′ and P ′′ such that∫ b

a

f − ε/2 < S(f,P ′) and S(f,P ′′) <∫ b

a

f + ε/2.

By 5.1.4, the inequalities still hold if P ′ and P ′′ are each replaced by theircommon refinement Pε := P ′ ∪ P ′′. Subtracting the resulting inequalities andapplying (a) yields (b).

(b) ⇒ (a): If the inequality in (b) holds then, by 5.1.7,

0 ≤∫ b

a

f −∫ b

a

f ≤ S(f,Pε)− S(f,Pε) < ε.

Since ε is arbitrary, the integrals must be equal.

5.1.9 Definition. The function f is said to be (Darboux) integrable on [a, b]if one (hence both) of the conditions (a), (b) of 5.1.8 hold. In this case, thecommon value of the integrals in (a) is called the (Riemann–Darboux ) integralof f on [a, b] and is denoted by∫ b

a

f =∫ b

a

f(x) dx.

Also, define ∫ a

b

f = −∫ b

a

f and∫ a

a

f = 0.

The collection of all integrable functions on [a, b] is denoted by Rba. ♦

The following theorem guarantees a rich supply of integrable functions.

5.1.10 Theorem. If f is continuous on [a, b] except possibly at finitely manypoints, then f ∈ Rba.

Proof. Denote the points of discontinuity of f , if any, by d1 < · · · < dn. Forconvenience, we assume that these lie in (a, b); only a minor modification ofthe proof is needed if d1 = a or dn = b. Let ε > 0. For each j, remove anopen interval of width r centered at dj , the value of r to be determined. Sincef is continuous on each of the resulting n + 1 closed intervals I0, . . . , In, itis uniformly continuous there. (If f is continuous on [a, b], then n = 0 andI0 = [a, b].) Thus there exists a δ > 0 such that for each j,

|f(x)− f(y)| < ε/2(b− a) for all x, y ∈ Ij with |x− y| < δ.

Now, the endpoints of the intervals Ij form a partition P of [a, b]. If necessary,refine P by inserting points (marked by ∗ in Figure 5.2) into these intervals sothat the distance between consecutive points is less than δ. The subintervals of

d2d1

a bP

I0 I2I1

Qβ β ββ α β α

r r

∗ ∗d1 d2

FIGURE 5.2: The partitions P and Q.

the resulting partition Q are of two types: those that contain some dj , whichwe mark by α, and those that do not, which we mark by β. Thus, in theobvious notation,

S(f,Q)− S(f,Q) =∑α

(Mj −mj)∆xj +∑β

(Mj −mj)∆xj .

In the first sum, ∆xj < r and in the second, Mj −mj ≤ ε/2(b− a). Since thefirst sum has n terms (corresponding to the n discontinuities dj),

S(f,Q)− S(f,Q) < 2Mnr + ε/2,

where M is a bound for |f | on [a, b]. Choosing r < ε/4Mn, we then haveS(f,Q)− S(f,Q) < ε, which shows that f is integrable on [a, b].

The set of discontinuities of an integrable function can be infinite but maynot be too large. We make this precise in Section 5.8. In the meantime, weoffer the following examples to illustrate the basic idea. In the first example,the function is discontinuous only on a countably infinite set, while in thesecond the function is discontinuous everywhere.

0 1x4x2 x3x1 1/n 1/(n− 1) x2n−2· · · 1/2x2n−3

FIGURE 5.3: The partition Pn of Example 5.1.11.

5.1.11 Example. Let f be any bounded function on [0, 1] such that f(x) = 0if x 6∈ {1/n : n = 2, 3 . . .}. We claim that f is integrable and that

∫ 10 f = 0.

The idea is to enclose the points of discontinuity of f in small intervals, as inthe proof of 5.1.10. Fix n and let

Pn = {x0 = 0, x1, x2, . . . , x2n−2, x2n−1 = 1},

where

x2j−1 < 1/(n− j + 1) < x2j < x2j+1, j = 1, 2, . . . , n− 1, and∆x2j = x2j − x2j−1 < 1/n2, j = 1, 2, . . . , n.

(See Figure 5.3.) Let |f | ≤ M on [0, 1]. Since f = 0 on [x2j , x2j+1] andmj ≥ −M ,

S(f,Pn) = m1x1 +m2(x2 − x1) + · · ·+m2n−2(x2n−2 − x2n−3)≥ −M

[x1 + (x2 − x1) + (x4 − x3) + · · ·+ (x2n−2 − x2n−3)

]≥ −M(1/n+ (n− 1)/n2) = −M(2/n− 1/n2).

A similar calculation shows that S(f,Pn) ≤M(2/n− 1/n2). Therefore,

limnS(f,Pn) = lim

nS(f,Pn) = 0,

hence f is integrable with zero integral. ♦

5.1.12 Example. The Dirichlet function d(x) (3.1.7) is not integrable on any(nondegenerate) interval [a, b]. Indeed, every upper sum of d(x) has the valueb− a and every lower sum has the value 0. ♦

A useful characterization of integrability may be given in terms of thelimits of S(f,P) and S(f,P) as ‖P‖ → 0.

5.1.13 Definition. Let L ∈ R. We write L = lim‖P‖→0 S(f,P) if, given ε > 0,there exists δ > 0 such that |S(f,P)−L| < ε for all partitions P with ‖P‖ < δ.The limit lim‖P‖→0 S(f,P) is defined analogously. ♦

5.1.14 Lemma. Let P ′ = {x′0 = a < x′1 < · · ·x′n < x′n+1 = b} be a partitionof [a, b] and let |f | ≤M on [a, b]. Then

S(f,P) ≤ S(f,P ′) + 3nM‖P‖

for all partitions P of [a, b] with ‖P‖ < δ′ := minj ∆x′j.

Pα αγ γ γγ γ

P ′a x′1 x′2 b

P ′′β β β βγ γ γ γ γ

FIGURE 5.4: The partitions P ′, P, and P ′′.

Proof. Since ‖P‖ < ∆x′j , no interval of P can contain more that one interiorpoint of P ′. Mark the intervals of P that contain exactly one interior point ofP ′ by α and mark those that contain no interior point of P ′ by γ. Considerthe common refinement P ′′ = P ∪ P ′ of P and P ′. Some of the intervals ofP ′′ were formed from an interval of P of type α; we mark those by β. Theremaining intervals of P ′′, intervals that were not formed from an interval of Pof type α, are precisely the intervals marked γ in P . Thus the terms of S(f,P)and S(f,P ′′) corresponding to intervals of type γ are identical, hence cancelunder substraction of upper sums. Therefore, in the obvious notation,

S(f,P)− S(f,P ′′) =∑α

Mj(f)∆xj −∑β

M ′′j (f)∆x′′j

≤M[∑

α

∆xj +∑β

∆x′′j]

≤M(n‖P‖+ 2n‖P ′′‖

),

the last inequality because there are at most n intervals of type α and at most2n intervals of type β. Since P ′′ is a refinement of P ′ and P,

S(f,P)− S(f,P ′) ≤ S(f,P)− S(f,P ′′) ≤ 3nM‖P‖.

5.1.15 Theorem. For any bounded function f on [a, b],∫ b

a

f = lim‖P‖→0

S(f,P) and∫ b

a

f = lim‖P‖→0

S(f,P). (5.2)

Thus f is integrable on [a, b] iff the limits in (5.2) are equal, in which case∫ b

a

f = lim‖P‖→0

S(f,P) = lim‖P‖→0

S(f,P). (5.3)

Proof. Given ε > 0, choose a partition P ′ such that

S(f,P ′) <∫ b

a

f + ε/2.

In the notation of 5.1.14, for any partition P with ‖P‖ < δ′,

S(f,P) ≤ S(f,P ′)|+ 3nM‖P‖ <∫ b

a

f + ε/2 + 3nM‖P‖.

Hence if ‖P‖ < min{δ′, ε/6nM}, then∫ b

a

f ≤ S(f,P) <∫ b

a

f + ε.

Since ε was arbitrary, the first limit in (5.2) is established. The second followsfrom the first by considering −f and using Exercise 5.1.3.

Equation (5.3) represents the integral as a limit of upper and lower sums.It is also possible to represent the integral as a limit of intermediate sums,called Riemann sums.

5.1.16 Definition. Let P = {x0 = a < x1 < · · · < xn = b} be a partition of[a, b] and let ξ = (ξ1, . . . , ξn), where ξj ∈ [xj−1, xj ]. The sum

S(f,P, ξ) :=n∑j=1

f(ξj)∆xj

is called the Riemann sum of f determined by P and ξ. ♦

Figure 5.5 illustrates a Riemann sum for a positive continuous functionf . In this case S(f,P, ξ) is the total area of the rectangles with heights f(ξj)and bases ∆xj .

a x1 x2 x3 x4 bξ1 ξ2 ξ3 ξ4 ξ5

f

x

FIGURE 5.5: A Riemann sum.

5.1.17 Definition. Let P = {x0 = a < x1 < · · · < xn = b} be a partition of[a, b] and let ξ = (ξ1, . . . , ξn), where ξj ∈ [xj−1, xj ]. We write

L = lim‖P‖→0

S(f,P, ξ)

if for each ε > 0 there exists δ > 0 such that |S(f,P, ξ) − L| < ε for allpartitions P with ‖P‖ < δ and all choices of ξ. Similarly, we write

L = limPS(f,P, ξ)

if for each ε > 0 there exists a partition Pε such that |S(f,P, ξ)− L| < ε forall refinements P of Pε and all choices of ξ. ♦

We may now give Riemann’s characterization of integrability.

5.1.18 Theorem. The following statements are equivalent:

(a) f ∈ Rba.

(b) lim‖P‖→0

S(f,P, ξ) exists in R.

(c) limPS(f,P, ξ) exists in R.

If these conditions hold, then∫ b

a

f = lim‖P‖→0

S(f,P, ξ) = limPS(f,P, ξ).

Proof. (a) ⇒ (b): Let L =∫ baf . For any partition P and any ξ, we have

S(f,P)− L ≤ S(f,P, ξ)− L ≤ S(f,P)− L,

hence (b) follows from 5.1.15.(b) ⇒ (c): Let L := lim‖P‖→0 S(f,P, ξ). Given ε > 0, choose δ > 0 such

that

|S(f,P, ξ)− L| < ε for all partitions P with ‖P‖ < δ and all ξ. (5.4)

Choose any partition Pε with ‖Pε‖ < δ. If P is any refinement of Pε, then‖P‖ ≤ ‖Pε‖ < δ, hence (5.4) holds for P.

(c) ⇒ (a): Let L := limP S(f,P, ξ). Given ε > 0, choose a partition Pεsuch that

|S(f,P, ξ)− L| < ε for all refinements P of Pε and all ξ. (5.5)

For such a partition P , by the approximation property of suprema there existsfor each j a sequence {ξj,k}∞k=1 in [xj−1, xj ] such that limk f(ξj,k) = Mj(f). Itfollows that

limkS(f,P, ξk) = S(f,P), where ξk = (ξ1k, ξ2k, . . . , ξnk).

From (5.5),∫ baf − L ≤ S(f,P) − L ≤ ε. Since ε was arbitrary,

∫ baf ≤ L.

Similarly,∫ baf ≥ L. Therefore

∫ baf =

∫ baf .

Exercises1. Prove that if k is a constant, then

∫ bak =

∫ bak = k(b− a).

2. Let a ≤ c < d ≤ b. Define f on [a, b] by f(x) = 1 if x ∈ [c, d] andf(x) = 0 otherwise. Show that f ∈ Rba and evaluate

∫ baf .

3.S ⇓1 Prove that

(a) S(−f,P) = −S(f,P) and∫ ba

(−f) = −∫ baf.

(b) f ∈ Rba ⇒ −f ∈ Rba and∫ ba

(−f) = −∫ baf.

4. ⇓2 Prove that a monotone function is integrable.

5.S Let f ∈ Rba and let g : [a, b]→ R be any function that differs from f atfinitely many points in [a, b]. Prove that g ∈ Rba and that

∫ baf =

∫ bag.

Does the same result hold if g differs from f at countably many points?

6. Let f ∈ Rba. Prove:

(a) If infa≤x≤b f(x) > 0, then 1/f ∈ Rba.(b) If f(x) ≥ 0 for all x ∈ [a, b], then

√f ∈ Rba.

(c)S sin(f) ∈ Rba.

7.S Let F (P) be a real-valued function of partitions P on an interval [a, b].Write

L = limPF (P)

if, given ε > 0, there exists a partition Pε such that |F (P)− L| < ε forall partitions P refining Pε.(a) Show that the limit is linear, that is,

limP

[αF (P) + βG(P)

]= α lim

PF (P) + β lim

PG(P),

provided the right side exists.(b) Let f be a bounded function on [a, b]. With this definition, show that∫ b

a

f = limPS(f,P) and

∫ b

a

f = limPS(f,P).

8. Let f ∈ R10 and set g(x) = xq, where q > 0. Prove that f ◦ g ∈ R1

0.1This exercise will be used in 5.2.2.2This exercise will be used in 5.9.8.

5.2 Properties of the IntegralThe following lemma will be useful in proving certain properties of integrals.

5.2.1 Lemma. Let f : [a, b]→ R be bounded. Then there exists a sequence ofpartitions {Pn} of [a, b] such that

limn→∞

S(f,Pn) =∫ b

a

f and limn→∞

S(f,Pn) =∫ b

a

f.

Moreover, the limits still hold if each Pn is replaced by a refinement.

Proof. By the approximation property of infima and suprema, for each n thereexist partitions P ′n and P ′′n of [a, b] such that∫ b

a

f − 1/n < S(f,P ′n) ≤∫ b

a

f and∫ b

a

f ≤ S(f,P ′′n) <∫ b

a

f + 1/n.

Since refinements decrease upper sums and increase lower sums, the inequalitiesstill hold if P ′n and P ′′n are replaced by their common refinement Pn or by anyrefinement of Pn. Letting n→ +∞ completes the proof.

5.2.2 Theorem. If f, g ∈ Rba and α, β ∈ R, then αf + βg ∈ Rba and∫ b

a

(αf + βg

)= α

∫ b

a

f + β

∫ b

a

g.

Proof. By 5.2.1, we may choose a sequence of partitions Pn such that

limnS(f,Pn) =

∫ b

a

f and limnS(g,Pn) =

∫ b

a

g.

(There exists one such sequence for f , another for g; the sequence of commonrefinements then works for both functions.) Letting n→∞ in∫ b

a

(f + g) ≤ S(f + g,Pn) ≤ S(f,Pn) + S(g,Pn)

yields ∫ b

a

(f + g) ≤∫ b

a

f +∫ b

a

g.

Similarly, ∫ b

a

(f + g) ≥∫ b

a

f +∫ b

a

g.

It follows that f + g is integrable and∫ b

a

(f + g) =∫ b

a

f +∫ b

a

g.

It remains to prove that αf is integrable and that∫ baαf = α

∫ baf . If α > 0,

thenS(αf,P) = αS(f,P) and S(αf,P) = αS(f,P).

Taking the infimum and supremum over P yields∫ b

a

αf = α

∫ b

a

f =∫ b

a

αf.

If α < 0, then −α > 0, hence∫ b

a

αf =∫ b

a

(−α)(−f) = (−α)∫ b

a

(−f) = α

∫ b

a

f,

the last equality by Exercise 5.1.3.

5.2.3 Proposition. If f ∈ Rba and a ≤ c < d ≤ b, then f |[c,d] ∈ Rdc .

Proof. Given ε > 0, let P be a partition of [a, b] with S(f,P)− S(f,P) < ε.We may assume that c, d ∈ P , otherwise replace P by the refinement P∪{c, d}.If Q = P ∩ [c, d], then clearly

S(f |[c,d],Q

)− S

(f |[c,d],Q

)≤ S(f,P)− S(f,P) < ε,

hence f |[c,d] ∈ Rdc .

The following is a converse of 5.2.3.

5.2.4 Theorem. Let a < c < b. If f |[a,c] ∈ Rca and f |[c,b] ∈ Rbc, then f ∈ Rbaand ∫ b

a

f =∫ c

a

f +∫ b

c

f.

Proof. By 5.2.1, we may choose sequences of partitionsP ′n of [a, c] and P ′′n of[c, b] such that

limnS(f |[a,c],P ′n) =

∫ c

a

f and limnS(f |[c,b],P ′′n) =

∫ b

c

f.

Then Pn := P ′n ∪ P ′′n is a partition of [a, b] and∫ b

a

f ≤ S(f,Pn) = S(f |[a,c],P ′n) + S(f |[c,b],P ′′n).


Letting n→∞, we obtain ∫ b

a

f ≤∫ c

a

f +∫ b

c

f.

Replacing f by −f produces the reverse inequality for the lower integral of f ,proving the theorem.

5.2.5 Theorem. If f, g ∈ Rba and f ≤ g on [a, b], then∫ b

a

f ≤∫ b

a

g.

In particular, if m ≤ f(x) ≤M for all x ∈ [a, b], then

m(b− a) ≤∫ b

a

f ≤M(b− a).

Proof. Let P be a partition of [a, b]. By hypothesis, Mj(f) ≤Mj(g) for each j,hence S(f,P) ≤ S(g,P). Taking the infimum over P yields the first inequality.The second inequality follows from the first and Exercise 5.1.1.

5.2.6 Theorem. If f ∈ Rba, then |f | ∈ Rba and∣∣∣ ∫ b

a

f∣∣∣ ≤ ∫ b

a

|f |.

Proof. By Exercise 1.4.5, for any partition P of [a, b],

Mj(|f |)−mj(|f |) ≤Mj(f)−mj(f).

Summing over j,

S(|f |,P)− S(|f |,P) ≤ S(f,P)− S(f,P).

Since the right side can be made arbitrarily small, |f | ∈ Rba. Applying 5.2.5 to±f ≤ |f | we obtain

±∫ b

a

f ≤∫ b

a

|f |,

which gives the desired inequality.

5.2.7 Theorem. If f, g ∈ Rba, then fg ∈ Rba.

Proof. Since fg = 12[(f + g)2 − f2 − g2], it suffices to prove that f2 ∈ Rba. To

this end, let P be any partition of [a, b] and let |f | ≤M . Then

Mj(f2)−mj(f2) = M2j (|f |)−m2

j (|f |) ≤ 2M[Mj(|f |)−mj(|f |)

].

Summing over j,

S(f2,P)− S(f2,P) ≤ 2M[S(|f |,P)− S(|f |,P)

].

Since |f | ∈ Rba, the right side of the last inequality may be made arbitrarilysmall. Therefore, f2 ∈ Rba.

Exercises1.S Let {cn} be a convergent sequence in [a, b] and let f be a bounded

function on [a, b] with f(x) = 0 for all x 6∈ {cn}. Prove that f ∈ Rba andfind

∫ baf .

2. Define f on [0, 1] by f(0) = 0 and

f(x) = 2−n if 2−n−1 < x ≤ 2−n, n ≥ 0.

Prove that f ∈ R10 and evaluate

∫ 10 f .

3. Prove or disprove: |f | ∈ Rba implies f ∈ Rba.

4. A function s on [a, b] is called a step function if there exists a partitionof [a, b] such that s is constant on the interior of each partition interval.Show that a step function is integrable. Prove that a bounded function fis integrable on [a, b] iff for each ε > 0 there exist step functions s` andsu such that s` ≤ f ≤ su and

∫ ba

(su − s`) < ε.

5.S Prove that if fj ∈ Rba, 1 ≤ j ≤ n, then max{f1, . . . , fn} ∈ Rba andmin{f1, . . . , fn} ∈ Rba.

6.S Let f be continuous and f(x) < M for all x ∈ [a, b]. Prove that∫ b

a

f < M(b− a). (Compare with 5.2.5.)

7. Let f ∈ Rba be nonnegative. Prove that if f is continuous at some pointx0 ∈ [a, b] and f(x0) 6= 0, then

∫ baf > 0.

8. Let f ∈ Rba such that either(a)

∫ bafg = 0 for every continuous function g, or

(b)∫ bafg = 0 for every step function g.

Prove that f is zero at each point of continuity of f .

9.S Let f ∈ Rba and for x, y ∈ [a, b] define F (x, y) =∫ yxf . Prove that F (x, y)

is continuous in y for each x and continuous in x for each y.

10. Let f be bounded on [a, b] and integrable [c, b] for every a < c < b. Provethat the following statements are equivalent:

(a) limx→a+

∫ b

x

f exists in R.

(b) limn→+∞

∫ b

an

f exists in R for some sequence an ↓ a.

(c) f ∈ Rba.Conclude from Exercise 9 that if f ∈ Rba, then the limit in (a) is

∫ baf .


11. Let f be integrable on [0, x] for all x > 0. Prove that

lim infx→+∞

f(x) ≤ lim infx→+∞

1x

∫ x

0f ≤ lim sup

x→+∞

1x

∫ x

0f ≤ lim sup

x→+∞f(x).

Conclude that if L := limx→+∞ f(x) exists in R, then

limx→+∞

1x

∫ x

0f(t) dt = L.

12.S Let f be continuous on [a, b] and let M = supa≤x≤b |f(x)|. Prove:(a) For each ε > 0 there exists δ > 0 such that

δ(M − ε) ≤∫ b

a

|f(x)| dx ≤M(b− a).

(b) M = limp→+∞

(∫ b

a

|f |p)1/p

.

13. ⇓3 Let f, g : [a, b]→ R be continuous. Supply the details in the followingoutline of a proof of the Cauchy–Schwarz inequality.(∫ b

a

fg

)2≤(∫ b

a

f2)(∫ b

a

g2).

(a) The inequality holds if∫ b

a

g2 = 0.

(b) For any real number t,

0 ≤∫ b

a

(f − tg)2 =∫ b

a

f2 − 2t∫ b

a

fg + t2∫ b

a

g2.

(c) Let t =(∫ b

a

fg

)(∫ b

a

g2)−1

in (b).

5.3 Evaluation of the IntegralThe theorems in this section describe standard methods for evaluating

integrals. The first of these expresses the integral of a function f in terms of aprimitive or antiderivative, that is, a function whose derivative is f . It alsoshows that the process of integration is the inverse of that of differentiation.


5.3.1 Fundamental Theorem of Calculus. Let f : [a, b]→ R be continu-ous.

(a) The function G(x) :=∫ x

a

f(t) dt, x ∈ [a, b], is a primitive of f .

(b) For any primitive F of f ,∫ b

a

f = F (x)∣∣∣ba

:= F (b)− F (a).

(c) If f ′ ∈ Rba, then∫ b

a

f ′ = f(b)− f(a). In particular, f(x) = f(a) +∫ x

a

f ′.

Proof. (a) We assume that a ≤ x 0 and x+ h < b, then∣∣∣∣G(x+ h)−G(x)h

− f(x)∣∣∣∣ =

∣∣∣∣∣ 1h∫ x+h

x

[f(t)− f(x)

]dt

∣∣∣∣∣ ≤ 1h

∫ x+h

x

|f(t)− f(x)| dt.

By continuity of f at x, given ε > 0 we may choose δ > 0 such that |t− x| < δimplies |f(t) − f(x)| < ε. Thus if h < δ, then the term on the right in theabove inequality is ≤ ε, proving (5.6).

(b) Let F be any primitive of f . Then F = f ′ = G, hence F = G+ c forsome constant c. Thus from (a),∫ b

a

f = G(b)−G(a) = F (b)− F (a).

(c) For any partition P, by the mean value theorem

f(xj)− f(xj−1) = f ′(ξj)∆xj for some ξj ∈ [xj−1, xj ], j = 1, . . . , n.

For this choice of ξj ,

S(f ′,P, ξ) =n∑j=1

f ′(ξj)∆xj =n∑j=1

[f(xj)− f(xj−1)] = f(b)− f(a).

Since we may choose P so that S(f ′,P, ξ) is arbitrarily near∫ baf ′, (c) follows.

The general primitive of a continuous function f is denoted by∫f and

is called the indefinite integral of f . (In this context,∫ baf is called a definite

integral.) For example, one writes∫

3x2 dx = x3 + c, where c is the so-calledconstant of integration. In general, since primitives of a function differ only bya constant, we write ∫

f(x) dx = F (x) + c,

where F is any particular primitive of f .


5.3.2 Change of Variables Theorem. Let ϕ : [a, b] → R be continuouslydifferentiable with ϕ′ never zero and let f be integrable on [c, d] := ϕ([a, b]).Then (f ◦ ϕ)|ϕ′| ∈ Rba and∫ b

a

f(ϕ(x))|ϕ′(x)| dx =∫ d

c

f(y) dy. (5.7)

Proof. By the intermediate value theorem, we may assume thatϕ′(x) > 0 forall x, so ϕ is strictly increasing, c = ϕ(a), and d = ϕ(b).

x

y = ϕ(x)

a x1 · · · xj−1 xj xn−1· · · b

cy1

yj−1...

yj

d

yn−1...

FIGURE 5.6: The partitions Px and Py.

We show first that f ◦ ϕ ∈ Rba. For this we use the fact that ϕ induces aone-to-one correspondence between partitions Px = {x0, . . . , xn} of [a, b] andpartitions Py = {y0, . . . , yn} of [c, d], where yj = ϕ(xj) (xj = ϕ−1(yj)) (seeFigure 5.6). Since ϕ([xj−1, xj ]) = [yj−1, yj ],

Mxj (f ◦ ϕ) = sup

xj−1≤x≤xjf(ϕ(x)) = sup

yj−1≤y≤yjf(y) = My

j (f). (5.8)

Moreover, by the mean value theorem, there exists zj ∈ [yj−1, yj ] such that

∆xj = ϕ−1(yj)− ϕ−1(yj−1) = (ϕ−1)′(zj)∆yj ≤ C∆yj , (5.9)

where C is a bound for |(ϕ−1)′| on [c, d]. From (5.8) and (5.9),

S(f ◦ ϕ,Px) ≤ CS(f,Py).

The same inequality evidently holds for −f , hence

−S(f ◦ ϕ,Px) ≤ −CS(f,Py).

Adding these inequalities,

S(f ◦ ϕ,Px)− S(f ◦ ϕ,Px) ≤ C[S(f,Py)− S(f,Py)].

Since the right side may be made arbitrarily small, f ◦ ϕ ∈ Rba, hence also(f ◦ ϕ)ϕ′ ∈ Rba.

To prove (5.7), we argue as in the first part of the proof, but now comparethe Riemann sums S((f ◦ ϕ)ϕ′,Px, ξ) and S(f,Py, ζ), where the intermediatepoints in each case are taken to be left endpoints:

ξ := (x0, . . . , xn−1), ζ := (y0, . . . , yn−1) = (ϕ(x0), . . . , ϕ(xn−1)).

ThenS((f ◦ ϕ)ϕ′,Px, ξ

)=

n∑j=1

f(ζj)ϕ′(xj)∆xj

and, by the mean value theorem,

S(f,Py, ζ) =n∑j=1

f(ζj)∆ϕ(xj) =n∑j=1

f(ζj)ϕ′(tj)∆xj ,

for some tj ∈ [xj−1, xj ]. Subtracting these equations and using the triangleinequality, we obtain

∣∣S((f ◦ ϕ)ϕ′,Px, ξ)− S(f,Py, ζ)

∣∣ ≤ n∑j=1|f(ζj)| |ϕ′(xj)− ϕ′(tj)|∆xj

≤Mn∑j=1|ϕ′(xj)− ϕ′(tj)|∆xj ,

where M is a bound for |f | on [c, d]. By the uniform continuity of ϕ′ on [a, b],given ε > 0 there exists a δ > 0 such that |ϕ′(s)− ϕ′(t)| < ε/M(b− a) for alls, t with |s− t| < δ. Hence if ‖Px‖ < δ, then∣∣S((f ◦ ϕ)ϕ′,Px, ξ)− S(f,Py, ζ)

∣∣ < ε.

Letting ‖Px‖ → 0 and noting that then also ‖Py‖ → 0 (because ∆yj =ϕ′(cj)∆xj ≤ B‖Px‖, where B is a bound for |ϕ′|), we see that∣∣∣∣ ∫ b

a

f(ϕ(x))ϕ′(x) dx−∫ b

a

f(y) dy∣∣∣∣ ≤ ε.

Since ε was arbitrary, the two integrals are equal, completing the proof.

Remark. Whether ϕ is increasing or decreasing, (5.7) may be written as∫ b

a

f(ϕ(x)

)ϕ′(x) dx =

∫ ϕ(b)

ϕ(a)f(y) dy.

This formula has an easy proof if f is continuous. Indeed, in this case f has a


primitive F on [c, d], hence, by the chain rule, F ◦ϕ is a primitive for (f ◦ϕ)ϕ′.The desired formula now follows from the fundamental theorem of calculus:∫ b

a

f(ϕ(x)

)ϕ′(x) dx = F

(ϕ(b)

)− F

(ϕ(a)

)=∫ ϕ(b)

ϕ(a)f(y) dy.

Note that in this case it is not necessary to assume that ϕ′ 6= 0. ♦

5.3.3 Integration by Parts Formula. Let f and g be differentiable on [a, b]with f ′, g′ ∈ Rba. Then∫ b

a

f(x)g′(x) dx = f(x)g(x)∣∣∣ba−∫ b

a

f ′(x)g(x) dx. (5.10)

Proof. Since (fg)′ = f ′g + fg′ ∈ Rba, 5.3.1(c) implies that

f(x)g(x)∣∣∣ba

=∫ b

a

(fg)′ =∫ b

a

f ′g +∫ b

a

fg′.

5.3.4 Example. We show that

∫ π/2

0sink x dx =

(k − 1)(k − 3) · · · 4 · 2k(k − 2) · · · 5 · 3 , k odd,

π

2(k − 1)(k − 3) · · · 5 · 3k(k − 2) · · · 4 · 2, k even.

Let Ik =∫ π/2

0sink x dx. Integrating by parts,

Ik =∫ π/2

0sink−1 x sin x dx = (k − 1)

∫ π/2

0sink−2 x cos2 x dx.

Since cos2 x = 1− sin2 x, Ik = (k − 1)(Ik−2 − Ik), hence

Ik = k − 1k

Ik−2.

Iterating, we obtain

Ik = (k − 1)(k − 3) · · · (k − 2j + 1)k(k − 2) · · · (k − 2j + 2) Ik−2j .

If k = is odd, take j = (k − 1)/2 so

Ik = (k − 1)(k − 3) · · · 4 · 2k(k − 2) · · · 3 · 1 I1.

If k is even, take j = (k − 2)/2 so

Ik = (k − 1)(k − 3) · · · 5 · 3k(k − 2) · · · 6 · 4 I2.

Since I1 = 1 and I2 = π/4, the formula follows. ♦


If f ′ and g′ are continuous, then (5.10) has the following analog for indefiniteintegrals: ∫

f(x)g′(x) dx = f(x)g(x)−∫f ′(x)g(x) dx. (5.11)

Setting h = g′ and using the symbols D for differentiation and I for integration,we may write (5.11) as

I(fh) = f · Ih− I(Df · Ih).

By induction we obtain

I(fh) =n∑k=1

(−1)(k−1)Dk−1f · Ikh+ (−1)nI(Dnf · Inh

). (5.12)

The fundamental theorem of calculus may then be used to calculate∫ bafh.

Formula (5.12) may be expressed in tabular form as shown in Table 5.1. Foreach k, the entries in column k are multiplied and the resulting products areadded. The exception is in column n+1, where the product must be integratedbefore adding. The process terminates if and when Dnf = 0.

TABLE 5.1: Table for evaluating∫fh by parts.

k 1 2 3 · · · n n+ 1(−1)k−1 +1 −1 +1 · · · (−1)n−1 (−1)nDk−1f f Df D2f · · · Dn−1f DnfIkh Ih I2h I3h · · · Inh Inh

5.3.5 Example. Using Table 5.1 with f(x) = (x + 1)3 and h(x) = e5x, wehave∫

(x+ 1)3e5x dx = e5x[

(x+ 1)3

5 − 3(x+ 1)2

52 + 6(x+ 1)53 − 6(x+ 1)

54

]+ c.

TABLE 5.2: Table for evaluating∫

(x+ 1)4e5x dx by parts.

k 1 2 3 4 5(−1)k−1 +1 −1 +1 −1 +1Dk−1f (x+ 1)3 3(x+ 1)2 6(x+ 1) 6 0Ikh e5x/5 e5x/52 e5x/53 e5x/54 e5x/55

♦


Exercises1.S ⇓4 Let f : R→ R be continuous and periodic with period p > 0, that is,f(x+ p) = f(x) for all x. Prove that∫ p

0f(x+ y) dx =

∫ p

0f(x) dx for all y ∈ R.

2. Let f : (a, b) → R have a uniformly continuous derivative. Prove thatf ′ ∈ Rba and ∫ b

a

f ′ = limε→0+

[f(b− ε)− f(a+ ε)

].

3. Verify the following inequalities:

(a)S 2π

(√2− 1

)≤∫ 1

0

sin x dx√1 + x2

≤√

2− 1.

(b) 12q(p+ 1) ≤

∫ 1

0

xp dx

(1 + xp)q ≤21−q − 1

(p+ 1)(1− q) , p, q > 0, q 6= 1.

4. Establish the formula∫ 1

0(1− x)mxn dx = m!

(n+ 1)(n+ 2) · · · (n+m+ 1) .

5. Let n ∈ N. Evaluate

(a)S

∫ 1

0exp(x1/n) dx. (b)

∫ e

1lnn x dx.

6. Let k ∈ N. Show that

∫ π/2

0cosk x dx =

(k − 1)(k − 3) · · · 4 · 2k(k − 2) · · · 5 · 3 , k odd,

π

2(k − 1)(k − 3) · · · 5 · 3k(k − 2) · · · 4 · 2, k even.

7.S ⇓5 Let k ∈ N. Show that

∫ 1

0

xk√1− x2

dx =

(k − 1)(k − 3) · · · 4 · 2k(k − 2) · · · 3 · 1 if k is odd

(k − 1)(k − 3) · · · 3 · 1k(k − 2) · · · 4 · 2

π

2 if k is even.

N.B. The integral is improper but converges by Exercise 5.7.7. For theeven case, use Exercise 6.

4This exercise will be used in 13.6.4.5This exercise will be used in 13.4.2


8. Let f ′ be continuous and positive on [a, b]. Prove that∫ b

a

f(x) dx+∫ f(b)

f(a)f−1(y) dy = bf(b)− af(a).

Interpret geometrically for f > 0 and a > 0.

9.S (Young’s inequality). Let f be continuous and strictly increasing on[0, a] with f(0) = 0. Prove that∫ x

0f +

∫ y

0f−1 = yf−1(y) +

∫ x

f−1(y)f.

Deduce that∫ x

0f +

∫ y

0f−1 ≥ xy, 0 ≤ x ≤ a, 0 ≤ y ≤ f(a).

10. Use Young’s inequality to verify the following inequalities:(a)

√1− y2 + y sin−1 y ≥ xy + cosx, 0 ≤ x ≤ π/2, 0 ≤ y ≤ 1.

(b)S x ln x+ ey ≥ xy + x, 1 ≤ x ≤ 2, 0 ≤ y ≤ ln 2.

11. Give an example of a discontinuous function that(a) has a primitive, (b) has no primitive.

12. Let f and g be continuously differentiable with g > 0. Prove that∫f(x)g′(x)g2(x) dx =

∫f ′(x)g(x) dx−

f(x)g(x) .

13.S Let f ′ ∈ Rba. Prove that

limn

∫ b

a

f(x) sin(nx) dx = 0.

14. Let f be continuous on [0,+∞) such that limx→+∞ f(x) exists in R andlet a > 0. Find

limn→+∞

∫ a

0f(nx) dx.

15. Let h′ be continuous and positive on [a, b] and let g′ be continuous on[c, d] = [h(a), h(b)]. Prove that∫ b

a

g(h(x)

)dx = g(d)b− g(c)a−

∫ d

c

g′(t)h−1(t) dt.


16. Let f ∈ Ra−a, a > 0. Show that∫ a

−af =

{0 if f is an odd function,2∫ a

0 f if f is an even function.

17.S Let f : [a, b]→ R be continuous and let u, v be differentiable functionswith range contained in [a, b]. Prove that

d

dx

∫ v(x)

u(x)f = f

(v(x)

)v′(x)− f

(u(x)

)u′(x).

18. Let functions a, b, c, d : [0, 1] → [0, 1] have continuous derivatives andlet f : [0, 1]→ R be continuous. Suppose that∫ b(x)

a(x)f =

∫ d(x)

c(x)f for all x ∈ [0, 1].

Prove that ∫ b(1)

b(0)f +

∫ c(1)

c(0)f =

∫ a(1)

a(0)f +

∫ d(1)

d(0)f.

19.S Let f be continuous and g differentiable with bounded derivative on[a, b]. Evaluate

limx→a

g(x)x− a

∫ x

a

f.

20. Let p > 0, q > 1, and m, k ∈ N with m > k. Evaluate limn→+∞

sn if sn =

(a) Sn∑k=1

kp

np+1 . (b)n∑k=1

kq−1

nq + kq. (c)

n∑k=1

((mn)!

nkn[(m− k)n]!

)1/n.

21.S Let |f ′| ≤ M on [a, b]. For n ∈ N set h = (b − a)/n and xk = a + kh,k = 0, 1, . . . , n− 1. Prove that∣∣∣∣ ∫ b

a

f − hn∑k=1

f(xk−1)∣∣∣∣ ≤ hM(b− a).

22. Let f be continuous on [0, 1]. Prove that∫ 1

0

∫ x

0f(t) dt dx =

∫ 1

0(1− x)f(x) dx.

23. Let f, g : [0, 1] → R be continuously differentiable, f monotone, andg(x) > g(0) = g(1) on (0, 1). Prove that

∫ 10 fg

′ = 0 iff f is constant.

*5.4 Stirling’s FormulaStirling’s formula gives an estimate for n! when n is large. The proof relies

on material from Section 4.3. We begin with the following lemma, whichprovides the fundamental inequality needed to establish the formula.

5.4.1 Lemma. If f is concave and differentiable on (a, b), then

f(u) + f(v)2 ≤ 1

v − u

∫ v

u

f(t) ≤ f(u+ v

2

), a < u < v < b.

Proof. By the concave versions of 4.3.6 and (4.3),

f(u) v − tv − u

+ f(v) t− uv − u

≤ f(t) ≤ f ′(x)(t− x) + f(x)

for all a < u < v < b and all x, t ∈ [u, v]. Integrating with respect to t,[f(u) + f(v)

]v − u2 ≤

∫ v

u

f(t) ≤ f ′(x) (v − x)2 − (x− u)2

2 + f(x)(v − u).

Taking x = (u+v)/2 and dividing by v−u produces the desired inequalities.

5.4.2 Stirling’s Inequalities. For all n,

e7/8 ≤ enn!nn√n≤ e, (5.13)

where the middle term is decreasing in n.

Proof. Taking f(x) = ln x, u = k ∈ N, and v = k + 1 in the lemma, we have

12 ln(k2 + k) ≤ 1

2[

ln(k) + ln(k + 1)]≤∫ k+1

k

ln(t) dt ≤ ln(k + 1

2).

Rearranging,

0 ≤∫ k+1

k

ln(t) dt− 12 ln(k2 + k) ≤ ln

(k + 1

2)− 1

2 ln(k2 + k). (5.14)

Now observe thatn−1∑k=1

∫ k+1

k

ln t dt =∫ n

1ln t dt = n lnn− n+ 1,

n−1∑k=1

ln(k2 + k) =n−1∑k=1

[ln(k + 1) + ln k] = 2n∑k=2

ln k − lnn = 2 lnn!− lnn, and

ln(k + 12 )− 1

2 ln(k2 + k) = 12 ln

(1 + 1

4(k2 + k)

)≤ 1

8(k2 + k) ,

where, for the last inequality, we used the fact that ln(1 + x) < x for x > 0,which follows directly from the integral definition of ln(x + 1). Summing in(5.14) and using the above inequalities, we obtain

0 ≤(n+ 1

2)

lnn− n+ 1− lnn! ≤n−1∑k=1

18(k2 + k) = 1

8

n−1∑k=1

[1k− 1k + 1

]≤ 1

8 .

Note that the term(n+ 1

2)

lnn− n+ 1− lnn! is increasing in n since it wasobtained as a sum of nonnegative terms in (5.14). Rearranging, we have

78 ≤ −

(n+ 1

2)

lnn+ n+ lnn! ≤ 1,

where the middle term is decreasing in n. Exponentiating yields the desiredinequalities.

5.4.3 Stirling’s Formula. limn

enn!nn√n

=√

2π.

Proof. By 5.4.2, the limit L in the formula exists in R. Set In =∫ π/2

0 sinn x dx.By 5.3.4,

I2n+1 = (2n)(2n− 2) · · · 4 · 2(2n+ 1)(2n− 1) · · · 5 · 3 and I2n = π

2(2n− 1)(2n− 3) · · · 5 · 3

2n(2n− 2) · · · 4 · 2 .

For x ∈ [0, π/2] and n ≥ m, sinn x ≤ sinm x, hence

I2n+2

I2n≤ I2n+1

I2n≤ I2nI2n

= 1.

It follows that

2n+ 12n+ 2

π

2 ≤22 · 42 · 62 · · · (2n− 2)2 · (2n)2

12 · 32 · 52 · · · (2n− 1)2(2n+ 1) ≤π

2 ,

from which we obtain Wallis’s product

limn

22 · 42 · 62 · · · (2n− 2)2 · (2n)2

12 · 32 · 52 · · · (2n− 1)2(2n+ 1) = π

2 .

Denote the general term in Wallis’s product by αn. Since

2 · 4 · · · (2n− 2) · (2n) = 2nn! and 3 · 5 · · · (2n− 1) = (2n)!2nn! ,

we see that√αn = 22n(n!)2

(2n)!√

2n+ 1.


Now set βn = enn!nn√n

and note that

β2n

β2n= e2n(n!)2

n2n+1(2n)2n√2ne2n(2n)! = (n!)222n√2

(2n)!√n

.

Dividing by √αn,

β2n

β2n√αn

= (n!)222n√2(2n)!

√n

(2n)!√

2n+ 122n(n!)2 =

√2√

2 + 1/n→ 2.

Since √αn →√π/2 and βn → L, we also have

limn

β2n

β2n√αn→ L

√2π.

Therefore, L√

2π = 2, hence L =

√2π.

5.5 Integral Mean Value TheoremsThe following theorem asserts that the average value of a continuous

function over an interval [a, b] is actually assumed by the function at someintermediate point c.

5.5.1 First Mean Value Theorem for Integrals. If f is continuous on[a, b], then there exists c ∈ (a, b) such that

1b− a

∫ b

a

f = f(c).

Proof. Apply the mean value theorem for derivatives and the fundamentaltheorem of calculus to the function G(x) :=

∫ xaf(t) dt.

The next theorem is a weighted average generalization of 5.5.1.

5.5.2 Weighted Mean Value Theorem for Integrals. Let f be continuouson [a, b] and let g ∈ Rba. If g does not change sign in [a, b], then there existsc ∈ [a, b] such that ∫ b

a

fg = f(c)∫ b

a

g. (5.15)

Proof. We may assume that g ≥ 0 on [a, b], so∫ bag ≥ 0. Suppose first that∫ b

ag = 0. If C is an upper bound for |f | on [a, b], then∣∣∣ ∫ b

a

fg∣∣∣ ≤ ∫ b

a

|f |g ≤ C∫ b

a

g = 0,


hence both sides of (5.15) are zero. Now assume that∫ bag > 0. Let m = f(xm)

and M = f(xM ) denote the minimum and maximum values of f on [a, b].Since mg ≤ fg ≤Mg,

m

∫ b

a

g ≤∫ b

a

fg ≤M∫ b

a

g,

hence

m ≤

∫ b

a

fg∫ b

a

g

≤M.

An application of the intermediate value theorem completes the proof.

5.5.3 Second Mean Value Theorem for Integrals. Let f be continuousand g differentiable and monotone on [a, b] with g′ ∈ Rba. Then there existsc ∈ [a, b] such that ∫ b

a

fg = g(a)∫ c

a

f + g(b)∫ b

c

f.

Proof. Let F (x) =∫ xaf . Integrating by parts,∫ b

a

fg =∫ b

a

F ′g = F (b)g(b)−∫ b

a

g′F.

Since g is monotone, the sign of g′ does not change, hence, by 5.5.2, thereexists c ∈ [a, b] such that∫ b

a

g′F = F (c)∫ b

a

g′ = F (c)[g(b)− g(a)].

Therefore,∫ b

a

fg = F (b)g(b)− F (c)[g(b)− g(a)] = g(a)F (c) + g(b)[F (b)− F (c)],

which is the assertion of the theorem.

Remarks. (a) Because derivatives have the intermediate value property(Exercise 4.2.25), the monotonicity requirement on g will be satisfied if g′ 6= 0on [a, b].(b) The second mean value theorem for integrals holds under the less

restrictive hypotheses that f is integrable and g is monotone. A proof may befound in [3]. ♦

Exercises1. Let 0 ≤ a < b and let f be continuous on [

√a,√b]. Prove that there

exists c ∈ [a, b] such that

12

∫ b

a

f(√x) dx = a

∫ √c√a

f(x) dx+ b

∫ √b√c

f(x) dx.

2. Let 0 < a < b and let f be continuous on [b−1, a−1]. Prove that thereexists c ∈ [a, b] such that∫ b

a

f(1/x) dx = b2∫ 1/c

1/bf(x) dx+ a2

∫ 1/a

1/cf(x) dx.

3.S Let f be continuous on [0, 1]. Prove that there exists c ∈ [1/2,√

3/2]such that∫ π/3

π/6f(

sin x)dx = 2√

3

∫ c

1/2f(x) dx+ 2

∫ √3/2

c

f(x) dx.

4. Let f be continuous on [0, 1]. Prove that there exists c ∈ [0, 1] such that∫ π/4

0f(

tan x)dx =

∫ c

0f(x) dx+ 1

2

∫ 1

c

f(x) dx.

5. Let f and g be continuous on [a, b]. Show that there exists c ∈ (a, b) suchthat

g(c)∫ b

a

f = f(c)∫ b

a

g.

6. Prove: If f is continuous, g ∈ Rba, and m is lower bound for g, then thereexist c, d ∈ [a, b] such that∫ b

a

fg = f(c)∫ b

a

g +m(b− a)[f(d)− f(c)].

7.S Prove the following variant of the second mean value theorem forintegrals: Let f, g ∈ Rba with g ≥ 0. If m ≤ f ≤M on [a, b], then thereexists c ∈ [a, b] such that∫ b

a

fg = m

∫ c

a

g +M

∫ b

c

f.

Hint. Consider G(x) := m∫ xag +M

∫ bxg.

8. Let g have a nonnegative integrable derivative on [0, 1] with g(0) = 0and g(1) = 1. Show that there exists c ∈ [0, 1] such that∫ 1

0xng(x) dx = 1− cn+1

n+ 1 .

9.S Let g have a nonnegative integrable derivative on [0, π] with g(0) = 0and g(π) = 1. Show that there exists c ∈ [0, π] such that∫ π

0g(x) sin x dx = cos c+ 1.

10. Let g be twice differentiable on [a, b] with g′′ < 0 and g′′ ∈ Rba, and letf be continuous on g([a, b]). Show that if g′ ≥ m > 0 and |f | ≤M , then∣∣∣ ∫ b

a

f ′ ◦ g∣∣∣ ≤ 2M

m.

Hint. Use the second mean value theorem for integrals.

*5.6 Estimation of the IntegralIntegrals that cannot be evaluated exactly may be approximated by various

numerical methods. Of course, an integral may always be approximated by aRiemann sum; however, unless the intermediate points of the subintervals arechosen judiciously, a Riemann sum usually offers only a coarse approximationof the integral. In this section we discuss three techniques, the trapezoidal rule,the midpoint rule, and Simpson’s rule, that yield good numerical estimates ofan integral.

The approximation techniques are given in order of increasing precision.For each of these, we use partitions of the form

xk = a+ khn, k = 0, 1, . . . , n, where hn := b− an

. (5.16)

The integral∫ baf is then estimated by replacing f on the interval [xk, xk+1]

by a simpler function fk. The approximation is therefore∫ b

a

f(x) dx ≈n−1∑k=0

∫ xk+1

xk

fk(x) dx.

The error in the approximation is simply the difference between the left andright sides. The main goal in the approximation schemes described below is


to obtain, for a given class of functions, the sharpest upper bound for themagnitude of the error

The reader may wish to compare the error bounds in the three approxima-tion techniques described below with the error bound for the approximationgiven by the Riemann sum

Rn = b− an

[f(x0) + f(x1) + · · ·+ f(xn−1)

]. (5.17)

By Exercise 5.3.21, for functions f with a bounded derivative one has in generalonly the first order error bound∣∣∣∣ ∫ b

a

f −Rn∣∣∣∣ ≤ hn(b− a)‖f ′‖∞,

implying that a good estimate requires a large n. Here, for a bounded functiong on [a, b],

‖g‖∞ := sup {|g(x)| : a ≤ x ≤ b} ,

Trapezoidal RuleLet

Pk := (xk, f(xk)) = (xk, yk), k = 0, 1, . . . , n, (5.18)where the points xk are given in (5.16). The trapezoidal rule uses the linesegment from Pk to Pk+1 to approximate f on [xk, xk+1], k = 0, 1, . . . n − 1.Thus the approximating function fk is given by

fk(x) = yk +mk(x− xk), xk ≤ x ≤ xk+1, mk := yk+1 − ykxk+1 − xk

.

A simple calculation shows that∫ xk+1

xk

fk = hn2 (yk+1 + yk),

The sum

Tn :=n−1∑k=0

∫ xk+1

xk

fk = hn2(y0 + 2y1 + · · ·+ 2yn−1 + yn

)is then used to approximate

∫ baf . If f > 0, Tn may be realized as the sum of

areas of trapezoids. (See Figure 5.7.)5.6.1 Trapezoidal Rule. If f ∈ Rba, then limn Tn =

∫ baf . Moreover, if f ′′

exists and is continuous on [a, b], then the following error estimate holds:∣∣∣ ∫ b

a

f − Tn∣∣∣ ≤ h2

n

12 (b− a)‖f ′′‖∞.


x0 x1 x2 x3 x4 x5

f

xx6

FIGURE 5.7: Trapezoidal rule approximation.

Proof. For the Riemann sum Rn in (5.17),

Rn − Tn = b− a2n

[f(x0)− f(xn)

]= b− a

2n[f(a)− f(b)

]→ 0,

hence Tn = (Tn −Rn) +Rn →∫ baf.

To obtain the error estimate, consider the function

gk(x) := f(x)− fk(x)(x− xk)(x− xk+1) = f(x)− yk −mk(x− xk)

(x− xk)(x− xk+1) ,

which has singularities at xk and xk+1. Since both the numerator and thedenominator vanish at these points, the singularities may be removed usingl’Hospital’s rule. Therefore, gk(x) has a continuous extension to [xk, xk+1].Since (x− xk)(x− xk+1) does not change sign on [xk, xk+1], by the weightedmean value theorem for integrals (5.5.2) there exists a point zk ∈ [xk, xk+1]such that∫ xk+1

xk

[f(x)− fk(x)] dx =∫ xk+1

xk

gk(x)(x− xk)(x− xk+1) dx

= gk(zk)∫ xk+1

xk

(x− xk)(x− xk+1) dx

= −gk(zk)h3

6 .

It follows that∫ b

a

f(t) dt− Tn =n−1∑k=0

∫ xk+1

xk

[f(x)− fk(x)] dx = −h3n

6

n−1∑k=0

gk(zk). (5.19)

Now fix x ∈ (xk, xk+1) and define ψ(z) on [xk, xk+1] by

ψ(z) = f(z)− fk(z)− gk(x)(z − xk)(z − xk+1).

Since ψ has distinct zeros x, xk, and xk+1, Rolle’s theorem applied twice shows


that ψ′′ has a zero vk ∈ (xk, xk+1). It follows that f ′′(vk) = 2gk(x). Since xwas arbitrary,

|gk(x)| ≤ 12‖f

′′‖∞ for all x ∈ [xk, xk+1].

From this and (5.19) we see that∣∣∣∣ ∫ b

a

f(t) dt− Tn∣∣∣∣ ≤ nh3

n

12 ‖f′′‖∞ = h2

n

12 (b− a)‖f ′′‖∞.

Midpoint RuleLet

xk := xk + xk+1

2 = a+(k + 1

2hn), k = 0, 1, . . . , n− 1,

where the points xk are given in (5.16). The midpoint rule uses the constantfunction

fk(x) = f (xk) , xk ≤ x ≤ xk+1,

to approximate f on [xk, xk+1]. This amounts to approximating∫ baf by Rie-

mann sums Mn, where the intermediate points are the midpoints of theintervals:

Mn = b− an

[f(x0) + f(x1) + · · ·+ f(xn−1)

].

f

a x1 x2 x2 x3

xx1 x3x0 b

FIGURE 5.8: Midpoint rule approximation.

5.6.2 Midpoint Rule. If f ′′ exists and is continuous on [a, b], then thefollowing error estimate holds:∣∣∣∣∣

∫ b

a

f −Mn

∣∣∣∣∣ ≤ h2n

24 (b− a)‖f ′′‖∞.


Proof. The function

gk(x) = f(x)− f(xk)− f ′(xk)(x− xk)(x− xk)2

has a double singularity at xk, which may be removed by applying l’Hospital’srule twice and defining gk(xk) to be the resulting limit. Since

f(x)− f(xk)− f ′(xk)(x− xk) = gk(x)(x− xk)2

and ∫ xk+1

xk

(x− xk) dx = 0,

we see that ∫ xk+1

xk

[f(x)− f(xk)] dx =∫ xk+1

xk

gk(x)(x− xk)2 dx.

Since (x − xk)2 has constant sign on [xk, xk+1], the weighted mean valuetheorem for integrals implies that the integral on the right equals

gk(zk)∫ xk+1

xk

(x− xk)2 dx = gk(zk)h3n

12

for some point zk ∈ [xk, xk+1]. Therefore,∫ xk+1

xk

[f(x)− f(xk)] dx = gk(zk)h3n

12 . (5.20)

Now fix x ∈ [xk, xk) ∪ (xk, xk+1]. By Taylor’s theorem, there exists a pointξk ∈ [xk, xk] such that

f(x) = f(xk) + f ′(xk)(x− xk) + f ′′(ξk)2 (x− xk)2.

Solving for f ′′(ξk) we see that f ′′(ξk) = 2gk(x). Therefore, |gk(x)| ≤ ‖f ′′‖∞/2for all x ∈ [xk−1, xk+1], hence from (5.20),

−h3n

24 |f′′‖∞ ≤

∫ xk+1

xk

[f(x)− f(xk)] dx ≤ h3n

24 |f′′‖∞.

Summing, we obtain

−nh3n

24 ‖f′′‖∞ ≤

∫ b

a

f(x) dx−Mn ≤nh3

n

24 ‖f′′‖∞,

which is the assertion of the theorem.

Note that the estimates in both the trapezoidal rule and the midpoint ruleare exact for all linear functions f , since then f ′′ = 0.


Simpson’s RuleSimpson’s rule assumes n = 2m in (5.16) and uses a parabola through each

triple of points

(Pk−1, Pk, Pk+1), k = 2j + 1, j = 0, . . . ,m− 1, Pk := (xk, f(xk)) = (xk, yk),

to approximate f . To obtain the rule, observe that any polynomial p(x) of

x0 x1 x2 x3 x4

x

f

FIGURE 5.9: Simpson’s rule approximation.

degree ≤ 2 may be written in the form

p(x) = bk(x− xk−1)(x− xk) + ck(x− xk−1) + dk, (5.21)

where

bk = p(xk+1)− 2p(xk) + p(xk−1)2h2 ,

ck = p(xk)− p(xk−1)h

, and

dk = p(xk−1).

It follows that the unique polynomial pk of degree ≤ 2 that passes throughthe points Pk−1, Pk, and Pk+1 is obtained by choosing

bk = bk(f) := f(xk+1)− 2f(xk) + f(xk−1)2h2 ,

ck = ck(f) := f(xk)− f(xk−1)h

, and

dk = dk(f) := f(xk−1). (5.22)

With this choice, one readily calculates

Sn,k :=∫ xk+1

xk−1

pk(x) dx = hn3 [yk−1 +yk+1 +4yk], k = 2j+1, j = 0, · · · ,m−1,

which is taken as an approximation of∫ xk+1xk−1

f . Note that the approximationis exact for all polynomials f of degree ≤ 2, since such a polynomial may be


written in the form (5.21). Summing this result, we see that the integral ofthe approximating function on [a, b] is

Sn := b− a3n

(y0 + 4y1 + 2y2 + 4y3 + 2y4 + · · ·+ 2yn−2 + 4yn−1 + yn

).

5.6.3 Simpson’s Rule. If f ∈ Rba, then limn Sn =∫ baf . Moreover, if f (4)

exists and is continuous on [a, b], then the following error estimate holds:∣∣∣ ∫ b

a

f − Sn∣∣∣ ≤ h4

n(b− a)‖f (4)‖∞180 .

Proof. Set

R′n :=(y0 + y2 + · · ·+ yn−2

)(2hn) and R′′n :=

(y1 + y3 + · · ·+ yn−1

)(2hn).

These are Riemann sums for f on [a, b] and

6Sn = 2R′n + 4R′′n + (b− a)(2hn).

It follows that Sn →∫ baf .

To obtain the error estimate, let f (4) be continuous on [a, b] and denotethe errors by

En,k =∫ xk+1

xk−1

f(x) dx− Sn,k and En =m−1∑j=0

En,2j+1 =∫ b

a

f(x) dx− Sn.

We show that there exists a point ξk ∈ [xk, xk+1] such that

En,k = −h5nf

(4)(ξk)90 . (5.23)

It will follow that

|En| ≤mh5

n‖f (4)‖∞90 = h4

n(b− a)‖f (4)‖∞180 ,

proving the theorem.To verify (5.23), fix k and choose a point in x∗k ∈ (xk−1, xk) ∪ (xk, xk+1).

For any function g, define a function Lg on [xk−1, xk+1] by

(Lg)(x) = ak(g)(x− xk−1)(x− xk)(x− xk+1) + bk(g)(x− xk−1)(x− xk)+ ck(g)(x− xk−1) + dk(g),

where bk(g), ck(g), and dk(g) are defined as in (5.22) and ak(g) is chosen sothat (Lg)(x∗k) = g(x∗k). Then Lg is the unique polynomial of degree ≤ 3 passingthrough the four points(

xk−1, g(xk−1)),(xk, g(xk)

),(xk+1, g(xk+1)

), and

(x∗k, g(x∗k)

).


Note that the coefficients in the definition of L are linear functions of g, henceL itself is a linear function. Furthermore, Lg = g for all polynomials of degree≤ 3. Since

(Lf)(x) = ak(f)(x− xk−1)(x− xk)(x− xk+1) + pk(x)

and ∫ xk+1

xk−1

(x− xk−1)(x− xk)(x− xk+1) dx = 0,

we see that ∫ xk+1

xk−1

Lf =∫ xk+1

xk−1

pk = Sn,k.

By Taylor’s formula with integral remainder (Exercise 4.6.3), there exists apolynomial T3(x) of degree ≤ 3 such that

f(x) = T3(x) +R3(x), where R3(x) := 13!

∫ x

xk−1

(x− t)3f (4)(t) dt.

The remainder may be written

R3(x) = 13!

∫ xk+1

xk−1

qt(x)f (4)(t) dt where qt(x) :={

(x− t)3 if t ≤ x0 if t > x.

SinceLf = LT3 + LR3 = T3 + LR3 = f −R3 + LR3,

we see thatEn,k =

∫ xk+1

xk−1

(f − Lf) =∫ xk+1

xk−1

(R3 − LR3).

In the remaining calculations, for ease of notation we assume that[xk−1, xk+1] = [−h, h]. By Fubini’s theorem for continuous functions,∫ h

−hR3(x) dx = 1

3!

∫ h

−hf (4)(t)

∫ h

−hqt(x) dx dt

= 14!

∫ h

−hf (4)(t)(h− t)4 dt. (5.24)

Also, because L is linear,

(LR3)(x) = 13!

∫ h

−hf (4)(t)(Lqt)(x) dt.6

Therefore, by Fubini’s theorem,∫ h

−h(LR3)(x) dx = 1

3!

∫ h

−hf (4)(t)

∫ h

−h(Lqt)(x) dx dt. (5.25)

6This may be proved using the dominated convergence theorem. (See Exercise 11.3.??)


Now, by definition of L,

(Lqt)(x) = at(x+ h)x(x− h) + bt(x+ h)x+ ct(x+ h) + dt,

where

bt = qt(h)− 2qt(0) + qt(−h)2h2 , ct = qt(0)− qt(−h)

h, and dt = qt(−h).

Since qt(−h) = 0 and qt(h) = (h− t)3, t ∈ [−h, h],∫ h

−h(Lqt)(x) dx = 2

3h3bt + 2h2ct = 1

3h[(h− t)3 + 4qt(0)]. (5.26)

From (5.24), (5.25), and (5.26),∫ h

−h(f − Lf) =

∫ h

−h(R3 − LR3) = 1

72

∫ h

−hf (4)(t)α(t) dt, (5.27)

whereα(t) := 3(h− t)4 − 4h[(h− t)3 + 4qt(0)].

Recalling the definition of qt(0), we see that

α(t) ={

(t− h)3(3t+ h) + 16ht3 if −h ≤ t ≤ 0,(t− h)3(3t+ h) if 0 ≤ t ≤ h.

(5.28)

Thus if t ≥ 0,

α(−t) = (t+ h)3(3t− h)− 16ht3 and α(t) = (t− h)3(3t+ h). (5.29)

The cubic polynomials in (5.29) are easily seen to be equal at the convenientlychosen points t = 0,±h, 2h and therefore must be identical. Thus α is an evenfunction of t so (5.28) may be rewritten

α(t) ={

(t+ h)3(3t− h) if −h ≤ t ≤ 0,(t− h)3(3t+ h) if 0 ≤ t ≤ h.

Taking derivatives, we see that α is decreasing on [−h, 0] and increasing on[0, h]. Since α(−h) = α(h) = 0, it follows that α ≤ 0 on [−h, h]. By (5.27) andthe weighted mean value theorem for integrals, for some point ξ ∈ [−h, h] wehave∫ h

−h(f−Lf) = f (4)(ξ)

72

∫ h

−hα(t) dt = f (4)(ξ)

36

∫ h

0(t−h)3(3t+h) dt = −h

5f (4)(ξ)90 .

The same result holds for En,k, with the point ξ depending on k. This completesthe proof of the theorem.


Comparison of the Approximations

Table 5.3 below gives the errors∫ 2

1 x−1 dx − An, rounded to 10 decimal

places, where An is the approximation. The left point rule simply refers toapproximation by the Riemann sum Rn. The exact value of the integral, up to10 decimal places, is ln 2 = .6931471805 . . .

TABLE 5.3: A comparison of the methods.

Method n = 4 n = 8Left Point Rule .1836233710 .0927753302Trapezoidal Rule -.0038766290 -.0009746698Midpoint Rule .0019272893 .0004866265Simpson’s Rule -.0001067877 -.00000735011

5.7 Improper IntegralsIn this section, the Riemann integral is extended in two ways: First, the

integrand is allowed to be unbounded and second, the integration interval canbe infinite.

5.7.1 Definition. A function f is said to be locally integrable on an intervalI if f ∈ Rdc for every interval [c, d] contained in I. ♦

For example, a continuous function is locally integrable on any interval.

5.7.2 Definition. Each expression in (a)–(c) below is called an improperintegral. The integral is said to converge if the limit exists in R and to divergeotherwise. In the former case, f is said to be improperly integrable on I.

(a)∫ b

a

f := limt→b−

∫ t

a

f , where f is locally integrable on [a, b).

(b)∫ b

a

f := limt→a+

∫ b

t

f , where f is locally integrable on (a, b].

(c)∫ b

a

f :=∫ c

a

f +∫ b

c

f , where f is locally integrable on (a, c) ∪ (c, b). ♦

Note that the limits of integration in these definitions, where appropriate,may be infinite.

It is easy to see that if f is locally integrable on (a, b], then∫ baf converges

iff∫ caf converges for some (every) c ∈ (a, b). In this case,∫ b

a

f =∫ c

a

f +∫ b

c

f.

The first integral on the right is improper while the second is a Riemannintegral. Moreover, if f is also bounded and a, b ∈ R, then, by Exercise 5.2.10,the improper integral

∫ baf is simply the Riemann integral. Analogous remarks

apply to the other cases.

5.7.3 Examples. (a) Let p ∈ R. For 0 < s < t,∫ t

s

dx

xp={

(1− p)−1 (t1−p − s1−p) if p 6= 1ln t− ln s if p = 1.

It follows that∫ ∞1

dx

xpconverges iff p > 1 and

∫ 1

0

dx

xpconverges iff p < 1.

(b) Let r > 0, r 6= 1. For t > 1,∫ t

1rx dx = rt − r

ln r , hence

∫ ∞1

rx dx converges iff r < 1.

(c) Since∫ t

s

(1 + x2)−1 dx = tan−1 t− tan−1 s,

∫ ∞0

dx

1 + x2 =∫ 0

−∞

dx

1 + x2 = π

2 , hence∫ ∞−∞

dx

1 + x2 = π.

(d)∫ 1

−1

dx√|x|

=∫ 0

−1

dx√−x

+∫ 1

0

dx√x

= 2 limt→0+

∫ 1

t

dx√x

= 4. ♦

For ease of exposition, for the remainder of the section we consider onlyintegrals that are improper at the upper limit. Analogous discussions hold forthe other types of improper integrals.

The proof of the following theorem is left to the reader.

5.7.4 Theorem. Let f and g be locally integrable on [a, b) and let α, β ∈ R.If the improper integrals

∫ baf and

∫ bag converge, then so does

∫ ba

(αf + βg),and ∫ b

a

(αf + βg) = α

∫ b

a

f + β

∫ b

a

g.

In contrast to the Riemann integral, the product of improperly integrablefunctions may not be improperly integrable. For example, f(x) := 1/

√1− x

is improperly integrable on the interval [0, 1) but f2 is not. The followingexample illustrates the same phenomenon, but on an unbounded interval. Itis the first of several examples in this section that uses the fact, proved inChapter 6, that a series of the form

∑∞n=1 1/np converges iff p > 1.

5.7.5 Example. Define f on [1,+∞) by f(x) = n if n ≤ x < n + 1/n5/2,n = 1, 2, . . ., and f(x) = 0 otherwise. Then∫ n+1

1f =

n∑k=1

1k3/2 and

∫ n+1

1f2 =

n∑k=1

1k1/2 ,

hence∫∞

1 f converges, whereas∫∞

1 f2 diverges. ♦

We now have examples, on both bounded and unbounded intervals, ofnonnegative improperly integrable functions whose squares are not improperlyintegrable. Conversely, there exist locally integrable nonnegative functions onunbounded intervals, for example, f(x) = 1/x on [1,+∞), such that f2 isimproperly integrable but f is not. However, for bounded intervals this is notpossible: If f2 is improperly integrable on a bounded interval, then so is |f |.(Exercise 25.)

The remainder of this section describes various convergence tests for im-proper integrals. Many of these are analogs of convergence tests for infiniteseries, discussed in Chapter 6.

5.7.6 Comparison Test for Integrals. Let f and g be locally integrable on[a, b) such that 0 ≤ f ≤ g. If

∫ bag converges, then so does

∫ baf .

Proof. Let F (x) =∫ xaf and G(x) =

∫ xag, a ≤ x < b. Since f and g are

nonnegative, F and G are increasing, hence, by the monotone function theorem(3.1.17), ∫ b

a

f = limx→b−

F (x) and∫ b

a

g = limx→b−

G(x)

exist in R. Since F ≤ G, the conclusion follows.

5.7.7 Example. Let f(x) = 1 + sin x√x(x+ 1)2 , x > 0. By definition,

∫ ∞0f =

∫ 1

0f +

∫ ∞1f,

provided the integrals on the right converge. That this is indeed the casefollows from 5.7.3(a), 5.7.6, and the inequalities f(x) ≤ 2/

√x on (0, 1] and

f(x) ≤ 2/(x+ 1)2 on [1,+∞). ♦

5.7.8 Example. Define the gamma function Γ by

Γ(x) =∫ ∞

0tx−1e−t dt, x > 0.

To see that the integral converges for all x > 0, note that tx−1e−t ≤ tx−1

for t ∈ (0, 1], hence∫ 1

0 tx−1e−t dt converges by comparison with

∫ 10 t

x−1 dt(see 5.7.3(a)). Furthermore, by l’Hospital’s rule applied sufficiently many times,

limt→+∞

tx+1e−t = 0

so there exists t0 > 1 such that tx+1e−t ≤ 1, or tx−1e−t ≤ t−2, for all t ≥ t0.Therefore,

∫∞1 tx−1e−t dt converges by comparison with

∫∞1 t−2 dt.

The gamma function has the following recursive property:

Γ(x+ 1) = xΓ(x).

To see this, integrate Γ(x+ 1) by parts to obtain∫ b

a

txe−t dt = txe−t∣∣∣t=at=b

+ x

∫ b

a

tx−1e−t dt,

and then let a→ 0 and b→ +∞. In particular, for n ∈ N

Γ(n+ 1) = nΓ(n) = n(n− 1)Γ(n− 1) = · · · = n(n− 1) · · · 1 · Γ(1).

SinceΓ(1) =

∫ ∞0

e−t dt = 1,

we see that Γ(n+ 1) = n!. Thus Γ(x) is a continuous (indeed, differentiable)extension of the factorial function on N. ♦

5.7.9 Limit Comparison Test for Integrals. Let f and g be locally inte-grable on [a, b) with f ≥ 0 and g > 0. If L := limx→b f(x)/g(x) exists and0 < L < +∞, then

∫ bag converges iff

∫ baf converges.

Proof. Since f, g ≥ 0,∫ baf and

∫ bag exist in R. Choose c ∈ (a, b) such that

L/2 < f(x)/g(x) < 2L for all x ∈ [c, b). For such x, g(x) < 2f(x)/L andf(x) < 2Lg(x). The assertion then follows from the inequalities∫ b

c

g ≤ 2L

∫ b

c

f and∫ b

c

f ≤ 2L∫ b

c

g.

5.7.10 Example. Let f(x) =√

2x5 − x2 + 1x4 + 3x+ 5 , x ≥ 1. For g(x) = x−3/2,

limx→+∞

f(x)g(x) = lim

x→+∞

√2x8 − x5 + x3

x4 + 3x+ 5 =√

2.

Since∫∞

1 g converges, so does∫∞

1 f . ♦

5.7.11 Root Test for Integrals. Let f be locally integrable and nonnegativeon [a, b), where b > 0, and suppose that L := limx→b− [f(x)]1/x exists in R.Then

∫ baf converges if L < 1 and diverges if L > 1.

Proof. Suppose L < 1. Choose r ∈ (L, 1) and x0 ∈ (a, b) ∩ (0, b) such that[f(x)]1/x < r for all x ≥ x0. For such x, f(x) < rx, hence, by the comparisontheorem and 5.7.3(b),

∫ bx0f converges. Therefore,

∫ baf converges. A similar

argument shows that∫ baf diverges if L > 1.

5.7.12 Example. For p ∈ R and x ≥ 1, let

f(x) =(

2x+ cosx3x+ sin x

)px.

Since limx→+∞

[f(x)]1/x = (2/3)p,∫ +∞

1 f converges iff p > 0. ♦

There are examples of convergent integrals and divergent integrals withL = 1, so the root test in inconclusive in this case (see Exercise 3).

5.7.13 Definition. Let f be locally integrable on [a, b). The improper integral∫ baf is said to converge absolutely if

∫ ba|f | converges. In this case f is said to be

improperly absolutely integrable on [a, b). If∫ baf converges but not absolutely,

then the integral is said to converge conditionally. ♦

5.7.14 Proposition. If f is improperly absolutely integrable on [a, b), then∫ baf converges and ∣∣∣ ∫ b

a

f∣∣∣ ≤ ∫ b

a

|f |.

Proof. Set g(x) := |f(x)|+ f(x), so 0 ≤ g ≤ 2|f | on [a, b]. By the comparisontest,

∫ bag converges. Since f = g − |f | is the difference of two improperly

integrable functions, f is improperly integrable. The inequality follows onletting t→ +∞ in the inequality |

∫ taf | ≤

∫ ta|f |.

5.7.15 Example. For p > 0, define

f(x) = (−1)n+1

np, n ≤ x < n+ 1, n = 1, 2, . . . .

Then ∫ n+1

1|f | =

n∑k=1

1kp

and∫ n+1

1f =

n∑k=1

(−1)k+1

kp.

The first sum has a finite limit iff p > 1, while the second sum has a finitelimit iff p > 0 (see Chapter 6). Therefore,

∫∞1 f converges absolutely iff p > 1

and conditionally iff 0 < p ≤ 1. ♦

The following theorem is useful in establishing conditional convergence ofimproper integrals.

5.7.16 Dirichlet’s Test for Integrals. Let f be continuous and g′ improperlyabsolutely integrable on [a, b). If the function F (t) :=

∫ taf is bounded on [a, b)

and limx→b− g(x) = 0, then∫ bafg converges.

Proof. Let M be a bound for |F | on [a, b). Then |Fg′| ≤M |g′|, hence, by thecomparison test, Fg′ is absolutely integrable on [a, b). Integrating by partsyields ∫ t

a

fg = F (t)g(t)−∫ t

a

Fg′.

Since∫ baFg′ converges and limt→b− F (t)g(t) = 0,

∫ bafg converges.

5.7.17 Corollary. Let f be continuous and g′ locally integrable on [a, b) withlimx→b− g(x) = 0. If the function F (t) :=

∫ taf is bounded on [a, b) and if g′

has constant sign, then∫ bafg converges.

Proof. By the fundamental theorem of calculus,∫ tag′ = g(t)− g(a), hence g′

is absolutely integrable on [a, b) and Dirichlet’s test applies.

5.7.18 Example. Let h(x) = x−p sin x, x ≥ 1, where p > 0. Taking f(x) =sin x and g(x) = x−p in 5.7.17 shows that

∫∞1 h converges. Since |h(x)| ≤ 1/xp,

h is improperly absolutely integrable on [1,+∞) if p > 1. If 0 

n∑k=2

(kπ)−p∫ kπ

(k−1)π| sin x| dx = 2π−p

n∑k=2

k−p

are unbounded (see Example 6.2.5), hence h is not improperly absolutelyintegrable in this case. ♦

5.7.19 Cauchy–Schwarz Inequality for Improper Integrals. Let f andg be continuous with f2 and g2 improperly integrable on [a, b). Then fg isimproperly absolutely integrable on [a, b) and(∫ b

a

|fg|)2≤∫ b

a

f2 ·∫ b

a

g2.

Proof. By Exercise 5.2.13, for all t ∈ [a, b)(∫ t

a

|fg|)2≤∫ t

a

f2 ·∫ t

a

g2.

Now let t→ b.

Exercises

1.S Determine all values of p, q > 0 for which∫ 1

0

dx

(sin x)p(1− x)q converges.

2. Let p > 0. Show that∫ ∞

1x−px dx converges and

∫ ∞1

x−p/x dx diverges.

Show that the same behavior holds for∫ 1

0x−px dx and

∫ 1

0x−p/x dx.

3. Find examples for which limx→+∞[f(x)]1/x = 1 and

(a)∫ ∞

1f converges. (b)

∫ ∞1

f diverges.

4. Let f and g be positive and continuous on [1,+∞). Prove that if

L := limx→+∞f(x)g(x) exists in R, then lim

x→+∞

∫∞xf∫∞

xg

= L.

5.S Determine if the integrals converge or diverge:

(a)∫ 1

0

sin x− xx3 dx. (b)

∫ 1

0

sin xx2 dx. (c)

∫ 1

0

√sin xx

dx.

(d)∫ ∞

2sin2 1

ln x dx. (e)∫ ∞

2

(ln x)(sin x)x

dx. (f)∫ ∞

1

(sin x)(cosx−1)x

dx.

6. Prove that∫ π/2

0cos(secp x) dx converges for all p > 0.

7. Show that∫ 1

0

xp√1− x2

dx converges iff p > −1.

8.S Show that∫ 1

0

sinp xxq

dx converges iff p < 1 + q.

9. Find all values of p for which the integral converges:

(a) S

∫ ∞1

xpe−x dx. (b)∫ 1

0xpe−x dx. (c) S

∫ 1

0sin xp dx.

(d)∫ 1

0xp sin xp dx. (e)

∫ 1

0xp ln x dx. (f)

∫ ∞1

xp ln x dx.

(g)∫ π/2

0sinp x dx. (h) S

∫ π/2

0x sinp x dx. (i)

∫ π/2

0(1− sin x)p dx.

(j)∫ π/2

0tanp x dx. (k) S

∫ π/2

0xp cosx dx. (l)

∫ π/2

0xp sin x dx.

10. Find all values of p > 0 for which∫ 1

0x−p sin ex dx converges absolutely.

11.S Prove that∫ ∞

1

x sin x1 + x2 dx converges conditionally.

12. Prove that∫ ∞

1xp sin ex dx converges for all p. For what values of p does

the integral converge conditionally? (See 5.7.18.)

13.S Find all values of p, q > 0 for which the integral converges:

(a)∫ 1

0

dx

(1− xp)q . (b)∫ 1

0

xp

(1− x2p)q dx. (c)∫ ∞

1

dx√xp + xq

.

(d)∫ 1

0

dx√xp + xq

. (e)∫ π/2

0

sinp xcosq x dx. (f)

∫ π/2

0

1sinp xq dx.

14. Prove by induction that∫ ∞

0xne−x dx = n!.

15.S Given that∫ ∞−∞

e−x2/2 dx =

√2π (to be established in 11.5.3) show

that,

1√2π

∫ ∞−∞

x2ne−x2/2 dx = (2n− 1)(2n− 3) · · · 3 · 1 = (2n)!

n!2n .

16. Given that∫∞

0 e−s2ds =

√π/2, show that

Γ(

12

)=√π, Γ

(32

)=√π

2 , and Γ(

52

)= 3√π

4 .

17. The formula Γ(x) = x−1Γ(x+ 1) may be used to extend the gamma func-tion to non-integer values x < 0. Use this to find Γ

(− 1

2)

and Γ(− 3

2).

18. Prove that if f is absolutely integrable on [1,∞), then

limn→∞

∫ ∞1

f(xn)dx = 0.

19. (Log test for integrals). Let f be locally integrable and positive on [0,+∞)such that

L := limx→∞

− ln f(x)ln x

exists in R. Prove that∫ ∞

0f converges if L > 1 and diverges if L < 1.

20.S Use Exercise 19 to determine the convergence behavior of

(a)∫ ∞

1

(ln x)− ln x

dx. (b)∫ ∞

1

(ln x)−√x

dx.

What does the root test reveal?

21. Prove that L(a) := limt→+∞

∫ t

1/t

sin axx

dx converges for all a ∈ R and that

L(a) = L(1) for all a > 0.

22. Let f be differentiable and nonzero on [1,+∞). If limx→+∞

xf ′(x)/f(x)

exists in R and is less than −1, prove that∫∞

1 f converges.

23. Prove that if∫∞

0 f(x) dx converges, then limn

∫ 10 f(nx) dx = 0.

24.S Prove that if f ≥ 0 and∫∞

1 f converges, then∫∞

1

√f(x)/x dx converges.

25. Prove that if [a, b) is finite and f2 is improperly integrable on [a, b), then|f | is improperly integrable on [a, b).

26.S Let f be continuous and g locally integrable and positive on [a, b).Suppose that the function G(x) :=

∫ xag is bounded on [a, b) and that

limx→b− f(x) = 0. Prove that∫ bafg converges.

27. Let f be continuous on [a, b) such that∫ baf converges. If g′ is locally

integrable and has constant sign on [a, b), prove that∫ bafg converges.

28.S Let f be improperly integrable on (−∞,+∞) and c ∈ R. Prove that∫ ∞−∞

f(x+ c) dx =∫ +∞

−∞f(x) dx.

5.8 A Deeper Look at Riemann IntegrabilityIn this section we characterize Riemann integrability of a function in terms

of the size of its set of discontinuities.

5.8.1 Definition. A set A of real numbers is said to have (Lebesgue ) measurezero if for each ε > 0 there exists a finite or infinite sequence of intervals Inwith total length

∑n |In| < ε such that the sequence covers A, that is, every

member of A is contained in some In. ♦

Any countable set has measure zero. Indeed, if A = {a1, a2, . . .} and ε > 0,then the intervals In = (an− ε/2n+2, an + ε/2n+2) obviously cover A and havetotal length < ε. In particular, the set of rational numbers has measure zero.An uncountable set of measure zero is constructed in Example 10.3.4.

The following result will be proved in Chapter 11.

5.8.2 Theorem. Let f be bounded on [a, b]. Then f ∈ Rba iff its set ofdiscontinuities has measure zero.

Examples 5.1.11 and 5.1.12 are relevant here: The function in the firstexample, shown to be integrable, has a countable set of discontinuities. Thefunction in the second example, shown not to be integrable, has [0, 1] as its setof discontinuities, certainly not a set of measure zero.

Theorem 5.8.2 allows simple proofs of many of the properties discussed inthis chapter. For example, if f and g are integrable with sets of discontinuityA and B, respectively, then f + g and fg have sets of discontinuity containedin A ∪B, a set of measure zero (Exercise 2), and hence are integrable.

Exercises1. Show that if B has measure zero and A ⊆ B, then A has measure zero.

2.S Prove: If An has measure zero for every n ∈ N, then so does A1∪A2∪· · · .

3. Let A have measure zero. Prove that A+ Q has measure zero.

4. Let f : [a, b]→ [c, d] be integrable and g : [c, d]→ R continuous. Provethat g ◦ f is integrable.

5. A set A of real numbers has (Jordan) content zero if for each ε > 0 thereexist finitely many intervals of total length < ε that cover A. Show that(a) a convergent sequence has content zero.(b) [0, 1] ∩Q does not have content zero.

6.S Prove that the function f in Exercise 3.3.10 is integrable on [a, b] andfind its integral.

*5.9 Functions of Bounded Variation5.9.1 Definition. Let P = {a = x0 < x1 < · · · < xn = b} be a partition of[a, b]. For f : [a, b]→ R define

VP(f) =n∑j=1|f(xj)− f(xj−1)|.

The total variation of f on [a, b] is the extended real number

V ba (f) := supPVP(f).

The function f is said to have bounded variation on [a, b] if V ba (f) < +∞. Theset of all functions with bounded variation on [a, b] is denoted by BVba. ♦

5.9.2 Proposition. Let f : [a, b]→ R.

(a) If f ∈ BVba, then f is bounded.

(b) If f has a bounded derivative on [a, b], then f ∈ BVba.

(c) If f is monotone on [a, b], then V ba (f) = |f(b)− f(a)|.

(d) If g ∈ Rba and f(x) =∫ xag(t) dt, then V ba (f) ≤ (b− a) sup[a,b] |g|.

(e) If P is a partition of [a, b] and Q is a refinement of P, then VP(f) ≤ VQ(f).

(f) If f, g ∈ BVba and c ∈ R, then f + g, cf, fg ∈ BVba.

Proof. (a) Let a < x < b and P = {a, x, b}. Then

2|f(x)| ≤ |f(x)− f(a)|+ |f(x)− f(b)|+ |f(a)|+ |f(b)|= VP(f) + |f(a)|+ |f(b)|≤ V ba (f) + |f(a)|+ |f(b)|.

(b) Let |f ′| ≤ C on [a, b]. By the mean value theorem, given a partition P,there exists for each j a point tj ∈ (xj−1, xj) such that

VP(f) =∑P|f(xj)− f(xj−1)| =

∑P|f ′(tj)|(xj − xj−1) ≤ C(b− a).

Therefore, V ba (f) ≤ C(b− a).(c) If f is increasing, then∑

P|f(xj)− f(xj−1)| =

∑P

(f(xj)− f(xj−1)

)= f(b)− f(a).

(d) Let M := supa≤t≤b |g(t)|. Then, for any partition P,

VP(f) ≤∑P

∫ xj

xj−1

|g(t)| dt ≤M∑P

(xj − xj−1) = M(b− a).

(e) Let P = {a = x0 < x1 < · · · < xn = b} and P ′ = P ∪ {c}, wherec ∈ [xi−1, xi]. Then

VP(f) =∑j 6=i|f(xj)− f(xj−1)|+ |f(xi)− f(xi−1)|

≤∑j 6=i|f(xj)− f(xj−1)|+ |f(xi)− f(c)|+ |f(c)− f(xi−1)|

= VP′(f).

Adding points successively, yields (e).

(f) Let |f |, |g| ≤M on [a, b]. The inequality

|(fg)(xj)− (fg)(xj−1)| ≤M |g(xj)− g(xj−1)|+M |f(xj)− f(xj−1)|

shows that fg ∈ BVba. The proofs of the remaining parts of (f) are similar.

5.9.3 Example. For α > 0, define a continuous function fα on [0, 1] by

fα(x) :={xα sin(1/x) if 0 < x ≤ 1,0 if x = 0.

We show that if α ≤ 1, then fα does not have bounded variation on [0, 1]. Set

ak := 12kπ + π/2 = 2

(4k + 1)π and bk := 12kπ

and note that

fα(bk) = 0 and fα(ak) = aαk = c

(4k + 1)α , where c := 2απα.

Since bk+1 < ak < bk, for sufficiently small ε > 0 we may form the partition

Pε = {ε < ap < bp < ap−1 < · · · < ak < bk < · · · < bq+1 < aq < bq < 1}

of [ε, 1], where p and q are, respectively, the largest and smallest integerssatisfying ε < ap < bq < 1, equivalently,

12π < q < p <

2− πε4πε .

From fα(ak)− fα(bk) = c

(4k + 1)α ,

V 10 (fα) ≥ V 1

ε (fα) ≥ cp∑k=q

1(4k + 1)α .

Since ε may be chosen arbitrarily small, the upper limit p of the sum on theright may be made arbitrarily large. Since the series

∑∞k=1(4k+ 1)−α diverges,

V 10 (fα) = +∞. ♦

5.9.4 Theorem. If f ′ ∈ Rba, then f ∈ BVba and

V ba (f) =∫ b

a

|f ′(x)| dx. (5.30)

Proof. Let P be a partition of [a, b]. By the mean value theorem, there existsξj ∈ [xj−1, xj ] such that

VP(f) =∑P|f ′(ξj)| |xj − xj−1| = S(|f ′|,P, ξ),

By 5.1.18, ∫ b

a

|f ′| = limPS(f ′,P, ξ) = lim

PVP(f). (5.31)

On the other hand, given r < V ba (f), we may choose a partition Pr of [a, b] suchthat r < VPr (f) ≤ V ba (f). By 5.9.2(e), r < VP(f) ≤ V ba (f) for all refinementsP of Pr. Thus, by (5.31), r ≤

∫ ba|f ′| ≤ V ba (f). Since r was arbitrary, (5.30)

follows.

5.9.5 Corollary. If f is continuous at a and f ′ is locally integrable on (a, b],then (5.30) holds, where the integral is improper. Thus f ∈ BVba iff

∫ ba|f ′|

converges.Proof. By the theorem and the definition of improper integral, it suffices toshow that

V ba (f) = limt→a+

V bt (f)(

= supa<t≤b

V bt (f)).

Clearly, we may assume that V ba (f) > 0. Let 0 < s < r < V ba (f) and choose apartition Pr = {x0 = a < x1 < · · · < xn = b} such that r < VPr(f) ≤ V ba (f).Next, choose t ∈ (a, x1) so that |f(t)−f(a)| < r−s. For such t, let Pt = Pr∪{t}.Then

V bt (f) ≥ |f(x1)− f(t)|+n−1∑j=1|f(xj+1)− f(xj)|

= VPt(f)− |f(t)− f(a)|> VPr (f)− (r − s) > s.

It follows that limt→a+

V bt (f) ≥ s. Since s was arbitrary, the assertion follows.

5.9.6 Example. We use 5.9.5 to show that the function fα in 5.9.3 hasbounded variation on [0, 1] if α > 1. We have

|f ′α(x)| = |αxα−1 sin(1/x)− xα−2 cos(1/x)| ≤ αxα−1 + xα−2.

If α > 1, the integral∫ 1

0 xα−2 dx converges, hence

∫ 10 |fα| converges. ♦

5.9.7 Theorem. If f ∈ BVba, then there exist monotone increasing functionsg and h on [a, b] such that f = g − h.Proof. For x ∈ [a, b], define g(x) := V xa (f) and h(x) := g(x) − f(x). Clearly,g is increasing. To see that h is increasing, let x < y, let Px be an arbitrarypartition of [a, x], and let Py = Px ∪ {y}. Then

VPx(f) + f(y)− f(x) = VPy (f) ≤ g(y).Taking suprema over all partitions Px yields g(x) + f(y)− f(x) ≤ g(y), thatis, h(x) ≤ h(y).

From Exercise 5.1.4 we have5.9.8 Corollary. BVba ⊆ Rba.

*5.10 The Riemann–Stieltjes IntegralIn this section we describe the main features of the Riemann-Stieltjes

integral, a generalization of the Riemann integral. These integrals have manyof the properties of Riemann integrals; however, as we shall see, there are somestriking differences.

Definition and General Properties5.10.1 Definition. Let f and w be bounded, real-valued functions on aninterval [a, b]. If P = {x0 = a < x1 < · · · < xn = b} and ξj ∈ [xj−1, xj ], then

Sw(f,P, ξ) :=n∑j=1

f(ξj)∆wj , ∆wj := w(xj)− w(xj−1), ξ := (ξ1, . . . , ξn),

is called a Riemann-Stieltjes sum of f with respect to w. The function f issaid to be Riemann-Stieltjes integrable with respect to w if for some I ∈ R andeach ε > 0, there exists a partition Pε such that

|Sw(f,P, ξ)− I| < ε for all refinements P of Pε and all choices of ξ.

In this case I is called the Riemann-Stieltjes integral with respect to w and isdenoted by ∫ b

a

f dw =∫ b

a

f(x) dw(x) = limPSw(f,P, ξ). (5.32)

The function f is called the integrand and w the integrator. The collection ofall functions that are Riemann-Stieltjes integrable with respect to w is denotedby Rba(w). ♦

It follows from 5.1.18 that, for the integrator w(x) = x, the Riemann-Stieltjes integral reduces to the Riemann integral.

It is clear that constant functions are Riemann-Stieltjes integrable. Thefollowing example shows that, in contrast to the Riemann integral, if f has asimple discontinuity, then

∫ baf dw may not exist.

5.10.2 Example. Let f : [0, 1]→ R and define

w(x) :={

0 if 0 ≤ x < 1,1 if x = 1

We show that f ∈ R10(w) iff f is continuous at 1.

Let P = {x0 = 0 < x1 < · · · < xn = 1} be any partition of [0, 1]. Then

Sw(f,P, ξ) = f(ξn)[w(1)− w(xn−1)] = f(ξn).

Hence if f ∈ R10(w) and ξ is chosen so that first ξn = 1 and second ξn < 1, we

see that f is continuous at 1 and∫ 1

0 f dw = f(1).Conversely, if f is not continuous at 1, then there exists a sequence {am}

and r > 0 such that am ↑ 1 and |f(am)−f(1)| ≥ r for every m. Let Pm denotethe refinement P ∪ {am} of P, where am ∈ (xn−1, 1]. If ξ consists of the leftendpoints of the intervals of Pm, then Sw(f,Pm, ξ) = f(am), hence

|Sw(f,Pm, ξ)− f(1)| = |f(am)− f(1)| ≥ r.

Since P was arbitrary, f 6∈ R10(w). ♦

5.10.3 Theorem. If f, g ∈ Rba(w) and α, β ∈ R, then αf + βg ∈ Rba(w) and∫ b

a

(αf + βg) dw = α

∫ b

a

f dw + β

∫ b

a

g dw.

Proof. This follows from the identity

Sw(αf + βg,P, ξ) = αSw(f,P, ξ) + βSw(g,P, ξ)

and the linearity of the limit in (5.32), as is readily established by a standardargument.

5.10.4 Theorem. Let w := αu+ βv, where α, β ∈ R and u, v : [a, b]→ R arebounded. If f ∈ Rba(u) ∩Rba(v), then f ∈ Rba(w) and∫ b

a

f dw = α

∫ b

a

f du+ β

∫ b

a

f dv.

Proof. This follows from

Sw(f,P, ξ) = αSu(f,P, ξ) + βSv(f,P, ξ)

and the linearity of the limit in (5.32).

5.10.5 Theorem. Let a < c < b. If f |[a,c] ∈ Rca(w) and f |[c,b] ∈ Rbc(w), thenf ∈ Rba(w) and ∫ b

a

f dw =∫ c

a

f dw +∫ b

c

f dw.

Proof. Given ε > 0, choose partitions P ′ε of [a, c] and P ′′ε of [c, b] such that thefollowing hold:∣∣∣∣∫ c

a

f dw − Sw(f,P ′, ξ′)∣∣∣∣ < ε/2 for all refinements P ′ of P ′ε and all ξ′,∣∣∣∣∣

∫ b

c

f dw − Sw(f,P ′′, ξ′′)∣∣∣∣∣ < ε/2 for all refinements P ′′ of P ′′ε and all ξ′′.

Then Pε := P ′ε ∪ P ′′ε is a partition of [a, b] containing c. Moreover, if P is arefinement of Pε, then P ′ := P ∩ [a, c] and P ′′ = P ∩ [c, b] are refinements ofP ′ε and P ′′ε , respectively. From

Sw(f,P, ξ) = Sw(f,P ′, ξ′) + Sw(f,P ′′, ξ′′)

and the above inequalities we see that∣∣∣∣∣∫ c

a

f dw +∫ b

c

f dw − Sw(f,P, ξ)∣∣∣∣∣ < ε/2 + ε/2 = ε.

This establishes the existence of∫ baf dw as well as the desired equality.

5.10.6 Example. Consider the floor function integrator w(x) = bxc. A slightmodification of the argument in 5.10.2 shows that

∫ n0 f(x) dbxc exists iff f is

left continuous at the integers 1, 2, . . . , n, in which case∫ kk−1 f(x) dbxc = f(k).

For such a function, 5.10.5 implies that∫ n

0f(x) dbxc =

n∑k=1

∫ k

k−1f(x) dbxc =

n∑1f(k). ♦

The preceding example suggests that improper Riemann-Stieltjes integra-tion could be used to provide a unified theory that includes both improperRiemann integrals and infinite series. This is indeed possible; however, it turnsout that Lebesgue integration is a more efficient approach. Lebesgue theoryon Rn is developed in Chapter 11.

The following theorem reveals a remarkable symmetry between integrandand integrator.5.10.7 Integration by Parts Formula. If f ∈ Rba(w), then w ∈ Rba(f) and∫ b

a

f dw +∫ b

a

w df = f(b)w(b)− f(a)w(a).

Proof. For any partition P{x0 = a, x1, . . . , xn−1, xn = b},

f(b)w(b)− f(a)w(a) =n∑j=1

f(xj)w(xj)−n∑j=1

f(xj−1)w(xj−1) and

Sf (w,P, ξ) =n∑j=1

w(ξj)f(xj)−n∑j=1

w(ξj)f(xj−1).

Subtracting we obtain

f(b)w(b)− f(a)w(a)− Sf (w,P, ξ)

=n∑j=1

f(xj−1)[w(ξj)− w(xj−1)] +n∑j=1

f(xj)[w(xj)− w(ξj)]

= Sw(f,Q, ζ),

where ζ = (a, x1, x1, x2, x2, . . . , xn−1, xn−1, b) and Q is the refinement of Pobtained by adding the coordinates of ξ to P. Therefore,

a x1 x2 x3 x4 b

ξ1 ξ2 ξ3 ξ4 ξ5

P

ξ

a x1 x2 x3 x4 bξ1 ξ2 ξ3 ξ4 ξ5Q

ζ

FIGURE 5.10: The partition Q.

∣∣∣f(b)w(b)− f(a)w(a)−∫ b

a

f dw − Sf (w,P, ξ)∣∣∣ =

∣∣∣Sw(f,Q, ζ)−∫ b

a

f dw∣∣∣.

Since f ∈ Rba(w), the right side may be made arbitrarily small, Therefore,∫ baw df exists and equals f(b)w(b)− f(a)w(a)−

∫ baf dw.

The next result shows that under certain general conditions the Riemann-Stieltjes integral reduces to a Riemann integral.

5.10.8 Theorem. Let f ∈ Rba(w). If w is continuously differentiable, thenfw′ ∈ Rba and ∫ b

a

f dw =∫ b

a

f(x)w′(x) dx.

Proof. For any partition P of [a, b] and any ξ,

Sw(f,P, ξ)− S(fw′,P, ξ) =n∑j=1

f(ξj)∆wj −n∑j=1

f(ξj)w′(ξj)∆xj .

By the mean value theorem, for each j there exists tj ∈ (xj−1, xj) such that

∆wj = w(xj)− w(xj−1) = w′(tj)∆xj .

Therefore,

Sw(f,P, ξ)− S(fw′,P, ξ) =n∑j=1

f(ξj)[w′(tj)− w′(ξj)

]∆xj . (5.33)

Let |f | ≤M on [a, b]. By uniform continuity of w′, given ε > 0, there exists aδ > 0 such that

|w′(x)− w′(y)| < ε

2M(b− a) whenever |x− y| < δ. (5.34)

Let P ′ε be a partition of [a, b] with ‖P ′ε‖ < δ. From (5.33) and (5.34),

|Sw(f,P, ξ)− S(fw′,P, ξ)| ≤ ε

2(b− a)

n∑j=1

∆xj = ε

2 (5.35)

for all refinements P of P ′ε and all ξ. Next, choose a partition P ′′ε such that∣∣∣ ∫ b

a

f dw− Sw(f,P, ξ)∣∣∣ < ε/2 for all ξ and all refinements P of P ′′ε . (5.36)

If P is a refinement of P ′ε ∪ P ′′ε , then both (5.35) and (5.36) hold, hence, bythe triangle inequality, ∣∣∣ ∫ b

a

f dw − S(fw′,P, ξ)∣∣∣ < ε.

This shows that fw′ ∈ Rba and establishes the equality.

Monotone Increasing IntegratorsIf w : [a, b]→ R is monotone increasing, then the Riemann-Stieltjes integral

may be characterized in terms of upper and lower sums, as in the Darbouxtheory. This fact will lead to an important existence theorem for integrators ofbounded variation and continuous integrands.

Let f : [a, b]→ R be bounded and let P be a partition of [a, b]. Define theupper and lower Darboux–Stieltjes sums of f with respect to w by

Sw(f,P) =n∑j=1

Mj∆wj and Sw(f,P) =n∑j=1

mj∆wj ,

where

Mj = Mj(f) := supxj−1≤x≤xj

f(x) and mj = mj(f) := infxj−1≤x≤xj

f(x).

The upper and lower Darboux–Stieltjes integrals of f with respect to w aredefined, respectively, by∫ b

a

f dw := infPSw(f,P) and

∫ b

a

f dw := supPS w(f,P).

As in the Darboux theory, if Q is a refinement of P then, because w isincreasing,

S w(f,P) ≤ S w(f,Q) ≤∫ b

a

f dw ≤∫ b

a

f dw ≤ Sw(f,Q) ≤ Sw(f,P).

Here is the analog of 5.1.8 for Riemann–Stieltjes integrals.

5.10.9 Theorem. The following statements are equivalent:

(a) f ∈ Rba(w).

(b) For each ε > 0, there exists a partition Pε such that

Sw(f,P)− S w(f,P) < ε.

(c)∫ b

a

f dw =∫ b

a

f dw.

If these conditions hold, then∫ b

a

f dw =∫ b

a

f dw =∫ b

a

f dw.

Proof. That (b) and (c) are equivalent is proved exactly as in 5.1.8.Assume that (a) holds. Given ε > 0, choose a partition Pε such that∣∣∣ ∫ b

a

f dw − Sw(f,P, ξ)∣∣∣ < ε/3 for all refinements P of Pε and all ξ. (5.37)

For such a partition P and for each j, there exists a sequence {ξj,k}∞k=1 in[xj−1, xj ] such that limk f(ξj,k) = Mj(f). It follows that

limkSw(f,P, ξk) = Sw(f,P), where ξk = (ξ1,k, . . . , ξn,k).

From (5.37), ∣∣∣∣ ∫ b

a

f dw − Sw(f,P)∣∣∣∣ ≤ ε/3.

Similarly, ∣∣∣ ∫ b

a

f dw − S w(f,P)∣∣∣ ≤ ε/3.

Part (b) now follows from the triangle inequality.Now assume that (c) holds. Let I denote the common value of the integrals

in (c). Given ε > 0, choose partitions P ′ε and P ′′ε such that

I − ε < S w(f,P ′ε) and Sw(f,P ′′ε ) < I + ε.

The inequalities still hold if P ′ε and P ′′ε are replaced by any refinement P ofPε := P ′ε ∪ P ′′ε . Thus

−ε < S w(f,P)− I ≤ Sw(f,P, ξ)− I ≤ Sw(f,P)− I < ε.

This shows that f ∈ Rba(w) and∫ baf dw = I.

Integrators of Bounded VariationRecall that a function of bounded variation may be expressed as the

difference of two monotone increasing functions (5.9.7). This, together with5.10.9, allows for a simple proof of the following existence theorem.

5.10.10 Theorem. If f : [a, b] → R is continuous and w : [a, b] → R hasbounded variation, then f ∈ Rba(w).

Proof. By the remark preceding the theorem and by 5.10.4, we may assumethat w is increasing. By uniform continuity of f , given ε > 0, there exists aδ > 0 such that

|f(x)− f(y)| < ε

w(b)− w(a) + 1 for all x, y with |x− y| < δ.

Let Pε be a partition with ‖Pε‖ < δ. For any refinement P of Pε, ‖P‖ < δ,hence

Mj(f)−mj(f) ≤ ε

w(b)− w(a) + 1 .

Therefore,

Sw(f,P)− S w(f,P) =n∑j=1

[Mj(f)−mj(f)

]∆wj ≤ ε,

which shows that f ∈ Rba(w).

The conclusion of the theorem does not necessarily hold if w fails to havebounded variation, even if w is continuous:

5.10.11 Example. Let f = w = f1/2, where fα is defined as in Example 5.9.3.We show that

∫ 10 f dw does not exist. Referring to that example, let Pε be the

partition

ε < ap < bp < ap−1 < · · · < bk+1 < ak < bk < · · · < bq+1 < aq < bq < 1,

of [ε, 1], and let ξ consist of left endpoints of Pε. Then

Sw(f,P, ξ) = f(ε)[w(aq)− w(ε)

]+ f(bq)

[w(1)− w(bq)

]+

p∑k=q

f(ak)[w(bk)− w(ak)

]+p−1∑k=q

f(bk+1)[w(ak)− w(bk+1)

].

Since f1/2(bk) = 0 and f1/2(ak) = √ak,

Sw(f,Pε, ξ) = f(ε)[√aq − w(ε)

]−

p∑k=q

ak.

Since the sums diverge as ε→ 0, limε→0 Sw(f,Pε, ξ) = −∞. ♦

Chapter 6Numerical Infinite Series

An infinite series is the limit of a sequence of expanding finite sums. Theterms of these sums may be real numbers or functions. In this chapter weexamine the convergence behavior of series of the former type; series whoseterms are functions are treated in the next chapter. In the first section, wegive examples of series that may be summed, that is, for which an explicitnumerical value may be calculated. The remaining sections describe varioustests for convergence of general series. Additional methods of summing seriesmay be found in Section 7.4.

6.1 Definition and Examples6.1.1 Definition. Let {an} be a sequence of real numbers. The varioussymbols ∑

an =∑n

an =∞∑n=1

an = a1 + a2 + · · ·+ an + · · ·

represent what is called an infinite series with nth term an or, simply, a series.The nth partial sum of the series is defined by

sn =n∑k=1

ak.

The series is said to converge if the sequence of partial sums converges, inwhich case we write ∑

an = limnsn

and call∑an the sum of the series. If the sequence {sn} diverges, then the

series is said to diverge. ♦

6.1.2 Remark. A series may begin with an index other than 1. In this regard,note that, because

sn = sm−1 +n∑

k=mak, n ≥ m > 1,

163

the series s :=∑∞n=1 an converges iff

∑∞n=m an converges. In this case the “tail

end” of the series tends to zero:

limm→+∞

∞∑n=m

an = limm→+∞

(s− sm−1) = 0. ♦

6.1.3 Example. Using the definition e := limn→∞

(1 + 1/n)n (see 2.2.4), we showthat

e =∞∑n=0

1n! .

First, since the partial sums sn :=∑nk=0 1/k! increase, the limit s := limn sn

exists in R. From the calculations in 2.2.4,

(1 + 1/n)n = 2 +n∑k=2

1k! (1− 1/n)(1− 2/n) · · · (1− (k − 1)/n) ≤ sn.

Letting n→∞, we obtain e ≤ s. On the other hand, if n > m, then

(1 + 1/n)n > 2 +m∑k=2

1k! (1− 1/n)(1− 2/n) · · · (1− (k − 1)/n).

Letting n→∞, we see that e ≥ sm. Letting m→∞ yields e ≥ s. ♦

6.1.4 Example. The geometric series∞∑n=0

arn, where a, r ∈ R and a 6= 0,

converges iff |r| < 1, in which case∞∑n=0

arn = a

1− r .

This follows from the calculation sn =n∑k=0

ark = a1− rn+1

1− r , r 6= 1. ♦

6.1.5 Example. For m ∈ N,∞∑n=1

1n(m+ n) = 1

m

m∑k=1

1k. (6.1)

To see this, we use partial fractions: For n > m

msn =n∑k=1

m

k(m+ k) =n∑k=1

[1k− 1

(k +m)

]=

m∑k=1

1k−

n+m∑k=n+1

1k. (6.2)

The second sum on the extreme right in (6.2) is less than m/(n+ 1) and hencetends to zero as n→∞. ♦

Numerical Infinite Series 165

The series in (6.1) is an example of a telescopic series, the name referringto the cancellations taking place in (6.2).

6.1.6 Theorem. Let {an} and {bn} be sequences and let α, β ∈ R. If∑an

and∑bn converge, then

∑(αan + βbn) converges and∑

(αan + βbn) = α∑

an + β∑

bn. (6.3)

Proof. Let sn =∑nk=1 ak and tn =

∑nk=1 bk. Then αsn+βtn is the nth partial

sum of the series∑

(αan + βbn) and

limn→∞

(αsn + βtn) = α limn→∞

sn + β limn→∞

tn,

which is (6.3).

6.1.7 Example. By 6.1.6 and 6.1.4,∞∑n=1

2 · 3n+1 + 3 · 2n−1

6n = 6∞∑n=1

12n + 3

2

∞∑n=1

13n

= 6 1/21− 1/2 + 3

21/3

1− 1/3= 6.75. ♦

The following result is a test for divergence. It implies that a series whosenth term does not tend to zero must diverge. For example, the series

∑sinn

and∑

2−1/n diverge.

6.1.8 Proposition. If∑an converges, then an → 0.

Proof. an = sn − sn−1 → s− s = 0.

The converse of 6.1.8 is false:

6.1.9 Example. The harmonic series∑∞n=1 1/n diverges. Indeed, if sn is the

nth partial sum of the series, then for all n

s2n − s2n−1 = 12n−1 + 1 + · · ·+ 1

2n−1 + 2n−1 >2n−1

2n−1 + 2n−1 = 12 ,

hence {sn} is not a Cauchy sequence.It is of interest to note that, while the sequence sn diverges, the sequence

tn := sn − lnn converges. To see this, observe first that

lnn =∫ n

1

dx

x=n−1∑k=1

∫ k+1

k

dx

x<

n∑k=1

∫ k+1

k

dx

k=

n∑k=1

1k

= sn,

so tn > 0. Furthermore,

ln (n+ 1)− lnn =∫ n+1

n

dx

x>

∫ n+1

n

1n+ 1 dx = 1

n+ 1 ,

hence

tn − tn+1 = ln (n+ 1)− lnn+ sn − sn+1 = ln (n+ 1)− lnn− 1n+ 1 > 0.

Therefore, {tn} is bounded below and decreasing, hence converges.The number

γ := limn→∞

tn = limn→∞

[ n∑k=1

1k− lnn

]is known as Euler’s constant. Its value to eleven decimal places is.57721566490 . . .. As of this writing, it is not known whether γ is irrational.Note that since sn = tn + lnn, the convergence of {tn} provides another proofthat the harmonic series diverges. ♦

Exercises1. Let m ∈ N. Sum the series

∑∞n=1 an, where an =

(a) S m2n+1

(m+ 1)2n−1 . (b) (−1)n+1m3n+1

(m+ 1)3n−1 .

(c) S ln[

2n+1 + 22n+1 + 1

]. (d) 1√

n(n+ 1)(√n+ 1 +

√n).

(e) S (−1)nn(n+m) , m even. (f) 12

(n+ 1)(n+ 2)(n+ 3) .

(g) S 1(n+ 1)(n+ 3)(n+ 5) . (h) (−1)n

(n+ 1)(n+ 3)(n+ 5) .

(i) S ln m√n+

√n(n+ 1)

m√n+ 1 +

√n(n+ 1)

. (j) ln n2 + 4n+ 4n2 + 4n+ 3 .

(k) 1(n+m)

√n+√n√n+m

. (l) 18n(n+ 1)(n+ 2)(n+ 3) .

(m) (−1)n(n+m+ 1)(2n+ 1)(2n+ 4m+ 3) . (n) S (−1)n(2n+ 2m+ 1)

n(n+ 2m+ 1) .

2. Let 0 < r < 1 and m ∈ N. Sum the series∑∞n=0 an if an =

(a)S rn cos[(nπ)/2

]. (b) (−1)bn/3crn. (c) (−1)bn/mcrn.

3. Given that e =∑∞n=0 1/n! and e−1 =

∑∞n=0(−1)n1/n!, find the value of

the series∑∞n=0 an if an =

(a) S (2n+ 3)3

n! . (b) 1(2n)! . (c) S 1

(2n+ 1)! . (d) n

(2n+ 1)! . (e) n

(2n)! .

4. Let p > 0 and sn =∑nk=1 1/k. Prove that sn/np → 0, sn/ lnn→ 1, and

sn/ ln lnn→ +∞.

5. Let γ denote Euler’s constant (6.1.9). Prove that

(a)S

n∑k=1

12k − 1 − ln

√n→ ln 2 + γ

2 .

(b)n∑k=1

4k(2k − 1)(2k + 1) − lnn→ ln 4 + γ − 1.

(c)∞∑n=1

1n(2n− 1)(2n+ 1) = ln 4− 1.

6. Prove that∑an converges iff for each ε > 0 there exists an index N

such that ∣∣∣∣ n+p∑k=n

an

∣∣∣∣ < ε for all n ≥ N and p ≥ 1.

7. Suppose that an tends monotonically to 0 and that s :=∑∞n=1 an

converges.(a) Prove that nan → 0.(b) Let p ∈ N. Show that t :=

∑∞n=1 n(an−an+p) converges and express t

in terms of s.Suggestion. For (b), consider first the case p = 1.

8.S Let∑an and

∑bn be convergent series with bn > 0 for all n. Suppose

that L := limn(an/bn) exists in R. Prove that

limn

∑∞k=n ak∑∞k=n bk

= L.

Use this to calculate limn

[∑∞k=n sin(3/k2)

] [∑∞k=n 1/k2]−1.

9. For a sequence {cn}, define ∆cn = cn+1−cn. Prove the following discreteanalog of l’Hospital’s rule: Let {an} and {bn} be sequences with {bn}strictly monotone. Suppose that either (a) an → 0 and bn → 0, or(b) bn → ±∞. Then

limn

anbn

= limn

∆an∆bn

,

provided that the limit on the right exists in R.

10. Let {an} and {bn} be sequences with bn > 0 for all n and∑n bn = +∞.

Set An =∑nk=1 ak and Bn =

∑nk=1 bk. Use Exercise 9 to prove that

limn

AnBn

= limn

anbn,

provided that the limit on the right exists in R. Use this to calculate thelimits of


(a)

n∑k=1

sin(1/k)

n∑k=1

1/k, (b)

n∑k=1

ln k

n∑k=1

kp, (c)

n∑k=1

rk

n∑k=1

kp,

where r, p > 0.

11. Let {bn}∞n=1 be a sequence obtained by rearranging finitely many termsof a sequence {an}∞n=1. Show that

∑bn converges in R iff

∑an converges

in R, in which case the series are equal.

12.S Let {bk} be a sequence obtained from a sequence {an} by grouping, thatis,

bk = ank−1+1 + ank−1+2 + · · ·+ ank , k = 1, 2, . . . ,

where {nk}k is a strictly increasing sequence of nonnegative integersand n0 = 0. Show that if

∑n an converges in R, then so does

∑k bk

and the series are equal. Show that the converse is true if an ≥ 0 for allsufficiently large n. What if the terms an change sign infinitely often?

13. Let {an} be decreasing and nonnegative. Prove that∑∞n=1 an converges

iff∑∞k=0 2ka2k converges. Hint. Set

sn =n∑j=1

aj and tk =k∑j=0

2ja2j .

Show that sn ≤ tk if n ≤ 2k+1 − 1 and sn ≥ tk/2 if n ≥ 2k.

14. (Decimal representation of real numbers). Prove that every real numberx ≥ 0 has a decimal representation

x = bNbN−1 · · · b0.a1a2 · · · :=N∑n=0

bn10n +∞∑n=1

an10−n,

where the digits bn, an are integers from 0 to 9.Hint. By Exercise 1.5.16, it may be assumed that x ∈ [0, 1). Prove byinduction that for each n there exist aj ∈ {0, 1, . . . 9} and xn ∈ [0, 10−n)such that

x = xn +n∑j=1

aj10−j = xn + (.a1a2 · · · an).

15.S Call a decimal representation bNbN−1 · · · b0.a1a2 · · · standard if no indexn exists such that ak = 9 for all k ≥ n. Prove that every real numberhas a unique standard decimal representation.

16. A real number x ≥ 0 is a repeating decimal if it has decimal representationof the form

x = bNbN−1 · · · b0.a1a2 · · · amam+1am+2 · · · am+k,

where the upper bar indicates that the block repeats forever. (For example,61/495 = .12323 · · · = .123.) Prove that every repeating decimal isrational.

17. Prove the converse of Exercise 16, that is, every rational number p/q isa repeating decimal. Conclude that if f : N 7→ N is strictly increasing,then

∑10−f(n) is irrational.

Hint. By the division algorithm you may assume that 1 ≤ p < q. Beginby showing that if p/q = .a1a2 · · · , then for each n

p

q= .a1a2 · · · an + rn

10nq , where rn ∈ {0, 1, . . . , q − 1},

and use this to show that qan = 10rn−1 − rn, where r0 := p.

6.2 Series with Nonnegative TermsThere are a variety of tests for the convergence of series with nonnegative

terms. The most basic of these is the following theorem.

6.2.1 Theorem. If an ≥ 0 for all n, then the series∑an converges in R iff

its partial sums are bounded.

Proof. Since the terms of the series are nonnegative, the sequence of partialsums is increasing. The assertion therefore follows from the monotone sequencetheorem (2.2.2).

6.2.2 Remark. By 6.1.2, the theorem is still valid if the inequality an ≥ 0holds only eventually, that is, for all n ≥ some m. Many of the results in thischapter have similar extensions. Rather than make these explicit, we leave thestraightforward formulations to the reader. ♦

6.2.3 Example. Let an, bn ≥ 0 for all n and suppose that∑an and

∑bn

converge. By the Cauchy–Schwarz inequality (1.6.3(e)),n∑k=1

√akbk ≤

( n∑k=1

ak

)1/2( n∑k=1

bk

)1/2.

Since the sums on the right are bounded, so are the sums on the left. Therefore,∑√anbn converges. ♦


The following test relates the convergence of a series to that of an improperintegral.

6.2.4 Integral Test. Let f be decreasing, positive, and locally integrable onthe interval [1,∞). Then the series

∑∞n=1 f(n) converges iff the improper

integral∫∞

1 f converges. Moreover, for every n ∈ N

0 ≤ s− sn ≤∫ ∞n

f(x) dx. (6.4)

Proof. For each n ∈ N let

sn =n∑k=1

f(k) and tn =∫ n

1f.

For each k ∈ N and x ∈ [k, k + 1], f(k + 1) ≤ f(x) ≤ f(k), hence

f(k + 1) ≤∫ k+1

k

f ≤ f(k)

and so

sn − f(1) =n∑k=2

f(k) =n−1∑k=1

f(k + 1) ≤n−1∑k=1

∫ k+1

k

f = tn ≤n−1∑k=1

f(k) = sn−1.

Therefore, {sn} is bounded iff {tn} is bounded. The first assertion of thetheorem now follows from 6.2.1.

Now observe that for m > n,

0 ≤ sm − sn =m∑

k=n−1f(k) =

m−1∑k=n

f(k + 1) ≤m−1∑k=n

∫ k+1

k

f =∫ m

n

f.

Letting m→ +∞ yields (6.4).

Inequality (6.4) allows one to estimate the error made by approximating sby a partial sum sn.

6.2.5 Example. (p-series). By 5.7.3(a),∫∞

1 1/xp dx converges iff p > 1. There-fore, the same is true of the series s :=

∑∞n=1 1/np. . Furthermore, if p > 1,

then0 ≤ s− sn ≤

∫ ∞n

x−p dx = 1(p− 1)np−1 .

Thus if the partial sum sn is to agree with s in, say, the first 10 decimal places,then n should be chosen so that (p− 1)np−1 > 1010. ♦

6.2.6 Comparison Test. Let 0 ≤ an ≤ bn for all n. If∑bn converges, then

so does∑an.

Proof. The partial sums of∑bn are bounded and dominate those of

∑an,

hence assertion follows from 6.2.1.

6.2.7 Limit Comparison Test. Let an, bn > 0 for all n.(a) If r := lim sup(an/bn) < +∞ and

∑bn converges, then

∑an converges.

(b) If r := lim inf(an/bn) > 0 and∑an converges, then

∑bn converges.

(c) If r := lim(an/bn) exists and r ∈ (0,+∞), then∑bn converges iff

∑an

converges.Proof. For (a), let r ∈ (r,+∞) and choose N so that supn≥N an/bn < r. Thenan < bnr for every n ≥ N , hence the conclusion follows from the comparisontest and 6.2.2. Part (b) follows similarly by choosing r ∈ (0, r) and then N sothat infn≥N an/bn > r. Part (c) follows from (a) and (b).

6.2.8 Examples. (a) The series∑n

2n + n3

3n + n2

converges by comparison with the convergent series∑n(2/3)n, since

2n + n3

3n + n2 (2/3)−n = 1 + n3/2n1 + n2/3n → 1.

(b) The series ∑n

√cn+ d−

√cn

n+ 1 , c, d > 0,

converges by comparison with the convergent series∑n n−3/2, since

√cn+ d−

√cn

n+ 1 n3/2 = dn3/2

(n+ 1)(√cn+ d+

√cn)→ d

2√c. ♦

6.2.9 Ratio Test. Let an > 0 for all n.

(a) If r := lim supn

an+1

an< 1, then

∑an converges.

(b) If r := lim infn

an+1

an> 1, then

∑an diverges.

Proof. (a) Let r ∈ (r, 1) and choose N so that supn≥N an+1/an < r. Forn > N

an < an−1r < an−2r2 < · · · < aNr

n−N ,

so∑an converges by the comparison test.

(b) If r > 1 there exists N such that infn≥N an+1/an > 1. Therefore,

an > an−1 > an−2 > · · · > aN > 0, n > N,

so an cannot converge to zero. Therefore,∑an diverges.

6.2.10 Examples. (a) Let an denote the general term of the series∞∑n=1

8 · 14 · 20 · · · (6n+ 2)6 · 11 · 16 · · · (5n+ 1) c

n,

where c > 0. Thenan+1

an= 6n+ 8

5n+ 6 c→65c,

hence the series converges if c < 5/6 and diverges if c > 5/6. If c = 5/6, then

an = 8 · 14 · 20 · · · (6n+ 2)5n6 · 11 · 16 · · · (5n+ 1)6n = (1 + 1/3)(2 + 1/3) · · · (n+ 1/3)

(1 + 1/5)(2 + 1/5) · · · (n+ 1/5) > 1,

so the series diverges in this case as well.(b) For the series

∞∑n=1

(n!)prn2, r > 0, p ∈ R

the ratios arean+1

an= (n+ 1)pr2n+1,

hence the series converges iff r < 1.(c) For the series

∞∑n=2

2n ln2 n

n! ,

an+1

an= 2 ln2(n+ 1)

(n+ 1) ln2 n≤ 2 ln2(n+ 1)

(n+ 1) → 0,

hence the series converges. ♦

6.2.11 Root Test. Let an ≥ 0 for all n and set ρ := lim supn a1/nn .

(a) If ρ < 1, then∑an converges.

(b) If ρ > 1, then∑an diverges.

Proof. (a) Let r ∈ (ρ, 1) and choose N such that supn≥N a1/nn < r. Then

an < rn for all n ≥ N , hence∑an converges by the comparison test,

(b) By 2.4.2, there exists a subsequence a1/nknk → ρ. Then, for all sufficiently

large k, ank > 1, hence the series diverges.

6.2.12 Example. For the series∞∑n=1

(a+ (−1)nb

)n, where a > b > 0,

lim supn a1/nn = a+ b, hence the series converges if a+ b < 1 and diverges if

a+ b > 1. If a+ b = 1, an 6→ 0 so the series diverges in this case as well. ♦

6.2.13 Remark. No conclusion regarding the convergence of the series∑an

in 6.2.9 and 6.2.11 can be inferred from the relations r ≥ 1, r ≤ 1, or ρ = 1.Indeed, the series

∑1/n2 and

∑1/n satisfy r = r = ρ = 1, yet the first series

converges while the second diverges. ♦

In Section 6.3, we consider more refined tests that can detect convergenceor divergence in cases where the ratio or root test fails. Here’s an example:

6.2.14 Example. Let an denote the nth term of the series∞∑n=1

(√an+ b

√n−√an

)n,

where a, b > 0. Then

a1/nn =

√an+ b

√n−√an = b

√n√

an+ b√n+√an→ b

2√a,

hence the series∑an converges if b2 < 4a and diverges if b2 > 4a. If b2 = 4a,

the root test fails but the log test (6.3.4) shows that the series converges inthis case. (Exercise 6.3.11.) ♦

6.2.15 Remark. By Exercise 2.4.12, if an > 0 for all n, then

lim infn

an+1

an≤ lim inf

na1/nn ≤ lim sup

na1/nn ≤ lim sup

n

an+1

an.

This shows that if the ratio test determines convergence or divergence conclu-sively, then so does the root test. It also suggests that the root test may beeffective when the ratio test fails. ♦

6.2.16 Example. Let an = snδn−1 + tnδn, where δn = 12 [1 + (−1)n] and

0 < s < t < 1. Then

an ={sn if n is odd,tn if n is even,

so the ratios an+1/an are sn+1/tn or tn+1/sn, and the roots a1/nn are s or t,

depending on the parity of n. Therefore, r = 0, r = +∞ and ρ = t, whichshows that the root test detects the convergence of

∑an while the ratio test

does not. ♦

Exercises1.S Determine whether the series

∑an converges or diverges, where an =

(a) n!3 · 5 · · · (2n+ 1) . (b) 3 · 5 · · · (2n+ 1)

(2n+ 1)! . (c) 3 · 6 · · · (3n)3 · 5 · · · (2n+ 1) .

(d) 4nn!5 · 8 · · · (3n+ 2) . (e) 2 · 4 · · · (2n)

4 · 7 · · · (3n+ 1) . (f) 4 · 7 · · · (3n+ 1)5 · 9 · · · (4n+ 1) .

2. Determine whether the series∑an converges or diverges, where an =

(a) S n3

2n . (b) (1 + r/n)n2, r > 0. (c) lnn

n1.1 .

(d) S n!nn. (e) (n1/n − r)n. (f) 1

n1+1/n .

(g) S 1n(lnn)(ln lnn)p . (h) 2n

n! . (i) sin2(1/n).

(j) S sin2(1/√n). (k) rn

1− rn , r 6= ±1. (l) 1(1− rn)2 , r 6= ±1.

(m) S n+ sinnn3 + sinn. (n) n+ lnn

nr lnn . (o) 1nr lnn.

(p) S 3nn!nn

. (q) n!(1.1)n3 . (r)

(1 + an

1 + bn

)n, a, b > 0.

(s) S 12lnn . (t) 1

3lnn . (u) rsinn, r > 0.

(v) S 3n + 4n8n − 6n . (w) (1− r/n)n

3, r > 0. (x) 1/rlnn, r > 0.

3. Let an > 0 for all n and suppose that∑an diverges. Prove that

∑anbn

diverges for all sequences {bn} with lim infn bn > 0.

4. Let bn → p > 0. Prove that∑∞n=1 n

−bn converges if p > 1 and divergesif p < 1. Give an example of a sequence {bn} with bn > 1 for all n andbn ↓ 1 such that

∑n−bn diverges.

5.S Let an > 0 for all n. Prove that∑an converges iff

∑n1/nan converges.

6. Find all values of a, b, p, q > 0 for which∑∞n=1 an converges if an =

(a) S 1lnp n. (b) lnp n

nq. (c) 1

nq lnp n.

(d) (nqp + 1)1/q − np. (e) S (n+ 1)p − npnq

. (f) pn/2n!( n∏j=1

pj + 1)−1.

(g) S a+ np

b+ nq. (h) 1 + anp

1 + bnq. (i)

(1 + anp

1 + bnq

)n.

7. Let {an} be positive and decreasing. Prove that∑an converges iff

∑a2n

converges.

8. Let an > 0 for all n. Prove or disprove: If∑∞n=1 an converges, then

Numerical Infinite Series 175∑∞n=1 bn converges, where bn =

(a) Sa2n. (b)√an. (c)

∑n≤j≤n+m

aj . (d) S minn≤j≤2n

aj .

(e) maxn≤j≤2n

aj . (f)∑

n≤j≤2naj . (g)

∏n≤j≤2n

aj . (h) S∏

1≤j≤naj .

(i)n∑j=1

anaj . (j) 1an

∏n<j≤n+m

aj . (k) 1an

∑n<j≤n+m

aj . (l) S(ann

)3/4.

9. Let an > 0. Prove that if∑an converges, then

∑(1− cos an) converges.

10. Let r > 0 and p > 3/r. Prove that∑[

bn/(rp − 3)]n converges for all

sequences {bn} with bn → r.

11.S Let an, bn > 0 and an+1/an ≤ bn+1/bn for all n. Prove that if∑bn

converges, then so does∑an.

12. Let an > 0. Show that∑an converges iff

∑f(an) converges, where

f(x) =

(a) sin x. (b) tan x. (c) sin−1 x. (d) tan−1 x.

(e) x

1 + ax. (f) ln(1 + x). (g) ex − 1. (h)x3 + x2 + x.

13. Let {pn} be a sequence in Z+ and {an} a sequence of positive reals.(a) Prove that if

∑n an converges, then

∑n

√anan+pn converges, pro-

vided that either {pn} is bounded or an is decreasing.(b) Suppose {pn} is bounded and an ↓ 0. Prove that if

∑n

√anan+pn

converges, then∑n an converges.

Does (b) hold if {an} is not monotone or {pn} is not bounded?

14.S Let g be positive and differentiable on [1,∞) such that limx→∞ g(x) = 0,and let f be differentiable in a neighborhood of 0 such that f(0) = 0,f(x) > 0 for x > 0, f ′ is continuous at 0, and f ′(0) > 0. Prove that∑∞n=1 f(g(n)) converges iff

∑∞n=1 g(n) converges.

15. Let f : R→ [0,+∞) be twice differentiable and p > 0. Prove:(a)S If p ≤ 1 and

∑f(1/np) converges, then f(0) = f ′(0) = 0.

(b) If p ≥ 1 and f(0) = f ′(0) = 0, then∑f(1/np) converges.

16. Let an ≥ 0 for all n and suppose that∑an converges. Prove that if

α > 1/2, then∑n−α√an converges. Give an example which shows that

the assertion is false if α = 1/2.

17.S This exercise shows that e is irrational. Assume, for a contradiction, thate = m/n, m,n ∈ N. Let sn =

∑nk=0 1/k!. Using the series representation

e =∑∞k=0 1/k!, show that

(a) n!(e− sn) ∈ N.(b) n!(e− sn) <

∑∞k=1(n+ 1)−k = 1/n.

Conclude that e must be irrational.

18. Let sn =∑nk=1 k

−p, 0 1,

11− p if p+ q = 1,

+∞ if p+ q < 1.

19. Let sn =∑nk=1 ak and suppose that

∑a2nn−p < +∞, where an, p > 0.

Prove that limn snn−q = 0 for all q > (p+ 1)/2.

20. Let {an}, {bn}, and {cn} be positive sequences such that∑cn diverges,

bn → b ∈ (0,+∞], and an/an+1 = 1 + bncn. Prove that an → 0.Hint. Let r ∈ (0, b) and choose m so that bn > r for all n ≥ m. Thenam+k/am+k+1 > 1 + rcm+k for all k ≥ 0.

6.3 More Refined Convergence TestsThe tests in this section are frequently useful when the root and ratio tests

fail. The first is a generalization of the ratio test.

6.3.1 Kummer’s Test. Let an, bn > 0 for all n and set

cn := anan+1

bn − bn+1.

(a) If c := lim infn cn > 0, then∑n an converges.

(b) If c := lim supn cn < 0 and∑n b−1n diverges, then

∑n an diverges.

Proof. (a) Set sn =∑nk=1 ak and let r ∈ (0, c). Choose N so that cn ≥ r for

all n ≥ N . Since anbn − an+1bn+1 = cnan+1, for all m > N we have

aNbN ≥ aNbN −ambm =m−1∑n=N

(anbn−an+1bn+1

)≥ r

m−1∑n=N

an+1 = r(sm−sN ),

hence sm ≤ sN + aNbN/r. The partial sums of∑an are therefore bounded so

the series converges.(b) If c < 0, there exists an N such that

akbk − ak+1bk+1 < 0 for all k ≥ N.

Then

aNbN − anbn =n−1∑k=N

(akbk − ak+1bk+1) < 0

so an > (aNbN )/bn, for all n > N . Since∑

1/bn diverges,∑an diverges by

the comparison test.

A simple but important consequence of Kummer’s test is

6.3.2 Raabe’s Test. Let an > 0 for all n and set

dn := n( anan+1

− 1).

(a) If d := lim infn dn > 1, then∑an converges.

(b) If d := lim supn dn < 1, then∑an diverges.

Proof. Take bn = n in Kummer’s test, so

cn = anan+1

n− (n+ 1) = dn − 1.

Then c = d− 1 and c = d− 1 and the assertions follow.

6.3.3 Example. We use Raabe’s test to show that the series

∑n

1(n+m)!

n∏k=1

(k + a), where a > 0 and m ∈ N,

converges iff m > 1 + a. Indeed, since

n

(anan+1

− 1)

= n

(n+m+ 1n+ 1 + a

− 1)

= n(m− a)n+ 1 + a

→ m− a,

the series converges if m− a > 1 and diverges if m− a < 1. If m− a = 1, thenthe general term reduces to

1(n+m)!

n∏k=1

(m− 1 + k) = m(m+ 1) · · · (m+ n− 1)(n+m)! = 1

(m+ n)(m− 1)! ,

hence the series diverges in this case as well. Note that the ratio test isinconclusive in this example since an+1/an → 1. ♦

The following test is sometimes useful when the root test fails.

6.3.4 Log Test. Let an > 0 for all n and set cn := ln(a−1n )/ lnn.

(a) If c := lim infn cn > 1, then∑an converges.

(b) If c := lim supn cn < 1, then∑an diverges.

Proof. (a) Let p ∈ (1, c). Then there exists N such that cn > p for all n ≥ N .For such n, ln(a−1

n ) > lnnp, hence an < 1/np. Since p > 1,∑an converges by

the comparison test. The proof of (b) is similar.

6.3.5 Example. Let an denote the general term of the series∞∑n=1

(a+ np

b+ nq

)n,

where a, b, p, q > 0. The root test shows that the series converges if p < qand diverges if p > q. If p = q, the test is inconclusive, so we consider cases. Ifa ≥ b, then an ≥ 1 and the series diverges. If a < b, we use the log test: Byl’Hospital’s rule, the sequence

cn = − ln anlnn = ln(b+ np)− ln(a+ np)

(lnn)/n

has the same limit as

pnp−1

b+ np− pnp−1

a+ np

1− lnnn2

= pnp+1

1− lnn(a+ np)− (b+ np)

(a+ np)(b+ np)

= p(a− b)b/np + 1

n

(1− lnn)(a+ np) .

The first quotient in the last expression tends to p(a− b) < 0. By l’Hospital’srule, the second quotient has the same limit as

1(1− lnn)(pnp−1)− (a+ np)/n = −n1−p

p(lnn− 1) + (a/np + 1) ,

which converges to 0 if p ≥ 1 and to −∞ if p < 1. Thus if p = q and a < b,then

limncn =

{0 if p ≥ 1+∞ if p < 1,

hence∑an converges iff p < 1. ♦

Exercises1. Show that the ratio test is a consequence of Kummer’s test.

2. Show that Raabe’s test detects the convergence properties of the p-series∑1/np for p 6= 1, whereas the ratio and root tests do not.

3.S Use Raabe’s test to determine the convergence of∑an if an =

(a)n∏k=1

3k − 13k + 1 . (b) 1

2n

n∏k=1

(2k − 1

2k

). (c) 1

3n(n+ 1)!

n∏k=1

(3k + 1).

Show that the ratio test is inconclusive in each case.

4. Let a, b > 0 and m ∈ N. Use Raabe’s test to show that the followingseries converges iff b− a > m:

∑n

( n∏k=1

mk + a

)( n∏k=1

mk + b

)−1.

5. Find all values of p > 0 for which the series converge:

(a)S∑n

pnn!nn

. (b)∑n

pnn!(p+ 1)(2p+ 1) · · · (np+ 1) .

What does the ratio test reveal?

6.S Show that the series∞∑n=1

1 · 3 · · · (2n− 1)(2 + p) · (4 + p) · · · (2n+ p)

converges iff p > 1.

7. Let p ∈ N. Use Raabe’s test to show that the series∑ (pn)!ppn(n!)p

converges if p > 3 and diverges if p < 3. What does the ratio test tell usfor these values of p?

8. Let a, b, c > 0 and m ∈ Z+. Use Raabe’s test to show that the series∞∑n=1

1nm

n∏k=1

(ak + b

ak + c

)converges iff c > (m+ 1)a+ b.

9. Let b > 0 and m ∈ N. Use Raabe’s test to show that∞∑n=1

(n∏k=1

kb

kb+ 1

)mconverges if m > b and diverges if m < b. What does the ratio testreveal? What happens if m = b = 1?

10. Let r > 0. Use the log test to determine the convergence behavior of∑an if an =

(a)S rln lnn. (b) 1nr lnn . (c) 1

nr ln lnn . (d)S 1(lnn)rn .

11. Let an be as in 6.2.14. Use the log test to verify that∑an converges if

b2 = 4a.

12. Let p, r > 0. Use the log test to verify that∑r(np) converges iff r < 1.

13.S Let bn → b > 0. Use the log test to verify that∑b− lnnn converges if

b > e and diverges if b < e.

14. Use the log test to show that the series∑

(1− 1/n)(np) converges iffp > 1.

15. Let p > 0 and a 6= 0. Use the log test to verify that∑

(1 − a/np)ndiverges if p ≥ 1, converges if 0 0, and diverges if0 0. Determine the convergence behavior of∑an if an =

(a)S(a+ np

b+ nq

)lnn. (b)

(a+ np

b+ nq

)ln lnn. (c)

(1 + anp

1 + bnq

)ln lnn.

17. Show that∑

(lnn)bn diverges if {bn} is bounded. What happens in theunbounded special cases (a) bn = − lnn and (b) bn = −np, p > 0? Whatdoes the root test reveal in (b)?

18.S (Loglog test) Let an > 0 for all n and set

cn = − ln (nan)ln lnn , c := lim inf

ncn, and c := lim sup

ncn.

Prove that∑an converges if c > 1 and diverges if c < 1. Use the test to

determine the convergence behavior of∞∑n=2

(1 + an

1 + bn

)ln lnn, a, b > 0.

19. Let a, b > 0. Use the log test to show that∑n

(1 + anp

1 + bnq

)lnn

diverges if p > q; converges if p < q; and if p = q, then converges ifb/a > e and diverges if b/a < e. Use the log log test to show that theseries also diverges if p = q and b/a = e.

20. Use Kummer’s test to prove Gauss’s test: Let an > 0 for all n and let{αn} be a bounded sequence such that

anan+1

= 1 + r

n− αnns,

where r, s ∈ R, s > 1. Then∑an converges iff r > 1.

21.S Use Kummer’s test to prove Bertrand’s test: Let an > 0 for all n andlet {βn} be a sequence such that

anan+1

= 1 + 1n− βnn lnn.

Then∑an converges if lim inf

nβn > 1 and diverges if lim sup

nβn < 1.

6.4 Absolute and Conditional ConvergenceThe convergence tests in Sections 6.2 and 6.3 apply only to series with

nonnegative terms. In this section we consider tests applicable to general series.

6.4.1 Definition. A series∑an is said to converge absolutely if

∑|an|

converges. A convergent series that does not converge absolutely is said toconverge conditionally. ♦

6.4.2 Theorem. (a) If∑an converges absolutely, then the series∑an,

∑a+n , and

∑a−n

converge and∑an =

∑a+n −

∑a−n ,

∑|an| =

∑a+n +

∑a−n .

(b) If∑an converges conditionally, then

∑a+n and

∑a−n diverge.

Proof. (a) If∑|an| converges, then the inequalities

0 ≤ a±n = 12(|an| ± an

)≤ |an|

and the comparison test show that∑a+n and

∑a−n converge. The remaining

assertions in (a) follow from the identities an = a+n − a−n and |an| = a+

n + a−n .(b) If

∑an and

∑a−n converge, then

∑|an| =

∑an + 2

∑a−n converges.

The same conclusion holds if∑an and

∑a+n converge. Hence if

∑an converges

conditionally, then neither∑a+n nor

∑a−n can converge.

All series of the form∑∞n=1(−1)n+1/np, 0 < p ≤ 1 converge conditionally.

This follows from the alternating series test given below. The following exampleis somewhat more interesting.6.4.3 Example. We show that the series

s :=∞∑n=2

[(−1)nnp − 1

]−1

converges conditionally iff 1/2 1.To see this, note first that if p < 0, then the nth term of the series does

not tend to zero, and if p = 0 the series is undefined. So assume p > 0. If sndenotes the nth partial sum of the series, then

s2n+1 =n∑k=1

{1

(2k)p − 1 −1

(2k + 1)p + 1

}=

n∑k=1

(αk + βk), (6.5)

where

αk := (2k + 1)p − (2k)p[(2k)p − 1

][(2k + 1)p + 1

] and βk := 2[(2k)p − 1

][(2k + 1)p + 1

] .By the mean value theorem applied to xp on the interval [2k, 2k + 1],

αk = pxp−1k[

(2k)p − 1][

(2k + 1)p + 1] , for some xk ∈ (2k, 2k + 1).

If 0 < p ≤ 1, then

αk ≤p

(2k)1−p[(2k)p − 1

][(2k)p + 1

] = 1(2k)1−p

[(2k)2p − 1

] ≤ 1kp+1

the last inequality for sufficiently large k. Therefore,∑nk=1 αk converges by

comparison with∑k 1/kp+1. Also, since

βkk−2p = 2

[2p − k−p][(2 + 1/k)p + k−p] →1

22p−1 ,

the limit comparison test shows that∑nk=1 βk converges iff p > 1/2. Therefore

the partial sum (6.5) has a finite limit iff p > 1/2. Since s2n+1 − s2n → 0, theseries s converges iff p > 1/2. Since np − 1 ≤

∣∣(−1)n+1np − 1∣∣ ≤ np + 1, s

converges absolutely iff p > 1. ♦


The tests of Sections 6.2 and 6.3 for positive-term series may be used inconjunction with 6.4.2 to test series with terms of mixed sign. For example,the inequality

∣∣n−2 sinn∣∣ ≤ n−2, together with the comparison test, shows

that the series∑n−2 sinn converges absolutely and hence converges.

The remainder of the section describes tests that are useful for establishingconditional convergence. They rely on the following discrete analog of theintegration-by-parts formula, due to Abel.

6.4.4 Summation by Parts. Let {an}, {bn}, and {sn} be sequences suchthat s0 = 0 and sk − sk−1 = ak, k ≥ n ≥ 1. Then, for m > n ≥ 1,

m∑k=n

akbk =m−1∑k=n

sk(bk − bk+1) + smbm − sn−1bn.

Proof. Since ak = sk − sk−1,m∑k=n

akbk =m∑k=n

skbk −m∑k=n

sk−1bk =m∑k=n

skbk −m−1∑k=n−1

skbk+1.

Combining the last two sums yields the desired formula.

6.4.5 Dirichlet’s Test. Let {an} and {bn} be sequences such that the follow-ing conditions hold:

(a) The partial sums of∑an are bounded.

(b) limn bn → 0, and

(c) The series∑|bn+1− bn| converges, which is the case, for example, if {bn}

is monotone.

Then∑anbn converges.

Proof. Let

sn :=n∑k=1

ak and tn :=n∑k=1

akbk.

If |sn| ≤M for every n, then, by 6.4.4,

|tm − tn−1| =∣∣∣ m∑k=n

akbk

∣∣∣ ≤M m∑k=n|bk − bk+1|+M(|bn|+ |bm|) m ≥ n > 1.

Since the right side of the inequality tends to 0 as m,n→∞, {tn} is a Cauchysequence and hence converges. If {bn} is monotone, say decreasing, then

n∑k=1|bk+1 − bk| =

n∑k=1

(bk − bk+1) = b1 − bn+1,

which converges.


6.4.6 Example. We apply Dirichlet’s test to the series∑∞n=1 bn sin(nθ), where

{bn} is monotone and bn → 0. To establish the boundedness of the sequenceof partial sums sn :=

∑nk=1 sin(kθ), we use the identity

2 sin (θ/2) sin (kθ) = cos[(k − 1/2)θ

]− cos

[(k + 1/2)θ

].

Summing,

2 sin (θ/2)n∑k=1

sin (kθ) = cos(θ/2)− cos((n+ 1)θ/2

).

Thus if θ is not a multiple of 2π, then |sn| ≤ | sin(θ/2)|−1. By 6.4.5,∑∞n=1 bn sin(nθ) converges for all θ. Note that if, for example, θ = π/2 and

bn = 1/n, then the convergence is conditional. ♦

6.4.7 Alternating Series Test. If bn ↓ 0, then∑∞n=1(−1)n+1bn converges.

Proof. The partial sums of∑∞n=1(−1)n+1 are clearly bounded, hence the

assertion follows from 6.4.5.

6.4.8 Example. (Alternating Harmonic Series). By 6.4.7, the series∑∞n=1(−1)n+1n−1 converges. We show that its value is ln 2. Let

sn =n∑k=1

(−1)k+1

kand tn =

n∑k=1

1k− lnn.

By 6.1.9, the sequence {tn} converges. Also, by Exercise 1.5.3,

s2n =2n∑k=1

(−1)k+1

k=

2n∑k=n+1

1k

= t2n − tn + ln 2.

It follows that s2n → ln 2. Since s2n+1 − s2n → 0, sn → ln 2. ♦

The contrast between absolutely convergent and conditionally convergentseries is strikingly displayed in the context of rearrangements.

6.4.9 Definition. A rearrangement of a series∑∞n=1 an is a series

∑∞k=1 amk ,

where {mk} is a sequence of positive integers that contains every positiveinteger exactly once.1 ♦

6.4.10 Theorem. If∑∞n=1 an converges absolutely to s, then any rearrange-

ment∑∞k=1 amk converges absolutely to s.

Proof. Assume first that an ≥ 0 for all n. Let

tn =n∑k=1

amk and sn =n∑k=1

ak.

1In other words, k 7→ mk is a one-to-one mapping of N onto itself.

For each N , choose K so large that the terms ak, 1 ≤ k ≤ N , are includedamong the terms amk , 1 ≤ k ≤ K. Then sN ≤ tK ≤

∑∞k=1 amk . Letting

N → ∞ shows that∑n an ≤

∑k amk . Since

∑n an is a rearrangement of∑

k amk , the reverse inequality holds as well. The general case follows byconsidering

∑a+n and

∑a−n and using 6.4.2.

6.4.11 Example. Consider the series

t := 1− 12p −

14p + 1

3p −16p −

18p + 1

5p −1

10p −1

12p + · · · ,

which is a rearrangement of the alternating series

s := 1− 12p + 1

3p −14p + 1

5p −16p + 1

7p −18p + 1

9p −1

10p + · · · ,

If p > 1, then both series converge absolutely and t = s. If p = 1, then the twoseries converge to different values. Indeed, if sn and tn denote the nth partialsums of s and t, respectively, then

t3n =n∑k=1

(1

2k − 1 −1

4k − 2 −14k

)= 1

2

n∑k=1

(1

2k − 1 −12k

)= s2n

2 → s

2 .

Sincet3n+1 = t3n + 1

2n+ 1 and t3n+2 = t3n+1 −1

4n+ 2 ,

we see that tn → s/2. ♦

The phenomenon illustrated in the last example holds generally, as shownby the following remarkable result due to Riemann.

6.4.12 Theorem. If s :=∑∞n=1 an converges conditionally, then, for any real

number x, some rearrangement of s converges to x.

Proof. We may assume that x ≥ 0. For n ∈ N let

sn :=n∑j=1

aj , s+n :=

n∑j=1

a+j , s−n :=

n∑j=1

a−j , and s+0 := 0.

Since s+n → +∞ (6.4.2), there exists a smallest integerm1 such that s+

m1> x.

Since x ≥ 0, m1 6= 0. Because s−n → +∞, there exists a smallest positiveinteger n1 such that s+

m1− s−n1

< x and then a smallest integer m2 such thats+m2− s−n1

> x. Obviously, m2 > m1. Continuing in this manner, we obtainstrictly increasing sequences {mk} and {nk} with the following properties:

• mk is the smallest integer such that

tk := s+mk− s−nk−1

= (a+1 + · · ·+ a+

mk)− (a−1 + · · ·+ a−nk−1

) > x,

• nk the smallest integer such that

rk := s+mk− s−nk = (a+

1 + · · ·+ a+mk

)− (a−1 + · · ·+ a−nk) < x.

Now consider the series

s′ := a+1 + · · ·+ a+

m1− a−1 − · · · − a−n1

+ a+m1+1 + · · ·+ a+

m2− a−n1+1 − · · · .

The terms of s′ are either aj or 0, and s′ contains each term of the series sexactly once. Thus s′ is a rearrangement of s. We show that s′ = x.

By the minimality properties of the sequences {mk} and {nk},

tk − a+mk≤ x < tk and rk < x ≤ rk + a−nk ,

hencex− a+

nk≤ rk < x < tk ≤ x+ a+

mk.

Since an → 0,limkrk = lim

ktk = x. (6.6)

Now let s′k denote the kth partial sum of the series s′ and consider the partialsums

r1 = (a+1 + · · ·+ a+

m1)− (a−1 + · · ·+ a−n1

),t2 = (a+

1 + · · ·+ a+m2

)− (a−1 + · · ·+ a−n1),

r2 = (a+1 + · · ·+ a+

m2)− (a−1 + · · ·+ a−n2

).

If m1 + n1 ≤ k ≤ m2 + n1, then s′k includes the terms of r1, additional termsfrom a+

m1+1 + · · · + a+m2

, and no others, hence r1 ≤ s′k ≤ t2. Similarly, ifm2 + n1 ≤ k ≤ m2 + n2, then s′k includes the terms of t2, additional termsfrom −a−n1+1 − · · · − a−n2

, and no others, so r2 ≤ s′k ≤ t2. In general, for j ≥ 1,

mj + nj ≤ k ≤ mj+1 + nj ⇒ rj ≤ s′k ≤ tj+1 andmj+1 + nj ≤ k ≤ mj+1 + nj+1 ⇒ rj+1 ≤ s′k ≤ tj+1.

From (6.6), s′k → x.

Exercises1. Suppose that

∑an converges absolutely. Prove that

∑anbn converges

absolutely for all sequences {bn} with lim supn→∞ |bn| < +∞.

2.S Suppose the ratio test shows that∑an does not converge absolutely.

Can∑an still converge conditionally?

3. For an alternating series s =∑∞n=1(−1)n+1bn, prove the inequality

|s− sn| ≤ bn. This result is useful in estimating the error made by usingsn to approximate s. For example, use the estimate to determine how largen should be so that the partial sum sn agrees with s =

∑∞n=1(−1)n+1/n4

in nine decimal places.


4. Let p > 0. Determine whether the series∑an converges absolutely,

conditionally, or diverges, where an =

(a) S (−1)nn1/n . (b) S (−1)n(n1/n − 1).

(c) S (−1)n sin(1/np). (d) (−1)n sin−1(1/np).(e) (−1)n tan(1/np). (f) (−1)n tan−1(1/np).

(g) sin[(2n+ 1)π/2]lnn . (h) (−2)n

n! .

(i) S (−1)n√n+ 1−

√n

np. (j) (−1)n

n lnp(n+ 1) .

(k) (−1)n 3n√n3 + 2

. (l) (−1)npn (n!)2

(2n)! .

(m) S (−1)npn + (−1)n , (p 6= 1). (n) (−1)n

np + (−1)n , (n ≥ 2).

(o) (−1)nn!3 · 5 · · · (2n+ 1) . (p) (−1)n3 · 6 · · · (3n)

3 · 5 · · · (2n+ 1) .

(q) (−1)nenn!5 · 8 · · · (3n+ 2) . (r) (−1)n+1n[(1)n−3]/2.

5. Suppose that {bn} is monotone and bn → 0. Use the identity

2 sin (θ/2) cos (nθ) = sin[(n+ 1/2)θ

]− sin

[(n− 1/2)θ

]to verify that the series

∑∞n=1 bn cosnθ converges if θ/(2π) 6∈ Z.

6. Let bn ↓ 0 and m ∈ N. Show that∑∞n=0(−1)bn/mcbn converges.

7. Let m ∈ N. Show that∞∑n=1

(−1)n+1m

n(n+m) =m∑n=1

(−1)n+m+1

n+ δm ln 2,

where δm = 0 or 2 according as m is even or odd.

8. (Abel) Prove that if∑∞n=1 an converges and {bn} is bounded and mono-

tone, then∑∞n=1 anbn converges.

9.S Prove that∞∑n=2

(n− 1/2) sin(nθ)np + (−1)n converges for all real θ iff p > 1.

10. Let p > 1. Express each of the series∞∑n=1

1(2n− 1)p and

∞∑n=1

(−1)nnp

in

terms of∞∑n=1

1np

.


11. Prove that if∑nan converges, then

∑an converges and, moreover,∑

|an|p converges for every p > 1. What if p = 1?

12. Prove that if∑n−pan converges, then

∑n−qan converges for all q > p.

13.S (a) Let sn =∑nk=1 an, where an → 0. Suppose there exists a positive

integer q such that snq → s ∈ R. Prove that∑an converges to s.

(b) Use (a) to sum the series

s := 1 + 12 + 1

3 −14 −

15 −

16 + 1

7 + 18 + 1

9 − · · · ,

where sums of length three alternate signs. Generalize your result toalternating sums of length p > 1.(c) Show that in contrast to (b), the following series diverges, where sumsof lengths p = 3 and q = 2 alternate signs.

t := 1 + 12 + 1

3 −14 −

15 + 1

6 + 17 + 1

8 −19 − · · · .

*6.5 Double Sequences and SeriesA double sequence is a doubly indexed infinite array

{am,n} = {am,n}∞m,n=1

of real numbers am,n.2 Associated with each double sequence are the so-callediterated limits

limm

limnam,n and lim

nlimmam,n.

For the first iterated limit to exist, each inner limit bm := limn am,n, as wellas the outer limit limm bm, must exist. Similar remarks apply to the seconditerated limit. The following scheme illustrates the case when the iteratedlimits exist and equal L.

a1,1 a1,2 · · · a1,n → b1a2,1 a2,2 · · · a2,n → b2...

... · · ·...

...am,1 am,2 · · · am,n → bm↓ ↓ · · · ↓ ↓c1 c2 · · · cn → L

In addition to iterated limits, a double sequence gives rise to a third typeof limit, frequently called a double limit to distinguish it from iterated limits.

2More precisely, a double sequence is a function (m, n) 7→ am,n from N× N to R.

6.5.1 Definition. Let L ∈ R. We write

L = limm,n

am,n

and say that am,n converges to L or has limit L if for each ε > 0 there existsN ∈ N such that |am,n − L| < ε for all n,m ≥ N . We also write

limm,n

am,n = +∞ (−∞)

if for each r ∈ R there exists N ∈ N such that am,n > r (< r) for all n,m ≥ N .♦

Double limits have properties similar to limits of single sequences. Forexample, double limit analogs of 2.1.3, 2.1.4, 2.1.5, and 2.1.11, are readilyformulated and proved.

It is easy to find examples of iterated limits that exist but are unequal;am,n = (1− 1/n)m is one such. When this happens, the double limit cannotexist, as shown in 6.5.2 below. However, even if the iterated limits are equal,the double limit may fail to exist. This is the case for the sequence defined by

am,n ={

1 if m = n, and0 otherwise,

which has zero iterated limits. Finally, the example

am,n = (−1)m+n(1/m+ 1/n)

shows that a double limit may exist even if both iterated limits fail to exist.The following theorem gives the basic connection between double limits

and iterated limits.

6.5.2 Iterated Limit Theorem. Let {am,n} be a double sequence such thatlimn am,n exists for each m and limm am,n exists for each n. If the double limitlimm,n am,n exists, then the iterated limits limm limn am,n and limn limm am,nexist and equal the double limit.

Proof. Let L := limm,n am,n, bm := limn am,n, and cn := limm am,n. Givenε > 0, choose N ∈ N such that

|am,n − L| < ε for all m, n ≥ N.

Letting n → +∞ yields |bm − L| ≤ ε for all m ≥ N . Therefore, bm → L.Similarly, cn → L.

6.5.3 Definition. Given a double sequence {am,n}, form the partial sums

sm,n =m∑j=1

n∑k=1

aj,k, m, n ∈ N.

The double infinite series∑am,n =

∑m,n

am,n =∞∑

m,n=1am,n

is said to converge to s ∈ R if {sm,n} converges to s in the sense of 6.5.1. Theseries converges absolutely if

∑|am,n| converges, and converges conditionally

if∑am,n converges but not absolutely. ♦

As in the case of single series, an absolutely convergent double series con-verges (Exercise 7). Moreover, a double series with nonnegative terms convergesabsolutely iff the partial sums

∑mj=1

∑nk=1 aj,k are bounded (Exercise 5).

The iterated limits

limm

limnsm,n = lim

mlimn

m∑j=1

n∑k=1

aj,k =∞∑j=1

∞∑k=1

aj,k

andlimn

limmsm,n = lim

nlimm

n∑k=1

m∑j=1

aj,k =∞∑k=1

∞∑j=1

aj,k

are called iterated series. The following result, a special case of the Fubini–Tonelli theorem, establishes a connection between double and iterated series.

6.5.4 Fubini–Tonelli Theorem for Series. A double series∑am,n is

absolutely convergent iff one (hence both) of the following conditions hold:∞∑m=1

∞∑n=1|am,n| < +∞ and

∞∑n=1

∞∑m=1|am,n| < +∞. (6.7)

In this case, ∑m,n

am,n =∞∑m=1

∞∑n=1

am,n =∞∑n=1

∞∑m=1

am,n. (6.8)

Proof. Set sm,n =∑mj=1

∑nk=1 aj,k and tm,n =

∑mj=1

∑nk=1 |aj,k|. The first

assertion of the theorem is clear, since each condition in (6.7) implies thatT := supm,n tm,n < +∞, and conversely.

Now suppose that∑am,n is absolutely convergent. Let s := limm,n sm,n.

For each j,n∑k=1|aj,k| ≤ tj,n ≤ T for all n,

hence∑∞k=1 aj,k converges. Set rm :=

m∑j=1

∞∑k=1

aj,k. Given ε > 0, choose N such

that ∣∣∣ m∑j=1

n∑k=1

aj,k − s∣∣∣ < ε for all m, n ≥ N.

Fixing m ≥ N and letting n→ +∞ in this inequality yields |rm− s| ≤ ε. Thisshows that rm → s, which is the first equality in (6.8). The proof of the secondequality is similar.

Exercises1. Let α : N → N be strictly increasing. Show that if L := limm,n am,n

exists in R, then limm,n aα(n),n exists and equals L.

2. A double sequence {am,n} is said to be Cauchy if, given ε > 0, thereexists N ∈ N such that |am,n − am+p,n+q| < ε for all m, n ≥ N and allp, q ≥ 0. Prove that {am,n} converges iff it is Cauchy. Hint. Show that{an,n} converges.

3.S Determine the convergence behavior, double and iterated, of the followingsequences, where a, b > 0:

(a) sin(m/n). (b) ln(mn)n

. (c) (−1)mmm+ n

.

(d) m− nm+ n

. (e) mn

(m+ n)2 . (f) mn

m2 + n2 .

(g) 1m1/n . (h) n

m+ n2 . (i) n3m

m4 + n4 .

(j) n+ nm sin(1/n)am+ bn

. (k) m2n

an2 + bm4 . (l) n2 sin(1/n)m+ n

.

4. Show that if∑am,n converges, then limm,n am,n = 0.

5. Let am,n ≥ 0 for all m, n ∈ N. Prove that∑m,n am,n converges iff

s := supm,n sm,n < +∞, in which case the series sums to s.

6. State and prove a comparison test for double series with nonnegativeterms.

7. Prove that an absolutely convergent double series converges.

8. For sequences {an} and {bn}, set cm,n = anbm. Prove that c :=∑m,n cm,n converges absolutely iff a :=

∑n an and b :=

∑n bn con-

verge absolutely, in which case c = ab. Conclude that∑m,nm

−qn−p

converges iff p, q > 1.

9.S Given a double sequence {am,n} with am,n ≥ 0, let {bn} be the sequenceobtained by summing am,n along the diagonals j + k = n + 1, that is,bn :=

∑nj=1 aj,n+1−j . Prove that

∑am,n converges iff

∑n bn converges,

in which case the two series are equal.

10. Use Exercise 9 to show that the double series

(a)∑m,n

1(m+ n)p , (b)S

∑m,n

1(m2 + n2)p/2 , and (c)

∑m,n

1mp + np

converge iff p > 2. Show that for p > 2,∞∑

m,n=1

1(m+ n)p =

∞∑n=2

1np−1 −

∞∑n=2

1np.

11.S Prove that∑m,n r

mn converges iff |r| < 1, in which case the iteratedseries

∑∞m=1

∑∞n=1 r

mn converges.

12.S Prove the root test for double series with nonnegative terms: Supposethat L := limm,n am,n

1/mn exists. Then∑m,n am,n converges if L < 1

and diverges if L > 1.

13. Let am,n = (−1)mn−m−2. Prove that∑m≥0,n≥2

|am,n| = 1 and∑

m≥0,n≥2am,n = 1/2.

Chapter 7Sequences and Series of Functions

7.1 Convergence of Sequences of FunctionsUnlike numerical sequences, sequences of functions have several modes of

convergence. In this chapter we consider the two most common types: pointwiseand uniform. Other types of convergence will be examined in Chapter 11.

7.1.1 Definition. Let S be a nonempty set. A sequence of real-valued func-tions fn on S is said to converge pointwise on S to a function f : S → R iffn(x)→ f(x) for each x ∈ S. We then write f = limn f or fn → f (on S). ♦

The following theorem is an immediate consequence of 2.1.11 and 3.1.9.

7.1.2 Theorem. Let fn → f and gn → g pointwise on S and let h becontinuous such that h ◦ fn and h ◦ f are defined on S. Then, for α, β ∈ R,

αfn + βgn → αf + βg, fngn → fg,fngn→ f

g(if g 6= 0) and h ◦ fn → h ◦ f

pointwise on S.

The definition of pointwise convergence may be phrased as follows: Foreach x ∈ S and ε > 0 there exists an index N such that |fn(x)− f(x)| < ε forall n ≥ N . Here, the index N usually depends on both ε and x. Removing the

f + ε

f − ε

f

fn

S

FIGURE 7.1: Uniform convergence of fn to f .

dependence on x results in the stronger property of uniform convergence:

193

7.1.3 Definition. A sequence of functions fn : S → R is said to convergeuniformly on S to a function f : S → R if, for each ε > 0, there exists N ∈ Nsuch that |fn(x)− f(x)| < ε for all n ≥ N and all x ∈ S. (See Figure 7.1.) ♦

Clearly, uniform convergence implies pointwise convergence. The examplesbelow show that the converse is not generally true. For these examples and forthe exercises at the end of the section, the following propositions are useful.

7.1.4 Proposition. Let fn, f : S → R. Suppose that there exists a sequence{an} of positive real numbers such that an → 0 and |fn(x) − f(x)| ≤ an forall x ∈ S and all n. Then fn converges uniformly to f on S.

Proof. One need only choose N in the definition of uniform convergence sothat an < ε for all n ≥ N .

7.1.5 Proposition. Let fn, f : S → R. Then fn converges uniformly to f onS iff

limn

[fn(bn)− f(bn)

]= 0

for any sequence {bn} in S.

Proof. If fn converges uniformly to f on S, choose N so that |fn(x)−f(x)| < εfor all n ≥ N and all x ∈ S. For such n, |fn(bn)− f(bn)| < ε.

Conversely, suppose fn does not converge uniformly to f on S. Then thereexists an ε > 0, and points bn ∈ S such that |fn(bn)− f(bn)| ≥ ε for infinitelymany n. Thus the sequential condition fails.

7.1.6 Examples. (a) The sequence {xn} converges pointwise but not uni-formly to zero on (−1, 1). (Take bn = 1/21/n in 7.1.5.) The convergence isuniform on intervals [−r, r], 0 < r < 1, since on such an interval |xn| ≤ rn andrn → 0.(b) The sequence {n/xn} converges pointwise to zero on (1,+∞) but the

convergence is not uniform there, as can be seen by taking bn = 21/n in7.1.5. The convergence is uniform for x ∈ [r,+∞), r > 1, since then |n/xn| ≤n/rn → 0.(c) The sequence {xne−nx} converges uniformly to zero on [0,+∞) since

xne−nx ≤ e−n for x ≥ 0.(d) The sequence {n−1 sinnx} converges uniformly to zero on R since

|n−1 sinnx| ≤ 1/n for all x.(e) The sequence {sin(x/n)} converges pointwise to zero on R, but the

convergence is not uniform, as can be seen, for example, by takingbn = πn/2in 7.1.5. The convergence is uniform on bounded intervals [a, b] since on thisinterval | sin(x/n)| ≤ |x|/n ≤ max{|a|, ||b}. ♦

There is an analog of 7.1.2 for uniform convergence; however, it is morerestrictive and requires the notion of uniform boundedness.

Sequences and Series of Functions 195

7.1.7 Definition. A sequence of functions fn is said to be uniformly boundedon S with uniform bound M if |fn(x)| ≤M for all x ∈ S and all n. ♦

7.1.8 Proposition. Let fn → f pointwise on a set S.

(a) If {fn} is uniformly bounded on S, then f is bounded on S.

(b) If each fn is bounded on S and fn → f uniformly on S, then {fn} isuniformly bounded on S, hence f is bounded.

(c) If fn → f uniformly on S and f is bounded, then {fn}∞n=N is uniformlybounded for some N .

Proof. (a) This follows by letting n→ +∞ in the inequality |fn(x)| ≤M .(b) Choose N such that

|fn(x)− f(x)| ≤ 1 for all n ≥ N and x ∈ S.

For such n and for all x ∈ S,

|fn(x)| ≤ |fn(x)− f(x)|+ |f(x)− fN (x)|+ |fN (x)| ≤ 2 +MN ,

where MN is a bound for fN on S. Since the functions f1, . . ., fN−1 arebounded, {fn}∞n=1 is uniformly bounded.(c) Let |f(x)| ≤M for all x. Choose N such that |fn(x)− f(x)| ≤ 1 for all

n ≥ N and x ∈ S. For such n, |fn(x)| ≤ 1 +M for all x ∈ S.

The sequence {fn} on (0, 1) defined by

fn(x) ={n if 0 < x < 1/n,0 if otherwise

shows that the first assertion in (b) may be false if the convergence is merelypointwise.

7.1.9 Theorem. Let fn → f and gn → g uniformly on S and let h beuniformly continuous such that h ◦ f and h ◦ fn are defined on S. Then

(a) αfn + βgn → f + g uniformly on S, α, β ∈ R.

(b) h ◦ fn → h ◦ f uniformly S.

(c) fngn → fg uniformly on S if {fn} and {gn} are uniformly bounded on S.

(d) 1gn→ 1

gnuniformly on S if

{1gn

}is uniformly bounded on S.

Proof. The proof of (a) is left to the reader. To prove (b), choose δ > 0 suchthat |h(u) − h(v)| < ε for all u, v with |u − v| < δ and choose N such that|fn(x)−f(x)| < δ for all x ∈ S and n ≥ N . For such n, |h◦fn(x)−h◦f(x)| < ε.

For (c), let M > 0 be a common uniform bound for the sequences {|fn|}and {|gn|} and let ε > 0. Choose N such that

|fn(x)− f(x)| < ε/2M and |gn(x)− g(x)| < ε/2M.

for all x ∈ S and n ≥ N . For such n and x,

|fn(x)gn(x)− f(x)g(x)| ≤ |fn(x)gn(x)− f(x)gn(x)|+ |f(x)gn(x)− f(x)g(x)|= |gn(x)| |fn(x)− f(x)|+ |f(x)| |gn(x)− g(x)|≤M |fn(x)− f(x)|+M |gn(x)− g(x)| < ε.

For (d), let 1/|gn(x)| ≤M for all n and x. Then the same inequality holdsfor g, and ∣∣∣∣ 1

gn(x) −1

g(x)

∣∣∣∣ = |gn(x)− g(x)||gn(x)g(x)| ≤

1M2 |gn(x)− g(x)|.

The hypothesis of uniform boundedness in parts (c) and (d) of the theoremcannot be relaxed. (See Exercises 6 and 7.)

There are versions of the Cauchy criterion for pointwise and uniformconvergence of sequences of functions. For the pointwise version, consider asequence of functions fn on S such that limm,n |fn(x)− fm(x)| = 0 for eachx ∈ S. Then {fn(x)}∞n=1 is a Cauchy sequence of real numbers and henceconverges to a unique real number f(x). Thus fn → f on S. Here is theanalogous result for uniform convergence:

7.1.10 Uniform Cauchy Criterion. A sequence of functions fn convergesuniformly on a set S iff for each ε > 0 there exists an index N such that

|fn(x)− fm(x)| < ε for all x ∈ S and all m, n ≥ N . (7.1)

Proof. If fn → f uniformly on S, then, given ε > 0, there exists an index Nsuch that |fn(x)− f(x)| < ε/2 for all x ∈ S and all n ≥ N . An application ofthe triangle inequality yields (7.1).

Conversely, assume that the condition holds. Then, in particular,limm,n |fn(x) − fm(x)| = 0 for every x ∈ S, hence, by the observation pre-ceding the theorem, there exists a function f such that fn → f pointwise onS. We claim that the convergence is in fact uniform. To see this, let ε > 0and choose N as in (7.1). Letting m → +∞ in that inequality then yields|fn(x) − f(x)| ≤ ε for all x ∈ S and all n ≥ N . This shows that fn → funiformly on S.

7.1.11 Definition. Let S be an arbitrary set and let fn : S → R. If thesequence {fn(x)} is increasing (decreasing) for each x ∈ S and fn → f on S,we write fn ↑ f (fn ↓ f). In either case we say that {fn} is monotone. ♦


The following theorem gives general conditions under which pointwiseconvergence implies uniform convergence.

7.1.12 Dini’s Theorem. Let f and fn be continuous on [a, b] for each n andsuppose that either fn ↓ f or fn ↑ f on [a, b]. Then fn → f uniformly.

Proof. We may assume that fn ↓ f . Let gn = fn − f , so gn ↓ 0. Suppose theassertion of the theorem is false. Then there exists an ε > 0, a subsequence{hn} of {gn}, and a sequence {xn} in [a, b] such that hn(xn) ≥ ε for all n.(Why?) By the Bolzano–Weierstrass theorem, there exists a subsequence {xnk}converging to some x ∈ [a, b]. Since hn ↓, for any fixed n and all sufficientlylarge k, hn(xnk) ≥ hnk(xnk), hence hn(xnk) ≥ ε. Letting k → +∞ in the lastinequality yields hn(x) ≥ ε for all n, contradicting that hn(x)→ 0.

The examples xn on [0, 1) and x−n on [2,+∞) show that Dini’s theorem isfalse if the interval is not closed and bounded. The decreasing sequence definedby

fn(x) =

1 if 0 ≤ x ≤ 1,1 + n(1− x) if 1 ≤ x ≤ 1 + 1/n,0 if 1 + 1/n ≤ x ≤ 2

(7.2)

shows that continuity of the limit function in Dini’s theorem is essential.

Exercises1. Find the largest subset of R on which the given sequence converges

pointwise, and determine the intervals on which the convergence isuniform.

(a) xn(1− x)n. (b) S npxn(1− x). (c) ex/n.

(d) S nx2

enx2 . (e) 11 + x2n(1− x)2 . (f) n1/2 sin

( x

n2/3

).

(g) S

√nx2

1 + nx2 . (h) nx2

1 + nx2 . (i)(

x

2 + x

)n.

(j) S x2n

2 + x2n . (k) 11 + |x|n . (l) n sin x2

1 + nx2 .

2. Describe the convergence behavior of the following sequences on [0, 1]:

(a)S 1nx+ 1 . (b)S x

nx+ 1 . (c) nx

n2x+ 1 . (d) nx

n2x2 + 1 .

3. Describe the convergence behavior of the sequences on (0, 1):(a) {x1/n}. (b) {x1+1/n}. (c) {x−1/n}. (d) {x1−1/n}.

4. Show directly that the sequence defined in (7.2) does not convergeuniformly.

5. Let p, q > 0. Prove that the sequence of functions xp

n+ xqconverges

uniformly to zero on [0,+∞) iff p < q.

6.S Give an example of sequences {fn}, {gn} and functions f , g such thatfn → f and gn → g uniformly, and fngn → fg pointwise but notuniformly.

7. Give an example of a sequence {gn} and a function g such that gn → guniformly and 1/gn → 1/g pointwise but not uniformly.

8. Let −∞ < a < b ≤ +∞. Suppose that fn → f uniformly on [a, r]for every r ∈ (a, b). Prove that fn → f uniformly on [a, b) iff for eachsequence {bn} with bn ↑ b, fn(bn) − f(bn) → 0. Use this to show thatfn(x) := x−n does not converge uniformly on [2,+∞).

9. Let fn be bounded for each n and let fn → f uniformly on a set S. Provethat supS fn → supS f and infS fn → infS f .

10.S Let f be uniformly continuous on R and an → a. Set fn(x) = f(x+ an).Show that {fn} converges uniformly on R.

11. Let fn be continuous on [a, b] for each n and let fn converge uniformlyon (a, b) ∩Q. Prove that fn converges uniformly on [a, b].

12. Prove: If fn → f uniformly on each of the sets S1, . . . , Sm, then fn → funiformly on S1 ∪ · · · ∪ Sm. Show that the corresponding statement fora union of infinitely many sets is false.

13.S For x ∈ [0, 1] define

fn(x) ={

1 if x ∈ Q and x = k/m in reduced form with m ≤ n,0 otherwise.

Show that {fn} converges pointwise but not uniformly to the Dirichletfunction.

14. Let p ∈ N. For x ∈ [0, 1] define

gn(x) ={

(m+ 1/n)p if x ∈ Q, x = k/m in reduced form0 if x is irrational.

Show that gn converges uniformly on [0, 1] iff p = 1.

15. Let {fn} be uniformly bounded, let f, g be bounded on [0, 1], and supposethat fn → f pointwise (uniformly) on [r, 1] for each 0 < r < 1. If g iscontinuous at 0 and g(0) = 0, prove that fng → fg pointwise (uniformly)on [0, 1].

16. Let {fn} be uniformly bounded and fn → f uniformly on S.

(a) Prove that (f1 + f2 + · · ·+ fn)/n→ f uniformly on S.(b)S Suppose for some r > 0 that fn(x) ≥ r for all n and all x ∈ S. Prove

that (f1f2 · · · fn)1/n → f uniformly on S.

17.S Let f0 be a bounded function on a set S and 0 < r < 1. Define asequence {fn} recursively by

fn(x) = sin(rfn−1(x)

), x ∈ S, n ≥ 1.

Prove that {fn} converges uniformly on S. Show that a similar resultholds if S is an interval and sin x is replaced by any function g such thatsupx |g′(x)| < 1/r, where r is any positive number.

18. Let g and h be positive and continuous on [a, b] and define

fn(x) := ng(x)1 + n2h(x) .

Prove that the following convergence is uniform on [a, b]:

(a) n sin fn →g

h. (b) n

(1− cos fn

)→ 0. (c) n2(1− cos fn

)→ g2

2h2 .

7.2 Properties of the Limit FunctionThe theorems in this section give conditions under which the properties

of continuity, integrability, or differentiability of functions in a sequence arepassed along to the limit function. We shall see that pointwise convergence isgenerally insufficient for this—the stronger property of uniform convergence isneeded.

The following theorem asserts that under suitable conditions two limitprocesses may be interchanged. It is one of several such results to be found inthe text.

7.2.1 Interchange of Limits. Let fn → f uniformly on a subset E of R andlet a be an accumulation point of E such that Ln := lim{x→a, x∈E} fn(x) existsin R for each n. Then L := limn Ln exists in R and lim{x→a, x∈E} f(x) = L.In other words, the equality

limn

limx→ax∈E

fn(x) = limx→ax∈E

limnfn(x)

holds provided that each inner limit exists in R and the convergence in theinner limit on the right is uniform.

Proof. Given ε, for each n choose δn > 0 such that

|fn(x)− Ln| < ε/3 for all x ∈ E with |x− a| < δn.

Next, choose N ∈ N such that

|fn(x)− f(x)| < ε/6 for all x ∈ E and all n ≥ N .

For n, m ≥ N , choose x ∈ E such that |x− a| < min{δn, δm}. Then

|Ln − Lm| ≤ |Ln − fn(x)|+ |fn(x)− fm(x)|+ |Lm − fm(x)| < ε.

This shows that {Ln} is a Cauchy sequence and hence converges to someL ∈ R. Let n ≥ N be sufficiently large so that |Ln − L| < ε/6. If x ∈ E and|x− a| < δn, then

|f(x)− L| ≤ |f(x)− fn(x)|+ |fn(x)− Ln|+ |Ln − L| < ε/6 + ε/3 + ε/6 < ε.

Therefore, lim{x→a, x∈E} f(x) = L.

7.2.2 Corollary. If fn → f uniformly on an interval I and if each fn iscontinuous at some a ∈ I, then f is continuous at a.Proof. Take Ln = fn(a) in the theorem.

The corollary is false if the convergence is only pointwise. For example, thesequence of continuous functions xn converges pointwise on [0, 1] to a functionthat is discontinuous at x = 1.7.2.3 Theorem. If fn → f uniformly on [a, b] and fn ∈ Rba for all n, thenf ∈ Rba and

limn

∫ b

a

fn(t) dt =∫ b

a

f(t) dt. (7.3)

Proof. By 7.1.8, f is bounded. By uniform convergence, given ε > 0, thereexists an N such that

fn(x)− ε

4(b− a) < f(x) < fn(x) + ε

4(b− a)for all x ∈ [a, b] and n ≥ N . It follows that for fixed n ≥ N and any partition P ,

S(fn,P)− ε

4 ≤ S(f,P) ≤ S(f,P) ≤ S(fn,P) + ε

4 ,

henceS(f,P)− S(f,P) ≤ S(fn,P)− S(fn,P) + ε

2 .

Since fn is integrable, P may be chosen so that the right side of this inequalityis less than ε. Therefore, f is integrable.

Since |fn(t)− f(t)| < ε/4(b− a) for n ≥ N and all t,∣∣∣ ∫ b

a

fn(t) dt−∫ b

a

f(t) dt∣∣∣ ≤ ∫ b

a

|fn(t)− f(t)| dt ≤ ε

4 ,

which shows that∫ bafn →

∫ baf .

The following examples show that the hypothesis of uniform convergencein 7.2.3 cannot be relaxed.

7.2.4 Example. Define fn : [0, π] 7→ R by

fn(x) ={n sin(nx) if 0 ≤ x ≤ π/n,0 if π/n ≤ x ≤ π.

n

ππ/n

fn

x

FIGURE 7.2: Pointwise convergence insufficient.

Each fn is continuous and {fn} converges pointwise on [0, π] to the zerofunction, yet

∫ π0 fn = 2 for all n. ♦

7.2.5 Example. Let r1, r2, . . . be an enumeration of the rationals in [0, 1] andlet

fn(x) ={

1 if x ∈ {r1, . . . , rn},0 otherwise.

Then fn is integrable with zero integral and fn converges pointwise to theDirichlet function, which is not Riemann integrable. ♦

In the two preceding examples, either the sequence was not uniformlybounded or the limit function was not integrable. It will follow from results inChapter 11 that if {fn} is uniformly bounded, fn, f ∈ Rba, and fn → f merelypointwise on [a, b], then (7.3) holds.

7.2.6 Theorem. Let fn be differentiable on (a, b) for each n and let {f ′n}converge uniformly on (a, b). If {fn(x0)} converges for some x0 ∈ (a, b), then{fn} converges uniformly to a differentiable function f on (a, b) and f ′n → f ′

on (a, b).

Proof. Given ε > 0, choose N such that, for all m, n ≥ N and x ∈ (a, b),

|fn(x0)− fm(x0)| < ε

2 and |f ′n(x)− f ′m(x)| < ε

2(b− a) .

Fix m, n ≥ N . By the mean value theorem applied to fn − fm, for each pair

x, y ∈ (a, b) there exists ξm,n ∈ (a, b) such that∣∣(fn(x)− fm(x))−(fn(y)− fm(y)

)∣∣ = |f ′n(ξm,n)− f ′m(ξm,n)||x− y|

≤ ε|x− y|2(b− a) ≤

ε

2 . (7.4)

In particular, for all x ∈ (a, b),

|fn(x)− fm(x)|≤∣∣(fn(x)− fm(x)

)−(fn(x0)− fm(x0)

)∣∣+ |fn(x0)− fm(x0)|< ε/2 + ε/2 = ε.

By the uniform Cauchy criterion, {fn} converges uniformly on (a, b) to somefunction f . Also, from (7.4), for fixed y and for all x 6= y,∣∣∣∣fn(x)− fn(y)

x− y− fm(x)− fm(y)

x− y

∣∣∣∣ ≤ ε

2(b− a) .

Therefore, the sequence of functions [fn(x)−fn(y)]/(x−y) converges uniformlyin x on the set Ey := (a, y) ∪ (y, b). Since fn converges to f ,

fn(x)− fn(y)x− y

→ f(x)− f(y)x− y

uniformly in x on Ey.

By 7.2.1 with E = Ey,

limnf ′n(y). = lim

nlimx→y

fn(x)− fn(y)x− y

= limx→y

f(x)− f(y)x− y

= f ′(y).

The sequence given by fn(x) = xn/n, 0 < x < 1 shows that uniformconvergence of a sequence of functions does not guarantee that the derivativesconverge uniformly.

Exercises1. Prove: If fn → f uniformly on an interval I and each fn is continuous ata ∈ I, then, for any sequence {an} in I with an → a, limn fn(an) = f(a).

2. Show that if fn → f uniformly on a subset E of R and each fn isuniformly continuous on E, then f is uniformly continuous on E.

3. Prove that (1 + x/n)n → ex uniformly on any bounded interval of R.Conclude that

∫ 10 (1 + x/n)n → e− 1.

4.S Show that n2xe−nx → 0 for all x ≥ 0, yet∫ 1

0 n2xe−nx dx 6→ 0. Why

does this not contradict 7.2.3?


5. Evaluate limn

∫ 10 fn if fn(x) =

(a) 1cos(x/n) . (b) x

n sin(x/n) . (c) n(ex/n − 1)x

.

(d)S

√n(e−x/n − 1

)x

. (e) arctan(ax(x+ 1)n+ 1

nx+ 1

), a > 0.

6.S Prove that fn(x) :=√n/(1 +n2x2) converges to 0 pointwise on (0,+∞),

uniformly on [r,+∞) for every r > 0, but not uniformly on (0, 1). Showthat, nonetheless,

∫ 10 fn → 0.

7. Let {an} be a positive, strictly increasing sequence. Prove that

limn

∫ 1

0

anx

1 + anxdx =

∫ 1

0limn

anx

1 + anxdx.

8. Let f and f ′ be positive and continuous on [a, b]. Define

fn(x) := nf ′(x)1 + n2f(x) and gn(x) := 2n

√f ′(x)

1 + n2f(x) .

Use Exercise 7.1.18 to find

(a)S limn

∫ b

a

n sin fn. (b) limn

∫ b

a

n(1− cos fn). (c) limn

∫ b

a

n(1− cos gn).

9.S Show that if fn → f uniformly on [a, b] and fn is integrable for each nthen ∫ x

a

fn(t) dt→∫ x

a

f(t) dt

uniformly in x on [a, b].

10. Suppose that fn is improperly integrable on [a, c), fn → f uniformlyon [a, t] for all t ∈ [a, c), and |fn| ≤ g on [a, c) for all n, where g isimproperly integrable on [a, c). Prove that f is improperly integrable on[a, c) and

limn

∫ c

a

fn =∫ c

a

f.

11. Prove that if f is continuous on [0, 1], then

limn

∫ 1

0f(xn) dx = f(0).

12. For each n, let fn be continuous on [a,+∞), a > 0, and suppose thatcn := limx→+∞ fn(x) exists in R. Prove that if fn → f uniformly on

[a,+∞), then limn cn and limx→+∞ f(x) exist and are equal. Show alsothat

limn

∫ 1/a

0fn(x) dx =

∫ 1/a

0f(x) dx.

Hint. Let gn(x) = fn(1/x), 0 < x ≤ 1/a and apply 7.2.1.

13. Let fn be as in 7.2.4 and define

gn(x) =∫ x

0fn(t) dt, hn(x) = xgn(x), 0 ≤ x ≤ π.

Show that(a) {gn} converges pointwise and monotonically on [0, π] but not uni-formly.(b) {hn} converges uniformly on [0, π].(c) {h′n} does not converge uniformly on [0, π].

7.3 Convergence of Series of Functions7.3.1 Definition. Let {fn} be a sequence of real-valued functions on a set S.For each x ∈ S and n ∈ N form the nth partial sums

sn(x) =n∑k=1

fn(x) and tn(x) =n∑k=1|fn(x)|.

The infinite series of functions∑n fn =

∑∞n=1 fn is said to converge

• pointwise on S if∑n fn(x) converges for each x ∈ S;

• absolutely pointwise on S if∑n fn(x) converges absolutely for every

x ∈ S;

• uniformly on S if {sn} converges uniformly on S;

• absolutely uniformly on S if {tn} converges uniformly on S. ♦

The methods of Chapter 6 series may be applied at each x to test pointwiseconvergence of a series of functions. For uniform convergence, additional testsare required.

The following result is an immediate consequence of 7.1.9.7.3.2 Theorem. Let

∑n fn and

∑n gn converge uniformly on a set S and

let α, β ∈ R. Then∑n(αfn + βgn) converges uniformly on S and∑n

(αfn + βgn) = α∑n

fn + β∑n

gn.

The next theorem is a useful test for nonuniform convergence of a series.The proof is immediate from the identity fn = sn − sn−1.

7.3.3 Theorem. If∑n fn converges uniformly on a set S, then fn → 0

uniformly on S.

For example, the geometric series∞∑n=0

xn = 11− x, |x| < 1, (7.5)

converges pointwise but not uniformly on (−1, 1), since xn does not tend tozero uniformly on (−1, 1). We show below that the series converges uniformlyon all closed subintervals of (−1, 1).

The comparison test for uniform convergence of a series of functions takesthe following form:

7.3.4 Uniform Comparison Test. If |fn(x)| ≤ gn(x) for all n and allx ∈ S and if

∑n gn converges uniformly on S, then

∑n fn converges absolutely

uniformly on S.

Proof. Since∑nk=m |fn(x)| ≤

∑nk=m gn(x), the assertion follows from the

uniform Cauchy criterion.

7.3.5 Corollary. If∑n fn converges absolutely uniformly on a set S, then∑

n fn converges uniformly on S.

Proof. 0 ≤ fn+|fn| ≤ 2|fn|, hence, by 7.3.4,∑n(fn+|fn|) converges uniformly

on S and therefore so must∑n

fn =∑n

(fn + |fn|)−∑n

|fn|.

7.3.6 Weierstrass M-test. If there exist positive constants Mn such that∑nMn < +∞ and |fn| ≤Mn on S for all n, then

∑n fn converges absolutely

uniformly on S.

Proof. Take gn to be the constant function Mn in 7.3.4.

For example, taking Mn = rn, we see that the geometric series (7.5)converges uniformly in every interval [−r, r], 0 < r < 1.

The next results are uniform convergence analogs of Dirichlet’s and Abel’stests for numerical series.

7.3.7 Theorem. If∑n fn converges uniformly on a set S and if there exists

a constant M such that

|g1(x)|+∞∑n=1|gn+1(x)− gn(x)| ≤M for all x ∈ S,

then∑n fngn converges uniformly on S.

Proof. Let sn =∑nk=1 fk −

∑∞n=1 fn and tn =

∑nk=1 fkgk. For each n > 1,

gn =n−1∑k=1

(gk+1 − gk

)+ g1,

hence |gn| ≤ M on S. Given ε > 0, choose N so that |sn(x)| < ε for alln,m ≥ N and x ∈ S. By 6.4.4, for m > n > N and x ∈ S,

|tm(x)− tn−1(x)|

≤m∑k=n|sk(x)| |gk(x)− gk+1(x)|+ |sm(x)| |gm(x)|+ |sn−1(x)| |gn(x)| (7.6)

≤Mε+Mε+Mε = 3Mε.

Therefore, {tn} is uniformly Cauchy on S and hence converges uniformly.

7.3.8 Theorem. If, on a set S, the partial sums of∑n fn are uniformly

bounded,∑n |gn+1 − gn| converges uniformly, and gn → 0 uniformly, then∑

n fngn converges uniformly on S.

Proof. Let tn be in the proof of 7.3.7, sn :=∑nk=1 fk, and let M be a uniform

bound for {sn} on S. Given ε > 0, choose N such that

|gn(x)| < ε andm∑k=n|gk(x)− gk+1(x)| < ε, m > n > N, x ∈ S. (7.7)

Since (7.6) holds in the current setting, (7.7) implies that

|tm(x)− tn−1(x)| ≤ 3Mε, m > n > N, x ∈ S.

Therefore, {tn} converges uniformly on S.

7.3.9 Corollary. If the partial sums of∑n fn are uniformly bounded and if

gn ↓ 0 or gn ↑ 0 uniformly on S, then∑n fngn converges uniformly on S.

Proof. Assume that {gn} is decreasing. Thenn∑k=1|gk+1 − gk| =

n∑k=1

(gk − gk+1) = g1 − gn+1,

hence∑∞n=1 |gn+1 − gn| converges uniformly.

7.3.10 Example. Let gn be continuous and gn ↓ 0 or gn ↑ 0 on R. We applythe preceding corollary to the series

s(x) :=∑n

gn(x) sinnx


on closed bounded intervals I not containing any integer multiple of 2π.By Dini’s theorem, gn → 0 uniformly on I. Also, by 6.4.6, s(x) converges

pointwise on R. Moreover, if x is not a multiple of 2π, then∣∣∣ n∑k=1

sin(kx)∣∣∣ ≤ ∣∣∣ 1

sin(x/2)

∣∣∣.Since infI | sin(x/2)| > 0, the sums

∑nk=1 sin(kx) are uniformly bounded on I.

By 7.3.9, s(x) converges uniformly on I. By 7.3.8, the same result holds if,instead of monotonicity of the sequence {gn}, we require that

∑n |gn+1 − gn|

converges and gn → 0, both uniformly on I. Analogous results hold for seriesof the form

∑n gn(x) cosnx. ♦

7.3.11 Uniform Alternating Series Test. If gn ↓ 0 or gn ↑ 0 uniformlyon a set S, then

∑∞n=1(−1)n+1gn converges uniformly on S.

Proof. Take fn = (−1)n+1 in 7.3.9.

7.3.12 Example. Let f be continuous on R and monotone in some neighbor-hood N of 0 with f(0) = 0. If an ↓ 0, then the series

∞∑n=1

(−1)n+1f(anx)

converges uniformly on any closed, bounded interval I.We verify this for the case I ⊆ [0,+∞) and f increasing. Choose N so that

anx ∈ N for all n ≥ N and x ∈ I. Then f(anx) ↓ 0 on I, hence, by Dini’stheorem and 7.3.11,

∑∞n=1(−1)nf(anx) converges uniformly on I.

For example, taking an = 1/n we see that the series∞∑n=1

(−1)n+1 sin(x/n),∞∑n=1

(−1)n+1n−1xex/n, and∞∑n=1

(−1)n+1[1− e−n−2x2

]

all converge uniformly on closed bounded intervals. ♦

The following theorem is an immediate consequence of 7.2.2, 7.2.3, and7.2.6 applied to the sequence of partial sums of the series.

7.3.13 Theorem. Let fn : [a, b]→ R and s :=∑n fn.

(a) If s converges uniformly on [a, b] and each fn is continuous, then s iscontinuous.

(b) If s converges uniformly on [a, b] and fn ∈ Rba for all n, then s ∈ Rba and∫ bas =

∑n

∫ bafn.

(c) Let fn be differentiable on (a, b) and suppose that the derived series∑n f′n

converges uniformly on (a, b) and that∑n fn(x0) converges for some

x0 ∈ (a, b). Then s converges uniformly on (a, b) and s′ =∑n f′n.

7.3.14 Example. (a) By 7.3.10,∑n n−1 sin(nx) converges uniformly on

intervals [a, b] ⊆ (0, 2π), hence∫ b

a

∑n

n−1 sin(nx) dx =∑n

∫ b

a

n−1 sin(nx) dx =∑n

cos(na)− cos(nb)n2 .

On the other hand, the derived series∑n cos(nx) does not converge.

(b) Both s(x) :=∑n n−1 sin(x/n) and its derived series

∑n n−2 cos(x/n)

converge uniformly on R, hence the latter equals s′(x). ♦

A closed form for a series s :=∑n fn on a subset E of R is a “standard

function” that equals s on E. Closed forms are typically combinations of ratio-nal, power, exponential, logarithmic, trigonometric, or inverse trigonometricfunctions.

7.3.15 Example. Since 1/(1 − x) is a closed form for the geometric series(7.5) on (−1, 1), the function

1

1− 12 + sin x

= 2 + sin x1 + sin x

is a closed form for the series∞∑n=0

(1

2 + sin x

)non intervals I not containing

(4n− 1)π/2 or −(4n+ 1)π/2, n = 0, 1, 2, . . .. By the Weierstrass M -test, theseries converges absolutely uniformly on closed subintervals of I, since on sucha subinterval 0 < 1/(2 + sin x) < 1/(1 + ε) for some ε > 0. ♦

Exercises1. For the functions fn below, determine all subintervals of [0,+∞) on

which∑∞n=0 fn(x) converges pointwise or uniformly, where p ∈ N.

(a) S 11 + xn

. (b) xn

1 + xn. (c) x

n2 + x.

(d) S x

1 + n2x. (e)

(x

x− 2

)n. (f) sin(nx)

1 + n2x2 .

(g) S npe−nx. (h) n−x. (i) S sin(x/np).

(j) xn(1− x)n. (k)(

1− x1 + x

)n. (l) xne−nx.

2. Find the largest intervals of pointwise convergence and uniform conver-gence and a closed form for the series

∑∞n=0 fn(x), where fn(x) =

(a) cosn(πx/2

), x ∈ [0, 1]. (b)S lnn(1/x). (c) (−1)n

enx. (d) (x2 ln x)n.

3. Prove that if∑n fn converges absolutely uniformly on a set S, then∑

n f+n and

∑n f−n converge uniformly on set S, where, for each x ∈ S,

f+n (x) and f−n (x) are, respectively, the positive and negative parts offn(x).

4.S Suppose that the numerical series∑n an converges absolutely. Let

s(t) =∑n

an sin[(2n+ 1)t

]and c(t) =

∑n

an cos(nt).

Find series expansions for∫ π/2

x

s(t) dt and∫ x

0c(t) dt.

5. Let p > 0 and s(x) =∞∑n=1

sin(x/np). Prove:

(a) If p ≤ 1, then s(x) diverges for all x 6= 0.(b) If p > 1, then s(x) converges absolutely uniformly on boundedintervals, (hence pointwise on R) but not uniformly on R.

6.S Let p > 0 and s(x) =∞∑n=1

[1− cos(x/np)]. Prove:

(a) If p ≤ 1/2, then s(x) diverges for all x 6= 0.(b) If p > 1/2, then s(x) converges absolutely uniformly on boundedintervals, (hence pointwise on R) but not uniformly on R.

7. Let f(x) be bounded on [0, 1] and

t(x) :=∞∑n=0

xnf(x), x ∈ [0, 1].

(a) Prove that t(x) converges pointwise on [0, 1) and uniformly on [0, r]for 0 < r < 1.(b) Prove that if f(1) 6= 0, then the convergence of t(x) is not uniformon [0, 1).(c) Suppose that L := limx→1−(1 − x)−1f(x) exists. Prove that theconvergence of t(x) is uniform on [0, 1) iff L = 0.(d) Let m ∈ N. Determine whether the convergence of t(x) is uniform on[0, 1) for f(x) =

(i) (1− x)m. (ii) 1− xm. (iii) 1− sin(πx/2). (iv) cos(πx/2).

8. (Uniform limit comparison test). Let fn ≥ 0 and gn > 0 on a set S andlet fn/gn → h uniformly on S, where h : S → R satisfies

0 < infSh ≤ sup

Sh < +∞.

Prove that∑n fn converges uniformly on S iff

∑n gn converges uniformly

on S.

9.S Suppose that f ′ exists, is bounded on I := (−r, r), and f(0) = 0. Provethat the series

s(x) :=∞∑n=0

1nf

(x

n+ 1

)converges uniformly on I and that s′(0) = f ′(0).

10. Suppose that |f(x)| ≤ |x| on I = (−r, r), r > 0. If f is differentiableon I and f ′ is continuous at 0, show that the series s(x) in Exercise 9converges uniformly on I and that |s′(0)| ≤ 1.

11.S Let fn(x) be continuous and nonnegative on [a, b]. Prove that if∑n fn

converges pointwise on [a, b] to a continuous function, then the conver-gence is uniform.

12. Let {an} be a sequence such that∑∞n=1 a

−1n converges absolutely. Prove

that∑∞n=1 |x − an|−1 converges uniformly on bounded intervals not

containing any an.

13.S Suppose that fn is monotone on [a, b] for each n. Prove that if∑n fn(a)

and∑n fn(b) converge absolutely, then

∑n fn ∈ Rba and∫ b

a

∑n

fn =∑n

∫ b

a

fn.

14. Let∑n fn converge uniformly on S and let {gn} be a uniformly bounded

sequence of functions on a set S such that either {gn} is monotoneincreasing or monotone decreasing on S. Prove that

∑n fngn converges

uniformly on S.

15.S Suppose that the partial sums of∑n fn are uniformly bounded on

I = [a, b], gn is continuous for each n, and gn ↓ 0 or gn ↑ 0 on I. Provethat

∑n fngn converges uniformly on I.

16. Suppose that∑n fn converges uniformly on I = [a, b], gn is continuous

for each n, and gn ↓ g or gn ↑ g on I, where g is continuous. Prove that∑n fngn converges uniformly on I.

17. Suppose that gn is continuous on I = [a, b] for each n, {gn} is monotone,and s(x) :=

∑n(−1)ngn(x) converges for each x ∈ I. Prove that s(x) is

continuous on I.

18.S Let g be continuous and nonnegative on R. Prove that the series

s(x) :=∞∑n=1

(−1)n g(x) + n

n2

converges uniformly on bounded intervals, hence pointwise on R, butdoes not converge absolutely for any x.

19. Let gn be continuous and gn ↓ 0 on R. Show that if [a, b] does not containany odd multiple of π, then

∑n(−1)ngn(x) cosnx converges uniformly

on [a, b].

7.4 Power SeriesA power series in x about a is an infinite series of the form

s(x) =∞∑n=0

cn(x− a)n, a, cn ∈ R.

In the following four subsections we examine the properties of these importantseries. The first step is to determine the convergence set of a power series.

Radius of Convergence of a Power Series7.4.1 Radius of Convergence Theorem. Given a power series s(x) :=∑∞

n=0 cn(x− a)n, define the extended real number R ∈ [0,+∞] by

R = ρ−1, where ρ := lim supn|cn|1/n.

Then s(x)

(a) converges absolutely pointwise for |x− a| < R;(b) converges absolutely uniformly for |x− a| ≤ r < R;(c) diverges for |x− a| > R.

Proof. For the case R = 0 (ρ = +∞), the theorem asserts that s(x) divergesfor all x 6= a. This is immediate from the root test. A similar application ofthe root test proves (c): If |x− a| > R, then

lim supn|cn(x− a)n|1/n = ρ|x− a| > 1.

To prove (a) and (b), assume R > 0 (ρ < +∞) and let 0 < r < s < R.

Then ρ < 1/s so there exists an index N such that |cn|1/n < 1/s for all n ≥ N .For such n and for all x with |x− a| ≤ r, |cn(x− a)n| ≤ (r/s)n. Since r/s < 1,the series converges uniformly on [a− r, a+ r] by Weierstrass M -test. Since ris arbitrary, part (a) follows.

The number R = 1/ρ is called the radius of convergence of the series. Theset I of all x for which the series

∑∞n=0 cn(x − a)n converges is called the

interval of convergence. By 7.4.1, I is one of the intervals

{a}, (a−R, a+R), (a−R, a+R], [a−R, a+R), or [a−R, a+R].

The theorem gives no further information regarding I. The methods of Chap-ter 6 may be applied to determine convergence behavior at the endpoints a±Rif R is finite.

The following characterization of R is frequently useful.

7.4.2 Theorem. If cn > 0 for all sufficiently large n, then

R = limn

|cn||cn+1|

,

provided the limit exists in R.

Proof. Let L denote the limit and set an = |cn| > 0 for all n ≥ N . Theassertion then follows from the inequalities

1L

= lim infn

an+1

an≤ lim inf

na1/nn ≤ ρ = lim sup

na1/nn ≤ lim sup

n

an+1

an= 1L,

(Exercise 2.4.12).

Here are some typical examples using 7.4.2, where I is the convergenceinterval.

Examples.

(a)∞∑n=1

nnxn, I = {0}.

(b)∞∑n=1

xn

n! , I = (−∞,+∞).

(c)∞∑n=1

xn√n, I = [−1, 1), conditional convergence at −1.

(d)∞∑n=1

xn

n2 , I = [−1, 1], absolute convergence at ±1. ♦

The following example is somewhat more interesting.


7.4.3 Example. The Fibonacci sequence {cn} is defined by

c0 = c1 = 1, cn = cn−1 + cn−2, n ≥ 2.

The Fibonacci power series is the series∑∞n=0 cnx

n. We use 7.4.2 to show thatthe radius of convergence of the series is (

√5− 1)/2.

Set rn = cn+1/cn. Note that the first few terms of the sequence {rn} are1, 2, 3/2, 5/3, 8/5 and that

rn = cn + cn−1

cn= 1 + 1

rn−1, n ≥ 2. (7.8)

An induction argument then shows that

3/2 ≤ rn ≤ 5/3, n ≥ 2. (7.9)

Now, from (7.8),

rn − rm = 1rn−1

− 1rm−1

= rm−1 − rn−1

rm−1rn−1. (7.10)

In particular,

r2k+1 − r2k−1 = r2k−2 − r2k

r2kr2k−2and r2k − r2k−2 = r2k−3 − r2k−1

r2k−1r2k−3,

hencer2k+1 − r2k−1 = r2k−1 − r2k−3

r2kr2k−2r2k−1r2k−3.

Iterating, we obtain r2k+1 − r2k−1 = (r3 − r1)/ak for some ak > 0, hence{r2k+1} is increasing. A similar argument shows that {r2k} is decreasing.Therefore, the sequences converge, say, r2k+1 → L and r2k →M . From (7.10),

|rn − rn−1| =|rn−1 − rn−2|rn−1rn−2

= . . . = |r2 − r1|bn

,

where bn is a product of 2n−2 terms, each of which is an rk. From (7.9),bn → +∞, hence rn − rn−1 → 0. Therefore,

cn+1

cn= rn → L = M = 1/R,

where R is the radius of convergence of the series. Taking limits in (7.8) showsthat 1/R = 1 +R, which has positive solution R = (

√5− 1)/2. ♦

Since a power series converges uniformly on closed bounded subintervalsof (a − R, a + R), 7.3.13 implies that the series is continuous on the entireinterval. The following theorem extends continuity to the endpoints.

7.4.4 Abel’s Continuity Theorem. Let s(x) :=∑∞n=0 cn(x − a)n have

radius of convergence R with 0 < R < +∞. If s(x) converges at x = a + R,then s(x) converges uniformly on [b, a + R] for any b ∈ (a − R, a + R). Inparticular, s is continuous on (a−R, a+R].

Proof. The transformation x = Ry+a produces a power series in y = (x−a)/Rthat converges on (−1, 1]. Hence we may assume in the original series thata = 0 and s(x) converges on (−1, 1]. It suffices then to show that s(x) convergesuniformly on [0, 1].

Let sn(x) =∑nk=0 ckx

k, 0 ≤ x ≤ 1. For n > m > 1, define

Cm,n =n∑

k=mck = sn(1)− sm−1(1).

By 6.4.4,

sn(x)− sm−1(x) =n−1∑k=m

Cm,k(xk − xk+1) + Cm,nxn − Cm−1,nx

m.

Since∑n cn converges, given ε > 0, we may choose N such that |Cm,n| < ε/3

for all n > m ≥ N . Then for all n > m ≥ N ,

|sn(x)− sm−1(x)| ≤n−1∑k=m|Cm,k|(xk − xk+1) + |Cm,n|+ |Cm−1,n|

≤ ε

3

n−1∑k=m

(xk − xk+1) + 2ε3 .

Since∑n−1k=m(xk−xk+1) = xm−xn ≤ 1, the last expression is ≤ ε. This shows

that {sn} is uniformly Cauchy on [0, 1], hence converges uniformly.

The next result shows that a power series may be differentiated or integratedterm by term over the interior of the interval of convergence.

7.4.5 Theorem. Let s(x) :=∑∞n=0 cn(x − a)n have radius of convergence

R > 0. Then the derived series and the integrated series

D(x) :=∞∑n=1

ncn(x− a)n−1 and I(x) :=∞∑n=0

cnn+ 1(x− a)n+1

have radius of convergence R. Moreover, s(x) is differentiable on the interval(a−R, a+R), and for x ∈ (a−R, a+R)

s′(x) = D(x) and∫ x

a

s(t) dt = I(x).

Proof. Since limn n1/n = limn 1/(n+ 1)1/n = 1,

lim supn|ncn|1/n = lim sup

n|cn/(n+ 1)|1/n = lim sup

n|cn|1/n.

Therefore, the series s(x), D(x), and I(x) have the same radius of convergence.Since the differentiation and integration takes place on closed subintervalswhere the convergence of each of the three series is uniform, the remainingassertions follow from 7.3.13.

Representation of Functions by Power SeriesA power series s(x) =

∑∞n=0 cn(x− a)n is said to represent a function f on

an interval I if f = s on I. The largest interval for which the representation isvalid is called the representation interval. Note that the representation intervalmay be smaller than the convergence interval. (See the examples below.)

Power series representations are unique. Indeed, if f is represented by∑∞n=0 cn(x−a)n on Ia := (a− r, a+ r), r > 0, then, by 7.4.5, f has derivatives

of all orders on Ia, and repeated differentiation of the identity

f(x) =∞∑n=0

cn(x− a)n, x ∈ Ia

shows that f (n)(a) = cnn!. Therefore, if f has a power series representationabout a, then

f(x) =∞∑n=0

f (n)(a)n! (x− a)n, x ∈ Ia.

The last series is called the Taylor series expansion of f about a. For a = 0 itis called a Maclaurin series.

The following examples show how various power series representations maybe obtained from the geometric series representation of (1 − x)−1 given in(7.5).7.4.6 Examples. (a) Differentiating (7.5) term by term and multiplying theresult by x yields the representation

x

(1− x)2 =∞∑n=1

nxn, |x| < 1. (7.11)

(b) Replacing x in (7.5) by −t and integrating produces

ln(x+ 1) =∫ x

0

11 + t

dt =∞∑n=1

(−1)n+1

nxn, |x| < 1. (7.12)

Since the series converges at x = 1, Abel’s continuity theorem shows that

ln 2 =∞∑n=1

(−1)n+1

n,

a result obtained in 6.4.8 by another method.(c) Replacing x in (7.5) by −t2 and integrating produces

arctan x =∫ x

0

11 + t2

dt =∞∑n=0

(−1)n x2n+1

2n+ 1 , |x| < 1. (7.13)

(d) For an example with a 6= 0, consider

35− 2x = 3

3− 2(x− 1) = 11− 2(x− 1)/3 =

∞∑n=0

2n3n (x−1)n, |x−1| < 3

2 . ♦

The next example and the theorem thereafter show that differentiation canbe a powerful tool for finding a closed form for a power series.7.4.7 Example. We show that

ex =∞∑n=0

xn

n! , −∞ < x < +∞. (7.14)

Let s(x) denote the series. By 7.4.2, the radius of convergence of s is

limn

(n+ 1)!n! = lim

n(n+ 1) = +∞,

so s(x) converges for all x. Differentiating the series term by term yieldss′(x) = s(x). Now set g(x) = e−xs(x). Then g′(x) = e−x[s′(x) − s(x)] = 0,hence g is constant. Since g(0) = 1, s(x) = ex. ♦

The following result is an extension of the binomial theorem. The coefficient(an

)in (7.15) is called a generalized binomial coefficient.

7.4.8 Binomial Series. For any a ∈ R and |x| < 1,

(1 +x)a =∞∑n=0

(a

n

)xn,

(a

n

):= a(a− 1) · · · (a− n+ 1)

n! ,

(a

0

):= 1. (7.15)

Proof. Let s(a, x) denote the series in (7.15). A simple calculation shows that∣∣∣∣∣(a

n

)(a

n+ 1

)−1∣∣∣∣∣ = n+ 1|a− n|

→ 1.

Therefore, by 7.4.2, s(a, x) converges for |x| < 1. For such x,

(1 + x)s(a− 1, x) =∞∑n=0

(a− 1n

)xn +

∞∑n=0

(a− 1n

)xn+1

= 1 +∞∑n=0

[(a− 1n+ 1

)+(a− 1n

)]xn+1

= 1 +∞∑n=0

(a

n+ 1

)xn+1

= s(a, x), (7.16)

where for the third equality we used the identity (Exercise 6)(a− 1n

)+(a− 1n+ 1

)=(

a

n+ 1

), n ∈ Z+. (7.17)

Now differentiate the series s(a, x) term by term to obtain

s′(a, x) =∞∑n=1

(a

n

)nxn−1 =

∞∑n=0

(a

n+ 1

)(n+ 1)xn = a

∞∑n=0

(a− 1n

)xn

= as(a− 1, x). (7.18)

Set g(x) = (1 + x)−as(a, x), |x| < 1. By (7.18) and (7.16),

g′(x) = −a(1 + x)−a−1s(a, x) + a(1 + x)−as(a− 1, x)= a(1 + x)−a−1[− s(a, x) + (1 + x)s(a− 1, x)

]= 0.

Therefore, g(x) = g(0) = 1, hence s(a, x) = (1 + x)a, as claimed.

7.4.9 Example. Replacing x in (7.15) by −x, we have

1√1− x

=∞∑n=0

(−1/2n

)(−1)nxn, |x| < 1.

Since (−1/2n

)= 1n!

(−1

2

)(−3

2

)· · ·(−2n− 1

2

)= (−1)n1 · 3 · 5 · · · (2n− 1)

n! 2n

= (−1)n1 · 2 · 3 · 4 · · · (2n− 1) · 2nn! 2n2 · 4 · · · 2n

= (−1)n(2n)!(n!)24n ,

we see that1√

1− x=∞∑n=0

(2n)!(n!)24n x

n, |x| < 1. (7.19)

Replacing x by t2 and integrating term by term from 0 to x yields the Maclaurinseries for arcsin x:

arcsin x =∞∑n=0

(2n)!(n!)2(2n+ 1)4nx

2n+1, |x| < 1. (7.20)

7.4.10 Remark. If a > 0 and is not an integer, then the binomial seriesconverges absolutely uniformly on [−1, 1]. Indeed, if an = |

(an

)|, then

anan+1

= |a(a− 1) · · · (a− n+ 1)|n!

(n+ 1)!|a(a− 1) · · · (a− n)| = n+ 1

|a− n|,

hence, for sufficiently large n,

n

(anan+1

− 1)

= n

(n+ 1n− a

− 1)

= n(1 + a)n− a

→ 1 + a > 1.

By Raabe’s test (6.3.2) the series converges absolutely at x = ±1, hence, byAbel’s continuity theorem (7.4.4), the series converges absolutely uniformly onthe interval [−1, 1]. ♦

Multiplication of Power Series7.4.11 Definition. The Cauchy product of the power series

∑∞n=0 anx

n and∑∞n=0 bnx

n is the power series∑∞n=0 cnx

n, where

cn =n∑k=0

akbn−k. ♦

Note that∑n cnx

n is precisely the series one obtains by formally carryingout the multiplication

(a0 + a1x+ a2x2 + · · · )(b0 + b1x+ b2x

2 + · · · )and collecting like powers. We show below that if the power series

∑∞n=0 anx

n

and∑∞n=0 bnx

n converge for |x| < R, then so does the Cauchy product. Forthis we need the following result due to Mertens.7.4.12 Lemma. If the numerical series A :=

∑∞n=0 αn and B :=

∑∞n=0 βn

both converge, and if at least one of the series converges absolutely, then theCauchy product

C :=∞∑n=0

γn, γn =n∑k=0

αkβn−k,

converges and C = AB.Proof. Assume that

∑∞n=0 αn converges absolutely. Let

An =n∑k=0

αk, Bn =n∑k=0

βk, Cn =n∑k=0

γk, and A′ =∞∑n=0|αn|.

ThenCn = α0β0 + (α0β1 + α1β0) + · · ·+ (α0βn + α1βn−1 + · · ·+ αnβ0)

= α0Bn + α1Bn−1 + · · ·+ αnB0

= α0(Bn −B +B) + α1(Bn−1 −B +B) + · · ·+ αn(B0 −B +B)= α0(Bn −B) + α1(Bn−1 −B) + · · ·+ αn(B0 −B) +AnB.

Thus to show that Cn → AB it suffices to verify that

α0(Bn −B) + α1(Bn−1 −B) + · · ·+ αn(B0 −B)→ 0.

Given ε > 0, choose N such that

|Bn −B| < ε/(2A′) for all n > N.

Since αn → 0, we may choose N ′ > N so that for all n > N ′

|αn(B0 −B) + αn−1(B1 −B) + · · ·+ αn−N (BN −B)| < ε/2.

For such n,

|α0(Bn −B) + α1(Bn−1 −B) + · · ·+ αn(B0 −B)|≤ |αn(B0 −B) + αn−1(B1 −B) + · · ·+ αn−N (BN −B)|+ |αn−N−1| |BN+1 −B|+ |αn−N−2| |BN+2 −B|+ · · ·+ |α0| |Bn −B|< ε/2 + ε/2 = ε.

7.4.13 Cauchy Product Theorem. For each x, let

C(x) =∞∑n=0

cnxn cn :=

n∑k=0

akbn−k

be the Cauchy product of series A(x) =∑∞n=0 anx

n and B(x) =∑∞n=0 bnx

n.If A(x) and B(x) have radii of convergence Ra and Rb, respectively, then C(x)has radius of convergence Rc ≥ min{Ra, Rb} and

C(x) = A(x)B(x), |x| < min{Ra, Rb}. (7.21)

Moreover, if, say Rb < Ra and B(Rb) converges, then C(Rb) converges andC(Rb) = A(Rb)B(Rb).

Proof. Assume that Rb ≤ Ra and let |x| < Rb. By 7.4.12 applied to αn = anxn

and βn = bnxn, the series C(x) converges, hence Rc ≥ |x| and 7.21 holds. Since

|x| was arbitrary, Rc ≥ Rb = min{Ra, Rb}. The last assertion of the theoremfollows from 7.4.4 by letting x ↑ Rb in 7.21.

7.4.14 Example. By (7.5) and (7.14), for |x| < 1

ex

1 + x=∞∑n=0

cnxn, where cn =

n∑k=0

(−1)n−kk! = (−1)n

n∑k=0

(−1)kk! . ♦

Remark. If Ra = Rb and both A(Ra) and B(Rb) in 7.4.13 converge, itdoes not necessarily follow that C(Ra) converges. Consider, for example,A(x) = B(x) =

∑∞n=1(−1)nxn/

√n, which has radius of convergence 1 and

converges conditionally at x = 1. The Cauchy product at x = 1 is∑∞n=1 cn,

where

cn = (−1)nn−1∑k=1

1√k(n− k)

.

However, for odd n,

|cn| =n−1∑k=1

1√k(n− k)

≥(n−1)/2∑k=1

1√k(n− k)

≥(n−1)/2∑k=1

1√(n− 1)2/2

=√

22 ,

hence∑n cn diverges. ♦

Analytic Functions7.4.15 Definition. A function f is said to be (real ) analytic at a point a if,for some r > 0, f has derivatives of all orders on (a−r, a+r) and is representedthere by its Taylor series at a, that is,

f(x) =∞∑n=0

f (n)(a)n! (x− a)n, |x− a| < r.

If f is analytic at each point of a set E, then f is said to be analytic on E.♦

A function that has derivatives of all orders on an interval may not beanalytic there. This is the case for the function in Exercise 29 below. Thefollowing theorem gives a necessary and sufficient condition for analyticity ata point.

7.4.16 Taylor Series Representation. Let f have derivatives of all orderson an open interval I containing a. Then f is analytic at a iff there existpositive constants M and r such that

|f (k)(x)| ≤ k!Mk for all k ∈ N and x ∈ (a− r, a+ r). (7.22)

Proof. Assume condition (7.22) holds. To prove that f is analytic at a weuse Taylor’s theorem (Section 4.6), which asserts that for each n ∈ N andx ∈ (a− r, a+ r) there exists a number c = c(n, x) between x and a such thatf(x) = Tn(x) +Rn(x), where

Tn(x) :=n−1∑k=0

f (k)(a)k! (x− a)k, and Rn(x) := f (n)(c)

n! (x− a)n.

Now let r ∈ (0, 1/M) and |x− a| < r. By hypothesis,

|Rn(x)| ≤Mn|x− a|n ≤ (Mr)n.

Since Mr < 1, Rn(x)→ 0, hence Tn(x)→ f(x). Therefore, f is analytic at a.

Conversely, let f be analytic at a. Then there exist constants r1 ∈ (0, 1)and cn such that

f(x) =∞∑n=0

cn(x− a)n, |x− a| ≤ r1. (7.23)

In particular, |cnrn1 | → 0. Choose M1 > 1 so that |cnrn1 | < M1 for all n.Termwise differentiation of (7.23) yields

f (k)(x) =∞∑n=k

n(n− 1) · · · (n− k + 1)cn(x− a)n−k,

hence for |x− a| ≤ r1/2,

|f (k)(x)| ≤∞∑n=k

n(n− 1) · · · (n− k + 1)|cn|(r1/2)n−k

≤M1r−k1

∞∑n=k

n(n− 1) · · · (n− k + 1)(1/2)n−k.

The last series is the kth derivative of the geometric series for (1 − x)−1

evaluated at 1/2 and therefore equals

dk

dxk

∣∣∣x=1/2

(1− x)−1 = k!(1− 1/2)−k−1 = k!2k+1.

Thus|f (k)(x)| ≤M1r

−k1 k!2k+1, |x− a| ≤ r1/2.

To obtain (7.22), take r = r1/2 and choose M > 4M1/r1, so that Mk >M1r

−k1 2k+1 for all k.

7.4.17 Example. Let f(x) = sin x. Then f (2k)(0) = 0 and f (2k+1)(0) = (−1)k.Since the derivatives of f are bounded, (7.22) holds for all x. Therefore,

sin x =∞∑n=0

(−1)n(2n+ 1)!x

2n+1 = x− x3

3! + x5

5! − . . . , ∞ < x < +∞.

Similarly,

cosx =∞∑n=0

(−1)n(2n)! x

2n = 1− x2

2! + x4

4! − . . . , ∞ < x < +∞. ♦

It is clear from 7.3.2 and 7.4.13 that the sum and product of functionsanalytic at a are analytic at a. In Exercise 33 the reader is asked to show thatthe reciprocal of a nonzero analytic function is analytic. It follows that theratio of two analytic functions, if defined, is analytic.

The next result extends the property of analyticity to nearby points.

7.4.18 Theorem. If the series f(x) =∑∞n=0 an(x − a)n converges on I :=

(a− r, a+ r), then f is analytic on I.

Proof. By considering g(x) = f(x+a), we may suppose that a = 0. Let |b| < r,0 < s < r − |b|, and |x− b| < s. We show that f has a power series expansionabout b on the interval |x− b| < s. Since b is arbitrary, it will follow that f isanalytic on I.

By the binomial theorem applied to[(x− b) + b

]n,f(x) =

∞∑n=0

n∑k=0

an

(n

k

)(x− b)kbn−k =

∞∑n=0

∞∑k=0

andk,n(x− b)kbn−k, (7.24)

where dk,n =(nk

)for k = 0, 1, · · · , n and dk,n = 0 for k > n. Now,

∞∑n=0

∞∑k=0|an(x− b)kbn−kdk,n| =

∞∑n=0|an|

n∑k=0

(n

k

)|x− b|k|bn−k|

=∞∑n=0|an|(|x− b|+ |b|)n.

If |x − b| < s then |x − b| + |b| < s + |b| < r and the last series converges.Therefore, (7.24) converges uniformly for |x − b| < s. By 6.5.4, the order ofsummation may be interchanged, so

f(x) =∞∑k=0

∞∑n=0

andk,n(x− b)kbn−k

=∞∑k=0

∞∑n=k

an

(n

k

)(x− b)kbn−k

=∞∑k=0

bk(x− b)k, where bk :=∞∑n=k

an

(n

k

)bn−k.

This shows that f has a power series expansion about b on (b− s, b+ s).

7.4.19 Theorem. Let f be analytic on an open interval I and let f = 0 on asubinterval (a, b) of I. Then f = 0 on I.

Proof. Let c ∈ I, c > b, and define

A = {t ∈ (a, c) | f (n) = 0 on (a, t] for all n ≥ 0}.

Then A 6= ∅ and t0 := supA ≤ c. Suppose, for a contradiction, that t0 < c.Since f is analytic at t0, f has a Taylor series representation about t0 onJ := (t0 − r, t0 + r) for some r > 0. By continuity and the approximationproperty of suprema, f (n)(t0) = 0 for each n. It follows that f is identicallyzero on J , contradicting the definition of t0. Therefore, t0 = c, hence f = 0on (a, c). Since c was arbitrary, f(x) = 0 for all x ∈ I with x ≥ a. Similarly,f(x) = 0 for all x ∈ I with x ≤ b.


The proof of the following corollary is left to the reader.7.4.20 Corollary. Let f and g be analytic on the open intervals I and J ,respectively. If I ∩ J 6= ∅ and f = g on an open subinterval of I ∩ J , then thereexists an analytic function h on I ∪ J such that h|I = f and h|J = g.

The preceding corollary is known as analytic continuation, as it may beused to extend an analytic function to a larger interval.

Exercises1. Find the interval of convergence of

∑∞n=1 fn(x), where fn(x) =

(a) S (−1)nn2n (x− 1)n. (b) 23nn3xn√

n!. (c) n

2n!(2n)! (x− 2)n.

(d) S (−1)nnxn(n+ 1) ln(n+ 2) . (e) n!xn

nn+2−1/n . (f) (1 + 2/n)n3n (x+ 1)n.

(g) S [3 + (−1)n]n sin[

1n

]xn. (h) 2n + 5n

3n + 4nx2n. (i) S (1.5)(2.5) · · · (n+ .5)

n! xn.

2. Use (7.5) to represent the following functions as power series about thegiven point a. In each case, find the representation interval.

(a) x

(x+ 1)2 , a = 0. (b)S x3

2− 3x , a = 0. (c) x

3 + 2x , a = 1.

3. Use (7.12) to find power series representations for (a)S x ln x, (b) x2 ln xabout the point a = 1.

4. Without using 7.4.16, find the Maclaurin series and representation intervalfor the following functions.

(a) S ln(

1 + 2x1− 3x

). (b) (1 + x2) arctan x. (c) x3e−3x2

.

(d) ex − 1x

. (e) S sin√x√

x. (f) cosx− 1

x2 .

(g) S sin x cosx. (h) 1√9− x2

. (i) sin(x+ π/3).

5.S Use an identity and 7.4.9 to find the Maclaurin series for arccosx.

6. Verify the identity (7.17).

7. Without using 7.4.16, show that

(a) sin x =∞∑n=0

an(x− a)n, a2n = (−1)n sin a(2n)! , a2n+1 = (−1)n cos a

(2n+ 1)! .

(b) cosx =∞∑n=0

bn(x− a)n, b2n = (−1)n cos a(2n)! , b2n+1 = (−1)n+1 sin a

(2n+ 1)! .

8. Prove that∞∑n=0

4(−1)n2n+ 1 =

∞∑n=0

2(2n)!(n!)2(2n+ 1)4n = π.

9. Find a power series representation for∫ x

0 f(t) dt if f(t) =

(a) S sin t− tt3

. (b) cos t√t. (c) cos t− 1

t. (d) et

2 − 1t2

.

10. Find a closed form for the series∑∞n=0 fn(x), |x| < 1, where fn(x) =

(a) n2xn. (b)S (−1)n(2n+ 1)x2n+1. (c) xn+1

(n+ 1)(n+ 2) . (d) n2xn

n+ 1 .

11.S Sum the series∑∞n=0(n2 + n+ 1)3−n.

12. Use 7.4.13 to find a series representation and representation interval for

(a)S ln(1− x)1 + x

. (b) e−x√1− x

. (c) sin x1− x2 . (d)S ex

2 sin x. (e) arctan xx(1 + x2) .

13. By calculating the Maclaurin series of the function sin2 x in two ways,establish the identity

22n+1

(2n+ 2)! =n∑k=0

1(2k + 1)!(2n− 2k + 1)! .

14. By calculating the Maclaurin series of the function cos2 x in two ways,establish the identity

22n−1

(2n)! =n∑k=0

1(2k)!(2n− 2k)! .

15. By calculating the Maclaurin series of the function (1− x)−3/2 in twoways, establish the identity

(2n+ 1)!(n!)24n =

n∑k=0

(2k)!(k!)24k .

16.S Show that the Fibonacci power series s(x) (7.4.3) has the closed form

(1− x− x2)−1, |x| <(√

5− 1)/2.

Conclude from Abel’s continuity theorem (7.4.4) that s(x) cannot con-verge at the endpoint (

√5− 1)/2.

17. Let an → L ∈ R and set s(x) :=∑∞n=0 anx

n, |x| < 1. For m ∈ N, define

ϕm(x) :=2m−1∑k=0

(−1)kxk.

Prove that limx→1− ϕm(x)s(x) = mL. Hint. Use Abel’s continuity theo-rem.

18.S Use the method of 7.4.9 to establish the representation

ln(√

1 + x2 + x)

=∞∑n=0

(−1)n(2n)!(2n+ 1)(n!)24n x

2n+1, |x| < 1.

19. Let R be the radius of convergence of∑n cn(x − a)n and let p ≥ 0.

Prove:(a) If lim infn |cn|np > 0, then R ≤ 1.(b) If lim supn |cn|/np < +∞, then R ≥ 1.

20. Let Ra and Rb denote the radii of convergence of A(x) :=∑n anx

n andB(x) :=

∑n bnxn, respectively. Suppose that

lim sup(|an|/|bn|) < +∞.

Prove that Ra ≥ Rb.

21.S Let Rs and Rt denote the radii of convergence of

s(x) :=∑n

cn(x− a)n and t(x) :=∑n

cn2(x− a)n,

respectively. Prove:

(a) If Rs > 1, then Rt = +∞.(b) If Rs ≤ 1, then no conclusion is possible.

22. Let Rs and Rt denote the radii of convergence of

s(x) :=∑n

cn(x− a)n and t(x) :=∑n

cn(x− a)n2,

respectively. Prove:(a)S If 0 < Rs < +∞, then Rt = 1.(b) If Rs = 0, then Rt ≤ 1, and any value of Rt ≤ 1 is possible.(c) If Rs = +∞, then Rt ≥ 1, and value of Rt ≥ 1 is possible.

23. Suppose that the series

A :=∞∑n=0

an, B :=∞∑n=0

bn, and C :=∞∑n=0

cn

converge, where cn =∑nk=0 anbn−k. Use 7.4.12 and 7.4.4 to prove that

AB = C.

24. Prove that for any a, b ∈ R and n ∈ N,(a

0

)(b

n

)+(a

1

)(b

n− 1

)+ · · ·+

(a

n

)(b

0

)=(a+ b

n

).

25. Let n ∈ Z+. The Bessel function of order n may be defined as the powerseries

Jn(x) =∞∑k=0

(−1)k(n+ k)!k!

(x2

)n+2k.

Prove:(a) The radius of convergence of Jn(x) is +∞.(b) Jn satisfies Bessel’s differential equation x2y′′ + xy′ + (x2 − n2)y = 0.

(c) d

dxxnJn(x) = xnJn−1(x), n ≥ 1.

26. Prove that limx→1−

∞∑n=1

xn

np

{= +∞ if p ≤ 1,< +∞ if p > 1.

27.S Let {cn} tend monotonically to 0. Prove that∑n cnx

n is continuous on[−1, 1).

28. Let f(x) be bounded on [0, 1].(a)S Prove that t(x) :=

∑n nx

nf(x) converges pointwise on [0, 1) anduniformly on [0, r] for 0 < r < 1.(b) Suppose that L := limx→1−(1 − x)−2f(x) exists. Prove that theconvergence of t(x) in (a) is uniform on [0, 1) iff L = 0. (Compare withExercise 7.3.7.)

29. Show that the function

f(x) ={e−1/x2 if x 6= 0,0 otherwise

is not analytic at 0. (See Exercise 4.6.1.)

30.S Prove 7.4.20.

31. Prove: If f(x) is analytic at a, then f ′(x) and g(x) :=∫ xaf(t) dt are

analytic at a.

32. Let f be analytic at a and let {an} be a sequence of distinct real numberssuch that an → a and f(an) = 0 for all n. Prove that f is identically zeroin a neighborhood of a. Hint. Assume that an ↑ a (how?). Construct, byinduction, sequences {a(k)

n }n such that limn a(k)n = a and f (k)(a(k)

n ) = 0for all n and k.

33.S Let f be analytic at a and f(a) 6= 0. Carry out the following steps toshow that 1/f is analytic at a.

(a) Assume that f(a) = 1 and that

f(x) =∞∑n=0

an(x− a)n 6= 0, |x− a| < r

for some r. Define a series g formally by

g(x) =∞∑n=0

bn(x− a)n,

where the sequence {bn} is given recursively by

b0 = 1 and bn = −n∑k=1

akbn−k, n ≥ 1.

Show that if g(x) converges for |x − a| < r1 for some 0 < r1 < r,then f(x)g(x) = 1 for |x− a| < r1.

(b) Show that if |an| ≤Mn for all n, then |bn| ≤ (2M)n for all n.(c) Conclude that g is analytic at a and that g = 1/f .

Part II

Functions of SeveralVariables

Chapter 8Metric Spaces

The essential feature in the notion of limit of a function is the idea of nearness.This is made precise by a distance function, which, in the case of limits onR,is derived from the absolute value function. It turns out that there are manyother important mathematical structures equipped with a distance functionand therefore admitting a definition of limit. In this chapter, we examine thegeneral properties of these structures.

8.1 Definitions and Examples8.1.1 Definition. A metric on a nonempty set X is a function d : X×X → Rsuch that, for all x, y, z ∈ X,

(a) d(x, y) ≥ 0 (nonnegativity),

(b) d(x, y) = 0 iff x = y (coincidence),

(c) d(x, y) = d(y, x) (symmetry), and

(d) d(x, y) ≤ d(x, z) + d(y, z) (triangle inequality).

The ordered pair (X, d) is called a metric space. A nonempty subset E of Xwith the metric d

∣∣E×E is called a subspace of X and is denoted by (E, d). ♦

The real number system is a metric space under the usual metric d(x, y) =|x− y|. The following example shows that any nonempty set may be given ametric.

8.1.2 Example. (Discrete metric space). On a nonempty set X defined(x, x) = 0 for all x ∈ X, and d(x, y) = 1 if x 6= y. Then d is easily seen to bea metric, called the discrete metric on X. For example, the triangle inequalityd(x, y) ≤ d(x, z) + d(y, z) holds because the left side of the inequality is atmost 1, in which case either x 6= z or y 6= z implying that the right side mustbe at least 1. ♦

8.1.3 Definition. A subset E of a metric space X is said to be bounded iffor some x0 ∈ X and M > 0, d(x, x0) ≤M for all x ∈ E. ♦

231

The point x0 in the preceding definition may be replaced by any otherpoint y0 ∈ X since for x ∈ E,

d(x, y0) ≤ d(x, x0) + d(x0, y0) ≤M + d(x0, y0).

The notions of convergence and completeness readily carry over to generalmetric spaces:

8.1.4 Definition. A sequence {xn} in a metric space (X, d) is said to convergeto a member x of X if limn d(xn, x) = 0. In this case we write xn → x orlimn xn = x. A cluster point of a sequence in X is the limit of a convergentsubsequence. ♦

The limit of a sequence {xn} in X, if it exists, must be unique. Indeed, ifxn → x and xn → y, then, by the triangle inequality,

0 ≤ d(x, y) ≤ d(x, xn) + d(y, xn)→ 0,

hence d(x, y) = 0 and so x = y.

8.1.5 Definition. A sequence {xn} in a metric space (X, d) is said to beCauchy if limm,n d(xm, xn) = 0. A metric space (X, d) is said to be complete ifevery Cauchy sequence in X converges to a member of X. A subset E of X iscomplete if it is complete as a subspace of X, that is, every Cauchy sequencein E converges to a member of E. ♦

The real number system is complete under the usual metric. The subspaceQ of R is not complete: a sequence of rational numbers converging to anirrational number is Cauchy. A discrete metric space is complete, since everyCauchy sequence is eventually constant and therefore trivially converges.

8.1.6 Proposition. (a) Every Cauchy sequence is bounded.

(b) Every convergent sequence is Cauchy, hence bounded.

Proof. (a) If {xn} is Cauchy, choose an index N such that d(xm, xn) < 1 forall m, n ≥ N . Then, for all n ∈ N,

d(xN , xn) < 1 + max{d(xN , x1), d(xN , x2), . . . , d(xN , xN−1)}.

(b) If xn → x, then the inequality d(xm, xn) < d(xm, x) + d(xn, x) implies that{xn} is Cauchy.

The notions of pointwise convergence and uniform convergence of a sequenceof real-valued functions easily extend to general metric spaces:

8.1.7 Definition. Let S be a nonempty set and let (X, d) be a metric space.A sequence of functions fn : S → X is said to converge pointwise to a functionf : S → X if fn(s)→ f(s) for each s ∈ S. In this case we write f = limn f orfn → f (on S). The sequence converges uniformly to f on S if for each ε > 0there exists N ∈ N such that d

(fn(s), f(s)

)< ε for all n ≥ N and s ∈ S. ♦

Metric Spaces 233

8.1.8 Definition. Let X be a vector space. A norm on X is a function ‖ · ‖from X to R such that for all x, y ∈ X and t ∈ R

(a) ‖x‖ ≥ 0 (nonnegativity),

(b) ‖x‖ = 0 iff x = 0 (coincidence),

(c) ‖tx‖ = |t| ‖x‖ (absolute homogeneity),

(d) ‖x+ y‖ ≤ ‖x‖+ ‖y‖ (triangle inequality).

The pair (X , ‖ · ‖) is then called a normed vector space. ♦

The proof of the following proposition is left to the reader.

8.1.9 Proposition. If (X , ‖ · ‖) is a normed vector space, then the functiond(x,y) := ‖x− y‖ is a metric on X .

From 1.6.4 and Exercise 1.6.4 we see that ‖ · ‖2, ‖ · ‖1, and ‖ · ‖∞ arenorms on Rn, hence, according to 8.1.9, give rise to metrics. We denote these,respectively, by d2,

d1, and d∞.In Exercise 17 the reader is asked to show that Rn is complete in each of

these metrics. The metric d2 is called the Euclidean metric on Rn. The metricd1 is the `1 metric on Rn and d∞ the max metric on Rn. Clearly, for n = 1,all three metrics reduce to absolute value on R.

8.1.10 Example. Let S be a nonempty set and let B(S) denote the set of allbounded real-valued functions on S. Then B(S) is a vector space under theoperations of addition f + g and scalar multiplication cf defined by

(f + g)(s) = f(s) + g(s) and (cf)(s) = cf(s), s ∈ S.

The supremum norm of f ∈ B(S) is defined by

‖f‖∞ = sup {|f(s)| : s ∈ S} .

It is easy to check that ‖ · ‖∞ is indeed a norm. For example, the triangleinequality follows by taking the supremum over s ∈ S in the inequality

|f(s) + g(s)| ≤ |f(s)|+ |g(s)| ≤ ‖f‖∞ + ‖g‖∞.

Note that convergence of a sequence of functions in B(S) is simply uniformconvergence on S. For this reason, ‖ · ‖∞ is also called the uniform norm.

The space B(S) is complete in the metric d∞(f, g) := ‖f − g‖∞ induced bythe norm. To see this, let {fn} be a Cauchy sequence in B(S) and let ε > 0.Choose N such that d∞(fn, fm) < ε for all m, n ≥ N . For such m, n

|fn(s)− fm(s)| < ε for all s ∈ S, (8.1)

hence {fn(s)} is a Cauchy sequence in R for every s ∈ S. Since R is complete,fn(s)→ f(s) for some f(s) ∈ R. Fixing n in (8.1) and letting m→ +∞ yields

|fn(s)− f(s)| ≤ ε for all s ∈ S and all n ≥ N.

This shows that f is bounded and that fn → f in B(S). ♦

In the case S = N, B(S) may be identified with the set of all boundedsequences and as such is denoted by `∞.

8.1.11 Example. Let `1 denote the set of all sequences a = {an} in R suchthat

∑n |an| < +∞. Clearly, `1 is a vector subspace of `∞. It is easy to check

that ‖a‖1 :=∑n |an| defines a norm on `1. We show that

(`1, ‖·‖1

)is complete

in this norm.Let {an := (a1,n, a2,n, . . .)}∞n=1 be a Cauchy sequence in `1, and let ε > 0.

Choose N so that

‖an − am‖1 =∞∑k=1|ak,n − ak,m| < ε for all n, m ≥ N. (8.2)

Since |ak,n − ak,m| ≤ ‖an − am‖1, the sequence {ak,n}n is Cauchy for each k,hence converges. Let ak = limn ak,n. Fix K ∈ N and n ≥ N . From (8.2),

K∑k=1|ak,n − ak,m| < ε for all m ≥ N .

Letting m→ +∞, we obtain∑Kk=1 |ak,n − ak| ≤ ε. Since K was arbitrary,

‖an − a‖1 =∞∑k=1|ak,n − ak| ≤ ε for all n ≥ N.

It follows that a ∈ `1 and an → a. ♦

8.1.12 Definition. Let (X, d) and (Y, ρ) be metric spaces. The product metricd× ρ on X × Y is defined by

(d× ρ)((x, y), (a, b)

):= d(x, a) + ρ(y, b), x, a ∈ X, y, b ∈ Y.

The pair (X × Y, d× ρ) is called the product of the metric spaces X and Y . ♦

In Exercise 13 the reader is asked to prove, among other things, that d× ρis indeed a metric and that a sequence {(xn, yn)} converges to (a, b) in X × Yin this metric iff xn → a in X and yn → b in Y .

Metric Spaces 235

Exercises1.S Determine whether d is a metric on R2, where d((x1, x2), (y1, y2)) =

(a) 2|x1 − y1|+ 3|x2 − y2|. (b) |x21 − y2

1 |+ |x22 − y2

2 |.(c) |x3

1 − y31 |+ |x3

2 − y32 |. (d) |x1 − x2|+ |y1 − y2|.

(e) |x1 − y1|+ |x2 − y2|2 + |x1 − y1|+ |x2 − y2|

. (f) |ex1 − ey1 |+ |ex2 − ey2 |.

2. (p-adic metric). Let p be a fixed prime number. Define ρp(n, n) = 0, andfor m 6= n ∈ Z define ρp(m,n) = 1/pα, where α is the power of p in theunique prime factorization of |m − n|. (For example, ρ2(42, 2) = 1/8,ρ5(42, 2) = 1/5, and ρ3(42, 2) = 1.) Show that ρp is a metric on Z.

3.S (Hamming distance). Let A be a nonempty set and X := An. Forx = (x1, . . . , xn), y = (y1, . . . , yn) ∈ X define d(x, y) to be the numberof indices j for which xj 6= yj . Show that d is a metric on X. (Themetric is named after Richard Hamming, who pioneered the field of errorcorrecting codes.)

4. Let X be as in Exercise 3. Define ρ(x, x) = 0, and for distinct pointsx = (x1, . . . , xn) and y = (y1, . . . , yn) in X define ρ(x, y) = 2−j , where jis the smallest index for which xj 6= yj . Show that ρ is a metric on X.

5.S Prove that a metric d satisfies

|d(x, y)− d(a, b)| ≤ d(x, a) + d(y, b).

Conclude that if xn → a and yn → b, then d(xn, yn)→ d(a, b).

6. Let X and Y be nonempty sets and let f : X → Y be one-to-one. Showthat if ρ is a metric on Y , then d(x, y) := ρ(f(x), f(y)) defines a metricon X.

7. Prove that a finite union of bounded sets in a metric space is bounded.

8. Prove 8.1.9.

9. ⇓1 Prove that a Cauchy sequence with a cluster point converges.

10.S Let E1, . . . , Em be complete subspaces of (X, d). Prove that the finiteunion E1 ∪ · · · ∪Em is complete. Does the analogous assertion hold for acountable union of complete subspaces?

11. Let X := [1,+∞) have the metric

d(x, y) = |x−1 − y−1|

(see Exercise 6). Show that xn → x with respect to the usual metric onX iff xn → x with respect to d. Is (X, d) complete?



12. Same as Exercise 11 but with the metric

ρ(x, y) =∣∣x(1 + x2)−1 − y(1 + y2)−1∣∣ .

13.S Let (X, d) and (Y, ρ) be metric spaces and let (Z, η) = (X × Y, d × ρ)be the product space. Prove:

(a) η is a metric on Z.(b) A sequence {(xn, yn)} is Cauchy in Z iff {xn} is Cauchy in X and{yn} is Cauchy in Y .

(c) A sequence {(xn, yn)} converges to (x, y) in Z iff xn → x in X andyn → y in Y .

(d) Z is complete iff X and Y are complete.

14. Metrics d, ρ on a set X are said to be metrically equivalent if there existpositive constants a and b such that

d(x, y) ≤ a ρ(x, y) and ρ(x, y) ≤ b d(x, y) for all x, y ∈ X.

For example, by Exercise 1.6.6, the metrics d1, d2, and d∞ are metricallyequivalent. Suppose that d and ρ are metrically equivalent. Let {xn} bea sequence in X and let x ∈ X. Prove the following:

(a) xn → x in (X, d) iff xn → x in (X, ρ).(b) {xn} is Cauchy in (X, d) iff {xn} is Cauchy in (X, ρ).(c) (X, d) is complete iff (X, ρ) is complete.

15.S Let d be a metric on a set X and a > 0. Define

ρ(x, y) := min{d(x, y), a}.

Prove:(a) ρ is a metric on X.(b) A sequence is Cauchy in (X, ρ) iff it is Cauchy in (X, d).(c) A sequence converges in (X, ρ) iff it converges in (X, d).(d) (X, ρ) is complete iff (X, d) is complete.

Are d and ρ metrically equivalent? Does σ(x, y) := max{d(x, y), a}define a metric on X?

16. Let ρ1 and ρ2 be metrics on X. Prove that max{ρ1, ρ2} is a metric. Ismin{ρ1, ρ2} a metric?

17. Let x := (x1, . . . , xn), xk := (x1,k, . . . , xn,k) ∈ Rn, k = 1, 2, . . .. Prove:(a) xk → x in (Rn, d2) iff xj,k → xj for j = 1, . . . , n.(b) {xk} is Cauchy in (Rn, d2) iff {xj,k}∞k=1 is Cauchy in R for eachj = 1, . . . , n.(c) Rn is complete in each of the metrics d1, d2, d∞. (Use Exercise 14.)

Metric Spaces 237

18.S Let d be a metric on a set X and define

ρ(x, y) := d(x, y)1 + d(x, y) .

Verify that (a)–(d) of Exercise 15 hold. Are d and ρ metrically equivalent?

19. Let ρ1 and ρ2 be metrics on a set X and let α, β > 0. Define

ρ(x, y) := αρ1(x, y) + βρ2(x, y).

Prove:(a) ρ is a metric on X.(b) A sequence {xn} converges to x in (X, ρ) iff it converges to x in both(X, ρ1) and (X, ρ2).

20.S Let {dk}∞k=1 be a sequence of metrics on a set X. For x, y ∈ X define

ρk(x, y) = dk(x, y)1 + dk(x, y) and ρ(x, y) =

∞∑k=1

2−kρk(x, y).

Prove:(a) ρ is a metric on X. (See Exercise 18.)(b) ρ(xn, x)→ 0 iff dk(xn, x)→ 0 for every k.

21. Let C(R) denote the set of continuous, real-valued functions on R. Forf, g ∈ C(R) define

ρ(f, g) :=∞∑k=1

2−kρk(f, g),

where

dk(f, g) = sup−k≤x≤k

|f(x)− g(x)| and ρk(f, g) = dk(f, g)1 + dk(f, g) .

Prove:(a) ρ is a metric on C(R).(b) fn → f in this metric iff fn → f uniformly on each bounded subsetof R.(c) C(R) is complete in this metric.

22. For f ∈ C([a, b]) define ‖f‖1 =∫ ba|f |. Show that ‖ · ‖1 is a norm on

C([a, b]) and that C([a, b]) is not complete in the metric induced by thisnorm.

23.S Show that the sequence of functions

fn(x, y) = (1 + xn)1/n(1 + yn)−1/n

converges uniformly to f(x, y) = x/y on [1, b]× [1, b] for any b > 1.

8.2 Open and Closed Sets

Throughout this section, (X, d) denotes an arbitrary metric space.

It is frequently useful to formulate assertions regarding a metric space Xin terms of certain subsets of X rather than the metric. The subsets of mostinterest in this regard are described in the next two definitions.

8.2.1 Definition. Let x ∈ X and r > 0. The sets

Br(x) := {y ∈ X : d(x, y) < r} and Cr(x) := {y ∈ X : d(x, y) ≤ r}

are called, respectively, the open and closed balls with center x and radius r.The set

Sr(x) := Cr(x) \Br(x) = {y ∈ X : d(x, y) = r}is called the sphere with center x and radius r. The ball Br(x) is also called aneighborhood of x. ♦

The open (closed) balls in R with the usual metric are simply the boundedopen (closed) intervals. The spheres are the endpoints of these intervals. Theopen (closed) balls in Euclidean space R2 are open (closed) disks and thespheres are circles. The open and closed balls in a discrete metric space X arethe sets X and {x}; the spheres are X \ {x} and the empty set.

8.2.2 Definition. A subset U of X is said to be open if either U = ∅ or elseU has the following property:

For each x ∈ U there exists ε > 0 such that Bε(x) ⊆ U .

A subset of X is closed if its complement is open. The collection of all opensets is called the (metric ) topology of (X, d). ♦

In any metric space, X and ∅ are both open and closed. There are manymetric spaces for which these are the only subsets that are both open andclosed; Euclidean space Rn is an important example (see Section 8.7). Thesets Q and I are neither open nor closed in R since every open ball (= openinterval) contains members of both sets. A finite set F is always closed. Indeed,if x ∈ F c, then Br(x) ⊆ F c, where r = min{d(x, y) : y ∈ F}, hence F c isopen.

8.2.3 Proposition. An open ball is open, a closed ball is closed, and a sphereis closed.

Proof. Let x ∈ Br(x0). We claim that Bε(x) ⊆ Br(x0), where ε := r−d(x, x0).Indeed, if y ∈ Bε(x) then

d(y, x0) ≤ d(y, x) + d(x, x0) < ε+ d(x, x0) = r,

Metric Spaces 239

x0

r

Br(x0)

Bε(x)

xy

ε

FIGURE 8.1: An open ball is open.

hence y ∈ Br(x0) (Figure 8.1). Since x was arbitrary, Br(x0) is open. A similarargument shows that Cr(x0)c and Sr(x0)c are open, hence Cr(x0) and Sr(x0)are closed. (See Exercise 2.) That Sr(x0) is closed also follows from 8.2.6below.

8.2.4 Theorem. Open sets in (X, d) have the following properties:

(a) If Ui is open for each i in an index set I, then⋃

i∈I Ui is open.

(b) If V1, . . . , Vn are open, then V1 ∩ · · · ∩ Vn is open.

Proof. (a) Let U denote the union. If x ∈ U , then x ∈ Ui for some i, hencethere exists r > 0 such that Br(x) ⊆ Ui ⊆ U . Therefore, U is open.

(b) Let V denote the intersection and let x ∈ V . For each j = 1, . . . , nthere exists rj > 0 such that Brj (x) ⊆ Vj . Then Br(x) ⊆ V , where r =min{r1, . . . , rn}. Therefore, V is open.

8.2.5 Corollary. A nonempty subset U is open iff it is the union of openballs.

For example, in a discrete metric space, every subset is a union of openballs {x} = B1(x) and hence is open. It follows that every subset is also closed.

8.2.6 Corollary. Closed sets in (X, d) have the following properties:

(a) If Ci is closed for each i in an index set I, then⋂

i∈I Ci is closed.

(b) If C1, . . . , Cn are closed, then C1 ∪ · · · ∪ Cn is closed.

Proof. In (a), each Cci is open, hence, using DeMorgan’s law and 8.2.4,(⋂i∈I

Ci

)c=⋃i∈I

Cci

is open, that is,⋂

i∈I Ci is closed. Part (b) is proved in a similar manner, usingDeMorgan’s law for complements of finite unions.


8.2.7 Theorem. A subset C of X is closed iff C contains the limit of eachconvergent sequence in C.

Proof. Assume that C is closed and let {xn} be a sequence in C with xn → x.If x 6∈ C, then, because Cc is open, there exists ε > 0 such that Bε(x)∩C = ∅.But then xn is eventually in Bε(x) ⊆ Cc, impossible. Therefore, x ∈ C.

Now suppose C is not closed. Then Cc is not open, hence there existsx ∈ Cc such that B1/n(x) 6⊆ Cc, that is, B1/n(x) ∩ C 6= ∅, for every n ∈ N .Choosing a point xn in this intersection, we then obtain a sequence {xn} in Cthat converges to a point not in C.

8.2.8 Corollary. Let (X, d) be a metric space and let Y be a subspace of X.

(a) If X is complete and Y is closed, then Y is complete.

(b) If Y is complete, then Y is closed.

Proof. (a) Let {yn} be a Cauchy sequence in Y . Since X is complete, thereexists x ∈ X such that yn → x. Since Y is closed, x ∈ Y . Therefore, Y iscomplete.

(b) Let {yn} be a sequence in Y such that yn → x ∈ X. Then {yn} isCauchy and hence converges to some y ∈ Y . Since limits are unique, x = y.Therefore, x ∈ Y , hence Y is closed.

8.2.9 Example. Let C([a, b]) denote the set of all continuous real-valuedfunctions on the interval [a, b]. Each such function is bounded, hence C([a, b])is a vector subspace of B([a, b]) (8.1.10). Since the uniform limit of continuousfunctions is continuous (7.2.2), C([a, b]) is closed in the uniform metric. SinceB([a, b]) is complete, 8.2.8(a) shows that C([a, b]) is complete. ♦

8.2.10 Example. The subspace D([a, b]) of C([a, b]) consisting of all differ-entiable functions is not complete in the uniform metric. To see this take[a, b] = [0, 1] and define a sequence of continuous functions gn(x), n ≥ 2, on[0, 1] such that gn = 1 on [0, 1/2], gn = 0 on [1/2 + 1/n, 1], and gn is linear on[1/2, 1/2 + 1/n]. Also, define g(t) on [0, 1] by g = 1 on [0, 1/2] and g = 0 on(1/2, 1]. (See Figure 8.3.)

12+ 1

n

g

12

1

x12

1

gn

1

FIGURE 8.2: The functions gn and g.

Metric Spaces 241

Now set

fn(x) =∫ x

0gn(t) dt and f(x) =

∫ x

0g(t) dt, x ∈ [0, 1].

Then fn ∈ D([0, 1]), f ∈ C([0, 1]), and

|fn(x)− f(x)| ≤∫ 1

0|gn − g| =

∫ 1/2+1/n

1/2gn = 1

2n.

Therefore, fn → f uniformly on [0, 1]. Since f is not differentiable at 1/2,D([0, 1]) is not closed. ♦

8.2.11 Definition. Let Y be a subset of X. A subset A ⊆ Y is said to berelatively open (relatively closed ) in Y if A is open (closed) in the subspace(Y, d) of (X, d). ♦

8.2.12 Theorem. Let A ⊆ Y ⊆ X. Then A is relatively open (relativelyclosed) in Y iff A = Y ∩B for some open (closed) subset B of X.

Proof. By definition, a nonempty open set A in the subspace Y is a union ofopen balls in Y . The latter are of the form Y ∩Br(y), where y ∈ Y and Br(y)is an open ball of X. Therefore, A = Y ∩ B, where B is the correspondingunion of the open balls Br(y).

From the first paragraph, the closed sets of Y are of the form Y \A = Y ∩Bc,where B is open in X. Since Bc is closed in X, the assertion regarding closedsets follows.

8.2.13 Definition. Let X be a vector space and let a, b ∈ X . The linesegment from a to b is defined by

[a : b] = {(1− t)a+ tb : 0 ≤ t ≤ 1} .

A subset E of X is said to be convex if a, b ∈ E implies [a : b] ⊆ E. ♦

a b

a

b

FIGURE 8.3: Convex and non-convex sets.

Recall that, by definition, the convex subsets of R are the intervals. Thereader may easily check that if D ⊆ Rp and E ⊆ Rq are convex, then D × E,as a subset of Rp+q, is convex. In particular, Cartesian products I1 × · · · × Inof intervals Ij are convex in Rn. Other examples are given in Exercise 5.


Exercises1.S Sketch B1(0, 0) ⊆ R2 for the metrics d1 and d∞ derived from the norms‖ · ‖1 and ‖ · ‖∞.

2. Prove that a closed ball is closed.

3.S Let x, y be distinct points in a metric space (X, d). Find the largestnumber r such that Br(x) ∩Br(y) = ∅.

4. Show that every open subset U of Rn is a countable union of open ballsas well as a countable union of bounded open n-dimensional intervals(a1, b1)× · · · × (an, bn).

5.S Prove that open and closed balls in a normed vector space are convex.Are spheres convex?

6. Show by example that arbitrary intersections of open sets may not beopen and that arbitrary unions of closed sets may not be closed.

7. Metrics d and ρ on a set X are said to be topologically equivalent if theyhave the property that a sequence {xn} converges to x in (X, d) iff itconverges to x in (X, ρ).(a) Prove that metrically equivalent metrics are topologically equivalent.(See Exercise 8.1.14.)(b) Prove that d and ρ are topologically equivalent iff (X, d) and (X, ρ)have the same topologies, that is, the metrics produce the same opensets.(c) Are topologically equivalent metrics necessarily metrically equivalent?

8.S Prove that the metric ρ(x, y) = |ex− ey| on R is topologically equivalentto the usual metric. Is R complete in this metric? Is ρmetrically equivalentto the usual metric on R?

9. Let Y be a subspace of (X, d) with the property that for some r > 0,d(x, y) ≥ r for all x, y ∈ Y with x 6= y. Prove that Y is complete, henceclosed. Conclude that finite metric spaces, discrete metric spaces, andthe subspaces N and Z of R are complete.

10. Let xn → x0 in (X, d). Prove that the set C := {x0, x1, x2, . . .} is closedin X.

11. Let Y be open (closed) in (X, d). Prove that a subset U of Y is relativelyopen (relatively closed) in Y iff it is open (closed) in X.

12.S Prove that the set

C := {f ∈ C([0, 1]

): f(x) = f(1− x) for all x ∈ [0, 1]}

is closed in the supremum metric (8.1.10) but not in the metric ofExercise 8.1.22.

Metric Spaces 243

13. Prove that the subspaces

V :={f ∈ B

([0,+∞)

): limx→+∞

f(x) exists in R}

and

W :={f ∈ V : lim

x→+∞f(x) = 0

}are closed in the supremum metric.

8.3 Closure, Interior, and Boundary

Throughout this section, (X, d) denotes an arbitrary metric space.

8.3.1 Definition. Let E ⊆ X.

• The closure cl(E) = clX(E) of E in X is the intersection of all closedsubsets of X containing E.

• The interior int(E) = intX(E) of E is the union of all open subsets ofX contained in E.

• The boundary bd(E) = bdX(E) of E is the set cl(E) \ int(E). ♦

8.3.2 Examples. (a) Since every nonempty open set of R (with the usualmetric) contains rational and irrational points,

int(Q) = int(I) = ∅ and cl(Q) = cl(I) = R, hence bd(Q) = bd(I) = R.

For bounded intervals we have

cl((a, b)) = [a, b], int([a, b]) = (a, b), and bd((a, b)) = bd([a, b]) = {a, b}.

(b) In a discrete metric space a subset E is both open and closed, hencecl(E) = int(E) = E and bd(E) = ∅. ♦

By 8.2.4 and 8.2.6, int(E) is open and cl(E) is closed, hence bd(E) is closed.The following proposition asserts that int(E) is the largest open set containedin E and cl(E) is the smallest closed set containing E.

8.3.3 Proposition. If U is open, C is closed, and U ⊆ E ⊆ C, then

U ⊆ int(E) ⊆ E ⊆ cl(E) ⊆ C.

Proof. Simply note that U is one of the open sets in the definition of int(E)and that C is one of the closed sets in the definition of cl(E).

8.3.4 Corollary. Let E ⊆ X.

(a) E is open in X iff int(E) = E. (b) int(int(E)

)= int(E).

(c) E is closed in X iff cl(E) = E. (d) cl(cl(E)

)= cl(E).

Proof. If E is open, take U = E in the proposition. If E is closed, take C = E.This proves (a) and (c). Parts (b) and (d) follow from these.

8.3.5 Proposition. For any subset E of X,

(a) cl(Ec) =(int(E)

)c, (b) int(Ec) =

(cl(E)

)c,

(c) bd(E) = cl(E) ∩ cl(Ec) = bd(Ec).

Proof. For (a) we have

(int(E)

)c =[ ⋃

U⊆EU open

U

]c=

⋂C⊇EcC closed

C = cl(Ec).

Parts (b) and (c) follow from (a).

8.3.6 Proposition. Let E ⊆ X. Then x ∈ cl(E) iff there exists a sequence{an} in E such that an → x.

Proof. Let C be the set of all limits of convergent sequences in E, includingconstant sequences, so E ⊆ C. We show that C = cl(E), which will establishthe proposition.

First, C is closed. If not, then Cc is not open, hence there exists y ∈ Ccand for each n a point yn ∈ B1/n(y) ∩ C. By definition of C, each yn is thelimit of a sequence in E, hence there exists an ∈ E such that d(yn, an) < 1/n.By the triangle inequality, d(an, y) < 2/n hence an → y. But then y ∈ C, acontradiction. Therefore C must be closed.

It follows that cl(E) ⊆ C. Since cl(E) contains the limit of all convergentsequences in E (8.2.7), C ⊆ cl(E). Therefore, C = cl(E).

8.3.7 Example. (Topologist’s sine curve). Let

A = {(x, sin(1/x)) : 0 < x < 2/π} and B = {0} × [−1, 1].

We show that cl(A) = A ∪B.For the inclusion A ∪B ⊆ cl(A), note first that{

sin 1x

: 2(4n+ 3)π ≤ x ≤

2(4n+ 1)π

}= [−1, 1], n ∈ Z+.

It follows from the intermediate value theorem that for each y ∈ [−1, 1] andn ∈ N there exists xn ∈ R such that

0 < xn ≤2

(4n+ 1)π and sin(1/xn) = y.

Metric Spaces 245

Since (xn, y) ∈ A and (xn, y) → (0, y), (0, y) ∈ cl(A). Therefore, B ⊆ cl(A),hence A ∪B ⊆ cl(A).

The reverse inclusion will follow if we show that A ∪B is closed. For thiswe use 8.2.7. Let {(xn, yn)} be a sequence in A ∪B with (xn, yn)→ (x, y).Case 1. There exists a subsequence {(xnk , ynk)} that lies in B. Then, since

B is closed, (x, y) ∈ B.Case 2. {(xn, yn)} eventually lies in A, so yn = sin(1/xn) for all sufficiently

large n. Since limt→0 sin(1/t) does not exist, x cannot be zero, hence y =sin(1/x), that is, (x, y) ∈ A.

In each case (x, y) ∈ A ∪B, hence A ∪B is closed. ♦

8.3.8 Definition. A subset E of X is said to be dense in X if cl(E) = X.Equivalently, every x ∈ X is the limit of a sequence in E. ♦

By 8.3.2, Q and I are dense in R. The set of all points in R2 with rationalcoordinates is dense in R2. A discrete space has no proper dense subsets. InSection 8.8 we show that the set of polynomials on [a, b] is dense in C([a, b]) inthe uniform norm.

8.3.9 Example. (Dirichlet). If ξ is irrational, then the set

E := {nξ +m : m ∈ Z, n ∈ N}

is dense in R. To verify this we show that for any x ∈ R and k ∈ N there existsz ∈ E such that |z − x| < 1/k.

To this end, let

yj = jξ − bjξc, j = 1, . . . , k + 1.

Because ξ is irrational, 0 < yj < 1, hence yj must be in one of the intervals(0, 1/k), (1/k, 2/k), . . . , ((k − 1)/k, 1). Since there are only k intervals, one ofthese must contain yi and yj for some i 6= j.2 By the irrationality of ξ, yj 6= yi.Hence one of the quantities ±(yj − yi), call it y, is in E and |y| < 1/k.

We consider two cases. If y > 0, choose m ∈ Z such that x + m > 0and let n be the smallest integer such that ny > x + m. Then n ∈ N and(n− 1)y ≤ x+m, hence z := ny −m ∈ E and

0 < z − x = ny −m− x ≤ y < 1/k.

On the other hand, if y < 0, choose m ∈ Z such that x+m < 0 and let n bethe smallest integer such that n(−y) > −(x+m), that is, ny < x+m. Again,z := ny −m ∈ E, and in this case, since (n− 1)y ≥ x+m,

−1/k < y ≤ ny −m− x = z − x < 0.

In either case, |z − x| < 1/k, as required. ♦

2This is an instance of the so-called pigeon hole principle.


8.3.10 Example. We show that the set S := {sinn : n ∈ N} is dense in theinterval [−1, 1]. Let x ∈ R and take ξ = 1/2π in the preceding example. Thennk/2π + mk → x for some integer sequences {nk} and {mk} with nk > 0,hence

sinnk = sin[2π(nk/2π +mk)

]→ sin(2πx).

Since x was arbitrary, every member of [−1, 1] is the limit of a sequence in S.A similar argument shows that {cosn : n ∈ N} is dense in [−1, 1]. ♦

8.3.11 Definition. A metric space is said to be separable if it has a countabledense subset. ♦

For example, Rn is separable (consider all points with rational coordinates).An uncountable discrete space is not separable. The space C([a, b]) is separablein the supremum norm (Exercise 19).

Exercises1. Let (X, d) be a metric space and A, B ⊆ X. Prove the following:

(a) S cl(A ∪B) = cl(A) ∪ cl(B).(c) int(A ∩B) = int(A) ∩ int(B).(e) bd(A ∪B) ⊆ bd(A) ∪ bd(B).(g) bd(int(A)) ⊆ bd(A).

(b) cl(A ∩B) ⊆ cl(A) ∩ cl(B).(d) S int(A ∪B) ⊇ int(A) ∪ int(B).(f) S bd(cl(A)) ⊆ bd(A).(h) cl(A) = A ∪ bd(A).

Show by examples that the inclusions may be strict.

2. Prove: bd(A ∩B) ⊆(A∩bd(B)

)∪(B∩bd(A)

)∪(bd(A)∩bd(B)

). Show

that the inclusion may be strict.

3. Find cl(A) \A for A =

(a) {(1/n, 1/m) : m,n ∈ N} . (b) S{

(cos t, sin t, e−t) : t > 0}.

(c){(

cos t, sin t, t

1 + |t|

): t ∈ R

}. (d) {(t cos t, t sin t, t) : t > 0} .

(e) S

{(cos t1 + t

,sin t1 + t

): t > 0

}. (f) S

{(t cos t1 + t

,t sin t1 + t

): t > 0

}.

4. An induction argument shows that parts (a) and (c) of Exercise 1 holdfor any finite number of sets. Show, by example, that the analogousstatements for infinitely many sets are false.

5. Prove that if cl(A) ∩ cl(B) = ∅, then int(A ∪B) = int(A) ∪ int(B).

6. Let Y be a subspace of (X, d) and A ⊆ Y . Prove that(a)S clY (A) = clX(A) ∩ Y . (b) intX(A) ∩ Y ⊆ intY (A).(c) bdY (A) ⊆ bdX(A).Show by examples that the inclusions in (b) and (c) may be strict.

Metric Spaces 247

7. Let xn → x0 in X. Show that cl({x1, x2, . . .}

)= {x0, x1, x2, . . .}.

8.S Let fn(x) = xn, 0 ≤ x ≤ 1. Show that the set {f1, f2, . . .} is closed in(C([0, 1]), ‖ · ‖∞

). Is it closed in

(C([0, 1]), ‖ · ‖1

)?

9. Let B = Br(x0) and C = Cr(x0). Prove that

(a)S B ⊆ int(C). (b) cl(B) ⊆ C. (c) bd(B) ⊆ C \B.Show, by example, that the inclusions may be strict.

10. Prove that in a normed vector space the inclusions in Exercise 9 areequalities.

11. Prove that the set E = {(x, y) : x, y ∈ Q and x 6= y} is neither opennor closed and is dense in Euclidean space R2.

12. Let x ∈ R, r ∈ Q, r 6= 0. In each case, find the largest interval in whichthe given set is dense.(a) {sin(rn) : n ∈ N}. (b)S {sin(x+ n) : n ∈ N}.(c) {sinn cosn : n ∈ N}. (d)

{tan2 n : n ∈ N

}.

13. Show that limn sin(πnx) does not exist for any irrational number x. Con-clude that limn sin(nr) does not exist for any nonzero rational number r.

14. (a) Let E be dense in X and let F be a proper finite subset of E. Showthat E \ F is dense in X \ F . Is E \ F is necessarily dense in X?(b) Let X be a normed vector space with {a1,a2, . . .} dense in X . Showthat {an : n ≥ N} is dense in X for every N ∈ N. Conclude that{sinn : n ≥ N} is dense in [−1, 1].

15. Show that lim infn sinn = −1 and lim supn sinn = 1.

16.S Let Y be dense in X and U ⊆ X open. Show that U ∩ Y is dense in U .What if U is not open?

17. Let X = Rn with the Euclidean metric and let Y ⊆ X have the propertyof Exercise 8.2.9. Prove that Y c is open and dense in X. Conclude thatNc and Zc are open and dense in R.

18. Show that in a separable space, every nonempty open set U is a countableunion of open balls.

19. Use the Weierstrass approximation theorem (8.8.5, below) to show that(C([a, b]), ‖ · ‖∞

)is separable

20. (a)S Let {Ii : i ∈ I} be a family of open intervals in R with the propertythat each pair has a nonempty intersection. Show that

⋃i∈I Ii is an open

interval.(b) Prove that every nonempty open set in R is a countable union ofdisjoint open intervals.

8.4 Limits and Continuity

In this section, (X, d), (Y, ρ), and (Z, µ) denote arbitrary metric spaces.

8.4.1 Definition. Let E ⊆ X. A member a ∈ X is said to be an accumulationpoint of E if E ∩

(Br(a) \ {a}

)6= ∅ for each r > 0. A member of E that is not

an accumulation point is called an isolated point of E. ♦

It follows from the definition that a is an accumulation point of E iff thereexists a sequence of distinct points of E converging to a.

No subset of a discrete metric space has an accumulation point. The setof functions x 7→ xn in C([0, 1]), n ∈ N, has no accumulation points in theuniform norm but the identically zero function is an accumulation point in thenorm ‖ · ‖1.

8.4.2 Definition. Let E ⊆ X, f : E → Y , and let a ∈ X be either a memberof E or an accumulation point of E. If b ∈ Y , we write

b = lim{x→a, x∈E} f(x)

if for each ε > 0 there exists δ > 0 such that

x ∈ E and d(x, a) < δ implies ρ(f(x), b) < ε. (8.3)

In the special case E = X \ {a}, we write simply b = limx→a f(x). ♦

Note that condition (8.3) may be written

f(E ∩Bδ(a)

)⊆ Bε(b).

This observation will be useful later in proving a global characterization ofcontinuity.

Many of the results in Chapter 3 on limits of functions on subsets ofR hold for real-valued functions defined on a metric space. These includethe theorems on limits of sums, products, and quotients of functions, thecomparison theorem, the squeeze principle, and the sequential characterizationof limit. The statements and proofs are essentially the same: simply replace|x− y| by the metric d(x, y). For future reference, we explicitly state:

8.4.3 Sequential Characterization of Limit. Let a be an accumulationpoint of E ⊆ X and let f : E → Y . Then lim{x→a, x∈E} f(x) exists and equalsb ∈ Y iff f(an)→ b for all sequences {an} in E with an → a.

The following theorem gives sufficient conditions for a double limit to equalan iterated limit.

Metric Spaces 249

8.4.4 Iterated Limit Theorem. Let X×Y have the product metric η := d×ρ,and let a and b be accumulation points of X \ {a} and Y \ {b}, respectively. Iff : X × Y \ {(a, b)} → Z has the properties

(a) g(x) := limy→b f(x, y) exists in Z for each x ∈ X, and

(b) z := lim(x,y)→(a,b) f(x, y) exists in Z,

then limx→a g(x) exists and equals z.

Proof. Given ε > 0, by (b) choose δ > 0 such that

µ(f(x, y), z

)< ε for all (x, y) ∈ X × Y with 0 < η

((x, y), (a, b)

)< δ.

Let 0 < d(x, a) < δ. Then, for all y sufficiently near b, η((x, y), (a, b)

)< δ,

hence

µ(g(x), z

)≤ µ

(g(x), f(x, y)

)+ µ

(f(x, y), z

)< µ

(g(x), f(x, y)

)+ ε.

Letting y → b in this inequality, noting that f(x, y) → g(x), we obtainµ(g(x), z

)≤ ε. This shows that limx→a g(x) = z.

The theorem implies that

lim(x,y)→(a,b)

f(x, y) = limx→a

limy→b

f(x, y) = limy→b

limx→a

f(x, y)

provided the limit on the left exists and inner limits on the right exist for eachx and y, respectively. The limits on the right are called iterated limits and thelimit on the left is sometimes called a double limit. In particular, if the iteratedlimits exist and are unequal, then the double limit cannot exist.

In many cases, the iterated limit theorem (suitably modified) still holds iff is defined on subsets E of X × Y more general than X × Y \ {(a, b)}. Thisis the case in Examples (c) and (d) that follow.

8.4.5 Examples. In (a)–(e), X = Y = Z = R. Note that in this case theproduct metric η is equivalent to the Euclidean metric on R2.(a) Let E = (0,+∞)× (0,+∞). To calculate the limit

lim(x,y)→(0,0)

(x,y)∈E

sin(x+ 2y)2x+ y

we write the function assin(x+ 2y)x+ 2y

x+ 2y2x+ y

.

As (x, y)→ (0, 0) along E, the first factor tends to 1 but the second factor hasno limit. Indeed, along a path y = mx, m > 0, x > 0,

x+ 2y2x+ y

= x+ 2mx2x+mx

= 1 + 2m2 +m

.

Therefore, the double limit does not exist. The iterated limits exist and areunequal:

limy→0+

limx→0+

f(x, y) = limy→0+

sin(2y)y

= 2, limx→0+

limy→0+

f(x, y) = limx→0+

sin x2x = 1

2 .

(b) Let E be as in (a) and let p, q > 0. The limit

L := lim(x,y)→(0,0)

(x,y)∈E

xp + yq

x2 + y2

exists iff p, q > 2 or p = q = 2. In the former case, L = 0 and in thelatter, L = 1. This is best seen by converting to polar coordinates x = r cos θ,y = r sin θ, 0 < θ < π/2:

L = limr→0+

(rp−2 cosp θ + rq−2 sinq θ

).

Both iterated limits exist iff p, q ≥ 2.(c) Let E = {(x, y) : x > 0, y > 0, x 6= y}. Then

lim(x,y)→(0,0)

(x,y)∈E

xp − yp

x− y

exists iff p ≥ 1 and has zero limit if p > 1. Indeed, if 0 < x < y, then, by themean value theorem, there exists t ∈ (x, y) such that xp − yp = ptp−1(x− y),hence

pxp−1 <xp − yp

x− y< pyp−1,

and the assertion follows from the squeeze principle.Clearly, the iterated limits exist (hence equal the double limit) iff p ≥ 1.(d) Let E be as in (c). Then

lim(x,y)→(0,0)

(x,y)∈E

xp + yp

y − x

does not exist for any value of p Indeed, along the path y = mx, m, x > 0,m 6= 1, the function has values

xp + (mx)pmx− x

= xp−1(1 +mp)1−m

so the limit cannot exist if p ≤ 1. Let p > 1 and set θr = mrp−1 + π/4. Alongthe path given by

x = r cos θr = r√2[cos(mrp−1)− sin

(mrp−1)]

y = r sin θr = r√2[cos(mrp−1)+ sin

(mrp−1)] ,

Metric Spaces 251

where r ↓ 0, the function has values

xp + yp

y − x= 1√

2mmrp−1

sin(mrp−1)(

cosp θr + sinp θr),

which tends to 2(1−p)/2/m as r → 0.Neither of the iterated limits exists if p < 1. If p > 1, then clearly

limx→0

limy→0

xp + yp

y − x= limy→0

limx→0

xp + yp

y − x= 0,

and if p = 1, then

limx→0

limy→0

xp + yp

y − x= −1, while lim

y→0limx→0

xp + yp

y − x= 1.

(e) Let E = {(x, y) : x > 0, y > 0}. Then

lim(x,y)→(0,0)

(x,y)∈E

xpy

x+ y

exists iff p > 0, in which case the limit is zero. Indeed, along the path y = mxthe function has values mxp/(1 + m), so the limit cannot exist if p ≤ 0. Ifp > 0, one can introduce polar coordinates as in (b).

Both iterated limits exist iff p ≥ 0, but are unequal if p = 0. ♦

8.4.6 Definition. A function f : X → Y is said to be continuous at a pointa ∈ X if limx→a f(x) = f(a). Also, f is said to be continuous on a set E ⊆ Xif f is continuous at each point of E. If E = X, then f is simply said to becontinuous. If f is one-to-one and onto Y and if f−1 : Y → X is continuous,then f is called a homeomorphism. ♦

From the sequential characterization of limit we have

8.4.7 Sequential Characterization of Continuity. Let f : X → Y anda ∈ X. Then is continuous at a iff f(an)→ f(a) for all sequences {an} in Xwith an → a.

The next theorem gives an important global characterization of continuity.

8.4.8 Theorem. Let f : X → Y . The following statements are equivalent:

(a) f is continuous.

(b) f−1(V ) is open in X for each open subset V of Y .

(c) f−1(C) is closed in X for each closed subset C of Y .

Proof. That (b) and (c) are equivalent follows from the general set-theoreticidentity f−1(Bc) =

(f−1(B)

)c.(a) ⇒ (b): Let V ⊆ Y be open. If x ∈ f−1(V ), then f(x) ∈ V so there

exists ε > 0 such that Bε(f(x)

)⊆ V . By continuity there exists δ > 0 such

that f(Bδ(x)

)⊆ Bε

(f(x)

). Therefore, f

(Bδ(x)

)⊆ V , hence Bδ(x) ⊆ f−1(V ).

(b) ⇒ (a): Let x ∈ X and ε > 0. Since U := f−1 (Bε(f(x)))

is open in Xand contains x, we may choose δ > 0 such that Bδ(x) ⊆ U . Then

f(Bδ(x)

)⊆ f(U) ⊆ Bε

(f(x)

),

which shows that f is continuous at x.

8.4.9 Definition. A function f : X → Y is said to be uniformly continuouson a set E ⊆ X if, given ε > 0, there exists δ > 0 such that

ρ(f(u), f(v)) < ε for all u, v ∈ E with d(u, v) < δ. ♦

8.4.10 Example. The function

f(x, y) = 12.1 + sin x+ sin y

is uniformly continuous on R2. Indeed, for all (x, y), (a, b) ∈ R2,

|f(x, y)− f(a, b)| = | sin x+ sin y)− (sin a+ sin b)|(2.1 + sin x+ sin y)(2.1 + sin a+ sin b)

≤ | sin x− sin a|+ | sin y − sin b|(2.1 + sin x+ sin y)(2.1 + sin a+ sin b)

≤ 100| sin x− sin a|+ 100| sin y − sin b)|≤ 100(|x− a|+ |y − b|)≤ 200

√(x− a)2 + (y − b)2. ♦

The proof of the following theorem is entirely analogous to that of 3.5.2.The details are left to the reader.

8.4.11 Sequential Characterization of Uniform Continuity. A functionf : X → Y is uniformly continuous on E ⊆ X iff ρ

(f(un), f(vn)

)→ 0 for all

sequences {un} and {vn} in E with d(un, vn)→ 0.

For example, every function on a discrete metric space is uniformlycontinuous, since eventually un = vn. The indefinite integral functionF (f)(x) =

∫ xaf(t) dt on the space C([a, b]) is uniformly continuous with re-

spect to the uniform norm, since ‖fn − gn‖∞ → 0⇒ ‖F (fn)− F (gn)‖∞ → 0.The addition function (x, y) 7→ x + y is uniformly continuous on R2 since(xn, yn)− (an, bn) → (0, 0) clearly implies that xn + yn − (an + bn) → (0, 0).On the other hand, the multiplication function (x, y) 7→ xy is not uniformlycontinuous on R2, since (n+1/n, n+1/n)−(n, n)→ 0 but (n+1/n)2−n2 → 1.

Metric Spaces 253

Exercises1. For each of the functions f(x) below, find lim{x→0,x∈E} f(x) and the

corresponding iterated limits or show that the limits fail to exist. In eachcase take E to be the natural domain of the function.

(a) y2 + sin2 x

3x2 + 2y2 . (b) S x2y

5x2 + 2y2 . (c) x2y2

x4 + 7y4 .

(d) sin x sin y√x2 + y2

. (e) S x4

x4 − xy2 + y4 . (f) (x+ y) sin 1x2 + y2 .

(g) sin(3xy2 + 2xy3)xy2 . (h) xy2 cos(xy)

x2 + y2 . (i) S

√(1 + x2)(1 + y2)− 1

x2 + y2 .

(j) 3x+ 2y(x2 + y2)1/3 . (k) S 1− cos(xy)

sin x sin y . (l) x2 + |y|2.1x2 + y2 .

(m) 1− cos√|xy|

|x|p. (n) sin x± sin y

x− y. (o) S x− y

ln x− ln y .

(p) x|y|1.1

sin2√x2 + y2

. (q) 3x2 + 2y2 + z2

x2 + y2 + z2 . (r) S xy + yz + xz√x2 + y2 + z2

.

2.S Let a > 0, p > 1. Evaluate the limit

lim(x,y)→(0,0)

(x,y)∈E

x2 − 5y2

x2 + 3y2 .

for the sets(a) E = {(x, y) : |y| ≤ a|x|p|} (b) E = {(x, y) : |y| < |x|}.

3.S Let f be continuously differentiable on (−π/2, π/2). Define g on the set

E := {(x, y) ∈ (−π/2, π/2)2 : x 6= y}

byg(x, y) = f(x)− f(y)

sin x− sin y .

Show that g has a continuous extension to (−π/2, π/2)2.

4. Let f and g be continuously differentiable on some open interval (a, b)and suppose that g′ 6= 0. Define h on the set

E := {(x, y) ∈ (a, b)2 : x 6= y}

by

h(x, y) = f2(x)− f2(y)g(x)− g(y) .

Prove that h has a continuous extension to (a, b)2.

5. Let f : X → Y . Prove that the following statements are equivalent:(a) f is continuous.(b) f

(cl(A)

)⊆ cl

(f(A)

)for each subset A of X.

(c) cl(f−1(B)

)⊆ f−1(cl(B)) for each subset B of Y .

(d) f−1(int(B))⊆ int

(f−1(B)

)for each subset B of Y .

6.S Show that d : X ×X → R is uniformly continuous with respect to theproduct metric η := d× d on X ×X.

7.S Let f : [0, a)→ R and

g(x, y) := f(√

x2 + y2),√x2 + y2 < a.

(a) Prove that g is uniformly continuous iff f is uniformly continuous.(b) Use (a) to show that the functions√

x2 + y2,1√

x2 + y2 + 1, and sin

√x2 + y2

are uniformly continuous on R2 but sin(x2 + y2) is not.

8.S Let f(x) be uniformly continuous on R. Prove that the function g(x, y) :=f(αx+βy) is uniformly continuous on R2. Give an example of a boundeduniformly continuous function f on R such that the function h(x, y) :=f(xy) is not uniformly continuous on R2.

9. Show that the function

f(x, y) = 11− sin x sin y

is uniformly continuous on the set

Er := [−π/2 + r, π/2− r]× [−π/2 + r, π/2− r]

for any 0 < r < π/2, but is not uniformly continuous on

E := (−π/2, π/2)× (−π/2, π/2).

10. Let f : (X, d)→ (Y, ρ) and g : (Y, ρ)→ (Z, µ) be (uniformly) continuous.Prove that g ◦ f : (X, d)→ (Z, µ) is (uniformly) continuous.

11.S Let f : X → Rk, say f(x) =(f1(x), . . . , fk(x)

). Prove that f is (uni-

formly) continuous iff each fj is (uniformly) continuous.

12.S Let fn : (X, d) → (Y, ρ) converge uniformly to f on X. Prove that ifeach fn is (uniformly) continuous, then f is (uniformly) continuous.

Metric Spaces 255

8.5 Compact Sets

Throughout this section, (X, d) and (Y, ρ) denote arbitrary metric spaces.

Compactness is one of the most important concepts in analysis. For example,it allows the formulation of results such as the extreme value theorem andthe uniform continuity theorem in the context of general metric spaces. It isalso the key feature that distinguishes the finite dimensional space Rn from itsinfinite dimensional counterparts `∞ and `1.

8.5.1 Definition. Let E ⊆ X. A collection U = {Ui : i ∈ I} of subsets of Xis called a cover of E if E is contained in the union of the sets Ui. If each Ui

is open, then U is called an open cover of E. A cover U of E is said to have afinite subcover if there exists a finite subset I0 of I such that {Ui : i ∈ I0} isa cover of E. If every open cover of E has a finite subcover, then E is said tobe compact. ♦

Finite subsets of a metric space are compact. In a discrete metric space,these are the only compact sets. Indeed, if E is an infinite subset of a discretespace, then

{{x} : x ∈ E

}is an open cover of E with no finite subcover.

8.5.2 Proposition. A compact subset of a metric space is closed and bounded.

Proof. Let E be compact and let a ∈ Ec. For each x ∈ E let Ux and Vx denotedisjoint open balls with centers x and a, respectively (see Figure 8.5). Then{Ux : x ∈ E} is an open cover of E, hence there exists a finite subset E0 ofE such that {Ux : x ∈ E0} covers E. Set V =

⋂x∈E0

Vx. Then V is an openball with center a, and since V ∩ Ux = ∅ for each x ∈ E0, V ⊆ Ec. ThereforeEc is open.

E

Ux

x

Vx

a

FIGURE 8.4: The neighborhoods Ux and Vx.

To show that E is bounded, choose any x ∈ X and consider the open cover{Bn(x) : n ∈ N} of E. Let F be a finite subset of N such that {Bn(x) : n ∈ F}covers E. Then E ⊆ Bm(x), where m is the largest member of F .


The converse of 8.5.2 is false. For example, the set Q ∩ [−√

2,√

2] is closedand bounded in Q but not compact. Indeed, if {rn} is a sequence in Q withrn ↑

√2, then {(−rn, rn) : n ∈ N} is an open cover of Q ∩ [−

√2,√

2] with nofinite subcover. For another example, consider a discrete metric space. Here,the entire metric space is closed and bounded but only finite sets are compact.8.5.3 Proposition. A closed subset of a compact metric space is compact.Proof. Let X be compact, E ⊆ X closed, and let U = {Ui : i ∈ I} be an opencover of E. Then U ∪ {Ec} is an open cover of X, hence there exists a finitesubset I0 of I such that X = Ec ∪

⋃λ∈I0

Ui. It follows that E ⊆⋃

i∈I0Ui.

Closely related to compactness is the notion of total boundedness.8.5.4 Definition. Let E ⊆ X and ε > 0. An ε-net for E is a set F ⊆ X suchthat {Bε(x) : x ∈ F} covers E. E is said to be totally bounded if for eachε > 0 there exists a finite ε-net for E. ♦

An ε-net F for E has the property that every member of E is within ε ofa member of F . For example, Q is an ε-net for R, and Z is a 1-net for R.

The following proposition shows that the set F in the definition of totalboundedness may be taken to be a subset of E.8.5.5 Proposition. If E has a finite ε-net F , then E has a finite 2ε-netcontained in E.Proof. For each x ∈ F , apply the following procedure: If E∩Bε(x) = ∅, removex from F . Otherwise, choose any a ∈ E ∩Bε(x) and replace Bε(x) by B2ε(a)

E

ε

x

a

2ε

FIGURE 8.5: A 2ε-net.

and x in F by a. Since Bε(x) ⊆ B2ε(a), the revised set is a finite 2ε-net for Econtained in E.

Since a finite union of open balls is bounded (Exercise 8.1.3), every totallybounded set is bounded. The converse is false. For example, in a discrete spaceall sets are bounded but no infinite set can be totally bounded.

Open and closed balls in C([0, 1]) with the supremum norm are boundedbut not totally bounded (Exercise 8). Contrast this with the following example:

Metric Spaces 257

8.5.6 Example. Every bounded subset E of Rn is totally bounded. To see this,let ε > 0 and choose k ∈ N so large that E ⊆ [−kδ, kδ]n, where 0 < δ < 2ε/

√n.

Subdividing, we see that I is a finite union of sets of the form

J := [a1, b1]× [a2, b2]× · · · × [an, bn], where bj − aj = δ.

(See Figure 8.6.) The largest diagonal in J has length√n δ < 2ε, hence J may

E

kδ−kδ

kδ

−kδ

J

Bε(c)c

FIGURE 8.6: A bounded set in Rn is totally bounded.

be enclosed in an open ball with radius ε and center c = (c1, . . . cn), wherecj = (aj + bj)/2. The resulting collection of balls is a finite ε-cover of E. ♦

8.5.7 Definition. A subset E of X is said to be sequentially compact if everysequence in E has a cluster point in E. ♦

By the Bolzano–Weierstrass theorem, closed and bounded intervals in Rare sequentially compact. The same is not true in Q; for example, Q∩ [0,

√2] is

not sequentially compact. In a discrete space, no infinite set can be sequentiallycompact since sequences with distinct terms cannot converge.

8.5.8 Heine–Borel Theorem. The following statements are equivalent:

(a) X is compact.

(b) X is sequentially compact.

(c) X is complete and totally bounded.

Proof. (a) ⇒ (b): We prove the contrapositive ∼(b) ⇒ ∼(a). Let {an} bea sequence in X with no cluster point. Then for each x ∈ X there mustexist an open ball B(x) with center x that contains only finitely many termsof the sequence. This implies that every finite subcover of the open cover

{B(x) : x ∈ X} of X contains only finitely many terms of the sequence andhence cannot cover X. Therefore, X is not compact.

(b)⇒ (c): LetX be sequentially compact and let {an} be a Cauchy sequencein X. By hypothesis, {an} has a convergent subsequence, say ank → a ∈ X.By Exercise 8.1.9, an → a. Therefore, X is complete.

Suppose that X is not totally bounded. Then there exists ε > 0 such thatno finite collection of open balls of radius ε covers X. Choose any a1 ∈ X. SinceBε(a1) does not cover X, there exists a2 ∈ X \Bε(a1). Since Bε(a1) ∪Bε(a2)does not cover X, there exists a3 ∈ X \

(Bε(a1) ∪Bε(a2)

). Continuing in this

fashion, we construct a sequence {an} in X such that

an ∈ X \[Bε(a1) ∪Bε(a2) ∪ · · · ∪Bε(an−1)

].

It follows that d(an, am) ≥ ε for all m 6= n. But then no subsequence of {an}can converge. Therefore, X must be totally bounded.

(c)⇒ (a): Assume that X is complete and totally bounded but not compact.Then X has an open cover U = {Ui : i ∈ I} with no finite subcover. Foreach k let Fk be a finite set of points in X such that {B1/k(x) : x ∈ Fk}is a cover of X. Consider the case k = 1. If for each x ∈ F1 the ball B1(x)could be covered by finitely many members of U , then X itself would havesuch a cover, contradicting our assumption. Thus there exists x1 ∈ F1 suchthat E1 := B1(x1) cannot be covered by finitely many members of U . Since{B1/2(x) : x ∈ F2} covers X, {E1 ∩ B1/2(x) : x ∈ F2} covers E1, so bysimilar reasoning there exists x2 ∈ F2 such that E2 := E1 ∩B1/2(x2) cannotbe covered by finitely many members of U . In this manner we construct asequence of points xn in X and decreasing sets

En = B1(x1) ∩B1/2(x2) ∩ · · · ∩B1/n(xn) = En−1 ∩B1/n(xn) (8.4)

that cannot be covered by finitely many members of U . In particular, En 6= ∅.Choose a point yn ∈ En. If n > m, then yn ∈ Em, hence from (8.4)

d(xm, xn) ≤ d(xm, yn) + d(yn, xn) < 1/m+ 1/n.

It follows that {xn} is a Cauchy sequence. Since X is complete, xn → x forsome x ∈ X. Choose i ∈ I such that x ∈ Ui. Since Ui is open, there exists r > 0such that Br(x) ⊆ Ui. Next, choose n > 2/r such that d(xn, x) < r/2. By thetriangle inequality, B1/n(xn) ⊆ Br(x). But then En ⊆ Ui, contradicting thenoncovering property of En. Therefore, X must be compact, completing theproof.

8.5.9 Corollary. A subset of Rn is compact iff it is closed and bounded.

Proof. We have already seen that a compact set in a metric space is closedand bounded. Conversely, let C ⊆ Rn be closed and bounded. Since Rn iscomplete (Exercise 8.1.17), C is complete (8.2.8). Since C is bounded, it istotally bounded (8.5.6). By the theorem, C is compact.

Metric Spaces 259

The validity of the preceding corollary ultimately rests on the finite dimen-sionality of Rn. For infinite dimensional normed spaces such as C

([0, 1]

), a

closed and bounded set need not be compact (Exercise 8). In the next section,we characterize the compact subsets of spaces like C

([0, 1]

).

8.5.10 Theorem. If f : X → Y is continuous and X is compact, then f(X)is compact.

Proof. Let {Vi : i ∈ I} be an open cover of f(X) in Y . For each i ∈ I, setUi = f−1(Vi). Then {Ui : i ∈ I} is an open cover of X, hence there exists afinite subset I0 of I such that {Ui : i ∈ I0} is a cover of X. It follows that{Vi : i ∈ I0} is a finite cover of f(X).

8.5.11 Corollary. Let f : X → Y be continuous, one-to-one, and onto Y . IfX is compact then f−1 : Y → X is continuous, hence f is a homeomorphism.

Proof. Let g = f−1 and let C be a closed subset of X. Then C is compact(8.5.3), hence, by the theorem, g−1(C) = f(C) is compact and therefore closedin Y (8.5.2). By 8.4.8, g is continuous.

Corollary 8.5.11 is false for noncompact X (Exercise 19).

8.5.12 Extreme Value Theorem. If f : X → R is continuous and X iscompact, then there exist points xm and xM in X such that

f(xm) ≤ f(x) ≤ f(xM ) for all x ∈ X.

Proof. By 8.5.10 and 8.5.2, f(X) is closed and bounded in R and thereforecontains its supremum and infimum.

8.5.13 Theorem. If f : X → Y is continuous and X is compact, then f isuniformly continuous.

Proof. Let ε > 0. By continuity, for each x ∈ X there exists γx > 0 such that

f(Bγx(x)

)⊆ Bε/2

(f(x)

). (8.5)

Set δx = γx/2. The collection {Bδx(x) : x ∈ X} is an open cover of X, hencethere exists a finite set F ⊆ X such that the collection {Bδx(x) : x ∈ F}covers X. Let δ := minx∈F δx and let a, b ∈ X with d(a, b) < δ. Choose x ∈ Fsuch that a ∈ Bδx(x). Then

d(x, a) < δx < γx and d(x, b) ≤ d(a, b) + d(x, a) < δx + δx = γx,

so a, b ∈ Bγx(x). By (8.5),

ρ(f(a), f(b)

)≤ ρ(f(a), f(x)

)+ ρ(f(x), f(b)

)< ε/2 + ε/2 = ε.

Therefore, f is uniformly continuous.

The following is a generalization of 3.5.9.

8.5.14 Corollary. Let X be compact, Y complete, E a dense subset of X,and f : E → Y continuous. The following statements are equivalent:

(a) lim{x→a, x∈E} f(x) exists for each a ∈ X.

(b) f has a continuous extension to X; that is, there exists a continuousfunction g : X → Y such that g|E = f .

(c) f is uniformly continuous on E.

Proof. (a) ⇒ (b): For each a ∈ X define g(a) = lim{x→a, x∈E} f(x). Sincef is continuous, g|E = f . If g is not continuous at a ∈ X, then there existε > 0 and a sequence {xn} in X such that xn → a and ρ

(g(xn), g(a)

)≥ ε

for all n. By definition of g(xn), for each n we may choose an ∈ E such thatd(xn, an) < 1/n and ρ

(g(xn), f(an)

)< ε/2. Then an → a but

ρ(f(an), g(a)

)≥ ρ(g(xn), g(a)

)− ρ(g(xn), f(an)

)> ε/2,

contradicting the definition of g(a). Therefore, g is continuous.(b) ⇒ (c): By 8.5.13, g is uniformly continuous on X, hence f is uniformly

continuous on E.(c) ⇒ (a): Let a ∈ X and let {xn} be a sequence in E such that xn → a.

Since f is uniformly continuous, {f(xn)} is Cauchy and therefore convergesto some b ∈ Y . If {yn} is another sequence in E such that yn → a, thend(yn, xn) → 0 so, by uniform continuity again, ρ

(f(yn), f(xn)

)→ 0, hence

f(yn)→ b. By the sequential criterion for limits, lim{x→a, x∈E} f(x) exists andequals b.

Exercises1. Determine which of the following subsets of R2 are closed, bounded, or

compact.

(a) S {(x, y) : 2x2 + y2 + 6y ≤ 8x}. (b) S {(x, y) : 3x2 + 2y ≤ 6x}.(c) {(x, y) : xy = 1}. (d) {(x, y) : x1/3 + y1/3 = 1}.

(e) {(x, y) : x2/3 + y2/3 = 1}. (f) S

{[x cosx1 + x

,x sin x1 + x

]: x ≥ 0

}.

(g){

(e−x cosx, e−x sin x) : x ≥ 0}. (h) S {(x, y) : x3/y + y3/x > 0}.

2. Let {xn} be a convergent sequence in X with xn → x0. Prove that theset {x0, x1, x2, . . .} is compact.

3.S Prove that a finite union of totally bounded (compact) sets is totallybounded (compact).

Metric Spaces 261

4.S Prove that the intersection of an arbitrary family of compact subsets ofa metric space X is compact.

5. Prove that X × Y is compact in the product metric η := d× ρ iff X andY are compact.

6. Prove that the closure of a totally bounded subset of a metric space istotally bounded.

7.S Prove that a subset E of a complete metric space X is totally boundediff every sequence in E has a cluster point in X.

8. Prove that in(C([0, 1]), ‖ · ‖∞

), the closed ball with radius 1 and center

the zero function is not compact.

9. Let C0([0,+∞)) be the vector subspace of B([0,+∞)) consisting of all real-valued continuous functions f on [0,+∞) such that limx→+∞ f(x) = 0.Prove that C0([0,+∞)) is closed in the uniform norm and that the closedball C1(0) in C0([0,+∞)) with radius 1 and center the zero function isnot compact and therefore is not totally bounded.

10. For n ∈ N, define fn ∈ B([0,+∞)) by fn = 1 on [n, n + 1] and zeroelsewhere. Prove that the set E := {f1, f2, . . .} is bounded but not totallybounded in the sup metric.

11.S (Cantor’s intersection theorem). Let C1, C2, . . . be a sequence ofnonempty compact subsets of a metric space X such that Cn+1 ⊆ Cnfor all n. Prove that

⋂∞n=1 Cn 6= ∅.

12. A collection of subsets of a metric space X is said to have the finite inter-section property if every finite subcollection has a nonempty intersection.Prove that X is compact iff every collection of closed subsets ofX withthe finite intersection property has a nonempty intersection.

13.S The diameter of a nonempty subset A of (X, d) is defined by

d(A) := sup {d(a, b) : a, b ∈ A} .

(a) Prove that if A is compact, then there exist points a, b ∈ A such thatd(A) = d(a, b).(b) Give an example of a closed and bounded set A in a metric spacesuch that d(A) > d(a, b) for all a, b ∈ A.

14. ⇓3 The distance between nonempty subsets A and B of (X, d) is definedas

d(A,B) := inf {d(a, b) : a ∈ A, b ∈ B} .


(a) Prove that if A and B are disjoint with A closed and B compact,then d(A,B) > 0.(b) Show by example that the conclusion in (a) is false if B is merelyclosed.(c) Show that if both sets are compact, then there exist a ∈ A and b ∈ Bsuch that d(A,B) = d(a, b).

15.S ⇓4 Let A be a nonempty subset of X and define d(A, ·) : X → R byd(A, x) = d(A, {x}) (see Exercise 14). Prove the following:(a) |d(A, x)− d(A, y)| ≤ d(x, y), hence d(A, ·) is uniformly continuous.(b) d(A, x) = 0 iff x ∈ cl(A).(c) If A and B are disjoint closed sets, then the function

FAB(x) = d(x,A)d(x,A) + d(x,B) , x ∈ X,

is well-defined and continuous, 0 ≤ FAB ≤ 1 on X, and

A = {x : FAB(x) = 0}, B = {x : FAB(x) = 1}.

(d) If A and B are disjoint closed sets of X, then there exist disjointopen sets U and V such that A ⊆ U and B ⊆ V . (U and V are then saidto separate A and B.)

16. Referring to 8.1.10, show that the set {f ∈ `∞ : |f(n)| ≤ e−n} iscompact. Is {f ∈ B([1,+∞)) : |f(x)| ≤ e−x} compact?

17. (Lebesgue’s number). Let X be compact and let U = {Ui : i ∈ I} be anopen cover of X. Prove that there exists a number r > 0 such that everyset with diameter < r (Exercise 13) is contained in some Ui.

18. (Dini’s Theorem). LetX be compact and let fn, g : X → R be continuoussuch that either fn ↓ g or fn ↑ g on X. Prove that the convergence isuniform. (See 7.1.12.)

19.S Let f : [0, 2π) → R2 be defined by f(t) = (cos t, sin t). Show that f iscontinuous, one-to-one, and maps [0, 2π) onto the circle x2 + y2 = 1 buthas a discontinuous inverse.

20. Let f : R2 → R be defined by f(x, y) = x. Prove or disprove:

(a) If E ⊆ R2 is closed, then f(E) is closed.

(b) If E ⊆ R2 is open, then f(E) is open.

Metric Spaces 263

21.S Let A and B be compact subsets of R. Prove that the sets

AB := {ab : a ∈ A, b ∈ B} and A+B := {a+ b : a ∈ A, b ∈ B}

are compact.

22. ⇓5 Let a sequence of continuous functions fn : (X, d)→ (Y, ρ) convergeuniformly to f on X, let C ⊆ X be compact, and let U ⊆ Y be open.Prove that if f(C) ⊆ U , then fn(C) ⊆ U for all sufficiently large n.

*8.6 The Arzelà–Ascoli Theorem

Throughout this section, (X, d) and (Y, ρ) denote arbitrary metric spacesand C(X,Y ) denotes the set of all continuous functions from X to Y .

As noted in the previous section, closed and bounded subsets in infinitedimensional spaces such as C

([0, 1]

)need not be compact. The additional

property of equicontinuity is needed to characterize compact subsets of suchspaces.

8.6.1 Definition. A family F of functions in C(X,Y ) is said to be

• equicontinuous at a point a ∈ X if, for each ε > 0, there exists δ > 0such that ρ

(f(x), f(a)

)< ε for all x ∈ X with d(x, a) < δ and all f ∈ F ;

• equicontinuous on E ⊆ X if F is equicontinuous at each point of E;

• uniformly equicontinuous on E if, for each ε > 0, there exists δ > 0 suchthat ρ

(f(x), f(y)

)< ε for all f ∈ F and all x, y ∈ E with d(x, y) < δ.♦

The distinguishing feature of equicontinuity is that, while δ may varywith the point a, it is independent of the functions f ∈ F . With uniformequicontinuity, δ is independent of both f and a.

8.6.2 Example. For each x, t ∈ R, define ft(x) = tx. Let I = (c, d) be abounded interval and set M = max{|c|, |d|}. The inequality

|ft(x)− ft(y)| = |t| |x− y| ≤M |x− y|, t ∈ I,

shows that the collection of functions {ft : t ∈ I} is uniformly equicontinuouson R. On the other hand, the larger collection {ft : t ∈ R} is not equicontinuousat any a ∈ R. Indeed, no δ can be chosen so that |tx − ta| < 1 for all t ∈ Rand all x ∈ R with |x− a| < δ. ♦

A straightforward modification of the proof of 8.5.13 yields

8.6.3 Theorem. If X is compact and F is equicontinuous on X, then F isuniformly equicontinuous.

8.6.4 Definition. A metric space is said to have the Bolzano–Weierstrassproperty if every bounded sequence has a cluster point. ♦

A compact metric space and the space Rn have the Bolzano–Weierstrassproperty, while infinite discrete metric spaces, the space Q, and the infinitedimensional space

(C([0, 1]

), ‖ · ‖∞

)do not.

8.6.5 Proposition. (a) A metric space with the Bolzano–Weierstrass propertyis complete.(b) A metric space has the Bolzano–Weierstrass property iff every closed andbounded set is compact.

Proof. For (a), use the fact that a Cauchy sequence is bounded and applyExercise 8.1.9. Part (b) follows from 8.5.8.

The following lemma may be proved using familiar ideas such as thosefound in 8.1.10. The details are left to the reader.

8.6.6 Lemma. Let (X, d) be compact and (Y, ρ) complete. For f, g ∈ C(X,Y )define

σ(f, g) = supx∈X

ρ(f(x), g(x)).

Then σ is a metric on C(X,Y ), and C(X,Y ) is complete in this metric.

8.6.7 Lemma. A compact metric space X has a countable dense subset ofthe form D =

⋃∞k=1 Fk, where Fk is a finite (1/k)-net for X.

Proof. For each k ∈ N, the collection {B1/k(x) : x ∈ X} is an open cover ofX, hence has a finite subcover {B1/k(x) : x ∈ Fk}. By definition of ε-net,⋃∞k=1 Fk is dense in X.

8.6.8 Arzelà–Ascoli Theorem. Let X be compact and let Y have theBolzano–Weierstrass property. Then a set F is compact in

(C(X,Y ), σ

)iff it

is closed, bounded, and equicontinuous.

Proof. Suppose F is compact in C(X,Y ), hence closed and bounded. If F isnot equicontinuous at some a ∈ X, then there exists an ε > 0 and for every nmembers xn of X and fn of F such that d(xn, a) < 1/n and

ρ(fn(xn), f(a)) ≥ ε. (8.6)

By compactness of F , we may assume that {fn} converges uniformly to somef ∈ F (otherwise, take a subsequence). Since xn → a, the uniform convergenceof {fn} implies that fn(xn)→ f(a). But this contradicts (8.6). Therefore, Fis equicontinuous.

Metric Spaces 265

Conversely, assume that F is closed, bounded, and equicontinuous and let{fn} be any sequence in F . We show that {fn} has a convergent subsequence.The compactness of F will then follow from 8.5.8.

Let Fk and D = {x1, x2, . . .} be as in 8.6.7. We show first that {fn} hasa subsequence that converges pointwise on D. For this we use the Bolzano–Weierstrass property of Y and the following diagonalization argument: Because{fn} is bounded, we may choose a subsequence {f (1)

n } of {f (0)n := fn} such

that the sequence {f (1)n (x1)} converges to some y1 ∈ Y . We may then choose

a subsequence {f (2)n } of {f (1)

n } such that {f (2)n (x2)} converges to some y2 ∈ Y .

Continuing in this way, we obtain for each k a sequence {f (k)n } such that

{f (k+1)n } is a subsequence of {f (k)

n } and limn f(k)n (xk) = yk. Now take the

diagonal sequence {gn := f(n)n }, which is a subsequence of {fn} and for each

k, except for the first k − 1 terms, is a subsequence of {f (k)n }. It follows that

limn gn(xk) = yk for each k. The scheme may be depicted as follows:

f(1)1 , f

(1)2 , . . . , f (1)

n , . . . → y1 at x1

f(2)1 , f

(2)2 , . . . , f (2)

n , . . . → y2 at x2

...

f(n)1 , f

(n)2 , . . . , f (n)

n , . . . → yn at xn... ↘

yk at each xk

Having obtained a subsequence {gn} of {fn} that converges pointwise onthe dense set D, we now show that {gn} converges uniformly on X, which willcomplete the proof.

By the uniform equicontinuity of {gn}, given ε > 0, we may choose δ > 0such that

ρ(gn(x), gn(y)

)< ε/3, for all n ∈ N and x, y ∈ X with d(x, y) < δ. (8.7)

Let k > 1/δ. Since {gn} converges pointwise on Fk and Fk is finite, we maychoose Nk so that

ρ(gn(y), gm(y)

)< ε/3, for all n, m ≥ Nk and all y ∈ Fk. (8.8)

Since Fk is a δ-net, given x ∈ X, there exists y ∈ Fk such that d(x, y) < δ. Itfollows from (8.7) and (8.8) that for m, n ≥ Nk,

ρ(gn(x), gm(x)

)≤ ρ(gn(x), gn(y)

)+ ρ(gn(y), gm(y)

)+ ρ(gm(y), gm(x)

)< ε/3 + ε/3 + ε/3 = ε.

Since x was arbitrary, {gn} is a Cauchy sequence in C(X,Y ). Since C(X,Y ) iscomplete, {gn} converges in C(X,Y ).


Remark. The proof of the sufficiency of the theorem did not require thatF be uniformly bounded. All that was used was the property of pointwiseboundedness, that is, {f(x) : f ∈ F} bounded in Y for each x ∈ X. Uniformboundedness is then a consequence of equicontinuity. ♦

8.6.9 Example. Let X be compact. Then any convergent sequence of func-tions fn in C(X,R), say fn → f , is equicontinuous. This may be verifieddirectly, but a quick proof uses 8.6.8 applied the set {f, f1, f2, . . .}, whosecompactness is readily established. ♦

Exercises1. Let X × Y have the product metric η := d× ρ and let f : X → Y . The

graph of f is the set

G(f) = {(x, y) : x ∈ X and y = f(x)}.

Prove that if f is continuous, then G(f) is closed in X × Y . Conversely,prove that if G(f) is closed, f(X) is bounded, and Y has the Bolzano–Weierstrass property, then f is continuous. Give an example of a real-valued discontinuous function on [0, 1] with a closed graph.

2. Let X have the Bolzano–Weierstrass property and let {xn} be a boundedsequence in X with only finitely many cluster points y1, . . . , yk. Provethat the set C := {y1, . . . , yk, x1, x2, . . .} is compact.

3.S Prove that a subset F of C(X,Y ) is equicontinuous at a ∈ X iff for anysequences {fn} in F and {xn} in X with xn → a, ρ

(fn(xn), fn(a)

)→ 0.

4. Prove that a subset F of C(X,Y ) is uniformly equicontinuous on E ⊆ Xiff for any sequences {fn} in F and {xn}, {an} in E with d(xn, an)→ 0,ρ(fn(xn), fn(an)

)→ 0.

5. Prove that a finite set of uniformly continuous functions f : X → Y isuniformly equicontinuous.

6. Prove that the uniform closure of a set F ⊆ C(X,Y ) of uniformlyequicontinuous functions is uniformly equicontinuous.

7.S Let c, p > 0 and define fn(x) = (nx)−p, x ≥ c. Show that the sequence{fn} is uniformly equicontinuous.

8. Define fn(x) = ln(n + x). Show that the sequence {fn} is uniformlyequicontinuous on (0,+∞).

9.S Define fn(x) = sin(nx). Use Exercise 3 and Exercise 8.3.13 to showthat the sequence {fn} is not equicontinuous at any nonzero rationalnumber r.

Metric Spaces 267

10. Let M > 0 and define

RM := {f : f is locally integrable on [0,+∞) and ‖f‖∞ ≤M} .

For f ∈ RM defineFf (x) =

∫ x

0f, x ≥ 0.

Prove that the set F := {Ff : f ∈ RM} is uniformly equicontinuous on[0,+∞).

11.S Let M > 0 and define

DM := {f : (a, b)→ R : |f ′(x)| ≤M for all a < x < b} .

Show that DM is uniformly equicontinuous. Conclude that if g hasa bounded derivative on R, then the set of functions {gt : t ∈ R} isuniformly equicontinuous on I, where gt(x) = g(t+ x).

12. Let f : X × Y → R have the property that f(x, y) is continuous in y foreach fixed x and continuous in x for each fixed y. Define

F := {f( · , y) : y ∈ Y } .

Prove:

(a) If F is equicontinuous, then f is continuous.

(b) If f is continuous and Y is compact, then F is equicontinuous.

13. Let X be compact. Show that a totally bounded subset of C(X,Y ) isuniformly equicontinuous.

14.S Let {fi : i ∈ I} be a uniformly bounded subset of Rba. Define

Fi(x) :=∫ x

a

fi(t) dt, a ≤ x ≤ b.

Show that {Fi : i ∈ I} is a totally bounded subset of C([a, b]).

15. Let f(t, x, y) be continuous on [a, b]3 and define ft(x, y) = f(t, x, y).Prove that the family {ft : t ∈ [a, b]} is uniformly equicontinuous on[a, b]2. Apply this to the function

f(t, x, y) = 1 + t sin x2 + t sin y on [0, 1]3.


8.7 Connected SetsThroughout this section, (X, d) and (Y, ρ) denote arbitrary metric spaces.

8.7.1 Definition. A pair (U, V ) of open sets in X is said to separate X if

X = U ∪ V, U 6= ∅, V 6= ∅, and U ∩ V = ∅.

The pair (U, V ) is then called a separation of X. The space X is said to beconnected if it has no separation, and disconnected otherwise. A subset E ofX is connected if it is connected as a subspace of X. ♦

It follows from the definition that if E is disconnected, then there existsets U , V open in X such that (E ∩U,E ∩V ) is a separation of E. The sets Uand V need not be disjoint in this definition; however the next theorem showsthat this useful state of affairs may always be achieved. In this case we shallcall (U, V ) a separation of E.

8.7.2 Theorem. A subset E of X is disconnected iff there exists a separation(E ∩ U,E ∩ V ) of E such that U ∩ V = ∅.

EU V

FIGURE 8.7: A separation (U, V ) of E.

Proof. The sufficiency is clear. For the necessity, assume that E is disconnectedand that (E ∩ U1, E ∩ V1) is a separation of E. Here, U1 and V1 are open inX but may not be disjoint. However, since E ∩ U1 and E ∩ V1 are disjoint,clE(E ∩ U1) ∩ V1 = ∅. Indeed, if, to the contrary, x ∈ clE(E ∩ U1) ∩ V1 forsome x, then there would be a sequence {xn} in E ∩ U1 converging to x,which would imply that eventually xn ∈ E ∩ V1, impossible. Recalling thatclE(E ∩ U1) = E ∩ clX(U1), we now see that

v 6∈ clX(U1) for each v ∈ E ∩ V1.

Similarly,u 6∈ clX(V1) for each u ∈ E ∩ U1.

By Exercise 8.5.14 it follows that for u ∈ E ∩ U1 and v ∈ E ∩ V1 the distances

r(u) := inf{d(u, x) : x ∈ clX(V1)} and s(v) := inf{d(v, x) : x ∈ clX(U1)}

Metric Spaces 269

are positive. Define

U =⋃

u∈E∩U1

Br(u)/2(u), and V =⋃

v∈E∩V1

Bs(v)/2(v).

Clearly, U and V are open in X and contain E ∩ U1 and E ∩ V1, respectively.To prove that (U, V ) is a separation of E, it remains to show that U ∩ V = ∅.

Suppose the the contrary that there exists a point x ∈ U ∩ V . Then, bythe above,

d(x, u) < r(u)/2 for some u ∈ U1 and d(x, v) < s(v)/2 for some v ∈ V1.

Adding and using the triangle inequality we have

d(u, v) < r(u)/2 + s(v)/2.

On the other hand, by definition of r(u) and s(v),

d(u, v) ≥ r(u) and d(u, v) ≥ s(v),

henced(u, v) ≥

[r(u) + s(v)

]/2

This contradiction shows that U ∩ V = ∅ and completes the proof of thetheorem.

In any metric space, the empty set and the singletons {x} are triviallyconnected, but no other finite subsets are connected. In a discrete space theonly connected sets are the empty set and the singletons. The set Q is notconnected in R, since the open sets (−∞,

√2) and (

√2,+∞) separate Q.

8.7.3 Theorem. X is not connected iff there exists a continuous functionfrom X onto {0, 1}. Equivalently, X is connected iff every continuous functionfrom X into {0, 1} is constant.

Proof. Assume that X is not connected and let (U, V ) separate X. Define

g(x) ={

0 if x ∈ U ,1 if x ∈ V .

Then g maps X onto {0, 1}. Let W be any open set in R. Then g−1(W ) isone of the sets ∅, U , V , or X, each of which is open in X. Therefore, g iscontinuous.

Conversely, if a continuous function g from X onto {0, 1} exists, then theopen sets g−1((−1, 1/2)) and g−1((1/2, 2)) separate X.

8.7.4 Corollary. The nonempty connected subsets of R are the intervals.

Proof. By the intermediate value theorem, there can be no continuous functionfrom an interval onto {0, 1}. Hence intervals must be connected.

Now let E be a nonempty subset of R that is not an interval. Choosereal numbers a < c < b with a, b ∈ E but c 6∈ E. Then (−∞, c) and (c,+∞)separate E, hence E is not connected.

The following is a generalization of the intermediate value theorem.

8.7.5 Corollary. If f : X → Y is continuous and X is connected, then f(X)is connected.

Proof. Let g : f(X) → {0, 1} be continuous. Then g ◦ f : X → {0, 1} iscontinuous and hence must be constant. It follows that g itself must beconstant.

8.7.6 Corollary. If A ⊆ X is connected and A ⊆ B ⊆ cl(A), then B isconnected. In particular, the closure of a connected set is connected.

Proof. Let g : B → {0, 1} be continuous. Then g|A is continuous, hence mustbe constant. Since B ⊆ cl(A), g itself must be constant. Therefore, A isconnected.

The converse of 8.7.6 is false. For example, cl(Q) = R is connected but Qis not.

8.7.7 Definition. A path in X from x to y is a continuous function ϕ froman interval [a, b] to X such that ϕ(a) = x, the initial point of the path, andϕ(b) = y, the terminal point. X is said to be path connected if for each pair ofpoints x, y ∈ X there exists a path in X from x to y. A subset E of X is pathconnected if it is path connected as a subspace of X. ♦

Note that if ϕ : [a, b]→ X is a path from x to y, then

−ϕ(t) := ϕ(−t), −b ≤ t ≤ −a,

defines a path from y to x. Also, if ϑ : [c, d]→ X is a path from y to z, thenthe sum or concatenation ϕ+ ϑ : [0, 2] → X of the paths ϕ and ϑ is a pathfrom x to z, where

(ϕ+ ϑ)(t) ={ϕ(a+ (b− a)t

)if 0 ≤ t ≤ 1,

ϑ(c+ (d− c)(t− 1)

)if 1 ≤ t ≤ 2.

A convex subset C of a normed vector X is path connected. Indeed, ifx, y ∈ C, then the line segment

ϕ(t) := (1− t)x+ ty, 0 ≤ t ≤ 1,

joins x to y and lies in C. In particular, open and closed balls in X are pathconnected.

Metric Spaces 271

8.7.8 Theorem. If X is path connected, then it is connected.

Proof. Let g : X → {0, 1} be a continuous function, let x, y ∈ X, and letϕ[a, b]→ X be a path from x to y. Then g ◦ ϕ : [a, b]→ {0, 1} is continuousand, because [a, b] is connected, must be constant. In particular,

g(x) = (g ◦ α)(a) = (g ◦ α)(b) = g(y).

Since x and y were arbitrary, g is constant.

8.7.9 Example. The subset B1(−1, 0)∪B1(1, 0) of R2 is not connected, hencenot path connected.

C1(−1, 0) C1(1, 0)

x y

(−1, 0) (1, 0)

FIGURE 8.8: C1(−1, 0) ∪ C1(1, 0) is path connected.

However, its closure C1(−1, 0) ∪ C1(1, 0) is path connected, as can be seenfrom the figure, hence is connected. ♦

8.7.10 Example. A sphere in Rn, n > 1, is path connected, hence connected.For example, consider the sphere

S = {x ∈ Rn : ‖x‖2 = 1} .

We show that there is a path from the point a = (1, 0, . . . , 0) to any pointb = (b1, b2, . . . , bn). It will then follow that any pair of points in S may bejoined by a path in S through a.

If b = (−1, 0, . . . , 0), then (cos t, sin t, 0, . . . , 0), 0 ≤ t ≤ π, is such a path.Suppose b 6= (−1, 0, . . . , 0). Then the line segment

ϕ(t) = (1− t)a+ tb = (1− t+ tb1, tb2, . . . , tbn), 0 ≤ t ≤ 1,

is never zero, hence ‖ϕ(t)‖−12 ϕ(t) is a path from a to b in S. ♦

The converse of 8.7.8 is false, as the following example—the topologist’ssine curve (8.3.7)—demonstrates.

8.7.11 Example. Let

A = {(x, sin(1/x)) : 0 < x < 2/π}, B = {0} × [−1, 1], and E = A ∪B.

Since A is connected and E = cl(A), 8.7.6 shows that E is connected. However,E is not path connected. Indeed, no point in A can be joined to a point in Bby a path in E. Suppose such a path existed, say ϕ : [a, b]→ E, where

ϕ(t) =(x(t), y(t)

), ϕ(a) ∈ A, and ϕ(b) ∈ B.

LetS :=

{t ∈ [a, b] : ϕ

([a, t]

)⊆ A

}.

Since S is nonempty and bounded, c := supS exists and c ∈ [a, b]. Note thatx(t) > 0 on S. If x(c) > 0, then c < b, hence, by continuity, x(s) is positive on[a, c+ δ] for some δ > 0, contradicting the definition of c. Therefore, x(c) = 0and x(t) > 0 on [a, c). This implies that ϕ(t) =

(x(t), sin(1/x(t))

)on [a, c) and

limt→c− x(t) = 0. By continuity, for each δ > 0 the set x([c−δ, c]) is an intervalof the form [0, d], d > 0. Therefore, y(t) = sin(1/x(t)) takes on all values in[−1, 1] on each interval [c− δ, c), which implies that limt→c− y(t) cannot exist.But this contradicts the continuity of ϕ at c. ♦

While there is no strict converse to 8.7.8, the next theorem provides apartial converse.

8.7.12 Theorem. An open connected subset E of a normed vector space Xis path connected.

Proof. Fix a point x ∈ E and let U denote the set of all points u ∈ E forwhich there exists a path in E from x to u. We claim that U is open. Let

xBr(u0)

u0

u

E

FIGURE 8.9: E is path connected.

u0 ∈ U and choose r > 0 such that Br(u0) ⊆ E. By definition of U , thereexists a path in E from x to u0. Since Br(u0) is convex, there exists a linesegment in Br(u0) from u0 to any point u ∈ Br(u0). The sum of these pathsis then a path in E from x to u. Therefore, Br(u0) ⊆ U , which shows thatU is open. A similar argument shows that V := E \ U is open. Since E isconnected and x ∈ U , V = ∅. Therefore, E = U .

Metric Spaces 273

Exercises1. Determine which sets are connected in R2:

(a) B1(−1, 0) ∪ {(0, 0)} ∪B1(1, 0).(b) R2 \ {(1/m, 1/n) : m,n ∈ N}.(c)S Q2.(d)S R2 \Q2.(e)S {(x, sin(1/x)) : x 6= 0} ∪ {(0, a)}.(f) R2 \G, where G is the graph of a bounded function f : [a, b]→ R.(g) R2 \G, where G is the graph of an equation F (x, y) = 0.(h) {(x, y, z) : x2 + y2 − z2 = 1}.(i) {(x, y, z) : x2 + y2 − z2 = −1}.(j) {(x, y, z) : x2 + y2 − z2 = 0, 0 < x2 + y2 ≤ 1}.

2. Prove that a metric spaceX is connected iff it has no proper nonemptysubset that is both open and closed.

3. Prove that X is connected iff it cannot be expressed as the union ofnonempty sets A and B such that

A ∩ clX(B) = clX(A) ∩B = ∅.

Hint. Use 8.7.3 and the sequential characterization of continuity.

4. Prove that X × Y is connected in the product metric d× ρ iff X and Yare connected.

5.S Let X be connected and f : X → R continuous. Suppose there existu, v ∈ X such that f(u)f(v) < 0. Show that the equation f(x) = 0 hasa solution.

6. Let X be connected and f : X → Y continuous. Suppose f has theproperty that for each x ∈ X there exists ε > 0, possibly depending onx, such that f is constant on Bε(x). Prove that f is constant on X.

7.S Let X be connected and let g, h : X → R be continuous such thatg(x) 6= h(x) for all x ∈ X. Prove that g > h or h > g on X.

8. ⇓6 Let X be a normed vector space and u, v ∈ X . A polygonal pathP from u to v is a finite sequence of line segments Lk = [xk : xk+1],k = 1, . . . , n − 1, where x1 = u and xn = v. The path P is non-overlapping if Lj ∩ Lk = ∅ unless j = k − 1, in which case Lj ∩ Lk = xk.A subset E of a normed vector space X is polygonally connected if for

each pair of points u and v in E there exists a polygonal path from uto v contained in E. For example, a convex set is polygonally connected.Prove that every open connected subset E of X is polygonally connected.Show also that it is always possible to choose P to be non-overlapping

9.S Show that for n > 1 the complement of an open ball or a closed ball inRn is path connected, hence connected.

10. Suppose A ⊆ X is connected. By 8.7.6, cl(A) is connected. Prove ordisprove: (a) int(A) is connected, (b) bd(A) is connected.

11. The exterior ext(E) of a subset E of a metric space X is defined as theinterior of Ec. Show that X = int(E)∪bd(E)∪ext(E). Conclude that Xis connected iff every subset of X with nonempty interior and nonemptyexterior also has a nonempty boundary.

12.S Let {An} be a finite or infinite sequence of connected subsets of X suchthat An ∩An+1 6= ∅ for each n. Prove that

⋃nAn is connected.

13. Let {Ai : i ∈ I} be a collection of nonempty connected sets and i0 ∈ Isuch that Ai ∩Ai0 6= ∅ for all i. Prove that

⋃iAi is connected.

14. Let {An} be an infinite sequence of compact connected subsets of Xsuch that An+1 ⊆ An. Prove that

⋂nAn is connected.

15. Let X = A1 ∪ · · · ∪Ap and Y = B1 ∪ · · · ∪Bq, p < q, where Aj and Bjare connected, the Aj ’s are pairwise disjoint, and the Bj ’s are pairwisedisjoint and closed. Show that no continuous function f : X → Y canmap X onto Y .

16.S Prove that no one-to-one continuous function can map a closed linesegment L onto a circle C. Show, however, that there are continuousfunctions that can do this.

17. Suppose closed line segments L1, L2, L3 in the plane meet at a singleendpoint P . Show that no one-to-one continuous function can map aclosed line segment L onto L1 ∪ L2 ∪ L3. Show, however, that there arecontinuous functions that can do this.

18. Let C1 and C2 be tangent circles in the plane. Show that no one-to-onecontinuous function can map C1 ∪ C2 onto a circle C. Show, however,that there are continuous functions that can do this.

19. Show that no one-to-one continuous function can map the set

E := {(x, y, z) : x2 + y2 = z2, x2 + y2 ≤ 1}

onto a closed disk D. Show, however, that there are continuous functionsthat can do this.

Metric Spaces 275

20.S Let X be a normed vector space and f : X → R continuous. Let

A := {x ∈ X : f(x) ≥ c} and B := {x ∈ X : f(x) = c}.

Prove that bd(A) ⊆ B and that the inclusion may be strict.

21. Let X be connected and have at least two points. Show that X isuncountable. Hint. For all sufficiently small r > 0, X 6= Br(x) ∪ Ccr(x).

22.S Let U be an open subset of a normed vector space X and let x ∈ U . Thecomponent of U containing x is the union Cx of all connected subsets ofU containing x.(a) Prove that Cx is open and connected and that U is a union of pairwisedisjoint components.(b) Show that the number of components is countable if X is a Euclideanspace Rn.

23. Let (X, d) be complete, (Y, ρ) connected, c > 0, and let f : X → Y be acontinuous mapping such that f(X) is open and

ρ(f(u), f(v)

)≥ c d(u, v) for all u, v ∈ X.

Prove that Y is complete.

8.8 The Stone–Weierstrass TheoremLet (X, d) be a compact metric space and let C(X) denote the space of

all continuous real-valued functions of X with the supremum norm ‖f‖∞ =supx∈X |f(x)|. A member f of C(X) is said to be uniformly approximated bymembers of a subset S of C(X) if f ∈ cl(S). This is equivalent to the existenceof a sequence {fn} in S converging uniformly to f on X.

Weierstrass’s approximation theorem asserts that any function in C([a, b])

may be uniformly approximated by polynomials. Stone’s generalization ofWeierstrass’s theorem replaces [a, b] by a compact metric space7 and the set ofpolynomials by a more general class of functions.

The proof of Weierstrass’s theorem given below is due to Lebesgue. Thebasic idea is to show that every continuous function may be uniformly approx-imated by piecewise linear functions and that these in turn may be uniformlyapproximated by polynomials.

7more generally, by a compact Hausdorff topological space.

8.8.1 Definition. Let a = x0 < x1 < . . . < xk = b. A function g on [a, b] issaid to be piecewise linear with vertices (xj , yj) if, for j = 0, 1, . . . , k − 1,

g(x) = yj +mj(x− xj), mj = yj+1 − yjxj+1 − xj

, xj ≤ x ≤ xj+1. ♦

Note that a piecewise linear function is necessarily continuous and thatits graph consists of a sequence of line segments joined at the vertices. (SeeFigure 8.10.)

a x1 x2 x3 x4 b

y0

y1y2

y5

y4

y3

x

y

FIGURE 8.10: A piecewise linear function.

8.8.2 Lemma. Every continuous function f on [a, b] may be uniformly ap-proximated by a piecewise linear function.

Proof. Given ε > 0, choose δ > 0 such that |f(x) − f(y)| < ε/2 whenever|x − y| ≤ δ. Let x0 = a < x1 < · · · < xk = b be a partition of [a, b] withmesh < δ and let g be as in 8.8.1 with yj = f(xj). If xj ≤ x ≤ xj+1, then

|mj |(x− xj) = |f(xj+1)− f(xj)|x− xj

xj+1 − xj≤ |f(xj+1)− f(xj)| < ε/2,

hence|f(x)− g(x)| ≤ |f(x)− f(xj)|+ |mj |(x− xj) < ε.

8.8.3 Lemma. The function g in 8.8.1 may be written

g(x) = y0 +k−1∑j=0

cj(x− xj)+, a ≤ x ≤ b,

for suitably chosen constants cj.

Proof. For 0 ≤ j ≤ k − 1 and xj ≤ x ≤ xj+1, the desired equation reduces to

yj +mj(x− xj) = y0 +j∑i=0

ci(x− xi) = y0 −j∑i=0

cixi + x

j∑i=0

ci.

Metric Spaces 277

This holds iff

mj =j∑i=0

ci, and yj −mjxj = y0 −j∑i=0

cixi. (8.9)

The first equation in (8.9) is satisfied by taking c0 = m0 and cj = mj −mj−1,j ≥ 1. For this choice,

y0 −j∑i=0

cixi = y0 +j∑i=1

mi−1xi −j∑i=0

mixi

= y0 −mjxj +j−1∑i=0

mi(xi+1 − xi)

= y0 −mjxj +j−1∑i=0

(yi+1 − yi)

= yj −mjxj ,

which shows that the second equation in (8.9) is also satisfied.

8.8.4 Lemma. The functions |x| and x+ may be uniformly approximated bypolynomials on any bounded interval I.

Proof. By 7.4.10, the binomial series∞∑n=0

(1/2n

)(−t)n

converges uniformly to√

1− t on [−1, 1]. Setting t = 1− x2 we see that∞∑n=0

(1/2n

)(x2 − 1)n

converges uniformly to√x2 = |x| on [−1, 1]. Thus if sn(x) denotes the nth

partial sum of the last series and m is chosen so that I ⊆ [−m,m], thenQn(x) := msn(x/m) defines a sequence of polynomials converging uniformlyto |x| on I. Since x+ = 1

2 (x + |x|), the polynomials Pn(x) := 12(x + Qn(x)

)converge uniformly to x+ on I.

8.8.5 Weierstrass Approximation Theorem. The set of all polynomialson [a, b] is dense in C([a, b]). That is, every member of C([a, b]) may be uniformlyapproximated by polynomials.

Proof. Let f ∈ C([a, b]) and ε > 0. By 8.8.2, there exists a piecewise linearfunction g on [a, b] such that ‖f − g‖∞ < ε/2. By 8.8.3 and 8.8.4, there existsa polynomial P such that ‖P − g‖∞ < ε/2. Then, by the triangle inequality,‖f − P‖∞ < ε.


For the statement of the Stone–Weierstrass theorem, we need the followingdefinitions.

8.8.6 Definition. A collection A of real-valued functions on a set S is saidto be an algebra if A is closed under addition, multiplication, and scalarmultiplication; that is,

f, g ∈ A and α ∈ R⇒ f + g, fg, αf ∈ A.

A is said to separate points of S if for each pair of distinct points s and t in Sthere exists f ∈ A such that f(s) 6= f(t). ♦

For example, the collection of all polynomials on [a, b] is an algebra thatseparates points of [a, b].

8.8.7 Stone–Weierstrass Theorem. Let X be a compact metric space andlet A be an algebra in C(X) that contains the constant functions and separatespoints of X. Then A is dense in C(X).

Proof. Set B := cl(A). The proof that B = C(X) consists of the followingsequence of steps.

I. B is an algebra in C(X).J If fn, gn ∈ A, fn → f , gn → g, and α ∈ R, then

(a) ‖αfn − αf‖∞ = |α| |fn − f‖∞ → 0,(b) ‖(fn + gn)− (f + g)‖∞ ≤ ‖fn − f‖∞ + ‖gn − g‖∞ → 0, and(c) ‖fngn − fg‖∞ ≤ ‖fngn − fgn‖∞ + ‖fgn − fg‖∞

≤ ‖gn‖∞‖fn − f‖∞ + ‖f‖∞‖gn − g‖∞→ 0,

the convergence in (c) holding because {gn} is uniformly bounded. (Eachgn is bounded and gn converges uniformly to a bounded function.) ThusB is closed under addition, multiplication, and scalar multiplication. K

II. f ∈ B ⇒ |f | ∈ B.J Let M = ‖f‖∞. By 8.8.4 there exists a sequence of polynomials Pn(x)converging uniformly to |x| on [−M,M ]. It follows that Pn ◦ f convergesuniformly to |f | on X. Because B is an algebra containing the constants,Pn ◦ f ∈ B. Indeed, if Pn(x) =

∑kj=0 ajx

j , then Pn ◦ f =∑kj=0 ajf

j .Since B is closed, |f | ∈ B. K

III. f1, . . . , fk ∈ B ⇒ max{f1, . . . , fk}, min{f1, . . . , fk} ∈ B.J By induction, it suffices to consider the case k = 2. This follows fromstep II and the identities

max{f1, f2} = 12(f1 + f2 + |f1 − f2|

),

min{f1, f2} = 12(f1 + f2 − |f1 − f2|

). K

Metric Spaces 279

IV. Let f ∈ C(X). Then for each pair of distinct points x, y in X there existsa function gxy ∈ A such that gxy(x) = f(x) and gxy(y) = f(y).J Choose a function h ∈ A such that h(x) 6= h(y) (A separates points).Define

gxy(z) = f(x) + f(x)− f(y)h(x)− h(y)

(h(z)− h(x)

), z ∈ X.

Because A contains the constant functions, gxy ∈ A. Clearly, gxy(x) =f(x) and gxy(y) = f(y). K

V. If f ∈ C(X), x ∈ X, and ε > 0, then there exists a function gx ∈ B suchthat

gx(x) = f(x) and gx(z) < f(z) + ε for all z ∈ X.

J By continuity, for each y ∈ X the set

Uy := {z ∈ X : gxy(z) < f(z) + ε}

is open in X, where gxy is the function in step IV. Moreover, Uy containsboth x and y. Since X is compact, there exist y1, . . . yk ∈ X such that

X = Uy1 ∪ · · · ∪ Uyk .

Set gx := min{gxy1 , . . . , gxyk}. Then gx clearly has the required propertiesand, by step III, gx ∈ B. K

VI. If f ∈ C(X) and ε > 0, then there exists a function g ∈ B such that

f(z)− ε < g(z) < f(z) + ε, for all z ∈ X.

J By continuity, for each x ∈ X the set

Vx := {z ∈ X : gx(z) > f(z)− ε}

is open in X, where gx is the function in step V. Moreover, Vx clearlycontains x and

f(z)− ε < gx(z) < f(z) + ε, for all z ∈ Vx.

Since X is compact, there exist x1, . . . , xm ∈ X such that

X = Vx1 ∪ · · · ∪ Vxk .

Set g := max{gx1 , . . . , gxm}. By step III, g ∈ B, and g clearly satisfiesthe desired inequality. K

To complete the proof of the theorem, observe that step VI asserts thatC(X) = cl(B). Since B is closed, C(X) = B.

8.8.8 Example. A trigonometric polynomial is a function on R of the form

T (x) = a0 +m∑j=1

aj cos(jx) + bj sin(jx), aj , bj ∈ R.

The collection T ([a, b]) of all trigonometric polynomials on the interval [a, b]clearly contains the constant functions and is closed under addition and scalarmultiplication. Since

sin jx sin kx = 12[

sin(j − k)x+ sin(j + k)x],

with similar identities holding for sin jx cos kx and cos jx cos kx, T ([a, b]) isan algebra.

If 0 < b− a < 2π, then {cosx, sin x}, and hence T ([a, b]), separate pointsof [a, b]. By the Stone–Weierstrass theorem, every member ofC([a, b]) may beuniformly approximated by trigonometric polynomials on [a, b].

If b− a = 2π, then T ([a, b]) no longer separates points of [a, b]. However,in this case every member f of C([a, b]) with f(a) = f(b) may be uniformlyapproximated by a trigonometric polynomial. We verify this for the interval[0, 2π]. Let E denote the algebra of continuous functions f : [0, 2π]→ R withf(0) = f(2π), and let X denote the circle x2 + y2 = 1 with the Euclidean R2

metric. For each f ∈ E , define Ff : X → R by

Ff (cos t, sin t) = f(t), 0 ≤ t ≤ 2π.

It is straightforward to verify that Ff is continuous. For example, if(cos tn, sin tn) → (1, 0), then every convergent subsequence {tnk} convergeseither to 0 or to 2π, hence

Ff (cos tnk , sin tnk) = f(tnk)→ f(0) = f(1) = Ff (1, 0).

The setA := {FT : T ∈ T ([0, 2π])}

is easily seen to be an algebra that contains the constant functions. Moreover,A separates points of X. Indeed, if x := (cos s, sin s) and y := (cos t, sin t)with x 6= y, then, say, cos s 6= cos t hence FT (x) 6= FT (y), where T (x) = cosx.Therefore, each Ff may be uniformly approximated on X by members of A.It follows that each member of E may be uniformly approximated on [0, 2π]by trigonometric polynomials. ♦

Exercises1. Give an example of a bounded continuous function that cannot be

approximated uniformly by polynomials on (0, 1).

2. Let f be continuous on [a,+∞) such that limx→+∞ f (m)(x) 6= 0 for allsufficiently large m ∈ N. Prove that f cannot be uniformly approximatedby polynomials on [a,+∞). Give an example of such a function.

Metric Spaces 281

3.S Let f ∈ C([a, b]) have the property that∫ baxnf(x) dx = 0 for all n ∈ Z+.

Prove that f = 0 on [a, b]. Show that if a ≥ 0, then it is enough that thegiven property holds for even integers n in Z+.

4. Let f : [a, b]→ R have continuous derivatives up to order k such that∫ b

a

xnf (k)(x) dx = 0 for all n ∈ Z+.

Prove that f is a polynomial.

5. Let f : [a, b]→ R have continuous derivatives up to order k. Prove thatthere exists a sequence of polynomials Pn such that limn P

(j)n = f (j)

uniformly on [a, b] for j = 0, 1, . . . , k.

6.S Let X be compact and let A be an algebra in C(X) that contains theconstant functions and separates the points of X. Let x0 ∈ X and letf ∈ C(X) satisfy f(x0) = 0. Prove that there exists a sequence fn ∈ Aconverging uniformly to f such that fn(x0) = 0 for all n.

7. Show that there exists a sequence of polynomials Pn converging uniformlyto sin x on [0, π] such that Pn(0) = Pn(π) = 0 for all n.

8. Let f be an odd (even) continuous function on [−a, a], a > 0. Prove thatthere is a sequence of odd (even) polynomials that converges uniformlyto f on [−a, a].

9.S Let f ∈ C([0, 2π]) have the properties f(0) = f(2π) and∫ 2π

0f(x) sinm x cosn x dx = 0 for all m, n ∈ Z+.

Prove that f is identically zero on [0, 2π].

10. Let f : R → R be continuous and periodic with period 2π. Prove thatthere exists a sequence of trigonometric polynomials that convergesuniformly to f on R.

11.S Let f ∈ C([−π/2, π/2]) with f(0) = 0. Prove that f can be uniformlyapproximated on [−π/2, π/2] by functions of the form

∑mj=1 bj sin(jx).

12. Let g be continuous and one-to-one on [a, b]. Prove that any functionin C

([a, b]

)may be uniformly approximated by functions of the form∑m

j=0 ajgj .

13. Prove the following version of the Stone–Weierstrass theorem: If V is alinear subspace of C(X) that contains the constant functions, separatespoints of X, and contains |f | for all f ∈ V, then V is dense in C(X).

14. Show that for any f ∈ C([0, 2π]) there exists a sequence of trigonometricpolynomials Tn such that

∫ 2π0 |f − Tn| → 0.

15.S Let X and Y be compact metric spaces and let f(x, y) ∈ C(X × Y ) bea continuous real-valued function on X × Y . Show that for every ε > 0there exist g1, . . . , gn ∈ C(X) and h1, . . . , hn ∈ C(Y ) such that∣∣∣f(x, y)−

n∑i=1

gi(x)hi(y)∣∣∣ < ε for all (x, y) ∈ X × Y .

16. Let E0 denote the algebra of all continuous functions f [a, b] :→ R suchthat f(a) = f(b) = 0. If A0 is an algebra in E that separates points of(a, b) show that A0 is dense in E0 in the uniform norm. Hint. Use ideasof 8.8.8 by considering the algebra generated by A0 and the constantfunctions.

17. Let C0(R) denote the algebra of all continuous functions f on R suchthat limt→±∞ f(t) = 0. Let B0 be an algebra in C0(R) that separatespoints of R. Show that B0 is dense in C0(R) in the uniform norm. Hint.Consider θ(t) = tan−1[(t− π)/2], 0 < t < 2π and use Exercise 16.

*8.9 Baire’s TheoremLet (X, d) be a metric space. The diameter d(E) of a nonempty subset E

of X is defined byd(E) = sup

x,y∈Ed(x, y).

8.9.1 Lemma. If X is complete, then the intersection C of any decreasingsequence of nonempty closed sets Cn in X with d(Cn)→ 0 contains a singlepoint.

Proof. For each n choose a point xn ∈ Cn. If m > n, then xm ∈ Cn, henced(xm, xn) ≤ d(Cn). Since d(Cn) → 0, {xn} is Cauchy. Let xn → x. Sincexn, xn+1, . . . ∈ Cn and Cn is closed, x ∈ Cn for all n, that is, x ∈ C. Sinced(C) ≤ d(Cn)→ 0, C = {x}.

8.9.2 Baire Category Theorem. Let X be a complete metric space. Thenthe following statements hold:

(a) If Un ⊆ X is open and dense in X for all n, then G :=⋂n Un is dense in

X.

(b) If Cn ⊆ X is closed and has empty interior for all n, then F :=⋃n Cn

has empty interior.

Metric Spaces 283

Proof. To prove (a), we show that B∩G 6= ∅ for any open ball B. Since B∩U1 isopen and nonempty, C1 := Cr1(x1) ⊆ B ∩U1 for some x1 ∈ X and 0 < r1 ≤ 1.Since Br1(x1) ∩ U2 is open and nonempty, C2 := Cr2(x2) ⊆ Br1(x1) ∩ U2 forsome x2 ∈ X and 0 < r2 < 1/2. Continuing in this manner, we obtain adecreasing sequence of closed balls Cn ⊆ B ∩ Un with diameters tending tozero. By 8.9.1,

⋂n Cn contains a point x. Then x ∈ B ∩ Un for all n, hence

x ∈ B ∩G.Part (b) follows from (a). Indeed, suppose int(Cn) = ∅ for all n. Then

Un := Ccn is dense in X, hence⋂n Un is dense in X. It follows that the interior

of (⋂n Un)c = F is empty.

We give three applications of Baire’s theorem. The first is known as theprinciple of uniform boundedness.

8.9.3 Theorem. Let X and Y be complete normed vector spaces and let L bea family of continuous linear transformations from X to Y such that

supT∈L‖Tx‖ <∞ for each x ∈ X .

Then there exists M > 0 such that ‖Tx‖ ≤M‖x‖ for all x ∈ X and T ∈ L.

Proof. For each n, set

Cn = {x ∈ X : ‖Tx‖ ≤ n for all T ∈ L}.

By hypothesis, X =⋃n Cn. By continuity of the transformations T , each Cn

is closed. Therefore, Baire’s theorem shows that int(Cn) 6= ∅ for some n. Thusthere exists x0 and r > 0 such that ‖Ty‖ ≤ n for all T ∈ L and y ∈ X with‖y − x0‖ ≤ r. If ‖x‖ ≤ r, then, taking y = x+ x0, we have

‖Tx‖ ≤ ‖Tx+ Tx0‖+ ‖Tx0‖ = ‖Ty‖+ ‖Tx0‖ ≤ n+ ‖Tx0‖.

It follows that for all x 6= 0 and T ∈ L∥∥∥∥T ( rx

‖x‖

)∥∥∥∥ ≤ n+ ‖Tx0‖

hence‖Tx‖ ≤ r−1(n+ ‖Tx0‖

)‖x‖.

The following corollary is one of the few instances in analysis (Dini’stheorem being another) when pointwise convergence of a sequence of continuousfunctions is sufficient to convey the property continuity to the limit function.

8.9.4 Corollary. Let X and Y be complete normed vector spaces and let {Tn}be a sequence of continuous linear transformations from X to Y convergingpointwise on X to a function T . Then T is linear and continuous.

Proof. Linearity of T is clear. For continuity, note that supn ‖Tnx‖ < +∞ foreach x ∈ X , hence, by the theorem, there exists M > 0 such that ‖Tnx‖ ≤M‖x‖ for all n and x. Letting n → +∞ yields ‖Tx‖ ≤ M‖x‖, hence T iscontinuous.

For the second application of Baire’s theorem, recall that there existfunctions f : R → R whose set of discontinuity points is precisely Q (3.3.3).The obvious question raised by this fact is answered in the following theorem.

8.9.5 Theorem. There is no function f : R → R whose set of continuitypoints is precisely Q.

Proof. For each n, let Un denote the union of all intervals (a, b) such that|f(x) − f(y)| < 1/n for all x, y ∈ (a, b). Then Un is open and the set ofcontinuity points of f is precisely C :=

⋂∞n=1 Un. Suppose that C = Q. Then

each Un contains Q and hence is dense in R. Let {r1, r2, . . .} be an enumerationof Q. Then the open sets Vm := R \ {rm} are also dense in R and haveintersection I. By Baire’s theorem, the collection of sets {Un, Vm : m, n ∈ N}has a nonempty intersection. But this intersection is Q ∩ I = ∅. Therefore, Ccannot equal Q.

The last application of Baire’s theorem shows that there is a rich supplyof continuous, nowhere differentiable functions. For the proof we need thefollowing lemma.

8.9.6 Lemma. If g is piecewise linear on [a, b], then there exists M > 0 suchthat

|g(x)− g(y)| ≤M |x− y| for all x, y ∈ [a, b].

Proof. Let g be as in 8.8.1 and set M = maxj{|mj |}. If

xi ≤ x ≤ xi+1 ≤ xj ≤ y ≤ xj+1

then

|g(x)− g(y)| ≤ |g(x)− g(xi+1)|+ |g(xi+1)− g(xi+2)|+ · · ·+ |g(xj)− g(y)|≤ |mi|(xi+1 − x) + |mi+1|(xi+2 − xi+1) + · · ·+ |mj |(y − xj)≤M(y − x).

8.9.7 Theorem. The set of all continuous, nowhere differentiable functionson an interval [a, b] is dense in C([a, b]) in the uniform norm.

Proof. For each n ∈ N and f ∈ C([a, b]) define

En(f) = {x ∈ [a, b] : |f(y)− f(x)| ≤ n|x− y| for all y ∈ [a, b]}.

we break the proof into several steps:

Metric Spaces 285

I.⋃∞n=1En(f) contains all points at which f is differentiable.

J Let x be such a point and choose δ > 0 such that∣∣∣∣f(y)− f(x)y − x

− f ′(x)∣∣∣∣ < 1 for all y ∈ [a, b] with 0 < |x− y| < δ.

Then

|f(y)− f(x)| ≤{(

1 + |f ′(x)|)|y − x| if |x− y| < δ,

2‖f‖∞ ≤ 2δ−1‖f‖∞|y − x| if |x− y| ≥ δ,

which shows that x ∈ En(f) for all n > 1 + |f ′(x)|+ 2δ−1‖f‖∞. K

II. En := {f ∈ C([a, b]) : En(f) 6= ∅} is closed in C([a, b]).J Let {fk} be a sequence in En converging uniformly to f ∈ C([a, b]).For each k, choose a point xk ∈ En(fk). We may assume that xk → x forsome x ∈ [a, b] (otherwise, take a subsequence). Then for all y ∈ [a, b],

|f(y)− f(x)| ≤ |f(y)− fk(y)|+ |fk(y)− fk(xk)|+ |fk(xk)− fk(x)|+ |fk(x)− f(x)|

≤ 2‖f − fk‖∞ + n|y − xk|+ n|xk − x|.

Letting k →∞ shows that |f(y)− f(x)| ≤ n|y − x|, that is, x ∈ En(f).Therefore, f ∈ En. K

III. Ecn is dense in C([a, b]).J Let f ∈ C([a, b]) and ε > 0. We construct a function h ∈ Bε(f)∩Ecn. By8.8.2, there exists a piecewise linear function g such that ‖f −g‖∞ < ε/2.By 8.9.6, there exists M > 0 such that

|g(x)− g(y)| ≤M |x− y| for all x, y ∈ [a, b].

Let r > 0 and let x0 = a < x1 < · · · < x2p = b be a partition of [a, b]with mesh < r. Construct a “sawtooth” piecewise linear function hr with

x2x0 x4 x6 x8

xx1 x3 x5 x7

x

xj

1

−1

c = |hr(x)− hr(xj)| ≥ 1, |x− xj | < r

c

hr

FIGURE 8.11: The sawtooth function hr.

vertices

(x0, 1), (x2, 1), . . . , (x2p, 1) and (x1,−1), (x3,−1), . . . , (x2p−1,−1),

and set h := g + εhr/2. Then

‖h− f‖∞ ≤ ‖h− g‖∞ + ‖g− f‖∞ = ε

2‖hr‖∞ + ‖g− f‖∞ <ε

2 + ε

2 = ε,

so h ∈ Bε(f). To show that h ∈ Ecn, let x be an arbitrary member of[a, b]. If hr(x) ≤ 0 (≥ 0) choose xj such that |x− xj | < r and hr(xj) = 1(= −1) (see Figure 8.11). Then |hr(x)− hr(xj)| ≥ 1, hence

|h(x)− h(xj)| ≥ε

2 |hr(x)− hr(xj)| − |g(x)− g(xj)|

≥ ε

2 −M |x− xj |

≥( ε

2r −M)|x− xj |.

If r is chosen so that ε

2r −M > n, then x 6∈ En(h), hence h 6∈ En. K

To complete the proof note that by step III and Baire’s theorem, F :=⋂∞n=1 Ecn is dense in C([a, b]). Since f ∈ F implies that En(f) = ∅ for every

n, and since a point at which f is differentiable must lie in some En(f), nomember of F can be differentiable at any point of [a, b].

Exercises1.S Prove the converse of 8.9.1: If the intersection of any decreasing sequence

of nonempty closed sets Cn in X with d(Cn)→ 0 contains a single point,then X is complete.

2. Let Q have the usual metric. Find a decreasing sequence of closed setsCn in Q with d(Cn)→ 0 and

⋂n Cn = ∅.

3.S Show that 8.9.2 does not hold in Q with the usual metric.

4. Let D = {x1, x2, . . .} be a proper subset of a complete metric space X.Show that (a) and (b) of 8.9.2 hold for Y := X \D. Conclude that the setof irrationals I with the usual metric satisfies (a) and (b) of the theorem.

Chapter 9Differentiation on Rn

For the remainder of the book, the Euclidean norm‖ · ‖2 on the spaces Rn will be denoted simply by ‖ · ‖.

In this chapter we extend the ideas of Chapter 4 to vector-valued functionsof several variables. This will require some notions from linear algebra, a briefreview of which may be found in Appendix B.

9.1 Definition of the DerivativeTo motivate the general definition of the derivative of a function on Rn,

we begin with two important special cases.

Derivative of a Vector-Valued Function of a Real VariableThe definition of derivative in this case is a natural extension of the

definition of the derivative of a scalar-valued function:

9.1.1 Definition. Let I ⊆ R be an interval and a ∈ I. A function f : I → Rmis said to be differentiable at a if the (vector) limit

f ′(a) := limh→0

f(a+ h)− f(a)h

= limt→a

f(t)− f(a)t− a

exists in Rm. (The limit is one-sided if a is an endpoint of I.) The vector f ′(a)is called the derivative of f at a. If f is differentiable at each point in I, thenf is said to be differentiable on I and the resulting function f ′ : I → Rm iscalled the derivative of f on I. ♦

The function f may be viewed as a parametrization of a curve C in Rm.The vector f ′(a) is then called the tangent vector to C at the point f(a). Ifthe variable t is interpreted as time, then C may be viewed as the path ofa particle in Rm. In this context, f ′(a) is called the velocity of the particleand ‖f ′(a)‖ the speed. The curve is said to be smooth if f ′ is continuous andnonzero on I. Parameterized curves will be examined in detail in Chapter 12.

287


Note that the function f : I → Rm may be written f = (f1, . . . , fm), wherefj : I → R is the jth component function of f .

9.1.2 Proposition. Let I be an interval and f = (f1, . . . , fm) : I → Rm.Then f is differentiable at a ∈ I iff each fj is differentiable at a, in which casef ′(a) = (f ′1(a), . . . , f ′m(a)). In particular, if f is differentiable at a, then f iscontinuous at a.

Proof. The assertions follow directly from the inequalities∣∣∣∣fj(a+ h)− fj(a)h

− xj∣∣∣∣2 ≤ ∥∥∥∥f(a+ h)− f(a)

h− (x1, . . . , xm)

∥∥∥∥2

≤m∑i=1

∣∣∣∣fi(a+ h)− fi(a)h

− xi∣∣∣∣2 .

The differential of f at a is the linear transformation dfa : R→ Rm thattakes a real number h to the vector hf ′(a):

dfa(h) = hf ′(a), h ∈ R.

Definition 9.1.1 may then be rephrased as follows: f is differentiable at a iffthere exists a linear transformation T : R→ Rm such that

limh→0

f(a+ h)− f(a)− Th|h|

= 0,

in which case T = dfa

Derivative of a Real-Valued Function of Several VariablesThe derivative of a scalar-valued function of n variables is defined as follows:

9.1.3 Definition. Let U ⊆ Rn be open and a ∈ U . Then f : U → R is saidto be differentiable at a if there exists a vector f ′(a) in Rn such that

limh→0

f(a+ h)− f(a)− f ′(a) · h‖h‖

= 0. (9.1)

The vector f ′(a) is called the derivative of f at a. The differential of f at a isthe linear transformation dfa ∈ L(Rn,R) defined by

dfa(h) = f ′(a) · h, h ∈ Rn. ♦

Now letej = (0, . . . , 0,

j

1, 0, . . . , 0), j = 1, . . . , n,denote the standard basis vectors in Rn. If f ′(a) exists, then, taking h = tej

in (9.1), we have

limt→0

f(a+ tej)− f(a)− tf ′(a) · ej

t= 0,

Differentiation on Rn 289

or, equivalently,

limt→0

f(a+ tej)− f(a)t

= f ′(a) · ej . (9.2)

The expression the right is just the jth component of f ′(a). The limit on theleft is called the jth partial derivative of f at a and is denoted variously by

∂jf = fxj = ∂f

∂xj.

We have proved the following result.

9.1.4 Proposition. If f is differentiable at a, then the partial derivatives∂jf(a) of f exist at a and

f ′(a) =(∂1f(a), ∂2f(a), . . . , ∂nf(a)

). (9.3)

In particular, the derivative is unique.

The vector on the right in (9.3) is called the gradient of f at a and isdenoted by ∇f or grad f . The linear transformation dfa ∈ L(Rn,R) may nowbe written

dfa(h) = ∇f(a) · h, h ∈ Rn. (9.4)

For an alternate notation, let dxj : Rn → R be the linear function defined by

dxj(h) = hj , h = (h1, . . . , hn).

Then dfa may be expressed as

dfa(h) =n∑j=1

∂f(a)∂xj

dxj(h).

If the partial derivatives of f exist at each point of U , we write simply

df =n∑j=1

∂f

∂xjdxj .

For example,

d sin(x2y) = 2xy cos(x2y) dx+ x2 cos(x2y) dy.

We show below that if f has continuous partial derivatives on U , then f isdifferentiable on U . The continuity hypothesis cannot be removed: There arefunctions f that are not differentiable on U but whose partial derivatives existthroughout U . This is the case for the function in the following example.


9.1.5 Example. Let m ∈ N. The function

f(x, y) =

xmy

x2 + y2 if (x, y) 6= (0, 0),

0 otherwise

exhibits a variety of behavior depending on the values of m. The partialderivatives of f are

fx(x, y) =

mxm+1y +mxm−1y3 − 2xm+1y

(x2 + y2)2 , if x 6= (0, 0),

0 otherwise,

fy(x, y) =

xm(x2 − y2)(x2 + y2)2 , if x 6= (0, 0),

0 otherwise.

If m = 1, f is not continuous at (0, 0), hence is not differentiable there (see9.1.11, below). If m = 1 or 2, the partial derivatives exist at (0, 0) but arenot continuous there. If m = 2, the function is continuous at (0, 0), with zeropartial derivatives at (0, 0), but is not differentiable there since in this casethe limit

limx→0

f(x)− f(0)− 0 · x‖x‖

,= lim(x,y)→(0,0)

x2y

(x2 + y2)3/2

fails to exist. If m ≥ 3, f has continuous partial derivatives and is differentiableon R2. ♦

The definition of the jth partial derivative of f at a may be writtenexplicitly as

∂jf(a) = limh→0

f(a1, . . . , aj + h, . . . , an)− f(a1, . . . , aj , . . . , an)h

.

This is simply the derivative at aj of the one-variable function

t 7→ f(a1, . . . , aj−1, t, aj+1, . . . , an).

Thus to find the jth partial derivative of f(x1, . . . , xj , . . . , xn), one simplydifferentiates f with respect to xj while holding the other variables fixed. Itfollows that the standard formulas for derivatives of functions of one variablehold for partial derivatives of functions of several variables. For example, theproduct rule takes the form

∂j(fg)(a) = f(a)∂jg(a) + g(a)∂jf(a),

and the quotient rule becomes

∂j

(f

g

)(a) = g(a)∂jf(a)− f(a)∂jg(a)

g2(a) , g(a) 6= 0.


Derivative of a Vector-Valued Function of Several VariablesWe now consider the general case. The following definition includes the

two special cases discussed before.

9.1.6 Definition. Let U ⊆ Rn be open. A function f : U → Rm is said to bedifferentiable at a ∈ U if there exists a linear transformation dfa : Rn → Rm,called the differential of f at a, such that

limh→0

f(a+ h)− f(a)− dfa(h)‖h‖

= 0.

The m × n matrix [dfa] is called the derivative of f at a, or the Jacobianmatrix of f at a, and is denoted by f ′(a). ♦

9.1.7 Example. If T ∈ L(Rn,Rn), then, by the linearity of T ,

T (x+ h)− T (x)− Th = 0 for all h.

It follows that dTx = T for all x. This is the n-dimensional version of thefamiliar result that the derivative of the function x→ tx is the constant t. ♦

9.1.8 Theorem. Let U ⊆ Rn be open, f = (f1, . . . , fm) : U → Rm, andlet a ∈ U . Then f is differentiable at a iff each function fi : U → R isdifferentiable at a. In this case, ∂jfi(a) exists and equals dfa(ej) · ei, and

dfa(h) =(∇f1(a) · h, . . . ,∇fm(a) · h

), h ∈ Rn. (9.5)

In particular, if the differential exists, it is unique.

Proof. Let f be differentiable at a. For i = i, . . . ,m and j = 1, . . . , n, letbij = dfa(ej) · ei, the ith component of dfa(ej) and the (i, j)th entry of thematrix [dfa]. Then

dfa(h) =(b1 · h, . . . , bm · h

), where bi := (bi1, . . . , bin).

Thus for each i,

|fi(a+ h)− fi(a)− bi · h| ≤ ‖f(a+ h)− f(a)− dfa(h)‖,

from which it follows that

limh→0

fi(a+ h)− fi(a)− bi · h‖h‖

= 0.

Therefore, the derivative of fi at a exists and equals bi. By 9.1.4, bi = ∇fi(a),that is, bij = ∂jfi(a).

Conversely, suppose each fj is differentiable at a. Then ∇fj(a) exists andby (9.4),

limh→0

|fi(a+ h)− fi(a)−∇fi(a) · h|‖h‖

= 0, i = 1, . . . ,m.


Let T (h) denote the right side of (9.5). Then T is linear and

‖f(a+ h)− f(a)− T (h)‖2‖h‖2

=m∑i=1

|fi(a+ h)− fi(a)−∇fi(a) · h|2

‖h‖2→ 0

as h→ 0. Therefore, dfa exists and equals T .

By the theorem, the (i, j) entry of f ′(a) is ∂jfi(a). The effect of dfa on avector h ∈ Rn may therefore be expressed in matrix form as

f ′(a)ht =

∂1f1(a) · · · ∂nf1(a)...

...∂1fm(a) · · · ∂nfm(a)

h1

...hn

=

∇f1(a) · h...

∇fm(a) · h

,where ht denotes the transpose of the vector h. In the special case m = n, thedeterminant of f ′(a) is called the Jacobian of f at a and is denoted variouslyby

det f ′(a) = Jf (a) = ∂(f1, . . . , fn)∂(x1, . . . , xn) (a).

9.1.9 Example. The transformation (x, y, z) = (r cos θ, r sin θ, z) from cylin-drical coordinates to rectangular coordinates in R3 has Jacobian

∂(x, y, z)∂(r, θ, z) =

∣∣∣∣∣∣cos θ sin θ 0−r sin θ r cos θ 0

0 0 1

∣∣∣∣∣∣ = r. ♦

The following characterization of differentiability will be useful.

9.1.10 Theorem. Let f : U → Rm, where U ⊆ Rn is open. Then f isdifferentiable at a ∈ U iff there exists T ∈ L(Rn,Rm) and, for sufficientlysmall r, a function η : Br(0)→ Rm such that

f(a+ h) = f(a) + Th+ ‖h‖ η(h), and limh→0

η(h) = 0. (9.6)

In this case, T = dfa.

Proof. Assume that f is differentiable at a. Choose r > 0 such that Br(a) ⊆ Uand define η : Br(0)→ Rm by η(0) = 0 and

η(h) = f(a+ h)− f(a)− dfa(h)‖h‖

if h 6= 0.

Then (9.6) holds with T = dfa.Conversely, if (9.6) holds for some η and T , then

limh→0

‖f(a+ h)− f(a)− Th‖‖h‖

= limh→0‖η(h)‖ = 0,

hence f is differentiable at a with dfa = T .

9.1.11 Corollary. If f is differentiable at a, then f is continuous at a.

Proof. By (9.6) and the continuity of linear transformations,

limh→0

[f(a+ h)− f(a)

]= lim

h→0‖h‖η(h) + lim

h→0dfa(h) = 0.

Exercises1. Find the differential df for each of the functions f(x, y):

(a) S x− yx+ y

. (b) ln(x2 + y3). (c) arctan (xy2).

(d) cos xy. (e) S sin (x2y). (f) arcsin y

x, 0 < y < x.

(g) sec (yex). (h) S exy2. (i) tan 3x+ 2y

2x+ 3y .

2. Find f ′(x) where f(x) =

(a)(x3 − y3, x2y2). (b)S

(ex sin y, ey sin x

). (c)S

(xy

x2 + y2 ,x2 − y2

x2 + y2

).

(d)(

ln(x2 + y2 + z2 + 1), xyz). (e)

(arctan(x− y), exy, x/y

).

3. For each of the functions f(x, y) below, find all values of p, q ∈ N forwhich on R2

(i) fx, fy exist, (ii) fx, fy are continuous, (iii) f ′ exists.

(a) S

xp + yq

x2 + y2 if (x, y) 6= 0,

0 otherwise.(b)

xpyq

x2 + y2 if (x, y) 6= 0,

0 otherwise.

(c){

(x− y)p sin(x− y)−1 if x 6= y,0 otherwise.

(d) S

xp sin 1x

+ yq if x 6= 0,yq otherwise.

(e)

xp + yq

x− yif x 6= y,

0 otherwise.(f)

xpyq

x− yif (x, y) 6= 0,

0 otherwise.

4. Find all values of p, q, s ∈ (0,+∞) for which on R2

(i) fx, fy exist, (ii) fx, fy are continuous, (iii) f ′ exists,where f(0, 0) = 0 and, for (x, y) 6= (0, 0), f(x, y) =

(a)S |x|p|y|q ln(x2 + y2). (b) sin(x2 + y2)p(x2 + y2)q . (c) tan(x2 + y2)p

(x2 + y2)q .

(d)S sin |x|p|y|q(x2 + y2)s . (e) sin−1 |x|p|y|q

(x2 + y2)s .

5. Spherical coordinates (ρ, φ, θ) in R3 are defined by

x = ρ sinφ cos θ, y = ρ sinφ sin θ, z = ρ cosφ,

where ρ ≥ 0, 0 ≤ φ ≤ π, and 0 ≤ θ < 2π. Show that

∂(x, y, z)∂(ρ, φ, θ) = ρ2 sinφ.

6. Let (u, v) =(

sin f(x, y), cos f(x, y)). Find ∂(u, v)

∂(x, y) .

7. Let (u, v, w) = (y/z, z/x, x/y), where xyz 6= 0. Find ∂(u, v, w)∂(x, y, z) .

8.S Letf(x) =

n∑i=1

xaii and g(x) =n∏i=1

xaii ,

where xi, ai > 0 and∑i ai = 1. Find

(a) x ·∇f(x). (b) x ·∇g(x).

9. Let f(x) be defined implicitly by the equation

1f(x) =

n∑i=1

1xi.

Express ∇f(x) in terms of f .

10.S Let f(x) = ln(∑n

i=1 exi). Express ∇f(x) in terms of f .

11. Let the equation αxn − x1x2 · · ·xn−1 = 0, α 6= 0, define each of thevariables x1, . . . , xn−1 as a differentiable function of xn. Show that

xn−2n

∂x1

∂xn

∂x2

∂xn· · · ∂xn−1

∂xn= α.

12. Let x = (x1, . . . , xn). Find ∂i of

(a)S ‖x‖. (b) 1‖x‖

. (c)S xi‖x‖

. (d) xi‖x‖2

.

13. Let f : R→ R be differentiable and p > 0. Show that for x 6= 0,

x ·∇‖x‖p = p‖x‖p and x ·∇f(‖x‖p

)= pf ′

(‖x‖p

)‖x‖p.


9.2 Properties of the DifferentialIn this section we consider analogs of differentiation rules for single variable

functions. Deeper properties of the differential are taken up in later sections.

Linearity of the Differential9.2.1 Theorem. Let U ⊆ Rn be open, let f, g : U → Rm be differentiable ata ∈ U , and let α, β ∈ R. Then αf + βg is differentiable at a and

d(αf + βg)a = αdfa + βdga.

Proof. By 9.1.10, there exist functions η(h), µ(h), defined for h ∈ Rn withsufficiently small norm, such that

f(a+ h) = f(a) + dfa(h) + ‖h‖η(h), limh→0

η(h) = 0, and

g(a+ h) = g(a) + dga(h) + ‖h‖µ(h), limh→0

µ(h) = 0.

Then

(αf + βg)(a+ h) = (αf + βg)(a) + (αdfa + βdga)(h) + ‖h‖(αη + βµ

)(h)

andlimh→0

(αη + βµ

)(h) = 0.

Another application of 9.1.10 completes the proof.

The Norm of a Linear TransformationFor additional properties of the differential, including product rules, we need

the notion of operator norm on the space L(Rn,Rm) of linear transformationsfrom Rn to Rm.

9.2.2 Definition. Let T ∈ L(Rn,Rm). The operator norm of T is defined as

‖T‖ = sup{‖Tx‖ : x ∈ Rn, ‖x‖ = 1

}. ♦

The following proposition justifies the use of the term “norm.”

9.2.3 Proposition. ‖T‖ defines a norm on L(Rn,Rm) such that

‖Tx‖ ≤ ‖T‖ ‖x‖ for all x ∈ Rn. (9.7)

Moreover, if [aij ]m×n is the matrix of T , then for all k, `

|ak`| ≤ ‖T‖ ≤( m∑i=1

n∑j=1

a2ij

)1/2. (9.8)


Proof. Inequality (9.7) is clear if x = 0. If x 6= 0, then ‖x‖−1x has norm 1hence

‖x‖−1‖Tx‖ =∥∥T (‖x‖−1x)

∥∥ ≤ 1.

To verify (9.8), let ai = (ai1, . . . , ain). Since Tx =(a1 · x, . . . ,an · x

), by

the Cauchy–Schwarz inequality,

‖Tx‖2 =m∑i=1

(ai · x)2 ≤m∑i=1‖ai‖2‖x‖2 =

∑i,j

a2ij , ‖x‖ = 1,

which verifies the second inequality in (9.8). The first inequality follows from

|ak`|2 ≤m∑i=1|ai`|2 = ‖Te`‖2 ≤ ‖T‖2.

To see that ‖T‖ defines a norm, note that homogeneity follows directlyfrom the definition, and the triangle inequality ‖T1 + T2‖ ≤ ‖T1‖+ ‖T2‖ is aconsequence of

‖(T1 + T2)x‖ ≤ ‖T1x‖+ ‖T2x‖ ≤ ‖T1‖+ ‖T2‖, ‖x‖ = 1.

The property of coincidence follows directly from (9.7).

9.2.4 Corollary. A linear transformation T : Rn → Rm is uniformly contin-uous.

Proof. This follows from

‖Tx− Ty‖ = ‖T (x− y)‖ ≤ ‖T‖ ‖x− y‖,

using the linearity of T .

Since L(Rn,Rm) is a normed vector space, it is a metric space under thedistance function ρ(T1, T2) := ‖T1−T2‖. Thus the methods of Chapter 8 apply.In particular, we have the following consequence of 9.2.3.

9.2.5 Corollary. Let (X, d) be a metric space and let F be a function fromX to L(Rn,Rm). For each x ∈ X, let [aij(x)]m×n denote the matrix of F (x).Then F is (uniformly) continuous with respect to the metric ρ iff each functionaij(x) is (uniformly) continuous on X.

Proof. The matrix of F (x)− F (y) is [aij(x)− aij(y)]m×n, hence, by (9.8),

|ak`(x)− ak`(y)|2 ≤ ‖F (x)− F (y)‖2 ≤∑i,j

[aij(x)− aij(y)]2.

The assertion follows.


Product RulesWe consider two product rules; additional product rules, as well as a

quotient rule, are given in the exercises.

9.2.6 Theorem (Scalar Product Rule). Let U be open in Rn and f : U → Rmand ψ : U → R differentiable at a ∈ U . Then

d(ψf)a(h) = ψ(a)dfa(h) +(∇ψ(a) · h

)f(a), h ∈ Rn. (9.9)

Proof. By 9.1.10, there exist functions η(h) and µ(h), defined for h ∈ Rn withsufficiently small norm, such that

f(a+ h)− f(a)− dfa(h) = ‖h‖η(h), limh→0

η(h) = 0,

ψ(a+ h)− ψ(a)−∇ψ(a) · h = ‖h‖µ(h), limh→0

µ(h) = 0.

Let Th denote the right side of (9.9) and set

ν(h) := (ψf)(a+ h)− (ψf)(a)− Th= ψ(a+ h)f(a+ h)− ψ(a)f(a)− ψ(a)dfa(h)−

(∇ψ(a) · h

)f(a).

Then T is linear and

ν(h) = ψ(a+ h)[f(a+ h)− f(a)− dfa(h)

]+[ψ(a+ h)− ψ(a)−∇ψ(a) · h

]f(a) +

[ψ(a+ h)− ψ(a)

]dfa(h)

= ψ(a+ h)‖h‖η(h) + ‖h‖µ(h)f(a) +[ψ(a+ h)− ψ(a)

]dfa(h).

Since ‖dfa(h)‖ ≤ ‖dfa‖ ‖h‖,

‖ν(h)‖‖h‖

≤ |ψ(a+ h)| ‖η(h)‖+ |µ(h)| ‖f(a)‖+ ‖ψ(a+ h)− ψ(a)‖ ‖dfa‖.

By continuity of ψ at a, the right side of the last inequality tends to zero ash→ 0, proving the theorem.

9.2.7 Theorem (Dot Product Rule). Let U be open in Rn and f, g : U → Rmdifferentiable at a ∈ U . Then

d(f · g)a(h) = f(a) · dga(h) + g(a) · dfa(h), h ∈ Rn. (9.10)

Proof. Let η(h) and µ(h) be functions defined for sufficiently small ‖h‖ suchthat

f(a+ h)− f(a)− dfa(h) = ‖h‖η(h), limh→0

η(h) = 0,

g(a+ h)− g(a)− dga(h) = ‖h‖µ(h), limh→0

µ(h) = 0.

Let Th denote the right side of (9.10) and define

ν(h) :=(f · g)(a+ h)− (f · g)(a)− Th, h ∈ Rn

=f(a+ h) · g(a+ h)− f(a) · g(a)− f(a) · dga(h)− g(a) · dfa(h).

Then T is linear and

ν(h) = f(a+ h) ·[g(a+ h)− g(a)− dga(h)

]+ g(a) ·

[f(a+ h)− f(a)− dfa(h)

]+[f(a+ h)− f(a)

]· dga(h)

= ‖h‖f(a+ h) · µ(h) + ‖h‖g(a) · η(h) +[dfa(h) + ‖h‖η(h)

]· dga(h).

By the Cauchy–Schwarz and operator norm inequalities,

|ν(h)|‖h‖

≤ ‖f(a+ h)‖ ‖µ(h)‖+ ‖g(a)‖ ‖η(h)‖

+ ‖dfa‖ ‖dga‖ ‖h‖+ ‖η(h)‖ ‖dga‖ ‖h‖.

Since the right side of this inequality tends to zero as h→ 0 so does the left,completing the proof.

Continuity of the DifferentialIf U is an open subset of Rn and f : U → Rm is differentiable, then the

mapping x 7→ dfx is a function from U to L(Rn,Rm). Since L(Rn,Rm) is ametric space in the operator norm, the notion of continuity of this mapping ismeaningful.

9.2.8 Definition. Let U ⊆ Rn be open. A function f : U → Rm is said to becontinuously differentiable on U if dfx exists and is continuous as a functionof x on U . In this case, f is also said to be of class C1 on U . A function g iscontinuously differentiable on a subset E of Rn if g is the restriction to E of acontinuously differentiable function f on an open set U ⊇ E. ♦

9.2.9 Theorem. Let f = (f1, . . . , fm) : U → Rm, where U ⊆ Rn is open.Then f is continuously differentiable on U iff the partial derivatives ∂jfi,1 ≤ i ≤ m, 1 ≤ j ≤ n, exist and are continuous on U .

Proof. If f is continuously differentiable on U then, by 9.2.5, the matrix f ′(x)has continuous entries. By 9.1.8, these entries are the partial derivatives of thecomponents of f .

For the sufficiency, by 9.1.8 we may assume that m = 1, that is, f is real-valued. Suppose then that the partial derivatives ∂jf exist and are continuouson U . Let a ∈ U and ε > 0. Choose r > 0 such that Br(a) ⊆ U and fixh = (h1, . . . , hn) such that ‖h‖ < r. For 1 ≤ j ≤ n set

gj(t) := f(a+ hj(t)

), hj(t) := (h1, . . . , hj−1, thj , 0, . . . , 0), 0 ≤ t ≤ 1.


Thengj(1)− gj(0) = f

(a+ hj(1)

)− f

(a+ hj(0)

).

Also, by the mean value theorem and the chain rule, there exists tj ∈ (0, 1)such that

gj(1)− gj(0) = g′j(tj) = hj∂jf(a+ hj(tj)

).

Therefore,

f(a+ h)− f(a) =n∑j=1

[gj(1)− gj(0)

]=

n∑j=1

hj∂jf(a+ hj(tj)

),

hence

f(a+ h)− f(a)−∇f(a) · h =n∑j=1

[∂jf(a+ hj(tj)

)− ∂jf(a)

]hj = ν(h) · h,

whereν(h) :=

n∑j=1

[∂jf(a+ hj(tj)

)− ∂jf(a)

]ei.

Since limh→0 hj(tj) = 0, the continuity of ∂jf at a implies that limh→0 ν(h) =0. Since |ν(h) · h| ≤ ‖ν(h)‖ ‖h‖,

|f(a+ h)− f(a)−∇f(a) · h|‖h‖

≤ ‖ν(h)‖ → 0,

completing the proof.

Exercises1.S Prove that for T ∈ L(Rn,Rm),

‖T‖ = sup{‖Tx‖ : x ∈ Rn, ‖x‖ ≤ 1

}.

2. Let T1 ∈ L(Rm,Rk) and T2 ∈ L(Rn,Rm). Prove that

‖T1T2‖ ≤ ‖T1‖ ‖T2‖.

(We use the standard notation T1T2 for composition of linear operators.)

3.S (Quotient rule) Let U f , and ψ be as in 9.2.6. If ψ(a) 6= 0, prove that

d

[f

ψ

]a

(h) =ψ(a)dfa(h)−

(∇ψ(a) · h

)f(a)

ψ2(a) .

4. Find dgx(x), ‖x‖ 6= 0, if g(x) =(a)S ‖x‖x. (b) ‖x‖−2x. (c) ‖x‖−1x.

5. The cross product of vectors a = (a1, a2, a3) and b = (b1, b2, b3) is definedby

a× b =∣∣∣∣a2 a3b2 b3

∣∣∣∣ e1 −∣∣∣∣a1 a3b1 b3

∣∣∣∣ e2 +∣∣∣∣a1 a2b1 b2

∣∣∣∣ e3.

(See Exercise 1.6.9.) Let f : U → R3 and g : U → R3, where U ⊆ Rn isopen. Define f × g on U by

(f × g)(x) = f(x)× g(x).

Prove that

d(f × g)a(h) = f(a)× dga(h) + dfa(h)× g(a).

6.S Let V ⊆ Rp and W ⊆ Rq be open, f : V → Rk, g : W → Rk, andα, β ∈ R. Define F on V ×W ⊆ Rp+q by

F (x,y) = αf(x) + βg(y), x ∈ V, y ∈W.

If f is differentiable at a ∈ V and g is differentiable at b ∈ W , provethat F is differentiable at c := (a, b) and

dFc(h,k) = αdfa(h) + βdgb(k), h ∈ Rp, k ∈ Rq.

7. Let V ⊆ Rp and W ⊆ Rq be open and f : V → Rk, g : W → Rk. DefineF on V ×W ⊆ Rp+q by

F (x,y) = f(x) · g(y), x ∈ V, y ∈W.

If f is differentiable at a ∈ V and g is differentiable at b ∈ W , provethat F is differentiable at c := (a, b) and

dFc(h,k) = g(b) · dfa(h) + f(a) · dgb(k), h ∈ Rp, k ∈ Rq.

8. Formulate and prove the analog of Exercise 7 for cross products.

9. Let f : I → Rm be differentiable and ‖f‖ = 1 on an open interval I.Prove that f(t) and f ′(t) are perpendicular for all t, that is, f · f ′ = 0on I.

10.S Let f : [a, b]→ Rm be differentiable and v 6∈ A := f [a, b]. Referring toExercise 8.5.15 with d(x,y) = ‖x− y‖, show that(a) d(A,v) = ‖f(t0)− v‖ for some t0 ∈ [a, b].(b)

(f(t0)− v

)· f ′(t0) = 0 if t0 ∈ (a, b).

11. A path ϕ : [a, b] → Rn is piecewise smooth if there exists a partitiona0 = a < a1 < · · · < an = b of [a, b] such that ϕ′ exists and is continuouson each subinterval [aj−1, aj ]. Let U ⊆ Rn be nonempty open andconnected. Show that if ε > 0, then any pair of points can be joined bya piecewise smooth path ϕ in U such that supaj−1≤t≤aj ‖ϕ

′(t)‖ < ε foreach j.


9.3 Further Properties of the DifferentialIn this section we prove two important theorems, the first of which is an

n-dimensional version of the chain rule.

9.3.1 Chain Rule. Let U ⊆ Rn and V ⊆ Rm be open and f : U → Rm,g : V → Rk with f(U) ⊆ V . If f is differentiable at a ∈ U and g is dif-ferentiable at b := f(a), then g ◦ f : U → Rk is differentiable at a and thelinear transformation d(g ◦ f)a : Rn → Rk is the composition of the lineartransformations dgb : Rm → Rk and dfa : Rn → Rm:

d(g ◦ f)a = dgb ◦ dfa.

Proof. Choose r, s > 0 such that Br(a) ⊆ U and Bs(b) ⊆ V . By 9.1.10 thereexist functions η : Br(0)→ Rm and ν : Bs(0)→ Rk such that

f(a+ h) = f(a) + dfa(h) + ‖h‖η(h), limh→0

η(h) = 0, and (9.11)

g(b+ k) = g(b) + dgb(k) + ‖k‖ν(k), limk→0

ν(k) = 0. (9.12)

Setk = f(a+ h)− f(a) = dfa(h) + ‖h‖η(h). (9.13)

By the continuity of f at a, k ∈ Bs(0) for all sufficiently small ‖h‖. For such hset

µ(h) = (g ◦ f)(a+ h)− (g ◦ f)(a)− (dgb ◦ dfa)(h).To complete the proof we show that

limh→0

µ(h)‖h‖

= 0. (9.14)

From (9.11), (9.12), and (9.13),

µ(h) = g(b+ k)− g(b)− dgb(k) + dgb[k − dfa(h)]= ‖k‖ν(k) + ‖h‖dgb(η(h)),

hence‖µ(h)‖‖h‖

≤ ‖k‖‖h‖‖ν(k)‖+ ‖dgb(η(h)‖

Since‖k‖ =

∥∥dfa(h) + ‖h‖η(h)∥∥ ≤ (‖dfa‖+ ‖η(h)‖

)‖h‖,

we have‖µ(h)‖‖h‖

≤(‖dfa‖+ ‖η(h)‖

)‖ν(k)‖+ ‖dgb(η(h))‖.

Since h→ 0 implies k→ 0, (9.14) follows.


9.3.2 Remark. Let f be differentiable on U and g differentiable on V . Sety = f(x) and z = (g ◦ f)(x) = g(y). Then the chain rule may be written inmatrix form as (g ◦ f)′(x) = g′(y)f ′(x) or

∂z1

∂x1· · · ∂z1

∂xn......

∂zk∂x1

· · · ∂zk∂xn

=

∂z1

∂y1· · · ∂z1

∂ym...

...∂zk∂y1

· · · ∂zk∂ym

∂y1

∂x1· · · ∂y1

∂xn......

∂ym∂x1

· · · ∂ym∂xn

.

From this we obtain the familiar formulas

∂z`∂xj

=m∑i=1

∂z`∂yi

∂yi∂xj

j = 1, . . . , n, ` = 1, . . . , k. ♦

9.3.3 Example. Let the partial derivatives of u = f(x, y) and v = g(x, y)exist on R. If x = r cos θ and y = r sin θ, we may use the chain rule to find fx,fy, gx, and gy in terms of ur, vr, uθ, and vθ. Indeed, from 9.3.2,[

ur uθvr vθ

]=[fx fygx gy

] [cos θ −r sin θsin θ r cos θ

],

hence[fx fygx gy

]=[ur uθvr vθ

] [cos θ −r sin θsin θ r cos θ

]−1= 1r

[ur uθvr vθ

] [r cos θ r sin θ− sin θ cos θ

].

Thus, for example, fx = (cos θ)ur − r−1(sin θ)uθ. ♦

9.3.4 Remark. The chain rule may be used to suggest a definition of tangentplane to a smooth surface. Let f : U → R be differentiable on the open subsetU of Rn and let c ∈ R. The set

S = {x ∈ U : f(x) = c and ∇f(x) 6= 0}

is called a level surface of f in Rn. Let a ∈ S and let ϕ : (−r, r)→ Rn be asmooth path in S such that ϕ(0) = a. The existence of such paths may bejustified by the implicit function theorem, proved in the next section. Applyingthe chain rule to the identity f

(ϕ(t)

)= c, we see that

0 = (f ◦ ϕ)′(0) = ∇f(a) · ϕ′(0).

Since ϕ′(0) is tangent to the curve at a, ∇f(a) is perpendicular to S at a. Thetangent hyperplane to S at a is then defined as the set of all points x ∈ Rnsuch that x− a is perpendicular to ∇f(a), that is,

(x− a) ·∇f(a) = 0.

For example, the hyperplane tangent at a to the (n− 1)-dimensional sphere{x ∈ Rn : |x‖2 = 1

}is the set of all x such that

n∑i=1

2ai(xi − ai) = 0 or a · x = 1.

The tangent hyperplane at a to a surface S may be seen as the best linearapproximation to S near a. ♦

The second main result of this section is an n-dimensional version of themean value theorem of Chapter 4. While such a theorem is not generallyavailable for vector-valued functions (Exercise 14), there is a version for scalar-valued functions. For its statement, we recall that the line segment in Rn froma to b is defined by

[a : b] = {(1− t)a+ tb : 0 ≤ t ≤ 1} .

9.3.5 Mean Value Theorem. Let U ⊆ Rn be open and let f : U → R bedifferentiable on U . For each pair of points a, b ∈ U with [a : b] ⊆ U thereexists c ∈ [a : b] such that

f(b)− f(a) = dfc(b− a) = ∇f(c) · (b− a).

Proof. Set ϕ(t) = (1− t)a+ tb, 0 ≤ t ≤ 1, and g = f ◦ ϕ. Since ϕ′(t) = b− a,the chain rule and one-variable mean value theorem imply that

f(b)− f(a) = g(1)− g(0) = g′(c) = dfϕ(c)(b− a)

for some c ∈ (0, 1). Setting c = ϕ(c) completes the proof.

We conclude this section with two applications of the mean value theorem.

9.3.6 Theorem. Let U ⊆ Rn be open and let f : U → Rm be continu-ously differentiable on U . Let C ⊆ U be compact and convex and definec := supz∈C ‖dfz‖. Then c < +∞ and ‖f(x)− f(y)‖ ≤ c‖x− y‖, x, y ∈ C.

Proof. Since z 7→ dfz is continuous and C is compact, c < +∞. Let x, y ∈ Cand u ∈ Rm. By 9.3.5 applied to the scalar function g := u · f , there exists apoint c ∈ [x : y] ⊆ C such that

u ·[f(x)− f(y)

]= g(x)− g(y) = dgc(x− y) = u · dfc(x− y).

Taking u = f(x)−f(y) and using the Cauchy–Schwarz and the operator norminequalities, we have

‖f(x)− f(y)‖2 =[f(x)− f(y)

]·[dfc(x− y)

]≤ c‖f(x)− f(y)‖ ‖x− y‖.

Dividing by ‖f(x)− f(y)‖ completes the proof.


9.3.7 Corollary. Let U ⊆ Rn be open and connected and let f : U → Rm bedifferentiable on U . If dfx = 0 for all x ∈ U , then f is constant.

Proof. Let x ∈ U and choose r > 0 such that Cr(x) ⊆ U . Since Cr(x) iscompact and convex, 9.3.6 implies that

‖f(x)− f(y)‖ ≤ c‖x− y‖, y ∈ Cr(x), c := supz∈Cr(x)

‖dfz‖.

By hypothesis, c = 0, hence f(y) = f(x) for all y ∈ Cr(x). Thus f is constanton any ball contained in U .

Now let a ∈ U and define

Ua = {x ∈ U : f(x) = f(a)} and Va = {x ∈ U : f(x) 6= f(a)} .

By the first paragraph, if x ∈ Ua, then a ball with center x is contained in Ua.Therefore, Ua is open. A similar argument shows that Va is open. Since U isconnected and Ua 6= ∅, Ua = U , that is, f(x) = f(a) for all x ∈ U .

Exercises1.S Let g, ϕ, ψ : R → R be differentiable and let f(x, y) = g

(ϕ(x)ψ(y)

).

Find ∇f(x, y) in terms of g, ϕ, and ψ.

2. Let ϕ : R → R and g : R3 → R be differentiable and set f(x, y) :=g(x, ϕ(x+ 2y), ϕ(x− 3y)

). Find fy in terms of g and ϕ.

3.S Let g : R2 → R be differentiable, a, b ∈ Rn, and set f(x) = g(a·x, b·x).

Find ∇f .

4. Let the partial derivatives of f : R2 → R of f exist and let

z = f(x, y) = f(r cos θ, r sin θ).

Prove that [r∂z

∂r

]2+[∂z

∂θ

]2=[∂z

∂x

]2+[r∂z

∂y

]2.

5. Let F : Rn → R be differentiable and set f(x) = F (x, . . . , x). Prove thatf ′(x) = (1, . . . , 1) ·∇F (x, . . . , x).

6. Let f(x, y) be continuously differentiable. Prove that

f(x, y) =∫ 1

0(x, y) ·∇f(tx, ty)t dt+

∫ 1

0f(tx, ty) dt.

7.S Let f : U → Rm be differentiable on an open set U ⊆ Rn. Find (T ◦f)′(x)for T ∈ L(Rm,Rk).


8. Let f : Rn → R be differentiable and a = (a1, . . . , an) ∈ Rn with an 6= 0.Prove that a ·∇f(x) = 0 for all x ∈ Rn iff there exists a differentiablefunction g : Rn−1 → R such that

f(x1, x2, . . . , xn) = g(x1 − b1xn, x2 − b2xn, . . . , xn−1 − bn−1xn

),

where bj = aj/an, 1 ≤ j ≤ n− 1.

9. Let U ⊆ Rn be open and f : U → R smooth. Let α, β : I → Rnbe smooth paths in U such that ∇(f ◦ α) = α′, α(t1) = β(t1), and‖α′(t1)‖ = ‖β′(t1)‖ = 1 for some t1 ∈ I (that is, α and β both have unitspeed at the intersection). Show that (f ◦ α)′(t1) ≥ (f ◦ β)′(t1).

10. Let U ⊆ Rn be open and f : U → R differentiable at a ∈ U . If u ∈ Rnwith ‖u‖ = 1, define the directional derivative of f in the direction of uby

Duf(a) = limt→0

f(a+ tu)− f(a)t

.

(a)S Show that if f is differentiable at a, then Duf(a) exists and equalsu ·∇f(a).

(b) Show that if Duf exists, then D−uf exists and D−uf = −Duf .(c)S Define

f(x, y) =

xy2

x2 + y4 if (x, y) 6= (0, 0),

0 otherwise.Show that Duf(0, 0) exists for each u but f is not even continuousat (0, 0).

(d) Find all unit vectors u such that Du(xy)1/3 exists at (0, 0).(e) Find all unit vectors u such that Du|x+ y| exists at (x0,−x0).(f) Find all unit vectors u such that Du(x+ y)1/3 exists at (0, 0).

11. Let z = F (x, y), where x = x(u, v), y = y(u, v), z = z(u, v), andthe partial derivatives of these functions exist on R2. Suppose thatxuyv − yuxv 6= 0. Find zx and zy in terms of zu, zv, xu, xv, yu, and yv.

12.S Let f and fx be continuous on [a, b]× [c, d]. Use the mean value theoremto prove that

d

dx

∫ b

a

f(t, x) dt =∫ b

a

fx(t, x) dt, c ≤ x ≤ d.

13. Let f and fx be continuous on R2 and u(x), v(x) differentiable on R.Use Exercises 5 and 12 to prove that

d

dx

∫ v(x)

u(x)f(t, x) dt =

∫ v(x)

u(x)fx(t, x) dt+f

(v(x), x

)v′(x)−f

(u(x), x

)u′(x).


14. Show that the mean value theorem does not generally hold for vector-valued functions.

15.S A function f : Rn \ {0} → R is homogeneous of degree p > 0 iff(tx) = tpf(x) for all t > 0 and all x 6= 0. Prove that a differentiablefunction f is homogeneous of degree p iff x ·∇f(x) = pf(x) for everyx 6= 0.

16. Prove the following generalization of the Cauchy mean value theorem:Let U ⊆ Rn be open and convex and let f, g : U → R be differentiableon U . Then, for each pair of points a, b ∈ U , there exists c ∈ [a : b] suchthat (

f(b)− f(a))∇g(c) · (b− a) =

(g(b)− g(a)

)∇f(c) · (b− a).

17.S Let f : U → Rm be continuously differentiable on the open set U ⊆ Rnand let C be a compact convex subset of U . Prove that

‖f(x)− f(y)− dfy(x− y)‖ ≤ supz∈C‖dfz − dfy‖ ‖x− y‖, x, y ∈ C,

and that the supremum is finite.

18. Let f(x, y) =(x2−y2, 2xy

)and (a, b) 6= (0, 0). Show that if the functions

ϕ, ψ : (−1, 1)→ R2 are differentiable and ϕ(0) = ψ(0) = (a, b), then

(f ◦ ϕ)′(0) · (f ◦ ψ)′(0)‖(f ◦ ϕ)′(0)‖ ‖(f ◦ ψ)′(0)‖ = ϕ′(0) · ψ′(0)

‖ϕ′(0)‖ ‖ψ′(0)‖ ,

that is, the angle between the curves ϕ and ψ at their intersection ispreserved under the transformation f .

9.4 Inverse Function TheoremThe one-dimensional inverse function theorem of Section 4.4 has the fol-

lowing n-dimensional extension.

9.4.1 Inverse Function Theorem. Let U ⊆ Rn be open and let f : U → Rnbe continuously differentiable on U . If Jf (a) 6= 0 for some a ∈ U , then thereexist open sets Ua ⊆ U and Va = f(Ua) with a ∈ Ua such that f is one-to-oneon Ua and f−1 : Va → Ua is continuously differentiable. Moreover,(

dfx

)−1 = d(f−1)y, x ∈ Ua, y := f(x). (9.15)

The conclusion of the theorem may be summarized by saying that f has acontinuously differentiable local inverse at a. Of course, since f need not beone-to-one on U , f may not have a “global” inverse.

The proof of the theorem requires two lemmas. The first is of some inde-pendent interest.

9.4.2 Lemma (Contraction Mapping Principle). Let (X, d) be a completemetric space and let ϕ : X → X be a continuous function such that, for some0 ≤ c < 1,

d(ϕ(x), ϕ(y)

)≤ c d(x, y) for all x, y ∈ X.

Then there exists a unique point x ∈ X such that ϕ(x) = x.

Proof. Choose any point x0 in X and define a sequence {xn} recursively byxn = ϕ(xn−1), n ≥ 1. By hypothesis,

d(xk+1, xk) ≤ c d(xk, xk−1) ≤ c2d(xk−1, xk−2) ≤ · · · ≤ ckd(x1, x0).

Thus, by the triangle inequality, for m > n

d(xn, xm) ≤m−1∑k=n

d(xk, xk+1) ≤ d(x1, x0)∞∑k=n

ck.

Since c < 1, the series∑∞k=1 c

k converges, hence the sum on the right tendsto zero as n → ∞. It follows that {xn} is a Cauchy sequence and thereforeconverges to some x ∈ X. Letting n → +∞ in the equation xn = ϕ(xn−1)yields ϕ(x) = x. If also ϕ(y) = y, then d(x, y) = d

(ϕ(x), ϕ(y)

)≤ c d(x, y),

which is possible only if x = y.

9.4.3 Lemma. Let U ⊆ Rn be open and f : U → Rn continuously differen-tiable. If a ∈ U with Jf (a) 6= 0, then there exists r > 0 such that the lineartransformation dfx is invertible for each x ∈ Br(a).

Proof. Since f ′ is continuous, its entries are continuous, hence Jf (x) is acontinuous function of x. Since Jf (a) 6= 0, there exists r > 0 such thatJf (x) 6= 0 on Br(a) ⊆ U . Since a linear transformation on Rn is invertible iffthe determinant of its matrix is not zero, dfx is invertible for x ∈ Br(a).

Proof of the inverse function theorem.

By 9.4.3, there exists an r > 0 such that Cr(a) ⊆ U and dfx is invertible foreach x in an open setWr containing Cr(a). Let T = dfa and define g = T−1◦fon Wr. Then dga = T−1 ◦ dfa = In, the identity transformation on Rn. Nowapply 9.3.6 to the function g(x)−x on Cr(a). The constant c in that theoremis

sup{‖dgz − dga‖ : z ∈ Cr(a)},

which we can make less than 1/2 by taking r sufficiently small, using thecontinuity of the function z 7→ dgz at a. Thus

‖g(x)− g(y)− (x− y)‖ ≤ 12‖x− y‖, x, y ∈ Cr(a). (9.16)

Since‖x− y‖ − ‖g(x)− g(y)‖ ≤ ‖g(x)− g(y)− (x− y)‖,

we see from (9.16) that

12‖x− y‖ ≤ ‖g(x)− g(y)‖, x, y ∈ Br(a).

In particular, g is one-to-one on Br(a).Next, we use 9.4.2 to show that g

(Br(a)

)is open. Let c ∈ Br(a), d = g(c)

and choose s > 0 so that Cs(c) ⊆ Br(a). We claim that

Bs/2(d) ⊆ g(Cs(c)

)⊆ g(Br(a)

). (9.17)

The second inclusion is clear. For the first, let u ∈ Bs/2(d). To show thatu ∈ g

(Br(a)

)define

ϕ(x) = x− g(x) + u, x ∈ Cs(c).

Then

‖c− ϕ(x)‖ = ‖g(x)− g(c)− (x− c) + d− u‖≤ ‖g(x)− g(c)− (x− c)‖+ ‖d− u‖≤ 1

2‖x− c‖+ ‖d− u‖ by (9.16)< s/2 + s/2 = s,

so ϕ(Cs(c)

)⊆ Bs(c). Moreover, using (9.16) again we have

‖ϕ(x)− ϕ(y)‖ ≤ 12‖x− y‖, x, y ∈ Cs(c).

By Lemma 9.4.2, ϕ(x) = x for some x ∈ Bs(c), hence u = g(x) ∈ g(Bs(c)

).

Since u was arbitrary, (9.17) holds. Since d ∈ g(Br(a)

)was arbitrary, g

(Br(a)

)is open.

Next, we show that g−1 : g(Br(a)

)→ Br(a) is differentiable at b := g(a).

Since b ∈ g(Br(a)

)and g

(Br(a)

)is open, b + k ∈ g

(Br(a)

)for sufficiently

small ‖k‖, that is, for each such k,

b+ k = g(a+ h) for some ‖h‖ < r.

By (9.16),

‖h‖ − ‖k‖ ≤ ‖h− k‖ = ‖g(a+ h)− g(a)− h‖ ≤ 12‖h‖,

hence ‖k‖ ≥ 12‖h‖. Since g−1(b+ k) = a+ h and g−1(b) = a, recalling that

dga = In we have

‖g−1(b+ k)− g−1(b)− Ink‖‖k‖

= ‖h− k‖‖k‖

≤ 2 ‖g(a+ h)− g(a)− dga(h)‖‖h‖

.

Since k→ 0 implies that h→ 0, which in turn implies that the right side ofthe above inequality tends to zero, we see that g−1 is differentiable at b withderivative In.

Now set Ua = Br(a) and Va = (T ◦ g)(Ua). Since T is invertible, it is ahomeomorphism, hence Va is open. Moreover, since g is one-to-one on Ua andmaps Ua onto g(Ua), f = T ◦ g is one-to-one on Ua and maps Ua onto Va.Since f−1 = g−1 ◦ T−1, the chain rule implies that f−1 is differentiable atf(a) = Tb.

Now observe that the entire above argument may be used at any pointx of Ua, since all that is needed is the invertibility of dfx. Therefore, f−1 isdifferentiable on Va.

To verify (9.15) apply the chain rule to f−1 ◦ f = In:

d(f−1)y ◦ dfx = d(f−1 ◦ f)x = d(In)x = In, y = f(x) ∈ Va.

9.4.4 Corollary. Let U ⊆ Rn be open and f : U → Rn continuously differ-entiable with Jf (x) 6= 0 for each x ∈ U . Then f is an open map, that is, ifE ⊆ U is open, then f(E) is open. If particular, f(U) is open.

Proof. In the notation of the theorem, f(E) is the union of the open setsf(Ua ∩ E), a ∈ E.

Since continuous differentiability is a local property, we have

9.4.5 Global Inverse Function Theorem. Under the conditions of thepreceding corollary, if f is also one-to-one on U , then f−1 : f(U) → U iscontinuously differentiable.

9.4.6 Example. The function (x, y) = f(r, θ) = (r cos θ, r sin θ), r > 0, θ ∈ R,has Jacobian r, hence is locally invertible at each point of its domain. Since thefunction is not one-to-one, it has no global inverse. However, if the domain off is suitably restricted, say by requiring θ0 < θ < θ0 + 2π, then f is one-to-oneon the resulting open set Uθ0 := (0,+∞)× (θ0, θ0 + 2π).

By 9.4.5, the restriction g of f to Uθ0 has a continuously differentiableinverse (

r(x, y), θ(x, y))

= g−1(x, y)on the open set Vθ0 = f(Uθ0), obtained by removing the ray (r, θ0), r ≥ 0, fromR2. Clearly, r(x, y) =

√x2 + y2. The function θ(x, y) is called the argument

of (x, y) (determined by θ0) and is denoted by argθ0(x, y). Thus

g−1(x, y) =(√

x2 + y2, argθ0(x, y))

on Vθ0 .

For example, if θ0 = −π, then argθ0(x, y) = arctan(y/x) for x > 0. ♦


x

y

θ0

FIGURE 9.1: The domain of argθ0 .

If a function f has a nonzero Jacobian on an open set U and if f is one-to-one on an open subset U0 of U , then the inverse of the restriction of f toU0 is called a branch of f−1 (even though a global f−1 may not exist). In thepreceding example, g−1 is one of infinitely many branches of f−1.

9.4.7 Example. The function (x, y) = f(u, θ) = (eu cos θ, eu sin θ), where(u, θ) ∈ R2, has Jacobian eu, hence is locally invertible at each point of R2.The set Uθ0 = R× (θ0, θ0 + 2π) is open, and f restricted to Uθ0 is one-to-one.Therefore, the corresponding branch of f−1 is continuously differentiable onf(Uθ0), which is the set Vθ0 of 9.4.6. The inverse may be given explicitly by

u = ln√x2 + y2, θ = argθ0(x, y). ♦

9.4.8 Example. Let (u, v) = f(x, y) =(2x2 − 3y2, 3x2 − 2y2). The Jacobian

is nonzero on the open set U = {(x, y) : xy 6= 0}. Solving the equations for x2

and y2 yieldsx2 = 3v − 2u

5 and y2 = 2v − 3u5 .

Restricting f to each of the open quadrants of R2, we obtain four naturalbranches of f−1, each defined on the open set

V := {(u, v) : 3v > 2u and 2v > 3u} = {(u, v) : v > max{2u/3, 3u/2}} ,

and each of the form

f−1(u, v) =(±√

3v − 2u5 ,±

√2v − 3u

5

), (u, v) ∈ V,

For example, in the open second quadrant of the x, y plane, one chooses theminus sign in the first coordinate and the plus sign in the second. ♦

Exercises1. Find the largest set at each point of which the inverse function theorem

guarantees a local C1 inverse of f , where f(x) =

(a) S (x+ y, xy). (b) S (sin x+ cos y, cosx+ sin y).

(c)(ye−x

2, xe−y

2). (d) (sin x+ sin y, cosx− cos y).

(e) S(ye−2x, ye3x). (f) S

(ln√xy, 1

x2 + y2

), x, y > 0.

(g) (xy, x2 − y2). (h)(

x

1 + x2 + y2 ,y

1 + x2 + y2

).

(i) S (x2 + y2, xy). (j) S (xy2, x2z, yz2).

(k)(ye−x

2, yex

2). (l) (x/y, y/z, z/x), xyz 6= 0.

2. Find a local inverse of the function in the specified part below of Exercise 1about the point (a, b) and find df−1

(u,v).

(i) S (a) , a > b > 0. (ii) (e) , ab 6= 0. (iii) (f) , a > b > 0.(iv) (g) , a > b > 0. (v) S (i) , a, b > 0. (vi) (k) , a, b > 0.

Show that for part (a) in Exercise 1, no inverse is possible on (0,+∞)2.

3. Let f(ρ, φ, θ) =(x(ρ, φ, θ), y(ρ, φ, θ), z(ρ, φ, θ)

)be the spherical coordi-

nate transformation of Exercise 9.1.5. Find an explicit formula for thebranch of f−1 on the set {(ρ, φ, θ) : ρ > 0, 0 < φ < π, 0 < θ < π} .

4.S Letf(x, y) :=

(x

x2 + y2 ,y

x2 + y2

), (x, y) 6= (0, 0).

Show that f = f−1 and find Jf .

5. By considering the function

f(x) ={x+ x2 sin(1/x) if x 6= 0,0 otherwise,

show that the hypothesis in the statement of the inverse function theoremthat df be continuous on U cannot be removed.

6. Let U ⊆ Rn be open and f : U → Rn of class C1 such that for somec > 0, ‖f(x) − f(y)‖ ≥ c‖x − y‖ for all x, y ∈ U , where c > 0. Provethat dfx is invertible for each x ∈ U . Conclude that f : U → f(U) is ahomeomorphism.

9.5 Implicit Function TheoremThe implicit function theorem is one of the most important applications

of the inverse function theorem. The theorem gives conditions under whichan equation of the form F (x,y) = 0 may be solved locally for y in terms ofx. The resulting function is then said to be implicitly defined by the equationF (x,y) = 0. The following simple example illustrates the basic idea.

9.5.1 Example. Let F (x, y, z) = x2 + y2 + z2 − 1. Consider the problem offinding all points (a, b, c) with F (a, b, c) = 0 such that the equation F (x, y, z) =0 has a continuously differentiable solution z = z(x, y) satisfying z(a, b) = c.The key fact here is that such a solution is possible if Fz(a, b, c)(= 2c) 6= 0.Indeed, in this case a2 + b2 = 1 − c2 < 1, hence x2 + y2 < 1 for all (x, y, z)sufficiently near (a, b, c) that satisfy F (x, y, z) = 0. For such points the solution

z(x, y) = ±√

1− x2 − y2

is continuously differentiable, and if the sign chosen is that of c, then z(x, y) isthe unique solution satisfying z(a, b) = c. ♦

Notation. For the statement and proof of the implicit function theorem weuse the following conventions: For points z ∈ Rn+m we write

z = (x,y) = (x1, . . . xn, y1, . . . ym), x ∈ Rn, y ∈ Rm.

For a differentiable function

F (z) = F (x,y) = (F1(x,y), . . . , Fm(x,y)),

we denote by Fy(x,y) the m×m matrix with (i, j)th entry ∂Fi∂yj

(x,y). ♦

9.5.2 Implicit Function Theorem. Let U be an open subset of Rn+m,let F = (F1, . . . , Fm) : U → Rm be continuously differentiable, and letF (a, b) = 0 for some (a, b) ∈ U . If

∂(F1, . . . , Fm)∂(y1, . . . , ym) = detFy(a, b) 6= 0,

then there is an open set Va ⊆ Rn containing a and a unique continuouslydifferentiable mapping f : Va → Rm such that

f(a) = b and F(x, f(x)

)= 0 for every x ∈ Va.

Proof. Define G : U → Rn+m by

G(x,y) =(x, F (x,y)

)=(x, F1(x,y), . . . , Fm(x,y)

).

Then G is continuously differentiable, and

G′(x,y) =[In×n On×mA Fy

],

where In×n is the n×n identity matrix, Om×n is the m×n zero matrix, and Ais an m× n matrix of partial derivatives of the components of F with respectto x. Therefore, JG = detFy. Since detFy(a, b) 6= 0, by the inverse functiontheorem there exists an open set W ⊆ U containing (a, b) and an open setV ⊆ Rn+m containing G(a, b) = (a,0) such that G(W ) = V and

H = (H1, . . . ,Hn, Hn+1, . . . ,Hn+m) := G−1 : V →W

is continuously differentiable. Note that the identities H(G(x,y)

)= (x,y)

and G(H(x,y)

)= (x,y) imply, respectively, that(

Hn+1(G(x,y)

), . . . ,Hn+m

(G(x,y)

))= y, (x,y) ∈W (9.18)

andF(x, Hn+1

(x,y

), . . . ,Hn+m

(x,y

))= y, (x,y) ∈ V. (9.19)

Now let Va = {x ∈ Rn : (x,0) ∈ V }. Then Va is open and contains a. Definef on Va by

f(x) =(Hn+1(x,0), . . . ,Hn+m(x,0)

).

Then f is continuously differentiable, and since (a,0) = G(a, b), (9.18) impliesthat

f(a) =(Hn+1(a,0), . . . ,Hn+m(a,0)

)= b.

Furthermore, (9.19) implies that F (x, f(x)) = 0 on Va. This establishes theexistence of f .

To show uniqueness, assume that F (x, g(x)) = 0 for some function g :Va → Rm. Then

G(x, f(x)) =(x, F (x, f(x))

)=(x, F (x, g(x))

)= G(x, g(x)).

Since G is one-to-one, f(x) = g(x).

9.5.3 Example. The point (x, y, u, v) = (−1, 1, 1, 1) is a solution of the system

F (x, y, u, v) := xu2 + y = 0G(x, y, u, v) := xy2 + u2v2 = 0,

and at that point∂(F,G)∂(u, v) = 4xu3v 6= 0.

By the implicit function theorem, there are C1 functions u(x, y) and v(x, y)defined on a ball Br(−1, 1) that satisfy the above system with u(−1, 1) =v(−1, 1) = 1. If r < 1, then (x, y) ∈ Br(−1, 1) implies that x < 0 < y and wehave the explicit solution

u =√−y/x, v = −x√y. ♦


9.5.4 Remark. Let f = (f1, . . . , fm) be the function in the statement of theimplicit function theorem. Set y = f(x) and w = F (x,y) = F (z). Applyingthe chain rule to the identity F

(x, f(x)

)= 0 yields

∂wi∂xj

+ ∂wi∂y1

∂y1

∂xj+ · · ·+ ∂wi

∂ym

∂ym∂xj

= 0, i = 1, . . . ,m, j = 1, . . . , n.

This may be written in matrix form as∂w1

∂y1· · · ∂w1

∂ym...

...∂wm∂y1

· · · ∂wm∂ym

∂y1

∂x1· · · ∂y1

∂xn...∂ym∂x1

· · · ∂ym∂xn

= −

∂w1

∂x1· · · ∂w1

∂xn...∂wm∂x1

· · · ∂wm∂xn

,

or, in the above notation, as Fy(z)f ′(x) = −Fx(z). Therefore,

f ′(x) = −Fy(z)−1Fx(z),

which shows that the partial derivatives of the solution f in the implicit functiontheorem may be calculated by carrying out a matrix inversion. However, thisis practical only for small dimensions, and even in this case it is often easier toapply the chain rule directly and then use Cramer’s rule. The next exampleillustrates the latter approach. ♦

9.5.5 Example. Suppose (x0, y0, u0, v0) satisfies the system

F (x, y, u, v) = G(x, y, u, v) = 0, (9.20)

where F and G are C1 in a neighborhood of (x0, y0, u0, v0) and

∂(F,G)∂(u, v) (x0, y0, u0, v0) 6= 0.

Then (9.20) has a C1 solution u = u(x, y), v = v(x, y) near (x0, y0) such thatu0 = u(x0, y0), v0 = v(x0, y0). Differentiating each equation in (9.20) withrespect to x and y, we obtain the two systems

Fuux + Fvvx = −Fx Fuuy + Fvvy = −FyGuux +Gvvx = −Gx Guuy +Gvvy = −Gy

Cramer’s rule gives the following solutions near (x0, y0, u0, v0):

ux = −

∂(F,G)∂(x, v)∂(F,G)∂(u, v)

, vx = −

∂(F,G)∂(u, x)∂(F,G)∂(u, v)

, uy = −

∂(F,G)∂(y, v)∂(F,G)∂(u, v)

, vy = −

∂(F,G)∂(u, y)∂(F,G)∂(u, v)

. ♦


Exercises1.S What does the implicit function theorem tell us about solving the

equation x+ y2 + exy = 1 near (0, 0) for one of the variables in terms ofthe other?

2. Suppose (x0, y0, z0) satisfies the equation F (x, y, z) = 0, where F is C1

in a neighborhood of (x0, y0, z0) and Fz(x0, y0, z0) 6= 0. By the implicitfunction theorem, F (x, y, z) = 0 has a C1 solution z = z(x, y) near(x0, y0) with z0 = z(x0, y0). Show that near (x0, y0, z0),

zx = −FxFz

and zy = −FyFz.

3. Show that for each of the functions F below the equation F (x, y, z) = 0has a local C1 solution z = z(x, y) on some ball Br(a, b) such thatz(a, b) = c. Calculate zx in a neighborhood of (a, b, c).

(a) sin(xyz) + cos(xyz)− 1, (a, b, c) = (1, π, 0).(b) exyz + x+ y + z − 1, (a, b, c) = (0, 0, 0).(c) z sin(x+ y + z)− π

√3/6, (a, b, c) = (π/6, π/6, π/3).

(d) xyz + ln(x+ y + z)− 1− ln 3, (a, b, c) = (1, 1, 1).(e) x ln z + y ln x+ z ln y, (a, b, c) = (1, 1, 1).(f) x sin z + y sin x+ z sin y − 3π/2, (a, b, c) = (π/2, π/2, π/2).(g) z2n + xz2n−1 + xy − 1, n ∈ N, (a, b, c) = (1, 1,−1).(h) cos(xyz) + cos(xz) + cos(yz), (a, b, c) = (0, 1, π/2).

4. Suppose (x0, y0, z0) satisfies the system F (x, y, z) = G(x, y, z) = 0, whereF and G are C1 in a neighborhood of (x0, y0, z0) and

∂(F,G)∂(x, y) (x0, y0, z0) 6= 0.

By the implicit function theorem, the system has a C1 solution (x, y) =(x(z), y(z)) near (x0, y0) with (x0, y0) = (x(z0), y(z0)). Show that near(x0, y0, z0),

x′(z) = −

∂(F,G)∂(z, y)∂(F,G)∂(x, y)

, and y′(z) = −

∂(F,G)∂(x, z)∂(F,G)∂(x, y)

.

5.S Show that each pair of variables in the system

sin(x+ z) + ln(y + z) =√

2/2exz + sin(πy + z) = 1


are C1 functions of the other variable near (x, y, z) = (π/4, 1, 0). In thecase (x, y) =

(x(z), y(z)

), calculate x′(z) and y′(z) in a neighborhood of

(π/4, 1, 0).

6. Show that each pair of variables in the system

xy + yz + xz = 11xyz + x + y = 9

are C1 functions of the other variable near (x, y, z) = (1, 2, 3). In thecase (x, y) =

(x(z), y(z)

), calculate x′(z) and y′(z) in a neighborhood of

(1, 2, 3).

7. Show that each pair of the variables (u, v), (x, y), and (x, v) in the system

x2 − y2 + uv−v2 = 0x2 + y2 + uv+u2 = 4

are C1 functions of the remaining variables near (x, y, u, v) = (1, 1, 1, 1).In the case u(x, y), v(x, y), calculate ux in a neighborhood of (1, 1).

8.S Show that the system

x− y + z +u2= 2−x + 2z +u3= 2− y + 3z +u4= 3

cannot be solved for x, y, and z in terms of u near the point (x, y, z, u) =(1, 1, 1, 1), but for any other group of three variables a localC1 solutionin terms of the fourth variable is possible.

9. Let f(x, y) be continuously differentiable with f(0, 0) = 0. Give conditionson fx and fy such that each of the equations below has a C1 solutiony = y(x) on some interval (−r, r) with y(0) = 0. Calculate y′(x) in eachcase.

(a) f(2y, 2x− 3y) = 0. (b)S f(f(x, y), y

)= 0. (c) f

(f(x, y), f(x, y)

)= 0.

10. Let f(x, y) be continuously differentiable with f(0, 0) = 0. Give conditionson fx and fy under which each of the equations below has a C1 solutionz = z(x, y) on some open ball Br(0, 0) with z(0, 0) = 0.(a) f(2y + 3z, 3x− 2z) = 0.(b) f

(f(x,−z), z ln(e2 + x+ y)

)= 0.

(c) f(e2zf(x, 2z), f(y, sin 3z)

)= 0.

(d) f(f(z, x), f(y, z)

)= 0.


11. Let f(x, y) be continuously differentiable with f(0, 0) = 0. For eachsystem below, give conditions on fx and fy under which the systemhas a C1 solution x = g(z), y = h(z) on some interval (−r, r) withg(0) = h(0) = 0.(a) S f

(f(x, y), f(z, y)

)= 0 (b) f

(f(z, z), f(x, y)

)= 0

f(f(y, z), f(x, z)

)= 0 f

(f(x, y), f(y, z)

)= 0

For each system, calculate g′(z).

12. Let f(x, y) be continuously differentiable with f(0, 0) = 0. What doesthe implicit function theorem tell us about the possibility of solving thesystem

f(f(u, x), f(v, y)

)= f

(f(y, u), f(x, v)

)= 0

(a) for (x, y) in terms of (u, v) such that x(0, 0) = y(0, 0) = 0?(b) for (u, v) in terms of (x, y) such that u(0, 0) = v(0, 0) = 0?

13.S Let f , g, and h be continuously differentiable and f(1) = g(1) = h(1) = 0.Give conditions on f ′, g′, and h′ so that the system

f(xu) + g(yu) + h(zu) = 0f(xv) + g(yv) + h(zv) = 0

has a C1 solution u = u(x, y, z), v = v(x, y, z) on some ball Br(1, 1, 1)such that u(1, 1, 1) = v(1, 1, 1) = 1. Calculate ux.

14. Let D ⊆ R2 be compact and let F (x, y, z) be continuous on the setE := D × [a, b] such that for each (x, y) ∈ D there exists a uniquez = z(x, y) ∈ [a, b] for which F

(x, y, z(x, y)

)= 0. Prove that z(x, y) is

continuous on D.

15.S Suppose the equation F (x1, . . . , xn) = 0 may be solved for each variablexj in terms of the others. Show that under suitable conditions

∂x2

∂x1

∂x3

∂x2. . .

∂xn∂xn−1

∂x1

∂xn= (−1)n.

Verify this for each of the functions(a) F (x1, x2, x3) = x1x2x3 − 1,(b) F (x1, x2, x3, x4) = x1x2x3x4 − 1.

16. Let p(x, y) and q(x, y) be C1 on an open set U containing (0, 0) suchthat p(0, 0) = q(0, 0) = 0 and for (x, y) ∈ U \ {(0, 0)}

p(x, y) > 0, and − 1 ≤ q(x, y) ≤ 1.Let

f(x, y, z) = z3 + p(x, y)z + q(x, y), (x, y) ∈ U, z ∈ R.Prove that there is a unique solution z = z(x, y) to f(x, y, z) = 0 on allof U which is C1 on U \ {(0, 0)} and satisfies z(0, 0) = 0.

9.6 Higher Order Partial DerivativesLet f be a real-valued function defined on an open subset of R2 with first

partial derivatives fx and fy. The higher order partial derivatives are definedinductively by

fxx = ∂2f

∂x2 := ∂

∂x

∂f

∂x, fyy = ∂2f

∂y2 := ∂

∂y

∂f

∂y,

fxy = ∂2f

∂y∂x:= ∂

∂y

∂f

∂x, fyx = ∂2f

∂x∂y:= ∂

∂x

∂f

∂y,

fxxx = ∂3f

∂x3 := ∂

∂x

∂2f

∂x2 , fxxy = ∂3f

∂y∂x2 := ∂

∂y

∂2f

∂x2

......

Analogous definitions are given for functions of n variables. For such a functionf , integers mi ∈ Z+ and a permutation (i1, . . . , in) of (1, . . . , n),

∂mf

∂xm1i1· · · ∂xmnin

, m := m1 + · · ·+mn,

is called a partial derivative of order m.The following result will allow some simplifications in calculating higher

order partial derivatives.

9.6.1 Theorem. Let U ⊆ R2 be open and let f : U → R have continuous firstpartial derivatives fx and fy on U . If fxy exists on U and is continuous at(a, b) ∈ U , then fyx(a, b) exists and equals fxy(a, b).

Proof. Choose r > 0 such that (a−r, a+r)× (b−r, b+r) ⊆ U . For |h|, |k| < r,define

ϕk(x) = f(x, b+ k)− f(x, b), x ∈ (a− r, a+ r),ψh(y) = f(a+ h, y)− f(a, y), y ∈ (b− r, b+ r),

∆(h, k) = ϕk(a+ h)− ϕk(a)= ψh(b+ k)− ψh(b)= f(a+ h, b+ k)− f(a, b+ k) + f(a, b)− f(a+ h, b).

By the mean value theorem applied twice, there exist s, t ∈ (0, 1) such that

∆(h, k) = ϕ′k(a+ sh)h=[fx(a+ sh, b+ k)− fx(a+ sh, b)

]h

= fxy(a+ sh, b+ tk)hk.

By continuity of fxy at (a, b),

lim(h,k)→(0,0)

∆(h, k)hk

= lim(h,k)→(0,0)

fxy(a+ sh, b+ tk) = fxy(a, b).

On the other hand, for each h,

limk→0

∆(h, k)k

= limk→0

ψh(b+ k)− ψh(b)k

= ψ′h(b) = fy(a+ h, b)− fy(a, b),

so by the iterated limit theorem (8.4.4),

limh→0

fy(a+ h, b)− fy(a, b)h

= limh→0

limk→0

∆(h, k)hk

= lim(h,k)→(0,0)

∆(h, k)hk

.

Therefore, fyx(a, b) = fxy(a, b).

The following example shows that continuity of at least one of the secondpartial derivatives in the theorem is essential.

9.6.2 Example. Let f(0, 0) = 0 and define

f(x, y) = x3y − y3x

x2 + y2 if (x, y) 6= (0, 0).

Then the first partial derivatives exist and are continuous on R2, the sec-ond partial derivatives exist on R2, but fxy(0, 0) 6= fyx(0, 0). Indeed, sincefx(0, 0) = 0,

fx(0, y) = limh→0

f(h, y)− f(0, y)h

= limh→0

h2y − y3

h2 + y2 = −y,

and similarly fy(x, 0) = x. Therefore, fxy(0, 0) = −1 and fyx(0, 0) = 1. ♦

Theorem 9.6.1 may be extended to functions f of n variables. Indeed, if1 ≤ i < j ≤ n, then under suitable continuity conditions one has

∂2f

∂xi∂xj= ∂2f

∂xj∂xi,

since the only “active” variables in this identity are xi and xj . Combining thisobservation with an induction argument leads to the following result.

9.6.3 Corollary. Let f be a real-valued function defined on an open subset ofRn and let m = m1 + m2 + · · · + mn, mi ∈ Z+. Then, for any permutation(i1, . . . , in) of (1, . . . , n),

∂mf

∂xmi1i1· · · ∂xminin

= ∂mf

∂xm11 · · · ∂xmnn

,

provided that all partial derivatives of f up to order m are continuous on U .


9.6.4 Definition. Let r ∈ N. A real-valued function f on an open set U ⊆ Rnis said to be of class Cr on U (or simply Cr on U) if all partial derivativesup to order r exist and are continuous on U . Also, f is of class C∞ on U ifit is of class Cr on U for every r ∈ N. A vector-valued function is Cr if eachcomponent function is Cr. Continuous functions are said to be of class C0. Afunction is of class Cr on a set if it is the restriction of a Cr function on alarger open set. ♦

9.6.5 Remarks. (a) A function of class r + 1 is of class r. The function

f(x1, . . . , xn) ={xr+1

1 if x1 ≥ 0,0 otherwise

is Cr on Rn but not Cr+1.(b) The standard rules of differentiation show that if f and g are real-valued

functions of class Cr, then so are αf , f + g, fg, and f/g. For example, iff(x, y) and g(x, y) are of class C2, then

(fg)xx = fxxg + fgxx + 2fxgx,

with similar formulas holding for (fg)xy and (fg)yy. Since the terms on theright are continuous, fg is C2. In particular, polynomials and rational functionsof several variables are of class C∞.(c) The composite f = g ◦ h of real-valued Cr functions is again Cr. This

follows from the chain rule: The matrix equation

f ′(x) = g′(h(x)

)h′(x)

shows that the entries of f ′(x) are sums of products of Cr−1 functions, hencethe entries of f(x) are Cr.(d) If the function f in the statement of the inverse function theorem is Cr

on U , then the local inverse of f is also Cr. This is proved by induction on ras follows. Assume that the assertion holds for r − 1, and let f be Cr on U .Then the entries of the matrix f ′(x) are Cr−1, hence, near a, the entries of

(f−1)′(y) =[f ′(f−1(y))

]−1

are Cr−1, as these are rational functions of the entries of f ′. Therefore, theentries of f−1 are Cr.(e) If the function F in the statement of the implicit function theorem is Cr,

then the solution y = f(x) to the equation F (x,y) = 0 is Cr. This followsfrom (d) , since f is constructed using the inverse function theorem. ♦

The following example illustrates how the chain rule may be used tocalculate higher order partial derivatives of composite functions.


9.6.6 Example. Let u = f(x, y) be C2 on R2 and let x = r cos θ, y = r sin θ.Then

ur = (cos θ)ux + (sin θ)uy, uθ = −(r sin θ)ux + (r cos θ)uy,urr = (cos θ)uxr + (sin θ)uyr = (cos θ)2uxx + (2 sin θ cos θ)uxy + (sin θ)2uyy,

uθθ = −(r cos θ)ux − (r sin θ)uxθ − (r sin θ)uy + (r cos θ)uyθ,= (r sin θ)2uxx − (2r2 sin θ cos θ)uxy + (r sin θ)2uyy − rur.

Calculations like these are useful for changing coordinates in differential opera-tors. For example, the above equations imply that

∂2

∂x2 + ∂2

∂y2 = ∂2

∂r2 + 1r

∂

∂r+ 1r2

∂2

∂θ2 . (9.21)

The operator on the left is called the Laplacian. The equation expresses theLaplacian in polar coordinates. ♦

Exercises1. Let z = f(x, y) be C2 on R2. Show that the following equations hold for

the given functions x = x(r, t) and y = y(r, t).

(a) zxx + zyy = zrr + ztta2 + b2

, x = ar + bt, y = at− br.

(b)S xzxx − zyy = rzrr − tzttt− r

, x = rt, y = r + t.

(c) zxx + 4zyy = zrr + zttr2 + t2

, x = rt, y = r2 − t2.

(d) x2zxx + y2zyy = 12 [r2zrr + ztt − rzr], x = ret, y = re−t.

(e)S zxx + zyy = e−2r[zrr + ztt], x = er sin t, y = er cos t.

(f)S a2x2zxx + b2y2zyy = zrr + ztt − azr − bzt, x = ear, y = ebt.

2. Let z = f(x, y) be C2 on R2, x = ar + bs, and y = cr + ds. Show thatzrrzsszrs

=

a2 c2 2acb2 d2 2bdab cd ad+ bc

zxxzyyzxy

.In particular, if x = r − s and y = r + s, show thatzxxzyy

zxy

= 14

1 1 −21 1 21 −1 0

zrrzsszrs

.3. Let z = f(x, y), x = g(r, s), y = h(r, s) be C2 on R2. Show that

∂2z

∂r2 = ∂z

∂x

∂2x

∂r2 + ∂z

∂y

∂2y

∂r2 + ∂2z

∂x2

(∂x

∂r

)2+ ∂2z

∂y2

(∂r

∂r

)2+ 2 ∂2z

∂x∂y

∂x

∂r

∂y

∂r.

4.S Let F (x, y, z) be C2 on an open set U and assume that the equationF (x, y, z) = 0 defines z implicitly as a function of x and y. Express zxxin terms of partial derivatives of F .

5.S Show that each of the following functions u = u(t, x) satisfies the onedimensional heat equation ut = k2uxx.(a) u = (a sin x+ b cosx) exp(−k2t). (b) u = t−1/2 exp (−x2/4k2t).

6. Let f(x) and g(x) be twice differentiable.(a) Show that the function u(t, x) =

[f(x− ct) + g(x+ ct)

]satisfies the

one dimensional wave equation utt = c2uxx.

(b) Show that the function v(t, x) = 1x

[f(x− ct) + g(x+ ct)], x > 0,

satisfies the equation vtt = c2(

1 + 1x

)vxx + c2

xvx.

7.S (Spherical coordinate analog of (9.21)). Let w = f(x, y, z) be of classC2 on R3, where

x = ρ sinφ cos θ, y = ρ sinφ sin θ, and z = ρ cosφ.

Show that

∂2w

∂x2 + ∂2w

∂y2 + ∂2w

∂z2 = ∂2w

∂ρ2 + 2ρ

∂w

∂ρ+ 1ρ2∂2w

∂φ2 + cosφρ2 sinφ

∂w

∂φ+ 1ρ2 sin2 φ

∂2w

∂θ2 .

8. Show that if f(x, y) is C2 and homogeneous of degree n ≥ 2 (Exer-cise 9.3.15), then

x2fxx + 2xyfxy + y2fyy = n(n− 1)f(x, y).

9.S Let g be C2 on (0,+∞), p 6= 0, and f(x) = g (‖x‖p), x ∈ Rn \ {0}.Show that

1p

n∑i=1

fxixi = (n+ p− 2)‖x‖p−2g′ (‖x‖p) + p‖x‖2(p−1)g′′ (‖x‖p) and

1p

∑i<j

fxixj =[(p− 2)‖x‖p−4g′ (‖x‖p) + ‖x‖2(p−2)g′′ (‖x‖p)

]∑i<j

xixj .

10. Let r > 0. Show that the substitutions

s = ex, t = T − 2τσ2 , and v(t, s) = u(τ, x)

transform the partial differential equation

vt(t, s) + rsvs(t, s) + 12σ

2s2vss(t, s)− rv(t, s) = 0, s > 0, 0 ≤ t ≤ T


into

uτ (τ, x) = (k − 1)ux(τ, x) + uxx(x, τ)− ku(x, τ), k := 2r/σ2.

The first equation arises in the Black–Scholes theory of option pricing.The second is an example of a diffusion equation.

11. Show that the substitutions

u(τ, x) = eax+bτw(τ, x), a := 12 (1−k), b := a(k−1)+a2−k = − 1

4 (k+1)2,

reduce the diffusion equation in Exercise 10 to the heat equationwτ (τ, x) = wxx(τ, x)

9.7 Higher Order Differentials and Taylor’s TheoremHigher order differentials of a function f of several variables are analogs

of higher order derivatives of functions of a single variable. These may beconveniently expressed in terms of higher order partial derivatives of f . Animportant consequence is Taylor’s theorem in n-dimensions, which is used toestablish convergence of power series in several variables.

We begin by giving an alternate description of the spaceL(Rn,L(Rn,R)

).

For a member B of this space and each h ∈ Rn, Bh ∈ L(Rn,R) has matrix[(Bh)e1 · · · (Bh)en

],

which we identify with the vector((Bh)e1, . . . , (Bh)en

), so that (Bh)k may

be written (Bh) · k. Now define

B(h,k) := (Bh) · k, h, k ∈ Rn.

Clearly, B is linear in h for each fixed k and linear in k for each fixed h. Sucha function is called a bilinear functional on Rn. Using the bilinearity, we have

B(h,k) = B

n∑i=1

hiei,

n∑j=1

kjei

=n∑i=1

n∑j=1

Bijhikj , (9.22)

where Bij := B(ei, ej) = (Bei) · ej . In matrix notation,

B(h,k) = [h1 · · · hn][Bij] k1

...kn

.


Conversely, given any bilinear functional B on Rn, the equation (Bh)k :=B(h,k) defines a member B of L

(Rn,L(Rn,R)

). Thus, identifying B with B,

we see that L(Rn,L(Rn,R)

)may be viewed as the vector space of all bilinear

functionals on Rn.Now let U ⊆ Rn be open and let f : U → R be C2 on U . Then df is

a function on U taking values in L(Rn,R). Identifying df with the vectorfunction ∇f = (∂1f, . . . , ∂nf), we define d2fx ∈ L

(Rn,L(Rn,R)

)by

d2fx = d(df)x = d(∇f)x

that is, by the above identification,

d2fx(h,k) = d(∇f)x(h) · k, x ∈ U, h, k ∈ Rn.

The matrix of d(∇f)x has (i, j) entry ∂j∂if(x) = ∂i∂jf(x), since f is C2.Thus

d2fx(h,k) =n∑

i,j=1

∂2f(x)∂xi∂xj

hikj , h, k ∈ Rn.

The bilinear function d2fx is called the second order differential of f at x.For higher order differentials, we need the following generalization of a

bilinear functional:

9.7.1 Definition. An m-multilinear functional on Rn is a real-valued functionM(h1, . . . ,hm) of vectors hj = (hj1, . . . , hjn) ∈ Rn that is linear in each variablehj when the other variables are held fixed. ♦

Analogous to (9.22) we have

M(h1, . . . ,hm) =n∑

j1=1· · ·

n∑jm=1

Mj1,...,jmh1j1· · ·hmjm (9.23)

where Mj1,...,jm := M(ej1 , . . . , ejm).Now let f be Cm on U , m ≥ 2. The mth order differential of f at x is

defined inductively bydmfx = d(dm−1f)x.

As in the case m = 2, we may interpret dmfx as the m-multilinear functional

dmfx(h1, . . . ,hm) =n∑

j1=1· · ·

n∑jm=1

∂mf(x)∂xj1 · · · ∂xjm

h1j1· · ·hmjm .

The mth total differential Dmfx of f at x is then defined by

Dmfx(h) := dmfx(h, . . . ,h), h := (h1, . . . , hn)

=n∑

j1=1· · ·

n∑jm=1

∂mf(x)∂xj1 · · · ∂xjm

hj1 · · ·hjm , (9.24)


which is frequently written

Dmfx =n∑

j1,j2,...,jm=1

∂mf(x)∂xj1∂xj2 . . . ∂xjm

dxj1dxj2 · · · dxjm , dxj(h) := hj .

By 9.6.3, each partial derivative in (9.24) may be expressed as

∂mf

∂xm11 . . . ∂xmnn

, m := m1 + · · ·+mn, mj ∈ Z+.

Similarly, the corresponding product of h’s in (9.24) may be written in theform hm1

1 . . . hmnn . For a fixed multi-index (m1, . . . ,mn) ∈ Z+ × · · · × Z+, thenumber of terms in (9.24) of the form

∂mf

∂xm11 . . . ∂xmnn

hm11 · · ·hmnn

is given by the multinomial coefficient(m

m1,m2, . . . ,mn

)= m

m1!m2! · · ·mn! , (9.25)

which is the number of distinct ways of arranging m objects, where m1 arealike, m2 are alike, etc. With this notation, (9.24) may be written

Dmfx(h) =∑(

m

m1, . . . ,mn

)∂mf(x)

∂xm11 . . . ∂xmnn

hm1 · · ·hmn , (9.26)

or, in differential notation,

Dmfx =∑(

m

m1, . . . ,mn

)∂mf(x)

∂xm11 . . . ∂xmnn

(dx1)m1 · · · (dxn)mn ,

where the sums are taken over all multi-indices (m1, . . . ,mn) ∈ Z+ × · · · × Z+

for which m1 + · · ·+mn = m.We may go a step further by appealing to the following generalization of

the binomial theorem.

9.7.2 Multinomial Theorem. Let h1, . . . , hn ∈ R and m ∈ N. Then

(h1 + · · ·+ hn)m =∑(

m

m1, . . . ,mn

)hm1

1 · · ·hmnn , (9.27)

where the summation is taken over all multi-indices (m1, . . . ,mn) for whichm1 + · · ·+mn = m.

Proof. The theorem may be proved by induction, but we give a combinatorialargument instead. The left side of (9.27) expands into a sum of products ofthe form x1 · · ·xm, where each xi is one of the terms in the sum h1 + · · ·+ hn.


Each such product may be written uniquely as hm11 · · ·hmnn , where mj ≥ 0

and m1 + · · ·+mn = m. For each fixed (m1, . . . ,mn), the number of productsof this form is the number of ways m1 factors in the product x1 · · ·xn maybe chosen to be h1, m2 factors may be chosen to be h2, etc. This number isprecisely the multinomial coefficient (9.25).

Now consider the operator hi∂

∂xi, which takes a C1 function f to the

function hi∂f

∂xi. If multiplication of such operators is defined as operator

composition, then the usual laws of algebra hold. For example, the operator(h1

∂

∂x1+ h2

∂

∂x2

)h2

∂

∂x2

applied to a C2 function f yields

h1h2∂2f

∂x1∂x2+ h2

2∂2f

∂x22

=(h1h2

∂2

∂x1∂x2+ h2

2∂2

∂x22

)f,

hence we may write(h1

∂

∂x1+ h2

∂

∂x2

)h2

∂

∂x2=(h1h2

∂2

∂x1∂x2+ h2

2∂2

∂x22

).

Similarly, (h∂

∂x+ k

∂

∂y

)2= h2 ∂2

∂x2 + 2hk ∂2

∂x∂y+ k2 ∂2

∂y2 .

The last example suggests that the multinomial theorem is valid in this setting.This is indeed the case (a similar proof works). It follows from (9.26) that themth total differential may be written in operator form as

Dmfx(h) =(h1

∂

∂x1+ h2

∂

∂x2+ · · ·+ hn

∂

∂xn

)m= (h ·∇)m.

We may now state the n-dimensional version of Taylor’s theorem.

9.7.3 Taylor’s Theorem. Let U ⊆ Rn be open, m ∈ N, and let f : U → Rbe Cm+1 on U . Then for each pair of distinct points a, x ∈ U for which[a : x] ⊆ U , there exists a point c ∈ [a : x] depending on x and a such that

f(x) =m∑p=0

1p!(h ·∇

)pf(a) + 1

(m+ 1)!(h ·∇

)m+1f(c), h := x− a. (9.28)

Proof. The line segment [a : x] is described by

ϕ(t) := (1− t)a+ tx = a+ th, 0 ≤ t ≤ 1.

Since U is open, there exists an r > 0 such that ϕ((−r, 1 + r)

)⊆ U . Let

F = f ◦ ϕ. By the chain rule,

F ′(t) =n∑j=1

∂f(ϕ(t)

)∂xj

hj and d

dt

∂f(ϕ(t)

)∂xj

=n∑i=1

∂2f(ϕ(t)

)∂xi∂xj

hi,

henceF ′′(t) =

n∑i, j=1

∂2f(ϕ(t)

)∂xi∂xj

hihj .

An induction argument shows that

F (p)(t) =n∑

j1,...,jp=1

∂pf(ϕ(t)

)∂xj1 · · · ∂xjp

hj1 . . . hjp =(h ·∇

)pf(ϕ(t)

).

By Taylor’s theorem in one variable, there exists c ∈ (0, 1) such that

f(x) = F (1) =m∑p=0

F (p)(0)p! + F (m+1)(c)

(m+ 1)! .

Setting c = ϕ(c) completes the proof.

The summation in (9.28) is called an mth order Taylor polynomial about aand is denoted by Tm(x,a). For example, the second order Taylor polynomialof a C2 function f(x1, x2) is

f + h1∂f

∂x1+ h2

∂f

∂x2+ 1

2 h21∂2f

∂x21

+ h1h2∂2f

∂x1∂x2+ 1

2 h22∂2f

∂x22,

where hj = xj − aj and the terms are evaluated at (a1, a2). The last term in(9.28) is called the remainder term and is denoted by Rm(x,a).

The following theorem gives a sufficient condition for a C∞ function to beexpressed as a multi-variable Taylor’s series.

9.7.4 Taylor Series Representation. Let U ⊆ Rn be open and convex andlet f : U → R be C∞ on U . Suppose that for some M < +∞∣∣∣∣ ∂pf(x)

∂xp11 ∂x

p22 . . . ∂xpnn

∣∣∣∣ ≤Mfor all x ∈ U , p ∈ N, and all pj ∈ Z+, where p = p1 + . . .+ pn. Then

f(x) =∞∑p=0

1p!(h ·∇

)pf(a), a, x ∈ U, h := x− a. (9.29)


Proof. By 9.7.3, the theorem will follow if we show that the remainder term

Rm(x,a) = 1(m+ 1)!

(h ·∇

)m+1f(c)

tends to zero as m→∞. By (9.26),

|Rm(x,a)| ≤ M

(m+ 1)!∑(

m+ 1m1, . . . ,mn

)|h1|m1 |h2|m2 . . . |hn|mn ,

where the summation is taken over all multi-indices (m1, . . . ,mn) for whichm1 + · · ·+mn = m+ 1. By the multinomial theorem, this sum is hm+1, whereh = |h1|+ |h2|+ · · ·+ |hn|. Therefore,

|Rm(x,a)| ≤ Mhm+1

(m+ 1)! ,

which implies limmRm(x,a) = 0.

The series on the right in (9.29) is called the Taylor series for f about a.While the theorem may be applied directly, in many cases it is easier to makeuse of single variable series. For example, from the series expansion for ex wehave

exy =∞∑n=0

xnyn

n! and ex+y = exey =∞∑n=0

( n∑j=0

xj

j!yn−j

(n− j)!

).

Exercises1. Let f be of class C3. Write out explicitly

(a) D2f(x, y). (b)S D3f(x, y). (c) D2f(x, y, z).

2.S Calculate D2f for the functions f(x, y) =

(a) x3y2 + x2y3. (b) 1x2y

. (c) sin(xy). (d) ex2+y2 . (e) ln(x2 + y).

3.S Find Dm+n+1(xmyn).

4. Let f(x, y) be Cn. Show that for 1 ≤ k ≤ n,

∂k

∂tkf(tx, ty)

∣∣t=1 =

((x, y) ·∇

)kf(x, y).

Conclude that if f is homogeneous of degree n (Exercise 9.3.15), then((x, y) ·∇

)kf(x, y) = n(n− 1) · · · (n− k + 1)f(x, y).

5. Write out explicitly(a)S the first order Taylor polynomial for a C1 function f(x1, x2, x3).(b) the third order Taylor polynomial for a C3 function f(x1, x2),

6. A polynomial of degree m+ n in two variables x and y is a function ofthe form

m∑i=0

n∑j=0

aijxiyj , where aij ∈ R and amn 6= 0.

Prove that f(x, y) is a polynomial of degree ≤ p on Br(a, b) iffDp+1f(x, y) = 0 for all (x, y) ∈ Br(a, b).

7. Let P (x, y) be a polynomial in x, y. Prove that the polynomials P (x± 1)may be written as linear combinations of derivatives

∂kP (x, y)∂xi∂yj

, k = i+ j.

8.S Let ϕ(t) be of class Cm on an interval (−r, r) and let f(x) = ϕ(b · x

)where b, x ∈ Rn. Show that the Taylor polynomial for f of order mabout 0 is

m∑p=0

ϕ(p)(0)p!

(b · x

)p.

9. Let U ⊆ R2 be open and connected and let f be C∞ on U such that foreach (x, y) ∈ U there exists r > 0 and p ∈ N depending on (x, y) suchthat Dpf = 0 on Br(x, y). Prove that there exists a single p ∈ N suchthat Dpf = 0 on U . Hint. Use Exercise 6.

10. Let U ⊆ Rn be open and let f be Cp on U such that all partial derivativesof f of order r < p vanish throughout U . Let C be a compact convexsubset of U . Prove that there exists c < +∞ such that

‖f(x)− f(y)‖ ≤ c‖x− y‖p, x, y ∈ C.

11. Use the one variable Taylor series to find third order Taylor polynomialswith a = (0, 0) for the functions

(a) S sin(x+ y). (b) cos√xy. (c) ln(1− x− y)−1.

(d) S arctan(x+ y). (e) e2x+3y. (f) y

1 + xy.

*9.8 OptimizationThroughout the section, f : U → R denotes aC1 function on an open subset U of Rn.

In this section we use differential theory to find the maximum and minimumvalues of f on subsets E of U . The first step is to find all local extrema.

Local Extrema and Critical Points9.8.1 Definition. Let a ∈ U . If f(a) is the maximum (minimum) value of fon some ball in U with center a then f is said to have a local maximum (localminimum) at a ∈ U In either case, f is said to have a local extremum at a. ♦

The following theorem gives a necessary condition for the existence of alocal extremum.

9.8.2 Local Extremum Theorem. If f has a local extremum at a, thendfa = 0.

Proof. The function g(t) := f(a1, . . . aj−1, t, aj+1, . . . , an) has a local ex-tremum at t = aj , hence, by the single variable local extremum theorem(4.2.2), ∂jf(a1, . . . , an) = g′(aj) = 0.

9.8.3 Definition. A point a ∈ U is called a critical point of f if dfa = 0. Acritical point a is a local maximum (local minimum) point if f has a localmaximum (local minimum) at a. If a is neither a local maximum nor a localminimum point, then a is called a saddle point. ♦

FIGURE 9.2: Saddle point.

By definition, a critical point a is a saddle point iff in each ball Br(a)there exist points x and y such that f(x) < f(a) < f(y). This means thatthe graph of f rises in some directions from a and falls in others. A familiarexample is f(x, y) = y2 − x2 at (0, 0) (Figure 9.2).

Second Derivative TestThe following theorem gives sufficient conditions for a critical point of a

function f to be a local maximum point, a local minimum point, or a saddlepoint. It may be seen as an extension of the second derivative test for functionsof one variable.

9.8.4 Second Derivative Test. Let f be C2 on U and let a ∈ U be a criticalpoint of f .

(a) If D2fa(h) > 0 for all h 6= 0, then a is a local minimum point.

(b) If D2fa(h) < 0 for all h 6= 0, then a is a local maximum point.

(c) If D2fa(h) > 0 for some h and D2fa(k) < 0 for some k, then a is asaddle point of f .

Proof. Choose r > 0 such that Br(a) ⊆ U . By 9.28 with m = 1, for each hwith ‖h‖ < r there exists c ∈ [a : a+ h] such that

f(a+ h)− f(a) = 12D

2fc(h) = 12[D2fa(h) + η(h)

], (9.30)

where

η(h) = D2fc(h)−D2fa(h) =n∑

i,j=1hihj

[∂2f(c)∂xi∂xj

− ∂2f(a)∂xi∂xj

].

Set

ε(h) ={‖h‖−2η(h) if ‖h‖ 6= 0,0 if ‖h‖ = 0.

Since |hihj | ≤ ‖h‖2,

|ε(h)| ≤n∑

i,j=1

∣∣∣∣ ∂2f(c)∂xi∂xj

− ∂2f(a)∂xi∂xj

∣∣∣∣ .Since f is C2, limh→0 ε(h) = 0.

With these preliminaries out of the way, assume that the hypothesis in(a) holds. Since the function D2fa(h) is continuous in h, it has a positiveminimum m on the sphere S1(0) in Rn. Thus

D2fa(h) = ‖h‖2D2fa

(h

‖h‖

)≥ m‖h‖2, h 6= 0,

so from (9.30)

f(a+ h)− f(a) ≥ 12(m‖h‖2 + η(h)

)= 1

2(m+ ε(h)

)‖h‖2.

Since m > 0 and ε(h)→ 0, f(a+h)− f(a) > 0 for all h 6= 0 with sufficientlysmall norm. This proves (a). Part (b) follows from (a) by considering −f .

To prove (c), suppose for some h, k that D2fa(h) > 0 and D2fa(k) < 0.By (9.30),

f(a+ th)− f(a) = t2

2[D2fa(h) + ‖h‖2ε(th)

],

for all t > 0. Therefore, f(a+ th)− f(a) > 0 for all sufficiently small t > 0.Similarly, f(a+ tk)− f(a) < 0 for all sufficiently small t > 0.

9.8.5 Example. Let f(x, y, z) = x2 + y2 + xy + 3x+ sin2 z. The system

fx = 2x+ y + 3 = 0, fy = x+ 2y = 0, fz = sin(2z) = 0

has solutions an = (−2, 1, nπ/2), n ∈ Z. From

fxx = fyy = 2, fzz = 2 cos(2z), fxy = 1, and fxz = fyz = 0,

we have

D2f(h, k, `) =(h∂

∂x+ k

∂

∂y+ `

∂

∂z

)2f

= h2fxx + k2fyy + `2fzz + 2(hkfxy + h`fxz + k`fyz)= 2(h2 + k2 + hk + `2 cos(2z)

).

Therefore,

D2fan(h, k, `) ={

2(h2 + k2 + hk + `2) if n = 2k,2(h2 + k2 + hk − `2) if n = 2k + 1.

Since h2 + k2 + hk ≥ 0 for all h, k, a2k is a local minimum point and a2k+1 asaddle point. ♦

The second derivative test gives no information ifD2fa = 0. For example,the critical point (0, 0) of the function f(x, y) = xn + y2, n ≥ 3, is a saddlepoint if n is odd and a local minimum point if n is even.

For n = 2, there is a simpler version of the second derivative test:

9.8.6 Corollary. Let U ⊆ R2 be open and let f : U → R be C2 on U . For acritical point (a, b) of f , set

∆ = ∆(a, b) =∣∣∣∣fxx(a, b) fxy(a, b)fyx(a, b) fyy(a, b)

∣∣∣∣ = fxx(a, b)fyy(a, b)− f2xy(a, b).

(a) If ∆ > 0 and fxx(a, b) > 0, then (a, b) is a local minimum point.

(b) If ∆ > 0 and fxx(a, b) < 0, then (a, b) is a local maximum point.

(c) If ∆ < 0, then (a, b) is a saddle point of f .

Proof. Let

α = fxx(a, b), β = fxy(a, b), and γ = fyy(a, b).

Then ∆ = αγ − β2 and

D2f(a,b)(h, k) = αh2 + 2βhk + γk2, h, k ∈ R. (9.31)

If α 6= 0, completing the square yields

D2f(a,b)(h, k) = α

[h+ kβ

α

]2+ k2(αγ − β2)

α= α

[h+ kβ

α

]2+ k2∆

α.

Thus if ∆ > 0, α > 0, and (h, k) 6= (0, 0), then D2fa(h, k) > 0, hence, by thetheorem, (a) holds. A similar argument proves (b).

Now suppose ∆ < 0. If α 6= 0, then from (9.31)

D2f(a,b)(1, 0) = α and D2f(a, b)(−βα−1, 1) = ∆α,

which have opposite signs. If γ 6= 0, then completing the square yields

D2f(a,b)(h, k) = γ

[k + hβ

γ

]2+ h2∆

γ,

and one may argue similarly. (This also shows that (a) and (b) hold withfxxin the statement replaced by fyy.) Finally, if α = γ = 0, then β 6= 0, and(9.31) shows that, again, D2fa(h, k) has positive and negative values. Thisproves (c).

9.8.7 Example. Let f(x, y) = 3x2y + 2xy2 − 6xy. Since

fx(x, y) = 2y(3x+ y − 3) and fy(x, y) = x(3x+ 4y − 6),

the critical points are (0, 0), (2, 0), (0, 3), and (2/3, 1).

TABLE 9.1: Values of ∆.

(a, b) (0, 0) (2, 0) (0, 3) (2/3, 1)fxx(a, b) 0 0 18 6fyy(a, b) 0 8 0 8/3fxy(a, b) −6 6 6 2∆(a, b) −36 −36 −36 12

Table 9.1 shows that f has three saddle points and one local minimumpoint. ♦


9.8.8 Example. Let

f(x, y) = (cx2 + y2)e−x2−y2

, c 6= 0, 1.

The system

fx = 2xe−x2−y2

(c− cx2 − y2) = 0, fy = 2ye−x2−y2

(1− cx2 − y2) = 0

has solutions (0, 0), (0,±1), and (±1, 0). The second partial derivatives are

fxx = 2e−x2−y2[

c− 3cx2 − y2 + 2x2(cx2 + y2 − c)],

fyy = 2e−x2−y2[

1− cx2 − 3y2 + 2y2(cx2 + y2 − 1)], and

fxy = 4xye−x2−y2[

cx2 + y2 +−c− 1].

TABLE 9.2: Values of ∆.

(a, b) (0, 0) (0, 1)) (0,−1) (1, 0) (−1, 0)fxx(a, b) 2c 2(c− 1)/e 2(c− 1)/e −4c/e −4c/efyy(a, b) 2 −4/e −4/e 2(1− c)/e 2(1− c)/efxy(a, b) 0 0 0 0 0∆(a, b) 4c 8(1− c)/e 8(1− c)/e2 8(c− 1)/e 8(c− 1)/e2

The values of ∆ at the critical points (a, b) are given in Table 9.2. Assigningvalues to c produces a variety of local extreme points. For example, if c > 1,then (0,±1) are saddle points and the remaining critical points are localminimum points of f . ♦

Global ExtremaWe now turn to the problem described at the beginning of the section,

namely, to find the points in a subset E of U at which f has a maximumor a minimum. Such points, called global extrema, will always exist if E isclosed and bounded. The following examples illustrate a common techniquefor finding them.

9.8.9 Example. Let

f(x, y) = 2x3 − x2 + 3y2, E :={

(x, y) : x2 + y2 ≤ 1}.

By 9.8.2, the extreme values of f occur at points on bd(E) or at critical pointsof f in int(E). Solving the system

fx = 6x2 − 2x = 0, fy = 6y = 0

yields the critical points (0, 0) and (1/3, 0), which are candidates for extrema

in int(E). To find possible extrema on bd(E) we substitute 1 − x2 for y2 inthe expression for f to obtain the function

F (x) = 2x3 − 4x2 + 3, −1 ≤ x ≤ 1.

Since the only zero of F ′(x) in [−1, 1] is x = 0, single variable optimizationtheory gives us the additional extrema candidates (0,±1) and (±1, 0). Cal-culating the values of f at these six points shows that f(0,±1) = 3 is themaximum value of f on E and f(−1, 0) = −3 is the minimum. ♦

9.8.10 Example. Let

f(x, y, z) = (x− 1)2 + (y − 2)2 + z2, E :={

(x, y, z) : x2 + y2 + z2 ≤ 6}.

The solution of the system fx = fy = fz = 0 is (1, 2, 0), at which f has minimumvalue zero. The maximum of f must then occur on bd(E). Substituting theexpression 6− x2 − y2 for z2 in the definition of f , we obtain the function

F (x, y) = (x− 1)2 + (y − 2)2 + 6− x2 − y2 = 11− 2x− 4y, x2 + y2 ≤ 6.

The system Fx = Fy = 0 has no solution, hence the extreme values of F mustlie on the boundary x2 + y2 = 6. To find these values, let x =

√6 cos θ and

y =√

6 sin θ, so

F (x, y) = G(θ) := 11− 2√

6 cos θ − 4√

6 sin θ, 0 ≤ θ ≤ 2π.

Applying single variable optimization techniques to G, we see that possibleextreme values of F on x2 + y2 = 6 occur at points (x, y) for which θ = 0and θ = arctan 2, that is, (x, y) = (

√6, 0) and ±

(√65 , 2√

65

). Calculating the

values of F at these points shows that the maximum value of f on E is

f

(−√

65 ,−2

√65 , 0)≈ 22. ♦

In the above examples, E was the closure of an open set whose boundaryis a smooth surface. In many important cases, however, E itself is a surface.The surfaces we shall consider are of the form

E = {x ∈ U : g1(x) = · · · = gm(x) = 0} ,

where U ⊆ Rn is open,m < n, and the functions gj are C1 on U . The equationsgj(x) = 0 are then called constraints and E is the constraint set. If f(a) is themaximum or minimum value of f on E, then f is said to have an extremum ata subject to the constraints gj = 0.

9.8.11 Example. We find the points on the surface z2−x2y = 1 closest to theorigin. This is equivalent to minimizing f(x, y, z) = x2 + y2 + z2 subject to theconstraint z2−x2y−1 = 0. Since the surface is unbounded, it suffices to consider

that part of the surface inside a ball with center 0. To find the minimum, wesubstitute z2 = x2y+ 1 into f to obtain a function F (x, y) = x2(1 + y) + y2 + 1defined on an open disk containing a point at which f is minimum. The criticalpoints of F , solutions of the system

Fx = 2x(1 + y) = 0, Fy = x2 + 2y = 0,

are (0, 0), and (±√

2,−1). The last two are easily seen to be saddle points,while (0, 0) is a local minimum point. Therefore, the minimum of f occurs at(0, 0,±1), hence the distance from the surface to the origin is 1. ♦

Lagrange MultipliersIn 9.8.11, it was possible to solve the constraint equation for one of the

variables in terms of the others, reducing the dimension by one, therebysimplifying the problem. This is not always possible, but the implicit functiontheorem may be used to solve the constraint equation locally. This is themethod used in the proof of the next theorem. For its statement, we use thefollowing notational conventions, similar to those used in the proof of theimplicit function theorem.

Notation. Let m < n and p := n−m. For points z ∈ Rn = Rm+p we write

z = (x,y) = (x1, . . . xm, y1, . . . yp), x ∈ Rm, y ∈ Rp.

If G := (g1, . . . , gm) : U → Rm, then G(z) may be written as G(x,y). If G isdifferentiable, we define

Gx =

∂g1

∂x1· · · ∂g1

∂xm... · · ·...

∂gm∂x1

· · · ∂gm∂xm

and Gy =

∂g1

∂y1· · · ∂g1

∂yp... · · ·

...∂gm∂y1

· · · ∂gm∂yp

. ♦

9.8.12 Lagrange Multipliers. Let U ⊆ Rn be open and let f, gj : U → R,j = 1, . . . ,m < n be C1 functions. Set G := (g1, . . . , gm). Suppose that fhas a global extremum at c = (a, b) ∈ U subject to the constraint G = 0. IfdetGx(c) 6= 0, then there exist constants λ1, . . . , λm such that

∇f(c) =m∑i=1

λi∇gi(c). (9.32)

Proof. Equation (9.32) is the system

∂jf(c) = λ1∂g1

∂xj+ · · ·+ λm

∂gm∂xj

, j = 1, . . . ,m,

∂j+mf(c) = λ1∂g1

∂yj+ · · ·+ λm

∂gm∂yj

, j = 1, . . . , p,


which may be written in matrix form as[λ1 · · · λm

]Gx(c) =

[∂1f(c) · · · ∂mf(c)

](9.33)[

λ1 · · · λm]Gy(c) =

[∂m+1f(c) · · · ∂nf(c)

]. (9.34)

Equation (9.33) is satisfied by defining[λ1 · · · λm

]:=[∂1f(c) · · · ∂mf(c)

]G−1

x (c). (9.35)

It remains to show that (9.34) is satisfied for this choice of[λ1 · · · λm

].

By the implicit function theorem applied to G, there is an open set Vb ⊆ Rpcontaining b and a continuously differentiable mapping

h = (h1, . . . , hm) : Vb → Rm

such that

h(b) = a and G(h(y),y

)= 0 for every y ∈ Vb.

Applying the chain rule to each component equation gi(h(y),y

)= 0 yields

∂gi∂x1

∂h1

∂yj+ · · ·+ ∂gi

∂xm

∂hm∂yj

+ ∂gi∂yj

= 0, i = 1, . . . ,m, j = 1, . . . , p,

which may be written in matrix form as∂g1

∂x1· · · ∂g1

∂xm......

∂gm∂x1

· · · ∂gm∂xm

∂h1

∂y1· · · ∂h1

∂yp...

...∂hm∂y1

· · · ∂hm∂yp

= −

∂g1

∂y1· · · ∂g1

∂yp...

...∂gm∂y1

· · · ∂g1

∂yp

or in the above notation as

Gx(c)h′(b) = −Gy(c).

Multiplying the last equation on the left by[∂1f(c) · · · ∂mf(c)

]G−1

x (c)and using (9.35), we obtain[

∂1f(c) · · · ∂mf(c)]h′(b) = −

[λ1 · · · λm

]Gy(c). (9.36)

Since f(h(y),y

)has a local extremum at b, its partial derivatives must vanish

there:∂f(c)∂x1

∂h1(b)∂yj

+ · · ·+ ∂f(c)∂xm

∂hm(b)∂yj

+ ∂f(c)∂yj

= 0, j = 1, 2, . . . , p.

In matrix form,[∂1f(c) · · · ∂mf(c)

]h′(b) = −

[∂m+1f(c) · · · ∂nf(c)

]. (9.37)

Equation (9.34) now follows from (9.36) and (9.37).


9.8.13 Example. Let c, x ∈ Rn, c 6= 0. We find the extreme values off(x) := c · x on the sphere ‖x‖ = 1, that is, subject to the constraintg(x) := ‖x‖2 − 1 = 0. By Lagrange multipliers, the extreme values occur atpoints x for which ∇f(x) = λ∇g(x) for some λ ∈ R. This leads to the systemci = 2λxi, 1 ≤ i ≤ n. Squaring and adding yields ‖c‖2 = 4λ2‖x‖2 = 4λ2,hence 2λ = ±‖c‖ and x = c/2λ = ±c/‖c‖. Therefore, the extreme values of fare f

(± c/‖c‖

)= ±‖c‖. ♦

The last example has an important application to directional derivatives:Let h be differentiable on Br(a). From Exercise 9.3.10, the directional derivativeDxh(a) of h at a in the direction of a unit vector x is c · x, where c = ∇h(a).Thus, by the example, Dxh(a) is maximum when x = c/‖c‖, that is, when xis in the direction of the gradient of h.

9.8.14 Example. Let x = (x1, . . . , xn), a = (a1, . . . , an), and c = (c1, . . . , cn),where xj ≥ 0, aj > 0, and cj > 0. We find the maximum value of f(x) =xa1

1 xa22 · · ·xann subject to the constraint c · x = 1. Note that the conditions

xj ≥ 0 and cj > 0 imply that the constraint set is closed and bounded.Set g(x) = c · x − 1. The maximum of f occurs at points x for which∇f(x) = λ∇g(x) for some λ ∈ R. This leads to the equations

ajf(x) = λcjxj , j = 1, . . . , n. (9.38)

Adding and using the constraint yields af(x) = λ, or f(x) = λ/a, wherea =

∑nj=1 aj . From (9.38), aj = acjxj so the maximum occurs at the point( a1

ac1,a2

ac2, . . . ,

anacn

).

In particular, if a1 = · · · = an = 1 and c1 = · · · = cn = 1/c, c > 0, thenf(x1, x2, . . . , xn) = x1x2 · · ·xn has maximum f(c/n, . . . , c/n) = (c/n)n. Thusx1x2 · · ·xn ≤ (c/n)n, or equivalently (x1x2 · · ·xn)1/n ≤ c/n for all xj > 0satisfying x1 + · · ·+ xn = c. Since c is arbitrary, we obtain the classic result

(x1x2 . . . xn)1/n ≤ x1 + x2 + · · ·+ xnn

, xj ≥ 0,

which asserts that the geometric mean of nonnegative data does not exceedthe arithmetic mean. ♦

Exercises1. In each case classify the critical point a := (π/2, π/2, π/2) of the function.

(a) (sin x)(sin y)(sin z). (b) (sin x)(cos y)(cos z).

2.S Show that the function x2 +2y2 +3z2−xy−yz−xz on R3 has minimumvalue zero.


3. Find and classify the critical points of the following functions.

(a) S x3 + 2xy + 3x2 + y2. (b) x3 + 3x2y2 − 6x2 − 12y2.

(c) x2y2 + 2/x+ 2/y. (d) S x4 + 2y2 − 4xy.(e) x−1 + y−1 + ln(x2 + y2). (f) S x−1 + y−1 + arctan(y/x).(g) x3 − xy2 + x2 − y2. (h) x4 − 2x2 + 4y3 − 12y.(i) S xy − x2y − xy2. (j) x4 − 4x3 + 4x2 + y2.

4. Find the maximum and minimum values of each of the following functionsf on R2 \ {(0, 0)}.

(a)S x+ y√x2 + y2

. (b) x+√

3 y√x2 + y2

. (c) x+ 2y√x2 + y2

. (d) x2 + xy

x2 + y2 .

5. Show that the point (x, x2) on the curve y = x2 nearest the point (1, 2)satisfies the equation 2x3 − 3x− 1 = 0.

In Exercises 6–9, use the method of 9.8.9 and 9.8.10.

6. Find the extreme values of the following functions on the disk D :={(x, y) : x2 + y2 ≤ 1

}:

(a) 3x2 + 2y2 − x. (b)S x2 + xy − x+ y2. (c) cos(xy). (d)S sin(xy).

7.S Prove that the maximum of f(x, y) = x2 + ay2 + (a− 1)y on the diskD :=

{(x, y) : x2 + y2 ≤ 1

}occurs on bd(D).

8. Let f(x, y) = x2 + y2 + axy on the disk D :={

(x, y) : x2 + y2 ≤ 1}.

Prove that a maximum of f occurs on bd(D), and that a minimum of foccurs on bd(D) iff |a| ≥ 2.

9. Show that the maximum (minimum) value of fn(x1, . . . , xn) =∑ni=1 xi

on C1(0) is√n (−

√n).

10.S Let f(x, y) = ax−1 + by−1 + xy, a, b > 0. Prove that f has a minimumon (0,+∞)× (0,+∞) and that the minimum value is 3(ab)1/3.

11.S Consider the data points (xi, yi), 1 ≤ i ≤ n, where xi 6= xj for at leastone pair of points. The linear least squares fit is the line y = mx + bwith the property that the sum of squares of the vertical distances fromthe data points to the line, namely,

∑ni=1(yi −mxi − b

)2, is minimum.Show that

m = x · y − nx y‖x‖2 − nx2 , and b = y −mx, where

x = (x1, . . . , xn), y = (y1, . . . , yn), x := 1n

n∑i=1

xi, and y := 1n

n∑i=1

yi.


12. Let x, a, b, c ∈ Rn. Prove that f(x) := ‖x−a‖2 + ‖x− b‖2 + ‖x− c‖2has a minimum value and find the point at which it occurs.

In Exercises 13–28, use Lagrange multipliers.

13. Show that the maximum value of f(x) = x1x2 · · ·xn on the set E :={x : xi ≥ 0 and

∑ni=1 xi ≤ 1} is n−n.

14. Find the maximum and minimum of 2x− 3y subject to the constraint(x+ 1)2 + (y − 1)2 = 1.

15.S Find the maximum and minimum of ax2 + 2bxy + y2 subject to theconstraint x2 + y2 = c2, where abc 6= 0 and (a+ 1)2 + 4(a− b2) ≥ 0.

16. Show that the point (x, y, z) on the surface x2 + y2 + z2 = 1 nearest thepoint (1, 2, 3) is (1/

√14, 2/

√14, 3/

√14).

17.S Show that the point (x, y, z) on the surface z = x2 + y2 nearest thepoint (1, 2, 3) satisfies the equations

10x3 − 5x− 1 = 0, y = 2x, and z = x2 + y2 = 5x2.

18. Show that the point on the surface x2 + y2 − z2 = 1 nearest to the point(1, 2, 3) is(

x, 2x, 3x/(2x− 1)), where 20x4 − 20x3 − 8x2 + 4x− 1 = 0.

19.S Show that the point on the surface z2 − x2 − y2 = 1 nearest (1, 2, 3) is(x, 2x, 3x/(2x− 1)), where 20x4 − 20x3 − 4x+ 1 = 0.

20. The intersection of the surfaces z = x2 + y2 and x + y + z = 1 is anellipse lying above the xy plane. Find the highest and lowest points ofthe ellipse.

21. Let a, b, c > 0. Show that the maximum and minimum values of thefunction f(x, y, z) = ax+ by + cz subject to the constraints x2 + z2 = 1,y2 + z2 = 1, x, y, z ≥ 0, are, respectively, the maximum and minimum ofthe quantities c, a+ b, and (a+ b+ cd)/

√1 + d2, where d := c/(a+ b).

22.S Find the maximum and minimum values of x+ 2y + 3z subject to theconstraints x+ y + z = 1 and x2 + y2 + z2 = 1.

23. Let a > 1/3. Show that the maximum value of xyz subject to theconstraints x+ y + z = 1 and x2 + y2 + z2 = a is

xyz = 127(1− 3t2 + 2t3), where t =

√3a− 1

2 .

24.S Let x = (x1, · · · , xn), a = (a1, · · · , an) 6= 0, and b = (b1, · · · , bn).Show that the shortest distance from b to the hyperplane a · x = c is√

2∣∣c− a · b

∣∣ ‖a‖−1.

25. Let p ≥ 2. Show that the largest distance from the surface∑ni=1 |xi|p = 1

to the origin is n(p−2)/2p and the smallest distance is 1.

26.S Find the distance from the point a = (a1, . . . , an) to the (n − 1)-dimensional sphere ‖x‖ = 1 in Rn, where aj > 0, and ‖a‖ 6= 1.

27. Let a = (a1, . . . , an) and b = (b1, . . . , bn), where ai, bi > 0.(a)S Show that the minimum value of the function a · x subject to the

constraintn∑i=1

bi/xi = 1, where xi > 0, is( n∑i=1

√aibi

)2.

(b) Show that the minimum value of∑ni=1 aix

2i subject to the constraint

in (a) is( n∑i=1

a1/3i b

2/3i

)3.

28. Let a = (a1, . . . , an) and b = (b1, . . . , bn), where ai, bi > 0 and∑ni=1 bi = 1. Find the minimum value of a · x subject to the con-

straint∏ni=1 x

bjj = 1, where xi > 0.

29. Let U ⊆ Rn be open and let f : U → R be C2 on U . Show that if f hasa local maximum (minimum) at a ∈ U , then D2fa(h) ≤ 0 (≥ 0) for allh ∈ Rn.

30.S Prove the following generalization of Rolle’s theorem: Let U ⊆ Rn bebounded and open and let f : U → R be differentiable on U , continuouson cl(U), and constant on bd(U). Then f ′(u) = 0 for some u ∈ U .

31. Let U ⊆ Rn be open and f : U → Rn C1 on U such that Jf 6= 0on U . Let a ∈ U and let C := Cr(a) ⊆ U , r > 0. Prove that ifsupC ‖f(x)− x‖ < r/2, then the equation f(x) = a has a solution in C.

Chapter 10Lebesgue Measure on Rn

The methods of Chapter 5 may be modified in a natural way to construct theRiemann integral of a function of several variables. In Section 11.1, we brieflydescribe how this is done. However, the main goal of the present chapter andthe next is to construct the more general Lebesgue integral. The choice todevelop the n-dimensional Lebesgue integral rather than the n-dimensionalRiemann integral is motivated by the fact that, as an analytical tool, the formerhas several distinct advantages over the latter. For example, the Lebesguetheory allows the interchange of limit and integral in more general settings.Furthermore, the collection of Lebesgue integrable functions, which includesunbounded functions on unbounded domains, is significantly larger than theset of Riemann integrable functions. These advantages make the Lebesguetheory better suited for applications based on, for example, probability theoryand, in particular, stochastic processes.

The key idea in Riemann integration on Rn is the partitioning of thedomain of the integrand f into n-dimensional subintervals. The Riemannintegral is then obtained as a limit of Riemann sums, that is, sums of functionvalues times the volumes of the subintervals. In Lebesgue integration, it is therange of f rather than the domain that is partitioned into subintervals (seeFigure 10.7). This still produces a partition of the domain of f ; however, thesets in this partition are generally more complicated than subintervals. TheLebesgue integral is constructed by multiplying the measure of these sets byfunction values, adding the results, and then taking limits. In this chapter weconstruct the measure and in the next chapter we construct the integral. Theprecise connection between the Riemann and Lebesgue integrals is made inSection 11.4.

10.1 General Measure TheoryIn this section we give brief description of those aspects of measure theory

that will be needed to construct Lebesgue measure on Rn. For a comprehensivetreatment see, for example, [4].

343


Sigma Fields10.1.1 Definition. A σ-field on a nonempty set S is a collection F of subsetsof S such that

(a) S, ∅ ∈ F ;

(b) A ∈ F implies Ac ∈ F ;

(c) Ak ∈ F , k ∈ N, implies⋃

kAk ∈ F . ♦

Part (c) of the definition says that F is closed under countable unions. ByDeMorgan’s law, ⋂

kAk =

(⋃kAck

)c,

hence part (b) implies that F is also closed under countable intersections.The collection of all subsets of S and the collection {∅, S} are simple

examples of σ-fields. The following examples are somewhat more interesting.

10.1.2 Example. If A is an arbitrary collection of subsets of S, then theσ-field generated by A is the intersection σ(A) of all σ-fields containing A. It isthe smallest σ-field containing A in the sense that if F is a σ-field containingA then F contains σ(A). In the special case where A = {A1, A2, . . .} is acountable partition of S, σ(A) is simply the collection F of all unions ofmembers of A. Indeed, F is clearly closed under countable unions, and thecalculation ( ⋃

k∈F

Ak

)c=⋃k∈F c

Ak, F ⊆ N

shows that F is closed under complements. Thus, by minimality, σ(A) = F . ♦

10.1.3 Example. If F is a σ-field on S and E ⊆ S, then the collection

FE := {A ∩ E : A ∈ F}

is a σ-field of subsets of E. Moreover, FE ⊆ F iff E ∈ F . (See Exercise 2.) ♦

Measure on a Sigma Field10.1.4 Definition. A measure on a σ-field F of subsets of S is a functionµ : F → [0,+∞] such that µ(∅) = 0 and µ has the additivity property

µ(⋃

kAk

)=∑

kµ(Ak)

for any finite or infinite sequence of pairwise disjoint sets Ak ∈ F . The extendedreal number µ(A) is called the measure of A. ♦

Lebesgue Measure on Rn 345

10.1.5 Example. Let {pk} be a sequence of nonnegative real numbers. Define

µ(E) =∑k∈E

pk, E ⊆ N,

where the sum may be infinite. (By convention, the sum over the empty set iszero.) It is not difficult to show that µ is a measure on the σ-field of all subsetsof N.

In the special case pk = 1 for all k, µ(E) counts the number of elementsin E if E is a finite set, and µ(E) = +∞ otherwise. In this case, µ is called acounting measure. ♦

10.1.6 Proposition. Let µ be a measure on a σ-field F and A1, A2, · · · ∈ F .

(a) If A1 ⊆ A2, then µ(A1) ≤ µ(A2) (monotonicity).

(b) µ(⋃

k Ak)≤∑k µ(Ak) (subadditivity).

(c) µ(A1) + µ(A2) = µ(A1 ∪A2) + µ(A1 ∩A2) (inclusion-exclusion).

(d) If Ak ↑ A, then µ(Ak) ↑ µ(A) (continuity from below).

(e) If Ak ↓ A and µ(A1) < +∞, then µ(Ak) ↓ µ(A) (continuity from above).

Proof. (a) By additivity, µ(A2) = µ(A2 \A1) + µ(A1) ≥ µ(A1).(b) Write⋃

k

Ak = A1 ∪ (A2 ∩Ac1) ∪ · · · ∪ (Am ∩Ac1 ∩ · · · ∩Acm−1) ∪ · · · .

Since the sets in the union on the right are pairwise disjoint, by countableadditivity and monotonicity

µ

(⋃k

Ak

)= µ(A1) +

∑m≥2

µ(Ac1 ∩ · · · ∩Acm−1 ∩Am

)≤∑m≥1

µ(Am).

(c) Since A1∪A2 is the union of the pairwise disjoint sets A1∩Ac2, A1∩A2,and A2 ∩Ac1, additivity implies that

µ(A1 ∪A2) = µ(A1 ∩Ac2) + µ(A1 ∩A2) + µ(A2 ∩Ac1).

Similarly,

µ(A1) + µ(A2) = µ(A1 ∩Ac2) + 2µ(A2 ∩A1) + µ(A2 ∩Ac1).

It follows that µ(A1 ∪A2) = +∞ iff µ(A1) + µ(A2) = +∞, which proves (c)in the infinite case. In the finite case, simply subtract the above equations toget (c).

(d) This is clear if some Ak has infinite measure, so assume µ(Ak) < +∞

for all k. Set A0 = ∅ and Ek = Ak \Ak−1. The sets Ek are pairwise disjoint,A =

⋃∞k=1Ek, and µ(Ek) = µ(Ak)− µ(Ak−1), hence by additivity

µ(A) =∞∑k=1

µ(Ek) = limn

n∑k=1

[µ(Ak)− µ(Ak−1)

]= lim

nµ(An).

(e) Note that A1 \Ak ↑ A1 \A, hence, by (d),

µ(A1)− µ(A) = µ(A1 \A) = limkµ(A1 \Ak) = µ(A1)− lim

kµ(Ak).

ExercisesFor the following exercises, F is a σ-field ofsubsets of a set S and µ is a measure on F .

1.S Find an example which shows that the hypothesis µ(A1) < +∞ in10.1.6(e) cannot be removed.

2. Verify that the collection FE in 10.1.3 is a σ-field.

3.S Let A, B ∈ F with µ(B) = 0. Show that µ(A ∪B) = µ(A \B) = µ(A).

4. Let Ak, Bk ∈ F and let s denote the sum∑k µ(Ak \Bk). Prove that

(a) µ(⋃

k

Ak \⋃k

Bk

)≤ s. (b) µ

(⋂k

Ak \⋂k

Bk

)≤ s.

5.S (General inclusion-exclusion principle). Let µ(A1 ∪ · · · ∪ An

)< +∞.

Prove that for n ≥ 2

µ(A1 ∪ · · · ∪An

)=

n∑i=1

µ(Ai)−n∑

1≤i<j≤nµ(Ai ∩Aj)

+n∑

1≤i<j<k≤nµ(Ai ∩Aj ∩Ak)− · · ·+ (−1)n−1µ(A1 ∩ · · · ∩An).

Hint. Use induction on n. The case n = 2 is 10.1.6(c).

6. For a sequence of sets Ak ∈ F , define

lim infk

Ak =∞⋃k=1

∞⋂j=k

Aj and lim supk

Ak =∞⋂k=1

∞⋃j=k

Aj .

Prove the following:(a) lim infk Ak ⊆ lim supk Ak.(b) µ

(lim infk Ak

)≤ lim infk µ(Ak).

(c) µ(

lim supk Ak)≥ lim supk µ(Ak) if µ (

⋃k Ak) < +∞.

(d) µ(

lim supk Ak)

= 0 if∑k µ(Ak) < +∞.


7.S Let {Ek} be a sequence in F , m ∈ N, and let A denote the set of allx ∈ S such that x ∈ Ek for finitely many and at least m values of k.Prove that A ∈ F and

µ(A) ≤ 1m

∞∑k=1

µ(A ∩ Ek).

8. Let {Ek} be a sequence in F , m ∈ N, and let B denote the set of allx ∈ S such that x ∈ Ek for at most m values of k. Prove that B ∈ F and

µ(B) ≥ 1m

∞∑k=1

µ(B ∩ Ek).

9. Let E be a collection of pairwise disjoint members of F and let A ∈ F .Show that µ(A∩E) > 0 for at most countably many members of E . Hint.Consider

Em := {E ∈ E : µ(A ∩ E) ≥ 1/m} , m ∈ N.

10.2 Lebesgue Outer Measure10.2.1 Definition. An n-dimensional interval in Rn is a Cartesian product

I = A1 ×A2 × · · · ×An,

where each Aj is an interval in R. If I is bounded, then the n-dimensionalvolume |I| of I is defined by

|I| :=n∏j=1|Aj | =

n∏j=1

(bj − aj),

where aj ≤ bj are the endpoints of Aj . If Aj = [aj , bj) for each j, then I issaid to be half-open. We denote by

• I the collection of all bounded intervals in Rn,

• H the collection of all bounded half-open intervals in Rn,

• O the collection of all bounded open intervals in Rn,

• C the collection of all bounded closed intervals in Rn. ♦

Note that each of the above collections is closed under the formation ofnonempty finite intersections.


10.2.2 Lemma. Let I, I1, . . . , Im ∈ H.(a) If I1, . . . , Im are pairwise disjoint and I =

⋃mj=1 Ij, then |I| =

∑mj=1 |Ij |.

(b) If I ⊆⋃mj=1 Ij, then |I| ≤

∑mj=1 |Ij |.

(c) If I1, . . . , Im are pairwise disjoint and I ⊇⋃mj=1 Ij, then |I| ≥

∑mj=1 |Ij |.

Proof. For ease of notation, we prove the lemma for the case n = 2, in whichcase the intervals are half-open rectangles. Let I = [a, b) × [c, d). We may

a bx1 x2 x3 x4

R2,3 R3,3 R4,3

c

y1

y2

y3

y4

d

Ik

FIGURE 10.1: Pairwise disjoint interval grid.

assume in (b) that I =⋃mj=1 Ij , otherwise we could replace Ij by Ij ∩ I. Thus

in each case the rectangles Ij are contained in I, hence the coordinates of theirvertices form partitions

{x0 := a ≤ x1 ≤ . . . ≤ xp := b} and {y0 := c ≤ y1 ≤ . . . ≤ yq := d}

of [a, b] and [c, d], respectively. These partitions generate a grid of subrectanglesRi,j = [xi, xi+1) × [yj , yj+1) with union I such that each rectangle Ik is aunion of subrectangles Ri,j . Case (a) is depicted in Figure 10.1; the rectanglesIk in the figure are shown with solid boundaries, and the dashed lines are theextensions of these boundaries. Since

b− a =p−1∑i=0

(xi+1 − xi) and d− c =q−1∑j=1

(yj+1 − yj),

we have

|I| =[ p−1∑i=0

(xi+1 − xi)][ q−1∑

j=0(yj+1 − yj)

]=

p−1∑i=0

q−1∑j=0|Rij |. (10.1)

Similarly, |Ik| =∑{(i,j):Ri,j⊆Ik} |Rij |, hence∑

k

|Ik| =∑k

∑{(i,j):Ri,j⊆Ik}

|Rij |. (10.2)


We now compare (10.1) and (10.2). In part (a), every Ri,j is containedin exactly one Ik, hence |I| =

∑mk=1 |Ik|. In (b), a rectangle Ri,j could be

contained in more than one Ik, so |I| ≤∑mk=1 |Ik|. Finally, in (c) not every

Ri,j is necessarily contained in an Ik, hence |I| ≥∑mk=1 |Ik|.

10.2.3 Definition. The Lebesgue outer measure of a subset A of Rn is definedby

λ∗(A) = λ∗n(A) := inf{∑

j|Ij | : Ij ∈ I and

⋃jIj ⊇ A

}. ♦

10.2.4 Remark. The number of intervals Ij covering A in the definition ofλ∗(A) may be finite or infinite. Of course, every bounded subset of Rn hasa covering by a finite many Ij ’s. By slightly adjusting endpoints, one mayshow that the value of λ∗(A) is unchanged if I is replaced by H, O, or C.(Exercise 1.) ♦

10.2.5 Proposition. Lebesgue outer measure on Rn has the following prop-erties:

(a) 0 ≤ λ∗(A) ≤ +∞.

(b) λ∗(∅) = 0.

(c) λ∗(I) = |I| for each I ∈ I.

(d) If A ⊆ B, then λ∗(A) ≤ λ∗(B) (monotonicity).

(e) λ∗(⋃

kAk

)≤∑

kλ∗(Ak) (subadditivity).

(f) If I, J ∈ H and I ∩ J = ∅, then λ∗(I ∪ J) = |I|+ |J |.

Proof. Parts (a) and (d) follow directly from the definition, and (b) followsfrom the observation that ∅ may be covered by a single interval of arbitrarilysmall volume.

I

J

Ik

Jk

FIGURE 10.2: The coverings {Ik} and {Jk}.

To prove (c), note first that, because {I} is a covering, λ∗(I) ≤ |I|. Forthe reverse inequality, let ε > 0 and choose a closed bounded interval J ⊆ Isuch that |J | > |I| − ε. Let {Ik} be any sequence of intervals covering I. By

10.2.4, we may take Ik ∈ O. Let {Jk} be a sequence in H such that Ik ⊆ Jkand |Jk| < |Ik| + ε/2j (Figure 10.2). Since J is compact, there exists an msuch that

J ⊆ I1 ∪ · · · ∪ Im ⊆ J1 ∪ · · · ∪ Jm.Therefore,

|I| − ε < |J | ≤ |J1|+ · · ·+ |Jm| ≤ ε+∞∑k=1|Ik|,

the second inequality by 10.2.2(b). Letting ε → 0, we have |I| ≤∑∞k=1 |Ik|.

Therefore, |I| ≤ λ∗(I).For (e), we may assume that λ∗(Ak) < +∞ for all k. Let ε > 0 and for

each k choose a sequence {Ik,j}∞j=1 in I such that

Ak ⊆∞⋃j=1

Ik,j and∞∑j=1

λ∗(Ik,j) ≤ λ∗(Ak) + ε

2k .

Since the countable collection {Ik,j : k, j = 1, 2, . . .} covers⋃∞k=1Ak,

λ∗( ∞⋃k=1

Ak

)≤∞∑k=1

∞∑j=1

λ∗(Ik,j) ≤∞∑k=1

λ∗(Ak) + ε.

Since ε was arbitrary, (e) follows.For (f), let I ∪ J ⊆

⋃mk=1 Ik, where Ik ∈ H. Since Ik ⊇ (Ik ∩ I) ∪ (Ik ∩ J),

10.2.2(c) shows that |Ik| ≥ |Ik ∩ I|+ |Ik ∩ J |. Therefore, by (c),m∑k=1|Ik| ≥

m∑k=1|Ik ∩ I|+

m∑k=1|Ik ∩ J | ≥ λ∗(I) + λ∗(J) = |I|+ |J |,

Taking the infimum we have λ∗(I ∪ J) ≥ |I| + |J |. The reverse inequalityfollows from (e).

Exercises1.S Prove the assertions in 10.2.4. More generally prove the following:

Let J be a collection of bounded intervals with the property that foreach bounded interval I and each ε > 0 there exists J ∈ J containing Isuch that |J | < |I|+ ε. For A ⊆ Rn, define

α(A) := inf{∑

k

|Jk| : Jk ∈ J and⋃k

Jk ⊇ A}.

Then λ∗(A) = α(A).

2. Prove that in the definition of λ∗(A), I may be replaced by the collectionIr of all bounded intervals I whose coordinate intervals have rationalendpoints.


3. Prove that in the definition of λ∗(A), I may be replaced by the collectionU of all bounded open subsets of R and also by the collection K of allcompact sets.

4.S Show that Lebesgue outer measure is translation invariant, that is,

λ∗(A+ x) = λ∗(A) for every A ⊆ Rn and x ∈ Rn,

where A+ x := {a+ x : a ∈ A}.

5. Show that Lebesgue outer measure has the reflection property

λ∗(−A) = λ∗(A) for every A ⊆ Rn,

where −A := {x : −x ∈ A}.

6. Show that Lebesgue outer measure has the dilation property

λ∗(rA) = |r|nλ∗(A) for every A ⊆ Rn and r ∈ R,

where rA := {rx : x ∈ A}.

10.3 Lebesgue MeasureBy subadditivity of outer measure,

λ∗(C) ≤ λ∗(C ∩ E) + λ∗(C ∩ Ec)

for all subsets E and C of Rn. The following definition singles out those setsE that also satisfy the reverse inequality for all sets C.

10.3.1 Definition. A subset E of Rn is said to be Lebesgue measurable if

λ∗(C) ≥ λ∗(C ∩ E) + λ∗(C ∩ Ec) (10.3)

for all subsets C of Rn. The collection of all Lebesgue measurable subsets ofRn is denoted byM =M(Rn). The restriction of λ∗ toM is called Lebesguemeasure on Rn and is denoted by λ = λn. Any particular set C satisfying(10.3) is called a test set for E. ♦

If C is a test set for E, then λ∗(C) = λ∗(C ∩ E) + λ∗(C ∩ Ec)); the set E

splits the outer measure of C.

10.3.2 Theorem. M is a sigma field containing all sets of outer measurezero and λ is a measure onM.


Proof. Clearly, ∅, Rn ∈ M, and since E and Ec appear symmetrically in(10.3), Ec ∈M iff E ∈M. If λ∗(E) = 0, then, by monotonicity,

λ∗(C ∩ E) + λ∗(C ∩ Ec) ≤ λ∗(E) + λ∗(C ∩ Ec) = λ∗(C ∩ Ec) ≤ λ∗(C),

hence E ∈M. Therefore,M contains all sets of Lebesgue outer measure 0.It remains to show that, for a sequence {Ek} in M,

⋃∞k=1Ek ∈ M and

furthermore that λ∗(⋃∞

k=1Ek)

=∑∞k=1 λ

∗(Ek) if the sets Ek are pairwisedisjoint. This is accomplished in the following four steps:

I. If E, F ∈M, then E ∪ F, E ∩ F ∈M.J To show that E ∪ F ∈M, take any set C as a test set for E and takeC ∩ Ec as a test set for F to obtain

λ∗(C) = λ∗(C ∩ E) + λ∗(C ∩ Ec) andλ∗(C ∩ Ec) = λ∗(C ∩ Ec ∩ F ) + λ∗(C ∩ Ec ∩ F c).

Combining these and using subadditivity,

λ∗(C) = λ∗(C ∩ E) + λ∗(C ∩ Ec ∩ F ) + λ∗(C ∩ Ec ∩ F c)≥ λ∗

[(C ∩ E) ∪ (C ∩ Ec ∩ F )

]+ λ∗(C ∩ Ec ∩ F c). (10.4)

Since(C ∩E

)∪(C ∩Ec∩F

)⊇ C ∩ (E∪F ), by monotonicity and (10.4),

λ∗(C) ≥ λ∗[C ∩ (E ∪ F )

]+ λ∗

[C ∩ Ec ∩ F c

]= λ∗

[C ∩ (E ∪ F )

]+ λ∗

[C ∩ (E ∪ F )c

].

This shows that E ∪F ∈M. That E ∩F ∈M follows from De Morgan’slaw E ∩ F = (Ec ∪ F c)c. K

II. If C ⊆ Rn and E, F ∈M with E ∩ F = ∅, then

λ∗(C ∩ (E ∪ F )

)= λ∗(C ∩ E) + λ∗(C ∩ F ).

J Use C ∩ (E ∪ F ) as a test set for E to obtain

λ∗[C ∩ (E ∪ F )

]= λ∗

[C ∩ (E ∪ F ) ∩ E

]+ λ∗

[C ∩ (E ∪ F ) ∩ Ec

]= λ∗(C ∩ E) + λ∗(C ∩ F ). K

III. If the sets Ek are pairwise disjoint and F :=⋃k Ek, then F ∈ M and

λ(F ) =∑k λ(Ek).

J Set Fk =⋃kj=1Ej and let C ⊆ Rn. By steps I and II and induction,

Fk ∈M and

λ∗(C ∩ Fk) =k∑j=1

λ∗(C ∩ Ej).


Thus, by monotonicity,

λ∗(C) = λ∗(C ∩ Fk) + λ∗(C ∩ F ck ) ≥k∑j=1

λ∗(C ∩ Ej) + λ∗(C ∩ F c).

Since k was arbitrary, by subadditivity

λ∗(C) ≥∞∑j=1

λ∗(C∩Ej)+λ∗(C∩F c) ≥ λ∗(C∩F )+λ∗(C∩F c) ≥ λ∗(C).

The inequalities are therefore equalities, which shows that F ∈ M.Taking C = F verifies the second assertion of III. K

IV.∞⋃k=1

Ek ∈M.

J Use I, III and⋃∞k=1Ek = E1 ∪ (E2 ∩Ec1)∪ (E3 ∩Ec1 ∩Ec2)∪ . . . . K

10.3.3 Definition. A set E is said to have (Lebesgue)measure zero if λ(E) = 0.A property P (x) depending on points x ∈ Rn is said to hold almost everywhere(a.e.) or for almost all x if the set of all x for which P (x) is false has measurezero. ♦

For example, the Dirichlet function is zero a.e. More generally, if E ∈Mthen 1E = 0 a.e. iff λ(E) = 0.

By subadditivity, a countable union of sets of measure zero has measurezero. Since a point has measure zero, it follows that every countable set hasmeasure zero. In particular, Qn has measure zero. The following is an exampleof an uncountable set with measure zero.

10.3.4 Example. (Cantor ternary set). Remove from I0,1 := [0, 1] the “middlethird” open interval (1/3, 2/3), leaving closed intervals I1,1 and I1,2 with unionE1 and total length 2/3. Next, remove from each of I1,1 and I1,2 the middle thirdopen interval, leaving closed intervals I2,1, I2,2, I2,3, and I2,4 with union E2and total length 4/9 = (2/3)2. By induction, one obtains a decreasing sequence

I1,1 I1,2

I0,1

I2,1 I2,2 I2,3 I2,4

I3,1 I3,2 I3,3 I3,4 I3,5 I3,6 I3,7 I3,8...

E1

E2

E3

.00220 . . . .22202 . . .

FIGURE 10.3: Middle thirds construction.

of closed sets Ek =⋃2kj=1 Ik,j such that, by subadditivity, λ∗(Ek) ≤ (2/3)k. If


E denotes the intersection of these sets, then E is closed and, by monotonicity,λ∗(E) ≤ (2/3)k for all k. Therefore, λ∗(E) = 0.

To show that E is uncountable, we use the fact that every real numberx ∈ [0, 1] has both ternary and binary representations

x = .d1d2 . . . (ternary) =∞∑k=1

dk3−k, where dk ∈ {0, 1, 2},

x = .e1e2 . . . (binary) =∞∑k=1

ek2−k, where ek ∈ {0, 1}.

These are obvious analogs of the decimal representation of a real number (seeExercise 6.1.14). As with decimal representations, there is some ambiguity; forexample, 1/3 = .1000 . . . = .0222 . . . (ternary). Now observe that if dk = 0 or

Ik−1,j

Ik,2j−1 Ik,2j

dk = 0 dk = 2

FIGURE 10.4: x ∈ Ik−1,j ⇒ x ∈ Ik,2j−1+dk/2.

2 for all k in the above ternary representation, then x ∈ E. For example,

.00220 . . . ∈ I1,1 ∩ I2,2 ∩ I3,4 ∩ I4,7 ∩ · · ·

and.22202 . . . ∈ I1,2 ∩ I2,4 ∩ I3,7 ∩ I4,14 ∩ · · ·

(see Figure 11.2). In general, if x ∈ Ik−1,j , then x ∈ Ik,2j−1+dk/2. Conversely,let x ∈ E. Since x ∈ E1, we may choose d1 = 0 or 2. Similarly, since x ∈ E2,we may choose d2 = 0 or 2, etc. Continuing in this manner, we see that everymember of E has a (unique) ternary representation with digits 0 or 2.

Now define ϕ : E → [0, 1] by

ϕ(.d1d2 . . . (ternary)

)= .e1e2 . . . (binary), where dk ∈ {0, 2} and ek = dk/2.

The function ϕ is not one-to-one; for example,

ϕ(.0222 . . .) = .0111 . . . = .1000 . . . = ϕ(.2000 . . .).

However, by removing from E the countable set of all numbers with ternaryrepresentations having a tail end of zeros, these being necessarily rational, weobtain a set F on which ϕ is one-to-one. Since ϕ(F ) = (0, 1), it follows that Eis uncountable. ♦


We show in the next section that intervals, open sets, and closed sets areLebesgue measurable. It follows that countable unions and intersections ofthese sets are also Lebesgue measurable. The reader may well ask if there areany subsets of Rn that are not Lebesgue measurable. The answer is that thereare many, but their construction is surprisingly intricate. The following is anexample for the case n = 1.

10.3.5 Example. (A non-measurable set). Consider sets of the form x+ Q,x ∈ R. We claim that if

(x+ Q

)∩(y + Q

)6= ∅, then x+ Q = y + Q. To see

this, choose z ∈(x+ Q

)∩(y + Q

), say z = x+ r1 = y + r2, r1, r2 ∈ Q. Then,

for any r ∈ Q,

x+ r = y + r2 − r1 + r ∈ y + Q and y + r = x+ r1 − r2 + r ∈ x+ Q,

hence x+Q = y+Q. It follows that every real number is in exactly one of thesets x+ Q. Now form a set E by choosing exactly one number in each of thedistinct sets x+ Q.1 For each x ∈ R, the set E ∩ (x+ Q) has a single member,hence x = y + r for unique y ∈ E and r ∈ Q. Thus R may be expressed as adisjoint union

R =∞⋃k=1

(rk + E), (10.5)

where {r1, r2, . . .} is an enumeration of Q.Suppose, for a contradiction, that E is Lebesgue measurable. Then

λ(E) > 0, otherwise, by (10.5), translation invariance (Exercise 1), andcountable additivity, R would have measure zero. On the other hand, let I bean arbitrary bounded interval and set J = Q ∩ (0, 1). Since I is measurable(Section 10.4, below), the set

F :=⋃r∈J

(r + E ∩ I

)is measurable. Also, since I and J are bounded so is F . Thus, by countableadditivity and translation invariance,

+∞ > λ(F ) =∑r∈J

λ(r + E ∩ I

)=∑r∈J

λ(E ∩ I

).

Since J is an infinite set, λ(E ∩ I)

= 0. But then

λ(E) =∞∑k=0

λ(E ∩ [k, k + 1))

+∞∑k=0

λ(E ∩ [−k − 1,−k))

= 0.

This contradiction shows that E cannot be Lebesgue measurable. ♦

1The existence of E requires the axiom of choice, one of the axioms of Zermelo–Fraenkelset theory.

Exercises1. ⇓2 Show that E ∈ M and x ∈ Rn imply that x + E ∈ M. Conclude

from Exercise 10.2.4 that λ(x+ E) = λ(E).

2.S Show that E ∈M implies that −E ∈M. Conclude from Exercise 10.2.5that λ(−E) = λ(E).

3. Show that E ∈ M and r 6= 0 imply that rE ∈ M. Conclude fromExercise 10.2.6 that λ(rE) = |r|nλ(E).

4.S Show that for any ε > 0 there exists an open set D dense in Rn suchthat λ(D) < ε.

5. Prove that if f and g are continuous real-valued functions on Rn whichare equal a.e., then f = g. Does the same result hold if only one of thefunctions is continuous?

6. Let A be the subset of [0, 1] whose members are missing the digit threein their decimal expansions. Prove that A is uncountable and λ(A) = 0.

10.4 Borel SetsRecall that the σ-field generated by a collection A of sets is the intersection

of all σ-fields containing A (10.1.2). The following special case is of particularimportance.

10.4.1 Definition. The Borel σ-field B = B(Rn) is the σ-field generated bythe open sets of Rn. A member of B is called a Borel set. ♦

10.4.2 Remark. Since open sets and closed sets are complements of oneanother, B is also generated by the closed sets. Furthermore, since an openset is a countable union of n-dimensional open intervals (Exercise 8.2.4), B isalso generated by O. Since every open interval is a countable union of closedand bounded intervals and every closed interval is a countable intersection ofopen intervals, B is also generated by C. Similar considerations show that B isgenerated by H as well. ♦

10.4.3 Theorem. B(Rn) ⊆M(Rn).

Proof. By 10.4.2, it suffices to show that H ⊆M. Note first that if I, J ∈ Hthen, using partitions as in the proof of 10.2.2, I \ J may be expressed (usuallyin several ways) as a disjoint union of members of H. (See Figure 10.5.)

I

I1

I3

J

I2

I5

I4

FIGURE 10.5: I \ J = I1 ∪ I2 ∪ I3 ∪ I4 ∪ I5.

Now let I ∈ H, C ⊆ Rn, and let {Ik} be any sequence in H that covers C.We show that

λ∗(C ∩ I) + λ∗(C ∩ Ic) ≤∑k

λ∗(Ik). (10.6)

Taking the infimum over all such sequences {Ik} produces the inequalityλ∗(C ∩ I) + λ∗(C ∩ Ic) ≤ λ∗(C), proving that I ∈M.

To verify (10.6), we may assume that∑∞k=1 λ

∗(Ik) < +∞. For each k thereexist, according to the observation at the beginning of the proof, intervalsJj,k ∈ H such that Ik \ I =

⋃mkj=1 Jj,k (disjoint union). Then

Ik = (Ik ∩ I) ∪ (Ik \ I) = (Ik ∩ I) ∪mk⋃j=1

Jj,k (disjoint union),

hence, by 10.2.5(f) and induction,

λ∗(Ik) = λ∗(Ik ∩ I) +mk∑j=1

λ∗(Jj,k).

Since {Ik ∩ I}k covers C ∩ I and {Jj,k}j,k covers C ∩ Ic,

∑k

λ∗(Ik) =∑k

λ∗(Ik ∩ I) +∑k

mk∑j=1

λ∗(Jj,k) ≥ λ∗(C ∩ I) + λ∗(C ∩ Ic).

It may be shown that the inclusion B ⊆M is proper.3 The importance ofBorel sets is that they are closely linked to the topology of Rn and hence arebetter suited for contexts involving continuous functions.

The remainder of the section demonstrates the precise connection betweenB andM.

3See, for example, [4].

10.4.4 Lemma. For any bounded E ∈M, there exists a decreasing sequenceof bounded open sets Uk ⊇ E such that

limkλ(Uk) = lim

kλ(cl(Uk)

)= λ(E).

Proof. By definition of λ(E), for each k we may choose a sequence of openintervals Ij,k with union Vk containing E such that

λ(E) ≤ λ(Vk) ≤ λ(cl(Vk)

)≤∑j

|cl(Ij,k)| < λ(E) + 1/k.

The sequence of open sets Uk := V1 ∩ · · · ∩ Vk is decreasing, contains E, andsatisfies

λ(E) ≤ λ(Uk) ≤ λ(cl(Uk)

)≤ λ

(cl(Vk)

)≤ λ(E) + 1/k.

Letting k → +∞ proves the assertion.

10.4.5 Lemma. For any E ∈ M, there exists an increasing sequence ofcompact sets Ck ⊆ E such that limk λ(Ck) = λ(E).

Proof. Suppose first that E is bounded. Let I be a bounded open intervalcontaining cl(E) and let ε > 0. Choose a sequence of open intervals Ik with

Ik

K = I \ UE

I \ E ⊆ U :=⋃

k Ik

I

FIGURE 10.6: K = cl(E) \ U .

union U ⊇ I \ E such that∑∞k=1 |Ik| < λ(I \ E) + ε. Since I is open, we may

assume that Ik ⊆ I (otherwise, replace Ik by Ik ∩ I). Then I \E ⊆ U ⊆ I and

λ(U) ≤ λ(I \ E) + ε = λ(I)− λ(E) + ε.

Set K = I \ U . Then K ⊆ E ⊆ cl(E) ⊆ I, hence K = cl(E) \ U . Therefore, Kis compact and

λ(K) = λ(I)− λ(U) ≥ λ(I)−(λ(I)− λ(E) + ε

)= λ(E)− ε.

Now let E ∈ M be arbitrary and let {Ek} be a sequence of bounded

measurable sets such that Ek ↑ E. By the first paragraph, for each k wemay choose a compact set Kk ⊆ Ek such that λ(Kk) > λ(Ek) − 1/k. Theseconditions still hold if Kk is replaced by the compact set Ck = K1 ∪ · · · ∪Kk.The sequence {Ck} is increasing, contained in E, and λ(Ck)→ λ(E).

10.4.6 Lemma. If E ∈M is bounded, then there exists an increasing sequenceof compact sets Ck and a decreasing sequence of bounded open sets Uk suchthat

Ck ⊆ E ⊆ Uk and limkλ(Uk \ Ck) = 0.

Proof. If Ck and Uk are as in 10.4.4 and 10.4.5 with Uk bounded, then

λ(Uk \ Ck) = λ(Uk \ E) + λ(E \ Ck)→ 0.

10.4.7 Theorem. If E ∈M, then there exist Borel sets F and G such thatF ⊆ E ⊆ G and λ(G \ F ) = 0.

Proof. Suppose first that E is bounded. Set F =⋃∞k=1 Ck and G =

⋂∞k=1 Uk,

where Ck and Uk are the sets in 10.4.6. Then F ⊆ E ⊆ G and G \F ⊆ Uk \Ckfor all k, hence λ

(G \ F

)≤ λ

(Uk \ Ck)→ 0.

In the general case, there exists a sequence of bounded Borel sets Ek ↑ E. Bythe first paragraph, there exist Borel sets Fk and Gk such that Fk ⊆ Ek ⊆ Gkand λ(Gk \ Fk) = 0. Let

F =∞⋃k=1

Fk and G =∞⋃k=1

Gk.

Then F and G are Borel sets, F ⊆ E ⊆ G, and G \ F ⊆⋃∞k=1Gk \ Fk. By

countable subadditivity, λ(G \ F ) = 0.

10.4.8 Corollary. Every E ∈ M is the disjoint union of a Borel set and aset of Lebesgue measure zero.

Proof. By the theorem, E = F ∪ (E \ F ), where F ∈ B and λ(E \ F ) = 0.

Exercises1.S Let ε > 0. Construct an explicit compact subset C ⊆ E := [0, 1]∩ I such

that λ(E \ C) < ε.

2. Show that the graph G := {(x, y) : y = f(x)} of a continuous functionf : R→ R is a Borel set with two-dimensional Lebesgue measure zero.

3. Let E denote the Cantor set (10.3.4). Show that E + Q and E + E areBorel sets and find their measures.

4.S Let B ∈ B(Rn), y ∈ Rn, and r ∈ R. Prove that B + y :={x+ y : x ∈ B}, rB := {rx : x ∈ B} and −B := {x : −x ∈ B} areBorel sets.

10.5 Measurable FunctionsIn this section, F denotes a σ-field of subsets of a set S..

Definition and Basic Properties10.5.1 Lemma. Let f : S → R. The following statements are equivalent:

(a) f−1({+∞}), f−1({−∞}) ∈ F , and f−1(U) ∈ F for all open sets U ⊆ R.

(b) f−1({+∞}), f−1({−∞}) ∈ F , and f−1(F ) ∈ F for all closed sets F ⊆ R.

(c) f−1({+∞}), f−1({−∞}) ∈ F , and f−1(B) ∈ F for all Borel sets B ⊆ R.

(d) {x : f(x) ≤ t} ∈ F for all t ∈ R.

(e) {x : f(x) < t} ∈ F for all t ∈ R.

(f) {x : f(x) ≥ t} ∈ F for all t ∈ R.

(g) {x : f(x) > t} ∈ F for all t ∈ R.

Proof. The equivalence of (a) and (b) follows from the general set theoreticrelation f−1(Ac) =

[f−1(A)

]c. Clearly, (c) implies (b). For the converse, denoteby G the collection of all Borel subsets B of R such that f−1(B) ∈ F . Then Gis a σ-field. If (b) holds, then G contains the closed sets, hence, by minimality,G = B. This proves (c) and hence shows that (a)–(c) are equivalent.

The implications (c) ⇒ (d) ⇒ (e) ⇒ (f) ⇒ (g) ⇒ (d) are proved using thefollowing set relations:

(c) ⇒ (d) : {x : f(x) ≤ t} = f−1({−∞}) ∪ f−1((−∞, t]).(d) ⇒ (e) : {x : f(x) < t} =

∞⋃n=1{x : f(x) ≤ t− 1/n} .

(e) ⇒ (f) : {x : f(x) ≥ t} = {x : f(x) < t}c .

(f) ⇒ (g) : {x : f(x) > t} =∞⋃n=1{x : f(x) ≥ t+ 1/n} .

(g) ⇒ (d) : {x : f(x) ≤ t} = {x : f(x) > t}c .

Thus (d)–(g) are equivalent and are implied by (a)–(c).Now assume that (d)–(g) hold. Then the sets

f−1(+∞) =∞⋂k=1{x : f(x) > k} , f−1(−∞) =

∞⋂k=1{x : f(x) < −k}

are members of F , and for −∞ < a a} ∩ {x : f(x) < b} ∈ F .

Since every open subset of R is a countable union of open intervals, (a) holds,completing the proof.

10.5.2 Definition. A function f : S → R is said to be measurable withrespect to F , or simply F-measurable, if any (hence all ) of the conditions inLemma 10.5.1 hold. ♦

The following theorem shows that the collection of all measurable functionsis closed under the standard ways of combining functions. The functions f+, f−,supn fn, infn fn, lim supn fn, and lim infn fn in the statement of the theoremare defined by

f+(x) := max{f(x), 0}, f−(x) := max{−f(x), 0},(supkfk)(x) := sup

kfk(x), (inf

kfk)(x) := inf

kfk(x),

(lim supk

fk)(x) := lim supk

fk(x), (lim infk

fk)(x) := lim infk

fk(x).

10.5.3 Theorem. Let f, g, fk be measurable with respect to a σ-field F onS. If α ∈ R and p > 0, then f + g, αf , f2, fg, |f |p, f+, f−, supk fk, infk fk,lim supk fk, and lim infk fk are measurable.

Proof. The proof is based on the following equalities. The details are left tothe reader.

• {x : (f + g)(x) < t} =⋃

r∈Q{x : f(x) < r} ∩ {x : g(x) < t− r}.

• {x : αf(x) < t} = {x : f(x) < t/α} for α > 0.

•{x : f2(x) < t

}={x : −

√t < f(x) <

√t}

for t > 0.

• fg = 12 [(f + g)2 − f2 − g2].

• {x : |f |p(x) < t} ={x : −t1/p < f(x) < t1/p

}for t > 0.

• f+ = 12 (|f |+ f), f− = 1

2 (|f | − f).

•{x : sup

kfk(x) ≤ t

}=⋂

k{x : fk(x) ≤ t}.

• infk fk = − supk(−fk).

• lim infk fk = supk infj≥k fj ; lim supk fk = − lim infk(−fk).

10.5.4 Corollary. If fk : S → R is F-measurable for every k and if fk → fon S, then f is F-measurable.

Simple Functions10.5.5 Definition. The indicator function of a set A ⊆ S is the function 1Aon S defined by

1A(x) ={

1 if x ∈ A, and0 if x 6∈ A.

♦

For example, the Dirichlet function may be expressed as 1Q.

10.5.6 Definition. A function f : S → R with finite range is called a simplefunction. The collection of all nonnegative F-measurable simple functions isdenoted by S+(F). ♦

10.5.7 Remarks. (a) A linear combination of indicator functions is a simplefunction. Conversely, a simple function f may be expressed in many ways as alinear combination of indicator functions. The most important of these is thestandard form

f =m∑j=1

aj1Aj , Aj := {x ∈ S : f(x) = aj} , (10.7)

where a1, . . . , am ∈ R are the distinct values of f . Note that the sets Aj forma partition of Rn. By 10.5.3 and Exercise 8, f is F-measurable iff Aj ∈ F foreach j.(b) If f1, f2 ∈ S+(F), α ≥ 0, and p > 0, then the functions

αf1, f1 + f2, f1f2, fp1 , max{f1, f2}, min{f1, f2}

are nonnegative, measurable, and have finite ranges, hence are in S+(F). ♦

The following theorem shows that the collection S+(F) generates all mea-surable functions. It is a key ingredient in the development of the Lebesguetheory.

10.5.8 Theorem. For each nonnegative F-measurable function f on S, thereexists a sequence {fk} in S+(F) such that fk ↑ f on S.

Proof. Let f0 = 0, and for each k ∈ N define

fk =k2k∑j=1

j − 12k 1Ak,j + k1Ak , where Ak = {x : f(x) ≥ k} and

Ak,j ={x : (j − 1)2−k ≤ f(x) < j2−k

}, j = 1, 2, . . . , k2k.

(See Figure 10.7.) We show that fk(x) ↑ f(x) for each x ∈ S. This isclear if f(x) = +∞, since then fk(x) = k for all k. Suppose f(x) ∈ Rand let k ∈ N. If f(x) ≥ k + 1, then fk+1(x) = k + 1 > k = fk(x). If

(j − 1)2−k

j2−k

k

S

Ak,j

Ak

...

f

FIGURE 10.7: The components of fk.

(j − 1)2−k

j2−k

k

S

(2j − 1)2k+1

k + 1......

...

x x x

FIGURE 10.8: The components of fk+1.

k ≤ f(x) < k + 1, then fk+1(x) ≥ k = fk(x). Finally, suppose that f(x) < k.Then (j − 1)2−k ≤ f(x) < j2−k for some 1 ≤ j ≤ k2k, hence

2j − 22k+1 ≤ f(x) < 2j − 1

2k+1 or 2j − 12k+1 ≤ f(x) < 2j

2k+1 .

(See Figure 10.8.) In either case,

fk+1(x) ≥ 2j − 22k+1 = j − 1

2k = fk(x).

Thus fk ↑ on S. Since 0 ≤ f(x) − fk(x) < 2−k for all sufficiently large k,fk(x)→ f(x).

Lebesgue and Borel Measurable Functions10.5.9 Definition. A function f : Rn → R is said to be Borel (Lebesgue)measurable if f is measurable with respect to the σ-field B(Rn) (M(Rn)). ♦

10.5.10 Proposition. If f is Lebesgue measurable and f = g a.e., then g isLebesgue measurable.

Proof. Let A = {x : f(x) 6= g(x)}. By hypothesis, A has Lebesgue measurezero, hence Ac and {x : g(x) < t} ∩A ∈M. Therefore,

{x : g(x) < t} =({x : f(x) < t} ∩Ac

)∪({x : g(x) < t} ∩A

)∈M.

If f is Borel measurable and f = g a.e., then g need not be Borel measurable.Indeed, there exist sets E ∈M \ B with measure zero, hence 1E = 0 a.e. but1E is not Borel measurable.4

Clearly, a Borel measurable function is Lebesgue measurable. The precedingparagraph shows that the converse is false. However, we have

10.5.11 Proposition. If f : Rn → R is Lebesgue measurable, then there existsa Borel measurable function g : Rn → R such that g = f a.e.

Proof. Consider first the case f = 1E , E ∈ M. By 10.4.8, E is the disjointunion of a Borel set F and a set A of Lebesgue measure zero. Thus g := 1F isBorel measurable and f = g + 1A = g a.e. The assertion therefore holds forindicator functions.

If f is a simple function, then each term in the standard form of f is a.e.equal to a Borel function. Therefore, the assertion holds for simple functions.

If f ≥ 0, then, by 10.5.8, there exists a sequence of nonnegative Lebesguemeasurable simple functions fk such that fk → f on Rn. By the previousparagraph, for each k there exists a Borel measurable function gk such thatfk = gk a.e. Let

Ak := {x : fk(x) 6= gk(x)} and A :=∞⋃n=1

Ak.

Then A ∈ M, λ(A) = 0 and fk(x) = gk(x) for all x ∈ Ac and all k. Let Bdenote the set of all x such that the sequence {gk(x)} does not converge. ThenB ⊆ A and, by 10.5.3, B ∈ B. Let g = limk gk1Bc . Then g is Borel measurableand {x : g(x) 6= f(x)} ⊆ A so g = f a.e. Therefore, the assertion holds fornonnegative f . The general case follows from the identity f = f+ − f−.

Part (a) of 10.5.1 implies that a continuous function f : Rn → R is Borelmeasurable. In a similar vein,

10.5.12 Proposition. If f : Rn → R be continuous except on a set E ofLebesgue measure zero, then f is Lebesgue measurable.

Proof. Let U ⊆ R be open. Then

f−1(U) = A ∪B, where A := f−1(U) ∩ E and B := f−1(U) ∩ Ec.

Since A ⊆ E and λ(E) = 0, A ∈ M. Since f is continuous on Ec, B is openin Ec, hence B = V ∩ Ec for some open subset V of Rn. Therefore, B ∈ M,so f−1(U) ∈M. By 10.5.1, f is Lebesgue measurable.

4See, for example, [4].

Proposition 10.5.12 implies that a function with at most countably manydiscontinuities is Lebesgue measurable. An examination of the proof shows thatsuch a function is in fact Borel measurable. In particular, monotone functionson R, hence also functions of bounded variation, are Borel measurable (see3.3.6 and 5.9.7).

Note that a function that is continuous except on a set of measure zero isnot necessarily equal a.e. to a continuous function (Exercise 12). Conversely, afunction equal a.e. to a continuous function need not be continuous anywhere;the Dirichlet function is an obvious example.

ExercisesIn Exercises 1–8, F denotes a σ-field of subsets of a set S..

1.S Let f : S → R have the property that 1Akf is F -measurable for every k,where Ak ∈ F and

⋃k Ak = S. Prove that f is F-measurable.

2. Prove that if f : S → R is F-measurable and never zero, then 1/f isF-measurable.

3. Let f : S → R have the property that {x : f(x) < r} ∈ F for all r ∈ Q.Prove that f is F-measurable.

4. Let f : S → R be F-measurable and let g : R→ R be continuous. Showthat g ◦ f is F-measurable.

5. Let g, h : S → R be F-measurable functions. Prove that the followingsets are F-measurable:(a)S {x ∈ S : g(x) > h(x)}, (b) {x ∈ S : g(x) ≥ h(x)},(c) {x ∈ S : g(x) = h(x)}, (d)S {x ∈ S : g(x)h(x) = 1}.

6. Let {fk : S → R} be a sequence of F-measurable functions. Prove thatthe set

{x ∈ S : limk fk(x) exists in R

}is F-measurable.

7. Let f : S → R have range consisting of the distinct values ak, k ∈ N.Show that f is F-measurable iff {x ∈ S : f(x) = ak} ∈ F for every k.

8.S Let E ⊆ S. Prove that 1E is F-measurable iff E ∈ F .

9. Let A, B, and C be subsets of S. Prove:(a) 1AB = 1A1B . (b) 1A∪B = 1A + 1B − 1A1B .(c) 1Ac = 1− 1A (d) 1A ≤ 1B iff A ⊆ B.

10.S Define the symmetric difference A∆B of sets A and B by

A∆B = (A \B) ∪ (B \A) = (A ∪B) \ (A ∩B).

Prove that 1A∆B = |1A − 1B |.

11. Let Ak ⊆ S and set B = lim infk Ak and C = lim supk Ak. (see Exer-cise 10.1.6). Prove that(a) 1B = lim infk 1Ak . (b) 1C = lim supk 1Ak .

12. Prove that 1[0,1] is not equal a.e. to a continuous function on R.

13. Let f : R→ R. Prove that if f ′ exists on R, then f ′ is Borel measurable.

14.S Let f(x) = bx−1c−1, 0 < x ≤ 1. Show that f is Borel measurable on(0, 1].

15. Let f(x) = 1 + r(bx−1c

), 0 < x ≤ 1, where r(k) denotes the remainder

on division of an integer k by 3. Show that f is Borel measurable.

16. Define f : [0, 1]→ R by f(x) = 0 if x is rational and f(x) = 1/√d if x

is irrational, where d is the first nonzero digit in the decimal expansionof x. Prove that f is Borel measurable.

17.S Prove that if the function f in 10.5.8 is bounded, then the convergenceof the sequence is uniform.

18. Let f : R2 → R have the property that f(x, y) is continuous in xfor each y and Borel measurable in y for each x. Let g : R → R beBorel measurable. Prove that the function h(y) := f

(g(y), y

)is Borel

measurable. Hint. Start with indicator functions g.

19.S ⇓5 Let f = (f1, . . . , fm) : Rn → Rm, where each fj : Rn → R is Borelmeasurable. Prove:(a) F :=

{B ∈ B(Rm) : f−1(B) ∈ B(Rn)

}is a σ-field.

(b) F = B(Rm), that is, f−1(B) ∈ B(Rn) for every B ∈ B(Rm).(c) If F : Rm → R is Borel measurable, then the function g := F ◦ f isBorel measurable.

20. (a) Show that B × R ∈ B(R2) for all B ∈ B(R).(b) Let f : R → R be Borel measurable and define g : R2 → R byg(x, y) = f(x). Show that g is Borel measurable.

21. Let 0 ∈ A ⊆ Rn. Define the “radius function” fA : Rn → R by

fA(x) := sup {t ≥ 0 : tx ∈ A} , x ∈ Rn.

(a) Let 0 ∈ Ak for all k and Ak ↑ A. Show that fAk ↑ fA.(b) Show that if A is open, then fA is positive and Borel measurable.(c) Use (b) to show that if A is compact, then fA is Borel measurable.(d) Conclude from 10.4.5 that fA is Borel measurable for any Borel setA containing 0.

Chapter 11Lebesgue Integration on Rn

In this chapter we use the measure theory developed in Chapter 10 to con-struct the Lebesgue integral of a measurable function of several variables. Forcomparison purposes, we begin with a brief description of the Riemann integralon compact subintervals of Rn.

11.1 Riemann Integration on Rn

The n-dimensional Riemann integral is constructed in essentially the sameway as the one-dimensional integral: Let f be a bounded real-valued functionon an n-dimensional interval

[a,b] := [a1, b1]× [a2, b2]× · · · × [an, bn],

where a := (a1, . . . , an) and b := (b1, . . . , bn).

a1 b1

a2

b2

P1

P2

I

I1

I2

x1

x2

FIGURE 11.1: Partition of [a, b]× [c, d].

For each j, let Pj be a partition of the coordinate interval [aj , bj ]. The collectionof all Cartesian products of the resulting coordinate subintervals produces apartition P of [a,b] consisting of n-dimensional subintervals I = I1×I2×· · ·×Inwith volume ∆VI := |I1| |I2| · · · |In| (see Figure 11.1). The lower and upper

367


sums of f over P are defined by

S(f,P) =∑I∈P

mI∆VI , mI := infx∈I

f(x), and

S(f,P) =∑I∈P

MI∆VI , MI := supx∈I

f(x).

The lower and upper integrals on [a,b] are defined by∫ b

a

f := supPS(f,P) and

∫ b

a

f := infPS(f,P),

where the supremum and infimum are taken over all partitions P of [a,b]. Ifthe two integrals are equal, then f is said to be Riemann–Darboux integrableon [a,b]. The common value of these integrals is then denoted by

∫ b

af . As in

the one-variable case, ∫ b

a

f = lim‖P‖→0

S(f,P, {ξI}I),

where ‖P‖ = maxj ‖Pj‖ and S(f,P, {ξI}I) is the Riemann sum

S(f,P, {ξI}I) :=∑I

f(ξI)∆VI , ξI ∈ I, I ∈ P.

The n-dimensional Riemann integral has properties analogous to those ofthe one-dimensional integral. Moreover, as is shown in Section 11.5, if f iscontinuous, then

∫ b

af may be expressed as an iterated integral∫ b1

a1

. . .

∫ bn

an

f(x1, . . . , xn) dxn · · · dx1,

effectively reducing the theory to the one-dimensional case. Integrals overregions bounded by “nice” surfaces may be similarly evaluated.

11.2 The Lebesgue IntegralThe Lebesgue integral on Rn is defined first for nonnegative Lebesgue

measurable simple functions and is then extended to a larger class of functions,including all nonnegative Lebesgue measurable functions. The identity f =f+ − f− is then used to define the integral for general measurable functions.

Lebesgue Integration on Rn 369

The Integral of a Simple Function11.2.1 Definition. Let f ∈ S+(M) have standard form

f =m∑j=1

aj1Aj , Aj := {x : f(x) = aj} ,

where {A1, . . . , Am} is a (measurable) partition of Rn. The Lebesgue integralof f on Rn is defined by ∫

f dλ :=m∑j=1

ajλ(Aj). ♦

Note that the above sum may contain a term of the form0 · (+∞). Whilethis expression was heretofore undefined, it is now necessary to make thedefinition

0 · (+∞) := 0.

In particular, the integral of the identically zero function is 0 · λ(Rn) = 0.

11.2.2 Lemma. If f, g ∈ S+(M) and α ≥ 0, then

(a)∫αf dλ = α

∫f dλ; (b)

∫(f + g) dλ =

∫f dλ+

∫g dλ;

(c)∫f dλ ≤

∫g dλ if f ≤ g a.e. (d)

∫f dλ =

∫g dλ if f = g a.e.

Proof. Part (a) is immediate from the definition, and (d) follows from (c). Toprove (b) and (c), let f and g have standard representations

f =m∑i=1

ai1Ai and g =k∑j=1

bj1Bj ,

so ∫f =

m∑i=1

aiλ(Ai) and∫g =

k∑j=1

bjλ(Bj).

Since Rn =⋃mi=1Ai =

⋃kj=1Bj and the unions are disjoint,

λ(Ai) =k∑j=1

λ(Ai ∩Bj) and λ(Bj) =m∑i=1

λ(Ai ∩Bj),

hence∫f dλ =

m∑i=1

k∑j=1

aiλ(Ai ∩Bj) and∫g dλ =

k∑j=1

m∑i=1

bjλ(Ai ∩Bj). (11.1)


Now let c1, . . . , cp be the distinct values of f + g, and set

C` = {x : (f + g)(x) = c`} , ` = 1, . . . , p.

Then

f + g =p∑`=1

c`1C` and C` =⋃

{(i,j):ai+bj=c`}

Ai ∩Bj (disjoint),

so ∫(f + g) dλ =

p∑`=1

c`λ(Cl) =p∑`=1

c`∑

{(i,j):ai+bj=c`}

λ(Ai ∩Bj)

=m∑i=1

k∑j=1

(ai + bj)λ(Ai ∩Bj).

By (11.1), the last sum is∫f dλ+

∫g dλ, proving (b).

For (c), suppose f ≤ g a.e. and let E = {x : f(x) ≤ g(x)}. Then λ(Ec) = 0and ai ≤ bj for all i, j for which Ai ∩Bj ∩ E 6= ∅. From

λ(Ai ∩Bj) = λ(Ai ∩Bj ∩ E) + λ(Ai ∩Bj ∩ Ec) = λ(Ai ∩Bj ∩ E)

and (11.1), we have∫f dλ =

m∑i=1

k∑j=1

aiλ(Ai ∩Bj ∩ E) ≤k∑j=1

m∑i=1

bjλ(Ai ∩Bj ∩ E) =∫g dλ.

The Integral of a Measurable Function11.2.3 Definition. Let f : Rn → R be Lebesgue measurable. If f ≥ 0, define∫

f dλ =∫f(x) dλ(x) := sup

{∫fs dλ : fs ≤ f, fs ∈ S+(M)

}. (11.2)

In general, define the Lebesgue integral on Rn by∫f dλ :=

∫f+ dλ−

∫f− dλ,

provided at least one of the terms on the right is finite. For E ∈M define theLebesgue integral on E by ∫

E

f dλ :=∫f · 1E dλ

whenever the right side is defined. If both∫Ef+ dλ and

∫Ef− dλ are finite,

then f is said to be (Lebesgue) integrable on E. The collection of all integrablefunctions on E is denoted by L1(E). Finally, f is said to be integrable if it isintegrable on Rn. ♦

Note that from the definition, f ≥ 0⇒∫f ≥ 0. More generally,

11.2.4 Proposition. If f, g : Rn → R are Lebesgue measurable, f ≤ g a.e.,and

∫f dλ,

∫g dλ are defined, then

∫f dλ ≤

∫g dλ. In particular, if f ≥ 0

and g is integrable, then f is integrableProof. Assume first that f, g ≥ 0. Let fs ∈ S+(M) with fs ≤ f and setgs := 1Efs, where E := {x : f(x) ≤ g(x)}. Then gs ∈ S+(M), fs = gs a.e.,and gs ≤ 1Ef ≤ 1Eg ≤ g. By 11.2.2,

∫fs dλ =

∫gs dλ ≤

∫g dλ. Since fs was

arbitrary,∫f dλ ≤

∫g dλ.

In the general case, f+ ≤ g+ and f− ≥ g− a.e., hence, by the first part ofthe proof,∫

f dλ =∫f+ dλ−

∫f− dλ ≤

∫g+ dλ−

∫g− dλ =

∫g dλ.

11.2.5 Corollary. If f : Rn → R is integrable and f = g a.e., then g isintegrable and

∫f dλ =

∫g dλ.

Proof. By 10.5.10, g is Lebesgue measurable. Moreover, f+ = g+ a.e., so by11.2.4,

∫f+ dλ =

∫g+ dλ. Similarly,

∫f− dλ =

∫g− dλ.

11.2.6 Proposition. If f : Rn → R is integrable, then f is finite a.e.Proof. Suppose first that f ≥ 0. Let

A = {x : f(x) = +∞} and Ak = {x : f(x) ≥ k} .

Since f ≥ f1Ak ≥ k1Ak ≥ k1A,

0 ≤ λ(A) ≤ 1k

∫f dλ < +∞.

Letting k → +∞ shows that λ(A) = 0.In the general case, apply the result of the first paragraph to f+ and f−

to obtain

λ( {x : f+(x) = +∞

} )= λ

( {x : f−(x) = +∞

} )= 0,

hence λ({x : |f(x)| = +∞}

)= 0.

11.2.7 Proposition. Let f : Rn → [0,+∞] be Lebesgue measurable. Then∫f dλ = 0 iff f = 0 a.e.

Proof. The sufficiency follows from 11.2.5. For the necessity, suppose∫f dλ = 0

and letB = {x : f(x) > 0} and Bk = {x : f(x) > 1/k} .

Then B =⋃∞k=1Bk and f ≥ f1Bk ≥ k−11Bk so

0 ≤ λ(Bk) ≤ k∫f dλ = 0.

Therefore, λ(Bk) = 0. By countable subadditivity, λ(B) = 0.

By 11.2.4, f ≥ 0 implies that∫Af dλ ≥ 0 for all A ∈M. The following is

a converse:

11.2.8 Proposition. Let f : Rn → R be Lebesgue measurable and supposethat

∫Af dλ is defined for all A ∈M.

(a) If∫Af dλ ≥ 0 for all A ∈M, then f ≥ 0 a.e.

(b) If∫Af dλ = 0 for all A ∈M, then f = 0 a.e.

Proof. Part (b) follows from part (a). To prove (a), let

Ak ={x : f(x) ≤ −k−1} and A = {x : f(x) < 0} .

Then f1Ak ≤ −k−11Ak or 1Ak ≤ −kf1Ak , hence, since∫Akf ≥ 0,

0 ≤ λ(Ak) ≤ −k∫Ak

f dλ ≤ 0.

Therefore, λ(Ak) = 0. Since A =⋃∞k=1Ak, λ(A) = 0.

11.2.9 Remark. The above properties of integrals on Rn also hold for integralson E ∈ M. For example, if f is integrable on E, then f is finite a.e. on E:simply replace f in 11.2.6 by f · 1E . This observation applies to most of theresults that follow. We shall usually refrain from making this explicit, but thereader is invited to formulate and verify such generalizations. ♦

Linearity of the IntegralThe following lemma is a special case of the monotone convergence theorem

proved in the next section.

11.2.10 Lemma (Beppo–Levi). If {fk} is a sequence of nonnegative Lebesguemeasurable functions such that fk ↑ f on Rn, then∫

f dλ = limk

∫fk dλ.

Proof. By 10.5.3, f is Lebesgue measurable, hence∫f dλ is defined. It follows

from 0 ≤ fk ≤ fk+1 ≤ f and 11.2.4 that∫fk dλ ≤

∫fk+1 dλ ≤

∫f dλ.

Therefore, L := lim∫fk dλ exists in R and L ≤

∫f dλ. For the reverse

inequality, it suffices to show that∫g dλ ≤ L for any g ∈ S+(M) with g ≤ f .

Let 0 < r < 1 and set Ek = {x : fk(x) ≥ rg(x)} . Since the sequence {fk} is


increasing, Ek ⊆ Ek+1. Since fk(x) ≥ rg(x) for all large k, Ek ↑ Rn. If g hasstandard form

∑mj=1 aj1Aj , then

fk ≥ fk1Ek ≥ rm∑j=1

aj1Ek∩Aj ,

hence ∫fk dλ ≥ r

m∑j=1

ajλ(Ek ∩Aj).

Letting k → +∞, noting that Ek ∩Aj ↑k Aj , we then obtain

L ≥ rm∑j=1

ajλ(Aj) = r

∫g dλ.

Letting r ↑ 1 yields L ≥∫g dλ, as required.

11.2.11 Theorem. If f, g : Rn → [0,+∞] are Lebesgue measurable, then∫(αf + βg) dλ = α

∫f dλ+ β

∫g dλ α, β ∈ R+.

In particular, if f and g are integrable then so is αf + βg.

Proof. By 10.5.8, there exist sequences {fk} and {gk} in S+(M) such thatfk ↑ f and gk ↑ g. Then αfk + βgk ↑ f + g and, by 11.2.10 and 11.2.2,∫

(αf + βg) dλ = limk

∫(αfk + βgk) dλ

= α limk

∫fk dλ+ β lim

k

∫gk dλ

= α

∫f dλ+ β

∫g dλ.

11.2.12 Corollary. Let f, g : Rn → R be Lebesgue measurable.

(a) f is integrable iff |f | is integrable.

(b) If f is integrable and |g| ≤ |f |, then g is integrable.

(c) If f and g are integrable, then f + g is integrable.

(d) If f is integrable and E ∈M, then f is integrable on E.

Proof. (a) If f is integrable then, by definition, both f+ and f− are integrable,hence, by the theorem, |f | = f+ + f− is integrable. Conversely, if |f | isintegrable, then the inequalities 0 ≤

∫f± dλ ≤

∫|f | dλ show that both f+

and f− are integrable, hence f is integrable.

(b) By (a), |f | is integrable. The inequality |g| ≤ |f | then implies that |g| isintegrable. By (a) again, g is integrable.

(c) If f and g are integrable, then so are |f | and |g|. The inequality |f +g| ≤|f |+ |g| then shows that |f + g| is integrable. By (a), f + g is integrable.

(d) This follows from (b) since |f1E | ≤ |f |.

The following theorem complements 11.2.11.

11.2.13 Theorem. Let f, g : Rn → R be Lebesgue measurable with g inte-grable, and let c ∈ R. Then the following hold:

(a) cg is integrable and∫cg dλ = c

∫g dλ.

(b) If f is integrable, then f + g is integrable and∫(f + g) dλ =

∫f dλ+

∫g dλ. (11.3)

(c) If∫f dλ is defined, then

∫(f + g) dλ is defined and (11.3) holds.1

Proof. (a) If c ≥ 0, then (cg)+ = cg+ and (cg)− = cg−, hence, by 11.2.11, thefunctions (cg)± are integrable and∫

cg dλ =∫

(cg)+ dλ−∫

(cg)− dλ = c

∫g+ dλ− c

∫g− dλ = c

∫g dλ.

Next, observe that (−g)+ = g− and (−g)− = g+ so∫(−g) dλ =

∫(−g)+ dλ−

∫(−g)− dλ =

∫g− dλ−

∫g+ dλ = −

∫g dλ.

Therefore, if c < 0,∫cg dλ =

∫(−c)(−g) dλ = −c

∫(−g) dλ = c

∫g dλ.

(b) By 11.2.12, f+g is integrable. By 11.2.6, there exists a set A of Lebesguemeasure zero such that f(x), g(x) ∈ R for x ∈ Ac. Then on the set Ac,

(f + g)+ − (f + g)− = f + g = f+ − f− + g+ − g−, hence(f + g)+ + f− + g− = (f + g)− + f+ + g+.

1To avoid undefined expressions such as ∞−∞ in the integrand f + g in (b) and (c), itmust be assumed that g is finite-valued. This is no real loss of generality since g is integrable,hence finite-valued a.e. (11.2.6).

By 11.2.11 and 11.2.5,∫(f + g)+ dλ+

∫f− dλ+

∫g− dλ =

∫(f + g)− dλ+

∫f+ dλ+

∫g+ dλ.

Since the integrals in this equation are finite, rearranging yields∫(f + g) dλ =

∫(f + g)+ dλ−

∫(f + g)− dλ

=∫f+ dλ−

∫f− dλ+

∫g+ dλ−

∫g− dλ

=∫f dλ+

∫g dλ.

(c) The cases to be considered are

(i)∫f− dλ < +∞ and

∫f+ dλ = +∞;

(ii)∫f+ dλ < +∞ and

∫f− dλ = +∞.

Suppose that (i) holds. We may assume that both f− and g are finite-valued.Since ∫

(f + g)− dλ ≤∫

(f− + g−) dλ < +∞,∫(f +g) dλ is defined. If

∫(f +g)+ dλ < +∞, then (f +g) would be integrable,

hence, by part (b,) so would f+ = (f+g)+f−−g, contrary to our assumption.Therefore, ∫

(f + g) dλ = +∞ =∫f dλ+

∫g dλ.

Case (ii) is similar (or apply Case (i) to −f).

11.2.14 Corollary. If f is integrable, then∣∣∣ ∫ f dλ

∣∣∣ ≤ ∫ |f | dλ.Proof. Since ±f ≤ |f |, ±

∫f dλ =

∫±f dλ ≤

∫|f | dλ.

Approximation of Integrable Functions11.2.15 Definition. For E ∈M and f ∈ L1(E) define the L1 seminorm of fby

‖f‖1 :=∫E

|f | dλ.

11.2.16 Theorem. L1(E) is a linear space and ‖ · ‖1 has all the propertiesof a norm except the coincidence property.

Proof. That L1(E) is a linear space follows from 11.2.13. Coincidence may failsince ‖f‖1 = 0 only implies that f = 0 a.e. (Consider the Dirichlet function.)The other properties of a norm are easily established.

11.2.17 Theorem. Let f ∈ L1(Rn) and ε > 0. Then there exists a simplefunction g and a continuous function h, each vanishing outside a boundedinterval, such that ‖f − g‖1 < ε and ‖f − h‖1 < ε.

Proof. By considering positive and negative parts, we may assume that f ≥ 0.By definition of

∫f dλ, there exists fs ∈ S+(M) with fs ≤ f such that

‖f − fs‖1 =∫f dλ−

∫fs dλ < ε/4.

Let fs =∑mi=1 ai1Ai , where ai > 0. Since

m∑i=1

aiλ(Ai) =∫fs dλ ≤

∫f dλ < +∞,

λ(Ai) < +∞ for each i. LetM = maxi ai. By 10.1.6(d), there exists a boundedinterval I such that

λ(Ai)− λ(I ∩Ai) < ε/(4Mm), i = 1, . . . ,m.

Set Bi := Ai ∩ I and g :=m∑i=1

ai1Bi . Then

‖g − fs‖1 =m∑i=1

ai[λ(Ai)− λ(Bi)

]< ε/4,

hence‖f − g‖1 ≤ ‖f − fs‖1 + ‖fs − g‖1 < ε/2.

To obtain h, for each i choose a compact set Ci and a bounded open set Uisuch that Ci ⊆ Bi ⊆ Ui and λ(Ui\Ci) < ε/(4mM) (10.4.6). By Exercise 8.5.15,there exists a continuous function hi : Rn → [0, 1] such that hi = 1 on Ci andhi = 0 on U ci . Since hi − 1Bi = 0 on Ci ∪ U ci = (Ui \ Ci)c,

‖1Bi − hi‖1 =∫Ui\Ci

|1Bi − hi| dλ ≤ 2λ(Ui \ Ci) < ε/2mM.

The function h :=∑mi=1 aihi is continuous and by the triangle inequality

‖g − h‖1 < ε/2. Therefore, ‖f − h‖1 < ε, completing the proof.

Translation Invariance of the Integral11.2.18 Theorem. If f : Rn → R is Lebesgue measurable and y ∈ Rn, then∫

f(x+ y) dx =∫f(x) dx (11.4)

in the sense that if one side is defined, then so is the other and the integralsare then equal.

Proof. If E ∈ M, then E − y ∈ M and λ(E − y) = λ(E) (Exercise 10.3.1),hence ∫

1E(y + x) dx =∫

1E−y dλ = λ(E − y) =∫

1E dλ.

Therefore, (11.4) holds for indicator functions.For a function h, define hy(x) := h(y + x). Let f ≥ 0 and let g ∈ S+(M)

with g ≤ f . Then gy ≤ fy and, by the first paragraph and linearity,∫g =

∫gy,

hence∫g ≤

∫fy. Taking the supremum over g yields

∫f ≤

∫fy. Replacing y

by −y and f by fy in this inequality produces the reverse inequality. Therefore,(11.4) holds for f ≥ 0. The general case follows from this and the identities(f±)y = (fy)±.

Exercises1. Let f and g be integrable. Prove:

(a) If∫Ef dλ ≤

∫Eg dλ for all E ∈M, then f ≤ g a.e.

(b) If∫Ef dλ =

∫Eg dλ for all E ∈M, then f = g a.e.

2. Let f(x) = 1 + r(bx−1c

)for 0 < x ≤ 1, where r(k) is the remainder on

division of the positive integer k by 3. (Cf. Exercise 10.5.15.) Show that∫(0,1]

f dλ = 23+

∞∑k=1

[1

3k(3k + 1)+ 2(3k + 1)(3k + 2)+ 3

(3k + 2)(3k + 3)

].

3.S Define f : [0, 1] → R by f(x) = 0 if x is rational, and f(x) = d2 if xis irrational, where d is the first nonzero digit in the decimal expansionof x. (See Exercise 10.5.16.) Show that

∫[0,1] f dλ = 95/3.

4. (a) Prove the following mean value theorem for integrals: Let f becontinuous on a compact connected set K ⊆ Rn. Then there existsxK ∈ K such that ∫

K

f dλ = f(xK)λ(K).

(b) Let f be continuous on C1(x0). Prove that

limr→0

1λ(Cr(x0))

∫Cr(x0)

f dλ = f(x0).

5.S Let f be Lebesgue measurable on R and let m ≤ f ≤M on E ∈M(R).(a) Prove that if g is integrable on E, then there exists a ∈ [m,M ] suchthat ∫

E

f |g| dλ = a

∫E

|g| dλ

(b) Show that part (a) may be false if |g| is replaced by g.

(c) Use (a) to show that at each point x where f is continuous,

limy→x

1y − x

[ ∫[a,y]

f dλ−∫

[a,x]f dλ

]= f(x).

6. (Cauchy–Schwarz inequality) Let f and g be Lebesgue measurable onRn. Prove that (∫

|fg| dλ)2≤∫f2 dλ ·

∫g2 dλ.

(See 5.7.19.)

7.S Prove that if f is integrable on [0, 1] and ε > 0, then there exists apolynomial P on [0, 1] such that

∫ 10 |f − P | dλ < ε.

8. (Absolute continuity of the integral). Let f ≥ 0 be integrable on Rn.Prove that for each ε > 0 there exists a δ > 0 such that∫

E

f dλ < ε for all E ∈M(Rn) with λ(E) < δ.

Conclude that if {Ek} is a sequence in M(Rn) with λ(Ek) → 0, then∫Ekf dλ→ 0. Hint. Begin with simple functions.

9.S Let f be integrable. Prove that limk

∫[k,k+1] fdλ = 0. (A quick proof

uses the dominated convergence theorem. For now, give a proof startingwith simple functions.)

10. Suppose f : I = [0, 1]→ [−1, 1] is integrable. Prove that∫I

f2 dλ ≤ ε2 + λ({x : |f(x)| > ε}

)for every ε > 0.

11.S Let f : I = [0, 1]→ R be integrable. Prove that∫I

f2 dλ ≤ ε2λ({x ∈ I : |f(x)| > ε}

)for every ε > 0.

12. Prove: If f is integrable and f < 1 a.e. on I, then∫If < 1.

13. Let f be Lebesgue integrable on E ∈ M with 0 < λ(E) < +∞ and∫Ef dλ ≥ λ(E). Prove that λ {x ∈ E : f(x) ≥ 1} > 0.

14. Let f be integrable on Rn. Show that for each r > 0 the functionfr(x) := f(rx) is integrable and

∫fr dλ = r−n

∫f dλ.

15.S Let f : [0, 1] → R be a bounded Lebesgue measurable function suchthat

∫[0,1] x

2kf(x) dλ(x) = 0 for all k ∈ Z+. Prove that f = 0 a.e.

16. Let f be Lebesgue integrable and g, g′ bounded and continuous on R.Carry out the following steps to show that

limk

∫f(x)g′(kx) dλ(x) = 0. (11.5)

(a) Prove (11.5) for f = 1[a,b]. (Use the fact that the Riemann andLebesgue integrals of a continuous function on a closed boundedinterval are equal. (Section 11.4.))

(b) Use (a) to show that (11.5) holds for f = 1U , where U is boundedand open.

(c) Use (b) and 10.4.7 to show that (11.5) holds for f = 1E , whereE ∈M is bounded.

(d) Use (c) and 11.2.17 to complete the proof.

If g′(x) = sin x or cosx, then (11.5) is known as the Riemann–Lebesguelemma.

11.3 Convergence TheoremsIn this section we state and prove three pointwise convergence theorems

for the Lebesgue integral. The first of these is a generalization of 11.2.10.11.3.1 Monotone Convergence Theorem. Let fk : Rn → R be Lebesguemeasurable with fk ↑ f on Rn and let

∫f−1 dλ < +∞. Then∫

f dλ = limk

∫fk dλ. (11.6)

Proof. From 0 ≤ f− ≤ f−k ≤ f−1 we have

∫f−k dλ < +∞ and

∫f− dλ < +∞,

hence the integrals in the assertion of the theorem are defined. If∫f+

1 dλ =+∞, then from f+

1 ≤ f+k ≤ f+ we see that each side of (11.6) is +∞. If∫

f+1 dλ < +∞, then f1 is integrable and we may apply 11.2.10 to fk − f1

(≥ 0) to obtain∫fk dλ =

∫(fk − f1) dλ+

∫f1 dλ→

∫(f − f1) dλ+

∫f1 dλ =

∫f dλ.


11.3.2 Remark. Equation (11.6) is still true if the inequalities fk ≤ fk+1 ≤ fand the convergence fk ↑ f hold only almost everywhere. To see this, let Adenote the set on which fk ≤ fk+1 for all k and fk ↑ f . Set fk = fk1A andf = f1A. Then (11.6) holds for the new functions. Since λ(Ac) = 0, 11.2.5shows that the equation holds for the original functions. Analogous remarksapply to the other convergence theorems in this section. ♦

11.3.3 Corollary. If gk is Lebesgue measurable and nonnegative for every k,then ∫ ( ∞∑

k=1gk

)dλ =

∞∑k=1

∫gk dλ.

Proof. Let fk =∑kj=1 gj and f =

∑∞j=1 gj . Then 0 ≤ fk ↑ f on Rn, hence, by

the theorem and linearity,∫f dλ = lim

k

∫fk dλ = lim

k

k∑j=1

∫gj dλ.

11.3.4 Corollary. Let f ≥ 0 be Lebesgue measurable. Define a function µ onM(Rn) by

µ(E) :=∫E

f dλ, E ∈M(Rn).

Then µ is a measure onM(Rn).

Proof. For countable additivity, apply 11.3.3 to gk = 1Ek .

11.3.5 Fatou’s Lemma. If fk is nonnegative and Lebesgue measurable forevery k, then ∫

lim infk

fk dλ ≤ lim infk

∫fk dλ. (11.7)

Proof. Let gk = infj≥k fj and g = lim infk fk. Then gk ≤ fk, gk ↑ g, andgk and g are Lebesgue measurable (10.5.3). By the monotone convergencetheorem,∫

lim infk

fk dλ =∫g dλ = lim

k

∫gk dλ = lim inf

k

∫gk dλ ≤ lim inf

k

∫fk dλ.

The inequality in (11.7) may be strict. For example, if fk = k1[0,1/k], thenthe left side is zero while the right side is one.

11.3.6 Dominated Convergence Theorem. Let g : Rn → [0,+∞] beintegrable and let {fk : Rn → R} be a sequence of Lebesgue measurable functionssuch that |fk| ≤ g for all k. If fk → f on Rn, then f is integrable and∫fk dλ→

∫f dλ.

Proof. Since |f | ≤ g, fk and f are integrable (11.2.12). Fatou’s lemma appliedto g ± fk (≥ 0) shows that∫

g dλ+∫f dλ ≤ lim inf

k

∫(g + fk) dλ =

∫g dλ+ lim inf

k

∫fk dλ

and ∫g dλ−

∫f dλ ≤ lim inf

k

∫(g − fk) dλ =

∫g dλ− lim sup

k

∫fk dλ.

Subtracting∫g dλ in each inequality yields∫f dλ ≤ lim inf

k

∫fk dλ ≤ lim sup

k

∫fk dλ ≤

∫f dλ.

The following example illustrates that care must be taken when applyingthe dominated convergence theorem.

11.3.7 Example. Let p > 0 and define

fk(x) := k

1 + k2x2p , 0 < x ≤ 1, and Ik :=∫

(0,1]fk dλ, k ∈ N.

Clearly, fk → 0 for all p > 0. We show that limk Ik = 0 iff 0 < p < 1. BySection 11.4, below, the integrals are Riemann, hence, making the substitutiont = kxp and setting q = p−1 − 1, we obtain

Ik =∫ 1

0

k

1 + k2x2p dx = 1pkq

∫ k

0

tq

1 + t2dt =

∫gk dλ,

wheregk(t) = 1

pkqtq

1 + t21[0,k].

If p = 1, then q = 0 and Ik = arctan k → π/2. If 0 1, then −1 < q < 0 and

Ik ≥1pkq

∫ 1

0

11 + t2

dt→ +∞. ♦

The following theorem gives general conditions under which one may“differentiate under the integral sign.”

11.3.8 Theorem. Let f(x, y) be Lebesgue measurable on I := (a, b) × (c, d)such that for each y in (c, d) the function f(·, y) is Lebesgue integrable on (a, b)and the derivative fy exists on I. If there exists an integrable function g on(a, b) such that |fy(x, y)| ≤ g(x) for all (x, y) ∈ I, then

d

dy

∫(a,b)

f(x, y) dλ(x) =∫

(a,b)

∂f

∂y(x, y) dλ(x).


Proof. We prove the right-hand derivative version. Let y ∈ (c, d) and yk ↓ y.Set

G(y) =∫

(a,b)f(x, y) dλ(x) and gk(x) = f(x, yk)− f(x, y)

yk − y.

By the mean value theorem, gk(x) = fy(x, tk) for some tk ∈ (y, yk), hence|gk| ≤ g. Since gk(x)→ fy(x, y), the dominated convergence theorem impliesthat

G(yk)−G(y)yk − y

=∫

(a,b)gk(x) dλ(x)→

∫(a,b)

fy(x, y) dλ(x).

Since {yk} was arbitrary, G′r(y) exists and equals∫

(a,b) fy(x, y) dλ(x).

Exercises1.S Prove the following:

(a) limk

∫[0,π]

sink x (1− sin x)k dλ = 0.

(b) limk

∫[0,+∞)

k sin(x3/2)

1 + k2x2 dλ(x) = 0.

2. Let f : Rn → (0,+∞) be integrable. Prove that

(a)∫k ln(1 + k−1f) dλ→

∫f dλ. (b) S

∫k ln(1 + k−2f) dλ→ 0.

(c) S

∫k sin

(k−1f

)dλ→

∫f dλ. (d)

∫E

f1/k dλ→ λ(E), E ∈M.

3.S Let f, g : Rn → (0,+∞) be Lebesgue measurable with g integrable.Prove: ∫

g(1 + k−1f)k exp (−f) dλ→∫g dλ.

4. Let f : Rn → [1,+∞) be Lebesgue measurable and g : Rn → [0,+∞)integrable. Prove that ∫

k2g exp (−kf) dλ→ 0.

5.S Let f be integrable on (0,∞). Show that for each t ∈ R the func-tion f(x) sin(tx)/x is integrable on (0,∞) and prove that the integral∫

(0,∞) f(x)x−1 sin(tx) dλ(x) is continuous in t.

6. Prove that the derivative of the gamma function (5.7.8) is

Γ′(x) =∫ ∞

0tx−1e−t ln t dt, x > 0.

(Use the fact, proved in the next section, that the improper Riemannintegral and the Lebesgue integral of a nonnegative continuous functionare equal.)

7. Let f : [0,+∞)→ R be bounded and Lebesgue measurable and supposethat limx→+∞ f(x) = r. Show that

limk

∫[0,a]

f(kx) dλ(x) = ar for every a > 0.

Hint. Use Exercise 11.2.14.

8.S Let f : Rn → R be Lebesgue measurable and have countable range{a1, a2, . . .}. Set Ak = {x ∈ Rn : f(x) = ak}. Prove that f is integrableiff the series

∑∞k=1 akλ(Ak) converges absolutely, in which case the value

of the series is∫f dλ.

9. Let p > 1 and f(x) := bx−1c−p, 0 < x < 1. Find∫

(0,1) f dλ.

10. Let fk, f be integrable and Ek, E ∈M(Rn) such that

limk‖fk − f‖1 = 0 and lim

kλ(Ek∆E) = 0

(see Exercise 10.5.10). Prove that

limk

∫Ek

fk dλ =∫E

f dλ.

11. Let f : Rn → R be integrable and ε > 0.(a) Prove that the set A = {x : |f(x)| ≥ ε} has finite measure.(b) Show that there exists B ∈M with λ(B) < +∞ such that∣∣∣∣ ∫ f dλ−

∫B

f dλ

∣∣∣∣ < ε.

12.S Let {fk} be a sequence of integrable functions on Rn such that∞∑k=1‖fk‖1 < +∞. Prove that limk fk(x) = 0 a.e.

13. Let f be Lebesgue integrable on R and p > 0. Prove that the series∑∞k=1 k

−pf(kx) converges absolutely a.e. on R.

14. Let T : C([a, b]

)→ C

([a, b]

)be linear and continuous in the L1 norm. If

f : [a, b]× [c, d]→ R is continuous, prove that

T

∫ d

c

f(·, x) dx =∫ d

c

Tf(·, x) dx

where the integrals may be taken to be Riemann.


15.S Prove the following extension of Fatou’s lemma: If fk, g are Lebesgueintegrable on Rn and fk ≥ g for all k, then∫

lim infk

fk dλ ≤ lim infk

∫fk dλ.

16. Let g be integrable on Rn and let {fk} be a sequence of Lebesguemeasurable functions on Rn such that |fk| ≤ g. Show that∫

lim infk

fk ≤ lim infk

∫fk ≤ lim sup

k

∫fk ≤

∫lim sup

kfk.

17. Let f, fk be nonnegative Lebesgue integrable functions on Rn such thatfk → f . Prove that∫

fk dλ→∫f dλ iff ‖fk − f‖1 → 0.

Hint. For the necessity, note that (fk − f)− ≤ f .

18. Let f, fk be integrable on Rn with fk → f . Prove that

‖fk − f‖1 → 0 iff∫|fk| dλ→

∫|f | dλ.

Hint. For the sufficiency use Fatou’s lemma.

19.S Let f be Lebesgue integrable on R such that∫

[a,b] f dλ = 0 for allintervals [a, b]. Prove that f = 0 a.e. Hint. Use 11.3.4, 10.4.4, andExercise 11.2.8.

20. Let f : R2 → R have the property that f(x, y) is Lebesgue measurablein y for each x and continuous in x for each y. Suppose there exists anintegrable function g : R→ R such that |f(x, y)| ≤ g(y) for all x and y.Prove that the function

F (x) :=∫f(x, y) dλ(y)

is continuous.

21. Let f ≥ 0 be integrable on [1,+∞). Prove that∑∞k=1 f(x+k) is integrable

on [0, 1]. Conclude that the series converges a.e. on [1,+∞).

22. Let f be Lebesgue measurable on I = [0, 1] and set

Ak = {x ∈ I : |f(x)| ≥ k} .

Prove:(a) f is integrable on I iff

∑∞k=0 λ(Ak) converges.

(b) If f is integrable on I = [0, 1] then limk kλ(Ak) = 0.


11.4 Connections with Riemann IntegrationThroughout the section, f denotes an arbitrary bounded

real-valued function on a closed and bounded interval [a, b].

In this section we show that f is Riemann integrable if and only if its setof discontinuities has Lebesgue measure zero. The first step is to show thatthe upper and lower integrals of f may be expressed as integrals of Borelmeasurable functions.

By 5.2.1 there exists a sequence of partitions {Pk} of [a, b] such that Pk+1is a refinement of Pk, ‖Pk‖ → 0, and∫ b

a

f = limkS(f,Pk),

∫ b

a

f = limkS(f,Pk).

Definehk =

∑j

mj1[xj−1,xj ] and gk =∑j

Mj1[xj−1,xj ],

wheremj := inf

xj−1≤x≤xjf(x), Mj := sup

xj−1≤x≤xjf(x),

and the intervals [xj−1, xj ] are those generated by the partition Pk. Then gkand hk are Borel measurable simple functions and

S(f,Pk) =∫

[a,b]hk dλ, S(f,Pk) =

∫[a,b]

gk dλ.

Moreover,h1 ≤ h2 ≤ · · · ≤ f ≤ . . . ≤ g2 ≤ g1,

hence h(x) := limk hk(x) and g(x) := limk gk(x) exist in R for each x ∈ [a, b],h ≤ f ≤ g, and h and g are Borel measurable. If M is a bound for |f |, then|hk|, |gk| ≤M a.e., hence by the dominated convergence theorem∫ b

a

f = limkS(f,Pk) = lim

k

∫[a,b]

hk dλ =∫

[a,b]h dλ (11.8)

and ∫ b

a

f = limkS(f,Pk) = lim

k

∫[a,b]

gk dλ =∫

[a,b]g dλ. (11.9)

11.4.1 Lemma. f ∈ Rba iff g = h a.e. In this case, f is Lebesgue measurableand

∫ baf =

∫[a,b] f dλ.

Proof. From (11.8) and (11.9), f ∈ Rba iff∫

[a,b](g−h) dλ = 0, which, by 11.2.7,is equivalent to g = h a.e. If this holds, then h = f = g a.e. so f is Lebesguemeasurable and

∫ baf =

∫[a,b] f dλ by (11.8) and (11.9).

11.4.2 Lemma. Suppose that x ∈ [a, b] is not a member of any of the partitionsPk. Then f is continuous at x iff h(x) = g(x).

Proof. Suppose f is continuous at x. Given ε > 0, choose δ > 0 such thaty ∈ [a, b] and |x−y| < δ implies |f(x)−f(y)| < ε. Choose N so that ‖Pk‖ < δfor all k ≥ N and fix k ≥ N . Since x is in some subinterval (xj−1, xj) of Pk,

f(x)− ε < f(y) < f(x) + ε for all y ∈ [xj−1, xj ],

hencef(x)− ε ≤ hk(x) = mj ≤Mj = gk(x) ≤ f(x) + ε.

Letting k → +∞ yields

f(x)− ε ≤ h(x) ≤ g(x) ≤ f(x) + ε,

and since ε was arbitrary, g(x) = h(x).Conversely, let g(x) = h(x). Given ε > 0, choose k such that

|gk(x)− g(x)| < ε and |hk(x)− h(x)| < ε.

Suppose that x is in the open subinterval (xi−1, xi) of Pk. Choose δ > 0 sothat (x− δ, x+ δ) ⊆ (xi−1, xi). Then for all y ∈ (x− δ, x+ δ),

h(x)− ε ≤ hk(x) ≤ f(y) ≤ gk(x) ≤ g(x) + ε = h(x) + ε,

which implies that |f(x)− f(y)| < 2ε. Therefore, f is continuous at x.

Here is the main result of the section.

11.4.3 Theorem. Let f : [a, b]→ R be bounded. Then f ∈ Rba iff the set Dof discontinuities of f has Lebesgue measure zero. In this case, f is Lebesguemeasurable and ∫ b

a

f(x) dx =∫

[a,b]f dλ.

Proof. Let A denote the union of the partitions Pk and set B ={x : g(x) 6= h(x)}. By 11.4.2,

B ∩Ac ⊆ D ⊆ A ∪B

Since A is countable, λ(A) = 0, hence λ(B∩Ac) = λ(A∪B) = λ(B). It followsthat λ(B) = λ(D). Thus, by 11.4.1, f ∈ Rba iff λ(D) = 0.

11.4.4 Example. Let A := (0, 1) \ E, where E is the Cantor ternary set(10.3.4). Since A is open, the function f(x) = 1A(x) sin(πx) is continuous onA. Since λ(E) = 0, f is both Riemann and Lebesgue integrable on [0, 1] and∫ 1

0f(x) dx =

∫ 1

0sin(πx) dx = 2

π. ♦

11.4.5 Remark. Theorem 11.4.3 readily extends to n-dimensional Riemannintegrals; the statement and proof are essentially the same. Note that in thiscase, a Riemann integrable function f may be discontinuous on m-dimensionalhyperplanes, m < n, as these have Lebesgue measure zero (see 11.6.9). ♦

Here is the connection between improper integrals and Lebesgue integrals.

11.4.6 Corollary. Let g be locally Riemann integrable on [a, b) (where b couldbe infinite). Then g is Lebesgue measurable on [a, b). Moreover:

(a) If g ≥ 0, then g is improperly integrable on [a, b) iff g is Lebesgue integrableon [a, b), in which case ∫ b

a

g =∫

[a,b)g dλ. (11.10)

(b) If g is Lebesgue integrable on [a, b), then g is improperly integrable on [a, b)and (11.10) holds.

(c) If g is improperly integrable on [a, b), then g need not be Lebesgue integrableon [a, b).

Proof. (a) Let bk ↑ b and let D denote the set of discontinuities of g on [a, b).Since g is Riemann integrable on [a, bk], λ

([a, bk] ∩D

)= 0. By the theorem,

1[a,bk]g is Lebesgue measurable for every k and∫ bk

a

g =∫

[a,bk]g dλ =

∫1[a,bk]g dλ.

Taking limits we see that g is Lebesgue measurable and, by the monotoneconvergence theorem, 11.10 holds.

(b) If g is Lebesgue integrable on [a, b), then, by (a), g+ and g− areimproperly integrable on [a, b) hence (b) holds.

(c) The functiong(x) = x−1 sin x

is improperly integrable but not absolutely improperly integrable on [1,+∞)(5.7.18). Since a Lebesgue integrable function is absolutely integrable, g cannotbe Lebesgue integrable on [1,+∞).

11.5 Iterated Integrals

For the remainder of the text we also use the notation dx

for dλ(x) and∫ baf(x) dx for

∫[a,b] f(x) dλ(x), etc.

In this section we state and prove a result that gives general conditionsunder which the Lebesgue integral of a function on Rn may be expressed asan iterated integral, a useful tool for evaluating integrals.

11.5.1 Fubini–Tonelli Theorem. Let f be Borel measurable on Rn and letp, q ∈ N with p+ q = n.

(a) If f ≥ 0, then the functions∫Rqf(x, z) dz and

∫Rpf(z,y) dz

are Borel measurable in x ∈ Rp and y ∈ Rq, respectively, and∫Rp

∫Rqf(x, z) dz dx =

∫Rq

∫Rpf(z,y) dz dy. (11.11)

(b) If either of the iterated integrals∫Rp

∫Rq|f(x, z)| dz dx or

∫Rq

∫Rp|f(z,y)| dz dy

is finite, then both are finite, f is integrable, and (11.11) holds.

By induction we have

11.5.2 Corollary. Let f be Borel measurable on Rn such that∫ ∞−∞· · ·∫ ∞−∞|f(x1, . . . , xn)| dxi1 · · · dxin < +∞

for some permutation (i1, . . . , in) of (1, . . . , n). Then f is integrable and∫Rnf dλ =

∫ ∞−∞· · ·∫ ∞−∞

f(x1, . . . , xn) dxj1 · · · dxjn .

for every permutation (j1, . . . , jn) of (1, . . . , n).

11.5.3 Example. We prove the Gaussian density formula∫ ∞−∞

ϕ(t) dt = 1, where ϕ(t) := e−t2/2

√2π

. (11.12)

By 11.4.6, the integral may be interpreted either as a Lebesgue integral or as animproper Riemann integral. The function ϕ is called the standard normal (orGaussian) density. It plays an important role in probability and statistics. Forexample, σ−1 ∫ b

aϕ[(x− µ)/σ

]dx is the probability that randomly chosen data

from a normally distributed population with mean µ and standard deviationσ lies between a and b, and σ−1 ∫∞

−∞ ϕ[(x − µ)/σ

]x dx is the average of the

data.To verify (11.12) note that because the integrand is an even function, the

left side is 2∫∞

0 ϕ(t) dt. By a change of variable,

2∫ ∞

0ϕ(t) dt = 2√

2π

∫ ∞0

e−t2/2 dt = 2√

π

∫ ∞0

e−t2dt.

Thus it suffices to show that2√π

∫ ∞0

e−t2dt = 1.

Let I denote the integral on the left. Then

I2 =∫ ∞

0e−y

2∫ ∞

0e−t

2dt dy

=∫ ∞

0e−y

2∫ ∞

0ye−x

2y2dx dy, by the substitution t = xy

=∫ ∞

0

∫ ∞0

ye−y2(1+x2) dy dx, by 11.5.1

= 12

∫ ∞0

(1 + x2)−1∫ ∞

0e−u du dx, by the substitution u = y2(1 + x2)

= 12 arctan x

∣∣∣∞0

because∫∞

0 e−u du = 1

= π

4 ♦

11.5.4 Example. Let f, g : Rn → R be Borel measurable and integrable. ByExercise 10.5.19, the function F (x,y) := f(x − y)g(y) is Borel measurablein (x,y). By the Fubini–Tonelli theorem and translation invariance of theintegral,∫

Rn×Rn|F (x,y)| dλ(x,y) =

∫Rn|g(y)|

∫Rn|f(x− y)|dx dy = ‖g‖1‖f‖1 < +∞.

Therefore, F is integrable, hence the function

(f ∗ g)(x) :=∫Rnf(x− y)g(y)dy,

called the convolution of f and g, is finite a.e. and integrable on Rn. Con-volutions are useful in calculating the probability distribution of a sum ofindependent random variables. ♦


11.5.5 Example. (Volume of a simplex). Let a > 0 and let ej , 1 ≤ j ≤ n, bethe standard basis in Rn. Define the n-dimensional simplex in Rn by

S(a, n) ={x :

n∑j=1

xj ≤ a and xj ≥ 0}.

a

x2

a

x1

a

x3

FIGURE 11.2: Three-dimensional simplex.

We use the Fubini–Tonelli theorem and induction to show that

λn(S(a, n)

)= an

n! .

The formula holds for n = 1 since S(a, 1) = [0, a]. Assume the formula holdsfor n− 1 and all a > 0. Then

λn(S(a, n)

)=∫

1S(a,n)(x1, . . . , xn) d(x1, . . . , xn)

=∫

[0,a]1S(a−xn,n−1)(x1, . . . , xn−1) d(x1, . . . , xn−1) dxn

= 1(n− 1)!

∫[0,a]

(a− xn)n−1 dxn.

The last integral evaluates to an/n, completing the proof. ♦

11.5.6 Example. Let Cnr (x) denote the closed ball in Rn with center x andradius r. We show that λ

(Cnr (x)

)= rnαn, where

αn =

(2π)n/2

n(n− 2) · · · 4 · 2 if n is even,

2(2π)(n−1)/2

n(n− 2) · · · 3 · 1 if n is odd.

For ease of notation we write Cnr for Cnr (0) and denote by 1r the indicatorfunction of Cnr . By the translation and dilation properties of Lebesgue measure,λ(Cnr (x)

)= rnλ

(Cn1), hence it suffices to establish the formula for r = 1 and

x = 0.


If n = 1, then Cn1 = (−1, 1) and αn = 2, so the formula holds in this case.By a simple integration, λ

(C2

1)

= π, hence the formula holds for n = 2 as well.Now assume that n > 2. From

Cn1 ={

(x1, . . . , xn) : x21 + · · ·+ x2

n ≤ 1}

={

(x1, . . . , xn) : x23 + · · ·+ x2

n ≤ 1− x21 − x2

2, (x1, x2) ∈ C21}

we have11(x1, . . . , xn) = 1√1−x2

1−x22

(x3, . . . , xn)11(x1, x2),

hence, by the Fubini–Tonelli theorem,

λ(Cn1)

=∫R2

11(x1, x2)∫Rn−2

1√1−x21−x2

2(x3, . . . , xn) dλ(x3, . . . , xn) dx1 dx2.

The inner integral is

λ

(Cn−2√

1−x21−x2

2

)= (1− x2

1 − x22)(n−2)/2λ

(Cn−2

1),

hence, changing to polar coordinates,2

λ(Cn1)

= λ(Cn−2

1) ∫

x21+x2

2≤1(1− x2

1 − x22)(n−2)/2 dx1 dx2

= λ(Cn−2

1) ∫ 2π

0

∫ 1

0(1− r2)(n−2)/2r dr dθ

= 2πnλ(Cn−2

1).

Iterating, we obtain

λ(Cn1)

= 2πnλ(Cn−2

1)

= (2π)2

n(n− 2)λ(Cn−4

1)

= · · ·

= (2π)m−1

n(n− 2) · · · (n− 2(m− 2))λ(Cn−2(m−1)1

).

Thusλ(C2m

1)

= (2π)m−1

2m(2m− 2) · · · 4λ(C2

1)

= (2π)m2m(2m− 2) · · · 2

and

λ(C2m−1

1)

= (2π)m−1

(2m− 1)(2m− 3) · · · 3λ(C1

1)

= 2(2π)m−1

(2m− 1)(2m− 3) · · · 3 . ♦

2The general change of variables theorem for Lebesgue integrals is proved in the nextsection.


Proof of the Fubini–Tonelli theorem.

We show first that part (b) of the theorem is a consequence of part (a).Indeed, if one of the iterated integrals in (b) is finite, then, by part (a) appliedto |f |, so is the other and f is integrable. Applying part (a) to f±, we see that(11.11).

Next, observe that if part (a) of the theorem holds for indicator functionsthen, by linearity of the integrals, it holds for nonnegative simple functions. By10.5.8 and the monotone convergence theorem, (a) holds for all nonnegativeBorel measurable functions.

It remains then to prove (a) for indicator functions. The proof consistsof several lemmas, the first of which is a special case of a theorem due toE.B. Dynkin.

11.5.7 Lemma. Let F denote the intersection of all collections G of subsetsof Rn with the following properties:

(a) If A, B ∈ G and A ⊆ B, then B \A ∈ G.

(b) If Ak ∈ G and Ak ↑ A, then A ∈ G.

(c) G contains every bounded interval.

Then F is a σ-field containing B(Rn).

Proof. It is easy to see that F itself has properties (a)–(c). Moreover, from (b)and (c), F contains every interval. In particular, Rn ∈ F .

We show first that F is closed under finite intersections. To see this, fixA ∈ F and define

FA := {B ∈ F : A ∩B ∈ F} .One easily checks that FA has properties (a) and (b). Furthermore, if A is aninterval, then FA has property (c) so by minimality F ⊆ FA. This shows thatif B ∈ F , then A ∩B ∈ F for all intervals A; in other words, FB contains allintervals. Thus FB has properties (a)–(c). By minimality, F ⊆ FB, that is,A, B ∈ F ⇒ A ∩B ∈ F . By induction, F is closed under finite intersections.

Now observe that property (a), together with the fact that Rn ∈ F , impliesthat F is closed under complements. Thus if {Ek} is a sequence in F , then,by the result of the preceding paragraph,

Ak :=k⋃j=1

Ek =( k⋂j=1

Eck

)c∈ F .

By (b),⋃∞k=1Ek =

⋃∞k=1Ak ∈ F . This shows that F is a σ-field. Since F

contains all intervals, it must contain B(Rn).

11.5.8 Lemma. Let p, q ∈ N with p+ q = n. If A ∈ B(Rp) and B ∈ B(Rq),then

A×B ∈ B(Rn) and λ(A×B) = λ(A)λ(B). (11.13)


Proof. For fixed bounded intervals I ⊆ Rp and J ⊆ Rq, define

GI,J ={B ∈ B(Rq) : I×(B∩J) ∈ B(Rn) & λ

(I×(B∩J)

)= λ(I)λ(B∩J)

}.

We show that GI,J has properties (a)–(c) of 11.5.7. Clearly, (c) holds. IfB ∈ GI,J , then

I × (Bc ∩ J) = (I × J) \(I × (B ∩ J)

)∈ B(Rn) and

λ(I × (Bc ∩ J)

)= λ

(I × J

)− λ(I × (B ∩ J)

)= λ(I)

[λ(J)− λ(B ∩ J)

]= λ(I)λ(J ∩Bc),

hence Bc ∈ G. Therefore, GI,J is closed under complements. Now let Bk ∈ GI,Jand Bk ↑ B. Then

I × (J ∩B) =∞⋃k=1

I × (J ∩Bk) ∈ B(Rn)

and, by 10.1.6,

λ(I × J ∩B)

)= lim

kλ(I × (J ∩Bk)

)= λ(I) lim

kλ(J ∩Bk) = λ(I)λ(J ∩B),

which shows that B ∈ GI,J . Therefore, GI,J has properties (a)–(c) of 11.5.7, soB(Rq) = GI,J . We have shown that for all bounded intervals I ⊆ Rp, J ⊆ Rqand all B ∈ B(Rq),

I × (B ∩ J) ∈ B(Rn) and λ(I × (B ∩ J)

)= λ(I)λ(B ∩ J).

Taking a sequence of bounded intervals Jk ↑ Rn, we see that

I ×B ∈ B(Rn) and λ(I ×B)

)= λ(I)λ(B). (11.14)

Now fix B ∈ B(Rq) and let I ⊆ Rp be a bounded interval. Define

HB,I = {A ∈ B(Rp) : (A∩I)×B ∈ B(Rn) & λ((A∩I)×B

)= λ(A∩I)λ(B)}.

By (11.14), HB,I contains all intervals. Arguing as above, we see that HB,I =B(Rp). Thus for all A ∈ B(Rp), B ∈ B(Rq), and all bounded intervals I ⊆ Rp,

(A ∩ I)×B ∈ B(Rn) and λ((A ∩ I)×B

)= λ(A ∩ I)λ(B).

Taking a sequence of bounded intervals Ik ↑ Rn in the last equation yields(11.13).

The following lemma asserts that part (a) of the Fubini–Tonelli theoremholds for indicator functions of Borel sets and hence completes the proof ofthe theorem.


11.5.9 Lemma. Let p, q ∈ N with p+ q = n and let C ∈ B(Rn). Then∫Rq

1C(x, z) dz and∫Rp

1C(z,y) dz

are Borel measurable functions of x ∈ Rp and y ∈ Rq, respectively, and

λ(C) =∫Rp

∫Rq

1C(x, z) dz dx =∫Rq

∫Rp

1C(z,y) dz dy.

Proof. Let G denote the collection of all C ∈ B(Rn) for which the assertionsof the lemma hold. We show that G = B(Rn). The first step is to show that Ghas properties (b) and (c) of 11.5.7.

For property (b), let Ck ∈ G and Ck ↑ C. Then 1Ck(x, z) ↑ 1C(x, z), hence,by the monotone convergence theorem,∫

Rq1Ck(x, z) dz ↑

∫Rq

1C(x, z) dz, x ∈ Rp.

Thus∫Rq 1C(x, z) dz is Borel measurable in x. Applying the monotone conver-

gence theorem again, we see that

λ(C) = limkλ(Ck) = lim

k

∫Rp

∫Rq

1Ck(x, z) dz dx =∫Rp

∫Rq

1C(x, z) dz dx,

and similarly for the other iterated integral. Therefore, G has property (b).For property (c), let A ∈ B(Rp), B ∈ B(Rq), and C = A×B. Then∫

Rq1C(x, z) dz =

∫Rq

1A(x)1B(z) dz = 1A(x)λ(B),

which is Borel measurable in x and, together with 11.5.8, implies that∫Rp

∫Rq

1C(x, z) dz dx = λ(A)λ(B) = λ(C).

Similar assertions hold for the other iterated integral. Thus, G contains Carte-sian products of Borel sets and, in particular, all intervals.

Now let I be a bounded interval in Rn and let GI = {B ∈ B : B ∩ I ∈ G}.Since G has properties (b) and (c) of 11.5.7, so does GI . We claim that GI alsohas property (a). To see this, let C, D ∈ GI with C ⊆ D and let E = D \ C.Since 1E∩I = 1D∩I − 1C∩I ,∫

Rq1E∩I(x, z) dz =

∫Rq

1D∩I(x, z) dz −∫Rq

1C∩I(x, z) dz,

which, because C ∩ I and D∩ I ∈ G, is Borel measurable in x and implies that∫Rp

∫Rq

1E∩I(x, z) dz dx

=∫Rp

∫Rq

1D∩I(x, z) dz dx−∫Rp

∫Rq

1C∩I(x, z) dz dx

= λ(D ∩ I)− λ(C ∩ I)= λ(E ∩ I).


Here we have used the fact that, because I is bounded, the calculations takeplace in R, hence subtraction is legitimate. The other iterated integral istreated similarly. Therefore E ∈ GI , as required.

Since GI has properties (a)–(c) of 11.5.7, GI contains all Borel sets. Thismeans that for any C ∈ B(Rn) and bounded interval I ⊆ Rn, the functions∫

Rq1C∩I(x, z) dz and

∫Rp

1C∩I(z,y) dz

are Borel measurable in x and y, respectively, and∫Rp

∫Rq

1C∩I(x, z) dz dx = λ(C ∩ I) =∫Rq

∫Rp

1C∩I(z,y) dz dy.

Taking an increasing sequence of bounded intervals I tending to Rn and usingthe monotone convergence theorem shows that C ∈ G. Therefore, G = B(Rn),as required.

Exercises1.S Prove that λn

({(x1, . . . , xn) : xj ∈ Q for some j}

)= 0.

2. Evaluate∫

[0,+∞)n f , where

(a)S f(x) = x1 · · ·xne−‖x‖2 . (b) f(x) = x1 · · ·xn(1 + ‖x‖2)−n−1.

3. (Cavalieri’s principle). For E ∈M(Rn) and t ∈ R, define

Et :={x = (x1, . . . , xn−1) ∈ Rn−1 : (x, t) ∈ E

}.

Suppose that Et ∈M(Rn) for all t ∈ [a, b]. Prove that

λn

[E ∩

(Rn−1 × [a, b]

)]=∫ b

a

λn−1(Et) dt.

Thus the “volume” of the portion ofE between the hyperplanes xn = aand xn = b is the integral from a to b of the “cross-sectional areas”λn−1(Et).

4. Let f and g be Riemann integrable on [0, 1]. Prove that∫ 1

0

∫ x

0g(x− y)f(y) dy dx =

∫ 1

0

∫ 1−y

0g(x)f(y) dx dy

=∫ 1

0

∫ 1−x

0g(x)f(y) dy dx.

5.S Evaluate∫

0≤x≤x1≤···≤xm≤1x dλ(x, x1, . . . , xm).

6. Show that∫ 1

0

∫ 1

0

x2 − y2

(x2 + y2)2 dy dx = −∫ 1

0

∫ 1

0

x2 − y2

(x2 + y2)2 dx dy = π

4 .

Why does this not contradict the Fubini–Tonelli theorem?

7.S Let f be integrable on (0, 1), p > 0, and define

g(x) =∫

[x1/p,1)t−pf(t) dt, 0 < x < 1.

Prove that g is integrable on (0, 1) and that∫(0,1)

g dλ =∫

(0,1)f dλ.

8. Let f ′ be continuous on [−1, 1]. Show that

(a)∫ 2π

0

∫ 1

0f ′(r cos θ)r cos2 θ dr dθ

=∫ 2π

0f(cos θ) cos θ dθ −

∫ 2π

0

∫ cos θ

0f(x) dx dθ.

(b)∫ 2π

0

∫ 1

0f ′(r cos θ)r sin2 θ dr dθ =

∫ 2π

0

∫ cos θ

0f(x) dx dθ.

(c)∫ 2π

0

∫ 1

0f ′(r cos θ)r dr dθ =

∫ 2π

0f(cos θ) cos θ dθ.

9. Let a, b > 0. Use the Fubini–Tonelli theorem, the dominated convergencetheorem, and the identity 1/x =

∫∞0 e−xt dt, x > 0, to prove that

(a)S

∫ ∞0

sin xx

dx = π

2 . (b)∫ ∞

0

e−ax − e−bx

xdx = ln b− ln a.

10. Show that ϕ ∗ ϕ(x) = 1√2ϕ

(x√2

).

11. Let f, g : Rn → R be Borel measurable and integrable. Prove:(a)S f ∗ g = g ∗ f .(b) If f and g are continuous, thend

dz

∫Az

f(x)g(y) dx dy = f ∗ g(z), where Az = {(x, y) : x+ y ≤ z} .

12. Let f : [0, 1]→ (0,+∞] be Lebesgue measurable. Use the Fubini–Tonellitheorem to prove that∫

[0,1]f dλ

∫[0,1]

1/f dλ ≥ 1.

(A simpler but less interesting proof uses the Cauchy–Schwarz inequality.)


13. Let f and g be positive Lebesgue measurable functions on [0, 1] suchthat fg ≥ 1. Use the preceding exercise to prove that∫

[0,1]f dλ

∫[0,1]

g dλ ≥ 1.

(The Cauchy–Schwarz inequality may be used here as well.)

14.S Let f and g be Lebesgue integrable on [a, b] and for x ∈ [a, b] let

F (x) = F (a) +∫

[a,x]f(t) dλ(t) and G(x) = G(a) +

∫[a,x]

g(t) dλ(t),

where F (a) and G(a) are arbitrary. Prove that∫[a,b]

F (x)g(x) dλ(x) +∫

[a,b]G(x)f(x) dλ(x) = F (b)G(b)− F (a)G(a).

15. (a) Verify that the function

κ(t, x) = 12√πte−x

2/4t = 1√2tϕ

(x√2t

)is a solution of the heat equation

wt(t, x) = wxx(t, x), x ∈ R, t > 0.

(b) Let w0(x) be integrable on R and define

w(t, x) =∫ ∞−∞

w0(y)κ(t, x− y) dy,

the convolution of κ with w0. Show that w(t, x) satisfies the heat equation.(c) Verify that

w(t, x) =∫ ∞−∞

w0(x+ z

√2t)ϕ(z) dz.

(d) Use (c) and the dominated convergence theorem to show that if w0 iscontinuous and satisfies |w0(x)| ≤ aeb|x| for some positive constants a, band for all x, then limt→0+ w(t, x) = w0(x). Conclude that the solutionw(t, x) may be continuously extended to [0,+∞)× R and consequentlysatisfies the boundary condition w(0, x) = w0(x).

16. For a Borel measurable function f : R→ [0,+∞), define

A := {(x, y) : 0 ≤ y ≤ f(x)} and Ay := {x : f(x) > y} , y ∈ R.

Prove:(a) A ∈ B(R2).(b) The function y 7→ λ(Ay) is Borel measurable and∫

f(x) dλ(x) =∫

(0,+∞)λ(Ay) dλ(y) = λ(A).

(c) Part (b) holds if A and Ay are replaced, respectively, by

B = {(x, y) : 0 ≤ y < f(x)} and By = {x : f(x) ≥ y} .

(d) λ({(x, y) : f(x) = y}

)= 0. (The graph of a Borel function has

measure zero.)

11.6 Change of VariablesIn Chapter 5 we proved that if ϕ : [a, b]→ R is continuously differentiable

with everywhere nonzero derivative and if f is Riemann integrable on [c, d] :=ϕ([a, b]), then ∫ d

c

f(y) dy =∫ b

a

f(ϕ(x))|ϕ′(x)| dx.

In this section we prove the following n-dimensional version of this result.

11.6.1 Change of Variables Theorem. Let U and V be open subsets of Rnand let ϕ : U → V be C1 on U with C1 inverse ϕ−1 : V → U . If f is Lebesguemeasurable on V and either f ≥ 0 or f is integrable, then∫

V

f(y) dy =∫U

(f ◦ ϕ)(x)|Jϕ(x)| dx, (11.15)

where Jϕ is the Jacobian of ϕ on U .

11.6.2 Example. Spherical coordinates (r, θ1, θ2, . . . , θn−1) in Rn are definedby the transformation formulas

x1 = r cos θ1

x2 = r sin θ1 cos θ2

x3 = r sin θ1 sin θ2 cos θ3

...xn−1 = r sin θ1 sin θ2 · · · sin θn−2 cos θn−1

xn = r sin θ1 sin θ2 · · · sin θn−2 sin θn−1,

where

r > 0, 0 < θj < π, j = 1, . . . , n− 2, and 0 < θn−1 < 2π.

Note that sin θj > 0 for j ≤ n− 2 and∑nj=1 x

2j = r2. Let

U := (0,+∞)× (0, π)n−2 × (0, 2π) and V := Rn \(Rn−2 × [0,+∞)× {0}

)and define ϕ on U by

ϕ(r, θ1, , . . . , θn−1

)= (x1, . . . , xn),

where the xj are as above. Clearly U and V are open and ϕ is C∞ on U . Weclaim that ϕ maps U onto V and has a C∞ inverse on U .

The inclusion ϕ(U) ⊆ V is established as follows: If (r, θ1, . . . , θn−1) ∈ Uand (x1, . . . , xn) = ϕ(r, θ1, . . . , θn−1) 6∈ V , then xn−1 ≥ 0 and xn = 0. But thelatter implies that θn−1 = π, which gives the contradiction xn−1 < 0.

For the reverse inclusion, we show that for each (x1, . . . , xn) ∈ V thereexists a unique solution (r, θ1, θ2, . . . , θn−1) to the above system. Clearly, rand θ1 have the unique solutions

r =( n∑j=1

x2j

)1/2and θ1 = arccos(x1/r).

In particular, the system has a unique solution if n = 2. Now set

yj = xj/(r sin θ1), 2 ≤ j ≤ n.

By induction, we may assume that the reduced system

y2 = cos θ2

y3 = sin θ2 cos θ3

...yn−1 = sin θ2 · · · sin θn−2 cos θn−1

yn = sin θ2 · · · sin θn−2 sin θn−1

has a unique solution (θ2, . . . , θn−1). Then the original system has the uniquesolution (r, θ1, . . . , θn−1). Therefore, ϕ is one-to-one and ϕ(U) = V .

By standard properties of determinants and a reduction argument,

Jϕ(r, θ1, θ2, . . . , θn−1) = rn−1 sinn−2 θ1 sinn−3 θ2 · · · sin2 θn−3 sin θn−2.

Since Jϕ > 0 on U , the inverse function theorem implies that ϕ has a globalC∞ inverse on U . Hence, by the change of variables theorem, if f is Lebesguemeasurable on Rn and either f ≥ 0 or f is integrable, then∫

V

f dλ =∫U

(f ◦ ϕ)Jϕ dλ.


Since V differs from Rn by a set of measure zero, we may write the last equationas∫ ∞−∞· · ·∫ ∞−∞

f(x1, . . . , xn) dx1 · · · dxn (11.16)

=∫ ∞

0

∫ π

0· · ·∫ π

0

∫ 2π

0f(r cos θ1, r sin θ1 cos θ2, . . . , r sin θ1 · · · sin θn−1

)(rn−1 sinn−2 θ1 sinn−3 θ2 · · · sin2 θn−3 sin θn−2

)dθn−1 dθn−2 · · · dθ1 dr.

In particular, taking f to be the indicator function of Cn1 (0) and using 11.5.6,we see that the left side of (11.16) is λ

(Cn1 (0)

)= αn and the right side is∫ 1

0

∫ π

0· · ·∫ π

0

∫ 2π

0

(rn−1 sinn−2 θ1 · · · sin2 θn−3 sin θn−2

)dθn−1 dθn−2 · · · dθ1 dr

= 2πn

∫ π

0· · ·∫ π

0

(sinn−2 θ1 · · · sin2 θn−3 sin θn−2

)dθn−2 · · · dθ1.

In particular,∫ π

0· · ·∫ π

0

(sinn−2 θ1 sinn−3 θ2 · · · sin2 θn−3 sin θn−2

)dθn−2 · · · dθ1 = nαn

2π . ♦

Proof of the change of variables theorem.

Before we begin the proof proper, we make some reductions. First, byconsidering f+ and f−, we need only prove the case f ≥ 0. Second, since aLebesgue measurable function is equal a.e. to a Borel measurable function, wemay assume that f is Borel measurable. Note that in this case f ◦ ϕ is alsoBorel measurable. To prove (11.15) it then suffices to verify that∫

V

f dλ ≤∫U

(f ◦ ϕ)|Jϕ| dλ (11.17)

for all Borel measurable functions f : V :→ [0,+∞]. Indeed, if (11.17) holdsfor all f and ϕ, then, switching the roles of U and V it must also be the casethat ∫

U

g dλ ≤∫V

(g ◦ ϕ−1)|Jϕ−1 | dλ

for all Borel measurable g : U :→ [0,+∞]. Taking g = (f ◦ϕ)|Jϕ| and recallingthat JϕJϕ−1 = 1, we obtain the reverse of inequality (11.17). Finally, byconsidering simple functions and using linearity, 10.5.8, and the monotoneconvergence theorem, it suffices to prove (11.17) for indicator functions f = 1B ,where B ∈ B(Rn) and B ⊆ V . Equation (11.17) then reduces to

λ(B) ≤∫ϕ−1(B)

|Jϕ| dλ, B ⊆ V, B ∈ B(Rn),


or, equivalently, (taking B = ϕ(E)),

λ(ϕ(E)

)≤∫E

|Jϕ| dλ, E ⊆ U, E ∈ B(Rn). (11.18)

The proof of (11.18) is a sequence of lemmas, the first of which treats thecase of a linear change of variable.

11.6.3 Lemma. If T ∈ L(Rn,Rn) is nonsingular, then

λ(T (E)) = |detT |λ(E), E ∈ B(Rn). (11.19)

Proof. Since T is nonsingular, T (E) ∈ B(Rn) so the left side of (11.19) isdefined. Furthermore, if (11.19) holds for T1 and T2, then it holds for T1T2:

λ(T1T2(E)

)= |detT1|λ

(T2(E)

)= |detT1| |detT2|λ(E) = |det(T1T2)|λ(E).

Now observe that a nonsingular linear transformation T may be expressed asa product of elementary linear transformations, that is, linear transformationswhose matrices are obtained from the identity matrix by one of the followingoperations:

(a) Interchange of two rows.

(b) Multiplication of a row by a nonzero constant.

(c) Addition of one row to another.

This is simply the assertion that a matrix may be put into reduced row echelonform by a sequence of elementary row operations. (See Appendix B.) Weclaim that (11.19) holds for elementary linear transformations T and boundedintervals E = I1 × . . .× In.

In case (a), detT = −1 and T (E) is the interval obtained from E byinterchanging a pair of intervals Ii and Ij , hence (11.19) holds in this case.

In (b), T (E) is the interval obtained from E by multiplying one of thecoordinate intervals by a nonzero constant a, hence λ(T (E)) = |a|λ(E). Since|detT | = |a|, (11.19) holds in this case as well.

For case (c), assume for definiteness that the matrix of T is the result ofadding row two of the identity matrix to row one, so

T (x1, x2, x3, . . . , xn) = (x1 + x2, x2, x3, . . . , xn).

Then detT = 1 and

λ(T (E)

)=∫

1T (E)(x) dx =∫

1E(x1 − x2, x2, . . . , xn) dx.

By the Fubini–Tonelli theorem and translation invariance, the last integral

evaluates to∫∫· · ·∫

1I1(x1 − x2)1I2(x2) · · ·1In(xn) dxn · · · dx2 dx1

= |In| · · · |I3|∫

1I2(x2)∫

1I1(x1 − x2) dx1 dx2

= |In| · · · |I3| |I2| |I1|= λ(E).

Therefore, (c) holds.It now follows that (11.19) holds for all nonsingular T and all intervals

E. To verify (11.19) for all Borel sets E, we use 11.5.7. For a fixed boundedinterval I, let GI denote the collection of all E ∈ B(Rn) for which

λ(T (E ∩ I)) = |detT |λ(E ∩ I). (11.20)

By the first part of the proof, GI contains all intervals. Let A, B ∈ GI withA ⊆ B, and set C = A ∩ I and D = B ∩ I. Then (B \A) ∩ I = D \ C and

λ(T (D\C)

)= λ

(T (D)

)−λ(T (C)

)= |detT |

(λ(D)−λ(C)

)= |detT |λ(D\C),

hence B \A ∈ GI . (The operation of substraction is legitimate because C andD are bounded.) Now let Ak ∈ GI , Ak ↑ A. Letting k → +∞ in

λ(T (Ak ∩ I)) = |detT |λ(Ak ∩ I)

shows that A ∈ GI . Therefore, GI satisfies (a)–(c) of 11.5.7, hence (11.20)holds for every E ∈ B(Rn). Taking a sequence of bounded intervals Ik ↑ Rn in(11.20) yields (11.19).

Br√n/2(y)

Qr(y)

y

r√n/2

r/2

FIGURE 11.3: Concentric cube and ball.

For the remaining lemmas, the following terminology and notation will beuseful. The cube with center y ∈ Rn and edge r > 0 is the semi-closed interval

Q = Qr(y) := {x ∈ Rn : yj − r/2 ≤ xj < yj + r/2, j = 1, . . . , n} .

Note that |Q| = rn and the diameter of Q is r√n. Thus

Br/2(y) ⊆ Qr(y) ⊆ Br√n/2(y).

A paving of a subset A of Rn is a finite collection Qr of pairwise disjoint cubeswith edge r that covers A. Two pavings Qr = {Qr(xj) : 1 ≤ j ≤ m} andQs = {Qs(xj) : 1 ≤ j ≤ m} with the same centers are said to be concentric.Any bounded set A has a paving Qr with arbitrarily small r. Indeed, ifA ⊆ [a, b)n, one need only subdivide [a, b) into subintervals of size (b− a)/kfor sufficiently large k and form Cartesian products of these subintervals.

11.6.4 Lemma. Let K ⊆ U be compact.

(a) For each sufficiently small δ > 0, there exists a compact set Kδ withK ⊆ Kδ ⊆ U .

(b) For each r < δ, there exists a paving Qr of K contained in Kδ.

Proof. For subsets A, B ⊆ Rn, denote by d(A,B) the distance between Aand B:

d(A,B) = inf {‖a− b‖ : a ∈ A, b ∈ B} .Since K is compact and U c is closed, δ0 := d(U c,K) > 0. For 0 < δ < δ0/

√n,

letKδ =

{x : d(x,K) ≤ δ

√n}.

Then Kδ is compact and K ⊆ Kδ ⊆ U . Let Q be a cube with edge r. If

U

Kδ

K

Qi

FIGURE 11.4: The paving Qr.

x ∈ Q ∩K and y ∈ Q ∩Kcδ , then

δ√n < d(y,K) ≤ ‖x− y‖ ≤ r

√n.

Therefore, if r < δ and Q ∩K 6= ∅, then Q ∩Kcδ = ∅, that is, Q ⊆ Kδ. Since

K is bounded, there exists a paving Qr of K. Removing those members of Qrthat do not meet K produces a paving of K contained in Kδ.

11.6.5 Corollary. Let ψ : U → Rn be C1 on U and let E ⊆ U with λ(E) = 0.Then λ

(ψ(E)

)= 0.

Proof. Suppose first that E is bounded. Let V ⊇ E be open with compactclosure contained in U and set

c := supz∈cl(V )

‖ψ′(z)‖.

By continuity of ψ′ and compactness of cl(V ), c < +∞. Given ε > 0, letW ⊇ E be open with compact closure K = cl(W ) ⊆ V such that λ(K) < ε/2.This is possible by 10.4.4, since λ(E) = 0.

Now let Kδ be as in 11.6.4. Since Kδ ↓ K as δ ↓ 0, we may take δ sufficientlysmall so that λ(Kδ) < ε. According to the lemma, we may choose a pavingQr = {Q1, . . . , Qk} of K contained in Kδ with r < ε. It follows that

krn =k∑j=1

λ(Qj) = λ

( k⋃j=1

Qj

)< ε. (11.21)

Let xj denote the center of Qj . Since Qj is convex, 9.3.6 implies that

‖ψ(x)− ψ(xj)‖ ≤ c‖x− xj‖ ≤ cr√n, x ∈ Qj .

Therefore,ψ(Qj) ⊆ Bcr√n

(ψ(xj)) ⊆ Q2cr

√n

(ψ(xj)

)and so

λ(ψ(Qj))≤ (2cr

√n)n.

Since the sets ψ(Qj) cover ψ(K),

λ (ψ(E)) ≤ λ (ψ(K)) ≤ k(2cr√n)n ≤ (2c

√n)nε,

the last inequality by (11.21). Since ε was arbitrary, λ (ψ(E)) = 0. This provesthe assertion of the lemma for bounded E. In the unbounded case, take asequence of bounded Borel sets Ek ↑ E.

11.6.6 Lemma. Let ψ be C1 on U , Q a cube contained in U , and let Indenote the identity transformation on Rn. If ‖dψ′x − In‖ ≤ c for all x ∈ Q,then λ

(ψ(Q)

)≤ [(1 + c)n]nλ(Q).

Proof. Let ψ(x) = ψ(x)− x. Then dψx = dψx − In. By 9.3.6,

‖ψ(x)− ψ(y)‖ ≤ c‖x− y‖, for all x, y ∈ Q.

Thus, if Q has center x0 and edge r, then for all x ∈ Q

‖ψ(x)−ψ(x0)‖ ≤ ‖ψ(x)−ψ(x0)‖+‖x−x0‖ ≤ (c+1)‖x−x0‖ ≤ (c+1)√nr/2,

that is, ψ(Q) is contained in the closed ball C with center ψ(x0) and radius√n(c+ 1)r/2. Since C is contained in the cube with center ψ(x0) and edge

(c+ 1)nr,λ(ψ(Q)

)≤ [(c+ 1)nr]n = [(c+ 1)n]nλ(Q).

11.6.7 Lemma. Let ψ : U → Rn be C1 on U and let K ⊆ U be compact.Then, for each ε > 0, there exists δ > 0, a compact set Kδ with K ⊆ Kδ ⊆ U ,and concentric pavings Qr, Qnr of K contained in Kδ with arbitrarily small rsuch that for any Qr(y) ∈ Qr,

λ(ϕ(Qr(y)

))≤ (1 + ε)n|Jϕ(y)|λ

(Qnr(y)

)(11.22)

Moreover, δ may be chosen so that∫Kδ

|Jϕ(x)| dx <∫K

|Jϕ(x)| dx+ ε. (11.23)

Proof. Let M = sup{∥∥(dϕy)−1

∥∥ : y ∈ Kδ

}, where Kδ is chosen as in 11.6.4.

For x, y ∈ U define

ψy(x) =(dϕy

)−1(ϕ(x)− ϕ(y)

)=(dϕy

)−1(ϕ(x)

)−(dϕy

)−1(ϕ(y)

).

Since(dϕy

)−1 is linear, by the chain rule

d(ψy)x = (dϕy)−1 ◦ dϕx.

Thus for all x ∈ U , y ∈ Kδ, and z ∈ Rn,

‖d(ψy)x(z)− z‖ =∥∥(dϕy

)−1(dϕx(z)− dϕy(z)

)∥∥ ≤M‖dϕx − dϕy‖ ‖z‖.

Therefore, by definition of the operator norm,

‖d(ψy)x − In‖ ≤M‖dϕx − dϕy‖. (11.24)

Now, by the uniform continuity of dϕ on Kδ there exists 0 < δ1 < δsuch that ‖dϕx − dϕy‖ ≤ ε/M for all x,y ∈ Kδ with ‖x − y‖ < δ1

√n. Let

r < δ1/n and let Qr Qnr be concentric pavings of K contained in Kδ. Ifx ∈ Q := Qr(y) ∈ Qr, then ‖x − y‖ < r

√n < δ1

√n, hence, from (11.24),

‖d(ψy)x − In‖ < ε. By 11.6.6,

λ(ψy(Q)

)≤ [(1 + ε)n]nλ(Q) = (1 + ε)nλ

(Qnr(y)

). (11.25)

On the other hand, since ψy(Q) =(dϕy)

)−1(ϕ(Q)

)−(dϕy

)−1(ϕ(y)

), by

translation invariance and 11.6.3,

λ(ψy(Q)

)= λ

[(dϕy

)−1(ϕ(Q))]

= |Jϕ(y)|−1λ(ϕ(Q)

). (11.26)

Inequality (11.22) now follows from (11.25) and (11.26).For (11.23), note that since K1/k ↓ K and µ(A) :=

∫A|Jϕ| dλ is a measure

on the Borel sets (11.3.4), µ(K1/k

)↓ µ(K). Thus there exists k such that

µ(K1/k

)< µ(K) + ε. Taking δ < 1/k completes the proof.

11.6.8 Lemma. If K ⊆ U is compact, then

λ(ϕ(K)

)≤∫K

|Jϕ(y)| dy.

Proof. Let ε > 0 and choose δ > 0 as in 11.6.7. By uniform continuity of Jϕ(x)on Kδ, there exists δ1 < δ such that

|Jϕ(x)− Jϕ(y)| < ε for all x, y ∈ Kδ with ‖x− y‖ < δ1.

Choose pavings Qr = {Qr(y)}y and Qnr = {Qnr(y)}y as in 11.6.7. Then forx ∈ Qnr(y)

|Jϕ(y)| ≤ |Jϕ(x)− Jϕ(y)|+ |Jϕ(x)| < ε+ |Jϕ(x)|,

hence, by (11.22),

(1 + ε)−nλ(ϕ(Qr(y))

)≤ |Jϕ(y)|λ(Qnr(y)) ≤

∫Qnr(y)

(|Jϕ(x)|+ ε

)dx,

so

(1 + ε)−nλ(ϕ(K)

)≤∑

y

(1 + ε)−nλ(ϕ(Qr(y))

)≤∫Kδ

(|Jϕ(x)|+ ε

)dx

≤∫K

|Jϕ(x)| dx+ ε(1 + λ(Kδ)

). by (11.23)

Letting ε→ 0 verifies the lemma.

Now use 10.4.5 to obtain an increasing sequence of compact sets Kk ⊆ Esuch that λ(Kk) ↑ λ(E). Then λ

(ϕ(Kk)

)↑ λ(ϕ(E)

)and, by 11.6.8,

λ(ϕ(Kk)

)≤∫Kk

|Jϕ(y)| dy ≤∫E

|Jϕ(y)| dy.

Letting k → +∞ yields (11.18), completing the proof of the change of variablestheorem.

11.6.9 Remark. If V is a linear subspace of Rn of dimension m < n, thenλn(V) = 0. To see this, let v1, . . ., vm, . . ., vn, be an orthonormal basis forRn, where the first m vectors form a basis for V .3 Define TV ∈ L(Rn,Rn) suchthat TV (vj) = ej , 1 ≤ j ≤ n. Then TV is an orthogonal transformation andTV (V) = Rm × {0}. By 11.6.3

λn(V) = |det(TV )|λn(Rm × {0}) = 0,

3This is always possible by the Gram–Schmidt process.


as claimed. This also shows that (11.19) holds for singular transformations Tas well, since then both sides of that equation are zero.

While the n-dimensional volume of a subset E of V is zero, E may stillhave positive m-dimensional measure. This is defined as

λV(E) := λm(TV (E)

)for E ∈ T−1

V

(B(Rm)

).

From a geometric point of view, this is a reasonable definition, since anorthogonal transformation is either a rotation or a rotation combined witha reflection and therefore does not change volumes or areas. To see that thedefinition does not depend on the particular choice of the orthonormal basis,let w1, . . . ,wn be another orthonormal basis for Rn whose first m membersform a basis for V and let TV ∈ L(Rn,Rn) satisfy TV (wj) = ej , 1 ≤ j ≤ n. SetT = TVT

−1V

. Then, by (11.19),

λm(TV (E)

)= λm

(TTV (E)

)= |detT |λm

(TV (E)

)= λm

(TV (E)

),

the last equality because T is orthogonal and hence has determinant ±1. ♦

Exercises1. Define the n-dimensional ellipsoid

E ={

(x1, . . . , xn) :(x1

a1

)2+ · · ·+

(xnan

)2≤ 1},

where aj > 0. Prove that λn(E) = a1 · · · an λn(C1(0)

).

2. Show that the volume of the solid with surface√|x|+

√|y|+

√|z| = 1

is given by

64∫ 1

0

∫ 1−u

0

∫ 1−u−v

0uvw dw dv du.

3.S ⇓4 Let h be Lebesgue integrable on [0,+∞). Use 11.6.2 to prove that∫Rnh(‖x‖) dx = nαn

∫ ∞0

h(r)rn−1 dr.

4. Use Exercise 3 to show that for n ≥ 2

(a)∫Rn

exp(−‖x‖) dx = n!αn. (b)∫Rn

exp(−‖x‖2) dx = πn/2.

5.S A hole of radius R ∈ (0, 1) is drilled in the (n + 1)-dimensional ballCn+1

1 (0) from the north pole (0, 0, . . . , 1) to the south pole (0, 0, . . . ,−1).Use Exercise 3 to show that the amount removed from the ball is

nαn[R√

1−R2 − arcsin√

1−R2 + π/2].

4This exercise will be used in 13.2.5 and 13.4.2.

6.S (Theorem of Pappus) Let E ∈ M(Rn) be bounded with positiven-dimensional Lebesgue measure such that xn > 0 for all x =(x1, . . . , xn) ∈ E. Define

Er = {(x1, . . . , xn−1, xn cos θ, xn sin θ) : x ∈ E, 0 < θ < 2π} .

Prove thatλn+1(Er) = 2πxnλn(E),

wherexn := 1

λn(E)

∫E

xn dλn(x1, . . . , xn),

the nth coordinate of the centroid x of E. Thus if n = 2, then Er isthe rotation of E about the x1-axis, and the theorem of Pappus assertsthat the volume of Er is equal to the area of E times the distance thecentroid of E travels around the x1 axis.

x

θ

x1

x2

x3

E

FIGURE 11.5: Theorem of Pappus.

Chapter 12Curves and Surfaces in Rn

12.1 Parameterized CurvesA parameterized curve Rn is a continuous function ϕ : I → Rn, where I

is an interval in R. We shall usually refer to ϕ as simply a curve. The rangeϕ(I) of ϕ is called the trace of ϕ and is denoted by trace(ϕ). The curve issaid to lie in a set E ⊆ Rn if trace(ϕ) ⊆ E. The curve is called simple if ϕ isone-to-one. If I = [a, b], the point ϕ(a) is the initial point of the curve andϕ(b) the terminal point. The curve ϕ is then said to be closed if ϕ(a) = ϕ(b),and simple closed if it is closed and ϕ is one-to-one on (a, b), that is, the curveintersects itself only at the initial and terminal points. For example, the curve(cos(2kπt), sin(2kπt)), t ∈ [0, 1], k ∈ N, is a simple closed curve iff k = 1; itstrace is the circle x2 + y2 = 1.

ϕ(a) ϕ(b) ϕ(a) = ϕ(b)

ϕ(b)

ϕ(a)

Simple curve Non-simple curve Simple closed curve

FIGURE 12.1: Curves in R2.

A curve ϕ : I → Rn is said to be of class Cr if ϕ is Cr on an open intervalcontaining I. A C1 curve ϕ is smooth if ϕ′(t) 6= 0 for all t ∈ I. For example, on[−1, 1] the curve ϕ(t) = (t, t2) is smooth but the curve ψ(t) = (t3, t6), whichhas the same trace as ϕ, is not.

A curve ϕ : [a, b]→ Rn is said to be piecewise smooth if, for some partitiona = a0 < a1 < · · · < am = b, ϕ is smooth on each interval [aj−1, aj ]. This im-plies that ϕ′ is uniformly continuous on each interval of smoothness (aj−1, aj)and has right-hand and left-hand limits at the left and right endpoints, respec-tively. Thus a piecewise smooth curve may be viewed as a concatenation (sum)

409


of smooth curves, as shown in Figure 12.2. Note that at junctions that arecorners there are two tangent vectors, and at junctions that are cusps there isone. A point on a smooth portion of the curve will be called a smooth point. Apiecewise smooth curve therefore consists of smooth points and finitely manycorner or cusp points.

cusp

corner

corner

smoothpoint

FIGURE 12.2: A piecewise smooth curve with tangent vectors.

A reparametrization of a curve ϕ : I → Rn is a curve ψ = ϕ ◦ α : J → Rn,where α : J → I is continuous, strictly increasing, and α(J) = I (hencetrace(ψ) = trace(ϕ)). If ϕ is smooth, then α is required to be smooth withpositive Jacobian. If ψ is a reparametrization of ϕ, then ϕ and ψ are said tobe equivalent. For example, the smooth curve (t, t2, t3) (t > 0) is equivalent tothe curve (et, e2t, e3t) (t ∈ R).

A curve ϕ : I → Rn has a positive direction, namely, the direction that ϕ(t)moves as t increases. An equivalent curve ψ = ϕ ◦ α has the same directionsince α is strictly increasing. The curve −ϕ, defined by

(−ϕ)(t) := ϕ(−t), −t ∈ I,

has the opposite (negative) direction. If ϕ is piecewise smooth, then the positivedirection is given by the tangent vectors ϕ′(t), defined at smooth points. Atcorners and cusps the tangent vectors are right- and left-hand limits. The setof tangent vectors to a curve is called the tangent vector field (defined moreprecisely later).12.1.1 Proposition. . Let ϕj : [aj , bj ] → Rn, j = 1, . . . , k, be piecewise C1

curves such that ϕj(bj) = ϕj+1(aj+1), j = 1, . . . , k − 1. Then there exists apiecewise C1 curve ϕ : [0, 1]→ Rn, denoted by

ϕ = ϕ1 + ϕ2 + · · ·+ ϕk

and called the sum of the curves ϕj, such that ϕ∣∣[(j−1)/k,j/k] is equivalent to

ϕj.Proof. Define αj : [(j − 1)/k, j/k]→ [aj , bj ] by

αj(t) = bj + (bj − aj)(kt− j), (j − 1)/k ≤ t ≤ j/k,

and ϕ : [0, 1]→ Rn by ϕ = ϕj ◦ αj on [(j − 1)/k, j/k].

Curves and Surfaces in Rn 411

Exercises1.S Prove that the notion of equivalent smooth curves is an equivalence

relation.

2. Show that if ϕ is smooth and ψ = ϕ ◦ α is an equivalent curve, then

ϕ′(t)‖ϕ′(t)‖ = ψ′(t)

‖ψ′(t)‖ .

Thus the unit tangent vector field is invariant under a reparametrization.

3.S Sketch the trace of the curve ϕ(t) = (t2, t3 − t) on the interval [−2, 2].Find all points on the trace where there are two tangent vectors andexpress these vectors in terms of the standard basis.

4. Find the tangent vector field of the given curve ϕ on the interval [0, 2π].Sketch the trace and find all points on the trace at which there are twotangent vectors. Express these vectors in terms of the standard basis.(a)S ϕ(t) =

(sin t, cos(2t)

). (b) ϕ(t) =

(cos t, sin(2t)

).

(c) ϕ(t) =(

cos t, cos(2t)). (d) ϕ(t) =

(sin t, sin(2t)

).

5. In (a)–(d) below, find a smooth simple curve or a smooth simple closedcurve ϕ : I → C with trace C.

(a) C is the intersection of the elliptic cylinder x2

a2 + y2

b2= 1 and the

plane x+ y + z = 1.

(b)S C is the intersection of the elliptic cylinder x2

a2 + y2

b2= 1 and the

surface z = 2xy.(c) C is the intersection in the first octant of the paraboloid z = x2 + y2

and the plane x+ y + z = 1.(d) C is the intersection in the first octant of the cone z = x2 + y2 andthe plane x+ y + z = 1.

6.S Let ϕ : [a, b] → Rn be a C1 curve with the property that for somex ∈ Rn, ϕ(t) = x for infinitely many t ∈ [a, b]. Prove that ϕ is notsmooth.

7. Let f be C1 on an open set U and let ϕ be a C1 curve in U . Supposethat ϕ′(t) = ∇f

(ϕ(t)

)for all t > a and that the limit x := limt→+∞ ϕ(t)

exists in U . Prove that ∇f(x) = 0.Hint. Assume ∇f(x) 6= 0. Let g = f ◦ ϕ and show that g′(t) >‖∇f(x)‖2/2 for all sufficiently large t.

12.2 Integration on CurvesRectifiable Curves

Let ϕ : I → Rn be a parameterized curve. Assume first that I = [a, b]. Fora partition P = {t0 = a < t1 < · · · < tk−1 < tk = b} of [a, b] define

LP(ϕ) =k∑j=1‖ϕ(tj)− ϕ(tj−1)‖,

which is the length of the inscribed polygonal line with segments joining thepoints ϕ(tj−1) and ϕ(tj).

ϕ(a)

ϕ(t1)

ϕ(t2)

ϕ(b)

ϕ(t3)

FIGURE 12.3: Inscribed polygonal line.

The (arc) length of ϕ is defined as

length(ϕ) := supPLP(ϕ),

where the supremum is taken over all partitions P of [a, b]. If length(ϕ) < +∞,then ϕ is said to be rectifiable. Note that if ψ = ϕ ◦ α is equivalent to ϕ,then length(ψ) = length(ϕ), since α : [c, d] → [a, b] induces a one-to-onecorrespondence between partitions of [c, d] and [a, b].

If I = [a, b) (where b could be infinite), define

length(ϕ) := supa<t<b

length(ϕ∣∣[a,t]

).

A similar definition is given for intervals I = (a, b]. For an open intervalI = (a, b), define

length(ϕ) := length(ϕ∣∣(a,c]

)+ length

(ϕ∣∣[c,b)

),

where a < c < b. By 12.2.3 below, the expression on the right does not dependon the intermediate point c.

12.2.1 Example. Let α > 0. The curve

ϕ(t) =(x(t), y(t)

)=(t, tα sin(1/t)

), 0 < t ≤ b, ϕ(0) = 0,

is rectifiable iff α > 1. This follows from the inequalities

k∑j=1|y(tj)−y(tj−1)| ≤

k∑j=1‖ϕ(tj)−ϕ(tj−1)‖ ≤ 2(b−a) + 2

k∑j=1|y(tj)−y(tj−1)|

and 5.9.3. ♦

We prove in 12.2.4 below that piecewise C1 curves on [a, b] are rectifiable.For this, we require two lemmas. The proof of the first is similar to that of thecorresponding result for lower Darboux sums and is left as an exercise.

12.2.2 Lemma. Let ϕ : [a, b]→ Rn be a curve and let P and Q be partitionsof [a, b]. If P is a refinement of Q, then LQ(ϕ) ≤ LP(ϕ).

12.2.3 Lemma. Let ϕ : [a, b]→ Rn be a curve and c ∈ (a, b). Then

length(ϕ) = length(ϕ|[a,c]

)+ length

(ϕ|[c,b]

).

In particular, ϕ is rectifiable iff ϕ|[a,c] and ϕ|[c,b] are rectifiable.

Proof. Let P ′ and P ′′ be partitions of [a, c] and [c, b], respectively, and setP = P ′ ∪ P ′′. Then P is a partition of [a, b] and

length(ϕ) ≥ LP(ϕ) = LP′(ϕ|[a,c]

)+ LP′′

(ϕ|[c,b]

).

Taking suprema over P ′ and then P ′′ yields

length(ϕ) ≥ length(ϕ|[a,c]

)+ length

(ϕ|[c,b]

).

For the reverse inequality, let P = {t0 = a < t1 < · · · < tk = b} be a partitionof [a, b] and suppose c ∈ (ti−1, ti]. If P ′ = {t0 = a < t1 < · · · < ti−1 < c} andP ′′ = {c ≤ ti < · · · < tk = b}, then an application of the triangle inequalityshows that

LP(ϕ) ≤ LP′(ϕ|[a,c]

)+ LP′′

(ϕ|[c,b]

)≤ length

(ϕ|[a,c]

)+ length

(ϕ|[c,b]

).

Since P was arbitrary,

length(ϕ) ≤ length(ϕ|[a,c]

)+ length

(ϕ|[c,b]

).

12.2.4 Theorem. Let ϕ : [a, b]→ Rn be piecewise C1. Then ϕ is rectifiableand

length(ϕ) =m∑j=1

∫ aj

aj−1

‖ϕ′(t)‖ dt,

where ϕ is smooth on the intervals [aj−1, aj ], a = a0 < a1 < · · · < am = b.

Proof. By 12.2.3 we may assume that ϕ = (ϕ1, . . . , ϕn) is C1 on [a, b]. Givenε > 0, choose δ > 0 so that∣∣∣ ∫ b

a

‖ϕ′(t)‖ dt−m∑k=1‖ϕ′(tk)‖∆tk

∣∣∣ < ε, ∆tk := tk − tk−1 (12.1)

for all partitions P = {t0 = a < t1 < · · · < tm−1 < tm = b} with ‖P‖ < δ. Forsuch a partition P, choose sj,k ∈ (tk−1, tk) such that

ϕj(tk)− ϕj(tk−1) = ϕ′j(sj,k)∆tk, k = 1, . . . ,m, j = 1, . . . , n.

Then

LP(ϕ) =m∑k=1‖ϕ(tk)− ϕ(tk−1)‖ =

m∑k=1

( n∑j=1|ϕ′j(sj,k)|2

)1/2∆tk,

hence∣∣∣∣LP(ϕ)−m∑k=1‖ϕ′(tk)‖∆tk

∣∣∣∣=

∣∣∣∣∣∣m∑k=1

( n∑j=1|ϕ′j(sj,k)|2

)1/2−( n∑j=1|ϕ′j(tk)|2

)1/2∆tk

∣∣∣∣∣∣ .Taking a smaller δ if necessary, we may assume that the absolute value of theterm in braces is less than ε/(b− a). This is possible by the uniform continuityof ϕ′. It follows that ∣∣∣∣LP(ϕ)−

m∑k=1‖ϕ′(tk)‖∆tk

∣∣∣∣ < ε. (12.2)

From (12.1) and (12.2) we now have∫ b

a

‖ϕ′(t)‖ dt− 2ε < LP(ϕ) <∫ b

a

‖ϕ′(t)‖ dt+ 2ε (12.3)

for all P with ‖P‖ < δ. Since LP(ϕ) ≤ length(ϕ) and ε was arbitrary, the firstinequality in (12.3) implies that∫ b

a

‖ϕ′(t)‖ dt ≤ length(ϕ).

For the reverse inequality, let Q be any partition of [a, b]. Refine Q to obtaina partition P with ‖P‖ < δ. Then, from 12.2.2 and the second inequality in(12.3),

L(ϕ,Q) <∫ b

a

‖ϕ′(t)‖ dt+ 2ε.

Since Q and ε are arbitrary, length(ϕ) ≤∫ ba‖ϕ′(t)‖ dt.

The proof of the following corollary is left to the reader.

12.2.5 Corollary. If ϕ : [a, b) → Rn is C1, then length(ϕ) is the improperintegral

∫ ba‖ϕ′(t)‖ dt.

12.2.6 Example. Let ϕ(t) =(e−t cos t, e−t sin t

), where 0 ≤ t < +∞. Then

‖ϕ′(t)‖ = e−t, hence length(ϕ) =∫∞

0 e−t = 1. ♦

Line IntegralsLet ϕ : [a, b] → Rn be a C1 curve with trace C and let f : C → R be

continuous. The line integral of f over ϕ is defined by∫ϕ

f ds =∫C

f ds =∫ b

a

f(ϕ(t)

)‖ϕ′(t)‖ dt.

Note that if ψ = ϕ◦α is an equivalent parametrization, where α : [c, d]→ [a, b]is C1, then, by the chain rule and the change of variables theorem,∫ d

c

f(ψ(t)

)‖ψ′(t)‖ dt =

∫ d

c

f(ϕ(α(t))

)‖ϕ′(α(t)

)‖α′(t) dt

=∫ b

a

f(ϕ(u)

)‖ϕ′(u)‖ du.

The value of a line integral is therefore independent of the choice of parametriza-tion.

If ϕ : [a, b]→ Rn is piecewise C1, then the line integral is defined as∫ϕ

f ds =∑j

∫ϕj

f ds,

where ϕj is the restriction of ϕ to [aj , aj+1] and ϕ is C1 on [aj , aj+1]. Ifϕ : I → Rn is C1, where I is an arbitrary interval, then the line integral isdefined as an improper integral, as in the case of arc length.

12.2.7 Remark. Theorem 12.2.4 shows that arc length is the line integral ofthe constant function 1. Using techniques similar to those found in the proofof that theorem, one may show that if ϕ is C1, then

∫ϕf is the limit of sums

of the formk∑j=1

(f ◦ ϕ)(t∗j )‖ϕ(tj)− ϕ(tj−1)‖, t∗j ∈ (tj−1, tj),

as maxj ‖ϕ(tj) − ϕ(tj−1)‖ → 0. This interpretation is useful in applications.For example, if f(x) is the mass per unit length at the point x of a wire C,then (f ◦ ϕ)(t∗j )‖ϕ(tj)− ϕ(tj−1)‖ is approximately the mass of a small pieceof the wire. Summing and taking the limit gives the mass of the wire as theline integral

∫Cf ds. ♦


Vector Fields12.2.8 Definition. A vector field on a set E ⊆ Rn is a function

~F = (f1, . . . , fn) : E → Rn.

The vector field is said to be of class Cr if each fj is Cr. ♦

Geometrically, a vector field assigns to each point of E a unique vector in Rn,as illustrated in Figure 12.4.

Ex ~F (x)

FIGURE 12.4: Vector field on E.

If ϕ is a simple smooth curve and x = ϕ(t), then

~vϕ(x) := ϕ′(t) and ~Tϕ(x) := ϕ′(t)‖ϕ′(t)‖

denote, respectively, the tangent vector field and unit tangent vector field alongϕ. If ϕ denotes the position of a particle at time t, then the tangent vectorfield is called the velocity vector field of the particle.

Vector fields that describe forces, such as gravitation or electromagnetism,are called force fields. Line integrals may then be used to calculate the workdone by the force in moving a particle along a curve. Specifically, suppose theparticle moves along a simple smooth curve ϕ : [a, b]→ R3 under the actionof a continuous force field ~F =

(f1, f2, f3

)on C := trace(ϕ). The work ∆jW

done by the force in moving the particle from a point xj = ϕ(tj) on C to anearby point xj+1 = ϕ(tj+1) is approximately the component of the force inthe direction of the tangent to the curve at xj multiplied by the distance theparticle travels:

∆jW ≈[~F (xj) · ~Tϕ(xj)

]‖xj − xj+1‖.

The total work W done by the force is then approximately∑j ∆jW . Since F

is continuous, the approximation gets better by taking smaller intervals. It istherefore reasonable to define the total work done by the force in moving theparticle along the curve as the limit of

∑j ∆jW as maxj ‖xj −xj+1‖ → 0. By

12.2.7, we are therefore led to the definition

W :=∫ϕ

~F · ~T ds =∫ b

a

~F(ϕ(t)

)· ~Tϕ

(ϕ(t)

)‖ϕ′(t)‖ dt.


Since ~Tϕ(ϕ(t)

)= ‖ϕ′(t)‖−1ϕ′(t), we see that

W =∫ b

a

~F(ϕ(t)

)· ϕ′(t) dt =

∫ b

a

[f1(x)dx1

dt+ f2(x)dx2

dt+ f3(x)dx3

dt

]dt,

where x = ϕ(t). The last integral is frequently written∫ϕ

(f1 dx1 + f2 dx2 + f3 dx3

).

The integrand is called a (differential) 1-form on C in R3.

Differential 1-Forms in Rn

Let fj be defined on a set S ⊆ Rn. The symbolω := f1 dx1 + · · ·+ fn dxn

is called a (differential) 1-form on S. The form is said to be Cr on S if eachfj is Cr on S, where r ∈ N ∪ {+∞}. Given another 1-form

η = g1 dx1 + · · ·+ gn dxn

on S and a, b ∈ R, the 1-form aω + bη on S is defined byaω + bη := (af1 + bg1) dx1 + · · ·+ (afn + bgn) dxn.

If ~H = (h1, . . . , hn) is a vector field on S, we define the inner product ω · ~H ofω and ~H on S by

ω · ~H(x) :=n∑j=1

fj(x)hj(x), x ∈ S.

The integral of a continuous (that is, C0) 1-form ω over a C1 curveϕ : [a, b] → S is defined as∫ϕ

ω =∫ b

a

[f1(ϕ(t)

)ϕ′1(t) + · · ·+ fn

(ϕ(t)

)ϕ′n(t)

]dt =

∫ b

a

[~F (ϕ(t))

]·ϕ′(t) dt,

where ~F := (f1, . . . , fn). If ϕ is only piecewise C1, then∫ϕω is defined to be

the sum of the integrals over the intervals on which ϕ is C1.The following properties of the integral are easily established:

•∫ϕ

(aω + bη) = a

∫ϕ

ω + b

∫ϕ

η,

•∫−ϕ

ω = −∫ϕ

ω, and

•∫ϕ1+···+ϕk

ω =k∑j=1

∫ϕj

ω.


A continuous 1-form ω = f1 dx1 + · · ·+ fn dxn on an open set U ⊆ Rn issaid to be exact if there exists a C1 function f on U such that fj = ∂jf on Ufor each j. We then write

ω = df =n∑i=1

∂f

∂xjdxj .

The following proposition shows that the integral of an exact form over acurve depends only on f and the endpoints of the curve.

12.2.9 Proposition. If ϕ : [a, b]→ U is piecewise C1, then∫ϕ

df = f(ϕ(b)

)− f

(ϕ(a)

).

Proof. If ϕ is C1, then, by the chain rule and the fundamental theorem ofcalculus,∫ϕ

df =∫ b

a

n∑i=1

(∂jf)(ϕ(t)

)ϕ′i(t) dt =

∫ b

a

(f ◦ ϕ)′(t) dt = f(ϕ(b)

)− f

(ϕ(a)

).

If ϕ is only piecewise C1, subdivide the interval [a, b] into intervals on which ϕis smooth, apply the above result to each subinterval, and sum the results.

12.2.10 Theorem. Let U ⊆ Rn be open and connected and let ω be a contin-uous 1-form on U . The following statements are equivalent:

(a) ω is exact.

(b)∫ϕω = 0 for every closed piecewise C1 curve ϕ in U .

(c)∫φω =

∫ψω for every pair of piecewise C1 curves φ, ψ : [a, b]→ Rn in U

with φ(a) = ψ(a) and φ(b) = ψ(b).

Proof. That (a) implies (b) follows from 12.2.9.φ

ψ

ψ(a) = φ(a) ψ(b) = φ(b)ϕ

FIGURE 12.5: ϕ = ψ − φ.

For (b) implies (c), define a closed, piecewise smooth curve ϕ : [a, b+1]→ Rnby

ϕ(t) = ψ(t), a ≤ t ≤ b, ϕ(t) = φ (b+ (b− t)(b− a)) , b ≤ t ≤ b+ 1.

(See Figure 12.5.) Then ϕ|[b,b+1] is equivalent to −φ, hence if (b) holds,

0 =∫ϕ

ω =∫ψ

ω −∫φ

ω,

proving (c).

a

x

ϕx

U

ψt

x+ tej

FIGURE 12.6: ϕx+tej = ϕx + ψt.

Now assume that (c) holds and let ω =∑nj=1 fj dxj . To establish (a), we

construct a function f on U such that ∂jf = fj . Choose any point a ∈ U .By Exercise 8.7.8, for each x ∈ U there exists a piecewise C1 curve ϕx in Uwith initial point a and terminal point x. Define f(x) =

∫ϕxω. By (c), f(x) is

independent of the path and hence is well-defined. Fix j, let t > 0, and denoteby ψt the line segment x+ uej , 0 ≤ u ≤ t. Then ψt lies in U for sufficientlysmall t > 0, and by continuity of fj ,

1t

[f(x+ tej)− f(x)

]= 1t

∫ψt

ω = 1t

∫ t

0fj(x+ uej) du→ fj(x)

as t→ 0+. A similar argument works for the case t→ 0−. Therefore, ∂jf(x) =fj(x), as required.

Exercises1.S Determine which of the following curves are rectifiable.

(a) ϕ(t) = (t, t−p), 0 < t ≤ 1, where p > 0.(b) ϕ(t) = (e−t, e−t2 , e−t3), 0 ≤ t < +∞.(c) ϕ(t) =

(t−1, e−t

), t ≥ 1.

(d) ϕ(t) =(t−1, e−t

), 0 < t ≤ 1.

2. Evaluate∫ϕf for

(a) ϕ(t) = (t3/3, t4/4), 1 ≤ t ≤ 2, f(x, y) = x/y.(b)S ϕ(t) =

(t, sin(2t), cos(2t)

), 0 ≤ t ≤ π/4, f(x, y, z) = xz.

(c) ϕ(t) =(t, t2/2, t3/3

), 0 ≤ t ≤ 1, f(x, y, z) = x+ 6z.

(d) ϕ(t) =(

sin t,√

2 cos t, sin t), 0 ≤ t ≤ π/2, f(x, y, z) = xyz.

3. Set up, but do not evaluate, the integral that gives the circumference of

the ellipse x2

a2 + y2

b2= 1. (Your answer should involve sin2 t.)

4. In each case below, find a smooth simple curve or a smooth simple closedcurve with trace C. Use the parametrization to find an integral thatgives the length of the curve. (Do not evaluate the integral.)(a) C =

{(x, y) : x3 − 7y2 = 1, 1 < x < 2, y > 0

}.

(b)S C ={

(x, y) : 9(x− 1)2 + 4(y − 2)2 = 36}.

(c) C ={

(x, y) : x2 − y2 = 4, x > 2, 0 < x+ y < a}.

5. Let ϕ(x) = (x, g(x)), a ≤ x ≤ b, where g is continuously differentiable,and let f(x, y) be continuous on the graph of g. Show that∫

ϕ

f =∫ b

a

f(x, g(x)

)√1 + [g′(x)]2 dx.

Use this to find

(a)∫ϕ

f if g(x) = (2/5)x5/2 and f(x, y) = x2, 0 ≤ x ≤ 1.

(b)S the length of the graph of the equation x2/3 + y2/3 = 1.

(c) the length of the graph of the function g(x) = xp

2p + x2−p

2(p− 2) , where0 < a ≤ x ≤ b and p > 2.

6. Prove 12.2.5.

7.S Let a smooth curve ϕ : [a, b]→ R2 be described in polar coordinates by

ϕ(t) =(r(t) cos θ(t), r(t) sin θ(t)

), r(t) ≥ 0.

Show that

length(ϕ) =∫ b

a

√r(t)

[θ′(t)

]2 +[r′(t)

]2dt.

8. Let ~F = (F1, F2, F3) be a force field in R3 that moves a particle of massm along a smooth curve ϕ : [a, b]→ R3. The kinetic energy of the particleat time t is defined as 1

2m‖ϕ′(t)‖2. Use Newton’s second law ~F = mϕ′′

to show that the work done by the force in moving the particle from ϕ(a)to ϕ(b) is the change in kinetic energy

12m‖ϕ

′(b)‖2 − 12m‖ϕ

′(a)‖2.

9. A force field ~F in R3 is said to be conservative if there exists a functionP (x, y, z) such that ~F = −∇P . P (x, y, z) is called the potential energyof an object at the point (x, y, z).(a)S Show that the work done by a conservative force in moving theobject along a curve ϕ from ϕ(a) to ϕ(b) is

P(ϕ(a)

)− P

(ϕ(b)

).

(b) Deduce the Law of Conservation of Energy

P(ϕ(b)

)+ 1

2m‖ϕ′(b)‖2 = P

(ϕ(a)

)+ 1

2m‖ϕ′(a)‖2,

that is, the sum of the potential and kinetic energies is constant.(c) Find a potential function for the gravitational force field

F (x) = −mMG‖x‖−3x,

where M is the mass of the earth (concentrated at the origin, the centerof the earth), m is the mass of the particle at point x, and G is thegravitation constant.

10. For a smooth curve ϕ : [a, b] :→ Rn, define the arc length functions = s(t) by

s(t) =∫ t

a

‖ϕ′(τ)‖ dτ, a ≤ t ≤ b.

Show that s has a smooth inverse t = t(s), 0 ≤ s ≤ ` := length(ϕ). Thecurve ψ(s) = ϕ(t(s)) is called a reparametrization of ϕ by arc length.Show that, for a continuous vector field ~F on trace(ϕ) = trace(ψ),∫

ϕ

~F · ~Tϕ =∫ `

0~F(ψ(s)

)· ψ′(s) ds.

11. Let P = {a = t0 < t1 < · · · < tk = b} be a partition of [a, b]. Forf : [a, b]→ R, define

VP(f) =k∑j=1|f(tj)− f(tj−1)|.

Then f is said to have bounded variation on the interval [a, b] ifsupP VP(f) < +∞. (Section 5.9.) Show that a curve ϕ = (ϕ1, . . . , ϕn) :[a, b] → Rn is rectifiable iff each component function ϕi has boundedvariation on [a, b].


12.3 Parameterized Surfaces12.3.1 Definition. Let 1 ≤ m ≤ n. A smooth parameterized m-surface inRnis a C1 function ϕ= (ϕ1, . . . , ϕn) : U → Rn, where U ⊆ Rm is open andthe derivative ϕ′(u) has rank m at each point u ∈ U . A reparametrization ofϕ is a smooth parameterized m-surface ψ = ϕ ◦ α : V → Rn, where V ⊆ Rmis open and α : V → U is C1 with C1 inverse α−1 : U → V such that Jα > 0on V . In this case, ϕ and ψ are said to be equivalent. ♦

We shall usually drop the qualifier “smooth” when referring to parameter-ized surfaces. Note that the parameter set U is a m-parameterized surface inRm. Here, we take ϕ to be the identity map ι : U → U .

Tangent Spaces of a Parameterized SurfaceLet ϕ : U → Rn be a parameterized m-surface and u ∈ U . For small |t| the

line segment u+ tej is contained in U and is mapped by ϕ onto a curve inS := ϕ(U) with tangent vector

d

dt

∣∣∣t=0

ϕ(u+ tej) = dϕu(ej) =(∂ϕ1

∂uj(u), . . . , ∂ϕn

∂uj(u))

=: ∂jϕ(u),

where e1, . . . , em are the standard basis vectors in Rm. Note that ∂jϕ(u) isjust the jth column of ϕ′(u). Since ϕ′(u) has rank m, the vectors dϕu(ej) arelinearly independent and hence form a basis for an m-dimensional subspaceTϕ(u) of Rn, called the tangent space of ϕ at u. Thus dϕu is a linear isomor-phism from Rm onto Tϕ(u) mapping the frame (e1, . . . , em) onto the frame(∂1ϕ(u), . . . , ∂mϕ(u)).1 Note that ϕ is not assumed to be one-to-one, andϕ(u) = ϕ(v) does not necessarily imply that Tϕ(u) = Tϕ(v). (See Figure 12.7.)

Tϕ(u)

Tϕ(v)

ϕ(U)

p

FIGURE 12.7: Tangent spaces at p = ϕ(u) = ϕ(v).

1A frame in a finite dimensional vector space is simply an ordered basis—see Appendix B.


Orientation of a Parameterized m-SurfaceTangent spaces may be used to assign an orientation to a parameterized

m-surface, a notion that will be needed later to construct the integral of adifferential form on a surface. First, we define orientation for the space Rm.

Two frames (v1, . . . ,vm) and (w1, . . . ,wm) in Rm are said to be orientationequivalent if the determinants of the matrices[

v1 · · · vm]

and[w1 · · · wm

](where vj andwj are written as column vectors) have the same sign. Orientationequivalence is easily seen to be an equivalence relation. The collection of framesof Rm is therefore partitioned into two classes, one that contains (e1 . . . , em)and the other containing (−e1 . . . , em). An orientation is assigned to Rm bydesignating one of these equivalence classes to be positive and the other negative.Any frame in the former class is then said to have positive orientation, whilea frame in the latter class is said to have negative orientation. For example,if m = 3 and (v1,v2,v3) has positive orientation, then so does (v2,v3,v1),while (v2,v1,v3) has negative orientation. By convention, the standard orpositive orientation of Rm is the orientation obtained by designating the frame(e1, . . . , em) to be positive. For example, in the standard orientation, the signof the frame (em, e1, . . . , em−1) is (−1)m−1. We shall always assume that thespaces Rm have the standard orientation.

A parameterized m-surface ϕ : U → Rn is said to be orientable if, wheneverϕ(u) = ϕ(v),

• Tϕ(u) = Tϕ(v) and• the matrix of the linear transformation,

Tuv = (dϕv)−1 ◦ dϕu : Rm → Rm (12.4)

has positive determinant.

Frames (ξ1, . . . , ξm) and (ζ1, . . . , ζm) in Tϕ(u) are then declared to be orien-tation equivalent if the frames

dϕ−1u (ξ1, . . . , ξm) :=

(dϕ−1

u (ξ1), . . . , dϕ−1u (ξm)

)and

dϕ−1u (ζ1, . . . , ζm) :=

(dϕ−1

u (ζ1), . . . , dϕ−1u (ζm)

)are orientation equivalent in Rm. Since

Tuv ◦ (dϕu)−1(ξ1, . . . , ξm) = (dϕv)−1(ξ1, . . . , ξm)

and detTuv > 0, the frames dϕ−1u (ξ1, . . . , ξm) and dϕ−1

v (ξ1, . . . , ξm) have thesame sign, hence the notion of orientation equivalence in the common tangentspace Tϕ(u) = Tϕ(v) is well-defined. As with the vector space Rm, orientation


equivalence on Tϕ(u) is an equivalence relation with two equivalence classes,one containing dϕu(e1, . . . , em), the other containing dϕu(−e1 . . . , em). Thepositive (negative) orientation of ϕ is obtained by designating the equivalenceclass containing dϕu(e1, . . . , em) to be positive (negative) for every u ∈ U .We define the sign of ϕ by

sign(ϕ) ={

+1 if ϕ is positively oriented,−1 ϕ is negatively oriented.

Obviously, if ϕ is one-to-one, then it is orientable. For example, a simplesmooth curve ϕ : I → Rn is orientable, and since

d(ϕt)(e1) = ϕ′(t) = lim∆t→0+

ϕ(t+ ∆t)− ϕ(t)∆t ,

the positive orientation is the one for which the tangent vector dϕt(e1) is inthe direction of increasing t. By contrast, the curve in Figure (12.7) is notorientable.

12.3.2 Example. Let a1, . . . ,am be linearly independent vectors in Rn andlet b ∈ Rn. Define m-dimensional parameterized affine space ϕ : Rm → Rn by

ϕ(u) = ϕ(u1, . . . , um) = b+m∑i=1

uiai.

x1

x2

x3

u1a1

u2a2

b

FIGURE 12.8: Affine space.

Since ϕ is one-to-one, it is orientable. Since ∂iϕ = ai, the tangent space ateach point is the subspace of Rn with frame (a1, . . . ,am). ♦

12.3.3 Example. The Cartesian product of circles

ϕ(θ1, . . . , θm) =(r1 cos θ1, r1 sin θ1, . . . , rm cos θm, rm sin θm

), ri > 0,

is a parameterized m-surface in R2m. Orientability follows from the periodicityof the sine and cosine functions. ♦


Orientation of a Parameterized (n− 1)-SurfaceForm = n−1, the notion of orientability may be formulated more concretely

in terms of a normal vector field.

12.3.4 Lemma. Let ϕ : U → Rn be a parameterized (n− 1)-surface. Define∂ϕ⊥ : U → Rn by

∂ϕ⊥ :=n∑i=1

(−1)i+n ∂(ϕ1, . . . , ϕi, . . . , ϕn)∂(u1, . . . , un−1) ei,

where the hat indicates that ϕi is omitted in the calculation, and let

A :=

∂1ϕ(u)

...∂n−1ϕ(u)∂ϕ⊥(u)

n×n

.

Then dϕ⊥(u) is perpendicular to the tangent space Tϕ(u), and

|A| = ‖∂ϕ⊥(u)‖2 = det[ϕ′(u)tϕ′(u)

]> 0. (12.5)

Proof. Let m = n− 1. For each j, the determinant

Dj(u) :=

∣∣∣∣∣∣∣∣∣∂jϕ1(u) · · · ∂jϕn(u)∂1ϕ1(u) · · · ∂1ϕn(u)

......

∂mϕ1(u) · · · ∂mϕn(u)

∣∣∣∣∣∣∣∣∣has two identical rows and hence is zero. Expanding Dj(u) along the first rowand multiplying by (−1)m yields

Dj(u) = (−1)mn∑i=1

(−1)i+1∂jϕi∂(ϕ1, . . . , ϕi, . . . , ϕn)∂(u1, . . . , un−1) = ∂jϕ(u) · ∂ϕ⊥(u).

Therefore, d∂j(u) · ϕ⊥(u) = 0, so ϕ⊥(u) is perpendicular to Tϕ(u).To prove the first equality in (12.5), expand |A| along the last row to obtain

|A| =n∑i=1

[∂(ϕ1, . . . , ϕi, . . . , ϕn)

∂(u1, . . . , um)

]2= ‖ϕ⊥(u)‖2 > 0,

the positive inequality because ϕ′ has rank m.For the second equality in (12.5), using what has already been established

we calculate

‖∂ϕ⊥(u)‖4 = |A|2 = |AAt|

=

∣∣∣∣∣∣∣∣∣∂1ϕ(u) · ∂1ϕ(u) · · · ∂1ϕ(u) · ∂mϕ(u) 0

......

...∂mϕ(u) · ∂1ϕ(u) · · · ∂mϕ(u) · ∂mϕ(u) 0

0 · · · 0 ‖∂ϕ⊥(u)‖2

∣∣∣∣∣∣∣∣∣= ‖∂ϕ⊥(u)‖2 det

[ϕ′(u)tϕ′(u)

].

12.3.5 Corollary. The frame(dϕu(e1), . . . , dϕu(en−1), ∂ϕ⊥(u)

)is positively

oriented in Rn.

12.3.6 Theorem. Let ϕ : U → Rn be a parameterized (n − 1)-surface. Thefollowing statements are equivalent:

(a) ϕ is orientable.

(b) ϕ(u) = ϕ(v)⇒ ∂ϕ⊥(u) = c∂ϕ⊥(v) for some c > 0.

(c) There exists a function ~Nϕ : ϕ(U)→ Rn (necessarily unique) such that

~Nϕ(ϕ(u)

)= ‖∂ϕ⊥(u)‖−1∂ϕ⊥(u) (12.6)

= 1√det[ϕ′(u)tϕ′(u)

] n∑i=1

(−1)i+n ∂(ϕ1, . . . , ϕi, . . . , ϕn)∂(u1, . . . , un−1) ei.

Proof. For u ∈ U , let Tu : Rn → Rn denote the unique linear isomorphismsuch that

Tu(ej) = dϕu(ej), 1 ≤ j ≤ n− 1, and Tu

(en) = ∂ϕ⊥(u).

By Lemma 12.3.4, detTu = ‖ϕ⊥(u)‖2 > 0. Suppose that ϕ(u) = ϕ(v). Since∂ϕ⊥(u) ⊥ Tϕ(u) and ∂ϕ⊥(v) ⊥ Tϕ(v),

Tϕ(v) = Tϕ(u) iff ∂ϕ⊥(u) = c∂ϕ⊥(v) for some c 6= 0.

In this case, by (12.4),

(T−1v Tu)(ej) =

(dϕv

)−1(dϕu(ej)

)= Tuv(ej), 1 ≤ j ≤ n− 1, and

(T−1v Tu)(en) = T−1

v

(c∂ϕ⊥(v)

)= cen.

Thus the matrix of T−1v Tu has columns Tuv(e1), . . ., Tuv(en−1), cen. It follows

that0 < detTu/ detTv = det(T−1

v Tu) = cdetTuv. (12.7)


With these preliminaries out of the way, assume that ~Nϕ exists and letϕ(u) = ϕ(v). Then

‖∂ϕ⊥(u)‖−1∂ϕ⊥(u) = ~Nϕ(ϕ(u)

)= ~Nϕ

(ϕ(v)

)= ‖∂ϕ⊥(v)‖−1∂ϕ⊥(v),

hence ∂ϕ⊥(u) = c∂ϕ⊥(v) for some c > 0. By the first paragraph, Tϕ(u) = Tϕ(v)and detTuv > 0. Therefore, ϕ is orientable.

Conversely, assume that ϕ is orientable and let ϕ(u) = ϕ(v). Then Tϕ(u) =Tϕ(v), hence ∂ϕ⊥(u) = c∂ϕ⊥(v) for some c 6= 0. Since det[Tuv] > 0, c > 0 by(12.7). Therefore,

‖∂ϕ⊥(u)‖−1∂ϕ⊥(u) = ‖∂ϕ⊥(v)‖−1∂ϕ⊥(v),

so ~Nϕ may be unambiguously defined by (12.6).

12.3.7 Special Cases.(a) n = 2: Then ϕ⊥ = (−ϕ′2, ϕ′1), the inward normal. (Figure 12.9.)

(−ϕ′2, ϕ

′1)

(ϕ′1, ϕ

′2)

FIGURE 12.9: The inward unit normal.

(b) n = 3: Then

∂ϕ⊥ =(∣∣∣∣∂1ϕ2 ∂1ϕ3∂2ϕ2 ∂2ϕ3

∣∣∣∣ ,− ∣∣∣∣∂1ϕ1 ∂1ϕ3∂2ϕ1 ∂2ϕ3

∣∣∣∣ , ∣∣∣∣∂1ϕ1 ∂1ϕ2∂2ϕ1 ∂2ϕ2

∣∣∣∣) = ∂1ϕ× ∂2ϕ,

the familiar cross product of ∂1ϕ and ∂2ϕ.

S = ϕ(U)

dϕu(e1)

p

~Nϕ(p)dϕu(e

2)

FIGURE 12.10: Normal vector to S at p.

Thus the positively oriented frame(dϕu(e1), dϕu(e2), ~Nϕ(p)

)is a right-handed

system, as shown in Figure 12.10.


(c) Let U ⊆ Rn−1 be open and let g : U → R be C1. Define

ϕ(u1, . . . , un−1) =(u1, . . . , un−1, g(u1, . . . , un−1)

).

Then ϕ(U) is the graph of g. Since ϕ is one-to-one, it is orientable. Also,

∂jϕ =(0, · · · , 0, 1

j, 0, · · · , 0, ∂jg

)⊥ (−∂1g, · · · ,−∂jg, · · · ,−∂n−1g, 1

)and, by elementary row operations,∣∣∣∣∣∣∣∣∣∣∣

1 · · · 0 ∂1g0 · · · 0 ∂2g...

......

0 · · · 1 ∂n−1g−∂1g · · · −∂n−1g 1

∣∣∣∣∣∣∣∣∣∣∣= (∂1g)2 + · · ·+ (∂n−1g)2 + 1.

Since this is positive, by uniqueness,

~Nϕ ◦ ϕ =(−∂1g, · · · ,−∂n−1g, 1

)√(∂1g)2 + · · ·+ (∂n−1g)2 + 1

=(−∇g, 1

)√‖∇g‖2 + 1

. ♦

12.3.8 Example. Let r > 0 and define

ϕ(θ1, θ2) = (r sin θ1 cos θ2, r sin θ1 sin θ2, r cos θ1), θ1 ∈ (0, π), θ2 ∈ (0, 2π).

The image of ϕ is the sphere in R3 with radius r and center (0, 0, 0) and withthe great circle (r sin θ1, 0, r cos θ1) (that is, θ2 = 0) through the poles (0, 0,±r)missing. Since

∂1ϕ(θ1, θ2) = r(cos θ1 cos θ2, cos θ1 sin θ2,− sin θ1) and∂2ϕ(θ1, θ2) = r(− sin θ1 sin θ2, sin θ1 cos θ2, 0),

by 12.3.7

∂ϕ⊥(θ1, θ2) = ∂1ϕ(θ1, θ2)× ∂2ϕ(θ1, θ2)= r sin θ1

(r sin θ1 cos θ2, r sin θ1 sin θ2, r cos θ1

)= (r sin θ1)ϕ(θ1, θ2).

Therefore,( ~Nϕ ◦ ϕ)(θ1, θ2) = ϕ(θ1, θ2)

‖ϕ(θ1, θ2)‖ = r−1ϕ(θ1, θ2),

that is,~Nϕ(p) = p

‖p‖, p ∈ S. ♦


ψ(t)

x3

x1

x2(ψ1(t), ψ2(t) cos θ, ψ2(t) sin θ

)

θ

FIGURE 12.11: Surface of revolution.

12.3.9 Example. Let I be an open interval and ψ : I → R2 a smooth curvewith ψ2(t) > 0 for t ∈ I. The parameterized surface of revolution in R3 isdefined by

ϕ(t, θ) = (ψ1(t), ψ2(t) cos θ, ψ2(t) sin θ), t ∈ I, θ ∈ R.

From (12.3.7) and the calculations

∂1ϕ(t, θ) =(ψ′1(t), ψ′2(t) cos θ, ψ′2(t) sin θ

),

∂2ϕ(t, θ) =(0,−ψ2(t) sin θ, ψ2(t) cos θ

),

we have∂ϕ⊥(t, θ) = ψ2(t)

(ψ′2(t),−ψ′1(t) cos θ,−ψ′1(t) sin θ

)(12.8)

and∂ψ⊥(t) = (−ψ′2(t), ψ′1(t)).

Now suppose that ψ is orientable. We claim that ϕ is then orientable. Tosee this, suppose that ϕ(t1, θ1) = ϕ(t2, θ2). Then ψ1(t1) = ψ1(t2), and becauseψ2(t) > 0, ψ2(t1) = ψ2(t2) and hence θ2 = θ1 + 2kπ. By orientability of ψ,

(−ψ′2(t2), ψ′1(t2)) = ∂ψ⊥(t2) = c∂ψ⊥(t1) = c(−ψ′2(t1), ψ′1(t1))

for some c > 0. It follows from (12.8) that

∂ϕ⊥(t2, θ2) = ψ2(t2)(ψ′2(t2),−ψ′1(t2) cos θ2,−ψ′1(t2) sin θ2

)= cψ2(t1)

(ψ′2(t1),−ψ′1(t1) cos θ1,−ψ′1(t1) sin θ1

)= c∂ϕ⊥(t1, θ1),

which shows that ϕ is orientable. Moreover, from (12.8),

‖∂ϕ⊥(t, θ)‖ = ψ2(t)‖ψ′(t)‖,

hence

~Nϕ(ϕ(t, θ)) = ∂ϕ⊥(t, θ)‖∂ϕ⊥(t, θ)‖ = ‖ψ′(t)‖−1(ψ′2(t),−ψ′1(t) cos θ,−ψ′1(t) sin θ

),

which is the rotation of the unit normal vector −Nψ about the x1 axis.For the special case ψ(x) =

(x, f(x)

),

ϕ(x, θ) = (x, f(x) cos θ, f(x) sin θ)

and~Nϕ(ϕ(x, θ)) =

([f ′(x)]2 + 1

)−1/2(f ′(x),− cos θ,− sin θ

).

A point (x, y, z) on the surface S = ϕ(U) and not on the graph of f may bewritten uniquely as(

x, f(x) cos(θ(y, z)), f(x) sin(θ(y, z)

))where 0 < θ(y, z) < 2π is the (continuous) argument of (y, z) determined byθ0 = 0 (see 9.4.6). Therefore,

~Nϕ(x, y, z) =([f ′(x)]2 + 1

)−1/2 (f ′(x),− cos

(θ(y, z)

),− sin

(θ(y, z)

)),

which is continuous on S by the periodicity of sine and cosine. ♦

12.3.10 Example. The parameterized Möbius strip is defined by

ϕ(t, θ) =([

2 + t cos( 1

2θ)]

cos θ,[2 + t cos

( 12θ)]

sin θ, t sin( 1

2θ)),

where −1 < t < 1 and θ ∈ R. The surface may be concretely realized by takingone end of a long strip of paper, giving it a half-twist, and gluing it to theother end.

FIGURE 12.12: Möbius strip.

The Möbius strip is not orientable. Indeed, ϕ(0, 0) = ϕ(0, 2π), but since

∂1ϕ(0, 0) = −∂1ϕ(0, 2π) = (1, 0, 0) and ∂2ϕ(0, 0) = ∂2ϕ(0, 2π) = (0, 1, 0),

we see that

∂1ϕ(0, 0)× ∂2ϕ(0, 0) = (0, 0, 1) = −∂1ϕ(0, 2π)× ∂2ϕ(0, 2π).

Therefore, ~Nϕ cannot exist. ♦

Exercises1. Assuming that R3 has the standard orientation, find the sign of the

frames(a)S (e1 + e2, e2 + e3, e3 + e1).(b) (−e1 + e2 + e3, e1 − e2 + e3, e1 + e2 − e3).

2. Show that the frames

(e1 + e2 + e3, 2e1 + e2 + 3e3) and (e1 + 3e2 − e3, e1 + 4e2 − 2e3)

in R3 span the same subspace but have opposite orientations.

3. Let ϕ : U → Rn be a parameterized m-surface and ψ = ϕ ◦ α : V → Rna reparametrization of ϕ. Show that ϕ is orientable iff ψ is orientable.

4. Let ϕ : U → Rn be an orientable parameterized (n− 1)-surface and letψ = ϕ ◦ α : V → Rn be a reparametrization of ϕ. Find ∂ψ⊥ in terms of∂ϕ⊥. Use the result to show that Nψ = Nϕ on S := ϕ(U) = ψ(V ).

5.S Use 12.3.9 to find ~Nϕ(x, y, z) for the torus

ϕ(φ, θ) =(a cosφ, (b+ a sinφ) cos θ, (b+ a sinφ) sin θ

), 0 < θ, φ < 2π,

where 0 < a < b.

6. Find ~Nϕ(x, y, z) for the following orientable 2-surfaces in R3:

(a)S ϕ(t, θ) = (t cos θ, t sin θ, t), t > 0, θ ∈ R.(b) ϕ(t, θ) = (sinh t, cosh t cos θ, cosh t sin θ), t, θ ∈ R (hyperboloid of

one sheet).(c) ϕ(t, θ) = (cosh t, sinh t cos θ, sinh t sin θ), t, θ ∈ R (one sheet of a

hyperboloid of two sheets).(d) ϕ(t, θ) = (t cos θ, t sin θ, θ), t > 0, θ ∈ R (helicoid).(e) ϕ(t, θ) = (t cos θ, t sin θ, θ2), t > 0, θ > 0.(f)S ϕ(t, s) = (1 − s)

(a cos t, a sin t, 0

)+ s(b cos t, b sin t, 1), 0 < s < 1,

where 0 < a < b.

7.S Let V ⊆ Rn−2 be open and let ψ : V → Rn−1 be an (n−2)-parameterizedsurface in Rn−1. Define the cylinder ϕ over ψ by

ϕ(v, s) =(ψ(v), s

), v ∈ V, s ∈ (a, b).

Show that

(a) ϕ is a parameterized (n− 1)-surface in Rn.(b) ∂ϕ⊥(u1, . . . , un−1) =

(∂ψ⊥(u1, . . . , un−2), 0

).

(c) ϕ is orientable iff ψ is orientable, in which case

Nϕ(x1, . . . , xn) =(Nψ(x1, . . . , xn−1), 0

).

8. Let V ⊆ Rn−2 be open and let ψ : V → Rn−1 be an (n−2)-parameterizedsurface in Rn−1. Define the cone over ψ by

ϕ(v, s) =((1− s)ψ(v), s

), v ∈ V, 0 < s < 1.

Show that(a) ϕ is a parameterized (n− 1)-surface in Rn.(b) ∂ϕ⊥(v, s) =

((1− s)n−2∂ψ⊥(v), D(v, s)

), where

D(v, s) =

(1− s)a1,1 · · · (1− s)a1,n−2 −ψ1(v)(1− s)a2,1 · · · (1− s)a2,n−2 −ψ2(v)

......

...(1− s)an−1,1 · · · (1− s)an−1,n−2 −ψn−1(v)

and [ai,j ](n−1)×(n−2) = ψ′(v).

12.4 m-Dimensional SurfacesLet 1 ≤ m < n and let V ⊆ Rn be open. Suppose that the function

F = (F1, . . . , Fn−m) : V → Rn−m

is C1 on V such that the (n−m)× n matrix F ′(x) has rank n−m at eachpoint x ∈ V . A set of the form

S = {x ∈ V : F (x) = c} ,

where c ∈ Rn−m, is called an m-dimensional level surface of F or simply anm-surface in Rn. By replacing F by F − c, we may (and hereafter shall) takec = 0.

Local Parametrization of an m-SurfaceThe following theorem shows that an m-surface may be “patched together”

from a collection of one-to-one parameterized m-surfaces. This will be animportant tool in the development of a theory of integration on m-surfaces.


12.4.1 Theorem. Let S = {x ∈ V : F (x) = 0} be an m-surface in Rn.

(a) For each a ∈ S there exist open sets Ua ⊆ Rm and Va ⊆ Rn with a ∈ Va,and a one-to-one parameterized m-surface ϕa from Ua onto Sa := S ∩ Va.

(b) Each ϕ−1a is the restriction to Sa of a C1 map on Va.

(c) If Sa ∩ Sb 6= ∅, then the mapping

ϕab := ϕ−1b ◦ ϕa : ϕ−1

a (Sa ∩ Sb)→ ϕ−1b (Sa ∩ Sb)

is C1 with inverse ϕba.

(d) The mappings ϕa may be chosen so that 0 ∈ Ua and ϕa(0) = a.

Proof. If (a)–(c) of the theorem hold and a = ϕa(u0), then (d) may be achievedby replacing Ua by Ua − u0 and ϕa by ϕa(u+ u0), u ∈ Ua − u0.

We prove (a)–(c) first for the case m = n − 1, that is, for F real-valued,and then outline the proof for the general case.

Since F has rank 1, ∂iF (a) 6= 0 for some index i (which typically dependson a). Define a C1 map Ga : V → Rn by

Ga(x1, . . . , xn) =(x1, . . . , xi−1, F (x1, . . . , xn), xi+1, . . . xn

).

Thus Ga simply replaces the ith coordinate of its argument x by F (x). Notethat G′a(x) is the identity matrix with row i replaced by ∇F (x). A standardrow reduction shows that JGa(a) = ∂iF (a). Since this is nonzero, by theinverse function theorem there exist open sets Va ⊆ V and Wa = Ga(Va) inRn with a ∈ Va such that Ga is one-to-one on Va and G−1

a : Wa → Va is C1.Taking smaller Wa and Va if necessary, we may suppose that

Wa = (α1, β1)× · · · × (αn, βn).

Note that 0 ∈ (αi, βi), since(a1, . . . , ai−1, 0, ai+1, . . . , an

)= Ga(a) ∈ Wa.

Now let (u1, . . . , un) ∈Wa and set (v1, . . . , vn) = G−1a (u1, . . . , un). Then

(u1, . . . , ui, . . . , un) = Ga(v1, . . . , vn)=(v1, . . . , vi−1, F (v1, . . . , vn), vi+1, . . . , vn

)=(u1, . . . , ui−1, (F ◦G−1

a )(u1, . . . , un), ui+1, . . . , un),

hence(F ◦G−1

a )(u1, . . . , un) = ui (12.9)and, in particular,

(F ◦G−1a )(u1, . . . , ui−1, 0, ui+1, . . . , un) = 0. (12.10)

Now set

Ua := (α1, β1)× · · · × (αi−1, βi−1)× (αi+1, βi+1)× · · · × (αn, βn)


and define ϕa : Ua → Rn by

ϕa(u1, . . . , un−1) = G−1a (u1, . . . , ui−1, 0, ui, . . . , un−1).

By (12.10), F(ϕa(u1, . . . , un−1)

)= 0, hence ϕa(Ua) ⊆ Sa. Conversely, by

(12.9),

(v1, . . . , vn) ∈ Sa ⇒ ui = (F ◦G−1a )(u1, . . . , un) = F (v1, . . . , vn) = 0

⇒ (v1, . . . , vn) = G−1a (u1, . . . , ui−1, 0, ui+1, . . . un−1) = ϕa(u1, . . . , un−1).

Therefore, ϕa(Ua) = Sa.

S

Wa Va

G−1a

Ua

Sa

ϕa

FIGURE 12.13: The mapping G−1a .

Now define the injection mapping ιa : Ua →Wa and the projection mappingπa : Va → Rn−1, respectively, by

ιa(u1, . . . , un−1) = (u1, . . . , ui−1, 0, ui, . . . , un−1) andπa(v1, . . . , vn) = (v1, . . . , vi−1, vi+1, . . . , vn).

Then πa ◦ ιa : Ua → Ua is the identity function and ϕa = G−1a ◦ ιa. Since G−1

a

has rank n and ιa has rank n− 1, ϕa has rank n− 1. Also, if v = ϕa(u), then

(πa ◦Ga)(v) = (πa ◦Ga ◦ ϕa)(u) = πa ◦ ιa(u) = u = ϕ−1a (v),

which shows that ϕ−1a : Sa → Ua is the restriction to Sa of the C1 function

πa ◦Ga : Va → Ua.Now let b ∈ S and Sa ∩ Sb 6= ∅. Then Gb ◦ G−1

a maps the open setGa(Va ∩ Vb) onto the open set Gb(Va ∩ Vb). Also, in the preceding notation,ϕa = G−1

a ◦ ιa on Ua and ϕ−1b = πb ◦Gb on Sb, hence

ϕ−1b ◦ ϕa = πb ◦Gb ◦G−1

a ◦ ιa,

which maps the open set ϕ−1a (Sb∩Sa) ⊆ Ua onto the open set ϕ−1

b (Sb∩Sa) ⊆Ub and is C1 with C1 inverse ϕ−1

a ◦ ϕb. This verifies the theorem for the casem = n− 1.

In the general case, there exist indices i1 < · · · < ik in {1, . . . , n} such that

∂(F1, . . . Fk)∂(ui1 , . . . , uik) (a) 6= 0,

where k := n−m. Let i′1 < i′2 < · · · < i′m denote the complementary indices.(In the above case, these were the indices 1, . . . , i − 1, i + 1, . . . , n.) DefineGa(x1, . . . , xn) to be the n-tuple (x1, . . . , xn), with the coordinates xi1 , . . . , xikreplaced by F1(x), . . ., Fk(x). Then JGa(a) 6= 0, so the sets Va and Wa maybe obtained as before. Define

Ua = (αi′1 , βi′1)× · · · × (αi′m , βi′m)→ Rn

and the injection mapping ιa : Ua →Wa by

ιa(u1, u2, . . . , um) = (v1, v2, . . . , vn),

where vij = 0, 1 ≤ j ≤ k, and vi′j

= uj , 1 ≤ j ≤ m. Thus ιa places zeros inthe coordinate positions i1 < · · · < ik and fills the complementary positionsby u1, . . . , um. Finally, define the projection mapping πa : Va → Rn−1 by

πa(v1, . . . , vn) = (vi′1 , . . . , vi′m).

The proof then proceeds as before.

S

SbSa

ϕb

a b

Ua

ϕaϕab

ϕba

Ub

FIGURE 12.14: Transition mappings.

The functions ϕa : Ua → Sa in the theorem are called local parametrizationsof S, and the C1 functions ϕab are called transition mappings. The sets Sa

are called surface elements. A collection of local parameterizations of S whosesurface elements cover S is called an atlas for S. Note that if F is Cr then,as an examination of the proof reveals, the local parameterizations and thetransition maps are Cr as well.

12.4.2 Example. Consider the (n− 1)-sphere S := {y ∈ Rn : ‖y‖ = 1} withnorth and south poles p := (0, . . . , 0, 1) and q := (0, . . . , 0,−1). Let the pointsy = (y1, . . . , yn) and x = (x1, . . . , xn−1) be related as in Figure 12.15.

y

(x, 0)

Rn−1

S

0

q = (0, . . . ,0,−1)

p = (0, . . . , 0, 1)

Rn

FIGURE 12.15: Stereographic projection from p.

Then for some t,

(x1, . . . , xn−1,−1) = (x, 0)− p = t(y − p) =(ty1, . . . tyn−1, t(yn − 1)

),

hence(x1, . . . , xn−1) = 1

1− yn(y1, . . . , yn−1), −1 ≤ yn < 1.

The mapping

x = ϕ−1(y) = 11− yn

(y1, . . . , yn−1), yn < 1,

from S \ {p} onto Rn−1, is called the stereographic projection from p ontothe equatorial hyperplane xn = 0. One readily checks that the inverse of thismapping is given by

y = ϕ(x) = 11 + ‖x‖2

(2x1, . . . , 2xn−1, ‖x‖2 − 1

), x ∈ Rn−1.

Similarly, the stereographic projection from q is given by

x = ϕ−1(y) = 11 + yn

(y1, . . . , yn−1), yn > −1

with inverse

y = ϕ(x) = 11 + ‖x‖2

(2x1, . . . , 2xn−1, 1− ‖x‖2

).

The set {ϕ, ϕ} is an atlas for S. The transition mapping from Rn−1 \ {0} toRn−1 \ {0} is the self-inverse mapping

(ϕ−1 ◦ ϕ)(x) = x

‖x‖2. ♦


Tangent Space of an m-SurfaceThe local parameterizations ϕa of an m-surface S = {x : F (x) = 0} may

be used to construct a tangent space at each point a ∈ S. Let

ϕa(u) = ϕb(v) ∈ Sa ∩ Sb.

Then v := ϕab(u) and, by the chain rule applied to ϕa = ϕb ◦ ϕab,

d(ϕa)u = d(ϕb)v ◦ d(ϕab)u.

Since d(ϕab)u : Rm → Rm is an isomorphism, the vectors dj := dϕab)u(ej)form a basis of Rm. Therefore, we have the mapping of Rm-frames

d(ϕa)u(e1, . . . , em) = d(ϕb)v(d1, . . . ,dm), (12.11)

which shows that Tϕb(v) = Tϕa(u) and hence makes the following definitionmeaningful.

12.4.3 Definition. The tangent space Tx to S at a point x ∈ S is defined asTϕa(u), where ϕa is any local parametrization of S with ϕa(u) = x. ♦

The next proposition gives an intrinsic characterization of tangent space.

12.4.4 Proposition. For x ∈ S let Λx denote the set of all vectors in Rn ofthe form α′(0), where α : (−r, r)→ S is a C1 curve with α(0) = x. Then

Tx = Λx = {z ∈ Rn : dFx(z) = 0} =n−m⋂i=1{z ∈ Rn : ∇Fi(x) · z = 0} .

Proof. Let ϕ be a local parametrization of S with ϕ(u) = x. A member of Tx

is of the form z =m∑i=1

aidϕu(ei). For small |t|, the curve

α(t) = ϕ

(u+ t

m∑i=1

aiei

)lies in S, α(0) = x, and, by the chain rule,

α′(0) = dϕu

( m∑i=1

aiei

)=

m∑i=1

aidϕ)u(ei) = z.

Therefore, z ∈ Λx. On the other hand, if α′(0) ∈ Λx, then differentiating theidentity (F ◦ α)(t) = 0 at t = 0 yields dFx

(α′(0)

)= 0. We have shown that

Tx ⊆ Λx ⊆ {z : dFx(z) = 0} .

Since dFx(z) = 0 has dimension m, the three spaces must be equal.


12.4.5 Remark. The proposition shows that if S1 is an m1-surface, S2 isan m2-surface, and ψ : S1 → S2 is C1, then for x ∈ S1 and y = ψ(x) thefunction dψx maps Tx into Ty. Indeed, if v ∈ Tx, then there exists a smoothcurve α1 : (−1, 1)→ S1 with α1(0) = x and α′1(0) = v. Then α2 =: ψ ◦ α1 isa smooth curve in S2 and dψx(v) = (ψ ◦ α1)′(0) = α′2(0) ∈ Ty. ♦

S1 S2

ψ

dψx

α2yxα1

v

FIGURE 12.16: The mapping dψx : Tx → Ty.

Orientation of an m-SurfaceLet S be an m-surface with local parameterizations ϕa : Ua → Rn. Since

ϕa is one-to-one, it is orientable. Suppose the parameterizations have the sameorientation, that is, sign(ϕa) = sign(ϕb) for all a and b. If u ∈ Ua, v ∈ Ub,and ϕa(u) = ϕb(v), then (12.11) shows that the orientation of Tϕb(v) agreeswith that of Tϕa(u) iff Jϕab

(v) > 0. Thus if Jϕab> 0 whenever Sa ∩ Sb 6= ∅,

then S may be given a well-defined orientation via the orientations of thelocal parameterizations. In this case, S is said to be orientable. The positiveorientation is obtained if each local parametrization is positively oriented.

Orientation of an (n− 1)-SurfaceOrientability of an (n − 1)-surface may be characterized in terms of the

normal vector fields ~Nϕa . For this we need the following lemma, which relates~Nϕa and ~Nϕb

on overlapping surface elements.12.4.6 Lemma. Let x := ϕa(u) = ϕb(v) ∈ Sa ∩ Sb, where u ∈ Ua, v ∈ Ub.Then

~Nϕa(x) = |Jϕab(u)|−1Jϕab

(u) ~Nϕb(x) = sign

(Jϕab

(u))~Nϕb

(x).

Proof. Since ϕa = ϕb ◦ ϕab and v = ϕab(u), the chain rule implies that

ϕ′a(u) = ϕ′b(v)ϕ′ab(u)

and∂(ϕa,1, . . . , ϕa,i, . . . , ϕa,n)

∂(u1, . . . , un−1) (u) = ∂(ϕb,1, . . . , ϕb,i, . . . , ϕb,n)∂(v1, . . . , vn−1) (v)Jϕab

(u).


From the first equation,√det(ϕ′a(u)tϕ′a(u)

)=√

det(ϕ′ab(u)tϕ′b(v)tϕ′b(v)ϕ′ab(u)

)= |Jϕab

(u)|√

det(ϕ′b(v)tϕ′b(v)

).

The assertion now follows by recalling that

~Nϕa(x) = 1√det(ϕ′a(u)tϕ′a(u)

) n∑i=1

(−1)i+n ∂(ϕa,1, . . . , ϕa,i, . . . , ϕa,n)∂(u1, . . . , un−1) (u)

and

~Nϕb

(x) = 1√

det(ϕ′b(v)tϕ′b(v)

) n∑i=1

(−1)i+n ∂(ϕb,1, . . . , ϕb,i, . . . , ϕb,n)∂(v1, . . . , vn−1) (v).

12.4.7 Theorem. An (n− 1)-surface S is orientable iff there exists a contin-uous vector field ~N on S such that

~N∣∣Sa

= ~Nϕa for each a ∈ S. (12.12)

Proof. If S is orientable, then Jϕab> 0, hence, by 12.4.6, ~Nϕa = ~Nϕb

onSa ∩ Sb. Therefore, (12.12) defines ~N unambiguously. Since ~Nϕa is easily seento be continuous on Sa and Sa is relatively open in S, ~N is continuous on S.

Conversely, assume there exists a continuous vector field ~N on S thatsatisfies (12.12). If x = ϕa(u) ∈ Sa ∩ Sb, then

~Nϕa(x) = ~N(x) = ~Nϕb(x),

hence, by 12.4.6, Jϕab(u) > 0. Therefore, S is orientable.

Let S be orientable with positive orientation. Then, by definition, the frame(d(ϕa)u(e1), . . . , d(ϕa)u(en−1)) in Ta is designated as positive (sign(ϕa) > 0)for each a ∈ S. Since the frame(

d(ϕa)u(e1), . . . , d(ϕa)u(en−1), ~N(a))

in Rn is positive (12.3.5), we say in this case that S is oriented by ~N . Thenotion of orientation by − ~N is defined analogously. For example, the sphereS = {(x1, . . . , xn) : ‖x‖ = r} is locally parameterized by the mappings ϕ andϕ of 12.3.8. The positive orientation is given by the unit normal vector field~N(p) = ‖p‖−1p, called the outward unit normal.

12.4.8 Corollary. If S = {x : F (x) = 0} is connected, then S is orientableand

~N = ‖∇F‖−1∇F or ~N = −‖∇F‖−1∇F.


Proof. Since ∇F (x) is perpendicular to S at x, the uniqueness of ~N impliesthat

~N(x) = s(x) ∇F (x)‖∇F (x)‖ , x ∈ S,

where s(x) = ±1 is constant on each surface element. Since the surface elementsare open in S, s(x) is continuous. Since S is connected, s(x) must be constanton S.

(n− 1)-Surfaces-with-BoundaryTo discuss surfaces-with-boundary, we shall need the following notation:

Rn−1+ :=

{y ∈ Rn−1 : yn−1 > 0

}.

Hn−1 :={y ∈ Rn−1 : yn−1 ≥ 0

}.

∂Hn−1 :={y ∈ Rn−1 : yn−1 = 0

}.

12.4.9 Definition. An (n− 1)-surface-with-boundary is a subset of Rn of theform

S = {x ∈W : F (x) = 0 and gi(x) ≥ 0, i = 1, . . . , k} ,

where W ⊆ Rn is open and F : W → R and gi : W → R are C1 and satisfythe following conditions:

(a) ∇F (x) 6= 0 for all x ∈ S.

(b) The sets Bi := {x ∈ S : gi(x) = 0} are pairwise disjoint.

(c) For each i and x ∈ Bi, the vectors ∇F (x) and ∇gi(x) are linearly inde-pendent.

The set ∂S:=⋃ki=1Bi is called the boundary of S and S \ ∂S is the interior.♦

x1

x2

x3B2 : g2(x) := 1− x3 = 0

B1 : g1(x) := x3 = 0

F (x) := x21 + x2

2 − 1 = 0

∇F

∇F

∇g2

∇g1

FIGURE 12.17: Cylinder-with-boundary: x21 + x2

2 = 1, 0 ≤ x3 ≤ 1.


If V denotes the open set {x ∈W : gi(x) > 0, i = 1, . . . , k}, then

S \ ∂S = {x ∈ V : F (x) = 0} .

Therefore, condition (a) implies that the interior of S is an (n − 1)-surface.Conditions (b) and (c) assert that the boundary of S is made up of disjoint(n− 2)-surfaces. Indeed, if Fi := (F, gi), then Bi = {x ∈W : Fi = 0} and

F ′i =[∇F∇gi

]has rank 2. Also, because the (n− 2)-surfaces Bi are pairwise disjoint, a localparametrization of Bi may be chosen to be disjoint from a local parametrizationof Bj .

The following theorem shows that, as in the case of an (n − 1)-surface,an (n − 1)-surface-with-boundary may be described by a collection of localparameterizations.

12.4.10 Theorem. Let S be an (n− 1)-surface-with-boundary.

(a) If a ∈ S \ ∂S, then there exists a local parametrization ϕa : Ua → Rn ofS \ ∂S at a with ϕa(0) = a.

(b) If a ∈ ∂S, then there exists an open set Ua ⊆ Rn−1 and a one-to-oneparameterized (n− 1)-surface ϕa : Ua → Rn−1 with ϕa(0) = a such thatif Ua := Ua ∩ Hn−1 and ϕa := ϕa

∣∣Ua

, then

(i) ϕa

(Ua

)is open in S,

(ii) ϕa

(Ua ∩ Rn−1

+)is open in S \ ∂S, and

(iii) ϕa

(Ua ∩ ∂Hn−1) is open in ∂S.

S

∂S

a

ϕa

(Ua

)

ϕa

(Ua ∩ Rn−1

+

)

ϕa

(Ua ∩ ∂Hn−1

+

)

FIGURE 12.18: Surface element Sa = ϕa

(Ua ∩ Hn−1).

Proof. Part (a) follows from 12.4.1, since S \ ∂S is an (n− 1)-surface withoutboundary.

For part (b), we may assume without loss of generality that ∂S ={x ∈ S : g(x) = 0}. Choose a local parametrization ψa : Wa → Rn of


S′ := {x ∈W : F (x) = 0} such that ψa(0) = a. Since ψa has rank n− 1 andg has rank 1, ∂i(g ◦ ψa)

(0))6= 0 for some i. Define Ha : Wa → Rn−1 by

Ha(w1, . . . , wn−1) =(w1, . . . , wi−1, wi+1, . . . , wn−1, g ◦ ψa(w1, . . . , wn−1)

).

Then Ha has rank n− 1 at 0, hence, by the inverse function theorem, thereexist open sets Wa ⊆Wa and Ua = Ha(Wa) in Rn−1 with 0 ∈ Wa such thatHa is one-to-one on Wa and H−1

a : Ua → Wa is C1. Set

ϕa = ψa ◦H−1a : Ua → S′.

If u = Ha(w) ∈ Ua, then g ◦ψa(w) = g ◦ψa ◦H−1a (u) = g ◦ ϕa(u), hence, by

definition of Ha,

(u1, . . . , un−1) =(w1, . . . , wi−1, wi+1, . . . , wn−1, g ◦ ϕa(u)

).

Therefore, un−1 = g ◦ ϕa(u), so

g ◦ ϕa(u) > 0 iff u ∈ Ua ∩ Rn−1+ and g ◦ ϕa(u) = 0 iff u ∈ Ua ∩ ∂Hn−1.

It follows that

ϕa

(Ua ∩ Rn−1

+)

= (S \ ∂S) ∩ ψ(Wa

)and ϕa

(Ua ∩ ∂Hn−1) = ∂S ∩ ψ

(Wa

).

Since ψ(Wa

)is open in S′ and S′ ⊇ S, (i)–(iii) follow.

Oriented (n− 1)-Surfaces-with-BoundaryAs in the non-boundary case, orientation of an (n−1)-surface-with-boundary

S may be defined in terms of local parameterizations. By 12.4.4, the (n− 1)-dimensional tangent space at a ∈ S is

T Sa = {z ∈ Rn : z ·∇F (a) = 0} .

The new feature here is that if a ∈ ∂S, say a ∈ Bi, then there is also an(n− 2)-dimensional tangent space to ∂S at a, namely,

T ∂Sa = {z ∈ Rn : z ·∇F (a) = z ·∇gi(a) = 0} .

The connection between T Sa and T ∂Sa is described as follows: Let ϕa be alocal parametrization of S as described in part (b) of 12.4.10, where ϕa(0) =a. Since ϕa(Ua ∩ ∂Hn−1) ⊆ ∂S and (e1, . . . , en−2) is a frame for ∂Hn−1,d(ϕa)0(e1, . . . , en−2) is a frame for T ∂Sa . Since the vector d(ϕa)0(−en−1) isnot in the subspace T ∂Sa ,

d(ϕa)0(−en−1, e1, . . . , en−2) (12.13)

is a frame for T Sa . The induced orientation of ∂S is obtained by declaring theframe d(ϕa)0(e1, . . . , en−2) of T ∂Sa to have the sign of the frame (12.13). If Sis positively oriented, then this sign is (−1)n−1.

S

∂S

T ∂Sa

T SaR+

Hn−1

∂Hn−1

−en−1

0−→N ϕ(a)

a

ϕa

Ua

d(ϕa)0(−en−1)

d(ϕa)0(e1)

n = 3

FIGURE 12.19: Induced orientation of T ∂Sa .

Figure 12.19 depicts the case n = 3. Here, S is oriented by the normal~N (pointing outward). Therefore, by definition, the frame d(ϕa)0(e1, e2) ispositive in T Sa , hence so is the frame d(ϕa)0(−e2, e1). Thus, again by definition,the frame d(ϕa)0(e1) of T ∂Sa is positive in the induced orientation. Note thatbecause the frame (d(ϕa)0(e1), d(ϕa)0(e2), ~Nϕ) in R3 is positive (12.3.5), sois the frame (d(ϕa)0(−e2), d(ϕa)0(e1), ~Nϕ). The latter therefore forms a right-handed system in R3. Thus if d(ϕa)0(−e2) points upward, then d(ϕa)0(e1)must point in the direction shown. Therefore, the induced orientation of ∂Sis the one for which the surface S is on the left when ∂S is traversed in thedirection of the tangent vectors d(ϕa)0(e1).

Exercises1. Let 0 < a < b. Show that the mapping

ϕ(φ, θ) =(a cosφ, (b+ a sinφ) cos θ, (b+ a sinφ) sin θ

), 0 < θ, φ < 2π,

is a local parametrization of the torus x2 +(√

y2 + z2 − b)2 = a2 with

two circles missing.

2. Let U ={x ∈ Rn−1 : ‖x‖ < 1

}and define a local parametrization

ψ : U → Sn−1 = {y ∈ Rn : ‖y‖ = 1} by

ψ(x) =(x,√

1− ‖x‖2), x ∈ Rn−1

Give a geometric description of ψ. Referring to 12.4.2, find the transitionmapping ϕ−1 ◦ ψ.

3.S Consider the stereographic projection ϕ−11 (y) = x from p onto the

hyperplane xn = −1 shown in Figure 12.20, where y = (y1, . . . , yn) andx = (x1, . . . , xn−1). Calculate ϕ1(x) and ϕ−1

1 (y) and find the transitionmapping ϕ−1 ◦ ϕ1, where ϕ is the mapping of 12.4.2.

p = (0, . . . , 0, 1)

Sy

(x,−1)

0

q = (0, . . . , 0,−1)

Rn

FIGURE 12.20: Stereographic projection ϕ−11 (y) from p.

4. Replace the sphere in 12.4.2 by the elliptic paraboloid

S ={

(y1, y2, y3) : y3 =(y1

a1

)2+(y2

a2

)2, y3 < 1

}

(with p = (0, 0, 1)) and find the corresponding maps ϕ and ϕ−1.

5.S Repeat Exercise 4 using the elliptic cone

S ={

(y1, y2, y3) : y23 =

(y1

a1

)2+(y2

a2

)2, 0 < y3 < 1

}.

6. Repeat Exercise 4 using the ellipsoid

S ={

(y1, y2, y3) :(y1

a1

)2+(y2

a2

)2+ y2

3 = 1}.

7. Find the equation of the tangent plane Ta at a = (1, 1, 1) for each of thefollowing surfaces:(a)S x2

1 + 2x22 + 3x2

3 = 6. (b) x21 + x2

2 − 2x23 = 0. (c) x2

1 − x22 + x3 = 1.

8. An n× n matrix A is said to be orthogonal if AtA is the identity matrix.Identifying a 2 × 2 matrix [ x1 x2

x3 x4 ] with the point (x1, x2, x3, x4), showthat the collection of all 2 × 2 orthogonal matrices is a 1-surface Sin R4. Characterize the matrices in the tangent space to S at each of thefollowing points:

(a)[1 00 1

]. (b)

[−1 0

0 1

]. (c)

[0 11 0

]. (d)

[1/√

2 −1/√

21/√

2 1/√

2

].

The matrices in the tangent space at the point in part (a) are the so-called2× 2 skew-symmetric matrices.


9. Referring to 12.4.2, let y ∈ S and set T := d(ϕ−1)y : Ty → Rn−1.

(a)S Prove that ‖T (v)‖ = 1(1− yn)‖v‖ for all v ∈ Ty.

(b) Use (a), the bilinearity of v ·w and T (v) · T (w), and the identity2v ·w = ‖v +w‖2 − ‖v‖2 − ‖w‖2

to prove thatv ·w‖v‖‖w‖

= T (v) · T (w)‖T (v)‖‖T (w)‖ , v, w ∈ Ty.

Thus, by 12.4.4, the stereographic projection preserves the angle at theintersection of a pair of simple smooth curves on S.

10. Let each of the following 2-surfaces-with-boundary be positively oriented.Find parametrizations of the boundary curves that are compatible withthe induced orientation on the boundary.(a) S =

{(x1, x2, x3) : x2

1 + x22 = 1, 0 ≤ x3 ≤ 2− x2

}.

(b) S ={

(x1, x2, x3) : x3 = x21 + x2

2, 0 ≤ x3 ≤ 1− x1 − x2}.

(c) S ={

(x1, x2, x3) : x21 + x2

2 + x23 = 4, −2 ≤ x3 ≤ 3− x1 − x2

}.

Hint. For (c) the boundary is a circle on the plane x1 + x2 + x3 = 3.Translate and rotate that plane into the plane x3 = 0, find a parametricequation of the rotated circle with center 0, then reverse the proce-dure to find the parametrization of the original circle with appropriateorientation.

11. Let S = {x : F (x) = 0} be an oriented 2-surface in R3, where F is C2.(a)S The tangent bundle of S is the set

TS =⋃

x∈S{x} × Tx.

Show thatTS =

{(x,v) ∈ R6 : F (x) = 0 and v ·∇F (x) = 0

}and that TS is a 4-surface in R6.(b) The sphere bundle of S is the subset

T 1S := {(x,v) ∈ TS : ‖v‖ = 1} .

Show that T 1S is a 3-surface in R6.

(c) Let S ={x ∈ R3 : ‖x‖2 = 3

}. Show that the tangent space to the

sphere bundle T 1S at the point (1, 1, 1, 1/

√6, 1/√

6,−2/√

6) consists ofall vectors w ∈ R6 satisfying the system

w1 + w2 + w3 = 0−3√

6w3 + w4 + w5 + w6 = 0w4 + w5 − 2w6 = 0

Chapter 13Integration on Surfaces

Throughout the chapter m and n arefixed positive integers with 1 ≤ m ≤ n.

In this chapter we construct the integral of a differential m-form on anm-surface in Rn, a generalization of the line integral of a 1-form on a curve.This will provide the necessary context for the divergence theorem and thetheorems of Green and Stokes, far-reaching generalizations of the fundamentaltheorem of calculus

13.1 Differential FormsAlternating Multilinear Functionals

An m-multilinear functional on Rn is a real-valued function

M(a1, . . . ,am), a1, . . . ,am ∈ Rn,

that is linear in each variable ai separately. (See Section 9.7.) Such a functionis said to be alternating if interchanging two vectors changes the sign of M :

M(a1, . . . ,ai, . . . ,aj , . . . ,am) = −M(a1, . . . ,aj , . . . ,ai, . . . ,am).

Thus if ai = aj , then M(a1, . . . ,am) = 0. Note that a linear combination ofalternating m-multilinear functionals is an alternating m-multilinear functional.

A permutation of (1, . . . ,m) is a one-to-one function σ mapping {1, . . . ,m}onto itself, frequently denoted by (i1, . . . , im), where ik = σ(k). The sign (−1)σof σ is positive (negative) if an even (odd) number of adjacent interchangesare required to transform (i1, . . . , im) back to (1, . . . ,m) (see Appendix B). Itfollows that if M is an alternating m-multilinear functional, then

M(aσ(1), . . . ,aσ(m)) = (−1)σM(a1, . . . ,am).

An important example is the determinant of an n × n matrix, which is

447

multilinear and alternating on its rows as well as its columns. To build on this,we introduce the following notation. Define

Jm = {j := (j1, . . . , jm) : 1 ≤ jk ≤ n} , andIm = {i := (i1, . . . , im) : 1 ≤ i1 < i2 < · · · < im ≤ n} .

Thus Jm is the set of all m-tuples of (possibly repeated) indices in {1, . . . , n}and Im the set of all strictly increasing m-tuples in Jm. In particular, In ={(1, . . . , n)}.

Now let A be an n×m matrix with columns a1, . . . ,am ∈ Rn and B anm× n matrix with rows b1, . . . , bm ∈ Rn. For any member j = (j1, . . . , jm) ofJm define Aj to be the m×m matrix whose rth row is row jr of A and defineBj to be the m×m matrix whose cth column is column jc of B, that is,

Aj =[a1 · · ·am

]j =

a1

1 a21 · · · am1

a12 a2

2 · · · am2...

......

a1n a2

n · · · amn

j

=

a1j1

a2j1· · · amj1

a1j2

a2j2· · · amj2

......

...a1jm

a2jm

· · · amjm

and

Bj =

b1...bm

j

=

b11 b21 · · · bn1b12 b22 · · · bn2...

......

b1m b2m · · · bnm

j

=

bj11 bj2

1 · · · bjm1bj12 bj2

2 · · · bjm2...

......

bj1m bj2

m · · · bjmm

Thus j selects rows from A and columns from B. For example, for n = 4 andm = 3,

1 2 34 5 67 8 910 11 12

(4,2,1)

=

10 11 124 5 61 2 3

and 1 2 3 4

5 6 7 89 10 11 12

(4,4,1)

=

4 4 18 8 512 12 9

.Finally, define the alternating m-multilinear functional dxj = dxj1,...,jm on Rnby

dxj(a1, . . . ,am

)= det[a1 · · ·am]j.

Note that if m = 1, the definition reduces to dxj(a) = aj , as defined inSection 9.7.

Integration on Surfaces 449

13.1.1 Lemma. If i = (i1, · · · , im) and j = (j1, · · · , jm) ∈ Im, then

dxi(ej1 , . . . , ejm

)={

1 if i = j,0 otherwise,

where e1, . . . , en are the standard basis vectors in Rn.

Proof. By definition,

dxi(ej1 , . . . , ejm

)= det

ej11 · · · ejm1...

...ej1n · · · ejmn

i

=

∣∣∣∣∣∣∣ej1i1· · · ejmi1...

...ej1im· · · ejmim

∣∣∣∣∣∣∣ ,where eji = 1 if i = j and 0 otherwise. If j1 < i1, then j1 < i` for every `, hencethe first column is zero and the determinant is zero. Similarly, if j1 > i1, thenthe first row is zero and, again, the determinant is zero. If j1 = i1, then thedeterminant reduces to ∣∣∣∣∣∣∣

ej2i2· · · ejmi2...

...ej2im· · · ejmim

∣∣∣∣∣∣∣ ,and an induction argument completes the proof.

13.1.2 Lemma. Let M and M ′ be alternating m-multilinear functionals onRn. If

M(ei1 , . . . , eim) = M ′(ei1 , . . . , eim) (13.1)

for all (i1, . . . , im) ∈ Im, then M = M ′.

Proof. For j = 1, . . . ,m, let aj = (aj1, . . . , ajn) =∑ni=1 a

jiei. By multilinearity,

M(a1, . . . ,am) = M

(n∑i=1

a1iei, . . . ,

n∑i=1

ami ei

)

=n∑

i1=1· · ·

n∑im=1

a1i1 · · · a

mimM(ei1 , . . . , eim),

with the analogous equality holding for M ′. It therefore suffices to show that

M(ei1 , . . . , eim) = M ′(ei1 , . . . , eim).

This is clear if two of the indices ik are equal, since then both sides are zero.If the indices are distinct, then, by permuting the vectors ei1 , . . . , eim andattaching the appropriate signs, the indices may be brought into increasingorder, and the desired equality then follows from the hypothesis.


13.1.3 Theorem. If M is an alternating m-multilinear functional on Rn,then

M =∑

(i1,...,im)∈Im

M(ei1 , . . . , eim) dxi1,··· ,im .

Proof. Let M ′ denote the alternating m-multilinear functional on the right. If(j1, . . . , jm) ∈ Im, then

M ′(ej1 , . . . , ejm) =∑

(i1,...,im)∈Im

M(ei1 , . . . , eim) dxi1,...,im(ej1 , . . . , ejm)

= M(ej1 , . . . , ejm),

the second equality from 13.1.1. By 13.1.2, M = M ′.

The following application of 13.1.3 will be needed later in connection withintegration on surfaces.

13.1.4 Binet–Cauchy Product. Let C be an m×n matrix and D an n×mmatrix. Then

det(CD) =∑i∈Im

(detCi)( detDi

).

Proof. Let c1, . . ., cm ∈ Rn denote the rows of C and d1, . . ., dm ∈ Rn thecolumns of D, the latter considered as variables. Define

M(d1, . . . ,dm

)= det(CD) = det

(ci · dj

)m×m .

Then M is an alternating m-multilinear form and, by 13.1.3,

M(d1, . . . ,dm

)=

∑(i1,...,im)∈Im

M(ei1 , . . . , eim) dxi1,...,im(d1, . . . ,dm

).

Since M(ei1 , . . . , eim

)= detCi and dxi

(d1, . . . ,dm

)= detDi, the conclusion

follows.

13.1.5 Corollary. If C and D are n× n matrices, then

det(CD) = (detC)(detD).

13.1.6 Corollary. If A is an n×m matrix, then

det(AtA) =∑i∈Im

[det(Ai)]2.

Proof. Take C = At and D = A in the theorem and note that Ci = (Ai)t, so(detCi)(detDi

)=(

detAit)(

detAi)

= [detAi]2.

From 13.1.6, we have

13.1.7 Corollary. Let A be an n × m matrix. Then A has rank m iffdet(AtA) 6= 0.


Definition of a Differential FormA differential m-form on a set S ⊆ Rn is a function ω that assigns to each

x ∈ S an alternating m-multilinear functional ωx on Rn. We shall usually dropthe qualifier “differential” when referring to forms. The integer m is called thedegree of the form. A 0-form is simply a real-valued function on S.

By 13.1.3, if ω is an m-form, then for each i ∈ Im there exists a uniquefunction gi on S such that

ωx =∑i∈Im

gi(x) dxi, x ∈ S.

Conversely, if fj is a real-valued function on S, then

ωx :=∑

j∈Jm

fj(x) dxj, x ∈ S, (13.2)

defines an m-form on S. If each fj is of class Cr on S (that is, on an open setcontaining S), then ω is called a differential form of class Cr or simply a Crform, where r ∈ Z+ ∪ {+∞}.

The Algebra of Differential FormsFor a ∈ R and m-forms

ω =∑

j∈Jm

fj dxj and η =∑

j∈Jm

gj dxj

on S, define m-forms aω and ω + η on S by

aω :=∑

j∈Jm

afj dxj and ω + η :=∑

j∈Jm

(fj + gj) dxj.

The collection of m-forms on S is easily seen to be a vector space under theseoperations.

It is also possibly to multiply forms. For this, the notation

dxj1,...,jm = dxj1 ∧ · · · ∧ dxjm (13.3)

will be useful. The right side may be interpreted as a product of differentials,called a wedge product and made precise below. Because dxj1,...,jm(a1, . . . ,am)is a determinant, interchanging a pair of differentials in (13.3) changes the signof the product. Furthermore, if there are duplicate indices, then the product iszero. Thus we have the “rules”

dxj ∧ dxi = −dxi ∧ dxj and dxi ∧ dxi = 0. (13.4)

Using these rules, one can reduce any m-form to its unique canonical represen-tation

ω =∑

(i1,...,im)∈Im

gi1,...,imdxi1 ∧ · · · ∧ dxim .


For example, the 3-form in R4

ω = f dx2 ∧ dx1 ∧ dx2 + g dx3 ∧ dx2 ∧ dx1 + h dx2 ∧ dx4 ∧ dx1

has canonical representation

ω = −g dx1 ∧ dx2 ∧ dx3 + h dx1 ∧ dx2 ∧ dx4.

13.1.8 Definition. Let 1 ≤ p, q ≤ n. The wedge product or exterior productof the forms

ω =∑

(j1,...,jp)∈Jp

fj1,...,jp dxj1 ∧ · · · ∧ dxjp and η =∑

(k1,...,kq)∈Jq

gk1,...,kq dxk1 ∧ · · · ∧ dxkq

is the form

ω ∧ η :=∑

(j1,...,jp)∈Jp(k1,...,kq)∈Jq

fj1,...,jpgk1,...,kq dxj1 ∧ · · · ∧dxjp ∧dxk1 ∧ · · · ∧dxkq . (13.5)

If f is a 0-form on S, then the p-form fω = f ∧ ω is defined by

f ∧ ω :=∑

(j1,...,jp)∈J

ffj1,...,jpdxj1 ∧ · · · ∧ dxjp . ♦

Note that the right side of (13.5) may be obtained by formally multiplyingthe sums defining ω and η, where the product of forms dxi1 ∧ · · · ∧ dxip anddxj1 ∧ · · · ∧ dxjq is defined as

dxj1 ∧ · · · ∧ dxjp ∧ dxk1 ∧ · · · ∧ dxkq .

The rules in (13.4) may then be used to obtain the canonical representation ofω ∧ η. The resulting form has degree ≤ n, in compliance with our definition.

13.1.9 Example. In R4,

(a) (f1 dx1 + f2 dx2 + f3 dx3 + f4 dx4) ∧ (g1 dx1 + g2 dx2)= (f1g2 − f2g1) dx1 ∧ dx2 − f3g1 dx1 ∧ dx3 − f3g2 dx2 ∧ dx3

− f4g2 dx2 ∧ dx4 − f4g1 dx1 ∧ dx4.

(b) (f1 dx1 + f2 dx2 + f3 dx3 + f4 dx4) ∧ (h1 dx1 ∧ dx3 + h2 dx2 ∧ dx4)= f1h2 dx1 ∧ dx2 ∧ dx4 − f2h1 dx1 ∧ dx2 ∧ dx3

− f3h2 dx2 ∧ dx3 ∧ dx4 + f4h1 dx1 ∧ dx3 ∧ dx4. ♦

It must still be shown that the definition of ω ∧ η in (13.5) is independentof the particular representations of ω and η. To see this, apply the rules in(13.4), first on the indices jp and then on the indices kq, to reduce the rightside of (13.5) to∑

(i1,...,ip)∈Ip(i′1,...,i

′q)∈Iq

fi1,...,ip gi′1,...,i′p dxi1 ∧ · · · ∧ dxip ∧ dxi′1 ∧ · · · ∧ dxi′q


where ∑(i1,...,ip)∈Ip

fi1,...,ip dxi1 ∧ · · · ∧ dxip and∑

(i′1,...,i′q)∈Iq

gi′1,...,i′q dxi′1 ∧ · · · ∧ dxi′q

are the canonical representations of ω and η. Since the latter are unique, everyversion of ω ∧ η may be reduced to the same form, hence ω ∧ η is well-defined.

13.1.10 Proposition. Let ω be a p-form, η a q-form, and ν an r-form, where1 ≤ p, q, r ≤ n. Then

(a) ω ∧ η is linear in each variable separately;

(b) (ω ∧ η) ∧ ν = ω ∧ (η ∧ ν);

(c) η ∧ ω = (−1)pqω ∧ η.

Proof. The straightforward proofs of (a) and (b) are left to the reader. For theproof of (c), let ω and η be as in 13.1.8. Then

η ∧ ω =∑

(k1,...,kq)∈Jq(j1,...,jp)∈Jp

gk1,...,kqfj1,...,jp dxk1 ∧ · · · ∧ dxkq ∧ dxj1 ∧ · · · ∧ dxjp

=∑

(k1,...,kq)∈Jq(j1,...,jp)∈Jp

gk1,...,kqfj1,...,jp (−1)pqdxj1 ∧ · · · ∧ dxjp ∧ dxk1 ∧ · · · ∧ dxkq

= (−1)pqω ∧ η,

the last equality because pq adjacent interchanges are required.

13.1.11 Proposition. Let aj =∑ni=1 a

jiej, j = 1, . . . , n. Then(

n∑i=1

a1i dxi

)∧ · · · ∧

(n∑i=1

ani dxi

)= det[a1 · · ·an]dx1 ∧ · · · ∧ dxn.

Proof. By properties of the wedge product, the left side of the equation isn∑

i1=1· · ·

n∑in=1

a1i1 · · · a

nindxi1 ∧ · · · ∧ dxin =

∑i1,...,in distinct

a1i1 · · · a

nindxi1 ∧ · · · ∧ dxin .

If σ = (i1, . . . , in), then dxi1 ∧ · · · ∧ dxin = (−1)σdx1 ∧ · · · ∧ dxn, and theassertion follows from the definition of determinant.

The proposition provides an alternate method for evaluating determinants.


13.1.12 Example. Let A =

1 3 52 4 63 −2 1

. By wedge product rules applied to

the forms constructed from the columns,

(1 dx1 + 2 dx2 + 3 dx3) ∧ (3 dx1 + 4 dx2 − 2 dx3) ∧ (5 dx1 + 6 dx2 + 1 dx3)= (−2 dx1,2 − 11 dx1,3 − 16 dx2,3) ∧ (5 dx1 + 6 dx2 + dx3)= (−2 dx1,2,3 + 66 dx1,2,3 − 80 dx1,2,3, )= −16 dx1,2,3,

hence det(A) = −16. ♦

The Differential of a Form13.1.13 Definition. The differential of a 0-form f of class C1 on S ⊆ Rn isits differential as a C1 function, namely, the 1-form

df =n∑j=1

(∂jf)dxj .

The differential of an m-form

ω =∑

(j1,...,jm)∈J

fj1,...,jm dxj1 ∧ · · · ∧ dxjm

of class C1 on S is the (m+ 1)-form dω defined by

dω =∑

(j1,...,jm)∈Jm

(dfj1,...,jm) ∧ dxj1 ∧ · · · ∧ dxjm (13.6)

=∑

(j1,...,jm)∈J

n∑j=1

(∂jfj1,...,jm) dxj ∧ dxj1 ∧ · · · ∧ dxjm . ♦

Note that if m = n, then dω = 0, since in the last expression every dxj is adxji for some i.

As in the case of wedge products, it must be verified that the definition ofdω does not depend on the particular representation of ω. For this we use therules in (13.4) to express ω canonically as

ω =∑

(i1,...,im)∈Im

gi1,...,im dxi1 ∧ · · · ∧ dxim .

Here, each gi1,...,im is a linear combination the functions fj1,...,jm produced bycombining these functions during the reduction process. Applying the samesequence of operations to the sum on the right in (13.6) results in∑

(i1,...,im)∈Im

ηi1,...,im ∧ dxi1 ∧ dxi2 ∧ · · · ∧ dxim ,


where ηi1,...,im is precisely the same linear combination of the formsdfj1,...,jm .Since the differential is linear on 0-forms, ηi1,...,im = dgi1,...,im . Therefore, allversions of dω may be reduced to the same form and hence are equal.

For the next example, we introduce the following notation and terminologyfrom classical vector analysis.

13.1.14 Definition. The curl of a C1 vector field ~F = (f1, f2, f3) on an opensubset of R3 is the vector

curl ~F = (∂2f3 − ∂3f2) e1 + (∂3f1 − ∂1f3) e2 + (∂1f2 − ∂2f1) e3.

The divergence of a C1 vector field ~F = (f1, . . . , fn) on an open subset of Rnis defined by

div ~F =n∑i=1

∂ifi.

If ω =∑nj=1 fj dxj we define divω = div ~F . ♦

13.1.15 Example. In R3,

(a) d(f1 dx1 + f2 dx2 + f3 dx3

)= (∂1f1 dx1 + ∂2f1 dx2 + ∂3f1 dx3) ∧ dx1

+ (∂1f2 dx1 + ∂2f2 dx2 + ∂3f2 dx3) ∧ dx2

+ (∂1f3 dx1 + ∂2f3 dx2 + ∂3f3 dx3) ∧ dx3

= (∂2f3 − ∂3f2) dx2,3 + (∂3f1 − ∂1f3) dx3,1 + (∂1f2 − ∂2f1) dx1,2

= e1 · curl ~F dx2,3 + e2 · curl ~F dx3,1 + e3 · curl ~F dx1,2.

(b) d(f3 dx1 ∧ dx2 + f1 dx2 ∧ dx3 + f2 dx3 ∧ dx1)= (∂1f3 dx1 + ∂2f3 dx2 + ∂3f3 dx3) ∧ dx1 ∧ dx2

+ (∂1f1 dx1 + ∂2f1 dx2 + ∂3f1 dx3) ∧ dx2 ∧ dx3

+ (∂1f2 dx1 + ∂2f2 dx2 + ∂3f2dx3) ∧ dx3 ∧ dx1

= (∂1f1 + ∂2f2 + ∂3f3) dx1 ∧ dx2 ∧ dx3

= div ~F dx1 ∧ dx2 ∧ dx3. ♦

13.1.16 Theorem. Let f be a 0-form, let ω and η be p-forms, and let ν be aq form, all of class C1 on S ⊆ Rn. Then

(a) d(aω + bη) = a dω + b dη, a, b ∈ R;(b) d2ω := d(dω) = 0;(c) d(ω ∧ ν) = (dω) ∧ ν + (−1)pω ∧ (dν);(d) d(fν) = (df) ∧ ν + fdν.

Proof. Part (a) is clear from the definition of addition and scalar multiplicationof m-forms and the linearity of the differential operator on 0-forms.


For (b), it suffices by linearity to prove that

d[(df) dxj1 ∧ dxj2 ∧ · · · ∧ dxjp

]= 0.

The left side of this equation is

d

[ n∑k=1

(∂kf)dxk ∧ dxj1 ∧ · · · ∧ dxjp]

=[( n∑

j=1

n∑k=1

∂j∂kf)dxj ∧ dxk

]∧ dxj1 ∧ · · · ∧ dxjp .

Since dxk ∧ dxj = −dxj ∧ dxk and ∂j∂kf = ∂k∂jf , the terms in the squarebrackets on the right cancel pairwise, producing zero, as required.

To prove (c), let

ω =∑j∈Jp

fj dxj and ν =∑

k∈Jq

gk dxk.

By the product rule for differentials of 0-forms,

d(ω ∧ ν) =∑

j∈Jp,k∈Jq

d(fjgk) ∧ dxj ∧ dxk

=∑

j∈Jp,k∈Jq

gk(dfj) ∧ dxj ∧ dxk +∑

j∈Jp,k∈Jq

fj(dgk) ∧ dxj ∧ dxk

= (dω) ∧ ν + (−1)−pω ∧ (dν),

the last equality because p adjacent interchanges are needed to place the formdgk in the second sum to the immediate left of dxk.

Part (d) follows from (c) with p = 0.

The Pullback of a FormThroughout this subsection, U ⊆ Rm and W ⊆ Rn

are open and ϕ : U →W is a C1 map.

13.1.17 Definition. The pullback by ϕ of a C1 function (0-form) f on W isthe 0-form ϕ∗(f) on U defined by

ϕ∗(f)(u) := f(ϕ(u)

), u ∈ U.

The pullback by ϕ of the 1-form dxj on W is the 1-form ϕ∗(dxj) on U definedby

ϕ∗(dxj) :=m∑i=1

∂ϕj∂ui

dui = dϕj , j = 1, . . . , n.


The pullback by ϕ of the C1 p-form

ω =∑

(j1,...,jp)∈Jp

fj1,...,jp dxj1 ∧ · · · ∧ dxjp

on W is the C1 p-form ϕ∗ω on U defined by

ϕ∗ω :=∑

(j1,...,jp)∈Jp

ϕ∗(fj1,...,jp)ϕ∗(dxj1) ∧ · · · ∧ ϕ∗(dxjp). ♦

Arguments similar to those used earlier show that the definition ofϕ∗ω isindependent of the representation of ω.

13.1.18 Example. Let ϕ = (ϕ1, ϕ2, ϕ3) : R2 → R3 be C1. Then

(a) ϕ∗(f dx1 ∧ dx2) = ϕ∗(f)ϕ∗( dx1) ∧ ϕ∗( dx2)

= (f ◦ ϕ)(∂ϕ1

∂u1du1 + ∂ϕ1

∂u2du2

)∧(∂ϕ2

∂u1du1 + ∂ϕ2

∂u2du2

)= (f ◦ ϕ)

(∂ϕ1

∂u1

∂ϕ2

∂u2− ∂ϕ2

∂u1

∂ϕ1

∂u2

)du1 ∧ du2.

(b) ϕ∗(f1 dx1 + f2 dx2 + f3 dx3

)= ϕ∗(f1)ϕ∗( dx1) + ϕ∗(f2)ϕ∗( dx2) + ϕ∗(f3)ϕ∗( dx3

)= (f1 ◦ ϕ)

(∂ϕ1

∂u1du1 + ∂ϕ1

∂u2du2

)+ (f2 ◦ ϕ)

(∂ϕ2

∂u1du1 + ∂ϕ2

∂u2du2

)+ (f3 ◦ ϕ)

(∂ϕ3

∂u1du1 + ∂ϕ3

∂u2du2

)=[(f1 ◦ ϕ)∂ϕ1

∂u1+ (f2 ◦ ϕ)∂ϕ2

∂u1+ (f3 ◦ ϕ)∂ϕ3

∂u1

]du1

+[(f1 ◦ ϕ)∂ϕ1

∂u2+ (f2 ◦ ϕ)∂ϕ2

∂u2+ (f3 ◦ ϕ)∂ϕ3

∂u2

]du2. ♦

13.1.19 Theorem. If ω and η are C1 p-forms and ν is a C1 q-form, then

(a) ϕ∗(aω + bη) = aϕ∗(ω) + bϕ∗(η), a, b ∈ R;

(b) ϕ∗(ω ∧ ν) = ϕ∗(ω) ∧ ϕ∗(ν);

(c) ϕ∗(dω) = dϕ∗(ω);

(d) (ϕ∗ω)u(a1, . . . ,ap) = ωϕ(u)(dϕu(a1), . . . , dϕu(ap)).

Proof. Part (a) follows directly from the definition of pullback. Part (b) is easilyestablished for ω = f dxi1 ∧ · · · ∧ dxip and ν = g dxj1 ∧ · · · ∧ dxjq ; bilinearityof the wedge product and linearity of ϕ∗ then imply that (b) holds generally.

For (c) it suffices, by linearity of the differential and pullback, to verifythat

ϕ∗(d(f dxj1 ∧ · · · ∧ dxjp)

)= dϕ∗(f dxj1 ∧ · · · ∧ dxjp),


that is,n∑j=1

[(∂jf) ◦ ϕ]ϕ∗( dxj) ∧ ϕ∗(dxj1) ∧ · · · ∧ ϕ∗(dxjp)

=( m∑i=1

∂i(f ◦ ϕ)dui)ϕ∗(dxj1) ∧ · · · ∧ ϕ∗(dxjp) (13.7)

By the chain rule,

∂i(f ◦ ϕ) =n∑j=1

[(∂jf) ◦ ϕ

]∂ϕj∂ui

,

hence the right side of (13.7) is

n∑j=1

[(∂jf) ◦ ϕ

]( m∑i=1

∂ϕj∂ui

dui

)∧ ϕ∗(dxj1) ∧ · · · ∧ ϕ∗(dxjp).

Recalling the definition of ϕ∗( dxj), we see that the last expression is preciselythe left side of (13.7).

To prove (d), let ω have canonical representation

ω =∑

(i1,...,ip)∈Ip

fi1,...,ip dxi1 ∧ · · · ∧ dxip .

By 13.1.2, it suffices to show that

(ϕ∗ω)u(e`1 , . . . , e`p) = ωϕ(u)(dϕu(e`1), . . . , dϕu(e`p))

for any (`1, . . . , `p) ∈ Ip. The left side of this equation is∑i∈Ip

ϕ∗(fi)(u)(ϕ∗(dxi1) ∧ · · · ∧ ϕ∗(dxip)

)(e`1 , . . . , e`p)

and the right side is∑i∈Ip

fi(ϕ(u)

)(dxi1 ∧ · · · ∧ dxip

)(dϕu(e`1), . . . , dϕu(e`p))

Hence it suffices to prove that(ϕ∗(dxi1) ∧ · · · ∧ ϕ∗(dxip)

)(e`1 , . . . , e`p)

=(dxi1 ∧ · · · ∧ dxip

)(dϕu(e`1), . . . , dϕu(e`p)) (13.8)

By multilinearity,

ϕ∗(dxi1) ∧ · · · ∧ ϕ∗(dxip) =∑

(j1,...,jp)∈Jp

∂ϕi1∂uj1

· · ·∂ϕip∂ujp

duj1 ∧ · · · ∧ dujp .


Now, duj1 ∧ · · · ∧ dujp(e`1 , . . . , e`p) 6= 0 only if the p-tuple (j1, . . . , jp) is apermutation of (`1, . . . , `p). For each such p-tuple define a permutation σ of(1, . . . , p) such that `k = jσ(k). Then

duj1 ∧ · · · ∧ dujp(e`1 , . . . , e`p) = (−1)σ du`1 ∧ · · · ∧ du`p(e`1 , . . . , e`p) = (−1)σ

and∂ϕi1∂uj1

· · ·∂ϕip∂ujp

= ∂ϕi1∂u`τ(1)

· · ·∂ϕip∂u`τ(p)

=∂ϕiσ(1)

∂u`1

· · ·∂ϕiσ(p)

∂u`p,

where τ = σ−1. Thus the left side of (13.8) is

[ϕ∗(dxi1)∧· · ·∧ϕ∗(dxip)

](e`1 , . . . , e`p) =

∑σ

(−1)σ∂ϕiσ(1)

∂u`1

· · ·∂ϕiσ(p)

∂u`p, (13.9)

where the sum is taken over all permutations σ of (1, . . . , p).On the other hand, since

dϕu(e`j ) = ∂`jϕ(u) =p∑i=1

∂ϕi(u)∂u`j

ei,

the right side of (13.8) is

dxi1 ∧ · · · ∧ dxip

p∑j=1

∂ϕj∂u`1

ej , . . . ,

p∑j=1

∂ϕj∂u`p

ej

=

∑(j1,...,jp)∈Jp

∂ϕj1

∂u`1

· · ·∂ϕjp∂u`p

dxi1 ∧ · · · ∧ dxip (ej1 , . . . , ejp). (13.10)

As above, dxi1 ∧ · · · ∧ dxip(ej1 , . . . , ejp) 6= 0 only if the p-tuple (j1, . . . , jp) isa permutation of (i1, . . . , ip). For each such p-tuple, define a permutation σ of(1, . . . , p) such that jk = iσ(k). Then

dxi1 ∧ · · · ∧ dxip (ej1 , . . . , ejp) = dxi1 ∧ · · · ∧ dxip(eiσ(1) , . . . , eiσ(k)

)= (−1)σ

and∂ϕj1

∂u`1

· · ·∂ϕjp∂u`p

=∂ϕiσ(1)

∂u`1

· · ·∂ϕiσ(p)

∂u`p

so (13.10) reduces to ∑σ

(−1)σ∂ϕiσ(1)

∂u`1

· · ·∂ϕiσ(p)

∂u`p,

where the sum is taken over all permutations of (1, . . . , p). As this is precisely(13.9) the proof is complete.


Exercises1. Let Tj ∈ L(Rn,R), j = 1, . . . ,m. Which of the following functions is

multilinear on Rn?(a) M(x1, . . . ,xm) :=

∑mi=1 Ti(xi). (b) M(x1, . . . ,xm) :=

∏mi=1 Ti(xi).

2. For fixed c = (c1, c2), d = (d1, d2) ∈ R2 define

M(x,y) := (c · x)(d · y)− (c · y)(d · x), x, y ∈ R2.

(a) Show that M is an alternating multilinear functional on R2.(b) Express M in terms of differentials, as in 13.1.3

3.S Let M(a1, . . . ,am) be a multilinear functional on Rn with the propertythat M(a1, . . . ,am) = 0 whenever two of the vectors aj are equal. Provethat M is alternating.

4. Let M be an alternating m-multilinear functional on Rn. Show that ifthe vectors a1, . . . ,am are linearly dependent, thenM(a1, . . . ,am) = 0.

5. Let M(a1, . . . ,am) be an m-multilinear functional on Rn. Define

Alt(M)(a1, . . . ,am) = 1m!∑σ

(−1)σM(aσ(1), . . . ,aσ(m)),

where the sum is taken over all permutations σ of (1, . . . ,m). Showthat Alt(M) is an alternating m-multilinear functional on Rn and thatAlt(M) = M iff M is alternating.

6. Prove that the vector space of m-forms on S has dimension(nm

).

7. Find the canonical representation of the following forms in R3:(a)S (f1 dx1 + f2 dx2 + f3 dx3) ∧ (g1 dx1 + g2 dx2 + g3 dx3).(b) (f1 dx1 + f2 dx2 + f3 dx3) ∧ (g1 dx1 + g2 dx2 + g3 dx3)

∧(h1 dx1 + h2 dx2 + h3 dx3).

8. Find the canonical representation of the following forms in R5:(a) (−dx1 + dx2 + dx3

)∧ (dx1 − 2dx2 + 3dx3).

(b)S (dx1 + dx2) ∧ (dx1 − dx3) ∧ (dx2 + 2dx3).(c) dx1 ∧ (dx1 ∧ dx3 + 3dx5 ∧ dx4).(d)

(dx1 ∧ dx2 + dx1 ∧ dx3

)∧(dx4 ∧ dx3 + dx2 ∧ dx5

)∧(dx3 ∧ dx1 + dx4 ∧ dx1

).

9. Find the canonical representation of the following forms in Rn:(a)S

(dx2 ∧ dx4 ∧ · · · ∧ dx2k

)∧(dx1 ∧ dx3 ∧ · · · ∧ dx2k−1

), 2k ≤ n.

(b)(dx1 ∧ dx5 ∧ · · · ∧ dx4k−3

)∧(dx3 ∧ dx7 ∧ · · · ∧ dx4k−1

)∧(dx2 ∧ dx6 ∧ · · · ∧ dx4k−2

)∧(dx4 ∧ dx8 ∧ · · · ∧ dx4k

), 4k ≤ n.


10. Show that if ω is an m-form and m is odd, then ω ∧ ω = 0. Find anexample of a 2-form ω in R4 such that ω ∧ ω 6= 0.

11. Use the method of 13.1.12 to verify the determinants

(a)

∣∣∣∣∣∣1 −1 1−1 1 1

1 1 −1

∣∣∣∣∣∣ = -4. (b)S

∣∣∣∣∣∣−1 0 2

2 −1 00 2 1

∣∣∣∣∣∣ = 9. (c)

∣∣∣∣∣∣3 1 20 −1 1−1 1 0

∣∣∣∣∣∣= -6.

12. Show directly that in Rn, d[f( dx1 ∧ · · · ∧ dxn)

]= 0.

13. Let f : R → R be C1 and define gj(x) = f(xj). Find the canonicalrepresentation of

(a)S dn∑j=1

gj dxj . (b) d

n∑j=1

gn−j+1 dxj .

14. Find d(f dg), where f is C1 and g is C2 on W .

15.S A form η on W is exact if η = dω for some form ω on W . Prove that ifη is exact and dν = 0, then η ∧ ν is exact.

16. Let f and ω :=∑ni=1 fi dxi be C1 on an open set W ⊆ Rn. Show that if

d(fω) = 0, then fω ∧ dω = (df) ∧ ω ∧ ω.

17.S Let U ⊆ Rk, V ⊆ R`, and W ⊆ Rn be open and let ϕ : U → V andψ : V → W be C1. If ω is an m-form on W , prove that (ψ ◦ ϕ)∗ω =ϕ∗(ψ∗ω). Hint. Use 13.1.19(d).

18. Let U, W ⊆ Rn be open and let ϕ : U → W and f : W → Rn be C1.Show that ϕ∗(dx1 ∧ · · · ∧ dxn) = det(ϕ′) du1 ∧ · · · ∧ dun.

19.S Let F = (f1, f2, f3) be C1 on R3 and homogeneous of degree k ∈ N. (SeeExercise 9.3.15.) Let ω = f1 dx1 + f2 dx2 + f3 dx3. Show that if dω = 0,then ω = df where f(x) = (k + 1)−1F (x) · x.

13.2 Integrals on Parameterized SurfacesRecall that the length of a parameterized curve C in Rn is, by definition,

a limit of lengths of inscribed polygonal lines. The proof of 12.2.4 showsthat if the curve C is C1, then its length may be also be approximated bytangent line segments. This idea may be extended to higher dimensions, usingtangent parallelepipeds to approximate surface area. This leads ultimately tothe definition of the integral of a function or a form on a surface.

Area of a Parallelepiped13.2.1 Definition. The parallelepiped spanned by vectors a1, . . . ,am ∈ Rn isthe set

P = P (a1, . . . ,am) :={ m∑i=1

tiai : 0 ≤ ti ≤ 1

}.

The volume vol(P ) of P is its n-dimensional Lebesgue measure. ♦

For m = n, there is a simple formula for the volume:13.2.2 Lemma. vol(P ) =

∣∣det[a1 . . . an

]∣∣.Proof. Denote by T ∈ L(Rn,Rn) the linear mapping with matrix A :=[a1 · · · an

]. Since T (ej) = aj , a typical member of P := P (a1, . . . ,an) may

be expressed asn∑i=1

tiai = T

( n∑i=1

tiei

)= T (t1, . . . , tn) ∈ T ([0, 1]n) .

By 11.6.3 and 11.6.9, λn(P ) = λn(T ([0, 1]n)) = |detA|λn([0, 1]n) = |detA|.

If m < n, then λn(P ) = 0 but P may still have positive m-dimensionalLebesgue measure, as defined in 11.6.9. Specifically, let V denote the linearspan of the vectors a1, . . ., am and choose an orthonormal basis v1, . . ., vnof Rn such that v1, . . ., vm is a basis for V. Define T ∈ L(V,Rm) so thatT (vj) = ej , 1 ≤ j ≤ m. Thus T “rotates” and/or “reflects” V onto Rm × {0}.The area of P is then defined by

area(P (a1, . . . ,am)

)= λm

(T(P (a1, . . . ,am)

)).

A concrete value for this area is given in the following theorem.13.2.3 Theorem. Let m < n, a1, . . ., am ∈ Rn, and A = [a1 · · ·am]. Then

area(P (a1, . . . ,am)

)=√

det(AtA) =[ ∑

i∈Im

(detAi

)2]1/2.

Proof. Set bj = T (aj) and B =[b1 · · · bm

]. By linearity of T ,

T(P (a1, . . . ,am)

)= P (b1, . . . , bm) ⊆ Rm,

hence, by 13.2.2,area

(P (a1, . . . ,am)

)= λm

(P (b1, . . . , bm)

)= |detB|.

Now, the (i, j)th entry of BtB is bi ·bj , and because T preserves inner productsthis is the same as ai · aj . Therefore, BtB = AtA, hence

|detB| =√

(detBt)(detB) =√

det(BtB) =√

det(AtA).This proves the first equality in the theorem. The second equality is from13.1.6.


Area of a Parameterized SurfaceLet ϕ : U → Rn be a parameterized m-surface in Rn with image S and

let u = (u1, . . . , um) ∈ U and a = ϕ(u) ∈ S. Choose a small m-dimensionalinterval

Q = [u1, u1 + ∆u1]× · · · × [um, um + ∆um] ⊆ U, ∆uj > 0.

As noted in Chapter 12, the line segments u+ tej in U map onto curves in Swith tangent vectors

dϕu(ej) = ∂jϕ(u), 1 ≤ j ≤ m,

at ϕ(u). The matrix with columns ∂jϕ(u) is ϕ′(u), the Jacobian matrix of ϕ atu. By 13.2.3, the parallelepiped spanned by the vectors ∆uj∂jϕ(u) therefore

S = ϕ(U)

ϕ(Q)

ϕ

(∆u1) dϕu(e1)

p

(∆u2) dϕu(e2)

UQ

(∆u1)e1

(∆u2)e2

u

FIGURE 13.1: Parallelogram approximation to ϕ(Q).

has area √det(ϕ′(u)tϕ′(u)

)∆u1 ∆u2 · · ·∆um,

which is taken as an approximation of the area of the surface element ϕ(Q).Partitioning U into a grid Q of intervals Q and summing these expressions,we obtain the Riemann sums∑

Q

√det(ϕ′(u)tϕ′(u)

)∆u1 ∆u2 · · ·∆um.

It is reasonable then to define the area of S as the limit of these sums as thediameters of the intervals Q tend to zero, that is,

area(ϕ) :=∫U


)du. (13.11)

Integral of a Function on a Parameterized SurfaceLet f be a continuous, real-valued function on S = ϕ(U). Motivated by

(13.11) we define the surface integral of f over ϕ by∫ϕ

f dS =∫U

(f ◦ ϕ)(u)√

det(ϕ′(u)tϕ′(u)

)du (13.12)


whenever the right side exists. In particular,

area(S) =∫ϕ

1 dS.

The integral on the right in (13.12) may be interpreted as a Lebesgue integralor (if ϕ has compact support) as a Riemann integral. In the latter case, it is alimit of Riemann sums∑

Q(f ◦ ϕ)(u)


)∆u1 · · ·∆um. (13.13)

This interpretation has important physical applications. For example, if fis the density in mass per unit area of a curved sheet S in R3, then (13.13)approximates the mass of the surface element

ϕ

({u+

∑j

tjej : 0 ≤ tj ≤ ∆uj}

),

hence∫ϕf gives the mass of S. For another example, let f(x) be denote the

temperature of the sheet at point x ∈ S. Then [area(S)]−1 ∫ϕf dS gives the

average temperature of the sheet.To evaluate (13.12), it is useful to note that since ϕ′ =

[∂1ϕ · · · ∂nϕ

],

by 13.1.6


)=

∑(i1,...,im)∈Im

[∂(ϕii , . . . ϕim)∂(u1, . . . , um) (u)

]2(13.14)

The following instances of 13.12 are of particular interest.

13.2.4 Special Cases.(a) m = 1: Then det

(ϕ′(u)tϕ′(u)

)= ‖ϕ′(u)‖2, hence∫

ϕ

f dS =∫U

(f ◦ ϕ)(u)‖ϕ′(u)‖ du =∫ϕ

f ds,

which is the line integral of Section 12.2.(b) m = 2: In this case


)= det

{[∂1ϕ∂2ϕ

] [∂1ϕ ∂2ϕ

]}=∣∣∣∣∂1ϕ · ∂1ϕ ∂1ϕ · ∂1ϕ∂1ϕ · ∂2ϕ ∂2ϕ · ∂2ϕ

∣∣∣∣hence ∫

ϕ

f dS =∫U

(f ◦ ϕ)√‖∂1ϕ‖2‖∂2ϕ‖2 −

[∂1ϕ · ∂2ϕ

]2du.

(c) m = n− 1: Here


)=

n∑i=1

[∂(ϕ1, . . . , ϕi, . . . ϕn)∂(u1, . . . , un−1) (u)

]2= ‖∂ϕ⊥(u)‖2,

hence ∫ϕ

f dS =∫U

(f ◦ ϕ)(u)‖∂ϕ⊥(u)‖ du.

(d) ϕ(u1, . . . , un−1) =(u1, . . . , un−1, g(u1, . . . , un−1)

)(the graph of g):

Let i = (1, . . . , i− 1, i+ 1, . . . , n). Then

∂(ϕ1, . . . , ϕi, . . . ϕn)∂(u1, . . . , un−1) =

∣∣∣∣∣∣∣∣∣∣∣

1 0 · · · 00 1 · · · 0...

......

0 0 · · · 1∂1g ∂2g · · · ∂n−1g

∣∣∣∣∣∣∣∣∣∣∣i={

(−1)n−1+i∂ig, i < n,1, i = n,

(13.15)

hencedet(ϕ′(u)tϕ′(u)

)= 1 + ‖∇g(u)‖2

and ∫ϕ

f dS =∫U

(f ◦ ϕ)(u)√

1 + ||∇g(u)||2 du. ♦

13.2.5 Example. Let S be the following portion of an n-dimensional cone:

S ={

(x1, . . . , xn+1) : x2n+1 =

n∑i=1

x2i , 0 < xn+1 < 1

}.

Then S is parameterized by

ϕ(x) =(x, g(x)

), g(x) := ‖x‖, x := (x1, . . . , xn),

where ∇g(x) = x/‖x‖. If f is of the form f(x) = h(‖x‖), then, by Exer-cise 11.6.3, ∫

ϕ

f dS =√

2nαn∫ 1

0h(r)rn−1 dr.

In particular, taking h = 1,

area(S) =√

2αn, ♦

The following result will be needed later to construct the integral of afunction on a generalm-surface. It asserts that the integral over a parameterizedsurface ϕ is invariant under a change of parameter and hence may be viewedas a construct intrinsic to the image of ϕ.


13.2.6 Proposition. Let U and V be open subsets of Rm, α : V → U a C1

function with C1 inverse, and ϕ : U → Rn a parameterized m-surface. Thenψ := ϕ ◦ α is a parameterized m-surface and

∫ϕf dS =

∫ψf dS.

Proof. By the chain rule, ψ′(v) = ϕ′(u)α′(v), where u = α(v), hence

det(ψ′(v)tψ′(v)

)= det

(α′(v)tϕ′(u)tϕ′(u)α′(v)

)=[Jα(v)

]2 det(ϕ′(u)tϕ′(u)

).

Therefore, by the change of variables theorem,∫ψ

f dS =∫V

(f ◦ ψ)(v)√

det(ψ′(v)tψ′(v)

)dv

=∫V

(f ◦ ϕ)(α(v))√

det(ϕ′(α(v))tϕ′(α(v))

)|Jα(v)| dv

=∫U

(f ◦ ϕ)(u)√


)du

=∫ϕ

f dS.

13.2.7 Remark. The material in this section holds, in particular, for a localparametrization of an m-surface as well as a local parametrization of an (n−1)-surface-with-boundary. In the latter case, the domain of the parametrizationat a boundary point is an open set in Hn−1. ♦

Integration of a Form on a Parameterized m-Surface13.2.8 Definition. Let ϕ : U → Rn be a parameterized orientable m-surfacein Rn and let

ω =∑

(j1,··· ,jm)∈Jm

fj1,··· ,jmdxj1 ∧ · · · ∧ dxjm

be a continuous m-form on S := ϕ(U). The integral of ω over ϕ is defined by∫ϕ

ω =∫S

ω = sign(ϕ)∫U

ωϕ(u)(dϕu(e1), . . . , dϕu(em)

)du. ♦

The inclusion of sign(ϕ) corresponds to the familiar convention∫ a

b

f(t) dt = −∫ b

a

f(t) dt

for Riemann integrals, which reflects the fact that the process of Riemannintegration respects the natural orientation (ordering) of the interval [a, b].Recalling that dϕu(ej) = ∂jϕ(u) and

dxj1 ∧ · · · ∧ dxjm(∂1ϕ(u), . . . , ∂mϕ(u)

)= ∂(ϕj1 , . . . , ϕjm)

∂(u1, . . . , um) (u),


we obtain the formula∫ϕ

ω = sign(ϕ)∫U

∑(j1,...,jm)∈Jm

(fj1,...,jm ◦ ϕ) ∂(ϕj1 , . . . , ϕjm)∂(u1, . . . , um) du. (13.16)

The following instances of (13.16) are of particular importance.

13.2.9 Special Cases. Let ϕ be positively oriented.(a) m = 1: ∫

ϕ

n∑i=1

fi dxi =n∑i=1

∫I

fi(ϕ(t)

)ϕ′i(t) dt,

which is the integral of Section 12.2.(b) m = n− 1:∫ϕ

n∑i=1

fi dx1 ∧ · · · ∧ dxi ∧ · · · ∧ dxn =∫U

n∑i=1

(fi ◦ ϕ)∂(ϕ1, . . . , ϕi, . . . ϕn)∂(u1, . . . , un−1) du.

In particular, for the graph ϕ(u1, . . . , un−1) =(u1, . . . , un−1, g(u1, . . . , un−1)

),

we have from (13.15)∫ϕ

n∑i=1

fi dx1∧· · ·∧ dxi∧· · ·∧dxn =∫U

[fn ◦ϕ+

n−1∑i=1

(−1)n−1+i(fi ◦ϕ)∂ig]du.

(c) m = 2, n = 3: Let Dij(u) := ∂(ϕi, ϕj)∂(u1, u2) . Then∫

ϕ

[f1 dx2 ∧ dx3 + f2 dx1 ∧ dx3 + f3 dx1 ∧ dx2

]=∫U

[(f1 ◦ ϕ)(u)D23(u) + (f2 ◦ ϕ)(u)D13(u) + (f3 ◦ ϕ)(u)D12(u)] du.

(d) m-form on parameterized surface ι : U → U :∫ι

∑(j1,··· ,jm)∈Jm

gj1,··· ,jk duj1 ∧ · · · ∧ dujm =∫U

∑j∈Jm

gj(u) du. ♦

13.2.10 Notation. For the integral on the left in (d) we write∫U

instead of∫ι. In particular, ∫

ι

g duj1 ∧ · · · ∧ dujm =∫U

g(u) du. ♦

13.2.11 Example. Let S be the following portion of a paraboloid:

S ={

(x1, x2, x3) : x1 = x22 + x2

3, 0 < x1 < 1}.

For purposes of integration, we may consider S to be the image of the parame-terized 2-surface

ϕ(t, θ) = (t,√t cos θ,

√t sin θ), 0 < t < 1, 0 < θ < 2π,

since there are no contributions to an integral on the set where θ = 0.By 13.2.9(c) ,∫

S

x22x3 dx1 ∧ dx2 =

∫ 1

0

∫ 2π

0[t3/2 cos2 θ sin θ]∂(ϕ1, ϕ2)

∂(t, θ) dθ dt

= −∫ 1

0

∫ 2π

0t2 cos2 θ sin2 θ dθ dt

= − π

12 . ♦

The following proposition, the analog of 13.2.6 for differential forms, showsthat the definition of integral of a form is invariant under reparametrizations.

13.2.12 Proposition. Let U, V be open connected subsets of Rm, α : V → Ua C1 function with C1 inverse and positive Jacobian, and ϕ : U → Rn aparameterized orientable m-surface. If ω is a continuous m-form on ϕ(U),then ∫

ϕ

ω =∫ϕ◦α

ω.

Proof. Note first that sign(Jα) is constant since α is C1 and V is connected.Let ψ = ϕ ◦ α. By the chain rule and the change of variables theorem,∫

V

(fj1,...,jm ◦ ψ

) ∂(ψj1 , . . . , ψjm)∂(v1, . . . , vm) dv

=∫V

(fj1,...,jm ◦ ϕ ◦ α

)(v)∂(ϕj1 , . . . , ϕjm)

∂(u1, . . . , um)(α(v)

)Jα(v) dv

=∫U

(fj1,...,jm ◦ ϕ) ∂(ϕj1 , . . . , ϕjm)∂(u1, . . . , um) du.

The conclusion now follows from (13.16) and linearity of the integral.

The final result of this section expresses∫ϕω as an integral of a form on U .

It will be needed in the proof of Stokes’s theorem.

13.2.13 Theorem. Let U ⊆ Rm be open and let ϕ : U → Rn be an orientedparameterized surface. If ω is a C1 m-form on ϕ(U), then∫

ϕ

ω = sign(ϕ)∫U

ϕ∗ω.

Proof. By (d) of 13.1.19, if ι : U → U denotes the identity map then

ωϕ(u)(dϕu(e1), . . . , dϕu(em)) = (ϕ∗ω)u(e1, . . . , em)= (ϕ∗ω)u(d ιu(e1), . . . , d ιu(em)),

The result now follows directly from the definition of the integral of a form(13.2.8) and 13.2.10.

Exercises1. Find the area of the following 2-surfaces in R3.

(a) ϕ(t, θ) = (t cos θ, t sin θ, t), t ∈ (0, 1), θ ∈ (0, 2π).(b)S ϕ(t, θ) = (t cos θ, t sin θ, θ), 0 < t < 1, 0 < θ < 2π.(c) ϕ(θ, s) = (1− s)

(a cos θ, a sin θ, 0

)+ s(b cos θ, b sin θ, 1),

0 < s < 1, 0 < θ < 2π, 0 < a < b.

2. Let a1, . . . ,am ∈ Rn be linearly independent and let b ∈ Rn. Defineϕ : Rm → Rn by

ϕ(u1, . . . , um) = b+m∑i=1

uiai.

(See 12.3.2.) For a continuous function f on Rn, prove that∫ϕ

f =√

det(AtA)∫Rn

(f ◦ ϕ)(u) du,

where A =[a1 · · ·am

]n×m.

3. Let ϕ be as in Exercise 2. Show that∫ϕ

∑i∈Im

fi dxi =∑i∈Im

det(Ai)∫U

fi ◦ ϕdu.

4.S Show that the area of the Cartesian product of circles

ϕ(θ1, . . . , θm) =(r1 cos θ1, r1 sin θ1, . . . , rm cos θm, rm sin θm

), ri > 0,

is (2πr1)(2πr2) · · · (2πrm).

5. Let ϕ be the product of two circles:

ϕ(θ1, θ2) =(r1 cos θ1, r1 sin θ1, r2 cos θ2, r2 sin θ2

), ri > 0,

and let

ω = f12 dx1 ∧ dx2 + f13 dx1 ∧ dx3 + f14 dx1 ∧ dx4

+ f23 dx2 ∧ dx3 + f24 dx2 ∧ dx4 + f34 dx3 ∧ dx4.

Show that∫ϕ

ω = r1r2

∫ 2π

0

∫ 2π

0

[(f13 ◦ ϕ) sin θ1 sin θ2 − (f14 ◦ ϕ) sin θ1 cos θ2

− (f23 ◦ ϕ) cos θ1 sin θ2 + (f24 ◦ ϕ] cos θ1 cos θ2]dθ dφ.

6.S (Area of an n-dimensional simplex in Rn+1). Use Example 11.5.5 to findthe surface area of

S ={

(x1, . . . xn+1) :n+1∑j=1

xj = 1 and xj ≥ 0}.

x1

x2

x3

1

1

1

S

FIGURE 13.2: Two dimensional simplex S in R3.

7.S Let U ⊆ Rn−2 be open and let ψ : U → Rn−1 be a parameterized(n− 2)-surface in Rn−1. Let ϕ : U × [0, h]→ Rn be the cylinder

ϕ(u, s) =(ψ(u), s

), u ∈ U, 0 ≤ s ≤ h.

Show that area(ϕ) = h · area(ψ).

8.S Let ϕ be the cylinder of Exercise 7 for n = 3 and h = 1. Show that∫ϕ

(f1 dx2 ∧ dx3 + f2 dx1 ∧ dx3 + f3 dx1 ∧ dx2

)=∫ 1

0

[(f1 ◦ ϕ)ψ′1 + (f2 ◦ ϕ)ψ′2

]dt.

9. Let ψ : [a, b] → R2 be a C1 curve in R2 and let ϕ : [a, b]× (0, h) → R3

be the cone

ϕ(t, s) =((1− s/h)ψ(t), s

), a ≤ t ≤ b, 0 < s < h.

Show that the area of ϕ is

h

2

∫ b

a

√[ψ′1(t)

]2 +[ψ′2(t)

]2 + h−2[ψ1(t)ψ′2(t)− ψ2(t)ψ′1(t)]2dt.

Use this to show that the surface area of a right circular cone with radiusr and axis length h is πr

√r2 + h2.

10. Let ϕ be the cone of Exercise 9 with h = 1. Show that∫ϕ


)= 1

2

∫ 1

0

{(f1 ◦ ϕ)ψ′1 + (f2 ◦ ϕ)ψ′2 + [ψ1(t)ψ′2(t)− ψ2(t)ψ′1(t)]

}dt.

11.S Let ψ : [a, b] → R2 a parameterized C1 curve with ψ2(t) > 0 for all t.Define

ϕ(t, θ) =(ψ1(t), ψ2(t) cos θ, ψ2(t) sin θ

), t ∈ I, θ ∈ (0, 2π),

which is the parameterized surface of revolution of 12.3.9. Show that

area(ϕ) = 2π∫ b

a

ψ2(t)‖ψ′(t)‖ dt = (2πy)length(ψ), (13.17)

where (x, y) = ψ and y := 1length(ψ)

∫ψ

y ds, the y-coordinate of the

centroid of ψ. Use the first part of (13.17) to find the surface area of thetorus

ϕ(t, θ) =(a cos θ, (b+ a sin t) cos θ, (b+ a sin t) sin θ

), 0 < θ, t < 2π,

where 0 < a < b. Show also that the area of the cone in Exercise 9 maybe found from (13.17).

12. Let ϕ be the parameterized surface of revolution in Exercise 11 and letω := f1 dx2 ∧ dx3 + f2 dx1 ∧ dx3 + f3 dx1 ∧ dx2. Show that∫ϕ

ω =∫ b

a

∫ 2π

0(f1 ◦ ϕ)ψ2(t)ψ′2(t) dθ dt+

∫ b

a

∫ 2π

0(f2 ◦ ϕ)ψ′1(t)ψ2(t) cos θ dθ dt

−∫ b

a

∫ 2π

0(f3 ◦ ϕ)ψ′1(t)ψ2(t) sin θ dθ dt.

Show also that if ψ(t) = (t, g(t)) (the graph of g), then this reduces to∫ b

a

∫ 2π

0g(t)

[(f1 ◦ ϕ)g′(t) + (f2 ◦ ϕ) cos θ − (f3 ◦ ϕ) sin θ

]dθ dt.

13. Use Exercise 12 to evaluate

(a)S

∫S

x1x3 dx1 ∧ dx2, (b)∫S

x2x3 dx1 ∧ dx2, (c)∫S

x21x

22 dx2 ∧ dx3,

where S is the cone

S ={

(x1, x2, x3) : x21 = x2

2 + x23, 0 < x1 < 1

}.

14. Repeat Exercise 13 using the portion of the hyperboloid

S ={

(x1, x2, x3) : x21 − x2

2 − x23 = 1, 1 < x1 <

√2}.

15. Let S be the torus given by x21 +(√

x22 + x2

3− b)2 = a2, where 0 < a < b.

Use Exercise 12 to evaluate

(a)∫S

x2 dx2 ∧ dx3. (b)S

∫S

x1 dx2 ∧ dx3. (c)∫S

x2 dx1 ∧ dx3.

13.3 Partitions of UnityThe theorem proved in this section will be used to extend the definition of

the integral to functions and forms onm-surfaces. It will also be needed laterin the proofs of Stokes’s theorem and the divergence theorem.

13.3.1 Definition. The support of a continuous function ψ : Rn → R isdefined by

supp(ψ) = cl({x : ψ(x) 6= 0}

). ♦

Thus, by definition of closure, supp(ψ) is the smallest closed set outside ofwhich ψ is zero.

13.3.2 Partition of Unity. Let K be a compact subset of Rn and let{Ui : i ∈ I} be an open cover of K. Then there exists a finite subcover{U1, . . . , Up} of K and C∞ functions χi : Rn → [0,+∞), i = 1, . . . , p, suchthat supp(χi) is compact and contained in Ui and

∑pi=1 χi = 1 on K.

U2

U1

K

χ1 χ2

FIGURE 13.3: A partition of unity subordinate to U1 and U2.

The functions χi are said to form a partition of unity subordinate to theopen sets Ui. They are typically used to patch together local data to form aglobal construct such as a surface integral, or to reduce a global problem to alocal one, as in the case of the proof of Stokes’s theorem.

The proof of 13.3.2 requires several lemmas which are of intrinsic interest.

13.3.3 Lemma. Let a < b. Then there exists a C∞ function h : R→ [0,+∞)such that h > 0 on (a, b), and h = 0 on (a, b)c.

Proof. Define h by

h(x) ={

exp[(x− a)−1(x− b)−1] if a < x < b,

0 otherwise.

Clearly, h(m) = 0 on [a, b]c for all m ≥ 0. Moreover, if x ∈ (a, b), then h(m)(x)is a sum of terms of the form

±h(x)(x− a)p(x− b)q , p, q ∈ Z+.

Since the exponent (x−a)−1(x− b)−1 is negative on (a, b), by l’Hospital’s rule,

limx→a+

h(x)(x− a)p(x− b)q = 0.

Therefore, limx→a h(m)(x) = 0, and an induction argument then shows that

h(m)(a) = 0 for all m. A similar argument holds at the point b. Thus h is C∞on R.

FIGURE 13.4: The functions h and g.

13.3.4 Lemma. Let a < b. Then there exists a C∞ function g : R→ R suchthat 0 ≤ g ≤ 1, g = 0 on (−∞, a], and g = 1 on [b,+∞).

Proof. Let h be the function in 13.3.3. Then

g(x) :=[ ∫ b

a

h]−1 ∫ x

a

h

has the required properties.

13.3.5 Lemma. Let I = (a1, b1) × · · · × (an, bn). Then there exists a C∞

function f : Rn → R such that f > 0 on I and f = 0 on Ic.

Proof. For each j, let hj : R → [0,+∞) be a C∞ function such that hj > 0on (aj , bj) and hj = 0 on (aj , bj)c. The function

f(x1, . . . , xn) := h1(x1) · · ·hn(xn)

then satisfies the requirements.

For the next lemma we define the open cube with center x ∈ Rn and edge2r by

{y ∈ Rn : xj − r < yj < xj + r, j = 1, . . . , n} .

13.3.6 Lemma. Let K ⊆ U ⊆ Rn, where K is compact and U is open. Thenthere exists a C∞ function ψ : Rn → [0, 1] such that supp(ψ) ⊆ U and ψ = 1on K.

Proof. For each x ∈ K, let Vx be an open cube with center x and edge 2rsuch that cl

(Vx

)⊆ U and let Wx ⊆ Vx denote the concentric open cube with

center x and edge r. Since K is compact, there exist finitely many cubes Wx

whose union contains K. Denote these cubes by W1, . . . ,Wm and denote thecorresponding cubes Vx by V1, . . . , Vm. (See Figure 13.5.) By 13.3.5, for each i

U

Vi

Wi K

f = 0f > 0

FIGURE 13.5: The cubes Wi and Vi.

there exists a C∞ function fi : Rn → R such that fi > 0 on Wi and fi = 0 onW ci . Set

f =m∑i=1

fi, V =m⋃i=1

Vi, and W =m⋃i=1

Wi.

Then f is nonnegative and C∞ on Rn, f > 0 on W ⊇ K, and supp(f) ⊆cl(V ) ⊆ U . Now let a = minx∈K f(x). Since a > 0, there exists a C∞ functiong : R→ [0, 1] such that g = 0 on (−∞, 0] and g = 1 on [a,+∞) (13.3.4). Thefunction ψ := g ◦ f then has the required properties.

Proof of the partition of unity theorem.

For each x ∈ K, let i(x) be an index such that x ∈ Ui(x). Choose a boundedopen set Vx containing x such that cl

(Vx

)⊆ Ui(x). Since K is compact, finitely

many of the sets Vx cover K. Denote these by V1, . . . Vp and denote thecorresponding sets Ui(x) by U1, . . . , Up. Since Vi ⊆ Ki := cl(Vi) ⊆ Ui, by 13.3.6there exists a C∞ function ψi : Rn → [0, 1] such that ψi = 1 on Ki andsupp(ψi) ⊆ Ui. Now set

χ1 = ψ1 and χi = (1− ψ1)(1− ψ2) · · · (1− ψi−1)ψi, i > 1.

Then χi is C∞, 0 ≤ χi ≤ 1, and supp(χi) ⊆ supp(ψi) ⊆ Ui. Finally, let

ηi = (1− ψ1)(1− ψ2) · · · (1− ψi).


For i > 1,

ηi−1 − ηi = (1− ψ1)(1− ψ2) · · · (1− ψi−1)[1− (1− ψi)

]= χi,

hencep∑i=1

χi = χ1 +p∑i=2

(ηi−1 − ηi) = χ1 + η1 − ηp = 1− ηp.

Since K ⊆⋃i Vi ⊆

⋃iKi and ψi = 1 on Ki, ηp = 0 on K, hence

∑pi=1 χi = 1

on K, completing the proof.

13.4 Integration on Compact m-SurfacesIn this section we define the integrals of a function and a form on a compact

m-surfaceS = {x ∈ V : F (x) = 0} ,

where V ⊆ Rn is open, F : V → Rn−m is C1, and F ′(x) has rank n−m forall x ∈ V .

To set the stage, let {(Ua, ϕa) : a ∈ S} be an atlas for S. By the partitionof unity theorem, there exist finitely many charts (Ui, ϕi) :=

(Uai , ϕai

)and C1

functions χi : Rn → R such that the sets Si := ϕi(Ui) cover S, supp(χi) ⊆ Si,and

∑i χi = 1 on S.

Integral of a FunctionThe (surface) integral of a continuous function f on S is defined by∫S

f dS =∑i

∫ϕi

χif =∑i

∫Ui

[(χif) ◦ ϕi

](u)√

det(ϕ′i(u)tϕ′i(u)

)du.

To see that the integral is independent of the system {(Ui, ϕi, χi)}i and hence

is well-defined, consider another such system {(Uj , ϕj , χj

)}j . Since

χi =∑j

χiχj and χj =∑i

χjχi on S,

we see that∑i

∫ϕi

fχi =∑i,j

∫ϕi

fχiχj and∑j

∫ϕj

fχj =∑i,j

∫ϕj

fχjχi.

Setαij = ϕ−1

j ◦ ϕi : ϕ−1i (Si ∩ Sj)→ ϕ−1

j (Si ∩ Sj).

Since fχiχj = 0 outside Si ∩ Sj and ϕi = ϕj ◦ αij on ϕ−1i (Si ∩ Sj), by 13.2.6∫

ϕifχiχj =

∫ϕjfχiχj . Therefore,∑

i

∫ϕi

fχi =∑j

∫ϕj

fχj ,

as required.The definition of the integral is extended to a finite union S of compact

m-surfaces S1, . . . , Sp by defining∫S

f dS =∑i

∫Si

f dS.

13.4.1 Definition. The area of S is defined as

area(S) =∫S

1 dS. ♦

13.4.2 Example. In 11.5.6 we found that the volume of the closed ballCnr (0) = {x ∈ Rn : ||x|| ≤ r} is rnαn, where

αn =

2(2π)(n−1)/2

n(n− 2) · · · 3 · 1 if n is odd,

(2π)n/2n(n− 2) · · · 4 · 2 if n is even.

We now show that for the sphere S := Sn−1r (0) = {x ∈ Rn : ||x|| = r},

area(S) = nrn−1αn = n

rλn(Cnr (0)

). (13.18)

To this end, note that the upper hemisphere Hu of S is the graph of thefunction

g(x1, . . . , xn−1) =√r2 − (x2

1 + · · ·+ x2n−1) =

√r2 − ||x||2, ||x|| ≤ r.

Let 0 < t < 1 and consider the part of the hemisphere Hut for which ||x|| < rt.

Since1 + ||∇g(x)||2 = 1 + ||x||2

r2 − ||x||2= r2

r2 − ||x||2,

by 13.2.4(c)

area(Hut ) = r

∫||x||<rt

(r2 − ||x||2

)−1/2dx = r(n− 1)αn−1

∫ rt

0

sn−2√r2 − s2

ds,

where the second equality comes from Exercise 11.6.3. The substitution s = xrproduces

area(Hut ) = (n− 1)rn−1αn−1

∫ t

0

xn−2√

1− x2dx.


The lower hemisphere counterpart H`t has the same area. Since

1S = limt→1−

1Hut + limt→1−

1H`t ,

it follows from Exercise 6 below that

area(S) = 2 limt→1−

area(Hut ) = 2(n− 1)rn−1αn−1

∫ 1

0

xn−2√

1− x2dx. (13.19)

By Exercise 5.3.7,

∫ 1

0

xn−2√

1− x2dx =

(n− 3)(n− 5) · · · 4 · 2(n− 2)(n− 4) · · · 3 · 1 if n is oddπ

2(n− 3)(n− 5) · · · 3 · 1(n− 2)(n− 2) · · · 4 · 2 if n is even.

(13.20)

Now use (13.19) and (13.20), to obtain (13.18). ♦

Integral of an m-FormThe definition of the (surface) integral of an m-form on a (positively)

oriented compact m-surface S is analogous to the case of a function on S:∫S

ω =∑i

∫ϕi

χiω =∑i

∫Ui

(χiω)ϕi(u)(∂1ϕi(u), . . . , ∂mϕi(u)

)du.

The argument that the integral is well-defined proceeds as above.As in the case of functions, the the integral of a form on a finite union S of

oriented compact m-surfaces S1, . . . , Sp is defined by∫S

f dS =∑i

∫Si

f dS.

ExercisesFor these exercises, declare a function f : S → R to be

Borel measurable if f ◦ ϕa : Ua → R is Borel measurableon Ua for each local parametrization ϕa of S.

1. Show that the collection of Borel measurable functions on S is closedunder the operations described in 10.5.3.

2. Define∫Sf dS for nonnegative Borel measurable functions f on S.

3. Call a Borel measurable function f integrable if∫Sf+ dS and

∫Sf− dS

are finite. In this case, define∫S

f dS =∫S

f+ dS −∫S

f− dS.


Prove that the resulting integral is linear and∣∣∣ ∫S

f dS∣∣∣ ≤ ∫

S

|f | dS.

4.S Formulate and prove versions of the monotone convergence theorem,Fatou’s lemma, and the dominated convergence theorem for Borel mea-surable functions on S.

5. Define the σ-field B(S) := {S ∩B : B ∈ B(Rn)}. Show that 1E is Borelmeasurable on S for every E ∈ B(S).

6. Show thatµS(E) :=

∫S

1E dS, E ∈ B(S),

defines a measure on B(S). (µS is called surface measure on S. It providesa way of calculating the area of Borel subsets of S.)

7.S Use Exercise 6 to verify the first equality in (13.19).

13.5 The Fundamental Theorems of CalculusThe fundamental theorem of single variable calculus expresses the integral

of a continuous function on an interval [a, b] as a function of the boundary{a, b}. The theorem tacitly assumes that integration occurs from “left to right,”which is to say that the interval [a, b] is positively oriented. The fundamentaltheorem of calculus may then be stated as follows: For any primitive F ofa continuous function f on [a, b], the integral of f is F (b) − F (a) if [a, b] ispositively oriented and F (a)−F (b) if [a, b] is negatively oriented. The theoremsproved in this section are higher dimensional versions of this formulation.

Stokes’s TheoremWhile the proof of the following theorem is based on many of the intricate

constructs developed earlier, the conclusion of the theorem is remarkably easyto state.

13.5.1 Stokes’s Theorem. Let S be a compact oriented (n− 1)-surface-with-boundary in Rn and let ω be a C1 (n− 2)-form on S. If ∂S has the inducedorientation, then ∫

∂S

ω =∫S

dω.

Proof. We prove the theorem first under the assumption that ω has compactsupport contained ϕ(I), where ϕ : U → Rn is a local parametrization of S andI is a bounded (n− 1) dimensional interval with cl(I) ⊆ U . For definiteness,


we assume that sign(ϕ) = 1, that is, the frame(dϕu(e1), . . . , dϕu(en−1)

)is

positive in Tϕ(u). Thus ϕ is oriented by ~Nϕ.Suppose first that U is open in Rn−1 and ϕ(U) ⊆ S \∂S. We may then take

I = (a1, b1)×· · ·× (an−1, bn−1). Since ω = 0 on ∂S,∫∂Sω = 0. Set η := ϕ∗(ω).

By 13.1.19, dη := ϕ∗(dω), hence, by 13.2.13,∫S

dω =∫U

dη.

Since η is an (n− 2) form on U ⊆ Rn−1, it may be expressed as

η =n−1∑i=1

(−1)i+1fi du1 ∧ · · · ∧ dui ∧ · · · ∧ dun−1,

where fi is C1 on U and supp(fi) ⊆ I. Then∫U

dη =∫U

n−1∑i=1

(−1)i+1

n−1∑j=1

∂fi∂uj

duj

∧ du1 ∧ · · · ∧ dui ∧ · · · ∧ dun−1

=∫U

n−1∑i=1

(−1)i+1 ∂fi∂ui

dui ∧ du1 ∧ · · · ∧ dui ∧ · · · ∧ dun−1

=n−1∑i=1

∫U

(∂ifi) du1 ∧ · · · ∧ dun−1. (13.21)

Since fi = 0 outside I, by the Fubini–Tonelli theorem, recalling 13.2.10, wehave∫U

(∂ifi) du1∧· · ·∧dun−1 =∫ b1

a1

· · ·∫ bn−1

an−1

∫ bi

ai

(∂ifi) dui dun−1 · · · dui · · · du1.

By the fundamental theorem of calculus, the innermost integral evaluates to

fi(u1, . . . , bi, . . . , un−1)− fi(u1, . . . , ai, . . . , un−1) = 0− 0 = 0.

Therefore, ∫S

dω = 0 =∫∂S

ω.

Now suppose that ϕ(U) ∩ ∂S 6= ∅. Then U is an open subset in Hn−1 thatintersects ∂Hn−1 and I is of the form (a1, b1)× · · · × (an−2, bn−2)× [0, bn−1).The above argument works for every term in the sum (13.21) except the last.It is still the case that fn−1(u1, . . . , un−2, bn−1) = 0, but now there is noguarantee that fn−1(u1, . . . , un−2, 0) = 0. Thus all we may conclude is that∫

S

dω =∫U

dη = −∫ b1

a1

· · ·∫ bn−2

an−2

fn−1(u1, . . . , un−2, 0) dun−2 · · · du1. (13.22)


Set V = U∩∂Hn−1. By definition of induced orientation, sign(ϕ∣∣V

)= (−1)n−1,

hence, by 13.2.13 again,∫∂S

ω = (−1)n−1∫V

η =n−1∑i=1

(−1)n+i∫V

fi du1 ∧ · · · ∧ dui ∧ · · · ∧ dun−1.

Since un−1 = 0 on V , dun−1 must be zero on V . Therefore, the first n − 2terms in the above sum vanish and we are left with∫

∂S

ω = −∫V

fn−1 du1 ∧ · · · ∧ dun−2,

which is (13.22). This verifies the theorem if ω has compact support.In the general case, let {ϕi : Ui → S : i = 1, . . . ,m} be an atlas of local

parameterizations of S and let {χi : i = 1, . . . ,m} be a partition of unitysubordinate to the open sets Ui. By the first part of the proof,∫

∂S

χiωi =∫S

d(χiω), i = 1, . . . ,m.

By the product rule, d(χiω) = χi dω + (dχi)ω. Since∑i χi = 1,∑

i

d(χiω) =[∑

i

χi

]dω +

[d∑i

χi

]ω = dω.

Therefore, ∫∂S

ω =∑i

∫∂S

χiω =∑i

∫S

d(χiω) =∫S

dω.

From the first part of the proof we have

13.5.2 Corollary. If S is a compact oriented (n−1)-surface-without-boundaryand if ω is an (n− 2)-form on S, then∫

S

dω = 0.

13.5.3 Remark. (a) For the case n = 3, Stokes’s formula takes the form∫∂S

[f1 dx1 + f2 dx2 + f3 dx3] =∫S

d[f1 dx1 + f2 dx2 + f3 dx3]. (13.23)

The left side of (13.23) may be written∫∂S

~F · ~dr, where ~dr := (dx1, dx2, dx3).

Let ~F := (f1, f2, f3) and (g1, g2, g3) := curl ~F . By 13.1.15(a),

d(f1 dx1 + f2 dx2 + f3 dx3

)= g1 dx2 ∧ dx3 + g2 dx3 ∧ dx1 + g3 dx1 ∧ dx2.


For a local parametrization ϕ,∫ϕ

[g1 dx2 ∧ dx3 + g2 dx3 ∧ dx1 + g3 dx1 ∧ dx2]

=∫U

[(g1 ◦ ϕ)∂(ϕ2, ϕ3)

∂(u1, u2) + (g2 ◦ ϕ)∂(ϕ3, ϕ1)∂(u1, u2) + (g3 ◦ ϕ)∂(ϕ1, ϕ2)

∂(u1, u2)

]du

=∫U

(curl ~F ◦ ϕ) ·(~Nϕ ◦ ϕ

)√det[ϕ′(u)tϕ′(u)

]du

=∫ϕ(U)

curl ~F · ~Nϕ dS

Using a partition of unity we see that the right side of (13.23) may thenbe written

∫S

curl ~F · ~N dS. Thus we obtain the classical version of Stokes’sformula ∫

∂S

~F · ~dr =∫S

curl ~F · ~N dS (13.24)

(b) If S is the graph of a function g(x, y) on a set D ⊆ R2, then S maybe parameterized by ϕ(x, y) = (x, y, g(x, y)). Hence, if S is oriented by theupward normal

(−∇g, 1)√

‖∇g‖2 + 1(see (12.3.7)(c) ), then (13.24) becomes∫

∂S

~F · ~dr =∫D

curl ~F · (−gx,−gy, 1) dx dy. ♦

13.5.4 Example. We verify (13.24) for

F (x1, x2, x3) = (x2x3, x1, x2)

on the cylinder-with-boundary

S :={

(x1, x2, x3) : x21 + x2

2 = 1, 0 ≤ x3 ≤ 1}

(Figure 12.17) oriented by the outward normal ~N(x1, x2, x3) = (x1, x2, 0). Thebottom boundary may be parameterized by ϕ1(t) = (cos t, sin t, 0) and the topby ϕ2(t) = (cos t,− sin t, 1), 0 ≤ t ≤ 2π. Therefore,∫

∂S

~F · ~dr =∫ 2π

0

[− f1(cos t, sin t, 0) sin t+ f2(cos t, sin t, 0) cos t

]dt

+∫ 2π

0

[− f1(cos t,− sin t, 1) sin t− f2(cos t,− sin t, 1) cos t] dt

=∫ 2π

0

[cos2 t+ sin2 t− cos2 t

]dt = π.

To find∫S

curl ~F · ~N dS we parameterize S by ϕ(t, x3) = (cos t, sin t, x3). Then

det[ϕ′(t, x3)tϕ′(t, x3)

]= det

[− sin t cos t 0

0 0 1

]− sin t 0cos t 0

0 1

= 1,

and since curlF = (1, x2, 1− x3),∫S

curl ~F · ~N dS =∫ 2π

0(cos t+ sin2 t) dt = π. ♦

Divergence TheoremThe divergence theorem is a variation of Stokes’s theorem. In the latter

theorem, integration occurs on an (n − 1)-dimensional surface in Rn with(n− 2)-dimensional boundary. In the former, the domain of integration is aregular region, which may be viewed as an n-dimensional surface in Rn with(n− 1)-dimensional boundary.

13.5.5 Definition. A bounded open subset E of Rn is called a regular regionif, for each point a ∈ bd(E), there exists a neighborhood Ua of a and a C1

function Fa : Ua → R with ∇Fa 6= 0 such that

(i) Sa := bd(E) ∩ Ua = {x : Fa(x) = 0},

(ii) E ∩ Ua = {x : Fa(x) < 0}, and

(iii) cl(E)c ∩ Ua = {x : Fa(x) > 0}. ♦

Note that Sa is an (n− 1)-surface in Rn and hence has a local parametriza-tion at each x ∈ Sa. Let

~na = ‖∇Fa‖−1∇Fa,

and let x ∈ Sa. For sufficiently small |t|, h(t) := x+ t~na(x) ∈ Ua. Since

(Fa ◦ h)′(0) = ∇Fa(x) · ~na(x) = ‖∇Fa(x)‖ > 0,

(Fa ◦ h) is strictly increasing. Because (Fa ◦ h)(0) = 0 we therefore have

Fa

(x+ t~na(x)

){< 0 if t < 0,> 0 if t > 0.

It follows from (ii) and (iii) that the normal vector t~na(x) to Sa at x pointsinto E if t < 0 and away from E (that is, toward Ec) if t > 0. The exteriorunit normal vector on bd(E) is then defined by

~n(x) = ~na(x), x ∈ Sa.

Uniqueness and continuity of ~na shows that ~n is well-defined and continuouson bd(E). (See Figure 13.6.)

E

a

UaFa > 0

Fa < 0

x

−→n (x)

Fa = 0

FIGURE 13.6: Regular region E.

13.5.6 Example. The n-dimensional annulus

E = {x ∈ Rn : r1 < ‖x‖ < r2}

is a regular region in Rn. Here, bd(E) has the components

Si = {x ∈ Rn : ‖x‖ = ri} , i = 1, 2.

The conditions of regularity are met by defining

Fa(x) ={r1 − ‖x‖ on {x ∈ Rn : ‖x‖ < (r1 + r2)/2} if a ∈ S1,‖x‖ − r2 on {x ∈ Rn : ‖x‖ > (r1 + r2)/2} if a ∈ S2.

Figure 13.7 depicts the case n = 2. ♦

E

S1 S2

FIGURE 13.7: Annulus in R2 with exterior normal.

13.5.7 Divergence Theorem. If E is a regular region in Rn and ω is a C1

1-form on cl(E), then ∫bd(E)

ω · ~n dS =∫E

divωx dx. (13.25)


Proof. The proof uses ideas similar to those used in the proof of Stokes’stheorem. By hypothesis, ω is C1 on an open set containing cl(E), which wemay assume also contains the sets Ua in 13.5.5. Since cl(E) is compact, by usinga partition of unity as in the proof of Stokes’s theorem, we may assume thatfor any a = (a1, . . . , an) ∈ cl(E) and the neighborhoods W of a constructedin the proof,

K :=n⋃i=1

supp(fi) ⊆W.

Suppose first that a ∈ E. Choose an n-dimensional interval W containinga such that cl(W ) ⊆ E. If K ⊆W , then ω = 0 on W c ⊇ bd(E), hence∫

bd(E)ω · ~n dS = 0 and

∫E

divωx dx =∫W

n∑i=1

∂ifi(x) dx = 0,

the last equality by the Fubini–Tonelli theorem and the fundamental theoremof calculus. Therefore, (13.25) holds in this case.

W

bd(E)

K

aE

FIGURE 13.8: The case a ∈ E.

Now let a ∈ bd(E) and let Ua and Fa be as in 13.5.5. We may assumethat the components ai of a and ni(a) of ~n(a) are positive, otherwise apply arotation and translation; the change of variables theorem implies that (13.25)is invariant under such transformations. (See Exercise 11 below for a specialcase of this.) We show that for each i = 1, . . . , n, there exists a neighborhoodWi of a such that if K ⊆Wi then∫

S

fi ni dS =∫E

∂ifi(x) dx. (13.26)

For notational simplicity, we do this for the case i = n.Since ∂nFa(a) 6= 0, by the implicit function theorem there exists a neighbor-

hood V of (a1, . . . , an−1), an open interval I containing an, and a C1 functiong : V → R such that V × I ⊆ Ua, an = g(a1, . . . , an−1), and

Fa

(x1, . . . , xn−1, g(x1, . . . , xn−1)

)= 0 on V.

By continuity, we may choose V and I sufficiently small so that

g(x1, . . . , xn−1) > 0 and ∂nFa(x) > 0 for all x ∈ V × I.

Now let x = (x1, . . . , xn) ∈ (V × I) ∩ E. Since Fa(x) is a strictly increasing

xn = g(x1, . . . , xn−1)

V

I

xn < g(x1, . . . , xn−1)Ua

aK

bd(E)

V × I

E

FIGURE 13.9: The case a ∈ bd(E).

function of xn ∈ I when the other coordinates are fixed and since Fa(x) < 0,it must be the case that 0 < xn < g(x1, . . . , xn−1). Thus

(V × I) ∩ E = {x ∈ V × I : 0 < xn < g(x1, . . . , xn−1)} and(V × I) ∩ Sa = {x ∈ V × I : xn = g(x1, . . . , xn−1)} .

(See Figure 13.9.) Note that the function ϕ defined by

ϕ(v) := (v, g(v)), v = (v1, . . . , vn−1) ∈ V,

is a local parametrization of Sa with unit normal

(1 + ‖∇g‖2)−1/2(−∇g, 1).Since this points outward it coincides with ~n. In particular, the nth componentof ~n is

nn = (1 + ‖∇g‖2)−1/2

on (V × I) ∩ Sa. Therefore, if K ⊆ V × I then, by 13.2.4(d),∫Sa

fn nn(a) dS =∫

(V×I)∩Sa

fn√1 + ‖∇g‖2

dS =∫V

(fn ◦ ϕ)(v) dv. (13.27)

On the other hand, since fn = ∂nfn = 0 outside K, by the Fubini–Tonellitheorem,∫

E

∂nfn(x) dx =∫

(V×I)∩E∂nfn(x) dx

=∫V

∫ g(v1,...,vn−1)

0(∂nfn)(v1, . . . , vn−1, xn) dxn dv1 . . . dvn−1

=∫V

(fn ◦ ϕ)(v) dv, (13.28)

the last equality by the fundamental theorem of calculus. Setting Wn = V × Iand comparing (13.27) and (13.28), we see that (13.26) holds for i = n. Asimilar proof works for i < n. Thus if K ⊆W1 ∩ · · · ∩Wn, then (13.26) holdsfor all i. Summing from 1 to n we obtain (13.25).


Connection with Stokes’s Theorem

Let E be a regular region in Rn whose boundary is a finite union of compactconnected (n− 1)-surfaces of the form S = {x : F (x) = 0}, where F : U → Ris a C1 function with ∇F 6= 0 such that Ua = U and Fa = F for all a ∈ S. Aball or annulus in Rn are simple examples. By 12.4.8, S is oriented and, foreach local parametrization ϕ : V → S,

~n(ϕ(v)) = ±1√det(ϕ′(v)tϕ′(v)

) n∑i=1

(−1)i−1 ∂(ϕ1, . . . , ϕi−1, . . . , ϕn)∂(v1, . . . , vn−1) (v).

where the sign is chosen to be the same for all v. Let each S have the orientationfor which the sign is (+). We shall call the resulting orientation of bd(E) positive.In this setting we have the following consequence of the divergence theorem.

13.5.8 Theorem. Let E be as described above and let

ω =n∑i=1

fi dx1 ∧ · · · ∧ dxi ∧ · · · ∧ dxn

be an (n− 1)-form on cl(E). If bd(E) is positively oriented, then∫bd(E)

ω =∫E

dω.

Proof. Recalling the additive definition of∫

bd(E) ω, we may assume that bd(E)consists of a single compact connected (n− 1)-surface S. Let

η :=n∑i=1

(−1)i−1fi dxi.

By the above,

(~n · η) ◦ ϕ(v) = 1√det(ϕ′(v)tϕ′(v)

) n∑i=1

(fi ◦ ϕ)(v)∂(ϕ1, . . . , ϕi, . . . , ϕn)∂(v1, . . . , vn−1) (v),

hence ∫ϕ

~n · η dS =n∑i=1

∫V

(fi ◦ ϕ)∂(ϕ1, . . . , ϕi, . . . , ϕn)∂(v1, . . . , vn−1) dv =

∫ϕ

ω.

Using a partition of unity we obtain∫S

ω =∫S

~n · η dS. (13.29)


On the other hand,

dω =n∑i=1

( n∑j=1

(∂jfi) dxj)∧ dx1 ∧ · · · ∧ dxi ∧ · · · ∧ dxn

=( n∑i=1

(−1)i−1∂ifi

)dx1 ∧ · · · ∧ dxn

= div η dx1 ∧ · · · ∧ dxn,

hence, recalling 13.2.10,∫E

dω =∫E

div ηx dx1 ∧ · · · ∧ dxn =∫E

div ηx dx (13.30)

The conclusion now follows from (13.29), (13.30), and the divergence theorem.

13.5.9 Remark. The divergence theorem has an interesting application tofluid dynamics. Consider an incompressible fluid moving in space. Let ρ(x, t)denote the density of the fluid in mass per unit volume at time t and point x,and let ~v(x, t) denote its velocity. If ~n is normal to a small surface element ofarea ∆S, then (ρ~v · ~n)(∆S)(∆t) is approximately the mass of the fluid flowingacross that surface element during a small time period ∆t. The rate of flow isthen (ρ~v · ~n)∆S. Adding these quantities and taking limits, we see that therate of flow of the fluid across a surface S in the direction of the normal isgiven by the integral ∫

S

ρ~v · ~n dS

Now let E be a regular region with smooth boundary S. Applying theforegoing to a ball Bε in E with boundary Sε, center y, and outer normal ~n,we see that the integral ∫

Sε

ρ~v · ~n dS

represents the rate of flow of the fluid out of the ball, that is, the negative ofthe rate of change of fluid in the ball. Since the amount of fluid in the ball attime t is

∫Bερ(x, t) dx,

d

dt

∫Bε

ρ(x, t) dx = −∫Sε

ρ~v · ~n dS = −∫Bε

div (ρ~v) dx,

the last equality by the divergence theorem. Differentiating under the integralsign and dividing by vol(Bε), we obtain

1vol(Bε)

∫Bε

∂tρ(x, t) dx = − 1vol(Bε)

∫Bε

div (ρ~v) dx.


Letting ε→ 0, we obtain

∂tρ(y, t) = −div(ρ(y, t)~v(y, t)

).

In particular, if ρ is constant in time, then div (ρ~v) is zero throughout E, hence∫S

ρ~v · ~n dS =∫E

div (ρ~v) dx = 0,

that is, the amount of fluid flowing out of E equals the amount flowing in. ♦

Green’s TheoremLet E be a regular region in R2 with boundary the union of finitely many

smooth simple pairwise disjoint curves C = ϕ(I). The boundary bd(E) is saidto be positively oriented if the vector obtained by rotating the unit tangentvector ~T , which is in the direction of (ϕ′1, ϕ′2), 90 degrees clockwise. Thisproduces the exterior normal ~n on C, which is in the direction of (ϕ′2,−ϕ′1).The region is then to the left as the boundary is traced in the direction of thetangent vector field on each curve C.

E

~n~T

C2

C3

C1

FIGURE 13.10: Regular region E in R2.

Now let ω = Qdx− P dy. Then

(ω ·~n)◦ϕ = (Q◦ϕ,−P ◦ϕ)·(ϕ′2,−ϕ′1)‖ϕ′‖−1 =[(P ◦ϕ)ϕ′1 +(Q◦ϕ)ϕ′2

]‖ϕ′‖−1,

hence ∫C

ω · ~n ds =∫C

(P dx+Qdy).

Summing over the curves C, we have∫bd(E)

ω · ~n ds =∫

bd(E)(P dx+Qdy).

Sincedivω = ∂Q

∂x− ∂P

∂y,

we obtain the following important special case of the divergence theorem.


13.5.10 Green’s Theorem. Let E be a region in R2, as described above. IfP, Q are C1 functions on an open set containing E, then∫

bd(E)(P dx+Qdy) =

∫∫E

(∂Q

∂x− ∂P

∂y

)dx dy. (13.31)

13.5.11 Corollary. The area of E is given by

area(S) = 12

∫∂S

(x dy − y dx).

Proof. Apply Green’s theorem to P (x, y) = −y/2, Q(x, y) = x/2, noting thatQx − Py = 1.

13.5.12 Example. The ellipse x2

a2 + y2

b2= 1 has parametrization x = a cos t,

y = b sin t, 0 ≤ t ≤ 2π. Therefore, the area inside the ellipse is

12

∫ 2π

0ab(cos2 t+ sin2 t) dt = πab. ♦

The Piecewise Smooth CaseBoth Stokes’s theorem and the divergence theorem may be extended to

more general surfaces called piecewise smooth. In the case n = 3, these arefinite unions of smooth surfaces S1, . . ., Sk that fit together so that

• no three surfaces meet in more than a single point, and

• the common boundary of two of these surfaces consists of finitely manydisjoint piecewise smooth simple closed curves.

S2

S3

S4

S3

S2

S5

S1

S1

FIGURE 13.11: Piecewise smooth surfaces.

(See Figure 13.11.) If S is such a surface, then the surface integral∫Sf dS is

defined as the sum∑kj=1

∫Sjf dS. The integral of a form on S has an analogous

definition. These definitions are reasonable since, by cancelations, the common


boundary of a pair of surfaces contributes nothing to the integral. We illustratethe basic idea with the simple example of a cube. Removing a face of the cuberesults in a surface-with-boundary Q, which we orient by the outward normal.If Stokes’s theorem is applied to each of the five faces and the results are added,the integrals along the boundaries cancel and one is left with Stokes’s theoremfor Q: ∫

∂Q

~F · ~dr =∫Q

curl ~F · ~N dS.

∂Q

Q

FIGURE 13.12: Oriented cube without bottom face.

Similarly, Green’s theorem extends to regions in R2 whose boundariesare only piecewise smooth. This, of course, leads to extended versions of itscorollaries. Here’s an application of the extended version of 13.5.11:

13.5.13 Example. Let ∂S be a closed polygon consisting of m line segments

Li := [(ai, bi) : (ai+1, bi+1)], i = 1, 2, . . . ,m,

where (am+1, bm+1) = (a1, b1) and the vertices are in counterclockwise order.(See Figure 13.13.)

(a1, b1)

(a2, b2)

(a3, b3)

(a4, b4)

(a5, b5)

L1

L2

L3

L4

L5

FIGURE 13.13: Closed polygon.

Then Li has the parametrization

x = (1− t)ai + tai+1, y = (1− t)bi + tbi+1, 0 ≤ t ≤ 1,

hence∫Li

(x dy − y dx) = (bi+1 − bi)∫ 1

0

[(1− t)ai + tai+1

]dt

− (ai+1 − ai)∫ 1

0

[(1− t)bi + tbi+1

]dt

= aibi+1 − ai+1bi.

Therefore,

area(S) = 12

m∑1=1

(aibi+1 − ai+1bi). ♦

Exercises1.S Verify directly the following version of Stokes’s theorem∫

∂S

[f dx+ g dy + h dz]

=∫S

[(hy − gz) dy ∧ dz + (fz − hx) dx ∧ dz + (gx − fy) dx ∧ dy

],

where S is the cylinder{

(x, y, z) : x2 + y2 = 1, 0 ≤ z ≤ 1}.

2. For (x, y) 6= (0, 0) define

P (x, y) = −y dxx2 + y2 and Q(x, y) = x dy

x2 + y2 .

Show that

(a) Qx = Py.(b)

∫ϕrP dx+Qdy = 2π, where ϕr(t) = (r cos t, r sin t), 0 ≤ t ≤ 2π.

(c)∫ψP dx + Qdy = 2π, where ψ is any piecewise smooth, clockwise

oriented, simple closed curve enclosing (0, 0).

(d)∫ 2π

0

cos2m t sin2m t

a2 cos4m+2 t+ b2 sin4m+2 tdt = 2π

(2m+ 1)ab , m ∈ Z+, a, b > 0.

3. Let 0 < r < R and let S ={

(x, y) : r2 ≤ x2 + y2 ≤ R2}. Verify Green’s


theorem on S for

(a) S P (x, y) = −y√x2 + y2

, Q(x, y) = x√x2 + y2

.

(b) P (x, y) = y√x2 + y2

, Q(x, y) = x√x2 + y2

.

(c) P (x, y) = x

x2 + y2 , Q(x, y) = −yx2 + y2 .

4. Use Green’s theorem to evaluate the following integrals, where the curvesC have counterclockwise orientation.

(a)∫C

[sin(x− y) dx+ sin(x+ y) dy

], C = bd

([0, π/2]× [0, π/2]

).

(b)∫C

[e−xy dx+ exy dy

], C = bd

([0, 1]× [0, 1]

).

(c)∫C

[cos(xy) dx+ sin(xy) dy

], C = bd

([0, 1]× [0, 1]

).

(d)S

∫C

[f(x) dx+ g(y) dy

], where f and g are C1 and C is simple, closed,

and piecewise C1.

5.S Use 13.5.11 to show that the area enclosed by the “elliptical astroid”(x2

a2

)1/(2m+1)

+(y2

b2

)1/(2m+1)

= 1, a > 0, b > 0, m ∈ Z+,

is given by

β

∫ π/2

0

(cos2m t+ sin2m t) dt = βπ

2(2m− 1)(2m− 3) · · · 5 · 3

2m(2m− 2) · · · 4 · 2 ,

where β := 4−mab(m+ 12 ). (See 5.3.4.)

6. Let E be a regular region in Rn and let f and g be C2 on cl(E). ProveGreen’s formulas:

(a)∫

bd(E)f∇g · ~n dS =

∫E

(∇f ·∇g + f∇2g

)dx.

(b)∫

bd(E)(f∇g − g∇f) · ~n dS =

∫E

(f∇2g − g∇2f

)dx,

where ∇2f :=n∑i=1

∂2f

∂x2i

, the Laplacian of f .

7. A C2 function f is said to be harmonic on set S ⊆ Rn if ∇2f = 0 on anopen set containing S.(a) Show that if f is harmonic on the ball Cr(0), then

∫Sr(0)∇f ·~n dS = 0.

(b) Show that if f and g are harmonic on the region cl(E) of 13.5.6 and~nt = ‖x‖−1x on St := St(0), then∫

S1

∇f · ~n1 dS =∫S2

∇f · ~n2 dS

and ∫S1

(g∇f − f ∇g) · ~n1 dS =∫S2

(g∇f − f ∇g) · ~n2 dS.

8. Let E ⊆ Rn be a regular region and let f be harmonic on cl(E) (Exer-cise 7). Show that∫

E

‖∇f‖2 dx =∫

bd(E)f ∇f · ~n dS,

where ~n is the outer normal. Deduce that if f = 0 on bd(E) and E isconnected, then f = 0 on E.

9.S Let E ⊆ Rn be a regular region and let f and g be harmonic on cl(E)(Exercise 7). Show that∫

bd(E)(f∇g + g∇f) · ~n dS = 2

∫E

∇f ·∇g dx,

where ~n is the outer normal.

10. Let n > 2. For t > 0, let Ct = Ct(0), St = St(0), and ~nt(x) = ‖x‖−1x,the outer normal to St. Suppose f is harmonic on Cr (Exercise 7). Provethe average value property of harmonic functions

f(0) = 1area(Sr)

∫Sr

f dS

by verifying (a)–(f) for 0 < t ≤ r. (Refer to 13.4.2.)

(a) The function g(x) := ‖x‖2−n, x 6= 0, is harmonic.

(b)∫St

f ∇g · ~nt dS = 2− ntn−1

∫St

f dS.

(c)∫St

g∇f · ~nt dS = 0.

(d) 1tn−1

∫St

f dS = 1rn−1

∫Sr

f dS.


(e) 1area(Sr)

∫Sr

f dS = 1area(St)

∫St

f dS.

(f) limt→0

1area(St)

∫St

f dS = f(0).

11. Let E be a region as in the statement of Green’s theorem. For thefunctions ψ in (a) and (b) below, prove that if the conclusion of Green’stheorem holds for ψ(E), then it holds for E. (This is a special case ofthe statement in the proof of the divergence that the region E may berotated and translated without loss of generality.)(a) ψ is the translation ψ(x, y) = (x+ x0, y + y0).(b) ψ is the rotation ψ(x, y) =

(x cos θ − y sin θ, x sin θ + y cos θ

).

S1 S1

S2

S2

CC

(a) (b)

FIGURE 13.14: Surfaces S1 and S2 with common boundary C.

12.S Orient the surfaces S1 and S2 in (a) and (b) of Figure 13.14 by theirouter normals ~n. Show that in

(a),∫S1∪S2

curl ~F · ~n dS = 0; (b),∫S1

curl ~F · ~n dS =∫S2

curl ~F · ~n dS.

13. Let a ∈ Rn, n > 2, and define an (n− 1) form ω on Rn+1 \ {a} by

ωx = ‖x− a‖−nn∑i=1

(−1)i−1(xi − ai) dx1 ∧ · · · ∧ dxi ∧ · · · ∧ dxn.

Show that dω = 0. Conclude that if S is a compact, oriented n-surface-with-boundary in Rn+1 and a 6∈ S, then

∫∂Sω = 0.

14.S Use the divergence theorem and 11.5.6 to show that the area of thesphere Sr(0) is nrn−1αn, derived by another method in 13.4.2.

15. Let E ⊆ Rn be a regular region and a ∈ E. Define f on Rn \ {a} by


f(x) = ‖x− a‖2−n. Show that div∇f = 0. Conclude that if Cr(a) ⊆ E,then ∫

bd(E)(∇f) · ~n dS =

∫Sr(a)

(∇f) · ~n dS = (2− n)nαn,

where ~n denotes the outer normals.

*13.6 Closed Forms in Rn

13.6.1 Definition. A C1 m-form ω on an open subset W of Rn is said to beclosed if dω = 0. The form ω is exact if there exists a C2 (m− 1)-form η onW such that d η = ω. ♦

By 13.1.16(b), an exact form is closed. The converse is false (see Exer-cise 13.5.2). However, there is a general class of regions on which every closedm-form is exact. We consider first the case m = 1.

Closed 1-Forms on Simply Connected Regions13.6.2 Definition. An open connected subset U of Rn is said to be simplyconnected if for each closed C2 curve ϕ : [a, b] → Rn in U there exists a C2

function Φ : [a, b]× [0, 1]→ U such that for all s ∈ [0, 1] and t ∈ [a, b],

Φ(t, 1) = ϕ(t), Φ(t, 0) = ϕ(a) = ϕ(b), and Φ(a, s) = Φ(b, s). ♦

The function Φ is called a (C2) homotopy between ϕ and the point p : ϕ(a) =ϕ(b).

1Φ( · , 1)

a b

Φ( · , 0)

Φ( · , s)

p

q

s

t

s

FIGURE 13.15: Curves contracting to p must pass through q.

Note that, for each s ∈ [0, 1], Φ(·, s) is a closed C2 curve in U such that

Φ(·, 1) = ϕ and Φ(·, 0) is a single point p. Thus a simply connected region Uhas the property that every closed curve in U may be contracted smoothly to apoint while remaining in U (see Figure 13.15). In R2 this means that there areno “holes” in U . In higher dimensions a simply connected set may have holes.For example, Rn \C1(0) is simply connected if n ≥ 3. However, the holes maynot be too large: the set R3 \ L, where L is a line, is not simply connected.

To prove that every closed 1-form of class C2 on a simply connected set isexact, we follow [5].

13.6.3 Lemma. Let ω be a closed 1-form on a simply connected subset U ofRn. Then

∫ϕω = 0 for each closed C2 curve ϕ in U .

Proof. Let ω =∑nj=1 fj dxj and let Φ : [a, b]× [0, 1] → U be a homotopy as

in 13.6.2. By hypothesis,

0 = dω =n∑j=1

n∑i=1

∂ifjdxi ∧ dxj =∑

1≤i<j≤n(∂ifj − ∂jfi)dxi ∧ dxj ,

hence∂ifj = ∂jfi for all i and j. (13.32)

Define C1 functions P (t, s) and Q(t, s) on S := [a, b]× [0, 1] by

P =n∑j=1

(fj ◦ Φ)∂sΦj and Q =n∑i=1

(fi ◦ Φ)∂tΦj ,

where we have suppressed the variable (t, s). By the chain and product rules,

∂tP =n∑j=1

{(∂sΦj)[(∇fj) ◦ Φ] · (∂tΦ) + (fj ◦ Φ)(∂tsΦj)

}and

∂sQ =n∑i=1

{(∂tΦi)[(∇fi) ◦ Φ] · (∂sΦ) + (fi ◦ Φ)(∂stΦi)

}.

Since Φ is C2, ∂tsΦj = ∂stΦj , hence

∂tP − ∂sQ =n∑j=1

{(∂sΦj)[(∇fj) ◦ Φ] · (∂tΦ)−

n∑i=1

{(∂tΦi)[(∇fi) ◦ Φ] · (∂sΦ)

=∑i, j

(∂sΦj)[(∂ifj) ◦ Φ](∂tΦi)−∑i, j

(∂tΦi)[(∂jfi) ◦ Φ](∂sΦj).

By (13.32), ∂tP − ∂sQ = 0, hence, by Green’s theorem,∫∂S

(P ds+Qdt) = 0.


Now, the positively oriented boundary of S consists of the parameterizedline segments

ψ1(t) = (t, 0), a ≤ t ≤ b; ψ2(s) = (b, s), 0 ≤ s ≤ 1;ψ3(t) = (−t, 1), −b ≤ t ≤ −a; ψ4(s) = (a,−s), −1 ≤ s ≤ 0.

(See Figure 13.6.) From the calculations

a

1

t

s

b

ψ1

ψ2

ψ3

ψ4

FIGURE 13.16: Boundary parametrization.

∫ψ1

(P ds+Qdt) =∫ b

a

Q(t, 0) dt,∫ψ2

(P ds+Qdt) =∫ 1

0P (b, s) ds,∫

ψ3

(P ds+Qdt) = −∫ b

a

Q(t, 1) dt,∫ψ4

(P ds+Qdt) = −∫ 1

0P (a, s) ds,

we have

0 =∫∂S

(P ds+Qdt) =∫ 1

0

[P (b, s)− P (a, s)

]ds+

∫ b

a

[Q(t, 0)−Q(t, 1)

]dt,

=∫ 1

0

n∑j=1

[fj(Φ(b, s))

)∂sΦj(b, s)− fj

(Φ(a, s)

)∂sΦj(a, s)

]ds

+∫ b

a

n∑j=1

[fj(ϕ(a)

)∂tΦj(t, 0)− fj

(ϕ(t)

)∂tΦj(t, 1)

]dt.

Since Φ(b, s) = Φ(a, s), the first integral is zero hence so is the second. Since∂tΦj(t, 0) = 0 and ∂tΦj(t, 1) = ϕ′j(t), we see from the second integral that∫

ϕ

ω =∫ b

a

n∑j=1

fj(ϕ(t)

)ϕ′j(t) dt = 0.

13.6.4 Lemma. Let ϕ : [0, 1] :→ Rn be a piecewise C1 curve such thatϕ(0) = ϕ(1). Then there exists a sequence of C∞ functions ϕk : [0, 1] → Rnwith the following properties:


(a) ϕk(0) = ϕk(1) = ϕ(0) for all k.

(b) limk ϕ′k(t) = ϕ′(t) at each continuity point t of ϕ′.

(c) limk ϕk = ϕ uniformly on [0, 1].

(d) The sequences {ϕk} and {ϕ′k} are uniformly bounded on [0, 1].

Proof. By considering components, we may assume that n = 1, that is, ϕ isreal-valued. Extend ϕ′ periodically with period 1 to R. Let M be a bound for|ϕ′| on R. By 13.3.3 there exists a C∞ function h : R→ R+ such that h > 0on (−1, 1) and h = 0 on (−1, 1)c. Multiplying h by a positive constant, wemay assume that

∫R h = 1. Let hk(x) = kh(kx), k = 1, 2, . . . . Then hk ≥ 0,

hk(x) = 0 for |x| ≥ 1/k, and∫R hk = 1. Define a C∞ function gk on R by

gk(x) =∫ ∞−∞

ϕ′(y)hk(x− y) dy =∫ 1/k

−1/kϕ′(x+ y)hk(y) dy.

The sequence {gk} is uniformly bounded since

|gk(x)| ≤∫ ∞−∞|ϕ′(x+ y)|hk(y) dy ≤M

∫ ∞−∞

hk(y) dy = M.

By periodicity,∫ 1

0ϕ′(x+ y) dx =

∫ 1

0ϕ′(x) dx = ϕ(1)− ϕ(0) = 0

(Exercise 5.3.1), hence, by Fubini’s theorem,∫ 1

0gk(x) dx =

∫ ∞−∞

hk(y)∫ 1

0ϕ′(x+ y) dx dy = 0.

Now define ϕk on R by

ϕk(x) = ϕ(0) +∫ x

0gk(y) dy.

Then (a) and (d) hold and (b) follows from

ϕ′k(x)− ϕ′(x) = gk(x)− ϕ′(x) =∫ 1/k

−1/k

[ϕ′(x+ y)− ϕ′(x)

]hk(y) dy,

which tends to 0 at continuity points x as k → +∞.Finally, (c) follows from (b), the inequality

|ϕk(t)− ϕ(t)| ≤∫ t

0|ϕ′k(x)− ϕ′(x)| dx ≤

∫ 1

0|ϕ′k(x)− ϕ′(x)| dx,

and Lebesgue’s dominated convergence theorem, noting that the set of discon-tinuity points of ϕ′ is finite and hence has measure zero.


13.6.5 Theorem. Let ω be a closed 1-form on a simply connected subset Uof Rn. Then ω is exact.Proof. By 12.2.10 it suffices to show that

∫ϕω = 0 for every piecewise C1

closed curve ϕ : [0, 1]→ U . Let {ϕk} be as in 13.6.4. Since ϕk → ϕ uniformlyon [0, 1] and ϕ([0, 1]) ⊆ U , it follows that ϕk([0, 1]) ⊆ U for all sufficiently largek (Exercise 8.5.22). For such k,

∫ϕkω = 0 by 13.6.3. By (b) and (c) of 13.6.4,

Lebesgue’s dominated convergence theorem, and the definition of integral of aform (13.16),

∫ϕkω →

∫ϕω. Therefore,

∫ϕω = 0, as required.

Closed m-Forms on Star-Shaped Regions13.6.6 Definition. A subset W of Rn is said to be star-shaped with respectto y ∈W if the line segment from y to any point x ∈W lies in W :

y + t(x− y) ∈W, 0 ≤ t ≤ 1. ♦

For example, a convex set is star-shaped with respect to every one of itspoints. In Figure 13.17, W is star-shaped with respect to y but not z, and Vis not star-shaped with respect to any of its points.

y

z x

W

y x

V

FIGURE 13.17: Star-shaped and non-star-shaped regions.

13.6.7 Poincaré’s Lemma. LetW ⊆ Rn be open and star-shaped with respectto some y ∈W . If ω is a closed C1 m-form on W , where 1 ≤ m ≤ n, then ωis exact.Proof. Define a function ψ : [0, 1] ×W → W by ψ(t,x) = y + t(x − y). Foran r-form

η =∑j∈Jr

gj dxj

on W , define the (r − 1)-form η on W by

ηx =∑

j∈Jm

[∫ 1

0tr−1(gj ◦ ψ)(t,x) dt

]η j, where

η j :=r∑i=1

(−1)i−1(xji − yji) dxj1 ∧ · · · ∧ dxji ∧ · · · ∧ dxjr , j = (j1, . . . , jr).


A standard argument shows that the definition of η is independent of thechoice of representation of η. In particular, by putting η in canonical form wesee that η = 0⇒ η = 0. Furthermore,

dη j =r∑i=1

(−1)i−1d(xji − yji) dxj1 ∧ · · · ∧ dxji ∧ · · · ∧ dxjm = r dxj.

Now letω =

∑j∈Jm

fj dxj.

Thenω =

∑j∈Jm

[∫ 1

0tm−1(fj ◦ ψ)(t,x) dt

]ωj,

and, by 13.1.16(d) (suppressing the variables (t,x) in fj ◦ ψ(t,x)),

d ω =∑

j∈Jm

{d

[∫ 1

0tm−1fj ◦ ψ dt

]∧ ωj +

[∫ 1

0tm−1fj ◦ ψ dt

]dωj

}.

Differentiating under the integral sign, applying the chain rule, and notingthat ψx = tIn, we have

d

[∫ 1

0tm−1(fj ◦ ψ) dt

]=

n∑i=1

[∫ 1

0tm(∂i(fj) ◦ ψ) dt

]dxi.

Therefore, using dωj = mdxj,

d ω =∑

j∈Jm

{n∑i=1

[∫ 1

0tm(∂ifj) ◦ ψ dt

]dxi ∧ ωj +m

[∫ 1

0tm−1fj ◦ ψ

]dxj

}.

(13.33)

On the other hand,

dω =∑

j∈Jm

(n∑i=1

∂ifj dxi

)∧ dxj =

∑j∈Jm

n∑i=1

∂ifj dxi ∧ dxj,

hence, since dω = 0,∑j∈Jm

n∑i=1

[∫ 1

0tm(∂i(fj) ◦ ψ)(t,x) dt

](dω)(i,j) = dω = 0.

By the above definition,

(dω)(i,j) =m∑`=1

(−1)`(xj` − yj`) dxi ∧ dxj1 ∧ · · · ∧ dxj` ∧ · · · ∧ dxjm

+ (xi − yi) dxj1 ∧ · · · ∧ dxjm= − dxi ∧ ωj + (xi − yi) dxj,


hence

=∑

j∈Jm

n∑i=1

[∫ 1


] [− dxi ∧ ωj + (xi − yi) dxj

]= 0. (13.34)

Adding (13.33) and (13.34), we obtain

d ω =∑

j∈Jm

{m

[∫ 1

0tm−1(fj ◦ ψ)

]+

n∑i=1

[(xi − yi)

∫ 1


]}dxj.

The term in braces is simply∫ 1

0

d

dt[tmfj ◦ ψ] dt = tm fj ◦ ψ

∣∣10 = fj.

Therefore, d ω = ω, which shows that ω is exact.

From Poincaré’s lemma we obtain the following results from classical vectoranalysis, where, in keeping with the spirit, we write grad f for ∇f .

13.6.8 Corollary. Let W be an open star-shaped subset of R3 and let

~F (x, y, z) =(P (x, y, z), Q(x, y, z), R(x, y, z)

)be a C1 vector field on W . Then

(a) curl ~F = 0 iff ~F = grad f for some C2 function f : W → R.

(b) div ~F = 0 iff ~F = curl ~G for some C2 vector field ~G on W .

Proof. (a) If ~F = grad f = (fx, fy, fz), then

curl ~F = (fzy − fyz, fxz − fzx, fyx − fxy),

which is zero because f is C2. Conversely, assume that curl ~F = 0, that is,

Ry −Qz = Pz −Rx = Qx − Py = 0.

Let ω = P dx+Qdy +Rdz. Then

dω = (Py dy + Pz dz) ∧ dx+ (Qx dx+Qz dz) ∧ dy + (Rx dx+Ry dy) ∧ dz= (Qx − Py) dx ∧ dy + (Rx − Pz) dx ∧ dz + (Ry −Qz) dy ∧ dz = 0

so ω is closed. By Poincaré’s lemma, there exists a 0-form f of class C2 on Wsuch that df = ω, that is, grad f = ~F .

(b) If ~F = curl ~G, where ~G = (f, g, h), then

P = hy − gz, Q = fz − hx, and R = gx − fy,


hence, if G is C2,

div ~F = Px +Qy +Rz = (hyx − gzx) + (fzy − hxy) + (gxz − fyz) = 0.

Conversely, assume div ~F = 0 and let

ω = Rdx ∧ dy + P dy ∧ dz +Qdz ∧ dx.

Then dω = div ~F dx ∧ dy ∧ dz, hence ω is closed. By Poincaré’s lemma,

ω = d(f dx+g dy+h dz) = (gx−fy) dx∧ dy+(hx−fz) dx∧ dz+(hy−gz) dy∧ dz

for some C2 functions f , g, h on W . Therefore,

P = hy − gz, Q = fz − hx, R = gx − fy,

that is, ~F = curl (f, g, h).

Part III

Appendices

Appendix ASet Theory

In this appendix we give an overview of those aspects of elementary set theorythat are used throughout the book. For details the reader may wish to consult[2, 8].

Notation for a SetA set is simply a collection of objects, each of which is called a member or

element of the set. Sets are usually denoted by capital letters, and members ofa set by small letters. If x is a member of the set A, we write x ∈ A; otherwise,we write x 6∈ A. The empty set, denoted by ∅, is the set with no members.

A concrete set may be described either by listing its elements or by set-builder notation. The latter notation is of the form {x : P (x)}, which is read“the set of all x such that P (x),” where P (x) is a well-defined property that xmust possess in order to belong to the set. For example, the set A of all oddpositive integers may be described as

A = {1, 3, 5, . . .} = {n : n = 2m− 1 for some positive integer m}.

A set A is a subset of a set B, written A ⊆ B, if every member of A is amember of B. If A ⊆ B and A 6= B, then A is called a proper subset of a set B.The empty set is a subset of every set and a proper subset of every nonemptyset. Sets A and B are said to be equal, written A = B, if each is a subset ofthe other. If all sets under consideration are subsets of the set S, then S iscalled a universal set (of discourse).

Set OperationsLet S be a universal set. The basic set operations are

A ∪B = {x : x ∈ A or x ∈ B}, union of A and B;A ∩B = {x : x ∈ A and x ∈ B}, intersection of A and B;A×B = {(x, y) : x ∈ A and y ∈ B}, Cartesian product of A and B;Ac = {x : x ∈ S and x 6∈ A}, complement of A in S;

A \B = {x : x ∈ A and x 6∈ B}, difference of A and B.

More generally, if {Ai : i ∈ I} is an arbitrary collection of sets indexed by a

505


set I, then the union and intersection of the collection are defined, respectively,by ⋃

i∈I

Ai = {x : x ∈ Ai for some i ∈ I},⋂i∈I

Ai = {x : x ∈ Ai for every i ∈ I}.

If the index set is {1, 2 . . . , n} or {1, 2, . . . , n, . . .}, we use the alternate notationn⋃j=1

Aj = A1 ∪A2 ∪ · · · ∪An,n⋂j=1

Aj = A1 ∩A2 ∩ · · · ∩An

and∞⋃j=1

Aj = A1 ∪A2 ∪ . . . ,∞⋂j=1

Aj = A1 ∩A2 ∩ . . .

A sequence of sets An is said to be increasing if A1 ⊆ A2 ⊆ · · · , in whichcase we write An ↑. Similarly, the sequence is decreasing if A1 ⊇ A2 ⊇ · · · ,written An ↓. In the first case we also write An ↑ A, where A = A1 ∪A2 ∪ · · · ,and in the second An ↓ A, where A = A1 ∩A2 ∩ · · · .

For finitely many sets we extend the definition of Cartesian product byn∏j=1

Aj = A1 × · · · ×An = {(a1, . . . , an) : aj ∈ Aj , j = 1, . . . n},

where (a1, . . . , an) is an (ordered) n-tuple. Also, we write

An = A×A · · · ×A︸︷︷︸n

.

In particular, for an interval [a, b] and the set of all real numbers R,

[a, b]n = [a, b]× · · · × [a, b]︸︷︷︸n

and Rn = R× · · · × R︸︷︷︸n

.

The following propositions summarize the basic properties of set operationsthat will be needed in the text. As with many set equalities, they may beestablished directly by showing that an arbitrary member of the left side of anequation is a member of the right side, and vice versa.

Proposition. If {Ai : i ∈ I} is collection of subsets of a set S, then

(a)(⋃

i∈I

Ai

)c=⋂i∈I

Aci . (b) A ∪(⋂

i∈I

Ai

)=⋂i∈I

A ∪Ai.

(c)(⋂

i∈I

Ai

)c=⋃i∈I

Aci . (d) A ∩(⋃

i∈I

Ai

)=⋃i∈I

A ∩Ai.

Set Theory 507

Parts (a) and (c) of the above proposition are known asDeMorgan’s laws.Parts (b) and (d) are called distributive laws.Proposition. The Cartesian product of sets has the following properties:

(a) A×(A1 ∪ · · · ∪An

)= (A×A1) ∪ · · · ∪ (A×An).

(b) A×(A1 ∩ · · · ∩An

)= (A×A1) ∩ · · · ∩ (A×An).

(c)(A1 ∩ · · · ∩An

)×(B1 ∩ · · · ∩Bn

)= (A1 ×B1) ∩ · · · ∩ (An ×Bn).

Partitions and Equivalence RelationsA collection of sets is pairwise disjoint if A∩B = ∅ for each pair of distinct

members A and B in the collection. A partition of a set S is a collection ofnonempty pairwise disjoint sets whose union is S.

An equivalence relation on a set S is a subset R of S×S with the followingproperties:

• (reflexivity) xRx for every x ∈ S;

• (symmetry) xRy ⇒ yRx;

• (transitivity) xRy and yRz ⇒ xRz.Here, as is customary, we have written xRy for (x, y) ∈ R.

There is an important duality regarding partitions and equivalence relations:If R is an equivalence relation on S, then the collection of sets of the form

[x] := {y ∈ S : xRy},

called an equivalence class of the relation, is a partition of S. Conversely, givena partition of S, define

xRy iff x and y are in the same partition member.

Then R is an equivalence relation on S whose equivalence classes are preciselythe members of the partition.

FunctionsLet A and B be nonempty sets. A function or mapping from A to B is a

rule f that assigns to each member x of A a unique member f(x) of B. Wethen write f : A → B. The set A is called the domain of f . The alternatenotation x 7→ f(x) : A→ B is also used. If A0 ⊆ A and B0 ⊆ B, then

f(A0) = {f(x) : x ∈ A0} and f−1(B0) = {x ∈ A : f(x) ∈ B0}

are called, respectively, the image of A0 and the pre-image of B0 under f . Theset f(A) is called the range of f . A function f : A→ B is said to be onto B iff(A) = B, and one-to-one if x1 6= x2 implies f(x1) 6= f(x2).


Proposition. Let f : A→ B be a function, {Ai : i ∈ I} a collection of subsetsof A, and {Bj : j ∈ J} a collection of subsets of B. Then

(a) f−1(⋃

j∈J

Bj

)=⋃j∈J

f−1(Bj).

(b) f−1(⋂

j∈J

Bj

)=⋂j∈J

f−1(Bj).

(c) f(⋃

i∈I

Ai

)=⋃i∈I

f(Ai).

(d) f(⋂

i∈I

Ai

)⊆⋂i∈I

f(Ai), where equality holds if f is one-to-one.

(e) f−1(Bcj ) =[f−1(Bj)

]c.(f) f(Aci ) ⊆

[f(Ai)

]c, where equality holds if f is onto B.(g) f

(f−1(Bj)

)⊆ Bj, where equality holds if f is onto B.

(h) Ai ⊆ f−1(f(Ai)), where equality holds if f is one-to-one.

If f : A → B and g : C → D are functions with B ⊆ C, then thecomposition of g and f is the function g ◦ f : A→ D defined by

(g ◦ f)(x) = g(f(x)

), x ∈ A.

If D0 ⊆ D, then(g ◦ f)−1(D0) = f−1(g−1(D0)

).

If f : A → B is one-to-one and onto B, then the inverse f−1 : B → A isdefined by the rule x = f−1(y) iff y = f(x). One then has the identities

(f−1 ◦ f)(x) = x and (f ◦ f−1)(y) = y, x ∈ A, y ∈ B.

Thus f−1 ◦ f and f ◦ f−1 are the identity functions on A and B, respectively.

CardinalityTwo sets A and B are said to have the same cardinality if there exists a

one-to-one function from A onto B. A set A is finite if either A is the empty setor A has the same cardinality as {1, 2, . . . , n} for some positive integer n. Inthe latter case, the members of A may be labeled with the numbers 1, 2, . . . , n,so A may be written {a1, a2, . . . , an}. A set A is countably infinite if it has thesame cardinality as the set of natural numbers. In this case the members of Amay be labeled with the positive integers 1, 2, 3, . . . A set is countable if it iseither finite or countably infinite; otherwise it is said to be uncountable. Theset of all integers is countably infinite, as is the set of rational numbers. Theset R of all real numbers is uncountable, as is any (nondegenerate) interval ofreal numbers.

Appendix BLinear Algebra

This appendix contains a brief review of the main ideas of linear algebra thatwill be needed in Part II of the text. For details and proofs the reader isreferred to [9].

Vector Spaces. BasesA vector space is a set V containing at least one member 0, called the

zero vector, together with two operations u + v and au (u, v ∈ V, a ∈ R),called vector addition and scalar multiplication, respectively, such that for allu, v, w ∈ V and a, b ∈ R the following axioms hold:

• Associativity of addition: (u+ v) +w = u+ (v +w).

• Commutativity of addition: u+ v = v + u.

• Additive identity: v + 0 = v.

• Existence of additive inverse: u+ (−u) = 0.

• Associativity of scalar multiplication: (ab)u = a(bu).

• Scalar distributivity: a(u+ v) = au+ av.

• Vector distributivity: (a+ b)u = au+ bu.

• Scalar multiplicative identity: 1u = u.

A subsetW of V containing the zero vector and closed under the operationsof vector addition and scalar multiplication is called a subspace of V. The setW is then a vector space under the operations it inherits from V. A linearcombination of vectors v1, . . . ,vn ∈ V is an expression of the form

c1v1 + · · ·+ cnvn, cj ∈ R.

The set of all linear combinations of v1, . . . ,vn is called the linear span ofv1, . . . ,vn or the subspace spanned by v1, . . . ,vn. The vectors v1, . . . ,vn arethen said to span V.

Vectors v1, . . . ,vn ∈ V are linearly independent if an equation of the form

c1v1 + · · ·+ cnvn = 0

509


can hold only if c1 = · · · = cn = 0. A basis for V is a finite set of linearlyindependent vectors that span V . It follows that each member of V is uniquelyexpressible as a linear combination of the basis vectors. A vector space that hasa basis is said to be finite dimensional; otherwise it is infinite dimensional. Allbases in a finite dimensional vector space V have the same number of vectors.This number is called the dimension of the vector space and is denoted bydimV. A frame for a finite dimensional vector space is an ordered basis.

If V is finite dimensional, then every set of linearly independent vectorsmay be extended to a basis, and every finite set of vectors that span V maybe reduced to a basis.

An important example of a finite dimensional vector space is Euclideanspace Rn (Section 1.6). The standard basis in Rn consists of the n vectors

e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), en = (0, 0, . . . , 0, 1).

An example of an infinite dimensional vector space is the set of all Riemannintegrable functions on [a, b] with the operations of pointwise addition andscalar multiplication.

A basis {w1, . . . ,wm} for a subspace W of Rn is orthonormal if

wi ·wj ={

0 if i 6= j,1 if i = j,

where (·) is the usual inner (= dot) product on Rn. For example, the standardbasis is orthonormal. Every subspace of Rn has an orthonormal basis.

Linear TransformationsLet U and V be vector spaces. A linear transformation from U to V is a

function T : U → V with the properties

T (u+ v) = Tu+ Tv and T (cu) = cTu, u, v ∈ U , c ∈ R.

Here, we have used the convention for linear transformations of dropping theparentheses in the notation T (u) when there is no danger of ambiguity. Thecollection of all linear transformations from U to V is denoted by L(U ,V). Itis a vector space under the operations T1 + T2 and cT defined by

(T1 + T2)(u) = T1u+ T2u, (cT )u = c(Tu), u ∈ U , c ∈ R.

If T ∈ L(U ,V) and S ∈ L(V,W), then ST := S ◦ T is a member of L(U ,W).Also, the subspace N (T ) := T−1({0}) of U is called the nullspace of T . Therange of T , which is a subspace of V, is denoted by R(T ). If U and R(T ) arefinite dimensional, then

dimN (T ) + dimR(T ) = dimU .

Linear Algebra 511

If T ∈ L(U ,V) is one-to-one and onto V, then T−1 ∈ L(V,U). In this caseT is said to be invertible. If U and V are finite dimensional, then T is invertibleiff N (T ) = {0} iff R(T ) = V. In this case T maps a frame (u1, . . . ,un) in Uonto a frame (v1, . . . ,vn) in V, where vj = Tuj . We indicate this by writing

T (u1, . . . ,un) = (v1, . . . ,vn).

MatricesAn m× n matrix is a rectangular array of real numbers with m rows and

n columns. It is written variously as

A = [aji ]m×n =

a1

1 a21 · · · an1

a12 a2

2 · · · an2...

... · · ·...

a1m a2

m · · · anm

=

a1a2...am

=[a1 a2 · · · an

],

where ai = (a1i , · · · , ani ) is the ith row of A and aj = (aj1, · · · , ajm) is the jth

column of A (written, of course, as a column). The number aji located in row iand column j of the matrix is also written aij and is called the (i, j)th entryof A.

For a ∈ R and matrices A = [aji ]m×n, B = [bji ]m×n, and C = [aji ]n×p, thesum A+B, scalar multiple aA, and product AC are defined, respectively, by

A+B = [xji ], xji := aji+b

ji , aA = [yji ], y

ji := aaji , AC = [zji ], z

ji :=

n∑k=1

aki cjk.

The product AC may also be written asa1a2...am

m×n

[c1 c2 · · · cp

]n×p = [ai · cj ]m×p.

The m× n matrix Om×n with all entries equal to 0 is called a zero matrix.It has the property that A+Om×n = A for all m×n matrices A. The collectionof m× n matrices is a vector space under the operations A+B and aA andwith zero Om×n.

The transpose of an m× n matrix A is the n×m matrix At:= [xji ], wherexji = aij . For example [

1 2 34 5 6

]t=

1 42 53 6

.The transpose operation has the following properties:

(A+B)t = At +Bt, (aA)t = aAt, (AC)t = CtAt.

For each n, the matrix

In :=

1 0 · · · 00 1 · · · 0...

... · · ·...

0 0 · · · 1

is called the nth order identity matrix. It has the property that

AIn = A and InB = B

for all m× n matrices A and all n× p matrices B.An n×n matrix A is said to be nonsingular if there exists a matrix, denoted

by A−1 and called the inverse of A, such that

AA−1 = A−1A = In.

The inverse operation has the property

(AB)−1 = B−1A−1

for all nonsingular n× n matrices A and B.An m×n matrix A is said to be in reduced row echelon form if the following

conditions hold:

• Any nonzero row has its first entry equal to 1. This entry is then calledthe leading entry of the row.

• If rows i and k are nonzero and i < k, then the leading entry of row i isto the left of the leading entry of row k.

• Entries above and below a leading entry are zero.

• Any zero row is below all nonzero rows.

For example, the following matrix is in reduced row echelon form:0 1 0 3 00 0 1 7 00 0 0 0 10 0 0 0 0

.For a given n, In is the only n×n matrix in reduced row echelon form withoutany zero rows.

An elementary row operation on an m× n matrix A is one of the following:

• Interchange a pair of rows.• Multiply a row by a nonzero scalar.• Add to one row a scalar multiple of another.

Linear Algebra 513

An elementary matrix is a matrix obtained from the identity matrix by anelementary row operation. Each elementary row operation onAmay be achievedby multiplying A on the left by a suitable elementary matrix. For example,the multiplication 0 1 0

1 0 00 0 1

1 2 34 5 67 8 9

=

4 5 61 2 37 8 9

switches the first and second rows of A, and the multiplication1 0 0

2 1 00 0 1

1 2 34 5 67 8 9

=

1 2 36 9 127 8 9

adds twice row one to row two. Using elementary operations, one may transformany m× n matrix A into reduced row echelon form R. It follows that thereexists a sequence of elementary matrices Ej such that

R = EpEp−1 · · ·E1A.

The row rank (column rank) of a matrix A is the maximum number oflinearly independent rows (columns) of A. The row rank of a matrix is alwaysequal to the column rank. (This is clear for the reduced row echelon form.)The rank of a matrix is its row (= column) rank.

The Matrix of a Linear TransformationLet T ∈ L(Rn,Rm). The matrix of T is defined by

[T ] =[Te1 Te2 · · · Ten

](where Tej is written as a column). If Tej = (aj1, a

j2, · · · , ajm) and x =

(x1, . . . , xn) =∑nj=1 xje

j , then, by linearity of T ,

T (x1, x2, . . . , xn) =n∑j=1

xjTej =

n∑j=1

(aj1xj , aj2xj , · · · , ajmxj)

=( n∑j=1

aj1xj ,

n∑j=1

aj2xj , · · · ,n∑j=1

ajmxj

),

which may be written in column matrix form as

[T ]xt =

a1

1 a21 · · · an1

a12 a2

2 · · · an2...

... · · ·...

a1m a2

m · · · anm

x1x2...xn

.


Note that aji may be expressed as (Tej) · ei.The operations of addition, scalar multiplication, and composition of linear

transformations correspond to addition, scalar multiplication, and multiplica-tion of matrices in the following way: If T, T ′ ∈ L(Rn,Rm) and S ∈ L(Rm,Rp),then

[T + T ′] = [T ] + [T ′], [tT ] = t[T ] [ST ] = [S][T ].In particular, if T ∈ L(Rn,Rn), then T is invertible iff [T ] is nonsingular.

An n × n matrix A is orthogonal if AAt = In, that is, if At = A−1 or,equivalently, detA = ±1. (See below.) A linear transformation T ∈ L(Rn,Rm)is said to be orthogonal if [T ] is orthogonal.

DeterminantsA permutation of the n-tuple (1, . . . , n) is a one-to-one function σ mapping

{1, . . . , n} onto itself. It is frequently denoted by (i1, . . . , in), where ik = σ(k).The permutation is said to be even or odd according as an even or odd numberof adjacent interchanges are required to transform (i1, . . . , in) to (1, . . . , n) (orvice versa). For example, (3, 2, 1) is odd and (4, 3, 2, 1) is even. The sign of apermutation σ is defined by

(−1)σ ={

1 if σ is even,−1 if σ is odd.

We then have

(−1)στ = (−1)σ(−1)τ and (−1)σ−1

= (−1)σ,

where, as is customary, τσ stands for τ ◦ σ.The determinant of an n× n matrix A = [aji ] is defined by

detA =

∣∣∣∣∣∣∣∣∣a1

1 a21 · · · an1

a12 a2

2 · · · an2...

... · · ·...

a1m a2

m · · · anm

∣∣∣∣∣∣∣∣∣ :=∑σ

(−1)σaσ(1)1 · · · aσ(n)

n ,

where the sum is taken over all permutations σ of (1, . . . , n). For example,∣∣∣∣a bc d

∣∣∣∣ = ad− bc,

since (−1)(1,2) = 1 and (−1)(2,1) = −1.If T ∈ L(Rn,Rn) we denote the determinant of the matrix of T by detT

rather than by the more cumbersome det[T ].The following theorem summarizes the main properties of determinants.

Parts (a)–(f) follow directly from the above definition; part (g) is proved inChapter 13.

Linear Algebra 515

Theorem. Let A =[a1 · · · an

]be an n× n matrix and t ∈ R. Then

(a) det[a1 · · · taj · · · an

]= tdet

[a1 · · · aj · · · an

].

(b) det[a1 · · · aj + b · · · an

]= det

[a1 · · · aj · · · an

]+ det

[a1 · · · b · · · an

].

(c) Interchanging two rows of A changes the sign of the determinant.

(d) If A has a pair of duplicate rows, then detA = 0.

(e) Adding a multiple of one row to another does not change the value of thedeterminant.

(f) detAt = detA. Thus any “row property” has a corresponding “columnproperty.”

(g) If B is an n× n matrix, then det(AB) = (detA)(detB).

The following theorem is frequently useful in evaluating determinants.

Theorem. Let A = [aij ] be an n×n matrix, and for each (i, j), let Aij denotethe matrix obtained by removing row i and column j from A. Then for eachfixed i and j,

detA =n∑k=1

(−1)i+kaik detAik =n∑k=1

(−1)k+jakj detAkj .

The first equality is called expansion along row i and the second expansionalong column j. For example, expanding along row 1,∣∣∣∣∣∣

a11 a12 a13a21 a22 a23a31 a32 a33

∣∣∣∣∣∣ = a11

∣∣∣∣a22 a23a32 a33

∣∣∣∣− a12

∣∣∣∣a21 a23a31 a33

∣∣∣∣+ a13

∣∣∣∣a21 a22a31 a32

∣∣∣∣ .The formula

∣∣∣∣a bc d

∣∣∣∣ = ad− bc may then be used to complete the evaluation.For another example, consider∣∣∣∣Ip Cp×q

Oq×p Dq×q

∣∣∣∣ = detD,

obtained by successive expansion along the first column.The preceding theorem may be used to prove the following result.

Theorem. Let A = [aij ] be an n× n matrix. Then A−1 exists iff detA 6= 0.In this case the (i, j) entry of A−1 is

(−1)i+j detAjidetA .


The last theorem may be used to prove Cramer’s Rule: Consider a systemof n equations in n unknowns, written in matrix form as Ax = b or explicitlyas

a11 a12 · · · a1na21 a22 · · · a2n...

... · · ·...

an1 an2 · · · ann

x1x2...xn

=

b1b2...bn

.If A is nonsingular, then the solution to the system is

xj = 1detA

∣∣∣∣∣∣∣∣∣a11 · · · a1,j−1 b1 a1,j+1 · · · a1na21 · · · a2,j−1 b2 a2,j+1 · · · a2n... · · ·

......

......

an1 · · · an,j−1 bn an,j+1 · · · ann

∣∣∣∣∣∣∣∣∣ .

Appendix CSolutions to Selected Problems

Section 1.21. (b)

[(ab)+(−a)b

]=[a+(−a)

]b = 0 ·b = 0, so uniqueness of the additive

inverse implies −(ab) = (−a)b. A similar argument works for the secondequality.(d) By (b), (−1)a = 1(−a) = −a.(f) Using commutativity and associativity of multiplication and thedistributive law and 1.2.1(i),

a/b+ c/d = ab−1(dd−1) + cd−1(bb−1) = ad(b−1d−1) + bc(b−1d−1)= ad(bd)−1 + bc(bd−1) = (ad+ bc)/(bd).

3. If s := r/x ∈ Q, then, by Exercise 2, x = r/s ∈ Q, a contradiction.Therefore, r/x ∈ I. The remaining parts have similar proofs.

5. The left side of (a) is n− 1n

n− 2n· · · 1

n= n!nn. For (b),

(2n)! =[2n(2n− 2)(2n− 4) · · · 4 · 2

][(2n− 1)(2n− 3) · · · 3 · 1

]= 2n

[n(n− 1)(n− 2) · · · 2 · 1

][(2n− 1)(2n− 3) · · · 3 · 1

].

8. f(k) = k3 − (k − 1)3 = 3k2 − 3k + 1.

Section 1.31. (c) Follows from a/b− c/d = (ad− bc)/bd.

4. If 0 < x < y, then multiplying the inequality by 1/(xy) and using (d) of1.3.2 shows that 1/y < 1/x. If x < y < 0, then 0 < −y < −x, hence, bythe first part, 1/(−x) < 1/(−y) so 1/x > 1/y.

6. (a) By Exercise 1.2.4, yn − xn = (y − x)n∑j=1

yn−jxj−1. Each term of the

sum is positive and less than yn−jyj−1 = yn−1. Since there are n terms,part (a) follows.

8. a = ta+ (1− t)a < tb+ (1− t)b = b.

517

10. If a > b, then x := a− b > 0 and a > b+ x, contradicting the hypothesis.

13. (b) 0 ≤ (x− y)2 + (y− z)2 + (z−x)2 = 2(x2 + y2 + z2)− 2(xy+ yz+xz).

14. Expand (x− a)2 ≥ 0 and divide by x.

18. If a ≤ x ≤ b, then x ≤ |b| and −x ≤ −a ≤ |a|, hence |x| ≤ max{|a|, |b|}.

21. Assume without loss of generality that S1 = S \{a1, . . . , ak}, so minS1 =ak+1. Each of the remaining sets Sj contains at least one of a1, . . . , ak,hence minSj ≤ ak < ak+1, verifying the assertion.

Section 1.42. (a) sup = 12, inf = −12. (b) sup = 1, inf = −1.

3. (c) sup = 10/3, inf = 3; (d) sup = 3+√

52 , inf = −∞;

(e) sup = +∞, inf = −∞. (h) sup = 3, inf = 0;

(i) sup = 12 +

√2

4 , inf = 12 −

√2

4 ; (m) sup = 4/3, inf = −1.

5. Let x, y ∈ A. Then ±(x−y) ≤ supA−inf A, hence |x−y| ≤ supA−inf A.Since |x|−|y| ≤ |x−y|, |x|−|y| ≤ supA−inf A so |x| ≤ supA−inf A+|y|.Since x was arbitrary, we have sup |A| ≤ supA − inf A + |y|, hencesup |A| − supA + inf A ≤ |y|. Since y was arbitrary, it follows thatsup |A| − supA+ inf A ≤ inf |A|.

6. (b) Since x > 0, xa ≤ x supA for all a ∈ A, hence sup (xA) ≤ x supA.Replacing x by 1/x proves the inequality in the other direction. Theinfimum case is similar.

9. Let a < b and choose a rational r in (a−√

2, b−√

2). Then r +√

2 isirrational and in (a, b).

12. (b) If n := bxc = −b−xc, then x− 1 < n ≤ x and x ≤ n < x+ 1. This ispossible only if x = n. The converse is trivial.(c) By definition −x− 1 < b−xc ≤ −x.

14. Let x := (bm)1/n and y :=(b1/n

)m. By definition, x is the unique positivesolution of xn = bm. Since yn =

[(b1/n

)m]n =[(b1/n

)n]m = bm, x = y.

17. Let ` ≤ x ≤ u for all x ∈ A. By the Archimedean principle, thereexist positive integers m and n such that −m < ` ≤ u < n. Set N =max{m,n}.

Solutions to Selected Problems 519

20. For any a ∈ N, if r :=√n+ a +

√n ∈ Q, then squaring both sides

of√n+ a = r −

√n shows that

√n ∈ Q and hence that n = j2 for

some j ∈ N (1.4.11). Then√n+ a ∈ Q, hence n = k2 for some k ∈ N.

Therefore, a = k2 − j2 = (k − j)(k + j). If a = 11, then k − j = 1 andj + k = 11 so n = 25. If a = 21, then either k − j = 1 and j + k = 21 ork − j = 3 and j + k = 7. The first choice leads to j = 10 and n = 100and the second to j = 2 and n = 4.

Section 1.53. Let f(n) denote the sum on the left side of the equation and g(n) the

sum on the right. Then f(1) = 1/2 = g(1). Now let n ≥ 1. Then

f(n+ 1)− f(n) =2n+2∑k=1

(−1)k+1

k−

2n∑k=1

(−1)k+1

k= 1

2n+ 1 −1

2n+ 2

g(n+ 1)− g(n) =2n+2∑k=n+2

1k−

2n∑k=n+1

1k

= 12n+ 2 + 1

2n+ 1 −1

n+ 1 .

Since the right sides are equal, f(n) = g(n)⇒ f(n+ 1) = g(n+ 1).

5. 253 n

3 − 152 n

2 + 16n.

6. (b)500∑k=1

(4k2 − 1) = 4 500 · 501 · 10016 − 500 = 167, 166, 500.

7. For n ≥ 1, let Q(n) be the statement P (n− 1 + n0). Then Q(1) = P (n0)is true. Assume Q(n) = P (n−1+n0) is true. Then Q(n+1) = P (n+n0)is true. By mathematical induction, Q(n) = P (n− 1 + n0) is true for alln ≥ 1, that is, P (n) is true for every n ≥ n0.

8. In each case, let f(n) be the left side of the inequality and g(n) the rightside, and let P (n) : f(n) < g(n). Let n0 be the base value of n for whichP (n) is true. It is straightforward to check that f(n0) < g(n0). AssumeP (n) holds for some n ≥ n0, so that f(n)/g(n) < 1. Then

(a) f(n+ 1)g(n+ 1) = 2n+ 3

2n+1 = f(n)2g(n) + 1

2n < 1.

(e) f(n+ 1)g(n+ 1) = 2n+1(n+ 1)!

(n+ 1)n+1 = f(n)g(n)

2(1 + 1/n)n < 1.

9. Check that 6 < ln(6!). For the induction step, use (n+ 1)! = (n+ 1)n!.

13. Let gn denote the expression on the right in the assertion. One checksdirectly that g0 = g1 = 1. Let n ≥ 2 and assume that fj = gj for all

2 ≤ j ≤ n. Then

gn+1 − fn+1 = gn+1 − fn − fn−1 = gn+1 − gn − gn−1

= 1√5(an+2 − an+1 − an

)+ 1√

5(bn+2 − bn+1 − bn

)= an√

5(a2 − a− 1) + bn√

5(b2 − b− 1) = 0.

15. The set of all nonnegative integers of the form m−qn, q ∈ Z, is nonempty(Archimedean principle), hence has a smallest member r = m− qn (wellordering principle). If r ≥ n, then 0 ≤ r − n = m − (q + 1)n < r,contradicting the minimal property of r. Therefore, m = qn+ r has therequired form. If also m = q′n+ r′, q′ ∈ Z, and r′ ∈ {0, . . . n− 1}, then|q − q′|n = |r − r′| < n, hence q′ = q and r′ = r.

Section 1.6

1. x = c− d · e− (b · c)(b · d)1− (a · b)(b · d) a, y = e− b · c− (a · b)(d · e)

1− (a · b)(b · d) d.

2. (c) By the triangle inequality,

||x||2 = ||x− y + y||2 ≤ ||x− y||2 + ||y||2,

hence ||x||2 − ||y||2 ≤ ||x− y||2. Similarly, ||y||2 − ||x||2 ≤ ||x− y||2.

3. By 1.6.3, ||x1 + x2 + · · ·+ xk||22 =n∑

i,j=1xi · xj =

k∑j=1

xj · xj .

7. The hypotheses imply thatn∑j=1

x2j =

n∑j=1

y2j = 1 and

n∑j=1

(xj + yj)2 = 4.

It follows that∑nj=1 xjyj = 1 and

∑nj=1(xj − yj)2 = 0. The same does

not hold for || · ||∞ (take x = (−1, 1) and y = (1, 1)) or for || · ||1 (takex = (1, 0) and y = (0, 1)).

Section 2.11. (a) an = [a+ b+ (−1)n(b− a)]/2.

3. (b) If n ≥ 6, |(2n2 − n)/(n2 + 3)− 2| = |n+ 6|/(n2 + 3) ≤ 2n/n2 = 2/n.Therefore, choose N ≥ min{6, 2/ε}.(e) |(2 + 1/n)3 − 8| =

[(2 + 1/n)2 + 2(2 + 1/n) + 4

]/n ≤ 19/n, so choose

any integer N > 19/ε.

5. Let r = pq−1, p, q ∈ Z, q > 0. For all n ≥ q, n!r ∈ Z, hence sin(n!rπ) = 0.

7. Let A = {x1, . . . , xp} and Aj = {n : an = xj}. One of these sets, say A1,must have infinitely many members. Since |x1− a| ≤ |x1− an|+ |an− a|and an → a, letting n → +∞ through A1 shows that x1 = a. We maytherefore choose ε > 0 so that I := (a− ε, a+ ε) contains no xj for j ≥ 2.Let N ∈ N such that an ∈ I for all n ≥ N . For such n, an = a.

8. (a) bn = (3an + 2bn − 3an)/2→ (c− 3a)/2.

9. (a) 2. (d) b/2√a. (g) −kak−1. (k) 1/2.

11. Use −r ≤ an − bn ≤ r and 2.1.4.

14. (a) Suppose first that r > 1. Set hn = r1/n−1. By the binomial theorem,r = (1 + hn)n > nhn, hence, by the squeeze principle, hn → 0. If r < 1,consider 1/r.

17. an < ran−1 < r2an−2 < · · · < rn−1a1 → 0. For the example, takean = 21/n.

19. Choose N such that an − a < ε for all n ≥ N . For such n,

0 ≤ min{a1, . . . , an} − a ≤ an − a < ε.

Therefore, min{a1, . . . , an} → a. The converse is false: consider an =1 + (−1)n.

22. Suppose that c ≤ f(x) − x ≤ d for all x, so c + jx ≤ f(jx) ≤ djx.Summing and using Exercise 1.5.4,

nc+ xn(n+ 1)/2 ≤n∑j=1

f(jx) ≤ nd+ xn(n+ 1)/2,

hence

c/n+ x(1 + 1/n)/2 ≤ (1/n2)n∑j=1

f(jx) ≤ d/n+ x(1 + 1/n)/2.

Letting n→ +∞, we obtain (a). Part (b) is proved similarly.

Section 2.21. Since

a1/n

a1/(n+1) = a1/n(n+1) < 1 < b1/n(n+1) = b1/n

b1/(n+1) ,

a1/n is increasing and b1/n is decreasing. Each tends to 1 by Exer-cise 2.1.14.

3. By results of Section 2.1,

an = a(1/n+ nb)−1 → 0 and nan = a(1/n2 + b)−1 → ab−1.

The condition an+1 < an is equivalent to (n2 + n)b > 1, which holdseventually. The condition (n+1)an+1 > nan is equivalent to the inequality(n+ 1)2 > n2.

7. Let f(x) = 1 + 12 + (1 + x)−1 = 3x+ 4

2x+ 3 . Then f : [1, 2] → [1, 2], f is

increasing and f(am) = am+2. Since a1, a2 ∈ [1, 2], an ∈ [1, 2] for all n.Since a1 = 1, a2 = 3/2, a3 = 7/5 and a4 = 17/12, the inequalities

a2n+2 < a2n and a2n+1 > a2n−1

hold for n = 1. Assume they hold for n = k. Then

a2k+4 = f(a2k+2) < f(a2k) = a2k+2 anda2k+3 = f(a2k+1) > f(a2k−1) = a2k+1,

hence the inequalities hold for n = k + 1.Since the sequences {a2n} and {a2n+1} are bounded and monotone,

the monotone convergence theorem implies that a2n → a and a2n+1 → bfor some a, b ∈ R. Letting n → +∞ in f(a2n) = a2n+2 gives f(a) = a.Therefore, a =

√2. Similarly, b =

√2.

9. For x > 0, x2 + r ≥ 2x√r, hence (x+ r/x)/2 ≥

√r. Therefore, an ≥

√r.

For x ≥√r, x2 + r ≤ 2x2, hence (x+ r/x)/2 ≤ x. Therefore, an ≥ an+1.

By the monotone convergence theorem, an → a for some a ≥√r. Letting

n→ +∞ in an = (an−1 + r/an−1)/2, yields a = (a+ r/a)/2, which haspositive solution a =

√r.

Section 2.31. (a) 0, ±3/8. (c) ±4, ±6, ±12, ±14.

3. (d) a2/kn =

(1 + 1

2n+ k

)2n+k (1 + 1

2n+ k

)−k→ e.

5. If {an} lies in the set {x1, . . . , xn}, then one of the sets {n : an = xj}musthave infinitely many members and a subsequence may be constructedfrom these.

8. Given ε > 0, choose N so that∑∞n=N |an+k − an| < ε. For m > n ≥ N ,

|amk − ank| ≤ |amk − a(m−1)k|+ · · ·+ |a(n+1)k − ank| < ε.

Therefore, {ank}∞n=1 is Cauchy.

10. Clearly an → 0 implies bn → 0. For the converse, suppose an 6→ 0.Choose ε > 0 and a subsequence such that ank ≥ ε > 0 for all k. Then

1 = bnk

(1aqnk

+ 1aq−pnk

)≤ bnk

(1εq

+ 1εq−p

),

hence bn 6→ 0. If 0 < q < p, then the sufficiency is false: Take an = n,q = 1/2 and p = 1. Then bn =

√n/(n+ 1)→ 0 but an → +∞.

Section 2.41. (a) lim inf = −5/3, lim sup = 5/3. (c) lim inf = −14, lim sup = 14.

(h) lim inf = −∞, lim sup = +∞.

3. Follows from Exercise 1.4.6.

5. Follows from {ank : k ≥ n} ⊆ {ak : k ≥ n}.

7. 0 < b− ε < bn < b+ ε⇒ an(b− ε) < anbn < an(b+ ε)⇒ (b− ε) lim supn→∞ an ≤ lim supn→∞ anbn ≤ (b+ ε) lim supn→∞ an.Now let ε→ 0.

10. Choose r so that lim infn→∞ bn > r > 0. Then, given ε > 0, there existsN such that an > a/2 and bn > r, and

cn := (bn − 3an)(bn + 2an) = b2n − anbn − 6a2n < ε

for every n > N . Then bn − 3an = cn/(bn + 2an) < ε/(r + a), solim supn→∞ bn ≤ 3a.

12. Suppose that lim infn a1/nn < lim infn

an+1

an. Choose r strictly between

these numbers and then chooseN such that an/an−1 > r for all n > N .For such n,

an > an−1r > an−2r2 > · · · > aNr

n−N ,

hence lim infn a1/nn ≥ lim infn(a1/n

N r1−N/n) = r, a contradiction. Toevaluate limn n/(n!)1/n take an = nn/n! and calculate

an+1

an=(n+ 1n

)n→ e.

Section 3.11. Let x1 < · · · < xn denote the points of E and let

δ = 12 min{xj − xi : 1 ≤ i < j ≤ n}.

Then for each j, (xj − δ, xj + δ) ∩ E = {xj}.

4. Let ε, M > 0.(b) The limit is 1. If |x− 1| < 1, then x > 0, hence∣∣∣∣ x+ 3

3x+ 1 − 1∣∣∣∣ = 2|x− 1|

3x+ 1 < 2|x− 1|.

Therefore, choose δ = min{1, ε/2}.(d) The limit is +∞: x < −

√M −1⇒ −x >

√M and −x−1 >

√M ⇒

x2 + x = (−x)(−x− 1) > M .

6. (a) 2/3. (d) +∞. (g) 9/25.

7. (b) −1/2. (f)√b+ x−

√b− x√

c+ x−√c− x

=√c+ x+

√c− x√

b+ x+√b− x

→√c

b.

(h) (a√d)/(c

√b).

9. The limit exists at a iff lim{x→a, x∈Q} f(x) = lim{x→a, x∈I} f(x). Bycontinuity of polynomials, this is equivalent to 4a2 +2a−11 = 3a2 +a−5.Thus a = −3, 2.

11. (a) a. (e) (c√a)/(2

√b).

13. Proof for the case f increasing and L := limn f(an) ∈ R: Given ε > 0,choose N such that L−ε < f(an) < L+ε for all n ≥ N . Let x > aN andlet n be the least integer > N such that x < an. Then an−1 ≤ x < an soL− ε < f(an−1) ≤ f(x) ≤ f(an) < L+ ε.

Section 3.21. (a) −1, 1. (c) −2/3, 2/3. (e) −1, 1. (i) −3, 1.

3. lim sup case: Assume a ∈ R. Set

L = lim sup{x→a, x∈E} f(x) and Lj = lim sup{x→a, x∈Ej} f(x), j = 1, 2.

By 3.2.1, there exists a sequence an ∈ E1 such that f(an)→ L1. Sincean ∈ E, by the same theorem, L1 ≤ L. Similarly, L2 ≤ L. Now let bn ∈ Esuch that f(bn) → L. Then one of the sets, say E1, contains infinitelymany terms of the sequence. Therefore, L ≤ L1, hence L = max{L1, L2}.

4. Let g(x) = 1/f(x). Then g(r) = 1/f(r).

Section 3.31. By continuity, f(2) = limx→2−(mx + 3) = limx→2+(3x2 + 7), that is,

2m+ 3 = 19. Therefore, m = 8.

4. This follows from

lim{x→a, x∈Q} d(x)g(x) = g(a) and lim{x→a, x∈I} d(x)g(x) = 0.

8. The identity implies that f(nx) = nf(x), n ∈ N. Also, f(0)+f(0) = f(0)so f(0) = 0. Since f(−x) + f(x) = f(0), we see that f(−x) = −f(x),hence f(nx) = nf(x) for all n ∈ Z. Let m, n ∈ N. Then f(x) =f(nx/n) = nf(x/n). Replacing x by xm gives mf(x) = f(mx) =nf(mx/n). Thus, f(tx) = tf(x) for all x ∈ R and t ∈ Q. Since f iscontinuous at zero and f(x − y) = f(x) + f(−y) = f(x) − f(y), f iscontinuous on R. Thus, f(tx) = tf(x) for all x, t ∈ R. Setting x = 1gives the desired result.

9. (c) Let a ∈ R and ε > 0. Choose N so that∑n>N 2−n < ε and then

choose δ > 0 so that (a, a+δ) contains none of the numbers c1, c2, . . . , cN .If a < x < a+ δ, then

0 ≤ f(x)− f(a) =∑

n:a<cn≤x2−n ≤

∑n>N

2−n < ε.

Therefore, f is right continuous at a.If a 6∈ {cn}, then we may choose δ so that (a− δ, a] contains none of

the numbers c1, c2, . . . , cN . If a− δ < x < a, then, as before,

0 ≤ f(a)− f(x) =∑

n:x<cn≤a2−n < ε.

Therefore, f is left continuous at a.If a = ck and a− δ < x < a, then

f(ck)− f(x) =∑

n:x<cn≤ck

2−n ≥ 2−k

so f is not left continuous at ck.

11. (a) Let xn → x in [0, 1] and let ε > 0. By hypothesis, for each n we maychoose tn ∈ (0, 1)\{x} such that |tn−xn| < 1/n and |f(tn)−g(xn)| < ε.Then tn → x, hence f(tn)→ g(x). From

|g(xn)− g(x)| ≤ |g(xn)− f(tn)|+ |f(tn)− g(x)|

we then havelim sup

n|g(xn)− g(x)| ≤ ε.

Since ε was arbitrary, g(xn)→ g(x).

(b) Note that f is continuous at x iff f(x) = g(x). Let ε > 0. We showthat the set Dε := {x ∈ [0, 1] : |f(x)− g(x)| ≥ ε} is finite. The desired

conclusion will follow on observing that the set of discontinuities off isprecisely

⋃∞n=1D1/n.

Suppose Dε is infinite. Then there exists a sequence of distinct termssuch that |f(xn) − g(xn)| ≥ ε for all n. By the Bolzano–Weierstrasstheorem, {xn} has a convergent subsequence, say xnk → x. Because theterms of {xn} are distinct, xnk 6= x for all large k, hence f(xnk)→ g(x).Also, by continuity, g(xnk)→ g(x). But this contradicts the inequality|f(xnk)− g(xnk)| ≥ ε.

Section 3.42. Let x0 > 0 and choose r > x0 such that f(x) < f(x0) for all x with|x| > r. Then the maximum M of f on [−r, r] is ≥ f(x0), hence M mustbe the maximum of f on R.

4. (e) Suppose f is not upper semicontinuous at x0. Choose r such thatf(x0) < r < lim supx→x0 f(x) and then i such that fi(x0) < r. For eachδ > 0, r < sup0<|x−x0|<δ f(x), hence there exists xδ with 0 < |xδ−x0| <δ such that f(xδ) > r. Thus we have

fi(x0) ≤ f(x0) < r < f(xδ) ≤ fi(xδ) ≤ sup0<|x−x0|<δ

fi(x).

Letting δ → 0 produces the contradiction

fi(x0) < r ≤ lim supx→x0

fi(x) ≤ fi(x0).

If fn(x) := xn on [0,1], then f = infn fn is discontinuous at x = 1.

5. (b) Define

f(x) ={

(1− 2x)(1− d(x)) if 0 ≤ x ≤ 1/2,(2x− 1)d(x) if 1/2 < x ≤ 1,

where d(x) is the Dirichlet function on [0, 1].

7. (c) f(x) = tan x − x. Since limx→(n+1/2)π− tan x = +∞, choose b ∈(nπ, (n+ 1/2)π) such that f(nπ) = −nπ < 0 < f(b).(f) Let f(x) denote the left side minus the right side of the equation.Then

limx→0+

f(x) = +∞, limx→π/2−

f(x) = −∞,

hence there exist 0 < a 0 > f(b).

9. Set g(x) = f(x)− x. Then g(b) ≤ 0 ≤ g(a) and the result follows fromthe intermediate value theorem.

Section 3.51. Take f(x) = 1/x and g(x) = x2 on (0, 1).

3. (a) Use the inequality

|ax2 − ay2|√ax2 + b+

√ay2 + b

≤√a |x− y|(|x|+ |y|)√

x2 + b/a+√y2 + b/a

≤√a |x− y|.

7. Given ε > 0, choose δ > 0 such that |f(x) − f(y)| < ε for all x, y ∈ Ewith |x−y| < δ. Then choose N such that |an−am| < δ for all m,n ≥ N .For such m, n, |f(an)− f(am)| < ε.

9. The inequality∣∣ |x|−|y| ∣∣ ≤ |x−y| shows that |x| is uniformly continuous.

The given functions are therefore compositions of uniformly continuousfunctions.

13. If 0 1, (sin x)/xp is continuous on (0,+∞) but has no continuousextension to [0,+∞).

15. Since f may be extended continuously to [a, b], it is bounded. Theexamples f(x) = x on (0,+∞) and f(x) = 1/x on (0, 1) show that theassumptions cannot be relaxed.

18. f(x) has unequal one-sided limits at 0 while those of g(x) are equal.Hence 0 is a removable discontinuity of g but not of f .

Section 4.1

1. If f denotes the given function, f(x+ h)− f(x)h

=

(b) 2√2x+ 2h+ 1 +

√2x+ 1

→ 1√2x+ 1

.

(d) −3√3x+ 3h+ 2

√3x+ 2

(√3x+ 2 +

√3x+ 3h+ 2

) → −32(3x+ 2)3/2 .

3. (a) 3(5x+ 7)1/3

5(3x+ 2)4/5 + 5(3x+ 2)1/5

3(5x+ 7)2/3 . (c) 4x(x2 + 1)2 cos

(x2 − 1x2 + 1

).

4. (b) − 1y cosxy2 −

y

2x

7. f is continuous at 1 iff 2a+ b = 1. For such a and b, f is differentiableiff a+ b = 3. Therefore, a = −2 and b = 5.

11. (b) The difference quotient is

f(a+ h2)− f(a)h2 h+ f(a− h)− f(a)

−h→ f ′(a).

14. For all h 6= 0, [f(x+ h)− f(x)]/h ≥ 0, hence f ′(x) ≥ 0.

16. Clear for n = 2. Suppose the assertion holds for n ≥ 2. Then

Dn+1(fg) =n∑k=0

(n

k

)D[(Dkf)(Dn−kg)

]=

n∑k=0

(n

k

)[(Dk+1f)(Dn−kg) + (Dkf)(Dn+1−kg)

]=

n∑k=1

[(n

k − 1

)+(n

k

)](Dkf)(Dn+1−kg) + gDn+1f + fDn+1g

=n+1∑k=0

(n+ 1k

)(Dkf)(Dn+1−kg).

18. (c) (f ′ ◦ g)g′′ + (f ′′ ◦ g)(g′)2.

19. (a) (−1)nn!xn+1 .

21. If x 6= 0, f ′(x) = xm+n−1(n cosxn +m

sin xnxn

). Also,

f ′(0) = limx→0

sin xnxn

xn+m−1

does not exist if n+m < 1,= 0 if n+m > 1,= 1 if n+m = 1.

Therefore, f ′ is continuous at 0 if n+m ≥ 1.

23. For the second order determinant use the expansion∣∣∣∣f1 f2g1 g2

∣∣∣∣ = f1g2 − f2g1.

For the third order determinant, expand along a row or column and usethe formula for the second order case. The same idea may be applied tonth order determinants.

Section 4.21. Set f(x) = cosx−

√x+ 1. Since f(0) > 0 > f(π/2), f has at least one

zero in (0, π/2), by the intermediate value theorem. Since f ′ < 0 on(0, π/2), f is strictly decreasing so the zero is unique.

3. Since f ′(x) = 4x(x− 1)(x− 2) < 0 on (1, 2), f has at exactly one zeroin the interval (1, 2) iff f(1)(= 1 + c) and f(2)(= c) have opposite signs,that is, iff c < 0 < c+ 1, or −1 < c < 0.

7. The assertion is clear if n = 0. Suppose it holds for all polynomialswith degree ≤ n. Let P (x) have degree n + 1 and suppose that theequation sin(ax) = P (x) has more than n+ 2 solutions. Then f(x) :=sin(ax) − P (x) has more than n + 2 zeros, hence, by Rolle’s theorem,f ′′(x) = −a2 sin(ax) − P ′′(x) has more than n zeros. But this meansthat sin(ax) = −P ′′(x)/a2 has more than n solutions, contradicting theinduction hypothesis.

9. By the Cauchy mean value theorem,

|f(x)− f(y)| |g′(c)| = |g(x)− g(y)| |f ′(c)| ≤ |g(x)− g(y)| |g′(c)|.

11. The derivative of x−1 sin x is negative since tan x > x, 0 < x < π/2.

17. Let c1 < · · · < cm be the distinct zeros of P ′. By the intermediate valuetheorem, P ′ has a constant sign on (cj , cj+1). Therefore, P (x) is strictlymonotone on these intervals.

19. Let |f ′| ≤ c < r. Then g′(x) = r + f ′(x) ≥ r − c > 0, so g is strictlyincreasing, hence one-to-one. By the mean value theorem, |f(x)−f(0)| ≤c|x| or f(0)− c|x| ≥ f(x) ≤ f(0) + c|x|. Therefore,

f(0) + rx− c|x| ≤ g(x) ≤ f(0) + rx+ c|x|.

Thus x > 0 ⇒ g(x) ≥ f(0) + (r − c)x ⇒ limx→+∞ g(x) = +∞, andx < 0 ⇒ g(x) ≤ f(0) + (r − c)x ⇒ limx→−∞ g(x) = −∞. By theintermediate value theorem, g(R) = R.

22. g′(0) = 0, hence f ′(0) > 0. Since f(±1/nπ) = ±1/nπ for all n ∈ N, f isnot monotone on any neighborhood of 0.

25. Let a, b ∈ I with a < b and suppose that f ′(a) < y0 < f ′(b), sog′(a) < 0 < g′(b). Then

(g(x)− g(a)

)/(x− a) < 0 for x ∈ (a, a+ δ), so

the minimum of g cannot occur at a. Similarly, the minimum of g cannotoccur at b. Thus, by the local extremum theorem, g′(x0) = 0, that is,f ′(x0) = y0, for some x0 ∈ (a, b).

28. Set q(x, y) = [f(x)− f(y)](x− y), x 6= y. If f is uniformly differentiableon I, then

|f ′(x)− f ′(y)| ≤ |f ′(x)− q(x, y)|+ |f ′(y)− q(x, y)|

shows that f ′ is uniformly continuous. Conversely, assume that f ′ isuniformly continuous. By the mean value theorem, for eachx < y thereexists a z ∈ (x, y) such that

|q(x, y)− f ′(y)| = |f ′(z)− f ′(y)|.

It follows that f is uniformly differentiable.

31. If such a function exists, then

limx→y

f(x)− f(y)x− y

= ϕ(y, y)

so f ′(y) = ϕ(y, y), which is continuous in y.Conversely, assume f is continuously differentiable on an open interval

I and define

ϕ(x, y) =

f(x)− f(y)

x− yif x 6= y,

f ′(x) if x = y.

Clearly ϕ is continuous on {(x, y) ∈ I × I : x 6= y}. By the mean valuetheorem, ϕ(x, y) = f ′(ξxy)(x − y), where ξxy is between x and y. Thecontinuity of ϕ on I × I now follows from the continuity of f ′.

Section 4.4

1. (b) (2− 3x)/(2x− 3), x 6= 3/2. (f) cos−1(

3x− 21− x

), 1/2 < x < 3/4.

5. (b) Fix y > 0 and let f(x) = ln(xy) − ln x − ln y. Since f ′(x) = 0,f(x) = f(1) = 0 for all x > 0.

7. (b) ax+y = exp((x+ y) ln a) = exp(x ln a) exp(y ln a) = axay.

9. xa = exp(a ln x), hence (xa)′ = exp(a ln x)(a/x) = axa−1.

13. The derivative of the left side of (c) is

2x2 + 1 −

4x(x2 + 1)2

√1− y2

, y :=(x2 − 1x2 + 1

)2

,

which reduces to 0. Therefore, the left side is constant.


14. Set c = f ′(0). Since

f(x+ h)− f(x)h

= f(x) f(h)− 1h

,

f ′(x) exists and equals cf(x). Therefore, e−cxf(x) has zero derivative,hence e−cxf(x) = f(0).

18. (f−1)′′(x) = −f ′′(f−1(x)

)[f ′(f−1(x)

)]3 .Section 4.5

1. (a) p− q. (d) −1. (g) −2. (j) 0. (m) −∞. (p) 0.(s) 1 if p > 1, +∞ if p ≤ 1. (v) 1.

2. (c) f(0) = limx→0+ f(x) = 5/3.

3. (a) ln an = n−1 ln sin(1/n) is of the form −∞+∞ , hence has the same limit

as1n2

cos(1/n)sin(1/n) = −cos(1/n)

n

1n−1 sin(1/n) → 0.

Therefore, an → 1.

6. By logarithmic differentiation,

f ′(x) =(

1 + 1x

)x1/x − ln x

xx1/x.

By l’Hospital’s rule, x1/x → 1, hence limx→+∞ f ′(x) = 1. Applying themean value theorem to f on each of the intervals [n, n+ 1] shows thatf(n+ 1)− f(n)→ 1.

9. Let L := limx→+∞ f ′(x)/g′(x). By l’Hospital’s rule, limx→+∞ g(x)/f(x)exists and equals 1/L. Another application of l’Hospital’s rule yields

limx→+∞

ln f(x)ln g(x) = lim

x→+∞

f ′(x)g′(x)

g(x)f(x) = 1.

10. (a) By l’Hospital’s rule, the quotient has the same limit as

αβ

2f ′(a+ αh)− f ′(a+ βh)

h

= αβ

2

[αf ′(a+ αh)− f ′(a)

αh− β f

′(a+ βh)− f ′(a)βh

],

which is αβ(α− β)f ′′(a)/2.


12. Apply l’Hospital’s rule n times to f(x)/x−n to obtain

limx→0+

xnf(x) = limx→0+

(−1)nf (n)(x)n(n+ 1) . . . (2n− 1)x−2n = lim

x→0+ax2nf (n)(x),

where a = (−1)n(n− 1)!(2n− 1)! . Therefore, lim

x→0+xnf(x) exists and equals aL.

16. By l’Hospital’s rule,

limx→+∞

f(g(x))g(x) = lim

x→+∞

f ′(g(x))g′(x)g′(x) = L.

For examples, take f(x) =√x, ln x, or x+ 1/x, and g(x) = xn, ex, or

ln x.


limx→+∞

f(x) = limx→+∞

xf(x)x

= limx→+∞

[xf ′(x) + f(x)

]= limx→+∞

xf ′(x) + limx→+∞

f(x).

For the second part consider, f(x) = ln x.

Section 4.62. Apply Taylor’s theorem to the function between the inequalities to

produce the number c ∈ (0, x) in the remainder term:

(b) f (k)(x) = (−1)ke−x, hence e−x =2n−1∑k=0

(−1)kk! xk + e−c

(2n)!x2n, where

e−c ∈ (0, 1).

3. Let In := 1n!

∫ x

a

(x− t)nf (n+1)(t) dt = −f(n)(a)n! (x− a)n + In−1. Iterat-

ing,

In = −n∑k=1

f (k)(a)k! (x− a)k + I0 = −Tn(x, a) + f(x).

5. By Taylor’s theorem,

bk = P (k)(b)k! = 1

k!

n−k∑j=0

(j + 1)(j + 2) · · · (j + k)(b− a)jak+j .

Section 4.71. (a) −1.52137970. (d) −1.42360584. (g) 1.220924381.

2. (a) 0.87672621. (c) 1.55714559.

4. 7.937253933.

Section 5.13. Since Mj(−f) = −mj(f), S(−f,P) = −S(f,P), hence∫ b

a

(−f) = infPS(−f,P) = inf

P(−S(f,P)) = − sup

PS(f,P) = −

∫ b

a

f.

Replacing f by −f shows that∫ ba

(−f) = −∫ baf .

5. Since g may be obtained from f by changing one point at a time, wemay assume that f = g except at a single point c ∈ (a, b). Let ε > 0 andlet M be a bound for both |f | and |g|. The point c is in at most twointervals of any partition P, and each of these has width ≤ ‖P‖. Sincef = g on the remaining intervals,

|S(f,P)− S(g,P)| ≤ 2M‖P‖.

It follows from 5.1.15 that∫ baf =

∫ bag. Similarly

∫ baf =

∫ bag.

6. (c) Let g = sin f , ε > 0, and let P be any partition of [a, b] such thatS(f,P)− S(f,P) < ε. For fixed j, choose sequences an, bn ∈ [xj−1, xj ]such that g(an)→Mj(g) and g(bn)→ mj(g). Then

g(an)− g(bn) ≤ |f(an)− f(bn)| ≤Mj(f)−mj(f),

hence Mj(g)−mj(g) ≤Mj(f)−mj(f). Therefore,

S(g,P)− S(g,P) ≤ S(f,P)− S(f,P) < ε.

7. (a) Let L = limP F (P) and M = limP G(P). Given ε > 0, choose P ′εand P ′′ε such that |F (P) − L| < η for all partitions P refining P ′ε and|G(P) −M | < η for all partitions P refining P ′′ε , where η = ε/(2|α| +2|β| + 2). Let Pε denote the common refinement of P ′ε and P ′′ε . Thenboth inequalities hold for any partition P refining Pε, hence

|(αF (P) + βG)− (αL+ βM)| ≤ |α||F (P)− L|+ |β||G(P)−M | < ε.

(b) Given ε > 0, choose Pε such that∫ baf − ε < S(f,Pε) ≤

∫ baf .

The inequality still holds if Pε is replaced by a refinement. Therefore,∫ baf = limP S(f,P).

Section 5.21. Assume cn → c ∈ (a, b). Choose δ > 0 so that a < c − δ < c + δ N . Since f has onlyfinitely many discontinuities on [a, c− δ] ∪ [c+ δ, b], f is integrable on

these intervals and the integrals are zero. Thus, given ε > 0, there existpartitions P1 of [a, c− δ] and P2 of [c+ δ, b] such that

S(f,P1)− S(f,P1) < ε/3 and S(f,P2)− S(f,P2) < ε/3.

Define a partition P on [a, b] by P = P1 ∪ P2 and let |f | ≤M on [a, b].If δ < ε/6M , then

S(f,P)−S(f,P) ≤ S(f,P1)−S(f,P1)+S(f,P2)−S(f,P2)+2Mδ < ε.

Therefore, f ∈ Rba. Moreover,∫ b

a

f =∫ c−δ

a

f +∫ c+δ

c−δf +

∫ b

c+δf =

∫ c+δ

c−δf,

hence ∣∣∣ ∫ b

a

f∣∣∣ ≤ ∫ c+δ

c−δ|f | ≤ 2Mδ.

Since δ may be made arbitrarily small,∫ baf = 0.

5. Set Mn = max{f1, . . . , fn}. Then M2 =(f1 + f2 + |f1 − f2|

)/2 ∈ Rba.

Since Mn = max{Mn−1, fn}, the general result follows by induction. Asimilar argument holds for min.

6. Choose x0 such that f(x0) = supa≤x≤b f(x). Then∫ baf ≤ f(x0)(b−a) <

M(b− a).

9. Let |f | ≤ M on [a, b]. Then |F (x, y) − F (x, y0)| ≤ M(y − y0), hencelimy→y0 F (x, y) = F (x, y0).

12. (a) By the approximation property, choose x0 such that |f(x0)| > M − ε.By continuity, we may take x0 ∈ (a, b) and we may choose δ > 0 suchthat |f(x)| > M − ε for all x ∈ (x0 − δ, x0 + δ). Then

M(b− a) ≥∫ b

a

|f | ≥∫ x0+δ/2

x0−δ/2|f | ≥ δ(M − ε).

(b) By (a), |f(x)|p > (M − ε)p on (x0 − δ, x0 + δ), hence, as in (a),

δ(M − ε)p ≤∫ b

a

|f |p ≤Mp(b− a).

Therefore,

δ1/p(M − ε) ≤(∫ b

a

|f |p)1/p

≤M(b− a)1/p,


hence

M − ε ≤ lim infp→+∞

(∫ b

a

|f |p)1/p

≤ lim supp→+∞

(∫ b

a

|f |p)1/p

≤M.

Since ε was arbitrary,

lim infp→+∞

(∫ b

a

|f |p)1/p

= lim supp→+∞

(∫ b

a

|f |p)1/p

= M.

Section 5.31. By a change of variables and periodicity,∫ p

0f(x+ y) dx =

∫ p+y

y

f(x) dx

=∫ p

y

f(x) dx+∫ p+y

p

f(x) dx

=∫ p

y

f(x) dx+∫ p+y

p

f(x− p) dx

=∫ p

y

f(x) dx+∫ y

0f(x dx =

∫ p

0f(x) dx.

3. (a) On [0, 1], 2x/π ≤ sin x ≤ x. Since∫ 1

0

x√x2 + 1

dx =√

2− 1, theinequalities follow.

5. (a) Substituting y = x1/n and integrating by parts n− 1 times yields∫ 1

0exp

(x1/n) dx = n

∫ 1

0yn−1ey dy = F (1)− F (0),

where

F (y) = (−1)n+1n!eyn−1∑j=0

(−1)jj! yj .

7. Let I denote the integral. Successive integration by parts yields

I = (k − 1)(k − 3) · · · (k − 2j + 1)1 · 3 · · · (2j − 1) Ij , Ij :=

∫ 1

0xk−2j(1− x2)j−1/2 dx.

If k is odd, take j = (k − 1)/2 so

I = (k − 1)(k − 3) · · · 4 · 21 · 3 · · · (k − 2) Ij , Ij =

∫ 1

0x(1− x2)(k−2)/2 dx = k−1.


If k is even, take j = k/2 so

I = (k − 1)(k − 3) · · · 3 · 13 · 5 · · · (k − 1) Ij = Ij =

∫ 1

0(1− x2)(k−1)/2 dx.

By trig. substitution and Exercise 6,

Ij =∫ π/2

0cosk θ dθ = π

2(k − 1)(k − 3) · · · 3 · 1k(k − 2) · · · 4 · 2 .

9. Substituting s = f(t) and integrating by parts yields∫ y

0f−1(s) ds =

∫ f−1(y)

0tf ′(t) dt = yf−1(y)−

∫ f−1(y)

0f,

hence∫ x

0f +

∫ y

0f−1 = yf−1(y) +

∫ x

0f −

∫ f−1(y)

0f = yf−1(y) +

∫ x

f−1(y)f.

If f−1(y) ≤ x, then f(t) ≥ y for all t ∈ [f−1(y), x], hence∫ x

f−1(y)f(t) dt+ yf−1(y) ≥

∫ x

f−1(y)y dt+ yf−1(y) = xy.

On the other hand, if f−1(y) ≥ x then f(t) ≤ y for all t ∈ [x, f−1(y)],hence ∫ x

f−1(y)f(t) dt+ yf−1(y) ≥ −

∫ f−1(y)

x

y dt+ yf−1(y) = xy.

10. (b) Take f(t) = ln(t + 1), 0 ≤ x ≤ 1, and 0 ≤ y ≤ ln 2 in Young’sinequality to obtain

(x+ 1) ln(x+ 1)− x+ ey − y− 1 =∫ x

0ln(t+ 1) dt+

∫ y

0(es − 1) ds ≥ xy.

Replace x+ 1 by x, 1 ≤ x ≤ 2.

13. Integrate by parts to obtain∫ b

a

f(x) sin(nx) dx = f(x) cos(nx)∣∣∣ab

+ 1n

∫ b

a

f ′(x) cos(nx) dx.

17. If F is a primitive of f , then∫ v(x)

u(x)f = F

(v(x)

)− F

(u(x)

). Now use the

chain rule.

limx→a

g(x)x− a

∫ x

a

f = limx→a

[g(x)f(x) + g′(x)

∫ x

a

f]

= g(a)f(a).

20. (a) sn is a Riemann sum for∫ 1

0xp dx, hence limn→+∞ sn = 1/(p+ 1).

21. By the mean value theorem,

|f(x)−f(xk−1)| ≤M |x−xk−1| ≤M(xk−xk−1) = Mh, x ∈ [xk−1, xk],

hence∣∣∣∣ ∫ b

a

f −n∑k=1

f(xk−1)h∣∣∣∣ =

∣∣∣∣ n∑k=1

∫ xk

xk−1

[f(x)− f(xk−1)

]dx

∣∣∣∣ ≤Mnh2.

Section 5.53. The substitution t = sin x yields∫ π/3

π/6f(

sin x)dx =

∫ √3/2

1/2

f(t)√1− t2

dt.

Now apply 5.5.3 with g(t) = (1− t2)−1/2.

7. G(b) ≤∫ b

a

fg ≤ G(a). Now apply the intermediate value theorem to G.

9. Apply 5.5.3 to obtain c ∈ [0, 1] such that∫ π

0g(x) sin x dx = g(0)

∫ c

0sin x dx+ g(1)

∫ π

c

sin x dx = cos c+ 1.

Section 5.71. Let f(x) denote the integrand and 0 < ε < 1. Then

∫ 10 f =

∫ ε0 f +

∫ 1εf.

On (0, ε], 2x/π ≤ sin x ≤ x and 1− ε ≤ 1− x < 1, hence

1|x|p≤ f(x) ≤ (π/2)p

(1− ε)q|x|p .

On [ε, 1),1

(1− x)q sinp 1 ≤ f(x) ≤ 1(1− x)q sinp ε .

Therefore,∫ 1

0 f converges iff p, q > 1.

5. Only (b) and (d) diverge.

8. Choose r > 0 so that 1/2 <(

sin xx

)p< 1 for 0 < x < r. Then

12

∫ ε

0xq−p dx ≤

∫ ε

0

sinp xxq

≤∫ ε

0xq−p dx.

Now apply 5.7.3(a).

9. (a) all p. (c) all p. (h) p > −2. (k) p > −1.

11. Let g(x) = x(1+x2)−1, h(x) = sin x and f := gh. Then |∫ x

1 h| is boundedand g′ < 0 so, by 5.7.17, f is improperly integrable on [1,+∞). For everyn,∫ ∞

0|f | dx ≥

n∑j=2

∫ jπ

(j−1)π

x| sin x|1 + x2 dx ≥

n∑j=2

∫ jπ

(j−1)π

π(j − 1)| sin x|1 + π2j2 dx

= M

n∑j=2

j − 11 + π2j2 ,

where M is a positive constant. The sums in the last equality are un-bounded, hence h is not improperly absolutely integrable in this case.

13. (a) Converges for all p > 0 if 0 < q < 1; diverges for all p > 0 if q ≥ 1.(b) Converges for all p > 0 if 0 < q < 1; diverges for all p > 0 if q ≥ 1.(c) Converges if p > 2 or q > 2 and diverges otherwise.(d) Converges if p < 2 or q < 2 and diverges otherwise.(e) Converges iff q < 1.(f) Converges iff pq < 1.

15. Integrate by parts:

In :=∫ ∞−∞

x2ne−x2/2 dx = (2n−1)

∫ ∞−∞

x2n−2e−x2/2 dx = (2n−1)In−1.

20. Both integrals converge. The root test is inconclusive.

24. By the Cauchy–Schwarz inequality,∫ ∞1

√f(x)x

≤(∫ ∞

1f(x) dx

)1/2(∫ ∞1

dx

x2

)1/2< +∞.

26. Let F (x) =∫ xafg, a ≤ x < b, and let bn ↑ b. By the weighted mean

value theorem,

F (bm)− F (bn) = f(cm,n)[G(bm)−G(bn)]

for some cm,n between bm and bn. Since G is bounded and f(cm,n)→ 0,{F (bn)} is a Cauchy sequence and hence converges. Since {bn} wasarbitrary,

∫ bafg converges.

28. Let F (t) =∫ t

0 f dx. Then∫ t

0f(x+ c) dx =

∫ c+t

c

f(x) dx =∫ t

c

f(x) dx+ F (t+ c)− F (t),

hence ∫ ∞0

f(x+ c) dx =∫ ∞c

f(x) dx.

Similarly,∫ 0

−∞f(x+ c) dx =

∫ c

−∞f(x) dx.

Section 5.82. Given ε > 0, let An be covered by intervals In,k, k = 1, 2, . . ., with total

length < ε/2n. Then the union is covered by intervals In,k, n, k = 1, 2, . . .,with total length < ε.

6. The discontinuity set is countable, hence the integral exists. Since alllower sums are zero, the integral must be zero.

Section 6.1

1. (a) m3(m+ 1)2m+ 1 . (c) ln(3/2). (e) 1

m

m∑k=1

(−1)kk

. (g) 23480 .

(i) − ln(m+ 1). (n)m∑k=1

(−1)kk

.

2. (a) 11 + r2 .

3. (a) 193e. (c) (e− 1/e)/2.

5. Let sn =n∑k=1

1k, un =

n∑k=1

12k − 1 , and vn =

n∑k=1

4(2k − 1)(2k + 1) .

(a) s2n = sn/2 + un, hence, by 6.1.9,

un − 12 lnn = [s2n − ln(2n)]− 1

2 [sn − lnn] + ln 2→ 12γ + ln 2.

8. Given ε > 0, choose N such that L− ε < ak/bk < L+ ε for all k ≥ N .Multiplying by bk and summing,

(L− ε)m∑k=n

bk <

m∑k=n

ak < (L+ ε)m∑k=n

bk, m > n ≥ N.

Letting m→ +∞ and dividing,

L− ε <∑∞k=n ak∑∞k=n bk

< L+ ε, n ≥ N.

12. Let sn and tn denote the nth partial sums of∑an and

∑bn, respectively.

Then tk = snk so {tk} is a subsequence of {sn}. Therefore, if∑n an

converges, so does∑k bk. If the terms an are nonnegative, then, for each

n, sn ≤ tk for k ≥ n, hence if∑k bk converges, then so does

∑n an. The

series∑∞n=0(−1)n shows that the latter assertion fails in general.

15. By summing a geometric series, a real number x with representationbNbN−1 · · · b0.a1a2 · · · an999 · · · , where an 6= 9 may be written as

bNbN−1 · · · b0.a1a2 · · · an + 10−n = bNbN−1 · · · b0.a1a2 · · · an−1a′n,

where a′n := an + 1. Therefore, a real number has at least one standardrepresentation.Suppose that bNbN−1 · · · b0.a1a2 · · · = cMcM−1 · · · c0.d1d2 · · · are stan-

dard representations. Then

|bNbN−1 · · · b0 − cMcM−1 · · · c0| = |(.d1d2 · · · )− (.a1a2 · · · )|

≤∞∑j=1

|dj − aj |10j .

Since the representations are standard, |dj − aj | cannot eventually equal9, hence the right side is < 1. Therefore, since the left side is an integer,it must be zero. It follows from Exercise 1.5.16 that M = N and bj = cj ,0 ≤ j ≤ N . Then a1.a2a3 · · · = d1.d2d3 · · · , hence a1 = d1. An inductionargument shows that an = dn for all n.

Section 6.21. By the ratio test, (a), (b), (e), and (f) converge; (c) and (d) diverge.

2. (a) Converges by ratio test.(d) Converges by ratio test.(g) Converges by integral test iff p > 1.(j) Diverges by limit comparison with

∑1/n.

(m) Converges by limit comparison with∑

1/n2.(p) Diverges by ratio test.(s) Diverges since 2lnn = np, p = ln 2 < 1.(v) Converges by limit comparison with

∑1/2n.

5. For all sufficiently large n, an < ann1/n < 2an.

6. (a) Converges iff p > 1. (e) Converges iff q > p. (g) Converges iffq > 1 + p.

8. (a) Since an → 0, a2n < an for all large n. Therefore,

∑bn converges by

comparison test.(d) Converges by comparison test: bn ≤ an.(h) Converges: For n sufficiently large, say n ≥ N , an < 1, hencebn = MaN · · · an < Man, where M = a1 · · · aN−1.(l) Converges by the Cauchy–Schwarz inequality.

11. The inequality implies that {an/bn} is a decreasing sequence and henceconverges to L < +∞. Now use the comparison test.

14. Since limx→∞ f(g(x)) = limx→∞ g(x) = 0, l’Hospital’s rule implies thatlimx→∞ f(g(x))/g(x) = limx→∞ f ′(g(x)) = f ′(0). Now apply the limitcomparison test to

∑n f(g(n)) and

∑n g(n).

15. (a) If∑f(1/np) converges, then f(0) = limn f(1/np) = 0. Suppose

f ′(0) 6= 0. Then, by l’Hospital’s rule, limx→0f(xp)x2p = ∞. Therefore,

eventually f(1/np) > 1/n2p so the series diverges by the comparison test.

17. (a) n!(e− sn) = m(n− 1)!−n∑k=1

n!k! ∈ N.

(b) e− sn =∞∑k=1

1(n+ k)!

= 1(n+ 1)!

[1 + 1

n+ 2 + 1(n+ 2)(n+ 3) + . . .

]<

1(n+ 1)!

[1 + 1

n+ 1 + 1(n+ 1)2 + . . .

]= 1

(n+ 1)!n+ 1n

,

hence n!(e− sn) < 1/n. By (a) and (b), n!(e− sn) is a positive integer< 1/n, which is impossible.

Section 6.33. (a) and (c) diverge: dn → 2/3; (b) converges: dn → 3/2.

5. (a) By ratio test: series converges if p < e and diverges if p > e. If p = e,series diverges by Raabe’s test since then dn → −1/2.

6. Ratio test fails. Raabe: dn → (1 + p)/2, hence converges if p > 1 anddiverges if p < 1. Also diverges if p = 1, since then an = 1/(2n+ 1).

10. (a) Diverges. (d) Converges iff r > 1.

13. − ln an/ lnn→ ln b.

16. (a) Converges iff q > p.

18. Let c > 1 and choose r ∈ (1, c). Then, for sufficiently large n, cn > r,hence

ln a−1n = lnn+ cn ln lnn > lnn+ r ln lnn = ln

[n(lnn)r

]and therefore an <

1n(lnn)r . Since

∫∞2 1/x(ln x)r dx < +∞, the integral

and comparison tests complete the proof in this case. The case c < 1 issimilar.The given series diverges.

21. Take bn = n lnn in Kummer’s test. Then

cn =[1 + 1

n+ βnn lnn

]n lnn−(n+1) ln(n+1) = (n+1) ln

(n

n+ 1

)+βn.

Since the first term on the right side tends to −1, lim infn→∞

βn > 1 implieslim infn→∞

cn > 0, and lim supn→∞

βn < 1 implies lim infn→∞

cn < 0.

Section 6.42. Choose r > 1 and N ∈ N such that |an+1|/|an| > r for all n ≥ N . Then|aN+k| > rk|aN | for all k, hence an 6→ 0. Therefore, series diverges.

4. (a) Diverges. (b) Converges conditionally.(c) Converges absolutely if p > 1, conditionally if p ≤ 1.(i) Converges absolutely if p > 1/2, conditionally if p ≤ 1/2.(m) Converges absolutely if p > 1, diverges if p < 1.

9. Let bn = n− 1/2np + (−1)n . If p ≤ 1, then bn sinnθ need not tend to zero (see

Example 8.3.10). For p > 1, it suffices by Dirichlet’s test to show that∑|bn+1 − bn| < +∞. This follows by limit comparison with

∑1/np.


13. (a) For n ∈ N, n = qmn + rn, where rn, mn ∈ N and 0 ≤ rn ≤ q − 1.Since sn − sqmn is a sum of terms of the form aqmn+j , j = 1, . . . q − 1,each of which → 0, sn − sqmn → 0. Therefore, sn → s.(b) For n ∈ N,

s6n =(

1 + 12 + 1

3

)−(

14 + 1

5 + 16

)+(

17 + 1

8 + 19

)−(

110 + 1

11 + 112

)+ · · ·+

(1

6n− 5 + 16n− 4 + 1

6n− 3

)−(

16n− 2 + 1

6n− 1 + 16n

)=(

1− 14

)+(

12 −

15

)+(

13 −

16

)+ · · ·+

(1

6n− 3 −1

6n

)= 3

1 · 4 + 32 · 5 + 3

3 · 6 + · · ·+ 3(6n− 3)6n = 3

6n−3∑k=1

1k(k + 3) .

The last expression converges to (1 + 1/2 + 1/3) = 11/6 by 6.1.5 withm = 3. By part (a), s = 11/6.(c) Let tn be the nth partial sum of the series. Then

t5n = 1 + 12 + 1

3 −14 −

15 + 1

6 + 17 + 1

8 −19 −

110 + · · · − 1

5n≥ 1

3 + 13 + 1

3 −13 −

13 + 1

8 + 18 + 1

8 −18 −

18 + · · ·

= 13 + 1

8 + 113 + · · ·+ 1

5n− 2 .

Thus t5n → +∞, so the series diverges.

Section 6.53. (a), (b), (c): Double limit does not exist; only one iterated limit exists.

(d), (g), (l): Iterated limits exist and are unequal. Hence double limitdoes not exist.(e), (h): Iterated limits exist and are equal. Double limit exists.(f), (i), (k): Iterated limits exist and are equal. Double limit does notexist.(j) If a = b, iterated limits exist and are equal, double limit exists. Ifa 6= b, iterated limits exist and are unequal.

9. Let sm,n =∑mj=1

∑nk=1 aj,k and sn =

∑nk=1 bk. Then for m ≥ n,

sn ≤ sn,n ≤ sm,n ≤ sm,m ≤ s2m−1,

hence the result follows from the squeeze principle.

10. (b) Let bn =∑nj=1 aj,n+1−j =

n∑j=1

1[j2 + (n+ 1− j)2]p/2 and let sn =∑n

k=1 bk. The minimum of x2+(n+1−x)2 on [1, n] occurs at x = (n+1)/2and the maximum at x = 1 and x = n, hence

(n+ 1)2/2 ≤ j2 + (n+ 1− j)2 ≤ n2 + 1, 1 ≤ j ≤ n,

and thereforen

(n2 + 1)p/2 ≤ bn ≤2p/2n

(n+ 1)p ,

so the double series converges iff p > 2.

11. If |r| ≥ 1, then am,n 6→ 0, hence the double series diverges. Let |r| < 1and set cm = |r|m/(1−|r|m). ChooseM such that |r|m < 1/2 form > M .Then∞∑m=1

∞∑n=1|r|mn =

M∑m=1

cm +∞∑

m=M+1cm ≤

M∑m=1

cm + 2∞∑m=1|r|m < +∞.

Therefore, the iterated series, and hence the double series, convergesabsolutely.

12. Let L < 1. Choose r ∈ (L, 1) and then N such that a1/mnm,n < r for

all m, n ≥ N . For such m, n, am,n < rmn, hence∑am,n converges by

Exercises 6 and 11. If L > 1, choose r ∈ (1, L) and then N such thata

1/mnm,n > r for all m, n ≥ N . For such m, n, am,n > rmn > 1, henceam,n 6→ 0, so the series diverges.

Section 7.11. (b) Pointwise to 0 on (−1, 1] for all p ≥ 0, uniformly on intervals [a, 1]

for a > −1 and p < 1. Uniformly on [−1, 1] if p < 0.(d) Pointwise to 0 on R, uniformly on |x| ≥ a > 0.(g) Uniformly to 0 on R.(j) Pointwise on R, uniformly on the sets |x| ≥ r > 1 and |x| ≤ s < 1.

2. (a) Pointwise but not uniformly. (b) Uniformly.

6. For example, fn(x) = x+ 1/n, f(x) = gn(x) = g(x) = x on [1,+∞].

10. Given ε > 0, choose δ > 0 such that |f(x) − f(y)| < ε for all x, y ∈ Rwith |x − y| < δ. Then choose N such that |an − a| < δ for all n ≥ N .For such n and for all x,

|fn(x)− f(x+ a)| = |f(x+ an)− f(x+ a)| < ε.

13. If x ∈ Q has reduced form x = k/m, then fn(x) = 1 for all n ≥ m.Therefore, fn converges pointwise to the Dirichlet function d(x). Supposethe convergence were uniform on [0, 1]. Then we could find n such that|fn(x)− d(x)| < 1 for all x ∈ [0, 1]. In particular, |fn(1/m)− 1| < 1 forall m > n, which is impossible since fn(1/m) = 0.

17. Let M > |f0(x)|+ 1 for all x ∈ S. Then

|fn+1(x)− fn(x)| = | sin(rfn(x)

)− sin

(rfn−1(x)

)|

≤ r|fn(x))− fn−1(x)| ≤ · · · ≤ rn|f1(x)

)− f0(x)|

≤Mrn.

Since r < 1, {fn} is uniformly Cauchy. Therefore, fn → some f , uniformlyon S. The generalization is proved in a similar manner, using the meanvalue theorem.

Section 7.24. Let x > 0. By l’Hospital’s rule, n2xe−nx has the same limit as 2ne−nx,

namely, 0. The convergence is not uniform on (0, 1), however, as maybe seen by taking bn = 1/n in 7.1.5. An integration by parts shows that∫ 1

0 n2xe−nx dx = 1− e−n(1 + n)→ 1.

5. (d) Let L := limn

∫ 10 fn. By the mean value theorem, e−x/n − 1 =

(−x/n)e−ξ/n, hence∣∣∣∣∣√n(e−x/n − 1

)x

∣∣∣∣∣ ≤ e−ξ/n√n≤ 1√

n

so√n(e−x/n − 1

)/x converges uniformly to zero. Therefore, L = 0.

6. If x ≥ r > 0, then fn(x) =√n/(1 + n2x2) ≤

√n/(1 + n2r2), hence

fn → 0 uniformly on [r,+∞). The convergence is not uniform on (0, 1),as can be seen by taking bn = 1/n in 7.1.5. A substitution shows that∫ 1

0 fn = n−1/2 arctann→ 0.

8. (a) n sin fn → f ′/f uniformly ⇒∫ ban sin fn → ln f(b)− ln f(a).

9. This follows from the inequality∣∣∣∣∫ x

a

fn(t) dt−∫ x

a

f(t) dt∣∣∣∣ ≤ ∫ x

a

|fn(t)− f(t)| dt ≤∫ b

a

|fn − f |.


Section 7.31. (a) Pointwise on (1,+∞), uniformly on [r,+∞), r > 1.

(d) Uniformly on [0,+∞).(g) Pointwise on (0,+∞), uniformly on [r,+∞), r > 0.(i) If p > 1, pointwise on [0,+∞), uniformly on [0, r];

If p = 1, converges only at x = 0.

2. (b) s(x) = 11 + ln x . Pointwise on (1/e, e), uniformly on closed subinter-

vals.

4. Both s(x) and c(x) converge uniformly on R by the M -test. Therefore,term by term integration is justified so∫ π/2

x

s(t) dt =∑n

an2n+ 1 cos

[(2n+ 1)x

],

∫ x

0c(t) dt =

∑n

ann

sin(nx).

6. (a) Let p ≤ 1/2 and x 6= 0. By l’Hospital’s rule, n−1[1− cos(x/np)] hasthe same limit as n→ +∞ as

−pxn−p−1 sin(x/np)−1/n2 = px2n1−2p sin(x/np)

x/np.

Since this limit is positive, (a) follows from the limit comparison test.(b) Since cosine is an even function, to show uniform convergence onintervals [a, b] we may assume a = 0. By the mean value theorem, foreach n ∈ N and x ∈ [0, b] there exists xn ∈ [0, b] such that

|1− cos(x/np)| = (x/np)| sin(xn/np)| ≤ b2/n2p.

Therefore, uniform convergence on [0, b] follows from the M -test. Since1−cos(x/np) does not converge uniformly to 0 on any unbounded interval,s(x) does not converge uniformly on R.

9. Let |f ′| ≤M on I. By the mean value theorem, for each x ∈ I and n ∈ Nthere exists ξ between x/(n+ 1) and 0 such that

1n

∣∣∣∣f( x

n+ 1

)∣∣∣∣ = |xf ′(ξ)|n(n+ 1) ≤

rM

n(n+ 1) .

Therefore, s(x) converges uniformly on I by the Weierstrass M -test.Since f ′ is bounded, the derived series

s′(x) =∞∑n=1

1n(n+ 1)f

′(

x

n+ 1

)converges uniformly on I and s′(0) = f ′(0).

11. Since fn ≥ 0, the partial sums of the series increase, so the conclusionfollows from Dini’s theorem (7.1.12).

13. For x ∈ [a, b], either fn(a) ≤ fn(x) ≤ fn(b) or fn(b) ≤ fn(x) ≤ fn(a),hence |fn(x)| ≤ Mn := max{|fn(a)|, |fn(b)|} ≤ |fn(a)| + |fn(b)|. Since∑Mn < +∞, s =

∑n fn converges uniformly on [a, b]. Since each

fn ∈ R([a, b]

), s ∈ R

([a, b]

)and

∫ bas =

∑∫ bafn.

15. By Dini’s theorem, the convergence of {gn} is uniform. Therefore, theresult follows from 7.3.9.

18. Since g is continuous and n−2[g + n] ↓ 0, the convergence is uniform onclosed bounded intervals I. By 7.3.9, s(x) converges uniformly on I. Theconvergence is not absolute for any x (compare with

∑n 1/n).

Section 7.41. (a) (−1, 3). (d) (−1, 1]. (g) (−1/4, 1/4). (i) (−1, 1).

2. (b)∞∑n=3

3n−3

2n−2xn, −2/3 < x < 2/3.

3. (a) Replace x by x− 1 in (7.12), where |x− 1| < 1, to obtain

x ln x = (x− 1) ln x+ ln x

=∞∑n=1

(−1)n+1

n(x− 1)n+1 +

∞∑n=1

(−1)n+1

n(x− 1)n

=∞∑n=2

(−1)nn− 1 (x− 1)n +

∞∑n=1

(−1)n+1

n(x− 1)n

= (x− 1) +∞∑n=2

[(−1)nn− 1 + (−1)n+1

n

](x− 1)n

= (x− 1) +∞∑n=2

(−1)nn(n− 1)(x− 1)n.

4. (a)∞∑n=1

(−1)n+12n + 3nn

xn, |x| < 1/3. (e)∞∑n=0

(−1)n(2n+ 1)!x

n, x > 0.

(g)∞∑n=0

(−1)n4n(2n+ 1)!x

2n+1, x ∈ R.

5. Use arccosx = π/2− arcsin x and (7.20).

9. (a)∞∑n=1

(−1)n x2n−1

(2n− 1)(2n+ 1)! .

10. (b) x(1− x2)(1 + x2)2 .

11. 27/4.

12. (a)∞∑n=1

(−1)n+1cnxn, |x| < 1, cn :=

n∑k=1

(−1)kk

.

(d)∞∑n=0

cnx2n+1, x ∈ R, cn :=

n∑k=0

(−1)k(2k + 1)!(n− k)! .

16. For |x| < (√

5− 1)/2,

(1− x− x2)s(x) =∞∑n=0

cnxn −

∞∑n=0

cnxn+1 −

∞∑n=0

cnxn+2

= c0 + c1x− c0x+∞∑n=2

(cn − cn−1 − cn−2)xn

= 1.

18. Replace x by −t2 in (7.19) to obtain

1√1 + t2

=∞∑n=0

(−1)n(2n)!(n!)24n t2n, |t| < 1.

Integrating from 0 to x yields the desired representation.

21. (a)) Choose r such that R−1s = lim supn |cn|1/n < r < 1. Then

|cn2 |1/n =(|cn2 |1/n

2)n< rn → 0,

hence Rt = +∞.(b) If cn = (1 + a/np)n, p > 0, then Rs = 1 and

Rt = limn

(1 + a

n2p

)−n=

e−a if p = 1/2,0 if p < 1/2 and a > 0+∞ if p < 1/2 and a < 01 if p > 1/2.

22. (a) If 0 < Rs < +∞, choose N such that |cn|1/n < 2R−1s for all n ≥

N . For such n, |cn|1/n2< (2R−1

s )1/n → 1, hence Rt ≥ 1. Similarly,|cn|1/n

2> (R−1

s /2)1/n for infinitely many n, hence Rt ≤ 1.

27. By the alternating series test,∑∞n=0 cnx

n converges at x = −1, hencethe result follows from Abel’s continuity theorem.

28. (a) Let sn(x) =n∑k=1

kxk and s(x) = x

(1− x)2 , x ∈ [0, 1). By 7.4.6 and

the boundedness of f , sn(x)→ s(x) uniformly on [0, r], 0 < r < 1.

30. Define h on I ∪ J by

h(x) ={f(x) if x ∈ I,g(x) if x ∈ J .

By 7.4.19, f = g on I ∩ J , hence h is well-defined and analytic on I ∪ J .

33. (a) By 7.4.13, if the series g(x) converges for |x−a| < r1, then f(x)g(x) =∑cn(x− a)n, where

c0 = a0b0 = a0 = 1 and cn =n∑k=0

akbn−k = a0bn − bn = 0, n ≥ 1.

Therefore, f(x)g(x) = 1 for |x− a| < r1.(b) Suppose |an| ≤Mn for all n. If |bj | ≤ (2M)j for 1 ≤ j ≤ n− 1, then

|bn| ≤n∑k=1|ak||bn−k| ≤

n∑k=1

2n−kMkMn−k < (2M)n.

By induction, |bn| ≤ (2M)n for all n.(c) By 7.4.16, there exists a constant M > 0 such that |an| ≤Mn for alln, hence (b) holds. By 7.4.16, g is analytic at a.

Section 8.11. Only (b) and (d) are not metrics.

3. Symmetry and coincidence are clear. To verify the triangle inequalityd(x, y) ≤ d(x, z) + d(y, z) simply note that if xj 6= yj then either xj 6= zjor yj 6= zj so that every index j contributing to d(x, y) also contributesto d(x, z) + d(y, z).

5. By the triangle inequality,

d(x, y) ≤ d(x, a) + d(a, y) ≤ d(x, a) + d(a, b) + d(b, y),

hence d(x, y) − d(a, b) ≤ d(x, a) + d(b, y). Similarly d(a, b) − d(x, y) ≤d(x, a) + d(b, y).

10. Let {xn} be a Cauchy sequence in E. Some Ej must contain a subsequenceof {xn}, and since Ej is complete, the subsequence converges to a memberof Ej . By Exercise 9, {xn} converges.The assertion is false for infinitely sets. For example, let {r1, r2, . . .} be

an enumeration of the rationals, and take En = {rn} (or {r1, . . . , rn}).

13. The proof of (a) is straightforward. For the necessity in (b), let {(xn, yn)}be Cauchy in Z. Since d(xn, xm) ≤ η

((xn, yn), (xm, ym)

), {xn} is Cauchy

in X. Similarly, {yn} is Cauchy in Y . The converse is clear. Part (c) isproved in a similar manner, and (d) follows from (b) and (c).

15. Part (a) is straightforward. For example, if ρ(x, y) = 0, then ρ(x, y) =d(x, y), hence x = y. Parts (b) and (c) follow from the observation thatρ(x, y) = d(x, y) if either term is less than a. Part (d) follows from (b)and (c).

The metrics need not be metrically equivalent: Take d to be the usualmetric on R.The function σ does not define a metric on X since σ(x, x) = a > 0.

18. (a) The triangle inequality follows from the observation that the functiont(1 + t)−1 is increasing on [0,+∞). The remaining properties of a metricare easily established. Parts (b) and (c) follow from the definition of ρand the equation

d(x, y) = ρ(x, y)1− ρ(x, y) ,

noting that ρ < 1.The metrics |x−y| and |x−y|/(1+|x−y|) are not metrically equivalent.

20. By Exercise 18, each ρk is a metric onX. It follows easily that ρ is a metricon X. For (b), suppose ρ(xn, x)→ 0. Since ρk ≤ 2kρ, ρk(xn, x)→ 0. ByExercise 18, dk(xn, x) → 0. Conversely, suppose dk(xn, x) → 0, henceρk(xn, x) → 0, for each k. Given ε > 0, choose M ∈ N such that∑n>M 2−n < ε/2 and choose N > M so that

ρ1(xn, x) + ρ2(xn, x) + · · ·+ ρM (xn, x) < ε/2

for all n ≥ N . For such n, ρ(xn, x) < ε.

23. For x, y ∈ [1, b],

|fn(x, y)− f(x, y)| =∣∣∣∣y(1 + xn)1/n − x(1 + yn)1/n

y(1 + yn)1/n

∣∣∣∣=∣∣∣∣ (1 + xn)1/n − x(1 + y−n)1/n

(1 + yn)1/n

∣∣∣∣≤ |(1 + xn)1/n − x|+ |x− x(1 + y−n)1/n|

(1 + yn)1/n

≤ x[(1 + x−n)1/n − 1

]+ x

[(1 + y−n)1/n − 1

]≤ b

[(21/n − 1) + (21/n − 1)

]→ 0.

Section 8.21.

(0, 0)(1, 0)

Bd∞1 (0, 0)

(0, 0)(1, 0)

Bd11 (0, 0)

FIGURE C.1: Open balls for Exercise 1.

3. r = d(x, y)/2.

5. If x, y ∈ Br(a) and 0 < t < 1, then

‖tx+ (1− t)y − a‖ = ‖t(x− a) + (1− t)(y − a)‖≤ ‖t(x− a)‖+ ‖(1− t)(y − a)‖< tr + (1− t)r = r.

In general, spheres are not convex. (Consider (R2, d2).)

8. By Exercise 8.1.6, ρ is a metric. Since ex is a continuous function on Rwith continuous inverse, ρ(xn, x) → 0 iff |xn − x| → 0. Therefore, ρ istopologically equivalent to the usual metric of R. (R, ρ) is not completein this metric. For example, {−n}∞n=1 is a Cauchy sequence in (R, ρ)with no limit. Therefore, ρ cannot be metrically equivalent to the usualmetric of R.

12. Let {fn} be a sequence in C converging uniformly to f . Then fn(x) =fn(1 − x) for all n and x. Taking limits yields f(x) = f(1 − x) for allx. To see that C is not closed in the metric of Exercise 8.1.22, definefn ∈ C by fn(1/2) = 1, fn(x) = 0 if x ∈ [0, 1/2− 1/n] ∪ [1/2 + 1/n, 1]and linear on [1/2− 1/n, 1/2 + 1/n].

Section 8.31. (a) cl(A) ∪ cl(B) is closed and ⊇ A ∪ B, so cl(A) ∪ cl(B) ⊇ cl(A ∪B).

Similarly, cl(A ∪B) ⊇ cl(A) and cl(A ∪B) ⊇ cl(B).(d) int(A)∪int(B) is open and ⊆ A∪B, hence int(A)∪int(B) ⊆ int(A∪B).The example A = (0, 1], B = (1, 2) in R produces strict inclusion.(f) bd(cl(A)) = cl(cl(A)) \ int(cl(A)) ⊆ cl(A) \ int(A) = bd(A). Theexample A = Q in R produces strict inclusion.

3. (b){

(x, y, 0) : x2 + y2 = 1}. (e) {(1, 0), (0, 0)}.

(f) The circle{

(x, y) : x2 + y2 = 1}together with the point (0, 0).

6. (a) By 8.3.6, y ∈ clY (A) iff for any sequence {an} in A with an → y,y ∈ A. The same characterization can be given for y ∈ clX(A) ∩ Y .

8. The sequence {fn} has no cluster points in(C([0, 1]), ‖ · ‖∞

), hence the

set {f1, f2, . . .} is closed. The identically zero function is a cluster pointof the sequence in

(C([0, 1]), ‖ · ‖1

), hence the set is not closed in this

space.

9. (a) B is open and B ⊆ C, hence B ⊆ int(C). The example B1(x) = {x}and C1(x) = X in a nontrivial discrete space gives strict inclusion.

12. (b) By 8.3.9, for any y ∈ R there exist integers nk > 0, mk such thatnk/(2π) +mk → y − x/(2π) hence

sin(nk + x) = sin[2π((nk + x)/(2π) +mk

)]→ sin(2πy).

Therefore, the set is dense in [−1, 1].

16. Let u ∈ U and choose ε > 0 such that Bε(u) ⊆ U . Since Y is dense inX, Bε(u) ∩ U ∩ Y = Bε(u) ∩ Y 6= ∅.If U is not open, then the assertion may not hold. For example, take

X = [0, 1], Y = (0, 1], and U = {0}.

20. (a) Let u, v ∈ I :=⋃

i∈I Ii and t ∈ (0, 1). Then u ∈ Ii and v ∈ Ijfor some i, j ∈ I. Since Ii ∩ Ij 6= ∅, Ii ∪ Ij is an interval. Therefore,tu+ (1− t)v ∈ Ii ∪ Ij ⊆ I, hence I is an interval. Since each Ii is open, Iis open.

Section 8.41. (b), (k) (o), (r) Limit and iterated limits are 0.

(e) Limit does not exist. One iterated limit is 0, the other is 1.(i) Limit and iterated limits exist and = 1/2.

2. (a) The limit is 1 since∣∣∣∣x2 − 5y2

x2 + 3y2 − 1∣∣∣∣ = 8y2

x2 + 3y2 <8y2

(|y|/a)2/p + 3y2 ≤ 8a−2/p|y|2(1−1/p) → 0.

(b) The limit does not exist, as may be seen by converting to polarcoordinates.

3. By the Cauchy mean value theorem, for each x 6= y there exists a numberθ = θ(x, y) between x and y such that

g(x, y) = f ′(θ)cos θ .

Since limy→x θ(x, y) = x, define g(x, x) = f ′(x)/ cosx.

6. This follows from Exercise 8.1.5

7. Given ε > 0, choose δ > 0 such that |f(x)− f(a)| < ε for all x, a with|x− a| < δ. Let

√(x− a)2 + (y − b)2 < δ/2. Then∣∣∣√x2 + y2 −√a2 + b2

∣∣∣ = |x2 + y2 − a2 − b2|√x2 + y2 +

√a2 + b2

≤ |x− a|(|x|+ |a|) + |y − b|(|y|+ |b|)√x2 + y2 +

√a2 + b2

≤ |x− a|+ |y − b|≤ 2√

(x− a)2 + (y − b)2 < δ,

hence |g(x, y)− g(a, b)| < ε.

8. For a proof using the sequential criterion for uniform continuity, letxn − an, yn − bn → 0. Then αxn + βyn − (αan + βbn) → 0, henceg(xn, yn)− g(an, bn) = f(αxn + βyn)− f(αan + βan)→ 0.The functions xy and sin(xy) are not uniformly continuous on R2. (For

the former take xn = yn = n+ 1/n and an = bn = n. For the latter takexn = yn =

√2π [n+ 1/(3n)] and an = bn =

√2π n.)

11. This follows from the inequalities

|fj(x)− fj(a)| ≤ ‖f(x)− f(a)‖ ≤n∑j=1|fj(x)− fj(a)|.

12. We prove the uniform continuity part. Given ε > 0, choose a fixed nsuch that ρ(fn(x), f(x)) < ε/3 for all x ∈ X. Then choose δ > 0 suchthat ρ(fn(x), fn(a)) < ε/3 for all x, a ∈ X with d(x, a) < δ. The triangleinequality then shows that ρ(f(x), f(a)) < ε/3 for all x, a ∈ X withd(x, a) < δ.

Section 8.51. (a) compact. (b) closed, not bounded.

(f) bounded, not closed. (h) neither bounded nor closed.

3. Compact case: Let {Ui : i ∈ I} be an open cover of C := C1 ∪ · · · ∪ Ck,where each Cj is compact. For each j there exists a finite set Ij ⊆ Isuch that {Ui : i ∈ Ij} covers Cj . If I0 is the union of the Ij , then{Ui : i ∈ I0} is a finite subcover of C.

4. Such an intersection is closed and contained in a compact set and istherefore compact.

7. If E is totally bounded, then cl(E) is totally bounded. SinceX is complete,cl(E) is complete. Therefore, by 8.5.8, cl(E) is sequentially compact. Inparticular, every sequence in E has a cluster point in X.

Conversely, assume every sequence in E has a subsequence thatconverges in X. Let {yn} be a sequence in cl(E). For each n, choosexn ∈ E such that d(xn, yn) < 1/n. By hypothesis, a subsequence xnkconverges to some x ∈ X, hence ynk → x. Therefore, cl(E) is sequentiallycompact hence totally bounded.

11. Suppose⋂∞n=1 Cn = ∅. Then

⋃∞n=1 C

cn = X, hence {Ccn : n ∈ N} is

an open cover of X and therefore also of C1. Choose n ∈ N such thatC1 ⊆ Cc1∪· · ·∪Ccn. Taking complements, Cn = C1∩· · ·∩Cn ⊆ Cc1 ⊆ Ccn,which is impossible.

13. By the approximation property of suprema, there exist sequences {an}and {bn} in A such that d(an, bn) → d(A). Since A is compact, thereexists a subsequence {a′n} of {an} converging to some a ∈ A. Similarly,there exists a subsequence {b′′n} of the corresponding subsequence {b′n}that converges to some b ∈ A. It follows that d(a, b) = limn d(a′′n, b′′n) =d(A).For the example, take A = {fn} in C

([0, 1]

)with the sup metric, where

fn(x) = xn. Then d(A) = 1 > d(fn, fm) for all m, n.

15. (a) For any a ∈ A, d(A, x) ≤ d(a, x) ≤ d(a, y) + d(y, x), hence d(A, x)−d(y, x) ≤ d(a, y). Taking the infimum over a yields d(A, x) − d(y, x) ≤d(A, y) or d(A, x)− d(A, y) ≤ d(y, x). Interchanging x and y yields (a).(b) If x 6∈ cl(A) there exists r > 0 such that Br(x) ∩ cl(A) = ∅. Thend(a, x) ≥ r for all a ∈ A, hence d(A, x) > 0. Conversely, assume x ∈ cl(A)and let an ∈ A with an → x. Since d(A, x) ≤ d(an, x)→ 0, d(A, x) = 0.(c) By (b), the denominator of FAB(x) is positive, hence FAB is well-defined. Continuity follows from (a), and clearly 0 ≤ FAB ≤ 1. The lastassertions follow from (b).(d) U = {x ∈ X : FAB(x) < 1/2}, V = {x ∈ X : FAB(x) > 1/2}.

19. Let xn := f(1/n) and yn := f(2π−1/n). Then limn xn = limn yn = (1, 0)but f−1(xn) = 1/n→ 0 and f−1(yn) = 2π − 1/n→ 2π.

21. Each set is a continuous image of the compact set A×B.

Section 8.63. Suppose that F is equicontinuous at a. Given ε > 0, choose δ > 0 such

that ρ(f(x), f(a)) < ε for all x ∈ X with d(x, a) < δ and all f ∈ F .Given sequences {fn} in F and {xn} in E with xn → a, choose N suchthat d(xn, a) < δ for all n ≥ N . For such n, ρ

(fn(xn), fn(a)

)< ε.

Conversely, suppose that F is not equicontinuous at a. Then thereexist ε > 0 and members xn of E and fn of F such that d(xn, a) < 1/nbut ρ

(fn(xn), fn(a)

)≥ ε. Therefore, the sequential condition does not

hold.

7. Let x > a ≥ c. By the mean value theorem applied to the functionf(z) = z−p on (na, nx),∣∣∣∣ 1

(nx)p −1

(na)p

∣∣∣∣ = pn|x− a|yp+1n

for some yn ∈ (na, nx).

Since yp+1n ≥ (nc)p+1 ≥ cp+1, |(nx)−p−(na)−p| ≤ p|x−a|c−(p+1), which

shows equicontinuity.

9. Take xn = a+ π/n in Exercise 3. Then xn → a but

sin(nxn)− sin(na) = −2 sin(na),

which has no limit if a is a nonzero rational number.

11. By the mean value theorem, |f(x)− f(y)| ≤M |x− y|.

14. Let ‖fi‖∞ ≤ M for all i. Then |Fi(x) − Fi(y)| ≤ M |x − y|, hence F isuniformly equicontinuous on [a, b]. It follows that the uniform closure Gof F in C([a, b]) is uniformly equicontinuous on [a, b] (Exercise 6). SinceG is also closed and bounded, it is compact (Arzelà–Ascoli Theorem),hence totally bounded.

Section 8.71. (c) not connected. (d) path connected, hence connected.

(e) connected iff −1 ≤ a ≤ 1.

5. Then f(u) and f(v) have opposite signs, say f(u) < 0 < f(v). Since therange of f is connected, it contains the interval (f(u), f(v)).

7. Let f = (g, h) : X → R2 and L := {(x, x) : x ∈ R}. Then L separates R2

into two open half-planes H1 and H2. Choose any x0 ∈ X and supposef(x0) ∈ H1. Then E := f−1(Hc

1) = f−1(H2) is both open and closed.Since X is connected, E = ∅. Therefore, f(X) ⊆ H1.

9. Consider the case B := B1(0). Any point in Bc may be connected to thesphere S := S2(0) by a radial line segment. Since S is path connected(8.7.10), Bc is path connected.

12. Denote the union by A. Let f : A→ {0, 1} be continuous. Since An isconnected, f(An) is a single point. Since An ∩An+1 6= ∅, an inductionargument shows that f is constant.

16. Suppose that f : L → C is such a function. Then f−1 : C → L iscontinuous (8.5.11). Remove a point p from the interior of L. Then f−1

maps the connected set C \ f(p) onto the disconnected set L \ p.The function f(t) = (cos t, sin t) maps [0, 2π] continuously onto the

circle x2 + y2 = 1.

20. Let x ∈ bd(A) and ε > 0. Then there exist u, v ∈ Bε(x) such thatf(u) ≥ c and f(v) < c. Since Bε(x) is convex, it is connected, hencef(Bε(x)

)is an interval and so must contain c. Taking ε = 1/n, we may

construct a sequence xn → x with f(xn) = c for each n. Therefore,f(x) = c. This shows that bd(A) ⊆ B.

The example f(x) = x2 on R with c = 0 shows that the inclusion maybe strict.

22. (a) Cx is connected by Exercise 13. Let u ∈ Cx and choose ε > 0 suchthat Bε(u) ⊆ U . Since Bε(u) is connected, Bε(u) ∪ Cx is connected,hence Bε(u) ⊆ Cx. Therefore, Cx is open. If Cx ∩ Cy 6=, then Cx ∪ Cy

is connected hence Cx = Cy. Therefore, U is a union of pairwise disjointcomponents.(b) Choose a point with rational coordinates in each component in (a).Since these points form a countable set, the union is countable.

Section 8.83. Choose a sequence of polynomials Pn converging uniformly to f on

[a, b]. By hypothesis,∫ bafPn = 0 for all n, hence

∫ baf2 = 0. Since f is

continuous, f = 0. If a ≥ 0, then the polynomials with even powers forma separating algebra, hence the result follows as before.

6. By the Stone–Weierstrass theorem, there exists a sequence of functionsgn in A converging uniformly to f . Set fn = gn − gn(x0). Then fn ∈ Aand gn(x0)→ 0, hence

‖fn − f‖∞ ≤ ‖fn − gn‖∞ + ‖gn − f‖∞ = |gn(x0)|+ ‖gn − f‖∞ → 0.

9. By 8.8.8, there exists a sequence {Tn} of trigonometric polynomialsconverging uniformly to f on [0, 2π]. For any j, sin(jx) and cos(jx)are linear combinations of products sinm x cosn x, hence, by hypothesis,∫ 2π

0 f(x)Tn(x) dx = 0 for all n. Therefore,∫ 2π

0 f2 = 0 so f = 0.

11. The set of all functions of the form T (x) := b0 +∑mj=1 bj sin(jx) on

[−π/2, π/2] is an algebra A containing the constant functions. Since sin xseparates points, so does A. Therefore, given ε > 0, ‖f − T‖∞ < ε/2 forsome T . Since f(0) = 0, |b0| < ε/2. Therefore, ‖f − (T − b0)‖∞ < ε.

15. The functions∑ni=1 gi(x)hi(y) form an algebra and separate points of

X × Y .

Section 8.91. Assume that X has the decreasing sequence property and let {xn} be a

Cauchy sequence in X. Take Cn = cl({xn, xn+1, . . .}

). Then Cn is closed,

Cn+1 ⊆ Cn and d(Cn) → 0 (because {xn} is Cauchy). By assumption,there exists x ∈ X such that x ∈ Cn for all n. It follows that somesubsequence of {xn} converges to x. Therefore xn → x (Exercise 8.1.9).

3. Let {r1, r2 . . .} be an enumeration of Q. Then Un := {rn, rn+1, . . .} isopen and dense in Q but

⋂n Un = ∅.

Section 9.1

1. (a) 2y dx− 2x dy(x+ y)2 . (e) cos(x2y)(2xy dx+ x2 dy).

(h) exy2(y2 dx+ 2xy dy).

2. (b)[ex sin y ex cos yey cosx ey sin x

]. (c) 1

(x2 + y2)2

[y3 − x2y x3 − xy2

4xy2 −4y3

].

3. Let ∆ = {(x, x) : x ∈ R}.(a) Differentiable on R2 iff p, q > 3, in which case partials are continuous.(d) Differentiable on R2 iff p, q > 1. Partials are continuous iff p > 2.

4. (a) Differentiable and partials continuous iff p+ q > 1.(d) Differentiable and partials continuous iff p+ q > 2s+ 1.

8. x ·∇f(x) = a · f(x), x ·∇g(x) = g(x).

10. e−f(x)(ex1 , ex2 , . . . , exn).

12. (a) xi‖x‖

. (c) ‖x‖2 − x2

i

‖x‖3.

Section 9.21. Let α denote the right side of the inequality. Clearly, ‖T‖ ≤ α. If ‖x‖ ≤ 1,

then ‖Tx‖ ≤ ‖T‖‖x‖ ≤ ‖T‖, hence α ≤ ‖T‖.


3. Since ∇(ψ−1) = ψ−2∇ψ, the assertion follows from the scalar productrule.

4. (a) Let f(x) = x and ψ(x) = ‖x‖ in the product rule (9.2.6). Since dfx

is the identity transformation and ∇ψ(x) = x/‖x‖,

dgx(h) = ‖x‖h+ ‖x‖−1(x · h)x.

Therefore,

dgx(x) = ‖x‖x+ ‖x‖−1(x · x)x = 2‖x‖x.

6. Let η(h), µ(k) be such that

f(a+h) = f(a)+dfa(h)+‖h‖η(h), g(b+k) = g(b)+dgb(k)+‖k‖µ(k)

for all h ∈ Rp, k ∈ Rq with ‖h‖, ‖k‖ sufficiently small, and

limh→0

η(h) = limk→0

µ(k) = 0.

Let T (h,k) = αdfa(h) + βdgb(k). Then T is linear in (h,k) and

ε(h,k) := F (a+ h, b+ k)− F (a, b)− T (h,k) = α‖h‖η(h) + β‖k‖µ(k).

Since ‖(h,k)‖ =√‖(h‖2 + ‖k‖2 ≥ ‖h‖, ‖k‖,

‖ε(h,k)‖‖(h,k)‖ ≤

|α|‖h‖‖η(h)‖+ |β|‖k‖‖µ(k)‖‖(h,k)‖ ≤ |α|‖η(h)‖+ |β|‖µ(k)‖.

10. Part (a) follows from Exercise 8.5.14. For (b), set g(t) = ‖f(t) − v‖2.Then g′(t) = 2(f(t)− v) · f ′(t), and since g(t0) is the minimum value ofg, g′(t0) = 0.

Section 9.31. g′

(ϕ(x)ψ(y)

)(ϕ′(x)ψ(y), ϕ(x)ψ′(y)

).

3. gx(a · x, b · x)a+ gy

(a · x, b · x)b.

7. Tf ′(x).

10. (a) Let g(t) = f(a+ tu). By definition, Duf(a) = g′(0). On the otherhand, by the chain rule, g′(t) = u · ∇f(a+ tu). Setting t = 0 yields (a).

(c) limt→0

f(tu)− f(0)t

= limt→0

ab2

a2 + b4t2exists for all u = (a, b). f is not

continuous at (0, 0), since f → 0 along y = 0 but f = 1/2 along y =√x,

x > 0.


12. Let F (x) =∫ baf(t, x) dt. By the mean value theorem,

F (x+ h)− F (x)h

=∫ b

a

fx(t, x+ rh) dt, for some r = r(t, x, h) ∈ (0, 1).

Since fx is uniformly continuous, fx(t, x+ rh)→ fx(t, x) uniformly in ton [a, b] as h→ 0. Therefore, F ′(x) =

∫ bafx(t, x) dt.

15. Let ϕ(t) = t−pf(tx). By the product rule and the chain rule,

ϕ′(t) = −ptp+1 f(tx) + 1

tp∇f(tx) · x.

If f is homogeneous of degree p, then ϕ is a constant function, hence

p

tp+1 f(tx) = 1tp∇f(tx) · x.

Setting t = 1 produces the desired identity. On the other hand, if theidentity holds, then tx ·∇f(tx) = pf(tx) for all t and x, hence ϕ′(t) = 0.Therefore, ϕ(t) = ϕ(1), which shows that f is homogeneous of degree p.

17. Fix y ∈ C and define g on U by g(x) = f(x) − f(y) − dfy(x). Theng(x) − g(y) = f(x) − f(y) − dfy(x − y) and dgz = dfz − dfy (9.1.7),hence the result follows from 9.3.6 applied to g.

Section 9.41. (a) {(x, y) : x 6= y}. (b) {(x, y) : x+ y 6= (2n+ 1)π/2, n ∈ Z}.

(e) {(x, y) : xy 6= 0}. (f) {(x, y) : x, y > 0, y 6= x}.(i) {(x, y) : y 6= ±x}. (j) {(x, y, z) : xyz 6= 0}.

2. (i) x = 12(u+√u2 − 4v

), y = 1

2(u−√u2 + 4v

).

(v) x = 1√2

(u−√u2 − 4v2

)1/2, y = 1√2

(u+√u2 − 4v2

)1/2.4. Set u = x(x2 + y2)−1 and v = y(x2 + y2)−1. Square and add. Jf = −1.

Section 9.51. Let f(x, y) = x+ y2 + exy − 1. Then fx(0, 0) = 1 and fy(0, 0) = 0, so the

implicit function theorem guarantees a local solution x = x(y) but saysnothing about a solution y = y(x).

5. Let

F (x, y, z) = sin(x+ z) + ln(y + z)−√

2/2 andG(x, y, z) = exz + sin(πy + z)− 1.


Then, at (π/4, 1, 0), F = G = 0 and

∂(F,G)∂(x, y)

∂(F,G)∂(y, z)

∂(F,G)∂(x, z) 6= 0.

8. Let F = x−y+z+u2−2, G = −x+ 2z+u3−2, H = −y+ 3z+u4−3.Then, at (1, 1, 1, 1), F = G = H = 0 and

∂(F,G,H)∂(x, y, u)

∂(F,G,H)∂(y, z, u)

∂(F,G,H)∂(x, z, u) 6= 0.

9. (b) Let a := fx(0, 0) and b := fy(0, 0). The condition is b(a+ 1) 6= 0. The

derivative is−fx(x, y)fx

(f(x, y), y

)fy(x, y)fx

(f(x, y), y

)+ fy

(f(x, y), y

) .11. (a) The condition is a(a3−ab2−b3) 6= 0 where a := fx(0, 0), b := fy(0, 0).

13. f ′(1) + g′(1) + h′(1) 6= 0.

15. Let y = F (x1, . . . , xn). If x1 is a function of x2, . . . , xn, then, assumingthe necessary differentiability,

0 = ∂y

∂xn= Fx1

∂x1

∂xn+ Fxn ,

hence ∂x1

∂xn= −Fxn

Fx1

. In this manner we obtain

∂x2

∂x1

∂x3

∂x2. . .

∂xn∂xn−1

∂x1

∂xn= (−1)nFx1

Fx2

Fx2

Fx3

. . .Fxn−1

Fxn

FxnFx1

= (−1)n.

Section 9.61. (b) zrr = t2zxx + 2tzxy + zyy, ztt = r2zxx + 2rzxy + zyy.

(e) zrr = (e2r sin2 t)zxx + (e2r cos2 t)zyy + (2e2r sin t cos t)zxy+ (er sin t)zx + (er cos t)zx,

ztt = (e2r cos2 t)zxx + (e2r sin2 t)zyy − (2e2r sin t cos t)zxy− (er sin t)zx − (er cos t)zx.

(f) zr = axzx, zt = byzy, zrr = a2x2zxx + a2xzx, ztt = b2y2zyy + b2yzy.

4. Fx + zxFz = 0, hence

0 = Fxx + 2zxFxz + z2xxFzz + zxxFz = Fxx − 2Fx

FzFxz + F 2

x

F 2z

Fzz + zxxFz

and sozxx = − 1

FzFxx + 2Fx

F 2z

Fxz −F 2x

F 3z

Fzz.


5. (a) ut = −k2u, uxx = −u.(b) By logarithmic differentiation,

ut =(

x2

4k2t2− 1

2t

)u, uxx =

(x2

4k4t2− 2

4k2t

)u.

7. The second order partial derivatives are

wρρ = (sinφ cos θ)2wxx + (sinφ sin θ)2wyy + (cos θ)2wzz

+ (2 sinφ)[(sinφ sin θ cos θ)wxy + (cosφ cos θ)wxz + (cosφ sin θ)wyz

],

wθθ = (ρ sinφ)2[(sin2 θ)wxx + (cos2 θ)wyy − 2(sin θ cos θ)wxy]

− (ρ sinφ)[(cos θ)wx − (sin θ)wy],wφφ = ρ2[(cosφ cos θ)2wxx + (cosφ sin θ)2wyy + (sinφ)2wzz

]+ 2ρ2[(cos2 φ sin θ cos θ)wxy − (cosφ sinφ cos θ)wxz − (cosφ sinφ sin θ)wyz

]− ρ[(sinφ cos θ)wx + (sinφ sin θ)wy + (cosφ)wz

].

9. fxi = pxi‖x‖p−2g′ (‖x‖p), hence

fxixi = p[‖x‖p−2 + (p− 2)x2

i ‖x‖p−4] g′ (‖x‖p) + p2x2i ‖x‖2(p−2)g′′ (‖x‖p) ,

fxixj = pxixj

[(p− 2)‖x‖p−4g′ (‖x‖p) + p‖x‖2(p−2)g′′ (‖x‖p)

](i 6= j).

Section 9.7

1. (b) ∂3f

∂x3 (dx)3 + 3 ∂3f

∂x2∂y(dx)2 dy + 3 ∂3f

∂x∂y2 dx (dy)2 + ∂3f

∂y3 (dx)3.

2. (a) 2y2(3x+ y) (dx)2 + 12xy(x+ y) dx dy + 2x2(x+ 3y) (dy)2.

(b) 6x4y

(dx)2 + 4x3y2 dx dy + 2

x2y3 (dy)2.

(c) −y2 sin(xy) (dx)2 + 2[

cos(xy)− xy sin(xy)]dx dy − x2 sin(xy) (dy)2.

(d) 2f(x, y)[(2x2 + 1) (dx)2 + 4xy dx dy + (2y2 + 1) (dy)2.

(e) 1(x2 + y)2

[2(y − x2) (dx)2 − 4x dx dy − (dy)2].

3. zero.

5. (a) f + h1∂f

∂x1+ h2

∂f

∂x2+ h3

∂f

∂x3. The terms are evaluated at a.

8. By induction,

∂pf(x)∂xp1

1 ∂xp22 . . . ∂xpnn

= bp11 b

p22 . . . bpnn ϕ(p)(b · x),


hence

(x · ∇)pf(0) = ϕ(p)(0)∑(

p

p1, p2, . . . , pn

)(b1x1)p1(b2x2)p2 . . . (bnxn)pn

= ϕ(p)(0)(b · x)p,

where the second equality follows from the multinomial theorem.

11. (a) x+ y − 16 (x+ y)3. (d) x+ y − 1

3 (x+ y)3.

Section 9.82. x2+2y2+3z2−xy−yz−xz = 1

2[(x−y)2+(y−z)2+(x−z)2]+y2+2z2 ≥ 0.

3. (a) (0, 0): local min; (−4/3, 4/3): saddle.(d) (1, 1), (−1,−1): local max; (0, 0): saddle.(f) (2,−2): saddle.(i) (1/3, 1/3): local max; (0, 0), (0, 1), (1, 0): saddle.

4. (a) Use polar coordinates to optimize the resulting single variable functiong(θ) = cos θ+sin θ, g′(θ) = − sin θ+cos θ, 0 ≤ θ ≤ 2π. The critical pointsof g occur at values of θ that satisfy sin θ = cos θ = ±

√2/2. At these

values, g(θ) = ±√

2. Also, g(0) = g(2π) = 1. Therefore, the maximumand minimum values of f are ±

√2.

6. (b) The only critical point is (2/3,−1/3). On bd(D), f = x2 − x+ 2,−1 ≤ x ≤ 1, which has critical point (±1/

√2,±1/

√2). Checking the

values of f at these points and at (±1, 0) shows that the maximum of f isf(−1, 0) = f(−1/

√2,−1/

√2) = 2 and the minimum is f(2/3,−1/3) =

−1/3.(d) The only critical point is (0, 0). On bd(D), f = ± sin

(x√

1− x2),

−1 ≤ x ≤ 1, which has critical points (±1/√

2,±1/√

2). Checking thevalues of f at these points and at (±1, 0) shows that the extreme valuesof f are ± sin(1/2).

10. Sincelim

(x,y)→(0+,0+)f(x, y) = lim

(x,y)→(+∞,+∞)f(x, y) = +∞,

f has a minimum on (0, c) × (0, d) for suitable c, d > 0, and theminimum must occur at a critical point. The unique critical point is(a2/3b−1/3, a−1/3b2/3), which gives the minimum 3(ab)1/3.

11. Let f(m, b) =∑ni=1(yi − mxi − b

)2. Since not all x coordinates arethe same, m must be bounded. Since the data is bounded, b must be


bounded. Therefore, the minimum exists and must occur at the uniquecritical point (m, b) of f , which is determined by the system

n∑i=1

(yi −mxi − b)(−xi) =n∑i=1

(yi −mxi − b)(−1) = 0.

It follows that x · y −m‖x‖2 − nbx = mx− y + b = 0.

15. Let f(x, y) = ax2 + 2bxy + y2 and g(x, y) = x2 + y2 − c2. The equation∇f = λ∇g yields ax+ by = λx and bx+ y = λy. Multiplying the firstequation by x and the second by y and then adding yields

f(x, y) = λ(x2 + y2) = λc2.

Since the system (a − λ)x + by = bx + (1 − λ)y = 0 has a nontrivialsolution iff the determinant of the coefficient matrix is zero, we obtainλ2 − (a+ 1)λ+ a− b2 = 0. Solving for λ we see that the maximum andminimum values of f on the circle are

λc2 =[a+ 1±

√(a+ 1)2 + 4(a− b2)

](c2/2).

17. We minimize f(x, y) := (x − 1)2 + (y − 2)2 + (z − 3)2 subject to theconstraint g(x, y, z) := x2 + y2 − z = 0. From ∇f = λ∇g we have

x− 1 = λx, y − 2 = λy, z − 3 = −λ/2,

from which it follows that y = 2x and z = 3−(x−1)/2x. From z = x2+y2

we then have 3− (x− 1)/2x = 5x2, or 10x3 − 5x− 1 = 0.

19. We minimize f(x, y) := (x − 1)2 + (y − 2)2 + (z − 3)2 subject to theconstraint g(x, y, z) := z2 − x2 − y2 − 1 = 0. From ∇f = λ∇g we have

x = 11 + λ

, y = 21 + λ

x, z = 31− λ,

hence y = 2x and z = 3x/(2x − 1). Substituting into z2 − x2 − y2 = 1yields the desired polynomial.

22. Let f(x, y, z) = x+ 2y+ 3z, g1(x, y, z) = x+ y+ z − 1, and g2(x, y, z) =x2 + y2 + z2 − 1. From ∇f = λ1∇g1 + λ2∇g2,

1 = λ1 + 2λ2x, 2 = λ1 + 2λ2y, 3 = λ1 + 2λ2z.

Subtracting yields 1 = 2λ2(y − x) = 2λ2(z − y) so y − x = z − y orx− 2y + z = 0. Combining this with the constraint x+ y + z = 1 yieldsy = 1/3 and z = 2/3 − x. From the constraint x2 + y2 + z2 = 1 weobtain x2 − 2x/3− 2/9 = 0 so x = (1 ±

√3)/3. The maximum value of

f (≈ 3.154694) occurs when x = (1−√

3)/3, the minimum (≈ 0.845293)when x = (1 +

√3)/3.

24. We minimize f(x) :=∑ni=1(xi − bi)2 subject to the constraint g(x) :=

a · x− c = 0. From ∇f = λ∇g we have (xj − bj) = λaj/2, hence

(xj − bj)2 =λ2a2

j

4 = λ(ajxj − ajbj)2 , 1 ≤ j ≤ n.

Adding and using the constraint,

f(x) = λ2

4 ‖a‖2 and f(x) = λ

2 (a · x− a · b) = λ

2 (c− a · b)

Therefore, λ = 2(c− a · b)‖a‖−2, which gives the desired conclusion.

26. Let f(x) = ‖x−a‖2 and g(x) = ‖x‖2−1. The equation ∇f = λ∇g leadsto the system xj − aj = λxj , or xj(1− λ) = aj , j = 1, . . . , n. Therefore,

xj = aj/(1 − λ), so by the constraint ‖a‖2 =n∑j=1

a2j = (1− λ)2, hence

x = ±a/‖a‖. The distance to the sphere is then the smaller of∥∥± a‖a‖−1 − a∥∥ =

∣∣1± ‖a‖−1∣∣ ‖a‖,namely

∣∣1− ‖a‖−1∣∣ ‖a‖.

27. (a) Let f(x) = a · x and g(x) =∑ni=1 bi/xi − 1. From ∇f = λ∇g we

have ai = −λbi/x2i , hence

xi = µ√bi/ai and bix

−1i =

√aibi/µ, µ :=

√−λ.

The constraint implies that µ =∑ni=1√aibi. Since aixi = µ

√aibi, the

minimum is( n∑i=1

√aibi

)2.

That the value is indeed the minimum may be argued as follows. If xis any point satisfying the constraint, then

f(x) = a1x1 + a2x2 + · · ·+ an−1xn−1 + anbn

1−∑n−1i=1 bi/xi

,

where xi > bi. Thus |f | → +∞ as the variables x1, x2, . . ., xn−1 becomelarge or as

∑ni=1 bi/xi nears 1. Therefore, the minimum occurs in the

interior of a compact set, hence at the point obtained above.

30. Since cl(U) is compact, there exist points u, v ∈ cl(U) such that

f(u) ≤ f(x) ≤ f(v) for all x ∈ cl(U).

If f(u) = f(v), then f is a constant function and the result follows. Iff(u) < f(v), then one of the points, say u, must lie in U . By 9.8.2,f ′(u) = 0.

Section 10.11. Let µ be as in 10.1.5 with pk = 1/k, or let µ be as in ??, and takeAk = {k, k + 1, . . .}.

3. By the inclusion-exclusion principle and additivity,

µ(A ∪B) = µ(A) + µ(B)− µ(A ∩B) = µ(A), andµ(A) = µ(A \B) + µ(A ∩B) = µ(A \B).

5. Let B = A1 ∪ · · · ∪An. By 10.1.6(c),

µ(A1 ∪ · · · ∪An+1) = µ(B ∪An+1) = µ(B) + µ(An+1)− µ(B ∩An+1).

By the induction hypothesis,

µ(B) + µ(An+1)

=n+1∑i=1

µ(Ai)−n∑

1≤i<j≤nµ(Ai ∩Aj) + · · ·+ (−1)n−1µ(A1 ∩ · · · ∩An)

and

µ(B ∩An+1) =n∑i=1

µ(Ai ∩An+1)−n∑

1≤i<j≤nµ(Ai ∩Aj ∩An+1)

+ · · ·+ (−1)n−1µ(A1 ∩ · · · ∩An ∩An+1).

Subtracting produces the desired formula.

7. For any finite subset S of N with at least m members, define

ES =⋂j∈S

Ej ∩⋂j∈Sc

Ecj .

There are countably many such sets, they are pairwise disjoint, and∞⋃k=1

Ek ⊇⋃S

ES = A.

Moreover, Ek ∩ ES 6= ∅ iff k ∈ S, in which case Ek ⊇ ES . Therefore, byadditivity,

∞∑k=1

µ(Ek ∩A) =∞∑k=1

∑S

µ(Ek ∩ ES) =∑S

∞∑k=1

µ(ES ∩ Ek)

=∑S

∑k∈S

µ(ES ∩ Ek) ≥∑S

mµ(ES) = mµ(A).

Section 10.21. Clearly, λ∗(A) ≤ α(A). To show the reverse inequality, let {Ik} be a

sequence in I that covers A. Given ε > 0 choose Jk ∈ J containingIk such that |Jk| < |Ik| + ε/2n. Then {Jk} covers A and

∑k |Ik| ≥∑

k |Jk|−ε ≥ α(A)−ε. Since ε was arbitrary,∑k |Ik| ≥ α(A). Therefore,

λ∗(A) ≥ α(A).

4. Let {Ik} be any sequence of intervals that covers A. Then the intervalsx+ Ik cover x+A, hence

λ∗(x+A) ≤∞∑k=1|x+ Ik| =

∞∑k=1|Ik|.

Since {Ik} was arbitrary, λ∗(x + A) ≤ λ∗(A). Since A = (A + x) − x,the reverse inequality also holds.

Section 10.32. For any C ⊆ Rn,

C ∩ (−E) = −[(−C) ∩ E] and C ∩ (−E)c = −[(−C) ∩ Ec],

hence, using D := −C as a test set for E,

λ∗(C ∩ (−E)

)+ λ∗

(C ∩ (−E)c

)= λ∗(D ∩ E) + λ∗(D ∩ Ec) = λ∗(D).

By Exercise 10.2.5, λ∗(D) = λ∗(C). Therefore, −E ∈M and λ(−E) =λ(E).

4. Let D =⋃∞n=1(rk − ε/2n+1, rk + ε/2n+1), where {r1, r2, . . .} is an enu-

meration of Q.

Section 10.41. Let {r1, r2, . . . , } be an enumeration of the rationals in [0, 1] and setA =

⋃∞k=1(rj − ε/2k+1, rj + ε/2k+1) and C = [0, 1] ∩ Ac. Then C is

compact and λ(A) < ε, hence

λ(E \ C) = λ([0, 1] \ C) = λ([0, 1] ∩A) < ε.

4. Let F denote the collection of all Borel sets B such that rB is a Borelset. Then F is a σ-field containing all intervals, hence F = B. A similarargument works for the other sets.

Section 10.51. For t ∈ R,

{x : f(x) < t} =∞⋃k=1{x ∈ Ak : f(x) < t}

=∞⋃n=1{x : (f1Ak)(x) < t} ∩Ak ∈ F .

5. (a) {x : g(x) < h(x)} =⋃r∈Q {x : g(x) < r < h(x)} ∈ F .

(d) Since gh is measurable the assertion follows from

{x ∈ S : g(x)h(x) ≤ 1} ∩ {x ∈ S : g(x)h(x) ≥ 1} .

8. If 1E is measurable, then E = {x : 1E(x) > 0} ∈ F . Conversely, ifE ∈ F and t ∈ R, then

{x : 1E(x) ≤ t} =

∅ if t < 0,Ec if 0 ≤ t < 1, andS if t ≥ 1.

In each case, {x : 1E(x) ≤ t} ∈ F , hence 1E is measurable.

10. 1A∆B(x) = 1 iff 1A(x)− 1B(x) = 1 or 1B(x)− 1A(x) = 1 iff x ∈ A \Bor x ∈ B \A.

14. The range of f is {1/k : k ∈ N}. Since f(x) = 1/k iff 1/x− 1 < k ≤ 1/xiff 1/(k + 1) < x ≤ 1/k, the assertion follows from Exercise 7.

17. Let ε > 0 and choose N ∈ N such that 2N > 1/ε and f ≤ N on S. Letk > N , so 0 ≤ f ≤ k. Then, in the notation of the proof of 10.5.8,

fk =k2k∑j=1

j − 12k 1Ak,j , where

Ak,j ={x ∈ S : (j − 1)2−k ≤ f(x) < j2−k

}, j = 1, 2, . . . , k2k.

For any x ∈ S there exists j ∈ {1, 2, . . . , k2k} such that x ∈ Ak,j , hence0 ≤ f(x)− fk(x) = f(x)− (j − 1)2−k ≤ 1/2k < ε.

19. (a) That F is a σ-field follows from properties of preimages.(b) Since f−1(I1 × · · · × Im) =

⋂mj=1 f

−1j (Ij), F contains all intervals,

hence, by minimality, F = B(Rm).(c) If A ∈ B(R) and B := F−1(A), then B ∈ B(Rm), hence, from (b),g−1(A) = f−1(B) ∈ B(Rn).

Section 11.2

3. Since f−1({d2})

=∞⋃k=1

(d/10k, (d+ 1)/10k

)∩ I,

∫[0,1]

f dλ =9∑d=1

d2λ(f−1({d2}

))= 1

9

9∑d=1

d2.

5. (a) If∫E|g| dλ = 0, then g = 0 a.e., hence both integrals in (a) are

zero. Suppose∫E|g| dλ 6= 0. Since m

∫E|g| ≤

∫Ef |g| ≤ M

∫E|g| on E,

a :=( ∫

E|g| dλ

)−1 ∫Ef |g| dλ satisfies the requirement.

(b) For example, take E = (−1, 1) and f = g = 1(−1,0) − 1(0,1), so∫E

fg =∫E

[1(−1,0) + 1(0,1)

]= 2,

∫E

g = 0.

(c) Given ε > 0, choose δ > 0 such that −ε < f(t) − f(x) < ε for allt ∈ (x− δ, x+ δ). If y ∈ (x, x+ δ), then, by (a),∫

[a,y]f dλ−

∫[a,x]

f dλ− f(x)(y − x) =∫

[f(t)− f(x)]1[x,y](t) dt

= ay

∫1[x,y] dλ = ay(y − x),

where |ay| ≤ ε. Dividing by y − x proves the assertion for right-handlimits.

7. Use 11.2.17 and the Stone–Weierstrass theorem.

9. By the approximation property of integrals, given ε > 0, there exists asimple function g such that ||f − g||1 < ε and g = 0 outside an interval[a, b]. If k > b, then

∫[k,k+1] g dλ = 0, hence∣∣∣ ∫

[k,k+1]f dλ

∣∣∣ =∣∣∣ ∫

[k,k+1](f − g) dλ

∣∣∣ ≤ ∫[k,k+1]

|f − g| dλ < ε.

11. Let A = {x ∈ I : |f(x)| > ε}. Then∫I

f2 dλ ≥∫A

f2 dλ ≥ ε2λ(A).

15. By linearity of the integral,∫ 1

0 P (x)f(x) dx = 0 for every polynomialP (x) whose terms have even powers. Let g : [0, 1] be continuous. Since x2


separates points of [0, 1], by the Stone–Weierstrass theorem there existsa sequence of such polynomials Pn such that ||g − Pn||∞ → 0. Then∣∣∣ ∫ 1

0fg∣∣∣ =

∣∣∣ ∫ 1

0f(g − Pn)

∣∣∣ ≤ ||g − Pn||∞ ∫ 1

0|f | → 0,

so∫ 1

0 gf = 0. Since the continuous functions are dense in L1[0, 1], thereexists a sequence of continuous functions gn such that

∫ 10 |f − gn| → 0.

Then ∫ 1

0f2 =

∫ 1

0f(f − gn) ≤ ||f ||∞

∫ 1

0|f − gn| → 0.

Therefore,∫ 1

0 f2 = 0, hence f = 0 a.e.

Section 11.31. In each case the integrand fk → 0. Moreover, in (a) the functions

are bounded, and in (b) |fk(x)| ≤ 1/x2 on [1,+∞), and |fk(x)| ≤kx3/2/(1 + k2x2) ≤ 1/

√x on [0, 1]. Thus in each case the dominated

convergence theorem may be used.

2. (b) k ln(1 + f/k2) ≤ f/k.

(c) k sin(f/k

)=f sin

(f/k

)f/k

→ f and k sin(f/k) ≤ f .

In each case apply the dominated convergence theorem.

3. g(1 + f/k)ne−f ↑ g.

5. |f(x) sin(tx)/x| ≤ |tf(x)|, hence is integrable in x for each t. If tk → t,then {tk} is bounded hence |tkf(x)| ≤ C|f(x)|. Now apply the dominatedconvergence theorem.

8. Assume first that f ≥ 0. Since Rn =⋃∞k=1Ak, 11.3.4 implies that∫

f dλ =∞∑k=1

∫Ak

f dλ =∞∑k=1

akλ(Ak).

The general result follows by considering f±.

12. By 11.2.6, the series∑∞k=1 |fk(x)| converges a.e.

15. By Fatou’s lemma,∫

lim inf(fk − g) dλ ≤ lim inf∫

(fk − g) dλ.

19. Let µ+(E) =∫Ef+ dλ and µ−(E) =

∫Ef− dλ, E ∈ M(Rn). Then µ±

are measures that agree on compact intervals. Since every open intervalis an increasing union of compact intervals, by continuity from below,the measures µ± agree on open intervals. Since every open set is a

countable disjoint union of open intervals, they agree on open sets. By10.4.4, for any bounded E ∈ M there exists a decreasing sequence ofbounded open sets Uk ⊇ E such that λ(Uk)→ λ(E). By Exercise 11.2.8,µ±(Uk)→ µ±(E). Therefore, µ± agree on bounded Lebesgue measurablesets. In particular, if Ak = {x ∈ [−k, k] : f(x) > 0}, then

∫Akf dλ =

0. Since 1Akf ≥ 0, it follows that 1Akf = 0 a.e., hence λ(Ak) = 0.Since Ak ↑ {x ∈ R : f(x) > 0}, λ

({x ∈ R : f(x) > 0}

)= 0. A similar

argument shows that λ({x ∈ R : f(x) < 0}

)= 0.

Section 11.51. The given set is the union of the sets

Ej,k = [−k, k]× · · · ×j(

[−k, k] ∩Q)× · · · × [−k, k], k ∈ N, 1 ≤ j ≤ k,

over all k and j, and λ(Ej,k) = (2k)n−1λ(Q ∩ [−k, k]) = 0.

2. (a) By the Fubini–Tonelli theorem and a substitution,∫I

f dλ =∫ ∞

0· · ·∫ ∞

0x1 · · ·xne−||x||

2dx1 · · · dxn

=∫ ∞

0· · ·∫ ∞

0x2 · · ·xne−(x2

2+···x2n) dx2 · · · dxn

∫ ∞0x1e−x2

1 dx1

= 12

∫ ∞0· · ·∫ +∞

0x2 · · ·xne−(x2

2+···x2n) dx2 · · · dxn

= · · · = 12n .

5. By the Fubini–Tonelli theorem, the integral, denote it by I, equals∫0≤x≤x1≤···≤xm−1≤1

∫ 1

xm−1

x dxm dλ(x, x1, x2, . . . , xm−1).

Performing the inner integration, we have

I =∫

0≤x≤x1≤···≤xm−1≤1(1− xm−1)x dλ(x, x1, x2, . . . , xm−1).

Integrating with respect to xm−1,

I = 12

∫0≤x≤x1≤···≤xm−2≤1

(1− xm−2)2x dλ(x, x1, x2, . . . , xm−2).

Continuing we obtain

I = 1m!

∫ 1

0(1− x)mx dx = 1

(m+ 2)! ,

the last equality by 5.3.4.

7. The function

F (t, x) := t−pf(t)1[x1/p,1](t) = t−pf(t)1(0,tp](x)

is Borel measurable on (0, 1)× (0, 1) and is integrable since∫(0,1)

∫(0,1)|F (t, x)| dλ(t, x) =

∫(0,1)

t−p|f(t)|∫

(0,1)1(0,tp](x) dx dt

=∫

(0,1)t−p|f(t)| tp dt

=∫

(0,1)|f | dλ < +∞.

Repeating the calculation with F (t, x) instead of |F (t, x)|,∫(0,1)

g(x) dx =∫

(0,1)

∫(0,1)

F (t, x) dλ(t, x) =∫

(0,1)f dλ.

9. (a) Let Ic(t) =∫ c

0e−xt sin x dx. Using the given identity, we have∫ c

0

sin xx

dx =∫ c

0

∫ ∞0

e−xt sin x dt dx =∫ ∞

0Ic(t) dt, (†)

the second equality by the Fubini–Tonelli theorem, valid since∫ c

0

∫ ∞0

e−xt| sin x| dt dx =∫ c

0

| sin x|x

dx < +∞.

Integrating by parts twice yields

Ic(t) = 11 + t2

[1− e−ct(cos c+ t sin c)

]so limc→+∞ Ic(t) = (1 + t2)−1. Moreover, if c ≥ 1 then

|Ic(t)| ≤1 + e−ct(1 + t)

1 + t2≤ 2

1 + t2.

Since the last function is integrable on [0,+∞), (†) and Lebesgue’sdominated convergence theorem imply that∫ ∞

0

sin xx

dx = limc→+∞

∫ ∞0

Ic(t) dt =∫ ∞

0

11 + t2

dt = π

2 .

11. (a) By translation invariance and symmetry,

f ∗ g(x) =∫Rnf(x− y)g(y) dy =

∫Rnf(−y)g(x+ y) dy

=∫Rnf(y)g(x− y) dy = g ∗ f(x).


14. Let α =[G(b) − G(a)

]and β =

[F (b) − F (a)

]. By the Fubini–Tonelli

theorem,∫[a,b]

F (x)g(x) dx− αF (a) =∫

[a,b][F (x)− F (a)]g(x) dx

=∫

[a,b]

∫[a,b]

1[a,x](t)f(t)g(x) dt dx

=∫

[a,b]

∫[a,b]

1[t,b](x)f(t)g(x) dx dt,

and by a notation switch,∫[a,b]

G(x)f(x) dx− βG(a) =∫

[a,b][G(x)−G(a)]f(x) dx

=∫

[a,b]

∫[a,b]

1[a,x](t)g(t)f(x) dt dx

=∫

[a,b]

∫[a,b]

1[a,t](x)g(x)f(t) dx dt.

Adding, we have∫[a,b]

F (x)g(x) dx− αF (a) +∫

[a,b]G(x)f(x) dx− βG(a)

=∫

[a,b]

∫[a,b]

[1[a,t](x) + 1[t,b](x)

]f(t)g(x) dx dt

=∫

[a,b]

∫[a,b]

f(t)g(x) dx dt = αβ,

which implies the desired conclusion.

Section 11.63. Take f(x) = h(||x||) in (11.16).

5. Let x = (x1, . . . , xn). The hole H may be described by

H ={

(x, xn+1) : −√

1− ||x||2 ≤ xn+1 ≤√

1− ||x||2, ||x|| ≤ R}.

By Exercise 3,

λ(H) = 2∫||x||≤R

√1− ‖|x||2 dλ(x) = 2nαn

∫ R

0

√1− r2 dr

= nαn[R√

1−R2 − arcsin√

1−R2 + π/2].


6. Define a C∞ map ϕ : Rn × (0, 2π)→ Rn+1 by

ϕ(x1, . . . , xn, θ) = (x1, . . . , xn−1, xn cos θ, xn sin θ).

The condition xn > 0 implies that ϕ is one-to-one. Since Er = ϕ(E ×(0, 2π)

)and

Jϕ =

∣∣∣∣∣∣In−1 0 0

0 cos θ −xn sin θ0 sin θ xn cos θ

∣∣∣∣∣∣ = xn,

the change of variables theorem implies that

λn+1(Er) =∫ϕ(E×(0,2π)

) 1 dλn+1 =∫ 2π

0

∫E

xn dλn(x1, . . . , xn) dθ

= 2π∫E

xn dλn(x1, . . . , xn).

Section 12.11. If ψ = ϕ ◦α and φ = ψ ◦β, where α and β are C1 with C1 inverses, thenφ = ϕ ◦ γ, where γ = α ◦ β is C1 with C1 inverse β−1 ◦ α−1.

3. There are two unit tangent vectors at the point where t = ±1, namely,(±e1 + e2)/

√2.

4. (a) The trace is the parabola (x, 1 − 2x2). The tangent vector field is(cos t)e1 − 2(sin(2t))e2.

5. (b) x = a cos t, y = b sin t, z = ab sin(2t), 0 ≤ t ≤ 2π.

6. If ϕ(t) = x for infinitely many t, then, by the Bolzano–Weierstrasstheorem, there would exist a point t ∈ [a, b] and a sequence tn → t withtn 6= t for all n such that ϕ(tn) = x for all n. Then

ϕ′(t) = limn

ϕ(tn)− ϕ(t)tn − t

= limn

0− 0tn − t

= 0,

hence ϕ is not smooth.

Section 12.21. Only (b) and (c) are rectifiable.

2. (b) −√

5/4.

4. (b) x = 1 + 2 cos t, y = 2 + 3 sin t, 0 ≤ t ≤ 2π.

length =∫ 2π

0

√4 sin2 t+ 9 cos2 t dt.

5. (b) 6.

7. x′ = r′ cos θ−rθ′ sin θ, y′ = r′ sin θ+rθ′ cos θ, (x′)2+(y′)2 = (rθ′)2+(r′)2.

9. (a) W = −∫ b

a

(∇P

(ϕ(t)

))· α′(t) dt =

∫ a

b

d

dt(P ◦ ϕ)(t) dt.

Section 12.31. (a) If T (e1, e2, e3) = (e1 + e2, e2 + e3, e3 + e1), then

[T ] =

1 0 11 1 00 1 1

,which has positive determinant. Therefore, the sign of the frame ispositive.

5. Take ψ(φ) = (a cosφ, b+ a sinφ), 0 < φ < 2π. By 12.3.9,

~Nϕ(ϕ(φ, θ)

)= a(cosφ, sinφ cos θ, sinφ sin θ).

Setting x = a cosφ, y = (b+ a sinφ) cos θ and z = (b+ a sinφ) sin θ, wehave

a sinφ =√y2 + z2 − b, cos θ = y√

y2 + z2, sin θ = z√

y2 + z2,

hence

~Nϕ(x, y, z) = xe1 + y(√y2 + z2 − b)√y2 + z2

e2 + z(√y2 + z2 − b)√y2 + z2

e3.

6. Let (x, y, z) = ϕ(t, θ) and use ∂ϕ⊥ = (xt, yt, zt)× (xθ, yθ, zθ).(a) ∂ϕ⊥(t, θ) = t

(− cos θ, sin θ, 1

), so

~Nϕ(t cos θ, t sin θ, t) = 1√2

(− cos θ,− sin θ, 1).

Therefore, ~Nϕ(x, y, z) = 1√2

(−xz,−yz, 1).

(f) Let f(s) = a + s(b − a). Then ϕ(t, s) = (f(s) cos t, f(s) sin t, s) and∂ϕ⊥(t, s) = f(s)(cos t, sin t, a− b). Therefore,

~Nϕ(x, y, z) = 1√1 + (a− b)2

(cos t, sin t, a− b) ,

where x = f(s) cos t, y = f(s) sin t, and z = s, so

~Nϕ(x, y, z) = 1√1 + (a− b)2

(x

f(z) ,y

f(z) , a− b).

7. If v ∈ V and u = (v, s), then


)= det

(ψ′(v)tψ′(v)

)6= 0,

hence ϕ is a parameterized (n − 2)-surface. Moreover, since the deter-

minant ∂(ϕ1, . . . , ϕn−1)∂(u1, . . . , un−1) has a column of zeros, setting u = (v, s), we

have

∂ϕ⊥(u) =n∑i=1

(−1)i+n ∂(ϕ1, . . . , ϕi, . . . , ϕn)∂(u1, . . . , un−1) (v, un−1)ei

=n−1∑i=1

(−1)i+n ∂(ψ1, . . . , ψi, . . . , ψn−1)∂(u1, . . . , un−1) (v)ei

=(∂ϕ⊥(v), 0

),

proving (b). Part (c) follows from (b) and 12.3.6.

Section 12.43. The transition mapping is ϕ−1 ◦ ϕ1(x) = 1

2x, as may be seen from

x = ϕ−11 (y) = 2(1− yn)−1(y1, . . . , yn−1), yn < 1,

henceϕ1(x) = (4 + ‖x‖2)−1(4x1, . . . , 4xn−1, ‖x‖2 − 4

).

5. For (y1, y2, y2) on the cone,

(x1, x2) = ϕ−1(y1, y2, y2) = 11− y3

(y1, y2), 0 < y3 < 1,

hence ϕ(x1, x2) = ((1− y3)x1, (1− y3)x2, y3), where

y3 = α

1 + αα =

(x1

a1

)2+(x2

a2

)2.

7. (a) {v : v1 + 2v2 + 3v3 = 0}.

9. The transpose of Tv is

11− yn

1 0 · · · 0 y1

1− yn

0 1 · · · 0 y2

1− yn...

......

...0 0 · · · 1 yn−1

1− yn

v1v2...vn

= 11− yn

v1 + y1vn1− yn

v2 + y2vn1− yn...

vn + yn−1vn1− yn

.


For part (a), use the identities

n−1∑j=1

y2j = 1− y2

n andn−1∑j=1

vjyj = −vnyn,

a consequence of ‖y‖ = 1 and y · v = 0.

11. (a) The conditions are

F (x) = 0 and G(x,v) := v1∂1F (x) + v2∂2F (x) + v3∂3F (x) = 0.

The Jacobian matrix of (F,G) (suppressing x and v) is[∂1F ∂2F ∂3F 0 0 0∂1G ∂2G ∂3G ∂1F ∂2F ∂3F

],

which has rank 2 since for each x, ∂iF (x) 6= 0 for some i.

Section 13.13. Fix ai for i ≥ 3 and set B(a, b) := M(a, b,a3, . . . ,am). Then

B(a+ b,a+ b) = B(a,a) +B(b,a) +B(a, b) +B(b, b),

hence B(b,a) +B(a, b) = 0. This shows that M changes sign if the firsttwo arguments are interchanged. The other pairs are treated similarly.

7. (a) (f1g2 − g1f2) dx1,2 + (f1g3 − g1f3) dx1,3 + (f2g3 − g2f3) dx2,3.

8. (b) −dx1,2,3.

9. (a) (−1)k(k+1)/2 dx1 ∧ · · · ∧ dx2k,

11. (b) (− dx1 + 2 dx2) ∧ (− dx2 + 2 dx3) ∧ (2 dx1 + dx3) = 9 dx1,2,3.

13. (a)n∑j=1

(n∑i=1

∂gj∂xi

dxi

)∧ dxj =

n∑j=1

f ′(xj) dxj ∧ dxj = 0,

15. By 13.1.16, d(ω ∧ ν) = (dω) ∧ ν + (−1)pω ∧ (dν) = η ∧ ν.

17. By 13.1.19(d),

[ϕ∗(ψ∗ω)](a1, . . . ,am) = (ψ∗ω)ϕ(u)(dϕu(a1), . . . , dϕu(am))= ωψ(ϕ(u))

((dψ)ϕ(u)(dϕu(a1)), . . . , (dψ)ϕ(u)(dϕu(am))

)= ωψ◦ϕ(u)

(d(ψ ◦ ϕ)u(a1), . . . , d(ψ ◦ ϕ)u(am)

),

the last equality by the chain rule.


19. By Exercise 9.3.15, x · ∇fj(x) = kfj(x). Since dω = 0, ∂1f2 = ∂2f1,∂3f1 = ∂1f3 and ∂2f3 = ∂3f2 (13.1.15). Therefore,

∂1f = 1k + 1 (x1∂1f1 + x2∂1f2 + x3∂1f3 + f1)

= 1k + 1 (x1∂1f1 + x2∂2f1 + x3∂3f1 + f1)

= f1.

Similarly, ∂2f = f2 and ∂3f = f3.

Section 13.21. (b) Since ∂ϕ⊥(t, θ) = (sin θ,− cos θ, t), ‖∂ϕ⊥(t, θ)‖2 = 1 + t2, hence

area(ϕ) = 2π∫ 1

0

√1 + t2 dt = 1

2[√

2 + ln(1 +√

2)].

4. For the case m = 2:

ϕ′(θ1, θ2) =

−r1 sin θ1 0r1 cos θ1 0

0 −r2 sin θ20 r2 cos θ2

,hence det

[ϕ′(θ1, θ2)tϕ′(θ1, θ2)

]= (r1r2)2. Therefore,

area(ϕ) =∫ 2π

0

∫ 2π

0r1r2 dθ1 dθ2 = (2πr1)(2πr2).

6. In the notation of Example 11.5.5, the surface is the graph of

g(x1, . . . xn) := 1−n∑j=1

xj , (x1, . . . , xn) ∈ S(1, n).

Therefore, the surface area is∫S(1,n)

√1 + ‖∇g‖2 dλn =

√1 + nλn

(S(1, n)

)=√n+ 1n! .

7. Let un−1 = s. Since

ϕ(u1, . . . , un−1) =(ψ1(u1, . . . , un−2), . . . , ψn−1(u1, . . . , un−2), un−1

),

∂(ϕ1, . . . , ϕi, . . . , ϕn)∂(u1, . . . , un−1) =

∂(ψ1, . . . , ψi, . . . , ψn−1)

∂(u1, . . . , un−2) if i ≤ n− 1

0 if i = n.

Therefore, det(ϕ′(u)tϕ′(u)

)= det

(ψ′(u)tψ′(u)

), so

area(ϕ) =∫ h

0

∫U

√det(ψ′(u)tψ′(u)

)du = h · area(ψ).

8. From ϕ(t, s) = (ψ1(t), ψ2(t), s), we have∂(ϕ2, ϕ3)∂(t, s) =

∣∣∣∣ψ′2(t) 00 1

∣∣∣∣ , ∂(ϕ1, ϕ3)∂(t, s) =

∣∣∣∣ψ′1(t) 00 1

∣∣∣∣ , ∂(ϕ1, ϕ2)∂(t, s) =

∣∣∣∣ψ′1(t) 0ψ′2(t) 0

∣∣∣∣ .11. By 12.3.7(b), ∂ϕ⊥(t, θ) = ψ2(t)

(ψ′2(t),−ψ′1(t) cos θ,−ψ′1(t) sin θ

), hence

‖∂ϕ⊥(t, θ)‖ = ψ2(t)‖ψ′(t)‖.For the torus, take ψ(t) = (a cos t, b+ a sin t), 0 < t < 2π. Then

area(ϕ) = 2π∫ 2π

0ψ2(t)‖ψ′(t)‖ dt = 2πa

∫ 2π

0(b+ a sin t) dt = 4π2ab.

For the cone, take ψ(t) = (t, rt/h), 0 ≤ t ≤ h, so

area(ϕ) = 2π∫ h

0(rt/h)

√(r/h)2 + 1 dt = πr

√r2 + h2.

13. (a) Take g(t) = t, 0 < t < 1, so (x1, x2, x3) := ϕ(t) =(t, t cos θ, t sin θ

)and ∫

ϕ

ω =∫ 1

0

∫ 2π

0t[(f1 ◦ ϕ) + (f2 ◦ ϕ) cos θ − (f3 ◦ ϕ) sin θ

]dθ dt.

Since f1 = f2 = 0 and f3 = x1x3 = t2 sin θ,∫ϕ

ω = −∫ 1

0

∫ 2π

0t3 sin2 θ dθ dt = −π4 .

15. (b) A local parametrization of S is(x1, x2, x3) := ϕ(t, θ) =

(ψ1(t), ψ2(t) cos θ, ψ2(t) sin θ

), 0 < θ, t < 2π,

whereψ1(t) = a cos t and ψ2(t) = b+ a sin t.

Therefore, by Exercise 12,∫ϕ


)= a

∫ 2π

0

∫ 2π

0(f1 ◦ ϕ)(b+ a sin t) cos t dθ dt

− a∫ 2π

0

∫ 2π

0(f2 ◦ ϕ)(b+ a sin t) sin t cos θ dθ dt

− a∫ 2π

0

∫ 2π

0(f3 ◦ ϕ)(b+ a sin t) sin t sin θ dθ dt.


Since f2 = f3 = 0 and f1 = x1 = a cos t,∫ϕ

ω = a2∫ 2π

0

∫ 2π

0(b+ a sin t) cos2 t dθ dt = 2a2bπ2.

Section 13.44. For example, let fn and g be Borel measurable and integrable on S with|fn| ≤ g for all n. If fn → f on S, then

∫Sfn dS →

∫Sfn dS. This follows

directly from the dominated convergence theorem applied to fn ◦ ϕa.

7. This follows from Exercise 6 and Hu1/n ∪H

`1/n ↑ S.

Section 13.51. The parameterizations are

• S : (cos t, sin t, z),• bottom boundary: (cos t, sin t, 0),• top boundary: (cos t,− sin t, 1),

where, 0 ≤ t ≤ 2π and 0 ≤ z ≤ 1. The left side of Stokes’s formula thenreduces to

−∫ 2π

0[f(cos t, sin t, 0) + f(cos t,− sin t, 1)] sin t dt

+∫ 2π

0[g(cos t, sin t, 0)− g(cos t,− sin t, 1)] cos t dt (†)

and the right side to∫ 1

0

∫ 2π

0[(hy − gz)(cos t, sin t, z) cos t+ (fz − hx)(cos t, sin t, z) sin t] dt dz

=∫ 2π

0

∫ 1

0[−gz(cos t, sin t, z) cos t+ fz(cos t, sin t, z) cos t] dz dt,

where we have used

hy(cos t, sin t, z) cos t− hx(cos t, sin t, z) sin t = d

dth(cos t, sin t, z).

Make the substitution t = 2π − s in (†) to complete the argument.

3. (a) Both sides equal (a) 2π(R− r).

4. (d) 0.

5. Use the parametrization x = a cos2m+1 t, y = b sin2m+1 t, 0 ≤ t ≤ 2π, tofirst obtain

12

∫ϕ

(x dy − y dx) = ab(m+ 1

2) ∫ 2π

0cos2m t sin2m t dt.

Then use sin(2t) = 2 sin t cos t and 5.3.4.

9. Apply the divergence theorem using d(fg) = f dg+ g df and div d(fg) =f∇2g + 2∇f ·∇g + g∇2f .

12. In (a), the induced orientation of C from S1 is the opposite of the inducedorientation from S2. By Stoke’s theorem,∫

S1

curl ~F · ~n dS =∫C

~F · ~dr = −∫S2

curl ~F · ~n dS,

hence ∫S

curl ~F · ~n dS =∫S1

curl ~F · ~n dS +∫S2

curl ~F · ~n dS = 0.

In (b), the induced orientations of C from S1 and S2 are the same, so∫S1

curl ~F · ~n dS =∫C

~F · ~dr =∫S2

curl ~F · ~n dS.

14. Let F (x) = x, x ∈ Rn. On S := Sr(0), ~n = x/r, hence∫S

~F · ~n dS =∫S

r dS = r · area(S).

Since div ~F = n,∫‖x‖<r

div ~F dx =∫‖x‖<r

ndx = nrnαn.

Bibliography

[1] Apostol, Thom, Mathematical Analysis, 2nd ed. Addison-Wesley, 2000.

[2] Bloch, Ethan, Proofs and Fundamentals, Birkhaüser, 2000.

[3] Burckill, J.C. and H. Burkill, A Second Course in Mathematical Analysis,Cambridge Press, 1970.

[4] Dudley, Richard, Real Analysis and Probability, Cambridge UniversityPress, 2002.

[5] Fleming, Wendell, Functions of Several Variables, 2nd ed. Springer Verlag,1977.

[6] Mendelson, Elliot, Number Systems and the Foundations of Analysis,Dover, 2008.

[7] Rudin, Walter, Principles of Mathematical Analysis, 3rd ed. McGraw-Hill,1976.

[8] Sibley, Thomas, Foundations of Mathematics, John Wiley & Sons 2009.

[9] Strang, Gilbert, Linear Algebra and Its Application, 4th ed. ThomsonBrooks/Cole, 2006.

[10] Wilder, Raymond, Introduction to the Foundations of Mathematics, 2nded. John Wiley and Sons, 1965.

581

A Course in Real Analysis provides a rigorous treatment of the foundations of differ-ential and integral calculus at the advanced undergraduate level.

The first part of the text presents the calculus of functions of one variable. This part covers traditional topics, such as sequences, continuity, differentiability, Riemann inte-grability, numerical series, and the convergence of sequences and series of functions. It also includes optional sections on Stirling’s formula, functions of bounded variation, Riemann–Stieltjes integration, and other topics.

The second part focuses on functions of several variables. It introduces the topological ideas needed (such as compact and connected sets) to describe analytical properties of multivariable functions. This part also discusses differentiability and integrability of multivariable functions and develops the theory of differential forms on surfaces in n.

The third part consists of appendices on set theory and linear algebra as well as solu-tions to some of the exercises.

Features• Provides a detailed axiomatic account of the real number system• Develops the Lebesgue integral on n from the beginning• Gives an in-depth description of the algebra and calculus of differential forms on

surfaces in n

• Offers an easy transition to the more advanced setting of differentiable manifolds by covering proofs of Stokes’s theorem and the divergence theorem at the concrete level of compact surfaces in n

• Summarizes relevant results from elementary set theory and linear algebra • Contains over 90 figures that illustrate the essential ideas behind a concept or

proof• Includes more than 1,600 exercises throughout the text, with selected solutions

in an appendix

With clear proofs, detailed examples, and numerous exercises, this book gives a thor-ough treatment of the subject. It progresses from single variable to multivariable func-tions, providing a logical development of material that will prepare readers for more advanced analysis-based studies.

K22153

w w w . c r c p r e s s . c o m

A COURSE IN

REAL ANALYSISA COURSE IN

REAL ANALYSIS

HUGO D. JUNGHENN

JUNGHENN

• Access online or download to your smartphone, tablet or PC/Mac• Search the full text of this and other titles you own• Make and share notes and highlights• Copy and paste text and figures for use in your own documents• Customize your view by changing font size and layout

WITH VITALSOURCE®

EBOOK

Mathematics

mathematics a real analysis · a course in real analysis provides a rigorous treatment of the...

Documents