
Texts in Applied Mathematics 67

Hansjörg Kielhöfer

Calculus of Variations

An Introduction to the One-Dimensional Theory with Examples and Exercises


Texts in Applied Mathematics

Volume 67

Editors-in-chief:

S. S. Antman, University of Maryland, College Park, USA
L. Greengard, New York University, New York City, USA
P. J. Holmes, Princeton University, Princeton, USA

Series Editors:

J. Bell, Lawrence Berkeley National Lab, Berkeley, USA
J. Keller, Stanford University, Stanford, USA
R. Kohn, New York University, New York City, USA
P. Newton, University of Southern California, Los Angeles, USA
C. Peskin, New York University, New York City, USA
R. Pego, Carnegie Mellon University, Pittsburgh, USA
L. Ryzhik, Stanford University, Stanford, USA
A. Singer, Princeton University, Princeton, USA
A. Stevens, Universität Münster, Münster, Germany
A. Stuart, University of Warwick, Coventry, UK
T. Witelski, Duke University, Durham, USA
S. Wright, University of Wisconsin-Madison, Madison, USA


More information about this series at http://www.springer.com/series/1214




Hansjörg Kielhöfer
Rimsting, Bayern
Germany

ISSN 0939-2475
ISSN 2196-9949 (electronic)
Texts in Applied Mathematics
ISBN 978-3-319-71122-5
ISBN 978-3-319-71123-2 (eBook)
https://doi.org/10.1007/978-3-319-71123-2

Library of Congress Control Number: 2017958602

Mathematics Subject Classification (2010): 49-01, 49J05

© Springer International Publishing AG 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland


Preface

This book is the translation of my book “Variationsrechnung,” published by Vieweg + Teubner and Springer Fachmedien, Wiesbaden, Germany, in 2010. The German version is based on lectures that I gave many times at the University of Augsburg, Germany. The audience consisted of students of mathematics, and also of physics, having a solid background in calculus and linear algebra.

My goal is to offer students a fascinating field of mathematics, which emerged historically from concrete questions in geometry and physics. In order to keep the prerequisites as low as possible I confine myself to one independent variable, which is commonly called “one-dimensional calculus of variations.” Some advanced mathematical tools, possibly not familiar to the reader, are given along with proofs in the Appendix or can be found in introductory textbooks on analysis given in the bibliography. Accordingly, this book is a textbook for an introductory course on the calculus of variations appropriate for students with no preliminary knowledge of functional analysis. The exercises with solutions make the text also appropriate for self-study.

I present several famous historical problems, for example, Dido’s problem, the brachistochrone problem of Johann Bernoulli, the problem of the hanging chain, the problem of geodesics, etc. In order to find solutions, i.e., to determine minimizers, I start by establishing the Euler-Lagrange equation, first without and later with constraints, for which Lagrange’s multiplier rule becomes crucial. Minimizers, whose existence is not questioned at this point, necessarily satisfy the Euler-Lagrange equations.

Apart from these, I also discuss questions arising in phase transitions and microstructures. These problems are typically nonconvex, and the Weierstraß-Erdmann corner conditions on broken extremals become relevant.

In the history of the calculus of variations the existence of a minimizer was questioned only in the second half of the 19th century by Weierstraß. We present his famous counterexample to Dirichlet’s principle, which awakens the requirement for an existence theory. This leads to the “direct methods in the calculus of variations.” Here one independent variable has the advantage that the Sobolev spaces and the functional analytic tools can be given without great difficulties in the text or in the Appendix. Some emphasis is put on quadratic functionals, since their Euler-Lagrange equations are linear. The above-mentioned Dirichlet’s principle offers an elegant way to prove the existence of solutions of (linear) boundary value problems: simply obtain minimizers.

The text includes numerous figures to aid the reader. Exercises intended to deepen the coverage of topics are given at the end of each section. Solutions to the exercises are provided at the end of the book.

I thank Rita Moeller and Bernhard Gawron for preparing the manuscript, and I thank Ingo Blechschmidt for producing the figures. Finally, I thank my friend Tim Healey for many corrections and suggestions, which improved the book.

March 2017 Hansjörg Kielhöfer


Contents

Preface

Introduction

1 The Euler-Lagrange Equation
1.1 Function Spaces
1.2 The First Variation
1.3 The Fundamental Lemma of Calculus of Variations
1.4 The Euler-Lagrange Equation
1.5 Examples of Solutions of the Euler-Lagrange Equation
1.6 Minimal Surfaces of Revolution
1.7 Dido’s Problem
1.8 The Brachistochrone Problem of Johann Bernoulli
1.9 Natural Boundary Conditions
1.10 Functionals in Parametric Form
1.11 The Weierstraß-Erdmann Corner Conditions

2 Variational Problems with Constraints
2.1 Isoperimetric Constraints
2.2 Dido’s Problem as a Variational Problem with Isoperimetric Constraint
2.3 The Hanging Chain
2.4 The Weierstraß-Erdmann Corner Conditions under Isoperimetric Constraints
2.5 Holonomic Constraints
2.6 Geodesics
2.7 Nonholonomic Constraints
2.8 Transversality
2.9 Emmy Noether’s Theorem
2.10 The Two-Body Problem

3 Direct Methods in the Calculus of Variations
3.1 The Method
3.2 An Explicit Performance of the Direct Method in a Hilbert Space
3.3 Applications of the Direct Method

Appendix

Solutions of the Exercises

Bibliography

Index


Introduction

The calculus of variations was created as a tool to calculate minimizers (or maximizers) of certain mathematical quantities taking values in the set of real numbers. Historically these quantities have been derived from concrete problems in geometry and physics, some of which we study in this book.

In the 18th century mathematicians and physicists, such as Maupertuis, d’Alembert, Euler, and Lagrange, postulated variational principles stating that nature acts with minimal expenditure (“principle of least action”). The mathematical formulation of these principles led to the calculus of variations presented in this book in a modern language.

The notion “calculus of variations” goes back to the year 1744 and was introduced by Euler: For extremals of a function, the derivative vanishes, and in the 18th century that derivative was called the “first variation.”

The “mathematical quantities” investigated in this book are not ranges of mappings from $\mathbb{R}^n$ into $\mathbb{R}$, in general, i.e., they are not ranges of functions of finitely many variables. This is a main difference between the calculus of variations and optimization theory, which has the same goal, namely finding extremals. Our examples show that the domain of definition of the mappings is infinite-dimensional. Traditionally, such mappings are called “functionals,” a notion that reappears in the nomenclature “functional analysis.” As a matter of fact, the calculus of variations and functional analysis became “siblings” who have cross-fertilized each other up to the present time. An essential step is the extension of the n-dimensional linear space $\mathbb{R}^n$ to an infinite-dimensional function space. Since the methods of linear algebra are not so effective in infinite-dimensional spaces one needs an additional topological structure.

Let us study some historical examples.

1. What is the shortest distance between two points? This question can be answered only if an admitted connection is defined. It must be a continuous curve that has a well-defined length. Moreover it must be determined where that curve runs: in a plane, in a space, on a sphere, on a manifold? The variables of this problem are admitted curves that connect two fixed points, and the real quantities to be minimized are their lengths. Admitted curves cannot be determined by finitely many real variables, but they form a subset of an infinite-dimensional function space.
We assume that the admitted curves connect two points in a plane or in $\mathbb{R}^n$. For a definition of their length, we use the Euclidean distance between two points, which also defines the length of the connecting line segment:

\[
x = (x_1, x_2, \dots, x_n), \quad y = (y_1, \dots, y_n), \qquad
\|x - y\| = \Bigl( \sum_{i=1}^{n} (x_i - y_i)^2 \Bigr)^{1/2}, \quad n \in \mathbb{N}. \tag{1}
\]

An admitted curve k connecting two points A and B has the length L(k) which is defined as the supremum of the lengths of all inscribed polygonal chains, where, in turn, the length of a polygonal chain is the sum of the lengths of its line segments (Figure 1).

Fig. 1 On the Length of a Rectifiable Curve

A continuous curve having finite length is called rectifiable. Let k be a rectifiable curve connecting A and B and let P be a point on k different from A and B. Identifying points in the Euclidean space with their position vectors, respectively, the triangle inequality for the Euclidean distance yields

‖B−A‖ ≤ ‖P−A‖+‖B−P‖, (2)

where equality holds if and only if P is on the line segment connecting A and B.
According to the definition of L(k)

‖B−A‖ ≤ ‖P−A‖+‖B−P‖ ≤ L(k) (3)

and

‖B−A‖ < ‖P−A‖+‖B−P‖ ≤ L(k), (4)

if P is not on the line segment connecting A and B. This proves that all rectifiable curves connecting A and B are longer than ‖B−A‖ if they contain a point not on the line segment between A and B. Moreover, the shortest connection is given by the line segment having length ‖B−A‖.
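The definition of L(k) as a supremum over inscribed polygonal chains can be illustrated numerically. The following is a minimal sketch, not part of the original text: the test curve (a unit semicircle), the helper names `polygon_length` and `inscribed_length`, and the choice of equally spaced parameters are assumptions made only for illustration.

```python
import math

def polygon_length(points):
    """Sum of the Euclidean lengths of the segments joining consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def inscribed_length(curve, m):
    """Length of the polygonal chain inscribed in `curve` at m + 1 equally
    spaced parameter values in [0, 1]."""
    return polygon_length([curve(i / m) for i in range(m + 1)])

# Hypothetical test curve: the upper unit semicircle from A = (1, 0) to B = (-1, 0).
def semicircle(t):
    return (math.cos(math.pi * t), math.sin(math.pi * t))

A, B = semicircle(0.0), semicircle(1.0)
chord = math.dist(A, B)  # the Euclidean distance ||B - A||

# Nested partitions (m doubles), so each chain refines the previous one.
lengths = [inscribed_length(semicircle, m) for m in (1, 2, 4, 8, 1024)]

for m, length in zip((1, 2, 4, 8, 1024), lengths):
    print(m, length)
```

Under these assumptions the chain lengths never fall below ‖B−A‖, grow under refinement by the triangle inequality (2), and approach the length π of the semicircle.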

2. Heron’s shortest distance problem and the law of reflection. Consider two points A and B in the plane on one side of a straight line g. For which point P on g is ‖P−A‖+‖B−P‖ minimal? (Figure 2)

Fig. 2 Heron’s Problem

We reflect A with respect to the line g, and we obtain A′. Obviously ‖P−A‖ = ‖P−A′‖ and according to our example 1, ‖P−A′‖+‖B−P‖ is minimal if the three points A′, P, and B are on a line. The three angles in Figure 3 are equal

Fig. 3 The Law of Reflection

which proves the law of reflection: The reflected beam APB is shortest if the incidence angle equals the angle of reflection. This means also that on the path APB the light needs the shortest running time.
Fermat’s principle in geometric optics (Fermat 1601–1665) states that a light beam chooses the path on which it has minimal running time. This principle implies the law of refraction.
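Heron's reflection argument can be checked numerically. The sketch below is an illustration under stated assumptions: the line g is taken to be the x-axis, the points A = (0, 1) and B = (4, 3) are hypothetical, and a ternary search (a stand-in for any one-dimensional minimizer, valid here because the path length is convex in P) replaces the geometric argument.

```python
import math

def path_length(p, A, B):
    """Length ||P - A|| + ||B - P|| for the point P = (p, 0) on the line g (the x-axis)."""
    P = (p, 0.0)
    return math.dist(A, P) + math.dist(P, B)

def ternary_min(f, lo, hi, iters=200):
    """Minimize a unimodal function f on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Hypothetical data: both points lie above the x-axis.
A, B = (0.0, 1.0), (4.0, 3.0)
p_star = ternary_min(lambda p: path_length(p, A, B), A[0], B[0])

# Angles between the line g and the segments PA and PB.
angle_in = math.atan2(A[1], abs(p_star - A[0]))
angle_out = math.atan2(B[1], abs(B[0] - p_star))

# Reflected point A' = (0, -1): the minimal length equals ||B - A'||.
A_reflected = (A[0], -A[1])
print(p_star, angle_in, angle_out, math.dist(A_reflected, B))
```

For this data the line through A′ = (0, −1) and B meets the x-axis at p = 1, so the search should return a value close to 1 with both angles close to 45°.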



3. Snell’s law of refraction: (W. Snellius 1580–1626) Given two points A and B in a plane on opposite sides of a line g that divides the plane into two half planes in which a light beam has two different velocities $v_1$ and $v_2$. On which path APB does the light beam have the minimal running time?

Fig. 4 The Law of Refraction

Snell’s law of refraction reads:

\[
\frac{\sin\alpha_1}{\sin\alpha_2} = \frac{v_1}{v_2}, \tag{5}
\]

where the angles $\alpha_1$ and $\alpha_2$ are shown in Figure 4.
A different approach to the same problem is the following: A shoreline of a lake is represented by the line g, and a person at A observes a swimmer who threatens to drown at B and therefore needs help as quickly as possible. Since the person at A runs faster than he can swim ($v_1$ is bigger than $v_2$) he chooses a path as sketched in Figure 4. Determine the point P where he jumps into the lake (Exercise 0.2). Hint: The respective distances between A and P and between P and B are expressed as a function of P (in suitable coordinates). The velocities $v_1$ and $v_2$ determine the times $T_1$ and $T_2$ to cover the respective distances. Minimize $T_1 + T_2$ as a function of P by setting its derivative to zero.
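The hint can be followed numerically before attempting the proof in Exercise 0.2. The sketch below is only an illustration: the coordinates (g is the x-axis, A = (0, a) on land, B = (d, −b) in the water), the sample values, and the assumption that the angles of Figure 4 are measured against the normal to g are all choices made here, not taken from the text. The derivative of $T_1 + T_2$ is set to zero by bisection.

```python
import math

def travel_time(p, a, b, d, v1, v2):
    """T1 + T2 for the path A = (0, a) -> P = (p, 0) -> B = (d, -b)."""
    return math.hypot(p, a) / v1 + math.hypot(d - p, b) / v2

def travel_time_derivative(p, a, b, d, v1, v2):
    """Derivative of T1 + T2 with respect to p; it is increasing in p."""
    return p / (v1 * math.hypot(p, a)) - (d - p) / (v2 * math.hypot(d - p, b))

def bisect_root(g, lo, hi, iters=100):
    """Root of an increasing function g with g(lo) < 0 < g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data: running speed v1 is larger than swimming speed v2.
a, b, d, v1, v2 = 1.0, 1.0, 2.0, 3.0, 1.0
p_star = bisect_root(lambda p: travel_time_derivative(p, a, b, d, v1, v2), 0.0, d)

# Sines of the angles between each leg of the path and the normal to g.
sin_alpha1 = p_star / math.hypot(p_star, a)
sin_alpha2 = (d - p_star) / math.hypot(d - p_star, b)
print(p_star, sin_alpha1 / sin_alpha2, v1 / v2)
```

With these conventions the printed ratio of sines should agree with $v_1/v_2$, which is the content of (5).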

4. The isoperimetric problem of Queen Dido (9th century B.C.). When the Phoenician princess Dido had fled to North Africa, she asked the Berber king for a bit of land. She was offered as much land as could be encompassed by an oxhide. Dido cut the oxhide into fine strips so that she got a long enough rope to encircle an entire hill named Byrsa, the center of Carthage. What shape was the region that she encircled so that area was maximal?
Let k be a closed curve having prescribed length L(k) and let F(k) denote the area in its interior. For which k is F(k) maximal? We follow the arguments of J. Steiner (1796–1863): It is seen by reflection that the enclosed region is convex, cf. Figure 5. Choose points A and B on k that divide k into two arcs $k_1$ and $k_2$ of equal length. Then the regions $F_1$ and $F_2$ encircled by $k_1$, $k_2$, and the line segment g between A and B, respectively, should have the same size.

Fig. 5 Convexity of the Maximal Area

Fig. 6 Reduction to One Half

If $F_1$ is bigger than $F_2$, say, then replace $F_2$ by the reflection of $F_1$ with respect to the line g, cf. Figure 6. Thus the problem of maximal area is reduced to the following: Find the arc $k_1$ with prescribed length, whose endpoints A and B are on a straight line g, such that the area bounded by $k_1$ and g is maximal.
As seen before the region is convex. Choose a point P on $k_1$ and let α be the angle at P in the triangle APB, cf. Figure 7:

Fig. 7 On Steiner’s Solution of Dido’s Problem



Imagine a hinge in P such that the sides PA and PB can rotate increasing or decreasing the angle α. The areas $F_1'$ and $F_1''$ are fixed and rotate as well. The length of the arc $k_1$ does not change under rotation either, only the area of the triangle $F_\alpha$ with corners APB changes. Since the lengths of the sides PA and PB are fixed the area of the triangle $F_\alpha$ is maximal for $\alpha_0 = 90°$. Then $F_1 = F_1' + F_1'' + F_{\alpha_0}$ is maximal as well.

This proves that $F_1$ is maximal if for any point P on $k_1$ the triangle APB is a right-angled triangle. By the converse of Thales’ theorem the arc $k_1$ is a semicircle and therefore the closed curve k encircling a maximal area is a circle.
We return to Dido’s problem in Paragraphs 1.7 and 2.2.

5. The brachistochrone problem posed by Johann Bernoulli (1667–1748). In June 1696, he introduced the following problem in the “Acta Eruditorum” (a Journal for Science published in Leipzig, Germany): Given two points A and B in a vertical plane, what is the curve traced out by a point mass M acted on only by gravity, which starts at A and reaches B in the shortest time?
In January, 1697, Bernoulli published his solution that we discuss in Paragraph 1.8. It is commonly accepted that 1696 was the year of birth of modern calculus of variations (Figure 8).

Fig. 8 Admitted Curves for Bernoulli’s Problem

Not knowing so far Bernoulli’s arguments, we realize in view of the historical examples that the methods to solve a variational problem have been diverse. In the middle of the 18th century the time was ripe for a systematic approach. As a matter of fact, about 1755 Euler (1707–1783) and Lagrange (1736–1813) postulated independently a differential equation that extremals must fulfill. That differential equation is now called the Euler-Lagrange equation. In those days, the existence of extremals was presumed evident, and one was interested in necessary conditions for extremals: After Euler and Lagrange had given a necessary condition in terms of the first variation, Legendre (1752–1833) then gave one in terms of the second variation.

In 1860 Weierstraß (1815–1897) shook the faith in the hitherto existing methods by a counterexample that he published in 1869. It falsifies Dirichlet’s (1805–1859) arguments to solve the boundary value problem for Laplace’s (1749–1827) equation by a variational problem. Weierstraß’ counterexample shows that the existence of a minimizer is not evident, and it initiated the need for an existence theory which is now known as “direct methods in the calculus of variations.” These techniques developed hand in hand with the new field of “functional analysis,” and consequently understanding the former requires some knowledge of the latter. Accordingly the “direct methods” are less appropriate for beginners than the Euler-Lagrange calculus. Nonetheless, we delve into it in Chapter 3.

Before the Euler-Lagrange calculus can be applied a variational problem has to be given a mathematical formulation. In case of the brachistochrone problem, the running time of the point mass needs to be expressed in terms of the curve. This “mathematical modeling” requires both physical knowledge and intuition. Namely the domain of definition of the time functional has to be given mathematically. Too narrow a restriction of the class of admissible or competing curves might falsify the result. On the other hand, too large a class might exceed the mathematical possibilities.

In this book the mathematical modeling yields functionals of the form

\[
J(y) = \int_a^b F(x, y, y')\,dx \tag{6}
\]

or in parametric form

\[
J(x, y) = \int_{t_a}^{t_b} \Phi(x, y, \dot{x}, \dot{y})\,dt. \tag{7}
\]

For real functions $y = y(x)$ where $x \in [a,b] \subset \mathbb{R}$ this reads as

\[
J(y) = \int_a^b F(x, y(x), y'(x))\,dx \tag{8}
\]

or for plane curves $(x, y) = (x(t), y(t))$ where $t \in [t_a, t_b] \subset \mathbb{R}$

\[
J(x, y) = \int_{t_a}^{t_b} \Phi(x(t), y(t), \dot{x}(t), \dot{y}(t))\,dt. \tag{9}
\]

Following a tradition which goes back to Newton (1643–1727) the dot indicates the derivative with respect to t (which is not necessarily the real time). Obviously admissible functions or curves need to be differentiable. The functions F and Φ depending on three or four variables, respectively, are called Lagrange functions or Lagrangians. Generalizations to curves in space or in $\mathbb{R}^n$ are apparent. However, for the theory presented in this book the admissible functions or curves must depend only on one independent variable, and thus all integrals are taken over an interval. The corresponding theory is commonly called the “one-dimensional theory.”
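To make the notion of a functional of the form (8) concrete, the following sketch evaluates J(y) numerically for a given Lagrangian and a given function y. It is an illustration under assumptions: the arc-length Lagrangian $F(x, y, y') = \sqrt{1 + y'^2}$, the two sample functions, and the use of the composite Simpson rule are choices made here, not prescribed by the text.

```python
import math

def functional(F, y, dy, a, b, n=1000):
    """Approximate J(y) = integral from a to b of F(x, y(x), y'(x)) dx
    by the composite Simpson rule with n (even) subintervals."""
    step = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * F(x, y(x), dy(x))
    return total * step / 3

def arc_length_lagrangian(x, y, p):
    """Assumed example Lagrangian: F(x, y, p) = sqrt(1 + p^2), p standing for y'."""
    return math.sqrt(1.0 + p * p)

# Two admissible functions on [0, 1] with y(0) = 0 and y(1) = 1.
J_line = functional(arc_length_lagrangian, lambda x: x, lambda x: 1.0, 0.0, 1.0)
J_parabola = functional(arc_length_lagrangian, lambda x: x * x, lambda x: 2.0 * x, 0.0, 1.0)

print(J_line, J_parabola)
```

Each admissible function is mapped to one real number; for this Lagrangian the straight line yields the smaller value, in agreement with example 1.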

As mentioned before, this book focuses on the Euler-Lagrange calculus with an introduction to direct methods.

Exercises

0.1. Let $A, B \in \mathbb{R}^n$ be two points and $x(t) = (x_1(t), \dots, x_n(t))$, $t \in [t_a, t_b]$, a continuously differentiable curve connecting A and B, i.e., $x(t_a) = A$ and $x(t_b) = B$. Prove

\[
\|B - A\| \le \int_{t_a}^{t_b} \|\dot{x}(t)\|\,dt = \text{length of the curve}. \tag{10}
\]

Here $\|\dot{x}(t)\| = \bigl( \sum_{k=1}^{n} (\dot{x}_k(t))^2 \bigr)^{1/2}$ and (10) says that the length of all continuously differentiable curves connecting A and B is at least as big as ‖B−A‖ which is the length of the straight line segment $x(t) = A + t(B - A)$, $t \in [0,1]$, connecting A and B. This proves that the line segment is the shortest among all continuously differentiable connections.

Hint: Prove (10) with an inequality for integrals that follows from the triangle inequality for approximating Riemann sums.

0.2. Prove Snell’s refraction law (5).


Chapter 1
The Euler-Lagrange Equation

1.1 Function Spaces

In order to give the functionals

\[
J(y) = \int_a^b F(x, y, y')\,dx \tag{1.1.1}
\]

a domain of definition, we need to introduce suitable function spaces. First of all we require that the Lagrange function or Lagrangian,

F : [a,b]×R×R → R, is continuous. (1.1.2)

Here [a,b] = {x|a ≤ x ≤ b} is a compact interval in the real line.

Definition 1.1.1.
$C[a,b] = \{y \mid y : [a,b] \to \mathbb{R} \text{ is continuous}\}$,
$C^1[a,b] = \{y \mid y \in C[a,b],\ y \text{ is differentiable on } [a,b],\ y' \in C[a,b]\}$,

where in the boundary points the one-sided derivatives are taken. A function $y \in C^1[a,b]$ is called continuously differentiable on [a,b].

$C^{1,pw}[a,b] = \{y \mid y \in C[a,b],\ y \in C^1[x_{i-1}, x_i],\ i = 1, \dots, m\}$,
where $a = x_0 < x_1 < \dots < x_m = b$ is a partition of [a,b] depending on y. A function $y \in C^{1,pw}[a,b]$ is called piecewise continuously differentiable.

Obviously $C^1[a,b] \subset C^{1,pw}[a,b] \subset C[a,b]$ and all three spaces are infinite-dimensional vector spaces over $\mathbb{R}$, provided that addition of functions and scalar multiplication are defined in the natural way: $(y_1 + y_2)(x) = y_1(x) + y_2(x)$, $(\alpha y)(x) = \alpha y(x)$ for $\alpha \in \mathbb{R}$.

The graph of a typical function $y \in C^{1,pw}[a,b]$ looks like the graph sketched in Figure 1.1.

© Springer International Publishing AG 2018
H. Kielhöfer, Calculus of Variations, Texts in Applied Mathematics 67, https://doi.org/10.1007/978-3-319-71123-2_1



Fig. 1.1 The Graph of a Piecewise Continuously Differentiable Function

For $y \in C^{1,pw}[a,b]$, define

\[
J(y) = \int_a^b F(x, y, y')\,dx = \sum_{i=1}^{m} \int_{x_{i-1}}^{x_i} F(x, y(x), y'(x))\,dx. \tag{1.1.3}
\]

Then $J : C^{1,pw}[a,b] \to \mathbb{R}$ is a well-defined functional or a function of a function. The integrals in the sum (1.1.3) exist since in view of (1.1.2) the integrand is continuous on each interval $[x_{i-1}, x_i]$, $i = 1, \dots, m$.

The real vector spaces of Definition 1.1.1 are normed as follows:

Definition 1.1.2.
For $y \in C[a,b]$ let $\|y\|_0 = \|y\|_{0,[a,b]} := \max_{x \in [a,b]} |y(x)|$,
for $y \in C^1[a,b]$ let $\|y\|_1 = \|y\|_{1,[a,b]} := \|y\|_{0,[a,b]} + \|y'\|_{0,[a,b]}$, and
for $y \in C^{1,pw}[a,b]$ let $\|y\|_{1,pw} = \|y\|_{1,pw,[a,b]} := \|y\|_{0,[a,b]} + \max_{i \in \{1,\dots,m\}} \{\|y'\|_{0,[x_{i-1},x_i]}\}$.

For two functions $y_1, y_2 \in X = C[a,b]$, $C^1[a,b]$ or $C^{1,pw}[a,b]$, the expression $\|y_1 - y_2\|$ is the distance between $y_1$ and $y_2$ with respect to the norm in X.

Any partition of a function $y \in C^{1,pw}[a,b]$ can clearly be refined by adding finitely many arbitrary points without changing the properties of Definition 1.1.1 and of the norm $\|y\|_{1,pw}$ in Definition 1.1.2. For two functions $y_1, y_2 \in C^{1,pw}[a,b]$, their distance can be defined via the union of their respective partitions.
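The norms of Definition 1.1.2 can be approximated on a grid. The sketch below is an illustration under assumptions: the helper names `sup_norm` and `norm_1pw`, the sampling on equally spaced points (which only approximates the maximum in general), and the example functions |x| and x on [−1, 1] are chosen here and do not appear in the text. The derivative of a piecewise function is passed piece by piece, so that one-sided derivatives are used at the partition points.

```python
def sup_norm(f, a, b, n=2001):
    """Grid approximation of ||f||_0 on [a, b]: maximum of |f| over n equally spaced points."""
    return max(abs(f(a + (b - a) * i / (n - 1))) for i in range(n))

def norm_1pw(y, dy_pieces, partition, n=2001):
    """Grid approximation of ||y||_{1,pw} = ||y||_0 + max_i ||y'||_0 on [x_{i-1}, x_i].

    dy_pieces[i] is the derivative of y on the i-th subinterval of `partition`.
    """
    a, b = partition[0], partition[-1]
    deriv_part = max(
        sup_norm(dy, lo, hi, n)
        for dy, lo, hi in zip(dy_pieces, partition, partition[1:])
    )
    return sup_norm(y, a, b, n) + deriv_part

partition = [-1.0, 0.0, 1.0]

# y1(x) = |x| has a corner at 0, so it lies in C^{1,pw}[-1, 1] but not in C^1[-1, 1].
norm_abs = norm_1pw(abs, [lambda x: -1.0, lambda x: 1.0], partition)

# Distance between y1(x) = |x| and y2(x) = x, using the common partition.
dist_abs_id = norm_1pw(lambda x: abs(x) - x, [lambda x: -2.0, lambda x: 0.0], partition)

print(norm_abs, dist_abs_id)
```

For these two examples the maxima are attained at grid points, so the sampled values coincide with the exact norms 2 and 4.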

The norm ‖ ‖ on a real vector space X has the following properties:

Definition 1.1.3. A mapping ‖ ‖ : X → R is a norm provided

1. ‖y‖ ≥ 0 for all y ∈ X, ‖y‖ = 0 ⇔ y = 0,
2. ‖αy‖ = |α|‖y‖ for all α ∈ R, y ∈ X,
3. ‖y1 + y2‖ ≤ ‖y1‖ + ‖y2‖ for all y1, y2 ∈ X.

Inequality 3. is called the triangle inequality.

Verify the properties of a norm for all norms given in Definition 1.1.2.



In a normed vector space X , convergence and continuity are defined as follows:

Definition 1.1.4. We define for a sequence $(y_n)_{n \in \mathbb{N}} \subset X$:
$\lim_{n \to \infty} y_n = y_0$ ⇔ for any ε > 0 there exists an $n_0(\varepsilon)$ such that $\|y_n - y_0\| < \varepsilon$ for all $n \ge n_0(\varepsilon)$.
Let $J : D \to \mathbb{R}$ be a functional defined on a subset $D \subset X$.
J is continuous in $y_0 \in D$
⇔ for any ε > 0 there is a δ(ε) > 0, such that $|J(y) - J(y_0)| < \varepsilon$ for all $y \in D$ with $\|y - y_0\| < \delta(\varepsilon)$
⇔ for each sequence $(y_n)_{n \in \mathbb{N}} \subset D \subset X$ satisfying $\lim_{n \to \infty} y_n = y_0$, there holds $\lim_{n \to \infty} J(y_n) = J(y_0)$.
J is continuous on D if J is continuous in all $y_0 \in D$.

The last equivalence defining continuity via sequences is proved in the same way as for functions defined on subsets of $\mathbb{R}$ or $\mathbb{R}^n$.

The convergence in $X = C[a,b]$ or in $X = C^1[a,b]$ means uniform convergence of $y_n$ or of $y_n$ and $y_n'$, respectively, on [a,b]. That uniform convergence guarantees that any limit function belongs to the same space as the sequence. This is not necessarily the case for convergence in the space $C^{1,pw}[a,b]$, cf. Exercise 1.1.1. The normed spaces $C[a,b]$ and $C^1[a,b]$ are each “complete” due to the above-mentioned property defining a Banach space. The space $C^{1,pw}[a,b]$ is not a Banach space, but this fact does not play a role in Chapters 1 and 2.

Remark. The introduction of the space $C^{1,pw}[a,b]$ is necessary to describe “broken extremals.” Variational problems with a Lagrangian that is nonconvex as a function of y′ typically have broken extremals. We discuss such nonconvex variational problems in Paragraphs 1.5, Example 6, 1.11, and 2.4. They are not only of mathematical interest, since extremals can have only specific “corners,” but they also play an important role in the modeling of phase transitions, microstructures, and the decomposition of binary alloys.

Exercises

1.1.1. A sequence $(y_n)_{n \in \mathbb{N}} \subset X$ in a normed vector space X is called a Cauchy sequence if for any ε > 0 there is an $n_0(\varepsilon)$ such that $\|y_m - y_n\| < \varepsilon$ for all $m, n \ge n_0(\varepsilon)$. Let

\[
y(x) =
\begin{cases}
\dfrac{1}{k^2}(x - kx + 1) & \text{for } x \in \left[\dfrac{1}{k}, \dfrac{1}{k-1}\right],\\[2ex]
\dfrac{1}{k^2}(x + kx - 1) & \text{for } x \in \left[\dfrac{1}{k+1}, \dfrac{1}{k}\right],
\end{cases}
\]

for $k = 2n$, $n \in \mathbb{N}$, and $y(0) = 0$. Sketch y and verify that $y \in C[0,1]$, but $y \notin C^{1,pw}[0,1]$. Let

\[
y_n(x) =
\begin{cases}
y(x) & \text{for } x \in \left[\dfrac{1}{2n+1}, 1\right],\\[2ex]
0 & \text{for } x \in \left[0, \dfrac{1}{2n+1}\right],
\end{cases}
\]

for $n \in \mathbb{N}$. Show that $(y_n)_{n \in \mathbb{N}} \subset C^{1,pw}[0,1]$ is a Cauchy sequence with respect to the norm $\|\ \|_{1,pw,[0,1]}$ but that there is no $y_0 \in C^{1,pw}[0,1]$ such that $\lim_{n \to \infty} y_n = y_0$ in $C^{1,pw}[0,1]$.

Hint: $\lim_{n \to \infty} y_n = y$ in $C[0,1]$.

1.1.2. Prove that $J : C^{1,pw}[a,b] \to \mathbb{R}$ defined by

\[
J(y) = \int_a^b F(x, y, y')\,dx \quad \text{as in (1.1.3)}
\]

is continuous if (1.1.2) holds.

1.2 The First Variation

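Let $J : D \subset X \to \mathbb{R}$ be a functional defined on a subset D of a normed vector space X. We require that, for $y \in D$ and for some fixed $h \in X$, the vector $y + th$ stays within D for all $t \in (-\varepsilon, \varepsilon) \subset \mathbb{R}$. Then $g(t) = J(y + th)$ defines a function $g : (-\varepsilon, \varepsilon) \subset \mathbb{R} \to \mathbb{R}$.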

Definition 1.2.1. If

\[
g'(0) = \lim_{t \to 0} \frac{J(y + th) - J(y)}{t} \quad \text{in } \mathbb{R} \tag{1.2.1}
\]

exists, then the functional J is Gâteaux differentiable in y in direction h and the derivative g′(0) is denoted dJ(y,h).

The Gâteaux differential dJ(y,h) (Gâteaux, 1889–1914) satisfies $dJ(y, \alpha h) = \alpha\, dJ(y,h)$, but it is neither linear nor continuous in h, in general, as is shown by the following example: Let $X = \mathbb{R}^2$, $y = (y_1, y_2) \in D = \mathbb{R}^2$, and

\[
J(y) =
\begin{cases}
y_1^2 \left( 1 + \dfrac{1}{y_2} \right) & \text{for } y_2 \ne 0,\\[1.5ex]
0 & \text{for } y_2 = 0.
\end{cases} \tag{1.2.2}
\]

Then we get for $y = (0,0)$ and $h = (h_1, h_2)$ with $h_2 \ne 0$

\[
\lim_{t \to 0} \frac{J(y + th) - J(y)}{t} = \lim_{t \to 0} \left( t h_1^2 + \frac{h_1^2}{h_2} \right) = \frac{h_1^2}{h_2} \tag{1.2.3}
\]

and $dJ(0,h) = \dfrac{h_1^2}{h_2}$ for $h_2 \ne 0$, $dJ(0,h) = 0$ for $h_2 = 0$.
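The behavior of this example can be checked numerically. The following sketch is only an illustration: the function names and the sample directions are assumptions introduced here. It compares the difference quotient from (1.2.1) with the closed form (1.2.3) and then probes homogeneity, additivity, and continuity in h.

```python
def J(y1, y2):
    """The functional (1.2.2) on R^2."""
    return y1 * y1 * (1.0 + 1.0 / y2) if y2 != 0 else 0.0

def gateaux_quotient(h, t):
    """Difference quotient (J(0 + t h) - J(0)) / t at y = (0, 0)."""
    return (J(t * h[0], t * h[1]) - J(0.0, 0.0)) / t

def dJ0(h):
    """Closed form of the Gateaux differential at y = (0, 0), cf. (1.2.3)."""
    return h[0] ** 2 / h[1] if h[1] != 0 else 0.0

# The difference quotient approaches h1^2 / h2 as t -> 0.
print(gateaux_quotient((1.0, 2.0), 1e-6), dJ0((1.0, 2.0)))

# Homogeneity holds: dJ(0, 3h) = 3 dJ(0, h).
print(dJ0((3.0, 6.0)), 3 * dJ0((1.0, 2.0)))

# Additivity fails: dJ(0, h + k) differs from dJ(0, h) + dJ(0, k).
h, k = (1.0, 1.0), (1.0, -2.0)
print(dJ0((h[0] + k[0], h[1] + k[1])), dJ0(h) + dJ0(k))

# Continuity in h fails: directions (1, s) tend to (1, 0) as s -> 0,
# but dJ(0, (1, s)) = 1 / s is unbounded while dJ(0, (1, 0)) = 0.
print(dJ0((1.0, 1e-9)), dJ0((1.0, 0.0)))
```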

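If a functional $J : \mathbb{R}^n \to \mathbb{R}$ is totally or Fréchet differentiable (Fréchet, 1878–1973), then the Gâteaux differential dJ(y,h) is linear and continuous in $h = (h_1, \dots, h_n) \in \mathbb{R}^n$:

\[
\frac{d}{dt} J(y + th)\Big|_{t=0} = \sum_{k=1}^{n} \frac{\partial J}{\partial y_k}(y)\, h_k = (\nabla J(y), h), \tag{1.2.4}
\]

by the chain rule. In $\mathbb{R}^n$, the Fréchet differential is represented by the gradient $\nabla J(y) = \left( \dfrac{\partial J}{\partial y_1}(y), \dots, \dfrac{\partial J}{\partial y_n}(y) \right)$ and the Euclidean scalar product ( , ).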

Definition 1.2.2. If dJ(y,h) exists in $y \in D \subset X$ for $h \in X$ and if dJ(y,h) is linear in h, then dJ(y,h) is called the first variation of J in y in direction h, and it is denoted

dJ(y,h) = δJ(y)h. (1.2.5)

If (1.2.5) holds for all h ∈ X0 ⊂ X, where X0 is a linear subspace, then

δJ(y) : X0 → R (1.2.6)

is a linear functional.

In case (1.2.4) we have $X_0 = X = \mathbb{R}^n$ and $\delta J(y)h = (\nabla J(y), h)$. Moreover $\delta J(y) : X_0 \to \mathbb{R}$ is continuous.

Remark. If X is finite-dimensional, then any linear functional on X is continuous, as can be seen by its matrix representation (or its representation with a scalar product). However, if X is infinite-dimensional, this is not necessarily the case: Let $X = C^1[0,1] \subset C[0,1]$ be normed as $C[0,1]$ in Definition 1.1.2 and define $Ty = y'(1)$ for $y \in X$. Then for $y_n(x) = \frac{1}{n} x^n$ we obtain $\|y_n\|_{0,[0,1]} = \frac{1}{n}$, $Ty_n = 1$, whence $\lim_{n \to \infty} y_n = 0$, $\lim_{n \to \infty} Ty_n = 1 \ne T0 = 0$, which shows that T is not continuous in y = 0.
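The counterexample of the Remark can be tabulated directly. In the sketch below the helper names and the grid-based approximation of the maximum norm are assumptions made for illustration; the functions $y_n(x) = x^n / n$ and the functional $Ty = y'(1)$ are those of the Remark.

```python
def y_n(n):
    """The function y_n(x) = x^n / n from the Remark."""
    return lambda x: x ** n / n

def dy_n(n):
    """Its derivative y_n'(x) = x^(n-1)."""
    return lambda x: x ** (n - 1)

def sup_norm_01(f, samples=1001):
    """Grid approximation of the maximum norm on [0, 1]."""
    return max(abs(f(i / (samples - 1))) for i in range(samples))

def T(dy):
    """The linear functional T y = y'(1), evaluated through the derivative of y."""
    return dy(1.0)

for n in (1, 10, 100):
    # The norm tends to 0 while T y_n stays equal to 1.
    print(n, sup_norm_01(y_n(n)), T(dy_n(n)))
```

The norms shrink like 1/n while $Ty_n$ stays at 1, so T cannot be continuous at 0 with respect to the maximum norm.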

We now compute the Gâteaux differential of the functional (1.1.3):

Proposition 1.2.1. Let the functional

\[
J(y) = \int_a^b F(x, y, y')\,dx \tag{1.2.7}
\]

be defined on $D \subset C^{1,pw}[a,b]$. We assume that to each $y \in D$ and $h \in C_0^{1,pw}[a,b] := C^{1,pw}[a,b] \cap \{y(a) = 0, y(b) = 0\}$, there is some ε such that $y + th \in D$ for all $t \in (-\varepsilon, \varepsilon)$. Moreover we assume that the Lagrange function $F : [a,b] \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is continuous and continuously partially differentiable with respect to the last two variables. Then the Gâteaux differential exists in all $y \in D$ in all directions $h \in C_0^{1,pw}[a,b]$ and is represented by

\[
\delta J(y)h = \int_a^b F_y(x, y, y')h + F_{y'}(x, y, y')h'\,dx. \tag{1.2.8}
\]

Here $F_y$ and $F_{y'}$ denote the partial derivatives of F with respect to the second and third variables, respectively.
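Formula (1.2.8) can be compared with the difference quotient of Definition 1.2.1 for a concrete Lagrangian. The sketch below is an illustration under assumptions: the Lagrangian $F(x, y, p) = y^2 + \sqrt{1 + p^2}$, the function $y = \sin$, the direction $h(x) = x(1 - x)$ on [0, 1], and the Simpson quadrature are all choices made here and are not taken from the text.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b] (n even)."""
    step = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + i * step) for i in range(1, n, 2))
    s += 2 * sum(g(a + i * step) for i in range(2, n, 2))
    return s * step / 3

# Assumed example Lagrangian and its partial derivatives (p stands for y').
def F(x, y, p):
    return y * y + math.sqrt(1 + p * p)

def F_y(x, y, p):
    return 2 * y

def F_p(x, y, p):
    return p / math.sqrt(1 + p * p)

# Assumed function y and direction h with h(0) = h(1) = 0.
y, dy = math.sin, math.cos

def h(x):
    return x * (1 - x)

def dh(x):
    return 1 - 2 * x

def J(t):
    """J(y + t h) for the functional (1.2.7) on [0, 1]."""
    return simpson(lambda x: F(x, y(x) + t * h(x), dy(x) + t * dh(x)), 0.0, 1.0)

# Right-hand side of (1.2.8).
first_variation = simpson(
    lambda x: F_y(x, y(x), dy(x)) * h(x) + F_p(x, y(x), dy(x)) * dh(x), 0.0, 1.0
)

# Difference quotient from (1.2.1) with a small step t.
t = 1e-6
difference_quotient = (J(t) - J(0.0)) / t

print(first_variation, difference_quotient)
```

The two printed values should agree up to an error of order t, which is what the proposition predicts for this smooth example.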

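Proof. We fix y and h and w.l.o.g. we assume the same partition $a = x_0 < x_1 < \dots < x_m = b$ for both functions. We obtain for some $x \in [x_{i-1}, x_i] \subset [a,b]$ and for all $t \in (-\varepsilon, \varepsilon) \setminus \{0\}$

\[
\begin{aligned}
&\frac{1}{t}\bigl( F(x, y(x) + th(x), y'(x) + th'(x)) - F(x, y(x), y'(x)) \bigr)\\
&\quad= \frac{1}{t} \int_0^t \frac{d}{ds} F(x, y(x) + sh(x), y'(x) + sh'(x))\,ds\\
&\quad= F_y(x, y(x), y'(x))h(x) + F_{y'}(x, y(x), y'(x))h'(x)\\
&\qquad+ \frac{1}{t} \int_0^t \bigl( F_y(x, y(x) + sh(x), y'(x) + sh'(x)) - F_y(x, y(x), y'(x)) \bigr)\,ds\; h(x)\\
&\qquad+ \frac{1}{t} \int_0^t \bigl( F_{y'}(x, y(x) + sh(x), y'(x) + sh'(x)) - F_{y'}(x, y(x), y'(x)) \bigr)\,ds\; h'(x).
\end{aligned} \tag{1.2.9}
\]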

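Since y and h are in $C^1[x_{i-1}, x_i]$,

\[
\bigl\{ (x, y(x) + sh(x), y'(x) + sh'(x)) \bigm| x \in [x_{i-1}, x_i],\ |s| \le \tfrac{\varepsilon}{2} \bigr\}
\subset [x_{i-1}, x_i] \times [-c, c] \times [-c', c'] \subset [a,b] \times \mathbb{R} \times \mathbb{R} \tag{1.2.10}
\]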

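for some positive constants c and c′. Uniform continuity of $F_y$ on the compact set (1.2.10) implies that for all $x \in [x_{i-1}, x_i]$ and for any $\tilde{\varepsilon} > 0$, we have

\[
\bigl| F_y(x, y(x) + sh(x), y'(x) + sh'(x)) - F_y(x, y(x), y'(x)) \bigr| < \tilde{\varepsilon},
\quad \text{provided } |s|\,(|h(x)| + |h'(x)|) < \delta(\tilde{\varepsilon}) \text{ and } |s| \le \frac{\varepsilon}{2}. \tag{1.2.11}
\]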

In view of |h(x)| + |h′(x)| ≤ ‖h‖0,[xi−1,xi] + ‖h′‖0,[xi−1,xi] ≤ ‖h‖1,pw,[a,b], estimate(1.2.11) is fulfilled for

|s| <min

{ε2,

δ (ε)‖h‖1,pw

}. (1.2.12)



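An analogous estimate holds for $F_{y'}$. Then for all $x \in [x_{i-1},x_i]$ and for any $\hat\varepsilon > 0$, (1.2.9) implies

$$\begin{aligned}
&\Bigl|\frac{1}{t}\bigl(F(x,y(x)+th(x),y'(x)+th'(x)) - F(x,y(x),y'(x))\bigr)\\
&\qquad- \bigl(F_y(x,y(x),y'(x))h(x) + F_{y'}(x,y(x),y'(x))h'(x)\bigr)\Bigr|\\
&\quad\le \frac{1}{|t|}\,|t|\,\tilde\varepsilon\,(|h(x)|+|h'(x)|) \le \tilde\varepsilon\,\|h\|_{1,pw} < \hat\varepsilon,\\
&\text{provided } 0 < |t| < \min\left\{\frac{\varepsilon}{2},\ \frac{\delta(\tilde\varepsilon)}{\|h\|_{1,pw}}\right\} \text{ and } 0 < \tilde\varepsilon < \frac{\hat\varepsilon}{\|h\|_{1,pw}}.
\end{aligned} \tag{1.2.13}$$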

Estimate (1.2.13) means that

$$\begin{aligned}
&\lim_{t\to 0}\frac{1}{t}\bigl(F(x,y(x)+th(x),y'(x)+th'(x)) - F(x,y(x),y'(x))\bigr)\\
&\quad= F_y(x,y(x),y'(x))h(x) + F_{y'}(x,y(x),y'(x))h'(x)\\
&\text{uniformly for } x \in [x_{i-1},x_i],\ i = 1,\dots,m.
\end{aligned} \tag{1.2.14}$$

Uniform convergence allows the interchange of the limit and the integral, i.e., it implies the convergence of the integral to the integral of the limit. Hence, we obtain finally

$$\begin{aligned}
\lim_{t\to 0}\frac{J(y+th)-J(y)}{t}
&= \lim_{t\to 0}\sum_{i=1}^{m}\int_{x_{i-1}}^{x_i}\frac{1}{t}\bigl(F(x,y(x)+th(x),y'(x)+th'(x)) - F(x,y(x),y'(x))\bigr)\,dx\\
&= \sum_{i=1}^{m}\int_{x_{i-1}}^{x_i} F_y(x,y(x),y'(x))h(x) + F_{y'}(x,y(x),y'(x))h'(x)\,dx\\
&= \int_a^b F_y(x,y,y')h + F_{y'}(x,y,y')h'\,dx = dJ(y,h) = \delta J(y)h,
\end{aligned} \tag{1.2.15}$$

where the last equality follows from the linearity of $h \mapsto dJ(y,h)$, cf. Definition 1.2.2. □

Proposition 1.2.2. Under the same hypotheses as those of Proposition 1.2.1 the first variation

$$\delta J(y) : C^{1,pw}_0[a,b] \to \mathbb{R} \tag{1.2.16}$$

is linear and continuous for each $y \in D$. In particular,

$$|\delta J(y)h| \le C(y)\|h\|_{1,pw} \quad\text{for all } h \in C^{1,pw}_0[a,b], \tag{1.2.17}$$

where the positive constant C(y) depends on $y \in D$.



If the functional (1.2.7) is defined on all of $C^{1,pw}[a,b]$ we can choose any h in $C^{1,pw}[a,b]$, and the proof of Proposition 1.2.1 yields

$$\begin{aligned}
&\delta J(y)h = \int_a^b F_y(x,y,y')h + F_{y'}(x,y,y')h'\,dx,\\
&\delta J(y) : C^{1,pw}[a,b] \to \mathbb{R} \text{ is linear and continuous and}\\
&|\delta J(y)h| \le C(y)\|h\|_{1,pw} \quad\text{for all } y,h \in C^{1,pw}[a,b].
\end{aligned} \tag{1.2.18}$$

The homogeneous boundary conditions on h play no role in the proof.

Exercises

1.2.1. Prove Proposition 1.2.2.

1.2.2. Assume in addition to the hypotheses of Proposition 1.2.1 that the Lagrange function F is two times continuously partially differentiable with respect to the last two variables. Then

$$g''(0) = \delta^2 J(y)(h,h)$$

exists and is called the second variation of J in y in direction h. Prove the representation

$$\delta^2 J(y)(h,h) = \int_a^b F_{yy}h^2 + 2F_{yy'}hh' + F_{y'y'}(h')^2\,dx,$$

where $F_{yy} = F_{yy}(x,y(x),y'(x))$ and analogously $F_{yy'}$, $F_{y'y'}$ are the second partial derivatives of F with respect to the second and third variables, respectively.

1.2.3. Under the same hypotheses as those of Exercise 1.2.2, prove that

$\delta^2 J(y) : C^{1,pw}[a,b]\times C^{1,pw}[a,b] \to \mathbb{R}$ defined by

$$\delta^2 J(y)(h_1,h_2) = \int_a^b F_{yy}h_1h_2 + F_{yy'}(h_1h_2' + h_1'h_2) + F_{y'y'}h_1'h_2'\,dx$$

is bilinear and continuous for each $y \in D$. In particular

$$|\delta^2 J(y)(h_1,h_2)| \le C(y)\|h_1\|_{1,pw}\|h_2\|_{1,pw} \quad\text{for all } h_1,h_2 \in C^{1,pw}[a,b].$$

1.3 The Fundamental Lemma of Calculus of Variations

The derivative of a piecewise continuously differentiable function is piecewise continuous. Therefore we introduce



Definition 1.3.1. $C^{pw}[a,b] = \{y \mid y : [a,b] \to \mathbb{R},\ y \in C[x_{i-1},x_i],\ i = 1,\dots,m\}$ for a partition $a = x_0 < x_1 < \cdots < x_m = b$ depending on y.

At a corner of a piecewise continuously differentiable function the two one-sided derivatives have two different values. Accordingly we allow that a piecewise continuous function has two values, given by the two one-sided limits, at a point of its partition. This contradicts the definition of a function, but we accept that inaccuracy since a fully accurate version of Definition 1.3.1 would be cumbersome.

Lemma 1.3.1. If $f \in C^{pw}[a,b]$ and

$$\int_a^b fh\,dx = 0 \quad\text{for all } h \in C^\infty_0(a,b), \tag{1.3.1}$$

then f(x) = 0 for all $x \in [a,b]$.

$C^\infty_0(a,b)$ is called the space of “test functions,” and it consists of all infinitely differentiable functions having a compact support in the open interval (a,b). The support of some h is the closure of $\{x \mid h(x) \neq 0\}$. Apparently h(a) = h(b) = 0 and $C^\infty_0(a,b) \subset C^{1,pw}_0[a,b]$.

Proof. Assume $f(x) \neq 0$ for some $x \in [x_{i-1},x_i] \subset [a,b]$. Since f is continuous on $[x_{i-1},x_i]$ there is some open interval I in $[x_{i-1},x_i]$ such that $f(x) \neq 0$ for all $x \in I$. If f is positive in I, say, choose a function $h \in C^\infty_0(a,b)$ having its support in I and being positive in the interior of its support. (Such a function exists by the remark below.) Then $fh \ge 0$ in [a,b] with $fh > 0$ in the interior of the support of h, and thus the integral of fh over [a,b] is positive, contradicting (1.3.1). □

Remark. A function $h \in C^\infty_0(a,b)$ having a prescribed support and only one sign is constructed as follows: The function $g(x) = \pm\exp(-(1-x^2)^{-1})$ for $|x| < 1$ and $g(x) = 0$ for $|x| \ge 1$ is in $C^\infty_0(\mathbb{R})$ and has support [−1,1]. Then $h(x) = g((x-x_0)/r)$ has for any $x_0 \in \mathbb{R}$ and $r > 0$ the support $[x_0-r,x_0+r]$, and h is positive or negative in $(x_0-r,x_0+r)$, depending on the sign of g.
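The construction in the remark can be written down directly. The following is a minimal sketch with the positive sign of g; the function names `g` and `bump` are chosen here for illustration.

```python
import math


def g(x):
    """g(x) = exp(-1/(1 - x^2)) for |x| < 1 and g(x) = 0 for |x| >= 1."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def bump(x, x0, r):
    """h(x) = g((x - x0)/r): positive in (x0 - r, x0 + r), zero outside."""
    return g((x - x0) / r)


# A test function with support [0.4, 0.6]
for x in (0.3, 0.45, 0.5, 0.55, 0.7):
    print(x, bump(x, 0.5, 0.1))
```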

Lemma 1.3.2. If $f \in C^{pw}[a,b]$ and

$$\int_a^b fh'\,dx = 0 \quad\text{for all } h \in C^{1,pw}_0[a,b], \tag{1.3.2}$$

then f(x) = c for all $x \in [a,b]$.



Proof. Choose $c = \frac{1}{b-a}\int_a^b f(x)\,dx = \frac{1}{b-a}\sum_{i=1}^{m}\int_{x_{i-1}}^{x_i} f(x)\,dx$ and $h(x) = \int_a^x (f(s)-c)\,ds$. Then $h \in C[a,b]$, h(a) = 0, h(b) = 0, and for $x \in [x_{i-1},x_i]$ we obtain $h'(x) = f(x) - c$ when taking the respective one-sided derivatives in the partition points. Thus $h \in C^{1,pw}_0[a,b]$, and in view of (1.3.2) and the choice of c, we find

$$\int_a^b (f-c)h'\,dx = \int_a^b fh'\,dx - c\int_a^b h'\,dx = 0. \tag{1.3.3}$$

On the other hand

$$\int_a^b (f-c)h'\,dx = \int_a^b (f-c)^2\,dx = \sum_{i=1}^{m}\int_{x_{i-1}}^{x_i} (f(x)-c)^2\,dx = 0, \tag{1.3.4}$$

and the continuity of f − c on $[x_{i-1},x_i]$ implies f(x) = c for all $x \in [a,b]$. □

Lemma 1.3.3. For f and h in $C^{1,pw}[a,b]$, the formula for integration by parts is valid:

$$\int_a^b fh'\,dx = -\int_a^b f'h\,dx + fh\Big|_a^b. \tag{1.3.5}$$

For $h \equiv 1$ formula (1.3.5) gives the fundamental theorem of calculus for piecewise continuously differentiable functions.

Proof. Assuming identical partitions for f and h, formula (1.3.5) reads as follows:

$$\sum_{i=1}^{m}\int_{x_{i-1}}^{x_i} fh'\,dx = -\sum_{i=1}^{m}\int_{x_{i-1}}^{x_i} f'h\,dx + fh\Big|_a^b. \tag{1.3.6}$$

Since $f,h \in C^1[x_{i-1},x_i]$ the classical formula of integration by parts holds on each interval $[x_{i-1},x_i]$. In the sum (1.3.6), the boundary values at interior partition points $x_i$, $i = 1,\dots,m-1$, drop out and only the boundary values at $x_0 = a$ and $x_m = b$ are left. □
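A small numerical check of (1.3.5) for a function with a corner may help to see why the interior boundary terms cancel. The choices f(x) = |x| and h(x) = x² + x on [−1,1], as well as the midpoint quadrature whose grid contains the corner x = 0 as a node, are assumptions made for this sketch.

```python
def integrate(f, a, b, n=2000):
    """Composite midpoint rule; with even n the corner x = 0 of [-1, 1] is a grid node."""
    w = (b - a) / n
    return w * sum(f(a + (k + 0.5) * w) for k in range(n))


f = lambda x: abs(x)                      # piecewise C^1 with a corner at 0
df = lambda x: 1.0 if x > 0 else -1.0     # one-sided derivative away from the corner
h = lambda x: x * x + x
dh = lambda x: 2 * x + 1

a, b = -1.0, 1.0
lhs = integrate(lambda x: f(x) * dh(x), a, b)
rhs = -integrate(lambda x: df(x) * h(x), a, b) + (f(b) * h(b) - f(a) * h(a))
print(lhs, rhs)
```

By hand, both sides equal 1: the left-hand side is the integral of |x|(2x + 1), and the right-hand side is −1 + 2.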

Next we state and prove the fundamental lemma of calculus of variations due to DuBois-Reymond (1831–1889):

Lemma 1.3.4. If $f,g \in C^{pw}[a,b]$ and

$$\int_a^b fh + gh'\,dx = 0 \quad\text{for all } h \in C^{1,pw}_0[a,b], \tag{1.3.7}$$

then $g \in C^{1,pw}[a,b] \subset C[a,b]$ and $g' = f$ piecewise on [a,b], i.e., $g'(x) = f(x)$ for $x \in [x_{i-1},x_i]$ if $f \in C[x_{i-1},x_i]$.



Proof. Defining $F(x) = \int_a^x f(s)\,ds$, the function $F \in C[a,b]$, and for $x \in [x_{i-1},x_i]$ we obtain $F'(x) = f(x)$ when taking one-sided derivatives at partition points. Consequently $F \in C^{1,pw}[a,b]$, and due to Lemma 1.3.3, we find

$$\int_a^b fh\,dx = \int_a^b F'h\,dx = -\int_a^b Fh'\,dx \quad\text{for all } h \in C^{1,pw}_0[a,b]. \tag{1.3.8}$$

By (1.3.7) this implies

$$\int_a^b fh + gh'\,dx = \int_a^b (-F+g)h'\,dx = 0 \quad\text{for all } h \in C^{1,pw}_0[a,b]. \tag{1.3.9}$$

Since $g \in C^{pw}[a,b]$, Lemma 1.3.2 is applicable. Hence $-F(x) + g(x) = c$ or $g(x) = c + F(x)$ for all $x \in [a,b]$. Therefore $g \in C^{1,pw}[a,b]$ and $g' = F' = f$ piecewise in the sense explained before. □

Lemma 1.3.4 is a regularity theorem in the sense that under validity of (1.3.7), g is more regular than assumed. In particular g is continuous on the entire interval [a,b]. A special case is the following:

$$\text{If } f,g \in C[a,b] \text{ and (1.3.7) holds, then } g \in C^1[a,b] \text{ and } g' = f \text{ on } [a,b]. \tag{1.3.10}$$

1.4 The Euler-Lagrange Equation

For the functional

$$J(y) = \int_a^b F(x,y,y')\,dx, \tag{1.4.1}$$

defined on $D \subset C^{1,pw}[a,b]$, we now give the most important necessary condition that a minimizing or maximizing function $y \in D$ has to fulfill.

Definition 1.4.1. A function $y \in D$ is a local minimizer of the functional (1.4.1) if

$$J(y) \le J(\tilde y) \quad\text{for all } \tilde y \in D \text{ satisfying } \|y - \tilde y\|_{1,pw} < d \tag{1.4.2}$$

with a constant d > 0.

A local maximizer is defined analogously. It is sufficient to study minimizers since maximizers of J are minimizers of −J.

A local minimizer can be defined using a different norm: If in (1.4.2) one requires $\|y - \tilde y\|_0 < d$, then the inequality in (1.4.2) has to be fulfilled by more functions, which implies that the definition of a local minimizer is stronger. Accordingly, one calls y a strong local minimizer in this case. We shall not use this definition. In Exercise 1.4.7 we see that a local minimizer is not necessarily a strong local minimizer.

We assume that for $y \in D$ the first variation $\delta J(y)h$ exists for all $h \in C^{1,pw}_0[a,b]$ according to Definition 1.2.2. If the domain of definition is characterized by boundary conditions, i.e., if $D = C^{1,pw}[a,b] \cap \{y(a) = A,\ y(b) = B\}$, this is true.

Proposition 1.4.1. Let $y \in D \subset C^{1,pw}[a,b]$ be a local minimizer of the functional (1.4.1), and assume that the Lagrange function $F : [a,b]\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$ is continuous and continuously partially differentiable with respect to the last two variables. Then

$$\begin{aligned}
&F_{y'}(\cdot,y,y') \in C^{1,pw}[a,b] \subset C[a,b] \text{ and}\\
&\frac{d}{dx}F_{y'}(\cdot,y,y') = F_y(\cdot,y,y') \text{ piecewise on } [a,b].
\end{aligned} \tag{1.4.3}$$

For $y \in C^1[a,b]$ we have $F_{y'}(\cdot,y,y') \in C^1[a,b]$, and (1.4.3)$_2$ holds on the entire interval [a,b].

Proof. For any fixed $h \in C^{1,pw}_0[a,b]$, $g(t) = J(y+th)$ is defined for $t \in (-\varepsilon,\varepsilon)$ and g is locally minimal at t = 0, by assumption. Furthermore the first variation (= the Gâteaux differential) $dJ(y,h) = g'(0)$ exists, and due to a well-known result of calculus, $g'(0) = 0$. Hence by (1.2.8),

$$\delta J(y)h = \int_a^b F_y(x,y,y')h + F_{y'}(x,y,y')h'\,dx = 0 \quad\text{for all } h \in C^{1,pw}_0[a,b]. \tag{1.4.4}$$

By the assumptions on y and F, the functions $F_y(\cdot,y,y')$ and $F_{y'}(\cdot,y,y')$ are in $C^{pw}[a,b]$, and therefore Lemma 1.3.4 is applicable and implies (1.4.3). The last claim is covered by (1.3.10). □

Equation (1.4.3)$_2$ is the Euler-Lagrange equation. It is an ordinary differential equation that has to be fulfilled by local minimizers piecewise.

The local minimizer is not necessarily two times (piecewise) differentiable. If this is the case and if the second partial derivatives $F_{y'y'}$, $F_{y'y}$, and $F_{y'x}$ exist, then by differentiation of the left-hand side of (1.4.3)$_2$, we find

$$F_{y'y'}(\cdot,y,y')y'' + F_{y'y}(\cdot,y,y')y' + F_{y'x}(\cdot,y,y') = F_y(\cdot,y,y') \quad\text{(piecewise) on } [a,b]. \tag{1.4.5}$$

Equation (1.4.5) is a quasilinear ordinary differential equation of second order since the highest (second) derivative of y appears linearly in it. Equation (1.4.4) is called the weak version of the Euler-Lagrange equation (1.4.3). Lemma 1.3.3 implies:

Proposition 1.4.2. The weak version (1.4.4) and the strong version (1.4.3) of the Euler-Lagrange equation are equivalent.



Proposition 1.4.2 is exceptional for one independent variable x. It is not valid, in general, for more than one independent variable, i.e., for partial differential equations.

A solution $y \in D \subset C^{1,pw}[a,b]$ of the Euler-Lagrange equation is not necessarily a (local) minimizer (or maximizer). Here is an example:

$$J(y) = \int_0^1 (y')^3\,dx \quad\text{on } D = C^{1,pw}[0,1] \cap \{y(0) = 0,\ y(1) = 0\}. \tag{1.4.6}$$

A function $y \in D$ solves the Euler-Lagrange equation if

$$3(y')^2 \in C^{1,pw}[0,1] \subset C[0,1] \text{ and } \frac{d}{dx}3(y')^2 = 0 \text{ piecewise on } [0,1]. \tag{1.4.7}$$

This means

$$(y')^2 = c_1 \ge 0 \text{ and } y' = \pm\sqrt{c_1} \text{ on } [0,1]. \tag{1.4.8}$$

There are infinitely many solutions that fulfill the boundary conditions. Some of them are sketched in Figure 1.2. A special solution is $y \equiv 0$.

Fig. 1.2 Solutions of the Euler-Lagrange Equation

No solution is a local minimizer or maximizer, cf. Exercise 1.4.8.

However, if the Lagrange function is convex with respect to the last two variables, then any solution of the Euler-Lagrange equation is a local minimizer.

Definition 1.4.2. Let the Lagrange function $F : [a,b]\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$ be continuous and continuously partially differentiable with respect to the last two variables. F is convex with respect to the last two variables if

$$\begin{aligned}
&F(x,\tilde y,\tilde y') \ge F(x,y,y') + F_y(x,y,y')(\tilde y - y) + F_{y'}(x,y,y')(\tilde y' - y')\\
&\text{for all } (x,y,y'),\ (x,\tilde y,\tilde y') \in [a,b]\times\mathbb{R}\times\mathbb{R}.
\end{aligned} \tag{1.4.9}$$

Geometrically (1.4.9) means that the graph of $F(x,\cdot,\cdot)$ is above the tangent plane spanned by the two tangents to the graphs of $F(x,\cdot,y')$ and of $F(x,y,\cdot)$, respectively, cf. Figure 1.3.

Fig. 1.3 Convexity

Proposition 1.4.3. Assume that the Lagrange function F of the functional J given by (1.4.1) and defined on $D = C^{1,pw}[a,b] \cap \{y(a) = A,\ y(b) = B\}$ is continuous, continuously partially differentiable, and convex with respect to the last two variables. Then any solution $y \in D$ of its Euler-Lagrange equation is a global minimizer of J.

Proof. Let $y \in D$ be any solution of the Euler-Lagrange equation, and let $\tilde y$ be any element in D. Then $h = \tilde y - y \in C^{1,pw}_0[a,b]$ since both functions fulfill the same boundary conditions. Convexity (1.4.9) implies

$$\begin{aligned}
J(\tilde y) &= \int_a^b F(x,\tilde y,\tilde y')\,dx = \int_a^b F(x,y+h,y'+h')\,dx\\
&\ge \int_a^b F(x,y,y')\,dx + \int_a^b F_y(x,y,y')h + F_{y'}(x,y,y')h'\,dx\\
&= J(y) \quad\text{due to (1.4.4).}
\end{aligned} \tag{1.4.10}$$

Therefore y is not only a local but also a global minimizer. □

Let $y \in D \subset C^{1,pw}[a,b]$ be a local minimizer of J, and assume that the second variation of J exists in y as defined in Exercise 1.2.2. Then by a result of calculus,

$$\delta^2 J(y)(h,h) \ge 0 \quad\text{for all } h \in C^{1,pw}_0[a,b]. \tag{1.4.11}$$

Under additional assumptions (1.4.11) implies a necessary condition on a local or global minimizer, cf. Exercise 1.4.3. Exercises 1.4.4 and 1.4.6 give sufficient conditions on a global and a local minimizer, respectively.

Exercises

1.4.1. The support supp(h) of a function $h : \mathbb{R} \to \mathbb{R}$ is the closure of the set $\{x \mid h(x) \neq 0\}$. Prove that to any compact interval $I \subset (a,b)$ there is a sequence $(h_n)_{n\in\mathbb{N}} \subset C^{1,pw}_0[a,b]$ having the properties

a) $\mathrm{supp}(h_n) \subset I$ for all $n \ge n_0$,

b) $\lim_{n\to\infty}\int_a^b h_n^2\,dx = 0$,

c) $\lim_{n\to\infty}\int_a^b (h_n')^2\,dx = \infty$.

1.4.2. Under the hypotheses of Exercise 1.2.2 there exists the second variation of J in $y \in D \subset C^{1,pw}[a,b]$ in direction $h \in C^{1,pw}_0[a,b]$. We assume in addition that $F_{yy'}(\cdot,y,y') \in C^{1,pw}[a,b]$. Prove

$$\delta^2 J(y)(h,h) = \int_a^b Ph^2 + Q(h')^2\,dx \quad\text{where}\quad P = F_{yy} - \frac{d}{dx}F_{yy'} \in C^{pw}[a,b] \text{ and } Q = F_{y'y'} \in C^{pw}[a,b].$$

1.4.3. Prove that under the hypotheses of Exercise 1.4.2, a local minimizer $y \in D \subset C^{1,pw}[a,b]$ of J given by (1.4.1) has to satisfy

$$F_{y'y'}(x,y(x),y'(x)) \ge 0 \quad\text{for all } x \in [a,b].$$

Hint: Use (1.4.11) and Exercises 1.4.2 and 1.4.1.

This necessary condition, due to Legendre, can be proven without the additional assumption given in Exercise 1.4.2, cf. reference [3], page 57.

1.4.4. Assume the existence of the second variation of J in $y \in D = C^{1,pw}[a,b] \cap \{y(a) = A,\ y(b) = B\}$ in direction $h \in C^{1,pw}_0[a,b]$, cf. Exercise 1.2.2. Furthermore assume that

$$\delta J(y)h = 0 \quad\text{and}\quad \delta^2 J(\tilde y)(h,h) \ge 0 \quad\text{for all } \tilde y \in D \text{ and for all } h \in C^{1,pw}_0[a,b].$$

Prove that y is a global minimizer of J on D.

Hint: Use for a two times continuously differentiable function $g : \mathbb{R} \to \mathbb{R}$ the identity $g(1) - g(0) = g'(0) + \int_0^1 (1-t)g''(t)\,dt$.

1.4.5. Under the hypotheses of Exercise 1.2.2 the second variation of J exists in $y \in D = C^{1,pw}[a,b] \cap \{y(a) = A,\ y(b) = B\}$ in direction $h \in C^{1,pw}_0[a,b]$. Prove

$$J(y+h) = J(y) + \delta J(y)h + \frac{1}{2}\delta^2 J(y)(h,h) + R(y,h)$$

where $R(y,h)/\|h\|^2_{1,pw} \to 0$ if $\|h\|_{1,pw} \to 0$.

The continuous linear mapping $\delta J(y)$ is also called (first) Fréchet derivative, and the continuous bilinear mapping $\delta^2 J(y)$ is also called second Fréchet derivative of J in y.

1.4.6. Assume that the first and second Fréchet derivatives of J exist in $y \in D \subset C^{1,pw}[a,b]$ and that the following relations hold:

$$\delta J(y)h = 0, \qquad \delta^2 J(y)(h,h) \ge C\|h\|^2_{1,pw}$$

for all $h \in C^{1,pw}_0[a,b]$ for a constant C > 0. Prove that y is a local minimizer of J on D.

1.4.7. Consider the functional

$$J(y) = \int_0^1 (y')^2 + (y')^3\,dx.$$

a) Prove that y = 0 is a local minimizer of J on $D = C^{1,pw}[0,1] \cap \{y(0) = 0,\ y(1) = 0\}$.

b) Prove that y = 0 is not a strong local minimizer according to the definition after Definition 1.4.1.



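Hint: Define for $b \in (0,1)$ and $n \in \mathbb{N}$

$$y_{n,b}(x) = \begin{cases} \dfrac{1}{nb}\,x & \text{for } x \in [0,b],\\[2mm] -\dfrac{1}{n(1-b)}\,x + \dfrac{1}{n(1-b)} & \text{for } x \in [b,1], \end{cases}$$

then $y_{n,b} \in D$ and for any d > 0 there is an $n \in \mathbb{N}$ and some suitable $b_n \in (0,1)$ with $\|y_{n,b_n}\|_0 < d$ and $J(y_{n,b_n}) < 0$.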

1.4.8. Show that no solution of the Euler-Lagrange equation for the functional (1.4.6) is a local minimizer or maximizer.

Hint: Use also (1.4.11).

1.5 Examples of Solutions of the Euler-Lagrange Equation

1. $J(y) = \int_{-1}^{1} y^2(2x-y')^2\,dx$ is defined on $D = C^1[-1,1] \cap \{y(-1) = 0,\ y(1) = 1\}$.

The function

$$y(x) = \begin{cases} 0 & \text{for } x \in [-1,0],\\ x^2 & \text{for } x \in [0,1], \end{cases}$$

is in D and J(y) = 0. Since $J(\tilde y) \ge 0$ for all $\tilde y \in D$, that function is a global minimizer. Observe that y is not in $C^2[-1,1]$, but it fulfills the Euler-Lagrange equation on the entire interval [−1,1].

2. The Dirichlet integral $J(y) = \int_a^b (y')^2\,dx$ is defined on $D = C^{1,pw}[a,b] \cap \{y(a) = A,\ y(b) = B\}$. Omitting the factor 2, the Euler-Lagrange equation reads

$$\frac{d}{dx}y' = y'' = 0 \quad\text{piecewise on } [a,b],$$

which is solved by $y(x) = c^i_1x + c^i_2$ for $x \in [x_{i-1},x_i]$, $i = 1,\dots,m$. This is not the whole truth: The definition of D and (1.4.3)$_1$ imply that y as well as y′ is in C[a,b], and thus the solution is the line $y(x) = c_1x + c_2$ with $c_1 = (B-A)/(b-a)$ and $c_2 = (bA-aB)/(b-a)$, fulfilling the boundary conditions.

In order to decide on a minimizing property of that line we can apply Proposition 1.4.3 or we can compute the second variation:

$$\delta^2 J(\tilde y)(h,h) = \int_a^b 2(h')^2\,dx \ge 0 \quad\text{for all } \tilde y \in D \text{ and for all } h \in C^{1,pw}_0[a,b].$$

Exercise 1.4.4 shows that the line is a global minimizer.

Finally we can argue directly as follows: Let $\tilde y \in D$ and $\tilde y = y + \tilde y - y = y + h$ where $h \in C^{1,pw}_0[a,b]$. Then

$$\begin{aligned}
J(\tilde y) = J(y+h) &= \int_a^b (y')^2\,dx + 2\int_a^b y'h'\,dx + \int_a^b (h')^2\,dx\\
&\ge J(y) + 2c_1\int_a^b h'\,dx = J(y),
\end{aligned}$$

since the second integral vanishes by the homogeneous boundary conditions.

3. The counterexample of Weierstraß reads as follows: $J(y) = \int_{-1}^{1} x^2(y')^2\,dx$ defined on $D = C^1[-1,1] \cap \{y(-1) = -1,\ y(1) = 1\}$. Apparently $J(y) \ge 0$ for all $y \in D$, and upon inserting

$$y_n(x) = \frac{\arctan nx}{\arctan n},$$

we find

$$J(y_n) = \int_{-1}^{1} \frac{n^2x^2}{(\arctan n)^2(1+n^2x^2)^2}\,dx < \frac{1}{(\arctan n)^2}\int_{-1}^{1} \frac{dx}{1+n^2x^2} = \frac{2}{n\arctan n}.$$

Since $\lim_{n\to\infty} J(y_n) = 0$, we deduce that $\inf_{y\in D} J(y) = 0$. But there is no function $y \in D$ such that J(y) = 0. Such a function has to fulfill $xy'(x) = 0$ for all $x \in [-1,1]$, which means $y'(x) = 0$ for $x \in [-1,0)\cup(0,1]$, and in view of the boundary conditions $y(x) = -1$ for $x \in [-1,0)$ and $y(x) = 1$ for $x \in (0,1]$. However, such a function is not in the domain of definition D of J.

The Euler-Lagrange equation

$$\frac{d}{dx}(2x^2y') = 0 \quad\text{or}\quad x^2y' = c_1 \quad\text{has the solutions}\quad y(x) = -\frac{c_1}{x} + c_2.$$

None of these functions is in D either.

That example contradicts Dirichlet’s argument that there exists an admissible function for which a functional, which is convex and bounded from below, attains its minimum.

4. The length of the curve $\{(x,y(x)) \mid x \in [a,b]\}$ between (a,A) and (b,B) is given by the functional $J(y) = \int_a^b \sqrt{1+(y')^2}\,dx$ on $D = C^{1,pw}[a,b] \cap \{y(a) = A,\ y(b) = B\}$. The Euler-Lagrange equation reads

$$\frac{d}{dx}\frac{y'}{\sqrt{1+(y')^2}} = 0 \quad\text{piecewise on } [a,b] \quad\text{or}\quad y'(x) = \frac{c_i}{\sqrt{1-c_i^2}} = c^i_1 \quad\text{for } x \in [x_{i-1},x_i],\ i = 1,\dots,m,$$

having the solutions

$$y(x) = c^i_1x + c^i_2 \quad\text{for } x \in [x_{i-1},x_i] \subset [a,b].$$

As in example 2 the continuity of $F_{y'}(\cdot,y,y')$ and of y on the entire interval [a,b] imply $c^i_1 = c_1$ and $c^i_2 = c_2$ for $i = 1,\dots,m$. With the constants of example 2 we obtain the line segment between (a,A) and (b,B), and by Exercise 0.1 of the Introduction, it minimizes J on D.

A different argument uses Exercise 1.4.4: By

$$F_{y'y'}(y') = \frac{1}{(1+(y')^2)^{3/2}} > 0, \qquad F_{yy}(y') = F_{yy'}(y') = 0$$

the representation in Exercise 1.2.2 implies $\delta^2 J(\tilde y)(h,h) \ge 0$ for all $\tilde y \in D$ and for all $h \in C^{1,pw}_0[a,b]$. Therefore the only solution of the Euler-Lagrange equation is a global minimizer.

5. $J(y) = \int_a^b y^2 + (y')^2\,dx$ is defined on $D = C^{1,pw}[a,b] \cap \{y(a) = A,\ y(b) = B\}$. The Euler-Lagrange equation reads (omitting the factors 2)

$$y'' = y \quad\text{piecewise on } [a,b],$$

having the solutions $y(x) = c^i_1\cosh x + c^i_2\sinh x$ for $x \in [x_{i-1},x_i]$. Continuity of y as well as of y′ by (1.4.3)$_1$ means

$$\begin{aligned}
(c^i_1 - c^{i+1}_1)\cosh x_i + (c^i_2 - c^{i+1}_2)\sinh x_i &= 0,\\
(c^i_1 - c^{i+1}_1)\sinh x_i + (c^i_2 - c^{i+1}_2)\cosh x_i &= 0,
\end{aligned}$$

whence $c^i_1 = c^{i+1}_1 = c_1$ and $c^i_2 = c^{i+1}_2 = c_2$ for $i = 1,\dots,m-1$. The boundary conditions determine the constants in a unique way (exercise). Proposition 1.4.3 answers the question whether the solution is an extremal. We can also compute the second variation:

$$\delta^2 J(\tilde y)(h,h) = 2\int_a^b h^2 + (h')^2\,dx \ge 0 \quad\text{for all } \tilde y \in D \text{ and for all } h \in C^{1,pw}_0[a,b].$$

By Exercise 1.4.4 the solution $y(x) = c_1\cosh x + c_2\sinh x$ is a global minimizer.

6. $J(y) = \int_0^1 ((y')^2-1)^2\,dx$ is defined on $D = C^{1,pw}[0,1]$. Apparently $J(y) \ge 0$ for all $y \in D$, and J(y) = 0 for all piecewise continuously differentiable functions with $y' = \pm 1$. All sawtooth functions with slopes ±1 are global minimizers of J on D, cf. Figure 1.4. Even with boundary conditions there might be infinitely many, for instance for y(0) = y(1) = 0. The Euler-Lagrange equation reads (omitting the factor 4)

$$\frac{d}{dx}((y')^2-1)y' = 0 \quad\text{piecewise on } [0,1], \text{ and } ((y')^2-1)y' \text{ is continuous on } [0,1].$$

Fig. 1.4 A Global Minimizer

Fig. 1.5 A W-Potential

Solutions are piecewise linear functions whose slopes are not necessarily ±1. We consider the graphs of $W(z) = (z^2-1)^2$ and of $W'(z) = 4(z^2-1)z$ in Figure 1.5. The Euler-Lagrange equation requires $4((y')^2-1)y' = c$, having the solutions $y' = c_i$, $i = 1,2,3$, cf. Figure 1.5. For $c \neq 0$ we obtain $W(y') > 0$, and therefore a function with slopes $y' = c_i$ as sketched in Figure 1.6 is not a global minimizer. For c = 0 we obtain the slopes $y' = -1, 0, 1$, and because W(0) = 1, a function as sketched in Figure 1.7 is not a minimizer either.

Fig. 1.6 A Solution of the Euler-Lagrange Equation

Fig. 1.7 A Solution of the Euler-Lagrange Equation

The second variation reads

$$\delta^2 J(y)(h,h) = \int_0^1 W''(y')(h')^2\,dx,$$

where $W''(z) = 4(3z^2-1)$ is sketched in Figure 1.8. Sawtooth functions with slopes $y' = c_i$ where $|c_i| \ge \frac{1}{\sqrt{3}}$ fulfill the necessary Legendre condition $W''(y') \ge 0$ for a local minimizer given in Exercise 1.4.3. The sufficient conditions given in Exercises 1.4.4 and 1.4.6 are not applicable. To summarize, global minimizers of a nonconvex variational problem are not determined by the first and second variations alone. One needs additional necessary conditions known as the “Weierstraß-Erdmann corner conditions,” cf. Paragraph 1.11.

Next we discuss three special cases.

7. The Lagrange function does not depend explicitly on x: $J(y) = \int_a^b F(y,y')\,dx$ is defined on $D \subset C^{1,pw}[a,b]$, and let $y \in D \cap C^2(a,b)$ be a local minimizer. We need the additional assumption on the regularity of y for technical reasons. For global minimizers it is not required (cf. Proposition 1.11.2), and for an “elliptic” Euler-Lagrange equation, it is automatically fulfilled (cf. Exercise 1.5.1 and Proposition 1.11.4). In any case the Euler-Lagrange equation reads

$$\frac{d}{dx}F_{y'}(y,y') = F_y(y,y') \quad\text{on } [a,b],$$

Fig. 1.8 The W-Potential is not Convex

and due to the additional regularity of y we may compute:

$$\begin{aligned}
\frac{d}{dx}\bigl(F(y,y') - y'F_{y'}(y,y')\bigr)
&= F_y(y,y')y' + F_{y'}(y,y')y'' - y''F_{y'}(y,y') - y'\frac{d}{dx}F_{y'}(y,y')\\
&= \Bigl(F_y(y,y') - \frac{d}{dx}F_{y'}(y,y')\Bigr)y'.
\end{aligned}$$

This shows that any solution of the Euler-Lagrange equation and any constant function solves the differential equation of first order

$$F(y,y') - y'F_{y'}(y,y') = c_1 \quad\text{on } [a,b].$$

Any solution of this differential equation solves the Euler-Lagrange equation or it is constant. If it can be solved for y′ one obtains $y' = f(y;c_1)$, which, in turn, is solved by “separating the variables”:

Let $h(y;c_1)$ be a primitive of $\frac{1}{f(y;c_1)}$, i.e.,

$$\frac{d}{dy}h(y;c_1) = \frac{1}{f(y;c_1)} \neq 0.$$

There exists the inverse function $h^{-1}$ and

$$y(x) = h^{-1}(x+c_2;c_1) \text{ solves } y' = f(y;c_1), \text{ since } \frac{d}{dx}y(x) = \frac{1}{\frac{d}{dy}h(y(x);c_1)} = f(y(x);c_1).$$

8. The Lagrange function does not depend explicitly on y: $J(y) = \int_a^b F(x,y')\,dx$ is defined on $D \subset C^{1,pw}[a,b]$. The Euler-Lagrange equation reads

$$\frac{d}{dx}F_{y'}(\cdot,y') = 0 \quad\text{piecewise on } [a,b], \text{ whence}\quad F_{y'}(x,y'(x)) = c_1 \quad\text{on } [a,b].$$

Observe that due to (1.4.3)$_1$, $F_{y'}(\cdot,y') \in C[a,b]$. If the latter equation can be solved for y′, i.e.,

$$y'(x) = f(x;c_1), \text{ then by integration}\quad y(x) = \int f(x;c_1)\,dx + c_2.$$

If $D = C^{1,pw}[a,b]$ then $c_1 = 0$. Indeed, for any $h \in C^{1,pw}[a,b]$ the weak version of the Euler-Lagrange equation (1.4.4) and Lemma 1.3.3 give

$$\int_a^b F_{y'}(x,y')h'\,dx = 0 = \int_a^b c_1h'\,dx = c_1(h(b)-h(a)).$$

Hence $c_1 = 0$ since h(a) and h(b) are arbitrary.

9. The Lagrange function does not depend explicitly on y′: $J(y) = \int_a^b F(x,y)\,dx$ is defined on $D \subset C^{pw}[a,b]$. The Euler-Lagrange equation reads

$$F_y(x,y(x)) = 0 \quad\text{on } [a,b]$$

which needs to be solved for y. This is not a differential equation.

Remark. Cases 8 and 9 seem to be identical upon substituting y′ = u in case 8. However, one obtains according to case 9

$$F_u(x,u(x)) = 0, \quad u = y',$$

whereas case 8 gives

$$F_{y'}(x,y'(x)) = c_1.$$

When computing the first variation one has to realize that the perturbation in case 8 is of the form y′ + th′ and that it is not sufficient to simply set y′ = u. A local minimizer has to fulfill

$$\delta J(u)h' = 0 \quad\text{for all } h \in C^1_0[a,b],$$

which by Lemma 1.3.2 implies $F_u(\cdot,u) = c_1$. On the other hand, we see in case 8 that $c_1 = 0$ if $D = C^{1,pw}[a,b]$.
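The estimate in example 3 above (the counterexample of Weierstraß) can be made tangible numerically. The sketch below approximates J(y_n) for y_n(x) = arctan(nx)/arctan(n) and compares it with the bound 2/(n arctan n). The function names and the midpoint quadrature with 20000 cells are assumptions of this illustration.

```python
import math


def J_weierstrass(n, cells=20000):
    """Midpoint approximation of the integral of x^2 * y_n'(x)^2 over [-1, 1]."""
    w = 2.0 / cells
    a = math.atan(n)
    total = 0.0
    for k in range(cells):
        x = -1.0 + (k + 0.5) * w
        dy = n / (a * (1.0 + (n * x) ** 2))  # y_n'(x)
        total += x * x * dy * dy
    return w * total


def bound(n):
    """Upper bound 2 / (n * arctan n) from example 3."""
    return 2.0 / (n * math.atan(n))


for n in (1, 10, 100):
    print(n, J_weierstrass(n), bound(n))
```

The printed values decrease toward 0 and stay below the bound, while no admissible function attains the value 0.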



Exercises

1.5.1. Assume that the Lagrange function of the functional $J(y) = \int_a^b F(x,y,y')\,dx$ is two times continuously partially differentiable with respect to all three variables and that $y \in C^{1,pw}[a,b] \cap C^1[x_{i-1},x_i]$ is a solution of the Euler-Lagrange equation

$$\frac{d}{dx}F_{y'}(\cdot,y,y') = F_y(\cdot,y,y') \quad\text{on } [x_{i-1},x_i] \subset [a,b].$$

If $F_{y'y'}(x_0,y(x_0),y'(x_0)) \neq 0$ for some $x_0 \in (x_{i-1},x_i)$, prove that y is two times continuously differentiable in a neighborhood of $x_0$ in $(x_{i-1},x_i)$. Hence, local ellipticity implies local regularity.

Hint: Apply the implicit function theorem.

1.5.2. Find extremals of $J(y) = \int_0^1 F(x,y,y')\,dx$ in $D = C^{1,pw}[0,1] \cap \{y(0) = 0,\ y(1) = 1\}$ where

a) $F(x,y,y') = y'$, b) $F(x,y,y') = yy'$, c) $F(x,y,y') = xyy'$.

Compute the supremum of J in case c).

1.5.3. Find solutions of the Euler-Lagrange equations in $C^2[0,1]$ for

a) $J(y) = \int_0^1 ((y')^2 + 2y)\,dx$, y(0) = 0, y(1) = 1,

b) $J(y) = \int_{-1}^{2} ((y')^2 + 2yy')\,dx$, y(−1) = 1, y(2) = 0,

c) $J(y) = \int_0^1 ((y')^2 + 2xy' + x^2)\,dx$, y(0) = 0, y(1) = 0,

d) $J(y) = \int_0^2 ((y')^2 + 2yy' + y^2)\,dx$, y(0) = 0, y(2) = 1.

Compute the second variations of J and find out whether the solutions of the Euler-Lagrange equations are local or global extremals among all admitted solutions fulfilling the same boundary conditions.

1.6 Minimal Surfaces of Revolution

When the graph of a continuous positive function y is revolved around the x-axis it generates a surface of revolution as sketched in Figure 1.9.

The area of a surface of revolution generated by a function $y \in C^1[a,b]$ is given by

$$J(y) = 2\pi\int_a^b y\sqrt{1+(y')^2}\,dx. \tag{1.6.1}$$

The variational problem reads as follows: Which curve connecting (a,A) and (b,B) generates a surface of revolution having the smallest surface area? Such a surface is called a minimal surface of revolution.



Fig. 1.9 A Surface of Revolution

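The functional (1.6.1) is defined on $D = C^1[a,b] \cap \{y(a) = A,\ y(b) = B\}$. By a rescaling we normalize the problem as follows:

$$\tilde y(x) = y(a+Ax)/A \quad\text{for } x \in \left[0,\frac{b-a}{A}\right] = [0,\tilde b] \tag{1.6.2}$$

gives

$$J(y) = 2\pi A^2\int_0^{\tilde b} \tilde y\sqrt{1+(\tilde y')^2}\,dx, \quad \tilde y(0) = 1,\ \tilde y(\tilde b) = \frac{B}{A} = \tilde B. \tag{1.6.3}$$

We omit the factor $2\pi A^2$ as well as the tilde, and we study the normalized problem

$$J(y) = \int_0^b y\sqrt{1+(y')^2}\,dx \quad\text{on } D = C^1[0,b] \cap \{y(0) = 1,\ y(b) = B\}. \tag{1.6.4}$$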

The functional (1.6.4) is treated as the special case 7 in Paragraph 1.5. The required additional regularity of a local minimizer is guaranteed by Exercise 1.5.1: In view of

$$F_{y'y'}(y,y') = \frac{y}{(1+(y')^2)^{3/2}} > 0, \quad\text{provided } y > 0, \tag{1.6.5}$$

on [a,b], any positive solution of the Euler-Lagrange equation in D is in $C^2(a,b)$, and as discussed in case 7 of Paragraph 1.5, it solves the differential equation

$$F(y,y') - y'F_{y'}(y,y') = c_1 \quad\text{on } [a,b]. \tag{1.6.6}$$

For the Lagrange function $F(y,y') = y\sqrt{1+(y')^2}$, equation (1.6.6) yields

$$y = c_1\sqrt{1+(y')^2} \quad\text{and}\quad y' = \sqrt{\frac{y^2-c_1^2}{c_1^2}} = f(y;c_1). \tag{1.6.7}$$

By a separation of variables as described in case 7 of Paragraph 1.5, we obtain the solution

$$y(x) = c_1\cosh\left(\frac{x+c_2}{c_1}\right), \tag{1.6.8}$$

which for $c_1 > 0$ is a positive solution of the Euler-Lagrange equation. The function (1.6.8) is called a catenary since it describes a hanging chain, cf. Paragraph 2.3. The constants $c_1 > 0$ and $c_2$ have to be determined such that the boundary conditions y(0) = 1 and y(b) = B are fulfilled. It is not obvious that this is possible at all, and if so, that it is possible in a unique way. An experiment with a soap film between two rings shows that an increase of the distance b between the rings causes a contraction of the surface followed by a ripping of the film such that it finally forms two discs. That scenario is also described by the mathematics above.

Fig. 1.10 Transition to the Goldschmidt Solution



If b is large compared to B, the area of any surface of revolution is bigger than the area $\pi(1+B^2)$ of the two discs. This “Goldschmidt solution,” named after B. Goldschmidt (1807–1851) and sketched in Figure 1.10, is not an admitted solution of the original variational problem. The latter has no solution for large b, since (1.6.8) cannot fulfill the boundary conditions. If b decreases there exist two solutions (1.6.8), whose areas are still bigger than that of the Goldschmidt solution, but the upper solution generates a locally minimizing surface. If b decreases even more then the upper solution generates a globally minimizing surface with an area that is smaller than that of the Goldschmidt solution.
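The existence of zero or two catenaries can be explored numerically in a simple special case. The sketch below assumes the symmetric situation B = 1, which is not singled out in the text; then $c_2 = -b/2$ by symmetry, and the boundary conditions reduce to $c_1\cosh(b/(2c_1)) = 1$, i.e., with $u = b/(2c_1)$, to $\cosh(u)/u = 2/b$. The function names and the bisection brackets are choices made for this illustration.

```python
import math


def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming a sign change of f between lo and hi."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (flo <= 0) == (fmid <= 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def symmetric_catenaries(b):
    """All c1 > 0 with c1*cosh(b/(2*c1)) = 1, i.e. catenaries through (0,1) and (b,1).

    Assumes roughly 1e-8 < b; returns [] if no catenary fits, else [upper, lower].
    """
    # cosh(u)/u is minimal where u*sinh(u) = cosh(u); the minimizer lies in (1, 1.5)
    u_star = bisect(lambda u: u * math.sinh(u) - math.cosh(u), 1.0, 1.5)
    g = lambda u: math.cosh(u) / u - 2.0 / b
    if g(u_star) > 0:
        return []  # rings too far apart: boundary conditions cannot be met
    u_small = bisect(g, 1e-9, u_star)   # larger c1: the upper catenary
    u_large = bisect(g, u_star, 50.0)   # smaller c1: the lower catenary
    return [b / (2 * u_small), b / (2 * u_large)]


print(symmetric_catenaries(1.0))  # two solutions
print(symmetric_catenaries(2.0))  # no solution
```

In this symmetric sketch the transition between two solutions and none occurs at b roughly equal to 1.33, the value where 2/b equals the minimum of cosh(u)/u.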

A precise analysis of this problem can be found in [3], p. 80, p. 436, [2], p. 82, [13], p. 298.

1.7 Dido’s Problem

As explained in the Introduction, the variational problem can be reduced to the following: Which curve with prescribed length L in the upper half-plane having its endpoints on the x-axis encloses together with the x-axis a region of maximal area? Admitted, however, are only curves that are graphs of functions of x. In Paragraph 2.2 we revisit this problem in more generality.

Fig. 1.11 Dido’s Problem for Graphs of Functions

Since the endpoints are not fixed it is convenient to parametrize the curve by its arc length. If $y \in C^1[a,b] \cap \{y(a) = 0,\ y(b) = 0\}$ then the arc length from (a,0) to (x,y(x)) is given by

$$s(x) = \int_a^x \sqrt{1+(y'(\xi))^2}\,d\xi \quad\text{where } s(a) = 0,\ s(b) = L. \tag{1.7.1}$$

Since s(x) is strictly monotone the inverse function exists and is a continuously differentiable function $x : [0,L] \to [a,b]$. Then

$$\{(x,y(x)) \mid x \in [a,b]\} = \{(x(s),y(x(s))) \mid s \in [0,L]\} \tag{1.7.2}$$




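is the parametrization of the curve by its arc length, cf. Figure 1.11.

Next the area $\int_a^b y\,dx$ has to be expressed by the arc length:

$$\begin{aligned}
&\frac{dx}{ds}(s) = \frac{1}{\frac{ds}{dx}(x(s))} = \frac{1}{\sqrt{1+(y'(x(s)))^2}},\\
&\frac{d}{ds}y(x(s)) = y'(x(s))\frac{dx}{ds}(s), \text{ and thus}\\
&\frac{dx}{ds}(s) = \sqrt{1-\Bigl(\frac{d}{ds}y(x(s))\Bigr)^2}.
\end{aligned} \tag{1.7.3}$$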

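The area transforms according to the substitution formula:

$$\int_a^b y(x)\,dx = \int_0^L y(x(s))\frac{dx}{ds}(s)\,ds = \int_0^L y(x(s))\sqrt{1-\Bigl(\frac{d}{ds}y(x(s))\Bigr)^2}\,ds. \tag{1.7.4}$$

Denoting $\tilde y(s) = y(x(s))$ we obtain the functional

$$J(\tilde y) = \int_0^L \tilde y\sqrt{1-(\tilde y')^2}\,ds, \quad\text{defined on } \tilde D = C^1[0,L] \cap \{\tilde y(0) = 0,\ \tilde y(L) = 0\}, \tag{1.7.5}$$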

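which is to be maximized. We proceed as in the special case 7 of Paragraph 1.5. In view of

$$F_{\tilde y'\tilde y'}(\tilde y,\tilde y') = -\frac{\tilde y}{(1-(\tilde y')^2)^{3/2}} < 0 \quad\text{for } \tilde y > 0, \tag{1.7.6}$$

Exercise 1.5.1 implies that any positive solution of the Euler-Lagrange equation is in $C^2(0,L)$ and solves

$$F(\tilde y,\tilde y') - \tilde y'F_{\tilde y'}(\tilde y,\tilde y') = c_1 \quad\text{on } [0,L]. \tag{1.7.7}$$

For $F(\tilde y,\tilde y') = \tilde y\sqrt{1-(\tilde y')^2}$ this gives the equations

$$\tilde y = c_1\sqrt{1-(\tilde y')^2} \quad\text{and}\quad \tilde y' = \sqrt{1-\Bigl(\frac{\tilde y}{c_1}\Bigr)^2} = f(\tilde y;c_1), \tag{1.7.8}$$

which admit solutions (obtained by separation of variables)

$$\tilde y(s) = c_1\sin\left(\frac{s+c_2}{c_1}\right). \tag{1.7.9}$$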



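For $c_1 > 0$ and $0 < (s+c_2)/c_1 < \pi$, these are positive solutions of the Euler-Lagrange equation. The boundary conditions determine the constants as follows:

$$\begin{aligned}
\tilde y(0) &= c_1\sin\frac{c_2}{c_1} = 0, \quad c_2 = 0,\\
\tilde y(L) &= c_1\sin\frac{L}{c_1} = 0, \quad c_1 = \frac{L}{\pi},
\end{aligned} \tag{1.7.10}$$

since the solution must be positive. Finally, by (1.7.3),

$$\begin{aligned}
x(s) &= a + \int_0^s \frac{dx}{ds}(\sigma)\,d\sigma = a + \int_0^s \sqrt{1-(\tilde y'(\sigma))^2}\,d\sigma\\
&= a + \int_0^s \sin\frac{\pi}{L}\sigma\,d\sigma = a + \frac{L}{\pi} - \frac{L}{\pi}\cos\frac{\pi}{L}s,
\end{aligned} \tag{1.7.11}$$

and the curve

$$\left\{\Bigl(a + \frac{L}{\pi} - \frac{L}{\pi}\cos\frac{\pi}{L}s,\ \frac{L}{\pi}\sin\frac{\pi}{L}s\Bigr) \,\Big|\, s \in [0,L]\right\} \tag{1.7.12}$$

is a semi-circle having center $(a+\frac{L}{\pi},0)$, radius $\frac{L}{\pi}$, and length L, cf. Figure 1.12.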

Fig. 1.12 Solution of Dido’s Problem
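As a consistency check, the value of the functional (1.7.5) at the solution $\tilde y(s) = \frac{L}{\pi}\sin\frac{\pi}{L}s$ should equal the area $\frac{1}{2}\pi(L/\pi)^2 = L^2/(2\pi)$ of the semi-circle. The sketch below verifies this with a midpoint rule; the function name `dido_area` and the number of cells are assumptions of this illustration.

```python
import math


def dido_area(L, cells=20000):
    """Midpoint approximation of the integral of y*sqrt(1 - y'^2) over [0, L]
    for y(s) = (L/pi) * sin(pi*s/L)."""
    c1 = L / math.pi
    w = L / cells
    total = 0.0
    for k in range(cells):
        s = (k + 0.5) * w
        y = c1 * math.sin(s / c1)
        dy = math.cos(s / c1)
        # max(...) guards against tiny negative values caused by rounding
        total += y * math.sqrt(max(0.0, 1.0 - dy * dy))
    return w * total


L = 2.0
print(dido_area(L), L * L / (2.0 * math.pi))
```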

1.8 The Brachistochrone Problem of Johann Bernoulli

The problem was posed by Johann Bernoulli in 1696 as follows: Given two points A and B in a vertical plane, what is the curve traced out by a point mass M acted on only by gravity, which starts at A and reaches B in shortest time?

That curve is called brachistochrone.

We treat the problem following Euler and Lagrange, and we point out that Bernoulli could not know their results published more than 50 years later. Below we sketch Bernoulli’s arguments based on Fermat’s principle of least time for a beam of light and on Snell’s refraction law.



Fig. 1.13 An Admitted Curve for Bernoulli’s Problem

The physical problem to be solved first is to express the running time of the point mass in terms of its trajectory. Let the starting point of the trajectory be the origin (0,0) of a coordinate system whose y-axis points downwards.

We parametrize the trajectory by the time t: $\{(x(t),y(t))\,|\,t\in[0,T]\}$. Here $(x(0),y(0)) = (0,0)$, $(x(T),y(T)) = (b,B)$ is the endpoint and T is the running time. We assume continuous differentiability such that the length of the tangent $(\dot x(t),\dot y(t))$ is the velocity $v(t) = \sqrt{\dot x(t)^2+\dot y(t)^2}$. Then the arc length is given by

$$s(t) = \int_0^t \sqrt{\dot x(\tau)^2+\dot y(\tau)^2}\,d\tau = \int_0^t v(\tau)\,d\tau \quad\text{and}\quad \frac{ds}{dt}(t) = v(t). \tag{1.8.1}$$

By conservation of energy, the sum of kinetic and potential energy is constant along the curve:

$$\tfrac12 mv^2+mg(h_0-y) = mgh_0 \quad\text{and}\quad v = \sqrt{2gy}, \tag{1.8.2}$$

where m is the mass of M, g is the gravitational acceleration, and the constant $h_0$ is a fictitious height. For t = 0 we have y(0) = 0 and v(0) = 0.

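By “physical evidence” we assume that the trajectory can also be parametrized by x: $\{(x(t),y(t))\,|\,t\in[0,T]\} = \{(x,\tilde y(x))\,|\,x\in[0,b]\}$. Since the trajectory starts at x(0) = 0, the arc length is now

$$s(t) = \int_0^{x(t)} \sqrt{1+(\tilde y'(\xi))^2}\,d\xi \quad\text{and the velocity is}\quad \frac{ds}{dt}(t) = \sqrt{1+(\tilde y'(x(t)))^2}\,\dot x(t) = v(t) = \sqrt{2g\,\tilde y(x(t))}, \tag{1.8.3}$$

by (1.8.2)$_2$. This gives finally the running time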

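$$T = \int_0^T 1\,dt = \int_0^T \sqrt{\frac{1+(\tilde y'(x(t)))^2}{2g\,\tilde y(x(t))}}\;\dot x(t)\,dt = \int_0^b \sqrt{\frac{1+(\tilde y'(x))^2}{2g\,\tilde y(x)}}\,dx, \tag{1.8.4}$$

by the substitution formula. Omitting the factor $\frac{1}{\sqrt{2g}}$, the functional to be minimized is

$$J(\tilde y) = \int_0^b \sqrt{\frac{1+(\tilde y')^2}{\tilde y}}\,dx, \tag{1.8.5}$$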

which has the following anomalies: For physical reasons we expect $\tilde y'(0) = +\infty$, and since $\tilde y(0) = 0$, (1.8.5) is an improper integral. Finally, only positive functions are admitted such that the domain of definition of J is

$$D = C[0,b]\cap C^{1,pw}(0,b]\cap\{\tilde y(0)=0,\ \tilde y(b)=B\}\cap\{\tilde y>0 \text{ in } (0,b]\}\cap\{J(\tilde y)<\infty\}. \tag{1.8.6}$$

We compute the first variation on an interval $[\delta,b]$ where δ is small: For any δ > 0 there is a d > 0 such that $\tilde y(x)\ge d>0$ for $\tilde y\in D$ and $x\in[\delta,b]$. Choose $h\in C_0^{1,pw}[\delta,b]$ with a support $\mathrm{supp}(h)\subset[\delta,b]$. For any h there exists an ε > 0 such that $\tilde y+th\in D$ for $t\in(-\varepsilon,\varepsilon)$. For a local minimizer $\tilde y$ of J on D we obtain

$$\delta J(\tilde y)h = \int_\delta^b F_{\tilde y}(\tilde y,\tilde y')h+F_{\tilde y'}(\tilde y,\tilde y')h'\,dx = 0 \quad\text{for all } h\in C_0^{1,pw}[\delta,b]. \tag{1.8.7}$$

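Relation (1.8.7), in turn, implies that $\tilde y$ solves the Euler-Lagrange equation piecewise on $[\delta,b]$. In particular, by (1.4.3)$_1$,

$$\begin{aligned}
&f := F_{\tilde y'}(\tilde y,\tilde y')\in C[\delta,b] \quad\text{or}\\
&\frac{\tilde y'}{\sqrt{1+(\tilde y')^2}} = f\sqrt{\tilde y}\in C[\delta,b], \quad |f\sqrt{\tilde y}|<1 \quad\text{or}\\
&\tilde y' = \sqrt{\frac{f^2\tilde y}{1-f^2\tilde y}}\in C[\delta,b] \quad\text{and}\quad \tilde y\in C^1[\delta,b],
\end{aligned} \tag{1.8.8}$$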



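since $\tilde y\in C[\delta,b]$. Therefore the supplement of Proposition 1.4.1 applies, and since δ > 0 is arbitrarily small, $\tilde y$ solves the Euler-Lagrange equation in (0,b]. In view of

$$F_{\tilde y'\tilde y'}(\tilde y,\tilde y') = \frac{1}{\sqrt{\tilde y}}\,\frac{1}{(1+(\tilde y')^2)^{3/2}} > 0 \quad\text{on } (0,b], \tag{1.8.9}$$

Exercise 1.5.1 guarantees that $\tilde y\in C^2(0,b)$ and the method of the special case 7 of Paragraph 1.5 is applicable, yielding

$$F(\tilde y,\tilde y')-\tilde y'F_{\tilde y'}(\tilde y,\tilde y') = c_1 \text{ on } (0,b] \quad\text{and finally}\quad \tilde y' = \sqrt{\frac{2r-\tilde y}{\tilde y}} \quad\text{where } 2r = \frac{1}{c_1^2}>0. \tag{1.8.10}$$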
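The passage from the first to the second relation in (1.8.10) is a short computation. The following worked step is added for orientation; it writes $\tilde y$ for the graph function and uses only $F(\tilde y,\tilde y') = \sqrt{(1+(\tilde y')^2)/\tilde y}$ from (1.8.5).

```latex
\begin{align*}
F_{\tilde y'}(\tilde y,\tilde y')
  &= \frac{\tilde y'}{\sqrt{\tilde y}\,\sqrt{1+(\tilde y')^2}},\\
F - \tilde y'F_{\tilde y'}
  &= \frac{1+(\tilde y')^2-(\tilde y')^2}{\sqrt{\tilde y}\,\sqrt{1+(\tilde y')^2}}
   = \frac{1}{\sqrt{\tilde y\,(1+(\tilde y')^2)}} = c_1,\\
\tilde y\,(1+(\tilde y')^2) &= \frac{1}{c_1^2} = 2r
  \quad\Longrightarrow\quad
  (\tilde y')^2 = \frac{2r-\tilde y}{\tilde y}.
\end{align*}
```

In particular $c_1 > 0$, and the positive root in (1.8.10) corresponds to the descending part of the trajectory, where $\tilde y' \ge 0$ in the coordinate system with the y-axis pointing downwards.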

This differential equation is not solved by known special functions. We proceed in the same way as Bernoulli, i.e., we transform (1.8.10) into a parametric differential equation. To this purpose the parameter x is replaced by a new parameter τ which is not the physical time (Figure 1.13):

$$(x,\tilde y(x)) = (\hat x(\tau),\hat y(\tau)), \quad x\in[0,b],\ \tau\in[\tau_0,\tau_b]. \tag{1.8.11}$$

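The ansatz $\hat y(\tau) = r(1-\cos\tau)$ yields by (1.8.10)$_2$

$$\begin{aligned}
\hat y(\tau) &= \tilde y(\hat x(\tau)), \qquad \frac{d\hat y}{d\tau}(\tau) = \tilde y'(\hat x(\tau))\,\frac{d\hat x}{d\tau}(\tau),\\
\frac{d\hat x}{d\tau}(\tau) &= \frac{d\hat y}{d\tau}(\tau)\Big/\tilde y'(\hat x(\tau)) = \frac{d\hat y}{d\tau}(\tau)\sqrt{\frac{\hat y(\tau)}{2r-\hat y(\tau)}}\\
&= r\sin\tau\,\sqrt{\frac{1-\cos\tau}{1+\cos\tau}} = r(1-\cos\tau),
\end{aligned} \tag{1.8.12}$$

where we use $\sin\tau = \sqrt{1-\cos^2\tau}$. Integration gives

$$\hat x(\tau) = r(\tau-\sin\tau)+c_2, \qquad \hat y(\tau) = r(1-\cos\tau) \quad\text{for } \tau\in[\tau_0,\tau_b]. \tag{1.8.13}$$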

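The four constants are determined by the boundary conditions:

$$\begin{aligned}
\hat y(\tau_0) &= r(1-\cos\tau_0) = 0 \ \Rightarrow\ \tau_0 = 0,\\
\hat x(\tau_0) &= \hat x(0) = c_2 = 0,\\
\hat x(\tau_b) &= r(\tau_b-\sin\tau_b) = b,\\
\hat y(\tau_b) &= r(1-\cos\tau_b) = B.
\end{aligned} \tag{1.8.14}$$

Accordingly $\dfrac{b}{B} = \dfrac{\tau_b-\sin\tau_b}{1-\cos\tau_b} =: f(\tau_b)$. The function $f(\tau_b)$ for $\tau_b\in(0,2\pi)$ is sketched in Figure 1.14, cf. also Exercise 1.8.1: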



Fig. 1.14 Graph of the Function f

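The function $f(\tau_b)$ is strictly monotonically increasing, $f(0) = 0$ and $\lim_{\tau_b\to 2\pi} f(\tau_b) = +\infty$. This implies:

$$\begin{aligned}
&\text{There is precisely one } \tau_b\in(0,2\pi) \text{ such that } f(\tau_b) = \frac{b}{B} > 0,\\
&\frac{b}{B}\le\frac{\pi}{2} \ \Rightarrow\ \tau_b\le\pi, \qquad \frac{b}{B}>\frac{\pi}{2} \ \Rightarrow\ \tau_b\in(\pi,2\pi).
\end{aligned} \tag{1.8.15}$$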

The last constant r is determined by

$$r = \frac{B}{1-\cos\tau_b}. \tag{1.8.16}$$

The resulting brachistochrone,

$$\hat x(\tau) = r(\tau-\sin\tau), \qquad \hat y(\tau) = r(1-\cos\tau) \quad\text{for } \tau\in[0,\tau_b], \tag{1.8.17}$$

is a cycloid, depicted in Figure 1.15. A fixed point on the perimeter of a rolling wheel with radius r traces out a cycloid.
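Since f is strictly increasing, the parameter $\tau_b$ in (1.8.15) can be found by bisection, and r then follows from (1.8.16). The sketch below is one possible way to do this numerically; the function names `f` and `cycloid_parameters`, the bracket `[1e-3, 2*pi - 1e-3]` and the tolerance are assumptions made for illustration, not part of the text.

```python
import math

def f(tau):
    """f(tau) = (tau - sin tau) / (1 - cos tau) for 0 < tau < 2*pi."""
    return (tau - math.sin(tau)) / (1.0 - math.cos(tau))

def cycloid_parameters(b, B, tol=1e-12):
    """Return (tau_b, r) with r*(tau_b - sin tau_b) = b and r*(1 - cos tau_b) = B.

    The bracket stays away from 0 and 2*pi to avoid 0/0 in floating point,
    so b/B must lie between f(1e-3) and f(2*pi - 1e-3).
    """
    lo, hi = 1e-3, 2.0 * math.pi - 1e-3
    target = b / B
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:   # f is increasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
    tau_b = 0.5 * (lo + hi)
    r = B / (1.0 - math.cos(tau_b))   # (1.8.16)
    return tau_b, r

tau_b, r = cycloid_parameters(b=2.0, B=1.0)
print(tau_b, r)
```

For b/B = 2 > π/2 the computed $\tau_b$ lies in (π, 2π), in agreement with (1.8.15).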

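It is remarkable that for $\frac{b}{B}\le\frac{\pi}{2}$, i.e., for $\tau_b\le\pi$, the trajectory descends, whereas for $\frac{b}{B}>\frac{\pi}{2}$, i.e., for $\tau_b\in(\pi,2\pi)$, it passes its lowest point at $\tau = \pi$ and then ascends to the endpoint.

Fig. 1.15 A Cycloid

The parameter τ gives the physical running time t on a cycloid with parameter r as follows. By analogy to (1.8.4),

$$t_0 = \int_0^{x(t_0)} \sqrt{\frac{1+(\tilde y')^2}{2g\,\tilde y}}\,dx = \frac{1}{\sqrt{2g}}\int_0^{\tau_0} \sqrt{\frac{1+(\tilde y'(\hat x(\tau)))^2}{\tilde y(\hat x(\tau))}}\;\frac{d\hat x}{d\tau}(\tau)\,d\tau, \tag{1.8.18}$$

where we apply the substitution formula with $x(t_0) = \hat x(\tau_0)$. Using $\tilde y(\hat x(\tau)) = \hat y(\tau)$, the formulas (1.8.12), (1.8.17), and (1.8.18) give $t_0 = \sqrt{\frac{r}{g}}\,\tau_0$. Thus the relation between the parameter τ and the physical time t is

$$t = \sqrt{\frac{r}{g}}\;\tau, \quad\text{and in particular,}\quad T = \sqrt{\frac{r}{g}}\;\tau_b. \tag{1.8.19}$$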
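The evaluation of the integral (1.8.18) can be written out explicitly. The following worked step is added for orientation; it writes $\hat x, \hat y$ for the cycloid (1.8.17) and $\tilde y$ for the graph function, and uses (1.8.10) and (1.8.12).

```latex
\begin{align*}
1+(\tilde y'(\hat x(\tau)))^2
  &= 1+\frac{2r-\hat y(\tau)}{\hat y(\tau)} = \frac{2r}{\hat y(\tau)},\\
\sqrt{\frac{1+(\tilde y'(\hat x(\tau)))^2}{\tilde y(\hat x(\tau))}}\;\frac{d\hat x}{d\tau}(\tau)
  &= \frac{\sqrt{2r}}{\hat y(\tau)}\; r(1-\cos\tau)
   = \frac{\sqrt{2r}}{\hat y(\tau)}\;\hat y(\tau) = \sqrt{2r},\\
t_0 &= \frac{1}{\sqrt{2g}}\int_0^{\tau_0}\sqrt{2r}\,d\tau = \sqrt{\frac{r}{g}}\;\tau_0 .
\end{align*}
```

The integrand is constant in τ, which is why the parameter τ is proportional to the physical time.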

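We have to leave open whether the cycloid gives indeed the minimal value. For the functional (1.8.5) we have $J(\tilde y) = \sqrt{2g}\,T<\infty$, whence $\tilde y\in D$, cf. (1.8.6). Furthermore its tangent at (0,0) is vertical:

$$\frac{d\tilde y}{dx}(x) = \frac{d\hat y}{d\tau}(\tau)\Big/\frac{d\hat x}{d\tau}(\tau) \to +\infty \quad\text{for } x\searrow 0,\ \tau\searrow 0. \tag{1.8.20}$$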

Now we sketch how Bernoulli solved his variational problem. He didn’t have the results of Euler and Lagrange but he knew Fermat’s principle of least time. That principle, in turn, implies Snell’s refraction law (5) as described in the Introduction. Bernoulli discretized the continuous problem. In thin layers he assumed a constant velocity along straight line segments. Increasing velocities decrease the slopes according to Snell’s refraction law as sketched in Figure 1.16.



Fig. 1.16 On Bernoulli’s Solution of the Brachistochrone Problem

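Snell’s law gives

$$\frac{\sin\alpha_k}{\sin\alpha_{k+1}} = \frac{v_k}{v_{k+1}}, \quad \frac{\sin\alpha_{k+1}}{\sin\alpha_{k+2}} = \frac{v_{k+1}}{v_{k+2}} \ \text{ etc.} \quad\text{or}\quad \frac{v_k}{\sin\alpha_k} = c \ \text{ for all } k. \tag{1.8.21}$$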

The slope of the line segment in the kth layer is $y_k' = \tan\big(\frac{\pi}{2}-\alpha_k\big)$, where it has to be taken into account that the y-axis points downwards. Well-known formulas of trigonometry imply

$$\sqrt{1+(y_k')^2} = \sqrt{1+\tan^2\Big(\frac{\pi}{2}-\alpha_k\Big)} = \frac{1}{\cos(\frac{\pi}{2}-\alpha_k)} = \frac{1}{\sin\alpha_k}, \tag{1.8.22}$$

and using $v_k = \sqrt{2gy_k}$ from (1.8.2)$_2$ and (1.8.21)$_2$ one obtains

$$\sqrt{2gy_k(1+(y_k')^2)} = c \quad\text{or}\quad y_k' = \sqrt{\frac{2r-y_k}{y_k}} \quad\text{where } 2r = \frac{c^2}{2g}. \tag{1.8.23}$$

A transition from the discretized to the continuous problem yields finally the differential equation (1.8.10)$_2$, whose parametric form is solved by a cycloid.

Remark. Bernoulli did not prove that the cycloid is indeed the curve that a point mass traces out to reach a given endpoint in shortest time. For him and his contemporaries this was evident. As a matter of fact, his proof using Snell’s refraction law gives stronger evidence than a proof using the Euler-Lagrange equation. Weierstraß proved the minimizing property of the cycloid much later, cf. [2], p. 46.

Exercises

1.8.1. Analyze the function $f(\tau) = \dfrac{\tau-\sin\tau}{1-\cos\tau}$ for $\tau\in[0,2\pi)$.

1.8.2. Let (0,0) be the starting point and $(b,B) = (b,\frac{2}{\pi}b)$ be the endpoint in a coordinate system {(x,y)} where the y-axis points downwards. Compare the running time of a point mass acted on by gravity on the line segment and on the cycloid from (0,0) to (b,B). Compute the ratio of the running times.

1.9 Natural Boundary Conditions

Let the functional

$$J(y) = \int_a^b F(x,y,y')\,dx \tag{1.9.1}$$

be defined on $D = C^{1,pw}[a,b]$. Then for any y ∈ D, y + th ∈ D for all $h\in C^{1,pw}[a,b]$ and for all t ∈ ℝ. Under the hypotheses of Proposition 1.2.2 the first variation exists in all y and in all directions h in $D = C^{1,pw}[a,b]$ and is given by

$$\delta J(y)h = \int_a^b F_y(x,y,y')h+F_{y'}(x,y,y')h'\,dx. \tag{1.9.2}$$



For a local minimizer δJ(y)h = 0 for all h ∈ D and therefore also for all $h\in C_0^{1,pw}[a,b]\subset D$. By Proposition 1.4.1 y solves the Euler-Lagrange equation (1.4.3), and, in addition, y fulfills the natural boundary conditions:

Proposition 1.9.1. Under the hypotheses of Proposition 1.4.1 a local minimizer $y\in C^{1,pw}[a,b]$ of the functional (1.9.1) solves the Euler-Lagrange equation (1.4.3) and fulfills the natural boundary conditions

$$F_{y'}(a,y(a),y'(a)) = 0 \quad\text{and}\quad F_{y'}(b,y(b),y'(b)) = 0. \tag{1.9.3}$$

Proof. For y ∈ D, the statements (1.4.3) hold, and integration by parts, cf. Lemma 1.3.3, yields

$$\delta J(y)h = \int_a^b F_yh+F_{y'}h'\,dx = \int_a^b \Big(F_y-\frac{d}{dx}F_{y'}\Big)h\,dx+F_{y'}h\,\Big|_a^b, \tag{1.9.4}$$

where we use the abbreviations $F_y = F_y(\cdot,y,y')$ and $F_{y'} = F_{y'}(\cdot,y,y')$. By the Euler-Lagrange equation in its weak form (1.4.4) and its strong form (1.4.3), relation (1.9.4) implies

$$F_{y'}(b,y(b),y'(b))h(b)-F_{y'}(a,y(a),y'(a))h(a) = 0. \tag{1.9.5}$$

Choosing h(b) = 0 and h(a) ≠ 0 proves the natural boundary condition at x = a, and h(a) = 0, h(b) ≠ 0 proves it at x = b. □

Remark. If for a local minimizer y ∈ D, one boundary condition y(a) = A or y(b) = B is prescribed, then at the respective free boundary the natural boundary condition is fulfilled.

Examples:

1. The functional $J(y) = \int_a^b \sqrt{1+(y')^2}\,dx$ defined on $D = C^{1,pw}[a,b]$ describes the length of a graph {(x,y(x)) | x ∈ [a,b]} between the vertical lines x = a and x = b. In Example 4 of Paragraph 1.5 we have seen that a minimizer of J is among the lines $y(x) = c_1x+c_2$. The natural boundary conditions are

$$F_{y'}(a,y(a),y'(a)) = \frac{y'(a)}{\sqrt{1+(y'(a))^2}} = 0 \quad\text{whence } y'(a) = 0,$$

and analogously y′(b) = 0. Minimizers are therefore $y(x) = c_2$.

2. What is the curve traced out by a point mass acted on only by gravity which starts at (0,0) and reaches the vertical line x = b in shortest time?



Fig. 1.17 Admitted Curves for Example 2

The functional (1.8.5) is defined on (1.8.6) without a boundary condition at x = b. A minimizer fulfills the Euler-Lagrange equation (1.8.10) with the boundary condition $\tilde y(0) = 0$. The natural boundary condition at x = b is

$$\frac{\tilde y'(b)}{\sqrt{\tilde y(b)\,(1+(\tilde y'(b))^2)}} = 0 \quad\text{whence } \tilde y'(b) = 0.$$

After transformation into a parametric differential equation, solutions are again cycloids (1.8.17), and the natural boundary condition for $\hat y(\tau) = \tilde y(\hat x(\tau))$ implies

$$\frac{d\hat y}{d\tau}(\tau_b) = \tilde y'(\hat x(\tau_b))\,\frac{d\hat x}{d\tau}(\tau_b) = \tilde y'(b)\,\frac{d\hat x}{d\tau}(\tau_b) = 0, \quad\text{whence}\quad r\sin\tau_b = 0.$$

This gives $\tau_b = \pi$, and by $\hat x(\tau_b) = r(\pi-\sin\pi) = r\pi = b$, we obtain the cycloid

$$\hat x(\tau) = \frac{b}{\pi}(\tau-\sin\tau), \qquad \hat y(\tau) = \frac{b}{\pi}(1-\cos\tau) \quad\text{for } \tau\in[0,\pi],$$

meeting the vertical line at $(b,\frac{2}{\pi}b)$ orthogonally (Figure 1.17).
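The value $\tau_b = \pi$ can be cross-checked by a direct search among cycloids. Every cycloid (1.8.17) that reaches the line x = b at parameter $\tau_b$ has $r = b/(\tau_b-\sin\tau_b)$ and, by (1.8.19), running time $T = \sqrt{r/g}\,\tau_b$. The sketch below scans $\tau_b$ on a grid; the function name `running_time`, the values `b = 1.0`, `g = 9.81` and the grid spacing are assumptions made for illustration.

```python
import math

def running_time(tau_b, b=1.0, g=9.81):
    """T = sqrt(r/g) * tau_b on the cycloid with r*(tau_b - sin tau_b) = b."""
    r = b / (tau_b - math.sin(tau_b))
    return math.sqrt(r / g) * tau_b

# Scan tau_b over a grid in (0, 2*pi) and pick the smallest running time.
grid = [0.01 * k for k in range(10, 629)]
best = min(grid, key=running_time)
print(best, running_time(best), running_time(math.pi))
```

The grid minimum lies next to π, and the corresponding time agrees with $\sqrt{\pi b/g}$ up to the grid resolution. This restricts the comparison to cycloids and is therefore only a consistency check of the natural boundary condition, not a proof of minimality.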



Exercises

1.9.1. a) Find all solutions of the Euler-Lagrange equation of the functional

$$J(y) = \int_0^1 (y')^2+y\,dx$$

on $C^{1,pw}[0,1]$.
b) Which solutions fulfill the natural boundary conditions?
c) Which solutions fulfill the boundary conditions y(0) = 0, y(1) = 1?
d) Which solutions are local extremals without and with boundary conditions? Which are globally extremal?

1.9.2. Does the functional

$$J(y) = \int_a^b (y')^2+\arctan y\,dx$$

have local or global extremals in $C^{1,pw}[a,b]$? Is the functional bounded below?

1.10 Functionals in Parametric Form

In some cases we have parametrized graphs {(x,y(x)) | x ∈ [a,b]} of functions like $\{(x(t),y(t))\,|\,t\in[t_a,t_b]\}$. Apparently the class of parametrized curves is bigger than the class of graphs, cf. Figure 1.18.

But even if the admitted curves are graphs, a parametrization can give more information about minimizing curves. This will be demonstrated in the next paragraph.

Fig. 1.18 A Graph of a Function and a Parametrized Curve



Definition 1.10.1. A functional

$$J(x,y) = \int_{t_a}^{t_b} \Phi(x,y,\dot x,\dot y)\,dt, \quad\text{defined on } D\subset(C^{1,pw}[t_a,t_b])^2 \text{ with a continuous function } \Phi:\mathbb{R}^4\to\mathbb{R}, \tag{1.10.1}$$

is called a functional in parametric form or shortly a parametric functional. Curves in D may have to fulfill boundary conditions at $t = t_a$ or at $t = t_b$.

Without loss of generality we assume that each component of $(x,y)\in(C^{1,pw}[t_a,t_b])^2 = C^{1,pw}[t_a,t_b]\times C^{1,pw}[t_a,t_b]$ has the same partition $t_a = t_0<t_1<\cdots<t_m = t_b$ and both components x and y are in $C^1[t_{i-1},t_i]$, i = 1,…,m. The integral (1.10.1) is defined as a sum of integrals over $[t_{i-1},t_i]$, cf. (1.1.3).

Two examples: $\int_{t_a}^{t_b}\sqrt{\dot x^2+\dot y^2}\,dt$ is the length of a parametrized curve (x,y) ∈ D.

The Lagrange function $L(x,y,\dot x,\dot y) = \frac12m(\dot x^2+\dot y^2)-V(x,y)$, with a potential $V:\mathbb{R}^2\to\mathbb{R}$, is called the free energy and $\int_{t_a}^{t_b}L(x,y,\dot x,\dot y)\,dt$ is called the action of the point mass m along the curve $\{(x(t),y(t))\,|\,t\in[t_a,t_b]\}$.

There is an important difference between these two examples: Whereas the length of a curve does not depend on its parametrization, the physical quantities like the kinetic energy and the action along a curve require its parametrization by physical time. The following definition takes this difference into account:

Definition 1.10.2. A parametric functional (1.10.1) is invariant if

$$\Phi(x,y,\alpha\dot x,\alpha\dot y) = \alpha\,\Phi(x,y,\dot x,\dot y) \tag{1.10.2}$$

for all α > 0 and for all $(x,y,\dot x,\dot y)\in\mathbb{R}^4$.

Under the assumption (1.10.2), the functional (1.10.1) is invariant when reparametrized. To this purpose we define:

Definition 1.10.3. Let the function $\phi\in C^1[\tau_a,\tau_b]$ fulfill $\phi(\tau_a) = t_a$, $\phi(\tau_b) = t_b$, and $\frac{d\phi}{d\tau}(\tau)>0$ for all $\tau\in[\tau_a,\tau_b]$ (with one-sided derivatives at the boundary).

Then $\phi:[\tau_a,\tau_b]\to[t_a,t_b]$ is bijective and $\{(x(\phi(\tau)),y(\phi(\tau))) = (\tilde x(\tau),\tilde y(\tau))\,|\,\tau\in[\tau_a,\tau_b]\}$ is a reparametrization of the curve $\{(x(t),y(t))\,|\,t\in[t_a,t_b]\}$.

If $(x,y)\in(C^{1,pw}[t_a,t_b])^2$ (with boundary conditions at $t = t_a$ or $t = t_b$) then $(\tilde x,\tilde y)\in(C^{1,pw}[\tau_a,\tau_b])^2$ (with boundary conditions at $\tau = \tau_a$ or $\tau = \tau_b$).



Proposition 1.10.1. If the parametric functional is invariant according to Definition 1.10.2, then for any reparametrization, it follows that

$$J(\tilde x,\tilde y) = \int_{\tau_a}^{\tau_b} \Phi\Big(\tilde x,\tilde y,\frac{d\tilde x}{d\tau},\frac{d\tilde y}{d\tau}\Big)\,d\tau = J(x,y). \tag{1.10.3}$$

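Proof. By analogy with (1.1.3) the integral (1.10.1) is the sum of integrals over $[t_{i-1},t_i]$ where $(x,y)\in(C^1[t_{i-1},t_i])^2$. Then $(\tilde x,\tilde y)\in(C^1[\tau_{i-1},\tau_i])^2$ where $\phi(\tau_i) = t_i$, i = 0,…,m and $J(\tilde x,\tilde y)$ is the sum of integrals over $[\tau_{i-1},\tau_i]$. For any summand, we apply (1.10.2), and the substitution formula yields

$$\begin{aligned}
&\int_{\tau_{i-1}}^{\tau_i} \Phi\Big(\tilde x,\tilde y,\frac{d\tilde x}{d\tau},\frac{d\tilde y}{d\tau}\Big)\,d\tau\\
&\quad= \int_{\tau_{i-1}}^{\tau_i} \Phi\Big(x(\phi(\tau)),y(\phi(\tau)),\dot x(\phi(\tau))\frac{d\phi}{d\tau}(\tau),\dot y(\phi(\tau))\frac{d\phi}{d\tau}(\tau)\Big)\,d\tau\\
&\quad= \int_{\tau_{i-1}}^{\tau_i} \Phi\big(x(\phi(\tau)),y(\phi(\tau)),\dot x(\phi(\tau)),\dot y(\phi(\tau))\big)\frac{d\phi}{d\tau}(\tau)\,d\tau\\
&\quad= \int_{t_{i-1}}^{t_i} \Phi(x,y,\dot x,\dot y)\,dt.
\end{aligned} \tag{1.10.4}$$

□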
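Proposition 1.10.1 can be illustrated numerically. The sketch below compares an invariant Lagrange function (the length integrand) with a non-invariant one (a kinetic-energy integrand with m = 1 and V = 0) on a quarter circle and on one reparametrization of it. The curve, the map `phi` and all helper names are assumptions chosen for illustration; both integrands here depend only on the velocities.

```python
import math

def integrate(func, a, b, n=4000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def J(Phi, xdot, ydot, a, b):
    """Parametric functional for a Lagrange function Phi(xdot, ydot)."""
    return integrate(lambda t: Phi(xdot(t), ydot(t)), a, b)

length = lambda xd, yd: math.hypot(xd, yd)        # satisfies (1.10.2)
kinetic = lambda xd, yd: 0.5 * (xd**2 + yd**2)    # does not satisfy (1.10.2)

# Curve (cos t, sin t) for t in [0, pi/2] and its velocity
dx = lambda t: -math.sin(t)
dy = lambda t: math.cos(t)

# Reparametrization t = phi(tau), tau in [0, 1], with phi' > 0
phi = lambda tau: math.pi / 4 * (tau + tau**2)
dphi = lambda tau: math.pi / 4 * (1 + 2 * tau)
xt_dot = lambda tau: dx(phi(tau)) * dphi(tau)     # chain rule
yt_dot = lambda tau: dy(phi(tau)) * dphi(tau)

print(J(length, dx, dy, 0.0, math.pi / 2), J(length, xt_dot, yt_dot, 0.0, 1.0))
print(J(kinetic, dx, dy, 0.0, math.pi / 2), J(kinetic, xt_dot, yt_dot, 0.0, 1.0))
```

The two length values agree (both are π/2), while the two kinetic-energy integrals differ, which is the distinction that Definition 1.10.2 captures.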

Proposition 1.10.1 is applicable to $J(x,y) = \int_{t_a}^{t_b}\sqrt{\dot x^2+\dot y^2}\,dt$, and therefore the length of a curve (x,y) does not depend on its parametrization.

Next we define the first variation for parametric functionals. Like in Paragraph 1.2 we assume that for $(x,y)\in D\subset(C^{1,pw}[t_a,t_b])^2$, the perturbation $(x,y)+s(h_1,h_2)\in D$, where $h = (h_1,h_2)\in(C_0^{1,pw}[t_a,t_b])^2$ and $s\in(-\varepsilon,\varepsilon)$. The Gâteaux differential $dJ(x,y,h_1,h_2)$ is given by g′(0) for $g(s) = J((x,y)+s(h_1,h_2))$, provided the derivative exists. This is the case under the assumptions of the following Proposition:

Proposition 1.10.2. Let the parametric functional (1.10.1) be defined on $D\subset(C^{1,pw}[t_a,t_b])^2$ and let its Lagrange function $\Phi:\mathbb{R}^4\to\mathbb{R}$ be continuously partially differentiable with respect to all four variables (or continuously totally differentiable). Then for all (x,y) ∈ D and all $h = (h_1,h_2)\in(C_0^{1,pw}[t_a,t_b])^2$, the Gâteaux differential exists and it is represented by

$$dJ(x,y,h_1,h_2) = \int_{t_a}^{t_b} \Phi_xh_1+\Phi_yh_2+\Phi_{\dot x}\dot h_1+\Phi_{\dot y}\dot h_2\,dt. \tag{1.10.5}$$

Here $\Phi_x,\Phi_y,\Phi_{\dot x},\Phi_{\dot y}$ denote the partial derivatives with respect to the four variables and in (1.10.5) we use the abbreviations $\Phi_x = \Phi_x(x,y,\dot x,\dot y)$ etc.

Proof. The proof is a minor modification of the proof of Proposition 1.2.1. □



Since the Gâteaux differential (1.10.5) is linear in $h = (h_1,h_2)$, we call it as in Definition 1.2.2 the first variation of J in (x,y) in direction $h = (h_1,h_2)$, denoted by

$$dJ(x,y,h_1,h_2) = \delta J(x,y)h. \tag{1.10.6}$$

Finally, if J is defined on all of $(C^{1,pw}[t_a,t_b])^2$ then

$$\begin{aligned}
&J:(C^{1,pw}[t_a,t_b])^2\to\mathbb{R} \text{ is continuous, and}\\
&\delta J(x,y):(C^{1,pw}[t_a,t_b])^2\to\mathbb{R} \text{ is linear and continuous.}
\end{aligned} \tag{1.10.7}$$

If a curve $(x,y)\in D\subset(C^{1,pw}[t_a,t_b])^2$ is a local minimizer of J according to Definition 1.4.1, meaning

$$\begin{aligned}
&J(x,y)\le J(\tilde x,\tilde y) \text{ for all } (\tilde x,\tilde y)\in D, \text{ where}\\
&\|x-\tilde x\|_{1,pw,[t_a,t_b]}<d \text{ and } \|y-\tilde y\|_{1,pw,[t_a,t_b]}<d,
\end{aligned} \tag{1.10.8}$$

then the function $g(s) = J((x,y)+s(h_1,h_2))$ has a local minimum at s = 0, whence g′(0) = 0. This implies, cf. Proposition 1.4.1:

Proposition 1.10.3. Let the curve $(x,y)\in D\subset(C^{1,pw}[t_a,t_b])^2$ be a local minimizer of the parametric functional (1.10.1) and let the Lagrange function $\Phi:\mathbb{R}^4\to\mathbb{R}$ be continuously totally differentiable. Then

$$\begin{aligned}
&(\Phi_{\dot x}(x,y,\dot x,\dot y),\Phi_{\dot y}(x,y,\dot x,\dot y))\in(C^{1,pw}[t_a,t_b])^2 \text{ and}\\
&\frac{d}{dt}\Phi_{\dot x}(x,y,\dot x,\dot y) = \Phi_x(x,y,\dot x,\dot y) \text{ piecewise on } [t_a,t_b],\\
&\frac{d}{dt}\Phi_{\dot y}(x,y,\dot x,\dot y) = \Phi_y(x,y,\dot x,\dot y) \text{ piecewise on } [t_a,t_b].
\end{aligned} \tag{1.10.9}$$

Proof. We follow the proof of Proposition 1.4.1. In view of g′(0) = 0, (1.10.5) yields

$$\delta J(x,y)h = \int_{t_a}^{t_b} \Phi_xh_1+\Phi_yh_2+\Phi_{\dot x}\dot h_1+\Phi_{\dot y}\dot h_2\,dt = 0 \tag{1.10.10}$$

for all $h = (h_1,h_2)\in(C_0^{1,pw}[t_a,t_b])^2$. Choosing $h_2\equiv 0$ and $h_1\equiv 0$, respectively, we obtain

$$\begin{aligned}
&\int_{t_a}^{t_b} \Phi_xh_1+\Phi_{\dot x}\dot h_1\,dt = 0 \text{ for all } h_1\in C_0^{1,pw}[t_a,t_b],\\
&\int_{t_a}^{t_b} \Phi_yh_2+\Phi_{\dot y}\dot h_2\,dt = 0 \text{ for all } h_2\in C_0^{1,pw}[t_a,t_b].
\end{aligned} \tag{1.10.11}$$

The claim (1.10.9) is a consequence of Lemma 1.3.4. □

Equations (1.10.9) are the Euler-Lagrange equations for a locally minimizing curve of the parametric functional (1.10.1). Our remarks following Proposition 1.4.1 are valid here as well.



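For an invariant parametric functional (1.10.1), any local minimizer $(x,y)\in D\subset(C^{1,pw}[t_a,t_b])^2$ remains a local minimizer under reparametrization $(\tilde x,\tilde y)\in\tilde D\subset(C^{1,pw}[\tau_a,\tau_b])^2$, cf. Proposition 1.10.1. Consequently, $(\tilde x,\tilde y)$ fulfills the same Euler-Lagrange equations as does (x,y).

The following proposition states the invariance of the Euler-Lagrange equations.

Proposition 1.10.4. If the parametric functional (1.10.1) is invariant according to Definition 1.10.2, then for any reparametrization $(\tilde x(\tau),\tilde y(\tau)) = (x(\phi(\tau)),y(\phi(\tau)))$ of a curve $(x,y)\in(C^{1,pw}[t_a,t_b])^2$ according to Definition 1.10.3, the following equivalence holds for a continuously totally differentiable Lagrange function Φ:

$$\begin{aligned}
&\Big(\Phi_{\dot x}\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big),\Phi_{\dot y}\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big)\Big)\in(C^{1,pw}[\tau_a,\tau_b])^2,\\
&\frac{d}{d\tau}\Phi_{\dot x}\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big) = \Phi_x\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big),\\
&\frac{d}{d\tau}\Phi_{\dot y}\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big) = \Phi_y\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big),
\end{aligned} \tag{1.10.12}$$

holds piecewise on $[\tau_a,\tau_b]$, respectively, if and only if

$$\begin{aligned}
&(\Phi_{\dot x}(x,y,\dot x,\dot y),\Phi_{\dot y}(x,y,\dot x,\dot y))\in(C^{1,pw}[t_a,t_b])^2,\\
&\frac{d}{dt}\Phi_{\dot x}(x,y,\dot x,\dot y) = \Phi_x(x,y,\dot x,\dot y),\\
&\frac{d}{dt}\Phi_{\dot y}(x,y,\dot x,\dot y) = \Phi_y(x,y,\dot x,\dot y),
\end{aligned} \tag{1.10.13}$$

holds piecewise on $[t_a,t_b]$, respectively.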

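Proof. By differentiation, relation (1.10.2) gives

$$\begin{aligned}
\Phi_x(x,y,\alpha\dot x,\alpha\dot y) &= \alpha\,\Phi_x(x,y,\dot x,\dot y),\\
\Phi_y(x,y,\alpha\dot x,\alpha\dot y) &= \alpha\,\Phi_y(x,y,\dot x,\dot y),\\
\Phi_{\dot x}(x,y,\alpha\dot x,\alpha\dot y) &= \Phi_{\dot x}(x,y,\dot x,\dot y),\\
\Phi_{\dot y}(x,y,\alpha\dot x,\alpha\dot y) &= \Phi_{\dot y}(x,y,\dot x,\dot y),
\end{aligned} \tag{1.10.14}$$

where the last two equations are divided by α > 0. Inserting $\tilde x(\tau) = x(\phi(\tau))$, $\frac{d}{d\tau}\tilde x(\tau) = \dot x(\phi(\tau))\frac{d\phi}{d\tau}(\tau)$, $\tilde y(\tau) = y(\phi(\tau))$, and $\frac{d}{d\tau}\tilde y(\tau) = \dot y(\phi(\tau))\frac{d\phi}{d\tau}(\tau)$ into (1.10.14)$_{3,4}$ yields

$$\begin{aligned}
\Phi_{\dot x}\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big)(\tau) &= \Phi_{\dot x}(x,y,\dot x,\dot y)(\phi(\tau)),\\
\Phi_{\dot y}\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big)(\tau) &= \Phi_{\dot y}(x,y,\dot x,\dot y)(\phi(\tau)),
\end{aligned} \tag{1.10.15}$$

which proves the equivalence of (1.10.12)$_1$ and (1.10.13)$_1$. By (1.10.15)$_1$ and (1.10.14)$_1$ we obtain

$$\begin{aligned}
\frac{d}{d\tau}\Phi_{\dot x}\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big)(\tau) &= \frac{d}{dt}\Phi_{\dot x}(x,y,\dot x,\dot y)(\phi(\tau))\,\frac{d\phi}{d\tau}(\tau),\\
\Phi_x\big(\tilde x,\tilde y,\tfrac{d}{d\tau}\tilde x,\tfrac{d}{d\tau}\tilde y\big)(\tau) &= \Phi_x(x,y,\dot x,\dot y)(\phi(\tau))\,\frac{d\phi}{d\tau}(\tau).
\end{aligned} \tag{1.10.16}$$

Since $\frac{d\phi}{d\tau}(\tau)>0$ for all $\tau\in[\tau_a,\tau_b]$, the relations (1.10.16) prove the equivalence of (1.10.12)$_2$ and (1.10.13)$_2$. Analogously the equivalence of (1.10.12)$_3$ and (1.10.13)$_3$ follows from (1.10.15)$_2$ and (1.10.14)$_2$. □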

Remark. Be careful with reparametrizations of parametric functionals. If, for instance, a curve is parametrized by its arc length, then the Lagrange function of the parametric functional describing the length of the curve simplifies to the constant 1, cf. (2.6.6). But now the parameter interval depends on the individual curve.

The natural boundary conditions on local minimizers read as follows:

Proposition 1.10.5. Let $(x,y)\in D\subset(C^{1,pw}[t_a,t_b])^2$ be a local minimizer of the parametric functional (1.10.1) with continuously totally differentiable Lagrange function Φ. If the components x and/or y are free at $t = t_a$ and/or $t = t_b$, then they fulfill the natural boundary conditions

$$\begin{aligned}
&\Phi_{\dot x}(x(t_a),y(t_a),\dot x(t_a),\dot y(t_a)) = 0 \text{ and/or}\\
&\Phi_{\dot y}(x(t_a),y(t_a),\dot x(t_a),\dot y(t_a)) = 0 \text{ and/or}\\
&\Phi_{\dot x}(x(t_b),y(t_b),\dot x(t_b),\dot y(t_b)) = 0 \text{ and/or}\\
&\Phi_{\dot y}(x(t_b),y(t_b),\dot x(t_b),\dot y(t_b)) = 0, \text{ respectively.}
\end{aligned} \tag{1.10.17}$$

Proof. We consider only the case when x is free at $t = t_a$. Then $(x,y)+s(h_1,h_2)\in D$ for any $h = (h_1,h_2)\in(C^{1,pw}[t_a,t_b])^2$ with arbitrary $h_1(t_a)$ but satisfying $h_1(t_b) = 0$, $h_2(t_a) = 0$, $h_2(t_b) = 0$. Since δJ(x,y)h = 0 we obtain after integration by parts, allowed by (1.10.9)$_1$,

$$\begin{aligned}
&\int_{t_a}^{t_b} \Phi_xh_1+\Phi_yh_2+\Phi_{\dot x}\dot h_1+\Phi_{\dot y}\dot h_2\,dt\\
&\quad= \int_{t_a}^{t_b} \Big(\Phi_x-\frac{d}{dt}\Phi_{\dot x}\Big)h_1+\Big(\Phi_y-\frac{d}{dt}\Phi_{\dot y}\Big)h_2\,dt-(\Phi_{\dot x}h_1)(t_a) = 0.
\end{aligned} \tag{1.10.18}$$

All other boundary terms vanish by the choice of $h_1$ and $h_2$. Since the local minimizer fulfills the Euler-Lagrange equations (1.10.9)$_{2,3}$, only the boundary term in (1.10.18) remains. If $h_1(t_a)\ne 0$, we obtain (1.10.17)$_1$. □




It is not necessary to confine ourselves to plane curves. Furthermore the Lagrange function can depend explicitly on the parameter. We generalize:

A curve in $\mathbb{R}^n$, n ∈ ℕ, is given by $\{x(t) = (x_1(t),\dots,x_n(t))\,|\,t\in[t_a,t_b]\}$. A functional

$$J(x) = \int_{t_a}^{t_b} \Phi(t,x,\dot x)\,dt, \tag{1.10.19}$$

with a continuous Lagrange function $\Phi:[t_a,t_b]\times\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ can be defined on $D\subset(C^{1,pw}[t_a,t_b])^n$. We call it a parametric functional for curves in $\mathbb{R}^n$. Boundary conditions can be imposed for all or only some components of x ∈ D.

If Φ does not depend explicitly on the parameter and if

$$\Phi(x,\alpha\dot x) = \alpha\,\Phi(x,\dot x) \tag{1.10.20}$$

for all α > 0 and for all $(x,\dot x)\in\mathbb{R}^n\times\mathbb{R}^n$, then the parametric functional (1.10.19) is called invariant. The proof of Proposition 1.10.1 for n > 2 again shows that any reparametrization of the curve x according to Definition 1.10.3 leaves the functional invariant.

If $\Phi:[t_a,t_b]\times\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ is continuously partially differentiable with respect to the last 2n variables, then the first variation of J in x and in direction $h = (h_1,\dots,h_n)\in(C_0^{1,pw}[t_a,t_b])^n$ exists and it is represented by

$$\delta J(x)h = \int_{t_a}^{t_b} \sum_{k=1}^n (\Phi_{x_k}h_k+\Phi_{\dot x_k}\dot h_k)\,dt = \int_{t_a}^{t_b} (\Phi_x,h)+(\Phi_{\dot x},\dot h)\,dt. \tag{1.10.21}$$

In (1.10.21) we use the abbreviations $\Phi_x = (\Phi_{x_1},\dots,\Phi_{x_n})$, $\Phi_{\dot x} = (\Phi_{\dot x_1},\dots,\Phi_{\dot x_n})$, and also employ the Euclidean scalar product ( , ) in $\mathbb{R}^n$. The argument of $\Phi_{x_k}$ and of $\Phi_{\dot x_k}$ is the vector $(t,x(t),\dot x(t))\in[t_a,t_b]\times\mathbb{R}^n\times\mathbb{R}^n$.

The distance between two curves $x,\tilde x\in(C^{1,pw}[t_a,t_b])^n$ is defined by $\max_{k\in\{1,\dots,n\}}\|x_k-\tilde x_k\|_{1,pw,[t_a,t_b]}$ which, in turn, allows us to define a local minimizer of J as in Definition 1.4.1. For a local minimizer $x\in D\subset(C^{1,pw}[t_a,t_b])^n$, the first variation vanishes, i.e., δJ(x)h = 0 for all $h\in(C_0^{1,pw}[t_a,t_b])^n$, and as expounded in Proposition 1.10.3 for n = 2, this implies the system of Euler-Lagrange equations

$$\Phi_{\dot x}(\cdot,x,\dot x)\in(C^{1,pw}[t_a,t_b])^n, \text{ and}\quad \frac{d}{dt}\Phi_{\dot x}(\cdot,x,\dot x) = \Phi_x(\cdot,x,\dot x) \text{ piecewise on } [t_a,t_b]. \tag{1.10.22}$$

The proof of Proposition 1.10.4 can be extended to this system: If the parametric functional does not depend explicitly on the parameter, and if it is invariant in the sense of (1.10.20), then the system of Euler-Lagrange equations is invariant under reparametrizations.

Finally, there is the analogue of Proposition 1.10.5: If the kth component $x_k$ of a local minimizer x is free at $t = t_a$ and/or at $t = t_b$, then it fulfills the natural boundary conditions

$$\Phi_{\dot x_k}(t_a,x(t_a),\dot x(t_a)) = 0 \ \text{ and/or } \ \Phi_{\dot x_k}(t_b,x(t_b),\dot x(t_b)) = 0, \text{ respectively.} \tag{1.10.23}$$

Among the historically first variational problems, we find physical examples, in particular in Lagrangian mechanics:

Let $\{x(t) = (x_1(t),x_2(t),x_3(t))\,|\,t\in[t_a,t_b]\}$ be the trajectory of a point mass m in the space $\mathbb{R}^3$ parametrized by the time t. For $x\in(C^1[t_a,t_b])^3$

$$T = \tfrac12m(\dot x_1^2+\dot x_2^2+\dot x_3^2) = \tfrac12m\|\dot x\|^2 = T(\dot x) \tag{1.10.24}$$

is the kinetic energy, and

$$V = V(x_1,x_2,x_3) = V(x) \tag{1.10.25}$$

is the potential energy. ($\|\ \|$ is the Euclidean norm in $\mathbb{R}^3$.) We assume continuous total differentiability of V. Then

$$\begin{aligned}
&E = T+V \text{ is the total energy,}\\
&L = T-V \text{ is the free energy, and}\\
&J(x) = \int_{t_a}^{t_b} L(x,\dot x)\,dt \text{ is the action}
\end{aligned} \tag{1.10.26}$$

of the point mass m along the trajectory $\{x(t)\,|\,t\in[t_a,t_b]\}$. The function $L:\mathbb{R}^3\times\mathbb{R}^3\to\mathbb{R}$ is the Lagrangian, a nomenclature that is transferred to all variational functionals.

According to the “principle of least action” the minimization of the action leads to the system of Euler-Lagrange equations, which a trajectory has to fulfill:

$$\begin{aligned}
&\frac{d}{dt}L_{\dot x}(x,\dot x) = L_x(x,\dot x), \text{ or}\\
&m\ddot x_1 = -V_{x_1}(x_1,x_2,x_3),\\
&m\ddot x_2 = -V_{x_2}(x_1,x_2,x_3),\\
&m\ddot x_3 = -V_{x_3}(x_1,x_2,x_3), \text{ or}\\
&m\ddot x = -\nabla V(x) = -\operatorname{grad}V(x).
\end{aligned} \tag{1.10.27}$$

The equations (1.10.27) are the equations of motion for a point mass m. Since the total energy is conserved along a trajectory that fulfills (1.10.27), cf. Exercise 1.10.4, the system (1.10.27) is called a conservative system. Uniqueness of a solution is only possible if (natural) boundary conditions are imposed. However, it is not obvious that a solution minimizes the action. A sufficient condition is given in Exercise 1.10.6.
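A simple case in which a solution of (1.10.27) does minimize the action can be explored numerically. The sketch below assumes a one-dimensional vertical motion with the linear potential V(x) = m g x, whose second derivative vanishes, and the boundary conditions x(0) = x(T) = 0; these choices, the perturbations and all names are assumptions made for illustration. The solution of $m\ddot x = -mg$ is then $x(t) = \frac12 g\,t(T-t)$.

```python
import math

g, T, m = 9.81, 1.0, 1.0

def action(x, xdot, n=4000):
    """Midpoint approximation of J(x) = int_0^T (m/2)*xdot^2 - m*g*x dt."""
    h = T / n
    return h * sum(0.5 * m * xdot(t) ** 2 - m * g * x(t)
                   for t in ((i + 0.5) * h for i in range(n)))

# Solution of m*x'' = -m*g with x(0) = x(T) = 0, and its velocity
x_true = lambda t: 0.5 * g * t * (T - t)
v_true = lambda t: 0.5 * g * (T - 2 * t)

def perturbed(eps, k):
    """Trajectory x_true + eps*sin(k*pi*t/T); the perturbation vanishes at 0 and T."""
    x = lambda t: x_true(t) + eps * math.sin(k * math.pi * t / T)
    v = lambda t: v_true(t) + eps * k * math.pi / T * math.cos(k * math.pi * t / T)
    return x, v

J0 = action(x_true, v_true)
for eps, k in [(0.1, 1), (-0.3, 2), (0.05, 5)]:
    print(eps, k, action(*perturbed(eps, k)) - J0)
```

Every printed difference is positive. For this potential the difference equals $\frac{m}{2}\int_0^T \dot h^2\,dt$ for any perturbation h vanishing at both ends, so the trajectory is a minimizer among curves with the same boundary values; for general potentials no such conclusion is implied by this sketch.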



We remark that the action is not invariant in the sense of (1.10.20). Therefore the physical time cannot be replaced by a different parameter without changing the physical meaning of equations (1.10.27).

Remark. In a different notation, (1.10.19) can be rewritten as

$$J(y) = \int_a^b F(x,y,y')\,dx \tag{1.10.28}$$

for $y(x) = (y_1(x),\dots,y_n(x))$, $y'(x) = (y_1'(x),\dots,y_n'(x))$. The functional (1.10.28) generalizes the functional (1.1.1) in a natural way. The Lagrange function of (1.10.28) $F:[a,b]\times\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ is a continuous and a continuously partially differentiable function with respect to the last 2n variables by assumption. The Euler-Lagrange equations are similar to (1.4.3), differing only by $F_y = (F_{y_1},\dots,F_{y_n})$, $F_{y'} = (F_{y_1'},\dots,F_{y_n'})$, and by the fact that (1.4.3) now becomes a system of n differential equations.

Exercises

1.10.1. Prove that a curve $(x,y)\in(C^{1,pw}[t_a,t_b])^2$ is the graph of a function $\tilde y\in C^{1,pw}[a,b]$ if $x(t_a) = a$, $x(t_b) = b$ and $\dot x(t)>0$ piecewise for $t\in[t_{i-1},t_i]$, i = 1,…,m. Here $t_a = t_0<t_1<\cdots<t_{m-1}<t_m = t_b$.

1.10.2. The parametric functional,

$$J(x) = \int_{t_a}^{t_b} (F(x),\dot x)\,dt,$$

defined for a totally differentiable vector field $F:\mathbb{R}^n\to\mathbb{R}^n$ and for a curve $x\in D = (C^{1,pw}[t_a,t_b])^n\cap\{x(t_a) = A,\ x(t_b) = B\}$, is called the integral of the field F along the curve x. Here ( , ) is the Euclidean scalar product in $\mathbb{R}^n$.

a) Compute the first variation $\delta J(x):(C_0^{1,pw}[t_a,t_b])^n\to\mathbb{R}$.
b) Give the system of Euler-Lagrange equations. Does it have solutions in any D?
c) Assume

$$\frac{\partial F_i}{\partial x_k}(x) = \frac{\partial F_k}{\partial x_i}(x) \quad\text{for all } x\in\mathbb{R}^n \text{ and } i,k = 1,\dots,n.$$

Show that δJ(x) = 0 for all x ∈ D and that in this case any x ∈ D is a solution of the Euler-Lagrange system piecewise. What does this mean for the functional J?

A Lagrange function (a Lagrangian) with identically vanishing first variation is called a “Null Lagrangian.”

1.10.3. Let the curve $x\in(C^2[t_a,t_b])^n$ be a local minimizer of the parametric functional

$$J(x) = \int_{t_a}^{t_b} \Phi(x,\dot x)\,dt,$$

where the Lagrange function $\Phi:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ is continuously totally differentiable. Prove that

$$\Phi(x,\dot x)-(\dot x,\Phi_{\dot x}(x,\dot x)) = c_1 \quad\text{on } [t_a,t_b],$$

where ( , ) is the scalar product in $\mathbb{R}^n$.

1.10.4. Adopt the definitions (1.10.24)–(1.10.27). Show that a local minimizer $x\in(C^1[t_a,t_b])^3$ of the action is in $(C^2[t_a,t_b])^3$ and that the total energy $E = E(x,\dot x) = \text{const}$ for $t\in[t_a,t_b]$.

1.10.5. Compute the second variation of the parametric functional

$$J(x) = \int_{t_a}^{t_b} \Phi(t,x,\dot x)\,dt$$

in x in direction h, where the Lagrange function $\Phi:\mathbb{R}\times\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ is continuous and two times continuously partially differentiable with respect to the last 2n variables. That is, compute

$$\frac{d^2}{ds^2}J(x+sh)\Big|_{s=0} = \delta^2J(x)(h,h),$$

where $x,h\in(C^{1,pw}[t_a,t_b])^n$, cf. Exercise 1.2.2.

1.10.6. a) Adopt the definitions (1.10.24)–(1.10.27) and assume that the potential energy $V:\mathbb{R}^3\to\mathbb{R}$ is two times continuously differentiable. Compute the second variation $\delta^2J(x)(h,h)$ in $x,h\in(C^1[t_a,t_b])^3$, cf. Exercise 1.10.5.

b) Let the Hessian of the potential energy

$$D^2V(x) = \Big(\frac{\partial^2V}{\partial x_i\partial x_j}(x)\Big)_{i,j=1,2,3}$$

fulfill $(D^2V(x)h,h)\le 0$ for all $x,h\in\mathbb{R}^3$.

Show that any solution of the equations of motion (1.10.27) is a global minimizer of the action among all trajectories $x\in(C^2[t_a,t_b])^3$ fulfilling the same boundary conditions at $t = t_a$ and at $t = t_b$.

Hint: Exercise 1.4.4.

1.11 The Weierstraß-Erdmann Corner Conditions

We mentioned before in Paragraph 1.10 that a functional in parametric form can give more information about a minimizer of a functional in nonparametric form (1.1.1). Therefore we consider (1.1.1) as a special case of (1.10.1).



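A reparametrization according to Definition 1.10.3 is given by

$$\phi:[\tau_a,\tau_b]\to[a,b], \quad \phi\in C^1[\tau_a,\tau_b], \quad \frac{d\phi}{d\tau}(\tau)>0 \text{ for all } \tau\in[\tau_a,\tau_b]. \tag{1.11.1}$$

Let $y\in C^{1,pw}[a,b]$ and set

$$\begin{aligned}
&x = \phi(\tau) = \tilde x(\tau)\in[a,b], \qquad y(x) = y(\phi(\tau)) = y(\tilde x(\tau)) = \tilde y(\tau).\\
&\text{Then } (\tilde x,\tilde y)\in C^1[\tau_a,\tau_b]\times C^{1,pw}[\tau_a,\tau_b], \text{ and}\\
&\{(x,y(x))\,|\,x\in[a,b]\} = \{(\tilde x(\tau),\tilde y(\tau))\,|\,\tau\in[\tau_a,\tau_b]\}
\end{aligned} \tag{1.11.2}$$

is a reparametrization of the graph of y. Then on $[\tau_a,\tau_b]$, we have piecewise

$$\frac{d\tilde y}{d\tau}(\tau) = y'(\phi(\tau))\frac{d\phi}{d\tau}(\tau) = y'(x)\frac{d\tilde x}{d\tau}(\tau), \quad\text{or}\quad y'(x) = \frac{\dot{\tilde y}}{\dot{\tilde x}}(\tau) \quad\text{where } x = \tilde x(\tau) \text{ and } \dot{(\ )} = \frac{d(\ )}{d\tau}. \tag{1.11.3}$$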

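The substitution formula transforms the functional (1.1.1) into a parametric one:

$$J(y) = \int_a^b F(x,y,y')\,dx = \int_{\tau_a}^{\tau_b} F\Big(\tilde x,\tilde y,\frac{\dot{\tilde y}}{\dot{\tilde x}}\Big)\dot{\tilde x}\,d\tau = J(\tilde x,\tilde y). \tag{1.11.4}$$

The Lagrange function of the parametric functional is invariant in the sense of Definition 1.10.2:

$$\Phi(\tilde x,\tilde y,\dot{\tilde x},\dot{\tilde y}) = F\Big(\tilde x,\tilde y,\frac{\dot{\tilde y}}{\dot{\tilde x}}\Big)\dot{\tilde x}. \tag{1.11.5}$$

Any reparametrization (1.11.2) preserves the value of J(y).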

Proposition 1.11.1. Let $y\in D\subset C^{1,pw}[a,b]$ be a global minimizer of

$$J(y) = \int_a^b F(x,y,y')\,dx, \tag{1.11.6}$$

where admitted functions in D have possibly to fulfill boundary conditions at x = a and/or at x = b. Then any reparametrization (1.11.2) of the graph {(x,y(x)) | x ∈ [a,b]} is a local minimizer of the corresponding parametric functional

$$J(\tilde x,\tilde y) = \int_{\tau_a}^{\tau_b} F\Big(\tilde x,\tilde y,\frac{\dot{\tilde y}}{\dot{\tilde x}}\Big)\dot{\tilde x}\,d\tau, \tag{1.11.7}$$

defined on $\tilde D\subset C^1[\tau_a,\tau_b]\times C^{1,pw}[\tau_a,\tau_b]$. Admissible curves in $\tilde D$ fulfill $\tilde x(\tau_a) = a$, $\tilde x(\tau_b) = b$, and $\tilde y$ satisfies the boundary conditions that are possibly prescribed by y ∈ D. Furthermore $\dot{\tilde x}(\tau)>0$ for $\tau\in[\tau_a,\tau_b]$.



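Proof. Since $[\tau_a,\tau_b]$ is compact, $\dot{\tilde x}(\tau)\ge d>0$ for $\tau\in[\tau_a,\tau_b]$. Any admissible curve near $(\tilde x,\tilde y)$ in the sense of (1.10.8)$_2$ is given by

$$\begin{aligned}
&\{(\tilde x(\tau)+h_1(\tau),\tilde y(\tau)+h_2(\tau))\,|\,\tau\in[\tau_a,\tau_b]\}, \text{ where}\\
&h_1\in C_0^1[\tau_a,\tau_b],\ h_2\in C^{1,pw}[\tau_a,\tau_b] \text{ and}\\
&\|h_1\|_{1,[\tau_a,\tau_b]}<d,\ \|h_2\|_{1,pw,[\tau_a,\tau_b]}<d.
\end{aligned} \tag{1.11.8}$$

If $\tilde y$ must satisfy prescribed boundary conditions at $\tau = \tau_a$ and/or $\tau = \tau_b$, then necessarily $h_2(\tau_a) = 0$ and/or $h_2(\tau_b) = 0$. By $\dot{\tilde x}(\tau)+\dot h_1(\tau)>0$ for $\tau\in[\tau_a,\tau_b]$, the perturbation (1.11.8) is the graph of a function $\hat y\in D\subset C^{1,pw}[a,b]$, cf. Exercise 1.10.1. Therefore, in view of (1.11.4),

$$J(\tilde x,\tilde y) = J(y)\le J(\hat y) = J(\tilde x+h_1,\tilde y+h_2), \tag{1.11.9}$$

which proves the claim. □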

Why must y ∈ D be a global minimizer of the nonparametric functional in order to guarantee that the reparametrized curve $(\tilde x,\tilde y)$ is a local minimizer of the corresponding parametric functional?

The answer is given in Figure 1.19: A perturbation of the graph {(x,y(x)) | x ∈ [a,b]} by $\{(x+h_1(x),y(x))\,|\,x\in[a,b]\} = \{(x,\hat y(x))\,|\,x\in[a,b]\}$, where $h_1\in C_0^1[a,b]$ is sketched in Figure 1.19, has a distance $\|y-\hat y\|_{1,pw,[a,b]}\ge|y'(x)-\hat y'(x)|$ for any point x ∈ [a,b]. However small $\|h_1\|_{1,[a,b]}$ might be, which measures the distance between curves, the distance between y and $\hat y$ is still big.

However, $\|y-\hat y\|_{0,[a,b]}$ gets small for small $\|h_1\|_1$ and $\|h_2\|_1$ in (1.11.8). Therefore Proposition 1.11.1 holds for a “strong local minimizer” of (1.11.6), cf. the remarks after Definition 1.4.1.

Fig. 1.19 Perturbation of a Graph considered as a Curve

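Remark. A global minimizer of the nonparametric functional is not necessarily a global minimizer of the corresponding parametric functional if the constraint $\dot{\tilde{x}}(\tau) > 0$ is given up. Here is an example:

\[
J(y) = \int_0^1 (y')^2\,dx \quad \text{defined on} \quad D = C^{1,pw}[0,1] \cap \{y(0) = 0,\ y(1) = 1\}.
\]

By Example 2 in Paragraph 1.5, the global minimizer is the line segment given by $y(x) = x$. The corresponding parametric functional is given by

\[
\begin{gathered}
\tilde{J}(\tilde{x},\tilde{y}) = \int_{\tau_0}^{\tau_1} \Bigl(\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)^2 \dot{\tilde{x}}\,d\tau = \int_{\tau_0}^{\tau_1} \frac{\dot{\tilde{y}}^2}{\dot{\tilde{x}}}\,d\tau \quad \text{defined on} \\
\tilde{D} = (C^1[\tau_0,\tau_1] \times C^{1,pw}[\tau_0,\tau_1]) \cap \{(\tilde{x}(\tau_0),\tilde{y}(\tau_0)) = (0,0),\ (\tilde{x}(\tau_1),\tilde{y}(\tau_1)) = (1,1)\}, \\
\dot{\tilde{x}}(\tau) > 0 \ \text{on } [\tau_0,\tau_1].
\end{gathered}
\]

Giving up the constraint that $\dot{\tilde{x}}$ has to be positive, we now require only that $\tilde{x} \in C^{1,pw}[\tau_0,\tau_1]$ with $\dot{\tilde{x}} \ne 0$ piecewise.

Choosing the curves

\[
\begin{aligned}
\tilde{x}(\tau) &= p\tau, & \tilde{y}(\tau) &= q\tau && \text{for } \tau \in [0,1], \\
\tilde{x}(\tau) &= p+(1-p)(\tau-1), & \tilde{y}(\tau) &= q+(1-q)(\tau-1) && \text{for } \tau \in [1,2],
\end{aligned}
\]

where $p \ne 0$ and $p \ne 1$, the curves connect $(\tilde{x}(0),\tilde{y}(0)) = (0,0)$ and $(\tilde{x}(2),\tilde{y}(2)) = (1,1)$. Furthermore,

\[
\tilde{J}(\tilde{x},\tilde{y}) = \frac{q^2}{p} + \frac{(1-q)^2}{1-p} = 1+\frac{(p-q)^2}{p(1-p)},
\]

and we see that $\tilde{J}$ can have any value in $\mathbb{R}$: for $0 < p < 1$ all values $\ge 1$ are attained, and for $p < 0$ or $p > 1$ all values $\le 1$. The given curve can be composed of arbitrarily many pieces as sketched in Figure 1.20 for $0 < q < 1 < p$. As long as $p$ and $q$ are fixed, all such compositions give the same value of $\tilde{J}$. This shows that the parametric functional takes any real value in any neighborhood of $(\tfrac{1}{2}\tau, \tfrac{1}{2}\tau)$ in $(C[0,2])^2$.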
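The value of the parametric functional on the two-piece curve can be checked with exact rational arithmetic. The following is only a sketch: the function names are illustrative, and it assumes the two pieces have the constant velocities (p, q) on [0,1] and (1-p, 1-q) on [1,2], so that the integral of ẏ²/ẋ reduces to a sum of two terms.

```python
from fractions import Fraction


def parametric_energy(p, q):
    """Sum of the integrals of y_dot**2 / x_dot over the two linear pieces.

    Piece 1 has velocity (p, q) on [0, 1]; piece 2 has velocity
    (1 - p, 1 - q) on [1, 2]. Both integrands are constant.
    """
    p, q = Fraction(p), Fraction(q)
    if p in (0, 1):
        raise ValueError("p must differ from 0 and 1")
    return q ** 2 / p + (1 - q) ** 2 / (1 - p)


def closed_form(p, q):
    """The closed form 1 + (p - q)^2 / (p (1 - p)) stated in the text."""
    p, q = Fraction(p), Fraction(q)
    return 1 + (p - q) ** 2 / (p * (1 - p))


samples = [
    (Fraction(1, 2), Fraction(1, 2)),  # the straight line: value 1
    (Fraction(1, 3), Fraction(2, 3)),  # 0 < p < 1: value above 1
    (2, Fraction(1, 2)),               # p > 1: value below 1
    (2, 5),                            # p > 1, large q: negative value
]
for p, q in samples:
    value = parametric_energy(p, q)
    assert value == closed_form(p, q)
    print(f"p={p}, q={q}: J = {value}")
```

Choosing p > 1 makes the denominator p(1-p) negative, which is how arbitrarily negative values arise once the sign condition on ẋ is dropped.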

Fig. 1.20 A Parametric Versus a Nonparametric Functional



Next we make use of the regularity (1.10.9)1 of a local minimizer.

Proposition 1.11.2. Let $y \in D \subset C^{1,pw}[a,b]$ be a global (or a strong local) minimizer of the functional

\[
J(y) = \int_a^b F(x,y,y')\,dx
\tag{1.11.10}
\]

satisfying boundary conditions or not. Assume that the Lagrange function $F : \mathbb{R}^3 \to \mathbb{R}$ is continuously totally differentiable. Then

\[
\begin{gathered}
F_{y'}(\cdot,y,y') \in C[a,b], \quad \text{and} \\
F(\cdot,y,y') - y'F_{y'}(\cdot,y,y') \in C[a,b].
\end{gathered}
\tag{1.11.11}
\]

In addition, if $F$ does not depend explicitly on $x$, then $F(y,y') - y'F_{y'}(y,y') = c_1$ on $[a,b]$.

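Proof. According to Proposition 1.11.1 the curve $\{(x,y(x)) \mid x \in [a,b]\}$ is a local minimizer of the corresponding parametric functional (1.11.7) with Lagrange function $\Phi(\tilde{x},\tilde{y},\dot{\tilde{x}},\dot{\tilde{y}}) = F(\tilde{x},\tilde{y},\dot{\tilde{y}}/\dot{\tilde{x}})\,\dot{\tilde{x}}$. Then $\Phi : \mathbb{R}^4 \cap \{\dot{\tilde{x}} > 0\} \to \mathbb{R}$ is continuously totally differentiable, and according to Proposition 1.10.3, the curve fulfills the Euler-Lagrange equations. Due to the invariance of the Lagrange function, any reparametrized curve has the same property, cf. Proposition 1.10.4. We obtain for (1.11.5)

\[
\Phi_{\dot{\tilde{x}}}(\tilde{x},\tilde{y},\dot{\tilde{x}},\dot{\tilde{y}}) = F\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr) - F_{y'}\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}} \in C^{1,pw}[\tau_a,\tau_b],
\tag{1.11.12}
\]

and for the parametrization $\{(x,y(x)) \mid x \in [a,b]\}$, this gives by (1.11.2) and (1.11.3) $\tilde{x} = x$, $\dot{\tilde{x}} = 1$, $\tilde{y} = y$, $\dot{\tilde{y}} = y'$, $\tau_a = a$, $\tau_b = b$ and

\[
F(\cdot,y,y') - y'F_{y'}(\cdot,y,y') \in C^{1,pw}[a,b] \subset C[a,b].
\tag{1.11.13}
\]

The first claim of (1.11.11) is part of Proposition 1.4.1. The last addition is given as Exercise 1.11.1. □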

The continuity conditions (1.11.11) are the so-called Weierstraß-Erdmann corner conditions. They admit only specific corners for minimizers.

We recall Example 6 of Paragraph 1.5. We generalize it to

\[
J(y) = \int_a^b W(y')\,dx \quad \text{on} \quad D \subset C^{1,pw}[a,b],
\tag{1.11.14}
\]

where the potential $W : \mathbb{R} \to \mathbb{R}$ sketched in Figure 1.21 is continuously differentiable.

Any local minimizer $y$ satisfies

\[
\frac{d}{dx}W'(y') = 0 \quad \text{or} \quad W'(y') = c_1 \quad \text{on } [a,b],
\tag{1.11.15}
\]

since, due to the first corner condition, the function $W'(y')$ is continuous on $[a,b]$. The second corner condition for global minimizers implies

\[
W(y') - y'W'(y') = c_2 \quad \text{on } [a,b]
\tag{1.11.16}
\]

by the supplement of Proposition 1.11.2. This can also be seen directly from (1.11.15): The derivative $y'$, and therefore $W(y') - y'W'(y')$, is piecewise constant and, by continuity, constant on $[a,b]$. Relation (1.11.15) provides three constants for $y'$, namely

\[
y' = c_1^1,\ c_1^2,\ c_1^3,
\tag{1.11.17}
\]

cf. Figures 1.21 and 1.5. Upon substitution into (1.11.16), we obtain

\[
\begin{gathered}
W(c_1^i) - c_1^i W'(c_1^i) = c_2 \quad \text{for } i = 1,2,3, \text{ and therefore} \\
W(c_1^1) - W(c_1^2) = c_1^1 W'(c_1^1) - c_1^2 W'(c_1^2) = (c_1^1 - c_1^2)W'(c_1^1) = (c_1^1 - c_1^2)W'(c_1^2), \\
\frac{W(c_1^2) - W(c_1^1)}{c_1^2 - c_1^1} = W'(c_1^1) = W'(c_1^2).
\end{gathered}
\tag{1.11.18}
\]

Fig. 1.21 A W-Potential

The geometric interpretation is that the slope of the secant through $W(c_1^1)$ and $W(c_1^2)$ equals the slope of the tangent in $W(c_1^1)$ as well as in $W(c_1^2)$. A second interpretation is the following:

\[
W(c_1^2) - W(c_1^1) = \int_{c_1^1}^{c_1^2} W'(z)\,dz = (c_1^2 - c_1^1)W'(c_1^i), \quad i = 1,2.
\tag{1.11.19}
\]

Both conditions allow only two specific constants $c_1^1 = \alpha$ and $c_1^2 = \beta$ sketched in Figure 1.21. The horizontal line that, together with the graph of $W'$, bounds the hatched areas of equal size is called the “Maxwell line.”
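As a small numerical illustration of (1.11.18), one can test candidate slope pairs for a concrete double-well potential. The sketch below assumes W(z) = (z² - 1)², the potential of Example 6 of Paragraph 1.5; the helper name corner_residuals is hypothetical.

```python
def W(z):
    """Double-well potential W(z) = (z^2 - 1)^2 (assumed example)."""
    return (z * z - 1) ** 2


def dW(z):
    """Derivative W'(z) = 4 z (z^2 - 1)."""
    return 4 * z * (z * z - 1)


def corner_residuals(alpha, beta):
    """Residuals of the two conditions in (1.11.18) for slopes alpha != beta.

    First entry:  W'(alpha) - W'(beta)          (equal tangent slopes)
    Second entry: secant slope - W'(alpha)      (secant equals tangent)
    Both vanish exactly when (alpha, beta) is an admissible pair.
    """
    secant = (W(beta) - W(alpha)) / (beta - alpha)
    return dW(alpha) - dW(beta), secant - dW(alpha)


print(corner_residuals(-1, 1))    # both residuals vanish
print(corner_residuals(-1, 0.5))  # at least one residual is nonzero
```

For this potential the common tangent is the horizontal line through the two minima, so the admissible slopes are the two wells themselves.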

The Weierstraß-Erdmann corner conditions give necessary conditions on the slopes of a global minimizer at a corner. However, any straight line segment over $[a,b]$ fulfills the Euler-Lagrange equation (1.11.15) and the corner condition (1.11.16) as well. Therefore a separate discussion depending on the potential $W$ and on possible boundary conditions has to decide on global minimizers of (1.11.14).

We do this now for Example 6 of Paragraph 1.5, where $W(z) = (z^2-1)^2$. Here $c_1^1 = \alpha = -1$, $c_1^2 = \beta = 1$, and we find infinitely many global minimizers without boundary conditions as sketched in Figure 1.4. However, for the boundary conditions $y(0) = 0$ and $y(1) = 2$, corners with slopes $\pm 1$ are not possible. The only candidate for a minimizer is the line $y = 2x$. We show that it is indeed a global minimizer.

The tangent to the graph of $W$ in $(2,W(2)) = (2,9)$ is the line $W(2)+W'(2)(z-2) = 9+24(z-2)$, and Figure 1.5 shows that $W(z) \ge W(2)+W'(2)(z-2)$ for all $z \in \mathbb{R}$. For $y \in C^{1,pw}[0,1] \cap \{y(0) = 0,\ y(1) = 2\}$, we obtain

\[
\begin{gathered}
J(y) = \int_0^1 W(y')\,dx \ge \int_0^1 \bigl(W(2)+W'(2)(y'-2)\bigr)\,dx \\
= \int_0^1 W(2)\,dx = J(\hat{y}) \ \text{for } \hat{y}(x) = 2x, \quad \text{since} \quad \int_0^1 y'\,dx = y(1)-y(0) = 2.
\end{gathered}
\tag{1.11.20}
\]

This proves that the line segment between $(0,0)$ and $(1,2)$ is the global minimizer of $J$ in $D = C^{1,pw}[0,1] \cap \{y(0) = 0,\ y(1) = 2\}$.
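The tangent inequality used in (1.11.20), which the text reads off from Figure 1.5, can also be confirmed algebraically for $W(z) = (z^2-1)^2$. The following factorization is a supplementary check, not part of the original argument:

```latex
\[
\begin{aligned}
W(z) - \bigl(W(2) + W'(2)(z-2)\bigr)
  &= (z^2-1)^2 - 9 - 24(z-2) \\
  &= z^4 - 2z^2 - 24z + 40 \\
  &= (z-2)^2\,(z^2 + 4z + 10) \\
  &= (z-2)^2\,\bigl((z+2)^2 + 6\bigr) \;\ge\; 0 \quad \text{for all } z \in \mathbb{R},
\end{aligned}
\]
```

with equality only for $z = 2$. Consequently equality in (1.11.20) forces $y' = 2$ piecewise, so the line segment is the only global minimizer under these boundary conditions.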

Next we give a sufficient condition to exclude broken minimizers.

Proposition 1.11.3. Let $y \in D \subset C^{1,pw}[a,b]$ be a local minimizer of the functional

\[
J(y) = \int_a^b F(x,y,y')\,dx,
\tag{1.11.21}
\]

where the Lagrange function $F$ is continuous and continuously differentiable, once with respect to the second and twice with respect to the third variable. Boundary conditions can possibly be imposed. If

\[
F_{y'y'}(x,y(x),z) \ne 0 \quad \text{for all } x \in [a,b],\ z \in \mathbb{R},
\tag{1.11.22}
\]

then $y \in C^1[a,b]$.

Proof. Assume that $y \in C^{1,pw}[a,b] \setminus C^1[a,b]$. Then there is an $x_i \in (a,b)$ such that

\[
y'_-(x_i) = \lim_{x \nearrow x_i} y'(x) \ne \lim_{x \searrow x_i} y'(x) = y'_+(x_i).
\tag{1.11.23}
\]

For $f(z) = F(x_i,y(x_i),z)$, we have

\[
\begin{gathered}
f'(z) = F_{y'}(x_i,y(x_i),z), \quad \text{and} \\
f'(y'_-(x_i)) = f'(y'_+(x_i)),
\end{gathered}
\tag{1.11.24}
\]

due to the first Weierstraß-Erdmann corner condition (1.11.11)$_1$, or due to (1.4.3)$_1$, which holds also for local minimizers. The existence of some $z$ between $y'_-(x_i)$ and $y'_+(x_i)$ such that

\[
f''(z) = F_{y'y'}(x_i,y(x_i),z) = 0,
\tag{1.11.25}
\]

guaranteed by Rolle’s theorem, contradicts the assumption (1.11.22). □

Proposition 1.11.3 and Exercise 1.5.1 imply the following regularity theorem:

Proposition 1.11.4. Let $y \in D \subset C^{1,pw}[a,b]$ be a local minimizer of the functional

\[
J(y) = \int_a^b F(x,y,y')\,dx,
\tag{1.11.26}
\]

where the Lagrange function $F$ is twice continuously differentiable with respect to all three variables. If

\[
F_{y'y'}(x,y(x),z) \ne 0 \quad \text{for all } x \in [a,b],\ z \in \mathbb{R},
\tag{1.11.27}
\]

then $y \in C^2[a,b]$.

Condition (1.11.27) means “ellipticity” of the Euler-Lagrange equation (1.4.5) along a minimizer. We return to this in Chapter 3.

Exercises

1.11.1. Let $y \in D \subset C^{1,pw}[a,b]$ be a global minimizer of the functional

\[
J(y) = \int_a^b F(y,y')\,dx,
\]

where the Lagrange function $F : \mathbb{R}^2 \to \mathbb{R}$ is continuously totally differentiable. Show that

\[
F(y,y') - y'F_{y'}(y,y') = c_1 \quad \text{on } [a,b].
\]

Compare that result with the special case 7 of Paragraph 1.5.




1.11.2. Compute global minimizers of

\[
J(y) = \int_a^b W(y')\,dx \quad \text{in } D = C^{1,pw}[a,b], \quad \text{where} \quad W(z) = \tfrac{1}{2}z^4 + \tfrac{1}{3}z^3 - \tfrac{1}{2}z^2.
\]

Are they unique?

1.11.3. Compute and sketch global minimizers of the functional of Exercise 1.11.2 if $D = C^{1,pw}[a,b] \cap \{y(a) = 0,\ y(b) = 0\}$.
Hint: Compute $J(\hat{y})$ for a sawtooth function $\hat{y}$ having slopes $c_1^1 = \alpha < 0$ and $c_1^2 = \beta > 0$ according to Figure 1.21 and fulfilling the boundary conditions. Show that $J(y) \ge J(\hat{y})$ for all $y \in D$.


Chapter 2
Variational Problems with Constraints

2.1 Isoperimetric Constraints

Many early variational problems like Dido’s problem or the problem of the hanging chain have constraints in a natural way: Maximize the area with given perimeter or minimize the potential energy of a hanging chain with given length. These constraints belong to the class of isoperimetric constraints, and they are of the same type as the functional to be maximized or minimized.

Definition 2.1.1. Consider a functional

\[
J(y) = \int_a^b F(x,y,y')\,dx,
\tag{2.1.1}
\]

defined on $D \subset C^{1,pw}[a,b]$. A constraint of the type

\[
K(y) = \int_a^b G(x,y,y')\,dx = c
\tag{2.1.2}
\]

is called an isoperimetric constraint. The Lagrange functions $F$ and $G$ are continuous.

The goal of this paragraph is a necessary condition on a local minimizer $y \in D$ of $J$ under an isoperimetric constraint.

We recall the following theorem from calculus:

\[
\begin{gathered}
\text{Let } f,g : \mathbb{R}^2 \to \mathbb{R} \text{ be continuously totally differentiable.} \\
\text{If } f \text{ is locally extremal at } x_0 \in \mathbb{R}^2 \text{ subject to } g(x) = c, \text{ and} \\
\text{if } \nabla g(x_0) \ne 0 \text{ holds, then} \\
\nabla f(x_0) + \lambda \nabla g(x_0) = 0 \text{ for some } \lambda \in \mathbb{R}.
\end{gathered}
\tag{2.1.3}
\]



This is the method of Lagrange multipliers, which is proved in its general version in the Appendix. In Figure 2.1, we visualize the case (2.1.3) by a hiking map with the contour lines of a landscape described by $f$ and a trail to a summit described by $g(x) = c$.

Fig. 2.1 Local Extremals under a Constraint

Hikers, even when they are not mathematicians, realize that a point where the trail and a contour line are tangential is locally extremal on their trail. Since the gradients are orthogonal to level curves, the Lagrange multiplier rule (2.1.3) gives precisely the points where the contour lines and the trail are tangent to each other. On the summit, the gradient of $f$ vanishes and the multiplier rule is satisfied with $\lambda = 0$.
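The multiplier rule (2.1.3) can be tried out on a toy landscape. The sketch below uses an assumed example that is not taken from the text: the height f(x) = x1 + x2 and the circular trail g(x) = x1² + x2² = 2. It locates the highest sampled point of the trail and checks that the gradients are parallel there; all function names are illustrative.

```python
import math


def f(x1, x2):
    """Height of the (hypothetical) landscape."""
    return x1 + x2


def grad_f(x1, x2):
    return (1.0, 1.0)


def grad_g(x1, x2):
    """Gradient of the trail function g(x) = x1^2 + x2^2."""
    return (2.0 * x1, 2.0 * x2)


def multiplier_at(x0):
    """Least-squares lambda for grad f + lambda * grad g = 0, and the residual norm."""
    gf, gg = grad_f(*x0), grad_g(*x0)
    lam = -(gf[0] * gg[0] + gf[1] * gg[1]) / (gg[0] ** 2 + gg[1] ** 2)
    residual = math.hypot(gf[0] + lam * gg[0], gf[1] + lam * gg[1])
    return lam, residual


# The trail g(x) = 2 is the circle of radius sqrt(2); sample it densely.
radius = math.sqrt(2.0)
trail = [(radius * math.cos(2 * math.pi * k / 3600),
          radius * math.sin(2 * math.pi * k / 3600)) for k in range(3600)]
highest = max(trail, key=lambda point: f(*point))

print("highest trail point:", highest)
print("multiplier and residual there:", multiplier_at((1.0, 1.0)))
print("multiplier and residual at (sqrt(2), 0):", multiplier_at((radius, 0.0)))
```

At the highest point the residual vanishes, so the gradients are parallel; at a generic trail point such as (√2, 0) no multiplier makes the residual zero.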

Proposition 2.1.1. Let $y \in D \subset C^{1,pw}[a,b]$ be a local minimizer of the functional (2.1.1) under the isoperimetric constraint (2.1.2), i.e.,

\[
\begin{gathered}
J(y) \le J(\hat{y}) \quad \text{for all } \hat{y} \in D \cap \{K(\hat{y}) = c\} \\
\text{satisfying } \|y-\hat{y}\|_{1,pw} < d.
\end{gathered}
\tag{2.1.4}
\]

The domain $D$ possibly involves boundary conditions. The Lagrange functions $F,G : [a,b] \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ are assumed to be continuous and continuously partially differentiable with respect to the last two variables.
Furthermore assume that $y$ is not critical for the constraint $K$, which means that $\delta K(y) : C^{1,pw}_0[a,b] \to \mathbb{R}$ is linear, continuous, and surjective (or not identically zero).

Then, there is some $\lambda \in \mathbb{R}$ such that

\[
\begin{gathered}
F_{y'}(\cdot,y,y') + \lambda G_{y'}(\cdot,y,y') \in C^{1,pw}[a,b] \quad \text{and} \\
\frac{d}{dx}\bigl(F_{y'}(\cdot,y,y') + \lambda G_{y'}(\cdot,y,y')\bigr) = F_y(\cdot,y,y') + \lambda G_y(\cdot,y,y') \\
\text{piecewise on } [a,b].
\end{gathered}
\tag{2.1.5}
\]

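Proof. By assumption on $G$, the first variation $\delta K(y)$ exists and is a linear and continuous mapping, and there is a function $h_0 \in C^{1,pw}_0[a,b]$ with $\delta K(y)h_0 = 1$. For an arbitrary $h \in C^{1,pw}_0[a,b]$, we define

\[
\begin{gathered}
f(x_1,x_2) = J(y + x_1h + x_2h_0) \quad \text{and} \\
g(x_1,x_2) = K(y + x_1h + x_2h_0).
\end{gathered}
\tag{2.1.6}
\]

We remark that $y + x_1h + x_2h_0 \in D$ for all $x = (x_1,x_2) \in \mathbb{R}^2$, since $D \subset C^{1,pw}[a,b]$ is possibly constrained by boundary conditions only. As expounded in Paragraph 1.2,

\[
\begin{gathered}
\frac{\partial f}{\partial x_1}(0,0) = \lim_{x_1 \to 0} \frac{J(y + x_1h) - J(y)}{x_1} = \delta J(y)h, \\
\frac{\partial f}{\partial x_2}(0,0) = \delta J(y)h_0, \\
\frac{\partial g}{\partial x_1}(0,0) = \delta K(y)h, \quad \frac{\partial g}{\partial x_2}(0,0) = \delta K(y)h_0 = 1.
\end{gathered}
\tag{2.1.7}
\]

The arguments in the proof of Proposition 1.2.1 also yield that

\[
\begin{gathered}
\frac{\partial f}{\partial x_1}(x) = \delta J(y + x_1h + x_2h_0)h, \quad \frac{\partial f}{\partial x_2}(x) = \delta J(y + x_1h + x_2h_0)h_0, \\
\frac{\partial g}{\partial x_1}(x) = \delta K(y + x_1h + x_2h_0)h, \quad \frac{\partial g}{\partial x_2}(x) = \delta K(y + x_1h + x_2h_0)h_0.
\end{gathered}
\tag{2.1.8}
\]

The representation (1.2.8) of the first variation shows that for fixed $y$, $h$, and $h_0$, the partial derivatives (2.1.8) are continuous with respect to $x = (x_1,x_2)$. This follows because the partial derivatives $F_y, F_{y'}$ and $G_y, G_{y'}$ are all continuous on $[a,b] \times \mathbb{R} \times \mathbb{R}$ and uniformly so on compact subsets, cf. the arguments in (1.2.10)–(1.2.12). Consequently, the functions $f,g : \mathbb{R}^2 \to \mathbb{R}$ defined by (2.1.6) are continuously totally differentiable.

By assumption on $y$, the point $x = (0,0) \in \mathbb{R}^2$ is a local minimizer of $f$ subject to $g(x) = c$, and since the gradient $\nabla g(0) \ne 0$ by (2.1.7)$_3$, the multiplier rule (2.1.3) yields

\[
\begin{gathered}
\nabla f(0) + \lambda \nabla g(0) = 0 \quad \text{for some } \lambda \in \mathbb{R}, \text{ or} \\
\delta J(y)h + \lambda\,\delta K(y)h = 0, \\
\delta J(y)h_0 + \lambda\,\delta K(y)h_0 = 0 \quad \text{due to (2.1.7).}
\end{gathered}
\tag{2.1.9}
\]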



The definitions of $f$ and $g$ in (2.1.6) depend on $y$, $h$, and $h_0$. Here $y$ and $h_0$ are fixed, whereas $h \in C^{1,pw}_0[a,b]$ is arbitrary. It is crucial that the multiplier $\lambda$ does not depend on $h$. As a matter of fact, in view of $\delta K(y)h_0 = 1$ and of (2.1.9)$_3$,

\[
\lambda = -\delta J(y)h_0 \quad \text{for all } h \in C^{1,pw}_0[a,b].
\tag{2.1.10}
\]

Then, (2.1.9)$_2$ reads:

\[
\int_a^b (F_y + \lambda G_y)h + (F_{y'} + \lambda G_{y'})h'\,dx = 0 \quad \text{for all } h \in C^{1,pw}_0[a,b],
\tag{2.1.11}
\]

where the argument of all functions is $(x,y(x),y'(x))$. Lemma 1.3.4 finally implies the claim (2.1.5). □

If the functional (2.1.1) is defined on all of $D = C^{1,pw}[a,b]$, which means that the boundary of $y \in D$ is free at $x = a$ and at $x = b$, then a local minimizer under an isoperimetric constraint fulfills natural boundary conditions.

Proposition 2.1.2. Let $y \in D = C^{1,pw}[a,b]$ be a local minimizer of the functional (2.1.1) under the isoperimetric constraint (2.1.2) in the sense of (2.1.4). Assume the hypotheses of Proposition 2.1.1.

Then, $y$ fulfills (2.1.5), and with the same multiplier $\lambda \in \mathbb{R}$, it fulfills the natural boundary conditions

\[
\begin{gathered}
F_{y'}(a,y(a),y'(a)) + \lambda G_{y'}(a,y(a),y'(a)) = 0 \quad \text{and} \\
F_{y'}(b,y(b),y'(b)) + \lambda G_{y'}(b,y(b),y'(b)) = 0.
\end{gathered}
\tag{2.1.12}
\]

Proof. Following the proof of Proposition 2.1.1, we choose $h_0 \in C^{1,pw}[a,b]$ such that $\delta K(y)h_0 = 1$. Then with arbitrary $h \in C^{1,pw}_0[a,b]$, we obtain (2.1.9), (2.1.10), (2.1.11), and finally (2.1.5). But now an arbitrary $h \in C^{1,pw}[a,b]$ is admitted in (2.1.6), and thus, (2.1.9), (2.1.10), and (2.1.11) hold for any $h \in C^{1,pw}[a,b]$. Integration by parts, cf. Lemma 1.3.3, yields

\[
\begin{aligned}
&\delta J(y)h + \lambda\,\delta K(y)h \\
&= \int_a^b (F_y + \lambda G_y)h + (F_{y'} + \lambda G_{y'})h'\,dx \\
&= \int_a^b \Bigl(F_y + \lambda G_y - \frac{d}{dx}(F_{y'} + \lambda G_{y'})\Bigr)h\,dx + (F_{y'} + \lambda G_{y'})h\,\Big|_a^b = 0,
\end{aligned}
\tag{2.1.13}
\]

where the argument of all functions is $(x,y(x),y'(x))$. Due to (2.1.5), the only term left in (2.1.13) is

\[
\bigl(F_{y'}(x,y(x),y'(x)) + \lambda G_{y'}(x,y(x),y'(x))\bigr)h(x)\,\Big|_a^b = 0,
\tag{2.1.14}
\]

proving the claim (2.1.12) by an arbitrary choice of $h(a)$ and $h(b)$. □




Remark. If only one boundary value is prescribed in $D \subset C^{1,pw}[a,b]$, then any local minimizer subject to the isoperimetric constraint fulfills one natural boundary condition (2.1.12) at the free end.

Next we generalize Propositions 2.1.1 and 2.1.2 to functionals and isoperimetric constraints that are defined for vector-valued functions. Also the constraint consists of several components.

Definition 2.1.2. The functional

\[
J(y) = \int_a^b F(x,y,y')\,dx, \quad y(x) = (y_1(x),\dots,y_n(x)),
\tag{2.1.15}
\]

as well as the isoperimetric constraints

\[
K_i(y) = \int_a^b G_i(x,y,y')\,dx = c_i, \quad i = 1,\dots,m,
\tag{2.1.16}
\]

are defined for $y \in D \subset (C^{1,pw}[a,b])^n$. The Lagrange functions $F,G_i : [a,b] \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ are continuous. We express the $m$ constraints as a single vector-valued constraint as follows:

\[
K(y) = (K_1(y),\dots,K_m(y)) = (c_1,\dots,c_m) = c.
\tag{2.1.17}
\]

We point out that the dimensions $n$ and $m$ are completely independent.
The domain of definition $D$ of $J$ determines the space of admissible perturbations as follows: If

\[
\begin{gathered}
D = \{y \in (C^{1,pw}[a,b])^n \mid y_k(a) = A_k \text{ and/or } y_k(b) = B_k \text{ for certain } k \in \{1,\dots,n\}\}, \text{ then} \\
D_0 = \{h \in (C^{1,pw}[a,b])^n \mid h_k(a) = 0 \text{ and/or } h_k(b) = 0 \text{ for the same choice of indices}\}.
\end{gathered}
\tag{2.1.18}
\]

All remaining components have a free boundary at $x = a$ and/or $x = b$.

Proposition 2.1.3. Let $y \in D \subset (C^{1,pw}[a,b])^n$ be a local minimizer of the functional (2.1.15) under the isoperimetric constraints (2.1.17), i.e.,

\[
\begin{gathered}
J(y) \le J(\hat{y}) \quad \text{for all } \hat{y} \in D \cap \{K(\hat{y}) = c\} \\
\text{where } \max_{k=1,\dots,n} \|y_k - \hat{y}_k\|_{1,pw} < d.
\end{gathered}
\tag{2.1.19}
\]

The Lagrange functions $F,G_i : [a,b] \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $i = 1,\dots,m$, are assumed to be continuous and continuously partially differentiable with respect to the last $2n$ variables.

If $y$ is not critical for the constraints, i.e., if $\delta K(y) = (\delta K_1(y),\dots,\delta K_m(y)) : D_0 \to \mathbb{R}^m$ is linear, continuous, and surjective (or if the $m$ linear functionals $\delta K_1(y),\dots,\delta K_m(y)$ are linearly independent in the sense of Exercise 2.1.1), then the following holds:

\[
\begin{gathered}
F_{y'}(\cdot,y,y') + \sum_{i=1}^m \lambda_i G_{i,y'}(\cdot,y,y') \in (C^{1,pw}[a,b])^n, \\
\frac{d}{dx}\Bigl(F + \sum_{i=1}^m \lambda_i G_i\Bigr)_{y'} = \Bigl(F + \sum_{i=1}^m \lambda_i G_i\Bigr)_y \quad \text{piecewise on } [a,b]
\end{gathered}
\tag{2.1.20}
\]

for some $\lambda = (\lambda_1,\dots,\lambda_m) \in \mathbb{R}^m$.

Here, we agree upon the notation $F_{y'} = (F_{y'_1},\dots,F_{y'_n})$, $F_y = (F_{y_1},\dots,F_{y_n})$, and analogously for $G_{i,y'}$, $G_{i,y}$. The argument of all functions in (2.1.20) is $(x,y(x),y'(x)) = (x,y_1(x),\dots,y_n(x),y'_1(x),\dots,y'_n(x))$, and (2.1.20)$_2$ is a system of $n$ differential equations.

Proof. By assumption on $G_i$, the first variations $\delta K_i(y)$ exist, and they are linear and continuous. Due to the surjectivity of $\delta K(y)$, there exist functions $h_1,\dots,h_m \in D_0$ such that

\[
\delta K_i(y)h_j = \delta_{ij} = \begin{cases} 1 & \text{for } i = j, \\ 0 & \text{for } i \ne j, \end{cases} \qquad i,j = 1,\dots,m.
\tag{2.1.21}
\]

We define for arbitrary $h \in (C^{1,pw}_0[a,b])^n$

\[
\begin{gathered}
f(s,t_1,\dots,t_m) = J(y + sh + t_1h_1 + \dots + t_mh_m), \\
\Psi_i(s,t_1,\dots,t_m) = K_i(y + sh + t_1h_1 + \dots + t_mh_m), \quad i = 1,\dots,m.
\end{gathered}
\tag{2.1.22}
\]

In view of (2.1.18)$_1$, the functions of the arguments in (2.1.22) are in $D$, and as expounded in the proof of Proposition 2.1.1, the functions

\[
\begin{gathered}
f : \mathbb{R}^{m+1} \to \mathbb{R} \quad \text{and} \\
\Psi = (\Psi_1,\dots,\Psi_m) : \mathbb{R}^{m+1} \to \mathbb{R}^m \\
\text{are continuously totally differentiable.}
\end{gathered}
\tag{2.1.23}
\]

The choice of the functions $h_1,\dots,h_m$ in (2.1.21) yields the following structure of the Jacobian matrix:

\[
D\Psi(0) = \begin{pmatrix}
\delta K_1(y)h & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\delta K_m(y)h & 0 & \cdots & 1
\end{pmatrix}.
\tag{2.1.24}
\]

Since the rank of $D\Psi(0)$ is maximal, the set $\{(s,t_1,\dots,t_m) \mid \Psi(s,t_1,\dots,t_m) = c \in \mathbb{R}^m\}$ is locally a one-dimensional manifold near $0 \in \mathbb{R}^{m+1}$, i.e., a curve, cf. the Appendix. By assumption on $y$, the function $f$ at $0 \in \mathbb{R}^{m+1}$ is locally minimal subject to the constraint $\Psi(s,t_1,\dots,t_m) = c$. Therefore, the vector-valued Lagrange multiplier rule (proved in the Appendix (A.20)–(A.23)) is applicable and reads:

\[
\nabla f(0) + \sum_{i=1}^m \lambda_i \nabla\Psi_i(0) = 0 \quad \text{for } \lambda = (\lambda_1,\dots,\lambda_m) \in \mathbb{R}^m.
\tag{2.1.25}
\]

The partial derivatives of the gradients are computed as in (2.1.7), whence the $m+1$ equations (2.1.25) give

\[
\begin{gathered}
\delta J(y)h + \sum_{i=1}^m \lambda_i\,\delta K_i(y)h = 0, \\
\delta J(y)h_j + \sum_{i=1}^m \lambda_i\,\delta K_i(y)h_j = 0, \quad j = 1,\dots,m.
\end{gathered}
\tag{2.1.26}
\]

In view of (2.1.21), the last $m$ equations yield

\[
\lambda_j = -\delta J(y)h_j \quad \text{for } j = 1,\dots,m,
\tag{2.1.27}
\]

which means that the Lagrange multipliers $\lambda_j$ do not depend on $h \in (C^{1,pw}_0[a,b])^n$, despite the fact that the definitions of the functions $f$ and $\Psi_i$ involve $h$. The first equation (2.1.26) reads

\[
\begin{gathered}
\int_a^b \Bigl(\bigl(F + \sum_{i=1}^m \lambda_i G_i\bigr)_y, h\Bigr) + \Bigl(\bigl(F + \sum_{i=1}^m \lambda_i G_i\bigr)_{y'}, h'\Bigr)\,dx = 0, \\
\text{where } (\ ,\ ) \text{ denotes the Euclidean scalar product in } \mathbb{R}^n.
\end{gathered}
\tag{2.1.28}
\]

Choosing $h = (0,\dots,\tilde{h},\dots,0)$, where $\tilde{h} \in C^{1,pw}_0[a,b]$ in the $k$-th component is arbitrary, (2.1.28) gives

\[
\int_a^b \Bigl(F + \sum_{i=1}^m \lambda_i G_i\Bigr)_{y_k} \tilde{h} + \Bigl(F + \sum_{i=1}^m \lambda_i G_i\Bigr)_{y'_k} \tilde{h}'\,dx = 0,
\tag{2.1.29}
\]

and thus, Lemma 1.3.4 proves the claim (2.1.20) for the $k$-th component, $k = 1,2,\dots,n$. □

By (2.1.18), some boundaries of some components of admitted functions in $D \subset (C^{1,pw}[a,b])^n$ are free. Local minimizers under isoperimetric constraints fulfill natural boundary conditions there. We give here only the result, since a proof following Proposition 2.1.2 along with the result of Proposition 2.1.3 is straightforward.

Proposition 2.1.4. Under the hypotheses of Proposition 2.1.3, a local minimizer $y$ under isoperimetric constraints whose $k$-th component $y_k$ has a free boundary at $x = a$ and/or $x = b$ fulfills (2.1.20), and with the same multipliers $\lambda_i \in \mathbb{R}$, it fulfills the natural boundary conditions

\[
\begin{gathered}
\Bigl(F + \sum_{i=1}^m \lambda_i G_i\Bigr)_{y'_k}(a,y(a),y'(a)) = 0 \quad \text{and/or} \\
\Bigl(F + \sum_{i=1}^m \lambda_i G_i\Bigr)_{y'_k}(b,y(b),y'(b)) = 0.
\end{gathered}
\tag{2.1.30}
\]

Next we investigate parametric functionals introduced in Paragraph 1.10, which are now subject to isoperimetric constraints.

Definition 2.1.3. Consider a parametric functional

\[
J(x) = \int_{t_a}^{t_b} \Phi(t,x,\dot{x})\,dt, \quad x(t) = (x_1(t),\dots,x_n(t)),
\tag{2.1.31}
\]

defined on $D \subset (C^{1,pw}[t_a,t_b])^n$. Constraints of the type

\[
K_i(x) = \int_{t_a}^{t_b} \Psi_i(t,x,\dot{x})\,dt = c_i, \quad i = 1,\dots,m,
\tag{2.1.32}
\]

are called isoperimetric constraints. The Lagrange functions $\Phi,\Psi_i : [t_a,t_b] \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ are continuous, and we express the $m$ constraints as a single vector constraint as follows:

\[
K(x) = (K_1(x),\dots,K_m(x)) = (c_1,\dots,c_m) = c.
\tag{2.1.33}
\]

We give the main proposition on the Euler-Lagrange equations and on the natural boundary conditions for a local minimizer subject to isoperimetric constraints. The proof is completely analogous to those of the preceding propositions. Also the domain of definition $D_0 \subset (C^{1,pw}[t_a,t_b])^n$ corresponds to that of (2.1.18).

Proposition 2.1.5. Let the curve $x \in D \subset (C^{1,pw}[t_a,t_b])^n$ be a local minimizer of the parametric functional (2.1.31) under the isoperimetric constraints (2.1.33), i.e.,

\[
\begin{gathered}
J(x) \le J(\hat{x}) \quad \text{for all } \hat{x} \in D \cap \{K(\hat{x}) = c\} \\
\text{where } \max_{k=1,\dots,n} \|x_k - \hat{x}_k\|_{1,pw} < d.
\end{gathered}
\tag{2.1.34}
\]

The Lagrange functions $\Phi,\Psi_i : [t_a,t_b] \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ are assumed to be continuous and continuously partially differentiable with respect to the last $2n$ variables.

If $x$ is not critical for the constraints, i.e., if $\delta K(x) = (\delta K_1(x),\dots,\delta K_m(x)) : D_0 \to \mathbb{R}^m$ is linear, continuous, and surjective (or if the $m$ linear functionals $\delta K_1(x),\dots,\delta K_m(x)$ are linearly independent in the sense of Exercise 2.1.1), then there holds, for some $\lambda = (\lambda_1,\dots,\lambda_m) \in \mathbb{R}^m$:

\[
\begin{gathered}
\Phi_{\dot{x}}(\cdot,x,\dot{x}) + \sum_{i=1}^m \lambda_i \Psi_{i,\dot{x}}(\cdot,x,\dot{x}) \in (C^{1,pw}[t_a,t_b])^n, \\
\frac{d}{dt}\Bigl(\Phi + \sum_{i=1}^m \lambda_i \Psi_i\Bigr)_{\dot{x}} = \Bigl(\Phi + \sum_{i=1}^m \lambda_i \Psi_i\Bigr)_x \quad \text{piecewise on } [t_a,t_b].
\end{gathered}
\tag{2.1.35}
\]

Here, we agree upon the notation $\Phi_{\dot{x}} = (\Phi_{\dot{x}_1},\dots,\Phi_{\dot{x}_n})$, $\Phi_x = (\Phi_{x_1},\dots,\Phi_{x_n})$, and analogously upon $\Psi_{i,\dot{x}}$, $\Psi_{i,x}$. The argument of all functions in (2.1.35) is $(t,x(t),\dot{x}(t)) = (t,x_1(t),\dots,x_n(t),\dot{x}_1(t),\dots,\dot{x}_n(t))$, and (2.1.35)$_2$ is a system of differential equations.
If the $k$-th component $x_k$ of the local minimizer $x$ is free at $t = t_a$ and/or $t = t_b$, then $x$ fulfills the natural boundary conditions

\[
\begin{gathered}
\Bigl(\Phi + \sum_{i=1}^m \lambda_i \Psi_i\Bigr)_{\dot{x}_k}(t_a,x(t_a),\dot{x}(t_a)) = 0 \quad \text{and/or} \\
\Bigl(\Phi + \sum_{i=1}^m \lambda_i \Psi_i\Bigr)_{\dot{x}_k}(t_b,x(t_b),\dot{x}(t_b)) = 0,
\end{gathered}
\tag{2.1.36}
\]

respectively, with the same multipliers $\lambda_i \in \mathbb{R}$.

If the Lagrange functions $\Phi$ and $\Psi_i$ do not depend explicitly on $t$, their invariance

\[
\begin{gathered}
\Phi(x,\alpha\dot{x}) = \alpha\Phi(x,\dot{x}), \\
\Psi_i(x,\alpha\dot{x}) = \alpha\Psi_i(x,\dot{x}) \quad \text{for } i = 1,\dots,m \text{ and all } \alpha > 0,
\end{gathered}
\tag{2.1.37}
\]

implies that the integrals (2.1.31), (2.1.32), as well as the Euler-Lagrange equations are invariant with respect to reparametrizations of the admitted curves, cf. Definition 1.10.3 and Proposition 1.10.4. The extension to $n$ components and to a system of $n$ differential equations is apparent.

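Remark. The noncriticality of a local minimizer for the isoperimetric constraints depends on the domain of definition $D_0$. Here is an example:

\[
\begin{gathered}
\text{For } K(y) = \int_a^b y'\,dx = c, \text{ we obtain} \\
\delta K(y)h = \int_a^b h'\,dx = h(b) - h(a), \text{ and} \\
\delta K(y) : C^{1,pw}_0[a,b] \to \mathbb{R} \text{ vanishes identically, whereas} \\
\delta K(y) : C^{1,pw}[a,b] \to \mathbb{R} \text{ is surjective.}
\end{gathered}
\]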




Exercises

2.1.1. Prove the equivalence of the following statements, where $D_0 \subset (C^{1,pw}[a,b])^n$ is a subspace:

i) $\delta K(y) = (\delta K_1(y),\dots,\delta K_m(y)) : D_0 \to \mathbb{R}^m$ is surjective.

ii) $\delta K_1(y),\dots,\delta K_m(y)$ are linearly independent in the following sense: $\sum_{i=1}^m \lambda_i\,\delta K_i(y)h = 0$ for all $h \in D_0$ implies $\lambda_1 = \dots = \lambda_m = 0$.

2.1.2. Prove for

\[
J(y) = \int_0^1 (y')^2\,dx \quad \text{and} \quad K(y) = \int_0^1 y^2\,dx,
\]

the equivalence of the following statements:

i) $y$ is a global minimizer of $J$ on $D = C^{1,pw}[0,1] \cap \{y(0) = 0,\ y(1) = 0\}$ under the isoperimetric constraint $K(y) = 1$.

ii) $y \in C^2[0,1]$, $K(y) = 1$, $y'' = \lambda y$ on $[0,1]$, $y(0) = 0$, $y(1) = 0$, and

\[
-\lambda \int_0^1 h^2\,dx \le \int_0^1 (h')^2\,dx \quad \text{for all } h \in C^{1,pw}_0[0,1] \text{ and some } \lambda \in \mathbb{R}.
\]

The inequality in ii) is called Poincaré inequality (Poincaré, 1854–1912).
Give $y$ and $\lambda < 0$ explicitly assuming i) or ii).
The existence of a global minimizer is proved in Paragraph 3.3.

2.1.3. Compute a global minimizer $y \in D = C^{1,pw}[0,1]$ of

\[
J(y) = \int_0^1 (y')^2\,dx,
\]

under the isoperimetric constraints

\[
K_1(y) = \int_0^1 y^2\,dx = 1, \qquad K_2(y) = \int_0^1 y\,dx = m,
\]

provided it exists.
Distinguish the cases $m^2 = 1$ and $m^2 < 1$.
If it exists for $m = 0$, prove a Poincaré inequality for all $h \in C^{1,pw}[0,1] \cap \{\int_0^1 h\,dx = 0\}$.

2.2 Dido’s Problem as a Variational Problem with Isoperimetric Constraint

We treat now the variational problem introduced in Paragraph 1.7 in a more “natural way”: Which closed curve with a given perimeter bounds a maximal area?



Piecewise continuously differentiable closed curves are admitted:

\[
D = (C^{1,pw}[t_a,t_b])^2 \cap \{x(t_a) = x(t_b),\ \|\dot{x}(t)\| > 0 \text{ for } t \in [t_{i-1},t_i],\ i = 1,\dots,m\}.
\tag{2.2.1}
\]

The condition $\|\dot{x}(t)\| > 0$ excludes cusps. The isoperimetric constraint for curves in $D$ reads

\[
K(x) = \int_{t_a}^{t_b} \|\dot{x}\|\,dt = \sum_{i=1}^m \int_{t_{i-1}}^{t_i} \sqrt{\dot{x}_1^2 + \dot{x}_2^2}\,dt = L.
\tag{2.2.2}
\]

What is the area bounded by a curve in $D$? We use Green’s formula for a plane continuously totally differentiable vector field $f = (f_1,f_2) : \mathbb{R}^2 \to \mathbb{R}^2$ over a bounded domain $\Omega \subset \mathbb{R}^2$, whose boundary $\partial\Omega$ is a piecewise continuously differentiable positively oriented curve, cf. Figure 2.2. Green’s formula reads

\[
\int_\Omega \Bigl(\frac{\partial f_1}{\partial x_1} + \frac{\partial f_2}{\partial x_2}\Bigr)dx = \int_{\partial\Omega} f_1\,dx_2 - f_2\,dx_1.
\tag{2.2.3}
\]

Fig. 2.2 On Green’s Formula

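We choose the vector field $f(x_1,x_2) = (x_1,x_2)$, and we obtain

\[
\int_\Omega 2\,dx = \int_{\partial\Omega} x_1\,dx_2 - x_2\,dx_1 = \int_{t_a}^{t_b} x_1\dot{x}_2 - \dot{x}_1x_2\,dt,
\tag{2.2.4}
\]

and hence, the parametric functional to be maximized is

\[
J(x) = \frac{1}{2}\int_{t_a}^{t_b} x_1\dot{x}_2 - \dot{x}_1x_2\,dt.
\tag{2.2.5}
\]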

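We show that the hypotheses of Proposition 2.1.5 are fulfilled. Excluding the origin in $\mathbb{R}^2$, the conditions on differentiability are fulfilled. The constraint $K(x) = L$ is critical for the following curves:

\[
\begin{gathered}
\delta K(x)h = 0 \quad \text{for all } h \in (C^{1,pw}_0[t_a,t_b])^2 \ \Leftrightarrow \\
\frac{d}{dt}\frac{\dot{x}_k}{\sqrt{\dot{x}_1^2 + \dot{x}_2^2}} = 0 \quad \text{piecewise on } [t_a,t_b] \text{ for } k = 1,2 \ \Leftrightarrow \\
\frac{\dot{x}_k}{\sqrt{\dot{x}_1^2 + \dot{x}_2^2}} = c_k \quad \text{on } [t_a,t_b] \text{ for } k = 1,2, \text{ since} \\
\frac{\dot{x}_k}{\sqrt{\dot{x}_1^2 + \dot{x}_2^2}} \in C^{1,pw}[t_a,t_b] \subset C[t_a,t_b] \quad \text{for } k = 1,2,
\end{gathered}
\tag{2.2.6}
\]

cf. Proposition 1.10.3. A closed curve cannot have tangents in one constant direction, and therefore it is not critical. By (2.1.35), applied with $\Phi(x,\dot{x}) = x_1\dot{x}_2 - \dot{x}_1x_2$ (the factor $\tfrac{1}{2}$ in (2.2.5) can be absorbed into the multiplier $\lambda$) and $\Psi(x,\dot{x}) = \|\dot{x}\|$,

\[
\begin{gathered}
\Bigl(-x_2 + \lambda\frac{\dot{x}_1}{\sqrt{\dot{x}_1^2 + \dot{x}_2^2}},\ x_1 + \lambda\frac{\dot{x}_2}{\sqrt{\dot{x}_1^2 + \dot{x}_2^2}}\Bigr) \in (C^{1,pw}[t_a,t_b])^2, \\
\frac{d}{dt}\Bigl(-x_2 + \lambda\frac{\dot{x}_1}{\|\dot{x}\|}\Bigr) = \dot{x}_2, \quad \frac{d}{dt}\Bigl(x_1 + \lambda\frac{\dot{x}_2}{\|\dot{x}\|}\Bigr) = -\dot{x}_1 \\
\text{piecewise on } [t_a,t_b].
\end{gathered}
\tag{2.2.7}
\]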

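Since the expression (2.2.7)$_1$ as well as $(x_2,-x_1)$ are continuous on $[t_a,t_b]$, (2.2.7)$_2$ gives

\[
\begin{gathered}
2x_2 - \lambda\frac{\dot{x}_1}{\|\dot{x}\|} = c_1 \quad \text{on } [t_a,t_b] \text{ and} \\
2x_1 + \lambda\frac{\dot{x}_2}{\|\dot{x}\|} = c_2 \quad \text{on } [t_a,t_b].
\end{gathered}
\tag{2.2.8}
\]

Multiplication of (2.2.8)$_1$ by $\dot{x}_2$ and of (2.2.8)$_2$ by $\dot{x}_1$ and addition of the resulting equations yield

\[
\begin{gathered}
2(x_1\dot{x}_1 + x_2\dot{x}_2) = c_2\dot{x}_1 + c_1\dot{x}_2, \quad \text{or} \\
\frac{d}{dt}(x_1^2 + x_2^2) = \frac{d}{dt}(c_2x_1 + c_1x_2) \quad \text{piecewise on } [t_a,t_b].
\end{gathered}
\tag{2.2.9}
\]

Equation (2.2.9)$_2$ implies due to continuity

\[
\begin{gathered}
x_1^2 + x_2^2 - c_2x_1 - c_1x_2 = c_3 \quad \text{on } [t_a,t_b], \text{ and finally} \\
\Bigl(x_1 - \frac{c_2}{2}\Bigr)^2 + \Bigl(x_2 - \frac{c_1}{2}\Bigr)^2 = c_3 + \frac{c_1^2}{4} + \frac{c_2^2}{4} = R^2 \quad \text{on } [t_a,t_b].
\end{gathered}
\tag{2.2.10}
\]

Any circle with arbitrary center $(\tfrac{c_2}{2},\tfrac{c_1}{2})$, radius $R$, and perimeter $L$ bounds a maximal area of size $\tfrac{L^2}{4\pi}$.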
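The bound L²/(4π) can be explored numerically with closed polygons, for which the area functional (2.2.5) reduces to a finite sum over the edges and the length (2.2.2) to the sum of the edge lengths. The following sketch compares regular polygons of fixed perimeter with the circle; the helper names are illustrative, and regular polygons are just one convenient family of admissible curves.

```python
import math


def area_and_perimeter(points):
    """Evaluate (2.2.5) and (2.2.2) on a closed polygon.

    On a straight edge from P to Q the integrand x1*x2_dot - x1_dot*x2
    integrates to P1*Q2 - Q1*P2, so the area is half the sum of these terms.
    """
    twice_area = 0.0
    length = 0.0
    n = len(points)
    for i in range(n):
        (p1, p2), (q1, q2) = points[i], points[(i + 1) % n]
        twice_area += p1 * q2 - q1 * p2
        length += math.hypot(q1 - p1, q2 - p2)
    return 0.5 * twice_area, length


def regular_polygon(n, perimeter):
    """Vertices of a positively oriented regular n-gon with the given perimeter."""
    side = perimeter / n
    radius = side / (2.0 * math.sin(math.pi / n))
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


L = 2.0 * math.pi                 # perimeter of the unit circle
bound = L ** 2 / (4.0 * math.pi)  # area of the circle with perimeter L
for n in (3, 4, 6, 12, 1000):
    area, length = area_and_perimeter(regular_polygon(n, L))
    print(f"n={n:5d}  area={area:.6f}  perimeter={length:.6f}  gap={bound - area:.2e}")
```

The gap to the bound shrinks as the number of edges grows, in line with the circle being the maximizer.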



Exercise

2.2.1. Compute the continuously differentiable curve starting in $(0,A)$ on the positive $y$-axis and ending on the positive $x$-axis that encloses with the positive axes a given fixed area $S$ and that creates a minimal surface of revolution when rotating around the $x$-axis.

2.3 The Hanging Chain

At first sight, a hanging chain acted on only by gravity looks like a parabola, cf. Figure 2.3.

Fig. 2.3 A Hanging Chain

The variational principles of mechanics require that the potential energy of the hanging chain is minimal. Let the chain hang between $(a,A)$ and $(b,B)$ and denote by $y(x)$ the height of the chain over $x \in [a,b]$. Then, its potential energy is given by

\[
g\rho \int_a^b y\sqrt{1+(y')^2}\,dx
\tag{2.3.1}
\]

where $g$ is the gravitational acceleration and $\rho$ is the density of the homogeneous mass of the chain. (The mass at height $y(x)$ is $\rho\,ds = \rho\sqrt{1+(y'(x))^2}\,dx$.) The variational problem then is the following:

\[
\begin{gathered}
\text{Minimize } J(y) = \int_a^b y\sqrt{1+(y')^2}\,dx \\
\text{in } D = C^1[a,b] \cap \{y(a) = A,\ y(b) = B\} \\
\text{under the isoperimetric constraint} \\
K(y) = \int_a^b \sqrt{1+(y')^2}\,dx = L,
\end{gathered}
\tag{2.3.2}
\]

where $L$ is the length of the chain.
A necessary condition is obviously $(b-a)^2 + (B-A)^2 \le L^2$, and in case of equality only the straight line between $(a,A)$ and $(b,B)$ fulfills the constraint.
The hypotheses of Proposition 2.1.1 are fulfilled if $y$ is not critical for the constraint $K$. As shown in Example 4 of Paragraph 1.5, the only critical function for the functional $K$ is the straight line, which is excluded in case $(b-a)^2 + (B-A)^2 < L^2$.

Therefore, the Euler-Lagrange equation (2.1.5) holds. Neither the Lagrange function $F$ nor $G$ depends explicitly on $x$, and therefore, we can proceed as in the special case 7 of Paragraph 1.5. To that purpose, we need the regularity $y \in C^2(a,b)$. By virtue of

\[
\begin{gathered}
(F_{y'y'} + \lambda G_{y'y'})(y,y') = (y+\lambda)\frac{1}{(1+(y')^2)^{3/2}} \ne 0 \\
\text{for } y+\lambda \ne 0,
\end{gathered}
\tag{2.3.3}
\]

Exercise 1.5.1 guarantees that regularity in any open interval where (2.3.3)$_2$ is satisfied. Since the Euler-Lagrange equation,

\[
\frac{d}{dx}\Bigl(\frac{(y+\lambda)y'}{\sqrt{1+(y')^2}}\Bigr) = \sqrt{1+(y')^2} > 0 \quad \text{on } [a,b],
\tag{2.3.4}
\]

has no piecewise constant solutions, there is an interval $(x_1,x_2) \subset [a,b]$ where (2.3.3)$_2$ holds, and by the regularity $y \in C^2(x_1,x_2)$, the minimizer $y$ fulfills

\[
F(y,y') + \lambda G(y,y') - y'\bigl(F_{y'}(y,y') + \lambda G_{y'}(y,y')\bigr) = c_1,
\tag{2.3.5}
\]

cf. case 7 of Paragraph 1.5. Here, (2.3.5) means

\[
y + \lambda = c_1\sqrt{1+(y')^2} \quad \text{on } (x_1,x_2).
\tag{2.3.6}
\]

By separation of variables, cf. case 7 of Paragraph 1.5, we obtain as in (1.6.8)

\[
y(x) + \lambda = c_1\cosh\Bigl(\frac{x+c_2}{c_1}\Bigr) \quad \text{for } x \in (x_1,x_2).
\tag{2.3.7}
\]

Clearly $c_1 \ne 0$. Hence, $y(x)+\lambda \ne 0$ not only for $x \in (x_1,x_2)$ but for all $x \in [a,b]$. Check that (2.3.7) is indeed a solution of the Euler-Lagrange equation on the entire interval $[a,b]$.
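The requested check can be carried out in a few lines. The computation below is a supplementary sketch; the abbreviation $u = (x+c_2)/c_1$ is introduced here only for convenience.

```latex
\[
\begin{aligned}
&y + \lambda = c_1\cosh u, \qquad y' = \sinh u, \qquad \sqrt{1+(y')^2} = \cosh u,
\qquad u = \frac{x+c_2}{c_1}, \\[4pt]
&\frac{(y+\lambda)\,y'}{\sqrt{1+(y')^2}} = \frac{c_1\cosh u\,\sinh u}{\cosh u} = c_1\sinh u, \\[4pt]
&\frac{d}{dx}\bigl(c_1\sinh u\bigr) = \cosh u = \sqrt{1+(y')^2}.
\end{aligned}
\]
```

This is (2.3.4), and the first line also reproduces (2.3.6). Since $\cosh u > 0$ for every $x$, the computation is valid on all of $[a,b]$ and for either sign of $c_1$.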

The constants $c_1 \ne 0$, $c_2$ and $\lambda$ are determined by the boundary conditions $y(a) = A$, $y(b) = B$ and $K(y) = L$. The computation is not trivial, and it gives for $(b-a)^2 + (B-A)^2 < L^2$ two solutions $c_1 > 0$ and $c_1 < 0$. The hanging chain with minimal potential energy is described by $c_1 > 0$, and the solution with $c_1 < 0$ describes the shape of a chain of length $L$ with maximal potential energy.

Since the hyperbolic cosine describes the hanging chain, it is also called a catenary.
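For a symmetric configuration the constants can be computed with a few lines of code. The sketch below makes simplifying assumptions that are not part of the text: the supports are at (-b, A) and (b, A), so that c2 = 0 by symmetry, the length condition becomes 2 c1 sinh(b/c1) = L, and only the solution with c1 > 0 is computed, by bisection. The function names are hypothetical.

```python
import math


def symmetric_catenary(b, A, L, tol=1e-12):
    """Return (c1, lam) with c1 > 0 for a chain of length L hung at (-b, A) and (b, A).

    Assumes c2 = 0 (symmetry). The map c -> 2 c sinh(b / c) decreases from
    +infinity to 2 b on (0, infinity), so bisection finds the unique root.
    """
    if L <= 2.0 * b:
        raise ValueError("the chain must be longer than the distance of the supports")

    def length(c):
        return 2.0 * c * math.sinh(b / c)

    lo = b / 50.0                      # length(lo) is huge; avoids overflow in sinh
    if length(lo) < L:
        raise ValueError("chain too long for this simple bracket")
    hi = b
    while length(hi) > L:              # enlarge until length(hi) <= L
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if length(mid) > L:
            lo = mid
        else:
            hi = mid
    c1 = 0.5 * (lo + hi)
    lam = c1 * math.cosh(b / c1) - A   # from the boundary condition y(b) = A
    return c1, lam


def chain_height(x, c1, lam):
    """Height y(x) = c1 cosh(x / c1) - lam from (2.3.7) with c2 = 0."""
    return c1 * math.cosh(x / c1) - lam


c1, lam = symmetric_catenary(b=1.0, A=0.0, L=3.0)
print("c1 =", c1, " lambda =", lam)
print("lowest point of the chain:", chain_height(0.0, c1, lam))
```

The general case with different heights A and B leads to a coupled system for c1, c2, and λ, which this sketch does not attempt to solve.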

2.4 The Weierstraß-Erdmann Corner Conditions under Isoperimetric Constraints

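Using the parametrization (1.11.2) for the functional (2.1.1), as well as for the constraint (2.1.2), we obtain

\[
\begin{gathered}
J(y) = \tilde{J}(\tilde{x},\tilde{y}) = \int_{\tau_a}^{\tau_b} F\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)\dot{\tilde{x}}\,d\tau \quad \text{and} \\
K(y) = \tilde{K}(\tilde{x},\tilde{y}) = \int_{\tau_a}^{\tau_b} G\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)\dot{\tilde{x}}\,d\tau = c.
\end{gathered}
\tag{2.4.1}
\]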

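We assume that the Lagrange functions $F,G : \mathbb{R}^3 \to \mathbb{R}$ are continuously totally differentiable. The domain of definition $D \subset C^{1,pw}[a,b]$ of the functional $J$ is possibly characterized by boundary conditions at $x = a$ and/or $x = b$. Arguing as in the proof of Proposition 1.11.1, we realize that a global minimizer $y \in D \subset C^{1,pw}[a,b]$ of $J$ under the isoperimetric constraint $K(y) = c$ is a local minimizer of (2.4.1)$_1$ under the isoperimetric constraint (2.4.1)$_2$. This is true for any parametrization that is admitted. The domain of definition $\tilde{D} \subset (C^{1,pw}[\tau_a,\tau_b])^2$ is determined by the domain of definition $D$ of $J$, but in any case, $\tilde{x}(\tau_a) = a$ and $\tilde{x}(\tau_b) = b$.

If $y$ is not critical for the constraint $K$, Proposition 2.1.3 holds with $n = 1$ and $m = 1$. In order to apply Proposition 2.1.5 with $n = 2$ and $m = 1$, the curve $\{(x,y(x)) \mid x \in [a,b]\}$ has to be noncritical for the parametric constraint $\tilde{K}$ in (2.4.1)$_2$.

By Proposition 1.10.2, we obtain for $\Psi(\tilde{x},\tilde{y},\dot{\tilde{x}},\dot{\tilde{y}}) = G(\tilde{x},\tilde{y},\dot{\tilde{y}}/\dot{\tilde{x}})\,\dot{\tilde{x}}$ and $h = (0,h_2)$,

\[
\begin{aligned}
\delta\tilde{K}(\tilde{x},\tilde{y})h &= \int_{\tau_a}^{\tau_b} G_y\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)\dot{\tilde{x}}\,h_2 + G_{y'}\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)\dot{h}_2\,d\tau \\
&= \int_a^b G_y(x,y,y')\hat{h} + G_{y'}(x,y,y')\hat{h}'\,dx \\
&= \delta K(y)\hat{h},
\end{aligned}
\tag{2.4.2}
\]

where we use the substitutions $x = \varphi(\tau) = \tilde{x}(\tau)$ and $\hat{h}(x) = h_2(\varphi^{-1}(x)) = h_2(\tau)$, cf. (1.11.2), (1.11.3).

Relation (2.4.2) proves: If $\delta K(y) : D \subset C^{1,pw}[a,b] \to \mathbb{R}$ is not identically zero, i.e., if $y$ is not critical for $K$, then $\delta\tilde{K}(\tilde{x},\tilde{y}) : \tilde{D} \subset (C^{1,pw}[\tau_a,\tau_b])^2 \to \mathbb{R}$ is not identically zero either, which means that the curve $\{(x,y(x)) \mid x \in [a,b]\} = \{(\tilde{x}(\tau),\tilde{y}(\tau)) \mid \tau \in [\tau_a,\tau_b]\}$ is not critical for the parametric constraint either.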




Proposition 2.4.1. Let $y \in D \subset C^{1,pw}[a,b]$ be a global minimizer of the functional

\[
J(y) = \int_a^b F(x,y,y')\,dx
\tag{2.4.3}
\]

under the isoperimetric constraint

\[
K(y) = \int_a^b G(x,y,y')\,dx = c
\tag{2.4.4}
\]

where the Lagrange functions $F,G : \mathbb{R}^3 \to \mathbb{R}$ are continuously totally differentiable. Assume that $y$ is not critical for $K$. Then, the Euler-Lagrange equation (2.1.5) and

\[
\begin{gathered}
F_{y'}(\cdot,y,y') + \lambda G_{y'}(\cdot,y,y') \in C[a,b], \\
F(\cdot,y,y') + \lambda G(\cdot,y,y') - y'\bigl(F_{y'}(\cdot,y,y') + \lambda G_{y'}(\cdot,y,y')\bigr) \in C[a,b],
\end{gathered}
\tag{2.4.5}
\]

hold for the same $\lambda \in \mathbb{R}$.
If $F$ and $G$ do not depend explicitly on $x$, then

\[
F(y,y') + \lambda G(y,y') - y'\bigl(F_{y'}(y,y') + \lambda G_{y'}(y,y')\bigr) = c_1
\tag{2.4.6}
\]

holds on $[a,b]$.

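Proof. We know that the parametrized curve $\{(x,y(x)) \mid x \in [a,b]\}$ is a local minimizer of (2.4.1)$_1$ under the constraint (2.4.1)$_2$. Furthermore, the curve is not critical for (2.4.1)$_2$. By Proposition 2.1.5, the regularity (2.1.35)$_1$ then holds for the Lagrange functions $\Phi(\tilde x,\tilde y,\dot{\tilde x},\dot{\tilde y}) = F(\tilde x,\tilde y,\dot{\tilde y}/\dot{\tilde x})\,\dot{\tilde x}$ and $\Psi(\tilde x,\tilde y,\dot{\tilde x},\dot{\tilde y}) = G(\tilde x,\tilde y,\dot{\tilde y}/\dot{\tilde x})\,\dot{\tilde x}$, and in particular,

\[
\begin{aligned}
&\Phi_{\dot{\tilde x}}(\tilde x,\tilde y,\dot{\tilde x},\dot{\tilde y}) + \lambda\Psi_{\dot{\tilde x}}(\tilde x,\tilde y,\dot{\tilde x},\dot{\tilde y}) \in C^{1,\mathrm{pw}}[\tau_a,\tau_b] \subset C[\tau_a,\tau_b], \\
&\Phi_{\dot{\tilde x}}(\tilde x,\tilde y,\dot{\tilde x},\dot{\tilde y}) = F\Big(\tilde x,\tilde y,\frac{\dot{\tilde y}}{\dot{\tilde x}}\Big) - F_{y'}\Big(\tilde x,\tilde y,\frac{\dot{\tilde y}}{\dot{\tilde x}}\Big)\frac{\dot{\tilde y}}{\dot{\tilde x}}, \\
&\Psi_{\dot{\tilde x}}(\tilde x,\tilde y,\dot{\tilde x},\dot{\tilde y}) = G\Big(\tilde x,\tilde y,\frac{\dot{\tilde y}}{\dot{\tilde x}}\Big) - G_{y'}\Big(\tilde x,\tilde y,\frac{\dot{\tilde y}}{\dot{\tilde x}}\Big)\frac{\dot{\tilde y}}{\dot{\tilde x}}.
\end{aligned}
\tag{2.4.7}
\]

Due to the invariance of the functionals according to Definition 1.10.2, Proposition 2.1.5 holds for any reparametrization of the curve $(\tilde x,\tilde y)$. Choosing the parameter $x$, we have $\tilde x = x$, $\dot{\tilde x} = 1$, $\tilde y = y$, $\dot{\tilde y} = y'$, $\tau_a = a$, $\tau_b = b$, which proves (2.4.5)$_2$ by (2.4.7). The continuity (2.4.5)$_1$ is proved in Proposition 2.1.1. The supplement (2.4.6) is Exercise 2.4.1. □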

We apply Proposition 2.4.1 to the following example: Minimize

\[ J(y) = \int_a^b W(y')\,dx \quad\text{in } D = C^{1,\mathrm{pw}}[a,b] \tag{2.4.8} \]

under the constraint

\[ K(y) = \int_a^b y'\,dx = m. \tag{2.4.9} \]

The potential $W : \mathbb{R} \to \mathbb{R}$ is continuously differentiable and sketched in Figure 1.21. Since no boundary conditions are imposed, we have $\delta K(y)h = \int_a^b h'\,dx = h(b) - h(a)$ for all $y \in D$, which are therefore not critical for the constraint (2.4.9).

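Proposition 2.4.1 states the necessary conditions that a global minimizer of (2.4.8) under (2.4.9) has to fulfill:

\[
\begin{aligned}
&W'(y') + \lambda \in C^{1,\mathrm{pw}}[a,b] \subset C[a,b], \\
&\frac{d}{dx}\big(W'(y') + \lambda\big) = 0 \quad\text{or}\quad W'(y') + \lambda = c_1 \text{ on } [a,b], \\
&W(y') + \lambda y' - y'\big(W'(y') + \lambda\big) = W(y') - y'W'(y') = c_2 \text{ on } [a,b].
\end{aligned}
\tag{2.4.10}
\]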

Equations (2.4.10) have the same solutions as the variational problem (2.4.8) without a constraint, cf. (1.11.14)–(1.11.19): A global minimizer is a line or a sawtooth function having two specific slopes $c_1^1 = \alpha$ and $c_1^2 = \beta$ that we sketch in Figure 1.21 and that fulfill (1.11.18) or (1.11.19). The number of corners, however, is not determined. For $\alpha < \beta$, let

\[
\begin{aligned}
&y' = \alpha \text{ on intervals } I_i^\alpha \subset [a,b],\ i = 1,\dots,m_1, \\
&y' = \beta \text{ on intervals } I_i^\beta \subset [a,b],\ i = 1,\dots,m_2, \\
&\text{where } \bigcup_{i=1}^{m_1} I_i^\alpha = I^\alpha,\quad \bigcup_{i=1}^{m_2} I_i^\beta = I^\beta,\quad I^\alpha \cup I^\beta = [a,b].
\end{aligned}
\tag{2.4.11}
\]

One possible function $y$ is sketched in Figure 2.4, and we remark that any $y + c$ fulfills the necessary conditions as well.

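A function (2.4.11) fulfills the constraint (2.4.9) if

\[
\begin{aligned}
\alpha|I^\alpha| + \beta|I^\beta| &= m, \\
|I^\alpha| + |I^\beta| &= b - a
\end{aligned}
\tag{2.4.12}
\]

holds. Here $|I^\alpha|$, $|I^\beta|$ denote the total length of all intervals $I_i^\alpha$ and $I_i^\beta$, respectively. Relations (2.4.12) imply

\[
\alpha < \frac{m}{b-a} < \beta \quad\text{and}\quad
|I^\alpha| = \frac{(b-a)\beta - m}{\beta - \alpha},\quad
|I^\beta| = \frac{m - (b-a)\alpha}{\beta - \alpha}.
\tag{2.4.13}
\]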
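The two relations (2.4.12) form a linear system for the total lengths of the two slope regions, and (2.4.13) is its solution. A minimal Python sketch of that computation follows; the function name `phase_lengths` and the sample data are assumptions made for illustration and do not come from the text.

```python
from fractions import Fraction

def phase_lengths(alpha, beta, a, b, m):
    """Solve the linear system (2.4.12) for |I^alpha| and |I^beta| via (2.4.13)."""
    if not alpha < beta:
        raise ValueError("need alpha < beta")
    length = b - a
    len_alpha = (length * beta - m) / (beta - alpha)
    len_beta = (m - length * alpha) / (beta - alpha)
    if len_alpha <= 0 or len_beta <= 0:
        # corresponds to the case treated in (2.4.14): mean slope outside (alpha, beta)
        raise ValueError("m/(b-a) must lie strictly between alpha and beta")
    return len_alpha, len_beta

# Hypothetical data: slopes alpha = -1, beta = 2 on [0, 3] with m = 3 (mean slope 1).
alpha, beta = Fraction(-1), Fraction(2)
a, b, m = Fraction(0), Fraction(3), Fraction(3)
len_alpha, len_beta = phase_lengths(alpha, beta, a, b, m)

# Both relations in (2.4.12) are recovered exactly with rational arithmetic.
check_constraint = alpha * len_alpha + beta * len_beta
check_length = len_alpha + len_beta
```

Exact fractions are used so that the two relations in (2.4.12) can be checked without rounding error.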




Fig. 2.4 A Global Minimizer

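For $\frac{m}{b-a} \le \alpha$ or $\beta \le \frac{m}{b-a}$,

\[ y' = \frac{m}{b-a} \text{ on } [a,b] \tag{2.4.14} \]

gives the only function that fulfills the constraint (2.4.9).

Are the functions given in (2.4.11)–(2.4.14) indeed global minimizers? Figure 1.21 shows that

\[ W(\alpha) + \frac{W(\beta) - W(\alpha)}{\beta - \alpha}(y' - \alpha) \le W(y') \tag{2.4.15} \]

holds for all $y' \in \mathbb{R}$, and integration of (2.4.15) yields, using $\int_a^b y'\,dx = m$,

\[ W(\alpha)\frac{(b-a)\beta - m}{\beta - \alpha} + W(\beta)\frac{m - (b-a)\alpha}{\beta - \alpha} \le \int_a^b W(y')\,dx. \tag{2.4.16} \]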

Therefore, the sawtooth function $y$ given by (2.4.11) with (2.4.13) fulfills

\[
J(y) = \int_a^b W(y')\,dx \le \int_a^b W(\hat y')\,dx = J(\hat y)
\quad\text{for all } \hat y \in C^{1,\mathrm{pw}}[a,b] \text{ fulfilling the constraint (2.4.9).}
\tag{2.4.17}
\]




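Again as shown by Figure 1.21, in case (2.4.14),

\[ W\Big(\frac{m}{b-a}\Big) + W'\Big(\frac{m}{b-a}\Big)\Big(y' - \frac{m}{b-a}\Big) \le W(y') \tag{2.4.18} \]

holds for all $y' \in \mathbb{R}$. Integration of (2.4.18) yields, using $\int_a^b y'\,dx = m$,

\[ W\Big(\frac{m}{b-a}\Big)(b-a) \le \int_a^b W(y')\,dx. \tag{2.4.19} \]

Therefore, the line $y$ given by (2.4.14) fulfills

\[
J(y) = \int_a^b W(y')\,dx \le \int_a^b W(\hat y')\,dx = J(\hat y)
\quad\text{for all } \hat y \in C^{1,\mathrm{pw}}[a,b] \text{ fulfilling the constraint (2.4.9).}
\tag{2.4.20}
\]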

All functions that fulfill the necessary conditions given in Proposition 2.4.1 are global minimizers of the functional (2.4.8) under the constraint (2.4.9).
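The minimality argument can be tried out numerically. The following Python sketch assumes a specific double-well potential $W(u) = (u^2-1)^2$ (so that $\alpha = -1$, $\beta = 1$ and the common tangent is the horizontal axis); this potential, the interval, the value of $m$, and the random piecewise linear competitors are illustrative assumptions, not data from the text. It compares the energy of the sawtooth function with that of competitors fulfilling the same constraint.

```python
import random

def W(u):
    # Hypothetical double-well potential with minima W = 0 at alpha = -1 and beta = 1.
    return (u * u - 1.0) ** 2

def energy(slopes, lengths):
    """J(y) for a piecewise linear y with the given slopes on pieces of the given lengths."""
    return sum(W(s) * l for s, l in zip(slopes, lengths))

def constraint(slopes, lengths):
    """K(y) = integral of y', i.e. the sum of slope times length."""
    return sum(s * l for s, l in zip(slopes, lengths))

a, b, m = 0.0, 2.0, 0.5
alpha, beta = -1.0, 1.0
len_alpha = ((b - a) * beta - m) / (beta - alpha)    # (2.4.13)
len_beta = (m - (b - a) * alpha) / (beta - alpha)
J_saw = energy([alpha, beta], [len_alpha, len_beta])
K_saw = constraint([alpha, beta], [len_alpha, len_beta])

random.seed(0)
n = 8
piece = (b - a) / n
worst_gap = float("inf")
for _ in range(1000):
    slopes = [random.uniform(-3.0, 3.0) for _ in range(n)]
    # Shift all slopes by a constant so that the competitor satisfies K = m.
    shift = (m - constraint(slopes, [piece] * n)) / (b - a)
    slopes = [s + shift for s in slopes]
    worst_gap = min(worst_gap, energy(slopes, [piece] * n) - J_saw)
```

In this sketch `worst_gap` is the smallest value of $J(\hat y) - J(y)$ over the sampled competitors; it stays nonnegative, in line with (2.4.17). Random sampling is of course only a plausibility check, not a substitute for the inequality (2.4.15).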

Substituting $y' = u$ in (2.4.8) and in (2.4.9), the functions given above are global minimizers of

\[ J(u) = \int_a^b W(u)\,dx \quad\text{in } D = C^{\mathrm{pw}}[a,b] \tag{2.4.21} \]

under the constraint

\[ K(u) = \int_a^b u\,dx = m. \tag{2.4.22} \]

Indeed, any $y \in C^{1,\mathrm{pw}}[a,b]$ gives via $y' = u$ a function in $C^{\mathrm{pw}}[a,b]$, and conversely, any $u \in C^{\mathrm{pw}}[a,b]$ gives by integration a function $y \in C^{1,\mathrm{pw}}[a,b]$. Note that no boundary conditions are involved.

An application of problem (2.4.21), (2.4.22) is given by J. Carr, M.E. Gurtin, and M. Slemrod in “Structured Phase Transitions on a Finite Interval,” Arch. Rat. Mech. Anal. 86, 317–351 (1984). They investigate a so-called spinodal decomposition of a binary alloy. Stable states are described by minimizers of the total energy (2.4.21) under preservation of mass (2.4.22). Here, the values of the minima of the potential $W$ at $u = \alpha$ and at $u = \beta$ are identical. The function $u$ measures the relative density (or concentration) of one component, and it is scaled as follows: If $u(x) = \alpha$ or $u(x) = \beta$, then only one component is present at $x \in [a,b]$, respectively. Thus, the piecewise constant minimizers (2.4.11) describe complete decompositions of the two components of the alloy. However, the distribution of the two components, also called phases, is completely arbitrary.

Which distribution of the two phases or which pattern of the interfaces is preferred cannot be determined by the model (2.4.21), (2.4.22).

More information is given by the energy




\[ J_\varepsilon(u) = \int_a^b \varepsilon(u')^2 + W(u)\,dx, \tag{2.4.23} \]

where for small $\varepsilon > 0$, the additional term models the energy of interfaces between the phases. This functional in one or higher dimensions is the so-called Allen-Cahn or Modica-Mortola functional. One expects that global minimizers $u_\varepsilon$ of $J_\varepsilon(u)$ under preservation of mass (2.4.22) converge to a global minimizer $u_0$ of $J_0(u) = J(u)$ as $\varepsilon$ tends to zero and that this limit describes the physically relevant distribution of the two phases. This limiting behavior is proved in the above-mentioned paper. Moreover, the minimizers $u_\varepsilon$ are monotonic and converge pointwise to $u_0$, which has a single interface. The situation is sketched in Figure 2.5.

Fig. 2.5 Special Global Minimizers

The one-dimensional model (2.4.23) is interesting from a mathematical point of view, but it does not describe a real situation. More realistic models in two or higher dimensions are studied in the literature. We mention a result for a two-dimensional model showing the analogous limiting behavior, which is presented by H. Kielhöfer in “Minimizing Sequences Selected via Singular Perturbations and their Pattern Formation,” Arch. Rat. Mech. Anal. 155, 261–276 (2000). For the piecewise constant limit describing a complete decomposition of the two components of the alloy, the “minimal interface criterion” holds, cf. L. Modica, “The Gradient Theory of Phase Transitions and the Minimal Interface Criterion,” Arch. Rat. Mech. Anal. 98, 123–142 (1987).

Exercise

2.4.1. Prove the supplement (2.4.6) of Proposition 2.4.1.



2.5 Holonomic Constraints

The notion of a holonomic constraint, an expression commonly attributed to H. Hertz, was introduced in classical mechanics. The point masses are not free in space but are forced to be on some manifold. For instance, a pendulum moves on a circle, and airplanes or deep-sea vessels run on a sphere.

We present the mathematical definition:

Definition 2.5.1. Let a parametric functional

\[ J(x) = \int_{t_a}^{t_b} \Phi(x,\dot x)\,dt \tag{2.5.1} \]

be defined on $D \subset (C^1[t_a,t_b])^n$. Then,

\[ \Psi(x) = 0 \tag{2.5.2} \]

is called a holonomic constraint. Here,

\[ \Psi : \mathbb{R}^n \to \mathbb{R}^m \quad\text{where } n > m, \tag{2.5.3} \]

is a continuously totally differentiable function, and we assume that the Jacobian matrix

\[ D\Psi(x) = \Big(\frac{\partial\Psi_i}{\partial x_j}(x)\Big)_{\substack{i=1,\dots,m \\ j=1,\dots,n}} \in \mathbb{R}^{m\times n} \tag{2.5.4} \]

has the maximal rank $m$ for all $x$ satisfying $\Psi(x) = 0$.

In the Appendix we prove that the set

\[ M = \{x \in \mathbb{R}^n \mid \Psi(x) = 0\} \text{ is an } (n-m)\text{-dimensional continuously differentiable manifold.} \tag{2.5.5} \]

The constraint (2.5.2) forces admitted curves in $D \subset (C^{1,\mathrm{pw}}[t_a,t_b])^n$ to be on the manifold $M$. The domain of definition $D \subset (C^{1,\mathrm{pw}}[t_a,t_b])^n$ may be characterized by boundary conditions on some components at $t = t_a$ and/or $t = t_b$, which clearly have to be compatible with the constraint.

Again Lagrange’s multiplier rule is the crucial tool to deduce necessary conditions on local constrained minimizers, but the derivation is not as simple as for isoperimetric constraints. The main difference is that the Lagrange multipliers are no longer constant. For technical reasons, we need more regularity of the Lagrange function $\Phi$, of the constraint $\Psi$, and of the local minimizer $x$.

Proposition 2.5.1. Let the curve $x \in D \subset (C^2[t_a,t_b])^n$ be a local minimizer of the parametric functional (2.5.1) under the holonomic constraint (2.5.2). In particular,

\[
J(x) \le J(\hat x) \quad\text{for all } \hat x \in D \cap \{\Psi(\hat x) = 0\} \text{ satisfying } \max_{k=1,\dots,n}\|x_k - \hat x_k\|_1 < d.
\tag{2.5.6}
\]

We assume that the functions $\Phi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ and $\Psi : \mathbb{R}^n \to \mathbb{R}^m$ are twice and three times continuously partially differentiable, respectively. Then, there exists a continuous function

\[ \lambda = (\lambda_1,\dots,\lambda_m) : [t_a,t_b] \to \mathbb{R}^m, \tag{2.5.7} \]

such that $x$ is a solution of the following system of $n$ differential equations:

\[ \frac{d}{dt}\Phi_{\dot x}(x,\dot x) = \Phi_x(x,\dot x) + \sum_{i=1}^m \lambda_i\nabla\Psi_i(x) \quad\text{on } [t_a,t_b]. \tag{2.5.8} \]

Here, we agree upon $\Phi_{\dot x} = (\Phi_{\dot x_1},\dots,\Phi_{\dot x_n})$, $\Phi_x = (\Phi_{x_1},\dots,\Phi_{x_n})$, $x = (x_1,\dots,x_n)$, $\dot x = (\dot x_1,\dots,\dot x_n)$, and $\nabla\Psi_i = (\Psi_{i,x_1},\dots,\Psi_{i,x_n})$. The functions $\lambda_1,\dots,\lambda_m$ are called Lagrange multipliers.

Proof. As a first step, we construct an admitted perturbation of the minimizing curve $x \in D \cap \{\Psi(x) = 0\}$. That perturbation is no longer of the simple form $x + sh$, since we need to guarantee that the perturbation $x + h(s,\cdot)$ stays on the manifold $M$.

For that purpose, we use the nomenclature of the Appendix. Let

\[ h \in (C_0^2[t_a,t_b])^n \text{ be a curve in } \mathbb{R}^n, \tag{2.5.9} \]

and for $x \in M$, let $P(x) : \mathbb{R}^n \to T_xM$ be the orthogonal projection on the tangent space $T_xM$ in $x$. This operator is twice continuously differentiable with respect to $x \in M$, cf. (A.17), (A.19). Then,

\[
\begin{aligned}
&a(t) = P(x(t))h(t) \in T_{x(t)}M \text{ for } t \in [t_a,t_b], \\
&a \in (C^2[t_a,t_b])^n \text{ and } a(t_a) = 0,\ a(t_b) = 0.
\end{aligned}
\tag{2.5.10}
\]

For $x \in M$, let $Q(x) : \mathbb{R}^n \to N_xM$ be the orthogonal projection on the normal space $N_xM$. Again this operator is twice continuously differentiable with respect to $x \in M$, cf. (A.16), and for any fixed $t_0 \in [t_a,t_b]$,

\[ Q(x(t)) : N_{x(t_0)}M \to N_{x(t)}M \text{ is bijective, provided } |t - t_0| < \delta, \tag{2.5.11} \]

where $0 < \delta$ is sufficiently small, cf. (A.18). We define

\[
\begin{aligned}
&H : \mathbb{R} \times [t_a,t_b] \times N_{x(t_0)}M \to \mathbb{R}^m \text{ by} \\
&H(s,t,z) = \Psi\big(x(t) + s\,a(t) + Q(x(t))z\big),
\end{aligned}
\tag{2.5.12}
\]

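and by the assumptions on $x$ and $\Psi$, the properties of the curve $a$, and the projection $Q$, the function $H$ satisfies:

\[
\begin{aligned}
&H \text{ is twice continuously differentiable with respect to all three variables,} \\
&H(0,t_0,0) = \Psi(x(t_0)) = 0, \\
&D_zH(0,t_0,0) = D\Psi(x(t_0))Q(x(t_0)) : N_{x(t_0)}M \to \mathbb{R}^m \text{ is bijective,} \\
&\text{since } Q(x(t_0))|_{N_{x(t_0)}M} = I, \text{ and} \\
&D\Psi(x(t_0)) : N_{x(t_0)}M \to \mathbb{R}^m \text{ is bijective, cf. (A.9).}
\end{aligned}
\tag{2.5.13}
\]

By the theorem on implicit functions, there exists a unique function

\[
\begin{aligned}
&\beta : (-\varepsilon_0,\varepsilon_0) \times (t_0-\delta_0,t_0+\delta_0) \to N_{x(t_0)}M, \text{ satisfying} \\
&\beta(0,t_0) = 0 \text{ and } H(s,t,\beta(s,t)) = 0 \\
&\text{for } (s,t) \in (-\varepsilon_0,\varepsilon_0) \times (t_0-\delta_0,t_0+\delta_0).
\end{aligned}
\tag{2.5.14}
\]

Moreover, that function $\beta$ is twice continuously differentiable with respect to its two variables. (If $t_0 = t_a$ or $t_0 = t_b$, choose the domains of definition $(-\varepsilon_0,\varepsilon_0) \times [t_a,t_a+\delta_0)$ and $(-\varepsilon_0,\varepsilon_0) \times (t_b-\delta_0,t_b]$, respectively.) Defining

\[
\begin{aligned}
&h(s,t) = s\,a(t) + Q(x(t))\beta(s,t) = s\,a(t) + b(s,t), \text{ we obtain} \\
&\Psi(x(t) + h(s,t)) = 0, \text{ meaning } x(t) + h(s,t) \in M \\
&\text{for } (s,t) \in (-\varepsilon_0,\varepsilon_0) \times (t_0-\delta_0,t_0+\delta_0).
\end{aligned}
\tag{2.5.15}
\]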

Moreover, by construction,

\[ a(t) \in T_{x(t)}M \text{ and } b(s,t) \in N_{x(t)}M. \tag{2.5.16} \]

Due to the local uniqueness of $\beta$ in (2.5.14) and by the injectivity of $Q(x(t))$ on $N_{x(t_0)}M$, cf. (2.5.11) for $0 < \delta \le \delta_0$, the perturbation $h$ has the following properties:

\[
\begin{aligned}
&s\,a(t) = 0 \ \Rightarrow\ \beta(s,t) = 0. \text{ In particular,} \\
&\beta(0,t) = 0,\ \beta(s,t_a) = 0,\ \beta(s,t_b) = 0, \text{ meaning} \\
&h(0,t) = 0 \text{ for } t \in (t_0-\delta_0,t_0+\delta_0) \text{ and} \\
&h(s,t_a) = 0,\ h(s,t_b) = 0.
\end{aligned}
\tag{2.5.17}
\]

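By $\Psi(x(t) + s\,a(t) + b(s,t)) = 0$ for all $s \in (-\varepsilon_0,\varepsilon_0)$ and for any fixed $t \in (t_0-\delta_0,t_0+\delta_0)$, the derivative with respect to $s$ vanishes, i.e.,

\[
\begin{aligned}
&\frac{\partial}{\partial s}\Psi\big(x(t) + s\,a(t) + b(s,t)\big)\Big|_{s=0} \\
&\quad = D\Psi(x(t))\Big(a(t) + \frac{\partial}{\partial s}b(0,t)\Big) = D\Psi(x(t))\frac{\partial}{\partial s}b(0,t) = 0, \\
&\text{since } a(t) \in T_{x(t)}M = \operatorname{Kern} D\Psi(x(t)), \text{ cf. (A.7).}
\end{aligned}
\tag{2.5.18}
\]

By construction $b(s,t) \in N_{x(t)}M$, cf. (2.5.16), and hence, $\frac{\partial}{\partial s}b(0,t) \in N_{x(t)}M$. Then, injectivity of $D\Psi(x(t))$ on $N_{x(t)}M$, cf. (A.9), implies

\[
\begin{aligned}
&\frac{\partial}{\partial s}b(0,t) = 0, \text{ and by definition of } h \text{ in (2.5.15),} \\
&\frac{\partial}{\partial s}h(0,t) = a(t) \text{ for all } t \in (t_0-\delta_0,t_0+\delta_0).
\end{aligned}
\tag{2.5.19}
\]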

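Finally, due to the continuity of the second derivatives, we note

\[
\frac{\partial}{\partial s}\frac{\partial}{\partial t}h(0,t) = \frac{\partial}{\partial t}\frac{\partial}{\partial s}h(0,t) = \frac{\partial}{\partial t}a(t) = \dot a(t)
\quad\text{for all } t \in (t_0-\delta_0,t_0+\delta_0).
\tag{2.5.20}
\]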
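For the unit circle the construction of the perturbation can be carried out in closed form, which makes the properties $x + h(s,\cdot) \in M$ and $\frac{\partial}{\partial s}h(0,t) = a(t)$ easy to inspect. The Python sketch below assumes $\Psi(x) = x_1^2 + x_2^2 - 1$, the curve $x(t) = (\cos t, \sin t)$ on $[0,\pi/2]$, and a tangential part $a(t) = \sin(2t)(-\sin t, \cos t)$; all of these choices, and the function names, are illustrative assumptions. Since $x(t)$ is a unit normal vector, the normal correction is $b(s,t) = \beta(s,t)\,x(t)$ with $(1+\beta)^2 + s^2|a(t)|^2 = 1$.

```python
import math

def curve(t):
    """Minimizing curve x(t) on the unit circle (an assumed example)."""
    return (math.cos(t), math.sin(t))

def tangent_part(t):
    """a(t) in the tangent space; the factor sin(2t) vanishes at t = 0 and t = pi/2."""
    c = math.sin(2.0 * t)
    return (-c * math.sin(t), c * math.cos(t))

def perturbation(s, t):
    """h(s,t) = s a(t) + b(s,t) with b normal to the circle, so that x + h stays on it."""
    x1, x2 = curve(t)
    a1, a2 = tangent_part(t)
    a_sq = a1 * a1 + a2 * a2
    beta = math.sqrt(1.0 - s * s * a_sq) - 1.0   # the root with beta(0, t) = 0
    return (s * a1 + beta * x1, s * a2 + beta * x2)

def psi(x):
    """Holonomic constraint of the unit circle."""
    return x[0] ** 2 + x[1] ** 2 - 1.0

t, s, eps = 0.6, 0.3, 1e-5
x = curve(t)
h = perturbation(s, t)
on_manifold = psi((x[0] + h[0], x[1] + h[1]))

# Central difference quotient for the s-derivative of h at s = 0.
hp, hm = perturbation(eps, t), perturbation(-eps, t)
dh_ds = ((hp[0] - hm[0]) / (2 * eps), (hp[1] - hm[1]) / (2 * eps))
a = tangent_part(t)
```

Here `on_manifold` vanishes up to rounding, and `dh_ds` agrees with `a`, because the normal correction $\beta$ is of second order in $s$; this is the content of (2.5.19) in this special case.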

Thus far, $h$ is constructed only on $(-\varepsilon_0,\varepsilon_0) \times (t_0-\delta_0,t_0+\delta_0)$, where $\delta_0$ and $\varepsilon_0$ depend on $t_0$. Since $[t_a,t_b]$ is compact, finitely many open intervals $(t_0-\delta_0,t_0+\delta_0)$ cover $[t_a,t_b]$, and for simplicity we denote the smallest among the finitely many $\varepsilon_0 > 0$ again by $\varepsilon_0$. For $t$ in an intersection of two intervals $(t_0-\delta_0,t_0+\delta_0)$, the two functions $\beta(s,t)$ coincide on the intersection of the two rectangles $(-\varepsilon_0,\varepsilon_0) \times (t_0-\delta_0,t_0+\delta_0)$ by uniqueness, cf. (2.5.14). We summarize:

\[
\begin{aligned}
&\text{There exists a perturbation} \\
&h : (-\varepsilon_0,\varepsilon_0) \times [t_a,t_b] \to \mathbb{R}^n \text{ which is twice continuously differentiable} \\
&\text{with respect to both variables and which has the property that} \\
&x(t) + h(s,t) \in M \text{ for } (s,t) \in (-\varepsilon_0,\varepsilon_0) \times [t_a,t_b]. \\
&\text{The perturbation is composed of two terms, } h(s,t) = s\,a(t) + b(s,t), \\
&\text{where } a(t) \in T_{x(t)}M,\ b(s,t) \in N_{x(t)}M, \text{ satisfying} \\
&h(s,t_a) = 0,\ h(s,t_b) = 0 \text{ for } s \in (-\varepsilon_0,\varepsilon_0), \\
&h(0,t) = 0,\ \frac{\partial}{\partial s}h(0,t) = a(t) \text{ and } \frac{\partial^2}{\partial s\,\partial t}h(0,t) = \dot a(t) \text{ for } t \in [t_a,t_b].
\end{aligned}
\tag{2.5.21}
\]

In a second step of the proof, we make use of the minimizing property of $x$ under the holonomic constraint. Therefore, in view of the properties of the perturbation $h$, the real-valued function

\[
\begin{aligned}
&J(x + h(s,\cdot)) \text{ is locally minimal at } s = 0, \text{ whence} \\
&\frac{d}{ds}J(x + h(s,\cdot))\Big|_{s=0} = 0.
\end{aligned}
\tag{2.5.22}
\]

Following the arguments in the proof of Proposition 1.2.1, the derivative with respect to $s$ is computed as follows:




Fig. 2.6 A Perturbation on a Manifold

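\[
\begin{aligned}
&\frac{d}{ds}\int_{t_a}^{t_b} \Phi\Big(x + h(s,\cdot),\ \dot x + \frac{\partial}{\partial t}h(s,\cdot)\Big)dt\,\Big|_{s=0} \\
&\quad = \int_{t_a}^{t_b} \sum_{k=1}^n \Big(\Phi_{x_k}(x,\dot x)\frac{\partial}{\partial s}h_k(0,\cdot) + \Phi_{\dot x_k}(x,\dot x)\frac{\partial^2}{\partial s\,\partial t}h_k(0,\cdot)\Big)dt \\
&\quad = \int_{t_a}^{t_b} (\Phi_x(x,\dot x),a) + (\Phi_{\dot x}(x,\dot x),\dot a)\,dt = 0,
\end{aligned}
\tag{2.5.23}
\]

where, as usual, $\Phi_x = (\Phi_{x_1},\dots,\Phi_{x_n})$, $\Phi_{\dot x} = (\Phi_{\dot x_1},\dots,\Phi_{\dot x_n})$, and $(\ ,\ )$ is the scalar product in $\mathbb{R}^n$. We also use (2.5.21)$_8$.

Due to the assumed regularity of $x$ and $\Phi$, integration by parts is possible, which yields

\[ \int_{t_a}^{t_b} \Big(\Phi_x(x,\dot x) - \frac{d}{dt}\Phi_{\dot x}(x,\dot x),\ a\Big)dt = 0. \tag{2.5.24} \]

Note that due to $a(t_a) = 0$ and $a(t_b) = 0$, the boundary terms vanish.

The following computations make use of both the properties of the projections summarized in (A.14) and of the Definition (A.7) of $T_{x(t)}M$ and $N_{x(t)}M$: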



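\[
\begin{aligned}
&Q(x(t))\Big(\frac{d}{dt}\Phi_{\dot x}(x(t),\dot x(t)) - \Phi_x(x(t),\dot x(t))\Big) \in N_{x(t)}M = \operatorname{range} D\Psi(x(t))^*, \\
&D\Psi(x(t))^* = \big(\nabla\Psi_1(x(t)) \cdots \nabla\Psi_m(x(t))\big) \in \mathbb{R}^{n\times m}, \\
&Q(x(t))\Big(\frac{d}{dt}\Phi_{\dot x}(x(t),\dot x(t)) - \Phi_x(x(t),\dot x(t))\Big) = \sum_{i=1}^m \lambda_i(t)\nabla\Psi_i(x(t)) \\
&\text{for } t \in [t_a,t_b].
\end{aligned}
\tag{2.5.25}
\]

(If linear mappings or operators are identified with matrices, then they operate on column vectors. In the transposed Jacobian matrix, cf. (A.3), gradients of the components of a mapping are columns.)

In view of (A.24), the coefficients $\lambda_i(t)$ are unique, and by continuity of the left-hand side, $\lambda = (\lambda_1,\dots,\lambda_m) \in (C[t_a,t_b])^m$.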

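Coming back to (2.5.9) and (2.5.10), we have the decompositions

\[
\begin{aligned}
&h(t) = P(x(t))h(t) + Q(x(t))h(t) = a(t) + b(t), \\
&\text{where } a(t) \in T_{x(t)}M \text{ and } b(t) \in N_{x(t)}M.
\end{aligned}
\tag{2.5.26}
\]

Then finally:

\[
\begin{aligned}
&\int_{t_a}^{t_b} \Big(\frac{d}{dt}\Phi_{\dot x}(x,\dot x) - \Phi_x(x,\dot x) - \sum_{i=1}^m \lambda_i\nabla\Psi_i(x),\ h\Big)dt \\
&\quad = \int_{t_a}^{t_b} \Big(\frac{d}{dt}\Phi_{\dot x}(x,\dot x) - \Phi_x(x,\dot x) - \sum_{i=1}^m \lambda_i\nabla\Psi_i(x),\ a\Big)dt \\
&\qquad + \int_{t_a}^{t_b} \Big(\frac{d}{dt}\Phi_{\dot x}(x,\dot x) - \Phi_x(x,\dot x) - \sum_{i=1}^m \lambda_i\nabla\Psi_i(x),\ b\Big)dt = 0.
\end{aligned}
\tag{2.5.27}
\]

The first integral (2.5.27)$_2$ vanishes by virtue of (2.5.24) and $(\sum_{i=1}^m \lambda_i\nabla\Psi_i(x), a) = 0$ pointwise for all $t \in [t_a,t_b]$. Observe that the vectors in the scalar product are orthogonal. The second integral (2.5.27)$_3$ vanishes by (2.5.25)$_3$: Using the properties (A.14) of the projection $Q(x(t))$, the integrand vanishes pointwise for all $t \in [t_a,t_b]$. Observe that $b = Q(x)b$.

Since $h \in (C_0^2[t_a,t_b])^n$ is arbitrary, Lemma 1.3.1 and (2.5.27) imply the claim (2.5.8). □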

Lagrangian Mechanics: $N$ point masses $m_1,\dots,m_N$ in $\mathbb{R}^3$ have the coordinates $x = (x_1,y_1,z_1,\dots,x_N,y_N,z_N) \in \mathbb{R}^{3N}$, where $(x_k,y_k,z_k)$ are the coordinates of $m_k$. Their motion is governed by the kinetic energy

\[ T(\dot x) = \sum_{k=1}^N \frac12 m_k(\dot x_k^2 + \dot y_k^2 + \dot z_k^2) \tag{2.5.28} \]

and by the potential energy

\[ V(x) = V(x_1,y_1,z_1,\dots,x_N,y_N,z_N) \tag{2.5.29} \]

under $m < 3N$ holonomic constraints

\[ \Psi_i(x) = \Psi_i(x_1,y_1,z_1,\dots,x_N,y_N,z_N) = 0,\quad i = 1,\dots,m, \tag{2.5.30} \]

which describe exterior constraints as well as possibly interior links among the point masses.

The action along a trajectory of all $N$ point masses is

\[ J(x) = \int_{t_a}^{t_b} T(\dot x) - V(x)\,dt, \tag{2.5.31} \]

where $L = T - V$ is called the Lagrangian. According to Proposition 2.5.1, a trajectory that minimizes the action has to satisfy

\[
\begin{aligned}
m_k\ddot x_k &= -V_{x_k}(x) + \sum_{i=1}^m \lambda_i\Psi_{i,x_k}(x), \\
m_k\ddot y_k &= -V_{y_k}(x) + \sum_{i=1}^m \lambda_i\Psi_{i,y_k}(x), \\
m_k\ddot z_k &= -V_{z_k}(x) + \sum_{i=1}^m \lambda_i\Psi_{i,z_k}(x) \quad\text{for } k = 1,\dots,N,
\end{aligned}
\tag{2.5.32}
\]

provided the hypotheses of Proposition 2.5.1 are fulfilled.

The number $3N - m$ is the dimension of the manifold $M$ in $\mathbb{R}^{3N}$ defined by the constraints (2.5.30). It is the number of the degrees of freedom that the point masses have in moving freely on the manifold $M$. The $3N - m$ free coordinates on the manifold are called generalized coordinates. The additional terms in (2.5.32) compared to the free system (1.10.27) have the physical dimension of forces and are called constraining forces. They act orthogonally to the manifold $M$, cf. (2.5.25).

The simple gravity pendulum: The point mass $m$ moves in the vertical $(x_1,x_2)$-plane subject to gravity $mg$ in direction of the negative $x_2$-axis, and it is forced to move on a circle with radius $\ell$.

The Lagrangian $L = T - V$ is given by

\[ L(x,\dot x) = \frac12 m(\dot x_1^2 + \dot x_2^2) - mgx_2, \tag{2.5.33} \]

with the holonomic constraint

\[ \Psi(x) = x_1^2 + x_2^2 - \ell^2 = 0. \tag{2.5.34} \]

We then obtain the system (cf. (2.5.32))

\[
\begin{aligned}
m\ddot x_1 &= 2\lambda x_1, \\
m\ddot x_2 &= -mg + 2\lambda x_2.
\end{aligned}
\tag{2.5.35}
\]

The constraining force acts in direction $(x_1,x_2)$ orthogonal to the circle (2.5.34).




Fig. 2.7 The Gravity Pendulum

The equation of motion of a pendulum is commonly given in terms of the generalized coordinate $\varphi$, which is the angle sketched in Figure 2.7. Introducing polar coordinates

\[
\begin{aligned}
x_1 &= r\sin\varphi, \\
x_2 &= -r\cos\varphi,
\end{aligned}
\tag{2.5.36}
\]

the Lagrangian (2.5.33) becomes

\[ \frac12 m(\dot r^2 + r^2\dot\varphi^2) + mgr\cos\varphi. \tag{2.5.37} \]

On the circle (2.5.34), we have $r = \ell$, $\dot r = 0$, and in terms of the generalized coordinate $\varphi$, the Lagrangian reads

\[ L(\varphi,\dot\varphi) = \frac12 m\ell^2\dot\varphi^2 + mg\ell\cos\varphi. \tag{2.5.38} \]

Hamilton’s principle says that solutions of the Euler-Lagrange equation with the Lagrangian (2.5.38) give solutions of the constrained system (2.5.35) and vice versa. We confirm Hamilton’s principle for this example.

The Euler-Lagrange equation for (2.5.38) is the so-called pendulum equation

\[
\begin{aligned}
&\frac{d}{dt}L_{\dot\varphi} = L_\varphi, \text{ or explicitly} \\
&\ell\ddot\varphi + g\sin\varphi = 0.
\end{aligned}
\tag{2.5.39}
\]

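Let $\varphi$ be a solution of (2.5.39). According to Exercise 1.10.4, the conservation law

\[ \frac12\ell\dot\varphi^2 - g\cos\varphi = E = \text{const.} \tag{2.5.40} \]

is valid along any solution. Defining

\[
\begin{aligned}
x_1 &= \ell\sin\varphi, & \dot x_1 &= \ell\cos\varphi\,\dot\varphi, & \ddot x_1 &= -\ell\sin\varphi\,\dot\varphi^2 + \ell\cos\varphi\,\ddot\varphi, \\
x_2 &= -\ell\cos\varphi, & \dot x_2 &= \ell\sin\varphi\,\dot\varphi, & \ddot x_2 &= \ell\cos\varphi\,\dot\varphi^2 + \ell\sin\varphi\,\ddot\varphi,
\end{aligned}
\tag{2.5.41}
\]

and using (2.5.39) and (2.5.40), the functions (2.5.41) solve the system (2.5.35) for

\[ \lambda = -\frac{m}{2\ell}(3g\cos\varphi + 2E) = -\frac{m}{2\ell}\Big(-3\frac{g}{\ell}x_2 + 2E\Big). \tag{2.5.42} \]

On the other hand, let $(x_1,x_2)$ be a solution of (2.5.35) satisfying $x_1^2 + x_2^2 = \ell^2$. We define

\[ \varphi = \arctan\Big(-\frac{x_1}{x_2}\Big) \quad\text{or}\quad x_1 = \ell\sin\varphi,\ x_2 = -\ell\cos\varphi, \tag{2.5.43} \]

and by (2.5.41), one obtains

\[
\begin{aligned}
m\ddot x_1 - 2\lambda x_1 &= -m\ell\sin\varphi\,\dot\varphi^2 + m\ell\cos\varphi\,\ddot\varphi - 2\lambda\ell\sin\varphi = 0, \\
m\ddot x_2 + mg - 2\lambda x_2 &= m\ell\cos\varphi\,\dot\varphi^2 + m\ell\sin\varphi\,\ddot\varphi + mg + 2\lambda\ell\cos\varphi = 0.
\end{aligned}
\tag{2.5.44}
\]

Multiplication of the first equation by $\cos\varphi$ and of the second equation by $\sin\varphi$ and addition of the two equations yield the differential equation (2.5.39) after division by $m$. Relation (2.5.42) gives the constraining force $2\lambda(x_1,x_2)$.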
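The equivalence just confirmed by hand can also be checked numerically. The following Python sketch integrates the pendulum equation (2.5.39) with a classical Runge-Kutta scheme, evaluates the multiplier from (2.5.42) with the energy $E$ fixed by the initial data, and records the residuals of the constrained system (2.5.35). The parameter values, the step size, and the helper names are assumptions chosen for illustration.

```python
import math

g, ell, mass = 9.81, 1.0, 1.0   # hypothetical parameter values

def rhs(phi, omega):
    """First-order form of the pendulum equation (2.5.39): phi' = omega, omega' = -(g/ell) sin(phi)."""
    return omega, -(g / ell) * math.sin(phi)

def rk4_step(phi, omega, dt):
    """One classical Runge-Kutta step."""
    k1 = rhs(phi, omega)
    k2 = rhs(phi + 0.5 * dt * k1[0], omega + 0.5 * dt * k1[1])
    k3 = rhs(phi + 0.5 * dt * k2[0], omega + 0.5 * dt * k2[1])
    k4 = rhs(phi + dt * k3[0], omega + dt * k3[1])
    phi += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    omega += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return phi, omega

phi, omega = 1.0, 0.0
E = 0.5 * ell * omega ** 2 - g * math.cos(phi)      # conserved quantity (2.5.40)
max_residual = 0.0
for _ in range(5000):
    phi, omega = rk4_step(phi, omega, 1e-3)
    phi_dd = -(g / ell) * math.sin(phi)              # second derivative from (2.5.39)
    x1, x2 = ell * math.sin(phi), -ell * math.cos(phi)
    # Second derivatives of x1, x2 according to (2.5.41).
    ddx1 = -ell * math.sin(phi) * omega ** 2 + ell * math.cos(phi) * phi_dd
    ddx2 = ell * math.cos(phi) * omega ** 2 + ell * math.sin(phi) * phi_dd
    lam = -(mass / (2.0 * ell)) * (3.0 * g * math.cos(phi) + 2.0 * E)   # (2.5.42)
    r1 = mass * ddx1 - 2.0 * lam * x1                # residual of (2.5.35), first equation
    r2 = mass * ddx2 + mass * g - 2.0 * lam * x2     # residual of (2.5.35), second equation
    max_residual = max(max_residual, abs(r1), abs(r2))
```

In this sketch the residuals are proportional to the numerical drift of the energy (2.5.40), so `max_residual` stays at the level of the integration error.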

Another example is the cycloidal pendulum: The point mass $m$ moves in a vertical $(x_1,x_2)$-plane subject to gravity $mg$, and it is forced on a cycloid. In order to use the formulas (1.8.17) for the cycloid, we orient the $x_2$-axis downward. The generalized coordinate $\varphi$ on the cycloid is the angle of the generating wheel sketched in Figure 2.8. By (1.8.17), the point mass $m$ has the coordinates

\[ x_1 = r(\varphi - \sin\varphi),\quad x_2 = r(1 - \cos\varphi),\quad \varphi \in [0,2\pi], \tag{2.5.45} \]

which allows the Lagrangian to be expressed in terms of the generalized coordinate $\varphi$ as follows:

\[
\begin{aligned}
\dot x_1 &= r(1 - \cos\varphi)\dot\varphi, \\
\dot x_2 &= r\sin\varphi\,\dot\varphi, \\
T &= \frac12 m(\dot x_1^2 + \dot x_2^2) = mr^2(1 - \cos\varphi)\dot\varphi^2, \\
V &= -mgx_2 = mgr(\cos\varphi - 1), \\
L &= mr^2(1 - \cos\varphi)\dot\varphi^2 - mgr(\cos\varphi - 1).
\end{aligned}
\tag{2.5.46}
\]

This gives finally the Euler-Lagrange equation




Fig. 2.8 The Cycloidal Pendulum

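\[
\begin{aligned}
&\frac{d}{dt}L_{\dot\varphi} = L_\varphi, \text{ or explicitly} \\
&(1 - \cos\varphi)\ddot\varphi + \frac12\sin\varphi\,\dot\varphi^2 = \frac{g}{2r}\sin\varphi.
\end{aligned}
\tag{2.5.47}
\]

Using the trigonometric formulas $1 - \cos\varphi = 2\sin^2\frac{\varphi}{2}$ and $\sin\varphi = 2\sin\frac{\varphi}{2}\cos\frac{\varphi}{2}$, one obtains after division by $2\sin\frac{\varphi}{2}$

\[
\begin{aligned}
&\sin\frac{\varphi}{2}\,\ddot\varphi + \frac12\cos\frac{\varphi}{2}\,\dot\varphi^2 = \frac{g}{2r}\cos\frac{\varphi}{2}, \\
&\text{or } -2\frac{d^2}{dt^2}\cos\frac{\varphi}{2} = \frac{g}{2r}\cos\frac{\varphi}{2}, \\
&\text{or } \ddot q + \omega q = 0, \text{ where } q = \cos\frac{\varphi}{2} \text{ and } \omega = \frac{g}{4r}.
\end{aligned}
\tag{2.5.48}
\]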

The last equation has the solutions $q(t) = c_1\sin\sqrt{\omega}\,t + c_2\cos\sqrt{\omega}\,t$ with the period

\[ P = 2\pi\sqrt{\frac{4r}{g}}. \tag{2.5.49} \]

In contrast to the simple gravity pendulum with period

\[ P \approx 2\pi\sqrt{\frac{\ell}{g}}, \tag{2.5.50} \]

valid only for small amplitudes, the period (2.5.49) of the cycloidal pendulum does not depend on the amplitude. Therefore, the cycloidal pendulum is isochronal.
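The isochrony can be observed numerically by integrating the Euler-Lagrange equation (2.5.47) directly, without using the substitution $q = \cos\frac{\varphi}{2}$. The Python sketch below releases the mass at rest at two different angles $\varphi_0 < \pi$ and measures the time between two successive upward passages through the lowest point $\varphi = \pi$. The values of $g$ and $r$, the step size, the release angles, and the function names are illustrative assumptions.

```python
import math

g, r = 9.81, 0.5   # hypothetical values for gravity and wheel radius

def accel(phi, omega):
    """Second derivative of phi solved from (2.5.47); valid away from the cusps phi = 0, 2*pi."""
    return (g / (2.0 * r) - 0.5 * omega ** 2) * math.sin(phi) / (1.0 - math.cos(phi))

def rk4_step(phi, omega, dt):
    """One classical Runge-Kutta step for phi' = omega, omega' = accel(phi, omega)."""
    k1p, k1w = omega, accel(phi, omega)
    k2p, k2w = omega + 0.5 * dt * k1w, accel(phi + 0.5 * dt * k1p, omega + 0.5 * dt * k1w)
    k3p, k3w = omega + 0.5 * dt * k2w, accel(phi + 0.5 * dt * k2p, omega + 0.5 * dt * k2w)
    k4p, k4w = omega + dt * k3w, accel(phi + dt * k3p, omega + dt * k3w)
    return (phi + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0,
            omega + dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6.0)

def measured_period(phi0, dt=5e-4, max_steps=200000):
    """Release at rest at phi0 < pi and time two successive upward passages through phi = pi."""
    phi, omega, t = phi0, 0.0, 0.0
    crossings = []
    for _ in range(max_steps):
        new_phi, new_omega = rk4_step(phi, omega, dt)
        if phi < math.pi <= new_phi:
            # Linear interpolation of the crossing time inside the step.
            crossings.append(t + dt * (math.pi - phi) / (new_phi - phi))
            if len(crossings) == 2:
                return crossings[1] - crossings[0]
        phi, omega, t = new_phi, new_omega, t + dt
    raise RuntimeError("no full oscillation detected")

predicted = 2.0 * math.pi * math.sqrt(4.0 * r / g)   # period (2.5.49)
small = measured_period(2.8)    # small amplitude around the lowest point phi = pi
large = measured_period(0.5)    # large amplitude, released close to the cusp
```

Both measured periods agree with (2.5.49) up to the integration error, although the two amplitudes differ considerably.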



Remark. If $N$ point masses are forced by a holonomic constraint on a $(3N-m)$-dimensional manifold, they have by definition $3N - m = n$ degrees of freedom. That $n$-dimensional manifold $M$ can be described by $n$ independent and free coordinates $q_1,\dots,q_n$. (In (A.11), this is done locally.)

If the Lagrangian $L = T - V$ is expressed in terms of the so-called generalized coordinates $q = (q_1,\dots,q_n)$, then by Hamilton’s principle the motion of the $N$ point masses is governed by the Euler-Lagrange equation of the action:

\[
\begin{aligned}
&q = (q_1,\dots,q_n) \text{ are the generalized coordinates,} \\
&L(q,\dot q) \text{ is the Lagrangian,} \\
&\frac{d}{dt}L_{\dot q}(q,\dot q) = L_q(q,\dot q) \text{ is the Euler-Lagrange equation,}
\end{aligned}
\tag{2.5.51}
\]

whose solutions describe the motion of the $N$ point masses. It is a system of $n$ ordinary differential equations of second order.

By Proposition 1.4.2, the strong and weak versions of the Euler-Lagrange equation are equivalent, and the latter means that the first variation of the action functional vanishes. This is also briefly referred to as a principle of stationary or least action.

By the so-called Legendre transformation, new coordinates and a new function $H$ are introduced:

\[ p = L_{\dot q}(q,\dot q), \tag{2.5.52} \]

where the solution for $\dot q$ is required to be unique, i.e.,

\[ p = L_{\dot q}(q,\dot q) \ \Leftrightarrow\ \dot q = h(p,q), \tag{2.5.53} \]

for a continuously totally differentiable function $h : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$. One defines

\[
\begin{aligned}
&H : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}, \\
&H(p,q) = L(q,h(p,q)) - (p,h(p,q)), \text{ or more concisely,} \\
&H = L - \dot qL_{\dot q} = L - p\dot q,
\end{aligned}
\tag{2.5.54}
\]

where the notation for the scalar product in $\mathbb{R}^n$, used in (2.5.54)$_3$, is commonly used in physics.

The coordinates $p$ are called generalized momenta, and the function $H$ is called the Hamiltonian. For a solution $q$ of the Euler-Lagrange system (2.5.51)$_3$, one obtains by definitions (2.5.53) and (2.5.54)$_2$,

\[
\begin{aligned}
H_q &= L_q + L_{\dot q}h_q - ph_q = L_q = \frac{d}{dt}L_{\dot q} = \dot p, \\
H_p &= L_{\dot q}h_p - h - ph_p = -h = -\dot q.
\end{aligned}
\tag{2.5.55}
\]

The so-called Hamiltonian system

\[
\begin{aligned}
\dot p &= H_q(p,q), \\
\dot q &= -H_p(p,q),
\end{aligned}
\tag{2.5.56}
\]

is equivalent to the Euler-Lagrange system (2.5.51)$_3$ if one redefines $p = L_{\dot q}$, $\dot q = h(p,q)$, and $L = H + p\dot q$. The Hamiltonian system is a system of $2n$ ordinary differential equations of first order.

The price for the reduction of the order is the doubling of the dimension. What is the advantage?

The special structure of the Hamiltonian system implies some useful properties of its flow. We mention an important conservation law:

\[
\begin{aligned}
&\frac{d}{dt}H(p,q) = H_p(p,q)\dot p + H_q(p,q)\dot q = 0, \text{ whence} \\
&H(p,q) = \text{const. along solutions of (2.5.56).}
\end{aligned}
\tag{2.5.57}
\]

For $L = \frac12 m\dot q^2 - V(q)$, the Hamiltonian $H = L - \dot qL_{\dot q} = -\frac12 m\dot q^2 - V(q) = -\frac{1}{2m}p^2 - V(q) = -E$ is the negative total energy, which is conserved along solutions of the Hamiltonian system. (We see in Exercise 1.10.4 that this is true also for solutions of the Euler-Lagrange equation.) In this case, the Hamiltonian system reads $\dot p = -V_q(q)$, $\dot q = \frac{1}{m}p$.

More conservation laws can be found as follows: A function $E = E(p,q)$ is constant along solutions of (2.5.56) if

\[
\begin{aligned}
&\frac{d}{dt}E = E_p\dot p + E_q\dot q = E_pH_q - E_qH_p = 0 \text{ or if} \\
&[E,H] = E_pH_q - E_qH_p = 0.
\end{aligned}
\tag{2.5.58}
\]

The expression $[E,H]$ is called Poisson bracket. Equation (2.5.58)$_2$ is a partial differential equation of first order.
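The conservation law (2.5.57) can be illustrated for the pendulum Lagrangian (2.5.38). With $q = \varphi$ and $p = L_{\dot q} = m\ell^2\dot q$, the definition (2.5.54) gives $H(p,q) = -\frac{p^2}{2m\ell^2} + mg\ell\cos q$ in the sign convention used here. The Python sketch below integrates the Hamiltonian system (2.5.56) and monitors $H$ along the numerical solution; the parameter values, the integrator, and the function names are assumptions made for illustration.

```python
import math

m, g, ell = 1.0, 9.81, 1.0   # hypothetical parameter values

def H(p, q):
    """H = L - p*qdot for L = (1/2) m ell^2 qdot^2 + m g ell cos(q), with p = m ell^2 qdot."""
    return -p * p / (2.0 * m * ell ** 2) + m * g * ell * math.cos(q)

def field(p, q):
    """Right-hand side of (2.5.56): p' = H_q, q' = -H_p."""
    H_q = -m * g * ell * math.sin(q)
    H_p = -p / (m * ell ** 2)
    return H_q, -H_p

def rk4_step(p, q, dt):
    """One classical Runge-Kutta step for the Hamiltonian system."""
    k1 = field(p, q)
    k2 = field(p + 0.5 * dt * k1[0], q + 0.5 * dt * k1[1])
    k3 = field(p + 0.5 * dt * k2[0], q + 0.5 * dt * k2[1])
    k4 = field(p + dt * k3[0], q + dt * k3[1])
    return (p + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            q + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

p, q = 0.0, 1.0
H0 = H(p, q)
max_drift = 0.0
for _ in range(5000):
    p, q = rk4_step(p, q, 1e-3)
    max_drift = max(max_drift, abs(H(p, q) - H0))
```

The value `max_drift` measures only the error of the integrator; for the exact flow, $H$ is constant. Note that the system reduces to $m\ell^2\ddot q = -mg\ell\sin q$, which is the pendulum equation (2.5.39).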

Finally, we mention that the flow of the Hamiltonian system (2.5.56) preserves the volume: The unique solutions of the initial value problem

\[
\begin{aligned}
&\dot x = f(x),\quad f : \mathbb{R}^n \to \mathbb{R}^n, \\
&x(0) = z,
\end{aligned}
\tag{2.5.59}
\]

with a continuously totally differentiable vector field $f$ are denoted $x(t) = \varphi(t,z)$. Uniqueness implies the relation $\varphi(s+t,z) = \varphi(s,\varphi(t,z))$ for all $s$, $t$ and $s+t$ for which the solutions exist. Moreover, $\varphi(0,z) = z$ for all $z \in \mathbb{R}^n$. The mapping $\varphi(t,\cdot) : \mathbb{R}^n \to \mathbb{R}^n$ is called the flow of the system (2.5.59). Let $\mu(\Omega)$ be the (Lebesgue-)measure of a measurable set $\Omega \subset \mathbb{R}^n$. The flow of the system (2.5.59) is volume preserving if

\[ \mu(\varphi(t,\Omega)) = \mu(\varphi(0,\Omega)) = \mu(\Omega) \text{ for all measurable } \Omega \subset \mathbb{R}^n \tag{2.5.60} \]

and for all $t$ for which the flow exists. Liouville’s theorem reads as follows:

\[
\begin{aligned}
&\text{If } \operatorname{div} f(x) = 0 \text{ for all } x \in \mathbb{R}^n, \\
&\text{then the flow of (2.5.59) is volume preserving.}
\end{aligned}
\tag{2.5.61}
\]

Here, $\operatorname{div} f(x) = \sum_{i=1}^n \frac{\partial f_i}{\partial x_i}(x)$ is the divergence of the vector field $f$. We prove Liouville’s theorem in the Appendix, (A.25)–(A.31).

The $2n$-dimensional vector field of the Hamiltonian system (2.5.56) satisfies

\[ f = (H_q,-H_p) \quad\text{and}\quad \operatorname{div} f = \sum_{i=1}^n (H_{q_ip_i} - H_{p_iq_i}) = 0, \tag{2.5.62} \]

provided the Hamiltonian is twice continuously partially differentiable. Hence, the Hamiltonian flow is volume preserving. This has interesting consequences such as the exclusion of asymptotically stable equilibria (sinks) and the application of ergodic theory to Hamiltonian mechanics.
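For one degree of freedom, volume preservation means that the flow map $(p,q) \mapsto \varphi(t,(p,q))$ preserves area, so its Jacobian determinant equals 1. The Python sketch below approximates that determinant by central differences for the pendulum in the scaling $m = g = \ell = 1$, i.e., for $H(p,q) = -\frac12p^2 + \cos q$ in the sign convention of (2.5.56). The scaling, the base point, the time horizon, and the function names are illustrative assumptions, and the check is a numerical plausibility test rather than a proof.

```python
import math

def field(p, q):
    """Pendulum with m = g = ell = 1: H = -p^2/2 + cos(q), p' = H_q = -sin(q), q' = -H_p = p."""
    return -math.sin(q), p

def flow(p, q, t_end, n=2000):
    """Approximate flow map of the Hamiltonian system by n classical Runge-Kutta steps."""
    dt = t_end / n
    for _ in range(n):
        k1 = field(p, q)
        k2 = field(p + 0.5 * dt * k1[0], q + 0.5 * dt * k1[1])
        k3 = field(p + 0.5 * dt * k2[0], q + 0.5 * dt * k2[1])
        k4 = field(p + dt * k3[0], q + dt * k3[1])
        p += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        q += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return p, q

def jacobian_det(p, q, t_end, eps=1e-5):
    """Determinant of the derivative of the flow map, by central differences."""
    pp, pm = flow(p + eps, q, t_end), flow(p - eps, q, t_end)
    qp, qm = flow(p, q + eps, t_end), flow(p, q - eps, t_end)
    d11 = (pp[0] - pm[0]) / (2 * eps)
    d21 = (pp[1] - pm[1]) / (2 * eps)
    d12 = (qp[0] - qm[0]) / (2 * eps)
    d22 = (qp[1] - qm[1]) / (2 * eps)
    return d11 * d22 - d12 * d21

det = jacobian_det(0.3, 1.2, 2.0)
```

The computed determinant agrees with 1 up to discretization and differencing errors, which is the local form of (2.5.60) for this example.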

Let the parametric functional (2.5.1) be invariant in the sense of (1.10.20). Any admitted reparametrization of a curve $x \in D \subset (C^1[t_a,t_b])^n$ leaves, by Proposition 1.10.1, the functional as well as the holonomic constraint (2.5.2) invariant. Hence, Proposition 2.5.1 holds for a reparametrized local constrained minimizer with possibly different Lagrange multipliers. This is stated in the next proposition.

Proposition 2.5.2. Let the functional (2.5.1) be invariant in the sense of (1.10.20) and let $\tilde x(\tau) = x(\varphi(\tau))$ be a reparametrization of $x \in (C^1[t_a,t_b])^n$ according to Definition 1.10.3. Assume that the functions $\Phi$ and $\Psi$ are continuously totally differentiable. Then,

\[ \frac{d}{d\tau}\Phi_{\dot x}\Big(\tilde x,\frac{d}{d\tau}\tilde x\Big) = \Phi_x\Big(\tilde x,\frac{d}{d\tau}\tilde x\Big) + \sum_{i=1}^m \tilde\lambda_i\nabla\Psi_i(\tilde x) \tag{2.5.63} \]

holds on $[\tau_a,\tau_b]$ if and only if

\[ \frac{d}{dt}\Phi_{\dot x}(x,\dot x) = \Phi_x(x,\dot x) + \sum_{i=1}^m \lambda_i\nabla\Psi_i(x) \tag{2.5.64} \]

holds on $[t_a,t_b]$. The Lagrange multipliers change as follows:

\[ \tilde\lambda_i(\tau) = \lambda_i(\varphi(\tau))\frac{d\varphi}{d\tau}(\tau) \quad\text{for } \tau \in [\tau_a,\tau_b]. \tag{2.5.65} \]

Proof. Relations like (1.10.14), following from (1.10.20) by differentiation, give identities that are analogous to (1.10.15) and (1.10.16). By $\frac{d}{d\tau}\tilde x(\tau) = \dot x(\varphi(\tau))\frac{d\varphi}{d\tau}(\tau)$ with $\frac{d\varphi}{d\tau}(\tau) > 0$, one obtains




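\[
\begin{aligned}
&\frac{d}{d\tau}\Phi_{\dot x}\Big(\tilde x(\tau),\frac{d}{d\tau}\tilde x(\tau)\Big) - \Phi_x\Big(\tilde x(\tau),\frac{d}{d\tau}\tilde x(\tau)\Big) \\
&\quad = \frac{d}{d\tau}\Phi_{\dot x}\big(x(\varphi(\tau)),\dot x(\varphi(\tau))\big) - \Phi_x\big(x(\varphi(\tau)),\dot x(\varphi(\tau))\big)\frac{d\varphi}{d\tau}(\tau) \\
&\quad = \Big(\frac{d}{dt}\Phi_{\dot x}\big(x(\varphi(\tau)),\dot x(\varphi(\tau))\big) - \Phi_x\big(x(\varphi(\tau)),\dot x(\varphi(\tau))\big)\Big)\frac{d\varphi}{d\tau}(\tau),
\end{aligned}
\tag{2.5.66}
\]

proving the claimed equivalence. □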

A holonomic constraint requires $x(t_a), x(t_b) \in M$. If no further boundary conditions are prescribed, local constrained minimizers fulfill natural boundary conditions.

Proposition 2.5.3. Let the curve $x \in (C^2[t_a,t_b])^n$ be a local minimizer of the functional (2.5.1) subject to the holonomic constraint (2.5.2). Assume the differentiability conditions on $\Phi$ and $\Psi$ of Proposition 2.5.1. Then, $x$ fulfills the system (2.5.8), and without further restrictions on the boundaries of $x$ at $t = t_a$ and/or $t = t_b$, the natural boundary conditions hold:

\[ \Phi_{\dot x}(x(t_a),\dot x(t_a)) \in N_{x(t_a)}M \quad\text{and/or}\quad \Phi_{\dot x}(x(t_b),\dot x(t_b)) \in N_{x(t_b)}M, \tag{2.5.67} \]

respectively.

Proof. Proposition 2.5.1 holds regardless of any boundary conditions, meaning that $x$ fulfills the system (2.5.8). Assume now no further restriction on $x(t_a) \in M$. Choosing $h \in (C^2[t_a,t_b])^n$ with arbitrary $h(t_a)$ and $h(t_b) = 0$, we obtain in (2.5.10) a curve $a(t)$ with an arbitrary vector $a(t_a) \in T_{x(t_a)}M$ and $a(t_b) = 0$. Constructing a perturbation as in (2.5.12)–(2.5.21), we end up with $h(s,t)$ having the properties (2.5.21) except that in place of $h(s,t_a) = 0$, we have $x(t_a) + h(s,t_a) \in M$. Using this perturbation, one obtains from (2.5.23) after integration by parts,

\[ \int_{t_a}^{t_b} \Big(\Phi_x(x,\dot x) - \frac{d}{dt}\Phi_{\dot x}(x,\dot x),\ a\Big)dt - \big(\Phi_{\dot x}(x(t_a),\dot x(t_a)),\ a(t_a)\big) = 0. \tag{2.5.68} \]

By $\Phi_x(x(t),\dot x(t)) - \frac{d}{dt}\Phi_{\dot x}(x(t),\dot x(t)) = -\sum_{i=1}^m \lambda_i(t)\nabla\Psi_i(x(t)) \in N_{x(t)}M$, the integral in (2.5.68) vanishes due to $a(t) \in T_{x(t)}M$. Hence,

\[ \big(\Phi_{\dot x}(x(t_a),\dot x(t_a)),\ a(t_a)\big) = 0 \quad\text{for arbitrary } a(t_a) \in T_{x(t_a)}M, \tag{2.5.69} \]

which proves the natural boundary condition (2.5.67) at $t = t_a$. □

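Differentiating the holonomic constraint $\Psi(x(t)) = 0$ for $t \in [t_a,t_b]$ gives

\[
\begin{aligned}
&D\Psi(x(t))\dot x(t) = 0, \\
&\text{or } \dot x(t) \in \operatorname{Kern} D\Psi(x(t)) = T_{x(t)}M \text{ for } t \in [t_a,t_b].
\end{aligned}
\tag{2.5.70}
\]

We study that restriction for the mechanical model (2.5.28)–(2.5.32): The natural boundary conditions together with (2.5.70) yield

\[
\begin{aligned}
&L_{\dot x}(x(t_a),\dot x(t_a)) \in N_{x(t_a)}M \text{ and} \\
&\dot x(t_a) \in T_{x(t_a)}M.
\end{aligned}
\tag{2.5.71}
\]

By orthogonality, (2.5.71) implies for $L(x,\dot x) = T(\dot x) - V(x)$

\[
\begin{aligned}
&\big(L_{\dot x}(x(t_a),\dot x(t_a)),\ \dot x(t_a)\big) = \sum_{k=1}^N m_k(\dot x_k^2 + \dot y_k^2 + \dot z_k^2) = 0, \text{ and hence,} \\
&\dot x(t_a) = 0.
\end{aligned}
\tag{2.5.72}
\]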

With or without holonomic constraints, the velocity vanishes at a free boundary.

We generalize (2.5.1) to an explicit dependence on the independent variable, i.e.,

\[ J(x) = \int_{t_a}^{t_b} \Phi(t,x,\dot x)\,dt, \tag{2.5.73} \]

defined on $D \subset (C^1[t_a,t_b])^n$. We further assume that the holonomic constraints now depend explicitly on $t$:

\[ \Psi(t,x) = 0, \tag{2.5.74} \]

where $\Psi : [t_a,t_b] \times \mathbb{R}^n \to \mathbb{R}^m$. Since reparametrizations are not an issue for these variational problems, we treat them in a different notation, namely

\[ J(y) = \int_a^b F(x,y,y')\,dx \tag{2.5.75} \]

and

\[ G(x,y) = 0, \tag{2.5.76} \]

where $G : [a,b] \times \mathbb{R}^n \to \mathbb{R}^m$ with $n > m$. For a function $y : [a,b] \to \mathbb{R}^n$, the constraints (2.5.76) read $G(x,y(x)) = 0$ for $x \in [a,b]$.

We assume that $F$ is twice and that $G$ is three times continuously partially differentiable with respect to all variables, and that

\[ D_yG(x,y) = \Big(\frac{\partial G_i}{\partial y_j}(x,y)\Big)_{\substack{i=1,\dots,m \\ j=1,\dots,n}} \in \mathbb{R}^{m\times n} \tag{2.5.77} \]

has the maximal rank $m$ for all $(x,y) \in [a,b] \times \mathbb{R}^n$ where $G(x,y) = 0$. As shown in the Appendix, for each $x \in [a,b]$,



\[
\begin{aligned}
&M_x = \{y \in \mathbb{R}^n \mid G(x,y) = 0\} \text{ is an } (n-m)\text{-dimensional} \\
&\text{continuously differentiable manifold, and} \\
&M = \bigcup_{x\in[a,b]} (\{x\} \times M_x) \text{ is an } (n+1-m)\text{-dimensional} \\
&\text{continuously differentiable manifold} \\
&\text{with boundary } (\{a\} \times M_a) \cup (\{b\} \times M_b).
\end{aligned}
\tag{2.5.78}
\]

Observe that the Jacobian matrix $DG(x,y) \in \mathbb{R}^{m\times(n+1)}$ has rank $m$, which is maximal.

By the holonomic constraints $G(x,y) = 0$, the graphs of functions $y \in D \subset (C^1[a,b])^n$ belong to $M$ and $y(x) \in M_x$ for $x \in [a,b]$. Additional boundary conditions can be imposed on $\{a\} \times M_a$ and $\{b\} \times M_b$, respectively.

Fig. 2.9 An Admitted Perturbation on a Manifold

Proposition 2.5.4. Let $y \in D \subset (C^2[a,b])^n$ be a local minimizer of the functional (2.5.75), subject to the holonomic constraints (2.5.76). In particular,

\[
J(y) \le J(\hat y) \quad\text{for all } \hat y \in D \cap \{G(x,\hat y) = 0\} \text{ satisfying } \max_{k=1,\dots,n}\|y_k - \hat y_k\|_1 < d.
\tag{2.5.79}
\]

Then, there is a continuous function

\[ \lambda = (\lambda_1,\dots,\lambda_m) : [a,b] \to \mathbb{R}^m, \tag{2.5.80} \]

such that $y$ is a solution of the following system of differential equations:

\[ \frac{d}{dx}F_{y'}(\cdot,y,y') = F_y(\cdot,y,y') + \sum_{i=1}^m \lambda_i\nabla_yG_i(\cdot,y) \quad\text{on } [a,b]. \tag{2.5.81} \]

Here, we employ the notation $F_{y'} = (F_{y_1'},\dots,F_{y_n'})$, $F_y = (F_{y_1},\dots,F_{y_n})$, $y = (y_1,\dots,y_n)$, $y' = (y_1',\dots,y_n')$, and $\nabla_yG_i = (G_{i,y_1},\dots,G_{i,y_n})$.

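Proof. Since the proof is analogous to that of Proposition 2.5.1, we merely give a sketch. As a first step, we construct an admitted perturbation of the minimizing function $y$ in $D\cap\{G(x,y) = 0\}$ such that $y(x) + h(t,x) \in M_x$. For that purpose, we use the orthogonal projections

\[ \begin{aligned}
& P_x(y) : \mathbb{R}^n \to T_yM_x \quad\text{and}\quad Q_x(y) : \mathbb{R}^n \to N_yM_x\\
& \text{on the tangential and normal spaces to } M_x \text{ in } y \in M_x,\ x \in [a,b], \text{ respectively.}
\end{aligned} \tag{2.5.82} \]

For any $h \in (C^2_0[a,b])^n$,

\[ \begin{aligned}
& f(x) = P_x(y(x))h(x) \in T_{y(x)}M_x \quad\text{for all } x \in [a,b],\\
& f \in (C^2[a,b])^n \quad\text{and}\quad f(a) = 0,\ f(b) = 0.
\end{aligned} \tag{2.5.83} \]

A construction analogous to (2.5.12)–(2.5.21) (with a function $H(t,x,z) = G(x, y(x) + t f(x) + Q_x(y(x))z)$ for $(t,x,z) \in \mathbb{R}\times[a,b]\times N_{y(x_0)}M_{x_0}$) yields:

\[ \begin{aligned}
& \text{There is a twice continuously differentiable perturbation}\\
& h : (-\varepsilon_0,\varepsilon_0)\times[a,b] \to \mathbb{R}^n, \text{ satisfying}\\
& y(x) + h(t,x) \in M_x \quad\text{for } (t,x) \in (-\varepsilon_0,\varepsilon_0)\times[a,b].\\
& \text{The perturbation is of the form } h(t,x) = t f(x) + g(t,x), \text{ where}\\
& f(x) \in T_{y(x)}M_x,\quad g(t,x) \in N_{y(x)}M_x,\\
& h(t,a) = 0,\quad h(t,b) = 0 \quad\text{for } t \in (-\varepsilon_0,\varepsilon_0),\\
& h(0,x) = 0,\quad \frac{\partial}{\partial t}h(0,x) = f(x), \quad\text{and}\quad \frac{\partial^2}{\partial t\,\partial x}h(0,x) = f'(x) \quad\text{for } x \in [a,b].
\end{aligned} \tag{2.5.84} \]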

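Since $J(y + h(t,\cdot))$ is locally minimal at $t = 0$,

\[ \begin{aligned}
& \frac{d}{dt} J(y + h(t,\cdot))\Big|_{t=0} = 0 \quad\text{or explicitly}\\
& \int_a^b \sum_{k=1}^n \Big( F_{y_k}(\cdot,y,y')\,\frac{\partial}{\partial t}h_k(0,\cdot) + F_{y'_k}(\cdot,y,y')\,\frac{\partial^2}{\partial t\,\partial x}h_k(0,\cdot) \Big)\,dx\\
& \quad = \int_a^b (F_y(\cdot,y,y'), f) + (F_{y'}(\cdot,y,y'), f')\,dx = 0.
\end{aligned} \tag{2.5.85} \]

Integration by parts gives

\[ \begin{aligned}
& \int_a^b \Big(F_y(\cdot,y,y') - \frac{d}{dx}F_{y'}(\cdot,y,y'),\, f\Big)\,dx = 0\\
& \text{for all } f \in (C^2_0[a,b])^n \text{ satisfying } f(x) \in T_{y(x)}M_x.
\end{aligned} \tag{2.5.86} \]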

As in (2.5.25), the properties of the tangent and normal space and their projections entail

\[ Q_x(y(x))\Big(\frac{d}{dx}F_{y'}(x,y(x),y'(x)) - F_y(x,y(x),y'(x))\Big) = \sum_{i=1}^m \lambda_i(x)\nabla_y G_i(x,y(x)), \tag{2.5.87} \]

where the functions $\lambda_i$ are uniquely determined and continuous on $[a,b]$.

Let $h \in (C^2_0[a,b])^n$ be arbitrary. Then, the decomposition

\[ \begin{aligned}
& h(x) = P_x(y(x))h(x) + Q_x(y(x))h(x) = f(x) + g(x), \text{ where}\\
& f(x) \in T_{y(x)}M_x \quad\text{and}\quad g(x) \in N_{y(x)}M_x,
\end{aligned} \tag{2.5.88} \]

shows by the same arguments given after (2.5.27) that

\[ \int_a^b \Big(\frac{d}{dx}F_{y'}(\cdot,y,y') - F_y(\cdot,y,y') - \sum_{i=1}^m \lambda_i\nabla_y G_i(\cdot,y),\, h\Big)\,dx = 0 \tag{2.5.89} \]

holds. Since $h \in (C^2_0[a,b])^n$ is arbitrary, this proves the claim (2.5.81). □

If for admitted functions $y \in D\cap\{G(x,y) = 0\}$ only $y(a) \in M_a$ and $y(b) \in M_b$ are required, then local constrained minimizers fulfill natural boundary conditions.

Proposition 2.5.5. Let $y \in (C^2[a,b])^n$ be a local minimizer of the functional (2.5.75) under the holonomic constraints (2.5.76). Assume the same differentiability properties of $F$ and $G$ as given in Proposition 2.5.4. If no further boundary conditions are imposed on $y$ at $x = a$ and/or $x = b$, then $y$ fulfills the natural boundary conditions

\[ \begin{aligned}
& F_{y'}(a,y(a),y'(a)) \in N_{y(a)}M_a \quad\text{and/or}\\
& F_{y'}(b,y(b),y'(b)) \in N_{y(b)}M_b, \text{ respectively.}
\end{aligned} \tag{2.5.90} \]

Proof. Suppose that no further boundary condition is imposed at $x = a$. We construct perturbations $h(t,x)$ as in (2.5.84), but now $f(a) \in T_{y(a)}M_a$ is arbitrary and $f(b) = 0$. Then, the perturbation is only constrained by $y(a) + h(t,a) \in M_a$. Integration by parts of (2.5.85) gives

\[ \int_a^b \Big(F_y(\cdot,y,y') - \frac{d}{dx}F_{y'}(\cdot,y,y'),\, f\Big)\,dx - (F_{y'}(a,y(a),y'(a)), f(a)) = 0, \tag{2.5.91} \]

by $f(b) = 0$. By (2.5.81), the integral vanishes, and since $f(a) \in T_{y(a)}M_a$ is arbitrary, the vector $F_{y'}(a,y(a),y'(a))$ is in the orthogonal complement of $T_{y(a)}M_a$, which is $N_{y(a)}M_a$. □

The holonomic constraints $G(x,y(x)) = 0$ imply

\[ G_x(x,y(x)) + D_yG(x,y(x))\,y'(x) = 0 \quad\text{for } x \in [a,b], \tag{2.5.92} \]

which gives additional constraints at the boundaries $x = a$ and $x = b$.

Exercises

2.5.1. Adopt the notation employed in (2.5.28)–(2.5.32), and show that the total energy $E(x,\dot{x}) = T(\dot{x}) + V(x)$ is constant along a trajectory that fulfills both the holonomic constraints and the Euler-Lagrange equations of the action.

2.5.2. Compute the trajectory $x = x(t)$ of a point mass $m$ running on the inclined plane $x_1 + x_3 - 1 = 0$ from $x(0) = (0,0,1)$ with initial speed $\dot{x}(0) = (0,v_2,0)$ in the plane $x_3 = 0$. The only force acting on the point mass is the gravity $g$ in direction of the negative $x_3$-axis. Compute the running time and compare it to the time of a free fall from $(0,0,1)$ to $(0,0,0)$. Does the running time depend on the initial speed $(0,v_2,0)$?

2.5.3. Compute the Euler-Lagrange equations for the spherical pendulum moving on the sphere $x_1^2 + x_2^2 + x_3^2 - \ell^2 = 0$ under gravity that acts in direction of the negative $x_3$-axis.

Compute also the Euler-Lagrange equations with respect to the generalized coordinates $\varphi \in [0,2\pi]$ and $\theta \in [-\frac{\pi}{2},\frac{\pi}{2}]$, where

\[ x_1 = r\sin\theta\cos\varphi, \qquad x_2 = r\sin\theta\sin\varphi, \qquad x_3 = r\cos\theta, \]

are the spherical coordinates.

2.5.4. Show that any solution $x = x(t)$ of the Euler-Lagrange equations for the spherical pendulum fulfills the conservation law

\[ \frac{d}{dt}(x_1\dot{x}_2 - \dot{x}_1x_2) = 0, \quad\text{or}\quad x_1\dot{x}_2 - \dot{x}_1x_2 = c, \]

and give a geometric interpretation.

Hint: Use formula (2.2.4) along $x$ and describe the area that it bounds.

2.6 Geodesics

We show in the Appendix that for a continuously totally differentiable mapping $\Psi : \mathbb{R}^n \to \mathbb{R}^m$, where $n > m$, the nonempty zero set $M = \{x \in \mathbb{R}^n \mid \Psi(x) = 0\}$ is a continuously differentiable manifold of dimension $n-m$.

Definition 2.6.1. A shortest path connection between two points on a manifold $M$ is called a geodesic.

A geodesic is a curve $x \in D = (C^1[t_a,t_b])^n\cap\{x(t_a) = A, x(t_b) = B\}$ that minimizes the functional

\[ J(x) = \int_{t_a}^{t_b} \|\dot{x}\|\,dt, \tag{2.6.1} \]

measuring its length, under the holonomic constraint

\[ \Psi(x) = 0. \tag{2.6.2} \]

The hypotheses of Proposition 2.5.1 are that $\Psi$ is three times continuously partially differentiable and that admitted curves are twice continuously differentiable. Furthermore, since the Lagrange function $\Phi(\dot{x}) = \|\dot{x}\|$ is not differentiable at $\dot{x} = 0$, only curves $x$ with nonvanishing tangent vectors are admitted.

By Proposition 2.5.1, a geodesic has to fulfill the system of Euler-Lagrange equations

\[ \frac{d}{dt}\frac{\dot{x}}{\|\dot{x}\|} = \sum_{i=1}^m \lambda_i\nabla\Psi_i(x) \quad\text{on } [t_a,t_b], \tag{2.6.3} \]

where the continuous Lagrange multipliers $\lambda = (\lambda_1,\dots,\lambda_m) \in (C[t_a,t_b])^m$ depend on $t$.

Since the functional (2.6.1) is invariant, we can apply Proposition 2.5.2. A suitable reparametrization simplifies the system (2.6.3) as follows: The arc length

\[ s(t) = \int_{t_a}^{t} \|\dot{x}(\sigma)\|\,d\sigma, \qquad \dot{s}(t) = \|\dot{x}(t)\| > 0, \tag{2.6.4} \]

has an inverse function $t = \varphi(s)$ which is a continuously differentiable mapping $\varphi : [0,L] \to [t_a,t_b]$. Here, $L = L(x)$ is the length of the admitted curve $x \in (C^2[t_a,t_b])^n$.

Fig. 2.10 Path Connections on a Manifold

A reparametrization of $x$ by the arc length

\[ x(\varphi(s)) = \tilde{x}(s), \quad\text{where}\quad \frac{d\varphi}{ds}(s) = \frac{1}{\dot{s}(\varphi(s))} > 0, \tag{2.6.5} \]

is admitted according to Definition 1.10.3, and relations (2.6.4), (2.6.5) imply

\[ \Big\|\frac{d\tilde{x}}{ds}(s)\Big\| = \|\dot{x}(\varphi(s))\|\,\frac{d\varphi}{ds}(s) = 1. \tag{2.6.6} \]

Observe that the interval of the new parameter is given by the length of the curve. Nonetheless, Proposition 2.5.2 is applicable, since the equivalence of (2.5.63) and of (2.5.64) holds for any fixed curve.

Due to (2.6.6), a geodesic parametrized by its arc length has to fulfill the following system, where we replace $s$ by $t$ and omit the tilde:

\[ \begin{aligned}
& \ddot{x} = \sum_{i=1}^m \lambda_i\nabla\Psi_i(x) \quad\text{on } [0,L],\\
& \|\dot{x}(t)\| = 1.
\end{aligned} \tag{2.6.7} \]

Examples:

We compute the geodesics on a sphere $S_R$ in $\mathbb{R}^3$ given by

\[ \Psi(x) = x_1^2 + x_2^2 + x_3^2 - R^2 = 0. \tag{2.6.8} \]

According to (2.6.7), we have to solve

\[ \begin{aligned}
& \ddot{x} = 2\lambda x,\\
& \dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2 = 1.
\end{aligned} \tag{2.6.9} \]

We differentiate (2.6.8) twice, and we obtain by (2.6.9)

\[ \begin{aligned}
& 2(x_1\dot{x}_1 + x_2\dot{x}_2 + x_3\dot{x}_3) = 0,\\
& \dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2 + x_1\ddot{x}_1 + x_2\ddot{x}_2 + x_3\ddot{x}_3 = 0,\\
& \text{or}\quad 1 + 2\lambda(x_1^2 + x_2^2 + x_3^2) = 1 + 2\lambda R^2 = 0,\\
& \text{or}\quad 2\lambda = -\frac{1}{R^2}, \text{ and (2.6.9)}_1 \text{ becomes finally } \ddot{x} + \frac{1}{R^2}x = 0.
\end{aligned} \tag{2.6.10} \]

The general solution of (2.6.10)$_4$ is given by

\[ x(s) = a\cos\frac{1}{R}s + b\sin\frac{1}{R}s \quad\text{where } a,b \in \mathbb{R}^3. \tag{2.6.11} \]

The constraints imply

\[ \begin{aligned}
& \|x(0)\| = \|a\| = R, \qquad \|\dot{x}(0)\| = \frac{1}{R}\|b\| = 1,\\
& (x(0),\dot{x}(0)) = \Big(a,\frac{1}{R}b\Big) = 0 \quad\text{by (2.6.10)}_1.
\end{aligned} \tag{2.6.12} \]

The vectors $a$ and $b$ have length $R$, and they are orthogonal. Let $c \in \mathbb{R}^3$ be a third vector (of length 1) that is orthogonal to $a$ and to $b$. Then,

\[ (x(s),c) = 0 \quad\text{for all } s \in [0,L]. \tag{2.6.13} \]

On the other hand, the set

\[ E = \{x \in \mathbb{R}^3 \mid (x,c) = 0\}, \tag{2.6.14} \]

describes a plane in $\mathbb{R}^3$ containing the center 0 of the sphere $S_R$. This gives the result that any geodesic fulfills

\[ x(s) \in S_R\cap E \quad\text{for all } s \in [0,L]. \tag{2.6.15} \]

The intersection of the sphere $S_R$ and of the plane $E$ through the center 0 is called a great circle.

Given two points $A$ and $B$ on the sphere $S_R$, the three points 0, $A$, and $B$ span a plane $E$, and the geodesic between $A$ and $B$ runs in $S_R\cap E$. The points $A$ and $B$ are connected by two arcs on a great circle, one of which is shorter than the other, or both have the same length if $A$ and $B$ are antipodes.
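The conditions (2.6.12)–(2.6.15) can be checked numerically for a concrete great circle. The following is only a minimal sketch: the radius `R` and the vectors `a`, `b` are illustrative choices that are not taken from the text, and the helper names are hypothetical.

```python
import math

R = 2.0  # sphere radius (illustrative value)

# Illustrative coefficient vectors: both of length R and mutually orthogonal,
# as required by (2.6.12).
a = (R, 0.0, 0.0)
b = (0.0, R / math.sqrt(2.0), R / math.sqrt(2.0))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def position(s):
    """x(s) = a cos(s/R) + b sin(s/R), cf. (2.6.11)."""
    c, sn = math.cos(s / R), math.sin(s / R)
    return tuple(ai * c + bi * sn for ai, bi in zip(a, b))


def velocity(s):
    """Derivative of x(s) with respect to the arc length s."""
    c, sn = math.cos(s / R), math.sin(s / R)
    return tuple((-ai * sn + bi * c) / R for ai, bi in zip(a, b))


# Unit vector orthogonal to a and b: the normal of the plane E in (2.6.14).
normal = tuple(ci / (R * R) for ci in cross(a, b))

max_defect = 0.0
for k in range(17):
    s = k * math.pi * R / 16.0  # samples along half a great circle
    x, v = position(s), velocity(s)
    defects = (math.sqrt(dot(x, x)) - R,    # x(s) lies on the sphere S_R
               math.sqrt(dot(v, v)) - 1.0,  # unit speed, cf. (2.6.7)
               dot(x, normal))              # x(s) lies in the plane E
    max_defect = max(max_defect, *(abs(d) for d in defects))

print(f"largest constraint defect: {max_defect:.2e}")
```

All three defects vanish up to rounding, which is what (2.6.12), (2.6.13), and (2.6.15) predict for any admissible pair $a$, $b$.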

Shortest connections between two points on a sphere are important for the route planning of flights. If, for instance, two cities in Europe and North America are on the same circle of latitude, then the aircraft does not follow that circle of latitude, but rather it takes a seeming detour over the North Atlantic.

If the manifold $M$ is described by generalized coordinates, the geodesics are minimizers of a variational problem without holonomic constraints. We discuss an example of a surface of revolution in $\mathbb{R}^3$:

\[ M = \{(r\cos\varphi, r\sin\varphi, f(r)) \mid 0 \le r_1 < r < r_2 \le \infty,\ \varphi \in [0,2\pi]\}. \tag{2.6.16} \]

We assume that $f$ is twice continuously differentiable, and for $r_1 = 0$, that $f'(0) \ne 0$. Thus, the surface $M$ has a cusp at $(0,0,f(0))$, cf. Figure 2.11.

Fig. 2.11 A Surface of Revolution

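Given two points $A = (r_a\cos\varphi_a, r_a\sin\varphi_a, f(r_a))$ and $B = (r_b\cos\varphi_b, r_b\sin\varphi_b, f(r_b))$ on $M$, we determine the geodesic between $A$ and $B$. For that purpose, we parametrize connecting curves on $M$ as follows:

\[ \{x(t) = (r(t)\cos\varphi(t), r(t)\sin\varphi(t), f(r(t))) \mid t \in [t_a,t_b]\} \tag{2.6.17} \]

where $r(t_a) = r_a$, $\varphi(t_a) = \varphi_a$, $r(t_b) = r_b$, $\varphi(t_b) = \varphi_b$. The length of the curves is given by the functional

\[ \begin{aligned}
J(x) &= \int_{t_a}^{t_b} \sqrt{\dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2}\,dt\\
&= \int_{t_a}^{t_b} \sqrt{\dot{r}^2(1 + (f'(r))^2) + r^2\dot{\varphi}^2}\,dt = J(r,\varphi),
\end{aligned} \tag{2.6.18} \]

with Lagrange function $\Phi(r,\varphi,\dot{r},\dot{\varphi}) = \sqrt{\dot{r}^2(1 + (f'(r))^2) + r^2\dot{\varphi}^2}$. Assuming that the tangent $\dot{x}(t)$ does not vanish, the Euler-Lagrange equations for a minimizer read as follows:

\[ \begin{aligned}
\frac{d}{dt}\Phi_{\dot{r}} &= \frac{d}{dt}\,\frac{\dot{r}(1 + (f'(r))^2)}{\sqrt{\dot{r}^2(1 + (f'(r))^2) + r^2\dot{\varphi}^2}} = \frac{\dot{r}^2f'(r)f''(r) + r\dot{\varphi}^2}{\sqrt{\dot{r}^2(1 + (f'(r))^2) + r^2\dot{\varphi}^2}} = \Phi_r,\\
\frac{d}{dt}\Phi_{\dot{\varphi}} &= \frac{d}{dt}\,\frac{r^2\dot{\varphi}}{\sqrt{\dot{r}^2(1 + (f'(r))^2) + r^2\dot{\varphi}^2}} = 0 = \Phi_\varphi, \qquad t \in [t_a,t_b].
\end{aligned} \tag{2.6.19} \]

We distinguish two cases: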

1. The points $A$ and $B$ have the same angular coordinates $\varphi_a = \varphi_b$.
With the ansatz $\varphi(t) = \varphi_a = \varphi_b$ for all $t \in [t_a,t_b]$, the system (2.6.19) for $r = r(t)$ with $\dot{r}(t) \ne 0$ and $\dot{\varphi}(t) = 0$ simplifies to

\[ \frac{d}{dt}\sqrt{1 + (f'(r))^2} = \frac{\dot{r}f'(r)f''(r)}{\sqrt{1 + (f'(r))^2}}. \tag{2.6.20} \]

Any parametrization (2.6.17) with a continuously differentiable function $r(t)$ fulfills (2.6.20), and the geodesic is a meridian $\{(r\cos\varphi_a, r\sin\varphi_a, f(r))\}$ where $r$ runs from $r_a$ to $r_b$.

2. The points $A$ and $B$ have different angular coordinates $\varphi_a \ne \varphi_b$.
Since the Lagrange function $\Phi$ in (2.6.18) is invariant according to Definition 1.10.2, Proposition 1.10.4 is applicable, allowing any reparametrization in the system (2.6.19) of Euler-Lagrange equations. Choosing the arc length as a parameter (cf. (2.6.4)–(2.6.6)), the denominators in (2.6.19) are equal to 1. Equation (2.6.19)$_2$ then yields $r^2\dot{\varphi} = c \ne 0$, since the angular coordinate $\varphi$ is not constant in this case. Consequently, $\varphi$ is monotonically increasing or decreasing. This means geometrically that the geodesic winds round the surface of revolution with a constant angular velocity.

Due to $\dot{\varphi}(t) \ne 0$ for all $t \in [t_a,t_b]$, there exists the inverse function $t = \tau(\varphi)$, and $r(t) = r(\tau(\varphi)) = \tilde{r}(\varphi)$ is now parametrized by $\varphi \in [\varphi_a,\varphi_b]$, assuming that $\varphi_a < \varphi_b$.

For a curve (2.6.17) parametrized by $\varphi$, the Lagrange function of the functional (2.6.18) transforms as follows: Replace $r$ by $\tilde{r}$, $\dot{r}$ by $\frac{d\tilde{r}}{d\varphi}$, and $\dot{\varphi}$ by 1, and integrate over $[\varphi_a,\varphi_b]$. By Proposition 1.10.1, the functional describes the length as before. Omitting the tilde and setting $\frac{dr}{d\varphi} = r'$, we obtain the new Lagrange function $\Phi(r,r') = \sqrt{(r')^2(1 + (f'(r))^2) + r^2}$. Since it does not depend explicitly on the parameter $\varphi$, we can proceed as in case 7 of Paragraph 1.5: Any solution of $\Phi(r,r') - r'\Phi_{r'} = c_1$ is also a solution of the Euler-Lagrange equation, or it is constant. In this case, we obtain

\[ \frac{r^2}{\sqrt{(r')^2(1 + (f'(r))^2) + r^2}} = c_1 \quad\text{for } r = r(\varphi),\ \varphi \in [\varphi_a,\varphi_b]. \tag{2.6.21} \]
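The passage from $\Phi - r'\Phi_{r'} = c_1$ to (2.6.21) is a short computation, which can be cross-checked numerically. The sketch below assumes the hypothetical profile $f(r) = r^2/2$ (a paraboloid, not discussed in the text) and approximates $\Phi_{r'}$ by a central difference.

```python
import math


def f_prime(r):
    """Slope of the hypothetical profile f(r) = r**2 / 2."""
    return r


def lagrange_function(r, rp):
    """Phi(r, r') = sqrt((r')^2 (1 + f'(r)^2) + r^2)."""
    return math.sqrt(rp * rp * (1.0 + f_prime(r) ** 2) + r * r)


def first_integral(r, rp, h=1e-5):
    """Phi - r' * Phi_{r'}, with Phi_{r'} taken as a central difference."""
    d_phi = (lagrange_function(r, rp + h) - lagrange_function(r, rp - h)) / (2.0 * h)
    return lagrange_function(r, rp) - rp * d_phi


def closed_form(r, rp):
    """Left-hand side of (2.6.21)."""
    return r * r / lagrange_function(r, rp)


# Arbitrary sample values of (r, r').
samples = [(0.5, -1.0), (1.0, 0.3), (2.0, 1.5), (3.0, 0.0)]
max_gap = max(abs(first_integral(r, rp) - closed_form(r, rp)) for r, rp in samples)
print(f"largest gap between both expressions: {max_gap:.2e}")
```

Both expressions agree up to the discretization error of the difference quotient, in line with the identity $\Phi - r'\Phi_{r'} = r^2/\Phi$.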

We establish a geometric property of a curve $x(\varphi) = (x_1(\varphi),x_2(\varphi),x_3(\varphi)) = (r(\varphi)\cos\varphi, r(\varphi)\sin\varphi, f(r(\varphi)))$ satisfying (2.6.21). For that purpose, we use the tangent

\[ x'(\varphi) = (r'(\varphi)\cos\varphi - r(\varphi)\sin\varphi,\ r'(\varphi)\sin\varphi + r(\varphi)\cos\varphi,\ f'(r(\varphi))r'(\varphi)). \tag{2.6.22} \]

The circle of latitude on $M$ intersecting the curve in $x(\varphi)$ is

\[ \{z(\psi) = (r(\varphi)\cos\psi, r(\varphi)\sin\psi, f(r(\varphi))) \mid \psi \in [0,2\pi]\}, \tag{2.6.23} \]

with tangent in $x(\varphi)$ given by

\[ z'(\varphi) = (-r(\varphi)\sin\varphi, r(\varphi)\cos\varphi, 0), \qquad (\ )' = \frac{d}{d\psi}. \tag{2.6.24} \]

The scalar product of the two tangents (2.6.22) and (2.6.24) in $\mathbb{R}^3$ yields

\[ (x'(\varphi),z'(\varphi)) = r(\varphi)^2 = \|x'(\varphi)\|\,\|z'(\varphi)\|\cos\beta(\varphi), \tag{2.6.25} \]

where $\beta(\varphi)$ is the angle between the vectors $x'(\varphi)$ and $z'(\varphi)$. This is the equivalent geometric definition of the scalar product. Now, $\|x'(\varphi)\| = \Phi(r(\varphi),r'(\varphi)) = \sqrt{(r'(\varphi))^2(1 + (f'(r(\varphi)))^2) + (r(\varphi))^2}$ and $\|z'(\varphi)\| = r(\varphi)$. Combining this with (2.6.21) and (2.6.25), we finally obtain

\[ r(\varphi)\cos\beta(\varphi) = c_1 \quad\text{for all } \varphi \in [\varphi_a,\varphi_b]. \tag{2.6.26} \]

Formula (2.6.26) is called Clairaut's relation (Clairaut, 1713–1765). It relates the radius of a circle of latitude on $M$ to the angle between an intersecting geodesic and that circle of latitude. If, for instance, $r(\varphi)$ decreases, then the angle $\beta(\varphi)$ decreases as well, and for $r(\varphi) = c_1$, Clairaut's relation gives $\beta(\varphi) = 0$. Apparently $r(\varphi) \ge c_1$ for all $\varphi \in [\varphi_a,\varphi_b]$, and $r(\varphi) = c_1$ for a single value of $\varphi$ only. (The Euler-Lagrange system (2.6.19) has no locally constant solution $r = c_1$ for $\varphi$ in an interval; observe that (2.6.21) is not equivalent to the Euler-Lagrange equation for the Lagrange function $\Phi(r,r')$.)

The geodesic connecting the points $A$ and $B$, sketched in Figure 2.12, is continued as a solution of (2.6.21) beyond the interval $[\varphi_a,\varphi_b]$. It satisfies $r \ge c_1$ globally, and it cannot form a loop since in this case it is tangent to a second circle of latitude with a radius $r > c_1$. By $\cos\beta = 1$ in the point of contact, Clairaut's relation (2.6.26) would be violated. The global geodesic winds around the surface of revolution and disappears finally at "infinity." This is to be shown in Exercise 2.6.2 for a cone: The geodesic comes from infinity, winds around the cone, touches a circle of latitude, and winds around to infinity again.
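Clairaut's relation can be illustrated on the cone $f(r) = r$. For this profile, one can check by substitution that $r(\varphi) = c_1/\cos(\varphi/\sqrt{2})$ satisfies (2.6.21); this closed form is our own illustrative choice and not a formula from the text. The sketch below computes the angle $\beta$ from numerically differentiated tangents and evaluates $r\cos\beta$ at a few sample angles.

```python
import math

C1 = 1.0                   # Clairaut constant (illustrative value)
K = 1.0 / math.sqrt(2.0)   # valid for the cone f(r) = r only


def radius(phi):
    """Assumed solution r(phi) = C1 / cos(phi / sqrt(2)) of (2.6.21) on the cone."""
    return C1 / math.cos(K * phi)


def curve(phi):
    """Point x(phi) on the cone, cf. (2.6.17) with f(r) = r."""
    r = radius(phi)
    return (r * math.cos(phi), r * math.sin(phi), r)


def tangent(phi, h=1e-6):
    """Central-difference approximation of x'(phi), cf. (2.6.22)."""
    p, m = curve(phi + h), curve(phi - h)
    return tuple((pi - mi) / (2.0 * h) for pi, mi in zip(p, m))


def clairaut_value(phi):
    """r(phi) * cos(beta(phi)), with beta as in (2.6.25)."""
    r = radius(phi)
    x_t = tangent(phi)
    z_t = (-r * math.sin(phi), r * math.cos(phi), 0.0)  # cf. (2.6.24)
    scalar = sum(u * v for u, v in zip(x_t, z_t))
    norm_x = math.sqrt(sum(u * u for u in x_t))
    norm_z = math.sqrt(sum(v * v for v in z_t))
    return r * scalar / (norm_x * norm_z)


# Sample angles inside the interval where cos(phi / sqrt(2)) > 0.
values = [clairaut_value(phi) for phi in (-1.5, -0.7, 0.0, 0.9, 1.8)]
print([round(v, 6) for v in values])
```

The product stays at $c_1$ along the curve, and the radius attains its minimum $c_1$ only at $\varphi = 0$, where the curve touches the circle of latitude. In this closed form, $r$ grows without bound as $\varphi/\sqrt{2}$ approaches $\pm\pi/2$.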

Fig. 2.12 A Geodesic on a Surface of Revolution

There is another type of surface of revolution besides (2.6.16), namely

\[ M = \{(g(x_3)\cos\varphi, g(x_3)\sin\varphi, x_3) \mid -\infty \le c < x_3 < d \le \infty\}. \tag{2.6.27} \]

The function $g : (c,d) \to \mathbb{R}$ is assumed to be positive and twice continuously differentiable. In contrast to the surface depicted in Figure 2.12, that surface of revolution might also narrow downward, and in this case the condition $r \ge c_1$ means also a bound of the geodesic from below. Since the surface between two circles of latitude is compact, the global geodesic exists for $\varphi \in (-\infty,\infty)$ and it winds infinitely often around the surface, and it approaches each point of it arbitrarily close. A detailed discussion can be found in [13], p. 138.

Exercises

2.6.1. Compute the geodesic from $A = (1,0,0)$ to $B = (-1,0,1)$ on the cylinder $Z = \{x = (x_1,x_2,x_3) \mid x_1^2 + x_2^2 = 1\}$. Is it unique?

2.6.2. Consider the cone $K = \{x = (x_1,x_2,x_3) \mid x_1^2 + x_2^2 = x_3^2,\ x_3 \ge 0\} = \{(r\cos\varphi, r\sin\varphi, r) \mid 0 \le r < \infty,\ \varphi \in [0,2\pi]\}$. Assuming that geodesics exist globally on $K$, show that all global geodesics except meridians are unbounded in both directions.

To be more precise: Fix a point $x_0 = x(t_0)$ on a geodesic $x = x(t)$ and parametrize the geodesic in direction $t \ge t_0$ by the arc length and in direction $t \le t_0$ by the negative arc length. Then, $x = x(s)$ for $s \in (s_-,s_+)$, $x(0) = x_0$. Since segments of the geodesic of finite length can be continued, we can assume that $s_- = -\infty$ and $s_+ = +\infty$.

The global geodesic $\{x(s) \mid s \in (-\infty,\infty)\}$ satisfies the Euler-Lagrange equation in Cartesian coordinates with a holonomic constraint or in generalized coordinates without constraint. Show that $\lim_{s\to\infty}\|x(s)\| = \lim_{s\to\infty}\sqrt{2}\,r(s) = \infty$ and $\lim_{s\to-\infty}\|x(s)\| = \lim_{s\to-\infty}\sqrt{2}\,r(s) = \infty$. Make use of the fact that the Euler-Lagrange equations simplify for curves parametrized by the (positive or negative) arc length.

2.7 Nonholonomic Constraints

In physics, in particular in mechanics, a constraint is called nonholonomic if it restricts not only the coordinates of position but also of velocity.

Definition 2.7.1. For a parametric functional

\[ J(x) = \int_{t_a}^{t_b} \Phi(t,x,\dot{x})\,dt \tag{2.7.1} \]

or, in a different notation, for a nonparametric functional

\[ J(y) = \int_a^b F(x,y,y')\,dx, \tag{2.7.2} \]

defined on $D \subset (C^1[t_a,t_b])^n$ or on $D \subset (C^1[a,b])^n$, respectively, the constraints

\[ \begin{aligned}
& \Psi(t,x,\dot{x}) = 0 \quad\text{for } t \in [t_a,t_b] \quad\text{or}\\
& G(x,y,y') = 0 \quad\text{for } x \in [a,b], \text{ respectively,}
\end{aligned} \tag{2.7.3} \]

are called nonholonomic constraints. The functions

\[ \begin{aligned}
& \Psi : [t_a,t_b]\times\mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}^m \quad\text{and}\\
& G : [a,b]\times\mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}^m \quad\text{for } n > m,
\end{aligned} \tag{2.7.4} \]

are three times continuously partially differentiable with respect to all variables, and the matrices

\[ D_p\Psi(t,x,p) = \left(\frac{\partial\Psi_i}{\partial p_j}(t,x,p)\right)_{\substack{i=1,\dots,m\\ j=1,\dots,n}} \quad\text{and}\quad D_pG(x,y,p) = \left(\frac{\partial G_i}{\partial p_j}(x,y,p)\right)_{\substack{i=1,\dots,m\\ j=1,\dots,n}} \tag{2.7.5} \]

have maximal rank $m$ at all points $(t,x,p) \in [t_a,t_b]\times\mathbb{R}^n\times\mathbb{R}^n$ and all $(x,y,p) \in [a,b]\times\mathbb{R}^n\times\mathbb{R}^n$ that solve (2.7.3) with $\dot{x} = p$ and $y' = p$, respectively.

Apparently, the parametric and nonparametric versions are equivalent, and they differ only by notation. We give the general results only for (2.7.2) with (2.7.3)$_2$.

The nonholonomic constraints (2.7.3)$_2$ are no longer geometric constraints on the coordinates of admitted functions as are holonomic constraints, but they link the coordinates of the functions to their derivatives. Such equations are differential equations, but the form (2.7.3)$_2$ differs from those that are commonly taught in courses on ordinary differential equations: They are implicit differential equations. Nonetheless, condition (2.7.5) implies by the implicit function theorem that near a solution, equation (2.7.3) is solved for $m$ components of the derivative. In the resulting system of $m$ explicit differential equations, the remaining $n-m$ components of the derivative play the role of "free parameters."

The theory of nonholonomic constraints does not give the existence of solutions of (2.7.3) on the entire interval $[t_a,t_b]$ or $[a,b]$, respectively, but it assumes the existence of a minimizing (or maximizing) function of the functional (2.7.1) or (2.7.2), respectively, that fulfills (2.7.3). The theory investigates mainly the class of admitted perturbations in order to derive a necessary system of Euler-Lagrange equations. The following example shows that admitted perturbations do not necessarily exist.

For $n = 2$ and $m = 1$, let

\[ G(x,y,y') = y_2' - \sqrt{1 + (y_1')^2} = 0 \quad\text{for } x \in [0,1]. \tag{2.7.6} \]

Then,

\[ D_pG(x,y,p) = \left(-\frac{p_1}{\sqrt{1 + p_1^2}},\ 1\right) \tag{2.7.7} \]

has rank $m = 1$. An admitted extremal must fulfill the boundary conditions

\[ \begin{aligned}
& y(0) = (y_1(0),y_2(0)) = (0,0),\\
& y(1) = (y_1(1),y_2(1)) = (0,1),
\end{aligned} \tag{2.7.8} \]

and also the nonholonomic constraint (2.7.6). By

\[ \begin{aligned}
& y_2'(x) = \sqrt{1 + (y_1'(x))^2} \ge 1,\\
& y_2(1) = \int_0^1 y_2'(x)\,dx = 1 \text{ is fulfilled only by}\\
& y_1'(x) = 0, \text{ or } y_1(x) = y_1(0) = 0. \text{ Hence,}\\
& y_2(x) = x.
\end{aligned} \tag{2.7.9} \]

Accordingly, the only admitted function is $y(x) = (y_1(x),y_2(x)) = (0,x)$ for $x \in [0,1]$, and there is no admitted perturbation. Hence, a variational problem with nonholonomic constraint (2.7.6) and boundary conditions (2.7.8) is ill-posed.
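The rigidity of this example can be observed numerically. In the following sketch, the family $y_1(x) = \varepsilon\sin(\pi x)$ is a hypothetical choice of perturbations that respects $y_1(0) = y_1(1) = 0$; the function name and the midpoint rule are our own choices and not part of the text.

```python
import math


def endpoint_y2(eps, n=2000):
    """y2(1) for y1(x) = eps * sin(pi x), y2' = sqrt(1 + y1'^2), y2(0) = 0.

    The integral in (2.7.9) is approximated by the midpoint rule with n cells.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        y1_prime = eps * math.pi * math.cos(math.pi * x)
        total += math.sqrt(1.0 + y1_prime * y1_prime) * h
    return total


# Endpoint value y2(1) for several amplitudes of the perturbation.
endpoints = {eps: endpoint_y2(eps) for eps in (0.0, 0.01, 0.1, 0.5)}
for eps, value in endpoints.items():
    print(f"eps = {eps:4.2f}:  y2(1) = {value:.6f}")
```

Only the amplitude $\varepsilon = 0$ gives $y_2(1) = 1$. Every other member of the family overshoots the prescribed boundary value in (2.7.8), so it is not admitted.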

In order to allow perturbations, an extremal satisfying a nonholonomic constraint must be free in the following sense:

Definition 2.7.2. A solution $y \in (C^1[a,b])^n$ of $G(x,y,y') = 0$ satisfying $y(a) = A$ and $y(b) = B$ is free if the boundary value problem

\[ \begin{aligned}
& G(x,\tilde{y},\tilde{y}') = 0 \quad\text{for } x \in [a,b],\\
& \tilde{y}(a) = A, \quad \tilde{y}(b) = \tilde{B},
\end{aligned} \tag{2.7.10} \]

has a solution for all $\tilde{B} \in \mathbb{R}^n$ in a ball $\|\tilde{B} - B\| < d$ for some positive $d$. If $y$ is not free, it is bound.

Here are two examples in parametric form:

1. For $m = 1$, let

\[ \Psi(x,\dot{x}) = \dot{x}_1^2 + \dots + \dot{x}_n^2 - 1 = 0 \quad\text{for } t \in [t_a,t_b] \tag{2.7.11} \]

and assume $n \ge 2$. Then,

\[ D_p\Psi(x,p) = 2(p_1,\dots,p_n) \tag{2.7.12} \]

has rank $m = 1$ for all $\|p\|^2 = 1$, for which (2.7.11) is solvable with $\dot{x} = p$. All solutions of

\[ \begin{aligned}
& \Psi(x,\dot{x}) = 0,\\
& x(t_a) = A, \quad x(t_b) = B,
\end{aligned} \tag{2.7.13} \]

have length $L = \int_{t_a}^{t_b}\|\dot{x}\|\,dt = t_b - t_a$, and therefore

\[ \begin{aligned}
& \text{the straight line segment satisfying (2.7.13) with } \|B - A\| = t_b - t_a \text{ is bound,}\\
& \text{and any continuously differentiable curve satisfying (2.7.13) with } \|B - A\| < t_b - t_a \text{ is free,}
\end{aligned} \tag{2.7.14} \]

cf. Figure 2.13.

Fig. 2.13 A Bound Curve and a Free Curve

2. Consider the nonholonomic constraint

\[ \tilde{\Psi}(x,\dot{x}) = D\Psi(x)\dot{x} = 0 \quad\text{for } t \in [t_a,t_b], \tag{2.7.15} \]

where the mapping $\Psi : \mathbb{R}^n \to \mathbb{R}^m$, with $m < n$, is continuously totally differentiable and whose Jacobian matrix $D\Psi(x) \in \mathbb{R}^{m\times n}$ has maximal rank $m$ for all $x \in \mathbb{R}^n$ satisfying $\Psi(x) = \Psi(A)$ for some $A \in \mathbb{R}^n$. In view of $D_p\tilde{\Psi}(x,p) = D\Psi(x)$, the nonholonomic constraint (2.7.15) fulfills the maximal rank condition (2.7.5) for the same points $x \in \mathbb{R}^n$. All solutions of

\[ \begin{aligned}
& \tilde{\Psi}(x,\dot{x}) = D\Psi(x)\dot{x} = 0 \quad\text{for } t \in [t_a,t_b],\\
& x(t_a) = A, \quad x(t_b) = B,
\end{aligned} \tag{2.7.16} \]

are on the $(n-m)$-dimensional manifold $M = \{x \in \mathbb{R}^n \mid \Psi(x) = \Psi(A)\}$, since (2.7.16)$_1$ is equivalent to $\frac{d}{dt}\Psi(x) = 0$. Solutions exist only if the endpoints $B$ and $\tilde{B}$ are on $M$, and therefore any curve satisfying the nonholonomic constraint (2.7.16) is bound.

This example shows that the holonomic constraint $\Psi(x) = \Psi(A)$ can be expressed as a nonholonomic constraint $\frac{d}{dt}\Psi(x) = D\Psi(x)\dot{x} = 0$ that binds solution curves.

Isoperimetric constraints can be expressed as nonholonomic constraints as well. For

\[ K_i(y) = \int_a^b G_i(x,y,y')\,dx = c_i, \quad i = 1,\dots,m, \tag{2.7.17} \]

with $G = (G_1,\dots,G_m)$ and $c = (c_1,\dots,c_m)$, define for $(y,z) \in (C^1[a,b])^{n+m}$ the nonholonomic constraint

\[ \begin{aligned}
& \tilde{G}(x,y,z,y',z') = z'(x) - G(x,y,y') = 0, \quad x \in [a,b],\\
& z(a) = 0, \quad z(b) = c.
\end{aligned} \tag{2.7.18} \]

This is equivalent to the isoperimetric constraint in the sense that a function $y$ satisfies (2.7.17) if and only if the function $(y,z)$ satisfies (2.7.18). In view of

\[ D_{(p,q)}\tilde{G}(x,y,z,p,q) = (-D_pG(x,y,p)\ \ E) \in \mathbb{R}^{m\times(n+m)}, \tag{2.7.19} \]

where $E \in \mathbb{R}^{m\times m}$ denotes the unit matrix, the Jacobian matrix (2.7.19) has maximal rank $m$ throughout. We see in Exercise 2.7.4 that a function $y \in (C^1[a,b])^n$ is not critical for the isoperimetric constraint (2.7.17), in the sense of Proposition 2.1.3, if and only if the function $(y,z) \in (C^1[a,b])^{n+m}$ satisfying (2.7.18) is normal according to Definition 2.7.3 below. Finally, a function is normal for a nonholonomic constraint if and only if it is free for it. That equivalence is proven in [11], IV, 4.

We quote the main proposition on nonholonomic constraints.

Proposition 2.7.1. Let $y \in (C^2[a,b])^n$ be a local minimizer of the functional (2.7.2) under the nonholonomic constraint (2.7.3)$_2$. The functions $F$ and $G$ are three times continuously partially differentiable with respect to all variables, and the matrix (2.7.5)$_2$ has maximal rank $m$ for all $(x,y,p)$ in a neighborhood of the graph of $(y,y')$ in $[a,b]\times\mathbb{R}^n\times\mathbb{R}^n$.

Then, there is a continuously differentiable function

\[ \begin{aligned}
& \lambda = (\lambda_1,\dots,\lambda_m) : [a,b] \to \mathbb{R}^m \quad\text{and}\\
& \lambda_0 \in \{0,1\},
\end{aligned} \tag{2.7.20} \]

such that $y$ solves the system of differential equations

\[ \frac{d}{dx}\Big(\lambda_0F + \sum_{i=1}^m \lambda_iG_i\Big)_{y'} = \Big(\lambda_0F + \sum_{i=1}^m \lambda_iG_i\Big)_{y} \quad\text{on } [a,b]. \tag{2.7.21} \]

Here, $F_{y'} = (F_{y'_1},\dots,F_{y'_n})$, $F_y = (F_{y_1},\dots,F_{y_n})$, and $G_{i,y'}$, $G_{i,y}$ are defined analogously. The argument of all functions in (2.7.21) is $(x,y(x),y'(x))$. Finally,

\[ (\lambda_0,\lambda) \ne (0,0) \in \mathbb{R}\times(C^1[a,b])^m. \tag{2.7.22} \]

The case $\lambda_0 = 1$ is of special interest, since for $\lambda_0 = 0$ the Lagrange function $F$ does not appear in the system (2.7.21). In view of (2.7.22) in case $\lambda_0 = 0$, some of the multipliers $\lambda_1,\dots,\lambda_m$ do not vanish. That motivates the following definition:

Definition 2.7.3. A solution $y \in (C^2[a,b])^n$ of $G(x,y,y') = 0$ on $[a,b]$ is called normal if

\[ \frac{d}{dx}\Big(\sum_{i=1}^m \lambda_iG_i(\cdot,y,y')\Big)_{y'} = \Big(\sum_{i=1}^m \lambda_iG_i(\cdot,y,y')\Big)_{y} \quad\text{on } [a,b] \tag{2.7.23} \]

for $\lambda = (\lambda_1,\dots,\lambda_m) \in (C^1[a,b])^m$ has the only solution $\lambda = 0 \in \mathbb{R}^m$. If (2.7.23) is solvable for some nontrivial $\lambda$, then $y$ is not normal.

As remarked before, a function $y$ satisfying a nonholonomic constraint is normal if and only if it is free. Without using this terminology, this is also proved in [3], Paragraph 69, and in [13], Chaps. 2, 3.

In view of these definitions, we complete Proposition 2.7.1 as follows:

Corollary 2.7.1. If the local minimizer in Proposition 2.7.1 is free or normal with respect to the nonholonomic constraint, then it fulfills the system (2.7.21) with $\lambda_0 = 1$.

The proof of Proposition 2.7.1 and of Corollary 2.7.1 is elaborate, and we refer to the literature [3], [11], [13], for example.

Proposition 2.7.1 is the most general in covering all constraints considered so far:

For isoperimetric constraints, the Lagrange multipliers $\lambda_1,\dots,\lambda_m$ are constant and $\lambda_0 = 1$. Then, Proposition 2.7.1 yields Proposition 2.1.3, cf. Exercise 2.7.4 for details.

Curves that satisfy holonomic constraints in the nonholonomic form (2.7.15) are not free, cf. example 2 above and Exercise 2.7.2. Nonetheless, system (2.7.21) is equivalent to system (2.5.8) of Proposition 2.5.1, cf. Exercise 2.7.3. This shows that for the choice $\lambda_0 = 1$ in (2.7.21), the local minimizer is not necessarily free or normal. That condition is sufficient but not necessary.

We apply Proposition 2.7.1 to the hanging chain, which we have considered already in Paragraph 2.3:

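We parametrize the graph $\{(x,y(x)) \mid x \in [a,b]\}$ representing the hanging chain by its arc length $s$, i.e.,

\[ \begin{aligned}
& (x,y(x)) = (\tilde{x}(s),\tilde{y}(s)) \quad\text{for } s \in [0,L], \text{ where}\\
& (\tilde{x}(0),\tilde{y}(0)) = (a,A), \quad (\tilde{x}(L),\tilde{y}(L)) = (b,B),
\end{aligned} \tag{2.7.24} \]

cf. Figure 2.3. Then, its potential energy is given by (up to the factor $g\rho$)

\[ J(\tilde{x},\tilde{y}) = \int_0^L \tilde{y}(s)\,ds \tag{2.7.25} \]

which has to be minimized under the boundary conditions (2.7.24)$_2$ and the nonholonomic constraint

\[ \dot{\tilde{x}}^2 + \dot{\tilde{y}}^2 = 1, \qquad \dot{\ } = \frac{d}{ds}, \tag{2.7.26} \]

cf. (2.6.6).

Assuming that $L > \sqrt{(b-a)^2 + (B-A)^2}$, then any admitted curve from $(a,A)$ to $(b,B)$ is not a straight line segment and therefore it is free according to (2.7.14). However, we do not use that information, but we apply Proposition 2.7.1 formally excluding $\lambda_0 = 0$. For the parametric versions (2.7.25) and (2.7.26), the system (2.7.21) reads

\[ \begin{aligned}
& 2\frac{d}{ds}\big(\lambda_1\dot{\tilde{x}}\big) = 0,\\
& 2\frac{d}{ds}\big(\lambda_1\dot{\tilde{y}}\big) = \lambda_0, \qquad s \in [0,L].
\end{aligned} \tag{2.7.27} \]

This yields $\lambda_1\dot{\tilde{x}} = c_1$, $\lambda_1\dot{\tilde{y}} = \frac{1}{2}\lambda_0s + c_2$, and using (2.7.26), we obtain

\[ \lambda_1^2 = \lambda_1^2(\dot{\tilde{x}}^2 + \dot{\tilde{y}}^2) = c_1^2 + \Big(\frac{1}{2}\lambda_0s + c_2\Big)^2. \tag{2.7.28} \]

For $\lambda_0 = 0$, this gives $\lambda_1 = \sqrt{c_1^2 + c_2^2} \ne 0$ (by (2.7.22)) and

\[ \tilde{x}(s) = \frac{c_1}{\lambda_1}s + c_3, \qquad \tilde{y}(s) = \frac{c_2}{\lambda_1}s + c_4, \tag{2.7.29} \]

which describes a straight line. However, in view of $L > \sqrt{(b-a)^2 + (B-A)^2}$, it is not admitted.

Therefore, $\lambda_0 = 1$, and from the above calculations,

\[ \begin{aligned}
& \lambda_1(s) = \sqrt{c_1^2 + (\tfrac{1}{2}s + c_2)^2},\\
& \dot{\tilde{y}}(s) = (\tfrac{1}{2}s + c_2)\Big/\sqrt{c_1^2 + (\tfrac{1}{2}s + c_2)^2},\\
& \tilde{y}(s) = 2\sqrt{c_1^2 + (\tfrac{1}{2}s + c_2)^2} + c_4,\\
& \dot{\tilde{x}}(s) = c_1\Big/\sqrt{c_1^2 + (\tfrac{1}{2}s + c_2)^2},\\
& \tilde{x}(s) = 2c_1\operatorname{Arsinh}\Big(\frac{1}{c_1}\big(\tfrac{1}{2}s + c_2\big)\Big) + c_3,\\
& c_1\sinh\Big(\frac{\tilde{x}(s) - c_3}{2c_1}\Big) = \tfrac{1}{2}s + c_2 \quad\text{and}\\
& \tilde{y}(s) = 2c_1\cosh\Big(\frac{\tilde{x}(s) - c_3}{2c_1}\Big) + c_4.
\end{aligned} \tag{2.7.30} \]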

We obtain a catenary with three constants as in (2.3.7).
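The formulas (2.7.30) can be cross-checked numerically. The sketch below uses arbitrary illustrative constants with $c_1 > 0$; in the actual problem they are determined by the boundary conditions (2.7.24)$_2$, which we do not solve here. It tests the constraint (2.7.26) by difference quotients and the graph representation in the last line of (2.7.30).

```python
import math

# Illustrative constants with C1 > 0 (not fitted to any boundary conditions).
C1, C2, C3, C4 = 0.8, -0.6, 0.1, -1.0


def x_tilde(s):
    """Abscissa of the chain as a function of the arc length, cf. (2.7.30)."""
    return 2.0 * C1 * math.asinh((0.5 * s + C2) / C1) + C3


def y_tilde(s):
    """Ordinate of the chain as a function of the arc length, cf. (2.7.30)."""
    return 2.0 * math.sqrt(C1 ** 2 + (0.5 * s + C2) ** 2) + C4


def speed_squared(s, h=1e-5):
    """Left-hand side of the constraint (2.7.26) via central differences."""
    dx = (x_tilde(s + h) - x_tilde(s - h)) / (2.0 * h)
    dy = (y_tilde(s + h) - y_tilde(s - h)) / (2.0 * h)
    return dx * dx + dy * dy


def catenary(x):
    """Graph representation y = 2 c1 cosh((x - c3) / (2 c1)) + c4."""
    return 2.0 * C1 * math.cosh((x - C3) / (2.0 * C1)) + C4


samples = [0.0, 0.5, 1.0, 2.0, 3.0]
speed_defect = max(abs(speed_squared(s) - 1.0) for s in samples)
graph_defect = max(abs(y_tilde(s) - catenary(x_tilde(s))) for s in samples)
print(f"constraint defect: {speed_defect:.2e}, graph defect: {graph_defect:.2e}")
```

Both defects are at the level of the discretization and rounding errors, so the parametrized curve has unit speed and lies on the catenary.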

Remark. The general Proposition 2.7.1 with $\lambda_0 = 1$ goes back to Lagrange. We sketch roughly his arguments for $n = 2$ and $m = 1$: For a minimizer $y \in (C^1[a,b])^2$ of (2.7.2) under the constraint (2.7.3)$_2$, let a function $y + th$ with any $h \in (C^1_0[a,b])^2$ be an admitted perturbation, i.e., $G(x, y + th, y' + th') = 0$ for $t \in (-\varepsilon,\varepsilon)$. Then, $\frac{d}{dt}G(x, y + th, y' + th')|_{t=0} = 0$ or $(G_y,h) + (G_{y'},h') = 0$ where $(\ ,\ )$ denotes the scalar product in $\mathbb{R}^2$. By assumption, $J(y + th)$ is minimal at $t = 0$, and hence, $\delta J(y)h = 0$. Therefore,

\[ \int_a^b (F_y + \lambda G_y, h) + (F_{y'} + \lambda G_{y'}, h')\,dx = 0 \tag{2.7.31} \]

for any arbitrary continuously differentiable function $\lambda$, and integration by parts yields

\[ \int_a^b \Big(F_y + \lambda G_y - \frac{d}{dx}(F_{y'} + \lambda G_{y'}),\, h\Big)\,dx = 0. \tag{2.7.32} \]

The function $\lambda$ is then determined by the scalar differential equation

\[ F_{y_1} + \lambda G_{y_1} - \frac{d}{dx}(F_{y'_1} + \lambda G_{y'_1}) = 0 \quad\text{on } [a,b], \tag{2.7.33} \]

and since the component $h_2$ is arbitrary, the fundamental lemma of the calculus of variations finally yields

\[ F_{y_2} + \lambda G_{y_2} - \frac{d}{dx}(F_{y'_2} + \lambda G_{y'_2}) = 0 \quad\text{on } [a,b]. \tag{2.7.34} \]

It is not appropriate to criticize the weak points of that argument, but rather we judge Lagrange's merit by the pioneering and useful result.

Exercises

2.7.1. For $n \ge 2$, let

\[ \Psi(t,x,\dot{x}) = \dot{x}_1^2 + \dots + \dot{x}_n^2 - 1 = 0 \quad\text{on } [t_a,t_b]. \]

Show that any curve satisfying that nonholonomic constraint is normal, provided it is not a straight line.

2.7.2. Let

\[ \tilde{\Psi}(x,\dot{x}) = D\Psi(x)\dot{x}, \]

where $\Psi : \mathbb{R}^n \to \mathbb{R}^m$, with $m < n$, is a twice continuously partially differentiable function whose Jacobian matrix $D\Psi(x)$ has a maximal rank for all $x \in M = \{x \in \mathbb{R}^n \mid \Psi(x) = \Psi(A)\}$.

Show that a curve $x \in (C^1[t_a,t_b])^n$ satisfying $x(t_a) = A$ and $\tilde{\Psi}(x,\dot{x}) = 0$ is not normal.

2.7.3. Let $\tilde{\Psi}(x,\dot{x}) = D\Psi(x)\dot{x}$, where $\Psi$ satisfies the hypotheses required in Exercise 2.7.2. Let $\Phi : \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ be a twice continuously partially differentiable Lagrange function that fulfills system (2.5.8), viz.,

\[ \frac{d}{dt}\Phi_{\dot{x}} = \Phi_x + \sum_{i=1}^m \lambda_i\nabla\Psi_i \quad\text{on } [t_a,t_b], \]

where $\lambda = (\lambda_1,\dots,\lambda_m) : [t_a,t_b] \to \mathbb{R}^m$ is continuous.

Show that system (2.5.8) is equivalent to system (2.7.21), given by

\[ \frac{d}{dt}\Big(\Phi + \sum_{i=1}^m \tilde{\lambda}_i\tilde{\Psi}_i\Big)_{\dot{x}} = \Big(\Phi + \sum_{i=1}^m \tilde{\lambda}_i\tilde{\Psi}_i\Big)_{x}, \]

where $\lambda_0 = 1$ and $\tilde{\lambda} = (\tilde{\lambda}_1,\dots,\tilde{\lambda}_m) : [t_a,t_b] \to \mathbb{R}^m$ is continuously differentiable.

2.7.4. Formulate the isoperimetric constraints

\[ K_i(y) = \int_a^b G_i(x,y,y')\,dx = c_i, \quad i = 1,\dots,m, \]

with functions $G_i : [a,b]\times\mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$, which are continuous and continuously partially differentiable with respect to the last $2n$ variables, as equivalent nonholonomic constraints

\[ \begin{aligned}
& \tilde{G}(x,y,z,y',z') = z'(x) - G(x,y,y') = 0 \quad\text{for } x \in [a,b],\\
& z(a) = 0, \quad z(b) = c,
\end{aligned} \]

with $G = (G_1,\dots,G_m)$, $c = (c_1,\dots,c_m)$, and $(y,z) \in (C^1[a,b])^{n+m}$. Then, $D_{(p,q)}\tilde{G}(x,y,z,p,q)$ has rank $m$, cf. (2.7.19).

a) Show that a function $y \in (C^1[a,b])^n$ is not critical for the isoperimetric constraints in the sense of Proposition 2.1.3 or of Exercise 2.1.1, if and only if the function $(y,z) \in (C^1[a,b])^{n+m}$, satisfying the nonholonomic constraints, is normal.

b) Show that for a local minimizer $y \in (C^2[a,b])^n$ of the functional (2.7.2) or (2.1.15), under the nonholonomic constraints (2.7.18), the Lagrange multipliers in system (2.7.21) are constant and that for $\lambda_0 = 1$, system (2.7.21) is converted to system (2.1.20).

2.7.5. A functional of second order for $y \in C^2[a,b]$,

\[ J(y) = \int_a^b F(x,y,y',y'')\,dx, \]

where the Lagrange function $F : [a,b]\times\mathbb{R}\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$ is three times continuously differentiable with respect to all variables, is formulated as a functional of first order for $(y,z) \in (C^1[a,b])^2$ via

\[ J(y,z) = \int_a^b F(x,y,y',z')\,dx, \]

under the nonholonomic constraint

\[ G(x,y,y',z') = y' - z = 0. \]

Prove the following statements:

a) The nonholonomic constraint fulfills the rank condition (2.7.5).
b) Any function fulfilling the nonholonomic constraint is free.
c) Any function fulfilling the nonholonomic constraint is normal.
d) Give the Euler-Lagrange equation for a local minimizer $y \in C^2[a,b]$ of $J(y)$ by application of Proposition 2.7.1.

2.7.6. Consider the nonholonomic constraint

G(y,y′) = g1(y)y′1 +g2(y)y′

2 = 0 on [a,b] ,

for n= 2 and m= 1. Assume that the vector field g= (g1,g2) : R2 → R2 is contin-

uously totally differentiable and that g(y) �= 0 for all y ∈ R2. Then,

DpG(y, p) = ∇pG(y, p) = g(y) �= 0

has rank m= 1 for all y ∈ R2. Prove the statements:

a) Any solution y ∈ (C1[a,b])2 of G(y,y′) = 0 is not normal.b) Any solution y ∈ (C2[a,b])2 of G(y,y′) = 0 satisfying y(a) = A and y(b) = B is

bound.

2.8 Transversality

For the problem of finding a shortest connection between two disjoint surfaces inspace, all rectifiable curves having free boundaries on the surfaces are admitted. Wegeneralize this scenario as follows:

Definition 2.8.1. For a parametric functional

J(x) =∫ tb

taΦ(t,x, x)dt (2.8.1)

curves x ∈ (C1,pw[ta, tb])n for n ≥ 2 are admitted, whose boundaries satisfy x(ta) ∈Ma and/or x(tb) ∈ Mb. Here, Ma and Mb are disjoint manifolds in Rn given by

M = {x ∈ Rn|Ψ(x) = 0}, (2.8.2)

where the function Ψ : Rn → Rm is twice continuously totally differentiable and

whose Jacobian matrix DΨ(x) has maximal rank m. The dimension n−m of themanifolds M =Ma and M =Mb may be different as long as n−m > 0.

If x∈ (C1,pw[ta, tb])n is admitted, then x+sh for any h∈ (C1,pw0 [ta, tb])n is admitted

as well. Therefore, we can state by the same arguments employed before leading to(1.10.22):



Fig. 2.14 Curves Connecting Two Manifolds

Proposition 2.8.1. Let $x \in (C^{1,pw}[t_a,t_b])^n \cap \{x(t_a) \in M_a \text{ and/or } x(t_b) \in M_b\}$ be a local minimizer of the functional (2.8.1) whose Lagrange function $\Phi : [t_a,t_b]\times\mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ is continuous and continuously partially differentiable with respect to the last $2n$ variables. Then, $x$ fulfills the system of Euler-Lagrange equations

\[
\Phi_{\dot{x}}(\cdot,x,\dot{x}) \in (C^{1,pw}[t_a,t_b])^n \quad\text{and}\quad
\frac{d}{dt}\Phi_{\dot{x}}(\cdot,x,\dot{x}) = \Phi_x(\cdot,x,\dot{x}) \quad\text{piecewise on } [t_a,t_b]. \tag{2.8.3}
\]

It is of interest which “natural boundary condition” a local minimizer has to fulfill at a free boundary on a manifold. We use the terminology of the Appendix; in particular, we refer to the definitions of the tangent and normal spaces at a point $x$ of a manifold, cf. (A.7).

Proposition 2.8.2. Under the hypotheses of Proposition 2.8.1, a local minimizer $x \in (C^{1,pw}[t_a,t_b])^n \cap \{x(t_a) \in M_a \text{ and/or } x(t_b) \in M_b\}$ of the functional (2.8.1) satisfies

\[
\begin{aligned}
&\Phi_{\dot{x}}(t_a,x(t_a),\dot{x}(t_a)) \in N_{x(t_a)}M_a \quad\text{and/or} \\
&\Phi_{\dot{x}}(t_b,x(t_b),\dot{x}(t_b)) \in N_{x(t_b)}M_b, \quad\text{respectively.}
\end{aligned}
\tag{2.8.4}
\]

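Proof. Let $M_a = M$ be given by (2.8.2). Then, for $x(t_a) \in M_a$, there exists a perturbation

\[
\begin{aligned}
& x(t_a) + sy + \varphi(sy) \in M_a, \quad\text{where} \\
& y \in T_{x(t_a)}M_a, \quad \|y\| \leq 1, \\
& \varphi(sy) \in N_{x(t_a)}M_a, \quad s \in (-r,r),
\end{aligned}
\tag{2.8.5}
\]

cf. (A.11). The function $\varphi : B_r(0) \subset T_{x(t_a)}M_a \to N_{x(t_a)}M_a$ is continuously totally differentiable and satisfies

\[
\varphi(0) = 0 \quad\text{and}\quad D\varphi(0) = 0; \quad\text{in particular,}\quad \frac{d}{ds}\varphi(sy)\Big|_{s=0} = 0, \tag{2.8.6}
\]

cf. (A.13). For $s \in (-r,r)$, $t \in [t_a,t_b]$, let

\[
\begin{aligned}
& h(s,t) = \eta(t)(sy + \varphi(sy)), \quad\text{where} \\
& \eta \in C^1[t_a,t_b], \quad \eta(t_a) = 1, \quad \eta(t_b) = 0.
\end{aligned}
\tag{2.8.7}
\]

Then, the perturbation $x + h(s,\cdot)$ of $x$ is admitted since it fulfills

\[
\begin{aligned}
& h(s,\cdot) \in (C^1[t_a,t_b])^n, \\
& x(t_a) + h(s,t_a) = x(t_a) + sy + \varphi(sy) \in M_a, \quad h(s,t_b) = 0, \\
& h(0,t) = 0, \\
& \frac{\partial}{\partial s}h(0,t) = \eta(t)y, \quad \frac{\partial}{\partial s}\dot{h}(0,t) = \dot{\eta}(t)y.
\end{aligned}
\tag{2.8.8}
\]

Since the functional $J(x + h(s,\cdot))$ is locally minimal at $s = 0$, we derive

\[
\begin{aligned}
\frac{d}{ds}J(x + h(s,\cdot))\Big|_{s=0} = 0
&= \int_{t_a}^{t_b} (\Phi_x,\eta y) + (\Phi_{\dot{x}},\dot{\eta}y)\,dt \\
&= \int_{t_a}^{t_b} \Bigl(\Phi_x - \frac{d}{dt}\Phi_{\dot{x}},\eta y\Bigr)dt + (\Phi_{\dot{x}},\eta y)\Big|_{t_a}^{t_b} \\
&= -(\Phi_{\dot{x}}(t_a,x(t_a),\dot{x}(t_a)),y),
\end{aligned}
\tag{2.8.9}
\]

where we use also (2.8.3) and $(2.8.7)_2$. Since $y \in T_{x(t_a)}M_a$ fulfills only $\|y\| \leq 1$ but is otherwise arbitrary, the claim $(2.8.4)_1$ follows from the fact that $N_{x(t_a)}M_a$ is the orthogonal complement of $T_{x(t_a)}M_a$. The claim $(2.8.4)_2$ is proved in an analogous way. □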

The requirements (2.8.4) are transversality conditions.

For an example, consider

\[
J(x) = \int_{t_a}^{t_b} \varphi(x)\|\dot{x}\|\,dt, \tag{2.8.10}
\]

where $\varphi : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. If $\varphi(x) > 0$, it is called a “density” of a “weighted” length of a curve $x$. The Euler-Lagrange equation reads

\[
\frac{d}{dt}\Bigl(\varphi(x)\frac{\dot{x}}{\|\dot{x}\|}\Bigr) = \nabla\varphi(x)\|\dot{x}\|, \tag{2.8.11}
\]

and due to the invariance (1.10.20) of the Lagrange function, the Euler-Lagrange equation (2.8.11) is invariant under reparametrizations. A parametrization by the arc length gives $\|\dot{x}\| = 1$, cf. (2.6.6), such that (2.8.11) and (2.8.4) become

\[
\begin{aligned}
& \frac{d}{dt}(\varphi(x)\dot{x}) = \nabla\varphi(x), \quad \|\dot{x}\| = 1, \\
& \varphi(x(t_a))\dot{x}(t_a) \in N_{x(t_a)}M_a, \\
& \varphi(x(t_b))\dot{x}(t_b) \in N_{x(t_b)}M_b.
\end{aligned}
\tag{2.8.12}
\]

If $\varphi(x(t_a)) \neq 0$ or $\varphi(x(t_b)) \neq 0$, the conditions $(2.8.12)_{2,3}$ mean that the curve is orthogonal to $M_a$ or to $M_b$. If $\varphi(x) \equiv 1$, the minimizing curve is a straight line.
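The orthogonality statement can be checked numerically in the simplest case of unit density. The following is a minimal sketch, not taken from the text: the two circles (centers, radius), the grid resolution, and all function names are illustrative assumptions. It searches by brute force for the shortest straight segment between two disjoint circles in the plane and measures how far the segment direction deviates from the radial normals at both endpoints.

```python
import math

def point_on_circle(center, radius, angle):
    """Point on a circle, parametrized by the polar angle about its center."""
    return (center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle))

def shortest_connection(center_a, center_b, radius=1.0, steps=360):
    """Brute-force search over both endpoint angles for the shortest segment."""
    best = None
    for i in range(steps):
        alpha = 2.0 * math.pi * i / steps
        pa = point_on_circle(center_a, radius, alpha)
        for j in range(steps):
            beta = 2.0 * math.pi * j / steps
            pb = point_on_circle(center_b, radius, beta)
            dist = math.hypot(pb[0] - pa[0], pb[1] - pa[1])
            if best is None or dist < best[0]:
                best = (dist, alpha, beta, pa, pb)
    return best

def cross2(u, v):
    """Planar cross product; it vanishes exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

# Illustrative data (assumption): unit circles with centers (0,0) and (4,3).
center_a, center_b = (0.0, 0.0), (4.0, 3.0)
dist, alpha, beta, pa, pb = shortest_connection(center_a, center_b)

# Unit tangent of the connecting segment and the radial normals at its ends.
tangent = ((pb[0] - pa[0]) / dist, (pb[1] - pa[1]) / dist)
normal_a = (math.cos(alpha), math.sin(alpha))
normal_b = (math.cos(beta), math.sin(beta))
defect_a = abs(cross2(tangent, normal_a))
defect_b = abs(cross2(tangent, normal_b))
print(dist, defect_a, defect_b)
```

Both defects are small up to the grid resolution, which is what the transversality conditions predict: the tangent of the minimizing segment lies in the normal space of each circle at the respective endpoint.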

Fig. 2.15 A Transversal Curve

Next we derive a modified transversality if the Lagrange function depends explicitly on the starting point and/or on the endpoint. This condition is used for the generalized brachistochrone problem studied below.

Proposition 2.8.3. Let $x \in (C^{1,pw}[t_a,t_b])^n \cap \{x(t_a) \in M_a \text{ and/or } x(t_b) \in M_b\}$ be a local minimizer of

\[
J(x) = \int_{t_a}^{t_b} \Phi(x,\dot{x},x(t_a))\,dt, \tag{2.8.13}
\]

where $\Phi : \mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ is continuously totally differentiable. Then, $x$ satisfies the Euler-Lagrange equation (2.8.15), and at $t = t_a$ a modified transversality condition, viz.,

\[
\Phi_{\dot{x}}(x(t_a),\dot{x}(t_a),x(t_a)) - \int_{t_a}^{t_b} \Phi_{x(t_a)}(x,\dot{x},x(t_a))\,dt \in N_{x(t_a)}M_a. \tag{2.8.14}
\]

Here, $\Phi_{x(t_a)}$ denotes the vector of the derivatives of $\Phi$ with respect to the last $n$ variables. If $x(t_b) \in M_b$, then $x$ satisfies the transversality condition $(2.8.4)_2$.

Proof. For all $h \in (C_0^{1,pw}[t_a,t_b])^n$, the perturbation $x + sh$ is admitted as well, and by assumption,

\[
J(x + sh) = \int_{t_a}^{t_b} \Phi(x + sh,\dot{x} + s\dot{h},x(t_a))\,dt
\]

is minimal at $s = 0$. Therefore, $x$ satisfies

\[
\Phi_{\dot{x}}(x,\dot{x},x(t_a)) \in (C^{1,pw}[t_a,t_b])^n \quad\text{and}\quad
\frac{d}{dt}\Phi_{\dot{x}}(x,\dot{x},x(t_a)) = \Phi_x(x,\dot{x},x(t_a)) \quad\text{piecewise on } [t_a,t_b]. \tag{2.8.15}
\]

For the perturbation $h(s,t)$ given in (2.8.7) and (2.8.8), we have $\frac{\partial}{\partial s}h(0,t_a) = y$, and thus,

\[
\begin{aligned}
\frac{d}{ds}J(x + h(s,\cdot))\Big|_{s=0} = 0
&= \int_{t_a}^{t_b} (\Phi_x(x,\dot{x},x(t_a)),\eta y) + (\Phi_{\dot{x}}(x,\dot{x},x(t_a)),\dot{\eta}y)\,dt \\
&\quad + \int_{t_a}^{t_b} (\Phi_{x(t_a)}(x,\dot{x},x(t_a)),y)\,dt \\
&= -(\Phi_{\dot{x}}(x(t_a),\dot{x}(t_a),x(t_a)),y) + \Bigl(\int_{t_a}^{t_b} \Phi_{x(t_a)}(x,\dot{x},x(t_a))\,dt,\,y\Bigr).
\end{aligned}
\tag{2.8.16}
\]

After integrating $(2.8.16)_2$ by parts, we apply the Euler-Lagrange equation (2.8.15), cf. (2.8.9). The fact that $y \in T_{x(t_a)}M_a$ satisfies only $\|y\| \leq 1$ but is otherwise arbitrary implies the claim (2.8.14).

The last claim is proved with a perturbation $h$ that is analogous to (2.8.7) where $t_a$ and $M_a$ are replaced by $t_b$ and $M_b$, respectively. Then, $x(t_a) + h(s,t_a) = x(t_a)$ for all $s \in (-r,r)$, whence $\frac{\partial}{\partial s}h(0,t_a) = 0$. Therefore, the integral $(2.8.16)_3$ vanishes. □

We study the following brachistochrone problem: Given two disjoint curves $M_a$ and $M_b$ in a vertical plane, determine the curve on which a point mass acted on only by gravity runs from $M_a$ to $M_b$ in shortest time.

The curve is certainly a cycloid, and the problem is where the minimizing cycloid starts on $M_a$ and where it ends on $M_b$.

We establish the running time of a point mass $m$ on a curve $(x,y)$ parametrized by the time $t$ in a vertical $(x,y)$-plane, where the $y$-axis points downward. We can use now our knowledge on holonomic constraints, namely



Fig. 2.16 Admitted Curves for a Brachistochrone connecting Two Manifolds

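\[
\begin{aligned}
& T = \tfrac{1}{2}m(\dot{x}^2 + \dot{y}^2) \ \text{is the kinetic energy,} \\
& V = -mgy \ \text{is the potential energy, and} \\
& \Psi(x,y) = 0 \ \text{is the holonomic constraint}
\end{aligned}
\tag{2.8.17}
\]

that determines the minimizing curve. According to (2.5.28)–(2.5.32), the equations of motion derived from the Euler-Lagrange equations are

\[
\begin{aligned}
m\ddot{x} &= \lambda\Psi_x(x,y), \\
m\ddot{y} &= mg + \lambda\Psi_y(x,y).
\end{aligned}
\tag{2.8.18}
\]

Due to $\frac{d}{dt}\Psi(x,y) = \Psi_x(x,y)\dot{x} + \Psi_y(x,y)\dot{y} = 0$ and by $\dot{x}\ddot{x} + \dot{y}\ddot{y} = \frac{1}{2}\frac{d}{dt}(\dot{x}^2 + \dot{y}^2) = \frac{1}{2}\frac{d}{dt}v^2$, (2.8.18) implies

\[
\begin{aligned}
& \frac{d}{dt}\Bigl(\tfrac{1}{2}v^2 - gy\Bigr) = 0. \quad\text{Hence} \\
& \tfrac{1}{2}v(t)^2 - gy(t) = \tfrac{1}{2}v(t_a)^2 - gy(t_a), \quad\text{and} \\
& \sqrt{\dot{x}(t)^2 + \dot{y}(t)^2} = \sqrt{2g(y(t) - y(t_a)) + v(t_a)^2}.
\end{aligned}
\tag{2.8.19}
\]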

This gives the running time on the curve $\{(x(t),y(t)) \mid t \in [t_a,t_b]\}$:

\[
T = t_b - t_a = \int_{t_a}^{t_b} 1\,dt = \int_{t_a}^{t_b} \frac{\sqrt{\dot{x}^2 + \dot{y}^2}}{\sqrt{2g(y - y(t_a)) + v(t_a)^2}}\,dt, \tag{2.8.20}
\]

where $v(t_a)$ is the initial speed.

Since the Lagrange function is invariant in the sense of Definition 1.10.2, we can reparametrize (2.8.20) by $\tau$ used in (1.8.17) representing the cycloid. By (1.8.19), $t = \alpha\tau$, $\alpha > 0$. Defining $k = v(t_a)^2/2g$, the functional (2.8.20) becomes (up to the factor $1/\sqrt{2g}$)

\[
J(x,y) = \int_{\tau_a}^{\tau_b} \frac{\sqrt{\dot{x}^2 + \dot{y}^2}}{\sqrt{y - y(\tau_a) + k}}\,d\tau, \qquad \dot{(\ )} = \frac{d}{d\tau}, \tag{2.8.21}
\]

where, after reparametrization, we omit the tilde.

The functional (2.8.21) is to be minimized among admitted curves $(x,y) \in (C^1[\tau_a,\tau_b])^2 \cap \{(x(\tau_a),y(\tau_a)) \in M_a,\ (x(\tau_b),y(\tau_b)) \in M_b\} \cap \{y - y(\tau_a) + k > 0 \text{ on } (\tau_a,\tau_b]\}$. The minimizing curve satisfies for $k > 0$ the Euler-Lagrange system (2.8.15) on $[\tau_a,\tau_b]$, and according to Proposition 2.8.3, it has also to satisfy the modified transversality (2.8.14) at $\tau = \tau_a$. We know that the Euler-Lagrange system is solved by a cycloid (and we recommend that the reader verify it). We obtain

\[
\begin{aligned}
& x(\tau) = x(\tau_a) - c + r(\tau - \sin\tau), \\
& y(\tau) = y(\tau_a) - k + r(1 - \cos\tau), \\
& c = r(\tau_a - \sin\tau_a), \quad k = r(1 - \cos\tau_a).
\end{aligned}
\tag{2.8.22}
\]

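The modified transversality is evaluated using the derivatives of the Lagrange function of (2.8.21) in the family (2.8.22), which gives at $\tau = \tau_a$,

\[
\begin{aligned}
& \frac{1}{\sqrt{2r}}\Bigl(\Bigl(1,\cot\frac{\tau_a}{2}\Bigr) - \int_{\tau_a}^{\tau_b}\Bigl(0,\frac{1}{1 - \cos\tau}\Bigr)d\tau\Bigr) \\
& \quad = \frac{1}{\sqrt{2r}}\Bigl(1,\cot\frac{\tau_b}{2}\Bigr) \in N_{(x(\tau_a),y(\tau_a))}M_a.
\end{aligned}
\tag{2.8.23}
\]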
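The equality in (2.8.23) rests on an elementary primitive. The following worked step, which uses only the half-angle identity $1 - \cos\tau = 2\sin^2(\tau/2)$, fills it in:

```latex
\[
\frac{d}{d\tau}\Bigl(-\cot\frac{\tau}{2}\Bigr) = \frac{1}{2\sin^2(\tau/2)} = \frac{1}{1-\cos\tau},
\qquad\text{so}\qquad
\int_{\tau_a}^{\tau_b}\frac{d\tau}{1-\cos\tau} = \cot\frac{\tau_a}{2} - \cot\frac{\tau_b}{2},
\]
\[
\cot\frac{\tau_a}{2} - \Bigl(\cot\frac{\tau_a}{2} - \cot\frac{\tau_b}{2}\Bigr) = \cot\frac{\tau_b}{2}.
\]
```

The contribution of the starting point cancels, which is why only $\tau_b$ appears in the second line of (2.8.23).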

At $\tau = \tau_b$, the transversality $(2.8.4)_2$ must be satisfied, meaning

\[
\frac{1}{\sqrt{2r}}\Bigl(1,\cot\frac{\tau_b}{2}\Bigr) \in N_{(x(\tau_b),y(\tau_b))}M_b. \tag{2.8.24}
\]

Since

\[
\cot\frac{\tau_b}{2} = \frac{\sin\tau_b}{1 - \cos\tau_b} = \frac{\dot{y}(\tau_b)}{\dot{x}(\tau_b)}, \tag{2.8.25}
\]

the vector in $(2.8.23)_2$ and in (2.8.24) is tangent to the cycloid at its endpoint $(x(\tau_b),y(\tau_b))$ on $M_b$. Geometrically, (2.8.24) means that the cycloid is orthogonal to $M_b$.

The remarkable property is that by $(2.8.23)_2$, the tangent vector in the endpoint is also orthogonal to $M_a$ in its starting point $(x(\tau_a),y(\tau_a))$, see Figure 2.17.

The limit $v(t_a) \searrow 0$ yields $k = 0$, $\tau_a = 0$, and finally $c = 0$ in (2.8.22). The cycloid has then a vertical tangent in its starting point on $M_a$, as expected.

We determine for special curves $M_a$ and $M_b$ the starting and endpoints of the minimizing cycloid. We choose




Fig. 2.17 Conditions on a Brachistochrone Connecting Two Manifolds

\[
M_a = \{(x,y) \mid x^2 + y^2 = 1\}, \quad M_b = \{(x,y) \mid (x - x_0)^2 + (y - 1)^2 = 1\}, \tag{2.8.26}
\]

which are two circles with centers $(0,0)$ and $(x_0,1)$ and radii 1. At the points $(x(0),y(0)) \in M_a$ and $(x(\tau_b),y(\tau_b)) \in M_b$, the manifolds $M_a$ and $M_b$ have the same normal vectors, which are radial vectors in this case, cf. $(2.8.23)_2$, (2.8.24):

\[
\begin{aligned}
& (x(0),y(0)) = (\cos\alpha,\sin\alpha), \\
& (x(\tau_b),y(\tau_b)) = (x_0,1) - (\cos\alpha,\sin\alpha).
\end{aligned}
\tag{2.8.27}
\]

Substitution into (2.8.22) (where $c = k = 0$) gives

\[
\begin{aligned}
& r(\tau_b - \sin\tau_b) = x_0 - 2\cos\alpha, \\
& r(1 - \cos\tau_b) = 1 - 2\sin\alpha \quad\text{and} \\
& f(\tau_b) = \frac{\tau_b - \sin\tau_b}{1 - \cos\tau_b} = \frac{x_0 - 2\cos\alpha}{1 - 2\sin\alpha}, \quad\text{see Fig. 1.14.}
\end{aligned}
\tag{2.8.28}
\]

From (2.8.24) and (2.8.25), the tangent at the end is normal to the circle, which means

\[
\cot\frac{\tau_b}{2} = \tan\alpha. \tag{2.8.29}
\]

Finally, the equation

\[
r = \frac{1 - 2\sin\alpha}{1 - \cos\tau_b} \tag{2.8.30}
\]

gives the parameter $r$, where $\tau_b$ and $\alpha$ are determined by $(2.8.28)_3$ and (2.8.29). In Figure 2.18, we sketch three typical cases.
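The determination of $\tau_b$, $\alpha$, and $r$ can be sketched numerically. The block below is an illustration under stated assumptions, not the author's procedure: it picks the branch $\alpha = \pi/2 - \tau_b/2$ of (2.8.29), searches for $\tau_b$ in $(2\pi/3,\pi)$ (where $1 - 2\sin\alpha > 0$ on this branch, so that $r > 0$), uses plain bisection, and takes $x_0 = 3$ as sample data. It does not check that the resulting cycloid avoids the interior of the circles or that it is a minimizer.

```python
import math

def residual(tau_b, x0):
    """Cross-multiplied form of (2.8.28)_3 with alpha = pi/2 - tau_b/2 from (2.8.29)."""
    alpha = 0.5 * math.pi - 0.5 * tau_b
    lhs = (tau_b - math.sin(tau_b)) * (1.0 - 2.0 * math.sin(alpha))
    rhs = (1.0 - math.cos(tau_b)) * (x0 - 2.0 * math.cos(alpha))
    return lhs - rhs

def solve_cycloid(x0, lo=2.0 * math.pi / 3.0, hi=math.pi, iterations=200):
    """Bisection for tau_b on [lo, hi]; needs a sign change on the bracket."""
    f_lo, f_hi = residual(lo, x0), residual(hi, x0)
    if f_lo * f_hi > 0.0:
        raise ValueError("no sign change on the bracket for this x0")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid, x0)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    tau_b = 0.5 * (lo + hi)
    alpha = 0.5 * math.pi - 0.5 * tau_b
    r = (1.0 - 2.0 * math.sin(alpha)) / (1.0 - math.cos(tau_b))  # (2.8.30)
    return tau_b, alpha, r

x0 = 3.0  # sample horizontal offset of the second circle (assumption)
tau_b, alpha, r = solve_cycloid(x0)

# Residuals of (2.8.28)_1, (2.8.28)_2 and (2.8.29) for the computed triple.
res_x = r * (tau_b - math.sin(tau_b)) - (x0 - 2.0 * math.cos(alpha))
res_y = r * (1.0 - math.cos(tau_b)) - (1.0 - 2.0 * math.sin(alpha))
res_angle = 1.0 / math.tan(0.5 * tau_b) - math.tan(alpha)
print(tau_b, alpha, r, res_x, res_y, res_angle)
```

For other values of $x_0$ the bracket may contain no root on this branch; the function then raises an error instead of returning a spurious result.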

Proposition 2.8.1 applied to nonparametric functionals

\[
J(y) = \int_a^b F(x,y,y')\,dx, \tag{2.8.31}
\]

for functions $y \in (C^{1,pw}[a,b])^n$ with boundary points on manifolds $M_a$ and/or $M_b$ in $\mathbb{R}^n$, $n \geq 2$, shows that local minimizers satisfy the following transversalities:

\[
\begin{aligned}
& F_{y'}(a,y(a),y'(a)) \in N_{y(a)}M_a \quad\text{and/or} \\
& F_{y'}(b,y(b),y'(b)) \in N_{y(b)}M_b,
\end{aligned}
\tag{2.8.32}
\]

where $F_{y'} = (F_{y_1'},\ldots,F_{y_n'})$. We mention that the natural boundary conditions on the boundaries of a manifold $M$ in $\mathbb{R}^n$ given in Proposition 2.5.5 are precisely the transversality conditions (2.8.32).

Another problem is the following (we confine ourselves to the case n= 1).

Definition 2.8.2. For the functional

\[
J(y) = \int_{x_a}^{x_b} F(x,y,y')\,dx, \tag{2.8.33}
\]

functions $y \in C^{1,pw}[x_a,x_b]$ are admitted, whose boundaries $(x_a,y(x_a))$ and/or $(x_b,y(x_b))$ are on one-dimensional manifolds (curves) $M_a$ and/or $M_b$, respectively. $M_a$ and $M_b$ are disjoint curves in $\mathbb{R}^2$ and represented by

\[
\begin{aligned}
& M = \{(x,y) \in \mathbb{R}^2 \mid \Psi(x,y) = 0\}, \quad\text{or in particular,} \\
& M_\psi = \{(x,y) \in \mathbb{R}^2 \mid y - \psi(x) = 0\}.
\end{aligned}
\tag{2.8.34}
\]

The functions $\Psi : \mathbb{R}^2 \to \mathbb{R}$ and $\psi : \mathbb{R} \to \mathbb{R}$ are twice continuously (partially) differentiable, and the Jacobian matrix $D\Psi(x)$ has maximal rank 1 for all $x \in M$. In the case of $(2.8.34)_2$, the set $M_\psi$ is the graph of $\psi$, and $M$ and $M_\psi$ are continuously differentiable manifolds of dimension 1, i.e., curves, cf. the Appendix. It is also possible that one boundary is prescribed by $x_a = a$ and $y(a) = A$ and/or $x_b = b$ and $y(b) = B$.

Since the intervals $[x_a,x_b]$, where admitted functions are defined, are variable, the distance between two admitted functions is not defined as usual. We reparametrize the graph of a function $y \in C^{1,pw}[x_a,x_b]$ by a parameter on a fixed interval $[\tau_a,\tau_b]$, cf. (1.11.1), (1.11.2):

\[
\{(x,y(x)) \mid x \in [x_a,x_b]\} = \{(\tilde{x}(\tau),\tilde{y}(\tau)) \mid \tau \in [\tau_a,\tau_b]\}, \tag{2.8.35}
\]

and we obtain a parametric functional




Fig. 2.18 Brachistochrones Connecting Two Circles



Fig. 2.19 Admitted Curves Connecting a Manifold and a Point

\[
J(y) = \int_{x_a}^{x_b} F(x,y,y')\,dx = \int_{\tau_a}^{\tau_b} F\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)\dot{\tilde{x}}\,d\tau = J(\tilde{x},\tilde{y}), \tag{2.8.36}
\]

cf. (1.11.4). For the parametric functional, curves $(\tilde{x},\tilde{y}) \in (C^1[\tau_a,\tau_b]\times C^{1,pw}[\tau_a,\tau_b]) \cap \{\dot{\tilde{x}}(\tau) > 0 \text{ for } \tau \in [\tau_a,\tau_b]\}$ are admitted, which are graphs of functions $y \in C^{1,pw}[x_a,x_b]$, cf. Exercise 1.10.1. The boundaries of a function $y$ are on manifolds $M_a$ and/or $M_b$ according to Definition 2.8.2 if and only if the endpoints of the curve (2.8.35) are on $M_a$ and/or $M_b$.

As in Proposition 1.11.1, it follows that an admitted global minimizer $y$ of (2.8.33) yields, via a parametrization (2.8.35), an admitted local minimizer of the parametric functional (2.8.36). Like the graph of $y$, the admitted perturbations (1.11.8) of the curve must have endpoints on $M_a$ and/or $M_b$.

Applying the Propositions 2.8.1 and 2.8.2, the invariance of the Lagrange function $\Phi(\tilde{x},\tilde{y},\dot{\tilde{x}},\dot{\tilde{y}}) = F(\tilde{x},\tilde{y},\dot{\tilde{y}}/\dot{\tilde{x}})\dot{\tilde{x}}$ in the sense of Definition 1.10.2 gives necessary conditions on the global minimizer.

Proposition 2.8.4. Let $y \in C^{1,pw}[x_a,x_b] \cap \{(x_a,y(x_a)) \in M_a \text{ and/or } (x_b,y(x_b)) \in M_b\}$ be a global minimizer of the functional (2.8.33) whose Lagrange function $F : \mathbb{R}^3 \to \mathbb{R}$ is continuously totally differentiable. Then, the following hold:

\[
\begin{aligned}
& F_{y'}(\cdot,y,y') \in C^{1,pw}[x_a,x_b] \subset C[x_a,x_b], \\
& \frac{d}{dx}F_{y'}(\cdot,y,y') = F_y(\cdot,y,y') \quad\text{piecewise on } [x_a,x_b], \\
& F(\cdot,y,y') - y'F_{y'}(\cdot,y,y') \in C[x_a,x_b], \\
& \bigl((F - y'F_{y'})(x_a,y(x_a),y'(x_a)),\ F_{y'}(x_a,y(x_a),y'(x_a))\bigr) \in N_{(x_a,y(x_a))}M_a, \\
& \text{and in case } (2.8.34)_2, \\
& F(x_a,y(x_a),y'(x_a)) + (\psi'(x_a) - y'(x_a))F_{y'}(x_a,y(x_a),y'(x_a)) = 0 \\
& \text{and/or an analog transversality at } (x_b,y(x_b)).
\end{aligned}
\tag{2.8.37}
\]

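Proof. As mentioned before, we apply Propositions 2.8.1 and 2.8.2 to the parametric functional (2.8.36). The derivatives of the Lagrange function read

\[
\begin{aligned}
\Phi_{\dot{\tilde{x}}}(\tilde{x},\tilde{y},\dot{\tilde{x}},\dot{\tilde{y}}) &= F\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr) - F_{y'}\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}, \\
\Phi_{\dot{\tilde{y}}}(\tilde{x},\tilde{y},\dot{\tilde{x}},\dot{\tilde{y}}) &= F_{y'}\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr), \\
\Phi_{\tilde{x}}(\tilde{x},\tilde{y},\dot{\tilde{x}},\dot{\tilde{y}}) &= F_x\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)\dot{\tilde{x}}, \\
\Phi_{\tilde{y}}(\tilde{x},\tilde{y},\dot{\tilde{x}},\dot{\tilde{y}}) &= F_y\Bigl(\tilde{x},\tilde{y},\frac{\dot{\tilde{y}}}{\dot{\tilde{x}}}\Bigr)\dot{\tilde{x}}.
\end{aligned}
\tag{2.8.38}
\]

Due to the invariance of $\Phi$, the regularity and the Euler-Lagrange equation (2.8.3) hold for any parametrization. In particular, we have

\[
\tilde{x} = x, \quad \dot{\tilde{x}} = 1, \quad \tilde{y} = y, \quad \dot{\tilde{y}} = y', \quad \tau_a = x_a, \quad \tau_b = x_b, \tag{2.8.39}
\]

whence $(2.8.37)_1$–$(2.8.37)_3$, cf. also Proposition 1.10.4. The transversality $(2.8.4)_1$ implies $(2.8.37)_4$, and the last statement follows from the fact that a vector tangent to $M_\psi$ in $(x_a,y(x_a)) = (x_a,\psi(x_a))$ is given by $(1,\psi'(x_a))$. □

In $(2.8.37)_1$ and $(2.8.37)_3$, we recognize the Weierstraß-Erdmann corner conditions; the transversalities $(2.8.37)_4$ and $(2.8.37)_7$ are called free transversalities.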
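As an illustration that is not part of the original text, take the arc length integrand $F(x,y,y') = \sqrt{1 + (y')^2}$ and a boundary on a graph $M_\psi$. The free transversality in (2.8.37) for the case $(2.8.34)_2$ then reduces to an orthogonality condition at $x_a$:

```latex
\[
F + (\psi' - y')F_{y'}
= \sqrt{1+(y')^2} + (\psi' - y')\frac{y'}{\sqrt{1+(y')^2}}
= \frac{1 + \psi' y'}{\sqrt{1+(y')^2}} = 0
\quad\Longleftrightarrow\quad
y'(x_a)\,\psi'(x_a) = -1 .
\]
```

Hence the tangent $(1,y'(x_a))$ of the extremal is orthogonal to the tangent $(1,\psi'(x_a))$ of $M_\psi$, in agreement with the interpretation of (2.8.12) for unit density.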

Exercises

2.8.1. Let $x \in (C^{1,pw}[t_a,t_b])^n$ be a local minimizer of

\[
J(x) = \int_{t_a}^{t_b} \Phi(x,\dot{x})\,dt,
\]

subject to isoperimetric constraints

\[
K_i(x) = \int_{t_a}^{t_b} \Psi_i(x,\dot{x})\,dt = c_i, \quad i = 1,\ldots,m,
\]

whose boundary fulfills $x(t_a) \in M_a = \{x \in \mathbb{R}^n \mid \Psi(x) = 0\}$. Find and prove its transversality at its boundary $x(t_a)$.

We assume that $\Psi : \mathbb{R}^n \to \mathbb{R}^{m_a}$, where $0 < m_a < n$, is twice continuously totally differentiable and that the Lagrange functions $\Phi,\Psi_i : \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ are continuously totally differentiable. We assume also that the Jacobian matrix $D\Psi(x)$ has maximal rank $m_a$ for all $x \in M_a$ and that $x$ is not critical for the constraints in the sense that $\delta K(x) : (C_0^{1,pw}[t_a,t_b])^n \to \mathbb{R}^m$ is surjective.

2.8.2. Let $x \in (C^2[t_a,t_b])^n$ be a local minimizer of

\[
J(x) = \int_{t_a}^{t_b} \Phi(x,\dot{x})\,dt,
\]

subject to the holonomic constraint

\[
\Psi(x) = 0 \quad\text{or}\quad x \in M = \{x \in \mathbb{R}^n \mid \Psi(x) = 0\},
\]

whose boundary fulfills $x(t_a) \in M_a = \{x \in M \mid \Psi_a(x) = 0\} \subset M$. Find and prove its transversality in its boundary $x(t_a)$.

We assume that $\Psi : \mathbb{R}^n \to \mathbb{R}^m$ and $\Psi_a : \mathbb{R}^n \to \mathbb{R}^{m_a}$, where $0 < m + m_a < n$, are three times and the Lagrange function $\Phi : \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ is twice continuously totally differentiable. We assume also that the Jacobian matrices of $\Psi : \mathbb{R}^n \to \mathbb{R}^m$ and $(\Psi,\Psi_a) : \mathbb{R}^n \to \mathbb{R}^{m+m_a}$ have maximal rank $m$ and $m + m_a$ in all $x \in M$ and in all $x \in M_a$, respectively. Then, $M_a \subset M$ is a $(n - (m + m_a))$-dimensional submanifold of the $(n - m)$-dimensional manifold $M$.

2.8.3. Determine (xb,y(xb)), xb > 1, on the graph of

ψ(x) =2x2 −3 ,

such that

\[
J(y) = \int_1^{x_b} x^3(y')^2\,dx, \quad y(1) = 0,
\]

is extremal. (Only the necessary conditions on extremals have to be verified.)

2.8.4. Determine an extremal of

\[
J(y) = \int_0^{x_b} y^2 + (y')^2\,dx, \quad y(0) = 1,
\]

whose boundary fulfills $(x_b,y(x_b)) = (x_b,2)$, $x_b > 0$.

2.8.5. Determine an extremal $y \in C^{1,pw}[0,1]$ of

\[
J(y) = \int_0^1 \tfrac{1}{2}(y')^2 + yy' + y' + y\,dx,
\]

whose boundaries are on $M_0 = \{(0,y) \mid y \in \mathbb{R}\}$ and $M_1 = \{(1,y) \mid y \in \mathbb{R}\}$.

2.8.6. Check whether the functions determined in Exercises 2.8.3–2.8.5 are local or global minimizers or maximizers among admitted functions of the respective functionals.

Hint: Observe that the boundary $(\tilde{x}_b,y(\tilde{x}_b) + h(\tilde{x}_b))$ of admitted perturbations $y + h$ is necessarily on the respective given manifold, where $x_b$ and $\tilde{x}_b$ might be different.

2.9 Emmy Noether’s Theorem

“Invariants” are important for mathematics as well as theoretical physics, meaning that certain quantities or physical laws are invariant under special transformations. These are, in particular, actions of the “orthogonal group,” or a special symmetry group, under which a given physical law is invariant, and possibly translations. The invariance restricts functions or functionals and often simplifies their analysis. The invariance of differential equations, like the Euler-Lagrange equation, means that one solution defines a family of solutions, thus allowing us, as we shall see, to establish conservation laws. These, in turn, help the mathematical analysis and provide physical insights.

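Definition 2.9.1. A continuously totally differentiable Lagrange function $\Phi : \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ of a parametric functional

\[
J(x) = \int_{t_a}^{t_b} \Phi(x,\dot{x})\,dt, \tag{2.9.1}
\]

is invariant under a family of local diffeomorphisms

\[
h^s : \mathbb{R}^n \to \mathbb{R}^n, \quad s \in (-\delta,\delta), \tag{2.9.2}
\]

if

\[
\Phi(h^s(x),Dh^s(x)\dot{x}) = \Phi(x,\dot{x}) \quad\text{for all } (x,\dot{x}) \in \mathbb{R}^n\times\mathbb{R}^n \tag{2.9.3}
\]

and for all $s \in (-\delta,\delta)$. The mappings $h^s$ are twice continuously partially differentiable with respect to the parameters $s \in (-\delta,\delta)$ and $x \in \mathbb{R}^n$. The Jacobian matrices

\[
Dh^s(x) \in \mathbb{R}^{n\times n} \ \text{are regular for all } s \in (-\delta,\delta) \text{ and all } x \in \mathbb{R}^n. \tag{2.9.4}
\]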

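Inserting a curve $x \in (C^1[t_a,t_b])^n$ into $h^s$, we then have $\frac{d}{dt}h^s(x) = Dh^s(x)\dot{x}$, and hence, the invariance (2.9.3) implies

\[
\Phi\Bigl(h^s(x),\frac{d}{dt}h^s(x)\Bigr) = \Phi(x,\dot{x}) \quad\text{for all } t \in [t_a,t_b] \tag{2.9.5}
\]

and for all $s \in (-\delta,\delta)$. Therefore, a minimizing curve $x$ defines a family $h^s(x)$ of minimizing curves of the functional (2.9.1). Before profiting from that fact, we give the following formulas obtained by differentiation of (2.9.3):

\[
\begin{aligned}
& \Phi_x(h^s(x),Dh^s(x)\dot{x})Dh^s(x) + \Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})D^2h^s(x)\dot{x} = \Phi_x(x,\dot{x}), \\
& \Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})Dh^s(x) = \Phi_{\dot{x}}(x,\dot{x}).
\end{aligned}
\tag{2.9.6}
\]

Here, we employ the notation

\[
\begin{aligned}
& \Phi_x = (\Phi_{x_1},\ldots,\Phi_{x_n}), \quad \Phi_{\dot{x}} = (\Phi_{\dot{x}_1},\ldots,\Phi_{\dot{x}_n}), \\
& Dh^s(x) = \Bigl(\frac{\partial h_i^s}{\partial x_j}(x)\Bigr)_{\substack{i=1,\ldots,n \\ j=1,\ldots,n}}, \quad
D^2h^s(x)\dot{x} = \Bigl(\sum_{k=1}^n \frac{\partial^2 h_i^s}{\partial x_j\partial x_k}(x)\dot{x}_k\Bigr)_{\substack{i=1,\ldots,n \\ j=1,\ldots,n}},
\end{aligned}
\tag{2.9.7}
\]

and that the product of a row vector and a matrix is again a row vector. For a curve $x \in (C^1[t_a,t_b])^n$, we obtain

\[
\frac{d}{dt}h^s(x) = Dh^s(x)\dot{x} \quad\text{and}\quad \frac{d}{dt}Dh^s(x) = D^2h^s(x)\dot{x} \quad\text{for } t \in [t_a,t_b]. \tag{2.9.8}
\]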

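Proposition 2.9.1. Assume that $\Phi : \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ is continuously totally differentiable and invariant under a family of local diffeomorphisms $h^s : \mathbb{R}^n \to \mathbb{R}^n$, $s \in (-\delta,\delta)$, in the sense of Definition 2.9.1. If $x \in (C^1[t_a,t_b])^n$ is a solution of the Euler-Lagrange system, i.e.,

\[
\frac{d}{dt}\Phi_{\dot{x}}(x,\dot{x}) = \Phi_x(x,\dot{x}) \quad\text{on } [t_a,t_b], \tag{2.9.9}
\]

then $h^s(x) \in (C^1[t_a,t_b])^n$ is a solution as well, i.e.,

\[
\frac{d}{dt}\Phi_{\dot{x}}\Bigl(h^s(x),\frac{d}{dt}h^s(x)\Bigr) = \Phi_x\Bigl(h^s(x),\frac{d}{dt}h^s(x)\Bigr) \quad\text{on } [t_a,t_b]. \tag{2.9.10}
\]

Proof. By (2.9.6) and (2.9.8), we obtain

\[
\begin{aligned}
& \frac{d}{dt}\Phi_{\dot{x}}(x,\dot{x}) - \Phi_x(x,\dot{x}) = 0 \\
&= \frac{d}{dt}\bigl(\Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})Dh^s(x)\bigr) \\
&\quad - \Phi_x(h^s(x),Dh^s(x)\dot{x})Dh^s(x) - \Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})D^2h^s(x)\dot{x} \\
&= \frac{d}{dt}\Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})\,Dh^s(x) + \Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})D^2h^s(x)\dot{x} \\
&\quad - \Phi_x(h^s(x),Dh^s(x)\dot{x})Dh^s(x) - \Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})D^2h^s(x)\dot{x} \\
&= \Bigl(\frac{d}{dt}\Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x}) - \Phi_x(h^s(x),Dh^s(x)\dot{x})\Bigr)Dh^s(x) \\
&= \Bigl(\frac{d}{dt}\Phi_{\dot{x}}\Bigl(h^s(x),\frac{d}{dt}h^s(x)\Bigr) - \Phi_x\Bigl(h^s(x),\frac{d}{dt}h^s(x)\Bigr)\Bigr)Dh^s(x),
\end{aligned}
\tag{2.9.11}
\]

which implies the claim (2.9.10), given that $Dh^s(x)$ is assumed to be regular for all $t \in [t_a,t_b]$, cf. (2.9.4). □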

In 1918, E. Noether (1882–1935) proved the following conservation law. It is important in mathematics as well as theoretical physics.

Proposition 2.9.2. Assume that $\Phi : \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ is continuously totally differentiable and invariant under a family of local diffeomorphisms $h^s : \mathbb{R}^n \to \mathbb{R}^n$, $s \in (-\delta,\delta)$, in the sense of Definition 2.9.1. If $x \in (C^1[t_a,t_b])^n$ is a solution of the Euler-Lagrange system of the functional (2.9.1), i.e.,

\[
\frac{d}{dt}\Phi_{\dot{x}}(x,\dot{x}) = \Phi_x(x,\dot{x}) \quad\text{on } [t_a,t_b], \tag{2.9.12}
\]

then for each $s \in (-\delta,\delta)$

\[
\Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})\frac{\partial}{\partial s}h^s(x) = \text{const.} \quad\text{for } t \in [t_a,t_b]. \tag{2.9.13}
\]

In the special case $h^0(x) = x$ and $Dh^0(x) = E$, (2.9.13) yields

\[
\Phi_{\dot{x}}(x,\dot{x})\frac{\partial}{\partial s}h^s(x)\Big|_{s=0} = \text{const.} \quad\text{for } t \in [t_a,t_b]. \tag{2.9.14}
\]

Here, the product of the vectors $\Phi_{\dot{x}}$ and $\frac{\partial}{\partial s}h^s$ is the Euclidean scalar product in $\mathbb{R}^n$. Different from our usual notation for the scalar product, we adopt here the notation of physicists.

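Proof. By differentiation of (2.9.3) with respect to $s \in (-\delta,\delta)$, we obtain

\[
\Phi_x(h^s(x),Dh^s(x)\dot{x})\frac{\partial}{\partial s}h^s(x) + \Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})\frac{\partial}{\partial s}Dh^s(x)\dot{x} = 0. \tag{2.9.15}
\]

Using

\[
\frac{\partial}{\partial s}Dh^s(x)\dot{x} = \frac{\partial}{\partial s}\frac{d}{dt}h^s(x) = \frac{d}{dt}\frac{\partial}{\partial s}h^s(x) \tag{2.9.16}
\]

and the Euler-Lagrange system (2.9.10), differentiation by the product rule yields for all $t \in [t_a,t_b]$:

\[
\begin{aligned}
& \frac{d}{dt}\Bigl(\Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})\frac{\partial}{\partial s}h^s(x)\Bigr) \\
&= \frac{d}{dt}\Phi_{\dot{x}}\Bigl(h^s(x),\frac{d}{dt}h^s(x)\Bigr)\frac{\partial}{\partial s}h^s(x) + \Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})\frac{d}{dt}\frac{\partial}{\partial s}h^s(x) \\
&= \Phi_x\Bigl(h^s(x),\frac{d}{dt}h^s(x)\Bigr)\frac{\partial}{\partial s}h^s(x) + \Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})\frac{\partial}{\partial s}Dh^s(x)\dot{x} \\
&= \Phi_x(h^s(x),Dh^s(x)\dot{x})\frac{\partial}{\partial s}h^s(x) + \Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})\frac{\partial}{\partial s}Dh^s(x)\dot{x} = 0
\end{aligned}
\tag{2.9.17}
\]

by (2.9.15). □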

We give some applications of Noether’s Theorem to Lagrangian mechanics.

1. Here and in the next two applications, we return to the mechanical model introduced in (2.5.28), (2.5.29), (2.5.31): $N$ point masses $m_1,\ldots,m_N$ in $\mathbb{R}^3$ have the coordinates $x = (x_1,y_1,z_1,\ldots,x_N,y_N,z_N) \in \mathbb{R}^{3N}$. The Lagrangian is the free energy

\[
L(x,\dot{x}) = T(\dot{x}) - V(x) \quad\text{where}\quad T(\dot{x}) = \sum_{k=1}^N \tfrac{1}{2}m_k(\dot{x}_k^2 + \dot{y}_k^2 + \dot{z}_k^2). \tag{2.9.18}
\]

We assume invariance under a simultaneous shift of all point masses in the direction of one axis, i.e.,

\[
\begin{aligned}
& h^s : \mathbb{R}^{3N} \to \mathbb{R}^{3N} \ \text{is defined by} \\
& h^s(x_1,y_1,z_1,\ldots,x_N,y_N,z_N) = (x_1 + s,y_1,z_1,\ldots,x_N + s,y_N,z_N).
\end{aligned}
\tag{2.9.19}
\]

The invariance of the kinetic energy is obvious, and we assume only the invariance of the potential energy, which means

\[
V(h^s(x)) = V(x). \tag{2.9.20}
\]

Then, (2.9.14) of Proposition 2.9.2 yields

\[
L_{\dot{x}}(x,\dot{x})\frac{\partial}{\partial s}h^s(x)\Big|_{s=0} = \sum_{k=1}^N m_k\dot{x}_k = \text{const.} \tag{2.9.21}
\]

along each solution $x$ of the Euler-Lagrange system of the action (2.5.31). These curves $x$ are solutions of the equations of motion (the Euler-Lagrange system)

\[
m_k\ddot{x}_k = -V_{x_k}(x), \quad m_k\ddot{y}_k = -V_{y_k}(x), \quad m_k\ddot{z}_k = -V_{z_k}(x) \quad\text{on } [t_a,t_b], \quad k = 1,\ldots,N. \tag{2.9.22}
\]

If the potential energy is invariant under a simultaneous shift in the direction of one axis, the conservation law (2.9.21) means that the combined total momentum of all point masses in that direction is conserved.

2. Assume that the potential energy of the $N$ point masses $m_1,\ldots,m_N$ depends only on the differences $(x_i,y_i,z_i) - (x_k,y_k,z_k)$ of the coordinates of $m_i$ and of $m_k$, $i,k = 1,\ldots,N$. Then, $V$ is invariant under simultaneous shifts in all directions of $\mathbb{R}^3$, and as seen in Example 1, the total momentum of all point masses is constant in all directions:

\[
\sum_{k=1}^N m_k(\dot{x}_k,\dot{y}_k,\dot{z}_k) = a \quad\text{or}\quad \sum_{k=1}^N m_k(x_k,y_k,z_k) = at + b \quad\text{where } a,b \in \mathbb{R}^3. \tag{2.9.23}
\]

The coordinates of the barycenter of all $N$ point masses are $\sum_{k=1}^N m_k(x_k,y_k,z_k)/\sum_{k=1}^N m_k$, and (2.9.23) means that the barycenter moves with constant speed in one direction of the three-dimensional space.

3. Assume that the potential energy of the $N$ point masses depends only on the distances $\sqrt{(x_i - x_k)^2 + (y_i - y_k)^2 + (z_i - z_k)^2}$ of $m_i$ and $m_k$, $i,k = 1,\ldots,N$, then the total linear momentum is constant in all directions.

However, there is still another conservation law. The free energy $L(x,\dot{x}) = T(\dot{x}) - V(x)$ is also invariant under simultaneous rotations of all point masses. Take for instance the rotation about the $z$-axis by an angle $s$

\[
R^s = \begin{pmatrix} \cos s & -\sin s & 0 \\ \sin s & \cos s & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
h^s(x) = (R^s(x_1,y_1,z_1),\ldots,R^s(x_N,y_N,z_N)). \tag{2.9.24}
\]

Then, linearity and orthogonality of $h^s$ yield

\[
\begin{aligned}
& Dh^s(x)\dot{x} = (R^s(\dot{x}_1,\dot{y}_1,\dot{z}_1),\ldots,R^s(\dot{x}_N,\dot{y}_N,\dot{z}_N)), \\
& T(Dh^s(x)\dot{x}) = T(\dot{x}), \quad\text{since} \\
& \|R^s(\dot{x}_k,\dot{y}_k,\dot{z}_k)\|^2 = \|(\dot{x}_k,\dot{y}_k,\dot{z}_k)\|^2 = \dot{x}_k^2 + \dot{y}_k^2 + \dot{z}_k^2
\end{aligned}
\tag{2.9.25}
\]

for $k = 1,\ldots,N$. Furthermore, the orthogonality of $R^s$ implies that simultaneous rotations do not change the distances of the point masses. Hence, they leave the potential energy invariant:

\[
V(h^s(x)) = V(x). \tag{2.9.26}
\]



Since $L(x,\dot{x}) = T(\dot{x}) - V(x)$ is invariant under simultaneous rotations (2.9.24), Noether’s Theorem (Proposition 2.9.2) yields:

\[
\begin{aligned}
& \frac{\partial}{\partial s}R^s\Big|_{s=0} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \\
& L_{\dot{x}}(x,\dot{x})\frac{\partial}{\partial s}h^s(x)\Big|_{s=0} = \sum_{k=1}^N m_k(\dot{y}_kx_k - \dot{x}_ky_k) \\
& \quad = \sum_{k=1}^N \bigl((x_k,y_k,z_k)\times m_k(\dot{x}_k,\dot{y}_k,\dot{z}_k),e_3\bigr) = \text{const.},
\end{aligned}
\tag{2.9.27}
\]

where “$\times$” denotes the vector product, $(\ ,\ )$ denotes the Euclidean scalar product in $\mathbb{R}^3$, and $e_3 = (0,0,1)$. The expression $(2.9.27)_3$ describes the total angular momentum of the point masses about the $z$-axis.

Since the free energy $L(x,\dot{x}) = T(\dot{x}) - V(x)$ is (by assumption) invariant under simultaneous rotations about all axes, the total angular momentum is constant about all axes, i.e.,

\[
\sum_{k=1}^N (x_k,y_k,z_k)\times m_k(\dot{x}_k,\dot{y}_k,\dot{z}_k) = c \in \mathbb{R}^3. \tag{2.9.28}
\]

Finally, the conservation of the total energy of one point mass, cf. Exercise 1.10.4, holds for $N$ point masses as well.

We summarize: If the potential energy $V$ of the $N$ point masses depends only on their distances, the following conservation laws hold for any motion governed by (2.9.22):

\[
\begin{aligned}
& \sum_{k=1}^N m_k(\dot{x}_k,\dot{y}_k,\dot{z}_k) = a && \text{(conservation of linear momentum),} \\
& \sum_{k=1}^N (x_k,y_k,z_k)\times m_k(\dot{x}_k,\dot{y}_k,\dot{z}_k) = c && \text{(conservation of angular momentum),} \\
& \sum_{k=1}^N \tfrac{1}{2}m_k(\dot{x}_k^2 + \dot{y}_k^2 + \dot{z}_k^2) + V(x_1,y_1,z_1,\ldots,x_N,y_N,z_N) = E && \text{(conservation of total energy).}
\end{aligned}
\tag{2.9.29}
\]
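The first two conservation laws in (2.9.29) can be observed in a small simulation. The following is a minimal sketch under stated assumptions, not a method from the text: three point masses with a pairwise gravitational potential, a gravitational constant set to 1 in arbitrary units, hypothetical initial data, and the velocity Verlet scheme as integrator. For pairwise central forces this scheme preserves total linear and angular momentum up to rounding; the energy is only approximately preserved and is therefore not checked.

```python
import math

GAMMA = 1.0  # gravitational constant in arbitrary units (assumption)

def accelerations(masses, pos):
    """Accelerations for V = -sum_{i<k} GAMMA * m_i * m_k / |x_i - x_k|."""
    acc = [[0.0, 0.0, 0.0] for _ in masses]
    for i in range(len(masses)):
        for k in range(i + 1, len(masses)):
            d = [pos[i][c] - pos[k][c] for c in range(3)]
            dist = math.sqrt(sum(v * v for v in d))
            f = GAMMA * masses[i] * masses[k] / dist ** 3
            for c in range(3):
                acc[i][c] -= f * d[c] / masses[i]  # force on m_i points toward m_k
                acc[k][c] += f * d[c] / masses[k]  # equal and opposite force on m_k
    return acc

def verlet_step(masses, pos, vel, dt):
    """One velocity Verlet step for the equations of motion (2.9.22)."""
    n = len(masses)
    acc = accelerations(masses, pos)
    half = [[vel[i][c] + 0.5 * dt * acc[i][c] for c in range(3)] for i in range(n)]
    new_pos = [[pos[i][c] + dt * half[i][c] for c in range(3)] for i in range(n)]
    new_acc = accelerations(masses, new_pos)
    new_vel = [[half[i][c] + 0.5 * dt * new_acc[i][c] for c in range(3)] for i in range(n)]
    return new_pos, new_vel

def linear_momentum(masses, vel):
    return [sum(m * v[c] for m, v in zip(masses, vel)) for c in range(3)]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]

def angular_momentum(masses, pos, vel):
    total = [0.0, 0.0, 0.0]
    for m, x, v in zip(masses, pos, vel):
        contribution = cross(x, [m * vc for vc in v])
        total = [total[c] + contribution[c] for c in range(3)]
    return total

# Hypothetical initial data for three point masses.
masses = [1.0, 0.5, 0.25]
pos = [[1.0, 0.0, 0.0], [-1.0, 0.5, 0.0], [0.0, -1.0, 0.8]]
vel = [[0.0, 0.3, 0.1], [0.2, -0.4, 0.0], [-0.3, 0.1, 0.2]]

p_start = linear_momentum(masses, vel)
l_start = angular_momentum(masses, pos, vel)
for _ in range(1000):
    pos, vel = verlet_step(masses, pos, vel, 1e-3)
p_end = linear_momentum(masses, vel)
l_end = angular_momentum(masses, pos, vel)

drift_p = max(abs(a - b) for a, b in zip(p_start, p_end))
drift_l = max(abs(a - b) for a, b in zip(l_start, l_end))
print(drift_p, drift_l)
```

Both drifts stay at the level of floating-point rounding, which reflects the translation and rotation invariance of the potential rather than the accuracy of the time step.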

A prominent and interesting example is the solar system, consisting of the sun and its planets considered as point masses. By Newton’s law of gravitation, the force between two point masses $m_i$ and $m_k$ depends only on the masses and their mutual distance. If $N$ masses are involved, the system gives rise to a so-called $N$-body problem. For $N \geq 3$, the theoretical analysis of their motion is an extremely hard problem, even when exploiting the conservation laws (2.9.29). Up to the present time, there is no proof that the motion of the solar system is stable. This was the prize question asked by the Swedish king Oskar II in the year 1885. The prize was won by H. Poincaré although he did not give a definite answer. But he opened the door to a promising mathematical theory.

The earth with the moon, the International Space Station (ISS), and countless satellites form a similar system, where, however, the influence of the sun and the other planets cannot be neglected. The success and precision of all space flights rely on accurate numerical simulations.

In the next paragraph, we study the two-body problem.

Exercise

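2.9.1. Assume that a continuously partially differentiable Lagrange function $\Phi : \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}$ is invariant under a family of local diffeomorphisms $h^s : \mathbb{R}^n \to \mathbb{R}^n$, $s \in (-\delta,\delta)$, in the sense of Definition 2.9.1. Prove the following conservation law:

Let $x \in (C^2[t_a,t_b])^n$ be a local minimizer of the functional

\[
J(x) = \int_{t_a}^{t_b} \Phi(x,\dot{x})\,dt,
\]

subject to a holonomic constraint

\[
\Psi(x) = 0, \quad\text{where } \Psi : \mathbb{R}^n \to \mathbb{R}^m, \ n > m,
\]

is three times continuously partially differentiable. Assume that the Jacobian matrix $D\Psi(x) \in \mathbb{R}^{m\times n}$ has a maximal rank $m$ for all $x \in M = \{x \in \mathbb{R}^n \mid \Psi(x) = 0\}$ and that $\Psi$ is invariant, i.e.,

\[
\Psi(h^s(x)) = \Psi(x) \quad\text{for all } x \in \mathbb{R}^n.
\]

Then,

\[
\Phi_{\dot{x}}(h^s(x),Dh^s(x)\dot{x})\frac{\partial}{\partial s}h^s(x) = \text{const.} \quad\text{for } t \in [t_a,t_b] \text{ and for all } s \in (-\delta,\delta),
\]

and in particular for $h^0(x) = x$ and $Dh^0(x) = E$,

\[
\Phi_{\dot{x}}(x,\dot{x})\frac{\partial}{\partial s}h^s(x)\Big|_{s=0} = \text{const.} \quad\text{for } t \in [t_a,t_b].
\]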

2.10 The Two-Body Problem

Two point masses $m_1$ and $m_2$ move in $\mathbb{R}^3$, acted on only by the gravitational force they exert on each other. Isaac Newton (1643–1727) gave the formula for this force: If the coordinates of $m_k$ are $(x_k,y_k,z_k)$, then the distance between the two point masses $m_1$ and $m_2$ is $r = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$, and the magnitude of the force acting on $m_1$ as well as on $m_2$ amounts to

\[
G = \gamma\frac{m_1m_2}{r^2}, \quad\text{where } \gamma \text{ is the gravitational constant.} \tag{2.10.1}
\]

In the equations of motion (2.9.22), the force is the negative gradient of the potential energy. According to (2.10.1), for $x = (x_1,y_1,z_1,x_2,y_2,z_2)$, this potential energy is given by

\[
V(x) = -\frac{k}{r}, \quad k = \gamma m_1m_2, \tag{2.10.2}
\]

and the equations of motion for $m_1$ and $m_2$ read

\[
\begin{aligned}
m_1\ddot{x}_1 &= -\frac{k}{r^3}(x_1 - x_2), & m_2\ddot{x}_2 &= \frac{k}{r^3}(x_1 - x_2), \\
m_1\ddot{y}_1 &= -\frac{k}{r^3}(y_1 - y_2), & m_2\ddot{y}_2 &= \frac{k}{r^3}(y_1 - y_2), \\
m_1\ddot{z}_1 &= -\frac{k}{r^3}(z_1 - z_2), & m_2\ddot{z}_2 &= \frac{k}{r^3}(z_1 - z_2).
\end{aligned}
\tag{2.10.3}
\]

The force on $m_1$ acts in direction of $m_2$, and the force on $m_2$ acts in direction of $m_1$, with the same magnitude given by (2.10.1). Since the potential energy depends only on the distance between $m_1$ and $m_2$, the conservation laws (2.9.29) are valid, and in particular $(2.9.23)_2$ shows that the barycenter of $m_1$ and $m_2$ moves with a constant speed in a fixed direction of $\mathbb{R}^3$. This motion, however, is not of interest. Only the motion of the relative position vector

\[
(x,y,z) = (x_1 - x_2,y_1 - y_2,z_1 - z_2), \tag{2.10.4}
\]

describing the relative position of the two point masses is to be analyzed. By (2.10.3), the components of the relative position vector satisfy the following system:

\[
\begin{aligned}
m_1m_2\ddot{x} &= -(m_1 + m_2)\frac{k}{r^3}x, \\
m_1m_2\ddot{y} &= -(m_1 + m_2)\frac{k}{r^3}y, \\
m_1m_2\ddot{z} &= -(m_1 + m_2)\frac{k}{r^3}z.
\end{aligned}
\tag{2.10.5}
\]

The system (2.10.5) corresponds to the Euler-Lagrange equations of the Lagrangian

\[
\begin{aligned}
& L = \tfrac{1}{2}m(\dot{x}^2 + \dot{y}^2 + \dot{z}^2) - V(x,y,z), \quad\text{where} \\
& V = -\frac{k}{r}, \quad r = \sqrt{x^2 + y^2 + z^2}, \quad m = \frac{m_1m_2}{m_1 + m_2}.
\end{aligned}
\tag{2.10.6}
\]



Again, this potential energy $(2.10.6)_2$ depends only on the length $r$ of the vector $(x,y,z)$. Hence for solutions of (2.10.5), the conservation laws (2.9.29) are valid; in particular, conservation of the angular momentum holds:

\[
(x,y,z)\times m(\dot{x},\dot{y},\dot{z}) = c \in \mathbb{R}^3. \tag{2.10.7}
\]

Therefore, any solution $(x,y,z)$ of (2.10.5) fulfills

\[
\bigl((x,y,z),(x,y,z)\times(\dot{x},\dot{y},\dot{z})\bigr) = \Bigl((x,y,z),\frac{c}{m}\Bigr) = 0, \tag{2.10.8}
\]

because the vector product is orthogonal to the relative position vector. For $c = 0$, the solution fulfills $(\dot{x},\dot{y},\dot{z}) = \alpha(x,y,z)$ for all $t \in [t_a,t_b]$, describing a straight line. (Equations (2.10.5) do not admit an equilibrium.) For $c \neq 0$, the solution curve is in a plane orthogonal to the vector $c$. In both cases, we see that

\[
\text{any solution of (2.10.5) is planar.} \tag{2.10.9}
\]

Without loss of generality, we assume that the motion takes place in the $(x,y)$-plane, i.e., $z = z(t) = 0$ for all $t$. The conservation of angular momentum (2.10.7) yields in this case

\[
\begin{aligned}
& (x,y,0)\times(\dot{x},\dot{y},0) = (x\dot{y} - \dot{x}y)e_3 = \frac{c}{m}, \quad e_3 = (0,0,1), \\
& x\dot{y} - \dot{x}y = \beta \in \mathbb{R}, \quad\text{where } \beta e_3 = \frac{c}{m} \quad\text{for all } t \in [t_a,t_b].
\end{aligned}
\tag{2.10.10}
\]

If $\beta = 0$, the solution runs in a straight line through $(0,0)$. For $\beta \neq 0$, we consider the domain $F = F(t)$ sketched in Figure 2.20.

Fig. 2.20 On Green’s Formula

By Green’s formula (2.2.3), the area $F$ is given by the line integral

\[
F = \frac{1}{2}\int_{s_a}^{s_b} \tilde{x}\dot{\tilde{y}} - \dot{\tilde{x}}\tilde{y}\,ds \tag{2.10.11}
\]

where $\{(\tilde{x}(s),\tilde{y}(s)) \mid s \in [s_a,s_b]\}$ is a parametrization of the boundary of $F$, cf. (2.2.4). Along the lines from $(0,0)$ to $(x(t_0),y(t_0))$ and from $(x(t),y(t))$ to $(0,0)$, the integrand vanishes; the vectors $(\tilde{x},\tilde{y})$ and $(\dot{\tilde{y}},-\dot{\tilde{x}})$ are orthogonal. From $(x(t_0),y(t_0))$ to $(x(t),y(t))$, we parametrize the curve by $(\tilde{x},\tilde{y}) = (x,y)$, and $(2.10.10)_2$ implies

\[
F = \frac{1}{2}\int_{t_0}^{t} x\dot{y} - \dot{x}y\,dt = \frac{1}{2}\beta(t - t_0) \quad\text{and}\quad \frac{dF}{dt} = \frac{1}{2}\beta. \tag{2.10.12}
\]

The last expression is called “areal velocity,” and formulas (2.10.12) mean the following:

\[
\begin{aligned}
& \text{The areal velocity of any solution is constant.} \\
& \text{The line segment from } (0,0) \text{ to } (x(t),y(t)) \text{ sweeps out} \\
& \text{equal areas during equal time intervals.}
\end{aligned}
\tag{2.10.13}
\]

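We introduce polar coordinates in the $(x,y)$-plane:

\[
\begin{aligned}
x &= r\cos\varphi, \quad r \geq 0, \\
y &= r\sin\varphi, \quad \varphi \in \mathbb{R}.
\end{aligned}
\tag{2.10.14}
\]

The conserved total energy $E = T + V$, cf. (2.9.29), of a solution $(r,\varphi) = (r(t),\varphi(t))$ is given by

\[
\tfrac{1}{2}m(\dot{r}^2 + r^2\dot{\varphi}^2) + V(r) = E, \tag{2.10.15}
\]

and the angular momentum, cf. (2.10.10), reads

\[
r^2\dot{\varphi} = \beta. \tag{2.10.16}
\]

Replacing $\dot{\varphi}$ in (2.10.15) by $\dot{\varphi}$ from (2.10.16) yields

\[
\begin{aligned}
& \tfrac{1}{2}m\Bigl(\dot{r}^2 + \frac{\beta^2}{r^2}\Bigr) - \frac{k}{r} = E, \quad\text{or} \\
& \dot{r} = \sqrt{\frac{2}{m}(E - U(r))} = F(r), \quad\text{where } U(r) = \frac{\beta^2m}{2r^2} - \frac{k}{r}.
\end{aligned}
\tag{2.10.17}
\]

The function $U(r)$ is called “effective potential energy.” Obviously, it is necessary that $U(r) \leq E$ for a solution $r = r(t)$. For $\beta \neq 0$, solutions with $-\frac{k^2}{2\beta^2m} \leq E < 0$ are bounded, there is no solution for $E < -\frac{k^2}{2\beta^2m}$, and for $E \geq 0$ all solutions are unbounded. For $\beta = 0$, solutions for $E < 0$ are bounded, and for $E \geq 0$, they might be unbounded.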
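The threshold $-\frac{k^2}{2\beta^2m}$ is the minimal value of the effective potential energy. The short computation below, added here as a worked step for $\beta \neq 0$, makes this explicit; the symbol $r_0$ is introduced only for this purpose.

```latex
\[
U'(r) = -\frac{\beta^2 m}{r^3} + \frac{k}{r^2} = 0
\quad\Longleftrightarrow\quad
r = r_0 = \frac{\beta^2 m}{k},
\qquad
U(r_0) = \frac{k^2}{2\beta^2 m} - \frac{k^2}{\beta^2 m} = -\frac{k^2}{2\beta^2 m},
\]
\[
U(r) \to +\infty \ \text{ as } r \searrow 0,
\qquad
U(r) \nearrow 0 \ \text{ as } r \to \infty .
\]
```

Consequently, $U(r) \leq E$ has no solution below $U(r_0)$, it confines $r$ to a bounded interval for $U(r_0) \leq E < 0$, and it allows arbitrarily large $r$ for $E \geq 0$.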



Fig. 2.21 The Effective Potential Energy

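Now, let $-\frac{k^2}{2\beta^2 m} < E < 0$. Then,

\[
\varphi = G(r), \quad\text{where}\quad \frac{d}{dr}G(r) = \frac{\beta}{r^2F(r)}, \tag{2.10.18}
\]

which follows from the fact that the expressions (2.10.16) and (2.10.17)$_2$ yield $\dot\varphi = \frac{d}{dr}G(r)\,\dot r = \frac{d}{dt}G(r)$. The function $\dfrac{\beta\sqrt{m}/r^2}{\sqrt{2(E-U(r))}}$ has a primitive,

\[
G(r) = \arccos\frac{(\beta\sqrt{m}/r) - (k/\beta\sqrt{m})}{\sqrt{2E + (k^2/\beta^2 m)}},
\]

and $\varphi = G(r)$ yields

\[
r = \frac{p}{1 + e\cos\varphi}, \quad\text{where}\quad p = \frac{\beta^2 m}{k}, \quad e = \sqrt{1 + E\,\frac{2\beta^2 m}{k^2}}. \tag{2.10.19}
\]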

The assumption on E gives 0 < e < 1, and (2.10.19)$_3$ corresponds to an ellipse, where

\[
\begin{aligned}
&a = \frac{1}{2}\Big(\frac{p}{1+e} + \frac{p}{1-e}\Big) = \frac{p}{1-e^2} \ \text{is the semimajor axis,}\\
&b = a\sqrt{1-e^2} \ \text{is the semiminor axis,}\\
&\text{and the origin is a focus, cf. Figure 2.22.}
\end{aligned} \tag{2.10.20}
\]

A transition from $\varphi$ to $\varphi + c$ (an integration constant in (2.10.19)) corresponds to a rotation of the ellipse about the origin.



Fig. 2.22 An Ellipse in Polar Coordinates

All solutions with nonvanishing angular momentum and with energy allowing only bounded solutions run on ellipses. According to (2.10.4), the vector (x,y,0) is the vector from $m_2$ to $m_1$ with length r, which means:

The point mass $m_1$ runs on an ellipse where the point mass $m_2$ is at one of its foci. (2.10.21)

The period T of $m_1$ on the ellipse is related to its area F, cf. (2.10.12):

\[
F = \frac{1}{2}\beta T = \pi ab = \pi a^2\sqrt{1-e^2}. \tag{2.10.22}
\]

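By (2.10.19)$_3$ and (2.10.20), we obtain

\[
\beta^2 = \frac{k}{m}\,p = \frac{k}{m}\,a(1-e^2) \quad\text{and}\quad \frac{T^2}{a^3} = 4\pi^2\frac{m}{k}. \tag{2.10.23}
\]

By the definitions (2.10.2) and (2.10.6)$_2$, we can rewrite (2.10.23)$_2$ as follows:

\[
\frac{T^2}{a^3} = \frac{4\pi^2}{\gamma(m_1+m_2)}. \tag{2.10.24}
\]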
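The relations (2.10.16), (2.10.19), (2.10.20), and (2.10.23) can be checked numerically. The following is only a sketch under stated assumptions: the units m = k = 1, the initial data, the step count, and the helper name `simulate_one_period` are illustrative choices, and the velocity Verlet scheme is a swapped-in standard integrator, not a method used in the text. The orbit is integrated over the period predicted by (2.10.23) and should then close up, while $\beta = x\dot y - \dot x y$ stays constant.

```python
import math

def simulate_one_period(m=1.0, k=1.0, pos=(1.0, 0.0), vel=(0.0, 0.8), steps=40000):
    """Integrate m * r'' = -k * r / |r|^3 over the period predicted by (2.10.23).

    All default values are illustrative assumptions; they give E < 0 and 0 < e < 1.
    """
    x, y = pos
    vx, vy = vel
    beta = x * vy - vx * y                                   # r^2 * phi', cf. (2.10.16)
    energy = 0.5 * m * (vx * vx + vy * vy) - k / math.hypot(x, y)
    p = beta * beta * m / k                                  # cf. (2.10.19)
    e = math.sqrt(1.0 + 2.0 * energy * beta * beta * m / (k * k))
    a = p / (1.0 - e * e)                                    # semimajor axis, cf. (2.10.20)
    period = 2.0 * math.pi * math.sqrt(a ** 3 * m / k)       # T^2 / a^3 = 4 pi^2 m / k
    dt = period / steps
    for _ in range(steps):                                   # velocity Verlet steps
        r3 = math.hypot(x, y) ** 3
        vx -= 0.5 * dt * k * x / (m * r3)                    # half kick
        vy -= 0.5 * dt * k * y / (m * r3)
        x += dt * vx                                         # drift
        y += dt * vy
        r3 = math.hypot(x, y) ** 3
        vx -= 0.5 * dt * k * x / (m * r3)                    # half kick
        vy -= 0.5 * dt * k * y / (m * r3)
    return {
        "period": period,
        "closing_error": math.hypot(x - pos[0], y - pos[1]),  # distance to the start point
        "beta_drift": abs((x * vy - vx * y) - beta),          # change of the areal constant
    }

result = simulate_one_period()
```

A small `closing_error` after exactly one predicted period supports (2.10.23), and a `beta_drift` at rounding level reflects the constant areal velocity (2.10.13).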

In 1609, Johannes Kepler (1571–1630) published in his treatise “Astronomia Nova” two laws, which are known as Kepler’s first and second laws:

1. The orbit of a planet is an ellipse with the sun at one of the two foci.
2. A line segment joining the sun and a planet sweeps out equal areas during equal intervals of time.

We find both laws in (2.10.21) and in (2.10.13).

In 1619, Kepler published in his work “Harmonices Mundi” his so-called third law:

3. The square of the orbital period of a planet is proportional to the cube of the semimajor axis of its ellipse.

To be more precise, the proportionality constant is the same for all planets. To Kepler, this demonstrated the harmony of the world.

We find the third law in (2.10.24), almost: due to the big mass $m_2$ of the sun, the sum $m_1 + m_2$ is almost equal for all planets.

We have to admit an inaccuracy too: The above proof of Kepler’s three laws is valid only if one planet orbits the sun. But there are eight big planets, Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune, an asteroid belt, and a trans-Neptunian region with Pluto and some more. Since all celestial bodies attract each other according to Newton’s law, the solar system has to be treated as an N-body problem with $N \ge 9$. However, an accurate analysis like that for the two-body problem is not (yet) possible. As mentioned before, we do not even know if the present constellation of the solar system is stable.

To Kepler and his contemporaries, the harmony of the solar system was undoubted. The order of the cosmos was a proof of God for Newton.

Some comments on other solutions of (2.10.17): For $\beta \ne 0$ and $E \ge 0$, there is a minimal r > 0 with $\dot r = 0$, cf. Figure 2.21. If $m_1$ starts with $\dot r < 0$, $\dot r$ changes sign at the minimal r, and $\dot r > 0$ is conserved for all future time. The point mass $m_1$ passes $m_2$ and disappears at infinity.

For $\beta = 0$, the orbits are straight lines through the origin (0,0), due to $\dot\varphi = 0$, by (2.10.16). For E < 0, there is a maximal r > 0 where $\dot r$ can vanish. If $m_1$ starts with $\dot r > 0$ below that bound, then $\dot r$ changes sign and $m_1$ falls with $\dot r < 0$ on a straight line on $m_2$. If $E \ge 0$, the sign of $\dot r$ does not change: The point mass $m_1$ falls on $m_2$ if $\dot r < 0$, or it disappears at infinity if $\dot r > 0$.

Exercises

2.10.1. Assume that at time t = 0, the distance of the point mass $m_1$ from $m_2$ is R and that its velocity at time t = 0 is zero. Give a formula for the time T that the point mass takes to fall on a straight line on $m_2$, and estimate the time T that $m_1$ takes from r = R to r = 0.

2.10.2. Assume that a point mass $m_1$ falls on a straight line from infinity on $m_2$ with vanishing energy E = 0.

a) Give its velocity at the distance R from $m_2$.
b) Give the time that it takes to fall from height R on $m_2$.


Chapter 3
Direct Methods in the Calculus of Variations

3.1 The Method

The Euler-Lagrange calculus was created to determine extremals of functionals. If the solution of the Euler-Lagrange equation is unique among all admitted functions, then physical or geometric insights into the problem might lead to the conclusion that it is indeed the desired extremal. In addition, the second variation provides necessary and also sufficient conditions on extremals.

History shows that this approach is quite effective, but it hits its limit if the solvability of the Euler-Lagrange equation is open. In particular, if the admitted functions depend on more than one variable, the Euler-Lagrange equation is a partial differential equation, for which it is not generally possible to give an explicit solution. But also for boundary value problems governed by ordinary differential equations, the existence of a solution is generally not obvious, as shown for some examples given in this book. In these cases, the calculus of variations provides a converse argument: If the variational problem has an extremal, then the Euler-Lagrange equation has a solution.

This argument was used by Dirichlet to “prove” in an elegant way the existence of harmonic functions having prescribed boundary values. The minimizer among all functions, having the same boundary values, of the so-called Dirichlet integral solves Laplace’s equation since it is the corresponding Euler-Lagrange equation. Apart from its regularity, which is not trivial in this case, the crucial problem is its existence. For Dirichlet the answer was “evident” since the Dirichlet integral is convex and bounded from below. As we mention in the Introduction, Weierstraß’ example, discussed in Paragraph 1.5, shows that convex functionals, which are bounded from below, do not necessarily have admitted minimizers.

The key question of the direct methods in the calculus of variations is the following.

Under which general conditions do functionals have minimizers?

We discuss this question for an abstract functional



J : D ⊂ X → R∪{+∞} (3.1.1)

where X is a normed linear space over R. For global minimizers, which are of main interest, a first necessary condition is that the functional J is bounded from below on D:

(H1) J(y) ≥ c ∈ R for all y ∈ D.

Then the infimum exists in R (if J is not identically +∞ on D):

inf{J(y)|y ∈ D} = m > −∞. (3.1.2)

According to the definition of the infimum, there exists a so-called minimizing sequence $(y_n)_{n\in\mathbb{N}} \subset D$ satisfying

\[
\lim_{n\to\infty} J(y_n) = m \quad\text{in } \mathbb{R}. \tag{3.1.3}
\]

Here are two more hypotheses:

(H2) The minimizing sequence $(y_n)_{n\in\mathbb{N}}$ contains a subsequence $(y_{n_k})_{k\in\mathbb{N}}$, which converges to an element $y_0 \in D$ with respect to a suitable definition of convergence:

\[
\lim_{k\to\infty} y_{n_k} = y_0 \in D. \tag{3.1.4}
\]

(H3) The functional J is lower semi-continuous with respect to the convergence (3.1.4), i.e.,

\[
\lim_{n\to\infty} y_n = y_0 \ \Rightarrow\ \liminf_{n\to\infty} J(y_n) \ge J(y_0). \tag{3.1.5}
\]

We then have:

Proposition 3.1.1. Under the hypotheses (H1), (H2), and (H3), the functional J possesses a global minimizer $y_0 \in D$.

Proof. For the subsequence of the minimizing sequence, we have

\[
\begin{aligned}
&\lim_{k\to\infty} J(y_{n_k}) = m = \liminf_{k\to\infty} J(y_{n_k}) \ge J(y_0) \ge m. \quad\text{Hence}\\
&J(y_0) = m = \inf\{J(y) \mid y \in D\}.
\end{aligned} \tag{3.1.6}
\]

□

The hypotheses (H2) and (H3) compete: The more general the definition of convergence in (H2), i.e., the more sequences converge, the more restrictive is the lower semi-continuity for the functional. One has to find a balance in the following sense: Under which natural definition of convergence do minimizing sequences contain convergent subsequences such that a sufficiently large class of functionals is lower semi-continuous with respect to that convergence?



We give the definition of a suitable convergence and show next that the class of functionals satisfying (H3) is sufficiently large.

In functional analysis two definitions of convergence are introduced in a normed linear space X: convergence with respect to the norm, cf. Definition 1.1.4, called “strong convergence,” and the so-called weak convergence defined as follows:

\[
\begin{aligned}
&X' = \{\ell : X \to \mathbb{R} \mid \ell \text{ is linear and continuous}\} \text{ is the dual space of } X.\\
&\text{A sequence } (y_n)_{n\in\mathbb{N}} \subset X \text{ converges weakly to } y_0\\
&\text{if } \lim_{n\to\infty} \ell(y_n) = \ell(y_0) \text{ for all } \ell \in X'.
\end{aligned} \tag{3.1.7}
\]

The weak convergence is denoted $\operatorname{w-lim}_{n\to\infty} y_n = y_0$. Obviously strong convergence implies weak convergence.

Remark. In $X = \mathbb{R}^n$ endowed with the Euclidean norm, weak convergence means componentwise convergence, which, in turn, implies strong convergence. Therefore in a finite-dimensional normed linear space, strong and weak convergence are equivalent.

In infinite-dimensional normed linear spaces strong convergence implies weak convergence but not vice versa. We provide a counterexample in the next remark.

A (real) Hilbert space is defined as follows:

\[
\begin{aligned}
&\text{In a (real) Hilbert space } X, \text{ the norm } \|\ \| \text{ is defined by a scalar product } (\ ,\ )\\
&\text{in the following way: } \|y\| = \sqrt{(y,y)}.\\
&\text{A Hilbert space is complete in the following sense:}\\
&\text{Any Cauchy sequence converges}\\
&\text{to an element in } X \text{ (strongly with respect to the norm).}
\end{aligned} \tag{3.1.8}
\]

As in the case of a finite-dimensional Euclidean space, a norm defined by a scalar product fulfills the axioms of Definition 1.1.3. The proof relies on the Cauchy-Schwarz inequality.

The Riesz representation theorem (F. Riesz, 1880–1956) reads as follows:

\[
\begin{aligned}
&\text{For any } \ell \in X' \text{ there exists a unique } z \in X\\
&\text{such that } \ell(y) = (y,z) \text{ for all } y \in X.
\end{aligned} \tag{3.1.9}
\]

We prove this theorem in Proposition 3.1.2.

In view of (3.1.9), weak convergence in a real Hilbert space can be defined as follows:

\[
\operatorname{w-lim}_{n\to\infty} y_n = y_0 \ \Leftrightarrow\ \lim_{n\to\infty} (y_n,z) = (y_0,z) \ \text{ for all } z \in X. \tag{3.1.10}
\]



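Remark. As stated by Hilbert, the prototype of a Hilbert space is the space $\ell^2$, which results from a Fourier analysis with a complete and countable orthonormal system:

\[
\begin{aligned}
&\ell^2 = \Big\{y = (\eta_k)_{k\in\mathbb{N}} \ \Big|\ \sum_{k=1}^{\infty} \eta_k^2 < \infty\Big\} \ \text{ with the scalar product}\\
&(y,z) = \sum_{k=1}^{\infty} \eta_k\zeta_k \ \text{ and norm } \ \|y\| = \sqrt{\sum_{k=1}^{\infty} \eta_k^2}.
\end{aligned} \tag{3.1.11}
\]

In this space we give an example of a weakly convergent sequence, which does not converge strongly:

\[
\begin{aligned}
&e_n = (\delta_{kn})_{k\in\mathbb{N}} = (0,0,\dots,1,0,\dots) \ \text{ with the number 1 at the } n\text{th place},\\
&(e_n,z) = \zeta_n, \quad \lim_{n\to\infty} \zeta_n = 0, \ \text{ since } \sum_{k=1}^{\infty} \zeta_k^2 < \infty. \ \text{ On the other hand,}\\
&\|e_n - e_m\| = \sqrt{2} \ \text{ for all } n,m \in \mathbb{N} \text{ with } n \ne m.\\
&\text{Therefore } \operatorname{w-lim}_{n\to\infty} e_n = 0 \ \text{ but } \lim_{n\to\infty} e_n \text{ does not exist.}
\end{aligned} \tag{3.1.12}
\]

The sequence $(e_n)_{n\in\mathbb{N}} \subset \ell^2$ is bounded since $\|e_n\| = 1$. This example shows that in infinite-dimensional normed spaces, the theorem of Bolzano-Weierstraß, saying that a bounded sequence contains a convergent subsequence, is not necessarily true.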
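The two computations in (3.1.12) can be illustrated on finite truncations. This is a minimal sketch, assuming a truncation to the first 1000 coordinates and the particular square-summable element $\zeta_k = 1/k$; the names `unit_vector`, `inner`, and `norm` are hypothetical helpers, not notation from the text.

```python
import math

def unit_vector(n, dim):
    """First `dim` coordinates of e_n (the number 1 at the n-th place, 1-based)."""
    return [1.0 if k == n else 0.0 for k in range(1, dim + 1)]

def inner(y, z):
    """Truncated scalar product (y, z) = sum of eta_k * zeta_k."""
    return sum(a * b for a, b in zip(y, z))

def norm(y):
    return math.sqrt(inner(y, y))

DIM = 1000                                   # truncation level (assumption)
z = [1.0 / k for k in range(1, DIM + 1)]     # zeta_k = 1/k is square-summable

# (e_n, z) = zeta_n tends to zero as n grows ...
pairings = [inner(unit_vector(n, DIM), z) for n in (1, 10, 100, 1000)]

# ... while two different unit vectors always keep the distance sqrt(2).
e3, e7 = unit_vector(3, DIM), unit_vector(7, DIM)
distance = norm([a - b for a, b in zip(e3, e7)])
```

The pairings shrink like 1/n, whereas the mutual distance does not shrink at all, so no subsequence of $(e_n)$ can be a Cauchy sequence.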

We quote from functional analysis the following theorem, which goes back to Hilbert. It is crucial for the direct methods in the calculus of variations:

\[
\begin{aligned}
&\text{In a “reflexive” space, e.g., in a Hilbert space,}\\
&\text{any bounded sequence contains a weakly convergent subsequence.}
\end{aligned} \tag{3.1.13}
\]

Remark. The definition of a reflexive normed linear space goes beyond the context of this introduction. All notions and spaces of functional analysis used in this and in the next paragraph are found in introductory textbooks on analysis and functional analysis, such as [18], [19], [20], [21] and [25]. We prove the sequential weak compactness in the Appendix.

Our excursion into functional analysis allows us to specify hypothesis (H2) more precisely and, at the same time, to give a sufficient condition under which it is fulfilled:

(H2)′ The minimizing sequence $(y_n)_{n\in\mathbb{N}}$ is bounded in a reflexive space X, e.g., in a Hilbert space, which means $\|y_n\| \le C$ for all $n \in \mathbb{N}$. Then $(y_n)_{n\in\mathbb{N}}$ contains a subsequence $(y_{n_k})_{k\in\mathbb{N}}$ that converges weakly to an element $y_0 \in X$:

\[
\operatorname{w-lim}_{k\to\infty} y_{n_k} = y_0 \in X. \tag{3.1.14}
\]

If $D \subset X$ is closed and convex, for instance if $D = z_0 + X_0$, where $X_0$ is a closed subspace of X, then the weak limit $y_0$ of $(y_{n_k})_{k\in\mathbb{N}} \subset D$ is in D as well.



A suitable hypothesis on the functional J should entail the boundedness of a minimizing sequence. The common hypothesis is “coercivity,” which in its general form reads as follows:

(H2)′′ For all unbounded sequences $(y_n)_{n\in\mathbb{N}} \subset D \subset X$ the sequence $(J(y_n))_{n\in\mathbb{N}}$ is unbounded from above. This is the case if

\[
J(y) \ge f(\|y\|) \ \text{ for all } y \in D \subset X, \quad\text{where } \lim_{r\to\infty} f(r) = \infty. \tag{3.1.15}
\]

What does hypothesis (H3) mean under (H2)′ and (H2)′′?

(H3)′ The functional J is weakly lower semi-continuous, meaning

\[
\operatorname{w-lim}_{n\to\infty} y_n = y_0 \ \Rightarrow\ \liminf_{n\to\infty} J(y_n) \ge J(y_0). \tag{3.1.16}
\]

Which conditions on J imply (H3)′?

A sufficient condition is a partial convexity of J, which in the concrete case (1.1.1) is given by the partial convexity of the Lagrange function F with respect to the variable y′, cf. Proposition 3.2.5. We recall Exercise 1.4.3, where for a local minimizer the necessary Legendre condition, which is a local partial convexity with respect to y′, is to be proved. Therefore this sufficient condition to guarantee (H3)′ is natural.

Remark. If the Lagrange function F is not partially convex with respect to the variable y′, the functional J in (1.1.1) does not necessarily have a global minimizer, even if it is bounded from below and if minimizing sequences are bounded. A typical example is given by $J(y) = \int_a^b y^2 + ((y')^2 - 1)^2\,dx$. Its infimum zero is not a minimum: Obviously the conditions y = 0 and y′ = ±1 are exclusive. In view of the relevance of energy functionals with W-potentials depending on y′ of this type, which we mention in Paragraph 2.4, a way out was created by extending the class of admitted functions to measures, the so-called Young measures. For the example above, a minimizer has the derivatives ±1 in each point with probability 1/2, respectively. For details we refer for instance to J. M. Ball “A version of the fundamental theorem for Young measures” in “Partial Differential Equations and Continuum Models of Phase Transitions,” Lecture Notes in Physics 359, 207–215, Springer-Verlag Berlin, Heidelberg (1989).
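That the infimum zero is approached but not attained can be seen on zigzag functions. The sketch below assumes the interval (a,b) = (0,1) and boundary values y(0) = y(1) = 0, neither of which is fixed in the remark; `sawtooth` and `J` are hypothetical helper names. For a piecewise linear function with n teeth and slopes ±1, the second term of the integrand vanishes and the first gives $J(y_n) = 1/(12n^2)$.

```python
def sawtooth(n):
    """Nodes and values of the zigzag y_n on [0, 1] with n teeth and slopes +1/-1."""
    nodes = [i / (2 * n) for i in range(2 * n + 1)]
    values = [0.0 if i % 2 == 0 else 1.0 / (2 * n) for i in range(2 * n + 1)]
    return nodes, values

def J(nodes, values):
    """J(y) = int y^2 + ((y')^2 - 1)^2 dx for a continuous piecewise linear y."""
    total = 0.0
    for x0, x1, y0, y1 in zip(nodes, nodes[1:], values, values[1:]):
        h = x1 - x0
        slope = (y1 - y0) / h
        total += h * (y0 * y0 + y0 * y1 + y1 * y1) / 3.0   # exact integral of y^2 on the segment
        total += h * (slope * slope - 1.0) ** 2            # vanishes for slopes +1/-1
    return total

# J(y_n) = 1 / (12 n^2) tends to zero, although no admitted function has J = 0.
energies = {n: J(*sawtooth(n)) for n in (1, 10, 100)}
```

The sequence $(y_n)$ is minimizing and converges uniformly to zero, but the zero function itself has J = 1 on (0,1), which illustrates the failure of lower semi-continuity.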

So far the functional $J : X \to \mathbb{R}\cup\{+\infty\}$ is as general as possible. Next we restrict ourselves to quadratic functionals for two reasons: The direct methods give good results, and the Euler-Lagrange equations are linear. However, in Paragraph 3.3 we demonstrate the strength of the direct method by solving nonlinear problems.



Definition 3.1.1. Let X be a (real) Hilbert space. A bilinear form $B : X \times X \to \mathbb{R}$ is equivalent to the scalar product $(\ ,\ )$ on X if the following conditions hold for all $y,z \in X$:

\[
\begin{aligned}
&B(y,z) = B(z,y), \ \text{i.e., } B \text{ is symmetric,}\\
&|B(y,z)| \le C_1\|y\|\|z\| \ \text{ where } C_1 > 0, \ \text{i.e., } B \text{ is continuous,}\\
&B(y,y) \ge C_2\|y\|^2 \ \text{ where } C_2 > 0, \ \text{i.e., } B \text{ is positive definite.}
\end{aligned} \tag{3.1.17}
\]

If a bilinear form fulfills all conditions (3.1.17), then B is a scalar product on X, and in view of $C_2\|y\|^2 \le B(y,y) \le C_1\|y\|^2$, the expression $\sqrt{B(y,y)}$ defines a norm that is equivalent to $\sqrt{(y,y)}$, cf. (3.1.8). The following proposition contains Riesz’ representation theorem (3.1.9) as a special case.

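Proposition 3.1.2. For a bilinear form $B : X \times X \to \mathbb{R}$ on a real Hilbert space X, assume the hypotheses (3.1.17). Then the following holds:

\[
\begin{aligned}
&\text{For any } \ell \in X' \text{ there is a unique } y_0 \in X \text{ such that}\\
&B(y_0,h) = \ell(h) \ \text{ for all } h \in X.
\end{aligned} \tag{3.1.18}
\]

This element $y_0 \in X$ is the global minimizer of the functional

\[
\begin{aligned}
&J(y) = \tfrac{1}{2}B(y,y) - \ell(y), \ \text{i.e.,}\\
&J(y_0) = \inf\{J(y) \mid y \in X\} = m.
\end{aligned} \tag{3.1.19}
\]

All minimizing sequences $(y_n)_{n\in\mathbb{N}}$ of the functional (3.1.19)$_1$ converge strongly in X to the global minimizer $y_0$:

\[
\begin{aligned}
&\lim_{n\to\infty} J(y_n) = m = J(y_0) \ \text{ implies}\\
&\lim_{n\to\infty} y_n = y_0 \ \text{ in } X.
\end{aligned} \tag{3.1.20}
\]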

Proof. The continuity of $\ell : X \to \mathbb{R}$ is equivalent to $|\ell(y)| \le C_3\|y\|$ for all $y \in X$, and due to the positive definiteness of B, the functional is coercive and bounded from below:

\[
J(y) \ge \tfrac{1}{2}C_2\|y\|^2 - C_3\|y\| \ge c \ \text{ for all } y \in X. \tag{3.1.21}
\]

Using the symmetry (3.1.17)$_1$ and the positive definiteness (3.1.17)$_3$ of B, we obtain for any minimizing sequence $(y_n)_{n\in\mathbb{N}} \subset X$:



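\[
\begin{aligned}
C_2\|y_n - y_m\|^2 &\le B(y_n - y_m, y_n - y_m)\\
&= 2B(y_n,y_n) + 2B(y_m,y_m) - B(y_n + y_m, y_n + y_m)\\
&= 4\big(\tfrac{1}{2}B(y_n,y_n) - \ell(y_n) + \tfrac{1}{2}B(y_m,y_m) - \ell(y_m)\\
&\qquad - B(\tfrac{1}{2}(y_n + y_m), \tfrac{1}{2}(y_n + y_m)) + 2\ell(\tfrac{1}{2}(y_n + y_m))\big)\\
&= 4\big(J(y_n) + J(y_m) - 2J(\tfrac{1}{2}(y_n + y_m))\big)\\
&= 4\big(J(y_n) - m + J(y_m) - m - 2(J(\tfrac{1}{2}(y_n + y_m)) - m)\big)\\
&\le 4(\varepsilon + \varepsilon) \ \text{ for } n,m \ge n_0(\varepsilon),
\end{aligned} \tag{3.1.22}
\]

where we use $J(\tfrac{1}{2}(y_n + y_m)) - m \ge 0$. Thus any minimizing sequence is a Cauchy sequence in X, and due to completeness of the Hilbert space, it possesses a limit $y_0 \in X$. By $J(y) - J(\tilde y) = \tfrac{1}{2}B(y - \tilde y, y + \tilde y) - \ell(y - \tilde y)$, the functional $J : X \to \mathbb{R}$ is continuous, which proves (3.1.20).

Apparently,

\[
\begin{aligned}
J(y_0 + th) &= \tfrac{1}{2}B(y_0 + th, y_0 + th) - \ell(y_0 + th)\\
&= J(y_0) + t(B(y_0,h) - \ell(h)) + \tfrac{1}{2}t^2B(h,h)\\
&\ge J(y_0) \ \text{ for all } h \in X \text{ and for all } t \in \mathbb{R}.
\end{aligned} \tag{3.1.23}
\]

Therefore

\[
\frac{d}{dt}J(y_0 + th)\Big|_{t=0} = \delta J(y_0)h = B(y_0,h) - \ell(h) = 0 \ \text{ for all } h \in X, \tag{3.1.24}
\]

which proves (3.1.18).

To prove uniqueness, we assume

\[
\begin{aligned}
&B(y_0,h) = \ell(h) = B(\tilde y_0,h) \ \text{ or}\\
&B(y_0 - \tilde y_0,h) = 0 \ \text{ for all } h \in X, \text{ and in particular,}\\
&B(y_0 - \tilde y_0, y_0 - \tilde y_0) = 0,
\end{aligned} \tag{3.1.25}
\]

which implies $y_0 = \tilde y_0$ by positive definiteness. □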
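In finite dimensions Proposition 3.1.2 reduces to solving a linear system, which gives a quick sanity check. The sketch below assumes $X = \mathbb{R}^2$, the form $B(y,z) = y^{T}Az$ with the symmetric positive definite matrix A = [[2, 1], [1, 3]], and $\ell(y) = y_1 + 2y_2$; all of these, the step size, and the use of gradient descent to generate a minimizing sequence are illustrative choices, not part of the text. The exact solution of (3.1.18) is $y_0 = (0.2, 0.6)$ with $J(y_0) = -0.7$.

```python
def B(y, z):
    """Symmetric, positive definite bilinear form y^T A z with A = [[2, 1], [1, 3]]."""
    return 2.0 * y[0] * z[0] + y[0] * z[1] + y[1] * z[0] + 3.0 * y[1] * z[1]

def ell(y):
    """Continuous linear functional l(y) = y_1 + 2 * y_2."""
    return y[0] + 2.0 * y[1]

def J(y):
    """Quadratic functional (3.1.19)."""
    return 0.5 * B(y, y) - ell(y)

def gradient_descent(steps=200, step_size=0.2):
    """Produce a minimizing sequence; returns the last iterate and all values J(y_n)."""
    y = [0.0, 0.0]
    values = []
    for _ in range(steps):
        # First variation delta J(y) h = B(y, h) - l(h), evaluated on the unit vectors.
        grad = [B(y, (1.0, 0.0)) - ell((1.0, 0.0)),
                B(y, (0.0, 1.0)) - ell((0.0, 1.0))]
        y = [y[0] - step_size * grad[0], y[1] - step_size * grad[1]]
        values.append(J(y))
    return y, values

y0, values = gradient_descent()
```

The values $J(y_n)$ decrease to m = -0.7 and the iterates converge to $y_0$, in line with (3.1.20); the limit satisfies the weak equation $B(y_0,h) = \ell(h)$ for every test vector h.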

Remark. The representation theorem (3.1.18) is also valid for bilinear forms satisfying (3.1.17) without the symmetry (3.1.17)$_1$. In this case, (3.1.18) is called the Lax-Milgram Theorem, which is not proved with methods of the calculus of variations, but which follows from the Riesz representation theorem as shown below: For any $u \in X$, there exist unique v and $v^*$ in X, such that for the scalar product, we have

\[
B(u,y) = (v,y), \quad\text{and}\quad B(y,u) = (y,v^*) \ \text{ for all } y \in X,
\]

which follows from the fact that $B(u,\cdot), B(\cdot,u) \in X'$. Setting $v = Lu$ and $v^* = L^*u$, we define linear operators $L, L^* : X \to X$, which fulfill

\[
(Ly,u) = (y,L^*u) \ \text{ for all } y,u \in X.
\]

Both operators L and $L^*$ are continuous:

\[
\begin{aligned}
&\|Lu\|^2 = (Lu,Lu) = B(u,Lu) \le C_1\|u\|\|Lu\|, \ \text{ or}\\
&\|Lu\| \le C_1\|u\|, \ \text{ and analogously, } \|L^*u\| \le C_1\|u\| \ \text{ for all } u \in X.
\end{aligned}
\]

The positive definiteness of B implies

\[
\begin{aligned}
&C_2\|u\|^2 \le B(u,u) = (Lu,u) \le \|Lu\|\|u\|, \ \text{ or}\\
&C_2\|u\| \le \|Lu\|, \ \text{ and analogously } C_2\|u\| \le \|L^*u\| \ \text{ for all } u \in X.
\end{aligned}
\]

This proves that the symmetric and continuous bilinear form

\[
\begin{aligned}
&S(u,w) = (L^*u,L^*w) \ \text{ is positive definite, i.e.,}\\
&C_2^2\|u\|^2 \le \|L^*u\|^2 = S(u,u) \ \text{ for all } u \in X,
\end{aligned}
\]

and Riesz’ representation theorem is applicable: For any $\ell \in X'$, there is a unique $y^* \in X$, such that

\[
\ell(h) = S(y^*,h) = (LL^*y^*,h) = B(L^*y^*,h) \ \text{ for all } h \in X,
\]

proving the Lax-Milgram theorem with $y_0 = L^*y^*$.

Next we study eigenvalue problems using the direct methods in the calculus of variations. We follow the classical approach in “Methods of Mathematical Physics, Vol. 1 and 2” by R. Courant and D. Hilbert, which is based on the work of Baron J. W. S. Rayleigh (1842–1919), W. Ritz (1878–1909), E. Fischer (1875–1954), H. Weyl (1885–1955), and R. Courant (1888–1972).

Let $K : X \times X \to \mathbb{R}$ be a continuous and symmetric bilinear form. Then a (global) minimizer $y \in X$ of

\[
B(y) = B(y,y), \ \text{ under the constraint } \ K(y) = K(y,y) = 1, \tag{3.1.26}
\]

fulfills by Lagrange’s multiplier rule (formally, for the time being)

\[
B(y,h) = \lambda K(y,h) \ \text{ for all } h \in X, \tag{3.1.27}
\]

where $\lambda \in \mathbb{R}$, which follows from $\delta B(y)h = 2B(y,h)$ and $\delta K(y)h = 2K(y,h)$. Equation (3.1.27) is the weak form of a linear eigenvalue problem as will be shown in Paragraph 3.3. The existence of a minimizer is proved in the following proposition. We employ the abbreviations B(y,y) = B(y) and K(y,y) = K(y).



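Proposition 3.1.3. Let $B : X \times X \to \mathbb{R}$ be a bilinear form on a (real) Hilbert space fulfilling conditions (3.1.17), and let $K : X \times X \to \mathbb{R}$ be a bilinear form having the following properties:

\[
\begin{aligned}
&K(y,z) = K(z,y) \ \text{ for all } y,z \in X, \ \text{i.e., } K \text{ is symmetric,}\\
&\operatorname{w-lim}_{n\to\infty} y_n = y_0 \ \Rightarrow\ \lim_{n\to\infty} K(y_n) = K(y_0),\\
&\text{i.e., } K \text{ is weakly sequentially continuous,}\\
&K(y) = K(y,y) > 0 \ \text{ for all } y \ne 0.
\end{aligned} \tag{3.1.28}
\]

Then there exists a global minimizer $u_1 \in X$ of

\[
\begin{aligned}
&B(y) = B(y,y), \ \text{ under the constraint}\\
&K(y) = K(y,y) = 1, \ \text{i.e.,}\\
&B(u_1) = \inf\{B(y) \mid y \in X,\ K(y) = 1\} = \lambda_1 > 0.
\end{aligned} \tag{3.1.29}
\]

The minimizer fulfills the weak eigenvalue problem

\[
B(u_1,h) = \lambda_1K(u_1,h) \ \text{ for all } h \in X. \tag{3.1.30}
\]

Any minimizing sequence $(y_n)_{n\in\mathbb{N}} \subset X \cap \{K(y) = 1\}$ contains a subsequence, $(y_{n_k})_{k\in\mathbb{N}}$, that converges (strongly) to the global minimizer $u_1$ in X:

\[
\begin{aligned}
&\lim_{k\to\infty} B(y_{n_k}) = \lambda_1 = B(u_1) \ \text{ and}\\
&\lim_{k\to\infty} y_{n_k} = u_1 \ \text{ in } X.
\end{aligned} \tag{3.1.31}
\]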

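Proof. Since B is positive definite, the infimum $\lambda_1$ is nonnegative in (3.1.29)$_3$, and any minimizing sequence $(y_n)_{n\in\mathbb{N}} \subset X \cap \{K(y) = 1\}$ is bounded in X. The weak sequential compactness (3.1.13) implies the existence of a subsequence $(y_{n_k})_{k\in\mathbb{N}}$ such that $\operatorname{w-lim}_{k\to\infty} y_{n_k} = u_1 \in X$. Furthermore

\[
\begin{aligned}
&B(y_{n_k}) = B(y_{n_k},y_{n_k}) = B(y_{n_k} - u_1, y_{n_k} - u_1) + 2B(y_{n_k},u_1) - B(u_1,u_1)\\
&\qquad\quad\ \ge 2B(y_{n_k},u_1) - B(u_1,u_1), \ \text{ and thus,}\\
&\liminf_{k\to\infty} B(y_{n_k}) \ge \lim_{k\to\infty} 2B(y_{n_k},u_1) - B(u_1,u_1) = B(u_1),
\end{aligned} \tag{3.1.32}
\]

where we use $B(\cdot,u_1) \in X'$, cf. the definition (3.1.7) of weak convergence. By (3.1.32), the functional B is weakly sequentially lower semi-continuous. The hypothesis (3.1.28)$_2$ on K implies $1 = \lim_{k\to\infty} K(y_{n_k}) = K(u_1)$, and therefore

\[
\begin{aligned}
&\lambda_1 = \lim_{k\to\infty} B(y_{n_k}) \ge B(u_1) \ge \lambda_1, \ \text{ and thus,}\\
&\lim_{k\to\infty} B(y_{n_k}) = B(u_1) = \lambda_1 = \inf\{B(y) \mid y \in X,\ K(y) = 1\}.
\end{aligned} \tag{3.1.33}
\]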



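Since $B(u_1) = B(u_1,u_1) = \lambda_1$, we have $\lambda_1 > 0$; otherwise $\lambda_1 = 0$ and the positive definiteness of B would imply $u_1 = 0$, contradicting $K(u_1) = 1$, cf. (3.1.28)$_3$. Finally,

\[
\begin{aligned}
&\lim_{k\to\infty} B(y_{n_k} - u_1, y_{n_k} - u_1)\\
&= \lim_{k\to\infty} B(y_{n_k},y_{n_k}) - 2\lim_{k\to\infty} B(y_{n_k},u_1) + B(u_1,u_1)\\
&= B(u_1) - 2B(u_1) + B(u_1) = 0,
\end{aligned} \tag{3.1.34}
\]

which proves, by the positive definiteness of B, the (strong) convergence of $(y_{n_k})_{k\in\mathbb{N}}$ to $u_1$ in X.

For any $y \in X$, $y \ne 0$, we have K(y) > 0 and $K(y/\sqrt{K(y)}) = 1$. Therefore

\[
\begin{aligned}
&B(y/\sqrt{K(y)}) = B(y,y)/K(y,y) \ge \lambda_1, \ \text{ or}\\
&B(y,y) - \lambda_1K(y,y) \ge 0 \ \text{ for all } y \in X.
\end{aligned} \tag{3.1.35}
\]

In other words, for all $h \in X$ and $t \in \mathbb{R}$,

\[
\begin{aligned}
&B(u_1 + th) - \lambda_1K(u_1 + th)\\
&= B(u_1) - \lambda_1K(u_1) + 2t(B(u_1,h) - \lambda_1K(u_1,h)) + t^2(B(h) - \lambda_1K(h))\\
&\ge 0 = B(u_1) - \lambda_1 = B(u_1) - \lambda_1K(u_1).
\end{aligned} \tag{3.1.36}
\]

Since the expression $B(u_1 + th) - \lambda_1K(u_1 + th)$ is minimal at t = 0, its derivative with respect to t at t = 0 vanishes, i.e.,

\[
2B(u_1,h) - \lambda_1 2K(u_1,h) = \delta B(u_1)h - \lambda_1\delta K(u_1)h = 0 \ \text{ for all } h \in X. \tag{3.1.37}
\]

Thus we rediscover Lagrange’s multiplier rule in a Hilbert space. □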

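Next we prove the existence of infinitely many linearly independent weak eigenvectors. We assume the hypotheses stated in Proposition 3.1.3.

Proposition 3.1.4. The weak eigenvalue problem

\[
B(u,h) = \lambda K(u,h) \ \text{ for all } h \in X \tag{3.1.38}
\]

has infinitely many linearly independent eigenvectors $u_n \in X$ with eigenvalues $\lambda_n$, $n \in \mathbb{N}$, satisfying

\[
\begin{aligned}
&K(u_n,u_m) = \delta_{nm} = \begin{cases} 1 & \text{for } n = m,\\ 0 & \text{for } n \ne m, \end{cases}\\
&0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \le \cdots\\
&\text{and } \lim_{n\to\infty} \lambda_n = +\infty.
\end{aligned} \tag{3.1.39}
\]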

Proof. The proof is by induction: The base clause is the statement of Proposition 3.1.3, and for the induction hypothesis we make the following statements: Any k “K-orthonormal” elements $u_1, \dots, u_k$ in X fulfilling (3.1.39)$_1$ are linearly independent and for

\[
\begin{aligned}
&U_k = \operatorname{span}[u_1, \dots, u_k], \quad \dim U_k = k,\\
&W_k = \{y \in X \mid K(y,u_i) = 0 \text{ for } i = 1, \dots, k\},\\
&\text{we have } X = U_k \oplus W_k.
\end{aligned} \tag{3.1.40}
\]

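Indeed, for any $y \in X$, we have

\[
\begin{aligned}
&y = \sum_{i=1}^{k} K(y,u_i)u_i + y - \sum_{i=1}^{k} K(y,u_i)u_i, \ \text{ and}\\
&y - \sum_{i=1}^{k} K(y,u_i)u_i \in W_k.
\end{aligned} \tag{3.1.41}
\]

Since X is infinite-dimensional, $\dim W_k = \infty$ for any $k \in \mathbb{N}$, and for any system of K-orthonormal vectors definition (3.1.40)$_2$ yields

\[
X = W_0 \supset W_1 \supset W_2 \supset \cdots \supset W_k \supset W_{k+1} \supset \cdots, \tag{3.1.42}
\]

where each space $W_k$ is closed, and therefore it is a Hilbert space. Now we are ready to formulate the induction hypothesis:

\[
\begin{aligned}
&\text{For } k = 1, \dots, n \text{ there exist weak eigenvectors } u_k \in W_{k-1},\\
&\text{which are } K\text{-orthonormal in the sense of (3.1.39)}_1,\\
&\text{with corresponding eigenvalues } 0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \text{ satisfying}\\
&B(u_k) = \inf\{B(y) \mid y \in W_{k-1},\ K(y) = 1\} = \lambda_k.
\end{aligned} \tag{3.1.43}
\]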

The arguments in the proof of Proposition 3.1.3, in particular, the arguments for (3.1.33), guarantee the existence of some $u_{n+1} \in W_n$ satisfying

\[
B(u_{n+1}) = \inf\{B(y) \mid y \in W_n,\ K(y) = 1\} = \lambda_{n+1}, \tag{3.1.44}
\]

due to the fact that the weak limit $u_{n+1}$ of a minimizing sequence in $W_n \cap \{K(y) = 1\}$ is in $W_n$ and in the set $\{K(y) = 1\}$ as well. The first statement follows from the definition of $W_n$, from the fact that $K(\cdot,u_i) \in X'$, and from the definition of weak convergence. The second statement follows from the assumed properties of K, cf. (3.1.28). Since $W_{n-1} \supset W_n$,

\[
\lambda_n \le \lambda_{n+1}, \tag{3.1.45}
\]

and the arguments for (3.1.34) show that the chosen subsequence of the minimizing sequence converges strongly to $u_{n+1}$ in X.

The induction hypothesis (3.1.43)$_4$ and (3.1.44) together with the arguments for (3.1.35)–(3.1.37) imply



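\[
\begin{aligned}
&B(u_k,h) = \lambda_kK(u_k,h) \ \text{ for all } h \in W_{k-1},\ k = 1, \dots, n,\\
&B(u_{n+1},h) = \lambda_{n+1}K(u_{n+1},h) \ \text{ for all } h \in W_n.
\end{aligned} \tag{3.1.46}
\]

By the definition of $W_n \subset W_{k-1}$ for $k = 1, \dots, n$, (3.1.46)$_1$ implies

\[
B(u_k,u_{n+1}) = B(u_{n+1},u_k) = 0 \ \text{ for } k = 1, \dots, n, \tag{3.1.47}
\]

and for any $h \in X$, (3.1.41)$_1$ and (3.1.46)$_2$ yield

\[
\begin{aligned}
B(u_{n+1},h) &= B\Big(u_{n+1},\, h - \sum_{i=1}^{n} K(h,u_i)u_i\Big)\\
&= \lambda_{n+1}K\Big(u_{n+1},\, h - \sum_{i=1}^{n} K(h,u_i)u_i\Big) = \lambda_{n+1}K(u_{n+1},h),
\end{aligned} \tag{3.1.48}
\]

i.e., $u_{n+1}$ is a weak eigenvector with eigenvalue $\lambda_{n+1}$. By their construction, the eigenvectors $u_1, \dots, u_n, u_{n+1}$ are K-orthonormal, which finishes the induction step.

Assume that the sequence of eigenvalues is bounded, i.e.,

\[
0 < B(u_n,u_n) = B(u_n) = \lambda_n \le C \ \text{ for all } n \in \mathbb{N}. \tag{3.1.49}
\]

The positive definiteness of B implies then that the sequence $(u_n)_{n\in\mathbb{N}}$ is bounded in X, and thus there is a subsequence $(u_{n_k})_{k\in\mathbb{N}}$ converging weakly in X, i.e., $\operatorname{w-lim}_{k\to\infty} u_{n_k} = u_0 \in X$. The hypotheses (3.1.28) on K finally yield

\[
\begin{aligned}
&1 = \lim_{k\to\infty} K(u_{n_k}) = K(u_0) = 1,\\
&\lim_{k\to\infty} K(u_{n_k} - u_0) = \lim_{k\to\infty} \big(K(u_{n_k}) - 2K(u_{n_k},u_0) + K(u_0)\big) = 0,\\
&K(u_{n_k} - u_{n_l}) = K(u_{n_k}) - 2K(u_{n_k},u_{n_l}) + K(u_{n_l}) = 2,\\
&\lim_{l\to\infty} K(u_{n_k} - u_{n_l}) = K(u_{n_k} - u_0) = 2,
\end{aligned} \tag{3.1.50}
\]

which is contradictory. Therefore (3.1.39)$_3$ is true. □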

Corollary 3.1.1. The geometric multiplicity of each eigenvalue $\lambda_n$ is finite, i.e., the dimension of the eigenspace spanned by the eigenvectors with eigenvalue $\lambda_n$ is finite.

Proof. All linearly independent eigenvectors with eigenvalue $\lambda_n$ can be K-orthonormalized, and the assumption of infinitely many is contradictory as shown in (3.1.49) and (3.1.50). □

Proposition 3.1.5. The system of all K-orthonormal weak eigenvectors (3.1.39)$_1$ is complete or forms a Schauder basis in X (cf. Definition 3.2.5): Any $y \in X$ can be developed into a “Fourier series”

\[
y = \sum_{n=1}^{\infty} c_nu_n \ \text{ where } c_n = K(y,u_n), \ \text{ which converges in } X. \tag{3.1.51}
\]

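Proof. For arbitrary $y \in X$, let $y_N = \sum_{n=1}^{N} c_nu_n$, where the coefficients $c_n$ are given in (3.1.51). We show that $\lim_{N\to\infty} y_N = y$ in X. Due to K-orthonormality,

\[
K(y - y_N,u_n) = 0 \ \text{ for } n = 1, \dots, N \ \text{ or } \ y - y_N \in W_N, \tag{3.1.52}
\]

cf. (3.1.40)$_2$. For any vector $y \in W_N$, $y \ne 0$, we have $K(y/\sqrt{K(y)}) = K(y)/K(y) = 1$, and by (3.1.43)$_4$ or (3.1.44),

\[
\begin{aligned}
&B(y/\sqrt{K(y)}) = B(y)/K(y) \ge \lambda_{N+1}, \ \text{ or}\\
&B(y) \ge \lambda_{N+1}K(y) \ \text{ for all } y \in W_N.
\end{aligned} \tag{3.1.53}
\]

In particular,

\[
B(y - y_N) \ge \lambda_{N+1}K(y - y_N). \tag{3.1.54}
\]

Using

\[
B(u_n,u_m) = \lambda_nK(u_n,u_m) = \lambda_n\delta_{nm}, \tag{3.1.55}
\]

one obtains

\[
\begin{aligned}
&K(y - y_N) = K(y) - \sum_{n=1}^{N} c_n^2 = K(y) - K(y_N),\\
&B(y - y_N) = B(y) - \sum_{n=1}^{N} \lambda_nc_n^2 = B(y) - B(y_N).
\end{aligned} \tag{3.1.56}
\]

By (3.1.54), $\lambda_{N+1} > 0$, and $B(y_N) \ge 0$,

\[
\begin{aligned}
&0 \le K(y - y_N) \le \frac{1}{\lambda_{N+1}}B(y - y_N) \le \frac{1}{\lambda_{N+1}}B(y), \ \text{ and}\\
&\lim_{N\to\infty} K(y - y_N) = 0,
\end{aligned} \tag{3.1.57}
\]

where we use (3.1.39)$_3$. Therefore, in view of (3.1.56)$_1$,

\[
K(y) = \lim_{N\to\infty} \sum_{n=1}^{N} c_n^2 = \sum_{n=1}^{\infty} c_n^2 < \infty. \tag{3.1.58}
\]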

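Formula (3.1.56)$_2$ implies

\[
\begin{aligned}
&0 \le B(y - y_N) = B(y) - \sum_{n=1}^{N} \lambda_nc_n^2, \ \text{ or}\\
&\lim_{N\to\infty} \sum_{n=1}^{N} \lambda_nc_n^2 = \sum_{n=1}^{\infty} \lambda_nc_n^2 \le B(y),
\end{aligned} \tag{3.1.59}
\]

since $\lambda_n > 0$ for all $n \in \mathbb{N}$. Convergence of this series means that the partial sums form a Cauchy sequence:

\[
B(y_N - y_M) = \sum_{n=M+1}^{N} \lambda_nc_n^2 < \varepsilon \ \text{ provided } N > M \ge N_0(\varepsilon). \tag{3.1.60}
\]

The positive definiteness of B then yields

\[
C_2\|y_N - y_M\|^2 \le B(y_N - y_M) < \varepsilon, \ \text{ provided } N > M \ge N_0(\varepsilon), \tag{3.1.61}
\]

and by completeness of the Hilbert space X, any Cauchy sequence converges to some $\tilde y \in X$, i.e.,

\[
\lim_{N\to\infty} y_N = \tilde y \ \text{ in } X. \tag{3.1.62}
\]

Finally, the continuity of K and (3.1.57)$_2$ yield

\[
0 = \lim_{N\to\infty} K(y - y_N) = K(y - \tilde y), \ \text{ or } \ y = \tilde y, \tag{3.1.63}
\]

where we used hypothesis (3.1.28)$_4$ on K. □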
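A two-dimensional instance makes the expansion (3.1.51) and the identities (3.1.56) and (3.1.58) concrete. The sketch assumes $X = \mathbb{R}^2$ with $B(y,z) = y^{T}Az$, A = [[2, 1], [1, 3]], and $K(y,z) = y^{T}Mz$, M = diag(1, 2); these matrices are illustrative, not from the text. For them the weak eigenvalue problem has the eigenvalues 1 and 2.5 with K-orthonormal eigenvectors $(1,-1)/\sqrt{3}$ and $(2,1)/\sqrt{6}$.

```python
import math

def B(y, z):
    """Bilinear form y^T A z with A = [[2, 1], [1, 3]]."""
    return 2.0 * y[0] * z[0] + y[0] * z[1] + y[1] * z[0] + 3.0 * y[1] * z[1]

def K(y, z):
    """Bilinear form y^T M z with M = diag(1, 2)."""
    return y[0] * z[0] + 2.0 * y[1] * z[1]

LAMBDAS = (1.0, 2.5)                                   # roots of det(A - lambda * M) = 0
U = ((1.0 / math.sqrt(3.0), -1.0 / math.sqrt(3.0)),    # eigenvector for lambda = 1
     (2.0 / math.sqrt(6.0), 1.0 / math.sqrt(6.0)))     # eigenvector for lambda = 2.5

def expand(y):
    """Fourier coefficients c_n = K(y, u_n) and the reconstructed vector sum c_n u_n."""
    coeffs = [K(y, u) for u in U]
    recon = [sum(c * u[i] for c, u in zip(coeffs, U)) for i in range(2)]
    return coeffs, recon

y = (1.0, 0.0)
coeffs, recon = expand(y)
k_sum = sum(c * c for c in coeffs)                           # equals K(y), cf. (3.1.58)
b_sum = sum(lam * c * c for lam, c in zip(LAMBDAS, coeffs))  # equals B(y) in finite dimensions
```

In finite dimensions the series is a finite sum, so the inequality in (3.1.59) becomes an equality.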

Proposition 3.1.6. The recursive construction described in Proposition 3.1.4 provides all weak eigenvectors (up to linear combinations in case of geometric multiplicities bigger than one) and all eigenvalues of the weak eigenvalue problem (3.1.38).

The proof is set in Exercise 3.1.1.

The positive definiteness of B (3.1.17)$_3$ can be replaced by a “K-coercivity”:

\[
B(y,y) \ge C_2\|y\|^2 - c_2K(y,y) \ \text{ for all } y \in X. \tag{3.1.64}
\]

In this case, the symmetric and continuous bilinear form

\[
\tilde B(y,z) = B(y,z) + c_2K(y,z) \ \text{ for } y,z \in X, \tag{3.1.65}
\]

is positive definite, and the weak eigenvectors $u_n$ with eigenvalues $\tilde\lambda_n$, i.e.,

\[
\begin{aligned}
&\tilde B(u_n,h) = \tilde\lambda_nK(u_n,h), \ \text{ fulfill}\\
&B(u_n,h) = (\tilde\lambda_n - c_2)K(u_n,h) \ \text{ for all } h \in X,\ n \in \mathbb{N}.
\end{aligned} \tag{3.1.66}
\]


The “spectrum” is shifted by the constant $c_2$ such that not all eigenvalues $\lambda_n = \tilde\lambda_n - c_2$ are necessarily positive. The eigenvectors $u_n$ for B and $\tilde B$ are the same, and therefore Proposition 3.1.5 is valid also if nonpositive eigenvalues are present.

The nth eigenvalue of the weak eigenvalue problem (3.1.38) is determined as follows, cf. (3.1.43), (3.1.44):

\[
\begin{aligned}
&\lambda_n = \min\Big\{\frac{B(y)}{K(y)} \ \Big|\ y \in W_{n-1},\ y \ne 0\Big\}, \ \text{ or}\\
&\lambda_n = \min\Big\{\frac{B(y)}{K(y)} \ \Big|\ y \in X,\ y \ne 0,\ K(y,u_i) = 0,\ i = 1, \dots, n-1\Big\},\\
&\text{where } u_1, \dots, u_{n-1} \text{ are the first } n-1 \text{ eigenvectors.}
\end{aligned} \tag{3.1.67}
\]

The minimizing vector is the nth eigenvector $u_n$. The quotient B(y)/K(y) is called Rayleigh quotient.

The characterization (3.1.67) of the nth eigenvalue $\lambda_n$ uses the first n−1 eigenvectors $u_1, \dots, u_{n-1}$. The so-called Minimax Principle of Courant, Fischer, and Weyl characterizes the nth eigenvalue without using the first n−1 eigenvectors. Apart from its mathematical elegance, this principle has relevant applications as we see in Paragraph 3.3.
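The recursive characterization (3.1.67) can be tried out in two dimensions. The sketch assumes $X = \mathbb{R}^2$, $B(y,z) = y^{T}Az$ with A = [[2, 1], [1, 3]], and $K(y,z) = y^{T}Mz$ with M = diag(1, 2), for which the eigenvalues are 1 and 2.5; the matrices, the grid search over directions, and the helper names are illustrative assumptions. Since the Rayleigh quotient is homogeneous of degree zero, it suffices to scan unit directions.

```python
import math

def B(y, z):
    """Bilinear form y^T A z with A = [[2, 1], [1, 3]]."""
    return 2.0 * y[0] * z[0] + y[0] * z[1] + y[1] * z[0] + 3.0 * y[1] * z[1]

def K(y, z):
    """Bilinear form y^T M z with M = diag(1, 2)."""
    return y[0] * z[0] + 2.0 * y[1] * z[1]

def rayleigh(y):
    return B(y, y) / K(y, y)

def first_eigenpair(samples=100000):
    """Minimize the Rayleigh quotient over directions (cos t, sin t), 0 <= t < pi."""
    best = min(
        ((math.cos(math.pi * i / samples), math.sin(math.pi * i / samples))
         for i in range(samples)),
        key=rayleigh,
    )
    scale = math.sqrt(K(best, best))            # normalize so that K(u1) = 1
    return rayleigh(best), (best[0] / scale, best[1] / scale)

lambda1, u1 = first_eigenpair()

# In two dimensions W_1 = {y : K(y, u1) = 0} is the line spanned by w,
# so the minimum over W_1 is simply the quotient evaluated at w.
w = (-2.0 * u1[1], u1[0])
scale = math.sqrt(K(w, w))
u2 = (w[0] / scale, w[1] / scale)
lambda2 = rayleigh(u2)
```

The computed pair $(u_1, u_2)$ is K-orthonormal and also B-orthogonal, as in (3.1.47).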

Proposition 3.1.7. Let $v_1, \dots, v_{n-1}$ ($n \ge 2$) be any vectors in a real Hilbert space X, which define a closed subspace of X:

\[
V(v_1, \dots, v_{n-1}) = \{y \in X \mid K(y,v_i) = 0 \text{ for } i = 1, \dots, n-1\}. \tag{3.1.68}
\]

In $V(v_1, \dots, v_{n-1})$ the Rayleigh quotient attains its minimum:

\[
d(v_1, \dots, v_{n-1}) = \min\Big\{\frac{B(y)}{K(y)} \ \Big|\ y \in V(v_1, \dots, v_{n-1}),\ y \ne 0\Big\}. \tag{3.1.69}
\]

Then the nth eigenvalue $\lambda_n$ of the weak eigenvalue problem (3.1.38) is given by

\[
\lambda_n = \max_{v_1, \dots, v_{n-1} \in X} \{d(v_1, \dots, v_{n-1})\}. \tag{3.1.70}
\]

Proof. The arguments for (3.1.44) prove that the Rayleigh quotient attains its minimum in $V(v_1, \dots, v_{n-1})$. We determine the coefficients of the linear combination $y = \alpha_1u_1 + \cdots + \alpha_nu_n \in X$ with the first n eigenvectors such that $y \in V(v_1, \dots, v_{n-1})$, i.e.,

\[
\begin{aligned}
&K(y,v_i) = \sum_{k=1}^{n} \alpha_kK(u_k,v_i) = 0 \ \text{ for } i = 1, \dots, n-1,\\
&\text{and } \sum_{k=1}^{n} \alpha_k^2 = 1.
\end{aligned} \tag{3.1.71}
\]



This is possible since the homogeneous linear system (3.1.71)$_1$ with n−1 equations for n unknowns has nontrivial solutions $(\alpha_1, \dots, \alpha_n) \in \mathbb{R}^n$, which can be normalized. By the K-orthonormality of the eigenvectors, cf. (3.1.39)$_1$, and by (3.1.55), we obtain

\[
\begin{aligned}
&K(y) = \sum_{k=1}^{n} \alpha_k^2 = 1 \ \text{ and}\\
&B(y) = \sum_{k=1}^{n} \lambda_k\alpha_k^2 \le \lambda_n\sum_{k=1}^{n} \alpha_k^2 = \lambda_n,
\end{aligned} \tag{3.1.72}
\]

where we use also (3.1.39)$_2$. Therefore the minimum $d(v_1, \dots, v_{n-1}) \le \lambda_n$ for any choice of the vectors $v_1, \dots, v_{n-1} \in X$. In view of (3.1.67), $d(u_1, \dots, u_{n-1}) = \lambda_n$, which proves (3.1.70). □

This concludes our results on direct methods for bilinear forms on a Hilbert space. Applications are given in Paragraph 3.3. However, we cannot apply Propositions 3.1.1, 3.1.2–3.1.4 to functionals in the setting of the first two chapters since the spaces $C^1[a,b]$ and $C^{1,pw}[a,b]$ are not reflexive and a fortiori not Hilbert spaces.

Therefore, for the direct methods of this chapter, the functionals have to be extended to a Hilbert space. In Paragraph 3.3 we investigate also the regularity that minimizers must have for the variational problem and for the Euler-Lagrange equation. This last step for a satisfactory solution of a variational problem is called “regularity theory.” For this theory, like for the existence of a minimizer, a partial convexity, called “ellipticity,” of the Lagrange function is crucial, cf. Proposition 1.11.4.

Remark. All results on abstract quadratic functionals following Definition 3.1.1 are not only applicable to Sturm-Liouville boundary and eigenvalue problems as elaborated in Paragraph 3.3 (cf. (3.3.1) and (3.3.4)) but they apply also to elliptic boundary and eigenvalue problems over higher dimensional bounded domains, see for instance the standard reference: R. Courant and D. Hilbert, “Methods of Mathematical Physics, Vol. 1,” Interscience Publishers, New York, 1953.
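The minimax principle (3.1.70) can be illustrated for n = 2 in the plane. The sketch assumes $X = \mathbb{R}^2$, $B(y,z) = y^{T}Az$ with A = [[2, 1], [1, 3]], and $K(y,z) = y^{T}Mz$ with M = diag(1, 2), whose second eigenvalue is 2.5; the matrices and the grid over the directions of $v_1$ are illustrative assumptions. In two dimensions $V(v_1)$ is a line, so the minimum $d(v_1)$ is the Rayleigh quotient of any nonzero vector on that line.

```python
import math

def B(y, z):
    """Bilinear form y^T A z with A = [[2, 1], [1, 3]]."""
    return 2.0 * y[0] * z[0] + y[0] * z[1] + y[1] * z[0] + 3.0 * y[1] * z[1]

def K(y, z):
    """Bilinear form y^T M z with M = diag(1, 2)."""
    return y[0] * z[0] + 2.0 * y[1] * z[1]

def d(v):
    """Minimum of the Rayleigh quotient on V(v) = {y : K(y, v) = 0}, cf. (3.1.69)."""
    w = (-2.0 * v[1], v[0])          # spans V(v), since K(w, v) = 0 for M = diag(1, 2)
    return B(w, w) / K(w, w)

SAMPLES = 20000                      # grid over the directions of v_1 (assumption)
values = [d((math.cos(math.pi * i / SAMPLES), math.sin(math.pi * i / SAMPLES)))
          for i in range(SAMPLES)]
lambda2 = max(values)                # maximum over v_1 of the constrained minimum
```

No choice of $v_1$ pushes $d(v_1)$ above the second eigenvalue, and the maximum is reached when $v_1$ points along the first eigenvector.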

Exercises

3.1.1. Prove Proposition 3.1.6.

Hint: Show for any eigenvector u of (3.1.38) with eigenvalue $\lambda$, that $\lambda = \lambda_{n_0}$ for some $n_0 \in \mathbb{N}$ and that u is a linear combination of all constructed linearly independent eigenvectors with eigenvalue $\lambda_{n_0}$.



3.2 An Explicit Performance of the Direct Method in a Hilbert Space

Around the turn of the last century, Hilbert justified Dirichlet’s principle by the direct method. However, a systematic application to variational problems goes back to L. Tonelli (1885–1946) in the years 1910–1930. We confine ourselves to the case that the reflexive space is a Hilbert space.

We assume that the linear space

$$L^2(a,b) = \Big\{ y \,\Big|\, y : (a,b) \to \mathbb{R} \text{ is measurable},\ \int_a^b y^2\,dx < \infty \Big\} \tag{3.2.1}$$

is known from a course on calculus. Otherwise its definition and properties are found, e.g., in [20]. With the scalar product and the norm

$$(y,z)_{0,2} = \int_a^b yz\,dx, \qquad \|y\|_{0,2} = \sqrt{(y,y)_{0,2}}, \tag{3.2.2}$$

$L^2(a,b)$ is a Hilbert space (Riesz-Fischer theorem). In this space, $y = 0$ means that $y(x) = 0$ for “almost all” $x \in (a,b)$, i.e., for all $x \in (a,b)$ except for a set of measure zero. The integral is the Lebesgue integral, and the measure is the Lebesgue measure (H. Lebesgue, 1875–1941).

The functional $J$ in (1.1.1) contains the derivative of $y$, and since the classical derivative is not adequate for the definition of a Hilbert space, a “weak” or “distributional” derivative is defined as follows:

Definition 3.2.1. Assume that for $y, z \in L^2(a,b)$, we have

$$\int_a^b zh + yh'\,dx = 0 \quad\text{for all } h \in C_0^\infty(a,b). \tag{3.2.3}$$

Then $y$ is “weakly differentiable” with weak derivative $y' = z$.

In view of Lemma 3.2.1, the weak derivative $y' = z \in L^2(a,b)$ of $y \in L^2(a,b)$ is unique (as a function in $L^2(a,b)$), and the formula of integration by parts (1.3.5) shows that the classical derivative, even if it exists only piecewise, coincides with the weak derivative (almost everywhere). With this notion we can define a Hilbert space as follows:

Definition 3.2.2. $W^{1,2}(a,b) = \{ y \mid y \in L^2(a,b) \text{ and } y' \in L^2(a,b) \text{ exists according to (3.2.3)} \}$ is endowed with the scalar product and the norm

$$(y,\tilde{y})_{1,2} = (y,\tilde{y})_{0,2} + (y',\tilde{y}')_{0,2}, \qquad \|y\|_{1,2} = \sqrt{(y,y)_{1,2}}. \tag{3.2.4}$$




The space $W^{1,2}(a,b)$ is complete, and therefore it is a Hilbert space, which follows easily from the completeness of $L^2(a,b)$, cf. Exercise 3.2.1. $W^{1,2}(a,b)$ is called a “Sobolev space,” named after S. L. Sobolev (1908–1989). Details about Sobolev spaces are found in [5], [6], [9], [25], and in many books on functional analysis, calculus of variations, and partial differential equations. We give here only those properties that we need. Evidently, $C^{1,pw}[a,b] \subset W^{1,2}(a,b)$.
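The defining identity (3.2.3) can be checked numerically for a piecewise smooth function. The following is a minimal sketch, not part of the text: it assumes the interval $(-1,1)$, the function $y(x) = |x|$ with candidate weak derivative $z(x) = \operatorname{sign}(x)$, and one hypothetical test function $h$ (a shifted bump); all names in the code are illustrative choices.

```python
import math

def bump(u):
    """Standard C-infinity bump function, supported in (-1, 1)."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def bump_prime(u):
    """Derivative of bump with respect to u."""
    if abs(u) >= 1.0:
        return 0.0
    return bump(u) * (-2.0 * u) / (1.0 - u * u) ** 2

# Hypothetical test function h, supported in (-0.5, 0.9), a subset of (-1, 1).
# It is deliberately not symmetric about 0, so nothing cancels by symmetry.
CENTER, RADIUS = 0.2, 0.7

def h(x):
    return bump((x - CENTER) / RADIUS)

def h_prime(x):
    return bump_prime((x - CENTER) / RADIUS) / RADIUS

def y(x):
    return abs(x)                      # piecewise C^1, kink at x = 0

def z(x):
    return 1.0 if x > 0 else -1.0      # candidate weak derivative of y

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

def integrand(x):
    return z(x) * h(x) + y(x) * h_prime(x)

# Split at the kink so that each piece has a smooth integrand.
weak_residual = midpoint(integrand, -1.0, 0.0) + midpoint(integrand, 0.0, 1.0)
zh_term = (midpoint(lambda x: z(x) * h(x), -1.0, 0.0)
           + midpoint(lambda x: z(x) * h(x), 0.0, 1.0))

print(f"integral of z*h alone:        {zh_term:.6f}")
print(f"integral of z*h + y*h' (3.2.3): {weak_residual:.2e}")
```

The first printed value is clearly nonzero, while the full expression in (3.2.3) vanishes up to quadrature error. A single test function proves nothing, of course; the sketch only illustrates what the definition asks for.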

The next lemma extends Lemma 1.3.1.

Lemma 3.2.1. If for $y \in L^2(a,b)$

$$\int_a^b yh\,dx = 0 \quad\text{for all } h \in C_0^\infty(a,b), \tag{3.2.5}$$

then $y(x) = 0$ for almost all $x \in (a,b)$.

Proof. We make use of the fact that the space of “test functions” $C_0^\infty(a,b)$ is dense in $L^2(a,b)$, i.e., for any $y \in L^2(a,b)$ and any $\varepsilon > 0$ there exists an $h \in C_0^\infty(a,b)$ such that $\|y-h\|_{0,2} < \varepsilon$. Then (3.2.5) implies, using the Cauchy-Schwarz inequality,

$$\|y\|_{0,2}^2 = (y,y)_{0,2} = (y-h,y)_{0,2} \le \|y-h\|_{0,2}\,\|y\|_{0,2}, \quad\text{and hence,}\quad \|y\|_{0,2} < \varepsilon. \tag{3.2.6}$$

Since $\varepsilon > 0$ is arbitrary, this implies $\|y\|_{0,2} = 0$, which is the claim. □

Remark. All functions that are Lebesgue integrable are approximated by “simple” or “step functions.” Evidently, in $L^2(a,b)$, a step function is approximated by test functions.

Lemma 1.3.2 can be extended as well.

Lemma 3.2.2. If for $y \in L^2(a,b)$

$$\int_a^b yh'\,dx = 0 \quad\text{for all } h \in C_0^\infty(a,b), \tag{3.2.7}$$

then $y(x) = c$ for almost all $x \in (a,b)$.

Proof. 1) We prove first: If

$$\int_a^b yh\,dx = 0 \quad\text{for all } h \in C_0^\infty(a,b) \text{ satisfying } \int_a^b h\,dx = 0, \tag{3.2.8}$$

then $y(x) = c$ for almost all $x \in (a,b)$.

Choose any $g \in C_0^\infty(a,b)$ and some fixed $f \in C_0^\infty(a,b)$ satisfying $\int_a^b f\,dx = 1$. Then

$$h = g - \Big(\int_a^b g\,dx\Big) f \in C_0^\infty(a,b) \quad\text{satisfies}\quad \int_a^b h\,dx = 0, \tag{3.2.9}$$

and therefore by (3.2.8),

$$\begin{aligned} 0 = \int_a^b yh\,dx &= \int_a^b yg\,dx - \int_a^b g\,dx \int_a^b yf\,dx \\ &= \int_a^b \Big( y - \int_a^b yf\,dx \Big) g\,dx \quad\text{for all } g \in C_0^\infty(a,b), \end{aligned} \tag{3.2.10}$$

which implies by Lemma 3.2.1 that $y(x) = \int_a^b yf\,dx = c$ for almost all $x \in (a,b)$.

2) We prove the general case: Choose any $g \in C_0^\infty(a,b)$ satisfying $\int_a^b g\,dx = 0$. Define $h(x) = \int_a^x g\,ds$. Then $h \in C_0^\infty(a,b)$, $h' = g$, and by (3.2.7) $\int_a^b yg\,dx = 0$. Then case 1) applies, proving the lemma. □

The fundamental theorem of calculus also holds for the weak derivative and the Lebesgue integral:

Lemma 3.2.3. For $y \in W^{1,2}(a,b)$, it follows that

$$y(x_2) - y(x_1) = \int_{x_1}^{x_2} y'\,dx \quad\text{for all } a \le x_1 < x_2 \le b. \tag{3.2.11}$$

Proof. According to Definition 3.2.2 of $W^{1,2}(a,b)$, a function $y \in W^{1,2}(a,b) \subset L^2(a,b)$ can be modified on a set of measure zero, and in this sense such functions form an “equivalence class.” Therefore the pointwise equality in (3.2.11) seems to make no sense. In Lemma 3.2.4 it is shown that each function in $W^{1,2}(a,b)$ has a continuous representative, and formula (3.2.11) is valid for this representative.

Defining

$$z(x) = \int_a^x y'\,ds \quad\text{for } x \in [a,b], \tag{3.2.12}$$

the Cauchy-Schwarz inequality implies that $z$ is continuous on $[a,b]$, cf. (3.2.18). For an arbitrary $h \in C_0^\infty(a,b)$ Fubini’s theorem implies

$$\begin{aligned} \int_a^b zh'\,dx &= \int_a^b \int_a^x y'(s)\,h'(x)\,ds\,dx = \int_a^b \int_s^b y'(s)\,h'(x)\,dx\,ds \\ &= \int_a^b y'(-h)\,ds \quad\text{or}\quad \int_a^b y'h + zh'\,dx = 0. \end{aligned} \tag{3.2.13}$$

By (3.2.3),

$$\int_a^b y'h + yh'\,dx = 0 \quad\text{for all } h \in C_0^\infty(a,b), \tag{3.2.14}$$

which yields, in view of (3.2.13)$_2$,

$$\begin{aligned} &\int_a^b (z-y)h'\,dx = 0 \quad\text{for all } h \in C_0^\infty(a,b), \text{ or} \\ &z(x) - y(x) = c \quad\text{for almost all } x \in (a,b), \end{aligned} \tag{3.2.15}$$

due to Lemma 3.2.2. Choosing the continuous representative of $y$, namely $y(x) = z(x) - c$, we obtain $z(x) = y(x) - y(a)$, and finally,

$$\int_{x_1}^{x_2} y'\,dx = \int_a^{x_2} y'\,dx - \int_a^{x_1} y'\,dx = z(x_2) - z(x_1) = y(x_2) - y(x_1). \tag{3.2.16}$$

□

For the continuous representative of $y \in W^{1,2}(a,b)$, the following holds:

Lemma 3.2.4. If $y \in W^{1,2}(a,b)$, then $y$ is uniformly Hölder continuous with exponent 1/2 and the following estimates hold:

$$\begin{aligned} &|y(x_2) - y(x_1)| \le |x_2 - x_1|^{1/2}\,\|y'\|_{0,2} \quad\text{for all } a \le x_1 < x_2 \le b, \\ &\|y\|_0 = \max_{x\in[a,b]} |y(x)| \le \frac{1}{(b-a)^{1/2}}\,\|y\|_{0,2} + (b-a)^{1/2}\,\|y'\|_{0,2}. \end{aligned} \tag{3.2.17}$$

Proof. Relation (3.2.11) implies (3.2.17)$_1$, by virtue of the Cauchy-Schwarz inequality:

$$|y(x_2) - y(x_1)| \le \Big( \int_{x_1}^{x_2} 1\,dx \Big)^{1/2} \Big( \int_{x_1}^{x_2} (y')^2\,dx \Big)^{1/2} \le |x_2 - x_1|^{1/2}\,\|y'\|_{0,2}. \tag{3.2.18}$$

For a continuous function $y \in C[a,b]$, the mean value theorem for integrals reads

$$\int_a^b y\,dx = y(\xi)(b-a) \quad\text{for some } \xi \in (a,b). \tag{3.2.19}$$

Lemma 3.2.3 yields for any $x \in [a,b]$, using also estimates like (3.2.18),

$$\begin{aligned} y(x) &= y(\xi) + \int_\xi^x y'\,ds = \frac{1}{b-a} \int_a^b y\,dx + \int_\xi^x y'\,ds, \\ |y(x)| &\le \frac{(b-a)^{1/2}}{b-a} \Big( \int_a^b y^2\,dx \Big)^{1/2} + (b-a)^{1/2} \Big( \int_a^b (y')^2\,dx \Big)^{1/2}, \end{aligned} \tag{3.2.20}$$

where we agree on $\int_\xi^x y'\,ds = -\int_x^\xi y'\,ds$ if $x < \xi$. Since $x \in [a,b]$ is arbitrary, (3.2.17)$_2$ is proved. □
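Both estimates in (3.2.17) can be tested on a concrete function. The sketch below is an illustration under stated assumptions, not part of the text: it takes $(a,b) = (0,1)$ and $y(x) = x^{3/4}$, whose derivative $\tfrac34 x^{-1/4}$ is unbounded near 0 but square integrable (so $y$ lies in $W^{1,2}(0,1)$ although it is not in $C^{1,pw}[0,1]$), and it uses the closed-form values of the two $L^2$ norms.

```python
import math

A, B = 0.0, 1.0     # assumed interval (a, b) = (0, 1)
ALPHA = 0.75        # assumed example y(x) = x**ALPHA, with y' in L^2(0, 1)

def y(x):
    return x ** ALPHA

# Closed-form L^2 norms on (0, 1):
#   ||y||^2  = int x^(2*ALPHA) dx                 = 1 / (2*ALPHA + 1)
#   ||y'||^2 = ALPHA^2 * int x^(2*ALPHA - 2) dx   = ALPHA^2 / (2*ALPHA - 1)
norm_y = math.sqrt(1.0 / (2 * ALPHA + 1))
norm_dy = math.sqrt(ALPHA ** 2 / (2 * ALPHA - 1))

grid = [A + (B - A) * i / 200 for i in range(201)]

# Largest Hoelder quotient |y(x2) - y(x1)| / |x2 - x1|^(1/2) over all grid pairs.
holder_quotient = max(
    abs(y(x2) - y(x1)) / math.sqrt(x2 - x1)
    for i, x1 in enumerate(grid)
    for x2 in grid[i + 1:]
)

sup_norm = max(abs(y(x)) for x in grid)
sup_bound = norm_y / math.sqrt(B - A) + math.sqrt(B - A) * norm_dy

print(f"Hoelder quotient {holder_quotient:.4f} <= ||y'|| = {norm_dy:.4f}")
print(f"sup norm {sup_norm:.4f} <= bound in (3.2.17) = {sup_bound:.4f}")
```

The grid maximum only samples the supremum in (3.2.17)$_1$, so the check is necessary rather than sufficient; it is meant to make the two constants tangible.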

Uniformly Hölder continuous functions form a subspace of $C[a,b]$, named after O. Hölder (1859–1937):

Definition 3.2.3. $C^{1/2}[a,b] = \{ y \mid y : [a,b] \to \mathbb{R} \text{ is uniformly Hölder continuous with exponent } 1/2 \}$ is endowed with the norm

$$\|y\|_{1/2} = \|y\|_0 + \sup_{a \le x_1 < x_2 \le b} \frac{|y(x_2) - y(x_1)|}{|x_2 - x_1|^{1/2}}. \tag{3.2.21}$$

By the estimates of Lemma 3.2.4 and by the definition of the norm $\|\cdot\|_{1/2}$ in (3.2.21), we can state for the continuous representatives of functions in $W^{1,2}(a,b)$:

Proposition 3.2.1. The Sobolev space $W^{1,2}(a,b)$ is continuously embedded into the Hölder space $C^{1/2}[a,b]$, i.e.,

$$\begin{aligned} &W^{1,2}(a,b) \subset C^{1/2}[a,b] \quad\text{and} \\ &\|y\|_{1/2} \le c_0 \|y\|_{1,2} \quad\text{for all } y \in W^{1,2}(a,b), \end{aligned} \tag{3.2.22}$$

with a constant $c_0 > 0$.

The following result is due to C. Arzelà (1847–1912) and to G. Ascoli (1843–1896). It is crucial for the direct methods.

Proposition 3.2.2. Let a sequence $(y_n)_{n\in\mathbb{N}} \subset C[a,b]$ be bounded and equicontinuous, i.e., for all $n \in \mathbb{N}$,

$$\|y_n\|_0 = \max_{x\in[a,b]} |y_n(x)| \le C,$$

and for any $\varepsilon > 0$ there exists a $\delta(\varepsilon) > 0$ such that

$$|y_n(x_2) - y_n(x_1)| < \varepsilon, \quad\text{provided } |x_2 - x_1| < \delta(\varepsilon), \text{ for all } x_1, x_2 \in [a,b]. \tag{3.2.23}$$

Then $(y_n)_{n\in\mathbb{N}}$ contains a subsequence $(y_{n_k})_{k\in\mathbb{N}}$, which converges uniformly to a continuous function $y_0 \in C[a,b]$:

$$\lim_{k\to\infty} y_{n_k} = y_0 \ \text{in } C[a,b], \quad\text{or}\quad \lim_{k\to\infty} \|y_{n_k} - y_0\|_0 = 0. \tag{3.2.24}$$

A proof is given in the Appendix.

The definition (3.2.21) of the Hölder norm and the embedding (3.2.22) allow us to deduce from the Arzelà-Ascoli theorem the following proposition:



Proposition 3.2.3. Let a sequence $(y_n)_{n\in\mathbb{N}} \subset W^{1,2}(a,b)$ be bounded, i.e.,

$$\|y_n\|_{1,2} \le C \quad\text{for all } n \in \mathbb{N}. \tag{3.2.25}$$

Then $(y_n)_{n\in\mathbb{N}}$ contains a subsequence $(y_{n_k})_{k\in\mathbb{N}}$, which converges uniformly to a continuous function $y_0 \in C[a,b]$, cf. (3.2.24).

The boundedness of a sequence $(y_n)_{n\in\mathbb{N}} \subset W^{1,2}(a,b)$ implies, by (3.2.22)$_2$, that it is bounded in $C[a,b]$ and, according to the definition (3.2.4) of the norm $\|\cdot\|_{1,2}$, it also means that the sequence of the weak derivatives $(y_n')_{n\in\mathbb{N}}$ is bounded in $L^2(a,b)$. Since $L^2(a,b)$ is a Hilbert space, we can apply the weak sequential compactness (3.1.13), and we obtain:

Proposition 3.2.4. Let a sequence $(y_n)_{n\in\mathbb{N}} \subset W^{1,2}(a,b)$ be bounded. Then it contains a subsequence $(y_{n_k})_{k\in\mathbb{N}}$ having the following properties:

$$\begin{aligned} &\lim_{k\to\infty} y_{n_k} = y_0 \ \text{in } C[a,b] \quad\text{and} \\ &\text{w-}\!\lim_{k\to\infty} y_{n_k}' = z_0 \ \text{in } L^2(a,b). \end{aligned} \tag{3.2.26}$$

Furthermore, $y_0 \in W^{1,2}(a,b)$, and $y_0' = z_0$ is the weak derivative.

Proof. W.l.o.g. let $(y_{n_k})_{k\in\mathbb{N}}$ be a subsequence for which both the conclusion of the Arzelà-Ascoli theorem in Proposition 3.2.3 and the weak sequential compactness (3.1.14) hold. According to the definition (3.1.11) of weak convergence in a Hilbert space,

$$\lim_{k\to\infty} \int_a^b y_{n_k}' h\,dx = \int_a^b z_0 h\,dx \quad\text{for all } h \in C_0^\infty(a,b). \tag{3.2.27}$$

Since $y_{n_k}'$ is the weak derivative of $y_{n_k}$, the convergence (3.2.26)$_1$ implies

$$\lim_{k\to\infty} \int_a^b y_{n_k}' h\,dx = -\lim_{k\to\infty} \int_a^b y_{n_k} h'\,dx = -\int_a^b y_0 h'\,dx. \tag{3.2.28}$$

Equality of the limits in (3.2.27) and in (3.2.28) proves the statements of Proposition 3.2.4. □

Remark. A bounded sequence $(y_n)_{n\in\mathbb{N}}$ in the Hilbert space $W^{1,2}(a,b)$ contains, by virtue of the weak sequential compactness (3.1.13), a subsequence $(y_{n_k})_{k\in\mathbb{N}}$, which converges weakly in $W^{1,2}(a,b)$, i.e., $\lim_{k\to\infty} \big( (y_{n_k},z)_{0,2} + (y_{n_k}',z')_{0,2} \big) = (y_0,z)_{0,2} + (y_0',z')_{0,2}$ for all $z \in W^{1,2}(a,b)$. Evidently, (3.2.26) gives more information.
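The difference between the two modes of convergence in (3.2.26) is visible on a simple sequence. The following sketch is an illustration with assumed data, not part of the text: on $(0,1)$ it takes $y_n(x) = \sin(n\pi x)/(n\pi)$, which is bounded in $W^{1,2}(0,1)$, and pairs the derivatives $y_n'(x) = \cos(n\pi x)$ with one fixed, hypothetical $h(x) = x(1-x)$ in $L^2(0,1)$.

```python
import math

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

def sup_norm_y(n):
    """max |y_n| on [0, 1] for the assumed y_n(x) = sin(n*pi*x) / (n*pi)."""
    return 1.0 / (n * math.pi)

def dy_norm_sq(n):
    """||y_n'||_{0,2}^2 with y_n'(x) = cos(n*pi*x), by quadrature."""
    return midpoint(lambda x: math.cos(n * math.pi * x) ** 2, 0.0, 1.0)

def pairing(n, h=lambda x: x * (1.0 - x)):
    """(y_n', h)_{0,2} for a fixed, hypothetical h in L^2(0, 1)."""
    return midpoint(lambda x: math.cos(n * math.pi * x) * h(x), 0.0, 1.0)

for n in (2, 8, 32, 64):
    print(f"n={n:3d}  sup|y_n|={sup_norm_y(n):.5f}  "
          f"||y_n'||^2={dy_norm_sq(n):.5f}  (y_n', h)={pairing(n):+.6f}")
```

The sup norm and the pairing tend to zero, so $y_n \to 0$ uniformly and the pairing with this particular $h$ is consistent with $y_n' \to 0$ weakly, while $\|y_n'\|_{0,2}^2$ stays at $1/2$: the derivatives do not converge in norm.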

The embeddings $W^{1,2}(a,b) \subset C^{1/2}[a,b] \subset C[a,b]$ allow us to define boundary conditions $y(a) = A$ and $y(b) = B$ for functions $y \in W^{1,2}(a,b)$. For $A = B = 0$, we define

Definition 3.2.4. $W_0^{1,2}(a,b) = \{ y \mid y \in W^{1,2}(a,b) \text{ satisfying } y(a) = 0,\ y(b) = 0 \}$.

The following Poincaré inequality holds:

Lemma 3.2.5. All $y \in W_0^{1,2}(a,b)$ satisfy

$$\|y\|_{0,2} \le (b-a)\,\|y'\|_{0,2}. \tag{3.2.29}$$

Proof. Lemma 3.2.3 yields for $y(a) = 0$,

$$y(x) = \int_a^x y'\,ds, \quad\text{and hence,}\quad |y(x)| \le (b-a)^{1/2} \Big( \int_a^b (y')^2\,dx \Big)^{1/2}, \tag{3.2.30}$$

which gives the claim after squaring, integration, and taking the square root. □

Remark. The constant $(b-a)$ in (3.2.29) is not optimal: Exercise 2.1.2 gives for $(a,b) = (0,1)$ the constant $1/\pi$ instead of $b-a = 1$. However, this Poincaré estimate requires the existence of a global minimizer, which is proved in Paragraph 3.3.
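The gap between the constant $b-a = 1$ and the constant $1/\pi$ can be seen on a few sample functions. This is a minimal sketch with hand-picked functions in $W_0^{1,2}(0,1)$ (the choice of samples and all names are assumptions made for illustration); it computes the ratio $\|y\|_{0,2}/\|y'\|_{0,2}$ by quadrature.

```python
import math

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

def poincare_ratio(y, dy):
    """||y||_{0,2} / ||y'||_{0,2} on (0, 1), by quadrature."""
    num = math.sqrt(midpoint(lambda x: y(x) ** 2, 0.0, 1.0))
    den = math.sqrt(midpoint(lambda x: dy(x) ** 2, 0.0, 1.0))
    return num / den

# Sample functions vanishing at x = 0 and x = 1, each with its derivative.
samples = {
    "sin(pi x)": (lambda x: math.sin(math.pi * x),
                  lambda x: math.pi * math.cos(math.pi * x)),
    "x (1 - x)": (lambda x: x * (1.0 - x),
                  lambda x: 1.0 - 2.0 * x),
    "sin(2 pi x)": (lambda x: math.sin(2 * math.pi * x),
                    lambda x: 2 * math.pi * math.cos(2 * math.pi * x)),
}

ratios = {name: poincare_ratio(y, dy) for name, (y, dy) in samples.items()}
for name, ratio in ratios.items():
    print(f"{name:12s} ratio = {ratio:.6f}   (1/pi = {1 / math.pi:.6f})")
```

All ratios stay below $1/\pi \approx 0.3183$, and $\sin \pi x$ attains it, in line with the remark; the parabola $x(1-x)$ comes close with $\sqrt{1/10} \approx 0.3162$.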

For functions with nonhomogeneous boundary conditions, we can prove the following estimate:

Lemma 3.2.6. All $y \in D = W^{1,2}(a,b) \cap \{y(a) = A,\ y(b) = B\}$ satisfy

$$\|y\|_{1,2} \le C_1 \|y'\|_{0,2} + C_2 \tag{3.2.31}$$

with constants $C_1$, $C_2$ that do not depend on $y$.

Proof. Let $z_0 \in D$ be fixed, for instance, let $z_0$ be the straight line from $(a,A)$ to $(b,B)$. Then for $y \in D$ the function $y - z_0 = h \in W_0^{1,2}(a,b)$, and for $y = z_0 + h$ (3.2.29) implies

$$\begin{aligned} \|y\|_{0,2} &\le \|z_0\|_{0,2} + (b-a)\|h'\|_{0,2} \\ &\le \|z_0\|_{0,2} + (b-a)\|z_0'\|_{0,2} + (b-a)\|y'\|_{0,2} \\ &= \tilde{C}_1 \|y'\|_{0,2} + \tilde{C}_2. \end{aligned} \tag{3.2.32}$$

Then (3.2.31) holds for $C_1 = \sqrt{2\tilde{C}_1^2 + 1}$ and $C_2 = \sqrt{2\tilde{C}_2^2}$. □

Next we perform the direct method for a functional

$$J(y) = \int_a^b F(x,y,y')\,dx \tag{3.2.33}$$

whose admitted functions fulfill boundary conditions $y(a) = A$ and $y(b) = B$. For this purpose, we extend the domain of definition from $C^{1,pw}[a,b] \cap \{y(a) = A,\ y(b) = B\}$, given in the first two chapters of this book, to $D = W^{1,2}(a,b) \cap \{y(a) = A,\ y(b) = B\}$.

According to Proposition 3.2.1, a function $y \in D$ is continuous on $[a,b]$, but its weak derivative is only in $L^2(a,b)$. For a continuous Lagrange function $F : [a,b]\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$, the function $F(\cdot,y,y')$ is measurable (according to a result of measure theory), but the integral (3.2.33) is not necessarily finite for each $y \in D$. Since we are only interested in minimizers, we only have to guarantee that the functional is bounded from below. The functional may be unbounded from above; it may even attain the value $+\infty$. Since $J(y)$ is finite for $y \in C^{1,pw}[a,b]$, the infimum is finite in case $J$ is bounded from below on $D$. The following proposition gives sufficient conditions for the existence of a global minimizer of $J$. This proposition is not the most general. For sharper results, we refer to the literature, in particular to [5].

Proposition 3.2.5. Assume that the Lagrange function $F : [a,b]\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$ of the functional (3.2.33) is continuous and continuously partially differentiable with respect to the third variable. The hypotheses for all $x \in [a,b]$ and for all $y, y', \tilde{y}' \in \mathbb{R}$ are:

$$\begin{aligned} &F(x,y,y') \ge c_1 (y')^2 - c_2 \ \text{ where } c_1 > 0 \quad\text{(coercivity)}, \\ &F(x,y,\tilde{y}') \ge F(x,y,y') + F_{y'}(x,y,y')(\tilde{y}' - y') \\ &\text{(partial convexity with respect to the variable } y'). \end{aligned} \tag{3.2.34}$$

Then the functional (3.2.33) possesses a global minimizer $y_0 \in D = W^{1,2}(a,b) \cap \{y(a) = A,\ y(b) = B\}$.

Proof. We verify the hypotheses (H1), (H2), and (H3) of Paragraph 3.1.

(H1) In view of $F(x,y,y') \ge -c_2$ for all $(x,y,y') \in [a,b]\times\mathbb{R}\times\mathbb{R}$, the functional $J$ is bounded from below. Therefore $m = \inf\{J(y) \mid y \in D\} \in \mathbb{R}$ exists, and we let $(y_n)_{n\in\mathbb{N}} \subset D$ be a minimizing sequence.

(H2) The coercivity implies for $n \ge n_0$

$$c_1 \|y_n'\|_{0,2}^2 - c_2(b-a) \le J(y_n) \le m + 1. \tag{3.2.35}$$

By the estimates (3.2.31) and (3.2.35), the minimizing sequence $(y_n)_{n\in\mathbb{N}}$ is bounded in $W^{1,2}(a,b)$. Proposition 3.2.4 then guarantees the existence of a subsequence $(y_{n_k})_{k\in\mathbb{N}}$ and a limit $y_0 \in W^{1,2}(a,b)$ satisfying

$$\begin{aligned} &\lim_{k\to\infty} y_{n_k} = y_0 \ \text{in } C[a,b] \quad\text{and} \\ &\text{w-}\!\lim_{k\to\infty} y_{n_k}' = y_0' \ \text{in } L^2(a,b). \end{aligned} \tag{3.2.36}$$

Due to the uniform convergence (3.2.36)$_1$, the limit fulfills the same boundary conditions as $y_{n_k}$, and hence $y_0 \in D$.

(H3) Let $(y_n)_{n\in\mathbb{N}} \subset W^{1,2}(a,b)$ be a sequence converging in the sense of (3.2.36) to a limit function $y_0 \in W^{1,2}(a,b)$. We define

$$M_N = \{ x \in [a,b] \mid (y_0'(x))^2 > N^2 \} \quad\text{and}\quad K_N = [a,b] \setminus M_N. \tag{3.2.37}$$

Then for the Lebesgue measure $\mu$,

$$\begin{aligned} &\mu(M_N)\,N^2 < \int_{M_N} (y_0')^2\,dx \le \|y_0'\|_{0,2}^2, \\ &\mu(M_N) < \frac{1}{N^2} \|y_0'\|_{0,2}^2, \qquad (y_0'(x))^2 \le N^2 \ \text{for } x \in K_N. \end{aligned} \tag{3.2.38}$$

By (3.2.34)$_1$, the function $\tilde{F}(x,y,y') = F(x,y,y') + c_2 \ge 0$ for all $(x,y,y') \in [a,b]\times\mathbb{R}\times\mathbb{R}$, and

$$\tilde{J}(y) = \int_a^b \tilde{F}(x,y,y')\,dx = J(y) + c_2(b-a). \tag{3.2.39}$$

Then the following holds:

$$\begin{aligned} \int_a^b \tilde{F}(x,y_n,y_n')\,dx &\ge \int_{K_N} \tilde{F}(x,y_n,y_n')\,dx \\ &= \int_{K_N} \tilde{F}(x,y_n,y_n') - \tilde{F}(x,y_n,y_0')\,dx + \int_{K_N} \tilde{F}(x,y_n,y_0')\,dx \\ &\ge \int_{K_N} F_{y'}(x,y_n,y_0')(y_n' - y_0')\,dx + \int_{K_N} \tilde{F}(x,y_n,y_0')\,dx \\ &= \int_{K_N} \big( F_{y'}(x,y_n,y_0') - F_{y'}(x,y_0,y_0') \big)(y_n' - y_0')\,dx \\ &\quad + \int_{K_N} F_{y'}(x,y_0,y_0')(y_n' - y_0')\,dx + \int_{K_N} \tilde{F}(x,y_n,y_0')\,dx. \end{aligned} \tag{3.2.40}$$

We investigate the last three terms separately.

$$|\text{1st term}| \le \Big( \int_{K_N} \big( F_{y'}(x,y_n,y_0') - F_{y'}(x,y_0,y_0') \big)^2\,dx \Big)^{1/2} \|y_n' - y_0'\|_{0,2}. \tag{3.2.41}$$

For $x \in K_N$,

$$\begin{aligned} &|y_n(x)|,\ |y_0(x)| \le c, \ \text{since } \lim_{n\to\infty} y_n = y_0 \ \text{in } C[a,b], \\ &|y_0'(x)| \le N \ \text{in view of (3.2.38)}_2. \end{aligned} \tag{3.2.42}$$

By assumption, the function $F_{y'} : [a,b]\times[-c,c]\times[-N,N] \to \mathbb{R}$ is uniformly continuous, and therefore the uniform convergence $\lim_{n\to\infty} y_n = y_0$ implies the uniform convergence $\lim_{n\to\infty} F_{y'}(\cdot,y_n,y_0') = F_{y'}(\cdot,y_0,y_0')$ on $K_N$. Thus the first factor in (3.2.41) converges to zero as $n \to \infty$. The second factor in (3.2.41) is bounded because $\text{w-}\!\lim_{n\to\infty} y_n' = y_0'$ in $L^2(a,b)$, and because weakly convergent sequences are bounded. (For the minimizing sequence, we do not need this result from functional analysis; the sequence of its weak derivatives is bounded in $L^2(a,b)$ by (3.2.35).) Thus we obtain

$$\lim_{n\to\infty} \int_{K_N} \big( F_{y'}(x,y_n,y_0') - F_{y'}(x,y_0,y_0') \big)(y_n' - y_0')\,dx = 0. \tag{3.2.43}$$

By (3.2.42) and the continuity of $F_{y'}$,

$$|F_{y'}(x,y_0(x),y_0'(x))| \le C_N \ \text{for } x \in K_N, \quad\text{whence}\quad F_{y'}(\cdot,y_0,y_0') \in L^2(K_N). \tag{3.2.44}$$

Since $\text{w-}\!\lim_{n\to\infty} y_n' = y_0'$ in $L^2(a,b)$, the function $y_0'$ is the weak limit of the sequence $(y_n')$ in $L^2(K_N)$ as well, and by definition of the weak limit,

$$\lim_{n\to\infty} \int_{K_N} F_{y'}(x,y_0,y_0')(y_n' - y_0')\,dx = 0. \tag{3.2.45}$$

Thus the 2nd term in (3.2.40)$_5$ converges to zero as does the 1st term, cf. (3.2.43). Again, (3.2.42) and the uniform continuity of $\tilde{F} : [a,b]\times[-c,c]\times[-N,N] \to \mathbb{R}$ imply the uniform convergence $\lim_{n\to\infty} \tilde{F}(\cdot,y_n,y_0') = \tilde{F}(\cdot,y_0,y_0')$ on $K_N$, whence

$$\lim_{n\to\infty} \int_{K_N} \tilde{F}(x,y_n,y_0')\,dx = \int_{K_N} \tilde{F}(x,y_0,y_0')\,dx. \tag{3.2.46}$$

From (3.2.40), it then follows that

$$\liminf_{n\to\infty} \tilde{J}(y_n) \ge \int_{K_N} \tilde{F}(x,y_0,y_0')\,dx. \tag{3.2.47}$$

Since (3.2.47) holds for all $N \in \mathbb{N}$, the lemma of Fatou allows us to take the limit $N \to \infty$: Let $\chi_{K_N}$ be the characteristic function of the set $K_N$, which, in view of (3.2.37) and (3.2.38), converges pointwise almost everywhere in $[a,b]$ to the constant 1. Then

$$\begin{aligned} \liminf_{N\to\infty} \int_a^b \chi_{K_N} \tilde{F}(x,y_0,y_0')\,dx &\ge \int_a^b \liminf_{N\to\infty} \chi_{K_N} \tilde{F}(x,y_0,y_0')\,dx \\ &= \int_a^b \tilde{F}(x,y_0,y_0')\,dx = \tilde{J}(y_0). \end{aligned} \tag{3.2.48}$$

Combining (3.2.47) and (3.2.48) yields

$$\liminf_{n\to\infty} \tilde{J}(y_n) \ge \tilde{J}(y_0), \quad\text{and from (3.2.39),}\quad \liminf_{n\to\infty} J(y_n) \ge J(y_0). \tag{3.2.49}$$

Since the three hypotheses (H1), (H2), and (H3) are fulfilled, Proposition 3.1.1 guarantees the existence of a global minimizer $y_0 \in D$ of the functional $J$. □

In Exercise 3.2.2 the coercivity (3.2.34)$_1$ is weakened to

$$F(x,y,y') \ge c_1 (y')^2 - c_2 |y|^q - c_3 \quad\text{where } c_1 > 0,\ 1 \le q < 2. \tag{3.2.50}$$

It is still open whether the global minimizer $y_0 \in D \subset W^{1,2}(a,b)$ fulfills the Euler-Lagrange equation. The hypotheses of Proposition 3.2.5 do not guarantee the existence of the first variation $\delta J(y_0)h$ for all directions $h \in W_0^{1,2}(a,b)$.

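Proposition 3.2.6. Assume that the Lagrange function $F : [a,b]\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$ of the functional (3.2.33) is continuous and continuously partially differentiable with respect to the last two variables. The hypotheses for all $(x,y,y') \in [a,b]\times\mathbb{R}\times\mathbb{R}$ are:

$$\begin{aligned} &|F_y(x,y,y')| \le f_1(x,y)(y')^2 + f_2(x,y), \\ &|F_{y'}(x,y,y')| \le g_1(x,y)|y'| + g_2(x,y), \ \text{where} \\ &f_i, g_i : [a,b]\times\mathbb{R} \to \mathbb{R} \ \text{are continuous for } i = 1,2. \end{aligned} \tag{3.2.51}$$

Then $J : W^{1,2}(a,b) \to \mathbb{R}$ is continuous, and there exists the first variation

$$\delta J(y)h = \int_a^b F_y(x,y,y')h + F_{y'}(x,y,y')h'\,dx \tag{3.2.52}$$

for all $y \in W^{1,2}(a,b)$ and all directions $h \in W^{1,2}(a,b)$.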

Proof. The representation

$$\begin{aligned} F(x,y,y') - F(x,\tilde{y},\tilde{y}') &= \int_0^1 \frac{d}{dt} F\big(x,\ \tilde{y} + t(y - \tilde{y}),\ \tilde{y}' + t(y' - \tilde{y}')\big)\,dt \\ &= \int_0^1 F_y\big(x,\ \tilde{y} + t(y - \tilde{y}),\ \tilde{y}' + t(y' - \tilde{y}')\big)(y - \tilde{y})\,dt \\ &\quad + \int_0^1 F_{y'}\big(x,\ \tilde{y} + t(y - \tilde{y}),\ \tilde{y}' + t(y' - \tilde{y}')\big)(y' - \tilde{y}')\,dt, \end{aligned} \tag{3.2.53}$$

and (3.2.51) give, for $(x,y,y') \in [a,b]\times[-c,c]\times\mathbb{R}$ and for $(x,\tilde{y},\tilde{y}') \in [a,b]\times[-\tilde{c},\tilde{c}]\times\mathbb{R}$, the estimate

$$|F(x,y,y') - F(x,\tilde{y},\tilde{y}')| \le c_1 \big( |y'|^2 + |\tilde{y}'|^2 \big)|y - \tilde{y}| + c_2 \big( |y'| + |\tilde{y}'| \big)|y' - \tilde{y}'|, \tag{3.2.54}$$

where the constants $c_1$ and $c_2$ depend on $c$ and $\tilde{c}$. By the embedding $W^{1,2}(a,b) \subset C[a,b]$, cf. Proposition 3.2.1, and by setting $\|y\|_0 \le c_0\|y\|_{1,2} = c$, $\|\tilde{y}\|_0 \le c_0\|\tilde{y}\|_{1,2} = \tilde{c}$, (3.2.54) gives

$$|J(y) - J(\tilde{y})| \le c_1 \big( \|y\|_{1,2}^2 + \|\tilde{y}\|_{1,2}^2 \big)\|y - \tilde{y}\|_0 + c_2 \big( \|y\|_{1,2} + \|\tilde{y}\|_{1,2} \big)\|y - \tilde{y}\|_{1,2}, \tag{3.2.55}$$

where the second summand is estimated by the Cauchy-Schwarz inequality. Furthermore, (3.2.55) and again the embedding $W^{1,2}(a,b) \subset C[a,b]$ show that $J : W^{1,2}(a,b) \to \mathbb{R}$ is locally Lipschitz continuous.

To prove (3.2.52) we have to show that

$$\frac{d}{dt} J(y + th)\Big|_{t=0} = \delta J(y)h \tag{3.2.56}$$

exists and is given by (3.2.52). The proof of Proposition 1.2.1 relies on the fact that, due to uniform convergence of the difference quotient to the derivative on $[a,b]$, differentiation and integration can be interchanged. Here the possibility of that interchange is guaranteed by Lebesgue’s dominated convergence theorem. For this purpose, the last two terms in (1.2.9) are estimated via (3.2.51) and the continuity of $y, h \in W^{1,2}(a,b)$, yielding $\|y\|_0, \|h\|_0 \le c$. Finally, let $|t| \le 1$, and we obtain

$$\begin{aligned} &\Big| \frac{1}{t} \int_0^t F_y\big(x, y(x) + sh(x), y'(x) + sh'(x)\big) - F_y\big(x, y(x), y'(x)\big)\,ds\ h(x) \Big| \\ &\qquad \le c_3 + c_4 \big( |y'(x)|^2 + |h'(x)|^2 \big), \\ &\Big| \frac{1}{t} \int_0^t F_{y'}\big(x, y(x) + sh(x), y'(x) + sh'(x)\big) - F_{y'}\big(x, y(x), y'(x)\big)\,ds\ h'(x) \Big| \\ &\qquad \le \big( c_5 + c_6 ( |y'(x)| + |h'(x)| ) \big)|h'(x)|. \end{aligned} \tag{3.2.57}$$

For $y', h' \in L^2(a,b)$, both terms of (3.2.57) have integrable majorants, which do not depend on $|t| \le 1$. They allow the interchange of the limit $t \to 0$ and integration. Following the proof of Proposition 1.2.1, the continuity of $F_y$ and $F_{y'}$ implies that the last two terms of (1.2.9) converge to zero pointwise for each $x \in [a,b]$, and as in (1.2.15) we obtain (3.2.52). □

For the next proposition, we sharpen the growth conditions (3.2.51) on $F_y(x,y,y')$; this sharpening is due to our Hilbert-space approach.

Proposition 3.2.7. Let the Lagrange function $F : [a,b]\times\mathbb{R}\times\mathbb{R} \to \mathbb{R}$ of the functional

$$J(y) = \int_a^b F(x,y,y')\,dx \tag{3.2.58}$$

be continuous and continuously partially differentiable with respect to the last two variables. We assume that $F$ is coercive as in (3.2.34)$_1$ or in (3.2.50), and that it is partially convex in the sense of (3.2.34)$_2$. Furthermore we assume the growth conditions for all $(x,y,y') \in [a,b]\times\mathbb{R}\times\mathbb{R}$:

$$\begin{aligned} &|F_y(x,y,y')| \le f_1(x,y)|y'| + f_2(x,y), \\ &|F_{y'}(x,y,y')| \le g_1(x,y)|y'| + g_2(x,y), \ \text{where} \\ &f_i, g_i : [a,b]\times\mathbb{R} \to \mathbb{R} \ \text{are continuous for } i = 1,2. \end{aligned} \tag{3.2.59}$$

The functional (3.2.58) possesses a global minimizer $y_0 \in D = W^{1,2}(a,b) \cap \{y(a) = A,\ y(b) = B\}$, which fulfills the Euler-Lagrange equation in its weak form

$$\int_a^b F_y(x,y_0,y_0')h + F_{y'}(x,y_0,y_0')h'\,dx = 0 \quad\text{for all } h \in W_0^{1,2}(a,b), \tag{3.2.60}$$

and also in its strong form

$$F_{y'}(\cdot,y_0,y_0') \in W^{1,2}(a,b) \quad\text{and}\quad \frac{d}{dx} F_{y'}(\cdot,y_0,y_0') = F_y(\cdot,y_0,y_0') \ \text{on } (a,b). \tag{3.2.61}$$

Here $\frac{d}{dx} F_{y'}(\cdot,y_0,y_0')$ is the weak derivative of $F_{y'}(\cdot,y_0,y_0')$ in the sense of Definition 3.2.1.

Proof. By Proposition 3.2.5, a global minimizer $y_0$ of $J$ exists, and by Definition 3.2.4, the minimizer $y_0$ and perturbations $y_0 + th$ for all $h \in W_0^{1,2}(a,b)$ and for all $t \in \mathbb{R}$ are in the domain of definition $D$. By Proposition 3.2.6 the first variation exists, and we obtain for the global minimizer

$$\frac{d}{dt} J(y_0 + th)\Big|_{t=0} = \delta J(y_0)h = 0 \quad\text{for all } h \in W_0^{1,2}(a,b). \tag{3.2.62}$$

The representation (3.2.52) and Definition 3.2.1 prove (3.2.60) and (3.2.61), provided $F_y(\cdot,y_0,y_0')$ and $F_{y'}(\cdot,y_0,y_0') \in L^2(a,b)$. This is assured by the growth conditions (3.2.59) and by $y_0 \in W^{1,2}(a,b) \subset C[a,b]$. □
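As an illustration, the hypotheses of Propositions 3.2.5 and 3.2.7 can be checked by hand for a concrete Lagrange function. The choice $F(x,y,y') = \tfrac12 (y')^2 + \cos y$ below is an assumed example, picked here for simplicity and not taken from the text:

```latex
\begin{align*}
F(x,y,y') &= \tfrac12 (y')^2 + \cos y \;\ge\; \tfrac12 (y')^2 - 1
  && \text{coercivity (3.2.34)$_1$ with $c_1 = \tfrac12$, $c_2 = 1$,}\\[2pt]
F(x,y,\tilde y') - F(x,y,y') - F_{y'}(x,y,y')(\tilde y' - y')
  &= \tfrac12 (\tilde y')^2 - \tfrac12 (y')^2 - y'(\tilde y' - y') \\
  &= \tfrac12 (\tilde y' - y')^2 \;\ge\; 0
  && \text{partial convexity (3.2.34)$_2$,}\\[2pt]
|F_y(x,y,y')| = |\sin y| &\le 1, \qquad |F_{y'}(x,y,y')| = |y'|
  && \text{growth (3.2.59) with $f_1 = 0$, $f_2 = 1$, $g_1 = 1$, $g_2 = 0$.}
\end{align*}
```

For this example Proposition 3.2.7 yields a global minimizer $y_0 \in D$, and (3.2.61) reads $\frac{d}{dx} y_0' = -\sin y_0$ with a weak derivative on the left, i.e., the pendulum equation $y_0'' + \sin y_0 = 0$ in the weak sense.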

The minimizer $y_0 \in W^{1,2}(a,b)$ fulfills the Euler-Lagrange equation (3.2.61) with weak derivatives. The goal is to prove enough regularity of $y_0$ so that the Euler-Lagrange equation is solved in the classical sense. If $y_0 \in C^1[a,b]$ this is the case, cf. Proposition 1.4.1, and if $F_{y'y'}(x,y_0(x),y_0'(x)) \ne 0$ for all $x \in [a,b]$, Exercise 1.5.1 then yields even $y_0 \in C^2[a,b]$. However, the step from $y_0 \in W^{1,2}(a,b) \subset C^{1/2}[a,b]$ (Proposition 3.2.1) to $y_0 \in C^1[a,b]$ is not simple, in general, and therefore we refer to the literature, e.g., to [5]. For some special functionals, however, the required regularity comes along in a simple way, as we see in the next paragraph. Again, the ellipticity $F_{y'y'}(x,y(x),y'(x)) \ne 0$ plays a crucial role.

Besides existence and regularity, the computation of a minimizer is of interest. The theory, according to which a global minimizer is approximated by a minimizing sequence, is very helpful for this purpose. Therefore the construction and computation of a minimizing sequence is important in applications.

Definition 3.2.5. A Hilbert space $X$ possesses a Schauder basis $S = \{e_n\}_{n\in\mathbb{N}}$ (J. Schauder, 1899–1943), if any element $y \in X$ is the limit of a unique convergent series in $X$:

$$y = \sum_{n=1}^{\infty} c_n e_n = \lim_{N\to\infty} \sum_{n=1}^{N} c_n e_n, \quad c_n \in \mathbb{R}. \tag{3.2.63}$$

Each separable Hilbert space possesses a Schauder basis, which can be orthonormalized. In (3.1.12) an orthonormal Schauder basis in $X = \ell^2$ is given, and Proposition 3.1.5 ascertains that a Schauder basis in a Hilbert space $X$ consists of eigenfunctions of a weak eigenvalue problem.

Let $S = \{e_n\}_{n\in\mathbb{N}}$ be a Schauder basis in the Hilbert space $W_0^{1,2}(a,b)$ and let

$$U_N = \operatorname{span}[e_1,\dots,e_N] \subset W_0^{1,2}(a,b) \tag{3.2.64}$$

be the $N$-dimensional subspace, which is a Hilbert space for each $N \in \mathbb{N}$. For some fixed $z_0 \in D = W^{1,2}(a,b) \cap \{y(a) = A,\ y(b) = B\}$ the domain of definition can be written as $D = z_0 + W_0^{1,2}(a,b)$, and under the hypotheses of Proposition 3.2.7, there exists a global minimizer $y_0 \in D$ of $J : D \to \mathbb{R}$ and a fortiori there exists a global minimizer $y_N \in z_0 + U_N$ of $J : z_0 + U_N \to \mathbb{R}$. In view of $U_N \subset U_{N+1} \subset W_0^{1,2}(a,b)$,

$$J(y_N) \ge J(y_{N+1}) \ge J(y_0) \quad\text{for all } N \in \mathbb{N}. \tag{3.2.65}$$

Proposition 3.2.8. Under the hypotheses of Proposition 3.2.7 the sequence $(y_N)_{N\in\mathbb{N}}$ of global minimizers of the functional (3.2.58) defined on $z_0 + U_N$ is a minimizing sequence of the functional $J : z_0 + W_0^{1,2}(a,b) \to \mathbb{R}$. The coefficients $(c_1^N,\dots,c_N^N) \in \mathbb{R}^N$ of $y_N = z_0 + \sum_{n=1}^{N} c_n^N e_n$ fulfill the $N$-dimensional system of equations

$$\begin{aligned} &\int_a^b \tilde{F}_y(x,c_1^N,\dots,c_N^N)\,e_k + \tilde{F}_{y'}(x,c_1^N,\dots,c_N^N)\,e_k'\,dx = 0, \quad k = 1,\dots,N, \\ &\tilde{F}_y(x,c_1^N,\dots,c_N^N) = F_y\Big(x,\ z_0 + \sum_{n=1}^{N} c_n^N e_n,\ z_0' + \sum_{n=1}^{N} c_n^N e_n'\Big), \\ &\tilde{F}_{y'}(x,c_1^N,\dots,c_N^N) = F_{y'}\Big(x,\ z_0 + \sum_{n=1}^{N} c_n^N e_n,\ z_0' + \sum_{n=1}^{N} c_n^N e_n'\Big). \end{aligned} \tag{3.2.66}$$

Proof. By Proposition 3.2.6, $J : W^{1,2}(a,b) \to \mathbb{R}$ is continuous, and therefore for any $\varepsilon > 0$ there is a $\delta > 0$ such that $|J(y) - J(y_0)| < \varepsilon$ provided $\|y - y_0\|_{1,2} < \delta$. By the property (3.2.63) of a Schauder basis, for $\delta > 0$ there is an $N_0 \in \mathbb{N}$ and some $v_{N_0} \in U_{N_0}$ such that $\|y_0 - z_0 - v_{N_0}\|_{1,2} < \delta$. By (3.2.65),

$$0 \le J(y_N) - J(y_0) \le J(z_0 + v_{N_0}) - J(y_0) < \varepsilon \quad\text{for all } N \ge N_0, \tag{3.2.67}$$

which proves $\lim_{N\to\infty} J(y_N) = J(y_0)$. The global minimizer $y_N \in z_0 + U_N$ fulfills the Euler-Lagrange equation in its weak form (3.2.60) for all $h \in U_N$, which proves (3.2.66). □

By (3.2.36) the minimizing sequence $(y_N)_{N\in\mathbb{N}}$ contains a subsequence $(y_{N_k})_{k\in\mathbb{N}}$ such that

$$\begin{aligned} &\lim_{k\to\infty} y_{N_k} = y_0 \ \text{in } C[a,b] \quad\text{and} \\ &\text{w-}\!\lim_{k\to\infty} y_{N_k}' = y_0' \ \text{in } L^2(a,b). \end{aligned} \tag{3.2.68}$$

Under suitable hypotheses the convergence in (3.2.68) can be improved. This is the case for quadratic functionals, which we study in the next paragraph. The Euler-Lagrange equation is then a linear differential equation, and the regularity of a minimizer follows easily in these cases. The system of equations (3.2.66) is a linear system with a symmetric matrix of coefficients. For a suitable choice of a Schauder basis $\{e_n\}_{n\in\mathbb{N}}$, it can be a diagonal matrix.
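The diagonal case can be made concrete. The sketch below is an illustration under stated assumptions, not the text's own example: it takes $(a,b) = (0,1)$, homogeneous boundary values (so $z_0 = 0$), the quadratic functional $J(y) = \int_0^1 \tfrac12 (y')^2 - f y\,dx$ with the hypothetical choice $f \equiv 1$, and the functions $e_n(x) = \sin(n\pi x)$, which are assumed here to form a Schauder basis of $W_0^{1,2}(0,1)$. For this functional (3.2.66) reduces to $c_k \int_0^1 (e_k')^2\,dx = \int_0^1 f e_k\,dx$.

```python
import math

def midpoint(g, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(g(lo + (i + 0.5) * step) for i in range(n))

def f(x):
    """Hypothetical right-hand side of J(y) = int_0^1 (y')^2 / 2 - f*y dx."""
    return 1.0

def ritz_coefficients(N):
    """Solve (3.2.66) for e_n = sin(n*pi*x); the coefficient matrix is diagonal."""
    coeffs = []
    for n in range(1, N + 1):
        stiffness = (n * math.pi) ** 2 / 2.0          # int_0^1 (e_n')^2 dx
        load = midpoint(lambda x: f(x) * math.sin(n * math.pi * x), 0.0, 1.0)
        coeffs.append(load / stiffness)
    return coeffs

def J_of_ritz(coeffs):
    """J(y_N), using that the e_n' are mutually orthogonal in L^2(0, 1)."""
    total = 0.0
    for n, c in enumerate(coeffs, start=1):
        stiffness = (n * math.pi) ** 2 / 2.0
        load = c * stiffness                          # recovers int f*e_n dx
        total += 0.5 * c * c * stiffness - c * load
    return total

def y_N(coeffs, x):
    return sum(c * math.sin(n * math.pi * x) for n, c in enumerate(coeffs, start=1))

dims = (1, 3, 5, 9)
values = [J_of_ritz(ritz_coefficients(N)) for N in dims]
J_exact = -1.0 / 24.0       # J at the exact minimizer y0(x) = x*(1 - x)/2

for N, value in zip(dims, values):
    print(f"N={N}: J(y_N) = {value:.7f}   (J(y_0) = {J_exact:.7f})")
```

The printed values decrease towards $J(y_0) = -1/24$, as (3.2.65) and Proposition 3.2.8 predict. For a basis that is not orthogonal with respect to $\int_a^b y'z'\,dx$, the same system would be symmetric but no longer diagonal.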

Exercises

3.2.1. Prove that the completeness of the space $L^2(a,b)$ implies the completeness of the Sobolev space $W^{1,2}(a,b)$. Show that any Cauchy sequence in $W^{1,2}(a,b)$ converges to some limit in $W^{1,2}(a,b)$.

3.2.2. Show that Proposition 3.2.5 can be proved with the weaker coercivity (3.2.50). Indicate where the proof has to be modified.

3.2.3. Which hypothesis for Proposition 3.2.5 is not fulfilled by the counterexample of Weierstraß (example 3 in 1.5)?

Show that the minimizing sequence given in Paragraph 1.5 is not bounded in $W^{1,2}(-1,1)$.

3.3 Applications of the Direct Method

In this paragraph we prove the existence of classical solutions of boundary value problems, in particular, of the so-called Sturm-Liouville boundary value problems. We study the Sturm-Liouville eigenvalue problem, named after C.-F. Sturm (1803–1855) and J. Liouville (1809–1882). The proofs are based on the direct method in the calculus of variations in the spirit of Dirichlet’s principle. The solution is a minimizer of a related functional and it is therefore approximated by minimizing sequences. This approximation, called the Ritz or Galerkin method (W. Ritz, 1878–1909, B. G. Galerkin, 1871–1945), is the foundation for numerical methods, in particular, finite element methods, for elliptic boundary and eigenvalue problems.

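1. A nonlinear boundary value problem

$$\begin{aligned} &y'' + f(\cdot,y) = 0 \ \text{on } [a,b], \\ &y(a) = A,\ y(b) = B, \ \text{where} \\ &f : [a,b]\times\mathbb{R} \to \mathbb{R} \ \text{is continuous and fulfills} \\ &|f(x,y)| \le c_2 |y|^r + c_3 \ \text{for all } (x,y) \in [a,b]\times\mathbb{R} \\ &\text{with } 0 \le r < 1 \text{ and nonnegative constants } c_2, c_3. \end{aligned} \tag{3.3.1}$$

Proposition 3.3.1. The boundary value problem (3.3.1) possesses a solution $y_0 \in C^2[a,b]$.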

Proof. Let $g : [a,b]\times\mathbb{R} \to \mathbb{R}$ be a partial primitive satisfying $g_y(x,y) = f(x,y)$ for $(x,y) \in [a,b]\times\mathbb{R}$. Define the functional $J(y) = \int_a^b F(x,y,y')\,dx$ with Lagrange function $F(x,y,y') = \frac12 (y')^2 - g(x,y)$, which fulfills the hypotheses (3.2.34)$_2$ and (3.2.50) for Proposition 3.2.5 and the growth conditions (3.2.59) of Proposition 3.2.7. Consequently the functional $J$ has a global minimizer $y_0 \in D = W^{1,2}(a,b) \cap \{y(a) = A,\ y(b) = B\}$, which satisfies the Euler-Lagrange equation in its weak and in its strong version (3.2.60), (3.2.61).

Furthermore, $y_0 \in D \subset C[a,b]$ and $F_{y'}(\cdot,y_0,y_0') = y_0' \in W^{1,2}(a,b) \subset C[a,b]$. By the definition (3.2.3) of the weak derivative, (1.3.10), and the uniqueness of the weak derivative, we conclude that $y_0'$ is the classical derivative and therefore $y_0 \in C^1[a,b]$. The Euler-Lagrange equation,

$$\int_a^b y_0' h' - f(x,y_0)h\,dx = 0 \quad\text{for all } h \in C_0^1[a,b], \ \text{where } y_0',\ f(\cdot,y_0) \in C[a,b], \tag{3.3.2}$$

implies by (1.3.10),

$$\begin{aligned} &y_0' \in C^1[a,b] \ \text{or} \ y_0 \in C^2[a,b], \ \text{and} \\ &y_0'' + f(\cdot,y_0) = 0 \ \text{on } [a,b]. \end{aligned} \tag{3.3.3}$$

The boundary conditions are fulfilled by $y_0 \in D$. □

The example (3.3.8) below shows that a growth condition (3.3.1)$_4$ with exponent $r = 1$ can prevent the existence of a solution, in general.

For a solution $y_0 \in C^2[a,b]$ of the differential equation (3.3.1)$_1$ in the classical sense, a so-called bootstrapping increases its regularity, provided the function $f$ is $k$ times continuously partially differentiable with respect to its variables: In this case $y_0 \in C^{k+2}[a,b]$.

Remark. The differential equation (3.3.1)$_1$ is a special case of a general Euler-Lagrange equation (1.4.5), which is a quasilinear ordinary differential equation of second order. We assume that the Lagrange function $F$ and accordingly the functional $J$ depend on a real parameter $\lambda$, which models, for instance, a variable physical quantity. Then the Euler-Lagrange equation (1.4.5) is of the form

$$a(\cdot,y,y',\lambda)\,y'' + b(\cdot,y,y',\lambda) = 0 \quad\text{on } [a,b].$$

Assuming that $b(\cdot,0,0,\lambda) = 0$, the boundary value problem with homogeneous boundary values $y(a) = y(b) = 0$ possesses the “trivial solution” $y \equiv 0$ for all values of the parameter $\lambda$. The natural question, which values of the parameter $\lambda$ admit “nontrivial solutions,” has a positive answer in the following setting: Assume that the functional $J(y,\lambda)$ loses its convexity when the parameter $\lambda$ crosses a threshold $\lambda_0$ as sketched in Figure 3.1: For $\lambda < \lambda_0$, the trivial solution $y = 0$ is the only “stable” minimizer, which becomes “unstable” for $\lambda > \lambda_0$, and accordingly two new “stable nontrivial” minimizers emerge.

This scenario of minimizers and a (local) maximizer in Figure 3.1 is reflected in a $(y,\lambda)$-diagram of solutions of the Euler-Lagrange equation as depicted in Figure 3.2. The $y$-axis represents the (infinite-dimensional) space of admitted functions, and each point $(y,\lambda)$ represents a function for the parameter $\lambda$.

Fig. 3.1 A Continuous Loss of Convexity

Figure 3.2 suggests the technical term of a “bifurcation” of solutions depending on a parameter. In this scenario a unique solution bifurcates into several solutions when the parameter exceeds a critical threshold, and this bifurcation comes along with an “exchange of stability”: the trivial solution loses and the nontrivial solutions gain stability. Physicists also call bifurcation “a self-organization of new states,” because it happens without external agency. If the bifurcating solutions have less symmetry than the “trivial solution,” which is often the case, then bifurcation is also labeled “a spontaneous symmetry breaking.” There are many applications, among which a classical one is the buckling of the so-called Euler beam: a straight beam buckles under a growing axial load at a critical threshold, thus losing its symmetry. The “trivial” straight state persists, but it is unstable beyond the critical load.

Fig. 3.2 A Bifurcation Diagram

At the “bifurcation point” $(0,\lambda_0)$, the implicit function theorem is obviously not valid. This theorem assures the existence of a unique solution in a perturbed form if the parameter is varied, and in most physical experiments it describes what one expects. The validity of the implicit function theorem is “generic” insofar as a bifurcation is an exceptional event. Under which general mathematical conditions a bifurcation occurs is a natural and interesting question. This led to the field “bifurcation theory,” which is presented, e.g., in [17].

2. The Sturm-Liouville boundary value problem

$$\begin{aligned} &-(py')' + qy = f \ \text{on } [a,b], \\ &y(a) = 0,\ y(b) = 0, \ \text{where} \\ &p \in C^1[a,b],\ q, f \in C[a,b] \ \text{and} \\ &p(x) \ge d > 0,\ q(x) \ge 0 \ \text{for all } x \in [a,b]. \end{aligned} \tag{3.3.4}$$

Proposition 3.3.2. For all $f \in C[a,b]$, the boundary value problem (3.3.4) possesses a unique classical solution $y_0 \in C^2[a,b]$.

Proof. On the Hilbert space $W_0^{1,2}(a,b)$, we define

$$
B(y,z) = \int_a^b \big(p(x)y'z' + q(x)yz\big)\,dx \quad \text{and} \quad \ell(y) = \int_a^b f(x)y\,dx.
\tag{3.3.5}
$$



Then the bilinear form $B$ fulfills the assumptions (3.1.17) of Proposition 3.1.2: Symmetry is obvious; the Cauchy-Schwarz inequality implies the continuity; by virtue of (3.3.4)$_4$, $B(y,y) \ge d\|y'\|_{0,2}^2$; positive definiteness then follows by the Poincaré inequality (3.2.29). The linear functional $\ell$ is continuous on $L^2(a,b)$, again by the Cauchy-Schwarz inequality, and a fortiori it is continuous on $W_0^{1,2}(a,b)$. By the Riesz representation theorem 3.1.3, there is a unique $y_0 \in W_0^{1,2}(a,b)$ such that

$$
\int_a^b \big(p(x)y_0'h' + (q(x)y_0 - f(x))h\big)\,dx = 0 \quad \text{for all } h \in W_0^{1,2}(a,b).
\tag{3.3.6}
$$

This Euler-Lagrange equation in a weak form yields the regularity of $y_0$ as in the proof of Proposition 3.3.1: By definition of the weak derivative,

$$
\begin{aligned}
&\frac{d}{dx}(py_0') = qy_0 - f \in C[a,b] \subset L^2(a,b),\\
&py_0' \in W^{1,2}(a,b) \subset C[a,b], \text{ and hence, } y_0' = py_0'/p \in C[a,b],\\
&y_0 \in C^1[a,b], \text{ and by (1.3.10), } py_0' \in C^1[a,b]. \text{ Finally, } y_0' = py_0'/p \in C^1[a,b],
\end{aligned}
\tag{3.3.7}
$$

due to $p \in C^1[a,b]$. Therefore $y_0 \in C^2[a,b]$ fulfills equation (3.3.7)$_1$ in the classical sense, and the boundary conditions are fulfilled by $y_0 \in W_0^{1,2}(a,b)$. □

If the coefficients $p$, $q$, and $f$ are smooth enough, more regularity of the solution $y_0$ can be obtained by “bootstrapping.”

The conditions (3.3.4)$_4$ cannot be relaxed in an essential way: The necessary condition of Legendre in Exercise 1.4.3 requires $p(x) \ge 0$ for all $x \in [a,b]$. The following example shows that also for $p(x) \ge d > 0$, Proposition 3.3.2 is no longer true when $q$ is negative:

$$
\begin{aligned}
&-y'' - \pi^2 y = f \quad \text{on } [0,1],\\
&y(0) = 0, \quad y(1) = 0,
\end{aligned}
\tag{3.3.8}
$$

does not possess a solution for all $f \in C[0,1]$, and if it exists, it is not unique. Assume a solution $y \in C^2[0,1]$ of (3.3.8) for $f(x) = \sin \pi x$. Then, after two integrations by parts, we find

$$
-\int_0^1 \big(y''f + \pi^2 yf\big)\,dx = -\int_0^1 y\,(f'' + \pi^2 f)\,dx = 0 = \int_0^1 f^2\,dx > 0,
\tag{3.3.9}
$$

which is a contradiction. For $f = 0$, (3.3.8) has two solutions: $y = 0$ and $y(x) = \sin \pi x$.

However, the condition $q(x) \ge 0$ is not necessary. For $p(x) \ge d > 0$, one has to assume that the homogeneous boundary value problem (3.3.4)$_{1,2}$ with $f = 0$ has only the trivial solution $y = 0$. Then the claim of Proposition 3.3.2 is true. The proof relies on the nontrivial Riesz-Schauder theory of functional analysis, and the above-mentioned application is called the “Fredholm alternative.”

Remark. The left side of equation (3.3.4)$_1$ is not a general linear ordinary differential equation of second order, but it is of a so-called self-adjoint form. A general equation $-(py')' + ry' + qy$ with an arbitrary coefficient $r \in C[a,b]$ can be transformed into a self-adjoint form by multiplication by $e^R$ where $R' = -r/p$:

$$
e^R\big[-(py')' + ry' + qy\big] = -(e^R p y')' + e^R q y.
$$
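The identity in the remark can be verified with the product rule. The following short derivation is only a sketch under the remark's own assumption $R' = -r/p$; it introduces no symbols beyond those of the remark.

```latex
\begin{align*}
-(e^{R} p y')' + e^{R} q y
  &= -e^{R} R' \, p y' - e^{R} (p y')' + e^{R} q y \\
  &= e^{R} r y' - e^{R} (p y')' + e^{R} q y
     && \text{since } R' p = -r \\
  &= e^{R} \bigl[ -(p y')' + r y' + q y \bigr].
\end{align*}
```

Since $r/p$ is continuous, $R \in C^1[a,b]$, so the new leading coefficient $e^R p$ is again in $C^1[a,b]$ and bounded below by a positive constant.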

For a computation of the unique solution $y_0 \in C^2[a,b]$ of (3.3.4), we describe the Ritz method of Proposition 3.2.8: Let $S = \{e_n\}_{n \in \mathbb{N}}$ be a Schauder basis in the Hilbert space $W_0^{1,2}(a,b)$, and let $U_N = \operatorname{span}[e_1, \dots, e_N]$ be the $N$-dimensional subspace. Then

$$
J(y_N) = \inf\{J(y) \mid y \in U_N\}
\tag{3.3.10}
$$

defines by Proposition 3.2.8 a minimizing sequence $(y_N)_{N \in \mathbb{N}} \subset W_0^{1,2}(a,b)$ of the functional (3.1.19), which converges by Proposition 3.1.2 strongly to the global minimizer $y_0 \in W_0^{1,2}(a,b)$ of $J$. This minimizer is the classical solution $y_0 \in C^2[a,b]$ of (3.3.4), cf. Propositions 3.1.2 and 3.3.2. In particular $\lim_{N \to \infty} y_N = y_0$ in $C[a,b]$ by Proposition 3.2.1, which means uniform convergence.

The coefficients of $y_N = \sum_{n=1}^N c_n^N e_n$ are determined by the linear system

$$
\begin{aligned}
&\sum_{n=1}^N \alpha_{kn} c_n^N = \beta_k, \quad k = 1, \dots, N, \quad \text{where}\\
&\alpha_{kn} = B(e_k, e_n), \quad \beta_k = \ell(e_k) = (f, e_k)_{0,2},
\end{aligned}
\tag{3.3.11}
$$

cf. Proposition 3.2.8. The so-called stiffness matrix $(\alpha_{kn}) \in \mathbb{R}^{N \times N}$ can be reduced to a diagonal form by an appropriate choice of the Schauder basis: In case of the eigenfunctions of the corresponding Sturm-Liouville eigenvalue problem (3.3.12), the matrix $(\alpha_{kn})$ is a diagonal matrix, cf. Propositions 3.3.3, 3.1.4. However, these eigenfunctions are known only in special cases.

The finite element method to solve (3.3.4) approximately relies also on the Ritz method as described before: here the “elements” $e_n$ have “small” support such that the stiffness matrix is “sparse,” having nonzero entries only near the diagonal. There are efficient solvers for the linear system (3.3.11) in this case.
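The linear system (3.3.11) can be illustrated by a minimal sketch in one of the special cases where the eigenfunctions are known. The choices $p \equiv 1$, $q \equiv 1$, $f \equiv 1$ on $[0,1]$, the basis $e_n(x) = \sin n\pi x$, and all function names below are assumptions made for this illustration only; for this basis the stiffness matrix is diagonal, so each coefficient is a single division.

```python
import math

def ritz_coefficients(N, q=1.0):
    """Coefficients c_n of y_N = sum_n c_n sin(n*pi*x) for -y'' + q*y = 1 on [0,1].

    Assumed setting (not from the text): p = 1, constant q >= 0, f = 1.
    """
    coeffs = []
    for n in range(1, N + 1):
        # alpha_nn = B(e_n, e_n) = int (e_n')^2 + q*e_n^2 dx = ((n*pi)^2 + q) / 2;
        # the off-diagonal entries vanish by orthogonality of sines and cosines.
        alpha_nn = ((n * math.pi) ** 2 + q) / 2.0
        # beta_n = int_0^1 1 * sin(n*pi*x) dx = (1 - cos(n*pi)) / (n*pi)
        beta_n = (1.0 - math.cos(n * math.pi)) / (n * math.pi)
        coeffs.append(beta_n / alpha_nn)
    return coeffs

def ritz_solution(coeffs, x):
    """Evaluate y_N(x) = sum_n c_n sin(n*pi*x)."""
    return sum(c * math.sin(n * math.pi * x)
               for n, c in enumerate(coeffs, start=1))

def exact_solution(x, q=1.0):
    """Closed-form solution of -y'' + q*y = 1, y(0) = y(1) = 0, for q > 0."""
    s = math.sqrt(q)
    return (1.0 - math.cosh(s * (x - 0.5)) / math.cosh(0.5 * s)) / q

coeffs = ritz_coefficients(50)
for x in (0.25, 0.5, 0.75):
    print(x, ritz_solution(coeffs, x), exact_solution(x))
```

With a basis of small support (finite elements) the same system would instead be banded rather than diagonal; the sketch above only covers the eigenfunction basis.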

3. The Sturm-Liouville eigenvalue problem

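$$
\begin{aligned}
&-(py')' + qy = \lambda \rho y \quad \text{on } [a,b],\\
&y(a) = 0, \quad y(b) = 0.
\end{aligned}
\tag{3.3.12}
$$

The coefficients $p$ and $q$ and the “weight function” $\rho$ fulfill:

$$
\begin{aligned}
&p \in C^1[a,b], \quad q, \rho \in C[a,b],\\
&p(x) \ge d > 0 \text{ and } q(x) \ge -c_2 \rho(x) \quad \text{for all } x \in [a,b],\\
&\rho(x) > 0 \quad \text{for all } x \in (a,b).
\end{aligned}
\tag{3.3.13}
$$

For $\rho \equiv 1$, (3.3.12) is a common eigenvalue problem. A number $\lambda \in \mathbb{R}$ is called an eigenvalue if (3.3.12) has a nontrivial solution $y \in C^2[a,b]$, called an eigenfunction with eigenvalue $\lambda$.

If $\rho(x) \ge \delta > 0$ for all $x \in [a,b]$, then condition (3.3.13)$_2$ is fulfilled for any $q \in C[a,b]$, with some nonnegative constant $c_2$.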

Proposition 3.3.3. Under the hypotheses (3.3.13), the Sturm-Liouville eigenvalue problem (3.3.12) possesses infinitely many linearly independent eigenfunctions $u_n \in C^2[a,b]$ with eigenvalues $\lambda_n \in \mathbb{R}$, which satisfy

$$
\begin{aligned}
&\int_a^b \rho u_n u_m\,dx = \delta_{nm},\\
&\lambda_1 < \lambda_2 < \cdots < \lambda_n < \cdots, \quad \lim_{n \to \infty} \lambda_n = +\infty.
\end{aligned}
\tag{3.3.14}
$$

The system $S = \{u_n\}_{n \in \mathbb{N}}$ of the eigenfunctions provides a Schauder basis in $W_0^{1,2}(a,b)$, i.e., any $y \in W_0^{1,2}(a,b)$ can be developed into a series

$$
y = \sum_{n=1}^{\infty} c_n u_n \quad \text{where } c_n = \int_a^b \rho y u_n\,dx,
\tag{3.3.15}
$$

which converges in $W_0^{1,2}(a,b)$.

Proof. On the Hilbert space $W_0^{1,2}(a,b)$ we define the bilinear forms

$$
\begin{aligned}
&B(y,z) = \int_a^b \big(p(x)y'z' + q(x)yz\big)\,dx,\\
&K(y,z) = \int_a^b \rho(x)yz\,dx,
\end{aligned}
\tag{3.3.16}
$$

and we note that they are symmetric, continuous, and also that $B$ is $K$-coercive according to (3.1.64).

Furthermore $K(y) = K(y,y) > 0$ for $y \ne 0$. It is true that $K$ is weakly sequentially continuous in the sense of (3.1.28)$_2$, but the proof is not simple. Therefore we show only the property of $K$ that we use in the proofs of the propositions in Paragraph 3.1.

Let $(y_n)_{n \in \mathbb{N}}$ be a minimizing sequence, which is bounded in $W_0^{1,2}(a,b)$. By (3.1.13) or (3.1.14) and Proposition 3.2.4, this sequence contains a subsequence $(y_{n_k})_{k \in \mathbb{N}}$, which converges weakly in $W_0^{1,2}(a,b)$ as well as in the sense (3.2.26) to a function $y_0 \in W_0^{1,2}(a,b)$. The uniform convergence (3.2.26)$_1$ then implies $\lim_{k \to \infty} K(y_{n_k}) = K(y_0)$.



Therefore all results of Paragraph 3.1 on the weak eigenvalue problem,

$$
B(u,h) = \lambda K(u,h) \quad \text{for all } h \in W_0^{1,2}(a,b),
\tag{3.3.17}
$$

hold for $B$ and $K$ defined in (3.3.16). In particular, for $u \in W_0^{1,2}(a,b)$,

$$
\int_a^b \big(p(x)u'h' + (q(x) - \lambda \rho(x))uh\big)\,dx = 0 \quad \text{for all } h \in W_0^{1,2}(a,b),
\tag{3.3.18}
$$

and we may copy the arguments in (3.3.7) for the regularity of $y_0$: Each weak eigenfunction $u \in W_0^{1,2}(a,b)$ satisfying (3.3.18) for some eigenvalue $\lambda$ is in $C^2[a,b]$ and fulfills (3.3.12) in the classical sense. If the coefficients $p$, $q$, and the weight function $\rho$ are “smooth,” a “bootstrapping” yields as much regularity of the eigenfunction as $p$, $q$, and $\rho$ allow.

The claim $\lambda_n < \lambda_{n+1}$ for all $n \in \mathbb{N}$ means that all eigenvalues have geometric multiplicity one. Assume $\lambda_n = \lambda_{n+1}$. By the homogeneous boundary condition at $x = a$ there exists $(\alpha,\beta) \in \mathbb{R}^2$, $(\alpha,\beta) \ne (0,0)$, such that

$$
\begin{aligned}
&\alpha u_n(a) + \beta u_{n+1}(a) = 0, \text{ and}\\
&\alpha u_n'(a) + \beta u_{n+1}'(a) = 0.
\end{aligned}
\tag{3.3.19}
$$

For the linear differential equation of second order

$$
y'' + \frac{p'}{p} y' + \frac{\lambda_n \rho - q}{p} y = 0
\tag{3.3.20}
$$

the initial values $y(a)$ and $y'(a)$ determine a unique solution, which, by the trivial initial values (3.3.19), is the trivial solution $\alpha u_n + \beta u_{n+1} = 0$. Therefore $\alpha K(u_n,u_n) + \beta K(u_{n+1},u_n) = \alpha = 0$ (cf. (3.2.39)$_1$), and $\alpha K(u_n,u_{n+1}) + \beta K(u_{n+1},u_{n+1}) = \beta = 0$, which contradicts the choice $(\alpha,\beta) \ne (0,0)$. □

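Remark. Each of the eigenvalues $\lambda_n$ not only has geometric multiplicity one, but by symmetry also algebraic multiplicity one: Assume that

$$
-(pu')' + qu - \lambda_n \rho u = u_n
\tag{3.3.21}
$$

has a solution $u \in C^2[a,b] \cap \{u(a) = 0,\ u(b) = 0\}$. Then, multiplying (3.3.21) by $u_n$ and integrating by parts, and using (3.3.17) for $u_n$ with $h = u$,

$$
\begin{aligned}
&B(u,u_n) - \lambda_n K(u,u_n) = \|u_n\|_{0,2}^2,\\
&B(u_n,u) - \lambda_n K(u_n,u) = 0,
\end{aligned}
\tag{3.3.22}
$$

and hence, $u_n = 0$ by the symmetry of $B$ and $K$. This contradiction shows that there is no generalized eigenfunction, which proves the algebraic simplicity of the eigenvalue.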

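According to (3.1.67), the eigenvalues are determined as follows:

$$
\begin{aligned}
&\lambda_n = \min\Big\{\frac{B(y)}{K(y)} \,\Big|\, y \in W_0^{1,2}(a,b),\ y \ne 0,\ K(y,u_i) = 0,\ i = 1, \dots, n-1\Big\},\\
&\text{where } u_1, \dots, u_{n-1} \text{ are the first } n-1 \text{ eigenfunctions.}\\
&\text{The minimum is attained in the } n\text{th eigenfunction } u_n,\\
&\text{which can be normalized to } K(u_n) = 1.
\end{aligned}
\tag{3.3.23}
$$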

The quotient $B(y)/K(y)$ is called Rayleigh quotient.

For $n = 1$, (3.3.23) implies a Poincaré inequality:

$$
\lambda_1 \int_a^b \rho y^2\,dx \le \int_a^b \big(p(y')^2 + qy^2\big)\,dx \quad \text{for all } y \in W_0^{1,2}(a,b).
\tag{3.3.24}
$$
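As a numerical illustration of (3.3.23) for $n = 1$ and of (3.3.24), the following sketch evaluates the Rayleigh quotient by a midpoint rule. The setting $p \equiv 1$, $q \equiv 0$, $\rho \equiv 1$ on $[0,1]$ (where $\lambda_1 = \pi^2$), the trial function $y(x) = x(1-x)$, and the helper name `rayleigh_quotient` are assumptions chosen for this example.

```python
import math

def rayleigh_quotient(y, dy, a=0.0, b=1.0,
                      p=lambda x: 1.0, q=lambda x: 0.0, rho=lambda x: 1.0,
                      steps=20000):
    """Approximate B(y)/K(y) on [a,b] by the composite midpoint rule.

    y and dy are the function and its derivative; p, q, rho default to the
    assumed model case p = 1, q = 0, rho = 1.
    """
    h = (b - a) / steps
    B = 0.0
    K = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        B += (p(x) * dy(x) ** 2 + q(x) * y(x) ** 2) * h
        K += rho(x) * y(x) ** 2 * h
    return B / K

# Trial function vanishing at both end points: exact quotient (1/3)/(1/30) = 10.
trial = rayleigh_quotient(lambda x: x * (1.0 - x), lambda x: 1.0 - 2.0 * x)

# First eigenfunction sin(pi*x): the quotient equals lambda_1 = pi^2.
first = rayleigh_quotient(lambda x: math.sin(math.pi * x),
                          lambda x: math.pi * math.cos(math.pi * x))

print(trial, first, math.pi ** 2)
```

The trial function gives an upper bound for $\lambda_1$, in agreement with the minimum property; the bound is attained only by multiples of the first eigenfunction.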

The eigenvalues can also be determined by a minimax principle:

Proposition 3.3.4. Let the functions $v_1, \dots, v_{n-1} \in L^2(a,b)$ for $n \ge 2$ define the following closed subspace of $W_0^{1,2}(a,b)$:

$$
V(v_1, \dots, v_{n-1}) = \{y \in W_0^{1,2}(a,b) \mid K(y,v_i) = 0 \text{ for } i = 1, \dots, n-1\},
\tag{3.3.25}
$$

and let

$$
d(v_1, \dots, v_{n-1}) = \min\Big\{\frac{B(y)}{K(y)} \,\Big|\, y \in V(v_1, \dots, v_{n-1}),\ y \ne 0\Big\}.
\tag{3.3.26}
$$

Then

$$
\lambda_n = \max_{v_1, \dots, v_{n-1} \in L^2(a,b)} \{d(v_1, \dots, v_{n-1})\}.
\tag{3.3.27}
$$

The proof is the same as that for Proposition 3.1.7.

The next propositions are consequences of the minimax principle: We establish the monotonicity of the eigenvalues of the Sturm-Liouville eigenvalue problem in dependence on the weight function $\rho$ and on the length of the interval $(a,b)$.

Proposition 3.3.5. Assume in (3.3.13)$_2$ that $q(x) \ge 0$ for all $x \in [a,b]$. For two continuous weight functions $\rho_1$, $\rho_2$ of the Sturm-Liouville eigenvalue problem (3.3.12), suppose that

$$
0 < \delta \le \rho_1(x) \le \rho_2(x) \quad \text{for all } x \in [a,b].
\tag{3.3.28}
$$

Then the $n$th eigenvalues $\lambda_n = \lambda_n(\rho)$ of (3.3.12) satisfy

$$
\lambda_n(\rho_1) \ge \lambda_n(\rho_2) \quad \text{for all } n \in \mathbb{N}.
\tag{3.3.29}
$$

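Proof. Define $K_i(y,z) = \int_a^b \rho_i yz\,dx$, $K_i(y) := K_i(y,y)$, and let $d_i(v_1, \dots, v_{n-1})$ be the minimum (3.3.26) of the Rayleigh quotient $B(y)/K_i(y)$, $i = 1,2$, in $V(v_1, \dots, v_{n-1})$ for arbitrary functions $v_1, \dots, v_{n-1} \in L^2(a,b)$. Since

$$
\begin{aligned}
&K_1(y,v_k) = K_2\Big(y, \frac{\rho_1}{\rho_2} v_k\Big), \quad k = 1, \dots, n-1,\\
&\frac{B(y)}{K_1(y)} \ge \frac{B(y)}{K_2(y)} \quad \text{for } y \in W_0^{1,2}(a,b),\ y \ne 0,
\end{aligned}
\tag{3.3.30}
$$

definitions (3.3.27) and (3.3.23), with the first $n-1$ eigenfunctions $u_1^2, \dots, u_{n-1}^2$ determined with weight function $\rho_2$ (the superscript 2 is an index, not a power), imply

$$
\lambda_n(\rho_1) \ge d_1\Big(\frac{\rho_2}{\rho_1} u_1^2, \dots, \frac{\rho_2}{\rho_1} u_{n-1}^2\Big) \ge d_2(u_1^2, \dots, u_{n-1}^2) = \lambda_n(\rho_2),
\tag{3.3.31}
$$

which is the claim (3.3.29). In (3.3.30)$_2$ we use $B(y) > 0$ for $y \ne 0$, guaranteed by the assumption on $q$. □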

If $\rho_1(x_0) < \rho_2(x_0)$ for at least one $x_0 \in (a,b)$, then $\lambda_1(\rho_1) > \lambda_1(\rho_2) > 0$, cf. Exercise 3.3.4.

Proposition 3.3.6. For two intervals of the Sturm-Liouville eigenvalue problem (3.3.12), suppose that

$$
(a_1,b_1) \subsetneq (a_2,b_2).
\tag{3.3.32}
$$

Then the $n$th eigenvalues $\lambda_n = \lambda_n(a,b)$ of (3.3.12) satisfy

$$
\lambda_n(a_1,b_1) > \lambda_n(a_2,b_2) \quad \text{for all } n \in \mathbb{N}.
\tag{3.3.33}
$$

Proof. If we extend any $y \in W_0^{1,2}(a_1,b_1)$ by zero off the interval $(a_1,b_1)$, then $W_0^{1,2}(a_1,b_1) \subset W_0^{1,2}(a_2,b_2)$, cf. Exercise 3.3.3. We denote the bilinear forms (3.3.16) over the interval $(a_i,b_i)$ by $B_i$ and $K_i$, $i = 1,2$.

Let $u_1^2, \dots, u_{n-1}^2$ be the first $n-1$ eigenfunctions over $(a_2,b_2)$. Then by (3.3.23) and (3.3.27),

$$
\begin{aligned}
&\lambda_n(a_2,b_2)\\
&= \min\Big\{\frac{B_2(y)}{K_2(y)} \,\Big|\, 0 \ne y \in W_0^{1,2}(a_2,b_2),\ K_2(y,u_k^2) = 0,\ k = 1, \dots, n-1\Big\}\\
&\le \min\Big\{\frac{B_1(y)}{K_1(y)} \,\Big|\, 0 \ne y \in W_0^{1,2}(a_1,b_1),\ K_1(y,u_k^2) = 0,\ k = 1, \dots, n-1\Big\}\\
&\le \lambda_n(a_1,b_1),
\end{aligned}
\tag{3.3.34}
$$

which is due to the fact that in (3.3.34)$_3$ $B_1$ can be replaced by $B_2$, and $K_1$ can be replaced by $K_2$ if $y \in W_0^{1,2}(a_1,b_1)$ is extended by zero off $(a_1,b_1)$.

Assume that $\lambda_n(a_2,b_2) = \lambda_n(a_1,b_1)$. Then there exists a nonzero function $y_n \in W_0^{1,2}(a_1,b_1)$, which, after extension by zero, minimizes the Rayleigh quotient $B_2(y)/K_2(y)$ in $W_{n-1} = \{y \in W_0^{1,2}(a_2,b_2) \mid K_2(y,u_k^2) = 0,\ k = 1, \dots, n-1\}$.

The proof of Proposition 3.1.4 then shows that $y_n$ is a weak eigenfunction, and Proposition 3.3.3 finally shows that $y_n \in C^2[a_2,b_2]$ is a classical eigenfunction of (3.3.12) with eigenvalue $\lambda_n(a_2,b_2)$. Since $(a_1,b_1)$ is properly contained in $(a_2,b_2)$, an extension by zero implies that there is some $x_0 \in (a_2,b_2)$ such that $y_n(x_0) = y_n'(x_0) = 0$. As mentioned already in the proof of Proposition 3.3.3, the uniqueness of the initial value problem (3.3.20) with $y(x_0) = y'(x_0) = 0$ implies $y_n(x) = 0$ for all $x \in [a_2,b_2]$, contradicting $y_n \ne 0$. This proves $\lambda_n(a_1,b_1) > \lambda_n(a_2,b_2)$. □

Here is an example:

$$
\begin{aligned}
&-y'' = \lambda y \quad \text{on } [a,b],\\
&y(a) = 0, \quad y(b) = 0,
\end{aligned}
\tag{3.3.35}
$$

has the eigenfunctions

$$
u_n(x) = \sin n\pi \frac{x-a}{b-a} \quad \text{with eigenvalues } \lambda_n = \Big(\frac{n\pi}{b-a}\Big)^2, \quad n \in \mathbb{N}.
\tag{3.3.36}
$$

In this example the $n$th eigenfunction $u_n$ has precisely $n-1$ simple zeros in $(a,b)$. The simplicity follows from the fact that $y(x_0) = y'(x_0) = 0$ for some $x_0 \in (a,b)$ implies $y = 0$. This property concerning the number of simple zeros of the eigenfunctions of a Sturm-Liouville eigenvalue problem holds more generally:

Proposition 3.3.7. The $n$th eigenfunction $u_n$ of a Sturm-Liouville eigenvalue problem (3.3.12) has at most $n-1$ simple zeros in $(a,b)$.

Proof. Assume that the eigenfunction $u_n \in C^2[a,b] \cap \{y(a) = 0,\ y(b) = 0\}$ has $m$ simple zeros $a = x_0 < x_1 < \cdots < x_m < x_{m+1} = b$ where $m \ge n$.

For $i = 1, \dots, n$, we define

$$
w_i(x) =
\begin{cases}
c_i u_n(x) & \text{for } x \in [x_{i-1},x_i],\\
0 & \text{for } x \notin [x_{i-1},x_i].
\end{cases}
\tag{3.3.37}
$$

By Exercise 3.3.3, $w_i \in W_0^{1,2}(x_{i-1},x_i) \subset W_0^{1,2}(a,x_n)$. Choose the constants $c_i \ne 0$ such that $K(w_i) = 1$. Then by (3.3.17) and the definition (3.3.37) of the functions $w_i$, we have

$$
\begin{aligned}
&B(w_i,w_j) = \lambda_n K(w_i,w_j) = \lambda_n \delta_{ij}, \quad i,j = 1, \dots, n,\\
&\text{with the } n\text{th eigenvalue } \lambda_n = \lambda_n(a,b).
\end{aligned}
\tag{3.3.38}
$$

For arbitrary $v_1, \dots, v_{n-1} \in L^2(a,x_n)$ there exist $n$ coefficients $\alpha_i$ to determine $y = \alpha_1 w_1 + \cdots + \alpha_n w_n \in W_0^{1,2}(a,x_n)$ such that

$$
\int_a^{x_n} \rho y v_k\,dx = \sum_{i=1}^n \alpha_i K(w_i,v_k) = 0 \quad \text{for } k = 1, \dots, n-1 \quad \text{and} \quad \sum_{i=1}^n \alpha_i^2 = 1.
\tag{3.3.39}
$$

Then $K(y) = 1$, $B(y) = \lambda_n$, and an application of the minimax principle of Proposition 3.3.4 gives

$$
\lambda_n(a,x_n) \le \lambda_n = \lambda_n(a,b).
\tag{3.3.40}
$$

The assumption $(a,x_n) \subsetneq (a,b)$ and (3.3.40) contradict the monotonicity (3.3.33) of Proposition 3.3.6. □

Proposition 3.3.7 implies that the first eigenfunction $u_1$ has no zero in $(a,b)$, which means that it has one sign in $(a,b)$. Typically one chooses a positive sign of $u_1$ in $(a,b)$. For all eigenfunctions the following proposition holds:

Proposition 3.3.8. The $n$th eigenfunction $u_n$ of the Sturm-Liouville eigenvalue problem (3.3.12) has precisely $n-1$ simple zeros in $(a,b)$. Between two consecutive zeros of $u_n$, there is precisely one simple zero of $u_{n+1}$, where the zeros $u_n(a) = 0$ and $u_n(b) = 0$ are taken into account.

Proof. Assume that between two consecutive zeros $x_i < x_{i+1}$ of $u_n$ there is no zero of $u_{n+1}$. Then there are two zeros $\tilde{x}_j < \tilde{x}_{j+1}$ of $u_{n+1}$ with $(x_i,x_{i+1}) \subset (\tilde{x}_j,\tilde{x}_{j+1}) \subset (a,b)$, and $u_n$ as well as $u_{n+1}$ has one sign in $(x_i,x_{i+1})$ and $(\tilde{x}_j,\tilde{x}_{j+1})$, respectively. Therefore $\lambda_n$ and $\lambda_{n+1}$ are the first eigenvalues of the eigenvalue problem over $[x_i,x_{i+1}]$ and $[\tilde{x}_j,\tilde{x}_{j+1}]$, respectively; due to $K$-orthogonality, all other eigenfunctions have to change sign in the underlying interval. According to Proposition 3.3.6, $\lambda_n \ge \lambda_{n+1}$, contradicting $\lambda_n < \lambda_{n+1}$, cf. (3.3.14)$_2$. Thus $u_{n+1}$ has a zero between two zeros of $u_n$.

We have shown that if $u_n$ has the zeros $a = x_0 < x_1 < \cdots < x_m < x_{m+1} = b$, then $u_{n+1}$ has at least $m+1$ zeros in $(a,b)$.

We know that $u_1$ has no zero, and thus, $u_2$ has at least one zero in $(a,b)$. The induction hypothesis that $u_n$ has at least $n-1$ zeros in $(a,b)$ implies that $u_{n+1}$ has at least $n$ zeros in $(a,b)$. Thus this is true for all $n \in \mathbb{N}$, and in combination with Proposition 3.3.7, we have proved Proposition 3.3.8. □

We remark that Proposition 3.3.8 excludes that $u_n$ and $u_{n+1}$ have a common zero in $(a,b)$.
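For the model problem (3.3.35) the zero-counting and interlacing statements can be checked by hand or by a few lines of code. The sketch below assumes the interval $[0,1]$, so that $u_n(x) = \sin n\pi x$ has its interior zeros at $k/n$; the function names are hypothetical and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def interior_zeros(n):
    """Zeros of u_n(x) = sin(n*pi*x) in the open interval (0,1): k/n, k = 1..n-1."""
    return [Fraction(k, n) for k in range(1, n)]

def interlaces(n):
    """True if every gap between consecutive zeros of u_n (with the end points
    0 and 1 counted as zeros) contains exactly one zero of u_{n+1}."""
    nodes = [Fraction(0)] + interior_zeros(n) + [Fraction(1)]
    finer = interior_zeros(n + 1)
    return all(sum(1 for z in finer if left < z < right) == 1
               for left, right in zip(nodes, nodes[1:]))

def has_common_zero(n):
    """True if u_n and u_{n+1} share a zero in (0,1)."""
    return bool(set(interior_zeros(n)) & set(interior_zeros(n + 1)))

for n in range(1, 6):
    print(n, len(interior_zeros(n)), interlaces(n), has_common_zero(n))
```

This only confirms the statement in the explicitly solvable case; the general case is the content of the proposition.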

More applications of the direct method are found in the literature cited in the reference list.

Exercises

3.3.1. According to (3.3.23), the $n$th eigenfunction $u_n \in C^2[a,b]$ is the global minimizer of $B$ defined on $C_0^1[a,b]$ under the isoperimetric constraints

$$
\begin{aligned}
&K(y) = 1,\\
&K(y,u_k) = 0 \quad \text{for } k = 1, \dots, n-1.
\end{aligned}
$$

Assuming the regularity $u_k \in C^2[a,b]$ for $k = 1, \dots, n-1$, Proposition 2.1.3 gives that $u_n$ satisfies the Euler-Lagrange equation

$$
\frac{d}{dx}(2pu_n') = 2qu_n + 2\tilde{\lambda}_n \rho u_n + \sum_{k=1}^{n-1} \tilde{\lambda}_k \rho u_k \quad \text{on } [a,b],
$$

with Lagrange multipliers $\tilde{\lambda}_n, \tilde{\lambda}_1, \dots, \tilde{\lambda}_{n-1}$, provided the constraints are not critical for $u_n$. Prove:

a) The isoperimetric constraints are not critical for $u_n$.
b) $\tilde{\lambda}_k = 0$ for $k = 1, \dots, n-1$.
c) $\tilde{\lambda}_n = -B(u_n)$.

3.3.2. Show that the system $\{u_n\}_{n \in \mathbb{N}}$ of all eigenfunctions of (3.3.12) is also complete or a Schauder basis in $L^2(a,b)$.

Hint: $\{u_n\}_{n \in \mathbb{N}}$ is complete in $L^2(a,b)$ ⇔ the closure of $\operatorname{span}[u_n \mid n \in \mathbb{N}]$ in $L^2(a,b)$ is $L^2(a,b)$, cf. [25], V.4.

3.3.3. For any $y \in W_0^{1,2}(a,b)$, show that its extension by zero off $[a,b]$ defines a function $y \in W_0^{1,2}(c,d)$ for all $c \le a < b \le d$.

3.3.4. Under the assumptions of Proposition 3.3.5, show that the additional assumption $\rho_1 \ne \rho_2$ implies $\lambda_1(\rho_1) > \lambda_1(\rho_2)$.



Appendix

First we prove some facts about manifolds, some of which are probably known from courses on calculus. The advantage of a revision here is that we introduce the terminology, which we use in applications of the calculus of variations. We prove Lagrange’s multiplier rule, Liouville’s theorem on volume preservation, the weak sequential compactness of bounded sets in a Hilbert space, and the theorem of Arzelà-Ascoli.

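For a continuously totally differentiable mapping

$$
\Psi : \mathbb{R}^n \to \mathbb{R}^m, \quad \text{where } m < n,
\tag{A.1}
$$

we investigate the nonempty zero set

$$
M = \{x \in \mathbb{R}^n \mid \Psi(x) = 0\} \subset \mathbb{R}^n
\tag{A.2}
$$

under the assumption that the Jacobian matrix

$$
D\Psi(x) = \Big(\frac{\partial \Psi_i}{\partial x_j}(x)\Big)_{\substack{i=1,\dots,m\\ j=1,\dots,n}} \in \mathbb{R}^{m \times n}
\tag{A.3}
$$

has the maximal rank $m$ for all $x \in M$. If the linear spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ are endowed with the canonical bases, we can identify the Jacobian matrix with a linear transformation $D\Psi(x) \in L(\mathbb{R}^n,\mathbb{R}^m)$. Then the following statements are equivalent:

$$
\begin{aligned}
&\operatorname{Range} D\Psi(x) = \mathbb{R}^m, \text{ or } D\Psi(x) \text{ is surjective},\\
&\dim \operatorname{Kernel} D\Psi(x) = n - m > 0 \quad \text{for all } x \in M.
\end{aligned}
\tag{A.4}
$$

The transposed matrix represents the dual or adjoint transformation, which is characterized by the Euclidean scalar product in $\mathbb{R}^m$ and $\mathbb{R}^n$ as follows:

$$
(D\Psi(x)y, z)_{\mathbb{R}^m} = (y, D\Psi(x)^* z)_{\mathbb{R}^n} \quad \text{for all } y \in \mathbb{R}^n,\ z \in \mathbb{R}^m.
\tag{A.5}
$$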

© Springer International Publishing AG 2018. H. Kielhöfer, Calculus of Variations, Texts in Applied Mathematics 67, https://doi.org/10.1007/978-3-319-71123-2


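The transposed matrix $D\Psi(x)^*$ has the same rank $m$ as the matrix $D\Psi(x)$, and $D\Psi(x)^* \in L(\mathbb{R}^m,\mathbb{R}^n)$ is injective, which is easily proved by (A.5): $\operatorname{Range} D\Psi(x) = \mathbb{R}^m$ implies $\operatorname{Kernel} D\Psi(x)^* = \{0\}$.

For a subspace $U \subset \mathbb{R}^n$ we define the orthogonal complement by $U^\perp = \{y \in \mathbb{R}^n \mid (y,u) = 0 \text{ for all } u \in U\}$. Then $\mathbb{R}^n$ is the direct sum $\mathbb{R}^n = U \oplus U^\perp$, which we prove in (A.45)–(A.48). In particular,

$$
\begin{aligned}
(\operatorname{Range} D\Psi(x)^*)^\perp &= \{y \in \mathbb{R}^n \mid (y, D\Psi(x)^* z)_{\mathbb{R}^n} = 0 \text{ for all } z \in \mathbb{R}^m\}\\
&= \{y \in \mathbb{R}^n \mid (D\Psi(x)y, z)_{\mathbb{R}^m} = 0 \text{ for all } z \in \mathbb{R}^m\}\\
&= \{y \in \mathbb{R}^n \mid D\Psi(x)y = 0\} = \operatorname{Kernel} D\Psi(x),\\
\mathbb{R}^n &= \operatorname{Kernel} D\Psi(x) \oplus \operatorname{Range} D\Psi(x)^*,
\end{aligned}
\tag{A.6}
$$

and the direct sum is orthogonal. We define for $x \in M$

$$
\begin{aligned}
&T_xM = \operatorname{Kernel} D\Psi(x), \quad \dim T_xM = n - m,\\
&N_xM = \operatorname{Range} D\Psi(x)^* = (T_xM)^\perp, \quad \dim N_xM = m
\end{aligned}
\tag{A.7}
$$

and therefore $\mathbb{R}^n = T_xM \oplus N_xM$ for all $x \in M$.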

Below we fix $x_0 \in M$, and $x \in \mathbb{R}^n$ is arbitrary. Then there is a unique decomposition $x - x_0 = y + z$, where $y \in T_{x_0}M$ and $z \in N_{x_0}M$. Defining

$$
\begin{aligned}
&F(y,z) = \Psi(x_0 + y + z) \text{ yields a mapping}\\
&F : T_{x_0}M \times N_{x_0}M \to \mathbb{R}^m, \text{ satisfying}\\
&F(0,0) = 0.
\end{aligned}
\tag{A.8}
$$

The mapping is continuously totally differentiable with respect to its two variables, and in particular

$$
D_zF(0,0) = D\Psi(x_0)|_{N_{x_0}M} : N_{x_0}M \to \mathbb{R}^m \text{ is bijective.}
\tag{A.9}
$$

Observe that $N_{x_0}M$ is the orthogonal complement of $T_{x_0}M = \operatorname{Kernel} D\Psi(x_0)$. Property (A.9) allows the application of the implicit function theorem: There exists a neighborhood $B_r(0) = \{y \in T_{x_0}M \mid \|y\| < r\}$ and a continuously differentiable mapping

$$
\begin{aligned}
&\varphi : B_r(0) \subset T_{x_0}M \to N_{x_0}M, \text{ satisfying } \varphi(0) = 0 \text{ and}\\
&F(y,\varphi(y)) = 0 \quad \text{for all } y \in B_r(0).
\end{aligned}
\tag{A.10}
$$

Moreover, all zeros of $F$ in a neighborhood of $(0,0) \in T_{x_0}M \times N_{x_0}M = \mathbb{R}^n$ are given by $(y,\varphi(y))$. By definition (A.8), this implies that the graph of $\varphi$ shifted by the vector $x_0$ coincides with the set $M$ in a neighborhood $U(x_0)$ of $x_0$ in $\mathbb{R}^n$:

$$
\{x_0 + y + \varphi(y) \mid y \in B_r(0) \subset T_{x_0}M\} = M \cap U(x_0).
\tag{A.11}
$$



We now show that the affine subspace $x_0 + T_{x_0}M$ is tangent to the set $M$ in $x_0$. By virtue of $F(y,\varphi(y)) = 0$ for all $y \in B_r(0)$, differentiation via the chain rule at $(0,0)$ yields

$$
D_yF(0,0) + D_zF(0,0)D\varphi(0) = 0 \quad \text{in } L(T_{x_0}M,\mathbb{R}^m).
\tag{A.12}
$$

According to definition (A.8),

$$
\begin{aligned}
&D_yF(0,0) = D\Psi(x_0)|_{T_{x_0}M} = 0, \text{ due to } T_{x_0}M = \operatorname{Kernel} D\Psi(x_0),\\
&D_zF(0,0) = D\Psi(x_0)|_{N_{x_0}M} : N_{x_0}M \to \mathbb{R}^m \text{ is bijective, and hence,}\\
&D\varphi(0) = 0 \quad \text{in } L(T_{x_0}M, N_{x_0}M).
\end{aligned}
\tag{A.13}
$$

The derivatives of $\varphi$ in all directions of the subspace $T_{x_0}M$ vanish in $y = 0$, and therefore the term “tangent space of $M$ in $x_0$” is justified. The orthogonal space $N_{x_0}M$ is called “normal space of $M$ in $x_0$”.

Fig. A.1 Tangent and Normal Space (labels in the figure: $x_0 + T_{x_0}M$, $x_0 + N_{x_0}M$, $M$, $U(x_0)$, $x_0$)

We define on the neighborhood $U(x_0) = \{x = x_0 + y + z \mid y \in T_{x_0}M,\ \|y\| < r,\ z \in N_{x_0}M,\ \|z\| < r\}$, where $\|\ \|$ is the Euclidean norm, the mapping $H(x) = H(x_0 + y + z) = x + \varphi(y)$. By (A.11), $H((x_0 + T_{x_0}M) \cap U(x_0)) = M \cap U(x_0)$. This mapping is also called a “local straightening,” and due to $DH(x_0) = E + D\varphi(0) = E$ (which is the identity matrix), $H : U(x_0) \to H(U(x_0))$ is a diffeomorphism, cf. Figure A.2.

The set $M$ defined by (A.2), with the maximal rank condition (A.3) of the Jacobian matrix, is a so-called continuously differentiable manifold of dimension $n - m$.

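Fig. A.2 A Local Straightening

Any direct and orthogonal decomposition $\mathbb{R}^n = U \oplus V$ defines orthogonal projections, whose properties are the following:

$$
\begin{aligned}
&\text{The unique decomposition } x = y + z \text{ for } x \in \mathbb{R}^n,\ y \in U,\ z \in V\\
&\text{defines via } Px = y \text{ and } Qx = z\\
&\text{linear transformations } P, Q \in L(\mathbb{R}^n,\mathbb{R}^n), \text{ satisfying:}\\
&P = I - Q, \quad Q = I - P, \quad P^2 = P, \quad Q^2 = Q,\\
&\operatorname{Range} P = U, \quad \operatorname{Kernel} P = V, \quad \operatorname{Range} Q = V, \quad \operatorname{Kernel} Q = U,\\
&(Px, \tilde{x}) = (x, P\tilde{x}) = (Px, P\tilde{x}),\\
&(Qx, \tilde{x}) = (x, Q\tilde{x}) = (Qx, Q\tilde{x}) \quad \text{for } x, \tilde{x} \in \mathbb{R}^n.
\end{aligned}
\tag{A.14}
$$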

The symmetries are consequences of the orthogonality of the two subspaces: $(Px, \tilde{x}) = (Px, P\tilde{x} + Q\tilde{x}) = (Px, P\tilde{x}) = (Px + Qx, P\tilde{x}) = (x, P\tilde{x})$ and analogous relations for $(Qx, \tilde{x})$.

$P$ is called the orthogonal projection on $U$ along $V$, and $Q$ is called the orthogonal projection on $V$ along $U$.

For the direct and orthogonal decomposition $\mathbb{R}^n = T_xM \oplus N_xM$ for $x \in M$, the orthogonal projections are referred to as $P(x)$ and $Q(x)$, where $P(x)$ is the projection on $T_xM$ along $N_xM$ and $Q(x)$ is the projection on $N_xM$ along $T_xM$.

Next we investigate the differentiability of these projections. Assume that

$$
\Psi \in C^k(\mathbb{R}^n,\mathbb{R}^m) \quad \text{for } k \ge 1,
\tag{A.15}
$$

i.e., $\Psi$ is $k$ times continuously partially differentiable. Then the components of the Jacobian matrix $D\Psi(x)$ and also of the transposed matrix $D\Psi(x)^*$ are $(k-1)$ times continuously partially differentiable with respect to $x \in \mathbb{R}^n$. By assumption, $D\Psi(x)^* \in L(\mathbb{R}^m,\mathbb{R}^n)$ is injective, and thus, it maps the canonical basis in $\mathbb{R}^m$ onto a basis in $N_xM$, which is $(k-1)$ times continuously partially differentiable with respect to $x \in \mathbb{R}^n$. When this basis is orthonormalized according to E. Schmidt (1876–1959), we obtain

$$
\begin{aligned}
&\text{an orthonormal basis } \{b_1(x), \dots, b_m(x)\} \subset N_xM,\\
&\text{which is } (k-1) \text{ times continuously partially differentiable}\\
&\text{with respect to } x \in M, \text{ and the orthogonal projection}\\
&Q(x) : \mathbb{R}^n \to N_xM \text{ along } T_xM \text{ is defined by}\\
&Q(x)\tilde{x} = \sum_{i=1}^m (\tilde{x}, b_i(x))\,b_i(x) \quad \text{for } \tilde{x} \in \mathbb{R}^n.
\end{aligned}
\tag{A.16}
$$

As summarized in (A.14)

$$
\begin{aligned}
&P(x) = I - Q(x) : \mathbb{R}^n \to T_xM\\
&\text{is the orthogonal projection on } T_xM \text{ along } N_xM.
\end{aligned}
\tag{A.17}
$$

By their construction, both projections $Q$ and $P$ are $(k-1)$ times continuously partially differentiable with respect to $x \in M$. (The differentiability holds in a neighborhood of $x \in M$ in $\mathbb{R}^n$, due to the fact that the maximal rank condition of the Jacobian matrix holds also in a neighborhood of $x \in M$ in $\mathbb{R}^n$.)

We show that locally (where $\|\ \|$ is the Euclidean norm),

$$
\begin{aligned}
&T_xM = P(x)T_{x_0}M, \quad N_xM = Q(x)N_{x_0}M\\
&\text{for } \|x - x_0\| < \delta, \text{ where } 0 < \delta \text{ is sufficiently small.}
\end{aligned}
\tag{A.18}
$$

If $P(x) : T_{x_0}M \to T_xM$ is injective, then it is also surjective. Assume $P(x)y = 0$ for $y \in T_{x_0}M$. Since $P(x_0)y = y$ and $\|y\| = \|P(x_0)y - P(x)y\| \le \|P(x_0) - P(x)\|\,\|y\| < \varepsilon\|y\|$, due to the continuity of the projection, we conclude that $y = 0$. The argument for $Q(x) : N_{x_0}M \to N_xM$ is the same.

Let $\{a_1, \dots, a_{n-m}\}$ be an orthonormal basis in $T_{x_0}M$. By (A.18), $\{P(x)a_1, \dots, P(x)a_{n-m}\}$ is a basis in $T_xM$ for all $\|x - x_0\| < \delta$, which we orthonormalize. Hence, we obtain

$$
\begin{aligned}
&\text{an orthonormalized basis } \{a_1(x), \dots, a_{n-m}(x)\} \subset T_xM,\\
&\text{which is } (k-1) \text{ times continuously partially differentiable}\\
&\text{with respect to } x \in M, \text{ and}\\
&P(x) : \mathbb{R}^n \to T_xM \text{ along } N_xM \text{ is defined by}\\
&P(x)\tilde{x} = \sum_{i=1}^{n-m} (\tilde{x}, a_i(x))\,a_i(x) \quad \text{for } \tilde{x} \in \mathbb{R}^n.
\end{aligned}
\tag{A.19}
$$

The sets $\{T_xM \mid x \in M\}$ and $\{N_xM \mid x \in M\}$ are called tangent and normal bundle, respectively, and they can be given the structure of a manifold.
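The construction of the projections $Q(x)$ and $P(x) = I - Q(x)$ can be made concrete in the simplest case $m = 1$, where the orthonormalization reduces to normalizing the gradient. The sketch below assumes the unit sphere $\Psi(x) = x_1^2 + x_2^2 + x_3^2 - 1$ in $\mathbb{R}^3$; this choice of $\Psi$, the sample point, and the function names are illustrative assumptions, not taken from the text.

```python
import math

def grad_psi(x):
    """Gradient of the assumed constraint Psi(x) = x1^2 + x2^2 + x3^2 - 1."""
    return [2.0 * xi for xi in x]

def dot(u, v):
    """Euclidean scalar product."""
    return sum(ui * vi for ui, vi in zip(u, v))

def normal_projection(x, v):
    """Q(x)v = (v, b(x)) b(x), with b(x) the normalized gradient of Psi at x."""
    g = grad_psi(x)
    norm = math.sqrt(dot(g, g))
    b = [gi / norm for gi in g]
    c = dot(v, b)
    return [c * bi for bi in b]

def tangent_projection(x, v):
    """P(x)v = v - Q(x)v, the orthogonal projection onto the tangent space."""
    qv = normal_projection(x, v)
    return [vi - qi for vi, qi in zip(v, qv)]

x0 = [0.6, 0.0, 0.8]      # a point on the unit sphere
v = [1.0, 2.0, 3.0]       # an arbitrary vector in R^3
pv = tangent_projection(x0, v)
qv = normal_projection(x0, v)
print(pv, qv, dot(pv, qv))
```

For $m > 1$ the single normalized gradient would be replaced by a Gram-Schmidt orthonormalization of the $m$ gradients, as in (A.16).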

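Next we establish the multiplier rule of Lagrange. Let

$$
f : \mathbb{R}^n \to \mathbb{R} \text{ be continuously totally differentiable,}
\tag{A.20}
$$

and assume that $x_0 \in M$ minimizes $f$ locally under the constraint $\Psi(x) = 0$, i.e.,

$$
f(x_0) \le f(x) \quad \text{for all } x \in M \text{ satisfying } \|x - x_0\| < d
\tag{A.21}
$$

where $d > 0$ is some constant. Then, by (A.11), the curve $\{x(t) = x_0 + at + \varphi(at) \mid t \in (-\varepsilon,\varepsilon)\} \subset M$ for any $a \in T_{x_0}M$ (with $\varepsilon$ depending on $a$), and by (A.21), $g : (-\varepsilon,\varepsilon) \to \mathbb{R}$, defined by $g(t) = f(x_0 + at + \varphi(at))$, is locally minimal at $t = 0$. Consequently (where $(\ ,\ )$ is the Euclidean scalar product),

$$
\begin{aligned}
g'(0) &= (\nabla f(x_0), \dot{x}(0)) = (\nabla f(x_0), a + D\varphi(0)a)\\
&= (\nabla f(x_0), a) = 0, \quad \text{where we use (A.13)}_3.
\end{aligned}
\tag{A.22}
$$

Since $a \in T_{x_0}M$ is arbitrary, the gradient $\nabla f(x_0) \in (T_{x_0}M)^\perp = N_{x_0}M = \operatorname{Range} D\Psi(x_0)^*$, cf. (A.7). By convention (cf. (A.3)), the columns of $D\Psi(x_0)^* \in \mathbb{R}^{n \times m}$ are the gradients of $\Psi_i$, $i = 1, \dots, m$, i.e.,

$$
\begin{aligned}
&D\Psi(x_0)^* = (\nabla\Psi_1(x_0) \cdots \nabla\Psi_m(x_0)), \text{ and}\\
&\nabla f(x_0) \in \operatorname{Range} D\Psi(x_0)^* = N_{x_0}M \text{ means}\\
&\nabla f(x_0) + \sum_{i=1}^m \lambda_i \nabla\Psi_i(x_0) = 0 \quad \text{for some } \lambda = (\lambda_1, \dots, \lambda_m) \in \mathbb{R}^m.
\end{aligned}
\tag{A.23}
$$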
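The relation (A.23)$_3$ can be checked on a small example. The sketch below assumes $n = 2$, $m = 1$, the objective $f(x) = x_1 + x_2$ and the constraint $\Psi(x) = x_1^2 + x_2^2 - 1$ (the unit circle); these functions and all names in the code are illustrative assumptions. On the circle, $f$ attains its minimum $-\sqrt{2}$ at $x_0 = (-1/\sqrt{2}, -1/\sqrt{2})$.

```python
import math

def grad_f(x):
    """Gradient of the assumed objective f(x) = x1 + x2."""
    return [1.0, 1.0]

def grad_psi(x):
    """Gradient of the assumed constraint Psi(x) = x1^2 + x2^2 - 1."""
    return [2.0 * x[0], 2.0 * x[1]]

def multiplier(x0):
    """Coefficient lam with grad f(x0) + lam * grad Psi(x0) = 0, obtained by
    projecting grad f(x0) onto the one-dimensional normal space."""
    gf, gp = grad_f(x0), grad_psi(x0)
    return -(gf[0] * gp[0] + gf[1] * gp[1]) / (gp[0] ** 2 + gp[1] ** 2)

x0 = [-1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]   # constrained minimizer
lam = multiplier(x0)
residual = [gf + lam * gp for gf, gp in zip(grad_f(x0), grad_psi(x0))]
print(lam, residual)
```

Here the residual vanishes because $\nabla f(x_0)$ is parallel to $\nabla\Psi(x_0)$; at a point of the circle that is not a critical point of $f$ on $M$ the same projection would leave a nonzero tangential remainder.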

The constants $\lambda_1, \dots, \lambda_m$ are called Lagrange multipliers.

Since $D\Psi(x_0)^* \in L(\mathbb{R}^m,\mathbb{R}^n)$ has maximal rank $m$ and $\dim \operatorname{Range} D\Psi(x_0)^* = \dim N_{x_0}M = m$, the linear transformation

$$
D\Psi(x_0)^* : \mathbb{R}^m \to N_{x_0}M \text{ is an isomorphism,}
\tag{A.24}
$$

and thus, the Lagrange multipliers in (A.23)$_3$ are uniquely determined.

Next we prove Liouville’s theorem, cf. (2.5.61). The system

$$
\dot{x} = f(x), \quad f : \mathbb{R}^n \to \mathbb{R}^n,
\tag{A.25}
$$

where $f$ is a continuously totally differentiable vector field, generates a flow $\varphi(t,z)$, i.e.,

$$
\frac{\partial}{\partial t}\varphi(t,z) = f(\varphi(t,z)), \quad \varphi(0,z) = z \in \mathbb{R}^n.
\tag{A.26}
$$

Differentiation of (A.26) with respect to $z$ (which is possible by the continuously differentiable dependence of the flow on the initial condition) yields ($D = D_z$)

$$
\begin{aligned}
&\frac{\partial}{\partial t} D\varphi(t,z) = Df(\varphi(t,z))\,D\varphi(t,z), \text{ where}\\
&D\varphi(0,z) = E \ (= \text{the identity matrix}).
\end{aligned}
\tag{A.27}
$$

The Jacobian matrix of $\varphi(t,\cdot)$ has the columns

$$
D\varphi(t,z) = (\varphi_{z_1}(t,z) \cdots \varphi_{z_n}(t,z)).
\tag{A.28}
$$



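Since the determinant is linear with respect to each column, differentiation with respect to $t$ gives, in view of (A.27),

$$
\begin{aligned}
\frac{\partial}{\partial t} \det D\varphi(t,z) &= \sum_{i=1}^n \det\Big(\varphi_{z_1}(t,z) \cdots \frac{\partial}{\partial t}\varphi_{z_i}(t,z) \cdots \varphi_{z_n}(t,z)\Big)\\
&= \sum_{i=1}^n \det\big(\varphi_{z_1}(t,z) \cdots Df(\varphi(t,z))\varphi_{z_i}(t,z) \cdots \varphi_{z_n}(t,z)\big)\\
&= \operatorname{trace} Df(\varphi(t,z)) \det\big(\varphi_{z_1}(t,z) \cdots \varphi_{z_i}(t,z) \cdots \varphi_{z_n}(t,z)\big)\\
&= \operatorname{div} f(\varphi(t,z)) \det D\varphi(t,z),
\end{aligned}
\tag{A.29}
$$

where we use a result of linear algebra on the trace of a matrix. The differential equation (A.29) for $\det D\varphi(t,z)$ has the solution

$$
\begin{aligned}
\det D\varphi(t,z) &= \det D\varphi(0,z) \exp \int_0^t \operatorname{div} f(\varphi(s,z))\,ds\\
&= 1 \cdot \exp 0 = 1,
\end{aligned}
\tag{A.30}
$$

where we use (A.27)$_2$ and $\operatorname{div} f(x) = 0$ for all $x \in \mathbb{R}^n$. Let $\Omega$ be a measurable set in $\mathbb{R}^n$. Then the change-of-variable formula for Lebesgue integrals over measurable sets yields

$$
\mu(\varphi(t,\Omega)) = \int_{\varphi(t,\Omega)} 1\,dz = \int_\Omega |\det D\varphi(t,z)|\,dz = \int_\Omega 1\,dz = \mu(\Omega),
\tag{A.31}
$$

which is Liouville’s theorem (2.5.61). If $\operatorname{div} f(x)$ does not vanish, then formulas (A.30) and (A.31) describe how a flow changes a volume.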
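The statement $\det D\varphi(t,z) = 1$ for a divergence-free field can be observed numerically by integrating (A.26) together with the variational equation (A.27). The sketch below assumes the pendulum field $f(x) = (x_2, -\sin x_1)$ in $\mathbb{R}^2$, which has $\operatorname{div} f = 0$, and a classical Runge-Kutta scheme; the field, the initial point, and all function names are assumptions for illustration.

```python
import math

def rhs(state):
    """Right-hand side of (A.26) and (A.27) for the assumed pendulum field.

    state = (x1, x2, j11, j12, j21, j22), where J = D phi(t, z).
    Df = [[0, 1], [-cos(x1), 0]], hence trace Df = div f = 0.
    """
    x1, x2, j11, j12, j21, j22 = state
    c = -math.cos(x1)
    return (x2, -math.sin(x1),
            j21, j22,             # first row of Df * J
            c * j11, c * j12)     # second row of Df * J

def rk4_step(state, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    def shifted(s, k, factor):
        return tuple(si + factor * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, 0.5 * h))
    k3 = rhs(shifted(state, k2, 0.5 * h))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def jacobian_determinant(z, t_end=2.0, steps=2000):
    """Approximate det D phi(t_end, z), starting from D phi(0, z) = E."""
    state = (z[0], z[1], 1.0, 0.0, 0.0, 1.0)
    h = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    _, _, j11, j12, j21, j22 = state
    return j11 * j22 - j12 * j21

det = jacobian_determinant((1.0, 0.5))
print(det)
```

The computed determinant stays at 1 up to the discretization error of the integrator, in line with (A.30).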

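Next we prove the sequential weak compactness in a Hilbert space, cf. (3.1.13).

Let $(y_n)_{n \in \mathbb{N}}$ be a bounded sequence in a Hilbert space $X$, i.e., $\|y_n\| \le C$ for all $n \in \mathbb{N}$. Then the numbers

$$
\alpha_{nm} = (y_n, y_m) \in \mathbb{R}, \quad n, m \in \mathbb{N},
\tag{A.32}
$$

are bounded in $\mathbb{R}$: $|\alpha_{nm}| \le C^2$ for all $n, m \in \mathbb{N}$. By the Bolzano-Weierstraß theorem one can select subsequences of each previous sequence such that

$$
\begin{aligned}
&(\alpha_{n_k^1,m})_{k \in \mathbb{N}} \text{ converges for } m = 1,\\
&(\alpha_{n_k^2,m})_{k \in \mathbb{N}} \text{ converges for } m = 1, 2, \text{ and so on,}\\
&(\alpha_{n_k^i,m})_{k \in \mathbb{N}} \text{ converges for } m = 1, \dots, i.
\end{aligned}
\tag{A.33}
$$

This construction yields a diagonal sequence

$$
(\alpha_{n_k^k,m})_{k \in \mathbb{N}} \text{ which converges for all } m \in \mathbb{N}.
\tag{A.34}
$$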



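With the notation $\alpha_{n_k^k,m} = \alpha_{km}$, we obtain a sequence such that

$$
\lim_{k \to \infty} \alpha_{km} = \lim_{k \to \infty} (y_k, y_m) \text{ exists in } \mathbb{R} \text{ for each } m \in \mathbb{N}.
\tag{A.35}
$$

Note that $(y_k)_{k \in \mathbb{N}}$ is a subsequence of the original sequence $(y_n)_{n \in \mathbb{N}}$.

Let $U := \operatorname{cl}_X(\operatorname{span}[y_n, n \in \mathbb{N}])$, the closure of the subspace of $X$ spanned by the vectors $\{y_1, y_2, \dots\}$. Then for any $u \in U$ and $\varepsilon > 0$, there are some $N \in \mathbb{N}$ and some $\tilde{y} \in \operatorname{span}[y_1, \dots, y_N] = U_N$ such that

$$
\|u - \tilde{y}\| < \frac{\varepsilon}{4C},
\tag{A.36}
$$

where $C$ bounds the sequence $\|y_n\|$. Then for the subsequence $(y_k)_{k \in \mathbb{N}}$

$$
\begin{aligned}
|(y_k,u) - (y_l,u)| &= |(y_k - y_l, u)|\\
&\le |(y_k - y_l, \tilde{y})| + |(y_k, u - \tilde{y})| + |(y_l, u - \tilde{y})|\\
&\le |(y_k - y_l, \tilde{y})| + 2C\frac{\varepsilon}{4C}.
\end{aligned}
\tag{A.37}
$$

By (A.35), the sequence $((y_k, \tilde{y}))_{k \in \mathbb{N}}$ is convergent, and hence,

$$
|(y_k - y_l, \tilde{y})| < \frac{\varepsilon}{2} \quad \text{for } k, l \ge k_0(\varepsilon, \tilde{y}).
\tag{A.38}
$$

The arguments thus far demonstrate that

$$
((y_k,u))_{k \in \mathbb{N}} \text{ is a Cauchy sequence for each } u \in U,
\tag{A.39}
$$

and therefore it is a convergent sequence in $\mathbb{R}$.

The Hilbert space is decomposed as

$$
X = U \oplus U^\perp,
\tag{A.40}
$$

where $U^\perp$ is the orthogonal complement of the closed subspace $U$, cf. (A.45)–(A.48) below. If $y = u + w$, $u \in U$, $w \in U^\perp$, then apparently, since $y_k \in U$,

$$
(y_k,y) = (y_k,u) \quad \text{for all } k \in \mathbb{N}.
\tag{A.41}
$$

By (A.39)

$$
\lim_{k \to \infty} (y_k,y) = \ell(y) \in \mathbb{R} \quad \text{for each } y \in X.
\tag{A.42}
$$

The above-defined transformation $\ell : X \to \mathbb{R}$ is linear, and by $|(y_k,y)| \le C\|y\|$ for all $k \in \mathbb{N}$, the estimate

$$
|\ell(y)| \le C\|y\|
\tag{A.43}
$$

holds, proving continuity of $\ell$ or $\ell \in X'$. By Riesz’ representation theorem (3.1.9), there exists a unique $z \in X$ such that $\ell(y) = (y,z) = (z,y)$ for all $y \in X$. Finally, by (A.42),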



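$$
\begin{aligned}
&\lim_{k \to \infty} (y_k,y) = (z,y) \quad \text{for all } y \in X, \text{ or}\\
&\operatorname{w-lim}_{k \to \infty} y_k = z.
\end{aligned}
\tag{A.44}
$$

We have proven that the bounded sequence $(y_n)_{n \in \mathbb{N}}$ contains a weakly convergent subsequence $(y_k)_{k \in \mathbb{N}}$.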

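We now establish the decomposition (A.40) via Riesz’ representation theorem: The orthogonal complement

$$
U^\perp = \{w \in X \mid (w,u) = 0 \text{ for all } u \in U\}
\tag{A.45}
$$

is a closed subspace of $X$ satisfying $U \cap U^\perp = \{0\}$. We define on $U \subset X$

$$
\ell : U \to \mathbb{R} \text{ by } \ell(u) = (u,y), \text{ where } y \in X \text{ is arbitrary.}
\tag{A.46}
$$

Then $\ell \in U'$, and since the closed subspace $U \subset X$ is a Hilbert space as well, there exists a unique $z \in U$ such that

$$
\begin{aligned}
&\ell(u) = (u,z), \text{ or}\\
&(u,y) = (u,z) \quad \text{for all } u \in U.
\end{aligned}
\tag{A.47}
$$

Then this vector $z \in U$ yields the decomposition

$$
\begin{aligned}
&y = z + y - z, \text{ satisfying}\\
&(y - z, u) = 0 \quad \text{for all } u \in U, \text{ meaning}\\
&y - z \in U^\perp.
\end{aligned}
\tag{A.48}
$$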

Finally we prove the Arzelà-Ascoli Theorem, cf. Proposition 3.2.2.

Let $S = \{x_m \mid m \in \mathbb{N}\} \subset [a,b]$ be a countable dense subset, for instance the rational numbers. Then the numbers

$$
\alpha_{nm} = y_n(x_m) \in \mathbb{R}, \quad n, m \in \mathbb{N},
\tag{A.49}
$$

are bounded in $\mathbb{R}$, i.e., $|\alpha_{nm}| \le C$ for all $n, m \in \mathbb{N}$. By the arguments in (A.32)–(A.35), there exists a subsequence $(y_k)_{k \in \mathbb{N}}$ of the sequence $(y_n)_{n \in \mathbb{N}}$ such that

$$
\lim_{k \to \infty} y_k(x_m) \text{ exists in } \mathbb{R} \text{ for each } m \in \mathbb{N}.
\tag{A.50}
$$

For any $x \in [a,b]$ and for any $\varepsilon > 0$ there is an $x_m \in S$ such that

$$
|x - x_m| < \delta\Big(\frac{\varepsilon}{3}\Big),
\tag{A.51}
$$

where $\delta$ is taken from the equicontinuity (3.2.23). Then the following estimates hold:



Solutions of the Exercises

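0.1 $\|B - A\| = \|x(t_b) - x(t_a)\| = \big\|\int_{t_a}^{t_b} \dot{x}(t)\,dt\big\| \le \int_{t_a}^{t_b} \|\dot{x}(t)\|\,dt$.

0.2 Choose the coordinates $A = (0,y_1)$, $P = (x,0)$, $B = (x_2,y_2)$. Then the lengths of the line segments are: $AP = (x^2 + y_1^2)^{1/2}$, $PB = ((x_2 - x)^2 + y_2^2)^{1/2}$. The running times are:

$$
T_1 + T_2 = \frac{1}{v_1}(x^2 + y_1^2)^{1/2} + \frac{1}{v_2}((x_2 - x)^2 + y_2^2)^{1/2}.
$$

Differentiation with respect to $x$ gives

$$
\frac{x}{v_1(x^2 + y_1^2)^{1/2}} - \frac{x_2 - x}{v_2((x_2 - x)^2 + y_2^2)^{1/2}} = 0, \quad \text{or} \quad \frac{\sin\alpha_1}{v_1} = \frac{\sin\alpha_2}{v_2}.
$$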
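The stationarity condition of Solution 0.2 can be checked numerically. The sketch below locates the zero of the derivative of $T_1 + T_2$ by bisection and compares $\sin\alpha_1/v_1$ with $\sin\alpha_2/v_2$; the sample values for $y_1$, $x_2$, $y_2$, $v_1$, $v_2$ and the function names are assumptions chosen for illustration.

```python
import math

def travel_time_derivative(x, y1, x2, y2, v1, v2):
    """Derivative of T1 + T2 with respect to the crossing point x."""
    return (x / (v1 * math.hypot(x, y1))
            - (x2 - x) / (v2 * math.hypot(x2 - x, y2)))

def refraction_point(y1, x2, y2, v1, v2, iterations=100):
    """Zero of the derivative in (0, x2), found by bisection.

    For x2 > 0 the derivative is negative at x = 0 and positive at x = x2,
    and it is increasing because the travel time is convex in x.
    """
    lo, hi = 0.0, x2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if travel_time_derivative(mid, y1, x2, y2, v1, v2) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed sample data: A = (0, 1), B = (2, -1), speeds v1 = 1 and v2 = 0.5.
y1, x2, y2, v1, v2 = 1.0, 2.0, -1.0, 1.0, 0.5
x = refraction_point(y1, x2, y2, v1, v2)
sin_alpha1 = x / math.hypot(x, y1)
sin_alpha2 = (x2 - x) / math.hypot(x2 - x, y2)
print(x, sin_alpha1 / v1, sin_alpha2 / v2)
```

The two printed ratios agree up to rounding, which is the law of refraction stated above.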

1.1.1 The function $y$ is continuous on $[0,1]$, but it has corners in all points $\frac{1}{n}$. Therefore there is no finite partition of $[0,1]$ for which $y$ fulfills Definition 1.1.1.

Graph of the Function of Exercise 1.1.1

For $n > m$

$$
y_m(x) - y_n(x) =
\begin{cases}
y(x) & \text{for } x \in \big[\frac{1}{2n+1}, \frac{1}{2m+1}\big],\\
0 & \text{for } x \in [0,1] \setminus \big[\frac{1}{2n+1}, \frac{1}{2m+1}\big],
\end{cases}
$$

and therefore $\|y_m - y_n\|_{1,pw,[0,1]} \le \frac{1}{(2m+2)^3} + \frac{2m+3}{(2m+2)^2} < \varepsilon$ for $n > m \ge n_0(\varepsilon)$. Assume that there is some limit $y_0 \in C^{1,pw}[0,1]$. Then, according to Definition 1.1.2, $\lim_{n \to \infty} \|y_n - y_0\|_{0,[0,1]} = 0$. Since $(y_n - y)(x) = -y(x)$ for $x \in [0, \frac{1}{2n+1}]$ and $(y_n - y)(x) = 0$ for $x \in [\frac{1}{2n+1}, 1]$, we obtain $\|y_n - y\|_{0,[0,1]} = \frac{1}{(2n)^3}$, and therefore $\lim_{n \to \infty} y_n = y$ in $C[0,1]$. Uniqueness of the limit in $C[0,1]$ implies $y_0 = y \notin C^{1,pw}[0,1]$, which is a contradiction.

1.1.2 We show the continuity in some $y_0 \in C^{1,pw}[a,b]$. By Definition 1.1.1, the functions $y_0$ and $y_0'$ are bounded on $[a,b]$; the latter is bounded on all intervals $[x_{i-1},x_i]$, $i = 1, \dots, m$. Therefore their respective ranges are compact intervals $[-c,c]$ and $[-c',c']$. The continuous function $F$ is uniformly continuous on the compact set $[a,b] \times [-c,c] \times [-c',c']$. Hence for $y \in C^{1,pw}[a,b]$,

$$
|F(x,y(x),y'(x)) - F(x,y_0(x),y_0'(x))| < \varepsilon \quad \text{for all } x \in [a,b], \quad \text{if } \|y - y_0\|_{1,pw,[a,b]} < \delta(\varepsilon).
$$

Then, by definition 1.1.3,

$$
|J(y) - J(y_0)| < \varepsilon(b - a) \quad \text{if } \|y - y_0\|_{1,pw} < \delta(\varepsilon).
$$

Observe that $(x,y_0(x),y_0'(x)) \in [a,b] \times [-c,c] \times [-c',c']$ if $c$ and $c'$ are large enough and $\delta = \delta(\varepsilon)$ is small enough.

1.2.1 We prove estimate (1.2.17), which entails continuity. The functions $y$ and $h$ have w.l.o.g. the same partition, and for $x \in [x_{i-1},x_i]$, $(x,y(x),y'(x)) \in [x_{i-1},x_i] \times [-c,c] \times [-c',c']$. Hence, due to continuity of $F_y$ and of $F_{y'}$,

\[ |F_y(x,y(x),y'(x))| \le C, \quad |F_{y'}(x,y(x),y'(x))| \le C' \quad\text{for all } x \in [x_{i-1},x_i],\ i = 1,\dots,m, \]

with constants $C = C(y)$ and $C' = C'(y)$. We then obtain

\[
\begin{aligned}
|\delta J(y)h| &= \Big| \sum_{i=1}^{m} \int_{x_{i-1}}^{x_i} F_y(x,y(x),y'(x))h(x) + F_{y'}(x,y(x),y'(x))h'(x)\,dx \Big| \\
&\le \sum_{i=1}^{m} \int_{x_{i-1}}^{x_i} |F_y(x,y(x),y'(x))||h(x)| + |F_{y'}(x,y(x),y'(x))||h'(x)|\,dx \\
&\le \sum_{i=1}^{m} (x_i - x_{i-1})\big(C\|h\|_{0,[a,b]} + C'\|h'\|_{0,[x_{i-1},x_i]}\big) \\
&\le (b-a)\max\{C,C'\}\|h\|_{1,\mathrm{pw}} = C(y)\|h\|_{1,\mathrm{pw}}.
\end{aligned}
\]



1.2.2 Follow the proof of Proposition 1.2.1, and show that the second derivative with respect to $t$ and integration can be interchanged. Then the formula for the second variation follows from the second derivative of $F(x,y+th,y'+th')$ with respect to $t$ in $t = 0$.

1.2.3 The proof is essentially the same as for Proposition 1.2.1, cf. Exercise 1.2.1.

1.4.1 Let $[x_0 - \frac{1}{n}, x_0 + \frac{1}{n}] \subset I$ for $n \ge n_0$. Then for the saw tooth $h_n$ of height 1, properties a), b), c) hold.

[Figure: A Saw Tooth]

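1.4.2 According to the hypotheses and Lemma 1.3.3, an integration by parts is allowed:

\[
\begin{aligned}
\delta^2 J(y)(h,h) &= \int_a^b F_{yy}h^2 + 2F_{yy'}hh' + F_{y'y'}(h')^2\,dx \\
&= \int_a^b F_{yy}h^2 - \frac{d}{dx}(F_{yy'}h)h + F_{yy'}hh' + F_{y'y'}(h')^2\,dx \\
&= \int_a^b \Big(F_{yy} - \frac{d}{dx}F_{yy'}\Big)h^2 + F_{y'y'}(h')^2\,dx.
\end{aligned}
\]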

1.4.3 Assume that $F_{y'y'}(x_0,y(x_0),y'(x_0)) < 0$ for some $x_0 \in [x_{i-1},x_i] \subset [a,b]$. By continuity of $F_{y'y'}(\cdot,y,y')$ on $[x_{i-1},x_i]$, there is a compact interval $I \subset [x_{i-1},x_i] \cap (a,b)$ such that $F_{y'y'}(x,y(x),y'(x)) \le -c_1 < 0$ for all $x \in I$. W.l.o.g. we can assume that $P$, defined in Exercise 1.4.2, is continuous on $I$ as well. Hence, $P(x,y(x),y'(x)) \le c_2$ on $I$. Then by (1.4.11), using a sequence $(h_n)_{n\in\mathbb{N}}$ described in Exercise 1.4.1,

\[
\begin{aligned}
0 \le \int_a^b P h_n^2 + Q(h_n')^2\,dx &\le c_2 \int_I h_n^2\,dx + \int_I F_{y'y'}(h_n')^2\,dx \\
&\le c_2 \int_a^b h_n^2\,dx - c_1 \int_a^b (h_n')^2\,dx < 0 \quad\text{for } n \ge n_1 \ge n_0,
\end{aligned}
\]

due to the fact that the first term converges to 0 and the second term goes to $-\infty$. This contradiction proves that the assumption is wrong.




1.4.4 Define $g(t) = J(y+th)$, where $h \in C_0^{1,\mathrm{pw}}[a,b]$ is such that $y + th \in D$ for all $t \in \mathbb{R}$. According to Exercise 1.2.2, $g''(t) = \delta^2 J(y+th)(h,h) \ge 0$ for all $t \in \mathbb{R}$, by assumption. The formula given in the hint (easy to verify) gives

\[ J(y+h) = J(y) + \delta J(y)h + \int_0^1 (1-t)\,\delta^2 J(y+th)(h,h)\,dt \ge J(y) \quad\text{for all } h \in C_0^{1,\mathrm{pw}}[a,b]. \]

For an arbitrary $\tilde{y} \in D$, we set $\tilde{y} = y + \tilde{y} - y = y + h$, where $h = \tilde{y} - y \in C_0^{1,\mathrm{pw}}[a,b]$.

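1.4.5 We define $\tilde{h} = h/\|h\|$, where $\|h\| = \|h\|_{1,\mathrm{pw}}$, and $g(t) = J(y + t\tilde{h})$. Then by Taylor's theorem

\[ g(t) = g(0) + g'(0)t + \tfrac{1}{2}g''(0)t^2 + r(t), \]

and we have

\[ J(y + t\tilde{h}) = J(y) + \delta J(y)t\tilde{h} + \tfrac{1}{2}\delta^2 J(y)(t\tilde{h},t\tilde{h}) + R(y,\tilde{h};t). \]

By the estimates (1.2.17) and from Exercise 1.2.3, while using the uniform continuity of $F$ on compact sets, we obtain

\[ \lim_{t\to 0} R(y,\tilde{h};t) = 0 \quad\text{uniformly for } \|\tilde{h}\| = 1. \]

In the sequel, the function $y \in D$ is arbitrary but fixed. Furthermore

\[
\begin{aligned}
\frac{d}{dt}R(y,\tilde{h};t) &= \delta J(y+t\tilde{h})\tilde{h} - \delta J(y)\tilde{h} - t\,\delta^2 J(y)(\tilde{h},\tilde{h}), \\
\frac{d^2}{dt^2}R(y,\tilde{h};t) &= \delta^2 J(y+t\tilde{h})(\tilde{h},\tilde{h}) - \delta^2 J(y)(\tilde{h},\tilde{h}).
\end{aligned}
\]

Using the same estimates for $\delta J(y+t\tilde{h})$ and $\delta^2 J(y+t\tilde{h})$, and the uniform continuity of $F_y, F_{y'}, F_{yy}, F_{yy'}, F_{y'y'}$ on compact sets, we obtain

\[ \lim_{t\to 0} \frac{d}{dt}R(y,\tilde{h};t) = 0 \quad\text{and}\quad \lim_{t\to 0} \frac{d^2}{dt^2}R(y,\tilde{h};t) = 0 \quad\text{uniformly for } \|\tilde{h}\| = 1. \]

Since $R(y,\tilde{h};0) = 0$ and $\frac{d}{dt}R(y,\tilde{h};0) = 0$, the mean value theorem yields

\[ R(y,\tilde{h};t)/t^2 = \frac{1}{t}\frac{d}{dt}R(y,\tilde{h};\tau) = \frac{\tau}{t}\,\frac{d}{dt}R(y,\tilde{h};\tau)/\tau = \frac{\tau}{t}\,\frac{d^2}{dt^2}R(y,\tilde{h};\sigma), \]

where $0 < |\sigma| < |\tau| < |t|$. By the above result,

\[ \lim_{t\to 0} R(y,\tilde{h};t)/t^2 = 0 \quad\text{uniformly for } \|\tilde{h}\| = 1. \]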




Setting $t = \|h\|$ gives $t\tilde{h} = h$ for $\tilde{h} = h/\|h\|$, $R(y,h) = R(y,\tilde{h};\|h\|)$, and finally

\[ \lim_{\|h\|\to 0} R(y,h)/\|h\|^2 = \lim_{\|h\|\to 0} R(y,h/\|h\|;\|h\|)/\|h\|^2 = 0. \]

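1.4.6 For arbitrary $\tilde{y} \in D$, we set $\tilde{y} = y + \tilde{y} - y = y + h$, where $h \in C_0^{1,\mathrm{pw}}[a,b]$. Then by Exercise 1.4.5,

\[ J(\tilde{y}) = J(y) + \delta J(y)h + \tfrac{1}{2}\delta^2 J(y)(h,h) + R(y,h), \]

and thus,

\[
\begin{aligned}
J(\tilde{y}) - J(y) &\ge \tfrac{1}{2}C\|h\|_{1,\mathrm{pw}}^2 - |R(y,h)| \\
&= \big(\tfrac{1}{2}C - |R(y,h)|/\|h\|_{1,\mathrm{pw}}^2\big)\|h\|_{1,\mathrm{pw}}^2 > 0,
\end{aligned}
\]

provided $|R(y,h)|/\|h\|_{1,\mathrm{pw}}^2 < \frac{1}{2}C$, which, due to Exercise 1.4.5, holds for $0 < \|h\|_{1,\mathrm{pw}} < d$. According to Definition 1.4.1, the function $y$ is a local minimizer of $J$.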

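1.4.7

a) $J(0) = 0$ and $J(y) = \int_0^1 (y')^2(1+y')\,dx \ge 0$ provided $\|y\|_{1,\mathrm{pw}} < 1$.

b)

\[
\begin{aligned}
J(y_{n,b}) &= \frac{1}{bn^2}\Big(1 + \frac{1}{bn}\Big) + \frac{1}{n^2(1-b)}\Big(1 - \frac{1}{n(1-b)}\Big) \\
&= \frac{1}{n^2}\Big(\frac{1}{b(1-b)} + \frac{1}{b^2 n} - \frac{1}{n(1-b)^2}\Big) = \frac{1}{n^2 b(1-b)}\Big(1 + \frac{1-2b}{b(1-b)n}\Big) < 0
\end{aligned}
\]

if $\frac{1}{2} < b < 1$ and $\frac{b(1-b)}{2b-1} < \frac{1}{n}$.

This holds for $b = b_n$ close to 1. Finally, $\|y_{n,b}\|_0 = \frac{1}{n} < d$ for $n \ge n_0$.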
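The algebra in part b) can be cross-checked numerically. The sketch below is an illustration under an assumption: it takes $y_{n,b}$ to be the tent function of height $\frac{1}{n}$ with its corner at $x = b$ (slopes $\frac{1}{bn}$ and $-\frac{1}{(1-b)n}$), which is what the displayed formula suggests; the exercise statement itself is not reproduced here. The function names are hypothetical.

```python
def J_direct(n, b):
    """J(y) = integral of (y')^2 (1 + y') for the assumed tent function y_{n,b}."""
    s1 = 1.0 / (b * n)            # slope on [0, b]
    s2 = -1.0 / ((1.0 - b) * n)   # slope on [b, 1]
    return b * s1**2 * (1.0 + s1) + (1.0 - b) * s2**2 * (1.0 + s2)

def J_closed(n, b):
    """Closed form from the last step of the displayed computation."""
    return (1.0 + (1.0 - 2.0 * b) / (b * (1.0 - b) * n)) / (n**2 * b * (1.0 - b))

n, b = 10, 0.95                   # b(1-b)/(2b-1) is about 0.053 < 1/n = 0.1
negative_value = J_closed(n, b)
mismatch = abs(J_direct(n, b) - negative_value)
symmetric_value = J_closed(n, 0.5)   # b = 1/2 violates the condition
print(negative_value, mismatch, symmetric_value)
```

With these sample values the functional is negative although $\|y_{n,b}\|_0 = \frac{1}{n}$ is small, while the symmetric tent ($b = \frac12$) gives a positive value.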

1.4.8 All solutions of the Euler-Lagrange equation are piecewise straight lines having slopes $y' = \pm\sqrt{c_1}$, $c_1 \ge 0$, cf. (1.4.7), (1.4.8). By $\delta^2 J(y)(h,h) = 6\int_0^1 y'(h')^2\,dx$ (cf. Exercise 1.2.2), the necessary condition (1.4.11) for a local minimizer is only fulfilled for $y' \ge 0$ on $[0,1]$. The boundary conditions allow only $y' = 0$, and therefore, $y = 0$. The analogous argument for $-J$ shows that the only possible local maximizer is $y = 0$. Let

\[ y_0(x) = \begin{cases} 3x & \text{for } x \in [0,\tfrac{1}{3}], \\ 1 - \tfrac{3}{2}(x - \tfrac{1}{3}) & \text{for } x \in [\tfrac{1}{3},1]. \end{cases} \]

Then $y_0 \in D$ and $J(y_0) = 27/4$. Since $J(\alpha y_0) = \alpha^3 J(y_0)$, the range of the functional $J$ on $D$ is $\mathbb{R}$. Furthermore, $\|\alpha y_0\|_{1,\mathrm{pw},[0,1]} = |\alpha|\|y_0\|_{1,\mathrm{pw}} = 4|\alpha| < d$ for $|\alpha| < d/4$, and the functional $J$ is positive and negative in any neighbourhood of $y = 0$. Since $J(0) = 0$, the function $y = 0$ is neither a local minimizer nor a local maximizer of $J$ on $D$.
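The value $J(y_0) = 27/4$ and the norm $\|y_0\|_{1,\mathrm{pw}} = 4$ can be reproduced with exact rational arithmetic. This is a small sketch that assumes $J(y) = \int_0^1 (y')^3\,dx$, which is consistent with the second variation and the cubic scaling quoted above but is not restated in this solution; the variable names are illustrative.

```python
from fractions import Fraction as Fr

# Pieces of y0 as (interval length, slope): slope 3 on [0, 1/3], slope -3/2 on [1/3, 1].
pieces = [(Fr(1, 3), Fr(3)), (Fr(2, 3), Fr(-3, 2))]

# Assumed functional: J(y) = integral of (y')^3 over [0, 1].
J_y0 = sum(length * slope**3 for length, slope in pieces)

# y0 starts at 0; accumulate the corner values to get y0(1) and max |y0|.
values = [Fr(0)]
for length, slope in pieces:
    values.append(values[-1] + length * slope)
end_value = values[-1]

# ||y0||_{1,pw} = max |y0| + max |y0'| (extrema of a piecewise linear function sit at corners).
norm_1pw = max(abs(v) for v in values) + max(abs(s) for _, s in pieces)

print(J_y0, end_value, norm_1pw)
```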




1.5.1 By assumption, the function $f = F_{y'}(\cdot,y,y')$ is continuously differentiable on the interval $[x_{i-1},x_i]$. We set $G(x,z) = F_{y'}(x,y(x),z) - f(x)$, and then $G : [x_{i-1},x_i] \times \mathbb{R} \to \mathbb{R}$ is continuously partially differentiable with respect to both variables, by assumption on $F$ and because $y \in C^1[x_{i-1},x_i]$. We know that $G(x_0,y'(x_0)) = 0$ and $G_z(x_0,y'(x_0)) = F_{y'y'}(x_0,y(x_0),y'(x_0)) \ne 0$, by assumption. The implicit function theorem then guarantees the existence of a unique continuously differentiable function $z(x)$ that fulfills the equation $G(x,z(x)) = 0$ in a neighborhood of $x_0$ and $z(x_0) = y'(x_0)$. By uniqueness, $z(x) = y'(x)$ since $G(x,y'(x)) = 0$ as well. Continuous differentiability of $z = y'$ implies that $y$ is twice continuously differentiable in a neighborhood of $x_0$.

1.5.2

a) $J(y) = \int_0^1 y'\,dx = y(1) - y(0) = 1$ for all admitted functions $y \in D$. Any admitted function satisfies the Euler-Lagrange equation.

b) $J(y) = \frac{1}{2}\int_0^1 \frac{d}{dx}(y^2)\,dx = \frac{1}{2}\big((y(1))^2 - (y(0))^2\big) = \frac{1}{2}$ for all admitted functions, and each of them fulfills the Euler-Lagrange equation.

c) $J(y) = \frac{1}{2}\int_0^1 x\,\frac{d}{dx}(y^2)\,dx = -\frac{1}{2}\int_0^1 y^2\,dx + \frac{1}{2}$ for all admitted functions. Choosing the admitted functions $y_n(x) = x^n$, we obtain $\int_0^1 y_n^2\,dx = \frac{1}{2n+1}$. Hence the supremum of $J$ is $\frac{1}{2}$. The supremum is not a maximum, there is no minimizer, and no admitted function satisfies the Euler-Lagrange equation.
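For part c), the values $J(y_n)$ along the sequence $y_n(x) = x^n$ can be tabulated exactly. The sketch below uses only the two formulas stated in the solution; the helper names are hypothetical. The direct evaluation uses $\frac12 x\frac{d}{dx}(y_n^2) = n x^{2n}$.

```python
from fractions import Fraction as Fr

def J_via_parts(n):
    """J(y_n) = 1/2 - (1/2) * integral of y_n^2, with integral of x^(2n) = 1/(2n+1)."""
    return Fr(1, 2) - Fr(1, 2) * Fr(1, 2 * n + 1)

def J_direct(n):
    """J(y_n) = integral of n * x^(2n) over [0, 1] = n / (2n + 1)."""
    return Fr(n, 2 * n + 1)

values = [J_via_parts(n) for n in range(1, 8)]
print([str(v) for v in values])
```

The values increase strictly toward $\frac12$ without reaching it, which illustrates why the supremum is not attained along this sequence.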

1.5.3

a) $2\frac{d}{dx}y' = 2$, or $y'' = 1$. Only the solution $y(x) = \frac{1}{2}(x^2 + x)$ fulfills the boundary conditions. Since $\delta^2 J(y)(h,h) = 2\int_0^1 (h')^2\,dx \ge 0$ for all admitted functions $y$, the solution is a global minimizer, cf. Exercise 1.4.4.

b) $2\frac{d}{dx}(y' + y) = 2y'$, or $y'' = 0$. Only the solution $y(x) = -\frac{1}{3}x + \frac{2}{3}$ fulfills the boundary conditions. Since $\delta^2 J(y)(h,h) = 2\int_{-1}^{2} (h')^2 + 2hh'\,dx = 2\int_{-1}^{2} (h')^2 + \frac{d}{dx}(h^2)\,dx = 2\int_{-1}^{2} (h')^2\,dx \ge 0$ for all admitted functions $y$ and for all $h \in C_0^{1,\mathrm{pw}}[-1,2]$, the solution is a global minimizer.

c) $2\frac{d}{dx}(y' + x) = 0$, or $y'' = -1$. Only the solution $y(x) = \frac{1}{2}(x - x^2)$ fulfills the boundary conditions. Since $\delta^2 J(y)(h,h) = 2\int_0^1 (h')^2\,dx \ge 0$ for all admitted functions $y$, the solution is a global minimizer.

d) $2\frac{d}{dx}(y' + y) = 2(y' + y)$, or $y'' = y$. Only the solution $y(x) = \sinh x/\sinh 2$ fulfills the boundary conditions. Since $\delta^2 J(y)(h,h) = \int_0^2 2(h')^2 + 4hh' + 2h^2\,dx = 2\int_0^2 (h' + h)^2\,dx \ge 0$ for all admitted functions $y$, the solution is a global minimizer.

1.8.1 A multiple application of L'Hospital's rule gives $f(0) = 0$ and $f'(0) = \frac{1}{3}$. The asymptotic behavior $\lim_{\tau\to 2\pi} f(\tau) = +\infty$ is obvious.

We show $f'(\tau) > 0$ for $\tau \in (0,2\pi)$ in different subintervals. We compute the derivative and define

\[ f'(\tau) = \frac{2(1 - \cos\tau) - \tau\sin\tau}{(1 - \cos\tau)^2} =: \frac{g(\tau) - h(\tau)}{(1 - \cos\tau)^2}. \]

First of all, $g'(\tau) - h'(\tau) = \sin\tau - \tau\cos\tau = \cos\tau(\tan\tau - \tau) > 0$ for $\tau \in (0,\frac{\pi}{2})$. Therefore, since $g(0) = h(0) = 0$, $g(\tau) - h(\tau) > 0$ and thus, $f'(\tau) > 0$ for $\tau \in (0,\frac{\pi}{2})$. We see also that $g'(\tau) - h'(\tau) > 0$ for $\tau \in [\frac{\pi}{2},\pi]$, and hence, $g(\tau) - h(\tau) > 0$ and $f'(\tau) > 0$ for $\tau \in [\frac{\pi}{2},\pi]$. Since $g(\tau) > 0$ and $h(\tau) < 0$ for $\tau \in (\pi,2\pi)$, it follows that $g(\tau) - h(\tau) > 0$, and therefore, $f'(\tau) > 0$ for $\tau \in (\pi,2\pi)$.
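A quick numerical sanity check of the monotonicity claim, as a sketch only. It evaluates the displayed formula for $f'$ on a uniform grid in $(0,2\pi)$; the grid size and the names `f_prime`, `grid`, `min_slope` are arbitrary choices, and a grid check is of course no substitute for the proof above.

```python
import math

def f_prime(tau):
    """f'(tau) = (2(1 - cos tau) - tau sin tau) / (1 - cos tau)^2, as displayed above."""
    one_minus_cos = 1.0 - math.cos(tau)
    return (2.0 * one_minus_cos - tau * math.sin(tau)) / one_minus_cos**2

# Interior grid points of (0, 2*pi); the endpoints are excluded because of 0/0 and division by 0.
grid = [2.0 * math.pi * k / 2000 for k in range(1, 2000)]
min_slope = min(f_prime(t) for t in grid)

# Near tau = 0 the slope should be close to the limit f'(0) = 1/3.
slope_near_zero = f_prime(1e-2)
print(min_slope, slope_near_zero)
```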

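1.8.2 The running time on the line $y(x) = \frac{2}{\pi}x$ from $x = 0$ to $x = b$ is, according to (1.8.4),

\[ T = \frac{1}{\sqrt{2g}} \int_0^b \sqrt{\frac{1 + (\frac{2}{\pi})^2}{\frac{2}{\pi}x}}\,dx = \sqrt{\frac{b\pi}{g}\Big(1 + \frac{4}{\pi^2}\Big)}. \]

The running time on the cycloid is $\sqrt{\frac{b\pi}{g}}$; according to (1.8.19), $\tau_b = \pi$ and $r = \frac{B}{2} = \frac{b}{\pi}$. Thus the ratio of the running times is $T : T_{\min} = \sqrt{1 + \frac{4}{\pi^2}}$.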
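To get a feeling for the size of this ratio, the following sketch evaluates the line integral numerically and compares it with the cycloid time. The values $b = 1$ and $g = 9.81$ are arbitrary sample choices, and the substitution $x = u^2$ (used to remove the integrable singularity at $x = 0$) is an implementation detail of this illustration, not something taken from the text.

```python
import math

def line_time(b, g, n=1000):
    """Running time on y = (2/pi) x, midpoint rule after substituting x = u^2."""
    k = 2.0 / math.pi
    du = math.sqrt(b) / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        x = u * u
        # integrand sqrt((1 + k^2) / (k x)) times dx = 2 u du
        total += math.sqrt((1.0 + k * k) / (k * x)) * 2.0 * u * du
    return total / math.sqrt(2.0 * g)

def cycloid_time(b, g):
    """Running time on the cycloid, sqrt(b * pi / g), as stated in the solution."""
    return math.sqrt(b * math.pi / g)

b, g = 1.0, 9.81
ratio = line_time(b, g) / cycloid_time(b, g)
print(ratio)
```

The ratio does not depend on $b$ or $g$; it is roughly 1.19, so the straight line is about 19 percent slower than the cycloid in this configuration.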

1.9.1

a) The Euler-Lagrange equation is $\frac{d}{dx}2y' = 1$ piecewise on $[0,1]$. Since $2y' \in C^{1,\mathrm{pw}}[0,1] \subset C[0,1]$, cf. (1.4.3)$_1$, the solution is given by $2y'(x) = x + c_1$ for all $x \in [0,1]$. Hence $y(x) = \frac{1}{4}x^2 + \frac{1}{2}c_1 x + c_2$, noting that $y \in C[0,1]$ as well.

b) The natural boundary conditions are $y'(0) = 0$ and $y'(1) = 0$. No solution fulfills the natural boundary conditions, because $y'(0) = 0$ implies $y'(1) = \frac{1}{2}$.

c) $y(x) = \frac{1}{4}x^2 + \frac{3}{4}x$ is the only solution.

d) Without boundary conditions, no solution is locally extremal because the natural boundary conditions cannot be fulfilled. With boundary conditions, Proposition 1.4.3 is applicable because the Lagrange function is convex. Another argument is the following: The second variation $\delta^2 J(y)(h,h) = \int_0^1 (h')^2\,dx \ge 0$ for all $y \in D = C^{1,\mathrm{pw}}[0,1] \cap \{y(0) = 0,\ y(1) = 1\}$, and thus, according to Exercise 1.4.4, each solution of the Euler-Lagrange equation in $D$ is a global minimizer.

1.9.2 The Euler-Lagrange equation reads

\[ \frac{d}{dx}2y' = \frac{1}{1 + y^2} \quad\text{piecewise on } [a,b]. \]

The regularity (1.4.3)$_1$ implies $y' \in C[a,b]$. Hence $y \in C^1[a,b]$, and since $F_{y'y'}(y,y') = 2$, Exercise 1.5.1 implies that $y \in C^2(a,b)$. Therefore

\[ 2y'' = \frac{1}{1 + y^2} > 0 \quad\text{on all of } (a,b). \]

The natural boundary conditions on minimizers are $y'(a) = 0$ and $y'(b) = 0$. Rolle's theorem implies the existence of some $x \in (a,b)$ such that $y''(x) = 0$, which, however, is excluded by the Euler-Lagrange equation. Hence, there is no (local or global) minimizer, even though the functional is bounded from below by $-\frac{\pi}{2}(b-a)$.




1.10.1 Since $\dot{x}(t) > 0$ on $[t_{i-1},t_i]$, $x$ is strictly monotone and maps $[t_{i-1},t_i]$ one-to-one onto $[x_{i-1},x_i]$. Since $x(t_a) = a = x(t_0)$ and $x(t_b) = b = x(t_m)$, we obtain a partition $a = x_0 < x_1 < \cdots < x_m = b$ of the interval $[a,b]$. Let $\psi_i : [x_{i-1},x_i] \to [t_{i-1},t_i]$ be the continuously differentiable inverse function of the function $x$, i.e., $\psi_i(x) = t$ if $x(t) = x$. Setting $\tilde{y}(x) = y(\psi_i(x))$, we obtain $\tilde{y}(x) = y(t)$ and $\tilde{y} \in C^1[x_{i-1},x_i]$. Since $y(\psi_i(x_i)) = y(t_i) = y(\psi_{i+1}(x_i))$ for $i = 1,\dots,m-1$, $\tilde{y} \in C[a,b]$, and altogether $\tilde{y} \in C^{1,\mathrm{pw}}[a,b]$.

1.10.2

a) Let $h \in (C_0^{1,\mathrm{pw}}[t_a,t_b])^n$. Then

\[
\begin{aligned}
\delta J(x)h &= \int_{t_a}^{t_b} (DF(x)h,\dot{x}) + (F(x),\dot{h})\,dt \\
&= \int_{t_a}^{t_b} (h,DF(x)^*\dot{x}) - \Big(\frac{d}{dt}F(x),h\Big)\,dt \\
&= \int_{t_a}^{t_b} ((DF(x)^* - DF(x))\dot{x},h)\,dt,
\end{aligned}
\]

where $DF(x)$ is the Jacobian matrix of $F$ in $x$, and $DF(x)^*$ is the transposed matrix.

b) The system of Euler-Lagrange equations reads

\[ (DF(x) - DF(x)^*)\dot{x} = 0 \quad\text{for all } t \in [t_a,t_b], \]

or $\dot{x} \in \mathrm{Kernel}(DF(x) - DF(x)^*)$. That system does not have solutions in any $D$ if the kernel is $\{0\}$.

c) In the case $DF(x) = DF(x)^*$, $\delta J(x) = 0$, and any $x \in D$ solves the Euler-Lagrange equations piecewise. The symmetry of the Jacobian matrix implies that the vector field $F$ possesses a potential $f : \mathbb{R}^n \to \mathbb{R}$, i.e., $F(x) = \nabla f(x)$. Then, for all $x \in D$, by Lemma 1.3.3 with $h \equiv 1$, we have

\[ J(x) = \int_{t_a}^{t_b} (\nabla f(x),\dot{x})\,dt = \int_{t_a}^{t_b} \frac{d}{dt}f(x)\,dt = f(x(t_b)) - f(x(t_a)) = f(B) - f(A), \]

which means that the curvilinear integral is path independent.

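1.10.3 The Euler-Lagrange equations read

\[ \frac{d}{dt}\Phi_{\dot{x}}(x,\dot{x}) = \Phi_x(x,\dot{x}) \quad\text{on } [t_a,t_b]. \]

By the assumed regularity of the local minimizer solving the Euler-Lagrange system, we may differentiate:

\[
\begin{aligned}
\frac{d}{dt}\big(\Phi(x,\dot{x}) - (\dot{x},\Phi_{\dot{x}}(x,\dot{x}))\big)
&= (\Phi_x(x,\dot{x}),\dot{x}) + (\Phi_{\dot{x}}(x,\dot{x}),\ddot{x}) - (\ddot{x},\Phi_{\dot{x}}(x,\dot{x})) - \Big(\dot{x},\frac{d}{dt}\Phi_{\dot{x}}(x,\dot{x})\Big) \\
&= \Big(\Phi_x(x,\dot{x}) - \frac{d}{dt}\Phi_{\dot{x}}(x,\dot{x}),\dot{x}\Big) = 0,
\end{aligned}
\]

which proves the claim.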

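1.10.4 By (1.10.22)$_1$, $L_{\dot{x}}(x,\dot{x}) = m\dot{x} \in (C^1[t_a,t_b])^3$, and hence $x \in (C^2[t_a,t_b])^3$. By the chain rule, we have

\[ \frac{d}{dt}E(x,\dot{x}) = m(\ddot{x},\dot{x}) + (\mathrm{grad}\,V(x),\dot{x}) = 0 \quad\text{for } t \in [t_a,t_b], \]

which proves constant energy.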

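1.10.5 Follow the proof of Proposition 1.2.1, where differentiation and integration are interchanged. This is possible for the second derivative as well and we obtain:

\[
\begin{aligned}
\frac{d}{ds}J(x+sh) &= \int_{t_a}^{t_b} (\Phi_x(t,x+sh,\dot{x}+s\dot{h}),h) + (\Phi_{\dot{x}}(t,x+sh,\dot{x}+s\dot{h}),\dot{h})\,dt, \\
\frac{d^2}{ds^2}J(x+sh)\Big|_{s=0} &= \int_{t_a}^{t_b} (D_x^2\Phi(t,x,\dot{x})h,h) + 2(D_xD_{\dot{x}}\Phi(t,x,\dot{x})\dot{h},h) + (D_{\dot{x}}^2\Phi(t,x,\dot{x})\dot{h},\dot{h})\,dt,
\end{aligned}
\]

with the matrices

\[
\begin{aligned}
D_x^2\Phi(t,x,\dot{x}) &= \Big(\frac{\partial^2}{\partial x_i\,\partial x_j}\Phi(t,x,\dot{x})\Big)_{i,j=1,\dots,n}, \\
D_xD_{\dot{x}}\Phi(t,x,\dot{x}) &= \Big(\frac{\partial^2}{\partial x_i\,\partial \dot{x}_j}\Phi(t,x,\dot{x})\Big)_{i,j=1,\dots,n}, \\
D_{\dot{x}}^2\Phi(t,x,\dot{x}) &= \Big(\frac{\partial^2}{\partial \dot{x}_i\,\partial \dot{x}_j}\Phi(t,x,\dot{x})\Big)_{i,j=1,\dots,n},
\end{aligned}
\]

where $i$ is the row index and $j$ is the column index.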

1.10.6

a) The second variation of the action reads

\[ \delta^2 J(x)(h,h) = \int_{t_a}^{t_b} m\|\dot{h}\|^2 - (D^2V(x)h,h)\,dt. \]

b) The assumption $(D^2V(x)h,h) \le 0$ implies

\[ \delta^2 J(x)(h,h) \ge 0 \quad\text{for all } x \in (C^1[t_a,t_b])^3,\ h \in (C_0^1[t_a,t_b])^3. \]




Let the path $x \in (C^2[t_a,t_b])^3$ satisfy the system (1.10.27), implying

\[ \delta J(x)h = 0 \quad\text{for all } h \in (C_0^1[t_a,t_b])^3. \]

Let a path $\tilde{x} \in (C^1[t_a,t_b])^3$ fulfill the same boundary conditions as $x$, i.e., $\tilde{x} - x = h \in (C_0^1[t_a,t_b])^3$. Setting $g(s) = J(x+sh)$, we obtain

\[
\begin{aligned}
J(\tilde{x}) = g(1) &= g(0) + g'(0) + \int_0^1 (1-s)g''(s)\,ds \\
&= J(x) + \delta J(x)h + \int_0^1 (1-s)\,\delta^2 J(x+sh)(h,h)\,ds \\
&\ge J(x),
\end{aligned}
\]

due to both $\delta J(x)h = 0$ and $\delta^2 J(x+sh)(h,h) \ge 0$ for $s \in [0,1]$.

1.11.1 The curve $\{(x,y(x)) \mid x \in [a,b]\}$ satisfies the Euler-Lagrange equations (1.10.9), for the Lagrange function (1.11.5), in all admitted parametrizations, and thus also when parametrized by $x$. This means

\[ \frac{d}{dx}\big(F(y,y') - y'F_{y'}(y,y')\big) = F_x(y,y') = 0, \]

piecewise on $[a,b]$. Constancy follows by the second Weierstraß-Erdmann corner condition.

1.11.2 The function $W$ has local minima at $z = -1$ and at $z = \frac{1}{2}$, having values $W(-1) = -\frac{1}{3}$ and $W(\frac{1}{2}) = -\frac{5}{96}$. At $z = 0$, the function $W$ has a local maximum. Since $W(-1)$ is a global minimum, a global minimizer of the functional $J$ is given by $y = -x$ and $y' = -1$. All parallel lines $y(x) = -x + c$ are global minimizers, too.

1.11.3 The minimizers found in Exercise 1.11.2 do not satisfy the boundary conditions, the line $y = 0$ is not a minimizer, and thus, global minimizers must have corners. The slopes are inserted in Figure 1.20 and denoted $c_1^1 = \alpha$ and $c_1^2 = \beta$. We study a "prototype" $\hat{y}$ having only one corner, located at $(x_1,y_1)$.

[Figure: Global Minimizers]




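The negative slope is $c_1^1 = \frac{y_1}{x_1 - b}$, and the positive slope is $c_1^2 = \frac{y_1}{x_1 - a}$. The tangent depicted in Figure 1.20 has the following property:

\[ W(z) \ge W(c_1^1) + \frac{W(c_1^2) - W(c_1^1)}{c_1^2 - c_1^1}(z - c_1^1) \quad\text{for all } z \in \mathbb{R}. \]

Let $y \in C^{1,\mathrm{pw}}[a,b] \cap \{y(a) = 0,\ y(b) = 0\}$. Then $\int_a^b y'\,dx = 0$, and therefore,

\[
\begin{aligned}
J(y) = \int_a^b W(y')\,dx &\ge \Big(W(c_1^1) - \frac{W(c_1^2) - W(c_1^1)}{c_1^2 - c_1^1}\,c_1^1\Big)(b-a) \\
&= \Big(W(c_1^1)\frac{c_1^2}{c_1^2 - c_1^1} - W(c_1^2)\frac{c_1^1}{c_1^2 - c_1^1}\Big)(b-a).
\end{aligned}
\]

For the "prototype" $\hat{y}$ with one corner at $(x_1,y_1)$, we obtain

\[
\begin{aligned}
J(\hat{y}) = \int_a^b W(\hat{y}')\,dx &= W(c_1^2)(x_1 - a) + W(c_1^1)(b - x_1) \\
&= \Big(W(c_1^1)\frac{b - x_1}{b-a} - W(c_1^2)\frac{a - x_1}{b-a}\Big)(b-a) \\
&= \Big(W(c_1^1)\frac{c_1^2}{c_1^2 - c_1^1} - W(c_1^2)\frac{c_1^1}{c_1^2 - c_1^1}\Big)(b-a),
\end{aligned}
\]

which proves $\int_a^b W(y')\,dx \ge \int_a^b W(\hat{y}')\,dx$ for all $y \in C^{1,\mathrm{pw}}[a,b] \cap \{y(a) = 0,\ y(b) = 0\}$. The functional $J$ has the same value for all saw-tooth functions having the slopes $c_1^1$ and $c_1^2$, and fulfilling the boundary conditions $y(a) = 0$ and $y(b) = 0$. Therefore they are all global minimizers.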

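2.1.1

\[ \sum_{i=1}^{m} \lambda_i\,\delta K_i(y)h = 0 \ \text{for all } h \in D_0 \quad\Leftrightarrow\quad (\lambda,\delta K(y)h) = 0 \ \text{for all } h \in D_0, \]

where $\lambda = (\lambda_1,\dots,\lambda_m)$, $\delta K(y)h = (\delta K_1(y)h,\dots,\delta K_m(y)h)$, and $(\ ,\ )$ is the scalar product in $\mathbb{R}^m$.

If $\delta K(y)$ is surjective, then $\lambda = 0$. On the other hand, if the scalar product is zero for $\lambda = 0$ only, then $\delta K(y)$ is surjective.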

2.1.2 Assume (i); $\delta K(y)h = 2\int_0^1 yh\,dx$, and $y$ is not critical for $K$, due to $\delta K(y)y = 2$. An application of Proposition 2.1.1 gives (up to the factor 2)

\[ y'' = \lambda y \quad\text{piecewise on } [0,1]. \]

Since $y$ is continuous on $[0,1]$, the minimizer satisfies $y \in C^2[0,1]$ and the Euler-Lagrange equation on all of $[0,1]$. The Lagrange multiplier $\lambda$ is determined as follows:




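\[ \int_0^1 y''y\,dx = -\int_0^1 (y')^2\,dx = \lambda\int_0^1 y^2\,dx = \lambda, \]

and hence $-\lambda = \min\{J(y)\}$ under the constraint $K(y) = 1$. For all $h \in D = C_0^{1,\mathrm{pw}}[0,1]$ with $\alpha^2 = \int_0^1 h^2\,dx > 0$, one obtains $K(h/\alpha) = 1$; hence the inequality $J(h/\alpha) \ge -\lambda$ holds, which proves the Poincaré inequality.

Assume (ii). For $\tilde{y} \in D$ satisfying $K(\tilde{y}) = 1$, the difference $\tilde{y} - y = h \in C_0^{1,\mathrm{pw}}[0,1]$. Then for $\tilde{y} = y + h$,

\[
\begin{aligned}
J(\tilde{y}) &= J(y) + 2\int_0^1 y'h'\,dx + \int_0^1 (h')^2\,dx \\
&= J(y) - 2\int_0^1 y''h\,dx + \int_0^1 (h')^2\,dx \\
&= J(y) - 2\lambda\int_0^1 yh\,dx + \int_0^1 (h')^2\,dx, \\
1 = K(\tilde{y}) &= K(y) + 2\int_0^1 yh\,dx + \int_0^1 h^2\,dx, \quad\text{and thus,} \\
-2\int_0^1 yh\,dx &= \int_0^1 h^2\,dx \quad (\text{since } K(y) = 1).
\end{aligned}
\]

Therefore, by the Poincaré inequality,

\[ J(\tilde{y}) = J(y) + \lambda\int_0^1 h^2\,dx + \int_0^1 (h')^2\,dx \ge J(y), \]

which proves (i).

Explicitly, $y(x) = \sqrt{2}\sin\pi x$ and $\lambda = -\pi^2$. The other candidates $y_n(x) = \sqrt{2}\sin n\pi x$ and $\lambda_n = -n^2\pi^2$ are excluded for $n \ge 2$, due to the observation that the functions $y_n \in C_0^{1,\mathrm{pw}}[0,1]$ do not satisfy the Poincaré inequality with $\lambda_n$ for $n \ge 2$.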

2.1.3 For $m = \pm 1$, the constants $y = \pm 1$ satisfy the constraints and $J(y) = 0$. Therefore they are global minimizers.

Assume $m \ne \pm 1$. According to Exercise 2.1.1, a function $y \in C^{1,\mathrm{pw}}[0,1]$ is not critical for $K = (K_1,K_2)$ if

\[ 2\lambda_1 y + \lambda_2 = 0 \quad\text{on } [0,1] \]

holds for $(\lambda_1,\lambda_2) = (0,0)$ only. Thus, only constant functions $y$ can be critical. However, a constant cannot fulfill both constraints. Therefore, a nonconstant (local) minimizer $y$ must satisfy the Euler-Lagrange equation

\[ 2y'' = 2\lambda_1 y + \lambda_2 \quad\text{piecewise on } [0,1], \]

and the natural boundary conditions

\[ y'(0) = 0 \quad\text{and}\quad y'(1) = 0. \]



From (2.1.5)$_1$ we have $y' \in C[0,1]$. Hence $y \in C^2[0,1]$, and the Euler-Lagrange equation is satisfied on all of $[0,1]$. We obtain

\[ 2\int_0^1 y''\,dx = 0 = 2\lambda_1\int_0^1 y\,dx + \lambda_2 = 2\lambda_1 m + \lambda_2, \quad\text{or}\quad \lambda_2 = -2\lambda_1 m. \]

The Euler-Lagrange equation with the natural boundary conditions admits the (nonconstant) solutions

\[ y_n(x) = a\cos n\pi x + m, \quad \lambda_1 = -n^2\pi^2, \quad n \in \mathbb{N}, \]

which fulfill the constraints for $\frac{1}{2}a^2 + m^2 = 1$. This implies $m^2 < 1$, and in this case $J(y_n) = n^2\pi^2(1 - m^2)$, which is minimal for $n = 1$ only.

If a global minimizer exists for $m^2 < 1$, then it is given by

\[ y_1(x) = \sqrt{2(1 - m^2)}\cos\pi x + m, \quad\text{with } J(y_1) = \pi^2(1 - m^2). \]

In this case, one obtains, for $m = 0$ and for any $h \in C^{1,\mathrm{pw}}[0,1] \cap \{\int_0^1 h\,dx = 0\}$ and $\alpha^2 = \int_0^1 h^2\,dx > 0$,

\[ K_1(h/\alpha) = 1, \quad K_2(h/\alpha) = 0, \quad J(h/\alpha) \ge \pi^2, \]

which implies a Poincaré inequality

\[ \pi^2\int_0^1 h^2\,dx \le \int_0^1 (h')^2\,dx \quad\text{for all } h \in C^{1,\mathrm{pw}}[0,1] \cap \Big\{\int_0^1 h\,dx = 0\Big\}. \]

Note that the existence of a global minimizer of $J$ under the given constraints is assumed.

2.2.1 Let $(x,y) \in (C^1[t_a,t_b])^2$ satisfy $(x(t_a),y(t_a)) = (0,A)$ and $(x(t_b),y(t_b)) = (b,0)$, where $b$ needs to be determined. By (2.2.5), the isoperimetric constraint is

\[ K(x,y) = \frac{1}{2}\int_{t_a}^{t_b} y\dot{x} - x\dot{y}\,dt = S, \]

where it must be observed that the curve runs from $(0,A)$ to $(b,0)$. (The curvilinear integrals along the axes are zero.) The surface of revolution to be minimized is

\[ J(x,y) = 2\pi\int_{t_a}^{t_b} y\sqrt{\dot{x}^2 + \dot{y}^2}\,dt, \]

cf. (1.6.1) and (1.11.5). We omit the factor $2\pi$, and we apply Proposition 2.1.5. Since

\[ \delta K(x,y)(h_1,h_2) = -\int_{t_a}^{t_b} \dot{y}h_1 - \dot{x}h_2\,dt \]




for $(h_1,h_2) \in (C_0^1[t_a,t_b])^2$ (after integration by parts), nonconstant curves $(x,y) \in (C^1[t_a,t_b])^2$ are not critical for the isoperimetric constraint. The Euler-Lagrange equations read

\[
\begin{aligned}
\frac{d}{dt}\Big(\frac{y\dot{x}}{\sqrt{\dot{x}^2 + \dot{y}^2}} + \frac{1}{2}\lambda y\Big) &= -\frac{1}{2}\lambda\dot{y}, \\
\frac{d}{dt}\Big(\frac{y\dot{y}}{\sqrt{\dot{x}^2 + \dot{y}^2}} - \frac{1}{2}\lambda x\Big) &= \sqrt{\dot{x}^2 + \dot{y}^2} + \frac{1}{2}\lambda\dot{x}.
\end{aligned}
\]

Due to the invariance (2.1.37), the Euler-Lagrange equations are invariant with respect to admitted reparametrizations. We parametrize by the arc length and obtain $\dot{x}^2 + \dot{y}^2 = 1$ and $[t_a,t_b] = [0,L]$, cf. (2.6.6). Then the above equations become

\[
\begin{aligned}
&(1)\quad y(\dot{x} + \lambda) = c_1, \\
&(2)\quad y\dot{y} - \lambda x = t + c_2,
\end{aligned}
\]

for $t \in [0,L]$. Since $y(L) = 0$, the constant $c_1 = 0$, and the natural boundary condition $y(L)(\dot{x}(L) + \frac{1}{2}\lambda) = 0$ is satisfied. Since we require $y(t) > 0$ for $t \in [0,L)$, equation (1) implies $\dot{x} + \lambda = 0$. Thus, by $x(0) = 0$, we have

\[ x(t) = -\lambda t. \]

The relation $\dot{x}^2 + \dot{y}^2 = 1$ implies $\dot{y}^2 = 1 - \lambda^2$. Since $y(0) = A$ and $y(L) = 0$, we then have

\[ y(t) = -\sqrt{1 - \lambda^2}\,t + A \quad\text{and}\quad \sqrt{1 - \lambda^2}\,L = A. \]

From equation (2), $c_2 = -A\sqrt{1 - \lambda^2}$, and the constraint yields

\[ \frac{1}{2}\int_0^L y\dot{x} - x\dot{y}\,dt = -\frac{1}{2}\lambda AL = S. \]

The curve is a straight line, which begins at $(0,A)$ and meets the $x$-axis at

\[ b = x(L) = -\lambda L = \frac{2S}{A}. \]

This straight line segment is the only curve that satisfies all necessary conditions on a minimizer, and therefore it is the minimizer, provided it exists.

2.4.1 The curve $\{(x,y(x)) \mid x \in [a,b]\}$ satisfies the Euler-Lagrange equation (2.1.35), regardless of its (admitted) parametrization, cf. (1.11.1)–(1.11.5). Since $F$ and $G$ do not depend explicitly on $x$, the derivatives with respect to $x$ vanish: $\Phi_x = \Psi_x = 0$. The first equation of (2.1.35)$_2$ yields, when parametrized by $x$,

\[ \frac{d}{dx}\big(F(y,y') + \lambda G(y,y') - y'(F_{y'}(y,y') + \lambda G_{y'}(y,y'))\big) = 0, \]

piecewise on $[a,b]$, and the continuity (2.4.5)$_2$ implies (2.4.6).




2.5.1 The trajectory fulfills the system (2.5.32). Multiplying the three equations by $\dot{x}_k$, $\dot{y}_k$, $\dot{z}_k$, respectively, and summing up yield

\[ \frac{d}{dt}\sum_{k=1}^{N} \frac{1}{2}m_k(\dot{x}_k^2 + \dot{y}_k^2 + \dot{z}_k^2) + \frac{d}{dt}V(x) = \sum_{i=1}^{m} \lambda_i\frac{d}{dt}\Psi_i(x) = 0, \]

due to $\Psi_i(x(t)) = 0$, $i = 1,\dots,m$, on $[t_a,t_b]$.

This proves the constancy of the total energy along the trajectory.

2.5.2 For $T = \frac{1}{2}m(\dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2)$, $V = mgx_3$, and $\Psi(x_1,x_2,x_3) = x_1 + x_3 - 1 = 0$ the system (2.5.32) becomes

\[
\begin{aligned}
m\ddot{x}_1 &= \lambda, \\
m\ddot{x}_2 &= 0, \\
m\ddot{x}_3 &= -mg + \lambda.
\end{aligned}
\]

Subtracting the third equation from the first one and integrating twice yields $(x_1 - x_3)(t)$. Using $x_1 + x_3 = 1$ and the initial conditions $x_1(0) = 0$, $\dot{x}_1(0) = 0$, $x_3(0) = 1$, $\dot{x}_3(0) = 0$, one obtains unique solutions $x_1(t)$ and $x_3(t)$. The second equation gives, together with the initial conditions $x_2(0) = 0$ and $\dot{x}_2(0) = v_2$, the unique solution $x_2(t)$:

\[ x_1(t) = \frac{1}{4}gt^2, \quad x_2(t) = v_2 t, \quad x_3(t) = -\frac{1}{4}gt^2 + 1. \]

At $t = t_1 = \frac{2}{\sqrt{g}}$, the point mass $m$ has reached the plane $x_3 = 0$. The running time does not depend on $v_2$. The time of the free fall $x_3(t) = -\frac{1}{2}gt^2 + 1$ from $x_3 = 1$ to $x_3 = 0$ is $t_2 = \sqrt{\frac{2}{g}}$.
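The closed-form solution can be checked against the constraint and the equations of motion. The sketch below is illustrative only: $g = 9.81$ is an arbitrary sample value, the multiplier $\lambda/m = g/2$ is read off from the first equation of the system with $x_1(t) = \frac14 gt^2$, and the function names are hypothetical.

```python
import math

g = 9.81  # sample value for the gravitational acceleration (assumption)

def x1(t):
    return 0.25 * g * t * t

def x3(t):
    return 1.0 - 0.25 * g * t * t

# Multiplier per unit mass from m x1'' = lambda: x1'' = g/2.
lam_over_m = 0.5 * g
# Third equation: x3'' = -g + lambda/m; the closed form has x3'' = -g/2.
residual_third = (-0.5 * g) - (-g + lam_over_m)

# Constraint x1 + x3 = 1 at a few sample times.
constraint_gap = max(abs(x1(t) + x3(t) - 1.0) for t in (0.0, 0.1, 0.3, 0.5))

t1 = 2.0 / math.sqrt(g)      # arrival time on the inclined plane
t2 = math.sqrt(2.0 / g)      # free-fall time from height 1
print(x3(t1), t1 / t2)
```

The ratio $t_1 : t_2 = \sqrt{2}$ is independent of $g$: the constraint force halves the vertical acceleration.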

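2.5.3 For $T = \frac{1}{2}m(\dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2)$, $V = mgx_3$, and $\Psi(x_1,x_2,x_3) = x_1^2 + x_2^2 + x_3^2 - \ell^2 = 0$, the system (2.5.32) becomes

\[
\begin{aligned}
m\ddot{x}_1 &= 2\lambda x_1, \\
m\ddot{x}_2 &= 2\lambda x_2, \\
m\ddot{x}_3 &= -mg + 2\lambda x_3.
\end{aligned}
\]

For spherical coordinates one obtains, with $r = \ell$,

\[
\begin{aligned}
\dot{x}_1 &= \ell\cos\theta\cos\varphi\,\dot{\theta} - \ell\sin\theta\sin\varphi\,\dot{\varphi}, \\
\dot{x}_2 &= \ell\cos\theta\sin\varphi\,\dot{\theta} + \ell\sin\theta\cos\varphi\,\dot{\varphi}, \\
\dot{x}_3 &= -\ell\sin\theta\,\dot{\theta},
\end{aligned}
\]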




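which gives

\[ T = \frac{1}{2}m\ell^2(\dot{\theta}^2 + \sin^2\theta\,\dot{\varphi}^2), \quad V = mg\ell\cos\theta. \]

The Euler-Lagrange equations for $L(\theta,\varphi,\dot{\theta},\dot{\varphi}) = T(\theta,\dot{\theta},\dot{\varphi}) - V(\theta)$ read

\[
\begin{aligned}
\ddot{\theta} &= \sin\theta\cos\theta\,\dot{\varphi}^2 + \frac{g}{\ell}\sin\theta, \\
\frac{d}{dt}\big(\sin^2\theta\,\dot{\varphi}\big) &= 0, \quad\text{or}\quad \sin^2\theta\,\dot{\varphi} = c.
\end{aligned}
\]

The second relation is the conservation law given in Exercise 2.5.4.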

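2.5.4 From the system describing the spherical pendulum, cf. Exercise 2.5.3, one deduces

\[ m(\ddot{x}_1x_2 - \ddot{x}_2x_1) = 0 \quad\text{and}\quad \frac{d}{dt}(\dot{x}_1x_2 - \dot{x}_2x_1) = 0. \]

By formula (2.2.5),

\[ \int_{t_a}^{t} x_1\dot{x}_2 - \dot{x}_1x_2\,dt = 2F(t), \]

where $F(t)$ is the area depicted in the following figure:

[Figure: On Green's Formula. The area $F(t)$ in the $(x_1,x_2)$-plane is bounded by the segments from $0$ to $(x_1(t_a),x_2(t_a))$ and from $(x_1(t),x_2(t))$ to $0$ and by the curve between these two points.]

The curvilinear integrals along the line segments from $0$ to $(x_1(t_a),x_2(t_a))$ and from $(x_1(t),x_2(t))$ to $0$ are zero because the vector $(x_1,x_2)$ is orthogonal to the outer normal vector $(x_2,-x_1)$. From $x_1\dot{x}_2 - \dot{x}_1x_2 = c$, differentiation gives

\[ \frac{d}{dt}F(t) = \frac{c}{2}, \]

which means constancy of the area velocity.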




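2.6.1 The geodesic parametrized by arc length satisfies the following system:

\[
\begin{aligned}
\ddot{x}_1 &= 2\lambda x_1, & x_1^2 + x_2^2 &= 1, \\
\ddot{x}_2 &= 2\lambda x_2, & \dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2 &= 1, \\
\ddot{x}_3 &= 0.
\end{aligned}
\]

Differentiation of $x_1^2 + x_2^2 = 1$ two times gives

\[ \dot{x}_1^2 + \dot{x}_2^2 + x_1\ddot{x}_1 + x_2\ddot{x}_2 = \dot{x}_1^2 + \dot{x}_2^2 + 2\lambda(x_1^2 + x_2^2) = 0 \quad\text{and}\quad 2\lambda = -\dot{x}_1^2 - \dot{x}_2^2 = \dot{x}_3^2 - 1. \]

By the third equation, $x_3(s) = c_1s + c_0$, and $x_3(0) = 0$ implies $c_0 = 0$. If $L$ denotes the length of the geodesic, we have $x_3(L) = c_1L = 1$ or $c_1 = L^{-1}$. By the geometry of the cylinder we have apparently $L > 1$ or $c_1 < 1$, which is confirmed by the differential equations

\[ \ddot{x}_1 = (c_1^2 - 1)x_1, \quad \ddot{x}_2 = (c_1^2 - 1)x_2, \]

which are solvable under the constraint $x_1^2 + x_2^2 = 1$ for $c_1^2 - 1 < 0$ only. The general solutions are

\[
\begin{aligned}
x_1(s) &= a_1\cos\sqrt{1 - c_1^2}\,s + b_1\sin\sqrt{1 - c_1^2}\,s, \\
x_2(s) &= a_2\cos\sqrt{1 - c_1^2}\,s + b_2\sin\sqrt{1 - c_1^2}\,s.
\end{aligned}
\]

The boundary conditions give:

\[
\begin{aligned}
&x_1(0) = a_1 = 1, \quad x_2(0) = a_2 = 0, \quad x_1(L) = \cos\sqrt{1 - c_1^2}\,L + b_1\sin\sqrt{1 - c_1^2}\,L = -1, \\
&x_2(L) = b_2\sin\sqrt{1 - c_1^2}\,L = 0.
\end{aligned}
\]

The condition $x_1^2 + x_2^2 = 1$ implies by differentiation, $x_1\dot{x}_1 + x_2\dot{x}_2 = 0$, which at $s = 0$ gives $b_1 = 0$. The relation $\dot{x}_1^2 + \dot{x}_2^2 = 1 - c_1^2$ determines, at $s = 0$, the coefficient $b_2 = \pm 1$, which implies, from $x_2(L) = \pm\sin\sqrt{1 - c_1^2}\,L = 0$, that $\sqrt{1 - c_1^2}\,L = \pi$. Finally, since $c_1 = L^{-1}$, we obtain $L = \sqrt{1 + \pi^2}$. We find two geodesics from $A = (1,0,0)$ to $B = (-1,0,1)$:

\[ \Big\{ x(s) = \Big(\cos\frac{\pi}{\sqrt{1 + \pi^2}}s,\ \pm\sin\frac{\pi}{\sqrt{1 + \pi^2}}s,\ \frac{1}{\sqrt{1 + \pi^2}}s\Big) \ \Big|\ s \in [0,\sqrt{1 + \pi^2}] \Big\}. \]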
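The resulting helix can be verified directly: it should lie on the cylinder, connect $A$ to $B$, and have unit speed. The following sketch checks these three properties numerically for the "+" branch; the function names, the sample parameter values, and the finite-difference step are choices made for this illustration only.

```python
import math

L = math.sqrt(1.0 + math.pi**2)   # length of the geodesic

def geodesic(s, sign=1.0):
    """Point x(s) on the helix from A = (1, 0, 0) to B = (-1, 0, 1)."""
    w = math.pi / L
    return (math.cos(w * s), sign * math.sin(w * s), s / L)

def speed(s, h=1e-6):
    """Norm of the derivative, approximated by central differences."""
    p, q = geodesic(s + h), geodesic(s - h)
    return math.sqrt(sum(((a - b) / (2.0 * h))**2 for a, b in zip(p, q)))

start, end = geodesic(0.0), geodesic(L)
on_cylinder_gap = max(abs(geodesic(s)[0]**2 + geodesic(s)[1]**2 - 1.0)
                      for s in (0.0, 0.5, 1.7, L))
speed_gap = max(abs(speed(s) - 1.0) for s in (0.3, 1.0, 2.5))
print(start, end, speed_gap)
```

Unrolling the cylinder gives the same length: half the circumference is $\pi$ and the height difference is 1, so the straight segment in the unrolled plane has length $\sqrt{1+\pi^2}$.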

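2.6.2 The Euler-Lagrange equations for geodesics parametrized by arc length read in Cartesian coordinates

\[
\begin{aligned}
\ddot{x}_1 &= 2\lambda x_1, & x_1^2 + x_2^2 &= x_3^2,\ x_3 \ge 0, \\
\ddot{x}_2 &= 2\lambda x_2, & \dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2 &= 1, \\
\ddot{x}_3 &= -2\lambda x_3.
\end{aligned}
\]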



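Differentiation of $\|x(s)\|^2$ two times yields

\[ \frac{d}{ds}\|x\|^2 = 2(x,\dot{x}), \quad \frac{d^2}{ds^2}\|x\|^2 = 2\|\dot{x}\|^2 + 2(x,\ddot{x}) = 2 + 4\lambda(x_1^2 + x_2^2 - x_3^2) = 2. \]

In generalized coordinates, parametrized by the arc length, (2.6.19)$_1$ yields, for $f(r) = r$, $f'(r) = 1$, $f''(r) = 0$,

\[ 2\ddot{r} = r\dot{\varphi}^2 \quad\text{and}\quad 2\dot{r}^2 + r^2\dot{\varphi}^2 = 1. \]

This gives for $r(s)^2$

\[ \frac{d}{ds}r^2 = 2r\dot{r}, \quad \frac{d^2}{ds^2}r^2 = 2\dot{r}^2 + 2r\ddot{r} = 1 - r^2\dot{\varphi}^2 + r^2\dot{\varphi}^2 = 1. \]

Since $\|x\|^2 = 2r^2$, the two differential equations coincide.

Any geodesic, which is not a meridian, fulfills Clairaut's relation (2.6.62). Hence $r \ge c_1 > 0$ for some constant $c_1$. The differential equation for $r^2$ is solved via

\[ r(s)^2 = \frac{1}{2}s^2 + r_1s + r_0^2, \quad\text{where } r(0)^2 = r_0^2 > 0, \quad \frac{d}{ds}r(s)^2\Big|_{s=0} = r_1. \]

The estimate $r^2 \ge c_1^2$ implies $\frac{1}{2}r_1^2 + c_1^2 \le r_0^2$. Finally,

\[ \lim_{s\to\pm\infty} r(s)^2 = \infty. \]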

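2.7.1 Let $x \in (C^1[t_a,t_b])^n$ satisfy $\|\dot{x}\|^2 - 1 = 0$ and (2.7.23). Then

\[ 2\frac{d}{dt}(\lambda\dot{x}) = 0 \quad\text{or}\quad \lambda\dot{x} = c. \]

Furthermore,

\[ \lambda^2 = \lambda^2\|\dot{x}\|^2 = \|c\|^2. \]

If $\lambda = \pm\|c\| \ne 0$, then $\ddot{x} = 0$, and hence $x$ is a straight line. Conversely, if $x$ is not a straight line, then $\lambda = 0$, and according to Definition 2.7.3, $x$ is normal.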

2.7.2 Let $x \in (C^1[t_a,t_b])^n$ satisfy $x(t_a) = A$, $\Psi(x,\dot{x}) = D\Psi(x)\dot{x} = 0$, and (2.7.3). Then

\[
\begin{aligned}
\frac{d}{dt}\sum_{i=1}^{m} \lambda_i\nabla\Psi_i(x) &= \nabla\Big(\sum_{i=1}^{m} \lambda_i(\nabla\Psi_i(x),\dot{x})\Big) = \nabla\Big(\sum_{i=1}^{m} \lambda_i\frac{d}{dt}\Psi_i(x)\Big) \\
&= \sum_{i=1}^{m} \lambda_i\frac{d}{dt}\nabla\Psi_i(x), \quad\text{and hence}\quad \sum_{i=1}^{m} \dot{\lambda}_i\nabla\Psi_i(x) = 0.
\end{aligned}
\]

Since $D\Psi(x)$ has the maximal rank $m$ for $x \in M = \{x \in \mathbb{R}^n \mid D\Psi(x)\dot{x} = \frac{d}{dt}\Psi(x) = 0\} = \{x \in \mathbb{R}^n \mid \Psi(x) = \Psi(x(t_a)) = \Psi(A)\}$, the last equation implies that $\dot{\lambda}_i = 0$ or $\lambda_i = c_i$ for $i = 1,\dots,m$. These constants do not necessarily vanish. Hence, according to Definition 2.7.3, $x$ is not normal.

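2.7.3 We use the following identities for $\Psi(x,\dot{x}) = D\Psi(x)\dot{x} = \frac{d}{dt}\Psi(x)$:

\[ \nabla\Psi_i(x) = \Psi_{i,\dot{x}}(x,\dot{x}) = \Big(\frac{d}{dt}\Psi_i(x)\Big)_{\dot{x}}, \quad \Psi_{i,x}(x,\dot{x}) = \frac{d}{dt}\nabla\Psi_i(x). \]

Then, writing $\tilde{\lambda}_i$ for the multipliers of the nonholonomic formulation,

\[
\begin{aligned}
\frac{d}{dt}\Big(\Phi + \sum_{i=1}^{m} \tilde{\lambda}_i\Psi_i\Big)_{\dot{x}} &= \Big(\Phi + \sum_{i=1}^{m} \tilde{\lambda}_i\Psi_i\Big)_x \quad\Leftrightarrow \\
\frac{d}{dt}\Phi_{\dot{x}} + \frac{d}{dt}\sum_{i=1}^{m} \tilde{\lambda}_i\nabla\Psi_i &= \Phi_x + \sum_{i=1}^{m} \tilde{\lambda}_i\frac{d}{dt}\nabla\Psi_i \quad\Leftrightarrow \\
\frac{d}{dt}\Phi_{\dot{x}} &= \Phi_x - \sum_{i=1}^{m} \dot{\tilde{\lambda}}_i\nabla\Psi_i.
\end{aligned}
\]

The equivalence of the two systems holds with $\lambda_i = -\dot{\tilde{\lambda}}_i$.

Remark: The functions $\tilde{\lambda}_i$ may be replaced by $\tilde{\lambda}_i + c_i$ with constants $c_i$ without changing system (2.7.21):

\[ \frac{d}{dt}\sum_{i=1}^{m} c_i\Psi_{i,\dot{x}} = \frac{d}{dt}\sum_{i=1}^{m} c_i\nabla\Psi_i = \sum_{i=1}^{m} c_i\frac{d}{dt}\nabla\Psi_i = \sum_{i=1}^{m} c_i\Psi_{i,x}. \]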

2.7.4

a) Let the function $(y,z) \in (C^1[a,b])^{n+m}$ satisfy the nonholonomic constraint and (2.7.23) on $[a,b]$, i.e.,

\[
\begin{aligned}
-\frac{d}{dx}\Big(\sum_{i=1}^{m} \lambda_iG_i\Big)_{y'} &= -\Big(\sum_{i=1}^{m} \lambda_iG_i\Big)_y, \\
\frac{d}{dx}\lambda_i &= 0, \quad i = 1,\dots,m.
\end{aligned}
\]

Thus, the functions $\lambda_i$ are constant on $[a,b]$, and the first $n$-dimensional system becomes

\[
\begin{aligned}
&\sum_{i=1}^{m} \lambda_i\Big(G_{i,y} - \frac{d}{dx}G_{i,y'}\Big) = 0 \ \text{on } [a,b] \quad\Leftrightarrow \\
&\sum_{i=1}^{m} \lambda_i\int_a^b (G_{i,y},h) + (G_{i,y'},h')\,dx = 0, \quad\text{which means} \\
&\sum_{i=1}^{m} \lambda_i\,\delta K_i(y)h = 0 \quad\text{for all } h \in (C_0^{1,\mathrm{pw}}[a,b])^n,
\end{aligned}
\]

cf. Proposition 1.4.2. The function $(y,z)$ is normal if and only if this implies $\lambda_1 = \cdots = \lambda_m = 0$. This is precisely the condition that $y$ is not critical for the isoperimetric constraints, cf. Exercise 2.1.1.

b) Since the Lagrange function $F$ does not depend on $z$ and $z'$, the last $m$ equations of (2.7.21) are identical to the last $m$ equations of (2.7.23), which imply that all $\lambda_i$ are constant, cf. part a). If the local minimizer $(y,z)$ is normal with respect to the nonholonomic constraint, then, according to Corollary 2.7.1, one can choose $\lambda_0 = 1$. In this case (2.7.21) is converted to (2.1.20).

2.7.5

a) For $G(x,y,z,y',z') = y' - z$, one obtains

\[ D_pG(x,y,z,p_1,p_2) = (1,0), \]

and hence $D_pG$ has rank $m = 1$.

b) The boundary value problem (2.7.10) reads

\[ y' - z = 0, \quad y(a) = A_1,\ z(a) = A_2,\ y(b) = B_1,\ z(b) = B_2. \]

Since $z \in C^1[a,b]$ is constrained only by the values $z(a)$ and $z(b)$, the equation

\[ y(b) = \int_a^b z(\xi)\,d\xi + A_1 = B_1 \]

can be solved for any $B_1$.

c) Let $y' - z = 0$ and suppose that (2.7.23) is satisfied, i.e.,

\[ \frac{d}{dx}\lambda = 0 \quad\text{and}\quad 0 = -\lambda. \]

Therefore any solution of $y' - z = 0$ is normal.

d) System (2.7.21) reads, for $\lambda_0 = 1$, cf. Corollary 2.7.1:

\[
\begin{aligned}
&\frac{d}{dx}(F_{y'} + \lambda) = F_y, \quad \frac{d}{dx}F_{z'} = -\lambda, \quad\text{and hence,} \\
&\frac{d^2}{dx^2}F_{y''} - \frac{d}{dx}F_{y'} = -F_y \quad\text{on } [a,b].
\end{aligned}
\]

Here $F_{y''} = F_{y''}(\cdot,y,y',y'')$, $F_{y'} = F_{y'}(\cdot,y,y',y'')$, and $F_y = F_y(\cdot,y,y',y'')$.

2.7.6 The constraint G(y,y′) = 0 means geometrically that the vector y′ is orthogo-nal to the vector g(y). Specifically

y′ = αg⊥(y) where g⊥(y) = (−g2(y),g1(y)),

and α = α(y,y′) is a continuous scalar function.

a) System (2.7.23) reads (g is a column):

ddx

(λg) = λ (∇g1y′1 +∇g2y

′2) = λDg∗y′,

where Dg∗ =Dg(y)∗ is the transposed Jacobian matrix of the vector field g, andy′ is a column. Furthermore,


http://dx.doi.org/10.1007/978-3-319-71123-2_2

http://dx.doi.org/10.1007/978-3-319-71123-2_2

http://dx.doi.org/10.1007/978-3-319-71123-2_2

http://dx.doi.org/10.1007/978-3-319-71123-2_2

http://dx.doi.org/10.1007/978-3-319-71123-2_2

http://dx.doi.org/10.1007/978-3-319-71123-2_2

http://dx.doi.org/10.1007/978-3-319-71123-2_2

http://dx.doi.org/10.1007/978-3-319-71123-2_2


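$$\lambda' g + \lambda Dg\,y' = \lambda Dg^* y', \quad\text{or}\quad \lambda' g = \lambda(Dg^* - Dg)y'.$$

Inserting the column $y' = \alpha g^\perp(y)$ gives

$$\lambda' g = \lambda(Dg^* - Dg)\alpha g^\perp = \lambda(g_{2,y_1} - g_{1,y_2})\alpha g = \lambda(\operatorname{rot} g)\alpha g, \quad\text{and hence}$$

$$\lambda' = \alpha(\operatorname{rot} g)\lambda,$$

due to $g(y) \neq 0$. For any continuous scalar function $\alpha(\operatorname{rot} g)$ of $x$ this linear differential equation possesses a nontrivial solution $\lambda = \lambda(x)$. Therefore any solution $y$ of $G(y,y') = 0$ is not normal.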
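
The algebraic step $(Dg^* - Dg)g^\perp = (\operatorname{rot} g)\,g$ used in part a) can be checked numerically. The following is a minimal sketch: the vector field `g` is a hypothetical example chosen so that it never vanishes, and the Jacobian is approximated by central differences.

```python
# Finite-difference check of (Dg^T - Dg) g_perp = (rot g) g in the plane.
# The field g is a hypothetical example with g != 0 everywhere; it is not from the text.

def g(y1, y2):
    return (y1 * y1 + 1.0, y1 * y2 + 2.0)

def jacobian(y1, y2, eps=1e-6):
    """Central-difference Jacobian Dg: row i = component g_i, column j = d/dy_j."""
    col1 = [(p - m) / (2 * eps) for p, m in zip(g(y1 + eps, y2), g(y1 - eps, y2))]
    col2 = [(p - m) / (2 * eps) for p, m in zip(g(y1, y2 + eps), g(y1, y2 - eps))]
    return [[col1[0], col2[0]], [col1[1], col2[1]]]

def both_sides(y1, y2):
    """Return the two sides of the identity at the point (y1, y2)."""
    g1, g2 = g(y1, y2)
    D = jacobian(y1, y2)
    rot = D[1][0] - D[0][1]                      # rot g = g_{2,y1} - g_{1,y2}
    perp = (-g2, g1)                             # g_perp
    A = [[D[j][i] - D[i][j] for j in range(2)] for i in range(2)]  # Dg^T - Dg
    lhs = tuple(A[i][0] * perp[0] + A[i][1] * perp[1] for i in range(2))
    rhs = (rot * g1, rot * g2)
    return lhs, rhs

lhs, rhs = both_sides(0.5, 1.5)
print(lhs, rhs)
```

For this sample field $\operatorname{rot} g = y_2$, so both sides should agree with $y_2\, g(y)$ up to rounding.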

b) Let $y \in (C^2[a,b])^2$ satisfy $G(y,y') = 0$ and $y(a) = A$, $y(b) = B$. Then

$$y' = \alpha g^\perp(y) \quad\text{where}\quad \alpha = \alpha(y,y') = \frac{y_2'}{g_1(y_1,y_2)} = -\frac{y_1'}{g_2(y_1,y_2)},$$

and because $g(y) \neq 0$, at least one expression is defined. By assumption on $y$, the function $\alpha \in C^1[a,b]$. Therefore, the solution of the initial value problem

$$w' = \alpha g^\perp(w),\quad w(a) = A,$$

is unique, i.e., $w = y$ on $[a,b]$. Define

$$\beta(x) = \int_a^x \alpha(y(s),y'(s))\,ds,$$

and let $z$ be the unique solution of the initial value problem

$$z' = g^\perp(z),\quad z(0) = A.$$

Then $w(x) = z(\beta(x))$ solves

$$w'(x) = z'(\beta(x))\,\alpha(y(x),y'(x)) = g^\perp(z(\beta(x)))\,\alpha, \quad\text{or}$$

$$w' = \alpha g^\perp(w),\quad w(a) = z(0) = A.$$

Hence $w = y$ or $y(x) = z(\beta(x))$. In particular,

$$y(b) = z(\beta(b)) \in \{z \in \mathbb{R}^2 \mid z = z(x),\ x \in \mathbb{R}\},$$

which implies that the boundary value problem (2.7.10) is not solvable for every endpoint in a full neighborhood of $B$; any endpoint of $y$ has to be on the curve described by $z$.
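
The reparametrization $y(x) = z(\beta(x))$ in part b) can be illustrated numerically. This is a sketch under assumed data: the field $g(y) = y$ (so that $z$ traces the unit circle), the factor $\alpha(x) = 1 + x$, and the initial point $A = (1,0)$ are hypothetical choices, and the initial value problem is integrated with the classical Runge-Kutta scheme.

```python
import math

def g_perp(w):
    # Assumed field g(y) = y, hence g_perp(y) = (-y2, y1)
    return (-w[1], w[0])

def alpha(x):
    # Assumed continuous scalar factor along the solution
    return 1.0 + x

def beta(x, a=0.0):
    # beta(x) = integral of alpha from a to x, in closed form for alpha(s) = 1 + s
    return (x - a) + 0.5 * (x * x - a * a)

def rhs(x, w):
    p = g_perp(w)
    return (alpha(x) * p[0], alpha(x) * p[1])

def solve_w(a, b, A, n=2000):
    """Classical RK4 for w' = alpha(x) g_perp(w), w(a) = A; returns w(b)."""
    h = (b - a) / n
    x, w = a, A
    for _ in range(n):
        k1 = rhs(x, w)
        k2 = rhs(x + h / 2, (w[0] + h / 2 * k1[0], w[1] + h / 2 * k1[1]))
        k3 = rhs(x + h / 2, (w[0] + h / 2 * k2[0], w[1] + h / 2 * k2[1]))
        k4 = rhs(x + h, (w[0] + h * k3[0], w[1] + h * k3[1]))
        w = (w[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             w[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        x += h
    return w

def z(t):
    # Exact solution of z' = g_perp(z), z(0) = (1, 0) for the assumed field
    return (math.cos(t), math.sin(t))

y_b = solve_w(0.0, 1.0, (1.0, 0.0))
z_b = z(beta(1.0))
print(y_b, z_b)
```

In this example the endpoint $y(b)$ lies on the unit circle whatever $\alpha$ is chosen, which is the obstruction to solving the boundary value problem for arbitrary endpoints.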

2.8.1 As a first step, we choose $h_1,\dots,h_m \in (C^{1,pw}_0[t_a,t_b])^n$ satisfying (2.1.21), and we define, for arbitrary $h \in (C^{1,pw}_0[t_a,t_b])^n$, the functions (2.1.22). Then, by (2.1.26) and (2.1.27), the Lagrange multipliers do not depend on $h$, and the Euler-Lagrange equations (2.1.35) are established.




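As a second step, we define (2.1.22) with the admitted perturbation $h(s,\cdot)$ constructed in (2.8.7), (2.8.8) with $h_1,\dots,h_m \in (C^{1,pw}_0[t_a,t_b])^n$ as in the first step. Then we obtain (2.1.26) with the same Lagrange multipliers (2.1.27). There is one difference: In (2.1.26)$_1$, the function $h$ has to be replaced by $\frac{\partial}{\partial s}h(0,\cdot) = \eta y$. Specifically, (2.1.26) gives, cf. (2.1.28),

$$\int_{t_a}^{t_b}\Big(\big(\Phi + \sum_{i=1}^m \lambda_i\Psi_i\big)_x, \eta y\Big) + \Big(\big(\Phi + \sum_{i=1}^m \lambda_i\Psi_i\big)_{\dot x}, \dot\eta y\Big)\,dt$$

$$= \int_{t_a}^{t_b}\Big(\big(\Phi + \sum_{i=1}^m \lambda_i\Psi_i\big)_x - \frac{d}{dt}\big(\Phi + \sum_{i=1}^m \lambda_i\Psi_i\big)_{\dot x}, \eta y\Big)\,dt + \Big(\big(\Phi + \sum_{i=1}^m \lambda_i\Psi_i\big)_{\dot x}, \eta y\Big)\Big|_{t_a}^{t_b} = 0.$$

In view of the Euler-Lagrange equation (2.1.35), the integral vanishes, and choosing $\eta(t_a) = 1$, $\eta(t_b) = 0$, $y \in T_{x(t_a)}M_a$, $\|y\| \le 1$, we obtain the transversality

$$\big(\Phi + \sum_{i=1}^m \lambda_i\Psi_i\big)_{\dot x}(x(t_a),\dot x(t_a)) \in N_{x(t_a)}M_a,$$

with the same Lagrange multipliers as in the Euler-Lagrange equations.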

2.8.2 Defining $\tilde\Psi = (\Psi,\Psi_a) : \mathbb{R}^n \to \mathbb{R}^{m+m_a}$, the set $M_a$ is given by $\tilde\Psi(x) = 0$, and by the maximal rank of the Jacobian matrix, $M_a$ is an $(n-(m+m_a))$-dimensional manifold contained in $M$, cf. the Appendix.

Following (2.8.5), let $x(t_a) + sy + \varphi(sy) \in M_a$, where $y \in T_{x(t_a)}M_a$, $\varphi(sy) \in N_{x(t_a)}M_a$, $\|y\| \le 1$ and $s \in (-r,r)$. We choose $h \in (C^2[t_a,t_b])^n$ satisfying

$$h(t_a) = y \in T_{x(t_a)}M_a,\quad h(t_b) = 0,$$

and we obtain as in (2.5.10)

$$a(t) = P(x(t))h(t) \in T_{x(t)}M \quad\text{for } t \in [t_a,t_b],$$

$$a(t_a) = y, \text{ since } T_{x(t_a)}M_a \subset T_{x(t_a)}M, \text{ and } a(t_b) = 0.$$

We define a function $H$ as in (2.5.12), where we insert the above function $a$. Following (2.5.12)–(2.5.21), we construct a perturbation $h : (-\varepsilon_0,\varepsilon_0)\times[t_a,t_b] \to \mathbb{R}^n$ satisfying

$$x(t) + h(s,t) \in M \quad\text{for } (s,t) \in (-\varepsilon_0,\varepsilon_0)\times[t_a,t_b],$$

$$h(s,t) = sa(t) + b(s,t),\quad a(t) \in T_{x(t)}M,\quad b(s,t) \in N_{x(t)}M.$$

Since $x(t_a) + sy + \varphi(sy) \in M_a \subset M$,

$$\Psi(x(t_a) + sa(t_a) + \varphi(sy)) = 0, \quad\text{and}$$

$$\Psi(x(t_a) + sa(t_a) + b(s,t_a)) = 0.$$

Since $b(s,t_a) \in N_{x(t_a)}M \subset N_{x(t_a)}M_a$, and $\varphi(sy) \in N_{x(t_a)}M_a$ is locally unique, we conclude




$$b(s,t_a) = \varphi(sy) = \varphi(sa(t_a)), \quad\text{and}$$

$$x(t_a) + h(s,t_a) \in M_a \quad\text{for all } s \in (-\varepsilon_0,\varepsilon_0),\ 0 < \varepsilon_0 \le r.$$

To summarize, $h(s,\cdot)$ is a perturbation, which is admitted for the holonomic constraint as well as for the boundary condition. Furthermore, $J(x + h(s,\cdot))$ is locally minimal at $s = 0$ (observe that $h(0,\cdot) = 0$). We follow (2.5.23):

$$\frac{d}{ds}J(x + h(s,\cdot))\Big|_{s=0} = 0$$

$$= \int_{t_a}^{t_b}(\Phi_x(x,\dot x),a) + (\Phi_{\dot x}(x,\dot x),\dot a)\,dt$$

$$= \int_{t_a}^{t_b}\Big(\Phi_x(x,\dot x) - \frac{d}{dt}\Phi_{\dot x}(x,\dot x),a\Big)\,dt + (\Phi_{\dot x}(x,\dot x),a)\Big|_{t_a}^{t_b}$$

$$= -(\Phi_{\dot x}(x(t_a),\dot x(t_a)),y), \quad\text{where } y = a(t_a) \in T_{x(t_a)}M_a.$$

The argument is that in view of the Euler-Lagrange equation (2.5.8) and the orthogonality of $\sum_{i=1}^m \lambda_i(t)\nabla\Psi_i(x(t)) \in N_{x(t)}M$ to $a(t) \in T_{x(t)}M$, the integral vanishes. Since $y \in T_{x(t_a)}M_a$ is restricted only by $\|y\| \le 1$, we obtain the transversality

$$\Phi_{\dot x}(x(t_a),\dot x(t_a)) \in N_{x(t_a)}M_a,$$

which is weaker than the natural boundary condition (2.5.67).

2.8.3 The Euler-Lagrange equation reads $\frac{d}{dx}2x^3y' = 0$, and thus, $y(x) = \frac{c_1}{x^2} + c_2$. The condition $y(1) = 0$ implies $c_1 + c_2 = 0$. By transversality,

$$\frac{4c_1^2}{x^3} + \Big(\frac{4}{x^3} - \frac{2c_1}{x^3}\Big)4c_1 = 0, \quad\text{and hence } c_1 = 4.$$

This gives $y(x) = 4\big(\frac{1}{x^2} - 1\big)$ and $(x_b,y(x_b)) = (\sqrt{2},-2)$.
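
The result can be cross-checked numerically. The sketch below rests on two assumptions that are not stated in this solution: the Lagrange function is $F = x^3(y')^2$ and the target curve is $\psi(x) = 2/x^2 - 3$ (both inferred from the solution of Exercise 2.8.6 (i)), and transversality is taken in the form $F + (\psi' - y')F_{y'} = 0$.

```python
import math

C1 = 4.0  # constant found from the transversality condition

def y(x):
    return C1 * (1.0 / x**2 - 1.0)

def dy(x):
    return -2.0 * C1 / x**3

def psi(x):
    # Assumed target curve, inferred from Exercise 2.8.6 (i)
    return 2.0 / x**2 - 3.0

def dpsi(x):
    return -4.0 / x**3

def transversality(x):
    """Residual of F + (psi' - y') F_{y'} for the assumed F = x^3 (y')^2."""
    F = x**3 * dy(x) ** 2
    F_p = 2.0 * x**3 * dy(x)
    return F + (dpsi(x) - dy(x)) * F_p

x_b = math.sqrt(2.0)
print(transversality(x_b), y(x_b), psi(x_b))
```

With $c_1 = 4$ the residual vanishes, and the extremal meets the assumed curve at $x_b = \sqrt{2}$ with value $-2$.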

2.8.4 The Euler-Lagrange equation $y'' = y$ has the solutions $y(x) = c_1e^x + c_2e^{-x}$. The condition $y(0) = 1$ implies $c_1 + c_2 = 1$. The transversality on the line $\psi(x) = 2$ (hence $\psi'(x) \equiv 0$) reads $y^2 - (y')^2 = 0$, which implies $c_1c_2 = 0$. Since $y(x) = e^{-x}$ cannot fulfill $y(x_b) = 2$ for some positive $x_b$, only $y(x) = e^x$ remains, and $x_b = \ln 2$.

2.8.5 The Euler-Lagrange equation reads

$$\frac{d}{dx}(y' + y + 1) = y' + 1.$$

Since $y' + y + 1 \in C^{1,pw}[0,1]$ and $y + 1 \in C^{1,pw}[0,1]$, it follows that $y' \in C^{1,pw}[0,1]$ and

$$y'' + y' = y' + 1, \quad\text{or}\quad y'' = 1 \text{ piecewise on } [0,1].$$




Solutions $y \in C^{1,pw}[0,1] \subset C[0,1]$ satisfying $y' \in C^{1,pw}[0,1] \subset C[0,1]$ are

$$y(x) = \frac{1}{2}x^2 + c_1x + c_2.$$

Transversality in this case means precisely the natural boundary condition, in particular

$$y'(0) + y(0) + 1 = 0, \quad\text{and}\quad y'(1) + y(1) + 1 = 0.$$

This gives the solution $y(x) = \frac{1}{2}x^2 - \frac{3}{2}x + \frac{1}{2}$.

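2.8.6 (i) The solution $y(x) = 4\big(\frac{1}{x^2} - 1\big)$ of Exercise 2.8.3 is a global minimizer: Continuously differentiable perturbations $y + h$ satisfying $h(1) = 0$ and $y(x_0) + h(x_0) = \frac{2}{x_0^2} - 3$, or $\frac{2}{x_0^2} + h(x_0) = 1$ for some $x_0 > 1$, are admitted. Then

$$J(y+h) = \int_1^{x_0} x^3\Big(-\frac{8}{x^3} + h'(x)\Big)^2 dx$$

$$= \int_1^{\sqrt{2}} x^3\frac{64}{x^6}\,dx + \int_{\sqrt{2}}^{x_0} x^3\frac{64}{x^6}\,dx - 16h(x_0) + \int_1^{x_0} x^3(h'(x))^2\,dx$$

$$= J(y) + 16\Big(1 - \frac{2}{x_0^2} - h(x_0)\Big) + \int_1^{x_0} x^3(h'(x))^2\,dx$$

$$> J(y), \quad\text{since } 1 - \frac{2}{x_0^2} - h(x_0) = 0 \text{ holds for } x_0 > 1.$$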
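
A single admitted perturbation can be used to spot-check this decomposition numerically. The sketch below assumes the hypothetical choices $x_0 = 2$ and a linear $h(x) = s(x-1)$ with the slope $s$ fixed by the endpoint condition $\frac{2}{x_0^2} + h(x_0) = 1$; the integrals are approximated by a composite Simpson rule.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def y_prime(x):
    # Derivative of the extremal y(x) = 4 (1/x^2 - 1)
    return -8.0 / x**3

x0 = 2.0                                      # assumed endpoint of the perturbed curve
slope = (1.0 - 2.0 / x0**2) / (x0 - 1.0)      # h(x) = slope * (x - 1), so h(1) = 0

J_y = simpson(lambda x: x**3 * y_prime(x) ** 2, 1.0, math.sqrt(2.0))
J_pert = simpson(lambda x: x**3 * (y_prime(x) + slope) ** 2, 1.0, x0)
excess = simpson(lambda x: x**3 * slope**2, 1.0, x0)

print(J_y, J_pert, excess)
```

For this choice the perturbed value exceeds $J(y)$ exactly by $\int_1^{x_0} x^3 (h')^2\,dx$, as the computation above predicts.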

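(ii) The solution $y(x) = e^x$ of Exercise 2.8.4 is a global minimizer: Continuously differentiable perturbations $y + h$ satisfying $h(0) = 0$ and $y(x_0) + h(x_0) = 2$, or $e^{x_0} + h(x_0) = 2$ for some $x_0 > 0$, are admitted. Then

$$J(y+h) = \int_0^{x_0}(y+h)^2 + (y'+h')^2\,dx$$

$$= \int_0^{\ln 2} y^2 + (y')^2\,dx + \int_{\ln 2}^{x_0} y^2 + (y')^2\,dx + 2\int_0^{x_0}(y - y'')h\,dx + 2y'(x_0)h(x_0) + \int_0^{x_0} h^2 + (h')^2\,dx$$

$$= J(y) + e^{2x_0} - 4 + 2e^{x_0}h(x_0) + \int_0^{x_0} h^2 + (h')^2\,dx$$

$$= J(y) - (h(x_0))^2 + \int_0^{x_0} h^2 + (h')^2\,dx,$$

where we use $e^{x_0} = 2 - h(x_0)$. By Example 5 in Paragraph 1.5,

$$\int_0^{x_0} h^2 + (h')^2\,dx \ge \operatorname{Min}\Big\{\int_0^{x_0} \tilde h^2 + (\tilde h')^2\,dx \ \Big|\ \tilde h(0) = 0,\ \tilde h(x_0) = h(x_0)\Big\}$$

$$= \int_0^{x_0} c_1^2(e^x - e^{-x})^2 + c_1^2(e^x + e^{-x})^2\,dx$$

$$= c_1^2(e^{2x_0} - e^{-2x_0}) = h(x_0)c_1(e^{x_0} + e^{-x_0}) \qquad (c_1(e^{x_0} - e^{-x_0}) = h(x_0))$$

$$= (h(x_0))^2\coth x_0 > (h(x_0))^2 \quad\text{for } x_0 > 0.$$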



Therefore $J(y+h) \ge J(y)$ for all admitted perturbations $h$.

(iii) Any $y + h$, where $h \in C^{1,pw}[0,1]$, is an admitted perturbation of the solution $y(x) = \frac{1}{2}x^2 - \frac{3}{2}x + \frac{1}{2}$ of Exercise 2.8.5:

$$J(y+h) = \int_0^1 \frac{1}{2}(y'+h')^2 + (y+h)(y'+h') + y' + h' + y + h\,dx$$

$$= J(y) + \int_0^1(-y'' - y' + y' + 1)h\,dx + (y' + y + 1)h\Big|_0^1 + \int_0^1 \frac{1}{2}(h')^2 + hh'\,dx$$

$$= J(y) + \frac{1}{2}\Big((h(1))^2 - (h(0))^2 + \int_0^1 (h')^2\,dx\Big),$$

where the other terms vanish by virtue of the Euler-Lagrange equation and the natural boundary conditions.

Choosing $h_1(x) = \varepsilon(x + \frac{1}{2})$, we obtain

$$J(y+h_1) = J(y) + \frac{3}{2}\varepsilon^2 > J(y),$$

and for $h_2(x) = \varepsilon(-x + \frac{3}{2})$,

$$J(y+h_2) = J(y) - \frac{1}{2}\varepsilon^2 < J(y).$$

Therefore the solution of Exercise 2.8.5 is not a local minimizer.
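
The two perturbations can also be evaluated numerically. The following sketch uses the integrand $\frac{1}{2}(y')^2 + yy' + y' + y$ as it appears in the computation above; the value $\varepsilon = 0.1$ and the Simpson quadrature are choices made only for this illustration.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def J(y, dy):
    # J(y) = integral over [0, 1] of (1/2) y'^2 + y y' + y' + y
    return simpson(lambda x: 0.5 * dy(x) ** 2 + y(x) * dy(x) + dy(x) + y(x), 0.0, 1.0)

def y(x):
    return 0.5 * x**2 - 1.5 * x + 0.5

def dy(x):
    return x - 1.5

eps = 0.1  # assumed size of the perturbation

J0 = J(y, dy)
J1 = J(lambda x: y(x) + eps * (x + 0.5), lambda x: dy(x) + eps)    # perturbation h1
J2 = J(lambda x: y(x) + eps * (-x + 1.5), lambda x: dy(x) - eps)   # perturbation h2

print(J1 - J0, J2 - J0)
```

Since the integrand is a cubic polynomial in $x$ for these data, Simpson's rule is exact up to rounding, and the differences should match $\frac{3}{2}\varepsilon^2$ and $-\frac{1}{2}\varepsilon^2$.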

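2.9.1 According to Proposition 2.5.1, the local minimizer $x$ solves

$$\frac{d}{dt}\Phi_{\dot x}(x,\dot x) = \Phi_x(x,\dot x) + \sum_{i=1}^m \lambda_i\nabla\Psi_i(x) \quad\text{on } [t_a,t_b].$$

Due to the invariance of $\Psi$, differentiation with respect to $x$ and $s$ yields

$$\nabla\Psi_i(h^s(x))Dh^s(x) = \nabla\Psi_i(x),\qquad \nabla\Psi_i(h^s(x))\frac{\partial}{\partial s}h^s(x) = 0,$$

where here and in the sequel, $\nabla\Psi_i$ are row vectors and the product of $\nabla\Psi_i$ and $\frac{\partial}{\partial s}h^s$ is the Euclidean scalar product in $\mathbb{R}^n$. By the computations (2.9.11),

$$\frac{d}{dt}\Phi_{\dot x}(x,\dot x) - \Phi_x(x,\dot x) - \sum_{i=1}^m \lambda_i\nabla\Psi_i(x) = 0$$

$$= \Big(\frac{d}{dt}\Phi_{\dot x}\big(h^s(x),\tfrac{d}{dt}h^s(x)\big) - \Phi_x\big(h^s(x),\tfrac{d}{dt}h^s(x)\big) - \sum_{i=1}^m \lambda_i\nabla\Psi_i(h^s(x))\Big)Dh^s(x).$$

Thus, $h^s(x)$ solves the Euler-Lagrange system with the same Lagrange multipliers for all $s \in (-\delta,\delta)$. Using (2.9.16) and (2.9.17), we obtain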




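$$\frac{d}{dt}\Big(\Phi_{\dot x}(h^s(x),Dh^s(x)\dot x)\frac{\partial}{\partial s}h^s(x)\Big)$$

$$= \frac{d}{dt}\Phi_{\dot x}\big(h^s(x),\tfrac{d}{dt}h^s(x)\big)\frac{\partial}{\partial s}h^s(x) + \Phi_{\dot x}(h^s(x),Dh^s(x)\dot x)\frac{d}{dt}\frac{\partial}{\partial s}h^s(x)$$

$$= \Big(\Phi_x\big(h^s(x),\tfrac{d}{dt}h^s(x)\big) + \sum_{i=1}^m \lambda_i\nabla\Psi_i(h^s(x))\Big)\frac{\partial}{\partial s}h^s(x) + \Phi_{\dot x}(h^s(x),Dh^s(x)\dot x)\frac{\partial}{\partial s}Dh^s(x)\dot x = 0,$$

where we use (2.9.15) and the invariance of $\Psi$.

Remark It is not necessary that the curve $x$ be a local minimizer under the holonomic constraint. It suffices that $x$ solves the Euler-Lagrange system (2.5.8).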

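2.10.1 For $\beta = 0$ and for $\dot r = 0$ for $m_1$ at a distance $R$ from $m_2$, formula (2.10.17) gives $E = -k/R$. For $\dot r < 0$ the square root in (2.10.17) is negative. Let $G(r)$ be a primitive of $-1/F(r)$. Then

$$\frac{d}{dt}G(r) = G'(r)\dot r = (-1/F(r))F(r) = -1, \quad\text{or}\quad G(r) = c - t.$$

For $t = 0$, we have $G(R) = c$, and for $t = T$, we obtain $G(0) = G(R) - T$. This gives the following estimate:

$$T = G(R) - G(0) = \int_0^R G'(r)\,dr = \sqrt{\frac{m}{2k}}\int_0^R \frac{1}{\sqrt{\frac{1}{r} - \frac{1}{R}}}\,dr$$

$$= \sqrt{\frac{m}{2k}}\int_0^R \sqrt{\frac{r}{1 - \frac{r}{R}}}\,dr = \sqrt{\frac{m}{2k}}\sqrt{R}\int_0^R \sqrt{\frac{\frac{r}{R}}{1 - \frac{r}{R}}}\,dr$$

$$= \sqrt{\frac{m}{2k}}\,R^{3/2}\int_0^1 \sqrt{\frac{s}{1-s}}\,ds < 2\sqrt{\frac{m}{2k}}\,R^{3/2}.$$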

2.10.2 Take $\beta = 0$, $E = 0$, and the negative square root in (2.10.17). Then, for $r = R$, the velocity is

$$\dot r = -\sqrt{\frac{2k}{m}}\frac{1}{\sqrt{R}},$$

and the computation in Exercise 2.10.1 gives the falling time

$$T = \sqrt{\frac{m}{2k}}\int_0^R \sqrt{r}\,dr = \frac{2}{3}\sqrt{\frac{m}{2k}}\,R^{3/2}.$$
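
The two falling times can be compared numerically in units of $\sqrt{m/(2k)}\,R^{3/2}$. The sketch below is an illustration only: it uses the substitution $s = \sin^2\theta$ (an added step, not part of the solution) to remove the endpoint singularity of $\sqrt{s/(1-s)}$, and a plain midpoint rule for both integrals.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Exercise 2.10.1: with s = sin(theta)^2, sqrt(s / (1 - s)) ds = 2 sin(theta)^2 d(theta)
factor_from_rest = midpoint(lambda th: 2.0 * math.sin(th) ** 2, 0.0, math.pi / 2)

# Exercise 2.10.2: after scaling r = R s, the integral of sqrt(r) becomes R^(3/2) times this
factor_zero_energy = midpoint(lambda s: math.sqrt(s), 0.0, 1.0)

print(factor_from_rest, factor_zero_energy)
```

The first factor has the exact value $\pi/2$, which is indeed smaller than the bound 2 used in Exercise 2.10.1, and the second factor is $2/3$.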

3.1.1 Let $u \in X$ be a weak eigenvector with eigenvalue $\lambda$. Then by the symmetry of $B$ and $K$, we have




$$B(u,u_n) = \lambda K(u,u_n), \quad\text{and}$$

$$B(u_n,u) = \lambda_n K(u_n,u) \quad\text{for all } n \in \mathbb{N}.$$

Hence, $(\lambda - \lambda_n)K(u,u_n) = 0$.

If $\lambda \neq \lambda_n$ for all $n \in \mathbb{N}$, then $c_n = K(u,u_n) = 0$ for all $n \in \mathbb{N}$. Proposition 3.1.5 then implies $u = 0$, which is not a weak eigenvector. Therefore $\lambda = \lambda_{n_0}$ for some $n_0 \in \mathbb{N}$. By the above argument $K(u,u_n) = 0$ for all $n \neq n_0$, and again Proposition 3.1.5 and Corollary 3.1.1 imply that $u$ is a linear combination of the weak eigenvectors with eigenvalue $\lambda_{n_0}$.

3.2.1 According to Definition 3.2.2 of the norm in $W^{1,2}(a,b)$, a sequence $(y_n)_{n\in\mathbb{N}}$ is a Cauchy sequence in $W^{1,2}(a,b)$ if $(y_n)_{n\in\mathbb{N}}$ as well as $(y_n')_{n\in\mathbb{N}}$ are Cauchy sequences in $L^2(a,b)$. Both sequences have a limit, $y_0$ and $z_0$ in $L^2(a,b)$, respectively. According to Definition 3.2.1,

$$(y_n',h)_{0,2} = -(y_n,h')_{0,2} \quad\text{for all } h \in C_0^\infty(a,b),$$

and for all $n \in \mathbb{N}$. Taking the limits, the continuity of the scalar product (Cauchy-Schwarz inequality) implies

$$(z_0,h)_{0,2} = -(y_0,h')_{0,2} \quad\text{for all } h \in C_0^\infty(a,b).$$

Hence $z_0 = y_0'$ in the weak sense. By Definition 3.2.2, $y_0 \in W^{1,2}(a,b)$, and

$$\lim_{n\to\infty}\|y_n - y_0\|_{1,2} = 0.$$

3.2.2 By (3.2.50), we deduce

$$J(y) \ge c_1\|y'\|_{0,2}^2 - c_2\int_a^b |y|^q\,dx - c_3(b-a)$$

$$\ge c_1\|y'\|_{0,2}^2 - c_2(b-a)^{1-(q/2)}\Big(\int_a^b y^2\,dx\Big)^{q/2} - c_3(b-a),$$

in view of Hölder's inequality, and

$$J(y) \ge \frac{c_1}{2C_1^2}\|y\|_{0,2}^2 - c_2(b-a)^{1-(q/2)}\|y\|_{0,2}^q - \frac{c_1C_2^2}{C_1^2} - c_3(b-a) \quad\text{(using (3.2.31))}$$

$$\ge -c_4, \quad\text{due to } c_1 > 0 \text{ and } q < 2.$$

Inserting a minimizing sequence, we obtain

$$c_1\|y_n'\|_{0,2}^2 - c_2\int_a^b |y_n|^q\,dx - c_3(b-a) \le m + 1 \quad\text{or}$$

$$m + 1 + c_3(b-a) \ge c_1\|y_n'\|_{0,2}^2 - c_2(b-a)^{1-(q/2)}\|y_n\|_{0,2}^q$$

$$\ge \frac{c_1}{2C_1^2}\|y_n\|_{1,2}^2 - c_2(b-a)^{1-(q/2)}\|y_n\|_{1,2}^q - \frac{c_1C_2^2}{C_1^2},$$

where again we use (3.2.31).




Hence, since $c_1 > 0$ and $q < 2$, the sequence is bounded in $W^{1,2}(a,b)$: $\|y_n\|_{1,2} \le C_3$ for all $n \in \mathbb{N}$.

Again, by virtue of (3.2.50),

$$\tilde J(y) = J(y) + c_2\int_a^b |y|^q\,dx + c_3(b-a) \ge 0 \quad\text{for all } y \in D,$$

and $\lim_{n\to\infty} y_n = y_0$ in $C[a,b]$. This is the same as uniform convergence in $[a,b]$, which implies

$$\lim_{n\to\infty} c_2\int_a^b |y_n|^q\,dx + c_3(b-a) = c_2\int_a^b |y_0|^q\,dx + c_3(b-a).$$

Thus, for nonnegative $\tilde F$, cf. (3.2.49)$_1$, we have

$$\liminf_{n\to\infty}\tilde J(y_n) \ge \tilde J(y_0),$$

which implies finally the lower semi-continuity of $J$:

$$\liminf_{n\to\infty} J(y_n) = \liminf_{n\to\infty}\tilde J(y_n) - \lim_{n\to\infty} c_2\int_a^b |y_n|^q\,dx - c_3(b-a)$$

$$\ge \tilde J(y_0) - c_2\int_a^b |y_0|^q\,dx - c_3(b-a) = J(y_0).$$

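3.2.3 The Lagrange function $F(x,y,y') = x^2(y')^2$ does not satisfy the coercivity (3.2.34)$_1$. The minimizing sequence $y_n(x) = \arctan nx/\arctan n$ is not bounded in $W^{1,2}(-1,1)$:

$$\|y_n\|_{1,2}^2 \ge \frac{1}{(\arctan n)^2}\int_{-1}^{1}\Big(\frac{n}{1+n^2x^2}\Big)^2 dx > \frac{4}{\pi^2}n^2\int_{-\frac{1}{n}}^{\frac{1}{n}}\frac{1}{(1+n^2x^2)^2}\,dx$$

$$> \frac{4}{\pi^2}n^2\,\frac{2}{n}\,\frac{1}{4} = \frac{2n}{\pi^2}.$$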
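
The behavior of this sequence can be observed numerically. The sketch below evaluates $J(y_n) = \int_{-1}^{1} x^2 (y_n')^2\,dx$ and $\|y_n'\|_{0,2}^2$ with a midpoint rule; the sample values of $n$ and the quadrature resolution are assumptions of this illustration.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def dy_n(n, x):
    # Derivative of y_n(x) = arctan(n x) / arctan(n)
    return n / ((1.0 + (n * x) ** 2) * math.atan(n))

def J(n):
    # Value of the functional with Lagrange function x^2 (y')^2 on (-1, 1)
    return midpoint(lambda x: x * x * dy_n(n, x) ** 2, -1.0, 1.0)

def deriv_norm_sq(n):
    # Square of the L^2 norm of y_n', a lower bound for the squared W^{1,2} norm
    return midpoint(lambda x: dy_n(n, x) ** 2, -1.0, 1.0)

for n in (10, 100):
    print(n, J(n), deriv_norm_sq(n), 2 * n / math.pi**2)
```

The functional values decrease while the derivative norms grow at least like $2n/\pi^2$, in line with the estimate above.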

3.3.1

a) The first variation of the first constraint yields $\delta K(u_n)u_n = 2K(u_n,u_n) = 2$, and $\delta K(u_n)u_i = 2K(u_n,u_i) = 0$ for $i = 1,\dots,n-1$. Due to linearity, the first variation of the last $n-1$ constraints is given by the functionals themselves, and in view of $K(u_i,u_k) = \delta_{ik}$ for $i,k = 1,\dots,n-1$, the $n$ isoperimetric constraints are not critical for $u_n$.

b) Scalar multiplication of the Euler-Lagrange equation by $u_k$ in $L^2(a,b)$ gives, by virtue of the $K$-orthonormality of the eigenfunctions,

$$\int_a^b((pu_n')' - qu_n)u_k\,dx = \frac{1}{2}\lambda_k.$$

On the other hand, scalar multiplication of the equation for the $k$-th eigenfunction by $u_n$ gives




$$\int_a^b u_n((pu_k')' - qu_k)\,dx = -\int_a^b u_n\lambda_k\rho u_k\,dx = 0.$$

After integration by parts twice and taking into account the homogeneous boundary conditions on $u_n$ and $u_k$, we see that both left sides above are equal, and hence $\lambda_k = 0$ for $k = 1,\dots,n-1$.

c) Scalar multiplication of the equation $(pu_n')' - qu_n = \lambda_n\rho u_n$ by $u_n$ and integration by parts yield

$$-\int_a^b p(u_n')^2 + qu_n^2\,dx = \lambda_n\int_a^b \rho u_n^2\,dx = \lambda_n, \quad\text{or}\quad -B(u_n) = \lambda_n.$$

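3.3.2 Let $S = \operatorname{span}\{u_n \mid n \in \mathbb{N}\}$ and let $\operatorname{cl}_X S$ denote the closure of some space $S$ in $X$. By Proposition 3.3.3, we know that $\operatorname{cl}_{W^{1,2}_0(a,b)}S = W^{1,2}_0(a,b)$. Since convergence in $W^{1,2}_0(a,b)$ implies convergence in $L^2(a,b)$, we have

$$W^{1,2}_0(a,b) = \operatorname{cl}_{W^{1,2}_0(a,b)}S \subset \operatorname{cl}_{L^2(a,b)}S \subset L^2(a,b).$$

For the closure of all spaces in $L^2(a,b)$, we obtain

$$L^2(a,b) = \operatorname{cl}_{L^2(a,b)}W^{1,2}_0(a,b) \subset \operatorname{cl}_{L^2(a,b)}S \subset L^2(a,b).$$

Hence all spaces are equal, and the system $\{u_n\}_{n\in\mathbb{N}}$ is complete in $L^2(a,b)$.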

3.3.3 We define

$$\tilde y'(x) = \begin{cases} y'(x) & \text{for } x \in [a,b],\\ 0 & \text{for } x \notin [a,b],\end{cases}$$

where $y'$ is the weak derivative of $y$. The functions $\tilde y$, $\tilde y'$ are in $L^2(c,d)$, and

$$\int_c^d \tilde y'h + \tilde yh'\,dx = \int_a^b y'h + yh'\,dx \quad\text{for all } h \in C_0^\infty(c,d).$$

In order to prove that $\tilde y'$ is the weak derivative of $\tilde y$ according to Definition 3.2.1, we have to show that the above integral is zero. If $\tilde h \in C_0^\infty(a,b)$, then $h\tilde h \in C_0^\infty(a,b)$, and since $y \in W^{1,2}(a,b)$, Definitions 3.2.1 and 3.2.2 imply

$$\int_a^b y'h\tilde h + y(h\tilde h)'\,dx = 0 = \int_a^b (y'h + yh')\tilde h + (yh)\tilde h'\,dx, \quad\text{or}$$

$$y'h + yh' \in L^2(a,b) \text{ is the weak derivative of } yh \in L^2(a,b),$$

i.e., $yh \in W^{1,2}(a,b)$.

Then by Lemma 3.2.3 and since $yh \in W^{1,2}(a,b)$, there follows

$$\int_a^b y'h + yh'\,dx = \int_a^b \frac{d}{dx}(yh)\,dx = y(b)h(b) - y(a)h(a) = 0,$$




due to $y(a) = y(b) = 0$. Therefore, $\tilde y \in W^{1,2}_0(c,d)$, due to $\tilde y(c) = \tilde y(d) = 0$. (It is an easy exercise that, according to Definition 3.2.1, $y'h + yh'$ is the weak derivative of $yh$.)

3.3.4 Denote the first eigenfunctions with weight functions $\rho_i$ by $u_1^i$. Then

$$B(u_1^1,u_1^2) = \lambda_1(\rho_1)K_1(u_1^1,u_1^2),$$

$$B(u_1^2,u_1^1) = \lambda_1(\rho_2)K_2(u_1^2,u_1^1).$$

By symmetry, the two left sides above are equal. Assuming that $\lambda_1(\rho_1) = \lambda_1(\rho_2)$, subtraction of the two equations yields

$$\lambda_1(\rho_1)\int_a^b(\rho_2 - \rho_1)u_1^1u_1^2\,dx = 0.$$

Under the hypotheses of Proposition 3.3.5, $\lambda_1(\rho_1) > 0$, and by Proposition 3.3.8, both first eigenfunctions are positive on $(a,b)$ (w.l.o.g.). Thus the above integral can vanish under assumption (3.3.28) only if $\rho_2 - \rho_1 = 0$. By Proposition 3.3.5, only $\lambda_1(\rho_1) > \lambda_1(\rho_2)$ is left as a possibility, yielding $\rho_1 \neq \rho_2$.





Index

A
action, 46, 83
areal velocity, 134
Arzelà-Ascoli Theorem, 159, 193

B
Banach space, 3
bifurcation, 171
bilinear form, 144
Bolzano-Weierstraß
  theorem of, 142
bootstrapping, 170, 173, 176
bound
  for a nonholonomic constraint, 105
brachistochrone, xiv, 29, 116
broken extremals, 3

C
catenary, 71
Cauchy-Schwarz inequality, 141, 156
Clairaut’s relation, 101
coercive, 144
coercivity, 143, 162
complete
  complete normed space, 3
conservation law, 88, 127, 130
conservative system, 46
constraining forces, 83
constraint
  critical, 58, 65
  holonomic, 77
  isoperimetric, 57, 61, 64
  nonholonomic, 103
continuous, 3
  sequential continuity, 3
  weakly lower semi-continuous, 143
  weakly sequentially continuous, 147
convergence, 3
  strong, 141
  weak, 141
convex, 13
  partial convexity, 13, 143, 162
critical
  for an isoperimetric constraint, 58, 62, 65
cycloid, 33

D
degrees of freedom, 83, 87
derivative
  distributional, 155
  weak, 155
Dido’s problem, xii, 27, 66
direct methods, 139
Dirichlet, 139
Dirichlet integral, 139

E
eigenvalue, 148
eigenvalue problem
  weak, 147
eigenvector, 148
  weak, 149
ellipticity, 24, 55, 154
energy
  effective potential, 134
  free, 46
  kinetic, 46, 82
  potential, 46, 82



  total, 46
Euler-Lagrange equation, 12
exchange of stability, 171

F
Fermat’s principle, xi
finite elements, 174
Fourier series, 150
Fréchet derivative, 16
Fredholm alternative, 174
free
  for a nonholonomic constraint, 105
function
  continuous, 1
  continuously differentiable, 1
  piecewise continuous, 8
  piecewise continuously differentiable, 1
functional, xv
  nonparametric, 48, 50
  parametric, xv, 40, 45, 49, 67
fundamental lemma of calculus of variations, 10

G
Gâteaux
  differentiable, 4
  differential, 4
Galerkin method, 170
generalized coordinates, 83, 87
geodesic, 96

H
Hölder continuous, 158
Hölder norm, 159
Hamiltonian, 87
  system, 87
harmonic function, 139
Hilbert space, 141

I
invariance
  of the Euler-Lagrange equations, 43
invariant, 40, 45, 65, 125

K
K-coercivity, 152
K-orthonormal, 149
Kepler’s laws, 136

L
Lagrange function, 1
Lagrange multiplier, 63, 77, 108, 190
Lagrangian, xv, 46, 83, 87
Laplace’s equation, 139
Lax-Milgram Theorem, 145
Lebesgue integral, 155
Lebesgue measure, 155
Legendre
  necessary condition due to Legendre, 16
  transformation, 87
Liouville’s theorem, 88, 190

M
manifold, 77, 187
maximizer, 11
minimal surface of revolution, 24
minimax principle, 153, 177
minimizer
  local, 11
  strong local, 11
minimizing sequence, 140
multiplicity
  algebraic multiplicity of an eigenvalue, 176
  geometric multiplicity of an eigenvalue, 150
multiplier rule of Lagrange, 189

N
natural boundary conditions, 37, 44, 46, 60, 63, 90, 94, 113, 120
norm, 2
normal
  for a nonholonomic constraint, 107
normal space, 187
Null Lagrangian, 47

O
orthogonal projection, 78, 187
orthonormal, 168

P
parametric differential equation, 32
parametric form, 40
parametric functional, xv, 40, 45, 49, 67
pendulum
  cycloidal, 85
  gravity, 83
  spherical, 95
piecewise continuously differentiable, 1



Poincaré inequality, 66, 161, 177
point mass, xiv, 46
Poisson bracket, 88
positive definite, 144
principle of least action, 46

Q
quasilinear, 12

R
Rayleigh quotient, 153, 177
regularity, 24
regularity theory, 154
reparametrization, 40
Riesz Representation Theorem, 141, 144
Riesz-Schauder theory, 174
Ritz method, 170, 174

S
sawtooth function, 19
Schauder basis, 150, 168
self-adjoint form, 174
self-organization
  of new states, 171
sequential weak compactness, 191
Snell’s law of refraction, xii, 34
Sobolev space, 156
spectrum, 153
spontaneous symmetry breaking, 171
stiffness matrix, 174
strong version
  of the Euler-Lagrange equation, 12
Sturm-Liouville
  boundary value problem, 172
  eigenvalue problem, 174
support of a function, 9
surface of revolution, 24
symmetric bilinear form, 144

T
tangent space, 187
test function, 9, 156
transversality, 114
  free, 123
  modified, 115
two-body problem, 131

V
variation
  first, 5
  second, 8

W
weak version
  of the Euler-Lagrange equation, 12
weakly differentiable, 155
Weierstraß counterexample, 18, 139
Weierstraß-Erdmann corner conditions, 52, 71
weight function, 174

Y
Young measure, 143
