Calculus of Variations and Partial
Differential Equations
Diogo Aguiar Gomes
Contents
Introduction 5
1. Finite dimensional optimization problems 9
1. Unconstrained minimization in Rn 10
2. Convexity 16
3. Lagrange multipliers 26
4. Linear programming 30
5. Non-linear optimization with constraints 37
6. Bibliographical notes 48
2. Calculus of variations in one independent variable 49
1. Euler-Lagrange Equations 50
2. Further necessary conditions 57
3. Applications to Riemannian geometry 60
4. Hamiltonian dynamics 75
5. Sufficient conditions 89
6. Symmetries and Noether theorem 105
7. Critical point theory 111
8. Invariant measures 116
9. Non convex problems 118
10. Geometry of Hamiltonian systems 119
11. Perturbation theory 122
12. Bibliographical notes 126
3. Calculus of variations and elliptic equations 127
1. Euler-Lagrange equation 129
2. Further necessary conditions and applications 136
3. Convexity and sufficient conditions 136
4. Direct method in the calculus of variations 136
5. Euler-Lagrange equations 145
6. Regularity by energy methods 146
7. Hölder continuity 155
8. Schauder estimates 171
4. Optimal control and viscosity solutions 183
1. Elementary examples and properties 186
2. Dynamic programming principle 188
3. Pontryagin maximum principle 190
4. The Hamilton-Jacobi equation 192
5. Verification theorem 193
6. Existence of optimal controls - bounded control space 195
7. Sub and superdifferentials 197
8. Optimal control in the calculus of variations setting 202
9. Viscosity solutions 214
10. Stationary problems 224
5. Duality theory 231
1. Model problems 231
2. Some informal computations 237
3. Duality 241
4. Generalized Mather problem 244
5. Monge-Kantorowich problem 266
Bibliography 269
Index 271
Introduction
This book is dedicated to the study of the calculus of variations and its connections and applications to partial differential equations. We have tried to survey a wide range of techniques and problems, discussing both classical results and more recent developments. This text is suitable for a first one-year graduate course on calculus of variations and optimal control, and is organized in the following way:
1. Finite dimensional optimization problems;
2. Calculus of variations with one independent variable;
3. Calculus of variations and elliptic partial differential equations;
4. Deterministic optimal control and viscosity solutions;
5. Duality theory.
The first chapter is dedicated to finite dimensional optimization, with emphasis on techniques that can be generalized and applied to infinite dimensional problems. This chapter starts with an elementary discussion of unconstrained optimization in Rn and convexity. Then we discuss constrained optimization problems, linear programming and the KKT conditions. The following chapter concerns variational problems
with one independent variable. We study classical results including
applications to Riemannian geometry and classical mechanics. We also
discuss sufficient conditions for minimizers, Hamiltonian dynamics and
several other related topics. The next chapter concerns variational
problems with functionals defined through multiple integrals. In many
of these problems, the Euler-Lagrange equation is an elliptic partial
differential equation, possibly non-linear. Using the direct method in
the calculus of variations, we prove the existence of minimizers. Then
we show that the minimum is a weak solution to the Euler-Lagrange
equation and study its regularity. The study of regularity follows the
classical path: first we consider energy methods, then we prove the De
Giorgi-Nash-Moser estimates and finally Schauder estimates. In the
fourth chapter we consider optimal control problems. We study both
classical control theory methods such as the dynamic programming
and Pontryagin maximum principle, as well as more recent tools such
as viscosity solutions of Hamilton-Jacobi equations. The last chap-
ter is a brief introduction to the (infinite dimensional) duality theory
and its applications to non-linear partial differential equations. We
study Mather’s problem and Monge-Kantorowich optimal mass trans-
port problem. These have important relations with Hamilton-Jacobi
and Monge-Ampere equations, respectively.
The prerequisites of these notes are some familiarity with Sobolev spaces and functional analysis, at the level of [Eva98b]. With a few exceptions, we do not assume familiarity with partial differential equations beyond elementary theory.
Many of the results discussed, as well as important extensions, can be found in the bibliography. Concerning finite dimensional optimization and linear programming, the main reference is [Fra02]. On variational problems with one independent variable, a key reference is [AKN97]. The approach to elliptic equations in chapter 3 was strongly influenced by the course by Fraydoun Rezakhanlou that the author attended at the University of California at Berkeley, by the (unpublished) notes on elliptic equations by my advisor L. C. Evans, and by the book [Gia83]. The books [GT01] and [Gia93] are also classical references in this area. Optimal control problems are discussed in chapter 4. The main references are [Eva98b], [Lio82], [Bar94], [FS93], [BCD97]. The last chapter concerns duality theory. We recommend the books [Eva99], [Vil03a], [Vil], as well as the author's papers [Gom00], [Gom02b].
I would like to thank my students: Tiago Alcaria, Patrícia Engracia, Sílvia Guerra, Igor Kravchenko, Anabela Pelicano, Ana Rita Pires, Verónica Quítalo, Lucian Radu, Joana Santos, Ana Santos, and Vitor Saraiva, who took courses based on part of these notes and suggested several corrections and improvements. My friend Pedro Girão deserves special thanks, as he read the first LaTeX version of these notes and suggested many corrections and improvements.
1. Finite dimensional optimization problems
This chapter is an introduction to optimization problems in finite
dimension. We are certain that many of the results discussed, as well as their proofs, are familiar to the reader. However, we feel that it is instructive to recall them and, throughout this text, observe how they can be adapted to infinite dimensional problems. The plan of this chapter is
the following: we start in §1 by considering unconstrained minimization
problems in Rn, we discuss existence and uniqueness of minimizers, as
well as first and second order tests for minimizers. The following sec-
tion, §2, concerns properties of convex functions which will be needed
throughout the text. We start the discussion of constrained optimiza-
tion problems in §3 by studying the Lagrange multiplier method for
equality constraints. Then, the general case involving both equality and inequality constraints is discussed in the two remaining sections. In
§4 we consider linear programming problems, and in §5 we discuss non-
linear optimization problems and we derive the Karush-Kuhn-Tucker
(KKT) conditions. The chapter ends with a few bibliographical refer-
ences.
The general setting of optimization problems is the following: given
a function f : Rn → R and a set X ⊂ Rn, called the admissible set, we
would like to solve the following minimization problem
(1)
min f(x)
x ∈ X,
i.e. to find the solution set S ⊂ X such that
f(y) = infX f,
for all y ∈ S. We should note that the "min" in (1) should be read "minimize" rather than "minimum", as the minimum may not be achieved. The number infX f is called the value of problem (1).
1. Unconstrained minimization in Rn
In this section we address the unconstrained minimization case,
that is, the case in which the admissible set X is Rn. Let f : Rn → R be an arbitrary function. We look for conditions on f that
• ensure the existence of a minimum;
• show that this minimum is unique.
In many instances, existence and uniqueness results are not enough:
we would also like to
• determine necessary or sufficient conditions for a point to be a
minimum;
• estimate the location of a possible minimum.
By looking for all points that satisfy necessary conditions one can
determine a set of candidate minimizers. Then, by looking at sufficient
conditions one may in fact be able to show that some of these points
are indeed minimizers.
To study the existence of a minimum of f , we can use the following
procedure, called the direct method of the calculus of variations: let
(xn) be a minimizing sequence, that is, a sequence such that
f(xn)→ inf f.
Proposition 1. Let A be an arbitrary set and f : A→ R. Then there
exists a minimizing sequence.
Proof. If infA f = −∞, there exists xn ∈ A such that f(xn) → −∞. Otherwise, if infA f > −∞, we can always find xn ∈ A such that infA f ≤ f(xn) ≤ infA f + 1/n, which again produces a minimizing sequence.
Let f : Rn → R. Suppose (xn) is a minimizing sequence for f. If xn (or some subsequence) converges to a point x and if, additionally, f(xn) converges to f(x), then x is a minimum of f because
f(x) = lim f(xn),
and
lim f(xn) = inf f,
because xn is a minimizing sequence. Thus f(x) = inf f . Although
minimizing sequences always exist, they may fail to converge, even up
to subsequences, as the next exercise illustrates:
Exercise 1. Consider the function f(x) = e^{−x}. Compute inf f and give an example of a minimizing sequence. Show that no minimizing sequence for f converges.
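Numerically, the failure of convergence in Exercise 1 is easy to see: f(xn) tends to the infimum while the sequence itself escapes to infinity. A quick Python sketch (the choice xn = n is one valid minimizing sequence):

```python
import math

def f(x):
    # f(x) = e^(-x); inf f = 0, but the infimum is not attained.
    return math.exp(-x)

# x_n = n is a minimizing sequence: f(x_n) -> 0 = inf f,
# yet x_n diverges, so no subsequence can converge.
xs = [float(n) for n in range(1, 51)]
values = [f(x) for x in xs]

print(values[-1])   # f(50) = e^(-50), already tiny
print(xs[-1])       # the sequence itself runs off to infinity
```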
As the previous exercise suggests, to ensure convergence it is nat-
ural to impose certain compactness conditions. In Rn, any bounded
sequence (xn) has a convergent subsequence. A convenient condition
on f that ensures boundedness of minimizing sequences is coercivity:
a function f : Rn → R is called coercive if f(x)→ +∞, as |x| → ∞.
Exercise 2. Let f be a coercive function and let xn be a sequence such
that f(xn) is bounded. Show that xn is bounded. Note in particular
that if f(xn) is convergent then xn is bounded.
Therefore, from the previous exercise, it follows:
Proposition 2. Let f : Rn → R be a coercive function and let (xn) be a minimizing sequence for f. Then there exists a point x for which, through some subsequence, xn → x.
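A short Python sketch of the mechanism behind Proposition 2, with an illustrative coercive function and minimizing sequence of our own choosing: the values f(xn) converge, so by Exercise 2 the sequence stays bounded, and here it converges to the minimizer.

```python
import math

# Coercivity in action: f(x) = (x - 2)^2 is coercive, so any minimizing
# sequence is bounded and, along a subsequence, converges (here to x = 2).
def f(x):
    return (x - 2.0) ** 2

# Build a minimizing sequence as in Proposition 1: pick x_n with
# f(x_n) <= inf f + 1/n (inf f = 0); x_n = 2 + 1/sqrt(2n) works since
# f(x_n) = 1/(2n) <= 1/n.
seq = [2.0 + 1.0 / math.sqrt(2 * n) for n in range(1, 10001)]

assert all(abs(x) <= 3.0 for x in seq)   # bounded, as Exercise 2 predicts
print(seq[-1])                            # close to the minimizer x = 2
```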
Unfortunately, if f is discontinuous at x, f(xn) may fail to converge
to f(x). This poses a problem because if xn is a minimizing sequence
f(xn) → inf f and if this limit is not f(x) then x cannot be a mini-
mizer. It would, therefore, seem natural to require f to be continuous.
However, to establish that x is a minimizer we do not really need con-
tinuity. In fact, a weaker property is sufficient: it is enough that for
any sequence (xn) converging to x the following inequality holds:
(2) lim inf f(xn) ≥ f(x).
A function f is called lower semicontinuous if inequality (2) holds for
any point x and any sequence xn converging to x.
Example 1. The function
f(x) = { 1 if x ≠ 0;  0 if x = 0 }
is lower semicontinuous. However,
g(x) = { 0 if x ≠ 0;  1 if x = 0 }
is not. J
ADD HERE GRAPH OF FUNCTIONS
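The two functions of Example 1 can be tested directly against inequality (2) along a sequence converging to 0; a small Python check:

```python
def f(x):
    return 1.0 if x != 0 else 0.0   # lower semicontinuous at 0

def g(x):
    return 0.0 if x != 0 else 1.0   # not lower semicontinuous at 0

# Test inequality (2) along the sequence x_n = 1/n -> 0.
seq = [1.0 / n for n in range(1, 100)]

liminf_f = min(f(x) for x in seq)   # values are constant, so min = lim inf
liminf_g = min(g(x) for x in seq)

print(liminf_f >= f(0))   # True: (2) holds for f
print(liminf_g >= g(0))   # False: (2) fails for g
```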
Proposition 3. Let f : Rn → R be lower semicontinuous and let
(xn) ⊂ Rn be a minimizing sequence converging to x ∈ Rn. Then x is
a minimizer of f .
Proof. Let xn be a minimizing sequence. Then
inf f = lim f(xn) = lim inf f(xn) ≥ f(x),
that is, f(x) ≤ inf f .
Lower semicontinuity is a weaker property than continuity, and is therefore easier to satisfy.
Establishing the uniqueness of minimizers is, in general, more complex. A convenient condition that implies uniqueness of minimizers is convexity.
A set A ⊂ Rn is convex if for all x, y ∈ A and any 0 ≤ λ ≤ 1 we have λx + (1 − λ)y ∈ A. Let A be a convex set. A function f : A → R is convex if, for any x, y ∈ A and 0 ≤ λ ≤ 1,
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y),
and it is uniformly convex if there exists θ > 0 such that for all x, y ∈ A and 0 ≤ λ ≤ 1,
f(λx + (1 − λ)y) + θλ(1 − λ)|x − y|2 ≤ λf(x) + (1 − λ)f(y).
Example 2. Let ‖ · ‖ be any norm in Rn. Then, by the triangle
inequality
‖λx+ (1− λ)y‖ ≤ ‖λx‖+ ‖(1− λ)y‖ = λ‖x‖+ (1− λ)‖y‖,
for all 0 ≤ λ ≤ 1. Thus the mapping x 7→ ‖x‖ is convex. J
Exercise 3. Show that the square of the Euclidean norm in Rd, ‖x‖2 = Σk xk^2, is uniformly convex.
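For the squared Euclidean norm, the uniform convexity of Exercise 3 in fact holds as an identity with θ = 1; the following Python sketch verifies this on random data:

```python
import random

def sqnorm(v):
    return sum(t * t for t in v)

random.seed(0)
# For the squared Euclidean norm the uniform convexity inequality holds
# with theta = 1, and in fact with equality:
#   |lam*x + (1-lam)*y|^2 + lam*(1-lam)*|x - y|^2
#       = lam*|x|^2 + (1-lam)*|y|^2.
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    lam = random.random()
    mid = [lam * a + (1 - lam) * b for a, b in zip(x, y)]
    diff = [a - b for a, b in zip(x, y)]
    lhs = sqnorm(mid) + lam * (1 - lam) * sqnorm(diff)
    rhs = lam * sqnorm(x) + (1 - lam) * sqnorm(y)
    assert abs(lhs - rhs) < 1e-9
print("identity verified")
```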
Proposition 4. Let A ⊂ Rn be a convex set and f : A→ R be a convex
function. If x and y are minimizers of f then so is λx+ (1− λ)y, for
any 0 ≤ λ ≤ 1. If f is uniformly convex then x = y.
Proof. If x and y are minimizers then f(x) = f(y) = min f .
Consequently, by convexity
f(λx+ (1− λ)y) ≤ λf(x) + (1− λ)f(y) = min f.
Therefore λx+ (1− λ)y is a minimizer of f . If f is uniformly convex,
and choosing 0 < λ < 1, we obtain
f(λx+ (1− λ)y) + θλ(1− λ)|x− y|2 ≤ min f,
which implies x = y.
The characterization of minimizers through necessary or sufficient conditions is usually done by introducing conditions that involve first or second derivatives. Let f : Rn → R be a C2 function. Recall that Df and D2f denote, respectively, the first and second derivatives of f. We also write, for an n × n matrix A, A ≥ 0 if A is positive semidefinite and A > 0 if A is positive definite. The next proposition is a well known result that illustrates this.
Proposition 5. Let f : Rn → R be a C2 function and x a minimizer
of f . Then
Df(x) = 0 and D2f(x) ≥ 0.
Proof. For any vector y ∈ Rn and ε > 0 we have
0 ≤ f(x+ εy)− f(x) = εDf(x)y +O(ε2),
dividing by ε, and letting ε→ 0, we obtain
Df(x)y ≥ 0.
Since y is arbitrary we conclude that:
Df(x) = 0.
In a similar way,
0 ≤ [f(x + εy) + f(x − εy) − 2f(x)]/ε^2 = yTD2f(x)y + o(1),
and so, when ε → 0, we obtain
yTD2f(x)y ≥ 0.
Let f : Rn → R be a C1 function. A point x is called a critical
point of f if Df(x) = 0.
Exercise 4. Let A be any set and f : A → R be a C1 function in
the interior intA of A. Show that any maximizer or minimizer of f is
either a critical point or lies on the boundary ∂A of A.
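The two conditions of Proposition 5 can be checked by finite differences; the quadratic below is an illustrative choice with a known minimizer:

```python
def f(x, y):
    # Smooth function with unique minimizer at (1, -2).
    return (x - 1) ** 2 + 3 * (y + 2) ** 2

h = 1e-5
x0, y0 = 1.0, -2.0

# First order condition: Df(x0) = 0 (central differences).
dfx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
dfy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)

# Second order condition: y^T D^2 f(x0) y >= 0 along a few directions,
# estimated by the second difference quotient used in Proposition 5.
dirs = [(1, 0), (0, 1), (1, 1), (1, -2)]
quotients = [
    (f(x0 + h * a, y0 + h * b) + f(x0 - h * a, y0 - h * b) - 2 * f(x0, y0)) / h**2
    for a, b in dirs
]

print(abs(dfx) < 1e-6, abs(dfy) < 1e-6)   # gradient vanishes
print(all(q >= 0 for q in quotients))      # Hessian quadratic form >= 0
```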
We will now show that any critical point of a convex function is a
minimizer. For that we need the following preliminary result:
Proposition 6. Let f : Rn → R be a C1 convex function. Then, for
any x, y we have
f(y) ≥ f(x) +Df(x)(y − x).
Proof. We have
(1−λ)f(x)+λf(y) ≥ f(x+λ(y−x)) = f(x)+λDf(x)(y−x)+o(|λ(y−x)|).
Thus, reorganizing the inequality and dividing by λ we obtain
f(y) ≥ f(x) +Df(x)(y − x) + o(1),
as λ→ 0.
We can use now this result to prove:
Proposition 7. Let f : Rn → R be a C1 convex function and x a
critical point of f . Then x is a minimizer of f .
Proof. Since Df(x) = 0 and f is convex, it follows from proposi-
tion 6 that
f(y) ≥ f(x),
for all y.
Exercise 5. Let f(x, λ) : Rn × Rm → R be a C2 function, x0 a minimizer of f(·, 0), with D2xx f(x0, 0) positive definite. Show that, for each λ in a neighborhood of λ = 0, there exists a unique local minimizer xλ of f(·, λ) with xλ|λ=0 = x0. Compute Dλxλ at λ = 0.
Growth conditions on f can be used to estimate the norm of a
minimizer. In finite dimensional problems, estimates on the norm of a
minimizer are important for numerical methods. For instance, if such an estimate exists, it makes it possible to localize the search region for a minimizer. In infinite dimensional problems this issue is even more relevant, as will become clear later in these notes. An elementary result is
given in the next exercise:
Exercise 6. Let f : Rn → R be such that f(x) ≥ C1|x|^2 + C2, with C1 > 0. Let x0 be a minimizer of f. Show that
|x0| ≤ √((f(y) − C2)/C1),
for any y ∈ Rn.
Exercise 7. Let f(x, λ) : R2 → R be a continuous function. Suppose
for each λ there is at least one minimizer xλ of x 7→ f(x, λ). Suppose
there exists C such that |xλ| ≤ C for all λ in a neighborhood of λ = 0.
Suppose that for λ = 0 there exists a unique minimizer x0. Show that
limλ→0 xλ = x0.
Exercise 8. Let f ∈ C1(R2). Define u(x) = inf_{y∈R} f(x, y). Suppose that
lim_{|y|→∞} f(x, y) = +∞,
uniformly in x. Let x0 be a point at which the infimum in y of f is achieved at a single point y0. Show that u is differentiable in x at x0 and that
∂u/∂x (x0) = ∂f/∂x (x0, y0).
Give an example that shows that u may fail to be differentiable if the infimum of f in y is achieved at more than one point.
Exercise 9. Find all maxima and minima (both local and global) of
the function xy(1− x2 − y2) on the square −1 ≤ x, y ≤ 1.
2. Convexity
As we discussed in the previous section, convexity is a central prop-
erty in optimization. In this section we discuss additional properties of
convex functions which will be necessary in the sequel.
2.1. Characterization of convex functions. We now discuss
several tools that are useful to characterize convex functions. We first
observe that given a family of convex functions it is possible to build
another convex function by taking the pointwise supremum. This is a
useful construction and is illustrated in the following figure.
ADD FIGURE HERE
Proposition 8. Let I be an arbitrary set and fι : Rn → R, ι ∈ I, an indexed collection of convex functions. Let
f(x) = sup_{ι∈I} fι(x).
Then f is convex.
Proof. Let x, y ∈ Rn and 0 ≤ λ ≤ 1. Then
f(λx + (1 − λ)y) = sup_{ι∈I} fι(λx + (1 − λ)y) ≤ sup_{ι∈I} [λfι(x) + (1 − λ)fι(y)]
≤ sup_{ι1∈I} λfι1(x) + sup_{ι2∈I} (1 − λ)fι2(y) = λf(x) + (1 − λ)f(y).
Corollary 9. Suppose f : Rn → R is a C1 function satisfying
f(y) ≥ f(x) + Df(x)(y − x),
for all x and y. Then f is convex.
Proof. It suffices to observe that
f(y) ≥ sup_{x∈Rn} [f(x) + Df(x)(y − x)],
where the right-hand side, by proposition 8, is a convex function of y. Finally, we just observe that (taking x = y inside the supremum)
sup_{x∈Rn} [f(x) + Df(x)(y − x)] ≥ f(y),
and so equality follows.
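The identity behind Corollary 9, that a convex C1 function is the supremum of its tangent planes, can be sampled numerically; here with the illustrative choice f(x) = x²:

```python
def f(x):
    return x * x          # convex, C^1

def df(x):
    return 2 * x

# Corollary 9 rests on f(y) = sup_x [ f(x) + f'(x)(y - x) ]:
# every tangent line lies below the graph, and the tangent at x = y
# attains the value f(y).
grid = [i / 10 for i in range(-50, 51)]    # includes every y we test

for y in [-3.0, -0.7, 0.0, 1.2, 4.0]:
    tangents = [f(x) + df(x) * (y - x) for x in grid]
    assert max(tangents) <= f(y) + 1e-9            # no tangent exceeds the graph
    assert abs(max(tangents) - f(y)) < 1e-9        # sup attained at x = y
print("tangent-line envelope verified")
```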
Proposition 10. Let f : Rn → R be a C2 function. Then f is convex
if and only if D2f(x) is positive semi-definite, for all x ∈ Rn.
Proof. Observe that if f is convex then for any y ∈ Rn and any ε ≥ 0 we have
[f(x − εy) + f(x + εy) − 2f(x)]/ε^2 ≥ 0.
By sending ε → 0 and using Taylor's formula we conclude
yTD2f(x)y ≥ 0,
and so D2f(x) is positive semidefinite.
Conversely,
f(y) − f(x) = ∫0^1 Df(x + s(y − x))(y − x) ds
= Df(x)(y − x) + ∫0^1 [Df(x + s(y − x))(y − x) − Df(x)(y − x)] ds
= Df(x)(y − x) + ∫0^1 ∫0^1 s(y − x)TD2f(x + ts(y − x))(y − x) dt ds
≥ Df(x)(y − x),
since (y − x)TD2f(x + ts(y − x))(y − x) ≥ 0, by the positive semidefiniteness hypothesis. Convexity then follows from corollary 9.
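A numerical companion to Proposition 10, with an illustrative function whose Hessian can be written in closed form; we test both positive semidefiniteness (via the 2×2 leading minors) and the defining convexity inequality:

```python
import math, random

def f(x, y):
    # Hessian: [[exp(x) + 2, -1], [-1, 2]]
    return math.exp(x) + x * x + y * y - x * y

def hessian(x, y):
    return (math.exp(x) + 2.0, -1.0, 2.0)   # (fxx, fxy, fyy)

# Proposition 10: f is convex iff D^2 f(x) >= 0 everywhere.
# A symmetric 2x2 matrix is PSD iff fxx >= 0 and its determinant is >= 0.
random.seed(1)
for _ in range(500):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    fxx, fxy, fyy = hessian(x, y)
    assert fxx >= 0 and fxx * fyy - fxy * fxy >= 0

# Cross-check against the defining inequality of convexity.
for _ in range(500):
    p = (random.uniform(-3, 3), random.uniform(-3, 3))
    q = (random.uniform(-3, 3), random.uniform(-3, 3))
    lam = random.random()
    m = (lam * p[0] + (1 - lam) * q[0], lam * p[1] + (1 - lam) * q[1])
    assert f(*m) <= lam * f(*p) + (1 - lam) * f(*q) + 1e-9
print("Hessian PSD and convexity inequality both hold")
```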
Proposition 11. Let f : Rn → R be a continuous function. Then f is convex if and only if
(3) f(x + y) + f(x − y) − 2f(x) ≥ 0,
for any x, y ∈ Rn.
Proof. Clearly convexity implies (3). Let x, y ∈ Rn and 0 ≤ λ ≤ 1 be such that λx + (1 − λ)y = z. We must prove that
(4) λf(x) + (1 − λ)f(y) ≥ f(z)
holds. We claim that the previous inequality holds for any λ = k/2^j, for any 0 ≤ k ≤ 2^j. Clearly (4) holds when j = 1. Now we proceed by induction on j. Assume that (4) holds for λ = k/2^j. Then we claim that it holds with λ = k/2^(j+1). If k is even we can reduce the fraction, therefore we may suppose that k is odd, λ = k/2^(j+1) and λx + (1 − λ)y = z. Now note that
z = (1/2)[(k − 1)/2^(j+1) x + (1 − (k − 1)/2^(j+1)) y] + (1/2)[(k + 1)/2^(j+1) x + (1 − (k + 1)/2^(j+1)) y].
Thus, by (3),
f(z) ≤ (1/2) f((k − 1)/2^(j+1) x + (1 − (k − 1)/2^(j+1)) y) + (1/2) f((k + 1)/2^(j+1) x + (1 − (k + 1)/2^(j+1)) y),
but, since k − 1 and k + 1 are even, k0 = (k − 1)/2 and k1 = (k + 1)/2 are integers. Hence, by the induction hypothesis,
f(z) ≤ (1/2) f(k0/2^j x + (1 − k0/2^j) y) + (1/2) f(k1/2^j x + (1 − k1/2^j) y)
≤ (k0 + k1)/2^(j+1) f(x) + (1 − (k0 + k1)/2^(j+1)) f(y).
Since k0 + k1 = k we get
f(z) ≤ k/2^(j+1) f(x) + (1 − k/2^(j+1)) f(y).
Since f is continuous and the rationals of the form k/2^j are dense in [0, 1], we conclude that
f(z) ≤ λf(x) + (1 − λ)f(y),
for any real 0 ≤ λ ≤ 1.
Exercise 10. Let f : Rn → R be a C2 function. Show that the following statements are equivalent:
1. f is uniformly convex;
2. D2f ≥ γ > 0, for some γ > 0;
3. f((x + y)/2) + θ|x − y|^2/4 ≤ (f(x) + f(y))/2, for some θ > 0;
4. f(y) ≥ f(x) + Df(x)(y − x) + (γ/2)|x − y|^2, for some γ > 0.
Exercise 11. Let ϕ : R → R be a non-decreasing convex function, and ψ : Rn → R a convex function. Show that ϕ ◦ ψ is convex. Show, by giving an example, that if ϕ is not non-decreasing then ϕ ◦ ψ may fail to be convex.
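A concrete instance of Exercise 11, with the hypothetical choices ϕ(t) = e^t (non-decreasing, convex) versus ϕ(t) = −t (decreasing), both composed with ψ(x) = x²:

```python
import math

# phi(t) = exp(t) is convex and non-decreasing, psi(x) = x^2 is convex,
# so (phi∘psi)(x) = exp(x^2) is convex.  With the decreasing convex
# phi(t) = -t we get (phi∘psi)(x) = -x^2, which is concave, not convex.
def midpoint_convex(h, a, b):
    return h((a + b) / 2) <= (h(a) + h(b)) / 2 + 1e-12

comp_ok = all(
    midpoint_convex(lambda x: math.exp(x * x), a / 7, b / 7)
    for a in range(-20, 21) for b in range(-20, 21)
)
comp_bad = midpoint_convex(lambda x: -(x * x), 0.0, 2.0)

print(comp_ok)    # True
print(comp_bad)   # False: -x^2 violates midpoint convexity
```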
2.2. Lipschitz continuity. Convex functions enjoy remarkable
properties. We will first show that any convex function is locally
bounded and Lipschitz.
Proposition 12. Let f : Rd → R be a convex function. Then f is locally bounded and locally Lipschitz.
Proof. For x ∈ Rd denote |x|_1 = Σk |xk|. Define X_M = {x ∈ Rd : |x|_1 ≤ M}. We will prove that f is bounded on X_{M/8}.
Any point x ∈ X_M can be written as a convex combination of the points ±Mek, where ek is the k-th standard unit vector. Thus
f(x) ≤ max_k {f(Mek), f(−Mek)}.
Suppose now f is not bounded from below on X_{M/8}. Then there exists a sequence xn ∈ X_{M/8} such that f(xn) → −∞. Choose a point y ∈ X_{M/4} ∩ X_{M/8}^c. Note that 2y − xn ∈ X_M. Therefore we can write 2y − xn as a convex combination of the points ±Mek, i.e.
y = (1/2)xn + (1/2) Σk Σ± λk^± (±Mek).
Thus
f(y) ≤ (1/2)f(xn) + (1/2) max_k {f(Mek), f(−Mek)},
which is a contradiction if f(xn) → −∞.
Now we will show the second part of the proposition, i.e., that any convex function is also locally Lipschitz. By contradiction, changing coordinates if necessary, we can assume that 0 is not a Lipschitz point, that is, there exists a sequence xn → 0 such that
|f(xn) − f(0)| ≥ C|xn|,
for all C and all n large enough. In particular this implies that
lim sup_{n→∞} (f(xn) − f(0))/|xn| ∈ {−∞, +∞},
and, similarly,
lim inf_{n→∞} (f(xn) − f(0))/|xn| ∈ {−∞, +∞}.
By the previous part of the proof, we can assume that f is bounded on X_1. For each n choose a point yn with |yn|_1 = 1 such that xn = |xn|_1 yn. Then
f(xn) ≤ |xn|f(yn) + (1 − |xn|)f(0),
which implies
f(yn) ≥ f(0) + (f(xn) − f(0))/|xn|.
Therefore
(5) lim sup_{n→∞} (f(xn) − f(0))/|xn| = −∞,
otherwise we would have a contradiction (note that f(yn) is bounded).
We can also write 0 = (1/(1 + |xn|))xn − (|xn|/(1 + |xn|))yn. So
f(0) ≤ (1/(1 + |xn|))f(xn) + (|xn|/(1 + |xn|))f(−yn).
This implies
f(−yn) ≥ f(0) + (f(0) − f(xn))/|xn|.
Because f(−yn) is bounded,
lim sup_{n→∞} (f(0) − f(xn))/|xn| = −∞,
which contradicts (5).
2.3. Separation. In this last subsection we study separation prop-
erties that arise from convexity and present some applications.
Proposition 13. Let C be a closed convex set not containing the ori-
gin. Then there exists x0 ∈ C which minimizes |x| over all x ∈ C.
Proof. Consider a minimizing sequence xn. By a simple computation, we have the parallelogram identity
‖(xn + xm)/2‖^2 + (1/4)‖xn − xm‖^2 = (1/2)‖xn‖^2 + (1/2)‖xm‖^2.
Because (xn + xm)/2 ∈ C, by convexity, we have the inequality
‖(xn + xm)/2‖^2 ≥ inf_{y∈C} ‖y‖^2.
As n, m → ∞ we also have
‖xn‖^2, ‖xm‖^2 → inf_{y∈C} ‖y‖^2.
But then, as n, m → ∞, we conclude that
‖xn − xm‖^2 → 0.
Therefore any minimizing sequence is a Cauchy sequence and hence convergent; since C is closed, its limit belongs to C.
Exercise 12. Let F : Rn → R be a uniformly convex function. Show that any minimizing sequence for F is a Cauchy sequence. Hint:
F(xn) + F(xm) − 2 inf F ≥ F(xn) + F(xm) − 2F((xn + xm)/2) ≥ (θ/2)|xn − xm|^2.
Proposition 14. Let U and V be disjoint closed convex sets, and suppose one of them is compact. Then there exist w ∈ Rn and a > 0 such that
(w, x − y) ≥ a > 0,
for all x ∈ U and y ∈ V.
Proof. Consider the closed convex set W = U − V (this set is closed because either U or V is compact). Then there exists a point w ∈ W with minimal norm. Since 0 ∉ W, w ≠ 0. So, for all x ∈ U and y ∈ V, by the convexity of W,
‖w‖^2 ≤ ‖λ(x − y) + (1 − λ)w‖^2 = (1 − λ)^2‖w‖^2 + 2λ(1 − λ)(x − y, w) + λ^2‖x − y‖^2.
The last inequality implies
0 ≤ ((1 − λ)^2 − 1)‖w‖^2 + 2λ(1 − λ)(x − y, w) + λ^2‖x − y‖^2.
Dividing by λ and letting λ → 0, we obtain
(x − y, w) ≥ ‖w‖^2 > 0.
As a first application of the separation result we discuss a generalization of derivatives for convex functions. The subdifferential ∂−f(x)
of a convex function f : Rn → R at a point x ∈ Rn is the set of vectors
p ∈ Rn such that
f(y) ≥ f(x) + p · (y − x),
for all y ∈ Rn.
Proposition 15. Let f : Rn → R be a convex function and x0 ∈ Rn.
Then ∂−f(x0) 6= ∅.
Proof. Consider the set E(f) = {(x, y) ∈ Rn+1 : y ≥ f(x)}, the epigraph of f. Then, because f is convex and hence continuous, E(f) is a closed convex set. Consider the sequence yn = f(x0) − 1/n. Because, for each n, the sets E(f) and {(x0, yn)} are disjoint closed convex sets, and the second one is compact, there is a separating plane:
(6) f(x) ≥ αn(x − x0) + βn,
for all x, and
(7) f(x0) − 1/n = yn ≤ βn ≤ f(x0).
Thus, from (7) we get that βn is bounded. Since f is locally bounded, the inequality (6) implies the boundedness of αn. Therefore, up to a subsequence, there exist α = lim αn and β = lim βn. Furthermore,
f(x) ≥ α(x − x0) + β,
and, again using (7), we get that β = f(x0). Thus
f(x) ≥ α(x − x0) + f(x0),
and so α ∈ ∂−f(x0).
Exercise 13. Let f : R → R be given by f(x) = |x|. Compute ∂−f.
Exercise 14. Let f : Rn → R be convex. Show that if f is differentiable at x ∈ Rn then ∂−f(x) = {Df(x)}.
Proposition 16. Let f : Rn → R be a C1 convex function. Then
(Df(x)−Df(y)) · (x− y) ≥ 0.
Proof. Observe that
f(y) ≥ f(x) + Df(x) · (y − x)  and  f(x) ≥ f(y) + Df(y) · (x − y),
and add these two inequalities.
Exercise 15. Prove the analogue of the previous proposition for the case in which f is not C1, replacing derivatives by points in the subdifferential.
Exercise 16. Let f be a uniformly convex function. Show that
(Df(x)−Df(y)) · (x− y) ≥ γ|x− y|2.
Exercise 17. Let f : Rn → R be a convex function. Show that a point
x ∈ Rn is a minimizer of f if and only if 0 ∈ ∂−f(x).
Exercise 18. Let A be a convex set and f : A → R be a uniformly
convex function. Let x ∈ A be a maximizer of f . Show that x is
an extreme point, that is, that there are no y, z ∈ A, x 6= y, z and
0 < λ < 1 such that x = λy + (1− λ)z.
The second application of Proposition 14 is a very important result
called Farkas lemma:
Lemma 17 (Farkas Lemma). Let A be an m × n matrix and c a row vector in Rn. Then exactly one of the following alternatives holds:
1. c = yTA, for some y ≥ 0;
2. there exists a column vector w ∈ Rn such that Aw ≤ 0 and cw > 0.
Proof. If the first alternative does not hold, the sets U = {yTA : y ≥ 0} and V = {c} are disjoint and convex, and V is compact. Then the separation theorem for convex sets (see proposition 14) implies that there exists a hyperplane with normal w which separates them, that is,
(8) yTAw ≤ a
for all y ≥ 0, and
cw > a.
Note that a ≥ 0 (by setting y = 0 in (8)), so cw > 0. Furthermore, for any γ ≥ 0 we have
γyTAw ≤ a;
by letting γ → +∞ we conclude that
yTAw ≤ 0,
for all y ≥ 0, and hence Aw ≤ 0. So this corresponds to the second alternative.
Example 3. Consider a discrete state one-period pricing model: we are given n assets which at the initial time cost c_i, 1 ≤ i ≤ n, per unit (we regard c as a row vector), and after one unit of time asset i is worth P_{ji} with probability p_j, 1 ≤ j ≤ m. A portfolio is a (column) vector π ∈ Rn. The value of the portfolio at time 0 is cπ and, at time one, with probability p_j, its value is (Pπ)_j. An arbitrage opportunity is a portfolio such that cπ < 0 and (Pπ)_j ≥ 0 for all j, i.e. a portfolio with negative cost and non-negative return.
Farkas lemma yields that either
1. there exists y ∈ Rm, y ≥ 0, such that c = yTP,
or
2. there exists an arbitrage portfolio.
Furthermore, if one of the assets is a no-interest bearing bank account, for instance c_1 = 1 and P_{j1} = 1 for all j, then y is a probability vector, which in general may be different from p. J
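A toy instance of Example 3 (the price vector and payoff matrix below are hypothetical): with two assets and two states the system c = yTP is 2×2 and can be solved by hand; a non-negative solution y means there is no arbitrage.

```python
# A hypothetical two-asset, two-state market: prices c = (1, 2); the payoff
# matrix P has rows indexed by the state j, so asset i pays P[j][i].
# Asset 1 is a bank account paying 1 in every state.
c = (1.0, 2.0)
P = [[1.0, 1.0],
     [1.0, 3.0]]

# Alternative 1 of Farkas lemma: solve c = y^T P for y (Cramer's rule).
#   y1*P[0][0] + y2*P[1][0] = c[0]
#   y1*P[0][1] + y2*P[1][1] = c[1]
det = P[0][0] * P[1][1] - P[1][0] * P[0][1]
y1 = (c[0] * P[1][1] - P[1][0] * c[1]) / det
y2 = (P[0][0] * c[1] - c[0] * P[0][1]) / det

print(y1, y2)    # 0.5 0.5: both non-negative, so no arbitrage exists
print(y1 + y2)   # 1.0: a probability vector, forced by the bank account
```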
3. Lagrange multipliers
Many important problems require minimizing (or maximizing) func-
tions under equality constraints. The Lagrange multiplier method is
the standard tool to study these problems. For inequality constraints,
the Lagrange multiplier method can be extended in a suitable way, as will be studied in the two following sections.
Proposition 18. Let f : Rn → R and g : Rn → Rm (m < n) be C1
functions. Suppose c ∈ Rm fixed, and assume that the rank of Dg is m
at all points of the set g = c. Then, if x0 is a minimum of f in the set
g(x) = c, there exists λ ∈ Rm such that
Df(x0) = λTDg(x0).
Proof. Let x0 be as in the statement. Suppose that w1, . . . wm are
vectors in Rn satisfying
det[Dg(x0)W] ≠ 0,
where W ≡ [w1 · · ·wm] is the matrix with columns w1, . . . wm. Note
that it is possible to choose such vectors because the rank of Dg is m.
Given v ∈ Rn consider the equation
g(x0 + εv +Wi) = c.
The implicit function theorem implies that there exists a unique function i(ε) = (i1(ε), . . . , im(ε))T : R → Rm, defined in a neighborhood of ε = 0, with i(0) = 0, and such that
g(x0 + εv + Wi(ε)) = c.
Additionally,
i′(0) = −(Dg(x0)W )−1Dg(x0)v.
Since x0 is a minimizer of f in the set g(x) = c, the function
I(ε) = f(x0 + εv +Wi(ε))
satisfies
0 = I ′(0) = Df(x0)v +Df(x0)Wi′(0),
that is,
Df(x0)v = λTDg(x0)v, with λT = Df(x0)W(Dg(x0)W)−1,
for any vector v. Since v is arbitrary, we conclude that Df(x0) = λTDg(x0).
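Proposition 18 can be checked numerically on an illustrative problem of our choosing: minimize f(x, y) = x + y on the circle x² + y² = 1, locate the minimizer by sampling the constraint set, and verify Df = λ Dg there.

```python
import math

# Minimize f(x, y) = x + y on the circle g(x, y) = x^2 + y^2 = 1.
# Sample the constraint set to locate the minimizer, then verify the
# Lagrange condition Df(x0) = lambda * Dg(x0).
best = (0.0, 0.0)
for k in range(100000):
    t = 2 * math.pi * k / 100000
    p = (math.cos(t), math.sin(t))
    if p[0] + p[1] < best[0] + best[1]:
        best = p
x0, y0 = best                        # close to (-sqrt(2)/2, -sqrt(2)/2)

Df = (1.0, 1.0)
Dg = (2 * x0, 2 * y0)
lam = Df[0] / Dg[0]                  # multiplier from the first component

print(x0, y0)
print(abs(Df[1] - lam * Dg[1]))      # ~0: the second component matches too
```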
Proposition 19. Let f : Rn → R and g : Rn → Rm, with m < n, be smooth functions. Assume that Dg has maximal rank at all points. Let xc be a minimizer of f(x) under the constraint g(x) = c, and λc the corresponding Lagrange multiplier, i.e.
(9) Df(xc) = λcDg(xc).
Suppose that xc is a differentiable function of c. Define
V(c) = f(xc).
Then DcV(c) = λc.
Proof. We have g(xc) = c. By differentiating with respect to c we obtain
Dg(xc) ∂xc/∂c = I.
Multiplying by λc and using (9) yields
λc = λcDg(xc) ∂xc/∂c = Df(xc) ∂xc/∂c = DcV(c).
Exercise 19. Let f : Rn → R, and g : Rn → Rm, with m < n, be
smooth functions. Assume that Dg has maximal rank at all points. Let
x0 be a minimizer of f(x) under the constraint g(x) = g(x0), λ the
corresponding Lagrange multiplier, and F = f + λg. Show that
D2_{xixj}F(x0) ξiξj ≥ 0,
for all vectors ξ that satisfy D_{xi}g(x0) ξi = 0.
Proposition 20. Let f : Rn → R and g : Rn → Rm, with m < n, be C1 functions. Let x0 be a minimizer of f(x) under the constraint g(x) = g(x0). Then there exist constants λ0, . . . , λm, not all zero, such that
λ0Df + λ1Dg1 + · · · + λmDgm = 0
at x0. Furthermore, if Dg has maximal rank we can choose λ0 = 1.
Proof. First observe that the (m + 1) × n matrix obtained by stacking Df on top of Dg,
[Df]
[Dg],
cannot have rank m + 1. Indeed, this follows by applying the implicit function theorem to the function (x, c) 7→ (f(x) − c0, g(x) − c′), with x ∈ Rn and c = (c0, c′) ∈ Rm+1, to obtain a contradiction to x0 being a minimizer.
This fact then implies that there exist constants λ0, . . . , λm, not all zero, such that
λ0Df + λ1Dg1 + · · · + λmDgm = 0
at x0. Observe also that if Dg has maximal rank we can choose λ0 = 1. In fact, if λ0 ≠ 0, it suffices to divide λ by λ0. To see that λ0 ≠ 0 we argue by contradiction: if λ0 = 0 we would have
λ1Dg1 + · · · + λmDgm = 0,
with λ1, . . . , λm not all zero, which contradicts the hypothesis that Dg has maximal rank m.
Example 4 (Minimax principle). There exists a nice formal interpre-
tation of Lagrange multipliers, which although not rigorous is quite
useful. Fix c ∈ Rm, and consider the problem of minimizing a function
f : Rn → R under the constraint g(x)− c = 0, with g : Rn → Rm. This
problem can be rewritten as
min_x max_λ [f(x) + λT(g(x) − c)].
The minimax principle asserts that the maximum can be exchanged with the minimum (which is frequently false) and, therefore, we obtain the "equivalent" problem
max_λ min_x [f(x) + λT(g(x) − c)].
From this we deduce that, for each λ the minimum xλ is determined
by
(10) Df(xλ) + λTDg(xλ) = 0.
Furthermore, the function to maximize in λ is
f(xλ) + λT (g(xλ)− c).
Differentiating this equation with respect to λ, assuming that xλ is
differentiable, and using (10), we obtain
g(xλ) = c.
J
Exercise 20. Use the minimax principle to determine (formally) op-
timality conditions for the problem
min f(x)
under the constraint g(x) ≥ c.
The next exercise illustrates that the minimax principle may indeed be false, although in many problems it is an important heuristic:
Exercise 21. Show that the minimax principle is not valid in the following cases:
1. x + λ;
2. x^3 + λ(x^2 + 1);
3. 1/(1 + (x − λ)^2).
Exercise 22. Let A and B be arbitrary sets and F : A × B → R. Show that
inf_{a∈A} sup_{b∈B} F(a, b) ≥ sup_{b∈B} inf_{a∈A} F(a, b).
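The inequality of Exercise 22 can be illustrated numerically on finite sets, where inf and sup become min and max. The sketch below (hypothetical random data) checks it for a function given by a random table:

```python
# Finite sanity check of inf-sup >= sup-inf for a random F on finite sets A, B.
import random

random.seed(0)
A = range(5)
B = range(7)
F = {(a, b): random.uniform(-1, 1) for a in A for b in B}

inf_sup = min(max(F[a, b] for b in B) for a in A)
sup_inf = max(min(F[a, b] for a in A) for b in B)
assert inf_sup >= sup_inf
print(inf_sup, sup_inf)
```

Any choice of data satisfies the inequality; equality is the (generally false) minimax principle.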
4. Linear programming
We now continue the study of constrained optimization problems by looking into the minimization of linear functions subject to linear inequality constraints, i.e., linear programming problems. A detailed discussion of this class of problems can be found, for instance, in [GSS08] or [Fra02].
4.1. The setting of linear programming. A model problem in linear programming is the following: given a row vector c ∈ Rn, a real m × n matrix A, and a column vector b ∈ Rm, we look for a column vector x ∈ Rn which is a solution to the problem
(11) max_x cx,  subject to Ax ≤ b, x ≥ 0,
where the notation v ≥ 0 for a vector v means that all components of v are non-negative. The set defined by the inequalities Ax ≤ b and x ≥ 0 may be empty, or the function cx may be unbounded from above on this set. To simplify the discussion, we assume that neither situation occurs.
Example 5. Consider the problem of maximizing x1 + x2 subject to x1 + 2x2 ≤ 2, x1 ≤ 1, and x ≥ 0. The set defined by these inequalities is a quadrilateral with vertices (0, 0), (1, 0), (1, 1/2), and (0, 1), and the maximum, 3/2, is attained at the vertex (1, 1/2).
J
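Small instances of (11) can be solved by brute force, using the fact that a maximizer of a linear function over a bounded polygon can be found among its vertices. The sketch below uses hypothetical data (c = (1, 1), constraints x1 + 2x2 ≤ 2, x1 ≤ 1, x ≥ 0) and enumerates vertices as intersections of pairs of constraint lines:

```python
from itertools import combinations

# Hypothetical LP data: maximize c·x subject to A x <= b; x >= 0 is
# encoded by the rows -x1 <= 0 and -x2 <= 0.
c = [1.0, 1.0]
A = [[1.0, 2.0], [1.0, 0.0], [-1.0, 0.0], [0.0, -1.0]]
b = [2.0, 1.0, 0.0, 0.0]

def vertices(A, b, tol=1e-9):
    """Feasible intersections of pairs of constraint lines (Cramer's rule)."""
    pts = []
    for (a1, b1), (a2, b2) in combinations(zip(A, b), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            continue  # parallel lines: no unique intersection
        x = [(b1 * a2[1] - b2 * a1[1]) / det, (a1[0] * b2 - a2[0] * b1) / det]
        if all(row[0] * x[0] + row[1] * x[1] <= bi + tol for row, bi in zip(A, b)):
            pts.append(x)
    return pts

best = max(vertices(A, b), key=lambda x: c[0] * x[0] + c[1] * x[1])
print(best, c[0] * best[0] + c[1] * best[1])  # (1, 1/2) with value 3/2
```

This is exponential in the number of constraints; practical solvers (simplex, interior point) visit vertices far more cleverly.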
Observe that if c ≠ 0 the maximizers of cx cannot be interior points of the feasible set; otherwise, by exercise 4, they would be critical points. Therefore, the maximizers must lie on the boundary of the set Ax ≤ b, x ≥ 0. Unfortunately, this boundary can be quite complex, as it consists of a finite (but frequently large) union of intersections of hyperplanes (of the form dx = e) with half-spaces (of the form dx ≤ e).
Exercise 23. Suppose that no row of A vanishes. Show that the boundary of the set Ax ≤ b consists of all points which satisfy Ax ≤ b with equality in at least one coordinate.
Note that the linear programming problem (11) is quite general
as it is possible to include equality constraints as inequalities: in fact
A′x = b′ is the conjunction of A′x ≤ b′ and −A′x ≤ −b′.
A vector x is called feasible for (11) if it satisfies the constraints,
that is Ax ≤ b and x ≥ 0.
Example 6 (Diet problem). An animal food factory would like to minimize the production cost of a pet food while keeping it nutritionally balanced. Each food item i costs ci per unit. Therefore, if each unit of pet food contains an amount xi of the food item i, the total cost is
cx.
There is, of course, the obvious constraint that x ≥ 0. Suppose that Aij represents the amount of the nutrient i in the food item j, and bi the minimum recommended amount of the nutrient i. Then, to ensure a nutritionally balanced diet, we must have
Ax ≥ b.
Thus the diet problem is
min cx,  subject to Ax ≥ b, x ≥ 0.
J
Example 7 (Optimal Transport). A large multinational company needs to transport its supply from each factory i to the distribution points j. The supply at i is si and the demand at j is dj. The cost of transporting one unit from i to j is cij. We would like to determine the quantity πij transported from i to j by solving the following optimization problem:
min_π Σ_{ij} cij πij,
under the constraints πij ≥ 0 and the supply and demand bounds
Σ_j πij ≤ si,  Σ_i πij ≥ dj.
J
Example 8. The existence of feasible vectors, i.e., vectors satisfying the constraint Ax ≤ b, is not obvious. There is, however, a procedure that converts this question into a new linear programming problem. Let x0 be a new variable. We would like to solve
min x0,
where the minimum is taken over all vectors (x0, x) which satisfy the constraints (Ax)j ≤ bj + x0 for all j. It is clear that the feasible set for this problem is non-empty: take, for instance, x = 0 and x0 = max_j |bj|. This new linear programming problem therefore has a value (which could be −∞ but not +∞). If the value is non-positive, there exist feasible vectors for the constraint Ax ≤ b. Otherwise, if the value is positive, the feasible set of the original problem is empty.
J
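The phase-one value of Example 8 equals min_x max_j ((Ax)_j − b_j), whose sign decides feasibility. The sketch below (hypothetical one-dimensional data) computes this value on a grid for one feasible and one infeasible system:

```python
# Phase-one feasibility sketch: the value of  min x0  s.t. (Ax)_j <= b_j + x0
# equals  min_x max_j ((Ax)_j - b_j);  it is <= 0 iff Ax <= b is feasible.
A = [[1.0], [-1.0]]        # encodes  x <= b1  and  -x <= b2
b_feasible = [1.0, 1.0]    # x <= 1 and x >= -1: non-empty
b_empty = [-2.0, 1.0]      # x <= -2 and x >= -1: empty

def phase_one_value(A, b, grid):
    return min(max(row[0] * x - bi for row, bi in zip(A, b)) for x in grid)

grid = [i / 100 for i in range(-500, 501)]
print(phase_one_value(A, b_feasible, grid) <= 0)  # True: feasible
print(phase_one_value(A, b_empty, grid) > 0)      # True: infeasible
```

For the infeasible pair the minimum, attained near x = −3/2, is 1/2 > 0, certifying emptiness.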
Exercise 24. Let A be an m × n matrix, with m > n. Consider the overdetermined system
Ax = b,
for b ∈ Rm. In general, this equation has no solution. We would like to determine a vector x ∈ Rn which minimizes the maximum error,
max_i |(Ax)i − bi|.
Rewrite this problem as a linear programming problem. Compare this problem with the least squares method, which consists in solving
min_x ‖Ax − b‖².
4.2. The dual problem. To problem (11), which we call the primal, we associate another problem, called the dual, which consists in determining y ∈ Rm which solves
(12) min_y y^T b,  subject to y^T A ≥ c, y ≥ 0.
As the next exercise shows, the dual problem can be motivated by the
minimax principle:
Exercise 25. Show that (11) can be written as
(13) max_{x≥0} min_{y≥0} cx + y^T(b − Ax).
Suppose we can exchange the maximum with the minimum in (13). Relate the resulting problem to (12).
Example 9 (Interpretation of the dual of the diet problem). The dual of the diet problem (example 6) is the following:
max y^T b,  subject to y^T A ≤ c, y ≥ 0.
This problem admits the following interpretation. A competing company is willing to provide a nutritionally balanced diet, charging for each unit of the nutrient i a price yi. Obviously, the competing company would like to maximize its income, y^T b. There are the following constraints: y ≥ 0 and, furthermore, if the food item j costs cj, the competing company should charge an amount (y^T A)j no larger than cj. This constraint is quite natural, since if it did not hold, at least part of the diet could be obtained by buying the food items j for which (y^T A)j > cj. J
Exercise 26. Show that the dual of the dual is equivalent to the primal.
Exercise 27. Determine the dual of the optimal transport problem and
give a possible interpretation.
The next theorem concerns the relation between the primal and dual problems:
Theorem 21.
1. Weak duality: If x and y are feasible, respectively, for (11) and (12), then
cx ≤ y^T b.
2. Optimality: Furthermore, if cx = yT b then x and y are solu-
tions of (11) and (12), respectively.
3. Strong duality: If (11) has a solution x∗, then (12) also has a solution y∗, and
cx∗ = (y∗)^T b.
Finally, y∗_j = 0 for all indices j such that (Ax∗)_j < bj.
Proof. To prove the weak duality, observe that
cx ≤ (yTA)x = yT (Ax) ≤ yT b.
The optimality criterion follows from the previous inequality.
To prove the strong duality, we may assume that the inequality Ax ≤ b already encodes the constraint x ≥ 0, for instance by replacing A with the augmented matrix
Ã = [ A ]
    [ −I ]
and the vector b by
b̃ = [ b ]
    [ 0 ].
In this case it is enough to prove that there exists a vector ỹ∗ ∈ R^{m+n} such that ỹ∗ ≥ 0,
c = (ỹ∗)^T Ã,
with ỹ∗_j = 0 for all indices j such that (Ãx∗)_j < b̃_j. In fact, given such a vector ỹ∗, we set y∗ to be the vector of its first m coordinates. Then c ≤ (y∗)^T A and
cx∗ = (ỹ∗)^T Ã x∗ = (ỹ∗)^T b̃ = (y∗)^T b,
since b̃ differs from b only by n additional zero entries. From this point on we drop the ∼ to simplify the notation.
First we state the following auxiliary result, whose proof is a simple corollary of Lemma 17:
Lemma 22. Let A be an m × n matrix, c a row vector in Rn, and J an arbitrary set of row indices of A. Then exactly one of the following alternatives holds:
1. c = y^T A for some y ≥ 0 with yj = 0 for all j ∉ J;
2. there exists a column vector w ∈ Rn such that (Aw)j ≤ 0 for all j ∈ J and cw > 0.
Exercise 28. Use Lemma 17 to prove Lemma 22.
Let x∗ be a solution of (11), and let J be the set of indices j for which (Ax∗)j = bj. We will show that there exists y ≥ 0 such that c = y^T A and yj = 0 for j ∉ J. Assume, by contradiction, that no such y exists. By the previous lemma there is w such that cw > 0 and (Aw)j ≤ 0 for j ∈ J. But then x = x∗ + εw is feasible for ε > 0 sufficiently small: for j ∈ J we have (Ax)j = bj + ε(Aw)j ≤ bj, and for j ∉ J the inequality (Ax∗)j < bj is strict, so it is preserved for small ε.
However,
cx = c(x∗ + εw) > cx∗,
which contradicts the optimality of x∗.
Therefore there exists y ≥ 0 with c = y^T A and yj = 0 for j ∉ J, and so
cx∗ = y^T A x∗ = y^T b.
Consequently, by the second part of the theorem, we conclude that y is optimal.
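Theorem 21 can be illustrated numerically: exhibiting a feasible primal–dual pair with equal objective values certifies optimality of both. The sketch below uses a hypothetical instance of (11) with c = (1, 1), A = [[1, 2], [1, 0]], b = (2, 1), and the candidate pair x∗ = (1, 1/2), y∗ = (1/2, 1/2):

```python
# Verify feasibility and equal objective values for a hypothetical primal-dual pair.
c = [1.0, 1.0]
A = [[1.0, 2.0], [1.0, 0.0]]
b = [2.0, 1.0]

x_star = [1.0, 0.5]   # primal candidate (a vertex of the feasible set)
y_star = [0.5, 0.5]   # dual candidate

# Feasibility: A x* <= b, x* >= 0, and (y*)^T A >= c, y* >= 0.
Ax = [sum(aij * xj for aij, xj in zip(row, x_star)) for row in A]
yA = [sum(yi * A[i][j] for i, yi in enumerate(y_star)) for j in range(2)]
assert all(v <= bi + 1e-9 for v, bi in zip(Ax, b)) and all(v >= 0 for v in x_star)
assert all(v >= cj - 1e-9 for v, cj in zip(yA, c)) and all(v >= 0 for v in y_star)

# Equal values certify optimality of both (part 2 of Theorem 21).
primal = sum(ci * xi for ci, xi in zip(c, x_star))
dual = sum(yi * bi for yi, bi in zip(y_star, b))
print(primal, dual)  # both equal 3/2
```

Note also that both constraints are tight at x∗ (Ax∗ = b), consistent with the complementarity condition below.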
Lemma 23. Let x and y be, respectively, feasible for the primal and
dual problems. Define
s = b− Ax ≥ 0, e = ATy − cT ≥ 0.
Then
sTy + xT e = bTy − xT cT ≥ 0.
Proof. Since x, y ≥ 0, we have
s^T y = b^T y − x^T A^T y ≥ 0  and  x^T e = x^T A^T y − x^T c^T ≥ 0.
Adding these two expressions, we obtain
s^T y + x^T e = b^T y − x^T c^T ≥ 0.
Theorem 24 (Complementarity). Suppose x and y are solutions of
(11) and (12), respectively. Then
sTy = 0 and xT e = 0.
Proof. We have sTy, xT e ≥ 0. If x and y are optimal then cx =
yT b. By the previous lemma
sTy + xT e = 0,
which implies the theorem.
Exercise 29. Study the following problem in R2:
max x1 + 2x2,
with x1, x2 ≥ 0, x1 + x2 ≤ 1, and 2x1 + x2 ≤ 3/2. Determine the dual problem and its solution, and show that it has the same value as the primal problem.
Exercise 30. Let x∗ be a solution of the problem
min cx
under the constraints Ax ≥ b, x ≥ 0 and let y∗ be a solution of the
dual. Use complementarity to show that x∗ minimizes
cx− (y∗)TAx
under the constraint x ≥ 0.
Exercise 31. Solve by elementary methods the problem
max x1 + x2
under the constraints 3x1 + 4x2 ≤ 12, 5x1 + 2x2 ≤ 10.
Exercise 32. Consider the problem
min −7x1 + 9x2 + 16x3,
under the constraints x ≥ 0, 2 ≤ x1 + 2x2 + 9x3 ≤ 7. Obtain an upper
and lower bound for the value of the minimum.
Exercise 33. Show that the solution set of a linear programming prob-
lem is a convex set.
Exercise 34. Consider a linear programming problem in Rn,
min cε x,
under the constraints Ax ≤ b, x ≥ 0, where cε = c0 + εc1. Suppose that for each ε > 0 there exists a minimizer xε, and that xε converges to a point x0 as ε → 0. Show that x0 is a minimizer of c0x under Ax ≤ b, x ≥ 0. Show, furthermore, that if this limit problem has more than one minimizer, then x0 minimizes c1x among all such minimizers.
5. Non-linear optimization with constraints
Let f : Rn → R and g : Rn → Rm be C1 functions. We consider the following non-linear optimization problem:
(14) max_x f(x),  subject to g(x) ≤ 0, x ≥ 0.
We denote the feasible set by X:
X = {x ∈ Rn : x ≥ 0, g(x) ≤ 0},
and the solution set by S:
S = {x ∈ X : f(x) = sup_{x∈X} f(x)}.
In this section we derive necessary conditions, called the Karush-Kuhn-Tucker (KKT) conditions, for a point to be a solution of this problem. We start by stating these conditions, which generalize both the Lagrange multipliers for equality constraints and the optimality conditions from linear programming. We then show that, under convexity hypotheses, these conditions are in fact sufficient. After that, we show that, under a condition called constraint qualification, the KKT conditions are indeed necessary optimality conditions. We end the discussion with several criteria that allow one to check the constraint qualification condition in practice.
5.1. KKT conditions. For y ∈ Rm and µ ∈ Rn define the Lagrangian
L(x, y, µ) = f(x) − y^T g(x) + µ^T x.
For (x, µ, y) ∈ Rn × Rn × Rm the KKT conditions are the following:
(15)  ∂L/∂x_i = 0, i = 1, . . . , n,
      g(x) ≤ 0,  y^T g(x) = 0,
      x ≥ 0,  µ^T x = 0,
      µ, y ≥ 0.
The variables y and µ are called the Lagrange multipliers.
Several variations of the KKT conditions arise in different problems. For instance, when there are no positivity constraints on the variable x, the KKT conditions take the form: for (x, y) ∈ Rn × Rm
and L(x, y) = f(x) − y^T g(x),
(16)  ∂L/∂x_i = 0, i = 1, . . . , n,
      g(x) ≤ 0,  y^T g(x) = 0,
      y ≥ 0.
Exercise 35. Derive (16) from (15) by writing x = x+ − x− where
x+, x− ≥ 0.
Another example concerns equality constraints g(x) = 0, again without positivity constraints on the variable x. We can write the equality constraint as g(x) ≤ 0 and −g(x) ≤ 0. Let y± be the multipliers corresponding to ±g(x) ≤ 0, and define y = y+ − y−. Then (16) can be written as
∂f/∂x_i = Σ_{j=1}^m y_j ∂g_j/∂x_i,  g(x) = 0,
that is, y is the Lagrange multiplier for the equality constraint g(x) = 0.
Consider a linear programming problem, where in (14) we set
f(x) = cx, g(x) = Ax − b.
Then the KKT conditions are
c − y^T A = −µ^T,
Ax ≤ b,  y^T(Ax − b) = 0,
x ≥ 0,  µ^T x = 0,
µ, y ≥ 0.
In this case, the first line of the KKT conditions can be rewritten as
c− yTA ≤ 0,
that is, since y ≥ 0, y is admissible for the dual problem. Using the
condition µTx = 0 we conclude that
c · x = yTAx.
Then the second line of the KKT condition yields yTAx = yT b, which
implies
cx = yT b,
which is the optimality criterion for the linear programming problem,
and shows that a solution of the KKT condition is in fact a solution
of (14). Furthermore, it also shows that y is a solution to the dual
problem.
Example 10. Let Q be an n × n real symmetric matrix. Consider the quadratic programming problem
(17) max_x ½ x^T Q x,  subject to Ax ≤ b, x ≥ 0.
The KKT conditions are
(18)  x^T Q − y^T A = −µ^T,
      Ax ≤ b,  y^T(Ax − b) = 0,
      x ≥ 0,  µ^T x = 0,
      µ, y ≥ 0.
J
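The conditions (18) can be verified numerically at a candidate point. The sketch below uses a hypothetical instance of (17): Q = −2I (concave objective), the single constraint x1 ≥ 1 written as (−1, 0)·x ≤ −1, and x ≥ 0, with candidate maximizer x = (1, 0) and multipliers y = (2), µ = (0, 0):

```python
# Check the KKT system (18) componentwise at a hypothetical candidate point.
Q = [[-2.0, 0.0], [0.0, -2.0]]
A = [[-1.0, 0.0]]
b = [-1.0]

x = [1.0, 0.0]   # candidate maximizer
y = [2.0]        # multiplier for Ax <= b
mu = [0.0, 0.0]  # multiplier for x >= 0

# First line of (18): x^T Q - y^T A = -mu.
xQ = [sum(xi * Q[i][j] for i, xi in enumerate(x)) for j in range(2)]
yA = [sum(yi * A[i][j] for i, yi in enumerate(y)) for j in range(2)]
assert all(abs(xQ[j] - yA[j] + mu[j]) < 1e-9 for j in range(2))

# Complementarity: y^T (Ax - b) = 0 and mu^T x = 0.
Ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]
assert abs(sum(yi * (axi - bi) for yi, axi, bi in zip(y, Ax, b))) < 1e-9
assert abs(sum(mi * xi for mi, xi in zip(mu, x))) < 1e-9
print("KKT conditions hold at x =", x)
```

Since −½x^TQx is convex here, Proposition 25 below then guarantees that x = (1, 0) actually solves this instance.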
5.2. Duality and sufficiency of KKT conditions. We can write problem (14) in the following minimax form:
sup_{x≥0} inf_{y≥0} f(x) − y^T g(x).
We define the dual problem as
(19) inf_{y≥0} sup_{x≥0} f(x) − y^T g(x).
Let
h^*(y) = sup_{x≥0} f(x) − y^T g(x),
and
h_*(x) = inf_{y≥0} f(x) − y^T g(x).
Then (14) is equivalent to
sup_{x≥0} h_*(x),
and (19) is equivalent to the problem
inf_{y≥0} h^*(y).
From exercise 22, we have the duality inequality
sup_{x≥0} h_*(x) = sup_{x≥0} inf_{y≥0} f(x) − y^T g(x) ≤ inf_{y≥0} sup_{x≥0} f(x) − y^T g(x) = inf_{y≥0} h^*(y).
Furthermore, if x ≥ 0 and y ≥ 0 satisfy
h_*(x) = h^*(y),
then x and y are, respectively, solutions to (14) and (19).
If we choose
f(x) = cx, g(x) = Ax − b,
then (14) is a linear programming problem, and
h_*(x) = cx if Ax − b ≤ 0, and −∞ otherwise;
h^*(y) = b^T y if A^T y − c^T ≥ 0, and +∞ otherwise.
Consider the quadratic programming problem
(20) max ½ x^T Q x,  subject to Ax − b ≤ 0.
Note that here the variable x does not have any sign constraint. In this case we define
h_*(x) = inf_{y≥0} ½ x^T Q x − y^T(Ax − b) = ½ x^T Q x if Ax − b ≤ 0, and −∞ otherwise,
and
h^*(y) = sup_x ½ x^T Q x − y^T(Ax − b).
If we assume that Q is non-singular and negative definite, the supremum is attained at x = Q^{−1}A^T y, and
h^*(y) = −½ y^T A Q^{−1} A^T y + y^T b.
It is easy to check directly that h_*(x) ≤ h^*(y).
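The closed form for h^*(y) and the inequality h_*(x) ≤ h^*(y) can be checked numerically. The sketch below uses a hypothetical one-dimensional instance of (20): Q = −2, A = 1, b = 1 (so Q⁻¹ = −1/2 and h^*(y) = y²/4 + y):

```python
# Compare the grid supremum defining h^*(y) with its closed form, and check
# weak duality h_*(x) <= h^*(y) for the instance Q = -2, A = 1, b = 1.
def h_star_numeric(y, grid):
    # sup_x  ½ Q x² - y (A x - b)  =  sup_x  -x² - y (x - 1), on a grid
    return max(-x * x - y * (x - 1.0) for x in grid)

def h_star_closed(y):
    return -0.5 * y * (-0.5) * y + y  # = y²/4 + y

grid = [i / 1000 for i in range(-5000, 5001)]
for y in [0.0, 0.5, 1.0, 2.0]:
    assert abs(h_star_numeric(y, grid) - h_star_closed(y)) < 1e-3
    # h_*(x) = ½ Q x² = -x² for feasible x (i.e. x <= 1), always below h^*(y)
    assert all(-x * x <= h_star_closed(y) + 1e-9 for x in grid if x <= 1.0)
print("h_* <= h* verified on the grid")
```

The grid maximizer sits at x = −y/2 = Q⁻¹A^T y, matching the formula above.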
It turns out that the KKT conditions are in fact sufficient if f and
g satisfy additional convexity conditions.
Proposition 25. Suppose that −f and each component of g are convex. Let (x̄, µ, y) ∈ Rn × Rn × Rm be a solution of the KKT conditions (15). Then x̄ is a solution of (14).
Proof. Let x ∈ X. By the concavity of f we have
f(x) − f(x̄) ≤ Df(x̄)(x − x̄).
By the KKT conditions (15),
Df(x̄)(x − x̄) = y^T Dg(x̄)(x − x̄) − µ^T(x − x̄).
Since each component of g is convex and y ≥ 0,
y^T Dg(x̄)(x − x̄) ≤ y^T(g(x) − g(x̄)).
Since y^T g(x̄) = 0, y^T g(x) ≤ 0, µ^T x ≥ 0, and µ^T x̄ = 0, we have
f(x) − f(x̄) ≤ 0,
that is, x̄ is a solution.
As the next proposition shows, the KKT conditions imply strong duality.
Proposition 26. Suppose that −f and each component of g are convex. Let (x̄, µ, y) ∈ Rn × Rn × Rm be a solution of the KKT conditions (15). Then
h_*(x̄) = h^*(y).
Proof. Observe that, by the previous proposition, any solution x̄ to
Df(x̄) − y^T Dg(x̄) + µ^T = 0,
with µ ≥ 0 and µ^T x̄ = 0, is a maximizer of the function
x ↦ f(x) − y^T g(x)
under the constraint x ≥ 0. Therefore
h^*(y) = f(x̄) − y^T g(x̄) = f(x̄),
since y^T g(x̄) = 0. Furthermore,
h_*(x̄) = f(x̄) + inf_{y≥0} −y^T g(x̄) = f(x̄),
because g(x̄) ≤ 0. Thus
h_*(x̄) = h^*(y).
5.3. Constraint qualification and KKT conditions. Consider the constraints
(21) g(x) ≤ 0, x ≥ 0.
Let X denote the admissible set for (21). For x ∈ X define the set of active coordinate indices as I(x) = {i : x_i = 0}, and the set of active constraint indices as J(x) = {j : g_j(x) = 0}. For x ∈ X define the tangent cone to the admissible set X at x as the set T(x) of vectors v ∈ Rn which satisfy
v_i ≥ 0,  v · Dg_j(x) ≤ 0,
for all i ∈ I(x) and all j ∈ J(x). We say that the constraints satisfy the constraint qualification condition if for any x ∈ X and any v ∈ T(x) there exists a C1 curve x(t) with x(0) = x and ẋ(0) = v such that x(t) ∈ X for all t ≥ 0 sufficiently small.
Proposition 27. Let x be a solution of (14), and assume that the constraint qualification condition holds. Then there exist µ ∈ Rn and y ∈ Rm such that (15) holds.
Proof. Fix v ∈ T(x) and let x(t) be a curve as in the constraint qualification condition. Because x is a maximizer,
(22) 0 ≥ d/dt f(x(t))|_{t=0} = v · Df(x).
From Farkas' lemma (Lemma 17) we know that either there is v ∈ T(x) such that v · Df > 0, or else the vector −Df belongs to the positive cone generated by the vectors e_i, i ∈ I, and −Dg_j(x), j ∈ J. By (22), the first alternative does not hold. Hence there exist a vector µ ∈ Rn, with µ_i ≥ 0 for i ∈ I and µ_i = 0 for i ∈ Iᶜ, and a vector y ∈ Rm, with y_j ≥ 0 for j ∈ J and y_j = 0 for j ∈ Jᶜ, such that
Df = y^T Dg − µ^T.
By the construction of y and µ, as well as the definitions of I and J, it is clear that µ^T x = 0 and y^T g = 0.
To give an interpretation of the Lagrange multipliers in the KKT conditions, consider the family of problems
(23) max_x f(x),  subject to g_θ(x) ≤ 0,
where θ ∈ Rm and
g_θ(x) = g(x) − θ.
We assume that the constraint qualification condition holds for all θ. Furthermore, assume that there exists a unique solution x^θ which is a differentiable function of θ. Define the value function
V(θ) = f(x^θ),
and let y^θ ∈ Rm be the corresponding Lagrange multipliers, which we also assume to be differentiable. We claim that, for any θ0 ∈ Rm,
(24) ∂V(θ0)/∂θ_j = y^{θ0}_j.
To prove this identity, observe first that, using the KKT conditions,
∂V(θ)/∂θ_j = Σ_k ∂f(x^θ)/∂x_k · ∂x^θ_k/∂θ_j = Σ_{k,l} y^θ_l ∂g^θ_l(x^θ)/∂x_k · ∂x^θ_k/∂θ_j.
By differentiating the complementarity condition Σ_k y^θ_k g^θ_k(x^θ) = 0 with respect to θ_j, we obtain
(25) 0 = Σ_k [ ∂y^θ_k/∂θ_j g^θ_k(x^θ) + y^θ_k Σ_i ∂g^θ_k(x^θ)/∂x_i · ∂x^θ_i/∂θ_j ] − y^θ_j,
where the last term comes from differentiating g^θ_k = g_k − θ_k with respect to θ_j. For θ = θ0 we either have g^θ_k(x^{θ0}) = 0, or g^θ_k(x^{θ0}) < 0, in which case y^θ_k vanishes in a neighborhood of θ0 and, consequently, ∂y^{θ0}_k/∂θ_j = 0. In either case, therefore,
∂y^θ_k/∂θ_j · g^θ_k(x^θ) = 0 at θ = θ0.
So, from (25), we conclude that
y^{θ0}_j = Σ_{k,i} y^{θ0}_k ∂g^{θ0}_k(x^{θ0})/∂x_i · ∂x^{θ0}_i/∂θ_j,
which, compared with the first computation, yields (24).
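The sensitivity formula (24) can be checked by finite differences on a toy problem (hypothetical data, not from the text): maximize f(x) = −x² subject to g_θ(x) = 1 − x − θ ≤ 0, i.e. x ≥ 1 − θ. Here x^θ = 1 − θ (for θ < 1), the multiplier is y^θ = 2x^θ, and V(θ) = −(1 − θ)²:

```python
# Compare a central difference of the value function V with the multiplier y.
def solve(theta):
    x = max(0.0, 1.0 - theta)  # maximizer of -x² over x >= 1 - θ
    y = 2.0 * x                # from the KKT equation -2x + y = 0 when active
    return x, y

def V(theta):
    x, _ = solve(theta)
    return -x * x

theta0, h = 0.3, 1e-6
dV = (V(theta0 + h) - V(theta0 - h)) / (2 * h)
_, y0 = solve(theta0)
print(dV, y0)  # both approximately 2(1 - θ0) = 1.4
```

Loosening the constraint by dθ raises the optimal value at rate y^θ, the standard "shadow price" interpretation of the multiplier.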
5.4. Checking the constraint qualification conditions. Consider the following optimization problem:
(26) max_x x1,  subject to −(1 − x1)³ + x2 ≤ 0, x ≥ 0.
The Lagrangian is
L(x, y, µ) = x1 − y(x2 − (1 − x1)³) + µ1x1 + µ2x2,
and so
∂L(x, y, µ)/∂x1 = 1 − 3(1 − x1)²y + µ1.
In particular, when x1 = 1, the equation
1 + µ1 = 0
does not have a solution with µ1 ≥ 0. Hence the KKT conditions are not satisfied. Nevertheless, the point (x1, x2) = (1, 0) is a solution of (26).
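That (1, 0) solves (26) can be confirmed by brute force: the constraints x2 ≥ 0 and x2 ≤ (1 − x1)³ force x1 ≤ 1. The sketch below checks this on a (hypothetical) discretization of the feasible region:

```python
# Grid check that (1, 0) maximizes x1 over x >= 0 with x2 <= (1 - x1)³.
pts = [(i / 200, j / 200) for i in range(0, 401) for j in range(0, 401)]
feas = [(x1, x2) for x1, x2 in pts if x2 - (1 - x1) ** 3 <= 1e-12]
best = max(feas, key=lambda p: p[0])
print(best)  # (1.0, 0.0)
```

So the point is optimal even though no Lagrange multipliers exist: the constraint qualification fails at (1, 0), where the gradient of the active constraint degenerates.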
This example illustrates the need for simple criteria to check whether the constraint qualification condition holds. We will show that the following conditions are sufficient for the constraint qualification:
1. The Mangasarian-Fromowitz condition: for any x ∈ X there is a vector v such that, for every active index i,
∇g_i(x) · v < 0;
2. The Cotte-Dragominescu condition: for any x ∈ X the active constraints are positively linearly independent, that is,
Σ_i y_i ∇g_i(x) = 0, y ≥ 0 implies y = 0, the sum being over the active indices;
3. The Arrow-Hurwicz and Uzawa condition: for any x ∈ X the gradients of the active constraints are linearly independent.
It is obvious that 3. implies 2. We will show that 1. is equivalent to 2. To do so we need the following result:
Proposition 28 (Gordon alternative). Let A be a real m × n matrix. Then one and only one of the following holds:
• there exists x ∈ Rn such that Ax < 0;
• there exists y ∈ Rm, y ≥ 0, y ≠ 0, such that y^T A = 0.
Proof. (i) The two alternatives are mutually exclusive: if Ax < 0 and y^T A = 0 with y ≥ 0, y ≠ 0, we would have 0 = y^T A x < 0, a contradiction.
(ii) We consider the following optimization problem:
(27) max_y y1 + · · · + ym,  subject to y^T A = 0, y ≥ 0.
It is clear that if the second alternative holds then the value of this problem is +∞. Otherwise, y = 0 is a solution and the value is 0. In this case the dual problem,
(28) min_x 0,  subject to (Ax)_i ≤ −1, i = 1, . . . , m,
has a solution, i.e., there is a point x satisfying the constraints. Hence the first alternative holds.
Proposition 29. The Cotte-Dragominescu condition is equivalent to the Mangasarian-Fromowitz condition.
Proof. Fix x ∈ X and let A be the matrix whose rows are the gradients ∇g_i(x) of the active constraints. The Mangasarian-Fromowitz condition corresponds to the first case in the Gordon alternative. Therefore, if it holds, the only solution of Σ y_i ∇g_i = 0 with y ≥ 0 is y = 0, and the Cotte-Dragominescu condition is satisfied. Conversely, if the only solution to Σ y_i ∇g_i = 0 with y ≥ 0 is y = 0, the second case of the Gordon alternative does not hold. Then the first alternative holds, and so the Mangasarian-Fromowitz condition is satisfied.
Theorem 30. If the Mangasarian-Fromowitz condition holds, then the constraint qualification condition is satisfied.
Proof. Let x0 ∈ X and take w such that ∇g_i(x0) · w ≤ 0 for all active indices i. We must construct a curve x(ε) such that x(ε) ∈ X for ε ≥ 0 sufficiently small and ẋ(0) = w. Let v be a vector as in the Mangasarian-Fromowitz condition. Take M sufficiently large and define
x(ε) = x0 + εw + Mε²v.
Then, using Taylor's formula,
g_i(x(ε)) = g_i(x0) + ε∇g_i(x0) · w + Mε²∇g_i(x0) · v + (ε²/2) w^T D²g_i(x0) w + O(ε³).
For an active index i, the first term vanishes and the second is non-positive; since ∇g_i(x0) · v < 0, the term Mε²∇g_i(x0) · v dominates the remaining ε² terms when M is large enough. Thus, for M large enough and ε sufficiently small, g_i(x(ε)) < 0; for inactive indices the inequality g_i(x(ε)) < 0 holds for small ε by continuity.
Theorem 31. If either the Cotte-Dragominescu condition or the Arrow-Hurwicz and Uzawa condition holds, then so does the constraint qualification condition.
6. Bibliographical notes
In what concerns linear programming, we have used the books [GSS08] or [Fra02]...
2
Calculus of variations in one independent variable
This chapter is dedicated to a classical subject in the calculus of variations: variational problems with one independent variable. These are extremely important because of their applications to classical mechanics and Riemannian geometry. Furthermore, they serve as a model for optimal control problems and for problems with multiple integrals. We start, in section 1, by deriving the Euler-Lagrange equation and giving some elementary applications. Then, in section 2, we study additional necessary conditions for minimizers, and in section 3 we discuss several applications to Riemannian geometry and classical mechanics.
An introduction to the Hamiltonian formalism is given in section 4. The next topic, in section 5, is the study of sufficient conditions for a trajectory to be a minimizer: first we establish the existence of local minimizers, then we study the connections between smooth solutions of Hamilton-Jacobi equations and global minimizers, and finally we discuss the Jacobi equation, conjugate points, and curvature.
Symmetries are an important topic in calculus of variations. In
section 6 we present Routh’s method for integration of Lagrangian
systems and Noether’s theorem.
Of course, not every solution to the Euler-Lagrange equation is a
minimizer. Section 7 is a brief introduction to minimax methods and to
the mountain pass theorem. We also consider several examples of non-
existence of minimizing orbits (Lavrentiev phenomenon) and relaxation
methods (Young measures) in section 9.
49
Invariant measures for Lagrangian and Hamiltonian systems are
considered in section 8.
The next part of this chapter is dedicated to the study of the geometry of Hamiltonian systems: symplectic and Poisson structures, the Darboux theorem, and Arnold-Liouville integrability (section 10). In the last section, section 11, we consider perturbation problems and describe the Lindstedt series perturbation procedure.
We end the chapter with bibliographical notes.
1. Euler-Lagrange Equations
In classical mechanics, the trajectories x : [0, T ] → Rn of a me-
chanical system are determined by a variational principle called the
minimal action principle. This principle asserts that the trajectories
are minimizers (or at least critical points) of an integral functional. In
this section we study this problem and discuss several examples.
Consider a mechanical system on Rn with kinetic energy K(x, v) and potential energy U(x, v). We define the Lagrangian L(x, v) : Rn × Rn → R to be the difference between the kinetic energy K and the potential energy U of the system, that is, L = K − U. The variational formulation of classical mechanics asserts that trajectories of this mechanical system minimize (or are at least critical points of) the action functional
S[x] = ∫₀ᵀ L(x(t), ẋ(t)) dt,
under fixed boundary conditions. More precisely, a C1 trajectory x : [0, T] → Rn is a minimizer of S under fixed boundary conditions if for any C1 trajectory y : [0, T] → Rn such that x(0) = y(0) and x(T) = y(T) we have
S[x] ≤ S[y].
In particular, for any C1 function ϕ : [0, T] → Rn with compact support in (0, T) and any ε ∈ R we have
i(ε) = S[x + εϕ] ≥ S[x] = i(0).
Thus i(ε) has a minimum at ε = 0; so, if i is differentiable, i′(0) = 0. A trajectory x is a critical point of S if for any C1 function ϕ : [0, T] → Rn with compact support in (0, T) we have
i′(0) = d/dε S[x + εϕ]|_{ε=0} = 0.
The critical points of the action which are of class C2 are solutions to an ordinary differential equation, the Euler-Lagrange equation, which we derive in what follows. Any minimizer of the action functional satisfies further necessary conditions, which will be discussed in section 2.
Theorem 32 (Euler-Lagrange equation). Let L(x, v) : Rn × Rn → R be a C2 function. Suppose that x : [0, T] → Rn is a C2 critical point of the action S under fixed boundary conditions x(0) and x(T). Then
(29) d/dt DvL(x, ẋ) − DxL(x, ẋ) = 0.
Proof. Let x be as in the statement. Then for any ϕ : [0, T] → Rn with compact support in (0, T), the function
i(ε) = S[x + εϕ]
has a minimum at ε = 0. Thus
i′(0) = 0,
that is,
∫₀ᵀ DxL(x, ẋ)ϕ + DvL(x, ẋ)ϕ̇ dt = 0.
Integrating by parts, we conclude that
∫₀ᵀ [ d/dt DvL(x, ẋ) − DxL(x, ẋ) ] ϕ dt = 0,
for all ϕ : [0, T] → Rn with compact support in (0, T). This implies (29) and ends the proof of the theorem.
Example 11. In classical mechanics, the kinetic energy K of a particle with mass m and trajectory x(t) is
K = m|ẋ|²/2.
Suppose that the potential energy U(x) depends only on the position x, and that U is smooth. The Lagrangian for this mechanical system is then
L = K − U,
and the corresponding Euler-Lagrange equation is
mẍ = −U′(x),
which is Newton's law. J
Exercise 36. Let P ∈ Rn, and consider the Lagrangian L(x, v) : Rn × Rn → R defined by L(x, v) = g(x)|v|² + P · v − U(x), where g and U are C2 functions. Determine the Euler-Lagrange equation and show that it does not depend on P.
Exercise 37. Suppose we form a surface of revolution by connecting a point (x0, y0) to a point (x1, y1) by a curve (x, y(x)), x ∈ [x0, x1], and then revolving it around the y axis. The area of this surface is
∫_{x0}^{x1} x √(1 + (y′)²) dx.
Compute the Euler-Lagrange equation and study its solutions.
To understand the behavior of the Euler-Lagrange equation it is sometimes useful to change coordinates. The following proposition shows how this is achieved:
Proposition 33. Let x : [0, T] → Rn be a critical point of the action
∫₀ᵀ L(x, ẋ) dt.
Let g : Rn → Rn be a C2 diffeomorphism and let L̃ be given by
L̃(y, w) = L(g(y), Dg(y)w).
Then y = g⁻¹ ∘ x is a critical point of
∫₀ᵀ L̃(y, ẏ) dt.
Proof. This is a simple computation and is left as an exercise to the reader.
Before proceeding, we will discuss some applications of variational
methods to classical mechanics. As mentioned before, the trajectories
of a mechanical system with kinetic energy K and potential energy
U are critical points of the action corresponding to the Lagrangian
L = K−U . In the following examples we use this variational principle
to study the motion of a particle in a central field, and the planar two
body problem.
Example 12 (Central field motion). Consider the Lagrangian of a particle in the plane subjected to a radial potential field,
L(x, y, ẋ, ẏ) = (ẋ² + ẏ²)/2 − U(√(x² + y²)).
Consider polar coordinates (r, θ), that is, (x, y) = (r cos θ, r sin θ) = g(r, θ). We can change coordinates (see proposition 33) and obtain the Lagrangian in these new coordinates:
L̃(r, θ, ṙ, θ̇) = (ṙ² + r²θ̇²)/2 − U(r).
Then the Euler-Lagrange equations can be written as
d/dt (r²θ̇) = 0,
d/dt ṙ = −U′(r) + rθ̇².
The first equation implies that r²θ̇ ≡ η is conserved. Therefore rθ̇² = η²/r³. Multiplying the second equation by ṙ, we get
d/dt [ ṙ²/2 + U(r) + η²/(2r²) ] = 0.
Consequently,
E_η = ṙ²/2 + U(r) + η²/(2r²)
is a conserved quantity. Thus we can solve for ṙ as a function of r (given the values of the conserved quantities E_η and η) and so obtain a first-order differential equation for the trajectories. J
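The conservation of η = r²θ̇ can be observed numerically. In Cartesian coordinates η = x ẏ − y ẋ, so one can integrate the planar equations for a radial field and monitor this quantity. The sketch below uses hypothetical data, the Kepler potential U(r) = −1/r, and a velocity-Verlet step:

```python
# Integrate planar motion in the radial field U(r) = -1/r and check that
# the angular momentum η = x v_y - y v_x (i.e. r²θ̇) stays constant.
dt, steps = 1e-4, 20000
x, y, vx, vy = 1.0, 0.0, 0.0, 1.2  # hypothetical initial condition (bound orbit)

def accel(x, y):
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3  # the force -U'(r) (x, y)/r

eta0 = x * vy - y * vx
ax, ay = accel(x, y)
for _ in range(steps):
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
    x += dt * vx
    y += dt * vy
    ax, ay = accel(x, y)
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
eta = x * vy - y * vx
print(abs(eta - eta0))  # conserved up to rounding error
```

For central forces this scheme conserves angular momentum essentially exactly, reflecting the rotational symmetry behind the conservation law (see Noether's theorem, section 6).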
Example 13 (Planar two-body problem). Consider now the problem of two point bodies in the plane, with trajectories (x1, y1) and (x2, y2). Suppose that the interaction potential energy U depends only on the distance √((x1 − x2)² + (y1 − y2)²) between them. We will show how to reduce this problem to that of a single body under a radial field. The Lagrangian of this system is
L = m1 (ẋ1² + ẏ1²)/2 + m2 (ẋ2² + ẏ2²)/2 − U(√((x1 − x2)² + (y1 − y2)²)).
Consider new coordinates (X, Y, x, y), where (X, Y) is the center of mass,
X = (m1x1 + m2x2)/(m1 + m2),  Y = (m1y1 + m2y2)/(m1 + m2),
and (x, y) is the relative position of the two bodies,
x = x1 − x2,  y = y1 − y2.
In these new coordinates the Lagrangian, using proposition 33, splits as
L = L1(Ẋ, Ẏ) + L2(x, y, ẋ, ẏ).
Therefore, the equations for the variables X and Y are decoupled from those for x, y. Elementary computations show that
d²X/dt² = d²Y/dt² = 0.
Thus X(t) = X0 + V_X t and Y(t) = Y0 + V_Y t, for suitable constants X0, Y0, V_X, and V_Y. Since
L2 = (m1m2)/(m1 + m2) · (ẋ² + ẏ²)/2 − U(√(x² + y²)),
the problem is now reduced to the previous example. J
Exercise 38 (Two-body problem). Consider a system of two point bodies in R3 with masses m1 and m2, whose relative position is given by the vector r ∈ R3. Assume that the interaction depends only on the distance between the bodies. Show that, by choosing appropriate coordinates, the motion can be reduced to that of a single point particle with mass M = m1m2/(m1 + m2) under a radial potential. Show, by proving that r × ṙ is conserved, that the orbit of a particle under a radial field lies in a fixed plane for all times.
Exercise 39. Let x : [0, T] → Rn be a solution to the Euler-Lagrange equation associated to a C2 Lagrangian L : Rn × Rn → R. Show that
E(t) = −L(x, ẋ) + ẋ · DvL(x, ẋ)
is constant in time. For mechanical systems this is simply the conservation of energy. Occasionally, the identity dE/dt = 0 is also called the Beltrami identity.
Exercise 40. Consider a system of n point bodies with masses m_i and positions r_i ∈ R3, 1 ≤ i ≤ n. Suppose the kinetic energy is T = Σ_i (m_i/2)|ṙ_i|² and the potential energy is U = −Σ_{i, j≠i} m_i m_j/(2|r_i − r_j|). Let I = Σ_i m_i|r_i|². Show that
d²I/dt² = 4T + 2U,
which is strictly positive if the energy T + U is positive. What implications does this identity have for the stability of planetary systems?
Exercise 41 (Jacobi metric). Let x : [0, T] → Rn be a solution to the Euler-Lagrange equation
(30) d/dt DvL − DxL = 0
for the Lagrangian
L(x, v) = |v|²/2 − V(x).
Let E(t) = |ẋ(t)|²/2 + V(x(t)).
1. Show that Ė = 0.
2. Let E0 = E(0). Show that x is a solution to the Euler-Lagrange equation
(31) d/dt DvL_J − DxL_J = 0
associated to L_J(x, v) = √(E0 − V(x)) |v|.
3. Show that any reparametrization in time of x is also a solution to (31), and observe that the functional
∫₀ᵀ √(E0 − V(x)) |ẋ| dt
represents the length of the path between x(0) and x(T) in the Jacobi metric g = √(E0 − V(x)).
4. Show that the solutions to the Euler-Lagrange equation (31), when reparametrized in time in such a way that the energy of the reparametrized trajectory is E0, satisfy (30).
Exercise 42 (Brachistochrone problem). Let (x₁, y₁) be a point in a
(vertical) plane. Show that the curve y = u(x) that connects (0, 0) to
(x₁, y₁) in such a way that a particle with unit mass, moving under the
influence of a unit gravity field, reaches (x₁, y₁) in the minimum amount
of time minimizes
∫₀^{x₁} √( (1 + (u′)²) / (−2u) ) dx.
Hint: use the fact that the sum of kinetic and potential energy is constant.
Determine the Euler-Lagrange equation and study its solutions, using
Exercise 39.
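The extremals of the brachistochrone functional are cycloids; as a numerical illustration (added here, not part of the text, with endpoints chosen for convenience), one can compare the descent time along the cycloid x = θ − sin θ, y = −(1 − cos θ) from (0, 0) to (π, −2) against the straight line joining the same points:

```python
import math

# With unit gravity and zero initial speed, energy conservation gives
# speed v = sqrt(-2 y), so the travel time along a curve is integral ds / v.
# We compare a cycloid arc (a known extremal) with a straight line.

def cycloid_time(n=20000):
    # x = th - sin th, y = -(1 - cos th);  ds/dth = sqrt((1-cos th)^2 + sin^2 th),
    # v = sqrt(2 (1 - cos th)); trapezoidal rule over th in (0, pi].
    total, h, prev = 0.0, math.pi / n, None
    for k in range(n + 1):
        th = max(k * h, 1e-9)
        c = 2.0 * math.sin(th / 2) ** 2          # 1 - cos(th), stable form
        f = math.sqrt(c * c + math.sin(th) ** 2) / math.sqrt(2.0 * c)
        if prev is not None:
            total += 0.5 * (prev + f) * h
        prev = f
    return total

# Straight line u(x) = -2x/pi between the same endpoints: the time
# integral evaluates in closed form to sqrt(pi^2 + 4).
line_time = math.sqrt(math.pi ** 2 + 4)
cyc_time = cycloid_time()            # analytic value for this arc: pi
```

The cycloid time (≈ 3.14) beats the straight line (≈ 3.72), consistent with the cycloid being the minimizer.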
Exercise 43. Consider a second-order variational problem:
(32) min_x ∫₀ᵀ L(x, ẋ, ẍ) dt,
where the minimum is taken over all trajectories x : [0, T] → Rn
with fixed boundary data x(0), x(T), ẋ(0), ẋ(T). Determine the Euler-
Lagrange equation corresponding to (32).
2. Further necessary conditions
A classical strategy in the study of variational problems consists
in establishing necessary conditions for minimizers. If there exists a
minimizer and if the necessary conditions have a unique solution, then
this solution has to be the unique minimizer and thus the problem is
solved. In addition to the Euler-Lagrange equations, several other necessary
conditions can be derived. In this section we discuss boundary
conditions, which arise, for instance, when the end-points are not fixed,
and second-order conditions.
2.1. Boundary conditions. In certain problems, the boundary
conditions, such as end-point values, are not prescribed a priori. In
this case, it is possible to prove that minimizers automatically satisfy
certain boundary conditions. These are called natural boundary
conditions.
Example 14. Consider the problem of minimizing the integral
(33) ∫₀ᵀ L(x, ẋ) dt
over all C2 curves x : [0, T] → Rn. Note that the boundary values for
the trajectory x at t = 0, T are not prescribed a priori.
Let x be a minimizer of (33) (with free endpoints). Then, for all
ϕ : [0, T] → Rn, not necessarily compactly supported,
∫₀ᵀ DₓL(x, ẋ) · ϕ + DᵥL(x, ẋ) · ϕ̇ dt = 0.
Integrating by parts and using the fact that x is a solution to the
Euler-Lagrange equation, we conclude that
DᵥL(x(0), ẋ(0)) = DᵥL(x(T), ẋ(T)) = 0.
J
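The emergence of the natural boundary condition can be seen numerically (an assumed illustration, not from the text): minimize the discretized action of L(x, v) = v²/2 + x²/2 on [0, 1] with x(0) = 1 fixed and x(1) left free; no condition is imposed at t = 1, yet DᵥL = ẋ(1) ≈ 0 comes out of the minimization.

```python
import math

# Discretized action for L(x, v) = v^2/2 + x^2/2 with x(0) = 1 fixed and
# x(1) FREE, minimized by plain gradient descent.  The exact continuum
# minimizer satisfies xddot = x, x(0) = 1, xdot(1) = 0, so x(1) = 1/cosh(1).

N = 20
h = 1.0 / N
x = [1.0] + [0.0] * N            # x[0] = 1 is prescribed

def grad(x):
    g = [0.0] * (N + 1)
    for i in range(N):           # kinetic term: sum_i h * ((x[i+1]-x[i])/h)^2 / 2
        d = (x[i + 1] - x[i]) / h
        g[i + 1] += d
        g[i] -= d
    for i in range(N + 1):       # potential term, trapezoidal weights
        w = h if 0 < i < N else h / 2
        g[i] += w * x[i]
    g[0] = 0.0                   # keep the fixed endpoint in place
    return g

for _ in range(30000):
    g = grad(x)
    x = [xi - 0.02 * gi for xi, gi in zip(x, g)]

xdot_T = (x[N] - x[N - 1]) / h   # natural boundary condition: close to 0
x_T = x[N]                       # close to 1/cosh(1)
```

The step size and iteration count are ad hoc choices that keep the plain gradient descent stable and fully converged for this small problem.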
Exercise 44. Consider the problem of minimizing the integral
∫₀ᵀ L(x, ẋ) dt
over all C2 curves x : [0, T] → Rn such that x(0) = x(T). Deduce that
DᵥL(x(0), ẋ(0)) = DᵥL(x(T), ẋ(T)).
Use the previous identity to show that any periodic (smooth) minimizer
is in fact a periodic solution to the Euler-Lagrange equations.
Exercise 45. Consider the problem of minimizing
∫₀ᵀ L(x, ẋ) dt + ψ(x(T)),
with x(0) fixed and x(T) free. Derive a boundary condition at t = T
for the minimizers.
Exercise 46 (Free boundary). Consider the problem of minimizing
∫₀ᵀ L(x, ẋ) dt
over all terminal times T and all C2 curves x : [0, T] → Rn. Show that
x is a solution to the Euler-Lagrange equation and that
L(x(T), ẋ(T)) = 0,
DₓL(x(T), ẋ(T)) · ẋ(T) + DᵥL(x(T), ẋ(T)) · ẍ(T) ≥ 0,
DᵥL(x(T), ẋ(T)) = 0.
Let q ∈ R and L : R2 → R be given by
L(x, v) = (v − q)²/2 + x²/2 − 1.
If possible, determine T and x : [0, T] → R that are (local) minimizers
of
∫₀ᵀ L(x, ẋ) ds,
with x(0) = 0.
2.2. Second-order conditions. If f : R → R is a C2 function
which has a minimum at a point x0 then f ′(x0) = 0 and f ′′(x0) ≥ 0.
For the minimal action problem, the analog of the vanishing of the first
derivative is the Euler-Lagrange equation. We will now consider the
analog to the second derivative being non-negative.
The next theorem concerns second-order conditions for minimizers:
Theorem 34 (Jacobi’s test). Let L(x, v) : Rn × Rn → R be a C2
Lagrangian. Let x : [0, T] → Rn be a C1 minimizer of the action
under fixed boundary conditions. Then, for each η : [0, T] → Rn with
compact support in (0, T), we have
(34) ∫₀ᵀ ½ ηᵀ D²ₓₓL(x, ẋ) η + ηᵀ D²ₓᵥL(x, ẋ) η̇ + ½ η̇ᵀ D²ᵥᵥL(x, ẋ) η̇ dt ≥ 0.
Proof. If x is a minimizer, the function ε ↦ I[x + εη] has a
minimum at ε = 0. By computing d²/dε² I[x + εη] at ε = 0 we obtain
(34).
A corollary of the previous theorem is Lagrange’s test that we state
next:
Corollary 35 (Lagrange’s test). Let L(x, v) : Rn × Rn → R be a C2
Lagrangian. Suppose x : [0, T] → Rn is a C1 minimizer of the action
under fixed boundary conditions. Then
D²ᵥᵥL(x, ẋ) ≥ 0.
Proof. Use Theorem 34 with η = ε ξ(t) sin(t/ε), for ξ : [0, T] → Rn
with compact support in (0, T), and let ε → 0.
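The sign of the second variation (34) can be probed numerically. As an assumed illustration (not from the text), take L(x, v) = (v² − x²)/2, the harmonic oscillator: along any extremal the quadratic form (34) reduces to Q[η] = ½∫₀ᵀ (η̇² − η²) dt, and with η(t) = sin(πt/T) one gets Q = (T/4)(π²/T² − 1), which changes sign at T = π (a conjugate point), so extremals with T > π cannot be minimizers:

```python
import math

# Second variation Q[eta] = (1/2) * integral_0^T (etadot^2 - eta^2) dt
# evaluated at eta(t) = sin(pi t / T) by the midpoint rule.

def Q(T, n=10000):
    h = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        eta = math.sin(math.pi * t / T)
        etadot = (math.pi / T) * math.cos(math.pi * t / T)
        total += 0.5 * (etadot ** 2 - eta ** 2) * h
    return total

q_short = Q(3.0)   # T < pi: positive, Jacobi's test is satisfied
q_long = Q(4.0)    # T > pi: negative, the extremal is not a minimizer
```

This matches the closed form Q = (T/4)(π²/T² − 1) for this particular η.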
Exercise 47. Let L : R2n → R be a continuous Lagrangian and let
x : [0, T] → Rn be a continuous piecewise C1 trajectory. Show that for
each δ > 0 there exists a trajectory yδ : [0, T] → Rn of class C1 such
that
| ∫₀ᵀ L(x, ẋ) dt − ∫₀ᵀ L(yδ, ẏδ) dt | < δ.
As a corollary, show that the value of the infimum of the action over
piecewise C1 trajectories is the same as the infimum over trajectories
globally C1. Note, however, that a minimizer may not be C1.
Exercise 48 (Weierstrass test). Let x : [0, T] → Rn be a C1 minimizer
of the action corresponding to a Lagrangian L. Let v, w ∈ Rn and
0 ≤ λ ≤ 1 be such that λv + (1 − λ)w = 0. Show that
λ L(x, ẋ + v) + (1 − λ) L(x, ẋ + w) ≥ L(x, ẋ).
Hint: to prove the inequality at a point t₀, choose η such that
η̇(t) = v if t₀ ≤ t ≤ t₀ + λε,  w if t₀ + λε < t ≤ t₀ + ε,  0 otherwise,
and consider I[x + η] as ε → 0.
3. Applications to Riemannian geometry
This section is dedicated to some applications of the calculus of
variations to Riemannian geometry, namely the study of geodesics and
curvature. We also present some applications to geometric mechanics,
namely the study of the rigid body.
In our examples we will mostly use local coordinates and
will not try to address global problems in geometry. In fact, by using
suitable charts, the problems we address can usually be reduced to
problems in Rn. To simplify the notation we will also use the Einstein
convention for repeated indices; that is, aᵢbⁱ is an abbreviation of Σᵢ aᵢbⁱ.
Example 15. Let M be a Riemannian manifold with metric g, defined
in local coordinates by the positive definite symmetric matrix gᵢⱼ(x).
Let L : TM → R be given by
L(x, v) = ½ gᵢⱼ(x) vⁱvʲ.
Let x : [a, b] → M be a curve that minimizes
∫ₐᵇ L(x, ẋ) dt
over all curves with certain fixed boundary conditions. Then we have
d/dt (gᵢⱼ ẋⁱ) − ½ Dⱼgₘₖ ẋᵐẋᵏ = 0,
that is,
(35) ẍⁱ + ½ gⁱʲ (Dₖgₘⱼ + Dₖgₘⱼ − Dⱼgₘₖ) ẋᵐẋᵏ = 0,
where gⁱʲ represents the inverse matrix of gᵢⱼ. We can write the previous
equation in the more compact form
ẍⁱ + Γⁱₖₘ ẋᵐẋᵏ = 0,
where
(36) Γⁱₖₘ = ½ gⁱʲ (Dₖgₘⱼ + Dₘgₖⱼ − Dⱼgₘₖ)
is the Christoffel symbol of the metric g (note that the change in the
order of the indices in the second term does not change the sum in (35)
but makes Γ symmetric in the indices m and k). J
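Formula (36) is easy to evaluate numerically. The sketch below (an assumed illustration, not part of the text) computes the Christoffel symbols of the polar-coordinate metric g = diag(1, r²) on R2 by central finite differences; the known nonzero symbols are Γ^r_{θθ} = −r and Γ^θ_{rθ} = Γ^θ_{θr} = 1/r:

```python
# Christoffel symbols from (36) by finite differences; coordinates x = (r, theta).

def g(x):
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]

def dg(x, k, eps=1e-6):
    # central-difference partial derivative of the metric with respect to x^k
    xp = list(x); xp[k] += eps
    xm = list(x); xm[k] -= eps
    gp, gm = g(xp), g(xm)
    return [[(gp[i][j] - gm[i][j]) / (2 * eps) for j in range(2)] for i in range(2)]

def christoffel(x):
    G = g(x)
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    ginv = [[G[1][1] / det, -G[0][1] / det],
            [-G[1][0] / det, G[0][0] / det]]
    d = [dg(x, k) for k in range(2)]       # d[k][m][j] = D_k g_{mj}
    # (36): Gamma^i_{km} = (1/2) g^{ij} (D_k g_{mj} + D_m g_{kj} - D_j g_{mk})
    Gam = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for i in range(2):
        for k in range(2):
            for m in range(2):
                Gam[i][k][m] = 0.5 * sum(
                    ginv[i][j] * (d[k][m][j] + d[m][k][j] - d[j][m][k])
                    for j in range(2))
    return Gam

Gam = christoffel([2.0, 0.7])   # evaluate at r = 2 (theta is irrelevant here)
```

At r = 2 this gives Γ⁰₁₁ ≈ −2 and Γ¹₀₁ = Γ¹₁₀ ≈ 1/2, matching Exercise 49 below.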
Theorem 36. Let gᵢⱼ be a smooth Riemannian metric in Rn. The
critical points x of the functional
(37) ∫₀ᵀ ½ gᵢⱼ(x) ẋⁱẋʲ dt
are also critical points of the functional
(38) ∫₀ᵀ √(gᵢⱼ(x) ẋⁱẋʲ) dt.
Additionally, we can reparametrize the critical points of (38) in such a
way that they are also critical points of (37).
Proof. The fact that the critical points of (37) are critical points
of (38) is a simple computation. To prove the second part of the
theorem it suffices to observe that the solutions of the Euler-Lagrange
equation associated to L preserve the energy E = ½ gᵢⱼ(x)ẋⁱẋʲ. Using this fact it is
easy to find the correct parametrization of the critical points of (38).
The minimizers of (38) are called geodesics, although sometimes
the name is also used for critical points.
Example 16. Consider a parametrization f : A ⊂ Rm → Rn of an
m-dimensional manifold M. The induced metric on A is represented by
the matrix
g = (Df)ᵀ Df.
The motivation is the following: given a curve θ(t) in A, consider the
corresponding tangent vector θ̇(t). Let x = f(θ) and ẋ = Df θ̇.
Then we define
⟨θ̇, θ̇⟩ = ⟨ẋ, ẋ⟩,
which gives rise precisely to the induced metric. J
Exercise 49. Consider R2\{0} with polar coordinates (r, θ). Show that
the standard metric in R2 can be written in these coordinates as
g = [ 1 0 ; 0 r² ].
Let
L(r, θ, ṙ, θ̇) = (ṙ² + r²θ̇²)/2
be the Lagrangian of a free particle in polar coordinates. Compute the
Euler-Lagrange equation and determine the corresponding Christoffel
symbols.
Exercise 50. Consider the sphere x² + y² + z² = 1 and the associated
spherical coordinates (θ, ϕ):
x = cos θ sin ϕ
y = sin θ sin ϕ
z = cos ϕ,
θ ∈ (0, 2π) and ϕ ∈ (0, π). Show that the induced metric is given by
the matrix
g = [ sin² ϕ 0 ; 0 1 ].
Determine the Euler-Lagrange equation for L = ½ gᵢⱼvⁱvʲ and the Christoffel
symbols corresponding to the coordinates (θ, ϕ).
Exercise 51. Consider the surface of revolution in R3 parametrized by
(r, θ):
x = r cos θ
y = r sin θ
z = z(r).
Show that the induced metric is
g = [ 1 + (z′)² 0 ; 0 r² ].
Show that the equations for the geodesics are
θ̈ + (2/r) ṙ θ̇ = 0
r̈ − (r/(1 + (z′)²)) θ̇² + (z′z″/(1 + (z′)²)) ṙ² = 0.
Determine the corresponding Christoffel symbols. Prove the Clairaut
identity, that is, that r cos β is constant, where β is the angle between
∂/∂θ and ṙ ∂/∂r + θ̇ ∂/∂θ.
Exercise 52 (Spherical pendulum). Show that for a spherical pendulum
with unit mass, the Lagrangian can be written as
L = (θ̇² sin²ϕ + ϕ̇²)/2 − U(ϕ).
Exercise 53. Determine the Lagrangian of a point particle constrained
to the cone z² = x² + y².
Exercise 54. Consider the Lagrangian for a particle of unit mass constrained
to move on the cycloid parametrized by
x = θ − sin θ,  y = cos θ.
Show that the y coordinate is 2π-periodic for any initial condition that
yields a periodic orbit.
3.1. Parallel Transport. The Christoffel symbols Γikm can be
used to study parallel transport in a Riemannian manifold. In this
section we define and discuss the main properties of parallel transport.
Let M be a manifold and Ξ(M) the set of all C∞ vector fields on
M. As usual in differential geometry, we identify vector fields on M
with the corresponding first-order linear differential operators. That
is, if X = (X¹, . . . , Xⁿ) is a vector field, we identify X with the first-order
differential operator
X = Σᵢ Xⁱ ∂/∂xⁱ.
Then, the commutator of two vector fields X and Y is the vector field
[X, Y], which is defined through its action as a differential operator on
smooth functions f:
[X, Y]f = X(Y(f)) − Y(X(f)).
A connection ∇ on M is a mapping
∇ : Ξ × Ξ → Ξ
satisfying the following properties:
1. ∇_{fX+gY} Z = f ∇_X Z + g ∇_Y Z,
2. ∇_X (Y + Z) = ∇_X Y + ∇_X Z,
3. ∇_X (fY) = f ∇_X Y + X(f) Y,
for all X, Y, Z ∈ Ξ(M) and all f, g ∈ C∞(M).
The vector ∇_X Y represents the rate of variation of Y along a curve
tangent to X.
Exercise 55. Let M be a manifold and ∇ a connection on M. Define
Γⁱₖₘ by
∇_{∂/∂xᵏ} ∂/∂xᵐ = Γⁱₖₘ ∂/∂xⁱ.
Show that
(39) ∇_X Y = [ Γⁱₖₘ XᵏYᵐ + Xʲ ∂Yⁱ/∂xʲ ] ∂/∂xⁱ,
where X = Xʲ ∂/∂xʲ and Y = Yʲ ∂/∂xʲ.
At every point x, formula (39) depends only on the value of the
vector field X at x. This allows us to define the covariant derivative of a
vector field Y along a curve x(t) through
DY/dt = ∇_ẋ Y.
A vector field X is parallel along a curve x(t) if
DX/dt = 0.
A connection is symmetric if
∇_X Y − ∇_Y X = [X, Y].
In general, connections on a manifold need not be symmetric; instead,
∇_X Y − ∇_Y X = T(X, Y) + [X, Y],
where T is the torsion.
Exercise 56. Determine an expression for the torsion in local coordi-
nates.
Exercise 57. Let ∇ be a symmetric connection. Show that
Γkij = Γkji.
A manifold can be endowed with different connections. For Riemannian
manifolds, of special interest are the connections which are
compatible with the metric, that is, those that for all vector fields X and Y
satisfy
(40) d/dt ⟨X, Y⟩ = ⟨DX/dt, Y⟩ + ⟨X, DY/dt⟩,
where the derivatives are taken along an arbitrary curve x(t). There
exists a unique symmetric connection compatible with the metric, the
Levi-Civita connection, whose Christoffel symbols are given by (36).
Theorem 37. Let M be a Riemannian manifold with metric g. Then
the Levi-Civita connection, defined in local coordinates by the Christoffel
symbols (36), is the unique connection which is symmetric and compatible
with the metric g.
Proof. Let ∇ be a connection which is symmetric and compatible
with the metric g. Then one can use (40) to determine Dₖgₘⱼ, Dₘgₖⱼ
and Dⱼgₘₖ, and it is a simple computation to show that its Christoffel
symbols are given by (36).
Exercise 58. Verify that the Christoffel symbols define a connection.
Exercise 59. Use formula (36) to determine the Christoffel symbol
corresponding to the polar coordinates in R2 - compare with the result
of exercise 49.
Exercise 60. Let X be a vector field and x a trajectory that satisfies
dx/dt = X(x).
Show that in local coordinates
ẍⁱ ∂/∂xⁱ = Xᵏ(x) (∂Xⁱ/∂xᵏ) ∂/∂xⁱ,
and, therefore,
DX/dt = ( Γⁱₖₘ ẋᵏẋᵐ + ẍⁱ ) ∂/∂xⁱ.
Show that the previous expression is independent of the choice of local
coordinates, which allows us to define the covariant acceleration as
Dẋ/dt = ( Γⁱₖₘ ẋᵏẋᵐ + ẍⁱ ) ∂/∂xⁱ,
for any C2 trajectory.
Example 17. Equation (15) can then be rewritten as
Dẋ/dt = 0,
which should be compared with Newton's law for a particle in the
absence of forces, ẍ = 0. J
Exercise 61. Let M be a Riemannian manifold on which a potential
V : M → R is defined. The corresponding Lagrangian is
L(x, v) = ½ gᵢⱼ vⁱvʲ − V(x).
Determine the Euler-Lagrange equation.
Example 18. A force field on a manifold M is a mapping
F : TM → T*M
such that the image of TₓM is a subset of T*ₓM. The generalized
Newton law is
g Dẋ/dt = F,
in which the metric g is identified with the operator g : TM → T*M
defined by (gX)(Y) = ⟨X, Y⟩. J
3.2. Rigid Body - I. The rigid body is perhaps one of the best
examples in which the geometric formalism of classical mechanics
is natural.
Consider a rigid body F with a fixed point at the origin. The
position of F at time t can be described by a matrix M(t) ∈ SO(3)
with M(0) = I (recall that SO(3) is the set of 3 × 3 matrices M that
satisfy MᵀM = I and det M = 1). More precisely, consider a point
of F which was at position x at t = 0. Then, at time t, the same
point is located at x(t) = M(t)x. If the body has mass density ρ(x),
the kinetic energy is given by
T = ½ ∫ ρ(x) |Ṁx|² dx.
Since M is an isometry, we have
|Ṁx|² = |M⁻¹Ṁx|²,
that is,
T = ½ ∫ ρ(x) |M⁻¹Ṁx|² dx.
The mapping that to a vector K, tangent to SO(3) at the point M,
associates
(41) K ↦ ⟨K, K⟩_M = ½ ∫ ρ(x) |M⁻¹Kx|² dx
is a metric on SO(3) which is invariant under left translations. More
precisely, let G ∈ SO(3) be fixed. The left translation by G is the mapping
L_G : SO(3) → SO(3) defined by
L_G M = GM,  M ∈ SO(3).
We have that (L_G)* : T_M SO(3) → T_{GM} SO(3) is simply
(L_G)* K = GK,  K ∈ T_M SO(3).
A metric is called left-invariant if ⟨(L_G)*K, (L_G)*K⟩_{L_G M} = ⟨K, K⟩_M.
Exercise 62. Verify that the metric (41) is left invariant.
Exercise 63. Let M(t) be a C1 curve in SO(3). Show that:
1. The matrix M⁻¹Ṁ is anti-symmetric.
2. There exists a vector ω_{M⁻¹Ṁ} = (ω₁, ω₂, ω₃) such that
M⁻¹Ṁ = [ 0 −ω₃ ω₂ ; ω₃ 0 −ω₁ ; −ω₂ ω₁ 0 ]
and M⁻¹Ṁx = ω_{M⁻¹Ṁ} × x, in which × is the usual cross product
in R3. The vector ω_{MᵀṀ} is called the angular velocity.
3. Verify that the kinetic energy is a quadratic form in ω_{MᵀṀ}, that
is, there exists a symmetric matrix I (the inertia tensor) such
that
T = ½ ω_{MᵀṀ}ᵀ I ω_{MᵀṀ}.
4. Let M₁(t) and M₂(t) be C1 curves in SO(3) and M(t) = M₁(t)M₂(t).
Determine ω_{MᵀṀ} as a function of ω_{M₁ᵀṀ₁} and ω_{M₂ᵀṀ₂}.
5. Let y(t) be the trajectory of a body in a uniformly rotating frame.
Identify the forces that act on the body: frame acceleration,
centrifugal force and Coriolis force.
Let M(t) be a curve in SO(3). The trajectory of a point x is
x(t) = M(t)x. Let G ∈ SO(3) and consider the change of coordinates
x = Gy. Then
y(t) = GᵀM(t)G y,
and, therefore, in the new coordinates the trajectory is given by N(t) = GᵀM(t)G.
The kinetic energy can be written as
T = ½ ω_{MᵀṀ}ᵀ I ω_{MᵀṀ} = ½ ω_{NᵀṄ}ᵀ Ĩ ω_{NᵀṄ}.
We would like to relate Ĩ and ω_{NᵀṄ} with I and ω_{MᵀṀ}. We have
ω_{NᵀṄ} ∧ x = GᵀMᵀṀGx = Gᵀ(ω_{MᵀṀ} ∧ (Gx))
= GᵀG[ (Gᵀω_{MᵀṀ}) ∧ (Gᵀ(Gx)) ] = (Gᵀω_{MᵀṀ}) ∧ x,
that is, ω_{NᵀṄ} = Gᵀω_{MᵀṀ} and, consequently,
Ĩ = GᵀIG.
Since I is symmetric, we can always choose a rotation matrix G such
that in the new frame
I = [ I₁ 0 0 ; 0 I₂ 0 ; 0 0 I₃ ].
The constants Iᵢ are called the principal moments of inertia.
3.3. Poincaré equations. Let M be a differentiable manifold.
Consider a set of n linearly independent vector fields Zᵢ on M. The
velocity ẋ of a trajectory x : [0, T] → M can be written as a linear
combination of these vector fields:
ẋ(t) = wⁱ(t) Zᵢ(x);
the functions wⁱ(t) are called quasi-velocities [AKN97].
Sometimes it is useful to write the Lagrangian as a function of
the quasi-velocities, that is, to write L(x, w). We will deduce the
Euler-Lagrange equations in this situation. Let us consider a family of
trajectories x_τ(t) depending differentiably on a parameter τ. We have
∂x_τ/∂t = wⁱ Zᵢ,  ∂x_τ/∂τ = ξⁱ Zᵢ.
Write Zᵢ = Zᵢᵏ ∂/∂xᵏ. Then, by differentiating and dropping the subscript
in x_τ,
∂²xᵏ/∂τ∂t = (∂wⁱ/∂τ) Zᵢᵏ + wⁱ (∂Zᵢᵏ/∂xᵐ)(∂xᵐ/∂τ) = (∂wⁱ/∂τ) Zᵢᵏ + wⁱξʲ Zⱼᵐ ∂Zᵢᵏ/∂xᵐ,
and
∂²xᵏ/∂t∂τ = (∂ξⁱ/∂t) Zᵢᵏ + ξʲ (∂Zⱼᵏ/∂xᵐ)(∂xᵐ/∂t) = (∂ξⁱ/∂t) Zᵢᵏ + wⁱξʲ Zᵢᵐ ∂Zⱼᵏ/∂xᵐ.
Since ∂²x/∂t∂τ = ∂²x/∂τ∂t and
[Zⱼ, Zᵢ] = [ Zⱼᵐ ∂Zᵢᵏ/∂xᵐ − Zᵢᵐ ∂Zⱼᵏ/∂xᵐ ] ∂/∂xᵏ,
we have
0 = (∂ξⁱ/∂t) Zᵢ − (∂wⁱ/∂τ) Zᵢ + wⁱξʲ [Zᵢ, Zⱼ],
that is,
∂wⁱ/∂τ = ∂ξⁱ/∂t + cⁱₖⱼ wᵏξʲ,
where cⁱₖⱼ Zᵢ = [Zₖ, Zⱼ]. Then
d/dτ ∫₀ᵀ L(x_τ, w_τ) dt = ∫₀ᵀ Zᵢ(L) ξⁱ + (∂L/∂wⁱ)( ∂ξⁱ/∂t + cⁱₖⱼ wᵏξʲ ) dt
and, therefore,
d/dt ∂L/∂wⁱ = Zᵢ(L) + (∂L/∂wᵏ) cᵏⱼᵢ wʲ.
3.4. Rigid body - II. Let M and N be differentiable manifolds.
Recall that for a diffeomorphism f : M → N and any vector field X on
M, we define the vector field f*X to be the vector field on N which
satisfies
(f*X)(h) = (X(h ∘ f)) ∘ f⁻¹,
for all h ∈ C∞(N).
In the case of the rigid body or, more generally, of a left-invariant
Lagrangian defined on a Lie group, we can choose the
vector fields Zᵢ of the form
Zᵢ(g) = g* Zᵢ(e),
that is, left-invariant vector fields.
Lemma 38. Let Xᵢ and Yᵢ, i = 1, 2, be vector fields on a manifold M
and f : M → M a diffeomorphism. Assume that
Yᵢ = f*Xᵢ.
Then
[Y₁, Y₂] = f*[X₁, X₂].
Proof. Let p ∈ M. We have
Yᵢ(g)|_{f(p)} = f*Xᵢ(g)|_{f(p)} = Xᵢ(g ∘ f)|_p,
that is,
Yᵢ(g) ∘ f = Xᵢ(g ∘ f).
Therefore
Y₁(Y₂(g))|_{f(p)} = X₁(Y₂(g) ∘ f)|_p = X₁(X₂(g ∘ f))|_p.
Consequently
[Y₁, Y₂] = f*[X₁, X₂].
Thus, from the previous result, cᵏᵢⱼ is constant, since
[g* Zᵢ, g* Zⱼ] = g*[Zᵢ, Zⱼ].
Therefore, if L is left-invariant, L ≡ L(w). Consequently,
d/dt ∂L/∂wⁱ = (∂L/∂wᵏ) cᵏⱼᵢ wʲ.
In the case of a rigid body, we use, if necessary, an orthogonal transformation
to diagonalize the inertia tensor into diag(I₁, I₂, I₃). We can
choose vectors Z₁, Z₂, Z₃ such that at the identity they have the following
form:
Z₁ = [ 0 1 0 ; −1 0 0 ; 0 0 0 ],  Z₂ = [ 0 0 −1 ; 0 0 0 ; 1 0 0 ]  and
Z₃ = [ 0 0 0 ; 0 0 1 ; 0 −1 0 ],
and that are left-invariant. Thus, the Lagrangian is
L(w) = (I₁w₁² + I₂w₂² + I₃w₃²)/2.
Exercise 64. Verify that the commutator of the vector fields Zᵢ corresponds
to the commutator of the corresponding matrices, and that
[Z₁, Z₂] = Z₃,  [Z₂, Z₃] = Z₁,  [Z₃, Z₁] = Z₂.
Using the previous exercise, the Euler-Lagrange equations are then
(42) I₁ẇ₁ = (I₂ − I₃) w₂w₃
I₂ẇ₂ = (I₃ − I₁) w₃w₁
I₃ẇ₃ = (I₁ − I₂) w₁w₂,
that is,
(43) Iẇ + w × (Iw) = 0.
The angular momentum vector is given by
N = Iω.
With this notation, (43) can be written as
(44) Ṅ = N ∧ ω.
From the previous equation we conclude that
d/dt ‖N‖² = 0,  d/dt (N · ω) = 0.
The first identity represents the conservation of the total angular momentum
and the second the conservation of the energy.
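Both conservation laws can be checked numerically (an assumed illustration, not from the text; the moments of inertia and initial angular velocity are arbitrary choices):

```python
# Euler equations (42) and their invariants |N|^2 and N . w.

I = (1.0, 2.0, 3.0)              # principal moments of inertia (illustrative)

def rhs(w):
    return ((I[1] - I[2]) * w[1] * w[2] / I[0],
            (I[2] - I[0]) * w[2] * w[0] / I[1],
            (I[0] - I[1]) * w[0] * w[1] / I[2])

def rk4(w, dt):
    k1 = rhs(w)
    k2 = rhs(tuple(w[i] + 0.5 * dt * k1[i] for i in range(3)))
    k3 = rhs(tuple(w[i] + 0.5 * dt * k2[i] for i in range(3)))
    k4 = rhs(tuple(w[i] + dt * k3[i] for i in range(3)))
    return tuple(w[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6
                 for i in range(3))

def invariants(w):
    N = tuple(I[i] * w[i] for i in range(3))
    return sum(Ni * Ni for Ni in N), sum(N[i] * w[i] for i in range(3))

w = (0.3, 1.0, -0.5)
m0, e0 = invariants(w)           # |N|^2 and N . w at t = 0
for _ in range(10000):           # integrate up to t = 10
    w = rk4(w, 1e-3)
m1, e1 = invariants(w)
```

Both invariants drift only at the level of the RK4 truncation error, even though the body tumbles.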
Let
L = [ 0 N₁ −N₂ ; −N₁ 0 N₃ ; N₂ −N₃ 0 ],  A = [ 0 ω₁ −ω₂ ; −ω₁ 0 ω₃ ; ω₂ −ω₃ 0 ].
Equation (44) can then be written as
(45) L̇ = [A, L].
A pair (A, L) satisfying (45) is called a Lax pair. Equations with this
structure are rich and interesting in the study of diverse equations,
such as the Korteweg-de Vries equation.
Proposition 39. Let L be a solution of (45). Then the eigenvalues of
L are constant. Furthermore, if v₀ is an eigenvector of L at t = 0 and
v solves
v̇ = Av,
with v(0) = v₀, then v(t) is an eigenvector for all t.
Proof. Let v(0) = v₀ be an eigenvector of L at t = 0 with corresponding
eigenvalue λ ∈ C. Define v(t) through the differential equation
v̇ = Av.
Then
d/dt (Lv) = L̇v + Lv̇ = ALv − LAv + LAv = ALv,
that is, w = Lv satisfies
ẇ = Aw,  w(0) = λv(0),
which implies w(t) = λv(t).
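The isospectrality asserted by Proposition 39 can be tested numerically (an assumed illustration, not from the text): for a constant antisymmetric A, the solution of L̇ = [A, L] is L(t) = e^{tA} L(0) e^{−tA}, so the spectral invariants tr L, tr L² and det L of a 3 × 3 matrix must stay constant:

```python
# Integrate Ldot = [A, L] with RK4 and monitor the spectral invariants.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def add(X, Y, c=1.0):
    return [[X[i][j] + c * Y[i][j] for j in range(3)] for i in range(3)]

def bracket(A, L):
    return add(matmul(A, L), matmul(L, A), -1.0)     # AL - LA

def invariants(L):
    tr = sum(L[i][i] for i in range(3))
    tr2 = sum(matmul(L, L)[i][i] for i in range(3))
    det = (L[0][0] * (L[1][1] * L[2][2] - L[1][2] * L[2][1])
           - L[0][1] * (L[1][0] * L[2][2] - L[1][2] * L[2][0])
           + L[0][2] * (L[1][0] * L[2][1] - L[1][1] * L[2][0]))
    return tr, tr2, det

A = [[0.0, 1.0, -0.3], [-1.0, 0.0, 0.7], [0.3, -0.7, 0.0]]   # antisymmetric
L = [[2.0, 1.0, 0.0], [1.0, -1.0, 0.5], [0.0, 0.5, 1.0]]     # arbitrary start

inv0 = invariants(L)
dt = 1e-3
for _ in range(5000):            # integrate up to t = 5
    k1 = bracket(A, L)
    k2 = bracket(A, add(L, k1, 0.5 * dt))
    k3 = bracket(A, add(L, k2, 0.5 * dt))
    k4 = bracket(A, add(L, k3, dt))
    L = add(L, add(add(k1, k4), add(k2, k3), 2.0), dt / 6.0)
inv1 = invariants(L)
```

For a 3 × 3 matrix these three invariants determine the characteristic polynomial, so their constancy is equivalent to constant eigenvalues.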
The Euler equation (42) admits as stationary solutions rotations
around each of the principal inertia axes; for instance, ω₁ ≠ 0, ω₂ =
ω₃ = 0. In the case in which I₁ = I₂ = I₃, the only solutions are
stationary rotations, ω̇ = 0.
Proposition 40. The stationary solution ω₁ ≠ 0, ω₂ = ω₃ = 0 is stable
if I₁ < I₂, I₃ or I₂, I₃ < I₁, and unstable if I₂ < I₁ < I₃ or I₃ < I₁ < I₂.
Proof. In the unstable cases, it suffices to look at the linearization
of (42):
[ 0 0 0 ; 0 0 ((I₃−I₁)/I₂) ω₁ ; 0 ((I₁−I₂)/I₃) ω₁ 0 ],
and check that it has two real eigenvalues with opposite signs. The stable
case requires some additional work, which is left to the reader.
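The eigenvalue criterion in the proof is easy to evaluate (an assumed illustration, not from the text): the nontrivial 2 × 2 block of the linearization is [[0, a], [b, 0]] with a = (I₃ − I₁)ω₁/I₂ and b = (I₁ − I₂)ω₁/I₃, whose eigenvalues are ±√(ab); a positive product ab means a real positive eigenvalue, hence instability:

```python
# Sign of a*b decides stability of rotation about the first principal axis.

def ab(I1, I2, I3, w1=1.0):
    a = (I3 - I1) * w1 / I2
    b = (I1 - I2) * w1 / I3
    return a * b

unstable = ab(2.0, 1.0, 3.0)      # I2 < I1 < I3: intermediate axis, ab > 0
stable_small = ab(1.0, 2.0, 3.0)  # I1 smallest: ab < 0, purely rotating modes
stable_large = ab(3.0, 1.0, 2.0)  # I1 largest: ab < 0 as well
```

This is the classical intermediate-axis ("tennis racket") instability.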
If I₁ = I₂ = I_c, the body is called a symmetrical top. In this case,
ω̇₃ = 0
and
ω̇₁ = ((I_c − I₃)/I_c) ω₂ω₃,  ω̇₂ = ((I₃ − I_c)/I_c) ω₃ω₁.
From the first of these equations one concludes that
ω̈₁ = −((I₃ − I_c)²/I_c²) ω₃² ω₁,
that is,
ω̈₁ = −k ω₁,
with k > 0, which implies that ω₁ is a periodic function; in a
similar way, the same holds for ω₂.
Finally, in the general case, the conservation of energy and of the total
angular momentum implies that the trajectory ω(t) satisfies
I₁²ω₁² + I₂²ω₂² + I₃²ω₃² = C₁,
I₁ω₁² + I₂ω₂² + I₃ω₃² = C₂,
that is, the trajectories belong to the intersection of two ellipsoids.
Exercise 65. Consider a rigid body with mass density ρ. Show that
the inertia tensor admits the matrix representation
I = [ ∫(y²+z²) dρ  −∫xy dρ  −∫xz dρ ; −∫xy dρ  ∫(x²+z²) dρ  −∫yz dρ ; −∫xz dρ  −∫yz dρ  ∫(x²+y²) dρ ].
Exercise 66. Show that S(θ, ϕ, ψ), given by
[ cos ϕ −sin ϕ 0 ; sin ϕ cos ϕ 0 ; 0 0 1 ] [ 1 0 0 ; 0 cos θ −sin θ ; 0 sin θ cos θ ] [ cos ψ −sin ψ 0 ; sin ψ cos ψ 0 ; 0 0 1 ],
for (θ, ϕ, ψ) ∈ (0, π) × (0, 2π) × (0, 2π), defines a local parametrization
of SO(3). The coordinates (θ, ϕ, ψ) are called the Euler angles.
Exercise 67. Consider a rigid body with a fixed point and such that
I₁ = I₂. Show that the kinetic energy, written in the local coordinates
(θ, ϕ, ψ), is
(I₁/2)(θ̇² + ϕ̇² sin²θ) + (I₃/2)(ψ̇ + ϕ̇ cos θ)².
4. Hamiltonian dynamics
In this section we introduce the Hamiltonian formalism of Classical
Mechanics. We start by discussing the main properties of the Legendre
transform. Then we derive Hamilton’s equations. Afterwards we dis-
cuss briefly the classical theory of canonical transformations. The sec-
tion ends with a discussion of additional variational principles.
4.1. Legendre transform. Before we proceed, we need to discuss
the Legendre transform of convex functions. The Legendre transform is
used to define the Hamiltonian of a mechanical system, and it plays an
essential role in many problems in the calculus of variations. Additionally,
it illustrates many of the tools associated with convexity.
Let L(v) : Rn → R be a convex function satisfying the
superlinear growth condition
lim_{|v|→∞} L(v)/|v| = +∞.
The Legendre transform L* of L is
L*(p) = sup_{v ∈ Rn} [−v · p − L(v)].
This is the usual definition of the Legendre transform in optimal control;
see [FS93] or [BCD97]. However, it differs by a sign from the Legendre
transform traditionally used in classical mechanics,
L♯(p) = sup_{v ∈ Rn} [v · p − L(v)],
as defined, for instance, in [AKN97] or [Eva98b]. They are
related by the elementary identity
L*(p) = L♯(−p).
We will frequently denote L*(p) by H(p). The Legendre transform of
H is denoted by H* and is
H*(v) = sup_{p ∈ Rn} [−p · v − H(p)].
In classical mechanics, the Lagrangian L can also depend on a position
coordinate x ∈ Rn, L(x, v), but for the purposes of the Legendre
transform x is taken as a fixed parameter. In this case we also write
H(p, x) = L*(p, x).
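A quick numerical illustration of this sign convention (an assumed example, not from the text): for L(v) = v²/2, the supremum in L*(p) = sup_v [−vp − L(v)] is attained at v = −p, giving L*(p) = p²/2.

```python
# Discretized Legendre transform with the convention L*(p) = sup_v [-v p - L(v)].

def legendre(L, p, v_grid):
    return max(-v * p - L(v) for v in v_grid)

grid = [i * 1e-3 for i in range(-4000, 4001)]   # v in [-4, 4], step 1e-3
L = lambda v: 0.5 * v * v

H1 = legendre(L, 1.0, grid)       # should approximate 1**2 / 2 = 0.5
Hm2 = legendre(L, -2.0, grid)     # should approximate (-2)**2 / 2 = 2.0
```

The grid maximum matches p²/2 up to the grid resolution, consistent with the involution H* = L for this convex, superlinear L.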
Proposition 41. Let L(x, v) be a C2 function which, for each fixed x,
is uniformly convex and superlinear in v. Let H = L*. Then:
1. H(p, x) is convex in p;
2. H* = L;
3. for each x,
lim_{|p|→∞} H(p, x)/|p| = ∞;
4. let v* be defined by p = −DᵥL(x, v*); then
H(p, x) = −v* · p − L(x, v*);
5. similarly, let p* be given by v = −DₚH(p*, x); then
L(x, v) = −v · p* − H(p*, x);
6. if p = −DᵥL(x, v) or v = −DₚH(p, x), then
DₓL(x, v) = −DₓH(p, x).
Proof. The first statement follows from the fact that the supre-
mum of convex functions is a convex function. To prove the second
point, observe that
H∗(x,w) = supp
[−w · p−H(p, x)]
= supp
infv
[(v − w) · p+ L(x, v)] .
For v = w we conclude that
H∗(x,w) ≤ L(x,w).
The opposite inequality is obtained by observing, since L is convex in
v, that for each w ∈ Rn there exists s ∈ Rn such that
L(x, v) ≥ L(x,w) + s · (v − w).
Therefore,
H∗(x,w) ≥ supp
infv
[(p+ s) · (v − w) + L(x,w)] ≥ L(x,w),
by letting p = −s.
To prove the third point observe that
H(p, x)
|p|≥ λ−
L(x,−λ p|p|)
|p|,
78 2. CALCULUS OF VARIATIONS IN ONE INDEPENDENT VARIABLE
by choosing v = −λ p|p| . Thus, we conclude
lim inf|p|→∞
H(p, x)
|p|≥ λ.
Since λ is arbitrary, we have
lim inf|p|→∞
H(p, x)
|p|=∞.
To establish the fourth point, note that for fixed p the function
v 7→ v · p+ L(x, v)
is differentiable and strictly convex. Consequently, its minimum, which
exists by coercivity and is unique by the strict convexity, is achieved
for
−p−DvL(x, v) = 0.
Note also that v as function of p is a differentiable function by the
inverse function theorem.
The proof of the fifth point is similar.
Finally, to prove the last item, observe that for
p(x, v) = −DvL(x, v),
we have
H(p(x, v), x) = −v · p(x, v)− L(x, v).
Differentiating this last equation with respect to x and using
v = −DpH(p(x, v), x),
we obtain
DxH = −DxL.
Exercise 68. Compute the Legendre transform of the following functions:
1. L(x, v) = ½ aᵢⱼ(x) vⁱvʲ + hᵢ(x) vⁱ − U(x),
where aᵢⱼ is a positive definite matrix and h(x) an arbitrary
vector field;
2. L(x, v) = √(aᵢⱼ(x) vⁱvʲ),
where aᵢⱼ is a positive definite matrix;
3. L(x, v) = ½ |v|^λ − U(x),
with λ > 1.
Exercise 69. By allowing the Lagrangian and its Legendre transform
to assume the values ±∞, compute the Legendre transforms of:
1. for ω ∈ Rn,
L(v) = 0 if v = ω, +∞ otherwise;
2. for ω ∈ Rn,
L(v) = ω · v;
3. for R > 0,
L(v) = 0 if |v| ≤ R, +∞ otherwise.
4.2. Hamiltonian formalism. To motivate the Hamiltonian formalism,
we consider the following alternative problem. Rather than
looking for curves x : [0, T] → Rn which minimize the action
∫₀ᵀ L(x, ẋ) dt,
we can consider extended curves (x, v) : [0, T] → R2n which minimize
the action
(46) ∫₀ᵀ L(x, v) dt
and satisfy the additional constraint ẋ = v. Obviously, this problem
is equivalent to the original one; however, it motivates the introduction
of a Lagrange multiplier p in order to enforce the constraint.
Therefore, we will look for critical points of
(47) ∫₀ᵀ L(x, v) + p · (v − ẋ) dt.
Proposition 42. Let L : Rn × Rn → R be a smooth Lagrangian. Let
(x, v) : [0, T] → R2n be a critical point of (46) under fixed boundary
conditions and under the constraint ẋ = v (the choice of p is irrelevant,
since the corresponding term always vanishes). Let
p = −DᵥL(x, v).
Then the curve (x, v, p) is a critical point of (47) under fixed boundary
conditions. Additionally, any critical point (x, v, p) of (47) satisfies
ẋ = v,  p = −DᵥL(x, v),  ṗ = −DₓL(x, v).
In particular, x is a critical point of (46). Furthermore, the Euler-Lagrange
equation can be rewritten as
ṗ = DₓH(p, x),  ẋ = −DₚH(p, x).
Proof. Let φ, ψ and η be C2([0, T], Rn) with compact support in
(0, T). Then, at ε = 0,
d/dε ∫₀ᵀ L(x + εφ, v + εψ) + (p + εη) · (v − ẋ) + ε (p + εη) · (ψ − φ̇) dt
= ∫₀ᵀ DₓL(x, ẋ) · φ + DᵥL · ψ + p · (ψ − φ̇) + η · (v − ẋ) dt
= ∫₀ᵀ [DₓL(x, ẋ) + ṗ] · φ dt = 0.
If p = −DᵥL(x, v), then v maximizes
−p · v − L(x, v).
Let
H(p, x) = max_v [−p · v − L(x, v)].
By Proposition 41, we have
DₓH(p, x) = −DₓL(x, v)
whenever
p = −DᵥL(x, v).
Additionally, we also have
v = −DₚH(p, x).
Therefore, the Euler-Lagrange equation can be rewritten as
ṗ = DₓH(p, x),  ẋ = −DₚH(p, x).
These are Hamilton's equations.
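The sign convention above can be checked on a concrete system (an assumed example, not from the text): for L(x, v) = v²/2 − x²/2 one gets H(p, x) = sup_v[−pv − L(x, v)] = p²/2 + x²/2, so ṗ = DₓH = x and ẋ = −DₚH = −p, hence ẍ = −ṗ = −x, the Euler-Lagrange equation of L:

```python
import math

# Hamilton's equations pdot = x, xdot = -p for H(p, x) = p^2/2 + x^2/2.
# With x(0) = 1, xdot(0) = -p(0) = 0 the solution is x(t) = cos t, p(t) = sin t.

def rk4(p, x, dt):
    def f(p, x):
        return (x, -p)           # (pdot, xdot)
    k1 = f(p, x)
    k2 = f(p + 0.5 * dt * k1[0], x + 0.5 * dt * k1[1])
    k3 = f(p + 0.5 * dt * k2[0], x + 0.5 * dt * k2[1])
    k4 = f(p + dt * k3[0], x + dt * k3[1])
    return (p + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            x + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

p, x = 0.0, 1.0
for _ in range(1000):            # integrate up to t = 1
    p, x = rk4(p, x, 1e-3)
err = abs(x - math.cos(1.0))     # should agree with the exact solution
```

The numerical trajectory reproduces the harmonic-oscillator solution, confirming the equivalence between the Hamiltonian flow and the Euler-Lagrange equation in this convention.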
Exercise 70. Suppose H(p, x) : Rn × Rn → R is a C1 function. Show
that the energy, which coincides with H, is conserved by the Hamiltonian
flow, that is,
d/dt H(p, x) = 0.
4.3. Canonical transformations. Before discussing canonical transformations,
we need to review some basic facts about differential forms
in Rn. Firstly, recall that given a C1 function f : Rn → R, its differential,
denoted by df, is the mapping df : Rn × Rn → R that to any point
x ∈ Rn and each direction v ∈ Rn associates the derivative of f in
the direction v:
df(x)(v) = d/dt f(x + vt)|_{t=0}.
Note that for each x ∈ Rn this mapping is linear in v. For instance, for
each coordinate i ∈ {1, . . . , n} we can consider the projection
onto this coordinate, x ↦ xᵢ, whose differential is dxᵢ.
A (first-order) differential form is any mapping
Λ : Rn × Rn → R
which is linear in the second coordinate. For simplicity, we assume
also that this map is continuous in the first coordinate. Clearly we can
write
Λ = Σᵢ fᵢ(x) dxᵢ,
where fᵢ(x) = Λ(x)(eᵢ).
An important example of a differential form is the differential df of
a C1 function f. In fact, by linearity, we have
df = Σᵢ (∂f/∂xᵢ) dxᵢ.
The integral of a differential form Λ along a path γ : [0, T] → Rn is
simply
∫₀ᵀ Λ(γ(t))(γ̇(t)) dt = Σᵢ ∫₀ᵀ fᵢ(γ(t)) γ̇ᵢ(t) dt.
Exercise 71 (Poincaré-Cartan invariant). Fix t ∈ R and consider a
closed curve
γ = (x(s, t), p(s, t)),  s ∈ [0, 1],
in R2n. Suppose that for each fixed s ∈ [0, 1],
d/dt x(s, t) = −DₚH(p(s, t), x(s, t)),
d/dt p(s, t) = DₓH(p(s, t), x(s, t)).
Show that
∮ p dx ≡ ∫₀¹ p · ∂x/∂s ds
is independent of t.
Exercise 72. Show that the critical points of
∫₀ᵀ p dx + H(p, x) dt
under fixed boundary conditions satisfy the Hamilton equations.
4. HAMILTONIAN DYNAMICS 83
Let (x, p) be a solution of the Hamilton equations. By Exercise
72, (x, p) is a critical point of
∫ p dx + H dt.
Let S(x, p) : R2n → R be a C1 function. Then (x, p) is also a critical
point of
∫ p dx + H dt − dS,
because the last integral differs from the previous one only by the addition
of the differential of a function S. Consider now a change of coordinates
P(x, p), X(x, p). In general, the functional ∫ p dx + H dt − dS, when
rewritten in terms of the new coordinates (P, X), does not have the
form ∫ P dX + H̃(P, X) dt and, therefore, the Hamilton equations in
these new coordinates may not have the standard form. A change of
coordinates (x, p) ↦ (X(x, p), P(x, p)) is called canonical if there exist
functions S and H̃(P, X) such that
(48) p dx + H dt − dS = P dX + H̃ dt.
Consider now a solution (x, p) : [0, T] → R2n of Hamilton's equations.
Suppose the coordinate change (x, p) ↦ (X(x, p), P(x, p)) is
canonical. Then the trajectory, written in the new coordinates (X, P),
is a critical point of the functional
∫₀ᵀ P dX + H̃ dt.
Therefore (X, P) satisfies Hamilton's equations in the new coordinates,
which are
(49) Ṗ = D_XH̃(P, X),  Ẋ = −D_PH̃(P, X).
Thus, in order to have (48), we must have (because the change of
coordinates does not depend on t)
H(p, x) = H̃(P(p, x), X(p, x)).
From this we conclude that
p dx − P dX = dS.
Suppose now that we can write the function S in terms of x and X, that is,
S ≡ S(x, X). Then
(50) p = DₓS,  P = −D_XS.
Consider now the inverse procedure. Given S(x, X), suppose that (50)
defines a change of coordinates (for this to happen locally it is sufficient,
by the implicit function theorem, that det D²ₓ_XS ≠ 0). Then, in these
new coordinates, we have (49). Since S determines (at least formally)
the change of coordinates, we call it a generating function.
Example 19. Consider the generating function S(x, X) = xX. Then
the corresponding canonical transformation is p = X, P = −x. Thus,
(x, p) ↦ (X, P) = (p, −x) and H̃(P, X) = H(−P, X). J
Suppose now that S, written as a function of (x, P), is
S(x, P) = −PX + S₁(x, P).
Then (48) can be written as
p dx + P dX + X dP − DₓS₁ dx − D_PS₁ dP = P dX,
that is,
p = DₓS₁,  X = D_PS₁.
Example 20. Let S₁(x, P) = xP. Then p = P and X = x; therefore,
S₁ generates the identity transformation. J
Exercise 73. Assume now that S can be written as a function of X
and p and that we have
S(X, p) = px+ S2(X, p).
Determine the corresponding canonical transformation in terms of S2.
Exercise 74. Suppose that S can be written as a function of p and P
with the following form:
S(p, P ) = px− PX + S3(p, P ).
Determine the corresponding canonical transformation in terms of S₃.
4. HAMILTONIAN DYNAMICS 85
Example 21. Consider the Hamiltonian
H ≡ H(p_x, p_y, x − y).
Choosing
S₁ = P₁(x + y) + P₂(x − y)
we obtain
p_x = P₁ + P₂,  p_y = P₁ − P₂,
X₁ = x + y,  X₂ = x − y,
and
H̄(P₁, P₂, X₁, X₂) ≡ H̄(P₁, P₂, X₂) = H(P₁ + P₂, P₁ − P₂, X₂),
which does not depend on X₁; therefore P₁, the total linear mo-
mentum, is conserved. J
Example 22. Let S₁(x, P) be a C² solution of the Hamilton-Jacobi equation
H(D_x S₁(x, P), x) = H̄(P).
Suppose that
X = D_P S₁(x, P),  p = D_x S₁(x, P)
defines implicitly a change of coordinates (x, p) ↦ (X,P). Assume that det D²_{xP}S₁ ≠ 0. Then, if (x(t),p(t)) satisfies
ẋ = −D_p H(p,x),  ṗ = D_x H(p,x),
in the new coordinates we have
Ẋ = −D_P H̄(P),  Ṗ = 0.
J
Example 23. Consider a Hamiltonian H(p, x) with one degree of freedom, that is, x, p ∈ R. We would like to construct a canonical change of coordinates such that the new Hamiltonian depends only on P. We will first construct the corresponding generating function. For that, suppose that there exists such a generating function S₁(x, P). Then
dS₁ = X dP + p dx.
Fix a value of P. We will try to choose S₁ so that the new Hamiltonian H̄ depends only on P, that is, H(p(P,X), x(P,X)) = H̄(P). Along each curve γ = (x,p) : [0, T] → R² on which P is constant, we have
dS₁ = p dx.
Therefore,
S₁(x(T), P) − S₁(x(0), P) = ∫₀ᵀ p(t) · ẋ(t) dt.
In principle, from the equation H(p, x) = H̄(P) we can solve for p as a function of x and of the value H̄(P). In this case, the generating function is automatically determined as a function of H̄ and of x. In the following example we consider a concrete application of this technique. J
Example 24. Consider the Hamiltonian system with one degree of freedom
H(p, x) = p²/2 + V(x),
with V(x) 2π-periodic. For each value of H̄(P) we have (assuming for definiteness p > 0)
S₁(x, P) = ∫₀ˣ √(2(H̄(P) − V(y))) dy.
Therefore,
X = ∫₀ˣ ∂/∂H̄ [√(2(H̄(P) − V(y)))] D_P H̄(P) dy.
In principle, the function H̄(P) can be more or less arbitrary. To impose uniqueness it is convenient to require periodicity in the change of variables, that is, that X increase by one unit over each period,
X(2π, P) = X(0, P) + 1,
which implies
D_P H̄(P) = [∂/∂H̄ ∫₀^{2π} √(2(H̄(P) − V(y))) dy]⁻¹.
J
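The formula above has a classical interpretation: the derivative ∂/∂H̄ of the integral equals ∫₀^{2π} dy/√(2(H̄ − V(y))), which is the time the orbit takes to traverse one period, so D_P H̄ is the frequency of the motion. Below is a numerical sanity check of this identity; the potential V and the energy level are illustrative choices, not taken from the text:

```python
import math

V = lambda y: 0.3 * math.cos(y)          # illustrative 2*pi-periodic potential
E = 1.5                                  # energy level with E > max V, so p > 0

def trap(f, a, b, n=4000):
    # composite trapezoid rule
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

def action_integral(e):
    # I(e) = int_0^{2 pi} sqrt(2 (e - V(y))) dy
    return trap(lambda y: math.sqrt(2.0 * (e - V(y))), 0.0, 2.0 * math.pi)

# dI/de by central differences should equal the period of the orbit
h = 1e-4
dI = (action_integral(E + h) - action_integral(E - h)) / (2.0 * h)
period = trap(lambda y: 1.0 / math.sqrt(2.0 * (E - V(y))), 0.0, 2.0 * math.pi)
rel_err = abs(dI - period) / period
```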
4. HAMILTONIAN DYNAMICS 87
Exercise 75. Show that the polar coordinates change of variables (x, p) =
(r cos θ, r sin θ) is not canonical. Determine a function g(r) such that
(x, p) = (g(r) cos θ, g(r) sin θ) is a canonical transformation (for r > 0).
4.4. Other variational principles. In the case of Hamiltonian
systems, as the next exercise shows, there exists an additional varia-
tional principle:
Exercise 76. Show that the critical points (x,p) of the functional
∫₀ᵀ (p·ẋ − x·ṗ)/2 + H(p,x)
are solutions to Hamilton's equations.
Unfortunately, the functional of the previous exercise is not coercive in W^{1,2} and may not have any minimizer. The Clarke duality principle (following exercise) is another variational principle, for convex Hamiltonians, which is coercive.
Exercise 77 (Clarke duality). Let H(p, x) : R²ⁿ → R be a C^∞ function, strictly convex and coercive, both in x and p. Let H*(w_x, w_p) : R²ⁿ → R be the total Legendre transform
H*(w_x, w_p) = sup_{x,p} [−w_x · x − w_p · p − H(p, x)].
Let (v_x,v_p) be a critical point of
∫₀ᵀ ½[v̇_x · v_p − v̇_p · v_x] + H*(v̇_x, v̇_p).
Show that
x = −D_{w_x}H*(v̇_x, v̇_p),  p = −D_{w_p}H*(v̇_x, v̇_p)
is a solution of Hamilton's equations.
Exercise 78. Apply the previous exercise to the Hamiltonian
H(p, x) = (p² + x²)/2.
Example 25 (Maupertuis principle). Consider a system with Lagrangian L and energy given by
E(x, ẋ) = D_vL(x, ẋ)ẋ − L(x, ẋ).
Since the energy is conserved by the solutions of the Euler-Lagrange equation, the critical points of the action are also critical points of the functional
∫₀ᵀ L + E = ∫₀ᵀ D_vL(x, ẋ)ẋ,
under the constraint that energy is conserved.
Obviously, in general it is hard to construct energy-preserving variations. We are going to illustrate, in an example, how to avoid this problem. Let L be the Lagrangian
L(x, v) = ½ g_{ij}v_i v_j − U(x).
Then
E = ½ g_{ij}v_i v_j + U(x)
and
D_vL v = g_{ij}v_i v_j.
Thus we can write
D_vL v = 2(E − U(x)).
Therefore the functional can be rewritten as
(51) M(x, E) = ∫₀ᵀ √(2(E − U(x))) √(g_{ij}ẋ_i ẋ_j) dt.
The second factor is the arc-length element along the curve that connects x(0) to x(T). This integral is independent of the parametrization, and therefore we can look at its critical points (without any constraint), which obviously depend on the parameter E. Once a critical point is determined, in principle we can choose a parametrization of the curve that preserves the energy. The next exercise shows that such critical points are solutions to the Euler-Lagrange equation:
5. SUFFICIENT CONDITIONS 89
Exercise 79. Let x be a critical point of M(x, E₀), parametrized in such a way that
E(x, ẋ) = E₀.
Show that x is a solution of the Euler-Lagrange equation. J
5. Sufficient conditions
This section addresses a very classical topic in the calculus of vari-
ations, namely the study of conditions that ensure that a solution to
the Euler-Lagrange equation is indeed a minimizer.
5.1. Existence of minimizers. In general, it is not possible to guarantee that a solution to the Euler-Lagrange equation is a minimizer of the action. However, for short time, the next theorem settles this issue.
Theorem 43 (Existence of minimizers). Let L(x, v) be strictly convex in v and satisfy
|D²_{xx}L| ≤ C,  |D²_{xv}L| ≤ C.
Let x : [0, T] → Rⁿ be a solution to the Euler-Lagrange equation. Then, for T sufficiently small, x is a minimizer of the action over all C¹ functions y with the same endpoints: y(0) = x(0) and y(T) = x(T).
Proof. Observe that if f is a C² function then
f(1) = f(0) + f′(0) + ∫₀¹ ∫₀ˢ f″(r) dr ds.
Applying this identity to
f(r) = L((1 − r)x + ry, (1 − r)ẋ + rẏ),
we obtain
∫₀ᵀ L(y, ẏ) dt = ∫₀ᵀ [ L(x, ẋ) + D_xL(x, ẋ)(y − x) + D_vL(x, ẋ)(ẏ − ẋ)
+ ∫₀¹ ∫₀ˢ ( (y − x)ᵀ D²_{xx}L((1 − r)x + ry, (1 − r)ẋ + rẏ)(y − x)
+ 2(y − x)ᵀ D²_{xv}L((1 − r)x + ry, (1 − r)ẋ + rẏ)(ẏ − ẋ)
+ (ẏ − ẋ)ᵀ D²_{vv}L((1 − r)x + ry, (1 − r)ẋ + rẏ)(ẏ − ẋ) ) dr ds ] dt.
Since x satisfies the Euler-Lagrange equation and, by strict convexity, D²_{vv}L ≥ γ, we have
∫₀ᵀ L(y, ẏ) dt ≥ ∫₀ᵀ [ L(x, ẋ)
+ ∫₀¹ ∫₀ˢ ( (y − x)ᵀ D²_{xx}L((1 − r)x + ry, (1 − r)ẋ + rẏ)(y − x)
+ 2(y − x)ᵀ D²_{xv}L((1 − r)x + ry, (1 − r)ẋ + rẏ)(ẏ − ẋ) ) dr ds
+ γ|ẏ − ẋ|² ] dt.
The one-dimensional Poincaré inequality implies
∫₀ᵀ |y − x|² dt ≤ (T²/2) ∫₀ᵀ |ẏ − ẋ|² dt,
and hence
∫₀ᵀ ∫₀¹ ∫₀ˢ (y − x)ᵀ D²_{xx}L((1 − r)x + ry, (1 − r)ẋ + rẏ)(y − x) dr ds dt ≥ −CT² ∫₀ᵀ |ẏ − ẋ|².
Thus, for any ε > 0,
∫₀ᵀ ∫₀¹ ∫₀ˢ (y − x)ᵀ D²_{xv}L((1 − r)x + ry, (1 − r)ẋ + rẏ)(ẏ − ẋ) dr ds dt
≥ −ε ∫₀ᵀ |ẏ − ẋ|² − (C/ε) ∫₀ᵀ |y − x|²
≥ −(ε + CT²/ε) ∫₀ᵀ |ẏ − ẋ|².
Thus, choosing T sufficiently small and taking ε = T, we obtain
∫₀ᵀ L(y, ẏ) dt ≥ ∫₀ᵀ L(x, ẋ) + θ ∫₀ᵀ |ẏ − ẋ|²,
for some θ > 0.
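The short-time minimality asserted by theorem 43 can be observed numerically. The sketch below assumes a pendulum-type Lagrangian L(x, v) = v²/2 − U(x) (an illustrative choice, not from the text), integrates the Euler-Lagrange equation ẍ = −U′(x) on a short interval, and compares its action against perturbed curves with the same endpoints:

```python
import math

# illustrative Lagrangian L(x, v) = v**2 / 2 - U(x)
U  = lambda x: -math.cos(x)
dU = lambda x: math.sin(x)

T, N = 0.5, 2000
dt = T / N

def rk4_step(x, v, h):
    # one Runge-Kutta step for x'' = -U'(x)
    f = lambda s: (s[1], -dU(s[0]))
    k1 = f((x, v))
    k2 = f((x + h/2 * k1[0], v + h/2 * k1[1]))
    k3 = f((x + h/2 * k2[0], v + h/2 * k2[1]))
    k4 = f((x + h * k3[0],  v + h * k3[1]))
    return (x + h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            v + h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

# solve the Euler-Lagrange equation on [0, T]
xs, vs = [0.3], [0.2]
for _ in range(N):
    x, v = rk4_step(xs[-1], vs[-1], dt)
    xs.append(x); vs.append(v)

def action(path, vel):
    # trapezoid rule for the action integral
    Lv = [vel[i]**2 / 2 - U(path[i]) for i in range(N + 1)]
    return dt * (sum(Lv) - 0.5 * (Lv[0] + Lv[-1]))

S_min = action(xs, vs)

# perturbations eps * sin(k pi t / T) vanish at both endpoints
beats = True
for k in (1, 2, 3):
    for eps in (0.05, -0.1, 0.2):
        ys = [xs[i] + eps * math.sin(k * math.pi * i * dt / T)
              for i in range(N + 1)]
        ws = [vs[i] + eps * k * math.pi / T * math.cos(k * math.pi * i * dt / T)
              for i in range(N + 1)]
        beats = beats and (action(ys, ws) > S_min)
```

With T = 0.5 the quadratic gap θ∫|ẏ − ẋ|² dominates, so every perturbed curve has strictly larger action.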
Exercise 80. Prove the one-dimensional Poincaré inequality
∫₀ᵀ φ² ≤ (T²/2) ∫₀ᵀ |φ̇|²
for all C¹ functions φ satisfying φ(0) = φ(T) = 0.
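The following is a numerical sanity check of the inequality, not a proof: it evaluates both sides by the trapezoid rule for a few sample functions vanishing at the endpoints (T = 2 is an arbitrary choice):

```python
import math

T, N = 2.0, 4000
dt = T / N
ts = [i * dt for i in range(N + 1)]

def check(phi, dphi):
    # trapezoid approximations of int phi^2 and (T^2/2) int |phi'|^2
    def trap(vals):
        return dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    lhs = trap([phi(t)**2 for t in ts])
    rhs = T**2 / 2 * trap([dphi(t)**2 for t in ts])
    return lhs <= rhs + 1e-9

ok = all([
    check(lambda t: math.sin(math.pi * t / T),
          lambda t: math.pi / T * math.cos(math.pi * t / T)),
    check(lambda t: t * (T - t), lambda t: T - 2 * t),
    check(lambda t: math.sin(3 * math.pi * t / T)**2,
          lambda t: 3 * math.pi / T * math.sin(6 * math.pi * t / T)),
])
```

The constant T²/2 is not sharp (the optimal one is (T/π)²), so the inequality holds with room to spare in each case.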
Exercise 81. Suppose that the Lagrangian L, instead of satisfying
|D²_{xx}L| ≤ C,  |D²_{xv}L| ≤ C,
as in theorem 43, satisfies
|D²_{xx}L| ≤ C(1 + |v|²),  |D²_{xv}L| ≤ C(1 + |v|).
Assume further that the curves y are constrained to have derivatives bounded in L². Can you adapt the proof and the statement of theorem 43 to include this case?
5.2. Hamilton-Jacobi equations.
Theorem 44. Let V(x, t) be a C² solution of the Hamilton-Jacobi equation
(52) V_t = H(D_xV, x),
for 0 ≤ t ≤ T. Let x be a solution to the equation
ẋ = −D_pH(D_xV(x, t), x).
Then x is a solution to the Euler-Lagrange equation
d/dt D_vL − D_xL = 0
which minimizes the action
(53) ∫₀ᵀ L(x, ẋ) dt
under fixed boundary conditions.
Proof. It suffices to show that the trajectory x minimizes the action; it is then automatically a solution to the Euler-Lagrange equation. Observe that the problem of minimizing (53) with fixed endpoints is equivalent to minimizing
∫₀ᵀ L(x, ẋ) + ẋ D_xV(x, t) + V_t(x, t) dt,
with the same endpoint constraint. On the trajectory x we have
∫₀ᵀ L(x, ẋ) + ẋ D_xV(x, t) + V_t(x, t) dt = 0.
But for any other trajectory y we have
L(y, ẏ) + ẏ D_xV(y, t) ≥ L(y, ẏ) − D_pH(D_xV(y, t), y) D_xV(y, t) = −H(D_xV(y, t), y),
and, therefore,
∫₀ᵀ L(y, ẏ) + ẏ D_xV(y, t) + V_t(y, t) dt ≥ 0.
To solve the Hamilton-Jacobi equation we can use the method of characteristics: let (p,x) be a solution of Hamilton's equations
(54) ṗ = D_xH(p,x),  ẋ = −D_pH(p,x),
with initial data (p(0),x(0)) = (D_xV(x, 0), x). Then
V(x(0), 0) − V(x(t), t) = ∫₀ᵗ L(x, ẋ) ds.
Therefore, in order for the method of characteristics to yield a solution to the Hamilton-Jacobi equation in a neighborhood of the trajectory, we must have that the mapping
x ↦ x(t; x)
is invertible. As was seen previously, the equation (54) is equivalent to the Euler-Lagrange equation
d/dt D_vL(x, ẋ) − D_xL = 0.
The derivative of this equation with respect to a parameter is the Jacobi equation
(55) d/dt [D²_{vv}L Ẏ + D²_{vx}L Y] − D²_{xv}L Ẏ − D²_{xx}L Y = 0.
If Y(0) = I then there exists T > 0 such that det Y(t) ≠ 0 for all 0 ≤ t < T, and therefore the method of characteristics yields a local solution of the Hamilton-Jacobi equation.
5.3. Existence and regularity of minimizers. In this section we assume that the Lagrangian L(x, v) is C^∞, strictly convex in v, and satisfies
(56) −C + θ|v|² ≤ L(x, v) ≤ C(1 + |v|²),
for some θ > 0, and that, for each fixed compact set K and x ∈ K, we have
(57) |D_xL(x, v)| ≤ C_K(1 + |v|²),
(58) |D_vL(x, v)| ≤ C_K(1 + |v|).
Theorem 45. Suppose L(x, v) : Rⁿ × Rⁿ → R is smooth and satisfies (56), (57) and (58). Then, for any T > 0 and any x₀, x₁ ∈ Rⁿ, there exists a minimizer x ∈ W^{1,2}[0, T] of
(59) ∫₀ᵀ L(x, ẋ) ds
satisfying x(0) = x₀, x(T) = x₁.
Proof. Let xₙ be a minimizing sequence. Then, using (56), we conclude that ‖ẋₙ‖_{L²} is uniformly bounded. By the Poincaré inequality, we conclude that
supₙ ‖xₙ‖_{W^{1,2}} < ∞.
By Morrey's theorem, the sequence xₙ is equicontinuous and bounded (since xₙ(0) is fixed); thus there exists, by the Ascoli-Arzelà theorem, a subsequence which converges uniformly. We can extract a further subsequence that converges weakly in W^{1,2} to a function x. We would like to prove that x is a minimum. To do that it is enough to prove that the functional is weakly lower semicontinuous, that is, that
(60) lim inf_{n→∞} ∫₀ᵀ L(xₙ, ẋₙ) ≥ ∫₀ᵀ L(x, ẋ),
whenever xₙ ⇀ x in W^{1,2}. By contradiction, suppose that there is a sequence xₙ ⇀ x such that
(61) lim inf_{n→∞} ∫₀ᵀ L(xₙ, ẋₙ) < ∫₀ᵀ L(x, ẋ).
By convexity,
(62) ∫₀ᵀ L(xₙ, ẋₙ) ≥ ∫₀ᵀ L(xₙ, ẋₙ) − L(x, ẋₙ) + L(x, ẋ) + D_vL(x, ẋ)(ẋₙ − ẋ).
Because ẋₙ ⇀ ẋ we have
∫₀ᵀ D_vL(x, ẋ)(ẋₙ − ẋ) → 0,
since D_vL(x, ẋ) ∈ L². From the uniform convergence of xₙ to x we conclude that
∫₀ᵀ L(xₙ, ẋₙ) − L(x, ẋₙ) → 0,
since
|L(xₙ, ẋₙ) − L(x, ẋₙ)| ≤ C_K |xₙ − x|(1 + |ẋₙ|²).
Thus, by taking the lim inf in (62), we obtain a contradiction to (61), and therefore (60) holds.
Theorem 46. Let x be a minimizer of (59). Then x is a weak solution to the Euler-Lagrange equation, that is, for all ϕ ∈ C^∞_c(0, T),
(63) ∫₀ᵀ D_xL(x, ẋ)ϕ + D_vL(x, ẋ)ϕ̇ = 0.
Proof. To obtain this result, it is enough to prove that, at ε = 0,
d/dε ∫₀ᵀ L(x + εϕ, ẋ + εϕ̇) |_{ε=0} = ∫₀ᵀ d/dε L(x + εϕ, ẋ + εϕ̇) |_{ε=0},
that is, to justify the exchange of the derivative with the integral.
By Morrey's theorem, since x ∈ W^{1,2}(0, T), we have ‖x‖_{L^∞} ≤ C. So x ∈ K for a suitable compact set K. Let |ε| < 1. Observe that there exists a compact set K̃ ⊃ K such that x + εϕ ∈ K̃ for all t. For almost every t ∈ [0, T], the function
ε ↦ L(x + εϕ, ẋ + εϕ̇)
is a C¹ function of ε. Furthermore,
|L(x + εϕ, ẋ + εϕ̇)| ≤ C_K̃(1 + |ẋ + εϕ̇|²) ≤ C_K̃(1 + |ẋ|² + |ϕ̇|²),
and
|d/dε L(x + εϕ, ẋ + εϕ̇)| ≤ C_K̃(1 + |ẋ|² + |ϕ̇|²)(|ϕ| + |ϕ̇|).
This estimate allows us to exchange the derivative with the integral.
Exercise 82. Show that the identity (63) also holds for ϕ ∈ W^{1,2}₀(0, T).
Theorem 47. Suppose L(x, v) : Rⁿ × Rⁿ → R is smooth, satisfies (56), and is strictly convex. Then the weak solutions to the Euler-Lagrange equation are C² and, therefore, classical solutions.
Proof. Let x ∈ W^{1,2}(0, T) be a weak solution to the Euler-Lagrange equation. Define
p(t) = p₀ + ∫ₜᵀ D_xL(x, ẋ) ds,
with p₀ ∈ Rⁿ to be chosen later. For each ϕ ∈ C^∞_c(0, T) taking values in Rⁿ we have
∫₀ᵀ d/dt (p · ϕ) dt = p · ϕ |₀ᵀ = 0.
Thus,
∫₀ᵀ −D_xL(x, ẋ)ϕ + p ϕ̇ dt = 0.
Using the Euler-Lagrange equation in the weak form, we conclude that
∫₀ᵀ (p + D_vL(x, ẋ))ϕ̇ dt = 0,
which implies that p + D_vL is constant; choosing p₀ conveniently,
p = −D_vL(x, ẋ).
Since p is continuous, by the previous identity ẋ = −D_pH(p,x), and therefore ẋ is continuous. Moreover, if H(p, x) is the Hamiltonian associated to L, we have
ṗ = D_xH(p,x),
which shows that p is C¹. But since
ẋ = −D_pH(p,x),
we have that ẋ is C¹ and, therefore, x is C².
5.4. Conjugate points. In this section we study the second variation of the action and certain issues concerning the existence of minimizing trajectories. If the Lagrangian corresponds to the kinetic energy in a Riemannian manifold, we also study the connections with curvature.
5.4.1. Second variation and conjugate points. The next exercise establishes the connection between the Jacobi equation (55) and the second variation:
Exercise 83. Let x : [0, T] → Rⁿ. Consider the functional
Y ↦ ∫₀ᵀ ½ D²_{v_iv_j}L Ẏ_iẎ_j + D²_{x_iv_j}L Y_iẎ_j + ½ D²_{x_ix_j}L Y_iY_j.
Show that the Euler-Lagrange equation of this functional is the Jacobi equation (55). Show that if Y is a solution of the Jacobi equation with Y(0) = Y(T) = 0 then
(64) ∫₀ᵀ ½ D²_{v_iv_j}L Ẏ_iẎ_j + D²_{x_iv_j}L Y_iẎ_j + ½ D²_{x_ix_j}L Y_iY_j = 0.
Let x be a solution of the Euler-Lagrange equation corresponding to the Lagrangian L. A point x(T) is conjugate to x(0) if there exists a non-vanishing solution of (55) satisfying Y(0) = Y(T) = 0. The dimension of the space of solutions Y to the Jacobi equation which satisfy Y(0) = 0 is n. Similarly, the space of solutions Y to the Jacobi equation which satisfy Y(T) = 0 is also n-dimensional. Since the space of solutions to the Jacobi equation has dimension 2n, in general the intersection of these two spaces is 0-dimensional, i.e. it contains only the trivial solution.
Exercise 84. Let x be a solution to the Euler-Lagrange equation. Show that Y(t) = d/dλ x((1 + λ)t) |_{λ=0} = t ẋ(t) is a solution to the Jacobi equation satisfying Y(0) = 0.
Suppose L = ½ g_{ij}v_i v_j for some Riemannian metric g. Show that Y(t) ≠ 0 for all t ≠ 0. Conclude that the dimension of the space of solutions Y to the Jacobi equation which satisfy Y(0) = Y(T) = 0 is at most n − 1.
Theorem 48. Let L(x, v) be a C^∞ Lagrangian, strictly convex and coercive. Let x be a solution of the Euler-Lagrange equation corresponding to the Lagrangian L. Let T be such that x(T) is conjugate to x(0). Then the trajectory x is not a local minimum of the action
∫₀^{T₁} L(x, ẋ)
for T₁ > T.
Proof. Let Y be a non-trivial solution of the Jacobi equation with Y(0) = Y(T) = 0. For each ε > 0 consider the trajectory
xε = x + εY if 0 ≤ t ≤ T,  xε = x otherwise.
For each δ > 0, computing the Taylor expansion up to second order and taking into account (64), we obtain
∫₀^{T+δ} L(xε, ẋε) ≤ ∫₀^{T+δ} L(x, ẋ) + O(ε³).
If the sign of the O(ε³) term is negative, we obtain a contradiction; if it is positive, replacing Y by −Y puts us in the previous situation. Therefore the only non-trivial case occurs when the third-order term vanishes and we have
∫₀^{T+δ} L(xε, ẋε) ≤ ∫₀^{T+δ} L(x, ẋ) + O(ε⁴).
Let ϕ be defined in the following way: ϕ(t) = εY(t) if 0 ≤ t ≤ T − δ, ϕ(t) = 0 for t ≥ T + δ, and ϕ is linear in t for T − δ ≤ t ≤ T + δ, interpolating between the values ϕ(T − δ) = εY(T − δ) and ϕ(T + δ) = 0. We would like to show that
∫₀^{T+δ} L(x + ϕ, ẋ + ϕ̇) < ∫₀^{T+δ} L(x, ẋ),
if ε and δ are chosen conveniently. For that we will prove a series of estimates. To simplify notation, and for reasons that will become clear later, we take δ = ε^{3/2}, and we use the relation a ∼ b to denote a = b + O(ε⁴), and similarly ≺ and ≻ for the corresponding inequalities.
We have
∫_{T−δ}^{T+δ} L(xε, ẋε) ∼ ∫_{T−δ}^{T} [ L(xε(T), ẋε(T)) + D_xL(xε(T), ẋε(T))(xε(t) − xε(T)) + D_vL(xε(T), ẋε(T))(ẋε(t) − ẋε(T)) ]
+ ∫_{T}^{T+δ} [ L(x(T), ẋ(T)) + D_xL(x(T), ẋ(T))(x(t) − x(T)) + D_vL(x(T), ẋ(T))(ẋ(t) − ẋ(T)) ].
We observe that, since |t − T| ≤ δ,
x(t) − x(T) = ẋ(T)(t − T) + O(δ²).
Furthermore,
D_xL(xε(T), ẋε(T)) = D_xL(x(T), ẋ(T)) + O(ε),
D_vL(xε(T), ẋε(T)) = D_vL(x(T), ẋ(T)) + O(ε).
Consequently, using the strict convexity of L in v (note that Ẏ(T) ≠ 0, since Y is non-trivial and Y(T) = 0),
∫_{T−δ}^{T+δ} L(xε, ẋε) ∼ δ L(x(T), ẋε(T)) + δ L(x(T), ẋ(T))
≻ 2δ L(x(T), (ẋε(T) + ẋ(T))/2) + 2δγε²|Ẏ(T)|² ∼ ∫_{T−δ}^{T+δ} L(x + ϕ, ẋ + ϕ̇) + Cδε²,
and so
∫₀^{T−δ} L(xε, ẋε) + ∫_{T−δ}^{T+δ} L(x + ϕ, ẋ + ϕ̇) ≺ ∫₀^{T+δ} L(x, ẋ) ds − Cδε²,
which, for δ = ε^{3/2} and ε sufficiently small, implies that x does not minimize the action between 0 and T + δ.
5.4.2. Curvature. The curvature tensor R is defined by
R(X, Y)Z = ∇_X∇_Y Z − ∇_Y∇_X Z − ∇_{[X,Y]}Z.
Exercise 85. Show that
R(X, Y)Z = R^l_{ijk} X_i Y_j Z_k ∂/∂x_l,
where
R^l_{ijk} = ∂Γ^l_{jk}/∂x_i − ∂Γ^l_{ik}/∂x_j + Γ^m_{jk}Γ^l_{im} − Γ^m_{ik}Γ^l_{jm}.
Exercise 86 (Bianchi's identity). Show that for all vector fields X, Y, Z
R(X, Y)Z + R(Y, Z)X + R(Z,X)Y = 0.
Theorem 49. Let the Lagrangian L be the kinetic energy defined by a Riemannian metric. Consider a geodesic x with tangent vector X = ẋ. Then Jacobi's equation can be written as
(65) D²Y/dt² = R(X, Y)X.
Proof. Consider a one-parameter family of geodesics φ(t, δ), that is, for each δ the mapping t ↦ φ(t, δ) is a geodesic. Let
Y = (∂φ_k/∂δ) ∂/∂x_k
and
X = (∂φ_k/∂t) ∂/∂x_k.
We have [X, Y] = 0, and so
R(X, Y)X = ∇_X∇_Y X − ∇_Y∇_X X − ∇_{[X,Y]}X = ∇_X∇_Y X − ∇_Y∇_X X = ∇_X∇_Y X,
since ∇_X X = DX/dt = 0. Once more, using [X, Y] = 0 and the fact that the connection is symmetric, we have ∇_Y X = ∇_X Y, which then yields (65).
Lemma 50. For all vector fields X, Y, Z, we have
〈R(X, Y)Z, Z〉 = 0.
Proof. We have
〈∇_Y∇_X Z, Z〉 = Y〈∇_X Z, Z〉 − 〈∇_X Z, ∇_Y Z〉
and
〈∇_X∇_Y Z, Z〉 = X〈∇_Y Z, Z〉 − 〈∇_Y Z, ∇_X Z〉.
Therefore
〈∇_X∇_Y Z, Z〉 − 〈∇_Y∇_X Z, Z〉 = X〈∇_Y Z, Z〉 − Y〈∇_X Z, Z〉
= XY〈Z, Z〉 − YX〈Z, Z〉 − X〈Z, ∇_Y Z〉 + Y〈Z, ∇_X Z〉,
that is,
〈∇_X∇_Y Z, Z〉 − 〈∇_Y∇_X Z, Z〉 = ½ [X, Y]〈Z, Z〉.
Since
〈∇_{[X,Y]}Z, Z〉 = [X, Y]〈Z, Z〉 − 〈Z, ∇_{[X,Y]}Z〉,
we have
〈∇_{[X,Y]}Z, Z〉 = ½ [X, Y]〈Z, Z〉,
which implies the desired identity.
Proposition 51. Let Y be a solution of (65) along a geodesic x whose tangent vector X = ẋ satisfies
DX/dt = 0.
Suppose that Y(0) = 0 and that
〈X, DY/dt〉 = 0
at t = 0. Then 〈X, DY/dt〉 = 0 for all t.
Proof. We have
d/dt 〈X, DY/dt〉 = 〈DX/dt, DY/dt〉 + 〈X, D²Y/dt²〉 = 〈X, R(X, Y)X〉 = 0,
taking into account that DX/dt = 0 and using lemma 50 in the last identity.
Suppose we are looking for solutions to the Jacobi equation satisfying Y(0) = 0 along a geodesic x. Consider the solution Ȳ(t) = tẋ(t) constructed in exercise 84, which is tangent to the geodesic x. We can write any such solution as Y = aȲ + Y⊥, where a ∈ R and Y⊥ is a solution to the Jacobi equation such that Y⊥(0) = 0 and Ẏ⊥(0) is orthogonal to ẋ(0). By the previous proposition, Y⊥(t) is orthogonal to ẋ(t) at all times. Additionally, by exercise 84, Ȳ(t) ≠ 0 for all t ≠ 0. Therefore, if Y(T) = 0 then a = 0. Consequently, to look for conjugate points it suffices to consider initial conditions orthogonal to the geodesic.
A manifold has constant sectional curvature k₀ if for all vector fields X, Y, W, Z we have
〈R(X, Y)W, Z〉 = k₀ [〈Y,W〉〈X, Z〉 − 〈X,W〉〈Y, Z〉].
Exercise 87. Show that the sphere x² + y² + z² = 1 has constant sectional curvature.
Exercise 88. Let e_i be an orthonormal basis for T_pM. Show that if M has constant sectional curvature then
R_{ijkl} = 〈R(e_i, e_j)e_k, e_l〉 = k₀(δ_{jk}δ_{il} − δ_{ik}δ_{jl}).
Example 26. Let M be a manifold with constant sectional curvature k₀. Let x be a geodesic in M with |ẋ| = 1 and let Y be a Jacobi field orthogonal to ẋ. Then Jacobi's equation
D²Y/dt² = R(ẋ, Y)ẋ
can be written as
D²Y/dt² = −k₀Y,
since for each vector field X we have
〈R(ẋ, Y)ẋ, X〉 = k₀(〈Y, ẋ〉〈ẋ, X〉 − 〈ẋ, ẋ〉〈Y, X〉) = −k₀〈Y, X〉.
Thus, depending on the sign of k₀, we obtain the following solutions:
Y(t) = sin(t√k₀) e(t) if k₀ > 0,  Y(t) = t e(t) if k₀ = 0,  Y(t) = sinh(t√−k₀) e(t) if k₀ < 0,
where e(t) is a parallel vector field. As a conclusion, if the sectional curvature is negative, the geodesics cannot have conjugate points. J
5.4.3. Computation of conjugate points. In this section we explicitly compute conjugate points.
Example 27 (Sphere). Consider a sphere of radius 1 in spherical coordinates (θ, ϕ) as in exercise 50. The Euler-Lagrange equations are
d/dt (sin²ϕ θ̇) = 0,
ϕ̈ − sinϕ cosϕ θ̇² = 0,
and the corresponding Jacobi equations are
d/dt [sin²ϕ ṗ_θ] + d/dt [2 sinϕ cosϕ θ̇ p_ϕ] = 0,
p̈_ϕ − (cos²ϕ − sin²ϕ) θ̇² p_ϕ − 2 sinϕ cosϕ θ̇ ṗ_θ = 0.
Consider the geodesic ϕ = π/2, θ̇ = 1 (the equator). In this case the Jacobi equations reduce to
p̈_θ = 0,
p̈_ϕ + p_ϕ = 0,
which have as a particular solution
p_ϕ = sin t,  p_θ = 0,
which shows that the point θ = π is conjugate to θ = 0. J
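The conjugate point found above reflects the refocusing of geodesics on the sphere: all great circles leaving a point meet again at the antipodal point at time π. The sketch below integrates the Euler-Lagrange equations numerically (with the sign ϕ̈ = sinϕ cosϕ θ̇², under which the equator is a solution) for two nearby initial headings and checks the refocusing; the step counts and headings are arbitrary choices:

```python
import math

def geodesic(alpha, t_end, n=4000):
    # Euler-Lagrange equations on the unit sphere:
    #   theta'' = -2 cot(phi) phi' theta',   phi'' = sin(phi) cos(phi) theta'^2
    # starting on the equator at (theta, phi) = (0, pi/2), unit speed,
    # heading alpha measured from the equator
    def f(s):
        th, ph, dth, dph = s
        return (dth, dph,
                -2.0 * math.cos(ph) / math.sin(ph) * dph * dth,
                math.sin(ph) * math.cos(ph) * dth * dth)
    s = (0.0, math.pi / 2, math.cos(alpha), math.sin(alpha))
    h = t_end / n
    for _ in range(n):                       # classical RK4 integration
        k1 = f(s)
        k2 = f(tuple(s[i] + h/2 * k1[i] for i in range(4)))
        k3 = f(tuple(s[i] + h/2 * k2[i] for i in range(4)))
        k4 = f(tuple(s[i] + h * k3[i] for i in range(4)))
        s = tuple(s[i] + h/6 * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i])
                  for i in range(4))
    th, ph = s[0], s[1]
    # embed the final point in R^3
    return (math.sin(ph) * math.cos(th), math.sin(ph) * math.sin(th),
            math.cos(ph))

# geodesics leaving (1, 0, 0) with different headings meet again at time pi
antipode = (-1.0, 0.0, 0.0)
d0 = math.dist(geodesic(0.0, math.pi), antipode)
d1 = math.dist(geodesic(0.05, math.pi), antipode)
```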
Example 28 (Lobatchewski plane). Consider the metric in the upper half-plane (y > 0) given by
g = [1/y² 0; 0 1/y²].
The geodesics minimize
∫ (ẋ² + ẏ²)/(2y²),
and, consequently, are solutions to the Euler-Lagrange equations
d/dt [ẋ/y²] = 0,
d/dt [ẏ/y²] + (ẋ² + ẏ²)/y³ = 0.
Consider vertical geodesics, that is, those with ẋ = 0. Then
d/dt [ẏ/y²] + ẏ²/y³ = 0,
which admits
y = ae⁻ᵗ
as a solution.
The Jacobi equations are
d/dt [ṗ_x/y² − 2ẋ p_y/y³] = 0,
d/dt [ṗ_y/y² − 2ẏ p_y/y³] + 2(ẋ ṗ_x + ẏ ṗ_y)/y³ − 3(ẋ² + ẏ²) p_y/y⁴ = 0.
Observe that to determine the conjugate points we only need to consider solutions which are orthogonal to the geodesic. For vertical geodesics we can therefore set p_x = 0 and ẋ = 0. Thus
d/dt [ṗ_y/y² − 2ẏ p_y/y³] + 2ẏ ṗ_y/y³ − 3ẏ² p_y/y⁴ = 0.
Set p = p_y/y². Then
p̈ + 2(ẏ/y)ṗ + (ẏ²/y²)p = 0.
Since ẏ = −y, we have
p̈ − 2ṗ + p = 0,
whose solutions are p = (a + bt)eᵗ. We leave as a (simple) exercise to check that therefore there are no conjugate points. J
5.4.4. Cut locus.
Theorem 52. Let x be a solution of the Euler-Lagrange equation and let T > 0 be the infimum of all t for which x is not a minimizing trajectory. Then either x(0) and x(T) are conjugate, or there exists a solution y with y(0) = x(0) and y(T) = x(T), distinct from x, such that
∫₀ᵀ L(x, ẋ) = ∫₀ᵀ L(y, ẏ).
Proof. Since for t > T the trajectory x is not minimizing, there exist tᵢ > T and solutions yᵢ to the Euler-Lagrange equation such that tᵢ → T, yᵢ(0) = x(0), yᵢ(tᵢ) = x(tᵢ). By the proof of theorem 45, which guarantees the existence of minimizing trajectories, we can assume that ẏᵢ(0) is uniformly bounded. Then, through some subsequence, ẏᵢ(0) → ẏ(0), the initial velocity of a solution y. If ẏ(0) ≠ ẋ(0), it is easy to check that we have
∫₀ᵀ L(x, ẋ) = ∫₀ᵀ L(y, ẏ),
as otherwise the trajectory x would not be minimizing for t < T. In the second case, consider the flow φ(x, v, t) = (φˣ, φᵛ) given by the Euler-Lagrange equations with initial conditions (x, v) at t = 0, that is, φ(x(0), ẋ(0), t) = (x(t), ẋ(t)). If x(0) is not conjugate to x(T), the matrix D_vφˣ(x(0), ẋ(0), T) is non-singular; therefore, for v in a neighborhood of ẋ(0) and t sufficiently close to T, the mapping
v ↦ φˣ(x(0), v, t)
is a diffeomorphism. But ẏᵢ(0) → ẋ(0) and yᵢ(tᵢ) = x(tᵢ), which is a contradiction.
6. Symmetries and Noether theorem
Noether’s theorem concerns variational problems which admit sym-
metries. By this theorem, associated to each symmetry there is a
quantity that is conserved by the solutions of the Euler-Lagrange equation. In classical mechanics, for instance, translation symmetry yields conservation of linear momentum, rotation symmetry corresponds to conservation of angular momentum, and time-invariance implies conservation of energy.
6.1. Routh's method. We start the discussion of symmetries by considering a classical technique to simplify the Euler-Lagrange equations. Consider a Lagrangian of the form L(x, ẋ, ẏ), that is, independent of the coordinate y. Note that this corresponds to translation invariance in the coordinate y. The Euler-Lagrange equation shows that
p_y = −D_ẏL(x, ẋ, ẏ)
is constant. We will explore this fact to simplify the Euler-Lagrange equations. We assume further that w ↦ L(x, ẋ, w) is strictly convex and superlinear. Then we define the partial Legendre transform with respect to ẏ, Routh's function, as
R(x, ẋ, p_y) = sup_w [−p_y · w − L(x, ẋ, w)].
By convexity, the supremum is achieved at a unique point w(x, ẋ, p_y). We have that
p_y = −D_wL,  ẏ = −D_{p_y}R.
Note that, by the Euler-Lagrange equation,
ṗ_y = 0,
and
d/dt ∂R/∂ẋ − ∂R/∂x = −d/dt ∂L/∂ẋ + ∂L/∂x − d/dt [(∂L/∂w + p_y) ∂w/∂ẋ] + (∂L/∂w + p_y) ∂w/∂x
= −(d/dt ∂L/∂ẋ − ∂L/∂x) = 0,
since p_y = −∂L/∂w. Therefore, since p_y is constant, we can solve these equations in the following way: for each fixed p_y, consider the equation
d/dt ∂R/∂ẋ − ∂R/∂x = 0.
Once this equation is solved, determine y through
ẏ = −D_{p_y}R(x, ẋ, p_y).
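Routh's reduction can be illustrated numerically. For the planar Lagrangian L = (ṙ² + r²θ̇²)/2 − U(r), with θ cyclic, the conserved momentum is p_θ = r²θ̇ (up to the sign convention used for momenta in the text), and the reduced equation is r̈ = p_θ²/r³ − U′(r). The potential U below is a hypothetical choice; the sketch integrates the full system and the reduced one and compares the radial motions:

```python
import math

# hypothetical potential for the cyclic Lagrangian L = (r'^2 + r^2 th'^2)/2 - U(r)
U  = lambda r: 1.0 / r**2 - 1.0 / r
dU = lambda r: -2.0 / r**3 + 1.0 / r**2

def rk4(f, s, h, n):
    # generic RK4 integrator for s' = f(s)
    for _ in range(n):
        k1 = f(s)
        k2 = f(tuple(s[i] + h/2 * k1[i] for i in range(len(s))))
        k3 = f(tuple(s[i] + h/2 * k2[i] for i in range(len(s))))
        k4 = f(tuple(s[i] + h * k3[i] for i in range(len(s))))
        s = tuple(s[i] + h/6 * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i])
                  for i in range(len(s)))
    return s

r0, dr0, th0, dth0 = 1.5, 0.0, 0.0, 0.4
p_theta = r0**2 * dth0                      # conserved since theta is cyclic

# full Euler-Lagrange system in (r, r', theta, theta'):
#   r'' = r th'^2 - U'(r),   th'' = -2 r' th' / r
full = lambda s: (s[1], s[0] * s[3]**2 - dU(s[0]), s[3],
                  -2.0 * s[1] * s[3] / s[0])
# reduced (Routh) system in (r, r') with p_theta frozen
red = lambda s: (s[1], p_theta**2 / s[0]**3 - dU(s[0]))

sf = rk4(full, (r0, dr0, th0, dth0), 1e-3, 5000)
sr = rk4(red,  (r0, dr0),            1e-3, 5000)
gap = abs(sf[0] - sr[0])                    # the radial motions agree
drift = abs(sf[0]**2 * sf[3] - p_theta)     # p_theta stays constant
```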
Exercise 89. Apply Routh's method to the Lagrangian
L = ẋ²/2 + ẏ²/2 − U(x).
Exercise 90. Apply Routh's method to the symmetric top in an external field, which has Lagrangian
L = (I₁/2)(θ̇² + ϕ̇² sin²θ) + (I₃/2)(ψ̇ + ϕ̇ cos θ)² − U(ϕ, θ).
Exercise 91. Apply Routh's method to the spherical pendulum, whose Lagrangian is
L = (θ̇² sin²ϕ + ϕ̇²)/2 − U(ϕ).
6.2. Noether theorem. As a motivation for the definition of invariance of a Lagrangian with respect to a transformation group, observe that if φ : Rⁿ → Rⁿ is a diffeomorphism and γ : [0, T] → Rⁿ is an arbitrary curve, then φ(γ) is another curve in Rⁿ whose velocity is D_xφ(γ)γ̇. Suppose that for each τ ∈ R, φ_τ : Rⁿ → Rⁿ is a diffeomorphism. We say that a Lagrangian L(x, v) is invariant under the transformation group φ_τ(x) if for each τ ∈ R
L(x, v) = L(φ_τ(x), D_xφ_τ(x)v).
We will assume additionally that φ_τ is differentiable in τ.
Theorem 53. Let L be a Lagrangian invariant under a smooth transformation group φ_τ(x). Let x be a solution of the Euler-Lagrange equation. Then
D_vL(x(T), ẋ(T)) d/dτ φ_τ(x(T)) |_{τ=0}
is independent of T.
Proof. Let x be a solution of the Euler-Lagrange equation and
x_τ(t) = φ_τ(x(t)).
Then
ẋ_τ = D_xφ_τ(x(t))ẋ(t).
Consequently,
(66) ∫₀ᵀ L(x_τ, ẋ_τ)
is constant in τ. Differentiating (66) with respect to τ we obtain
∫₀ᵀ D_xL(x_τ, ẋ_τ) dx_τ/dτ + D_vL(x_τ, ẋ_τ) dẋ_τ/dτ = 0.
Integrating by parts, using the Euler-Lagrange equation, and taking τ = 0, we obtain
D_vL(x(0), ẋ(0)) d/dτ φ_τ(x(0)) |_{τ=0} = D_vL(x(T), ẋ(T)) d/dτ φ_τ(x(T)) |_{τ=0}.
Exercise 92. Let ω ∈ Rⁿ and let L(x, v) be a Lagrangian satisfying, for all τ, L(x + ωτ, v) = L(x, v). Show that D_vL · ω is a constant of motion.
Exercise 93. Let L(x, y, v_x, v_y) = (v_x² + v_y²)/2 − (x² + y²)/2. Show that L is invariant under rotations and, using Noether's theorem, that the angular momentum xv_y − yv_x is a constant of motion.
Theorem 54. Suppose L is a Lagrangian which does not depend on t. Then the energy is conserved.
Proof. Observe that
∫ₕ^{T+h} L(x(t − h), ẋ(t − h)) dt
is independent of h. Differentiating with respect to h and integrating by parts, using the Euler-Lagrange equation, yields the result.
Example 29. Consider the Lagrangian
L = (ẋ² + ẏ²)/(2y²),
corresponding to the geodesic flow in the Lobatchewski plane. Identifying the upper half-plane with {z ∈ C : ℑ(z) > 0} and the points (x, y) with z = x + iy, the mapping
z ↦ (az + b)/(cz + d)
defines an action of the group SL(2,R), the group of 2 × 2 matrices with unit determinant, on the Lobatchewski plane, which leaves the Lagrangian invariant. Using matrices of the form
A₁(τ) = [1 τ; 0 1],  A₂(τ) = [e^τ 0; 0 e^{−τ}]  and  A₃(τ) = [1 0; τ 1],
we obtain the conservation laws
ẋ/y²,  (xẋ + yẏ)/y²  and  (ẋ(x² − y²) + 2xyẏ)/y².
J
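The three conservation laws of Example 29 can be checked numerically along a geodesic. The sketch below integrates the Euler-Lagrange equations of the Lobatchewski metric, rewritten as ẍ = 2ẋẏ/y and ÿ = (ẏ² − ẋ²)/y, and monitors the drift of the three quantities (the initial conditions are an arbitrary choice):

```python
import math

def f(s):
    # geodesic flow: x'' = 2 x' y' / y,  y'' = (y'^2 - x'^2) / y
    x, y, dx, dy = s
    return (dx, dy, 2.0 * dx * dy / y, (dy * dy - dx * dx) / y)

def invariants(s):
    # the three Noether quantities of Example 29
    x, y, dx, dy = s
    return (dx / y**2,
            (x * dx + y * dy) / y**2,
            (dx * (x**2 - y**2) + 2.0 * x * y * dy) / y**2)

s = (0.3, 1.0, 0.5, -0.2)
start = invariants(s)
h = 1e-3
for _ in range(4000):                        # RK4 integration up to t = 4
    k1 = f(s)
    k2 = f(tuple(s[i] + h/2 * k1[i] for i in range(4)))
    k3 = f(tuple(s[i] + h/2 * k2[i] for i in range(4)))
    k4 = f(tuple(s[i] + h * k3[i] for i in range(4)))
    s = tuple(s[i] + h/6 * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i])
              for i in range(4))
drift = max(abs(a - b) for a, b in zip(start, invariants(s)))
```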
Exercise 94. Obtain the general law F(x, y) = 0 of the motion of a geodesic in the Lobatchewski plane.
6.3. Monotonicity formulas. As before, let L : Rⁿ × Rⁿ → R be a smooth Lagrangian. A sub-symmetry (resp. super-symmetry) of L is a (smooth) one-parameter mapping φ_τ(x) such that φ₀(x) = x and
d/dτ L(φ_τ(x), D_xφ_τ(x)v) |_{τ=0} ≤ 0 (resp. ≥ 0).
A simple variation of the proof of Noether's theorem yields:
Theorem 55. Let φ_τ be a sub-symmetry of L. Then
d/dt [D_vL(x, ẋ) d/dτ φ_τ(x) |_{τ=0}] ≤ 0,
with the opposite inequality for super-symmetries.
Proof. It suffices to observe that
0 ≥ d/dτ ∫₀ᵀ L(φ_τ(x), D_xφ_τ(x)ẋ) dt |_{τ=0}
= ∫₀ᵀ D_xL(x, ẋ) d/dτ φ_τ(x) |_{τ=0} + D_vL(x, ẋ) d/dt d/dτ φ_τ(x) |_{τ=0}
= D_vL(x, ẋ) d/dτ φ_τ(x) |_{τ=0} |₀ᵀ,
which then implies the result.
An application of this theorem is the following corollary:
Corollary 56. Let L(x, v) : Rⁿ × Rⁿ → R be a smooth Lagrangian admitting a strict sub-symmetry. Then the corresponding Euler-Lagrange equation has no periodic orbits.
Next we present some additional examples and applications.
Example 30. Suppose that, for some y ∈ Rⁿ and all h ≥ 0, L(x + hy, v) ≤ L(x, v). Then
d/dt D_vL(x, ẋ)y ≤ 0.
J
Example 31. Consider the case in which L(λx, λv) is increasing in λ, for λ ≥ 0. Then
d/dt D_vL(x, ẋ)x ≥ 0.
J
Example 32. Consider the mapping φ_τ(x) = x + τF(x), and assume that
d/dτ L(x + τF(x), v + τD_xF v) ≤ 0
at τ = 0. Then
d/dt D_vL(x, ẋ)F(x) ≤ 0.
Consider the case L = |v|²/2 and F = ∇U, for some concave function U. Then
d/dτ |(I + τD²U)v|²/2 |_{τ=0} = vᵀD²U v ≤ 0.
Thus
d/dt ∇U · ẋ ≤ 0,
that is,
d²/dt² U(x) ≤ 0,
so U(x(t)) is a concave function of t. J
Example 33. Consider a system of n non-interacting particles, and set
U = Σ_{i≠j} |x_i − x_j|.
Clearly U is a convex function. By the previous example (with the inequalities reversed), we have
d²/dt² |x_i − x_j| ≥ 0.
J
Exercise 95. Consider a smooth Lagrangian of the form
e^{−αt}L(x, v).
This Lagrangian is sub-invariant in time.
1. Prove that
d/dt [e^{−αt}E(t)] ≥ 0,
where
E = D_vL(x, ẋ)ẋ − L(x, ẋ).
In particular, show that this estimate yields exponential blow-up of the energy.
2. Impose conditions upon L that ensure that the exponential blow-up of the kinetic energy can also be bounded, using simple estimates, by E(t) ≤ Ce^{βt}.
Exercise 96. Consider the Lagrangian L : Rⁿ × Rⁿ → R,
L(x, v) = (1/β)|v|^β − (1/α)|x|^α.
Deduce the Virial theorem:
lim_{T→∞} (1/T) ∫₀ᵀ |ẋ|^β = lim_{T→∞} (1/T) ∫₀ᵀ |x|^α.
Hint: use the scaling transformation x → λx, for λ in a neighborhood of 1.
7. Critical point theory
In this section we discuss methods to construct non-minimizing
critical points.
7.1. Some informal computations. Let T > 0 be given. For a ∈ Rⁿ, let x_a be an orbit which minimizes the action under the constraint x(0) = x(T) = a. In general ẋ_a(0) ≠ ẋ_a(T), so this orbit need not have period T. Let I[a] be the function that associates to a the action of x_a:
I[a] = ∫₀ᵀ L(x_a, ẋ_a) dt.
At the maxima or minima of I[a], if I is differentiable,
I′[a] = 0,
that is (assuming x_a is differentiable in a),
0 = ∫₀ᵀ D_xL(x_a, ẋ_a)D_a x_a + D_vL(x_a, ẋ_a)D_a ẋ_a.
Integrating by parts and using the fact that x_a satisfies the Euler-Lagrange equation, we obtain
D_vL(x_a(0), ẋ_a(0)) = D_vL(x_a(T), ẋ_a(T)),
which is equivalent to p(0) = p(T) and, if the Legendre transform v ↦ p = −D_vL is injective (see exercise 97), implies ẋ_a(0) = ẋ_a(T). Thus we conclude that the orbits corresponding to maxima or minima of I[a] are T-periodic.
Exercise 97. Show that if L(x, v) is strictly convex in v then
v ↦ D_vL(x, v)
is injective.
In general the differentiability of x_a is hard to establish, and in the next section we work around this problem using mountain pass techniques.
7.2. Mountain pass lemma. Let H be a Hilbert space with inner product (·, ·). Consider a functional Φ : H → R. Φ is differentiable if there exists Φ′(u) ∈ H such that

lim_{‖v−u‖→0} |Φ(v) − Φ(u) − (Φ′(u), v − u)| / ‖v − u‖ = 0.

A functional Φ is C¹ if Φ′ exists and is continuous. Similarly, Φ is C^{1,1} if Φ′ is Lipschitz.

A point u ∈ H is a critical point if Φ′(u) = 0. The set of critical points in the level set Φ(u) = c is denoted by

K_c = {u : Φ′(u) = 0, Φ(u) = c}.
A functional Φ : H → R satisfies the Palais-Smale condition if any sequence (u_k) ⊂ H satisfying sup_k |Φ(u_k)| ≤ C and ‖Φ′(u_k)‖ → 0 is pre-compact, that is, admits a convergent subsequence.
Lemma 57 (Deformation lemma). Let Φ : H → R be a functional satisfying the Palais-Smale condition. Let c ∈ R be such that K_c = ∅. Then there exist ε > 0, δ > 0 and a continuous function η : [0, 1] × H → H such that

1. η_0(u) = u;
2. η_1(u) = u if |Φ(u) − c| > ε;
3. Φ(η_t(u)) ≤ Φ(u);
4. Φ(η_1(u)) ≤ c − δ if Φ(u) ≤ c + δ.
Proof. Firstly, we claim that there exist positive real numbers σ and ε such that

|Φ(u) − c| < ε ⟹ ‖Φ′(u)‖ ≥ σ.

To show this claim, assume by contradiction that there exist sequences σ_k → 0, ε_k → 0 and points u_k with |Φ(u_k) − c| ≤ ε_k and ‖Φ′(u_k)‖ ≤ σ_k. The Palais-Smale condition implies the existence of a convergent subsequence of u_k with limit u. This limit is a critical point with Φ(u) = c, which contradicts K_c = ∅.
Choose δ with 0 < δ < ε and 0 < δ < σ²/2. Define

A = {u : |Φ(u) − c| > ε}, B = {u : |Φ(u) − c| < δ}.

Let

g(u) = dist(u, A) / (dist(u, A) + dist(u, B)),

so that 0 ≤ g ≤ 1, g ≡ 0 in A and g ≡ 1 in B. Let also

h(t) = 1 if 0 ≤ t ≤ 1, h(t) = 1/t if t > 1.

Consider the vector field

V(u) = −g(u) h(‖Φ′(u)‖) Φ′(u).
For each u ∈ H consider the flow

(67) (d/dt) η_t = V(η_t),

with η_0 = u. Along this flow we have

(d/dt) Φ(η_t) ≤ 0.

If η_t ∈ B then

(d/dt) Φ(η_t) ≤ −σ²,

and for η_t ∈ A we have V ≡ 0, so η_t is constant.

Finally, to end the proof, it is enough to observe that if |Φ(u) − c| < δ then Φ(η_1) ≤ c − δ, since σ²/2 > δ.
Exercise 98. Show that the solution of (67) depends continuously on the initial condition u.
Theorem 58 (Mountain pass). Let Φ be a C^{1,1} functional satisfying the Palais-Smale condition. Suppose that

1. Φ(0) = 0;
2. Φ(u) ≥ a if ‖u‖ = r, where a, r > 0;
3. there exists v ∈ H with ‖v‖ > r such that Φ(v) ≤ 0.

Let

Γ = {g ∈ C([0, 1], H) : g(0) = 0, g(1) = v}.

Then the set K_c, with

c = inf_{g ∈ Γ} max_{0 ≤ t ≤ 1} Φ(g(t)),

is non-empty.
Proof. Clearly c ≥ a, since every path in Γ crosses the sphere ‖u‖ = r. Suppose that K_c = ∅. Choose ε < a/2 and apply the deformation lemma to construct the deformation η. Let g ∈ Γ be such that

max_{0 ≤ t ≤ 1} Φ(g(t)) ≤ c + δ.

Since Φ(g(0)) = 0 and Φ(g(1)) ≤ 0, the endpoints satisfy |Φ − c| > ε, so η_1 fixes them and η_1 ∘ g ∈ Γ. Then

max_{0 ≤ t ≤ 1} Φ(η_1(g(t))) ≤ c − δ,

which contradicts the definition of c.
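The minimax value in Theorem 58 can be computed explicitly in a finite-dimensional toy model (an illustration only, not from the text). For the radially symmetric function Φ(u) = ‖u‖² − ‖u‖⁴ on R², the hypotheses hold with r small and, say, v = (2, 0); every path from 0 to v crosses every sphere of radius between 0 and 2, so the mountain-pass level is c = max_r (r² − r⁴) = 1/4, and a straight ray already realizes the infimum.

```python
# Toy illustration of the minimax value c = inf_Γ max Φ in Theorem 58.
# For Φ(u) = |u|² - |u|⁴ on R² (a stand-in for the Hilbert space H),
# Φ(0) = 0, Φ > 0 on a small sphere, Φ(v) < 0 for |v| = 2, and the
# mountain-pass value is max_r (r² - r⁴) = 1/4, attained at r = 1/√2.
# Illustrative sketch only.

def phi(r):
    return r ** 2 - r ** 4

def minimax_along_ray(R=2.0, n=200001):
    # maximum of Φ along the straight path t -> t·v with |v| = R
    return max(phi(R * k / (n - 1)) for k in range(n))

c = minimax_along_ray()
print(c)   # close to 1/4
```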
Exercise 99. Consider the Lagrangian

L(x, v) = (1/2)|v|² + (1/2)x² − (ε/4)x⁴.

Let Φ be the functional

Φ(x) = ∫_0^1 L(x, ẋ) ds

defined on H¹_per(0, 1).
1. Show that Φ is differentiable and that its derivative is given by

⟨Φ′(x), y⟩ = ∫_0^1 ẋẏ + xy − εx³y.
2. Show that Φ′(x) is Lipschitz in x, that is, the vector z ∈ H¹_per(0, 1) that satisfies

⟨Φ′(x), y⟩ = ∫_0^1 żẏ + zy

is a Lipschitz function of x.
3. Show that Φ satisfies the Palais-Smale condition:
(a) Let x_n be a sequence satisfying Φ(x_n) ≤ C and Φ′(x_n) → 0. Show that ∫_0^1 ẋ_n² + x_n² ≤ C.
(b) Show that this implies that, through a subsequence, x_n ⇀ x for some function x in H¹_per(0, 1), and that x_n → x uniformly.
(c) Use the fact that Φ′(x_n) → 0 in H¹_per(0, 1) to show, using the Lax-Milgram theorem, that x_n → x in H¹_per(0, 1).
4. Show that x ≡ 0 is a strict local minimum of the action, that is,

Φ[x] ≥ α‖x‖²_{H¹_per},

for some α > 0 and ‖x‖_{H¹_per} sufficiently small.
5. Show that there exists a 1-periodic curve y satisfying Φ[y] < 0.
6. Prove the existence of a non-trivial 1-periodic solution of the Euler-Lagrange equation.
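The geometry required in parts 4 and 5 of Exercise 99 can be seen on a crude discretization of Φ (an illustration only; taking ε = 1 is an assumption): small loops have positive action, while a large constant loop has negative action.

```python
# Mountain-pass geometry of Exercise 99 with ε = 1: Φ(0) = 0, small functions
# have Φ > 0, while some 1-periodic curve has Φ < 0. We evaluate the
# discretized functional Φ(x) = ∫₀¹ (ẋ²/2 + x²/2 - x⁴/4) ds on periodic
# samples. Illustrative sketch only.

def Phi(samples, eps=1.0):
    n = len(samples)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = samples[i]
        dx = (samples[(i + 1) % n] - x) / h    # periodic finite difference
        total += (0.5 * dx * dx + 0.5 * x * x - eps * x ** 4 / 4) * h
    return total

print(Phi([0.1] * 100))   # small constant loop: positive
print(Phi([2.0] * 100))   # large constant loop: negative
```

For a constant loop x ≡ a the discrete functional reduces exactly to a²/2 − εa⁴/4, which changes sign at a² = 2/ε.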
8. Invariant measures

An important issue in dynamical systems is the study of measures invariant under the flow induced by a vector field. In this section we review some results and construct measures invariant under the Hamiltonian flow.

Lemma 59. Let µ be a measure on a manifold M and let χ be a smooth vector field on M. The measure µ is invariant with respect to the flow generated by the vector field χ if and only if for any smooth compactly supported function ξ : M → R we have

∫_M ∇ξ · χ dµ = 0.
Proof. Let Φ_t be the flow generated by the vector field χ. If µ is invariant under Φ_t, then for any smooth compactly supported function ξ(x) and any t > 0 we have

∫ ξ(Φ_t(x)) − ξ(x) dµ = 0.

Differentiating with respect to t and setting t = 0, we obtain the "only if" part of the lemma.

To establish the converse, we must prove that, for any t, the measure µ_t defined by

µ_t(S) = µ((Φ_t)^{−1}(S))

coincides with µ.
By the Riesz representation theorem it is sufficient to check that the identity

∫ ξ dµ = ∫ ξ dµ_t

holds for any continuous function ξ vanishing at ∞. Any such function can be uniformly approximated by smooth compactly supported functions, so it is sufficient to prove the identity for smooth ξ with compact support.

Assume then that ξ(x) is a C²-smooth function with compact support. Fix t > 0. We have to prove that

∫ ξ(Φ_t(x)) − ξ(x) dµ = 0.
We have

∫ ξ(Φ_t(x)) − ξ(x) dµ = Σ_{k=0}^{N−1} ∫ ξ(Φ_{t(k+1)/N}(x)) − ξ(Φ_{tk/N}(x)) dµ
= Σ_{k=0}^{N−1} ∫ ξ_k(Φ_{t/N}(x)) − ξ_k(x) dµ,

where ξ_k(x) = ξ(Φ_{tk/N}(x)). Expanding to first order in t/N,

Σ_{k=0}^{N−1} ∫ ξ_k(Φ_{t/N}(x)) − ξ_k(x) dµ
= Σ_{k=0}^{N−1} ∫ ∇ξ_k(x) · (Φ_{t/N}(x) − x) + O(t²/N²) dµ
= Σ_{k=0}^{N−1} ∫ ∇ξ_k(x) · ((t/N) χ(x) + O(t²/N²)) + O(t²/N²) dµ
= (t/N) Σ_{k=0}^{N−1} ∫ ∇ξ_k(x) · χ(x) dµ + O(t²/N) = O(t²/N),

where the last equality holds because each integral ∫ ∇ξ_k · χ dµ vanishes by hypothesis. Taking the limit N → ∞ we complete the proof.
Exercise 100. Consider a measure on R^{2n} with density e^{βH(p,x)}. Show that this measure is invariant under the Hamiltonian flow generated by H.
Exercise 101. Show that the Hamiltonian flow preserves area in phase
space.
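Exercise 101 can be checked numerically (an illustration, not part of the text): transport a small circle of initial conditions under the pendulum flow and compare polygon areas. The sketch below uses the standard sign convention ẋ = D_pH, ṗ = −D_xH; the opposite convention of (75) only reverses time and does not affect areas.

```python
import math

# Numerical illustration of Exercise 101 (Liouville's theorem): the flow of
# the pendulum, H(p, x) = p²/2 - cos x, preserves area in phase space. We
# transport a small circle of initial conditions with a symplectic (leapfrog)
# integrator and compare polygon areas via the shoelace formula. Sketch only.

def flow(x, p, t=1.0, dt=1e-3):
    # leapfrog integration of x' = p, p' = -sin x
    for _ in range(int(t / dt)):
        p -= 0.5 * dt * math.sin(x)
        x += dt * p
        p -= 0.5 * dt * math.sin(x)
    return x, p

def shoelace(pts):
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2

circle = [(0.5 + 0.1 * math.cos(2 * math.pi * k / 256),
           0.1 * math.sin(2 * math.pi * k / 256)) for k in range(256)]
print(shoelace(circle))                              # ≈ π · 0.1²
print(shoelace([flow(x, p) for x, p in circle]))     # nearly the same area
```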
Example 34. Let u(x, P) be a solution of H(P + D_xu, x) = H(P). Then the graph

(68) p = P + D_xu(x, P)

is invariant under the flow generated by (75). Furthermore, the flow restricted to this graph is conjugate to a translation, as Ẋ is constant. If the Hamiltonian H(p, x) is Zⁿ-periodic in x and u is a Zⁿ-periodic function, that is, H(p, x + k) = H(p, x) and u(x + k) = u(x) for all p, x ∈ Rⁿ and k ∈ Zⁿ, the graph (68) can be interpreted as an invariant torus. Furthermore, as the Lebesgue measure dX in the new coordinates is invariant under the Hamiltonian dynamics, the change of variables formula implies that the measure supported on the graph (68) with density

(69) θ(x) dx = det(I + D²_{Px}u) dx

is an invariant measure.
9. Non convex problems
This section is an introduction to the calculus of variations for non-
convex Lagrangians.
Exercise 102. Suppose that

lim_{|v|→∞} L(x, v)/|v| = ∞,

uniformly in x. Show that any C¹ minimizing sequence of the action with fixed endpoints is equicontinuous.
Exercise 103. Consider the problem

min_{x(−1)=0, x(1)=1} ∫_{−1}^{1} |t ẋ(t)|² dt.

Show that

x_n(t) = 1/2 + arctan(nt)/(2 arctan n)

is a minimizing sequence that does not converge uniformly.
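As a numerical illustration (not in the original text): the action of this sequence tends to 0, which is therefore the infimum, and the infimum is not attained; the midpoint-rule values below decrease roughly like 1/n.

```python
import math

# Action of the minimizing sequence in Exercise 103,
# x_n(t) = 1/2 + arctan(nt)/(2 arctan n), evaluated by a midpoint rule.
# The exact action behaves like 1/(2πn) for large n. Sketch only.

def action(n, steps=100000):
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        t = -1.0 + (k + 0.5) * h
        dx = n / ((1.0 + (n * t) ** 2) * 2.0 * math.atan(n))   # x_n'(t)
        total += (t * dx) ** 2 * h
    return total

print(action(10), action(100))   # decreasing toward 0
```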
Exercise 104 (Discontinuities). Let L(x, v) : R^{2n} → R be a C² function. Consider a continuous trajectory x(·), sectionally C¹. Suppose that x is a minimizer of

∫_0^T L(x, ẋ) dt

over all sectionally C¹ curves which satisfy fixed boundary conditions. Let t_0 be a point where ẋ is discontinuous, with left and right limits v^±. Determine an equation that relates v^+ with v^−. Show that, if L(x, v) is strictly convex in v, the continuous minimizers which are sectionally C² and whose left and right derivatives exist at all points are of class C¹.
Exercise 105 (Lavrentiev phenomenon). Consider the variational problem

min_{u(0)=0, u(1)=1} ∫_0^1 (u³ − t)² (u̇)⁶ dt.

Show that u = t^{1/3} minimizes this problem when the minimum is taken over functions u continuous on [0, 1] and differentiable in (0, 1). However, for any sequence u_k of continuous functions on [0, 1] with bounded derivative, satisfying u_k(0) = 0 and u_k(1) = 1 and converging pointwise to t^{1/3}, we have

∫_0^1 (u_k³ − t)² (u̇_k)⁶ dt → ∞.
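A numerical illustration of the Lavrentiev gap (not in the original text): one such sequence replaces u(t) = t^{1/3} by the Lipschitz functions u_k that are linear on [0, k⁻³] (with u_k(k⁻³) = k⁻¹) and equal to t^{1/3} afterwards. The integrand vanishes where u_k³ = t, so the whole action concentrates on [0, k⁻³]; an exact computation for this family gives J(u_k) = 8k³/105 → ∞.

```python
# Lavrentiev phenomenon of Exercise 105: action of the Lipschitz regularization
# u_k(t) = k²t on [0, k⁻³], u_k(t) = t^{1/3} afterwards. Only [0, k⁻³]
# contributes, and exactly J(u_k) = 8k³/105. Midpoint-rule check; sketch only.

def lavrentiev_action(k, steps=50000):
    a = k ** -3
    h = a / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        u = k ** 2 * t          # linear part of u_k
        du = k ** 2             # its (bounded) derivative
        total += (u ** 3 - t) ** 2 * du ** 6 * h
    return total

print(lavrentiev_action(2), lavrentiev_action(4))   # grows like 8k³/105
```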
10. Geometry of Hamiltonian systems

We can discuss the Hamiltonian formalism using a more geometric approach. Suppose for now that in (47) we can apply the minimax principle and exchange inf_v sup_p with sup_p inf_v. In this case we obtain the problem

(70) inf_{x(·)} sup_{p(·)} ∫_0^T −H(p, x) − p · ẋ dt.
To generalize the problem, suppose the variable x represents a point in a manifold M and consider the differential form on [0, T] × T*M

σ = −H dt − α,

with α = p dx. Then (70) is equivalent to determining critical points of

(71) ∫_γ σ

over all curves (x, p) : [0, T] → T*M. In a more general setting, suppose we are given an even-dimensional manifold S, which replaces T*M and is endowed with a 1-form α such that dα is non-degenerate. Given H : S → R, we would like to determine the critical curves γ of (71), that is, curves γ such that for all C¹ variations γ_τ we have

(d/dτ) ∫_{γ_τ} σ |_{τ=0} = 0.
Let γ : [0, T] → S be a critical point, let X_H : [0, T] → TS be the tangent vector to the curve γ, let Y be a vector field on S with Y(γ(0)) = Y(γ(T)) = 0, and set φ_τ = exp(τY). Consider

i(τ) = ∫_{φ_τ(γ)} σ = ∫_0^T −H(φ_τ(γ)) − α_{φ_τ(γ)}(Dφ_τ X_H) dt,

where the integrand is evaluated along φ_τ(γ).
Exercise 106. Show that

di(τ)/dτ |_{τ=0} = ∫_0^T −dH(Y) − dα(X_H, Y) dt

and that, therefore, the critical points satisfy

dα(X_H, ·) = −dH(·).

Hint: observe that

(d/dτ) α_{φ_τ(γ)}(Dφ_τ X_H) |_{τ=0} = (L_Y α)(X_H),

and recall Cartan's formula L_Y α = d(i_Y α) + i_Y(dα).
A symplectic manifold is an even-dimensional manifold S endowed with a closed non-degenerate 2-form ω (recall that a form ω is non-degenerate if, for every non-zero vector field X, ω(X, ·) is non-zero). Given a Hamiltonian H : S → R, the vector field X_H which generates the Hamiltonian flow is uniquely determined by the equation

ω(X_H, ·) = −dH.

It is important to observe that the form ω is only required to be closed, not exact. Locally this distinction is irrelevant, but it has important consequences at the global level.
Exercise 107. Consider R4 with the symplectic form ω = dp1 ∧ dx1 +
2dp2 ∧ dx2. Let H : R4 → R. Determine XH .
To determine the vector field X_H it is necessary to solve the system of linear equations i_{X_H} ω = −dH. To avoid this problem, we introduce the Poisson bracket {F, G} of two functions F and G, defined as

{F, G} = ω(X_F, X_G).

Exercise 108. Show that {F, G} = X_F(G). In this way we can identify {F, ·} with X_F.
Exercise 109. Let ω = Σ_i dp_i ∧ dx_i. Determine the Poisson bracket.
Exercise 110. Show that {·, ·}

1. is bilinear;
2. is anti-symmetric;
3. satisfies the Leibniz rule:
{F, GH} = {F, G}H + {F, H}G;
4. satisfies the Jacobi identity:
{F, {G, H}} + {H, {F, G}} + {G, {H, F}} = 0.
A Poisson manifold is a manifold P (of arbitrary dimension) endowed with a bracket {·, ·} satisfying properties 1-4 of the previous exercise.
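The algebraic properties in Exercise 110 can be sanity-checked numerically for the canonical bracket of Exercise 109 in one degree of freedom, where (up to the sign conventions fixed above) {F, G} = F_x G_p − F_p G_x. The sketch below uses central finite differences; the looser tolerance for the Jacobi identity accounts for the nested differencing. Illustration only, with arbitrary test functions.

```python
import math

# Finite-difference check of Exercise 110 for the canonical bracket on R²,
# {F, G} = F_x G_p - F_p G_x. We verify anti-symmetry, the Leibniz rule and
# the Jacobi identity at a sample point. Illustrative sketch only.

H_STEP = 1e-5

def d(f, i):
    # central difference of f : R² -> R in the i-th variable
    def df(z):
        e = [0.0, 0.0]
        e[i] = H_STEP
        return (f([z[0] + e[0], z[1] + e[1]]) -
                f([z[0] - e[0], z[1] - e[1]])) / (2 * H_STEP)
    return df

def bracket(F, G):
    return lambda z: d(F, 0)(z) * d(G, 1)(z) - d(F, 1)(z) * d(G, 0)(z)

F = lambda z: z[0] ** 2 * z[1]               # arbitrary smooth test functions
G = lambda z: math.sin(z[0]) + z[1] ** 2
K = lambda z: z[0] * z[1]
z0 = [0.7, -0.3]

anti = bracket(F, G)(z0) + bracket(G, F)(z0)
leib = bracket(F, lambda z: G(z) * K(z))(z0) - \
       (bracket(F, G)(z0) * K(z0) + bracket(F, K)(z0) * G(z0))
jaco = (bracket(F, bracket(G, K))(z0) + bracket(K, bracket(F, G))(z0) +
        bracket(G, bracket(K, F))(z0))
print(anti, leib, jaco)   # all near zero
```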
Exercise 111. Show that, using the Poisson bracket, one can define the vector field corresponding to a Hamiltonian H through the identification of {H, ·} with the vector field X_H.
Exercise 112. Let M be a Poisson manifold and let F_1, F_2 : M → R be such that

{F_1, F_2} = C,

for a constant C. Show that

[X_{F_1}, X_{F_2}] = 0.

Hint: consider [X_{F_1}, X_{F_2}]g for an arbitrary g : M → R.
11. Perturbation theory

Exercise 113. Consider the Hamiltonian H : R⁴ → R given by

H(p, x) = ω · p + ε sin(x_1 + x_2).

Assume that ω ∈ R² satisfies ω · k ≠ 0 for all k ∈ Z² \ {0}. Show that there exists a canonical transformation (x, p) ↦ (X, P) such that the new Hamiltonian is

H(P) = ω · P.

Consider now the case

H(p, x) = |p|²/2 + ω · p + ε sin(x_1 + x_2).

Show that in a neighborhood of P = 0 we have, using the same change of coordinates,

H(P, X) = ω · P + O(ε² + |P|²).
We consider Hamiltonians of the form

(72) H_ε(p, x) = H_0(p) + εH_1(p, x),

with H_0, H_1 smooth, H_0(p) strictly convex, and H_1(p, x) bounded with bounded derivatives and Zⁿ-periodic in x. We would like to approximate the solutions of

(73) H_ε(P + D_xu_ε, x) = H_ε(P).

We are given a reference value P = P_0, and we assume that for ε = 0 the rotation vector ω_0 = D_PH_0(P_0) satisfies the Diophantine non-resonance condition

(74) |ω_0 · k| ≥ C/|k|^s, for all k ∈ Zⁿ \ {0},

for some positive constant C and some real s > 0.
In this section we review the classical perturbation theory for Hamiltonian systems, using a construction equivalent to the Poincaré normal form near an invariant torus. Somewhat incorrectly, but following [AKN97], we call it the Lindstedt series method. Although these results are fairly standard, see [AKN97], for instance, we present them in a form more convenient for our purposes.
Consider the Hamiltonian dynamics

(75) ẋ = −D_pH_ε(p, x), ṗ = D_xH_ε(p, x),

where we use the convention that boldface (x, p) denotes trajectories of the Hamiltonian flow and not the coordinates (x, p). The Hamilton-Jacobi integrability theory suggests that we should look for functions H_ε(P) and u_ε(x, P), periodic in x, solving the Hamilton-Jacobi equation

(76) H_ε(P + D_xu_ε, x) = H_ε(P).

Then, performing the change of coordinates (x, p) ↔ (X, P) determined by

(77) X = x + D_Pu_ε, p = P + D_xu_ε,

the dynamics (75) is simplified to

Ẋ = −D_PH_ε(P), Ṗ = 0,

where again boldface (X, P) denotes trajectories of the Hamiltonian flow and not the new coordinates (X, P).
If u is an approximate solution of (76) satisfying

(78) H_ε(P + D_xu, x) = H_ε(P) + f(x, P),

then the change of coordinates (77) transforms (75) into

(79) Ẋ = −D_PH_ε(P) − D_Pf(X, P), Ṗ = D_Xf(X, P),

with the convention that f(X, P) = f(x(X, P), P).
KAM theory deals with constructing solutions of (76) by an iterative procedure, a modified Newton's method, that yields an expansion

u_ε = u_0 + εv_1 + ε²v_2 + ⋯.

The main technical point in KAM theory is to prove the convergence of these expansions. An alternative method that yields such an expansion is the Lindstedt series [AKN97]. However, we should point out that whereas the KAM expansion is convergent, the Lindstedt series may fail to converge. Nevertheless, since we will only need finitely many terms, we will use a variation of the Lindstedt series, which we describe next.
We say that a vector ω ∈ Rⁿ is Diophantine if, for all k ∈ Zⁿ \ {0}, |ω · k| ≥ C/|k|^s for some C, s > 0. Let P_0 be such that ω_0 = D_PH_0(P_0) is Diophantine. We look for an approximate solution of

H_ε(P + D_xu_ε(x, P), x) = H_ε(P),

valid for P = P_0 + O(ε). When ε = 0 the new Hamiltonian is H_0(P) itself and the solution u_0 is constant; for instance we may take u_0 ≡ 0. For ε > 0 we have, formally, u_ε = O(ε), and so we suggest the following approximation u_ε^N to u_ε:

(80) u_ε^N = εv_1(x, P_0) + ε(P − P_0) · D_Pv_1(x, P_0) + ε²v_2(x, P_0) + (1/2) ε(P − P_0)² D²_{PP}v_1(x, P_0) + ε²(P − P_0) · D_Pv_2(x, P_0) + ε³v_3(x, P_0) + ⋯ ;
this expansion is carried out up to order N − 1, in such a way that, formally, u_ε − u_ε^N = O(ε^N). For example,

u_ε^1 = 0, u_ε^2 = εv_1, u_ε^3 = εv_1 + ε²v_2 + ε(P − P_0) · D_Pv_1.
The functions v_i and D^k_P v_i satisfy transport equations of the form

D_pH_0(P_0) · D_xw = f(⋯),

for suitable right-hand sides f, and can be determined inductively. For instance:

H_1(P_0) = D_pH_0(P_0) · D_xv_1 + H_1(P_0, x),

D_PH_1(P_0) = D_pH_0(P_0) · D_x(D_Pv_1) + D²_{pp}H_0(P_0) D_xv_1 + D_pH_1(P_0, x),

and

H_2(P_0) = D_pH_0(P_0) · D_xv_2 + (1/2) D²_{pp}H_0(P_0) D_xv_1 · D_xv_1 + D_pH_1(P_0, x) · D_xv_1.
Note that the derivatives of v_i with respect to P, D^k_P v_i, are computed by solving appropriate transport equations, as illustrated above for D_Pv_1, and not by differentiating v_i. In fact, v_i may not be defined for P ≠ P_0; however, if its derivative exists, it satisfies a transport equation.
The constants H_1(P_0), D_PH_1(P_0), H_2(P_0), … are uniquely determined by integral compatibility conditions; for example,

H_1(P_0) = ∫ H_1(P_0, x) dx,

D_PH_1(P_0) = ∫ D_pH_1(P_0, x) dx,

and

H_2(P_0) = ∫ (1/2) D²_{pp}H_0(P_0) D_xv_1 · D_xv_1 + D_pH_1(P_0, x) · D_xv_1 dx.
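The transport equations above are solved mode by mode in Fourier space: a harmonic e^{2πi k·x} of the right-hand side is divided by 2πi ω_0 · k, which is exactly where the Diophantine condition on ω_0 enters. The following sketch (with illustrative data not taken from the text: ω_0 = (1, √2) and the single zero-average harmonic H_1(x) = cos(2π(x_1 + x_2))) verifies the resulting v_1 pointwise.

```python
import math

# Solving the first transport equation ω₀ · D_x v₁ = ⟨H₁⟩ - H₁(x) for a single
# harmonic on the torus. Here ⟨H₁⟩ = 0 and one Fourier mode suffices:
# v₁(x) = -sin(2π(x₁ + x₂)) / (2π(ω₁ + ω₂)). Illustrative data; sketch only.

w = (1.0, math.sqrt(2.0))          # a Diophantine frequency vector (assumed)

def H1(x):
    return math.cos(2 * math.pi * (x[0] + x[1]))

def v1(x):
    return -math.sin(2 * math.pi * (x[0] + x[1])) / (2 * math.pi * (w[0] + w[1]))

def residual(x, h=1e-6):
    # ω₀ · D_x v₁ + H₁(x), which should vanish since ⟨H₁⟩ = 0
    g0 = (v1((x[0] + h, x[1])) - v1((x[0] - h, x[1]))) / (2 * h)
    g1 = (v1((x[0], x[1] + h)) - v1((x[0], x[1] - h))) / (2 * h)
    return w[0] * g0 + w[1] * g1 + H1(x)

print(max(abs(residual((i / 7.0, j / 5.0)))
          for i in range(7) for j in range(5)))   # near zero
```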
If H is sufficiently smooth and ω_0 is non-resonant, then these equations have smooth solutions, unique up to constants. Finally, one can check that

(81) H_ε(P + D_xu_ε^N, x) = H_ε^N(P) + O(ε^N + |P − P_0|^N),
with

H_ε^N(P) = H_0(P_0) + εH_1(P_0) + (P − P_0) · D_PH_0(P_0) + ε²H_2(P_0) + ⋯,

and this expansion is carried out up to order N − 1 in such a way that, formally,

H_ε(P) = H_ε^N(P) + O(ε^N + |P − P_0|^N).
Consider the change of coordinates

p = P + D_xu_ε^N(x, P), X = x + D_Pu_ε^N(x, P).

Then, by (78) and (79), (75) is transformed into

Ẋ = −D_PH_ε(P) + O(ε^N + |P − P_0|^{N−1}), Ṗ = O(ε^N + |P − P_0|^N).
12. Bibliographical notes
There is a very large literature on the topics of this chapter. The
main references we have used were [Arn66] and [AKN97]. Two clas-
sical physics books on this subject are [Gol80] and [LL76]. On the
more geometrical perspective, the reader may want to look at [?] (see
also [?]) and [Oli98]. Additional material on classical calculus of vari-
ations can be found in [?] and the classical book [?]. In what concerns
symmetries, additional material can be consulted in [?]. A very good
reference in Portuguese is [?].
3
Calculus of variations and elliptic equations
The objective of this chapter is to study the existence and regularity of minimizers of functionals of the form

I[u] = ∫_U L(Du, u, x) dx,

where U is an open subset of Rⁿ and L : R^{n×m} × R^m × U → R is a suitable Lagrangian. The models we consider are quite simplified, illustrating, however, the ideas used in more general cases. Moreover, we will only establish regularity for minimizers in the interior of U, thus avoiding the study of the behavior up to the boundary, which is frequently quite technical. Also, to simplify, we assume that U is bounded and has a regular boundary. The interested reader will be able to find the results studied in this chapter, in higher generality, in, for instance, [Gia83], [Gia93], or [GT01]. We will consider both the scalar case m = 1 and the vectorial case m > 1. However, as the theory is more complete in the scalar case, we will prove a few more results in that setting.
We will start by establishing necessary conditions for a function to be a minimizer; then, as before, we proceed to study the existence of minimizers using the direct method in the calculus of variations. This guarantees existence, for instance, if L(p, z, x) is convex in p and satisfies certain growth conditions. Then we will show that these minimizers are weak solutions of the Euler-Lagrange equation

(82) −div_x D_pL + L_z = 0.

Although the results that we prove are valid for more general problems, in a significant part of this chapter we consider the particular case where

(83) L = L(p) − zf(x).

The regularity theory for elliptic equations addresses the question of whether a minimizer u is smooth enough to satisfy (82) in the classical sense, and establishes conditions under which this is so.
The study of the regularity of elliptic equations proceeds in several steps. First, energy methods show that minimizers are in W^{2,2}(Ω) and solve

div D_pL(Du) = f.

This establishes the existence of second derivatives in the weak sense. Then we try to show that minimizers are classical solutions of the Euler-Lagrange equation. Since this is a second-order partial differential equation, we try to establish that u ∈ C^{2,α}.
We will first consider the deGiorgi-Nash-Moser Hölder estimates for elliptic equations

(84) (a_{ij}(x) v_{x_i})_{x_j} = (f(x))_{x_k},

with f ∈ L^p and a_{ij} uniformly elliptic, that is,

θ|χ|² ≤ a_{ij}(x) χ_i χ_j ≤ Θ|χ|².

These estimates imply that the solutions of (84) are Hölder continuous, independently of the regularity of a_{ij}.
We will use the following strategy: each of the derivatives v = u_{x_k} of u is a weak solution of

(85) −(D²_{p_i p_j}L v_{x_j})_{x_i} = f_{x_k}.

That is, rewriting the equation, we conclude that v solves an equation of the form

−(a_{ij} v_{x_i})_{x_j} = f_{x_k}.

The deGiorgi-Nash-Moser estimates imply that Du is Hölder continuous. Therefore the coefficients D²_{p_i p_j}L of (85) are Hölder continuous.
Finally, the Schauder estimates show that the solutions v of

(a_{ij}(x) v_{x_i})_{x_j} = f(x)_{x_k},

with a and f Hölder continuous and a elliptic, have Hölder continuous derivative Dv. The combination of all these estimates yields that v ∈ C^{1,α}, that is, u ∈ C^{2,α}.
1. Euler-Lagrange equation

The first step in the study of variational problems involving multiple integrals is the derivation of necessary conditions for a function to be a minimizer. In this section we proceed formally, without providing a rigorous justification of the calculations or worrying about the convergence of integrals or the regularity of functions. As the reader will have the opportunity to observe in the following sections, these are delicate questions that require careful analysis. However, if adequate hypotheses are imposed, all the calculations in this section can be properly justified.
1.1. Scalar case. Let the Lagrangian

L(p, z, x) : Rⁿ × R × U → R⁺

be a C^∞ function, where U is a bounded open subset of Rⁿ with smooth (C^∞) boundary. We would like to study the minimizers of

I[w] = ∫_U L(D_xw, w, x) dx

in the set A of functions w that satisfy certain boundary conditions, for instance

A = {w : w = g on ∂U},

where g : ∂U → R is a fixed function.

Let w_0 be a minimizer of I[·]. In a way analogous to what was done in the last chapter, we now deduce the Euler-Lagrange equation.
Theorem 60 (Euler-Lagrange equation). Let w_0 be a C² minimizer of I[·]. Then

(86) div_x[D_pL(D_xw_0, w_0, x)] − D_zL(D_xw_0, w_0, x) = 0.

Proof. If w_0 ∈ A, then for all φ ∈ C^∞_0(U) we have w_0 + εφ ∈ A. Therefore, if w_0 is a minimum of I[·], the function

i(ε) = I[w_0 + εφ]

has a minimum at ε = 0. Consequently i′(0) = 0, that is,

∫_U D_pL(D_xw_0, w_0, x) · D_xφ + D_zL(D_xw_0, w_0, x) φ dx = 0.

Using the divergence theorem we conclude that

∫_U D_pL(D_xw_0, w_0, x) · D_xφ = −∫_U φ div_x[D_pL(D_xw_0, w_0, x)].

Therefore, for all φ ∈ C^∞_0(U),

∫_U φ [div_x[D_pL(D_xw_0, w_0, x)] − D_zL(D_xw_0, w_0, x)] dx = 0,

which implies (86).
Example 35. Let

L(p, z, x) = |p|²/2 + f(x)z,

where f : Rⁿ → R is an arbitrary smooth function. The Euler-Lagrange equation is then

−∆w + f(x) = 0,

which is the Poisson equation.
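A one-dimensional discrete analogue of Example 35 (an illustration, not part of the text): minimizing the discretized energy by gradient descent produces a solution of the discrete Poisson equation, mirroring the passage from I[·] to (86).

```python
# Discrete version of Example 35 in one dimension: minimize the energy
# I[w] = ∫₀¹ (w')²/2 + f w  over w(0) = w(1) = 0 by gradient descent, then
# check that the minimizer satisfies the Euler-Lagrange (Poisson) equation
# w'' = f at the interior nodes. Illustrative sketch only; f ≡ 1 is arbitrary.

def solve(n=50, iters=20000, lr=5e-3):
    h = 1.0 / n
    f = [1.0] * (n - 1)                  # source term f ≡ 1 (assumed)
    w = [0.0] * (n + 1)                  # w[0] = w[n] = 0 throughout
    for _ in range(iters):
        for i in range(1, n):
            # gradient of the discrete energy with respect to w[i]
            g = (2 * w[i] - w[i - 1] - w[i + 1]) / h + f[i - 1] * h
            w[i] -= lr * g
    # residual of the discrete Poisson equation w'' = f
    return max(abs((w[i - 1] - 2 * w[i] + w[i + 1]) / h ** 2 - f[i - 1])
               for i in range(1, n))

print(solve())   # small residual
```

At a critical point of the discrete energy the gradient vanishes, which is exactly the centered-difference Poisson equation; this is the discrete counterpart of the integration by parts in the proof of Theorem 60.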
Not every solution of the Euler-Lagrange equation is a minimum of I; in general, solutions can correspond to minima, maxima or even saddle points. We can, however, as in finite dimensions, establish further necessary conditions by looking at the second variation, that is, by computing

d²/dε² i(ε) |_{ε=0},

which is nonnegative at minimum points. Therefore, for any φ ∈ C^∞_c(U), we have

(87) ∫_U D²_{pp}L D_xφ · D_xφ + 2D²_{pz}L D_xφ φ + D²_{zz}L φ² ≥ 0.
Let B[u, v] be the bilinear form given by the following expression:

B[u, v] = ∫_U D²_{pp}L(D_xw_0, w_0, x) D_xu · D_xv + D²_{pz}L(D_xw_0, w_0, x) v D_xu + D²_{pz}L(D_xw_0, w_0, x) u D_xv + D²_{zz}L(D_xw_0, w_0, x) uv.

From (87) we conclude that B must be positive semi-definite if w_0 is a minimum.
Example 36. Let L = |p|²/2 + f(x)z. Then

B[u, v] = ∫_U D_xu · D_xv,

which implies B[φ, φ] ≥ 0. In fact, if φ ∈ C^∞_c(U) and φ ≠ 0, then B[φ, φ] > 0.

Exercise 114. Prove that in this case this implies that any solution of the Euler-Lagrange equation is in fact a minimum.
Example 37. In this example we derive further necessary conditions for the existence of a minimizer. Let

ϕ_δ(x) = δ² η(x) sin(ξ · x / δ),

where η(x) ∈ C^∞_c(U). Then

0 ≤ B[ϕ_δ, ϕ_δ] = ∫_U D²_{p_i p_j}L ξ_i ξ_j η(x)² sin²(ξ · x / δ) + O(δ²).

Since sin²(ξ · x / δ) ⇀ 1/2 as δ → 0, we have

D²_{p_i p_j}L ξ_i ξ_j ≥ 0,

that is, the mapping p ↦ L(p, w_0, x) is convex for any minimizer w_0 and any x ∈ U. As we will see, in the scalar case the convexity of the Lagrangian in p is very important, both to establish the existence of minimizers and to prove their regularity. For systems, we will derive weaker conditions (which agree with convexity in the scalar case) under which one can show the existence of a minimizer.
Exercise 115. Compute the Euler-Lagrange equation corresponding to

u ↦ ∫_{B_1(0)} u_x² − u_y²,

with u ∈ C¹(B_1) ∩ C(B̄_1) and u = 0 on ∂B_1(0). Show that the solutions of the Euler-Lagrange equation are not minimizers.
Exercise 116. Consider the problem

min_{u : u_ν = 0 on ∂B_1(0)} ∫_{B_1(0)} (∆u)² + uf.

Determine the Euler-Lagrange equation and its second variation. Show that the solutions of the Euler-Lagrange equation are (global) minimizers.
Exercise 117. Let Ω ⊂ Rⁿ be a regular domain and let u be a C²(Ω) ∩ C(Ω̄) solution of

∆u = f

in Ω, with u = 0 on ∂Ω. Show that u minimizes

∫_Ω |∇v|²/2 + fv

over all C² functions v that vanish on ∂Ω.
Exercise 118. Let Ω be a regular domain and f ∈ L¹(Ω). Show that if

∫_Ω fϕ = 0

for all ϕ ∈ C^∞_c(Ω) satisfying ∫_Ω ϕ = 0, then

∫_Ω fϕ = 0

for all ϕ ∈ C^∞(Ω) with ∫_Ω ϕ = 0. Show that this implies that f is almost everywhere constant.
Exercise 119. Consider the problem

min ∫_U |∇u|² + f(x)u dx,

where the minimum is taken over all functions that satisfy

∫_U u dx = 0,

instead of the usual Dirichlet boundary condition u|_{∂U} = 0. Derive the Euler-Lagrange equation and a boundary condition for u. Hint: use the previous exercise.
Exercise 120. Determine the curve y = γ(x), with γ(0) = γ(1) = 0, in the plane such that the region A defined by 0 ≤ x ≤ 1 and 0 ≤ y ≤ γ(x) has area |A| = α (with α sufficiently small) and such that the length of the curve is as small as possible.
Exercise 121. Determine a differential equation for the surface in R³ defined parametrically by

u : B_1(0) ⊂ R² → R³

such that u|_{∂B_1} = γ, that is, whose boundary is a given closed curve γ, and which minimizes the area

∫_{B_1} det((Du)ᵀ Du)^{1/2} dx.
1.2. Systems. For functionals defined on vector-valued functions u : U ⊂ Rⁿ → R^m, the derivation of the Euler-Lagrange equation is similar:

Exercise 122. Let u : U ⊂ Rⁿ → R^m be a minimizer of

∫_U L(Du, u, x).

Show that

−Σ_k (D_{p^α_k}L(Du, u, x))_{x_k} + D_{z^α}L(Du, u, x) = 0.
The second variation is also similar:

Exercise 123. Let u : U ⊂ Rⁿ → R^m be a minimizer of

∫_U L(Du, u, x).

Show that, for any compactly supported ϕ : U → R^m,

Σ_{α,β} ∫_U [ Σ_{j,k} D²_{p^α_k p^β_j}L ϕ^α_{x_k} ϕ^β_{x_j} + 2 Σ_k D²_{p^α_k z^β}L ϕ^α_{x_k} ϕ^β + D²_{z^α z^β}L ϕ^α ϕ^β ] ≥ 0.
Theorem 61. Let u : U ⊂ Rⁿ → R^m be a C² minimizer of

∫_U L(Du, u, x),

under fixed boundary conditions at ∂U. Then

L_{p^α_i p^β_j} ξ^α ξ^β k_i k_j ≥ 0,

for all vectors ξ ∈ R^m and k ∈ Rⁿ.

Proof. Let η ∈ C^∞_c(U) be a real-valued function. Fix ξ ∈ R^m and k ∈ Rⁿ. Use

ϕ = ε³ ξ η(x) sin(k · x / ε)

in Exercise 123 and let ε → 0.
A function F satisfies the Legendre-Hadamard condition if

F_{p^α_i p^β_j}(p, z, x) ξ^α ξ^β η_i η_j ≥ θ|ξ|²|η|²

for all vectors ξ ∈ R^m and η ∈ Rⁿ. The Legendre-Hadamard condition is weaker than convexity in p (unless m = 1 or n = 1), which would read

F_{p^α_i p^β_j}(p, z, x) M^α_i M^β_j ≥ θ|M|²,

for all matrices M ∈ R^{m×n}.
Exercise 124. Let U be a domain in R². Show that if L(P) : R^{2×2} → R is given by

L(P) = det P + ε|P|²,

then L satisfies the Legendre-Hadamard condition but is not convex if ε is sufficiently small.
Exercise 125. Use the Lagrange multiplier method to show that the minimizers of

∫_U |Du|²,

with u = 0 on ∂U, under the constraint

∫_U |u|² = 1,

are eigenfunctions of the Laplacian.
Exercise 126. Use the Lagrange multiplier method to determine a boundary condition on ∂U for the minimizers of

∫_U |Du|²

under the constraints

∫_U |u|² = 1, ∫_U u = 0.
Exercise 127. Let 1 < p < ∞. Determine the Euler-Lagrange equation for the minimizers of the functional

∫_U |Du|^p,

with u = g on ∂U.
Exercise 128. Let U be a domain in Rⁿ and let L(P) : R^{n×n} → R be given by

L(P) = det P.

Determine the Euler-Lagrange equation corresponding to the functional

u ↦ ∫_U L(Du).

Explain why this Lagrangian is called a "null Lagrangian".
2. Further necessary conditions and applications
2.1. Boundary conditions.
2.2. Variational inequalities.
2.3. Lagrange Multipliers.
2.4. Minimal surfaces.
2.5. Higher order problems.
3. Convexity and sufficient conditions
4. Direct method in the calculus of variations
4.1. Scalar case. To ensure the existence of a minimizer, we will impose conditions on the Lagrangian which ensure coercivity and lower semicontinuity. In our discussion we will consider the following model problem:

(88) min_{u|_{∂U}=0} ∫_U L(Du, u, x) dx.

Similar methods would work if we were to choose the boundary condition u = g on ∂U, with g ∈ C^∞(∂U) (or with adequate regularity).
The following condition,

(89) L(p, z, x) ≥ α|p|^q − β,

for all (p, z, x) ∈ Rⁿ × R × U, with α, β > 0 and 1 < q < ∞, is enough to ensure coercivity. In fact, it implies

I[w] ≥ α‖Dw‖^q_{L^q(U)} − γ,

for some γ > 0. Consequently, in the Sobolev space W^{1,q}_0 we have

I[w] → ∞ as ‖w‖_{W^{1,q}_0} → ∞.
Exercise 129. Show that the functional associated with a Lagrangian satisfying (83) is coercive in W^{1,α}_0 for 1 < α < ∞ if L(p) ≥ C|p|^α and f is bounded.
Let w_k be a minimizing sequence. Then

sup_k ‖w_k‖_{W^{1,q}} < ∞.

To see this, observe that, using w_k|_{∂U} = 0 and the Poincaré inequality, we have

‖w_k‖_{W^{1,q}} ≤ C‖Dw_k‖_{L^q}.
Exercise 130. Let U be a bounded domain, g ∈ C^∞(∂U), and w_k a minimizing sequence with the boundary condition w = g on ∂U. Show that

sup_k ‖w_k‖_{W^{1,q}} < ∞.
Since in an infinite-dimensional space a bounded sequence may fail to have any convergent subsequence, we will have to use weak convergence. In a reflexive and separable Banach space any bounded sequence w_k has a weakly convergent subsequence (which we still denote by w_k):

w_k ⇀ w.

For a bounded sequence in W^{1,q}, this means that

∫ Dw_k · Dφ + w_kφ → ∫ Dw · Dφ + wφ,

for all φ ∈ W^{1,q′}, where 1/q + 1/q′ = 1.

As the next example shows, one main difficulty in using weak convergence arises from the lack of continuity of non-linear functionals with respect to weak convergence.
Exercise 131. Let w_k(x) = sin 2πkx. Show that w_k ⇀ 0 in L^q([0, 1]) and that ∫_0^1 w_k² = 1/2, independently of k. Conclude that w_k² does not converge weakly to 0.
For our purposes we will show that, under certain conditions, we have weak lower semicontinuity, that is, whenever

w_k ⇀ w,

then

lim inf_{k→∞} I[w_k] ≥ I[w].

Note that in general we do not expect continuity, i.e. I[w_k] → I[w].
Theorem 62. Assume that, for fixed z and x, the mapping p ↦ L(p, z, x) is convex. Then I[·] is weakly lower semicontinuous in W^{1,q}.

Remark. From the previous chapter we already know that convexity of L in p is a natural condition.
Proof. Suppose that w_k ⇀ w in W^{1,q}. Then:

1. sup_k ‖w_k‖_{W^{1,q}} < ∞.
2. By the Rellich-Kondrachov theorem, we can extract a subsequence w_k → w in L^r, with r < q*.
3. By extracting, if necessary, a further subsequence, we may assume that w_k → w almost everywhere.
4. By Egorov's theorem, for all ε > 0 there exists a set E_ε ⊂ U such that |U \ E_ε| ≤ ε and w_k → w uniformly on E_ε.
5. Defining

F_ε = {x ∈ U : |w| + |Dw| < 1/ε},

we have |U \ F_ε| → 0 as ε → 0, and therefore G_ε = F_ε ∩ E_ε satisfies |U \ G_ε| → 0 as ε → 0.

We can assume, without loss of generality, that L ≥ 0. Then, using the convexity of L in p,

I[w_k] = ∫_U L(Dw_k, w_k, x) dx ≥ ∫_{G_ε} L(Dw_k, w_k, x)
≥ ∫_{G_ε} L(Dw, w_k, x) + ∫_{G_ε} D_pL(Dw, w_k, x)(Dw_k − Dw)
→ ∫_{G_ε} L(Dw, w, x), as k → ∞,

therefore

lim inf_{k→∞} I[w_k] ≥ ∫_{G_ε} L(Dw, w, x).

Using the monotone convergence theorem as ε → 0, we obtain

lim inf_{k→∞} I[w_k] ≥ I[w].
Exercise 132. Give an example of a sequence u_k convergent in L^r to a function u ∈ L^r which does not converge pointwise. Show, however, that there exists a subsequence of u_k which converges to u almost everywhere.
As a corollary we obtain the existence of a solution of (88):

Theorem 63. Suppose L is coercive, that is, it satisfies (89), and is convex in p. Then there exists a minimizer of (88).
Exercise 133. The Lagrangian

L(p, z, x) = |p|²/2 + f(x)z

does not satisfy (89). Show, using similar ideas, that there exists a minimizer in W^{1,2}(Ω) of the functional

∫_Ω L(Du, u, x)

in the class of functions with u|_{∂Ω} = g, for g ∈ C^∞(∂Ω).
Exercise 134. Generalize the previous exercise to Lagrangians of the form

L(Du) + f(x)u,

with L convex and L(p) ≥ c_1|p|^q + c_2 for some 1 < q < ∞.
Exercise 135. Use the direct method in the calculus of variations to establish the existence of minimizers in W^{2,2} of the functional

∫_Ω |∆u|²/2 + f(x) · Du + g(x)u,

with u|_{∂Ω} = h_1 and u_ν|_{∂Ω} = h_2 (where u_ν is the normal derivative).
Exercise 136. Let Ω ⊂ Rⁿ be a bounded domain and f : Ω → Rⁿ a C^∞ function with compact support. Show that the variational problem

min_{u|_{∂Ω}=0} ∫_Ω |∇u|²/2 + f · ∇u + 1/(1 + u²)

admits a minimizer in W^{1,2}_0(Ω).
Exercise 137. Let $\Omega$ be a regular domain. Establish the existence of minimizers in $W^{1,2}(\Omega)$ of the functional
\[
\int_\Omega |\nabla u|^2 + |u|^2 + \int_{\partial\Omega} g(x)u,
\]
where $g \in L^2(\partial\Omega)$.
4.2. Systems. A functional of the form
\[
u \mapsto \int_U L(Du, u, x)\,dx
\]
is quasiconvex if for all $P \in \mathbb{R}^{n\times m}$, $z_0 \in \mathbb{R}^m$, $x_0 \in U$, and any cube $Q \subset \mathbb{R}^n$,
\[
\int_Q L(P, z_0, x_0)\,dx \le \int_Q L(P + Dv, z_0, x_0)\,dx
\]
for all functions $v$ with compact support in $Q$.
Exercise 138. Consider a minimizer $u : U \to \mathbb{R}^m$ of
\[
\int_U L(Du, u, x)\,dx.
\]
Let $Q$ be a cube containing the origin. Suppose $\varphi$ is a compactly supported function on $Q$. Let $u_\lambda = u + \lambda\varphi\left(\frac{x}{\lambda}\right)$. Deduce from
\[
\int_{\lambda Q} L(Du, u, x) \le \int_{\lambda Q} L(Du_\lambda, u_\lambda, x)
\]
that
\[
\int_Q L(Du(0), u(0), 0) \le \int_Q L(Du(0) + D\varphi, u(0), 0).
\]
Exercise 139. Show that convexity implies quasiconvexity.
Theorem 64. Let $L(P, z, x) : \mathbb{R}^{n\times m}\times\mathbb{R}^m\times U \to \mathbb{R}$, with $U \subset \mathbb{R}^n$ a bounded domain. Suppose that $L$ is quasiconvex and satisfies the following properties:
• $0 \le c|P|^p + c \le L \le C|P|^p + C$;
• $|D_PL| \le C|P|^{p-1} + C$;
• $|D_zL| \le C|P|^{p-1} + C$;
• $|D_xL| \le C|P|^p + C$.
Then there exists a minimizer $u \in W^{1,p}_0$ of
\[
I[u] = \int_U L(Du, u, x).
\]
Note that a similar result would also hold for non-homogeneous boundary conditions $u = g$ on $\partial U$.
Proof. First recall the following result. Let $Q_i(x)$ denote a dyadic cube containing $x$ with sidelength $2^{-i}$. For $f \in L^p$, $1 < p < \infty$, define
\[
\langle f\rangle_i(x) = \fint_{Q_i(x)} f.
\]
Then $\langle f\rangle_i \to f$ in $L^p$.
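This averaging fact is easy to check numerically. The sketch below is a hypothetical one-dimensional example (the function $f$, the resolution, and the exponent $p=2$ are illustrative choices): it computes the dyadic averages $\langle f\rangle_i$ on $[0,1)$ and verifies that the $L^p$ error decreases to a small fraction of its initial value.

```python
# Numerical illustration: dyadic averages <f>_i converge to f in L^p.
# f is represented by its values on 2^m fine cells of [0,1) (illustrative setup).
import numpy as np

m = 12
x = (np.arange(2**m) + 0.5) / 2**m
f = np.sin(2 * np.pi * x) + (x > 0.3)          # an L^p function with a jump

def dyadic_average(f, i):
    """Replace f by its average on each dyadic cell of sidelength 2^{-i}."""
    cells = f.reshape(2**i, -1)
    return np.repeat(cells.mean(axis=1), cells.shape[1])

p = 2
errors = [np.mean(np.abs(dyadic_average(f, i) - f)**p)**(1/p) for i in range(m)]
# For p = 2 the dyadic average is an orthogonal projection, so the error
# decreases monotonically as the cells are refined.
assert all(errors[i + 1] < errors[i] for i in range(m - 1))
assert errors[-1] < 0.05 * errors[0]
```

For $p = 2$ the dyadic average is a conditional expectation, hence an orthogonal projection in $L^2$, which explains the monotone decrease observed above.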
Clearly any minimizing sequence $u_k$ is bounded in $W^{1,p}$, and therefore there exists $u \in W^{1,p}_0$ such that
\[
u_k \rightharpoonup u
\]
and $u_k \to u$ strongly in $L^r$ for some $r > p$.

Consider the sequence of measures $\mu_k$ with density $|Du_k|^p + |u_k|^p$. Then there exists $\mu$ such that $\mu_k \rightharpoonup \mu$. By translation we may choose a dyadic decomposition of $\mathbb{R}^n$, $(Q^j_i)$, such that $\mu(\partial Q^j_i) = 0$ for all cubes. Let $x^j_i$ denote the center of $Q^j_i$, and $z^j_i = \langle u\rangle_i(x^j_i)$.
Fix $\varepsilon > 0$ and choose $V \subset\subset U$ such that
\[
\int_{U\setminus V} L(Du, u, x) \le \varepsilon.
\]
Then
\[
I[u_k] \ge \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(Du_k, u_k, x) = \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(Du_k, \langle u_k\rangle_i, x^j_i) + E_0,
\]
where the error term $E_0$ can be estimated as follows:
\[
E_0 \le \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} |L(Du_k, u_k, x) - L(Du_k, \langle u_k\rangle_i, x^j_i)| \le \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} \left[(C|Du_k|^p + C)|x^j_i - x| + (C|Du_k|^{p-1} + C)|u_k - \langle u_k\rangle_i|\right] \to 0,
\]
as $i \to \infty$, uniformly in $k$. Indeed, in the first term the convergence is uniform because $\|Du_k\|_{L^p}$ is globally bounded, whereas in the second term $|u_k - \langle u_k\rangle_i| \to 0$, uniformly in $k$, because $u_k$ is bounded in $W^{1,p}$.
Therefore we have
\[
I[u_k] \ge \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(Du + D(u_k - u), z^j_i, x^j_i) + o_i(1),
\]
where $o_i(1)$ stands for error terms that converge to $0$ as $i \to \infty$, uniformly in $k$. Thus, for $i$ sufficiently large and all $k$,
\[
I[u_k] \ge \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(Du + D(u_k - u), z^j_i, x^j_i) - \varepsilon.
\]
Now choose $0 < \sigma < 1$ and denote by $\tilde{Q}^j_i$ the cube concentric with $Q^j_i$ but with edge $\sigma 2^{-i}$. Choose $\varphi^j_i$ smooth, compactly supported, with
\[
\varphi^j_i = \begin{cases} 1 & \text{in } \tilde{Q}^j_i, \\ 0 & \text{in } (Q^j_i)^C, \end{cases}
\]
and with
\[
|D\varphi^j_i| \le \frac{C\,2^i}{1-\sigma}.
\]
Then
\[
I[u_k] \ge \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(Du + Dv^j_i, z^j_i, x^j_i) + E_1 - \varepsilon,
\]
where
\[
v^j_i = \varphi^j_i(u_k - u),
\]
and
\[
|E_1| \le C\sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i\setminus\tilde{Q}^j_i} 1 + |Du|^p + |Du_k|^p + |D\varphi^j_i|^p|u_k - u|^p,
\]
and therefore, for any $\varepsilon > 0$,
\[
\limsup_{k\to\infty} |E_1| < \varepsilon,
\]
if $\sigma$ is sufficiently close to $1$. Thus we can choose $i_0$ and $k_0$ large enough so that
\[
I[u_k] \ge \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(Du + Dv^j_i, z^j_i, x^j_i) - 2\varepsilon,
\]
for all $i \ge i_0$ and all $k \ge k_0$. Note that
\[
\sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(Du + Dv^j_i, z^j_i, x^j_i) \ge \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(\langle Du\rangle_i + Dv^j_i, z^j_i, x^j_i) + E_2.
\]
Furthermore, we have
\[
E_2 \le \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} \left|L(Du + Dv^j_i, z^j_i, x^j_i) - L(\langle Du\rangle_i + Dv^j_i, z^j_i, x^j_i)\right| \le \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} (1 + |Du|^{p-1} + |\langle Du\rangle_i|^{p-1})|Du - \langle Du\rangle_i|;
\]
since $\|\langle Du\rangle_i\|_{L^p} \le \|Du\|_{L^p}$, for $i$ large enough we have $|E_2| \le \varepsilon$.
Therefore, using quasiconvexity, for $k$ and $i$ large enough,
\[
I[u_k] \ge \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(\langle Du\rangle_i, z^j_i, x^j_i) - 3\varepsilon.
\]
Finally, observe that
\[
\sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(\langle Du\rangle_i, z^j_i, x^j_i) \ge \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} L(Du, u, x) + E_3,
\]
where
\[
E_3 \le \sum_{j : Q^j_i\cap V \ne \emptyset} \int_{Q^j_i} \left|L(\langle Du\rangle_i, z^j_i, x^j_i) - L(Du, u, x)\right|,
\]
which also converges to $0$ as $i \to \infty$. Therefore, by sending $\varepsilon \to 0$, we obtain that $u_k$ converges weakly in $W^{1,p}$ to a minimizer.
Exercise 140. Suppose that $L(P)$ satisfies the uniform strict quasiconvexity property:
\[
\int_Q L(P) + \frac{\gamma}{2}|Dv|^2 \le \int_Q L(P + Dv),
\]
for all $v \in C^\infty_c(Q)$. Let $u_k$ be a minimizing sequence in $W^{1,2}$. Show that $u_k \to u$ strongly in $W^{1,2}$.
5. Euler-Lagrange equations
The minimizers we obtained in the previous section using the direct
method in the calculus of variations are critical points and, therefore,
we would like to show that they are solutions (in an appropriate sense)
of the Euler-Lagrange equations.
We will suppose the following additional hypotheses on $L$:
1. |L(p, z, x)| ≤ C(|p|q + |z|q + 1);
2. |DpL(p, z, x)| ≤ C(|p|q−1 + |z|q−1 + 1);
3. |DzL(p, z, x)| ≤ C(|p|q−1 + |z|q−1 + 1).
A function $u \in W^{1,q}$ is a weak solution of the Euler-Lagrange equation (86) if, for all $v \in C^\infty_c(U)$,
\[
(90)\qquad \int_U D_pL(Du,u,x)Dv + D_zL(Du,u,x)v\,dx = 0.
\]
Remark. This is a natural definition, since from (90) we can recover (86) by integration by parts.
Theorem 65. Under the previous assumptions, if u ∈ W 1,q minimizes
I[·] then u is a weak solution of the Euler-Lagrange equation.
Proof. Let
\[
i(\tau) = I[u + \tau v].
\]
Then
\[
\frac{i(\tau) - i(0)}{\tau} = \int_U L^\tau(x),
\]
where
\[
L^\tau(x) = \frac{L(Du + \tau Dv, u + \tau v, x) - L(Du, u, x)}{\tau}.
\]
Clearly
\[
L^\tau(x) \to D_pL(Du,u,x)Dv + D_zL(Du,u,x)v,
\]
almost everywhere. Additionally,
\[
L^\tau(x) = \frac{1}{\tau}\int_0^\tau \frac{d}{ds}L(Du + sDv, u + sv, x)\,ds = \frac{1}{\tau}\int_0^\tau D_pL(Du + sDv, u + sv, x)Dv + D_zL(Du + sDv, u + sv, x)v\,ds \le C(|Du|^q + |Dv|^q + |u|^q + |v|^q + 1).
\]
Therefore, the dominated convergence theorem yields the desired result.
Exercise 141. Prove the last inequality of the previous theorem, that is,
\[
|D_pL(Du + sDv, u + sv, x)Dv + D_zL(Du + sDv, u + sv, x)v| \le C(|Du|^q + |Dv|^q + |u|^q + |v|^q + 1),
\]
uniformly for $0 \le s \le \tau$. Hint: recall the inequalities
\[
ab \le \frac{a^r}{r} + \frac{b^s}{s}, \quad\text{with } \frac{1}{r} + \frac{1}{s} = 1,
\]
and
\[
|a + b|^r \le C(a^r + b^r).
\]
Exercise 142. Impose conditions on $F(A, p, z)$ so that you can prove the existence of minimizers in $W^{2,2}$ of
\[
\int_\Omega F(\Delta u, Du, u),
\]
and that these are weak solutions of the corresponding Euler-Lagrange equation.
6. Regularity by energy methods
In order to motivate the results of this section, we start with an
example:
Example 38. Let $L = \frac{|p|^2}{2} + f(x)z$. The corresponding Euler-Lagrange equation is
\[
-\Delta u + f(x) = 0.
\]
Let $u$ be a $C^2$ solution of this equation. Multiplying the equation by $\Delta u$ and integrating, we obtain
\[
\int (\Delta u)^2 = \int f(x)\Delta u.
\]
Integrating by parts the left-hand side of this identity and ignoring the boundary terms (of course this is wrong, and some effort must be made to avoid this difficulty), we have
\[
\sum_{i,j}\int |D^2_{x_ix_j}u|^2 = \int f(x)\Delta u \le \frac{C}{\varepsilon}\|f\|^2_{L^2} + \varepsilon\|\Delta u\|^2_{L^2}.
\]
As a conclusion, we have
\[
\|D^2u\|^2_{L^2} \le C\|f\|^2_{L^2}.
\]
This example suggests that if it is possible to somehow control the boundary terms, then solutions to the Euler-Lagrange equation should be not only in $W^{1,2}$ but also in $W^{2,2}$. J
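The integration-by-parts step used above, $\int(\Delta u)^2 = \sum_{i,j}\int|D^2_{x_ix_j}u|^2$ when boundary terms vanish, can be checked symbolically. The test function below is an illustrative choice vanishing on the boundary of the unit square; this is a sketch of the identity, not of the full boundary analysis.

```python
# Symbolic check: for u = sin(pi x) sin(pi y) on the unit square (u = 0 on the
# boundary), int (Laplacian u)^2 equals the sum of squared second derivatives.
import sympy as sp

x, y = sp.symbols('x y')
u = sp.sin(sp.pi * x) * sp.sin(sp.pi * y)

uxx, uyy, uxy = u.diff(x, 2), u.diff(y, 2), u.diff(x, 1, y, 1)

lhs = sp.integrate((uxx + uyy)**2, (x, 0, 1), (y, 0, 1))               # int (Delta u)^2
rhs = sp.integrate(uxx**2 + 2*uxy**2 + uyy**2, (x, 0, 1), (y, 0, 1))   # sum_{i,j} int |u_{x_i x_j}|^2

assert sp.simplify(lhs - rhs) == 0          # the boundary terms vanish here
```

The two sides agree because the cross term satisfies $\int u_{xx}u_{yy} = \int u_{xy}^2$ after integrating by parts, which is exactly the boundary-term issue discussed in the example.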
To simplify the presentation, we will consider a restricted class of Lagrangians of the form
\[
L(p) - zf(x),
\]
with
\[
\theta \le D^2_{pp}L(p) \le \Theta,
\]
for suitable constants $0 < \theta < \Theta$. We should note that more complex problems can be handled using similar techniques, and nothing essential is lost by considering this particular problem. We also need to recall the following theorem. Let $u : \mathbb{R}^n\to\mathbb{R}$. For $h \in \mathbb{R}$ define
\[
D^h_iu = \frac{u(x + he_i) - u(x)}{h}.
\]
Theorem 66. Let $1 \le p < \infty$, $u \in W^{1,p}(U)$ and $V \subset\subset U$. Then
\[
\|D^hu\|_{L^p(V)} \le C\|Du\|_{L^p(U)}.
\]
Conversely, if $u \in L^p$ and
\[
\sup_h \|D^hu\|_{L^p(V)} \le C,
\]
then $u \in W^{1,p}(V)$.
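The first part of Theorem 66 can be illustrated numerically. In the sketch below, $u(x) = \sin^2(\pi x)$ extended by zero is an illustrative compactly supported $C^1$ function, and the discrete $L^2$ norms of its difference quotients are compared with that of its derivative.

```python
# Numerical illustration of ||D^h u||_{L^2} <= ||Du||_{L^2} (Theorem 66, first
# part) for a compactly supported test function (an illustrative choice).
import numpy as np

dx = 1e-3
x = np.arange(-1.0, 2.0, dx)
inside = (x >= 0) & (x <= 1)
u = np.where(inside, np.sin(np.pi * x)**2, 0.0)             # C^1, supported in [0,1]
du = np.where(inside, np.pi * np.sin(2 * np.pi * x), 0.0)   # u'

def l2_norm(v):
    return np.sqrt(np.sum(v**2) * dx)

du_norm = l2_norm(du)
for steps in (50, 100, 200):                                # h = steps * dx
    dh_u = (np.roll(u, -steps) - u) / (steps * dx)          # difference quotient D^h u
    assert l2_norm(dh_u) <= du_norm                         # the theorem's bound (C = 1 here)
```

On all of $\mathbb{R}$ the bound holds with constant $C = 1$ (each quotient is an average of translates of $Du$), which is what the discrete check reproduces.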
Theorem 67. Let $u \in W^{1,2}(U)$ be a weak solution of the equation
\[
-\operatorname{div}(D_pL(Du)) = f.
\]
Then $u \in W^{2,2}_{loc}(U)$.
Proof. Let $V \subset\subset W \subset\subset U$ (recall that $A \subset\subset B$ means that $A$ is a compact subset of $B$) and $\xi \in C^\infty_c(U)$ with
\[
\xi \equiv 1 \text{ in } V, \qquad \xi \equiv 0 \text{ in } U\setminus W, \qquad 0 \le \xi \le 1.
\]
Let $h > 0$ be sufficiently small and $1 \le k \le n$. Define
\[
v = -D^{-h}_k(\xi^2D^h_ku),
\]
where
\[
D^h_kw = \frac{w(x + he_k) - w(x)}{h}.
\]
Exercise 143. Show that the operator $D^h_k$ satisfies an "integration by parts" formula:
\[
\int vD^h_ku = -\int uD^{-h}_kv,
\]
for $u, v \in C_c(U)$.
Suppose $u$ is a weak solution of the Euler-Lagrange equation. Then
\[
0 = \int D_pL(Du)Dv - fv = \int D^h_k(D_pL(Du))D(\xi^2D^h_ku) + fD^{-h}_k(\xi^2D^h_ku).
\]
We can rewrite:
\[
D^h_k(D_pL(Du)) = \frac{D_pL(Du(x + he_k)) - D_pL(Du(x))}{h} = \frac{1}{h}\int_0^1 \frac{d}{ds}D_pL(sDu(x + he_k) + (1-s)Du(x))\,ds = \frac{1}{h}\int_0^1 D^2_{pp}L(\cdots)(Du(x + he_k) - Du(x))\,ds = a^h(x)D^h_kDu,
\]
where
\[
a^h(x) = \int_0^1 D^2_{pp}L(\cdots)\,ds.
\]
The matrix $a^h$ is positive definite. Therefore
\[
\theta\int \xi^2|D^h_kDu|^2 \le \int \xi^2a^h(D^h_kDu)(D^h_kDu).
\]
Therefore we have the following estimate:
\[
\int_U D^h_k(D_pL(Du))D(\xi^2D^h_ku) \ge \theta\int_U \xi^2|D^h_kDu|^2 + 2\int_U a^h(D^h_kDu)(D^h_ku)\,\xi D\xi \ge \frac{\theta}{2}\int_U \xi^2|D^h_kDu|^2 - C\int_W |D^h_ku|^2.
\]
The second term of the Euler-Lagrange equation satisfies the estimate:
\[
\left|\int fD^{-h}_k(\xi^2D^h_ku)\right| \le \frac{C}{\varepsilon}\int_U |f|^2 + C\int_U |Du|^2 + \varepsilon\int_U \xi^2|D^{-h}_kD^h_ku|^2 \le \frac{C}{\varepsilon}\int_U |f|^2 + C\int_U |Du|^2 + \varepsilon\int_U \xi^2|D^h_kDu|^2,
\]
where we used the estimates, which follow from Theorem 66,
\[
\int_U |(D^{-h}_k\xi^2)(D^h_ku)|^2 \le C\int_U |Du|^2,
\]
and
\[
\int_U \xi^2|D^{-h}_kD^h_ku|^2 \le C\int_U \xi^2|D^h_kDu|^2.
\]
Therefore, for $\varepsilon$ sufficiently small,
\[
\frac{\theta}{4}\int_U \xi^2|D^h_kDu|^2 \le C\int_U |f|^2 + |Du|^2.
\]
So $u \in W^{2,2}(V)$.
The last theorem implies in particular that the Euler-Lagrange
equation div(DpL)−DzL = 0 holds almost everywhere.
To conclude our discussion of energy methods, we review some facts concerning elliptic equations, namely the Lax-Milgram theorem.
Exercise 144. Let $u \in W^{2,2}_{loc}$ be a solution of the Euler-Lagrange equation
\[
(91)\qquad -\operatorname{div}(D_pL(Du)) = f(x).
\]
Show that $u$ is a weak solution of
\[
-(D^2_{p_ip_j}L(Du)\,u_{x_kx_j})_{x_i} = f_{x_k},
\]
which can be obtained from (91) by differentiation with respect to $x_k$.
Let $v = u_{x_k}$. The previous exercise shows that $v$ is a weak solution of
\[
(92)\qquad -(a_{ij}v_{x_j})_{x_i} = g,
\]
where
\[
a_{ij} = D^2_{p_ip_j}L(Du), \qquad g = f_{x_k}.
\]
Equation (92) is an elliptic equation, since the matrix $a$ is positive definite, that is,
\[
a_{ij}\xi_i\xi_j \ge \theta|\xi|^2,
\]
for all vectors $\xi \in \mathbb{R}^n$.
The main result used to establish existence of solutions of elliptic equations is the Lax-Milgram theorem.

Theorem 68 (Lax-Milgram). Let $H$ be a Hilbert space with norm $\|\cdot\|$, inner product $(\cdot,\cdot)$, and duality pairing denoted by $\langle\cdot,\cdot\rangle$. Let
\[
B[\cdot,\cdot] : H\times H \to \mathbb{R}
\]
be a continuous bilinear form, that is,
\[
|B[u,v]| \le \alpha\|u\|\|v\|,
\]
which is coercive, that is,
\[
B[u,u] \ge \beta\|u\|^2.
\]
Let $f : H \to \mathbb{R}$ be a continuous linear functional ($f \in H'$). Then there exists $u \in H$ such that
\[
B[u,v] = \langle f,v\rangle \quad\text{for all } v \in H.
\]
Proof. For the proof of the theorem, we need the following result
from functional analysis:
Theorem 69 (Riesz representation theorem). Let $H$ be a Hilbert space and $H'$ its dual. Then, for each $u^* \in H'$, there exists $u \in H$ such that
\[
\langle u^*, v\rangle = (u, v) \quad\text{for all } v \in H.
\]
For each fixed $u$, the functional
\[
v \mapsto B[u,v]
\]
is a continuous linear functional. Thus, by Riesz's theorem, there exists $w \in H$, dependent upon $u$, which we denote by
\[
w = Au,
\]
such that
\[
B[u,v] = (Au, v).
\]
We will show that $A$ is a continuous linear mapping. To establish linearity it suffices to observe that
\[
(A(\lambda_1u_1 + \lambda_2u_2), v) = B[\lambda_1u_1 + \lambda_2u_2, v] = \lambda_1B[u_1,v] + \lambda_2B[u_2,v] = \lambda_1(Au_1,v) + \lambda_2(Au_2,v).
\]
The continuity follows from the estimate
\[
\|Au\|^2 = (Au,Au) = B[u,Au] \le \alpha\|u\|\|Au\|,
\]
and, therefore,
\[
\|Au\| \le \alpha\|u\|.
\]
By coercivity we have
\[
\beta\|u\|^2 \le B[u,u] = (Au,u) \le \|Au\|\|u\|,
\]
and, therefore,
\[
\|Au\| \ge \beta\|u\|.
\]
Consequently, $A$ is injective and its image is closed in $H$.

Finally, we claim that the image of $A$ is $H$. For that, let $w \in R(A)^\perp$. Then
\[
\beta\|w\|^2 \le B[w,w] = (Aw,w) = 0
\]
and, therefore, $w = 0$. Thus we have shown that $A$ has a continuous inverse.
Again, by Riesz's theorem, there exists $w$ such that
\[
\langle f,v\rangle = (w,v),
\]
and, consequently, since $A$ is invertible, there exists $u$ such that
\[
Au = w,
\]
that is,
\[
B[u,v] = (Au,v) = (w,v) = \langle f,v\rangle.
\]
As an application of the Lax-Milgram theorem, we have the following result:

Example 39. Let $H = W^{1,2}_0$, $f \in L^2$, and
\[
B[u,v] = \int_U a_{ij}u_{x_i}v_{x_j},
\]
with $a_{ij}$ elliptic, and
\[
\langle f,v\rangle = -\int_U fv_{x_k}.
\]
We have
\[
B[u,v] \le C\|u\|_{W^{1,2}_0}\|v\|_{W^{1,2}_0}
\]
and, by the Poincare inequality,
\[
B[u,u] \ge \beta\|u\|^2_{W^{1,2}_0}.
\]
Thus, by the Lax-Milgram theorem, there exists a weak solution in $W^{1,2}_0$ of
\[
-(a_{ij}u_{x_i})_{x_j} = f_{x_k}.
\]
J
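A minimal numerical sketch of this kind of existence statement in one dimension: below we solve $-(a(x)u')' = f$ on $(0,1)$ with $u(0) = u(1) = 0$ by finite differences, where the elliptic coefficient $a$ and the right-hand side $f$ are illustrative choices (and a plain source term $f$ is used instead of $f_{x_k}$, for simplicity).

```python
# Finite-difference sketch of solving -(a(x) u')' = f, u(0) = u(1) = 0,
# with an elliptic coefficient (illustrative setup, not from the text).
import numpy as np

def solve_dirichlet(a_interface, f, h):
    """a_interface: a at the n+1 cell interfaces; f: values at n interior nodes."""
    n = len(f)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = (a_interface[i] + a_interface[i + 1]) / h**2
        if i > 0:
            A[i, i - 1] = -a_interface[i] / h**2
        if i < n - 1:
            A[i, i + 1] = -a_interface[i + 1] / h**2
    return np.linalg.solve(A, f)

n = 200
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)               # interior nodes
x_half = h * (np.arange(n + 1) + 0.5)     # cell interfaces

# sanity check against the exact solution for a = 1, f = 1: u = x(1-x)/2
u = solve_dirichlet(np.ones(n + 1), np.ones(n), h)
assert np.max(np.abs(u - x * (1 - x) / 2)) < 1e-10

# a variable elliptic coefficient, 0.5 <= a <= 1.5 (illustrative choice)
a = 1.0 + 0.5 * np.sin(2 * np.pi * x_half)
u = solve_dirichlet(a, np.ones(n), h)
assert u.min() > 0     # maximum principle: f >= 0 forces u > 0
```

The discrete bilinear form is symmetric and coercive (the matrix is positive definite), which is the finite-dimensional shadow of the Lax-Milgram hypotheses.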
Exercise 145. Use the Lax-Milgram theorem to establish the existence of solutions of
\[
\Delta^2u = f,
\]
with $u \in W^{2,2}_0(B_1(0))$.
Exercise 146. Suppose that b(x) : Rn → Rn is a bounded C∞(Rn)
function and that f ∈ L2(Rn). Use Lax-Milgram’s theorem to establish
the existence of solutions in W 1,2(Rn) of
−∆u+ b(x) · ∇u+ λu = f,
for λ large enough.
In the remainder of this section we will establish an essential result: Garding's inequality.
Theorem 70. Let $A^{\alpha\beta}_{ij}(x)$ be uniformly continuous functions satisfying
\[
\sum_{ij}\sum_{\alpha\beta} A^{\alpha\beta}_{ij}\eta_\alpha\eta_\beta\xi_i\xi_j \ge C|\eta|^2|\xi|^2.
\]
Let $U$ be a bounded domain with smooth boundary. Then, for all $u \in W^{1,2}(U)$, we have
\[
C\int |u|^2 + \int \sum_{ij}\sum_{\alpha\beta} A^{\alpha\beta}_{ij}D_iu^\alpha D_ju^\beta \ge C\int |Du|^2.
\]
Proof. By the extension theorem, for $u \in W^{1,2}(U)$ there exists another function $\tilde{u} \in W^{1,2}(\mathbb{R}^d)$, compactly supported, such that $\tilde{u} = u$ in $U$ and $\|\tilde{u}\|_{W^{1,2}(\mathbb{R}^d)} \le C\|u\|_{W^{1,2}(U)}$. We will drop the $\sim$ in what follows, to simplify the notation.
First consider the case in which $A^{\alpha\beta}_{ij}$ is constant. In this case, using the Fourier transform, we have
\[
\int \sum_{ij}\sum_{\alpha\beta} A^{\alpha\beta}_{ij}D_iu^\alpha D_ju^\beta = C\sum_{ij}\sum_{\alpha\beta}\int A^{\alpha\beta}_{ij}\xi_i\xi_j\hat{u}^\alpha\overline{\hat{u}^\beta} \ge C\int |\xi|^2|\hat{u}|^2 \ge C\|Du\|^2_{L^2(\mathbb{R}^d)}.
\]
Now we consider a localized version of the inequality: suppose that $\operatorname{supp}u \subset B_R(x_0)$ with $R$ sufficiently small. Let $\omega(R)$ denote the modulus of continuity of $A^{\alpha\beta}_{ij}$. Then
\[
\int \sum_{ij}\sum_{\alpha\beta} A^{\alpha\beta}_{ij}(x)D_iu^\alpha D_ju^\beta = \int \sum_{ij}\sum_{\alpha\beta} A^{\alpha\beta}_{ij}(x_0)D_iu^\alpha D_ju^\beta + \int \sum_{ij}\sum_{\alpha\beta} (A^{\alpha\beta}_{ij}(x) - A^{\alpha\beta}_{ij}(x_0))D_iu^\alpha D_ju^\beta \ge C\|Du\|^2_{L^2} - \omega(R)\|Du\|^2_{L^2} \ge \frac{C}{2}\|Du\|^2_{L^2},
\]
if $R$ is small enough.
Since we can assume that $u$ has compact support in a fixed compact set, we can use a partition of unity to write
\[
u = \sum_k \varphi^2_ku.
\]
Then we have
\[
\sum_k\int \sum_{ij}\sum_{\alpha\beta} \varphi^2_kA^{\alpha\beta}_{ij}(x)D_iu^\alpha D_ju^\beta = \sum_k\int \sum_{ij}\sum_{\alpha\beta} A^{\alpha\beta}_{ij}(x)D_i(\varphi_ku^\alpha)D_j(\varphi_ku^\beta) + \text{lower order terms}.
\]
Thus, by reassembling everything, we obtain the desired inequality.
Exercise 147. Use Garding's estimate to obtain $W^{2,2}$ regularity for minimizers of Lagrangians that satisfy the Legendre-Hadamard condition (this covers systems; the scalar case was already considered!).
Exercise 148. Let $h > 0$ and let $u^h_n$ be the sequence obtained by the following inductive procedure: given $u^h_n \in W^{1,2}(\mathbb{R}^n)$, $u^h_{n+1}$ is determined by
\[
\min_{u^h_{n+1}} \int_{\mathbb{R}^n} \frac{(u^h_{n+1} - u^h_n)^2}{2h} + \frac{|\nabla u^h_{n+1}|^2}{2},
\]
where $u^h_0 \equiv u_0$ is the initial data.
1. Use the direct method in the calculus of variations to show that for each $u^h_n$ there exists a minimizer $u^h_{n+1} \in W^{1,2}(\mathbb{R}^n)$.
2. Show that the sequence $\|\nabla u^h_n\|_{L^2}$ is decreasing.
3. Determine the Euler-Lagrange equation for $u^h_{n+1}$.
4. Consider the family in $L^2$, indexed by $h$,
\[
v^h = \sum_{k=0}^\infty u^h_k(x)1_{kh\le t<(k+1)h}.
\]
Show that $v^h(\cdot, t)$ is uniformly bounded in $L^2(\mathbb{R}^n)$ for all $h$ and $t \in [0,T]$.
5. Show that there exists $v$ such that, as $h \to 0$, $v^h \rightharpoonup v$ in $L^2$, and that $v$ is a weak solution of the heat equation
\[
v_t = \Delta v.
\]
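The scheme in this exercise (a minimizing movement) is easy to simulate: the Euler-Lagrange equation of each step is $(u^h_{n+1} - u^h_n)/h = \Delta u^h_{n+1}$, i.e. implicit Euler for the heat equation. The sketch below works on a periodic grid via FFT with an illustrative rough initial datum, and checks item 2: the Dirichlet energy is non-increasing along the scheme.

```python
# Minimizing-movement / implicit Euler scheme for the heat equation on a
# periodic 1-d grid (illustrative setup). Each step solves
# (I - h d^2/dx^2) u_{n+1} = u_n, the Euler-Lagrange equation of the step.
import numpy as np

N, h = 256, 1e-3
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
k = np.fft.fftfreq(N, d=2 * np.pi / N) * 2 * np.pi   # integer wavenumbers

u = np.sign(np.sin(x))                # rough initial data u_0

def grad_norm(u):
    """Discrete L^2 norm of u' computed in Fourier space."""
    return np.linalg.norm(np.fft.ifft(1j * k * np.fft.fft(u)).real)

norms = [grad_norm(u)]
for n in range(50):
    u = np.fft.ifft(np.fft.fft(u) / (1 + h * k**2)).real   # one implicit step
    norms.append(grad_norm(u))

# item 2: ||grad u_n||_{L^2} is decreasing (each Fourier mode is damped)
assert all(norms[i + 1] < norms[i] for i in range(len(norms) - 1))
```

In Fourier variables each step multiplies the mode $\hat u(k)$ by $1/(1 + hk^2) \le 1$, which makes the monotonicity of item 2 transparent.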
7. Holder continuity

This section is dedicated to establishing $C^{1,\alpha}$ regularity for the solutions of scalar variational problems. As before, we consider the problem
\[
(93)\qquad -\operatorname{div}(D_pL(Du)) = f.
\]
We will prove that, for $V \subset\subset U$,
\[
Du \in C^\alpha(V),
\]
for some $0 < \alpha < 1$, independently of the boundary data. As before, we work with $x \in \mathbb{R}^d$ for $d \ge 2$.
To do so, we differentiate (93) with respect to an arbitrary direction and conclude that $v = u_{x_k}$ is a weak solution of
\[
-(D^2_{p_ip_j}L(Du)\,v_{x_j})_{x_i} = f_{x_k}.
\]
This leads us to look at estimates for the linear equation
\[
-(a_{ij}v_{x_i})_{x_j} = \sum_k (f_k)_{x_k},
\]
for $f_k \in L^2$.
First we will establish certain $L^\infty$ estimates for non-homogeneous linear equations with zero boundary data. Then we consider the homogeneous equation subject to non-zero boundary data. Finally, we gather all these estimates to establish our main result.

The regularity theory for systems is harder, and the methods studied in this section cannot be applied to it, as they rely on the solution $u$ being a scalar.
7.1. $L^\infty$ estimates. Our first step consists in obtaining $L^\infty$ estimates for non-homogeneous linear equations with zero boundary data. We start by establishing an auxiliary result.

Lemma 71. Let $\beta > 1$, $\alpha > 0$, and $C > 0$. Suppose that, for all $h > k > 0$,
\[
\varphi(h) \le \frac{C}{(h-k)^\alpha}(\varphi(k))^\beta.
\]
Then
\[
\varphi(M) = 0
\]
for $M = \left(C\varphi(0)^{\beta-1}2^{\alpha\beta/(\beta-1)}\right)^{1/\alpha}$.
Proof. Define
\[
k_n = M\left(1 - \frac{1}{2^n}\right).
\]
Then
\[
(94)\qquad \varphi(k_{n+1}) \le \frac{C}{(k_{n+1}-k_n)^\alpha}\varphi(k_n)^\beta = \frac{C\,2^{\alpha(n+1)}}{M^\alpha}\varphi(k_n)^\beta.
\]
We now prove by induction that
\[
\varphi(k_n) \le \varphi(0)\,2^{-n\mu},
\]
with $\mu = \frac{\alpha}{\beta-1} > 0$. The case $n = 0$ is trivial. Suppose the induction hypothesis holds for some $n$; we must show it also holds for $n+1$. Since $M^\alpha = C\varphi(0)^{\beta-1}2^{\alpha\beta/(\beta-1)}$, using (94) and the induction hypothesis we have
\[
\varphi(k_{n+1}) \le \frac{C\,2^{\alpha(n+1)}}{M^\alpha}\varphi(0)^\beta2^{-\beta n\mu} = \varphi(0)\,2^{\alpha(n+1)}\,2^{-\frac{\alpha\beta}{\beta-1}}\,2^{-\frac{\alpha\beta n}{\beta-1}} = \varphi(0)\,2^{-\frac{\alpha(n+1)}{\beta-1}} = \varphi(0)\,2^{-(n+1)\mu}.
\]
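The induction in this proof can be checked numerically. Below, the values $C = 1$, $\alpha = 1$, $\beta = 2$, $\varphi(0) = 1$ are illustrative choices (giving $M = 4$ and $\mu = 1$); the recursion is iterated in the worst case, with equality in the hypothesis.

```python
# Numerical check of the induction in Lemma 71 for illustrative parameters.
C, alpha, beta, phi0 = 1.0, 1.0, 2.0, 1.0
mu = alpha / (beta - 1)
M = (C * phi0**(beta - 1) * 2**(alpha * beta / (beta - 1)))**(1 / alpha)

k = [M * (1 - 2**-n) for n in range(40)]        # k_n = M (1 - 2^{-n})
phi = [phi0]
for n in range(len(k) - 1):
    # worst case: equality in phi(k_{n+1}) <= C (k_{n+1}-k_n)^{-alpha} phi(k_n)^beta
    phi.append(C / (k[n + 1] - k[n])**alpha * phi[-1]**beta)

# the claimed bound phi(k_n) <= phi(0) 2^{-n mu} holds along the iteration
for n, p in enumerate(phi):
    assert p <= phi0 * 2**(-n * mu) + 1e-12
```

Since $k_n \to M$ and $\varphi(k_n) \to 0$, this is exactly the mechanism forcing $\varphi(M) = 0$.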
Our main theorem in this section is the following:

Theorem 72. Let $f_i \in L^p$ for some $p > d$. Let $u$ be a solution of
\[
(95)\qquad -(a_{ij}u_{x_i})_{x_j} = \sum_i (f_i)_{x_i} \quad\text{in } \Omega,
\]
with $u = 0$ on $\partial\Omega$. Then
\[
(96)\qquad \|u\|_{L^\infty} \le C\|f\|_{L^p(\Omega)}|\Omega|^{\frac{1}{d}-\frac{1}{p}}.
\]
Proof. Let $k > 0$ and multiply (95) by $(u-k)^+$. Then, after an integration by parts,
\[
\int_\Omega a_{ij}u_{x_i}(u-k)^+_{x_j}\,dx = -\sum_i\int_\Omega f_i(u-k)^+_{x_i}\,dx.
\]
Define
\[
A(k) = \{u > k\}\cap\Omega.
\]
Then
\[
\int_{A(k)} a_{ij}u_{x_i}u_{x_j} = -\sum_i\int_{A(k)} f_iu_{x_i}.
\]
Therefore, since $a_{ij}$ is elliptic,
\[
\theta\int_{A(k)}|\nabla u|^2 \le C\left(\int_{A(k)}|f_i|^2\right)^{1/2}\left(\int_{A(k)}|\nabla u|^2\right)^{1/2},
\]
which then yields
\[
\int_{A(k)}|\nabla u|^2 \le C\int_{A(k)}|f_i|^2 \le C\left(\int_{A(k)}|f_i|^p\right)^{2/p}|A(k)|^{1-2/p}.
\]
If $d > 2$ (in the case $d = 2$ we can use, in the place of $2^*$, any exponent $q > 2$, and proceed analogously), by the Sobolev theorem,
\[
\left(\int_{A(k)}((u-k)^+)^{2^*}\right)^{2/2^*} = \left(\int_\Omega ((u-k)^+)^{2^*}\right)^{2/2^*} \le C\int_\Omega |\nabla(u-k)^+|^2 \le C\int_{A(k)}|\nabla u|^2 \le C\sum_i \|f_i\|^2_{L^p}\,\varphi(k)^{1-2/p},
\]
where $\varphi(k) = |A(k)|$. We also have, for any $h > k$,
\[
(h-k)^2\varphi(h)^{2/2^*} \le \left(\int_{A(h)}((u-k)^+)^{2^*}\right)^{2/2^*} \le C\sum_i\|f_i\|^2_{L^p}\,\varphi(k)^{1-2/p}.
\]
Therefore we obtain the relation
\[
\varphi(h) \le C\left(\sum_i\|f_i\|_{L^p}\right)^{2^*}\frac{\varphi(k)^\beta}{(h-k)^\alpha},
\]
where $\alpha = 2^*$ and $\beta = \frac{1-\frac{2}{p}}{1-\frac{2}{d}} = \left(1-\frac{2}{p}\right)\frac{2^*}{2}$. Then Lemma 71 implies that $\varphi(M) = 0$ for some
\[
M \le C\sum_i\|f_i\|_{L^p}\,\varphi(0)^{(\beta-1)/\alpha} \le C\sum_i\|f_i\|_{L^p}\,|\Omega|^{1/d-1/p}.
\]
7.2. Holder continuity for the homogeneous equation. Now we consider weak solutions of the equation
\[
(97)\qquad -(a_{ij}v_{x_i})_{x_j} = 0,
\]
where $a_{ij}$ satisfies
\[
\theta \le [a_{ij}] \le \Theta,
\]
but no regularity assumptions are imposed on $a_{ij}$, and no boundary data is prescribed.

A function $u \in W^{1,p}(U)$ is a subsolution of (97) if
\[
\int_U a_{ij}u_{x_i}\phi_{x_j} \le 0
\]
for all $\phi \in W^{1,p'}_0(U)$ with $\phi \ge 0$. Similarly, $u$ is a supersolution if $-u$ is a subsolution.
Exercise 149. Let u be a smooth subsolution of (97). Show that
−(aijuxi)xj ≤ 0.
Lemma 73. Let $u$ be a subsolution of (97) in $W^{1,p}$ and $\psi : \mathbb{R}\to\mathbb{R}$ a non-decreasing convex function such that $\psi(u) \in W^{1,p}$ (e.g. $\psi'$ bounded). Then $\psi(u)$ is also a subsolution.

Proof. Let $v = \psi(u)$. Then
\[
\int a_{ij}\psi(u)_{x_i}\phi_{x_j} = \int a_{ij}\psi'(u)u_{x_i}\phi_{x_j} = \int a_{ij}u_{x_i}(\psi'(u)\phi)_{x_j} - \int a_{ij}u_{x_i}u_{x_j}\psi''(u)\phi \le 0,
\]
since $u$ is a subsolution and $\psi'(u)\phi$ is non-negative, while, by the convexity of $\psi$ and the ellipticity of $a_{ij}$, the subtracted term is non-negative.
The next lemma shows that subsolutions have their supremum controlled by the $L^p$ norm. This is not a surprising result, since the main strategy in the study of elliptic equations is to establish control of "high" norms in terms of "low" norms; recall, for instance, the discussion concerning energy methods.
Lemma 74. Let $u$ be a subsolution of (97). Then, for $p > 0$ and $0 < \theta < 1$,
\[
\operatorname{ess\,sup}_{B_{R\theta}} u \le \frac{C}{(1-\theta)^{n/p}}\left[\fint_{B_R}(u^+)^p\right]^{1/p}.
\]
Proof. Since $u^+$ is a subsolution, we can assume that $u \ge 0$.

Case 1: $p \ge 2$. Let $\phi = \xi^2u^{p-1}$, with $\xi \in C^\infty_c$. Then
\[
\int a_{ij}u_{x_i}\phi_{x_j} = \int a_{ij}u_{x_i}\left[(p-1)u^{p-2}u_{x_j}\xi^2 + 2\xi\xi_{x_j}u^{p-1}\right] \le 0,
\]
which implies
\[
\int u^{p-2}|Du|^2\xi^2 \le C\int u^p|D\xi|^2.
\]
Since
\[
D(u^{p/2}\xi) = D\xi\,u^{p/2} + \frac{p}{2}\xi u^{p/2-1}Du,
\]
we have
\[
\int |D(u^{p/2}\xi)|^2 \le C\int u^p|D\xi|^2.
\]
Consequently, by Sobolev's inequality,
\[
\left[\int (u^{p/2}\xi)^{2^*}\right]^{2/2^*} \le C\int u^p|D\xi|^2.
\]
Given $0 < \rho < R$, let $\xi \in C^\infty_c$ with $0 \le \xi \le 1$, $\xi \equiv 1$ in $B_\rho = B(x_0,\rho)$ and $\xi \equiv 0$ in $B(x_0,R)^C$. We can additionally assume that
\[
|D\xi| \le \frac{C}{R-\rho}.
\]
Then, for $n \ge 3$ (for $n < 3$ the estimate is trivial by Sobolev's theorem),
\[
(98)\qquad \left[\int_{B_\rho} u^{pn/(n-2)}\right]^{(n-2)/n} \le \frac{C}{(R-\rho)^2}\int_{B_R} u^p.
\]
Thus we have obtained an estimate for the $L^{\frac{pn}{n-2}}$ norm in terms of the $L^p$ norm. Unfortunately, these norms are computed over distinct sets. The main idea is to iterate this inequality and, at the same time, control the domains and the constants in the estimates, in order to obtain a non-trivial estimate for the $L^\infty$ norm in terms of the $L^p$ norm. For that, consider
\[
R_k = R\left(\theta + \frac{1-\theta}{2^k}\right),
\]
which satisfies
\[
R_k - R_{k+1} = \frac{1-\theta}{2^{k+1}}R.
\]
Let
\[
p_k = p\left(\frac{n}{n-2}\right)^k.
\]
Then, applying estimate (98) with $R = R_k$, $\rho = R_{k+1}$ and $p = p_k$,
\[
\left[\int_{B_{R_{k+1}}} u^{p_{k+1}}\right]^{\frac{n-2}{n}} \le \frac{C\,4^{k+1}}{R^2(1-\theta)^2}\int_{B_{R_k}} u^{p_k},
\]
that is,
\[
\|u\|_{L^{p_{k+1}}(B_{R_{k+1}})} \le \left[\frac{C}{R^2(1-\theta)^2}\right]^{\frac{1}{p_k}}4^{\frac{k+1}{p_k}}\,\|u\|_{L^{p_k}(B_{R_k})}.
\]
By iteration we obtain
\[
\|u\|_{L^{p_{k+1}}(B_{R_{k+1}})} \le \left[\frac{C}{R^2(1-\theta)^2}\right]^{\sum_{j=0}^k\frac{1}{p_j}}4^{\sum_{j=0}^k\frac{j+1}{p_j}}\,\|u\|_{L^p(B_R)}.
\]
Since
\[
\sum_{j=0}^\infty \frac{1}{p_j} = \frac{n}{2p},
\]
and $\sum_{j=0}^\infty\frac{j+1}{p_j}$ is finite, we get
\[
\|u\|_{L^{p_{k+1}}(B_{R_{k+1}})} \le \frac{C}{[R(1-\theta)]^{n/p}}\|u\|_{L^p(B_R)},
\]
where the last constant, $C$, is independent of $k$. Letting $k \to \infty$ we conclude
\[
\|u\|_{L^\infty(B_{R\theta})} \le \frac{C}{(1-\theta)^{n/p}}R^{-n/p}\|u\|_{L^p(B_R)} = \frac{C}{(1-\theta)^{n/p}}\left[\fint_{B_R}u^p\right]^{1/p}.
\]
Case 2: $0 < p < 2$. By the previous estimate,
\[
\|u\|_{L^\infty(B_{R\theta})} \le \frac{C}{(1-\theta)^{n/2}R^{n/2}}\left[\int_{B_R}u^2\right]^{1/2} \le \frac{C}{(1-\theta)^{n/2}R^{n/2}}\left[\int_{B_R}u^p\right]^{1/2}\|u\|^{1-p/2}_{L^\infty(B_R)}.
\]
Using the inequality
\[
ab \le \frac{a^{2/p}}{2/p} + \frac{b^{2/(2-p)}}{2/(2-p)},
\]
which holds for $0 < p < 2$, we obtain
\[
\|u\|_{L^\infty(B_{R\theta})} \le \frac{1}{2}\|u\|_{L^\infty(B_R)} + \frac{C}{[(1-\theta)R]^{n/p}}\left[\int_{B_R}u^p\right]^{1/p}.
\]
If we define
\[
\varphi(t) = \|u\|_{L^\infty(B_t)},
\]
we have, for $s < t \le R$,
\[
(99)\qquad \varphi(s) \le \frac{1}{2}\varphi(t) + \frac{C}{(t-s)^{n/p}}\|u\|_{L^p(B_R)}.
\]
We now need a technical lemma:

Lemma 75. Let $\varphi$ be a bounded non-decreasing function satisfying (99). Then, for $s < t \le R$,
\[
\varphi(s) \le C\|u\|_{L^p(B_R)}(t-s)^{-n/p}.
\]
The lemma then implies
\[
\|u\|_{L^\infty(B_{R\theta})} \le \frac{C\|u\|_{L^p(B_R)}}{(1-\theta)^{n/p}R^{n/p}}.
\]
Proof. Let $\varphi$ satisfy
\[
\varphi(s) \le \frac{1}{2}\varphi(t) + a(t-s)^{-\alpha},
\]
for $s < t$. Let $0 < \tau < 1$ and
\[
s_{i+1} = s_i + (1-\tau)\tau^i(t-s),
\]
with $s_0 = s$. Then
\[
\varphi(s_i) \le \frac{1}{2}\varphi(s_{i+1}) + a(1-\tau)^{-\alpha}\tau^{-i\alpha}(t-s)^{-\alpha},
\]
and therefore, by induction,
\[
\varphi(s) \le \frac{1}{2^i}\varphi(s_i) + a\sum_{j=0}^{i-1}(1-\tau)^{-\alpha}\tau^{-j\alpha}(t-s)^{-\alpha}2^{-j}.
\]
Choosing $\tau$ sufficiently close to $1$, so that $\frac{\tau^{-\alpha}}{2} < 1$, we obtain, letting $i \to \infty$,
\[
\varphi(s) \le Ca(t-s)^{-\alpha}.
\]
This ends the proof of the lemma.
The next step is to study estimates similar to those of Lemma 74 for $p < 0$. In this case we obtain, however, the opposite inequality.

Lemma 76. Let $u$ be a non-negative supersolution. Then there exist $\delta > 0$ and $p_0 > 0$ such that
\[
(100)\qquad \operatorname{ess\,inf}_{B_{R/2}} u \ge \delta\left(\fint_{B_R}u^{p_0}\right)^{1/p_0}.
\]
Proof. We leave the following fact as an exercise:

Exercise 150. Let $u$ be a positive supersolution. Show that $\frac{1}{u}$ is a subsolution.

Combining the last exercise with Lemma 74 we obtain
\[
\operatorname{ess\,sup}_{B_{R/2}} u^{-1} \le C\left(\fint_{B_R}u^{-p}\right)^{1/p},
\]
for $p > 0$. In this way,
\[
\operatorname{ess\,inf}_{B_{R/2}} u \ge C\left[\fint_{B_R}u^{-p}\,\fint_{B_R}u^p\right]^{-1/p}\left(\fint_{B_R}u^p\right)^{1/p},
\]
which implies (100) if we can prove
\[
\left(\fint_{B_R}u^{-p}\right)\left(\fint_{B_R}u^p\right) \le C,
\]
for some $p > 0$ and $C > 0$.
To prove this inequality we need the John-Nirenberg lemma, whose proof is the subject of the next section.

Lemma 77 (John-Nirenberg). Denote by $Q$ a generic cube contained in $U$ and by $Q' \subset Q$ a generic subcube of $Q$. For $f \in L^1$, let $|f|_{*,Q}$ be given by
\[
|f|_{*,Q} = \sup_{Q'\subset Q}\fint_{Q'}|f - f_{Q'}|\,dx,
\]
where $f_{Q'}$ denotes the average of $f$ in $Q'$. Then, if $|f|_{*,Q} < \infty$, there exist positive constants $C_1$ and $C_2$ such that, for all $\lambda > 0$,
\[
\frac{1}{|Q|}\left|\{x \in Q : |f(x) - f_Q| \ge \lambda|f|_{*,Q}\}\right| \le C_1e^{-\lambda C_2}.
\]
We leave the proof of the following corollary as an exercise; it is a variation of the lemma that follows it:

Corollary 78. If $|f|_{*,Q} < \infty$ then, for some $\varepsilon > 0$, we have
\[
\fint_Q e^{\varepsilon f} < C,
\]
with $C$ independent of $Q$.
Lemma 79. Let $f \in L^p$, $f \ge 0$. Then
\[
\int |f|^p = \int_0^\infty p\lambda^{p-1}|\{x : f(x) > \lambda\}|\,d\lambda.
\]
Proof. We have
\[
\int |f|^p\,dx = \int\int_0^{f(x)}p\lambda^{p-1}\,d\lambda\,dx = \int_0^\infty\int p\lambda^{p-1}\chi_{\{\lambda<f(x)\}}\,dx\,d\lambda.
\]
Let
\[
v = \ln u - \beta.
\]
If $|v|_{*,Q} < \infty$, Corollary 78 implies
\[
\fint u^{p_0} \le Ce^{\beta p_0}
\]
and
\[
\fint u^{-p_0} \le Ce^{-\beta p_0}.
\]
Consequently,
\[
\fint u^{p_0}\,\fint u^{-p_0} \le C,
\]
for some $p_0 > 0$ to be determined. This suggests we should try to estimate $|\ln u|_{*,Q} = |v|_{*,Q}$.

Let $\phi(x) = \frac{\xi^2}{u(x)}$. Then
\[
-\int a_{ij}u_{x_i}u_{x_j}\frac{\xi^2}{u^2} + \int a_{ij}u_{x_i}\frac{2\xi\xi_{x_j}}{u} \ge 0,
\]
which implies
\[
\int \frac{|Du|^2}{u^2}\xi^2 \le C\int |D\xi|^2.
\]
Let $\xi \equiv 1$ in $Q'$ and $\xi \equiv 0$ in the exterior of the cube with twice the sidelength and the same center. Then we conclude that
\[
\int_{Q'}|D\ln u|^2 \le C\rho^{n-2},
\]
where $\rho$ is the sidelength of $Q'$. The Poincare inequality implies
\[
\int_{Q'}|\ln u - (\ln u)_{Q'}|^2 \le C\rho^2\int_{Q'}|D\ln u|^2 \le C\rho^n.
\]
Thus,
\[
\int_{Q'}|\ln u - (\ln u)_{Q'}| \le C\rho^{n/2}\left[\int_{Q'}|\ln u - (\ln u)_{Q'}|^2\right]^{1/2} \le C\rho^n.
\]
Therefore $|\ln u|_{*,Q} < \infty$, which ends the proof.
Theorem 80 (Harnack inequality). Let $u$ be a positive solution of (97). Then
\[
\operatorname{ess\,inf}_{B_{R/2}} u \ge C\operatorname{ess\,sup}_{B_{R/2}} u.
\]
Proof. By the two previous lemmas we have
\[
\operatorname{ess\,inf}_{B_{R/2}} u \ge \delta\left(\fint_{B_R}u^{p_0}\right)^{1/p_0} \ge C\operatorname{ess\,sup}_{B_{R/2}} u.
\]
Using Harnack's inequality, the Holder continuity of $u$ is a consequence of the following theorem.

Theorem 81 (De Giorgi-Nash-Moser). Let $u$ be a solution of (97). Then $u$ is Holder continuous. Furthermore, if we set
\[
M(R) = \operatorname{ess\,sup}_{B_R} u, \qquad m(R) = \operatorname{ess\,inf}_{B_R} u,
\]
and let $\omega(R) = M(R) - m(R)$, there exists $\gamma < 1$ such that
\[
\omega(R/2) \le \gamma\omega(R).
\]
Proof. The Harnack inequality, applied to $u - m(R) + \varepsilon$ and letting $\varepsilon \to 0$, implies
\[
C[m(R/2) - m(R)] \ge M(R/2) - m(R).
\]
Therefore
\[
\omega(R/2) = M(R/2) - m(R/2) \le M(R/2) - \left(m(R) + \frac{1}{C}[M(R/2) - m(R)]\right) = \left(1 - \frac{1}{C}\right)[M(R/2) - m(R)] \le \left(1 - \frac{1}{C}\right)\omega(R).
\]
By induction we obtain
\[
\omega(2^{-k}R) \le \eta^k\omega(R),
\]
with $\eta < 1$. Therefore
\[
\omega(\rho) \le C\rho^\alpha,
\]
that is, $u$ is Holder continuous.
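The last step, from the geometric decay of the oscillation to an explicit Holder exponent, can be filled in as follows (a short computation, with $\alpha$ determined by $\eta$):

```latex
% Given 0 < rho <= R, pick the integer k with 2^{-(k+1)} R < rho <= 2^{-k} R.
% Since omega is non-decreasing,
\[
\omega(\rho) \le \omega(2^{-k}R) \le \eta^{k}\,\omega(R).
\]
% Setting alpha = log(1/eta)/log 2, so that eta = 2^{-alpha}, and using
% 2^{-k} < 2 rho / R, we get
\[
\eta^{k} = \left(2^{-k}\right)^{\alpha} < \left(\frac{2\rho}{R}\right)^{\alpha},
\qquad\text{hence}\qquad
\omega(\rho) \le \frac{2^{\alpha}\,\omega(R)}{R^{\alpha}}\,\rho^{\alpha}.
\]
```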
7.3. John-Nirenberg Lemma. Before discussing the proof of the John-Nirenberg lemma, we need to establish a version of the Calderon-Zygmund decomposition:

Lemma 82. For a dyadic cube $Q$, let $\hat{Q} \supset Q$ denote the unique dyadic cube whose side is twice the side of $Q$. Let $f \in L^1(Q_0)$ and $\alpha > |f_{Q_0}|$. Then there exists a disjoint sequence of dyadic cubes $Q_j$ such that
\[
|f_{\hat{Q}_j}| \le \alpha < |f_{Q_j}|,
\]
and $|f| \le \alpha$ almost everywhere in $Q_0\setminus\bigcup_j Q_j$.

Proof. We start with $Q_0$, which we divide into $2^n$ dyadic cubes $Q_{0,k}$. We select those for which
\[
|f_{Q_{0,k}}| > \alpha,
\]
and subdivide again the ones which were not selected. Continuing iteratively, we obtain a sequence of cubes $Q_j$ such that
\[
|f_{Q_j}| > \alpha
\]
and
\[
|f_{\hat{Q}_j}| \le \alpha,
\]
since $\hat{Q}_j$ was not selected. By the Lebesgue differentiation theorem, in the complement of $\bigcup_j Q_j$ we have
\[
|f| \le \alpha
\]
almost everywhere, since no cube containing a point of the complement was selected.
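The stopping-time selection in this proof can be sketched in one dimension. In the code below, the dyadic structure of $[0,1)$, the non-negative function $f$, and the level $\alpha$ are illustrative choices; the assertions check exactly the three conclusions of the lemma.

```python
# One-dimensional sketch of the Calderon-Zygmund selection of Lemma 82:
# subdivide dyadic intervals of [0,1), stopping at the first whose average
# exceeds alpha. f is given by its values on the 2^m finest cells.
import numpy as np

def cz_select(f, alpha, lo=0, hi=None, selected=None):
    """Return the selected dyadic intervals as (lo, hi) index pairs."""
    if hi is None:
        hi, selected = len(f), []
    if f[lo:hi].mean() > alpha:
        selected.append((lo, hi))          # stop: average just exceeded alpha
    elif hi - lo > 1:
        mid = (lo + hi) // 2
        cz_select(f, alpha, lo, mid, selected)
        cz_select(f, alpha, mid, hi, selected)
    return selected

rng = np.random.default_rng(0)
f = rng.exponential(size=2**10)            # f >= 0 on the finest cells
alpha = 2.0 * f.mean()                     # alpha > f_{Q0}

cubes = cz_select(f, alpha)
for lo, hi in cubes:
    assert f[lo:hi].mean() > alpha         # f_{Q_j} > alpha
    size = hi - lo                          # the dyadic parent was not selected,
    plo = (lo // (2 * size)) * (2 * size)   # so its average is <= alpha
    assert f[plo:plo + 2 * size].mean() <= alpha

mask = np.ones(len(f), bool)               # off the selected cubes, f <= alpha
for lo, hi in cubes:
    mask[lo:hi] = False
assert np.all(f[mask] <= alpha)
```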
Now we give the proof of the John-Nirenberg lemma (Lemma 77).

Proof. Without loss of generality we may assume that $f_Q = 0$ and $|f|_{*,Q} = 1$.

Let $\alpha_0 > 0$ and, for each natural number $l$, apply Lemma 82 with $\alpha = \alpha_0l$. Let $Q^l_j$ be the sequence of cubes obtained in this way. Then
\[
|f_{Q^l_j}| > l\alpha_0, \qquad |f_{\hat{Q}^l_j}| \le l\alpha_0,
\]
and, for $x \not\in \bigcup_j Q^l_j$,
\[
|f(x)| \le l\alpha_0.
\]
Now we are going to estimate
\[
\left|\bigcup_j Q^l_j\right|.
\]
In the complement of this set $|f| \le l\alpha_0$, and therefore this estimate gives an upper bound for
\[
|\{x : |f(x)| > l\alpha_0\}|.
\]
We are going to establish a recurrence relation between the estimate at $l$ and the one at $l+1$. This relation will allow us to obtain exponential decay in $l$.
Fix $l$ and $j$, and suppose $i$ is such that $Q^{l+1}_i \subseteq Q^l_j$. Then
\[
\left|f_{Q^l_j} - f_{Q^{l+1}_i}\right| \le \fint_{Q^{l+1}_i}|f - f_{Q^l_j}|,
\]
and therefore
\[
(101)\qquad \sum_{i : Q^{l+1}_i\subseteq Q^l_j}|Q^{l+1}_i|\,|f_{Q^l_j} - f_{Q^{l+1}_i}| \le \sum_{i : Q^{l+1}_i\subseteq Q^l_j}\int_{Q^{l+1}_i}|f - f_{Q^l_j}| \le |f|_{*,Q}|Q^l_j|.
\]
If, for $Q^{l+1}_i \subseteq Q^l_j$, we obtain a lower bound for
\[
|f_{Q^l_j} - f_{Q^{l+1}_i}|,
\]
equation (101) yields a recurrence relation for the values of $\sum_i|Q^{l+1}_i|$ as a function of $\sum_j|Q^l_j|$, by adding both sides over $j$. To obtain this
lower bound, observe that
\[
|f_{Q^l_j} - f_{Q^{l+1}_i}| \ge |f_{Q^{l+1}_i}| - |f_{Q^l_j}| \ge |f_{Q^{l+1}_i}| - |f_{Q^l_j} - f_{\hat{Q}^l_j}| - |f_{\hat{Q}^l_j}| \ge (l+1)\alpha_0 - |f_{Q^l_j} - f_{\hat{Q}^l_j}| - l\alpha_0.
\]
However, if $P$ is a dyadic subcube of $Q$ with $\hat{P} \subseteq Q$, then
\[
|f_P - f_{\hat{P}}| = \frac{1}{|P|}\left|\int_P (f - f_{\hat{P}})\right| \le \frac{1}{|P|}\int_P |f - f_{\hat{P}}| \le \frac{2^d}{|\hat{P}|}\int_{\hat{P}}|f - f_{\hat{P}}| \le 2^d|f|_{*,Q} \le 2^d.
\]
Therefore
\[
|f_{Q^l_j} - f_{Q^{l+1}_i}| \ge (l+1)\alpha_0 - 2^d - l\alpha_0 = \alpha_0 - 2^d.
\]
If we choose $\alpha_0 = 2 + 2^d$, we obtain
\[
|f_{Q^l_j} - f_{Q^{l+1}_i}| \ge 2,
\]
which, together with (101), implies
\[
\sum_i|Q^{l+1}_i| \le \frac{1}{2}\sum_j|Q^l_j|.
\]
Therefore
\[
|\{x : |f(x)| > l\alpha_0\}| \le 2^{-l-1}|Q|,
\]
which easily yields the lemma.
7.4. Holder continuity. Finally, we use all the estimates of the previous sections to establish interior Holder continuity.

Theorem 83. Let $u$ be a solution of
\[
(102)\qquad -(a_{ij}u_{x_i})_{x_j} = \sum_k(f_k)_{x_k}
\]
in an open set $U$. Then $u$ is Holder continuous in any compact subset of $U$.
Proof. Write $u = v + w$, where $v$ is the solution of
\[
-(a_{ij}v_{x_i})_{x_j} = \sum_k(f_k)_{x_k}
\]
in $B_{2R}$ with $v = 0$ on $\partial B_{2R}$. Then $w$ solves
\[
-(a_{ij}w_{x_i})_{x_j} = 0
\]
in $B_{2R}$, with the corresponding boundary data on $\partial B_{2R}$. By Theorem 72 we have
\[
\|v\|_{L^\infty(B_{2R})} \le CR^{1-d/p},
\]
where $C$ depends on the $L^p$ norm of $f$ and on the ellipticity of $a_{ij}$, but not on the solution $u$ or on $R$.

Let $\omega_w$ be the modulus of continuity of $w$. Then for all $R' < R$ we know that
\[
\omega_w\left(\frac{R'}{4}\right) \le \eta\,\omega_w(R'),
\]
for some $0 < \eta < 1$. Hence
\[
\omega_u(R/4) \le CR^{1-d/p} + \omega_w(R/4) \le CR^{1-d/p} + \eta\,\omega_w(R) \le CR^{1-d/p} + \eta\,\omega_u(R).
\]
Then the Holder continuity follows from the next lemma:

Lemma 84. Suppose $\omega(R/4) \le CR^\alpha + \eta\omega(R)$. Then $\omega(R) \le CR^\gamma$.

Proof. Suppose
\[
M > \sup_{R_0\le R\le 4R_0}\frac{\omega(R)}{R^\gamma},
\]
with $M$ to be chosen later as a function of $\gamma$. Then, for all $R_0/4 \le R \le R_0$, we have
\[
\omega(R) \le \eta\,\omega(4R) + C(4R)^\alpha \le M\eta(4R)^\gamma + C(4R)^\alpha \le MR^\gamma,
\]
if we choose $\gamma < \alpha$ sufficiently small, and then $M$ large enough so that
\[
4^\gamma M\eta + 4^\alpha C < M.
\]
Now, if $\frac{R_0}{4^{i+1}} \le R \le \frac{R_0}{4^i}$, we obtain inductively
\[
\omega(R) \le (M\eta4^\gamma + 4^\alpha C)R^\gamma \le MR^\gamma.
\]
8. Schauder estimates

In this section we will prove that weak solutions of equations of the form
\[
(a_{ij}(x)v_{x_i})_{x_j} = f
\]
are $C^{1,\alpha}$, as long as both the coefficients $a_{ij}$ and $f$ are Holder continuous functions. These are the so-called Schauder estimates. We should observe that, although we will carry out the proof in the scalar case, the argument is unchanged for elliptic systems, in contrast with the regularity results of the previous section.
8.1. Morrey and Campanato spaces. The key idea in Schauder estimates is to use the ellipticity of the equation to control the oscillation of the solution. For this we will need certain spaces of functions, the Campanato and Morrey spaces, as well as some of their basic properties.

For $p \ge 1$ and $\lambda \ge 0$ we define the Campanato seminorm
\[
[u]_{p,\lambda} = \left[\sup_{x\in U,\,\rho>0}\rho^{-\lambda}\int_{U(x,\rho)}|u - u_{x,\rho}|^p\right]^{1/p},
\]
where $U(x,\rho) = U\cap B(x,\rho)$ and
\[
u_{x,\rho} = \fint_{U(x,\rho)}u.
\]
To avoid technicalities, we assume that, for $\rho$ sufficiently small, $|U(x,\rho)| \ge c\rho^n$. In any case, our main objective is to establish interior estimates on $U$, not estimates up to the boundary.
The Campanato space $\mathcal{L}^{p,\lambda}(U)$ is the space of functions $u \in L^p(U)$ which satisfy
\[
\|u\|_{p,\lambda} \equiv \|u\|_{L^p} + [u]_{p,\lambda} < \infty.
\]
The Morrey space $L^{p,\lambda}(U)$ is the space of functions $u \in L^p(U)$ for which
\[
\|u\|_{L^{p,\lambda}} \equiv \left[\sup_{x\in U}\sup_{\rho>0}\rho^{-\lambda}\int_{U(x,\rho)}|u|^p\right]^{1/p} < \infty.
\]
Exercise 151. Show that $[\cdot]_{p,\lambda}$ and $\|\cdot\|_{L^{p,\lambda}}$ are, respectively, a seminorm and a norm.
Proposition 85. Depending on the relative values of $\lambda$, $p$ and $n$ we have the following isomorphisms:
(i) If $0 \le \lambda < n$ then $\mathcal{L}^{p,\lambda} \simeq L^{p,\lambda}$;
(ii) If $n < \lambda < n+p$ then $\mathcal{L}^{p,\lambda} \simeq C^{0,\frac{\lambda-n}{p}}$;
(iii) $L^{p,0} = L^p$, and $\mathcal{L}^{p,\lambda} \simeq \mathbb{R}$ if $\lambda > n+p$.
Proof. To prove (i), we start by showing that
\[
[u]_{p,\lambda} \le C\|u\|_{L^{p,\lambda}},
\]
and then we establish the opposite inequality. Let us start by observing that
\[
\int_{U(x,\rho)}|u - u_{x,\rho}|^p \le C\int_{U(x,\rho)}(|u|^p + |u_{x,\rho}|^p).
\]
Then Jensen's inequality implies
\[
|u_{x,\rho}|^p \le \fint_{U(x,\rho)}|u|^p,
\]
and, therefore,
\[
\int_{U(x,\rho)}|u_{x,\rho}|^p \le \int_{U(x,\rho)}|u|^p.
\]
This implies
\[
\rho^{-\lambda}\int_{U(x,\rho)}|u - u_{x,\rho}|^p \le C\|u\|^p_{L^{p,\lambda}},
\]
that is, $\|u\|_{p,\lambda} \le C\|u\|_{L^{p,\lambda}}$.
To prove the opposite inequality, we need some preliminary estimates. First, observe that
\[
\int_{U(x,\rho)}|u|^p \le C|u_{x,\rho}|^p|U(x,\rho)| + C\int_{U(x,\rho)}|u - u_{x,\rho}|^p \le C\rho^n|u_{x,\rho}|^p + C[u]^p_{p,\lambda}\rho^\lambda.
\]
Therefore
\[
(103)\qquad \rho^{-\lambda}\int_{U(x,\rho)}|u|^p \le C[u]^p_{p,\lambda} + C\rho^{n-\lambda}|u_{x,\rho}|^p.
\]
Unfortunately, the seminorm $[u]_{p,\lambda}$ does not directly control $|u_{x,\rho}|^p$. To use this estimate we need some auxiliary estimates. For $R > r$ we have
\[
(104)\qquad |u_{x,R} - u_{x,r}|^p \le Cr^{-n}\int_{U(x,r)}(|u_{x,R} - u|^p + |u_{x,r} - u|^p) \le Cr^{-n}(R^\lambda + r^\lambda)[u]^p_{p,\lambda} \le Cr^{-n}R^\lambda[u]^p_{p,\lambda}.
\]
Let $R = R_02^{-i}$, $r = R_02^{-i-1}$, and $R_0 > 1$. Then
\[
|u_{x,R} - u_{x,r}| \le CR_0^{(\lambda-n)/p}2^{(n-\lambda)i/p}[u]_{p,\lambda}.
\]
Let $\rho = R_02^{-l-1}$. Then
\[
|u_{x,\rho}| = |u_{x,R_02^{-l-1}}| \le |u_{x,R_02^{-l-1}} - u_{x,R_02^{-l}}| + |u_{x,R_02^{-l}} - u_{x,R_02^{-l+1}}| + \cdots + |u_{x,R_0/2} - u_{x,R_0}| + |u_{x,R_0}| \le |u_{x,R_0}| + C[u]_{p,\lambda}\sum_{i=0}^{l}R_0^{(\lambda-n)/p}2^{(n-\lambda)i/p} \le |u_{x,R_0}| + C\rho^{(\lambda-n)/p}[u]_{p,\lambda}.
\]
Therefore
\[
|u_{x,\rho}|^p \le C|u_{x,R_0}|^p + C\rho^{\lambda-n}[u]^p_{p,\lambda} \le C\|u\|^p_{L^p} + C\rho^{\lambda-n}[u]^p_{p,\lambda}.
\]
By combining the last inequality with (103) and using $\lambda < n$, we have
\[
\rho^{-\lambda}\int_{U(x,\rho)}|u|^p \le C\|u\|^p_{L^p} + C[u]^p_{p,\lambda}.
\]
As for the second statement of the proposition, (ii), the inclusion
\[
C^{0,\frac{\lambda-n}{p}} \subset \mathcal{L}^{p,\lambda}
\]
is elementary and is left as an exercise:
Exercise 152. Show that $C^{0,\frac{\lambda-n}{p}} \subset \mathcal{L}^{p,\lambda}$.
So we need to establish the opposite inclusion, $\mathcal{L}^{p,\lambda} \subset C^{0,\frac{\lambda-n}{p}}$. Let $u \in \mathcal{L}^{p,\lambda} \cap C^0$. Given $x$ and $y$, let $R = |x-y|$ and $\alpha = \frac{\lambda-n}{p}$. We must show that
\[
|u(x) - u(y)| \le C R^{\alpha}.
\]
By the triangle inequality,
\[
|u(x) - u(y)| \le |u(x) - u_{x,2R}| + |u_{x,2R} - u_{y,2R}| + |u_{y,2R} - u(y)|.
\]
Applying (104) we obtain
\[
|u_{x,R} - u_{x,R 2^{-l-1}}| \le \sum_{i=0}^{l} |u_{x,R 2^{-i}} - u_{x,R 2^{-(i+1)}}|
\le C R^{\alpha} [u]_{p,\lambda} \sum_{i=0}^{l} 2^{-\alpha i} \le C R^{\alpha} [u]_{p,\lambda},
\]
where the constant $C$ is independent of $l$. Therefore, by letting $l \to \infty$, we obtain
\[
|u_{x,2R} - u(x)|,\; |u_{y,2R} - u(y)| \le C R^{\alpha} [u]_{p,\lambda}.
\]
We also have
\[
|U(x,2R) \cap U(y,2R)|\, |u_{x,2R} - u_{y,2R}| \le \int_{U(x,2R)} |u_{x,2R} - u| + \int_{U(y,2R)} |u_{y,2R} - u|.
\]
Since $|U(x,2R) \cap U(y,2R)| \ge c R^n$, we obtain, using Hölder's inequality,
\[
|u_{x,2R} - u_{y,2R}| \le C R^{-n} |U(x,2R)|^{1-1/p} \Big( \int_{U(x,2R)} |u_{x,2R} - u|^p \Big)^{1/p}
+ C R^{-n} |U(y,2R)|^{1-1/p} \Big( \int_{U(y,2R)} |u_{y,2R} - u|^p \Big)^{1/p}
\le C R^{\alpha} [u]_{p,\lambda}.
\]
The last statement of the proposition, (iii), is left as an exercise:
Exercise 153. Prove (iii). Hint: observe that if $|u(x) - u(y)| \le C|x-y|^{\alpha}$ for some $\alpha > 1$ then $u$ is constant.
8.2. Preliminary estimates. The next lemma gives a key estimate concerning the behavior of solutions of elliptic equations in the interior of $U$.
Lemma 86. Let $u \in H^1$ be a solution of
\[
-(a_{ij} u_{x_i})_{x_j} = f_{x_k},
\]
with $a_{ij}$ elliptic, satisfying $\theta \le a_{ij} \le \Theta$. Then, for any $r$ and $x$ for which $B(x,r) \subset U$, we have
\[
\int_{B(x,\frac r2)} |Du|^2 \le C \Big[ r^{-2} \int_{B(x,r)} |u|^2 + \int_{B(x,r)} |f|^2 \Big].
\]
Proof. Without loss of generality we can assume $x = 0$. Let $\xi \in C^{\infty}_c$ with $\xi \equiv 1$ in $B(0,\frac12)$ and $\xi \equiv 0$ in $B(0,1)^C$, and set $\eta(x) = \xi(\frac xr)$. Then
\[
\theta \int_{B_r} |D(\eta u)|^2 \le \int_{B_r} a_{ij} (\eta u)_{x_i} (\eta u)_{x_j}
= \int_{B_r} a_{ij} u_{x_i} (\eta^2 u)_{x_j} + \int_{B_r} a_{ij} \eta_{x_i} u (\eta u)_{x_j} - \int_{B_r} a_{ij} u_{x_i} \eta \eta_{x_j} u
\]
\[
= \int_{B_r} f_{x_k} \eta^2 u + \int_{B_r} a_{ij} \big( \eta_{x_i} u (\eta u)_{x_j} - u_{x_i} \eta \eta_{x_j} u \big)
\le -\int_{B_r} f (\eta^2 u)_{x_k} + \frac{C}{\varepsilon r^2} \int_{B_r} u^2 + \varepsilon \int_{B_r} \eta^2 |Du|^2 + \varepsilon \int_{B_r} |D(\eta u)|^2
\]
\[
\le C \int_{B_r} f^2 + \frac{C}{\varepsilon r^2} \int_{B_r} u^2 + C \varepsilon \int_{B_r} \eta^2 |Du|^2,
\]
which implies, by choosing $\varepsilon$ sufficiently small,
\[
\int_{B_{r/2}} |Du|^2 \le C \int_{B_r} f^2 + \frac{C}{r^2} \int_{B_r} u^2.
\]
Exercise 154. Show that if $a_{ij}$ is constant and $u \in H^1$ satisfies
\[
-(a_{ij} u_{x_i})_{x_j} = 0,
\]
then, for all multi-indices $\alpha \in \mathbb{N}_0^n$,
\[
\|D^{\alpha} u\|_{L^2(B_{r/2})} \le C r^{-|\alpha|} \|u\|_{L^2(B_r)}.
\]
8.3. Schauder estimates. The main objective of this section is
to prove:
Theorem 87. Let $u \in H^1$ be a solution of
\[
-(a_{ij} u_{x_i})_{x_j} = f_{x_k}.
\]
Then:
(i) if $a_{ij}$ is constant and $f \in \mathcal{L}^{2,\lambda}$ then $Du \in \mathcal{L}^{2,\lambda}_{loc}$, for $0 \le \lambda < n+2$;
(ii) if $a_{ij}(x)$ is continuous and $f \in \mathcal{L}^{2,\lambda}$ then $Du \in \mathcal{L}^{2,\lambda}_{loc}$, for $0 \le \lambda < n$;
(iii) if $a_{ij}(x), f(x) \in C^{0,\alpha}$, with $0 < \alpha < 1$, then $Du \in C^{0,\beta}_{loc}$, for some $\beta > 0$.
Proof. (i). Suppose that $B(x,2R) \subset U$ (without loss of generality assume $x = 0$). Let $w$ be the unique solution in $H^1_0(B_{2R})$ of
\[
-(a_{ij} w_{x_i})_{x_j} = f_{x_k},
\]
whose existence is guaranteed by the Lax-Milgram theorem (theorem 68).
We have the following estimate:
\[
\int_{B_{2R}} |Dw|^2 \le C \int_{B_{2R}} a_{ij} w_{x_i} w_{x_j} = -C \int_{B_{2R}} (f - \gamma) w_{x_k}
\le \frac12 \int_{B_{2R}} |Dw|^2 + C \int_{B_{2R}} |f - \gamma|^2,
\]
for any constant $\gamma$. Choosing $\gamma = f_{0,2R} \equiv f_{2R}$ (to simplify the notation), we obtain
\[
(105)\qquad \int_{B_{2R}} |Dw|^2 \le C \int_{B_{2R}} |f - f_{2R}|^2 \le C R^{\lambda} [f]_{2,\lambda}^2.
\]
We will use $w$ to decompose the solution into two parts: $u = v + w$. By definition, $w$ vanishes on the boundary of $B_{2R}$ and satisfies
\[
-(a_{ij} w_{x_i})_{x_j} = f_{x_k}.
\]
Consequently, $v = u - w$ has unknown boundary data but satisfies the homogeneous equation
\[
-(a_{ij} v_{x_i})_{x_j} = 0.
\]
By hypothesis the coefficients are constant, therefore $v$ is $C^{\infty}$ (exercise 154) and, by Poincaré's inequality, for $\rho < 2R$,
\[
\int_{B_{\rho}} |Dv - (Dv)_{\rho}|^2 \le C \rho^2 \int_{B_{\rho}} |D^2 v|^2.
\]
It is important to observe that the dependence of the Poincaré constant on $\rho$ is exactly the one used previously:
Exercise 155. Show that the Poincaré inequality in $B_{\rho}$ has the form
\[
\int_{B_{\rho}} |u - u_{\rho}|^2 \le C \rho^2 \int_{B_{\rho}} |Du|^2,
\]
where $C$ does not depend on $\rho$.
Exercise 156. Use the Fourier transform to prove the following interpolation inequality:
\[
\|u\|_{H^{k+\theta}(\mathbb{R}^d)} \le C \|u\|_{H^k(\mathbb{R}^d)}^{1-\theta} \|u\|_{H^{k+1}(\mathbb{R}^d)}^{\theta}.
\]
Recall that for $s \in \mathbb{R}$ the norm $H^s$ is defined by
\[
\|u\|_{H^s(\mathbb{R}^d)}^2 = \int_{\mathbb{R}^d} (1 + |\xi|^2)^s |\hat u|^2\, d\xi.
\]
Exercise 157. Let $1 \le p_0, p_1 \le \infty$. Prove the following interpolation inequality:
\[
\|u\|_{L^{p_{\theta}}} \le \|u\|_{L^{p_0}}^{\theta} \|u\|_{L^{p_1}}^{1-\theta},
\qquad \text{where} \qquad \frac{1}{p_{\theta}} = \frac{\theta}{p_0} + \frac{1-\theta}{p_1}.
\]
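As a quick numerical sanity check of the inequality in exercise 157 (a sketch with arbitrary sample data and exponents, using the counting measure on a finite set), one can verify it directly:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=1000)

def lp_norm(u, p):
    # l^p norm with respect to the counting measure
    return (np.abs(u) ** p).sum() ** (1.0 / p)

p0, p1, theta = 2.0, 6.0, 0.3
# 1/p_theta = theta/p0 + (1 - theta)/p1
p_theta = 1.0 / (theta / p0 + (1 - theta) / p1)

lhs = lp_norm(u, p_theta)
rhs = lp_norm(u, p0) ** theta * lp_norm(u, p1) ** (1 - theta)
assert lhs <= rhs + 1e-12  # interpolation (Hölder) inequality
```

The inequality is a consequence of Hölder's inequality, so it holds for any measure; the discrete setting is used here only to make the check computable.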
Using interpolation techniques and the Fourier transform, it is also
possible to prove the following version of the Sobolev theorem:
Theorem 88. Let $u \in H^s$, where $0 < s < \frac n2$, and let $\frac{1}{p^*} = \frac12 - \frac sn$. Then
\[
\|u\|_{L^{p^*}} \le C \|u\|_{H^s}.
\]
Proof. If $s$ is an integer, this is the standard Sobolev theorem; for fractional $s$ we can use interpolation.
Let $\tilde v = D^2 v$. Since the coefficients $a_{ij}$ are constant,
\[
-(a_{ij} \tilde v_{x_i})_{x_j} = 0.
\]
Therefore, for $1 \le p < \infty$,
\[
\int_{B_{\rho}} |\tilde v|^2 \le \rho^{n/p'} \|\tilde v\|_{L^{2p}(B_{R/2})}^2 \le C \rho^{n/p'} \|\tilde v\|_{H^{n/(2p')}(B_{R/2})}^2,
\]
using Sobolev's theorem (theorem 88). By exercise 154, and using interpolation, we obtain
\[
(106)\qquad \int_{B_{\rho}} |\tilde v|^2 \le C \frac{\rho^{n/p'}}{R^{n/p'}} \|\tilde v\|_{L^2(B_R)}^2.
\]
Consequently, given $\nu$ sufficiently small, there exists $p$ sufficiently large such that $\frac{n}{p'} = n - \nu$. By the Poincaré inequality,
\[
\int_{B_{\rho}} |Dv - (Dv)_{\rho}|^2 \le C \rho^2 \int_{B_{\rho}} |D^2 v|^2
\le C \frac{\rho^{n+2-\nu}}{R^{n-\nu}} \int_{B_R} |D^2 v|^2 \le C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \int_{B_{2R}} |Dv - (Dv)_{2R}|^2,
\]
where in the last inequality we have applied lemma 86.
Let $\rho < \frac R2$. Then
\[
\int_{B_{\rho}} |Du - (Du)_{\rho}|^2 \le 2 \int_{B_{\rho}} |Dv - (Dv)_{\rho}|^2 + 2 \int_{B_{\rho}} |Dw - (Dw)_{\rho}|^2
\le C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \int_{B_{2R}} |Dv - (Dv)_{2R}|^2 + C \int_{B_{\rho}} |Dw|^2 =: T_1 + T_2,
\]
where
\[
T_1 = C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \int_{B_{2R}} |Dv - (Dv)_{2R}|^2
= C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \int_{B_{2R}} |Du - (Du)_{2R} - Dw + (Dw)_{2R}|^2
\]
\[
\le C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \int_{B_{2R}} |Du - (Du)_{2R}|^2 + C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \int_{B_{2R}} |Dw|^2
\]
and
\[
T_2 = C \int_{B_{\rho}} |Dw|^2.
\]
Let
\[
\Phi(\rho) = \sup_{\rho_1 \le \rho} \int_{B_{\rho_1}} |Du - (Du)_{\rho_1}|^2.
\]
Then, using the estimate (105), we have
\[
\Phi(\rho) \le C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \Phi(2R) + C R^{\lambda} [f]_{2,\lambda}^2.
\]
From the previous inequality we conclude, when $\lambda < n+2$ and using the lemma that we will prove next, that
\[
\Phi(\rho) \le C \Big[ \Phi(2R) \Big( \frac{\rho}{R} \Big)^{\lambda} + [f]_{2,\lambda}^2 \rho^{\lambda} \Big],
\]
which implies, when $\lambda < n+2$,
\[
\sup_{\rho < \frac R2} \rho^{-\lambda} \int_{B_{\rho}} |Du - (Du)_{\rho}|^2 \le C \big( \|u\|_{L^2}^2 + \|f\|_{\mathcal{L}^{2,\lambda}}^2 \big).
\]
Lemma 89. Let $\Phi \ge 0$ be non-decreasing, and suppose that, for $\rho < R/2$ with $R < R_0$,
\[
\Phi(\rho) \le a \Phi(2R) \Big[ \Big( \frac{\rho}{2R} \Big)^{\alpha} + \varepsilon \Big] + b R^{\beta},
\]
with $0 < \beta < \alpha$ and $\varepsilon > 0$. Then, if $\varepsilon$ is small enough,
\[
\Phi(\rho) \le C \Big[ \Phi(2R_0) \Big( \frac{\rho}{R_0} \Big)^{\gamma} + b \rho^{\beta} \Big],
\]
with $\beta < \gamma < \alpha$.
Proof. Let $\beta < \gamma < \alpha$ and take $\theta$ sufficiently small so that
\[
2 a \theta^{\alpha} < \theta^{\gamma}.
\]
Suppose $\varepsilon < \theta^{\alpha}$, so that
\[
a \varepsilon < \theta^{\gamma}/2.
\]
Then
\[
\Phi(2\theta R) \le \Phi(2R) \theta^{\gamma} + b R^{\beta}.
\]
Exercise 158. Estimate $\Phi(2\theta^2 R)$ and $\Phi(2\theta^3 R)$ by applying the previous inequality inductively.
By induction,
\[
\Phi(2\theta^{k+1} R) \le \theta^{\gamma(k+1)} \Phi(2R) + b R^{\beta} \Big( \sum_{j=0}^{k} \theta^{\gamma j} \theta^{\beta(k-j)} \Big)
\le \theta^{\gamma(k+1)} \Phi(2R) + b R^{\beta} \theta^{\beta k} c(\theta).
\]
Therefore, given $\rho$ and $k$ satisfying
\[
2\theta^{k+1} R \le \rho \le 2\theta^{k} R,
\]
we have
\[
\Phi(\rho) \le \Phi(2\theta^k R) \le \theta^{\gamma k} \Phi(2R) + b (R \theta^k)^{\beta} c(\theta)
\le C \Big[ \Big( \frac{\rho}{R} \Big)^{\gamma} \Phi(2R) + b\, c(\theta) \rho^{\beta} \Big].
\]
(ii). We have
\[
-(a_{ij}(x_0) u_{x_i})_{x_j} = \big[ (a_{ij}(x) - a_{ij}(x_0)) u_{x_i} + \delta_{kj} f \big]_{x_j},
\]
that is,
\[
L_0 u = g,
\]
where $L_0$ is an operator with constant coefficients. To simplify notation, we take $x_0 = 0$. Let $w \in H^1_0(B_R)$ be defined by
\[
L_0 w = g,
\]
and $v = u - w$. Let $\tilde v = Dv$, so that $L_0 \tilde v = 0$. Proceeding as in (106),
\[
\int_{B_{\rho}} |\tilde v|^2 \le C \Big( \frac{\rho}{R} \Big)^{n-\nu} \int_{B_R} |\tilde v|^2.
\]
Consequently,
\[
\int_{B_{\rho}} |Du|^2 \le 2 \int_{B_{\rho}} |\tilde v|^2 + 2 \int_{B_{\rho}} |Dw|^2
\le C \Big( \frac{\rho}{R} \Big)^{n-\nu} \int_{B_R} |\tilde v|^2 + 2 \int_{B_{\rho}} |Dw|^2
\le C \Big( \frac{\rho}{R} \Big)^{n-\nu} \int_{B_R} |Du|^2 + C \Big[ \Big( \frac{\rho}{R} \Big)^{n-\nu} + 1 \Big] \int_{B_R} |Dw|^2.
\]
However, $w$ depends implicitly on $u$, and therefore we must proceed with caution:
\[
\int_{B_R} |Dw|^2 \le C \int_{B_R} a_{ij}(0) w_{x_i} w_{x_j} = \int_{B_R} g w
= -\int_{B_R} f w_{x_k} - \int_{B_R} (a_{ij}(x) - a_{ij}(0)) u_{x_i} w_{x_j}
\le C \int_{B_R} |f|^2 + \frac14 \int_{B_R} |Dw|^2 + C (\omega(R))^2 \int_{B_R} |Du|^2,
\]
where $\omega(R)$ is the modulus of continuity of $a_{ij}$. Therefore
\[
\int_{B_{\rho}} |Du|^2 \le C \Big( \frac{\rho}{R} \Big)^{n-\nu} \int_{B_R} |Du|^2 + C R^{\lambda} \|f\|_{\mathcal{L}^{2,\lambda}}^2 + C (\omega(R))^2 \int_{B_R} |Du|^2.
\]
Thus, choosing $R$ so that $\omega(R)$ is sufficiently small and applying lemma 89 to the function
\[
\Phi(\rho) = \sup_{\tilde\rho \le \rho} \int_{B_{\tilde\rho}} |Du|^2,
\]
we obtain $Du \in L^{2,\lambda}$ if $\lambda < n$.
(iii). Let $g$ and $w$ be as in (ii). We have
\[
\int_{B_{\rho}} |Du - (Du)_{\rho}|^2 \le C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \int_{B_{2R}} |Du - (Du)_{2R}|^2 + C \int_{B_{2R}} |Dw|^2
\]
\[
\le C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \int_{B_{2R}} |Du - (Du)_{2R}|^2 + C \int_{B_{2R}} |f - f_{2R}|^2 + C \omega(2R)^2 \int_{B_{2R}} |Du|^2.
\]
By hypothesis we have $\omega(R) \le C R^{\alpha}$ and
\[
\int_{B_{2R}} |f - f_{2R}|^2 \le C R^{n+2\alpha}.
\]
For $\lambda_0 < n$, part (ii) implies $Du \in L^{2,\lambda_0}$. Choosing
\[
\Phi(\rho) = \sup_{\tilde\rho \le \rho} \int_{B_{\tilde\rho}} |Du - (Du)_{\tilde\rho}|^2,
\]
we obtain
\[
\Phi(\rho) \le C \Big( \frac{\rho}{R} \Big)^{n+2-\nu} \Phi(2R) + C R^{\lambda_0 + 2\alpha},
\]
and so
\[
\Phi(\rho) \le C \rho^{\lambda_0 + 2\alpha}.
\]
4
Optimal control and viscosity solutions
This chapter is dedicated to the study of deterministic optimal control problems and their connection with Hamilton-Jacobi equations.
A typical problem in optimal control, which is studied in detail in this chapter, is the terminal value optimal control problem. This problem consists in determining the optimal trajectories $x(\cdot)$ which minimize
\[
J[u; x, t] = \int_t^{t_1} L(x, u)\, ds + \psi(x(t_1)),
\]
among all controls $u(\cdot) : [t, t_1] \to \mathbb{R}^n$ and all (continuous) trajectories $x$ with initial condition $x(t) = x$ which are (almost everywhere in time) solutions to the controlled dynamics
\[
\dot x = f(x, u).
\]
The value function $V$ is defined as
\[
(107)\qquad V(x,t) = \inf J[u; x, t],
\]
in which the infimum is taken over all controls.
An important case is the "calculus of variations setting". In this case, $f(x,u) = u$, and the optimal trajectories $x(\cdot)$ are solutions to the Euler-Lagrange equation
\[
\frac{d}{dt} \frac{\partial L}{\partial v}(x, \dot x) - \frac{\partial L}{\partial x}(x, \dot x) = 0,
\]
and $p = -D_v L(x, \dot x)$ is a solution of Hamilton's equations:
\[
\dot x = -D_p H(p, x), \qquad \dot p = D_x H(p, x).
\]
This problem was studied in detail in chapter 2. However, we will revisit it and generalize the previous results by allowing more general Lagrangians. In fact, we will work under the following assumptions: $L(x,v) : \mathbb{R}^{2n} \to \mathbb{R}$, $x \in \mathbb{R}^n$, $v \in \mathbb{R}^n$, is a $C^{\infty}$ function, strictly convex in $v$ (i.e., $D^2_{vv} L$ is positive definite), satisfying the coercivity condition
\[
\lim_{|v| \to \infty} \frac{L(x,v)}{|v|} = \infty,
\]
for each $x$; without loss of generality, we may also assume that $L(x,v) \ge 0$, by adding a constant if necessary. We will also assume that
\[
L(x,0) \le c_1, \qquad |D_x L| \le c_2 L + c_3,
\]
for suitable constants $c_1$, $c_2$ and $c_3$; finally, we assume that there exists a function $C(R)$ such that
\[
|D^2_{xx} L| \le C(R), \qquad |D_v L| \le C(R)
\]
whenever $|v| \le R$. The terminal cost, $\psi$, is assumed to be a bounded Lipschitz function.
Example 40. Note that, although the conditions on $L$ are quite technical, they are fulfilled by a wide class of Lagrangians, for instance
\[
L(x,v) = \frac12 v^T A(x) v - V(x),
\]
where $A$ and $V$ are $C^{\infty}$, $\mathbb{Z}^n$-periodic in $x$, and $A(x)$ is positive definite. J
Before considering the "calculus of variations setting" we study a simpler case. Let $U$, the control space, be a compact convex set. We restrict the class of admissible controls by requiring $u(s) \in U$ for all $t \le s \le t_1$. Furthermore, we suppose that $L(x,u)$ is a bounded continuous function, convex in $u$, and that the function $f(x,u)$ satisfies the Lipschitz condition
\[
|f(x,u) - f(y,u)| \le C |x - y|.
\]
To establish existence of optimal solutions we simplify even further by assuming that $f(x,u)$ has the form
\[
(108)\qquad f(x,u) = A(x) u + B(x),
\]
where $A$ and $B$ are Lipschitz continuous functions.
In section 1, we start the rigorous study of optimal control problems by establishing basic properties. The dynamic programming principle is proved in §2. The analog of the Euler-Lagrange equation for optimal control problems is the Pontryagin maximum principle, which will be studied in §3. In §4, we show that, if the value function $V$ is differentiable, it satisfies the Hamilton-Jacobi partial differential equation
\[
-V_t + H(D_x V, x) = 0,
\]
in which $H(p,x)$, the Hamiltonian, is the (generalized) Legendre transform of the Lagrangian $L$:
\[
(109)\qquad H(p,x) = \sup_{v \in U} \big[ -p \cdot f(x,v) - L(x,v) \big].
\]
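Formula (109) can be evaluated numerically by maximizing over a grid of controls. The sketch below (with a hypothetical quadratic Lagrangian, in the calculus of variations case $f(x,v) = v$) recovers the classical identity $\sup_v [-pv - \tfrac12 v^2] = \tfrac12 p^2$:

```python
import numpy as np

def hamiltonian(p, L, vmax=10.0, n=20001):
    # H(p) = sup_v [ -p*v - L(v) ]: the generalized Legendre transform
    # of (109) with f(x, v) = v, approximated on a uniform grid
    v = np.linspace(-vmax, vmax, n)
    return np.max(-p * v - L(v))

L = lambda v: 0.5 * v ** 2  # hypothetical quadratic Lagrangian
for p in (-2.0, 0.0, 1.5):
    # for L(v) = v^2/2 the transform is H(p) = p^2/2
    assert abs(hamiltonian(p, L) - 0.5 * p ** 2) < 1e-6
```

The grid bounds and resolution are arbitrary choices; they suffice here because the maximizer $v = -p$ lies well inside the grid.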
It is well known that first-order partial differential equations such as the Hamilton-Jacobi equation may not admit classical solutions. Using the method of characteristics, the next exercise gives an example of non-existence of smooth solutions:
Exercise 159. Solve, using the method of characteristics, the equation
\[
\begin{cases}
u_t + u_x^2 = 0, & x \in \mathbb{R},\ t > 0,\\
u(x,0) = \pm x^2.
\end{cases}
\]
It is therefore necessary to consider weak solutions to the Hamilton-Jacobi equation: viscosity solutions. In §9 we develop the theory of viscosity solutions for Hamilton-Jacobi equations, and show that the value function is the unique viscosity solution of the Hamilton-Jacobi equation.
Finally, in §10 we address the stationary optimal control problem, which corresponds to the Hamilton-Jacobi equation
\[
H(D_x u, x) = \overline{H},
\]
and the discounted cost infinite horizon problem, whose Hamilton-Jacobi equation is
\[
\alpha u + H(Du, x) = 0.
\]
Main references on optimal control and viscosity solutions are [BCD97],
[FS93], [Lio82], [Bar94], and [Eva98b].
1. Elementary examples and properties
In this section we establish some elementary properties and study
some explicit examples.
Proposition 90. The value function $V$ satisfies the following inequalities:
\[
-\|\psi\|_{\infty} \le V \le c_1 |t_1 - t| + \|\psi\|_{\infty}.
\]
Proof. The first inequality follows from $L \ge 0$. To obtain the second inequality it is enough to observe that
\[
V \le J(x, t; 0) \le c_1 |t_1 - t| + \|\psi\|_{\infty}.
\]
Example 41 (Lax-Hopf formula). Suppose that $L(x,v) \equiv L(v)$, with $L$ convex in $v$ and coercive. Assume further that $f(x,v) = v$. By Jensen's inequality,
\[
\frac{1}{t_1 - t} \int_t^{t_1} L(\dot x(s))\, ds \ge L\Big( \frac{1}{t_1 - t} \int_t^{t_1} \dot x(s)\, ds \Big) = L\Big( \frac{y - x}{t_1 - t} \Big),
\]
where $y = x(t_1)$. Therefore, to solve the terminal value optimal control problem, it is enough to consider constant controls of the form $u(s) = \frac{y-x}{t_1-t}$. Thus
\[
V(x,t) = \inf_{y \in \mathbb{R}^n} \Big[ (t_1 - t) L\Big( \frac{y - x}{t_1 - t} \Big) + \psi(y) \Big],
\]
and, by the coercivity of $L$, the infimum is attained. Thus the Lax-Hopf formula gives an explicit solution to the optimal control problem. J
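For quadratic data (chosen here purely for illustration, since in that case the minimization over $y$ can be done in closed form by completing the square), the Lax-Hopf formula is easy to check numerically:

```python
import numpy as np

def lax_hopf(x, t, t1, L, psi, ymax=50.0, n=200001):
    # V(x, t) = min_y [ (t1 - t) L((y - x)/(t1 - t)) + psi(y) ],
    # approximated by minimizing over a grid of terminal points y
    y = np.linspace(-ymax, ymax, n)
    s = t1 - t
    return np.min(s * L((y - x) / s) + psi(y))

L = lambda v: 0.5 * v ** 2
psi = lambda y: 0.5 * y ** 2
x, t, t1 = 2.0, 0.0, 1.0
# completing the square gives V(x, t) = x^2 / (2 (1 + t1 - t)) in this case
exact = x ** 2 / (2 * (1 + t1 - t))
assert abs(lax_hopf(x, t, t1, L, psi) - exact) < 1e-6
```

Note that this $\psi$ is not bounded, so the example is only a sketch of the formula itself, not of the standing assumptions of the chapter.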
Exercise 160. Let $Q$ and $A$ be $n \times n$ constant positive definite matrices, $L(v) = \frac12 v^T Q v$ and $\psi(y) = \frac12 y^T A y$. Use the Lax-Hopf formula to determine $V(x,t)$.
Proposition 91. Let $\psi_1(x)$ and $\psi_2(x)$ be continuous functions such that
\[
\psi_1 \le \psi_2,
\]
and let $V_1(x,t)$ and $V_2(x,t)$ be the corresponding value functions. Then
\[
V_1(x,t) \le V_2(x,t).
\]
Proof. Fix $\varepsilon > 0$. Then there exists an almost optimal control $u_{\varepsilon}$, with corresponding trajectory $x_{\varepsilon}$, such that
\[
V_2(x,t) > \int_t^{t_1} L(x_{\varepsilon}(s), u_{\varepsilon}(s), s)\, ds + \psi_2(x_{\varepsilon}(t_1)) - \varepsilon.
\]
Clearly,
\[
V_1(x,t) \le \int_t^{t_1} L(x_{\varepsilon}(s), u_{\varepsilon}(s), s)\, ds + \psi_1(x_{\varepsilon}(t_1)),
\]
and therefore
\[
V_1(x,t) - V_2(x,t) \le \psi_1(x_{\varepsilon}(t_1)) - \psi_2(x_{\varepsilon}(t_1)) + \varepsilon \le \varepsilon.
\]
Since $\varepsilon$ is arbitrary, this ends the proof.
An important corollary is the continuity of the value function (with
respect to the L∞ norm) on the terminal value.
Corollary 92. Let $\psi_1(x)$ and $\psi_2(x)$ be continuous functions and $V_1(x,t)$ and $V_2(x,t)$ the corresponding value functions. Then
\[
\sup_x |V_1(x,t) - V_2(x,t)| \le \sup_x |\psi_1(x) - \psi_2(x)|.
\]
Proof. Note that
\[
\psi_1 \le \tilde\psi_2 \equiv \psi_2 + \sup_y |\psi_1(y) - \psi_2(y)|.
\]
Let $\tilde V_2$ be the value function corresponding to $\tilde\psi_2$. Clearly,
\[
\tilde V_2 = V_2 + \sup_y |\psi_1(y) - \psi_2(y)|.
\]
By the previous proposition, $V_1 - \tilde V_2 \le 0$, which implies
\[
V_1 - V_2 \le \sup_y |\psi_1(y) - \psi_2(y)|.
\]
Reversing the roles of $V_1$ and $V_2$ we obtain the other inequality.
2. Dynamic programming principle
The dynamic programming principle, which we prove in the next theorem, is simply a semigroup property satisfied by the evolution of the value function.
Theorem 93 (Dynamic programming principle). Suppose that $t_0 \le t \le t' \le t_1$. Then
\[
(110)\qquad V(x,t) = \inf_u \Big[ \int_t^{t'} L(x(s), u(s), s)\, ds + V(x(t'), t') \Big],
\]
where $x(t) = x$ and $\dot x = f(x, u)$.
Proof. Denote by $\tilde V(x,t)$ the right-hand side of (110). For fixed $\varepsilon > 0$, let $u_{\varepsilon}$ be an almost optimal control for $V(x,t)$, and $x_{\varepsilon}(s)$ the corresponding trajectory, i.e.,
\[
J(x, t; u_{\varepsilon}) \le V(x,t) + \varepsilon.
\]
We claim that $\tilde V(x,t) \le V(x,t) + \varepsilon$. To check this statement, let $x(\cdot) = x_{\varepsilon}(\cdot)$ and $y = x_{\varepsilon}(t')$. Then
\[
\tilde V(x,t) \le \int_t^{t'} L(x_{\varepsilon}(s), u_{\varepsilon}(s), s)\, ds + V(y, t').
\]
Additionally,
\[
V(y, t') \le J(y, t'; u_{\varepsilon}).
\]
Therefore
\[
\tilde V(x,t) \le J(x, t; u_{\varepsilon}) \le V(x,t) + \varepsilon,
\]
and, since $\varepsilon$ is arbitrary, $\tilde V(x,t) \le V(x,t)$.
To prove the opposite inequality, we proceed by contradiction. If $\tilde V(x,t) < V(x,t)$, we could choose $\varepsilon > 0$ and a control $u^{\sharp}$ such that
\[
\int_t^{t'} L(x^{\sharp}(s), u^{\sharp}(s), s)\, ds + V(y, t') < V(x,t) - \varepsilon,
\]
where $\dot x^{\sharp} = f(x^{\sharp}, u^{\sharp})$, $x^{\sharp}(t) = x$, and $y = x^{\sharp}(t')$. Choose $u^{\flat}$ such that
\[
J(y, t'; u^{\flat}) \le V(y, t') + \frac{\varepsilon}{2}.
\]
Define $u^{\star}$ by
\[
u^{\star}(s) =
\begin{cases}
u^{\sharp}(s) & \text{for } s < t',\\
u^{\flat}(s) & \text{for } t' < s.
\end{cases}
\]
Then we would have
\[
V(x,t) - \varepsilon > \int_t^{t'} L(x^{\sharp}(s), u^{\sharp}(s), s)\, ds + V(y, t')
\ge \int_t^{t'} L(x^{\sharp}(s), u^{\sharp}(s), s)\, ds + J(y, t'; u^{\flat}) - \frac{\varepsilon}{2}
= J(x, t; u^{\star}) - \frac{\varepsilon}{2} \ge V(x,t) - \frac{\varepsilon}{2},
\]
which is a contradiction.
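The semigroup identity (110) is transparent in a discrete-time analogue. The sketch below (a toy finite-state problem, not taken from the text) computes the value function by backward dynamic programming and confirms it agrees with a brute-force minimum over all control sequences:

```python
import itertools

# toy discrete analogue of (110): states 0..4, controls {-1, 0, 1},
# dynamics x' = clip(x + u), running cost L(x, u) = u^2 + x,
# terminal cost psi(x) = (x - 2)^2, horizon T = 3
states, controls, T = range(5), (-1, 0, 1), 3
step = lambda x, u: min(4, max(0, x + u))
L = lambda x, u: u * u + x
psi = lambda x: (x - 2) ** 2

# value by backward dynamic programming (the discrete form of (110))
V = {x: psi(x) for x in states}
for _ in range(T):
    V = {x: min(L(x, u) + V[step(x, u)] for u in controls) for x in states}

# value by brute force over all control sequences
def brute(x0):
    best = float("inf")
    for us in itertools.product(controls, repeat=T):
        x, cost = x0, 0.0
        for u in us:
            cost, x = cost + L(x, u), step(x, u)
        best = min(best, cost + psi(x))
    return best

assert all(abs(V[x] - brute(x)) < 1e-12 for x in states)
```

The backward recursion is exactly the statement that optimizing over $[t, t_1]$ equals optimizing over $[t, t']$ with terminal cost $V(\cdot, t')$.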
3. Pontryagin maximum principle
In this section we assume the control space U is bounded and that
there exists an optimal control u∗ and corresponding optimal trajectory
x∗. We assume also that the terminal data ψ is differentiable.
Let $r \in [t, t_1)$ be a point where $u^*$ is strongly approximately continuous, i.e.,
\[
\varphi(u^*(r)) = \lim_{\delta \to 0} \frac{1}{\delta} \int_r^{r+\delta} \varphi(u^*(s))\, ds,
\]
for all continuous functions $\varphi$. Denote by $\Xi_0$ the fundamental solution of
\[
(111)\qquad \dot\xi_0 = D_x f(x^*, u^*)\, \xi_0,
\]
with $\Xi_0(r) = I$. Let $p^*$ be given by
\[
(112)\qquad p^*(r) = D_x\psi(x^*(t_1))\, \Xi_0(t_1) + \int_r^{t_1} D_x L(x^*(s), u^*(s), s)\, \Xi_0(s)\, ds.
\]
Lemma 94 (Pontryagin maximum principle). Suppose that $\psi$ is differentiable. Then, for almost all $r \in [t, t_1)$,
\[
(114)\qquad f(x^*(r), u^*(r)) \cdot p^*(r) + L(x^*(r), u^*(r), r)
= \min_{v \in U} \big[ f(x^*(r), v) \cdot p^*(r) + L(x^*(r), v, r) \big].
\]
Proof. Let $v \in U$. For almost all $r \in [t_0, t_1)$, $u^*$ is strongly approximately continuous (see [EG92]). Let $r$ be one of these points. Define
\[
u_{\delta}(s) =
\begin{cases}
v & \text{if } r < s < r + \delta,\\
u^*(s) & \text{otherwise},
\end{cases}
\]
and
\[
x_{\delta}(s) =
\begin{cases}
x^*(s) & \text{if } t < s < r,\\
x^*(r) + \int_r^s f(x_{\delta}, v) & \text{if } r < s < r + \delta,\\
x^*(s) + \delta \xi_{\delta} & \text{if } r + \delta < s < t_1,
\end{cases}
\]
where
\[
\xi_{\delta}(r+\delta) = \frac{1}{\delta} \int_r^{r+\delta} \big[ f(x_{\delta}(s), v) - f(x^*(s), u^*(s)) \big]\, ds,
\]
and $y_{\delta} = x^*(s) + \delta \xi_{\delta}$ solves, for $r + \delta < s < t_1$,
\[
\dot y_{\delta} = f(y_{\delta}, u^*).
\]
Observe that
\[
\xi_0(r) = \lim_{\delta \to 0} \xi_{\delta}(r+\delta) = f(x^*(r), v) - f(x^*(r), u^*(r)).
\]
Then, as $\delta \to 0$, $\xi_{\delta}$ converges to the solution $\xi_0$ of (111). Thus $\xi_0(s) = \Xi_0(s) \big( f(x^*(r), v) - f(x^*(r), u^*(r)) \big)$.
Clearly,
\[
J(t, x; u^*) \le \int_t^{t_1} L(x_{\delta}(s), u_{\delta}(s), s)\, ds + \psi(x^*(t_1) + \delta \xi_{\delta}).
\]
This last inequality implies
\[
\frac{1}{\delta} \int_r^{r+\delta} \big[ L(x_{\delta}(s), v, s) - L(x^*(s), u^*(s), s) \big]\, ds
+ \frac{1}{\delta} \int_{r+\delta}^{t_1} \big[ L(x^*(s) + \delta \xi_{\delta}, u^*(s), s) - L(x^*(s), u^*(s), s) \big]\, ds
+ \frac{1}{\delta} \big[ \psi(x^*(t_1) + \delta \xi_{\delta}) - \psi(x^*(t_1)) \big] \ge 0.
\]
When $\delta \to 0$, the first term converges to
\[
L(x^*(r), v, r) - L(x^*(r), u^*(r), r),
\]
since $u^*$ is strongly approximately continuous. The second term tends to
\[
\int_r^{t_1} D_x L(x^*(s), u^*(s), s)\, \xi_0(s)\, ds,
\]
whereas the third one has the following limit:
\[
D_x\psi(x^*(t_1)) \cdot \xi_0(t_1).
\]
This implies that, for almost all $r$,
\[
L(x^*(r), v, r) - L(x^*(r), u^*(r), r) + p^*(r) \cdot \big( f(x^*(r), v) - f(x^*(r), u^*(r)) \big) \ge 0.
\]
Consequently,
\[
f(x^*(r), u^*(r)) \cdot p^*(r) + L(x^*(r), u^*(r), r)
= \min_{v \in U} \big[ f(x^*(r), v) \cdot p^*(r) + L(x^*(r), v, r) \big],
\]
as required.
4. The Hamilton-Jacobi equation
Proposition 95. Suppose the value function is $C^1$. Let $r \in [t, t_1)$ be a point where $u^*$ is strongly approximately continuous. Then
\[
p^*(r) = D_x V(x, r).
\]
Proof. Let $u^*$ be an optimal control for the initial condition $(x, r)$. For $y \in \mathbb{R}^n$ and $\delta > 0$ consider the solution of
\[
\dot x_{\delta} = f(x_{\delta}, u^*),
\]
with initial condition $x_{\delta}(r) = x + \delta y$. Then
\[
\frac{\partial x_{\delta}(s)}{\partial \delta}\Big|_{\delta = 0} = \Xi_0(s)\, y.
\]
Since for all $\delta$
\[
V(x + \delta y, r) \le \int_r^{t_1} L(x_{\delta}, u^*)\, ds + \psi(x_{\delta}(t_1)),
\]
with equality at $\delta = 0$, by differentiating with respect to $\delta$ we obtain
\[
D_x V(x, r)\, y = \int_r^{t_1} D_x L(x(s), u^*(s))\, \Xi_0(s)\, y\, ds + D_x\psi(x(t_1))\, \Xi_0(t_1)\, y,
\]
which implies the result.
Theorem 96. Suppose the value function $V$ is $C^1$. Then it solves
\[
(115)\qquad -V_t + H(D_x V, x) = 0.
\]
Proof. Consider an optimal trajectory $x^*$, so that
\[
V(x^*(t), t) = \int_t^{t_1} L(x^*(s), u^*(s))\, ds + \psi(x^*(t_1)).
\]
Differentiating with respect to $t$, we have
\[
V_t(x^*(t), t) + D_x V(x^*(t), t)\, f(x^*(t), u^*(t)) + L(x^*(t), u^*(t)) = 0,
\]
which, by the Pontryagin maximum principle and proposition 95, is equivalent to the Hamilton-Jacobi equation (115).
Exercise 161. Let $M(t)$, $N(t)$ be $n \times n$ matrices with time-differentiable coefficients. Suppose that $N$ is invertible. Let $D$ be an $n \times n$ constant matrix. Consider the Lagrangian
\[
L(x,v) = \frac12 x^T M(t) x + \frac12 v^T N(t) v
\]
and the terminal condition $\psi = \frac12 x^T D x$. Show that there exists a solution to the Hamilton-Jacobi equation with terminal condition $\psi$ at $t = T$ of the form
\[
V = \frac12 x^T P(t) x,
\]
where $P(t)$ satisfies the Riccati equation
\[
\dot P = P^T N^{-1} P - M
\]
and $P(T) = D$.
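A numerical sketch related to exercise 161 (with hypothetical data, and a crude explicit Euler scheme chosen only for transparency): integrate the Riccati equation backwards from $P(T) = D$. In the scalar case $M = 0$, $N = 1$, $D = 1$, $T = 1$, the equation $\dot P = P^2$ has the exact solution $P(t) = 1/(1 + T - t)$, so $P(0) = 1/2$:

```python
import numpy as np

def riccati_backward(M, N, D, T, steps=100000):
    # integrate P' = P^T N^{-1} P - M backwards in time from P(T) = D,
    # with explicit Euler (a sketch; a proper ODE solver may be preferable)
    Ninv = np.linalg.inv(N)
    P, dt = D.astype(float).copy(), T / steps
    for _ in range(steps):
        P = P - dt * (P.T @ Ninv @ P - M)  # step from t down to t - dt
    return P  # approximation of P(0)

P0 = riccati_backward(np.zeros((1, 1)), np.eye(1), np.eye(1), 1.0)
assert abs(P0[0, 0] - 0.5) < 1e-4  # exact value is P(0) = 1/2
```

The function names and step count here are illustrative choices, not part of the text.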
5. Verification theorem
Theorem 97. Let $L(x,v)$ be a $C^1$ Lagrangian, strictly convex in $v$, let $f(x,u)$ be a control law satisfying (108), and let $H$ be the generalized Legendre transform (109) of $L$. Let $\Phi(x,t)$ be a classical solution to the Hamilton-Jacobi equation
\[
(116)\qquad -\Phi_t + H(D_x \Phi, x) = 0
\]
on the time interval $[0, T]$, with terminal data $\Phi(x,T) = \varphi(x)$. Then, for all $0 \le t \le T$,
\[
\Phi(x,t) = V(x,t),
\]
where $V$ is the value function.
Proof. Let $x$ be any trajectory satisfying $\dot x = f(x, u)$. Then
\[
\varphi(x(T)) - \Phi(x(t), t) = \int_t^T \frac{d}{ds} \Phi(x(s), s)\, ds
= \int_t^T \big[ D_x\Phi(x(s), s) \cdot f(x, u) + \Phi_s(x(s), s) \big]\, ds.
\]
Adding $\int_t^T L(x(s), u(s))\, ds + \Phi(x(t), t)$ to both sides and taking the infimum over all controls $u$, we obtain
\[
\inf \Big( \int_t^T L(x(s), u(s))\, ds + \varphi(x(T)) \Big)
= \Phi(x(t), t) + \inf \Big( \int_t^T \big[ \Phi_s(x(s), s) + L(x(s), u(s)) + D_x\Phi(x(s), s) \cdot f(x, u) \big]\, ds \Big).
\]
Now recall that, for any $v$,
\[
-H(p, x) \le L(x, v) + p \cdot f(x, v);
\]
therefore, using the Hamilton-Jacobi equation (116),
\[
\inf \Big( \int_t^T L(x(s), u(s))\, ds + \varphi(x(T)) \Big)
\ge \Phi(x(t), t) + \inf \Big( \int_t^T \big[ \Phi_s(x(s), s) - H(D_x\Phi(x(s), s), x(s)) \big]\, ds \Big) = \Phi(x(t), t).
\]
Let $r(x,t)$ be uniquely defined by
\[
(117)\qquad r(x,t) \in \operatorname{argmin}_{v \in U} \big[ L(x,v) + D_x\Phi(x,t) \cdot f(x,v) \big].
\]
A simple argument shows that $r$ is a continuous function. Now consider the trajectory $x$ given by solving the differential equation
\[
\dot x(s) = f(x(s), r(x(s), s)),
\]
with initial condition $x(t) = x$. Note that, since the right-hand side is continuous, a solution exists, although it may not be unique. Along this trajectory the infimum in $v$ is attained, and so
\[
\inf \Big( \int_t^T L(x(s), u(s))\, ds + \varphi(x(T)) \Big)
\le \Phi(x(t), t) + \int_t^T \big[ \Phi_s(x(s), s) - H(D_x\Phi(x(s), s), x(s)) \big]\, ds = \Phi(x(t), t),
\]
which ends the proof.
We should observe from the proof that (117) gives an optimal feedback law, provided we can find a solution to the Hamilton-Jacobi equation.
6. Existence of optimal controls - bounded control space
We now prove the existence of optimal controls for a bounded control space. The unbounded case will be addressed in §8.
Lemma 98. Suppose that $f$ is as in (108). Then $J$ is lower semicontinuous with respect to weak-* convergence in $L^{\infty}$.
Proof. Let $u_n$ be a sequence of controls such that $u_n \rightharpoonup^* u$ in $L^{\infty}[t, t_1]$. Then, using the Ascoli-Arzelà theorem, we can extract a subsequence such that $x_n(\cdot)$ converges uniformly to $x(\cdot)$. Furthermore, because the control law (108) is linear in $u$, we have
\[
\dot x = f(x, u).
\]
We write
\[
J(x, t; u_n) = \int_t^{t_1} \big[ L(x_n(s), u_n(s), s) - L(x(s), u_n(s), s) \big]\, ds
+ \int_t^{t_1} L(x(s), u_n(s), s)\, ds + \psi(x_n(t_1)).
\]
The first term, $\int_t^{t_1} \big[ L(x_n(s), u_n(s), s) - L(x(s), u_n(s), s) \big]\, ds$, converges to zero. Similarly, $\psi(x_n(t_1)) \to \psi(x(t_1))$. Finally, the convexity of $L$ in $u$ implies
\[
L(x(s), u_n(s), s) \ge L(x(s), u(s), s) + D_v L(x(s), u(s), s) (u_n(s) - u(s)).
\]
Since $u_n \rightharpoonup u$,
\[
\int_t^{t_1} D_v L(x(s), u(s), s)(u_n(s) - u(s))\, ds \to 0.
\]
Hence
\[
\liminf J(x, t; u_n) \ge J(x, t; u),
\]
and so $J$ is weakly lower semicontinuous.
Using the previous result we can now state and prove our first
existence result.
Lemma 99. Suppose the control set $U$ is bounded. Then there exists a minimizer $u^*$ of $J$ in $U$.
Proof. Let $u_n$ be a minimizing sequence, that is, one for which
\[
J(x, t; u_n) \to \inf_{u \in U} J(x, t; u).
\]
Because this sequence is bounded in $L^{\infty}$, by the Banach-Alaoglu theorem we can extract a subsequence $u_n \rightharpoonup^* u^*$. Clearly, $u^* \in U$. We claim that
\[
J(x, t; u^*) = \inf_{u \in U} J(x, t; u).
\]
This follows from the weak lower semicontinuity:
\[
\inf_{u \in U} J(x, t; u) \le J(x, t; u^*) \le \liminf J(x, t; u_n) = \inf_{u \in U} J(x, t; u),
\]
which ends the proof.
Example 42 (Bang-Bang principle). Consider the case of a bounded, closed, convex control space $U$, and suppose the Lagrangian vanishes, $f(x,v) = v$, and the terminal value $\psi$ is convex.
In this setting we first observe that the set of all optimal controls is convex; as such, it admits an extreme point $u^*$. We claim that $u^*$ takes values on $\partial U$.
To see this, choose a time $r$ and suppose that for some $\varepsilon$ there is a set of positive measure in $[r, r+\varepsilon]$ on which $u^*$ lies in the interior of $U$. Then there exists an $L^{\infty}$ function $\nu$, supported on this set, such that $\int_r^{r+\varepsilon} \nu\, ds = 0$ and $u^* \pm \nu$ is an admissible control. By our assumptions it is also an optimal control. But then $u^*$ is not an extreme point, which is a contradiction. J
Exercise 162. Show that the Bang-Bang principle also holds if the Lagrangian is independent of the state variable $x$, that is, $L \equiv L(v)$.
7. Sub and superdifferentials
Let $\psi : \mathbb{R}^n \to \mathbb{R}$ be a continuous function. The superdifferential $D_x^+\psi(x)$ of $\psi$ at $x$ is the set of vectors $p \in \mathbb{R}^n$ such that
\[
\limsup_{|v| \to 0} \frac{\psi(x+v) - \psi(x) - p \cdot v}{|v|} \le 0.
\]
Consequently, $p \in D_x^+\psi(x)$ if and only if
\[
\psi(x+v) \le \psi(x) + p \cdot v + o(|v|),
\]
as $|v| \to 0$. Similarly, the subdifferential, $D_x^-\psi(x)$, of $\psi$ at $x$ is the set of vectors $p$ such that
\[
\liminf_{|v| \to 0} \frac{\psi(x+v) - \psi(x) - p \cdot v}{|v|} \ge 0.
\]
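A concrete example (added here for illustration): for $\psi(x) = -|x|$ one has $D^+\psi(0) = [-1,1]$, while $D^-\psi(0) = \emptyset$. Since this $\psi$ is positively homogeneous, $p \in D^+\psi(0)$ reduces to $\psi(v) \le p\,v$ for all $v$, which can be checked numerically:

```python
import numpy as np

psi = lambda x: -np.abs(x)  # psi(x) = -|x|; D^+ psi(0) = [-1, 1]
v = np.linspace(-1.0, 1.0, 2001)
v = v[v != 0]

def in_superdiff(p):
    # for this 1-homogeneous psi, p in D^+ psi(0) iff psi(v) <= p*v for all v
    return bool(np.all(psi(v) <= p * v + 1e-12))

assert in_superdiff(-1.0) and in_superdiff(0.5) and in_superdiff(1.0)
assert not in_superdiff(1.5)  # slopes outside [-1, 1] fail
```

This mirrors the general fact, used repeatedly below, that concave kinks have a full interval of superdifferentials and an empty subdifferential.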
Exercise 163. Show that if u : Rn → R has a maximum (resp. mini-
mum) at x0 then 0 ∈ D+u(x0) (resp. D−u(x0)).
We can regard these sets as one-sided derivatives. In fact, if $\psi$ is differentiable then
\[
D_x^-\psi(x) = D_x^+\psi(x) = \{ D_x\psi(x) \}.
\]
More precisely:
Proposition 100. If $D_x^-\psi(x)$ and $D_x^+\psi(x)$ are both non-empty, then
\[
D_x^-\psi(x) = D_x^+\psi(x) = \{ p \}
\]
and $\psi$ is differentiable at $x$ with $D_x\psi = p$. Conversely, if $\psi$ is differentiable at $x$ then
\[
D_x^-\psi(x) = D_x^+\psi(x) = \{ D_x\psi(x) \}.
\]
Proof. Suppose that $D_x^-\psi(x)$ and $D_x^+\psi(x)$ are both non-empty. We claim that these two sets agree and consist of a single point $p$. To see this, take $p^- \in D_x^-\psi(x)$ and $p^+ \in D_x^+\psi(x)$. Then
\[
\liminf_{|v| \to 0} \frac{\psi(x+v) - \psi(x) - p^- \cdot v}{|v|} \ge 0,
\qquad
\limsup_{|v| \to 0} \frac{\psi(x+v) - \psi(x) - p^+ \cdot v}{|v|} \le 0.
\]
Subtracting these two inequalities,
\[
\liminf_{|v| \to 0} \frac{(p^+ - p^-) \cdot v}{|v|} \ge 0.
\]
In particular, choosing $v = -\varepsilon \frac{p^+ - p^-}{|p^- - p^+|}$ (if $p^+ \neq p^-$), we obtain
\[
-|p^- - p^+| \ge 0,
\]
which implies $p^- = p^+ \equiv p$. Additionally, $p$ satisfies
\[
\lim_{|v| \to 0} \frac{\psi(x+v) - \psi(x) - p \cdot v}{|v|} = 0,
\]
and, therefore, $D_x\psi = p$.
To prove the converse it suffices to observe that if $\psi$ is differentiable then
\[
\psi(x+v) = \psi(x) + D_x\psi(x) \cdot v + o(|v|).
\]
Exercise 164. Let ψ be a continuous function. Show that if x0 is a
local maximum of ψ then 0 ∈ D+ψ(x0).
Proposition 101. Let $\psi : \mathbb{R}^n \to \mathbb{R}$ be a continuous function. If
\[
p \in D_x^+\psi(x_0) \quad (\text{resp. } p \in D_x^-\psi(x_0)),
\]
then there exists a $C^1$ function $\phi$ such that $\psi(x) - \phi(x)$ has a strict local maximum (resp. minimum) at $x_0$ and
\[
p = D_x\phi(x_0).
\]
On the other hand, if $\phi$ is a $C^1$ function such that $\psi(x) - \phi(x)$ has a local maximum (resp. minimum) at $x_0$, then
\[
p = D_x\phi(x_0) \in D_x^+\psi(x_0) \quad (\text{resp. } D_x^-\psi(x_0)).
\]
Proof. By subtracting $p \cdot (x - x_0) + \psi(x_0)$ from $\psi$, we can assume, without loss of generality, that $\psi(x_0) = 0$ and $p = 0$. By changing coordinates, if necessary, we can also assume that $x_0 = 0$. Because $0 \in D_x^+\psi(0)$ we have
\[
\limsup_{|x| \to 0} \frac{\psi(x)}{|x|} \le 0.
\]
Therefore there exists a continuous function $\rho(x)$, with $\rho(0) = 0$, such that
\[
\psi(x) \le |x| \rho(x).
\]
Let $\eta(r) = \max_{|x| \le r} \rho(x)$. Then $\eta$ is continuous, non-decreasing, and $\eta(0) = 0$. Let
\[
\phi(x) = \int_{|x|}^{2|x|} \eta(r)\, dr + |x|^2.
\]
The function $\phi$ is $C^1$ and satisfies $\phi(0) = 0$ and $D_x\phi(0) = 0$. Additionally, if $x \neq 0$,
\[
\psi(x) - \phi(x) \le |x| \rho(x) - \int_{|x|}^{2|x|} \eta(r)\, dr - |x|^2 < 0.
\]
Thus $\psi - \phi$ has a strict local maximum at $0$.
To prove the second part of the proposition, suppose that the difference $\psi(x) - \phi(x)$ has a local maximum at $0$. Without loss of generality, we can assume $\psi(0) = \phi(0) = 0$. Then $\psi(x) - \phi(x) \le 0$ or, equivalently,
\[
\psi(x) \le p \cdot x + (\phi(x) - p \cdot x).
\]
Thus, setting $p = D_x\phi(0)$ and using the fact that
\[
\lim_{|x| \to 0} \frac{\phi(x) - p \cdot x}{|x|} = 0,
\]
we conclude that $D_x\phi(0) \in D_x^+\psi(0)$. The case of a minimum is similar.
A continuous function $f$ is semiconcave if there exists a constant $C$ such that
\[
f(x+y) + f(x-y) - 2f(x) \le C |y|^2.
\]
Similarly, a function $f$ is semiconvex if there exists a constant $C$ such that
\[
f(x+y) + f(x-y) - 2f(x) \ge -C |y|^2.
\]
Proposition 102. The following statements are equivalent:
1. $f$ is semiconcave;
2. $\tilde f(x) = f(x) - \frac C2 |x|^2$ is concave;
3. for all $\lambda$, $0 \le \lambda \le 1$, and any $y, z$ such that $\lambda y + (1-\lambda) z = 0$, we have
\[
\lambda f(x+y) + (1-\lambda) f(x+z) - f(x) \le \frac C2 \big( \lambda |y|^2 + (1-\lambda) |z|^2 \big).
\]
Additionally, if $f$ is semiconcave, then:
a. $D_x^+ f(x) \neq \emptyset$;
b. if $D_x^- f(x) \neq \emptyset$ then $f$ is differentiable at $x$;
c. there exists $C$ such that, for each $p_i \in D_x^+ f(x_i)$ ($i = 0, 1$),
\[
(x_0 - x_1) \cdot (p_0 - p_1) \le C |x_0 - x_1|^2.
\]
Remark. Of course analogous results hold for semiconvex functions.
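A typical semiconcave function is a minimum of smooth functions with uniformly bounded second derivatives. The sketch below checks the defining inequality (with $C = 2$) for $f(x) = \min((x-1)^2, (x+1)^2)$, an example chosen only for illustration:

```python
import numpy as np

# f(x) = min((x-1)^2, (x+1)^2) = x^2 - 2|x| + 1 is semiconcave with C = 2,
# since f(x) - x^2 = -2|x| + 1 is concave
f = lambda x: np.minimum((x - 1) ** 2, (x + 1) ** 2)
C = 2.0
X, Y = np.meshgrid(np.linspace(-3, 3, 301), np.linspace(-1, 1, 101))
gap = f(X + Y) + f(X - Y) - 2 * f(X) - C * Y ** 2
assert gap.max() <= 1e-12  # f(x+y) + f(x-y) - 2 f(x) <= C |y|^2
```

This is the prototype of the semiconcavity of the value function established in theorem 103 below, where $V$ is an infimum of smooth costs.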
Proof. Clearly $2 \implies 3 \implies 1$. Therefore, to prove the equivalence, it is enough to show that $1 \implies 2$. Subtracting $\frac C2 |x|^2$ from $f$, we may assume $C = 0$. Also, by changing coordinates if necessary, it suffices to prove that, for all $x, y$ such that $\lambda x + (1-\lambda) y = 0$ for some $\lambda \in [0,1]$, we have
\[
(118)\qquad \lambda f(x) + (1-\lambda) f(y) - f(0) \le 0.
\]
We claim that (118) holds for each $\lambda = \frac{k}{2^j}$, with $0 \le k \le 2^j$. Clearly (118) holds for $j = 1$. We proceed by induction on $j$: suppose that (118) is valid for $\lambda = \frac{k}{2^j}$; we show that it also holds for $\lambda = \frac{k}{2^{j+1}}$. If $k$ is even we can reduce the fraction, so we assume that $k$ is odd, $\lambda = \frac{k}{2^{j+1}}$ and $\lambda x + (1-\lambda) y = 0$. Note that
\[
0 = \frac12 \left[ \frac{k-1}{2^{j+1}} x + \left( 1 - \frac{k-1}{2^{j+1}} \right) y \right] + \frac12 \left[ \frac{k+1}{2^{j+1}} x + \left( 1 - \frac{k+1}{2^{j+1}} \right) y \right].
\]
Consequently, by semiconcavity with $C = 0$,
\[
f(0) \ge \frac12 f\left( \frac{k-1}{2^{j+1}} x + \left( 1 - \frac{k-1}{2^{j+1}} \right) y \right) + \frac12 f\left( \frac{k+1}{2^{j+1}} x + \left( 1 - \frac{k+1}{2^{j+1}} \right) y \right),
\]
but, since $k-1$ and $k+1$ are even, $k_0 = \frac{k-1}{2}$ and $k_1 = \frac{k+1}{2}$ are integers. Therefore
\[
f(0) \ge \frac12 f\left( \frac{k_0}{2^j} x + \left( 1 - \frac{k_0}{2^j} \right) y \right) + \frac12 f\left( \frac{k_1}{2^j} x + \left( 1 - \frac{k_1}{2^j} \right) y \right).
\]
But this, by the induction hypothesis, implies
\[
f(0) \ge \frac{k_0 + k_1}{2^{j+1}} f(x) + \left( 1 - \frac{k_0 + k_1}{2^{j+1}} \right) f(y).
\]
From $k_0 + k_1 = k$ we obtain
\[
f(0) \ge \frac{k}{2^{j+1}} f(x) + \left( 1 - \frac{k}{2^{j+1}} \right) f(y).
\]
Since the function $f$ is continuous and the rationals of the form $\frac{k}{2^j}$ are dense in $[0,1]$, we conclude that
\[
f(0) \ge \lambda f(x) + (1-\lambda) f(y),
\]
for each real $\lambda$ with $0 \le \lambda \le 1$.
To prove the second part of the proposition, observe that, by proposition 100, a $\implies$ b. To check a, i.e., that $D_x^+ f(x) \neq \emptyset$, it is enough to observe that concave functions have non-empty superdifferential at every point; subtracting $\frac C2 |x|^2$ from $f$ reduces the problem to the concave case. Finally, if $p_i \in D_x^+ f(x_i)$ ($i = 0, 1$) then
\[
f(x_0) - \frac C2 |x_0|^2 \le f(x_1) - \frac C2 |x_1|^2 + (p_1 - C x_1) \cdot (x_0 - x_1)
\]
and
\[
f(x_1) - \frac C2 |x_1|^2 \le f(x_0) - \frac C2 |x_0|^2 + (p_0 - C x_0) \cdot (x_1 - x_0).
\]
Adding these inequalities,
\[
0 \le (p_1 - p_0) \cdot (x_0 - x_1) + C |x_0 - x_1|^2,
\]
and so $(p_0 - p_1) \cdot (x_0 - x_1) \le C |x_0 - x_1|^2$.
Exercise 165. Let f : Rn → R be a continuous function. Show that if
x0 is a local maximum then 0 ∈ D+f(x0).
8. Optimal control in the calculus of variations setting
We now consider the calculus of variations setting and prove the existence of optimal controls. The main technical issue is that the control space $U = \mathbb{R}^n$ is unbounded, and therefore compactness arguments do not apply directly. Fortunately, the coercivity of the Lagrangian is enough to establish a priori bounds on optimal controls.
Theorem 103. Let $x \in \mathbb{R}^n$ and $t_0 \le t \le t_1$. Suppose that the Lagrangian $L(x,v)$ satisfies:
A. $L$ is $C^{\infty}$ and strictly convex in $v$ (i.e., $D^2_{vv} L$ is positive definite), and satisfies the coercivity condition
\[
\lim_{|v| \to \infty} \frac{L(x,v)}{|v|} = \infty,
\]
uniformly in $x$;
B. $L$ is bounded below (without loss of generality we assume $L(x,v) \ge 0$);
C. $L$ satisfies the inequalities
\[
L(x,0) \le c_1, \qquad |D_x L| \le c_2 L + c_3,
\]
for suitable constants $c_1$, $c_2$, and $c_3$;
D. there exist functions $C_0(R), C_1(R) : \mathbb{R}^+ \to \mathbb{R}^+$ such that
\[
|D_v L| \le C_0(R), \qquad |D^2_{xx} L| \le C_1(R)
\]
whenever $|v| \le R$.
Then, if $\psi$ is a bounded Lipschitz function:
1. There exists $u^* \in L^{\infty}[t, t_1]$ such that the corresponding optimal trajectory $x^*$, given by
\[
\dot x^*(s) = u^*(s), \qquad x^*(t) = x,
\]
satisfies
\[
V(x,t) = \int_t^{t_1} L(x^*(s), \dot x^*(s))\, ds + \psi(x^*(t_1)).
\]
2. There exists $C$, depending only on $L$, $\psi$ and $t_1 - t$, but not on $x$ or $t$, such that $|u^*(s)| < C$ for $t \le s \le t_1$. The optimal trajectory $x^*(\cdot)$ is a $C^2[t, t_1]$ solution of the Euler-Lagrange equation
\[
(119)\qquad \frac{d}{dt} D_v L - D_x L = 0,
\]
with initial condition $x^*(t) = x$.
3. The adjoint variable $p$, defined by
\[
(120)\qquad p(s) = -D_v L(x^*(s), \dot x^*(s)),
\]
satisfies the differential equations
\[
\dot p(s) = D_x H(p(s), x^*(s)), \qquad \dot x^*(s) = -D_p H(p(s), x^*(s)),
\]
with terminal condition
\[
p(t_1) \in D_x^- \psi(x^*(t_1)).
\]
Additionally,
\[
(p(s), H(p(s), x^*(s))) \in D^- V(x^*(s), s),
\]
for $t < s \le t_1$.
4. The value function $V$ is Lipschitz, and therefore differentiable almost everywhere.
5. If $D^2_{vv} L$ is uniformly bounded, then, for each $t < t_1$, $V(x,t)$ is semiconcave in $x$.
6. For $t \le s < t_1$,
\[
(p(s), H(p(s), x^*(s))) \in D^+ V(x^*(s), s),
\]
and, therefore, $DV(x^*(s), s)$ exists for $t < s < t_1$.
Proof. We will divide the proof into several auxiliary lemmas.
For R > 0, define UR = {u ∈ U : ‖u‖∞ ≤ R}. By lemma 99, there exists a minimizer uR of J in UR. We will show that the minimizers uR satisfy estimates that are uniform in R; finally, we will let R → ∞.
Let pR be the adjoint variable given by the Pontryagin maximum principle. We start by estimating the optimal control uR uniformly in R, in order to send R → ∞.
Lemma 104. Suppose ψ is differentiable. Then there exists a constant C, independent of R, such that
|pR| ≤ C.
Proof. Since ψ is Lipschitz and differentiable, we have
|Dxψ| ≤ ‖Dxψ‖∞ < ∞.
Therefore
|pR(s)| ≤ ∫_s^{t1} |DxL(xR(r), uR(r))| dr + ‖Dxψ‖∞.
Let VR be the value function for the terminal value problem with the additional constraint |v| ≤ R on the controls. From |DxL| ≤ c2L + c3 it follows that
|pR(s)| ≤ C(VR(x, t) + 1),
for an appropriate constant C. Proposition 90 shows that there exists a constant C, independent of R, such that VR ≤ C. Therefore |pR| ≤ C.
As we will see, the uniform estimates for pR yield uniform estimates
for uR.
Lemma 105. Let ψ be differentiable. Then there exists R1 > 0 such that, for all R,
‖uR‖∞ ≤ R1.
Proof. Suppose |p| ≤ C. Then, by the coercivity condition on L, for each c1 there exists R1 such that
v·p + L(x, v) ≤ c1 implies |v| ≤ R1.
But, since uR minimizes v ↦ v·pR(s) + L(xR(s), v),
uR(s)·pR(s) + L(xR(s), uR(s)) ≤ L(xR(s), 0) ≤ c1,
that is, ‖uR‖∞ ≤ R1.
Since uR is bounded independently of R, we have
V = J(x, t; uR0),
for R0 > R1. Let u∗ = uR0 and p = pR0 .
Lemma 106 (Pontryagin maximum principle - II). If ψ is differentiable, the optimal control u∗ satisfies
u∗·p + L(x∗, u∗) = min_v [v·p + L(x∗, v)] = −H(p, x∗),
for almost all s, and, therefore,
p = −DvL(x∗, u∗) and u∗ = −DpH(p, x∗),
where H = L∗. Additionally, p satisfies the terminal condition
p(t1) = Dxψ(x∗(t1)).
Proof. Clearly it is enough to choose R sufficiently large.
Lemma 107. Let ψ be differentiable. The minimizing trajectory x∗(·) is C2 and satisfies the Euler-Lagrange equation (119). Furthermore,
ṗ = DxH(p, x∗),  ẋ∗ = −DpH(p, x∗).
Proof. By its definition, p is continuous. We know that
ẋ∗(s) = −DpH(p(s), x∗(s)),
almost everywhere. Since the right-hand side of the previous identity is continuous, the identity holds everywhere and, therefore, we conclude that x∗ is C1. Because p is given by the integral of a continuous function (112),
p(r) = Dxψ(x∗(t1)) + ∫_r^{t1} DxL(x∗(s), u∗(s)) ds,
we conclude that p is C1. Additionally,
ẋ∗ = −DpH(p, x∗)
and, therefore, ẋ∗ is C1, which implies that x∗ is C2. We also have
p = −DvL(x∗, ẋ∗),  ṗ = −DxL(x∗, ẋ∗),
from which it follows that
(121) d/dt DvL(x∗, ẋ∗) − DxL(x∗, ẋ∗) = 0.
Thus, since DxL(x∗, ẋ∗) = −DxH(p, x∗), we conclude that
ṗ = DxH(p, x∗),  ẋ∗ = −DpH(p, x∗),
as required.
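The sign conventions above (p = −DvL, ẋ∗ = −DpH, ṗ = DxH) can be checked on a concrete case. The following sketch (our own illustration, not from the text) takes the mechanical Lagrangian L(x, v) = v²/2 − W(x), for which the convention −H(p, x) = min_v [v·p + L(x, v)] gives H(p, x) = p²/2 + W(x), and integrates the Euler-Lagrange equation and the Hamiltonian system side by side:

```python
import math

def W(x):            # potential; the choice W(x) = cos x is an assumption
    return math.cos(x)

def dW(x):
    return -math.sin(x)

def rk4(f, y, dt):
    # one classical Runge-Kutta step for y' = f(y)
    k1 = f(y)
    k2 = f([yi + dt/2*ki for yi, ki in zip(y, k1)])
    k3 = f([yi + dt/2*ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt*ki for yi, ki in zip(y, k3)])
    return [yi + dt/6*(a + 2*b + 2*c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

# Euler-Lagrange: d/dt DvL - DxL = 0, i.e. x'' = -W'(x)
el = lambda y: [y[1], -dW(y[0])]
# Hamiltonian system in the book's convention:
#   p = -DvL = -x',  x' = -DpH = -p,  p' = DxH = W'(x)
ham = lambda y: [-y[1], dW(y[0])]

x0, v0 = 1.0, 0.5
yel, yham = [x0, v0], [x0, -v0]      # p(0) = -v(0)
dt, steps = 1e-3, 4000
for _ in range(steps):
    yel = rk4(el, yel, dt)
    yham = rk4(ham, yham, dt)

# the trajectories agree, and p = -x' along the flow
print(abs(yel[0] - yham[0]))   # ~ 0
print(abs(yel[1] + yham[1]))   # ~ 0
```

Both integrations produce the same trajectory, confirming that the Hamiltonian system is the Euler-Lagrange equation rewritten in the variables (x, p).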
In the case in which ψ is only Lipschitz and not C1, we can consider a sequence of C1 functions ψn → ψ, converging uniformly, such that
‖Dxψn‖∞ ≤ ‖Dψ‖L∞
for each n. Let
Jn(x, t; u) = ∫_t^{t1} L(xn(s), ẋn(s)) ds + ψn(xn(t1)),
and let x∗n and u∗n be, respectively, the corresponding optimal trajectory and optimal control. Similarly, let pn be the corresponding adjoint variable. Passing to a subsequence, if necessary, the boundary values xn(t1) and pn(t1) converge, respectively, to some x0 and p0. By the Arzelà-Ascoli theorem, the optimal trajectories x∗n and the corresponding optimal controls u∗n converge uniformly to optimal trajectories and controls of the limit problem. Let
p(s) = lim_{n→∞} pn(s).
Then, for almost every s,
u∗(s)·p(s) + L(x∗(s), u∗(s)) = inf_v [v·p(s) + L(x∗(s), v)],
which implies
p(s) = −DvL(x∗(s), ẋ∗(s)),
for almost all s. But both terms in the previous equation are continuous functions; thus the identity holds for all s.
Lemma 108. For t < s ≤ t1 we have
(p(s), H(p(s), x∗(s))) ∈ D−V(x∗(s), s).
Proof. Let x∗ be the optimal trajectory and u∗ the corresponding optimal control. For r ≤ t1 and y ∈ Rn, define xr = x∗(r) and consider the sub-optimal control
u♯ = u∗ + (y − xr)/(r − t),
whose trajectory, with x♯(t) = x, we denote by x♯. Note that x♯(r) = y.
We have
V(x, t) = ∫_t^s L(x∗(τ), u∗(τ)) dτ + V(x∗(s), s)
and, by the sub-optimality of x♯,
V(x∗(t), t) ≤ ∫_t^r L(x♯(τ), u♯(τ)) dτ + V(y, r).
This implies
V(x∗(s), s) − V(y, r) ≤ φ(y, r),
with
φ(y, r) = ∫_t^r L(x♯(τ), u♯(τ)) dτ − ∫_t^s L(x∗(τ), u∗(τ)) dτ,
and with equality at (y, r) = (x∗(s), s). Since φ is differentiable in y and r,
(−Dyφ(x∗(s), s), −Drφ(x∗(s), s)) ∈ D−V(x∗(s), s).
Observe that
x♯(τ) = x∗(τ) + (y − xr)/(r − t) (τ − t),
and, therefore,
Dyφ(x∗(s), s) = ∫_t^s [DxL (τ − t)/(s − t) + DvL 1/(s − t)] dτ.
Integrating by parts and using (121), we obtain
Dyφ(x∗(s), s) = DvL(x∗(s), ẋ∗(s)) = −p(s).
Similarly,
Drφ(y, r) = L(y, u♯(r)) + ∫_t^r [−DxL (y − xr)/(r − t)² (τ − t) − DxL u∗(r)/(r − t) (τ − t) − DvL (y − xr)/(r − t)² − DvL u∗(r)/(r − t)] dτ.
Integrating by parts and evaluating at y = x∗(s), r = s, we obtain
Drφ(x∗(s), s) = L(x∗(s), ẋ∗(s)) − u∗(s)·DvL(x∗(s), ẋ∗(s)) = −H(p(s), x∗(s)),
as we needed.
Lemma 109. The value function V is Lipschitz.
Proof. Let t < t1 be fixed and x, y arbitrary. We suppose first that t1 − t < 1. Then
V(y, t) − V(x, t) ≤ J(y, t; u∗) − V(x, t),
where u∗ is such that V(x, t) = J(x, t; u∗). Therefore, there exists a constant C, depending only on the Lipschitz constant of ψ and on the supremum of |DxL|, such that
V(y, t) − V(x, t) ≤ C|x − y|.
Suppose now that t1 − t > 1. Let
u(s) = u∗(s) + (x − y) if t < s < t + 1,
u(s) = u∗(s) if t + 1 ≤ s ≤ t1.
Then
V(y, t) − V(x, t) ≤ J(y, t; u) − V(x, t) ≤ C|x − y|,
where the constant C depends only on DxL and DvL, and not on the Lipschitz constant of ψ. Reversing the roles of x and y, we conclude
|V(y, t) − V(x, t)| ≤ C|x − y|.
To address the regularity in t, take t̃ ≠ t; without loss of generality we can suppose that t̃ < t. Let x∗ be the optimal trajectory with x∗(t̃) = x. Note that
|V(x, t̃) − V(x∗(t), t)| ≤ C|t − t̃|.
To prove that V is Lipschitz in t it is then enough to check that
(122) |V(x∗(t), t) − V(x, t)| ≤ C|t − t̃|.
But, since ẋ∗ is uniformly bounded,
|x∗(t) − x| ≤ C|t − t̃|;
thus, the previous Lipschitz estimate implies (122).
Lemma 110. V is differentiable almost everywhere.
Proof. Since V is Lipschitz, the almost everywhere differentiability follows from Rademacher's theorem.
In general, the value function is Lipschitz but not C1 or C2. However, we can prove a one-sided estimate for second derivatives, i.e., that V is semiconcave.
Lemma 111. Suppose that |D2xvL|, |D2vvL| ≤ C(R) whenever |v| ≤ R. Then, for each t < t1, V(x, t) is semiconcave in x.
Proof. Fix t and x, and choose y ∈ Rn arbitrary. We claim that
V(x + y, t) + V(x − y, t) ≤ 2V(x, t) + C|y|²,
for some constant C. Clearly,
V(x + y, t) + V(x − y, t) − 2V(x, t) ≤ ∫_t^{t1} [L(x∗ + y, ẋ∗ + ẏ) + L(x∗ − y, ẋ∗ − ẏ) − 2L(x∗, ẋ∗)] ds,
where
y(s) = y (t1 − s)/(t1 − t).
Since |D2xxL| ≤ C1(R),
L(x∗ + y, ẋ∗ + ẏ) ≤ L(x∗, ẋ∗ + ẏ) + DxL(x∗, ẋ∗ + ẏ)·y + C|y|²,
and similarly for the other term. We also have
L(x∗, ẋ∗ + ẏ) + L(x∗, ẋ∗ − ẏ) ≤ 2L(x∗, ẋ∗) + C|ẏ|² + C|y||ẏ|.
Thus
L(x∗ + y, ẋ∗ + ẏ) + L(x∗ − y, ẋ∗ − ẏ) ≤ 2L(x∗, ẋ∗) + C|y|² + C|ẏ|².
Since |ẏ| = |y|/(t1 − t), this inequality implies the lemma.
Lemma 112. We have
(p(s), H(p(s),x∗(s))) ∈ D+V (x∗(s), s)
for t ≤ s < t1. Therefore DV (x∗(s), s) exists for t < s < t1.
Proof. Let u∗ be an optimal control at (x, s), and let p be the corresponding adjoint variable. Define W by
W(y, r) = J(y, r; u∗ + (x∗(r) − y)/(t1 − r)) − V(x, s).
Hence, for each y ∈ Rn and t ≤ r < t1,
V(y, r) − V(x, s) ≤ W(y, r),
with equality at (y, r) = (x, s). Since W is C1, it is enough to check that
DyW(x∗(s), s) = p(s)
and
DrW(x∗(s), s) = H(p(s), x∗(s)).
The first identity follows from
DyW(x∗(s), s) = ∫_s^{t1} [DxL ϕ + DvL dϕ/dτ] dτ,
where ϕ(τ) = (t1 − τ)/(t1 − s). Using the Euler-Lagrange equation
d/dt DvL − DxL = 0
and integration by parts, we obtain
DyW(x∗(s), s) = −DvL(x∗(s), ẋ∗(s)) = p(s).
On the other hand,
DrW(x∗(s), s) = −L(x∗(s), ẋ∗(s)) + ∫_s^{t1} [DxL φ + DvL dφ/dτ] dτ,
where
φ(τ) = (τ − t1)/(t1 − s) ẋ∗(s).
Using again the Euler-Lagrange equation and integration by parts, we obtain
DrW(x∗(s), s) = −L(x∗(s), ẋ∗(s)) + DvL(x∗(s), ẋ∗(s))·ẋ∗(s),
or equivalently
DrW(x∗(s), s) = H(p(s), x∗(s)).
The last part of the lemma follows from proposition 100.
This ends the proof of the theorem.
In what follows we prove that the value function is differentiable at points where the optimal trajectory is unique.
A point (x, t) is regular if there exists a unique optimal trajectory x∗(s) such that x∗(t) = x and
V(x, t) = ∫_t^{t1} L(x∗(s), ẋ∗(s)) ds + ψ(x∗(t1)).
Theorem 113. V is differentiable with respect to x at (x, t) if and only
if (x, t) is a regular point.
Proof. The next lemma shows that differentiability at a point (x, t) implies that (x, t) is a regular point:
Lemma 114. If V is differentiable with respect to x at a point (x, t), then there exists a unique optimal trajectory.
Proof. Since V is differentiable with respect to x at (x, t), any optimal trajectory satisfies
ẋ∗(t) = −DpH(p(t), x∗(t)),
with p(t) = DxV(x, t). Therefore, once DxV(x, t) is given, the velocity ẋ∗(t) is uniquely determined. The solution of the Euler-Lagrange equation (119) is determined by the initial position and velocity, x∗(t) and ẋ∗(t). Thus, the optimal trajectory is unique.
To prove the other implication we need an auxiliary lemma:
Lemma 115. Let p be such that
‖DxV(·, t) − p‖L∞(B(x,2ε)) → 0
as ε → 0. Then V is differentiable with respect to x at (x, t) and DxV(x, t) = p.
Proof. Since V is Lipschitz, it is differentiable almost everywhere. By Fubini's theorem, for almost every direction k (with respect to the surface measure on Sn−1), V is differentiable at y = x + λk for almost every λ with respect to the Lebesgue measure in R. For these directions,
(V(y, t) − V(x, t) − p·(y − x))/|x − y| = ∫_0^1 (DxV(x + s(y − x), t) − p)·(y − x)/|x − y| ds.
Suppose 0 < |x − y| < ε. Then
|V(x, t) − V(y, t) − p·(x − y)|/|x − y| ≤ ‖DxV(·, t) − p‖L∞(B(x,ε)).
In principle, the last inequality only holds almost everywhere. However, for y ≠ x, the left-hand side is continuous in y; consequently, the inequality holds for all y ≠ x. Therefore, when y → x,
|V(x, t) − V(y, t) − p·(x − y)|/|x − y| → 0,
which implies DxV(x, t) = p.
Suppose that V is not differentiable at (x, t). We claim that (x, t) is not regular. Indeed, if V fails to be differentiable, the previous lemma implies that for each p,
‖DxV(·, t) − p‖L∞(B(x,ε)) ↛ 0.
Thus, we can choose two sequences x¹n and x²n such that xⁱn → x but whose corresponding optimal trajectories xⁱn satisfy
lim ẋ¹n(t) ≠ lim ẋ²n(t).
This shows that (x, t) is not regular. Indeed, if (x, t) were regular, xn were any sequence converging to x, and x∗n(·) the corresponding optimal trajectories, then
ẋ∗n(t) → ẋ∗(t).
If this were not true, by the Arzelà-Ascoli theorem, we could extract a convergent subsequence x∗nk(·) → y(·) for which
ẋ∗nk(t) → v ≠ ẋ∗(t).
Let y(·) be the solution to the Euler-Lagrange equation with initial conditions y(t) = x and ẏ(t) = v. Note that x∗nk(·) → y(·) and ẋ∗nk(·) → ẏ(·), uniformly on compact sets, and, therefore,
V(x, t) = lim_{k→∞} V(xnk, t) = lim_{k→∞} J(xnk, t; u∗nk) = J(x, t; ẏ) > J(x, t; ẋ∗) = V(x, t),
since, by regularity, the trajectory y cannot be optimal, which is a contradiction.
Remark. This theorem implies that all points of the form (x∗(s), s), in which x∗ is an optimal trajectory, are regular for t < s < t1.
Exercise 166. Show that, in the optimal control "bounded control space" setting, the value function is Lipschitz continuous if the terminal cost is Lipschitz continuous.
Exercise 167. In the optimal control ”bounded control space” setting,
show that if ψ is Lipschitz, for any (x, t) there exists p such that
(p(s), H(p(s),x∗(s))) ∈ D−V (x∗(s), s)
for t < s ≤ t1 and
(p(s), H(p(s),x∗(s))) ∈ D+V (x∗(s), s)
for t ≤ s < t1.
9. Viscosity solutions
In this section we discuss the viscosity solutions in the calculus of
variations setting. Since, however with small modifications our results
hold for the bounded control setting, we have added exercises in which
the reader is required to prove the analogous results.
Theorem 116. Consider the calculus of variations setting for the opti-
mal control problem. Suppose that the value function V is differentiable
at (x, t). Then, at this point, V satisfies the Hamilton-Jacobi equation
(123) − Vt +H(DxV, x, t) = 0.
Proof. If V is differentiable at (x, t) then the result follows by
using statement 6 in theorem 103.
Exercise 168. Show that (123) also holds in the ”bounded control
case” setting. Hint: use exercises 166 and 167.
Corollary 117. Consider the calculus of variations setting for the op-
timal control problem. The value function V satisfies the Hamilton-
Jacobi equation almost everywhere.
Proof. Since, by theorem 103, the value function V is differentiable almost everywhere, theorem 116 implies this result.
Exercise 169. Show that the previous corollary also holds in the ”bounded
control case” setting.
However, it is not true that a Lipschitz function satisfying the
Hamilton-Jacobi equation almost everywhere is the value function of
the terminal value problem, as shown in the next example.
Example 43. Consider the Hamilton-Jacobi equation
−Vt + |DxV|² = 0
with terminal data V(x, 1) = 0. The value function is V ≡ 0, which is a (smooth) solution of the Hamilton-Jacobi equation. However, there are other solutions, for instance,
V(x, t) = 0 if |x| ≥ 1 − t,
V(x, t) = |x| − 1 + t if |x| < 1 − t,
which satisfies the same terminal condition at t = 1 and solves the equation almost everywhere. J
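This example is easy to check numerically (a sketch of ours; the grid step and sample points are arbitrary choices): away from its kinks the second function satisfies the equation, but at the kink x = 0 the pair (p, q) = (0, 1) belongs to D−V and violates the supersolution inequality −q + p² ≥ 0, so the almost-everywhere solution is not a viscosity solution.

```python
def V(x, t):
    # the almost-everywhere solution from Example 43
    return 0.0 if abs(x) >= 1 - t else abs(x) - 1 + t

h = 1e-6

def residual(x, t):
    # -Vt + (Vx)^2 by centered differences (valid at smooth points)
    Vt = (V(x, t + h) - V(x, t - h)) / (2 * h)
    Vx = (V(x + h, t) - V(x - h, t)) / (2 * h)
    return -Vt + Vx ** 2

# smooth points: one inside the cone |x| < 1 - t, one outside
print(abs(residual(0.3, 0.2)))   # ~ 0
print(abs(residual(0.9, 0.2)))   # ~ 0

# at the kink x = 0, V(., t) has a local minimum, so
# (p, q) = (0, 1) lies in D^-V(0, t); the supersolution
# inequality -q + p^2 >= 0 fails, since the value is -1
p, q = 0.0, 1.0
print(-q + p ** 2)               # -1.0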
A bounded uniformly continuous function V is a viscosity subsolution (resp. supersolution) of the Hamilton-Jacobi equation (123) if, for any C1 function φ and any interior point (x, t) ∈ argmax(V − φ) (resp. argmin),
−Dtφ + H(Dxφ, x, t) ≤ 0
(resp. ≥ 0) at (x, t). A bounded uniformly continuous function V is a viscosity solution of the Hamilton-Jacobi equation if it is both a sub and a supersolution.
The value function is a viscosity solution of (123), although it may not be a classical solution. The motivation behind the definition of viscosity solution is the following: if V is differentiable and (x, t) ∈ argmax(V − φ) (or argmin), then DxV = Dxφ and DtV = Dtφ; therefore we should have both inequalities. The specific choice of inequalities is related to the following parabolic approximation of the Hamilton-Jacobi equation:
(124) −Dtuε + H(Dxuε, x, t) = ε∆uε.
This equation arises naturally in optimal stochastic control (see [FS93]). The limit ε → 0 corresponds to the case in which the diffusion coefficient vanishes.
Proposition 118. Let uε be a family of solutions of (124) such that, as
ε → 0, the sequence uε → u uniformly. Then u is a viscosity solution
of (123).
Proof. Suppose that φ(x, t) is a C2 function such that u− φ has
a strict local maximum at (x, t). We must show that
−Dtφ+H(Dxφ, x, t) ≤ 0.
By hypothesis, uε → u uniformly. Therefore we can find sequences
(xε, tε) → (x, t) such that uε − φ has a local maximum at (xε, tε).
Therefore,
Duε(xε, tε) = Dφ(xε, tε)
and
∆uε(xε, tε) ≤ ∆φ(xε, tε).
Consequently,
−Dtφ(xε, tε) +H(Dxφ(xε, tε), xε, tε) ≤ ε∆φ(xε, tε).
It is therefore enough to take ε→ 0 to end the proof.
A useful characterization of viscosity solutions is given in the next proposition:
Proposition 119. Let V be a bounded uniformly continuous function. Then V is a viscosity subsolution of (123) if and only if, for each (p, q) ∈ D+V(x, t),
−q + H(p, x, t) ≤ 0.
Similarly, V is a viscosity supersolution if and only if, for each (p, q) ∈ D−V(x, t),
−q + H(p, x, t) ≥ 0.
Example 44. In example 43 we found two different solutions to the equation
−Vt + |DxV|² = 0
satisfying the same terminal data. It is easy to check that the value function V ≡ 0 is a viscosity solution: it is smooth, satisfies the equation, and satisfies the terminal condition. The other solution, which solves the equation only almost everywhere, is not a viscosity solution (check!).
Now we will show that the definition of viscosity solution is consistent with classical solutions.
Proposition 120. A differentiable viscosity solution of (123) is a classical solution.
Proof. If V is differentiable, then
D+V = D−V = {(DxV, DtV)}.
Since V is a viscosity solution, we obtain immediately
−DtV + H(DxV, x, t) ≤ 0 and −DtV + H(DxV, x, t) ≥ 0,
therefore −DtV + H(DxV, x, t) = 0.
Theorem 121. Let uα be the value function of the infinite horizon
discounted cost problem (??). Then uα is a viscosity solution to
αuα +H(Duα, x) = 0.
Similarly, let V be a solution to the initial value problem (??). Then
V is a viscosity solution of
Vt +H(DxV, x) = 0.
Proof. We present the proof only for the discounted cost infinite horizon problem, as the other case is similar; we refer the reader to [Eva98a], for instance. Let ϕ : Td → R be a C∞ function, and let x0 ∈ argmin(uα − ϕ). By adding a suitable constant to ϕ, we may assume that u(x0) − ϕ(x0) = 0 and u(x) − ϕ(x) ≥ 0 at all other points.
We must show that
αϕ(x0) + H(Dxϕ(x0), x0) ≥ 0,
that is, that there exists v ∈ Rd such that
αϕ(x0) + v·Dxϕ(x0) − L(x0, v) ≥ 0.
By contradiction, assume that there exists θ > 0 such that
αϕ(x0) + v·Dxϕ(x0) − L(x0, v) < −θ
for all v. Because the mapping v ↦ L(x, v) is superlinear and ϕ is C1, there exist R > 0 and r1 > 0 such that, for all x ∈ Br1(x0) and all v ∈ BcR(0) = Rd \ BR(0), we have
αϕ(x) + v·Dxϕ(x) − L(x, v) < −θ/2.
By continuity, for some 0 < r < r1 and all x ∈ Br(x0) we have
αϕ(x) + v·Dxϕ(x) − L(x, v) < −θ/2
for all v ∈ BR(0).
Therefore, for any trajectory x with x(0) = x0, and any T ≥ 0 such that the trajectory x stays near x0 on [−T, 0], i.e., x(t) ∈ Br(x0) for t ∈ [−T, 0], we have
e^{−αT} u(x(−T)) − u(x0) ≥ e^{−αT} ϕ(x(−T)) − ϕ(x0)
= −∫_{−T}^0 e^{αt} (αϕ(x(t)) + ẋ(t)·Dxϕ(x(t))) dt
≥ (θ/2) ∫_{−T}^0 e^{αt} dt − ∫_{−T}^0 e^{αt} L(x, ẋ) dt.
This yields
u(x0) ≤ −(θ/2) ∫_{−T}^0 e^{αt} dt + ∫_{−T}^0 e^{αt} L(x, ẋ) dt + e^{−αT} u(x(−T)).
Since the infimum in (??) is, in fact, a minimum, we can choose a time interval [−T∗, 0] and a trajectory x∗ that minimizes (??):
u(x0) = ∫_{−T∗}^0 e^{αt} L(x∗, ẋ∗) dt + e^{−αT∗} u(x∗(−T∗)).
A minimizing trajectory on [−T∗, 0] also minimizes on any subinterval: for any T ∈ (0, T∗) we have
u(x0) = ∫_{−T}^0 e^{αt} L(x∗, ẋ∗) dt + e^{−αT} u(x∗(−T)).
Taking T small enough, we can ensure that x∗ stays near x0 on [−T, 0]. This yields a contradiction.
Now consider x0 ∈ argmax(uα − ϕ). Again, by adding a suitable constant to ϕ, we may assume that u(x0) − ϕ(x0) = 0 and u(x) − ϕ(x) ≤ 0 at all other points.
We must show that
αϕ(x0) + H(Dxϕ(x0), x0) ≤ 0,
that is, that for all v ∈ Rd we have
αϕ(x0) + v·Dxϕ(x0) − L(x0, v) ≤ 0.
By contradiction, assume that there exists θ > 0 such that, for some v,
αϕ(x0) + v·Dxϕ(x0) − L(x0, v) > θ.
By continuity, for some r > 0 and all x ∈ Br(x0) we have
αϕ(x) + v·Dxϕ(x) − L(x, v) > θ/2.
The trajectory x, with x(0) = x0 and ẋ = v, stays near x0 for t ∈ [−T, 0], provided T > 0 is sufficiently small. Therefore
e^{−αT} u(x(−T)) − u(x0) ≤ e^{−αT} ϕ(x(−T)) − ϕ(x0)
= −∫_{−T}^0 e^{αt} (αϕ(x(t)) + ẋ(t)·Dxϕ(x(t))) dt
≤ −(θ/2) ∫_{−T}^0 e^{αt} dt − ∫_{−T}^0 e^{αt} L(x, ẋ) dt.
This yields
u(x0) ≥ (θ/2) ∫_{−T}^0 e^{αt} dt + ∫_{−T}^0 e^{αt} L(x, ẋ) dt + e^{−αT} u(x(−T)).
But since, by (??),
u(x0) ≤ ∫_{−T}^0 e^{αt} L(x, ẋ) dt + e^{−αT} u(x(−T)),
this yields the contradiction (θ/2)(1 − e^{−αT})/α ≤ 0 with T > 0.
Exercise 170. Show that the function V(x, t) given by the Lax-Hopf formula is Lipschitz in x for each t < t1, regardless of the smoothness of the terminal data (note, however, that the Lipschitz constant depends on t).
Exercise 171. Use the Lax-Hopf formula to determine the viscosity solution of
−ut + (ux)² = 0,
for t < 0 and u(x, 0) = ±x² − 2x.
Exercise 172. Use the Lax-Hopf formula to determine the viscosity solution of
−ut + (ux)² = 0,
for t < 0 and
u(x, 0) = 0 if x < 0,  u(x, 0) = x² if 0 ≤ x ≤ 1,  u(x, 0) = 2x − 1 if x > 1.
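For these exercises, with H(p) = p² and hence L(v) = v²/4, the Lax-Hopf formula takes the form u(x, t) = min_y [(y − x)²/(4(−t)) + u(y, 0)] for t < 0. The sketch below (our own illustration; the data g(y) = y² is our choice, made so that the minimization has the closed form u(x, t) = x²/(1 − 4t)) evaluates the formula by brute-force minimization:

```python
import math

def lax_hopf(x, t, g, ylo=-50.0, yhi=50.0, n=200001):
    # u(x, t) = min_y [ (y - x)^2 / (4 (-t)) + g(y) ],  t < 0,
    # approximated by minimization over a fine grid in y
    best = math.inf
    for i in range(n):
        y = ylo + (yhi - ylo) * i / (n - 1)
        best = min(best, (y - x) ** 2 / (4 * (-t)) + g(y))
    return best

g = lambda y: y * y             # assumed terminal data (illustration)
x, t = 1.5, -0.5
u_num = lax_hopf(x, t, g)
u_exact = x * x / (1 - 4 * t)   # closed-form value for this g
print(abs(u_num - u_exact))     # small (limited by grid resolution)
```

One can also check, by finite differences, that the resulting function satisfies −ut + (ux)² = 0 at smooth points.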
To establish uniqueness of viscosity solutions we need the following
lemma:
Lemma 122. Let V be a viscosity solution of
Vt + H(DxV, x) = 0
in [−T, 0] × Rn, and let φ be a C1 function. If V − φ has a maximum (resp. minimum) at (x0, t0) ∈ Rn × (−T, 0], then
(125) φt(x0, t0) + H(Dxφ(x0, t0), x0) ≤ 0 (resp. ≥ 0) at (x0, t0).
Remark: The important point is that the inequality is valid even at some non-interior points (t0 = 0).
Proof. Only the case t0 = 0 requires proof, since in the other case the maximum is interior, and then the viscosity property (the definition of viscosity solution) yields the inequality. Consider
φ̃ = φ − ε/t.
Then V − φ̃ has an interior local maximum at some (xε, tε) with tε < 0. Furthermore, (xε, tε) → (x0, 0) as ε → 0. At the point (xε, tε) we have
φt(xε, tε) + ε/tε² + H(Dxφ(xε, tε), xε) ≤ 0,
that is, since ε/tε² ≥ 0, letting ε → 0,
φt(x0, 0) + H(Dxφ(x0, 0), x0) ≤ 0.
Analogously, we obtain the opposite inequality using φ̃ = φ + ε/t.
Finally we establish uniqueness of viscosity solutions:
Theorem 123 (Uniqueness). Suppose H satisfies
|H(p, x) − H(q, x)| ≤ C(|p| + |q|)|p − q|,
|H(p, x) − H(p, y)| ≤ C|x − y|(C + H(p, x)).
Then the value function is the unique viscosity solution of the Hamilton-Jacobi equation
Vt + H(DxV, x) = 0
in [−T, 0] × Rn that satisfies the condition V(x, −T) = ψ(x).
Proof. Let V and Ṽ be two viscosity solutions with
sup_{−T ≤ t ≤ 0} (V − Ṽ) = σ > 0.
For 0 < ε, λ < 1 we define
ψ(x, y, t, s) = V(x, t) − Ṽ(y, s) − λ(t + s + 2T) − (1/ε²)(|x − y|² + |t − s|²) − ε(|x|² + |y|²).
When ε, λ are sufficiently small we have
max ψ(x, y, t, s) = ψ(xε,λ, yε,λ, tε,λ, sε,λ) > σ/2.
Since ψ(xε,λ, yε,λ, tε,λ, sε,λ) ≥ ψ(0, 0, −T, −T), and both V and Ṽ are bounded, we have
|xε,λ − yε,λ|² + |tε,λ − sε,λ|² ≤ Cε²
and
ε(|xε,λ|² + |yε,λ|²) ≤ C.
From these estimates and the fact that V and Ṽ are continuous, it then follows that
(|xε,λ − yε,λ|² + |tε,λ − sε,λ|²)/ε² = o(1)
as ε → 0.
Denote by ω and ω̃ the moduli of continuity of V and Ṽ. Then, since V(·, −T) = Ṽ(·, −T),
σ/2 ≤ V(xε,λ, tε,λ) − Ṽ(yε,λ, sε,λ)
= V(xε,λ, tε,λ) − V(xε,λ, −T) + V(xε,λ, −T) − Ṽ(xε,λ, −T) + Ṽ(xε,λ, −T) − Ṽ(xε,λ, sε,λ) + Ṽ(xε,λ, sε,λ) − Ṽ(yε,λ, sε,λ)
≤ ω(T + tε,λ) + ω̃(T + sε,λ) + ω̃(o(ε)).
Therefore, if ε is sufficiently small, T + tε,λ > µ > 0, uniformly in ε.
Let φ be given by
φ(x, t) = Ṽ(yε,λ, sε,λ) + λ(2T + t + sε,λ) + (1/ε²)(|x − yε,λ|² + |t − sε,λ|²) + ε(|x|² + |yε,λ|²).
Then the difference
V(x, t) − φ(x, t)
achieves a maximum at (xε,λ, tε,λ).
Similarly, for φ̄ given by
φ̄(y, s) = V(xε,λ, tε,λ) − λ(2T + tε,λ + s) − (1/ε²)(|xε,λ − y|² + |tε,λ − s|²) − ε(|xε,λ|² + |y|²),
the difference
Ṽ(y, s) − φ̄(y, s)
has a minimum at (yε,λ, sε,λ).
Therefore, by lemma 122,
φt(xε,λ, tε,λ) + H(Dxφ(xε,λ, tε,λ), xε,λ) ≤ 0
and
φ̄s(yε,λ, sε,λ) + H(Dyφ̄(yε,λ, sε,λ), yε,λ) ≥ 0.
Simplifying, we have
(126) λ + 2(tε,λ − sε,λ)/ε² + H(2(xε,λ − yε,λ)/ε² + 2εxε,λ, xε,λ) ≤ 0
and
(127) −λ + 2(tε,λ − sε,λ)/ε² + H(2(xε,λ − yε,λ)/ε² − 2εyε,λ, yε,λ) ≥ 0.
From (126) we gather that
(128) H(2(xε,λ − yε,λ)/ε² + 2εxε,λ, xε,λ) ≤ −λ + o(1)/ε.
Subtracting (126) from (127), we have
2λ ≤ H(2(xε,λ − yε,λ)/ε² − 2εyε,λ, yε,λ) − H(2(xε,λ − yε,λ)/ε² + 2εxε,λ, xε,λ)
≤ H(2(xε,λ − yε,λ)/ε² − 2εyε,λ, yε,λ) − H(2(xε,λ − yε,λ)/ε² − 2εyε,λ, xε,λ)
+ H(2(xε,λ − yε,λ)/ε² − 2εyε,λ, xε,λ) − H(2(xε,λ − yε,λ)/ε² + 2εxε,λ, xε,λ)
≤ (C + C H(2(xε,λ − yε,λ)/ε² + 2εxε,λ, xε,λ)) |xε,λ − yε,λ|
+ Cε (|2(xε,λ − yε,λ)/ε² + 2εxε,λ| + |2(xε,λ − yε,λ)/ε² − 2εyε,λ|) |xε,λ − yε,λ|
≤ (o(1)/ε + C)(|xε,λ − yε,λ| + |tε,λ − sε,λ|).
Since |xε,λ − yε,λ| + |tε,λ − sε,λ| = o(ε), the right-hand side tends to 0 as ε → 0, which contradicts λ > 0.
10. Stationary problems
In this section we consider stationary optimal control problems. These problems arise in steady-state control and also in the infinite horizon discounted cost problem. We work in the calculus of variations setting; however, similar results hold for the bounded control setting.
We define the discounted cost functional Jα, with discount rate α > 0, as
Jα(x; u) = ∫_0^∞ L(x(s), ẋ(s)) e^{−αs} ds.
In this case, the trajectories x(·) satisfy the differential equation
ẋ = u,
with the initial condition x(0) = x.
As before, the value function uα is given by
uα(x) = inf Jα(x; u),
where the infimum is taken over all controls u ∈ L∞loc.
The dynamic programming principle in this case is:
Proposition 124. For each t > 0,
uα(x) = inf_{x(0)=x} [∫_0^t L(x(s), ẋ(s)) e^{−αs} ds + e^{−αt} uα(x(t))].
Proof. Observe that
uα(x) = inf_{x(0)=x} [∫_0^t L(x(s), ẋ(s)) e^{−αs} ds + e^{−αt} ∫_t^∞ L(x(s), ẋ(s)) e^{−α(s−t)} ds]
≥ inf_{x(0)=x} [∫_0^t L(x(s), ẋ(s)) e^{−αs} ds + e^{−αt} uα(x(t))].
The other inequality is left as an exercise:
Exercise 173. Show that
uα(x) ≤ inf_{x(0)=x} [∫_0^t L(x(s), ẋ(s)) e^{−αs} ds + e^{−αt} uα(x(t))].
Because of the dynamic programming principle, it is clear that
V(x, t) = e^{−αt} uα(x)
is a viscosity solution of
−Vt + e^{−αt} H(e^{αt} DxV, x) = 0.
This then implies:
Corollary 125. uα is a viscosity solution of
αuα + H(Dxuα, x) = 0.
Furthermore
Corollary 126. If uα is differentiable, then it is a classical solution of
(129) H(Dxuα, x) + αuα = 0.
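As an illustration of (129) (our own example, with an assumed quadratic Lagrangian, not taken from the text): for L(x, v) = v²/2 + x²/2 on R, the convention H(p, x) = sup_v (−v·p − L(x, v)) gives H(p, x) = p²/2 − x²/2, and the ansatz uα(x) = c x² solves (129) exactly when 2c² + αc − 1/2 = 0. A short script checks this numerically:

```python
import math

alpha = 0.5
# positive root of 2c^2 + alpha*c - 1/2 = 0
c = (-alpha + math.sqrt(alpha ** 2 + 4)) / 4

def u(x):            # candidate value function u_alpha(x) = c x^2
    return c * x * x

def H(p, x):         # H(p, x) = sup_v (-v p - L(x, v)) for L = v^2/2 + x^2/2
    return p * p / 2 - x * x / 2

# residual of alpha * u + H(Du, x) = 0 at sample points
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    Du = 2 * c * x
    print(abs(alpha * u(x) + H(Du, x)))   # ~ 0
```

The residual vanishes identically in x, since the coefficient c was chosen to cancel it.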
Exercise 174. Show that the optimal trajectories for the discounted cost infinite horizon problem are solutions to the (negatively damped) Euler-Lagrange equation
(130) d/dt (∂L/∂ẋ) − α ∂L/∂ẋ − ∂L/∂x = 0.
If x(t) satisfies (130), the energy H may not be conserved:
Example 45. Let L(x, v) = v²/2 + cos x. Then (130) reads
ẍ − αẋ + sin x = 0.
When α = 0, the energy
H = ẋ²/2 − cos x
is constant in time, but for α > 0 we have
dH/dt = αẋ².
Therefore, the energy increases in time unless ẋ = 0. J
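The energy growth in this example is easy to observe numerically (a sketch of ours; the initial data and step size are arbitrary choices): integrate ẍ − αẋ + sin x = 0 and compare the energy gain H(T) − H(0) with the dissipation integral of αẋ².

```python
import math

alpha, dt, steps = 0.1, 1e-3, 5000

def f(y):
    x, v = y
    return (v, alpha * v - math.sin(x))   # x'' = alpha x' - sin x

def rk4(y):
    # one classical Runge-Kutta step
    k1 = f(y)
    k2 = f((y[0] + dt/2*k1[0], y[1] + dt/2*k1[1]))
    k3 = f((y[0] + dt/2*k2[0], y[1] + dt/2*k2[1]))
    k4 = f((y[0] + dt*k3[0], y[1] + dt*k3[1]))
    return (y[0] + dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            y[1] + dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

H = lambda y: y[1] ** 2 / 2 - math.cos(y[0])

y = (1.0, 0.0)
H0, work = H(y), 0.0
for _ in range(steps):
    y_new = rk4(y)
    # trapezoid rule for the dissipation integral of alpha * v^2
    work += dt * alpha * (y[1] ** 2 + y_new[1] ** 2) / 2
    y = y_new

print(H(y) - H0 > 0)                  # True: the energy increases
print(abs((H(y) - H0) - work) < 1e-5) # True: dH/dt = alpha * (x')^2
```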
Proposition 127. Suppose that x(t) satisfies (130). Then
dH/dt = αDvL(x(t), ẋ(t))·ẋ(t).
Proof. With
p(t) = −DvL(x(t), ẋ(t)),
we have
dH/dt = DpH·ṗ + DxH·ẋ = ẋ·(αDvL + DxL) − DxL·ẋ = αDvL·ẋ.
We assume now that H is Zd-periodic in x. We will show that, as α → 0, the solution uα converges (up to constants) to a solution of
(131) H(Dxu, x) = H̄,
for some constant H̄.
Theorem 128. Let uα be a viscosity solution to
αuα + H(Duα, x) = 0.
Then αuα is uniformly bounded and uα is Lipschitz, uniformly in α.
Proof. First, let xM be a point where uα attains a global maximum, and xm a point of global minimum. Then, by the viscosity property, i.e., the definition of viscosity solution, we have
αuα(xM) + H(0, xM) ≤ 0,  αuα(xm) + H(0, xm) ≥ 0,
which yields that αuα is uniformly bounded.
Now we establish the Lipschitz bound. Observe that uα is Lipschitz if there exists M > 0 such that
uα(x) − uα(y) ≤ M|x − y|
for all x, y. By contradiction, assume that for every M > 0 there exist x and y such that
uα(x) − uα(y) > M|x − y|.
Let ϕ(x) = uα(y) + M|x − y|. Then uα − ϕ has a maximum at some point x̄ ≠ y. Therefore,
αuα(x̄) + H(M (x̄ − y)/|x̄ − y|, x̄) ≤ 0,
which, by the coercivity of H, yields a contradiction if M is sufficiently large.
Example 46. We can also use calculus of variations methods directly to show that there exists C, independent of α, such that
uα ≤ C/α.
Indeed, since L(x, 0) is bounded,
uα(x) ≤ Jα(x; 0) ≤ ∫_0^∞ L(x, 0) e^{−αs} ds ≤ C/α. J
Theorem 129 (Stability theorem for viscosity solutions). Assume that, for each α > 0, the function uα is a viscosity solution of Hα(uα, Duα, x) = 0. Let Hα → H uniformly on compact sets, and uα → u uniformly. Then u is a viscosity solution of H(u, Du, x) = 0.
Proof. Suppose u − ϕ has a strict local maximum (resp. minimum) at a point x0. Then there exists a sequence xα → x0 such that uα − ϕ has a local maximum (resp. minimum) at xα. Then
Hα(uα(xα), Dϕ(xα), xα) ≤ 0 (resp. ≥ 0).
Letting α → 0 finishes the proof.
As demonstrated in the context of homogenization of Hamilton-Jacobi equations, in the classic but unpublished paper by Lions, Papanicolaou and Varadhan [Lio82], it is possible to construct, using the previous result, viscosity solutions to the stationary Hamilton-Jacobi equation
(132) H(Du, x) = H̄.
Theorem 130 (Lions, Papanicolaou, Varadhan). There exist a number H̄ and a function u(x), Zd-periodic in x, that solve (132) in the viscosity sense.
Proof. Since uα − min uα is periodic, equicontinuous, and uniformly bounded, it converges, up to subsequences, to a function u. Moreover, uα ≤ C/α; thus αuα converges uniformly, up to subsequences, to a constant, which we denote by −H̄. Then the stability theorem for viscosity solutions, theorem 129, implies that u is a viscosity solution of
H(Du, x) = H̄.
Theorem 131. Let u : Td → R be a viscosity solution to
H(Du, x) = C.
Then u is Lipschitz, and the Lipschitz constant does not depend on u.
Proof. First observe that, since u = u − 0 attains a maximum and a minimum in Td, we have
min_{x∈Td} H(0, x) ≤ C ≤ max_{x∈Td} H(0, x).
Then it is enough to argue as in the proof of Theorem 128.
Exercise 175. Let u : R → R be continuous and piecewise differentiable (with left and right limits for the derivative at every point). Show that u is a viscosity solution of
H(Dxu, x) = H̄
if:
1. u satisfies the equation almost everywhere;
2. whenever Dxu is discontinuous at x, then Dxu(x−) > Dxu(x+).
Example 47 (One-dimensional pendulum). The Hamiltonian corresponding to a one-dimensional pendulum with unit mass and unit length is
H(p, x) = p²/2 − cos 2πx.
In this case, it is not difficult to determine explicitly the solution to the Hamilton-Jacobi equation
H(P + Dxu, x) = H̄(P),
where P is a real parameter. In fact, for P ∈ R and almost every x ∈ R, the solution u(P, x) satisfies
(P + Dxu)²/2 = H̄(P) + cos 2πx.
Consequently, H̄(P) ≥ 1 and, therefore,
Dxu = −P ± √(2(H̄(P) + cos 2πx)), for a.e. x ∈ R.
Thus
u = ∫_0^x [−P + s(y)√(2(H̄(P) + cos 2πy))] dy + u(0),
where |s(y)| = 1. Since H is convex in p and u is a viscosity solution, the only possible discontinuities of the derivative of u are those that satisfy Dxu(x−) − Dxu(x+) > 0; see exercise 175. Therefore, s can change sign from 1 to −1 at any point, but jumps from −1 to 1 can only happen when
√(2(H̄(P) + cos 2πx)) = 0.
Since we are looking for 1-periodic solutions, there are only two cases to consider. In the first, H̄(P) > 1 and the solution is C1, since √(2(H̄(P) + cos 2πy)) never vanishes. In this case, H̄(P) can be determined from P through the equation
P = ± ∫_0^1 √(2(H̄(P) + cos 2πy)) dy.
It is easy to check that this equation has a unique solution H̄(P) whenever
|P| > ∫_0^1 √(2(1 + cos 2πy)) dy,
that is,
|P| > 4/π.
The second case occurs whenever the last inequality does not hold; then H̄(P) = 1, and s(x) can have discontinuities. In fact, s(x) jumps from −1 to 1 whenever x = 1/2 + k, with k ∈ Z, and there exists a point x0, determined by the equation
∫_0^1 s(y)√(2(1 + cos 2πy)) dy = P,
such that s(x) jumps from 1 to −1 at x0 + k, k ∈ Z. J
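The threshold |P| = 4/π and the map P ↦ H̄(P) in the first case can be verified numerically (our own sketch; the quadrature and bisection parameters are arbitrary choices). Since √(2(1 + cos 2πy)) = 2|cos πy|, the integral at H̄ = 1 equals 4/π, and for |P| > 4/π bisection on the increasing function H̄ ↦ ∫_0^1 √(2(H̄ + cos 2πy)) dy recovers H̄(P):

```python
import math

def G(Hbar, n=20000):
    # midpoint rule for G(Hbar) = int_0^1 sqrt(2 (Hbar + cos 2 pi y)) dy
    s = 0.0
    for i in range(n):
        y = (i + 0.5) / n
        s += math.sqrt(2 * (Hbar + math.cos(2 * math.pi * y)))
    return s / n

# at Hbar = 1 the integrand is 2 |cos(pi y)| and the integral is 4/pi
print(abs(G(1.0) - 4 / math.pi) < 1e-4)   # True

def Hbar_of_P(P, lo=1.0, hi=100.0):
    # bisection on the increasing function G to solve G(Hbar) = |P|
    for _ in range(60):
        mid = (lo + hi) / 2
        if G(mid) < abs(P):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

P = 2.0                       # |P| > 4/pi, so Hbar(P) > 1
Hb = Hbar_of_P(P)
print(Hb > 1)                 # True
print(abs(G(Hb) - P) < 1e-3)  # True
```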
Exercise 176. Let φ : Tn → R be a C1 function, not identically constant. Show that there exist two distinct viscosity solutions of
Dxu · (Dxu − Dxφ) = 0
whose difference is not constant.
5
Duality theory
This chapter is dedicated to the study of duality theory in optimization problems. The main applications we study are infinite dimensional linear programming problems, such as the Monge-Kantorowich and Mather problems.
1. Model problems
In this section we discuss certain minimization problems which involve linear objective functions under linear constraints, that is, infinite dimensional linear programming problems. Surprisingly, there are deep relations between these problems and certain nonlinear partial differential equations.
1.1. Mather problem.
1.1.1. Classical Mather problem. Let Td be the d-dimensional standard torus. Consider a Lagrangian L(x, v), L : Td × Rd → R, smooth in both variables, strictly convex in the velocity v, and coercive, that is,
lim_{|v|→∞} inf_x L(x, v)/|v| = +∞.
The minimal action principle of classical mechanics asserts that the trajectories x(t) of mechanical systems are critical points or minimizers of the action
(133) ∫_0^T L(x, ẋ) ds.
These critical points are then solutions to the Euler-Lagrange equations
(134) d/dt DvL(x, ẋ) − DxL(x, ẋ) = 0.
Mather's problem is a relaxed version of this variational principle, and consists in minimizing the action
(135) ∫_{Td×Rd} L(x, v) dµ(x, v)
over a suitable class of probability measures µ(x, v). Originally, in [Mat91], this minimization was performed over all measures invariant under the Euler-Lagrange equations (134). However, as realized by Mañé [Mn96], it is more convenient to consider a larger class of measures, the holonomic measures. It turns out that both problems are equivalent, as any holonomic minimizing measure is automatically invariant under the Euler-Lagrange equations. In what follows, we will define this class of measures and provide the motivation for it.
Let $x(t)$ be a trajectory on $\mathbb{T}^d$. Define a measure $\mu_x^T$ on $\mathbb{T}^d\times\mathbb{R}^d$ by its action on test functions $\psi\in C_c(\mathbb{T}^d\times\mathbb{R}^d)$, $\psi(x,v)$ (continuous with compact support), as follows:
$$\langle\psi,\mu_x^T\rangle = \frac{1}{T}\int_0^T \psi\big(x(t),\dot x(t)\big)\,dt.$$
If $x(t)$ is globally Lipschitz, the family $\{\mu_x^T\}_{T>0}$ has support contained in a fixed compact set, and is therefore weakly-$*$ compact. Consequently, one can extract a limit measure $\mu_x$ which encodes some of the asymptotic properties of the trajectory $x$.
Let $\varphi\in C^1(\mathbb{T}^d)$. For $\psi(x,v) = v\cdot D\varphi(x)$ we have
$$\langle\psi,\mu_x\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^T \dot x\cdot D\varphi(x)\,dt = \lim_{T\to\infty}\frac{\varphi\big(x(T)\big) - \varphi\big(x(0)\big)}{T} = 0.$$
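This telescoping computation can be checked numerically. The following toy example is our own (not from the text): it takes the constant-speed trajectory $x(t) = \omega t \bmod 1$ on $\mathbb{T}^1$ and confirms that $\langle v\cdot D\varphi, \mu_x^T\rangle = O(1/T)$.

```python
# Illustration (our own construction): the occupation measure of a Lipschitz
# trajectory on the circle annihilates exact derivatives v·Dφ(x), since the
# time average telescopes to (φ(x(T)) − φ(x(0)))/T.
import numpy as np

T = 2000.0
n = 400_000
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]

omega = np.sqrt(2.0)              # constant velocity: irrational rotation on T^1
x = (omega * t) % 1.0             # trajectory x(t)
v = np.full(n, omega)             # velocity dx/dt

phi = lambda x: np.sin(2 * np.pi * x)            # C^1 test function on T^1
Dphi = lambda x: 2 * np.pi * np.cos(2 * np.pi * x)

pairing = np.sum(v * Dphi(x)) * dt / T           # ⟨v·Dφ, μ_x^T⟩ as a Riemann sum
print(abs(pairing))                              # O(1/T): tends to 0 as T grows
```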
Let $\gamma(v)$ be a continuous function, $\gamma:\mathbb{R}^d\to\mathbb{R}$, such that $\inf\frac{\gamma(v)}{1+|v|} > 0$ and $\lim_{|v|\to\infty}\frac{\gamma(v)}{1+|v|} = \infty$. A measure $\mu$ in $\mathbb{T}^d\times\mathbb{R}^d$ is admissible if $\int_{\mathbb{T}^d\times\mathbb{R}^d}\gamma(v)\,d\mu < \infty$. An admissible measure $\mu$ on $\mathbb{T}^d\times\mathbb{R}^d$ is called holonomic if for all $\varphi\in C^1(\mathbb{T}^d)$ we have
$$\int_{\mathbb{T}^d\times\mathbb{R}^d} v\cdot D\varphi\,d\mu = 0. \tag{136}$$
Mather's problem consists in minimizing (135) over all probability measures that satisfy (136). As pointed out before, this formulation of the problem was introduced by Mañé in [Mn96] in his study of Mather's original problem [Mat91].
1.1.2. Stochastic Mather problem. In the framework of stochastic optimal control, one is led to replace deterministic trajectories by stochastic processes. Suppose that $x(t)$ satisfies the stochastic differential equation
$$dx = \nu\,dt + \sigma\,dW,$$
in which $\nu$ is a bounded, progressively measurable process, $\sigma > 0$, and $W$ is an $n$-dimensional Brownian motion. One would like to minimize the average action
$$E\int_0^T L(x,\nu)\,dt.$$
As before, one can associate to these stochastic processes probability measures $\mu$ in $\mathbb{T}^n\times\mathbb{R}^n$ defined by
$$\int_{\mathbb{T}^n\times\mathbb{R}^n}\phi(x,v)\,d\mu = \lim_{T\to\infty}\frac{1}{T}\int_0^T \phi(x(t),\nu(t))\,dt,$$
in which the limit is taken through an appropriate subsequence.
Dynkin's formula is the analog, for stochastic processes, of the fundamental theorem of calculus. Applied to $\varphi(x(t))$, it states that
$$E\left[\varphi(x(T)) - \varphi(x(0))\right] = E\int_0^T \nu\,D_x\varphi(x(t)) + \frac{\sigma^2}{2}\Delta\varphi(x(t))\,dt.$$
This identity implies
$$\int_{\mathbb{T}^n\times\mathbb{R}^n} v\,D_x\varphi(x) + \frac{\sigma^2}{2}\Delta\varphi(x)\,d\mu = 0,$$
for all $\varphi:\mathbb{T}^n\to\mathbb{R}$ of class $C^2$.
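Dynkin's formula can be illustrated by direct simulation. The sketch below is our own toy example (drift $\nu = 0$, an Euler-Maruyama discretization, arbitrary parameter values): both sides of the formula are estimated by Monte-Carlo and compared.

```python
# Monte-Carlo sketch of Dynkin's formula for dx = ν dt + σ dW (here ν = 0):
#   E[φ(x_T) − φ(x_0)] = E ∫_0^T ( ν φ' + σ²/2 φ'' ) dt.
# Our own toy example; the constants are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
sigma, T, n_steps, n_paths = 0.5, 1.0, 1000, 20000
dt = T / n_steps

phi = lambda x: np.cos(2 * np.pi * x)               # periodic test function
ddphi = lambda x: -(2 * np.pi) ** 2 * np.cos(2 * np.pi * x)

x = np.zeros(n_paths)
rhs = 0.0
for _ in range(n_steps):
    rhs += np.mean(sigma**2 / 2 * ddphi(x)) * dt    # accumulate E ∫ σ²/2 φ'' dt
    x += sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

lhs = np.mean(phi(x)) - phi(0.0)                    # E[φ(x_T)] − φ(x_0)
print(abs(lhs - rhs))   # small: Monte-Carlo + discretization error only
```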
The stochastic Mather problem [Gom02a] consists in minimizing
$$\int_{\mathbb{T}^n\times\mathbb{R}^n} L(x,v)\,d\mu,$$
over all probability measures $\mu$ in $\mathbb{T}^n\times\mathbb{R}^n$ that satisfy
$$\int_{\mathbb{T}^n\times\mathbb{R}^n} v\,D_x\varphi(x) + \frac{\sigma^2}{2}\Delta\varphi(x)\,d\mu = 0,$$
for all $\varphi:\mathbb{T}^n\to\mathbb{R}$ of class $C^2$.
1.1.3. Discrete Mather problem. The discrete case, in which the trajectories are replaced by sequences $(x_n,v_n)$ satisfying $x_{n+1} = x_n + v_n$, is also interesting. In this case, if the sequence $v_n$ is globally bounded, for instance, we can associate to it a measure $\mu$ in $\mathbb{T}^n\times\mathbb{R}^n$ through
$$\int_{\mathbb{T}^n\times\mathbb{R}^n}\phi(x,v)\,d\mu = \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N \phi(x_n,v_n),$$
in which the limit is taken through an appropriate subsequence.
Since for all continuous functions $\varphi:\mathbb{T}^n\to\mathbb{R}$ we have
$$\sum_{n=1}^N \varphi(x_n+v_n) - \varphi(x_n) = \varphi(x_{N+1}) - \varphi(x_1),$$
we obtain
$$\int_{\mathbb{T}^n\times\mathbb{R}^n}\left[\varphi(x+v) - \varphi(x)\right]d\mu = 0.$$
We therefore propose the discrete Mather problem, which consists in minimizing
$$\int_{\mathbb{T}^n\times\mathbb{R}^n} L(x,v)\,d\mu,$$
over all probability measures $\mu$ in $\mathbb{T}^n\times\mathbb{R}^n$ that satisfy
$$\int_{\mathbb{T}^n\times\mathbb{R}^n}\left[\varphi(x+v) - \varphi(x)\right]d\mu = 0,$$
for all continuous functions $\varphi:\mathbb{T}^n\to\mathbb{R}$.
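Since the discrete Mather problem is a linear program, small instances can be solved exactly. The sketch below is our own finite-dimensional discretization (`scipy.optimize.linprog` assumed available): $L(x,v) = \frac{v^2}{2} - \cos(2\pi x)$ on a grid of $\mathbb{T}^1$, with velocities chosen so that $x + v$ stays on the grid; the constraint with grid-indicator test functions is exactly a mass-balance condition, and the minimizer concentrates at $(x,v) = (0,0)$, where $L = -1$.

```python
# Our own discretized instance of the discrete Mather problem, solved as an LP.
import numpy as np
from scipy.optimize import linprog

N = 16                        # grid points x_i = i/N on the torus T^1
shifts = [-2, -1, 0, 1, 2]    # velocities v = s/N, so x + v stays on the grid
X, S = np.meshgrid(np.arange(N), np.arange(len(shifts)), indexing="ij")
x = X / N
v = np.array(shifts)[S] / N

L = v**2 / 2 - np.cos(2 * np.pi * x)   # Lagrangian sampled on the grid
c = L.ravel()

# Holonomy constraints with grid indicators φ = e_k:
#   Σ_{i,j} [φ(x_i + v_j) − φ(x_i)] μ_{ij} = 0  for every k.
A = np.zeros((N, N * len(shifts)))
for i in range(N):
    for j, s in enumerate(shifts):
        col = i * len(shifts) + j
        A[(i + s) % N, col] += 1.0     # mass arriving at x_i + v_j
        A[i, col] -= 1.0               # mass leaving x_i

A_eq = np.vstack([A, np.ones(N * len(shifts))])   # plus Σ μ = 1
b_eq = np.concatenate([np.zeros(N), [1.0]])
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.fun)   # ≈ −1: the minimizing measure concentrates at x = 0, v = 0
```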
1.1.4. Generalized Mather problem. To state the generalized Mather problem, we must now make our framework precise. Let $U\subset\mathbb{R}^m$ be a non-empty closed convex set. Assume that, for some $k\ge 0$ (usually $k = 0,1,2$), there exists a linear operator $A_v:C^k(\mathbb{T}^n)\to C(\mathbb{T}^n\times U)$ which satisfies the following two conditions. The first is that for each fixed $\varphi\in C^k(\mathbb{T}^n)$ we have
$$|A_v\varphi| \le C_\varphi(1+|v|),$$
uniformly in $\mathbb{T}^n\times U$, which, of course, if $U$ is bounded simply means that $|A_v\varphi|$ is bounded. The second is that for $\varphi\in C^k(\mathbb{T}^n)$ the mapping $(x,v)\mapsto A_v\varphi$ is continuous in $\mathbb{T}^n\times U$.
We assume that there exists another operator $B$, defined on $C^k(\mathbb{T}^n)$, which satisfies the following compatibility conditions with $A_v$:
$$A_v\kappa = B\kappa \tag{137}$$
for any constant $\kappa\in\mathbb{R}$, and that, for any given probability measure $\nu$ on $\mathbb{T}^n$, there exists a probability measure $\mu_\nu$ in $\mathbb{T}^n\times U$ such that
$$\int_{\mathbb{T}^n\times U} A_v\varphi\,d\mu_\nu = \int_{\mathbb{T}^n} B\varphi\,d\nu, \tag{138}$$
for all $\varphi\in C^k(\mathbb{T}^n)$.
The Lagrangian $L(x,v):\mathbb{T}^n\times U\to\mathbb{R}$ is continuous, convex in $v$, and bounded below. If $U$ is bounded, no further hypotheses are required; if $U$ is unbounded, we assume that, uniformly in $x$,
$$\lim_{|v|\to\infty}\frac{L(x,v)}{|v|} = \infty.$$
The generalized Mather problem consists in minimizing
$$\int_{\mathbb{T}^n\times U} L(x,v)\,d\mu \tag{139}$$
over all probability measures $\mu$ in $\mathbb{T}^n\times U$ that satisfy the constraint
$$\int_{\mathbb{T}^n\times U} A_v\varphi\,d\mu = \int_{\mathbb{T}^n} B\varphi\,d\nu \tag{140}$$
for all functions $\varphi:\mathbb{T}^n\to\mathbb{R}$ with appropriate regularity.
1.2. Monge-Kantorowich problem. The Monge-Kantorowich optimal mass transport problem (see [Eva99] or [Vil03b]) is the following: given two positive measures $\mu^+$ and $\mu^-$ in $\mathbb{R}^n$ which satisfy the mass balance condition
$$\int_{\mathbb{R}^n} d\mu^+ = \int_{\mathbb{R}^n} d\mu^-,$$
one looks for a function $s:\mathbb{R}^n\to\mathbb{R}^n$ which transports $\mu^+$ into $\mu^-$, that is,
$$\int_{\mathbb{R}^n}\varphi(s(x))\,d\mu^+ = \int_{\mathbb{R}^n}\varphi(y)\,d\mu^-$$
for each $\varphi\in C^\infty_c(\mathbb{R}^n)$ (more compactly, we write this condition as $s\#\mu^+ = \mu^-$), and which, furthermore, minimizes the total transport cost
$$\frac{1}{2}\int_{\mathbb{R}^n}|x - s(x)|^2\,d\mu^+(x).$$
Unfortunately, proving directly that such a mapping exists is a hard problem, and we will instead consider a relaxed version of the problem. Obviously, given a mapping $s$ for which $s\#\mu^+ = \mu^-$, we can define a measure $\pi$ in $\mathbb{R}^{2n}$ by
$$\int_{\mathbb{R}^{2n}}\phi(x,y)\,d\pi = \int_{\mathbb{R}^n}\phi(x,s(x))\,d\mu^+.$$
Additionally, the marginals satisfy $\pi|_x = \mu^+$ and $\pi|_y = \mu^-$.
It is therefore natural to consider the relaxed Monge-Kantorowich problem, which consists in minimizing
$$\frac{1}{2}\int_{\mathbb{R}^{2n}}|x-y|^2\,d\pi,$$
where the minimum is taken over all probability measures $\pi$ that satisfy $\pi|_x = \mu^+$ and $\pi|_y = \mu^-$, that is,
$$\int_{\mathbb{R}^{2n}}\varphi(x)\,d\pi = \int_{\mathbb{R}^n}\varphi(x)\,d\mu^+$$
and
$$\int_{\mathbb{R}^{2n}}\psi(y)\,d\pi = \int_{\mathbb{R}^n}\psi(y)\,d\mu^-,$$
for all continuous functions $\varphi$ and $\psi$.
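When $\mu^+$ and $\mu^-$ are supported on finitely many points, the relaxed problem is a finite linear program: minimize $\sum_{ij}\frac12|x_i-y_j|^2\pi_{ij}$ subject to the two marginal constraints. A minimal sketch with our own example data (`scipy` assumed available):

```python
# Discrete relaxed Monge-Kantorowich problem as a linear program (our own data).
import numpy as np
from scipy.optimize import linprog

xs = np.array([0.0, 1.0, 2.0]); mu_plus  = np.array([0.5, 0.3, 0.2])
ys = np.array([0.5, 1.5]);      mu_minus = np.array([0.6, 0.4])

C = 0.5 * (xs[:, None] - ys[None, :]) ** 2    # cost |x − y|² / 2
m, n = C.shape

A_eq, b_eq = [], []
for i in range(m):            # x-marginal: Σ_j π_ij = μ+_i
    row = np.zeros((m, n)); row[i, :] = 1
    A_eq.append(row.ravel()); b_eq.append(mu_plus[i])
for j in range(n):            # y-marginal: Σ_i π_ij = μ−_j
    row = np.zeros((m, n)); row[:, j] = 1
    A_eq.append(row.ravel()); b_eq.append(mu_minus[j])

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
pi = res.x.reshape(m, n)      # the optimal transport plan
print(res.fun)                # prints 0.125: every unit of mass moves distance 1/2
```

Here the optimal plan is the monotone coupling, and since every point's nearest target lies at distance $\frac12$, the optimal cost is exactly $\frac12\cdot(\frac12)^2 = 0.125$.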
Our strategy is to first prove existence of a solution to the relaxed problem, which can be done under quite general assumptions, and only then to prove (whenever possible) that the support of the optimal plan is in fact a graph $(x,s(x))$ and, therefore, that there exists an optimal transport mapping. The next exercise shows that the existence of an optimal transport mapping can in fact fail:

Exercise 177. Let $\mu^+ = \delta_0$ and $\mu^- = \frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_1$. Show that there does not exist a function $s$ which transports $\mu^+$ into $\mu^-$.
2. Some informal computations

2.1. Mather problem. In Mather's problem, both in the deterministic and in the stochastic cases, the constraint
$$\int_{\mathbb{T}^n\times\mathbb{R}^n} v\,D_x\varphi(x) + \frac{\sigma^2}{2}\Delta\varphi(x)\,d\mu = 0$$
($\sigma\ge 0$) is linear in $v$. Additionally, the Lagrangian is strictly convex in $v$. This implies that the minimizing measure has support in a graph $(x,\overline v(x))$. In fact, if the minimizing measure $\mu(x,v)$ were not supported in a graph, we could replace it by another measure $\overline\mu$ given by
$$\int_{\mathbb{T}^n\times\mathbb{R}^n}\phi(x,v)\,d\overline\mu(x,v) = \int_{\mathbb{T}^n}\phi(x,\overline v(x))\,d\theta(x),$$
where, identifying $\mu$ with its density,
$$\overline v(x) = \frac{1}{\theta(x)}\int_{\mathbb{R}^n} v\,\mu(x,v)\,dv$$
and
$$\int_{\mathbb{T}^n}\psi(x)\,d\theta(x) = \int_{\mathbb{T}^n\times\mathbb{R}^n}\psi(x)\,\mu(x,v)\,dv\,dx,$$
for all $\psi\in C(\mathbb{T}^n)$. Thus
$$\int_{\mathbb{T}^n\times\mathbb{R}^n} v\,D_x\varphi(x) + \frac{\sigma^2}{2}\Delta\varphi(x)\,d\overline\mu = 0.$$
Additionally, the convexity of $L$ in $v$ implies, by Jensen's inequality,
$$\int L\,d\overline\mu \le \int L\,d\mu.$$
If $L$ is strictly convex, the inequality is strict unless $v = \overline v(x)$, $\mu$-almost everywhere.
In conclusion:

Theorem 132. Let $L(x,v)$ be strictly convex in $v$ and let $\mu$ be a minimizing measure for Mather's problem (deterministic or stochastic). Then $\mu$ is supported in a graph
$$(x,v) = (x,\overline v(x)).$$
Additionally, the projection $\theta$ of $\mu$ onto the coordinate $x$ satisfies
$$-\nabla\cdot\left(\overline v(x)\theta(x)\right) + \frac{\sigma^2}{2}\Delta\theta = 0$$
in the distribution sense.
To simplify the presentation, we assume that $L = \frac{|v|^2}{2} - U(x)$. Using Lagrange multipliers formally (see the note after Exercise 25), we conclude that Mather's problem is equivalent to the problem without constraints
$$\min_{\theta,\,v(x)}\int_{\mathbb{T}^n}\left(\frac{|v|^2}{2} - U(x) + v\,D_x\varphi + \frac{\sigma^2}{2}\Delta\varphi + \overline H\right)\theta\,dx.$$
The function $\varphi$ corresponds to the Lagrange multiplier for the holonomy condition, and $\overline H$ to the constraint $\int_{\mathbb{T}^n}\theta = 1$.
To obtain the Euler-Lagrange equations, we consider the variations
$$v\to v+\varepsilon w,\qquad \theta\to\theta+\varepsilon\eta.$$
These yield
$$v = -D_x\varphi(x)$$
and
$$\frac{|v|^2}{2} - U(x) + v\,D_x\varphi + \frac{\sigma^2}{2}\Delta\varphi + \overline H = 0.$$
Therefore
$$-\frac{\sigma^2}{2}\Delta\varphi + H(D_x\varphi,x) = \overline H, \tag{141}$$
with
$$H(p,x) = \frac{|p|^2}{2} + U(x).$$
Exercise 178. Adapt the minimax principle from Exercise 25 to Mather's problem and formally verify the previous results.
As an application, we prove an estimate for the second derivatives of the solution of the Hamilton-Jacobi equation. To keep the presentation as elementary as possible, we assume that the dimension is $1$ and that the solution of equation (141) is twice differentiable in $x$. Differentiating (141) twice yields
$$-\frac{\sigma^2}{2}\Delta(\varphi_{xx}) + D_x\varphi\,D_x(\varphi_{xx}) + |D_x\varphi_x|^2 + U_{xx} = 0.$$
Since $v = -D_x\varphi$, the holonomy constraint applied to $\varphi_{xx}$ gives
$$\int -\frac{\sigma^2}{2}\Delta(\varphi_{xx}) + D_x\varphi\,D_x(\varphi_{xx})\,d\mu = 0.$$
Integrating the differentiated equation with respect to $\mu$ then leaves $\int |D_x\varphi_x|^2 + U_{xx}\,d\mu = 0$, and therefore
$$\int |D^2\varphi|^2\,d\mu \le C,$$
with $C = \sup|U_{xx}|$.
In Section 5 we will make rigorous many of the ideas discussed in this section. Mather's problem is an infinite dimensional linear programming problem. In general, as we have discussed for finite dimensional problems, one can use duality to gain a better understanding of the problem. For Mather's problem (see Exercise 178), the dual is given by
$$\inf_\phi\sup_x\ -\frac{\sigma^2}{2}\Delta\phi + H(D_x\phi,x).$$
Duality theory implies that the value of this infimum is
$$-\int L\,d\mu.$$
On the other hand, this value is also the unique number $\overline H$ for which
$$-\frac{\sigma^2}{2}\Delta u + H(D_xu,x) = \overline H$$
has a periodic solution $u$. To check this fact directly, let $u$ be a solution of (141); then
$$\inf_\phi\sup_x\ -\frac{\sigma^2}{2}\Delta\phi + H(D_x\phi,x) \le \sup_x\ -\frac{\sigma^2}{2}\Delta u + H(D_xu,x) = \overline H.$$
Additionally, for each periodic function $\phi$, $u-\phi$ has a minimum at some point $x_0$. At this point, $D_xu = D_x\phi$ and $\Delta u \ge \Delta\phi$. Therefore
$$\sup_x\ -\frac{\sigma^2}{2}\Delta\phi + H(D_x\phi,x) \ge -\frac{\sigma^2}{2}\Delta\phi(x_0) + H(D_x\phi,x_0) \ge -\frac{\sigma^2}{2}\Delta u(x_0) + H(D_xu,x_0) = \overline H.$$
2.2. Monge-Kantorowich problem. To obtain formally the Euler-Lagrange equation for the Monge-Kantorowich problem, we suppose that both $\mu^+$ and $\mu^-$ have densities $\rho^+$ and $\rho^-$. Let $s(x)$ be an optimal mass transport map and $\mu$ the measure in $\mathbb{R}^{2n}$ induced by $s(x)$, with marginals $\mu^\pm$. Let $w$ be a divergence-free vector field in $\mathbb{R}^n$ and $\varphi_\tau$ the flow associated to the differential equation
$$\frac{d}{d\tau}z = \frac{w(z)}{\rho^+(z)}.$$
Since $w$ has zero divergence,
$$\nabla\cdot\left(\rho^+\frac{d}{d\tau}\varphi_\tau\right) = 0.$$
Therefore $(\varphi_\tau)_\#\mu^+ = \mu^+$. Define the measure $\mu_\tau$ in $\mathbb{R}^{2n}$ by
$$\int\phi(x,y)\,d\mu_\tau = \int\phi(\varphi_\tau(x),y)\,d\mu.$$
Since $\mu_0 = \mu$ and $\mu$ is optimal, we have
$$\frac{d}{d\tau}\int|x-y|^2\,d\mu_\tau\bigg|_{\tau=0} = 0,$$
that is,
$$2\int\left(\varphi_\tau(x) - y\right)\cdot\frac{d}{d\tau}\varphi_\tau(x)\,d\mu\,\bigg|_{\tau=0} = 0.$$
This implies
$$\int\left(x - s(x)\right)\cdot w(x)\,dx = 0.$$
This identity holds for all divergence-free vector fields $w$. Consequently, the function $x - s(x)$ is a gradient, and hence so is $s(x)$ itself:
$$s(x) = D_x\Psi(x)$$
for some $\Psi(x)$. The condition $s\#\mu^+ = \mu^-$, by the change of variables formula, is equivalent to
$$\rho^+(x) = \rho^-(s(x))\det Ds(x),$$
which can be written as the Monge-Ampère equation
$$\rho^+(x) = \rho^-(D\Psi(x))\det D^2\Psi(x).$$
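In dimension one the Monge-Ampère equation reduces to $\rho^+(x) = \rho^-(\Psi'(x))\,\Psi''(x)$, which is easy to verify on an explicit example. The densities and map below are our own choice, not from the text.

```python
# 1-D sanity check (our own example): ρ+ uniform on [0,1], ρ−(y) = 2y on [0,1].
# The monotone map s(x) = sqrt(x) = Ψ'(x), with Ψ(x) = (2/3) x^{3/2}, satisfies
# the 1-D Monge-Ampère equation ρ+(x) = ρ−(s(x)) s'(x).
import numpy as np

x = np.linspace(1e-6, 1.0, 1000)
s = np.sqrt(x)                       # candidate transport map s = Ψ'
ds = 0.5 / np.sqrt(x)                # s'(x) = Ψ''(x)
lhs = np.ones_like(x)                # ρ+(x) = 1
rhs = (2 * s) * ds                   # ρ−(s(x)) s'(x) = 2√x · 1/(2√x) = 1
print(np.max(np.abs(lhs - rhs)))     # zero up to floating-point rounding
```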
Exercise 179. Use the minimax principle (see Exercise 25) to determine the dual of the Monge-Kantorowich problem.

Exercise 180. Consider the anti-optimal transport problem, which consists in determining the measure $\pi(x,y)$ with marginals $\mu_1$ and $\mu_2$ which maximizes
$$\int_{\mathbb{R}^{2n}}|x-y|^2\,d\pi(x,y).$$
Determine its dual.

Exercise 181. Use the minimax principle to determine the dual of the problem
$$\min\int_{\mathbb{R}^{2n}} c(x,y)\,\pi(x,y)\,dx\,dy$$
over all nonnegative probability densities $\pi$ which satisfy
$$\int_{\mathbb{R}^n}\pi(x,y)\,dx = \int_{\mathbb{R}^n}\pi(y,x)\,dx.$$
3. Duality

Following the informal ideas discussed in Section 2, we now develop the duality theory rigorously. The main tool is the Legendre-Fenchel-Rockafellar theorem, whose proof is presented in what follows; our proof is based on the one presented in [Vil03b].

Let $E$ be a locally convex topological vector space with dual $E'$. The duality pairing between $E$ and $E'$ is denoted by $(\cdot,\cdot)$. Let $h:E\to(-\infty,+\infty]$ be a convex function. The Legendre-Fenchel transform $h^*:E'\to[-\infty,+\infty]$ of $h$ is defined by
$$h^*(y) = \sup_{x\in E}\left((x,y) - h(x)\right),$$
for $y\in E'$. In a similar way, if $g:E\to[-\infty,+\infty)$ is concave, we define
$$g^*(y) = \inf_{x\in E}\left((x,y) - g(x)\right).$$
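On $E = \mathbb{R}$ the transform can be approximated by a discrete supremum over a grid. A small sketch of ours, using the self-dual function $h(x) = x^2/2$, for which $h^* = h$:

```python
# Numerical Legendre-Fenchel transform on a grid (illustrative only):
#   h*(y) = sup_x ( x·y − h(x) ).
# For h(x) = x²/2 the transform is h itself: h*(y) = y²/2.
import numpy as np

xs = np.linspace(-10, 10, 20001)     # grid approximating E = R
h = xs**2 / 2

def fenchel(y):
    return np.max(xs * y - h)        # discrete supremum over the grid

ys = np.linspace(-3, 3, 7)
print([round(fenchel(y), 6) for y in ys])   # ≈ y²/2 at each grid value of y
```

The supremum is attained at $x = y$, so the grid error is only $O(\Delta x^2)$ near the maximizer.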
Theorem 133 (Fenchel-Legendre-Rockafellar). Let $E$ be a locally convex topological vector space over $\mathbb{R}$ with dual $E'$. Let $h:E\to(-\infty,+\infty]$ be a convex function and $g:E\to[-\infty,+\infty)$ a concave function. Then, if there exists a point $x_0$ where both $g$ and $h$ are finite and at least one of them is continuous,
$$\min_{y\in E'}\left[h^*(y) - g^*(y)\right] = \sup_{x\in E}\left[g(x) - h(x)\right]. \tag{142}$$

Remark. It is part of the theorem that the infimum on the left-hand side above is attained, that is, it is a minimum.
Proof. First we show the "$\ge$" inequality in (142). Recall that
$$\inf_{y\in E'}\left[h^*(y) - g^*(y)\right] = \inf_{y\in E'}\sup_{x_1,x_2\in E}\left[g(x_1) - h(x_2) - (y,x_1-x_2)\right].$$
By choosing $x_1 = x_2 = x$ we conclude that
$$\inf_{y\in E'}\left[h^*(y) - g^*(y)\right] \ge \sup_{x\in E}\left[g(x) - h(x)\right].$$
The opposite inequality is more involved and requires the Hahn-Banach theorem. Let
$$\lambda = \sup_{x\in E}\left[g(x) - h(x)\right].$$
If $\lambda = +\infty$ there is nothing to prove, so we may assume $\lambda < +\infty$. It suffices to show that there exists $y\in E'$ such that for all $x_1$ and $x_2$ we have
$$g(x_1) - h(x_2) - (y,x_1-x_2) \le \lambda, \tag{143}$$
since then taking the supremum over $x_1$ and $x_2$ yields
$$h^*(y) - g^*(y) \le \lambda.$$
From $\lambda \ge g(x) - h(x)$ it follows that $g(x) \le \lambda + h(x)$. Hence the following convex subsets of $E\times\mathbb{R}$,
$$C_1 = \left\{(x_1,t_1)\in E\times\mathbb{R} : t_1 < g(x_1)\right\}$$
and
$$C_2 = \left\{(x_2,t_2)\in E\times\mathbb{R} : \lambda + h(x_2) < t_2\right\},$$
are disjoint. Let $x_0$ be as in the statement of the theorem. We will assume that $g$ is continuous at $x_0$ (for the case in which $h$ is the continuous function, the argument is similar). Since $(x_0,g(x_0)-1)\in C_1$ and $g$ is continuous at $x_0$, $C_1$ has non-empty interior. Therefore (see [?, Chpt 4, sect 14.5]) the sets $C_1$ and $C_2$ can be separated by a nonzero linear functional, i.e., there exists a nonzero vector $z = (w,\alpha)\in E'\times\mathbb{R}$ such that
$$\sup_{c_1\in C_1}(z,c_1) \le \inf_{c_2\in C_2}(z,c_2),$$
that is, for any $x_1$ such that $g(x_1) > -\infty$ and any $x_2$ such that $h(x_2) < +\infty$, we have
$$(w,x_1) + \alpha t_1 \le (w,x_2) + \alpha t_2$$
whenever $t_1 < g(x_1)$ and $\lambda + h(x_2) < t_2$.
Note that $\alpha$ cannot be zero: otherwise, using $x_2 = x_0$ and taking $x_1$ in a neighborhood of $x_0$ where $g$ is finite, we would deduce that $w$ is also zero. Moreover, $\alpha > 0$: otherwise, taking $t_1\to-\infty$ would yield a contradiction. Dividing $w$ by $\alpha$ and letting $y = -\frac{w}{\alpha}$, we obtain
$$-(y,x_1) + g(x_1) \le -(y,x_2) + h(x_2) + \lambda.$$
This is equivalent to (143), and the proof is complete. □

Remark. The continuity condition at $x_0$ can be relaxed to "Gateaux continuity", or directional continuity: the function $t\mapsto f(x_0+tx)$ is continuous at $t = 0$ for every $x\in E$, where $f$ stands for either $h$ or $g$.
4. Generalized Mather problem

The generalized Mather problem is an infinite dimensional linear programming problem. Its dual problem, which we compute in this section, can be obtained using the Fenchel-Legendre-Rockafellar theorem, as we explain in what follows.

Let $\Omega = \mathbb{T}^n\times U$. If $U$ is bounded, set $\gamma = 1$; otherwise, let $\gamma$ be a function $\gamma(v):\Omega\to[1,+\infty)$ satisfying
$$\lim_{|v|\to+\infty}\frac{L(x,v)}{\gamma(v)} = +\infty,\qquad \lim_{|v|\to+\infty}\frac{|v|}{\gamma(v)} = 0.$$
Let $\mathcal{M}$ be the set of Radon measures on $\Omega$ with weight $\gamma$, that is,
$$\mathcal{M} = \left\{\mu \text{ signed measure on }\Omega : \int_\Omega\gamma\,d|\mu| < \infty\right\}.$$
The set $\mathcal{M}$ is the dual of the set $C_{\gamma,0}(\Omega)$ of continuous functions $\phi$ that satisfy
$$\|\phi\|_\gamma = \sup_\Omega\left|\frac{\phi}{\gamma}\right| < \infty \tag{144}$$
if $U$ is bounded, and, if $U$ is unbounded, satisfy both (144) and
$$\lim_{|v|\to\infty}\frac{\phi(x,v)}{\gamma(v)} = 0.$$
Let
$$\mathcal{M}_1 = \left\{\mu\in\mathcal{M} : \int_\Omega d\mu = 1,\ \mu\ge 0\right\}$$
and
$$\mathcal{M}_2 = \operatorname{cl}\left\{\mu\in\mathcal{M} : \int_\Omega A_v\varphi\,d\mu = \int_{\mathbb{T}^n} B\varphi\,d\nu,\ \forall\,\varphi\in C^k(\mathbb{T}^n)\right\},$$
in which $k$ is the degree of differentiability needed on $\varphi$ so that $A_v\varphi$ is well defined, and the closure $\operatorname{cl}$ is taken in the weak topology.
For $\phi\in C_{\gamma,0}(\Omega)$ let
$$h(\phi) = \sup_{(x,v)\in\Omega}\left(-\phi(x,v) - L(x,v)\right).$$
Since $h$ is the supremum of convex functions, it is also a convex function, and, as was shown in [Gom02a], it is also continuous with respect to uniform convergence in $C_{\gamma,0}(\Omega)$. Consider the set
$$\mathcal{C} = \operatorname{cl}\left\{\phi : \phi = A_v\varphi,\ \varphi\in C^k(\mathbb{T}^n)\right\},$$
where $\operatorname{cl}$ denotes the closure in $C_{\gamma,0}$. Since $A_v$ is a linear operator, $\mathcal{C}$ is a convex set.
Let $\nu$ be a fixed probability measure on $\mathbb{T}^n$, and let $\mu_\nu$ be as in (138). Define
$$g(\phi) = \begin{cases} -\int\phi\,d\mu_\nu & \text{if } \phi\in\mathcal{C},\\ -\infty & \text{otherwise.}\end{cases}$$
As $\mathcal{C}$ is a closed convex set, $g$ is concave and upper semicontinuous. Note that if $\phi = A_v\varphi$, then
$$\int\phi\,d\mu_\nu = \int B\varphi\,d\nu.$$
We claim that the dual of
$$\sup_{\phi\in C_{\gamma,0}(\Omega)} g(\phi) - h(\phi) \tag{145}$$
is the generalized Mather problem.
We start by computing the Legendre transforms of $h$ and $g$.

Proposition 134. We have
$$h^*(\mu) = \begin{cases}\int L\,d\mu & \text{if } \mu\in\mathcal{M}_1,\\ +\infty & \text{otherwise,}\end{cases}$$
and
$$g^*(\mu) = \begin{cases}0 & \text{if } \mu\in\mathcal{M}_2,\\ -\infty & \text{otherwise.}\end{cases}$$
Proof. By definition,
$$h^*(\mu) = \sup_{\phi\in C_{\gamma,0}(\Omega)}\left(-\int\phi\,d\mu - h(\phi)\right).$$
First we show that if $\mu$ is not non-negative then $h^*(\mu) = +\infty$.

Lemma 135. If $\mu\not\ge 0$ then $h^*(\mu) = +\infty$.

Proof. If $\mu\not\ge 0$ we can choose a sequence of non-negative functions $\phi_n\in C_{\gamma,0}(\Omega)$ such that
$$\int -\phi_n\,d\mu \to +\infty.$$
Since
$$\sup(-\phi_n - L) \le \sup(-L) < +\infty,$$
we have $h^*(\mu) = +\infty$. □
Lemma 136. If $\mu\ge 0$ then
$$h^*(\mu) \ge \int L\,d\mu + \sup_{\psi\in C_{\gamma,0}(\Omega)}\left(\int\psi\,d\mu - \sup\psi\right).$$

Proof. Let $L_n$ be a sequence of functions in $C_{\gamma,0}(\Omega)$ increasing pointwise to $L$. Any $\phi$ in $C_{\gamma,0}(\Omega)$ can be written as $\phi = -L_n - \psi$, for some $\psi$ in $C_{\gamma,0}(\Omega)$. Therefore
$$\sup_{\phi\in C_{\gamma,0}(\Omega)}\left(-\int\phi\,d\mu - h(\phi)\right) = \sup_{\psi\in C_{\gamma,0}(\Omega)}\left(\int L_n\,d\mu + \int\psi\,d\mu - \sup(L_n + \psi - L)\right).$$
Since
$$\sup(L_n - L) \le 0,$$
we have
$$\sup(L_n + \psi - L) \le \sup\psi.$$
Therefore
$$\sup_{\phi\in C_{\gamma,0}(\Omega)}\left(-\int\phi\,d\mu - h(\phi)\right) \ge \sup_{\psi\in C_{\gamma,0}(\Omega)}\left(\int L_n\,d\mu + \int\psi\,d\mu - \sup\psi\right).$$
By the monotone convergence theorem,
$$\int L_n\,d\mu \to \int L\,d\mu.$$
Thus
$$\sup_{\phi\in C_{\gamma,0}(\Omega)}\left(-\int\phi\,d\mu - h(\phi)\right) \ge \int L\,d\mu + \sup_{\psi\in C_{\gamma,0}(\Omega)}\left(\int\psi\,d\mu - \sup\psi\right),$$
as required. □
If $\int L\,d\mu = +\infty$ then $h^*(\mu) = +\infty$. On the other hand, if $\int d\mu \ne 1$, then, choosing $\psi = \alpha$ constant,
$$\sup_{\psi\in C_{\gamma,0}(\Omega)}\left(\int\psi\,d\mu - \sup\psi\right) \ge \sup_{\alpha\in\mathbb{R}}\,\alpha\left(\int d\mu - 1\right) = +\infty,$$
and therefore $h^*(\mu) = +\infty$.

When $\int d\mu = 1$, the previous lemma, with $\psi = 0$, implies
$$h^*(\mu) \ge \int L\,d\mu.$$
Additionally, if $\int d\mu = 1$, then for each $\phi$
$$\int(-\phi - L)\,d\mu \le \sup(-\phi - L),$$
and therefore
$$\sup_{\phi\in C_{\gamma,0}(\Omega)}\left(-\int\phi\,d\mu - h(\phi)\right) \le \int L\,d\mu.$$
In this way,
$$h^*(\mu) = \begin{cases}\int L\,d\mu & \text{if }\mu\in\mathcal{M}_1,\\ +\infty & \text{otherwise.}\end{cases}$$
Let $\mu_\nu$ be such that
$$\int A_v\varphi\,d\mu_\nu = \int B\varphi\,d\nu,$$
for all $\varphi\in C^k(\mathbb{T}^n)$. We can write any measure $\mu\in\mathcal{M}_2$ as a sum $\mu = \mu_\nu + \overline\mu$, with
$$\int A_v\varphi\,d\overline\mu = 0,$$
for all $\varphi\in C^k(\mathbb{T}^n)$. By continuity, it follows that
$$\int\phi\,d\overline\mu = 0,$$
for all $\phi\in\mathcal{C}$. Furthermore, for any $\mu\not\in\mathcal{M}_2$ there exists $\phi\in\mathcal{C}$ such that
$$\int\phi\,d(\mu - \mu_\nu) \ne 0.$$
Thus
$$g^*(\mu) = \inf_{\phi\in\mathcal{C}}\left(-\int\phi\,d\mu + \int\phi\,d\mu_\nu\right) = \begin{cases}0 & \text{if }\mu\in\mathcal{M}_2,\\ -\infty & \text{otherwise.}\end{cases}\qquad\Box$$
Theorem 137.
$$\sup_{\phi\in C_{\gamma,0}(\Omega)}\left(g(\phi) - h(\phi)\right) = \min_{\mu\in\mathcal{M}}\left(h^*(\mu) - g^*(\mu)\right). \tag{146}$$

Note 1: $\min_{\mu\in\mathcal{M}}(h^*(\mu) - g^*(\mu)) = \min_{\mu\in\mathcal{M}_1\cap\mathcal{M}_2}\int L\,d\mu$.

Note 2: It is part of the theorem that the right-hand side of (146) is attained, and therefore there exists a generalized Mather measure.

Proof. The set $\{g > -\infty\}$ is non-empty and, on this set, $h$ is a continuous function, as proved in [Gom02a]. The result then follows from the Fenchel-Legendre-Rockafellar theorem (Theorem 133); see also, for instance, [Vil03b]. □
Let
$$H(\varphi,x) = \sup_v\ -L(x,v) - A_v\varphi.$$
As an example, suppose $A_v\varphi = \Delta\varphi + v\,D_x\varphi$. Then
$$H(\varphi,x) = -\Delta\varphi + H(D_x\varphi,x).$$
The result in Theorem 137 can then be restated in the more convenient identity
$$\min_\mu\int L\,d\mu = -\inf_\varphi\sup_x\left[H(\varphi,x) + \int B\varphi\,d\nu\right], \tag{147}$$
where the minimum on the left-hand side is taken over all probability measures $\mu$ that satisfy (140), and the infimum on the right-hand side is taken over all $\varphi\in C^k(\mathbb{T}^n)$.
In the remainder of this section we consider Mather's classical problem, that is, $A_v\varphi = v\,D_x\varphi$ and $B = 0$.

Theorem 138. Let $A_v\varphi = v\,D_x\varphi$, and let $\overline H$ be given by
$$\overline H = -\sup_{\phi\in C_{\gamma,0}(\Omega)}\left(g(\phi) - h(\phi)\right).$$
Then
$$\overline H = \inf\left\{\lambda : \exists\,\varphi\in C^1(\mathbb{T}^n) \text{ with } H(D_x\varphi,x) < \lambda \text{ for all } x\right\}.$$

Proof. It is enough to observe that
$$\overline H = \inf_{\varphi\in C^1(\mathbb{T}^n)}\ \sup_{(x,v)\in\Omega}\ -v\,D_x\varphi - L = \inf_{\varphi\in C^1(\mathbb{T}^n)}\ \sup_{x\in\mathbb{T}^n}\ H(D_x\varphi,x).\qquad\Box$$
Theorem 139. $\overline H$ is the only value for which
$$H(D_xu,x) = \overline H$$
admits a periodic viscosity solution.

Proof. Let $u$ be a periodic viscosity solution of
$$H(D_xu,x) = \lambda.$$
We claim that there is no $C^1$ function $\psi$ such that
$$\sup_x H(D_x\psi,x) < \lambda. \tag{148}$$
Indeed, suppose by contradiction that $\psi$ satisfies (148). Since $u$ and $\psi$ are periodic, there exists a point $x_0$ at which $u - \psi$ has a local minimum. But then, by the definition of viscosity solution,
$$H(D_x\psi,x_0) \ge \lambda,$$
which is a contradiction. Thus we conclude that $\overline H \ge \lambda$.

To prove the opposite inequality, consider a standard mollifier $\eta_\varepsilon$ and define $u_\varepsilon = \eta_\varepsilon * u$. Then
$$H(D_xu_\varepsilon,x) \le \lambda + h(\varepsilon,x),$$
where
$$h(\varepsilon,x) = \sup_{|p|\le R}\ \sup_{|x-y|\le\varepsilon}\ |H(p,x) - H(p,y)|,$$
and $R$ is an estimate for the Lipschitz constant of $u$. Let
$$\lambda_\varepsilon = \lambda + \sup_x h(\varepsilon,x).$$
Then $u_\varepsilon$ satisfies
$$H(D_xu_\varepsilon,x) \le \lambda_\varepsilon,$$
and therefore
$$\overline H \le \lim_{\varepsilon\to 0}\lambda_\varepsilon = \lambda.$$
Consequently $\overline H = \lambda$. □
4.1. Regularity. In this section we present, with small adaptations, the regularity results of [EG01] for viscosity solutions on the support of Mather measures. We should point out that the proofs of Theorems 141-147 presented here appeared in [EG01]. For the setting of this survey we had to add an elementary lemma, Lemma 140, for the presentation to be self-contained, as our definition of Mather measures differs from the one used in [EG01].

Lemma 140. Let $\mu$ be a minimizing holonomic measure. Then
$$\int_{\mathbb{T}^d\times\mathbb{R}^d} D_xL(x,v)\,d\mu = 0.$$
Proof. Let $h\in\mathbb{R}^d$ and consider the measure $\mu_h$ on $\mathbb{T}^d\times\mathbb{R}^d$ given by
$$\int_{\mathbb{T}^d\times\mathbb{R}^d}\phi(x,v)\,d\mu_h = \int_{\mathbb{T}^d\times\mathbb{R}^d}\phi(x+h,v)\,d\mu,$$
for all continuous and compactly supported functions $\phi:\mathbb{T}^d\times\mathbb{R}^d\to\mathbb{R}$. Clearly, for every $h$, $\mu_h$ is holonomic. Since $\mu$ is minimizing, it follows that
$$\frac{d}{d\varepsilon}\int L(x+\varepsilon h,v)\,d\mu\,\bigg|_{\varepsilon=0} = 0,$$
that is,
$$\int_{\mathbb{T}^d\times\mathbb{R}^d} D_xL(x,v)\cdot h\,d\mu = 0.$$
Since $h\in\mathbb{R}^d$ is arbitrary, the statement of the lemma follows. □
It will be convenient to define a measure $\hat\mu$ on $\mathbb{T}^d\times\mathbb{R}^d$ as the push-forward of $\mu$ under the one-to-one map $(x,v)\mapsto(x,p)$, where $p = D_vL(x,v)$. In other words, we define $\hat\mu$ by
$$\int_{\mathbb{T}^d\times\mathbb{R}^d}\phi(x,p)\,d\hat\mu = \int_{\mathbb{T}^d\times\mathbb{R}^d}\phi(x,D_vL(x,v))\,d\mu.$$
We also define the projection onto $\mathbb{T}^d$ of a measure $\mu$ on $\mathbb{T}^d\times\mathbb{R}^d$, which, abusing notation, we still denote by $\mu$:
$$\int_{\mathbb{T}^d}\varphi(x)\,d\mu(x) = \int_{\mathbb{T}^d\times\mathbb{R}^d}\varphi(x)\,d\mu(x,v).$$
Note that, in a similar way, this is also the projection of the measure $\hat\mu$. Observe that for any smooth function $\varphi(x)$ the measure $\hat\mu$ satisfies the following version of the holonomy condition:
$$\int_{\mathbb{T}^d\times\mathbb{R}^d} D_pH(p,x)\,D_x\varphi(x)\,d\hat\mu = 0,$$
because, by identity (??), $v = D_pH(p,x)$ when $p = D_vL(x,v)$.
Theorem 141. Let $u$ be any viscosity solution of (132), and let $\mu$ be any minimizing holonomic measure. Then, $\mu$-almost everywhere, $D_xu(x)$ exists, and $p = D_xu(x)$, $\hat\mu$-almost everywhere.
Proof. Let $u$ be any viscosity solution of (132). Let $\eta_\varepsilon$ be a standard mollifier and $u_\varepsilon = \eta_\varepsilon * u$. By strict uniform convexity, there exists $\gamma > 0$ such that for any $p,q\in\mathbb{R}^d$ and any $x\in\mathbb{T}^d$ we have
$$H(p,x) \ge H(q,x) + D_pH(q,x)(p-q) + \frac{\gamma}{2}|p-q|^2.$$
By Theorem 131, any viscosity solution of (132), and in particular $u$, is Lipschitz. Recall that, by Rademacher's theorem [Eva98a], a locally Lipschitz function is differentiable Lebesgue almost everywhere. Using $p = D_xu(y)$ and $q = D_xu_\varepsilon(x)$, we conclude that for every point $x$ and for Lebesgue almost every point $y$,
$$H(D_xu(y),x) \ge H(D_xu_\varepsilon(x),x) + D_pH(D_xu_\varepsilon(x),x)\left(D_xu(y) - D_xu_\varepsilon(x)\right) + \frac{\gamma}{2}\left|D_xu_\varepsilon(x) - D_xu(y)\right|^2.$$
Multiplying the previous inequality by $\eta_\varepsilon(x-y)$ and integrating over $\mathbb{R}^d$ in $y$ yields
$$H(D_xu_\varepsilon(x),x) + \beta_\varepsilon(x) \le \int_{\mathbb{R}^d}\eta_\varepsilon(x-y)H(D_xu(y),x)\,dy \le \overline H + O(\varepsilon),$$
where
$$\beta_\varepsilon(x) = \frac{\gamma}{2}\int_{\mathbb{R}^d}\eta_\varepsilon(x-y)\left|D_xu_\varepsilon(x) - D_xu(y)\right|^2\,dy.$$
Now observe that
$$\frac{\gamma}{2}\int_{\mathbb{T}^d\times\mathbb{R}^d}|D_xu_\varepsilon(x) - p|^2\,d\hat\mu \le \int_{\mathbb{T}^d\times\mathbb{R}^d}\left[H(D_xu_\varepsilon(x),x) - H(p,x) - D_pH(p,x)\left(D_xu_\varepsilon(x) - p\right)\right]d\hat\mu \le \int_{\mathbb{T}^d\times\mathbb{R}^d}H(D_xu_\varepsilon(x),x)\,d\hat\mu - \overline H,$$
because
$$\int_{\mathbb{T}^d\times\mathbb{R}^d}D_pH(p,x)\,D_xu_\varepsilon(x)\,d\hat\mu = 0,$$
$$p\,D_pH(p,x) - H(p,x) = L(x,D_pH(p,x)),$$
and $\int_{\mathbb{T}^d\times\mathbb{R}^d}L(x,D_pH(p,x))\,d\hat\mu = -\overline H$. Therefore
$$\frac{\gamma}{2}\int_{\mathbb{T}^d\times\mathbb{R}^d}|D_xu_\varepsilon(x) - p|^2\,d\hat\mu + \int_{\mathbb{T}^d}\beta_\varepsilon(x)\,d\mu \le O(\varepsilon).$$
Thus, for $\mu$-almost every point $x$, $\beta_\varepsilon(x)\to 0$, and therefore $\mu$-almost every point is a point of approximate continuity of $D_xu$ (see [EG92], p. 49). Since $u$ is semiconcave (Proposition ??), it is differentiable at points of approximate continuity. Furthermore,
$$D_xu_\varepsilon \to D_xu$$
pointwise, $\mu$-almost everywhere, and so $D_xu$ is $\mu$-measurable. Also, we have
$$p = D_xu(x),\qquad \hat\mu\text{-almost everywhere.}\qquad\Box$$
By inspecting the proof of the previous theorem we can also state the following useful result:

Corollary 142. Let $\eta_\varepsilon$ be a standard mollifier and $u_\varepsilon = \eta_\varepsilon * u$. Then
$$\int_{\mathbb{T}^d}\left|D_xu_\varepsilon - D_xu\right|^2 d\mu \le C\varepsilon,$$
as $\varepsilon\to 0$.
As a corollary, we formulate an equivalent form of Theorem 141.

Corollary 143. Let $u$ be any viscosity solution of (132), and let $\mu$ be any minimizing holonomic measure. Then, $\mu$-almost everywhere, $D_xu(x)$ exists, and
$$D_vL(x,v) = D_xu(x)\quad \mu\text{-almost everywhere}, \tag{149}$$
and
$$D_xL(x,v) = -D_xH(D_xu(x),x)\quad \mu\text{-almost everywhere}. \tag{150}$$

Proof. First, recall that $\hat\mu$ is the push-forward of $\mu$ under the one-to-one map $(x,v)\mapsto(x,p)$, where $p = D_vL(x,v)$. Therefore a $\hat\mu$-almost everywhere identity
$$F_1(p,x) = F_2(p,x)\qquad (x,p)\text{-}\hat\mu\text{ almost everywhere}$$
implies the $\mu$-almost everywhere identity
$$F_1(D_vL(x,v),x) = F_2(D_vL(x,v),x)\qquad (x,v)\text{-}\mu\text{ almost everywhere.}$$
Thus (149) follows directly from Theorem 141. Using (149) and the identity $D_xL(x,v) = -D_xH(D_vL(x,v),x)$, we arrive at (150). □
We observe that the previous corollary also implies
$$\int_{\mathbb{T}^d}D_pH(D_xu,x)\,D_xu\,d\mu = 0.$$
Indeed,
$$\int_{\mathbb{T}^d}D_pH(D_xu,x)\,D_xu\,d\mu = \int_{\mathbb{T}^d}D_pH(D_xu,x)\,D_xu_\varepsilon\,d\mu + \int_{\mathbb{T}^d}D_pH(D_xu,x)\left(D_xu - D_xu_\varepsilon\right)d\mu.$$
By the holonomy condition, since $v = D_pH(D_xu,x)$ $\mu$-almost everywhere, we have
$$\int_{\mathbb{T}^d}D_pH(D_xu,x)\,D_xu_\varepsilon\,d\mu = 0.$$
To handle the second term, fix $\delta > 0$. Then
$$\left|\int_{\mathbb{T}^d}D_pH(D_xu,x)\left(D_xu - D_xu_\varepsilon\right)d\mu\right| \le \delta\int_{\mathbb{T}^d}|D_pH(D_xu,x)|^2\,d\mu + \frac{1}{\delta}\int_{\mathbb{T}^d}\left|D_xu - D_xu_\varepsilon\right|^2 d\mu.$$
Note that since $u$ is Lipschitz, the term $D_pH(D_xu,x)$ is bounded, and so is $\int_{\mathbb{T}^d}|D_pH(D_xu,x)|^2\,d\mu$. Send $\varepsilon\to 0$, and then let $\delta\to 0$.
Theorem 144. Let $u$ be any viscosity solution of (132), and let $\mu$ be any minimizing holonomic measure. Then
$$\int_{\mathbb{T}^d}\left|D_xu(x+h) - D_xu(x)\right|^2 d\mu \le C|h|^2.$$
Proof. Applying Theorem ?? we have
$$H(D_xu_\varepsilon(x+h),x+h) \le \overline H + C\varepsilon.$$
By Theorem 141 the derivative $D_xu(x)$ exists $\mu$-almost everywhere. By Proposition ??, a viscosity solution satisfies equation (132) in the classical sense at all points of differentiability; thus $H(D_xu(x),x) = \overline H$ for $\mu$-almost all points $x$. Now observe that
$$C\varepsilon \ge H(D_xu_\varepsilon(x+h),x+h) - H(D_xu(x),x) = \left[H(D_xu_\varepsilon(x+h),x+h) - H(D_xu_\varepsilon(x+h),x)\right] + \left[H(D_xu_\varepsilon(x+h),x) - H(D_xu(x),x)\right].$$
The first bracket satisfies
$$H(D_xu_\varepsilon(x+h),x+h) - H(D_xu_\varepsilon(x+h),x) = D_xH(D_xu_\varepsilon(x+h),x)h + O(h^2) = D_xH(D_xu(x),x)h + O\!\left(h^2 + |h|\,|D_xu_\varepsilon(x+h) - D_xu(x)|\right) \ge D_xH(D_xu(x),x)h + O(h^2) - \frac{\gamma}{4}\left|D_xu_\varepsilon(x+h) - D_xu(x)\right|^2.$$
Therefore, for $\mu$-almost every $x$, we have
$$H(D_xu_\varepsilon(x+h),x) - H(D_xu,x) \le C\varepsilon - D_xH(D_xu(x),x)h + \frac{\gamma}{4}\left|D_xu_\varepsilon(x+h) - D_xu(x)\right|^2 + Ch^2.$$
Since
$$H(D_xu_\varepsilon(x+h),x) - H(D_xu,x) \ge \frac{\gamma}{2}\left|D_xu_\varepsilon(x+h) - D_xu(x)\right|^2 + D_pH(D_xu,x)\left(D_xu_\varepsilon(x+h) - D_xu(x)\right),$$
and the term involving $D_pH$ integrates to zero against $\mu$ (by the holonomy condition and the observation following Corollary 143), we obtain
$$\frac{\gamma}{4}\int\left|D_xu_\varepsilon(x+h) - D_xu(x)\right|^2 d\mu \le C\varepsilon + C|h|^2 - \int D_xH(D_xu(x),x)h\,d\mu.$$
By (150) and Lemma 140 it follows that
$$\int D_xH(D_xu(x),x)h\,d\mu = -\int D_xL(x,v)h\,d\mu = 0.$$
As $\varepsilon\to 0$, through a suitable subsequence (since $D_xu_\varepsilon(x+h)$ is bounded in $L^2_\mu$), we may assume that $D_xu_\varepsilon(x+h)\rightharpoonup\xi(x)$ in $L^2_\mu$, for some function $\xi\in L^2_\mu$, and
$$\int|\xi - D_xu|^2\,d\mu \le C|h|^2.$$
Finally, we claim that $\xi(x) = D_xu(x+h)$ for $\mu$-almost all $x$. This follows from Theorem 141 and the fact that for $\mu$-almost all $x$ we have $\xi(x)\in D_x^-u(x+h)$, where $D_x^-$ stands for the subdifferential. To see this, observe that by Proposition ?? $u$ is semiconcave, and therefore the $u_\varepsilon$ are uniformly semiconcave, that is,
$$u_\varepsilon(y+h) \le u_\varepsilon(x+h) + D_xu_\varepsilon(x+h)(y-x) + C|y-x|^2,$$
where $C$ is independent of $\varepsilon$. Fixing $y$ and integrating against a non-negative function $\varphi(x)\in L^2_\mu$ yields
$$\int_{\mathbb{T}^d}\left(u_\varepsilon(y+h) - u_\varepsilon(x+h) - D_xu_\varepsilon(x+h)(y-x) - C|y-x|^2\right)\varphi(x)\,d\mu \le 0.$$
Passing to the limit, we conclude that
$$u(y+h) \le u(x+h) + \xi(x)(y-x) + C|y-x|^2\quad\text{for all } y \text{ and } \mu\text{-almost all } x,$$
that is, $\xi(x)\in D_x^-u(x+h)$ for $\mu$-almost all $x$. □
Lemma 145. Let $u$ be any viscosity solution of (132), and let $\mu$ be any minimizing holonomic measure. Let $\psi:\mathbb{T}^d\times\mathbb{R}\to\mathbb{R}$ be a smooth function. Then
$$\int_{\mathbb{T}^d}D_pH(D_xu,x)\,D_x\left[\psi(x,u(x))\right]d\mu = 0.$$

Proof. Clearly we have
$$\int_{\mathbb{T}^d}D_pH(D_xu,x)\,D_x\left[\psi(x,u_\varepsilon(x))\right]d\mu = 0.$$
By the uniform convergence of $u_\varepsilon$ to $u$ and the $L^2_\mu$ convergence of $D_xu_\varepsilon$ to $D_xu$ (see Corollary 142), we obtain the result. □
Theorem 146. Let $u$ be any viscosity solution of (132), and let $\mu$ be any minimizing holonomic measure. Then, for $\mu$-almost every $x$ and all $h\in\mathbb{R}^d$,
$$|u(x+h) - 2u(x) + u(x-h)| \le C|h|^2.$$

Proof. Let $h\ne 0$, and let $u_\varepsilon = \eta_\varepsilon * u$ for a standard mollifier $\eta_\varepsilon$, where we take
$$0 < \varepsilon \le \eta|h|^2 \tag{151}$$
for small $\eta > 0$. We have
$$H(Du_\varepsilon(x+h),x+h) \le \overline H + C\varepsilon,\qquad H(Du_\varepsilon(x-h),x-h) \le \overline H + C\varepsilon.$$
For $\mu$-almost every point $x$ (for which $Du(x)$ exists, and therefore $H(Du(x),x) = \overline H$), we have
$$H(Du_\varepsilon(x+h),x) - 2H(Du,x) + H(Du_\varepsilon(x-h),x) \le 2C\varepsilon + \left[H(Du_\varepsilon(x+h),x) - H(Du_\varepsilon(x+h),x+h)\right] + \left[H(Du_\varepsilon(x-h),x) - H(Du_\varepsilon(x-h),x-h)\right].$$
Hence, by the uniform convexity of $H$,
$$\frac{\gamma}{2}\left(|Du_\varepsilon(x+h) - Du|^2 + |Du_\varepsilon(x-h) - Du|^2\right) + D_pH(Du,x)\cdot\left(Du_\varepsilon(x+h) - 2Du + Du_\varepsilon(x-h)\right) \le C(\varepsilon + |h|^2) + \left(D_xH(Du_\varepsilon(x-h),x) - D_xH(Du_\varepsilon(x+h),x)\right)\cdot h.$$
Using the inequality
$$\left|\left(D_xH(p,x) - D_xH(q,x)\right)\cdot h\right| \le \left\|\tfrac{\partial^2 H}{\partial p\,\partial x}\right\||p-q|\,|h| \le \frac{\gamma}{4}|p-q|^2 + \frac{1}{\gamma}\left\|\tfrac{\partial^2 H}{\partial p\,\partial x}\right\|^2|h|^2,$$
where $\left\|\tfrac{\partial^2 H}{\partial p\,\partial x}\right\| = \sup_{p,x}\sup_{|z|=1,|h|=1}\sum_{i,j}\left|z_jh_i\tfrac{\partial^2 H}{\partial p_j\,\partial x_i}(p,x)\right|$, we arrive at
$$\frac{\gamma}{4}\left(|Du_\varepsilon(x+h) - Du|^2 + |Du_\varepsilon(x-h) - Du|^2\right) + D_pH(Du,x)\cdot\left(Du_\varepsilon(x+h) - 2Du + Du_\varepsilon(x-h)\right) \le C(\varepsilon + |h|^2).$$
Fix now a smooth, nondecreasing function $\Phi:\mathbb{R}\to\mathbb{R}$, and write $\phi := \Phi' \ge 0$. Multiplying the last inequality above by $\phi\!\left(\frac{u_\varepsilon(x+h) - 2u(x) + u_\varepsilon(x-h)}{|h|^2}\right)$ and integrating with respect to $\mu$ gives
$$\frac{\gamma}{4}\int_{\mathbb{T}^d}\left(|Du_\varepsilon(x+h) - Du|^2 + |Du_\varepsilon(x-h) - Du|^2\right)\phi\!\left(\frac{u_\varepsilon(x+h) - 2u(x) + u_\varepsilon(x-h)}{|h|^2}\right)d\mu + \int_{\mathbb{T}^d}D_pH(Du,x)\cdot\left(Du_\varepsilon(x+h) - 2Du + Du_\varepsilon(x-h)\right)\phi(\cdots)\,d\mu \le C(\varepsilon + |h|^2)\int_{\mathbb{T}^d}\phi(\cdots)\,d\mu. \tag{152}$$
Now the second term on the left-hand side of (152) equals
$$|h|^2\int_{\mathbb{T}^d}D_pH(Du,x)\cdot D_x\!\left[\Phi\!\left(\frac{u_\varepsilon(x+h) - 2u(x) + u_\varepsilon(x-h)}{|h|^2}\right)\right]d\mu \tag{153}$$
and thus, by Lemma 145, it vanishes. Dropping this term from (152) and rewriting, we deduce
$$\int_{\mathbb{T}^d}\left|Du_\varepsilon(x+h) - Du_\varepsilon(x-h)\right|^2\phi\!\left(\frac{u_\varepsilon(x+h) - 2u(x) + u_\varepsilon(x-h)}{|h|^2}\right)d\mu \le C(\varepsilon + |h|^2)\int_{\mathbb{T}^d}\phi\!\left(\frac{u_\varepsilon(x+h) - 2u(x) + u_\varepsilon(x-h)}{|h|^2}\right)d\mu. \tag{154}$$
We now confront a technical problem, as (154) entails a mixture of first-order difference quotients for $Du_\varepsilon$ and second-order difference quotients for $u$, $u_\varepsilon$. We can, however, relate these expressions, since $u$ is semiconcave.
To see this, first define
$$E_\varepsilon := \left\{x\in\operatorname{supp}(\mu) : u_\varepsilon(x+h) - 2u(x) + u_\varepsilon(x-h) \le -\kappa|h|^2\right\}, \tag{155}$$
with the large constant $\kappa > 0$ to be fixed below. The functions
$$\tilde u(x) := u(x) - \frac{\alpha}{2}|x|^2,\qquad \tilde u_\varepsilon(x) := u_\varepsilon(x) - \frac{\alpha}{2}|x|^2 \tag{156}$$
are concave. Moreover, a point $x\in\operatorname{supp}(\mu)$ belongs to $E_\varepsilon$ if and only if
$$\tilde u_\varepsilon(x+h) - 2\tilde u(x) + \tilde u_\varepsilon(x-h) \le -(\kappa+\alpha)|h|^2. \tag{157}$$
Set
$$f^\varepsilon(s) := \tilde u_\varepsilon\!\left(x + s\frac{h}{|h|}\right)\qquad (-|h|\le s\le |h|). \tag{158}$$
Then $f^\varepsilon$ is concave, and
$$\tilde u_\varepsilon(x+h) - 2\tilde u_\varepsilon(x) + \tilde u_\varepsilon(x-h) = f^\varepsilon(|h|) - 2f^\varepsilon(0) + f^\varepsilon(-|h|) = \int_{-|h|}^{|h|}f^{\varepsilon\,\prime\prime}(s)\left(|h| - |s|\right)ds \ge |h|\int_{-|h|}^{|h|}f^{\varepsilon\,\prime\prime}(s)\,ds \quad(\text{since } f^{\varepsilon\,\prime\prime}\le 0)$$
$$= |h|\left[f^{\varepsilon\,\prime}(|h|) - f^{\varepsilon\,\prime}(-|h|)\right] = \left(D\tilde u_\varepsilon(x+h) - D\tilde u_\varepsilon(x-h)\right)\cdot h.$$
Consequently, if $x\in E_\varepsilon$, this inequality and (157) together imply
$$2|\tilde u_\varepsilon(x) - \tilde u(x)| + \left|D\tilde u_\varepsilon(x+h) - D\tilde u_\varepsilon(x-h)\right||h| \ge (\kappa+\alpha)|h|^2.$$
Now $|\tilde u_\varepsilon(x) - \tilde u(x)| = |u_\varepsilon(x) - u(x)| \le C\varepsilon$ on $\mathbb{T}^d$, since $u$ is Lipschitz continuous. We may therefore take $\eta$ in (151) small enough to deduce from the foregoing that
$$\left|D\tilde u_\varepsilon(x+h) - D\tilde u_\varepsilon(x-h)\right| \ge \left(\frac{\kappa}{2} + \alpha\right)|h|. \tag{159}$$
But then
$$\left|Du_\varepsilon(x+h) - Du_\varepsilon(x-h)\right| \ge \left(\frac{\kappa}{2} - \alpha\right)|h|. \tag{160}$$
Return now to (154), taking $\kappa > 2\alpha$ and
$$\phi(z) = \begin{cases}1 & \text{if } z\le -\kappa,\\ 0 & \text{if } z > -\kappa.\end{cases}$$
The inequality (154) was derived for smooth functions $\phi$. However, replacing $\phi$ in (154) by a sequence $\phi_n$ of smooth functions increasing pointwise to $\phi$, and using the monotone convergence theorem, we conclude that (154) holds for this function $\phi$ as well. We then discover from (154) and (160) that
$$\left(\frac{\kappa}{2} - \alpha\right)^2|h|^2\,\mu(E_\varepsilon) \le C(\varepsilon + |h|^2)\,\mu(E_\varepsilon).$$
We fix $\kappa$ so large that
$$\left(\frac{\kappa}{2} - \alpha\right)^2 \ge C + 1,$$
to deduce
$$\left(|h|^2 - C\varepsilon\right)\mu(E_\varepsilon) \le 0.$$
Thus $\mu(E_\varepsilon) = 0$ if $\eta$ in (151) is small enough, and this means
$$u_\varepsilon(x+h) - 2u(x) + u_\varepsilon(x-h) \ge -\kappa|h|^2$$
for $\mu$-almost every point $x$. Now let $\varepsilon\to 0$:
$$u(x+h) - 2u(x) + u(x-h) \ge -\kappa|h|^2$$
$\mu$-almost everywhere. Since
$$u(x+h) - 2u(x) + u(x-h) \le \alpha|h|^2,$$
owing to the semiconcavity, we have
$$|u(x+h) - 2u(x) + u(x-h)| \le C|h|^2$$
for $\mu$-almost every point $x$. As $u$ is continuous, the same inequality holds for all $x\in\operatorname{supp}(\mu)$. □
Now we state and prove the main result of this section.
Theorem 147. Let u be any viscosity solution of (132), and let µ
be any minimizing holonomic measure. Then for µ-almost every x,
Dxu(x) exists and for Lebesgue almost every y
(161) |Dxu(x)−Dxu(y)| ≤ C|x− y|.
260 5. DUALITY THEORY
Proof. First we show that
(162) |u(y)− u(x)− (y − x) ·Dxu(x)| ≤ C|x− y|2.
Fix y ∈ Rd and take any point x ∈ supp(µ) at which u is differentiable.
According to Theorem 146 with h := y − x, we have
(163) |u(y)− 2u(x) + u(2x− y)| ≤ C|x− y|2.
By semiconcavity, we have
(164) u(y)− u(x)−Du(x) · (y − x) ≤ C|x− y|2,
and also
(165) u(2x− y)− u(x)−Du(x) · (2x− y − x) ≤ C|x− y|2.
Use (165) in (163):
u(y)− u(x)−Du(x) · (y − x) ≥ −C|x− y|2.
This and (164) establish (162).
Estimate (161) follows from (162), as follows. Take x, y as above.
Let z be a point to be selected later, with |x − z| ≤ 2|x − y|. The
semiconcavity of u implies that
(166) u(z) ≤ u(y) +Du(y) · (z − y) + C|z − y|2.
Also,
u(z) = u(x)+Du(x)·(z−x)+O(|x−z|2), u(y) = u(x)+Du(x)·(y−x)+O(|x−y|2),
according to (162). Insert these identities into (166) and simplify:
(Du(x)−Du(y)) · (z − y) ≤ C|x− y|2.
Now take
z := y + |x − y| (Du(x) − Du(y))/|Du(x) − Du(y)|
to deduce (161).
Now take any point x ∈ supp(µ), and fix y. There exist points
xk ∈ supp(µ) (k = 1, . . . ) such that xk → x and u is differentiable at
xk. According to estimate (162)
|u(y)− u(xk)−Du(xk) · (y − xk)| ≤ C|xk − y|2 (k = 1, . . . ).
The constant C does not depend on k or y. Now let k → ∞. Owing
to (161) we see that Du(xk) converges to some vector η, for which
|u(y)− u(x)− η · (y − x)| ≤ C|x− y|2.
Consequently u is differentiable at x and Du(x) = η.
It follows from Theorem 147 that the function v defined by Theorem
?? is Lipschitz on a set of full measure µ. Indeed, substituting the
left- and right-hand sides of (149) for p in DpH(p, x), and using (??),
we have
v(x) = DpH(Du(x), x)   µ-almost everywhere.
We can then extend v as a Lipschitz function to the support of µ,
which is contained in the closure of this set of full measure. Note that
any Lipschitz function ϕ defined on a closed set K can be extended to
a globally defined Lipschitz function ϕ̄ in the following way: without
loss of generality assume that Lip(ϕ) = 1, and define
ϕ̄(x) = inf_{y∈K} [ϕ(y) + 2 d(x, y)].
An easy exercise then shows that ϕ̄ = ϕ in K and that ϕ̄ is Lipschitz.
Therefore we may assume that v is globally defined and Lipschitz.
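The extension formula above is easy to test numerically. The sketch below (a finite set K with hypothetical 1-Lipschitz data, and d(x, y) = |x − y|) checks that the extension agrees with ϕ on K and is 2-Lipschitz on a sample grid:

```python
# McShane-type extension: phibar(x) = min_{y in K} [phi(y) + 2*d(x, y)],
# for data phi with Lip(phi) = 1 on a finite set K (illustrative values).
def extend(K, phi, x):
    return min(phi[y] + 2.0 * abs(x - y) for y in K)

K = [-1.0, 0.0, 2.0]
phi = {-1.0: 0.5, 0.0: 0.0, 2.0: 1.0}   # 1-Lipschitz data on K

# The extension agrees with phi on K ...
for y in K:
    assert abs(extend(K, phi, y) - phi[y]) < 1e-12

# ... and, as a minimum of 2-Lipschitz functions, is 2-Lipschitz.
xs = [-3.0 + 0.1 * i for i in range(61)]
for a in xs:
    for b in xs:
        assert abs(extend(K, phi, a) - extend(K, phi, b)) <= 2.0 * abs(a - b) + 1e-12
```

The factor 2 in the formula is what makes the infimum attained at y = x whenever x ∈ K: for any other y, the penalty 2d(x, y) dominates the at most d(x, y) that ϕ(y) can gain.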
4.2. Holonomy variations. In this section we study a class of
variations that preserve the holonomy constraint. These variations will
be used later to establish the invariance under the Euler-Lagrange flow
of minimizing holonomic measures.
Let ξ : Td → Rd be a C1 vector field on Td. Let Φ(t, x) be the
flow of ξ, i.e.,
Φ(0, x) = x,   and   ∂Φ/∂t (t, x) = ξ(Φ(t, x)).
Consider the prolongation of ξ to Td × Rd, which is the vector field on
Td × Rd given by
(167)   ẋk(x, v) = ξk(x),   v̇k(x, v) = vi ∂ξk/∂xi (x).
Lemma 148. The flow of (167) is given by
(168)   Xk(t, x, v) = Φk(t, x),   Vk(t, x, v) = vs ∂Φk/∂xs (t, x).
Proof. Since the X-part of the flow coincides with the Φ-flow, it
only remains to show that
V(0, x, v) = v,   and   ∂V/∂t (t, x, v) = v̇(X(t, x, v), V(t, x, v)).
The first statement (V(0, x, v) = v) is clear since the map x ↦ Φ(0, x)
is the identity map. The second statement can be rewritten as
∂Vk/∂t (t, x, v) = Vi(t, x, v) ∂ξk/∂xi |Φ(t,x).
A simple computation yields
∂Vk/∂t (t, x, v) = vs ∂/∂xs (∂Φk/∂t (t, x)) = vs ∂/∂xs (ξk(Φ(t, x)))
= vs ∂ξk/∂xi |Φ(t,x) ∂Φi/∂xs (t, x) = Vi(t, x, v) ∂ξk/∂xi |Φ(t,x),
which is the desired identity.
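Lemma 148 can also be checked numerically: the fiber component of the prolonged flow should reproduce v times the spatial derivative of the base flow Φ. A sketch under illustrative assumptions (ξ(x) = sin x on the circle, a generic RK4 integrator; none of this is part of the text):

```python
import math

xi  = math.sin            # illustrative vector field on the circle
dxi = math.cos            # its derivative

def rk4(f, y, t, n):
    """Integrate y' = f(y) from 0 to t with n RK4 steps; y is a tuple."""
    h = t / n
    for _ in range(n):
        k1 = f(y)
        k2 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
        k3 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
        k4 = f(tuple(a + h * b for a, b in zip(y, k3)))
        y = tuple(a + h / 6 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(y, k1, k2, k3, k4))
    return y

def prolonged(y):         # the field (167): (x, v) -> (xi(x), v * xi'(x))
    x, v = y
    return (xi(x), v * dxi(x))

def Phi(t, x):            # base flow of xi alone
    return rk4(lambda y: (xi(y[0]),), (x,), t, 400)[0]

t, x0, v0, eps = 0.7, 0.5, 1.3, 1e-5
X, V = rk4(prolonged, (x0, v0), t, 400)
dPhi_dx = (Phi(t, x0 + eps) - Phi(t, x0 - eps)) / (2 * eps)
assert abs(X - Phi(t, x0)) < 1e-9       # X-part is the Phi-flow
assert abs(V - v0 * dPhi_dx) < 1e-6     # V-part is v_s dPhi_k/dx_s, as in (168)
```

The second assertion is exactly (168): V(t, x, v) solves the variational (linearized) equation along Φ, which is what v ∂Φ/∂x solves as well.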
For any real number t and any function ψ(x, v), define a new function
ψt as follows:
(169)   ψt(x, v) = ψ(X(t, x, v), V(t, x, v)).
Thus the flow (168) generates a flow on the space of functions ψ(x, v),
given by (169).
Lemma 149. The set C, defined in (??), is invariant under the flow
given by (169).
Proof. Let g ∈ C1(Td) be such that ψ(x, v) = vi ∂g/∂xi (x). Let gt
denote the flow by Φ of the function g, i.e., gt(x) = g(Φ(t, x)). We
claim that for any real number t we have
ψt(x, v) = vi ∂gt/∂xi (x),
where ψt is given by (169). Indeed,
ψt(x, v) = Vk(t, x, v) ∂g/∂xk |X(t,x,v) = vs ∂g/∂xk |Φ(t,x) ∂Φk/∂xs (t, x)
= vs ∂/∂xs (g(Φ(t, x))) = vs ∂gt/∂xs (x),
and so the Lemma is proven.
The flow on functions (169) generates a flow on measures, (t, µ) ↦ µt,
where
(170)   ∫ ψ dµt = ∫ ψt dµ.
Lemma 150. The flow (170) preserves the holonomy constraint.
Proof. Let µ be a holonomic measure. We have to prove that µt
is also holonomic, i.e., that ∫ ψ dµt = 0 for any ψ ∈ C. This is clear
since the flow (169) preserves the set C.
Theorem 151. Let µ be a minimizing measure for the action (135),
subject to the holonomy constraint. Then for any C1 vector field ξ :
Td → Rd we have
(171)   ∫ [ ∂L/∂xs ξs + ∂L/∂vs vk ∂ξs/∂xk ] dµ = 0.

Proof. Let µt be the flow generated from µ by (170). Relation
(171) expresses the fact that
d/dt ( ∫ L(x, v) dµt ) |t=0 = 0.
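The identity behind (171) can be illustrated with a discrete measure. The sketch below uses hypothetical atoms, the sample Lagrangian L(x, v) = v²/2 − cos x, and the linear field ξ(x) = x, chosen because its flow Φ(t, x) = eᵗx (hence V = eᵗv) is explicit; it checks that d/dt ∫ L dµt at t = 0 equals the integrand of (171). For a minimizing measure this common value would be zero.

```python
import math

# Sample Lagrangian and its partial derivatives (illustrative choice).
L   = lambda x, v: 0.5 * v * v - math.cos(x)
L_x = lambda x, v: math.sin(x)
L_v = lambda x, v: v
# Vector field xi(x) = x: its flow is Phi(t, x) = e^t x, so the
# prolonged flow (168) is (X, V) = (e^t x, e^t v).
xi, dxi = (lambda x: x), (lambda x: 1.0)

# A discrete "measure": equally weighted atoms at (x_i, v_i).
atoms = [(0.2, 1.0), (-1.1, 0.4), (0.9, -0.7)]

def action(t):
    """Integral of L against mu_t, pushed forward by the prolonged flow."""
    return sum(L(math.exp(t) * x, math.exp(t) * v) for x, v in atoms) / len(atoms)

# Left-hand side of (171) for this measure and vector field.
lhs = sum(L_x(x, v) * xi(x) + L_v(x, v) * v * dxi(x)
          for x, v in atoms) / len(atoms)

dt = 1e-5
deriv = (action(dt) - action(-dt)) / (2 * dt)   # d/dt of the action at t = 0
assert abs(deriv - lhs) < 1e-8
```

This discrete measure is of course not holonomic or minimizing; the point is only that the t-derivative of the transported action is exactly the expression appearing in (171).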
4.3. Invariance. In this section we present a new proof of the
invariance under the Euler-Lagrange flow of minimal holonomic mea-
sures.
In what follows, ( )−1js denotes the j, s entry of the inverse matrix. We
will only use this notation for symmetric matrices; thus it will not
lead to any ambiguity. Before stating and proving the main theorem
of this section, we prove an auxiliary lemma.
Lemma 152. Let µ be a minimal holonomic measure. Let vε(x) be
any smooth function. Let φ(x, v) be any smooth compactly supported
function. Then
(172)
∫ [ vk ∂φ/∂xk (x, vε(x)) + ∂φ/∂vj (x, vε(x)) (∂²L/∂v²)−1js (x, vε(x)) ( ∂L/∂xs (x, v) − vk ∂²L/∂xk∂vs (x, vε(x)) ) ] dµ
= ∫ vk ∂/∂xk ( φ(x, vε(x)) ) dµ − ∫ vk ∂/∂xk ( ∂L/∂vs (x, vε(x)) Xεs ) dµ
+ ∫ vk ( ∂L/∂vs (x, vε(x)) − ∂L/∂vs (x, v) ) ∂/∂xk ( Xεs ) dµ,
where Xεs is a function of x only (it does not depend on v), defined by
Xεs(x) = ∂φ/∂vj (x, vε(x)) (∂²L/∂v²)−1js (x, vε(x)).
Remark. We will only use this lemma in the case when vε is the
standard smoothing of the function v(x), that is, vε = ηε ∗ v, where ηε
is a standard mollifier. The function v(x) is the function whose graph
contains the support of µ, given in Theorem ??. This explains the
notation vε.
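For later use, recall the two properties of mollification that the proof of Theorem 153 relies on: |vε − v| ≤ Lip(v) ε and |Dvε| ≤ Lip(v). A numerical sketch (triangle kernel, midpoint quadrature, and the test function v(x) = |sin x| are all illustrative choices):

```python
import math

def mollify(v, x, eps, n=400):
    """v_eps(x) = integral of eta_eps(y) v(x - y) dy, with the triangle
    kernel eta_eps(y) = (1/eps) max(0, 1 - |y/eps|), via the midpoint rule."""
    h = 2 * eps / n
    total = 0.0
    for i in range(n):
        y = -eps + (i + 0.5) * h
        total += max(0.0, 1.0 - abs(y / eps)) / eps * v(x - y) * h
    return total

v = lambda x: abs(math.sin(x))   # Lipschitz with constant 1, kinks at multiples of pi
eps = 0.05
for x in [0.0, 0.3, 1.5, 3.1]:
    # |v_eps - v| <= Lip(v) * eps, since the kernel is supported in [-eps, eps]
    assert abs(mollify(v, x, eps) - v(x)) <= eps + 1e-9
    # difference quotients of v_eps stay bounded by Lip(v) = 1
    d = (mollify(v, x + 1e-2, eps) - mollify(v, x - 1e-2, eps)) / 2e-2
    assert abs(d) <= 1.0 + 1e-4
```

Both bounds follow from writing vε as an average of translates of v: averaging cannot increase the Lipschitz constant, and each translate moves v by at most Lip(v) ε.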
Proof. This Lemma is based on Theorem 151. In this proof and
below, vε stands for the function vε(x). We have
vk ∂φ/∂xk (x, vε(x)) = vk ∂/∂xk ( φ(x, vε(x)) ) − vk ∂φ/∂vj (x, vε(x)) ∂vεj/∂xk (x).
Rewrite the last term:
vk ∂φ/∂vj (x, vε) ∂vεj/∂xk (x)
= vk ∂φ/∂vj (x, vε) (∂²L/∂v²)−1js (x, vε) ∂²L/∂vs∂vq (x, vε) ∂vεq/∂xk (x)
= vk Xεs(x) ∂²L/∂vs∂vq (x, vε) ∂vεq/∂xk (x).
Plugging these two identities into (172), we reduce (172) to
(173)
∫ Xεs(x) ( ∂L/∂xs (x, v) − vk ( ∂²L/∂xk∂vs (x, vε) + ∂²L/∂vs∂vq (x, vε) ∂vεq/∂xk ) ) dµ
= −∫ vk ∂/∂xk ( ∂L/∂vs (x, vε) Xεs ) dµ
+ ∫ vk ( ∂L/∂vs (x, vε) − ∂L/∂vs (x, v) ) ∂/∂xk ( Xεs ) dµ.
Using the chain rule on the left-hand side and the Leibniz rule on the
right-hand side, we further reduce (173) to
∫ Xεs ( ∂L/∂xs (x, v) − vk ∂/∂xk ( ∂L/∂vs (x, vε) ) ) dµ
= −∫ vk Xεs ∂/∂xk ( ∂L/∂vs (x, vε) ) dµ − ∫ vk ∂L/∂vs (x, v) ∂/∂xk ( Xεs ) dµ.
Noting the cancellation of the term ∫ vk Xεs ∂/∂xk ( ∂L/∂vs (x, vε) ) dµ, we see
that the last identity is equivalent to (171) with ξs(x) = Xεs(x).
Theorem 153. Let µ be a minimizing holonomic measure. Then µ is
invariant under the Euler-Lagrange flow.
Proof. By Lemma 59 we have to prove that, for any smooth com-
pactly supported function φ(x, v),
(174)
∫ [ vk ∂φ/∂xk + ∂φ/∂vj (∂²L/∂v²)−1js ( ∂L/∂xs − vk ∂²L/∂xk∂vs ) ] dµ = 0,
where ( )−1js stands for the j, s entry of the inverse matrix.
where ( )−1js stands for the j, s entry of the inverse matrix.
The idea of the proof is first to rewrite (174) in an equivalent form
and then to apply an approximation argument. Since µ is supported
on the graph v = v(x), we may replace the arguments (x, v) by (x, v(x))
in the following four functions occurring in (174): ∂φ/∂xk, ∂φ/∂vj,
(∂²L/∂v²)−1js, and ∂²L/∂xk∂vs. This gives
(175)
∫ [ vk ∂φ/∂xk (x, v(x)) + ∂φ/∂vj (x, v(x)) (∂²L/∂v²)−1js (x, v(x)) ( ∂L/∂xs (x, v) − vk ∂²L/∂xk∂vs (x, v(x)) ) ] dµ = 0.
To complete the proof of the theorem, we use Lemma 152. The first
and second integrals in the RHS of (172) are zero due to the holonomy
constraint. The third integral in the RHS of (172) tends to zero as
ε → 0, because |vε(x) − v(x)| < cε and therefore |vε(x) − v| < cε µ-a.e.,
and because Xεs is uniformly Lipschitz and hence ∂Xεs/∂xk is uniformly
bounded. Therefore the LHS of (172) tends to zero as ε → 0.
But the LHS of (172) also tends to the LHS of (175) as ε → 0.
Indeed, since v(x) is a Lipschitz vector field, we have
vε(x) → v(x) (uniformly)   and   ∂vε(x)/∂x is uniformly bounded.
Moreover, for any smooth function Ψ(x, v) we have
Ψ(x, vε(x)) → Ψ(x, v(x)) (uniformly)   and   ∂/∂x ( Ψ(x, vε(x)) ) is uniformly bounded.
Also note that for µ-almost all (x, v) we have v = v(x). Therefore the
Theorem is proven.
5. Monge-Kantorowich problem
In this section we are going to study the Monge-Kantorowich prob-
lem. First we will show that there exists a solution.
Theorem 154. Let µ± be two probability measures on Rn with
∫Rn |x|² dµ± < ∞.
Then there exists a measure µ which minimizes
(1/2) ∫R2n |x − y|² dµ(x, y)
over all probability measures µ on R2n satisfying µ|x = µ+ and
µ|y = µ−.

Remark. The integrability condition ∫Rn |x|² dµ± < ∞ can be relaxed;
see, for instance, [Vil03a].
Proof. Let µn be a minimizing sequence, that is,
∫R2n (1/2)|x − y|² dµn → infµ ∫R2n (1/2)|x − y|² dµ.
Since each µn satisfies µn|x = µ+ and µn|y = µ−, we have
supn ∫R2n |x|² + |y|² dµn < ∞.
Consequently, the sequence µn is precompact; that is, through a sub-
sequence, µn ⇀ µ for some measure µ with the same marginals. Let
ck(x, y) be a sequence of compactly supported continuous functions
such that ck(x, y) increases pointwise to (1/2)|x − y|². Then, by the
monotone convergence theorem,
(1/2) ∫R2n |x − y|² dµ = limk→∞ ∫R2n ck(x, y) dµ
= limk→∞ limn→∞ ∫R2n ck(x, y) dµn
≤ limn→∞ ∫R2n (1/2)|x − y|² dµn,
from which we conclude that µ is a minimizer.
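When µ+ and µ− are uniform measures on the same number of atoms, the minimization becomes finite: by Birkhoff's theorem the optimal coupling may be sought among permutation couplings. A small sketch with hypothetical point data; in one dimension with quadratic cost, the monotone (sorted) matching is optimal:

```python
import itertools

# mu+ uniform on xs, mu- uniform on ys (illustrative data, already sorted).
xs = [0.0, 1.0, 3.0]
ys = [0.5, 2.5, 2.9]

def cost(perm):
    """Transport cost (1/2)|x - y|^2 of the coupling x_i -> y_{perm(i)}."""
    return sum(0.5 * (x - ys[j]) ** 2 for x, j in zip(xs, perm)) / len(xs)

# The minimum over all couplings is attained at a permutation (Birkhoff).
best = min(itertools.permutations(range(len(ys))), key=cost)

# In 1-D with quadratic cost the optimal map is monotone: identity here.
assert best == (0, 1, 2)
```

Brute-force enumeration is only feasible for tiny n, of course; it merely makes the existence statement of Theorem 154 concrete in the discrete setting, where compactness is trivial.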
Exercise 182. Show that the dual of the Monge-Kantorowich problem
consists in determining continuous functions φ(x) and ψ(y) such that
φ(x) + ψ(y) ≤ (1/2)|x − y|²
and that maximize
∫Rn φ(x) dµ+(x) + ∫Rn ψ(y) dµ−(y).
Let φ and ψ be two admissible functions, that is,
φ(x) + ψ(y) ≤ (1/2)|x − y|².
Then
φ(x) − |x|²/2 + ψ(y) − |y|²/2 ≤ −x·y,
that is,
φ̄(x) + ψ̄(y) ≥ x·y,
with φ̄(x) = |x|²/2 − φ(x) and ψ̄(y) = |y|²/2 − ψ(y). On the other hand,
∫Rn φ(x) dµ+(x) + ∫Rn ψ(y) dµ−(y)
= −∫Rn φ̄(x) dµ+(x) − ∫Rn ψ̄(y) dµ−(y)
+ ∫Rn |x|²/2 dµ+(x) + ∫Rn |y|²/2 dµ−(y).
Let
Θ(φ̄, ψ̄) = −∫Rn φ̄(x) dµ+(x) − ∫Rn ψ̄(y) dµ−(y).
Let
ψ̄*(x) = supy [x·y − ψ̄(y)]
be the Legendre transform of ψ̄. The pair (ψ̄*, ψ̄) is still admissible
and satisfies
Θ(φ̄, ψ̄) ≤ Θ(ψ̄*, ψ̄).
Applying a similar reasoning to the pair (ψ̄*, ψ̄), and replacing ψ̄(y) by
ψ̄**(y) = supx [x·y − ψ̄*(x)],
we obtain
Θ(ψ̄*, ψ̄) ≤ Θ(ψ̄*, ψ̄**).
Therefore, the dual of the Monge-Kantorowich problem is equivalent
to minimizing
∫Rn ψ̄*(x) dµ+(x) + ∫Rn ψ̄**(y) dµ−(y)
over convex conjugate functions satisfying
ψ̄*(x) + ψ̄**(y) ≥ x·y.
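The conjugation step can be emulated on a grid. The sketch below (hypothetical ψ̄ and grid, discrete maxima in place of suprema) computes the conjugate and double conjugate and checks both the constraint ψ̄*(x) + ψ̄**(y) ≥ x·y, which holds by the definition of the conjugate, and the improvement ψ̄** ≤ ψ̄:

```python
# Discrete convex (Legendre) conjugate on a grid: f*(x) = max_y [x*y - f(y)].
grid = [-2.0 + 0.05 * i for i in range(81)]

def conjugate(f):
    return {x: max(x * y - f[y] for y in grid) for x in grid}

psi = {y: abs(y) + 0.3 * y * y for y in grid}   # an illustrative psi-bar
psi_s  = conjugate(psi)                          # psi-bar^*
psi_ss = conjugate(psi_s)                        # psi-bar^{**}

for x in grid:
    for y in grid:
        # the admissibility constraint holds automatically after conjugation
        assert psi_s[x] + psi_ss[y] >= x * y - 1e-12
for y in grid:
    # double conjugation can only decrease the function: psi^{**} <= psi-bar
    assert psi_ss[y] <= psi[y] + 1e-12
```

Both inequalities are exact consequences of the max: f*(x) ≥ x·y − f(y) for every y gives the constraint, and x·y − f*(x) ≤ f(y) gives f** ≤ f, which is what makes replacing (φ̄, ψ̄) by (ψ̄*, ψ̄**) an improvement of Θ.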
Bibliography
[AKN97] V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt. Mathematical aspects of classical and celestial mechanics. Springer-Verlag, Berlin, 1997. Translated from the 1985 Russian original by A. Iacob. Reprint of the original English edition from the series Encyclopaedia of Mathematical Sciences [Dynamical systems. III, Encyclopaedia Math. Sci., 3, Springer, Berlin, 1993].
[Arn66] V. Arnold. Sur la géométrie différentielle des groupes de Lie de dimension infinie et ses applications à l'hydrodynamique des fluides parfaits. Ann. Inst. Fourier (Grenoble), 16(fasc. 1):319–361, 1966.
[Bar94] Guy Barles. Solutions de viscosité des équations de Hamilton-Jacobi. Springer-Verlag, Paris, 1994.
[BCD97] Martino Bardi and Italo Capuzzo-Dolcetta. Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Systems & Control: Foundations & Applications. Birkhäuser Boston Inc., Boston, MA, 1997. With appendices by Maurizio Falcone and Pierpaolo Soravia.
[EG92] L. C. Evans and R. F. Gariepy. Measure theory and fine properties of functions. CRC Press, Boca Raton, FL, 1992.
[EG01] L. C. Evans and D. Gomes. Effective Hamiltonians and averaging for Hamiltonian dynamics. I. Arch. Ration. Mech. Anal., 157(1):1–33, 2001.
[Eva98a] L. C. Evans. Partial differential equations. American Mathematical Society, Providence, RI, 1998.
[Eva98b] Lawrence C. Evans. Partial differential equations, volume 19 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 1998.
[Eva99] Lawrence C. Evans. Partial differential equations and Monge-Kantorovich mass transfer. In Current developments in mathematics, 1997 (Cambridge, MA), pages 65–126. Int. Press, Boston, MA, 1999.
[Fra02] Joel N. Franklin. Methods of mathematical economics, volume 37 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002. Linear and nonlinear programming, fixed-point theorems. Reprint of the 1980 original.
[FS93] Wendell H. Fleming and H. Mete Soner. Controlled Markov processes and viscosity solutions, volume 25 of Applications of Mathematics. Springer-Verlag, New York, 1993.
[Gia83] Mariano Giaquinta. Multiple integrals in the calculus of variations and nonlinear elliptic systems, volume 105 of Annals of Mathematics Studies. Princeton University Press, Princeton, NJ, 1983.
[Gia93] Mariano Giaquinta. Introduction to regularity theory for nonlinear elliptic systems. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, 1993.
[Gol80] H. Goldstein. Classical mechanics. Addison-Wesley Publishing Co., Reading, Mass., second edition, 1980. Addison-Wesley Series in Physics.
[Gom00] D. Gomes. Viscosity Solutions and Asymptotics for Hamiltonian Systems. Ph.D. Thesis, University of California at Berkeley, 2000.
[Gom02a] D. Gomes. A stochastic analogue of Aubry-Mather theory. Nonlinearity, 15(3):581–603, 2002.
[Gom02b] Diogo Aguiar Gomes. A stochastic analogue of Aubry-Mather theory. Nonlinearity, 15(3):581–603, 2002.
[GSS08] D. Gomes, A. Sernadas, and C. Sernadas. Foundations and applications of linear optimization. Preprint, 2008.
[GT01] David Gilbarg and Neil S. Trudinger. Elliptic partial differential equations of second order. Classics in Mathematics. Springer-Verlag, Berlin, 2001. Reprint of the 1998 edition.
[Lio82] Pierre-Louis Lions. Generalized solutions of Hamilton-Jacobi equations. Pitman (Advanced Publishing Program), Boston, Mass., 1982.
[LL76] L. D. Landau and E. M. Lifshitz. Course of theoretical physics. Vol. 1: Mechanics. Pergamon Press, Oxford, third edition, 1976. Translated from the Russian by J. B. Sykes and J. S. Bell.
[Mat91] J. N. Mather. Action minimizing invariant measures for positive definite Lagrangian systems. Math. Z., 207(2):169–207, 1991.
[Mn96] Ricardo Mañé. Generic properties and problems of minimizing measures of Lagrangian systems. Nonlinearity, 9(2):273–310, 1996.
[Oli98] Waldyr Oliva. Geometric Mechanics. IST Lecture Notes, Lisbon, 1998.
[Vil] C. Villani. Optimal transportation, dissipative PDEs and functional inequalities.
[Vil03a] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.
[Vil03b] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.
Index
Campanato space, 172
canonical transformation, 83
Christoffel symbol, 61
coercivity in Rn, 11
condition
  Legendre-Hadamard, 134
conjugate point, 97
connection
  compatible with the metric, 66
  Levi-Civita, 66
  symmetric, 65
convex, 13
  strictly, 13
critical point, 14, 112
critical point of the action, 51
curvature
  sectional, 101
curvature tensor, 99
derivative
  covariant, 65
Dynamic programming principle, 188
equation
  Euler-Lagrange, 51
  Monge-Ampère, 241
  Poisson, 130
equations
  Hamilton, 81
Euler-Lagrange equation, 129
generating function, 84
Harnack inequality, 166
invariant
  Poincaré-Cartan, 82
Karush-Kuhn-Tucker (KKT) conditions, 38
Legendre transform, 76
Lemma
  John-Nirenberg, 164
lower semicontinuity, 12
minimax principle, 28
minimizing sequence, 10
Morrey space, 172
Palais-Smale condition, 113
parallel transport, 64
Poisson manifold, 121
problem
  Monge-Kantorowich, 236
quasiconvex, 140
regular point, 212
semiconcave, 200
semiconvex, 200
subdifferential, 23, 197
subsolution, 159
superdifferential, 197
supersolution, 159
symplectic manifold, 121
Theorem
  DeGiorgi-Nash-Moser, 166
  Fenchel-Legendre-Rockafellar, 242
  Lax-Milgram, 150
torsion, 65
viscosity solution, 216
viscosity supersolution/subsolution, 215
weakly lower semicontinuity, 138
weakly lower semicontinuity, 138