
CLASSICAL MECHANICS

UNIVERSITY OF ZURICH

INSTITUTE FOR MATHEMATICS

SPRING 2014

PROF. DR. ALBERTO S. CATTANEO AND NIMA MOSHAYEDI

Update: August 4, 2014


Abstract. This script was written for the course "Classical Mechanics" at the University of Zurich, given by Professor Alberto S. Cattaneo in the spring semester of 2014. I want to thank Professor Cattaneo for giving me his lecture notes and for his corrections and remarks. This script is meant only as a set of notes that collects the definitions and main results in compact form; it should not replace the lecture. Not every detail is written out, so one should either read the script alongside a book on Classical Mechanics or use it in parallel with a lecture on the subject. The course also gives an introduction to smooth manifolds and combines the mathematical methods of differentiable manifolds with those of Classical Mechanics.

Nima Moshayedi, August 4, 2014


Contents

Part 1. From Newton’s Laws to Lagrange’s equations
  1. Introduction
  2. Elements of Newtonian Mechanics
    2.1. Newton’s Apple
    2.2. Energy Conservation
    2.3. Phase Space
    2.4. Newton’s Vector Law
    2.5. Pendulum
    2.6. The Virial Theorem
    2.7. Use of Hamiltonian as a Differential equation
    2.8. Generic Structure of One-Degree-of-Freedom Systems
  3. Calculus of Variations
    3.1. Functionals and Variations
    3.2. Extremals
    3.3. Shortest Path
    3.4. Multiple Functions
    3.5. Symmetries and Conservation Laws
  4. The Action principle
    4.1. Coordinate-Invariance of the Action Principle
    4.2. Central Force Field Orbits
    4.3. Systems with Constraints and Lagrange Multipliers

Part 2. Differential forms
  5. Introduction
  6. Notations
  7. Definitions
  8. Properties

Part 3. Hamiltonian systems
  9. Introduction
  10. Legendre Transform
    10.1. Derivatives and Convexity
    10.2. Involution
    10.3. Total Differential of Legendre Transform
    10.4. Local Legendre Transformation
    10.5. Multivariable Case
  11. Canonical Equations
    11.1. Hamiltonian Function
    11.2. Canonical Action Principle
    11.3. Previous Examples in Canonical Form
  12. The Poisson bracket
    12.1. Constants of motion
    12.2. The Poisson bracket in coordinate-free language

Part 4. Symplectic Integrators
  13. Introduction
  14. The Euler method
  15. Hamiltonian systems

Part 5. The Noether Theorem
  16. Introduction
  17. Symmetries in Lagrangian mechanics
    17.1. Symmetries and the Lagrangian function
    17.2. Examples
    17.3. Generalized symmetries
  18. From the Lagrangian to the Hamiltonian formalism
  19. Symmetries in Hamiltonian mechanics
    19.1. Symplectic geometry
    19.2. The Kepler problem

Part 6. The Hamilton-Jacobi equation
  20. Introduction
  21. The Hamilton–Jacobi equation
  22. The action as a function of endpoints
    22.1. Hamiltonian systems
  23. Solving the Cauchy problem for the Hamilton–Jacobi equation
  24. Generating functions

Part 7. Introduction to Differentiable manifolds
  25. Introduction
  26. Manifolds
  27. Maps
    27.1. Submanifolds
  28. Topological manifolds
  29. Differentiable manifolds
    29.1. The tangent space
    29.2. The tangent bundle
  30. Vector bundles
    30.1. Constructions on vector bundles
    30.2. Differential forms
  31. Applications to mechanics
    31.1. The Noether 1-form
    31.2. The Legendre mapping
    31.3. The Liouville 1-form
    31.4. Symplectic geometry

Appendix A. Topology
Appendix B. Derivations
  B.1. Vector fields as derivations
Appendix C. The Levi-Civita Symbol
  C.1. Definition
  C.2. Relation to the vector product
  C.3. Einstein notation
  C.4. Identities for products of Levi-Civita symbols

References


Part 1. From Newton’s Laws to Lagrange’s equations

1. Introduction

Classical mechanics is a very peculiar branch of physics. It used to be considered the sum total of our theoretical knowledge of the physical universe (Laplace’s daemon, the Newtonian clockwork), but now it is known to be an idealization, a toy model if you will. Classical mechanics still describes the world very well within its range of validity, which includes our everyday experience, so it remains an indispensable part of any physicist’s or engineer’s education. It is so useful because the more accurate theories that we know of (general relativity and quantum mechanics) generally correct classical mechanics only in extreme situations (black holes, neutron stars, atomic structure, superconductivity, and so forth). Given that GR and QM are much harder theories to use and apply, it is no wonder that scientists revert to classical mechanics whenever possible. So, what is classical mechanics?

2. Elements of Newtonian Mechanics

In the title, classical means that there are no quantum effects. The simplest mechanical system is a mass point, which is a single point moving in space that has a finite mass m attached to it. You can think of the matter field belonging to a mass point as a delta function in space: an infinitely concentrated, featureless lump of matter. The equation of motion for the mass point comes from physics and is expressed by Newton’s law:

\[ \text{force} = \text{mass} \times \text{acceleration}, \qquad F = ma. \]

Our first mass point system comes right at the beginning of mechanics: it is Newton’s apple.

2.1. Newton’s Apple. Newton’s apple is a mass point with mass m and vertical position z(t) at time t. Here z is a Cartesian coordinate pointing upwards. From physics it is known that the gravitational force on the apple is given by mg and points downwards. Here g is the gravitational acceleration, with typical value $9.81\ \mathrm{m\,s^{-2}}$. Thus Newton’s law for the apple is

\[ m\ddot z = -mg \iff \ddot z + g = 0, \]

where the dot denotes a time derivative. We see that the mass of the apple does not affect its motion in the gravitational field. What is the solution to the equation given above? This is a second-order ODE in time, so the solution to the initial-value problem requires specifying two initial conditions. In our case these are given by the initial position z(0) and the initial velocity $\dot z(0)$. Given these two numbers the solution to the above ODE is

\[ z(t) = z(0) + \dot z(0)\,t - \frac{g}{2}t^2, \]

which is the equation for a parabola that you may recognize. It is worth reflecting on what we have done so far. Newton’s law tells us how to evolve a mechanical system in time. More specifically, let us define the state of our system as the collection of variables that completely specifies the conditions of our system at a moment in time. This is a key definition in mechanics. In the present case we have that

\[ \text{state} = \{\text{position}, \text{velocity}\} = \{z, \dot z\} \]

because these were the initial conditions needed for our ODE. If our ODE is well posed for t ∈ [0, T] with some T > 0 then there is a unique map such that

\[ \text{state}(0) \longrightarrow \text{state}(t) \]

for all t ∈ [0, T]. Thus, in principle, the present state contains all the information needed to compute any future state; in other words the classical mechanical universe is deterministic and the future can in principle be predicted by solving a well-posed differential equation.
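For Newton’s apple the map state(0) → state(t) can be written down explicitly from the closed-form solution. The following is a minimal sketch of that map; the function name `evolve` and the sample initial values are illustrative assumptions, not part of the lecture.

```python
G_ACCEL = 9.81  # gravitational acceleration in m/s^2, the value quoted in the text


def evolve(state, t, g=G_ACCEL):
    """Map the state (z, z_dot) at time 0 to the state at time t.

    Uses the closed-form solution z(t) = z(0) + z_dot(0) t - g t^2 / 2.
    """
    z0, v0 = state
    z = z0 + v0 * t - 0.5 * g * t * t
    v = v0 - g * t
    return (z, v)


if __name__ == "__main__":
    start = (10.0, 2.0)  # hypothetical initial height (m) and velocity (m/s)
    direct = evolve(start, 1.5)
    # Evolving for 0.5 s and then for 1.0 s reaches the same state
    in_two_steps = evolve(evolve(start, 0.5), 1.0)
    print(direct, in_two_steps)
```

Evolving in two stages and evolving once over the total time give the same state up to rounding, which is the determinism described above: each intermediate state carries everything needed to continue the motion.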

2.2. Energy Conservation. To derive the energy conservation law from our ODE, we use a standard procedure: multiply the equation of motion by the velocity $\dot z$ and manipulate. This yields

\[ \dot z\,\ddot z + g\dot z = 0 \iff \frac{d}{dt}\left(\frac{\dot z^2}{2} + gz\right) = 0, \]

which is a conservation law for the energy function

\[ H(z, \dot z) = \frac{\dot z^2}{2} + gz = \text{const} = E. \]

This function is defined up to an integration constant. The conservation law says that the energy function $H(z, \dot z)$, which is called the Hamiltonian, is constant if z(t) satisfies Newton’s law. Other ways of saying the same thing are: H is an invariant of the motion, or H is a first integral of Newton’s law. The value of H along a trajectory is denoted by E and is of course known from the initial state. In physics, the $\dot z^2/2$ part of H is called the kinetic energy and the remainder is called the potential energy. The one-dimensional potential energy is always given by a function V : R → R of the given coordinate, written V(z) if the coordinate is z; its form differs from one mechanical system to another.

Definition 2.1. Energy function (one dimensional)
Let z be the position of a point mass with mass m. The kinetic energy of the particle is given by $T = \frac{1}{2}m\dot z^2$ and the potential energy is a function V : R → R, denoted V(z). Then the function

\[ H(z, \dot z) = T + V = \frac{1}{2}m\dot z^2 + V(z) \]

is called the Hamiltonian energy function of the system.
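As a quick numerical illustration of Definition 2.1, the sketch below evaluates H along the closed-form trajectory of Newton’s apple, with V = mgz, and lets one check that the value does not change in time. The helper names `hamiltonian` and `apple_energies` are hypothetical.

```python
def hamiltonian(z, z_dot, potential, m=1.0):
    """Energy function H = T + V for one degree of freedom."""
    kinetic = 0.5 * m * z_dot ** 2
    return kinetic + potential(z)


def apple_energies(z0, v0, times, m=1.0, g=9.81):
    """Evaluate H along the closed-form trajectory of Newton's apple."""
    potential = lambda z: m * g * z  # V = m g z for the apple
    energies = []
    for t in times:
        z = z0 + v0 * t - 0.5 * g * t * t
        v = v0 - g * t
        energies.append(hamiltonian(z, v, potential, m))
    return energies
```

Calling `apple_energies(10.0, 2.0, [0.0, 0.5, 1.0, 2.0])` should return four values that agree up to rounding error, namely the energy E fixed by the initial state.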


2.3. Phase Space. The dynamics of our mechanical system is best visualized in phase space, which is the space spanned by the two state coordinates z and $\dot z$. Phase space is attractive for the following reasons:

• any possible state of the system corresponds to a specific point in phase space, so phase space is also the space of all possible states;

• a solution traces out a trajectory in phase space, and these trajectories do not cross if Newton’s law is well posed;

• if the energy is conserved, then the trajectories are contained in the contours (level sets) of constant H.

The last point means that plotting contours of constant H immediately produces the solution trajectories, albeit not their parametrization with respect to time t. This is very useful, because it allows us to learn something about the solution of an equation without solving the equation! That is what mathematics is all about. For Newton’s apple we note that all trajectories are open and that there are no fixed points, i.e., no points at which $\dot z = \ddot z = 0$. By inspection, such a fixed point would correspond to a critical point of H, i.e., a point where ∇H = 0.

2.4. Newton’s Vector Law. In general a mass point moves in three-dimensional space and its position is described by the Cartesian coordinates x = (x, y, z). Newton’s law is then the vector law

\[ m\ddot{\mathbf{x}} = \mathbf{F}(\mathbf{x}, \dot{\mathbf{x}}, t) \quad \text{subject to given } \mathbf{x}(0) \text{ and } \dot{\mathbf{x}}(0). \]

Here the prescribed force vector F may in general depend on the state $\mathbf{x}, \dot{\mathbf{x}}$ and on time t. However, we will consider only energy-conserving forces, i.e. forces that derive from a time-independent scalar potential energy function V(x) via F = −∇V. For instance, the gravitational potential associated with our ODE at the beginning was V = mgz. The vector character of the above equation means that Newton’s law for a mass point corresponds to a system of three coupled ODEs in time and that the phase space is six-dimensional (two dimensions, position and velocity, for each spatial direction). Each coordinate adds a position-velocity pair to the definition of the system state. The number of coordinates is often called the number of degrees of freedom, and so the mass point has three degrees of freedom in general. In our apple example, the two degrees of freedom related to the horizontal directions were irrelevant because Newton’s law reduced to $\ddot x = \ddot y = 0$ in them. Therefore, the system reduced to a single degree of freedom. Another kind of reduction occurs through kinematical constraints, as the next example shows.

2.5. Pendulum. A pendulum consists of a mass point with mass m attached to a rod with length l. The pendulum lies in the xz-plane and is fixed at the coordinate origin x = 0. There are two degrees of freedom for the position of the mass point, namely, x(t) and z(t), but they have to satisfy the constraint

\[ x^2 + z^2 = l^2 \]

at all times. This can be used to eliminate one degree of freedom from consideration. Indeed, using the angle φ such that x = l sin φ and z = −l cos φ we will derive the equation

\[ \ddot\phi + \frac{g}{l}\sin\phi = 0, \]

where g is gravity as before. This nonlinear equation is harder to solve than the one before, but fortunately we can get most information about the solution from the Hamiltonian function. The state of the constrained system is described by the phase space coordinates $\phi, \dot\phi$, and energy conservation is derived exactly as before and yields the pendulum Hamiltonian

\[ H(\phi, \dot\phi) = \frac{\dot\phi^2}{2} - \frac{g}{l}\cos\phi. \]

Clearly finding the potential energy amounts to setting the force in Newton’s law equal to $-\frac{dV}{d\phi}$ and integrating. In systems with more than one degree of freedom this works only (locally) if the vector force F(x) satisfies the integrability condition ∇ × F = 0. For small-amplitude oscillations |φ| ≪ 1 and therefore the potential energy term can be approximated by the first few terms of its Taylor series. Keeping only the first nonconstant term leads to the linear harmonic oscillator equations

\[ H = \frac{\dot\phi^2}{2} + \frac{g}{l}\frac{\phi^2}{2} \quad\text{and}\quad \ddot\phi + \frac{g}{l}\phi = 0, \]

with the simple general solution φ = A cos(Ωt) + B sin(Ωt), where the frequency is $\Omega = \sqrt{g/l}$. In phase space the contours of this H are ellipses and therefore all orbits are bounded, which is consistent with the small-amplitude, low-energy approximation.
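To see how good the small-amplitude approximation is, one can integrate the nonlinear pendulum equation numerically and compare with A cos(Ωt) + B sin(Ωt). The sketch below uses the classical Runge-Kutta scheme, which is a choice made here for illustration and not a method taken from the lecture; the value g/l = 9.81 corresponds to an assumed rod length of one metre.

```python
import math


def pendulum_rhs(phi, omega, g_over_l):
    """Right-hand side of phi'' + (g/l) sin(phi) = 0 as a first-order system."""
    return omega, -g_over_l * math.sin(phi)


def rk4_step(phi, omega, dt, g_over_l):
    """Advance (phi, phi_dot) by one classical Runge-Kutta step."""
    k1p, k1w = pendulum_rhs(phi, omega, g_over_l)
    k2p, k2w = pendulum_rhs(phi + 0.5 * dt * k1p, omega + 0.5 * dt * k1w, g_over_l)
    k3p, k3w = pendulum_rhs(phi + 0.5 * dt * k2p, omega + 0.5 * dt * k2w, g_over_l)
    k4p, k4w = pendulum_rhs(phi + dt * k3p, omega + dt * k3w, g_over_l)
    phi += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    omega += dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6.0
    return phi, omega


def integrate(phi0, omega0, t_end, dt=1e-3, g_over_l=9.81):
    """Integrate the nonlinear pendulum from t = 0 to t = t_end."""
    steps = round(t_end / dt)
    phi, omega = phi0, omega0
    for _ in range(steps):
        phi, omega = rk4_step(phi, omega, dt, g_over_l)
    return phi, omega


def small_angle(phi0, omega0, t, g_over_l=9.81):
    """Linear solution phi = A cos(Omega t) + B sin(Omega t)."""
    Omega = math.sqrt(g_over_l)
    return phi0 * math.cos(Omega * t) + (omega0 / Omega) * math.sin(Omega * t)
```

For a release angle of 0.01 rad the two solutions stay very close over a few periods, whereas for a release angle of 2 rad the linear formula drifts away quickly, because the true period grows with amplitude.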

2.6. The Virial Theorem. The pendulum allows demonstrating a second useful technique for extracting some knowledge about the solution from the governing equations without solving them. The procedure is similar to finding the energy conservation law, but with the difference that this time we multiply the ODE for the pendulum by φ instead of $\dot\phi$. Also, we then time-average the equation over the interval t ∈ [0, T]. This first yields

\[ \phi\ddot\phi + \frac{g}{l}\phi\sin\phi = 0 \iff \frac{d}{dt}(\phi\dot\phi) - (\dot\phi)^2 + \frac{g}{l}\phi\sin\phi = 0 \]

and then

\[ \frac{1}{T}\,\phi\dot\phi\,\Big|_0^T + \frac{1}{T}\int_0^T \left( -(\dot\phi)^2 + \frac{g}{l}\phi\sin\phi \right) dt = 0. \]

The first term is evaluated at the endpoints of the time integral. Now, under the assumption that $\phi\dot\phi$ is bounded, this first term goes to zero as T → ∞. If we denote the time average in the limit T → ∞ by an overbar, $\overline{(\dots)}$, then we obtain the virial theorem

\[ \overline{(\dot\phi)^2} = \frac{g}{l}\,\overline{\phi\sin\phi}. \]

In general, this shows a relationship that has to be true for all motions satisfying the assumption that $\phi\dot\phi$ is bounded. In particular, for small-amplitude oscillations the left-hand side is twice the averaged kinetic energy and the right-hand side reduces to twice the averaged potential energy. This shows that for small oscillations there is equipartition of energy between its kinetic and potential forms. When does the virial theorem apply? In general, the virial theorem is guaranteed to apply if the conservation law H = E can be used to derive an a priori bound on φ and $\dot\phi$. For the pendulum this occurs if H < Ec, i.e. for the case of bounded orbits, where Ec = g/l is the critical energy of the pendulum at rest at the top (φ = π). It also applies to the case H = Ec, because of the infinite travel time between saddle points that we noted before. However, it does not apply to the high-energy revolutions H > Ec, which repeatedly spin over the top such that |φ| grows without bound.
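The virial theorem can be checked numerically on a bounded pendulum orbit. The sketch below integrates the pendulum with a Runge-Kutta scheme (an illustrative choice, as are the function name `virial_averages` and the default parameters) and returns the two time averages that the theorem equates.

```python
import math


def virial_averages(phi0, t_end=200.0, dt=0.01, g_over_l=1.0):
    """Time-average phi_dot^2 and (g/l) phi sin(phi) along one pendulum orbit.

    The orbit starts at rest at angle phi0 (|phi0| < pi, so it stays bounded).
    """
    def rhs(phi, omega):
        return omega, -g_over_l * math.sin(phi)

    phi, omega = phi0, 0.0
    steps = round(t_end / dt)
    sum_kinetic = 0.0
    sum_potential = 0.0
    for _ in range(steps):
        # Accumulate the two integrands of the virial theorem
        sum_kinetic += omega ** 2
        sum_potential += g_over_l * phi * math.sin(phi)
        # One classical Runge-Kutta (RK4) step
        k1p, k1w = rhs(phi, omega)
        k2p, k2w = rhs(phi + 0.5 * dt * k1p, omega + 0.5 * dt * k1w)
        k3p, k3w = rhs(phi + 0.5 * dt * k2p, omega + 0.5 * dt * k2w)
        k4p, k4w = rhs(phi + dt * k3p, omega + dt * k3w)
        phi += dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
        omega += dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6.0
    return sum_kinetic / steps, sum_potential / steps
```

For a finite averaging time T the two averages differ by the endpoint term φφ̇/T, so the agreement improves as `t_end` is increased rather than being exact.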

2.7. Use of Hamiltonian as a Differential equation. The invariance of the function $H(\phi, \dot\phi)$ along a solution trajectory can be exploited to aid the integration of Newton’s law. On a given trajectory the energy has value E and therefore the equation

\[ H(\phi, \dot\phi) = E \]

can be solved as a first-order ODE. This means the second-order ODE in Newton’s law has been reduced to a first-order ODE by using the first integral H = E. In the pendulum case we obtain

\[ \frac{d\phi}{dt} = \pm\sqrt{2\left(E + \frac{g}{l}\cos\phi\right)} \iff \int \frac{d\phi}{\sqrt{2\left(E + \frac{g}{l}\cos\phi\right)}} = \pm\int dt. \]

The sign is determined from the initial conditions. This reduces the solution procedure to a quadrature, which in this case can be performed using elliptic functions. For instance, this can be used to compute the period of finite-amplitude oscillations. This use of H as an ODE foreshadows the Hamilton-Jacobi PDE we will encounter later.
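As an illustration of the quadrature, the period of a pendulum released from rest at angle φ_max can be computed numerically. The sketch below is one possible way to do it, under the assumption E = −(g/l) cos φ_max; the substitution sin(φ/2) = k sin u with k = sin(φ_max/2) is introduced here to remove the square-root singularity at the turning point and turns the quadrature into a complete elliptic integral of the first kind. The function name and parameters are hypothetical.

```python
import math


def pendulum_period(phi_max, g_over_l=9.81, n=200):
    """Period of a pendulum released from rest at angle phi_max (0 < phi_max < pi).

    Energy conservation gives dt = dphi / sqrt(2 (g/l) (cos phi - cos phi_max)).
    With sin(phi/2) = k sin(u), k = sin(phi_max/2), a quarter period becomes
        sqrt(l/g) * integral_0^{pi/2} du / sqrt(1 - k^2 sin^2 u),
    which is evaluated here with the midpoint rule.
    """
    k = math.sin(0.5 * phi_max)
    h = 0.5 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += 1.0 / math.sqrt(1.0 - (k * math.sin(u)) ** 2)
    return 4.0 * total * h / math.sqrt(g_over_l)
```

For very small `phi_max` the result approaches the harmonic-oscillator period 2π/Ω, and it grows as the amplitude increases, in line with the finite-amplitude behaviour mentioned above.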

2.8. Generic Structure of One-Degree-of-Freedom Systems. We can now summarize the structure of the equations for a generic coordinate q(t) satisfying Newton’s law

\[ \ddot q + \frac{dV}{dq} = 0 \]

with potential energy function V(q). Note that we set the constant mass m = 1. The state of the system is determined by $q, \dot q$ and the Hamiltonian

\[ H(q, \dot q) = T + V = \frac{\dot q^2}{2} + V(q) \]

is conserved if q is a solution of Newton’s equation. Here T is a common shorthand for the kinetic energy. Newton’s law expresses that the particle is accelerated towards decreasing values of the potential V(q). In general, this means acceleration towards a minimum of V, should one exist. In the case of Newton’s apple there was no minimum and the push goes on forever. In the case of the pendulum, minima of V occur at φ = 0 and its 2π-periodic repetitions. Upon reaching a minimum the particle overshoots due to its kinetic energy and then it climbs the potential on the other side. If this climbing motion is reversed before a maximum of V is reached then the particle turns back towards the minimum and an oscillation is observed. This is the low-energy case of the pendulum. If the motion goes over the next maximum of V then the particle aims for the neighboring minimum. This is the high-energy case of the pendulum going over the top. A turning point of the particle corresponds to zero kinetic energy, and energy conservation implies that V(q) = E at a turning point. Depending on the value of E this equation may or may not have a solution. For a given value of the energy H = E we can obtain the first-order equation

\[ \dot q = \pm\sqrt{2(E - V(q))} \quad\text{with quadrature}\quad \int \frac{dq}{\sqrt{2(E - V(q))}} = \pm\int dt. \]

Finally, if $q\dot q$ is bounded along a trajectory, then we have the virial theorem

\[ \overline{\frac{\dot q^2}{2}} = \frac{1}{2}\,\overline{q\,\frac{dV}{dq}} \]

for time-averaging over very long intervals on this trajectory. A useful special case arises if V is a power law $V = Aq^{2m}$ with A > 0 and integer m > 0. Then $q\dot q$ remains bounded by energy conservation and the virial theorem yields

\[ \overline{\frac{\dot q^2}{2}} = \overline{T} = m\,\overline{V}. \]

For m = 1 this shows equipartition of energy, as in the linear harmonic oscillator, but in any case it shows how the energy must, on average, be partitioned between its kinetic and potential forms. Like energy conservation, this is a powerful fact about the solution q(t) that can be derived without solving the governing equations.

3. Calculus of Variations

Interesting things can be understood in at least two different ways. Mechanics can be understood from the point of view of Newton’s law: solve a certain initial value problem for an ODE that takes us from the present state at t = 0 to a future state at t = T. The true path of the coordinate q(t), say, is then a solution to this initial-value problem for the differential equation called Newton’s law. Now, there is an alternative, complementary point of view that looks at the entire path q(t) for all t ∈ [0, T] simultaneously and then gives a criterion for the true path as the solution to an optimization problem. To get to this point of view we need to recall the calculus of variations.

3.1. Functionals and Variations. Consider a function y(x) defined on the interval x ∈ [a, b]. We will assume that y is always smooth enough to make possible all differential operations that we need to carry out. (Paraphrasing Einstein: y(x) should be as smooth as necessary for the problem at hand, but not any smoother. The general theory of calculus of variations accommodates functions, and generalized functions, that are less regular than we assume.) Typically, it will be sufficient that y is twice continuously differentiable. Define the integral

\[ J[y(x)] = \int_a^b F(y, y', x)\,dx \]

in which the function F is assumed to be sufficiently smooth in all three of its variable slots to allow partial derivatives up to second order to exist. We write derivatives of F with respect to the arguments in its slots as partial derivatives. For example, if $F = x^2 y + (y')^2$ then

\[ \frac{\partial F}{\partial x} = 2xy, \quad \frac{\partial F}{\partial y} = x^2, \quad \frac{\partial F}{\partial y'} = 2y', \quad \frac{\partial^2 F}{\partial x\,\partial y} = \frac{\partial^2 F}{\partial y\,\partial x} = 2x \]

and so on. The number J depends on the whole function y(x) and we say that J is a functional of y(x); this relationship is denoted by the square brackets. Thus a functional maps a function y(x) to a single number J. This is a familiar concept, for instance the usual integral norms are functionals. The calculus of variations is concerned with the change in J if the function y(x) is subject to a small variation, i.e., if y(x) is changed by a small amount. This means that y(x) is replaced by

\[ y(x) \longrightarrow y(x) + \delta y(x), \]

where the variation δy(x) is a smooth function that is small in the sense that

\[ \|\delta y\|_\infty \ll 1 \quad\text{and}\quad \|\delta y'\|_\infty \ll 1. \]

Here the variation of y′ equals the derivative of δy, i.e.

\[ \delta y' = \frac{d}{dx}\delta y. \]

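The change in J is then also small and can be computed from the Taylor expansion of F as

\[ \begin{aligned} J[y + \delta y] - J[y] &= \int_a^b \left( F(y + \delta y, y' + \delta y', x) - F(y, y', x) \right) dx \\ &= \int_a^b \left( \delta y\,\frac{\partial F}{\partial y}(y, y', x) + \delta y'\,\frac{\partial F}{\partial y'}(y, y', x) \right) dx + o(\delta y, \delta y'). \end{aligned} \]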

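Using $\delta y' = \frac{d}{dx}\delta y$ the integral can be rewritten using integration by parts:

\[ \delta J = \int_a^b \left[ \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) \right] \delta y\,dx + \frac{\partial F}{\partial y'}\,\delta y\,\Big|_a^b. \]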

This expression is called the first variation of J around y(x) and it is usually denoted by δJ. In general, it consists of an integral part and an endpoint part. The endpoint part vanishes if the admissible variations satisfy δy(a) = δy(b) = 0. For fixed y(x) the first variation δJ is a linear functional in δy, and it plays the same role here as does the differential in ordinary calculus.
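The analogy with the differential can be made concrete with a small numerical experiment. The sketch below takes the integrand F = x²y + (y′)² used as an example earlier, together with the illustrative choices y = sin x and δy = ε·η with η = x(1 − x) on [0, 1] (these choices are assumptions made here, not taken from the lecture), and compares the difference quotient (J[y + εη] − J[y])/ε with the integral part of the first variation.

```python
import math


def midpoint_integral(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Illustrative choices (assumptions): y = sin x and eta = x (1 - x) on [0, 1]
y, dy, d2y = math.sin, math.cos, lambda x: -math.sin(x)
eta = lambda x: x * (1.0 - x)      # vanishes at both endpoints
deta = lambda x: 1.0 - 2.0 * x     # derivative of eta


def J(eps):
    """J[y + eps * eta] for the integrand F = x^2 y + (y')^2."""
    F = lambda x: x ** 2 * (y(x) + eps * eta(x)) + (dy(x) + eps * deta(x)) ** 2
    return midpoint_integral(F, 0.0, 1.0)


def first_variation():
    """Integral of [dF/dy - d/dx(dF/dy')] * eta, i.e. of (x^2 - 2 y'') * eta."""
    return midpoint_integral(lambda x: (x ** 2 - 2.0 * d2y(x)) * eta(x), 0.0, 1.0)


eps = 1e-5
finite_difference = (J(eps) - J(0.0)) / eps
```

The two numbers `finite_difference` and `first_variation()` agree up to a remainder of order ε, just as a difference quotient approaches a derivative in ordinary calculus. The endpoint term is absent because η vanishes at x = 0 and x = 1.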

3.2. Extremals. A function y(x) is an extremal of J if the first variation of J around y(x) vanishes for all δy that vanish at the endpoints, i.e., δJ = 0 for all δy such that δy(a) = δy(b) = 0. This is only possible if the factor multiplying δy in the integrand is zero everywhere. Otherwise, we could choose a variation that is zero at the endpoints but makes the integral nonzero, which leads to δJ ≠ 0. Therefore an extremal must satisfy the celebrated Euler-Lagrange equation.

Definition 3.1. Euler-Lagrange equation (EL-equation)
The EL-equation for the extremal variational problem is given by

\[ \text{EL}: \quad \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = \frac{\partial F}{\partial y}. \]

The EL-equation is typically a second-order ODE for y(x).

Remark 3.1. The boundary conditions depend on the admissible functions y(x). For instance, if y(a) and y(b) are fixed then these are the boundary conditions. If y(b) is not fixed then the vanishing of δJ for all δy implies that $\frac{\partial F}{\partial y'} = 0$ at x = b. This is called the natural boundary condition for the variational problem. An analogous statement applies at the other endpoint at x = a.

3.3. Shortest Path. A simple example is to find the shortest path between two points (xA, yA) and (xB, yB) in two-dimensional Euclidean space. Actually, we have not spoken about minima and maxima yet, and a good deal of work is needed to verify whether a given extremal corresponds to a minimum of the functional or not. Sometimes this is clear from the context, as is the case here.

We assume that the curve can be parametrized by x, with xB > xA, and then the length functional can be written as

\[ J = \int_A^B ds = \int_A^B \sqrt{dx^2 + dy^2} = \int_{x_A}^{x_B} \sqrt{1 + y'^2}\,dx \]

based on the function y(x) such that y(xA) = yA and y(xB) = yB. Thus $F = \sqrt{1 + y'^2}$ and the EL-equation is

\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0 \;\Rightarrow\; \frac{\partial F}{\partial y'} = \frac{y'}{\sqrt{1 + y'^2}} = \text{const}. \]

This implies that y′ is constant and therefore the extremal y(x) is a straight line through the two endpoints, as it should be. A variant of this problem allows the second endpoint to move freely in y at fixed xB; i.e. y(xB) is not fixed. This brings into play the natural boundary condition at this point, i.e. $\frac{\partial F}{\partial y'} = 0$ at x = xB. This implies that y′ = 0 there. Thus in this case the extremal is a horizontal straight line connecting the first point (xA, yA) to the second point (xB, yB). The y-location of the second point is itself part of the solution. Clearly, here we found the shortest path between the first point and the line x = xB.
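The fixed-endpoint version of this example is easy to probe numerically. The following sketch discretizes the length functional on a polyline and compares the straight line with perturbed paths that share the same endpoints. The function names, the sine-shaped bump and the sample endpoints are illustrative assumptions.

```python
import math


def path_length(ys, x_a, x_b):
    """Length of the polyline through equally spaced x-values with heights ys."""
    n = len(ys) - 1
    dx = (x_b - x_a) / n
    return sum(math.hypot(dx, ys[i + 1] - ys[i]) for i in range(n))


def straight_line(x_a, y_a, x_b, y_b, n=100):
    """Sample the extremal y(x): the straight line through both endpoints."""
    return [y_a + (y_b - y_a) * i / n for i in range(n + 1)]


def perturbed(ys, amplitude):
    """Add a bump that vanishes at both endpoints (an admissible variation)."""
    n = len(ys) - 1
    return [y + amplitude * math.sin(math.pi * i / n) for i, y in enumerate(ys)]
```

With the hypothetical endpoints (0, 0) and (3, 4), `path_length(straight_line(0, 0, 3, 4), 0, 3)` gives the Euclidean distance 5 up to rounding, and `perturbed(...)` with either sign of the amplitude yields a longer path, which is what one expects of a minimum.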

3.4. Multiple Functions. The EL-equations generalize easily to functionals that depend on multiple functions. Specifically, for a functional that depends on N functions $y_n(x)$ with integrand $F(y_1, y_1', \dots, y_N, y_N', x)$, the EL-equations are

\[ n = 1, \dots, N: \quad \frac{d}{dx}\left(\frac{\partial F}{\partial y_n'}\right) = \frac{\partial F}{\partial y_n}. \]

Typically, this is a system of N coupled second-order ODEs for the functions $y_n(x)$. For instance, if we use a parametric representation of the path in the form (x(τ), y(τ)) with τ ∈ [0, 1] such that

\[ (x(0), y(0)) = (x_A, y_A) \quad\text{and}\quad (x(1), y(1)) = (x_B, y_B), \]


then we naturally have a length functional that depends on N = 2 functions as

\[ J[x(\tau), y(\tau)] = \int_0^1 F(\dot x, \dot y)\,d\tau = \int_0^1 \sqrt{\dot x^2 + \dot y^2}\,d\tau. \]

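The dot denotes differentiation with respect to the parameter τ. The first variation of J is

\[ \delta J = \int_0^1 \left[ \frac{d}{d\tau}\left(\frac{-\dot x}{\sqrt{\dot x^2 + \dot y^2}}\right)\delta x + \frac{d}{d\tau}\left(\frac{-\dot y}{\sqrt{\dot x^2 + \dot y^2}}\right)\delta y \right] d\tau + \frac{\dot x\,\delta x + \dot y\,\delta y}{\sqrt{\dot x^2 + \dot y^2}}\,\Big|_0^1. \]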

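The variations δx and δy are independent and therefore the necessary conditions for an extremal are the two EL-equations, which integrate to

\[ \frac{\partial F}{\partial \dot x} = \frac{\dot x}{\sqrt{\dot x^2 + \dot y^2}} = \text{const} \quad\text{and}\quad \frac{\partial F}{\partial \dot y} = \frac{\dot y}{\sqrt{\dot x^2 + \dot y^2}} = \text{const}. \]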

These agree with the straight line condition. If the endpoints are variable then we now have an interesting new possibility in the case where the initial point (xA, yA) is fixed whilst the final point (xB(s), yB(s)) can vary along a smooth curve parametrized by s. This imposes the condition $(\delta x(1), \delta y(1)) = (x_B'(s), y_B'(s))\,ds$ on admissible endpoint variations, which implies that the critical path ends at a point (xB(s), yB(s)) such that

\[ \dot x(1)\,x_B'(s) + \dot y(1)\,y_B'(s) = 0. \]

Putting everything together we get an equation for s and therefore for the endpoint (xB(s), yB(s)). The geometric interpretation of the last equation is simple: the shortest path must intersect the curve of possible endpoints at an angle of ninety degrees. This is a general property of the shortest path between a point and a smooth curve and it is consistent with the special case xB = const that we looked at before.

3.5. Symmetries and Conservation Laws. We return to the generic N = 1 functional and define a conservation law to be a function G(y, y′, x) that is constant along extremals of the functional. In other words, G is a first integral of the EL-equations, that is

\[ \frac{dG}{dx} = \frac{\partial G}{\partial x} + \frac{\partial G}{\partial y}y' + \frac{\partial G}{\partial y'}y'' = 0 \]

if y(x) satisfies the EL-equations. Conservation laws greatly simplify the problem of finding the extremal function y(x) in the first place, and they also allow us to understand something about the nature of the extremals without knowing them in detail. It turns out that many conservation laws can be linked to continuous symmetries of the functional J relative to transformation groups applied to x and y. The most general theorem to this effect is called Noether’s theorem, but we will only use a few simple consequences of it.

For instance, the distance functional is invariant under the continuous transformation group (x, y) ↦ (x, y) + (a, b) for arbitrary a, b ∈ R (this transformation includes the endpoints and the claimed invariance is then trivial). We say that J has a continuous symmetry with respect to translations in both x and y. This reflects the homogeneity of Euclidean space. By inspection, for a general functional based on the integrand F(y, y′, x) this kind of translational symmetry is possible only if F does not depend explicitly on either x or y: i.e. we can have F(y′) only, and we saw that this led to the conservation law $\frac{\partial F}{\partial y'} = \text{const}$.

In general, a translational symmetry in either x or y can be used to write down a generic conservation law. The form of the conservation law depends on whether the symmetry refers to a dependent or an independent variable. For example, a symmetry in the dependent variable y implies F(y′, x) and this leads directly to a generic conservation law for $G_y = \frac{\partial F}{\partial y'}$ because

\[ \frac{dG_y}{dx} = \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = \frac{\partial F}{\partial y} = 0 \]

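by the EL-equations. Similarly, a symmetry in the independent variable x implies F(y, y′) and this also leads to a generic conservation law, but for the different quantity $G_x = y'\frac{\partial F}{\partial y'} - F$. This follows from

\[ \frac{dG_x}{dx} = y''\frac{\partial F}{\partial y'} + y'\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial x} - \frac{\partial F}{\partial y}y' - \frac{\partial F}{\partial y'}y'' = -\frac{\partial F}{\partial x} = 0. \]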

In the distance example both these conservation laws reduce to y′ = const. Similar conservation laws are obtained for arbitrary N ≥ 1. For instance, in the parametrized version of the distance problem we had N = 2 and the functional again had translational symmetries in x and y. However, in this case both x and y are dependent variables and therefore the associated conservation laws are simply the quantities $\frac{\partial F}{\partial \dot x}$ and $\frac{\partial F}{\partial \dot y}$.

There is an additional symmetry in the independent parametrization variable τ, but the associated conservation law for

\[ G_\tau = \dot x\frac{\partial F}{\partial \dot x} + \dot y\frac{\partial F}{\partial \dot y} - F \]

is irrelevant because $G_\tau$ is identically zero for arbitrary functions x(τ) and y(τ). This is a consequence of the fact that the parametrization of the curve is irrelevant.
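The statements about the distance example can be verified by direct substitution. The short computation below is a worked check using the integrands $F = \sqrt{1 + y'^2}$ and $F = \sqrt{\dot x^2 + \dot y^2}$ from the shortest-path discussion; nothing beyond these definitions is assumed.

```latex
% N = 1 distance functional, F = \sqrt{1 + y'^2}:
G_y = \frac{\partial F}{\partial y'} = \frac{y'}{\sqrt{1 + y'^2}},
\qquad
G_x = y'\frac{\partial F}{\partial y'} - F
    = \frac{y'^2}{\sqrt{1 + y'^2}} - \sqrt{1 + y'^2}
    = -\frac{1}{\sqrt{1 + y'^2}}.
% Both are constant along an extremal exactly when y' is constant.

% Parametrized version, F = \sqrt{\dot x^2 + \dot y^2}:
G_\tau = \dot x\frac{\partial F}{\partial \dot x}
       + \dot y\frac{\partial F}{\partial \dot y} - F
       = \frac{\dot x^2 + \dot y^2}{\sqrt{\dot x^2 + \dot y^2}}
       - \sqrt{\dot x^2 + \dot y^2} = 0.
```

The last line holds for any curve with nonvanishing velocity, whether or not it is an extremal, which is why this conservation law carries no information.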

4. The Action principle

We now return to Newton’s apple and rephrase its fate as a variational problem. Compared to the generic theory, we replace x by t, y(x) by z(t) and F by a function L to be defined as follows. The action integral is

\[ S[z(t)] = \int_0^T L(z, \dot z)\,dt = \int_0^T (T - V)\,dt = \int_0^T \left(\frac{\dot z^2}{2} - gz\right) dt, \]

where L = T − V is the Lagrangian. (Inside the integrand T denotes the kinetic energy, whereas the upper limit T is the final time.) Notice that L is not equal to H. It involves the difference of kinetic and potential energy.

The following statement is called the action principle: Newton’s law is the EL-equation for an extremal of the action integral relative to all trajectories that have a fixed initial point z(0) and a fixed terminal point z(T). The proof of this is immediate: the vanishing of the first variation implies the EL-equation

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot z} = \frac{\partial L}{\partial z} \quad\text{and}\quad \frac{\partial L}{\partial z} = -g, \;\; \frac{\partial L}{\partial \dot z} = \dot z \;\Rightarrow\; \ddot z = -g. \]


This is Newton’s law. Because the endpoints are fixed there are no further terms to consider in δS. The peculiar thing is that the boundary conditions under which the trajectory of z(t) is an extremal of S are different from the initial conditions used to solve Newton’s law. The former involve the position at both t = 0 and t = T whereas the latter involve the position and velocity simultaneously at time t = 0. The action functional S has a symmetry with respect to time because the Lagrangian L does not depend explicitly on t. This implies the conservation of

\[ \dot z\frac{\partial L}{\partial \dot z} - L = \frac{\dot z^2}{2} + gz = H(z, \dot z), \]

which is the familiar energy conservation law. Time symmetry implies energy conservation. Newton’s apple immediately generalizes to a generic one-degree-of-freedom system with coordinate q(t). We have that the action

\[ S[q] = \int_0^T L(q, \dot q)\,dt = \int_0^T (T - V)\,dt = \int_0^T \left(\frac{\dot q^2}{2} - V(q)\right) dt \]

has vanishing first variation at the true path q(t) subject to fixed q(0) and q(T). In other words, the true path q(t) is an extremal of S over the space of all admissible functions satisfying the fixed endpoint conditions. The EL-equation is

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q} \;\Rightarrow\; \ddot q + V'(q) = 0. \]

The above remains true also for time-dependent potentials V (q, t).
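The action principle for Newton’s apple can be explored with a discretized action. The sketch below approximates S on a uniform time grid, builds the parabola that solves Newton’s law for given endpoints, and perturbs it while keeping the endpoints fixed. The discretization (forward differences for the velocity, trapezoidal rule for the potential) and all function names are assumptions made for this illustration.

```python
def discrete_action(zs, dt, g=9.81):
    """Approximate S = integral of (z_dot^2 / 2 - g z) dt on a uniform time grid.

    Velocities use forward differences; the potential uses the trapezoidal rule.
    """
    action = 0.0
    for z_now, z_next in zip(zs, zs[1:]):
        velocity = (z_next - z_now) / dt
        action += (0.5 * velocity ** 2 - g * 0.5 * (z_now + z_next)) * dt
    return action


def true_path(z_start, z_end, t_end, n, g=9.81):
    """Parabola solving z'' = -g with z(0) = z_start and z(t_end) = z_end."""
    v0 = (z_end - z_start) / t_end + 0.5 * g * t_end
    dt = t_end / n
    return [z_start + v0 * (i * dt) - 0.5 * g * (i * dt) ** 2 for i in range(n + 1)]


def bump(zs, amplitude):
    """Perturb interior points only, keeping both endpoints fixed."""
    n = len(zs) - 1
    return [z + amplitude * i * (n - i) / n ** 2 for i, z in enumerate(zs)]
```

For this particular Lagrangian, which is quadratic in the velocity and linear in z, the perturbed paths have a larger discrete action than the parabola for either sign of the amplitude, so here the extremal is in fact a minimum. The action principle itself only asserts that the first variation vanishes.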

4.1. Coordinate-Invariance of the Action Principle. There are a number of distinct properties of the action principle that make it attractive to use. First, the generic EL-equations are invariant under arbitrary coordinate transformations. This means that if we know that q(t) satisfies the generic EL-equation

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q} \]

then any other coordinate Q(t) such that q = f(Q) for some function f also satisfies the generic EL-equation based on the transformed Lagrangian

\[ \tilde L(Q, \dot Q, t) = L(f(Q), f'(Q)\dot Q, t). \]

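This transformed Lagrangian follows directly from the variable substitution and using $\dot q = f'(Q)\dot Q$. The partial derivatives

\[ \frac{\partial \tilde L}{\partial Q} = \frac{\partial L}{\partial q}f'(Q) + \frac{\partial L}{\partial \dot q}f''(Q)\dot Q \quad\text{and}\quad \frac{\partial \tilde L}{\partial \dot Q} = \frac{\partial L}{\partial \dot q}f'(Q) \]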

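combine to yield

\[ \frac{d}{dt}\left(\frac{\partial \tilde L}{\partial \dot Q}\right) = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q}\right)f' + \frac{\partial L}{\partial \dot q}f''\dot Q = \frac{\partial L}{\partial q}f' + \frac{\partial L}{\partial \dot q}f''\dot Q = \frac{\partial \tilde L}{\partial Q}. \]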

We have indeed obtained the generic EL-equation for Q. Of course, the final ODE for Q(t) will differ from that for q(t). The point is that the procedure that leads to the ODE is the generic EL-equation, which is always formed in the same way. A second property of the action principle is that, in conservative force fields, it generalizes easily to a particle moving in three dimensions. Then Newton’s law is the vector law

\[ \ddot{\mathbf{x}} + \nabla V(\mathbf{x}) = 0 \]

for the particle location x = (x, y, z). However, the action S remains a scalar, with Lagrangian L given by

\[ L(\mathbf{x}, \dot{\mathbf{x}}) = \frac{1}{2}|\dot{\mathbf{x}}|^2 - V(\mathbf{x}). \]

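The action principle then relates to independent variations of x(t), y(t) and z(t). The corresponding EL-equations are the three coupled ODEs

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot x} = \frac{\partial L}{\partial x}, \quad \frac{d}{dt}\frac{\partial L}{\partial \dot y} = \frac{\partial L}{\partial y}, \quad \frac{d}{dt}\frac{\partial L}{\partial \dot z} = \frac{\partial L}{\partial z}. \]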

This is already easy to solve in Cartesian coordinates, but the key point is that these equations remain valid in arbitrary curvilinear coordinates (q1, q2, q3). This is because of the coordinate-invariance of the EL-equations that was already noted.

The same is not true for Newton’s law, which needs to be reformulated in curvilinear coordinates according to the rules of vector calculus, usually a tedious task. An example will make this clear.

4.2. Central Force Field Orbits. Consider a very heavy point mass sitting at the origin of a three-dimensional Cartesian coordinate system. A very light particle is orbiting it with position x(t). We neglect the motion of the heavy mass and want to compute the motion of the light mass, which we call our particle. This is a model for Earth orbiting the sun, for example.

The heavy mass creates a gravitational field that depends only on the distance r from the origin. It acts like a potential energy V(r) on the particle. Specifically, in Newton’s theory of gravitation the potential is given by

\[ V(r) = -\frac{G}{r}, \]

where G > 0 is a constant. This pulls the particle towards the origin all the time, with a central force that is proportional to $1/r^2$. Now, if $z(0) = \dot z(0) = 0$ then one can easily show that the particle will never leave the xy-plane. We orient the coordinates such that this is the case and now have a system with two degrees of freedom for the particle, say polar coordinates with radius r(t) and angle θ(t) such that x = r cos θ and y = r sin θ.

A naive application of Newton’s laws would be to write

\[ \ddot r + V'(r) = 0 \quad\text{and}\quad \ddot\theta = 0, \]

which captures the central force and the correct fact that there is no force in the azimuthal direction. However, these equations lead to uniform angular motion θ = A + Bt and to finite-time collapse of the particle onto r = 0. Both conclusions are clearly wrong: indeed, you would not be reading this if these equations were true.

The error was, of course, that Newton’s law takes different forms in Cartesian and polar coordinates. Rather than using vector calculus, we correct our mistake by using the scalar action principle. Here all that is needed is to find the correct expression for the kinetic energy T in polar coordinates. This is easily done based on the Euclidean length element ds, which is given by

\[ ds^2 = dx^2 + dy^2 = dr^2 + r^2\,d\theta^2 \]

in Cartesian and polar coordinates. This elementary use of the metric is all we need to know about the nature of polar coordinates. The kinetic energy of the particle is

\[ T = \frac{1}{2}\left(\frac{ds}{dt}\right)^2 = \frac{1}{2}(\dot x^2 + \dot y^2) = \frac{1}{2}(\dot r^2 + r^2\dot\theta^2) \]

and therefore the Lagrangian is

\[ L(r, \dot r, \dot\theta) = \frac{1}{2}(\dot r^2 + r^2\dot\theta^2) - V(r). \]

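We could now try to solve the two EL-equations

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot r} = \frac{\partial L}{\partial r} \quad\text{and}\quad \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} = \frac{\partial L}{\partial \theta}. \]

However, there is a shortcut and as mathematicians we know that taking shortcuts is always worth it, no matter how difficult it is to actually do it. In this case the shortcut comes from the symmetries of S.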

The present S is symmetric with respect to time t and angle θ, i.e., it is symmetric with respect to continuous translations in time and continuous rotations in space. By Noether’s theorem there are two corresponding conserved quantities. The time symmetry gives the usual conserved energy

\[ H = \dot r\frac{\partial L}{\partial \dot r} + \dot\theta\frac{\partial L}{\partial \dot\theta} - L = \frac{1}{2}(\dot r^2 + r^2\dot\theta^2) + V(r) = E \]

and the rotational symmetry gives a second conserved quantity via

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} = 0 \;\Rightarrow\; \frac{\partial L}{\partial \dot\theta} = r^2\dot\theta = M. \]

This quantity is called the angular momentum in physics and it is conserved [2] for all central potentials V(r). The presence of two conserved quantities allows us to integrate our system with two degrees of freedom by a sequence of quadratures. First, we use $\dot\theta = M/r^2$ to eliminate $\dot\theta$ from the energy law, i.e., we pass from

\[ H(r, \dot r, \dot\theta) = E \quad\text{to}\quad H(r, \dot r, M/r^2) = E, \]

which is a first-order ODE for r(t) alone. Upon solving this ODE for r(t) we can in turn view the second conservation law $\dot\theta = M/r^2$ as a first-order ODE for θ(t). This procedure can be carried out and yields the full orbit r(t) and θ(t) in terms of quadratures.

[2] The conservation of M is enough to explain Kepler’s second law, the law of “equal areas”, which was extrapolated from astronomical observations before the laws of mechanics were known.

Here we will be satisfied with information on r(t) alone, i.e.

\[ \frac{1}{2}\dot r^2 + V(r) + \frac{M^2}{2r^2} = E. \]

This can be viewed as the energy equation for a generic one-degree-of-freedom system with an effective potential energy

\[ V_{\mathrm{eff}}(r) = V(r) + \frac{M^2}{2r^2}. \]

For nonzero M the behavior is fundamentally altered near r = 0, because now the effective potential accelerates r(t) away from the origin. This effect is what was missed by the naive approach, which is only correct in the case M = 0.
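The two conservation laws can be observed numerically. The sketch below is illustrative only: it assumes unit mass, G = 1, an arbitrarily chosen bound initial condition, and a hand-written Runge-Kutta integrator, and it integrates the two EL-equations in polar coordinates, $\ddot r = r\dot\theta^2 - V'(r)$ and $\frac{d}{dt}(r^2\dot\theta) = 0$.

```python
G = 1.0  # assumed strength of the potential V(r) = -G/r (unit mass)

def rhs(s):
    r, rdot, theta, thetadot = s
    # EL-equations in polar coordinates:
    #   r'' = r theta'^2 - V'(r)   and   d/dt (r^2 theta') = 0
    return [rdot,
            r * thetadot ** 2 - G / r ** 2,
            thetadot,
            -2.0 * rdot * thetadot / r]

def rk4_step(s, h):
    def shifted(k, c):
        return [si + c * ki for si, ki in zip(s, k)]
    k1 = rhs(s)
    k2 = rhs(shifted(k1, 0.5 * h))
    k3 = rhs(shifted(k2, 0.5 * h))
    k4 = rhs(shifted(k3, h))
    return [si + h / 6.0 * (a + 2 * b + 2 * c + d)
            for si, a, b, c, d in zip(s, k1, k2, k3, k4)]

def energy(s):
    r, rdot, _, thetadot = s
    return 0.5 * (rdot ** 2 + r ** 2 * thetadot ** 2) - G / r

def angular_momentum(s):
    return s[0] ** 2 * s[3]

state = [1.0, 0.0, 0.0, 1.2]       # r, r', theta, theta' at t = 0
E0, M0 = energy(state), angular_momentum(state)
r_min = state[0]
for _ in range(10000):             # integrate up to t = 10
    state = rk4_step(state, 1e-3)
    r_min = min(r_min, state[0])

energy_drift = abs(energy(state) - E0)
momentum_drift = abs(angular_momentum(state) - M0)
print(energy_drift, momentum_drift, r_min)
```

With nonzero M the radius never approaches the origin in this run, in line with the repulsive $M^2/(2r^2)$ term of the effective potential.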

4.3. Systems with Constraints and Lagrange Multipliers. A further property of the action principle is that it generalizes easily to systems with constraints. For instance, the pendulum is the result of considering a point mass with two degrees of freedom x(t) and z(t) exposed to gravity and subject to the constraint $x^2 + z^2 = l^2$. Without this constraint, using polar coordinates r, φ, we have the Lagrangian

\[ L = \frac{1}{2}(\dot r^2 + r^2\dot\varphi^2) + g r\cos\varphi. \]

With the constraint, we will assume that the action principle continues to apply in the following sense: the action S is extremal relative to all r(t) and φ(t) that satisfy the constraint as well as the usual endpoint conditions. This assumption has been corroborated by solving many model problems and finding no contradiction; we treat it as a basic statement of physics. In the present case the constraint r = l can be directly substituted in the Lagrangian above, which eliminates r and reduces the problem to a one-degree-of-freedom system with Lagrangian

\[ L = \frac{1}{2} l^2\dot\varphi^2 + g l\cos\varphi. \]

The variations in φ are unconstrained. However, it often happens that the constraint cannot be used to eliminate degrees of freedom. For instance, this could happen if the constraint involves time derivatives of coordinates. Then the EL-equations do not apply because they have been derived under the assumption of unconstrained variations. Still, in such cases the constraint can be incorporated by the general method of Lagrange multipliers, as we shall see in this example.

This is a mathematical method and not a physical law. It transforms a variational problem with constraints into a new unconstrained problem with more degrees of freedom. For this unconstrained system the EL-equations then apply in their standard form. Specifically, the method consists of using the homogeneous constraint r − l = 0 to form the augmented Lagrangian

\[ \tilde L = L - \lambda (r - l). \]

Here λ(t) is a Lagrange multiplier function that must be added to the list of unknown functions r(t) and φ(t). Because the constraint acts at every moment in time there is not a single Lagrange multiplier but a full function. Indeed, by breaking down the action integral into a finite Riemann sum it is clear that there are as many constraints as members in that sum. They all have their own multiplier and in the continuous limit these multipliers become a function of t.

Now, according to the theory of Lagrange multipliers the augmented action $\tilde S$ based on $\tilde L$ is extremal relative to unconstrained variations of r, φ and λ precisely if the original action is extremal relative to the constrained variations in r and φ. Therefore, we consider the three EL-equations for the augmented action

\[ \tilde S[r, \varphi, \lambda] = \int_0^T \left[ \frac{1}{2}(\dot r^2 + r^2\dot\varphi^2) + g r\cos\varphi - \lambda(r - l) \right] dt. \]

They are

\begin{align*}
r &: \quad \ddot r = r\dot\varphi^2 + g\cos\varphi - \lambda, \\
\varphi &: \quad \frac{d}{dt}(r^2\dot\varphi) = -g r\sin\varphi, \\
\lambda &: \quad r = l.
\end{align*}

The last equation enforces the constraint. In the general case, these three equations need to be solved simultaneously. In the present simple case substituting from the third into the second equation again produces the EL-equation for φ alone. The first equation decouples and becomes

\[ \lambda = l\dot\varphi^2 + g\cos\varphi. \]

This gives λ(t) once φ(t) has been found. It can be shown that λ is equal to the central force along the rod of the pendulum that is necessary to enforce the constraint. This turns out to be true more generally, i.e., the value of the Lagrange multiplier is proportional to the constraint forces.
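As a numerical illustration, the multiplier can be evaluated along a pendulum trajectory. This is a sketch with assumed values (g = 9.81, l = 2, release from rest at φ0 = 1) and a hand-written Runge-Kutta step. The comparison formula λ = g(3 cos φ − 2 cos φ0) is not taken from the text; it follows from combining λ = l φ'² + g cos φ with energy conservation for a release from rest.

```python
import math

g, l = 9.81, 2.0     # assumed gravity and pendulum length
phi0 = 1.0           # assumed release angle, starting at rest

def rhs(s):
    phi, omega = s
    # second EL-equation with r = l:  l^2 phi'' = -g l sin(phi)
    return [omega, -(g / l) * math.sin(phi)]

def rk4_step(s, h):
    def shifted(k, c):
        return [si + c * ki for si, ki in zip(s, k)]
    k1 = rhs(s)
    k2 = rhs(shifted(k1, 0.5 * h))
    k3 = rhs(shifted(k2, 0.5 * h))
    k4 = rhs(shifted(k3, h))
    return [si + h / 6.0 * (a + 2 * b + 2 * c + d)
            for si, a, b, c, d in zip(s, k1, k2, k3, k4)]

state = [phi0, 0.0]
worst = 0.0
for _ in range(3000):
    state = rk4_step(state, 1e-3)
    phi, omega = state
    lam = l * omega ** 2 + g * math.cos(phi)          # first EL-equation with r = l
    lam_energy = g * (3.0 * math.cos(phi) - 2.0 * math.cos(phi0))
    worst = max(worst, abs(lam - lam_energy))

lam_rest = l * 0.0 ** 2 + g * math.cos(0.0)           # pendulum hanging at rest
print(worst, lam_rest)
```

For the pendulum hanging at rest the multiplier reduces to g, i.e., the rod carries exactly the weight of the unit mass.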

The method of Lagrange multipliers is an amazing construction. In your own time you might ponder a few peculiar questions such as whether $\tilde S$ is numerically equal to S for the extremal path, what happens to λ when the constraint is rewritten as sin(r − l) = 0 (or some such), what are the symmetries and associated conserved quantities of the augmented Lagrangian and what is the derivative of the extremalized action S with respect to l?

Part 2. Differential forms

5. Introduction

We want to introduce the notion of differential forms and their properties, which we’re going to use for the Hamiltonian formalism. In this part we’re going to explain the wedge product between two differential forms, the exterior derivative, the pullback of a differential form, the Lie derivative, the contraction of a differential form and the properties of these.

6. Notations

In this part U will denote an open subset of $\mathbb{R}^n$, α a k-form, β an l-form, f a function and X a vector field on U. We will denote by $(x^1, \dots, x^n)$ the coordinates on U and accordingly write

\[ \alpha(x) = \sum_{i_1,\dots,i_k=1}^{n} \alpha_{i_1,\dots,i_k}(x)\, dx^{i_1}\wedge\dots\wedge dx^{i_k}, \]

\[ \beta(x) = \sum_{j_1,\dots,j_l=1}^{n} \beta_{j_1,\dots,j_l}(x)\, dx^{j_1}\wedge\dots\wedge dx^{j_l}, \]

\[ X(x) = \sum_{i=1}^{n} X^i(x)\,\frac{\partial}{\partial x^i}. \]

For simplicity we will assume throughout that α, β, f and X are smooth, i.e., that all components $\alpha_{i_1,\dots,i_k}$, all components $\beta_{j_1,\dots,j_l}$, all components $X^i$ and f are arbitrarily often continuously differentiable. Recall that functions and zero-forms are one and the same thing.

Remark 6.1. The symbols $\frac{\partial}{\partial x^1}, \dots, \frac{\partial}{\partial x^n}$ denote the basis of $\mathbb{R}^n$ corresponding to our choice of coordinates. The symbols $dx^1, \dots, dx^n$ denote the dual basis of $(\mathbb{R}^n)^*$; i.e., the canonical pairing of $dx^i$ with $\frac{\partial}{\partial x^j}$ is 1 if i = j and 0 otherwise. The induced basis of $\wedge^k(\mathbb{R}^n)^*$ is given by the symbols $(dx^{i_1}\wedge\dots\wedge dx^{i_k})_{1\le i_1<\dots<i_k\le n}$. The wedge product of the symbols $dx^i$ is defined by the identity

\[ dx^i\wedge dx^j = -dx^j\wedge dx^i. \]

Using this identity one can rewrite all the terms in the above expansion of α into a linear combination of the basis elements $(dx^{i_1}\wedge\dots\wedge dx^{i_k})_{1\le i_1<\dots<i_k\le n}$ of $\wedge^k(\mathbb{R}^n)^*$. Notice that the coefficient of each basis element is given by the complete antisymmetrization of the components with respect to their indices. This means that it is enough to consider completely antisymmetric components, but it is quite convenient (see, e.g., the formulae for the wedge product and for the exterior derivative below) to allow for more general (though redundant) components.

Remark 6.2. If k = 0, then α is a function; if k > n or k < 0, then α is 0.

Remark 6.3. The attentive reader might have noticed that we consistently use upper and lower indices (to denote components of vectors and forms, respectively). This helps in bookkeeping but is not essential. It also allows using Einstein’s convention on repeated indices, according to which a sum over an index is understood when the index is repeated, once as an upper index and once as a lower index. With this convention, which we will not use in this note, the last formula would, e.g., read: $X(x) = X^i(x)\,\frac{\partial}{\partial x^i}$.

7. Definitions

Definition 7.1 (Wedge product). The wedge product of α and β is the (k + l)-form

\[ \alpha\wedge\beta(x) = \sum_{i_1,\dots,i_k=1}^{n}\;\sum_{j_1,\dots,j_l=1}^{n} \alpha_{i_1,\dots,i_k}(x)\,\beta_{j_1,\dots,j_l}(x)\; dx^{i_1}\wedge\dots\wedge dx^{i_k}\wedge dx^{j_1}\wedge\dots\wedge dx^{j_l}. \]

Notice that if k + l > n, then α ∧ β is automatically zero.
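The definition translates almost literally into code. The sketch below is an illustration under a simplifying assumption: forms are evaluated at a single point, so their components are plain numbers, and a form is stored as a dictionary from index tuples to components (redundant, unsorted index tuples are allowed, as in the text). The names `normalize` and `wedge` are hypothetical helpers.

```python
def normalize(form):
    """Rewrite a form in the basis dx^{i1} ^ ... ^ dx^{ik} with i1 < ... < ik."""
    out = {}
    for idx, coeff in form.items():
        if len(set(idx)) < len(idx):
            continue                      # repeated index: dx^i ^ dx^i = 0
        idx = list(idx)
        sign = 1
        # bubble sort; every swap uses dx^i ^ dx^j = -dx^j ^ dx^i
        for a in range(len(idx)):
            for b in range(len(idx) - 1 - a):
                if idx[b] > idx[b + 1]:
                    idx[b], idx[b + 1] = idx[b + 1], idx[b]
                    sign = -sign
        key = tuple(idx)
        out[key] = out.get(key, 0) + sign * coeff
    return {k: v for k, v in out.items() if v != 0}

def wedge(alpha, beta):
    """Multiply components and concatenate index tuples, then normalize."""
    prod = {}
    for i, a in alpha.items():
        for j, b in beta.items():
            prod[i + j] = prod.get(i + j, 0) + a * b
    return normalize(prod)

alpha = {(1,): 2, (3,): -1}           # 1-form  2 dx^1 - dx^3
beta = {(1, 2): 1, (2, 3): 4}         # 2-form  dx^1^dx^2 + 4 dx^2^dx^3
gamma = {(2,): 5, (3,): 1}            # 1-form  5 dx^2 + dx^3

ab, ba = wedge(alpha, beta), wedge(beta, alpha)
ag, ga = wedge(alpha, gamma), wedge(gamma, alpha)
print(ab, ba, ag, ga)
```

In this example the 1-form and the 2-form commute under the wedge product, while the two 1-forms anticommute.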


Definition 7.2 (Exterior derivative). The differential or exterior derivative of α is the (k + 1)-form

\[ d\alpha(x) = \sum_{j=1}^{n}\;\sum_{i_1,\dots,i_k=1}^{n} \frac{\partial}{\partial x^j}\alpha_{i_1,\dots,i_k}(x)\; dx^j\wedge dx^{i_1}\wedge\dots\wedge dx^{i_k}. \]

Notice that $dx^i$ denotes at the same time the i-th basis vector of $(\mathbb{R}^n)^*$ and the differential of the coordinate function $x^i$. Also notice that if α is a top form, i.e., k = n, then automatically dα = 0.

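Definition 7.3 (Pullback). If V is an open subset of $\mathbb{R}^m$ and φ a smooth map V → U, the pullback of α is the k-form on V defined by

\[ \varphi^*\alpha(y) := \wedge^k d\varphi(y)^*\,\alpha(\varphi(y)), \qquad y\in V, \]

where $d\varphi(y)\colon \mathbb{R}^m\to\mathbb{R}^n$ denotes the differential of φ at y, $d\varphi(y)^*$ its transpose and $\wedge^k d\varphi(y)^*$ the k-th exterior power of the latter. If $(y^1,\dots,y^m)$ are coordinates on V, we have

\[ \varphi^*\alpha(y) = \sum_{j_1,\dots,j_k=1}^{m}\;\sum_{i_1,\dots,i_k=1}^{n} \alpha_{i_1,\dots,i_k}(\varphi(y))\, \frac{\partial\varphi^{i_1}}{\partial y^{j_1}}(y)\cdots\frac{\partial\varphi^{i_k}}{\partial y^{j_k}}(y)\; dy^{j_1}\wedge\dots\wedge dy^{j_k}. \]

Observe that, if W is an open subset of $\mathbb{R}^s$ and ψ a smooth map W → V, we have

\[ (\varphi\circ\psi)^* = \psi^*\varphi^*. \]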

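Definition 7.4 (Lie derivative). The Lie derivative with respect to X of α is the k-form

\[ L_X\alpha = \lim_{t\to 0}\frac{\varphi_{X,t}^*\alpha - \alpha}{t}, \]

where $\varphi_{X,t}$ is the flow of X at time t. Explicitly we have

\begin{align*}
L_X\alpha(x) ={}& \sum_{i_1,\dots,i_k=1}^{n} X(\alpha_{i_1,\dots,i_k})(x)\; dx^{i_1}\wedge\dots\wedge dx^{i_k} \\
&+ \sum_{i_1,\dots,i_k=1}^{n}\;\sum_{r,s=1}^{n} (-1)^{r-1}\alpha_{i_1,\dots,i_k}(x)\,\frac{\partial X^{i_r}}{\partial x^s}(x)\; dx^s\wedge dx^{i_1}\wedge\dots\wedge\widehat{dx^{i_r}}\wedge\dots\wedge dx^{i_k},
\end{align*}

where $X(\alpha_{i_1,\dots,i_k}) = \sum_{j=1}^{n} X^j\frac{\partial}{\partial x^j}\alpha_{i_1,\dots,i_k}$ denotes the directional derivative of the function $\alpha_{i_1,\dots,i_k}$ in the direction of X and the caret denotes that the factor $dx^{i_r}$ is omitted.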

Definition 7.5 (Contraction). The contraction of X with α is the (k − 1)-form

\[ \iota_X\alpha(x) = \sum_{i_1,\dots,i_k=1}^{n}\;\sum_{r=1}^{n} (-1)^{r-1}\alpha_{i_1,\dots,i_k}(x)\,X^{i_r}(x)\; dx^{i_1}\wedge\dots\wedge\widehat{dx^{i_r}}\wedge\dots\wedge dx^{i_k}. \]

If α is a function, i.e., k = 0, then automatically $\iota_X\alpha = 0$.


8. Properties

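The wedge product is bilinear over $\mathbb{R}$, whereas the differential, the pullback, the Lie derivative and the contraction are linear over $\mathbb{R}$. Moreover, we have the following properties:

\begin{align}
\alpha\wedge\beta &= (-1)^{kl}\,\beta\wedge\alpha, \tag{1}\\
d^2\alpha &= 0, \tag{2}\\
d(\alpha\wedge\beta) &= d\alpha\wedge\beta + (-1)^k\alpha\wedge d\beta, \tag{3}\\
\varphi^* f &= f\circ\varphi, \tag{4}\\
\varphi^*(\alpha\wedge\beta) &= \varphi^*\alpha\wedge\varphi^*\beta, \tag{5}\\
d\varphi^*\alpha &= \varphi^* d\alpha, \tag{6}\\
L_X f &= X(f), \tag{7}\\
L_X(\alpha\wedge\beta) &= L_X\alpha\wedge\beta + \alpha\wedge L_X\beta, \tag{8}\\
L_X d\alpha &= dL_X\alpha, \tag{9}\\
\iota_X(\alpha\wedge\beta) &= \iota_X\alpha\wedge\beta + (-1)^k\alpha\wedge\iota_X\beta, \tag{10}\\
L_X\alpha &= \iota_X d\alpha + d\iota_X\alpha \quad\text{(Cartan’s formula).} \tag{11}
\end{align}

Observe that the above properties characterize d, φ∗, $L_X$ and $\iota_X$ completely: the formulae in the previous Section may be recovered from these properties. If Y is a second vector field, we also have

\begin{align}
\iota_X\iota_Y\alpha &= -\iota_Y\iota_X\alpha, \tag{12}\\
L_X L_Y\alpha - L_Y L_X\alpha &= L_{[X,Y]}\alpha, \tag{13}\\
\iota_X L_Y\alpha - L_Y\iota_X\alpha &= \iota_{[X,Y]}\alpha, \tag{14}
\end{align}

where [X, Y] is the Lie bracket of X and Y defined by

\[ [X,Y](f) = X(Y(f)) - Y(X(f)). \]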
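As a small worked check of Cartan’s formula, consider an example that is not part of the notes: on $\mathbb{R}^2$ with coordinates written (x, y) instead of $(x^1, x^2)$, take the 1-form $\alpha = x\,dy$ and the vector field $X = x\,\frac{\partial}{\partial y}$. Using the definitions of the previous Section, both sides can be computed directly:

```latex
\begin{align*}
d\alpha &= dx\wedge dy, &
\iota_X d\alpha &= -x\,dx, \\
\iota_X\alpha &= x^2, &
d\iota_X\alpha &= 2x\,dx, \\
\iota_X d\alpha + d\iota_X\alpha &= x\,dx, \\
L_X\alpha &= X(x)\,dy + x\,\frac{\partial X^y}{\partial x}\,dx
           = 0 + x\,dx = x\,dx .
\end{align*}
```

Here $X(x) = x\,\partial_y x = 0$ and $X^y = x$, so the first term of the explicit Lie derivative vanishes and the second one supplies $x\,dx$, in agreement with the sum of the two contraction terms.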

Part 3. Hamiltonian systems

9. Introduction

The second formulation we will look at is the Hamilton formalism. In this system, in place of the Lagrangian we define a quantity called the Hamiltonian, to which the canonical equations of motion are applied. While the EL-equations describe the motion of a particle as a single second-order differential equation, the canonical equations describe the motion as a coupled system of two first-order differential equations. One of the many advantages of Hamiltonian mechanics is that it is similar in form to quantum mechanics, the theory that describes the motion of particles at subatomic distance scales. An understanding of Hamiltonian mechanics provides a good introduction to the mathematics of quantum mechanics.


10. Legendre Transform

Consider a smooth real-valued function f(x) for $x\in\mathbb{R}$ that is strictly convex, i.e., f′′ > 0 for all x. Then the Legendre transformation transforms the pair (x, f(x)) into a new pair (p, F(p)) by the following definition.

Definition 10.1 (Legendre Transform). The Legendre transform of a convex function f(x) is given by

\[ F(p) = \max_x\,[\,px - f(x)\,]. \]

The necessary condition for a maximum is

\[ p = f'(x). \]

This is to be viewed as an equation for x given p. The second x-derivative of the expression px − f(x) is −f′′(x), which is negative by assumption. Therefore p = f′(x) is necessary and sufficient for the unique maximum. Thus, an equivalent way to write F(p) is via the two equations

\[ F(p) = px - f(x) \quad\text{and}\quad p = f'(x). \]

This can be contracted to

\[ F(p) = x f'(x) - f(x) \]

provided one does not forget to solve p = f′(x) for x in terms of p. This is invariably the tricky step. Note that the Legendre transformation is not a linear transformation despite the appearance of the contracted equation. This is because inverting p = f′(x) is not a linear procedure in f(x).

As an example, the Legendre transformation of $f = x^\alpha/\alpha$ for α > 1 and x > 0 is computed to be

\[ F(p) = \frac{\alpha-1}{\alpha}\,x^\alpha \quad\text{and}\quad p = x^{\alpha-1} > 0. \]

Inverting the second equation and substituting in the first yields

\[ F(p) = \frac{p^\beta}{\beta} \quad\text{where}\quad \frac{1}{\alpha} + \frac{1}{\beta} = 1. \]

We see that β > 1; i.e., F(p) for p > 0 is again convex. This is true in general as we shall see.
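The closed form can be compared with a direct numerical maximization of px − f(x). The sketch below is illustrative: the values α = 3 and p = 2, the search interval [0, 10], and the helper name `legendre` are assumptions, and the maximum is located by a simple ternary search, which is valid because px − f(x) is strictly concave in x.

```python
def legendre(f, p, lo, hi, iters=200):
    """Return (F(p), x*) with F(p) = max over [lo, hi] of p*x - f(x)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if p * m1 - f(m1) < p * m2 - f(m2):
            lo = m1                   # maximum lies to the right of m1
        else:
            hi = m2                   # maximum lies to the left of m2
    x = 0.5 * (lo + hi)
    return p * x - f(x), x

alpha = 3.0
beta = alpha / (alpha - 1.0)          # from 1/alpha + 1/beta = 1
f = lambda x: x ** alpha / alpha

p = 2.0
F_num, x_star = legendre(f, p, 0.0, 10.0)
F_exact = p ** beta / beta            # closed form from the text
x_exact = p ** (1.0 / (alpha - 1.0))  # inverse of p = x^(alpha - 1)
print(F_num, F_exact, x_star, x_exact)
```

The maximizer is determined less accurately than the maximum itself, because px − f(x) is flat near its maximum.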

10.1. Derivatives and Convexity. From the equations F(p) = px − f(x) and p = f′(x), the differential of F(p) can be written as

\[ dF = p\,dx + x\,dp - f'(x)\,dx = x\,dp \;\Rightarrow\; F'(p) = x. \]

This is a remarkable formula, which makes this transform useful for differential equations. The second derivative follows as

\[ F''(p) = \frac{dx}{dp} = \frac{1}{\;\frac{dp}{dx}\;} = \frac{1}{f''(x)} > 0, \]

and therefore F(p) is strictly convex. We observe that the Legendre transform preserves convexity.
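Both derivative formulas can be checked with finite differences. The following sketch uses an assumed example, f(x) = exp(x), solves p = f′(x) by Newton’s method, and differentiates F numerically; the step size and the point p = 2.5 are arbitrary choices.

```python
import math

def legendre_exp(p):
    """Legendre transform of f(x) = exp(x): solve p = f'(x), return (F, x)."""
    x = 0.0
    for _ in range(50):
        x -= (math.exp(x) - p) / math.exp(x)   # Newton step for f'(x) - p = 0
    return p * x - math.exp(x), x

p, h = 2.5, 1e-4
F, x = legendre_exp(p)
F_plus, _ = legendre_exp(p + h)
F_minus, _ = legendre_exp(p - h)

dF = (F_plus - F_minus) / (2 * h)              # approximates F'(p)
d2F = (F_plus - 2 * F + F_minus) / h ** 2      # approximates F''(p)
print(dF, x, d2F, 1.0 / math.exp(x))
```

Here dF should agree with the maximizer x, and d2F with 1/f′′(x) = exp(−x), up to discretization error.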

10.2. Involution. Since F(p) is again strictly convex, we can apply the Legendre transform to F(p), and the claim is that this returns f(x). This means that the Legendre transform is an involution; i.e., it is its own inverse. To show that this is true we rewrite the Legendre transform as

\[ f(x) = x f'(x) - F(p) = F'(p)\,p - F(p), \]

where we have used the equation for the differential, x = F′(p), and p = f′(x). However, the last expression is equal to

\[ \max_p\,[\,xp - F(p)\,] \]

for fixed x by the same argument as given before, which yields x = F′(p) for the location of the maximum in p. Therefore

\[ f(x) = \max_p\,[\,xp - F(p)\,] \quad\text{and}\quad F(p) = \max_x\,[\,px - f(x)\,] \]

both hold and this proves that the Legendre transform is an involution.
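The involution property can be tested by applying a numerical Legendre transform twice. This is a rough sketch under stated assumptions: f(x) = cosh(x) is an arbitrary strictly convex example, the maximization is restricted to the interval [−5, 5] (wide enough to contain all maximizers needed below), and `legendre` is a hypothetical helper based on ternary search.

```python
import math

def legendre(g, lo=-5.0, hi=5.0, iters=100):
    """Return the function s -> max over [lo, hi] of s*u - g(u)."""
    def G(s):
        a, b = lo, hi
        for _ in range(iters):
            m1 = a + (b - a) / 3.0
            m2 = b - (b - a) / 3.0
            if s * m1 - g(m1) < s * m2 - g(m2):
                a = m1
            else:
                b = m2
        u = 0.5 * (a + b)
        return s * u - g(u)
    return G

f = math.cosh
F = legendre(f)            # F(p) = max_x [p x - f(x)]
f_back = legendre(F)       # max_p [x p - F(p)], which should reproduce f

xs = [-1.0, -0.3, 0.0, 0.7, 1.5]
errors = [abs(f_back(x) - f(x)) for x in xs]
print(max(errors))
```

The double transform costs a nested maximization, so this is only meant as a check, not as an efficient algorithm.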

10.3. Total Differential of Legendre Transform. Consider f(x, y) and its Legendre transform with respect to x, which we shall denote by

\[ F(p, y) = \max_x\,[\,px - f(x, y)\,]. \]

Again, this is equivalent to the two equations

\[ F(p, y) = px - f(x, y) \quad\text{and}\quad p = f_x(x, y). \]

The total differential of F(p, y) is

\[ dF = \left(\frac{\partial F}{\partial p}\right)_{y} dp + \left(\frac{\partial F}{\partial y}\right)_{p} dy, \]

where we have indicated what must be kept constant during differentiation. On the other hand,

\[ dF = p\,dx + x\,dp - \left(\frac{\partial f}{\partial x}\right)_{y} dx - \left(\frac{\partial f}{\partial y}\right)_{x} dy = x\,dp - \left(\frac{\partial f}{\partial y}\right)_{x} dy. \]

Comparing the last two expressions we find that

\[ \left(\frac{\partial F}{\partial p}\right)_{y} = x \quad\text{and}\quad \left(\frac{\partial F}{\partial y}\right)_{p} = -\left(\frac{\partial f}{\partial y}\right)_{x}. \]

The first equation is the natural extension of the differential of F. The second equation applies to derivatives with respect to any variable that is not involved in the transformation. Note the minus sign.


10.4. Local Legendre Transformation. The assumption of global strict convexity can be relaxed if one is only interested in the local Legendre transform of f(x) in the neighborhood of a given point x. That is to say, consider any smooth function f(x) at a point x and define the corresponding values of the local Legendre transform F(p) and of its argument p as

\[ F(p) = x f'(x) - f(x) \quad\text{and}\quad p = f'(x). \]

Thus (x, f(x)) is mapped locally to a particular pair (p, F(p)). If we vary x → x + dx then we can compute how p and F change in return. This yields

\[ dp = \frac{dp}{dx}\,dx = f''(x)\,dx \quad\text{and}\quad dF = \frac{dF}{dp}\,dp = x f''(x)\,dx. \]

In this way the local Legendre transform can be carved out and the map (x, f(x)) → (p, F(p)) remains invertible if f′′(x) is nonzero. The sign of f′′(x) does not matter.

For a strictly convex function this local construction can be extended globally and then agrees with the previous definition. However, if f′′(x) goes through zero then the Legendre transform F(p) becomes multivalued. This follows from dp = f′′(x)dx, which changes sign and therefore the same values of p are traversed again. For example, the local Legendre transform of

\[ f(x) = \frac{x^3}{3} \quad\text{is}\quad F(p) = \pm\frac{2p^{3/2}}{3} \]

with the domain restriction p ≥ 0. The sign of F corresponds to the sign of x.

10.5. Multivariable Case. The Legendre transform with respect to several variables is defined as a sequence of single-variable transforms. For example, consider a globally strictly convex function f(x1, x2), i.e., a function whose matrix of second derivatives is positive definite. Then the Legendre transform is defined as

\[ F(p_1, p_2) = \max_{x_1, x_2}\,[\,p_1 x_1 + p_2 x_2 - f(x_1, x_2)\,]. \]

As before, this yields

\[ p_1 = \frac{\partial f}{\partial x_1} \quad\text{and}\quad p_2 = \frac{\partial f}{\partial x_2} \]

and also

\[ \frac{\partial F}{\partial p_1} = x_1 \quad\text{and}\quad \frac{\partial F}{\partial p_2} = x_2. \]

The same applies to the preservation of convexity and the involution character of the Legendre transform. A local Legendre transform relaxes the assumption of global strict convexity to the assumption that the equations for p1 and p2 can be inverted locally, i.e., the matrix of second derivatives of f is nonsingular at (x1, x2).
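For a quadratic function the two-variable transform can be written down explicitly, which gives a quick check of the formulas above. The sketch assumes the example $f(x) = \frac{1}{2}x^T A x$ with a positive definite 2×2 matrix A chosen arbitrarily; then p = Ax and F is the quadratic form built from the inverse matrix.

```python
A = [[2.0, 1.0], [1.0, 3.0]]      # assumed positive definite Hessian of f

def f(x1, x2):
    return 0.5 * (A[0][0] * x1 * x1 + 2 * A[0][1] * x1 * x2 + A[1][1] * x2 * x2)

def F(p1, p2):
    """Invert p = grad f(x) = A x by Cramer's rule, then F = p.x - f(x)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x1 = (A[1][1] * p1 - A[0][1] * p2) / det
    x2 = (A[0][0] * p2 - A[1][0] * p1) / det
    return p1 * x1 + p2 * x2 - f(x1, x2), (x1, x2)

p1, p2, h = 1.0, -2.0, 1e-5
value, (x1, x2) = F(p1, p2)

# Central differences of F with respect to p1 and p2 should return x1 and x2.
dF_dp1 = (F(p1 + h, p2)[0] - F(p1 - h, p2)[0]) / (2 * h)
dF_dp2 = (F(p1, p2 + h)[0] - F(p1, p2 - h)[0]) / (2 * h)
print(value, (x1, x2), dF_dp1, dF_dp2)
```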


11. Canonical Equations

We now apply the Legendre transform to the action principle. This is a general mathematical technique and applies to any feasible variational problem, whether of mechanical origin or not. To be specific we consider the standard action integral for a one-degree-of-freedom system, i.e.,

\[ S[q] = \int_0^T L(q, \dot q, t)\,dt \]

with corresponding EL-equation

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}. \]

We assume that the Lagrangian L is strictly convex in its second slot, i.e., in its dependence on the value $\dot q$. This is certainly true in the standard kinetic energy case, in which L = T − V(q) and $T = \frac{1}{2}\dot q^2$.

11.1. Hamiltonian Function. Based on this we introduce the Hamiltonian function H(q, p, t) as the Legendre transform of L with respect to its second slot:

\[ H(q, p, t) = \max_{\dot q}\,[\,p\dot q - L(q, \dot q, t)\,]. \]

This is equivalent to

\[ H(q, p, t) = \dot q\,\frac{\partial L}{\partial \dot q} - L(q, \dot q, t) \quad\text{and}\quad p = \frac{\partial L}{\partial \dot q}, \]

where the second equation must be inverted for given values of (q, p, t) to yield the value of $\dot q$ that must be inserted in the first equation.

Let’s consider this done and the definition of H as a function of q and t as well as of the new variable p to be completed. We can now use the differential relations to substitute in the EL-equation. This yields

\[ \frac{\partial L}{\partial \dot q} = p \quad\text{and}\quad \frac{\partial L}{\partial q} = -\frac{\partial H}{\partial q} \;\Rightarrow\; \frac{d}{dt}p = \dot p = -\frac{\partial H}{\partial q}. \]

In addition, we also have

\[ \frac{\partial H}{\partial p} = \dot q. \]

In summary, the single second-order EL-equation for q(t) is replaced by the two canonical equations.

Definition 11.1 (Canonical Equations). Let $U\subset\mathbb{R}^n$ be an open subset and let H(q, p, t) be a Hamiltonian function on U × U × I, with I a time interval. Then the EL-equation, in the form of the Hamiltonian function, is described by the following two equations:

\[ \dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q}. \]

The initial-value problem is posed by specifying initial values for both q(0) and p(0). This is a Hamiltonian system of first-order ODEs which has a number of structural advantages.
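The canonical equations are straightforward to integrate once H is given. The sketch below is one possible illustration, not a recommended scheme: it assumes the pendulum Hamiltonian $H = p^2/2 - \cos q$ (unit mass, length and gravity), approximates the partial derivatives of H by central differences, and uses a classical Runge-Kutta step, which does not preserve the Hamiltonian structure exactly.

```python
import math

def H(q, p):
    return 0.5 * p * p - math.cos(q)       # assumed pendulum, m = l = g = 1

def canonical_rhs(q, p, eps=1e-6):
    """Right-hand side (q', p') = (dH/dp, -dH/dq) via central differences."""
    dH_dq = (H(q + eps, p) - H(q - eps, p)) / (2 * eps)
    dH_dp = (H(q, p + eps) - H(q, p - eps)) / (2 * eps)
    return dH_dp, -dH_dq

def rk4_step(q, p, h):
    k1q, k1p = canonical_rhs(q, p)
    k2q, k2p = canonical_rhs(q + 0.5 * h * k1q, p + 0.5 * h * k1p)
    k3q, k3p = canonical_rhs(q + 0.5 * h * k2q, p + 0.5 * h * k2p)
    k4q, k4p = canonical_rhs(q + h * k3q, p + h * k3p)
    return (q + h / 6.0 * (k1q + 2 * k2q + 2 * k3q + k4q),
            p + h / 6.0 * (k1p + 2 * k2p + 2 * k3p + k4p))

q, p = 1.0, 0.0                  # initial values q(0) and p(0)
H0 = H(q, p)
q_max = abs(q)
for _ in range(5000):            # integrate up to t = 10
    q, p = rk4_step(q, p, 2e-3)
    q_max = max(q_max, abs(q))

drift = abs(H(q, p) - H0)
print(drift, q_max)
```

Only q(0) and p(0) are needed to start the integration, and H stays constant along the computed trajectory up to the integration error.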

11.2. Canonical Action Principle. The canonical equations can be viewed as the EL-equations for a new action principle called the canonical action principle. This is a new functional in terms of the two functions q and p, namely

\[ S[q, p] = \int_0^T \big(p\dot q - H(q, p, t)\big)\,dt. \]

Perhaps we should use a different symbol for it. Either way, in the present form this is not equal to S[q], because we can still choose p(t) freely and this affects the value of S[q, p]. However, if we fix q(t) and maximize the integrand over all possible values of p then the integrand becomes numerically equal to L because we are performing a Legendre transformation on H. Therefore we have that

\[ S[q] = \max_{p(t)} S[q, p]. \]

This means that the original action principle is recovered by demanding that the canonical action S[q, p] be extremal with respect to independent variations of both q(t) and p(t).

The variation with respect to p does not require integration by parts and therefore does not require endpoint conditions for p(t). The corresponding EL-equation is the first half of the canonical equations. The variation with respect to q does require integration by parts and yields the second half of the canonical equations; for this we have to fix the endpoint values of q.

We have derived the canonical equations once by direct substitution and a second time by identifying a new action principle in terms of two functions for which the canonical equations are the EL-equations. It is a peculiar fact that the EL-equations for the canonical action principle are first-order equations in time, because the generic form of the EL-equations yields second-order equations. This is due to the very special linear way in which $\dot q$ appears in the canonical action integral.
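The relation between the two actions can be illustrated for the standard Hamiltonian $H = p^2/(2m) + V(q)$. In the sketch below the path q(t) = sin t, the mass m = 2, the potential $V = q^2/2$ and the perturbation 0.3 t are all arbitrary choices, and the integrals are approximated by the midpoint rule. Choosing $p = m\dot q$ reproduces the Lagrangian action, and any other p(t) gives a smaller value.

```python
import math

m, T, n = 2.0, 1.0, 1000
V = lambda q: 0.5 * q * q
q = math.sin                      # trial path q(t), an arbitrary choice
qdot = math.cos                   # its time derivative

h = T / n
times = [(i + 0.5) * h for i in range(n)]     # midpoint rule nodes

def action_lagrangian():
    return sum((0.5 * m * qdot(t) ** 2 - V(q(t))) * h for t in times)

def action_canonical(p):
    return sum((p(t) * qdot(t) - p(t) ** 2 / (2 * m) - V(q(t))) * h
               for t in times)

S_L = action_lagrangian()
S_opt = action_canonical(lambda t: m * qdot(t))             # p = m q'
S_off = action_canonical(lambda t: m * qdot(t) + 0.3 * t)   # perturbed p
print(S_L, S_opt, S_off)
```

For this Hamiltonian the deficit S_L − S_off equals the integral of $\delta^2/(2m)$ with δ(t) = 0.3 t, which evaluates to 0.0075 here.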

11.3. Previous Examples in Canonical Form. The generic one-degree-of-freedom system has a quadratic kinetic energy $\frac{1}{2}m\dot q^2$, which is obviously convex. The Legendre transform then yields the canonical momentum

\[ p = \frac{\partial L}{\partial \dot q} = m\dot q \;\Rightarrow\; \dot q = \frac{p}{m}. \]

The second form stresses that the equation above is to be used for eliminating $\dot q$ in favor of p. We find that p is proportional to the velocity and that the canonical momentum is equal to the standard momentum in physics if q is a Cartesian coordinate. This is good, because all our earlier work stands as it is. This is bad, because it makes us forget that the canonical formalism uses different variables in general, i.e., $p \neq \dot q$ in general.

The Legendre transform of $L(q, \dot q, t)$ with respect to $\dot q$ is the Hamiltonian function

\[ H(q, p, t) = p\dot q - L = \frac{p^2}{m} - \frac{p^2}{2m} + V(q, t) = \frac{p^2}{2m} + V(q, t), \]

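which is the energy of the system. It is typical that the kinetic energy term is again quadratic and that the mass m slides into the denominator in the canonical formalism. The canonical equations are

\[ \dot q = \frac{\partial H}{\partial p} = \frac{p}{m} \quad\text{and}\quad \dot p = -\frac{\partial H}{\partial q} = -\frac{\partial V}{\partial q}. \]

If L is symmetric in time t then so is the Hamiltonian function H. This leads to conservation of H along a solution trajectory (q(t), p(t)), as can be seen from the chain rule

\[ \frac{dH}{dt} = \frac{\partial H}{\partial q}\dot q + \frac{\partial H}{\partial p}\dot p + \frac{\partial H}{\partial t} = \frac{\partial H}{\partial q}\frac{\partial H}{\partial p} - \frac{\partial H}{\partial p}\frac{\partial H}{\partial q} + \frac{\partial H}{\partial t} = \frac{\partial H}{\partial t}. \]

This is zero if $\frac{\partial H}{\partial t} = 0$ and therefore time symmetry implies energy conservation in explicit form in the canonical formalism. Conservation laws tend to be more explicit in the canonical formalism, which is one of its advantages.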

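This also happens in the central force orbit problem, which had two degrees of freedom r(t) and θ(t). The Lagrangian function was (we set m = 1 again)

\[ L(r, \dot r, \dot\theta) = \frac{1}{2}(\dot r^2 + r^2\dot\theta^2) - V(r), \]

which for r > 0 is again quadratic and convex in both $\dot r$ and $\dot\theta$. We perform a Legendre transform with respect to both of these and obtain

\[ p_r = \frac{\partial L}{\partial \dot r} = \dot r \quad\text{and}\quad p_\theta = \frac{\partial L}{\partial \dot\theta} = r^2\dot\theta. \]

This is trivially inverted and leads to the Hamiltonian function

\[ H(r, p_r, \theta, p_\theta) = \frac{1}{2}\left(p_r^2 + r^{-2}p_\theta^2\right) + V(r). \]

Time symmetry means that H is conserved again. The canonical equations are

\begin{align*}
\dot r &= \frac{\partial H}{\partial p_r} = p_r, & \dot p_r &= -\frac{\partial H}{\partial r} = \frac{p_\theta^2}{r^3} - V'(r), \\
\dot\theta &= \frac{\partial H}{\partial p_\theta} = \frac{p_\theta}{r^2}, & \dot p_\theta &= -\frac{\partial H}{\partial \theta} = 0.
\end{align*}

I hope by now you see the recipe. The crucial equation is the last: the symmetry with respect to θ implies the conservation of the canonical momentum associated with θ, namely $p_\theta$. This conserved quantity is the angular momentum M we encountered before.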

We conclude that in general a symmetry with respect to a coordinate $q_i$ leads to a conservation law for the associated canonical momentum $p_i$ because

\[ \dot p_i = -\frac{\partial H}{\partial q_i} = 0. \]

For a given Lagrangian with a continuous symmetry a smart change of variables $q \to \tilde q$ might make the Lagrangian explicitly symmetric with respect to one of the new coordinates and then one can exploit the conservation law for the associated canonical momentum. An example for this is transforming the orbit system from Cartesian into polar coordinates, which made the azimuthal symmetry explicit. The art and science of transformations that seek to make symmetries explicit goes by the name of canonical transformations in mechanics.

Finally, we can note that a symmetry with respect to a canonical momentum $p_i$ likewise leads to a conservation law for the associated coordinate $q_i$, namely

\[ \dot q_i = \frac{\partial H}{\partial p_i} = 0. \]

This reinforces the view that in the canonical equations the pairs $(q_i, p_i)$ appear on the same footing. We call $(q_i, p_i)$ a pair of canonically conjugate variables.

12. The Poisson bracket

Definition 12.1 (Poisson bracket). In canonical coordinates $(q_i, p_i)$ on the phase space, given two functions $f(q_i, p_i, t)$ and $g(q_i, p_i, t)$, the Poisson bracket is given by

\[ \{f, g\} = \sum_{i=1}^{N}\left(\frac{\partial f}{\partial q_i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q_i}\right). \]

The canonical equations have an equivalent expression in terms of the Poisson bracket. This may be most directly demonstrated in an explicit coordinate frame. Suppose that f(q, p, t) is a function on the phase space. Then from the multivariable chain rule, one has

\[ \frac{d}{dt}f(q, p, t) = \frac{\partial f}{\partial q}\frac{dq}{dt} + \frac{\partial f}{\partial p}\frac{dp}{dt} + \frac{\partial f}{\partial t}. \]

Further, one may take p = p(t) and q = q(t) to be solutions to the canonical equations; that is,

\[ \dot q = \frac{\partial H}{\partial p} = \{q, H\}, \qquad \dot p = -\frac{\partial H}{\partial q} = \{p, H\}. \]

Then, one has

\[ \frac{d}{dt}f(q, p, t) = \frac{\partial f}{\partial q}\frac{\partial H}{\partial p} - \frac{\partial f}{\partial p}\frac{\partial H}{\partial q} + \frac{\partial f}{\partial t} = \{f, H\} + \frac{\partial f}{\partial t}. \]

Thus, the time evolution of a function f on the phase space can be given as a one-parameter family of symplectomorphisms (that is, canonical transformations, which are area-preserving diffeomorphisms, or simply transformations that preserve the Poisson bracket), with the time t being the parameter: Hamiltonian motion is a canonical transformation generated by the Hamiltonian. That is, Poisson brackets are preserved in it, so that the solution to the canonical equations at any time t, $q(t) = \exp(-t\{H, \cdot\})\,q(0)$, $p(t) = \exp(-t\{H, \cdot\})\,p(0)$, can serve as the bracket coordinates. Poisson brackets are canonical invariants.

Dropping the coordinates from the notation, one has

\[ \frac{d}{dt}f = \left(\frac{\partial}{\partial t} - \{H, \cdot\}\right) f. \]

This equation is known as the Liouville equation.
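The coordinate formula for the bracket is easy to evaluate numerically. The following sketch approximates the partial derivatives by central differences; the function name `poisson`, the anisotropic oscillator Hamiltonian and the evaluation point are assumptions chosen for illustration, and indices in the code start at 0.

```python
def poisson(f, g, q, p, h=1e-5):
    """{f, g} = sum_i (df/dq_i dg/dp_i - df/dp_i dg/dq_i), central differences."""
    def d(fun, which, i):
        q_plus, q_minus = list(q), list(q)
        p_plus, p_minus = list(p), list(p)
        if which == "q":
            q_plus[i] += h
            q_minus[i] -= h
        else:
            p_plus[i] += h
            p_minus[i] -= h
        return (fun(q_plus, p_plus) - fun(q_minus, p_minus)) / (2 * h)
    return sum(d(f, "q", i) * d(g, "p", i) - d(f, "p", i) * d(g, "q", i)
               for i in range(len(q)))

# Assumed example: anisotropic oscillator with two degrees of freedom.
H = lambda q, p: 0.5 * (p[0] ** 2 + p[1] ** 2) + 0.5 * (q[0] ** 2 + 4 * q[1] ** 2)
coord_q0 = lambda q, p: q[0]
coord_p0 = lambda q, p: p[0]
coord_p1 = lambda q, p: p[1]

point_q, point_p = [0.3, -0.7], [1.1, 0.4]
b_q0_p0 = poisson(coord_q0, coord_p0, point_q, point_p)   # conjugate pair
b_q0_p1 = poisson(coord_q0, coord_p1, point_q, point_p)   # different pairs
qdot0 = poisson(coord_q0, H, point_q, point_p)            # {q, H} = dH/dp
pdot0 = poisson(coord_p0, H, point_q, point_p)            # {p, H} = -dH/dq
print(b_q0_p0, b_q0_p1, qdot0, pdot0)
```

The last two values reproduce the right-hand sides of the canonical equations at the chosen point.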

12.1. Constants of motion. If we have a constant of motion for our given system then the constant of motion will commute with the Hamiltonian under the Poisson bracket. Suppose some function f(q, p) is a constant of motion. This implies that if p(t), q(t) is a trajectory or solution to the canonical equations, then one has that

\[ \frac{df}{dt} = 0 \]

along that trajectory. Then one has

\[ 0 = \frac{d}{dt}f(q, p) = \{f, H\}, \]

where, as above, the intermediate step follows by applying the canonical equations. If the Poisson bracket of f and g vanishes ($\{f, g\} = 0$), then f and g are said to be in involution.
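As an illustration, the angular momentum $M = q_1 p_2 - q_2 p_1$ in Cartesian coordinates can be tested against two Hamiltonians. Everything specific in this sketch is an assumption made for the example: the central potential −1/|q|, the anisotropic potential $\frac{1}{2}(q_1^2 + 4q_2^2)$, the evaluation point, and the finite-difference helper `poisson`. For the anisotropic case a direct calculation gives $\{M, H\} = -3 q_1 q_2$.

```python
import math

def poisson(f, g, q, p, h=1e-5):
    """{f, g} in canonical coordinates, with central-difference derivatives."""
    def d(fun, which, i):
        q_plus, q_minus = list(q), list(q)
        p_plus, p_minus = list(p), list(p)
        if which == "q":
            q_plus[i] += h
            q_minus[i] -= h
        else:
            p_plus[i] += h
            p_minus[i] -= h
        return (fun(q_plus, p_plus) - fun(q_minus, p_minus)) / (2 * h)
    return sum(d(f, "q", i) * d(g, "p", i) - d(f, "p", i) * d(g, "q", i)
               for i in range(len(q)))

M = lambda q, p: q[0] * p[1] - q[1] * p[0]          # angular momentum
H_central = lambda q, p: (0.5 * (p[0] ** 2 + p[1] ** 2)
                          - 1.0 / math.hypot(q[0], q[1]))
H_aniso = lambda q, p: (0.5 * (p[0] ** 2 + p[1] ** 2)
                        + 0.5 * (q[0] ** 2 + 4 * q[1] ** 2))

point_q, point_p = [0.3, -0.7], [1.1, 0.4]
central = poisson(M, H_central, point_q, point_p)   # vanishes: M is conserved
aniso = poisson(M, H_aniso, point_q, point_p)       # nonzero: M is not conserved
print(central, aniso)
```

In the central case M and H are in involution, while breaking the rotational symmetry makes the bracket nonzero.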

12.2. The Poisson bracket in coordinate-free language. Let M be a symplectic manifold, that is, a manifold equipped with a symplectic form: a 2-form ω which is both closed (i.e., its exterior derivative dω = 0) and non-degenerate. For example, in the treatment above, take M to be $\mathbb{R}^{2n}$ and take

\[ \omega = \sum_{i=1}^{n} dq_i\wedge dp_i. \]

If $\iota_v\omega$ denotes the contraction defined by $(\iota_v\omega)(w) = \omega(v, w)$, then non-degeneracy is equivalent to saying that for every one-form α there is a unique vector field $\Omega_\alpha$ such that $\iota_{\Omega_\alpha}\omega = \alpha$. Then if H is a smooth function on M, the Hamiltonian vector field $X_H$ can be defined to be $\Omega_{dH}$. It is easy to see that

\[ X_{p_i} = \frac{\partial}{\partial q_i}, \qquad X_{q_i} = -\frac{\partial}{\partial p_i}. \]

The Poisson bracket $\{\cdot, \cdot\}$ on (M, ω) is a bilinear operator on differentiable functions, defined by

\[ \{f, g\} = \omega(X_f, X_g). \]

The Poisson bracket of two functions on M is itself a function on M. The Poisson bracket is antisymmetric because

\[ \{f, g\} = \omega(X_f, X_g) = -\omega(X_g, X_f) = -\{g, f\}. \]

Furthermore,

\[ \{f, g\} = \omega(X_f, X_g) = \omega(\Omega_{df}, X_g) = (\iota_{\Omega_{df}}\omega)(X_g) = df(X_g) = X_g f = L_{X_g} f. \]

Here $X_g f$ denotes the vector field $X_g$ applied to the function f as a directional derivative, and $L_{X_g} f$ denotes the (entirely equivalent) Lie derivative of the function f. If α is an arbitrary one-form on M, the vector field $\Omega_\alpha$ generates (at least locally) a flow $\varphi_x(t)$ satisfying the boundary condition $\varphi_x(0) = x$ and the first-order differential equation

\[ \frac{d\varphi_x}{dt} = \Omega_\alpha\big|_{\varphi_x(t)}. \]

The $\varphi_x(t)$ will be symplectomorphisms (canonical transformations) for every t as a function of x if and only if $L_{\Omega_\alpha}\omega = 0$; when this is true, $\Omega_\alpha$ is called a symplectic vector field. Recalling Cartan’s identity $L_X\omega = \iota_X(d\omega) + d(\iota_X\omega)$ and dω = 0, it follows that

\[ L_{\Omega_\alpha}\omega = d(\iota_{\Omega_\alpha}\omega) = d\alpha. \]

Therefore $\Omega_\alpha$ is a symplectic vector field if and only if α is a closed form. Since $d(df) = d^2 f = 0$, it follows that every Hamiltonian vector field $X_f$ is a symplectic vector field, and that the Hamiltonian flow consists of canonical transformations. From above, under the Hamiltonian flow of $X_H$,

\[ \frac{d}{dt}f(\varphi_x(t)) = X_H f = \{f, H\}. \]

This is a fundamental result in Hamiltonian mechanics, governing the time evolution of functions defined on phase space. As noted above, when $\{f, H\} = 0$, f is a constant of motion of the system. In addition, in canonical coordinates (with $\{p_i, p_j\} = \{q_i, q_j\} = 0$ and $\{q_i, p_j\} = \delta_{ij}$), the canonical equations for the time evolution of the system follow immediately from this formula.

It also follows that the Poisson bracket is a derivation in both arguments; that is, it satisfies Leibniz’s product rule:

\[ \{fg, h\} = f\{g, h\} + g\{f, h\} \quad\text{and}\quad \{f, gh\} = g\{f, h\} + h\{f, g\}. \]

The Poisson bracket is intimately connected to the Lie bracket of the Hamiltonian vector fields. Because the Lie derivative is a derivation,

\[ L_v\iota_w\omega = \iota_{L_v w}\omega + \iota_w L_v\omega = \iota_{[v,w]}\omega + \iota_w L_v\omega. \]

Thus if v and w are symplectic, using $L_v\omega = 0$, Cartan’s identity and the fact that $\iota_w\omega$ is a closed form,

\[ \iota_{[v,w]}\omega = L_v\iota_w\omega = d(\iota_v\iota_w\omega) + \iota_v d(\iota_w\omega) = d(\iota_v\iota_w\omega) = d(\omega(w, v)). \]

It follows that $[v, w] = X_{\omega(w,v)}$, so that

\[ [X_f, X_g] = X_{\omega(X_g, X_f)} = -X_{\omega(X_f, X_g)} = -X_{\{f, g\}}. \]

Thus, the Poisson bracket on functions corresponds to the Lie bracket of the associatedHamiltonian vector field. We have also shown that the Lie bracket of two symplecticvector fields is a Hamiltonian vector field and hence is also symplectic. In the languageof abstract algebra, the symplectic vector fields form a sub algebra of the Lie algebra ofsmooth vector fields on M , and the Hamiltonian vector fields form an ideal of this subalgebra. The symplectic vector fields are the Lie algebra of the (infinite-dimensional) Liegroup of symplectomorphisms of M .

It is widely asserted that the Jacobi identity for the Poisson bracket,

\[ \{f, \{g, h\}\} + \{g, \{h, f\}\} + \{h, \{f, g\}\} = 0, \]

follows from the corresponding identity for the Lie bracket of vector fields, but this is true only up to a locally constant function. However, to prove the Jacobi identity, it is sufficient to show that

\[ \mathrm{ad}_{\{g,f\}} = [\mathrm{ad}_f, \mathrm{ad}_g], \]

where the operator $\mathrm{ad}_g$ on smooth functions on $M$ is defined by $\mathrm{ad}_g(\cdot) = \{\cdot, g\}$ and the bracket on the right-hand side is the commutator of operators, $[A, B] = AB - BA$. The operator $\mathrm{ad}_g$ is equal to the operator $X_g$. The proof of the Jacobi identity follows from the equation

\[ [X_f, X_g] = X_{\omega(X_g, X_f)} = -X_{\omega(X_f, X_g)} = -X_{\{f,g\}} = X_{\{g,f\}}, \]

because the Lie bracket of vector fields is just their commutator as differential operators.

The algebra of smooth functions on $M$, together with the Poisson bracket, forms a Poisson algebra, because it is a Lie algebra under the Poisson bracket, which additionally satisfies the Leibniz rule. We have shown that every symplectic manifold is a Poisson manifold, that is, a manifold with a "curly-bracket" operator on smooth functions such that the smooth functions form a Poisson algebra. However, not every Poisson manifold arises in this way, because Poisson manifolds allow for degeneracy which cannot arise in the symplectic case. Note also that the Poisson bracket of two constants of motion is again a constant of motion.

Part 4. Symplectic Integrators

13. Introduction

The Hamiltonian flow preserves the symplectic structure; an approximation of it, such as one used for numerical integration, in general does not, and can therefore be expected to miss some important features of the system under study. In this part, we briefly recall how to improve the Euler method in such a way that the symplectic structure is preserved.

14. The Euler method

Let $X$ be a vector field on an open subset $O$ of $\mathbb{R}^n$ and $\phi_{X,t}$ its flow (at time $t$). Recall that $\phi_{X,t}(x_0)$, $x_0 \in O$, is by definition the evaluation at time $t$ of the unique solution to the Cauchy problem

\[ \dot x(t) = X(x(t)), \qquad x(0) = x_0. \tag{15} \]

The time $t$ must belong to the maximal interval of definition of the solution, which in turn in general depends on $x_0$. By uniqueness we have the fundamental property

\[ \phi_{X,t} \circ \phi_{X,s} = \phi_{X,t+s}, \]

for $t$, $s$ and $t+s$ in the maximal interval. This implies that, for $t$ in the maximal interval and any positive integer $N$, we have

\[ \phi_{X,t} = \underbrace{\phi_{X,\frac{t}{N}} \circ \cdots \circ \phi_{X,\frac{t}{N}}}_{N \text{ times}}. \]

The Euler method consists in approximating $\phi_{X,\tau}$, $\tau = \frac{t}{N}$, by a truncation of its Taylor expansion in $\tau$. Let us work it out up to $O(\tau^2)$. A path $x(t)$ may be expanded around $t = 0$ as

\[ x(\tau) = x(0) + \tau\,\dot x(0) + O(\tau^2). \]

If it is the solution to (15), we then have

\[ x(\tau) = x_0 + \tau X(x_0) + O(\tau^2). \]

This yields

\[ \phi_{X,\tau}(x_0) = x_0 + \tau X(x_0) + O(\tau^2). \tag{16} \]

The Euler method, at this order, consists in replacing $\phi_{X,\tau}$ by its truncation

\[ \widehat\phi_{X,\tau}(x_0) := x_0 + \tau X(x_0), \]

getting the approximate solution

\[ \phi^{\mathrm{Euler}}_{X,t} := \underbrace{\widehat\phi_{X,\frac{t}{N}} \circ \cdots \circ \widehat\phi_{X,\frac{t}{N}}}_{N \text{ times}}. \]
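The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`euler_flow`, `rotation_field`, `exact_flow`, `euler_error`) and the test vector field $X(x^1, x^2) = (x^2, -x^1)$, whose exact flow is a rotation, are chosen here and do not come from the lecture.

```python
import math

def euler_flow(X, x0, t, N):
    """Approximate the flow of X at time t by N truncated steps x -> x + tau*X(x)."""
    tau = t / N
    x = list(x0)
    for _ in range(N):
        v = X(x)
        x = [xi + tau * vi for xi, vi in zip(x, v)]
    return x

def rotation_field(x):
    # Assumed test field X(x1, x2) = (x2, -x1); its exact flow is a rotation.
    return [x[1], -x[0]]

def exact_flow(x0, t):
    # Exact solution of x1' = x2, x2' = -x1 with initial value x0.
    c, s = math.cos(t), math.sin(t)
    return [c * x0[0] + s * x0[1], -s * x0[0] + c * x0[1]]

def euler_error(N, t=1.0, x0=(1.0, 0.0)):
    # Distance between the Euler approximation and the exact flow at time t.
    approx = euler_flow(rotation_field, x0, t, N)
    exact = exact_flow(x0, t)
    return math.hypot(approx[0] - exact[0], approx[1] - exact[1])
```

Comparing `euler_error(100)` with `euler_error(200)` shows the error roughly halving when $N$ doubles, as one expects from a method whose single step is correct only up to $O(\tau^2)$.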

15. Hamiltonian systems

Let $H$ be a Hamiltonian function on $U \times \mathbb{R}^d \subset \mathbb{R}^{2d}$, with $U$ an open subset of $\mathbb{R}^d$ and with the canonical symplectic structure, and let $X_H$ be its Hamiltonian vector field. We want to compute its flow $\phi_{X_H,t}$. (To match the notation of the previous section, we set $n = 2d$ and $O = U \times \mathbb{R}^d$.) The problem is that in general the truncation $\widehat\phi_{X_H,\frac{t}{N}}$ does not preserve the symplectic form, nor does the ensuing Euler approximation $\phi^{\mathrm{Euler}}_{X_H,t}$.

The idea of a symplectic integrator, in this context, consists in choosing a different approximation of $\phi_{X_H,\tau}$ that is equal to $\phi_{X_H,\tau}$ up to $O(\tau^2)$ but preserves the symplectic form.

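Suppose that $H = H_1 + H_2$ (we will see below that this is interesting for practical purposes if we can compute the Hamiltonian flows of $H_1$ and $H_2$ exactly). We then have $X_H = X_{H_1} + X_{H_2}$. By (16) we have

\[ \phi_{X_{H_1},\tau}(x_0) = x_0 + \tau X_{H_1}(x_0) + O(\tau^2), \]

\[ \phi_{X_{H_2},\tau}(x_0) = x_0 + \tau X_{H_2}(x_0) + O(\tau^2). \]

This implies

\[ (\phi_{X_{H_1},\tau} \circ \phi_{X_{H_2},\tau})(x_0) = x_0 + \tau\big(X_{H_1}(x_0) + X_{H_2}(x_0)\big) + O(\tau^2) = x_0 + \tau X_H(x_0) + O(\tau^2), \tag{17} \]

so

\[ \phi_{X_H,\tau} = \phi_{X_{H_1},\tau} \circ \phi_{X_{H_2},\tau} + O(\tau^2). \]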


The idea is now to replace $\phi_{X_H,\tau}$ by

\[ \widetilde\phi_{X_H,\tau} := \phi_{X_{H_1},\tau} \circ \phi_{X_{H_2},\tau}, \tag{18} \]

getting the approximate solution

\[ \phi^{\mathrm{SI}}_{X_H,t} := \underbrace{\widetilde\phi_{X_H,\frac{t}{N}} \circ \cdots \circ \widetilde\phi_{X_H,\frac{t}{N}}}_{N \text{ times}}. \tag{19} \]

Notice that $\phi_{X_{H_1},\tau}$ and $\phi_{X_{H_2},\tau}$ preserve the symplectic structure and hence so does $\phi^{\mathrm{SI}}_{X_H,t}$.

The method is applicable if we can compute $\phi_{X_{H_1},\tau}$ and $\phi_{X_{H_2},\tau}$ exactly. The typical case is that of a Hamiltonian of the form $H(q, p) = T(p) + U(q)$. In this case, we set $H_1(q, p) = T(p)$ and $H_2(q, p) = U(q)$.³ To be even more specific (even though this is not needed), let us assume that $T(p) = \frac{\|p\|^2}{2m}$. Let us now compute the corresponding flows. In the case of $H_1$ we have to solve

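\[ \dot q = \frac{p}{m}, \qquad \dot p = 0, \]

with initial conditions at $t = 0$ given by $q_0$ and $p_0$. This yields

\[ q(\tau) = q_0 + \tau\,\frac{p_0}{m}, \qquad p(\tau) = p_0. \]

Hence

\[ \phi_{X_{H_1},\tau}(q_0, p_0) = \Big(q_0 + \tau\,\frac{p_0}{m},\; p_0\Big). \]

In the case of $H_2$ we have to solve

\[ \dot q = 0, \qquad \dot p = -\nabla U(q), \]

with the same initial conditions. This yields

\[ q(\tau) = q_0, \qquad p(\tau) = p_0 - \tau\nabla U(q_0). \]

Therefore,

\[ \phi_{X_{H_2},\tau}(q_0, p_0) = \big(q_0,\; p_0 - \tau\nabla U(q_0)\big). \]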

We finally have by (18) that

\[ \widetilde\phi_{X_H,\tau}(q_0, p_0) = \Big(q_0 + \tau\,\frac{p_0 - \tau\nabla U(q_0)}{m},\; p_0 - \tau\nabla U(q_0)\Big). \tag{20} \]

Notice that this agrees up to $O(\tau^2)$ with the first-order Taylor approximation

\[ \widehat\phi_{X_H,\tau}(q_0, p_0) = \Big(q_0 + \tau\,\frac{p_0}{m},\; p_0 - \tau\nabla U(q_0)\Big) \]

but is different from the second-order Taylor approximation of $\phi_{X_H,\tau}$ (which in general does not preserve the symplectic structure).
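The difference between the two maps can be checked by hand in the simplest case. The following is a sketch for $d = 1$, where preserving the symplectic form $dp \wedge dq$ amounts to the Jacobian determinant being equal to $1$. Here $\widetilde\phi_{X_H,\tau}$ denotes the splitting map (20) and $\widehat\phi_{X_H,\tau}$ the first-order Taylor truncation; both Jacobians are taken with respect to $(q_0, p_0)$.

```latex
\begin{align*}
D\widetilde\phi_{X_H,\tau} &=
\begin{pmatrix} 1 - \frac{\tau^2}{m}U''(q_0) & \frac{\tau}{m}\\[2pt] -\tau U''(q_0) & 1\end{pmatrix},
& \det D\widetilde\phi_{X_H,\tau} &= 1,\\
D\widehat\phi_{X_H,\tau} &=
\begin{pmatrix} 1 & \frac{\tau}{m}\\[2pt] -\tau U''(q_0) & 1\end{pmatrix},
& \det D\widehat\phi_{X_H,\tau} &= 1 + \frac{\tau^2}{m}U''(q_0).
\end{align*}
```

So the truncated Taylor step preserves the symplectic form only where $U''(q_0) = 0$, whereas the splitting step preserves it for every $U$.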

If we had chosen instead $H_1(q, p) = U(q)$ and $H_2(q, p) = T(p) = \frac{\|p\|^2}{2m}$, we would have got

\[ \widetilde\phi_{X_H,\tau}(q_0, p_0) = \Big(q_0 + \tau\,\frac{p_0}{m},\; p_0 - \tau\nabla U\Big(q_0 + \tau\,\frac{p_0}{m}\Big)\Big), \]

which yields a different approximation that also preserves the symplectic structure.

³ Observe that the Hamiltonian $H_2$ is not convex as a function of $p$, so it certainly does not arise as the Legendre transform of a Lagrangian. This shows one more reason why the Hamiltonian formalism is often preferable.
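To see the practical effect, the sketch below compares the first-order Taylor step with the splitting step (20) in one dimension. The test problem (a pendulum with $U(q) = -\cos q$, $m = 1$), the step size, the initial condition and all function names are assumptions made here for illustration, not part of the lecture.

```python
import math

def explicit_euler_step(q, p, tau, grad_U, m=1.0):
    # Truncated Taylor map: (q + tau*p/m, p - tau*grad U(q)).
    return q + tau * p / m, p - tau * grad_U(q)

def symplectic_step(q, p, tau, grad_U, m=1.0):
    # Composition (20): first the flow of H2 = U(q), then the flow of H1 = T(p).
    p_new = p - tau * grad_U(q)
    q_new = q + tau * p_new / m
    return q_new, p_new

def energy_drift(step, steps=10000, tau=0.01):
    # Assumed test problem: pendulum, U(q) = -cos(q), m = 1, started at rest at q = 1.
    grad_U = math.sin
    H = lambda q, p: 0.5 * p * p - math.cos(q)
    q, p = 1.0, 0.0
    h0 = H(q, p)
    worst = 0.0
    for _ in range(steps):
        q, p = step(q, p, tau, grad_U)
        worst = max(worst, abs(H(q, p) - h0))
    return worst  # largest deviation of the energy from its initial value
```

With these choices, `energy_drift(symplectic_step)` stays small over the whole run, while `energy_drift(explicit_euler_step)` grows steadily. The energy is not exactly conserved by either scheme; the symplectic one merely keeps the error bounded in this experiment.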

Part 5. The Noether Theorem

16. Introduction

A symmetry is an invertible transformation that preserves some properties. In geometry, e.g., one considers transformations that preserve "shape" (in Euclidean geometry this leads to translations, rotations and reflections as symmetries). In mechanical systems, symmetries are transformations that preserve the equations of motion. One also considers stricter symmetries that preserve more, e.g., the action functional.

Symmetry transformations often arise as flows of vector fields. The latter are then called infinitesimal symmetries. A fundamental result in mechanics is Noether's theorem, which relates infinitesimal symmetries to constants of motion.

17. Symmetries in Lagrangian mechanics

Let $J_L$ be the action functional associated to a Lagrangian function $L$ on $U \times \mathbb{R}^d \times I$:

\[ J_L[x] = \int_a^b L(x(t), \dot x(t), t)\,dt, \]

where $x$ is a path $[a, b] \to U$, $[a, b] \subset I$.

Definition 17.1 (Symmetry). A symmetry, in a strict sense, is a diffeomorphism $\phi$ of $U$ such that

\[ J_L[\phi \circ x] = J_L[x] \tag{21} \]

for each path $x$ on $U$.

Notice that in particular a symmetry maps extremal paths (with end points $x_a$, $x_b$) to extremal paths (with end points $\phi(x_a)$, $\phi(x_b)$).

Definition 17.2 (Infinitesimal symmetry). A vector field $X$ on $U$ is called an infinitesimal symmetry if its flow $\phi_s$ is a symmetry for all $s$ (where it is defined). We then have

\[ 0 = \frac{\partial}{\partial s}\Big|_{s=0} J_L[\phi_s \circ x] = \frac{\delta J_L}{\delta x}[x, X], \]

where we regard the path $t \mapsto X(x(t))$ as a variation.

Recall the general formula

\[ \frac{\delta J_L}{\delta x}[x, \delta x] = EL_L[x, \delta x] + \Big(\sum_{i=1}^d \frac{\partial L}{\partial v^i}\,\delta x^i\Big)\Big|_a^b, \]

with

\[ EL_L[x, \delta x] = \int_a^b \sum_{i=1}^d \Big(\frac{\partial L}{\partial q^i}(x(t), \dot x(t), t) - \frac{d}{dt}\frac{\partial L}{\partial v^i}(x(t), \dot x(t), t)\Big)\,\delta x^i(t)\,dt. \]

If $X$ is an infinitesimal symmetry and $x$ is an extremal path (i.e., it satisfies the Euler–Lagrange equations), then we get

\[ 0 = \Big(\sum_{i=1}^d \frac{\partial L}{\partial v^i}\,X^i\Big)\Big|_a^b. \]

This shows that the quantity in brackets is the same at the initial time $a$ and at the final time $b$. Since the choice of the end times was irrelevant for the derivation, this shows that the quantity in brackets does not change and is therefore a constant of motion.⁴ We can summarize this as follows:

Definition 17.3 (Noether 1-form). The Noether 1-form associated to the Lagrangian $L$ is the 1-form $\alpha_L$ on $U \times \mathbb{R}^d \times I$ defined by

\[ \alpha_L := \sum_{i=1}^d \frac{\partial L}{\partial v^i}(q, v, t)\,dq^i. \]

Theorem 17.1 (Noether's Theorem). If $X$ is an infinitesimal symmetry of $J_L$, then

\[ I_X := \iota_X\alpha_L \]

is a constant of motion.

17.1. Symmetries and the Lagrangian function. A symmetry of $J_L$ may actually be recognized directly on $L$.

Definition 17.4 (Tangent lift of a diffeomorphism). The tangent lift $\widehat\phi$ of $\phi$ is given by

\[ \widehat\phi\colon U \times \mathbb{R}^d \times [a, b] \to U \times \mathbb{R}^d \times [a, b], \qquad (q, v, t) \mapsto (\phi(q), d\phi(q)v, t). \]

Lemma 17.2. A diffeomorphism $\phi$ of $U$ is a symmetry of $J_L$, as expressed by equation (21), if and only if $L \circ \widehat\phi = L$.

Proof. Recall that we observed that for any map $\phi$ we have $J_L[\phi \circ x] = J_{L \circ \widehat\phi}[x]$ (actually we proved this in class). This immediately shows that $\phi$ is a symmetry if $L \circ \widehat\phi = L$.

On the other hand, assume that $\phi$ is a symmetry. Then, by the above observation, we have that $J_{L \circ \widehat\phi}[x] = J_L[x]$ for every path $x$. If we define $\widetilde L := L \circ \widehat\phi - L$, we then have $J_{\widetilde L}[x] = 0$ for every path $x$. We want to show that $\widetilde L$ vanishes identically. Assume on the contrary that there is a point $(q, v, \tau)$ in $U \times \mathbb{R}^d \times I$ such that $\widetilde L(q, v, \tau) \neq 0$. Then consider the path $x(t) = q + (t - \tau)v$ on the interval $[\tau, \tau + \varepsilon]$. We then have

\[ J_{\widetilde L}[x] = \int_\tau^{\tau+\varepsilon} \widetilde L(q + (t - \tau)v, v, t)\,dt = \widetilde L(q, v, \tau)\,\varepsilon + O(\varepsilon^2). \]

By choosing $\varepsilon$ small enough, we then see that $J_{\widetilde L}[x]$ cannot vanish for all paths, and this is a contradiction.

⁴ Recall that a function $f$ on $U \times \mathbb{R}^d \times I$ is called a constant of motion (a.k.a. conserved quantity or first integral) if $f(x(t), \dot x(t), t)$ is constant in $t$ for every solution $x$ of the Euler–Lagrange equations.

We now want to move on to infinitesimal symmetries. First we need the following

Lemma 17.3 (Tangent lift of a vector field). Let $X$ be a vector field on $U$ and $\phi_s$ its flow. Then the tangent lift $\widehat\phi_s$ is the flow of the vector field $\widehat X$ on $U \times \mathbb{R}^d \times I$ given by

\[ \widehat X(q, v, t) = \sum_{i=1}^d X^i(q)\,\frac{\partial}{\partial q^i} + \sum_{i,j=1}^d \frac{\partial X^i}{\partial q^j}(q)\,v^j\,\frac{\partial}{\partial v^i}, \]

called the tangent lift of $X$.

Notice that the tangent lift of $X$ defined above does not depend on $t$ and does not have a component in the $t$ direction, so it can be regarded (and this is usually done) as a vector field on $U \times \mathbb{R}^d$.

Proof. From $\phi_{s+s'} = \phi_s \circ \phi_{s'}$ and $\phi_0 = \mathrm{id}$, we get, by the definition of the tangent lift, that $\widehat\phi_{s+s'} = \widehat\phi_s \circ \widehat\phi_{s'}$ and $\widehat\phi_0 = \mathrm{id}$. So $\widehat\phi_s$ is also a flow. To compute the corresponding vector field, we just have to differentiate $\widehat\phi_s$ with respect to $s$ at $s = 0$, and this gives the explicit expression for $\widehat X$ in the Lemma.

We now have the

Corollary 1. A vector field $X$ is an infinitesimal symmetry of $J_L$ if and only if

\[ \widehat X(L) = 0. \]

17.2. Examples.

Example 17.1. Suppose that the system is invariant under translations in the $i$th direction, i.e., $X_i = \frac{\partial}{\partial q^i}$ is an infinitesimal symmetry. This implies that the $i$th component of the generalized momentum,

\[ p_i = \frac{\partial L}{\partial v^i} = \iota_{X_i}\alpha_L, \]

is a constant of motion. Since $\widehat X_i = \frac{\partial}{\partial q^i}$, we see that this occurs if and only if $\frac{\partial L}{\partial q^i} = 0$.

Example 17.2. Suppose that $d = 3$ and the system is invariant under rotations around the $i$th axis. In this case, the infinitesimal symmetry is given by the vector field $X_i(q) = \sum_{j,k=1}^3 \epsilon_{ijk}\,q^j\,\frac{\partial}{\partial q^k}$. The corresponding constant of motion is

\[ J^i = \iota_{X_i}\alpha_L = \sum_{j,k=1}^3 \epsilon_{ijk}\,q^j\,\frac{\partial L}{\partial v^k}, \]

which is called the (generalized) angular momentum. If we denote $\frac{\partial L}{\partial v^k}$ by $p_k$, we then have $J^i = (\mathbf{q} \times \mathbf{p})^i$. One can easily compute

\[ \widehat X_i = \sum_{j,k=1}^3 \epsilon_{ijk}\,q^j\,\frac{\partial}{\partial q^k} + \sum_{j,k=1}^3 \epsilon_{ijk}\,v^j\,\frac{\partial}{\partial v^k}. \]

This describes an infinitesimal rotation acting simultaneously on the $q$-space and on the $v$-space. A Lagrangian $L$ of the form $L(\mathbf{q}, \mathbf{v}, t) = T(\mathbf{v}) - U(\mathbf{q})$ is then invariant if both functions $T$ and $U$ are invariant under rotations (around the $i$th axis). If $T$ and $U$ are invariant under the whole group $SO(3)$, then all components of the vector $\mathbf{J} = \mathbf{q} \times \mathbf{p}$ are conserved. Finally, if $T(\mathbf{v}) = \frac{1}{2}m\|\mathbf{v}\|^2$, we then have $\mathbf{J} = \mathbf{q} \times m\mathbf{v}$, which is the usual expression for the angular momentum.
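A small numerical experiment can make this conservation law concrete. The sketch below integrates $\dot{\mathbf q} = \mathbf p/m$, $\dot{\mathbf p} = -\nabla U(\mathbf q)$ with the splitting step of equation (20) and tracks $\mathbf J = \mathbf q \times \mathbf p$. The potentials, initial data, step size and function names are assumptions chosen here for illustration. For a rotation-invariant $U$ the gradient is parallel to $\mathbf q$, so this particular step leaves $\mathbf q \times \mathbf p$ unchanged up to round-off.

```python
def cross(a, b):
    # Cross product of two 3-vectors given as tuples.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def grad_central(q):
    # Gradient of the assumed rotation-invariant potential U(q) = ||q||^4 / 4.
    r2 = sum(x * x for x in q)
    return tuple(r2 * x for x in q)

def grad_tilted(q, eps=0.1):
    # Gradient of U(q) + eps * q^1, which is no longer rotation invariant.
    g = grad_central(q)
    return (g[0] + eps, g[1], g[2])

def angular_momentum_drift(grad_U, steps=5000, tau=0.001, m=1.0):
    q, p = (1.0, 0.0, 0.2), (0.0, 0.8, 0.1)
    J0 = cross(q, p)
    worst = 0.0
    for _ in range(steps):
        # Splitting step (20): update p with the old q, then q with the new p.
        g = grad_U(q)
        p = tuple(pi - tau * gi for pi, gi in zip(p, g))
        q = tuple(qi + tau * pi / m for qi, pi in zip(q, p))
        J = cross(q, p)
        worst = max(worst, max(abs(a - b) for a, b in zip(J, J0)))
    return worst  # largest deviation of any component of J from its initial value
```

Calling `angular_momentum_drift(grad_central)` returns a value at round-off level, while `angular_momentum_drift(grad_tilted)` is visibly nonzero, reflecting the loss of the rotational symmetry.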

17.3. Generalized symmetries. Condition (21) is a bit too strong since it involves the action functional also away from extremal paths. One way to weaken it, in such a way that a symmetry still preserves the Euler–Lagrange equations, is to assume that the restriction of

\[ J_{L,\phi}[x] := J_L[\phi \circ x] - J_L[x] \]

to each path space $P^{[a,b]}_{x_a,x_b} := \{x\colon [a, b] \to U : x(a) = x_a,\ x(b) = x_b\}$ is constant. This can also be characterized as in the following

Lemma 17.4. Assume that the restriction of $J_{L,\phi}$ to each $P^{[a,b]}_{x_a,x_b}$ is constant. Then there is a function $F$ on $U$ such that

\[ J_{L,\phi}[x] = F(x_b) - F(x_a) \]

for each $x \in P^{[a,b]}_{x_a,x_b}$, for each $a$, $b$, $x_a$, $x_b$.

Remark 17.1. Notice that the function $F$ is defined up to an additive constant on each connected component of $U$.

Proof. Let us first consider the case when $U$ is connected. Fix a point $y$ in $U$. Then $J_{L,\phi}$ will take the same value on all paths in $P^{[a,b]}_{y,x_b}$. We then define $F(x_b)$ as this value.

Next we want to show that $F(y) = 0$. For $x_b = y$, we also have the constant path $x(t) = y$ for all $t \in [a, b]$ at our disposal. This can be considered for arbitrary $a$ and $b$. Since $J_{L,\phi}$ is defined as an integral, its value vanishes if $b$ tends to $a$. So $F(y) = 0$.

Consider now a path $x$ in $P^{[a,b]}_{x_a,y}$ with the property that all its derivatives at $t = a$ vanish. Next define $\bar x \in P^{[a,b]}_{y,x_a}$ by $\bar x(t) = x(a + b - t)$. Finally define $\widetilde x \in P^{[a,b]}_{y,y}$ by

\[ \widetilde x(t) = \begin{cases} \bar x(2t - a) & \text{if } t \in \big[a, \tfrac{a+b}{2}\big],\\[2pt] x(2t - b) & \text{if } t \in \big[\tfrac{a+b}{2}, b\big]. \end{cases} \]

Notice that the vanishing condition on the derivatives of $x$ at its initial point makes this path smooth. We have $J_{L,\phi}[\widetilde x] = F(y) = 0$. On the other hand, since $J_{L,\phi}$ is defined as an integral, we have $J_{L,\phi}[\widetilde x] = J_{L,\phi}[\bar x] + J_{L,\phi}[x]$. So $J_{L,\phi}[x] = -J_{L,\phi}[\bar x] = -F(x_a)$, since $\bar x \in P^{[a,b]}_{y,x_a}$. By the constancy of $J_{L,\phi}$ we conclude that $J_{L,\phi}[x] = -F(x_a)$ for every $x \in P^{[a,b]}_{x_a,y}$.

Finally consider a path $x$ in $P^{[a,b]}_{x_a,x_b}$ that passes through $y$ at some time $\tau \in (a, b)$. Denote by $x_1$ the restriction of this path to the interval $[a, \tau]$ and by $x_2$ its restriction to $[\tau, b]$. Since $J_{L,\phi}$ is defined as an integral, we have $J_{L,\phi}[x] = J_{L,\phi}[x_1] + J_{L,\phi}[x_2]$. Since $x_2 \in P^{[\tau,b]}_{y,x_b}$, we know that $J_{L,\phi}[x_2] = F(x_b)$ and, since $x_1 \in P^{[a,\tau]}_{x_a,y}$, we know that $J_{L,\phi}[x_1] = -F(x_a)$. We conclude that $J_{L,\phi}[x] = F(x_b) - F(x_a)$.

If $U$ is not connected, we just apply the above procedure to each connected component.


We then define a generalized symmetry as a pair $(\phi, F)$, where $\phi$ is a diffeomorphism of $U$ and $F$ is a function on $U$, such that

\[ J_L[\phi \circ x] = J_L[x] + F(x_b) - F(x_a) \]

for each $x \in P^{[a,b]}_{x_a,x_b}$, for each $a$, $b$, $x_a$, $x_b$. Notice that we can write

\[ F(x_b) - F(x_a) = \int_a^b \frac{d}{dt}F(x(t))\,dt = \int_a^b \sum_{i=1}^d \dot x^i(t)\,\frac{\partial F}{\partial q^i}(x(t))\,dt. \]

If we define $\widehat F(q, v) := \sum_{i=1}^d v^i\,\frac{\partial F}{\partial q^i}(q)$, then, by repeating the arguments in the proof of Lemma 17.2, we conclude that $(\phi, F)$ is a generalized symmetry if and only if

\[ L \circ \widehat\phi = L + \widehat F. \tag{22} \]

The infinitesimal version of a generalized symmetry is a vector field $X$ such that its flow $\phi_s$ together with a family $F_s$ of functions is a generalized symmetry for all $s$ (where it is defined):

\[ J_L[\phi_s \circ x] = J_L[x] + F_s(x_b) - F_s(x_a) \tag{23} \]

for all $s$. Define $f = \frac{\partial F_s}{\partial s}\big|_{s=0}$. We then say that the pair $(X, f)$ defines a generalized infinitesimal symmetry. By differentiating equations (22) and (23) with respect to $s$ at $s = 0$ we get the following

Proposition 17.5. The pair $(X, f)$ defines a generalized infinitesimal symmetry of $J_L$ if and only if

\[ \widehat X(L) = \widehat f \]

with $\widehat f(q, v) := \sum_{i=1}^d v^i\,\frac{\partial f}{\partial q^i}(q)$. In this case

\[ I_X := \iota_X\alpha_L - f \]

is a constant of motion.

Remark 17.2. The even more general case when the restriction of $J_{L,\phi}$ to each path space $P^{[a,b]}_{x_a,x_b}$ is only locally constant⁵ can be treated with a bit more work. We state the results for completeness. A diffeomorphism $\phi$ of $U$ has this property if and only if there is a closed 1-form $\Theta = \sum_{i=1}^d \Theta_i\,dq^i$ such that

\[ J_L[\phi \circ x] = J_L[x] + \int_x \Theta, \]

and this occurs if and only if $L \circ \widehat\phi = L + \widehat\Theta$ with $\widehat\Theta(q, v) = \sum_{i=1}^d v^i\,\Theta_i(q)$. A vector field $X$ defines the infinitesimal version of this if and only if there is a closed 1-form $\theta = \sum_{i=1}^d \theta_i\,dq^i$ such that $\widehat X(L) = \widehat\theta$ with $\widehat\theta(q, v) = \sum_{i=1}^d v^i\,\theta_i(q)$. This does not lead to a conservation law if $\theta$ is not exact, but only to the statement that the closed 1-form $d(\iota_X\alpha_L) - \theta$ integrates to zero along every orbit. Notice that if the closed form $\theta$ is exact, $\theta = df$, then we are in the case described above and we have indeed a constant of motion as in Proposition 17.5. By the Poincaré Lemma we know that this necessarily happens if $U$ is star shaped. Otherwise, we may always restrict our attention to a star shaped neighborhood $V \subset U$ of the initial conditions. As long as the orbit lies in $V$, we have a constant of motion. We may also take a later point of this orbit as a new initial condition and choose a new star shaped neighborhood. In general, we may cover the whole orbit by star shaped neighborhoods $V_i$, and in each of them we have a constant of motion. In each $V_i$ we have indeed a function $f_i$ such that $\theta|_{V_i} = df_i$ and a constant of motion $I_{X,i} := \iota_X\alpha_L - f_i$ defined on $V_i$. Notice however that, if $V_i$ and $V_j$ have a nonempty intersection, the restrictions of $f_i$ and $f_j$ to this intersection will not be equal in general, but will differ by a constant. So, in this setting, a constant of motion exists, but only locally, and it can be defined globally only up to locally defined constants. In particular, it is not a globally defined function.

⁵ This means that each such restriction does not change under continuous deformations of the path. On the other hand, if the path space is not connected, the restriction may take a different value on different connected components.

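Remark 17.3. Notice that we have a linear map

\[ T\colon \Omega^1(U) \to C^\infty(U \times \mathbb{R}^d), \qquad \sum_{i=1}^d \theta_i(q)\,dq^i \mapsto \sum_{i=1}^d v^i\,\theta_i(q). \]

The function $\widehat\theta$ used in the previous remark is then $T\theta$. The function $\widehat f$ used above is $T\,df$. Notice that

\[ T\phi^*\theta = \widehat\phi^*\,T\theta \]

for every smooth map $\phi\colon U \to V$ and for every $\theta \in \Omega^1(V)$. More abstractly, we may write

\[ T\phi^* = \widehat\phi^*\,T \tag{24} \]

as an equality of linear maps $\Omega^1(V) \to C^\infty(U \times \mathbb{R}^d)$.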

17.3.1. Equivalent Lagrangians. Let $L$ be a Lagrangian on $U \times \mathbb{R}^d \times I$, $U$ open in $\mathbb{R}^d$, and let $G$ be a function on $U$. Define

\[ \widetilde L = L + T\,dG \]

(i.e., $\widetilde L(q, v, t) = L(q, v, t) + \sum_{i=1}^d v^i\,\frac{\partial G}{\partial q^i}(q)$). We then have $J_{\widetilde L}[x] = J_L[x] + G(x_b) - G(x_a)$ for all $x \in P^{[a,b]}_{x_a,x_b}$. This implies that $L$ and $\widetilde L$ have the same extremal paths and are therefore equivalent from the point of view of the Euler–Lagrange equations. The Noether 1-forms are related by

\[ \alpha_{\widetilde L} = \alpha_L + dG. \]
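The equivalence can also be checked directly on the Euler–Lagrange equations. As a short worked computation (using only the definition of $T\,dG$ as the function $\sum_j v^j\,\partial G/\partial q^j$), the added term contributes nothing along any path $x$:

```latex
\begin{align*}
\frac{d}{dt}\frac{\partial (T\,dG)}{\partial v^i} - \frac{\partial (T\,dG)}{\partial q^i}
&= \frac{d}{dt}\,\frac{\partial G}{\partial q^i}(x(t))
 - \sum_{j=1}^d \dot x^j(t)\,\frac{\partial^2 G}{\partial q^i\,\partial q^j}(x(t))\\
&= \sum_{j=1}^d \dot x^j(t)\,\frac{\partial^2 G}{\partial q^j\,\partial q^i}(x(t))
 - \sum_{j=1}^d \dot x^j(t)\,\frac{\partial^2 G}{\partial q^i\,\partial q^j}(x(t)) = 0.
\end{align*}
```

Here the partial derivatives with respect to $q^i$ and $v^i$ are evaluated at $(x(t), \dot x(t))$, and the last step uses the symmetry of second derivatives.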

Also notice that, if $(\phi, F)$ is a generalized symmetry for $L$, then by (24) we get that $(\phi, F + \phi^*G - G)$ is a generalized symmetry for $\widetilde L$. Hence, if $(X, f)$ is a generalized infinitesimal symmetry for $L$, we see that $(X, f + X(G))$ is a generalized infinitesimal symmetry for $\widetilde L$. We then have that $\iota_X\alpha_{\widetilde L} - f - X(G) = \iota_X\alpha_L - f$, so the constant of motion corresponding to the generalized symmetry $(X, f)$ of $L$ is equal to the constant of motion corresponding to the generalized symmetry $(X, f + X(G))$ of $\widetilde L$.

Notice that a strict infinitesimal symmetry $X$ of $L$ is only a generalized one if one uses the equivalent Lagrangian $\widetilde L$ and $X(G) \neq 0$. So the notion of strict symmetry really depends on the choice of Lagrangian. Since one cannot be sure to have chosen the "right" Lagrangian (and there might be no Lagrangian that is "right" for all symmetries in case there are several at hand), the correct framework to use is that of generalized symmetries.

Also observe that, if $f$ can be written as $-X(G)$ for some function $G$, then we may transform the generalized infinitesimal symmetry $(X, f)$ of $L$ into an infinitesimal symmetry, in the strict sense, of the equivalent Lagrangian $\widetilde L$. Notice however that, even if this may be possible, different infinitesimal symmetries may require different equivalent Lagrangians to be made strict.


Example 17.3 (constant magnetic field). Consider a charged particle moving in a constant magnetic field of magnitude $B \neq 0$. The system is clearly invariant under translations in every direction. On the other hand, the Lagrangian depends on a vector potential generating this field, and this cannot be translation invariant, otherwise it would produce the zero magnetic field. This provides an example of generalized symmetry.

Suppose for definiteness that $\mathbf{B}$ points in the $x^3$-direction and the particle has mass $m$ and charge $e$. Newton's equations then read (setting the speed of light $c$ to 1)

\[ m\ddot x^1 = eB\,v^2, \qquad m\ddot x^2 = -eB\,v^1, \qquad m\ddot x^3 = 0, \]

and they are clearly invariant under translations $\mathbf{x}(t) \mapsto \mathbf{x}(t) + \mathbf{a}$, where $\mathbf{a}$ is an arbitrary vector, since they only depend on the first and second time derivatives of $\mathbf{x}$. On the other hand, the Lagrangian of the system is

\[ L(\mathbf{x}, \mathbf{v}) = \frac{1}{2}m\|\mathbf{v}\|^2 + e\,\mathbf{v} \cdot \mathbf{A}(\mathbf{x}). \]

Notice that $\mathbf{A}$ cannot be translation invariant (i.e., constant) since $\mathbf{B} = \nabla \times \mathbf{A} \neq 0$. For example we may choose $A_1 = -Bx^2$, $A_2 = A_3 = 0$, getting

\[ L(\mathbf{x}, \mathbf{v}) = \frac{1}{2}m\|\mathbf{v}\|^2 - eB\,v^1x^2. \]

We then have

\[ \alpha_L = \sum_{i=1}^3 m v^i\,dx^i - eB\,x^2\,dx^1. \]

Notice that this Lagrangian is invariant under translations in the $x^1$ and in the $x^3$ direction. So we have the integrals of motion

\[ P_1 := \iota_{\frac{\partial}{\partial x^1}}\alpha_L = m v^1 - eB\,x^2, \]

\[ P_3 := \iota_{\frac{\partial}{\partial x^3}}\alpha_L = m v^3. \]

Under a translation in the $x^2$ direction we have, on the other hand,

\[ \widehat{\frac{\partial}{\partial x^2}}\,L = \frac{\partial}{\partial x^2}L = -eB\,v^1 = \widehat f \]

with $f(\mathbf{x}) = -eB\,x^1$. We then get the integral of motion

\[ P_2 := \iota_{\frac{\partial}{\partial x^2}}\alpha_L - f = m v^2 + eB\,x^1. \]
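The three integrals of motion can be checked numerically against Newton's equations above. The sketch below uses a classical fourth-order Runge–Kutta step, which is a choice made here and not a method from the lecture; the parameter values, initial data and function names are likewise assumptions. Since $P_1$, $P_2$, $P_3$ are linear in $(\mathbf x, \mathbf v)$ and Runge–Kutta schemes preserve linear first integrals, the observed drift is expected to be at round-off level.

```python
def rhs(state, m=1.0, e=1.0, B=2.0):
    # state = (x1, x2, x3, v1, v2, v3); Newton's equations of Example 17.3:
    # m*a1 = e*B*v2, m*a2 = -e*B*v1, m*a3 = 0.
    x1, x2, x3, v1, v2, v3 = state
    return (v1, v2, v3, e * B * v2 / m, -e * B * v1 / m, 0.0)

def rk4_step(state, h):
    # One classical Runge-Kutta step of size h (an assumed, generic integrator).
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrals(state, m=1.0, e=1.0, B=2.0):
    # (P1, P2, P3) = (m v1 - eB x2, m v2 + eB x1, m v3).
    x1, x2, x3, v1, v2, v3 = state
    return (m * v1 - e * B * x2, m * v2 + e * B * x1, m * v3)

def max_drift(steps=2000, h=0.01):
    state = (0.3, -0.2, 0.0, 1.0, 0.5, 0.25)
    P0 = integrals(state)
    worst = 0.0
    for _ in range(steps):
        state = rk4_step(state, h)
        worst = max(worst, max(abs(a - b) for a, b in zip(integrals(state), P0)))
    return worst  # largest deviation of P1, P2, P3 from their initial values
```

Note that $m v^1$ and $m v^2$ alone are not conserved along these trajectories; only the combinations with the position-dependent terms are.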

Exercise 17.6. Show that rotations around the $x^3$ axis are also generalized symmetries and compute the corresponding integral of motion.

Exercise 17.7. Show that the integrals of motion computed above do not change if the vector potential $\mathbf{A}$ is changed to $\mathbf{A} + \nabla\lambda$.

Exercise 17.8. Show that one can choose $\mathbf{A}$ such that $\mathbf{v} \cdot \mathbf{A}$ is invariant under rotations around the $x^3$ axis, so that rotations around the $x^3$ axis become symmetries in the strict sense.

Exercise 17.9. Show that changing the vector potential $\mathbf{A}$ to $\widetilde{\mathbf{A}} = \mathbf{A} + \nabla\lambda$ produces an equivalent Lagrangian.


18. From the Lagrangian to the Hamiltonian formalism

Suppose $L$ is a convex Lagrangian on $U \times \mathbb{R}^d \times I$. Denote by $\psi_L$ its associated Legendre mapping and by $H$ its Legendre transform. The first observation is that the factors $\frac{\partial L}{\partial v^i}$ appearing in Definition 17.3 of the Noether 1-form are just the generalized momenta $p_i$, so we have

Definition 18.1 (Liouville 1-form).

\[ \alpha_L = \psi_L^*\alpha, \]

where

\[ \alpha := \sum_{i=1}^d p_i\,dq^i \]

is the Liouville 1-form (a.k.a. the Poincaré 1-form or the tautological 1-form).

Notice that the dependency on the Lagrangian of the Noether 1-form comes only through the Legendre mapping: the Liouville 1-form is independent of $L$. Also notice that the canonical symplectic form $\omega$ is actually $d\alpha$. Now suppose that $X$ is an infinitesimal symmetry. Then $I_X = \iota_X\alpha_L$ is a constant of motion. We clearly have $I_X = \psi_L^*F_X$ with

\[ F_X = \iota_X\alpha = \sum_{i=1}^d X^i(q)\,p_i. \]

Since this is a constant of motion, we then have $X_H(F_X) = 0$. But this is equivalent to

\[ \widetilde X(H) = 0, \tag{25} \]

where $\widetilde X$ is the Hamiltonian vector field of $F_X$:

\[ \iota_{\widetilde X}\omega = -dF_X. \tag{26} \]

This is a consequence of the following basic

Lemma 18.1. Let $f$ and $g$ be functions, and let $X_f$ and $X_g$ be their Hamiltonian vector fields. Then

\[ X_f(g) = -X_g(f) = \iota_{X_g}\iota_{X_f}\omega. \]

We will see that equations (25) and (26) characterize symmetries in the Hamiltonian formalism. The present case has however two peculiarities. The first is that $F_X$ is linear in the $p$ variables. The second is that we have $F_X = \iota_{\widetilde X}\alpha$, which in turn implies $\mathcal{L}_{\widetilde X}\alpha = 0$.

A simple computation shows that

\[ \widetilde X = X - \sum_{i,j=1}^d \frac{\partial X^i}{\partial q^j}(q)\,p_i\,\frac{\partial}{\partial p_j}, \]

which is called the cotangent lift of $X$.

If $(X, f)$ is a generalized infinitesimal symmetry, then we have $I_X = \iota_X\alpha_L - f = \psi_L^*F_{X,f}$ with $F_{X,f} = \sum_{i=1}^d X^i(q)\,p_i - f(q)$. If we denote by $\widetilde X_f = \widetilde X - X_f$ the Hamiltonian vector field of $F_{X,f}$, we then get $\widetilde X_f(H) = 0$ and $\iota_{\widetilde X_f}\omega = -dF_{X,f}$, which have the same form as (25) and (26). Notice that in this case $F_{X,f}$ is at most linear in the $p$ variables and that $\mathcal{L}_{\widetilde X_f}\alpha = df$.

19. Symmetries in Hamiltonian mechanics

Let $H \in C^\infty(V)$ be a time-independent Hamiltonian on an open subset $V$ of $\mathbb{R}^d \times \mathbb{R}^d$. Notice that we can assume $H$ to be time independent without loss of generality.⁶

Recall that Hamilton's equations for $H$ can be obtained as the Euler–Lagrange equations for the Lagrangian⁷

\[ L_H(q, p, v_q, v_p) = \sum_{i=1}^d p_i\,v_q^i - H(q, p) = T\alpha - H, \]

where we have used the notations of Remark 17.3 with $\alpha = \sum_{i=1}^d p_i\,dq^i$ the Liouville 1-form. Notice that we also have $\alpha_{L_H} = \alpha$.

A symmetry is then a diffeomorphism $\phi$ of the phase space $V$ such that $\widehat\phi^*L_H = L_H$. By Remark 17.3, we have $\widehat\phi^*L_H = T\phi^*\alpha - \phi^*H$. Since $H$ does not depend on the velocities, we then have that $\phi$ is a symmetry if and only if

\[ \phi^*H = H \quad\text{and}\quad \phi^*\alpha = \alpha. \]

Notice that the second equation also implies $\phi^*\omega = \omega$. A vector field $Y$ on $V$ is then an infinitesimal symmetry if and only if

\[ Y(H) = 0 \quad\text{and}\quad \mathcal{L}_Y\alpha = 0, \]

whereas the corresponding constant of motion is $\iota_Y\alpha$.

This occurs, e.g., if $H$ is the Legendre transform of $L$ and $Y$ is the cotangent lift of an infinitesimal symmetry of $L$.

Similarly, we see that $(\phi, F)$ is a generalized symmetry if and only if

\[ \phi^*H = H \quad\text{and}\quad \phi^*\alpha = \alpha + dF. \]

A pair $(Y, g)$ of a vector field and a function on $V$ is then a generalized infinitesimal symmetry if and only if

\[ Y(H) = 0 \quad\text{and}\quad \mathcal{L}_Y\alpha = dg. \]

In this case, the constant of motion is $F = \iota_Y\alpha - g$. This implies

\[ \iota_Y\omega = -dF. \]

Notice that now $F$ can be an arbitrary function on $V$. We have thus arrived at the

Theorem 19.1. A (possibly generalized) infinitesimal symmetry of a Hamiltonian system on $V$ with Hamiltonian function $H$ is a Hamiltonian vector field $X_F$, $\iota_{X_F}\omega = -dF$, such that $X_F(H) = 0$.

⁶ If $H(q, p, t)$ were time dependent, we could replace it by the time-independent Hamiltonian $\widetilde H(q, \tau, p, p_\tau) = H(q, p, \tau) + p_\tau$ on extended phase space.

⁷ Notice that fixing both positions and momenta at the endpoints in general yields no solutions. On the other hand, the boundary term in computing the functional derivative of this Lagrangian does not involve the variation of the momenta. One then considers extremal paths with fixed $q$s at the endpoints, leaving the $p$s free. The corresponding Euler–Lagrange equations for these extremal paths are then the Hamilton equations.


Remark 19.1. Notice that by Lemma 18.1 this immediately implies $X_H(F) = 0$, so $F$ is the corresponding constant of motion.

Remark 19.2. Lemma 18.1 can also be read the other way around. Namely, suppose that $F$ is a constant of motion for a Hamiltonian system with Hamiltonian $H$ (i.e., $X_H(F) = 0$). Then $X_F$ is a symmetry (i.e., $X_F(H) = 0$). In this way, we have a one-to-one correspondence between symmetries and constants of motion up to an additive local constant.

Remark 19.3. Suppose that $H$ is the Legendre transform of a Lagrangian $L$ defined on $U \times \mathbb{R}^d =: V$. In the Lagrangian formalism, infinitesimal symmetries are vector fields on $U$ that preserve $L$ (possibly up to a term $T\,df$, with $f$ a function on $U$). These correspond to infinitesimal symmetries in the Hamiltonian formalism whose constants of motion are at most linear in the $p$ variables. These are just very particular examples of symmetries.

Remark 19.4. The general case corresponds, in its infinitesimal form, to having a vector field $Y$ and a closed 1-form $\theta$ such that $Y(H) = 0$ and $\mathcal{L}_Y\alpha = \theta$. Notice that the second equation is equivalent to $\mathcal{L}_Y\omega = 0$. Again this does not produce a constant of motion in general. However, locally we may have $\theta$ exact and proceed as above.

19.1. Symplectic geometry. The above results have a nice, general form in symplectic geometry. There one just assumes to have a closed 2-form $\omega$ with the property that for each function $f$ there is a unique vector field $X_f$, called the Hamiltonian vector field of $f$, such that $\iota_{X_f}\omega = -df$. The latter property may be checked by computing the matrix representing $\omega$ at each point in local coordinates and verifying that it is nondegenerate.

A function $f$ satisfying $\iota_X\omega = -df$ is called a Hamiltonian function for $X$ and is not uniquely defined (it is defined up to a constant on each connected component). A vector field that is the Hamiltonian vector field for some function is called a Hamiltonian vector field. Not every vector field is Hamiltonian. Notice that a Hamiltonian vector field $X$ automatically satisfies $\mathcal{L}_X\omega = 0$. A vector field satisfying this property is called a symplectic vector field. In general not every symplectic vector field is Hamiltonian.

Notice that Lemma 18.1 holds in this general setting. Hence, if we are given a Hamiltonian function $H$, which defines the dynamics of the system, we define a symmetry to be a Hamiltonian vector field $X$ such that $X(H) = 0$. If $f$ is a Hamiltonian function for $X$, then the Lemma implies that $X_H(f) = 0$, so $f$ is a constant of motion. This is the general, simple and beautiful version of Noether's theorem in symplectic geometry.

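19.2. The Kepler problem. Consider the Hamiltonian

\[ H(\mathbf{q}, \mathbf{p}) = \frac{\|\mathbf{p}\|^2}{2m} - \frac{K}{\|\mathbf{q}\|} \]

for a body in a gravitational field, where $K$ is a positive constant. Notice that $H$ is a smooth function on $U \times \mathbb{R}^3$ with $U = \mathbb{R}^3 \setminus \{0\}$. Rotation invariance, in the Lagrangian version, yields conservation of the angular momentum⁸

\[ \mathbf{J} = \mathbf{q} \times \mathbf{p}, \]

see Example 17.2. Notice that the components of $\mathbf{J}$ are linear in $\mathbf{p}$, as they come from symmetries in the Lagrangian formalism. Another constant of motion is the Laplace–Runge–Lenz vector (shortly, the Lenz vector)

\[ \mathbf{A} = \mathbf{p} \times \mathbf{J} - mK\,\frac{\mathbf{q}}{\|\mathbf{q}\|}. \]

⁸ By this we mean that each component of the vector $\mathbf{J}$ is a constant of motion.

The simplest way to prove that $\mathbf{A}$ is conserved is by taking its time derivative along a solution to Hamilton's equations

\[ \dot{\mathbf{q}} = \frac{\mathbf{p}}{m}, \qquad \dot{\mathbf{p}} = -K\,\frac{\mathbf{q}}{\|\mathbf{q}\|^3}. \]

Using $\dot{\mathbf{J}} = 0$ along a solution, we then get

\[ \dot{\mathbf{A}} = -K\,\frac{\mathbf{q}}{\|\mathbf{q}\|^3} \times \mathbf{J} - K\Big(\frac{\mathbf{p}}{\|\mathbf{q}\|} - \frac{(\mathbf{p} \cdot \mathbf{q})\,\mathbf{q}}{\|\mathbf{q}\|^3}\Big) = 0, \]

where the last equality follows from $\mathbf{q} \times \mathbf{J} = \mathbf{q} \times (\mathbf{q} \times \mathbf{p}) = (\mathbf{q} \cdot \mathbf{p})\,\mathbf{q} - \|\mathbf{q}\|^2\,\mathbf{p}$.

From $X_H(A_i) = 0$ we then get $X_{A_i}(H) = 0$ for all $i$.

Notice that $\mathbf{A}$ has a quadratic term in the $p$ variables, so it does not come from a symmetry in the Lagrangian setting (nor is $X_{A_i}$ a cotangent lift). On the other hand, the vector fields $X_{A_i}$ are symmetries in the Hamiltonian formalism.

The conservation of the Lenz vector can be used to derive the Kepler orbits directly (without having to solve the differential equations). First observe that, by the cyclic property of the triple product, we have

\[ \mathbf{A} \cdot \mathbf{q} = \|\mathbf{J}\|^2 - mK\|\mathbf{q}\|. \]

Recall that, adapting the coordinates to the initial conditions, we may assume that the motion occurs in the $xy$ plane. Then $\mathbf{J}$ points in the $z$ direction. So $\mathbf{A}$ is also in the $xy$ plane. If we denote by $\theta$ the angle between $\mathbf{A}$ and $\mathbf{q}$, we may then rewrite the above equation as

\[ A r \cos\theta = J^2 - mKr, \]

where we have set $r := \|\mathbf{q}\|$, $J := \|\mathbf{J}\|$ and $A := \|\mathbf{A}\|$. Notice that $J$ and $A$ are obviously also constants of motion. We then get

\[ r = \frac{\dfrac{J^2}{mK}}{1 + \dfrac{A}{mK}\cos\theta}, \]

which shows that the orbits are conic sections with eccentricity $e = \frac{A}{mK}$.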
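Both the conservation of the Lenz vector and the resulting orbit equation can be checked numerically for planar motion. The sketch below is only an illustration: the values $m = K = 1$, the initial condition, the step size and the use of a classical fourth-order Runge–Kutta step are assumptions made here, as are all function names. In the plane, $\mathbf J = J\,\hat{\mathbf z}$ with $J = x p_y - y p_x$, so that $\mathbf p \times \mathbf J = (p_y J, -p_x J)$.

```python
import math

M, K = 1.0, 1.0  # assumed mass and coupling constant

def rhs(s):
    # s = (x, y, px, py); Hamilton's equations for H = |p|^2/(2m) - K/|q|.
    x, y, px, py = s
    r3 = (x * x + y * y) ** 1.5
    return (px / M, py / M, -K * x / r3, -K * y / r3)

def rk4_step(s, h):
    # One classical Runge-Kutta step of size h (an assumed, generic integrator).
    def shifted(u, k, c):
        return tuple(ui + c * ki for ui, ki in zip(u, k))
    k1 = rhs(s)
    k2 = rhs(shifted(s, k1, h / 2))
    k3 = rhs(shifted(s, k2, h / 2))
    k4 = rhs(shifted(s, k3, h))
    return tuple(u + h * (a + 2 * b + 2 * c + d) / 6
                 for u, a, b, c, d in zip(s, k1, k2, k3, k4))

def lenz(s):
    # Returns the planar Lenz vector A = p x J - mK q/|q| and the scalar J.
    x, y, px, py = s
    r = math.hypot(x, y)
    J = x * py - y * px
    return (py * J - M * K * x / r, -px * J - M * K * y / r), J

def check_orbit(steps=10000, h=0.001):
    s = (1.0, 0.0, 0.0, 1.2)
    (A0x, A0y), J0 = lenz(s)
    A0 = math.hypot(A0x, A0y)
    ecc = A0 / (M * K)  # eccentricity e = A/(mK)
    lenz_drift = 0.0
    orbit_error = 0.0
    for _ in range(steps):
        s = rk4_step(s, h)
        (Ax, Ay), _ = lenz(s)
        lenz_drift = max(lenz_drift, abs(Ax - A0x), abs(Ay - A0y))
        x, y = s[0], s[1]
        r = math.hypot(x, y)
        cos_theta = (A0x * x + A0y * y) / (A0 * r)  # angle between A and q
        r_pred = (J0 * J0 / (M * K)) / (1.0 + ecc * cos_theta)
        orbit_error = max(orbit_error, abs(r - r_pred))
    return ecc, lenz_drift, orbit_error
```

With these initial data the eccentricity is $0.44$, so the orbit is an ellipse, and both the drift of $\mathbf A$ and the deviation from the conic-section formula stay at the level of the integration error.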

Remark 19.5. The Hamiltonian vector fields $X_{J_i}$ generate infinitesimal rotations and correspond to the fact that the Hamiltonian is rotation invariant (with rotations extended to phase space by the cotangent lift; i.e., $R \in SO(3)$ acts by $(\mathbf{q}, \mathbf{p}) \mapsto (R\mathbf{q}, R\mathbf{p})$). The vector fields $X_{A_i}$ generate additional infinitesimal transformations. One can show that the $X_{J_i}$'s and the $X_{A_i}$'s together generate a (nonlinear) action of the group $SO(4)$ on phase space.


Part 6. The Hamilton-Jacobi equation

20. Introduction

The Hamilton–Jacobi equation is a PDE associated to a Hamiltonian system, with which it has a two-way link. On the one hand, it is the PDE satisfied by the action functional evaluated on orbits as a function of the end variables; as such, solving the Hamilton equations provides an effective method of solving the Cauchy problem for the Hamilton–Jacobi equation (method of characteristics). On the other hand, a solution depending on enough parameters (complete integral) provides a generating function for a canonical transformation that trivializes the Hamiltonian, thus allowing one to solve the Hamilton equations.

The latter method actually only works for very special systems (integrable systems); however, perturbations of integrable systems are more effectively studied in the canonical variables in which the unperturbed Hamiltonian is trivialized.

Finally, the Hamilton–Jacobi equation is related to the asymptotics of the Schrödinger equation in the classical limit.

21. The Hamilton–Jacobi equation

Throughout we will denote by U an open subset of Rd, by I an interval and by H aHamiltonian function on U × Rd × I, which we will write as H(q, p, t).

Definition 21.1. Hamilton-Jacobi equationThe Hamilton–Jacobi equation for the unknown function S on an open subset of

of U × I is then

∂S

∂t+H

(q,∂S

∂q, t

)= 0

where ∂S∂q

is a shorthand notation for ∂S∂q1

, . . . , ∂S∂qd

.

If the Hamiltonian does not depend on the time variable $t$, we simply write $H(q,p)$. To it one associates the

Definition 21.2. Reduced Hamilton–Jacobi equation
The reduced Hamilton–Jacobi equation in the unknown function $S_0$ on an open subset of $U$ is given by

\[ H\left(q, \frac{\partial S_0}{\partial q}\right) = E, \]

where $E$ is a parameter, called the energy.

Notice that if $S_0$ is a solution of the reduced Hamilton–Jacobi equation at energy $E$, then $S(q,t) = S_0(q) - Et$ is a solution of the Hamilton–Jacobi equation.

Finally, notice that $S$ and $S_0$ enter into the equations only through their derivatives; so every solution can be shifted by a constant, yielding a new solution.
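As a quick illustration of the two remarks above (a worked example added here, assuming the one-dimensional free particle $H(q,p) = \frac{p^2}{2m}$ with $E \ge 0$; the constant $c$ is the arbitrary shift), one may solve the reduced equation by direct integration and check that $S_0 - Et$ solves the time-dependent equation:

```latex
\[
\frac{1}{2m}\left(\frac{dS_0}{dq}\right)^2 = E
\quad\Longrightarrow\quad
S_0(q) = \pm\sqrt{2mE}\,q + c,
\qquad
S(q,t) = \pm\sqrt{2mE}\,q - Et + c,
\]
\[
\frac{\partial S}{\partial t} + \frac{1}{2m}\left(\frac{\partial S}{\partial q}\right)^2
= -E + \frac{2mE}{2m} = 0 .
\]
```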

Remark 21.1. The Hamilton–Jacobi equation is also related to the asymptotics of the Schrödinger equation, which appears in quantum mechanics. For the Hamiltonian $H(q,p) = \sum_{i=1}^d \frac{p_i^2}{2m} + V(q)$, this is the PDE

\[ i\hbar \frac{\partial \psi}{\partial t} = -\sum_{i=1}^d \frac{\hbar^2}{2m} \frac{\partial^2 \psi}{(\partial q^i)^2} + V(q)\psi, \]

where the unknown $\psi$ is a time-dependent complex-valued function on $U$ and $\hbar$ is a constant. If one writes

\[ \psi(q,t) = A(q,t)\, e^{\frac{i}{\hbar}\phi(q,t)}, \]

where $A$ and $\phi$ are real valued, and assumes that for $\hbar$ small we have $A = A_0 + O(\hbar)$ and $\phi = \phi_0 + O(\hbar)$, then $\phi_0$ solves the Hamilton–Jacobi equation on $W := \{q \in U : A_0(q,t) \neq 0\ \forall t\}$. This gives an indication that in the limit $\hbar \to 0$ quantum mechanics is approximated by classical mechanics.
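The claim in Remark 21.1 can be checked by a direct substitution. The following is a sketch of that computation, written here with the shorthand $\partial_i = \partial/\partial q^i$ and $\partial_t = \partial/\partial t$ (shorthand introduced only for this block):

```latex
\[
i\hbar\,\partial_t\psi = \left(i\hbar\,\partial_t A - A\,\partial_t\phi\right) e^{\frac{i}{\hbar}\phi},
\]
\[
-\frac{\hbar^2}{2m}\,\partial_i^2\psi =
\left(\frac{1}{2m}A\,(\partial_i\phi)^2
- \frac{i\hbar}{m}\,\partial_i A\,\partial_i\phi
- \frac{i\hbar}{2m}A\,\partial_i^2\phi
- \frac{\hbar^2}{2m}\,\partial_i^2 A\right) e^{\frac{i}{\hbar}\phi}.
\]
% Collecting the terms of order \hbar^0 in the Schrodinger equation:
\[
-A_0\,\partial_t\phi_0 = A_0\left(\sum_{i=1}^d \frac{(\partial_i\phi_0)^2}{2m} + V(q)\right)
\quad\Longrightarrow\quad
\partial_t\phi_0 + \sum_{i=1}^d \frac{1}{2m}\left(\frac{\partial\phi_0}{\partial q^i}\right)^2 + V(q) = 0
\quad\text{where } A_0 \neq 0 .
\]
```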

22. The action as a function of endpoints

Let $L$ be a Lagrangian function on $U \times \mathbb{R}^d \times I$ and denote by $S$ its corresponding action functional:

\[ S[x] = \int_{t_A}^{t_B} L(x(t), \dot x(t), t)\, dt, \]

where $x$ is a map $[t_A, t_B] \to U$, with $[t_A, t_B] \subset I$. Let $W$ be an open subset of $U \times I \times U \times I$ such that for each $(q_A, t_A, q_B, t_B) \in W$ there is a unique extremal path, denoted by $q^*_{q_A,t_A,q_B,t_B}$ (or simply $q^*$). Define

\[ S^*(q_A, t_A, q_B, t_B) := S[q^*_{q_A,t_A,q_B,t_B}]. \]

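Example 22.1. Consider the free particle in one dimension; i.e., $d = 1$, $U = \mathbb{R}$, $L(q,v,t) = \frac{1}{2}mv^2$. The EL equation, $m\ddot q = 0$, is easily solved and we have, with $W = \{(q_A, t_A, q_B, t_B) \in U \times I \times U \times I : t_A \neq t_B\}$,

\[ q^*_{q_A,t_A,q_B,t_B}(t) = \frac{(t - t_A)\,q_B + (t_B - t)\,q_A}{t_B - t_A}. \]

Hence

\[ \dot q^*_{q_A,t_A,q_B,t_B}(t) = \frac{q_B - q_A}{t_B - t_A} \]

and

\[ S^*(q_A, t_A, q_B, t_B) = \int_{t_A}^{t_B} \frac{1}{2}m\left(\frac{q_B - q_A}{t_B - t_A}\right)^2 dt = \frac{1}{2}m\,\frac{(q_B - q_A)^2}{t_B - t_A}. \]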

We now want to study the dependency of S∗ on its arguments.


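Theorem 22.1. We have

\[ \frac{\partial S^*}{\partial q_B^i} = p_{Bi}, \qquad \frac{\partial S^*}{\partial t_B} = -H_B, \qquad \frac{\partial S^*}{\partial q_A^i} = -p_{Ai}, \qquad \frac{\partial S^*}{\partial t_A} = H_A, \]

where

\[ \begin{aligned}
p_{Bi}(q_A, t_A, q_B, t_B) &:= \frac{\partial L}{\partial v^i}\big(q_B, \dot q^*_{q_A,t_A,q_B,t_B}(t_B), t_B\big), \\
p_{Ai}(q_A, t_A, q_B, t_B) &:= \frac{\partial L}{\partial v^i}\big(q_A, \dot q^*_{q_A,t_A,q_B,t_B}(t_A), t_A\big), \\
H_B(q_A, t_A, q_B, t_B) &= \sum_{i=1}^d p_{Bi}(q_A, t_A, q_B, t_B)\,\dot q^{*i}(t_B) - L\big(q_B, \dot q^*(t_B), t_B\big), \\
H_A(q_A, t_A, q_B, t_B) &= \sum_{i=1}^d p_{Ai}(q_A, t_A, q_B, t_B)\,\dot q^{*i}(t_A) - L\big(q_A, \dot q^*(t_A), t_A\big).
\end{aligned} \]

One can also compactly write

\[ dS^* = -\sum_{i=1}^d p_{Ai}\,dq_A^i + H_A\,dt_A + \sum_{i=1}^d p_{Bi}\,dq_B^i - H_B\,dt_B. \]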

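Example 22.2. Let us check the above formulas in the case of Example 22.1. We explicitly have

\[ \frac{\partial S^*}{\partial q_B} = m\,\frac{q_B - q_A}{t_B - t_A} = m\,\dot q^*_{q_A,t_A,q_B,t_B}(t_B) \]

and

\[ \frac{\partial S^*}{\partial t_B} = -\frac{1}{2}m\left(\frac{q_B - q_A}{t_B - t_A}\right)^2 = -\frac{1}{2}m\,\dot q^*_{q_A,t_A,q_B,t_B}(t_B)^2. \]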
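The check in Example 22.2 can also be done numerically for all four derivatives of Theorem 22.1. The following is a minimal sketch, assuming the free-particle action of Example 22.1; the function names, the sample values and the finite-difference step are illustrative choices, not part of the lecture.

```python
def free_action(qA, tA, qB, tB, m):
    """S*(qA, tA, qB, tB) = m (qB - qA)^2 / (2 (tB - tA)) for the free particle."""
    return 0.5 * m * (qB - qA) ** 2 / (tB - tA)


def central_diff(f, x, h=1e-5):
    """Second-order central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Illustrative values (arbitrary, chosen only for this check).
m, qA, tA, qB, tB = 2.0, 0.3, 0.0, 1.7, 2.5

v = (qB - qA) / (tB - tA)  # constant velocity of the extremal path
p = m * v                  # p_A = p_B for the free particle
H = 0.5 * m * v ** 2       # H_A = H_B for the free particle

dS_dqB = central_diff(lambda x: free_action(qA, tA, x, tB, m), qB)
dS_dtB = central_diff(lambda x: free_action(qA, tA, qB, x, m), tB)
dS_dqA = central_diff(lambda x: free_action(x, tA, qB, tB, m), qA)
dS_dtA = central_diff(lambda x: free_action(qA, x, qB, tB, m), tA)

# Each difference should be close to zero, as Theorem 22.1 predicts.
print(dS_dqB - p, dS_dtB + H, dS_dqA + p, dS_dtA - H)
```

For the free particle the momenta and energies at the two endpoints coincide, which is why a single `p` and a single `H` suffice here.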

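Proof of Theorem 22.1. We begin with the derivatives with respect to $q_B$. Let $\delta q_B$ be a vector in $\mathbb{R}^d$. We have

\[ \lim_{\varepsilon \to 0} \frac{S^*(q_A, t_A, q_B + \varepsilon\,\delta q_B, t_B) - S^*(q_A, t_A, q_B, t_B)}{\varepsilon} = \sum_{i=1}^d \frac{\partial S^*}{\partial q_B^i}\,\delta q_B^i. \]

Write $q^*_\varepsilon := q^*_{q_A,t_A,q_B+\varepsilon\delta q_B,t_B}$ and $q^* = q^*_0$. We have

\[ q^*_\varepsilon = q^* + \varepsilon\,\delta q + O(\varepsilon^2) \]

for a uniquely defined path $\delta q : [t_A, t_B] \to \mathbb{R}^d$. Thus,

\[ \lim_{\varepsilon \to 0} \frac{S[q^*_\varepsilon] - S[q^*]}{\varepsilon} = \frac{\delta S}{\delta q}[q^*, \delta q], \]

which implies

\[ \sum_{i=1}^d \frac{\partial S^*}{\partial q_B^i}\,\delta q_B^i = \frac{\delta S}{\delta q}[q^*, \delta q]. \]

Now observe that, since $q^*$ is an extremal path, we have (by the formula on page 36)

\[ \frac{\delta S}{\delta q}[q^*, \delta q] = \sum_{i=1}^d \left( \frac{\partial L}{\partial v^i}\big(q_B, \dot q^*(t_B), t_B\big)\,\delta q^i(t_B) - \frac{\partial L}{\partial v^i}\big(q_A, \dot q^*(t_A), t_A\big)\,\delta q^i(t_A) \right). \tag{27} \]

Since $q^*_\varepsilon(t_A) = q_A$, we get $\delta q(t_A) = 0$. From $q^*_\varepsilon(t_B) = q_B + \varepsilon\,\delta q_B$, we conclude $\delta q(t_B) = \delta q_B$. Thus,

\[ \frac{\delta S}{\delta q}[q^*, \delta q] = \sum_{i=1}^d \frac{\partial L}{\partial v^i}\big(q_B, \dot q^*(t_B), t_B\big)\,\delta q_B^i, \]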

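which proves the first equation.

We now come to the second equation, the derivative with respect to $t_B$. For $\delta t_B \in \mathbb{R}$, we have

\[ \lim_{\varepsilon \to 0} \frac{S^*(q_A, t_A, q_B, t_B + \varepsilon\,\delta t_B) - S^*(q_A, t_A, q_B, t_B)}{\varepsilon} = \frac{\partial S^*}{\partial t_B}\,\delta t_B. \]

Write $q^*_\varepsilon := q^*_{q_A,t_A,q_B,t_B+\varepsilon\delta t_B}$ and $q^* = q^*_0$. Notice that $q^*_\varepsilon$ is defined on the interval $[t_A, t_B + \varepsilon\,\delta t_B]$. Assume $\varepsilon\,\delta t_B \ge 0$ and denote by $\tilde q^*_\varepsilon$ the restriction of $q^*_\varepsilon$ to $[t_A, t_B]$. We then have

\[ S[q^*_\varepsilon] = S[\tilde q^*_\varepsilon] + \int_{t_B}^{t_B + \varepsilon\delta t_B} L\big(q^*_\varepsilon(t), \dot q^*_\varepsilon(t), t\big)\, dt. \]

Notice that we have

\[ \tilde q^*_\varepsilon = q^* + \varepsilon\,\delta q + O(\varepsilon^2) \]

for a uniquely defined path $\delta q : [t_A, t_B] \to \mathbb{R}^d$. Thus,$^{9}$

\[ \lim_{\varepsilon \to 0} \frac{S[q^*_\varepsilon] - S[q^*]}{\varepsilon} = \frac{\delta S}{\delta q}[q^*, \delta q] + L\big(q^*(t_B), \dot q^*(t_B), t_B\big)\,\delta t_B, \]

which implies

\[ \frac{\partial S^*}{\partial t_B}\,\delta t_B = \frac{\delta S}{\delta q}[q^*, \delta q] + L\big(q^*(t_B), \dot q^*(t_B), t_B\big)\,\delta t_B. \]

We use again (27). Notice that $\tilde q^*_\varepsilon(t_A) = q^*_\varepsilon(t_A) = q_A$ implies $\delta q(t_A) = 0$. On the other hand,

\[ \begin{aligned}
q_B = q^*_\varepsilon(t_B + \varepsilon\,\delta t_B) &= q^*_\varepsilon(t_B) + \varepsilon\,\dot q^*_\varepsilon(t_B)\,\delta t_B + O(\varepsilon^2) \\
&= \tilde q^*_\varepsilon(t_B) + \varepsilon\,\dot q^*(t_B)\,\delta t_B + O(\varepsilon^2) \\
&= q^*(t_B) + \varepsilon\big(\delta q(t_B) + \dot q^*(t_B)\,\delta t_B\big) + O(\varepsilon^2) \\
&= q_B + \varepsilon\big(\delta q(t_B) + \dot q^*(t_B)\,\delta t_B\big) + O(\varepsilon^2),
\end{aligned} \]

so $\delta q(t_B) = -\dot q^*(t_B)\,\delta t_B$. We conclude that

\[ \frac{\delta S}{\delta q}[q^*, \delta q] = -\sum_{i=1}^d \frac{\partial L}{\partial v^i}\big(q_B, \dot q^*(t_B), t_B\big)\,\dot q^{*i}(t_B)\,\delta t_B, \]

and hence $\frac{\partial S^*}{\partial t_B} = -\sum_{i=1}^d p_{Bi}\,\dot q^{*i}(t_B) + L(q_B, \dot q^*(t_B), t_B) = -H_B$, which proves the second equation.

The third and the fourth equations are proved along the same lines.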

Now assume that $L$ is convex and denote by $H$ its Legendre transform. We then have

\[ H_B(q_A, t_A, q_B, t_B) = H\big(q_B, p_B(q_A, t_A, q_B, t_B), t_B\big). \]

The first two equations in Theorem 22.1 then imply

\[ \frac{\partial S^*}{\partial t_B} + H\left(q_B, \frac{\partial S^*}{\partial q_B}, t_B\right) = 0; \]

i.e., $S^*$ as a function of the end variables $(q_B, t_B)$ satisfies the Hamilton–Jacobi equation.
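To see this statement at work beyond the free particle, one can test it numerically on the harmonic oscillator $L = \frac{1}{2}mv^2 - \frac{1}{2}m\omega^2 q^2$. The sketch below assumes the standard closed form of its action along extremals, $S^* = \frac{m\omega}{2\sin\omega T}\big((q_A^2 + q_B^2)\cos\omega T - 2q_Aq_B\big)$ with $T = t_B - t_A$ and $0 < \omega T < \pi$; this formula, the function names and the sample values are not taken from the lecture and are used only for illustration.

```python
import math


def oscillator_action(qA, tA, qB, tB, m, omega):
    """Action of the harmonic oscillator along the extremal path (valid for sin(omega*T) != 0)."""
    T = tB - tA
    s, c = math.sin(omega * T), math.cos(omega * T)
    return m * omega * ((qA ** 2 + qB ** 2) * c - 2.0 * qA * qB) / (2.0 * s)


def hj_residual(qA, tA, qB, tB, m, omega, h=1e-5):
    """dS*/dtB + H(qB, dS*/dqB), with derivatives taken by central differences."""
    dS_dt = (oscillator_action(qA, tA, qB, tB + h, m, omega)
             - oscillator_action(qA, tA, qB, tB - h, m, omega)) / (2.0 * h)
    dS_dq = (oscillator_action(qA, tA, qB + h, tB, m, omega)
             - oscillator_action(qA, tA, qB - h, tB, m, omega)) / (2.0 * h)
    hamiltonian = dS_dq ** 2 / (2.0 * m) + 0.5 * m * omega ** 2 * qB ** 2
    return dS_dt + hamiltonian


# Arbitrary sample point with omega * (tB - tA) < pi.
residual = hj_residual(0.4, 0.0, -0.9, 0.7, m=1.5, omega=1.3)
print(residual)  # should be close to zero
```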

$^{9}$ The limit is for $\varepsilon \to 0^+$ if $\delta t_B \ge 0$ and for $\varepsilon \to 0^-$ otherwise.


22.1. Hamiltonian systems. Let $H$ be a Hamiltonian function on $U \times \mathbb{R}^d \times I$. Recall that the Hamilton equations for $H$ are also the EL equations for the Lagrangian function

\[ \hat L(q, p, v_q, v_p, t) := \sum_{i=1}^d p_i v_q^i - H(q, p, t) \]

defined on $(U \times \mathbb{R}^d) \times \mathbb{R}^{2d} \times I$. Denote by $\hat S$ the action functional corresponding to $\hat L$.

The EL equations for $\hat L$ are the same as the conditions for an extremal path for $\hat S$ with fixed boundary values for $q$ (but no conditions on $p$).

Let $W$ be again an open subset of $U \times I \times U \times I$ such that for each $(q_A, t_A, q_B, t_B) \in W$ there is a unique extremal path in $U \times \mathbb{R}^d$, denoted by $(q^*_{q_A,t_A,q_B,t_B}, p^*_{q_A,t_A,q_B,t_B})$ (or simply $(q^*, p^*)$), and define

\[ \hat S^*(q_A, t_A, q_B, t_B) := \hat S[(q^*_{q_A,t_A,q_B,t_B}, p^*_{q_A,t_A,q_B,t_B})]. \]

Remark 22.1. Notice that, if $H$ is the Legendre transform of a Lagrangian function $L$, then we have

\[ \hat S^*(q_A, t_A, q_B, t_B) = S_L^*(q_A, t_A, q_B, t_B), \]

where $S_L$ is the action functional for $L$.

Notice that $\hat L$ is not convex; nevertheless, we have that

Theorem 22.2. $\hat S^*$ as a function of the end variables $(q_B, t_B)$ satisfies the Hamilton–Jacobi equation.

Proof. From $\frac{\partial \hat L}{\partial v_q^i} = p_i$, we get that $p_B(q_A, t_A, q_B, t_B) = p^*(t_B)$; so, by the first equation in Theorem 22.1,

\[ \frac{\partial \hat S^*}{\partial q_B^i} = p_i^*(t_B). \]

Since $\frac{\partial \hat L}{\partial v_{p_i}} = 0$, we get

\[ H_B(q_A, t_A, q_B, t_B) = H\big(q_B, p^*(t_B), t_B\big). \]

The second equation of Theorem 22.1 then yields the Hamilton–Jacobi equation for $\hat S^*$.

23. Solving the Cauchy problem for the Hamilton–Jacobi equation

Definition 23.1. Cauchy problem for the HJ equation
The Cauchy problem for the Hamilton–Jacobi equation is the system

\[ \begin{aligned}
\frac{\partial S}{\partial t} + H\left(q, \frac{\partial S}{\partial q}, t\right) &= 0, \\
S(q, t_0) &= \sigma(q),
\end{aligned} \]

where $\sigma$ is a given function on $U$. For simplicity, and actually without loss of generality, we assume that $H$ is time independent and take $t_0 = 0$.


We want to show that we can solve the Cauchy problem for the Hamilton–Jacobi equation by integrating the Hamilton equations. First define

\[ \mathcal{L} := \left\{ (q, p) \in U \times \mathbb{R}^d : p_i = \frac{\partial \sigma}{\partial q^i}(q)\ \forall i \right\} \]

and $\mathcal{L}_t := \phi_t(\mathcal{L})$, where $\phi_t$ is the flow of the Hamiltonian vector field of $H$.

Notice that $\mathcal{L}$ is defined as the graph of a map. This will be the case also for $\mathcal{L}_\tau$ for $\tau$ small. Let $t_1$ be the greatest number such that $\mathcal{L}_\tau$ is a graph for all $\tau \in (0, t_1)$. Then for each $q \in U$ and for each $\tau \in (0, t_1)$ there is a unique $p(q, \tau)$ such that $(q, p(q, \tau)) \in \mathcal{L}_\tau$. We then have a unique orbit $(q^*, p^*)$ on the interval $[0, \tau]$ such that $q^*(\tau) = q$ and $p^*(\tau) = p(q, \tau)$: namely, $(q^*, p^*)(t) = \phi_t\big(\phi_\tau^{-1}(q, p(q, \tau))\big)$.

Equivalently, $(q^*, p^*)$ is the unique orbit with $q^*(\tau) = q$ and $p_i^*(0) = \frac{\partial \sigma}{\partial q^i}(q^*(0))\ \forall i$.$^{10}$ These orbits are called characteristics of the system.

Let $\phi : U \times (0, t_1) \to U$ be the map that assigns $q^*(0)$ to a pair $(q, \tau)$. Notice that $\lim_{\tau \to 0} \phi(q, \tau) = q\ \forall q \in U$, so we can extend $\phi$ to $U \times [0, t_1)$. Then we have the

Theorem 23.1. If $H$ is the Legendre transform of the Lagrangian $L$, then the function

\[ S(q, \tau) := \sigma(\phi(q, \tau)) + \int_0^\tau L\big(q^*(t), \dot q^*(t)\big)\, dt \]

solves the Cauchy problem for $\tau \in [0, t_1)$.

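Proof. We clearly have $S(q, 0) = \sigma(q)$. Then let

\[ S_{\mathrm{char}}(q, \tau) := \int_0^\tau L\big(q^*(t), \dot q^*(t)\big)\, dt = S^*(\phi(q, \tau), 0, q, \tau). \]

By Theorem 22.1 we have

\[ \begin{aligned}
\frac{\partial S_{\mathrm{char}}}{\partial q^j}(q, \tau) &= p_j(q, \tau) - \sum_{i=1}^d \frac{\partial \sigma}{\partial q^i}(\phi(q, \tau))\,\frac{\partial \phi^i}{\partial q^j}(q, \tau), \\
\frac{\partial S_{\mathrm{char}}}{\partial \tau}(q, \tau) &= -H(q, p(q, \tau)) - \sum_{i=1}^d \frac{\partial \sigma}{\partial q^i}(\phi(q, \tau))\,\frac{\partial \phi^i}{\partial \tau}(q, \tau).
\end{aligned} \]

Hence

\[ \frac{\partial S}{\partial q^j}(q, \tau) = p_j(q, \tau), \qquad \frac{\partial S}{\partial \tau}(q, \tau) = -H(q, p(q, \tau)), \]

so $S$ solves the Hamilton–Jacobi equation.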
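The construction of Theorem 23.1 can be followed step by step in a simple case. The sketch below assumes the one-dimensional free particle $H = \frac{p^2}{2m}$ with initial datum $\sigma(q) = \frac{1}{2}aq^2$ (both chosen here only for illustration, as are the function names and sample values). The characteristic through $(q, \tau)$ is $q^*(t) = q_0 + \frac{a q_0}{m}t$ with $q_0 = \phi(q, \tau) = q / (1 + a\tau/m)$, so the graph condition holds as long as $1 + a\tau/m > 0$.

```python
def sigma(q, a):
    """Initial datum sigma(q) = a q^2 / 2 (an illustrative choice)."""
    return 0.5 * a * q ** 2


def cauchy_solution(q, tau, a, m):
    """S(q, tau) = sigma(phi(q, tau)) + action along the characteristic ending at (q, tau)."""
    q0 = q / (1.0 + a * tau / m)          # phi(q, tau): starting point of the characteristic
    p0 = a * q0                           # initial momentum d(sigma)/dq at q0; constant along the orbit
    action = p0 ** 2 * tau / (2.0 * m)    # integral of L = m v^2 / 2 with v = p0 / m
    return sigma(q0, a) + action


def central_diff(f, x, h=1e-5):
    """Second-order central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


a, m, q, tau = 0.8, 2.0, 1.1, 0.6         # arbitrary sample values
dS_dq = central_diff(lambda x: cauchy_solution(x, tau, a, m), q)
dS_dtau = central_diff(lambda x: cauchy_solution(q, x, a, m), tau)

p_expected = a * q / (1.0 + a * tau / m)  # p(q, tau) on the transported graph
hj_residual = dS_dtau + dS_dq ** 2 / (2.0 * m)
print(dS_dq - p_expected, hj_residual)    # both should be close to zero
```

In closed form this gives $S(q, \tau) = \frac{a q^2}{2(1 + a\tau/m)}$, so for $a < 0$ the solution exists only up to $t_1 = -m/a$, where the characteristics cross.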

For a general Hamiltonian H, we have the

$^{10}$ In practice, one solves the backward Cauchy problem with final conditions $q^*(\tau) = q$ and $p^*(\tau) = p$ for some $p$ and then uses the conditions $p_i^*(0) = \frac{\partial \sigma}{\partial q^i}(q^*(0))\ \forall i$ to determine $p$ as a function of $q$ and $\tau$.


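Theorem 23.2. The function

\[ S(q, \tau) := \sigma(\phi(q, \tau)) + \int_0^\tau \left( \sum_{i=1}^d p_i^*(t)\,\dot q^{*i}(t) - H\big(q^*(t), p^*(t)\big) \right) dt \]

solves the Cauchy problem for $\tau \in [0, t_1)$.

Notice, by the way, that for $H$ the Legendre transform of $L$ one has

\[ \int_0^\tau \left( \sum_{i=1}^d p_i^*(t)\,\dot q^{*i}(t) - H\big(q^*(t), p^*(t)\big) \right) dt = \int_0^\tau L\big(q^*(t), \dot q^*(t)\big)\, dt. \]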

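Proof. We clearly have $S(q, 0) = \sigma(q)$. Then recall that the Hamilton equations for $H$ are also the EL equations for the Lagrangian function

\[ \hat L(q, p, v_q, v_p, t) := \sum_{i=1}^d p_i v_q^i - H(q, p, t) \]

defined on $(U \times \mathbb{R}^d) \times \mathbb{R}^{2d} \times I$. Denote by $\hat S$ the action functional corresponding to $\hat L$ and observe that

\[ S_{\mathrm{Ham}}(q, \tau) := \int_0^\tau \left( \sum_{i=1}^d p_i^*(t)\,\dot q^{*i}(t) - H\big(q^*(t), p^*(t)\big) \right) dt = \hat S[(q^*, p^*)]. \]

We now want to compute derivatives of $S_{\mathrm{Ham}}$ with respect to its arguments. We essentially proceed as in the proof of Theorem 22.1. The first remark is that, for any solution $(Q, P)$ of the Hamilton equations on an interval $[0, \tau]$, we have

\[ \delta \hat S[(Q, P), (\delta Q, \delta P)] := \lim_{\varepsilon \to 0} \frac{\hat S[(Q + \varepsilon\,\delta Q, P + \varepsilon\,\delta P)] - \hat S[(Q, P)]}{\varepsilon} = \sum_{i=1}^d \big( P_i(\tau)\,\delta Q^i(\tau) - P_i(0)\,\delta Q^i(0) \big). \]

In particular, for the characteristic $(q^*, p^*)$ we have

\[ \delta \hat S[(q^*, p^*), (\delta Q, \delta P)] = \sum_{i=1}^d p_i(q, \tau)\,\delta Q^i(\tau) - \sum_{i=1}^d \frac{\partial \sigma}{\partial q^i}(\phi(q, \tau))\,\delta Q^i(0), \]

where again we have written $p^*(\tau) = p(q, \tau)$.

Now consider the characteristic $(q^*_\varepsilon, p^*_\varepsilon)$ corresponding to $(q + \varepsilon\,\delta q, \tau)$. We write $q^*_\varepsilon = q^* + \varepsilon\,\delta q^* + O(\varepsilon^2)$ and $p^*_\varepsilon = p^* + \varepsilon\,\delta p^* + O(\varepsilon^2)$. Since

\[ q + \varepsilon\,\delta q = q^*_\varepsilon(\tau) = q + \varepsilon\,\delta q^*(\tau) + O(\varepsilon^2), \]

we get $\delta q^*(\tau) = \delta q$. Since

\[ \phi(q + \varepsilon\,\delta q, \tau) = q^*_\varepsilon(0) = \phi(q, \tau) + \varepsilon\,\delta q^*(0) + O(\varepsilon^2), \]

we get $\delta q^{*i}(0) = \sum_{j=1}^d \frac{\partial \phi^i}{\partial q^j}(q, \tau)\,\delta q^j$. Hence

\[ \delta \hat S[(q^*, p^*), (\delta q^*, \delta p^*)] = \sum_{i=1}^d p_i(q, \tau)\,\delta q^i - \sum_{i,j=1}^d \frac{\partial \sigma}{\partial q^i}(\phi(q, \tau))\,\frac{\partial \phi^i}{\partial q^j}(q, \tau)\,\delta q^j. \]

Since

\[ \lim_{\varepsilon \to 0} \frac{S_{\mathrm{Ham}}(q + \varepsilon\,\delta q, \tau) - S_{\mathrm{Ham}}(q, \tau)}{\varepsilon} = \sum_{j=1}^d \frac{\partial S_{\mathrm{Ham}}}{\partial q^j}(q, \tau)\,\delta q^j, \]

we finally get

\[ \frac{\partial S_{\mathrm{Ham}}}{\partial q^j}(q, \tau) = p_j(q, \tau) - \sum_{i=1}^d \frac{\partial \sigma}{\partial q^i}(\phi(q, \tau))\,\frac{\partial \phi^i}{\partial q^j}(q, \tau) \]

and so

\[ \frac{\partial S}{\partial q^j}(q, \tau) = p_j(q, \tau). \]

Similarly, we now denote by $(q^*_\varepsilon, p^*_\varepsilon)$ the characteristic corresponding to $(q, \tau + \varepsilon\,\delta\tau)$. We assume $\varepsilon\,\delta\tau \ge 0$ and denote by $(\tilde q^*_\varepsilon, \tilde p^*_\varepsilon)$ the restriction of $(q^*_\varepsilon, p^*_\varepsilon)$ to $[0, \tau]$. We then have

\[ S_{\mathrm{Ham}}(q, \tau + \varepsilon\,\delta\tau) = \hat S[(\tilde q^*_\varepsilon, \tilde p^*_\varepsilon)] + \left( \sum_{i=1}^d p_i(q, \tau)\,\dot q^{*i}(\tau) - H(q, p(q, \tau)) \right) \varepsilon\,\delta\tau + O(\varepsilon^2). \tag{28} \]

We write $\tilde q^*_\varepsilon = q^* + \varepsilon\,\delta q^* + O(\varepsilon^2)$ and $\tilde p^*_\varepsilon = p^* + \varepsilon\,\delta p^* + O(\varepsilon^2)$. Reasoning as above we get $\delta q^*(\tau) = -\dot q^*(\tau)\,\delta\tau$ and $\delta q^{*i}(0) = \frac{\partial \phi^i}{\partial \tau}(q, \tau)\,\delta\tau$. So, putting everything together, we get

\[ \frac{\partial S_{\mathrm{Ham}}}{\partial \tau}(q, \tau) = -H(q, p(q, \tau)) - \sum_{i=1}^d \frac{\partial \sigma}{\partial q^i}(\phi(q, \tau))\,\frac{\partial \phi^i}{\partial \tau}(q, \tau) \]

and so

\[ \frac{\partial S}{\partial \tau}(q, \tau) = -H(q, p(q, \tau)). \]

Hence $S$ solves the Hamilton–Jacobi equation.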

Remark 23.1. In Remark 21.1 we have seen that the Hamilton–Jacobi equation is related to the asymptotics of the Schrödinger equation. With the results of this Section, we now also see that $\psi(q, t) := e^{\frac{i}{\hbar}S(q,t)}$ solves the Schrödinger equation up to $O(\hbar)$. This shows the role of the exponential of the action functional in quantum mechanics (its full-fledged role appears in Feynman's path integral).

24. Generating functions

Let $\omega = \sum_{i=1}^d dp_i\,dq^i$ be the symplectic form on $W \subset \mathbb{R}^{2d}$ with coordinates $(q, p)$ and $\Omega = \sum_{i=1}^d dP_i\,dQ^i$ be the symplectic form on $Z \subset \mathbb{R}^{2d}$ with coordinates $(Q, P)$. Recall that a symplectomorphism (in this context a.k.a. canonical transformation) is a diffeomorphism $\phi : W \to Z$ such that $\phi^*\Omega = \omega$. Also recall that the orbits of the Hamiltonian system with Hamiltonian function $H$ on $W$ are bijectively mapped by the symplectomorphism $\phi$ to the orbits of the Hamiltonian system with Hamiltonian function $\tilde H := H \circ \phi^{-1}$ on $Z$. The idea is to look for a symplectomorphism that makes $\tilde H$ very simple, so that its Hamilton equations can be solved explicitly.

We actually look for a symplectomorphism that makes $\tilde H$ depend only on the $P$ variables: $\tilde H(Q, P) = K(P)$ for some function $K$. In this case, the Hamilton equations are just $\dot P_i = 0$ and $\dot Q^i = \frac{\partial K}{\partial P_i}(P)$, $\forall i$. The solution of the Cauchy problem with initial condition, say at time $t = 0$, given by $(Q_0, P_0)$ is then $P(t) = P_0$ and $Q^i(t) = Q_0^i + \frac{\partial K}{\partial P_i}(P_0)\,t$, $\forall i$.

Notice that $\omega = d\alpha$ with $\alpha = \sum_{i=1}^d p_i\,dq^i$ and that $\Omega = d\beta$ with $\beta = \sum_{i=1}^d P_i\,dQ^i$. Notice that a diffeomorphism $\phi : W \to Z$ such that $\alpha - \phi^*\beta$ is the differential of a function $F$ is in particular a symplectomorphism. We will only consider symplectomorphisms of this form.$^{11}$ More explicitly, we denote by $Q^i(q, p)$ and $P_i(q, p)$ the components of $\phi$. So we have

\[ \sum_{i=1}^d p_i\,dq^i - \sum_{i=1}^d P_i(q, p)\,dQ^i(q, p) = dF(q, p). \tag{29} \]

Now we assume that the graph of $\phi$ in $Z \times W$ may be parametrized by the $(q, P)$ variables (instead of the $(q, p)$ variables). Namely, we want to solve the equations $P_i = P_i(q, p)$ with respect to the $p$ variables, getting them as smooth functions of the $P$'s and the $q$'s. By the implicit function theorem, this is possible if the following condition is satisfied:

Assumption 1. We assume that the matrix $\left( \frac{\partial P_i}{\partial p_j} \right)_{i,j=1,\dots,d}$ is nondegenerate for all $(q, p) \in W$.

Under this assumption, we then get functions $p(q, P)$ and define $\tilde Q(q, P) := Q(q, p(q, P))$. Equation (29) now becomes

\[ \sum_{i=1}^d p_i(q, P)\,dq^i - \sum_{i=1}^d P_i\,d\tilde Q^i(q, P) = d\tilde F(q, P) \]

with $\tilde F(q, P) = F(q, p(q, P))$. Setting

\[ S(q, P) := \tilde F(q, P) + \sum_{i=1}^d P_i\,\tilde Q^i(q, P), \]

we finally get

\[ \sum_{i=1}^d p_i(q, P)\,dq^i + \sum_{i=1}^d Q^i(q, P)\,dP_i = dS(q, P), \]

where we have removed the tildes for simplicity of notation. Notice that this equation is equivalent to the system

\[ p_i = \frac{\partial S}{\partial q^i}, \tag{30} \]

\[ Q^i = \frac{\partial S}{\partial P_i}, \tag{31} \]

for $i = 1, \dots, d$. Notice that Assumption 1 is satisfied if the following holds:

Assumption 2. The matrix $\left( \frac{\partial^2 S}{\partial q^j \partial P_i} \right)_{i,j=1,\dots,d}$ is nondegenerate for all $(q, P)$.

$^{11}$ In general, $\phi$ is a symplectomorphism if and only if $\alpha - \phi^*\beta$ is closed. If $W$ and $Z$ are contractible, in particular star shaped, then every closed 1-form is automatically exact.


As the map $\phi$ can then be reconstructed from these equations, an $S$ satisfying this condition is called a generating function for $\phi$. Next we want $\tilde H(Q, P) = K(P)$. Since $\tilde H(Q(q, P), P) = H(q, p(q, P))$, we get by (30) that

\[ H\left(q, \frac{\partial S}{\partial q}\right) = K(P). \]

Hence $S$, as a function of $q$ parametrized by $P$, solves the reduced Hamilton–Jacobi equation at energy $K(P)$. A solution satisfying Assumption 2 is called a complete integral.

What we have shown is Jacobi's theorem that a complete integral for the Hamilton–Jacobi equation for $H$ allows one to solve its Hamilton equations.

Notice that the $P$ variables are constants of motion for the $\tilde H$ system. Also notice that their differentials are clearly linearly independent and that their pairwise Poisson brackets vanish. Regarded as functions of the $(q, p)$ variables, they are then independent constants of motion in involution for the $H$ system. A $d$-dimensional Hamiltonian system with $d$ independent constants of motion in involution is called integrable. We then see that the above method can only work for integrable systems.
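Jacobi's method can be illustrated on the one-dimensional harmonic oscillator. The following worked sketch is added here for illustration: it assumes $H(q, p) = \frac{p^2}{2m} + \frac{1}{2}m\omega^2 q^2$, takes the energy itself as the new momentum (so $K(P) = P$), and works locally on the region $p > 0$, $|q| < \sqrt{2P/(m\omega^2)}$, where Assumption 2 holds.

```latex
\[
\frac{1}{2m}\left(\frac{\partial S}{\partial q}\right)^2 + \frac{1}{2}m\omega^2 q^2 = P
\quad\Longrightarrow\quad
S(q, P) = \int_0^q \sqrt{2mP - m^2\omega^2 x^2}\; dx,
\qquad
\frac{\partial^2 S}{\partial q\,\partial P} = \frac{m}{\sqrt{2mP - m^2\omega^2 q^2}} \neq 0 .
\]
% Equation (31) gives the new coordinate:
\[
Q = \frac{\partial S}{\partial P}
  = \int_0^q \frac{m\; dx}{\sqrt{2mP - m^2\omega^2 x^2}}
  = \frac{1}{\omega}\arcsin\!\left(q\,\omega\sqrt{\frac{m}{2P}}\right).
\]
% The transformed Hamilton equations are \dot P = 0, \dot Q = \partial K/\partial P = 1, hence
\[
P(t) = P_0, \qquad Q(t) = Q_0 + t,
\qquad
q(t) = \sqrt{\frac{2P_0}{m\omega^2}}\,\sin\big(\omega (t + Q_0)\big),
\qquad
p(t) = \frac{\partial S}{\partial q} = \sqrt{2mP_0}\,\cos\big(\omega (t + Q_0)\big).
\]
```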

Part 7. Introduction to Differentiable manifolds

25. Introduction

Differentiable manifolds are sets that locally look like some $\mathbb{R}^n$, so that we can do calculus on them. Examples of manifolds are open subsets of $\mathbb{R}^n$ or subsets defined by constraints satisfying the assumptions of the implicit function theorem (example: the $n$-sphere $S^n$). Even in the latter case, however, it is more practical to think of manifolds intrinsically in terms of charts.

The example to bear in mind is that of the charts of the Earth collected in an atlas, with indications on how to pass from one chart to another.

26. Manifolds

Definition 26.1. Chart
A chart on a set $M$ is a pair $(U, \phi)$ where $U$ is a subset of $M$ and $\phi$ is an injective map from $U$ to $\mathbb{R}^n$ for some $n$.

The map $\phi$ is called a chart map or a coordinate map. One often refers to $\phi$ itself as a chart, for the subset $U$ is part of $\phi$ as its domain of definition.

If $(U, \phi_U)$ and $(V, \phi_V)$ are charts on $M$, we may compose the bijections $(\phi_U)|_{U \cap V} : U \cap V \to \phi_U(U \cap V)$ and $(\phi_V)|_{U \cap V} : U \cap V \to \phi_V(U \cap V)$ and get the bijection

\[ \phi_{U,V} := (\phi_V)|_{U \cap V} \circ \big((\phi_U)|_{U \cap V}\big)^{-1} : \phi_U(U \cap V) \to \phi_V(U \cap V), \]

called the transition map from $(U, \phi_U)$ to $(V, \phi_V)$ (or simply from $U$ to $V$).


Definition 26.2. Atlas
An atlas on a set $M$ is a collection of charts $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$, where $I$ is an index set, such that $\bigcup_{\alpha \in I} U_\alpha = M$.

Remark 26.1. We usually denote the transition maps between charts in an atlas $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ simply by $\phi_{\alpha\beta}$ (instead of $\phi_{U_\alpha, U_\beta}$).

One can easily check that, if $\phi_\alpha(U_\alpha)$ is open $\forall \alpha \in I$ (in the standard topology of the target), then the atlas $\mathcal{A} = \{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ defines a topology$^{12}$ on $M$:

\[ \mathcal{O}_{\mathcal{A}}(M) := \{ V \subset M \mid \phi_\alpha(V \cap U_\alpha) \text{ is open } \forall \alpha \in I \}. \]

We may additionally require that all $U_\alpha$ be open in this topology or, equivalently, that $\phi_\alpha(U_\alpha \cap U_\beta)$ is open $\forall \alpha, \beta \in I$. In this case we speak of an open atlas. All transition maps in an open atlas have open domain and codomain, so we can require them to belong to a class $\mathcal{C} \subset C^0$ of maps (e.g., $C^k$ for $k = 0, 1, \dots, \infty$, or analytic, or complex analytic, or Lipschitz).

Definition 26.3. $\mathcal{C}$-atlas
A $\mathcal{C}$-atlas is an open atlas such that all transition maps are $\mathcal{C}$-maps.

Example 26.1. Let $M = \mathbb{R}^n$. Then $\mathcal{A} = \{(\mathbb{R}^n, \phi)\}$ is a $\mathcal{C}$-atlas for any structure $\mathcal{C}$ if $\phi$ is an injective map with open image. Notice that $M$ has the standard topology iff $\phi$ is a homeomorphism with its image. If $\phi$ is the identity map $\mathrm{Id}$, this is called the standard atlas for $\mathbb{R}^n$.

Example 26.2. Let $M$ be an open subset of $\mathbb{R}^n$ with its standard topology. Then $\mathcal{A} = \{(M, \iota)\}$, with $\iota$ the inclusion map, is a $\mathcal{C}$-atlas for any structure $\mathcal{C}$.

Example 26.3. Let $M = \mathbb{R}^n$. Let $\mathcal{A} = \{(\mathbb{R}^n, \mathrm{Id}), (\mathbb{R}^n, \phi)\}$. Then $\mathcal{A}$ is a $\mathcal{C}$-atlas iff $\phi$ and its inverse are $\mathcal{C}$-maps.

Example 26.4. Let $M$ be the set of lines (i.e., one-dimensional affine subspaces) of $\mathbb{R}^2$. Let $U_1$ be the subset of nonvertical lines and $U_2$ the subset of nonhorizontal lines. Notice that every line in $U_1$ can be uniquely parametrized as $y = m_1 x + q_1$ and every line in $U_2$ can be uniquely parametrized as $x = m_2 y + q_2$. Define $\phi_i : U_i \to \mathbb{R}^2$ as the map that assigns to a line the corresponding pair $(m_i, q_i)$, for $i = 1, 2$. Then $\mathcal{A} = \{(U_1, \phi_1), (U_2, \phi_2)\}$ is a $C^k$-atlas for $k = 0, 1, 2, \dots, \infty$.
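The transition map of Example 26.4 can be written down explicitly: a line $y = m_1 x + q_1$ with $m_1 \neq 0$ is the same as $x = \frac{1}{m_1}y - \frac{q_1}{m_1}$, so $\phi_{12}(m_1, q_1) = (1/m_1, -q_1/m_1)$ on $\{m_1 \neq 0\}$, which is smooth. A minimal sketch follows; the function name `phi12` and the sample line are illustrative choices.

```python
def phi12(m1, q1):
    """Transition map from the chart (m1, q1) of y = m1*x + q1 to the chart (m2, q2) of x = m2*y + q2.

    Defined only for m1 != 0, i.e. on lines that are neither vertical nor horizontal.
    """
    if m1 == 0:
        raise ValueError("horizontal lines do not belong to U2")
    return (1.0 / m1, -q1 / m1)


# Sample line y = 2x + 3, which contains the point (x, y) = (1, 5).
m2, q2 = phi12(2.0, 3.0)
x_from_second_chart = m2 * 5.0 + q2   # should reproduce x = 1

# By symmetry the inverse transition map is given by the same formula.
round_trip = phi12(m2, q2)
print(m2, q2, x_from_second_chart, round_trip)
```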

Example 26.5. Let $M = S^n := \{x \in \mathbb{R}^{n+1} \mid \sum_{i=1}^{n+1} (x^i)^2 = 1\}$ be the $n$-sphere. Let $N = (0, \dots, 0, 1)$ and $S = (0, \dots, 0, -1)$ denote its north and south poles, respectively. Let $U_N := S^n \setminus \{N\}$ and $U_S := S^n \setminus \{S\}$. Let $\phi_N : U_N \to \mathbb{R}^n$ and $\phi_S : U_S \to \mathbb{R}^n$ be the stereographic projections with respect to $N$ and $S$, respectively: $\phi_N$ maps a point $y \in U_N$ to the intersection of the plane $x^{n+1} = 0$ with the line passing through $N$ and $y$; similarly for $\phi_S$. Then $\mathcal{A} = \{(U_N, \phi_N), (U_S, \phi_S)\}$ is a $C^k$-atlas for $k = 0, 1, 2, \dots, \infty$.
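In coordinates, the stereographic projections of Example 26.5 read $\phi_N(y) = \frac{(y^1, \dots, y^n)}{1 - y^{n+1}}$ and $\phi_S(y) = \frac{(y^1, \dots, y^n)}{1 + y^{n+1}}$, and the transition map on $\mathbb{R}^n \setminus \{0\}$ is the inversion $u \mapsto u / |u|^2$, which is smooth. The following sketch checks this numerically; the function names and the sample point are illustrative.

```python
def phi_N(y):
    """Stereographic projection from the north pole; y is a point of S^n other than N."""
    *head, last = y
    return [c / (1.0 - last) for c in head]


def phi_S(y):
    """Stereographic projection from the south pole; y is a point of S^n other than S."""
    *head, last = y
    return [c / (1.0 + last) for c in head]


def phi_N_inverse(u):
    """Inverse of phi_N: maps u in R^n to the corresponding point of S^n minus the north pole."""
    r2 = sum(c * c for c in u)
    return [2.0 * c / (r2 + 1.0) for c in u] + [(r2 - 1.0) / (r2 + 1.0)]


def transition_N_to_S(u):
    """Transition map phi_S o phi_N^{-1}, defined for u != 0."""
    return phi_S(phi_N_inverse(u))


u = [0.5, -1.2]                       # sample point of R^2, i.e. n = 2
y = phi_N_inverse(u)
norm2 = sum(c * c for c in y)         # should be 1: y lies on the sphere
r2 = sum(c * c for c in u)
inversion = [c / r2 for c in u]       # expected value of the transition map
print(norm2, transition_N_to_S(u), inversion)
```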

Example 26.6. Let $M$ be a subset of $\mathbb{R}^n$ defined by $C^k$-constraints satisfying the assumptions of the implicit function theorem. Then locally $M$ can be regarded as the graph of a $C^k$-map. Any open covering of $M$ with this property yields a $C^k$-atlas.

The same set often occurs with different atlases, as in the last example, that we wish to consider equivalent.

$^{12}$ For more on topology, see Appendix A.


Definition 26.4. $\mathcal{C}$-equivalence
Two $\mathcal{C}$-atlases on the same set are $\mathcal{C}$-equivalent if their union is also a $\mathcal{C}$-atlas.

Notice that the union of two atlases has in general more transition maps, and in checking equivalence one has to check that the new transition maps are also $\mathcal{C}$-maps.

Example 26.7. Let $M = \mathbb{R}^n$, $\mathcal{A}_1 = \{(\mathbb{R}^n, \mathrm{Id})\}$ and $\mathcal{A}_2 = \{(\mathbb{R}^n, \phi)\}$ for an injective map $\phi$ with open image. These two atlases are $\mathcal{C}$-equivalent iff $\phi$ and its inverse are $\mathcal{C}$-maps.

We finally arrive at the

Definition 26.5. $\mathcal{C}$-manifold
A $\mathcal{C}$-manifold is an equivalence class of $\mathcal{C}$-atlases.

Remark 26.2. Usually in defining a $\mathcal{C}$-manifold we explicitly introduce one atlas and tacitly consider the corresponding $\mathcal{C}$-manifold as the equivalence class containing this atlas. Also notice that the union of all atlases in a given class is also an atlas, called the maximal atlas, in the same equivalence class. Thus, we may equivalently define a manifold as a set with a maximal atlas. This is not very practical as the maximal atlas is huge.

Working with an equivalence class of atlases instead of a single one also has the advantage that whatever definition we want to give requires choosing just a particular atlas in the class and we may choose the most convenient one.

Example 26.8. The standard $\mathcal{C}$-manifold structure on $\mathbb{R}^n$ is the $\mathcal{C}$-equivalence class of the atlas $\{(\mathbb{R}^n, \mathrm{Id})\}$.

Remark 26.3. Notice that the same set can be given different manifold structures. For example, let $M = \mathbb{R}^n$. On it we have the standard $\mathcal{C}$-structure of the previous example. For any injective map $\phi$ with open image we also have the $\mathcal{C}$-structure given by the equivalence class of the $\mathcal{C}$-atlas $\{(\mathbb{R}^n, \phi)\}$. The two structures define the same $\mathcal{C}$-manifold iff $\phi$ and its inverse are $\mathcal{C}$-maps. Notice that if $\phi$ is not a homeomorphism, the two manifolds are different also as topological spaces. Suppose that $\phi$ is a homeomorphism but not a $C^k$-diffeomorphism; then the two structures define the same topological space and the same $C^0$-manifold, but not the same $C^k$-manifold.

Recall that the existence of a $C^k$-diffeomorphism between an open subset of $\mathbb{R}^m$ and an open subset of $\mathbb{R}^n$ implies $m = n$, since the differential at any point is a linear isomorphism of $\mathbb{R}^m$ and $\mathbb{R}^n$ as vector spaces (the result is also true for homeomorphisms, though the proof is more difficult). So we have the

Definition 26.6. Dimension
A connected manifold has dimension $n$ if for any (and hence for all) of its charts the target of the chart map is $\mathbb{R}^n$. In general, we say that a manifold has dimension $n$ if all its connected components have dimension $n$. We write $\dim M = n$.


27. Maps

Let $F : M \to N$ be a map of sets. Let $(U, \phi_U)$ be a chart on $M$ and $(V, \psi_V)$ be a chart on $N$. The map

\[ F_{U,V} := \psi_V \circ F|_U \circ \phi_U^{-1} : \phi_U(U) \to \psi_V(V) \]

is called the representation of $F$ in the charts $(U, \phi_U)$ and $(V, \psi_V)$.

Definition 27.1. $\mathcal{C}$-map
A map $F : M \to N$ between $\mathcal{C}$-manifolds is called a $\mathcal{C}$-map or $\mathcal{C}$-morphism if all its representations are $\mathcal{C}$-maps.

Remark 27.1. Notice that it is enough to choose one atlas in the equivalence class of the source and one atlas in the equivalence class of the target and to check that all representations are $\mathcal{C}$-maps for charts of these two atlases. The condition will then automatically hold for any other atlases in the same class.

Definition 27.2. $\mathcal{C}$-function
A $\mathcal{C}$-map from a $\mathcal{C}$-manifold $M$ to $\mathbb{R}$ with its standard manifold structure is called a $\mathcal{C}$-function. We denote by $\mathcal{C}(M)$ the vector space of $\mathcal{C}$-functions on $M$.

Remark 27.2. Notice that a $C^k$-map between open subsets of Cartesian powers of $\mathbb{R}$ is also automatically $C^l$ $\forall l \le k$, so a $C^k$-manifold can be regarded also as a $C^l$-manifold $\forall l \le k$. As a consequence, $\forall l \le k$, we have the notion of $C^l$-maps between $C^k$-manifolds and of $C^l$-functions on a $C^k$-manifold.

Definition 27.3. $C^k$-diffeomorphism
An invertible $\mathcal{C}$-map between $\mathcal{C}$-manifolds whose inverse is also a $\mathcal{C}$-map is called a $\mathcal{C}$-isomorphism. A $C^k$-isomorphism, $k \ge 1$, is usually called a $C^k$-diffeomorphism (or just a diffeomorphism).

Example 27.1. Let $M$ and $N$ be open subsets of Cartesian powers of $\mathbb{R}$ with the standard $\mathcal{C}$-manifold structure. Then a map is a $\mathcal{C}$-map of $\mathcal{C}$-manifolds iff it is a $\mathcal{C}$-map in the standard sense.

Example: Let $M$ be a $\mathcal{C}$-manifold and $U$ an open subset thereof. We consider $U$ as a $\mathcal{C}$-manifold by restricting any atlas from $M$ to $U$. Then the inclusion map $\iota : U \to M$ is a $\mathcal{C}$-map.

Example 27.2. Let $M$ be $\mathbb{R}^n$ with the equivalence class of the atlas $\{(\mathbb{R}^n, \phi)\}$, where $\phi$ is an injective map with open image. Let $N$ be $\mathbb{R}^n$ with its standard structure. Then $\phi : M \to N$ is a $\mathcal{C}$-map for any $\mathcal{C}$ (since its representation is the identity map on an open subset of $\mathbb{R}^n$). If in addition $\phi$ is also surjective, then $\phi : M \to N$ is a $\mathcal{C}$-isomorphism.

Remark 27.3. Let $M$ and $N$ be as in the previous example with $\phi$ a bijection. Assume that $\phi : \mathbb{R}^n \to \mathbb{R}^n$ is a homeomorphism but not a $C^k$-diffeomorphism. Then the given atlases are $C^0$-equivalent but not $C^k$-equivalent. As a consequence, $M$ and $N$ are the same $C^0$-manifold but different $C^k$-manifolds. On the other hand, $\phi : M \to N$ is always a $C^k$-diffeomorphism of $C^k$-manifolds. It is more difficult to find examples of two $C^k$-manifolds that are the same $C^0$-manifold (or $C^0$-isomorphic to each other), but are different, non-$C^k$-diffeomorphic $C^k$-manifolds. Milnor constructed a $C^\infty$-manifold structure on the 7-sphere that is not diffeomorphic to the standard 7-sphere. From the work of Donaldson and Freedman one can derive uncountably many different $C^\infty$-manifold structures on $\mathbb{R}^4$ (called the exotic $\mathbb{R}^4$s) that are not diffeomorphic to each other nor to the standard $\mathbb{R}^4$. In dimension 3 and less, one can show that any two $C^0$-isomorphic manifolds are also diffeomorphic.
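A concrete instance of the first part of Remark 27.3, added here as an illustration, is the case $n = 1$ with $\phi(x) = x^3$ (a choice made for this sketch). The transition maps between the charts $(\mathbb{R}, \mathrm{Id})$ and $(\mathbb{R}, \phi)$ and the representation of $\phi : M \to N$ in these charts are:

```latex
\[
\phi_{\mathrm{Id},\phi} = \phi \circ \mathrm{Id}^{-1} : x \mapsto x^3,
\qquad
\phi_{\phi,\mathrm{Id}} = \mathrm{Id} \circ \phi^{-1} : y \mapsto y^{1/3}.
\]
% The second map is continuous but not differentiable at y = 0, so the two atlases are
% C^0-equivalent but not C^k-equivalent for any k >= 1.
\[
\mathrm{Id} \circ \phi \circ \phi^{-1} = \mathrm{Id}_{\mathbb{R}},
\]
% which is smooth with smooth inverse: phi is a C^k-diffeomorphism from M to N,
% even though M and N are different C^k-manifolds.
```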

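27.1. Submanifolds. A submanifold is a subset of a manifold that is locally given by fixing some coordinates. More precisely:

Definition 27.4. $\mathcal{C}$-submanifold
Let $N$ be an $n$-dimensional $\mathcal{C}$-manifold. A $k$-dimensional $\mathcal{C}$-submanifold, $k \le n$, is a subset $M$ of $N$ such that there is a $\mathcal{C}$-atlas $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ of $N$ with the property that, $\forall \alpha$ such that $U_\alpha \cap M \neq \emptyset$, we have $\phi_\alpha(U_\alpha) = V_{1,\alpha} \times V_{2,\alpha}$ with $V_{1,\alpha}$ open in $\mathbb{R}^k$ and $V_{2,\alpha}$ open in $\mathbb{R}^{n-k}$ and $\phi_\alpha(U_\alpha \cap M) = V_{1,\alpha} \times \{x\}$ for some $x$ in $V_{2,\alpha}$.

One can prove that the charts $(U_\alpha, \phi_\alpha)$ with $U_\alpha \cap M \neq \emptyset$, restricted to $U_\alpha \cap M$ and composed with the projection onto $V_{1,\alpha}$, form a $\mathcal{C}$-atlas for $M$. Moreover, the inclusion map $M \to N$ becomes a $\mathcal{C}$-map.

Example 27.3. Any open subset $M$ of a manifold $N$ is a submanifold.

Example 27.4. One may check that a subset of the standard $\mathbb{R}^n$ defined in terms of $C^k$-constraints satisfying the assumptions of the implicit function theorem is a $C^k$-submanifold. There is a more general version of this, the implicit function theorem for manifolds, which we do not present here.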

28. Topological manifolds

In this Section we concentrate on $C^0$-manifolds. Notice however that every $\mathcal{C}$-manifold is by definition also a $C^0$-manifold.

As we have seen, an atlas whose chart maps have open images defines a topology. In this topology the chart maps are clearly open maps. We also have the

Lemma 28.1. All the chart maps of a $C^0$-atlas are continuous, so they are homeomorphisms with their images.

Proof. Consider a chart $(U_\alpha, \phi_\alpha)$, $\phi_\alpha : U_\alpha \to \mathbb{R}^n$. Let $V$ be an open subset of $\mathbb{R}^n$ and $W := \phi_\alpha^{-1}(V)$. For any chart $(U_\beta, \phi_\beta)$ we have $\phi_\beta(W \cap U_\beta) = \phi_{\alpha\beta}(V)$. In a $C^0$-atlas, all transition maps are homeomorphisms, so $\phi_{\alpha\beta}(V)$ is open for all $\beta$, which shows that $W$ is open. We have thus proved that $\phi_\alpha$ is continuous. Since we already know that it is injective and open, we conclude that it is a homeomorphism with its image.

Different atlases in general define different topologies. However,

Lemma 28.2. Two $C^0$-equivalent $C^0$-atlases define the same topology.

Proof. Let $\mathcal{A}_1 = \{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ and $\mathcal{A}_2 = \{(U_j, \phi_j)\}_{j \in J}$ be $C^0$-equivalent. Let $W$ be open in the $\mathcal{A}_1$-topology. We have $\phi_j(W \cap U_\alpha) = \phi_{\alpha j}(\phi_\alpha(W \cap U_\alpha))$. Since the atlases are equivalent, $\phi_{\alpha j}$ is a homeomorphism and, since $W$ is $\mathcal{A}_1$-open, $\phi_\alpha(W \cap U_\alpha)$ is open. Hence $\phi_j(W \cap U_\alpha)$ is open. Since this holds for all $j \in J$, we get that $W \cap U_\alpha$ is open in the $\mathcal{A}_2$-topology. Finally, we write $W = \bigcup_{\alpha \in I} W \cap U_\alpha$, i.e., as a union of $\mathcal{A}_2$-open sets. This shows that $W$ is open in the $\mathcal{A}_2$-topology.

As a consequence, a $C^0$-manifold has a canonically associated topology in which all charts are homeomorphisms. This suggests the following

Definition 28.1. Topological manifold
A topological manifold is a topological space endowed with an atlas $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ in which all $U_\alpha$ are open and all $\phi_\alpha$ are homeomorphisms with their images.

Theorem 28.3. A topological manifold is the same as a $C^0$-manifold.

Proof. We have seen above that a $C^0$-manifold structure defines a topology in which every atlas in the equivalence class has the properties in the definition of a topological manifold; so a $C^0$-manifold is a topological manifold. On the other hand, the atlas of a topological manifold is open and all transition maps are homeomorphisms since they are now compositions of homeomorphisms. The $C^0$-equivalence class of this atlas then defines a $C^0$-manifold.

Also notice the following

Lemma 28.4. Let $M$ and $N$ be $C^0$-manifolds and so, consequently, topological manifolds. A map $F : M \to N$ is a $C^0$-map iff it is continuous. In particular, a $C^0$-isomorphism is the same as a homeomorphism.

Proof. Suppose that $F$ is a $C^0$-map. Let $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ be an atlas on $M$ and $\{(V_\beta, \psi_\beta)\}_{\beta \in J}$ be an atlas on $N$. For every $W \subset N$, $\forall \alpha \in I$ and $\forall \beta \in J$, we have $\phi_\alpha(F^{-1}(W \cap V_\beta) \cap U_\alpha) = F_{\alpha,\beta}^{-1}(\psi_\beta(W \cap V_\beta))$. If $W$ is open, then $\psi_\beta(W \cap V_\beta)$ is open for all $\beta$. Since all $F_{\alpha,\beta}$ are continuous, we conclude that $\phi_\alpha(F^{-1}(W \cap V_\beta) \cap U_\alpha)$ is open for all $\alpha$ and all $\beta$. Hence, $F^{-1}(W \cap V_\beta)$ is open for all $\beta$, so $F^{-1}(W) = \bigcup_{\beta \in J} F^{-1}(W \cap V_\beta)$ is open. This shows that $F$ is continuous.

On the other hand, if $F$ is continuous, then all its representations are also continuous since all chart maps are homeomorphisms. Thus, $F$ is a $C^0$-map.

Remark 28.1. In the following we will no longer distinguish between $C^0$-manifolds and topological manifolds.$^{13}$ Both descriptions are useful. Sometimes we are given a set with charts (as in the example of the manifold of lines in the plane). In other cases, we are given a topological space directly (as in all examples where our manifold arises as a subset of another manifold, e.g., $\mathbb{R}^n$).

In the definition of a topological manifold, several textbooks assume the topology to be Hausdorff and second countable. These properties have important consequences (like the existence of a partition of unity, which is fundamental in several contexts, e.g., in showing the existence of Riemannian metrics, or in proving Stokes' theorem), but are not strictly necessary, so we will not assume them here.

$^{13}$ What we have proved above is that the category of $C^0$-manifolds and the category of topological manifolds are isomorphic, if you know what categories are.

Notice that the Hausdorff property, stating that distinct points always have disjoint neighborhoods, is usually only assumed to avoid the "pathology" of having limits that are not uniquely defined. There are however several cases when one needs non-Hausdorff manifolds, so this assumption is often no longer required in modern textbooks.

Example: Let $M := \mathbb{R} \cup \{*\}$ where $\{*\}$ is a one-element set (and $* \notin \mathbb{R}$). Let $U_1 = \mathbb{R}$, $\phi_1 = \mathrm{Id}$, and $U_2 = (\mathbb{R} \setminus \{0\}) \cup \{*\}$ with $\phi_2 : U_2 \to \mathbb{R}$ defined by $\phi_2(x) = x$ if $x \in \mathbb{R} \setminus \{0\}$ and $\phi_2(*) = 0$. One can easily see that this is a $C^0$-atlas (actually a $C^\infty$-atlas, for the transition functions are just identity maps). On the other hand, the induced topology is not Hausdorff, for $0$ and $*$ do not have disjoint neighborhoods.

29. Differentiable manifolds

A $C^k$-manifold with $k \ge 1$ is also called a differentiable manifold. If $k = \infty$, one also speaks of a smooth manifold. The $C^k$-morphisms are also called differentiable maps, and also smooth maps in case $k = \infty$. Recall the following

Definition 29.1. Immersion and Submersion
Let $F : U \to V$ be a differentiable map between open subsets of Cartesian powers of $\mathbb{R}$. The map $F$ is called an immersion if $d_xF$ is injective $\forall x \in U$ and a submersion if $d_xF$ is surjective $\forall x \in U$.

Then we have the

Definition 29.2. A differentiable map between differentiable manifolds is called an immersion if all its representations are immersions and a submersion if all its representations are submersions. An injective immersion is also called an embedding.

Observe that to check if a map is an immersion or a submersion one just has to consider all representations for a given choice of atlases.

One can prove that the image of an embedding is a submanifold (and this is one very common way in which submanifolds arise in examples).

29.1. The tangent space. Recall that to an open subset of $\mathbb{R}^n$ we associate another copy of $\mathbb{R}^n$, called its tangent space. Elements of this space, the tangent vectors, have the geometric interpretation of velocities of curves passing through a point or of directions along which we can differentiate functions. We will use all these viewpoints to give different characterizations of tangent vectors to a manifold, even though we relegate the last one, directional derivatives, to Appendix B as it can be safely ignored for the rest of these notes.

In the following M is an n-dimensional Ck-manifold, k ≥ 1.


Definition 29.3 (Coordinatized tangent vector). A coordinatized tangent vector at $q \in M$ is a triple $(U, \phi_U, v)$ where $(U, \phi_U)$ is a chart with $U \ni q$ and $v$ is an element of $\mathbb{R}^n$. Two coordinatized tangent vectors $(U, \phi_U, v)$ and $(V, \phi_V, w)$ at $q$ are defined to be equivalent if $w = d_{\phi_U(q)}\phi_{U,V}\, v$. A tangent vector at $q \in M$ is an equivalence class of coordinatized tangent vectors at $q$. We denote by $T_qM$, the tangent space of $M$ at $q$, the set of tangent vectors at $q$.

A chart $(U, \phi_U)$ at $q$ defines a bijection of sets
\[
\Phi_{q,U} \colon T_qM \to \mathbb{R}^n, \qquad [(U, \phi_U, v)] \mapsto v. \tag{32}
\]

We will also simply write $\Phi_U$ when the point $q$ is understood. Using this bijection, we can transfer the vector space structure from $\mathbb{R}^n$ to $T_qM$, making $\Phi_U$ into a linear isomorphism. A crucial result is that this linear structure does not depend on the choice of the chart:

Lemma 29.1. $T_qM$ has a canonical structure of vector space for which $\Phi_{q,U}$ is an isomorphism for every chart $(U, \phi_U)$ containing $q$.

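Proof. Given a chart $(U, \phi_U)$, the bijection $\Phi_U$ defines the linear structure
\begin{align*}
\lambda \cdot_U [(U, \phi_U, v)] &= [(U, \phi_U, \lambda v)], \\
[(U, \phi_U, v)] +_U [(U, \phi_U, v')] &= [(U, \phi_U, v + v')],
\end{align*}
$\forall \lambda \in \mathbb{R}$ and $\forall v, v' \in \mathbb{R}^n$. If $(V, \phi_V)$ is another chart, we have
\begin{align*}
\lambda \cdot_U [(U, \phi_U, v)] &= [(U, \phi_U, \lambda v)] \\
&= [(V, \phi_V, d_{\phi_U(q)}\phi_{U,V}\, \lambda v)] = [(V, \phi_V, \lambda\, d_{\phi_U(q)}\phi_{U,V}\, v)] \\
&= \lambda \cdot_V [(V, \phi_V, d_{\phi_U(q)}\phi_{U,V}\, v)] = \lambda \cdot_V [(U, \phi_U, v)],
\end{align*}
so $\cdot_U = \cdot_V$. Similarly,
\begin{align*}
[(U, \phi_U, v)] +_U [(U, \phi_U, v')] &= [(U, \phi_U, v + v')] \\
&= [(V, \phi_V, d_{\phi_U(q)}\phi_{U,V}\, (v + v'))] = [(V, \phi_V, d_{\phi_U(q)}\phi_{U,V}\, v + d_{\phi_U(q)}\phi_{U,V}\, v')] \\
&= [(V, \phi_V, d_{\phi_U(q)}\phi_{U,V}\, v)] +_V [(V, \phi_V, d_{\phi_U(q)}\phi_{U,V}\, v')] \\
&= [(U, \phi_U, v)] +_V [(U, \phi_U, v')],
\end{align*}
so $+_U = +_V$.

From now on we will simply write $\lambda [(U, \phi_U, v)]$ and $[(U, \phi_U, v)] + [(U, \phi_U, v')]$ without the $U$ label.

Notice that in particular we have
\[
\dim T_qM = \dim M,
\]
where $\dim$ denotes on the left-hand side the dimension of a vector space and on the right-hand side the dimension of a manifold.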

Let now $F \colon M \to N$ be a differentiable map. Given a chart $(U, \phi_U)$ of $M$ containing $q$ and a chart $(V, \psi_V)$ of $N$ containing $F(q)$, we have the linear map
\[
d^{U,V}_q F := \Phi_V^{-1}\, d_{\phi_U(q)}F_{U,V}\, \Phi_U \colon T_qM \to T_{F(q)}N.
\]

Lemma 29.2. The linear map $d^{U,V}_q F$ does not depend on the choice of charts, so we have a canonically defined linear map
\[
d_qF \colon T_qM \to T_{F(q)}N.
\]

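Proof. Let $(U', \phi_{U'})$ be also a chart containing $q$ and $(V', \psi_{V'})$ be also a chart containing $F(q)$. Since $\psi_{V,V'} \circ F_{U,V} = F_{U',V'} \circ \phi_{U,U'}$, the chain rule gives
\begin{align*}
d^{U,V}_q F\, [(U, \phi_U, v)] &= [(V, \psi_V, d_{\phi_U(q)}F_{U,V}\, v)] \\
&= [(V', \psi_{V'}, d_{\psi_V(F(q))}\psi_{V,V'}\; d_{\phi_U(q)}F_{U,V}\, v)] \\
&= [(V', \psi_{V'}, d_{\phi_{U'}(q)}F_{U',V'}\; d_{\phi_U(q)}\phi_{U,U'}\, v)] \\
&= d^{U',V'}_q F\, [(U', \phi_{U'}, d_{\phi_U(q)}\phi_{U,U'}\, v)] = d^{U',V'}_q F\, [(U, \phi_U, v)],
\end{align*}
so $d^{U,V}_q F = d^{U',V'}_q F$.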

We also immediately have the following

Lemma 29.3. Let $F \colon M \to N$ and $G \colon N \to Z$ be differentiable maps. Then $d_q(G \circ F) = d_{F(q)}G\; d_qF$ for all $q \in M$.

Remark 29.1. Suppose $M$ is a submanifold of $\mathbb{R}^n$ defined by $l$ constraints satisfying the conditions of the implicit function theorem. We may reorganize the constraints as a map $\Phi \colon \mathbb{R}^n \to \mathbb{R}^l$. The conditions are that $d_q\Phi$ for $q \in M$ is surjective and that $d_q\Phi\; d_q\iota = 0$, where $\iota$ is the inclusion map of $M$ into $\mathbb{R}^n$. As a consequence, $T_qM = \ker d_q\Phi$ $\forall q \in M$. This is an explicit way of computing the tangent space.

Remark 29.2. Notice that we can now characterize immersions and submersions, introduced in Definition 29.2, as follows: A differentiable map $F \colon M \to N$ is an immersion iff $d_qF$ is injective $\forall q \in M$ and is a submersion iff $d_qF$ is surjective $\forall q \in M$.

A differentiable curve in $M$ is a differentiable map $\gamma \colon I \to M$, where $I$ is an open subset of $\mathbb{R}$ with its standard manifold structure. For $t \in I$, we define the velocity of $\gamma$ at $t$ as
\[
\dot\gamma(t) := d_t\gamma\, 1 \in T_{\gamma(t)}M,
\]
where $1$ is the vector $1$ in $\mathbb{R}$. Notice that for $M$ an open subset of $\mathbb{R}^n$ this coincides with the usual definition of velocity.

For $q \in M$, define $P_q$ as the space of differentiable curves $\gamma \colon I \to M$ such that $I \ni 0$ and $\gamma(0) = q$. It is easy to verify that the map $P_q \to T_qM$, $\gamma \mapsto \dot\gamma(0)$, is surjective, so we can think of $T_qM$ as the space of all possible velocities at $q$.

This observation together with Remark 29.1 yields a practical way of computing the tangent spaces of a submanifold of $\mathbb{R}^n$.
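The recipe above can be illustrated numerically. The following is a minimal sketch, assuming the unit sphere in $\mathbb{R}^3$ as the submanifold and the single constraint $\Phi(x) = |x|^2 - 1$; the curve and all function names are hypothetical choices made only for this illustration. It checks that the velocity of a curve lying on the sphere belongs to $\ker d_q\Phi$.

```python
import math

def constraint(x):
    """Phi(x) = |x|^2 - 1; the unit sphere is the zero set of Phi in R^3 (illustrative choice)."""
    return sum(c * c for c in x) - 1.0

def d_constraint(x, v):
    """Differential d_x Phi applied to the vector v, i.e. 2 <x, v>."""
    return 2.0 * sum(a * b for a, b in zip(x, v))

def curve(t):
    """A differentiable curve on the sphere with curve(0) = (1, 0, 0) (illustrative choice)."""
    return (math.cos(t) * math.cos(2 * t),
            math.sin(t) * math.cos(2 * t),
            math.sin(2 * t))

def velocity(gamma, t, h=1e-6):
    """Central finite-difference approximation of the velocity of gamma at t."""
    plus, minus = gamma(t + h), gamma(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

q = curve(0.0)
v = velocity(curve, 0.0)

# q lies on the sphere, and its velocity is annihilated by d_q Phi.
on_sphere = abs(constraint(q)) < 1e-12
in_kernel = abs(d_constraint(q, v)) < 1e-6
```

Here the velocity at $t = 0$ is approximately $(0, 1, 2)$, which is orthogonal to $q = (1, 0, 0)$, as expected for a tangent vector to the sphere.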

29.2. The tangent bundle. We can glue all the tangent spaces of an $n$-dimensional $C^k$-manifold $M$, $k \ge 1$, together:
\[
TM := \bigcup_{q \in M} T_qM.
\]

An element of $TM$ is usually denoted as a pair $(q, v)$ with $q \in M$ and $v \in T_qM$.¹⁴ We introduce the surjective map $\pi \colon TM \to M$, $(q, v) \mapsto q$. Notice that the fiber $T_qM$ can also be obtained as $\pi^{-1}(q)$.

$TM$ has the following structure of $C^{k-1}$-manifold. Let $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ be an atlas in the equivalence class defining $M$. We set $\widetilde U_\alpha := \pi^{-1}(U_\alpha)$ and
\[
\widetilde\phi_\alpha \colon \widetilde U_\alpha \to \mathbb{R}^n \times \mathbb{R}^n, \qquad (q, v) \mapsto (\phi_\alpha(q), \Phi_{q,U_\alpha} v),
\]
where $\Phi_{q,U_\alpha}$ is the isomorphism defined in (32). Notice that the chart maps are linear in the fibers. The transition maps are then readily computed as
\[
\widetilde\phi_{\alpha\beta}(x, w) = (\phi_{\alpha\beta}(x), d_x\phi_{\alpha\beta}\, w).
\]
Namely, they are the tangent lifts of the transition maps for $M$ and are clearly $C^{k-1}$.

Definition 29.4 (Tangent bundle). The tangent bundle of the $C^k$-manifold $M$, $k \ge 1$, is the $C^{k-1}$-manifold defined by the equivalence class of the above atlas.

Remark 29.3. Observe that another atlas on $M$ in the same $C^k$-equivalence class yields an atlas on $TM$ that is $C^{k-1}$-equivalent to the previous one.

Remark 29.4. Notice that $\pi \colon TM \to M$ is a surjective $C^{k-1}$-map and, if $k > 1$, a submersion.

Definition 29.5 (Tangent lift). If $M$ and $N$ are $C^k$-manifolds and $F \colon M \to N$ is a $C^k$-map, then the tangent lift $\widetilde F \colon TM \to TN$ is the $C^{k-1}$-map $(q, v) \mapsto (F(q), d_qF\, v)$.

Definition 29.6 (Vector field). A vector field on $M$ is a $C^{k-1}$-map $X \colon M \to TM$ such that $\pi \circ X = \mathrm{Id}_M$.

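In an atlas $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ on $M$ and the corresponding atlas $\{(\widetilde U_\alpha, \widetilde\phi_\alpha)\}_{\alpha \in I}$ on $TM$, a vector field $X$ is represented by a collection of $C^{k-1}$-maps $X_\alpha \colon \phi_\alpha(U_\alpha) \to \mathbb{R}^n$. All these maps are related by
\[
X_\beta(\phi_{\alpha\beta}(x)) = d_x\phi_{\alpha\beta}\, X_\alpha(x)
\]
for all $\alpha, \beta \in I$ and for all $x \in \phi_\alpha(U_\alpha \cap U_\beta)$. Notice that a collection of maps $X_\alpha$ satisfying all these relations defines a vector field, and this is how vector fields are often introduced.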
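As a small sanity check of this compatibility relation, here is a sketch under the following assumptions, none of which come from the notes: $M = (0, \infty)$ with the two charts $\phi_\alpha = \mathrm{Id}$ and $\phi_\beta = \log$, so that the transition map is $\phi_{\alpha\beta} = \log$, and the vector field with representation $X_\alpha(x) = x$. Its representation in the second chart should then be the constant $X_\beta = 1$.

```python
import math

def phi_ab(x):
    """Transition map phi_{alpha beta} = phi_beta o phi_alpha^{-1}; here simply log (assumed charts)."""
    return math.log(x)

def d_phi_ab(x, h=1e-6):
    """Derivative d_x phi_{alpha beta}, approximated by central finite differences."""
    return (phi_ab(x + h) - phi_ab(x - h)) / (2 * h)

def X_alpha(x):
    """Representation of the (illustrative) vector field in the chart alpha."""
    return x

def X_beta(y):
    """Candidate representation of the same vector field in the chart beta."""
    return 1.0

def compatibility_defect(x):
    """|X_beta(phi_ab(x)) - d_x phi_ab X_alpha(x)|, which should vanish up to numerical error."""
    return abs(X_beta(phi_ab(x)) - d_phi_ab(x) * X_alpha(x))

defects = [compatibility_defect(x) for x in (0.5, 1.0, 2.0, 10.0)]
```

All entries of `defects` are zero up to finite-difference error, so the pair $(X_\alpha, X_\beta)$ does define a vector field in the sense above.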

To a vector field $X$ we associate the ODE
\[
\dot q = X(q).
\]
A solution is a path $q \colon I \to M$ such that $\dot q(t) = X(q(t)) \in T_{q(t)}M$ for all $t \in I$.

¹⁴ Notice that we now denote by $v$ a tangent vector at $q$, i.e., an equivalence class of coordinatized tangent vectors at $q$, and no longer an element of $\mathbb{R}^n$.

Assume $k > 1$, so the vector field is continuously differentiable. Notice that the existence and uniqueness theorem as well as the theorem on dependence on the initial values extend immediately to the case of $C^k$-manifolds, as it is enough to have them in charts. (A solution that passes from one chart to another can be regarded as a pair of solutions, the end point of the first serving as the initial condition of the second.) To a vector field $X$ we may then associate its flow $\Phi_t$. We have that $\Phi_0 = \mathrm{Id}_M$ and $\Phi_{t+s} = \Phi_t \circ \Phi_s$, which also implies that all $\Phi_t$'s are diffeomorphisms. We recover $X(q)$ as $d_0\Phi(q)\, 1$, regarding $\Phi(q)$ as the map $I \to M$, $t \mapsto \Phi_t(q)$.
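The flow properties can be checked on a simple example. The sketch below assumes $M = \mathbb{R}^2$ and the vector field $X(q) = (-q^2, q^1)$, whose flow is known in closed form to be the rotation by the angle $t$; this example and the helper names are illustrative choices, not taken from the notes.

```python
import math

def X(q):
    """Illustrative vector field on R^2 generating rotations."""
    return (-q[1], q[0])

def flow(t, q):
    """Closed-form flow of X: rotation of q by the angle t."""
    c, s = math.cos(t), math.sin(t)
    return (c * q[0] - s * q[1], s * q[0] + c * q[1])

def dist(a, b):
    """Euclidean distance between two points of R^2."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

q0 = (1.0, 2.0)
t, s = 0.7, -1.3

# Phi_0 = Id
identity_defect = dist(flow(0.0, q0), q0)
# Phi_{t+s} = Phi_t o Phi_s
group_defect = dist(flow(t + s, q0), flow(t, flow(s, q0)))
# X(q) is recovered as the velocity of t -> Phi_t(q) at t = 0 (central finite difference)
h = 1e-6
derivative = tuple((a - b) / (2 * h) for a, b in zip(flow(h, q0), flow(-h, q0)))
generator_defect = dist(derivative, X(q0))
```

All three defects vanish up to rounding and finite-difference error.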

30. Vector bundles

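The tangent bundle introduced in the previous Section is an example of a more general structure known as a vector bundle.

Definition 30.1 (Vector bundle). A $C^k$-vector bundle of rank $r$ over a $C^k$-manifold $M$ of dimension $n$ is a $C^k$-manifold $E$ together with a surjection $\pi \colon E \to M$ such that:

(1) $E_q := \pi^{-1}(q)$ is a vector space for all $q \in M$.

(2) $E$ possesses a $C^k$-atlas of the form $\{(\widetilde U_\alpha, \widetilde\phi_\alpha)\}_{\alpha \in I}$ with $\widetilde U_\alpha = \pi^{-1}(U_\alpha)$ for a $C^k$-atlas $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ of $M$ and
\[
\widetilde\phi_\alpha \colon \widetilde U_\alpha \to \mathbb{R}^n \times \mathbb{R}^r, \qquad (q, v \in E_q) \mapsto (\phi_\alpha(q), A_\alpha(q) v),
\]
where $A_\alpha(q)$ is a linear isomorphism for all $q \in U_\alpha$.

(3) The maps $A_{\alpha\beta}(q) := A_\beta(q) A_\alpha(q)^{-1} \colon \mathbb{R}^r \to \mathbb{R}^r$ are $C^k$ in $q$ (i.e., $A_{\alpha\beta} \colon U_\alpha \cap U_\beta \to \mathrm{End}(\mathbb{R}^r)$ is a $C^k$-map, where we identify $\mathrm{End}(\mathbb{R}^r)$ with $\mathbb{R}^{r^2}$ with its standard manifold structure) for all $\alpha, \beta \in I$.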

Notice that $\pi$ is a $C^k$-map with respect to this manifold structure and that for $k > 0$ it is a submersion. Also notice that the atlas in the definition has transition functions
\[
\widetilde\phi_{\alpha\beta}(x, u) = (\phi_{\alpha\beta}(x), A_{\alpha\beta}(x) u)
\]
that are linear in the second factor $\mathbb{R}^r$.

Definition 30.2 (Section). A section of a $C^k$-vector bundle $E \xrightarrow{\pi} M$ is a $C^k$-map $\sigma \colon M \to E$ with $\pi \circ \sigma = \mathrm{Id}_M$.

Example 30.1. It is readily verified that the tangent bundle $TM$ of a $C^k$-manifold $M$ with $k \ge 1$ is a $C^{k-1}$-vector bundle where we regard the base manifold $M$ as a $C^{k-1}$-manifold. A section of $TM$ is then the same as a vector field on $M$.

If one picks an atlas as in Definition 30.1, then a section of $E$ is the same as a collection of $C^k$-maps¹⁵ $\sigma_\alpha \colon \phi_\alpha(U_\alpha) \to \mathbb{R}^r$ such that
\[
\sigma_\beta(\phi_{\alpha\beta}(x)) = A_{\alpha\beta}(x)\, \sigma_\alpha(x) \tag{33}
\]
for all $\alpha, \beta \in I$ and for all $x \in \phi_\alpha(U_\alpha \cap U_\beta)$.

¹⁵ These maps are also called local sections.


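30.1. Constructions on vector bundles. Another important consideration is that all constructions in linear algebra extend from vector spaces to vector bundles. We only consider two examples. We fix $E$ and $M$ as in Definition 30.1.

Example 30.2 (The dual bundle). Let $E^* := \bigcup_{q \in M} E_q^*$. We denote an element of $E^*$ as a pair $(q, \omega)$ with $\omega \in E_q^*$. We let $\pi_{E^*}(q, \omega) = q$. To the atlas $\{(\widetilde U_\alpha, \widetilde\phi_\alpha)\}_{\alpha \in I}$ of $E$ we associate the atlas $\{(\widehat U_\alpha, \widehat\phi_\alpha)\}_{\alpha \in I}$ of $E^*$ with $\widehat U_\alpha = \pi_{E^*}^{-1}(U_\alpha) = \bigcup_{q \in U_\alpha} E_q^*$ and
\[
\widehat\phi_\alpha \colon \widehat U_\alpha \to \mathbb{R}^n \times (\mathbb{R}^r)^*, \qquad (q, \omega \in E_q^*) \mapsto (\phi_\alpha(q), (A_\alpha(q)^*)^{-1} \omega),
\]
where we regard $(\mathbb{R}^r)^*$ as the manifold $\mathbb{R}^r$ with its standard structure. It follows that we have transition maps
\[
\widehat\phi_{\alpha\beta}(x, u) = (\phi_{\alpha\beta}(x), (A_{\alpha\beta}(x)^*)^{-1} u).
\]

In the case $E = TM$, the dual bundle is denoted by $T^*M$ and is called the cotangent bundle of $M$.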

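Example 30.3 (Exterior power). Let $\wedge^m E := \bigcup_{q \in M} \wedge^m E_q$. We denote an element of $\wedge^m E$ as a pair $(q, \omega)$ with $\omega \in \wedge^m E_q$. We let $\pi_{\wedge^m E}(q, \omega) = q$. To the atlas $\{(\widetilde U_\alpha, \widetilde\phi_\alpha)\}_{\alpha \in I}$ of $E$ we associate the atlas $\{(\widehat U_\alpha, \widehat\phi_\alpha)\}_{\alpha \in I}$ of $\wedge^m E$ with $\widehat U_\alpha = \pi_{\wedge^m E}^{-1}(U_\alpha) = \bigcup_{q \in U_\alpha} \wedge^m E_q$ and
\[
\widehat\phi_\alpha \colon \widehat U_\alpha \to \mathbb{R}^n \times \wedge^m \mathbb{R}^r, \qquad (q, \omega \in \wedge^m E_q) \mapsto (\phi_\alpha(q), \wedge^m A_\alpha(q)\, \omega),
\]
where we regard $\wedge^m \mathbb{R}^r$ as the manifold $\mathbb{R}^{\binom{r}{m}}$ with its standard structure. It follows that we have transition maps
\[
\widehat\phi_{\alpha\beta}(x, u) = (\phi_{\alpha\beta}(x), \wedge^m A_{\alpha\beta}(x)\, u).
\]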

One further construction is given in the following

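Example 30.4 (Pullback bundle). Let $F \colon N \to M$ be a $C^k$-map and $E \xrightarrow{\pi} M$ a $C^k$-vector bundle. One defines $F^*E := \{(q, e) \in N \times E \mid F(q) = \pi(e)\}$. One can readily see that $F^*E$ is a $C^k$-vector bundle over $N$ with projection map $\pi_{F^*E}(q, e) = q$. In practice, the fiber of $F^*E$ at $q$ is given by the fiber of $E$ at $F(q)$, and the fiber transition maps of $F^*E$ at $q$ are given as the fiber transition maps of $E$ at $F(q)$. More precisely, we pick an atlas $\{(V_j, \psi_j)\}_{j \in J}$ of $N$. To the atlas in Definition 30.1, we then associate a new atlas $\{(V_{\alpha j}, \psi_{\alpha j})\}_{(\alpha, j) \in I \times J}$ of $N$ with $V_{\alpha j} := F^{-1}(U_\alpha) \cap V_j$ and $\psi_{\alpha j} := \psi_j|_{V_{\alpha j}}$. The atlas of $F^*E$ is then given by $\widetilde V_{\alpha j} = \pi_{F^*E}^{-1}(V_{\alpha j}) = \bigcup_{q \in V_{\alpha j}} E_{F(q)}$ and
\[
\widetilde\psi_{\alpha j} \colon \widetilde V_{\alpha j} \to \mathbb{R}^s \times \mathbb{R}^r, \qquad (q, v \in E_{F(q)}) \mapsto (\psi_{\alpha j}(q), A_\alpha(F(q))\, v),
\]
where $s$ is the dimension of $N$. It follows that we have transition maps
\[
\widetilde\psi_{(\alpha j)(\beta j')}(x, u) = (\psi_{(\alpha j)(\beta j')}(x), A_{\alpha\beta}(F(x))\, u).
\]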

30.2. Differential forms. For simplicity we are now going to consider only smooth manifolds.

Definition 30.3 (Differential form). An $m$-form on a smooth manifold $M$ is a section of $\wedge^m T^*M$. We denote by $\Omega^m(M)$ the $C^\infty(M)$-module of $m$-forms and set $\Omega^\bullet(M) = \bigoplus_m \Omega^m(M)$. An element of $\Omega^\bullet(M)$ is called a differential form on $M$.


Notice that if $\sigma$ is an $m$-form, then (33) now reads
\[
(\wedge^m d_x\phi_{\alpha\beta})^*\, \sigma_\beta(\phi_{\alpha\beta}(x)) = \sigma_\alpha(x),
\]
so we have the

Proposition 30.1. If $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ is an atlas for $M$, then an $m$-form $\sigma$ on $M$ is the same as a collection of $m$-forms $\sigma_\alpha$, defined on $\phi_\alpha(U_\alpha)$, such that
\[
\sigma_\alpha = \phi_{\alpha\beta}^*\, \sigma_\beta \tag{34}
\]
for all $\alpha, \beta \in I$.

Recall that the wedge product and the differential of differential forms on open subsets of $\mathbb{R}^n$ are compatible with pullbacks. As a consequence they can be extended to manifolds. Also notice that if $\phi$ is a diffeomorphism of open subsets of Cartesian powers of $\mathbb{R}$, then we have that $\phi^*(\iota_{\phi_*X}\sigma) = \iota_X\phi^*\sigma$ for all vector fields $X$ and differential forms $\sigma$, so the whole Cartan calculus extends to manifolds.¹⁶

31. Applications to mechanics

Again for simplicity we only consider smooth manifolds.

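31.1. The Noether 1-form. A smooth Lagrangian on $M$ is by definition a smooth function $L$ on $TM$. If $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ is an atlas for $M$, then $L_\alpha := L \circ \widetilde\phi_\alpha^{-1}$ is a smooth function on $\widetilde\phi_\alpha(\widetilde U_\alpha)$ for all $\alpha \in I$. The Noether 1-form on $\widetilde\phi_\alpha(\widetilde U_\alpha)$ is
\[
\theta_{L_\alpha} := \sum_{i=1}^n \frac{\partial L_\alpha}{\partial v^i_\alpha}\, dq^i_\alpha \in \Omega^1(\widetilde\phi_\alpha(\widetilde U_\alpha)),
\]
where $(q^1_\alpha, \dots, q^n_\alpha, v^1_\alpha, \dots, v^n_\alpha)$ are coordinates on $\widetilde\phi_\alpha(\widetilde U_\alpha) = \phi_\alpha(U_\alpha) \times \mathbb{R}^n$.

Proposition 31.1. The collection of 1-forms $\theta_{L_\alpha}$ defines a 1-form $\theta_L$ on $TM$ called the Noether 1-form for $L$.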

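Proof. We have to verify (34), where now the transition maps are those of the tangent bundle; viz., we have to verify that
\[
\theta_{L_\alpha} = \widetilde\phi_{\alpha\beta}^*\, \theta_{L_\beta} \tag{35}
\]
for all $\alpha, \beta \in I$. Explicitly, the maps $\widetilde\phi_{\alpha\beta}$ relate the coordinates $(q_\beta, v_\beta) = \widetilde\phi_{\alpha\beta}(q_\alpha, v_\alpha)$ by
\begin{align*}
q^i_\beta &= \phi^i_{\alpha\beta}(q_\alpha), \\
v^i_\beta &= \sum_{j=1}^n \frac{\partial \phi^i_{\alpha\beta}}{\partial q^j_\alpha}(q_\alpha)\, v^j_\alpha.
\end{align*}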

¹⁶ Recall that $\phi_*X(x) = d_{\phi^{-1}(x)}\phi\; X(\phi^{-1}(x))$.


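Since $L_\alpha(q_\alpha, v_\alpha) = L_\beta(q_\beta, v_\beta)$ by definition, we have
\[
\frac{\partial L_\alpha}{\partial v^i_\alpha}
= \sum_{j=1}^n \left( \frac{\partial q^j_\beta}{\partial v^i_\alpha} \frac{\partial L_\beta}{\partial q^j_\beta} + \frac{\partial v^j_\beta}{\partial v^i_\alpha} \frac{\partial L_\beta}{\partial v^j_\beta} \right)
= \sum_{j=1}^n \frac{\partial \phi^j_{\alpha\beta}}{\partial q^i_\alpha} \frac{\partial L_\beta}{\partial v^j_\beta}, \tag{36}
\]
where the first term in the bracket vanishes because $q_\beta$ does not depend on $v_\alpha$. On the other hand,
\[
\sum_{i=1}^n \frac{\partial \phi^j_{\alpha\beta}}{\partial q^i_\alpha}\, dq^i_\alpha = dq^j_\beta. \tag{37}
\]
Hence (35) is proved.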

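31.2. The Legendre mapping. Let $L$ be a Lagrangian on $M$ and $L_\alpha := L \circ \widetilde\phi_\alpha^{-1}$ its representation in the chart $(\widetilde U_\alpha, \widetilde\phi_\alpha)$ as above. We define
\[
p^\alpha_i := \frac{\partial L_\alpha}{\partial v^i_\alpha}
\]
as a function of the coordinates $(q_\alpha, v_\alpha)$. Equation (36) in the proof of Proposition 31.1 now reads
\[
p^\alpha_i = \sum_{j=1}^n \frac{\partial \phi^j_{\alpha\beta}}{\partial q^i_\alpha}(q_\alpha)\, p^\beta_j,
\]
i.e., $p^\beta = ((d_{q_\alpha}\phi_{\alpha\beta})^*)^{-1} p^\alpha$. This shows that $(q_\beta, p^\beta) = \widehat\phi_{\alpha\beta}(q_\alpha, p^\alpha)$, where $\widehat\phi_{\alpha\beta}$ are the transition maps for the cotangent bundle. This implies that the maps
\[
\psi_{L_\alpha} \colon \widetilde\phi_\alpha(\widetilde U_\alpha) = \phi_\alpha(U_\alpha) \times \mathbb{R}^n \to \widehat\phi_\alpha(\widehat U_\alpha) = \phi_\alpha(U_\alpha) \times \mathbb{R}^n, \qquad (q_\alpha, v_\alpha) \mapsto (q_\alpha, p^\alpha(q_\alpha, v_\alpha)),
\]
are the representation of a map
\[
\psi_L \colon TM \to T^*M \tag{38}
\]
called the Legendre mapping. This also shows that Hamiltonian mechanics naturally takes place on the cotangent bundle.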
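The transformation rule for the momenta can be tested numerically. The sketch below rests on assumptions that are not in the notes: the chart $\alpha$ is given by polar coordinates $(r, \vartheta)$ on the punctured plane, the chart $\beta$ by Cartesian coordinates, and $L_\beta = \tfrac12 |v_\beta|^2$ is the Lagrangian of a free particle of unit mass. All function names are hypothetical.

```python
import math

def phi(q):
    """Transition map phi_{alpha beta}: polar (r, theta) -> Cartesian (x, y) (assumed charts)."""
    r, th = q
    return (r * math.cos(th), r * math.sin(th))

def jacobian(q):
    """J[j][i] = d phi^j / d q^i for the polar-to-Cartesian map."""
    r, th = q
    return ((math.cos(th), -r * math.sin(th)),
            (math.sin(th), r * math.cos(th)))

def tangent_lift(q, v):
    """(q_alpha, v_alpha) -> (q_beta, v_beta) = (phi(q), d_q phi v)."""
    J = jacobian(q)
    return phi(q), tuple(sum(J[j][i] * v[i] for i in range(2)) for j in range(2))

def L_beta(q, v):
    """Free particle of unit mass in Cartesian coordinates (illustrative Lagrangian)."""
    return 0.5 * (v[0] ** 2 + v[1] ** 2)

def L_alpha(q, v):
    """The same Lagrangian represented in the polar chart."""
    return L_beta(*tangent_lift(q, v))

def momenta(L, q, v, h=1e-6):
    """p_i = dL/dv^i by central finite differences."""
    p = []
    for i in range(len(v)):
        vp, vm = list(v), list(v)
        vp[i] += h
        vm[i] -= h
        p.append((L(q, tuple(vp)) - L(q, tuple(vm))) / (2 * h))
    return tuple(p)

q_a, v_a = (2.0, 0.3), (0.5, -1.2)
q_b, v_b = tangent_lift(q_a, v_a)
p_a = momenta(L_alpha, q_a, v_a)
p_b = momenta(L_beta, q_b, v_b)

# p^alpha_i = sum_j (d phi^j / d q^i) p^beta_j
J = jacobian(q_a)
p_a_from_b = tuple(sum(J[j][i] * p_b[j] for j in range(2)) for i in range(2))

# The pairing sum_i p_i v^i is chart independent.
pairing_a = sum(p * v for p, v in zip(p_a, v_a))
pairing_b = sum(p * v for p, v in zip(p_b, v_b))
```

In this example `p_a` is approximately $(\dot r, r^2 \dot\vartheta) = (0.5, -4.8)$, and it agrees with `p_a_from_b`, i.e. the momenta transform with the transposed Jacobian as stated above.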

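31.3. The Liouville 1-form. Let $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$ be an atlas for $M$ and let $\{(\widehat U_\alpha, \widehat\phi_\alpha)\}_{\alpha \in I}$ be the corresponding atlas for $T^*M$. One defines
\[
\theta_\alpha := \sum_{i=1}^n p^\alpha_i\, dq^i_\alpha \in \Omega^1(\widehat\phi_\alpha(\widehat U_\alpha)),
\]
where $(q^1_\alpha, \dots, q^n_\alpha, p^\alpha_1, \dots, p^\alpha_n)$ are the coordinates on $\widehat\phi_\alpha(\widehat U_\alpha) = \phi_\alpha(U_\alpha) \times \mathbb{R}^n$.

Proposition 31.2. The collection of 1-forms $\theta_\alpha$ defines a 1-form $\theta$ on $T^*M$ called the Liouville 1-form (a.k.a. the Poincaré 1-form or the tautological 1-form).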

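Proof. We have to verify (34), where now the transition maps are those of the cotangent bundle; viz., we have to verify that
\[
\theta_\alpha = \widehat\phi_{\alpha\beta}^*\, \theta_\beta \tag{39}
\]
for all $\alpha, \beta \in I$. Explicitly, the maps $\widehat\phi_{\alpha\beta}$ relate the coordinates $(q_\beta, p^\beta) = \widehat\phi_{\alpha\beta}(q_\alpha, p^\alpha)$ by
\begin{align*}
q^i_\beta &= \phi^i_{\alpha\beta}(q_\alpha), \\
p^\alpha_i &= \sum_{j=1}^n \frac{\partial \phi^j_{\alpha\beta}}{\partial q^i_\alpha}(q_\alpha)\, p^\beta_j.
\end{align*}
By (37), we immediately get (39).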

By direct inspection of the formulae we also get the

Proposition 31.3. Let $L$ be a Lagrangian. Then $\theta_L = \psi_L^*\, \theta$, where $\psi_L$ denotes the Legendre mapping of equation (38).

Remark 31.1. There is also a coordinate-independent definition of $\theta$. Namely, denote by $(q, p)$, $q \in M$ and $p \in (T_qM)^*$, the points in $T^*M$ and let $\pi \colon T^*M \to M$, $\pi(q, p) = q$, be the projection map. For $v \in T_{(q,p)}T^*M$, define $\theta_{(q,p)} v := p(d_{(q,p)}\pi\, v)$.

31.4. Symplectic geometry. Symplectic geometry arises in the general formulation of the Hamiltonian systems encountered in mechanics.

Definition 31.1 (Symplectic form). A symplectic form on a manifold $N$ is a closed, nondegenerate 2-form on $N$. A symplectic manifold is a pair $(N, \omega)$ where $N$ is a manifold and $\omega$ is a symplectic form on $N$.

Namely, $\omega \in \Omega^2(N)$, $d\omega = 0$, and for each $q \in N$ the bilinear form $\omega_q$ on $T_qN$ is nondegenerate (equivalently, the linear map $\omega^\sharp_q \colon T_qN \to T^*_qN$, $(\omega^\sharp_q v) w := \omega_q(v, w)$, for $v, w \in T_qN$, is an isomorphism).

Example 31.1. Let $N$ be an open subset of $\mathbb{R}^{2n}$ with coordinates $q^1, \dots, q^n, p_1, \dots, p_n$. Then
\[
\omega = \sum_{i=1}^n dp_i \wedge dq^i \tag{40}
\]
is a symplectic form on $N$.

Example 31.2. Let $N = T^*M$ and $\omega = d\theta$, where $\theta$ is the Liouville 1-form on $T^*M$. Then $(N, \omega)$ is a symplectic manifold. Nondegeneracy is verified since in local charts $\omega$ is written as in (40).

Remark 31.2. Darboux’s Theorem, which we do not prove here, asserts that every symplectic manifold possesses an atlas such that the symplectic form in each chart is as in (40).

From now on, let (N,ω) be a symplectic manifold.

Definition 31.2 (Hamiltonian vector field). The Hamiltonian vector field $X_H$ of a function $H$ on $N$ is the unique vector field satisfying $\iota_{X_H}\omega = -dH$. A vector field is called Hamiltonian if it is the Hamiltonian vector field of a function (which is then defined up to a locally constant function).
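In the coordinates of Example 31.1 the defining equation can be solved by hand: writing $X = (X_q, X_p)$ and using the convention $(\iota_X\omega)(Y) = \omega(X, Y)$, one finds $\omega(X, Y) = \sum_i (X_{p_i} Y_{q^i} - X_{q^i} Y_{p_i})$, so $\iota_{X_H}\omega = -dH$ gives $X_H = (\partial H/\partial p, -\partial H/\partial q)$. The following sketch checks this numerically for a pendulum Hamiltonian; the Hamiltonian, the test vector and the function names are illustrative assumptions.

```python
import math

def grad(H, z, h=1e-6):
    """Gradient of H at z = (q^1..q^n, p_1..p_n) by central finite differences."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((H(zp) - H(zm)) / (2 * h))
    return g

def omega(X, Y):
    """omega = sum_i dp_i ^ dq^i evaluated on two vectors written as (q-part, p-part)."""
    n = len(X) // 2
    return sum(X[n + i] * Y[i] - X[i] * Y[n + i] for i in range(n))

def hamiltonian_vector_field(H, z):
    """X_H = (dH/dp, -dH/dq), the solution of iota_{X_H} omega = -dH in these coordinates."""
    n = len(z) // 2
    dH = grad(H, z)
    return [dH[n + i] for i in range(n)] + [-dH[i] for i in range(n)]

def H_pendulum(z):
    """Illustrative Hamiltonian: mathematical pendulum with unit constants."""
    q, p = z
    return 0.5 * p * p - math.cos(q)

z = [0.4, -0.9]
XH = hamiltonian_vector_field(H_pendulum, z)
dH = grad(H_pendulum, z)

# Check omega(X_H, Y) = -dH(Y) on an arbitrary test vector Y.
Y = [1.7, 0.25]
defect = abs(omega(XH, Y) + sum(a * b for a, b in zip(dH, Y)))
```

At the chosen point, `XH` is approximately $(p, -\sin q) = (-0.9, -\sin 0.4)$, which reproduces Hamilton's equations $\dot q = \partial H/\partial p$, $\dot p = -\partial H/\partial q$.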


If $H$ and $F$ are two functions on $N$, then one can easily see that
\[
X_H(F) = -X_F(H). \tag{41}
\]
Indeed, $X_H(F) = dF(X_H) = -\omega(X_F, X_H) = \omega(X_H, X_F) = -dH(X_F) = -X_F(H)$.

This has two important consequences. The first is Noether’s theorem. We need first the

Definition 31.3 (Hamiltonian system). A Hamiltonian system is a pair $((N, \omega), H)$ where $(N, \omega)$ is a symplectic manifold and $H$ is a function on $N$. A constant of motion for the Hamiltonian system $((N, \omega), H)$ is a function that is constant on the orbits of $X_H$. An infinitesimal symmetry for the Hamiltonian system $((N, \omega), H)$ is a Hamiltonian vector field $X$ on $N$ such that $X(H) = 0$.

Theorem 31.4 (Noether’s Theorem). A Hamiltonian vector field is an infinitesimal symmetry for the Hamiltonian system $((N, \omega), H)$ iff any of its Hamiltonian functions is a constant of motion.

Proof. Let $F$ be a Hamiltonian function for the vector field at hand, which we denote by $X_F$. Being a symmetry means $X_F(H) = 0$. On the other hand, $F$ is a constant of motion iff $X_H(F) = 0$. The Theorem then follows from (41).

The second consequence of (41) is that the bracket
\[
\{H, F\} := X_H(F),
\]
called the Poisson bracket on $(N, \omega)$, is skew-symmetric. One can also show that it satisfies the Jacobi identity. This immediately implies that the Poisson bracket of two constants of motion for the same Hamiltonian function is again a constant of motion.
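With the conventions above and $\omega$ as in (40), the bracket reads $\{H, F\} = \sum_i \left( \frac{\partial H}{\partial p_i} \frac{\partial F}{\partial q^i} - \frac{\partial H}{\partial q^i} \frac{\partial F}{\partial p_i} \right)$ in coordinates. The sketch below evaluates it by finite differences; the isotropic oscillator, the angular momentum and the third function are illustrative choices, and the helper names are hypothetical. It checks skew-symmetry and, in the spirit of Noether's Theorem, that the angular momentum is a constant of motion.

```python
def grad(H, z, h=1e-6):
    """Gradient of H at z = (q^1..q^n, p_1..p_n) by central finite differences."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((H(zp) - H(zm)) / (2 * h))
    return g

def poisson(H, F, z):
    """{H, F} = X_H(F) = sum_i (dH/dp_i dF/dq^i - dH/dq^i dF/dp_i)."""
    n = len(z) // 2
    dH, dF = grad(H, z), grad(F, z)
    return sum(dH[n + i] * dF[i] - dH[i] * dF[n + i] for i in range(n))

def H_osc(z):
    """Isotropic harmonic oscillator in two dimensions (illustrative Hamiltonian)."""
    q1, q2, p1, p2 = z
    return 0.5 * (p1 * p1 + p2 * p2 + q1 * q1 + q2 * q2)

def ang_mom(z):
    """Angular momentum q^1 p_2 - q^2 p_1, the Hamiltonian function of rotations."""
    q1, q2, p1, p2 = z
    return q1 * p2 - q2 * p1

def F_other(z):
    """A function that is not conserved by the oscillator."""
    q1, q2, p1, p2 = z
    return q1 * p1

z = [0.3, -1.1, 0.8, 0.5]
bracket_H_J = poisson(H_osc, ang_mom, z)   # vanishes: J is a constant of motion
bracket_H_F = poisson(H_osc, F_other, z)   # p_1^2 - (q^1)^2 at z
bracket_F_H = poisson(F_other, H_osc, z)   # minus the previous value
```

Since $\{H, J\} = 0$, the rotation vector field $X_J$ is an infinitesimal symmetry of the oscillator and $J$ is conserved, while $\{H, F\} = -\{F, H\}$ illustrates (41).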

Appendix A. Topology

We recall a few facts about topology.

Definition A.1 (Topology). A topology on a set $S$ is a collection $\mathcal{O}(S)$ of subsets of $S$ such that

(1) $\emptyset, S \in \mathcal{O}(S)$;
(2) $\forall U, V \in \mathcal{O}(S)$ we have $U \cap V \in \mathcal{O}(S)$;
(3) if $(U_\alpha)_{\alpha \in I}$ is a family indexed by $I$ with $U_\alpha \in \mathcal{O}(S)$ $\forall \alpha \in I$, we have $\bigcup_{\alpha \in I} U_\alpha \in \mathcal{O}(S)$.

A set with a topology is called a topological space.

Example A.1. The collection of the usual open subsets of $\mathbb{R}^n$ forms a topology on $\mathbb{R}^n$, called its standard topology.

In general, elements of a topology are called open sets and elements of a topological space are called points. A neighborhood of a point is an open set containing it.


Definition A.2 (Continuous map). A map $F \colon S \to T$ between topological spaces $(S, \mathcal{O}(S))$ and $(T, \mathcal{O}(T))$ is called continuous if $F^{-1}(U) \in \mathcal{O}(S)$ $\forall U \in \mathcal{O}(T)$. A continuous invertible map whose inverse is also continuous is called a homeomorphism. A map that maps open sets to open sets (i.e., in the above notation, $F(U) \in \mathcal{O}(T)$ $\forall U \in \mathcal{O}(S)$) is called open.
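On a finite set both definitions can be checked mechanically. The sketch below is only an illustration: the sets, the maps and the function names are made up, and for finite collections closure under pairwise unions is enough to guarantee closure under arbitrary unions.

```python
from itertools import combinations

def is_topology(S, opens):
    """Check the axioms of Definition A.1 for a finite collection of subsets of a finite set S."""
    opens = set(opens)
    if frozenset() not in opens or frozenset(S) not in opens:
        return False
    for U, V in combinations(opens, 2):
        # Pairwise intersections and unions suffice for a finite collection.
        if U & V not in opens or U | V not in opens:
            return False
    return True

def is_continuous(F, S, opens_S, opens_T):
    """F is a dict S -> T; continuity means that preimages of open sets are open."""
    return all(frozenset(x for x in S if F[x] in U) in opens_S for U in opens_T)

S = {0, 1, 2}
O_S = {frozenset(), frozenset({0}), frozenset({0, 1}), frozenset(S)}
T = {"a", "b"}
O_T = {frozenset(), frozenset({"a"}), frozenset(T)}

F = {0: "a", 1: "a", 2: "b"}   # preimage of {"a"} is {0, 1}, which is open
G = {0: "b", 1: "a", 2: "a"}   # preimage of {"a"} is {1, 2}, which is not open

# A collection that fails axiom (3): {0} and {1} are present but their union is not.
not_a_topology = {frozenset(), frozenset({0}), frozenset({1}), frozenset(S)}

checks = (is_topology(S, O_S), is_topology(T, O_T), is_topology(S, not_a_topology),
          is_continuous(F, S, O_S, O_T), is_continuous(G, S, O_S, O_T))
```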

Topologies may often be induced. We consider two examples here.

Example A.2. Let $(S, \mathcal{O}(S))$ be a topological space and let $T$ be a subset of $S$. Then
\[
\mathcal{O}_S(T) := \{U \subset T \mid \exists V \in \mathcal{O}(S) : U = V \cap T\}
\]
is a topology on $T$. With this topology the inclusion map $\iota \colon T \to S$ is continuous.

This is in particular the topology one usually considers on subsets of $\mathbb{R}^n$ with its standard topology.

Example A.3. Let $(S, \mathcal{O}(S))$ be a topological space and $\pi \colon S \to T$ be a surjective map. Then
\[
\mathcal{O}_{S,\pi}(T) := \{U \subset T \mid \pi^{-1}(U) \in \mathcal{O}(S)\}
\]
is a topology on $T$. With this topology $\pi$ is continuous.

Notice in particular that such a $\pi$ arises when we have an equivalence relation on $S$ and define $T$ as the set of equivalence classes.

Remark A.1. Unless stated otherwise, when we speak of $\mathbb{R}^n$, we tacitly assume the standard topology; when speaking of a subset of a topological space or a quotient of a topological space, we tacitly assume the induced topology.

Appendix B. Derivations

In this Appendix we make a digression, which can be omitted with no consequences by the hasty reader, on the interpretation of tangent vectors as directions along which one can differentiate functions. This idea leads, in the case of smooth manifolds, to a definition of the tangent space where the linear structure is intrinsic and does not require choosing charts, even though only at an intermediate stage. The construction is also more algebraic in nature.

The characterizing algebraic property of a derivative is the Leibniz rule for differentiating products. From the topological viewpoint, derivatives are characterized by the fact that, being defined as limits, they only see an arbitrarily small neighborhood of the point where we differentiate. The latter remark then suggests considering functions “up to a change of the definition domain,” a viewpoint that turns out to be quite useful.

Let $M$ be a $C^k$-manifold, $k \ge 0$. For $q \in M$ we denote by $C^k_q(M)$ the set of $C^k$-functions defined in a neighborhood of $q$ in $M$. Notice that by pointwise addition and multiplication of functions (on the intersection of their definition domains), $C^k_q(M)$ is a commutative algebra.


Definition B.1 (Germ). We define two functions in $C^k_q(M)$ to be equivalent if they coincide in a neighborhood of $q$.ᵃ An equivalence class is called a germ of $C^k$-functions at $q$. We denote by $C^k_qM$ the set of germs at $q$ with the inherited algebra structure.

ᵃ More pedantically, $f \sim g$ if there is a neighborhood $U$ of $q$ in $M$ contained in the definition domains of $f$ and $g$ such that $f|_U = g|_U$.

Notice that two equivalent functions have the same value at $q$. This defines an algebra morphism, called the evaluation at $q$:
\[
\mathrm{ev}_q \colon C^k_qM \to \mathbb{R}.
\]

We are now ready for the

Definition B.2 (Derivation at a point). A derivation at $q$ in $M$ is a linear map $D \colon C^k_qM \to \mathbb{R}$ satisfying the Leibniz rule
\[
D(fg) = Df\; \mathrm{ev}_q g + \mathrm{ev}_q f\; Dg
\]
for all $f, g \in C^k_qM$. Notice that a linear combination of derivations at $q$ is also a derivation at $q$. We denote by $\mathrm{Der}^k_qM$ the vector space of derivations at $q$ in $M$.

Remark B.1. Notice that if $U$ is a neighborhood of $q$, regarded as a $C^k$-manifold, a germ at $q \in U$ is the same as a germ at $q \in M$. So we have $C^k_qU = C^k_qM$. As a consequence we have
\[
\mathrm{Der}^k_qU = \mathrm{Der}^k_qM
\]
for every neighborhood $U$ of $q$ in $M$.

The first algebraic remark is the following

Lemma B.1. A derivation vanishes on germs of constant functions (the germ of a constant function at $q$ is an equivalence class containing a function that is constant in a neighborhood of $q$).

Proof. Let $D$ be a derivation at $q$. First consider the germ $1$ (the equivalence class containing a function that is equal to $1$ in a neighborhood of $q$). From $1 \cdot 1 = 1$, it follows that
\[
D1 = D1 \cdot 1 + 1 \cdot D1 = 2\, D1,
\]
so $D1 = 0$. Then observe that, if $f$ is the germ of a constant function, then $f = k\, 1$, where $k$ is the evaluation of $f$ at $q$. Hence, by linearity, we have $Df = k\, D1 = 0$.

Remark B.2. Notice that all the above extends to a more general context: one may define derivations on any algebra with a character (an algebra morphism to the ground field). The above Lemma holds in the case of algebras with a unit.

Let now $F \colon M \to N$ be a $C^k$-morphism. Then we have an algebra morphism $F^* \colon C^k_{F(q)}(N) \to C^k_q(M)$, $f \mapsto f \circ F|_{F^{-1}(V)}$, where $V$ is the definition domain of $f$. This clearly descends to germs, so we have an algebra morphism
\[
F^* \colon C^k_{F(q)}N \to C^k_qM,
\]

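which in turn induces a linear map of derivations
\[
\mathrm{der}^k_q F \colon \mathrm{Der}^k_qM \to \mathrm{Der}^k_{F(q)}N, \qquad D \mapsto D \circ F^*.
\]

It then follows immediately that, if $G \colon N \to Z$ is also a $C^k$-morphism, then
\[
\mathrm{der}^k_q(G \circ F) = \mathrm{der}^k_{F(q)}G\; \mathrm{der}^k_qF.
\]

This in particular implies that, if $F$ is a $C^k$-isomorphism, then $\mathrm{der}^k_qF$ is a linear isomorphism.

Let $(U, \phi_U)$ be a chart containing $q$. We then have an isomorphism $\mathrm{der}^k_q\phi_U \colon \mathrm{Der}^k_qU \to \mathrm{Der}^k_{\phi_U(q)}\phi_U(U)$. As in Remark B.1, we have $\mathrm{Der}^k_qU = \mathrm{Der}^k_qM$ and $\mathrm{Der}^k_{\phi_U(q)}\phi_U(U) = \mathrm{Der}^k_{\phi_U(q)}\mathbb{R}^n$. Hence we have an isomorphism
\[
\mathrm{der}^k_q\phi_U \colon \mathrm{Der}^k_qM \xrightarrow{\ \sim\ } \mathrm{Der}^k_{\phi_U(q)}\mathbb{R}^n
\]
for each chart $(U, \phi_U)$ containing $q$. It remains for us to understand derivations at a point of $\mathbb{R}^n$: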

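Lemma B.2. For every $y \in \mathbb{R}^n$, the linear map
\[
A_y \colon \mathrm{Der}^k_y\mathbb{R}^n \to \mathbb{R}^n, \qquad D \mapsto \begin{pmatrix} Dx^1 \\ \vdots \\ Dx^n \end{pmatrix}
\]
is surjective for $k \ge 1$ and an isomorphism for $k = \infty$ (here $x^1, \dots, x^n$ denote the germs of the coordinate functions on $\mathbb{R}^n$).

Proof. For $k \ge 1$ we may also define the linear map
\[
B_y \colon \mathbb{R}^n \to \mathrm{Der}^k_y\mathbb{R}^n, \qquad v = \begin{pmatrix} v^1 \\ \vdots \\ v^n \end{pmatrix} \mapsto D_v
\]
with
\[
D_v[f] = \sum_{i=1}^n v^i \frac{\partial f}{\partial x^i}(y),
\]
where $f$ is a representative of $[f]$. Notice that $A_yB_y = \mathrm{Id}$, which implies that $A_y$ is surjective.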

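It remains to show that, for $k = \infty$, we also have $B_yA_y = \mathrm{Id}$. Let $f$ be a representative of $[f] \in C^\infty_y\mathbb{R}^n$. As a function of $x$, $f$ may be Taylor-expanded around $y$ as
\[
f(x) = f(y) + \sum_{i=1}^n (x^i - y^i) \frac{\partial f}{\partial x^i}(y) + R_2(x),
\]
where the remainder can be written as
\[
R_2(x) = \sum_{i,j=1}^n (x^i - y^i)(x^j - y^j) \int_0^1 (1 - t) \frac{\partial^2 f}{\partial x^i \partial x^j}(y + t(x - y))\, dt.
\]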

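(To prove this formula just integrate by parts.) Define
\[
\sigma_i(x) := \frac{\partial f}{\partial x^i}(y) + \sum_{j=1}^n (x^j - y^j) \int_0^1 (1 - t) \frac{\partial^2 f}{\partial x^i \partial x^j}(y + t(x - y))\, dt,
\]
so we can write
\[
f(x) = f(y) + \sum_{i=1}^n (x^i - y^i)\, \sigma_i(x).
\]
Observe that, for all $i$, both $x^i - y^i$ and $\sigma_i$ are $C^\infty$-functions;¹⁷ the first vanishes at $x = y$, whereas for the second we have
\[
\sigma_i(y) = \frac{\partial f}{\partial x^i}(y).
\]
For a derivation $D \in \mathrm{Der}^\infty_y\mathbb{R}^n$, we then have, also using Lemma B.1,
\[
D[f] = \sum_{i=1}^n Dx^i\, \frac{\partial f}{\partial x^i}(y) = B_yA_y(D)[f],
\]
which completes the proof.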
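The decomposition $f(x) = f(y) + \sum_i (x^i - y^i)\sigma_i(x)$ used in the proof can be checked numerically in one variable. The sketch below assumes $f = \sin$ and $y = 0.3$ and evaluates the integral in the definition of $\sigma$ with a composite Simpson rule; the helper names are hypothetical.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def sigma(d1f, d2f, y, x):
    """sigma(x) = f'(y) + (x - y) * int_0^1 (1 - t) f''(y + t (x - y)) dt (one-variable case)."""
    integral = simpson(lambda t: (1 - t) * d2f(y + t * (x - y)), 0.0, 1.0)
    return d1f(y) + (x - y) * integral

f = math.sin                      # illustrative choice of a smooth function

def d1f(u):
    """First derivative of f."""
    return math.cos(u)

def d2f(u):
    """Second derivative of f."""
    return -math.sin(u)

y = 0.3
xs = (-1.0, 0.3, 0.9, 2.5)

# f(x) = f(y) + (x - y) sigma(x) should hold for every x.
defects = [abs(f(x) - (f(y) + (x - y) * sigma(d1f, d2f, y, x))) for x in xs]
# sigma(y) = f'(y)
sigma_at_y_defect = abs(sigma(d1f, d2f, y, y) - d1f(y))
```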

From now on, we simply write $\mathrm{Der}_q$ and $\mathrm{der}_q$ instead of $\mathrm{Der}^\infty_q$ and $\mathrm{der}^\infty_q$.

Corollary 2. For every $q$ in a smooth manifold $M$, we have
\[
\dim \mathrm{Der}_qM = \dim M.
\]

We finally want to compare the construction in terms of derivations with the one in terms of equivalence classes of coordinatized tangent vectors.

Theorem B.3. Let $M$ be a smooth manifold, $q \in M$, and $(U, \phi_U)$ a chart containing $q$. Then the isomorphism
\[
\tau_{q,U} := (\mathrm{der}_q\phi_U)^{-1}\, A^{-1}_{\phi_U(q)}\, \Phi_{q,U} \colon T_qM \to \mathrm{Der}_qM
\]
does not depend on the choice of chart. We will denote this canonical isomorphism simply by $\tau_q$.

If $F \colon M \to N$ is a smooth map, we have $d_qF = \tau^{-1}_{F(q)}\, \mathrm{der}_qF\; \tau_q$.

Proof. Explicitly we have
\[
(\tau_{q,U}[(U, \phi_U, v)])[f] = \sum_{i=1}^n v^i \frac{\partial (f \circ \phi_U^{-1})}{\partial x^i}(\phi_U(q)),
\]

¹⁷ Here it is crucial to work with $k = \infty$. For $k \ge 2$ finite, in general $\sigma_i$ is only $C^{k-2}$, and for $k = 1$ it is not even defined.


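for every representative $f$ of $[f] \in C^\infty_qM$. We then have, by the chain rule,
\begin{align*}
(\tau_{q,V}[(U, \phi_U, v)])[f] &= (\tau_{q,V}[(V, \phi_V, d_{\phi_U(q)}\phi_{U,V}\, v)])[f] \\
&= \sum_{i,j=1}^n \frac{\partial \phi^i_{U,V}}{\partial x^j}(\phi_U(q))\, v^j\, \frac{\partial (f \circ \phi_V^{-1})}{\partial x^i}(\phi_V(q)) \\
&= \sum_{i=1}^n v^i \frac{\partial (f \circ \phi_U^{-1})}{\partial x^i}(\phi_U(q)) = (\tau_{q,U}[(U, \phi_U, v)])[f].
\end{align*}

The last statement of the Theorem also easily follows from the chain rule in differentiating $f \circ F$, $f \in [f] \in C^\infty_{F(q)}N$.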

B.1. Vector fields as derivations. We now want to show that vector fields on a manifold are the same as derivations on its algebra of functions.

Definition B.3 (Derivation on a smooth manifold). A derivation on a smooth manifold $M$ is an $\mathbb{R}$-linear map $D \colon C^\infty(M) \to C^\infty(M)$ such that
\[
D(fg) = Df\; g + f\; Dg.
\]
Notice that a linear combination of derivations is also a derivation. We denote by $\mathrm{Der}(M)$ the $C^\infty(M)$-module of derivations on $C^\infty(M)$.

We first want to connect derivations with derivations at a point $q$. Notice that, for every $C^k$-manifold, $k \ge 0$, we have a linear map $\gamma_q \colon C^k(M) \to C^k_qM$ that associates to a function its germ at $q$.

Lemma B.4. For every $q \in M$, $\gamma_q$ is surjective.

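Proof. Let $[f] \in C^k_qM$. Pick an atlas $\{(U_\alpha, \phi_\alpha)\}_{\alpha \in I}$, and let $\alpha$ be an index such that $U_\alpha \ni q$. Let $\tilde f$ be a representative of $(\phi_\alpha^{-1})^*[f] \in C^k_{\phi_\alpha(q)}\phi_\alpha(U_\alpha)$. Choose $\psi \in C^k(\phi_\alpha(U_\alpha))$ with the following properties:

(1) $\psi|_V = 1$, where $V$ is an open ball containing $\phi_\alpha(q)$ and contained in $\phi_\alpha(U_\alpha)$.
(2) $\psi|_{\phi_\alpha(U_\alpha) \setminus W} = 0$, where $W$ is an open ball containing $V$ and contained in $\phi_\alpha(U_\alpha)$.

Let $\hat f := \tilde f \psi$. We clearly have $[\hat f] = (\phi_\alpha^{-1})^*[f]$. Let $f_\alpha := \hat f \circ \phi_\alpha$. We then have $[f_\alpha] = [f]$. Finally, we define a function $\bar f$ on $M$ by $\bar f(x) = f_\alpha(x)$ for $x \in U_\alpha$ and $\bar f(x) = 0$ for $x \in M \setminus U_\alpha$. We claim that $\bar f \in C^k(M)$.

For $\beta \in I$, define $W_\beta := \phi_{\alpha\beta}(W) \subset \phi_\beta(U_\beta)$. We clearly have that $\bar f \circ \phi_\beta^{-1}$ coincides with $\hat f \circ \phi_{\beta\alpha}$ on $\phi_\beta(U_\alpha \cap U_\beta)$ and vanishes in the complement of $W_\beta$ in $\phi_\beta(U_\beta)$. This shows that $\bar f \circ \phi_\beta^{-1}$ is $C^k$ for all $\beta$, so $\bar f \in C^k(M)$. Finally, observe that $\gamma_q(\bar f) = [f]$.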

Theorem B.5. If $M$ is a smooth manifold, we have a canonical $C^\infty(M)$-linear isomorphism
\[
\tau \colon \mathfrak{X}(M) \to \mathrm{Der}(C^\infty(M)),
\]
where $\mathfrak{X}(M)$ is the $C^\infty(M)$-module of vector fields on $M$.

Proof. If $X$ is a vector field and $f$ is a function, we define $((\tau(X))f)(q) := (\tau_qX(q))\, \gamma_qf$. It is readily verified that $\tau(X)$ is a derivation. It is also clear that $\tau$ is $C^\infty(M)$-linear and injective. We only have to show that it is surjective.

If $D$ is a derivation and $[f] \in C^\infty_qM$, we define $D_q[f] := (Df)(q)$ for any $f \in \gamma_q^{-1}([f])$. This is readily seen to be independent of the choice of $f$ and to be a derivation at $q$. We then define $X(q) := \tau_q^{-1}(D_q)$, which is readily seen to depend smoothly on $q$. Hence we have found an inverse map for $\tau$.

Appendix C. The Levi-Civita Symbol

C.1. Definition. The Levi-Civita symbol $\varepsilon_{i_1 i_2 \dots i_n}$, also called the totally antisymmetric tensor or the epsilon-tensor, is a symbol that is useful for vector and tensor calculus in physics. We will only look at the three-dimensional epsilon-tensor $\varepsilon_{ijk}$. Many laws in physics, especially in classical mechanics and electrodynamics, contain vector products. The common definition is
\[
\vec a \times \vec b = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \times \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}.
\]

Identities like the Grassmann identity
\[
\vec a \times (\vec b \times \vec c) = (\vec a \cdot \vec c)\, \vec b - (\vec a \cdot \vec b)\, \vec c
\]
can be computed componentwise, but there exists a much more systematic and elegant way to do this, namely with the Levi-Civita symbol.

Definition C.1 (Levi-Civita Symbol). The Levi-Civita symbol $\varepsilon_{ijk}$ for the indices $i, j, k \in \{1, 2, 3\}$ is given by
\[
\varepsilon_{ijk} = \begin{cases} 1 & \text{if } (ijk) \text{ is an even permutation of } (123), \\ -1 & \text{if } (ijk) \text{ is an odd permutation of } (123), \\ 0 & \text{otherwise.} \end{cases}
\]
The last case occurs exactly when at least two of the indices coincide.

C.2. Relation to the vector product. The most important application of the Levi-Civita symbol is for the vector product. We can compute the $k$-th component of the vector product as
\[
(\vec a \times \vec b)_k = \sum_{i,j=1}^3 \varepsilon_{ijk}\, a_i b_j.
\]
One can easily check that this is true.
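The check can also be delegated to a few lines of code. In the sketch below the function names and the test vectors are illustrative; the closed formula used for $\varepsilon_{ijk}$ is one of several equivalent ways of encoding Definition C.1 for indices in $\{1, 2, 3\}$.

```python
def epsilon(i, j, k):
    """Levi-Civita symbol for indices in {1, 2, 3}: +1, -1 or 0 as in Definition C.1."""
    return (i - j) * (j - k) * (k - i) // 2

def cross_eps(a, b):
    """k-th component of a x b computed as sum_{i,j} eps_{ijk} a_i b_j."""
    idx = (1, 2, 3)
    return tuple(sum(epsilon(i, j, k) * a[i - 1] * b[j - 1] for i in idx for j in idx)
                 for k in idx)

def cross_explicit(a, b):
    """Component formula for the vector product from Section C.1."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

a, b = (1, 2, 3), (4, 5, 6)
via_epsilon = cross_eps(a, b)
via_components = cross_explicit(a, b)
```

Both computations give the same vector, here $(-3, 6, -3)$.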


C.3. Einstein notation. The Einstein notation is a convention for writing sums compactly: the summation sign is omitted, and one sums over every index that appears twice in a term. We can look at it by the example of the vector product:
\[
(\vec a \times \vec b)_k = \sum_{i,j=1}^3 \varepsilon_{ijk}\, a_i b_j = \varepsilon_{ijk}\, a_i b_j.
\]

As another example we can take a look at the usual Euclidean scalar product on $\mathbb{R}^n$:
\[
\vec a \cdot \vec b = \sum_{i=1}^n a_i b_i = \delta_{ij}\, a_i b_j.
\]

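C.4. Identities for products of Levi-Civita symbols. The product of two Levi-Civita symbols can be expressed in terms of Kronecker symbols. There are four different cases.

Proposition C.1 (Identities).

• The two Levi-Civita symbols do not have a common index:
\[
\varepsilon_{ijk}\varepsilon_{lmn} = \begin{vmatrix} \delta_{il} & \delta_{im} & \delta_{in} \\ \delta_{jl} & \delta_{jm} & \delta_{jn} \\ \delta_{kl} & \delta_{km} & \delta_{kn} \end{vmatrix}
= \delta_{il}\delta_{jm}\delta_{kn} + \delta_{im}\delta_{jn}\delta_{kl} + \delta_{in}\delta_{jl}\delta_{km} - \delta_{im}\delta_{jl}\delta_{kn} - \delta_{il}\delta_{jn}\delta_{km} - \delta_{in}\delta_{jm}\delta_{kl}.
\]

• The two Levi-Civita symbols have one common index:
\[
\varepsilon_{ijk}\varepsilon_{imn} = \begin{vmatrix} \delta_{jm} & \delta_{jn} \\ \delta_{km} & \delta_{kn} \end{vmatrix} = \delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km}.
\]

• The two Levi-Civita symbols have two common indices:
\[
\varepsilon_{ijk}\varepsilon_{ijn} = 2\delta_{kn}.
\]

• The two Levi-Civita symbols have the same indices:
\[
\varepsilon_{ijk}\varepsilon_{ijk} = 3! = 6.
\]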
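Since all indices range over $\{1, 2, 3\}$, the four identities can be verified by brute force, with the Einstein summation over repeated indices written out explicitly. The following sketch does this; the function names are illustrative.

```python
from itertools import product

def epsilon(i, j, k):
    """Levi-Civita symbol for indices in {1, 2, 3}."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    """Kronecker symbol."""
    return 1 if i == j else 0

idx = (1, 2, 3)

# No common index: eps_ijk eps_lmn equals the 3x3 determinant of Kronecker symbols.
no_common = all(
    epsilon(i, j, k) * epsilon(l, m, n)
    == (delta(i, l) * delta(j, m) * delta(k, n) + delta(i, m) * delta(j, n) * delta(k, l)
        + delta(i, n) * delta(j, l) * delta(k, m) - delta(i, m) * delta(j, l) * delta(k, n)
        - delta(i, l) * delta(j, n) * delta(k, m) - delta(i, n) * delta(j, m) * delta(k, l))
    for i, j, k, l, m, n in product(idx, repeat=6)
)

# One common index (summed over i).
one_common = all(
    sum(epsilon(i, j, k) * epsilon(i, m, n) for i in idx)
    == delta(j, m) * delta(k, n) - delta(j, n) * delta(k, m)
    for j, k, m, n in product(idx, repeat=4)
)

# Two common indices (summed over i and j).
two_common = all(
    sum(epsilon(i, j, k) * epsilon(i, j, n) for i in idx for j in idx) == 2 * delta(k, n)
    for k, n in product(idx, repeat=2)
)

# All indices common (summed over i, j and k).
full_contraction = sum(epsilon(i, j, k) ** 2 for i, j, k in product(idx, repeat=3))
```

The one-common-index identity is the one that yields the Grassmann identity of Section C.1 when contracted with the components of three vectors.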



Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich
E-mail address, N. Moshayedi: [email protected], [email protected]
