Classical Theoretical Physics
Alexander Altland


Contents

1 Electrodynamics
  1.1 Magnetic field energy
  1.2 Electromagnetic gauge field
    1.2.1 Covariant formulation
  1.3 Fourier transformation
    1.3.1 Fourier transform (finite space)
    1.3.2 Fourier transform (infinite space)
    1.3.3 Fourier transform and solution of linear differential equations
    1.3.4 Algebraic interpretation of Fourier transform
  1.4 Electromagnetic waves in vacuum
    1.4.1 Solution of the homogeneous wave equations
    1.4.2 Polarization
  1.5 Green function of electrodynamics
    1.5.1 Physical meaning of the Green function
    1.5.2 Electromagnetic gauge field and Green functions
  1.6 Field energy and momentum
    1.6.1 Field energy
    1.6.2 Field momentum *
    1.6.3 Energy and momentum of plane electromagnetic waves
  1.7 Electromagnetic radiation

2 Macroscopic electrodynamics
  2.1 Macroscopic Maxwell equations
    2.1.1 Averaged sources
  2.2 Applications of macroscopic electrodynamics
    2.2.1 Electrostatics in the presence of matter *
    2.2.2 Magnetostatics in the presence of matter
  2.3 Wave propagation in media
    2.3.1 Model dielectric function
    2.3.2 Plane waves in matter
    2.3.3 Dispersion
    2.3.4 Electric conductivity

3 Relativistic invariance
  3.1 Introduction
    3.1.1 Galilei invariance and its limitations
    3.1.2 Einstein's postulates
    3.1.3 First consequences
  3.2 The mathematics of special relativity I: Lorentz group
    3.2.1 Background
  3.3 Aspects of relativistic dynamics
    3.3.1 Proper time
    3.3.2 Relativistic mechanics
  3.4 Mathematics of special relativity II: Co- and contravariance
    3.4.1 Covariant and contravariant vectors
    3.4.2 The relativistic covariance of electrodynamics

4 Lagrangian mechanics
  4.1 Variational principles
    4.1.1 Definitions
    4.1.2 Euler-Lagrange equations
    4.1.3 Coordinate invariance of Euler-Lagrange equations
  4.2 Lagrangian mechanics
    4.2.1 The idea
    4.2.2 Hamilton's principle
    4.2.3 Lagrange mechanics and symmetries
    4.2.4 Noether theorem
    4.2.5 Examples

5 Hamiltonian mechanics
  5.1 Hamiltonian mechanics
    5.1.1 Legendre transform
    5.1.2 Hamiltonian function
  5.2 Phase space
    5.2.1 Phase space and structure of the Hamilton equations
    5.2.2 Hamiltonian flow
    5.2.3 Phase space portraits
    5.2.4 Poisson brackets
    5.2.5 Variational principle
  5.3 Canonical transformations
    5.3.1 When is a coordinate transformation "canonical"?
    5.3.2 Canonical transformations: why?


Chapter 1

Electrodynamics

1.1 Magnetic field energy

As a prelude to our discussion of the full set of Maxwell equations, let us address a question which, in principle, should have been answered in the previous chapter: What is the energy stored in a static magnetic field? In section ??, the analogous question for the electric field was answered in a constructive manner: we computed the mechanical energy required to build up a system of charges. It turned out that the answer could be formulated entirely in terms of the electric field, without explicit reference to the charge distribution creating it. By symmetry, one might expect a similar prescription to work in the magnetic case. Here, one would ask for the energy needed to build up a current distribution against the magnetic field created by those elements of the current distribution that have already been put in place. A moment's thought, however, shows that this strategy is not quite as straightforwardly implemented as in the electric case: no matter how slowly we move a 'current loop' in a magnetic field, an electric field acting on the charge carriers maintaining the current in the loop will be induced – the induction law. Work will have to be done to maintain a constant current, and it is this work function that essentially enters the energy balance of the current distribution.

[Figure: a current loop carrying a current I in a magnetic field B, built up from small current loops.]

To make this picture quantitative, consider a current loop carrying a current I. We may think of this loop as being consecutively built up by importing small current loops (carrying current I) from infinity (see the figure). The currents flowing along adjacent segments of these loops will eventually cancel, so that only the net current I flowing around the boundary remains. Let us, then, compute the work that needs to be done to bring in one of these loops from infinity.

Consider, first, an ordinary point particle kept at a (mechanical) potential U. The rate at which this potential changes if the particle changes its position is d_t U = d_t U(x(t)) = ∇U · d_t x = −F · v, where F is the force acting on the particle. Specifically, for the charged particles moving in our prototypical current loops, F = qE, where E is the electric field induced by variation of the magnetic flux through the loop as it approaches from infinity.

Next consider an infinitesimal volume element d³x inside the loop. Assuming that the charge carriers move at a velocity v, the charge of the volume element is given by q = ρ d³x and the rate of its potential change by d_t φ = −ρ d³x v · E = −d³x j · E. We may now use the induction law to express the electric field in terms of magnetic quantities:

∮_{∂S} ds · E = −c⁻¹ ∫_S dσ n · d_t B = −c⁻¹ ∫_S dσ n · d_t(∇×A) = −c⁻¹ ∮_{∂S} ds · d_t A.

Since this holds regardless of the geometry of the loop, we have E = −c⁻¹ d_t A, where d_t A is the change in vector potential due to the movement of the loop. Thus, the rate at which the potential of a volume element changes is given by d_t φ = c⁻¹ d³x j · d_t A. Integrating over time and space, we find that the total potential energy of the loop due to the presence of a vector potential is given by

E = (1/c) ∫ d³x j · A.    (1.1)

Although derived for the specific case of a current loop, Eq. (1.1) holds for general current distributions subject to a magnetic field. (For example, for the current density carried by a point particle at x(t), j = qδ(x − x(t)) v(t), we obtain E = (q/c) v(t) · A(x(t)), i.e. the familiar Lorentz-force contribution to the Lagrangian of a charged particle.)

Now, assume that we shift the loop at fixed current against the magnetic field. The change in potential energy corresponding to a small shift is given by δE = c⁻¹ ∫ d³x j · δA, where ∇ × δA = δB denotes the change in the field strength. Using that ∇ × H = 4πc⁻¹ j, we represent δE as

δE = (1/4π) ∫ d³x (∇×H) · δA = (1/4π) ∫ d³x ε_{ijk} (∂_j H_k) δA_i = −(1/4π) ∫ d³x H_k ε_{ijk} ∂_j δA_i = (1/4π) ∫ d³x H · δB,

where in the integration by parts we noted that, due to the spatial decay of the fields, no surface terms at infinity arise. Due to the linear relation H = μ₀⁻¹ B, we may write H · δB = δ(H · B)/2, i.e. δE = (1/8π) δ∫ d³x H · B. Finally, summing over all shifts required to bring the current loop in from infinity, we obtain

E = (1/8π) ∫ d³x H · B    (1.2)

for the magnetic field energy. Notice (a) that we have again managed to express the energy of the system entirely in terms of the fields, i.e. without explicit reference to the sources creating these fields, and (b) the structural similarity to the electric field energy (??). Finally, (c) the above construction shows that a separation of electromagnetism into static and dynamic phenomena is not quite as unambiguous as one might have expected: the derivation of the magnetostatic field energy relied on the dynamic law of induction. We next proceed to embed the general static theory – as described by the concept of electric and magnetic potentials – into the larger framework of electrodynamics.


Thus, from now on, we take up the theory of the full set of Maxwell equations:

∇ · D = 4πρ,    (1.3)

∇ × H − (1/c) ∂_t D = (4π/c) j,    (1.4)

∇ × E + (1/c) ∂_t B = 0,    (1.5)

∇ · B = 0.    (1.6)

1.2 Electromagnetic gauge field

Consider the full set of Maxwell equations in vacuum (E = D and B = H),

∇ · E = 4πρ,

∇ × B − (1/c) ∂_t E = (4π/c) j,

∇ × E + (1/c) ∂_t B = 0,

∇ · B = 0.    (1.7)

As in previous sections, we will try to use constraints inherent to these equations to compactify them to a smaller set of equations. As in section ??, the equation ∇ · B = 0 implies

B = ∇ × A.    (1.8)

(However, we may no longer expect A to be time independent.) Now, substitute this representation into the law of induction: ∇ × (E + c⁻¹ ∂_t A) = 0. This implies that E + c⁻¹ ∂_t A = −∇φ can be written as the gradient of a scalar potential, or

E = −∇φ − (1/c) ∂_t A.    (1.9)

We have, thus, managed to represent the electromagnetic fields in terms of derivatives of a generalized four-component potential A = (φ, A). (The negative sign multiplying A has been introduced for later reference.) Substituting Eqs. (1.8) and (1.9) into the inhomogeneous Maxwell equations, we obtain

−Δφ − (1/c) ∂_t ∇ · A = 4πρ,

−ΔA + (1/c²) ∂_t² A + ∇(∇ · A) + (1/c) ∂_t ∇φ = (4π/c) j.    (1.10)

These equations do not look particularly inviting. However, as in section ?? we may observe that the choice of the generalized vector potential A is not unique; this freedom can be used to transform Eq. (1.10) into a more manageable form. Consider an arbitrary function f : ℝ³ × ℝ → ℝ, (x, t) ↦ f(x, t). The transformation A → A + ∇f leaves the magnetic field unchanged, while the electric field changes according to E → E − c⁻¹ ∂_t ∇f. If, however, we synchronously redefine the scalar potential as φ → φ − c⁻¹ ∂_t f, the electric field, too, will not be affected by the transformation. Summarizing, the generalized gauge transformation

A → A + ∇f,
φ → φ − (1/c) ∂_t f,    (1.11)

leaves the electromagnetic fields (1.8) and (1.9) unchanged.

The gauge freedom may be used to transform the vector potential into one of several convenient representations. Of particular relevance to the solution of the time dependent Maxwell equations is the so-called Lorentz gauge

∇ · A + (1/c) ∂_t φ = 0.    (1.12)

Importantly, it is always possible to satisfy this condition by a suitable gauge transformation. Indeed, if (φ′, A′) does not obey the Lorentz condition, i.e. ∇ · A′ + (1/c) ∂_t φ′ ≠ 0, we may define a transformed potential (φ, A) through φ = φ′ − c⁻¹ ∂_t f and A = A′ + ∇f. We now require that the newly defined potential obey (1.12). This is equivalent to the condition

(Δ − (1/c²) ∂_t²) f = −(∇ · A′ + (1/c) ∂_t φ′),

i.e. a wave equation with inhomogeneity −(∇ · A′ + c⁻¹ ∂_t φ′). We will see in a moment that such equations can always be solved, i.e.

we may always choose a potential (φ, A) so as to obey the Lorentz gauge condition.

In the Lorentz gauge, the Maxwell equations assume the simplified form

(Δ − (1/c²) ∂_t²) φ = −4πρ,

(Δ − (1/c²) ∂_t²) A = −(4π/c) j.    (1.13)

In combination with the gauge condition (1.12), Eqs. (1.13) are fully equivalent to the set of Maxwell equations (1.7).
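The gauge invariance claimed in (1.11) is easy to check symbolically. The following sketch (using the sympy computer algebra library; the potentials φ, A and the gauge function f are arbitrary placeholder functions) verifies that the fields (1.8) and (1.9) are unchanged under the transformation.

```python
import sympy as sp

# Symbolic check that the gauge transformation (1.11) leaves the
# fields B = curl A (1.8) and E = -grad phi - (1/c) dA/dt (1.9) invariant.
t, x, y, z, c = sp.symbols('t x y z c', positive=True)
coords = (x, y, z)

# Arbitrary smooth trial potentials and gauge function (placeholders).
phi = sp.Function('phi')(x, y, z, t)
A = sp.Matrix([sp.Function(f'A{i}')(x, y, z, t) for i in range(3)])
f = sp.Function('f')(x, y, z, t)

def grad(s):
    return sp.Matrix([sp.diff(s, q) for q in coords])

def curl(V):
    return sp.Matrix([
        sp.diff(V[2], y) - sp.diff(V[1], z),
        sp.diff(V[0], z) - sp.diff(V[2], x),
        sp.diff(V[1], x) - sp.diff(V[0], y),
    ])

def fields(phi_, A_):
    E = -grad(phi_) - sp.diff(A_, t) / c   # Eq. (1.9)
    B = curl(A_)                           # Eq. (1.8)
    return E, B

# Gauge-transformed potentials, Eq. (1.11)
phi2 = phi - sp.diff(f, t) / c
A2 = A + grad(f)

E1, B1 = fields(phi, A)
E2, B2 = fields(phi2, A2)

assert sp.simplify(E1 - E2) == sp.zeros(3, 1)
assert sp.simplify(B1 - B2) == sp.zeros(3, 1)
```

The check works for any f because mixed partial derivatives commute and ∇ × ∇f = 0, which is exactly the content of the argument above.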

1.2.1 Covariant formulation

The formulation introduced above is mathematically efficient; however, it has an unsatisfactory touch to it: the gauge transformation (1.11) couples the "zeroth component" φ with the "vectorial" components A. While this indicates that the four components (φ, A) should form an entity, the equations above treat φ and A separately. In the following we show that there exists a much more concise and elegant covariant formulation of electrodynamics that transcends this separation. At this point, we introduce the covariant notation merely as a mnemonic aid. Later in the text, we will understand the physical principles behind the new formulation.

Definitions

Consider a vector x ∈ ℝ⁴. We represent this object in terms of four components {x^μ}, μ = 0, 1, 2, 3, where x^0 will be called the time-like component and x^i, i = 1, 2, 3, the space-like components.¹ Four-component objects of this structure will be called contravariant vectors. To a contravariant vector x^μ, we associate a covariant vector x_μ (notice the positioning of the indices as superscripts and subscripts, resp.) through

x_0 = x^0,  x_i = −x^i.    (1.14)

A co- and a contravariant vector x_μ and x′^μ can be "contracted" to produce a number:²

x^μ x′_μ ≡ x^0 x′^0 − x^i x′^i.

Here, we are employing the Einstein summation convention, according to which pairs of indices are summed over, x^μ x′_μ ≡ Σ_{μ=0}^{3} x^μ x′_μ, etc. Notice that

x^μ x′_μ = x_μ x′^μ.

We next introduce a number of examples of co- and contravariant vectors that will be of importance below:

. Consider a point in space time (t, r) defined by a time argument and three real space components r^i, i = 1, 2, 3. We store this information in a so-called contravariant four-component vector x^μ, μ = 0, 1, 2, 3, as x^0 = ct, x^i = r^i, i = 1, 2, 3. Occasionally, we write x ≡ {x^μ} = (ct, x) for short. The associated covariant vector has components {x_μ} = (ct, −r).

. Derivatives taken w.r.t. the components of the contravariant 4-vector are denoted by ∂_μ ≡ ∂/∂x^μ = (c⁻¹∂_t, ∂_r). Notice that this is a covariant object. Its contravariant partner is given by {∂^μ} = (c⁻¹∂_t, −∂_r).

. The 4-current j = {j^μ} is defined through j ≡ (cρ, j), where ρ and j are the charge density and the vectorial current density, resp.

. The 4-vector potential A ≡ {A^μ} is defined through A = (φ, A).
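As a small numerical illustration of these index conventions, the sketch below (NumPy; the example four-vectors are arbitrary choices) lowers an index with the metric diag(1, −1, −1, −1) implicit in (1.14) and evaluates the contraction x^μ x′_μ.

```python
import numpy as np

# Index lowering and contraction with the metric g = diag(1, -1, -1, -1),
# cf. Eq. (1.14) and the definition of the contraction x^mu y_mu.
g = np.diag([1.0, -1.0, -1.0, -1.0])

def lower(x_up):
    """Covariant components x_mu = g_{mu nu} x^nu."""
    return g @ x_up

def contract(x_up, y_up):
    """Invariant contraction x^mu y_mu = x^0 y^0 - x . y."""
    return x_up @ lower(y_up)

x = np.array([2.0, 1.0, 0.5, -1.0])   # {x^mu} = (ct, x), arbitrary example
y = np.array([3.0, -0.5, 2.0, 1.0])

# x^mu y_mu = x_mu y^mu: it does not matter which index is lowered
assert np.isclose(contract(x, y), lower(x) @ y)
# explicit check against x^0 y^0 - x . y
assert np.isclose(contract(x, y), x[0]*y[0] - x[1:] @ y[1:])
```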

We next engage these definitions to introduce the

¹ Following a widespread convention, we denote four-component indices μ, ν, · · · = 0, 1, 2, 3 by Greek letters (from the middle of the alphabet). "Space-like" indices i, j, · · · = 1, 2, 3 are denoted by (mid-alphabet) Latin letters.

² Although one may formally consider sums x^μ x′^μ, objects of this type do not make sense in the present formalism.


Covariant representation of electrodynamics

Above, we saw that the freedom to represent the potentials (φ, A) in different gauges may be used to obtain convenient formulations of electrodynamics. Within the covariant formalism, the general gauge transformation (1.11) assumes the form

A^μ → A^μ − ∂^μ f.    (1.15)

(Exercise: convince yourself that this is equivalent to (1.11).) Specifically, this freedom may be used to realize the Lorentz gauge,

∂_μ A^μ = 0.    (1.16)

Once more, we verify that any 4-potential may be transformed to one that obeys the Lorentz condition: for if A′^μ obeys ∂_μ A′^μ ≡ λ, where λ is some function, we may define the transformed potential A^μ ≡ A′^μ − ∂^μ f. Imposing on A^μ the Lorentz condition,

0 = ∂_μ A^μ = ∂_μ(A′^μ − ∂^μ f) = λ − ∂_μ ∂^μ f,

we obtain the equivalent condition □f = λ. Here, we have introduced the d'Alembert operator

□ ≡ ∂_μ ∂^μ = (1/c²) ∂_t² − Δ.    (1.17)

(In a sense, this is a four-dimensional generalization of the Laplace operator, hence the four-cornered square.) This is the covariant formulation of the condition discussed above. A gauge transformation A^μ → A^μ + ∂^μ f of a potential A^μ in the Lorentz gauge class ∂_μ A^μ = 0 is compatible with the Lorentz condition iff □f = 0. In other words, the class of Lorentz gauge vector potentials still contains a huge gauge freedom.

The general representation of the Maxwell equations through potentials, (1.10), affords the 4-representation

□A^μ − ∂^μ(∂_ν A^ν) = (4π/c) j^μ.

We may act on this equation with ∂_μ to obtain the constraint

∂_μ j^μ = 0,    (1.18)

which we identify as the covariant formulation of charge conservation, ∂_μ j^μ = ∂_t ρ + ∇ · j = 0: the Maxwell equations imply current conservation. Finally, imposing the Lorentz condition, ∂_μ A^μ = 0, we obtain the simplified representation

∂_ν ∂^ν A^μ = (4π/c) j^μ,   ∂_μ A^μ = 0.    (1.19)

Before turning to the discussion of the solution of these equations, a few general remarks are in order:


. The Lorentz condition is the prevalent gauge choice in electrodynamics because (a) it brings the Maxwell equations into a maximally simple form and (b) it will turn out below to be invariant under the most general class of coordinate transformations, the Lorentz transformations (see below). (It is worthwhile to note that the Lorentz condition does not unambiguously fix the gauge of the vector potential. Indeed, we may modify any Lorentz gauge vector potential A^μ as A^μ → A^μ + ∂^μ f, where f satisfies the homogeneous wave equation ∂_μ ∂^μ f = 0. This doesn't alter the condition ∂_μ A^μ = 0.) Other gauge conditions frequently employed include

. the Coulomb gauge or radiation gauge ∇ · A = 0 (employed earlier in section ??). The advantage of this gauge is that the scalar Maxwell equation assumes the simple form of a Poisson equation, Δφ(x, t) = −4πρ(x, t), which is solved by φ(x, t) = ∫ d³x′ ρ(x′, t)/|x − x′|, i.e. by an instantaneous 'Coulomb potential' (hence the name Coulomb gauge). This gauge representation has also proven advantageous within the context of quantum electrodynamics. However, we won't discuss it any further in this text.

. The passage from the physical fields E, B to potentials A^μ as fundamental variables implies a convenient reduction of variables. Working with fields, we have 2 × 3 vectorial components which, however, are not independent: equations such as ∇ · B = 0 represent constraints. This should be compared to the four components of the potential A^μ. The gauge freedom implies that of the four components only three are independent (think about this point). In general, no further reduction in the number of variables is possible. To see this, notice that the electromagnetic fields are determined by the "sources" of the theory, i.e. the 4-current j^μ. The continuity equation ∂_μ j^μ = 0 reduces the number of independent sources down to 3. We certainly need three independent field components to describe the field patterns created by the three independent sources. (The electromagnetic field in a source-free environment j^μ = 0 is described in terms of only two independent components. This can be seen, e.g., by going to the Coulomb gauge. The above equation for the time-like component of the potential is then uniquely solved by φ = 0. Thus, only the two independent components of A, constrained by ∇ · A = 0, remain.)

1.3 Fourier transformation

In this – purely mathematical – section, we introduce the concept of Fourier transformation. We start with a formal definition of the Fourier transform as an abstract integral transformation. Later in the section, we will discuss the application of Fourier transformation in the solution of linear differential equations, and its conceptual interpretation as a basis change in function space.


1.3.1 Fourier transform (finite space)

Consider a function f : [0, L] → ℂ on an interval [0, L] ⊂ ℝ. Conceptually, the Fourier transformation is a representation of f as a superposition (a sum) of plane waves:

f(x) = (1/L) Σ_k e^{ikx} f_k,    (1.20)

where the sum runs over all values k = 2πn/L, n ∈ ℤ. Series representations of this type are summarily called Fourier series representations. At this point, we haven't proven that the function f can actually be represented by a series of this type. However, if it exists, the coefficients f_k can be computed with the help of the identity (prove it)

(1/L) ∫₀^L dx e^{−ikx} e^{ik′x} = δ_{kk′},    (1.21)

where k = 2πn/L, k′ = 2πn′/L and δ_{kk′} ≡ δ_{nn′}. We just need to multiply Eq. (1.20) by the function e^{−ikx} and integrate over x to obtain

∫₀^L dx e^{−ikx} f(x) = (1/L) ∫₀^L dx e^{−ikx} Σ_{k′} e^{ik′x} f_{k′} = Σ_{k′} f_{k′} (1/L) ∫₀^L dx e^{−ikx} e^{ik′x} = f_k,

where the first equality used (1.20) and the last used (1.21). We call the numbers f_k ∈ ℂ the Fourier coefficients of the function f. The relation

f_k = ∫₀^L dx e^{−ikx} f(x),    (1.22)

connecting a function with its Fourier coefficients, is called the Fourier transform of f. The 'inverse' identity (1.20), expressing f through its Fourier coefficients, is called the inverse Fourier transform. To repeat, at this point we haven't proven the existence of the inverse transform, i.e. the possibility to 'reconstruct' a function f from its Fourier coefficients f_k.

A criterion for the existence of the Fourier transform is that the insertion of (1.22) into the r.h.s. of (1.20) gives back the original function f:

f(x) = (1/L) Σ_k e^{ikx} ∫₀^L dx′ e^{−ikx′} f(x′) = ∫₀^L dx′ [ (1/L) Σ_k e^{ik(x−x′)} ] f(x′).

A necessary and sufficient condition for this to hold is that

(1/L) Σ_k e^{ik(x−x′)} = δ(x − x′),    (1.23)


[Figure 1.1: Representation of a function (thick line) by Fourier series expansion. The graphs show series expansions up to 1, 3, 5, 7, 9 terms.]

where δ(x) is the Dirac δ-function. For reasons that will become transparent below, Eq. (1.23) is called a completeness relation. Referring for a proof of this relation to the exercise below, we here merely conclude that Eq. (1.23) establishes Eq. (1.22) and its inverse (1.20) as an integral transform, i.e. the full information carried by f(x) is contained in the set of all Fourier coefficients {f_k}. By way of example, figure 1.1 illustrates the approximation of a sawtooth shaped function f by Fourier series truncated after a finite number of terms.
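The truncated-series approximation of figure 1.1 is easy to reproduce numerically. The following sketch (Python/NumPy; the sawtooth f(x) = x/L and all grid parameters are illustrative choices) computes the coefficients (1.22) by quadrature and sums the series (1.20) up to a cutoff, checking that more terms improve the approximation away from the discontinuity.

```python
import numpy as np

# Fourier series of a sawtooth on [0, L]: coefficients via Eq. (1.22),
# reconstruction via the truncated sum of Eq. (1.20).
L = 2.0
N = 4000                        # quadrature points
x = np.linspace(0.0, L, N, endpoint=False)
dx = L / N
f = x / L                       # sawtooth: rises linearly, jumps at x = L

def coeff(n):
    k = 2 * np.pi * n / L
    return np.sum(np.exp(-1j * k * x) * f) * dx      # Eq. (1.22)

def partial_sum(xs, n_max):
    s = np.zeros_like(xs, dtype=complex)
    for n in range(-n_max, n_max + 1):
        k = 2 * np.pi * n / L
        s += np.exp(1j * k * xs) * coeff(n)
    return s.real / L                                # Eq. (1.20)

# Compare truncations away from the jump (middle half of the interval).
err_3 = np.max(np.abs(partial_sum(x, 3) - f)[N//4 : 3*N//4])
err_30 = np.max(np.abs(partial_sum(x, 30) - f)[N//4 : 3*N//4])
assert err_30 < err_3
assert err_30 < 0.05
```

Near the jump itself the convergence is much slower (the Gibbs phenomenon), which is visible as the overshoot in figure 1.1.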

EXERCISE Prove the completeness relation. To this end, add an infinitesimal "convergence generating factor" δ > 0 to the sum:

(1/L) Σ_k e^{ik(x−x′) − δ|k|} = (1/L) Σ_{n=−∞}^{∞} e^{2πin(x−x′)/L − 2π|n|δ/L}.

Next compute the sum as a convergent geometric series. Take the limit δ ↘ 0 and convince yourself that you obtain a representation of the Dirac δ-function.
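A numerical version of this exercise: for small but finite δ, the regularized sum is sharply peaked around x − x′ = 0 and carries unit weight over one period, as a representation of the δ-function must. (Python/NumPy sketch; the values of L, δ and the cutoff on n are illustrative.)

```python
import numpy as np

# Regularized completeness sum
#   S_delta(x) = (1/L) sum_n exp(2*pi*i*n*x/L - 2*pi*|n|*delta/L):
# for small delta it approaches a periodic Dirac delta.
L = 1.0
delta = 0.01
n = np.arange(-500, 501)        # terms beyond |n| ~ L/(2*pi*delta) are negligible

x = np.linspace(-L/2, L/2, 2001)
dx = x[1] - x[0]
# evaluate the truncated sum on the grid (n and x combined in the exponent)
terms = np.exp(2j*np.pi*np.outer(x, n)/L - 2*np.pi*np.abs(n)*delta/L)
S = terms.sum(axis=1).real / L

area = np.sum(S[:-1]) * dx          # integral over one period
assert abs(area - 1.0) < 1e-2       # unit weight
assert S[len(x)//2] > 100 * abs(S[0])  # sharp peak at x = 0, tiny at x = -L/2
```

Summing the geometric series in closed form shows that S_δ is a (periodic) Lorentzian of width ∼ δ, which collapses onto δ(x) as δ ↘ 0.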

It is straightforward to generalize the formulae above to functions defined in higher dimensional spaces: consider a function f : M → ℂ, where M = [0, L₁] × [0, L₂] × · · · × [0, L_d] is a d-dimensional cuboid. It is straightforward to check that the generalization of the formulae above reads

f(x) = Π_{l=1}^d (1/L_l) Σ_k e^{ik·x} f_k,

f_k = ∫_M d^dx e^{−ik·x} f(x),    (1.24)

where the sum runs over all vectors k = (k₁, . . . , k_d), k_l = 2πn_l/L_l, n_l ∈ ℤ. Eqs. (1.21) and (1.23) generalize to

Π_{l=1}^d (1/L_l) ∫_M d^dx e^{i(k−k′)·x} = δ_{k,k′},

Π_{l=1}^d (1/L_l) Σ_k e^{ik·(x−x′)} = δ(x − x′),

where δ_{k,k′} ≡ Π_{l=1}^d δ_{k_l,k′_l}.

Finally, the Fourier transform of vector valued functions, f : M → ℂⁿ, is defined through the Fourier transform of its components, i.e. f_k is an n-dimensional vector whose components f_k^l are given by the Fourier transform of the component functions f^l(x), l = 1, . . . , n.

This completes the definition of Fourier transformation in finite integration domains. At this point, a few general remarks are in order:

. Why is Fourier transformation useful? There are many different answers to this question, which underpins the fact that we are dealing with an important concept: In many areas of electrical engineering, sound engineering, etc., it is customary to work with wave like signals, i.e. signals of harmonic modulation ∼ Re e^{ikx} = cos(kx). The (inverse) Fourier transform is but the decomposition of a general signal into harmonic constituents. Similarly, many "signals" encountered in nature (water or sound waves reflected from a complex pattern of obstacles, light reflected from rough surfaces, etc.) can be interpreted in terms of a superposition of planar waves. Again, this is a situation where a Fourier 'mode' decomposition is useful.

On a different note, Fourier transformation is an extremely potent tool in the solution of linear differential equations. This is a subject to which we will turn below.

. There is some freedom in the definitions of Fourier transforms. For example, we might have chosen f(x) = (c/L) Σ_k e^{ikx} f_k with f_k = (1/c) ∫ dx e^{−ikx} f(x) and an arbitrary constant c. Or, f(x) = (1/L) Σ_k e^{−ikx} f_k with f_k = ∫ dx e^{ikx} f(x) (inverted sign of the phases), or combinations thereof. The actual choice of convention is often a matter of convenience.

1.3.2 Fourier transform (infinite space)

In many applications, we want to consider functions f : ℝ^d → ℂ without explicit reference to some finite cuboid shaped integration domain. Formally, the extension to infinite space may be effected by sending the extension of the integration interval, L, to infinity. In the limit L → ∞, the spacing of consecutive Fourier modes, Δk = 2π/L, approaches zero, and Eq. (1.20) is identified as the Riemann sum representation of an integral,

lim_{L→∞} f(x) = lim_{L→∞} (1/L) Σ_k e^{ikx} f_k = (1/2π) lim_{Δk→0} Δk Σ_{n∈ℤ} e^{iΔknx} f_{Δkn} = (1/2π) ∫ dk e^{ikx} f(k).

Following a widespread convention, we denote the Fourier representation of functions in infinite space by f(k) instead of f_k. Explicit reference to the argument "k" will prevent us from confusing the Fourier representation f(k) with the original function f(x).

The infinite space version of the transform reads

f(k) = ∫_{−∞}^{∞} dx e^{−ikx} f(x).    (1.25)

While this definition is formally consistent, it has a problem: only functions decaying sufficiently fast at infinity, so as to make f(x)e^{−ikx} integrable, can be transformed in this way. This makes numerous function classes of applied interest – polynomials, various types of transcendental functions, etc. – non-transformable and severely restricts the usefulness of the formalism.

However, by a little trick, we may upgrade the definition to one that is far more useful. Let us modify the transform according to

f(k) = lim_{δ↘0} ∫ dx e^{−ikx − δx²} f(x).    (1.26)

Due to the inclusion of the convergence generating factor, δ, any function that does not increase faster than exponentially can be transformed! Similarly, it will be useful to include a convergence generating factor in the reverse transform, i.e.

f(x) = lim_{δ↘0} (1/2π) ∫ dk e^{ikx − δk²} f(k).

Finally, upon straightforward extension of these formulae to multi-dimensional functions f : ℝ^d → ℂ, we arrive at the definition of the Fourier transform in infinite space:

f(x) = lim_{δ↘0} ∫ d^dk/(2π)^d e^{ik·x − δk²} f(k),

f(k) = lim_{δ↘0} ∫ d^dx e^{−ik·x − δx²} f(x).    (1.27)

In these formulae, x² ≡ x · x is the squared norm of the vector x, etc. A few more general remarks on these definitions:

. To keep the notation simple, we will suppress the explicit reference to the convergence generating factor, unless convergence is an issue.


. Occasionally, it will be useful to work with convergence generating factors different from the one above. For example, in one dimension, it may be convenient to effect convergence by the definition

f(x) = lim_{δ↘0} (1/2π) ∫ dk e^{ikx − δ|k|} f(k),

f(k) = lim_{δ↘0} ∫ dx e^{−ikx − δ|x|} f(x).

. For later reference, we note that the generalization of the completeness relation to infinite space reads

∫ d^dk/(2π)^d e^{ik·(x−x′)} = δ(x − x′).    (1.28)

. Finally, one may wonder whether the inclusion of the convergence generating factor might not spoil the consistency of the transform. To check that this is not the case, we need to verify the infinite space variant of the completeness relation (1.23), i.e.

lim_{δ↘0} ∫ d^dk/(2π)^d e^{ik·(x−x′) − δk²} = δ(x − x′).    (1.29)

(For only if this relation holds do the two equations in (1.27) form a consistent pair, cf. the argument in section 1.3.1.)

EXERCISE Prove relation (1.29). Do the Gaussian integral over k (hint: notice that the integral factorizes into d one-dimensional Gaussian integrals) to obtain a result that in the limit δ ↘ 0 asymptotes to a representation of the δ-function.
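As a concrete consistency check of the transform pair (1.27) in d = 1, one can transform a function for which the integrals are known in closed form, e.g. f(x) = e^{−|x|} with f(k) = 2/(1 + k²). The sketch below evaluates both transforms by brute-force quadrature; no convergence factor is needed here since f decays at infinity, and the grid parameters are illustrative choices.

```python
import numpy as np

# Check the pair (1.27) in one dimension for f(x) = exp(-|x|),
# whose transform is f(k) = 2 / (1 + k^2).
x = np.linspace(-40.0, 40.0, 400001)
dx = x[1] - x[0]
f = np.exp(-np.abs(x))

# forward transform, f(k) = int dx e^{-ikx} f(x), at a few test wave numbers
for k0 in [0.0, 0.5, 1.0, 3.0]:
    fk_num = np.sum(np.exp(-1j*k0*x) * f).real * dx
    assert abs(fk_num - 2.0/(1.0 + k0**2)) < 1e-3

# inverse transform with the (1/2pi) dk measure, evaluated at a test point
k = np.linspace(-200.0, 200.0, 400001)
dk = k[1] - k[0]
x0 = 0.7
f_back = np.sum(np.exp(1j*k*x0) * 2.0/(1.0 + k**2)).real * dk / (2*np.pi)
assert abs(f_back - np.exp(-x0)) < 1e-3
```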

1.3.3 Fourier transform and solution of linear differential equations

One of the most striking features of the Fourier transform is that it turns derivative operations (complicated) into algebraic multiplications (simple): consider the function ∂_x f(x).³ Without much loss of generality, we assume the function f : [0, L] → ℂ to obey periodic boundary conditions, f(0) = f(L). A straightforward integration by parts shows that the Fourier coefficients of ∂_x f are given by ik f_k. In other words, denoting the passage from a function to its Fourier transform by an arrow,

f(x) −→ f_k  ⇒  ∂_x f(x) −→ ik f_k.

This operation can be iterated to give

∂_x^n f(x) −→ (ik)^n f_k.

³ Although our example is one-dimensional, we write ∂_x instead of d/dx. This is in anticipation of our higher-dimensional applications below.


In a higher dimensional setting, these identities generalize to (think why!)

∂_l f(x) −→ (ik_l) f_k,

where ∂_l ≡ ∂/∂x_l and k_l is the l-th component of the vector k. Specifically, the operations of vector analysis become purely algebraic operations:

∇f(x) −→ ik f_k,
∇ · f(x) −→ ik · f_k,
∇ × f(x) −→ ik × f_k,
Δf(x) −→ −(k · k) f_k,    (1.30)

etc. Relations of this type are of invaluable use in the solution of linear differential equations (with which electrodynamics is crawling!)
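The rule "differentiation becomes multiplication by ik" has a direct discrete analogue that can be tried out with the FFT. A minimal sketch (Python/NumPy; the test function and grid size are arbitrary choices):

```python
import numpy as np

# Spectral differentiation on a periodic grid: d/dx  <->  multiplication
# by ik in Fourier space, the discrete analogue of Eq. (1.30).
L = 2 * np.pi
N = 256
x = np.linspace(0.0, L, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L/N)   # angular wave numbers of the modes

f = np.exp(np.sin(x))                      # smooth periodic test function
df_exact = np.cos(x) * f                   # its exact derivative

fk = np.fft.fft(f)
df_spectral = np.fft.ifft(1j * k * fk).real   # differentiate by multiplying with ik

assert np.max(np.abs(df_spectral - df_exact)) < 1e-10
```

For smooth periodic functions this "spectral derivative" is accurate to essentially machine precision, far beyond any finite-difference scheme on the same grid.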

To begin with, consider a linear differential equation with constant coefficients. This is a differential equation of the structure

(c^(0) + c^(1)_i ∂_i + c^(2)_ij ∂_i ∂_j + …) ψ(x) = φ(x),

where the c^(n)_{ij…} ∈ C are constants. We may now Fourier transform both sides of the equation to obtain

(c^(0) + i c^(1)_i k_i − c^(2)_ij k_i k_j + …) ψ_k = φ_k.

This is an algebraic equation which is readily solved as

ψ_k = φ_k / (c^(0) + i c^(1)_i k_i − c^(2)_ij k_i k_j + …).

The solution ψ(x) can now be obtained by computation of the inverse Fourier transform,

ψ(x) = ∫ d^dk/(2π)^d  φ_k e^{−ik·x} / (c^(0) + i c^(1)_i k_i − c^(2)_ij k_i k_j + …).   (1.31)

Fourier transform methods thus reduce the solution of linear partial differential equations with constant coefficients to the computation of a definite integral. Technically, this integral may be difficult to compute; conceptually, however, the task of solving the differential equation is under control.

EXAMPLE By way of example, consider the second order ordinary differential equation

(−∂_x² + λ⁻²) ψ(x) = c δ(x − x₀),

where λ, c and x₀ are constants. Application of formula (1.31) gives a solution⁴

ψ(x) = c ∫ dk/(2π)  e^{−ikx} e^{ikx₀} / (k² + λ⁻²).

The integral above can be done by the method of residues (or looked up in a table). As a result we obtain the solution

ψ(x) = (cλ/2) e^{−|x−x₀|/λ}.

(Exercise: convince yourself that ψ is a solution to the differential equation.) In the limit λ→∞, the differential equation collapses to a one–dimensional variant of the Poisson equation, −∂_x² ψ(x) = c δ(x − x₀), for a charge c/4π at x₀. Up to an additive constant, our solution asymptotes to

ψ(x) = −(c/2) |x − x₀|,

which we may interpret as a one–dimensional version of the Coulomb potential.

⁴ The solution is not unique: addition of a solution ψ₀ of the homogeneous equation (−∂_x² + λ⁻²) ψ₀(x) = 0 gives another solution ψ(x) → ψ(x) + ψ₀(x) of the inhomogeneous differential equation.
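The example lends itself to a numerical check. The sketch below (box size, grid, and parameter values are arbitrary choices; the periodic box must be much larger than λ to suppress periodicity artifacts) constructs ψ_k = φ_k/(k² + λ⁻²) as in (1.31) and compares the inverse transform with the residue result (cλ/2) e^{−|x−x₀|/λ}, which follows with the dk/(2π) measure:

```python
import numpy as np

# Solve (-d^2/dx^2 + 1/lam^2) psi = c0 * delta(x - x0) via Eq. (1.31):
# divide the Fourier transform of the source by (k^2 + 1/lam^2).
N, L, lam, c0, x0 = 2**16, 40.0, 1.0, 1.0, 20.0
x = np.arange(N) * L / N
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

phi_k = c0 * np.exp(-1j * k * x0)              # Fourier transform of c0*delta(x - x0)
psi_k = phi_k / (k**2 + 1 / lam**2)
psi = np.fft.ifft(psi_k).real * N / L          # discrete version of  int dk/(2pi) ...

exact = (c0 * lam / 2) * np.exp(-np.abs(x - x0) / lam)
assert np.allclose(psi, exact, atol=1e-3)
```

The residual tolerance reflects the slow (algebraic) convergence of the Fourier sum at the kink of ψ at x₀, not a failure of the method.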

However, the usefulness of Fourier transform methods has its limits. The point is that the Fourier transform does not only turn complicated operations (derivatives) simple; the converse is also true. The crux of the matter is expressed by the so–called convolution theorem: straightforward application of the definitions (1.27) and of the completeness relation (1.29) shows that

f(x) g(x) −→ (f ∗ g)(k) ≡ ∫ d^dp/(2π)^d  f(k − p) g(p).   (1.32)

In other words, the Fourier transform turns a simple product operation into a non–local integral convolution. Specifically, this means that the Fourier transform representation of (linear) differential equations with spatially varying coefficient functions will contain convolution operations. Needless to say, the solution of the ensuing integral equations can be complicated.
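A discrete analog of the convolution theorem can be verified directly: for the DFT, the transform of a pointwise product equals a circular convolution of the transforms. (The factor 1/N below plays the role of the measure d^dp/(2π)^d; the random test vectors are arbitrary choices.)

```python
import numpy as np

# Discrete check of the convolution theorem (1.32): DFT of a product equals
# the circular convolution of the DFTs, divided by N.
rng = np.random.default_rng(1)
N = 64
f = rng.standard_normal(N)
g = rng.standard_normal(N)

fk, gk = np.fft.fft(f), np.fft.fft(g)
# conv[q] = (1/N) sum_p fk[(q - p) mod N] * gk[p]
conv = np.array([np.sum(fk[(q - np.arange(N)) % N] * gk) for q in range(N)]) / N

assert np.allclose(np.fft.fft(f * g), conv)
```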

Finally, notice the mathematical similarity between the Fourier transform f(x) → f(k) and its reverse f(k) → f(x). Apart from a sign change in the phase and the constant pre–factor (2π)^{−d}, the two operations in (1.27) are identical. This means that identities such as (1.30) or (1.32) all have a reverse version. For example,

∂_k f(k) −→ −ix f(x),

and a convolution,

(f ∗ g)(x) ≡ ∫ d^dy f(x − y) g(y) −→ f(k) g(k),

transforms into a simple product, etc. In practice, one often needs to compromise and figure out whether the real space or the Fourier transformed variant of a problem offers the best perspectives for mathematical attack.

1.3.4 Algebraic interpretation of Fourier transform

While it is perfectly o.k. to apply the Fourier transform as a purely operational tool, new perspectives become available once we understand the conceptual meaning of the transform. The Fourier transform is but one member of an entire family of integral transforms. All these transforms have a common basis, which we introduce below. The understanding of these principles will elucidate the mathematical structure of the formulas introduced above, and be of considerable practical use.


A word of caution: this section won't contain difficult formulae. Nonetheless, it doesn't make for easy reading. Please do carefully think about the meaning of the structures introduced below.

Function spaces as vector spaces

Consider the space V of L² functions f : [0, L] → C, i.e. the space of functions for which ∫₀^L dx |f|² exists. We call this a 'space' of functions because V is a vector space: for two elements f, g ∈ V and a, b ∈ C, af + bg ∈ V, the defining criterion of a complex vector space. However, unlike the finite–dimensional spaces studied in ordinary linear algebra, the dimension of V is infinite. Heuristically, we may think of the dimension of a complex vector space as the number of linearly independent complex components required to uniquely specify a vector. In the case of functions, we need the infinitely many "components" f(x), x ∈ [0, L], for a full characterization.


In the following we aim to develop some intuition for the "linear algebra" on function spaces. To this end, it will be useful to think of V as the limit of finite dimensional vector spaces. Let x_i = iδ, δ = L/N, be a discretization of the interval [0, L] into N segments. Provided N is sufficiently large, the data set f_N ≡ (f_1, …, f_N), f_i ≡ f(x_i), will be an accurate representative of a (smooth) function f (see the figure for a discretization of a smooth function into N = 30 data points). We think of the N-component complex vectors f_N generated in this way as elements of an N-dimensional complex vector space V_N. Heuristically it should be clear that in the limit of infinitely fine discretization, N → ∞, the vectors f_N carry the full information on their reference functions, and V_N approaches a representative of the function space "lim_{N→∞} V_N = V".⁵

⁵ The quotes are quite appropriate here. Many objects standard in linear algebra (traces, determinants, etc.) do not afford a straightforward extrapolation to the infinite dimensional case. However, in the present context, the ensuing structures, the subject of functional analysis, do not play an essential role, and a naive approach will be sufficient.


On V_N, we may define a complex scalar product 〈 , 〉 : V_N × V_N → C through 〈f, g〉 ≡ (1/N) Σ_{i=1}^N f*_i g_i.⁶ Here, the factor N⁻¹ has been included to keep the scalar product roughly constant as N increases. In the limit N → ∞, the operation 〈 , 〉 evolves into a scalar product on the function space V: to two functions f, g ∈ V, we assign

〈f, g〉 ≡ lim_{N→∞} (1/N) Σ_i f*_i g_i = ∫₀^L dx f*(x) g(x).

Corresponding to the scalar product on V_N, we have a natural orthonormal basis {e_i | i = 1, …, N}, where e_i = (0, …, N, 0, …) and the entry N sits at the ith position. With this definition, we have

〈e_i, e_j〉 = N δ_ij.

The N–normalization of the basis vectors e_i implies that the coefficients of general vectors f are simply obtained as

f_i = 〈e_i, f〉.   (1.33)

What is the N → ∞ limit of these expressions? At N → ∞ the vectors e_i become representatives of functions that are "infinitely sharply peaked" around a single point (see the figure above). Increasing N, and choosing a sequence of indices i that correspond to a fixed point y ∈ [0, L],⁷ we denote this function by δ_y. Equation (1.33) then asymptotes to

f(y) = 〈δ_y, f〉 = ∫₀^L dx δ_y(x) f(x),

where we noted that δ_y is real. This equation identifies δ_y(x) = δ(y − x) as the δ–function.⁸ This consideration identifies the δ–functions {δ_y | y ∈ [0, L]} as the function space analog of the standard basis associated to the canonical scalar product. Notice that there are "infinitely many" basis vectors. Much like we need to distinguish between vectors v ∈ V_N and the components of vectors, v_i ∈ C, we have to distinguish between functions f ∈ V and their "coefficients" or function values f(x) = 〈δ_x, f〉 ∈ C.

⁶ If no confusion is possible, we omit the subscript N in f_N.
⁷ For example, i = [Ny/L], where [x] is the largest integer smaller than x ∈ R.
⁸ Strictly speaking, δ_y is not a "function". Rather, it is what in mathematics is called a distribution. However, we will continue to use the loose terminology "δ–function".

Fourier transformation is a change of basis

Why did we bother to introduce these formal analogies? The point is that, much like in linear algebra, we may consider bases different from the standard one. In fact, there is often good motivation to switch to another basis. Imagine one aims to mathematically describe a problem displaying a degree of "symmetry". Think, for example, of the Poisson equation of a rotationally symmetric charge distribution. It will certainly be a good idea to seek solutions that reflect this symmetry, i.e. one might aim to represent the solution in terms of a basis of functions that show definite behavior under rotations. Specifically, the Fourier transformation is a change to a function basis that behaves conveniently when acted upon by derivative operations.

To start with, let us briefly recall the linear algebra of changes of basis in finite dimensional vector spaces. Let {w_α | α = 1, …, N} be a second basis of the space V_N (besides the reference basis {e_i | i = 1, …, N}). For simplicity, we presume that the new basis is orthonormal with respect to the given scalar product, i.e.

〈w_α, w_β〉 = N δ_αβ.

Under these conditions, any vector v ∈ V_N may be expanded as

v = Σ_α v′_α w_α.   (1.34)

Written out for individual components v_i = 〈e_i, v〉, this reads

v_i = Σ_α v′_α w_{α,i},   (1.35)

where w_{α,i} ≡ 〈e_i, w_α〉 are the components of the new basis vectors in the old basis. The coefficients v′_α of the expansion of v in the new basis are obtained by taking the scalar product with w_α: 〈w_α, v〉 = Σ_β v′_β 〈w_α, w_β〉 = N v′_α, or

v′_α = (1/N) Σ_j w*_{α,j} v_j.   (1.36)

Substitution of this expansion into (1.35) yields v_i = N⁻¹ Σ_j Σ_α w*_{α,j} w_{α,i} v_j. This relation holds for arbitrary component sets {v_i}, which is only possible if

δ_ij = (1/N) Σ_α w*_{α,j} w_{α,i}.   (1.37)

This is the finite dimensional variant of a completeness relation. The completeness relation holds only for orthonormalized bases. However, the converse is also true: any system of vectors {w_α | α = 1, …, N} satisfying the completeness criterion (1.37) is orthonormal and "complete" in the sense that it can be used to represent arbitrary vectors as in (1.34). We may also write the completeness relation in a less index–oriented notation,

〈e_i, e_j〉 = Σ_α 〈e_i, w_α〉 〈w_α, e_j〉.   (1.38)

EXERCISE Prove that the completeness relation (1.37) represents a criterion for the orthonormality of a basis.
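As a concrete finite-dimensional illustration, one can verify the completeness criterion (1.37) numerically for the system of discrete plane waves w_{α,i} = exp(2πiαi/N), a finite-N analog of the Fourier basis e_k discussed below (the value of N is an arbitrary choice):

```python
import numpy as np

# Check of the completeness relation (1.37) for the discrete plane waves
# w[alpha, i] = exp(2*pi*1j*alpha*i/N).
N = 16
grid = np.arange(N)
w = np.exp(2j * np.pi * np.outer(grid, grid) / N)   # rows: basis label alpha

# delta_ij = (1/N) sum_alpha conj(w[alpha, j]) * w[alpha, i]
completeness = w.conj().T @ w / N                   # entry (j, i) of the sum
assert np.allclose(completeness, np.eye(N))
```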


Let us now turn to the infinite dimensional generalization of these concepts. The (inverse) Fourier transform (1.1) states that an arbitrary element f ∈ V of function space (a "vector") can be expanded in a basis of functions {e_k ∈ V | k ∈ 2πL⁻¹Z}. The function values ("components in the standard basis") of these candidate basis functions are given by

e_k(x) = 〈δ_x, e_k〉 = e^{ikx}.

The functions e_k are properly orthonormalized (cf. Eq. (1.21)):⁹

〈e_k, e_k′〉 = ∫₀^L dx e*_k(x) e_k′(x) = L δ_kk′.

The inverse Fourier transform states that an arbitrary function can be expanded as

f = (1/L) Σ_k f_k e_k.   (1.39)

This is the analog of (1.34). In components,

f(x) = (1/L) Σ_k f_k e_k(x),   (1.40)

which is the analog of (1.35). Substitution of e_k(x) = e^{ikx} gives (1.1). The coefficients can be obtained by taking the scalar product 〈e_k, · 〉 with (1.39), i.e.

f_k = 〈e_k, f〉 = ∫₀^L dx e^{−ikx} f(x),

which is the Fourier transform. Finally, completeness holds if the substitution of these coefficients into the inverse transform (1.40) reproduces the function f(x). Since this must hold for arbitrary functions, we are led to the condition

(1/L) Σ_k e^{−ikx} e^{iky} = δ(x − y).

The abstract representation of this relation (i.e. the analog of (1.38)) reads (check it!)

〈δ_x, δ_y〉 = (1/L) Σ_k 〈δ_x, e_k〉 〈e_k, δ_y〉.

For the convenience of the reader, the most important relations revolving around completeness and basis change are summarized in table 1.1. The column "invariant" contains abstract definitions, while "components" refers to the coefficients of vectors/function values. (For better readability, the normalization factors N appearing in our specific definition of the scalar products above are omitted in the table.)

⁹ However, this by itself does not yet prove their completeness as a basis!


               | vector space                                  | function space
               | invariant          components                 | invariant                 components
elements       | vector v           components v_i             | function f                value f(x)
scalar product | 〈v,w〉              v*_i w_i                   | 〈f,g〉                     ∫dx f*(x)g(x)
standard basis | e_i                e_{i,j} = δ_ij             | δ_x                       δ_x(y) = δ(x−y)
alternate basis| w_α                w_{α,i}                    | e_k                       e_k(x) = e^{ikx}
orthonormality | 〈w_α,w_β〉 = δ_αβ   w*_{α,i} w_{β,i} = δ_αβ    | 〈e_k,e_k′〉 = Lδ_kk′       ∫dx e^{i(k−k′)x} = Lδ_kk′
expansion      | v = v_α w_α        v_i = v_α w_{α,i}          | f = (1/L)Σ_k f_k e_k      f(x) = (1/L)Σ_k f_k e^{ikx}
coefficients   | v_α = 〈w_α,v〉      v_α = w*_{α,i} v_i         | f_k = 〈e_k,f〉             f_k = ∫dx e^{−ikx} f(x)
completeness   | 〈e_i,e_j〉 = 〈e_i,w_α〉〈w_α,e_j〉   δ_ij = w_{α,i} w*_{α,j} | δ(x−y) = (1/L)〈δ_x,e_k〉〈e_k,δ_y〉   δ(x−y) = (1/L)Σ_k e^{ik(x−y)}

Table 1.1: Summarizing the linear algebraic interpretation of basis changes in function space. (In all formulas, a double index summation convention is implied.)

With this background, we are in a position to formulate the motivation for Fourier transformation from a different perspective. Above, we had argued that "Fourier transformation transforms derivatives into something simple". The derivative operation ∂_x : V → V can be viewed as a linear operator in function space: derivatives map functions onto functions, f → ∂_x f, and ∂_x(af + bg) = a∂_x f + b∂_x g, a, b ∈ C, the defining criterion for a linear map. In linear algebra we learn that, given a linear operator A : V_N → V_N, it may be a good idea to look at its eigenvectors w_α and eigenvalues λ_α, i.e. A w_α = λ_α w_α. In the specific case where A is hermitean, 〈v, Aw〉 = 〈Av, w〉 for arbitrary v, w, we even know that the set of eigenvectors {w_α} can be arranged to form an orthonormal basis.

These structures extend to function space. Specifically, consider the operator −i∂_x, where the factor (−i) has been added for convenience. A straightforward integration by parts shows that this operator is hermitean:

〈f, (−i∂_x) g〉 = ∫ dx f*(x)(−i∂_x) g(x) = ∫ dx ((−i∂_x) f)*(x) g(x) = 〈(−i∂_x) f, g〉.

Second, we have

(−i∂_x e_k)(x) = −i∂_x exp(ikx) = k exp(ikx) = k e_k(x),

or

(−i∂_x) e_k = k e_k

for short. This demonstrates that

the exponential functions exp(ikx) are eigenfunctions of the derivative operator, and k is the corresponding eigenvalue.

This observation implies the conclusion:

Fourier transformation is an expansion of functions in the "eigenbasis" of the derivative operator, i.e. the basis of exponential functions {e_k}.

This expansion is appropriate whenever we need a "simple" representation of the derivative operator. In the sections to follow, we will explore how Fourier transformation works in practice.
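Both properties, hermiticity and the eigenvalue equation, can be checked numerically with a spectral representation of −i∂_x on a periodic grid (grid size and test functions below are arbitrary choices):

```python
import numpy as np

# -i d/dx on a periodic grid, realized spectrally: multiply f_k by k.
N, L = 256, 2 * np.pi
x = np.arange(N) * L / N
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

def D(f):                                    # (-i d/dx) f
    return np.fft.ifft(k * np.fft.fft(f))    # -i * (ik) f_k = k f_k

f = np.exp(np.sin(x)) + 1j * np.cos(2 * x)
g = np.cos(x) ** 2 - 1j * np.sin(3 * x)
dx = L / N

# hermiticity: <f, -i dg/dx> = <-i df/dx, g>
assert np.allclose(np.sum(f.conj() * D(g)) * dx, np.sum(D(f).conj() * g) * dx)

# e_k(x) = exp(ikx) is an eigenfunction with eigenvalue k (here k = 3)
ek = np.exp(3j * x)
assert np.allclose(D(ek), 3 * ek)
```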


1.4 Electromagnetic waves in vacuum

As a warmup to our discussion of the full problem posed by the solution of Eqs. (1.13), we consider the vacuum problem, i.e. a situation where no sources are present, jµ = 0. Taking the curl of the second of Eqs. (1.13) and using (1.8), we then find

(∆ − c⁻²∂_t²) B = 0.

Similarly, adding to the gradient of the first equation c⁻¹ times the time derivative of the second, and using Eq. (1.9), we obtain

(∆ − c⁻²∂_t²) E = 0,

i.e. in vacuum both the electric field and the magnetic field obey homogeneous wave equations.

1.4.1 Solution of the homogeneous wave equations

The homogeneous wave equations are conveniently solved by Fourier transformation. To this end, we define a four–dimensional variant of Eq. (1.27),

f(t, x) = ∫ (d³k dω)/(2π)⁴ f(ω, k) e^{ik·x − iωt},
f(ω, k) = ∫ d³x dt f(t, x) e^{−ik·x + iωt}.   (1.41)

The one difference to Eq. (1.27) is that the sign convention in the exponent of the (ω/t)–sector of the transform differs from that in the (k, x)–sector. We next subject the homogeneous wave equation

(∆ − c⁻²∂_t²) ψ(t, x) = 0

(where ψ is meant to represent any of the components of the E– or B–field) to this transformation and obtain

(k² − (ω/c)²) ψ(ω, k) = 0,

where k ≡ |k|. Evidently, the solution ψ must vanish for all values (ω, k) except those for which the factor k² − (ω/c)² = 0. We may thus write

ψ(ω, k) = 2π c₊(k) δ(ω − kc) + 2π c₋(k) δ(ω + kc),

where the c±(k) ∈ C are arbitrary complex functions of the wave vector k. Substituting this representation into the inverse transform, we obtain the general solution of the scalar homogeneous wave equation

ψ(t, x) = ∫ d³k/(2π)³ ( c₊(k) e^{i(k·x − ckt)} + c₋(k) e^{i(k·x + ckt)} ).   (1.42)

In words:


The general solution of the homogeneous wave equation is obtained by linear superposition of elementary plane waves, e^{i(k·x ∓ ckt)}, where each wave is weighted with an arbitrary coefficient c±(k). The elementary constituents are called waves because, for any fixed point in space, x, or instant of time, t, they depend harmonically on the complementary argument: time, t, or position vector, x. The waves are planar in the sense that for all points in the plane fixed by the condition x·k = const. the phase of the wave is identical, i.e. the set of points k·x = const. defines a 'wave front' perpendicular to the wave vector k. The spacing between consecutive wave fronts with the same phase arg(exp(i(k·x − ckt))) is given by ∆x = 2π/k ≡ λ, where λ is the wavelength of the wave and λ⁻¹ = k/2π its wave number. The temporal oscillation period of the wave fronts is set by 2π/ck.
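In one dimension the superposition principle behind Eq. (1.42) can be made explicit numerically. The sketch below (grid, profile, and propagation time are arbitrary choices) superposes plane waves e^{i(kx − ckt)}, i.e. purely right-moving waves (for scalar k of either sign; in the labeling of (1.42), where ω = ±c|k|, this mixes the two branches), and checks that the resulting profile translates rigidly with speed c:

```python
import numpy as np

# Superpose plane waves exp(i(k x - c k t)) with weights c(k) taken from the
# Fourier transform of an initial profile; the packet moves rigidly.
N, L, c, t = 256, 2 * np.pi, 1.0, 0.7
x = np.arange(N) * L / N
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

psi0 = np.exp(np.cos(x))                     # initial profile psi(0, x)
coeff = np.fft.fft(psi0)                     # plane-wave weights c(k)
psi_t = np.fft.ifft(coeff * np.exp(-1j * c * k * t)).real

assert np.allclose(psi_t, np.exp(np.cos(x - c * t)))
```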

Focusing on a fixed wave vector k, we next generalize our results to the vectorial problem posed by the homogeneous wave equations. Since every component of the fields E and B is subject to its own independent wave equation, we may write down the prototypical solutions

E(x, t) = E_0 e^{i(k·x − ωt)},  B(x, t) = B_0 e^{i(k·x − ωt)},   (1.43)

where we introduced the abbreviation ω = ck, and E_0, B_0 ∈ C³ are constant coefficient vectors. The Maxwell equations ∇·B = 0 and (in vacuum) ∇·E = 0 imply the conditions k·E_0 = k·B_0 = 0, i.e. the coefficient vectors are orthogonal to the wave vector. Waves of this type, oscillating in a direction perpendicular to the wave vector, are called transverse waves. Finally, evaluating Eqs. (1.43) on the law of induction ∇×E + c⁻¹∂_t B = 0, we obtain the additional equation B_0 = n_k × E_0, where n_k ≡ k/k, i.e. the vector B_0 is perpendicular to both k and E_0, and of equal magnitude as E_0. Summarizing, the vectors

k ⊥ E ⊥ B ⊥ k,  |B| = |E|   (1.44)

form an orthogonal system, and B is uniquely determined by E (and vice versa).

At first sight, it may seem that we have been too liberal in formulating the solution (1.43): while the physical electromagnetic field is a real vector field, the solutions (1.43) are manifestly complex. The simple solution to this conflict is to identify Re E and Re B with the physical fields.¹⁰
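The relations (1.44) are easily illustrated numerically; the wave vector and the transverse amplitude E_0 below are arbitrary choices:

```python
import numpy as np

# Illustration of (1.44): for transverse E0, B0 = n_k x E0 is transverse,
# perpendicular to E0, and of equal magnitude.
kvec = np.array([1.0, 2.0, 2.0])
nk = kvec / np.linalg.norm(kvec)

E0 = np.array([2.0, 1.0, -2.0])              # chosen such that k . E0 = 0
assert np.isclose(kvec @ E0, 0.0)

B0 = np.cross(nk, E0)                        # law of induction: B0 = n_k x E0
assert np.isclose(kvec @ B0, 0.0)            # B transverse
assert np.isclose(E0 @ B0, 0.0)              # B perpendicular to E
assert np.isclose(np.linalg.norm(B0), np.linalg.norm(E0))   # |B| = |E|
```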

1.4.2 Polarization

In the following we will discuss a number of physically different realizations of plane electro-magnetic waves. Since B is uniquely determined by E, we will focus attention on the latter.

¹⁰ One may ask why, then, we introduced complex notation at all. The reason is that working with exponentials of phases is far more convenient than explicitly having to distinguish between the sin and cos functions that arise after real parts have been taken.


Let us choose a coordinate system such that e₃ ∥ k. We may then write

E(x, t) = (E₁e₁ + E₂e₂) e^{ik·x − iωt},

where E_i = |E_i| exp(iφ_i). Depending on the choice of the complex coefficients E_i, a number of physically different wave types can be distinguished.

Linearly polarized waves

For identical phases φ₁ = φ₂ = φ, we obtain

Re E = (|E₁|e₁ + |E₂|e₂) cos(k·x − ωt + φ),

i.e. a vector field linearly polarized in the direction of the vector |E₁|e₁ + |E₂|e₂. The corresponding magnetic field is given by

Re B = (|E₁|e₂ − |E₂|e₁) cos(k·x − ωt + φ),

i.e. a vector perpendicular to both E and k, and phase synchronous with E.

INFO Linear polarization is a hallmark of many artificial light sources, e.g. laser light is usually linearly polarized. Likewise, the radiation emitted by many antennae shows approximately linear polarization. In nature, the light reflected from optically dense media – water surfaces, window panes, etc. – tends to exhibit a partial linear polarization.¹¹

Circularly polarized waves

Next consider the case of a phase difference, φ₁ = φ₂ ∓ π/2 ≡ φ, and equal amplitudes |E₁| = |E₂| = E:

Re E(x, t) = E (e₁ cos(k·x − ωt + φ) ∓ e₂ sin(k·x − ωt + φ)).

Evidently, the tip of the vector E moves on a circle: E is circularly polarized. Regarded as a function of x (for any fixed instant of time), E traces out a spiral whose characteristic period is set by the wavelength λ = 2π/k (see Fig. ??). The sense of orientation of this spiral is determined by the sign of the phase mismatch. For φ₁ = φ₂ + π/2 (φ₁ = φ₂ − π/2) we speak of a right (left) circularly polarized wave, or a wave of positive (negative) helicity.
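A quick numerical check (amplitude and phase below are arbitrary choices) confirms that the tip of Re E indeed stays on a circle of radius E:

```python
import numpy as np

# For phi1 = phi2 - pi/2 and equal amplitudes, Re E = E(e1 cos(th) - e2 sin(th))
# with th = k.x - omega t + phi; the modulus of Re E is constant.
E, phi = 1.5, 0.3
theta = np.linspace(0, 2 * np.pi, 400) + phi   # samples of k.x - omega*t + phi

E1 = E * np.cos(theta)                          # component along e1
E2 = -E * np.sin(theta)                         # component along e2

assert np.allclose(np.hypot(E1, E2), E)         # |Re E| is constant: a circle
```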

INFO As with classical point particles, electromagnetic fields can be subjected to a quantization procedure. In quantum electrodynamics it is shown that the quanta of the electromagnetic field, the photons, carry definite helicity, i.e. they represent the minimal quantum of circularly polarized waves.

¹¹ This phenomenon motivates the application of polarization filters in photography. A polarization filter blocks light of a pre–defined linear polarization. This direction can be adjusted so as to suppress light reflections. As a result one obtains images exhibiting less glare, more intense colouring, etc.


Elliptically polarized waves

Circular and linear polarization represent limiting cases of a more general form of polarization. Indeed, the minimal geometric structure capable of continuously interpolating between a line segment and a circle is the ellipse. To conveniently see the appearance of ellipses, consider the basis change e± = (e₁ ± ie₂)/√2, and represent the electric field as

E(x, t) = (E₊e₊ + E₋e₋) e^{ik·x − iωt}.

In this representation, a circularly polarized wave corresponds to the limit E₊ = 0 (positive helicity) or E₋ = 0 (negative helicity). Linear polarization is obtained by setting E₊ = ±E₋. It is straightforward to verify that for generic values of the ratio |E₋/E₊| ≡ r one obtains an elliptically polarized wave where the ratio between the major and the minor axis is set by |(1 + r)/(1 − r)|. The tilting angle α of the ellipse w.r.t. the 1–axis is given by α = arg(E₋/E₊)/2.
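The statement about the axis ratio can be checked numerically. The sketch below (the amplitudes are arbitrary real choices, so that the tilting angle vanishes) traces the tip of Re E over one period and compares the extremal radii:

```python
import numpy as np

# In the helical basis e± = (e1 ± i e2)/sqrt(2), a real amplitude ratio
# r = |E-/E+| < 1 yields an ellipse with axis ratio (1+r)/(1-r).
Ep, Em = 1.0, 0.4
r = Em / Ep
theta = np.linspace(0, 2 * np.pi, 2000, endpoint=False)

ep = np.array([1.0, 1j]) / np.sqrt(2)          # e+ in the (e1, e2) plane
em = np.array([1.0, -1j]) / np.sqrt(2)
tip = np.real(np.outer(np.exp(1j * theta), Ep * ep + Em * em))

radii = np.linalg.norm(tip, axis=1)
assert np.isclose(radii.max() / radii.min(), (1 + r) / (1 - r))
```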

This concludes our discussion of the polarization of electromagnetic radiation. It is important to keep in mind that the different types of polarization discussed above represent limiting cases of what in general is only partially polarized or completely un–polarized radiation. Radiation of a given value of the frequency, ω, usually involves the superposition of waves of different wave vector k (at fixed wave number k = |k| = ωc⁻¹). Only if the amplitudes of all partial waves share a definite phase/amplitude relation do we obtain a polarized signal. The degree of polarization of a wave can be determined by computing the so–called Stokes parameters. However, we will not discuss this concept in more detail in this text.

1.5 Green function of electrodynamics

To prepare our discussion of the full problem (1.19), let us consider the inhomogeneous scalar wave equation

(∆ − c⁻²∂_t²) ψ = −4πg,   (1.45)

where the inhomogeneity g and the solution ψ are scalar functions. As with the Poisson equation (??), the weak spot of the wave equation is its linearity. We may, therefore, again employ the concept of Green functions to simplify the solution of the problem.


INFO It may be worthwhile to rephrase the concept of the solution of linear differential equations by Green function methods in the present dynamical context: think of a generic inhomogeneity 4πg(x, t) as an additive superposition of δ–inhomogeneities, δ_(y,s)(x, t) ≡ δ(x − y)δ(t − s), concentrated in space and time around y and s, respectively (see the figure, where the vertical bars represent individual contributions δ_(y,s) weighted by g(y, s), and a finite discretization in space–time is understood):

4πg(x, t) = ∫ d³y ds (4πg(y, s)) δ_(y,s)(x, t).

We may suppress the arguments (x, t) to write this identity as

4πg = ∫ d³y ds (4πg(y, s)) δ_(y,s).

This formula emphasizes the interpretation of g as a weighted sum of particular functions δ_(y,s), with constant "coefficients" (i.e. numbers independent of (x, t)) 4πg(y, s). Denote the solution of the wave equation for an individual point source δ_(y,s) by G_(y,s). The linearity of the wave equation means that the solution f of the equation for the inhomogeneity g is given by the sum f = ∫ d³y ds (4πg(y, s)) G_(y,s) of point source solutions, weighted by the coefficients g(y, s). Or, reinstating the arguments (x, t),¹²

f(x, t) = 4π ∫ d³y ds G(x, t; y, s) g(y, s).   (1.46)

The functions G_(y,s)(x, t) ≡ G(x, t; y, s) are the Green functions of the wave equation. Knowledge of the Green function is equivalent to a full solution of the general equation, up to the final integration (1.46).

The Green function G(x, t; x′, t′) of the scalar wave equation (1.45) is defined by the differential equation

(∆ − c⁻²∂_t²) G(x, t; x′, t′) = −δ(x − x′) δ(t − t′).   (1.47)

Once the solution of this equation has been obtained (which requires specification of a set of boundary conditions), the solution of (1.45) becomes a matter of a straightforward integration:

f(x, t) = 4π ∫ d³x′ dt′ G(x, t; x′, t′) g(x′, t′).   (1.48)

Assuming vanishing boundary conditions at infinity, G(x, t; x′, t′) → 0 for |x − x′|, |t − t′| → ∞, we next turn to the solution of Eq. (1.47).

¹² To check that this is a solution, act with the wave operator ∆ − c⁻²∂_t² on both sides of the equation and use that (∆ − c⁻²∂_t²) G(x, t; y, s) = −δ_(y,s)(x, t) ≡ −δ(x − y)δ(t − s).


INFO In fact, the most elegant and efficient solution strategy utilizes methods of the theory of complex functions. However, we do not assume familiarity of the reader with this piece of mathematics and will use a more elementary technique.

We first note that for the chosen set of boundary conditions the Green function G(x − x′, t − t′) will depend on the difference of its arguments only. We next Fourier transform Eq. (1.47) in the temporal variable, i.e. we act on the equation with the integral transform f(ω) = ∫ (dt/2π) exp(iωt) f(t) (whose inverse is f(t) = ∫ dω exp(−iωt) f(ω)). The temporally transformed equation is given by

(∆ + k²) G(x, ω) = −(1/2π) δ(x),   (1.49)

where we defined k ≡ ω/c. If it were not for the constant k (and a trivial scaling factor (2π)⁻¹ on the r.h.s.), this equation, known in the literature as the Helmholtz equation, would be equivalent to the Poisson equation of electrostatics. Indeed, it is straightforward to verify that the solution of (1.49) is given by

G±(x, ω) = (1/8π²) e^{±ik|x|}/|x|,   (1.50)

where the sign ambiguity needs to be fixed on physical grounds.

INFO To prove Eq. (1.50), we introduce polar coordinates centered around x′ and act with the spherical representation of the Laplace operator (cf. section ??) on G±(r) = (1/8π²) e^{±ikr}/r. Noting that the radial part of the Laplace operator is given by r⁻²∂_r r²∂_r = ∂_r² + (2/r)∂_r, and that ∆(4πr)⁻¹ = −δ(x) (the equation of the Green function of electrostatics), we obtain

(∆ + k²) G±(r, ω) = (1/8π²) (∂_r² + 2r⁻¹∂_r + k²) e^{±ikr}/r
  = (e^{±ikr}/8π²) [ (∂_r² + 2r⁻¹∂_r) r⁻¹ + 2e^{∓ikr}(∂_r r⁻¹)(∂_r e^{±ikr}) + e^{∓ikr} r⁻¹ (∂_r² + k² + 2r⁻¹∂_r) e^{±ikr} ]
  = −(1/2π) δ(x) ∓ 2ik r⁻² ± 2ik r⁻² = −(1/2π) δ(x),

as required.

Doing the inverse temporal Fourier transform, we obtain

G±(x, t) = ∫ dω e^{−iωt} G±(x, ω) = ∫ dω e^{−iωt} (1/8π²) e^{±iωc⁻¹|x|}/|x| = (1/4π) δ(t ∓ c⁻¹|x|)/|x|,

or

G±(x − x′, t − t′) = (1/4π) δ(t − t′ ∓ c⁻¹|x − x′|)/|x − x′|.   (1.51)

For reasons to become clear momentarily, we call G₊ (G₋) the retarded (advanced) Green function of the wave equation.


Figure 1.2: Wave front propagation of the pulse created by a point source at the origin, schematically plotted as a function of two–dimensional space and time. The width of the front is set by δx = cδt, where δt is the duration of the time pulse. Its intensity decays as ∼ x⁻¹.

1.5.1 Physical meaning of the Green function

Retarded Green function

To understand the physical significance of the retarded Green function G₊, we substitute the r.h.s. of Eq. (1.51) into (1.48) and obtain

f(x, t) = ∫ d³x′ g(x′, t − |x − x′|/c)/|x − x′|.   (1.52)

For any fixed point of space and instant of time, (x, t), the solution f(x, t) is affected by the sources g(x′, t′) at all points in space and fixed earlier times t′ = t − |x − x′|/c. Put differently, a time t − t′ = |x − x′|/c has to pass before the amplitude of the source g(x′, t′) may cause an effect at the observation point x at time t; the signal received at (x, t) is subject to a retardation mechanism. When the signal is received, it is so at a strength g(x′, t′)/|x − x′|, similarly to the Coulomb potential in electrostatics. (Indeed, for a time independent source g(x′, t′) = g(x′), we may enter the wave equation with a time–independent ansatz f(x), whereupon it reduces to the Poisson equation.) Summarizing, the sources act as 'instantaneous Coulomb charges' which are (a) of strength g(x′, t′) and (b) felt at times t = t′ + |x − x′|/c.

EXAMPLE By way of example, consider a point source at the origin which 'broadcasts' a signal for a short instant of time at t′ ≃ 0, g(x′, t′) = δ(x′)F(t′), where the function F is sharply peaked around t′ = 0 and describes the temporal profile of the source. The signal is then given by f(x, t) = F(t − |x|/c)/|x|, i.e. we obtain a pattern of outward–moving spherical waves whose amplitude diminishes as |x|⁻¹ or, equivalently, as ∼ 1/tc (see Fig. 1.2).
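The example can be visualized numerically. The sketch below (profile width, observation time, and grid are arbitrary choices) locates the spherical pulse at radius r = ct and confirms the 1/r decay of its height:

```python
import numpy as np

# Broadcast example: f(x, t) = F(t - |x|/c)/|x| with a narrow Gaussian F.
c = 1.0

def F(s):
    return np.exp(-s**2 / 0.01)                 # sharply peaked temporal profile

t = 5.0
r = np.linspace(0.1, 10.0, 2000)
f = F(t - r / c) / r                            # radial signal at time t

r_peak = r[np.argmax(f)]
assert abs(r_peak - c * t) < 0.05               # pulse sits at radius r = c t
assert np.isclose(f.max(), F(0.0) / r_peak, rtol=1e-2)   # height decays as 1/r
```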

Advanced Green function

Consider now the solution we would have obtained from the advanced Green function,

f(x, t) = ∫ d³x′ g(x′, t + |x − x′|/c)/|x − x′|.

Here, the signal responds to the behaviour of the source in the future. The principle of cause–and–effect, or causality, is violated, implying that the advanced Green function, though mathematically a legitimate solution of the wave equation, does not carry physical meaning. Two more remarks are in order: (a) when solving the wave equation by techniques borrowed from the theory of complex functions, the causality principle is built in from the outset, and the retarded Green function is automatically selected. (b) Although the advanced Green function does not carry immanent physical meaning, it is not a senseless object altogether. However, the utility of this object discloses itself only in quantum theories.

1.5.2 Electromagnetic gauge field and Green functions

So far we have solved the wave equation for an arbitrary scalar source. Let us now specialize to the wave equations (1.19) for the components of the electromagnetic potential, A^µ. To a first approximation, these are four independent scalar wave equations for the four sources j^µ. We may, therefore, just copy the prototypical solution to obtain the retarded potentials of electrodynamics,

φ(x, t) = ∫ d³x′ ρ(x′, t − |x − x′|/c)/|x − x′|,

A(x, t) = (1/c) ∫ d³x′ j(x′, t − |x − x′|/c)/|x − x′|.   (1.53)

INFO There is, however, one important consistency check that needs to be performed: recalling that the wave equations (1.19) hold only in the Lorentz gauge, ∂µAµ = 0 ⇔ c⁻¹∂tφ + ∇ · A = 0, we need to check that the solutions (1.53) actually meet the Lorentz gauge condition. Relatedly, we have to keep in mind that the sources are not quite independent; they obey the continuity equation ∂µjµ = 0 ⇔ ∂tρ + ∇ · j = 0. As in section ??, it will be most convenient to probe the gauge behaviour of the vector potential in a fully developed Fourier language. Also, we will use a 4-vector notation throughout. In this notation, the Fourier transformation (1.41) assumes the form

f(k) = ∫ d⁴x/(2π)⁴ f(x) e^{ixµkµ},   (1.54)

where a factor of c⁻¹ has been absorbed in the integration measure and kµ = (k⁰, k) with k⁰ = ω/c.¹³

The Fourier transform of the scalar wave equation (1.45) becomes kµkµ ψ(k) = −4π g(k). Specifically, the Green function obeys the equation kµkµ G(k) = −(2π)⁻⁴, which is solved by G(k) = −((2π)⁴ kµkµ)⁻¹. (One may check explicitly that this is the Fourier transform of our solution (1.51); however, for our present purposes there is no need to do so.) The solution of the general scalar wave equation (1.48) obtains by convoluting the Green function and the source,

f(x) = 4π (G ∗ g)(x) ≡ 4π ∫ d⁴x′ G(x − x′) g(x′).

Using the convolution theorem, this transforms to f(k) = (2π)⁴ 4π G(k) g(k) = −4π g(k)/(kµkµ). Specifically, for the vector potential, we obtain

Aµ = −4π jµ / (kνkν).   (1.55)

¹³ Recall that x⁰ = ct, i.e. kµxµ = k⁰x⁰ − k · x = ωt − k · x.
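The convolution theorem invoked above can be illustrated numerically. The following sketch (an illustration added here, not part of the original text) checks, for a discrete Fourier transform without the continuum normalization factors of (1.54), that the transform of a circular convolution equals the product of the transforms; the test sequences are arbitrary choices:

```python
import cmath

def dft(a):
    # unnormalized discrete Fourier transform
    N = len(a)
    return [sum(a[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circ_conv(f, g):
    # circular (periodic) convolution
    N = len(f)
    return [sum(f[m] * g[(n - m) % N] for m in range(N)) for n in range(N)]

f = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0]
g = [0.5, -0.25, 1.0, 0.0, 2.0, -1.0]

lhs = dft(circ_conv(f, g))                     # transform of the convolution
rhs = [F * G for F, G in zip(dft(f), dft(g))]  # product of the transforms
assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```

In the continuum conventions of (1.54) the same statement carries the extra factor (2π)⁴ quoted in the text; the discrete check only illustrates the structural point that convolution becomes multiplication.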


This is all we need to check that the gauge condition is fulfilled: the Fourier transform of the Lorentz gauge relation ∂µAµ = 0 is given by kµAµ = 0. Probing this relation on (1.55), we obtain kµAµ ∝ kµjµ = 0, where we noted that kµjµ = 0 is the Fourier transform of the continuity relation.

We have thus shown that the solution (1.53) conforms with the gauge constraints. As an important

byproduct, our proof above reveals an intimate connection between the gauge behaviour of the

electromagnetic potential and current conservation.

Eqs. (1.53) generalize Eqs. (??) and (??) to the case of dynamically varying sources. In the next sections, we will explore various aspects of the physical contents of these equations.

1.6 Field energy and momentum

In sections ?? and 1.1 we have seen that static electric and magnetic fields carry energy. How does the concept of field energy generalize to the dynamical case, where the electric and the magnetic field form an inseparable entity? Naively, one may speculate that the energy, E, carried by an electromagnetic field is given by the sum of its electric and magnetic components, E = (1/8π) ∫ d³x (E · D + B · H). As we will show below, this expectation actually turns out to be correct. However, there is more to the electromagnetic field energy than the formula above. For example, we know from daily experience that electromagnetic energy can flow, and that it can be converted to mechanical energy. (Think of a microwave oven, where radiation flows into whatever fast food you put into it, to then deposit its energy as [mechanical] heat.)

In section 1.6.1, we explore the balance between bulk electromagnetic energy, energy flow (current), and mechanical energy. In section 1.6.2, we will see that the concept of field energy can be generalized to one of “field momentum”. Much like energy, the momentum carried by the electromagnetic field can flow, get converted to mechanical momentum, etc.

1.6.1 Field energy

We begin our discussion of electromagnetic field energy with a few formal operations on the Maxwell equations: multiplying Eq. (1.4) by E· and Eq. (1.5) by −H·, and adding the results to each other, we obtain

E · (∇ × H) − H · (∇ × E) − (1/c)(E · ∂tD + H · ∂tB) = (4π/c) j · E.

Using that for general vector fields v and w (check!), ∇ · (v × w) = w · (∇ × v) − v · (∇ × w), as well as E · ∂tD = ∂t(E · D)/2 and H · ∂tB = ∂t(H · B)/2, this equation can be rewritten as

(1/8π) ∂t(E · D + B · H) + (c/4π) ∇ · (E × H) + j · E = 0.   (1.56)

To understand the significance of Eq. (1.56), let us integrate it over a test volume V:

dt ∫_V d³x (1/8π)(E · D + B · H) = −∫_{S(V)} dσ n · (c/4π)(E × H) − ∫_V d³x j · E,   (1.57)


where use of Gauß’s theorem has been made. The first term is the sum of the electric and the magnetic field energy densities, as derived earlier for static field configurations. We now interpret this term as the energy stored in general electromagnetic field configurations and

w ≡ (1/8π)(E · D + B · H)   (1.58)

as the electromagnetic energy density. What is new in electrodynamics is that the field energy may change in time. The r.h.s. of the equation tells us that there are two mechanisms whereby the electromagnetic field energy may be altered. First, there is a surface integral over the so-called Poynting vector field

S ≡ (c/4π) E × H.   (1.59)

We interpret the integral over this vector as the energy current passing through the surface S, and the Poynting vector field as the energy current density. Thus, the pair (energy density w)/(Poynting vector S) plays a role similar to the pair (charge density ρ)/(current density j) for the matter field. However, unlike with matter, the energy of the electromagnetic field is not conserved. Rather, the balance equation of electromagnetic field energy above states that

∂tw + ∇ · S = −j · E.   (1.60)

The r.h.s. of this equation contains the matter-field current j, which suggests that it describes the conversion of electromagnetic into mechanical energy. Indeed, the temporal change of the mechanical energy U of a charged point particle in an electromagnetic field (cf. section 1.1 for a similar line of reasoning) is given by dtU(x(t)) = F · ẋ = q(E + c⁻¹v × B) · v = qE · v, where we observed that the magnetic component of the Lorentz force is perpendicular to the velocity and, therefore, does not do work on the particle. Recalling that the current density of a point particle is given by j = qδ(x − x(t))v, this expression may be rewritten as dtU = ∫ d³x j · E. The term j · E in Eq. (1.60) is the generalization of this expression to arbitrary current densities. Energy conservation implies that the work done on a current of charged particles has to be taken from the electromagnetic field. This explains the appearance of the mechanical energy on the r.h.s. of the balance equation (1.60).

INFO Equations of the structure

∂tρ+∇ · j = σ, (1.61)

generally represent continuity equations: suppose we have some quantity – energy, oxygen molecules, voters of some political party – that may be described by a density distribution ρ.¹⁴

¹⁴ In the case of discrete quantities (the voters, say), ρ(x, t)d³x is the average number of “particles” present in a volume d³x, where the reference volume d³x must be chosen large enough to make this number bigger than unity (on average).


The contents of our quantity in a small reference volume d³x may change due to (i) flow of the quantity to somewhere else. Mathematically, this is described by a current density j, where j may be a function of ρ. However, (ii) changes may also occur as a consequence of a conversion of ρ into something else (the chemical reaction of oxygen to carbon dioxide, voters changing their party, etc.). In (1.61), the rate at which this happens is described by σ. If σ equals zero, we speak of a conserved quantity. Any conservation law can be represented in differential form, Eq. (1.61), or in the equivalent integral form

dt ∫_V d³x ρ(x, t) + ∫_S dS · j = ∫_V d³x σ(x, t).   (1.62)

It is useful to memorize the form of the continuity equation. For if a mathematical equation of the

form ∂t(function no. 1) + ∇ · (vector field) = function no. 2 crosses our way, we know that we

have encountered a conservation law, and that the vector field must be “the current” associated

to function no. 1. This can be non–trivial information in contexts where the identification of

densities/currents from first principles is difficult. For example, the structure of (1.60) identifies

w as a density and S as its current.

1.6.2 Field momentum ∗

Unlike in previous sections on conservation laws, we here identify B = H, D = E, i.e. we consider the vacuum case.¹⁵

As a second example of a physical quantity that can be exchanged between matter and electromagnetic fields, we consider momentum. According to Newton’s equations, the change of the total mechanical momentum Pmech carried by the particles inside a volume B is given by the integrated force density, i.e.

dt Pmech = ∫_B d³x f = ∫_B d³x (ρE + (1/c) j × B),

where in the second equality we inserted the Lorentz force density. Using Eqs. (??) and (??) to eliminate the sources, we obtain

dt Pmech = (1/4π) ∫_B d³x ((∇ · E)E − B × (∇ × B) + c⁻¹ B × Ė).

Now, using that B × Ė = dt(B × E) − Ḃ × E = dt(B × E) − c E × (∇ × E) and adding 0 = B(∇ · B) to the r.h.s., we obtain the symmetric expression

dt Pmech = (1/4π) ∫_B d³x ((∇ · E)E − E × (∇ × E) + B(∇ · B) − B × (∇ × B) + c⁻¹ dt(B × E)),

¹⁵ The discussion of the field momentum in matter (cf. ??) turns out to be a delicate matter, wherefore we prefer to stay on the safe ground of the vacuum theory.


which may be reorganized as

dt (Pmech − (1/4πc) ∫_B d³x B × E) = (1/4π) ∫_B d³x ((∇ · E)E − E × (∇ × E) + B(∇ · B) − B × (∇ × B)).

This equation is of the form dt(something) = (something else). Comparing to our earlier discussion of conservation laws, we are led to interpret the ‘something’ on the l.h.s. as a conserved quantity. Presently, this quantity is the sum of the total mechanical momentum Pmech and the integral

Pfield = ∫ d³x g,   g ≡ (1/4πc) E × B = S/c².   (1.63)

The structure of this expression suggests to interpret Pfield as the momentum carried by the electromagnetic field and g as the momentum density (which happens to be given by c⁻² times the Poynting vector). If our tentative interpretation of the equation above as a conservation law is to make sense, we must be able to identify its r.h.s. as a surface integral. This in turn requires that the components of the (vector-valued) integrand be representable as Xj ≡ ∂iTij, i.e. as the divergence of a vector field Tj with components Tij. (Here, j = 1, 2, 3 plays the role of a spectator index.) If this is the case, we may, indeed, transform the integral to a surface integral, ∫_B d³x Xj = ∫_B d³x ∂iTij = ∫_∂B dσ n · Tj. Indeed, it is not difficult to verify that

[(∇ · X)X − X × (∇ × X)]j = ∂i [XiXj − (δij/2) X · X].

Identifying X = E, B and introducing the components of the Maxwell stress tensor as

Tij = (1/4π) [EiEj + BiBj − (δij/2)(E · E + B · B)],   (1.64)

the r.h.s. of the conservation law assumes the form ∫_B d³x ∂iTij = ∫_∂B dσ niTij, where ni are the components of the normal vector field of the system boundary. The law of the conservation of momentum thus assumes the final form

dt (Pmech + Pfield)j = ∫_∂B dσ niTij.   (1.65)

Physically, dt dσ niTij is the (jth component of the) momentum that gets pushed through dσ in time dt. Thus, dσ niTij is the momentum per time, or force, exerted on dσ, and niTij the force per area, or radiation pressure, due to the change of linear momentum in the system.
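The divergence identity above, which turns the momentum balance into a surface integral, can be spot-checked numerically (a sketch added for illustration; the polynomial test field and evaluation point are arbitrary choices):

```python
# Numerical spot check of the identity
#   [(div X) X - X x (curl X)]_j = d_i [X_i X_j - (delta_ij/2) X.X]
# for a polynomial vector field X, using central finite differences.

def X(p):
    x, y, z = p
    return [x * y, y * z + x * x, z * x - y]

h = 1e-5

def vec_partial(i, p):
    # d_i X, evaluated by central differences
    q1, q2 = list(p), list(p)
    q1[i] += h; q2[i] -= h
    return [(a - b) / (2 * h) for a, b in zip(X(q1), X(q2))]

def T(i, j, q):
    # stress-like tensor built from X (a 1/4pi prefactor would drop out)
    Xq = X(q)
    return Xq[i] * Xq[j] - (0.5 * sum(c * c for c in Xq) if i == j else 0.0)

p = [0.3, -0.7, 1.1]
dX = [vec_partial(i, p) for i in range(3)]      # dX[i][j] = d_i X_j
div = sum(dX[i][i] for i in range(3))
Xp = X(p)
curl = [dX[1][2] - dX[2][1], dX[2][0] - dX[0][2], dX[0][1] - dX[1][0]]
cross = [Xp[1] * curl[2] - Xp[2] * curl[1],
         Xp[2] * curl[0] - Xp[0] * curl[2],
         Xp[0] * curl[1] - Xp[1] * curl[0]]     # X x (curl X)
lhs = [div * Xp[j] - cross[j] for j in range(3)]

def div_T(j, p):
    # sum_i d_i T_ij by central differences
    total = 0.0
    for i in range(3):
        q1, q2 = list(p), list(p)
        q1[i] += h; q2[i] -= h
        total += (T(i, j, q1) - T(i, j, q2)) / (2 * h)
    return total

rhs = [div_T(j, p) for j in range(3)]
assert all(abs(a - b) < 1e-6 for a, b in zip(lhs, rhs))
```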


It is straightforward to generalize the discussion above to the conservation of angular momentum: the angular momentum carried by a mechanical system of charged particles may be converted into angular momentum of the electromagnetic field. It is evident from Eq. (1.63) that the angular momentum density of the field is given by

l = x × g = (1/4πc) x × (E × B).

However, in this course we will not discuss angular momentum conservation any further.

1.6.3 Energy and momentum of plane electromagnetic waves

Consider a plane wave in vacuum. Assuming that the wave propagates in the 3-direction, the physical electric field Ephys. is given by

Ephys.(x, t) = Re E(x, t) = Re((E₁e₁ + E₂e₂) e^{ikx₃−iωt}) = r₁u₁(x₃, t)e₁ + r₂u₂(x₃, t)e₂,

where uᵢ(x₃, t) = cos(φᵢ + kx₃ − ωt) and we defined Eᵢ = rᵢ exp(iφᵢ) with real rᵢ. Similarly, the magnetic field Bphys. is given by

Bphys.(x, t) = Re (e₃ × E(x, t)) = r₁u₁(x₃, t)e₂ − r₂u₂(x₃, t)e₁.

From these relations we obtain the energy density and Poynting vector as

w(x, t) = (1/8π)(E² + B²) = (1/4π)((r₁u₁)² + (r₂u₂)²)(x₃, t),

S(x, t) = c w(x, t) e₃,

where we omitted the subscript ’phys.’ for notational simplicity.

EXERCISE Check that w and S above comply with the conservation law (1.60).
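A minimal numerical version of this exercise (a sketch; the unit choice c = 1 and the amplitudes and phases are arbitrary assumptions of the example, and j · E = 0 in vacuum) verifies ∂tw + ∂₃S₃ = 0 by central finite differences:

```python
import math

c, k = 1.0, 2.0                          # units with c = 1; k chosen arbitrarily
omega = c * k
r1, r2, p1, p2 = 1.3, 0.7, 0.4, -1.1     # arbitrary amplitudes and phases

def w(x3, t):
    # energy density of the plane wave, ((r1 u1)^2 + (r2 u2)^2)/4pi
    u1 = math.cos(p1 + k * x3 - omega * t)
    u2 = math.cos(p2 + k * x3 - omega * t)
    return ((r1 * u1) ** 2 + (r2 * u2) ** 2) / (4 * math.pi)

h = 1e-6
x3, t = 0.37, 0.81
dt_w = (w(x3, t + h) - w(x3, t - h)) / (2 * h)
dx_S3 = c * (w(x3 + h, t) - w(x3 - h, t)) / (2 * h)   # S3 = c w
assert abs(dt_w + dx_S3) < 1e-6   # conservation law (1.60) with j.E = 0
```

The check works because w depends on x₃ and t only through the combination kx₃ − ωt, so ∂tw = −(ω/k) ∂₃w = −c ∂₃w.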

1.7 Electromagnetic radiation

To better understand the physics of the expressions (1.53), let us consider the case where the sources are confined to within a region in space of typical extension d. Without loss of generality, we assume the time dependence of the sources to be harmonic, with a certain characteristic frequency ω, jµ(x, t) = jµ(x) exp(−iωt). (The signal generated by sources of more general time dependence can always be obtained by superposition of harmonic signals.) As we shall see, all other quantities of relevance to us (potentials, fields, etc.) will inherit the same time dependence, i.e. X(x, t) = X(x) exp(−iωt). Specifically, Eq. (1.53) implies that the electromagnetic potentials, too, oscillate in time, Aµ(x, t) = Aµ(x) exp(−iωt). Substituting this ansatz into Eq. (1.53), we obtain

Aµ(x) = (1/c) ∫ d³x′ jµ(x′) e^{ik|x−x′|} / |x − x′|.


As in our earlier analyses of multipole fields, we assume that the observation point x is far away from the source, r = |x| ≫ d. Under these circumstances, we may focus attention on the spatial components of the vector potential,

A(x) = (1/c) ∫ d³x′ j(x′) e^{ik|x−x′|} / |x − x′|,   (1.66)

where we have substituted the time-oscillatory ansatz for sources and potential into (1.53) and divided out the time-dependent phases. From (1.66) the magnetic and electric fields are obtained as

B = ∇ × A,   E = i k⁻¹ ∇ × (∇ × A),   (1.67)

where in the second identity we used the source-free variant of the law of magnetic circulation, c⁻¹∂tE = −ikE = ∇ × B.

We may now use the smallness of the ratio d/r to expand |x − x′| ≃ r − n · x′ + …, where n is the unit vector in x-direction. Substituting this expansion into (1.66) and expanding the result in powers of r⁻¹, we obtain a series similar in spirit to the electric and magnetic multipole series discussed above. For simplicity, we here focus attention on the dominant contribution to the series, obtained by approximating |x − x′| ≃ r. For reasons to become clear momentarily, this term,

A(x) ≃ (e^{ikr}/cr) ∫ d³x′ j(x′),   (1.68)

generates electric dipole radiation. In section ?? we have seen that for a static current distribution, the integral ∫ j vanishes. However, for a dynamic distribution, we may engage the Fourier transform of the continuity relation, iωρ = ∇ · j, to obtain

∫ d³x′ jᵢ = ∫ d³x′ (∇xᵢ) · j = −∫ d³x′ xᵢ (∇ · j) = −iω ∫ d³x′ xᵢ ρ.

Substituting this result into (1.68) and recalling the definition of the electric dipole moment of a charge distribution, d ≡ ∫ d³x x ρ(x), we conclude that

A(x) ≃ −ik d e^{ikr}/r   (1.69)

is indeed controlled by the electric dipole moment. Besides d and r, another characteristic length scale in the problem is the characteristic wave length λ ≡ 2π/k = 2πc/ω. For simplicity, we assume that the wave length is much larger than the extent of the source, λ ≫ d.

INFO This latter condition is usually met in practice. E.g., for radiation in the MHz range, ω ∼ 10⁶ s⁻¹, the wave length is given by λ = 3 · 10⁸ m s⁻¹/10⁶ s⁻¹ ≃ 300 m, much larger than the extension of typical antennae.


We then need to distinguish between three different regimes (see Fig. ??): the near zone, d ≪ r ≪ λ, the intermediate zone, d ≪ λ ∼ r, and the far zone, d ≪ λ ≪ r. We next discuss these regions in turn.

Near zone

For r ≪ λ, or kr ≪ 1, the exponential in (1.68) may be approximated by unity and we obtain

A(x, t) ≃ −(ik/r) d e^{−iωt},

where we have re-introduced the time dependence of the sources. Using Eq. (1.67) to compute the electric and magnetic field, we get

B(x, t) = (ik/r²) n × d e^{−iωt},

E(x, t) = (3n(n · d) − d)/r³ e^{−iωt}.

The electric field equals the dipole field (??) created by a dipole with time-dependent moment d exp(−iωt) (cf. the figure, where the numbers on the ordinate indicate the separation from the radiation center in units of the wave length and the color coding is a measure of the field intensity). This motivates the denotation ‘electric dipole radiation’. The magnetic field is by a factor kr ≪ 1 smaller than the electric field, i.e. in the near zone the electromagnetic field is dominantly electric. In the limit k → 0 the magnetic field vanishes and the electric field reduces to that of a static dipole.

The analysis of the intermediate zone, kr ∼ 1, is complicated in as much as all powers in an expansion of the exponential in (1.68) in kr must be retained. For a discussion of the resulting field distribution, we refer to [1].

Far zone

For kr ≫ 1, it suffices to let the derivative operations in (1.67) act on the argument kr of the exponential function. Carrying out the derivatives, we obtain

B = k² (n × d) e^{i(kr−ωt)}/r,   E = −n × B.   (1.70)

These asymptotic field distributions have much in common with the vacuum electromagneticfields above. Indeed, one would expect that far away from the source (i.e. many wavelengthsaway from the source), the electromagnetic field resembles a spherical wave (by which we


mean that the surfaces of constant phase form spheres). For observation points sufficiently far from the center of the wave, its curvature will be nearly invisible and the wave will look approximately planar. These expectations are met by the field distributions (1.70): neglecting all derivatives other than those acting on the combination kr (i.e. neglecting corrections of O((kr)⁻¹)), the components of E and B obey the wave equation (the Maxwell equations in vacuum).¹⁶ The wave fronts are spheres, outwardly propagating at speed c. The vectors n ⊥ E ⊥ B ⊥ n form an orthogonal set, as we saw is characteristic for a vacuum plane wave. Finally, it is instructive to compute the energy current carried by the wave. To this end,

we recall that the physically realized values of the electromagnetic field obtain by taking the real part of (1.70). We may then compute the Poynting vector (1.59) as

S = (ck⁴/4πr²) n (d² − (n · d)²) cos²(kr − ωt) → 〈S〉t = (ck⁴/8πr²) n (d² − (n · d)²),

where the last expression is the energy current temporally averaged over several oscillation periods ω⁻¹. The energy current is maximal in the plane perpendicular to the dipole moment of the source and decays according to an inverse square law. It is also instructive to compute the total power radiated by the source, i.e. the energy current integrated over spheres of constant radius r. (Recall that the integrated energy current accounts for the change of energy inside the reference volume per time, i.e. the power radiated by the source.) Choosing the z-axis of the coordinate system to be colinear with the dipole moment, we have d² − (n · d)² = d² sin²θ and

P ≡ ∫_S dσ n · S = (ck⁴d²/8πr²) r² ∫₀^π sin θ dθ ∫₀^{2π} dφ sin²θ = ck⁴d²/3.

Notice that the radiated power does not depend on the radius of the reference sphere, i.e. the work done by the source is entirely converted into radiation and not, say, into a steadily increasing density of vacuum field energy.
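The angular integral entering this power formula is easy to confirm numerically (a sketch added for illustration; the midpoint rule and grid size are arbitrary choices):

```python
import math

# Midpoint-rule check of the angular integral in the dipole power formula:
# integral_0^pi sin^3(theta) dtheta = 4/3, hence the full solid-angle
# integral of sin^2(theta) is 8*pi/3 and P = c k^4 d^2 / 3.
n = 100_000
dtheta = math.pi / n
theta_integral = sum(math.sin((i + 0.5) * dtheta) ** 3 for i in range(n)) * dtheta
solid_angle_integral = 2 * math.pi * theta_integral

assert abs(theta_integral - 4.0 / 3.0) < 1e-8
assert abs(solid_angle_integral - 8.0 * math.pi / 3.0) < 1e-7
```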

INFO As a concrete example of a radiation source, let us consider a center-fed linear antenna, i.e. a piece of wire of length a carrying an AC current that is maximal at the center of the wire. We model the current flow by the distribution I(z, t) = I₀(1 − 2|z|a⁻¹) exp(−iωt), where a ≪ λ = c/ω and alignment with the z-axis has been assumed. Using the continuity equation, ∂zI(z, t) + ∂tρ(z, t) = 0, we obtain the charge density (charge per unit length) in the wire as ρ(z, t) = 2iI₀(ωa)⁻¹ sgn(z) exp(−iωt). The dipole moment of the source is thus given by d = e_z ∫_{−a/2}^{a/2} dz z ρ(z) = iI₀(a/2ω) e_z, and the radiated power by

P = (ka)²/(12c) · I₀².

The coefficient Rrad ≡ (ka)²/12c of the factor I₀² is called the radiation resistance of the antenna. To understand the origin of this denotation, notice that [Rrad] = [c⁻¹] indeed has the dimension of resistance (exercise). Next recall that the power required to drive a current through a conventional resistor R is given by P = UI = RI². Comparison with the expression above suggests to interpret

¹⁶ Indeed, one may verify (do it!) that the characteristic factors r⁻¹e^{i(kr−ωt)} are exact solutions of the wave equation; they describe the space-time profile of a spherical wave.


Rrad as the ‘resistance’ of the antenna. However, this resistance has nothing to do with dissipative energy losses inside the antenna (i.e. those losses that are responsible for the DC resistivity of metals). Rather, work has to be done to feed energy into electromagnetic radiation. This work determines the radiation resistance. Also notice that Rrad ∼ k² ∼ ω², i.e. the radiation losses increase quadratically in frequency. This latter fact is of great technological importance.
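The dipole moment quoted for the antenna can be cross-checked by direct integration (a numerical sketch added here; the value of a is an arbitrary choice): with ρ(z) ∝ sgn(z) on [−a/2, a/2], the integral ∫ z sgn(z) dz = a²/4 reproduces d = iI₀a/(2ω) once the prefactor 2iI₀/(ωa) is attached:

```python
import math

# Midpoint-rule evaluation of the dipole-moment integral for the
# center-fed antenna: integral of z*sgn(z) over [-a/2, a/2] equals a^2/4,
# so d = (2 i I0/(omega a)) * (a^2/4) = i I0 a / (2 omega).
a = 2.0
n = 100_000
dz = a / n
integral = 0.0
for i in range(n):
    z = -a / 2 + (i + 0.5) * dz
    integral += z * (1.0 if z > 0 else -1.0) * dz

assert abs(integral - a * a / 4) < 1e-9
assert math.isclose(a * a / 4, 1.0)   # for the particular choice a = 2
```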

Our results above relied on a first-order expansion in the ratio d/r between the extent of the sources and the distance of the observation point. We saw that at this order of the expansion, the source couples to the electromagnetic field by its electric dipole moment. A more sophisticated description of the fields may be obtained by driving the expansion in d/r to higher order. E.g., at next order, the electric quadrupole moment and the magnetic dipole moment enter the stage. This leads to the generation of electric quadrupole radiation and magnetic dipole radiation. For an in-depth discussion of these types of radiation we refer to [1].


Chapter 2

Macroscopic electrodynamics

Suppose we want to understand the electromagnetic fields in an environment containing extended pieces of matter. From a purist point of view, we would need to regard the O(10²³) carriers of electric and magnetic moments — electrons, protons, neutrons — comprising the medium as sources. Throughout, we will denote the fields e, …, h created by this system of sources as microscopic fields. (Small characters are used to distinguish the microscopic fields from the effective macroscopic fields to be introduced momentarily.) The ‘microscopic Maxwell equations’ read as

∇ · e = 4πρµ,   ∇ × b − (1/c) ∂te = (4π/c) jµ,

∇ × e + (1/c) ∂tb = 0,   ∇ · b = 0,

where ρµ and jµ are the microscopic charge and current density, respectively. Now, for several reasons, it does not make much sense to consider these equations as they stand: first, it is clear that attempts to get a system of O(10²³) dynamical sources under control are bound to fail. Second, the dynamics of the microscopic sources is governed by quantum effects and cannot be described in terms of a classical vector field j. Finally, we aren’t even interested in knowing the microscopic fields. Rather, (in classical electrodynamics) we want to understand the behaviour of fields on ‘classical’ length scales, which generally exceed the atomic scales by far.¹

In the following, we develop a more praxis-oriented theory of electromagnetism in matter. We aim to lump the influence of the microscopic degrees of freedom of the matter environment into a few effective modifiers of the Maxwell equations.

¹ For example, the wave length of visible light is about 600 nm, whilst the typical extension of a molecule is 0.1 nm. Thus, ‘classical’ length scales of interest are about three to four orders of magnitude larger than the microscopic scales.


Figure 2.1: Schematic on the spatially averaged system of sources. A zoom into a solid (left) reveals a pattern of ions or atoms or molecules (center left). Individual ions/atoms/molecules centered at lattice coordinate xm may carry a net charge (qm) and a dipole element dm (center right). A dipole element forms if sub-atomic constituents (electrons/nuclei) are shifted asymmetrically w.r.t. the center coordinate. In an ionic system, mobile electrons (at coordinates xi) contribute to the charge balance.

2.1 Macroscopic Maxwell equations

Let us, then, introduce macroscopic fields by averaging the microscopic fields as

E ≡ 〈e〉,   B ≡ 〈b〉,

where the averaging procedure is defined by 〈f(x)〉 ≡ ∫ d³x′ f(x − x′) g(x′), and g is a weight function that is unit normalized, ∫ d³x g(x) = 1, and decays over sufficiently large regions in space. Since the averaging procedure commutes with taking derivatives w.r.t. both space and time, ∂t,x E = 〈∂t,x e〉 (and the same for B), the averaged Maxwell equations read as

∇ · E = 4π〈ρµ〉,   ∇ × B − (1/c) ∂tE = (4π/c) 〈jµ〉,

∇ × E + (1/c) ∂tB = 0,   ∇ · B = 0.
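The statement that averaging commutes with differentiation can be checked in a 1D sketch (an illustration added here; the Gaussian weight, its width, the test function, and the grid are all arbitrary choices):

```python
import math

# 1D sketch of the averaging procedure <f>(x) = int dx' f(x - x') g(x'):
# with a unit-normalized Gaussian weight g, averaging commutes with
# differentiation, d/dx <f> = <f'>, checked here numerically.
S = 0.5                      # width of the weight function (arbitrary)

def g(x):
    return math.exp(-x * x / (2 * S * S)) / (S * math.sqrt(2 * math.pi))

def f(x):
    return math.sin(3 * x) + 0.2 * x

def fprime(x):
    return 3 * math.cos(3 * x) + 0.2

def avg(func, x, half=6.0, n=100_000):
    # midpoint-rule convolution int dx' func(x - x') g(x')
    dx = 2 * half / n
    return sum(func(x - (-half + (i + 0.5) * dx)) * g(-half + (i + 0.5) * dx)
               for i in range(n)) * dx

x0, h = 0.4, 1e-4
lhs = (avg(f, x0 + h) - avg(f, x0 - h)) / (2 * h)   # d/dx <f>
rhs = avg(fprime, x0)                               # <f'>
assert abs(lhs - rhs) < 1e-6
```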

2.1.1 Averaged sources

To better understand the effect of the averaging on the sources, we need to take a (superficial) look into the atomic structure of a typical solid. Generally, two different types of charges in solids have to be distinguished: first, there are charges — nuclei and valence electrons — that are displaced by no more than a vector am,j from the center coordinate xm of a molecule (or atom, for that matter). Second, (in metallic systems) there are free conduction electrons, which may abide at arbitrary coordinates xi(t) in the system. Denoting the two contributions by ρb and ρf, respectively, we have ρµ = ρb + ρf, where ρb(x, t) = Σm,j qm,j δ(x − xm(t) − am,j(t)) and ρf = qe Σi δ(x − xi(t)). Here, qm,j is the charge of the jth constituent of the mth molecule and qe the electron charge.


Averaged charge density

The density of the bound charges, averaged over macroscopic proportions, is given by

〈ρb(x)〉 = ∫ d³x′ g(x′) Σm,j qm,j δ(x − x′ − xm − am,j) = Σm,j g(x − xm − am,j) qm,j ≃

≃ Σm qm g(x − xm) − ∇Σm g(x − xm) · dm = Σm qm 〈δ(x − xm)〉 − ∇ · P(x),

with dm ≡ Σj am,j qm,j.

In the second line, we used the fact that the range over which the weight function g changes exceeds atomic extensions by far to Taylor expand to first order in the offsets a. The zeroth-order term of the expansion is oblivious to the molecular structure, i.e. it contains only the total charge qm = Σj qm,j of the molecules and their center positions. The first-order term contains the molecular dipole moments, dm ≡ Σj am,j qm,j. By the symbol²

P(x, t) ≡ 〈Σm δ(x − xm(t)) dm(t)〉,   (2.1)

we denote the average polarization of the medium. Evidently, P is a measure of the density of the dipole moments carried by the molecules in the system.

The average density of the mobile carriers in the system is given by 〈ρf(x, t)〉 = 〈qe Σi δ(x − xi(t))〉, so that we obtain the average charge density as

〈ρµ(x, t)〉 = ρav(x, t) − ∇ · P(x, t),   (2.2)

where we defined the effective or macroscopic charge density in the system,

ρav(x, t) ≡ 〈Σm qm δ(x − xm(t)) + qe Σi δ(x − xi(t))〉.   (2.3)

Notice that the molecules/atoms present in the medium enter the quantity ρav(x, t) as point-like entities. Their finite polarizability is accounted for by the second term contributing to 〈ρµ〉. Also notice that most solids are electrically neutral on average, i.e. the positive charge of ions accounted for in the first term in (2.3) cancels against the negative charge of mobile electrons (the second term), meaning that ρav(x, t) = 0. Under these circumstances,

〈ρµ(x, t)〉 = −∇ ·P(x, t). (2.4)

² Here, we allow for dynamical atomic or molecular center coordinates, xm = xm(t). In this way, the definition of P becomes general enough to encompass non-crystalline or liquid substances in which atoms/molecules are not tied to a crystalline matrix.


where P is given by the averaged polarization (2.1). The electric field E in the medium is caused by the sum of the averaged microscopic density, 〈ρµ〉, and an ’external’ charge density,³

∇ · E = 4π(ρ + 〈ρµ〉),   (2.5)

where ρ is the external charge density.

Averaged current density

Averaging the current density proceeds in much the same way as the charge density procedure outlined above. As the result of a somewhat more tedious calculation (see the info block below) we obtain

〈jµ〉 ≃ jav + Ṗ + c∇ × M,   (2.6)

where

M(x) = 〈Σm δ(x − xm) (1/2c) Σj qm,j am,j × ȧm,j〉   (2.7)

is the average density of magnetic dipole moments in the system. The quantity

jav ≡ 〈qe Σi ẋi δ(x − xi)〉 + 〈Σm qm ẋm δ(x − xm)〉

is the current carried by the free charge carriers and the point-like approximation of the molecules. Much like the average charge density ρav, this quantity vanishes in the majority of cases – solids usually do not support non-vanishing current densities on average, i.e. jav = 0 – meaning that the average of the microscopic current density is given by

〈jµ〉 = Ṗ + c∇ × M.   (2.8)

In analogy to Eq. (2.5), the magnetic induction has the sum of 〈jµ〉 and an ’external’ current density, j, as its source:

∇ × B − (1/c) ∂tE = (4π/c)(〈jµ〉 + j).   (2.9)

INFO To compute the average of the microscopic current density, we decompose the current density into a bound and a free part, jµ = jb + jf. With jb(x) = Σm,j qm,j (ẋm + ȧm,j) δ(x − xm − am,j), the former is averaged as

³ For example, for a macroscopic system placed into a capacitor, the role of the external charges will be taken by the charged capacitor plates, etc.

〈jb(x)〉 = ∫ d³x′ g(x′) Σm,j qm,j (ẋm + ȧm,j) δ(x − x′ − xm − am,j) = Σm,j g(x − xm − am,j)(ẋm + ȧm,j) qm,j ≃

≃ Σm,j qm,j [g(x − xm) − ∇g(x − xm) · am,j] (ẋm + ȧm,j).

We next consider the different orders in a contributing to this expansion in turn. At zeroth order, we obtain

〈jb(x)〉⁽⁰⁾ = Σm qm g(x − xm) ẋm = 〈Σm qm ẋm δ(x − xm)〉,

i.e. the current carried by the molecules in a point-like approximation. The first-order term is given by

〈jb(x)〉⁽¹⁾ = Σm [g(x − xm) ḋm − (∇g(x − xm) · dm) ẋm].

The form of this expression suggests to compare it with the time derivative of the polarization vector,

Ṗ = dt Σm g(x − xm) dm = Σm [g(x − xm) ḋm − (∇g(x − xm) · ẋm) dm].

The difference of these two expressions is given by X ≡ 〈jb(x)〉⁽¹⁾ − Ṗ = Σm ∇g(x − xm) × (dm × ẋm). By a dimensional argument, one may show that this quantity is negligibly small: the magnitude |ẋm| ∼ v is of the order of the typical velocity of the atomic or electronic compounds inside the solid. We thus obtain the rough estimate |X(q, ω)| ∼ v|(∇P)(q, ω)| ∼ vq|P|, where in the second step we generously neglected the differences in the vectorial structure of P and X, resp. However, |Ṗ(q, ω)| = ω|P(q, ω)|. This means that |X|/|Ṗ| ∼ vq/ω ∼ v/c is of the order of the ratio of typical velocities of non-relativistic matter and the speed of light. This ratio is so small that it (a) over-compensates the crudeness of our estimates above by far and (b) justifies neglecting the difference X. We thus conclude that

〈jb〉⁽¹⁾ ≃ Ṗ.

The second-order term contributing to the average current density is given by

〈jb〉⁽²⁾ = −Σm,j qm,j ȧm,j (am,j · ∇g(x − xm)).

This expression is related to the density of magnetic dipole moments carried by the molecules. The latter is given by (cf. Eq. (??) and Eq. (2.7))

M = (1/2c) Σm,j qm,j am,j × ȧm,j g(x − xm).

As the result of a straightforward calculation, we find that the curl of this expression is given by

∇ × M = (1/c) 〈jb〉⁽²⁾ + dt (1/2c) Σm,j qm,j am,j (am,j · ∇g(x − xm)).


The second term on the r.h.s. of this expression engages the time derivative of the electricquadrupole moments ∼∑ qaaab, a, b = 1, 2, 3 carried by the molecules. The fields generated bythese terms are arguably very weak so that

〈jb〉(2) ' c∇×M.

Adding to the results above the contribution of the free carriers ⟨jf (x)⟩ = qe ⟨∑i ẋi δ(x− xi)⟩, we arrive at Eq. (2.6).

Interpretation of the sources I: charge and current densities

Eqs. (2.4) and (2.8) describe the spatial average of the microscopic sources of electromagnetic fields in an extended medium. The expressions appearing on the r.h.s. of these equations can be interpreted in two different ways, both useful in their own right. One option is to regard the r.h.s. of, say, Eq. (2.4) as an effective charge density. This density obtains as the divergence of a vector field P describing the average polarization in the medium (cf. Eq. (2.1)).

To understand the meaning of this expression, consider the cartoon of a solid shown in Fig. 2.2. Assume that the microscopic constituents of the medium are 'polarized' in a certain direction.4 Formally, this polarization is described by a vector field P, as indicated by the horizontal cut through the medium in Fig. 2.2. In the bulk of the medium, the 'positive' and 'negative' ends of the molecules carrying the polarization mutually neutralize and no net charge density remains. (Remember that the medium is assumed to be electrically neutral on average.) However, the boundaries of the system carry layers of uncompensated positive and negative polarization, and this is where a non-vanishing charge density will form (indicated by a solid and a dashed line at the system boundaries). Formally, the vector field P begins and ends (has its 'sources') at the boundaries, i.e. ∇ · P is non-vanishing at the boundaries, where it represents the effective surface charge density. This explains the appearance of ∇ · P as a source term in the Maxwell equations.

Dynamical changes of the polarization, ∂tP ≠ 0 (due to, for example, fluctuations in the orientation of polar molecules), correspond to the movement of charges parallel to P. The rate ∂tP of charge movement is a current flow in P direction.5 Thus, ∂tP must appear as a bulk current source in the Maxwell equations.

Finally, a non-vanishing magnetization M (cf. the vertical cut through the system) corresponds to the flow of circular currents in a plane perpendicular to M. For homogeneous M, the currents mutually cancel in the bulk of the system, but a non-vanishing surface current remains. Formally, this surface current density is represented by the curl of M (Exercise: compute the curl of a vector field that is perpendicular to the cut plane and homogeneous across the system.), i.e. c∇×M is a current source in the effective Maxwell equations.

4 Such a polarization may form spontaneously, or, more frequently, be induced externally, cf. the info block below.

5 Notice that a current flow j = ∂tP is not in conflict with the condition of electro-neutrality.


Figure 2.2: Cartoon illustrating the origin of (surface) charge and current densities forming in a solid with non-vanishing polarization and/or magnetization.

Interpretation of the sources II: fields

Above, we have interpreted ∇ · P, ∂tP and ∇×M as charge and current sources in the Maxwell equations. But how do we know these sources? It stands to reason that the polarization, P, and magnetization, M, of a bulk material will sensitively depend on the electric field, E, and the magnetic field, B, present in the system. For in the absence of such fields, E = 0, B = 0, only few materials will 'spontaneously' build up a non-vanishing polarization or magnetization.6 Usually, it takes an external electric field to orient the microscopic constituents of a solid so as to add up to a bulk polarization. Similarly, a magnetic field is usually needed to generate a bulk magnetization.

This means that the solution of the Maxwell equations poses us with a self-consistency problem: An external charge distribution, ρext, immersed into a solid gives rise to an electric field. This field in turn creates a bulk polarization, P[E], which in turn alters the charge distribution, ρ → ρeff[E] ≡ ρext − ∇ · P[E]. (The notation indicates that the polarization depends on E.) Our task then is to self-consistently solve the equation ∇ · E = 4πρeff[E], with a charge distribution that depends on the sought-for electric field E. Similarly, the current distribution j = jeff[B,E] in a solid depends on the electric and the magnetic field, which means that 'Ampère's law', too, poses a self-consistency problem.

In this section we re-interpret polarization and magnetization in a manner tailor-made for the self-consistent solution of the Maxwell equations. We start by substituting Eq. (2.2) into (2.5). Rearranging terms, we obtain

∇ · (E + 4πP) = 4πρ.

6 Materials disobeying this rule are called ferroelectric (spontaneous formation of polarization) or ferromagnetic (spontaneous magnetization), respectively. In spite of their relative scarcity, ferroelectric and ferromagnetic materials are of huge applied importance. Application fields include storage and display technology, sensor technology, general electronics, and, of course, magnetism.


This equation suggests the introduction of a field

D ≡ E + 4πP. (2.10)

The source of this displacement field is the external charge density,

∇ ·D = 4πρ, (2.11)

(while the electric field has the spatial average of the full microscopic charge system as its source, ∇ · E = 4π⟨ρmic⟩.) Notice the slightly different perspectives from which Eqs. (2.5) and (2.10) interpret the phenomenon of matter polarization. Eq. (2.5) places the emphasis on the intrinsic charge density −4π⟨ρmic⟩. The actual electric field, E, is caused by the sum of external charges and polarization charges. In Eq. (2.10), we consider a fictitious field D, whose sources are purely external. The field D differs from the actual electric field E by (4π×) the polarization, P, building up in the system. Of course, the two alternative interpretations of polarization are physically equivalent. (Think about this point.)

The magnetization, M, can be treated in similar terms: Substituting Eq. (2.6) into (2.9), we obtain

∇× (B− 4πM) − (1/c) ∂t(E + 4πP) = (4π/c) j.

The form of this equation motivates the definition of the magnetic field

H = B− 4πM. (2.12)

Expressed in terms of this quantity, the Maxwell equation assumes the form

∇×H − (1/c) ∂tD = (4π/c) j, (2.13)

where j is the external current density. According to this equation, H plays a role similar to D: the sources of the magnetic field H are purely external. The field H differs from the induction B (i.e. the field a magnetometer would actually measure in the system) by (−4π×) the magnetization M. In conceptual analogy to the electric field, E, the induction B has the sum of external and intrinsic charge and current densities as its source (cf. Eq. (2.9)).

Electric and magnetic susceptibility

To make progress with Eqs. (2.10) and (2.11), we need to say more about the polarization. Above, we have anticipated that a finite polarization is usually induced by an electric field, i.e. P = P[E]. Now, external electric fields triggering a non-vanishing polarization are usually much weaker than the intrinsic microscopic fields keeping the constituents of a solid together. In practice, this means that the polarization may be approximated by a linear functional of the


electric field. The most general form of a linear functional reads as7

P(x, t) = (1/(2π)4) ∫ d3x′ ∫ dt′ χ(x, t;x′, t′) E(x′, t′), (2.14)

where the integral kernel χ is called the electric susceptibility of the medium and the factor (2π)−4 has been introduced for convenience. The susceptibility χ(x, t;x′, t′) describes how an electric field amplitude at x′ and t′ < t affects the polarization at (x, t).8 Notice that the polarization obtains by convolution of χ and E. In Fourier space, the equation assumes the simpler form P(q) = χ(q)E(q), where q = (ω/c,q) as usual. Accordingly, the electric field and the displacement field are connected by the linear relation

D(q) = ε(q)E(q), ε(q) = 1 + 4πχ(q), (2.15)

where the function ε(q) is known as the dielectric function.
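The step from the convolution (2.14) to the Fourier-space product P(q) = χ(q)E(q) is the convolution theorem. It can be illustrated on a discrete periodic grid (an illustrative sketch; the profiles chosen for χ and E are arbitrary test functions, not taken from the text):

```python
import numpy as np

# Convolution theorem on a periodic 1D grid: a circular convolution in real
# space equals the inverse FFT of the pointwise product of the FFTs.
N = 256
x = np.linspace(0, 2*np.pi, N, endpoint=False)
chi = np.exp(-np.cos(x))            # a smooth "susceptibility" kernel
E = np.sin(3*x) + 0.5*np.cos(7*x)   # a "field" profile

# Circular convolution computed directly: (chi*E)[n] = sum_m chi[m] E[(n-m) mod N]
P_direct = np.array([sum(chi[m]*E[(n - m) % N] for m in range(N)) for n in range(N)])
# ... and via the Fourier product
P_fft = np.fft.ifft(np.fft.fft(chi) * np.fft.fft(E)).real

assert np.allclose(P_direct, P_fft)
```

The same mechanism, in the continuum and in four variables (x, t), turns (2.14) into the algebraic relation (2.15).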

INFO On a microscopic level, at least two different mechanisms causing field-induced polarization need to be distinguished: Firstly, a finite electric field may cause a distortion of the electron clouds surrounding otherwise unpolarized atoms or molecules. The relative shift of the electron clouds against the nuclear centers causes a finite molecular dipole moment which integrates to a macroscopic polarization P. Secondly, in many substances (mostly of organic chemical origin), the molecules (a) carry a permanent intrinsic dipole moment and (b) are free to move. A finite electric field will cause spatial alignment of the otherwise disoriented molecules. This leads to a polarization of the medium as a whole.

Information on the frequency/momentum dependence of the dielectric function has to be inferred from outside electrodynamics, typically from condensed matter physics or plasma physics.9 In many cases of physical interest — e.g. in the physics of plasmas or the physics of metallic systems — the dependence of ε on both q and ω has to be taken seriously. Sometimes, only the frequency dependence, or the dependence on the spatial components of momentum, is of importance. However, the crudest approximation of all (often adopted in this course) is to approximate the function ε(q) ≃ ε by a constant. (This amounts to approximating χ(x, t;x′, t′) ∝ δ(x− x′)δ(t− t′) by a function infinitely short ranged in space and time.) If not mentioned otherwise, we will thus assume

D = εE,

where ε is a material constant.

7 Yet more generally, one might allow for a non-colinear dependence of the polarization on the field vector,

Pi(x, t) = (1/(2π)4) ∫ d3x′ ∫ dt′ χij(x, t;x′, t′) Ej(x′, t′),

where χ = {χij} is a 3× 3 matrix field.
8 By causality, χ(x, t;x′, t′) ∝ Θ(t− t′).
9 A plasma is a gas of charged particles.


As with the electric sector of the theory, the magnetic fields created by externally imposed macroscopic current distributions (a) 'magnetize' solids, i.e. generate finite fields M, and (b) are generally much weaker than the intrinsic microscopic fields inside a solid. This means that we can write B = H + 4πM[H], where M[H] may be assumed to be linear in H. In analogy to Eq. (2.14), we thus define

M(x, t) = (1/(2π)4) ∫ d3x′ ∫ dt′ χm(x, t;x′, t′) H(x′, t′), (2.16)

where the function χm is called the magnetic susceptibility of the system. The magnetic susceptibility describes how a magnetic field at (x′, t′) causes magnetization at (x, t). Everything we said above about the electric susceptibility applies in a similar manner to the magnetic susceptibility. Specifically, we have the Fourier space relation M(q) = χm(q)H(q) and

B(q) = µ(q)H(q), µ(q) = 1 + 4πχm(q), (2.17)

where µ(q) is called the magnetic permeability. In cases where the momentum dependence of the susceptibility is negligible, we obtain the simplified relation

B = µH, (2.18)

where µ is a constant. As in the electric case, a number of different microscopic mechanisms causing deviations of µ from unity can be distinguished. Specifically, for molecules carrying no intrinsic magnetic moment, the presence of an external magnetic field may induce molecular currents and, thus, a non-vanishing magnetization. This magnetization is generally directed opposite to the external field, i.e. µ < 1. Substances of this type are called diamagnetic. (Even for the most diamagnetic substance known, bismuth, 1 − µ = O(10−4), i.e. diamagnetism is a very weak effect.) For molecules carrying a non-vanishing magnetic moment, an external field will cause alignment of the latter and, thus, an effective amplification of the field, µ > 1. Materials of this type are called paramagnetic. Typical values of paramagnetic permeabilities are of the order of µ− 1 = O(10−5)−O(10−2).

Before leaving this section, it is worthwhile to take a final look at the general structure of the Maxwell equations in matter. We note that:

. The averaged charge densities and currents are sources of the electric displacement field D and the magnetic field H, respectively.

. These fields are related to the electric field E and the magnetic induction B by Eqs. (2.15) and (2.17), respectively.

. The actual fields one would measure in an experiment are E and B; these fields determine the coupling (fields ↔ matter) as described by the Lorentz force law.

. Similarly, the (matter-unrelated) homogeneous Maxwell equations contain the fields E and B.


2.2 Applications of macroscopic electrodynamics

In this section, we will explore a number of phenomena caused by the joint presence of matter and electromagnetic fields. The organization of the section parallels that of the vacuum part of the course, i.e. we will begin by studying static electric phenomena, then advance to the static magnetic case, and finally discuss a few applications in electrodynamics.

2.2.1 Electrostatics in the presence of matter *

Generalities

Consider the macroscopic Maxwell equations of electrostatics

∇ ·D = 4πρ,

∇× E = 0,

D = εE, (2.19)

where in the last equation we assumed a constant dielectric function ε for simplicity. Now assume that we wish to explore the fields in the vicinity of a boundary separating two regions containing different types of matter. In general, the dielectric constants characterizing the two domains will be different, i.e. we have D = εE in region #1 and D = ε′E in region #2. (For ε′ = 1, we describe the limiting case of a matter/vacuum interface.) Optionally, the system boundary may carry a finite surface charge density η.

Arguing as in section ??, we find that the first and the second of the Maxwell equations imply the boundary conditions10

(D(x)−D′(x)) · n(x) = 0,

(E(x)− E′(x))× n(x) = 0, (2.20)

where the unprimed/primed quantities refer to the fields in regions #1 and #2, respectively, and n(x) is a unit vector normal to the boundary at x. Thus,

the tangential component of the electric field and the normal component of the displacement field are continuous at the interface.

Example: Dielectric sphere in a homogeneous electric field

To illustrate the computation of electric fields in static environments with matter, consider the example of a massive sphere of radius R and dielectric constant ε placed in a homogeneous external displacement field D. We wish to compute the electric field E in the entire medium.

10 If the interface between #1 and #2 carries external surface charges, the first of these equations generalizes to (D(x)−D′(x)) · n(x) = 4πη(x), where η is the surface charge density. In this case, the normal component of the displacement field jumps by an amount set by the surface charge density.


Since there are no charges present, the electric potential inside and outside the sphere obeys the Laplace equation ∆φ = 0. Further, choosing polar coordinates such that the z-axis is aligned with the orientation of the external field, we expect the potential to be azimuthally symmetric, φ(r, θ, φ) = φ(r, θ). Expressed in terms of this potential, the boundary equations (2.20) assume the form

ε ∂rφ|r=R−0 = ∂rφ|r=R+0, (2.21)
∂θφ|r=R−0 = ∂θφ|r=R+0, (2.22)
φ|r=R−0 = φ|r=R+0, (2.23)

where R± 0 ≡ limη→0 R± η. To evaluate these conditions, we expand the potential inside and outside the sphere in a series of spherical harmonics, (??). Due to the azimuthal symmetry of the problem, only φ-independent functions Yl,0 = ((2l+ 1)/4π)^1/2 Pl contribute, i.e.

φ(r, θ) = ∑l Pl(cos θ) × { Al r^l, r < R,
                           Bl r^l + Cl r^−(l+1), r ≥ R, }

where we have absorbed the normalization factors ((2l+ 1)/4π)^1/2 in the expansion coefficients Al, Bl, Cl.

At spatial infinity, the potential must approach the potential φ = −Dz = −Dr cos θ = −Dr P1(cos θ) of the uniform external field D = Dez. Comparison with the series above shows that B1 = −D and Bl≠1 = 0. To determine the as yet unknown coefficients Al and Cl, we consider the boundary conditions above. Specifically, Eq. (2.21) translates to the condition

∑l Pl(cos θ) ( εl Al R^(l−1) + (l + 1) Cl R^−(l+2) + D δl,1 ) = 0.

Now, the Legendre polynomials are a complete set of functions, i.e. the vanishing of the l.h.s. of the equation implies that all series coefficients must vanish individually:

C0 = 0,
εA1 + 2R^−3 C1 + D = 0,
εl Al R^(l−1) + (l + 1) Cl R^−(l+2) = 0, l > 1.

A second set of equations is obtained from Eq. (2.22): ∂θ ∑l Pl(cos θ)((Al − Bl)R^l − Cl R^−(l+1)) = 0 or ∑l>0 Pl(cos θ)((Al − Bl)R^l − Cl R^−(l+1)) = const. The second condition implies (think why!) (Al − Bl)R^l − Cl R^−(l+1) = 0 for all l > 0. Substituting this condition into Eq. (2.23), we finally obtain A0 = 0. To summarize, the expansion coefficients are determined by the set of equations

A0 = 0,
(A1 + D)R − C1 R^−2 = 0,
Al R^l − Cl R^−(l+1) = 0, l > 1.



Figure 2.3: Polarization of a sphere by an external electric field.

It is straightforward to verify that these equations are equivalent to the conditions

C1 = DR^3 (ε− 1)/(ε+ 2), A1 = −3D/(ε+ 2),

while Cl>1 = Al>1 = 0. The potential distorted by the presence of the sphere is thus given by

φ(r, θ) = −rE0 cos θ × { 3/(ε+ 2), r < R,
                         1 − ((ε− 1)/(ε+ 2)) (R/r)^3, r ≥ R. (2.24)
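The l = 1 coefficients entering (2.24) can be cross-checked by solving the corresponding pair of linear matching conditions numerically (a sketch; the values of ε, R and D are arbitrary test numbers, not from the text):

```python
import numpy as np

# Solve the two l = 1 matching conditions,
#   eps*A1 + 2 C1/R^3 + D = 0   and   (A1 + D) R - C1/R^2 = 0,
# and compare with the closed form C1 = D R^3 (eps-1)/(eps+2), A1 = -3D/(eps+2).
eps, R, D = 3.0, 2.0, 1.5        # arbitrary test values

M = np.array([[eps, 2.0/R**3],
              [R,  -1.0/R**2]])
rhs = np.array([-D, -D*R])
A1, C1 = np.linalg.solve(M, rhs)

assert np.isclose(C1, D*R**3*(eps - 1)/(eps + 2))
assert np.isclose(A1, -3*D/(eps + 2))
```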

The result (2.24) affords a very intuitive physical interpretation:

. Inside the sphere, the electric field E = −∇φ = (3/(ε+2)) E0 ez is parallel to the external electric field, but weaker in magnitude (as long as ε > 1, which usually is the case). This indicates the buildup of a polarization P = ((ε−1)/4π) E = (3/4π) ((ε−1)/(ε+2)) E0.

. Outside the sphere, the electric field is given by a superposition of the external field and the field generated by a dipole potential φ = d · x/r^3 = d cos θ/r^2, where the dipole moment is given by d = VP and V = 4πR^3/3 is the volume of the sphere.

To understand the origin of this dipole moment, notice that in a polarizable medium, and in the absence of external charges, 0 = ∇ ·D = ∇ · E + 4π∇ ·P, while the microscopic charge density ∇ · E = 4πρmic need not necessarily vanish. In exceptional cases where ρmic does not vanish upon averaging, we denote by ⟨ρmic⟩ = ρpol the polarization charge. The polarization charge is determined by the condition

∇ ·P = −ρpol.

Turning back to the sphere, the equation 0 = ∇ ·D = (ε/χ)∇ ·P tells us that there is no polarization charge in the bulk of the sphere (nor, of course, outside the sphere).


However, arguing as in section ??, we find that the sphere (or any boundary between two media with different dielectric constants) carries a surface polarization charge

ηind = P′n − Pn.

Specifically, in our problem, P′n = 0 so that η = −Pn = −(3/4π)((ε−1)/(ε+2)) E0 cos θ. Qualitatively, the origin of this charge is easy enough to understand: while in the bulk of the sphere the induced dipole moments cancel each other upon spatial averaging (see the figure), uncompensated charges remain at the surface of the medium. These surface charges induce a finite and quite macroscopic dipole moment which is aligned opposite to E0 and acts as a source of the electric field (but not of the displacement field).

2.2.2 Magnetostatics in the presence of matter

Generalities

The derivation of 'magnetic boundary conditions' in the presence of materials with a finite permeability largely parallels the electric case: the macroscopic equations of magnetostatics read as

∇×H = (4π/c) j,
∇ ·B = 0,
H = µ−1 B. (2.25)

Again, we wish to explore the behaviour of the fields in the vicinity of a boundary between two media with different permeabilities. Proceeding in the usual manner, i.e. by application of infinitesimal variants of Gauss's and Stokes's laws, respectively, we derive the equations

(B(x)−B′(x)) · n = 0,
(H(x)−H′(x))× n = (4π/c) js, (2.26)

where js is an optional surface current.

Example: paramagnetic sphere in a homogeneous magnetic field

Consider a sphere of radius R and permeability µ > 1 in the presence of an external magnetic field H. We wish to compute the magnetic induction B inside and outside the sphere. In principle, this can be done as in the analogous problem of the dielectric sphere, i.e. by Legendre polynomial series expansion. However, comparing the magnetic Maxwell equations to the electric Maxwell equations considered earlier,

∇ ·B = 0 ↔ ∇ ·D = 0,

∇×H = 0 ↔ ∇× E = 0,

B = µH ↔ D = εE,

P = ((ε− 1)/4π) E ↔ M = ((µ− 1)/4π) H,


we notice that there is actually no need to do so: Formally identifying B↔ D, H↔ E, µ↔ ε, M↔ P, the two sets of equations become equivalent, and we may just copy the solution obtained above. Thus (i) the magnetic field outside the sphere is a superposition of the external field and a magnetic dipole field, where (ii) the dipole moment M = (3/4π)((µ−1)/(µ+2)) H is parallel to the external field. This magnetic moment is caused by the orientation of the intrinsic moments along the external field axis; consequently, the actually felt magnetic field (the magnetic induction) B = H + 4πM exceeds the external field.

2.3 Wave propagation in media

What happens if an electromagnetic wave impinges upon a medium characterized by a non-trivial dielectric function ε?11 And how does the propagation behaviour of waves relate to actual physical processes inside the medium? To prepare the discussion of these questions, we will first introduce a physically motivated model for the dependence of the dielectric function on its arguments. This modelling will establish a concrete link between the amplitude of the electromagnetic waves and the dynamics of the charge carriers inside the medium.

2.3.1 Model dielectric function

In Fourier space, the dielectric function ε(q, ω) is a function of both wave vector and frequency. It owes its frequency dependence to the fact that a wave in matter creates excitations of the molecular degrees of freedom. The feedback of these excitations into the electromagnetic wave is described by the frequency dependence of the function ε. To get an idea of the relevant frequency scales, notice that typical excitation energies in solids are of the order ℏω < 1 eV. Noting that Planck's constant ℏ ≃ 0.6× 10−15 eVs, we find that the characteristic frequencies at which solids dominantly respond to electromagnetic waves are of the order ω < 10^15 s−1. However, the wavelengths corresponding to these frequencies, λ = 2π/|q| ∼ 2πc/ω ∼ 10−6 m, are much larger than the range of a few interatomic spacings over which we expect the non-local feedback of the external field into the polarization to be 'screened' (the range of the susceptibility χ). This means (think why!) that the function ε(q, ω) ≃ ε(ω) is largely momentum-independent at the momentum scales of physical relevance.

To obtain a crude model for the function ε(ω), we imagine that the electrons surrounding the molecules inside the solid are harmonically bound to the nuclear center coordinates. An individual electron at spatial coordinate x (measured w.r.t. a nuclear position) will then be subject to the equation of motion

m(dt² + ω0² + γ dt) x(t) = eE(t), (2.27)

where ω0 is the characteristic frequency of the oscillator motion of the electron, γ a relaxation constant, and the variation of the external field over the extent of the molecule has been

11 We emphasize the dielectric function because, as shall be seen below, the magnetic permeability has a much lesser impact on electromagnetic wave propagation.


neglected (cf. the discussion above). Fourier transforming this equation, we obtain d ≡ ex(ω) = e²E(ω)m−1/(ω0² − ω² − iγω). The dipole moment d(ω) of a molecule with fi electrons oscillating at frequency ωi and damping rate γi is given by d(ω) = ∑i fi d(ω0 → ωi, γ → γi). Denoting the density of molecules in the medium by n, we thus obtain (cf. Eqs. (2.1), (2.10), and (2.15))

ε(ω) = 1 + (4πne²/m) ∑i fi/(ωi² − ω² − iγiω). (2.28)
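The oscillator response underlying (2.28) can be tested directly: integrating the equation of motion numerically, the steady-state amplitude should match the Fourier-space result |x(ω)| = (e/m)E0/|ω0² − ω² − iγω|. A sketch (variable names and test values are ours, arbitrary units):

```python
import numpy as np

# Integrate x'' + gamma x' + w0^2 x = (e/m) E0 cos(w t) with RK4 and compare
# the long-time oscillation amplitude with the Lorentz-oscillator formula.
e_over_m, E0, w0, gamma, w = 1.0, 1.0, 2.0, 0.3, 1.5

def acc(t, x, v):
    # acceleration from driving, damping and harmonic binding
    return e_over_m * E0 * np.cos(w * t) - gamma * v - w0**2 * x

dt, t, x, v = 2e-3, 0.0, 0.0, 0.0
steady = []
while t < 120.0:
    # classical RK4 step for the first-order system (x' = v, v' = acc)
    k1x, k1v = v, acc(t, x, v)
    k2x, k2v = v + dt/2*k1v, acc(t + dt/2, x + dt/2*k1x, v + dt/2*k1v)
    k3x, k3v = v + dt/2*k2v, acc(t + dt/2, x + dt/2*k2x, v + dt/2*k2v)
    k4x, k4v = v + dt*k3v, acc(t + dt, x + dt*k3x, v + dt*k3v)
    x += dt/6*(k1x + 2*k2x + 2*k3x + k4x)
    v += dt/6*(k1v + 2*k2v + 2*k3v + k4v)
    t += dt
    if t > 80.0:                 # transients (decay rate gamma/2) are long gone
        steady.append(x)

amp_numeric = 0.5 * (max(steady) - min(steady))
amp_theory = e_over_m * E0 / abs(w0**2 - w**2 - 1j*gamma*w)
assert abs(amp_numeric - amp_theory) < 0.01 * amp_theory
```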

With suitable quantum mechanical definitions of the material constants fi, ωi and γi, the model dielectric function (2.28) provides an accurate description of the molecular contribution to the dielectric behaviour of solids. A schematic of the typical behaviour of the real and imaginary parts of the dielectric function is shown in Fig. 2.4. A few remarks on the profile of these functions:

. The material constant ε employed in the 'static' sections of this chapter is given by ε = ε(0). The imaginary part of the zero frequency dielectric function is negligible.

. For reasons to become clear later on, we call the function

α(ω) = (2ω/c) Im √(µε(ω)) (2.29)

the (frequency dependent) absorption coefficient and

n(ω) = Re √(µε(ω)) (2.30)

. For most frequencies, i.e. everywhere except for the immediate vicinity of a resonant frequency ωi, the absorption coefficient is negligibly small. In these regions, (d/dω) Re ε(ω) > 0. The positivity of this derivative defines a region of normal dispersion.

. In the immediate vicinity of a resonant frequency, |ω − ωi| < γi/2, the absorption coefficient shoots up while the derivative (d/dω) Re ε(ω) < 0 changes sign. As we shall discuss momentarily, the physics of these frequency windows of anomalous dispersion is governed by resonant absorption processes.

2.3.2 Plane waves in matter

To study the impact of the non-uniform dielectric function on the behaviour of electromagnetic waves in matter, we consider the spatio-temporal Fourier transform of the Maxwell equations in a medium free of extraneous sources, ρ = 0, j = 0. Using that D(k) = ε(k)E(k) and



Figure 2.4: Schematic of the functional profile of the function ε(ω). Each resonance frequency ωi is the center of a peak in the imaginary part of ε and of a resonant swing in the real part. The width of these structures is determined by the damping rate γi.

H(k) = µ−1(k)B(k), where k = (ω/c,k), we have

k · (εE) = 0,
k× (µ−1B) + (ω/c)(εE) = 0,
k× E − (ω/c) B = 0,
k ·B = 0,

where we omitted momentum arguments for notational clarity. We next compute k× (third equation) + (ωµ/c)× (second equation) and (ωε/c)× (third equation) − k× (second equation) to obtain the Fourier representation of the wave equation

(k² − ω²/c²(ω)) × { E(k, ω), B(k, ω) } = 0, (2.31)

where

c(ω) = c/√(µε(ω)), (2.32)

is the effective velocity of light in matter. Comparing with our discussion of section 1.4, we conclude that a plane electric wave in matter is mathematically described by the function

E(x, t) = E0 exp(i(k · x− ωt)),


where k = kn and k = ω/c(ω) is a (generally complex) function of the wave frequency. Specifically, for Im (k · x) > 0, the wave will be exponentially damped. Splitting the wave number k into its real and imaginary parts, we have

k = Re k + i Im k = (ω/c) n(ω) + (i/2) α(ω),

where n and α are the refraction and absorption coefficients, respectively. To better understand the meaning of the absorption coefficient, let us compute the Poynting vector of the electromagnetic wave. As in section 1.4, the Maxwell equation 0 = ∇ ·D = ε∇ · E ∝ k · E0 implies transversality of the electric field. (Here we rely on the approximate x-independence of the dielectric function.) Choosing coordinates such that n = e3 and assuming planar polarization for simplicity, we set E0 = E0e1 with a real coefficient E0. The — physically relevant — real part of the electric field is thus given by

ReE = E0e1 cos(ω(zn/c− t))e−αz/2.
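The consistency of the splitting k = (ω/c)n + (i/2)α with k = ω√ε/c, and the resulting e^(−αz/2) field envelope, is easily checked numerically (a sketch with µ = 1; the frequency and the complex ε are arbitrary test values):

```python
import numpy as np

# Check k = (w/c) n + (i/2) alpha for k = w sqrt(eps)/c (mu = 1), and that
# the intensity envelope |e^{ikz}|^2 decays as e^{-alpha z}.
c = 3e10                    # speed of light, cm/s (Gaussian units)
w = 1.2e15                  # an optical-range frequency, 1/s
eps = 2.25 + 0.1j           # complex dielectric constant (test value)

k = w * np.sqrt(eps) / c            # complex wave number
n = np.sqrt(eps).real               # refraction coefficient, Eq. (2.30)
alpha = 2*w/c * np.sqrt(eps).imag   # absorption coefficient, Eq. (2.29)

assert np.isclose(k.real, w*n/c)
assert np.isclose(k.imag, alpha/2)

z = 1e-4                            # propagation distance, cm
assert np.isclose(abs(np.exp(1j*k*z))**2, np.exp(-alpha*z))
```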

From the Maxwell equations ∇ · B = 0 and ∇ × E + c−1∂tB = 0, we further obtain (exercise) ReB = E0e2 cos(ω(zn/c− t))e−αz/2. Assuming that the permeability is close to unity, µ ≃ 1 or B ≃ H, we thus obtain for the magnitude of the Poynting vector

|S| = (c/4π) |E×H| = (cE0²/4π) cos²(ω(zn/c− t)) e−αz  →⟨...⟩t  (cE0²/8π) e−αz,

where in the last step we have averaged over several intervals of the oscillation period 2π/ω. According to this result,

The electromagnetic energy current inside a medium decays at a rateset by the absorption coefficient.

This phenomenon is easy enough to interpret: according to our semi-phenomenological model of the dielectric function above, the absorption coefficient is non-vanishing in the immediate vicinity of a resonant frequency ωi, i.e. a frequency at which molecular degrees of freedom oscillate. The energy stored in an electromagnetic wave of this frequency may thus get converted into mechanical oscillator energy. This goes along with a loss of field intensity, i.e. a diminishing of the Poynting vector or, equivalently, a loss of electromagnetic energy density, w.
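The time average producing the factor 1/2 (and hence cE0²/8π) can be confirmed numerically (a minimal sketch):

```python
import numpy as np

# Averaging cos^2 over full oscillation periods gives exactly 1/2,
# independently of the phase offset.
w = 2*np.pi                                          # period 1 in these units
t = np.linspace(0.0, 10.0, 200000, endpoint=False)   # ten full periods
avg = np.mean(np.cos(w*t - 0.7)**2)                  # arbitrary phase offset
assert abs(avg - 0.5) < 1e-9
```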

2.3.3 Dispersion

In the previous section we focused on the imaginary part of the dielectric function and on its attenuating impact on individual plane waves propagating in matter. We next turn to the discussion of the — equally important — role played by the real part. To introduce

Page 59: Classical Theoretical Physics - Alexander Altland

2.3. WAVE PROPAGATION IN MEDIA 55

the relevant physical principles in a maximally simple environment, we will focus on a one–dimensional model of wave propagation throughout. Consider, thus, the one–dimensional variant of the wave equations (2.31),

(k² − ω²/c²(ω)) ψ(k, ω) = 0, (2.33)

where k is a one–dimensional wave ‘vector’ and c(ω) = c/√ε(ω) as before. (For simplicity, we set µ = 1 from the outset.) Plane wave solutions to this equation are given by ψ(x, t) = ψ_0 exp(ik(ω)x − iωt), where k(ω) = ω/c(ω).¹² Notice that an alternative representation of the plane wave configurations reads ψ(x, t) = ψ_0 exp(ikx − iω(k)t), where ω(k) is defined implicitly, viz. as a solution of the equation ω/c(ω) = k.
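The implicit definition of ω(k) can be made concrete numerically: for a model refraction index n(ω), one simply inverts k(ω) = ωn(ω)/c. A minimal sketch, with a hypothetical (monotonic, purely illustrative) n(ω), using bisection:

```python
import numpy as np

c = 1.0

def n_model(omega):
    """Hypothetical smooth refraction index; illustration only."""
    return 1.2 + 0.3 / (1.0 + omega**2)

def k_of_omega(omega):
    return omega * n_model(omega) / c   # k(omega) = omega n(omega)/c

def omega_of_k(k, lo=0.0, hi=100.0):
    """Invert k(omega) = k by bisection; k(omega) is monotonic for this model."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if k_of_omega(mid) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k0 = 3.0
w0 = omega_of_k(k0)
print(w0, k_of_omega(w0))  # k(omega(k0)) reproduces k0
```
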

To understand the consequences of frequency dependent variations in the real part of the dielectric function we need, however, to go beyond the level of isolated plane waves. Rather, let us consider a superposition of plane waves (cf. Eq. (1.42)),

ψ(x, t) = ∫ dk ψ_0(k) e^{i(kx−ω(k)t)}, (2.34)

where ψ_0(k) is an arbitrary weight function. (Exercise: check that this superposition solves the wave equation.)

Specifically, let us choose the function ψ_0(k) such that, at the initial time t = 0, the distribution ψ(x, t = 0) is localized in a finite region of space. Configurations of this type are called wave packets. While there is a lot of freedom in choosing such spatially localizing envelope functions, a convenient (and for our purposes sufficiently general) choice of the weight function ψ_0 is a Gaussian,

ψ_0(k) = ψ_0 exp(−(k − k_0)²ξ²/4),

where ψ_0 ∈ C is a fixed amplitude coefficient and ξ a coefficient of dimensionality ‘length’ that determines the spatial extent of the wave packet (at time t = 0). To check the latter assertion, let us compute the spatial profile ψ(x, 0) of the wave packet at the initial time:

ψ(x, 0) = ψ_0 ∫ dk e^{−(k−k_0)²ξ²/4 + ikx} = ψ_0 e^{ik_0x} ∫ dk e^{−k²ξ²/4 + ikx} =
= ψ_0 e^{ik_0x} e^{−(x/ξ)²} ∫ dk e^{−k²ξ²/4} = (2√π ψ_0/ξ) e^{ik_0x} e^{−(x/ξ)²}.

The function ψ(x, 0) is concentrated in a volume of extension ξ; it describes the profile of a small wave ‘packet’ at initial time t = 0. What will happen to this wave packet as time goes on?
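Before turning to the time evolution, the Gaussian integral for the t = 0 profile is easy to check numerically. The sketch below (illustrative values for ξ, k_0, x) compares the brute-force k–integral with the closed form (2√π ψ_0/ξ) e^{ik_0x} e^{−(x/ξ)²}, here with ψ_0 = 1:

```python
import numpy as np

def trapz(y, x):
    """Trapezoid rule (kept local to avoid NumPy version differences)."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

xi, k0, x = 1.3, 2.0, 0.7   # arbitrary illustrative values

# Numerical k-integral of the Gaussian weight times the plane-wave factor
k = np.linspace(k0 - 40 / xi, k0 + 40 / xi, 400001)
integrand = np.exp(-(k - k0)**2 * xi**2 / 4) * np.exp(1j * k * x)
numeric = trapz(integrand, k)

# Closed form obtained by completing the square
closed_form = (2 * np.sqrt(np.pi) / xi) * np.exp(1j * k0 * x - (x / xi)**2)
print(numeric, closed_form)  # the two agree
```
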

To answer this question, let us assume that the function ω(k) varies only weakly over the extension ∆k ∼ ξ^{−1} of the wave packet in wave–number space. It is, then,

¹²There is also the ‘left moving’ solution, ψ(x, t) = ψ_0 exp(ik(ω)x + iωt), but for our purposes it will be sufficient to consider right moving waves.


Figure 2.5: Left: Dispersive propagation of an initially sharply focused wave packet. Right: A more shallow wave packet suffers less drastically from dispersive deformation.

a good idea to expand ω(k) around k_0:

ω(k) = ω_0 + v_g(k − k_0) + (a/2)(k − k_0)² + …,

where we introduced the abbreviations ω_0 ≡ ω(k_0), v_g ≡ ω′(k_0) and a ≡ ω″(k_0). Substituting this expansion into Eq. (2.34) and doing the Gaussian integral over wave numbers, we obtain

ψ(x, t) = ψ_0 ∫ dk e^{−(k−k_0)²ξ²/4 + i(kx−ω(k)t)} ≃
≃ ψ_0 e^{ik_0(x−v_pt)} ∫ dk e^{−(ξ²+2iat)(k−k_0)²/4 + i(k−k_0)(x−v_gt)} =
= (2√π ψ_0/ξ(t)) e^{ik_0(x−v_pt)} e^{−((x−v_gt)/ξ(t))²},

where we introduced the function ξ(t) = (ξ² + 2iat)^{1/2} and the abbreviation v_p ≡ ω(k_0)/k_0.

Let us try to understand the main characteristics of this result:

. The center of the wave packet moves to the right with a characteristic velocity v_g = ∂_kω(k_0). Inasmuch as v_g determines the effective center velocity of a superposition of a continuum or ‘group’ of plane waves, v_g is called the group velocity of the system. Only in vacuum, where c(ω) = c is independent of frequency, do we have ω = ck, so that the group velocity v_g = ∂_kω = c coincides with the vacuum velocity of light.

A customary yet far less useful notion is that of a phase velocity: an individual plane wave behaves as ∼ exp(i(kx − ω(k)t)) = exp(ik(x − (ω(k)/k)t)). In view of the structure of the exponent, one may call v_p ≡ ω(k)/k the ‘velocity’ of the wave. Since, however, the wave extends over the entire system anyway, the phase velocity is not of direct physical relevance. Notice that the phase velocity v_p = ω(k)/k = c(ω) = c/n(ω), where the refraction coefficient has been introduced in (2.30). For most frequencies and in almost all substances, n > 1 ⇒ v_p < c. In principle, however, the phase velocity


may become larger than the speed of light.¹³ The group velocity v_g = ∂_kω(k) = (∂_ωk(ω))^{−1} = c(∂_ω(ωn(ω)))^{−1} = c(n(ω) + ω∂_ωn(ω))^{−1} = v_p(1 + n^{−1}ω∂_ωn)^{−1}. In regions of normal dispersion, ∂_ωn > 0, implying that v_g < v_p. For the discussion of anomalous cases, where v_g may exceed the phase velocity or even the vacuum velocity of light, see [1].
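The two expressions for v_g above are easy to compare numerically. A sketch with a hypothetical model index n(ω) exhibiting normal dispersion (illustration only; the derivative is taken by central differences):

```python
import numpy as np

c = 1.0

def n_model(omega):
    """Hypothetical index with normal dispersion (dn/domega > 0); illustration only."""
    return 1.3 + 0.1 * omega**2 / (1.0 + omega**2)

omega, h = 1.5, 1e-6
dn = (n_model(omega + h) - n_model(omega - h)) / (2 * h)  # numerical dn/domega

v_phase = c / n_model(omega)                         # v_p = c/n
v_group = c / (n_model(omega) + omega * dn)          # v_g = (dk/domega)^(-1)
v_group_alt = v_phase / (1 + omega * dn / n_model(omega))

print(v_phase, v_group)  # normal dispersion: v_g < v_p < c
```
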

. The width of the wave packet, ∆x, changes in time. Inspection of the Gaussian factor controlling the envelope of the packet shows that |ψ| ∼ e^{−(x−v_gt)²(ξ(t)^{−2}+ξ(t)^{∗−2})/2}, where the symbol ‘∼’ indicates that only the exponential dependence of the packet is taken into account. Thus, the effective width of the packet is given by ξ(1 + (2at/ξ²)²)^{1/2}.

According to this formula, the rate at which the width increases (the wave packet flows apart) is the larger, the sharper the initial spatial concentration of the wave packet (cf. Fig. 2.5). For large times, t ≫ ξ²/a, the width of the wave packet increases as ∆x ∼ t. The disintegration of the wave packet is caused by a phenomenon called dispersion: the form of the packet at t = 0 is obtained by (Fourier) superposition of a large number of plane waves. If all these plane waves propagated with the same phase velocity (i.e. if ω(k)/k was k–independent), the form of the wave packet would remain unchanged. However, due to the fact that in a medium plane waves of different wave number k generally propagate at different velocities, the Fourier spectrum of the packet at non–zero times differs from that at time zero, which in turn means that the integrity of the form of the wave packet gets gradually lost.
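The spreading law can be tested against a direct numerical superposition of plane waves. The sketch below (illustrative parameters; the dispersion is truncated at second order around k_0, as in the expansion above, and the overall phase factor is dropped since it does not affect |ψ|) extracts the effective width from the decay of |ψ| away from the packet center and compares it with ξ(1 + (2at/ξ²)²)^{1/2}:

```python
import numpy as np

def trapz(y, x):
    """Trapezoid rule (kept local to avoid NumPy version differences)."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

xi, a, vg, t = 1.0, 0.8, 1.0, 3.0   # arbitrary illustrative values

def psi_envelope(x):
    """Packet at time t by direct superposition over q = k - k0."""
    q = np.linspace(-20 / xi, 20 / xi, 200001)
    integrand = np.exp(-q**2 * (xi**2 + 2j * a * t) / 4 + 1j * q * (x - vg * t))
    return trapz(integrand, q)

# |psi| ~ exp(-((x - vg t)/xi_eff)^2); read off xi_eff from the ratio of |psi|
# at the packet center and at a point a distance d away
d = 3.0
ratio = abs(psi_envelope(vg * t + d)) / abs(psi_envelope(vg * t))
width = d / np.sqrt(-np.log(ratio))

predicted = xi * np.sqrt(1 + (2 * a * t / xi**2)**2)
print(width, predicted)  # the two agree
```
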

The dispersive spreading of electromagnetic wave packets is a phenomenon of immense applied importance. For example, dispersive deformation is one of the main factors limiting the information load that can be pushed through fiber optic cables. (For too high loads, the ‘wave packets’ constituting the bit stream through such fibers begin to overlap, thus losing their identity. The construction of ever more sophisticated countermeasures optimizing the data capacity of optical fibers represents a major stream of applied research.)

2.3.4 Electric conductivity

As a last phenomenon relating to macroscopic electrodynamics, we discuss the electric conduction properties of metallic systems.

Empirically, we know that in a metal a finite electric field will cause current flow. In its most general form, the relation between field and current assumes a form similar to that between field and polarization discussed above:

j_i(x, t) = ∫ d³x′ ∫_{−∞}^{t} dt′ σ_{ij}(x − x′, t − t′) E_j(x′, t′), (2.35)

¹³However, there is no need to worry that such anomalies conflict with Einstein's principle of relativity, to be discussed in chapter 3 below: the dynamics of a uniform wave train is not linked to the transport of energy or other observable physical quantities, i.e. there is no actual physical entity that is transported at a velocity larger than the speed of light.


where σ = {σ_{ij}(x, t)} is called the conductivity tensor. This equation states that a field may cause current flow at different spatial locations and at later times. Equally, a field may cause current flow in directions different from the field vector. (For example, in a system subject to a magnetic field in z–direction, an electric field in x–direction will cause current flow in both x– and y–direction; why?) As with the electric susceptibility, the non–local space dependence of the conductivity may often be neglected, σ(x, t) ∝ δ(x), or σ(q, ω) = σ(ω) in Fourier space. However, except for very low frequencies, the ω–dependence is generally important. One calls the finite–frequency conductivity the ‘AC conductivity’, where ‘AC’ stands for ‘alternating current’. For ω → 0, the conductivity crosses over to the ‘DC conductivity’, where ‘DC’ stands for ‘direct current’. In the DC limit, and for an isotropic medium, the field–current relation assumes the form of Ohm's law

j = σE. (2.36)

In the following, we wish to understand how the phenomenon of electrical conduction can be explained from our previous considerations.

There are two different ways to specify the dynamical response of a metal to external fields: we may either postulate a current–field relation in the spirit of Ohm's law, or we may determine a dielectric function which (in a more microscopic way) describes the response of the mobile charge carriers in the system to the presence of a field. However, since that response generally implies current flow, the simultaneous specification of both a frequency dependent¹⁴ dielectric function and a current–field relation would over–determine the problem. In the following, we discuss the two alternatives in more detail.

The phenomenological route

Let us impose Eq. (2.36) as a condition extraneous to the Maxwell equations; we simply postulate that an electric field drives a current whose temporal and spatial profile is rigidly locked to the electric field. As indicated above, this postulate largely determines the dynamical response of the system to the external field, i.e. the dielectric function ε(ω) = ε has to be chosen constant. Substitution of this ansatz into the temporal Fourier transform of the Maxwell equation (??) obtains

∇×H(x, ω) = (1/c)(−iεω + 4πσ) E(x, ω). (2.37)

For the moment, we leave this result as it is and turn to

The microscopic route

We want to describe the electromagnetic response of a metal in terms of a model dielectric function. The dielectric function (2.28) constructed above describes the response of charge carriers harmonically bound to molecular center coordinates by a potential ∼ m_iω_i²/2 and subject to an effective damping or friction mechanism. Now, the conduction electrons of a

¹⁴As we shall see below, a dielectric constant does not lead to current flow.


metal may be thought of as charge carriers whose confining potential is infinitely weak, ω_i = 0, so that their motion is not tied to a reference coordinate. Still, the conduction electrons will be subject to friction mechanisms. (E.g., scattering off atomic imperfections will impede their ballistic motion through the solid.) We thus model the dielectric function of a metal as

ε(ω) = 1 − 4πn_ee²/(mω(ω + i/τ)) + (4πne²/m) Σ_i f_i/(ω_i² − ω² − iγ_iω)
     ≃ ε − 4πn_ee²/(mω(ω + i/τ))   (ω ≪ ω_i), (2.38)

where n_e ≡ Nf_0 is a measure of the concentration of conduction electrons and the friction coefficient of the electrons has been denoted by τ^{−1};¹⁵ in the last step we noted that for frequencies well below the oscillator frequencies ω_i of the valence electrons, the frequency dependence of the second contribution to the dielectric function may be neglected; we thus lump that contribution into an effective material constant ε. Keep in mind that the free electron contribution to the dielectric function exhaustively describes the acceleration of charge carriers by the field (the onset of current flow), i.e. no extraneous current–field relation needs to be imposed.

Substitution of Eq. (2.38) into the Maxwell equation (??) obtains the result

∇×H(x, ω) = −(iωε(ω)/c) E(x, ω) = (1/c)(−iωε + 4πn_ee²/(m(−iω + 1/τ))) E(x, ω). (2.39)

Comparison with the phenomenological result (2.37) yields the identification

σ(ω) = n_ee²/(m(−iω + 1/τ)), (2.40)

i.e. a formula for the AC conductivity in terms of the electron density and mass, and the impurity collision rate.

Eq. (2.40) affords a very intuitive physical interpretation:

. For high frequencies, ω ≫ τ^{−1}, we may approximate σ ∼ (iω)^{−1}, or ωj ∼ E. In the time domain this means ∂_tj ∼ E (with j ∼ ρv), which describes the ballistic acceleration of electrons by an electric field: on time scales t ≪ τ, which, in a Fourier sense, correspond to large frequencies ω ≫ τ^{−1}, the motion of the electrons is not hindered by impurity collisions and they accelerate as free particles.

. For low frequencies, ω ≪ τ^{−1}, the conductivity may be approximated by a constant, σ ∼ τ. In this regime, the motion of the electrons is impeded by repeated impurity collisions; as a result, they diffuse with a constant drift induced by the electric field.
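A compact way to see both limits at once is to note that Eq. (2.40) is the Fourier transform of an exponential relaxation kernel σ(t) = (n_ee²/m) e^{−t/τ} θ(t), a standard reading of the Drude model. The sketch below (arbitrary units, illustrative parameters) checks this numerically and prints the two limiting regimes:

```python
import numpy as np

def trapz(y, x):
    """Trapezoid rule (kept local to avoid NumPy version differences)."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

# Illustrative parameters in arbitrary units (not material values)
n_e, e, m, tau = 1.0, 1.0, 1.0, 2.0

def sigma(omega):
    """AC conductivity, Eq. (2.40)."""
    return n_e * e**2 / (m * (1.0 / tau - 1j * omega))

# Fourier transform of the exponential relaxation kernel, checked at one frequency
omega = 0.7
t = np.linspace(0.0, 60 * tau, 600001)
sigma_ft = trapz((n_e * e**2 / m) * np.exp(-t / tau) * np.exp(1j * omega * t), t)
print(sigma(omega), sigma_ft)  # the two agree

# The two limits discussed above
print(sigma(1000.0))  # high frequency: ~ i n_e e^2/(m omega), free acceleration
print(sigma(1e-6))    # low frequency: ~ n_e e^2 tau/m, the DC (Drude) value
```
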

¹⁵… alluding to the fact that the attenuation of the electrons is due to collisions with impurities, which take place at a rate denoted by τ^{−1}.


Chapter 3

Relativistic invariance

3.1 Introduction

Consult an entry-level physics textbook to re–familiarize yourself with the basic notions of special relativity!

3.1.1 Galilei invariance and its limitations

The laws of mechanics are the same in two coordinate systems K and K′ moving at a constant velocity relative to one another. Within classical Newtonian mechanics, the space–time coordinates (t, x) and (t′, x′) of two such systems are related by a Galilei transformation,

t′ = t,   x′ = x − vt.

Substituting this transformation into Newton's equations of motion as formulated in system K,

K : d²x_i/dt² = −∇_{x_i} Σ_j V_{ij}(x_i − x_j)

(x_i are particle coordinates in system K and V_{ij} is a pair potential which, crucially, depends only on the differences between particle coordinates but not on any distinguished reference point in the universe), we find that

K′ : d²x′_i/dt′² = −∇_{x′_i} Σ_j V_{ij}(x′_i − x′_j),

i.e. the equations remain form invariant. This is what is meant when we say that classical mechanics is Galilei invariant.¹

¹More precisely, Galilei invariance implies invariance under all transformations of the Galilei group: the uniform motions above, rotations of space, and uniform translations of space and time.


Now, suppose a certain physical phenomenon in K (the dynamics of a water surface, say) is effectively described by a wave equation,

K : (∆ − (1/c²)∂_t²) f(x, t) = 0.

Substitution of the Galilei transformation into this equation obtains

K′ : (∆′ − (1/c²)∂²_{t′} + (2/c²) v·∇′∂_{t′} − (1/c²)(v·∇′)²) f(x′, t′) = 0,

i.e. an equation of modified form. At first sight this result may look worrisome. After all, many waves (water waves, sound waves, etc.) are of purely mechanical origin, which means that wave propagation ought to be a Galilei invariant phenomenon. The resolution of this problem lies with the fact that matter waves generally propagate in a host medium (water, say). While this medium is at rest in K, it is not in K′. But a wave propagating in a non–stationary medium must be controlled by a different equation of motion, i.e. there is no a priori contradiction to the principle of Galilei invariance. Equivalently, one may observe that a wave emitted by a stationary point source in K will propagate with velocity c (in K). However, an observer in K′ will observe wave fronts propagating at different velocities. Mathematically, this distortion of the wave pattern is described by K′'s ‘wave equation’.
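The modified form of the wave operator in K′ can be verified directly: a plane-wave solution of the K-frame wave equation, rewritten in K′ coordinates, fails the naive wave equation but is annihilated by the operator obtained from the chain rule for x′ = x − vt, t′ = t. A minimal numerical check (derivatives inserted analytically):

```python
import numpy as np

c, v, k = 1.0, 0.4, 2.0
omega = c * k             # dispersion in frame K
omega_p = omega - k * v   # (Doppler-shifted) frequency of the same wave in K'

# f'(x', t') = cos(k x' - omega' t'): the K-frame solution in K' coordinates
xp, tp = 0.3, 1.1
f = np.cos(k * xp - omega_p * tp)
f_xx = -k**2 * f               # d^2 f / dx'^2
f_tt = -omega_p**2 * f         # d^2 f / dt'^2
f_xt = k * omega_p * f         # d^2 f / dx' dt'

# The naive wave operator in K' does NOT annihilate f' ...
naive = f_xx - f_tt / c**2
# ... but the chain-rule-transformed operator does
transformed = f_xx - f_tt / c**2 + 2 * v * f_xt / c**2 - (v / c)**2 * f_xx
print(naive, transformed)  # nonzero vs (numerically) zero
```
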

But what about electromagnetic waves? So far, we have never talked about a ‘host medium’ supporting the propagation of electromagnetic radiation. The lack of Galilei invariance of Maxwell's equations then leaves us with three possibilities:

1. Nature is Galilei invariant, but Maxwell's equations are incorrect (or, to put it more mildly, incomplete).

2. An electromagnetic host medium (let’s call it the ether) does, indeed, exist.

3. Nature is invariant under a group of space–time transformations different from the Galilei group. If so, Newton's equations of motion are in need of modification.

These questions became pressing in the second half of the 19th century, after Maxwell had brought the theory of electromagnetism to completion, and the exploration of all kinds of electromagnetic wave phenomena was a focal point of theoretical and experimental research. In view of the sensational success of Maxwell's theory, the first answer was not really considered an option.

Since Newtonian mechanics wasn't a concept to be sacrificed lightheartedly either, the existence of an ether was postulated. However, it was soon realized that the ether medium would have to have quite eccentric physical properties. Since no one had ever actually observed an ether, this medium would have to be infinitely light and fully non–interacting with matter. (Its presumed etheric properties actually gave it its name.) Equally problematic, electromagnetic waves would uniformly propagate with the speed of light, c, only in the distinguished ether rest frame. The ether postulate was tantamount to the abandonment of the idea of ‘no preferential inertial frames in nature’, a concept close at heart to the development of modern


Figure 3.1: Measurement of Newton's law of gravitational attraction (any other law would be just as good) in two frames that are relatively inertial. The equations describing the force between masses will be the same in both frames.

physics. Towards the end of the nineteenth century, experimental evidence was mounting that the vacuum velocity of light was, independent of the observation frame, given by c; the idea of an ether became increasingly difficult to sustain.

3.1.2 Einstein's postulates

In essence, this summarizes the state of affairs before Einstein entered the debate. In view of the arguments outlined above, Einstein decided that option 3 had to be the correct one. Specifically, he put forward two fundamental postulates:

. The postulate of relativity: This postulate is best formulated in a negative form: there is no such thing as ‘absolute rest’, an ‘absolute velocity’, or an ‘absolute point in space’. No absolute frame exists in which the laws of nature assume a distinguished form.

To formulate this principle in a positive form, we define two coordinate frames to be inertial relative to each other if one is related to the other by a translation in space and/or time, a constant rotation, a uniform motion, or a combination of these operations. The postulate then states that the laws of nature will assume the same form (be expressed by identical fundamental equations) in both systems.

As a corollary, this implies that physical laws must never make reference to absolute coordinates, times, angles, etc.

. The postulate of the constancy of the speed of light: the speed of light is independent of the motion of its source.

What makes this postulate — which superficially might seem to be no more than an innocent tribute to experimental observation² — so revolutionary is that it abandons the

²… although the constancy of the speed of light had not yet been fully established experimentally when Einstein made his postulate.


notion of ‘absolute time’. For example, two events that are simultaneous in one inertial frame will in general no longer be simultaneous in other inertial frames. While, more than one hundred years later, we have grown used to its implications, the revolutionary character of Einstein's second postulate cannot be exaggerated.

3.1.3 First consequences

As we shall see below, the notion of an ’absolute presence’ — which makes perfect sense inNewtonian mechanics

3— becomes meaningless in Einsteins theory of relativity. When we

monitor physical processes we must be careful to assign to each physical event its own spaceand time coordinate x = (ct,x) (where the factor of c has been included for later convenience.)An event is canonical in that it can be observed in all frames (no matter whether they arerelatively inertial or not). However, both its space and time coordinates are non–canonical,i.e. when observed from a different frame K ′ it will have coordinates x′ = (ct′,x′), where t′

may be different from t.

When we talk of a ‘body’ or a ‘particle’ in relativity, what we actually mean is the continuous sequence of events x = (ct, x) defined by its instantaneous position x at time t (both monitored in a specific frame). The assembly of these events obtains the world line of the body, a (directed) curve in a space–time coordinate frame (cf. the figure for a schematic of the (1 + 1)–dimensional world lines of a body in uniform motion (left) and an accelerated body (right)).

The most important characteristic of the relative motion of inertial frames is that no acceleration is involved. This implies, in particular, that a uniform motion in one frame will stay uniform in the other; straight world lines transform into straight world lines. Parameterizing a general uniform motion in system K by x = (ct, wt + b), where w is the velocity and b an offset vector, we must require that the coordinate representation x′ in any inertial system K′ is again a straight line. Now, the most general family of transformations mapping straight lines onto straight lines are the affine transformations

x′ = Λx + a, (3.1)

where Λ : R^{d+1} → R^{d+1} is a linear map from (d + 1)–dimensional⁴ space–time into itself and a ∈ R^{d+1} a vector describing the displacement of the origin of the coordinate systems in space and/or time. For a = 0, we speak of a homogeneous coordinate transformation. Since any transformation may be represented as a succession of a homogeneous transformation and a trivial translation, we will focus on the former class throughout.

INFO Above, we have set up criteria for two frames being relatively inertial. However, in the theory of relativity, one meets with the notion of (absolutely) inertial frames (no reference to

³Two events that occur simultaneously in one frame will remain simultaneous under Galilei transformations.
⁴Although we will be foremost interested in the case d = 3, it is occasionally useful to generalize to other dimensions. Specifically, space–time diagrams are easiest to draw/imagine in (1 + 1) dimensions.


another frame). But how can we tell whether a frame is inertial or not? And what, for that matter, does the attribute ‘inertial’ mean, if no transformation is involved?

Newton approached this question by proposing the concept of an ‘absolute space’. Systems at rest in this space, or at constant motion relative to it, were ‘inertial’. For practical purposes, it was proposed that a reference system tied to the position of the fixed stars was (a good approximation of) an absolute system. Nowadays, we know that these concepts are problematic. The fixed stars aren't fixed (nor, for that matter, is any known constituent of the universe), i.e. we are not aware of a suitable ‘anchor’ reference system. What is more, the declaration of an absolute frame is at odds with the ideas of relativity.

In the late nineteenth century, the above definition of inertial frames was superseded by a purely operational one. A modern formulation of the alternative definition might read:

An inertial frame of reference is one in which the motion of a particle not subject to forces is in a straight line at constant speed.

The meaning of this definition is best exposed by considering its negative: imagine an observer tied to a rotating disc. A particle thrown by that observer will not move in a straight line (relative to the disc). If the observer does not know that he/she is on a rotating frame, he/she will have to attribute the bending of the trajectory to some ‘fictitious’ forces. The presence of such forces is proof that the disc system is not inertial.

However, in most physical contexts questions concerning the ‘absolute’ status of a system do not arise.⁵ Far more frequently, we care about what happens as we pass from one frame (the laboratory, say) to another (that of a particle moving relative to the lab). Thus, the truly important issue is whether systems are relatively inertial or not.

What conditions will the transformation matrix Λ have to fulfill in order to qualify as a transformation between inertial frames? Referring for a rigorous discussion of this question to Ref. [2], we here merely note that symmetry conditions implied by the principle of relativity nearly, but not completely, specify the class of permissible transformations. This class of transformations includes the Galilei transformations which, as we saw, are incompatible with the physics of electromagnetic wave propagation.

At this point, Einstein's second postulate enters the game. Consider two inertial frames K and K′ related by a homogeneous coordinate transformation. At time t = 0 the origins of the two systems coalesce. Let us assume, however, that K′ moves relative to K at constant velocity v. Now consider the event: at time t = 0 a light source at x = 0 emits a signal. In K, the wave fronts of the light signal then trace out world lines propagating at the vacuum speed of light, i.e. the spatial coordinates x of a wave front obey the relation

x² − c²t² = 0.

The crucial point now is that in spite of its motion relative to the point of emanation of the light signal, an observer in K′, too, will observe a light signal moving with velocity c in all directions. (Notice the difference to, say, a sound wave. From K′'s point of view, fronts moving along v would be slowed down while those moving in the opposite direction would

⁵Exceptions include high precision measurements of gravitational forces.


propagate at higher speed, v′_sound = v_sound − v.) This means that the light front coordinates in K′ obey the same relation

x′² − c²t′² = 0.

This additional condition unambiguously fixes the set of permissible transformations. To formulate it in the language of linear coordinate transformations (which, in view of the linearity of transformations between inertial frames, is the adequate one), let us define the matrix

g = {g_µν} ≡ diag(1, −1, −1, −1), (3.2)

where all non–diagonal entries are zero. The constancy of the speed of light may then be expressed in concise form as x^T g x = 0 ⇒ x′^T g x′ = 0. This condition suggests to focus on coordinate transformations for which the bilinear form x^T g x is conserved,⁶

x^T g x = x′^T g x′. (3.3)

Substituting x′ = Λx, we find that Λ must obey the condition

Λ^T g Λ = g. (3.4)

Linear coordinate transformations satisfying this condition are called Lorentz transformations. We thus postulate that

All laws of nature must be invariant under affine coordinate transformations (3.1) where the homogeneous part of the transformation obeys the Lorentz condition (3.4).

The statement above characterizes the set of legitimate transformations in implicit terms. In the next section, we aim for a more explicit description of the Lorentz transformations. These results will form the basis for our discussion of Lorentz invariant electrodynamics further down.
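The Lorentz condition (3.4) is easy to verify for a concrete transformation. The sketch below uses the standard boost matrix along e_1 (quoted here ahead of its derivation in the next section) and checks Λ^T g Λ = g numerically:

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])   # metric, Eq. (3.2)

def boost_x(beta):
    """Special Lorentz transformation along e1 in its standard form (quoted
    ahead of its derivation in section 3.2)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L

L = boost_x(0.6)
print(np.allclose(L.T @ g @ L, g))  # Lorentz condition (3.4) holds
print(np.linalg.det(L), L[0, 0])    # det = +1 and Lambda_00 >= 1
```
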

3.2 The mathematics of special relativity I: Lorentz group

In this section, we will develop the mathematical framework required to describe transformations between inertial frames. We will show that these transformations form a continuous group, the Lorentz group.

⁶Notice the gap in the logic of the argument. Einstein's second postulate requires Eq. (3.3) only for those space–time vectors for which x^T g x vanishes; we now suggest to declare it a universal condition, i.e. one that holds irrespective of the value of x^T g x. To show that the weaker condition implies the stronger, one may first prove (cf. Ref. [2]) that relatively straightforward symmetry considerations suffice to fix the class of allowed transformations up to one undetermined scalar constant. The value of that constant is then fixed by the ‘weak condition’ above. Once it has been fixed, one finds that Eq. (3.3) holds in general.


Figure 3.2: Schematic of the decomposition of space–time into a time like region and a space like region. The two regions are separated by a ‘light cone’ of light–like vectors (a ‘cone’ because in space dimensions d > 1 it acquires a conical structure). The forward/backward part of the light cone extends to positive/negative times.

3.2.1 Background

The bilinear form introduced above defines a scalar product on R⁴:

g : R⁴ × R⁴ → R, (3.5)
(x, y) ↦ x^T g y ≡ x^µ g_{µν} y^ν. (3.6)

A crucial feature of this scalar product is that it is not positive definite. For reasons to be discussed below, we call vectors of positive norm, x^T g x = (ct)² − x² > 0, time like vectors. Vectors with negative norm will be called space like and vectors of vanishing norm light like.

We are interested in Lorentz transformations, i.e. linear transformations Λ : R⁴ → R⁴, x ↦ Λx, that leave the scalar product g invariant, Λ^T g Λ = g. (Similarly to, say, the orthogonal transformations, which leave the standard scalar product invariant.) Before exploring the structure of these transformations in detail, let us observe a number of general properties. First notice that the set of Lorentz transformations forms a group, the Lorentz group, L: if Λ and Λ′ are Lorentz transformations, so is the composition ΛΛ′. The identity transformation obviously obeys the Lorentz condition. Finally, the equation Λ^T g Λ = g implies that Λ is non–singular (why?), i.e. it possesses an inverse, Λ^{−1}. We have thus shown that the Lorentz transformations form a group.

Global considerations on the Lorentz group

What more can be said about this group? Taking the determinant of the invariance relation, we find that det(Λ^T g Λ) = det(Λ)² det(g) = det(g), i.e. det(Λ)² = 1, which means that det(Λ) ∈ {−1, 1}. (Λ is a real matrix, i.e. det(Λ) ∈ R.) We may further observe that the absolute value of the component Λ_{00} is never smaller than unity. This is shown by inspection of the 00–element of the invariance relation: 1 = g_{00} = (Λ^T g Λ)_{00} = Λ²_{00} − Σ_i Λ²_{i0}. Since the subtracted term is positive (or zero), we have |Λ_{00}| ≥ 1. We thus conclude that L contains four components defined by

L↑₊ : det Λ = +1, Λ_{00} ≥ +1,
L↓₊ : det Λ = +1, Λ_{00} ≤ −1,
L↑₋ : det Λ = −1, Λ_{00} ≥ +1,
L↓₋ : det Λ = −1, Λ_{00} ≤ −1. (3.7)

Transformations belonging to any one of these subsets cannot be continuously deformed into a transformation of a different subset (why?), i.e. the subsets are truly disjoint. In the following, we introduce a few distinguished transformations belonging to the different components of the Lorentz group.

Space reflection, or parity, P : (ct, x) → (ct, −x), is a Lorentz transformation. It belongs to the component L↑₋. Time inversion, T : (ct, x) → (−ct, x), belongs to the component L↓₋. The product of space reflection and time inversion, PT : (ct, x) → (−ct, −x), belongs to the component L↓₊. Generally, the product of two transformations belonging to a given component no longer belongs to that component, i.e. the subsets do not form subgroups of the Lorentz group. However, the component L↑₊ is exceptional in that it is a subgroup. It is called the proper orthochronous Lorentz group, or restricted Lorentz group. The attribute ‘proper’ indicates that it does not contain exceptional transformations (such as parity and time inversion) which cannot be continuously deformed back to the identity transformation. It is called ‘orthochronous’ because it maps the positive light cone into itself, i.e. it respects the ‘direction’ of time. To see that L↑₊ is a group, notice that it contains the identity transformation. Further, its elements can be continuously contracted to the identity; if this property holds for two elements Λ, Λ′ ∈ L↑₊, then so for the product ΛΛ′, i.e. the product is again in L↑₊.

The restricted Lorentz group

The restricted Lorentz group contains those transformations which we foremost associate with coordinate transformations between inertial frames. For example, rotations of space,

Λ_R ≡ ( 1     )
      (    R  ),

where R is a three–dimensional rotation matrix, trivially fall into this subgroup. However, it also contains the second large family of homogeneous transformations between inertial frames, uniform motions.

Page 73: Classical Theoretical Physics - Alexander Altland

3.2. THE MATHEMATICS OF SPECIAL RELATIVITY I: LORENTZ GROUP 69

[Figure: two inertial frames K and K′ with axes (x1, x2, x3) and (x′1, x′2, x′3); the origin of K′ is displaced by vt along the direction of motion.]

Consider two inertial frames K and K′ whose origins at time t = 0 coalesce (which means that the affine coordinate transformation (3.1) will be homogeneous, a = 0). We assume that K′ moves at constant velocity v relative to K. Lorentz transformations of this type, Λ_v, are called special Lorentz transformations. Physically, they describe what is sometimes called a Lorentz boost, i.e. the passage from a stationary to a moving frame. (Although we are talking about a mere coordinate transformation, i.e. no acceleration process is described.) We wish to explore the mathematical and physical properties of these transformations.

Without loss of generality, we assume that the vector v describing the motion is parallel to the e1–direction of K (and hence to the e′1–direction of K′): any general velocity vector v may be rotated to v ∥ e1 by a spatial rotation R. This means that a generic special Lorentz transformation Λ_v may be obtained from the prototypical one Λ_{ve1} by a spatial rotation, Λ_v = Λ_R Λ_{ve1} Λ_R⁻¹ (cf. the figure).

A special transformation in the e1–direction will leave the coordinates x2 = x′2 and x3 = x′3 unchanged.⁷ The sought-for transformation thus assumes the form

Λ_{ve1} = ( A_v        )
          (      1     )
          (         1  ),

where the 2 × 2 matrix A_v operates in (ct, x1)–space.

There are many ways to determine the matrix A_v. We here use a group theory inspired method which may come across as somewhat abstract but has the advantage of straightforward applicability to many other physical problems. Being an element of a continuous group of transformations (a Lie group), the matrix A_v ≡ exp(Σᵢ λᵢTᵢ) can be written in an exponential parameterization. Here, the λᵢ are real parameters and the Tᵢ fixed two–dimensional 'generator matrices'. (Cf. the conventional rotation group; the Tᵢ play the role of the angular momentum generator matrices, and the λᵢ are generalized real–valued 'angles'.) To determine the group generators, we consider the transformation A_v for infinitesimal angles λᵢ. Substituting the matrix A_v into the defining equation of the Lorentz group, using that the restriction of the metric to (ct, x1)–space assumes the form g → σ3 ≡ diag(1, −1), and expanding to first order in λᵢ, we obtain

A_vᵀ σ3 A_v ≈ (1 + Σᵢ λᵢTᵢᵀ) σ3 (1 + Σᵢ λᵢTᵢ) ≈ σ3 + Σᵢ λᵢ (Tᵢᵀ σ3 + σ3 Tᵢ)  !=  σ3.

⁷ To actually prove this, notice that from the point of view of an observer in K′ we consider a special transformation with velocity −ve′1. Then apply symmetry arguments (notably the condition of parity invariance; a mirrored universe should not differ in an observable manner in its transformation behaviour from ours) to show that any change in these coordinates would lead to a contradiction.


This relation must hold regardless of the value of λᵢ, i.e. the matrices Tᵢ must obey the condition Tᵢᵀ σ3 = −σ3 Tᵢ. Up to normalization, there is only one matrix that meets this condition, viz.

T = σ1 ≡ ( 0  1 )
         ( 1  0 ).

The most general form of the matrix A_v is thus given by

A_v = exp ( λ ( 0  1 ) )  =  ( cosh λ   sinh λ )
          (   ( 1  0 ) )     ( sinh λ   cosh λ ),

where we denoted the single remaining parameter by λ and in the second equality expanded the exponential in a power series. We now must determine the parameter λ = λ(v) in such a way that it actually describes our v–dependent transformation. We first substitute A_v into the transformation law x′ = Λ_v x, or

( x0′ )        ( x0 )
( x1′ )  = A_v ( x1 ).

Now, the origin of K′ (having coordinate x1′ = 0) is at coordinate x1 = vt. Substituting this condition into the equation above, we obtain

tanh λ = −v/c.

Using this result and introducing the (standard) notation

β = |v|/c,   γ ≡ (1 − β²)^{−1/2},   (3.8)

the special Lorentz transformation assumes the form

Λ_{ve1} = (  γ   −βγ         )
          ( −βγ    γ         )
          (             1    )
          (                1 ).   (3.9)
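As a numerical cross-check of this derivation (a sketch with an arbitrarily chosen velocity v = 0.6c and units c = 1, not values taken from the text), one can verify that exp(λσ1) with tanh λ = −v/c reproduces the (ct, x1) block of Eq. (3.9), respects the restricted metric σ3, and composes by addition of the parameters λ:

```python
import numpy as np

def expm(M, terms=60):
    """Matrix exponential via its power series (sufficient for small 2x2 matrices)."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

c = 1.0
v = 0.6 * c                       # hypothetical boost velocity
beta = abs(v) / c
gamma = 1.0 / np.sqrt(1.0 - beta**2)

sigma1 = np.array([[0.0, 1.0], [1.0, 0.0]])
sigma3 = np.diag([1.0, -1.0])

# Exponential parameterization with tanh(lambda) = -v/c
lam = np.arctanh(-v / c)
A_v = expm(lam * sigma1)

# Matches the closed cosh/sinh form, i.e. the (ct, x1) block of Eq. (3.9)
A_closed = np.array([[gamma, -beta * gamma],
                     [-beta * gamma, gamma]])
assert np.allclose(A_v, A_closed)

# A_v respects the restricted metric: A^T sigma3 A = sigma3
assert np.allclose(A_v.T @ sigma3 @ A_v, sigma3)

# Composing two boosts adds the generalized 'angles' lambda (rapidities)
lam2 = np.arctanh(-0.5)
assert np.allclose(expm(lam * sigma1) @ expm(lam2 * sigma1),
                   expm((lam + lam2) * sigma1))
print("boost checks passed")
```

The additivity of λ under composition is the group-theoretic payoff of the exponential parameterization: velocities do not add linearly, but the parameters λ do.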

EXERCISE Repeat the argument above for the full transformation group, i.e. not just the set of special Lorentz transformations along a certain axis. To this end, introduce an exponential representation Λ = exp(Σᵢ λᵢTᵢ) for general elements of the restricted Lorentz group. Use the defining condition of the group to identify six linearly independent matrix generators.⁸ Among the matrices Tᵢ, identify three as the generators Jᵢ of the spatial rotation group, Tᵢ = Jᵢ. Three others generate special Lorentz transformations along the three coordinate axes of K. (Linear combinations of these generators can be used to describe arbitrary special Lorentz transformations.)

A few more remarks on the transformations of the restricted Lorentz group:

⁸ Matrices are elements of a certain vector space (the space of linear transformations of a given vector space). A set of matrices A1, …, Ak is linearly independent if no nonvanishing linear combination Σᵢ xᵢAᵢ = 0 exists.


. In the limit of small β, the transformation Λ_{ve1} asymptotes to the Galilei transformation, t′ = t and x′1 = x1 − vt.

. We repeat that a general special Lorentz transformation can be obtained from the transformation along e1 by a spatial rotation, Λ_v = Λ_R Λ_{ve1} Λ_R⁻¹, where R is a rotation mapping e1 onto the unit vector in v–direction.

. Without proof (however, the proof is by and large implied by the exercise above) we mention that a general restricted Lorentz transformation can always be represented as

Λ = Λ_R Λ_v,

i.e. as the product of a rotation and a special Lorentz transformation.

3.3 Aspects of relativistic dynamics

In this section we explore a few of the physical consequences deriving from Einstein's postulates. Our discussion will be embarrassingly superficial; key topics relating to the theory of relativity are not mentioned at all. Rather, the prime objective of this section is to provide the core background material required to discuss the relativistic invariance of electrodynamics below.

3.3.1 Proper time

[Figure: world line of an accelerated particle in the (x, ct) plane, together with the instantaneous rest frame axes (x′, ct′) attached to a point on the trajectory.]

Consider the world line of a particle moving in space in a not necessarily uniform manner. At any instant of time t, the particle will be at rest in an inertial system K′ whose origin coincides with x(t) and whose instantaneous velocity w.r.t. K is given by d_t x(t). At this point, it is important to avoid confusion. Of course, the coordinate system whose origin globally coincides with x(t) is not inertial w.r.t. K (unless the motion is uniform). Put differently, the origin of the inertial system K′ defined by the instantaneous position x(t) and the instantaneous velocity d_t x(t) coincides with the particle's position only at the very instant t. The t′ axis of K′ is tangent to the world line at x(t). Also, we know that the t′ axis cannot be tilted w.r.t. the t axis by more than an angle π/4, i.e. it must stay within the instantaneous light cone attached to the reference coordinate x.

Now, imagine an observer traveling along the world line x(t) and carrying a watch. At time t (corresponding to time zero in the instantaneous rest frame) the watch is reset. We wish to identify the coordinates of the event 'watch shows (infinitesimal) time dτ' in both K and K′. In K′ the answer to this question is formulated easily enough: the event has coordinates (c dτ, 0), where we noted that, due to the orthogonality of the space–like coordinate axes to the world line, the watch will remain at coordinate x′ = 0. To identify the K–coordinates, we first note that the origin of K′ is translated w.r.t. that of K by a = (ct, x(t)).


Thus, the coordinate transformation between the two frames assumes the general affine form x = Λx′ + a. In K, the event takes place at some (as yet undetermined) time dt later than t. Its spatial coordinates will be x(t) + v dt, where v = d_t x is the instantaneous velocity of the moving observer. We thus have the identification (c dτ, 0) ↔ (c(t + dt), x + v dt). Comparing this with the general form of the coordinate transformation, we realize that (c dτ, 0) and (c dt, v dt) are related by a homogeneous Lorentz transformation. This implies, in particular, that (c dτ)² = (c dt)² − (v dt)², or

dτ = dt √(1 − v²/c²).   (3.10)

The time between events of finite duration may be obtained by integration,

τ = ∫ dt √(1 − v²/c²).

This equation expresses the famous phenomenon of the relativity of time: for a moving observer, time proceeds more slowly than for an observer at rest. The time measured by the propagating watch, τ, is called proper time. The attribute 'proper' indicates that τ is defined with reference to a coordinate system (the instantaneous rest frame) that can be canonically defined. I.e., while the time t associated to a given point on the world line x(t) = (ct, x(t)) will change from system to system (as we just saw), the proper time remains the same; the proper time can be used to parameterize the space–time curve in an invariant manner as x(τ) = (ct(τ), x(τ)).
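The integral for τ can be evaluated numerically for an arbitrary velocity profile. The following sketch (hypothetical velocity profiles, units c = 1, not examples from the text) confirms the constant-speed case τ = t √(1 − v²/c²) and illustrates that the moving clock always lags behind the frame time:

```python
import numpy as np

c = 1.0

def proper_time(v_of_t, t_grid):
    """Trapezoidal integration of d(tau) = dt * sqrt(1 - v(t)^2 / c^2)."""
    f = np.sqrt(1.0 - (v_of_t(t_grid) / c) ** 2)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t_grid))

t = np.linspace(0.0, 10.0, 10_001)

# Constant speed: Eq. (3.10) integrates to tau = t * sqrt(1 - v^2/c^2)
tau_const = proper_time(lambda s: 0.8 * np.ones_like(s), t)
assert np.isclose(tau_const, 10.0 * np.sqrt(1.0 - 0.8**2))

# Non-uniform (hypothetical) velocity profile: tau is always less than t
tau_acc = proper_time(lambda s: 0.9 * np.sin(0.5 * s) ** 2, t)
assert tau_acc < 10.0
print("proper time checks passed:", tau_const, tau_acc)
```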

3.3.2 Relativistic mechanics

Relativistic energy momentum relation

In section 3.1.2 we argued that Einstein solved the puzzle posed by the Lorentz invariance of electrodynamics by postulating that Newtonian mechanics was in need of generalization. We will begin our discussion of the physical ramifications of Lorentz invariance by developing this extended picture. Again, we start from postulates motivated by physical reasoning:

. The extended variant of Newtonian mechanics (let us call it relativistic mechanics) ought to be invariant under Lorentz transformations, i.e. the generalization of Newton's equations should assume the same form in all inertial frames.

. In the rest frame of a particle subject to mechanical forces, or in frames moving at low velocity v/c ≪ 1 relative to the rest frame, the equations must asymptote to Newton's equations.

Newton's equations in the rest frame K are of course given by

m (d²/dτ²) x = f,

where m is the particle's mass (in its rest frame — as we shall see, the mass is not an invariant quantity; we had thus better speak of the particle's rest mass), and f is the force. As always in


relativity, a spatial vector (such as x or f) must be interpreted as the space–like component of a four–vector. The zeroth component of x is given by x⁰ = cτ, so that the rest frame generalization of the Newton equation reads as m d²_τ x^μ = f^μ, where f = (0, f) and we noted that for the zeroth component m d²_τ x⁰ = 0, so that f⁰ = 0. It is useful to define the four–momentum of the particle by p^μ = m d_τ x^μ. (Notice that both τ and the rest mass m are invariant quantities; they are not affected by Lorentz transformations. The transformation behaviour of p is determined by that of x.) The zeroth component of the momentum, m d_τ x⁰, carries the dimension [energy]/[velocity]. This suggests to define

p = ( E/c )
    (  p  ),

where E is some characteristic energy. (In the instantaneous rest frame of the particle, E = mc² and p = 0.)

Expressed in terms of the four–momentum, the Newton equation assumes the simple form

dτp = f. (3.11)

Let us explore what happens to the four–momentum as we boost the particle from its rest frame to a frame moving with velocity vector v = ve1. Qualitatively, one will expect that in the moving frame the particle does carry a finite space–like momentum (which will be proportional to the velocity of the boost, v). We should also expect a renormalization of its energy (from the point of view of the moving frame, the particle has acquired kinetic energy).

Focusing on the 0 and 1 components of the momentum (the others won't change), Eq. (3.9) implies

( mc²/c )      ( E′/c )     (  γmc²/c )
(   0   )  →   (  p′1 )  ≡  ( −(γm)v  ).

This equation may be interpreted in two different ways. Substituting the explicit formula E = mc², we find E′ = (γm)c² and p′1 = −(γm)v. In Newtonian mechanics, we would have expected that a particle at rest in K carries momentum p′1 = −mv in K′. The difference to the relativistic result is the renormalization of the mass factor: a particle that has mass m in its rest frame appears to carry a velocity dependent mass

m(v) = γm = m / √(1 − (v/c)²)

in K′. For v → c, it becomes infinitely heavy and its further acceleration will become progressively more difficult:

Particles of finite rest frame mass cannot move faster than with the speed of light.

The energy in K′ is given by (mγ)c², and is again affected by mass renormalization. However, a more useful representation is obtained by noting that pᵀgp is a conserved quantity. In


K, pᵀgp = (E/c)² = (mc)². Comparing with the bilinear form in K′, we find (mc)² = (E′/c)² − p′², or

E′ = √((mc²)² + (p′c)²).   (3.12)

This relativistic energy–momentum relation determines the relation between energy and momentum of a free particle.

Its most important implication is that even a free particle at rest carries energy E = mc², Einstein's world–famous result. However, this energy is usually not observable (unless it gets released in nuclear processes of mass fragmentation, with all the known consequences). What we do observe in daily life are the changes in energy as we observe the particle in different frames. This suggests to define the kinetic energy of a particle by

T ≡ E − mc² = √((mc²)² + (pc)²) − mc².

For velocities v ≪ c, the kinetic energy T ≃ p²/2m indeed reduces to the familiar non–relativistic result. Deviations from this relation become sizeable at velocities comparable to the speed of light.
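Both the energy–momentum relation (3.12) and its non-relativistic limit can be checked numerically. The sketch below uses illustrative values (an electron rest mass and an arbitrarily chosen boost velocity, both assumptions of the example, not taken from the text):

```python
import numpy as np

c = 3.0e8          # speed of light in m/s
m = 9.109e-31      # electron rest mass in kg (illustrative choice)
v = 0.6 * c        # hypothetical boost velocity

gamma = 1.0 / np.sqrt(1.0 - (v / c) ** 2)

# Components of the boosted four-momentum, as derived in the text
E_prime = gamma * m * c**2
p_prime = -gamma * m * v

# Relativistic energy-momentum relation, Eq. (3.12)
assert np.isclose(E_prime, np.sqrt((m * c**2) ** 2 + (p_prime * c) ** 2))

# For v << c, the kinetic energy reduces to p^2 / 2m
v_small = 1.0e-3 * c
p_small = m * v_small / np.sqrt(1.0 - (v_small / c) ** 2)
T = np.sqrt((m * c**2) ** 2 + (p_small * c) ** 2) - m * c**2
assert np.isclose(T, p_small**2 / (2 * m), rtol=1e-5)
print("energy-momentum checks passed")
```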

Particle subject to a Lorentz force

We next turn back to the discussion of the relativistically invariant Newton equation d_τ p^μ = f^μ. Both p^μ and the components of the force, f^μ, transform as vectors under Lorentz transformations. Specifically, we wish to explore this transformation behaviour in an example of obvious relevance to electrodynamics: the Newton equation of a particle of velocity v subject to an electric and a magnetic field. Unlike in much of our previous discussion, our observer frame K is not the rest frame of the particle. The four–momentum of the particle is thus given by p = (mγc, mγv).

What we take for granted is the Lorentz force relation,

(d/dt) p = q(E + (v/c) × B).

In view of our discussion above, it is important to appreciate the meaning of the non–invariant constituents of this equation. Specifically, t is the time in the observer frame, and p is the space–like component of the actual momentum. This quantity is different from mv. Rather, it is given by the product (observed mass) × (observed velocity), where the observed mass differs from the rest mass, as discussed above.

We now want to bring this equation into the form of an invariant Newton equation (3.11). Noting that (cf. Eq. (3.10)) d/dt = γ⁻¹ d/dτ, we obtain

(d/dτ) p = f,

where the force acting on the particle is given by

f = qγ(E + (v/c)×B).


To complete our derivation of the Newton equation in K, we need to identify the zeroth component of the force, f⁰. The zeroth component of the left hand side of the Newton equation is given by (γ/c) d_t E, i.e. (γ/c) × the rate at which the kinetic energy of the particle changes. However, we have seen before that the rate of potential energy change is given by d_t U = −f · v = −qE · v. Energy conservation implies that this energy gets converted into kinetic energy, d_t T = d_t E = +qE · v. This implies the identification f⁰ = (γq/c) E · v. The generalized form of Newton's equations is thus given by

m d_τ ( γc )  =  (q/c) (     E · (γv)    )
      ( γv )           ( γcE + (γv) × B  ).

The form of these equations suggests to introduce the four–velocity vector

v ≡ ( γc )
    ( γv )

(in terms of which the four–momentum is given by p = mv, i.e. by multiplication by the rest mass). The equations of motion can then be expressed as

m d_τ v^μ = (q/c) F^{μν} v_ν,   (3.13)

where we used the index raising and lowering convention originally introduced in chapter ??, i.e. v₀ = v⁰ and vᵢ = −vⁱ, and the matrix F is defined by

F = {F^{μν}} = (  0   −E1  −E2  −E3 )
               (  E1   0   −B3   B2 )
               (  E2   B3   0   −B1 )
               (  E3  −B2   B1   0  ).   (3.14)

The significance of Eq. (3.13) goes much beyond that of a mere reformulation of Newton's equations: the electromagnetic field enters the equation through a matrix, the so–called field strength tensor. This signals that the transformation behaviour of the electromagnetic field will be different from that of vectors. Throughout much of the course, we treated the electric field as if it were a (space–like) vector. Naively, one might have expected that in the theory of relativity this field gets augmented by a fourth component to become a four–vector. However, a moment's thought shows that this picture cannot be correct. Consider, say, a charged particle at rest in K. This particle will create a Coulomb electric field. However, in a frame K′ moving relative to K, the charge will be in motion. Thus, an observer in K′ will see an electric current plus the corresponding magnetic field. This tells us that under inertial transformations, electric and magnetic fields get transformed into one another. We thus need to find a relativistically invariant object accommodating the six components of the electric and magnetic field 'vectors'. Eq. (3.14) provides a tentative answer to that problem. The idea is that the fields enter the theory as a matrix and, therefore, transform in a way fundamentally different from that of vectors.
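A quick way to see that the matrix (3.14) indeed encodes the Lorentz force is to contract it with a four-velocity numerically. The following sketch (arbitrary field and velocity values, units c = 1, chosen for illustration only) reproduces the component form m d_τ(γc, γv) derived above, up to the overall factor q/c:

```python
import numpy as np

def field_tensor(E, B):
    """Contravariant field strength tensor F^{mu nu}, Eq. (3.14)."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return np.array([[0.0, -E1, -E2, -E3],
                     [E1, 0.0, -B3, B2],
                     [E2, B3, 0.0, -B1],
                     [E3, -B2, B1, 0.0]])

c = 1.0
E = np.array([0.1, -0.2, 0.3])   # hypothetical field values
B = np.array([0.4, 0.0, -0.5])
v3 = np.array([0.2, 0.1, -0.3])  # particle velocity, |v| < c
gamma = 1.0 / np.sqrt(1.0 - v3 @ v3 / c**2)

# Four-velocity v^mu and its lowered version v_mu = g_{mu nu} v^nu
v_up = gamma * np.array([c, *v3])
g = np.diag([1.0, -1.0, -1.0, -1.0])
v_down = g @ v_up

F = field_tensor(E, B)
rhs = F @ v_down    # F^{mu nu} v_nu

# Compare with the component form derived in the text:
# zeroth component gamma * E.v ; spatial components gamma * (c E + v x B)
assert np.isclose(rhs[0], gamma * E @ v3)
assert np.allclose(rhs[1:], gamma * (c * E + np.cross(v3, B)))
print("Lorentz force components reproduced")
```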


3.4 Mathematics of special relativity II: Co– and Contravariance

So far, we have focused on the Lorentz transformation behaviour of vectors in R⁴. However, our observations made at the end of the previous section suggest to include other objects (such as matrices) into our discussion. We will begin by providing some essential mathematical background and then, finally, turn to the discussion of the Lorentz invariance of electrodynamics.

3.4.1 Covariant and Contravariant vectors

Generalities

Let x = {x^μ} be a four–component object that transforms under homogeneous Lorentz transformations as x → Λx, or x^μ → Λ^μ_ν x^ν in components. (By Λ^μ_ν we denote the components of the Lorentz transformation matrix. For the up/down arrangement of indices, see the discussion below.) Quantities of this transformation behaviour are called contravariant vectors (or contravariant tensors of first rank). Now, let us define another four–component object, x_μ ≡ g_{μν} x^ν. Under a Lorentz transformation, x_μ → g_{μν} Λ^ν_ρ x^ρ = (Λ^{T−1})_μ^ν x_ν. Quantities transforming in this way will be denoted as covariant vectors (or covariant tensors of first rank). Defining g^{μν} to be the inverse of the Lorentz metric (in a basis where g is diagonal, g = g⁻¹), the contravariant ancestor of x_μ may be obtained back by 'raising the indices' as x^μ = g^{μν} x_ν.

Critical readers may find these 'definitions' unsatisfactory. Formulated in a given basis, one may actually wonder whether they are definitions at all. For a discussion of the basis invariant meaning behind the notions of co- and contravariance, we refer to the info block below. However, we here proceed in a pragmatic way and continue to explore the consequences of the definitions (which, in fact, they are) above.

INFO Consider a general real vector space V. Recall that the dual space V* is the linear space of all mappings v̄: V → R, v ↦ v̄(v). (This space is a vector space by itself, which is why we denote its elements by vector–like symbols, v̄.) For a given basis {e_μ} of V, a dual basis {ē^μ} of V* may be defined by the condition ē^μ(e_ν) = δ^μ_ν. With the expansions v = Σ_μ v^μ e_μ and v̄ = Σ_μ v̄_μ ē^μ, respectively, we have v̄(v) = v̄_μ v^μ. (Notice that components of objects in dual space will be indexed by subscripts throughout.) In the literature of relativity, elements of the vector space — then to be identified with space–time, see below — are usually called contravariant vectors, while their partners in dual space are called covariant vectors.

EXERCISE Let A: V → V be a linear map defining a basis change in V, i.e. e_μ = A_μ^ν e′_ν, where {A_μ^ν} are the matrix elements of A. Show that

. The components of a contravariant vector v transform as v^μ → v′^μ = (Aᵀ)^μ_ν v^ν.

. The components of a covariant vector w transform as w_μ → w′_μ = (A⁻¹)_μ^ν w_ν.

. The action of w on v remains invariant, i.e. w(v) = w_μ v^μ = w′_μ v′^μ does not change. (Of course, the action of the linear map w on vectors must not depend on the choice of a particular basis.)


In physics it is widespread (if somewhat tautological) practice to define co– or contravariant vectors by their transformation behaviour. For example, a set of d components {w_μ} is denoted a covariant vector if these components change under linear transformations as w_μ → A_μ^ν w_ν.

Now, let us assume that V is a vector space with scalar product g: V × V → R, (v, w) ↦ vᵀgw ≡ ⟨v, w⟩. A special feature of vector spaces with scalar product is the existence of a canonical mapping V → V*, v ↦ v̄, i.e. a mapping that to each vector v canonically (without reference to a specific basis) assigns a dual element v̄. The vector v̄ is implicitly defined by the condition ∀w ∈ V: v̄(w) != ⟨v, w⟩. For a given basis {e_μ} of V, the components of v̄ = Σ_μ v̄_μ ē^μ may be obtained as v̄_μ = v̄(e_μ) = ⟨v, e_μ⟩ = v^ν g_{νμ} = g_{μν} v^ν, where in the last step we used the symmetry of g. With the so–called index lowering convention

v_μ = g_{μν} v^ν,   (3.15)

we are led to the identification v̄_μ = v_μ and v̄(w) = v_μ w^μ.

Before carrying on, let us introduce some more notation: by g^{μν} ≡ (g⁻¹)^{μν} we denote the components of the inverse of the metric (which, in the case of the Lorentz metric in its standard diagonal representation, equals the metric — but that is an exception). Then g^{μν} g_{νρ} = δ^μ_ρ, where δ^μ_ρ is the standard Kronecker symbol. With this definition, we may introduce an index raising convention as v^μ ≡ g^{μν} v_ν. Indeed, g^{μν} v_ν = g^{μν} g_{νρ} v^ρ = δ^μ_ρ v^ρ = v^μ. (Notice that indices that are summed over, or 'contracted', always appear crosswise: one upper and one lower index.)
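The index gymnastics of this section is easy to mirror in code. The sketch below (illustrative component values, units c = 1) demonstrates lowering via Eq. (3.15), raising with the inverse metric, and the invariance of the contraction v^μ w_μ under a boost:

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])   # Lorentz metric, standard diagonal form
g_inv = np.linalg.inv(g)               # g^{mu nu}; equals g in this basis

v_up = np.array([2.0, 1.0, -0.5, 3.0])  # contravariant components v^mu

# Index lowering, Eq. (3.15): v_mu = g_{mu nu} v^nu
v_down = g @ v_up
assert np.allclose(v_down, [2.0, -1.0, 0.5, -3.0])  # v_0 = v^0, v_i = -v^i

# Raising recovers the original vector: v^mu = g^{mu nu} v_nu
assert np.allclose(g_inv @ v_down, v_up)

# The contraction v^mu w_mu is a Lorentz scalar: invariant under boosts
beta, gamma = 0.6, 1.25
L = np.eye(4)
L[:2, :2] = [[gamma, -beta * gamma], [-beta * gamma, gamma]]  # boost along x1
w_up = np.array([1.0, 0.5, 0.0, -1.0])
scalar_before = (g @ v_up) @ w_up
scalar_after = (g @ (L @ v_up)) @ (L @ w_up)
assert np.isclose(scalar_before, scalar_after)
print("index gymnastics checks passed")
```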

Consider a map A: V → V, v ↦ Av. In general, the action of the canonical covariant vector assigned to Av on transformed vectors Aw need not equal the original value v̄(w). However, it is natural to focus on those transformations that are compatible with the canonical assignment, i.e. for which the dual of Av acting on Aw equals v̄(w). In a component language, this condition translates to

(Aᵀ)_μ^σ g_{σρ} A^ρ_ν = g_{μν},

or AᵀgA = g. I.e., transformations compatible with the canonical identification (vector space) ↔ (dual space) have to respect the metric. E.g., in the case of the Lorentz metric, the 'good' transformations belong to the Lorentz group.

Summarizing, we have seen that the invariant meaning behind contra– and covariant vectors is that of vectors and dual vectors, respectively. Under Lorentz transformations these objects behave as is required by the component definitions given in the main text. A general tensor of degree (n, m) is an element of (⊗ⁿV) ⊗ (⊗ᵐV*).

The definitions above may be extended to objects of higher complexity. A two–component quantity W^{μν} is called a contravariant tensor of second degree if it transforms under Lorentz transformations as W^{μν} → Λ^μ_{μ′} Λ^ν_{ν′} W^{μ′ν′}. Similarly, a covariant tensor of second degree transforms as W_{μν} → (Λ^{T−1})_μ^{μ′} (Λ^{T−1})_ν^{ν′} W_{μ′ν′}. Covariant and contravariant tensors are related to each other by index raising/lowering, e.g. W_{μν} = g_{μμ′} g_{νν′} W^{μ′ν′}. A mixed second rank tensor W_μ^ν transforms as W_μ^ν → (Λ^{T−1})_μ^{μ′} Λ^ν_{ν′} W_{μ′}^{ν′}. The generalization of these definitions to tensors of higher degree should be obvious.

Finally, we define a contravariant vector field (as opposed to a fixed vector) as a field v^μ(x) that transforms under Lorentz transformations as v^μ(x) → v′^μ(x′) = Λ^μ_ν v^ν(x). A Lorentz scalar (field), or tensor field of degree zero, is one that does not actively transform under Lorentz transformations: φ(x) → φ′(x′) = φ(x). Covariant vector fields, tensor fields, etc. are defined in an analogous manner.


The summation over a contravariant and a covariant index is called a contraction. For example, v^μ w_μ ≡ φ is the contraction of a contravariant and a covariant vector (field). As a result, we obtain a Lorentz scalar, i.e. an object that does not change under Lorentz transformations. The contraction of two indices lowers the tensor degree of an object by two. For example, the contraction of a mixed tensor of degree two and a contravariant tensor of degree one, A^μ_ν v^ν ≡ w^μ, obtains a contravariant tensor of degree one, etc.

3.4.2 The relativistic covariance of electrodynamics

We now have everything in store to prove the relativistic covariance of electrodynamics, i.e. the form–invariance of its basic equations under Lorentz transformations. Basically, what we need to do is assign a definite transformation behaviour (scalar, co–/contravariant tensor, etc.) to the building blocks of electrodynamics.

The coordinate vector x^μ is a contravariant vector. In electrodynamics we frequently differentiate w.r.t. the components x^μ, i.e. the next question to ask is whether the four–dimensional gradient {∂_{x^μ}} is co– or contravariant. According to the chain rule,

∂/∂x′^{μ′} = (∂x^μ / ∂x′^{μ′}) ∂/∂x^μ.

Using that x^μ = (Λ⁻¹)^μ_{μ′} x′^{μ′}, we find that

∂/∂x′^{μ′} = (Λ^{−1T})_{μ′}^μ ∂/∂x^μ,

i.e. ∂_{x^μ} ≡ ∂_μ transforms as a covariant vector. In components,

∂_μ = (c⁻¹∂_t, ∇),   ∂^μ = (c⁻¹∂_t, −∇).

The four–current vector

We begin by showing that the four–component object j = (cρ, j) is a Lorentz vector. Consider the current density carried by a point particle at coordinate x(t) in some coordinate system K. The four–current density carried by the particle is given by j(x) = q(c, d_t x(t)) δ(x − x(t)), or j^μ(x) = q d_t x^μ(t) δ(x − x(t)), where x^μ(t) = (ct, x(t)). To show that this is a contravariant vector, we introduce a dummy integration,

j^μ(x) = q ∫ dt̄ (dx^μ(t̄)/dt̄) δ(x − x(t̄)) δ(t − t̄) = qc ∫ dτ (dx^μ/dτ) δ⁴(x − x(τ)),

where in the last step we switched to an integration over the proper time τ = τ(t̄) uniquely assigned to the world line parameterization x(t̄) = (ct̄, x(t̄)) in K. Now, the four–component δ–distribution δ⁴(x) = δ(x)δ(ct) is a Lorentz scalar (why?). The proper time τ also is a Lorentz scalar. Thus, the transformation behaviour of j^μ is dictated by that of x^μ, i.e. j^μ defines a contravariant vector.

As an important corollary we note that the continuity equation — the contraction of the covariant vector ∂_μ and the contravariant vector j^μ — is Lorentz invariant: ∂_μ j^μ = 0 is a Lorentz scalar.


Electric field strength tensor and vector potential

In section 3.3.2 we obtained Eq. (3.13) as the Lorentz invariant generalization of Newton's equations. Since v^μ and v_ν transform as contravariant and covariant vectors, respectively, the matrix F^{μν} must transform as a contravariant tensor of rank two. Now, let us define the four–vector potential as {A^μ} = (φ, A). It is then a straightforward exercise to verify that the field strength tensor is obtained from the vector potential as

F^{μν} = ∂^μ A^ν − ∂^ν A^μ.   (3.16)

(Just work out the antisymmetric combination of derivatives and compare to the definitions B = ∇ × A, E = −∇φ − c⁻¹∂_t A.) This implies that the four–component object A^μ indeed transforms as a contravariant vector.
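The suggested exercise can be carried out symbolically. The sketch below (using sympy, with generic placeholder functions φ, A1, A2, A3 for the potentials) builds F^{μν} = ∂^μ A^ν − ∂^ν A^μ from Eq. (3.16) and confirms that its entries coincide with the matrix (3.14):

```python
import sympy as sp

t, x, y, z, c = sp.symbols('t x y z c')
phi = sp.Function('phi')(t, x, y, z)
A1, A2, A3 = (sp.Function(f'A{i}')(t, x, y, z) for i in (1, 2, 3))

A_up = [phi, A1, A2, A3]                 # four-vector potential A^mu = (phi, A)
d_up = [lambda f: sp.diff(f, t) / c,     # contravariant derivative
        lambda f: -sp.diff(f, x),        # d^mu = (dt/c, -grad)
        lambda f: -sp.diff(f, y),
        lambda f: -sp.diff(f, z)]

# Field strength tensor F^{mu nu} = d^mu A^nu - d^nu A^mu, Eq. (3.16)
F = [[sp.simplify(d_up[m](A_up[n]) - d_up[n](A_up[m]))
      for n in range(4)] for m in range(4)]

# Fields from the potentials: E = -grad(phi) - dt(A)/c, B = curl(A)
E = [-sp.diff(phi, r) - sp.diff(Ai, t) / c
     for r, Ai in zip((x, y, z), (A1, A2, A3))]
B = [sp.diff(A3, y) - sp.diff(A2, z),
     sp.diff(A1, z) - sp.diff(A3, x),
     sp.diff(A2, x) - sp.diff(A1, y)]

# Compare with the explicit matrix (3.14)
for i in range(3):
    assert sp.simplify(F[0][i + 1] + E[i]) == 0   # F^{0i} = -E_i
assert sp.simplify(F[1][2] + B[2]) == 0           # F^{12} = -B_3
assert sp.simplify(F[2][3] + B[0]) == 0           # F^{23} = -B_1
print("F^{mu nu} from the potentials reproduces E and B")
```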

Now, we have seen in chapter 1.2 that the Lorentz condition can be expressed as ∂_μ A^μ = 0, i.e. in a Lorentz invariant form. In the Lorentz gauge, the inhomogeneous wave equations assume the form

□ A^μ = (4π/c) j^μ,   (3.17)

where □ = ∂^μ ∂_μ is the Lorentz invariant wave operator. Formally, this completes the proof of the Lorentz invariance of the theory. The combination Lorentz condition/wave equations, which, as we saw, carries the same information as the Maxwell equations, has been proven to be invariant. However, to stay in closer contact with the original formulation of the theory, we next express Maxwell's equations themselves in a manifestly invariant form.

of the Lorentz invariance of the theory. The combination Lorentz–condition/wave equations,which we saw carries the same information as the Maxwell equations, has been proven to beinvariant. However, to stay in closer contact to the original formulation of the theory, we nextexpress Maxwells equations themselves in a manifestly invariant form.

Invariant formulation of Maxwell's equations

One may verify by direct inspection that the two inhomogeneous Maxwell equations can be formulated in a covariant manner as

∂_μ F^{μν} = (4π/c) j^ν.   (3.18)

To obtain the covariant formulation of the homogeneous equations, a bit of preparatory work is required: let us define the fourth rank antisymmetric tensor as

ε^{μνρσ} ≡ { +1, (μ, ν, ρ, σ) = (0, 1, 2, 3) or an even permutation thereof,
             −1, for an odd permutation,
              0, else.   (3.19)

One may show (do it!) that ε^{μνρσ} transforms as a contravariant tensor of rank four (under the transformations of the unit–determinant subgroup L₊). Contracting this object with the covariant field strength tensor F_{μν}, we obtain a contravariant tensor of rank two,

F̃^{μν} ≡ (1/2) ε^{μνρσ} F_{ρσ},


known as the dual field strength tensor. Using (3.14), it is straightforward to verify that

F̃ = {F̃^{μν}} = (  0   −B1  −B2  −B3 )
                (  B1   0    E3  −E2 )
                (  B2  −E3   0    E1 )
                (  B3   E2  −E1   0  ),   (3.20)

i.e. that F̃ is obtained from F by the replacement E → B, B → −E. One also verifies that the homogeneous equations assume the covariant form

∂_μ F̃^{μν} = 0.   (3.21)

This completes the invariance proof. Critical readers may find the definition of the dual tensor somewhat unmotivated. Also, the excessive use of indices — something one normally tries to avoid in physics — does not help to make the structure of the theory transparent. Indeed, there exists a much more satisfactory, coordinate independent formulation of the theory in terms of differential forms. However, as we do not assume familiarity of the reader with the theory of forms, all we can do is encourage him/her to learn it and to advance to the truly invariant formulation of electrodynamics not discussed in this text ...
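The construction of the dual tensor can also be verified directly. The sketch below (arbitrary field values, chosen for illustration) implements ε^{μνρσ} from Eq. (3.19), lowers the indices of F with the metric, and confirms that F̃^{μν} = ½ ε^{μνρσ} F_{ρσ} amounts to the replacement E → B, B → −E:

```python
import numpy as np
from itertools import permutations

def field_tensor(E, B):
    """Contravariant F^{mu nu} as in Eq. (3.14)."""
    E1, E2, E3 = E; B1, B2, B3 = B
    return np.array([[0, -E1, -E2, -E3],
                     [E1, 0, -B3, B2],
                     [E2, B3, 0, -B1],
                     [E3, -B2, B1, 0]], dtype=float)

def levi_civita():
    """epsilon^{mu nu rho sigma}, Eq. (3.19), by counting inversions."""
    eps = np.zeros((4, 4, 4, 4))
    for perm in permutations(range(4)):
        inversions = sum(perm[i] > perm[j]
                         for i in range(4) for j in range(i + 1, 4))
        eps[perm] = (-1) ** inversions
    return eps

g = np.diag([1.0, -1.0, -1.0, -1.0])
E = np.array([1.0, 2.0, 3.0])     # arbitrary field values
B = np.array([-4.0, 5.0, 6.0])

F_up = field_tensor(E, B)
F_down = g @ F_up @ g             # F_{mu nu} by lowering both indices
eps = levi_civita()

# Dual tensor: Ftilde^{mu nu} = (1/2) eps^{mu nu rho sigma} F_{rho sigma}
F_dual = 0.5 * np.einsum('mnrs,rs->mn', eps, F_down)

# The duality map sends E -> B and B -> -E, as stated below Eq. (3.20)
assert np.allclose(F_dual, field_tensor(B, -E))
print("dual field strength tensor verified")
```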


Chapter 4

Lagrangian mechanics

Newton's equations of motion provide a complete description of mechanical motion. Even from today's perspective, their scope is limited only by velocity (at high velocities v ∼ c, Newtonian mechanics has to be replaced by its relativistic generalization) and classicality (the dynamics of small bodies is affected by quantum effects). Otherwise, they remain fully applicable, which is remarkable for a theory that old.

Newtonian theory had its first striking successes in celestial mechanics. But how useful is this theory in a context more worldly than that of a planet in open space? To see the justification of this question, consider the system shown in the figure, a setup known as the Atwood machine: two bodies subject to gravitational force are tied to each other by an idealized massless string over an idealized frictionless pulley. What makes this problem different from those considered in celestial mechanics is that the participating bodies (the two masses) are constrained in their motion. The question then arises how this constraint can be incorporated into (Newton's) mechanical equations of motion, and how the ensuing equations can be solved. This problem is of profound applied relevance: of the hundreds of motions taking place in, say, the engine of a car, practically all are constrained. In the eighteenth century, the era initiating the age of engineering and industrialization, the availability of a formalism capable of efficient formulation of problems subject to mechanical constraints became pressing.

Early solution schemes in terms of Newtonian mechanics relied on the concept of "constraining forces". The strategy there was to formulate a problem in terms of its basal unconstrained variables (e.g., the real-space coordinates of the two masses in the figure). In a second step one would then introduce (infinitely strong) constraining forces serving to reduce the number of free coordinates (e.g., down to the one coordinate measuring the height of one of the masses in the figure). However, strategies of this type soon turned out to be operationally sub-optimal. The need to find more efficient formulations was the motivation for intensive research activity, which eventually culminated in the modern formulations of classical mechanics: Lagrangian and Hamiltonian mechanics.


INFO There are a number of conceptually different types of mechanical constraints: constraints that can be expressed in terms of equalities such as f(q₁, …, q_n; q̇₁, …, q̇_n) = 0 are called holonomic. Here, q₁, …, q_n are the basal coordinates of the problem, and q̇₁, …, q̇_n the corresponding velocities. (The constraint of the Atwood machine belongs to this category: x₁ + x₂ = l, where l is a constant, and x_i, i = 1, 2, are the heights of the two bodies measured with respect to a common reference height.) This has to be contrasted to non-holonomic constraints, i.e. constraints formulated as inequalities. (Think of the molecules moving in a piston. Their coordinates obey the constraint 0 ≤ q_j ≤ L_j, j = 1, 2, 3, where L_j are the extensions of the piston.)

Constraints explicitly involving time, f(q₁, …, q_n; q̇₁, …, q̇_n; t) = 0, are called rheonomic. For example, a particle constrained to move on a moving surface is subject to a rheonomic constraint. Constraints void of explicit time dependence are called scleronomic.

In this chapter, we will introduce the concept of variational principles to derive Lagrangian (and later Hamiltonian) mechanics from their Newtonian ancestor. Our construction falls short of elucidating the beautiful and important historical developments that eventually led to the modern formulation. Also, it is difficult to motivate in advance. However, within the framework of a short introductory course, these shortcomings are outweighed by the brevity of the derivation. Still, it is highly recommended to consult a textbook on classical mechanics to learn more about the historical developments that led from Newtonian to Lagrangian mechanics.

Let us begin with a few simple and seemingly uninspired manipulations of Newton's equation mq̈ = F for a single particle (the generalization to many particles will be obvious) subject to a conservative force F. Now, notice that the l.h.s. of the equation may be written as mq̈ = d_t ∂_q̇ T, where T = T(q̇) = (m/2) q̇² is the particle's kinetic energy. By definition, the conservative force affords a representation F = −∂_q U, where U = U(q) is the potential. This means that Newton's equation may be equivalently represented as

(d_t ∂_q̇ − ∂_q) L(q, q̇) = 0,    (4.1)

where we have defined the Lagrangian function,

L = T − U. (4.2)

But what is this reformulation good for? To appreciate the meaning of the mathematical structure of Eq. (4.1), we need to introduce the purely mathematical concept of

4.1 Variational principles

In standard calculus, one is concerned with functions F(v) that take vectors v ∈ Rⁿ as arguments. Variational calculus generalizes standard calculus in that one considers "functions" F[f] taking functions as arguments. Now, "function of a function" does not sound nice. This may be the reason why the "function" F is actually called a functional. Similarly, it is customary to indicate the argument of a functional in square brackets. The generalization v → f is not in the least mysterious: as we have seen in previous chapters, one may discretize a function (cf. the discussion in section 1.3.4), f → {f_i | i = 1, …, N}, thus interpreting it as


Figure 4.1: On the mathematical setting of functionals on curves (discussion, see text)

the limiting case of an N-dimensional vector; in many aspects, variational calculus amounts to a straightforward generalization of standard calculus. At any rate, we will see that one may work with functionals much like with ordinary functions.

4.1.1 Definitions

In this chapter we will focus on the class of functionals relevant to classical mechanics, viz. functionals F[γ] taking curves in (subsets of) Rⁿ as arguments.¹ To be specific, consider the set of all curves M ≡ {γ : I ≡ [t₀, t₁] → U} mapping an interval I into a subset U ⊂ Rⁿ of n-dimensional space (see Fig. 4.1).² Now, consider a mapping

Φ : M → R,  γ ↦ Φ[γ],    (4.3)

assigning to each curve γ a real number, i.e. a (real) functional on M.

EXAMPLE The length of a curve is defined as

L[γ] ≡ ∫_{t₀}^{t₁} dt ( Σ_{i=1}^{n} γ̇_i(t) γ̇_i(t) )^{1/2}.    (4.4)

¹ We have actually met with more general functionals before. For example, the electric susceptibility χ[E] is a functional of the electric field E : R⁴ → R³.

² The set U may actually be a lower-dimensional submanifold of Rⁿ. For example, we might consider curves in a two-dimensional plane embedded in R³, etc.


It assigns to each curve its euclidean length. Some readers may question the consistency of the notation: on the l.h.s. we have the symbol γ (no derivatives), and on the r.h.s. γ̇ (temporal derivatives). However, there is no contradiction here. The notation [γ] indicates dependence on the curve as a geometric object. By definition, this contains the full information on the curve, including all derivatives. The r.h.s. indicates that the functional L reads only partial information on the curve, viz. that contained in first derivatives.

Consider now two curves γ, γ′ ∈ M that lie "close" to each other. (For example, we may require that |γ(t) − γ′(t)| ≡ ( Σ_i (γ_i(t) − γ′_i(t))² )^{1/2} < ε for all t and some small positive ε.) We are interested in the increment Φ[γ] − Φ[γ′]. Defining γ′ = γ + h, the functional Φ is called differentiable iff

Φ[γ + h] − Φ[γ] = F|_γ[h] + O(h²),    (4.5)

where F|_γ[h] is a linear functional of h, i.e. a functional obeying F|_γ[c₁h₁ + c₂h₂] = c₁F|_γ[h₁] + c₂F|_γ[h₂] for c₁, c₂ ∈ R and h₁,₂ ∈ M. In (4.5), O(h²) indicates residual contributions of order h². For example, if |h(t)| < ε for all t, these terms would be of O(ε²). The functional F|_γ is called the differential of the functional Φ at γ. Notice that F|_γ need not depend linearly on γ. The differential generalizes the notion of a derivative to functionals. Similarly, we may think of Φ[γ + h] = Φ[γ] + F|_γ[h] + O(h²) as a generalized Taylor expansion. The linear functional F|_γ describes the behavior of Φ in the vicinity of the reference curve γ. A curve γ is called an extremal curve of Φ if F|_γ = 0.

EXERCISE Re-familiarize yourself with the definition of the derivative f′(x) of higher-dimensional functions f : Rᵏ → R. Interpret the functional Φ[γ] as the limit of a function Φ : Rᴺ → R, {γ_i} ↦ Φ({γ_i}), where the vector {γ_i | i = 1, …, N} is a discrete approximation of the curve γ. Think how the definition (4.5) generalizes the notion of differentiability and how F|_γ ↔ f′(x) generalizes the definition of a derivative.
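The discretization suggested in the exercise can be tried numerically. The sketch below (plain NumPy; the reference curve and the variation are made up for illustration) approximates the length functional (4.4) on a sampled curve and checks the defining property (4.5): the increment Φ[γ + εh] − Φ[γ] is, to leading order, linear in the variation.

```python
import numpy as np

# Discretize a curve gamma : [0, 1] -> R^2 by N sample points; the length
# functional (4.4) then becomes a finite sum over discrete velocities.
def length(gamma, dt):
    d = np.diff(gamma, axis=0) / dt                    # discrete velocities
    return np.sum(np.sqrt(np.sum(d**2, axis=1))) * dt

N = 1001
t = np.linspace(0.0, 1.0, N)
dt = t[1] - t[0]
gamma = np.stack([t, t**2], axis=1)                     # reference curve
h = np.stack([np.sin(np.pi * t), t * (1 - t)], axis=1)  # variation, h(0) = h(1) = 0

# Differentiability (4.5): the increment is F|_gamma[eps*h] + O(eps^2),
# so doubling eps should (nearly) double the increment.
eps = 1e-3
d1 = length(gamma + eps * h, dt) - length(gamma, dt)
d2 = length(gamma + 2 * eps * h, dt) - length(gamma, dt)
print(d2 / d1)  # close to 2
```

The deviation of the printed ratio from 2 is of order ε, i.e. it stems from the O(h²) terms in (4.5).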

This is about as much as we need to say/define in most general terms. In the next section we will learn how to determine the extremal curves for an extremely important sub-family of functionals.

4.1.2 Euler–Lagrange equations

In the following, we will focus on functionals that afford a "local representation",

S[γ] = ∫_{t₀}^{t₁} dt L(γ(t), γ̇(t), t),    (4.6)


where L : Rⁿ ⊕ Rⁿ ⊕ R → R is a function. The functional S is "local" in that the integral kernel L does not depend on points on the curve at different times. Local functionals play an important role in applications. (For example, the length functional (4.4) belongs to this family.) We will consider the restriction of local functionals to the set of all curves γ ∈ M that begin and end at common terminal points: γ(t₀) ≡ γ₀ and γ(t₁) = γ₁ with fixed γ₀ and γ₁ (see the figure). Again, this is a restriction motivated by the applications below. To keep the notation slim, we will denote the space of all curves thus restricted again by M.

We now prove the following important fact: the local functional S[γ] is differentiable, and its differential is given by

F|_γ[h] = ∫_{t₀}^{t₁} dt (∂_γ L − d_t ∂_γ̇ L) · h.    (4.7)

Here, we are using the shorthand notation ∂_γ L · h ≡ Σ_{i=1}^{n} ∂_{x_i} L(x, γ̇, t)|_{x=γ} h_i, and analogously for ∂_γ̇ L · ḣ. Eq. (4.7) is verified by straightforward Taylor series expansion:

S[γ + h] − S[γ] = ∫_{t₀}^{t₁} dt ( L(γ + h, γ̇ + ḣ, t) − L(γ, γ̇, t) )

                = ∫_{t₀}^{t₁} dt [ ∂_γ L · h + ∂_γ̇ L · ḣ ] + O(h²)

                = ∫_{t₀}^{t₁} dt [ ∂_γ L − d_t ∂_γ̇ L ] · h + ∂_γ̇ L · h |_{t₀}^{t₁} + O(h²),

where in the last step we have integrated by parts. The surface term vanishes because h(t₀) = h(t₁) = 0, on account of the condition γ(t_i) = (γ + h)(t_i) = γ_i, i = 0, 1. Comparison with the definition (4.5) then readily gets us to (4.7).

Eq. (4.7) entails an important corollary: the local functional S is extremal on all curves obeying the so-called Euler–Lagrange equations

d/dt ∂L/∂γ̇_i − ∂L/∂γ_i = 0,  i = 1, …, n.

The reason is that if and only if these n conditions hold will the linear functional (4.7) vanish on arbitrary curves h. (Exercise: assuming that one or several of the conditions above are violated, construct a curve h on which the functional (4.7) does not vanish.)


Let us summarize what we have got: for a given function L : Rⁿ ⊕ Rⁿ ⊕ R → R, the local functional

S : M → R,  γ ↦ S[γ] ≡ ∫_{t₀}^{t₁} dt L(γ(t), γ̇(t), t),    (4.8)

defined on the set M = {γ : [t₀, t₁] → Rⁿ | γ(t₀) = γ₀, γ(t₁) = γ₁}, is extremal on curves obeying the conditions

d/dt ∂L/∂γ̇_i − ∂L/∂γ_i = 0,  i = 1, …, n.    (4.9)

These equations are called the Euler–Lagrange equations of the functional S. The function L is often called the Lagrangian (function) and S an action functional.

At this stage, one may observe a suspicious structural similarity between the Euler–Lagrange equations and our early reformulation of Newton's equations (4.1). However, before shedding more light on this connection, it is worthwhile to illustrate the usage of Euler–Lagrange calculus on an

EXAMPLE Let us ask which curve between fixed initial and final points γ₀ and γ₁ has extremal curve length. (Here, extremal means "shortest"; there is no "longest" curve.) To answer this question, we need to formulate and solve the Euler–Lagrange equations of the functional (4.4). Differentiating its Lagrangian L(γ̇) = ( Σ_{i=1}^{n} γ̇_i γ̇_i )^{1/2}, we obtain (show it)

d/dt ∂L/∂γ̇_i − ∂L/∂γ_i = γ̈_i / L(γ̇) − γ̇_i (γ̇_j γ̈_j) / L(γ̇)³.

These equations are solved by all curves connecting γ₀ and γ₁ along a straight line. For example, the "constant velocity" curve

γ(t) = γ₀ (t − t₁)/(t₀ − t₁) + γ₁ (t − t₀)/(t₁ − t₀)

has γ̈_i = 0, thus solving the equation. (Exercise: the most general curve connecting γ₀ and γ₁ along a straight line is given by γ(t) = γ₀ + f(t)(γ₁ − γ₀), where f(t₀) = 0 and f(t₁) = 1. In general, these curves describe "accelerated motion", γ̈ ≠ 0. Still, they solve the Euler–Lagrange equations, i.e. they are of shortest geometric length. Show it!)
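The exercise can also be checked numerically. The following sketch (NumPy; endpoints chosen arbitrarily) compares the discrete polyline length of the constant-velocity parameterization with that of an accelerated one, f(t) = t³: both trace the same straight line and therefore have length |γ₁ − γ₀|.

```python
import numpy as np

# Two parameterizations gamma(t) = gamma0 + f(t) (gamma1 - gamma0) of the same
# straight segment: f(t) = t (constant velocity) and f(t) = t^3 (accelerated).
g0, g1 = np.array([0.0, 0.0]), np.array([3.0, 4.0])
t = np.linspace(0.0, 1.0, 10001)

def curve_length(gamma):
    # polyline approximation of the length functional (4.4)
    return np.sum(np.sqrt(np.sum(np.diff(gamma, axis=0)**2, axis=1)))

uniform = g0 + t[:, None] * (g1 - g0)
accel = g0 + (t**3)[:, None] * (g1 - g0)

print(curve_length(uniform), curve_length(accel))  # both equal |g1 - g0| = 5
```

Since the sample points of both curves are collinear and ordered, the polyline lengths agree exactly: only the geometry of the curve, not its parameterization, enters the length.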

4.1.3 Coordinate invariance of Euler-Lagrange equations

What we actually mean when we write ∂L/∂γ_i(t) is: differentiate the function L(γ, γ̇, t) w.r.t. the ith coordinate of the curve γ at time t. However, the same curve can be represented in different coordinates! For example, a three-dimensional curve γ will have different representations depending on whether we work in cartesian coordinates γ(t) ↔ (x₁(t), x₂(t), x₃(t)) or spherical coordinates γ(t) ↔ (r(t), θ(t), φ(t)).


Figure 4.2: On the representation of curves and functionals in different coordinates; the map φ′ ∘ φ⁻¹ translates between the two coordinate domains

Yet, nowhere in our derivation of the Euler–Lagrange equations did we make reference to specific properties of the coordinate system. This suggests that the Euler–Lagrange equations assume the same form, Eq. (4.9), in all coordinate systems. (To appreciate the meaning of this statement, compare with the Newton equations, which assume their canonical form ẍ_i = f_i(x) only in cartesian coordinates.) Coordinate changes play an extremely important role in analytical mechanics, especially when it comes to problems with constraints. Hence, anticipating that the Euler–Lagrange formalism is slowly revealing itself as a replacement of Newton's formulation, it is time well invested to take a close look at the role of coordinates.

Both a curve γ and the functional S[γ] are canonical objects; no reference to coordinates is made here. The curve is simply a map γ : I → U, and S[γ] assigns to that map a number. However, most of the time when we actually need to work with a curve, we do so in a system of coordinates. Mathematically, a coordinate system of U is a diffeomorphic³ map

φ : U → V,  x ↦ φ(x) ≡ (x₁, …, x_m),

where the coordinate domain V is an open subset of Rᵐ and m is the dimensionality of U. (The set U may be of lower dimension than the embedding space Rⁿ.) For example, spherical coordinates ]0, π[ × ]0, 2π[ ∋ (θ, φ) ↦ S² ⊂ R³ assign to each coordinate pair (θ, φ) a point on the two-dimensional sphere, etc.⁴

³ Loosely speaking, this means invertible and smooth (differentiable).

⁴ We here avoid the discussion of the complications arising when U cannot be covered by a single coordinate "chart" φ. For example, the sphere S² cannot be fully covered by a single coordinate mapping. (Think why.)


Given a coordinate system φ, the abstract curve γ defines a curve x ≡ φ ∘ γ : I → V, t ↦ x(t) = φ(γ(t)) in coordinate space (see Fig. 4.2). What we actually mean when we refer to γ_i(t) in the previous sections are the coordinates x_i(t) of γ in the coordinate system φ. (In cases where unambiguous reference to one coordinate system is made, it is customary to avoid the introduction of extra symbols x_i and to write γ_i instead. As long as one knows what one is doing, no harm is done.) Similarly, the abstract functional S[γ] defines a functional S_c[x] ≡ S[φ⁻¹ ∘ x] on curves in the coordinate domain. In our derivation of the Euler–Lagrange equations we have been making tacit use of a coordinate representation of this kind. In other words, the Euler–Lagrange equations were actually derived for the representation S_c[x]. (Again, it is customary to simply write S[x], omitting the subscript c, or just S[γ], where γ is tacitly identified with its vector of coordinates in a specific system.)

Now, suppose we are given another coordinate representation of U, that is, a diffeomorphism φ′ : U → V′, x ↦ φ′(x) = x′ ≡ (x′₁, …, x′_m). A point γ(t) on the curve now has two different coordinate representations, x(t) = φ(γ(t)) and x′(t) = φ′(γ(t)). The coordinate transformation between these representations is mediated by the map

φ′ ∘ φ⁻¹ : V → V′,  x ↦ x′ = φ′ ∘ φ⁻¹(x).

By construction, this is a smooth and invertible map between open subsets of Rᵐ. For example, if x = (x₁, x₂, x₃) are cartesian coordinates and x′ = (r, θ, φ) are spherical coordinates, we would have x′₁(x) = (x₁² + x₂² + x₃²)^{1/2}, etc.

The most important point is that x(t) and x′(t) describe the same curve γ(t), only in different representations. Specifically, if the reference curve γ is an extremal curve, the coordinate representations x : I → V and x′ : I → V′ will be extremal curves, too. According to our discussion in section 4.1.2, both curves must be solutions of the Euler–Lagrange equations, i.e. we can draw the conclusion:

γ extremal ⇒ d/dt ∂L/∂ẋ_i − ∂L/∂x_i = 0  and  d/dt ∂L′/∂ẋ′_i − ∂L′/∂x′_i = 0,    (4.10)

where L′(x′, ẋ′, t) ≡ L(x(x′), ẋ(x′, ẋ′), t) is the Lagrangian in the φ′-coordinate representation. In other words:

The Euler–Lagrange equations are coordinate invariant.
They assume the same form in all coordinate systems.


EXERCISE Above we have shown the coordinate invariance of the Euler–Lagrange equations by conceptual reasoning. However, it must be possible to obtain the same invariance properties by brute-force computation. To show that the second line in (4.10) follows from the first, use the chain rule, ẋ′_i = Σ_j (∂x′_i/∂x_j) ẋ_j, and its immediate consequence ∂ẋ′_i/∂ẋ_j = ∂x′_i/∂x_j.

EXAMPLE Let us illustrate the coordinate invariance of the variational formalism on the example of the functional "curve length" discussed on p. 86. Considering the case of curves in the plane, n = 2, we might get the idea to attack this problem in polar coordinates (r, ϕ). The polar coordinate representation of the cartesian Lagrangian L(x₁, x₂, ẋ₁, ẋ₂) = (ẋ₁² + ẋ₂²)^{1/2} reads (verify it)

L(r, ϕ, ṙ, ϕ̇) = (ṙ² + r²ϕ̇²)^{1/2}.

It is now straightforward to compute the Euler–Lagrange equations

d/dt ∂L/∂ṙ − ∂L/∂r = ϕ̇(…) ≟ 0,  d/dt ∂L/∂ϕ̇ − ∂L/∂ϕ = ϕ̇(…) ≟ 0.

Here, the notation ϕ̇(…) indicates that we get a lengthy list of terms which, however, are all weighted by ϕ̇. Putting the initial point into the origin, φ(γ₀) = (0, 0), and the final point somewhere into the plane, φ(γ₁) = (r₁, ϕ₁), we thus conclude that the straight-line connectors, φ(γ(t)) = (r(t), ϕ₁), are solutions of the Euler–Lagrange equations. (It is less straightforward to show that these are the only solutions.)
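As a numerical cross-check of the invariance (a sketch; the test curve is made up and kept away from the origin, where polar coordinates are singular), one can compute the length of the same plane curve once from the cartesian Lagrangian and once from its polar representation:

```python
import numpy as np

# One curve, two coordinate representations of the length Lagrangian:
# cartesian (xdot1^2 + xdot2^2)^(1/2) vs polar (rdot^2 + r^2 phidot^2)^(1/2).
t = np.linspace(0.0, 1.0, 100001)
dt = t[1] - t[0]
x1, x2 = 1.0 + t, t**2                      # curve staying in the half plane x1 > 0

r = np.sqrt(x1**2 + x2**2)
phi = np.arctan2(x2, x1)                    # no branch-cut crossing for x1 > 0

speed_cart = np.sqrt(np.gradient(x1, dt)**2 + np.gradient(x2, dt)**2)
speed_polar = np.sqrt(np.gradient(r, dt)**2 + r**2 * np.gradient(phi, dt)**2)

len_cart = np.sum(speed_cart) * dt
len_polar = np.sum(speed_polar) * dt
print(len_cart, len_polar)                  # the two representations agree
```

The agreement reflects the pointwise identity ẋ₁² + ẋ₂² = ṙ² + r²ϕ̇²: the value of the functional does not depend on the coordinates used to evaluate it.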

With these preparations in place, we are now in a very good position to apply variational calculus to a new and powerful reformulation of Newtonian mechanics.

4.2 Lagrangian mechanics

4.2.1 The idea

Imagine a mechanical problem subject to constraints. For definiteness, we may consider the system shown on the right: a bead sliding on a string and subject to the gravitational force F_g. In principle, we may describe the situation in terms of Newton's equations. These equations must contain an "infinitely strong" force F_c whose sole function is to keep the bead on the string. About this force we do not know much (other than that it acts perpendicularly to the string, and vanishes right on the string).

According to the structures outlined above, we may now reformulate Newton's equations as (4.1), where the potential V = V_g + V_c in the Lagrangian L = T − V contains two contributions: one accounting for the gravitational force,


V_g, and another, V_c, for the force F_c. We also know that the sought-for solution curve q : I → R³, t ↦ q(t) will extremize the action S[q] = ∫_{t₀}^{t₁} dt L(q, q̇). (Our problem does not include explicit time dependence, i.e. L does not carry a time argument.) So far, we have not gained much. But let us now play the trump card of the new formulation: its invariance under coordinate changes.

In the above formulation of the problem, we are seeking an extremum of S[q] on the set of all curves I → R³. However, we know that all curves in R³ will be subject to the "infinitely strong" potential of the constraint force, unless they lie right on the string S. The action of those generic curves will be infinitely large, and we may remove them from the set of curves entering the variational procedure from the outset. This observation suggests representing the problem in terms of coordinates (s, q⊥), where the one-dimensional coordinate s parameterizes the string, and the (3 − 1)-dimensional coordinate vector q⊥ parameterizes the space normal to it. Knowing that curves with non-vanishing q⊥(t) will have an infinitely large action, we may restrict the set of curves under consideration to M ≡ {γ : I → S, t ↦ (s(t), 0)}, i.e. to curves in S. On the string S, the constraint force F_c vanishes. The above limitation thus entails that the constraint forces will never explicitly appear in the variational procedure; this is an enormous simplification of the problem. Also, our problem has effectively become one-dimensional: the Lagrangian evaluated on curves in S is a function L(s(t), ṡ(t)). It is much simpler than the original Lagrangian L(q, q̇) with its constraint force potential.

Before generalizing these ideas to a general solution strategy for problems with constraints, let us illustrate their potential on a concrete problem.

EXAMPLE We consider the Atwood machine depicted on p. 81. The Lagrangian of this problem reads

L(q₁, q₂, q̇₁, q̇₂) = (m₁/2) q̇₁² + (m₂/2) q̇₂² − m₁gx₁ − m₂gx₂.

Here, q_i is the position of mass i, i = 1, 2, and x_i is its height. In principle, the sought-for solution γ(t) = (q₁(t), q₂(t)) is a curve in six-dimensional space. We assume that the masses are released at rest at initial coordinates (x_i(t₀), y_i(t₀), z_i(t₀)). The coordinates y_i and z_i will not change in the process, i.e. effectively we are seeking solution curves in the two-dimensional space of coordinates (x₁, x₂). The constraint now implies that x ≡ x₁ = l − x₂, or x₂ = l − x. Thus, our solution curves are uniquely parameterized by the "generalized coordinate" x as (x₁, x₂) = (x, l − x). We now enter with this parameterization into the Lagrangian above to obtain

L(x, ẋ) = ((m₁ + m₂)/2) ẋ² − (m₁ − m₂)gx + const.

It is important to realize that this function uniquely specifies the action

S[x] = ∫_{t₀}^{t₁} dt L(x(t), ẋ(t))

of physically allowed curves. The extremal curve, which then describes the actual motion of the two-body system, will be a solution of the equation

d_t ∂L/∂ẋ − ∂L/∂x = (m₁ + m₂)ẍ + (m₁ − m₂)g = 0.


For the given initial conditions x(t₀) = x₁(t₀) and ẋ(t₀) = 0, this equation is solved by

x(t) = x(t₀) − [(m₁ − m₂)/(m₁ + m₂)] (g/2) (t − t₀)².

Substitution of this solution into q1 = (x, y1, z1) and q2 = (l − x, y2, z2) solves our problem.
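A quick numerical sanity check of this result (a sketch; the parameter values are made up): substitute the solution back into the Euler–Lagrange equation and verify that the residual (m₁ + m₂)ẍ + (m₁ − m₂)g vanishes.

```python
import numpy as np

# Made-up parameters for the Atwood machine.
m1, m2, g, x0, t0 = 2.0, 1.0, 9.81, 5.0, 0.0

t = np.linspace(t0, t0 + 2.0, 2001)
x = x0 - (m1 - m2) / (m1 + m2) * g / 2 * (t - t0)**2   # the solution found above

xddot = np.gradient(np.gradient(x, t), t)              # finite-difference acceleration
residual = (m1 + m2) * xddot[2:-2] + (m1 - m2) * g     # Euler-Lagrange residual
print(np.max(np.abs(residual)))                        # ~ 0
```

Since x(t) is quadratic in t, the central finite differences reproduce ẍ essentially exactly, and the residual is zero up to round-off.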

4.2.2 Hamilton’s principle

After this preparation, we are in a position to formulate a new approach to solving mechanical problems. Suppose we are given an N-particle setup specified by the following data: (i) a Lagrangian function L : R^{6N+1} → R, (x₁, …, x_N, ẋ₁, …, ẋ_N, t) ↦ L(x₁, …, x_N, ẋ₁, …, ẋ_N, t), defined on the 6N-dimensional space needed to register the coordinates and velocities of N particles (plus time), and (ii) a set of constraints limiting the motion of the particles to an f-dimensional submanifold of R^{3N}.⁵ Mathematically, a (holonomic) set of constraints will be implemented through 3N − f equations

F_j(x₁, …, x_N, t) = 0,  j = 1, …, 3N − f.    (4.11)

The number f is called the number of degrees of freedom of the problem. Hamilton's principle states that such a problem is to be solved by a three-step algorithm:

• Resolve the constraints (4.11) in terms of f parameters q ≡ (q₁, …, q_f), i.e. find a representation x_l(q), l = 1, …, N, such that the constraints F_j(x₁(q), …, x_N(q)) = 0 are satisfied for all j = 1, …, 3N − f. The parameters q_i are called generalized coordinates of the problem. The maximal set of parameter configurations q ≡ (q₁, …, q_f) compatible with the constraints defines a subset V ⊂ R^f, and the map V → R^{3N}, q ↦ (x₁(q), …, x_N(q)) defines an f-dimensional submanifold of R^{3N}.

• Reduce the Lagrangian of the problem to an effective Lagrangian

L(q, q̇, t) ≡ L(x₁(q), …, x_N(q), ẋ₁(q, q̇), …, ẋ_N(q, q̇), t).

In practice, this amounts to a substitution of x_i(t) = x_i(q(t)) into the original Lagrangian. The effective Lagrangian is a function L : V × R^f × R → R, (q, q̇, t) ↦ L(q, q̇, t).

• Finally, formulate and solve the Euler–Lagrange equations

d/dt ∂L(q, q̇, t)/∂q̇_i − ∂L(q, q̇, t)/∂q_i = 0,  i = 1, …, f.    (4.12)

⁵ Loosely speaking, a d-dimensional submanifold of Rⁿ is a subset of Rⁿ that affords a smooth parameterization in terms of d < n coordinates (i.e. is locally diffeomorphic to open subsets of R^d). Think of a smooth surface in three-dimensional space (n = 3, d = 2) or a line (n = 3, d = 1), etc.
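The three steps can be illustrated on a plane pendulum (a sketch; the example system and all parameter values are our own choice, not taken from the text): the constraint x² + y² = l² is resolved by one generalized coordinate θ, the effective Lagrangian becomes L(θ, θ̇) = (m/2)l²θ̇² + mgl cos θ, and the resulting Euler–Lagrange equation is integrated numerically.

```python
import numpy as np

# Step 1: constraint x^2 + y^2 = l^2 resolved by x = l sin(theta),
#         y = -l cos(theta); f = 1 degree of freedom.
# Step 2: effective Lagrangian L = (m/2) l^2 thetadot^2 + m g l cos(theta).
# Step 3: Euler-Lagrange equation m l^2 thetaddot + m g l sin(theta) = 0.
m, l, g = 1.0, 1.0, 9.81

def thetaddot(theta):
    return -(g / l) * np.sin(theta)

def energy(theta, omega):
    return 0.5 * m * l**2 * omega**2 - m * g * l * np.cos(theta)

# Leapfrog integration of the Euler-Lagrange equation; the conserved energy
# serves as a consistency check of the reduced, constraint-free description.
dt, theta, omega = 1e-4, 1.0, 0.0
E0 = energy(theta, omega)
for _ in range(100_000):
    omega += 0.5 * dt * thetaddot(theta)
    theta += dt * omega
    omega += 0.5 * dt * thetaddot(theta)
print(abs(energy(theta, omega) - E0))  # energy drift stays tiny
```

Note that the constraint force (the rod tension) never appears: it was eliminated in step 1, exactly as advertised above.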


The prescription above is equivalent to the following statement, which is known as Hamilton's principle:

Consider a mechanical problem formulated in terms of f generalized coordinates q = (q₁, …, q_f) and a Lagrangian L(q, q̇, t) = (T − U)(q, q̇, t). Let q(t) be a solution of the Euler–Lagrange equations

(d_t ∂_{q̇_i} − ∂_{q_i}) L(q, q̇, t) = 0,  i = 1, …, f,

at given initial and final configurations q(t₀) = q₀ and q(t₁) = q₁. This curve describes the physical motion of the system. It is an extremal curve of the action functional

S[q] = ∫_{t₀}^{t₁} dt L(q, q̇, t).

To conclude this section, let us transfer a number of important physical quantities from Newtonian mechanics to the more general framework of Lagrangian mechanics: the generalized momentum associated to a generalized coordinate q_i is defined as

p_i = ∂L(q, q̇, t)/∂q̇_i.    (4.13)

Notice that for cartesian coordinates, p_i = ∂_{q̇_i}L = ∂_{q̇_i}T = mq̇_i reduces to the familiar momentum variable. We call a coordinate q_i a cyclic variable if it does not enter the Lagrangian function, that is, if ∂_{q_i}L = 0. These two definitions imply that

the generalized momentum corresponding to a cyclic variable is conserved, d_t p_i = 0.

This follows from d_t p_i = d_t ∂_{q̇_i}L = ∂_{q_i}L = 0. In general, we call the derivative ∂_{q_i}L ≡ F_i a generalized force. In the language of generalized momenta and forces, the Lagrange equations (formally) assume the form of Newton-like equations,

d_t p_i = F_i.

For the convenience of the reader, the most important quantities revolving around the Lagrangian formalism are summarized in Table 4.1.

4.2.3 Lagrange mechanics and symmetries

Above, we have emphasized the capacity of the Lagrange formalism to handle problems with constraints. However, an advantage of equal importance is its flexibility in the choice of problem-adjusted coordinates: unlike the Newton equations, the Lagrange equations maintain their form in all coordinate systems.


quantity                     designation or definition
generalized coordinate       q_i
generalized momentum         p_i = ∂L/∂q̇_i
generalized force            F_i = ∂L/∂q_i
Lagrangian                   L(q, q̇, t) = (T − U)(q, q̇, t)
action (functional)          S[q] = ∫_{t₀}^{t₁} dt L(q, q̇, t)
Euler–Lagrange equations     (d_t ∂_{q̇_i} − ∂_{q_i})L = 0

Table 4.1: Basic definitions of Lagrangian mechanics

The choice of "good coordinates" becomes instrumental in problems with symmetries. From experience we know that a symmetry (think of rotational invariance around the z-axis) entails the conservation of a physical variable (the z-component of angular momentum), and that it is important to work in coordinates reflecting the symmetry (cylindrical coordinates around the z-axis). But how do we actually define the term "symmetry"? And how can we find the ensuing conservation laws? Finally, how do we obtain a system of symmetry-adjusted coordinates? In this section, we will provide answers to these questions.

4.2.4 Noether theorem

Consider a family of mappings h_s of the coordinate manifold of a mechanical system into itself,

h_s : V → V,  q ↦ h_s(q).    (4.14)

Here, s ∈ R is a control parameter, and we require that h₀ = id is the identity transform. By way of example, consider the map h_s(r, θ, φ) ≡ (r, θ, φ + s), describing a rotation around the z-axis in the language of spherical coordinates, etc. For each curve q : I → V, t ↦ q(t), the map h_s gives us a new curve h_s ∘ q : I → V, t ↦ h_s(q(t)) (see the figure).

We call the transformation hs a symmetry (transformation) of a mechanical system iff

S[hs ◦ q] = S[q],

for all s and all curves q. The action is then said to be invariant under the symmetry transformation.

Suppose we have found a symmetry and its representation in terms of a family of invariant transformations. Associated to that symmetry, there is a quantity that is conserved during the dynamical evolution of the system. The correspondence between a symmetry and its conservation law is established by a famous result due to Emmy Noether:


Noether theorem (1915–18): Let the action S[q] be invariant under the transformation q ↦ h_s(q). Let q(t) be a solution curve of the system (a solution of the Euler–Lagrange equations). Then, the quantity

I(q, q̇) ≡ Σ_{i=1}^{f} p_i (d/ds)(h_s(q))_i    (4.15)

is dynamically conserved: d_t I(q, q̇) = 0. Here, p_i = ∂_{q̇_i}L|_{(q,q̇)=(h_s(q), d_t h_s(q))} is the generalized momentum of the ith coordinate, and the quantity I(q, q̇) is known as the Noether momentum.

The proof of Noether's theorem is straightforward: the requirement of action invariance under the transformation h_s is equivalent to the condition d_s S[h_s ∘ q] = 0 for all values of s. We now consider the action

S[q] = ∫_{t₀}^{t_f} dt L(q, q̇, t),

which a solution curve samples during a time interval [t₀, t_f]. Here, t_f is a free variable, and the transformed value h_s(q(t_f)) may differ from q(t_f). We next explore what the condition of action invariance tells us about the Lagrangian of the theory:

0 ≟ d_s ∫_{t₀}^{t_f} dt L(h_s(q), d_t h_s(q), t)

  = ∫_{t₀}^{t_f} dt ( ∂_{q_i}L|_{(h_s(q), d_t h_s(q))} d_s(h_s(q))_i + ∂_{q̇_i}L|_{(h_s(q), d_t h_s(q))} d_s d_t(h_s(q))_i )

  = ∫_{t₀}^{t_f} dt ( ∂_{q_i}L − d_t ∂_{q̇_i}L )|_{(h_s(q), d_t h_s(q))} d_s(h_s(q))_i + ∂_{q̇_i}L|_{(h_s(q), d_t h_s(q))} d_s(h_s(q))_i |_{t₀}^{t_f},

where summation over the repeated index i is implied and we have integrated by parts in the last step.

Now, our reference curve is a solution curve, which means that the integrand in the last line vanishes. We are thus led to the conclusion that ∂_{q̇_i}L|_{(h_s(q), d_t h_s(q))} d_s(h_s(q))_i |_{t₀}^{t_f} = 0 for all values of t_f, and this is equivalent to the constancy of the Noether momentum.

It is often convenient to evaluate both the symmetry transformation h_s and the Noether momentum I = I(s) at infinitesimal values of the control parameter s. In practice, this means that we fix a pair (q, q̇) comprising coordinate and velocity of a solution curve. We then consider an infinitesimal transformation h_ε(q), where ε is infinitesimally small. The Noether momentum is then obtained as

I(q, q̇) = Σ_{i=1}^{f} p_i (d/dε)|_{ε=0} (h_ε(q))_i.    (4.16)


It is usually convenient to work in symmetry-adjusted coordinates, i.e. in coordinates where the transformation h_s assumes its simplest possible form. These are coordinates in which h_s acts by translation in one coordinate direction, that is, (h_s(q))_i = q_i + sδ_{ij}, where j is the affected coordinate direction. Coordinates adjusted to a symmetry are cyclic (think about this point), and the Noether momentum

I = p_j

collapses to the standard momentum of the symmetry coordinate.

4.2.5 Examples

In this section, we will discuss two prominent examples of symmetries and their conservation laws.

Translational invariance ↔ conservation of momentum

Consider a mechanical system that is invariant under translation in some direction. Without loss of generality, we choose cartesian coordinates q = (q₁, q₂, q₃) in such a way that the coordinate q₁ parameterizes the invariant direction. The symmetry transformation h_s then translates in this direction: h_s(q) = q + se₁, i.e. h_s(q) = (q₁ + s, q₂, q₃). With d_s h_s(q) = e₁, we readily obtain

I(q, q̇) = ∂L/∂q̇₁ = p₁,

where p₁ = mq̇₁ is the ordinary cartesian momentum of the particle. We are thus led to the conclusion:

Translational invariance entails the conservation of cartesian momentum.

EXERCISE Generalize the construction above to a system of N particles in the absence of external potentials. The (interaction) potential of the system then depends only on coordinate differences q_i − q_j. Show that translation in arbitrary directions is a symmetry of the system and that this implies the conservation of the total momentum P = Σ_{j=1}^{N} m_j q̇_j.

Rotational invariance ↔ conservation of angular momentum

As a second example, consider a system invariant under rotations around the 3-axis. In cartesian coordinates, the corresponding symmetry transformation is described by

$$q \mapsto h_\phi(q) \equiv R^{(3)}_\phi q,$$

where the angle $\phi$ serves as a control parameter (i.e. $\phi$ assumes the role of the parameter $s$ above), and

$$R^{(3)}_\phi \equiv \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$


is an $O(3)$ matrix generating rotations by the angle $\phi$ around the 3-axis. The infinitesimal variant of a rotation is described by

$$R^{(3)}_\epsilon \equiv \begin{pmatrix} 1 & \epsilon & 0 \\ -\epsilon & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + O(\epsilon^2).$$

From this representation, we obtain the Noether momentum as

$$I(q, \dot q) = \frac{\partial L}{\partial \dot q_i} \, \frac{d}{d\epsilon}\bigg|_{\epsilon=0} (R^{(3)}_\epsilon q)_i = m(\dot q_1 q_2 - \dot q_2 q_1).$$

This is (the negative of) the 3-component of the particle's angular momentum. Since the choice of the 3-axis as a reference axis was arbitrary, we have established the result:

Now, we have argued above that in cases with symmetries, one should employ adjusted coordinates. Presently, this means coordinates that are organized around the 3-axis: spherical or cylindrical coordinates. Choosing cylindrical coordinates for definiteness, the Lagrangian assumes the form

$$L(r, z, \phi, \dot r, \dot z, \dot\phi) = \frac{m}{2}(\dot r^2 + r^2 \dot\phi^2 + \dot z^2) - U(r, z), \tag{4.17}$$

where we noted that the potential of a rotationally invariant system does not depend on $\phi$. The symmetry transformation now simply acts by translation, $h_s(r, z, \phi) = (r, z, \phi + s)$, and the Noether current is given by

$$I \equiv l_3 = \frac{\partial L}{\partial \dot\phi} = m r^2 \dot\phi. \tag{4.18}$$

We recognize this as the cylindrical coordinate representation of the 3-component of angular momentum.
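The conservation law (4.18) lends itself to a quick numerical check. The sketch below is illustrative (not from the text): a particle in the rotationally invariant potential $U(r) = r^2/2$ (an arbitrary choice) is integrated with the velocity-Verlet scheme, and $l_3 = m(q_1\dot q_2 - q_2\dot q_1)$ is verified to stay constant along the trajectory.

```python
# Numerical check of angular momentum conservation (illustrative sketch):
# a particle in the rotationally invariant potential U(r) = r^2/2,
# integrated with velocity Verlet.

def force(q):
    # F = -grad U = -q for U = |q|^2 / 2
    return (-q[0], -q[1])

def step(q, v, dt, m=1.0):
    f = force(q)
    v_half = (v[0] + 0.5 * dt * f[0] / m, v[1] + 0.5 * dt * f[1] / m)
    q_new = (q[0] + dt * v_half[0], q[1] + dt * v_half[1])
    f_new = force(q_new)
    v_new = (v_half[0] + 0.5 * dt * f_new[0] / m,
             v_half[1] + 0.5 * dt * f_new[1] / m)
    return q_new, v_new

def l3(q, v, m=1.0):
    # 3-component of angular momentum, m (q1 v2 - q2 v1)
    return m * (q[0] * v[1] - q[1] * v[0])

q, v = (1.0, 0.0), (0.3, 1.1)
l0 = l3(q, v)
for _ in range(10000):
    q, v = step(q, v, 1e-3)
assert abs(l3(q, v) - l0) < 1e-6  # conserved along the trajectory
```

(The Verlet scheme is chosen here because it respects the rotational symmetry of the problem; the conservation is then exact up to rounding.)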

EXAMPLE Let us briefly extend the discussion above to re-derive the symmetry optimized representation of a particle in a central potential. We choose cylindrical coordinates such that (a) the force center lies in the origin, and (b) at time $t = 0$, both $q$ and $\dot q$ lie in the $z = 0$ plane. Under these conditions, the motion will stay in the $z = 0$ plane (exercise: show this from the Euler–Lagrange equations), and the cylindrical coordinates $(r, \phi, z)$ can be reduced to the polar coordinates $(r, \phi)$ of the invariant $z = 0$ plane.


The reduced Lagrangian reads

$$L(r, \phi, \dot r, \dot\phi) = \frac{m}{2}(\dot r^2 + r^2 \dot\phi^2) - U(r).$$

From our discussion above, we know that $l \equiv l_3 = m r^2 \dot\phi$ is a constant, and this enables us to express the angular velocity $\dot\phi = l/(m r^2)$ in terms of the radial coordinate. This leads us to the effective Lagrangian of the radial coordinate,

$$L(r, \dot r) = \frac{m}{2}\dot r^2 - \frac{l^2}{2 m r^2} - U(r),$$

where the sign of the centrifugal term is fixed by the requirement that the resulting equation of motion reproduce the radial equation of the full problem. The solution of its Euler–Lagrange equations,

$$m \ddot r = -\partial_r U + \frac{l^2}{m r^3},$$

has been discussed in section *


Chapter 5

Hamiltonian mechanics

The Lagrangian approach has introduced a whole new degree of flexibility into mechanical theory building. Still, there is room for further development. To see how, notice that in the last chapter we have consistently characterized the state of a mechanical system in terms of the $2f$ variables $(q_1, \ldots, q_f, \dot q_1, \ldots, \dot q_f)$, the generalized coordinates and velocities. It is clear that we need $2f$ variables to specify the state of a mechanical system. But are coordinates and velocities necessarily the best variables? The answer is: often yes, but not always. In our discussion of symmetries in section 4.2.3 above, we have argued that in problems with symmetries, one should work with variables $q_i$ that transform in the simplest possible way (i.e. additively) under symmetry transformations. If so, the corresponding momentum $p_i = \partial_{\dot q_i} L$ is conserved. Now, if it were possible to express the velocities uniquely in terms of coordinates and momenta, $\dot q = \dot q(q, p)$, we would be in possession of an alternative set of variables $(q, p)$ such that in a situation with symmetries a fraction of our variables stays constant. And that's the simplest type of dynamics a variable can show.

In this chapter, we show that the reformulation of mechanics in terms of coordinates and momenta as independent variables is an option. The resulting description is called Hamiltonian mechanics. Important features of this new approach include:

. It exhibits a maximal degree of flexibility in the choice of problem adjusted coordinates.

. The variables $(q, p)$ live in a mathematical space, so called “phase space”, that carries a high degree of mathematical structure (much more than an ordinary vector space). This structure turns out to be of invaluable use in the solution of complex problems. Relatedly,

. Hamiltonian theory is the method of choice in the solution of advanced mechanical problems. For example, the theory of (conservative) chaotic systems is almost exclusively formulated in the Hamiltonian approach.

. Hamiltonian mechanics is the gateway into quantum mechanics. Virtually all concepts introduced below have a direct quantum mechanical extension.

However, this does not mean that Hamiltonian mechanics is “better” than the Lagrangian theory. The Hamiltonian approach has its specific advantages. However, in some cases it


may be preferable to stay on the level of the Lagrangian theory. At any rate, Lagrangian and Hamiltonian mechanics form a pair that represents the conceptual basis of many modern theories of physics, not just mechanics. For example, electrodynamics (classical and quantum) can be formulated in terms of a Lagrangian and a Hamiltonian theory, and these formulations are strikingly powerful.

5.1 Hamiltonian mechanics

The Lagrangian $L = L(q, \dot q, t)$ is a function of coordinates and velocities. What we are after is a function $H = H(q, p, t)$ of coordinates and momenta, where the momenta and velocities are related to each other as $p_i = \partial_{\dot q_i} L = p_i(q, \dot q, t)$. Once in possession of the new function $H$, there must be a way to express the key information carriers of the theory – the Euler–Lagrange equations – in the language of the new variables. In mathematics, there exist different ways of formulating variable changes of this type, and one of them is known as the

5.1.1 Legendre transform

Reducing the notation to a necessary minimum, the task formulated above amounts to the following problem: given a function $f(x)$ (here, $f$ assumes the role of $L$ and $x$ represents $\dot q_i$), consider the variable $z = \partial_x f(x) = z(x)$ ($z$ represents the momentum $p_i$). If this equation is invertible, i.e. if a representation $x = x(z)$ exists, find a partner function $g(z)$ such that $g$ carries the same amount of information as $f$. The latter condition means that we are looking for a transformation, i.e. a mapping between functions $f(x) \to g(z)$ that can be inverted, $g(z) \to f(x)$, so that the function $f(x)$ can be reconstructed in unique terms from $g(z)$. If this latter condition is met, it must be possible to express any mathematical operation formulated for the function $f$ in terms of a corresponding operation on the function $g$ (cf. the analogous situation with the Fourier transform).

Let us now try to find a transformation that meets the criteria above. The most obvious guess might be a direct variable substitution: compute $z = \partial_x f(x) = z(x)$, invert to $x = x(z)$, and substitute this into $f$, i.e. $g(z) \overset{?}{=} f(x(z))$. This idea goes in the right direction but is not quite good enough. The problem is that in this way information stored in the function $f$ may get lost. To exemplify the situation, consider the function

f(x) = C exp(x),

where $C$ is a constant. Now, $z = \partial_x f(x) = C \exp(x)$, which means $x = \ln(z/C)$. Substitution back into $f$ gets us to

g(z) = C exp(ln(z/C)) = z.

The function $g$ no longer knows about $C$, so information on this constant has been irretrievably lost. (Knowledge of $g(z)$ is not sufficient to reconstruct $f(x)$.)


However, it turns out that a slight extension of the above idea does the trick. Namely, consider the so-called Legendre transform of $f(x)$,

$$g(z) \equiv f(x(z)) - z\,x(z). \tag{5.1}$$

We claim that $g(z)$ defines a transformation of functions. To verify this, we need to show that the function $f(x)$ can be obtained from $g(z)$ by some suitable inverse transformation. It turns out that (up to a harmless sign change) the Legendre transform is self-inverse; just apply it once more and you get back to the function $f$. Indeed, let us define the variable $y \equiv \partial_z g(z) = \partial_x f\big|_{x(z)}\,\partial_z x(z) - z\,\partial_z x(z) - x(z)$. Now, by definition of the function $z(x)$ we have $\partial_x f\big|_{x(z)} = z(x(z)) = z$. Thus, the first two terms cancel, and we have $y(z) = -x(z)$.

We next compute the Legendre transform of $g(z)$:

$$h(y) = f(x(z(y))) - x(z(y))\,z(y) - z(y)\,y.$$

Evaluating the relation $y(z) = -x(z)$ on the specific argument $z(y)$, we get $y = y(z(y)) = -x(z(y))$, i.e. $f(x(z(y))) = f(-y)$, and the last two terms in the definition of $h(y)$ are seen to cancel. We thus obtain

$$h(y) = f(-y):$$

the Legendre transform is (almost) self-inverse, and this means that by passing from $f(x)$ to $g(z)$ no information has been lost.

The abstract definition of the Legendre transform with its nested variable dependencies can be somewhat confusing. However, when applied to concrete functions, the transform is actually easy to handle. Let us illustrate this on the example considered above: with $f(x) = C \exp(x)$ and $x = \ln(z/C)$, we get

$$g(z) = C \exp(\ln(z/C)) - \ln(z/C)\,z = z(1 - \ln(z/C)).$$

(Notice that $g(z)$ “looks” very different from $f(x)$, but this need not worry us.) Now, let us apply the transform once more: define $y = \partial_z g(z) = -\ln(z/C)$, or $z(y) = C \exp(-y)$. This gives

$$h(y) = C \exp(-y)(1 + y) - C \exp(-y)\,y = C \exp(-y),$$

in accordance with the general result $h(y) = f(-y)$.

The Legendre transform of multivariate functions $f(\mathbf{x})$ is obtained by application of the rule to all variables: compute $z_i \equiv \partial_{x_i} f(\mathbf{x})$. Next construct the inverse $\mathbf{x} = \mathbf{x}(\mathbf{z})$. Then define

$$g(\mathbf{z}) = f(\mathbf{x}(\mathbf{z})) - \sum_i z_i x_i(\mathbf{z}). \tag{5.2}$$
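The transform and its (almost) self-inverse property are easy to check numerically. The sketch below follows the text's sign convention $g(z) = f(x(z)) - z\,x(z)$ for the example $f(x) = Ce^x$ and confirms $h(y) = f(-y)$ at a few sample points:

```python
import math

C = 2.5  # an arbitrary constant

def f(x):
    return C * math.exp(x)

def g(z):
    # first transform (text convention): z = f'(x) = C e^x,
    # hence x(z) = ln(z/C) and g(z) = f(x(z)) - z*x(z)
    x = math.log(z / C)
    return f(x) - z * x       # equals z*(1 - ln(z/C))

def h(y):
    # second transform: y = g'(z) = -ln(z/C), hence z(y) = C e^{-y}
    z = C * math.exp(-y)
    return g(z) - y * z

for y in (-1.0, 0.0, 0.7, 2.0):
    assert abs(h(y) - f(-y)) < 1e-9
```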

5.1.2 Hamiltonian function

Definition

We now apply the construction above to compute the Legendre transform of the Lagrange function. We thus apply Eq. (5.2) to the function $L(q, \dot q, t)$ and identify variables as $x \leftrightarrow \dot q$


and $z \leftrightarrow p$. (Notice that the coordinates $q$ themselves play the role of spectator variables. The Legendre transform is in the velocities $\dot q$!) Let us now formulate the few steps it takes to pass to the Legendre transform of the Lagrange function:

1. Compute the $f$ variables

$$p_i = \partial_{\dot q_i} L(q, \dot q, t). \tag{5.3}$$

2. Invert these relations to obtain $\dot q_i = \dot q_i(q, p, t)$.

3. Define a function

$$H(q, p, t) \equiv \sum_i p_i\,\dot q_i(q, p, t) - L(q, \dot q(q, p, t), t). \tag{5.4}$$

Technically, this is the negative of $L$'s Legendre transform. Of course, this function carries the same information as the Legendre transform itself.

The function $H$ is known as the Hamiltonian function of the system. It is usually called “Hamiltonian” for short, much like the Lagrangian function is called the “Lagrangian”. The notation above emphasizes the variable dependencies of the quantities $\dot q_i$, $p_i$, etc. One usually keeps the notation more compact, e.g. by writing $H = \sum_i p_i \dot q_i - L$. However, it is important to remember that in both $L$ and $\sum_i p_i \dot q_i$, the variable $\dot q_i$ has to be expressed as a function of $q$ and $p$. It doesn't make sense to write down formulae such as $H = \ldots$ if the right hand side contains $\dot q_i$'s as fundamental variables!

Hamilton equations

Now we need to do something with the Hamiltonian function. Our goal will be to transcribe the Euler–Lagrange equations to equations defined in terms of the Hamiltonian. The hope then is that these new equations offer operational advantages over the Euler–Lagrange equations.

Now, the Euler–Lagrange equations probe changes (derivatives) of the function $L$. It will therefore be a good idea to explore what happens if we ask similar questions of the function $H$. In doing so, we should keep in mind that our ultimate goal is to obtain unambiguous information on the evolution of particle trajectories. Let us then compute

$$\partial_{q_i} H(q, p, t) = p_j\,\partial_{q_i}\dot q_j(q, p) - \partial_{q_i} L(q, \dot q(q, p)) - \partial_{\dot q_j} L(q, \dot q(q, p))\,\partial_{q_i}\dot q_j(q, p),$$

where summation over the repeated index $j$ is implied. The first and the third term cancel, because $\partial_{\dot q_j} L = p_j$. Now, if the curve $q(t)$ is a solution curve, the middle term equals $-d_t \partial_{\dot q_i} L = -d_t p_i$. We thus conclude that the $(q, p)$-representation of solution curves must obey the equation

$$\dot p_i = -\partial_{q_i} H(q, p, t).$$


Similarly,

$$\partial_{p_i} H(q, p, t) = \dot q_i(q, p) + p_j\,\partial_{p_i}\dot q_j(q, p, t) - \partial_{\dot q_j} L(q, \dot q(q, p))\,\partial_{p_i}\dot q_j(q, p, t).$$

The last two terms cancel, and we have

$$\dot q_i = \partial_{p_i} H(q, p, t).$$

Finally, let us compute the partial time derivative $\partial_t H$:¹

$$\partial_t H(q, p, t) = p_i\,\partial_t \dot q_i(q, p, t) - \partial_{\dot q_i} L(q, \dot q(q, p, t))\,\partial_t \dot q_i(q, p, t) - \partial_t L(q, \dot q, t).$$

Again, two terms cancel and we have

$$\partial_t H(q, p, t) = -\partial_t L(q, \dot q(q, p, t), t), \tag{5.5}$$

where the time derivative on the r.h.s. acts only on the third argument of $L(\,\cdot\,, \,\cdot\,, t)$.

Let us summarize where we are: the solution curve $q(t)$ of a mechanical system defines a $2f$-dimensional “curve” $(q(t), \dot q(t))$. Eq. (5.3) then defines a $2f$-dimensional curve $(q(t), p(t))$. The invertibility of the relation $p \leftrightarrow \dot q$ implies that either representation faithfully describes the curve. The derivation above then shows that

The solution curves $(q, p)(t)$ of mechanical problems obey the so-called Hamilton equations

$$\dot q_i = \partial_{p_i} H, \qquad \dot p_i = -\partial_{q_i} H. \tag{5.6}$$

Some readers may worry about the number of variables employed in the description of curves: in principle, a curve is uniquely represented by the $f$ variables $q(t)$. However, we have now decided to represent curves in terms of the $2f$ variables $(q, p)$. Is there some redundancy here? No, there isn't, and the reason can be understood in different ways. First notice that in Lagrangian mechanics, solution curves are obtained from second order differential equations in time. (The Euler–Lagrange equations contain terms $\ddot q_i$.) In contrast, the Hamilton equations are first order in time.² The price to be paid for this reduction is the introduction of a second set of variables, $p$. Another way to understand the rationale behind the introduction of $2f$ variables is by noting that the solution of the (2nd order differential) Euler–Lagrange equations

¹ Here it is important to be very clear about what we are doing: in the present context, $\dot q$ is a variable of the Lagrange function. (We could also name it $v$ or $z$, or whatever.) It is considered a free variable (no time dependence), unless the transformation $\dot q_i = \dot q_i(q, p, t)$ becomes explicitly time dependent. Remembering the origin of this transformation, we see that this may happen if the function $L(q, \dot q, t)$ contains explicit time dependence. This happens, e.g., in the case of rheonomic constraints, or time dependent potentials.

² Ordinary differential equations of $n$th order can be transformed to systems of $n$ ordinary differential equations of first order. The passage $L \to H$ is an example of such an order reduction.


requires the specification of $2f$ boundary conditions. These can be the $2f$ conditions stored in the specification $q(t_0) = q_0$ and $q(t_1) = q_1$ of an initial and a final point,³ or the specification $q(t_0) = q_0$ and $\dot q(t_0) = v_0$ of an initial configuration and velocity. In contrast,⁴ the Hamilton equations are uniquely solvable once an initial configuration $(q(t_0), p(t_0)) = (q_0, p_0)$ has been specified.

Either way, we need $2f$ boundary conditions, and hence $2f$ variables, to uniquely specify the state of a mechanical system.

EXAMPLE To make the new approach more concrete, let us formulate the Hamilton equations of a point particle in cartesian coordinates. The Lagrangian is given by

$$L(q, \dot q, t) = \frac{m}{2}\dot q^2 - U(q, t),$$

where we allow for time dependence of the potential. Thus

$$p_i = m \dot q_i,$$

which is inverted as $\dot q_i(p) = p_i/m$. This leads to

$$H(q, p, t) = \sum_{i=1}^{3} \frac{p_i p_i}{m} - L(q, p/m, t) = \frac{p^2}{2m} + U(q, t).$$

The Hamilton equations assume the form

$$\dot q = \frac{p}{m}, \qquad \dot p = -\partial_q U(q, t).$$

Further, $\partial_t H = -\partial_t L = \partial_t U$ inherits its time dependence from the time dependence of the potential. The two Hamilton equations are recognized as a reformulation of the Newton equation. (Substituting the time derivative of the first equation, $\ddot q = \dot p/m$, into the second we obtain the Newton equation $m\ddot q = -\partial_q U$.)

Notice that the Hamiltonian function $H = \frac{p^2}{2m} + U(q, t)$ equals the energy of the particle! In the next section, we will discuss this connection in more general terms.
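The three steps of the construction can be traced in code. The sketch below is illustrative (one cartesian degree of freedom; the potential $U$ is an arbitrary choice, not from the text): it builds $H$ literally as $p\,\dot q(p) - L(q, \dot q(p))$ and confirms the result $p^2/2m + U(q)$:

```python
import math

# The Legendre construction (5.3)-(5.4) traced for one cartesian degree
# of freedom. Illustrative sketch: the potential U is an arbitrary choice.
m = 1.3

def U(q):
    return 0.5 * q * q + math.sin(q)

def L(q, qdot):
    return 0.5 * m * qdot * qdot - U(q)

def H(q, p):
    qdot = p / m                   # step 2: inversion of p = m * qdot
    return p * qdot - L(q, qdot)   # step 3: Eq. (5.4)

# H reproduces kinetic plus potential energy, p^2/(2m) + U(q)
for q in (-1.0, 0.2, 2.0):
    for p in (-0.7, 0.0, 1.5):
        assert abs(H(q, p) - (p * p / (2 * m) + U(q))) < 1e-12
```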

EXAMPLE Let us now solve these equations for the very elementary example of the one-dimensional harmonic oscillator. In this case, $V(q) = \frac{m\omega^2}{2} q^2$ and the Hamilton equations assume the form

$$\dot q = \frac{p}{m}, \qquad \dot p = -m\omega^2 q.$$

For a given initial configuration $x(0) = (q(0), p(0))$, these equations afford the unique solution

$$q(t) = q(0)\cos\omega t + \frac{p(0)}{m\omega}\sin\omega t,$$
$$p(t) = p(0)\cos\omega t - m\omega\, q(0)\sin\omega t. \tag{5.7}$$
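One can convince oneself directly that (5.7) solves the Hamilton equations and conserves the energy. A minimal numerical sketch (sample parameters are arbitrary; derivatives approximated by central differences):

```python
import math

# Direct check of the oscillator solution (5.7): it satisfies the
# Hamilton equations and conserves the energy along the curve.
m, w = 1.0, 2.0
q0, p0 = 0.5, -0.3

def q(t):
    return q0 * math.cos(w * t) + p0 / (m * w) * math.sin(w * t)

def p(t):
    return p0 * math.cos(w * t) - m * w * q0 * math.sin(w * t)

def H(q_, p_):
    return p_ * p_ / (2 * m) + 0.5 * m * w * w * q_ * q_

eps = 1e-6
for t in (0.0, 0.4, 1.3):
    qdot = (q(t + eps) - q(t - eps)) / (2 * eps)
    pdot = (p(t + eps) - p(t - eps)) / (2 * eps)
    assert abs(qdot - p(t) / m) < 1e-6           # qdot = dH/dp
    assert abs(pdot + m * w * w * q(t)) < 1e-6   # pdot = -dH/dq
    assert abs(H(q(t), p(t)) - H(q0, p0)) < 1e-12  # energy conserved
```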

³ But notice that conditions of this type do not always uniquely specify a solution – think of examples!

⁴ Formally, this follows from a result of the theory of ordinary differential equations: a system of $n$ first order differential equations $\dot x_i = f_i(x_1, \ldots, x_n)$, $i = 1, \ldots, n$, affords a unique solution once $n$ initial conditions $x_i(0)$ have been specified.


Physical meaning of the Hamilton function

The Lagrangian $L = T - U$ did not carry immediate physical meaning; its essential role was that of a generator of the Euler–Lagrange equations. However, with the Hamiltonian the situation is different. To understand its physical interpretation, let us compute the full time derivative $d_t H = d_t H(q(t), p(t), t)$ of the Hamiltonian evaluated on a solution curve $(q(t), p(t))$ (i.e. a solution of the Hamilton equations (5.6)):

$$d_t H(q, p, t) = \sum_{i=1}^{f}\left(\frac{\partial H(q, p, t)}{\partial q_i}\dot q_i + \frac{\partial H(q, p, t)}{\partial p_i}\dot p_i\right) + \frac{\partial H(q, p, t)}{\partial t} \overset{(5.6)}{=} \frac{\partial H(q, p, t)}{\partial t}.$$

Now, above we have seen (cf. Eq. (5.5)) that the Hamiltonian inherits its explicit time dependence $\partial_t H = -\partial_t L$ from the explicit time dependence of the Lagrangian. In problems without explicitly time dependent potentials (they are called autonomous problems), the Hamiltonian stays constant on solution curves.

We have thus found that the function $H(q, p)$ (the missing time argument indicates that we are now looking at an autonomous situation) defines a constant of motion of extremely general nature. What is its physical meaning? To answer this question, we consider a general $N$-body system in an arbitrary potential. (This is the most general conservative autonomous setup one may imagine.) In cartesian coordinates, its Lagrangian is given by

$$L(q, \dot q) = \sum_{j=1}^{N} \frac{m_j}{2}\dot q_j^2 - U(q_1, \ldots, q_N),$$

where $U$ is the $N$-body potential and $m_j$ the mass of the $j$th particle. Proceeding as in the example on p. 104, we readily find

$$H(q, p) = \sum_{j=1}^{N} \frac{p_j^2}{2m_j} + U(q_1, \ldots, q_N). \tag{5.8}$$

The expression on the r.h.s. we recognize as the energy of the system. We are thus led to the following important conclusion:

The Hamiltonian $H(q, p)$ of an autonomous problem (a problem with time-independent potentials) is dynamically conserved: $H(q(t), p(t)) = E = \text{const.}$ on solution curves $(q(t), p(t))$.

However, our discussion above implies another very important corollary. It has shown that

The Hamiltonian $H = T + U$ is given by the sum of potential energy, $U(q, t)$, and kinetic energy, $T(q, p, t)$, expressed as a function of coordinates and momenta.


We have established this connection in cartesian coordinates. However, the coordinate invariance of the theory implies that this identification holds in general coordinate systems. Also, the identification $H = T + U$ did not rely on the time independence of the potential; it extends to potentials $U(q, t)$.

INFO In principle, the identification $H = T + U$ provides an option to access the Hamiltonian without reference to the Lagrangian. In cases where the expression $T = T(q, p, t)$ of the kinetic energy in terms of the coordinates and momenta of the theory is known, we may just add $H(q, p, t) = T(q, p, t) + U(q, t)$. In such circumstances, there is no need to compute the Hamiltonian by the route $L(q, \dot q, t) \xrightarrow{\text{Legendre}} H(q, p, t)$. Often, however, the identification of the momenta of the theory is not so obvious, and one is better advised to proceed via the Legendre transform.

5.2 Phase space

In this section, we will introduce phase space as the basic “arena” of Hamiltonian mechanics. We will start from an innocent definition of phase space as a $2f$-dimensional coordinate space comprising configuration space coordinates, $q$, and momenta, $p$. However, as we go along, we will realize that this working definition is but the tip of an iceberg: the coordinate spaces of Hamiltonian mechanics are endowed with very deep mathematical structure, and this is one way of telling why Hamiltonian mechanics is so powerful.

5.2.1 Phase space and structure of the Hamilton equations

To keep the notation simple, we will suppress reference to an optional explicit time dependence of the Hamiltonian throughout this section, i.e. we just write $H(\ldots)$ instead of $H(\ldots, t)$.

The Hamilton equations are coupled equations for coordinates $q$ and momenta $p$. As was argued above, the coordinate pair $(q, p)$ contains sufficient information to uniquely encode the state of a mechanical system. This suggests that we consider the $2f$-component objects

$$x \equiv \begin{pmatrix} q \\ p \end{pmatrix} \tag{5.9}$$

as the new fundamental variables of the theory. Given a coordinate system, we may think of $x$ as an element of a $2f$-dimensional vector space. However, all that has been said in section 4.1.3 about the curves of mechanical systems and their coordinate representations carries over to the present context: in abstract terms, the pair “(configuration space points, momenta)” defines a mathematical space known as phase space, $\Gamma$. The formulation of coordinate invariant descriptions of phase space is a subject beyond the scope of the present course. However, locally it is always possible to parameterize $\Gamma$ in terms of coordinates,⁵ and to describe

⁵ Which means that $\Gamma$ has the status of a $2f$-dimensional manifold.


its elements through $2f$-component objects such as (5.9). This means that, locally, phase space can be identified with a $2f$-dimensional vector space. However, different coordinate representations correspond to different coordinate vector spaces, and sometimes it is important to recall that the identification (phase space) ↔ (vector space) may fail globally.⁶

Keeping the above words of caution in mind, we will temporarily identify phase space $\Gamma$ with a $2f$-dimensional vector space. We will soon see that this space contains a very particular sort of “scalar product”. This additional structure makes phase space a mathematical object far more interesting than an ordinary vector space. Let us start with a rewriting of the Hamilton equations. The identification $x_i = q_i$ and $x_{f+i} = p_i$, $i = 1, \ldots, f$, enables us to rewrite Eq. (5.6) as

$$\dot x_i = \partial_{x_{i+f}} H, \qquad \dot x_{i+f} = -\partial_{x_i} H,$$

where $i = 1, \ldots, f$. We can express this in a more compact form as

$$\dot x_i = I_{ij}\,\partial_{x_j} H, \qquad i = 1, \ldots, 2f, \tag{5.10}$$

or, in vectorial notation,

$$\dot x = I\,\partial_x H. \tag{5.11}$$

Here, the $(2f) \times (2f)$ matrix $I$ is defined as

$$I = \begin{pmatrix} 0 & \mathbb{1}_f \\ -\mathbb{1}_f & 0 \end{pmatrix}, \tag{5.12}$$

and $\mathbb{1}_f$ is the $f$-dimensional unit matrix. The matrix $I$ is sometimes called the symplectic unity.
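For $f = 1$, the compact form (5.11) is easy to spell out explicitly. A minimal sketch (harmonic oscillator, arbitrary sample point; plain tuples stand in for phase space vectors):

```python
# Hamilton equations in the compact form xdot = I * grad H (Eq. (5.11)),
# spelled out for the harmonic oscillator with f = 1, so that x = (q, p).
m, w = 1.0, 2.0

def grad_H(x):
    # H = p^2/(2m) + m w^2 q^2 / 2, so grad H = (m w^2 q, p/m)
    q, p = x
    return (m * w * w * q, p / m)

def I_times(v):
    # action of the symplectic unity I = [[0, 1], [-1, 0]] (f = 1)
    return (v[1], -v[0])

x = (0.5, -0.3)
xdot = I_times(grad_H(x))
# reproduces qdot = p/m and pdot = -m w^2 q
assert abs(xdot[0] - x[1] / m) < 1e-12
assert abs(xdot[1] + m * w * w * x[0]) < 1e-12
```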

5.2.2 Hamiltonian flow

For any $x$, the quantity $I\partial_x H(x) = \{I_{ij}\partial_{x_j} H(x)\}$ is a vector in phase space. This means that the prescription

$$X_H : \Gamma \to \mathbb{R}^{2f}, \qquad x \mapsto I\,\partial_x H, \tag{5.13}$$

defines a vector field in phase space, the so-called Hamiltonian vector field. The form of the Hamilton equations,

$$\dot x = X_H, \tag{5.14}$$

suggests an interpretation in terms of the “flow lines” of the Hamiltonian vector field: at each point in phase space, the field $X_H$ defines a vector $X_H(x)$. The Hamilton equations state

⁶ For example, there are mechanical systems whose phase space is a two-sphere, and the sphere is not a vector space. (However, locally, it can be represented in terms of vector space coordinates.)


Figure 5.1: Visualization of the Hamiltonian vector field and its flow lines

that the solution curve $x(t)$ is tangent to that vector, $\dot x(t) = X_H(x(t))$. One may visualize the situation in terms of the streamlines of a fluid. Within that analogy, the value $X_H(x)$ is a measure of the local current flow. If one injected a drop of colored ink into the fluid, its trace would be a representation of the curve $x(t)$.

For a general vector field $v : U \to \mathbb{R}^N$, $x \mapsto v(x)$, where $U \subset \mathbb{R}^N$, one may define a parameter-dependent map

$$\Phi : U \times \mathbb{R} \to U, \qquad (x, t) \mapsto \Phi(x, t), \tag{5.15}$$

through the condition $\partial_t \Phi(x, t) \overset{!}{=} v(\Phi(x, t))$. The map $\Phi$ is called the flow of the vector field $v$. Specifically, the flow of the Hamiltonian vector field $X_H$ is defined by the prescription

$$\Phi(x, t) \equiv x(t), \tag{5.16}$$

where $x(t)$ is a curve with initial condition $x(t = 0) = x$. Equation (5.16) is a proper definition because for a given $x(0) \equiv x$, the solution $x(t)$ is uniquely defined. Further, $\partial_t \Phi(x, t) = d_t x(t) = X_H(x(t))$ satisfies the flow condition. The map $\Phi(x, t)$ is called the Hamiltonian flow.

There is a corollary to the uniqueness of the curve $x(t)$ emanating from a given initial condition $x(0)$:

Phase space curves never cross.

For if they did, the crossing point would be the initial point of the two outgoing stretches of the curve, and this would be in contradiction to the uniqueness of the solution for a given initial configuration. There is, however, one subtle caveat to the statement above: although phase space curves do not cross, they may actually touch each other in common terminal points. (For an example, see the discussion in section 5.2.3 below.)


5.2.3 Phase space portraits

Graphical representations of phase flow, so-called phase space portraits, are very useful in the qualitative characterization of mechanical motion. The only problem with this is that phase flow is defined in $2f$-dimensional space. For $f > 1$, we cannot draw this. What can be drawn, however, is the projection of a curve onto any of the $2f(2f-1)/2$ coordinate planes $(x_i, x_j)$, $1 \le i < j \le 2f$. (See the figure for the example of a coordinate plane projection out of three-dimensional space.) However, it takes some experience to work with those representations and we will not discuss them any further.

However, the concept of phase space portraits becomes truly useful in problems with one degree of freedom, $f = 1$. In this case, the conservation of energy on individual curves, $H(q, p) = E = \text{const.}$, enables us to compute an explicit representation $q = q(p, E)$, or $p = p(q, E)$, just by solving the relation $H(q, p) = E$ for $q$ or $p$. This does not “solve” the mechanical problem under consideration. (For that one would need to know the time dependence at which the curves are traversed.) Nonetheless, it provides a far-reaching characterization of the motion: for any phase space point $(q, p)$ we can construct the curve running through it without solving differential equations. The projection of these curves onto configuration space, $(q(t), p(t)) \mapsto q(t)$, tells us something about the evolution of the configuration space coordinate.

Let us illustrate these statements on an example where a closed solution is possible, the harmonic oscillator. From the Hamiltonian $H(q, p) = \frac{p^2}{2m} + \frac{m\omega^2}{2}q^2 = E$, we find

$$p = p(q, E) = \pm\sqrt{2m\left(E - \frac{m\omega^2}{2}q^2\right)}.$$

Examples of these (ellipsoidal) curves are shown in the figure. It is straightforward to show (do it!) that they are tangential to the Hamiltonian vector field

$$X_H = \begin{pmatrix} m^{-1}p \\ -m\omega^2 q \end{pmatrix}.$$

At the turning points, $p = 0$, the energy of the curve, $E = \frac{m\omega^2}{2}q^2$, equals the potential energy.

Now, the harmonic oscillator may be an example a little too basic to illustrate the usefulness of phase space portraits. Instead, consider the problem defined by the potential shown in Fig. 5.2. The Hamilton equations can now no longer be solved in closed form. However, we may still compute curves by solution of $H(p, q) = E \Rightarrow p = p(q, E)$, and the results look as shown qualitatively for three different values of $E$ in the figure. These curves


give a far-reaching impression of the motion of a point particle in the potential landscape. Specifically, notice the threshold energy $E_2$ corresponding to the local potential maximum. At first sight, the corresponding phase space curve appears to violate the criterion of the non-existence of crossing points (cf. the “critical point” at $p = 0$ beneath the potential maximum). In fact, however, this isn't a crossing point. Rather, the two incoming curves “terminate” at this point: coming from the left or right, it takes infinite time for the particle to climb the potential maximum.
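Such portraits are cheap to compute, since solving $H(q, p) = E$ for $p$ requires only a square root. The sketch below is illustrative: a double-well potential (an arbitrary stand-in for the potential of Fig. 5.2, with local maximum $V = 1$ at $q = 0$) exhibits the three regimes around the threshold energy:

```python
import math

# Phase space curves from H(q, p) = E for one degree of freedom.
# Illustrative sketch: the double-well potential below is an arbitrary
# stand-in for the potential of Fig. 5.2; its local maximum is V(0) = 1.
m = 1.0

def V(q):
    return (q * q - 1.0) ** 2

def p_of_q(q, E):
    # momentum branches p = +/- sqrt(2m(E - V(q))), or None in the
    # classically forbidden region E < V(q)
    d = 2 * m * (E - V(q))
    if d < 0:
        return None
    r = math.sqrt(d)
    return (r, -r)

# below the threshold, the region under the barrier is forbidden ...
assert p_of_q(0.0, 0.5) is None
# ... at the separatrix energy the branches touch at p = 0 ...
assert p_of_q(0.0, 1.0) == (0.0, 0.0)
# ... and above it the particle passes over the hill with p != 0
assert p_of_q(0.0, 1.5)[0] > 0.0
```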

EXERCISE Show that a trajectory at energy $E = V^* \equiv V(q^*)$ equal to the potential maximum at $q = q^*$ needs infinite time to climb the potential hill. To this end, use that the potential maximum at $q^*$ can be locally modelled as an inverted harmonic oscillator potential, $V(q) \simeq V^* - C(q - q^*)^2$, where $C$ is a positive constant. Consider the local approximation of the Hamilton function $H = \frac{p^2}{2m} - C(q - q^*)^2 + \text{const.}$, formulate and solve the corresponding equations of motion (hint: compare to the standard oscillator discussed above), and compute the time it takes to reach the maximum if $E = V(q^*)$.
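In the local approximation of the exercise, the motion can be written down in closed form: on the branch of the separatrix approaching the maximum from the left, $p = \sqrt{2mC}\,(q^* - q)$, so $\dot q = \lambda(q^* - q)$ with $\lambda = \sqrt{2C/m}$. The distance to the top then decays exponentially and never vanishes, and the time to come within a distance $d$ of the maximum diverges logarithmically as $d \to 0$. A small sketch (hypothetical numbers):

```python
import math

# Separatrix motion near an inverted-oscillator maximum (exercise above):
# H = p^2/(2m) - C (q - qs)^2 + const., with the maximum at q = qs.
# On the incoming branch p = sqrt(2 m C) (qs - q), i.e. qdot = lam*(qs - q)
# with lam = sqrt(2C/m): the distance to the top decays exponentially.
m, C = 1.0, 0.5  # hypothetical numbers
lam = math.sqrt(2 * C / m)

def distance_to_top(t, d0=1.0):
    # distance qs - q(t) for initial distance d0
    return d0 * math.exp(-lam * t)

def time_to_reach(d, d0=1.0):
    # time to come within distance d of the maximum: t = log(d0/d)/lam,
    # which diverges logarithmically as d -> 0
    return math.log(d0 / d) / lam

assert time_to_reach(1e-3) < time_to_reach(1e-6) < time_to_reach(1e-9)
assert distance_to_top(100.0) > 0.0  # the top is never actually reached
```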

Eventually it will rest at the maximum in an unstable equilibrium position. Similarly, the outgoing trajectories “begin” at this point. Starting at zero velocity (corresponding to zero initial momentum), it takes infinite time to accelerate and move downhill. In this sense, the curves touching (not crossing!) in the critical point represent idealizations that are never actually realized. Trajectories of this type are called separatrices. Separatrices are important in that they separate phase space into regions of qualitatively different types of motion (presently, curves that make it over the hill and those which do not).

EXERCISE Take a few minutes to familiarize yourself with the phase space portrait and to learn how to “read” such representations. Sketch the periodic potential $V(q) = \cos(2\pi q/a)$, $a = \text{const.}$, and its phase space portrait.

5.2.4 Poisson brackets

Consider a (differentiable) function in phase space, $g : \Gamma \to \mathbb{R}$, $x \mapsto g(x)$. A natural question to ask is how $g(x)$ will evolve as $x \equiv x(0) \mapsto x(t)$ traces out a phase space trajectory. In other words, we ask for the time evolution of $g(x(t))$ with initial condition $g(x(0)) = g(x)$. The answer is obtained by straightforward time differentiation:

$$\frac{d}{dt}g(x(t), t) = \frac{d}{dt}g(q(t), p(t), t) = \sum_{i=1}^{f}\left(\frac{\partial g}{\partial q_i}\frac{dq_i}{dt} + \frac{\partial g}{\partial p_i}\frac{dp_i}{dt}\right)\bigg|_{(q(t), p(t), t)} + \partial_t g(q(t), p(t), t) =$$
$$= \sum_{i=1}^{f}\left(\frac{\partial g}{\partial q_i}\frac{\partial H}{\partial p_i} - \frac{\partial g}{\partial p_i}\frac{\partial H}{\partial q_i}\right)\bigg|_{(q(t), p(t), t)} + \partial_t g(q(t), p(t), t),$$

where the terms $\partial_t g$ account for an optional explicit time dependence of $g$. The characteristic combination of derivatives governing this expression appears frequently in Hamiltonian


Figure 5.2: Phase space portrait of a “non-trivial” one-dimensional potential. Notice the “critical point” at the threshold energy $E_2$.

mechanics. It motivates the introduction of a shorthand notation,

$$\{f, g\} \equiv \sum_{i=1}^{f}\left(\frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q_i} - \frac{\partial f}{\partial q_i}\frac{\partial g}{\partial p_i}\right). \tag{5.17}$$

This is expression is called the Poisson bracket of two phase space functions f and g. In theinvariant notation of phase space coordinates x, it assumes the form

{f, g} = (∂xf)T I ∂xg, (5.18)

where I is the symplectic unit matrix defined above. The time evolution of functions may beconcisely expressed in terms of the Poisson bracket of the two functions H and g:

$$
d_t g = \{H,g\} + \partial_t g. \tag{5.19}
$$

Notice that the Poisson bracket operation bears similarity to a bilinear product of functions. Let C_Γ be the space of smooth (and integrable) functions in phase space. Then

$$
\{\,,\,\} : C_\Gamma \times C_\Gamma \to C_\Gamma, \qquad (f,g) \mapsto \{f,g\}, \tag{5.20}
$$


defines a bilinear product. This product comes with a number of characteristic algebraic properties:

- {f, g} = −{g, f} (skew symmetry),
- {cf + c′f′, g} = c{f, g} + c′{f′, g} (linearity),
- {c, g} = 0,
- {ff′, g} = f{f′, g} + {f, g}f′ (product rule),
- {f, {g, h}} + {h, {f, g}} + {g, {h, f}} = 0 (Jacobi identity),

where c, c′ ∈ ℝ are constants. The first three of these relations are immediate consequences of the definition, and the fourth follows from the product rule of differentiation. The proof of the Jacobi identity amounts to a straightforward if tedious exercise in differentiation.

At this point, the Poisson bracket appears to be little more than convenient notation. However, as we go along, we will see that it encapsulates important information on the mathematical structure of phase space.
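As a concrete illustration (a sketch, not part of the text), the defining relation (5.17), the evolution equation (5.19), and the algebraic properties above can be verified symbolically for f = 1; the Hamiltonian and the three test functions below are arbitrary choices:

```python
# A symbolic check (a sketch, not part of the text) of the bracket (5.17) and
# its algebraic properties for f = 1.
import sympy as sp

q, p = sp.symbols('q p')

def poisson(f, g):
    # {f, g} = (df/dp)(dg/dq) - (df/dq)(dg/dp), the sign convention of (5.17)
    return sp.diff(f, p)*sp.diff(g, q) - sp.diff(f, q)*sp.diff(g, p)

# Harmonic oscillator as a test Hamiltonian (m = omega = 1, an assumption).
H = (p**2 + q**2)/2
# Eq. (5.19) for g = q and g = p reproduces the Hamilton equations:
dq_dt = poisson(H, q)    # should equal  p
dp_dt = poisson(H, p)    # should equal -q

# Skew symmetry and the Jacobi identity for three arbitrary phase space functions:
f1, f2, f3 = q**2*p, sp.sin(q)*p, q + p**3
skew = poisson(f1, f2) + poisson(f2, f1)
jacobi = (poisson(f1, poisson(f2, f3)) + poisson(f3, poisson(f1, f2))
          + poisson(f2, poisson(f3, f1)))
```

Expanding `skew` and `jacobi` gives zero identically, as the properties demand; no special choice of functions is needed, since both identities follow from the derivation structure of the bracket.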

5.2.5 Variational principle

The solutions x(t) of the Hamilton equations can be interpreted as extremal curves of a certain action functional. This connection will be a gateway to the further development of the theory.

Let us consider the set M = {x : [t₀, t₁] → Γ, q(t₀) = q₀, q(t₁) = q₁} of all phase space curves beginning and ending at the common configuration space points q₀ and q₁. We next define the local functional

$$
S : M \to \mathbb{R}, \qquad \mathbf{x} \mapsto S[\mathbf{x}] = \int_{t_0}^{t_1} dt \left(\sum_{i=1}^{f} p_i \dot q_i - H(\mathbf{q},\mathbf{p},t)\right). \tag{5.21}
$$

We now claim that the extremal curves of this functional are but the solutions of the Hamilton equations (5.6). To see this, define the function $F(\mathbf{x},t) \equiv \sum_{i=1}^{f} p_i \dot q_i - H(\mathbf{q},\mathbf{p},t)$, in terms of which $S[\mathbf{x}] = \int_{t_0}^{t_1} dt\, F(\mathbf{x},t)$. Now, according to the general discussion of section 4.1.2, the extremal curves are solutions of the Euler–Lagrange equations
$$
\left(d_t \partial_{\dot x_i} - \partial_{x_i}\right) F(\mathbf{x},t) = 0, \qquad i = 1,\dots,2f.
$$
Evaluating these equations for i = 1, . . . , f and i = f + 1, . . . , 2f, respectively, we obtain the first and second set of the equations (5.6).


5.3 Canonical transformations

In the previous chapter, we emphasized the flexibility of the Lagrangian approach in the choice of problem-adjusted coordinates: any diffeomorphic transformation q_i ↦ q′_i = q′_i(q), i = 1, . . . , f, leaves the Euler–Lagrange equations form-invariant. Now, in the Hamiltonian formalism, we have twice as many variables x_i, i = 1, . . . , 2f. Of course, we may consider "restricted" transformations, q_i ↦ q′_i(q), where only the configuration space coordinates are transformed (the transformation of the p_i's then follows). However, things get more interesting when we set out to explore the full freedom of coordinate transformations "mixing" configuration space coordinates and momenta. This would include, for example, a mapping q_i ↦ q′_i ≡ p_i, p_i ↦ p′_i = q_i (a diffeomorphism!) that exchanges the roles of coordinates and momenta. Now, this newly gained freedom raises two questions: (i) what might such generalized transformations be good for, and (ii) does every diffeomorphic mapping x_i ↦ x′_i qualify as a good coordinate transformation? Beginning with the second one, let us address these two questions in turn:

5.3.1 When is a coordinate transformation "canonical"?

In the literature on variable transformations in Hamiltonian dynamics, it is customary to denote the "new variables" X ≡ x′ by capital letters. We will follow this convention.

Following our general paradigm of valuing "form-invariant equations", we deem a coordinate transformation "canonical" if the following condition is met:

A diffeomorphism x ↦ X(x) defines a canonical transformation if there exists a function H′(X, t) such that the representation of solution curves (i.e. solutions of the equations $\dot{\mathbf{x}} = I\partial_{\mathbf{x}} H$) in the new coordinates X = X(x) solves the "transformed Hamilton equations"
$$
\dot{\mathbf{X}} = I\, \partial_{\mathbf{X}} H'(\mathbf{X},t). \tag{5.22}
$$

Now, this definition may leave us a bit at a loss: it does not tell us how the "new" Hamiltonian H′ relates to the old one, H, and in this sense it seems strangely "under-defined". However, before exploring the ways by which canonical transformations can be constructively obtained, let us take a look at possible motivations for switching to new coordinates.
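One way to give the definition concrete content is to test transformations through their Jacobians. The sketch below (anticipating standard material not derived in this excerpt) uses the fact that for a time-independent map x ↦ X with Jacobian J, the chain rule turns ẋ = I∂_x H into Ẋ = (J I Jᵀ)∂_X H, so the choice H′ = H works precisely when J I Jᵀ = I. The exchange map mentioned above is instructive here:

```python
# A sketch (not from the text): for a time-independent map x -> X(x) with
# Jacobian J, the transformed equations read dX/dt = (J I J^T) dH/dX.
# H' = H therefore works iff J I J^T = I (the standard symplectic condition).
import numpy as np

# Symplectic unit for x = (q, p), f = 1, fixed by q' = dH/dp, p' = -dH/dq:
I = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

def transformed_I(J):
    return J @ I @ J.T

# Plain exchange (Q, P) = (p, q): Jacobian is a permutation matrix.
J_swap = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
# Exchange with a sign, (Q, P) = (p, -q).
J_sign = np.array([[0.0, 1.0],
                   [-1.0, 0.0]])

# The plain exchange yields J I J^T = -I: the Hamilton equations survive only
# if one also flips the Hamiltonian, H' = -H.  The signed exchange preserves I
# exactly, so H' = H does the job.
swap_result = transformed_I(J_swap)
sign_result = transformed_I(J_sign)
```

This already hints at the answer to question (ii): not every diffeomorphism preserves the form of the equations with H′ = H, and the freedom to choose a different H′ is exactly what the definition above leaves open.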

5.3.2 Canonical transformations: why?

Why would one start thinking about generalized coordinate transformations mixing configuration space coordinates and momenta? In previous sections, we argued that coordinates should relate to the symmetries of a problem. Symmetries, in turn, generate conservation laws (Noether), i.e. for each symmetry one obtains a quantity, I, that is dynamically conserved. Within the framework of phase space dynamics, this means that there is a function I(x) such that d_t I(x(t)) = 0, where x(t) is a solution curve. We may think of I(x) as a function that is constant along the Hamiltonian flow lines.

Now, suppose we have s ≤ f symmetries and, accordingly, s conserved functions I_i, i = 1, . . . , s. Further assume that we find a canonical transformation to phase space coordinates x ↦ X, where $\mathbf{P} = (I_1,\dots,I_s,P_{s+1},\dots,P_f)$ and $\mathbf{Q} = (\phi_1,\dots,\phi_s,Q_{s+1},\dots,Q_f)$. In words: the first s of the new momenta are the functions I_i. Their conjugate configuration space coordinates are called φ_i.⁸ The new Hamiltonian will be some function of the new coordinates and momenta, $H'(\phi_1,\dots,\phi_s,Q_{s+1},\dots,Q_f,I_1,\dots,I_s,P_{s+1},\dots,P_f)$. Now let's take a look at the Hamilton equations associated to the variables (φ_i, I_i):

$$
d_t \phi_i = \partial_{I_i} H', \qquad d_t I_i = -\partial_{\phi_i} H' \stackrel{!}{=} 0.
$$

The second equation tells us that H′ must be independent of the variables φ_i:

The coordinates φ_i corresponding to conserved momenta I_i are cyclic coordinates,
$$
\frac{\partial H'}{\partial \phi_i} = 0.
$$
At this point, it should have become clear that coordinate sets containing conserved momenta, I_i, and their conjugate coordinates, φ_i, are tailored to the optimal formulation of mechanical problems.

The power of these coordinates becomes fully evident in the extreme case s = f, i.e. where we have as many conserved quantities as degrees of freedom. In this case

$$
H' = H'(I_1,\dots,I_f),
$$

and the Hamilton equations assume the form

$$
d_t \phi_i = \partial_{I_i} H' \equiv \omega_i, \qquad d_t I_i = -\partial_{\phi_i} H' = 0.
$$

These equations can now be trivially solved:⁹

$$
\phi_i(t) = \omega_i t + \alpha_i, \qquad I_i(t) = I_i = \text{const.}
$$

⁸ It is natural to designate the conserved quantities as "momenta": in section 4.2.4 we saw that the conserved quantity associated to the "natural" coordinate of a symmetry (a coordinate transforming additively under the symmetry operation) is the Noether momentum of that coordinate.

⁹ The problem remains solvable even if $H'(I_1,\dots,I_f,t)$ contains explicit time dependence. In this case, $\dot\phi_i = \partial_{I_i} H'(I_1,\dots,I_f,t) \equiv \omega_i(t)$, where ω_i(t) is a function of known time dependence. (The time dependence is known because the variables I_i are constant and we are left with the externally imposed dependence on time.) These equations – ordinary first-order differential equations in time – are solved by $\phi_i(t) = \int_{t_0}^{t} dt'\, \omega_i(t')$.
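A minimal sketch of the s = f scenario (the harmonic oscillator with m = 1 and an arbitrarily chosen frequency, both assumptions not in the text): in action-angle form H′ = ωI, the solution quoted above maps back to (q, p) through the standard oscillator substitution, and conservation of the energy H = ωI can be checked directly:

```python
# A sketch (not from the text): the harmonic oscillator (m = 1, frequency
# omega an assumed parameter) in action-angle form, H' = omega*I, realizing
# phi(t) = omega*t + alpha and I = const as stated above.
import numpy as np

omega, I0, alpha = 2.0, 0.5, 0.3        # assumed parameters

t = np.linspace(0.0, 10.0, 1001)
phi = omega*t + alpha                   # d_t phi = dH'/dI = omega
I = np.full_like(t, I0)                 # d_t I = -dH'/dphi = 0

# Standard map back to (q, p) for the oscillator:
#   q = sqrt(2I/omega) sin(phi),  p = sqrt(2I*omega) cos(phi)
q = np.sqrt(2*I/omega)*np.sin(phi)
p = np.sqrt(2*I*omega)*np.cos(phi)

# The energy H = p**2/2 + omega**2 * q**2 / 2 equals omega*I at every instant.
E = p**2/2 + omega**2*q**2/2
```

One can verify by differentiation that q and p so defined satisfy the original Hamilton equations q̇ = p, ṗ = −ω²q, i.e. the trivial action-angle dynamics encodes the full oscillation.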


A very natural guess would be $H'(\mathbf{X},t) \stackrel{?}{=} H(\mathbf{x}(\mathbf{X}),t)$, i.e. the "old" Hamiltonian expressed in new coordinates. However, we will see that this ansatz can be too restrictive. Progress with this situation is made once we remember that solution curves x are extrema of an action functional S[x] (cf. section 5.2.5). This functional is the x-representation of an "abstract" functional. The same object can be expressed in terms of X-coordinates (cf. our discussion in section 4.1.3) as S′[X] = S[x]. The form of the functional S′ follows from the condition that the X-representation of the curve be a solution of the equations (5.22). This is equivalent to the condition

$$
S'[\mathbf{X}] = \int_{t_0}^{t_1} dt \left(\sum_{i=1}^{f} P_i \dot Q_i - H'(\mathbf{Q},\mathbf{P},t)\right) + \text{const.},
$$

where "const." is a contribution whose significance we will discuss in a moment. Indeed, the variation of S′[X] generates Euler–Lagrange equations which, as we saw in section 5.2.5, are just the Hamilton equations (5.22).

