Advanced Theoretical Physics Semester 1 Jonathan Pearson School of Physics & Astronomy, University of Manchester August 8, 2009


Advanced Theoretical Physics Semester 1

Jonathan Pearson

School of Physics & Astronomy, University of Manchester

August 8, 2009


Preface

These are a set of notes I have made, based on lectures given by various lecturers at the University of Manchester, Sept-Dec ’08. Please e-mail me with any comments/corrections: [email protected]. These notes may be found at www.jpoffline.com.

These notes are a combination of notes taken whilst in lectures, and extra bits added by me. There is no guarantee as to the validity of any portion of this work.

The gravitation notes are based upon the lecture course given by A. Pilaftsis; advanced quantum mechanics notes are based upon lectures given by S. Grigorenko; and advanced statistical physics as given by A. McKane.


Contents

Preface

1 Gravitation
  1.1 Recap of Special Relativity
    1.1.1 The Lorentz Transformations
    1.1.2 Covariant Formalism
    1.1.3 Standard Relations
    1.1.4 The Equivalence Principles
    1.1.5 Gravitational Redshift
    1.1.6 Einstein’s Vision of General Relativity
  1.2 Manifolds, Metrics & Tensors
    1.2.1 Definitions
    1.2.2 Coordinate Transformations
    1.2.3 Tangent Vector
    1.2.4 The Metric & Line Element
    1.2.5 Vectors
    1.2.6 Tensors
  1.3 Tensor Calculus
    1.3.1 Covariant Differentiation
    1.3.2 Geodesics
    1.3.3 Isometries & Killing’s Equation
    1.3.4 Summary
    1.3.5 Examples
  1.4 Curvature
    1.4.1 The Riemann Tensor
    1.4.2 The Ricci Identity
    1.4.3 The Ricci Tensor & Scalar
    1.4.4 The Bianchi Identity
    1.4.5 The Einstein Tensor
    1.4.6 Geodesic Deviation
  1.5 Einstein’s Equation
    1.5.1 The Energy Momentum Tensor $T_{\mu\nu}$
    1.5.2 Einstein’s Equation
    1.5.3 The Newtonian Limit
    1.5.4 Linearised Gravity
  1.6 The Schwarzschild Solution
    1.6.1 Dynamics in the Schwarzschild Spacetime
    1.6.2 Light Deflection
    1.6.3 Perihelion Precession
    1.6.4 Black Holes
  1.7 The Friedmann-Robertson-Walker Universe
    1.7.1 The FRW Metric
    1.7.2 Geodesics & Christoffel Symbols
    1.7.3 Cosmology in the FRW Universe
    1.7.4 Age of the FRW Universe
    1.7.5 Light in the FRW Universe
    1.7.6 Flatness Problem
  1.8 The General Theory of Relativity: Discussion

2 Advanced Quantum Mechanics
  2.1 Different Quantisation Schemes
    2.1.1 Orthodox Quantisation
    2.1.2 Modern Quantum Mechanics
    2.1.3 Quantum Mechanical Operators
    2.1.4 Quantum Evolution
    2.1.5 Path Integrals
    2.1.6 Review of Quantisation Schemes
  2.2 Quantum Harmonic Oscillator
    2.2.1 Raising & Lowering Operators
    2.2.2 The Vacuum State $\psi_0(x)$
    2.2.3 The General State $\psi_n(x)$
    2.2.4 Eigenstates & Eigenvalues of $a$
    2.2.5 Examples
  2.3 Secondary Quantisation
    2.3.1 Bosons & Fermions
    2.3.2 Non-interacting Particles
    2.3.3 Creation & Destruction Operators
    2.3.4 The Secondary Quantisation Scheme
    2.3.5 Average Number of Particles
    2.3.6 Quasi-particles
  2.4 Symmetries in Quantum Mechanics
    2.4.1 The Translation Operator
    2.4.2 Generators, Conservation & Gauges
  2.5 Angular Momentum
    2.5.1 Eigenstates and Eigenvalues of Angular Momentum
    2.5.2 Internal Degrees of Freedom
    2.5.3 Total Angular Momentum
    2.5.4 Multiplication of Angular Momenta
  2.6 Charged Particle in EM Field
    2.6.1 Pauli Hamiltonian
    2.6.2 Phenomenology
    2.6.3 Quantum Theory of Radiation

3 Advanced Statistical Physics
  3.1 Elementary Probability Theory
    3.1.1 Representations
    3.1.2 Stochastic Random Variables
    3.1.3 Multi-variable Probability Distribution Functions
    3.1.4 Covariance
    3.1.5 The Central Limit Theorem
    3.1.6 Time-dependent Systems
  3.2 Markov Processes
    3.2.1 Introduction
    3.2.2 Markov Chains
    3.2.3 Stochastic Matrices
    3.2.4 Examples of Markov Chains
    3.2.5 The Master Equation
    3.2.6 One Step Processes
    3.2.7 Solution to Master Equation Under Detailed Balance
    3.2.8 Summary
  3.3 Drift & Diffusion
    3.3.1 Introduction
    3.3.2 The Fokker-Planck Equation
    3.3.3 Properties of the Fokker-Planck Equation
  3.4 Stochastic Differential Equations
    3.4.1 Brownian Motion Described by the Langevin Equation
    3.4.2 The Solution to the Langevin Equation Describing Brownian Motion
    3.4.3 Comments on the Langevin Equation
    3.4.4 Equivalence to the Fokker-Planck Equation


1 Gravitation

1.1 Recap of Special Relativity

Let us quickly recap the principles of special relativity that are assumed to be known.

The postulates of SR:

• All laws of nature are the same for all inertial observers;
• The speed of light, c, is the same for all inertial observers.

1.1.1 The Lorentz Transformations

Consider a frame Σ′, within which an observer is stationary. The coordinates in that frame are the “primed ones”, (ct′, x′, y′, z′). Now, consider another frame, Σ, such that Σ′ is moving along the x-axis at constant velocity v (with β ≡ v/c) relative to a stationary observer in Σ. The coordinates in the “stationary frame” are unprimed, (ct, x, y, z).

The two sets of coordinates are related via the transformations

\[ ct' = \gamma(ct - \beta x), \qquad x' = \gamma(x - \beta ct), \qquad y' = y, \qquad z' = z. \tag{1.1.1} \]

We have defined the quantities

\[ \gamma \equiv \frac{1}{\sqrt{1-\beta^2}}, \qquad \beta \equiv \frac{v}{c}. \]

From the transformations, we can compute “the invariance of the interval”, thus

\[ c^2t'^2 - x'^2 - y'^2 - z'^2 = c^2t^2 - x^2 - y^2 - z^2. \]

The physical consequences of this are Fitzgerald contraction (moving bodies shorten) and time dilation (moving clocks run slow).
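As a quick numerical check of (1.1.1), the following minimal sketch boosts one event and compares the interval before and after. The function names `boost` and `interval`, and the sample event, are illustrative choices rather than part of the notes.

```python
import math

def boost(ct, x, y, z, beta):
    """Apply the boost (1.1.1) along the x-axis, with beta = v/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return (gamma * (ct - beta * x), gamma * (x - beta * ct), y, z)

def interval(ct, x, y, z):
    """The quantity (ct)^2 - x^2 - y^2 - z^2."""
    return ct**2 - x**2 - y**2 - z**2

event = (5.0, 1.0, 2.0, 3.0)          # arbitrary sample event (ct, x, y, z)
event_primed = boost(*event, beta=0.6)

# The two printed numbers should agree up to rounding.
print(interval(*event), interval(*event_primed))
```

The individual coordinates change under the boost, but the combination returned by `interval` does not.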


1.1.2 Covariant Formalism

The title “covariant formalism” is a little misleading: it should read “invariant formalism”, but convention leaves it so.

Let us define the contravariant position 4-vector as

\[ x^\mu = (x^0, x^1, x^2, x^3) = (ct, x, y, z). \tag{1.1.2} \]

The metric of SR is flat, called the Minkowski metric, and written $\eta_{\mu\nu}$. The elements of the metric may be represented as

\[ (\eta_{\mu\nu}) = \mathrm{diag}(1,-1,-1,-1) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \]

Notice that this metric is symmetric; $\eta_{\mu\nu} = \eta_{\nu\mu}$. Consider constructing an inverse matrix to this metric. That is, we require

\[ \eta\,\eta^{-1} = 1_4, \]

where $1_4$ is the 4-D identity matrix diag(1,1,1,1). Inspection shows that the inverse matrix has the same elements as the original. We denote the inverse of the metric as

\[ (\eta^{-1})_{\mu\nu} \equiv \eta^{\mu\nu}, \]

thus, we have that

\[ \eta_{\mu\nu}\eta^{\nu\lambda} = \delta_\mu^\lambda. \]

Now, in Euclidean space, suppose we have a vector $x = x^i e_i$, where $e_i$ is a basis vector and $i \in [1, n]$, where n is the dimension of the Euclidean space (usually 3). Then, the dot-product of the vector with itself can be written as

\[ x \cdot x = x^i x^j\, e_i \cdot e_j, \]

and we “mix the basis vectors” via the Kronecker-delta, which is the metric of Euclidean space

\[ e_i \cdot e_j = \delta_{ij} \quad\Rightarrow\quad x \cdot x = x^i x^j \delta_{ij} = x^i x^i. \]

If we expand out this implied summation, we get the squared radius of a sphere in the n-dimensional Euclidean space (shown here for n = 3)

\[ x^i x^i = x^2 + y^2 + z^2. \]

Now, we make the analogy to Minkowski space. We denote a contravariant vector as $x = x^\mu e_\mu$, so that the inner-product of the vector with itself is written

\[ x \cdot x = x^\mu x^\nu\, e_\mu \cdot e_\nu, \]

and again we mix the basis vectors by the metric of the space; the metric of Minkowski space is $\eta_{\mu\nu}$. Thus,

\[ e_\mu \cdot e_\nu = \eta_{\mu\nu}, \]

and therefore

\[ x \cdot x = x^\mu x^\nu \eta_{\mu\nu}. \]

If we say that

\[ x_\mu = \eta_{\mu\nu} x^\nu, \tag{1.1.3} \]

then we see that

\[ x \cdot x = x^\mu x_\mu. \]

From this, we are able to define the covariant position 4-vector as

\[ x_\mu = \eta_{\mu\nu} x^\nu = (ct, -x, -y, -z). \]

And therefore, carrying out the summation, we find that the inner-product of the position 4-vector with itself is the (squared) radius of a 4-D sphere in Minkowski space;

\[ x^\mu x_\mu = (ct)^2 - x^2 - y^2 - z^2. \]

Of course, we can write the inner-product of one 4-vector with another

\[ x \cdot y = x^\mu y^\nu \eta_{\mu\nu} = x^\mu y_\mu. \]

Just as we used the metric to lower a contravariant vector's index, to become a covariant index, we may use the inverse metric to raise a covariant index to become a contravariant one

\[ x^\mu = \eta^{\mu\nu} x_\nu. \tag{1.1.4} \]

Therefore, using these relations, we are able to see that

\[ x^\nu y_\nu = x_\nu y^\nu. \]


1.1.2.1 Lorentz Boost

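Consider again the 4-vector $x = x^\mu e_\mu$. The vector itself is the same in another frame, so we must have that

\[ x^\mu e_\mu = x'^\mu e'_\mu. \]

The way we transform between frames is via a Lorentz boost;

\[ x'^\mu = \Lambda^\mu{}_\nu x^\nu, \tag{1.1.5} \]

where we use

\[ (\Lambda^\mu{}_\nu) = \begin{pmatrix} \gamma & -\gamma\beta & 0 & 0 \\ -\gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]

If we note all of our definitions used thus far (for contravariant vectors, and their components), and the expressions forming the Lorentz transformations, (1.1.1), we see that

\[ \Lambda^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu}. \]

We say that $\Lambda^\mu{}_\nu$ (as defined above) constitutes a boost along the x-axis. It is in fact a (hyperbolic) rotation in the ct-x plane, which leaves the y-z plane fixed.

Hence, we have a rule for transforming contravariant components between frames: (1.1.5). Then, how does a covariant component transform?

Consider using the metric to change from a contravariant vector to a covariant one, in the primed frame,

\[ x'_\mu = \eta_{\mu\kappa} x'^\kappa, \]

then we use (1.1.5) to transform the contravariant vector on the RHS

\[ \eta_{\mu\kappa} x'^\kappa = \eta_{\mu\kappa} \Lambda^\kappa{}_\lambda x^\lambda, \]

then write $x^\lambda$ on the RHS in terms of its covariant components, using (1.1.4),

\[ \eta_{\mu\kappa} \Lambda^\kappa{}_\lambda x^\lambda = \eta_{\mu\kappa} \Lambda^\kappa{}_\lambda \eta^{\lambda\nu} x_\nu. \]

Although not previously stated, we can imagine that the metric can lower/raise indices on anything, not just position vector-components. Thus, we see that $\eta_{\mu\kappa}\Lambda^\kappa{}_\lambda = \Lambda_{\mu\lambda}$. Hence, the above reads

\[ \eta_{\mu\kappa} \Lambda^\kappa{}_\lambda \eta^{\lambda\nu} x_\nu = \Lambda_\mu{}^\nu x_\nu. \]

Now, let us define the inverse Lorentz transform as

\[ (\Lambda^{-1})^\nu{}_\mu \equiv \Lambda_\mu{}^\nu. \]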


Therefore, writing this stream of algebra down, from start to finish, we arrive at our result

\[ \begin{aligned} x'_\mu &= \eta_{\mu\kappa} x'^\kappa \\ &= \eta_{\mu\kappa} \Lambda^\kappa{}_\lambda x^\lambda \\ &= \eta_{\mu\kappa} \Lambda^\kappa{}_\lambda \eta^{\lambda\nu} x_\nu \\ &= \Lambda_\mu{}^\nu x_\nu \\ &= (\Lambda^{-1})^\nu{}_\mu x_\nu. \end{aligned} \]

That is, to find the covariant components of a vector in the primed frame, we relate them to the unprimed frame via the inverse Lorentz transformation

\[ x'_\mu = (\Lambda^{-1})^\nu{}_\mu x_\nu. \tag{1.1.6} \]

Let us then write our two Lorentz transformation rules; one for contravariant components & one for covariant

\[ x'^\mu = \Lambda^\mu{}_\nu x^\nu, \qquad x'_\mu = (\Lambda^{-1})^\nu{}_\mu x_\nu. \tag{1.1.7} \]

Notice that the inverse Lorentz transformation matrix may be written as

\[ ((\Lambda^{-1})^\nu{}_\mu) = \begin{pmatrix} \gamma & \gamma\beta & 0 & 0 \\ \gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad (\Lambda^{-1})^\nu{}_\mu = \frac{\partial x^\nu}{\partial x'^\mu}. \]

Notice that the product of $\Lambda^\mu{}_\nu$ and $(\Lambda^{-1})^\nu{}_\mu$ is the identity matrix, as they are inverses

\[ \Lambda^\nu{}_\lambda (\Lambda^{-1})^\mu{}_\nu = \delta^\mu_\lambda. \]

We are now in a position to be able to prove the invariance of the interval, in Minkowski space, under Lorentz transformations. Consider the inner-product of two vectors in the primed frame,

\[ x' \cdot y' = x'^\mu y'_\mu, \]

we then transform each expression on the RHS, according to the relevant rule

\[ x'^\mu y'_\mu = \Lambda^\mu{}_\nu (\Lambda^{-1})^\lambda{}_\mu\, x^\nu y_\lambda, \]

then, noting the relation between the transformation & its inverse,

\[ \Lambda^\mu{}_\nu (\Lambda^{-1})^\lambda{}_\mu\, x^\nu y_\lambda = \delta^\lambda_\nu x^\nu y_\lambda, \]

which easily gives

\[ \delta^\lambda_\nu x^\nu y_\lambda = x^\nu y_\nu. \]

And therefore, putting it all together

\[ \begin{aligned} x'^\mu y'_\mu &= \Lambda^\mu{}_\nu (\Lambda^{-1})^\lambda{}_\mu\, x^\nu y_\lambda \\ &= \delta^\lambda_\nu x^\nu y_\lambda \\ &= x^\nu y_\nu. \end{aligned} \]

And thus, we have shown that the inner-product is invariant under Lorentz transformation (the invariance of the interval).
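The two transformation rules (1.1.7) and the invariance proof can be checked numerically. This is a minimal sketch: the helper names are hypothetical, and the inverse boost is obtained by flipping the sign of β, which matches the inverse matrix written above.

```python
import math

ETA_DIAG = [1.0, -1.0, -1.0, -1.0]  # diagonal of the Minkowski metric

def boost_matrix(beta):
    """Boost along x as in (1.1.5); boost_matrix(-beta) is its inverse."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -gamma * beta, 0.0, 0.0],
            [-gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_up(lam, x_up):
    """Contravariant rule: x'^mu = Lambda^mu_nu x^nu."""
    return [sum(lam[mu][nu] * x_up[nu] for nu in range(4)) for mu in range(4)]

def transform_down(lam_inv, x_down):
    """Covariant rule: x'_mu = (Lambda^-1)^nu_mu x_nu (sum over the first index)."""
    return [sum(lam_inv[nu][mu] * x_down[nu] for nu in range(4)) for mu in range(4)]

def lower(x_up):
    return [ETA_DIAG[mu] * x_up[mu] for mu in range(4)]

def contract(x_up, y_down):
    return sum(a * b for a, b in zip(x_up, y_down))

beta = 0.6
lam, lam_inv = boost_matrix(beta), boost_matrix(-beta)
x_up, y_up = [5.0, 1.0, 2.0, 3.0], [2.0, 1.0, 0.0, 1.0]

before = contract(x_up, lower(y_up))
after = contract(transform_up(lam, x_up), transform_down(lam_inv, lower(y_up)))
print(before, after)  # should agree up to rounding
```

Transforming the covariant components with the inverse matrix gives the same result as boosting the contravariant components and then lowering the index, which is the content of (1.1.6).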

1.1.3 Standard Relations

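Here we shall merely state the standard definitions of various 4-vectors. The infinitesimal 4-position is defined as

\[ dx^\mu = (c\,dt, d\mathbf{x}), \quad\Rightarrow\quad dx_\mu = \eta_{\mu\nu} dx^\nu = (c\,dt, -d\mathbf{x}). \]

The line element:

\[ ds^2 = \eta_{\mu\nu} dx^\mu dx^\nu = c^2 dt^2 - (d\mathbf{x})^2. \]

Proper time:

\[ d\tau = \frac{1}{c}\sqrt{dx^\nu dx_\nu} = \frac{dt}{\gamma}. \]

4-velocity:

\[ u^\mu = \frac{dx^\mu}{d\tau} = (c\gamma, \gamma\mathbf{v}). \]

4-momentum:

\[ p^\mu = m u^\mu = (E/c, \mathbf{p}). \]

Differential operator:

\[ \partial_\mu \equiv \left( \frac{1}{c}\partial_t, \nabla \right). \]

Charge conservation:

\[ \partial_\mu J^\mu = 0, \qquad J^\mu = (c\rho, \mathbf{J}). \]

Lorentz gauge:

\[ \partial_\mu A^\mu = 0, \qquad A^\mu = (\phi/c, \mathbf{A}). \]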
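One consequence of these definitions, not stated above but following directly from them, is that the 4-velocity has constant norm, $u^\mu u_\mu = c^2$. A minimal sketch, with hypothetical helper names and an arbitrary sample velocity:

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_velocity(vx, vy, vz):
    """u^mu = (c*gamma, gamma*v) for a 3-velocity v given in m/s."""
    gamma = 1.0 / math.sqrt(1.0 - (vx**2 + vy**2 + vz**2) / C**2)
    return (gamma * C, gamma * vx, gamma * vy, gamma * vz)

def minkowski_norm_squared(u):
    """u^mu u_mu with signature (+, -, -, -)."""
    return u[0]**2 - u[1]**2 - u[2]**2 - u[3]**2

u = four_velocity(0.6 * C, 0.0, 0.0)
print(minkowski_norm_squared(u) / C**2)  # close to 1 for any sub-light velocity
```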


1.1.4 The Equivalence Principles

Here we shall discuss some thought experiments which lead to the development of general relativity.

1.1.4.1 The Weak Equivalence Principle

Imagine an observer and “ball” inside a sealed lift. The observer is stationary relative to the ball, and is unable to see out of the lift. Suppose that the lift is suspended in a homogeneous gravitational field.

Then, suppose that the cable holding the lift up is cut. The lift will accelerate downwards, a = g; where the acceleration due to gravity is just given by

\[ \mathbf{g} = -\nabla\phi_g. \]

Now, experience tells us that both the observer and ball will remain at rest, relative to each other, inside the lift.

From Newton, we have the relation between the resultant force on a body (which will be the gravitational mass times the gravitational field), and the inertial mass with acceleration:

\[ m_i \mathbf{a} = m_g \mathbf{g}. \]

Thus, as a = g, we therefore easily see that $m_i = m_g$. This leads to the statement of the weak equivalence principle:

“Gravity couples in the same way to all mass & energy”.

1.1.4.2 The Strong Equivalence Principle

Consider the same setup as before: observer & ball at rest inside a sealed lift. This time, let the lift be in free space (i.e. no gravitational fields, anywhere).

Then, suppose that we accelerate the lift (using a rocket) such that a = g. We see that there is no difference between this situation and one in which the lift is sat on the Earth's surface. Thus, the strong equivalence principle:

“All laws of physics are the same in an accelerated frame, and in a uniform static gravitational field”.

1.1.5 Gravitational Redshift

Consider a lift with a stationary observer in it. Also in the lift is a light bulb, which emits light at frequency ν′, according to the observer stationary inside the lift. Now, consider that there is another observer, stationary, on the surface of the earth (which we model as having a homogeneous gravitational field). We have $\hat{x}$ pointing upwards, from the surface of the earth. Then,

\[ \mathbf{g} = -\frac{d\phi}{d\ell}\,\hat{x}. \]

Let the length of the lift be dℓ, and the light bulb reside at the top of the lift. Then, a signal traveling at speed c takes time dt = dℓ/c to traverse the length of the lift.

Now, suppose that the lift is traveling at speed v, and then the observer on the earth will see some shifted frequency, ν. The Doppler shift is just

\[ \frac{\nu'}{\nu} = \left( \frac{1 + v/c}{1 - v/c} \right)^{1/2} \approx 1 + \frac{v}{c}, \]

after using the binomial expansion. From this, writing Δν ≡ ν′ − ν, we see that

\[ \frac{\Delta\nu}{\nu} = \frac{v}{c}. \]

Using the relation that v = du = g dt, this simply gives that

\[ \frac{\Delta\nu}{\nu} = \frac{g}{c}\,dt, \]

which gives, using c dt = dℓ,

\[ \frac{\Delta\nu}{\nu} = \frac{g}{c^2}\,d\ell. \]

Now, if we use the fact that g dℓ = −dφ, then this is just

\[ \frac{\Delta\nu}{\nu} = -\frac{d\phi}{c^2}. \]

Therefore, we see that the frequency shift is due to a changing gravitational potential. Thus, if a photon is moving out of a potential well, then it will be red-shifted; and one moving inward would be blue-shifted.
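To get a feel for the size of the effect, the following sketch evaluates the magnitude $g\,d\ell/c^2$ for light traversing a vertical distance near the Earth's surface. The value of g and the 22.5 m height are illustrative assumptions, not numbers taken from the notes.

```python
C = 299_792_458.0   # speed of light, m/s
G_SURFACE = 9.81    # assumed uniform surface gravity, m/s^2

def fractional_shift(height_m, g=G_SURFACE):
    """Magnitude of the fractional frequency shift, g*dl/c^2, in a uniform field."""
    return g * height_m / C**2

# A few parts in 10^15 for a tower a couple of tens of metres tall.
print(fractional_shift(22.5))
```

The result is tiny because of the factor $1/c^2$, which is why the uniform-field (first-order) treatment above is adequate for laboratory heights.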

1.1.6 Einstein’s Vision of General Relativity

Einstein’s vision is that spacetime is a manifold, such that line elements are given by

\[ ds^2 = g_{\mu\nu}(x^\rho)\, dx^\mu dx^\nu, \]

where the metric is a function of coordinates. Within the metric (or, how the metric is constructed) is information on how spacetime is curved; and it is curved by any form of energy/momentum. According to the equivalence principle, one can always choose coordinates such that spacetime is locally flat (Minkowski). Things in the spacetime travel along geodesics, the straightest possible paths. Massive particles travel along time-like geodesics, which have $ds^2 > 0$, photons travel along null geodesics, $ds^2 = 0$, and tachyons along $ds^2 < 0$.

1.2 Manifolds, Metrics & Tensors

1.2.1 Definitions

Let us state some rather (mathematically) loose definitions.

Manifold. A manifold is a continuous set of points, which locally looks like an n-dimensional Minkowski space.

That is, given a manifold M, if we “zoom in” on a little bit, that little bit will look flat. Suppose we zoom in on a bit which we label $u_i(p)$, where i just means that we chose one of many bits; and p is the point at the middle of the bit $u_i$. The coordinate system in $u_1$ (say) is Minkowski, $x^a(p)$. The whole collection of these little bits leads us to our next definition.

A manifold endowed with a metric is called a Riemannian manifold.

Atlas. An atlas is the complete set of coordinate systems $u_i$ in the manifold M.

Curve. A curve, in an n-manifold (where M merely has n coordinates), is a subset of points defined parametrically

\[ x^a = x^a(\lambda), \qquad a = 1, 2, \ldots, n, \qquad \lambda \in \mathbb{R}. \]

For example, consider a 1-sphere (i.e. a circle), defined by the equation $x^2 + y^2 = 1$. We parameterise it thus

\[ x^a = (x(\lambda), y(\lambda)) \quad\Rightarrow\quad x(\lambda) = \sin\lambda, \quad y(\lambda) = \cos\lambda; \qquad 0 \le \lambda < 2\pi. \]

Surfaces. An m-dim hypersurface in an n-manifold (whereby m < n), is defined as

\[ x^a = x^a(\lambda_1, \ldots, \lambda_m); \qquad \lambda_{1,\ldots,m} \in \mathbb{R}. \]

So that a curve is a 1D hypersurface. Or, alternatively, a surface is a generalisation of a curve.

For example, consider a 2-sphere (i.e. the surface of a ball), of constant radius r. It is defined by $x^2 + y^2 + z^2 = r^2 = \mathrm{const}$. We parameterise the surface by (θ, φ), so that

\[ x = r\sin\theta\cos\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\theta; \qquad 0 \le \theta < \pi, \quad 0 \le \phi < 2\pi. \]


1.2.2 Coordinate Transformations

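Consider moving from one coordinate system to another

\[ x^\mu \longmapsto x'^\mu = x'^\mu(x^\nu). \]

Under such a transformation, the displacement vectors $dx^\mu$ and $dx'^\nu$ are related by

\[ dx'^\mu = J^\mu{}_\nu\, dx^\nu, \tag{1.2.1} \]

whereby the inverse is just

\[ dx^\mu = (J^{-1})^\mu{}_\nu\, dx'^\nu. \]

By the chain rule, it is easy to see that the transformation matrix is just the Jacobian

\[ J^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu}. \tag{1.2.2} \]

The transformation & inverse satisfy

\[ J^\mu{}_\nu (J^{-1})^\nu{}_\sigma = \delta^\mu_\sigma. \tag{1.2.3} \]

This is easier to see if we represent the Jacobians in terms of differentials,

\[ J^\mu{}_\nu (J^{-1})^\nu{}_\sigma = \frac{\partial x'^\mu}{\partial x^\nu}\frac{\partial x^\nu}{\partial x'^\sigma} = \frac{\partial x'^\mu}{\partial x'^\sigma} = \delta^\mu_\sigma. \]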

1.2.2.1 Example: Plane Polars

Consider that some point in the $\mathbb{R}^2$ plane may be defined by Cartesian coordinates (x, y) or plane polars, (r, θ). Then, we make the identifications

\[ (x^1, x^2) = (x, y), \qquad (x'^1, x'^2) = (r, \theta). \]

We also know that

\[ x = r\cos\theta, \quad y = r\sin\theta; \qquad r = \sqrt{x^2 + y^2}, \quad \theta = \tan^{-1}(y/x). \]

Then, we can compute the elements of the Jacobian

\[ J^i{}_j = \frac{\partial x'^i}{\partial x^j} = \frac{\partial(r, \theta)}{\partial(x, y)} = \begin{pmatrix} \dfrac{\partial r}{\partial x} & \dfrac{\partial r}{\partial y} \\[2ex] \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\[1ex] -\dfrac{\sin\theta}{r} & \dfrac{\cos\theta}{r} \end{pmatrix}. \]

And therefore,

\[ \begin{aligned} dr &= \sum_j J^r{}_j\, dx^j \\ &= J^r{}_x\, dx + J^r{}_y\, dy \\ &= \cos\theta\, dx + \sin\theta\, dy. \end{aligned} \]

And similarly,

\[ d\theta = -\frac{\sin\theta}{r}\, dx + \frac{\cos\theta}{r}\, dy. \]
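The analytic Jacobian above can be compared against a finite-difference estimate. This is only a sketch: the function names, the step size and the sample point are assumptions, and `math.atan2` is used in place of $\tan^{-1}(y/x)$ so that the angle lands in the correct quadrant.

```python
import math

def to_polar(x, y):
    """(x, y) -> (r, theta)."""
    return math.hypot(x, y), math.atan2(y, x)

def jacobian_analytic(x, y):
    """J^i_j = d(r, theta)/d(x, y), rows (r, theta), columns (x, y)."""
    r, theta = to_polar(x, y)
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta) / r, math.cos(theta) / r]]

def jacobian_numeric(x, y, h=1e-6):
    """Central-difference estimate of the same matrix."""
    columns = []
    for dx, dy in ((h, 0.0), (0.0, h)):
        r_plus, t_plus = to_polar(x + dx, y + dy)
        r_minus, t_minus = to_polar(x - dx, y - dy)
        columns.append(((r_plus - r_minus) / (2 * h), (t_plus - t_minus) / (2 * h)))
    # columns[j] holds (dr/dx^j, dtheta/dx^j); rearrange into rows.
    return [[columns[0][0], columns[1][0]],
            [columns[0][1], columns[1][1]]]

point = (1.0, 2.0)  # sample point away from the origin and the branch cut
print(jacobian_analytic(*point))
print(jacobian_numeric(*point))
```

The two matrices should agree to roughly the accuracy of the finite-difference step.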

1.2.3 Tangent Vector

Imagine that on a manifold M, we have curves parameterised by u. On one curve, there is a point p(u). So, we have $x^\mu = x^\mu(u)$, and the tangent vector to the curve at p is defined to be

\[ T^\mu = \left. \frac{dx^\mu}{du} \right|_{u = u_p}. \tag{1.2.4} \]

1.2.4 The Metric & Line Element

We have the line element

\[ ds^2 = g_{\mu\nu}(x)\, dx^\mu dx^\nu. \tag{1.2.5} \]

Now, a common requirement is the invariance of the line element (i.e. invariance of the interval). Thus, we require that

\[ ds^2(x) = ds^2(x'). \]

So, under transformation $x^\mu \mapsto x'^\nu(x^\mu)$, we want that

\[ g_{\mu\nu}\, dx^\mu dx^\nu = g'_{\alpha\beta}\, dx'^\alpha dx'^\beta. \tag{1.2.6} \]

So, we proceed by using (1.2.1) to write the “primed” displacement vectors on the RHS in terms of the “unprimed” ones,

\[ g_{\mu\nu}\, dx^\mu dx^\nu = g'_{\alpha\beta}\, dx'^\alpha dx'^\beta = g'_{\alpha\beta} J^\alpha{}_\mu J^\beta{}_\nu\, dx^\mu dx^\nu. \]

But this must hold for arbitrary displacements, so we see that we must have

\[ g_{\mu\nu} = g'_{\alpha\beta} J^\alpha{}_\mu J^\beta{}_\nu. \tag{1.2.7} \]

We can derive a similar relation, by starting from (1.2.6), and instead of transforming the primed displacements, transforming the unprimed ones. So,

\[ g'_{\alpha\beta}\, dx'^\alpha dx'^\beta = g_{\mu\nu}\, dx^\mu dx^\nu = g_{\mu\nu} (J^{-1})^\nu{}_\beta (J^{-1})^\mu{}_\alpha\, dx'^\alpha dx'^\beta, \]

which we require to always be true, leaving us with

\[ g'_{\alpha\beta} = g_{\mu\nu} (J^{-1})^\nu{}_\beta (J^{-1})^\mu{}_\alpha. \tag{1.2.8} \]

The alternative way of writing the Jacobian leads us to be able to rewrite (trivially) expressions (1.2.7) and (1.2.8)

\[ g_{\mu\nu} = \frac{\partial x'^\alpha}{\partial x^\mu}\frac{\partial x'^\beta}{\partial x^\nu}\, g'_{\alpha\beta}, \qquad g'_{\alpha\beta} = \frac{\partial x^\nu}{\partial x'^\beta}\frac{\partial x^\mu}{\partial x'^\alpha}\, g_{\mu\nu}. \]

We call $g_{\mu\nu}$ the “metric”, and $g^{\mu\nu}$ the “inverse metric”; where they must satisfy

\[ g_{\mu\nu} g^{\nu\lambda} = \delta_\mu^\lambda. \tag{1.2.9} \]

1.2.4.1 Example: Polars

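We know that the line element in plane polars is $ds^2 = dr^2 + r^2 d\theta^2$. Thus, we can read off the elements of the metric

\[ (g_{ij}) = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}, \]

and, by (1.2.9), we see that we require

\[ (g^{ij}) = \begin{pmatrix} 1 & 0 \\ 0 & 1/r^2 \end{pmatrix}. \]

In spherical polars, the line element is

\[ ds^2 = dr^2 + r^2 d\theta^2 + r^2\sin^2\theta\, d\phi^2; \]

and we can easily read off the metric

\[ (g_{ij}) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2\sin^2\theta \end{pmatrix}, \qquad (g^{ij}) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/r^2 & 0 \\ 0 & 0 & 1/(r^2\sin^2\theta) \end{pmatrix}. \]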
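The plane-polar metric can also be obtained from the transformation rule (1.2.8), starting from the Cartesian metric $\delta_{\mu\nu}$. The sketch below does this numerically; the function names and the sample point are illustrative assumptions.

```python
import math

def inverse_jacobian(r, theta):
    """(J^-1)^mu_alpha = d(x, y)/d(r, theta); rows (x, y), columns (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def polar_metric(r, theta):
    """g'_{ab} = g_{mn} (J^-1)^m_a (J^-1)^n_b with g the Cartesian metric."""
    jinv = inverse_jacobian(r, theta)
    g_cart = [[1.0, 0.0], [0.0, 1.0]]
    return [[sum(g_cart[m][n] * jinv[m][a] * jinv[n][b]
                 for m in range(2) for n in range(2))
             for b in range(2)]
            for a in range(2)]

# Expect a matrix close to diag(1, r^2), matching the line element read-off.
print(polar_metric(2.0, 0.7))
```

The off-diagonal entries cancel and the θθ entry comes out as $r^2$, in agreement with the metric read off from $ds^2 = dr^2 + r^2 d\theta^2$.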

Raising & Lowering. We can use the metric to raise & lower indices. We shall not show this in use here; see the next subsection.

1.2.5 Vectors

We start this by discussing contravariant and covariant vectors.


1.2.5.1 Contravariant Vectors $A^\mu$

These are sometimes just denoted “vectors”. These are defined to transform, under coordinate transformation $x^\mu \mapsto x'^\mu(x^\nu)$, as

\[ A'^\mu = J^\mu{}_\nu A^\nu. \tag{1.2.10} \]

1.2.5.2 Covariant Vectors $A_\mu$

These are sometimes called “covectors”.

Let us say that we define a covector $A_\mu$ via

\[ A_\mu = g_{\mu\nu} A^\nu. \]

Then, we may derive its transformation properties. Consider that

\[ A'_\mu = g'_{\mu\nu} A'^\nu, \]

the RHS of which we know the transformation rules for, from (1.2.8) and (1.2.10),

\[ A'_\mu = g'_{\mu\nu} A'^\nu = (J^{-1})^\alpha{}_\mu (J^{-1})^\beta{}_\nu\, g_{\alpha\beta}\, J^\nu{}_\sigma A^\sigma. \]

We can rearrange the terms in this expression,

\[ A'_\mu = g'_{\mu\nu} A'^\nu = (J^{-1})^\alpha{}_\mu (J^{-1})^\beta{}_\nu J^\nu{}_\sigma\, g_{\alpha\beta} A^\sigma, \]

so that we notice the appearance of a transformation-inverse multiplication, which results in a Kronecker-delta

\[ A'_\mu = g'_{\mu\nu} A'^\nu = (J^{-1})^\alpha{}_\mu\, \delta^\beta_\sigma\, g_{\alpha\beta} A^\sigma, \]

acting the Kronecker-delta results in (ignoring the middle equality now)

\[ A'_\mu = (J^{-1})^\alpha{}_\mu\, g_{\alpha\beta} A^\beta, \]

then lowering the index, via the metric,

\[ A'_\mu = (J^{-1})^\alpha{}_\mu A_\alpha. \tag{1.2.11} \]

And therefore, we have arrived at the relation we require.

1.2.5.3 The Scalar Product

The scalar product between two vectors is written

\[ S \cdot T = S^\mu T^\nu g_{\mu\nu} = S^\nu T_\nu. \]

A fairly obvious thing we need to prove is the invariance of the dot-product. Inverting (1.2.10) and (1.2.11) gives $S^\nu = (J^{-1})^\nu{}_\alpha S'^\alpha$ and $T_\nu = J^\beta{}_\nu T'_\beta$, so

\[ S^\nu T_\nu = S'^\alpha T'_\beta\, (J^{-1})^\nu{}_\alpha J^\beta{}_\nu = S'^\alpha T'_\beta\, \delta^\beta_\alpha = S'^\alpha T'_\alpha. \]

This is a very important proof. In fact, it also states that scalars are invariant under transformation.

Within the scalar product, we must briefly mention the modulus of a vector. We denote it as ||S||, and define it as

\[ ||S|| = \begin{cases} (S^\mu S_\mu)^{1/2} & \text{time-like}, \ ds^2 > 0, \\ (-S^\mu S_\mu)^{1/2} & \text{space-like}, \ ds^2 < 0. \end{cases} \]

1.2.5.4 Conformal Transformations

Following from the previous definition of the scalar product, we have the definition of the angle between two vectors;

\[ \cos\theta = \frac{S^\mu T_\mu}{||S||\, ||T||} = \frac{S^\mu T_\mu}{(S^\alpha S_\alpha)^{1/2} (T^\beta T_\beta)^{1/2}}. \tag{1.2.12} \]

A conformal transformation is defined as one under which the angle between any two vectors is unchanged.

Associated metrics are termed “conformal metrics”. How can we find such metrics? They are given by

\[ \tilde{g}_{\mu\nu} = \Omega(x)\, g_{\mu\nu}, \qquad \Omega(x) \neq 0. \tag{1.2.13} \]

We can see this by putting this new metric into the cos θ expression,

\[ \cos\theta = \frac{\tilde{g}_{\mu\nu} S^\mu T^\nu}{(\tilde{g}_{\alpha\gamma} S^\alpha S^\gamma)^{1/2} (\tilde{g}_{\beta\delta} T^\beta T^\delta)^{1/2}}, \]

and by substituting $\tilde{g}_{\mu\nu} = \Omega(x) g_{\mu\nu}$, we see that the numerator carries one factor of Ω and the denominator carries $\Omega^{1/2}\,\Omega^{1/2}$, so the factors of Ω end up canceling, leaving the angle unchanged.
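The cancellation of Ω can be seen numerically. The sketch below assumes a positive-definite metric (plane polars at r = 2) and Ω > 0, so that all square roots are real; the names and sample vectors are hypothetical.

```python
import math

def dot(g, s, t):
    """g_{mu nu} S^mu T^nu for a metric given as a nested list."""
    n = len(s)
    return sum(g[m][k] * s[m] * t[k] for m in range(n) for k in range(n))

def cos_angle(g, s, t):
    """cos(theta) as in (1.2.12), with indices lowered by g."""
    return dot(g, s, t) / math.sqrt(dot(g, s, s) * dot(g, t, t))

def rescale(g, omega):
    """Conformally related metric: omega * g."""
    return [[omega * entry for entry in row] for row in g]

g_polar = [[1.0, 0.0], [0.0, 4.0]]   # diag(1, r^2) at r = 2
s, t = [1.0, 0.5], [0.3, -0.2]

print(cos_angle(g_polar, s, t))
print(cos_angle(rescale(g_polar, 3.7), s, t))  # same value up to rounding
```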

1.2.5.5 Is it a Proper Vector?

Here, we ask if various quantities are “proper vectors”, or not.

Consider $C^\mu(x) = aA^\mu(x) + bB^\mu(x)$. It is clearly a proper vector, as each of its constituents transforms as we expect, and each is defined at the same coordinate point.

Consider $C^\mu = aA^\mu(x_1) + bB^\mu(x_2)$. This is not a proper vector, as the constituents are defined at different points, and the Jacobian (and hence the transformation) generally differs from point to point.

1.2.6 Tensors

These are basically vectors, with more indices. We can also mix the indices, so that we have some up, some down.

For example, consider $F^{\mu\nu} \equiv A^\mu B^\nu$. We call it a second rank contravariant tensor, or a $\binom{2}{0}$-tensor. It clearly transforms as

\[ F'^{\mu\nu} = A'^\mu B'^\nu = J^\mu{}_\alpha J^\nu{}_\beta A^\alpha B^\beta = J^\mu{}_\alpha J^\nu{}_\beta F^{\alpha\beta}. \]

Similarly, a second rank covariant tensor, or a $\binom{0}{2}$-tensor, transforms like

\[ F'_{\mu\nu} = A'_\mu B'_\nu = (J^{-1})^\alpha{}_\mu (J^{-1})^\beta{}_\nu A_\alpha B_\beta = (J^{-1})^\alpha{}_\mu (J^{-1})^\beta{}_\nu F_{\alpha\beta}. \]

Finally, a mixed $\binom{1}{1}$-tensor transforms

\[ F'^\mu{}_\nu = J^\mu{}_\alpha (J^{-1})^\beta{}_\nu F^\alpha{}_\beta. \]

This obviously generalises to higher-rank tensors. One must include a Jacobian for each contravariant index, and one inverse Jacobian for each covariant index.

Getting equations & other expressions into tensorial form (i.e. into a form consistent with the above tensor transformations), is extremely useful. For example, given a tensor equation in one frame of reference, one therefore knows the form in all frames of reference. This becomes particularly useful when one finds a frame in which a particular equation becomes simple to analyse; then, one can simply transform out of that frame, and know that the analysis still holds.

Also, consider a tensor for which all components are zero. Then, one cannot make a coordinate transformation that will be able to “reinstate” those (completely) zero components. That is, a tensor with zero components in one frame, has zero components in all frames. This is a very useful concept. If a quantity is not a tensor, then this does not hold true. That is, a non-tensor with zero components in one frame may have non-zero components in another.

1.2.6.1 Symmetric & Anti-symmetric Tensors

A symmetric $\binom{2}{0}$-tensor is one where

\[ A^{\mu\nu} = A^{\nu\mu}, \]

that is, the sign is unchanged under exchange of the indices. An anti-symmetric tensor is one for which

\[ B^{\mu\nu} = -B^{\nu\mu}. \]

Now then, using these relations (definitions, if you will), we can see some interesting formulae.

Suppose that $A^{\mu\nu}$ is a symmetric tensor. Then, $A^{\mu\nu} = A^{\nu\mu}$. Then, we see that

\[ A^{\mu\nu} = \tfrac{1}{2}(A^{\mu\nu} + A^{\nu\mu}) = \tfrac{1}{2}(A^{\mu\nu} + A^{\mu\nu}) = \tfrac{1}{2}\, 2A^{\mu\nu} = A^{\mu\nu}. \]

Similarly, suppose that $B^{\mu\nu}$ is an anti-symmetric tensor. Then,

\[ B^{\mu\nu} = \tfrac{1}{2}(B^{\mu\nu} - B^{\nu\mu}) = \tfrac{1}{2}(B^{\mu\nu} + B^{\mu\nu}) = B^{\mu\nu}. \]

These obviously all hold for covariant tensors. Let's introduce some notation that will be pretty useful.

Suppose we have some tensor, defined as

\[ T_{\mu\nu} \equiv \tfrac{1}{2}(B_{\mu\nu} - B_{\nu\mu}), \]

then, we write

\[ B_{[\mu\nu]} \equiv \tfrac{1}{2}(B_{\mu\nu} - B_{\nu\mu}). \]

That is, we could say that $T_{\mu\nu}$ is formed by the anti-symmetric interchange of indices on $B_{\mu\nu}$. We use the “square brackets” to denote the anti-symmetric interchange. Similarly, suppose we have

\[ C_{\mu\nu} \equiv \tfrac{1}{2}(A_{\mu\nu} + A_{\nu\mu}), \]

then, we define

\[ A_{(\mu\nu)} \equiv \tfrac{1}{2}(A_{\mu\nu} + A_{\nu\mu}). \]

Thus, we say that $C_{\mu\nu}$ is formed by the symmetric interchange of indices. We used “round brackets” to denote the symmetric interchange.

Suppose we have some tensor, $Y_{\mu\nu}$. Then, we can write it as the sum of an anti-symmetric part, and a symmetric part. That is,

\[ Y_{\mu\nu} = Y_{[\mu\nu]} + Y_{(\mu\nu)} = \tfrac{1}{2}(Y_{\mu\nu} - Y_{\nu\mu}) + \tfrac{1}{2}(Y_{\mu\nu} + Y_{\nu\mu}). \]

This is in fact pretty obvious. If the tensor is symmetric, then $Y_{[\mu\nu]} = 0$. And, if the tensor is anti-symmetric, then $Y_{(\mu\nu)} = 0$.
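The split of a rank-2 object into its symmetric and anti-symmetric parts is easy to illustrate with a small matrix of components. The function names and the sample components are assumptions for the sake of the example.

```python
def symmetric_part(y):
    """Y_(mu nu) = (Y_mu_nu + Y_nu_mu) / 2."""
    n = len(y)
    return [[0.5 * (y[m][k] + y[k][m]) for k in range(n)] for m in range(n)]

def antisymmetric_part(y):
    """Y_[mu nu] = (Y_mu_nu - Y_nu_mu) / 2."""
    n = len(y)
    return [[0.5 * (y[m][k] - y[k][m]) for k in range(n)] for m in range(n)]

y = [[1.0, 2.0],
     [4.0, 3.0]]

sym, anti = symmetric_part(y), antisymmetric_part(y)
print(sym)    # unchanged under transposition
print(anti)   # changes sign under transposition, zero on the diagonal

# Adding the two parts recovers the original components.
recombined = [[sym[m][k] + anti[m][k] for k in range(2)] for m in range(2)]
print(recombined == y)
```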

The bracket notation on lower indices can be used in another way. Recall the electromagnetic field tensor,

\[ F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu, \]

then, we can write this as

\[ F_{\mu\nu} = 2\partial_{[\mu} A_{\nu]}. \]

Also recall that two of Maxwell's equations may be recovered from

\[ \partial_\mu F_{\alpha\beta} + \partial_\alpha F_{\beta\mu} + \partial_\beta F_{\mu\alpha} = 0, \]

well, we can denote this (notice that this is a cyclic interchange of index) as

\[ \partial_{[\mu} F_{\alpha\beta]} = 0. \]

Square brackets are the appropriate ones here: because $F_{\alpha\beta}$ is anti-symmetric, the total anti-symmetrisation over the three indices reduces to the cyclic sum, whereas a total symmetrisation would vanish identically. In this final example, we were a little sloppy. There is in fact a numerical factor associated with the bracket; however, it gets very messy, and the factor drops out anyway because the right-hand side is zero. However, one should be aware that there is a factor there.

1.3 Tensor Calculus

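Here we shall lay some formal groundwork for dealing with objects in curved spacetime. We start by looking at differentiation, going on to geodesics.

1.3.1 Covariant Differentiation

Let us just state some notation. We have

\[ \partial_\mu \equiv \frac{\partial}{\partial x^\mu}, \qquad \partial^\mu \equiv \frac{\partial}{\partial x_\mu}. \]

Now, let us look at the coordinate transformation $x^\mu \mapsto x'^\mu(x^\nu)$. Then, we have that

\[ dx'^\mu = J^\mu{}_\nu\, dx^\nu, \qquad J^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu} = \partial_\nu x'^\mu. \]

Now, let us consider differentiation of a scalar, and a coordinate transformation (noting that scalars do not transform under a coordinate transformation); thus

\[ \begin{aligned} \partial_\mu \phi \longmapsto \partial'_\mu \phi &= \frac{\partial}{\partial x'^\mu}\phi \\ &= \frac{\partial x^\nu}{\partial x'^\mu}\,\partial_\nu \phi \\ &= (J^{-1})^\nu{}_\mu\, \partial_\nu \phi. \end{aligned} \]

Therefore, we see that the derivative of a scalar $\partial_\mu \phi$ transforms as a covariant vector

\[ \partial'_\mu \phi = (J^{-1})^\nu{}_\mu\, \partial_\nu \phi. \]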

Now, let us try this with a vector (again, under a coordinate transformation),

$$\partial_\mu A^\nu \longmapsto \partial'_\mu A'^\nu,$$

where we want to derive how the RHS relates back to the LHS. Notice, if $\partial_\mu A^\nu$ is a $\binom{1}{1}$-tensor, then we know what it gives. However, let us derive it. So, using the known transformation rules for $A^\nu$ and $\partial_\mu$,

$$\partial'_\mu A'^\nu = (J^{-1})^\alpha{}_\mu\,\partial_\alpha J^\nu{}_\beta A^\beta.$$

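Now, to continue, we must consider the partial derivative above. We must use the product rule on everything to the right of it. That is,

$$(J^{-1})^\alpha{}_\mu\,\partial_\alpha\left(J^\nu{}_\beta A^\beta\right) = (J^{-1})^\alpha{}_\mu\left(\partial_\alpha J^\nu{}_\beta\right)A^\beta + (J^{-1})^\alpha{}_\mu J^\nu{}_\beta\left(\partial_\alpha A^\beta\right).$$

This is not the transformation rule for a $\binom{1}{1}$-tensor, due to the presence of the first term on the RHS. We write the result, swapping the two terms on the RHS, to see this more clearly:

$$\partial'_\mu A'^\nu = (J^{-1})^\alpha{}_\mu J^\nu{}_\beta\left(\partial_\alpha A^\beta\right) + (J^{-1})^\alpha{}_\mu\left(\partial_\alpha J^\nu{}_\beta\right)A^\beta.$$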

Therefore, we see that the partial derivative of a vector is not a tensor. The non-tensorial part is the added term on the far right. There is a rather more fundamental reason why the partial derivative of a vector is not tensorial. Recall that the partial derivative of a vector is defined as

$$\partial_\mu A^\nu(x^\alpha) = \lim_{\delta u\to 0}\frac{A^\nu(x^\alpha + \delta u) - A^\nu(x^\alpha)}{\delta u},$$

where the displacement $\delta u$ is taken along the $x^\mu$ direction.

So, the partial derivative is composed by finding the value of a vector at different points. As we have seen, the sum of two vectors evaluated at different points is not a proper vector (this is due to the Jacobian being evaluated at different positions). Therefore, one should expect the partial derivative of a vector not to be tensorial; which is what we find.

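Now, consider the vector

$$\mathbf{A}(x) = A^\nu(x)\,\mathbf{e}_\nu(x) = A'^\nu(x)\,\mathbf{e}'_\nu(x),$$

where we use the fact that a vector is the same in all frames. Now consider differentiating $\mathbf{A}$, noting that the components and basis vectors are all functions of the coordinates:

$$\partial_\nu\mathbf{A} = \partial_\nu\left(A^\mu\mathbf{e}_\mu\right) = (\partial_\nu A^\mu)\,\mathbf{e}_\mu + A^\mu(\partial_\nu\mathbf{e}_\mu).$$

Now, to continue, we shall write the final bracketed term as a sum over coefficients,

$$\partial_\nu\mathbf{e}_\mu = \Gamma^\rho{}_{\nu\mu}\,\mathbf{e}_\rho.$$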

The logic behind this will become clear. However, one may think of it in a similar way to quantum theory. Given a state, one can write it as a sum over coefficients times the basis. What we are doing here is to say that $\partial_\nu\mathbf{e}_\mu$ is a “new object”, and write that new object as a sum over the original basis $\mathbf{e}_\rho$, with coefficients $\Gamma^\rho{}_{\nu\mu}$. Notice that this then results in

$$\partial_\nu\mathbf{A} = (\partial_\nu A^\mu)\,\mathbf{e}_\mu + A^\mu\Gamma^\rho{}_{\nu\mu}\,\mathbf{e}_\rho.$$

In the final term, let us relabel the dummy indices, $\mu\to\beta$ and $\rho\to\mu$,

$$A^\mu\Gamma^\rho{}_{\nu\mu}\,\mathbf{e}_\rho \to A^\beta\Gamma^\mu{}_{\nu\beta}\,\mathbf{e}_\mu.$$

This therefore results in

$$\partial_\nu\mathbf{A} = (\partial_\nu A^\mu)\,\mathbf{e}_\mu + A^\beta\Gamma^\mu{}_{\nu\beta}\,\mathbf{e}_\mu,$$

which we factorise (and move the position of the final $A^\beta$) to

$$\partial_\nu\mathbf{A} = \mathbf{e}_\mu\left(\partial_\nu A^\mu + \Gamma^\mu{}_{\nu\beta}A^\beta\right).$$

Furthermore, we define the bracketed quantity as

$$\nabla_\nu A^\mu \equiv \partial_\nu A^\mu + \Gamma^\mu{}_{\nu\beta}A^\beta. \tag{1.3.1}$$

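This defines the covariant derivative of a contravariant vector. We can use this rule for the covariant derivative of a contravariant vector to derive the rule for a covariant vector.

The covariant derivative of a contravariant vector is

$$\nabla_\alpha A^\mu = \partial_\alpha A^\mu + \Gamma^\mu{}_{\alpha\lambda}A^\lambda.$$

A covector is constructed from the contravariant vector via

$$A_\nu = g_{\nu\mu}A^\mu, \qquad\text{so that}\qquad A^\mu = g^{\mu\nu}A_\nu.$$

So,

$$\begin{aligned}
\nabla_\alpha A^\mu &= \nabla_\alpha\left(g^{\mu\nu}A_\nu\right)\\
&= g^{\mu\nu}\nabla_\alpha A_\nu + A_\nu\nabla_\alpha g^{\mu\nu}\\
&= \partial_\alpha A^\mu + \Gamma^\mu{}_{\alpha\lambda}A^\lambda\\
&= \partial_\alpha\left(g^{\mu\nu}A_\nu\right) + \Gamma^\mu{}_{\alpha\lambda}\left(g^{\lambda\beta}A_\beta\right)\\
&= A_\nu\partial_\alpha g^{\mu\nu} + g^{\mu\nu}\partial_\alpha A_\nu + \Gamma^\mu{}_{\alpha\lambda}g^{\lambda\beta}A_\beta.
\end{aligned}$$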

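If we equate the second and last lines,

$$g^{\mu\nu}\nabla_\alpha A_\nu + A_\nu\nabla_\alpha g^{\mu\nu} = A_\nu\partial_\alpha g^{\mu\nu} + g^{\mu\nu}\partial_\alpha A_\nu + \Gamma^\mu{}_{\alpha\lambda}g^{\lambda\beta}A_\beta.$$

Now, the index $\beta$ is a “dummy index”, so we can relabel $\beta\to\nu$ in the last term, to give

$$g^{\mu\nu}\nabla_\alpha A_\nu + A_\nu\nabla_\alpha g^{\mu\nu} = A_\nu\partial_\alpha g^{\mu\nu} + g^{\mu\nu}\partial_\alpha A_\nu + \Gamma^\mu{}_{\alpha\lambda}g^{\lambda\nu}A_\nu,$$

collecting terms,

$$g^{\mu\nu}\nabla_\alpha A_\nu = \left(\partial_\alpha g^{\mu\nu} + \Gamma^\mu{}_{\alpha\lambda}g^{\lambda\nu} - \nabla_\alpha g^{\mu\nu}\right)A_\nu + g^{\mu\nu}\partial_\alpha A_\nu.$$

We then expand out the covariant derivative of the metric (the third term in the bracket), to give

$$g^{\mu\nu}\nabla_\alpha A_\nu = \left(\partial_\alpha g^{\mu\nu} + \Gamma^\mu{}_{\alpha\lambda}g^{\lambda\nu} - \partial_\alpha g^{\mu\nu} - \Gamma^\mu{}_{\alpha\lambda}g^{\lambda\nu} - \Gamma^\nu{}_{\alpha\lambda}g^{\mu\lambda}\right)A_\nu + g^{\mu\nu}\partial_\alpha A_\nu.$$

Now, the first and third terms cancel each other out, as do the second and fourth. Leaving

$$g^{\mu\nu}\nabla_\alpha A_\nu = g^{\mu\nu}\partial_\alpha A_\nu - \Gamma^\nu{}_{\alpha\lambda}g^{\mu\lambda}A_\nu.$$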

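If we multiply through by $g_{\pi\mu}$, then we can use

$$g_{\pi\mu}g^{\mu\lambda} = \delta^\lambda_\pi, \qquad g_{\pi\mu}g^{\mu\nu} = \delta^\nu_\pi.$$

Hence, this gives

$$\delta^\nu_\pi\nabla_\alpha A_\nu = \delta^\nu_\pi\partial_\alpha A_\nu - \delta^\lambda_\pi\Gamma^\nu{}_{\alpha\lambda}A_\nu,$$

which is

$$\nabla_\alpha A_\pi = \partial_\alpha A_\pi - \Gamma^\nu{}_{\alpha\pi}A_\nu.$$

Putting into more “standard indices”, we have our desired result. Hence, the covariant derivative of a covariant vector is

$$\nabla_\nu A_\mu \equiv \partial_\nu A_\mu - \Gamma^\beta{}_{\nu\mu}A_\beta. \tag{1.3.2}$$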

Now, remember that a scalar is invariant, and that the derivative of a scalar is a tensor; so we should have that $\nabla_\mu(A^\nu A_\nu) = \partial_\mu(A^\nu A_\nu)$. This can be checked. So,

$$\begin{aligned}
\nabla_\mu(A^\nu A_\nu) &= (\nabla_\mu A^\nu)A_\nu + A^\nu(\nabla_\mu A_\nu)\\
&= A_\nu\left(\partial_\mu A^\nu + \Gamma^\nu{}_{\mu\beta}A^\beta\right) + A^\nu\left(\partial_\mu A_\nu - \Gamma^\alpha{}_{\mu\nu}A_\alpha\right)\\
&= \partial_\mu(A^\nu A_\nu) + A_\nu\Gamma^\nu{}_{\mu\beta}A^\beta - A^\nu\Gamma^\alpha{}_{\mu\nu}A_\alpha\\
&= \partial_\mu(A^\nu A_\nu) + A_\nu A^\beta\Gamma^\nu{}_{\mu\beta} - A^\nu A_\alpha\Gamma^\alpha{}_{\mu\nu}.
\end{aligned}$$

Now, the last two terms can be shown to cancel, by relabelling dummy indices. Let us manipulate the final term, with $\nu\to\beta$ and $\alpha\to\nu$:

$$A^\nu A_\alpha\Gamma^\alpha{}_{\mu\nu} \;\Rightarrow\; A^\beta A_\nu\Gamma^\nu{}_{\mu\beta},$$

and so, if we put this expression back in, we see that

$$\nabla_\mu(A^\nu A_\nu) = \partial_\mu(A^\nu A_\nu) + A_\nu A^\beta\Gamma^\nu{}_{\mu\beta} - A^\beta A_\nu\Gamma^\nu{}_{\mu\beta} = \partial_\mu(A^\nu A_\nu).$$

Therefore, we see an expected result: the covariant derivative of a scalar is the same as the partial derivative.
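The contravariant rule (1.3.1) can be checked numerically. The sketch below is only an illustration under stated assumptions: it works in plane polar coordinates, quotes the standard non-zero connection components $\Gamma^r{}_{\phi\phi} = -r$ and $\Gamma^\phi{}_{r\phi} = \Gamma^\phi{}_{\phi r} = 1/r$ without deriving them, and takes as its test field the constant Cartesian vector $\mathbf{e}_x$, whose polar components are $(\cos\phi, -\sin\phi/r)$. The function names are hypothetical. The partial derivatives of the components are non-zero, while the covariant derivative vanishes, as it should for a field that is constant in the flat plane.

```python
import math

def gamma(r, phi):
    """Plane-polar connection, G[mu][nu][beta] = Gamma^mu_{nu beta}, with 0 = r, 1 = phi."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r         # Gamma^r_{phi phi}
    G[1][0][1] = 1.0 / r    # Gamma^phi_{r phi}
    G[1][1][0] = 1.0 / r    # Gamma^phi_{phi r}
    return G

def field(r, phi):
    """Polar components (A^r, A^phi) of the constant Cartesian unit vector e_x."""
    return [math.cos(phi), -math.sin(phi) / r]

def partial(f, x, nu, h=1e-5):
    """Central-difference estimate of d f / d x^nu at the point x = [r, phi]."""
    xp, xm = list(x), list(x)
    xp[nu] += h
    xm[nu] -= h
    return [(a - b) / (2 * h) for a, b in zip(f(*xp), f(*xm))]

def covariant_derivative(x):
    """D[nu][mu] = d_nu A^mu + Gamma^mu_{nu beta} A^beta, i.e. equation (1.3.1)."""
    A = field(*x)
    G = gamma(*x)
    D = []
    for nu in range(2):
        dA = partial(field, x, nu)
        D.append([dA[mu] + sum(G[mu][nu][b] * A[b] for b in range(2))
                  for mu in range(2)])
    return D

point = [2.0, 0.7]
partials = [partial(field, point, nu) for nu in range(2)]
nabla = covariant_derivative(point)
```

Here `partials` holds the non-tensorial $\partial_\nu A^\mu$, while every entry of `nabla` is zero up to finite-difference error.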

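We call the expansion coefficients $\Gamma^\lambda{}_{\nu\mu}$ the affine connection.

We are able to find the covariant derivative of tensors of arbitrary rank. A few are given below.

$$\begin{aligned}
\nabla_\alpha A^{\mu\nu} &= \partial_\alpha A^{\mu\nu} + \Gamma^\mu{}_{\alpha\lambda}A^{\lambda\nu} + \Gamma^\nu{}_{\alpha\lambda}A^{\mu\lambda},\\
\nabla_\alpha A_{\mu\nu} &= \partial_\alpha A_{\mu\nu} - \Gamma^\lambda{}_{\alpha\mu}A_{\lambda\nu} - \Gamma^\lambda{}_{\alpha\nu}A_{\mu\lambda},\\
\nabla_\alpha A^\mu{}_\nu &= \partial_\alpha A^\mu{}_\nu + \Gamma^\mu{}_{\alpha\lambda}A^\lambda{}_\nu - \Gamma^\lambda{}_{\alpha\nu}A^\mu{}_\lambda,\\
\nabla_\alpha A^{\mu\nu\sigma} &= \partial_\alpha A^{\mu\nu\sigma} + \Gamma^\mu{}_{\alpha\lambda}A^{\lambda\nu\sigma} + \Gamma^\nu{}_{\alpha\lambda}A^{\mu\lambda\sigma} + \Gamma^\sigma{}_{\alpha\lambda}A^{\mu\nu\lambda}.
\end{aligned}$$

Basically, for each contravariant index there should be a positive connection term, and for each covariant index a negative term.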
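As a small illustration of the sign rule, the following sketch applies the rule for a tensor with two covariant indices to the plane-polar metric $g_{rr} = 1$, $g_{\phi\phi} = r^2$. It assumes the standard polar connection components ($\Gamma^r{}_{\phi\phi} = -r$, $\Gamma^\phi{}_{r\phi} = \Gamma^\phi{}_{\phi r} = 1/r$), which are quoted rather than derived here, and the helper names are hypothetical. Every component of $\nabla_\alpha g_{\mu\nu}$ comes out as zero.

```python
def metric(r):
    """Plane-polar metric g[mu][nu] with 0 = r, 1 = phi."""
    return [[1.0, 0.0], [0.0, r * r]]

def d_metric(r):
    """dg[alpha][mu][nu] = d_alpha g_{mu nu}; only d_r g_{phi phi} = 2r is non-zero."""
    dg = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    dg[0][1][1] = 2.0 * r
    return dg

def gamma(r):
    """G[lam][alpha][mu] = Gamma^lam_{alpha mu} (assumed standard polar values)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = 1.0 / r
    G[1][1][0] = 1.0 / r
    return G

def nabla_metric(r):
    """N[alpha][mu][nu] = d_alpha g_{mu nu} - Gamma^lam_{alpha mu} g_{lam nu}
    - Gamma^lam_{alpha nu} g_{mu lam}: one negative term per covariant index."""
    g, dg, G = metric(r), d_metric(r), gamma(r)
    n = range(2)
    return [[[dg[a][m][v]
              - sum(G[l][a][m] * g[l][v] for l in n)
              - sum(G[l][a][v] * g[m][l] for l in n)
              for v in n] for m in n] for a in n]

N = nabla_metric(1.5)
```

The partial derivative $\partial_r g_{\phi\phi} = 2r$ is cancelled exactly by the two connection terms.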

1.3.1.1 Parallel Transport

The main idea in parallel transport is this. Consider moving a vector from one place to another. Then, in general, that vector will change direction; that is, the vector changes upon being moved. So, we can find the difference in a vector,

$$DA^\mu = A^\mu(x') - \tilde{A}^\mu(x'),$$

where $\tilde{A}^\mu(x')$ denotes the vector at $x$ carried to $x'$ without change of direction. Considering how the basis changes as well, we end up with

$$DA^\mu = \delta x^\nu\left(\partial_\nu A^\mu + \Gamma^\mu{}_{\nu\lambda}A^\lambda\right).$$

The bracketed quantity is just the covariant derivative. Thus,

$$DA^\mu = \delta x^\nu\nabla_\nu A^\mu.$$

Now, the point is that this gives another insight as to what the covariant derivative is. When moving a vector around a manifold, one must consider how the basis vectors change from point to point, as well as the components. This information is within the affine connection.

For an example as to what parallel transport is, consider a circle in the plane. Consider that there is an arrow living on the circle, pointing in a given direction (say parallel to the y-axis). Then, consider moving the arrow around the circle. The arrow undergoes parallel transport if it always points in the same direction, no matter what its position on the circle.

Now, consider that the entire space is the circle-line. That is, we have a 1D manifold. For a vector living on the manifold, parallel transport means moving on tangents to the circle.

1.3.1.2 Absolute Derivative

We define the absolute derivative as

$$\frac{DA^\mu}{Du} = \frac{dx^\nu}{du}\nabla_\nu A^\mu,$$

where we have considered a curve, parameterised so that

$$A^\mu = A^\mu(x^\nu(u)).$$
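To make the absolute derivative concrete, here is a hedged numerical sketch of parallel transport, i.e. of solving $DA^\mu/Du = 0$ along a curve. The setting is an assumption chosen for illustration and is not taken from the notes: the unit 2-sphere with $ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2$, whose standard connection components $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi{}_{\theta\phi} = \cot\theta$ are quoted without derivation. The curve is the circle $\theta = \theta_0$ with parameter $u = \phi$, so the transport equation reads $dA^\mu/d\phi = -\Gamma^\mu{}_{\phi\lambda}A^\lambda$.

```python
import math

def transport_around_latitude(theta0, steps=2000):
    """Parallel-transport A = (A^theta, A^phi) once around the circle theta = theta0.

    Integrates dA^mu/dphi = -Gamma^mu_{phi lambda} A^lambda with classical RK4.
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(A):
        a_th, a_ph = A
        # -Gamma^theta_{phi phi} A^phi  and  -Gamma^phi_{phi theta} A^theta
        return (s * c * a_ph, -(c / s) * a_th)

    A = (1.0, 0.0)                      # start pointing along the theta direction
    h = 2.0 * math.pi / steps
    for _ in range(steps):
        k1 = rhs(A)
        k2 = rhs((A[0] + 0.5 * h * k1[0], A[1] + 0.5 * h * k1[1]))
        k3 = rhs((A[0] + 0.5 * h * k2[0], A[1] + 0.5 * h * k2[1]))
        k4 = rhs((A[0] + h * k3[0], A[1] + h * k3[1]))
        A = (A[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             A[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
    return A

theta0 = math.pi / 3
a_theta, a_phi = transport_around_latitude(theta0)
# Squared length g_{mu nu} A^mu A^nu of the transported vector
norm_sq = a_theta ** 2 + math.sin(theta0) ** 2 * a_phi ** 2
```

The length of the vector is preserved, but after one circuit the vector has rotated through an angle $2\pi\cos\theta_0$ relative to its starting direction; on a flat surface it would return unchanged.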

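1.3.1.3 Transformation of $\Gamma^\lambda{}_{\nu\mu}$

Let us consider the transformation property of the affine connection, $\Gamma^\lambda{}_{\nu\mu}$. Let us start with our previous definition, but in the primed frame (we will then transform to the unprimed),

$$\Gamma'^\rho{}_{\mu\nu}\,\mathbf{e}'_\rho = \partial'_\mu\mathbf{e}'_\nu.$$

Then, we know how to transform the RHS,

$$\partial'_\mu\mathbf{e}'_\nu = (J^{-1})^\alpha{}_\mu\,\partial_\alpha\left[(J^{-1})^\beta{}_\nu\,\mathbf{e}_\beta\right];$$

we then use the product rule on the RHS,

$$(J^{-1})^\alpha{}_\mu\,\partial_\alpha\left[(J^{-1})^\beta{}_\nu\,\mathbf{e}_\beta\right] = (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\partial_\alpha\mathbf{e}_\beta + (J^{-1})^\alpha{}_\mu\,\mathbf{e}_\beta\,\partial_\alpha(J^{-1})^\beta{}_\nu.$$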

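Now, we also know that $\partial_\alpha\mathbf{e}_\beta = \Gamma^\delta{}_{\alpha\beta}\,\mathbf{e}_\delta$, so that

$$(J^{-1})^\alpha{}_\mu\,\partial_\alpha\left[(J^{-1})^\beta{}_\nu\,\mathbf{e}_\beta\right] = (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\delta{}_{\alpha\beta}\,\mathbf{e}_\delta + (J^{-1})^\alpha{}_\mu\,\mathbf{e}_\beta\,\partial_\alpha(J^{-1})^\beta{}_\nu;$$

remembering that the LHS is of course just $\Gamma'^\rho{}_{\mu\nu}\,\mathbf{e}'_\rho$, we have

$$\Gamma'^\rho{}_{\mu\nu}\,\mathbf{e}'_\rho = (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\delta{}_{\alpha\beta}\,\mathbf{e}_\delta + (J^{-1})^\alpha{}_\mu\,\mathbf{e}_\beta\,\partial_\alpha(J^{-1})^\beta{}_\nu.$$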

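If we then transform the basis vector on the LHS, we have

$$\Gamma'^\rho{}_{\mu\nu}(J^{-1})^\lambda{}_\rho\,\mathbf{e}_\lambda = (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\delta{}_{\alpha\beta}\,\mathbf{e}_\delta + (J^{-1})^\alpha{}_\mu\,\mathbf{e}_\beta\,\partial_\alpha(J^{-1})^\beta{}_\nu.$$

On the RHS, let us change the dummy indices on the basis vectors, so that they are the same as those on the left. That is, $\delta\to\lambda$ in the first term and $\beta\to\lambda$ in the second;

$$\Gamma'^\rho{}_{\mu\nu}(J^{-1})^\lambda{}_\rho\,\mathbf{e}_\lambda = (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\lambda{}_{\alpha\beta}\,\mathbf{e}_\lambda + (J^{-1})^\alpha{}_\mu\,\mathbf{e}_\lambda\,\partial_\alpha(J^{-1})^\lambda{}_\nu,$$

which allows us to then cancel off the basis vectors,

$$\Gamma'^\rho{}_{\mu\nu}(J^{-1})^\lambda{}_\rho = (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\lambda{}_{\alpha\beta} + (J^{-1})^\alpha{}_\mu\,\partial_\alpha(J^{-1})^\lambda{}_\nu.$$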

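If we then multiply this through by something which will kill off the inverse Jacobian on the LHS, we will have got to our result. Notice that $J^\pi{}_\lambda$ will do this. So,

$$\begin{aligned}
\Gamma'^\rho{}_{\mu\nu}J^\pi{}_\lambda(J^{-1})^\lambda{}_\rho &= J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\lambda{}_{\alpha\beta} + J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu\,\partial_\alpha(J^{-1})^\lambda{}_\nu\\
\Rightarrow\quad \Gamma'^\rho{}_{\mu\nu}\delta^\pi_\rho &= J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\lambda{}_{\alpha\beta} + J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu\,\partial_\alpha(J^{-1})^\lambda{}_\nu\\
\Rightarrow\quad \Gamma'^\pi{}_{\mu\nu} &= J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\lambda{}_{\alpha\beta} + J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu\,\partial_\alpha(J^{-1})^\lambda{}_\nu.
\end{aligned}$$

We therefore have our result: the transformation of the affine connection is

$$\Gamma'^\pi{}_{\mu\nu} = J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\lambda{}_{\alpha\beta} + J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu\,\partial_\alpha(J^{-1})^\lambda{}_\nu. \tag{1.3.3}$$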

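Now, although not a notation we have been using much, we can represent the Jacobians by differentials,

$$J^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu}, \qquad (J^{-1})^\mu{}_\nu = \frac{\partial x^\mu}{\partial x'^\nu};$$

and, using this notation, the transformation of the affine connection looks like

$$\Gamma'^\pi{}_{\mu\nu} = \frac{\partial x'^\pi}{\partial x^\lambda}\frac{\partial x^\alpha}{\partial x'^\mu}\frac{\partial x^\beta}{\partial x'^\nu}\,\Gamma^\lambda{}_{\alpha\beta} + \frac{\partial x'^\pi}{\partial x^\lambda}\frac{\partial x^\alpha}{\partial x'^\mu}\frac{\partial}{\partial x^\alpha}\frac{\partial x^\lambda}{\partial x'^\nu}.$$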

We can see that this immediately shows that the affine connection is not a tensor (due to the existence of the second term on the RHS). Now, if the affine connection were a tensor, then, if one were to find a coordinate system in which all the components were zero, then they must be zero in all coordinate systems (this is a general property of tensors). That the affine connection is not a $\binom{1}{2}$-tensor means that even if the connection has zero components in one frame, there exist frames in which the components are non-zero. In fact, one can show that there exists a frame in which the components are zero, at a point. We shall now show that.
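The role of the inhomogeneous term can be seen in a small sketch. As an illustrative assumption, take the unprimed frame to be Cartesian coordinates $(x, y)$ in the flat plane, where $\Gamma^\lambda{}_{\alpha\beta} = 0$, and the primed frame to be plane polars $(r, \phi)$ with $x = r\cos\phi$, $y = r\sin\phi$. Only the second term of the transformation rule survives, and since $(J^{-1})^\alpha{}_\mu\partial_\alpha = \partial'_\mu$ it reduces to $\Gamma'^\pi{}_{\mu\nu} = \frac{\partial x'^\pi}{\partial x^\lambda}\frac{\partial^2 x^\lambda}{\partial x'^\mu\partial x'^\nu}$. The function name is hypothetical.

```python
import math

def induced_connection(r, phi):
    """Gamma'[p][m][n] in polars (0 = r, 1 = phi), starting from Gamma = 0 in Cartesians."""
    c, s = math.cos(phi), math.sin(phi)
    # J[p][l] = d x'^p / d x^l, with x' = (r, phi) and x = (x, y)
    J = [[c, s], [-s / r, c / r]]
    # d2[l][m][n] = d^2 x^l / (d x'^m d x'^n) for x = r cos(phi), y = r sin(phi)
    d2 = [[[0.0, -s], [-s, -r * c]],
          [[0.0, c], [c, -r * s]]]
    n = range(2)
    return [[[sum(J[p][l] * d2[l][m][v] for l in n) for v in n]
             for m in n] for p in n]

G = induced_connection(2.0, 0.3)
```

Although every component vanishes in the Cartesian frame, the polar components `G[0][1][1]` $= -r$ and `G[1][0][1]` $= 1/r$ do not, which a tensor could never do.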

1.3.1.4 Locally Inertial Frames

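This will all seem a little pointless, until we reach the very end of our discussion.

Let us make the following coordinate transformation,

$$x'^\mu = \bar{x}^\mu + \frac{1}{2}\Gamma^\mu{}_{\alpha\beta}\,\bar{x}^\alpha\bar{x}^\beta, \qquad \bar{x}^\mu \equiv x^\mu - x^\mu_*,$$

where $x^\mu_*$ is a single point. Now, under this transformation, we can write down the Jacobian (using the symmetry of $\Gamma^\mu{}_{\alpha\beta}$ in its lower indices in the last step),

$$\begin{aligned}
J^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu} &= \frac{\partial}{\partial x^\nu}\left(\bar{x}^\mu + \frac{1}{2}\Gamma^\mu{}_{\alpha\beta}\,\bar{x}^\alpha\bar{x}^\beta\right)\\
&= \delta^\mu_\nu + \frac{1}{2}\bar{x}^\alpha\bar{x}^\beta\,\partial_\nu\Gamma^\mu{}_{\alpha\beta} + \frac{1}{2}\Gamma^\mu{}_{\alpha\beta}\left(\delta^\alpha_\nu\bar{x}^\beta + \delta^\beta_\nu\bar{x}^\alpha\right)\\
&= \delta^\mu_\nu + \frac{1}{2}\bar{x}^\alpha\bar{x}^\beta\,\partial_\nu\Gamma^\mu{}_{\alpha\beta} + \Gamma^\mu{}_{\nu\beta}\,\bar{x}^\beta;
\end{aligned}$$

thus, the Jacobian is

$$J^\mu{}_\nu = \delta^\mu_\nu + \frac{1}{2}\bar{x}^\alpha\bar{x}^\beta\,\partial_\nu\Gamma^\mu{}_{\alpha\beta} + \Gamma^\mu{}_{\nu\beta}\,\bar{x}^\beta. \tag{1.3.4}$$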

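Notice that this can be written,

$$J^\mu{}_\nu = \delta^\mu_\nu + O(\bar{x}^\beta). \tag{1.3.5}$$

In fact, the inverse Jacobian also has this form,

$$(J^{-1})^\mu{}_\nu = \delta^\mu_\nu - O(\bar{x}^\beta). \tag{1.3.6}$$

Now then, returning to (1.3.4), we see that we can differentiate it,

$$\begin{aligned}
\partial_\alpha J^\pi{}_\lambda &= \partial_\alpha\left(\delta^\pi_\lambda + \frac{1}{2}\left(\Gamma^\pi{}_{\lambda\beta}\,\bar{x}^\beta + \Gamma^\pi{}_{\nu\lambda}\,\bar{x}^\nu\right)\right) + O(\bar{x}^\beta)\\
&= \frac{1}{2}\left(\Gamma^\pi{}_{\lambda\beta}\,\delta^\beta_\alpha + \Gamma^\pi{}_{\nu\lambda}\,\delta^\nu_\alpha\right) + O(\bar{x}^\beta)\\
&= \frac{1}{2}\left(\Gamma^\pi{}_{\lambda\alpha} + \Gamma^\pi{}_{\alpha\lambda}\right) + O(\bar{x}^\beta)\\
&= \Gamma^\pi{}_{\lambda\alpha} + O(\bar{x}^\beta).
\end{aligned}$$

Thus,

$$\partial_\alpha J^\pi{}_\lambda = \Gamma^\pi{}_{\lambda\alpha} + O(\bar{x}^\beta). \tag{1.3.7}$$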

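Now, we previously derived the transformation rule of the affine connection,

$$\Gamma'^\pi{}_{\mu\nu} = J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\lambda{}_{\alpha\beta} + J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu\,\partial_\alpha(J^{-1})^\lambda{}_\nu.$$

Let us look at the final term,

$$J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu\,\partial_\alpha(J^{-1})^\lambda{}_\nu;$$

we see that we can write it as

$$-(J^{-1})^\lambda{}_\nu(J^{-1})^\alpha{}_\mu\,\partial_\alpha J^\pi{}_\lambda.$$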

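To see how we can do this, consider that

$$\delta^\alpha_\beta = \frac{\partial x'^\alpha}{\partial x'^\beta} = \frac{\partial x'^\alpha}{\partial x^\pi}\frac{\partial x^\pi}{\partial x'^\beta}.$$

Also, $\partial_\nu\delta^\alpha_\beta = 0$. Then, that means that

$$\partial_\nu\delta^\alpha_\beta = \frac{\partial}{\partial x^\nu}\left(\frac{\partial x'^\alpha}{\partial x^\pi}\frac{\partial x^\pi}{\partial x'^\beta}\right) = \frac{\partial x'^\alpha}{\partial x^\pi}\frac{\partial^2 x^\pi}{\partial x^\nu\partial x'^\beta} + \frac{\partial x^\pi}{\partial x'^\beta}\frac{\partial^2 x'^\alpha}{\partial x^\nu\partial x^\pi} = 0.$$

That is,

$$\frac{\partial x'^\alpha}{\partial x^\pi}\frac{\partial^2 x^\pi}{\partial x^\nu\partial x'^\beta} = -\frac{\partial x^\pi}{\partial x'^\beta}\frac{\partial^2 x'^\alpha}{\partial x^\nu\partial x^\pi}.$$

Or, using the Jacobian notation,

$$J^\alpha{}_\pi\,\partial_\nu(J^{-1})^\pi{}_\beta = -(J^{-1})^\pi{}_\beta\,\partial_\nu J^\alpha{}_\pi.$$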

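Thus, we have shown that we can do the “swap” we did above.

Therefore, we write the transformation rule of the connection once again, with this rewriting of the last term,

$$\Gamma'^\pi{}_{\mu\nu} = J^\pi{}_\lambda(J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu\,\Gamma^\lambda{}_{\alpha\beta} - (J^{-1})^\lambda{}_\nu(J^{-1})^\alpha{}_\mu\,\partial_\alpha J^\pi{}_\lambda.$$

Now, we have all of these expressions. We use (1.3.5) and (1.3.6) for the Jacobian and its inverse, and (1.3.7) for the derivative of the Jacobian;

$$\Gamma'^\pi{}_{\mu\nu} = \delta^\pi_\lambda\delta^\alpha_\mu\delta^\beta_\nu\,\Gamma^\lambda{}_{\alpha\beta} - \delta^\lambda_\nu\delta^\alpha_\mu\,\Gamma^\pi{}_{\alpha\lambda} + O(\bar{x}^\rho).$$

Using the Kronecker deltas results in

$$\Gamma'^\pi{}_{\mu\nu} = \Gamma^\pi{}_{\mu\nu} - \Gamma^\pi{}_{\mu\nu} + O(\bar{x}^\rho) = O(\bar{x}^\rho).$$

Therefore, the components of the transformed connection are all

$$\Gamma'^\pi{}_{\mu\nu} = O(\bar{x}^\rho).$$

Now, if we let $\bar{x}^\mu\to 0$, which is equivalent (by our definition of $\bar{x}^\mu$) to saying $x^\mu = x^\mu_*$, then we see that

$$\Gamma'^\pi{}_{\mu\nu}(x^\rho = x^\rho_*) = 0.$$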

Therefore, we have a transformation which renders all components of the affine connection zero at the chosen point. That is, we can transform to a frame in which the geometry is Euclidean (flat), at that single point. This is actually an incredibly useful and important result. Notice that if the Christoffel symbols are zero, then the covariant derivative is just the partial derivative. This tends to hugely simplify calculations. In the later discussions on curvature, we shall see that in transforming to a locally inertial frame, where the connection components are zero, we can compute things a lot more easily. And, as the things we are transforming are tensors, the results hold in any frame.

Some literature calls such a set of coordinates geodesic coordinates.

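Alternative Derivation. Here we shall present a rather more mathematically rigorous derivation of the existence of geodesic coordinates.

Let $x^\mu = a^\mu$ be coordinates at a point A in the frame $\Sigma$. Let us transform to a new frame, via the transformation

$$x^\mu = a^\mu + x'^\mu + \frac{1}{2}a^\mu{}_{\nu\lambda}\,x'^\nu x'^\lambda,$$

where the coefficient $a^\mu{}_{\nu\lambda}$ is symmetric in its lower indices, and is constant (i.e. we define this as part of the transformation). Thus, at the point A, $x'^\mu = 0$. So, let us compute the differentials of the transformation;

$$\begin{aligned}
\frac{\partial x^\mu}{\partial x'^\nu} &= \delta^\mu_\nu + \frac{1}{2}a^\mu{}_{\kappa\lambda}\frac{\partial}{\partial x'^\nu}\left(x'^\kappa x'^\lambda\right)\\
&= \delta^\mu_\nu + \frac{1}{2}a^\mu{}_{\kappa\lambda}\left(x'^\kappa\delta^\lambda_\nu + x'^\lambda\delta^\kappa_\nu\right)\\
&= \delta^\mu_\nu + a^\mu{}_{\nu\lambda}\,x'^\lambda.
\end{aligned}$$

Hence,

$$\frac{\partial^2 x^\mu}{\partial x'^\nu\partial x'^\lambda} = a^\mu{}_{\nu\lambda}.$$

Hence, at the point A (i.e. where $x'^\mu = 0$), we see that

$$\frac{\partial x^\mu}{\partial x'^\nu} = \delta^\mu_\nu, \qquad \frac{\partial^2 x^\mu}{\partial x'^\nu\partial x'^\lambda} = a^\mu{}_{\nu\lambda}. \tag{1.3.8}$$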

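Now, we can do a little work to get a relation between the coefficients $a^\mu{}_{\nu\lambda}$ and the metric. The metric transforms via

$$g'_{\mu\nu} = \frac{\partial x^\alpha}{\partial x'^\mu}\frac{\partial x^\beta}{\partial x'^\nu}\,g_{\alpha\beta}.$$

Then, differentiating it,

$$\frac{\partial g'_{\mu\nu}}{\partial x'^\lambda} = \frac{\partial^2 x^\alpha}{\partial x'^\lambda\partial x'^\mu}\frac{\partial x^\beta}{\partial x'^\nu}\,g_{\alpha\beta} + \frac{\partial x^\alpha}{\partial x'^\mu}\frac{\partial^2 x^\beta}{\partial x'^\lambda\partial x'^\nu}\,g_{\alpha\beta} + \frac{\partial x^\alpha}{\partial x'^\mu}\frac{\partial x^\beta}{\partial x'^\nu}\frac{\partial g_{\alpha\beta}}{\partial x'^\lambda}.$$

Now, rewrite the last term using the chain rule;

$$\frac{\partial g_{\alpha\beta}}{\partial x'^\lambda} = \frac{\partial g_{\alpha\beta}}{\partial x^\sigma}\frac{\partial x^\sigma}{\partial x'^\lambda},$$

so that we have

$$\frac{\partial g'_{\mu\nu}}{\partial x'^\lambda} = \frac{\partial^2 x^\alpha}{\partial x'^\lambda\partial x'^\mu}\frac{\partial x^\beta}{\partial x'^\nu}\,g_{\alpha\beta} + \frac{\partial x^\alpha}{\partial x'^\mu}\frac{\partial^2 x^\beta}{\partial x'^\lambda\partial x'^\nu}\,g_{\alpha\beta} + \frac{\partial x^\alpha}{\partial x'^\mu}\frac{\partial x^\beta}{\partial x'^\nu}\frac{\partial g_{\alpha\beta}}{\partial x^\sigma}\frac{\partial x^\sigma}{\partial x'^\lambda}.$$

So, at the point A, using (1.3.8), this reads

$$\begin{aligned}
\frac{\partial g'_{\mu\nu}}{\partial x'^\lambda} &= a^\alpha{}_{\lambda\mu}\,\delta^\beta_\nu\,g_{\alpha\beta} + a^\beta{}_{\lambda\nu}\,\delta^\alpha_\mu\,g_{\alpha\beta} + \delta^\alpha_\mu\delta^\beta_\nu\delta^\sigma_\lambda\frac{\partial g_{\alpha\beta}}{\partial x^\sigma}\\
&= g_{\alpha\nu}\,a^\alpha{}_{\lambda\mu} + g_{\mu\beta}\,a^\beta{}_{\lambda\nu} + \frac{\partial g_{\mu\nu}}{\partial x^\lambda}.
\end{aligned} \tag{1.3.9}$$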

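Now let us choose that

$$\frac{\partial g'_{\mu\nu}}{\partial x'^\lambda} = 0, \tag{1.3.10}$$

which is equivalent to choosing the metric to be flat (constant to first order) at that point A. Now, note that

$$g_{\alpha\nu}\,a^\alpha{}_{\lambda\mu} = a_{\nu\lambda\mu}.$$

Then, we see that (1.3.9) becomes

$$a_{\nu\lambda\mu} + a_{\mu\lambda\nu} + \frac{\partial g_{\mu\nu}}{\partial x^\lambda} = 0,$$

which trivially becomes

$$a_{\nu\lambda\mu} + a_{\mu\lambda\nu} = -\frac{\partial g_{\mu\nu}}{\partial x^\lambda}. \tag{1.3.11}$$

Now, if we permute the indices $\nu\to\mu\to\lambda\to\nu$, this becomes

$$a_{\mu\nu\lambda} + a_{\lambda\nu\mu} = -\frac{\partial g_{\lambda\mu}}{\partial x^\nu}, \tag{1.3.12}$$

permuting again,

$$a_{\lambda\mu\nu} + a_{\nu\mu\lambda} = -\frac{\partial g_{\nu\lambda}}{\partial x^\mu}. \tag{1.3.13}$$

Now, if we form (1.3.12) + (1.3.13) − (1.3.11), then we see

$$a_{\mu\nu\lambda} + a_{\lambda\nu\mu} + a_{\lambda\mu\nu} + a_{\nu\mu\lambda} - a_{\nu\lambda\mu} - a_{\mu\lambda\nu} = -\frac{\partial g_{\nu\lambda}}{\partial x^\mu} - \frac{\partial g_{\lambda\mu}}{\partial x^\nu} + \frac{\partial g_{\mu\nu}}{\partial x^\lambda}.$$

Now, as $a_{\mu\nu\lambda} = a_{\mu\lambda\nu}$, we see that the fourth and fifth terms cancel, as do the first and sixth, leaving

$$2a_{\lambda\nu\mu} = -\left(\frac{\partial g_{\nu\lambda}}{\partial x^\mu} + \frac{\partial g_{\lambda\mu}}{\partial x^\nu} - \frac{\partial g_{\mu\nu}}{\partial x^\lambda}\right),$$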

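that is,

$$a_{\lambda\nu\mu} = [\lambda\nu, \mu],$$

where

$$[\lambda\nu, \mu] \equiv -\frac{1}{2}\left(\frac{\partial g_{\nu\lambda}}{\partial x^\mu} + \frac{\partial g_{\lambda\mu}}{\partial x^\nu} - \frac{\partial g_{\mu\nu}}{\partial x^\lambda}\right).$$

Up to the overall sign, $[\lambda\nu, \mu]$ is a Christoffel symbol of the first kind (the more common convention defines the symbol without the minus sign, in which case $a_{\lambda\nu\mu}$ is minus the Christoffel symbol). Now, by (1.3.10) we see that the same combination built from the primed metric vanishes,

$$[\lambda\nu, \mu]' = 0$$

at the point A.

Therefore, we have derived that under the coordinate transformation $x^\mu = a^\mu + x'^\mu + \frac{1}{2}a^\mu{}_{\nu\lambda}\,x'^\nu x'^\lambda$, the Christoffel symbols are zero at the point $x^\mu = a^\mu$; such coordinates are geodesic coordinates. In the derivation, we assumed that:

$a^\mu{}_{\nu\lambda}$ is constant and symmetric in its lower indices;

$\frac{\partial g'_{\mu\nu}}{\partial x'^\lambda} = 0$, i.e. a constant metric at the point we transform to.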

1.3.1.5 Torsion

Let us define torsion to be

$$T^\rho{}_{\mu\nu} \equiv \frac{1}{2}\left(\Gamma^\rho{}_{\mu\nu} - \Gamma^\rho{}_{\nu\mu}\right) = \Gamma^\rho{}_{[\mu\nu]}. \tag{1.3.14}$$

We shall work with symmetric affine connections, so that the torsion goes to zero. A torsion free space merely allows us to interchange the lower indices on the connection components at will. This expression for torsion is a tensor; let us prove it.

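So, the transformation of torsion can be written

$$T'^\rho{}_{\mu\nu} = J^\rho{}_\alpha(J^{-1})^\beta{}_\mu(J^{-1})^\gamma{}_\nu\,T^\alpha{}_{\beta\gamma} + \delta T^\rho{}_{\mu\nu},$$

where $\delta T^\rho{}_{\mu\nu}$ is a term that can be easily seen from the transformation rule of the connection (we drop its overall factor of $\frac{1}{2}$, which plays no role in the argument),

$$\delta T^\rho{}_{\mu\nu} = J^\rho{}_\lambda(J^{-1})^\alpha{}_\mu\,\partial_\alpha(J^{-1})^\lambda{}_\nu - J^\rho{}_\lambda(J^{-1})^\alpha{}_\nu\,\partial_\alpha(J^{-1})^\lambda{}_\mu.$$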

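Now, a “trick” that we have used before is to note that

$$\begin{aligned}
\delta^\mu_\nu &= J^\mu{}_\alpha(J^{-1})^\alpha{}_\nu\\
\Rightarrow\quad \partial_\beta\delta^\mu_\nu &= \partial_\beta\left(J^\mu{}_\alpha(J^{-1})^\alpha{}_\nu\right) = J^\mu{}_\alpha\,\partial_\beta(J^{-1})^\alpha{}_\nu + (J^{-1})^\alpha{}_\nu\,\partial_\beta J^\mu{}_\alpha = 0\\
\Rightarrow\quad J^\mu{}_\alpha\,\partial_\beta(J^{-1})^\alpha{}_\nu &= -(J^{-1})^\alpha{}_\nu\,\partial_\beta J^\mu{}_\alpha.
\end{aligned}$$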

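We can then use this in the expression for $\delta T^\rho{}_{\mu\nu}$, to see that

$$\begin{aligned}
\delta T^\rho{}_{\mu\nu} &= -(J^{-1})^\alpha{}_\mu(J^{-1})^\lambda{}_\nu\,\partial_\alpha J^\rho{}_\lambda + (J^{-1})^\alpha{}_\nu(J^{-1})^\lambda{}_\mu\,\partial_\alpha J^\rho{}_\lambda\\
&= \partial_\alpha J^\rho{}_\lambda\left[(J^{-1})^\alpha{}_\nu(J^{-1})^\lambda{}_\mu - (J^{-1})^\alpha{}_\mu(J^{-1})^\lambda{}_\nu\right].
\end{aligned}$$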

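Now, let us define

$$A^{\alpha\lambda}{}_{\nu\mu} \equiv (J^{-1})^\alpha{}_\nu(J^{-1})^\lambda{}_\mu - (J^{-1})^\alpha{}_\mu(J^{-1})^\lambda{}_\nu;$$

then we see that

$$A^{\alpha\lambda}{}_{\nu\mu} = -A^{\lambda\alpha}{}_{\nu\mu},$$

i.e. it is anti-symmetric under interchange of $\alpha$ and $\lambda$. Now, notice that

$$\partial_\alpha J^\rho{}_\lambda = \frac{\partial^2 x'^\rho}{\partial x^\alpha\partial x^\lambda} = \frac{\partial^2 x'^\rho}{\partial x^\lambda\partial x^\alpha} = \partial_\lambda J^\rho{}_\alpha.$$

That is, $\partial_\alpha J^\rho{}_\lambda$ is symmetric under interchange of $\alpha$ and $\lambda$. Therefore, as the contraction of something symmetric with something anti-symmetric is zero, we see that

$$\delta T^\rho{}_{\mu\nu} = 0.$$

Hence,

$$T'^\rho{}_{\mu\nu} = J^\rho{}_\alpha(J^{-1})^\beta{}_\mu(J^{-1})^\gamma{}_\nu\,T^\alpha{}_{\beta\gamma},$$

which is the rule of transformation of a $\binom{1}{2}$-tensor.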

1.3.2 Geodesics

A geodesic is the curve which gives an extremal of motion. That we use the word extremal, rather than minimum (or, indeed, maximum), is very important.

Suppose we are “living in a manifold” (suppose we are confined to the surface of a sphere). Then suppose that we wish to compute the equation of the line (in that manifold) that joins two points, where the length of the line is an extremum. That is, we can write down many lines joining the two points, but only one of them will be an extremum. Then, that curve is a geodesic.

The geodesic will depend upon the geometry of the manifold; the line has its motion confined to the manifold. As we shall see, the metric is used to give the geometrical dependence.

1.3.2.1 The Affine Geodesic

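We call an affine geodesic the curve for which the tangent vector is parallel transported to itself. That is,

$$\frac{DT^\mu}{Du} = \lambda(u)\,T^\mu, \qquad T^\mu \equiv \frac{dx^\mu}{du}.$$

That is, we find a curve along which the tangent vector does not change direction. It may get longer (hence the factor of $\lambda(u)$), but it does not change direction.

We have our definition of the absolute derivative,

$$\frac{DA^\mu}{Du} = \frac{dx^\nu}{du}\nabla_\nu A^\mu = T^\nu\nabla_\nu A^\mu.$$

Therefore, the affine geodesic satisfies

$$T^\nu\nabla_\nu T^\mu = \lambda(u)\,T^\mu,$$

which is, using the definition of the covariant derivative,

$$T^\nu\left(\partial_\nu T^\mu + \Gamma^\mu{}_{\nu\gamma}T^\gamma\right) = \lambda T^\mu.$$

Now, by the chain rule, the first term is just the derivative along the curve,

$$T^\nu\partial_\nu T^\mu = \frac{dx^\nu}{du}\frac{\partial T^\mu}{\partial x^\nu} = \frac{dT^\mu}{du};$$

then, we see that the affine geodesic can be written

$$\frac{dT^\mu}{du} + \Gamma^\mu{}_{\nu\gamma}T^\nu T^\gamma = \lambda T^\mu.$$

Therefore, noting that $T^\mu = \frac{dx^\mu}{du}$, we see that

$$\frac{d}{du}\frac{dx^\mu}{du} + T^\nu T^\gamma\,\Gamma^\mu{}_{\nu\gamma} = \lambda T^\mu,$$

which is of course just

$$\frac{d^2x^\mu}{du^2} + \Gamma^\mu{}_{\nu\gamma}\frac{dx^\nu}{du}\frac{dx^\gamma}{du} = \lambda T^\mu. \tag{1.3.15}$$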

As an example, consider a Cartesian system, whereby the affine connections are all zero. The resultant differential equation has a straight line as the solution. That is, the affine geodesic in Cartesian coordinates is a straight line.

We say that $u$ is the affine parameter. If $\lambda = 0$, then we say that the geodesic is affinely parameterised. That is,

$$T^\nu\nabla_\nu T^\mu = 0, \qquad T^\mu \equiv \frac{dx^\mu}{du},$$

along an affinely parameterised geodesic.
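The straight-line statement can also be checked in curvilinear coordinates, where the connection is not zero. The sketch below is an illustration under assumptions: it integrates the affinely parameterised geodesic equation ($\lambda = 0$) in plane polars, using the standard connection components $\Gamma^r{}_{\phi\phi} = -r$ and $\Gamma^\phi{}_{r\phi} = \Gamma^\phi{}_{\phi r} = 1/r$ (quoted, not derived here), with a classical Runge-Kutta step. The function names are hypothetical. Starting at $(x, y) = (1, 0)$ with Cartesian velocity $(0, 1)$, the solution should follow the straight line $x = 1$, $y = u$.

```python
import math

def geodesic_rhs(state):
    """Geodesic equation in plane polars; state = (r, phi, dr/du, dphi/du)."""
    r, phi, vr, vphi = state
    return (vr, vphi,
            r * vphi * vphi,           # -Gamma^r_{phi phi} (dphi/du)^2
            -2.0 * vr * vphi / r)      # -2 Gamma^phi_{r phi} (dr/du)(dphi/du)

def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for dy/du = f(y)."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
    k4 = f(tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h * (p + 2 * q + 2 * s + t) / 6.0
                 for a, p, q, s, t in zip(y, k1, k2, k3, k4))

def integrate(state, u_end, steps=1000):
    """Integrate the geodesic from u = 0 to u = u_end."""
    h = u_end / steps
    for _ in range(steps):
        state = rk4_step(geodesic_rhs, state, h)
    return state

# r = 1, phi = 0, dr/du = 0, dphi/du = 1: Cartesian position (1, 0), velocity (0, 1)
r, phi, _, _ = integrate((1.0, 0.0, 0.0, 1.0), u_end=1.0)
x, y = r * math.cos(phi), r * math.sin(phi)
```

Converting the end point back to Cartesians gives $(x, y) \approx (1, 1)$, so the curve is indeed the straight line even though $r(u)$ and $\phi(u)$ individually are not linear in $u$.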

1.3.2.2 The Metric Geodesic

This geodesic is perhaps a little less hand-wavy.

Consider two points in some space. Consider that they are joined by a line. Then, the metric geodesic is the line which extremises the length of that joining line. So, given a line element

$$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu,$$

we see that the corresponding action is

$$S = \int ds.$$

Now, considering that the line is parameterised by $u$, the affine parameter, then we see that the action is simply

$$S = \int ds = \int\frac{ds}{du}\,du = \int du\,\sqrt{g_{\mu\nu}\frac{dx^\mu}{du}\frac{dx^\nu}{du}}.$$

Then, by the variational principle, the Euler-Lagrange equation

$$\frac{d}{du}\frac{\partial L}{\partial\dot{x}^\mu} - \frac{\partial L}{\partial x^\mu} = 0, \qquad \dot{x}^\mu \equiv \frac{dx^\mu}{du},$$

extremises the action (note, we use the word extremise, rather than maximise or minimise). We must state that the metric depends on the coordinates only, $g_{\mu\nu} = g_{\mu\nu}(x^\rho)$.

So, the Lagrangian is

$$L = \left(g_{\alpha\beta}\,\dot{x}^\alpha\dot{x}^\beta\right)^{1/2}.$$

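We are now left to compute the elements of the EL equation. So,

$$\begin{aligned}
\frac{\partial L}{\partial\dot{x}^\mu} &= \frac{1}{2L}\frac{\partial}{\partial\dot{x}^\mu}\left(g_{\alpha\beta}\,\dot{x}^\alpha\dot{x}^\beta\right)\\
&= \frac{1}{2L}\,g_{\alpha\beta}\left(\frac{\partial\dot{x}^\alpha}{\partial\dot{x}^\mu}\dot{x}^\beta + \dot{x}^\alpha\frac{\partial\dot{x}^\beta}{\partial\dot{x}^\mu}\right)\\
&= \frac{1}{2L}\,g_{\alpha\beta}\left(\delta^\alpha_\mu\dot{x}^\beta + \delta^\beta_\mu\dot{x}^\alpha\right)\\
&= \frac{1}{2L}\,g_{\alpha\beta}\left(\delta^\alpha_\mu\dot{x}^\beta + \delta^\alpha_\mu\dot{x}^\beta\right)\\
&= \frac{1}{L}\,g_{\alpha\beta}\,\dot{x}^\beta\delta^\alpha_\mu\\
&= \frac{1}{L}\,g_{\mu\beta}\,\dot{x}^\beta,
\end{aligned}$$

where the fourth line uses the symmetry of the metric to relabel the dummy indices in the second term. And also,

$$\frac{\partial L}{\partial x^\mu} = \frac{1}{2L}\,\dot{x}^\alpha\dot{x}^\beta\,\partial_\mu g_{\alpha\beta}.$$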

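Finally,

$$\frac{d}{du}\frac{\partial L}{\partial\dot{x}^\mu} = \frac{d}{du}\left(\frac{1}{L}\,g_{\mu\beta}\,\dot{x}^\beta\right) = -\frac{\dot{L}}{L^2}\,g_{\mu\beta}\,\dot{x}^\beta + \frac{1}{L}\frac{d}{du}\left(g_{\mu\beta}\,\dot{x}^\beta\right);$$

the last expression we evaluate via

$$\begin{aligned}
\frac{d}{du}\left(g_{\mu\beta}\,\dot{x}^\beta\right) &= g_{\mu\beta}\frac{d\dot{x}^\beta}{du} + \dot{x}^\beta\frac{d}{du}g_{\mu\beta}\\
&= g_{\mu\beta}\,\ddot{x}^\beta + \dot{x}^\beta\frac{dx^\gamma}{du}\frac{\partial g_{\mu\beta}}{\partial x^\gamma}\\
&= g_{\mu\beta}\,\ddot{x}^\beta + \dot{x}^\beta\dot{x}^\gamma\,\partial_\gamma g_{\mu\beta}.
\end{aligned}$$

Therefore,

$$\frac{d}{du}\frac{\partial L}{\partial\dot{x}^\mu} = -\frac{\dot{L}}{L^2}\,g_{\mu\beta}\,\dot{x}^\beta + \frac{1}{L}\left(g_{\mu\beta}\,\ddot{x}^\beta + \dot{x}^\beta\dot{x}^\gamma\,\partial_\gamma g_{\mu\beta}\right);$$

and consequently, the EL equation reads

$$-\frac{\dot{L}}{L^2}\,g_{\mu\beta}\,\dot{x}^\beta + \frac{1}{L}\left(g_{\mu\beta}\,\ddot{x}^\beta + \dot{x}^\beta\dot{x}^\gamma\,\partial_\gamma g_{\mu\beta}\right) - \frac{1}{2L}\,\dot{x}^\alpha\dot{x}^\beta\,\partial_\mu g_{\alpha\beta} = 0.$$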

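Now, the job is to get this into a “nice form”, without mention of $L$. We can move the first term over to the RHS, and multiply through by $L$, giving

$$g_{\mu\beta}\,\ddot{x}^\beta + \dot{x}^\beta\dot{x}^\gamma\,\partial_\gamma g_{\mu\beta} - \frac{1}{2}\dot{x}^\alpha\dot{x}^\beta\,\partial_\mu g_{\alpha\beta} = \frac{\dot{L}}{L}\,g_{\mu\beta}\,\dot{x}^\beta.$$

Let us now multiply this by something that will kill off the metric multiplying $\ddot{x}^\beta$ on the LHS. Multiplying by $g^{\rho\mu}$ will work,

$$g^{\rho\mu}g_{\mu\beta}\,\ddot{x}^\beta + g^{\rho\mu}\left(\dot{x}^\beta\dot{x}^\gamma\,\partial_\gamma g_{\mu\beta} - \frac{1}{2}\dot{x}^\alpha\dot{x}^\beta\,\partial_\mu g_{\alpha\beta}\right) = \frac{\dot{L}}{L}\,g^{\rho\mu}g_{\mu\beta}\,\dot{x}^\beta;$$

noting that $g^{\rho\mu}g_{\mu\beta} = \delta^\rho_\beta$, and using this relation, we see we now have

$$\ddot{x}^\rho + g^{\rho\mu}\left(\dot{x}^\beta\dot{x}^\gamma\,\partial_\gamma g_{\mu\beta} - \frac{1}{2}\dot{x}^\alpha\dot{x}^\beta\,\partial_\mu g_{\alpha\beta}\right) = \frac{\dot{L}}{L}\,\dot{x}^\rho.$$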

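To continue, we use the simple result that if $a = b$, then $a = \frac{1}{2}(a + b)$. So, we see that

$$\dot{x}^\beta\dot{x}^\gamma\,\partial_\gamma g_{\mu\beta} = \frac{1}{2}\left(\dot{x}^\beta\dot{x}^\gamma\,\partial_\gamma g_{\mu\beta} + \dot{x}^\gamma\dot{x}^\beta\,\partial_\beta g_{\mu\gamma}\right);$$

thus, using this, we see that we have the geodesic equation being

$$\ddot{x}^\rho + g^{\rho\mu}\frac{1}{2}\left(\dot{x}^\beta\dot{x}^\gamma\,\partial_\gamma g_{\mu\beta} + \dot{x}^\gamma\dot{x}^\beta\,\partial_\beta g_{\mu\gamma} - \dot{x}^\alpha\dot{x}^\beta\,\partial_\mu g_{\alpha\beta}\right) = \frac{\dot{L}}{L}\,\dot{x}^\rho.$$

We can pull out common factors of the bracketed term, by relabelling indices $\alpha\to\gamma$; thus

$$\ddot{x}^\rho + g^{\rho\mu}\frac{1}{2}\left(\partial_\gamma g_{\mu\beta} + \partial_\beta g_{\mu\gamma} - \partial_\mu g_{\gamma\beta}\right)\dot{x}^\beta\dot{x}^\gamma = \frac{\dot{L}}{L}\,\dot{x}^\rho.$$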

Now, by way of convenient notation, we define everything multiplying the $\dot{x}^\beta\dot{x}^\gamma$ as

$$\left\{{}_{\gamma}{}^{\rho}{}_{\beta}\right\} \equiv g^{\rho\mu}\frac{1}{2}\left(-\partial_\mu g_{\gamma\beta} + \partial_\gamma g_{\mu\beta} + \partial_\beta g_{\mu\gamma}\right), \tag{1.3.16}$$

a symbol we call the Christoffel symbol. Thus, the geodesic equation looks like

$$\ddot{x}^\rho + \left\{{}_{\gamma}{}^{\rho}{}_{\beta}\right\}\dot{x}^\beta\dot{x}^\gamma = \frac{\dot{L}}{L}\,\dot{x}^\rho.$$

Now, if $\dot{L} = 0$, then this reads

$$\ddot{x}^\rho + \left\{{}_{\alpha}{}^{\rho}{}_{\beta}\right\}\dot{x}^\alpha\dot{x}^\beta = 0, \qquad \frac{d^2s}{du^2} = 0.$$

Notice that the Christoffel symbol is symmetric in its lower indices,

$$\left\{{}_{\gamma}{}^{\mu}{}_{\beta}\right\} = \left\{{}_{\beta}{}^{\mu}{}_{\gamma}\right\},$$

which we can see by its definition, noting that the metric is symmetric.

Just to write the result again,

$$\ddot{x}^\rho + \left\{{}_{\alpha}{}^{\rho}{}_{\beta}\right\}\dot{x}^\alpha\dot{x}^\beta = 0 \tag{1.3.17}$$

is the affinely parameterised metric geodesic.

So, to recap these geodesics. The curve which preserves the direction of the tangent vector on that curve is called the affine geodesic. When deriving the geodesic, one uses the $\Gamma^\lambda{}_{\mu\nu}$ symbol, so we call it the affine connection, or just connection. The second type of geodesic was derived to be the curve which extremises the path length between two points. This was called the metric geodesic. In deriving the metric geodesic, one defines some quantities, the Christoffel symbols.

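1.3.2.3 Relation Between Affine Connection & Christoffel Symbol

We shall start by asserting that for a torsion free connection, $T^\alpha{}_{\mu\nu} = 0$, and for a metric with zero covariant derivative, $\nabla_\alpha g_{\nu\mu} = 0$, the affine connection is the Christoffel symbol. That is,

$$\nabla_\alpha g_{\mu\nu} = 0, \quad T^\mu{}_{\alpha\beta} = 0 \quad\Rightarrow\quad \Gamma^\mu{}_{\alpha\beta} = \left\{{}_{\alpha}{}^{\mu}{}_{\beta}\right\}.$$

The way to use “torsion free” is that the final two indices on the affine connection can be interchanged. We also use a symmetric metric throughout.

Let us prove it.

We start by writing the covariant derivative of the metric,

$$\nabla_\alpha g_{\mu\nu} = \partial_\alpha g_{\mu\nu} - \Gamma^\lambda{}_{\alpha\mu}\,g_{\lambda\nu} - \Gamma^\lambda{}_{\alpha\nu}\,g_{\mu\lambda}.$$

But, by our definition of the problem, this is zero. So,

$$\partial_\alpha g_{\mu\nu} = \Gamma^\lambda{}_{\alpha\mu}\,g_{\lambda\nu} + \Gamma^\lambda{}_{\alpha\nu}\,g_{\mu\lambda}.$$

Let us now cyclically change the indices. First, we shall do $\alpha\to\mu\to\nu\to\alpha$. Giving

$$\partial_\mu g_{\nu\alpha} = \Gamma^\lambda{}_{\mu\nu}\,g_{\lambda\alpha} + \Gamma^\lambda{}_{\mu\alpha}\,g_{\nu\lambda}.$$

Let us do the interchange again, on this new equation. Giving

$$\partial_\nu g_{\alpha\mu} = \Gamma^\lambda{}_{\nu\alpha}\,g_{\lambda\mu} + \Gamma^\lambda{}_{\nu\mu}\,g_{\alpha\lambda}.$$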

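Let us add the first two, and subtract the last equation. Giving

$$\partial_\alpha g_{\mu\nu} + \partial_\mu g_{\nu\alpha} - \partial_\nu g_{\alpha\mu} = \Gamma^\lambda{}_{\alpha\mu}\,g_{\lambda\nu} + \Gamma^\lambda{}_{\alpha\nu}\,g_{\mu\lambda} + \Gamma^\lambda{}_{\mu\nu}\,g_{\lambda\alpha} + \Gamma^\lambda{}_{\mu\alpha}\,g_{\nu\lambda} - \Gamma^\lambda{}_{\nu\alpha}\,g_{\lambda\mu} - \Gamma^\lambda{}_{\nu\mu}\,g_{\alpha\lambda}.$$

Now, we notice that by our torsion free assertion, we can cancel off some of the terms on the RHS. These are the second with the fifth, and the third with the sixth. This leaves

$$\partial_\alpha g_{\mu\nu} + \partial_\mu g_{\nu\alpha} - \partial_\nu g_{\alpha\mu} = \Gamma^\lambda{}_{\alpha\mu}\,g_{\lambda\nu} + \Gamma^\lambda{}_{\mu\alpha}\,g_{\nu\lambda},$$

from which we further use the torsion-free assertion, to see that the two terms on the RHS are identical, leaving

$$\partial_\alpha g_{\mu\nu} + \partial_\mu g_{\nu\alpha} - \partial_\nu g_{\alpha\mu} = 2\,\Gamma^\lambda{}_{\alpha\mu}\,g_{\lambda\nu}.$$

Rearranging this trivially results in

$$\Gamma^\lambda{}_{\alpha\mu}\,g_{\lambda\nu} = \frac{1}{2}\left(\partial_\alpha g_{\mu\nu} + \partial_\mu g_{\nu\alpha} - \partial_\nu g_{\alpha\mu}\right).$$

Now, let us multiply the whole thing by $g^{\rho\nu}$,

$$\begin{aligned}
\Gamma^\lambda{}_{\alpha\mu}\,g^{\rho\nu}g_{\lambda\nu} &= \frac{1}{2}g^{\rho\nu}\left(\partial_\alpha g_{\mu\nu} + \partial_\mu g_{\nu\alpha} - \partial_\nu g_{\alpha\mu}\right)\\
\Rightarrow\quad \Gamma^\lambda{}_{\alpha\mu}\,\delta^\rho_\lambda &= \frac{1}{2}g^{\rho\nu}\left(\partial_\alpha g_{\mu\nu} + \partial_\mu g_{\nu\alpha} - \partial_\nu g_{\alpha\mu}\right),
\end{aligned}$$

which is just

$$\Gamma^\rho{}_{\alpha\mu} = \frac{1}{2}g^{\rho\nu}\left(\partial_\alpha g_{\mu\nu} + \partial_\mu g_{\nu\alpha} - \partial_\nu g_{\alpha\mu}\right).$$

Now, if we switch over the $\mu$ index to $\beta$ (just relabelling),

$$\Gamma^\rho{}_{\alpha\beta} = \frac{1}{2}g^{\rho\nu}\left(\partial_\alpha g_{\beta\nu} + \partial_\beta g_{\nu\alpha} - \partial_\nu g_{\alpha\beta}\right).$$

Upon inspection of this with (1.3.16), we find that they are equal. Therefore,

$$\Gamma^\rho{}_{\alpha\beta} = \left\{{}_{\alpha}{}^{\rho}{}_{\beta}\right\}; \qquad \nabla_\alpha g_{\mu\nu} = 0, \quad T^\mu{}_{\alpha\beta} = 0.$$

It is very important to note that this only holds for a torsion free connection, with metric having zero covariant derivative. Under these conditions, the affine connection is the Christoffel symbol.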
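The final formula lends itself to a direct numerical evaluation. The following is a minimal sketch, not a definitive implementation: it estimates the metric derivatives by central differences and assembles $\Gamma^\rho{}_{\alpha\beta} = \frac{1}{2}g^{\rho\nu}(\partial_\alpha g_{\beta\nu} + \partial_\beta g_{\nu\alpha} - \partial_\nu g_{\alpha\beta})$. The test metric, the unit 2-sphere $ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2$, and all function names are assumptions made for illustration.

```python
import math

def christoffel(metric, inverse, x, h=1e-5):
    """G[rho][a][b] = 1/2 g^{rho nu} (d_a g_{b nu} + d_b g_{nu a} - d_nu g_{a b})."""
    n = len(x)
    dg = []                               # dg[c][a][b] = d_c g_{ab}
    for c in range(n):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)]
                   for a in range(n)])
    ginv = inverse(x)
    return [[[0.5 * sum(ginv[rho][nu]
                        * (dg[a][b][nu] + dg[b][nu][a] - dg[nu][a][b])
                        for nu in range(n))
              for b in range(n)] for a in range(n)] for rho in range(n)]

def sphere_metric(x):
    """Unit 2-sphere metric in coordinates x = [theta, phi] (illustrative choice)."""
    theta, _ = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def sphere_inverse(x):
    """Inverse of the unit 2-sphere metric."""
    theta, _ = x
    return [[1.0, 0.0], [0.0, 1.0 / math.sin(theta) ** 2]]

theta = 1.0
G = christoffel(sphere_metric, sphere_inverse, [theta, 0.4])
```

For this metric the only non-zero components are $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi{}_{\theta\phi} = \Gamma^\phi{}_{\phi\theta} = \cot\theta$, and the result is symmetric in its lower indices, as the torsion free condition requires.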

1.3.3 Isometries & Killing’s Equation

Consider the coordinate transformation

$$g_{\mu\nu}(x) \longmapsto g_{\mu\nu}(x'),$$

so that the metric in the new frame has the same functional dependence as in the old frame. That is, the new metric depends on $x'$ in the same way as the old metric depended on $x$. Then, we have

$$ds^2(x) = ds^2(x'),$$

and that

$$g'_{\mu\nu}(x') = g_{\mu\nu}(x'). \tag{1.3.18}$$

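Therefore, by the transformation rule of the metric,

$$g_{\mu\nu}(x) = \frac{\partial x'^\alpha}{\partial x^\mu}\frac{\partial x'^\beta}{\partial x^\nu}\,g'_{\alpha\beta}(x'),$$

and using (1.3.18) on the RHS, we see that

$$g_{\mu\nu}(x) = \frac{\partial x'^\alpha}{\partial x^\mu}\frac{\partial x'^\beta}{\partial x^\nu}\,g_{\alpha\beta}(x'). \tag{1.3.19}$$

So, a coordinate transformation leaving the metric in the same form (form invariant) is called an isometry.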

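The coordinate transformation we considered was $x^\mu\mapsto x'^\mu$. Let us consider a special case of this; namely

$$x^\mu \longmapsto x'^\mu = x^\mu + \epsilon\xi^\mu,$$

where $\epsilon$ is small, and $\xi^\mu$ a vector field. Now, the Jacobian,

$$J^\mu{}_\nu = \partial_\nu x'^\mu = \partial_\nu\left(x^\mu + \epsilon\xi^\mu\right),$$

which is clearly

$$J^\mu{}_\nu = \delta^\mu_\nu + \epsilon\,\partial_\nu\xi^\mu.$$

Now, by a Taylor expansion, we see that

$$g_{\alpha\beta}(x') = g_{\alpha\beta}(x^\mu + \epsilon\xi^\mu) = g_{\alpha\beta}(x^\mu) + \epsilon\xi^\mu\,\partial_\mu g_{\alpha\beta}(x^\mu) + O(\epsilon^2).$$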

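So, we now have enough terms to be able to put them all into the metric isometry transformation equation (1.3.19). Thus,

$$\begin{aligned}
g_{\mu\nu}(x) &= J^\alpha{}_\mu J^\beta{}_\nu\,g_{\alpha\beta}(x')\\
&= \left(\delta^\alpha_\mu + \epsilon\,\partial_\mu\xi^\alpha\right)\left(\delta^\beta_\nu + \epsilon\,\partial_\nu\xi^\beta\right)\left(g_{\alpha\beta}(x) + \epsilon\xi^\rho\,\partial_\rho g_{\alpha\beta}(x)\right).
\end{aligned}$$

If we expand out the RHS, neglecting terms in $O(\epsilon^2)$, one finds that

$$g_{\mu\nu}(x) = g_{\mu\nu}(x) + \epsilon\xi^\rho\,\partial_\rho g_{\mu\nu} + \epsilon\,g_{\mu\beta}\,\partial_\nu\xi^\beta + \epsilon\,g_{\alpha\nu}\,\partial_\mu\xi^\alpha;$$

rearranging,

$$g_{\mu\nu}(x) = g_{\mu\nu}(x) + \epsilon\left(g_{\mu\beta}\,\partial_\nu\xi^\beta + g_{\alpha\nu}\,\partial_\mu\xi^\alpha + \xi^\rho\,\partial_\rho g_{\mu\nu}\right),$$

which is obviously just

$$g_{\mu\beta}\,\partial_\nu\xi^\beta + g_{\alpha\nu}\,\partial_\mu\xi^\alpha + \xi^\rho\,\partial_\rho g_{\mu\nu} = 0. \tag{1.3.20}$$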

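Now, notice that

$$\xi_\alpha = g_{\alpha\beta}\,\xi^\beta,$$

and then its differential is

$$\partial_\nu\xi_\alpha = \partial_\nu\left(g_{\alpha\beta}\,\xi^\beta\right) = \xi^\beta\,\partial_\nu g_{\alpha\beta} + g_{\alpha\beta}\,\partial_\nu\xi^\beta.$$

Hence, we can rearrange this into the form

$$g_{\alpha\beta}\,\partial_\nu\xi^\beta = \partial_\nu\xi_\alpha - \xi^\beta\,\partial_\nu g_{\alpha\beta}.$$

So, if we put this into (1.3.20) for the first and second expressions (being very careful in changing indices), we get

$$\partial_\nu\xi_\mu - \xi^\beta\,\partial_\nu g_{\mu\beta} + \partial_\mu\xi_\nu - \xi^\beta\,\partial_\mu g_{\nu\beta} + \xi^\beta\,\partial_\beta g_{\mu\nu} = 0.$$

Collecting terms,

$$\partial_\nu\xi_\mu + \partial_\mu\xi_\nu + \xi^\beta\left(\partial_\beta g_{\mu\nu} - \partial_\nu g_{\mu\beta} - \partial_\mu g_{\nu\beta}\right) = 0.$$

The bracketed quantity is just $-2g_{\rho\beta}\,\Gamma^\rho{}_{\mu\nu}$, so that

$$\partial_\nu\xi_\mu + \partial_\mu\xi_\nu - 2\xi^\beta g_{\rho\beta}\,\Gamma^\rho{}_{\mu\nu} = 0,$$

which is just

$$\partial_\nu\xi_\mu + \partial_\mu\xi_\nu - 2\,\Gamma^\rho{}_{\mu\nu}\,\xi_\rho = 0.$$

This is just a sum of covariant derivatives (noting the symmetry of the Christoffel symbols),

$$\nabla_\mu\xi_\nu + \nabla_\nu\xi_\mu = 0. \tag{1.3.21}$$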

This is known as Killing’s equation. A vector $\xi^\nu$ satisfying Killing’s equation is called a Killing vector.

Let us just recap what we have done. A metric is said to have an isometry if it can transform, retaining its functional dependence. Then, under a small coordinate transformation, with a vector field $\xi^\mu$, the field satisfying Killing’s equation will give an isometry.
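As a hedged illustration of Killing’s equation, the sketch below evaluates $\partial_\mu\xi_\nu + \partial_\nu\xi_\mu - 2\Gamma^\rho{}_{\mu\nu}\xi_\rho$ in plane polars for two hand-picked fields. The choice of example, the standard polar connection components ($\Gamma^r{}_{\phi\phi} = -r$, $\Gamma^\phi{}_{r\phi} = \Gamma^\phi{}_{\phi r} = 1/r$) and the function names are assumptions made for illustration. The rotation generator $\xi^\mu = (0, 1)$ has lowered components $\xi_\mu = (0, r^2)$, while the radial field $\xi^\mu = (1, 0)$ has $\xi_\mu = (1, 0)$.

```python
def polar_gamma(r):
    """G[rho][mu][nu] = Gamma^rho_{mu nu} for plane polars, with 0 = r, 1 = phi."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = 1.0 / r
    G[1][1][0] = 1.0 / r
    return G

def killing_residual(xi_lower, d_xi_lower, gamma):
    """K[m][v] = d_m xi_v + d_v xi_m - 2 Gamma^rho_{m v} xi_rho.

    xi_lower[v] holds xi_v and d_xi_lower[m][v] holds d_m xi_v (supplied by hand).
    K vanishes identically when xi is a Killing vector.
    """
    n = range(len(xi_lower))
    return [[d_xi_lower[m][v] + d_xi_lower[v][m]
             - 2.0 * sum(gamma[p][m][v] * xi_lower[p] for p in n)
             for v in n] for m in n]

r = 1.7
gamma = polar_gamma(r)

# Rotation generator: xi_mu = (0, r^2), so the only non-zero derivative is d_r xi_phi = 2r.
rotation = killing_residual([0.0, r * r], [[0.0, 2.0 * r], [0.0, 0.0]], gamma)

# Radial field: xi_mu = (1, 0), all partial derivatives vanish.
radial = killing_residual([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], gamma)
```

The residual vanishes for the rotation, which is an isometry of the plane, but the radial field leaves a non-zero $\phi\phi$ component $2r$, so it is not a Killing vector.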

Now, a theorem states that, for a tangent vector $T^\mu$ and a Killing vector $\xi^\mu$, the quantity $T^\mu\xi_\mu$ is conserved along an affinely parameterised geodesic. So, to prove it, we consider

$$\frac{D}{Du}\left(T^\mu\xi_\mu\right) = T^\nu\nabla_\nu\left(T^\mu\xi_\mu\right) = T^\nu\left(\xi_\mu\nabla_\nu T^\mu + T^\mu\nabla_\nu\xi_\mu\right).$$

Now, the first term is zero, as we are on an affinely parameterised geodesic, $T^\nu\nabla_\nu T^\mu = 0$. Now, notice that, since it is contracted with the symmetric product $T^\nu T^\mu$, we can write the final term as

$$T^\nu T^\mu\,\nabla_\nu\xi_\mu = T^\nu T^\mu\,\frac{1}{2}\left(\nabla_\nu\xi_\mu + \nabla_\mu\xi_\nu\right),$$

and thus we have

$$\frac{D}{Du}\left(T^\mu\xi_\mu\right) = T^\nu T^\mu\,\frac{1}{2}\left(\nabla_\nu\xi_\mu + \nabla_\mu\xi_\nu\right).$$

We were able to interchange the indices (with the factor of one-half to cancel out the double counting), because the things multiplying it are symmetric under interchange of indices. Notice that the bracketed term is just the left-hand side of Killing’s equation, which vanishes for a Killing vector $\xi^\nu$. Therefore,

$$\frac{D}{Du}\left(T^\mu\xi_\mu\right) = 0;$$

thus, $T^\mu\xi_\mu$ is a conserved quantity along an affinely parameterised geodesic.
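The conservation law can be watched numerically. The sketch below is an illustration under assumptions, not part of the derivation: it works in plane polars with the rotation Killing vector $\xi^\mu = (0, 1)$, for which $T^\mu\xi_\mu = g_{\phi\phi}\,\dot{\phi} = r^2\dot{\phi}$, integrates the affinely parameterised geodesic equation with a classical Runge-Kutta step, and records how far $r^2\dot{\phi}$ drifts from its initial value. The connection components are the standard polar ones, and the function names and initial data are hypothetical.

```python
def geodesic_rhs(state):
    """Affinely parameterised geodesic in plane polars; state = (r, phi, dr/du, dphi/du)."""
    r, phi, vr, vphi = state
    return (vr, vphi, r * vphi * vphi, -2.0 * vr * vphi / r)

def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for dy/du = f(y)."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
    k4 = f(tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h * (p + 2 * q + 2 * s + t) / 6.0
                 for a, p, q, s, t in zip(y, k1, k2, k3, k4))

def conserved(state):
    """T^mu xi_mu = r^2 dphi/du for the rotation Killing vector xi^mu = (0, 1)."""
    r, _, _, vphi = state
    return r * r * vphi

state = (1.0, 0.0, 0.3, 0.8)      # arbitrary illustrative initial data
initial = conserved(state)
values = []
for _ in range(2000):
    state = rk4_step(geodesic_rhs, state, 0.001)
    values.append(conserved(state))
drift = max(abs(v - initial) for v in values)
```

Along the curve both $r$ and $\dot{\phi}$ change, yet `drift` stays at the level of integration error; the conserved quantity here is the familiar angular momentum per unit mass.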

1.3.4 Summary

We shall soon see some examples of geodesics, and what a Killing vector corresponds to; but before then we shall bring together our definitions of the Christoffel symbol, and introduce a little new notation (just to be in keeping with the literature).

We have that the affine connection, $\Gamma^\mu{}_{\nu\lambda}$, is the same as the geodesic connection $\left\{{}_{\mu}{}^{\alpha}{}_{\nu}\right\}$, for manifolds for which

$$\nabla_\alpha g_{\mu\nu} = 0, \quad T^\alpha{}_{\mu\nu} = 0 \quad\Rightarrow\quad \Gamma^\alpha{}_{\mu\nu} = \left\{{}_{\mu}{}^{\alpha}{}_{\nu}\right\}.$$

We also derived that the relation between the Christoffel symbol (as we may as well call it), and the metric, is

$$\Gamma^\rho{}_{\alpha\mu} = \frac{1}{2}g^{\rho\nu}\left(-\partial_\nu g_{\alpha\mu} + \partial_\alpha g_{\mu\nu} + \partial_\mu g_{\nu\alpha}\right).$$

In fact, the $\Gamma^\rho{}_{\alpha\mu}$ are generally denoted Christoffel symbols of the second kind. We can in fact see that

$$\Gamma^\rho{}_{\alpha\mu} = g^{\rho\nu}\,\Gamma_{\nu\alpha\mu},$$

where we call the $\Gamma_{\nu\alpha\mu}$ the Christoffel symbols of the first kind. When we refer to the “Christoffel symbols”, we shall mean those of the second kind.

We derived that the affine geodesic is the same as the metric geodesic, for affinely parameterised geodesics (satisfying the above torsion and covariant derivative relations).

We also saw that the Christoffel symbols are not tensors. The non-tensorial nature of the symbols allowed us to derive that, for any chosen point in a manifold, there exists a coordinate system in which all components of the symbol are zero at that point. That is, the manifold can always be made to look flat at a single point.


The Lagrangian squared,

\[ L^2 = \left(\frac{ds}{du}\right)^2 = g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu, \]

is just the line element length. Its possible values are $0, \pm 1$. We classify 0 as null geodesics, $+1$ as time-like and $-1$ as space-like.

Also by way of being in keeping with the literature, some books denote partial & covariant derivatives in a different way. Sometimes one may see

\[ \partial_\nu A_\mu \equiv A_{\mu,\nu}, \qquad \nabla_\nu A_\mu \equiv A_{\mu;\nu}. \]

That is, a “comma” representing partial derivatives, and a “semi-colon” for covariant derivatives.

1.3.5 Examples

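Here we shall see specific examples of geodesics, Killing vectors & how to compute Christoffel symbols.

1.3.5.1 Computing Christoffel Symbols: Effective Lagrangian

Now, before we go on to the effective Lagrangian method of computing the Christoffel symbols, we shall see how to do so via brute force.

Brute force: plane polars. In plane polars, we have the line element

\[ ds^2 = dr^2 + r^2 d\phi^2, \]

and therefore, reading off the components of the metric & inverse,

\[ (g_{ij}) = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}, \qquad (g^{ij}) = \begin{pmatrix} 1 & 0 \\ 0 & 1/r^2 \end{pmatrix}. \]

Then, using the notation that $dx^i = (dr, d\phi)$, we see that $g_{rr} = 1$, $g_{\phi\phi} = r^2$, $g^{rr} = 1$, $g^{\phi\phi} = r^{-2}$ are the only non-zero components. So, to compute the Christoffel symbols (the brute force way), we must find

\[ \Gamma^i_{jk} = \tfrac{1}{2}\sum_a g^{ia}\left(-\partial_a g_{jk} + \partial_j g_{ak} + \partial_k g_{ja}\right), \qquad i, j, k, a = r, \phi. \]

We shall spell out, in detail, how to compute one of the components;

\begin{align*}
\Gamma^r_{\phi\phi} &= \tfrac{1}{2}\sum_{a=r,\phi} g^{ra}\left(-\partial_a g_{\phi\phi} + \partial_\phi g_{\phi a} + \partial_\phi g_{\phi a}\right) \\
&= \tfrac{1}{2}\left[ g^{rr}\left(-\partial_r g_{\phi\phi} + \partial_\phi g_{\phi r} + \partial_\phi g_{\phi r}\right) + g^{r\phi}\left(-\partial_\phi g_{\phi\phi} + \partial_\phi g_{\phi\phi} + \partial_\phi g_{\phi\phi}\right) \right].
\end{align*}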


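Now, one of the first things we note is that the metric is diagonal: all off-diagonal components are zero. So, the above reduces to

\[ \Gamma^r_{\phi\phi} = -\tfrac{1}{2}g^{rr}\partial_r g_{\phi\phi} = -\tfrac{1}{2}\cdot 1\cdot\partial_r r^2 = -r. \]

We have thus found one of the components of the Christoffel symbol. We shall state the rest of them (as going through how to find each one is very tedious).

\[ \Gamma^r_{rr} = 0, \quad \Gamma^r_{r\phi} = \Gamma^r_{\phi r} = 0, \quad \Gamma^r_{\phi\phi} = -r, \]
\[ \Gamma^\phi_{\phi\phi} = 0, \quad \Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = r^{-1}, \quad \Gamma^\phi_{rr} = 0. \]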
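The brute-force recipe above can also be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a diagonal metric (so the inverse is trivial) and approximates the partial derivatives by central differences. The names `polar_metric` and `christoffel` are hypothetical helpers introduced only for this illustration.

```python
def polar_metric(x):
    """Metric g_ij of the plane in polar coordinates x = (r, phi)."""
    r, _phi = x
    return [[1.0, 0.0], [0.0, r * r]]


def christoffel(metric, x, h=1e-5):
    """Return gamma[i][j][k] ~ Gamma^i_{jk} at the point x.

    Assumption: the metric is diagonal, so its inverse is 1/g_ii.
    """
    n = len(x)
    g = metric(x)
    g_inv = [[1.0 / g[i][i] if i == j else 0.0 for j in range(n)]
             for i in range(n)]

    # dg[a][j][k] approximates d g_jk / d x^a by a central difference.
    dg = []
    for a in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[a] += h
        x_minus[a] -= h
        g_plus, g_minus = metric(x_plus), metric(x_minus)
        dg.append([[(g_plus[j][k] - g_minus[j][k]) / (2.0 * h)
                    for k in range(n)] for j in range(n)])

    # Gamma^i_{jk} = 1/2 sum_a g^{ia} (-d_a g_jk + d_j g_ak + d_k g_ja)
    return [[[0.5 * sum(g_inv[i][a] *
                        (-dg[a][j][k] + dg[j][a][k] + dg[k][j][a])
                        for a in range(n))
              for k in range(n)]
             for j in range(n)]
            for i in range(n)]


R, PHI = 0, 1                                  # index labels for readability
gamma = christoffel(polar_metric, [2.0, 0.3])  # evaluate at r = 2
print(gamma[R][PHI][PHI])   # should be close to -r = -2
print(gamma[PHI][R][PHI])   # should be close to 1/r = 0.5
```

Evaluating at a single point is enough to compare against the table of components, since each entry depends only on $r$.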

Now, we shall show how to find them in a more intelligent manner.

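Effective Lagrangian Method. When we derived the metric geodesic, we had that the Lagrangian was

\[ L = \frac{ds}{du} = \sqrt{g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu}. \]

Now, consider

\[ L_{\rm eff} \equiv L^2. \]

The Euler-Lagrange equation for $L_{\rm eff}$ is

\[ \frac{d}{du}\left(\frac{\partial L_{\rm eff}}{\partial\dot{x}^\mu}\right) - \frac{\partial L_{\rm eff}}{\partial x^\mu} = 0, \]

from which, since $L$ is constant along an affinely parameterised curve (so that $dL/du = 0$), it is clear to see that

\[ 2L\left[\frac{d}{du}\left(\frac{\partial L}{\partial\dot{x}^\mu}\right) - \frac{\partial L}{\partial x^\mu}\right] = 0. \]

Thus, if $L$ satisfies the Euler-Lagrange equation, then so does $L^2$. This makes life a lot simpler, as we can consider just $g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu$, rather than its square-root.

So, for plane polars, where

\[ L_{\rm eff} = L^2 = \dot{r}^2 + r^2\dot{\phi}^2, \]

we have two Euler-Lagrange equations, one for each coordinate $r, \phi$. They are

\[ \ddot{r} - r\dot{\phi}^2 = 0, \qquad 2\dot{r}\dot{\phi} + r\ddot{\phi} = 0. \]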


Now, if we get these equations into the form $\ddot{x} + C\dot{x}_1\dot{x}_2 = 0$,

\[ \ddot{r} - r\dot{\phi}^2 = 0, \qquad \ddot{\phi} + \frac{\dot{r}\dot{\phi}}{r} + \frac{\dot{\phi}\dot{r}}{r} = 0. \]

So, we see that we can read off the Christoffel symbols by inspection. To see this a little more clearly, the “general” metric geodesic, for $r$, is

\[ \ddot{r} + \sum_{i,j=r,\phi}\Gamma^r_{ij}\dot{x}^i\dot{x}^j = 0; \]

then, we can see that the only Christoffel component that is non-zero is that where $i = j = \phi$, and that value is $-r$. Thus, we read off that $\Gamma^r_{\phi\phi} = -r$, which is in accord with what we had by the brute force method. For $\phi$, the general geodesic is

\[ \ddot{\phi} + \sum_{i,j=r,\phi}\Gamma^\phi_{ij}\dot{x}^i\dot{x}^j = 0; \]

and we therefore see two non-zero Christoffel symbols: when $i = r, j = \phi$ and $i = \phi, j = r$. The corresponding Christoffel symbol components are thus $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = r^{-1}$. Again, in accord with the brute force components.

1.3.5.2 Computing the Geodesic

Now, we are able to find the geodesic: a parameterised curve that extremises the distance between two points, in the plane polar coordinate system.

When we computed the Euler-Lagrange equation for $\phi$, we had a term (which we didn't state above, but is easy to see, upon computation)

\[ \frac{d}{du}\left(2r^2\dot{\phi}\right) = 0 \quad\Rightarrow\quad r^2\dot{\phi} = B = \text{const}. \]

That is, $\dot{\phi} = B/r^2$. Now, the effective Lagrangian is just $ds^2/du^2$, which is just the line element, which can be one of 3 values (as previously stated),

\[ L_{\rm eff} = L^2 = \left\{ \begin{array}{c} 0 \\ 1 \\ -1 \end{array} \right\} \equiv A. \]

So, the effective Lagrangian is just

\[ L_{\rm eff} = \dot{r}^2 + r^2\dot{\phi}^2 = A. \]

Hence, using our expression for $\dot{\phi}$,

\[ \dot{r}^2 + r^2\frac{B^2}{r^4} = A \quad\Rightarrow\quad \dot{r} = \sqrt{A - \frac{B^2}{r^2}}. \]


Now, if we notice that

\[ \frac{\dot{\phi}}{\dot{r}} = \frac{d\phi}{dr} = \frac{B}{r^2}\left(A - \frac{B^2}{r^2}\right)^{-1/2}, \]

then this integrates to

\[ r\cos(\phi - \phi_c) = \frac{B}{\sqrt{A}}. \]

If we take $A = 1$, so that we are talking about time-like geodesics, then the equation becomes

\[ r\cos(\phi - \phi_c) = r\cos\phi\cos\phi_c + r\sin\phi\sin\phi_c = B. \]

We now note that $\phi_c$ is a constant, and that $x = r\cos\phi$, $y = r\sin\phi$, giving

\[ x\cos\phi_c + y\sin\phi_c = B \quad\Rightarrow\quad y = mx + c, \]

with $m = -\cot\phi_c$ and $c = B/\sin\phi_c$ (for $\sin\phi_c \neq 0$). That is, the time-like geodesic is a straight line.

Let us now consider the null geodesic. We appeal back to the effective Lagrangian, which becomes

\[ \dot{r}^2 + r^2\dot{\phi}^2 = 0. \]

This has solution $\dot{r} = 0$ and $\dot{\phi} = 0$. That is, both radius & angle are constants. That is, a single point. Thus, the null geodesic is a point (null size).

When we consider the space-like geodesic, $\dot{r}^2 + r^2\dot{\phi}^2 = -1$, we find that there is no real solution: it does not exist in plane polars.
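The straight-line result can be checked by integrating the two geodesic equations directly. The sketch below is an illustration only: it uses a classical fourth-order Runge-Kutta step (a choice made here, not something the notes prescribe), and the function names `rhs`, `rk4_step` and `integrate` are hypothetical. The initial data are chosen so that the Cartesian velocity is $(0.3, 0.8)$, which means the exact curve is $x = 1 + 0.3u$, $y = 0.8u$.

```python
import math


def rhs(state):
    """Geodesic equations in plane polars as a first-order system.

    state = (r, phi, rdot, phidot);
    rddot = r * phidot^2,  phiddot = -2 * rdot * phidot / r.
    """
    r, phi, rdot, phidot = state
    return (rdot, phidot, r * phidot ** 2, -2.0 * rdot * phidot / r)


def rk4_step(state, h):
    """One classical Runge-Kutta (RK4) step of size h."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, u_end, steps):
    """Integrate from u = 0 to u = u_end in equal steps."""
    h = u_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


# Start at r = 1, phi = 0 with rdot = 0.3, phidot = 0.8.
start = (1.0, 0.0, 0.3, 0.8)
r, phi, rdot, phidot = integrate(start, 1.0, 1000)

x, y = r * math.cos(phi), r * math.sin(phi)
print(x, y)                 # should be close to (1.3, 0.8): a straight line
print(r * r * phidot)       # B = r^2 * phidot, should stay close to 0.8
print(rdot ** 2 + (r * phidot) ** 2)   # A = L_eff, should stay constant
```

Both $B = r^2\dot{\phi}$ and $A = L_{\rm eff}$ are monitored because they are the two constants used in the analytic derivation.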

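Example of Geodesic 2. Let us compute another geodesic, for another line element,

\[ ds^2 = \frac{1}{t^2}\left(dt^2 - dx^2\right). \]

So, we see that the effective Lagrangian is

\[ L_{\rm eff} = \frac{\dot{t}^2 - \dot{x}^2}{t^2}, \qquad \dot{t} \equiv \frac{dt}{du}, \quad \dot{x} \equiv \frac{dx}{du}. \]

Then, the Euler-Lagrange equations for this effective Lagrangian are

\[ \ddot{t} - \frac{\dot{t}^2}{t} - \frac{\dot{x}^2}{t} = 0, \qquad \ddot{x} - \frac{2}{t}\dot{x}\dot{t} = 0. \]

So, we can read off the Christoffel symbol components. The only non-zero components are

\[ \Gamma^t_{xx} = \Gamma^t_{tt} = -\frac{1}{t}, \qquad \Gamma^x_{xt} = \Gamma^x_{tx} = -\frac{1}{t}. \]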
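For completeness, the intermediate Euler-Lagrange steps for this line element can be sketched as follows. Only symbols already defined in the text are used; the factor of two in the cross term of the geodesic equation, $2\Gamma^x_{xt}\dot{x}\dot{t}$, is why $\Gamma^x_{xt} = -1/t$ rather than $-2/t$.

```latex
\begin{align*}
\frac{\partial L_{\rm eff}}{\partial\dot{t}} &= \frac{2\dot{t}}{t^2}, \qquad
\frac{\partial L_{\rm eff}}{\partial t} = -\frac{2(\dot{t}^2 - \dot{x}^2)}{t^3}, \\
\frac{d}{du}\left(\frac{2\dot{t}}{t^2}\right) + \frac{2(\dot{t}^2 - \dot{x}^2)}{t^3}
  &= \frac{2}{t^2}\left(\ddot{t} - \frac{\dot{t}^2}{t} - \frac{\dot{x}^2}{t}\right) = 0, \\
\frac{\partial L_{\rm eff}}{\partial\dot{x}} &= -\frac{2\dot{x}}{t^2}, \qquad
\frac{\partial L_{\rm eff}}{\partial x} = 0, \\
\frac{d}{du}\left(-\frac{2\dot{x}}{t^2}\right)
  &= -\frac{2}{t^2}\left(\ddot{x} - \frac{2}{t}\dot{x}\dot{t}\right) = 0.
\end{align*}
```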


1.3.5.3 Physical Meaning of the Killing Vector

Again, let us go back to plane polars. The line element is

\[ ds^2 = dr^2 + r^2 d\phi^2. \]

We would like to think of a vector that leaves the line element unchanged. A transformation on $\phi$ works:

\[ \phi \mapsto \phi' = \phi + \epsilon, \]

so that

\[ (r, \phi) \mapsto (r', \phi') = (r, \phi + \epsilon) = (r, \phi) + \epsilon(0, 1). \]

Therefore, our Killing vector is

\[ \xi^i = (0, 1). \]

Now, we stated that $T^i\xi_i$ is a conserved quantity. Let us consider what it is. So,

\begin{align*}
\xi_i\dot{x}^i &= g_{ij}\xi^i\dot{x}^j \\
&= g_{\phi\phi}\xi^\phi\dot{x}^\phi \\
&= r^2\cdot 1\cdot\dot{\phi} \\
&= r^2\dot{\phi}.
\end{align*}

This quantity is a constant (as it is conserved). We also notice that it is the expression for the angular momentum (per unit mass) of the system. Therefore, the conserved quantity associated with the Killing vector is the angular momentum, in plane polars.
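As a sanity check, one can verify directly that $\xi^i = (0, 1)$ satisfies Killing's equation, $\nabla_i\xi_j + \nabla_j\xi_i = 0$, using the plane polar Christoffel symbols found earlier. The sketch below assumes the standard expression $\nabla_i\xi_j = \partial_i\xi_j - \Gamma^k_{ij}\xi_k$ for the covariant derivative of a covector.

```latex
\begin{align*}
\xi_i &= g_{ij}\xi^j = (0,\ r^2), \\
\nabla_r\xi_r &= \partial_r\xi_r - \Gamma^k_{rr}\xi_k = 0, \\
\nabla_r\xi_\phi &= \partial_r(r^2) - \Gamma^\phi_{r\phi}\,r^2 = 2r - r = r, \\
\nabla_\phi\xi_r &= \partial_\phi\xi_r - \Gamma^\phi_{\phi r}\,r^2 = -r, \\
\nabla_\phi\xi_\phi &= \partial_\phi(r^2) - \Gamma^r_{\phi\phi}\xi_r - \Gamma^\phi_{\phi\phi}\xi_\phi = 0, \\
\nabla_r\xi_\phi + \nabla_\phi\xi_r &= r - r = 0.
\end{align*}
```

All three independent components of the symmetrised derivative vanish, so the shift in $\phi$ is indeed generated by a Killing vector.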

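1.3.5.4 Nordström’s Theory of Gravity

Let us compute the connection associated with $\tilde{g}_{\mu\nu} = \Omega^2 g_{\mu\nu}$. Now, the connection associated with $g_{\mu\nu}$ is

\[ \Gamma^\rho_{\alpha\beta} = \tfrac{1}{2}g^{\rho\nu}\left(\partial_\alpha g_{\beta\nu} + \partial_\beta g_{\nu\alpha} - \partial_\nu g_{\alpha\beta}\right). \]

Hence,

\[ \partial_\alpha\left(\tilde{g}_{\mu\nu}\right) = \partial_\alpha\left(\Omega^2 g_{\mu\nu}\right) = g_{\mu\nu}\,2\Omega\,\partial_\alpha\Omega + \Omega^2\partial_\alpha g_{\mu\nu}. \]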


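Therefore, using $\tilde{g}^{\rho\nu} = \Omega^{-2}g^{\rho\nu}$,

\begin{align*}
\tilde{\Gamma}^\rho_{\alpha\beta} &= \tfrac{1}{2}\tilde{g}^{\rho\nu}\left(\partial_\alpha\tilde{g}_{\beta\nu} + \partial_\beta\tilde{g}_{\nu\alpha} - \partial_\nu\tilde{g}_{\alpha\beta}\right) \\
&= \tfrac{1}{2}\frac{1}{\Omega^2}g^{\rho\nu}\left(2\Omega g_{\beta\nu}\partial_\alpha\Omega + \Omega^2\partial_\alpha g_{\beta\nu} + 2\Omega g_{\nu\alpha}\partial_\beta\Omega + \Omega^2\partial_\beta g_{\nu\alpha} - 2\Omega g_{\alpha\beta}\partial_\nu\Omega - \Omega^2\partial_\nu g_{\alpha\beta}\right) \\
&= \tfrac{1}{2}g^{\rho\nu}\left(\partial_\alpha g_{\beta\nu} + \partial_\beta g_{\nu\alpha} - \partial_\nu g_{\alpha\beta}\right) + \frac{1}{\Omega}g^{\rho\nu}\left(g_{\beta\nu}\partial_\alpha\Omega + g_{\nu\alpha}\partial_\beta\Omega - g_{\alpha\beta}\partial_\nu\Omega\right) \\
&= \Gamma^\rho_{\alpha\beta} + \frac{1}{\Omega}\left(\delta^\rho_\beta\partial_\alpha\Omega + \delta^\rho_\alpha\partial_\beta\Omega - g^{\rho\nu}g_{\alpha\beta}\partial_\nu\Omega\right).
\end{align*}

Hence,

\[ \tilde{\Gamma}^\rho_{\alpha\beta} = \Gamma^\rho_{\alpha\beta} + \frac{1}{\Omega}\left(\delta^\rho_\beta\partial_\alpha\Omega + \delta^\rho_\alpha\partial_\beta\Omega - g^{\rho\nu}g_{\alpha\beta}\partial_\nu\Omega\right). \]

Let us suppose that we have

\[ \tilde{g}_{\mu\nu} = e^{2\phi}\eta_{\mu\nu}, \]

so that

\[ \Omega = e^\phi, \qquad g_{\mu\nu} = \eta_{\mu\nu}. \]

Hence,

\[ \partial_\alpha\Omega = e^\phi\partial_\alpha\phi, \qquad \Gamma^\mu_{\alpha\beta} = 0. \]

Hence, using these,

\begin{align*}
\tilde{\Gamma}^\rho_{\alpha\beta} &= e^{-\phi}\left(\delta^\rho_\beta e^\phi\partial_\alpha\phi + \delta^\rho_\alpha e^\phi\partial_\beta\phi - \eta^{\rho\nu}\eta_{\alpha\beta}e^\phi\partial_\nu\phi\right) \\
&= \delta^\rho_\beta\partial_\alpha\phi + \delta^\rho_\alpha\partial_\beta\phi - \eta^{\rho\nu}\eta_{\alpha\beta}\partial_\nu\phi.
\end{align*}

So, the geodesic equation, with this connection, reads

\[ \ddot{x}^\rho + \left(\delta^\rho_\beta\partial_\alpha\phi + \delta^\rho_\alpha\partial_\beta\phi - \eta^{\rho\nu}\eta_{\alpha\beta}\partial_\nu\phi\right)\dot{x}^\alpha\dot{x}^\beta = 0, \]

which, writing $\dot{x}^2 \equiv \eta_{\alpha\beta}\dot{x}^\alpha\dot{x}^\beta$ and $\partial^\rho\phi \equiv \eta^{\rho\nu}\partial_\nu\phi$, reduces to

\[ \ddot{x}^\rho + 2\dot{x}^\rho\,\partial_\alpha\phi\,\dot{x}^\alpha - \dot{x}^2\partial^\rho\phi = 0. \]

Now, null geodesics have $\dot{x}^2 = 0$, so that the null geodesic equation is

\[ \ddot{x}^\rho + 2\dot{x}^\rho\,\partial_\alpha\phi\,\dot{x}^\alpha = 0. \]

Similarly, timelike geodesics have $\dot{x}^2 = 1$, so that the timelike geodesic equation is

\[ \ddot{x}^\rho + 2\dot{x}^\rho\,\partial_\alpha\phi\,\dot{x}^\alpha - \partial^\rho\phi = 0. \]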


Now, this example provides us with some practice with using tensors & computing geodesics. In addition to this, we have found the geodesics for a theory whereby the metric is given by $e^{2\phi}\eta_{\mu\nu}$. This theory was proposed by Nordström before Einstein.

1.4 Curvature

We have now got enough mathematical tools to be able to consider the curvature of a manifold.

To continue, consider the commutator of covariant derivatives, acting upon a scalar,

\[ [\nabla_\mu, \nabla_\nu]\phi = \nabla_\mu\nabla_\nu\phi - \nabla_\nu\nabla_\mu\phi. \]

Now, as we previously showed, the covariant derivative of a scalar is just the normal partial derivative. Therefore,

\[ [\nabla_\mu, \nabla_\nu]\phi = \nabla_\mu(\partial_\nu\phi) - \nabla_\nu(\partial_\mu\phi). \]

We can now expand out the covariant derivatives. So,

\[ \nabla_\mu(\partial_\nu\phi) = \partial_\mu\partial_\nu\phi - \Gamma^\lambda_{\mu\nu}\partial_\lambda\phi. \]

Therefore, the commutator is

\[ [\nabla_\mu, \nabla_\nu]\phi = \partial_\mu\partial_\nu\phi - \Gamma^\lambda_{\mu\nu}\partial_\lambda\phi - \partial_\nu\partial_\mu\phi + \Gamma^\lambda_{\nu\mu}\partial_\lambda\phi. \]

In a torsion free manifold, the two Christoffel terms cancel out, as do the partial derivative terms (as they commute naturally). Therefore, we see that

\[ [\nabla_\mu, \nabla_\nu]\phi = 0. \]

So, the commutator of covariant derivatives, acting upon a scalar, is zero. This result isn’t perhaps that surprising. So, let us consider the commutator acting upon a vector.

1.4.1 The Riemann Tensor

As previously stated, we shall compute the commutator of covariant derivatives, acting upon a vector. That is,

\[ [\nabla_\mu, \nabla_\nu]A^\rho = \nabla_\mu\nabla_\nu A^\rho - \nabla_\nu\nabla_\mu A^\rho. \]

Now, before, we expanded out the inner covariant derivatives first (as they resulted in just partial derivatives). However, if we do that this time, we will end up having to compute the covariant derivative of the Christoffel


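symbol, which we don’t know how to do. Hence, we expand out the outer derivatives first. So,

\[ \nabla_\mu\nabla_\nu A^\rho = \partial_\mu(\nabla_\nu A^\rho) + \Gamma^\rho_{\mu\lambda}\nabla_\nu A^\lambda - \Gamma^\lambda_{\mu\nu}\nabla_\lambda A^\rho, \]

thus, the commutator reads

\begin{align*}
[\nabla_\mu, \nabla_\nu]A^\rho = {} & \partial_\mu(\nabla_\nu A^\rho) + \Gamma^\rho_{\mu\lambda}\nabla_\nu A^\lambda - \Gamma^\lambda_{\mu\nu}\nabla_\lambda A^\rho \\
& - \partial_\nu(\nabla_\mu A^\rho) - \Gamma^\rho_{\nu\lambda}\nabla_\mu A^\lambda + \Gamma^\lambda_{\nu\mu}\nabla_\lambda A^\rho.
\end{align*}

So, we see that, in a torsion free manifold, the final terms on each line of the RHS cancel (i.e. third & sixth),

\[ [\nabla_\mu, \nabla_\nu]A^\rho = \partial_\mu(\nabla_\nu A^\rho) + \Gamma^\rho_{\mu\lambda}\nabla_\nu A^\lambda - \partial_\nu(\nabla_\mu A^\rho) - \Gamma^\rho_{\nu\lambda}\nabla_\mu A^\lambda. \]

Now, expanding out the remaining covariant derivatives,

\begin{align*}
[\nabla_\mu, \nabla_\nu]A^\rho = {} & \partial_\mu\left(\partial_\nu A^\rho + \Gamma^\rho_{\lambda\nu}A^\lambda\right) + \Gamma^\rho_{\mu\lambda}\left(\partial_\nu A^\lambda + \Gamma^\lambda_{\nu\beta}A^\beta\right) \\
& - \partial_\nu\left(\partial_\mu A^\rho + \Gamma^\rho_{\lambda\mu}A^\lambda\right) - \Gamma^\rho_{\nu\lambda}\left(\partial_\mu A^\lambda + \Gamma^\lambda_{\mu\beta}A^\beta\right).
\end{align*}

Now, as partial derivatives commute, the two second-derivative terms $\partial_\mu\partial_\nu A^\rho$ and $\partial_\nu\partial_\mu A^\rho$ cancel. So, cancelling & expanding out the brackets, we have

\begin{align*}
[\nabla_\mu, \nabla_\nu]A^\rho = {} & (\partial_\mu\Gamma^\rho_{\lambda\nu})A^\lambda + \Gamma^\rho_{\lambda\nu}\partial_\mu A^\lambda + \Gamma^\rho_{\mu\lambda}\partial_\nu A^\lambda + \Gamma^\rho_{\mu\lambda}\Gamma^\lambda_{\nu\beta}A^\beta \\
& - (\partial_\nu\Gamma^\rho_{\lambda\mu})A^\lambda - \Gamma^\rho_{\lambda\mu}\partial_\nu A^\lambda - \Gamma^\rho_{\nu\lambda}\partial_\mu A^\lambda - \Gamma^\rho_{\nu\lambda}\Gamma^\lambda_{\mu\beta}A^\beta.
\end{align*}

Now, we see that the second term cancels with the seventh, and the third with the sixth (again, by assuming torsion free manifolds). Leaving us with

\[ [\nabla_\mu, \nabla_\nu]A^\rho = (\partial_\mu\Gamma^\rho_{\lambda\nu})A^\lambda + \Gamma^\rho_{\mu\lambda}\Gamma^\lambda_{\nu\beta}A^\beta - (\partial_\nu\Gamma^\rho_{\lambda\mu})A^\lambda - \Gamma^\rho_{\nu\lambda}\Gamma^\lambda_{\mu\beta}A^\beta. \]

Now, in the second & fourth terms, let us interchange $\beta \leftrightarrow \lambda$, giving

\[ [\nabla_\mu, \nabla_\nu]A^\rho = (\partial_\mu\Gamma^\rho_{\lambda\nu})A^\lambda + \Gamma^\rho_{\mu\beta}\Gamma^\beta_{\nu\lambda}A^\lambda - (\partial_\nu\Gamma^\rho_{\lambda\mu})A^\lambda - \Gamma^\rho_{\nu\beta}\Gamma^\beta_{\mu\lambda}A^\lambda, \]

so that we can take out a common factor of $A^\lambda$,

\[ [\nabla_\mu, \nabla_\nu]A^\rho = \left(\partial_\mu\Gamma^\rho_{\lambda\nu} + \Gamma^\rho_{\mu\beta}\Gamma^\beta_{\nu\lambda} - \partial_\nu\Gamma^\rho_{\lambda\mu} - \Gamma^\rho_{\nu\beta}\Gamma^\beta_{\mu\lambda}\right)A^\lambda. \]

Now, we define the bracketed quantity to be the Riemann tensor,

\[ R^\rho{}_{\lambda\mu\nu} \equiv \partial_\mu\Gamma^\rho_{\lambda\nu} + \Gamma^\rho_{\mu\beta}\Gamma^\beta_{\nu\lambda} - \partial_\nu\Gamma^\rho_{\lambda\mu} - \Gamma^\rho_{\nu\beta}\Gamma^\beta_{\mu\lambda}, \tag{1.4.1} \]

so that the commutator reads

\[ [\nabla_\mu, \nabla_\nu]A^\rho = R^\rho{}_{\lambda\mu\nu}A^\lambda. \tag{1.4.2} \]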


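The Riemann tensor is a $(1,3)$-tensor. It is plausible that $R^\rho{}_{\lambda\mu\nu}$ is a tensor: as the LHS of the above is a tensor, the RHS must also be (as $A^\rho$ is a tensor). This is obviously not a rigorous proof of the tensorial nature of the Riemann “tensor”, so we shall prove it.

We have

\[ (\nabla_\mu\nabla_\nu - \nabla_\nu\nabla_\mu)A^\rho = R^\rho{}_{\lambda\mu\nu}A^\lambda, \]

and therefore that

\[ (\nabla'_\mu\nabla'_\nu - \nabla'_\nu\nabla'_\mu)A'^\rho = R'^\rho{}_{\lambda\mu\nu}A'^\lambda. \]

Now, as covariant derivatives are tensors, we know that

\[ \nabla'_\mu\nabla'_\nu A'^\rho = (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu J^\rho{}_\gamma\nabla_\alpha\nabla_\beta A^\gamma. \]

Hence, using $A'^\lambda = J^\lambda{}_\pi A^\pi$ on the RHS,

\[ (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu J^\rho{}_\gamma(\nabla_\alpha\nabla_\beta - \nabla_\beta\nabla_\alpha)A^\gamma = R'^\rho{}_{\lambda\mu\nu}J^\lambda{}_\pi A^\pi. \]

Now, on the LHS, we see that $(\nabla_\alpha\nabla_\beta - \nabla_\beta\nabla_\alpha)A^\gamma = R^\gamma{}_{\sigma\alpha\beta}A^\sigma$. Therefore,

\[ (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu J^\rho{}_\gamma R^\gamma{}_{\sigma\alpha\beta}A^\sigma = R'^\rho{}_{\lambda\mu\nu}J^\lambda{}_\pi A^\pi. \]

Now, as this must be valid for all $A^\mu$, we may relabel the dummy index $\pi \to \sigma$ on the RHS and cancel off the $A^\sigma$,

\[ (J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu J^\rho{}_\gamma R^\gamma{}_{\sigma\alpha\beta} = R'^\rho{}_{\lambda\mu\nu}J^\lambda{}_\sigma. \]

Now, multiplying through by something that will ‘kill off’ the Jacobian on the RHS, $(J^{-1})^\sigma{}_\kappa$ for example, and using $J^\lambda{}_\sigma(J^{-1})^\sigma{}_\kappa = \delta^\lambda_\kappa$, we find (after relabelling $\kappa \to \lambda$)

\[ (J^{-1})^\sigma{}_\lambda(J^{-1})^\alpha{}_\mu(J^{-1})^\beta{}_\nu J^\rho{}_\gamma R^\gamma{}_{\sigma\alpha\beta} = R'^\rho{}_{\lambda\mu\nu}, \]

which is the transformation rule of a $(1,3)$-tensor. Therefore, we have proven that the Riemann tensor is in fact a tensor.

Just to be in keeping with some literature, the Riemann tensor is also called the Riemann-Christoffel tensor, or the curvature tensor.

1.4.1.1 Symmetries of the Riemann Tensor

Now, in one of our previous discussions, we introduced the local inertial frame (LIF), whereby at a point $x^\mu = x^\mu_*$, the metric is flat, and the Christoffel symbols are all zero;

\[ g_{\mu\nu}(x_*) = \eta_{\mu\nu}, \qquad \partial_\rho g_{\mu\nu}(x_*) = 0, \qquad \Gamma^\rho_{\mu\nu}(x_*) = 0. \]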


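In a LIF, the Riemann tensor looks quite simple. So, we see that the Riemann tensor, in a LIF, is just

\[ R^\rho{}_{\lambda\mu\nu} = \partial_\mu\Gamma^\rho_{\lambda\nu} - \partial_\nu\Gamma^\rho_{\lambda\mu}. \]

Putting in expressions for the Christoffel symbols, and noting that the first derivative of the metric is zero;

\[ R^\rho{}_{\lambda\mu\nu} = \tfrac{1}{2}g^{\rho\pi}\left(\partial_\mu\partial_\lambda g_{\nu\pi} + \partial_\mu\partial_\nu g_{\pi\lambda} - \partial_\mu\partial_\pi g_{\lambda\nu} - \partial_\nu\partial_\lambda g_{\mu\pi} - \partial_\nu\partial_\mu g_{\pi\lambda} + \partial_\nu\partial_\pi g_{\lambda\mu}\right), \]

the second & fifth terms cancel each other (as partial derivatives commute), leaving

\[ R^\rho{}_{\lambda\mu\nu} = \tfrac{1}{2}g^{\rho\pi}\left(\partial_\mu\partial_\lambda g_{\nu\pi} - \partial_\mu\partial_\pi g_{\lambda\nu} - \partial_\nu\partial_\lambda g_{\mu\pi} + \partial_\nu\partial_\pi g_{\lambda\mu}\right). \]

Now, to get rid of the metric multiplying the bracket, we form

\begin{align*}
R_{\alpha\lambda\mu\nu} &= g_{\alpha\rho}R^\rho{}_{\lambda\mu\nu} \\
&= \tfrac{1}{2}g_{\alpha\rho}g^{\rho\pi}\left(\partial_\mu\partial_\lambda g_{\nu\pi} - \partial_\mu\partial_\pi g_{\lambda\nu} - \partial_\nu\partial_\lambda g_{\mu\pi} + \partial_\nu\partial_\pi g_{\lambda\mu}\right) \\
&= \tfrac{1}{2}\delta^\pi_\alpha\left(\partial_\mu\partial_\lambda g_{\nu\pi} - \partial_\mu\partial_\pi g_{\lambda\nu} - \partial_\nu\partial_\lambda g_{\mu\pi} + \partial_\nu\partial_\pi g_{\lambda\mu}\right) \\
&= \tfrac{1}{2}\left(\partial_\mu\partial_\lambda g_{\nu\alpha} - \partial_\mu\partial_\alpha g_{\lambda\nu} - \partial_\nu\partial_\lambda g_{\mu\alpha} + \partial_\nu\partial_\alpha g_{\lambda\mu}\right).
\end{align*}

Of course, it must be clear that this is only valid in a LIF. Now, although the above expression is only valid in a LIF, the resulting symmetries are valid everywhere (as the Riemann tensor is a tensor). We see that

\[ R_{\alpha\lambda\mu\nu} = -R_{\lambda\alpha\mu\nu} = -R_{\alpha\lambda\nu\mu} = R_{\mu\nu\alpha\lambda} = R_{\lambda\alpha\nu\mu}. \tag{1.4.3} \]

And further that

\[ R_{\alpha\lambda\mu\nu} + R_{\alpha\mu\nu\lambda} + R_{\alpha\nu\lambda\mu} = 0. \tag{1.4.4} \]

This can also be denoted

\[ R_{\alpha(\lambda\mu\nu)} = 0, \]

where the notation is understood to mean cyclic interchange, and sum, over bracketed indices.

Theorem. We state (without proof), that if all components of the Riemann tensor are zero, then the space is flat. That is

\[ R^\lambda{}_{\mu\nu\delta} = 0 \quad\Rightarrow\quad \text{flat space}. \]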


1.4.1.2 The Round Trip

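Now, although we shall not go into the details here (we have already presented a full mathematical treatment, however, of the Riemann tensor), one can show that the Riemann tensor comes about from a round-trip around a rectangle.

Consider a rectangle, with horizontal sides of length $\Delta x^\mu$ and vertical sides of length $\delta x^\mu$. Then, if one makes the parallel-transported round trip $A \to B \to C \to D \to A$, and if one computes the coordinate shift (merely due to displacement) at each vertex, then one finds that

\[ A^\rho_1 = \left(1 + \delta x^\mu\Delta x^\nu[\nabla_\mu, \nabla_\nu]\right)A^\rho_0, \]

where $A^\rho_1$ is the component of $A$ at the starting point, after making the round trip (i.e. one starts with $A^\rho_0$). Then,

\[ A^\rho_1 - A^\rho_0 = \Delta A^\rho = \delta x^\mu\Delta x^\nu[\nabla_\mu, \nabla_\nu]A^\rho_0. \]

Now, we see that $[\nabla_\mu, \nabla_\nu]A^\rho_0 = R^\rho{}_{\alpha\mu\nu}A^\alpha$, and so,

\[ \Delta A^\rho = \delta x^\mu\Delta x^\nu R^\rho{}_{\alpha\mu\nu}A^\alpha. \]

Now, notice that

\begin{align*}
R^\rho{}_{\alpha\mu\nu}\delta x^\mu\Delta x^\nu &= \tfrac{1}{2}\left(R^\rho{}_{\alpha\mu\nu}\delta x^\mu\Delta x^\nu + R^\rho{}_{\alpha\nu\mu}\delta x^\nu\Delta x^\mu\right) \\
&= \tfrac{1}{2}\left(R^\rho{}_{\alpha\mu\nu}\delta x^\mu\Delta x^\nu - R^\rho{}_{\alpha\mu\nu}\delta x^\nu\Delta x^\mu\right) \\
&= \tfrac{1}{2}R^\rho{}_{\alpha\mu\nu}\Delta S^{\mu\nu},
\end{align*}

where we have used the anti-symmetry of the Riemann tensor in its last two indices. Also, we have defined $\Delta S^{\mu\nu} \equiv \delta x^\mu\Delta x^\nu - \delta x^\nu\Delta x^\mu$. Therefore, we can write the round-trip expression as

\[ \Delta A^\rho = \tfrac{1}{2}\Delta S^{\mu\nu}R^\rho{}_{\alpha\mu\nu}A^\alpha. \]

Therefore, we have a semi-geometrical interpretation of the Riemann tensor. It is able to tell us the difference in the orientation of a vector, after making a round trip about a rectangle, in a manifold.

1.4.2 The Ricci Identity

We call the commutator we defined above the Ricci identity. That is, the Ricci identity is

\[ [\nabla_\mu, \nabla_\nu]A^\rho = R^\rho{}_{\lambda\mu\nu}A^\lambda, \]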


where $R^\rho{}_{\lambda\mu\nu}$ is the Riemann tensor, in a torsion-less manifold.

1.4.3 The Ricci Tensor & Scalar

If we contract the Riemann tensor on its first & third indices,

\[ R^\rho{}_{\lambda\rho\nu} = g^{\rho\alpha}R_{\alpha\lambda\rho\nu}, \]

we have a quantity we define

\[ R_{\lambda\nu} \equiv R^\rho{}_{\lambda\rho\nu}. \]

If we further contract $R_{\lambda\nu}$,

\[ R \equiv g^{\lambda\nu}R_{\lambda\nu}. \]

Thus, we have what we define as the Ricci tensor, $R_{\mu\nu}$, and Ricci scalar, $R$. By the symmetries of the Riemann tensor above, we can easily see that the Ricci tensor is symmetric.

Now, we stated that the condition for flat space was that all components of the Riemann tensor were zero. By contrast, if only the Ricci scalar is zero, then the space is not necessarily flat. One can see this, as upon contraction, some non-zero components may cancel each other out in summation.

1.4.3.1 Example: The Unit 2-Sphere

Consider the line element (that of the surface of a unit sphere)

\[ ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2, \]

and suppose that we are given that

\[ R^\theta{}_{\phi\theta\phi} = \sin^2\theta \]

is the only non-zero component of the Riemann tensor (obviously we can find the other non-zero components by symmetry relations); then, we can compute the Ricci scalar $R$.

The non-zero components of the metric are easily read off the line element;

\[ g_{\theta\theta} = g^{\theta\theta} = 1, \qquad g_{\phi\phi} = \sin^2\theta, \qquad g^{\phi\phi} = \frac{1}{\sin^2\theta}. \]

Now,

\[ R_{\theta\phi\theta\phi} = g_{\theta\theta}R^\theta{}_{\phi\theta\phi} = \sin^2\theta. \]

Now, by symmetry of the Riemann tensor,

\[ R_{\theta\phi\theta\phi} = -R_{\phi\theta\theta\phi} = -R_{\theta\phi\phi\theta} = R_{\phi\theta\phi\theta}. \]


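Now, the Ricci tensor is found by contraction,

\[ R_{ij} = g^{nm}R_{nimj}. \]

We are slightly fortunate in that the metric is diagonal. So,

\begin{align*}
R_{\theta\theta} &= g^{ij}R_{i\theta j\theta} \\
&= g^{\theta\theta}R_{\theta\theta\theta\theta} + g^{\phi\phi}R_{\phi\theta\phi\theta} \\
&= 1\cdot 0 + \frac{1}{\sin^2\theta}\cdot\sin^2\theta \\
&= 1.
\end{align*}

Also,

\begin{align*}
R_{\theta\phi} &= g^{\theta\theta}R_{\theta\theta\theta\phi} + g^{\phi\phi}R_{\phi\theta\phi\phi} \\
&= 0.
\end{align*}

And,

\begin{align*}
R_{\phi\phi} &= g^{\theta\theta}R_{\theta\phi\theta\phi} + g^{\phi\phi}R_{\phi\phi\phi\phi} \\
&= 1\cdot\sin^2\theta + 0 \\
&= \sin^2\theta.
\end{align*}

Therefore, the Ricci scalar,

\begin{align*}
R &= g^{ij}R_{ij} \\
&= g^{\theta\theta}R_{\theta\theta} + g^{\phi\phi}R_{\phi\phi} \\
&= 1 + \frac{1}{\sin^2\theta}\sin^2\theta \\
&= 2.
\end{align*}

Therefore, the Ricci scalar is 2 for this metric (that of the unit 2-sphere).

Now, if we were to repeat this for the line element of flat 3-space in spherical polars,

\[ ds^2 = dr^2 + r^2 d\theta^2 + r^2\sin^2\theta\,d\phi^2, \]

we would find that $R = 0$.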
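Both values of $R$ quoted above can be checked numerically from definition (1.4.1). The following is a rough sketch under stated assumptions: the metric is diagonal, all derivatives are approximated by nested central differences (so results are only accurate to several decimal places), and every function name is a hypothetical helper introduced for this illustration.

```python
import math


def sphere_metric(x):
    """Unit 2-sphere, x = (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]


def flat_spherical_metric(x):
    """Flat 3-space in spherical polars, x = (r, theta, phi)."""
    r, theta = x[0], x[1]
    return [[1.0, 0.0, 0.0], [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]


def shifted(x, a, h):
    """Copy of x with coordinate a displaced by h."""
    y = list(x)
    y[a] += h
    return y


def inverse_diagonal(g):
    """Inverse of a diagonal metric (assumption: off-diagonals vanish)."""
    n = len(g)
    return [[1.0 / g[i][i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def christoffel(metric, x, h=1e-5):
    """G[i][j][k] ~ Gamma^i_{jk}, derivatives by central differences."""
    n = len(x)
    g_inv = inverse_diagonal(metric(x))
    dg = []  # dg[a][j][k] ~ d g_jk / d x^a
    for a in range(n):
        gp, gm = metric(shifted(x, a, h)), metric(shifted(x, a, -h))
        dg.append([[(gp[j][k] - gm[j][k]) / (2.0 * h) for k in range(n)]
                   for j in range(n)])
    return [[[0.5 * sum(g_inv[i][a] * (dg[j][a][k] + dg[k][j][a] - dg[a][j][k])
                        for a in range(n))
              for k in range(n)] for j in range(n)] for i in range(n)]


def riemann(metric, x, h=1e-4):
    """R[r][l][m][v] ~ R^r_{l m v}, following definition (1.4.1)."""
    n = len(x)
    G = christoffel(metric, x)
    dG = []  # dG[m][r][l][v] ~ d_m Gamma^r_{l v}
    for m in range(n):
        Gp = christoffel(metric, shifted(x, m, h))
        Gm = christoffel(metric, shifted(x, m, -h))
        dG.append([[[(Gp[r][l][v] - Gm[r][l][v]) / (2.0 * h)
                     for v in range(n)] for l in range(n)] for r in range(n)])
    return [[[[dG[m][r][l][v] - dG[v][r][l][m]
               + sum(G[r][m][b] * G[b][v][l] - G[r][v][b] * G[b][m][l]
                     for b in range(n))
               for v in range(n)] for m in range(n)]
             for l in range(n)] for r in range(n)]


def ricci_scalar(metric, x):
    """Contract R^r_{l r v} to R_{l v}, then with g^{l v}."""
    n = len(x)
    R = riemann(metric, x)
    g_inv = inverse_diagonal(metric(x))
    ricci = [[sum(R[r][l][r][v] for r in range(n)) for v in range(n)]
             for l in range(n)]
    return sum(g_inv[l][v] * ricci[l][v] for l in range(n) for v in range(n))


print(ricci_scalar(sphere_metric, [1.0, 0.5]))               # close to 2
print(ricci_scalar(flat_spherical_metric, [2.0, 1.0, 0.3]))  # close to 0
```

The evaluation points are arbitrary, chosen away from $\theta = 0$ where $g^{\phi\phi}$ diverges.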

1.4.4 The Bianchi Identity

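Consider the Riemann tensor, in a LIF,

\[ R^\rho{}_{\lambda\mu\nu} = \partial_\mu\Gamma^\rho_{\lambda\nu} - \partial_\nu\Gamma^\rho_{\lambda\mu}. \]

Then, let us differentiate it,

\[ \nabla_\pi R^\rho{}_{\lambda\mu\nu} = \nabla_\pi\partial_\mu\Gamma^\rho_{\lambda\nu} - \nabla_\pi\partial_\nu\Gamma^\rho_{\lambda\mu}. \]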


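Now, even though we don't know how to evaluate these expressions, we can still cycle indices to see what happens. So, making the change

\[ \pi \to \mu \to \nu \to \pi, \]

then

\[ \nabla_\mu R^\rho{}_{\lambda\nu\pi} = \nabla_\mu\partial_\nu\Gamma^\rho_{\lambda\pi} - \nabla_\mu\partial_\pi\Gamma^\rho_{\lambda\nu}, \]

and again,

\[ \nabla_\nu R^\rho{}_{\lambda\pi\mu} = \nabla_\nu\partial_\pi\Gamma^\rho_{\lambda\mu} - \nabla_\nu\partial_\mu\Gamma^\rho_{\lambda\pi}. \]

Now, if we add these 3 expressions,

\begin{align*}
\nabla_\pi R^\rho{}_{\lambda\mu\nu} + \nabla_\mu R^\rho{}_{\lambda\nu\pi} + \nabla_\nu R^\rho{}_{\lambda\pi\mu} = {} & \nabla_\pi\partial_\mu\Gamma^\rho_{\lambda\nu} - \nabla_\pi\partial_\nu\Gamma^\rho_{\lambda\mu} \\
& + \nabla_\mu\partial_\nu\Gamma^\rho_{\lambda\pi} - \nabla_\mu\partial_\pi\Gamma^\rho_{\lambda\nu} \\
& + \nabla_\nu\partial_\pi\Gamma^\rho_{\lambda\mu} - \nabla_\nu\partial_\mu\Gamma^\rho_{\lambda\pi}.
\end{align*}

Now, in a LIF, the Christoffel symbols are zero. Therefore, the covariant derivative is the same as the “usual” partial derivative. So, rather than changing the above symbols, we let the covariant and partial derivatives swap indices. Then, we can see that the entire RHS cancels itself out in pairs (first with fourth, second with fifth, third with sixth), leaving

\[ \nabla_\pi R^\rho{}_{\lambda\mu\nu} + \nabla_\mu R^\rho{}_{\lambda\nu\pi} + \nabla_\nu R^\rho{}_{\lambda\pi\mu} = 0. \]

Now, if we lower the $\rho$-index (using a metric, but index relabelling is trivial),

\[ \nabla_\pi R_{\rho\lambda\mu\nu} + \nabla_\mu R_{\rho\lambda\nu\pi} + \nabla_\nu R_{\rho\lambda\pi\mu} = 0. \]

And, using the symmetry property that $R_{\alpha\beta\gamma\delta} = R_{\gamma\delta\alpha\beta}$, then

\[ \nabla_\pi R_{\mu\nu\rho\lambda} + \nabla_\mu R_{\nu\pi\rho\lambda} + \nabla_\nu R_{\pi\mu\rho\lambda} = 0, \]

which we see is just a cyclic interchange of the first three indices of the whole expression. That is,

\[ \nabla_{(\pi}R_{\mu\nu)\rho\lambda} = 0. \]

Hence, we have arrived at our result. The Bianchi identity is that

\[ \nabla_\pi R_{\mu\nu\rho\lambda} + \nabla_\nu R_{\pi\mu\rho\lambda} + \nabla_\mu R_{\nu\pi\rho\lambda} = 0. \tag{1.4.5} \]

The Bianchi identity is in fact the equivalent of the rectangular round-trip expression we derived above. The Bianchi identity will come about if one considers the difference in orientation of a vector being parallel-transported around a cuboid.

Although we derived the Bianchi identity with the Riemann tensor in a LIF, the expression is completely valid in all frames. This is because the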


Riemann tensor is a tensor; and a tensor equation has the same form in all frames. Thus, one begins to see the power of getting expressions into tensorial form, and of the local inertial frame.

1.4.5 The Einstein Tensor

Now, let us take the Bianchi identity,

\[ \nabla_\pi R_{\mu\nu\rho\lambda} + \nabla_\nu R_{\pi\mu\rho\lambda} + \nabla_\mu R_{\nu\pi\rho\lambda} = 0. \]

Now, let us figure out how to contract this expression, so that we have Ricci tensors, rather than Riemann tensors. Now, if we multiply the expression by $g^{\mu\lambda}$, then we will have achieved our goal (one can see that the indices of this metric are those on the first and last parts of the first Riemann tensor). However, let us do this methodically. So, the first expression will read,

\[ g^{\mu\lambda}\nabla_\pi R_{\mu\nu\rho\lambda} = -g^{\mu\lambda}\nabla_\pi R_{\mu\nu\lambda\rho} = -\nabla_\pi R_{\nu\rho}, \]

after using the anti-symmetry of the last two indices of the Riemann tensor. The second term can be rewritten, using the symmetry under simultaneous interchange within the first pair and within the last pair of indices of the Riemann tensor;

\[ g^{\mu\lambda}\nabla_\nu R_{\pi\mu\rho\lambda} = g^{\mu\lambda}\nabla_\nu R_{\mu\pi\lambda\rho} = \nabla_\nu R_{\pi\rho}. \]

Lastly, the final term of the contracted Bianchi identity is just

\[ g^{\mu\lambda}\nabla_\mu R_{\nu\pi\rho\lambda} = \nabla^\lambda R_{\nu\pi\rho\lambda}. \]

Therefore, putting these all together, our contracted Bianchi identity looks like

\[ \nabla_\nu R_{\pi\rho} - \nabla_\pi R_{\nu\rho} + \nabla^\lambda R_{\nu\pi\rho\lambda} = 0. \]

Now, multiplying this whole expression by $g^{\nu\rho}$ will contract the last Riemann tensor into a Ricci tensor; as well as the middle Ricci tensor into a Ricci scalar. Thus,

\[ g^{\nu\rho}\nabla_\nu R_{\pi\rho} - g^{\nu\rho}\nabla_\pi R_{\nu\rho} + g^{\nu\rho}\nabla^\lambda R_{\nu\pi\rho\lambda} = 0, \]

\[ \Rightarrow\quad \nabla^\rho R_{\pi\rho} - \nabla_\pi R + \nabla^\lambda R_{\pi\lambda} = 0. \]

Now, the first and last expressions are identical, as we can relabel the dummy indices at will. Therefore, we have

\[ 2\nabla^\rho R_{\pi\rho} - \nabla_\pi R = 0. \]

Then, notice that we can write

\[ 2\nabla^\rho R_{\pi\rho} - g_{\pi\rho}\nabla^\rho R = 0, \]


and therefore that

\[ \nabla^\rho\left(R_{\pi\rho} - \tfrac{1}{2}g_{\pi\rho}R\right) = 0. \]

Therefore, we can define the Einstein tensor,

\[ G_{\mu\nu} \equiv R_{\mu\nu} - \tfrac{1}{2}g_{\mu\nu}R, \tag{1.4.6} \]

whereby

\[ \nabla^\mu G_{\mu\nu} = 0; \tag{1.4.7} \]

after noting that the metric, Ricci & therefore the Einstein tensor are symmetric. This is called the contracted Bianchi identity.

1.4.6 Geodesic Deviation

Suppose we take, on flat space, two affinely parameterised geodesics $x^\mu(\tau), y^\mu(\tau)$, that are on a collision course. That is, the distance between the two lines,

\[ \delta^\mu(\tau) \equiv x^\mu(\tau) - y^\mu(\tau), \]

decreases. On flat space, the distance will decrease linearly. That is,

\[ \frac{d\delta^\mu}{d\tau} = \text{const} \quad\Rightarrow\quad \frac{d^2\delta^\mu}{d\tau^2} = 0. \]

Now consider a curved space. Let the paths have tangent vectors $T^\mu$. Then, the distance between the two won't decrease linearly. Instead, they will accelerate; thus

\[ \frac{D^2\delta^\mu}{D\tau^2} = R^\mu{}_{\alpha\beta\rho}T^\alpha T^\beta\delta^\rho. \tag{1.4.8} \]

To imagine this in a physical situation, consider two balls falling towards the centre of the earth. Now, the balls will obviously move towards each other, as their motion is radial. However, there will be deviation from radial, and that deviation will be due to the curvature of space. That is, one will observe the balls accelerate towards each other (rather than the expected linear motion towards each other).

Derivation. We can derive the geodesic deviation equation by considering the 2-dim manifold swept out by two affinely parameterised geodesics, next to each other. The manifold may be parameterised by $x^\mu = x^\mu(\tau, \sigma)$ (i.e. two coordinates on this surface, rather than the usual one, on a curve). The tangent vectors are

\[ T^\mu = \frac{dx^\mu}{d\tau}, \qquad \delta^\mu = \frac{dx^\mu}{d\sigma}. \]


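Now, let us show a useful relation. Consider

\begin{align*}
T^\mu\nabla_\mu\delta^\nu &= T^\mu\left(\partial_\mu\delta^\nu + \Gamma^\nu_{\mu\lambda}\delta^\lambda\right) \\
&= \frac{dx^\mu}{d\tau}\frac{\partial}{\partial x^\mu}\frac{dx^\nu}{d\sigma} + \Gamma^\nu_{\mu\lambda}T^\mu\delta^\lambda.
\end{align*}

Now, the first term can be rewritten

\begin{align*}
\frac{dx^\mu}{d\tau}\frac{\partial}{\partial x^\mu}\frac{dx^\nu}{d\sigma} &= \frac{d^2x^\nu}{d\tau\,d\sigma} \\
&= \frac{d^2x^\nu}{d\sigma\,d\tau} \\
&= \frac{dx^\mu}{d\sigma}\frac{\partial}{\partial x^\mu}\frac{dx^\nu}{d\tau}.
\end{align*}

Hence, we use this to see that

\begin{align*}
T^\mu\nabla_\mu\delta^\nu &= \frac{dx^\mu}{d\sigma}\frac{\partial}{\partial x^\mu}\frac{dx^\nu}{d\tau} + \Gamma^\nu_{\mu\lambda}T^\mu\delta^\lambda \\
&= \delta^\mu\partial_\mu T^\nu + \Gamma^\nu_{\lambda\mu}T^\lambda\delta^\mu \\
&= \delta^\mu\nabla_\mu T^\nu,
\end{align*}

where we have merely used the symmetry of the Christoffel symbol. Hence, we have the relation

\[ T^\mu\nabla_\mu\delta^\nu = \delta^\mu\nabla_\mu T^\nu. \tag{1.4.9} \]

Now, let us state the operator

\[ \frac{D^2}{D\tau^2} = T^\alpha\nabla_\alpha T^\beta\nabla_\beta, \]

and compute its action upon $\delta^\mu$,

\[ \frac{D^2\delta^\mu}{D\tau^2} = T^\alpha\nabla_\alpha\left(T^\beta\nabla_\beta\delta^\mu\right). \]

Now, we use our relation (1.4.9), to see that

\[ \frac{D^2\delta^\mu}{D\tau^2} = T^\alpha\nabla_\alpha\left(\delta^\beta\nabla_\beta T^\mu\right). \]

Let us now expand this out,

\[ \frac{D^2\delta^\mu}{D\tau^2} = T^\alpha\nabla_\alpha\delta^\beta\,\nabla_\beta T^\mu + T^\alpha\delta^\beta\nabla_\alpha\nabla_\beta T^\mu. \]

Now, we can rewrite the two-covariant derivatives term on the far RHS, using the Ricci identity,

\[ [\nabla_\mu, \nabla_\nu]A^\rho = R^\rho{}_{\lambda\mu\nu}A^\lambda, \]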


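so that we have

\begin{align*}
\frac{D^2\delta^\mu}{D\tau^2} &= T^\alpha\nabla_\alpha\delta^\beta\,\nabla_\beta T^\mu + T^\alpha\delta^\beta R^\mu{}_{\lambda\alpha\beta}T^\lambda + T^\alpha\delta^\beta\nabla_\beta\nabla_\alpha T^\mu \\
&= \delta^\alpha\nabla_\alpha T^\beta\,\nabla_\beta T^\mu + T^\beta\delta^\alpha\nabla_\alpha\nabla_\beta T^\mu + R^\mu{}_{\lambda\alpha\beta}T^\alpha T^\lambda\delta^\beta \\
&= \delta^\alpha\left(\nabla_\alpha T^\beta\,\nabla_\beta T^\mu + T^\beta\nabla_\alpha\nabla_\beta T^\mu\right) + R^\mu{}_{\lambda\alpha\beta}T^\alpha T^\lambda\delta^\beta.
\end{align*}

In the first step we used our relation (1.4.9) on the first term, and changed dummy indices on the third term, then we merely factorised the expression. Now, notice that the bracketed term can be written

\[ \nabla_\alpha T^\beta\,\nabla_\beta T^\mu + T^\beta\nabla_\alpha\nabla_\beta T^\mu = \nabla_\alpha\left(T^\beta\nabla_\beta T^\mu\right), \]

but the bracketed part on the RHS is zero on an affinely parameterised geodesic. Hence,

\[ \frac{D^2\delta^\mu}{D\tau^2} = R^\mu{}_{\lambda\alpha\beta}T^\alpha T^\lambda\delta^\beta, \]

or, trivially relabelling indices (and using the symmetry of $T^\alpha T^\lambda$), we arrive at our equation for geodesic deviation

\[ \frac{D^2\delta^\mu}{D\tau^2} = R^\mu{}_{\alpha\beta\rho}T^\alpha T^\beta\delta^\rho. \tag{1.4.10} \]

1.5 Einstein’s Equation

We almost have enough tools to be able to write Einstein’s equation.

We have seen that freely-falling particles follow geodesics. In curved spacetime, the geodesics will probably be curves. So then, what makes the spacetime curved?

If we consider electromagnetic theory, there is a source for the electric field: electric charge (carried, for example, by the electron). For a field, there is a source. Therefore, we need a source term that will curve spacetime. We shall now discuss a term that is the “gravitation source term”.

1.5.1 The Energy Momentum Tensor $T^{\mu\nu}$

We shall start by stating that there exists a tensor $T^{\mu\nu}$, which is symmetric. That is, $T^{\mu\nu} = T^{\nu\mu}$. Furthermore, we shall state that the components of this tensor contain all possible forms of energy and momentum (it will be this tensor which is the source-term). Let us state how to compute a given component of the tensor.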


A given element $T^{\mu\nu}$ is the flux of $p^\mu$ that goes through the hypersurface $x^\nu = \text{const}$.

The structure of the tensor is clearly

\[ (T^{\mu\nu}) = \begin{pmatrix} T^{tt} & T^{ti} \\ T^{it} & T^{ij} \end{pmatrix}. \]

Also, before we start to compute the components of the tensor, we must state that the full 4-volume is just $\Delta t\,\Delta x\,\Delta y\,\Delta z$.

1.5.1.1 Components of $T^{\mu\nu}$

Let's consider $T^{00} = T^{tt}$. Then, by our definition, that element is the flux of $p^0$ through the surface $x^0 = \text{const}$. Now, $p^0 = E$ and $x^0 = t$. Therefore, we see that $T^{tt}$ is the flux of energy $E$ through a 3-volume $\Delta x\,\Delta y\,\Delta z$ (it is the hypersurface that holds $x^0 = t$ constant). Therefore,

\[ T^{tt} = \frac{E}{\Delta x\,\Delta y\,\Delta z} \equiv \varepsilon, \]

that is, the energy per unit volume, the energy density $\varepsilon$.

Consider the component $T^{01} = T^{tx}$. Then, we see that it is the flux of $p^0 = E$ through the hypersurface $x^1 = x = \text{const}$. That is,

\[ T^{01} = \frac{E}{\Delta t\,\Delta y\,\Delta z}, \]

which has the interpretation of being the energy flux through the $y$-$z$ plane, in unit time. This is easily extrapolated to any term $T^{0i}$: the energy flux through a surface, in unit time.

Now consider the purely-spatial components, $T^{ij}$. For example,

\[ T^{ix} = \frac{\Delta p^i}{\Delta t\,\Delta y\,\Delta z}. \]

Now, notice that we can rewrite this,

\[ T^{ix} = \frac{\Delta p^i/\Delta t}{A_{yz}}, \qquad A_{yz} \equiv \Delta y\,\Delta z; \]

where we have fairly obviously defined an area-element. A change in momentum per unit time is just a force. Thus,

\[ T^{ix} = \frac{F^i}{A_{yz}}, \]

which is a force per unit area: a pressure. Consider the specific component,

\[ T^{yx} = \frac{\Delta p^y/\Delta t}{\Delta y\,\Delta z}, \]


then, using the relation $p^i = v^iE$, we see that

\[ T^{yx} = \frac{\Delta v^y E/\Delta t}{\Delta y\,\Delta z}. \]

So, as $v^i = x^i/t$, we see that this is just

\[ T^{yx} = \frac{(\Delta y/\Delta t)E/\Delta t}{\Delta y\,\Delta z}. \]

Now then, as the $\Delta y$’s cancel, we can just replace them with $\Delta x$’s, thus

\begin{align*}
T^{yx} &= \frac{(\Delta x/\Delta t)E/\Delta t}{\Delta x\,\Delta z} \\
&= \frac{\Delta v^x E/\Delta t}{\Delta x\,\Delta z} \\
&= \frac{\Delta p^x/\Delta t}{\Delta x\,\Delta z} \\
&= T^{xy}.
\end{align*}

Therefore, with this little exercise, we see that the spatial components of the energy-momentum tensor are in fact symmetric. One may be able to see that the diagonal components of the spatial part, those $T^{ii}$, correspond to the force perpendicular to a surface. The off-diagonal elements are the force parallel to a surface (shear). Therefore, the spatial components, $T^{ij}$, are components of the stress-tensor.

The final part to the tensor are those components $T^{it}$. Thus, we see that they are the flow of $p^i$ through the hypersurface $t = \text{const}$. That is, the momentum flow in a given 3-volume, at a constant time. That is, how much momentum there exists in a unit volume, at a single time. This is clearly the momentum density,

\[ T^{it} = \frac{\Delta p^i}{\Delta x\,\Delta y\,\Delta z} \equiv \pi^i. \]

To see that $T^{it} = T^{ti}$, consider the above expression; writing $p^i = v^iE = x^iE/t$, then,

\[ T^{it} = \frac{(\Delta x^i/\Delta t)E}{\Delta x\,\Delta y\,\Delta z} = \frac{\Delta x^i E}{\Delta t\,\Delta x\,\Delta y\,\Delta z}. \]

Then, as $i$ is changed through $i = x, y, z$, different components on the denominator will be cancelled out, leaving only those in the corresponding $T^{ti}$.

Therefore, we have seen that the energy-momentum tensor $T^{\mu\nu}$ is symmetric, by considering its components. The colloquial construction of the


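tensor is thus

\[ (T^{\mu\nu}) = \begin{pmatrix} \text{energy density} & \text{energy flux} \\ \text{momentum density} & \text{stress tensor} \end{pmatrix}. \]

We shall write that $T^{it} = \pi^i$, $T^{tt} = \varepsilon$.

1.5.1.2 Conservation Equations

The standard conservation equation is that

\[ \nabla_\nu T^{\mu\nu} = 0. \tag{1.5.1} \]

Let us consider this in a LIF. Then, it simply reads $\partial_\nu T^{\mu\nu} = 0$.

Now, take the time-component, $\mu = 0 = t$. Then, the conservation equation reads

\[ \frac{\partial}{\partial t}T^{tt} + \frac{\partial}{\partial x^i}T^{ti} = 0, \]

which is just

\[ \frac{\partial\varepsilon}{\partial t} + \frac{\partial\pi^i}{\partial x^i} = 0. \]

This equation can be written

\[ \frac{\partial\varepsilon}{\partial t} + \nabla\cdot\boldsymbol{\pi} = 0, \]

and is the familiar continuity equation, for energy. Note that this is only valid in a LIF.

Let us take the spatial components, $\mu = i$, of the conservation equation. Thus,

\[ \frac{\partial}{\partial t}T^{it} + \frac{\partial}{\partial x^j}T^{ij} = 0. \]

Now, if we write the force density, in a given direction,

\[ \phi^i \equiv -\frac{\partial T^{ij}}{\partial x^j}, \]

then we see that the conservation equation reads

\[ \frac{\partial\pi^i}{\partial t} - \phi^i = 0, \]

which is just the statement that the rate of change of momentum density is the force density. This is the familiar statement of Newton’s second law. That is, the above equation is just

\[ \frac{\partial\boldsymbol{\pi}}{\partial t} = \boldsymbol{\phi}. \]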


Therefore, we see that the energy-momentum tensor Tµν contains allsources of energy and momentum, and satisfies basic conservation relations.

1.5.1.3 Perfect Fluids

A perfect fluid is defined to be one for which there is no viscosity or heat conduction. This “restriction” makes the energy-momentum tensor look a lot simpler.

That a fluid has no heat conduction means that there is no transfer of energy across surfaces. Viscous forces are those which are parallel to a surface (shear). Thus, the absence of such forces implies that all forces on surfaces are perpendicular to those surfaces.

Therefore, if we consider our previous “derivation” of the components of $T^{\mu\nu}$, we see that it must be diagonal. This is because

• No heat conduction implies no energy flux. Therefore, $T^{ti} = T^{it} = 0$.

• No viscosity means that all parallel forces are zero. This only leaves diagonal components to the stress tensor. All components left over are just pressures (as discussed previously), $P$.

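We shall change notation slightly, so that $\rho$ is the energy density (which is clearly the case, via $\varepsilon = \rho c^2$, with $c = 1$). Therefore, we see that for a perfect fluid, at rest,

\[
T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & P & 0 & 0 \\ 0 & 0 & P & 0 \\ 0 & 0 & 0 & P \end{pmatrix} = \mathrm{diag}(\rho, P, P, P).
\]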

The Perfect Fluid Tensor  The general expression for the energy-momentum tensor, for a perfect fluid in its LIF, is

\[
T^{\mu\nu} = (\rho + P) u^\mu u^\nu - P g^{\mu\nu}, \tag{1.5.2}
\]

where $u^\mu = \gamma(1, \mathbf{u})$ is the 4-velocity, $\rho$ the energy density and $P$ the pressure of the fluid. If we take this tensor, with the fluid at rest in its LIF, in flat space, then $u^\mu = (1, 0, 0, 0)$ and $g^{\mu\nu} = \eta^{\mu\nu} = \mathrm{diag}(1, -1, -1, -1)$, so that some of the components are

\[
T^{00} = (\rho + P) - P = \rho, \qquad T^{12} = 0, \qquad T^{ij} = P\,\delta^{ij}.
\]

In fact, all off-diagonal components are zero. Then, we see that we have recovered our previous expression for a perfect fluid at rest.
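As a quick numerical illustration of (1.5.2), the following is a minimal sketch in Python, assuming flat space with signature (+, −, −, −) and c = 1. The function names and the sample values of ρ, P and the velocity are our own choices for illustration, not part of the lectures.

```python
import math

# Inverse Minkowski metric eta^{mu nu}, signature (+, -, -, -), with c = 1.
ETA_INV = [[1.0, 0.0, 0.0, 0.0],
           [0.0, -1.0, 0.0, 0.0],
           [0.0, 0.0, -1.0, 0.0],
           [0.0, 0.0, 0.0, -1.0]]


def four_velocity(vx, vy, vz):
    """Return u^mu = gamma * (1, v) for a 3-velocity v with |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - (vx * vx + vy * vy + vz * vz))
    return [gamma, gamma * vx, gamma * vy, gamma * vz]


def perfect_fluid_tensor(rho, pressure, u):
    """T^{mu nu} = (rho + P) u^mu u^nu - P eta^{mu nu} in flat space."""
    return [[(rho + pressure) * u[m] * u[n] - pressure * ETA_INV[m][n]
             for n in range(4)] for m in range(4)]


def trace(tensor):
    """eta_{mu nu} T^{mu nu}; eta is diagonal, so only diagonal entries contribute."""
    return sum(ETA_INV[m][m] * tensor[m][m] for m in range(4))


# Illustrative values (assumptions): rho = 2, P = 0.5.
T_rest = perfect_fluid_tensor(2.0, 0.5, four_velocity(0.0, 0.0, 0.0))
T_moving = perfect_fluid_tensor(2.0, 0.5, four_velocity(0.6, 0.0, 0.0))
```

For the fluid at rest this reproduces diag(ρ, P, P, P). For the moving fluid the tensor acquires off-diagonal entries but stays symmetric, and its trace remains ρ − 3P.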


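We can easily recover some standard fluid mechanics results from the perfect fluid tensor. Suppose we have a non-relativistic pressure-less fluid, $P = 0$, then, the energy-conservation equation is just

\[
\partial_\mu(\rho u^\mu u^0) = 0,
\]

which, with $u^0 \approx 1$ for non-relativistic motion, easily becomes

\[
\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u^i)}{\partial x^i} = 0,
\]

which is just

\[
\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0.
\]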

1.5.2 Einstein’s Equation

We now have a source term. The sources of energy and momentum can be written “into” the energy-momentum tensor, $T^{\mu\nu}$; a tensor which satisfies the conservation equation.

Now, from the previous section’s contracted Bianchi identity,

\[
\nabla_\mu G^{\mu\nu} = 0, \qquad G^{\mu\nu} \equiv R^{\mu\nu} - \tfrac{1}{2} g^{\mu\nu} R,
\]

we have an expression which takes care of the geometry of the spacetime. Recall that the Ricci tensor/scalar are composed of differentials (of various orders) of the metric, where the metric gives meaning to distances within a manifold. Then, if we can equate this expression to an expression which gives information as to what is doing the curving, then we have our general theory of relativity. We must use an expression which also has zero covariant derivative.

The obvious choice is the energy-momentum tensor. Therefore, we write

\[
G_{\mu\nu} = \kappa T_{\mu\nu}.
\]

Therefore, up to a constant $\kappa$, we have a LHS which describes the geometry of a manifold, and a RHS which describes the distribution of all forms of energy and momentum in the manifold. Therefore, we say that the distribution of energy-momentum in a manifold causes the manifold to become curved.

Therefore, Einstein’s equation is

\[
R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R = \kappa T_{\mu\nu}. \tag{1.5.3}
\]

Notice that both sides have vanishing covariant derivative. We shall be able to find the constant $\kappa$ when we consider the Newtonian limit of the theory.


We can write this in an alternative form. Consider multiplying the whole expression by $g^{\mu\nu}$,

\[
g^{\mu\nu}\left(R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R\right) = \kappa g^{\mu\nu} T_{\mu\nu},
\]

then, writing the trace of the energy-momentum tensor $g^{\mu\nu} T_{\mu\nu} \equiv T$, and noting that the Ricci tensor becomes the Ricci scalar upon contraction, we see that

\[
R - \tfrac{1}{2} g^{\mu}{}_{\mu} R = \kappa T.
\]

Now, the metric multiplied by its inverse is just the Kronecker-delta. Thus, $g^{\mu}{}_{\mu} = \delta^{\mu}{}_{\mu} = 4$. Therefore,

\[
R - \tfrac{1}{2} \cdot 4R = \kappa T,
\]

hence,

\[
R = -\kappa T.
\]

Thus, we have written the Ricci scalar in terms of the trace of the energy-momentum tensor. Hence, we can write the Einstein equation as

\[
R_{\mu\nu} + \tfrac{1}{2} g_{\mu\nu} \kappa T = \kappa T_{\mu\nu},
\]

which is just

\[
R_{\mu\nu} = \kappa\left(T_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} T\right). \tag{1.5.4}
\]

This is an entirely equivalent form of Einstein’s equation.
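The equivalence of (1.5.3) and (1.5.4) can be checked numerically. The following is a small sketch, assuming a flat metric diag(1, −1, −1, −1) in place of $g_{\mu\nu}$ and an arbitrary symmetric $T_{\mu\nu}$ with made-up entries: we build $R_{\mu\nu}$ from (1.5.4), and confirm that $R = -\kappa T$ and that $R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R$ returns $\kappa T_{\mu\nu}$.

```python
import math

# Flat metric used in place of g_{mu nu}; it is its own inverse numerically.
ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, -1.0, 0.0, 0.0],
       [0.0, 0.0, -1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]


def trace(lower):
    """eta^{mu nu} A_{mu nu} for a tensor with two lower indices."""
    return sum(ETA[m][m] * lower[m][m] for m in range(4))


KAPPA = 8.0 * math.pi  # kappa = 8 pi G with G = 1, chosen for illustration

# An arbitrary symmetric T_{mu nu}; the numbers are made up for the check.
T = [[3.0, 0.2, 0.0, 0.1],
     [0.2, 1.0, 0.3, 0.0],
     [0.0, 0.3, 0.5, 0.4],
     [0.1, 0.0, 0.4, 0.7]]

# Alternative form (1.5.4): R_{mu nu} = kappa (T_{mu nu} - 1/2 g_{mu nu} T).
ricci = [[KAPPA * (T[m][n] - 0.5 * ETA[m][n] * trace(T)) for n in range(4)]
         for m in range(4)]
ricci_scalar = trace(ricci)

# Original form (1.5.3): G_{mu nu} = R_{mu nu} - 1/2 g_{mu nu} R.
einstein = [[ricci[m][n] - 0.5 * ETA[m][n] * ricci_scalar for n in range(4)]
            for m in range(4)]
```

Here `ricci_scalar` equals −κT and `einstein` equals κ times `T`, up to rounding.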

1.5.2.1 The Cosmological Constant

Now, if we require the covariant derivative of an expression to be zero, we may add on an “extra term”, a constant, which will not change the value of the covariant derivative. The covariant derivative of the metric is zero, so we may add on any number of metrics and retain zero covariant derivative. Therefore,

\[
G_{\mu\nu} = \kappa T_{\mu\nu} + \Lambda g_{\mu\nu}
\]

is still consistent with zero covariant derivative. So, why are we free to do this here? Consider the expression, from electrodynamic theory, in a LIF,

\[
\partial_\mu F^{\mu\nu} = J^\nu.
\]

Then, consider taking the differential of the expression,

\[
\partial_\nu \partial_\mu F^{\mu\nu} = \partial_\nu J^\nu = 0;
\]

where the equality with zero comes from the “usual” conservation equation. Now, consider that we try to add on an extra term,

\[
\partial_\mu\left(F^{\mu\nu} + \Lambda \eta^{\mu\nu}\right).
\]

Then, these two expressions are not the same. That is, we do not have the freedom to modify the field tensor by adding on an arbitrary quantity of metrics. The reason we are not able to do this is that the field tensor is anti-symmetric, and the metric is symmetric.

Therefore, the reason we are able to add the constant metric term into Einstein’s equation is that both the Einstein tensor $G_{\mu\nu}$ and energy-momentum tensor $T_{\mu\nu}$ are symmetric (as is the metric); as well as the metric having zero covariant derivative.

The cosmological constant $\Lambda$ has been measured to exist within the universe, having a very small numerical value. We shall usually define the cosmological constant within the energy-momentum tensor, so that we will essentially ignore it. However, it is to be understood that the term is within the energy-momentum tensor.

1.5.3 The Newtonian Limit

Let us discuss the correspondences of the theory of gravity on curved spacetime with Newtonian gravity.

The equation of motion of a free particle, in Newton’s theory, is just given by Newton’s second law of motion,

\[
\frac{d^2 x^i}{dt^2} = -\delta^{ij} \frac{\partial \Phi}{\partial x^j},
\]

where $\Phi$ is the gravitational potential a particle feels. The corresponding equation of motion for a free particle, in curved spacetime, is the geodesic

\[
\frac{d^2 x^\mu}{d\tau^2} = -\Gamma^{\mu}{}_{\alpha\beta} \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau},
\]

where the Christoffel symbol $\Gamma^{\mu}{}_{\alpha\beta}$ contains information about the geometry of the spacetime.

The field equation, which describes how “stuff” generates the gravitational field, for the Newtonian theory is

\[
\nabla^2 \Phi = 4\pi G \rho_m.
\]

That is, Poisson’s equation. This equation tells us that for some mass density $\rho_m$, there is an associated gravitational potential $\Phi$. Combined with the equation of motion, we see that a mass density gives rise to a gravitational potential, which affects how a free particle moves.

The field equation in general relativity, is Einstein’s equation,

\[
R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R = \kappa T_{\mu\nu}.
\]

Some distribution of energy and momentum, defined within the energy-momentum tensor, gives rise to a different geometry. This geometrical information is then carried around by the Ricci tensor, within the metric. The metric then gives the Christoffel symbols, which change the equation of motion, the geodesic.

The basic correspondence is

\[
g_{\mu\nu} \longleftrightarrow \Phi, \qquad T_{\mu\nu} \longleftrightarrow \rho_m.
\]

Notice that we have only been referring to “free-particles”. A free-particle is one which does not have any external influences on its motion. For example, this could mean a stone being dropped, in vacuum, from a building. The stone’s motion is only affected by the gravitational potential from the earth. Notice then, that the motion of a freely-falling particle in a curved spacetime is entirely due to the spacetime through which it moves. That is, its trajectory will be curved because of the geometry of the spacetime.

To modify these equations for a particle which is acted upon by an external force, $F_{\rm ext}$, one must merely add this to each component of the equation of motion.

1.5.3.1 Newtonian Gravity from Einstein’s Gravity

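Let us consider the geodesic equation, for a free particle,

\[
\frac{d^2 x^\mu}{d\tau^2} + \Gamma^{\mu}{}_{\alpha\beta} \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau} = 0.
\]

Now, let us consider the non-relativistic limit of this geodesic.

Firstly, for non-relativistic motion, $\tau = t$. Second, $dx^i/dt \ll 1$. Then, we can write the geodesic equation as

\[
\frac{d^2 x^\mu}{d\tau^2} + \Gamma^{\mu}{}_{00} \left(\frac{dt}{d\tau}\right)^2 + O\!\left(\frac{dx^i}{d\tau}\right)^2 = 0,
\]

which is just

\[
\frac{d^2 x^\mu}{d\tau^2} + \Gamma^{\mu}{}_{00} = 0.
\]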

Now then, to continue, we make an assumption about the metric. We say that the metric is Minkowskian, with a small perturbation,

\[
g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}, \qquad |h_{\mu\nu}| \ll 1.
\]

We shall only work to first order in the perturbation. That is, we shall neglect any terms $O(h^2)$. We further say that the perturbation is static. That is, $h_{\mu\nu} = h_{\mu\nu}(x^i)$ only; which immediately tells us that

\[
\partial_0 h_{\mu\nu} = \partial_t h_{\mu\nu} = 0.
\]

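Now, the general expression for the Christoffel symbol is

\[
\Gamma^{\rho}{}_{\alpha\beta} = \tfrac{1}{2} g^{\rho\nu}\left(\partial_\alpha g_{\beta\nu} + \partial_\beta g_{\nu\alpha} - \partial_\nu g_{\alpha\beta}\right).
\]

Then, the components that we are interested in are just

\[
\Gamma^{\mu}{}_{00} = \tfrac{1}{2} \sum_\nu g^{\mu\nu}\left(\partial_0 g_{0\nu} + \partial_0 g_{\nu 0} - \partial_\nu g_{00}\right).
\]

Now, as the time-differential of the metric is zero, all but the last term is zero. We shall also drop the explicit summation sign;

\[
\Gamma^{\mu}{}_{00} = -\tfrac{1}{2} g^{\mu\nu} \partial_\nu g_{00}.
\]

We shall now drop the greek index on the RHS, and only use roman. This is because the time-differential of the metric is zero. Thus,

\[
\Gamma^{\mu}{}_{00} = -\tfrac{1}{2} g^{\mu i} \partial_i g_{00}.
\]

Inserting our expression for the metric,

\[
\begin{aligned}
\Gamma^{\mu}{}_{00} &= -\tfrac{1}{2} (\eta^{\mu i} - h^{\mu i}) \partial_i (\eta_{00} + h_{00}) \\
&= -\tfrac{1}{2} \left(\eta^{\mu i} \partial_i h_{00} - h^{\mu i} \partial_i h_{00}\right).
\end{aligned}
\]

Now, the expression on the far right is $O(h^2)$; thus, we ignore it. Therefore,

\[
\Gamma^{\mu}{}_{00} = -\tfrac{1}{2} \eta^{\mu i} \partial_i h_{00}.
\]

Finally, recall that the Minkowski metric is diagonal. Therefore, we only have a contribution for $\mu = i$. Therefore, as $\eta^{ii} = -1$, to first order in the static perturbation,

\[
\Gamma^{i}{}_{00} = \tfrac{1}{2} \partial_i h_{00}, \qquad \Gamma^{0}{}_{00} = 0.
\]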


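Therefore, the geodesic equation is

\[
\frac{d^2 t}{d\tau^2} = 0, \qquad \frac{d^2 x^i}{d\tau^2} + \tfrac{1}{2} \partial_i h_{00} = 0.
\]

Now, we use the first expression to tell us that $dt = A\,d\tau$. We then set $A = 1$, to see that $t = \tau$. Therefore, the second expression is just

\[
\frac{d^2 x^i}{dt^2} + \tfrac{1}{2} \partial_i h_{00} = 0.
\]

Writing this as a vector equation, this is just

\[
\frac{d^2 \mathbf{x}}{dt^2} + \tfrac{1}{2} \nabla h_{00} = 0,
\]

trivially rewriting results in

\[
\frac{d^2 \mathbf{x}}{dt^2} = -\tfrac{1}{2} \nabla h_{00}.
\]

Now then, recall the Newtonian equation,

\[
\frac{d^2 \mathbf{x}}{dt^2} = -\nabla \Phi(\mathbf{x}).
\]

Then, we can read off the correspondence,

\[
h_{00} = 2\Phi.
\]

Finally, as the metric is just $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$, then

\[
g_{00} = 1 + 2\Phi.
\]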

Therefore, we see that the time-component of a static perturbation to the Minkowski metric is (twice) the gravitational potential.
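To get a feel for how small this perturbation is, the following sketch evaluates $h_{00} = 2\Phi/c^2$ at the surface of the Earth, with $\Phi = -GM/r$. Restoring the factor of $c^2$ and the rounded SI values of the constants are our own additions; the notes work with c = 1.

```python
# Rounded SI constants (assumed values, not taken from the notes).
G_NEWTON = 6.674e-11   # m^3 kg^-1 s^-2
C_LIGHT = 2.998e8      # m s^-1
M_EARTH = 5.972e24     # kg
R_EARTH = 6.371e6      # m


def h00(mass, radius):
    """Dimensionless perturbation h_00 = 2 Phi / c^2 with Phi = -G M / r."""
    phi = -G_NEWTON * mass / radius
    return 2.0 * phi / C_LIGHT ** 2


h00_earth = h00(M_EARTH, R_EARTH)
g00_earth = 1.0 + h00_earth
```

The result is of order $10^{-9}$, so neglecting terms of $O(h^2)$ is an excellent approximation near the Earth.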

Recall the Riemann tensor, in a LIF,

\[
R^{\rho}{}_{\lambda\mu\nu} = \partial_\mu \Gamma^{\rho}{}_{\lambda\nu} - \partial_\nu \Gamma^{\rho}{}_{\lambda\mu},
\]

and thus the Ricci tensor,

\[
R_{\lambda\nu} = R^{\rho}{}_{\lambda\rho\nu} = \partial_\rho \Gamma^{\rho}{}_{\lambda\nu} - \partial_\nu \Gamma^{\rho}{}_{\lambda\rho}.
\]

Now, let us compute the component $R_{00}$. Then,

\[
R_{00} = \partial_\rho \Gamma^{\rho}{}_{00} - \partial_0 \Gamma^{\rho}{}_{0\rho},
\]

noting that

\[
\Gamma^{i}{}_{00} = \tfrac{1}{2} \partial_i h_{00}, \qquad \Gamma^{0}{}_{00} = 0,
\]

we then see that

\[
\begin{aligned}
R_{00} &= \partial_i \Gamma^{i}{}_{00} \\
&= \tfrac{1}{2} \partial_i \partial_i h_{00} \\
&= \tfrac{1}{2} \nabla^2 h_{00}.
\end{aligned}
\]

Further recall that we just derived that $h_{00} = 2\Phi$, then

\[
R_{00} = \nabla^2 \Phi.
\]

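Now then, we are now in a position to compute the constant $\kappa$ in Einstein’s field equation. Let us use the alternative form of the field equation, and take the “00” components;

\[
R_{00} = \kappa\left(T_{00} - \tfrac{1}{2} g_{00} T\right).
\]

Now, the trace $T$ is just

\[
T \equiv g^{\mu\nu} T_{\mu\nu} = T^{\mu}{}_{\mu}.
\]

Therefore, for a source whose only significant component is $T_{00}$,

\[
\begin{aligned}
g_{00} T &= (\eta_{00} + h_{00})(\eta^{00} - h^{00}) T_{00} \\
&= (\eta_{00}\eta^{00} - \eta_{00} h^{00} + \eta^{00} h_{00} - h_{00} h^{00}) T_{00} \\
&= T_{00} + O(h^2).
\end{aligned}
\]

Let us suppose that the field is generated by a static, non-relativistic body, of mass density $\rho_m$. Then, $T_{00} = \rho_m$. Therefore, the field equation becomes

\[
R_{00} = \kappa\left(\rho_m - \tfrac{1}{2} \rho_m\right) = \tfrac{1}{2} \kappa \rho_m.
\]

Now, we have the Poisson equation, $\nabla^2 \Phi = 4\pi G \rho_m$, and also that $R_{00} = \nabla^2 \Phi$. Therefore, equating the two,

\[
\nabla^2 \Phi = 4\pi G \rho_m = \tfrac{1}{2} \kappa \rho_m,
\]

we see that

\[
\kappa = 8\pi G.
\]

Therefore, the full field equation is

\[
R_{\mu\nu} = 8\pi G\left(T_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} T\right). \tag{1.5.5}
\]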
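For orientation, we can put a number on the coupling constant. The notes use c = 1; restoring factors of c gives $\kappa = 8\pi G / c^4$ in SI units, which is our own addition here. A minimal sketch with rounded constants:

```python
import math

G_NEWTON = 6.674e-11   # m^3 kg^-1 s^-2 (rounded)
C_LIGHT = 2.998e8      # m s^-1 (rounded)

# kappa = 8 pi G / c^4, in units of s^2 m^-1 kg^-1.
kappa_si = 8.0 * math.pi * G_NEWTON / C_LIGHT ** 4
```

The value is of order $10^{-43}$ in SI units, which is one way of seeing that a very large energy density is needed to produce appreciable curvature.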


1.5.4 Linearised Gravity

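Let us take our perturbed metric,

\[
g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}, \qquad g^{\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu},
\]

where $|h_{\mu\nu}| \ll 1$. Now then, notice that

\[
\begin{aligned}
g^{\mu\nu} g_{\nu\lambda} &= (\eta^{\mu\nu} - h^{\mu\nu})(\eta_{\nu\lambda} + h_{\nu\lambda}) \\
&= \eta^{\mu\nu}\eta_{\nu\lambda} + \eta^{\mu\nu} h_{\nu\lambda} - h^{\mu\nu}\eta_{\nu\lambda} - h^{\mu\nu} h_{\nu\lambda} \\
&= \delta^{\mu}{}_{\lambda} + O(h^2).
\end{aligned}
\]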
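The claim that $\eta^{\mu\nu} - h^{\mu\nu}$ is the inverse metric up to $O(h^2)$ is easy to test numerically. The sketch below is an illustration only: it uses a random symmetric perturbation of size `eps` (our own construction), raises its indices with $\eta$, and measures how far the product $g^{\mu\nu} g_{\nu\lambda}$ is from the identity.

```python
import random

ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, -1.0, 0.0, 0.0],
       [0.0, 0.0, -1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def max_deviation_from_identity(eps, seed=0):
    """Largest entry of |g^{mu nu} g_{nu lambda} - delta| for a perturbation of size eps."""
    rng = random.Random(seed)
    h_lower = [[0.0] * 4 for _ in range(4)]
    for m in range(4):
        for n in range(m, 4):
            h_lower[m][n] = h_lower[n][m] = eps * rng.uniform(-1.0, 1.0)
    # Raise both indices with eta: h^{mu nu} = eta^{mu a} h_{ab} eta^{b nu}.
    h_upper = matmul(matmul(ETA, h_lower), ETA)
    g_lower = [[ETA[m][n] + h_lower[m][n] for n in range(4)] for m in range(4)]
    g_upper = [[ETA[m][n] - h_upper[m][n] for n in range(4)] for m in range(4)]
    product = matmul(g_upper, g_lower)
    return max(abs(product[m][n] - (1.0 if m == n else 0.0))
               for m in range(4) for n in range(4))


dev_small = max_deviation_from_identity(1e-3)
dev_large = max_deviation_from_identity(1e-2)
```

Making the perturbation ten times larger makes the deviation about a hundred times larger, as expected of a quantity that is second order in h.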

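Now, consider a coordinate transformation,

\[
x^\mu \mapsto x'^\mu = x^\mu + \epsilon^\mu, \qquad x^\mu = x'^\mu - \epsilon^\mu.
\]

Then, the Jacobians are clearly

\[
J^{\mu}{}_{\nu} = \delta^{\mu}{}_{\nu} + \partial_\nu \epsilon^\mu, \qquad \left(J^{-1}\right)^{\mu}{}_{\nu} = \delta^{\mu}{}_{\nu} - \partial_\nu \epsilon^\mu.
\]

The $\epsilon^\mu \ll 1$. So, we work to first order in $\epsilon^\mu$ only. Now then, let us consider the transformation of the metric,

\[
g'_{\mu\nu} = \left(J^{-1}\right)^{\alpha}{}_{\mu} \left(J^{-1}\right)^{\beta}{}_{\nu}\, g_{\alpha\beta}.
\]

Then, using our Jacobians for the coordinate transformation, this becomes

\[
\begin{aligned}
g'_{\mu\nu} &= (\delta^{\alpha}{}_{\mu} - \partial_\mu \epsilon^\alpha)(\delta^{\beta}{}_{\nu} - \partial_\nu \epsilon^\beta)\, g_{\alpha\beta} \\
&= (\delta^{\alpha}{}_{\mu}\delta^{\beta}{}_{\nu} - \delta^{\alpha}{}_{\mu}\partial_\nu \epsilon^\beta - \partial_\mu \epsilon^\alpha \delta^{\beta}{}_{\nu} + \partial_\mu \epsilon^\alpha \partial_\nu \epsilon^\beta)\, g_{\alpha\beta} \\
&= g_{\mu\nu} - \partial_\nu \epsilon^\beta g_{\mu\beta} - \partial_\mu \epsilon^\alpha g_{\alpha\nu} + O(\epsilon^2).
\end{aligned}
\]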

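Now then, notice that by the product rule,

\[
\partial_\nu \epsilon_\mu = \partial_\nu (g_{\mu\beta} \epsilon^\beta) = \epsilon^\beta \partial_\nu g_{\mu\beta} + g_{\mu\beta} \partial_\nu \epsilon^\beta,
\]

and therefore that

\[
g_{\mu\beta} \partial_\nu \epsilon^\beta = \partial_\nu \epsilon_\mu - \epsilon^\beta \partial_\nu g_{\mu\beta}.
\]

Hence, using this, we see that the transformation of the metric looks like

\[
g'_{\mu\nu} = g_{\mu\nu} - \partial_\nu \epsilon_\mu - \partial_\mu \epsilon_\nu + \epsilon^\beta \partial_\nu g_{\mu\beta} + \epsilon^\alpha \partial_\mu g_{\alpha\nu}.
\]

Now, the last two terms on the right are both $O(\epsilon^2)$. This is because the derivative of the metric is just the derivative of the perturbation, $\partial g = \partial h$, which we take to be of $O(\epsilon)$, and therefore $\epsilon$ times the derivative of the metric is $O(\epsilon^2)$. Therefore,

\[
g'_{\mu\nu} = g_{\mu\nu} - \partial_\nu \epsilon_\mu - \partial_\mu \epsilon_\nu + O(\epsilon^2).
\]

Now, using the fact that $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$, and $\eta_{\mu\nu} = \eta'_{\mu\nu}$, then the above is just

\[
\eta_{\mu\nu} + h'_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} - \partial_\nu \epsilon_\mu - \partial_\mu \epsilon_\nu,
\]

which simply becomes

\[
h'_{\mu\nu} = h_{\mu\nu} - \partial_\nu \epsilon_\mu - \partial_\mu \epsilon_\nu. \tag{1.5.6}
\]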

1.5.4.1 Linearising Einstein’s Equation

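Now, recall that Einstein’s equation was composed of the Ricci tensor and the energy-momentum tensor. Now, the Ricci tensor was composed of derivatives of the Christoffel symbol, which in turn contained derivatives of the metric. Now, we can recompute the Einstein equation under the coordinate transformation defined above as

\[
x^\mu \mapsto x'^\mu = x^\mu + \epsilon^\mu \quad \Rightarrow \quad h'_{\mu\nu} = h_{\mu\nu} - \partial_\nu \epsilon_\mu - \partial_\mu \epsilon_\nu.
\]

So, consider

\[
\partial_\nu g_{\alpha\beta} = \partial_\nu (\eta_{\alpha\beta} + h_{\alpha\beta}) = \partial_\nu h_{\alpha\beta}.
\]

Therefore, the Christoffel symbol, defined as

\[
\Gamma^{\rho}{}_{\alpha\beta} = \tfrac{1}{2} g^{\rho\nu}\left(\partial_\alpha g_{\beta\nu} + \partial_\beta g_{\nu\alpha} - \partial_\nu g_{\alpha\beta}\right),
\]

becomes, to first order in $h$,

\[
\Gamma^{\rho}{}_{\alpha\beta} = \tfrac{1}{2} \eta^{\rho\nu}\left(\partial_\alpha h_{\beta\nu} + \partial_\beta h_{\nu\alpha} - \partial_\nu h_{\alpha\beta}\right).
\]

Now, the Ricci tensor has components such as the product of two Christoffel symbols. It is clear that these will be $O(h^2)$, and therefore negligible. Hence, the Ricci tensor would look like

\[
R_{\mu\nu} = \partial_\rho \Gamma^{\rho}{}_{\mu\nu} - \partial_\nu \Gamma^{\rho}{}_{\mu\rho}.
\]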

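Then, plugging in our Christoffel symbols,

\[
R_{\mu\nu} = \tfrac{1}{2} \eta^{\rho\sigma}\left(\partial_\rho\partial_\mu h_{\nu\sigma} + \partial_\rho\partial_\nu h_{\sigma\mu} - \partial_\rho\partial_\sigma h_{\mu\nu} - \partial_\nu\partial_\mu h_{\rho\sigma} - \partial_\nu\partial_\rho h_{\sigma\mu} + \partial_\nu\partial_\sigma h_{\mu\rho}\right).
\]

This becomes, after noting that partial derivatives commute, the Minkowski metric commutes with partial derivatives and that the second and fifth terms cancel,

\[
2R_{\mu\nu} = \partial^\sigma\partial_\mu h_{\nu\sigma} - \partial^\rho\partial_\rho h_{\mu\nu} - \partial_\nu\partial_\mu h^{\rho}{}_{\rho} + \partial^\rho\partial_\nu h_{\mu\rho}.
\]

Now, $h^{\rho}{}_{\rho} \equiv h$, and changing the $\sigma$ index on the first expression to a $\rho$,

\[
R_{\mu\nu} = \tfrac{1}{2}\left(\partial^\rho\partial_\mu h_{\nu\rho} + \partial^\rho\partial_\nu h_{\mu\rho} - \partial^\rho\partial_\rho h_{\mu\nu} - \partial_\nu\partial_\mu h\right).
\]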


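Then, the Ricci scalar is

\[
\begin{aligned}
R &= g^{\mu\nu} R_{\mu\nu} \\
&= \eta^{\mu\nu} R_{\mu\nu} \\
&= \tfrac{1}{2}\left(\partial^\rho\partial^\nu h_{\nu\rho} + \partial^\rho\partial^\mu h_{\mu\rho} - \partial^\rho\partial_\rho h - \partial^\nu\partial_\nu h\right) \\
&= \partial^\rho\partial^\nu h_{\nu\rho} - \partial^\nu\partial_\nu h.
\end{aligned}
\]

Now then, the Einstein tensor is defined as

\[
G_{\mu\nu} \equiv R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R.
\]

Therefore, using our linearised Ricci tensor and scalar,

\[
G_{\mu\nu} = \tfrac{1}{2}\left(\partial^\rho\partial_\mu h_{\nu\rho} + \partial^\rho\partial_\nu h_{\mu\rho} - \partial^\rho\partial_\rho h_{\mu\nu} - \partial_\nu\partial_\mu h - \eta_{\mu\nu}\partial^\sigma\partial^\pi h_{\pi\sigma} + \eta_{\mu\nu}\partial^\rho\partial_\rho h\right). \tag{1.5.7}
\]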

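Now, let us define

\[
\bar{h}_{\mu\nu} \equiv h_{\mu\nu} - \tfrac{1}{2} \eta_{\mu\nu} h, \tag{1.5.8}
\]

and that the Lorentz gauge is

\[
\partial^\mu \bar{h}_{\mu\nu} = \partial_\mu \bar{h}^{\mu\nu} = 0. \tag{1.5.9}
\]

That is,

\[
\partial^\mu h_{\mu\nu} - \tfrac{1}{2} \eta_{\mu\nu} \partial^\mu h = 0,
\]

which is just the statement that

\[
\partial^\mu h_{\mu\nu} = \tfrac{1}{2} \eta_{\mu\nu} \partial^\mu h = \tfrac{1}{2} \partial_\nu h.
\]

Hence, using this in (1.5.7) (and swapping the $\partial_\mu\partial_\nu \leftrightarrow \partial_\nu\partial_\mu$ at will), we see that

\[
G_{\mu\nu} = \tfrac{1}{2}\left(\tfrac{1}{2}\partial_\mu\partial_\nu h + \tfrac{1}{2}\partial_\nu\partial_\mu h - \partial^\rho\partial_\rho h_{\mu\nu} - \partial_\nu\partial_\mu h - \tfrac{1}{2}\eta_{\mu\nu}\partial^\sigma\partial_\sigma h + \eta_{\mu\nu}\partial^\sigma\partial_\sigma h\right).
\]

Now, the first and second terms are identical, and their sum cancels with the fourth term. Hence,

\[
G_{\mu\nu} = \tfrac{1}{2}\left(-\partial^\rho\partial_\rho h_{\mu\nu} - \tfrac{1}{2}\eta_{\mu\nu}\partial^\sigma\partial_\sigma h + \eta_{\mu\nu}\partial^\sigma\partial_\sigma h\right).
\]

The second and third terms add, to give

\[
G_{\mu\nu} = \tfrac{1}{2}\left(-\partial^\rho\partial_\rho h_{\mu\nu} + \tfrac{1}{2}\eta_{\mu\nu}\partial^\sigma\partial_\sigma h\right).
\]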


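Now, if we use a little bit of notation,

\[
\Box \equiv \partial^\mu \partial_\mu,
\]

then

\[
G_{\mu\nu} = -\tfrac{1}{2}\left(\Box h_{\mu\nu} - \tfrac{1}{2}\eta_{\mu\nu}\Box h\right),
\]

or

\[
G_{\mu\nu} = -\tfrac{1}{2}\Box\left(h_{\mu\nu} - \tfrac{1}{2}\eta_{\mu\nu} h\right).
\]

Hence, using our substitution (1.5.8) again,

\[
G_{\mu\nu} = -\tfrac{1}{2}\Box \bar{h}_{\mu\nu}.
\]

Then, if we write down Einstein’s equation,

\[
G_{\mu\nu} = 8\pi G T_{\mu\nu} \quad \Rightarrow \quad \Box \bar{h}_{\mu\nu} = -16\pi G T_{\mu\nu}.
\]

Therefore, we have a wave equation in the metric perturbation, with the energy-momentum tensor as the source. This is the equation for gravitational radiation.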

1.5.4.2 Gravitational Radiation

Under the Lorentz gauge (in keeping with the literature, this is sometimes also referred to as the Einstein gauge, or harmonic gauge),

\[
\partial^\mu \bar{h}_{\mu\nu} = 0, \qquad \bar{h}_{\mu\nu} \equiv h_{\mu\nu} - \tfrac{1}{2}\eta_{\mu\nu} h,
\]

Einstein’s equation becomes a wave equation, with $\Box \equiv \partial^\mu\partial_\mu$:

\[
\Box \bar{h}_{\mu\nu} = -16\pi G T_{\mu\nu}. \tag{1.5.10}
\]

We can write down the solution to this directly, if one recalls the solution to the equivalent equation from electrodynamic theory.

In electrodynamics, under the Lorentz gauge $\partial_\mu A^\mu = 0$, we could derive the wave equation

\[
\Box A^\nu = \mu_0 J^\nu,
\]

which has solution

\[
A^i = \frac{\mu_0}{4\pi} \int d^3x' \, \frac{J^i_{\rm ret}}{|\mathbf{x} - \mathbf{x}'|}.
\]


Hence, we can basically read off our solution by analogy,

\[
\bar{h}^{ij} = 4G \int d^3x' \, \frac{T^{ij}_{\rm ret}}{|\mathbf{x} - \mathbf{x}'|}. \tag{1.5.11}
\]

One should recall that these are retarded integrals. The minus sign has “gone” because we have raised indices.

Therefore, we have derived that upon linearising Einstein’s equation, and using the Lorentz gauge, there is a wave equation in the metric perturbation. The source to the wave is the distribution of energy-momentum.

1.6 The Schwarzschild Solution

We can write Einstein’s equation, in a vacuum, as

\[
R_{\mu\nu} = 0. \tag{1.6.1}
\]

That is, in a vacuum, where $T_{\mu\nu} = 0$, the “alternative form” of Einstein’s equation reduces to the above.

Now, we can look for spherically symmetric solutions to this. That is, we are looking for a line element which possesses spherical symmetry. The most general such line element is

\[
ds^2 = e^{\nu(r,t)} dt^2 - e^{\lambda(r,t)} dr^2 - r^2\left(d\theta^2 + \sin^2\theta \, d\phi^2\right).
\]

The reason we make this supposedly general line element diagonal is that we can always transform out of a frame in which there are off-diagonal elements.

In the line element we chose to use exponentials, as they are generally easy to work with (differentiating them is easy). Hence, the aim is to now find those functions $\nu(r,t)$, $\lambda(r,t)$.

Now, although we shall not derive them, the only non-zero components of the Ricci tensor are

\[
\begin{aligned}
R_{tt} &= \tfrac{1}{2} e^{-\lambda}\left(\nu'' + \tfrac{1}{2}\nu'(\nu' - \lambda') + \frac{2\nu'}{r}\right) + e^{-\nu}\left(\tfrac{1}{4}\dot{\lambda}(\dot{\nu} - \dot{\lambda}) - \tfrac{1}{2}\ddot{\lambda}\right), \\
R_{tr} &= \frac{\dot{\lambda}}{2r}, \\
R_{rr} &= \tfrac{1}{2} e^{-\nu}\left(\ddot{\lambda} - \tfrac{1}{2}\dot{\lambda}(\dot{\nu} - \dot{\lambda})\right) - \tfrac{1}{2} e^{-\lambda}\left(\nu'' + \tfrac{1}{2}\nu'(\nu' - \lambda') - \frac{2\lambda'}{r}\right), \\
R_{\theta\theta} &= 1 - e^{-\lambda}\left(1 + \tfrac{1}{2} r(\nu' - \lambda')\right), \\
R_{\phi\phi} &= \sin^2\theta \, R_{\theta\theta}.
\end{aligned}
\]

We have used that an over-dot represents derivative with respect to time $t$, and a prime with respect to $r$.

Hence, due to the reduction of Einstein’s equation to the form $R_{\mu\nu} = 0$, each of these expressions is equal to 0.

The easiest to start with is the $R_{tr}$ term. So,

\[
\frac{\dot{\lambda}}{2r} = 0,
\]

which immediately allows us to state that $\lambda = \lambda(r)$ only. That is, $\lambda$ does not have any dependence upon $t$. Thus, using $\dot{\lambda} = 0$ allows $R_{tt}$ and $R_{rr}$ to look very similar. In fact, as $R_{rr} = R_{tt} = 0$, then $R_{rr} + R_{tt} = 0$. This then easily shows that

\[
R_{tt} + R_{rr} = \tfrac{1}{2} e^{-\lambda}\left(\frac{2\nu'}{r} + \frac{2\lambda'}{r}\right) = 0,
\]

that is, assuming $r \neq 0$,

\[
\nu' + \lambda' = 0.
\]

Integrating this easily shows that

\[
\nu + \lambda = f(t).
\]

Now, we can set $f(t)$ to zero, by a time coordinate transformation. Then, $\nu = -\lambda$. Therefore,

\[
\nu(r) = -\lambda(r).
\]


Hence, using this in $R_{\theta\theta}$, we see that

\[
R_{\theta\theta} = 1 - e^{\nu}(1 + r\nu') = 0,
\]

that is,

\[
e^{\nu} + r\nu' e^{\nu} = 1.
\]

Now, we can rewrite this as

\[
(r e^{\nu})' = e^{\nu} + r\nu' e^{\nu} = 1,
\]

that is, as

\[
(r e^{\nu})' = 1.
\]

Integrating gives $r e^{\nu} = r + C$, which easily reveals that

\[
e^{\nu} = 1 + \frac{C}{r},
\]

where $C$ is some constant. We can find the value of $C$ by considering the Newtonian limit of the metric. That is, recall that we derived

\[
g_{00} = 1 + 2\Phi,
\]

where we know that

\[
\Phi = -\frac{GM}{r}.
\]

Now, $e^{\nu} = g_{00}$ by inspection (it is the coefficient of the $dt^2$ term). Hence,

\[
1 - \frac{2GM}{r} = 1 + \frac{C}{r} \quad \Rightarrow \quad C = -2GM.
\]

Let us recall that this $M$ is the mass of the body generating the potential $\Phi$. That is, it will be the mass of the planet/star that is curving the spacetime. Therefore,

\[
e^{\nu} = 1 - \frac{2GM}{r}, \qquad e^{\lambda} = \left(1 - \frac{2GM}{r}\right)^{-1}.
\]

And finally, we have our metric,

\[
ds^2 = \left(1 - \frac{2GM}{r}\right) dt^2 - \left(1 - \frac{2GM}{r}\right)^{-1} dr^2 - r^2\left(d\theta^2 + \sin^2\theta \, d\phi^2\right). \tag{1.6.2}
\]

That is, we have the vacuum solution of Einstein’s equation, due to a body of mass $M$; where $r > 0$. This metric is called the Schwarzschild metric.


Properties of the Schwarzschild Metric  The metric, by construction, is spherically symmetric. Also, the metric is static; it clearly is not a function of time. That the metric is static then means that changing the time coordinate by a constant amount leaves the metric unchanged. That is, the metric is invariant under constant time translations and time reflections. Also notice that the metric has Killing vectors (1, 0, 0, 0) and (0, 0, 0, 1) (i.e. on $t$ and $\phi$); these correspond to conservation of energy and angular momentum.

Notice that as $r \to \infty$, the metric goes over to Minkowski. That is, we say that the metric is asymptotically flat.

Also notice, at $r = 2GM$, the $g_{tt}$ and $g_{rr}$ components flip sign. We call this the Schwarzschild radius, or the event horizon. We denote the event horizon as

\[
r_s \equiv 2GM. \tag{1.6.3}
\]
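To put numbers to (1.6.3), the following sketch evaluates the Schwarzschild radius in SI units, $r_s = 2GM/c^2$. The factor of $c^2$ (the notes set c = 1) and the rounded values of the constants and masses are our own assumptions.

```python
G_NEWTON = 6.674e-11   # m^3 kg^-1 s^-2 (rounded)
C_LIGHT = 2.998e8      # m s^-1 (rounded)
M_SUN = 1.989e30       # kg (rounded)
M_EARTH = 5.972e24     # kg (rounded)


def schwarzschild_radius(mass):
    """Return r_s = 2 G M / c^2 in metres for a mass in kilograms."""
    return 2.0 * G_NEWTON * mass / C_LIGHT ** 2


rs_sun = schwarzschild_radius(M_SUN)
rs_earth = schwarzschild_radius(M_EARTH)
```

This gives roughly 3 km for the Sun and roughly 9 mm for the Earth. Both are far smaller than the physical radii of the bodies, so the vacuum solution only applies outside the body and the horizon is never reached.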

1.6.0.3 Gravitational Redshift

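Consider a clock held at a fixed position in the metric ($dr = d\theta = d\phi = 0$), so that $ds^2 = g_{tt} dt^2$. Also consider that

\[
d\tau = \frac{ds}{c},
\]

hence,

\[
d\tau = \sqrt{g_{tt}} \, \frac{dt}{c}.
\]

Now, it is fairly obvious that a frequency is inversely proportional to the proper time. That is,

\[
\nu \propto \frac{1}{\Delta\tau}.
\]

Now, if we compare two such clocks, at positions 1 and 2, over the same coordinate time interval $\Delta t$, then

\[
\frac{\nu_1}{\nu_2} = \sqrt{\frac{g_{tt}(2)}{g_{tt}(1)}}.
\]

If we use a weak gravitational field, then we can use the previously derived relation $g_{tt} = 1 + 2\Phi$. Hence, to first order in $\Phi$, this gives

\[
\frac{\nu_1}{\nu_2} = 1 + \Phi(2) - \Phi(1).
\]

That is, the shift in frequency is a function of the distance from the gravitating body.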
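The weak-field formula can be compared against the exact ratio for the Schwarzschild metric, where $g_{tt} = 1 - r_s/r$ and $\Phi = -r_s/(2r)$ in units with c = 1. The sketch below is illustrative only; the Earth-like value of $r_s$ and the choice of radii are assumptions made for the example.

```python
import math


def g_tt(r, rs):
    """Time-time component of the Schwarzschild metric."""
    return 1.0 - rs / r


def frequency_ratio_exact(r1, r2, rs):
    """nu_1 / nu_2 = sqrt(g_tt(2) / g_tt(1)) for static clocks at r1 and r2."""
    return math.sqrt(g_tt(r2, rs) / g_tt(r1, rs))


def frequency_ratio_weak(r1, r2, rs):
    """Weak-field form 1 + Phi(2) - Phi(1), with Phi = -rs / (2 r)."""
    phi1 = -rs / (2.0 * r1)
    phi2 = -rs / (2.0 * r2)
    return 1.0 + phi2 - phi1


RS_EARTH = 8.87e-3          # metres, approximate value assumed for the Earth
R_SURFACE = 6.371e6         # metres
R_HIGH = 2.0 * R_SURFACE    # an arbitrary higher position, for illustration

exact = frequency_ratio_exact(R_SURFACE, R_HIGH, RS_EARTH)
weak = frequency_ratio_weak(R_SURFACE, R_HIGH, RS_EARTH)
```

The two ratios agree to far better than one part in $10^{12}$, and both exceed 1 by a few parts in $10^{10}$.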


1.6.1 Dynamics in the Schwarzschild Spacetime

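Recall that the effective Lagrangian is

\[
L_{\rm eff} = \left(\frac{ds}{d\tau}\right)^2.
\]

Therefore, the effective Lagrangian is

\[
L_{\rm eff} = \left(1 - \frac{r_s}{r}\right)\dot{t}^2 - \left(1 - \frac{r_s}{r}\right)^{-1}\dot{r}^2 - r^2\left(\dot{\theta}^2 + \sin^2\theta \, \dot{\phi}^2\right), \tag{1.6.4}
\]

where an over-dot denotes derivative with respect to the affine parameter $\tau$, and $r_s = 2GM$. So, let us consider the first integrals of this effective Lagrangian.

The Euler-Lagrange equations, for this effective Lagrangian, are

\[
\frac{d}{d\tau}\frac{\partial L_{\rm eff}}{\partial \dot{x}^\mu} - \frac{\partial L_{\rm eff}}{\partial x^\mu} = 0.
\]

Then, consider that

\[
\frac{\partial L_{\rm eff}}{\partial t} = 0, \qquad \frac{\partial L_{\rm eff}}{\partial \dot{t}} = 2\left(1 - \frac{r_s}{r}\right)\dot{t},
\]

then, the $t$-first integral is that

\[
2\left(1 - \frac{r_s}{r}\right)\dot{t} = \text{const} \equiv 2\varepsilon.
\]

Similarly,

\[
\frac{\partial L_{\rm eff}}{\partial \phi} = 0, \qquad \frac{\partial L_{\rm eff}}{\partial \dot{\phi}} = -2r^2\sin^2\theta \, \dot{\phi},
\]

with its first integral being

\[
-2r^2\sin^2\theta \, \dot{\phi} = \text{const} \equiv -2\ell.
\]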

These constants, $\varepsilon$, $\ell$, are related to the conserved energy and angular momentum, per unit mass. Recall that these were predicted to be conserved, by the associated Killing vectors.

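Finally, the effective Lagrangian is just the line element, and that can take on one of 3 values;

\[
L_{\rm eff} = K = \begin{cases} 0 & \text{null}, \\ +1 & \text{time-like}, \\ -1 & \text{space-like}. \end{cases}
\]

Hence, using the derived relations for $\ell$, $\varepsilon$, we can easily see that

\[
\dot{t}^2 = \varepsilon^2\left(1 - \frac{r_s}{r}\right)^{-2}, \qquad \dot{\phi}^2 = \frac{\ell^2}{r^4\sin^4\theta}.
\]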


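And thus, using that the effective Lagrangian is just a constant $K$, we can easily put the effective Lagrangian into the form

\[
K = \left(1 - \frac{r_s}{r}\right)^{-1}\left(\varepsilon^2 - \dot{r}^2\right) - r^2\left(\dot{\theta}^2 + \frac{\ell^2}{r^4\sin^2\theta}\right).
\]

Now, as the system has spherical symmetry, we may as well take a value of $\theta$ that makes the above expression look simpler. Taking $\theta = \pi/2$ (note that then $\dot{\theta} = 0$), we see that

\[
K = \left(1 - \frac{r_s}{r}\right)^{-1}\left(\varepsilon^2 - \dot{r}^2\right) - \frac{\ell^2}{r^2},
\]

which is trivially just

\[
K = \left(1 - \frac{r_s}{r}\right)^{-1}\left[\varepsilon^2 - \left(\frac{dr}{d\tau}\right)^2\right] - \frac{\ell^2}{r^2}.
\]

Before we carry on with this expression, let us compute the Christoffel symbols and geodesics.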

1.6.1.1 Geodesics & Christoffel Symbols

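Let us compute the geodesics and Christoffel symbols for the effective Lagrangian (1.6.4) in this Schwarzschild spacetime.

We can compute the geodesic for the $\theta$-component of the effective Lagrangian. We have that

\[
\frac{\partial L_{\rm eff}}{\partial \dot{\theta}} = -2r^2\dot{\theta}, \qquad \frac{\partial L_{\rm eff}}{\partial \theta} = -2r^2\sin\theta\cos\theta \, \dot{\phi}^2,
\]

hence,

\[
\frac{d}{d\tau}\frac{\partial L_{\rm eff}}{\partial \dot{\theta}} = -4r\dot{r}\dot{\theta} - 2r^2\ddot{\theta}.
\]

Therefore, the Euler-Lagrange equation for the $\theta$-component is

\[
-4r\dot{r}\dot{\theta} - 2r^2\ddot{\theta} + 2r^2\sin\theta\cos\theta \, \dot{\phi}^2 = 0.
\]

Putting this into a more usable form,

\[
\ddot{\theta} + \frac{2}{r}\dot{r}\dot{\theta} - \sin\theta\cos\theta \, \dot{\phi}^2 = 0.
\]

Thus, we have the geodesic for $\theta$. Now, we can read off the Christoffel symbols. The non-zero components are

\[
\Gamma^{\theta}{}_{r\theta} = \Gamma^{\theta}{}_{\theta r} = \frac{1}{r}, \qquad \Gamma^{\theta}{}_{\phi\phi} = -\sin\theta\cos\theta.
\]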


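We can compute the geodesic for $r$. So,

\[
\frac{\partial L_{\rm eff}}{\partial \dot{r}} = -2\left(1 - \frac{r_s}{r}\right)^{-1}\dot{r},
\]

\[
\frac{\partial L_{\rm eff}}{\partial r} = \dot{t}^2\frac{r_s}{r^2} + \frac{r_s}{r^2}\left(1 - \frac{r_s}{r}\right)^{-2}\dot{r}^2 - 2r\left(\dot{\theta}^2 + \sin^2\theta \, \dot{\phi}^2\right),
\]

and

\[
\frac{d}{d\tau}\frac{\partial L_{\rm eff}}{\partial \dot{r}} = -2\ddot{r}\left(1 - \frac{r_s}{r}\right)^{-1} + 2\dot{r}^2\frac{r_s}{r^2}\left(1 - \frac{r_s}{r}\right)^{-2}.
\]

Therefore, the geodesic is

\[
-2\ddot{r}\left(1 - \frac{r_s}{r}\right)^{-1} + 2\dot{r}^2\frac{r_s}{r^2}\left(1 - \frac{r_s}{r}\right)^{-2} - \dot{t}^2\frac{r_s}{r^2} - \frac{r_s}{r^2}\left(1 - \frac{r_s}{r}\right)^{-2}\dot{r}^2 + 2r\left(\dot{\theta}^2 + \sin^2\theta \, \dot{\phi}^2\right) = 0.
\]

Multiplying through by $-\frac{1}{2}\left(1 - \frac{r_s}{r}\right)$, this simplifies down to

\[
\ddot{r} - \frac{\dot{r}^2 r_s}{2r^2}\left(1 - \frac{r_s}{r}\right)^{-1} + \frac{\dot{t}^2 r_s}{2r^2}\left(1 - \frac{r_s}{r}\right) - r\left(1 - \frac{r_s}{r}\right)\left(\dot{\theta}^2 + \sin^2\theta \, \dot{\phi}^2\right) = 0.
\]

This is the $r$-geodesic. From this, we can read off the non-zero Christoffel symbols. They are

\[
\Gamma^{r}{}_{rr} = -\frac{r_s}{2r^2}\left(1 - \frac{r_s}{r}\right)^{-1}, \qquad \Gamma^{r}{}_{tt} = \frac{r_s}{2r^2}\left(1 - \frac{r_s}{r}\right),
\]

\[
\Gamma^{r}{}_{\theta\theta} = -r\left(1 - \frac{r_s}{r}\right), \qquad \Gamma^{r}{}_{\phi\phi} = -r\sin^2\theta\left(1 - \frac{r_s}{r}\right).
\]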

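Then, let us compile these four geodesics (i.e. including the two not explicitly computed here). The geodesics for the Schwarzschild spacetime are:

\[
\begin{aligned}
\ddot{t} + \frac{r_s}{r^2}\left(1 - \frac{r_s}{r}\right)^{-1}\dot{t}\dot{r} &= 0, \\
\ddot{r} - \frac{r_s}{2r^2}\left(1 - \frac{r_s}{r}\right)^{-1}\dot{r}^2 + \frac{r_s}{2r^2}\left(1 - \frac{r_s}{r}\right)\dot{t}^2 - r\left(1 - \frac{r_s}{r}\right)\left(\dot{\theta}^2 + \sin^2\theta \, \dot{\phi}^2\right) &= 0, \\
\ddot{\theta} + \frac{2}{r}\dot{r}\dot{\theta} - \sin\theta\cos\theta \, \dot{\phi}^2 &= 0, \\
\ddot{\phi} + \frac{2}{r}\dot{r}\dot{\phi} + 2\cot\theta \, \dot{\theta}\dot{\phi} &= 0.
\end{aligned}
\]

These complicated non-linear differential equations can be solved to find the trajectories of particles in the spacetime. The non-zero Christoffel symbols are easily read off, and can be seen to be

\[
\begin{aligned}
\Gamma^{t}{}_{rt} &= \frac{r_s}{2r^2}\left(1 - \frac{r_s}{r}\right)^{-1}, & \Gamma^{r}{}_{rr} &= -\frac{r_s}{2r^2}\left(1 - \frac{r_s}{r}\right)^{-1}, \\
\Gamma^{r}{}_{tt} &= \frac{r_s}{2r^2}\left(1 - \frac{r_s}{r}\right), \\
\Gamma^{r}{}_{\theta\theta} &= -r\left(1 - \frac{r_s}{r}\right), & \Gamma^{r}{}_{\phi\phi} &= -r\sin^2\theta\left(1 - \frac{r_s}{r}\right), \\
\Gamma^{\theta}{}_{r\theta} &= \Gamma^{\theta}{}_{\theta r} = \frac{1}{r}, & \Gamma^{\theta}{}_{\phi\phi} &= -\sin\theta\cos\theta, \\
\Gamma^{\phi}{}_{r\phi} &= \frac{1}{r}, & \Gamma^{\phi}{}_{\theta\phi} &= \cot\theta.
\end{aligned}
\]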
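The geodesic equations can be integrated numerically. The following is a rough sketch, not a production integrator: it restricts to the equatorial plane ($\theta = \pi/2$, $\dot\theta = 0$), works in units with $r_s = 1$, and uses a hand-written fourth-order Runge-Kutta step. The $r$-equation is taken in the form $\ddot r = \frac{r_s}{2r^2}(1 - r_s/r)^{-1}\dot r^2 - \frac{r_s}{2r^2}(1 - r_s/r)\dot t^2 + r(1 - r_s/r)\dot\phi^2$, i.e. with $\Gamma^{r}{}_{\phi\phi} = -r(1 - r_s/r)$ on the equator. The circular-orbit initial data, the step size and all function names are our own choices. As a check, $\varepsilon$, $\ell$ and $K$ should stay constant along the trajectory.

```python
import math

RS = 1.0  # Schwarzschild radius r_s = 2GM, in units where it equals 1


def f(r):
    """The metric function 1 - r_s / r."""
    return 1.0 - RS / r


def rhs(state):
    """Equatorial geodesic equations as a first-order system.

    state = (t, r, phi, t_dot, r_dot, phi_dot), dots are d/dtau.
    """
    _, r, _, td, rd, pd = state
    a = RS / (2.0 * r * r)
    t_ddot = -2.0 * a / f(r) * td * rd
    r_ddot = a / f(r) * rd * rd - a * f(r) * td * td + r * f(r) * pd * pd
    phi_ddot = -2.0 / r * rd * pd
    return (td, rd, pd, t_ddot, r_ddot, phi_ddot)


def rk4_step(state, h):
    """One classical Runge-Kutta step of size h in the affine parameter."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def invariants(state):
    """Return (epsilon, ell, K) evaluated on the equatorial plane."""
    _, r, _, td, rd, pd = state
    eps = f(r) * td
    ell = r * r * pd
    big_k = f(r) * td * td - rd * rd / f(r) - r * r * pd * pd
    return eps, ell, big_k


def circular_orbit_state(r0):
    """Initial data for a time-like (K = 1) circular orbit at radius r0 > 1.5 r_s."""
    ell = math.sqrt(RS * r0 * r0 / (2.0 * r0 - 3.0 * RS))
    phi_dot = ell / (r0 * r0)
    t_dot = math.sqrt((1.0 + ell * ell / (r0 * r0)) / f(r0))
    return (0.0, r0, 0.0, t_dot, 0.0, phi_dot)


state = circular_orbit_state(10.0)
start = invariants(state)
for _ in range(2000):
    state = rk4_step(state, 0.1)
end = invariants(state)
```

For this orbit at $r = 10 r_s$ the radius stays fixed and the three invariants are unchanged to within the integration error. If the angular term in the $r$-equation is given the opposite sign, no circular orbit exists.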

1.6.1.2 Orbits

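Let us return to the expression we derived, for $\theta = \pi/2$,

\[
K = \left(1 - \frac{r_s}{r}\right)^{-1}\left[\varepsilon^2 - \left(\frac{dr}{d\tau}\right)^2\right] - \frac{\ell^2}{r^2}.
\]

We can rearrange it into the form

\[
\dot{r}^2 = \varepsilon^2 - K - \left[\frac{\ell^2}{r^2}\left(1 - \frac{r_s}{r}\right) - \frac{K r_s}{r}\right],
\]

and indeed into the form

\[
\tfrac{1}{2}\dot{r}^2 = \frac{\varepsilon^2 - K}{2} - \left[\frac{\ell^2}{2r^2}\left(1 - \frac{r_s}{r}\right) - \frac{K r_s}{2r}\right].
\]

Now, we put it into this form, as we see that the LHS is a “velocity term”, the middle term is just the “energy”, $E \equiv (\varepsilon^2 - K)/2$, and the far-RHS we call the effective potential $V_{\rm eff}$:

\[
E = \tfrac{1}{2}\dot{r}^2 + V_{\rm eff}(r),
\]

where

\[
V_{\rm eff} \equiv \frac{\ell^2}{2r^2}\left(1 - \frac{r_s}{r}\right) - \frac{K r_s}{2r}. \tag{1.6.5}
\]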

Now, one familiar with the Newtonian derivation of this formula will realise that this expression is not quite the same as its Newtonian counterpart. The GR “correction” is the $r_s/r$, creating a $1/r^3$ term.

Just to recap what the symbols are in this effective potential. $\ell$ is the angular momentum of the “moving thing”, $r_s \equiv 2GM$, where $M$ is the mass of the “big body” that the “moving thing” is moving in. That is, the big body is curving spacetime, and some moving object is having its motion deflected, by the curved spacetime, which is due to the big body. The amount of deflection is just a function of the distance from the big body to the smaller one. We shall call the “smaller body” the test mass, and the “big body” the gravitating mass.

Suppose that $\ell = 0$. Then,

\[
V_{\rm eff} = -\frac{K r_s}{2r} = -K\frac{2GM}{2r} = -K\frac{GM}{r}.
\]

This is the Newtonian result. That is, for a test mass with no angular momentum, the effective potential is just what we would expect.

Recall that we derived

\[
\varepsilon = \left(1 - \frac{r_s}{r}\right)\frac{dt}{d\tau},
\]

then, we can clearly see that

\[
\frac{dt}{d\tau} = \varepsilon\left(1 - \frac{r_s}{r}\right)^{-1}.
\]

That is, the proper time of a test mass is a function of the distance from the gravitating mass, and of the total energy.

Let us now give some results relating to orbits in the spacetime. Circular orbits have

\[
\frac{dV_{\rm eff}}{dr} = 0,
\]

and stable circular orbits are those for which the second differential of the potential is positive.

Particle Orbits K = 1  If we vary the angular momentum, $\ell$, with respect to the event horizon, $r_s$, then various shapes of effective potential are found. With reference to Fig. 1.1, we see the 3 ranges of $\ell$.

• $\ell < \sqrt{3}\,r_s$. Here, we see that any particle with energy $E > 0$ escapes, whilst any particle with $E < 0$ crushes back into the origin. No stable orbits exist.

• $\sqrt{3}\,r_s < \ell < 2r_s$. For this range, there are two positions in which orbits can exist, but only one of them is capable of sustaining stable orbits. If $E > 0$, then any particle will escape. If we define $V_{\rm max}$ as the value of $V_{\rm eff}$ at its maximum, and $V_{\rm min}$ as its value at the minimum, then we can see that for any $0 > E > V_{\rm max}$, a particle will crush into the origin. Also, for a particle with $E = V_{\rm min}$, there is a stable circular orbit. $E = V_{\rm max}$ is an unstable circular orbit. Any particle trapped in the “well” will have some sort of elliptical orbit.

• $\ell > 2r_s$. Here, if $E = V_{\rm min}$, the particle will have a stable circular orbit, and elliptical for perturbations about that minimum. If a particle has $E < V_{\rm max}$, and lives to the left of the maximum, then it will crush into the origin. Now, if a particle has $E < V_{\rm max}$, and approaches the system from the right of the maximum, then the particle will be repelled back to $\infty$. However, above a certain value of the energy, $E > V_{\rm max}$, the particle will hit the origin. This is not present in Newtonian gravity, where particles are always repelled.

Fig. 1.1. The effective potential, as a function of distance from the gravitating body, for particle orbits: (a) $\ell < \sqrt{3}\,r_s$; (b) $\sqrt{3}\,r_s < \ell < 2r_s$; (c) $\ell > 2r_s$.
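The three ranges of $\ell$ above follow from solving $dV_{\rm eff}/dr = 0$ with $K = 1$, which reduces to the quadratic $r_s r^2 - 2\ell^2 r + 3\ell^2 r_s = 0$, with roots $r_\pm = \left[\ell^2 \pm \ell\sqrt{\ell^2 - 3r_s^2}\right]/r_s$. The following sketch, with function names of our own choosing and $r_s = 1$ by default, computes these radii and the value of the potential there.

```python
import math


def v_eff(r, ell, rs=1.0, big_k=1.0):
    """Effective potential (1.6.5): ell^2/(2 r^2) (1 - rs/r) - K rs/(2 r)."""
    return ell ** 2 / (2.0 * r ** 2) * (1.0 - rs / r) - big_k * rs / (2.0 * r)


def circular_orbit_radii(ell, rs=1.0):
    """Radii where dV_eff/dr = 0 for K = 1.

    Returns an empty tuple if ell < sqrt(3) rs (no circular orbits),
    otherwise (r_unstable, r_stable): the maximum and the minimum of V_eff.
    """
    disc = ell ** 2 - 3.0 * rs ** 2
    if disc < 0.0:
        return ()
    root = ell * math.sqrt(disc)
    return ((ell ** 2 - root) / rs, (ell ** 2 + root) / rs)


no_orbits = circular_orbit_radii(1.5)          # ell < sqrt(3) rs
marginal = circular_orbit_radii(2.0)           # ell = 2 rs
r_unstable, r_stable = circular_orbit_radii(2.5)
```

For $\ell = 2r_s$ the maximum sits at $r = 2r_s$ with $V_{\rm max} = 0$, which is why $2r_s$ separates cases (b) and (c). For larger $\ell$ the maximum is positive, so a particle arriving from infinity with small positive $E$ is turned back.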

Photon Orbits K = 0  In this case, we have that

\[
V_{\rm eff} \equiv \frac{\ell^2}{2r^2}\left(1 - \frac{r_s}{r}\right).
\]

Upon plotting the effective potential, we see that for a given $E < V_{\rm max}$, the motion depends on where the photon is, relative to the peak. That is, if the photon is inside the peak, the photon will crush to the origin. If the photon is outside, then the photon will be repelled to infinity.
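The position and height of the peak can be found directly: setting $dV_{\rm eff}/dr = 0$ for $K = 0$ gives $r = \frac{3}{2}r_s$ and $V_{\rm max} = 2\ell^2/(27 r_s^2)$. Since $E = \varepsilon^2/2$ for photons, a photon from infinity is captured when $\ell/\varepsilon < \frac{3\sqrt{3}}{2} r_s$. These results are not derived in the text above; the sketch below, with our own function names and $r_s = 1$ by default, simply checks them numerically.

```python
import math


def v_eff_photon(r, ell, rs=1.0):
    """Photon effective potential: ell^2/(2 r^2) (1 - rs/r)."""
    return ell ** 2 / (2.0 * r ** 2) * (1.0 - rs / r)


def photon_peak_radius(rs=1.0):
    """Radius of the maximum of the photon potential, r = 3 rs / 2."""
    return 1.5 * rs


def critical_impact_parameter(rs=1.0):
    """Value of ell/epsilon below which a photon from infinity is captured."""
    return 1.5 * math.sqrt(3.0) * rs


r_peak = photon_peak_radius()
v_max = v_eff_photon(r_peak, 1.0)
b_crit = critical_impact_parameter()
```

With $\ell = b_{\rm crit}\,\varepsilon$ and $\varepsilon = 1$, the peak height equals $E = 1/2$ exactly, which is the marginal case between capture and escape.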

1.6.1.3 Summary

Let us just summarise the results obtained, as they will be useful in subsequent discussions.


Fig. 1.2. The effective potential, as a function of distance from the gravitating body, for photon orbits.

We derived that, on a $\theta = \pi/2$ trajectory,

\[ \frac{1}{2}\dot{r}^2 = E - V_{\rm eff}, \]

where the “energy” is given by

\[ E = \frac{\varepsilon^2 - K}{2}, \]

and the effective potential by

\[ V_{\rm eff} = \frac{\ell^2}{2r^2}\left(1 - \frac{r_s}{r}\right) - \frac{K r_s}{2r}. \]

The angular momentum of the test mass was computed to be

\[ \ell = r^2\dot{\phi}, \]

and the conserved energy (per unit mass)

\[ \varepsilon = \left(1 - \frac{r_s}{r}\right)\dot{t}. \]

Light-like trajectories are those for which $K = 0$. Particle-like trajectories are those for which $K = 1$. The event horizon is related to the mass of the gravitating body by $r_s = 2GM$, and the body is idealised so that all mass is concentrated at a single point. Over-dots represent derivatives with respect to the affine parameter. Notice that we can then easily write that

\[ \frac{d\phi}{dr} = \frac{\dot{\phi}}{\dot{r}} = \pm\frac{1}{r^2}\frac{\ell}{\sqrt{2}\,(E - V_{\rm eff})^{1/2}}. \tag{1.6.6} \]


1.6.2 Light Deflection

We can compute the angle by which light is deflected, due to the curved spacetime around a star.

Fig. 1.3. Light deflection due to a gravitating mass. Notice how various angles are defined. $d$ is the impact parameter of the photon, with respect to the radius of the gravitating mass. (Figure labels: $d$, $r$, $r_{\min}$, $\delta\phi_{\rm defl}$, $\Delta\phi$.)

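The effective potential, for photons with $K = 0$, reads

\[ V_{\rm eff} = \frac{\ell^2}{2r^2}\left(1 - \frac{r_s}{r}\right). \tag{1.6.7} \]

Consider the combination

\[ \frac{\ell}{\varepsilon} = \frac{r^2\dot{\phi}}{\left(1 - \frac{r_s}{r}\right)\dot{t}}, \]

then, considering that $r \gg r_s$,

\[ \frac{\ell}{\varepsilon} \approx r^2\frac{d\phi}{dt} + O\!\left(\frac{r_s}{r}\right) \;\Rightarrow\; \frac{\ell}{\varepsilon} = r^2\frac{d\phi}{dt}. \tag{1.6.8} \]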

Now, for small angles, we have that

\[ \phi = \frac{d}{r}. \]

Then,

\[ \frac{d\phi}{dt} = -\frac{d}{r^2}\frac{dr}{dt}. \]


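Now,

\[ \frac{dr}{dt} = -1, \]

where the unity comes from $c = 1$, and the minus sign because distances are shrinking. Hence,

\[ \frac{d\phi}{dt} = \frac{d}{r^2}, \]

which we use in (1.6.8) to see that

\[ \frac{\ell}{\varepsilon} = d. \]

Therefore, for photons,

\[ d = \frac{\ell}{\varepsilon} = \frac{\ell}{\sqrt{2E}}. \tag{1.6.9} \]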

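Now, with reference to Fig. 1.3, we see that the deflection angle is given by

\[ \delta\phi_{\rm defl} = \Delta\phi - \pi. \]

The total angle change is just the integral

\[ \Delta\phi = \int d\phi, \]

or, as we have an expression for $d\phi/dr$,

\[ \Delta\phi = \int dr\,\frac{d\phi}{dr}. \]

Hence, using (1.6.6), we have that

\[ \Delta\phi = 2\int_{r_{\min}}^{r_{\max}} dr\,\frac{1}{r^2}\frac{\ell}{\sqrt{2}\,(E - V_{\rm eff})^{1/2}}, \tag{1.6.10} \]

and, using the light-like effective potential (1.6.7),

\[ \Delta\phi = 2\int_{r_{\min}}^{r_{\max}} dr\,\frac{1}{r^2}\frac{\ell}{\sqrt{2}\left[E - \frac{\ell^2}{2r^2}\left(1 - \frac{r_s}{r}\right)\right]^{1/2}}. \]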

We take $r_{\max} \to \infty$, and note that the factor of 2 out front is due to the photon coming in from infinity and then going back out to infinity. If we put the factor of $\ell$, as well as the $\sqrt{2}$, inside the square root in the denominator, then

\[ \Delta\phi = 2\int_{r_{\min}}^{\infty} \frac{dr}{r^2}\left[\frac{2E}{\ell^2} - \frac{1}{r^2}\left(1 - \frac{r_s}{r}\right)\right]^{-1/2}. \]


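Now, via (1.6.9), we rewrite

\[ \frac{2E}{\ell^2} = \frac{1}{d^2}, \]

and also change variables to

\[ w \equiv \frac{d}{r} \;\Rightarrow\; dw = -\frac{d}{r^2}\,dr. \]

Hence, using this change of variables and rewriting,

\[ \Delta\phi = 2\int_{w_{\max}}^{0} \left(-\frac{dw}{d}\right)\left[\frac{1}{d^2} - \frac{w^2}{d^2}\left(1 - \frac{r_s w}{d}\right)\right]^{-1/2}, \]

the minus sign flipping the integration limits to

\[ \Delta\phi = 2\int_{0}^{w_{\max}} \frac{dw}{d}\left[\frac{1}{d^2} - \frac{w^2}{d^2}\left(1 - \frac{r_s w}{d}\right)\right]^{-1/2}. \]

The factor of $1/d$ can be taken inside the square root, giving

\[ \Delta\phi = 2\int_{0}^{w_{\max}} dw\left[1 - w^2\left(1 - \frac{r_s w}{d}\right)\right]^{-1/2}. \]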

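Now, if we refer to (1.6.10), we see that the integrand is singular at $E = V_{\rm eff}$. It is an integral of the form

\[ \int_0^1 \frac{dx}{\sqrt{x + \epsilon}} \approx \int_0^1 \frac{dx}{\sqrt{x}} - \frac{\epsilon}{2}\int_0^1 \frac{dx}{x^{3/2}}, \]

whereby upon integration, the first term does not give a singularity, but the second does (at zero), even though the integral on the left is finite. Thus, the singularity at the endpoint is integrable, but a naive expansion in the small parameter must be handled with care there.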

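Let us continue. If we take out a factor from the square root, then

\[ \Delta\phi = 2\int_0^{w_{\max}} dw\left(1 - \frac{r_s}{d}w\right)^{-1/2}\left[\left(1 - \frac{r_s}{d}w\right)^{-1} - w^2\right]^{-1/2}. \]

Now, we can expand the two terms,

\[ \left(1 - \frac{r_s}{d}w\right)^{-1/2} = 1 + \frac{r_s}{2d}w + O\!\left(\frac{r_s}{d}\right)^2, \qquad \left(1 - \frac{r_s}{d}w\right)^{-1} = 1 + \frac{r_s}{d}w + O\!\left(\frac{r_s}{d}\right)^2, \]

so that

\[ \Delta\phi = 2\int_0^{w_{\max}} dw\left(1 + \frac{r_s}{2d}w\right)\left(1 + \frac{r_s}{d}w - w^2\right)^{-1/2} + O\!\left(\frac{r_s}{d}\right)^2. \]

We can now multiply out the bracket; to first order in $r_s/d$,

\[ \Delta\phi = 2\int_0^{w_{\max}} dw\left(1 + \frac{r_s}{d}w - w^2\right)^{-1/2} + \frac{r_s}{d}\int_0^{w_{\max}} dw\, w\,(1 - w^2)^{-1/2}. \]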


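Now, we can see that the integrand of the second integral is singular at its upper limit, $w_{\max} = 1$. If we look up the values of the integrals, we find, to first order in $r_s/d$,

\[ \int_0^{w_{\max}} dw\left(1 + \frac{r_s}{d}w - w^2\right)^{-1/2} = \frac{\pi}{2} + \frac{r_s}{2d}, \qquad \int_0^{w_{\max}} dw\, w\,(1 - w^2)^{-1/2} = 1. \]

Therefore, we see that

\[ \Delta\phi = \pi + \frac{2r_s}{d}, \]

and hence, the deflection angle,

\[ \delta\phi_{\rm defl} = \frac{2r_s}{d}. \tag{1.6.11} \]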

Therefore, we have derived the deflection angle of a photon's trajectory, with impact parameter $d$ with respect to a gravitating body of event horizon $r_s$.
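The first-order result (1.6.11) can be compared against a direct numerical evaluation of the unexpanded integral $\Delta\phi = 2\int_0^{w_{\max}} dw\,[1 - w^2(1 - r_s w/d)]^{-1/2}$. The sketch below is only an illustration: the substitution $w = w_{\max}\sin\theta$ (used to tame the endpoint singularity), the midpoint rule, and all function names are choices made here rather than anything from the lectures.

```python
import math


def turning_point(eps):
    """w_max: smallest positive root of f(w) = 1 - w^2 + eps*w^3, found by bisection.

    Assumes a small eps = r_s/d, so that f(1) > 0 and f(1.5) < 0.
    """
    def f(w):
        return 1.0 - w * w + eps * w**3

    lo, hi = 1.0, 1.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return lo


def total_angle(eps, n=20000):
    """Delta phi for eps = r_s/d, using w = w_max*sin(theta) and the midpoint rule."""
    w_max = turning_point(eps)
    h = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        w = w_max * math.sin(theta)
        f = 1.0 - w * w * (1.0 - eps * w)
        total += w_max * math.cos(theta) / math.sqrt(f)
    return 2.0 * total * h


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        numeric = total_angle(eps) - math.pi
        print(f"r_s/d = {eps:g}: numerical deflection = {numeric:.6e}, 2 r_s/d = {2 * eps:.6e}")
```

The difference between the two columns should shrink like $(r_s/d)^2$, consistent with the terms dropped in the expansion.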

To get a handle on how big this angle is, consider the Sun, for which $r_s \approx 3\,\mathrm{km}$, and suppose the photon just grazes the Sun's surface, so that $d \approx 7\times 10^{5}\,\mathrm{km}$. Then,

\[ \delta\phi = \frac{2r_s}{d} = \frac{2\times 3\,\mathrm{km}}{7\times 10^{5}\,\mathrm{km}} \approx 10^{-5}\,\mathrm{rad}. \]

This angle is equivalent to the observed height of a 1 m high object, viewed from 100 km away. That is, the effect is very small. However, this angle can be measured (best in solar eclipses), and the measured value agrees with the GR prediction rather than the Newtonian prediction (which is a factor of 2 smaller).

This is one of the tests of general relativity.
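As a quick numerical version of the estimate above, the following sketch evaluates (1.6.11) for a ray grazing the Sun, restoring factors of $c$ so that $r_s = 2GM/c^2$. The constants are rounded SI values supplied here as assumptions, not figures from the notes.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # speed of light, m/s (rounded)
M_SUN = 1.989e30   # solar mass, kg (rounded)
R_SUN = 6.96e8     # solar radius, m (rounded)


def schwarzschild_radius(mass_kg):
    """r_s = 2GM/c^2 in metres."""
    return 2 * G * mass_kg / C**2


def deflection_angle(mass_kg, impact_parameter_m):
    """First-order GR light deflection, delta_phi = 2 r_s / d, in radians."""
    return 2 * schwarzschild_radius(mass_kg) / impact_parameter_m


if __name__ == "__main__":
    angle = deflection_angle(M_SUN, R_SUN)
    arcsec = math.degrees(angle) * 3600
    print(f"r_s(Sun) = {schwarzschild_radius(M_SUN) / 1e3:.2f} km")
    print(f"deflection = {angle:.3e} rad = {arcsec:.2f} arcsec")
```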

1.6.3 Perihelion Precession

Here we consider the motion of a planet about a star. Supposing that the orbit of the planet is elliptical, and that the “size” of the orbit is unchanged over many periods, does the “position” of the orbit change? That is, after each revolution, let us consider that $r_{\min}$ is the same, but is shifted in angular position by $\delta\phi_{\rm prec}$. Then, we have that

\[ \delta\phi_{\rm prec} = \Delta\phi - 2\pi, \]

where we subtract $2\pi$ so that the Newtonian prediction gives $\delta\phi_{\rm prec} = 0$.

We follow a similar tack as for light deflection, but we must take $K = 1$ as we are dealing with time-like objects. So, the effective potential is now

\[ V_{\rm eff} = \frac{\ell^2}{2r^2}\left(1 - \frac{r_s}{r}\right) - \frac{r_s}{2r}. \]


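We also use (1.6.6),

\[ \frac{d\phi}{dr} = \frac{1}{r^2}\frac{\ell}{\sqrt{2}\,(E - V_{\rm eff})^{1/2}}, \tag{1.6.12} \]

where the energy for time-like objects is

\[ E = \frac{\varepsilon^2 - 1}{2}. \]

Then, we write, as before,

\[ \Delta\phi = 2\int_{r_{\min}}^{r_{\max}} dr\,\frac{d\phi}{dr}, \]

which is just

\[ \Delta\phi = 2\int_{r_{\min}}^{r_{\max}} dr\,\frac{\ell}{r^2}\left[\sqrt{2}\,(E - V_{\rm eff})^{1/2}\right]^{-1}, \]

putting in the effective potential,

\[ \Delta\phi = 2\int_{r_{\min}}^{r_{\max}} dr\,\frac{\ell}{r^2}\left[2E - \frac{\ell^2}{r^2}\left(1 - \frac{r_s}{r}\right) + \frac{r_s}{r}\right]^{-1/2}. \]

If we now take the $\ell$ inside the square root, and use the expression for $E$, then

\[ \Delta\phi = 2\int_{r_{\min}}^{r_{\max}} dr\,\frac{1}{r^2}\left[\frac{\varepsilon^2}{\ell^2} - \frac{1}{\ell^2} - \frac{1}{r^2}\left(1 - \frac{r_s}{r}\right) + \frac{r_s}{r\ell^2}\right]^{-1/2}. \]

Let us rewrite the square-rooted bit slightly,

\[ \frac{\varepsilon^2}{\ell^2} - \frac{1}{r^2}\left(1 - \frac{r_s}{r}\right) - \frac{1}{\ell^2}\left(1 - \frac{r_s}{r}\right). \]

Let us change integration variables,

\[ u \equiv \frac{1}{r}, \]

hence, with $u_{\min} = 1/r_{\max}$ and $u_{\max} = 1/r_{\min}$ (the sign from $du = -dr/r^2$ being absorbed by swapping the limits),

\[ \Delta\phi = 2\int_{u_{\min}}^{u_{\max}} du\left[\frac{\varepsilon^2}{\ell^2} - u^2(1 - r_s u) - \frac{1}{\ell^2}(1 - r_s u)\right]^{-1/2}. \]

If we take out a common factor,

\[ \Delta\phi = 2\int_{u_{\min}}^{u_{\max}} du\,(1 - r_s u)^{-1/2}\left[\frac{\varepsilon^2}{\ell^2}(1 - r_s u)^{-1} - \frac{1}{\ell^2} - u^2\right]^{-1/2}. \]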


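We now expand, but we must go to higher order within the expression on the right,

\[ \Delta\phi = 2\int_{u_{\min}}^{u_{\max}} du\left(1 + \frac{r_s u}{2}\right)\left[\frac{\varepsilon^2}{\ell^2}\left(1 + r_s u + r_s^2u^2\right) - \frac{1}{\ell^2} - u^2\right]^{-1/2}, \]

collecting terms,

\[ \Delta\phi = 2\int_{u_{\min}}^{u_{\max}} du\left(1 + \frac{r_s u}{2}\right)\left[\frac{\varepsilon^2}{\ell^2}(1 + r_s u) - \frac{1}{\ell^2} - u^2\left(1 - \frac{\varepsilon^2 r_s^2}{\ell^2}\right)\right]^{-1/2}, \]

thus,

\[ \Delta\phi = 2\left(1 + \frac{\varepsilon^2 r_s^2}{2\ell^2}\right)\int_{u_{\min}}^{u_{\max}} du\left[\frac{\varepsilon^2}{\ell^2}(1 + r_s u) - \frac{1}{\ell^2} - u^2\right]^{-1/2} + r_s\int_{u_{\min}}^{u_{\max}} du\,u\left[\frac{\varepsilon^2}{\ell^2}(1 + r_s u) - \frac{1}{\ell^2} - u^2\right]^{-1/2}. \]

Now, by looking up the integrals, the first gives $\pi$, the second $\frac{\pi}{2}(u_{\min} + u_{\max})$.

Now, the integrand of the second integral is singular at the integration limits, which are the roots of the quadratic in the square bracket. Therefore, one can easily see that the sum of the roots, $u_{\min} + u_{\max}$, is

\[ \frac{\varepsilon^2}{\ell^2}r_s, \]

and therefore

\[ \Delta\phi = 2\pi\left(1 + \frac{\varepsilon^2 r_s^2}{2\ell^2}\right) + \frac{\pi\varepsilon^2 r_s^2}{2\ell^2}. \]

Hence, taking $\varepsilon \approx 1$ for a slowly moving planet, we read off

\[ \delta\phi_{\rm prec} = \frac{3\pi r_s^2}{2\ell^2} = \frac{6\pi G^2M^2}{\ell^2}. \]

Now, to get a handle on how big this is, we appeal to standard ellipse theory, which allows us to write the angular momentum $\ell$ in terms of the semi-major axis $a$ of the orbit and the eccentricity $e$,

\[ \ell^2 = GMa(1 - e^2). \]

Hence, the precession angle reads

\[ \delta\phi_{\rm prec} = \frac{6\pi GM}{a(1 - e^2)}. \tag{1.6.13} \]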

See Table (1.1) for a comparison of the predictions and observations of these precession angles.


Planet     GR Prediction (per century)    Observation
Mercury    43″                            43.1 ± 0.5″
Venus      8.6″                           8.4 ± 4.8″
Earth      3.8″                           5.0 ± 1.2″

Table 1.1. The GR prediction of, and experimental observation of, the perihelion precession of various planets. The agreement is one of the most convincing experimental “proofs” of general relativity.
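The GR column of Table 1.1 can be approximately reproduced from (1.6.13). The sketch below restores the factor of $c^2$ (so the shift per orbit is $6\pi GM/[c^2 a(1 - e^2)]$) and multiplies by the number of orbits per century. The orbital elements and constants are rounded textbook values supplied here as assumptions; they are not taken from the notes.

```python
import math

GM_SUN = 1.32712e20          # m^3 s^-2 (rounded)
C = 2.99792458e8             # speed of light, m/s
DAYS_PER_CENTURY = 36525.0

# name: (semi-major axis in m, eccentricity, orbital period in days), rounded values
PLANETS = {
    "Mercury": (5.7909e10, 0.2056, 87.969),
    "Venus": (1.0821e11, 0.0068, 224.701),
    "Earth": (1.4960e11, 0.0167, 365.256),
}


def precession_per_orbit(a, e):
    """Perihelion shift per orbit in radians, equation (1.6.13) with c restored."""
    return 6 * math.pi * GM_SUN / (C**2 * a * (1 - e**2))


def precession_per_century_arcsec(a, e, period_days):
    """Accumulated perihelion shift over one century, in arcseconds."""
    orbits = DAYS_PER_CENTURY / period_days
    return math.degrees(precession_per_orbit(a, e) * orbits) * 3600


if __name__ == "__main__":
    for name, (a, e, period) in PLANETS.items():
        print(f"{name:8s} {precession_per_century_arcsec(a, e, period):5.1f} arcsec per century")
```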

1.6.4 Black Holes

Let us consider what the mass and radius are of a gravitating body for which the escape velocity is the speed of light. That is, what are $M, R$ for which $v_{\rm esc} = c$?

Recall that the Newtonian expression for total energy is

\[ E_N = \frac{1}{2}mv^2 - \frac{GMm}{r}, \]

so that rearranging into the familiar form

\[ \frac{1}{2}\left(\frac{dr}{dt}\right)^2 = \frac{E_N}{m} - \left(-\frac{GM}{r}\right), \qquad v = \frac{dr}{dt}, \]

we see the presence of the effective potential. Now, escape velocity is when $E_N = 0$, which corresponds to

\[ v_{\rm esc}^2 = \frac{2GM}{R}, \]

which we require to be $c^2$, which, in units of $c = 1$, is just the statement that

\[ R = 2GM = r_s. \]

That is, we seem to have derived the Schwarzschild radius (which was a GR result) using Newtonian mechanics. This is actually just a coincidence, as we have neglected both SR and GR (i.e. no mention of mass-energy in the above derivation).
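To put numbers on the Newtonian argument above, the sketch below computes the escape velocity from the surface of a body and the radius at which that velocity would reach $c$. The masses, radii and constants are rounded SI values assumed here for illustration only.

```python
import math

G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8     # speed of light, m/s (rounded)

# name: (mass in kg, radius in m), rounded values
BODIES = {
    "Earth": (5.972e24, 6.371e6),
    "Sun": (1.989e30, 6.96e8),
}


def escape_velocity(mass_kg, radius_m):
    """Newtonian escape velocity, v_esc = sqrt(2GM/R)."""
    return math.sqrt(2 * G * mass_kg / radius_m)


def critical_radius(mass_kg):
    """Radius at which the Newtonian escape velocity equals c: R = 2GM/c^2."""
    return 2 * G * mass_kg / C**2


if __name__ == "__main__":
    for name, (mass, radius) in BODIES.items():
        print(f"{name}: v_esc = {escape_velocity(mass, radius) / 1e3:.1f} km/s, "
              f"R(v_esc = c) = {critical_radius(mass):.3g} m")
```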

Let us return to the Schwarzschild metric, with the assumption that $\theta, \phi$ are constant. Then, it reads

\[ ds^2 = \left(1 - \frac{r_s}{r}\right)dt^2 - \left(1 - \frac{r_s}{r}\right)^{-1}dr^2. \]

Notice that this expression has two singularities: one at $r = r_s$ and one at $r = 0$.

Now, it is not immediately obvious whether these singularities are an artifact of how we have constructed our coordinate system, or if they are “true singularities”. So, a way of finding this out would be to construct a quantity that is independent of the coordinate system. Such a quantity is of course a scalar. Now, we want a scalar that is dependent upon the geometry of the system. Such quantities are the contracted Riemann and Ricci tensors, and the Ricci scalar. Now, experience has shown us that the best test is the Riemann tensor, in the form

\[ R_{\alpha\beta\nu\mu}R^{\alpha\beta\nu\mu} = \frac{12\,r_s^2}{r^6}. \]

That is, we see that this coordinate-system independent quantity does not have a singularity as $r \to r_s$, but does have one for $r \to 0$.

Therefore, we see that $r \to r_s$ is a removable (coordinate) singularity, whereby we can change coordinates so that the metric does not retain the singularity; and that $r \to 0$ is an essential singularity. Now, although we shall not go into it at all, a quantum theory of gravity is expected to “sort out” this essential singularity.

1.6.4.1 Null Geodesics

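Let us consider the case $\ell = 0$, and $ds^2 = 0$. Then, the metric is just

\[ \left(1 - \frac{r_s}{r}\right)dt^2 - \left(1 - \frac{r_s}{r}\right)^{-1}dr^2 = 0, \]

which trivially rearranges into

\[ \left(\frac{dr}{dt}\right)^2 = \left(1 - \frac{r_s}{r}\right)^2, \]

which is just

\[ \frac{dr}{dt} = \pm\left(1 - \frac{r_s}{r}\right). \]

Notice that this is the radial null geodesic. So, we can solve this,

\[ \begin{aligned}
t &= \pm\int\frac{dr}{1 - r_s/r} = \pm\int\frac{r\,dr}{r - r_s} = \pm\int dr\left(1 + \frac{r_s}{r - r_s}\right) \\
&= \pm\left[r + r_s\ln|r - r_s| + \text{const}\right] \\
&= \pm\left[r + r_s\ln\left|\frac{r}{r_s} - 1\right| + \text{const}\right].
\end{aligned} \]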


Now, we define the tortoise coordinate

\[ r_* \equiv r + r_s\ln\left|\frac{r}{r_s} - 1\right|, \tag{1.6.14} \]

so that the geodesic reads

\[ t = \pm r_* + \text{const}. \]

Now, notice that for flat space, $r_s \to 0$. Hence, the geodesics read

\[ t = \pm r + \text{const}. \]

Hence, we define

\[ u = t - r, \qquad v = t + r, \]

so that lines of $u = \text{const}$ and $v = \text{const}$ define the null geodesics. See Fig. 1.4 for these lines.

Fig. 1.4. Null geodesics for flat space, plotted with axes $r$ and $t$. Blue (left to right) lines are $u = \text{const}$, red (right to left) lines are $v = \text{const}$. Photons move on these lines, and massive particles move within a light cone, defined by the lines. That is, the light cone is defined at an intersection of lines of $v = \text{const}$ and $u = \text{const}$, where the particle's future is everything above that point, within that cone, and its past is everything below that point, within the cone.

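We say that $u, v$ are the light-cone coordinates for flat space. Recall that, for flat space, the metric is

\[ ds^2 = dt^2 - dr^2 - r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right). \]

Now, notice that

\[ t = \frac{1}{2}(u + v), \qquad r = \frac{1}{2}(v - u). \]

Also that

\[ dr = \frac{\partial r}{\partial u}du + \frac{\partial r}{\partial v}dv = \frac{1}{2}(dv - du), \qquad dt = \frac{1}{2}(dv + du). \]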


Therefore, the metric reads

\[ ds^2 = du\,dv - r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right), \]

which is no longer diagonal.

1.6.4.2 Eddington-Finkelstein Coordinates

Now, let us return to computing the null geodesics, but for curved space. We shall still use the light-cone coordinates,

\[ u = t - r_*, \qquad v = t + r_*, \]

with the tortoise coordinate

\[ r_* = r + r_s\ln\left|\frac{r}{r_s} - 1\right|. \tag{1.6.15} \]

From which we can compute

\[ \frac{dr_*}{dr} = \frac{r}{r - r_s} \;\Rightarrow\; dr^2 = \left(1 - \frac{r_s}{r}\right)^2dr_*^2. \]

Fig. 1.5. The tortoise coordinate (1.6.15), $r_*$ plotted against $r$. The position of $r_s$ is obvious.
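The behaviour of the tortoise coordinate is easy to tabulate. The following minimal sketch implements (1.6.15) and compares a finite-difference derivative with the analytic result $dr_*/dr = r/(r - r_s)$; the function names and the choice $r_s = 1$ are illustrative assumptions.

```python
import math


def tortoise(r, rs):
    """Tortoise coordinate r_* = r + rs*ln|r/rs - 1|; it diverges to -infinity at r = rs."""
    if r == rs:
        raise ValueError("r_* is undefined at r = rs")
    return r + rs * math.log(abs(r / rs - 1.0))


def dtortoise_dr(r, rs):
    """Analytic derivative dr_*/dr = r / (r - rs)."""
    return r / (r - rs)


if __name__ == "__main__":
    rs = 1.0
    for r in (1.001, 1.1, 2.0, 10.0, 100.0):
        h = 1e-6 * r
        numeric = (tortoise(r + h, rs) - tortoise(r - h, rs)) / (2 * h)
        print(f"r = {r:7.3f}: r_* = {tortoise(r, rs):9.4f}, "
              f"dr_*/dr numeric = {numeric:10.4f}, analytic = {dtortoise_dr(r, rs):10.4f}")
```

Far from the horizon $r_* \approx r$, while close to $r_s$ the logarithm drives $r_*$ towards $-\infty$, which is why the null lines in the $(r, t)$ plane pile up near $r = r_s$.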

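Now, the Schwarzschild metric may be written as (where we are suppressing the angular part)

\[ ds^2 = \left(1 - \frac{r_s}{r}\right)\left[dt^2 - \frac{dr^2}{\left(1 - \frac{r_s}{r}\right)^2}\right], \]

which, using our derived relation for $dr_*^2$, is

\[ ds^2 = \left(1 - \frac{r_s}{r}\right)\left[dt^2 - dr_*^2\right]. \]

This, in terms of $u, v$, is just

\[ ds^2 = \left(1 - \frac{r_s}{r}\right)du\,dv. \]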


Notice that this metric is no longer singular at $r = r_s$, but is still singular at $r = 0$.

Fig. 1.6. The null geodesics for curved spacetime, plotted with axes $r$ and $t$. Blue lines are $u = \text{const}$ and red lines are $v = \text{const}$. The future direction, for a light cone, is that where a red line is on the left, and a blue line on the right. Notice that for $r > r_s$, all future cones point upwards, and that for $r < r_s$, all future cones point leftwards.

With reference to Fig. 1.6, we can see the geodesics for curved spacetime. We have plotted the lines $u = \text{const}$ and $v = \text{const}$. The interesting things to note from the plot:

• As $r$ decreases towards $r_s$, the angle between a $u$ line and a $v$ line decreases. This means that the future (and past) light cone of a particle becomes narrower. This means that “stuff” must be closer to the particle for it to influence the particle, as the particle gets closer to the Schwarzschild radius.

• As a particle crosses $r = r_s$, light cones flip by 90°, and point towards the $t$-axis. That is, the future of the particle can only be for motion towards the origin. That is, the particle can never escape.

Therefore, we have seen that as a particle crosses the Schwarzschild radius, its light cone gets tilted so that its future is always within the Schwarzschild radius. That is, particles can get into this region, but never out.

Thus, we see that $r = r_s$ is some sort of membrane which allows one-way travel. This is the event horizon.

Therefore, we see how black holes “work”. We have only considered static (non-rotating) black holes. To consider rotating holes, one must analyse the Kerr metric, which we shall not do here.

Hawking Radiation. Now, classically, particles cannot escape from a black hole, as we have just seen. However, quantum mechanically, they can tunnel out. According to quantum field theory, there is a “sea” of particle-anti-particle pairs being created and annihilated all the time, in vacuum (i.e. there is no true vacuum). Now, suppose one of these pairs were created on the event horizon, so that one of the particles gets created inside the horizon, and one outside. Then, as the particle inside cannot get out (it is inside the horizon), it cannot annihilate with the one that was created outside the horizon. Therefore, the particle outside the horizon can escape. Now, the energy to create the particle-anti-particle pair came from the vacuum inside the horizon. Therefore, by the particle escaping, energy is removed from the black hole, and over time, the black hole evaporates. This is called Hawking radiation. To properly understand this radiation requires a huge amount of QFT, which we shall not go into here.

This effect can be conceived in a rather tamer environment. Consider two metal plates, which possess opposite electric charge, where the space between the plates is “vacuum”. Now, the energy density due to the electric field may be ramped up so that it is high enough to create an electron-positron pair from the vacuum. This experiment, as far as I am aware, has not been done, but it is conceivable that it could be (if the idea of a sea of virtual particles is correct).

1.7 The Friedmann-Robertson-Walker Universe

We shall abbreviate the above name to FRW.

Now, we can start to consider the geometry of our universe. Historically, there were two theories for the universe.

The FRW universe was one based upon the cosmological principle: “Our universe is homogeneous and isotropic.” This means that the universe is pretty much the same everywhere you look, and in any direction. That is, the ensemble properties of the universe are invariant under both translation and rotation.

The competing theory was that of a steady state universe, proposed in 1948 by F. Hoyle, H. Bondi and T. Gold. The steady state theory was a more “perfect” version of the cosmological principle, imposing the condition that the universe be invariant under time as well as translation/rotation. This means that the universe looks the same at any time.

The main difference between the theories is that the FRW universe started, and then expanded, whereas the steady state universe “always has been”. At the time these two theories were proposed, the church preferred FRW, with scientists preferring steady state.

The FRW universe model predicts some background radiation from the beginning event (i.e. the big bang), in the form of the cosmic microwave background (CMB). The CMB signature was predicted by Gamow and Alpher, and was observed by Penzias and Wilson, thereby providing evidence for the FRW universe.

The standard model of cosmology, today, uses the FRW model of the universe.

1.7.1 The FRW Metric

Schur’s theorem (which we state without proof) states that a globally isotropic $n$-dimensional manifold ($n > 2$) has a constant curvature $k$, and that the Riemann tensor has the form

\[ R_{\mu\nu\alpha\beta} = k\left(g_{\mu\alpha}g_{\nu\beta} - g_{\mu\beta}g_{\nu\alpha}\right). \]

Following this, one can construct an isotropic metric,

\[ ds^2 = dt^2 - a^2(t)\,d\sigma^2, \tag{1.7.1} \]

where $d\sigma^2$ is the line element for 3-dimensional space, and $a(t)$ is the scale factor. We define the Hubble parameter, noting its present value,

\[ H \equiv \frac{\dot{a}}{a}, \qquad H_0 = 73\ \mathrm{km\,s^{-1}\,Mpc^{-1}}; \]

where it is important to note that an overdot here denotes derivative with respect to coordinate time $t$. Furthermore, the metric actually looks like

\[ ds^2 = dt^2 - a^2(t)\left[\frac{dr^2}{1 - kr^2} + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right]. \tag{1.7.2} \]

Then, by a suitable coordinate transformation, the curvature constant $k$ can take on one of 3 values,

\[ k = \begin{cases} +1 & \text{closed} \\ 0 & \text{flat} \\ -1 & \text{open} \end{cases} \tag{1.7.3} \]

So, consider the values of $k$, to see how they actually correspond to the above “claimed” geometries.

Closed Space. Consider setting $k = 1$, and the transformation

\[ r = \sin\chi \;\Rightarrow\; dr = \cos\chi\,d\chi, \]

so that the metric looks like

\[ ds^2 = dt^2 - a^2(t)\left[\frac{\cos^2\chi\,d\chi^2}{1 - \sin^2\chi} + \sin^2\chi\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right], \]


which simplifies trivially down to

\[ ds^2 = dt^2 - a^2(t)\left[d\chi^2 + \sin^2\chi\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right]. \]

Now, consider taking a slice through $\theta$. That is, set $\theta = \pi/2$; then one finds that

\[ ds^2 = dt^2 - a^2(t)\left[d\chi^2 + \sin^2\chi\,d\phi^2\right], \]

where it is clear that the bracketed quantity is the line element of the 2-sphere. That is,

\[ d\chi^2 + \sin^2\chi\,d\phi^2 \;\Rightarrow\; \text{sphere}. \]

Fig. 1.7. A visualisation of closed and open geometries. Panels: (a) Sphere, closed; (b) Hyperboloid, open.

Open Space. Consider setting $k = -1$, and the coordinate transformation

\[ r = \sinh\chi. \]

Then, in a completely analogous manner to before, we get the line element

\[ ds^2 = dt^2 - a^2(t)\left[d\chi^2 + \sinh^2\chi\,d\phi^2\right], \]

where we now notice that

\[ d\chi^2 + \sinh^2\chi\,d\phi^2 \;\Rightarrow\; \text{hyperboloid}. \]


That is, $k = -1$ corresponds to a geometry based upon the surface of a hyperboloid.

Flat Space. Let us set $k = 0$, and the transformation

\[ r = \chi, \]

then, we have the line element

\[ ds^2 = dt^2 - a^2(t)\left[d\chi^2 + \chi^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right], \]

if we set $\theta = \pi/2$ again, then the square-bracketed quantity is just

\[ d\chi^2 + \chi^2d\phi^2. \]

This line element is just that of plane polars, which is flat. Hence, we see that $k = 0$ corresponds to flat space.

These correspondences of $k$ with a particular geometry will become much clearer later on.

The standard way to write the FRW metric, in light of these coordinate transformations, is

\[ ds^2 = dt^2 - a^2(t)\left[d\chi^2 + \left\{\begin{matrix}\sin^2\chi \\ \chi^2 \\ \sinh^2\chi\end{matrix}\right\}\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right], \qquad k = \left\{\begin{matrix} +1 \\ 0 \\ -1 \end{matrix}\right\}. \tag{1.7.4} \]

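1.7.2 Geodesics & Christoffel Symbols

We can compute the geodesics, and read off the Christoffel symbols, from the effective Lagrangian formed from the FRW metric (1.7.2)

\[ L_{\rm eff} = \dot{t}^2 - a^2(t)\left[\frac{1}{1 - kr^2}\dot{r}^2 + r^2\left(\dot{\theta}^2 + \sin^2\theta\,\dot{\phi}^2\right)\right], \]

where an overdot denotes derivative with respect to the affine parameter, $\lambda$, say. Now, one will need to use the following relation

\[ \dot{a} = \frac{da}{d\lambda} = \frac{\partial a}{\partial t}\frac{\partial t}{\partial\lambda} = a'\dot{t}, \qquad a' \equiv \frac{da}{dt}. \]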


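Upon careful computation, one finds the four geodesic equations:

\[ \begin{aligned}
\ddot{t} + \frac{aa'}{1 - kr^2}\dot{r}^2 + aa'r^2\dot{\theta}^2 + aa'r^2\sin^2\theta\,\dot{\phi}^2 &= 0, \\
\ddot{r} + \frac{kr}{1 - kr^2}\dot{r}^2 + 2\frac{a'}{a}\dot{t}\dot{r} - r(1 - kr^2)\dot{\theta}^2 - r\sin^2\theta\,(1 - kr^2)\dot{\phi}^2 &= 0, \\
\ddot{\theta} + 2\frac{a'}{a}\dot{t}\dot{\theta} + \frac{2}{r}\dot{r}\dot{\theta} - \sin\theta\cos\theta\,\dot{\phi}^2 &= 0, \\
\ddot{\phi} + 2\frac{a'}{a}\dot{t}\dot{\phi} + \frac{2}{r}\dot{r}\dot{\phi} + 2\cot\theta\,\dot{\theta}\dot{\phi} &= 0.
\end{aligned} \]

This allows us to read off the non-zero components of the Christoffel symbols:

\[ \Gamma^t{}_{rr} = \frac{aa'}{1 - kr^2}, \qquad \Gamma^t{}_{\theta\theta} = aa'r^2, \qquad \Gamma^t{}_{\phi\phi} = aa'r^2\sin^2\theta, \]

\[ \Gamma^r{}_{rr} = \frac{kr}{1 - kr^2}, \qquad \Gamma^r{}_{tr} = \frac{a'}{a}, \qquad \Gamma^r{}_{\theta\theta} = -r(1 - kr^2), \qquad \Gamma^r{}_{\phi\phi} = -r\sin^2\theta\,(1 - kr^2), \]

\[ \Gamma^\theta{}_{t\theta} = \frac{a'}{a}, \qquad \Gamma^\theta{}_{r\theta} = \frac{1}{r}, \qquad \Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta, \]

\[ \Gamma^\phi{}_{t\phi} = \frac{a'}{a}, \qquad \Gamma^\phi{}_{r\phi} = \frac{1}{r}, \qquad \Gamma^\phi{}_{\theta\phi} = \cot\theta. \]

Notice that using the definition of the Hubble parameter, $H = a'/a$, we see that

\[ \Gamma^r{}_{tr} = \Gamma^\theta{}_{t\theta} = \Gamma^\phi{}_{t\phi} = H. \]

This is the only section in which the derivative with respect to the affine parameter will be used; hence, an overdot from here on denotes derivative with respect to coordinate time $t$.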

1.7.3 Cosmology in the FRW Universe

We now wish to consider what happens to spacetime in the FRW Universe. To do so, we shall need the Ricci tensor corresponding to the FRW metric, and some energy-momentum tensor.

So, following from the FRW metric, (1.7.2), one can compute the associated components of the Ricci tensor. Doing so, one finds

\[ R_{00} = -3\frac{\ddot{a}}{a}, \tag{1.7.5} \]

\[ R_{0i} = R_{i0} = 0, \tag{1.7.6} \]

\[ R_{ij} = -\left(\frac{2k}{a^2} + \frac{\ddot{a}}{a} + \frac{2\dot{a}^2}{a^2}\right)g_{ij}. \tag{1.7.7} \]

The metric $g_{\mu\nu}$ is the FRW metric, which we note can be written as

\[ g_{00} = 1, \qquad g_{ij} = -a^2(t)\,\mathrm{diag}\left((1 - kr^2)^{-1},\ r^2,\ r^2\sin^2\theta\right). \]

Recall Einstein’s equation, in the form

\[ R_{\mu\nu} = 8\pi G\left(T_{\mu\nu} - \frac{1}{2}g_{\mu\nu}T\right), \qquad T \equiv g^{\mu\nu}T_{\mu\nu}. \]

We now use Weyl’s postulate, which is that our Universe is a perfect fluid. A perfect fluid is one for which there is no heat conduction or viscosity.

Recall that the energy-momentum tensor of a perfect fluid is given by

\[ T_{\mu\nu} = (\rho + P)u_\mu u_\nu - Pg_{\mu\nu}, \]

where $P$ is the pressure of the fluid, and $\rho$ the density. Hence, its trace is

\[ T = (\rho + P)u^\mu u_\mu - Pg^\mu{}_\mu = \rho + P - 4P, \]

that is,

\[ T = \rho - 3P. \]

In fact, this result can be obtained in a slightly easier way. Recall that in the comoving frame of the fluid, $u^\mu = (1, 0, 0, 0)$; then the energy-momentum tensor with mixed indices is diagonal,

\[ T^\mu{}_\nu = \mathrm{diag}(\rho, -P, -P, -P). \]

Hence, its trace is just the sum of its components, $T = \rho - 3P$.

So, let us compute the bracketed bit of the Einstein equation,

\[ \begin{aligned}
T_{\mu\nu} - \frac{1}{2}g_{\mu\nu}T &= (\rho + P)u_\mu u_\nu - Pg_{\mu\nu} - \frac{1}{2}g_{\mu\nu}(\rho - 3P) \\
&= (\rho + P)u_\mu u_\nu - \frac{1}{2}g_{\mu\nu}(\rho - P).
\end{aligned} \]

Hence, the Einstein equation reads

\[ R_{\mu\nu} = 8\pi G\left((\rho + P)u_\mu u_\nu - \frac{1}{2}g_{\mu\nu}(\rho - P)\right). \]


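Now, consider the comoving frame of the fluid; then we have that

\[ T_{\mu\nu} = \mathrm{diag}\left(\rho,\ -Pg_{ij}\right), \qquad T = \rho - 3P, \]

and thus that

\[ T_{\mu\nu} - \frac{1}{2}g_{\mu\nu}T = \frac{1}{2}\,\mathrm{diag}\left(\rho + 3P,\ g_{ij}(P - \rho)\right). \]

Hence,

\[ T_{00} - \frac{1}{2}g_{00}T = \frac{1}{2}(\rho + 3P), \]

so, the 00-component of the Einstein equation, using (1.7.5), is

\[ -3\frac{\ddot{a}}{a} = 8\pi G\,\frac{1}{2}(\rho + 3P), \]

trivially rearranging into

\[ \frac{\ddot{a}}{a} = -\frac{4\pi G}{3}(\rho + 3P). \tag{1.7.8} \]

This is known as Raychaudhuri’s equation.

Similarly, suppose we took the $ij$-part of the Einstein equation, using (1.7.7); then

\[ -\left(\frac{2k}{a^2} + \frac{\ddot{a}}{a} + \frac{2\dot{a}^2}{a^2}\right)g_{ij} = -8\pi G\,\frac{1}{2}g_{ij}(\rho - P), \]

from which we cancel out the metric $g_{ij}$,

\[ \frac{2k}{a^2} + \frac{\ddot{a}}{a} + \frac{2\dot{a}^2}{a^2} = 4\pi G(\rho - P). \]

Let us then insert Raychaudhuri’s equation for the middle term on the LHS,

\[ \frac{2k}{a^2} - \frac{4\pi G}{3}(\rho + 3P) + \frac{2\dot{a}^2}{a^2} = 4\pi G(\rho - P). \]

This can then be rearranged easily enough into

\[ \left(\frac{\dot{a}}{a}\right)^2 = \frac{8\pi G}{3}\rho - \frac{k}{a^2}. \tag{1.7.9} \]

This is known as the Friedmann equation. It is common to notate

\[ \frac{\dot{a}}{a} \equiv H, \]

so that the Friedmann equation reads

\[ H^2 = \frac{8\pi G}{3}\rho - \frac{k}{a^2}. \]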


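In deriving these two equations, we jumped around a bit between frames. These equations describe the expansion of the universe, in the comoving frame of the fluid.

Let us see where the continuity equation

\[ \nabla_\nu T^\nu{}_\mu = 0 \]

can get us. So, this is just

\[ \partial_\nu T^\nu{}_\mu + \Gamma^\nu{}_{\nu\alpha}T^\alpha{}_\mu - \Gamma^\nu{}_{\mu\alpha}T^\alpha{}_\nu = 0, \]

where

\[ T^\mu{}_\nu = \mathrm{diag}(\rho, -P, -P, -P). \]

Now, the relevant Christoffel symbols are

\[ \Gamma^t{}_{tt} = 0, \qquad \Gamma^\theta{}_{t\theta} = \Gamma^\phi{}_{t\phi} = \Gamma^r{}_{tr} = H. \]

Now, let us take the $\mu = t$ component of the continuity equation,

\[ \partial_\nu T^\nu{}_t + \Gamma^\nu{}_{\nu\alpha}T^\alpha{}_t - \Gamma^\nu{}_{t\alpha}T^\alpha{}_\nu = 0, \]

that is,

\[ \partial_t T^t{}_t + \partial_i T^i{}_t + \Gamma^\nu{}_{\nu\alpha}T^\alpha{}_t - \Gamma^\nu{}_{t\alpha}T^\alpha{}_\nu = 0. \]

Now, the second term above is zero, as the energy-momentum tensor is diagonal. Hence, if we write that $T^\mu{}_\nu = \delta^\mu_\nu T^\mu{}_\nu$ (no sum), then

\[ \partial_t T^t{}_t + \Gamma^\nu{}_{\nu\alpha}\delta^\alpha_t T^\alpha{}_t - \Gamma^\nu{}_{t\alpha}\delta^\alpha_\nu T^\alpha{}_\nu = 0, \]

which is just

\[ \partial_t T^t{}_t + \Gamma^\nu{}_{\nu t}T^t{}_t - \Gamma^\nu{}_{t\nu}T^\nu{}_\nu = 0. \]

Now, the only non-zero Christoffel symbols of the form $\Gamma^\nu{}_{\nu t}$ are the $\Gamma^i{}_{it}$. Hence,

\[ \partial_t T^t{}_t + \Gamma^i{}_{it}T^t{}_t - \Gamma^i{}_{ti}T^i{}_i = 0. \]

Therefore, with reference to the above Christoffel symbols, we see that this is just

\[ \partial_t\rho + 3H\rho + 3HP = 0, \]

which is

\[ \dot{\rho} = -3H(\rho + P), \tag{1.7.10} \]

which is known as the energy conservation equation, or the fluid equation.

Hence, the three important equations we have derived, for a Universe filled with a perfect fluid: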


• The Raychaudhuri equation:

\[ \frac{\ddot{a}}{a} = -\frac{4\pi G}{3}(\rho + 3P). \tag{1.7.11} \]

• The Friedmann equation:

\[ \left(\frac{\dot{a}}{a}\right)^2 = \frac{8\pi G}{3}\rho - \frac{k}{a^2}. \tag{1.7.12} \]

• The fluid equation:

\[ \dot{\rho} = -3\frac{\dot{a}}{a}(\rho + P). \tag{1.7.13} \]

The three equations are not independent of one another. In fact, using any two, one can derive the third.
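As an illustration of the statement that any two of these equations imply the third, here is a short worked derivation (added as a sketch; it does not appear in the lectures) of the Raychaudhuri equation from the Friedmann and fluid equations. It uses only $H = \dot{a}/a$ and $\dot{H} = \ddot{a}/a - H^2$.

```latex
% Differentiate the Friedmann equation H^2 = (8\pi G/3)\rho - k/a^2 with respect to t:
2H\dot{H} = \frac{8\pi G}{3}\dot{\rho} + \frac{2k}{a^2}H .
% Insert the fluid equation \dot{\rho} = -3H(\rho + P) and divide by 2H:
\dot{H} = -4\pi G(\rho + P) + \frac{k}{a^2} .
% Use \dot{H} = \ddot{a}/a - H^2 and the Friedmann equation once more:
\frac{\ddot{a}}{a} = H^2 - 4\pi G(\rho + P) + \frac{k}{a^2}
                   = \frac{8\pi G}{3}\rho - 4\pi G(\rho + P)
                   = -\frac{4\pi G}{3}(\rho + 3P) .
```

The curvature term cancels, which is why $k$ does not appear in (1.7.11).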

1.7.3.1 Species Evolution & Densities

The components of the fluid are called “species”. That is, we could conceive that the fluid is composed of matter, radiation and possibly some other “stuff” (which we shall come to later).

Notice that we can write the fluid equation as

\[ a\frac{\partial\rho}{\partial a} = -3(\rho + P). \]

Then, this can be solved for the evolution of $\rho$ as a function of scale factor $a$. We now consider three cases. We shall consider how the density of a particular species evolves, as a function of scale factor, if only that species exists in the Universe.

Matter Dominated FRW Universe. Consider a Universe that is filled solely with matter. For matter, there is no associated pressure. Hence, $P_m = 0$, and the fluid equation becomes

\[ a\frac{\partial\rho_m}{\partial a} = -3\rho_m, \]

integrating,

\[ -3\int\frac{da}{a} = \int\frac{d\rho_m}{\rho_m} \;\Rightarrow\; -3\ln a = \ln\rho_m + \text{const}, \]

which is just

\[ \rho_m = \frac{\rho_{m,0}}{a^3}, \tag{1.7.14} \]

where $\rho_{m,0}$ is the (constant) density of matter when $a = 1$.


Radiation Dominated FRW Universe. Radiation has the equation of state

\[ \rho_r = 3P_r, \]

which may be derived from black-body radiation theory. Hence, using this, the fluid equation reads

\[ a\frac{\partial\rho_r}{\partial a} = -4\rho_r; \]

integrating as before results in

\[ \rho_r = \frac{\rho_{r,0}}{a^4}. \tag{1.7.15} \]

Vacuum Dominated FRW Universe. The equation of state for vacuum is

\[ \rho_V = -P_V, \]

so that the fluid equation reads

\[ \dot{\rho}_V = 0, \]

hence, we see that $\rho_V = \text{const}$.
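The three cases above can be collected into a single formula. The parameter $w$ below is notation introduced here for illustration (it is not used in the notes); it assumes a species whose equation of state is $P = w\rho$ with constant $w$.

```latex
% Fluid equation for a single species with P = w\rho, w constant:
a\frac{\partial\rho}{\partial a} = -3(1 + w)\rho
\quad\Rightarrow\quad
\rho = \rho_0\, a^{-3(1 + w)} .
% Special cases recovered from the text:
%   matter,    w = 0    :  \rho_m \propto a^{-3}
%   radiation, w = 1/3  :  \rho_r \propto a^{-4}
%   vacuum,    w = -1   :  \rho_V = \text{const}
```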

Critical Density. Recall the Friedmann equation, but let us set $k = 0$ (i.e. flat),

\[ H^2 = \frac{8\pi G}{3}\rho. \]

Then, let us define this $\rho$ to be $\rho_{\rm crit}$, so that

\[ \rho_{\rm crit} = \frac{3H^2}{8\pi G}. \tag{1.7.16} \]

That is, $\rho_{\rm crit}$ is the density required to make the Universe flat. If we take the present value of the Hubble parameter to be

\[ H_0 = 100h\ \mathrm{km\,s^{-1}\,Mpc^{-1}}, \]

then the critical density has the value (if measured today)

\[ \rho_{\rm crit} = 10.54\,h^2\ \mathrm{keV\,cm^{-3}}. \]
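The quoted value of $\rho_{\rm crit}$ can be checked by evaluating (1.7.16) in SI units and converting the corresponding energy density $\rho_{\rm crit}c^2$ to keV per cubic centimetre. This is a minimal sketch; the physical constants are rounded values assumed here, and the function names are illustrative.

```python
import math

G = 6.6743e-11            # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8          # speed of light, m/s
MPC_IN_M = 3.085678e22    # one megaparsec in metres
KEV_IN_J = 1.602177e-16   # one keV in joules


def critical_density_si(h0_km_s_mpc):
    """Critical mass density 3 H^2 / (8 pi G) in kg/m^3, for H given in km/s/Mpc."""
    h0 = h0_km_s_mpc * 1e3 / MPC_IN_M   # convert to s^-1
    return 3 * h0**2 / (8 * math.pi * G)


def critical_density_kev_cm3(h0_km_s_mpc):
    """Critical energy density rho_crit * c^2 in keV per cubic centimetre."""
    energy_density = critical_density_si(h0_km_s_mpc) * C**2   # J/m^3
    return energy_density / KEV_IN_J / 1e6                     # keV/cm^3


if __name__ == "__main__":
    for h in (1.0, 0.73):
        print(f"h = {h}: rho_crit = {critical_density_si(100 * h):.3e} kg/m^3 "
              f"= {critical_density_kev_cm3(100 * h):.2f} keV/cm^3")
```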

We use the notation that a subscript “0” denotes the present value of a quantity. In particular, we define

\[ a_0 \equiv 1; \]

the present value of the scale factor is unity.


Normalised Energy Densities. Let us suppose that there are four species present in the Universe: matter, radiation, vacuum and curvature. Let us now define

\[ \Omega_m \equiv \frac{\rho_{m,0}}{\rho_{\rm crit}}, \qquad \Omega_r \equiv \frac{\rho_{r,0}}{\rho_{\rm crit}}, \qquad \Omega_V \equiv \frac{\rho_{V,0}}{\rho_{\rm crit}}, \qquad \Omega_k \equiv -\frac{k}{H_0^2a_0^2}. \tag{1.7.17} \]

That is, the $\Omega_i$ are called the normalised energy densities of the species; they represent the current fraction of that species, in terms of the critical density. They satisfy the condition

\[ \Omega_m + \Omega_r + \Omega_V + \Omega_k = 1, \]

which follows from the Friedmann equation evaluated today (see Section 1.7.4); by measurement, the Universe appears to be close to flat, so that $\Omega_k$ is small. The matter species is composed of both baryonic and dark matter; radiation is composed of both photons and neutrinos. We tend to call the vacuum species the cosmological constant, so that $\Omega_V = \Omega_\Lambda$. See Table (1.2) for the current values of various quantities.

Quantity    Current Accepted Value
Ωm          0.24
Ωb          0.04
ΩDM         0.20
Ωr          < 0.01
Ωk          0.05
ΩΛ          0.7

Table 1.2. Various quantities, as a fraction of ρcrit.

1.7.4 Age of the FRW Universe

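Let us return to the Friedmann equation

\[ H^2 = \frac{8\pi G}{3}\rho - \frac{k}{a^2}. \]

If we then divide through by $H_0^2$,

\[ \frac{H^2}{H_0^2} = \frac{8\pi G}{3H_0^2}\rho - \frac{k}{a^2 H_0^2}, \]

and multiply and divide the last term on the RHS by $a_0^2$, we find

\[ \frac{H^2}{H_0^2} = \frac{8\pi G}{3H_0^2}\rho - \frac{k}{a_0^2 H_0^2}\frac{a_0^2}{a^2}. \]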


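Now, we notice the presence of our definitions of $\rho_{\rm crit}$ (evaluated today) and $\Omega_k$, so that

\[ \frac{H^2}{H_0^2} = \frac{\rho}{\rho_{\rm crit}} + \frac{\Omega_k}{a^2}, \]

after using that $a_0 = 1$. We now insert our derived evolutions of the various species $\rho_i$,

\[ \begin{aligned}
\frac{H^2}{H_0^2} &= \frac{1}{\rho_{\rm crit}}\left(\frac{\rho_{m,0}}{a^3} + \frac{\rho_{r,0}}{a^4} + \rho_{V,0}\right) + \frac{\Omega_k}{a^2} \\
&= \frac{\Omega_m}{a^3} + \frac{\Omega_r}{a^4} + \Omega_V + \frac{\Omega_k}{a^2}.
\end{aligned} \]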

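Hence, if we set $H = H_0$ and $a = a_0$, then we have

\[ \Omega_m + \Omega_r + \Omega_V + \Omega_k = 1. \]

So, let us write our expression back in terms of the scale factor, so that

\[ \left(\frac{\dot{a}}{a}\right)^2 = H_0^2\left[\frac{\Omega_m}{a^3} + \frac{\Omega_r}{a^4} + \Omega_V + \frac{\Omega_k}{a^2}\right], \]

or,

\[ \frac{\dot{a}}{a} = H_0\left[\frac{\Omega_m}{a^3} + \frac{\Omega_r}{a^4} + \Omega_V + \frac{\Omega_k}{a^2}\right]^{1/2}. \]

Multiplying through by $a$, and pulling it inside the square-root,

\[ \dot{a} = H_0\left[\frac{\Omega_m}{a} + \frac{\Omega_r}{a^2} + \Omega_V a^2 + \Omega_k\right]^{1/2}. \tag{1.7.18} \]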

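Now, consider that

\[ t_0 = \int_0^{t_0}dt = \int_0^{a_0=1}\frac{dt}{da}\,da = \int_0^1\frac{da}{\dot{a}}. \]

Hence, we have that

\[ t_0 = \frac{1}{H_0}\int_0^1 da\left[\frac{\Omega_m}{a} + \frac{\Omega_r}{a^2} + \Omega_V a^2 + \Omega_k\right]^{-1/2}. \tag{1.7.19} \]

Therefore, this expression will give us the age of the Universe.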
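The integral (1.7.19) rarely has a closed form, but it is simple to evaluate numerically. The following is a minimal sketch using a midpoint rule; the function names and the default number of steps are illustrative choices, not part of the notes. The result is returned in units of $1/H_0$, and a helper converts $1/H_0$ to gigayears.

```python
import math


def age_in_hubble_times(omega_m, omega_r, omega_v, omega_k, steps=200000):
    """Midpoint-rule estimate of H0 * t0 from the integral (1.7.19).

    The integrand is rewritten as a / sqrt(Om*a + Or + Ov*a^4 + Ok*a^2),
    which is the same function but avoids dividing by a small scale factor.
    """
    da = 1.0 / steps
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) * da  # midpoint of the i-th sub-interval
        radicand = omega_m * a + omega_r + omega_v * a**4 + omega_k * a**2
        total += a / math.sqrt(radicand) * da
    return total


def hubble_time_gyr(h):
    """1/H0 in gigayears for H0 = 100 h km/s/Mpc."""
    mpc_in_km = 3.0857e19
    gyr_in_s = 3.15576e16
    return mpc_in_km / (100.0 * h) / gyr_in_s


if __name__ == "__main__":
    # Matter only: should be close to 2/3 of a Hubble time.
    print(age_in_hubble_times(1.0, 0.0, 0.0, 0.0))
```

Multiplying `age_in_hubble_times(...)` by `hubble_time_gyr(h)` gives an age in gigayears for a chosen set of $\Omega_i$ and $h$.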


1.7.4.1 Age of Matter Dominated Universe

So, let us assume that $\Omega_m = 1$ (all other species are zero). Hence, the present age of the Universe is given by

\[ t_0 = \frac{1}{H_0}\int_0^1 da\, a^{1/2} = \frac{2}{3H_0}. \]

Also notice that in the matter dominated universe, (1.7.18) looks quite simple,

\[ \dot{a} = H_0\sqrt{\Omega_m}\, a^{-1/2}, \]

which is easily solved to give

\[ a \propto t^{2/3}. \tag{1.7.20} \]

That is, if the Universe is matter dominated, then the scale factor evolves in time as $t^{2/3}$.

Another curious result is that for a vacuum dominated Universe, $\dot{a} \propto a$, which implies that

\[ a \propto e^{H_0\sqrt{\Omega_V}\,t}; \]

that is, in a vacuum dominated Universe, the scale factor grows exponentially with time.
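For completeness, both solutions follow from (1.7.18) by separating variables. A short worked version, assuming the initial condition $a(0) = 0$ in the matter case:

\[ \begin{aligned}
\text{matter:}\quad & a^{1/2}\,da = H_0\sqrt{\Omega_m}\,dt
  \;\Rightarrow\; \tfrac{2}{3}a^{3/2} = H_0\sqrt{\Omega_m}\,t
  \;\Rightarrow\; a \propto t^{2/3}, \\
\text{vacuum:}\quad & \frac{da}{a} = H_0\sqrt{\Omega_V}\,dt
  \;\Rightarrow\; \ln a = H_0\sqrt{\Omega_V}\,t + \text{const}
  \;\Rightarrow\; a \propto e^{H_0\sqrt{\Omega_V}\,t}.
\end{aligned} \]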

1.7.4.2 Age of Matter & Curvature Dominated Universe

Here, we have a mixture of two species, such that

\[ \Omega_r = \Omega_V = 0, \quad \Omega_m + \Omega_k = 1. \]

Let us introduce a rescaling of time, known as conformal time, whereby

\[ a\,d\eta = dt. \]

Hence,

\[ \eta = \int_0^t\frac{dt}{a} = \int_0^a\frac{da}{a\dot{a}}. \]

Notice that in writing this, we have that $\eta = \eta(a)$. We should then be able to invert it, so that $a = a(\eta)$. Notice that if we use conformal time, the FRW metric (1.7.2) can be written in the form

\[ ds^2 = a^2(\eta)\left[d\eta^2 - \frac{dr^2}{1-kr^2} - r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right] \sim a^2(\eta)\,g_{\mu\nu}dx^\mu dx^\nu. \]


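That is, we have a conformal transformation of the metric. This is why we call $\eta$ conformal time.

Now, (1.7.18) in our model is

\[ \dot{a} = H_0\left[\frac{\Omega_m}{a} + \Omega_k\right]^{1/2}, \]

hence,

\[ a\dot{a} = H_0\left[\Omega_m a + \Omega_k a^2\right]^{1/2}. \]

Therefore, using this,

\[ \eta = \frac{1}{H_0}\int_0^a\frac{da}{\sqrt{\Omega_m a + \Omega_k a^2}}. \]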

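To integrate this, we complete the square, giving

\[ \eta = \frac{1}{H_0}\int_0^a da \left\{\Omega_k\left[\left(a + \frac{\Omega_m}{2\Omega_k}\right)^2 - \frac{\Omega_m^2}{4\Omega_k^2}\right]\right\}^{-1/2}. \]

If we then define

\[ x \equiv \frac{2\Omega_k}{\Omega_m}a + 1, \]

so that $a + \Omega_m/2\Omega_k = (\Omega_m/2\Omega_k)\,x$, $da = (\Omega_m/2\Omega_k)\,dx$, and $a = 0$ corresponds to $x = 1$, then we see that we can write

\[ \eta = \frac{1}{H_0}\int_1^x dx\, \frac{\Omega_m}{2\Omega_k}\left\{\Omega_k \frac{\Omega_m^2}{4\Omega_k^2}\left(x^2 - 1\right)\right\}^{-1/2} = \frac{1}{H_0\sqrt{\Omega_k}}\int_1^x \frac{dx}{\sqrt{x^2 - 1}}, \]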

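where we look up the value of the integral,

\[ \int_1^x \frac{dx}{\sqrt{x^2-1}} = \cosh^{-1}x. \]

Hence,

\[ \eta = \frac{1}{H_0\sqrt{\Omega_k}}\cosh^{-1}x. \]

Therefore,

\[ x = \cosh\left(\eta H_0\sqrt{\Omega_k}\right). \]

Hence,

\[ a(\eta) = \frac{\Omega_m}{2\Omega_k}\left[\cosh\left(\eta H_0\sqrt{\Omega_k}\right) - 1\right]. \]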


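Writing $\Omega_k = 1 - \Omega_m$, this reads

\[ a(\eta) = \frac{\Omega_m}{2(1-\Omega_m)}\left[\cosh\left(\eta H_0\sqrt{1-\Omega_m}\right) - 1\right], \quad \Omega_k > 0. \tag{1.7.21} \]

Clearly, this only holds for $\Omega_k > 0$. If $\Omega_k < 0$, then the cosh becomes a cosine, and we have

\[ a(\eta) = \frac{\Omega_m}{2(\Omega_m-1)}\left[1 - \cos\left(\eta H_0\sqrt{\Omega_m-1}\right)\right], \quad \Omega_k < 0. \tag{1.7.22} \]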

[Figure: the scale factor a(η) plotted against conformal time η.]

Fig. 1.8. A visualisation of closed and open universes. Closed has Ωk < 0, and open Ωk > 0. The former is just a sinusoidal oscillation, the latter an exponential expansion.

With reference to Fig. 1.8, we see the two different types of Universe. It is clear from the analytic forms of the evolution of the scale factor with conformal time, $a(\eta)$, that $\Omega_k > 0$ corresponds to an exponential increase in scale factor (1.7.21), and $\Omega_k < 0$ to an oscillatory scale factor (1.7.22). Also, from the definition of $\Omega_k$,

\[ \Omega_k = -\frac{k}{H_0^2 a_0^2}, \]

we see that

\[ \Omega_k > 0 \Rightarrow k < 0 \Rightarrow \text{open}, \tag{1.7.23} \]

\[ \Omega_k < 0 \Rightarrow k > 0 \Rightarrow \text{closed}, \tag{1.7.24} \]

which are in agreement with our previous statements about open and closed Universes.

An oscillatory Universe will have a definite (conformal) time at which it ends, when $a(\eta)$ hits the axis again,

\[ \cos\left(\eta_{\rm tot}H_0\sqrt{\Omega_m - 1}\right) = 1 \Rightarrow \eta_{\rm tot} = \frac{2\pi}{\sqrt{\Omega_m-1}\,H_0}. \]


Hence, the actual total time is given by

\[ t_{\rm tot} = \int_0^{\eta_{\rm tot}} d\eta\, a(\eta), \]

which easily evaluates to

\[ t_{\rm tot} = \frac{\pi\Omega_m}{(\Omega_m-1)^{3/2}H_0}. \]

Therefore, we have an expression for the total possible age of the Universe, if the Universe has a closed geometry. Hence, a small non-zero $k$ is sufficient to control the future “fate” of the Universe. That is, the Universe will either end up exponentially growing (the “heat death”), or will crunch back on itself (the “big crunch”).
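The open and closed solutions (1.7.21) and (1.7.22), together with $\eta_{\rm tot}$ and $t_{\rm tot}$, can be explored numerically. The following is a minimal sketch in units where $H_0 = 1$ by default; the function names, the flat-limit branch and the midpoint integration are illustrative additions.

```python
import math


def scale_factor(eta, omega_m, hubble0=1.0):
    """a(eta) for a matter + curvature universe, eqs. (1.7.21)-(1.7.22)."""
    omega_k = 1.0 - omega_m
    if omega_k > 0.0:    # open: hyperbolic growth
        w = hubble0 * math.sqrt(omega_k)
        return omega_m / (2.0 * omega_k) * (math.cosh(w * eta) - 1.0)
    if omega_k < 0.0:    # closed: oscillation
        w = hubble0 * math.sqrt(-omega_k)
        return omega_m / (2.0 * -omega_k) * (1.0 - math.cos(w * eta))
    # Flat limit of either branch: a = Omega_m * (H0 * eta)^2 / 4.
    return omega_m * (hubble0 * eta) ** 2 / 4.0


def total_conformal_time(omega_m, hubble0=1.0):
    """eta_tot = 2*pi / (H0 * sqrt(Omega_m - 1)) for a closed universe."""
    return 2.0 * math.pi / (hubble0 * math.sqrt(omega_m - 1.0))


def total_lifetime(omega_m, hubble0=1.0, steps=100000):
    """Midpoint-rule estimate of t_tot = integral of a(eta) d(eta)."""
    eta_tot = total_conformal_time(omega_m, hubble0)
    d_eta = eta_tot / steps
    return sum(scale_factor((i + 0.5) * d_eta, omega_m, hubble0)
               for i in range(steps)) * d_eta
```

For a closed universe the numerical `total_lifetime` can be compared against the closed-form expression for $t_{\rm tot}$ above.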

1.7.5 Light in the FRW Universe

Consider the FRW metric, where we shall ignore all angular terms;

\[ ds^2 = dt^2 - a^2(t)\frac{dr^2}{1-kr^2}. \]

Now, assuming flatness, for light (i.e. null geodesics, $ds^2 = 0$), the metric reduces to

\[ dt = a(t)\,dr. \]

Therefore, consider

\[ R = \int_0^R dr = \int_{t_e}^{t_o} \frac{dt}{a(t)}. \]

That is, the (coordinate) distance between two points that have photons sent between them. Here $t_e$ is the time of emission of the photon, and $t_o$ the time of observation. Now, we shall assume that this distance is unchanged for pulses sent slightly after this first set, so that

\[ R = \int_{t_e+\delta t_e}^{t_o+\delta t_o}\frac{dt}{a(t)}. \]

Therefore, we have that

\[ \int_{t_e+\delta t_e}^{t_o+\delta t_o}\frac{dt}{a(t)} = \int_{t_e}^{t_o}\frac{dt}{a(t)}. \]


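Now, the only non-zero contribution to the difference of these two integrals (via a general calculus mid-point theorem, to first order in the small intervals $\delta t_o$ and $\delta t_e$) is

\[ \frac{\delta t_o}{a(t_o)} - \frac{\delta t_e}{a(t_e)} = 0, \]

which easily rearranges to

\[ \frac{\delta t_o}{\delta t_e} = \frac{a(t_o)}{a(t_e)}. \]

Now, since a frequency is the inverse of the time between successive pulses, we can express the LHS as a ratio of frequencies, so that

\[ \frac{\nu_e}{\nu_o} = \frac{a(t_o)}{a(t_e)} \equiv 1 + z. \]

Hence, we arrive at a standard relation in cosmology,

\[ \frac{\nu_e}{\nu_o} = 1 + z. \tag{1.7.25} \]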

In an expanding Universe $a(t_o) > a(t_e)$, so $z > 0$. Therefore, we see that the ratio of the “sent” frequency (i.e. the frequency of the light when it was emitted by the object) to the received frequency depends upon the redshift $z$ at which the light was emitted. The quantity $1 + z$ is just the ratio of the scale factors when the light was received, to when it was emitted. Hence, we see that the further away something is, the lower the frequency at which we see light emitted by it. That is, the wavelength increases. Hence, this is called the cosmological redshift effect. This is a different effect from gravitational redshift, because gravitational redshift occurs due to different distances from a gravitating body.

Expansion of Universe ⇒ Cosmological redshift,

Different distances up gravitational potential ⇒ Gravitational redshift.

To get a handle on the numbers involved, consider that the most distant quasar is at $z \approx 6.6$, and that recombination is at $z \approx 10^3$.

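Notice that we can write

\[ z = \frac{\nu_e-\nu_o}{\nu_o} = \frac{a_o - a_e}{a_e}. \]

Also, recall that (non-relativistic) redshift is related to the velocity of the object,

\[ z = \frac{v}{c} = \frac{\delta a}{a}. \]

Hence, notice that we may compute

\[ \frac{\delta a}{a} = \frac{\delta a/\delta t}{a}\,\delta t = H\frac{R}{c}, \]

where $\delta t = R/c$ is the light travel time.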


Therefore,

v = HR.

This is Hubble’s law, as derived from first principles from the FRW metric.
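The redshift relation (1.7.25) and Hubble's law are easy to put numbers into. The following is a minimal sketch; the function names are illustrative, and the low-redshift estimate $z \approx v/c$ is only meaningful for $v \ll c$.

```python
C_KM_S = 299792.458  # speed of light in km/s


def redshift(a_emit, a_obs=1.0):
    """Cosmological redshift z from 1 + z = a(t_o) / a(t_e)."""
    return a_obs / a_emit - 1.0


def observed_frequency(nu_emit, z):
    """Frequency received today for light emitted at redshift z."""
    return nu_emit / (1.0 + z)


def recession_velocity(h, distance_mpc):
    """Hubble's law v = H0 * R, with H0 = 100 h km/s/Mpc; result in km/s."""
    return 100.0 * h * distance_mpc


def low_redshift_estimate(h, distance_mpc):
    """Non-relativistic estimate z ~ v / c for a nearby object."""
    return recession_velocity(h, distance_mpc) / C_KM_S
```

For instance, light emitted when the scale factor was half its present value arrives with $z = 1$, i.e. at half its emitted frequency.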

1.7.6 Flatness Problem

Now, there are problems with the FRW Universe.

Recall that the fraction, today, of curvature's contribution to the total density of the Universe is $\Omega_{k,0} < 10\%$. Also recall that we defined

\[ \Omega_k(t) \equiv -\frac{k}{H^2(t)a^2(t)}, \]

where $t$ is the time at which we are measuring. Hence, let us compute

\[ \frac{\Omega_k(t_0)}{\Omega_k(t_r)} = \frac{k/H_0^2a_0^2}{k/H_r^2a_r^2}; \]

the ratio of the curvature contributions today and in the radiation dominated epoch. Using $H_r a_r = \dot{a}_r$, this easily reduces to

\[ \frac{\Omega_k(t_0)}{\Omega_k(t_r)} = \frac{\dot{a}_r^2}{H_0^2a_0^2}. \]

Now, recalling that the scale factor, in the radiation dominated epoch, depends upon time as

\[ a_r = a_0\left(\frac{t_r}{t_0}\right)^{1/2} \Rightarrow \dot{a}_r = \frac{a_0}{2t_0}\left(\frac{t_r}{t_0}\right)^{-1/2}. \]

Also, recall that the Hubble parameter, in the radiation dominated epoch, goes as

\[ H_0 = \frac{1}{2t_0}. \]

Hence,

\[ \dot{a}_r = H_0a_0\left(\frac{t_r}{t_0}\right)^{-1/2}. \]

And therefore,

\[ \frac{\Omega_k(t_0)}{\Omega_k(t_r)} = \frac{t_0}{t_r}. \]


Putting some typical numbers in, one sees that

\[ \frac{\Omega_k(t_0)}{\Omega_k(t_r)} \approx \frac{10^{17}\,\text{s}}{10^{-43}\,\text{s}} = 10^{60}. \]

Hence,

\[ \Omega_k(t_0) = 10^{60}\,\Omega_k(t_r). \]

That is, the value of $\Omega_k$ is $10^{60}$ times what it was in the radiation epoch! This requires a very small (so called “fine-tuned”) curvature in the early epoch, so that the Universe could be $10^{60}$ times more curved now than it was.

This fine-tuning required is called the flatness problem.

1.7.6.1 Inflation

One way to “solve” the flatness problem is to introduce the concept of inflation. Suppose we allow an epoch before the radiation domination that was vacuum dominated (recall that $a_V(t) = a_i e^{Ht}$). In this case, we can compute that

\[ \frac{\Omega_k(t_r)}{\Omega_k(t_i)} = \frac{a_i^2}{a_r^2}, \]

after assuming that $H_i \approx H_r$. This gives

\[ \frac{\Omega_k(t_r)}{\Omega_k(t_i)} = e^{-2H(t_r - t_i)}, \]

a number we require to be less than $10^{-60}$. Therefore, we require

\[ 2H(t_r - t_i) > 60\ln 10 \approx 138, \quad \text{i.e.} \quad N_e \equiv H(t_r - t_i) \gtrsim 69. \]

That is, we require the number of e-folds to be roughly 60 to 70, in order for us to observe the flatness that we do today.
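The orders of magnitude here can be checked directly. The following is a minimal sketch using the typical numbers quoted in the flatness discussion; the function names are illustrative. Reading the requirement literally, $e^{-2H(t_r - t_i)} < 10^{-60}$ means the exponent $2H(t_r - t_i)$ must exceed $60\ln 10 \approx 138$, so that $H(t_r - t_i)$ must exceed about 69.

```python
import math


def curvature_growth(t_now_s=1.0e17, t_early_s=1.0e-43):
    """Omega_k(t0) / Omega_k(tr) = t0 / tr, using the notes' typical numbers."""
    return t_now_s / t_early_s


def required_exponent(suppression=1.0e-60):
    """Smallest x with exp(-x) <= suppression, i.e. x = -ln(suppression)."""
    return -math.log(suppression)


if __name__ == "__main__":
    print(curvature_growth())          # growth factor of Omega_k
    print(required_exponent())         # bound on 2 H (t_r - t_i)
    print(required_exponent() / 2.0)   # bound on H (t_r - t_i)
```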

Basically, this idea of inflation gives a mechanism by which the Universe is able to stretch and flatten out, very quickly. In fact, inflation also aids in explaining the observed homogeneity of the Universe.

1.8 The General Theory of Relativity: Discussion

We have now come to a place whereby all the mathematical groundwork has been laid, for a “wordy” discussion about the general theory of relativity.

Before general relativity (or at least a few hundred years before Einstein, as general relativity went through a few people before Einstein, in various forms), gravity was some force that was present between two bodies having mass. As this was so, things that don’t have mass don’t interact with gravity. This means that things like photons are not affected by gravity, and that photons are not capable of generating a gravitational field. Also, the structure of spacetime was that space is flat, and time is just something to be moved through, at a constant rate; where the rate is the same for all observers.

General relativity somewhat starts off by letting space and time mix: spacetime. The whole collection of bits of spacetime is then what we call a manifold; further to this, to give a meaning to the term “distance” in a manifold, we introduce a metric. We call a manifold (collection of points) that has a metric a Riemannian manifold. We “used” to think of spacetime as being flat (Pythagoras’ theorem for distances between two points). A flat spacetime is described by a metric with constant components; the derivative of any one of them, with respect to any coordinate, is zero. Now, general relativity introduces the idea that a metric has components that depend on position. This means that in order to find the distance between two points, you not only have to know where the points are, but where you are relative to the origin of the coordinate system. This is in contrast with only needing to know the relative positions of the two points.

When one computes the derivative of something, one is computing the rate of change of something in a particular direction. Now, when one did this in a flat spacetime, the derivative of the metric didn’t do anything: its derivative was zero. In a position dependent metric, this is no longer true. One finds that there is an extra bit, added onto the differential of something, that is proportional to the derivative of the metric. That this extra bit exists is directly due to the metric being position dependent. Therefore, various combinations of this metric (in the form of differentials with respect to various coordinates) will give us a handle on the geometry of the manifold. A slightly curious thing is that a manifold does not require a higher dimension in which to curve. Usually, when one imagines a ball (as an example), one can see that the surface of the ball is curved round, through three dimensions, but the surface of the ball itself is two dimensional. Manifolds do not require this extra dimension (to those within the manifold itself) in which to curve.

Mathematically, we carry around the “extra bits” of the differential in the Christoffel symbols; and the “various combinations” of the differentials of the metric in the Riemann tensor.

Now, something that lives in a manifold, and moves in that manifold, will move along some sort of curve; which is fairly obvious. Now, the motion of something, with respect to a stationary observer, can be determined. In a flat spacetime, something will move along lines that are determined by Newton’s equations of motion. In a curved spacetime (i.e. a spacetime that doesn’t have all zero components of its Riemann tensor), the curves that things move along are changed; and the amount that they are changed by is proportional to the Christoffel symbols. These curves are geodesics. A geodesic, in flat spacetime, with no external forces (such as a rocket boost, or magnetic fields), is a straight line. A corresponding geodesic, in curved spacetime, is curved. This curvature of the path of the freely moving “thing” is due to the curvature of the spacetime.

So, this far in our discussion, we have seen that if a manifold is curved, then things don’t tend to move in straight lines within the manifold. That is, the geodesics are curved lines. A way to imagine this is to envisage a cube threaded with a 3D grid; beads move along the gridlines, but the gridlines are not straight. This is only an analogy, as the real geodesics are 4D. Then, we must consider what it is that does the curving. What thing, in a manifold, causes it to be curved?

The proposal of Einstein is that all forms of matter and energy (even though they are essentially the same) curve spacetime. The proposal equates the distribution of “stuff” (i.e. the things that do the curving, things that have mass & energy) with the geometry of the spacetime. That is, the distribution of mass-energy with combinations of the metric. This means that the more energy you put in a given place, the more the spacetime is curved (and hence the more curvy geodesics get). The distribution of mass-energy is carried around in the energy-momentum tensor, and the geometry in the Einstein tensor.

This curvature of spacetime, due to the distribution of mass-energy, is the “main idea” of general relativity.

Some of the consequences of this general theory include the “ability” of massless things, which have energy, to interact with gravity. This is because the massless things move through the spacetime, and gravity is just the curvature of spacetime. This allows the geodesic of a photon to be curved. Notice that this is in contrast with the previous flat spacetime we started off discussing. This gives the so-called “light deflection” effect. Another consequence is that things at a different distance from the centre of a body doing the curving (the so-called gravitating mass) experience different rates of passage through time. This is because the position-dependent metric has different values at different positions (obviously). An example of this is that if we synchronize two clocks on the surface of the earth, then take one up from the surface of the earth and leave one on the surface, they will tell different times when brought back together.


2

Advanced Quantum Mechanics

2.1 Different Quantisation Schemes

2.1.1 Orthodox Quantisation

This is also referred to as the Copenhagen interpretation, or conventional quantum mechanics.

The objects are the normalised wavefunctions $\psi(x)$, such that

\[ \int \psi^*(x)\psi(x)\,dx = 1, \]

where we still use our multi-dimensional shorthand. Just taking ourselves out of this for a second, the above simply reads

\[ \int \psi^*(x_1,\ldots,x_n)\psi(x_1,\ldots,x_n)\,dx_1\ldots dx_n = 1, \]

where such integrals are understood to be taken over all space.

The dynamic law is the Schrödinger equation,

\[ i\hbar\frac{\partial}{\partial t}\psi(x,t) = \hat{H}\psi(x,t), \]

where the Hamiltonian is the differential operator

\[ \hat{H} = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V(x). \]

The measurement law is such that given eigenfunctions $\psi_n$ of an operator $\hat{Q}$, the probability of a measurement resulting in quantity $q_n$ is given by $|a_n|^2$; upon which the system has collapsed into state $\psi_n$. The total wavefunction of the system (before measurement) was given by

\[ \psi = \sum_n a_n\psi_n. \]


The quantisation principle may be seen as

\[ H(p_i, q_i) \longmapsto \hat{H}\left(-i\hbar\frac{\partial}{\partial x_i}, x_i\right). \]

That is, generalised momenta/positions go to the momentum/position operators. This is also known as the correspondence principle.

Problems with this method are four-fold.

• The ordering problem: “usually”, $p^2q = pqp = qp^2$. However, the ordering of operators matters (a lot) in quantum mechanics. This is something the above quantisation principle is unable to get right.
• The coordinate choice. Classically, we are at liberty to choose different coordinates; however, such a choice will lead to different operators, and thus different energies.
• Constraint equation. The choice of this is debatable.
• No particle creation/destruction is allowed, which we shall see is a problem.

2.1.2 Modern Quantum Mechanics

Here, we have objects $|\psi\rangle$ which are state-vectors, dynamics are described by the Schrödinger equation

\[ \hat{H}|\psi\rangle = i\hbar\frac{d}{dt}|\psi\rangle, \]

and the principle is the quantisation scheme.

We shall now discuss Dirac formalism, with quite a detour into linear algebra.

2.1.2.1 Dirac Formalism

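Taking quite a simple Hamiltonian, the Schrödinger equation is

\[ i\hbar\frac{\partial\psi(x,t)}{\partial t} = \left(-\frac{\hbar^2}{2m}\nabla^2 + V(x,t)\right)\psi(x,t) = \hat{H}(x,t)\psi(x,t). \]

Now, writing out what the partial derivative means, in terms of time-steps,

\[ \psi(x,\Delta t) - \psi(x,0) = -\frac{i}{\hbar}\hat{H}(x,0)\psi(x,0)\Delta t, \]

we see that we can trivially factorise it into

\[ \psi(x,\Delta t) = \left(1 - \frac{i}{\hbar}\hat{H}(x,0)\Delta t\right)\psi(x,0). \]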


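Thus, we see that it is linear. If we consider the next time-step

\[ \psi(x,2\Delta t) = \left(1-\frac{i}{\hbar}\hat{H}(x,\Delta t)\Delta t\right)\psi(x,\Delta t), \]

we see that we can then put in the expression for $\psi(x,\Delta t)$,

\[ \psi(x,2\Delta t) = \left(1-\frac{i}{\hbar}\hat{H}(x,\Delta t)\Delta t\right)\left(1-\frac{i}{\hbar}\hat{H}(x,0)\Delta t\right)\psi(x,0), \]

where the later time-step acts last, and so stands to the left. This then suggests that we can write the time evolution of the wavefunction as being due to some evolution operator. We denote it as

\[ \psi(x,t) = \hat{U}_t\psi(x,0). \]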

We can see linearity from this. Consider some initial wavefunction 1, being evolved into its final wavefunction; and consider that this is done to two wavefunctions. That is

\[ \psi_{1i} \xrightarrow{\hat{U}_t} \psi_{1f}, \quad \psi_{2i} \xrightarrow{\hat{U}_t} \psi_{2f}. \]

Then, consider a linear combination of the left- and right-hand sides

\[ \alpha\psi_{1i} + \beta\psi_{2i} \xrightarrow{\hat{U}_t} \alpha\psi_{1f} + \beta\psi_{2f}. \]

Thus confirming linearity.

Let us consider introducing some basis, $\phi_n(x)$, from which we can develop any function

\[ \psi(x,t) = \sum_n a_n(t)\phi_n(x) = \mathbf{a}\cdot\mathbf{e}, \quad \phi_n(x) \equiv \mathbf{e}_n. \]

Suppose we evolved the basis $\mathbf{e}_n$, and it ended up giving us some new function

\[ \mathbf{e}_n \xrightarrow{\hat{U}_t} \psi_n(x,t) = \sum_m U_{nm}(t)\phi_m(x) = \sum_m U_{nm}(t)\mathbf{e}_m. \]

Further, let us have

\[ \psi(x,0) = \sum_n a_n(0)\phi_n(x), \]

and then let us evolve it. That is, the evolution operator acts on the basis, not the coefficient (as it is only at $t = 0$). Thus,

\[ \psi(x,0) \xrightarrow{\hat{U}_t} \sum_{n,m} a_n(0)U_{nm}(t)\phi_m(x), \]


which is just $\psi(x,t)$,

\[ \psi(x,t) = \sum_{n,m} a_n(0)U_{nm}(t)\phi_m(x). \]

Let us denote this back in terms of development into a basis; relabelling the summation indices,

\[ \psi(x,t) = \sum_n a_n(t)\phi_n(x), \quad a_n(t) = \sum_m U_{mn}(t)a_m(0). \]

Therefore, if we know how the basis evolves, we know how any function evolves. This is a direct consequence of linearity.
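The statement that the evolution of the basis fixes the evolution of every state can be illustrated with a finite-dimensional toy model. The following is a minimal sketch, assuming a hypothetical $2\times 2$ Hamiltonian `H_EXAMPLE`, units with $\hbar = 1$, and the first-order time-step $(1 - i\hat{H}\Delta t/\hbar)$ used in the text; all names are illustrative. With the convention $\mathbf{e}_n \to \sum_m U_{nm}\mathbf{e}_m$, the coefficient of $\mathbf{e}_n$ in the evolved state is $\sum_m U_{mn}a_m(0)$.

```python
# Hypothetical two-level Hamiltonian (hbar = 1), chosen only for illustration.
H_EXAMPLE = [[1.0, 0.5],
             [0.5, -1.0]]


def apply(matrix, vector):
    """Matrix-vector product for plain Python lists of complex numbers."""
    return [sum(row[j] * vector[j] for j in range(len(vector)))
            for row in matrix]


def euler_step(hamiltonian, psi, dt, hbar=1.0):
    """One step psi -> (1 - i H dt / hbar) psi, as in the text."""
    h_psi = apply(hamiltonian, psi)
    return [p - 1j * dt / hbar * hp for p, hp in zip(psi, h_psi)]


def evolve(hamiltonian, psi, dt, steps):
    """Repeated first-order steps; linear in psi, only approximately unitary."""
    for _ in range(steps):
        psi = euler_step(hamiltonian, psi, dt)
    return psi


def evolution_matrix(hamiltonian, dt, steps):
    """U[n][m]: component along e_m of the evolved basis vector e_n."""
    dim = len(hamiltonian)
    basis = [[1.0 + 0j if i == n else 0j for i in range(dim)]
             for n in range(dim)]
    return [evolve(hamiltonian, e_n, dt, steps) for e_n in basis]


def evolve_coefficients(u, a0):
    """a_n(t) = sum_m U[m][n] * a_m(0)."""
    dim = len(a0)
    return [sum(u[m][n] * a0[m] for m in range(dim)) for n in range(dim)]
```

Evolving only the basis vectors once and then calling `evolve_coefficients` should reproduce a direct call to `evolve` on any initial state, which is the content of linearity.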

2.1.2.2 Vector Spaces

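Consider a vector & set of bases

\[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad \mathbf{e}_i = (\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3). \]

Vectors add, and can be multiplied by a scalar, thus

\[ \begin{pmatrix} x_1\\x_2\\x_3\end{pmatrix} + \begin{pmatrix} y_1\\y_2\\y_3\end{pmatrix} = \begin{pmatrix} x_1+y_1\\x_2+y_2\\x_3+y_3\end{pmatrix}, \quad \alpha\begin{pmatrix} x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix}\alpha x_1\\ \alpha x_2\\ \alpha x_3\end{pmatrix}. \]

We can change basis,

\[ \mathbf{e} \mapsto \mathbf{e}' = A\mathbf{e}, \quad \mathbf{e}'_i = A_{ji}\mathbf{e}_j, \]

and similarly coordinates

\[ x' = A^{-1}x; \]

where the transformation matrix $A$ satisfies

\[ A^{-1}A = 1. \]

We write vectors as a sum over components times basis

\[ \mathbf{x} = \mathbf{e}\cdot x = \sum_i \mathbf{e}_i x_i. \]

Under a simultaneous coordinate & basis transformation, nothing happens to the vector itself:

\[ \mathbf{x}' = \mathbf{e}'\cdot x' = (A\mathbf{e})\cdot(A^{-1}x) = \mathbf{e}\cdot x = \mathbf{x}. \]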


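The Scalar Product. We define the scalar product by giving a rule as to how the basis vectors combine. Thus, for our Cartesian basis

\[ \mathbf{e}_i\cdot\mathbf{e}_j = \delta_{ij}. \]

Generally, this is of course just the metric of the basis. Note that this then allows us to write (summing over repeated indices)

\[ \begin{aligned}
\mathbf{x}\cdot\mathbf{y} &= \mathbf{e}_ix_i\cdot\mathbf{e}_jy_j \\
&= x_iy_j\,\mathbf{e}_i\cdot\mathbf{e}_j \\
&= x_iy_j\delta_{ij} \\
&= x_iy_i.
\end{aligned} \]

We write the square of a vector as the scalar product of the vector with itself;

\[ \mathbf{x}\cdot\mathbf{x} = \sum_i x_i^2. \]

The scalar product is linear in both arguments.

Notice that if $\mathbf{x} = \sum_i \mathbf{e}_ix_i$, then

\[ \mathbf{x}\cdot\mathbf{e}_i = \sum_j x_j\,\mathbf{e}_j\cdot\mathbf{e}_i = \sum_j\delta_{ij}x_j = x_i. \]

Also,

\[ \mathbf{x} = \sum_i\mathbf{e}_ix_i = \sum_{i,j}\mathbf{e}_ix_j\delta_{ij} = \sum_{i,j}\mathbf{e}_i(\mathbf{x}\cdot\mathbf{e}_j)\delta_{ij}. \]

This is merely showing that we can change the basis to make the problem simpler.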

Coordinate Transformations. Consider that we know $x_i$ in the basis $\mathbf{e}_i$. What are the $x'_i$ in the other basis $\mathbf{e}'_i$?

Now, consider our previous results that $x_i = \mathbf{x}\cdot\mathbf{e}_i$ and that $\mathbf{x} = \mathbf{x}'$. Then, combining the two, we see that

\[ x'_i = \mathbf{x}\cdot\mathbf{e}'_i = \sum_j x_j\,\mathbf{e}_j\cdot\mathbf{e}'_i = \sum_j A_{ij}x_j, \]

where we first just wrote out the vector in terms of its components. We then identified the matrix

\[ A_{ij} = \mathbf{e}'_i\cdot\mathbf{e}_j. \]

We can introduce some linear transformation, $\mathbf{y} = T\mathbf{x}$; thus

\[ y_j = \mathbf{y}\cdot\mathbf{e}_j = T\mathbf{x}\cdot\mathbf{e}_j. \]

Let us now consider complex vector spaces.
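The rule $x'_i = \mathbf{x}\cdot\mathbf{e}'_i$ can be tried out on a concrete case. The following is a minimal sketch, assuming a two-dimensional Cartesian basis and a new orthonormal basis obtained by rotation; the function names are illustrative. It shows that the components change while the vector itself (and its length) does not.

```python
import math


def dot(u, v):
    """Euclidean scalar product of two vectors given in Cartesian components."""
    return sum(ui * vi for ui, vi in zip(u, v))


def rotated_basis(angle):
    """Orthonormal 2D basis e'_1, e'_2 obtained by rotating e_1, e_2."""
    return [[math.cos(angle), math.sin(angle)],
            [-math.sin(angle), math.cos(angle)]]


def components_in(basis, x):
    """x'_i = x . e'_i for an orthonormal basis."""
    return [dot(x, e) for e in basis]


def rebuild(basis, comps):
    """Reassemble the vector sum_i x'_i e'_i in Cartesian components."""
    dim = len(basis[0])
    return [sum(c * e[k] for c, e in zip(comps, basis)) for k in range(dim)]
```

Calling `rebuild(basis, components_in(basis, x))` should return the original Cartesian components of `x` up to rounding.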

2.1.2.3 Complex Vector Spaces

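A complex vector space, with some scalar product rule, such that it gives a complete metric space, is called a Hilbert space. We shall do all this under the notation $x = \{x_i\}$.

Scalar Product. We define the scalar product thus

\[ \int\chi^*(x)\psi(x)\,dx = \sum_x\chi^*_x\psi_x V_{dx}, \quad V_{dx} = dx = \text{metric}. \]

We have that $\psi_x$ are the components of some vector $|\psi\rangle$, and $\chi^*_x$ the components of some complex conjugate (also called the adjoint) vector $\langle\chi|$. Thus, using some sort of abstract vector basis, we have

\[ |\psi\rangle = \sum_x\psi_x\mathbf{e}_x, \quad \langle\chi| = \sum_x\chi^*_x\mathbf{e}^*_x. \]

Therefore, notice that

\[ \langle\chi|\psi\rangle = \sum_{x,x'}\chi^*_x\psi_{x'}\,\mathbf{e}^*_x\cdot\mathbf{e}_{x'}, \]

and defining that $\mathbf{e}^*_x\cdot\mathbf{e}_{x'} = V_{dx}\delta_{xx'}$, then

\[ \langle\chi|\psi\rangle = \sum_x\chi^*_x\psi_xV_{dx} = \int\chi^*(x)\psi(x)\,dx. \]

Finally, notice that

\[ \langle\psi|\chi\rangle = \int\psi^*(x)\chi(x)\,dx = \left(\langle\chi|\psi\rangle\right)^*. \]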


Let us denote the basis $\mathbf{e}_x$ as $|x\rangle$ (such that it only has a unit non-zero entry at the $x$th element), then

\[ \langle x'|x\rangle = \delta_{xx'} = \delta(x'-x); \]

where we shall be flippant with continuous/discrete. This then defines an orthonormal coordinate basis, which we may then develop any function into;

\[ |\psi\rangle = \sum_x\psi_x|x\rangle. \]

We use the standard identification that $\psi_x = \psi(x)$. Similarly, we see that

\[ \langle\chi| = \sum_x\chi^*_x\langle x|. \]

Restoration of Wavefunction. We say that $\langle\chi|$ is adjoint to $|\chi\rangle$. So then, given the components in a basis, what is the wavefunction $\psi(x)$ itself? We proceed thus

\[ \begin{aligned}
\langle x|\psi\rangle &= \langle x|\sum_{x'}\psi_{x'}|x'\rangle \\
&= \sum_{x'}\psi_{x'}\langle x|x'\rangle \\
&= \sum_{x'}\psi_{x'}\delta_{xx'} \\
&= \psi_x \equiv \psi(x).
\end{aligned} \]

Therefore, we see that the restoration of the wavefunction is achieved

\[ \psi(x) = \langle x|\psi\rangle = \psi_x. \]

Completeness of the Basis. Now, consider the development of a state into a basis

\[ \begin{aligned}
|\psi\rangle &= \sum_x\psi_x|x\rangle \\
&= \sum_x\langle x|\psi\rangle\,|x\rangle \\
&= \left(\sum_x|x\rangle\langle x|\right)|\psi\rangle;
\end{aligned} \]

we thus see that the bracketed quantity must be unity, in order to make the equation consistent. Therefore, we arrive at the completeness of the basis statement

\[ \sum_x|x\rangle\langle x| = 1. \tag{2.1.1} \]

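Different Bases. Let us introduce some new basis $|n\rangle$, which satisfies orthonormality, $\langle n|m\rangle = \delta_{nm}$. Then, suppose we develop some state in terms of this new basis,

\[ |\psi\rangle = \sum_n a_n|n\rangle; \]

then, to find the coefficients $a_n$, we form the scalar product with “another” element of the basis

\[ \langle m|\psi\rangle = \sum_n a_n\langle m|n\rangle = a_m; \]

where we used the above orthonormality relation. Further, consider that

\[ \begin{aligned}
|\psi\rangle &= \sum_n a_n|n\rangle \\
&= \sum_n\langle n|\psi\rangle\,|n\rangle \\
&= \left(\sum_n|n\rangle\langle n|\right)|\psi\rangle.
\end{aligned} \]

Thus, again, we see the completeness of the basis

\[ \sum_n|n\rangle\langle n| = 1. \]

Suppose that we have

\[ \langle\chi| = \sum_n b_n\langle n|, \]

then we see that

\[ b_n = \langle\chi|n\rangle. \]

Similarly, notice that

\[ |\psi\rangle = \sum_n a_n|n\rangle \quad \text{is conjugate to} \quad \langle\psi| = \sum_n a^*_n\langle n|. \]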


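Transition Between Bases. Now, consider that we have

\[ \psi(x,t) = \langle x|\psi\rangle; \quad |\psi\rangle = \sum_n a_n|n\rangle, \quad a_n = \langle n|\psi\rangle; \]

where the first expression is something we want to convert into the basis of the last two. In the final expression, consider inserting something which is unity. Thus consider

\[ \begin{aligned}
a_n &= \langle n|\psi\rangle \\
&= \langle n|\left(\sum_x|x\rangle\langle x|\right)|\psi\rangle \\
&= \sum_x\langle n|x\rangle\psi(x) \\
&= \sum_x\langle x|n\rangle^*\psi(x) \\
&= \sum_x n^*(x)\psi(x).
\end{aligned} \]

The final expression is just the scalar product. And therefore,

\[ a_n = \int n^*(x)\psi(x)\,dx. \]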
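The formula $a_n = \int n^*(x)\psi(x)\,dx$ can be tested on a concrete basis. The following is a minimal sketch, assuming the hypothetical choice $n(x) = \sqrt{2}\sin(n\pi x)$ on $[0,1]$ (orthonormal there) and the normalised test state $\psi(x) = \sqrt{30}\,x(1-x)$; the integral is replaced by a midpoint sum, mirroring the $\sum_x \ldots V_{dx}$ form of the scalar product. All names are illustrative.

```python
import math


def inner(f, g, steps=2000):
    """Midpoint-rule approximation to the integral of f*(x) g(x) over [0, 1]."""
    dx = 1.0 / steps
    return sum(complex(f((i + 0.5) * dx)).conjugate() * g((i + 0.5) * dx)
               for i in range(steps)) * dx


def basis_function(n):
    """n(x) = sqrt(2) sin(n pi x): orthonormal on [0, 1]."""
    return lambda x: math.sqrt(2.0) * math.sin(n * math.pi * x)


def psi(x):
    """Normalised test state sqrt(30) x (1 - x)."""
    return math.sqrt(30.0) * x * (1.0 - x)


def coefficients(state, n_max):
    """a_n = <n|psi> for n = 1..n_max."""
    return [inner(basis_function(n), state) for n in range(1, n_max + 1)]
```

Completeness of the basis shows up as $\sum_n |a_n|^2$ approaching 1 as more basis functions are included.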

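Discrete vs. Continuous. Suppose that $\lambda$ is an index describing some basis. It could be continuous or discrete. We use the “generalised Kronecker symbol” when denoting such ambiguities in

\[ \langle\lambda|\lambda'\rangle = \delta_{\lambda\lambda'}. \]

The generalised Kronecker symbol is to be used thus:

\[ \delta_{\lambda\lambda'} = \begin{cases}
\delta_{\lambda\lambda'} & \text{the normal Kronecker delta, if indices discrete,} \\
\delta(\lambda-\lambda') & \text{the Dirac delta, if indices continuous,} \\
0 & \text{otherwise.}
\end{cases} \]

If $\lambda = \{\lambda_i\}_{i=1}^n$, a multi-index label, then we modify each

\[ \delta_{\lambda\lambda'} = \prod_{i=1}^n\delta_{\lambda_i\lambda'_i}, \quad \delta(\lambda-\lambda') = \prod_{i=1}^n\delta(\lambda_i-\lambda'_i). \]

And therefore, giving the required generalised completeness of the basis

\[ \sum_\lambda|\lambda\rangle\langle\lambda| = 1. \]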


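Example: Momentum & Position Basis. Consider that $|x\rangle$ is the coordinate basis, and that $|p\rangle$ is the momentum basis. Then,

\[ |\psi\rangle = \sum_x\psi(x)|x\rangle = \sum_p\psi(p)|p\rangle. \]

Then, using the final expression,

\[ \begin{aligned}
\psi(p) &= \langle p|\psi\rangle \\
&= \sum_x\langle p|x\rangle\langle x|\psi\rangle \\
&= \sum_x\langle p|x\rangle\psi(x).
\end{aligned} \]

In the first step, all we did was to insert the usual “unity”, and progress that through. Notice then that

\[ \langle p|x\rangle = \langle x|p\rangle^* = P^*(x), \]

and that (given $\hat{P}|p\rangle = p|p\rangle$)

\[ \langle x|\hat{P}|p\rangle = \langle x|p|p\rangle = p\langle x|p\rangle = pP(x). \]

Let us also insert our “unity” into the first expression,

\[ \langle x|\hat{P}|p\rangle = \sum_{x'}\langle x|\hat{P}|x'\rangle\langle x'|p\rangle = -i\hbar\sum_{x'}\frac{d}{dx}\delta_{xx'}P(x') = pP(x). \]

Thus,

\[ -i\hbar\frac{dP}{dx} = pP(x), \]

which we fairly easily solve to

\[ P(x) = \frac{1}{\left(\sqrt{2\pi\hbar}\right)^n}e^{ipx/\hbar}, \]

where $n$ is the dimension of the space in which we are working. Finally, we see that

\[ \psi(p) = \sum_x\langle p|x\rangle\psi(x) = \int P^*(x)\psi(x)\,dx = \frac{1}{\left(\sqrt{2\pi\hbar}\right)^n}\int e^{-ipx/\hbar}\psi(x)\,dx. \]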
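The final expression is a Fourier transform, and it can be evaluated numerically for a simple state. The following is a minimal sketch in one spatial dimension ($n = 1$), assuming $\hbar = 1$ by default, a truncated integration range, and a Gaussian test state whose transform is known to be the same Gaussian in $p$; the function names and numerical parameters are illustrative.

```python
import cmath
import math


def momentum_wavefunction(psi, p, hbar=1.0, x_max=10.0, steps=4000):
    """psi(p) = (2 pi hbar)^(-1/2) * integral of exp(-i p x / hbar) psi(x) dx.

    One spatial dimension (n = 1); the integral is truncated to
    [-x_max, x_max] and evaluated with the midpoint rule.
    """
    dx = 2.0 * x_max / steps
    total = 0j
    for i in range(steps):
        x = -x_max + (i + 0.5) * dx
        total += cmath.exp(-1j * p * x / hbar) * psi(x) * dx
    return total / math.sqrt(2.0 * math.pi * hbar)


def gaussian(x):
    """Normalised Gaussian pi^(-1/4) exp(-x^2 / 2), used as a test state."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)
```

For the Gaussian above (with $\hbar = 1$), `momentum_wavefunction(gaussian, p)` should agree with $\pi^{-1/4}e^{-p^2/2}$.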


2.1.3 Quantum Mechanical Operators

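As we have seen, an operator $\hat{Q}$ is a linear machine which transforms one quantum state into another,

\[ \hat{Q}: \psi_i \longmapsto \chi_i. \]

That it is linear implies

\[ \hat{Q}: \alpha\psi_1 + \beta\psi_2 \longmapsto \alpha\chi_1 + \beta\chi_2. \]

Now, we have already seen that we can write a quantum state as a development (sum) over an orthonormal basis,

\[ |\psi\rangle = \sum_n a_n|n\rangle. \]

Then, the operator acting on the basis gives

\[ \begin{aligned}
\hat{Q}|n\rangle &= \sum_m|m\rangle\langle m|\hat{Q}|n\rangle \\
&= \sum_m Q_{mn}|m\rangle, \quad Q_{mn} \equiv \langle m|\hat{Q}|n\rangle.
\end{aligned} \]

And thus, the operator acting on a state,

\[ \hat{Q}|\psi\rangle = \sum_n a_n\hat{Q}|n\rangle = \sum_{n,m} a_nQ_{mn}|m\rangle. \]

In this way, we say that $Q_{mn}$ is the operator $\hat{Q}$ in $|n\rangle$ representation.

Now, suppose we want to find $\hat{Q}$, if we know $Q_{mn}$. Then, consider that

\[ |\chi\rangle = \hat{Q}|\psi\rangle; \]

inserting two “unities” on the RHS,

\[ |\chi\rangle = 1\cdot\hat{Q}\cdot 1\cdot|\psi\rangle, \]

that is,

\[ |\chi\rangle = \sum_{n,m}|n\rangle\langle n|\hat{Q}|m\rangle\langle m|\psi\rangle. \]

Now, we see $Q_{nm}$ present,

\[ \begin{aligned}
|\chi\rangle &= \sum_{n,m}|n\rangle Q_{nm}\langle m|\psi\rangle \\
&= \sum_{n,m}Q_{nm}|n\rangle\langle m|\psi\rangle \\
&= \hat{Q}|\psi\rangle.
\end{aligned} \]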


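Thus, it is clear that

\[ \hat{Q} = \sum_{n,m}Q_{nm}|n\rangle\langle m|. \tag{2.1.2} \]

Suppose that $Q_{nm}$ is diagonal (that is, all off-diagonal elements are zero),

\[ Q_{nm} = q_n\delta_{nm}, \]

then it is clear that

\[ \begin{aligned}
\hat{Q} &= \sum_{n,m}Q_{nm}|n\rangle\langle m| \\
&= \sum_{n,m}q_n\delta_{nm}|n\rangle\langle m| \\
&= \sum_n q_n|n\rangle\langle n|.
\end{aligned} \]

Then, we say that $\hat{Q}$ is diagonal in $|n\rangle$ representation;

\[ \hat{Q} = \sum_n q_n|n\rangle\langle n|. \tag{2.1.3} \]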

2.1.3.1 Operator Formalism

Here we shall go through more operator formalism & their properties.

The Correspondence Principle. For every classical value (i.e. observable), $f$, there exists a quantum mechanical operator $\hat{f}$. The converse is not true (for example, spin & parity). Thus,

\[ f(p,q) \longmapsto \hat{f}\left(-i\hbar\frac{d}{dx}, x\right). \]

The Expected Value. An average value of a measured quantity $Q$ is given by $\langle\psi|\hat{Q}|\psi\rangle$, given that the system was in the state $|\psi\rangle$ immediately before measurement. Let us show this relation.

Now, we have our state & operator,

\[ |\psi\rangle = \sum_n a_n|n\rangle, \quad \hat{Q} = \sum_n q_n|n\rangle\langle n|. \]

Then, with probability $|a_n|^2$, we have outcome $q_n$. Then, the average value of $\hat{Q}$ (the average denoted with angled braces) is just the sum over all possible values times the probability of getting that value;

\[ \langle\hat{Q}\rangle = \sum_n q_n|a_n|^2. \]


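Now, we have that

\[ \langle n|\psi\rangle = \langle n|\sum_m a_m|m\rangle = a_n, \]

so that

\[ \langle Q\rangle = \sum_n q_n|\langle n|\psi\rangle|^2 = \sum_n q_n\langle n|\psi\rangle\left(\langle n|\psi\rangle\right)^* , \]

however, $(\langle n|\psi\rangle)^* = \langle\psi|n\rangle$. So,

\[ \langle Q\rangle = \sum_n q_n\langle n|\psi\rangle\langle\psi|n\rangle = \sum_n q_n\langle\psi|n\rangle\langle n|\psi\rangle = \langle\psi|\Big(\sum_n q_n|n\rangle\langle n|\Big)|\psi\rangle = \langle\psi|\hat{Q}|\psi\rangle . \]

Hence shown.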
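As a quick numerical illustration of the result just shown, the following sketch compares the probability-weighted sum with the "sandwich" $\langle\psi|\hat{Q}|\psi\rangle$ for a two-level system. The eigenvalues and amplitudes are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
# Hypothetical two-level example: eigenvalues q_n of Q and the amplitudes
# a_n = <n|psi> of a normalised state in the eigenbasis of Q.
q = [1.0, -1.0]
a = [complex(0.6, 0.0), complex(0.0, 0.8)]  # |a_0|^2 + |a_1|^2 = 1


def expectation_from_probabilities(q, a):
    """Sum over outcomes q_n weighted by the probabilities |a_n|^2."""
    return sum(qn * abs(an) ** 2 for qn, an in zip(q, a))


def expectation_from_sandwich(q, a):
    """Evaluate <psi|Q|psi> with Q = sum_n q_n |n><n| written as a matrix."""
    dim = len(q)
    Q = [[q[n] if n == m else 0.0 for m in range(dim)] for n in range(dim)]
    Q_psi = [sum(Q[n][m] * a[m] for m in range(dim)) for n in range(dim)]
    return sum(a[n].conjugate() * Q_psi[n] for n in range(dim))


print(expectation_from_probabilities(q, a))
print(expectation_from_sandwich(q, a))
```

The two printed values should agree up to rounding, the second carrying a vanishing imaginary part.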

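Hermitian Conjugates For every operator $\hat{Q}$, there exists an Hermitian conjugate $\hat{Q}^\dagger$, which acts to the left. Thus,

\[ |\chi\rangle = \hat{Q}|\psi\rangle , \qquad \langle\chi| = \langle\psi|\hat{Q}^\dagger . \]

We have used that the Hermitian conjugate of a ket-state is a bra-state.

Now, we have that an operator is

\[ \hat{Q} = \sum_{n,m} Q_{nm}|n\rangle\langle m| , \]

then, taking the Hermitian conjugate of the whole expression,

\[ \hat{Q}^\dagger = \sum_{n,m} Q^*_{nm}|m\rangle\langle n| . \]

Now then, we have that

\[ \langle\chi| = (|\chi\rangle)^\dagger = \left(\hat{Q}|\psi\rangle\right)^\dagger , \]

putting in two unities,

\[ \left(\hat{Q}|\psi\rangle\right)^\dagger = \sum_{n,m}\left(|n\rangle\langle n|\hat{Q}|m\rangle\langle m|\psi\rangle\right)^\dagger . \]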


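Then, we can see that this is

\[ \left(\hat{Q}|\psi\rangle\right)^\dagger = \sum_{n,m}\langle n|\hat{Q}|m\rangle^*\langle m|\psi\rangle^*\langle n| , \]

but, $\langle m|\psi\rangle^* = \langle\psi|m\rangle$, so

\[ \left(\hat{Q}|\psi\rangle\right)^\dagger = \sum_{n,m}\langle n|\hat{Q}|m\rangle^*\langle\psi|m\rangle\langle n| . \]

Rearranging this,

\[ \left(\hat{Q}|\psi\rangle\right)^\dagger = \sum_{n,m}\langle\psi|m\rangle\langle n|\,\langle n|\hat{Q}|m\rangle^* \tag{2.1.4} \]

\[ \phantom{\left(\hat{Q}|\psi\rangle\right)^\dagger} = \sum_{n,m}\langle\psi|m\rangle\langle n|\,Q^*_{nm} . \tag{2.1.5} \]

Now, writing the LHS as

\[ \left(\hat{Q}|\psi\rangle\right)^\dagger = \langle\psi|\hat{Q}^\dagger , \]

and expanding out the bra-state in terms of its basis,

\[ \langle\psi|\hat{Q}^\dagger = \sum_m a^*_m\langle m|\hat{Q}^\dagger . \]

Then, as $a^*_m = \langle m|\psi\rangle^* = \langle\psi|m\rangle$, we see that this is just

\[ \langle\psi|\hat{Q}^\dagger = \sum_m\langle\psi|m\rangle\langle m|\hat{Q}^\dagger = \sum_{n,m}\langle\psi|m\rangle\langle m|\hat{Q}^\dagger|n\rangle\langle n| = \sum_{n,m}\langle\psi|m\rangle\langle n|\,(Q^\dagger)_{mn} . \]

So, comparing this with (2.1.5), we see that $(Q^\dagger)_{mn} = Q^*_{nm}$, or, relabelling the indices,

\[ (Q^\dagger)_{nm} = Q^*_{mn} . \tag{2.1.6} \]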

Combinations of Operators We state, and then prove, that if

\[ \hat{Q} = \hat{A}\hat{B}, \]

then

\[ \hat{Q}^\dagger = \hat{B}^\dagger\hat{A}^\dagger . \]


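So, the proof. Let us have

\[ |\chi\rangle = \hat{B}|\psi\rangle , \]

and that

\[ |\eta\rangle = \hat{A}|\chi\rangle . \]

Then, it is clear that

\[ |\eta\rangle = \hat{A}|\chi\rangle = \hat{A}\hat{B}|\psi\rangle . \]

Now, consider

\[ (|\eta\rangle)^\dagger = \left(\hat{A}|\chi\rangle\right)^\dagger , \]

which is just

\[ \langle\eta| = \langle\chi|\hat{A}^\dagger . \]

Similarly,

\[ \langle\chi| = \langle\psi|\hat{B}^\dagger . \]

And therefore, we see that

\[ \langle\eta| = \langle\chi|\hat{A}^\dagger = \langle\psi|\hat{B}^\dagger\hat{A}^\dagger . \]

Therefore, we have proven our assertion. In a similar way, we can prove that

\[ \hat{Q} = \hat{A}\hat{B}\hat{C} \;\Rightarrow\; \hat{Q}^\dagger = \hat{C}^\dagger\hat{B}^\dagger\hat{A}^\dagger , \]

\[ \hat{Q} = \hat{A}^2\hat{B}^3\hat{C}^{-1/2} \;\Rightarrow\; \hat{Q}^\dagger = (\hat{C}^{-1/2})^\dagger(\hat{B}^\dagger)^3(\hat{A}^\dagger)^2 . \]

Whereby

\[ \left(\langle m|\hat{Q}|n\rangle\right)^* = \langle n|\hat{Q}^\dagger|m\rangle . \]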
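The product rule for Hermitian conjugates can be checked on small matrices. The sketch below is only an illustration: the $2\times 2$ complex matrices are arbitrary made-up numbers, and the helper names `matmul`, `dagger` and `close` are hypothetical.

```python
def matmul(A, B):
    """Matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def dagger(M):
    """Hermitian conjugate: transpose and complex-conjugate, (M^dagger)_ij = M_ji^*."""
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]


def close(A, B, tol=1e-12):
    """Element-wise comparison within a tolerance."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Arbitrary illustrative operators in some orthonormal basis.
A = [[1 + 2j, 0 + 3j], [2 + 0j, 1 - 1j]]
B = [[0 + 0.5j, 1 + 0j], [1 - 1j, 0 + 2j]]

lhs = dagger(matmul(A, B))           # (AB)^dagger
rhs = matmul(dagger(B), dagger(A))   # B^dagger A^dagger
print(close(lhs, rhs))
```

Note that the order reversal matters: comparing against `matmul(dagger(A), dagger(B))` would generally fail for non-commuting matrices.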

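Commutators Operators do not generally commute. That is, we cannot change their order without changing the result. The commutator is defined as

\[ [\hat{A}, \hat{B}] \equiv \hat{A}\hat{B} - \hat{B}\hat{A} . \]

For example, position & momentum do not commute,

\[ [\hat{x}, \hat{p}] = i\hbar . \]

We can also form

\[ [\hat{A}^2, \hat{B}] = \hat{A}[\hat{A}, \hat{B}] + [\hat{A}, \hat{B}]\hat{A} . \]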
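The identity for $[\hat{A}^2, \hat{B}]$ follows from adding and subtracting $\hat{A}\hat{B}\hat{A}$, and it can be confirmed on explicit matrices. The following is a minimal sketch with arbitrary integer matrices (made-up values, so that the comparison is exact); the helper names are hypothetical.

```python
def matmul(A, B):
    """Matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def comm(A, B):
    """Commutator [A, B] = AB - BA."""
    return sub(matmul(A, B), matmul(B, A))


# Arbitrary non-commuting integer matrices.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

lhs = comm(matmul(A, A), B)                            # [A^2, B]
rhs = add(matmul(A, comm(A, B)), matmul(comm(A, B), A))  # A[A,B] + [A,B]A
print(lhs == rhs)
```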


Hermitian Operators An Hermitian operator is one for which

\[ \hat{Q}^\dagger = \hat{Q}, \qquad Q^*_{nm} = Q_{mn} . \]

Physical observables correspond to Hermitian operators. This is because the expectation values (and hence the eigenvalues) of an Hermitian operator are real. Thus, consider

\[ q = \langle\psi|\hat{Q}|\psi\rangle ; \qquad q^* = \langle\psi|\hat{Q}|\psi\rangle^* = \langle\psi|\hat{Q}^\dagger|\psi\rangle = q . \]

Where the last step follows as the Hermitian conjugate of an Hermitian operator is itself. Therefore, $q = q^*$, and therefore, we state

\[ q = \langle\psi|\hat{Q}|\psi\rangle \in \mathbb{R} . \]

We shall look more at this in a "few points time".

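Unitary Operators Unitary operators, $\hat{U}$, are such that

\[ \hat{U}^\dagger\hat{U} = 1 . \]

They are useful, as they conserve the norm. That is, consider

\[ |\chi\rangle = \hat{U}|\psi\rangle , \qquad \langle\chi| = \langle\psi|\hat{U}^\dagger . \]

Then, for a normalised state $|\psi\rangle$,

\[ \langle\chi|\chi\rangle = \langle\psi|\hat{U}^\dagger\hat{U}|\psi\rangle = \langle\psi|\psi\rangle = 1 . \]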

Eigenvalues & Eigenstates An eigenvalue equation is such that when an operator operates on an eigenstate, the eigenstate is returned, multiplied by a number;

\[ \hat{Q}|\psi\rangle = \lambda|\psi\rangle . \]

Where the eigenstates have a finite norm,

\[ \langle\psi|\psi\rangle < \infty . \]

As an example, consider the momentum operator,

\[ -i\hbar\frac{d}{dx}\psi = \lambda\psi \;\Rightarrow\; \psi = Ce^{i\lambda x/\hbar} . \]

For the exponential not to diverge, its argument must be purely imaginary, thus leaving $\lambda \in \mathbb{R}$.


More on Hermitian Operators: Basis Hermitian operators have real eigenvalues; and eigenstates of Hermitian operators form a complete orthonormal basis.

So, we have that

\[ \hat{Q} = \sum_n q_n|n\rangle\langle n| . \]

Then, letting $|n\rangle$ be a normalised eigenvector of an Hermitian operator $\hat{Q}$,

\[ \hat{Q}|n\rangle = q_n|n\rangle \;\Rightarrow\; q_n = \langle n|\hat{Q}|n\rangle . \]

Thus, $q_n$ is the expectation value of $\hat{Q}$, when the system is initially in $|n\rangle$, an eigenstate. Also, the eigenvalues $q_n$ are real (as was previously shown).

Now, to show orthonormality, consider the action of the Hermitian operator on two different eigenstates, each having a different eigenvalue (i.e. they are non-degenerate);

\[ \hat{Q}|n\rangle = q_n|n\rangle , \qquad \hat{Q}|m\rangle = q_m|m\rangle . \]

Multiplying the first by $\langle m|$, and the second by $\langle n|$, results in

\[ \langle m|\hat{Q}|n\rangle = q_n\langle m|n\rangle , \qquad \langle n|\hat{Q}|m\rangle = q_m\langle n|m\rangle . \]

If we take the conjugate of the second expression (using that $q_m$ is real),

\[ \left(\langle n|\hat{Q}|m\rangle\right)^* = q^*_m\left(\langle n|m\rangle\right)^* \;\Rightarrow\; \langle m|\hat{Q}^\dagger|n\rangle = q_m\langle m|n\rangle , \]

noting that the operator is Hermitian, then this is just $\langle m|\hat{Q}|n\rangle = q_m\langle m|n\rangle$. So, subtracting this from the first expression,

\[ \langle m|\hat{Q}|n\rangle - \langle m|\hat{Q}|n\rangle = q_n\langle m|n\rangle - q_m\langle m|n\rangle , \]

which is just

\[ (q_n - q_m)\langle m|n\rangle = 0 . \]

Therefore, given that $q_n \neq q_m$, we see that

\[ \langle m|n\rangle = 0 . \]

And, if we had chosen suitable normalisation, we can also write

\[ \langle n|m\rangle = \delta_{nm}, \qquad q_n \neq q_m . \]

Thus, we have completed the proof of the orthonormality of the eigenstates of Hermitian operators. The proof of completeness is hard, and we shall not do it here.
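The orthogonality argument above can be illustrated on a $2\times 2$ Hermitian matrix, where the eigenvalues follow from the characteristic quadratic. This is only a sketch: the matrix entries are made-up numbers, and the function names are hypothetical.

```python
import math

# An arbitrary 2x2 Hermitian matrix (H[1][0] is the conjugate of H[0][1]).
H = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]


def eigen_2x2_hermitian(H):
    """Eigenvalues from the characteristic quadratic, eigenvectors from the
    first row of (H - q) v = 0. Assumes H[0][1] != 0."""
    trace = (H[0][0] + H[1][1]).real
    det = (H[0][0] * H[1][1] - H[0][1] * H[1][0]).real
    disc = math.sqrt(trace * trace - 4.0 * det)
    values = [(trace - disc) / 2.0, (trace + disc) / 2.0]
    vectors = [[H[0][1], q - H[0][0]] for q in values]  # unnormalised
    return values, vectors


def inner(u, v):
    """Inner product <u|v>, conjugating the bra components."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))


values, vectors = eigen_2x2_hermitian(H)
print(values)                               # real eigenvalues
print(abs(inner(vectors[0], vectors[1])))   # overlap of the two eigenvectors
```

The eigenvalues come out real and distinct, and the overlap of the two eigenvectors vanishes up to rounding, as the proof requires.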


Using orthonormality, we can show that the representation of an Hermitian operator in the basis of its eigenstates is diagonal. So, writing the operator down, with two unities,

\[ \hat{Q} = \sum_{n,m}|n\rangle\langle n|\hat{Q}|m\rangle\langle m| = \sum_{n,m}|n\rangle\langle n|q_m|m\rangle\langle m| = \sum_{n,m} q_m|n\rangle\langle n|m\rangle\langle m| = \sum_{n,m} q_m|n\rangle\delta_{nm}\langle m| = \sum_n q_n|n\rangle\langle n| . \]

Thus, proven. It is important to note that this was in the basis of eigenstates of the operator, and that the eigenstates were orthonormal.

Finally, suppose that 3 operators commute & are Hermitian. Then, there exists a basis in which all 3 operators are diagonal,

\[ \hat{A}|n\rangle = a_n|n\rangle , \qquad \hat{B}|n\rangle = b_n|n\rangle , \qquad \ldots . \]

2.1.3.2 The Schrodinger Equation

Let us look at the Schrodinger equation, in light of our previous discussion. So, we have

\[ i\hbar\frac{\partial\psi(x,t)}{\partial t} = \hat{H}(x,t)\psi(x,t), \]

the RHS of which we write as

\[ \hat{H}(x,t)\psi(x,t) = \sum_{x'} H_{xx'}\delta_{xx'}\psi(x',t) . \]

In writing this, we are taking the usual Hamiltonian (kinetic plus potential), and writing it as a sum. We are not forcing it to be diagonal, as it is not (for example, in spherical polars).

We write

\[ H_{xx'}\delta_{xx'} = \langle x|\hat{H}|x'\rangle . \]

Then, putting this in, and multiplying by a bra state,

\[ i\hbar\frac{\partial}{\partial t}\langle x|\psi\rangle = \sum_{x'}\langle x|\hat{H}|x'\rangle\langle x'|\psi\rangle . \]


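Now, we notice that (a) we can cancel the $\langle x|$, and that (b) we see unity in $|x'\rangle\langle x'|$. So, the above becomes

\[ i\hbar\frac{\partial}{\partial t}|\psi\rangle = \hat{H}|\psi\rangle . \tag{2.1.7} \]

Thus, the Schrodinger equation. We also denote it

\[ i\hbar|\dot{\psi}\rangle = \hat{H}|\psi\rangle . \]

Now, we have that

\[ \langle x|\psi\rangle = \psi(x,t), \qquad H_{xx'}\delta_{xx'} = \langle x|\hat{H}|x'\rangle . \]

Then, putting two unities either side of the Hamiltonian,

\[ \hat{H} = \sum_{x,x'}|x\rangle\langle x|\hat{H}|x'\rangle\langle x'| , \]

which is clearly,

\[ \sum_{x,x'}|x\rangle\langle x|\hat{H}|x'\rangle\langle x'| = \sum_{x,x'}|x\rangle H_{xx'}\delta_{xx'}\langle x'| = \sum_{x,x'} H_{xx'}\delta_{xx'}|x\rangle\langle x'| = \sum_{x,x'}\left(-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V(x)\right)\delta_{xx'}|x\rangle\langle x'| . \]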

2.1.3.3 Properties of the Hamiltonian

The Hamiltonian is an Hermitian operator,

\[ \hat{H} = \hat{H}^\dagger , \]

which is clear as it corresponds to energy. The corresponding eigenvalue equation is

\[ \hat{H}|\psi\rangle = E|\psi\rangle , \qquad E \in \mathbb{R} . \]

Where $E$ is the energy of the system.

Let $|n\rangle$ be an orthonormal basis of energy eigenstates

\[ \hat{H}|n\rangle = E_n|n\rangle . \]

Then,

\[ |\psi\rangle = e^{-iE_nt/\hbar}|n\rangle \]


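satisfies the Schrodinger equation. So, the Schrodinger equation is

\[ i\hbar|\dot{\psi}\rangle = E_n|\psi\rangle , \]

and thus

\[ \hat{H}|\psi\rangle = E_n|\psi\rangle . \]

To see this, consider that

\[ |\psi\rangle = e^{-iE_nt/\hbar}|n\rangle \;\Rightarrow\; |\dot{\psi}\rangle = -\frac{iE_n}{\hbar}|\psi\rangle , \]

the RHS expression obviously rearranges into

\[ i\hbar|\dot{\psi}\rangle = E_n|\psi\rangle . \]

Now,

\[ |\psi\rangle = \sum_n a_ne^{-iE_nt/\hbar}|n\rangle \]

is also a solution to the Schrodinger equation.

Now, notice that

\[ |\psi(t=0)\rangle = \sum_n a_n|n\rangle \;\Rightarrow\; a_n = \langle n|\psi(0)\rangle , \]

and therefore that

\[ |\psi\rangle = \sum_n\langle n|\psi(0)\rangle e^{-iE_nt/\hbar}|n\rangle \]

is also a solution, for all initial states. Rearranging this slightly,

\[ |\psi\rangle = \sum_n e^{-iE_nt/\hbar}|n\rangle\langle n|\psi(0)\rangle = \hat{U}(t)|\psi(0)\rangle , \tag{2.1.8} \]

where we have defined the (unitary; but more on this later) evolution operator

\[ \hat{U}(t) \equiv \sum_n e^{-iE_nt/\hbar}|n\rangle\langle n| . \tag{2.1.9} \]

Note, we can write the Hamiltonian as

\[ \hat{H} = \sum_n E_n|n\rangle\langle n| . \]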
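To make the spectral form of the evolution operator concrete, the sketch below builds $\hat{U}(t) = \sum_n e^{-iE_nt/\hbar}|n\rangle\langle n|$ as a matrix for a two-level system and checks that $\hat{U}^\dagger\hat{U} = 1$. The energies, the eigenvector components and the choice of units with $\hbar = 1$ are all assumptions made for illustration; the function names are hypothetical.

```python
import cmath
import math

HBAR = 1.0                    # units with hbar = 1 (assumption)
ENERGIES = [1.0, 3.0]         # made-up energy eigenvalues E_n
S = 1.0 / math.sqrt(2.0)
# Components of the orthonormal energy eigenstates |n> in a reference basis.
EIGVECS = [[S, S], [S, -S]]


def evolution_operator(t):
    """U(t) = sum_n exp(-i E_n t / hbar) |n><n| as a nested-list matrix."""
    dim = len(ENERGIES)
    U = [[0j] * dim for _ in range(dim)]
    for E, v in zip(ENERGIES, EIGVECS):
        phase = cmath.exp(-1j * E * t / HBAR)
        for i in range(dim):
            for j in range(dim):
                U[i][j] += phase * v[i] * complex(v[j]).conjugate()
    return U


def dagger(M):
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


U = evolution_operator(0.7)
print(matmul(dagger(U), U))   # should be the identity up to rounding
```

Unitarity here rests on the phases having unit modulus and on the completeness of the $|n\rangle$; a complex $E_n$ would break it.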


2.1.4 Quantum Evolution

2.1.4.1 The Schrodinger Picture

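Let us start by stating the Schrodinger equation & the result derived above,

\[ i\hbar|\dot{\psi}\rangle = \hat{H}|\psi\rangle , \qquad |\psi\rangle = \hat{U}(t)|\psi(0)\rangle . \]

Now, the Schrodinger equation may be "solved" by analogy with the ordinary differential equation

\[ i\hbar\dot{x} = Hx \;\Rightarrow\; x = e^{-\frac{i}{\hbar}\int H\,dt}\,x(0) . \]

So that by the correspondence principle,

\[ |\psi\rangle = e^{-\frac{i}{\hbar}\int\hat{H}\,dt}|\psi(0)\rangle . \]

Now, if the Hamiltonian is not a function of time, then this is just

\[ |\psi\rangle = e^{-\frac{it}{\hbar}\hat{H}}|\psi(0)\rangle . \]

So, we see that the evolution operator is

\[ \hat{U} = e^{-\frac{it}{\hbar}\hat{H}}, \qquad \hat{U}^\dagger = e^{\frac{it}{\hbar}\hat{H}} . \]

Notice that this operator is unitary, as

\[ \hat{U}^\dagger\hat{U} = 1 . \]

Thus, notice that

\[ \langle\psi|\psi\rangle = \langle\psi(0)|\hat{U}^\dagger\hat{U}|\psi(0)\rangle = 1 . \]

So, the Schrodinger picture of quantum mechanics is just this. The state of the system evolves, with the Hamiltonian (or, any operator) being constant in time.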

2.1.4.2 The Heisenberg Picture

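Here, consider

\[ q(t) = \langle\psi(t)|\hat{Q}|\psi(t)\rangle , \]

and that the states are found via the evolution operator,

\[ |\psi(t)\rangle = \hat{U}(t)|\psi(0)\rangle , \]

so that

\[ q(t) = \langle\psi(0)|\hat{U}^\dagger(t)\hat{Q}\hat{U}(t)|\psi(0)\rangle . \]

Then, if we define some new operator, which is a function of time

\[ \hat{Q}(t) \equiv \hat{U}^\dagger(t)\hat{Q}\hat{U}(t), \]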


then we have

\[ q(t) = \langle\psi(0)|\hat{Q}(t)|\psi(0)\rangle . \]

So now, we have that the operator evolves in time, rather than the state.

So, we have the Heisenberg equation,

\[ \frac{d}{dt}\hat{Q}(t) = \frac{\partial\hat{Q}}{\partial t} - \frac{i}{\hbar}[\hat{Q}, \hat{H}] . \tag{2.1.10} \]

Sometimes the time dependent Heisenberg operator is denoted $\hat{Q}_H(t)$, to distinguish it from the static $\hat{Q}$.

2.1.4.3 The Dirac Picture

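This picture is also called the interaction picture.

Here, we consider the Hamiltonian to be a sum of a known Hamiltonian, and an interaction potential,

\[ \hat{H} = \hat{H}_0 + \hat{V}(t) . \]

So, we have our evolution operator, $\hat{U}_0$, due to the known Hamiltonian $\hat{H}_0$,

\[ \hat{U}_0(t) = e^{-\frac{i}{\hbar}\hat{H}_0t} . \]

Then, taking our expectation value, and inserting two unities,

\[ q(t) = \langle\psi(t)|\hat{U}_0\hat{U}_0^\dagger\hat{Q}\hat{U}_0\hat{U}_0^\dagger|\psi(t)\rangle , \]

defining a new state vector,

\[ |\tilde{\psi}(t)\rangle \equiv \hat{U}_0^\dagger|\psi(t)\rangle , \]

and operator,

\[ \tilde{Q} \equiv \hat{U}_0^\dagger\hat{Q}\hat{U}_0 , \]

then it is clear that

\[ q(t) = \langle\tilde{\psi}(t)|\tilde{Q}|\tilde{\psi}(t)\rangle . \]

Then, the Schrodinger equation, in the interaction picture, is

\[ i\hbar\frac{d}{dt}|\tilde{\psi}(t)\rangle = \tilde{V}|\tilde{\psi}(t)\rangle . \]

So, we have evolution due to the interaction operator.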

2.1.4.4 Examples

Here we shall consider various examples for changing basis for operators.


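Momentum in Momentum-representation The general form for the momentum operator is

\[ \hat{p} = -i\hbar\sum_{x,x'}\frac{d}{dx}\delta_{xx'}|x\rangle\langle x'| , \]

and

\[ p_{xx'} = -i\hbar\frac{d}{dx'}\delta_{xx'} . \]

Now, the momentum operator acting on a state gives a new state,

\[ \hat{p}|\psi\rangle = |\chi\rangle . \]

Projecting this into $x$-representation,

\[ \langle x|\hat{p}|\psi\rangle = \langle x|\chi\rangle \equiv \chi(x) . \]

Now, let us insert a unity on the LHS,

\[ \sum_{x'}\langle x|\hat{p}|x'\rangle\langle x'|\psi\rangle . \]

Now, this is just

\[ \sum_{x'} p_{xx'}\psi(x') = -i\hbar\sum_{x'}\frac{d}{dx'}\delta_{xx'}\psi(x'), \]

which is clearly just

\[ -i\hbar\frac{d}{dx}\psi(x) . \]

And this is the result of acting the momentum operator on $\psi(x)$, in $x$-representation.

Let us consider the $p$-representation. So,

\[ \hat{p} = \sum_{p,p'}|p\rangle\langle p|\hat{p}|p'\rangle\langle p'| , \]

after inserting two unities. Now, we have the eigenvalue equation, in $p$-representation,

\[ \hat{p}|p\rangle = p|p\rangle . \]

Therefore, we see that the term $\hat{p}|p'\rangle = p'|p'\rangle$, and thus that

\[ \hat{p} = \sum_{p,p'} p'|p\rangle\langle p|p'\rangle\langle p'| . \]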


However, we are working in an orthonormal basis, so

\[ \hat{p} = \sum_{p,p'} p'|p\rangle\delta_{pp'}\langle p'| = \sum_p p|p\rangle\langle p| . \]

Thus, the momentum operator, in $p$-representation, is diagonal, as expected.

Now, the wavefunction in $p$-representation is

\[ \langle p|\psi\rangle = \psi(p) \equiv a(p) . \]

This is continuous, as momentum is continuous. Notice that in $n$-representation (for energy),

\[ \langle n|\psi\rangle = a_n . \]

Now, consider acting $\hat{p}$ onto $a(p)$, in $p$-representation. Then,

\[ \hat{p}\,a(p) = \sum_{p'} p\,\delta_{pp'}a(p') = p\,a(p), \]

as $\hat{p}$ is diagonal in $p$-representation. Therefore, it is easy to see that

\[ \hat{p}^2a(p) = p^2a(p) . \]

Kinetic Energy in p-representation Consider the kinetic energy operator,

\[ \frac{\hat{p}^2}{2m}, \]

in $p$-representation. So,

\[ \hat{p}^2 = \sum_p p^2|p\rangle\langle p| . \]

Then,

\[ \frac{\hat{p}^2}{2m}a(p) = \frac{p^2}{2m}a(p) . \]

Hence, the kinetic energy in momentum-space is pretty simple.

Interaction V (x) in p-representation Now, consider

\[ \hat{V}(x) = \sum_{p,p'}|p\rangle\langle p|\hat{V}(x)|p'\rangle\langle p'| , \]

putting in another two unities,

\[ \hat{V}(x) = \sum_{p,p',x,x'}|p\rangle\langle p|x\rangle\langle x|\hat{V}(x)|x'\rangle\langle x'|p'\rangle\langle p'| . \]


Now, notice the terms

\[ \langle x|\hat{V}(x)|x'\rangle = V(x)\langle x|x'\rangle = V(x)\delta_{xx'}, \qquad \langle x|p\rangle = p(x) = Ce^{ipx/\hbar} . \]

Thus,

\[ \hat{V}(x) = \sum_{p,p',x}|p\rangle C^*e^{-ipx/\hbar}V(x)Ce^{ip'x/\hbar}\langle p'| , \]

which is just

\[ \hat{V}(x) = |C|^2\sum_{p,p',x}e^{-\frac{i}{\hbar}x(p-p')}V(x)|p\rangle\langle p'| . \]

Now, we shall send the sum on $x$ to an integral on $x$,

\[ \hat{V}(x) = |C|^2\sum_{p,p'}\int e^{-\frac{i}{\hbar}x(p-p')}V(x)\,dx\,|p\rangle\langle p'| . \]

Let us now define the integral to be

\[ V_{p-p'} \equiv |C|^2\int e^{-\frac{i}{\hbar}x(p-p')}V(x)\,dx, \]

which is just the Fourier transform of the potential. Thus,

\[ \hat{V}(x) = \sum_{p,p'}V_{p-p'}|p\rangle\langle p'| . \]

Notice that it is not diagonal. We denote $q \equiv p - p'$, so that

\[ V_q \equiv |C|^2\int e^{-\frac{i}{\hbar}xq}V(x)\,dx . \tag{2.1.11} \]

We can display this pictorially. We denote a bra-state as a line pointing towards the right, with a vertex at the right; a ket-state as a line pointing towards the right, vertex on the left; and a wavy line pointing downwards to its vertex, as $V_q$, where $q = p - p'$. See Fig. 2.1.

So, we have

\[ \chi(p) = \sum_{p'}V_{p-p'}a(p') . \]

As a simple example, consider the potential

\[ V(x) = -u_0\delta(x) . \]

Then,

\[ V_q = -|C|^2\int e^{-\frac{i}{\hbar}xq}u_0\delta(x)\,dx = -|C|^2u_0 . \]

The Dirac-delta forces $x$ to be zero.

Fig. 2.1. Representation of the interaction picture. Notice that at the vertex momentum is conserved. [Diagram: lines labelled $\langle p'|$ and $|p\rangle$ meeting a wavy line labelled $V_q$, carrying $q = p - p'$.]

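A Simple Hamiltonian Consider the Hamiltonian, having a deep potential spike,

\[ \hat{H} = \frac{\hat{p}^2}{2m} - u_0\delta(x), \]

then, in $p$-representation (writing $c \equiv |C|^2$), it is

\[ \hat{H} = \sum_{p,p'}\frac{p^2}{2m}|p\rangle\langle p'|\delta_{pp'} - \sum_{p,p'}cu_0|p\rangle\langle p'| . \]

Under trivial rearrangement,

\[ \hat{H} = \sum_{p,p'}\left(\frac{p^2}{2m}\delta_{pp'} - cu_0\right)|p\rangle\langle p'| . \]

Now, for eigenvalues, the Hamiltonian has

\[ \hat{H}|\psi\rangle = E|\psi\rangle , \]

projecting onto $p$-representation,

\[ \langle p|\hat{H}|\psi\rangle = E\langle p|\psi\rangle = Ea(p) . \]

Therefore, using our Hamiltonian, the LHS easily becomes

\[ \langle p|\sum_{p'',p'}\left(\frac{p''^2}{2m}\delta_{p''p'} - cu_0\right)|p''\rangle\langle p'|\psi\rangle = Ea(p) . \]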


We thus see that (also noting $\langle p'|\psi\rangle = a(p')$),

\[ \langle p|\sum_{p'',p'}\frac{p''^2}{2m}\delta_{p'p''}|p''\rangle a(p') - \langle p|\sum_{p'',p'}cu_0|p''\rangle a(p') = Ea(p), \]

\[ \Rightarrow \sum_{p'',p'}\frac{p''^2}{2m}\delta_{p'p''}\langle p|p''\rangle a(p') - \sum_{p'',p'}cu_0\langle p|p''\rangle a(p') = Ea(p), \]

\[ \Rightarrow \sum_{p'',p'}\frac{p''^2}{2m}\delta_{p'p''}\delta_{pp''}a(p') - \sum_{p'',p'}cu_0\delta_{pp''}a(p') = Ea(p) . \]

Thus,

\[ \frac{p^2}{2m}a(p) - cu_0\sum_{p'}a(p') = Ea(p) . \]

Now, we turn the sum over $p'$ into an integral,

\[ \frac{p^2}{2m}a(p) - cu_0\int a(p')\,dp' = Ea(p), \]

rearranging,

\[ a(p) = -\frac{cu_0}{E - \frac{p^2}{2m}}\int a(p')\,dp', \]

integrating,

\[ \int a(p)\,dp = -\int\frac{cu_0}{E - \frac{p^2}{2m}}\,dp\int a(p')\,dp' . \]

Now, if we change symbol on the LHS, putting primes on everything, we get

\[ \int a(p')\,dp' = -\int\frac{cu_0}{E - \frac{p^2}{2m}}\,dp\int a(p')\,dp', \]

which immediately implies that

\[ -\int\frac{cu_0}{E - \frac{p^2}{2m}}\,dp = 1, \]

absorbing the minus sign, with a modulus on the energy (for a bound state, $E = -|E|$),

\[ \int\frac{cu_0}{|E| + \frac{p^2}{2m}}\,dp = 1 . \]

Therefore, we effectively have a condition for what energy levels exist in a system. Note that we have not specified anywhere the dimension of the system, thus, this holds in any dimension of $p$.


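So, in 1D, we have simply,

\[ cu_0\int\frac{dp}{|E| + \frac{p^2}{2m}} = \frac{2\pi cu_0\,2m}{2\sqrt{2m|E|}} = 1, \]

which is

\[ |E| = 2\pi^2mc^2u_0^2 . \]

In 2D we have that $d^2p = 2\pi p\,dp$, so that (schematically: the integral diverges logarithmically at large $p$, so a momentum cutoff, suppressed here, is needed to set the scale)

\[ cu_0\int\frac{2\pi p\,dp}{|E| + \frac{p^2}{2m}} = cu_0\,2\pi\ln|E| = 1, \]

thus,

\[ E = e^{-1/(2\pi cu_0)} . \]

Finally, in 3D, $d^3p = 4\pi p^2\,dp$, and the condition does not have a solution. There are no energy levels in 3D.

The power of this method seems obvious now. We do a small amount of algebra, and the result is applicable for many dimensions.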
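The 1D bound-state condition can be checked numerically. The sketch below evaluates $cu_0\int_{-\infty}^{\infty}dp/(|E| + p^2/2m)$ with the substitution $p = \tan\theta$ and a midpoint rule, then confirms that the result is unity at $|E| = 2\pi^2mc^2u_0^2$. The parameter values, the function name and the quadrature scheme are illustrative assumptions, not part of the notes.

```python
import math


def bound_state_condition_1d(abs_E, m, c, u0, steps=2000):
    """c*u0 * integral over all p of dp / (|E| + p^2/2m).

    With p = tan(theta), dp = d(theta)/cos^2(theta), so the integrand becomes
    1 / (|E| cos^2(theta) + sin^2(theta)/(2m)) on (-pi/2, pi/2).
    """
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = -math.pi / 2.0 + (k + 0.5) * h   # midpoint of the k-th cell
        total += h / (abs_E * math.cos(theta) ** 2
                      + math.sin(theta) ** 2 / (2.0 * m))
    return c * u0 * total


# Made-up parameters in arbitrary units.
m, c, u0 = 1.0, 0.1, 1.0
abs_E = 2.0 * math.pi ** 2 * m * c ** 2 * u0 ** 2

print(bound_state_condition_1d(abs_E, m, c, u0))      # close to 1
print(bound_state_condition_1d(2 * abs_E, m, c, u0))  # no longer 1
```

Since the left-hand side decreases monotonically with $|E|$, the condition picks out a single bound-state energy in 1D.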

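Position-momentum Commutators Let $\hat{x}$ be the position operator in the Schrodinger picture. We want to know $\hat{x}(t)$ (i.e. in the Heisenberg picture). We assume free particles, so that the Hamiltonian reads

\[ \hat{H} = \frac{\hat{p}^2}{2m} . \]

Now, we have that

\[ \frac{d\hat{x}}{dt} = \frac{\partial\hat{x}}{\partial t} - \frac{i}{\hbar}[\hat{x}, \hat{H}] . \]

Now, the important commutator here is

\[ [\hat{x}, \hat{p}^2] = -\hat{p}[\hat{p}, \hat{x}] - [\hat{p}, \hat{x}]\hat{p} = 2i\hbar\hat{p} . \]

Therefore, as $\hat{x}$ has no explicit time dependence,

\[ \frac{d\hat{x}}{dt} = -\frac{i}{\hbar}\frac{2i\hbar}{2m}\hat{p} = \frac{\hat{p}}{m} . \]

Notice that this is just the classical expression for momentum,

\[ m\dot{\hat{x}} = \hat{p} . \]

In a similar way, we can derive that

\[ \frac{d\hat{p}}{dt} = \dot{\hat{p}} = 0 . \]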


Thus, we see conservation of momentum. So, momentum is a constant $\hat{p}_0$. Hence, as

\[ \dot{\hat{x}} = \frac{\hat{p}_0}{m} \;\Rightarrow\; \hat{x} = \hat{x}_0 + \frac{\hat{p}_0t}{m} . \]

2.1.5 Path Integrals

In previous formulations of quantum mechanics, constraints on the system are hard to deal with. So, we develop a new formalism, which is independent of the Hamiltonian, where it is now easy to incorporate constraints.

So, we have that

\[ |\psi(t')\rangle = \hat{U}(t', t)|\psi(t)\rangle . \]

Projecting into $x$-representation,

\[ \langle x'|\psi(t')\rangle = \langle x'|\hat{U}(t', t)|\psi(t)\rangle , \]

inserting a unity,

\[ \psi(x', t') = \sum_x\langle x'|\hat{U}(t', t)|x\rangle\langle x|\psi(t)\rangle = \sum_x\langle x'|\hat{U}(t', t)|x\rangle\psi(x, t) . \]

Now, if we send the sum on $x$ to an integral,

\[ \psi(x', t') = \int\langle x'|\hat{U}(t', t)|x\rangle\psi(x, t)\,dx . \]

Now, let us define a propagator,

\[ K(x', t'; x, t) \equiv \langle x'|\hat{U}(t', t)|x\rangle , \tag{2.1.12} \]

then we have

\[ \psi(x', t') = \int K(x', t'; x, t)\psi(x, t)\,dx . \]

Those familiar with Green functions will notice that the propagator is a Green function. The propagator is an evolution operator, in $x$-representation. So, by our definition of the evolution operator,

\[ \hat{U}(t', t) = e^{-\frac{i}{\hbar}\hat{H}(t'-t)}, \]

the propagator is just

\[ K(x', t'; x, t) = \langle x'|e^{-\frac{i}{\hbar}\hat{H}t'}e^{\frac{i}{\hbar}\hat{H}t}|x\rangle = \langle x'|e^{-\frac{i}{\hbar}\hat{H}t'}|x, t\rangle = \langle x', t'|x, t\rangle . \]


That is, we introduce a time dependent basis, $|x, t\rangle \equiv e^{\frac{i}{\hbar}\hat{H}t}|x\rangle$. So,

\[ K(x', t'; x, t) = \langle x', t'|x, t\rangle . \tag{2.1.13} \]

The basis is complete, as

\[ \sum_x|x, t\rangle\langle x, t| = 1 . \]

We can see this another way. Consider that

\[ \psi(x', t') = \langle x', t'|\psi\rangle , \]

and inserting a unity,

\[ \psi(x', t') = \sum_x\langle x', t'|x, t\rangle\langle x, t|\psi\rangle , \]

which is just our definition of the propagator,

\[ \psi(x', t') = \sum_x K(x', t'; x, t)\psi(x, t); \]

having also noticed that $\langle x, t|\psi\rangle = \psi(x, t)$.

Now, we can derive an interesting relation. Consider the definition of the propagator, (2.1.13), inserting a unity,

\[ K(x', t'; x, t) = \langle x', t'|x, t\rangle = \sum_{x''}\langle x', t'|x'', t''\rangle\langle x'', t''|x, t\rangle , \]

however, we see further propagators,

\[ K(x', t'; x, t) = \sum_{x''}K(x', t'; x'', t'')K(x'', t''; x, t) . \]

Sending the sum to an integral,

\[ K(x', t'; x, t) = \int K(x', t'; x'', t'')K(x'', t''; x, t)\,dx'' . \]

Let us change notation slightly. We can see that the propagator on the LHS goes from unprimed to single primed, and that in the integrand there is a propagator which goes to some intermediate state (the double primes). So, let us denote the above equation as

\[ K(F; I) = \int K(F; 1)K(1; I)\,dx_1 . \]

This is the Markovian property. Those familiar with advanced statistical mechanics will recognise this property.

The integral sweeps over all possible intermediate states, in order that a transition from one state to the next is made. In fact, we can insert another two unities,

\[ \langle F|I\rangle = \sum_{1,2}\langle F|2\rangle\langle 2|1\rangle\langle 1|I\rangle , \]

so that the corresponding integral will sweep over another combination of intermediate states. We keep inserting unities,

\[ K(F; I) = \int K(F; n-1)K(n-1; n-2)\cdots K(1; I)\,dx_1\cdots dx_{n-1} . \]

Fig. 2.2. Representation of the path integral formulation. The integrals sweep over all possible ways of making each transition. That is, the line joining I and 1 sweeps over all possible states in 1; and continuing up to the final state. [Diagram: states I, 1, 2, ..., n−2, n−1, F joined in sequence.]

So, we connect two points by integrating over all possible trajectories, all possible ways, of getting there. In fact, we also sum over all intermediate states as well.

For a small time interval, the individual amplitude is

\[ K(m+1; m) = \frac{1}{(2\pi\hbar)^n}\int dp\,e^{\frac{i}{\hbar}L\,dt}, \]

where $L$ is the classical Lagrangian, and $n$ the dimension of the system. Thus, putting all these little pieces together,

\[ K(F; I) = \lim_{N\to\infty}\int dx_1\cdots dx_{N-1}\prod_j\int\frac{dp_j}{(2\pi\hbar)^n}\,e^{\frac{i}{\hbar}\int L\,dt}, \]

which we denote as

\[ K(F; I) = \int\frac{\mathcal{D}x\,\mathcal{D}p}{(2\pi\hbar)^n}\,e^{\frac{i}{\hbar}S}, \]


where $S$ is the action. We can write this as

\[ K(F; I) = \sum_{x_i,p_i}e^{\frac{i}{\hbar}S} . \]

Now, if we have that

\[ p_i = m\frac{x_{i+1} - x_i}{t_{i+1} - t_i}, \]

then the action is

\[ S = \int p\,dx - H\,dt = \sum_i p_i(x_{i+1} - x_i) - H(t_{i+1} - t_i), \]

where we have just used the sum-integral cross-over idea.

The advantages of this quantisation scheme are that there are no problems when introducing constraints. Also, the transition between classical physics & quantum physics is transparent. However, a disadvantage is that the method is mathematically vulnerable (read: works, but shouldn't). The mathematics was developed by physicists, for physics (rather than by mathematicians, then adapted for physics). As a result, the maths wasn't well defined at all, with huge ambiguity, and with no-one really understanding why the maths worked at all. It has, however, since been understood a little better mathematically.

2.1.5.1 Classical or Quantum Action?

Now, consider some path, defined by $(x_1(t), p_1(t))$. The associated action is just $S_1 = S(x_1(t), p_1(t))$. The contribution to the path integral will be $e^{\frac{i}{\hbar}S_1}$.

Now, consider another path,

\[ x_2(t) = x_1(t) + \delta x_1(t), \qquad p_2(t) = p_1(t) + \delta p_1(t); \]

where it is clear that the new path is very close to the old one. Then, its contribution will be

\[ e^{\frac{i}{\hbar}S_2} = \cos\frac{S_2}{\hbar} + i\sin\frac{S_2}{\hbar} . \]

Then, one will see that such a "total path integral" will have terms such as

\[ \frac{\delta S}{\hbar} = \frac{S_1 - S_2}{\hbar} . \]

This is just an oscillatory term (also, the average values of cos & sin are both zero), so the contributions of neighbouring paths cancel unless $\delta S = 0$. Therefore, the trajectories with $\delta S = 0$ contribute most to the integral. The path for which the variation in the action is zero is just that given by the Euler-Lagrange equations. This is the classical trajectory.


So, if we let $S/\hbar \gg 1$, then $e^{\frac{i}{\hbar}S_{cl}}$ is the main contributor to the path integral. This is because the oscillatory argument holds. Note that if $S/\hbar \lesssim 1$, then the oscillatory argument breaks down, and we have deviation from the classical path.

Therefore, we see that (by our rather woolly argument), the realm of classical physics is $S \gg \hbar$, and of quantum physics when $S \lesssim \hbar$.

So, the "semi-classical" value of the propagator is

\[ K(F; I) = Ce^{\frac{i}{\hbar}S_{cl}}, \]

where $C$ is a constant, and $S_{cl}$ the classical action. That is, the above statement says that the only path to contribute is the classical one. Consider the standard spring Lagrangian; upon solving, one finds an equation of motion. This is the classical path. Our statement above states that the classical path is the main contributor for systems where the action is a lot greater than $\hbar$. However, if this were not the case, the spring would also have motion not described by the standard oscillatory expression.

The classical action, for a free particle, reads

\[ S_{cl} = px - Ht \;\Rightarrow\; \frac{S_{cl}}{\hbar} = x\frac{p}{\hbar} - \frac{H}{\hbar}t . \]

Now, using the standard expressions, $E = \hbar\omega$ (noting that the Hamiltonian is the energy), and $p = \hbar k$, we see that

\[ \frac{S_{cl}}{\hbar} = kx - \omega t . \]

Therefore, the corresponding propagator is just

\[ K(F; I) = Ce^{\frac{i}{\hbar}S_{cl}} = Ce^{-i(\omega t - kx)}, \]

that is, a plane wave.

A Free Particle Consider a free particle, which travels from the initial state, $I$, to final state $F$; where each has position & time $I(x_i, t_i)$, $F(x_f, t_f)$. Suppose that the total length of the path is $L$, and time $T$. Then, the speed of the particle is just

\[ v = \frac{L}{T} = \frac{x_f - x_i}{t_f - t_i} . \]

Now, the classical momentum is just

\[ p_{cl} = mv = \frac{mL}{T} . \]


So, the classical action is

\[ S_{cl} = \int_i^f p\,dx - H\,dt, \]

if we use the standard $v = dx/dt$, then this is just

\[ S_{cl} = \int_i^f(pv - H)\,dt . \]

Now, we use that

\[ v = \frac{p}{m}, \qquad H = \frac{p^2}{2m}, \]

then we see that

\[ S_{cl} = \int_i^f\left(\frac{p^2}{m} - \frac{p^2}{2m}\right)dt, \]

which evaluates easily to

\[ S_{cl} = \frac{p^2}{2m}T, \]

after noting that momentum $p$ is constant. So, if we insert our expression for $p$, the classical action is

\[ S_{cl} = \frac{mL^2}{2T} . \]

Then, if $S_{cl} \gg \hbar$, the only contributor to the propagator is the classical action;

\[ K(F; I) = Ce^{i\frac{mL^2}{2T\hbar}}, \qquad S_{cl} \gg \hbar . \]

Using our previous definitions of the length & time of the path, this is obviously just

\[ K(F; I) = C\exp\left[\frac{im(x_f - x_i)^2}{2(t_f - t_i)\hbar}\right] . \]

So, let us consider two examples, to test if they are quantum or classical in nature.

• An electron in a scanning electron microscope. Here, typical energies are $E \approx 3$ keV, length scale $L \approx 50$ cm. If we use that $E = \frac{p^2}{2m} = \frac{1}{2}mv^2$ and $T = L/v$, then we can find that $S_{cl}/\hbar \approx 10^{10}$. Therefore, such a system is (very!) classical.
• An electron in a quantum well. Here, considering that typical energies are 3 meV, length scales 1 nm, we find that $S/\hbar \approx 0.6$. Therefore, such a system is quantum in nature.
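The two estimates above can be reproduced to order of magnitude with a few lines of arithmetic. The sketch below uses $S_{cl} = mL^2/2T$ with $T = L/v$ and $E = \frac{1}{2}mv^2$, and assumes the free-electron mass for both cases (an assumption: the notes do not state which mass is used for the quantum well, so the prefactor there may differ from the quoted figure). The function name is hypothetical.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
M_E = 9.1093837015e-31   # free-electron mass, kg (assumed for both cases)
EV = 1.602176634e-19     # one electronvolt in joules


def action_over_hbar(energy_ev, length_m, mass=M_E):
    """S_cl / hbar for a free particle of given kinetic energy and path length."""
    v = math.sqrt(2.0 * energy_ev * EV / mass)   # from E = m v^2 / 2
    T = length_m / v                             # time of flight
    S_cl = mass * length_m ** 2 / (2.0 * T)      # classical action m L^2 / 2T
    return S_cl / HBAR


print(action_over_hbar(3e3, 0.5))     # electron microscope: far above unity
print(action_over_hbar(3e-3, 1e-9))   # quantum well: below unity
```

Only the regime matters for the argument: a ratio many orders of magnitude above unity is classical, while a ratio of order unity or below is quantum.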


Parabolic Potential & Tunnelling Let us first consider a parabolic potential (quantum harmonic oscillator), and determine how energy levels arise.

Now, if we consider that

\[ \sum_n e^{i\frac{S}{\hbar}n} = 0, \]

unless $S/\hbar$ is an integer multiple of $2\pi$; then, it is easy to see that the levels will be quantised.

Now, consider a particle tunnelling between two wells (looks like a mexican hat potential). Then, the energy levels in one well are just $S_{cl} = 2\pi m\hbar$ (with $m$ an integer). Now, if the action is complex, then the propagator decays. This case arises when we consider that between the wells (at the "hump"), a particle will have complex momentum (in order to keep the energy, the sum over potential & kinetic, zero). Then, there are in fact "degenerate trajectories". That is, ones for which at different times, the position is the same. It is because it gets infinitely "hard" to get to the "hump". A degenerate trajectory is such that

\[ x(t) \approx x(t + \tau) . \]

So, the total propagator is formed from two terms. One for motion within wells, and one between wells. This looks like

\[ K(F; I) = e^{i\frac{S}{\hbar}} + \tau e^{-\frac{S}{\hbar}}, \]

respectively. The "i" is missing from the second term as the action is complex between the wells.

2.1.6 Review of Quantisation Schemes

Let us review the quantisation schemes we have discussed.

Copenhagen Interpretation The main equation was the Schrodinger equation, in the form

\[ \left(-\frac{\hbar^2}{2m}\nabla^2 + V\right)\psi = i\hbar\dot{\psi}, \]

where the advantages of the quantisation scheme are that of being very well developed. It has applications in 1D & 2D, but 3D problems tend to be hard to solve. Disadvantages of the scheme are in measurement & quantisation.


Dirac Formalism The main equation here is the Schrodinger equation in the form

\[ i\hbar|\dot{\psi}\rangle = \hat{H}|\psi\rangle . \]

This formalism is mathematically sound, and has applications in many-body problems. However, the disadvantages are the same as for the Copenhagen interpretation: within measurement & quantisation. It is also difficult to incorporate equations of constraint.

Path Integrals Here, the main equation is that for the propagator

\[ K(F; I) = \int\frac{\mathcal{D}x\,\mathcal{D}p}{(2\pi\hbar)^n}\,e^{\frac{i}{\hbar}S} . \]

The advantage of this scheme is that it is physically very transparent. However, problems come in mathematical rigour. The formalism has applications within elementary particles & fields.

2.2 Quantum Harmonic Oscillator

We shall look at the harmonic oscillator, and find its eigenstates & eigenvalues, using raising & lowering operators.

The classical Hamiltonian for a harmonic oscillator is

\[ H = \frac{p^2}{2m} + \frac{1}{2}m\omega^2x^2, \]

therefore, under the correspondence principle, the quantum harmonic oscillator has Hamiltonian

\[ \hat{H} = \frac{\hat{p}^2}{2m} + \frac{1}{2}m\omega^2\hat{x}^2 . \]

So, we want eigenvalues & normalisable eigenstates, such that

\[ \hat{H}|\psi_n\rangle = E_n|\psi_n\rangle , \qquad \langle\psi_n|\psi_n\rangle = 1 . \]

2.2.1 Raising & Lowering Operators

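Now, let us define some operators, which are combinations of the momentum & position operators;

\[ \hat{a} \equiv \frac{1}{\sqrt{2m\hbar\omega}}\left(\hat{p} - im\omega\hat{x}\right), \tag{2.2.1} \]

\[ \hat{a}^\dagger = \frac{1}{\sqrt{2m\hbar\omega}}\left(\hat{p} + im\omega\hat{x}\right). \tag{2.2.2} \]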


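It is clear that $\hat{a}^\dagger$ is adjoint to $\hat{a}$, as both the momentum & position operators are Hermitian. Now, consider the product

\[ \hat{a}^\dagger\hat{a} = \frac{1}{2m\hbar\omega}\left(\hat{p} + im\omega\hat{x}\right)\left(\hat{p} - im\omega\hat{x}\right) = \frac{1}{2m\hbar\omega}\left(\hat{p}^2 - im\omega(\hat{p}\hat{x} - \hat{x}\hat{p}) + m^2\omega^2\hat{x}^2\right), \]

but, we notice the presence of the commutator $[\hat{p}, \hat{x}] = -i\hbar$. Therefore, the product is

\[ \hat{a}^\dagger\hat{a} = \frac{1}{2m\hbar\omega}\left(\hat{p}^2 + im\omega\,i\hbar + m^2\omega^2\hat{x}^2\right). \]

So, we see that

\[ \hbar\omega\,\hat{a}^\dagger\hat{a} = \frac{\hat{p}^2}{2m} - \frac{1}{2}\hbar\omega + \frac{1}{2}m\omega^2\hat{x}^2 = \hat{H} - \frac{1}{2}\hbar\omega . \]

Therefore, we see that we can write the Hamiltonian in terms of our "new operators", as defined above;

\[ \hat{H} = \hbar\omega\left(\hat{a}^\dagger\hat{a} + \frac{1}{2}\right). \tag{2.2.3} \]

It is also clear that

\[ \hat{a}^\dagger\hat{a} = \frac{\hat{H}}{\hbar\omega} - \frac{1}{2} . \tag{2.2.4} \]

The Commutator Let us consider the commutator $[\hat{a}, \hat{a}^\dagger]$. So,

\[ [\hat{a}, \hat{a}^\dagger] = \frac{1}{2m\hbar\omega}\left[\hat{p} - im\omega\hat{x},\; \hat{p} + im\omega\hat{x}\right]. \]

The only terms with non-zero contribution are those with both position & momentum. Thus, we see that

\[ [\hat{a}, \hat{a}^\dagger] = \frac{1}{2m\hbar\omega}\,im\omega\left([\hat{p}, \hat{x}] - [\hat{x}, \hat{p}]\right), \]

which, with $[\hat{p}, \hat{x}] = -i\hbar = -[\hat{x}, \hat{p}]$, easily gives the result

\[ [\hat{a}, \hat{a}^\dagger] = 1 . \tag{2.2.5} \]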


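The Eigenstates Let us consider $\hat{H}\left(\hat{a}^\dagger|\psi\rangle\right)$. So, if we rewrite the Hamiltonian, using (2.2.3),

\[ \hat{H}\left(\hat{a}^\dagger|\psi\rangle\right) = \hbar\omega\left(\hat{a}^\dagger\hat{a} + \frac{1}{2}\right)\hat{a}^\dagger|\psi\rangle . \]

Now, consider the cross-term above. It is

\[ \hat{a}^\dagger\hat{a}\hat{a}^\dagger = \hat{a}^\dagger\left(\hat{a}\hat{a}^\dagger\right) = \hat{a}^\dagger\left(\hat{a}\hat{a}^\dagger + \hat{a}^\dagger\hat{a} - \hat{a}^\dagger\hat{a}\right) = \hat{a}^\dagger\left(\hat{a}\hat{a}^\dagger - \hat{a}^\dagger\hat{a} + \hat{a}^\dagger\hat{a}\right) = \hat{a}^\dagger\left([\hat{a}, \hat{a}^\dagger] + \hat{a}^\dagger\hat{a}\right) = \hat{a}^\dagger\left(1 + \hat{a}^\dagger\hat{a}\right). \]

Therefore, we see that we have

\[ \hat{H}\left(\hat{a}^\dagger|\psi\rangle\right) = \hbar\omega\left(\hat{a}^\dagger\left(1 + \hat{a}^\dagger\hat{a}\right) + \frac{1}{2}\hat{a}^\dagger\right)|\psi\rangle . \]

Now, by (2.2.4), we can rewrite the "middle" bracketed part;

\[ 1 + \hat{a}^\dagger\hat{a} = \frac{\hat{H}}{\hbar\omega} + \frac{1}{2}, \]

from which we see that

\[ \hat{H}\left(\hat{a}^\dagger|\psi\rangle\right) = \hbar\omega\left(\hat{a}^\dagger\frac{\hat{H}}{\hbar\omega} + \hat{a}^\dagger\right)|\psi\rangle . \]

Now, as we know that $\hat{H}|\psi\rangle = E|\psi\rangle$, we therefore see that the above is just

\[ \hat{H}\left(\hat{a}^\dagger|\psi\rangle\right) = (E + \hbar\omega)\,\hat{a}^\dagger|\psi\rangle . \]

Therefore, we see that the Hamiltonian acting upon the state $\hat{a}^\dagger|\psi\rangle$ gives that state, multiplied by a number $(E + \hbar\omega)$. Therefore, $\hat{a}^\dagger|\psi\rangle$ is an eigenstate of the Hamiltonian, with eigenvalue $E + \hbar\omega$.

In a completely analogous way, we find that

\[ \hat{H}\left(\hat{a}|\psi\rangle\right) = (E - \hbar\omega)\,\hat{a}|\psi\rangle , \]

\[ \hat{H}\left((\hat{a}^\dagger)^2|\psi\rangle\right) = (E + 2\hbar\omega)\,(\hat{a}^\dagger)^2|\psi\rangle , \]

\[ \hat{H}\left(\hat{a}^2|\psi\rangle\right) = (E - 2\hbar\omega)\,\hat{a}^2|\psi\rangle . \]

That is, the Hamiltonian has eigenstates $(\hat{a}^\dagger)^n|\psi\rangle$ with eigenvalues $E + n\hbar\omega$, and eigenstates $\hat{a}^n|\psi\rangle$ with eigenvalues $E - n\hbar\omega$.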


For this reason, we call $a^\dagger$ the raising operator, and its adjoint $a$ the lowering operator. So, let us state the operators again.

The raising operator is defined as

\[ a^\dagger = \frac{1}{\sqrt{2m\hbar\omega}}\,(p + im\omega x). \tag{2.2.6} \]

The lowering operator is defined as

\[ a \equiv \frac{1}{\sqrt{2m\hbar\omega}}\,(p - im\omega x). \tag{2.2.7} \]

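[Figure 2.3: ladder of energy levels $E - 2\hbar\omega$, $E - \hbar\omega$, $E$, $E + \hbar\omega$, $E + 2\hbar\omega$, carrying the states $a^2|\psi\rangle$, $a|\psi\rangle$, $|\psi\rangle$, $a^\dagger|\psi\rangle$, $(a^\dagger)^2|\psi\rangle$ respectively; $a^\dagger$ steps up one level and $a$ steps down one level.]

Fig. 2.3. The action of operating the raising and lowering operators on a state.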

Preserving Positive Norm. Now, consider that $|\chi\rangle = a|\psi\rangle$, and that $\langle\chi| = \langle\psi|a^\dagger$. Then,

\[ \langle\chi|\chi\rangle = \langle\psi|a^\dagger a|\psi\rangle, \]

using (2.2.4), we see that this is

\[ \langle\chi|\chi\rangle = \langle\psi|\left(\frac{H}{\hbar\omega} - \frac{1}{2}\right)|\psi\rangle = \frac{E}{\hbar\omega} - \frac{1}{2}, \]

given that the initial state was normalised, $\langle\psi|\psi\rangle = 1$. Now, the norm of a state cannot be negative. Therefore, we see that

\[ \langle\chi|\chi\rangle \geq 0 \;\Rightarrow\; \frac{E}{\hbar\omega} - \frac{1}{2} \geq 0, \]

or,

\[ \frac{E}{\hbar\omega} \geq \frac{1}{2}. \]

So, we see that the energy cannot drop below some value, if the norm is to stay non-negative. So, let the minimum energy be

\[ E_0 = \frac{\hbar\omega}{2}. \]

At this energy the norm of $a|\psi\rangle$ vanishes, so we require that there is some state for which

\[ a|\chi\rangle = 0; \]

we call this state the "vacuum state". We denote the vacuum state so that

\[ a|0\rangle = 0. \]

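Transferring Between States. Suppose we have that

\[ |n\rangle = (a^\dagger)^n|0\rangle, \]

where $|n\rangle$ is not normalised, and $E_n = E_0 + n\hbar\omega$. Then, to find normalisation, we consider that

\[ |n\rangle = C(a^\dagger)^n|0\rangle \]

is a normalised state. Then

\[ \langle n|n\rangle = |C|^2\langle 0|a^n(a^\dagger)^n|0\rangle = |C|^2\, n!. \]

Therefore, we have that

\[ |n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}|0\rangle, \qquad E_n = \left(n + \frac{1}{2}\right)\hbar\omega \]

as the solution to the eigenvalue problem of the harmonic oscillator.

Consider

\[ a^\dagger a|n\rangle = \left(\frac{H}{\hbar\omega} - \frac{1}{2}\right)|n\rangle = \left(\frac{(n + \frac{1}{2})\hbar\omega}{\hbar\omega} - \frac{1}{2}\right)|n\rangle = n|n\rangle. \]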


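Thus, if we define the number operator $\hat{n} \equiv a^\dagger a$, then

\[ \hat{n}|n\rangle = n|n\rangle. \]

Consider

\[ a|n\rangle = C|n-1\rangle, \]

then,

\[ \langle n|a^\dagger a|n\rangle = |C|^2\langle n-1|n-1\rangle = |C|^2, \]

and

\[ \langle n|a^\dagger a|n\rangle = n\langle n|n\rangle = n, \]

so that $|C|^2 = n$. Therefore,

\[ a|n\rangle = \sqrt{n}\,|n-1\rangle. \]

Similarly,

\[ a^\dagger|n\rangle = \sqrt{n+1}\,|n+1\rangle. \]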

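Summary of Results

\[ [a, a^\dagger] = 1, \tag{2.2.8} \]

\[ a^\dagger a|n\rangle = n|n\rangle, \tag{2.2.9} \]

\[ a|n\rangle = \sqrt{n}\,|n-1\rangle, \tag{2.2.10} \]

\[ a^\dagger|n\rangle = \sqrt{n+1}\,|n+1\rangle. \tag{2.2.11} \]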
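The summary relations can be checked numerically. The following is a minimal sketch, assuming the number basis is truncated to the first `dim` levels (the helper name `ladder_operators` and the choice `dim = 8` are illustrative, not part of the notes). Truncation spoils the commutator in the last row and column only, so that corner is excluded from the check.

```python
import numpy as np

def ladder_operators(dim):
    """Return (a, a_dag) as dim x dim matrices in the basis |0>, ..., |dim-1>."""
    # a|n> = sqrt(n)|n-1> puts sqrt(1), sqrt(2), ... on the first superdiagonal.
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1)
    return a, a.T.copy()

dim = 8  # illustrative truncation
a, a_dag = ladder_operators(dim)

number = a_dag @ a                  # should be diag(0, 1, ..., dim-1), eq. (2.2.9)
commutator = a @ a_dag - a_dag @ a  # should be the identity, eq. (2.2.8)

assert np.allclose(number, np.diag(np.arange(dim)))
# The last diagonal entry is an artefact of cutting the basis off at dim levels.
assert np.allclose(commutator[:-1, :-1], np.eye(dim - 1))
```

Increasing `dim` pushes the truncation artefact to higher levels without changing the low-lying matrix elements.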

2.2.2 The Vacuum State ψ0(x)

As we previously discussed, there is a "lowest state", for which the action of the lowering operator gives zero. That is,

\[ a|0\rangle = 0. \]

We can compute the corresponding wavefunction, $\psi_0(x)$, by usual methods. So, we have that

\[ \psi_n(x) = \langle x|n\rangle. \]

Then, making the product

\[ \langle x|a|0\rangle = 0, \]

putting the lowering operator into x-representation. Inserting a unity,

\[ \sum_{x'}\langle x|a|x'\rangle\langle x'|0\rangle = 0. \]

Now, the first term is the matrix element $\langle x|a|x'\rangle = a_{xx'}$, and the second term is just $\langle x'|0\rangle = \psi_0(x')$. Hence,

\[ \sum_{x'} a_{xx'}\psi_0(x') = 0. \]

Now, we know that

\[ a_{xx'} = \frac{1}{\sqrt{2m\hbar\omega}}\,(p - im\omega x)_{xx'} = \frac{1}{\sqrt{2m\hbar\omega}}\left(-i\hbar\frac{d}{dx} - im\omega x\right)\delta_{xx'}. \]

Therefore, we see that

\[ \left(-i\hbar\frac{d}{dx} - im\omega x\right)\psi_0(x) = 0. \]

This easily solves to

\[ \psi_0(x) = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4} e^{-\frac{m\omega}{2\hbar}x^2}, \]

where the constant results from normalisation.

Therefore, we have computed the groundstate wavefunction of the quantum harmonic oscillator.
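As a quick sanity check of the groundstate, one can verify on a grid that the Gaussian is normalised and is annihilated by the lowering operator in x-representation. This is only a sketch: the units $\hbar = m = \omega = 1$, the grid range and the finite-difference derivative are assumptions made for illustration.

```python
import numpy as np

hbar = m = omega = 1.0           # assumed units, for illustration only
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

# Groundstate wavefunction psi_0(x) from the text.
psi0 = (m * omega / (np.pi * hbar)) ** 0.25 * np.exp(-m * omega * x**2 / (2 * hbar))

# Normalisation integral, approximated by a Riemann sum.
norm = np.sum(psi0**2) * dx

# (-i hbar d/dx - i m omega x) psi_0 should vanish; d/dx by finite differences.
dpsi0 = np.gradient(psi0, x)
residual = -1j * hbar * dpsi0 - 1j * m * omega * x * psi0
max_residual = np.max(np.abs(residual))

print(norm, max_residual)
```

The residual is limited by the finite-difference error, so it shrinks as the grid is refined rather than vanishing exactly.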

2.2.3 The General State ψn(x)

Let us consider how to construct any state, $\psi_n(x)$, of the quantum harmonic oscillator, using the raising operator. That is, we would like to gain an expression for any wavefunction, given the vacuum state; as the raising operator takes us up the states.

Before we begin, we recall the identical result from the "old way" of solving the problem. That is, by solving the Schrodinger equation directly, we find that

\[ \psi_n(x) = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4}\frac{1}{\sqrt{2^n n!}}\, e^{-\frac{m\omega}{2\hbar}x^2}\, H_n\!\left(x\sqrt{\frac{m\omega}{\hbar}}\right), \]

where $H_n(y)$ is the nth order Hermite polynomial.

It may already be clear that the raising operator method will be simpler.

Now, we have seen that the state $|n\rangle$ may be constructed from the vacuum state $|0\rangle$ by n applications of the raising operator,

\[ |n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}|0\rangle. \]


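Then, projecting onto x-representation,

\[ \psi_n(x) = \langle x|n\rangle = \langle x|\frac{(a^\dagger)^n}{\sqrt{n!}}|0\rangle, \]

inserting a unity,

\[ \psi_n(x) = \sum_{x'}\langle x|\frac{(a^\dagger)^n}{\sqrt{n!}}|x'\rangle\langle x'|0\rangle. \]

Then, this is clearly just

\[ \psi_n(x) = \frac{1}{\sqrt{n!}\,(2m\hbar\omega)^{n/2}}\left(-i\hbar\frac{d}{dx} + im\omega x\right)^n\psi_0(x). \]

Therefore, we have found an expression for the nth state of the quantum harmonic oscillator, by using the raising operators on the vacuum state.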
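To see that repeated raising reproduces the Hermite polynomials, note that in units $\hbar = m = \omega = 1$ (an assumption made here for brevity) the raising operator acting on $Q(x)e^{-x^2/2}$ gives $\frac{i}{\sqrt{2}}\left(2xQ - Q'\right)e^{-x^2/2}$. The sketch below iterates the polynomial part only and compares it with NumPy's Hermite polynomials; the function names are illustrative. With the sign convention of (2.2.2), each application also contributes a factor of $i$, so the result agrees with the "old way" formula up to an overall phase $i^n$.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.hermite import herm2poly

def raise_polynomial(Q):
    """Polynomial part of (x - d/dx) acting on Q(x) exp(-x^2/2)."""
    x = Polynomial([0.0, 1.0])
    return 2 * x * Q - Q.deriv()

def ladder_polynomials(n_max):
    """Return [Q_0, ..., Q_n_max], starting from Q_0 = 1 (the vacuum)."""
    polys = [Polynomial([1.0])]
    for _ in range(n_max):
        polys.append(raise_polynomial(polys[-1]))
    return polys

polys = ladder_polynomials(5)
for n, Q in enumerate(polys):
    # herm2poly converts the Hermite series "1 * H_n" to ordinary coefficients.
    hermite_coeffs = herm2poly([0.0] * n + [1.0])
    assert np.allclose(Q.coef, hermite_coeffs)
```

The remaining prefactor $1/\sqrt{2^n n!}$ comes from the $(1/\sqrt{2})^n$ of the operators together with the $1/\sqrt{n!}$ normalisation.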

Fig. 2.4. The eigenstates of the quantum harmonic oscillator. The "distance up" the vertical axis represents the energy of the state.

2.2.4 Eigenstates & Eigenvalues of a

To compute the eigenstates and eigenvalues of the lowering operator, we must solve the eigenequation

\[ a|\psi\rangle = \lambda|\psi\rangle, \qquad \langle\psi|\psi\rangle = 1. \]


Now, as $a$ is clearly not Hermitian, we do not have the immediate restriction that its eigenvalues must be real.

So, projecting the eigenequation into x-representation,

\[ \langle x|a|\psi\rangle = \lambda\langle x|\psi\rangle = \lambda\psi(x), \]

inserting a unity,

\[ \sum_{x'}\langle x|a|x'\rangle\langle x'|\psi\rangle = \lambda\psi(x), \]

which is clearly just

\[ \sum_{x'} a_{xx'}\psi(x') = \lambda\psi(x). \]

Then, putting in the expression for the operator, we have

\[ \frac{1}{\sqrt{2m\hbar\omega}}\left(-i\hbar\frac{d}{dx} - im\omega x\right)\psi(x) = \lambda\psi(x). \]

We can now solve this easily enough. It is simple to get the expression into the form

\[ \frac{d\psi}{\psi} = \left(-\frac{m\omega x}{\hbar} + i\lambda\sqrt{\frac{2m\omega}{\hbar}}\right)dx, \]

integrating results in

\[ \psi(x) = B\exp\left(-\frac{m\omega}{2\hbar}x^2 + i\lambda\sqrt{\frac{2m\omega}{\hbar}}\,x\right). \]

Now, to constrain the possible values of λ (i.e. to the real or complex set of numbers), we must consider the normalisation condition,

\[ \int|\psi(x)|^2\,dx = 1. \]

That is, we require a value of λ that will preserve finite norm.

Now, if we consider that the wavefunction is a product of two exponentials; one with argument in $-x^2$, the other in $x$, and we note that $e^{-x^2}$ decays much quicker than $e^{x}$ grows (due to the squared-part), then there is no number λ which could pull the growing part above the decaying part. That is, any real or complex number λ works.

Therefore, any number λ, real or complex, is an eigenvalue of the lowering operator.

Let us consider the eigenstates of $a$ again, but in energy-representation. So, from the eigenequation, projecting into n-representation,

\[ \langle n|a|\psi\rangle = \lambda\langle n|\psi\rangle = \lambda\psi_n. \]

Then, inserting a unity,

\[ \lambda\psi_n = \sum_m\langle n|a|m\rangle\langle m|\psi\rangle. \]

Now, we immediately see that $\langle m|\psi\rangle = \psi_m$. Also, we know that

\[ a|m\rangle = \sqrt{m}\,|m-1\rangle. \]

Hence,

\[ \lambda\psi_n = \sum_m\sqrt{m}\,\langle n|m-1\rangle\psi_m, \]

however, due to orthonormality of the energy-states, $\langle n|m-1\rangle = \delta_{n,m-1}$, this is just

\[ \sqrt{n+1}\,\psi_{n+1} = \lambda\psi_n. \]

This easily rearranges to

\[ \psi_{n+1} = \frac{\lambda}{\sqrt{n+1}}\,\psi_n, \]

so that, iterating from $\psi_0$, we have $\psi_n = \lambda^n\psi_0/\sqrt{n!}$.

These are called coherent states.
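The recursion can be tested numerically. The sketch below builds the coefficients $\psi_n \propto \lambda^n/\sqrt{n!}$ on a truncated number basis and checks that the resulting vector is (to within truncation error) an eigenvector of the lowering operator. The value of `lam`, the truncation `dim` and the function name are arbitrary illustrative choices.

```python
import math
import numpy as np

def coherent_coefficients(lam, dim):
    """Coefficients psi_n = lam**n / sqrt(n!) on |0>, ..., |dim-1>, normalised."""
    psi = np.array(
        [lam**n / math.sqrt(math.factorial(n)) for n in range(dim)],
        dtype=complex,
    )
    return psi / np.linalg.norm(psi)

dim = 40                 # illustrative truncation
lam = 0.8 + 0.5j         # any complex number is allowed as an eigenvalue
psi = coherent_coefficients(lam, dim)

# Lowering operator in the number basis: a|n> = sqrt(n)|n-1>.
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)

# a psi - lam psi should vanish, up to the tiny weight in the last kept level.
residual = np.linalg.norm(a @ psi - lam * psi)
print(residual)
```

Normalisation fixes $|\psi_0|^2 = e^{-|\lambda|^2}$, which the truncated vector reproduces closely when `dim` is large compared with $|\lambda|^2$.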

2.2.5 Examples

Let us consider a couple of examples; again, they show the ease of solving problems using the raising/lowering operators, rather than the old direct method.

2.2.5.1 Expected Values of Kinetic Energy & Position

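Suppose we wish to know the expected value of kinetic energy,

\[ T = \left\langle\frac{p^2}{2m}\right\rangle, \]

using the "old method", this required us to compute the integral

\[ \left\langle\frac{p^2}{2m}\right\rangle = \int\psi_n^*(x)\left(-\frac{\hbar^2}{2m}\frac{d^2}{dx^2}\right)\psi_n(x)\,dx. \]

For a high-n state, this is very tedious.

Now, from the definitions of $a$, $a^\dagger$, we can add and subtract them, to find

\[ p = \frac{\sqrt{2m\hbar\omega}}{2}\left(a + a^\dagger\right), \qquad x = \frac{\sqrt{2m\hbar\omega}}{2im\omega}\left(a^\dagger - a\right). \tag{2.2.12} \]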


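Then, the expected value of the kinetic energy, for a harmonic oscillator in the state $|n\rangle$, is just

\[ T = \langle n|\frac{p^2}{2m}|n\rangle = \frac{1}{2m}\left(\frac{\sqrt{2m\hbar\omega}}{2}\right)^2\langle n|\left(a + a^\dagger\right)^2|n\rangle. \]

Expanding out,

\[ T = \frac{\hbar\omega}{4}\langle n|\,a^2 + aa^\dagger + a^\dagger a + (a^\dagger)^2\,|n\rangle. \]

Now, we can immediately see that two of these terms will give zero. Consider that the raising operator (on the far right) will create a state $|n+2\rangle$, then, that state will give the product $\langle n|n+2\rangle = 0$. Similarly, the lowering operator on the far left will produce a state $\langle n|n-2\rangle = 0$. Therefore, we are just left with

\[ T = \frac{\hbar\omega}{4}\langle n|\,aa^\dagger + a^\dagger a\,|n\rangle. \]

Now, if we recall that we have the commutator $[a, a^\dagger] = 1$, then we see that we can write

\[ aa^\dagger + a^\dagger a = 1 + 2a^\dagger a. \]

Hence,

\[ T = \frac{\hbar\omega}{4}\langle n|\,1 + 2a^\dagger a\,|n\rangle = \frac{\hbar\omega}{4} + \frac{\hbar\omega}{2}n = \frac{\hbar\omega}{2}\left(n + \frac{1}{2}\right). \]

Therefore, using this, the expected kinetic energy is

\[ T = \frac{1}{2}\langle n|H|n\rangle = \frac{1}{2}E_n. \]

Therefore, the average kinetic energy, for a quantum harmonic oscillator in the state $|n\rangle$ is $E_n/2$. This result was relatively easy to obtain, and works for any state $|n\rangle$.

The expected value of position is just

\[ \langle x\rangle = B\,\langle n|\,a - a^\dagger\,|n\rangle = 0. \]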
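These expectation values can be confirmed with truncated matrices. The following sketch assumes units $\hbar = m = \omega = 1$ and a cut-off `dim` (both illustrative); $p$ and $x$ are built by adding and subtracting the definitions (2.2.1) and (2.2.2). The highest kept level is excluded from the comparison because it is distorted by the truncation.

```python
import numpy as np

hbar = m = omega = 1.0   # assumed units
dim = 12                 # illustrative truncation of the number basis

a = np.diag(np.sqrt(np.arange(1, dim)), k=1)   # a|n> = sqrt(n)|n-1>
a_dag = a.T

c = np.sqrt(2 * m * hbar * omega)
p = 0.5 * c * (a + a_dag)                      # from a + a_dag
x = c / (2j * m * omega) * (a_dag - a)         # from a_dag - a

T = p @ p / (2 * m)                            # kinetic energy operator
kinetic = np.real(np.diag(T))                  # <n|T|n>
E = hbar * omega * (np.arange(dim) + 0.5)      # E_n = (n + 1/2) hbar omega

assert np.allclose(kinetic[:-1], E[:-1] / 2)   # <T> = E_n / 2
assert np.allclose(np.diag(x), 0.0)            # <x> = 0

# Consistency check on the operators themselves: [x, p] = i hbar.
xp_commutator = x @ p - p @ x
assert np.allclose(xp_commutator[:-1, :-1], 1j * hbar * np.eye(dim - 1))
```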


2.3 Secondary Quantisation

Pretty much every physical situation has many particles. For example, a cubic centimetre of air has ≈ $10^{19}$ molecules, and a cubic centimetre of steel ≈ $10^{23}$ atoms. Writing a wavefunction for such a system is formidable. Its form would be something like

\[ \psi(x_1, x_2, \ldots, x_{10^{23}}), \]

and is clearly a completely useless quantity to work with.

In quantum mechanics, the situation is simplified by the introduction of the concept of identical particles. A few ways to think of identical particles:

• Identical particles are such that under interchange, the physical state of the system is unchanged;
• Identical particles are those which cannot be distinguished by any means.

So, for a 2-particle system, we must make the interchange

\[ \psi(x_1, x_2) \mapsto \psi(x_2, x_1), \]

where we move particle 1 to the position that particle 2 used to have, and vice versa.

Now, the expectation values computed from a wavefunction are unaffected by multiplication by a phase $e^{i\alpha}$. That is, $e^{i\alpha}\psi(x)$ and $\psi(x)$ give the same observables. Therefore, to link the interchanged wavefunctions, we suppose that

\[ \psi(x_1, x_2) = e^{i\alpha}\psi(x_2, x_1). \]

One can think about this phase shift in the following way. Consider two particles, and that we wish to move them along a line, to interchange their positions. Now, if we make the particles move along the line that joins them directly, then the particles will collide (which is bad). Hence, we make the particles move on trajectories slightly removed from direct. We make them rotate a bit at the point they would collide. That is, we send them on semi-circular trajectories. This is obviously equivalent to rotating the entire system.

Fig. 2.5. Interchanging two particles. Notice that we send the particles along slightly rotated paths: the little semi-circle about the centre-point.

Now, consider that we have interchanged the particles, picking up a phase $e^{i\alpha}$. Then, suppose that we move them back. Thus, we pick up another phase factor. Therefore,

\[ \psi(x_1, x_2) = e^{2i\alpha}\psi(x_1, x_2). \]

As we require this to be true, we require that $e^{2i\alpha} = 1$. This is satisfied by the angles $\alpha = 0, \pi$.

Notice then, the two angles give two possibilities for the single-interchange

\[ \psi(x_1, x_2) = \psi(x_2, x_1), \qquad \psi(x_1, x_2) = -\psi(x_2, x_1). \]

That is, one class of particles are symmetric under particle interchange, and another anti-symmetric.

We call the symmetric particles Bosons, and the anti-symmetric particles Fermions.

Notice that for the fermionic case, with two particles in the same place, the interchange is "false",

\[ \psi(a, a) = -\psi(a, a) \;\Rightarrow\; \psi(a, a) = 0. \]

Pauli's spin-statistics theorem states that particles with integer spin are bosons, and particles with half-integer spin are fermions.

2.3.1 Bosons & Fermions

As we just discussed, for the two particle case, bosons have totally symmetric wavefunctions, and fermions totally anti-symmetric wavefunctions.

Bose particles. Consider the many-particle wavefunction for the Bose case; then, under interchange of any two particles,

\[ \psi_B(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_N) = \psi_B(x_1, \ldots, x_j, \ldots, x_i, \ldots, x_N) \quad \forall i, j. \]

Fermi particles. Consider the analogous interchange for the Fermionic case,

\[ \psi_F(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_N) = (-1)^{\mathrm{Tr}}\,\psi_F(x_1, \ldots, x_j, \ldots, x_i, \ldots, x_N), \]

where $(-1)^{\mathrm{Tr}}$ is a sign-factor that we will come to. It essentially figures out how many places the particles had to move.


2.3.2 Non-interacting Particles

Let us consider N non-interacting particles. Then, the total Hamiltonian is just the sum of the free individual Hamiltonians,

\[ H(1, \ldots, N) = \sum_{j=1}^{N} H_0(j), \]

whereby

\[ H_0 = \sum_n\varepsilon_n|n\rangle\langle n|. \]

Now, suppose that the wavefunction $\psi_i(x_j)$ denotes that the single particle j is in energy state i. Then, the total wavefunction will be the appropriately symmetrised product of these single particle wavefunctions. That is, for a 2-particle bose-case, the total wavefunction looks like

\[ \Psi = \frac{\psi_1(x_1)\psi_n(x_2) + \psi_n(x_1)\psi_1(x_2)}{\sqrt{2}}. \]

That is, the total wavefunction is a (normalised) sum over the possible states of the system (assuming that the particles are either in energy state "1" or "n").
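The symmetrisation can be illustrated with a small sketch. The single-particle states used here (particle-in-a-box sines, via the hypothetical helper `phi`) are an assumption chosen purely for convenience; any orthonormal set would do. The antisymmetric combination is the fermionic analogue of the Bose wavefunction above.

```python
import math

def phi(i, x):
    """Hypothetical single-particle state i: particle in a box of unit length."""
    return math.sqrt(2.0) * math.sin((i + 1) * math.pi * x)

def two_particle(i, j, x1, x2, sign):
    """Two-particle wavefunction; sign=+1 is the Bose case, sign=-1 the Fermi case."""
    return (phi(i, x1) * phi(j, x2) + sign * phi(j, x1) * phi(i, x2)) / math.sqrt(2.0)

x1, x2 = 0.3, 0.7   # arbitrary positions inside the box

# Bose: unchanged under interchange of the two particles.
bose = two_particle(0, 2, x1, x2, +1)
bose_swapped = two_particle(0, 2, x2, x1, +1)

# Fermi: changes sign under interchange, and vanishes when x1 == x2.
fermi = two_particle(0, 2, x1, x2, -1)
fermi_swapped = two_particle(0, 2, x2, x1, -1)
fermi_same_place = two_particle(0, 2, x1, x1, -1)

print(bose, bose_swapped, fermi, fermi_swapped, fermi_same_place)
```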

Now, if we start to count up how many particles are in a particular energy state, then we denote $n_i$ as the number of particles in energy state i. Therefore, for the bose case, we can have $n_i$ as any number; but the fermion case has $n_i$ as only 0 or 1. That is, an energy state (for fermions) either has one or zero particles in it. So, we denote the basis vector

\[ |n\rangle = |n_0, n_1, n_2, \ldots, n_j, \ldots\rangle \]

as specifying the number of particles in each state. That is, the occupancy numbers of each state.

2.3.3 Creation & Destruction Operators

We now introduce the creation and destruction operators. They will allow the number of particles in a system to vary. Now, as the fermion and boson case are so different, we shall discuss them separately.


2.3.3.1 Bose Case

Now, the general overall wavefunction for non-interacting bosons is

\[ \psi(x_1, \ldots, x_N) = \frac{1}{\sqrt{N!}}\sum_{\text{transpositions}}\psi_0(x_1)\ldots\psi_0(x_{n_0})\,\psi_1(x_{n_0+1})\ldots\psi_1(x_{n_0+n_1})\ldots\psi_j\!\left(x_{1+\sum_{i=0}^{j-1}n_i}\right)\ldots\psi_j\!\left(x_{\sum_{i=0}^{j}n_i}\right)\ldots, \]

where the rather cumbersome notation means: the sum over all transpositions (all combinations) for any number of particles in any one of the states.

Now, let us define the creation and destruction operators.

The creation operator is defined as

\[ a_j^+|n_0, \ldots, n_j, \ldots\rangle = \sqrt{n_j + 1}\,|n_0, \ldots, n_j + 1, \ldots\rangle. \tag{2.3.1} \]

That is, acting the operator $a_j^+$ upon a state increases the number of particles in state j by one.

The destruction operator is defined as

\[ a_j|n_0, \ldots, n_j, \ldots\rangle = \sqrt{n_j}\,|n_0, \ldots, n_j - 1, \ldots\rangle. \tag{2.3.2} \]

[Figure 2.6: a level holding $n_j$ particles is taken by $a_j^+$ to the same level holding $n_j + 1$ particles.]

Fig. 2.6. The effect of acting the creation operator $a_j^+$ upon the system. The number of particles in state j is increased by 1, and the other states remain unchanged. That is, after the action of $a_j^+$, there are $n_j + 1$ particles in state j.

Notice that in

\[ \left(a_j a_j^+ - a_j^+ a_j\right)|N\rangle \]

the second term will give (reading from far right to left) a decreased number of particles in state j (and a factor of $\sqrt{n_j}$), then the operator $a_j^+$ will take that reduced number of particles back up to the beginning value, with another factor of $\sqrt{n_j}$. Then, seeing this for both terms, we see that

\[ \left(a_j a_j^+ - a_j^+ a_j\right)|N\rangle = \left(\sqrt{n_j + 1}\sqrt{n_j + 1} - \sqrt{n_j}\sqrt{n_j}\right)|N\rangle = |N\rangle. \]

Therefore, we can read off that

\[ a_j a_j^+ - a_j^+ a_j = 1. \tag{2.3.3} \]

That is, we have the commutator,

\[ \left[a_j, a_j^+\right] = 1. \tag{2.3.4} \]

In a similar vein, we can see that

\[ \left[a_i, a_j^+\right] = \delta_{ij}, \tag{2.3.5} \]

and also that

\[ \left[a_i^+, a_j^+\right] = 0, \quad i \neq j. \]
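The action of the Bose operators on occupancy numbers is simple enough to sketch directly. In the illustration below a state is represented as a tuple of occupancies and each operator returns a (coefficient, new state) pair; this representation and the function names are assumptions made for the example, not notation from the notes.

```python
import math

def create(j, state):
    """Bose a_j^+ on an occupancy tuple: returns (sqrt(n_j + 1), new state)."""
    n = list(state)
    coeff = math.sqrt(n[j] + 1)
    n[j] += 1
    return coeff, tuple(n)

def destroy(j, state):
    """Bose a_j on an occupancy tuple: returns (sqrt(n_j), new state)."""
    n = list(state)
    if n[j] == 0:
        return 0.0, state      # nothing to remove: the result is the zero vector
    coeff = math.sqrt(n[j])
    n[j] -= 1
    return coeff, tuple(n)

def commutator_on_state(j, state):
    """Coefficient of |N> in (a_j a_j^+ - a_j^+ a_j)|N>."""
    c1, s1 = create(j, state)
    c2, _ = destroy(j, s1)     # a_j a_j^+ gives (n_j + 1)
    d1, t1 = destroy(j, state)
    d2, _ = create(j, t1)      # a_j^+ a_j gives n_j
    return c1 * c2 - d1 * d2

print(create(1, (2, 3, 0)))               # coefficient 2.0, state (2, 4, 0)
print(commutator_on_state(1, (2, 3, 0)))  # close to 1, as in (2.3.4)
```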

2.3.3.2 Fermion Case

Here, we must be careful of where we move the "new particle" to. That is, let us create a state,

\[ a_j^+|N\rangle = |1_j, N\rangle, \]

where we start off by dumping that new state in the start of the ket. We use the notation $1_j$ to denote an extra particle in state j. Then, we must move that new state up to position j, so that

\[ a_j^+|n_0, \ldots, n_j, \ldots\rangle = (1 - n_j)\,(-1)^{\sum_{k=0}^{j-1}n_k}\,|n_0, \ldots, n_j + 1, \ldots\rangle. \tag{2.3.6} \]

Since each $n_j$ is either 0 or 1 for fermions, the factor $(1 - n_j)$ is 1 for an empty state and 0 for an occupied one, so that a second particle can never be added to the same state. Similarly, the destruction operator works via the "first slot" and the jth slot. In doing so, a change in sign may be picked up,

\[ a_j|n_0, \ldots, n_j, \ldots\rangle = \sqrt{n_j}\,(-1)^{\sum_{k=0}^{j-1}n_k}\,|n_0, \ldots, n_j - 1, \ldots\rangle. \tag{2.3.7} \]

These relations differ from the Bose case, due to the sign change. However, both are similar in that a new particle is created in a given state.

Consider, taking $i < j$ for definiteness,

\[ a_j^+ a_i^+|N\rangle = (1 - n_j)(1 - n_i)\,(-1)^{\sum_{k=0}^{i-1}n_k}\,(-1)^{1+\sum_{k=0}^{j-1}n_k}\,|\tilde{N}\rangle, \]

and also

\[ a_i^+ a_j^+|N\rangle = (1 - n_j)(1 - n_i)\,(-1)^{\sum_{k=0}^{i-1}n_k}\,(-1)^{\sum_{k=0}^{j-1}n_k}\,|\tilde{N}\rangle, \]

where $|\tilde{N}\rangle$ is the state with one extra particle in each of the states i and j.

Now, the two expressions are different, with the difference being very subtle. In the first expression, a particle is added to state i, so that the number in that state is $n_i + 1$. This then does something to the overall sign of the state: the extra particle in state i is counted when the sign factor for state j is worked out. Then, a particle is added to the state j, so that there are $n_j + 1$ in that state. So, in total, 2 particles were added to the system, with each particle appearing in different states.

The second expression does this the other way round. It adds a particle to state j then to state i. Irrespective of which comes first, i or j, there will be a sign difference between the two methods. Other than the sign, the two methods will produce the same state (i.e. raised the particle number, in two separate states, by two). This is because the "sign factor" will be raised to a different power, one will be even, one odd. Therefore,

\[ a_i^+ a_j^+|N\rangle = -a_j^+ a_i^+|N\rangle. \]

This results in a different sort of commutator. That is, we denote

\[ a_i^+ a_j^+ + a_j^+ a_i^+ = 0 \tag{2.3.8} \]

as

\[ \left\{a_i^+, a_j^+\right\} = 0. \tag{2.3.9} \]

We call this an "anti-commutator". That is, instead of taking their difference, we take their sum. In a similar way, we can show that

\[ \left\{a_i, a_j\right\} = 0, \qquad \left\{a_i, a_j^+\right\} = \delta_{ij}. \]

We use the notation

\[ (-1)^{\mathrm{Tr}} \equiv (-1)^{\sum_{k=0}^{j-1}n_k}, \]

as a transposition factor.
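The role of the transposition factor can be made concrete with a small sketch. States are tuples of occupancies (each 0 or 1) and each operator returns a (sign, new state) pair, with sign 0 standing for the zero vector; the representation and the helper names are illustrative assumptions. Creation on an occupied state is taken to give zero, as the Pauli principle requires.

```python
def transposition_sign(j, state):
    """(-1)^Tr: -1 if an odd number of particles sit in states before j."""
    return -1 if sum(state[:j]) % 2 else 1

def create(j, state):
    """Fermi a_j^+; returns (0, state) if state j is already occupied."""
    if state[j] == 1:
        return 0, state
    return transposition_sign(j, state), state[:j] + (1,) + state[j + 1:]

def destroy(j, state):
    """Fermi a_j; returns (0, state) if state j is empty."""
    if state[j] == 0:
        return 0, state
    return transposition_sign(j, state), state[:j] + (0,) + state[j + 1:]

def apply_pair(op1, i, op2, j, state):
    """Apply op2 on state j first, then op1 on state i; returns (sign, state)."""
    s2, mid = op2(j, state)
    if s2 == 0:
        return 0, state
    s1, final = op1(i, mid)
    return (s1 * s2, final) if s1 != 0 else (0, state)

start = (0, 1, 0, 0)
sign_ij, state_ij = apply_pair(create, 0, create, 2, start)  # a_0^+ a_2^+
sign_ji, state_ji = apply_pair(create, 2, create, 0, start)  # a_2^+ a_0^+

# Same final state, opposite sign: {a_0^+, a_2^+} = 0.
assert state_ij == state_ji == (1, 1, 1, 0)
assert sign_ij == -sign_ji
```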

2.3.4 The Secondary Quantisation Scheme

So, we have seen that we use commutators for bosons, and anti-commutators for fermions. The difference comes when we consider the required anti-symmetry of the fermionic wavefunction.

The vacuum state, $|0\rangle$, is that state which contains no particles. Then, the destruction operator acting upon the vacuum state results in

\[ a_i|0\rangle = 0, \quad \forall i. \]

Therefore, we can construct a state with N particles using the creation operator,

\[ |N\rangle = |n_0, n_1, \ldots\rangle = \prod_j\frac{(a_j^+)^{n_j}}{\sqrt{n_j!}}|0\rangle. \tag{2.3.10} \]

Hence, we see that the total number of particles is not fixed.


The set of all states $|N\rangle$ (the set where there are 0 particles, 1 particle, 2 particles etc), is called a Fock space. We can transfer between states in the Fock space using the creation and destruction operators. That is, we can write

\[ |N\rangle = Q|0\rangle, \qquad Q \equiv \prod_j\frac{(a_j^+)^{n_j}}{\sqrt{n_j!}}. \tag{2.3.11} \]

This idea of assigning an operator to a system is called secondary quantisation.

States are orthonormal, thus

\[ \langle M|a_j^+|N\rangle = \sqrt{n_j + 1}\;\delta_{M,N+1_j}\,(-1)^{\mathrm{Tr}}, \qquad \langle M|a_j|N\rangle = \sqrt{n_j}\;\delta_{M,N-1_j}\,(-1)^{\mathrm{Tr}}. \]

The number of particles operator is

\[ \hat{n}_j = a_j^+ a_j. \]

2.3.4.1 Non-interacting Hamiltonian

The Hamiltonian is, as previously stated,

\[ H = \sum_j H_0(j). \]

Therefore, inserting two unities,

\[ H = \sum_{j,M,N}|M\rangle\langle M|H_0(j)|N\rangle\langle N| = \sum_{j,M,N}n_j\varepsilon_j|M\rangle\langle M|N\rangle\langle N| = \sum_{j,N}n_j\varepsilon_j|N\rangle\langle N|. \]

Thus,

\[ H = \sum_N E_N|N\rangle\langle N|, \qquad E_N \equiv \sum_j n_j\varepsilon_j. \]

That is, H is diagonal in $|N\rangle$-representation, and $E_N$ is the sum over the number of particles in the single-particle Hamiltonian energy levels $\varepsilon_j$.

Now, as $\hat{n}_j = a_j^+ a_j$ is the particle number operator, we can replace $n_j$ with $\hat{n}_j$ in the Hamiltonian. That is, the Hamiltonian is just the sum over all energy levels times by the number of particles in each energy level. We find the number of particles in each energy level with the $\hat{n}_i$ operator. Therefore,

\[ H = \sum_i\varepsilon_i\hat{n}_i = \sum_i\varepsilon_i a_i^+ a_i. \]


If there are two-particle interactions, then

\[ H = \sum_i H_0(x_i) + \frac{1}{2}\sum_{i,j}V(x_i, x_j), \]

which we write, under the second quantisation scheme, as

\[ H = \sum_i\varepsilon_i a_i^+ a_i + \frac{1}{2}\sum_{i,k,l,m}V_{ik,lm}\,a_i^+ a_k^+ a_l a_m. \]

It is important to note the difference between the creation/destruction operators and the raising/lowering operators. The former are able to create particles, whereas the latter merely add energy to particles that are already there.

2.3.5 Average Number of Particles

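For a system with chemical potential µ, at temperature T, with a total number of particles N, the mean occupancy of state j is

\[ \langle n_j\rangle = \frac{\sum_{k=0}^{\infty}k\,e^{-\frac{k}{T}(\varepsilon_j - \mu)}}{\sum_{k=0}^{\infty}e^{-\frac{k}{T}(\varepsilon_j - \mu)}}, \]

where k will be the number of particles in a given state. The energy of state j is $\varepsilon_j$. To write this, we just used the standard result from the Maxwell-Gibbs relation from statistical mechanics. From this, we define the partition function,

\[ Z \equiv \sum_{k=0}^{\infty}e^{-\frac{k}{T}(\varepsilon_j - \mu)} = \frac{1}{1 - e^{-\frac{\varepsilon_j - \mu}{T}}}. \]

This closed form only holds for the bosonic case, where we are allowed to take k = 0, 1, 2, 3, .... Now, notice that

\[ T^2\frac{\partial Z}{\partial T} = \sum_{k=0}^{\infty}k\,(\varepsilon_j - \mu)\,e^{-\frac{k}{T}(\varepsilon_j - \mu)}. \]

Then, further notice that

\[ \langle n_j\rangle = \frac{T^2}{\varepsilon_j - \mu}\frac{\partial}{\partial T}\ln Z, \]

then, doing the differential, results in

\[ \langle n_j\rangle_B = \frac{1}{e^{\frac{\varepsilon_j - \mu}{T}} - 1}. \]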


Therefore, we have used the Maxwell-Gibbs distribution to derive the average number of bosons in energy state $\varepsilon_j$. From this, we have Bose-Einstein condensation, as a consequence of requiring a fixed number of particles as temperature drops and µ → 0.

For the fermionic case, we take the sum from k = 0, 1 only. Hence, we end up with

\[ \langle n_j\rangle_F = \frac{1}{e^{\frac{\varepsilon_j - \mu}{T}} + 1}. \]
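Both closed forms can be compared against the defining sums. The sketch below is illustrative only: the values of `eps`, `mu` and `T` are arbitrary (with the energy above the chemical potential, so that the Bose sum converges), temperature is measured in energy units as in the text, and the infinite Bose sum is cut off at 200 terms.

```python
import math

def mean_occupancy_sum(eps, mu, T, k_values):
    """<n> = sum_k k w_k / sum_k w_k with weights w_k = exp(-k (eps - mu) / T)."""
    weights = [math.exp(-k * (eps - mu) / T) for k in k_values]
    return sum(k * w for k, w in zip(k_values, weights)) / sum(weights)

def bose_einstein(eps, mu, T):
    return 1.0 / (math.exp((eps - mu) / T) - 1.0)

def fermi_dirac(eps, mu, T):
    return 1.0 / (math.exp((eps - mu) / T) + 1.0)

eps, mu, T = 1.0, 0.2, 0.5   # arbitrary illustrative values, eps > mu

bose_direct = mean_occupancy_sum(eps, mu, T, range(200))  # k = 0, 1, 2, ...
fermi_direct = mean_occupancy_sum(eps, mu, T, range(2))   # k = 0, 1 only

print(bose_direct, bose_einstein(eps, mu, T))
print(fermi_direct, fermi_dirac(eps, mu, T))
```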

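Now, the point of doing this, within the context of secondary quantisation, is that we can write that

\[ H = \sum_i\varepsilon_i\hat{n}_i \;\Rightarrow\; \langle H\rangle = \sum_i\varepsilon_i\langle n_i\rangle, \]

that is, the average energy is

\[ \langle H\rangle = \sum_i\frac{\varepsilon_i}{e^{\frac{\varepsilon_i - \mu}{T}} \pm 1}, \]

with the upper sign for fermions and the lower sign for bosons.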

2.3.6 Quasi-particles

Recall the two-particle interaction potential,

\[ V = \frac{1}{2}\sum_{i,k,l,m}V_{ik,lm}\,a_i^+ a_k^+ a_l a_m. \]

Now, if the particles are in pairs, we write this as an expansion

\[ V = \sum_p\left(w_p\,a_p^+ a_{-p}^+ + w_p\,a_p a_{-p}\right), \]

so that the Hamiltonian reads

\[ H = \sum_p\varepsilon_p\,a_p^+ a_p + \sum_p\left(w_p\,a_p^+ a_{-p}^+ + w_p\,a_p a_{-p}\right). \]

Now, it is clear that this Hamiltonian is not diagonal. However, let us define a basis in which the Hamiltonian is diagonal. That is, the Hamiltonian will have the form

\[ H = \sum_p E_p\,b_p^+ b_p, \]

where the operators $b_p^+$, $b_p$ satisfy the same anti-commutation relations as $a_p^+$, $a_p$;

\[ \left\{a_p^+, a_{p'}\right\} = \delta_{pp'}, \qquad \left\{b_p^+, b_{p'}\right\} = \delta_{pp'}. \]


Then, by using b's rather than a's, we have introduced new "particles", for which the b's are their creation/destruction operators. That is, the operators $b_p^+$, $b_p$ correspond to quasi-particles, for which the Hamiltonian is diagonal. The quasi-particles are constructed by some expansion

\[ b_p = u_p a_p + v_p a_{-p}^+. \]
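It may help to see what the anti-commutation requirement implies for the coefficients. The following is a short sketch, assuming that $u_p$ and $v_p$ are ordinary complex numbers, that $p \neq -p$, and that the a's obey the fermionic anti-commutation relations quoted above (these assumptions are not spelled out in the notes).

```latex
% Adjoint of the quasi-particle operator b_p = u_p a_p + v_p a_{-p}^+ :
%   b_p^+ = u_p^* a_p^+ + v_p^* a_{-p}
\begin{align*}
\{b_p, b_p^+\}
  &= |u_p|^2 \{a_p, a_p^+\}
   + |v_p|^2 \{a_{-p}^+, a_{-p}\}
   + u_p v_p^* \{a_p, a_{-p}\}
   + v_p u_p^* \{a_{-p}^+, a_p^+\} \\
  &= |u_p|^2 + |v_p|^2 .
\end{align*}
% The last two anti-commutators vanish, the first two equal 1.
% Requiring \{b_p, b_p^+\} = 1 therefore gives |u_p|^2 + |v_p|^2 = 1.
```

So, under these assumptions, the expansion coefficients are not independent: they lie on a unit circle in the $(|u_p|, |v_p|)$ plane.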

2.4 Symmetries in Quantum Mechanics

Let us consider symmetries, and what consequences they have.

A symmetry is defined as some operation, which when applied to a system, leaves the Hamiltonian of the system unchanged. Then, suppose that $\mathcal{O}$ is some symmetry functional, so that

\[ \mathcal{O}H = H. \]

That $\mathcal{O}$ is a functional, rather than an operator (at this stage) means that $\mathcal{O}$ changes the things that H is a function of. That is,

\[ \mathcal{O} : x \mapsto x' = f(x). \]

Then, the symmetry statement is that

\[ H(f(x)) = H(x). \]

Therefore, we see that under the symmetry functional, the Hamiltonian has the same dynamics, in the new coordinate system.

Examples of Symmetry Operations. The time translation functional,

\[ \mathcal{O}_T H = H \;\Rightarrow\; H(t + T) = H(t), \]

the space translation,

\[ \mathcal{O}_a H = H \;\Rightarrow\; H(x + a) = H(x), \]

or the parity operator

\[ P H = H \;\Rightarrow\; H(-x) = H(x). \]

Commuting Operators. If A and B are both diagonal in some basis, then A and B commute. One can see this fairly simply,

\[ (AB)_{ik} = \sum_j a_i\delta_{ij}\,b_j\delta_{jk} = a_i b_i\,\delta_{ik} = b_i a_i\,\delta_{ik} = (BA)_{ik}. \]

Also, if A and B commute, then there exists a basis in which they are both diagonal. Similarly, if A, B, C are mutually diagonal in some basis, then they all mutually commute. The converse is also true.


If A is a time independent operator which commutes with the Hamiltonian, then the expectation value of A is conserved. Finally, the set of commuting operators, that commute with the Hamiltonian, provides "good" quantum numbers which are conserved with time.

For example, as the Hamiltonian of a free particle commutes with momentum, p, then the eigenstates of momentum are able to characterise the Hamiltonian. Also, the Hamiltonian of many identical particles, as previously discussed, commutes with the operator of particle transposition. Thus, those eigenstates are either symmetric or anti-symmetric, corresponding to either bosons or fermions.

The Hamiltonian of the harmonic oscillator commutes with the parity operator. Therefore, all energy eigenfunctions of the harmonic oscillator are symmetric or anti-symmetric.

Symmetry as an Operator. Consider assigning an operator $\hat{O}$ to the functional $\mathcal{O}$. Then, the symmetry operator commutes with the Hamiltonian,

\[ \left[\hat{O}, H\right] = 0. \]

Therefore, as the symmetry operator and Hamiltonian commute, they are diagonal in the same basis. Also, therefore, the expectation value of $\hat{O}$ is conserved.

Now, a symmetry operator is not necessarily Hermitian, and therefore will not necessarily give observable quantities.

2.4.1 The Translation Operator

The translation operator is such that

\[ T_a : x_i \mapsto x_i' = x_i + a_i, \]

or,

\[ T_a\psi(x) = \psi(x + a). \]

We can write the translation operator in a few different forms. Notice that

\[ e^{a\frac{d}{dx}}\psi(x) = \sum_{n=0}^{\infty}\frac{a^n}{n!}\frac{d^n}{dx^n}\psi(x) = \sum_{n=0}^{\infty}\frac{a^n}{n!}\psi^{(n)}(x), \]

which is just a Taylor expansion of $\psi(x + a)$. Hence,

\[ e^{a\frac{d}{dx}}\psi(x) = \psi(x + a). \]

That is, we identify the translation operator with

\[ \left(T_a\right)_{xx'} = \delta_{xx'}\,e^{a\frac{d}{dx}}. \tag{2.4.1} \]
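The Taylor-series form of the translation operator is easy to test on a function whose derivatives are known in closed form. The sketch below uses $\psi(x) = \sin x$ as an arbitrary test function, for which the nth derivative is $\sin(x + n\pi/2)$; the function names and the cut-off of 30 terms are illustrative choices.

```python
import math

def translate_by_series(derivative, x, a, terms=30):
    """Sum a^n / n! * psi^(n)(x); derivative(n, x) is the nth derivative of psi at x."""
    return sum(a**n / math.factorial(n) * derivative(n, x) for n in range(terms))

def sin_derivative(n, x):
    """nth derivative of sin(x), using d^n/dx^n sin(x) = sin(x + n pi / 2)."""
    return math.sin(x + n * math.pi / 2)

x, a = 0.4, 1.3
shifted = translate_by_series(sin_derivative, x, a)

# exp(a d/dx) sin(x) should equal sin(x + a).
print(shifted, math.sin(x + a))
```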

Another way to express the translation operator, is using delta-functions. Recall that

\[ \int\delta(y - z)f(z)\,dz = f(y), \]

then, we could write that

\[ \left(T_a\right)_{xx'} = \delta(x + a - x'), \tag{2.4.2} \]

so that

\[ \int\left(T_a\right)_{xx'}\psi(x')\,dx' = \psi(x + a). \]

Finally, consider that

\[ T_a\psi(x) = \chi(x) = \psi(x + a), \]

so that projecting onto bra-ket notation

\[ \langle x|T_a|\psi\rangle = \langle x|\chi\rangle = \langle x + a|\psi\rangle. \]

Then, working with the far RHS and LHS only, we multiply by $|x\rangle$, and sum,

\[ \sum_x|x\rangle\langle x|T_a|\psi\rangle = \sum_x|x\rangle\langle x + a|\psi\rangle, \]

we notice that the far LHS contains a unity, so that

\[ T_a|\psi\rangle = \sum_x|x\rangle\langle x + a|\psi\rangle. \]

Therefore,

\[ T_a = \sum_x|x\rangle\langle x + a|. \tag{2.4.3} \]

Now, notice that $T_a$ is non-Hermitian. That is, one can easily see that

\[ T_a^\dagger = \sum_x|x + a\rangle\langle x|. \]


Using this, construct

\[ T_a^\dagger T_a = \sum_{x,x'}|x + a\rangle\langle x|x'\rangle\langle x' + a|, \]

the middle term is just a Kronecker-delta, so that

\[ T_a^\dagger T_a = \sum_{x,x'}|x + a\rangle\,\delta_{xx'}\,\langle x' + a| = \sum_x|x + a\rangle\langle x + a| = 1. \]

Therefore, we see that $T_a$ is unitary.

2.4.1.1 Eigenvalues & Eigenstates of Ta

Consider solving the eigenequation

\[ T_a|\psi\rangle = \lambda|\psi\rangle, \qquad \langle\psi|\psi\rangle = 1, \]

for λ. That is, we want to solve

\[ \psi(x + a) = \lambda\psi(x). \]

So, we take (2.4.3), and insert two unities,

\[ T_a = \sum_{x,p,p'}|p'\rangle\langle p'|x\rangle\langle x + a|p\rangle\langle p|. \]

Now, we know that

\[ \langle x|p\rangle = P(x) = \frac{1}{(2\pi\hbar)^{n/2}}\,e^{ipx/\hbar}, \]

and hence therefore,

\begin{align*}
T_a &= \sum_{x,p,p'}|p'\rangle\frac{1}{(2\pi\hbar)^n}\,e^{-ip'x/\hbar}\,e^{ip(x+a)/\hbar}\,\langle p| \\
    &= \sum_{x,p,p'}|p'\rangle\langle p|\frac{1}{(2\pi\hbar)^n}\,e^{ix(p-p')/\hbar}\,e^{iap/\hbar} \\
    &= \sum_{p,p'}|p'\rangle\langle p|\frac{1}{(2\pi\hbar)^n}\,\delta(p - p')\,(2\pi\hbar)^n\,e^{iap/\hbar} \\
    &= \sum_p|p\rangle\langle p|\,e^{iap/\hbar}.
\end{align*}

That is,

\[ T_a = \sum_p|p\rangle\langle p|\,e^{iap/\hbar}. \]

Hence, we have eigenvalues and eigenstates

\[ \lambda = e^{iap/\hbar}, \qquad |p\rangle. \]
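A discrete analogue makes these statements easy to check. The sketch below replaces the continuous line by a ring of `N` lattice sites with periodic boundary conditions (an assumption made so that the translation is a finite matrix) and takes a shift of one site. The plane waves on the ring then play the role of the momentum states $|p\rangle$, with eigenvalue $e^{2\pi i k/N}$ in place of $e^{iap/\hbar}$.

```python
import numpy as np

N = 16                                # number of lattice sites (illustrative)

# T[i, i+1] = 1, so that (T psi)[x] = psi[x + 1], the lattice form of (2.4.3).
T = np.roll(np.eye(N), 1, axis=1)

# Unitary but not Hermitian, as in the text.
is_unitary = np.allclose(T.conj().T @ T, np.eye(N))
is_hermitian = np.allclose(T, T.conj().T)

# Plane wave with integer wavenumber k: the lattice analogue of |p>.
k = 3
sites = np.arange(N)
plane_wave = np.exp(2j * np.pi * k * sites / N) / np.sqrt(N)
eigenvalue = np.exp(2j * np.pi * k / N)

is_eigenvector = np.allclose(T @ plane_wave, eigenvalue * plane_wave)
print(is_unitary, is_hermitian, is_eigenvector)
```

The eigenvalue has unit modulus, as it must for a unitary operator.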

2.4.1.2 “The Trick”: Relation to Momentum

Now, recall that we showed that the translation operator was unitary. That is,

$$
T_a^\dagger T_a = 1, \qquad \forall a.
$$

Now, let us take some small $\delta a$, and expand in terms of a Taylor series,

$$
T_{\delta a} \approx 1 + Q\,\delta a, \qquad Q \equiv \left.\frac{\partial T_a}{\partial a}\right|_{a=0}.
$$

Similarly, the Hermitian conjugate can be written,

$$
T_{\delta a}^\dagger \approx 1 + Q^\dagger\,\delta a.
$$

Then, the statement of unitarity is that

$$
T_{\delta a}^\dagger T_{\delta a} = (1 + Q^\dagger\,\delta a)(1 + Q\,\delta a) = 1.
$$

So, expanding to first order in $\delta a$ only,

$$
1 + \left(Q + Q^\dagger\right)\delta a = 1,
$$

which immediately leads us to see that

$$
Q = -Q^\dagger.
$$

That is, $Q$ is anti-Hermitian.

Now, let us write

$$
\pi = -i\hbar Q,
$$

so that its Hermitian conjugate is

$$
\pi^\dagger = i\hbar Q^\dagger = -i\hbar Q = \pi.
$$

That is, $\pi$ is Hermitian. Furthermore, as $T_a$ is conserved, then so is $Q$, and hence so is $\pi$. Then, using the representation of $T_a$ as

$$
T_a = e^{a\frac{d}{dx}} \approx 1 + a\frac{d}{dx},
$$

we see that we can read off the identification

$$
Q = \frac{d}{dx},
$$

and therefore that

$$
\pi = -i\hbar\frac{d}{dx} = p.
$$

This is just the momentum operator. That is, as a consequence of translational invariance of the Hamiltonian operator, we have momentum conservation. Also, recall that we then have

$$
p = -i\hbar \left.\frac{\partial T_a}{\partial a}\right|_{a=0}.
$$

We say that $p$ is the generator of the translation group.

This has all been for passive transformations, whereby we move coordinate systems for a given system. An active transformation is one in which we move the system relative to a fixed coordinate system.
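As a quick illustration of the representation $T_a = e^{a\,d/dx}$, the sketch below sums the exponential series acting on a test function and compares it with the shifted function. The choice $\psi(x) = \sin x$ and the function names are assumptions made purely for illustration.

```python
import math

def translate_by_series(x, a, n_terms=30):
    """Truncated sum over n of a^n / n! times the n-th derivative of sin at x.

    This is exp(a d/dx) acting on sin(x), cut off after n_terms terms.
    """
    # Derivatives of sin cycle with period 4: sin, cos, -sin, -cos.
    derivatives = (math.sin(x), math.cos(x), -math.sin(x), -math.cos(x))
    total = 0.0
    for n in range(n_terms):
        total += a ** n / math.factorial(n) * derivatives[n % 4]
    return total

def first_order(x, a):
    """The truncation (1 + a d/dx) sin(x), used to read off Q = d/dx."""
    return math.sin(x) + a * math.cos(x)
```

For a finite shift the full series reproduces $\sin(x+a)$, while for a small shift the first-order truncation already agrees up to terms of order $a^2$.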

2.4.2 Generators, Conservation & Gauges

We can generalise the previous discussion.

Consider some symmetry operator $U(\alpha)$, which commutes with the Hamiltonian $H(x)$. Then,

$$
U(\alpha) H(x) = H(x),
$$

and

$$
U(\alpha)\psi(x) = \psi\big(U(\alpha)x\big).
$$

Also, the symmetry operator is unitary,

$$
U^\dagger(\alpha) U(\alpha) = 1. \tag{2.4.4}
$$

Then, we can expand the operator for small argument,

$$
U(\delta\alpha) \approx 1 + \delta\alpha \left.\frac{\partial U}{\partial \alpha}\right|_{\alpha=0}. \tag{2.4.5}
$$

Furthermore, we can define some $R$ which is Hermitian and whose expectation value is conserved,

$$
R \equiv -i\hbar \left.\frac{\partial U}{\partial \alpha}\right|_{\alpha=0}. \tag{2.4.6}
$$

As another example of a symmetry operator, consider the phase operator,

$$
U_\alpha \equiv e^{i\alpha}, \qquad \forall \alpha.
$$

This corresponds to adding a phase shift to the wavefunction, which does nothing to the observables of the state. The associated conserved quantity is

$$
R \equiv -i\hbar \left.\frac{\partial}{\partial \alpha} e^{i\alpha}\right|_{\alpha=0} = \hbar.
$$

The thing that is actually conserved is the expectation value of $R$,

$$
\langle\psi| R |\psi\rangle = \langle\psi| \hbar |\psi\rangle = \hbar.
$$

That is, given an initially normalised wavefunction, the norm is conserved: phase symmetry leads to wavefunction norm conservation.

Now, in adding the phase shift, we have an arbitrary choice of origin. That is, we can choose

$$
\alpha' = \alpha - \alpha_0(x).
$$

Then,

$$
p = -i\hbar \frac{d}{dx}\, e^{-i\alpha_0(x)} \psi(x),
$$

therefore, we see that we have some freedom in how we choose this,

$$
p = -i\hbar\frac{d}{dx} + A, \qquad A' = A + \frac{\partial \alpha}{\partial x}.
$$

That is, we introduce some new field, $A$, which will allow preservation of symmetries. If the field is a function of position, then the symmetry will be localised, and we call it a gauge symmetry. If the symmetry is not a function of position, then the symmetry is global.

2.5 Angular Momentum

Let us consider a rotation, in 3D, about the $z$-axis, through an angle $\theta$. Then, the change of coordinates is given by

$$
x' = x\cos\theta + y\sin\theta, \qquad
y' = y\cos\theta - x\sin\theta, \qquad
z' = z.
$$

This may be written in matrix form as

$$
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
=
\begin{pmatrix}
\cos\theta & \sin\theta & 0 \\
-\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}.
$$

So then, denoting the transformation operator as $U$, it is clear from the above that $U$ operates on the coordinates only,

$$
\psi(x') = U\psi(x) = \psi(U(x)).
$$

That is,

$$
\psi(x', y', z') = \psi\big(U(x, y, z)\big) = \psi(x\cos\theta + y\sin\theta,\; y\cos\theta - x\sin\theta,\; z).
$$

Now, it is obvious that the operator $U$ is unitary: rotating one way, then back, puts you at the same place you started.

So, given the operator $U$, we can expand for small transformation angle, as in the previous section on symmetries. That is, we write

$$
U(\delta\theta) = 1 - \frac{R}{i\hbar}\,\delta\theta.
$$

Also, we use the small angle approximation on the cosine and sine terms,

$$
\cos\theta \sim 1, \qquad \sin\theta \sim \theta, \qquad \theta \ll 1.
$$

Then, the transformation looks like

$$
\psi(x', y', z') = \psi\big(U(\delta\theta)\big) = \psi(x + y\,\delta\theta,\; y - x\,\delta\theta,\; z).
$$

If we then Taylor expand this,

$$
\psi(x', y', z') \approx \psi(x, y, z) + \frac{\partial\psi}{\partial x}\, y\,\delta\theta + \frac{\partial\psi}{\partial y}\,(-x\,\delta\theta),
$$

collecting terms,

$$
\begin{aligned}
\psi(x', y', z') &\approx \psi(x, y, z) + \left(y\frac{\partial\psi}{\partial x} - x\frac{\partial\psi}{\partial y}\right)\delta\theta \\
&= \left[1 + \left(y\frac{\partial}{\partial x} - x\frac{\partial}{\partial y}\right)\delta\theta\right]\psi \\
&= U(\delta\theta)\,\psi.
\end{aligned}
$$

So, if we compare this with the expansion of the operator, we identify

$$
R = -i\hbar\left(y\frac{\partial}{\partial x} - x\frac{\partial}{\partial y}\right),
$$

which is, up to an overall sign fixed by the sense of the rotation, just

$$
R = x p_y - y p_x = L_z.
$$

Now, following our symmetries discussion, $R$ is conserved. Therefore, we see that the $z$-component of angular momentum is conserved.

That is, rotational symmetry implies conservation of angular momentum. Notice that this is analogous to norm conservation from phase-shift symmetry, and momentum conservation from translation symmetry.
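The first-order expansion used to read off the generator can be checked numerically. This is a minimal sketch, assuming an arbitrary polynomial test function (a hypothetical choice, not from the notes): it compares the exact change of $\psi$ under the small rotation with $\delta\theta\,(y\,\partial_x - x\,\partial_y)\psi$.

```python
def psi(x, y):
    """An arbitrary smooth test function (hypothetical choice)."""
    return x * x * y + 3.0 * y

def dpsi_dx(x, y):
    """Analytic partial derivative of psi with respect to x."""
    return 2.0 * x * y

def dpsi_dy(x, y):
    """Analytic partial derivative of psi with respect to y."""
    return x * x + 3.0

def rotated_psi(x, y, dtheta):
    """psi at the infinitesimally rotated coordinates (x + y dtheta, y - x dtheta)."""
    return psi(x + y * dtheta, y - x * dtheta)

def generator_on_psi(x, y):
    """(y d/dx - x d/dy) psi, the operator multiplying dtheta in the expansion."""
    return y * dpsi_dx(x, y) - x * dpsi_dy(x, y)

def first_order_mismatch(x, y, dtheta):
    """Exact change in psi minus the first-order prediction dtheta * (y d/dx - x d/dy) psi."""
    exact_change = rotated_psi(x, y, dtheta) - psi(x, y)
    return exact_change - dtheta * generator_on_psi(x, y)
```

The mismatch should shrink quadratically with `dtheta`, which is the statement that the operator in brackets captures the whole first-order change.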

We can cycle the indices, so that we obtain the full set,

$$
L_x = y p_z - z p_y, \qquad
L_y = z p_x - x p_z, \qquad
L_z = x p_y - y p_x.
$$

Notice that we can group these together into a “vector equation”,

$$
\mathbf{L} = \mathbf{r} \times \mathbf{p}. \tag{2.5.1}
$$

One may use the language that the $L_i$ are the generators for rotations about small angles.

Now, we can fairly easily derive the commutators, using known commutators between position and momentum. So, consider

$$
\begin{aligned}
[L_x, L_y] &= [y p_z - z p_y,\; z p_x - x p_z] \\
&= [y p_z, z p_x] - [y p_z, x p_z] - [z p_y, z p_x] + [z p_y, x p_z].
\end{aligned}
$$

Now, the second and third terms are zero. One can see this, as the multiplying coordinate is never the one in the subscript of the momentum. So, we have

$$
[L_x, L_y] = [y p_z, z p_x] + [z p_y, x p_z].
$$

Let us explicitly evaluate the first commutator,

$$
\begin{aligned}
[y p_z, z p_x] &= y\,[p_z, z p_x] \\
&= y\,(p_z z p_x - z p_x p_z) \\
&= y\left(-i\hbar\frac{\partial}{\partial z}(z p_x) - z p_x p_z\right) \\
&= y\,(-i\hbar p_x + z p_z p_x - z p_x p_z) \\
&= y\,(-i\hbar p_x + z\,[p_z, p_x]) \\
&= -i\hbar\, y p_x.
\end{aligned}
$$

In a very similar way, we can also find that

$$
[z p_y, x p_z] = i\hbar\, p_y x,
$$

thus giving the commutator

$$
[L_x, L_y] = i\hbar\, p_y x - i\hbar\, y p_x = i\hbar L_z.
$$

That is,

$$
[L_x, L_y] = i\hbar L_z. \tag{2.5.2}
$$

Again, if we cycle the indices, we find the whole set of commutators,

$$
[L_x, L_y] = i\hbar L_z, \qquad
[L_y, L_z] = i\hbar L_x, \qquad
[L_z, L_x] = i\hbar L_y.
$$

We can write this using the totally-antisymmetric Levi-Civita tensor,

$$
[L_a, L_b] = i\hbar\,\varepsilon_{abc} L_c, \qquad a, b, c = x, y, z.
$$

We shall tend to use indices $a, b, c$ rather than $i, j, k$, to avoid confusion with the complex number $i$.

From these, we define

$$
L^2 = L_x^2 + L_y^2 + L_z^2,
$$

whereby it is obvious (and easily provable) that

$$
[L_i, L^2] = 0, \qquad \forall i = x, y, z.
$$
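These relations can be verified on an explicit set of matrices. The sketch below assumes the $3\times 3$ matrices $(L_a)_{bc} = -i\hbar\,\varepsilon_{abc}$, a standard spin-1 realisation that the notes do not introduce, and works in units with $\hbar = 1$. Helper names are hypothetical.

```python
HBAR = 1.0  # units with hbar = 1 (an assumption made for this check)

def levi_civita(a, b, c):
    """Totally antisymmetric symbol for indices 0, 1, 2 (standing for x, y, z)."""
    return (a - b) * (b - c) * (c - a) / 2

def matmul(p, q):
    """Product of two square matrices stored as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def combine(p, q, alpha=1, beta=1):
    """alpha * p + beta * q, entry by entry."""
    return [[alpha * x + beta * y for x, y in zip(rp, rq)] for rp, rq in zip(p, q)]

def commutator(p, q):
    """[p, q] = p q - q p."""
    return combine(matmul(p, q), matmul(q, p), 1, -1)

def max_abs(m):
    """Largest absolute value among the entries of m."""
    return max(abs(x) for row in m for x in row)

# Assumed realisation: (L_a)_{bc} = -i hbar eps_{abc}.
L = [[[-1j * HBAR * levi_civita(a, b, c) for c in range(3)] for b in range(3)]
     for a in range(3)]

def algebra_residual(a, b):
    """Largest entry of [L_a, L_b] - i hbar eps_{abc} L_c (summed over c)."""
    rhs = [[0j] * 3 for _ in range(3)]
    for c in range(3):
        rhs = combine(rhs, L[c], 1, 1j * HBAR * levi_civita(a, b, c))
    return max_abs(combine(commutator(L[a], L[b]), rhs, 1, -1))

# L^2 = L_x^2 + L_y^2 + L_z^2.
L_SQUARED = combine(combine(matmul(L[0], L[0]), matmul(L[1], L[1])), matmul(L[2], L[2]))
```

In this realisation $L^2$ comes out proportional to the identity, so it trivially commutes with each $L_a$; the residuals of the commutation relations should all vanish.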


2.5.1 Eigenstates and Eigenvalues of Angular Momentum

Now, as previously stated in our discussion on commuting operators, the fact that the components of angular momentum do not commute means that there does not exist a basis in which all $L_i$ are diagonal.

So, we shall tend to work with the set of operators that do commute, the biggest set of which is $L^2$ together with one of the $L_i$. For historical reasons, we work with $L^2$ and $L_z$.

We use the basis $|\lambda, \mu\rangle$, and wish to solve the eigenequations

$$
L^2 |\lambda, \mu\rangle = \lambda^2 |\lambda, \mu\rangle, \qquad L_z |\lambda, \mu\rangle = \mu |\lambda, \mu\rangle;
$$

whereby the eigenstates have unit norm, $\langle\lambda, \mu|\lambda, \mu\rangle = 1$.

2.5.1.1 Ladder Operators

We shall define the ladder operators,

$$
L_\pm \equiv L_x \pm i L_y. \tag{2.5.3}
$$

It is clear that $L_+$ and $L_-$ are adjoint to each other.

So, let us consider the commutator,

$$
\begin{aligned}
[L_z, L_+] &= [L_z, L_x] + i\,[L_z, L_y] \\
&= i\hbar L_y + \hbar L_x \\
&= \hbar L_+.
\end{aligned}
$$

Similarly, one finds that

$$
[L_z, L_-] = -\hbar L_-, \qquad [L_z, L_+] = \hbar L_+. \tag{2.5.4}
$$

Now, consider

$$
\begin{aligned}
L_+ L_- &= (L_x + iL_y)(L_x - iL_y) \\
&= L_x^2 + L_y^2 + i\,[L_y, L_x] \\
&= L_x^2 + L_y^2 + \hbar L_z,
\end{aligned}
$$

but,

$$
L^2 = L_x^2 + L_y^2 + L_z^2 \quad\Rightarrow\quad L_x^2 + L_y^2 = L^2 - L_z^2,
$$

hence, we see that

$$
L_+ L_- = L^2 - L_z^2 + \hbar L_z.
$$

Rearranging, trivially,

$$
L^2 = L_+ L_- + L_z^2 - \hbar L_z.
$$

It is also easy to show that a similar result is

$$
L_- L_+ = L^2 - L_z^2 - \hbar L_z.
$$

As $L^2$ commutes with each of $L_x$, $L_y$ and $L_z$, it also commutes with $L_\pm$. That is,

$$
[L_\pm, L^2] = 0.
$$

Now, consider

$$
L_z L_+ |\lambda, \mu\rangle,
$$

then, by the commutation relation (2.5.4), we see that this is just

$$
L_z L_+ |\lambda, \mu\rangle = \left(L_+ L_z + \hbar L_+\right) |\lambda, \mu\rangle.
$$

Now, we know the action of $L_z$ upon this state, so

$$
\begin{aligned}
L_z L_+ |\lambda, \mu\rangle &= \left(L_+ \mu + \hbar L_+\right) |\lambda, \mu\rangle \\
&= (\mu + \hbar)\, L_+ |\lambda, \mu\rangle.
\end{aligned}
$$

Now, let us write this with some brackets,

$$
L_z \big\{ L_+ |\lambda, \mu\rangle \big\} = (\mu + \hbar) \big\{ L_+ |\lambda, \mu\rangle \big\}.
$$

So, the thing in brackets is an eigenstate of $L_z$: the action of $L_+$ upon the state $|\lambda, \mu\rangle$ turns it into a new state $|\lambda, \mu + \hbar\rangle$, which is an eigenstate of $L_z$ with eigenvalue $\mu + \hbar$. Therefore,

$$
L_+ |\lambda, \mu\rangle = C_+ |\lambda, \mu + \hbar\rangle,
$$

and, similarly,

$$
L_- |\lambda, \mu\rangle = C_- |\lambda, \mu - \hbar\rangle.
$$

From this, we say that the action of $L_+$ upon a state increases the projection number (i.e. the $\mu$ bit) by one unit of $\hbar$.

We can find the constant $C_+$ by requiring the normalisation of $|\lambda, \mu + \hbar\rangle$. So, we have that

$$
L_+ |\lambda, \mu\rangle = C_+ |\lambda, \mu + \hbar\rangle,
$$

and its Hermitian conjugate,

$$
\langle\lambda, \mu| L_- = C_+^* \langle\lambda, \mu + \hbar|,
$$

so that

$$
|C_+|^2 \langle\lambda, \mu + \hbar|\lambda, \mu + \hbar\rangle = \langle\lambda, \mu| L_- L_+ |\lambda, \mu\rangle.
$$

That is,

$$
\begin{aligned}
|C_+|^2 &= \langle\lambda, \mu| L_- L_+ |\lambda, \mu\rangle \\
&= \langle\lambda, \mu| L^2 - L_z^2 - \hbar L_z |\lambda, \mu\rangle \\
&= \lambda^2 - \mu^2 - \hbar\mu.
\end{aligned}
$$

Similarly, one finds that

$$
|C_-|^2 = \lambda^2 - \mu^2 + \hbar\mu.
$$

Now, it is obvious that we require $|C_\pm|^2 \geq 0$. Therefore, we shall say that it is zero for some $\mu_{\min}$, $\mu_{\max}$. That is,

$$
L_+ |\lambda, \mu_{\max}\rangle = 0 \quad\Rightarrow\quad \lambda^2 - \mu_{\max}^2 - \hbar\mu_{\max} = 0,
$$

$$
L_- |\lambda, \mu_{\min}\rangle = 0 \quad\Rightarrow\quad \lambda^2 - \mu_{\min}^2 + \hbar\mu_{\min} = 0.
$$

Subtracting the two expressions on the right reveals

$$
\mu_{\min}^2 - \mu_{\max}^2 - \hbar(\mu_{\min} + \mu_{\max}) = 0,
$$

which factorises as $(\mu_{\min} + \mu_{\max})(\mu_{\min} - \mu_{\max} - \hbar) = 0$. Since $\mu_{\min} \leq \mu_{\max}$, the second factor cannot vanish, so

$$
\mu_{\max} = -\mu_{\min}.
$$

Now, notice that at this maximum value (say),

$$
\lambda^2 = \mu_{\max}(\mu_{\max} + \hbar).
$$

Furthermore, since the ladder operators step from $\mu_{\min}$ to $\mu_{\max}$ in units of $\hbar$, let us state that $2\mu_{\max} = q\hbar$, where $q$ is an integer. Then, $\mu_{\max} = -\mu_{\min} = \ell\hbar$, where $\ell$ is an integer or half-integer. Hence, we see that

$$
\lambda^2 = \ell(\ell + 1)\hbar^2.
$$

Hence, relabelling the states as $|\ell, m\rangle$ with $\mu = m\hbar$, we can now see that

$$
L^2 |\ell, m\rangle = \ell(\ell + 1)\hbar^2\, |\ell, m\rangle, \tag{2.5.5}
$$

$$
L_z |\ell, m\rangle = m\hbar\, |\ell, m\rangle, \tag{2.5.6}
$$

$$
L_\pm |\ell, m\rangle = \sqrt{\ell(\ell + 1) - m(m \pm 1)}\;\hbar\, |\ell, m \pm 1\rangle. \tag{2.5.7}
$$
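To see (2.5.5) to (2.5.7) working together, one can build the matrices of $L_z$ and $L_\pm$ in the $|\ell, m\rangle$ basis and check them against the operator identities derived above. This is a minimal sketch in units with $\hbar = 1$; the basis ordering $m = \ell, \ell - 1, \dots, -\ell$ and all function names are assumptions made for illustration.

```python
from fractions import Fraction
import math

HBAR = 1.0  # units with hbar = 1 (an assumption made for this check)

def matmul(p, q):
    """Product of two square matrices stored as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def ladder_matrices(ell):
    """Matrices of L_z, L_+ and L_- in the basis |ell, m>, m = ell, ell-1, ..., -ell."""
    ell = Fraction(ell)
    ms = [ell - k for k in range(int(2 * ell) + 1)]
    dim = len(ms)
    lz = [[HBAR * float(ms[i]) if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    lp = [[0.0] * dim for _ in range(dim)]
    for col, m in enumerate(ms):
        if m < ell:
            # (2.5.7): L_+ |ell, m> = sqrt(ell(ell+1) - m(m+1)) hbar |ell, m+1>,
            # and |ell, m+1> sits one index earlier in this ordering.
            lp[col - 1][col] = HBAR * math.sqrt(float(ell * (ell + 1) - m * (m + 1)))
    lm = [[lp[j][i] for j in range(dim)] for i in range(dim)]  # adjoint of a real matrix
    return lz, lp, lm

def max_deviation(ell):
    """Largest entry of L^2 - ell(ell+1) hbar^2 and of [L_+, L_-] - 2 hbar L_z."""
    lz, lp, lm = ladder_matrices(ell)
    dim = len(lz)
    plus_minus, minus_plus, lz_sq = matmul(lp, lm), matmul(lm, lp), matmul(lz, lz)
    target = float(Fraction(ell) * (Fraction(ell) + 1)) * HBAR ** 2
    worst = 0.0
    for i in range(dim):
        for j in range(dim):
            # L^2 = L_+ L_- + L_z^2 - hbar L_z, as derived in the text.
            l_sq = plus_minus[i][j] + lz_sq[i][j] - HBAR * lz[i][j]
            worst = max(worst, abs(l_sq - (target if i == j else 0.0)))
            comm = plus_minus[i][j] - minus_plus[i][j]
            worst = max(worst, abs(comm - 2 * HBAR * lz[i][j]))
    return worst
```

The check covers both integer and half-integer $\ell$, since the algebra does not distinguish between them.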

2.5.1.2 Spherical Harmonics

Consider projecting (2.5.6) onto the orthogonal coordinate system $(\theta, \phi)$. That is,

$$
\langle\theta, \phi| L_z |\ell, m\rangle = m\hbar\, \langle\theta, \phi|\ell, m\rangle,
$$

inserting a unity,

$$
\sum_{\theta', \phi'} \langle\theta, \phi| L_z |\theta', \phi'\rangle \langle\theta', \phi'|\ell, m\rangle = m\hbar\, \langle\theta, \phi|\ell, m\rangle. \tag{2.5.8}
$$

Now, we define

$$
Y_{\ell m}(\theta, \phi) \equiv \langle\theta, \phi|\ell, m\rangle,
$$

so that (2.5.8) reads

$$
\sum_{\theta', \phi'} \langle\theta, \phi| L_z |\theta', \phi'\rangle\, Y_{\ell m}(\theta', \phi') = m\hbar\, Y_{\ell m}(\theta, \phi).
$$

Now, by correspondence with previous sections, we write the matrix representation of $L_z$ as

$$
\left(L_z\right)_{\theta\theta'\phi\phi'} = \langle\theta, \phi| L_z |\theta', \phi'\rangle,
$$

which is just

$$
\left(L_z\right)_{\theta\theta'\phi\phi'} = -i\hbar\, \delta_{\theta\theta'}\delta_{\phi\phi'}\, \frac{\partial}{\partial\phi}.
$$

Hence,

$$
-i\hbar \frac{\partial}{\partial\phi} Y_{\ell m}(\theta, \phi) = m\hbar\, Y_{\ell m}(\theta, \phi).
$$

Integrating reveals that

$$
Y_{\ell m}(\theta, \phi) = P_\ell(\theta)\, e^{im\phi}.
$$

It is also known that

$$
\left(L^2\right)_{\theta\theta'\phi\phi'} = -\frac{\hbar^2}{\sin^2\theta}\left[\left(\sin\theta\, \frac{\partial}{\partial\theta}\right)^2 + \frac{\partial^2}{\partial\phi^2}\right] \delta_{\theta\theta'}\delta_{\phi\phi'}.
$$

So, we say that $m$ is the magnetic (or, projection) quantum number of angular momentum, of the $z$-component, corresponding to an orbital angular momentum $\ell$. There are $2\ell + 1$ values of $m$ for a given $\ell$. The magnetic number $m$ has integer values.

Without spin, particles are bosonic.

Up to now, we have only considered orbital angular momentum, which may be thought of as analogous to the motion of the earth about the sun: an orbit. The generic algebra presented here works for spin angular momentum as well.

2.5.2 Internal Degrees of Freedom

Let us now envisage a particle that is not only determined by its spatial coordinates, but also some internal “coordinates”. That is, instead of the wavefunction $\psi$ being a scalar, with argument $\psi(x, y, z)$, let it be a vector,

$$
\psi = \begin{pmatrix} \psi_1 \\ \psi_2 \\ \vdots \end{pmatrix}
= \begin{pmatrix} |\psi_1\rangle \\ |\psi_2\rangle \\ \vdots \end{pmatrix}.
$$

Now, let us have some operator $U$ that does not act upon the $(x, y, z)$-bit of the wavefunction, but does act upon these extra degrees of freedom. That is, $U$ will change the components of $\psi$,

$$
\psi' = U\psi.
$$

Thus, we introduce the spin operator, which has the same algebra (and commutators) as orbital angular momentum.

So, the operators are $S^2$, $S_x$, $S_y$, $S_z$, $S_\pm$; with the set of eigenvalues/states

$$
S^2 |s, m\rangle = s(s + 1)\hbar^2\, |s, m\rangle, \tag{2.5.9}
$$

$$
S_z |s, m\rangle = \hbar m\, |s, m\rangle, \tag{2.5.10}
$$

$$
S_\pm |s, m\rangle = \sqrt{s(s + 1) - m(m \pm 1)}\;\hbar\, |s, m \pm 1\rangle. \tag{2.5.11}
$$

Thus, $s$ is the spin angular momentum quantum number (or, just spin), with projection $m$.

One can (inaccurately, but usefully) think of spin as a particle spinning on its axis, much like the rotation of the earth about the poles. This interpretation doesn’t quite work, as the particles we tend to consider don’t always have a “size” about which to rotate.

2.5.3 Total Angular Momentum

Now, we have seen that orbital and spin angular momentum operators act upon different parts of the wavefunction,

$$
\psi(\nu) = U_s\, \psi\big(U_o(\mathbf{r})\big),
$$

so, we may as well combine these angular momentum operators into a single operator, such that

$$
\psi(\nu) = U_t\, \psi(\mathbf{r}).
$$

That is, we have some new “total operator” $U_t$ that incorporates both spin and orbital angular momentum. We call this new operator the total angular momentum operator, and denote it $J$, where

$$
J = L + S. \tag{2.5.12}
$$

The total angular momentum operator uses the same algebra and commutators as both spin and orbital angular momentum:

$$
[J_a, J_b] = i\hbar\,\varepsilon_{abc} J_c, \qquad a, b, c = x, y, z; \tag{2.5.13}
$$

$$
J^2 |j, m\rangle = j(j + 1)\hbar^2\, |j, m\rangle, \tag{2.5.14}
$$

$$
J_z |j, m\rangle = \hbar m\, |j, m\rangle, \tag{2.5.15}
$$

$$
J_\pm |j, m\rangle = \sqrt{j(j + 1) - m(m \pm 1)}\;\hbar\, |j, m \pm 1\rangle. \tag{2.5.16}
$$

2.5.3.1 The Scalar State

Suppose we took $j = 0$, $m = 0$. Then, the state would be $|0, 0\rangle$. This state has

$$
J^2 |0, 0\rangle = 0, \qquad J_z |0, 0\rangle = 0, \qquad J_\pm |0, 0\rangle = 0;
$$

where we easily use the above relations to see this. Clearly, such a state doesn’t do much. So, we say that $|0, 0\rangle$ is scalar.

2.5.3.2 Spinors

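We shall introduce what a “spinor” is, by using an example.

Let us suppose that we take $j = \tfrac{1}{2}$. Then, the possible values of $m$ are

$$
m = -j, -j + 1, \dots, j - 1, j \quad\Rightarrow\quad m = -\tfrac{1}{2}, \tfrac{1}{2}.
$$

That is, we have two values of $m$ corresponding to this value of $j$. Then, the two states $|j, m\rangle$ are

$$
\left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle, \qquad \left|\tfrac{1}{2}, -\tfrac{1}{2}\right\rangle.
$$

So, we can also write down that

$$
\begin{aligned}
J^2 \left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle &= \tfrac{1}{2}\cdot\tfrac{3}{2}\,\hbar^2 \left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle = \tfrac{3}{4}\hbar^2 \left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle, \\
J_z \left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle &= \tfrac{1}{2}\hbar \left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle, \\
J_+ \left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle &= \sqrt{\tfrac{1}{2}\cdot\tfrac{3}{2} - \tfrac{1}{2}\cdot\tfrac{3}{2}}\;\hbar \left|\tfrac{1}{2}, \tfrac{1}{2} + 1\right\rangle = 0, \\
J_- \left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle &= \sqrt{\tfrac{1}{2}\cdot\tfrac{3}{2} + \tfrac{1}{2}\cdot\tfrac{1}{2}}\;\hbar \left|\tfrac{1}{2}, -\tfrac{1}{2}\right\rangle = \hbar \left|\tfrac{1}{2}, -\tfrac{1}{2}\right\rangle,
\end{aligned}
$$

with all other relations being trivial (and tedious) to write.

So, let us introduce some notation,

$$
\left|\tfrac{1}{2}, \tfrac{1}{2}\right\rangle \equiv |{\uparrow}\rangle, \qquad \left|\tfrac{1}{2}, -\tfrac{1}{2}\right\rangle \equiv |{\downarrow}\rangle. \tag{2.5.17}
$$

Hence, we see that

$$
J_+ |{\uparrow}\rangle = 0, \qquad J_- |{\uparrow}\rangle = \hbar |{\downarrow}\rangle, \qquad
J_+ |{\downarrow}\rangle = \hbar |{\uparrow}\rangle, \qquad J_- |{\downarrow}\rangle = 0.
$$

Now, we further notate these “up states” and “down states” as

$$
|{\uparrow}\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |{\downarrow}\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{2.5.18}
$$

So, we can construct any state from a linear combination of these,

$$
\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \end{pmatrix},
$$

again, we give an alternative way of notating this,

$$
|\chi\rangle = \alpha |{\uparrow}\rangle + \beta |{\downarrow}\rangle.
$$

Now, the components $\alpha$, $\beta$ may be complex. In this case, the state $|\chi\rangle$ is called a spinor. A spinor is just the name given to state vectors in a complex vector space. The states are normalised,

$$
\langle\chi|\chi\rangle = |\alpha|^2 + |\beta|^2 = 1.
$$

Now, the operators $J_i$ are generated using the Pauli spin matrices,

$$
J_i = \frac{\hbar\sigma_i}{2}, \tag{2.5.19}
$$

where the $\sigma_i$ are given by

$$
\sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\sigma_y \equiv \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
\sigma_z \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{2.5.20}
$$

The spin matrices can easily be shown to conform to the commutation relations

$$
[\sigma_a, \sigma_b] = 2i\,\varepsilon_{abc}\,\sigma_c, \tag{2.5.21}
$$

and equally easily to the anti-commutation relations

$$
\{\sigma_a, \sigma_b\} = 2\delta_{ab}. \tag{2.5.22}
$$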
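The relations (2.5.21) and (2.5.22), together with the raising action on the spin states, are easy to confirm by direct matrix multiplication. The following is a small sketch in units with $\hbar = 1$; the helper names are hypothetical and the axes are indexed 0, 1, 2 for $x$, $y$, $z$.

```python
HBAR = 1.0  # units with hbar = 1 (an assumption made for this check)

SIGMA = [
    [[0, 1], [1, 0]],      # sigma_x
    [[0, -1j], [1j, 0]],   # sigma_y
    [[1, 0], [0, -1]],     # sigma_z
]

def levi_civita(a, b, c):
    """Totally antisymmetric symbol for indices 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) / 2

def matmul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def commutator_residual(a, b):
    """Largest entry of [sigma_a, sigma_b] - 2i eps_{abc} sigma_c."""
    ab, ba = matmul(SIGMA[a], SIGMA[b]), matmul(SIGMA[b], SIGMA[a])
    return max(
        abs(ab[i][j] - ba[i][j]
            - sum(2j * levi_civita(a, b, c) * SIGMA[c][i][j] for c in range(3)))
        for i in range(2) for j in range(2))

def anticommutator_residual(a, b):
    """Largest entry of {sigma_a, sigma_b} - 2 delta_{ab} times the identity."""
    ab, ba = matmul(SIGMA[a], SIGMA[b]), matmul(SIGMA[b], SIGMA[a])
    return max(
        abs(ab[i][j] + ba[i][j] - (2 if (a == b and i == j) else 0))
        for i in range(2) for j in range(2))

def raise_spin(spinor):
    """J_+ = (hbar / 2)(sigma_x + i sigma_y) applied to a two-component spinor."""
    j_plus = [[HBAR / 2 * (SIGMA[0][i][j] + 1j * SIGMA[1][i][j]) for j in range(2)]
              for i in range(2)]
    return [sum(j_plus[i][k] * spinor[k] for k in range(2)) for i in range(2)]
```

Applying `raise_spin` to the down state `[0, 1]` should return $\hbar$ times the up state, and applying it to the up state should return zero, in line with the relations listed above.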

Although we haven’t gone into the details of generators of groups at all, it is worth noting that

$$
e^{i\sigma_a n_a}
$$

is an element of the group of rotations in 3D (acting on spinors), with the $\sigma_a$ as its generators, where $n_a$ is the unit vector.

2.5.4 Multiplication of Angular Momenta

Now, we know from the previous section’s discussion that, for a given value of the total angular momentum quantum number $j$, there are $2j + 1$ values of the projection quantum number $m$. The eigenstates of angular momentum are $|j, m\rangle$. Now, suppose we have two particles, each with their own angular momentum states: what is the resultant angular momentum? That is, how can we compute the product of two angular momentum states $|j_1, m_1\rangle$ and $|j_2, m_2\rangle$? What is the total $J$ and total $M$ of the new system?

Now, the total angular momentum operators are defined to be the sum

$$
J_{\text{tot}} = J_1 + J_2, \qquad J_{z\,\text{tot}} = J_{z1} + J_{z2}.
$$

However, the square is not just the sum of the squares, as

$$
J_{\text{tot}}^2 = J_1^2 + J_2^2 + J_1 J_2 + J_2 J_1.
$$

This complicates things somewhat!

Now, let us use the notation that

$$
|j_1, m_1; j_2, m_2\rangle = |j_1, m_1\rangle |j_2, m_2\rangle.
$$

Hence, as there are $(2j_i + 1)$ values of $m_i$ per system, there are $(2j_1 + 1)(2j_2 + 1)$ states for the total system. So, the action of the projection operator is fairly easy to compute,

$$
\begin{aligned}
J_{z\,\text{tot}} |j_1, m_1; j_2, m_2\rangle &= \left(J_{z1} + J_{z2}\right) |j_1, m_1\rangle |j_2, m_2\rangle \\
&= \hbar(m_1 + m_2)\, |j_1, m_1\rangle |j_2, m_2\rangle \\
&= \hbar(m_1 + m_2)\, |j_1, m_1; j_2, m_2\rangle.
\end{aligned}
$$

The total angular momentum is not so easy to find. Basically, the answer is that a linear combination of states is produced, where the possible values of total angular momentum are

$$
J = |j_1 - j_2|,\; |j_1 - j_2| + 1,\; \dots,\; j_1 + j_2.
$$

The coefficients of the linear combinations are known as the Clebsch–Gordan coefficients,

$$
|j_1, m_1; j_2, m_2\rangle = \sum_{J, M} \langle J, M|j_1, m_1; j_2, m_2\rangle\, |J, M\rangle, \qquad M \equiv m_1 + m_2.
$$

These coefficients are messy, and are generally “looked up” if one needs to know them. The Clebsch–Gordan coefficients are orthogonal.
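The counting of states gives a useful consistency check on the allowed values of $J$: the $(2j_1 + 1)(2j_2 + 1)$ product states must equal the sum of $(2J + 1)$ over the allowed $J$. A short sketch of that bookkeeping follows, using exact fractions so that half-integer values are handled without rounding; the function names are hypothetical.

```python
from fractions import Fraction

def allowed_total_j(j1, j2):
    """Allowed total J: |j1 - j2|, |j1 - j2| + 1, ..., j1 + j2."""
    j1, j2 = Fraction(j1), Fraction(j2)
    low, high = abs(j1 - j2), j1 + j2
    return [low + k for k in range(int(high - low) + 1)]

def count_product_states(j1, j2):
    """Number of product states |j1, m1; j2, m2>."""
    return int((2 * Fraction(j1) + 1) * (2 * Fraction(j2) + 1))

def count_coupled_states(j1, j2):
    """Number of coupled states |J, M>, summing 2J + 1 over the allowed J."""
    return int(sum(2 * big_j + 1 for big_j in allowed_total_j(j1, j2)))
```

For two spin-$\tfrac{1}{2}$ particles this gives $J = 0$ and $J = 1$, i.e. one singlet and three triplet states, matching the four product states.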


2.6 Charged Particle in EM Field

Recall the Lorentz force on a charged particle,

$$
\mathbf{F} = q\mathbf{E} + \frac{q}{c}\,\mathbf{v} \times \mathbf{B},
$$

where we shall be using cgs units. The electric and magnetic fields relate to the potentials via

$$
\mathbf{E} = -\nabla\phi - \frac{1}{c}\frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla \times \mathbf{A}.
$$

Notice that by introducing these potentials, we have reduced the number of parameters from 6 (the 3 components of each of the two fields) to 4 (the 3 of the vector potential, and 1 of the scalar potential). Now, the potentials are not uniquely defined, and are free up to the choice of gauge transformation,

$$
\phi' = \phi + \frac{1}{c}\frac{\partial \chi}{\partial t}, \qquad \mathbf{A}' = \mathbf{A} - \nabla\chi,
$$

where $\chi$ is an arbitrary function. We fix the gauge by choosing the Coulomb gauge,

$$
\nabla \cdot \mathbf{A} = 0,
$$

and, in doing so, the fields do not change.

2.6.1 Pauli Hamiltonian

Let us postulate the Hamiltonian

$$
H = \frac{1}{2m}\left(\mathbf{p} - \frac{q}{c}\mathbf{A}\right)^2 + q\phi - \frac{gq}{2mc}\,\mathbf{S} \cdot \mathbf{B}, \tag{2.6.1}
$$

where $\mathbf{S}$ are the spin matrices, and $g$ the gyromagnetic ratio. This is the non-relativistic Hamiltonian of a particle with charge $q$ in an electromagnetic field, with magnetic field $\mathbf{B}$, and vector and scalar potentials $\mathbf{A}$, $\phi$. We call it the Pauli Hamiltonian.

The part in the Hamiltonian,

$$
\mathbf{p} - \frac{q}{c}\mathbf{A}
$$

is said to be of the minimal coupling form, and is an analogue of the covariant derivative.

The interaction of the charged particle with the electric field is described by the

$$
q\phi
$$

term. The interaction with the magnetic field is described by the

$$
-\frac{gq}{2mc}\,\mathbf{S} \cdot \mathbf{B}
$$

term.

The gauge transformation previously stated gives freedom in changing the phase of the wavefunction,

$$
\psi'(\mathbf{r}, t) = \psi(\mathbf{r}, t)\, e^{i\frac{q}{\hbar c}\chi(\mathbf{r}, t)}.
$$

Now, it is common to write the magnetic interaction term in terms of the magnetic moment,

$$
\frac{gq}{2mc}\,\mathbf{S} \cdot \mathbf{B} = \boldsymbol{\mu} \cdot \mathbf{B}.
$$

2.6.1.1 Spin Dynamics

The Schrödinger equation, as always, is given by

$$
i\hbar\, |\dot\psi\rangle = H |\psi\rangle.
$$

Now, suppose we have a particle in a homogeneous magnetic field (i.e. the magnetic field is constant in space). Then, we can separate the wavefunction into its space and spin parts,

$$
|\psi\rangle = |\psi(\mathbf{r})\rangle |\eta\rangle.
$$

Each “type” of state is normalised,

$$
\langle\psi(\mathbf{r})|\psi(\mathbf{r})\rangle = 1, \qquad \langle\eta|\eta\rangle = 1.
$$

The spin state, in general, has $n$ components $\alpha_i$, whereby

$$
|\eta\rangle = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}, \qquad \sum_{i=1}^{n} |\alpha_i|^2 = 1.
$$

Let us consider a spin-$\tfrac{1}{2}$ electron. We shall use the spin basis discussed in the previous section. The spin operator, for such a system, is just formed from the Pauli matrices,

$$
\mathbf{S} = \frac{\hbar}{2}\boldsymbol{\sigma}.
$$

Hence, such a system has its wavefunction written as

$$
|\psi\rangle = \begin{pmatrix} |\psi_\uparrow\rangle \\ |\psi_\downarrow\rangle \end{pmatrix}.
$$

For an electron, $g = 2$, $q = e$. Using this, and writing the spin operator in terms of the Pauli matrices, we rewrite the Pauli Hamiltonian (2.6.1) as

$$
H = \frac{1}{2m}\left(\mathbf{p} - \frac{e}{c}\mathbf{A}\right)^2 + e\phi - \frac{e\hbar}{2mc}\,\boldsymbol{\sigma} \cdot \mathbf{B}.
$$

Introducing the Bohr magneton,

$$
\mu_B \equiv \frac{e\hbar}{2mc},
$$

the Pauli Hamiltonian becomes

$$
H = \frac{1}{2m}\left(\mathbf{p} - \frac{e}{c}\mathbf{A}\right)^2 + e\phi - \mu_B\,\boldsymbol{\sigma} \cdot \mathbf{B}.
$$

Let us now use this Hamiltonian to write the Schrödinger equation for a particle in a homogeneous magnetic field. We recall that we separate the wavefunction into its space and spin parts, so that the Schrödinger equation is written as two equations:

$$
i\hbar\, |\dot\psi(\mathbf{r})\rangle = \left[\frac{1}{2m}\left(\mathbf{p} - \frac{e}{c}\mathbf{A}\right)^2 + e\phi\right] |\psi(\mathbf{r})\rangle, \tag{2.6.2}
$$

$$
i\hbar\, |\dot\eta\rangle = -\mu_B\,\boldsymbol{\sigma} \cdot \mathbf{B}\, |\eta\rangle. \tag{2.6.3}
$$

The spin equation describes the spin dynamics of a charged spin-$\tfrac{1}{2}$ particle in a magnetic field.

Spin Precession. Consider a charged spin-$\tfrac{1}{2}$ particle in a magnetic field $\mathbf{B} = (0, 0, B_z)$, where $B_z$ is constant. Then, the spin dynamics equation becomes

$$
i\hbar \frac{d}{dt} |\eta\rangle = -\mu_B B_z \sigma_z |\eta\rangle.
$$

To avoid confusion, let us write this equation down explicitly, in terms of matrices;

$$
i\hbar \begin{pmatrix} \dot\eta_\uparrow \\ \dot\eta_\downarrow \end{pmatrix}
= -\mu_B B_z \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \eta_\uparrow \\ \eta_\downarrow \end{pmatrix}.
$$

Expanding out the equations, we get

$$
i\hbar \frac{d\eta_\uparrow}{dt} = -\mu_B B_z\, \eta_\uparrow, \tag{2.6.4}
$$

$$
i\hbar \frac{d\eta_\downarrow}{dt} = \mu_B B_z\, \eta_\downarrow. \tag{2.6.5}
$$

These equations are easily integrated, and solved to give

$$
\eta_\uparrow = \eta_\uparrow(0)\, e^{i\Omega t}, \qquad
\eta_\downarrow = \eta_\downarrow(0)\, e^{-i\Omega t}, \qquad
\Omega \equiv \frac{\mu_B B_z}{\hbar}.
$$

These equations describe the precession of an initial spin state. That is, the two components acquire opposite phases at angular frequency $\Omega$, so that the spin direction precesses about the field.
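To make the precession visible, one can evaluate the expectation values of the Pauli matrices in the evolving state. This is a minimal sketch with hypothetical function names; it assumes a normalised initial spinor. Because the two components carry phases $e^{\pm i\Omega t}$, their relative phase, and hence $\langle\sigma_x\rangle$ and $\langle\sigma_y\rangle$, evolves at $2\Omega$ under these solutions.

```python
import cmath
import math

def spin_components(eta_up0, eta_down0, omega, t):
    """The solutions eta_up(t) = eta_up(0) e^{i omega t}, eta_down(t) = eta_down(0) e^{-i omega t}."""
    return (eta_up0 * cmath.exp(1j * omega * t),
            eta_down0 * cmath.exp(-1j * omega * t))

def bloch_vector(eta_up, eta_down):
    """Expectation values (<sigma_x>, <sigma_y>, <sigma_z>) for a normalised spinor."""
    overlap = eta_up.conjugate() * eta_down
    return (2 * overlap.real,
            2 * overlap.imag,
            abs(eta_up) ** 2 - abs(eta_down) ** 2)
```

Starting from the equal superposition $(1, 1)/\sqrt{2}$, which points along $x$, the spin vector stays in the $x$-$y$ plane and rotates about the $z$-axis, while $\langle\sigma_z\rangle$ remains constant.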

2.6.2 Phenomenology

Here we shall discuss some interesting effects that quantum theory predicts.

2.6.2.1 The Aharonov–Bohm Effect

Consider the change in phase of a charged particle, upon moving along some path $\gamma$,

$$
\varphi = \frac{1}{\hbar}\int_\gamma \left(\mathbf{p} - \frac{q}{c}\mathbf{A}\right) \cdot d\mathbf{r},
$$

or, just pulling out the magnetic part,

$$
\varphi_{\text{mag}} = -\frac{q}{\hbar c}\int_\gamma \mathbf{A} \cdot d\mathbf{r}.
$$

Now, consider that the particle moves along two paths $\gamma_1$ and $\gamma_2$, consecutively, in the presence of a vector potential,

$$
\Delta\varphi_{\text{mag}} = -\frac{q}{\hbar c}\left(\int_{\gamma_1} \mathbf{A} \cdot d\mathbf{r} - \int_{\gamma_2} \mathbf{A} \cdot d\mathbf{r}\right).
$$

We see that via Stokes’ theorem, this is just the flux of $\mathbf{B}$ through the surface bounded by the contour $\gamma_1 + \gamma_2$;

$$
\Delta\varphi_{\text{mag}} = -\frac{q}{\hbar c}\oint \mathbf{B} \cdot \mathbf{n}\, dS,
$$

where $\mathbf{n}$ is the unit normal of the surface element $dS$. Let us write this flux as

$$
\Phi \equiv \oint \mathbf{B} \cdot \mathbf{n}\, dS,
$$

so that the phase change is

$$
\Delta\varphi_{\text{mag}} = -\frac{q}{\hbar c}\,\Phi. \tag{2.6.6}
$$

Hence, we see that as a particle completes its motion about a closed loop, there is a phase change of the wavefunction, and a magnetic flux associated.

This effect can be experimentally “seen” by putting a solenoid in a system. Electrodynamic theory says that there is no magnetic field outside of a solenoid, but the vector potential does exist outside of the solenoid. That is, for a magnetic field passing through an area $A$, there will be no magnetic field outside of the solenoid, but there will be an angular component of the vector potential,

$$
A_\theta = \frac{\Phi}{2\pi r},
$$

where $\Phi$ is the magnetic flux through the solenoid, and $r$ the distance from the centre of the solenoid.

Now, using the standard Young’s double-slit experimental setup (using electrons), as one moves along the screen, one sees an interference pattern. The interference pattern is due to the slightly different path lengths of the two beams, as they come through the slits. As a function of the difference in path lengths, the interference pattern observed is

$$
|\psi|^2 \sim 2|\psi_0|^2\left[1 + \cos\left(\frac{|\mathbf{p}|(L_1 - L_2)}{\hbar}\right)\right].
$$

That is, one will see a sequence of light and dark spots, as the phase difference $|\mathbf{p}|(L_1 - L_2)/\hbar$ sweeps through integer and half-integer multiples of $2\pi$. That is,

$$
|\mathbf{p}|(L_1 - L_2) = 2\pi m\hbar \quad\Rightarrow\quad \text{constructive},
$$

$$
|\mathbf{p}|(L_1 - L_2) = (2\pi m + \pi)\hbar \quad\Rightarrow\quad \text{destructive}.
$$

We now modify the setup, so that between the slits and the screen is the solenoid, with a magnetic field through it. It is important to note again that there will not be any magnetic field outside of the solenoid, but there will be a component of the magnetic vector potential.

Now, the phase of the two electron beams will be different from the no-solenoid case. That is, there will be an additional phase shift, due to the vector potential generated by the solenoid. This extra phase shift has the observable effect of shifting the interference pattern along. The phase difference, for constructive interference, without the solenoid, was

$$
\Delta\varphi_0 = 2\pi m,
$$

and with solenoid is now

$$
\Delta\varphi = \Delta\varphi_0 - \frac{q}{\hbar c}\,\Phi.
$$

We can basically now state that the Aharonov–Bohm effect is an interference effect which appears when charged particles pass through a region of space with zero magnetic field, but non-zero vector potential. What this means is that the vector potential is more fundamental than the field itself.

Fig. 2.7. The shift in interference pattern observed, with solenoid: the Aharonov–Bohm effect (intensity $|\psi|^2$ against screen position $x$). Undashed is no-solenoid, dashed has the solenoid.

2.6.2.2 Flux Quantisation

Suppose that charged particles are forced to move around a circle. Then, if there is a magnetic field present, there will be a phase shift after every revolution (as we have just seen). If the phase difference is $2\pi n$, the particles will not “notice” the shift, as everything will be as it was when they first went round. However, if the phase shift is not $2\pi n$, then after many revolutions the particles’ wavefunctions will interfere and eventually cancel out.

We saw that the phase shift, due to magnetic flux, has magnitude

$$
\Delta\varphi_{\text{mag}} = \frac{q\Phi}{\hbar c}.
$$

So, this must be an integer number of $2\pi$’s, for the wavefunctions not to cancel. That is, we have

$$
\Delta\varphi_{\text{mag}} = \frac{q\Phi}{\hbar c} =
\begin{cases}
2\pi n & \Rightarrow\ \text{constructive}, \\
2\pi n + \epsilon & \Rightarrow\ \text{destructive}.
\end{cases}
$$

Hence, we see that the properties of the system will change periodically, due to the presence of the magnetic flux. Hence, for this not to happen, we require the quantisation of the flux of the magnetic field, such that

$$
\Phi = \frac{2\pi n\hbar c}{q}.
$$
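The periodicity in the flux can be illustrated with the two-beam intensity formula from the Aharonov–Bohm discussion. The sketch below assumes the magnetic phase (2.6.6) simply adds to the path-length phase inside the cosine, and uses units with $\hbar = c = q = 1$ by default; the function names are hypothetical.

```python
import math

def flux_quantum(hbar, c, q):
    """Phi_0 = 2 pi hbar c / q, the flux that shifts the phase by exactly 2 pi."""
    return 2 * math.pi * hbar * c / q

def magnetic_phase(flux, hbar, c, q):
    """Delta phi_mag = -q Phi / (hbar c), equation (2.6.6)."""
    return -q * flux / (hbar * c)

def intensity(path_phase, flux, hbar=1.0, c=1.0, q=1.0, psi0_sq=1.0):
    """2 |psi_0|^2 [1 + cos(path_phase + Delta phi_mag)].

    path_phase stands for |p| (L1 - L2) / hbar; adding the magnetic phase inside
    the cosine is the assumption made in this sketch.
    """
    total_phase = path_phase + magnetic_phase(flux, hbar, c, q)
    return 2 * psi0_sq * (1 + math.cos(total_phase))
```

A whole number of flux quanta leaves the pattern unchanged, whereas half a flux quantum turns the central maximum into a minimum.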

2.6.2.3 Fractional Statistics

Let us consider a 2D slab of electrons. To construct such a slab, one could consider sandwiching a very thin slice of conductor between two insulators. Then, motion of the electrons would effectively be confined to the $x$-$y$ plane.

Now, put a solenoid through the 2D slab, so that the electrons will orbit the solenoid. The Hamiltonian for such a system can be written, from the Pauli Hamiltonian,

$$
H = \frac{1}{2m}\left[\left(p_r - \frac{e}{c}A_r\right)^2 + \left(p_\theta - \frac{e}{c}A_\theta\right)^2\right] + U(r).
$$

We know that, from our previous discussions,

$$
A_\theta = \frac{\Phi}{2\pi r}.
$$

Basically, the solution to the Schrödinger equation takes on the form

$$
\psi_m(r, \theta) = R_m(r)\, \exp\left[i\left(m + \frac{e\Phi}{\hbar c}\right)\theta\right],
$$

where we must have the quantisation rule

$$
m + \frac{e\Phi}{\hbar c} = n,
$$

where $n$ is an integer.

Now, suppose we put two solenoids through the slab. There will be (say) an electron orbiting each solenoid. Now, suppose we exchange the electrons, by rotating the system (let us leave the solenoids where they are). The electrons will pick up a phase shift, because they will be cutting the flux lines,

$$
\Delta\varphi \sim e^{i\alpha}, \qquad \alpha \equiv \frac{e\Phi}{\hbar c}.
$$

Basically, if the electrons pick up a phase factor of $-1$, they are fermions. However, this need not be the case. That is, upon the exchange of particles, we can make the electrons pick up an arbitrary phase shift.

This is called fractional statistics, whereby particles can have any phase shift.

It is also interesting to note that this effect is not possible in a 3D gas of electrons. The reason for the existence of fractional statistics in 2D, and not 3D, is due to the topology of the group of rotations (i.e. pretty complicated!). Consider in 3D, two particles in distinct locations. We can exchange the particles by making them move on trajectories between the locations. We can send one particle along one trajectory, and the other on a trajectory rotated a little bit upwards from the first trajectory. In fact, we can rotate the second trajectory all the way round, until a circle is formed. Now, it is very important to note that every one of these rotations is still within the 3D space (it is fairly obvious, but important). That is, we say that the 3D group of rotations is simply connected. This is not the case in 2D: one cannot rotate “upwards” in 2D, remaining in 2D! That is, the 2D group of rotations is not simply connected.

2.6.2.4 Landau Orbitals

Suppose we write the Pauli Hamiltonian as

$$
H = \frac{1}{2m}\left(p_x + \frac{e}{2c}By\right)^2 + \frac{1}{2m}\left(p_y - \frac{e}{2c}Bx\right)^2,
$$

where there is obviously a vector potential

$$
\mathbf{A} = \left(-\tfrac{1}{2}By,\; \tfrac{1}{2}Bx,\; 0\right).
$$

So, suppose we want to find the energy levels of such a system; we would then want to solve

$$
H |\psi\rangle = E |\psi\rangle.
$$

Now, the Hamiltonian is of the form that suggests we write something like

$$
a^2 + b^2 = (a + ib)(a - ib),
$$

where we shall write

$$
a = \frac{1}{2\sqrt{eB\hbar/2c}}\left(p_x + \frac{e}{2c}By + i\left(p_y - \frac{e}{2c}Bx\right)\right),
$$

$$
a^\dagger = \frac{1}{2\sqrt{eB\hbar/2c}}\left(p_x + \frac{e}{2c}By - i\left(p_y - \frac{e}{2c}Bx\right)\right).
$$

Hence, we have something very similar to our harmonic oscillator problem of a few sections ago. That is, we have commutation relation and Hamiltonian

$$
[a, a^\dagger] = 1, \qquad H = \hbar\omega_B\left(a^\dagger a + \tfrac{1}{2}\right).
$$

Thus, an orbital that oscillates.
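The intermediate steps can be filled in as follows. This is a sketch under two assumptions: that the normalisation of $a$ is $1/\big(2\sqrt{eB\hbar/2c}\big)$, and that the shorthand $\Pi_x$, $\Pi_y$ (introduced here, not in the notes) denotes the two bracketed combinations in the Hamiltonian. It suggests the identification $\omega_B = eB/mc$.

```latex
\begin{aligned}
\Pi_x &\equiv p_x + \frac{e}{2c}By, \qquad \Pi_y \equiv p_y - \frac{e}{2c}Bx, \\
[\Pi_x, \Pi_y] &= -\frac{eB}{2c}[p_x, x] + \frac{eB}{2c}[y, p_y] = \frac{i\hbar eB}{c}, \\
[a, a^\dagger] &= \frac{c}{2eB\hbar}\,[\Pi_x + i\Pi_y,\; \Pi_x - i\Pi_y]
   = \frac{c}{2eB\hbar}\,(-2i)\,[\Pi_x, \Pi_y] = 1, \\
a^\dagger a &= \frac{c}{2eB\hbar}\left(\Pi_x^2 + \Pi_y^2 + i[\Pi_x, \Pi_y]\right)
   = \frac{c}{2eB\hbar}\left(\Pi_x^2 + \Pi_y^2\right) - \frac{1}{2}, \\
H &= \frac{1}{2m}\left(\Pi_x^2 + \Pi_y^2\right)
   = \frac{\hbar eB}{mc}\left(a^\dagger a + \tfrac{1}{2}\right).
\end{aligned}
```

The energy levels are then those of a harmonic oscillator, $E_n = \hbar\omega_B\left(n + \tfrac{1}{2}\right)$ with $n = 0, 1, 2, \dots$, which follows from the oscillator algebra $[a, a^\dagger] = 1$.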

2.6.3 Quantum Theory of Radiation

Consider the Pauli Hamiltonian, with interaction term,

H =1

2m

(p− e

cA)2

+ eφ− µBσ ·B + HEM ,

where A and E are classical fields. We call H the semi-classical Hamiltonian.If we open out the bracket on this Hamiltonian, we see that we can write itas

H =p2

2m+ eφ− e

mcA · p− µBσ ·B +

e2

2mc2A2 + HEM .

Page 204: Advanced Theoretical Physics Semester 1 Jonathan · PDF fileAdvanced Theoretical Physics Semester 1 Jonathan Pearson School of Physics & Astronomy, University of Manchester August

196 Advanced Quantum Mechanics

The first thing to note, is that we have allowed p and A to commute. Thisis because of the Coulomb gauge ∇ ·A = 0. We can write this Hamiltonianas a sum of “smaller” ones,

H = Ha + HI + HEM ,where the atomic Hamiltonian is

Ha ≡ p2

2m+ eφ,

and the interaction Hamiltonian

HI ≡ − e

mcA · p− µBσ ·B +

e2

2mc2A2.

The electromagnetic Hamiltonian HEM is that of the electromagnetic field.If the “value of” the atomic Hamiltonian is much greater than that of the

interaction Hamiltonian, then we can write its Schrodinger equation,

Ha |Na〉 = En |Na〉 , |Na〉 = |n, j,m〉 ;where we have used the atomic states |Na〉. The quantum number n is theprinciple (or, radial) quantum number, j the angular momentum and m theprojection (or magnetic) quantum number.

Transitions are described by the matrix element

\[ V_{N'N} = \langle N'|\hat{H}_I|N\rangle. \]

The Fermi golden rule is

\[ w\,d\Omega = \frac{2\pi}{\hbar}|V_{N'N}|^2\,\rho(E_N - E_{N'})\,\delta(E_N - E_{N'} - \hbar\omega)\,d\Omega, \tag{2.6.7} \]

where $w\,d\Omega$ is the transition probability for a given elemental solid angle; $\rho$ is the density of states.

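As a small aside, consider the term

\[ -\frac{e}{mc}\mathbf{A}\cdot\mathbf{p} = -\frac{e}{mc}\mathbf{A}\cdot\frac{d\mathbf{x}}{dt}m, \]

which we can write as

\[ -\frac{e}{c}\mathbf{A}\cdot\frac{d\mathbf{x}}{dt} = -\frac{e}{c}\frac{d}{dt}(\mathbf{A}\cdot\mathbf{x}) + \frac{e}{c}\mathbf{x}\cdot\frac{d\mathbf{A}}{dt}. \]

Now, if the fields are periodic, the first term on the RHS averages to zero over a period. Now, in vacuum,

\[ \mathbf{E} = -\frac{1}{c}\frac{d\mathbf{A}}{dt}. \]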


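Hence,

\[ -\frac{e}{mc}\mathbf{A}\cdot\mathbf{p} = -e\,\mathbf{x}\cdot\mathbf{E}. \]

Now, a charge multiplied by its displacement is just the electric dipole moment, $\mathbf{d} = e\mathbf{x}$, so that

\[ -\frac{e}{mc}\mathbf{A}\cdot\mathbf{p} = -\mathbf{E}\cdot\mathbf{d}. \]

Hence, we see that part of the interaction Hamiltonian can be rewritten using this, as

\[ -\frac{e}{mc}\mathbf{A}\cdot\mathbf{p} - \mu_B\,\boldsymbol{\sigma}\cdot\mathbf{B} \longmapsto -\mathbf{E}\cdot\mathbf{d} - \boldsymbol{\mu}\cdot\mathbf{B}, \]

which is a semi-classical expression.

Let us get back to a quantum mechanical approach. Recall Maxwell's equations,

\[ \nabla\cdot\mathbf{E} = 4\pi\rho, \qquad \nabla\times\mathbf{E} = -\frac{1}{c}\dot{\mathbf{B}}, \]

\[ \nabla\cdot\mathbf{B} = 0, \qquad \nabla\times\mathbf{B} = \frac{4\pi}{c}\mathbf{J} + \frac{1}{c}\dot{\mathbf{E}}, \]

and the relations of the fields to the potentials,

\[ \mathbf{E} = -\frac{1}{c}\dot{\mathbf{A}} - \nabla\phi, \qquad \mathbf{B} = \nabla\times\mathbf{A}. \]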

Now, taking the divergence of the first expression,

\[ \nabla\cdot\mathbf{E} = -\frac{1}{c}\partial_t\nabla\cdot\mathbf{A} - \nabla^2\phi = 4\pi\rho, \]

where we have used the first Maxwell equation. Now, if we use the Coulomb gauge, $\nabla\cdot\mathbf{A} = 0$, this reduces to

\[ -\nabla^2\phi = 4\pi\rho, \]

which is Poisson's equation, with instantaneous (Coulomb) solution

\[ \phi(\mathbf{x}, t) = \int\frac{\rho(\mathbf{x}', t)}{|\mathbf{x}' - \mathbf{x}|}\,d\mathbf{x}'. \]

Thus, for a given distribution of charge, one can find the scalar potential, thereby reducing the number of free parameters in the potentials by one.

The vector potential may be written as a plane wave,

\[ \mathbf{A} = \mathbf{A}_0 e^{i\mathbf{k}\cdot\mathbf{x}}, \]

and its divergence is zero, by the Coulomb gauge,

\[ \nabla\cdot\mathbf{A} = i\mathbf{k}\cdot\mathbf{A} = 0. \]

Therefore, the direction of the vector potential and the direction of propagation are always perpendicular. That is, there is no component of the vector potential $\mathbf{A}$ parallel to the wavevector $\mathbf{k}$.

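In free space, the wave equation becomes

\[ \frac{1}{c^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = \frac{\partial^2\mathbf{A}}{\partial\mathbf{x}^2}. \]

Now, we shall use the 4-vector notation

\[ x \equiv (t, \mathbf{x}), \qquad k \equiv (\omega, \mathbf{k}), \]

so that their scalar product is

\[ k\cdot x = \omega t - \mathbf{k}\cdot\mathbf{x}. \]

Hence, plane waves are written as

\[ \mathbf{A} = \mathbf{A}_0 e^{ik\cdot x}. \]

So, computing the two halves of the wave equation,

\[ \frac{\partial^2\mathbf{A}}{\partial t^2} = -\omega^2\mathbf{A}, \qquad \frac{\partial^2\mathbf{A}}{\partial\mathbf{x}^2} = -|\mathbf{k}|^2\mathbf{A}. \]

Hence, the wave equation requires that

\[ \frac{\omega^2}{c^2} = |\mathbf{k}|^2, \]

which is the standard dispersion relation

\[ \omega = c|\mathbf{k}|. \]

Now, the most general solution to the wave equation is a sum over modes.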

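In fact, in addition to summing over modes, we must also sum over polarisation states. So, the most general solution is

\[ \mathbf{A}(x) = \frac{1}{\sqrt{V}}\sum_{\mathbf{k}}\sum_{\lambda=1,2}\left(\boldsymbol{\varepsilon}_{\mathbf{k},\lambda}a_{\mathbf{k},\lambda}e^{-ik\cdot x} + \boldsymbol{\varepsilon}^*_{\mathbf{k},\lambda}a^\dagger_{\mathbf{k},\lambda}e^{ik\cdot x}\right). \tag{2.6.8} \]

To compound the point, the coefficient-vector $\boldsymbol{\varepsilon}_{\mathbf{k},\lambda}$ is a polarisation vector (where $\lambda$ labels the two polarisations), and $a_{\mathbf{k},\lambda}$ is some amplitude. Both coefficients may be complex, and are classical in nature. The factor of $V$ is some volume-normalisation. We require that

\[ \mathbf{k}\cdot\boldsymbol{\varepsilon}_{\mathbf{k},\lambda} = 0, \]

due to the transverse nature that arises from use of the Coulomb gauge.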


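The classical value of the energy of the electromagnetic field is given by

\[ H_{EM} = \frac{1}{8\pi}\int(\mathbf{E}^2 + \mathbf{B}^2)\,d\mathbf{x}. \]

Now, we have that in free space,

\[ \mathbf{E}^2 = \frac{\omega^2}{c^2}\mathbf{A}^2. \]

So, we can compute that

\[ \frac{1}{8\pi}\int\mathbf{E}^2\,d\mathbf{x} = \frac{1}{4\pi}\sum_{\mathbf{k},\lambda}\frac{\omega^2}{c^2}a^\dagger_{\mathbf{k},\lambda}a_{\mathbf{k},\lambda} = \frac{1}{8\pi}\int\mathbf{B}^2\,d\mathbf{x}, \]

and hence that

\[ H_{EM} = \frac{1}{2\pi}\sum_{\mathbf{k},\lambda}\frac{\omega^2}{c^2}a^\dagger_{\mathbf{k},\lambda}a_{\mathbf{k},\lambda}. \]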

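Now, if we introduce

\[ Q_{\mathbf{k},\lambda} = \frac{1}{2\sqrt{\pi}\,c}\left(a_{\mathbf{k},\lambda} + a^\dagger_{\mathbf{k},\lambda}\right), \qquad P_{\mathbf{k},\lambda} = -\frac{i\omega}{2\sqrt{\pi}\,c}\left(a_{\mathbf{k},\lambda} - a^\dagger_{\mathbf{k},\lambda}\right), \]

we find that the Hamiltonian can be written as

\[ H_{EM} = \frac{1}{2}\sum_{\mathbf{k},\lambda}\left(P^2_{\mathbf{k},\lambda} + \omega^2Q^2_{\mathbf{k},\lambda}\right). \]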

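This is of the same form as the harmonic oscillator. That is, in expanding the general solution into its modes, we have found that the Hamiltonian of the electromagnetic radiation is the same as a sum over modes of harmonic oscillators. That is, we can think of radiation as being lots of little harmonic oscillators.

So, let us “quantise the theory”, by sending quantities to operators:

\[ P_{\mathbf{k},\lambda} \longmapsto \hat{P}_{\mathbf{k},\lambda}, \qquad Q_{\mathbf{k},\lambda} \longmapsto \hat{Q}_{\mathbf{k},\lambda}. \]

Also, we notice that the above definitions allow us to compute the commutators,

\[ \left[\hat{Q}_{\mathbf{k},\lambda}, \hat{P}_{\mathbf{k}',\lambda'}\right] = i\hbar\,\delta_{\mathbf{k}\mathbf{k}'}\delta_{\lambda\lambda'}, \qquad \left[\hat{P}_{\mathbf{k},\lambda}, \hat{P}_{\mathbf{k}',\lambda'}\right] = 0, \qquad \left[\hat{Q}_{\mathbf{k},\lambda}, \hat{Q}_{\mathbf{k}',\lambda'}\right] = 0. \]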


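We redefine the creation and annihilation operators,

\[ \hat{a}_{\mathbf{k},\lambda} = \frac{1}{\sqrt{2\hbar\omega}}\left(\omega\hat{Q}_{\mathbf{k},\lambda} + i\hat{P}_{\mathbf{k},\lambda}\right), \tag{2.6.9} \]

\[ \hat{a}^\dagger_{\mathbf{k},\lambda} = \frac{1}{\sqrt{2\hbar\omega}}\left(\omega\hat{Q}_{\mathbf{k},\lambda} - i\hat{P}_{\mathbf{k},\lambda}\right), \tag{2.6.10} \]

where we see that the commutator holds,

\[ \left[\hat{a}_{\mathbf{k},\lambda}, \hat{a}^\dagger_{\mathbf{k}',\lambda'}\right] = \delta_{\mathbf{k}\mathbf{k}'}\delta_{\lambda\lambda'}, \tag{2.6.11} \]

with all others being zero. Hence, we see that we can write the Hamiltonian as

\[ \hat{H}_{EM} = \sum_{\mathbf{k},\lambda}\hbar\omega\left(\hat{a}^\dagger_{\mathbf{k},\lambda}\hat{a}_{\mathbf{k},\lambda} + \tfrac{1}{2}\right). \tag{2.6.12} \]

This gives us an example of quantising a field. We take a field, we expand the solution in terms of modes of the field, where each mode is just a harmonic oscillator. We then apply the quantisation scheme to each oscillator, writing commutation relations.

We write the eigenstates as

\[ |N_{EM}\rangle = |n_{\mathbf{k}_1,\lambda_1}, n_{\mathbf{k}_2,\lambda_2}, \ldots\rangle = |n_{\mathbf{k}_1,\lambda_1}\rangle|n_{\mathbf{k}_2,\lambda_2}\rangle\ldots, \]

where $n_{\mathbf{k}_i,\lambda_j}$ is the occupancy number of a particular state. That is, how many photons there are with a given wavenumber and polarisation state.

We write the eigenvalues as

\[ E_{EM} = \sum_{\mathbf{k},\lambda}\hbar\omega_{\mathbf{k},\lambda}\left(n_{\mathbf{k},\lambda} + \tfrac{1}{2}\right). \]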

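Now, if one recalls that

\[ \frac{d}{dt}\hat{Q}(t) = -\frac{i}{\hbar}\left[\hat{Q}, \hat{H}\right], \]

then we see that we can find the Heisenberg operator. That is, use the non-time dependent operator to construct the time dependent operator. Now,

\[ \frac{d}{dt}\hat{a}_{\mathbf{k}',\lambda'}(t) = -\frac{i}{\hbar}\left[\hat{a}_{\mathbf{k}',\lambda'}, \hat{H}\right]. \]

For notational brevity, we shall now leave off the $\lambda$ subscript. So, the commutator is

\[ \left[\hat{a}_{\mathbf{k}'}, \hat{H}\right] = \hat{a}_{\mathbf{k}'}\hat{H} - \hat{H}\hat{a}_{\mathbf{k}'} = \sum_{\mathbf{k}}\hbar\omega\left(\hat{a}_{\mathbf{k}'}\hat{a}^\dagger_{\mathbf{k}}\hat{a}_{\mathbf{k}} - \hat{a}^\dagger_{\mathbf{k}}\hat{a}_{\mathbf{k}}\hat{a}_{\mathbf{k}'}\right). \]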


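Now, by our commutation relations, we can swap the order of the very last two terms, to give

\[ \left[\hat{a}_{\mathbf{k}'}, \hat{H}\right] = \sum_{\mathbf{k}}\hbar\omega\left(\hat{a}_{\mathbf{k}'}\hat{a}^\dagger_{\mathbf{k}}\hat{a}_{\mathbf{k}} - \hat{a}^\dagger_{\mathbf{k}}\hat{a}_{\mathbf{k}'}\hat{a}_{\mathbf{k}}\right), \]

which we then write as

\[ \left[\hat{a}_{\mathbf{k}'}, \hat{H}\right] = \sum_{\mathbf{k}}\hbar\omega\left[\hat{a}_{\mathbf{k}'}, \hat{a}^\dagger_{\mathbf{k}}\right]\hat{a}_{\mathbf{k}}. \]

Hence, using our commutation relation, this is just

\[ \left[\hat{a}_{\mathbf{k}'}, \hat{H}\right] = \sum_{\mathbf{k}}\hbar\omega\,\delta_{\mathbf{k}'\mathbf{k}}\hat{a}_{\mathbf{k}} = \hbar\omega\,\hat{a}_{\mathbf{k}'}. \]

Therefore, we see that

\[ \frac{d}{dt}\hat{a}_{\mathbf{k}}(t) = -\frac{i}{\hbar}\left[\hat{a}_{\mathbf{k}}, \hat{H}\right] = -i\omega\,\hat{a}_{\mathbf{k}}. \]

This is easily integrated, to give

\[ \hat{a}_{\mathbf{k},\lambda}(t) = e^{-i\omega t}\hat{a}_{\mathbf{k},\lambda}(0). \tag{2.6.13} \]

In a similar way, we can show that

\[ \hat{a}^\dagger_{\mathbf{k},\lambda}(t) = e^{i\omega t}\hat{a}^\dagger_{\mathbf{k},\lambda}(0). \tag{2.6.14} \]

Therefore, we have expressions for the time dependent creation and annihilation operators.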
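The result (2.6.13) can be checked numerically for a single mode. The sketch below is only an illustration under stated assumptions: one mode, a truncated number basis of size `dim`, units with $\hbar = 1$, and arbitrary values for `omega` and `t`. It evaluates $e^{i\hat{H}t}\,\hat{a}\,e^{-i\hat{H}t}$ directly, using the fact that $\hat{H}$ is diagonal in the number basis, and compares the result with $e^{-i\omega t}\hat{a}$.

```python
import cmath
import math

def heisenberg_a(dim, omega, t):
    """Matrix of a(t) = exp(iHt) a exp(-iHt) for H = omega * (a†a + 1/2), hbar = 1.

    H is diagonal in the number basis, so the two exponentials reduce to
    phases exp(+-i * omega * (n + 1/2) * t) acting on rows and columns.
    """
    a_t = [[0j] * dim for _ in range(dim)]
    for n in range(1, dim):
        row_phase = cmath.exp(1j * omega * (n - 1 + 0.5) * t)   # from <n-1|
        col_phase = cmath.exp(-1j * omega * (n + 0.5) * t)      # from |n>
        a_t[n - 1][n] = row_phase * math.sqrt(n) * col_phase
    return a_t

omega, t, dim = 2.0, 0.7, 5          # arbitrary illustrative values
a_t = heisenberg_a(dim, omega, t)
expected_phase = cmath.exp(-1j * omega * t)

# Every non-zero element should equal exp(-i*omega*t) * sqrt(n), as in (2.6.13).
max_error = max(abs(a_t[n - 1][n] - expected_phase * math.sqrt(n))
                for n in range(1, dim))
print(max_error)
```

The zero-point term $\tfrac{1}{2}\hbar\omega$ cancels between the two phases, which is why it plays no role in the time dependence.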

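The action of the creation operator:

\[ \hat{a}^\dagger_{\mathbf{k}',\lambda'}|N_{EM}\rangle = \sqrt{n_{\mathbf{k}',\lambda'} + 1}\;|n_{\mathbf{k}_1,\lambda_1}, \ldots, n_{\mathbf{k}',\lambda'} + 1, \ldots\rangle. \tag{2.6.15} \]

The action of the destruction operator:

\[ \hat{a}_{\mathbf{k}',\lambda'}|N_{EM}\rangle = \sqrt{n_{\mathbf{k}',\lambda'}}\;|n_{\mathbf{k}_1,\lambda_1}, \ldots, n_{\mathbf{k}',\lambda'} - 1, \ldots\rangle. \tag{2.6.16} \]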

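We can now write the field operator,

\[ \hat{\mathbf{A}}(\mathbf{x}, t) = c\sqrt{\frac{2\pi\hbar}{\omega V}}\sum_{\mathbf{k},\lambda}\left(e^{-ik\cdot x}\boldsymbol{\varepsilon}_{\mathbf{k},\lambda}\hat{a}_{\mathbf{k},\lambda}(0) + e^{ik\cdot x}\boldsymbol{\varepsilon}^*_{\mathbf{k},\lambda}\hat{a}^\dagger_{\mathbf{k},\lambda}(0)\right), \]

where $k\cdot x = \omega t - \mathbf{k}\cdot\mathbf{x}$ is the 4-vector product, the first term “kills” a wave, and the second term “creates” a wave. Notice that by (2.6.13) and (2.6.14), we can rewrite this operator in terms of the time dependent creation/destruction operators, being careful of expressions like

\[ e^{-ik\cdot x}\hat{a}(0) = e^{-i\omega t + i\mathbf{k}\cdot\mathbf{x}}\hat{a}(0) = e^{i\mathbf{k}\cdot\mathbf{x}}\hat{a}(t). \]

Hence, the field operator:

\[ \hat{\mathbf{A}}(\mathbf{x}, t) = c\sqrt{\frac{2\pi\hbar}{\omega V}}\sum_{\mathbf{k},\lambda}\left(e^{i\mathbf{k}\cdot\mathbf{x}}\boldsymbol{\varepsilon}_{\mathbf{k},\lambda}\hat{a}_{\mathbf{k},\lambda}(t) + e^{-i\mathbf{k}\cdot\mathbf{x}}\boldsymbol{\varepsilon}^*_{\mathbf{k},\lambda}\hat{a}^\dagger_{\mathbf{k},\lambda}(t)\right). \tag{2.6.17} \]

Thus, again, we see that the first term destroys a photon in state $(\mathbf{k}, \lambda)$ and the second term creates a photon in the state $(\mathbf{k}, \lambda)$.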

Issues. Now, if one recalls the Coulomb gauge $\nabla\cdot\mathbf{A} = 0$, one must question the meaning of $\nabla\cdot\hat{\mathbf{A}} = 0$, the equivalent in the quantised field operator formalism. The interesting thing is that there is no real consensus as to its meaning!

Another interesting point is the “plane wave” assumption: we expanded the modes in terms of plane waves. This seems sensible enough, if the space is “empty enough”. That is, there are no walls anywhere near our system. In most circumstances, this is ok, and experiment agrees. However, if one does an experiment based in a small cavity, then plane waves are not a good basis in which to expand. Indeed, experiments based in small cavities find different transition probabilities to those in “open space”.

2.6.3.1 Zero Point Energy

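Recall that the eigenvalues (i.e. energy of a particular state) are written as

\[ E_{EM} = \sum_{\mathbf{k},\lambda}\hbar\omega_{\mathbf{k},\lambda}\left(n_{\mathbf{k},\lambda} + \tfrac{1}{2}\right), \]

where $n_{\mathbf{k},\lambda}$ is the number of photons in a given state. Suppose that there are no photons at all in a system. Then, there is a non-zero energy of

\[ E_{ZP} = \sum_{\mathbf{k},\lambda}\tfrac{1}{2}\hbar\omega_{\mathbf{k},\lambda}. \]

This energy is known as the zero-point energy. We can crudely compute this value. Consider sending the sum to an integral,

\[ \sum_{\mathbf{k},\lambda} \longmapsto 2\int_0^{k_{max}}\frac{k^2}{(2\pi)^3}\,dk, \]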


where we have used a factor of 2 due to the two polarisation states. If we then use the dispersion relation $\omega = ck$, the zero-point energy is just

\[ E_{ZP} = \int_0^{k_{max}}\frac{\hbar c k^3}{(2\pi)^3}\,dk = \frac{\hbar c}{32\pi^3}k_{max}^4. \]

Now, there is no “real limit” to the maximum wavenumber $k_{max}$; hence, an infinite zero-point energy!

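To get a handle on how much energy this is, consider an example where we impose that the maximum wavenumber is that of blue light. Then, $\lambda_{blue} \approx 400\,\mathrm{nm}$, with

\[ k_{max} = \frac{2\pi}{\lambda_{blue}} \quad\Rightarrow\quad E_{ZP} = \frac{\hbar c\pi}{2\lambda_{blue}^4} \approx 1.8\,\mathrm{J\,m^{-3}}. \]

The energy density of a typical lamp is $\approx 2.7\times10^{-8}\,\mathrm{J\,m^{-3}}$.

For fermions, the Hamiltonian is

\[ \hat{H} = \sum_{\mathbf{k},\lambda}\hbar\omega_{\mathbf{k},\lambda}\left(\hat{a}^\dagger_{\mathbf{k},\lambda}\hat{a}_{\mathbf{k},\lambda} - \tfrac{1}{2}\right). \]

So, notice that upon adding this to the Hamiltonian for radiation (as above), the zero-point terms cancel, leaving no zero-point energy. This is the general idea in SUperSymmetrY (SUSY), whereby each boson has an accompanying fermion (and vice-versa), which ends up cancelling out the zero-point energy.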
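The crude cutoff estimate of the zero-point energy density can be reproduced in a few lines. This is a sketch under stated assumptions: SI values for $\hbar$ and $c$, a cutoff wavelength of 400 nm for blue light, and the formula $\hbar c k_{max}^4/32\pi^3$ exactly as derived above (including its simplified measure for the $k$ integral). The function name is hypothetical.

```python
import math

HBAR = 1.054571817e-34   # J s
C = 2.99792458e8         # m / s

def zero_point_density(k_max):
    """Crude zero-point energy density hbar*c*k_max^4 / (32*pi^3), in J m^-3."""
    return HBAR * C * k_max**4 / (32 * math.pi**3)

wavelength = 400e-9                   # assumed cutoff: blue light, in metres
k_max = 2 * math.pi / wavelength
density = zero_point_density(k_max)

# Same number from the closed form pi*hbar*c / (2*lambda^4).
closed_form = math.pi * HBAR * C / (2 * wavelength**4)
print(density, closed_form)
```

Both expressions give a value of order a couple of joules per cubic metre, in line with the rough figure quoted in the text, and the $k_{max}^4$ scaling shows how quickly the estimate grows as the cutoff is raised.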

Casimir Effect. There is an experiment which can test for the validity of the plane-wave expansion, and the zero-point energy. Consider putting two parallel mirrors a distance $L$ apart. Then, along the axes parallel to the mirrors the plane wave expansion will be ok, and radiation will act as a load of harmonic oscillators. Along the axis perpendicular to the mirrors, the waves will have to be discrete, in terms of $k_x = 2\pi n/L$ where $n = 1, 2, 3, \ldots$. So now, upon computing the zero-point energy, we can turn the sum into an integral for the $k_y$ and $k_z$ modes, but not the $k_x$ modes (as they are not continuous). Hence, the zero-point energy is computed via

\[ E_{ZP} = \sum_{k_x}\int\!\!\int c\left(k_x^2 + k_y^2 + k_z^2\right)^{1/2}\frac{dk_y\,dk_z}{(2\pi)^2} \sim \frac{1}{L^3}. \]

Now, the force is just

\[ F = \frac{\partial E_{ZP}}{\partial L} = -\frac{\hbar c\pi^2}{240L^4}. \]

That there is a negative force means that there is an attraction between the two plates. Therefore, we see that if this zero-point energy is to exist, then there should be a force between two plates in “vacuum”. Indeed, experiment confirms this.
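To get a feel for the size of this effect, the expression $-\hbar c\pi^2/240L^4$ can be evaluated for a few plate separations. The sketch below assumes SI constants and reads the expression as a force per unit plate area (a pressure); that reading, the function name and the chosen separations are assumptions made for illustration.

```python
import math

HBAR = 1.054571817e-34   # J s
C = 2.99792458e8         # m / s

def casimir_pressure(separation):
    """Force per unit plate area, -pi^2*hbar*c / (240*L^4), in pascals."""
    return -math.pi**2 * HBAR * C / (240 * separation**4)

for L in (1e-6, 1e-7):   # hypothetical plate separations in metres
    print(L, casimir_pressure(L))
```

The steep $L^{-4}$ dependence means the attraction is only appreciable at sub-micron separations: halving the gap increases the magnitude sixteen-fold.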

2.6.3.2 Coherence & Radiation

The states of the EM field are $|N\rangle$, whereby

\[ \langle N|\hat{\mathbf{E}}|N\rangle = 0. \]

Now, there is the usual uncertainty relation between the occupation number and phase of the EM field,

\[ \Delta N\,\Delta\varphi \ge 1. \]

Now, the eigenstates of the annihilation operator are the coherent states,

\[ \hat{a}|z\rangle = z|z\rangle, \]

and in this case

\[ \Delta N\,\Delta\varphi = 1. \]

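Transition probabilities are given by the matrix elements

\[ -\frac{e}{mc}\langle N'|\hat{\mathbf{A}}\cdot\mathbf{p}|N\rangle, \]

where we have that the field operator is of the form

\[ \hat{\mathbf{A}} \sim \alpha\hat{a} + \beta\hat{a}^\dagger. \]

The creation operator, acting upon a state with $n_{\mathbf{k},\lambda}$ photons, sends that state to one with $n_{\mathbf{k},\lambda} + 1$, and the annihilation operator sends it to a state with $n_{\mathbf{k},\lambda} - 1$. Therefore, we are sending the initial state $|N\rangle$ to some other state,

\[ |N\rangle = |N_a, n_{\mathbf{k}\lambda}\rangle \to |N'_a, n_{\mathbf{k}\lambda} \pm 1\rangle. \]

Hence, by orthonormality of the $|N\rangle$ states, we must have that $\langle N'| = \langle N'_a, n_{\mathbf{k}\lambda} \pm 1|$. Hence,

• If $|N'\rangle = |N'_a, n_{\mathbf{k}\lambda} + 1\rangle$, then there is creation of a photon,
• If $|N'\rangle = |N'_a, n_{\mathbf{k}\lambda} - 1\rangle$, then there is absorption of a photon.

Hence, the matrix element for absorption is

\[ V^{abs}_{N'N} = -\frac{e}{mc}\,c\sqrt{\frac{2\pi\hbar}{\omega V}}\,\langle N'_a, n_{\mathbf{k},\lambda} - 1|\,e^{i\mathbf{k}\cdot\mathbf{x}}\hat{a}_{\mathbf{k},\lambda}(t)\,\mathbf{p}\cdot\boldsymbol{\varepsilon}_{\mathbf{k},\lambda}\,|N_a, n_{\mathbf{k},\lambda}\rangle, \]

where we have used that the field operator collapses down to just the annihilation operator. Hence, we write this as

\[ V^{abs}_{N'N} = -\frac{e}{m}\sqrt{\frac{2\pi\hbar\,n_{\mathbf{k},\lambda}}{\omega V}}\,\langle N'_a|\,e^{i\mathbf{k}\cdot\mathbf{x}}\,\mathbf{p}\cdot\boldsymbol{\varepsilon}_{\mathbf{k},\lambda}\,|N_a\rangle\,e^{-i\omega t}. \]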


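If we use the dipole approximation, which is $e^{i\mathbf{k}\cdot\mathbf{x}} \approx 1$, the matrix element goes to

\[ \langle N'_a|\,\mathbf{p}\cdot\boldsymbol{\varepsilon}_{\mathbf{k},\lambda}\,|N_a\rangle = im\frac{E_{N_a} - E_{N'_a}}{\hbar c}\,\mathbf{d}_{N'N}\cdot\boldsymbol{\varepsilon}_{\mathbf{k},\lambda}. \]

Basically, one can see that

\[ \left|V^{abs}_{N'N}\right|^2 \propto n_{\mathbf{k},\lambda}, \]

and from a similar line of argument,

\[ \left|V^{emis}_{N'N}\right|^2 \propto n_{\mathbf{k},\lambda} + 1. \]

In this emission matrix element, the $n_{\mathbf{k},\lambda}$-part is due to stimulated emission, and the $+1$-part due to spontaneous emission.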


3 Advanced Statistical Physics

3.1 Elementary Probability Theory

Suppose we have a total of $n$ events, and that there are $m$ ways of getting a specific event $A$ (say). Then, we say that the probability of $A$ occurring, $P(A)$, is given by

\[ P(A) = \frac{m}{n}, \]

where $m, n$ may not be “known” (that is, known a priori), and must thus be determined by experiment. Also, this only works for an infinite number of trials. Thus, $P(A)$ is actually the limiting value of the relative frequency $m/n$ of $A$, as the number of trials increases to infinity.

We must make a distinction between the sample points and the sample space. The former is better described as the “possible outcomes”. For example, given two coins, the sample space is

\[ S = \{(H,H),\ (H,T),\ (T,H),\ (T,T)\}, \]

with each event being a subset of $S$.

3.1.1 Representations

We use the following symbols:

• $A \cap B$ to denote the “cross-over” of the sets $A$ and $B$; i.e. only that which is common to both $A$ and $B$.
• $A \cup B$ to denote the union: everything within either $A$ or $B$ (or both).
• $A \supset B$ to denote that $B$ is a subset of $A$.

So, for example, the statement

\[ A \cap B = \emptyset \]

reads “the intersection of $A$ with $B$ is the empty set”. That is, $A$ and $B$ possess no common elements. Similarly,

\[ A \cap B \neq \emptyset \]

denotes that the two sets have common elements. The statement

\[ A \supset B \]

reads “$B$ is a subset of $A$”. That is, all elements of $B$ lie within $A$.

Fig. 3.1. Some standard representations in elementary probability theory (panels: $A \cap B = \emptyset$; $A \cap B \neq \emptyset$; $A \supset B$; $A \cup B$).

We see that

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

is the probability of an event anywhere within the sets $A$ and $B$. We subtract the ‘cross-over’ term as we are counting that region twice. If we have a situation where $A \cap B = \emptyset$, then events $A$ and $B$ are mutually exclusive. In this case, we have that

\[ P(A \cup B) = P(A) + P(B); \]

so that we read $P(A \cup B)$ as “the probability of event $A$ or $B$ occurring”.

We have that, for some total number of events $n$, the number of events which fall under the class $A \cup B$ is $n_{A\cup B}$. Then, by our previous definition of the probability being the limiting case,

\[ P(A \cup B) = \lim_{n\to\infty}\frac{n_{A\cup B}}{n}. \]
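The addition rule for $P(A \cup B)$ can be checked by direct enumeration on the two-coin sample space used earlier. The following is a minimal sketch; the particular events chosen and the helper `prob` are illustrative assumptions, and the coins are taken to be fair so that every sample point is equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coins; each sample point is equally likely.
sample_space = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {s for s in sample_space if s[0] == "H"}   # first coin shows heads
B = {s for s in sample_space if s[1] == "H"}   # second coin shows heads

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)

# Mutually exclusive events: the intersection is empty, so probabilities add.
C = {("H", "H")}
D = {("T", "T")}
print(prob(C | D), prob(C) + prob(D))
```

Here Python's set operators `|` and `&` play the roles of $\cup$ and $\cap$.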


Consider the setup where some set $m$ is within some set $k$. That is, $m \subset k$. Then, the probability of event $m$ occurring, given that event $k$ has occurred, is denoted

\[ P(m|k). \]

We can get an intriguing result from this. Consider that the probability $P(m|k)$ is the probability of $m$ occurring from a space where $k$ has already occurred, in a space of $n$ total possibilities. Then, we see that

\[ P(m|k) = \frac{P(m)}{P(k)}, \tag{3.1.1} \]

where each may be expressed as

\[ \frac{P(m)}{P(k)} = \lim_{n\to\infty}\frac{n_m/n}{n_k/n} = \lim_{n\to\infty}\frac{n_m}{n_k}. \]

Now, consider that the intersection of two sets $A$ and $B$ is denoted $M$; $A \cap B = M$. Since $M$ lies within $B$, (3.1.1) applies, and the probability of $M$ occurring given that $B$ has already occurred is given by

\[ P(M|B) = P(A \cap B|B) = \frac{P(A \cap B)}{P(B)}. \]

Given that $B$ has occurred, $M$ occurs precisely when $A$ does, so that $P(M|B) = P(A|B)$. From this, we have Bayes' theorem:

\[ P(A \cap B) = P(A|B)P(B), \tag{3.1.2} \]

which reads “the probability of both $A$ and $B$ occurring is the product of the probabilities that $B$ occurs, with the probability that $A$ occurs given that $B$ occurred”.

We have that

\[ P(A \cap B) = P(A)P(B) \]

only if $A$ and $B$ are independent. That is, $P(A|B) = P(A)$: knowing that $B$ has occurred does not change the probability of $A$.
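A small enumeration makes the conditional-probability statements concrete. The sketch below uses two fair dice; the events `A`, `B`, `E` and the helper functions are hypothetical choices made for illustration. It computes a conditional probability as the probability of the joint occurrence divided by the probability of the condition, and checks the product rule and the factorisation for an independent pair.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice, all 36 outcomes equally likely.
space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return Fraction(len(event), len(space))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

A = {s for s in space if s[0] + s[1] == 8}   # the two dice sum to 8
B = {s for s in space if s[0] % 2 == 0}      # first die is even
E = {s for s in space if s[1] == 6}          # second die shows a six

# Product rule: P(A and B) = P(A|B) P(B).
print(prob(A & B), conditional(A, B) * prob(B))

# B and E are independent: P(B and E) = P(B) P(E).
print(prob(B & E), prob(B) * prob(E))
```

Using `Fraction` keeps every probability exact, so the equalities can be compared without rounding concerns.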

3.1.2 Stochastic Random Variables

Think of a stochastic random variable as some function $X$ which sends the sample space $S$ to the real line $\mathbb{R}$:

\[ X : S \mapsto \mathbb{R}. \]


One must not confuse this with $x$, which are the possible values that can result.

Suppose that we have $p(x_i)$, which is the probability that one obtains the value $x_i$ after some experiment. Then, we call $p$ the probability density function or “pdf”. The pdf has the following properties:

\[ p(x_i) \ge 0, \quad \forall i; \tag{3.1.3} \]

\[ \sum_i p(x_i) = 1, \tag{3.1.4} \]

where the last expression is the statement of normalisation. We also make the definition of the $n$th moment

\[ \mu_n \equiv \langle X^n\rangle \equiv \sum_i x_i^n\,p(x_i), \tag{3.1.5} \]

so that $\langle X\rangle$ is just the mean (usually denoted just $\mu$), and the combination $\langle X^2\rangle - \langle X\rangle^2$ is the variance $\sigma^2$; so that the standard deviation is $\sigma$.

Again, we must make the distinction between $X$ the random variable, and $x$ the actual value you can find.

Thus far, we have discussed discrete random variables; let us now consider continuous ones.

3.1.2.1 Continuous Random Variables

Now we have that $p(x)$ is some probability density function; where $X$ is defined on some continuous interval $[a, b]$. Thus, the probability of an event within some range $[c, d]$ is

\[ \int_c^d p(x)\,dx, \]

just the area under the plot of the pdf. We fairly obviously have normalisation via

\[ \int_{\forall x} p(x)\,dx = 1, \]

where $\forall x$ denotes the entire range on which $X$ is defined. Similarly, we have the moment

\[ \langle X^n\rangle = \int_{\forall x} x^n p(x)\,dx. \]

It should be fairly obvious that a continuous distribution is just the limiting case of a discrete one. That is, if we let the number of events (possible choices) $n$ tend to infinity, and the spacing between such events $\varepsilon$ tend to zero (whilst keeping $n\varepsilon$ finite); then all sums go over to integrals. Thus, the choice in using summations or integrals is just a notational convenience.

\[ \sum_i p(x_i) \leftrightarrow \int p(x)\,dx. \]

We may also take the expectation value of some function of the random variable; so that

\[ \langle f(X)\rangle = \int_{\forall x} f(x)p(x)\,dx. \]

3.1.2.2 Generating Function

We commonly take

\[ f(x) = e^{ikx}, \]

whose expectation value we call the “characteristic function of $X$”. We denote it

\[ G(k) \equiv \left\langle e^{ikX}\right\rangle = \int_{\forall x} e^{ikx}p(x)\,dx. \tag{3.1.6} \]

This has the property that

\[ |G(k)| \le 1, \]

and that

\[ G(0) = 1; \]

as one will notice that taking $k = 0$ just results in the normalisation condition. Also, taking the integration limits to $\pm\infty$ results in an expression which is the Fourier transform of the pdf,

\[ \int_{-\infty}^{\infty} e^{ikx}p(x)\,dx = \mathcal{F}\,p(x). \]

Now, in (3.1.6), if we make the substitution that

\[ e^y = \sum_{n=0}^{\infty}\frac{y^n}{n!}, \]

then we see that

\[ G(k) = \int_{\forall x}\sum_{n=0}^{\infty}\frac{(ik)^n}{n!}x^n p(x)\,dx; \tag{3.1.7} \]


which, under a trivial reassignment, becomes

\[ G(k) = \sum_{n=0}^{\infty}\frac{(ik)^n}{n!}\mu_n; \tag{3.1.8} \]

where we have used (3.1.5) in identifying the moment-integral

\[ \mu_n = \int x^n p(x)\,dx. \]

We see that, trivially, from the definition of $\mu_n$, $\mu_0 = 1$ (as it is just normalisation). We have also suppressed the notation of $\forall x$ on the limit in the integral: it is implied unless otherwise specified.

One can see that from (3.1.8), if we differentiate the expression $n$ times with respect to $k$ and then set $k = 0$, we will end up with $i^n\mu_n$ (the factors of $n!$ cancel by differentiation). That is,

\[ \langle X^n\rangle = \mu_n = \frac{1}{i^n}\left.\frac{d^nG}{dk^n}\right|_{k=0}. \tag{3.1.9} \]

Hence, if one possesses the function $G(k)$, then one automatically has any moment one desires (via differentiation). Conversely, given the moments, one can construct $G(k)$ by summing the series (3.1.8). That is, we have a generating function for the moments. Thus, we call $G(k)$ the generating function.

Notice that the pdf and generating function relate via

\[ p(x) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}G(k)e^{-ikx}. \tag{3.1.10} \]
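The statement that moments follow from derivatives of $G(k)$ at $k = 0$ can be tried numerically. The sketch below is an illustration only: it assumes a fair six-sided die as the pdf, uses the discrete form of $G(k)$ (a sum rather than an integral), and approximates the derivatives in (3.1.9) by central finite differences with a hypothetical step `h`.

```python
import cmath

# Hypothetical example: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

def G(k):
    """Characteristic function G(k) = <exp(ikX)> for the discrete pdf."""
    return sum(p * cmath.exp(1j * k * x) for x, p in zip(values, probs))

h = 1e-3  # finite-difference step in k

# (3.1.9) with n = 1 and n = 2, using central differences about k = 0.
mu1 = ((G(h) - G(-h)) / (2 * h) / 1j).real
mu2 = (-(G(h) - 2 * G(0) + G(-h)) / h**2).real

# Direct moments from (3.1.5) for comparison.
mu1_direct = sum(p * x for x, p in zip(values, probs))
mu2_direct = sum(p * x**2 for x, p in zip(values, probs))

print(mu1, mu1_direct)
print(mu2, mu2_direct)
```

The two routes agree to within the finite-difference error, which shrinks as `h` is reduced until rounding error takes over.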

3.1.2.3 Cumulants

We define

\[ \ln G(k) = \sum_{n=1}^{\infty}\frac{(ik)^n}{n!}\kappa_n, \tag{3.1.11} \]

where $\kappa_n$ is the $n$th cumulant. Notice that the sum starts at unity (not zero, as before). This is because of the normalisation condition: $G(0) = \mu_0 = 1$, so that $\ln G(0) = 0$ and there is no $n = 0$ term. Also in a similar way to before, we have

\[ \kappa_n = \frac{1}{i^n}\left.\frac{d^n}{dk^n}\ln G(k)\right|_{k=0}. \]

We can then compute the various cumulants:

• Mean: $\kappa_1 = \mu_1$;
• Variance: $\kappa_2 = \mu_2 - \mu_1^2 = \sigma^2$;
• Skewness: $\kappa_3 = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3$;
• Kurtosis: $\kappa_4 = \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 - 6\mu_1^4$.

The Gaussian, or normal, pdf is given by

\[ p(x) = \frac{1}{\sqrt{2\pi\kappa_2}}\,e^{-(x-\kappa_1)^2/2\kappa_2}, \tag{3.1.12} \]

and is the pdf for which $\kappa_n = 0$ for all $n \ge 3$.

The way I compute the $\kappa_i$ is to use the function-of-a-function rule in the differential.
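The cumulant formulae listed above can be checked against the claim that a Gaussian has vanishing cumulants beyond the second. The sketch below assumes the standard raw moments of a Gaussian with given mean and variance; the function names are hypothetical, and integer inputs are used so that the arithmetic is exact.

```python
def cumulants_from_moments(mu1, mu2, mu3, mu4):
    """First four cumulants in terms of the raw moments, as listed above."""
    k1 = mu1
    k2 = mu2 - mu1**2
    k3 = mu3 - 3 * mu2 * mu1 + 2 * mu1**3
    k4 = mu4 - 4 * mu3 * mu1 - 3 * mu2**2 + 12 * mu2 * mu1**2 - 6 * mu1**4
    return k1, k2, k3, k4

def gaussian_moments(mean, var):
    """Raw moments <X^n>, n = 1..4, of a Gaussian with given mean and variance."""
    m1 = mean
    m2 = mean**2 + var
    m3 = mean**3 + 3 * mean * var
    m4 = mean**4 + 6 * mean**2 * var + 3 * var**2
    return m1, m2, m3, m4

# Mean 1 and variance 2: the third and fourth cumulants should vanish.
print(cumulants_from_moments(*gaussian_moments(1, 2)))   # → (1, 2, 0, 0)
```

The first two entries return the mean and variance that were put in, as expected from $\kappa_1 = \mu_1$ and $\kappa_2 = \sigma^2$.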

3.1.3 Multi-variable Probability Distribution Functions

Consider, to begin with, the case of bivariate pdfs. That is, a system with two discrete stochastic random variables;

\[ X(S) = \{x_1, x_2, \ldots\}; \qquad Y(S) = \{y_1, y_2, \ldots\}. \]

We shall denote the probability of event $x_i$ and $y_j$ occurring as $p(x_i, y_j)$. That is

\[ p(x_i, y_j) \equiv P(x_i \cap y_j). \]

Thus, the normalisation condition is

\[ \sum_{i,j}p(x_i, y_j) = 1. \]

We can define the marginal pdf as

\[ p(x_i) = \sum_j p(x_i, y_j), \]

where we just sum over things we “aren't interested in”. The version of Bayes' theorem is just

\[ p(x_i|y_j) = \frac{p(x_i, y_j)}{p(y_j)}; \]

that is, the probability of $x_i$ occurring given that $y_j$ has.

The continuous version of this is almost trivial:

\[ \int_{\forall x}dx\int_{\forall y}dy\,p(x, y) = 1, \qquad p(x) = \int_c^d p(x, y)\,dy, \qquad p(y) = \int_a^b p(x, y)\,dx. \]

Bayes' theorem is just

\[ p(x|y) = \frac{p(x, y)}{p(y)}, \]


where integrating this we can see the curious result

\[ \int_{\forall x}p(x|y)\,dx = \int\frac{p(x, y)}{p(y)}\,dx = \frac{1}{p(y)}\int p(x, y)\,dx = \frac{p(y)}{p(y)} = 1. \]

We can also define some “conditional mean”

\[ \langle X\rangle_{Y=y} \equiv \int_{\forall x}x\,p(x|y)\,dx. \]

Note the following integral

\[ p(x|y) = \int_{\forall x'}\delta(x' - x)\,p(x'|y)\,dx' = \langle\delta(X - x)\rangle_{Y=y}. \]

Hence, we see that the expectation value of the delta-function is a conditional probability function.
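The marginal, conditional and conditional-mean definitions are easy to exercise on a small discrete table. The joint pdf below is an invented example (its values are assumptions chosen only so that they sum to one), and the function names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint pdf p(x, y) on a 2 x 3 grid; the entries sum to one.
joint = {
    (0, 0): F(1, 12), (0, 1): F(2, 12), (0, 2): F(3, 12),
    (1, 0): F(3, 12), (1, 1): F(2, 12), (1, 2): F(1, 12),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

def marginal_y(y):
    """p(y): sum over the variable we are not interested in."""
    return sum(joint[(x, y)] for x in xs)

def conditional_x_given_y(x, y):
    """p(x|y) = p(x, y) / p(y)."""
    return joint[(x, y)] / marginal_y(y)

def conditional_mean_x(y):
    """<X>_{Y=y} = sum over x of x p(x|y)."""
    return sum(x * conditional_x_given_y(x, y) for x in xs)

for y in ys:
    total = sum(conditional_x_given_y(x, y) for x in xs)
    print(y, marginal_y(y), total, conditional_mean_x(y))
```

For every fixed `y` the conditional probabilities sum to one, which is the discrete counterpart of the “curious result” above.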

The generalisation to arbitrary dimensions is fairly trivial. Given $r$ stochastic variables, we have the pdf

\[ p_r(x_1, x_2, \ldots, x_r). \]

Then, the probability that some subset $X_1, X_2, \ldots, X_s$ has certain values between $x_1 \to x_1 + dx_1, \ldots, x_s \to x_s + dx_s$; regardless of the remaining $X_{s+1}, \ldots, X_r$ (where we require $s < r$), is given by

\[ p_s(x_1, x_2, \ldots, x_s) = \int dx_{s+1}\,dx_{s+2}\ldots dx_r\;p_r(x_1, x_2, \ldots, x_r). \tag{3.1.13} \]

This is the marginal probability density function for that subset. It is completely analogous to the bivariate case, where we integrate over the things that we don't care about, leaving us with the things that we do.

Also consider the conditional probability density function. Consider that $X_{s+1}, \ldots, X_r$ have fixed values (i.e. they have values $x_{s+1}, \ldots, x_r$), then the probability that $x_1, \ldots, x_s$ happen is just (by analogy to the bivariate case)

\[ p_{s|r-s}(x_1, x_2, \ldots, x_s|x_{s+1}, \ldots, x_r) = \frac{p_r(x_1, x_2, \ldots, x_r)}{p_{r-s}(x_{s+1}, \ldots, x_r)}. \]


Again, by analogy, consider the expression

\[ \int dx_1\,dx_2\ldots dx_s\;p_{s|r-s}(x_1, x_2, \ldots, x_s|x_{s+1}, \ldots, x_r) = \int dx_1\,dx_2\ldots dx_s\,\frac{p_r(x_1, x_2, \ldots, x_r)}{p_{r-s}(x_{s+1}, \ldots, x_r)} = 1; \]

where we use (3.1.13) to integrate over the numerator (the denominator is a constant, as far as the integral is concerned). This also makes sense as we are integrating over the $x$'s that are still unknown (all $x_i$, $i > s$ are fixed). This then leaves an integral over all unknowns, which should result in unity (normalisation).

3.1.4 Covariance

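We define the covariance of two variables

\[ \mathrm{Cov}(X, Y) = \langle XY\rangle - \langle X\rangle\langle Y\rangle, \tag{3.1.14} \]

and it describes the relation of $X$ with $Y$. We also write the covariance as

\[ \mathrm{Cov}(X, Y) = \langle(X - \langle X\rangle)(Y - \langle Y\rangle)\rangle. \]

Similarly, we define the correlation function

\[ \mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\sigma_Y}. \]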
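As an illustration of these definitions, the covariance and correlation can be computed directly from a joint pdf. The table of probabilities below is invented for the purpose (two binary variables that tend to agree), and `expect` is a hypothetical helper for expectation values.

```python
import math

# Hypothetical joint pdf of two discrete variables, as {(x, y): probability}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(f):
    """Expectation value <f(X, Y)> under the joint pdf."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mean_x * mean_y          # (3.1.14)
cov_alt = expect(lambda x, y: (x - mean_x) * (y - mean_y))  # equivalent form
sigma_x = math.sqrt(expect(lambda x, y: x**2) - mean_x**2)
sigma_y = math.sqrt(expect(lambda x, y: y**2) - mean_y**2)
cor = cov / (sigma_x * sigma_y)
print(cov, cov_alt, cor)
```

The two forms of the covariance agree, and the correlation is the covariance rescaled to lie between $-1$ and $1$.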

3.1.5 The Central Limit Theorem

Suppose we are interested in estimating the mean of some stochastic random variable, $X$, and that we have made $n$ independent observations $x_1, x_2, \ldots, x_n$. We estimate the mean via

\[ \langle X\rangle = \frac{1}{n}\sum_{i=1}^{n}x_i, \]

where this value will change as more values are taken. That is, the mean $\langle X\rangle$ is itself a stochastic variable. Thus, what is the probability density function of the mean?

The central limit theorem says that for a large enough sample $n$, the pdf is the normal (Gaussian) pdf. Let us prove it.


3.1.5.1 Proof

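Let $X_1, X_2, \ldots, X_n$ be independent stochastic variables with the same mean $\mu = \mu_X$ and variance $\sigma^2 = \sigma_X^2$. Now, define some new stochastic quantity

\[ Z \equiv \frac{X_1 + X_2 + \ldots + X_n}{n} - \mu, \]

and let us find its pdf. Now, let

\[ Z_i \equiv \frac{X_i - \mu}{n}; \qquad Z = \sum_{i=1}^{n}Z_i, \tag{3.1.15} \]

and then the characteristic function for each $Z_i$ is just

\[ G_{Z_i}(k) = \int_{-\infty}^{\infty}e^{ikz_i}p(z_i)\,dz_i. \]

Now, the left-hand expression in (3.1.15) leads us to trivially write that

\[ G_{Z_i}(k) = \int_{-\infty}^{\infty}e^{ik(x_i-\mu)/n}p(x_i)\,dx_i. \tag{3.1.16} \]

Now, expanding the exponential term

\[ e^w = \sum_{m=0}^{\infty}\frac{w^m}{m!} \quad\Rightarrow\quad e^{ik(x_i-\mu)/n} = 1 + \frac{ik(x_i - \mu)}{n} - \frac{k^2(x_i - \mu)^2}{2n^2} + O\!\left(\frac{k^3}{n^3}\right). \]

Now, putting this back into (3.1.16), writing term by term

\[ \begin{aligned} G_{Z_i}(k) = {} & \int_{-\infty}^{\infty}p(x_i)\,dx_i + \frac{ik}{n}\int_{-\infty}^{\infty}(x_i - \mu)p(x_i)\,dx_i \\ & - \frac{k^2}{2n^2}\int_{-\infty}^{\infty}(x_i - \mu)^2p(x_i)\,dx_i + O\!\left(\frac{k^3}{n^3}\right). \end{aligned} \tag{3.1.17} \]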

Notice, if we have a very large sample (i.e. large $n$), then the higher-order terms are very small. Upon inspection, the second term vanishes

\[ \int_{-\infty}^{\infty}(x_i - \mu)p(x_i)\,dx_i = \int_{-\infty}^{\infty}x_i\,p(x_i)\,dx_i - \mu\int_{-\infty}^{\infty}p(x_i)\,dx_i = \mu - \mu = 0, \]

due to normalisation, and that the moment $\mu$ is independent of $n$. Given this, the first and second terms of (3.1.17) are unity and zero, respectively. We also identify the integral in the third term as $\langle(x_i - \mu)^2\rangle = \sigma_X^2$. Hence, we have that

\[ G_{Z_i}(k) = 1 - \frac{k^2}{2n^2}\sigma_X^2 + O\!\left(\frac{k^3}{n^3}\right), \]

which may be seen to be the expansion of the exponential

\[ G_{Z_i}(k) = \exp\left[-\frac{k^2}{2n^2}\sigma_X^2 + O\!\left(\frac{k^3}{n^3}\right)\right]. \]

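Now, as the events are independent, the generating function of their sum is the product of the separate functions. That is

\[ G_Z(k) = \prod_{i=1}^{n}G_{Z_i}(k), \]

which is just

\[ G_Z(k) = \exp\left[-\frac{k^2}{2n}\sigma_X^2 + O\!\left(\frac{k^3}{n^2}\right)\right]. \]

Therefore, in the limit $n \to \infty$, we ignore the terms $O(n^{-2})$, giving the characteristic function

\[ G_Z(k) = \exp\left[-\frac{k^2}{2n}\sigma_X^2\right]. \]

And, as previously stated, the pdf is just the inverse Fourier transform of the characteristic function. Thus

\[ p(z) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{-ikz}\exp\left[-\frac{k^2}{2n}\sigma_X^2\right], \]

which results in

\[ p(z) = \sqrt{\frac{n}{2\pi\sigma_X^2}}\exp\left[-\frac{nz^2}{2\sigma_X^2}\right]; \]

nothing less than a Gaussian distribution.

We assumed in the proof that all measurements have the same mean and variance. If one removes the assumption, one is left with merely defining

\[ \mu_i \equiv \langle X_i\rangle, \qquad \sigma_i^2 \equiv \langle(x_i - \mu_i)^2\rangle, \qquad \sigma_X^2 \equiv \frac{1}{n}\sum_{i=1}^{n}\sigma_i^2. \]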
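The conclusion of the proof, that the sample mean is approximately Gaussian with variance $\sigma_X^2/n$, can be tested with a short simulation. This is a sketch under stated assumptions: the underlying variable is taken to be uniform on $[0, 1)$ (so $\mu = 1/2$ and $\sigma_X^2 = 1/12$), and the sample size `n`, the number of repetitions `trials` and the fixed seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)            # fixed seed so the run is repeatable

n = 30                    # observations per sample mean
trials = 5000             # number of sample means collected
sigma2_x = 1.0 / 12.0     # variance of a uniform variable on [0, 1)
mu = 0.5                  # its mean

# Z = (X_1 + ... + X_n)/n - mu, for many independent repetitions.
z_values = [sum(random.random() for _ in range(n)) / n - mu
            for _ in range(trials)]

mean_z = statistics.fmean(z_values)
var_z = statistics.pvariance(z_values)

# Fraction of Z values within one predicted standard deviation of zero;
# for a Gaussian this should be close to 0.68.
sigma_z = (sigma2_x / n) ** 0.5
within = sum(abs(z) < sigma_z for z in z_values) / trials

print(mean_z, var_z, sigma2_x / n, within)
```

The measured variance of $Z$ should sit close to $\sigma_X^2/n$, even though the underlying uniform distribution looks nothing like a Gaussian.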

3.1.5.2 Limitations of the Theorem

The central limit theorem holds if

• the moments are all independent of $n$, and
• the moments of $p(x_i)$ are finite.

For example,

\[ p(x_i) \equiv \frac{a_i}{\pi(a_i^2 + x_i^2)}, \]

has an infinite second moment, and thus the central limit theorem is invalid in this case.

3.1.6 Time-dependent Systems

Most of the time we are interested in systems described by $p(x, t)$: the probability that the stochastic variable $X$ has value $x$ at time $t$. That is, non-equilibrium systems. The formalism of the previous non-time-dependent section is easy to extend. The expectation value is just

\[ \langle X(t)\rangle = \int_{\forall x}dx\,x\,p(x, t), \]

the expectation value of a function of $x$

\[ \langle f(X(t))\rangle = \int_{\forall x}dx\,f(x)\,p(x, t), \]

and the $n$th moment just

\[ \langle X^n(t)\rangle = \int_{\forall x}dx\,x^n p(x, t). \]

Similarly, we can form joint probability distribution functions, such as $p(x_1, t_1; x_2, t_2)$, which reads “the probability that $X$ has value $x_1$ at time $t_1$, and value $x_2$ at time $t_2$”. Also, we then see that

\[ \langle X(t_1)X(t_2)\rangle = \int dx_1\,dx_2\;x_1x_2\,p(x_1, t_1; x_2, t_2). \tag{3.1.18} \]

Now, if the value of $X$ at $t_1$ is independent of the value at $t_2$, then we can factorise the joint probability density

\[ p(x_1, t_1; x_2, t_2) = p(x_1, t_1)\,p(x_2, t_2). \]

Thus, using this independent factorisation, we see that (3.1.18) may be written as

\[ \begin{aligned} \langle X(t_1)X(t_2)\rangle &= \int dx_1\,dx_2\;x_1x_2\,p(x_1, t_1; x_2, t_2) \\ &= \int dx_1\,x_1\,p(x_1, t_1)\int dx_2\,x_2\,p(x_2, t_2) \\ &= \langle X(t_1)\rangle\langle X(t_2)\rangle. \end{aligned} \]


Notice that this gives the result that the covariance (or correlation, depending on who you speak to) of two independent quantities is zero.

This joint probability density fairly simply generalises to more than two values;

\[ p(x_1, t_1; \ldots; x_i, t_i; \ldots; x_n, t_n) \]

is the probability that $X$ has the value $x_i$ at time $t_i$, for each $i$.

To get marginal probability densities, we can still just integrate out the things we aren't interested in, thus

\[ p(x_1, t_1; \ldots; x_m, t_m) = \int dx_{m+1}\ldots dx_n\;p(x_1, t_1; \ldots; x_n, t_n), \qquad m < n. \]

To get conditional probabilities,

\[ p(x_1, t_1; \ldots; x_s, t_s|x_{s+1}, t_{s+1}; \ldots; x_n, t_n) = \frac{p(x_1, t_1; \ldots; x_n, t_n)}{p(x_{s+1}, t_{s+1}; \ldots; x_n, t_n)}. \]

If we just read out what this is saying: the probability of $X$ having the values $x_1, t_1, \ldots, x_s, t_s$, given that it already has the values $x_{s+1}, t_{s+1}, \ldots, x_n, t_n$, is the total probability divided by the probability that those things happened (those that we said have occurred).

We can also have conditional means,

\[ \langle X(t_2)\rangle_{X(t_1)=x_1} = \int dx_2\,x_2\,p(x_2, t_2|x_1, t_1); \]

which reads as the expectation value of $X$, at time $t_2$, given that it had value $x_1$ at time $t_1$. In a similar way to before, we see that

\[ p(x_2, t_2|x_1, t_1) = \langle\delta(X(t_2) - x_2)\rangle_{X(t_1)=x_1}. \]

As some more notation, we shall write cumulants with double brackets. That is,

\[ \langle\langle X^2\rangle\rangle = \langle X^2\rangle - \langle X\rangle^2, \qquad \langle\langle X^3\rangle\rangle = \langle X^3\rangle - 3\langle X^2\rangle\langle X\rangle + 2\langle X\rangle^3. \]

3.1.6.1 Stationary Processes: Definition

We define a stationary process as one whose probability densities depend on time differences alone. Thus, that means that

\[ p(x_n, t_n + \tau; \ldots; x_1, t_1 + \tau) = p(x_n, t_n; \ldots; x_1, t_1), \qquad \forall n, \tau. \]

Thus, for example, taking $\tau = -t_1$, then

\[ p(x_1, t_1 - t_1) = p(x_1, t_1) \quad\Rightarrow\quad p(x_1, t_1) = p(x_1, 0), \]

leaving the resultant probability time independent. As another example, consider

\[ p(x_1, t_1; x_2, t_2) = p(x_1, 0; x_2, t_2 - t_1) = p(x_1, t_1 - t_2; x_2, 0). \]

Therefore notice that as

\[ \langle X(t_1)X(t_2)\rangle = \int dx_1\,dx_2\;x_1x_2\,p(x_1, t_1; x_2, t_2) \]

is symmetric under $t_1 \leftrightarrow t_2$, it therefore only depends upon $|t_1 - t_2|$ when the associated process is stationary.

3.1.6.2 Gaussian Process: Definition

A process is Gaussian if all cumulants beyond the second are zero. Such a process is then fully specified by

$$
\langle X(t_1) \rangle, \qquad \langle X(t_1) X(t_2) \rangle .
$$

3.2 Markov Processes

3.2.1 Introduction

A Markov process is one whose conditional probability density functions are only affected by the state of the system at the most recent given time, and not by the states of the system at times prior to that. That is,

$$
p(x_{k+1}, t_{k+1} | x_k, t_k; \ldots; x_1, t_1)
$$

depends on the state $X(t_k) = x_k$, but not on the states $X(t_{k-1}) = x_{k-1}, \ldots, X(t_1) = x_1$, for all $k$. Thus,

$$
p(x_{k+1}, t_{k+1} | x_k, t_k; \ldots; x_1, t_1) = p(x_{k+1}, t_{k+1} | x_k, t_k). \tag{3.2.1}
$$

Such processes may colloquially be thought of as those which "only remember the current state of the system, and not all previous states".

Let us see how to use this assumption. Consider the joint probability density function

$$
p(x_n, t_n; \ldots; x_1, t_1) = p(x_n, t_n | x_{n-1}, t_{n-1}; \ldots; x_1, t_1)\, p(x_{n-1}, t_{n-1}; \ldots; x_1, t_1), \tag{3.2.2}
$$

where we have used Bayes' theorem to write it in this way. Let us use (3.2.1), the "Markov assumption", to write the first factor on the RHS of (3.2.2) as

$$
p(x_n, t_n | x_{n-1}, t_{n-1}; \ldots; x_1, t_1) = p(x_n, t_n | x_{n-1}, t_{n-1}).
$$

Let us use Bayes' theorem again, on the second factor on the RHS of (3.2.2),

$$
p(x_{n-1}, t_{n-1}; \ldots; x_1, t_1) = p(x_{n-1}, t_{n-1} | x_{n-2}, t_{n-2}; \ldots; x_1, t_1)\, p(x_{n-2}, t_{n-2}; \ldots; x_1, t_1).
$$

Therefore, using these two expressions, we see that we can write (3.2.2) as

$$
p(x_n, t_n; \ldots; x_1, t_1) = p(x_n, t_n | x_{n-1}, t_{n-1})\, p(x_{n-1}, t_{n-1} | x_{n-2}, t_{n-2}; \ldots; x_1, t_1)\, p(x_{n-2}, t_{n-2}; \ldots; x_1, t_1).
$$

Now, we see that we can repeat this whole process of using Bayes' theorem and the Markov assumption on the two rightmost factors of the above. In doing so, we end up with

$$
p(x_n, t_n; \ldots; x_1, t_1) = \left[ \prod_{i=1}^{n-1} p(x_{i+1}, t_{i+1} | x_i, t_i) \right] p(x_1, t_1). \tag{3.2.3}
$$

As another example, consider the conditional probability density function

$$
p(x_{k+l}, t_{k+l}; \ldots; x_{k+1}, t_{k+1} | x_k, t_k; \ldots; x_1, t_1) = \frac{p(x_{k+l}, t_{k+l}; \ldots; x_1, t_1)}{p(x_k, t_k; \ldots; x_1, t_1)},
$$

after using Bayes' theorem. Now, using our derived result (3.2.3), which already has the Markov assumption built in, we can rewrite the numerator and denominator as

$$
\begin{aligned}
p(x_{k+l}, t_{k+l}; \ldots; x_1, t_1) &= \left[ \prod_{i=1}^{k+l-1} p(x_{i+1}, t_{i+1} | x_i, t_i) \right] p(x_1, t_1), \\
p(x_k, t_k; \ldots; x_1, t_1) &= \left[ \prod_{i=1}^{k-1} p(x_{i+1}, t_{i+1} | x_i, t_i) \right] p(x_1, t_1).
\end{aligned}
$$

Therefore, using these, our original expression reads

$$
p(x_{k+l}, t_{k+l}; \ldots; x_{k+1}, t_{k+1} | x_k, t_k; \ldots; x_1, t_1) = \frac{\prod_{i=1}^{k+l-1} p(x_{i+1}, t_{i+1} | x_i, t_i)\, p(x_1, t_1)}{\prod_{j=1}^{k-1} p(x_{j+1}, t_{j+1} | x_j, t_j)\, p(x_1, t_1)},
$$

in which the factor $p(x_1, t_1)$ and the terms $i = 1, \ldots, k-1$ cancel, to leave

$$
p(x_{k+l}, t_{k+l}; \ldots; x_{k+1}, t_{k+1} | x_k, t_k; \ldots; x_1, t_1) = \prod_{i=k}^{k+l-1} p(x_{i+1}, t_{i+1} | x_i, t_i). \tag{3.2.4}
$$


Let us just write down the main results again, together. These results only work when using the Markov assumption for the associated process.

$$
p(x_n, t_n; \ldots; x_1, t_1) = \left[ \prod_{i=1}^{n-1} p(x_{i+1}, t_{i+1} | x_i, t_i) \right] p(x_1, t_1), \tag{3.2.5}
$$

$$
p(x_{k+l}, t_{k+l}; \ldots; x_{k+1}, t_{k+1} | x_k, t_k; \ldots; x_1, t_1) = \prod_{i=k}^{k+l-1} p(x_{i+1}, t_{i+1} | x_i, t_i). \tag{3.2.6}
$$
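As a quick sanity check of (3.2.5) and (3.2.6), the sketch below builds joint probabilities for a discrete chain as a product of one-step transition probabilities, and then confirms that a conditional probability only depends on the most recent state. The two-state matrix `Q`, the initial distribution `P1` and the function names are all illustrative assumptions, not part of the lecture notes.

```python
from fractions import Fraction as F

# Hypothetical two-state chain (states 0 and 1); Q[n][m] = p(n, t+1 | m, t).
Q = [[F(9, 10), F(1, 2)],
     [F(1, 10), F(1, 2)]]
P1 = [F(1, 4), F(3, 4)]  # assumed distribution p(x1, t1)

def joint(path):
    """Joint probability of path = (x1, ..., xn), built as in (3.2.5)."""
    prob = P1[path[0]]
    for prev, nxt in zip(path, path[1:]):
        prob *= Q[nxt][prev]
    return prob

def conditional(future, past):
    """p(future | past) as the ratio of two joint probabilities."""
    return joint(past + future) / joint(past)

past, future = (1, 0), (0, 1)
# (3.2.6): only the last state of `past` enters the product.
markov_product = Q[future[0]][past[-1]] * Q[future[1]][future[0]]
print(conditional(future, past), markov_product)
```

Exact fractions are used so that the two numbers can be compared with `==` rather than to within a floating-point tolerance.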

We can apply these results to rather easy (but useful) examples. Consider that

$$
p(x_2, t_2) = \int dx_1\, p(x_2, t_2; x_1, t_1),
$$

but, using (3.2.5), we can rewrite the integrand as

$$
p(x_2, t_2; x_1, t_1) = \left[ \prod_{i=1}^{1} p(x_{i+1}, t_{i+1} | x_i, t_i) \right] p(x_1, t_1) = p(x_2, t_2 | x_1, t_1)\, p(x_1, t_1),
$$

thus leaving us with

$$
p(x_2, t_2) = \int dx_1\, p(x_2, t_2 | x_1, t_1)\, p(x_1, t_1), \tag{3.2.7}
$$

which is a useful (if obvious) result.

Next, consider how to prove that

$$
p(x_3, t_3 | x_1, t_1) = \int dx_2\, p(x_3, t_3 | x_2, t_2)\, p(x_2, t_2 | x_1, t_1).
$$

Using (3.2.5), we see that we have

$$
p(x_3, t_3; x_2, t_2; x_1, t_1) = p(x_3, t_3 | x_2, t_2)\, p(x_2, t_2 | x_1, t_1)\, p(x_1, t_1),
$$

and integrating both sides with respect to $x_2$,

$$
\begin{aligned}
\int dx_2\, p(x_3, t_3; x_2, t_2; x_1, t_1) &= \int dx_2\, p(x_3, t_3 | x_2, t_2)\, p(x_2, t_2 | x_1, t_1)\, p(x_1, t_1), \\
\Rightarrow \quad p(x_3, t_3; x_1, t_1) &= \left( \int dx_2\, p(x_3, t_3 | x_2, t_2)\, p(x_2, t_2 | x_1, t_1) \right) p(x_1, t_1).
\end{aligned}
$$

Using Bayes' theorem on the LHS,

$$
p(x_3, t_3; x_1, t_1) = p(x_3, t_3 | x_1, t_1)\, p(x_1, t_1),
$$

and therefore we see that

$$
p(x_3, t_3 | x_1, t_1) = \int dx_2\, p(x_3, t_3 | x_2, t_2)\, p(x_2, t_2 | x_1, t_1), \tag{3.2.8}
$$


which is thus proven (we cancelled off the common $p(x_1, t_1)$ factor).

The derived result (3.2.8) is called the Chapman-Kolmogorov (CK) equation. The CK equation tells us that in order to find the probability of the transition $x_1 \to x_3$, we add up all possible ways of getting from $x_1$ at time $t_1$ to $x_3$ at time $t_3$ via any intermediate state $x_2$ at time $t_2$; the integral sweeps over all possible intermediate states.

3.2.2 Markov Chains

Here we consider state variables to be discrete, denoted $n$. The Markov equations (3.2.7) and (3.2.8) are (moving to summations for discrete variables)

$$
p(n_2, t_2) = \sum_{n_1} p(n_2, t_2 | n_1, t_1)\, p(n_1, t_1), \tag{3.2.9}
$$

$$
p(n_3, t_3 | n_1, t_1) = \sum_{n_2} p(n_3, t_3 | n_2, t_2)\, p(n_2, t_2 | n_1, t_1), \tag{3.2.10}
$$

where we have the ordering $t_3 > t_2 > t_1$. The time parameter $t$ is also discrete for Markov chains. We call the final equation, (3.2.10), the Chapman-Kolmogorov equation. Under a trivial redefinition of the symbols, this reads

$$
p(n, t+2 | m, t) = \sum_{n'} p(n, t+2 | n', t+1)\, p(n', t+1 | m, t).
$$

If we "say" what this equation is: the probability that the system is in state $n$ at time $t+2$, given that it was in state $m$ at time $t$, is found by summing over all possible intermediate states $n'$ at time $t+1$. Each term in the sum is the probability that the system moved from $m$ to the intermediate state $n'$, multiplied by the probability that it then moved from $n'$ to the state we are interested in, $n$.

By way of making this lot easier to denote, let us define

$$
Q_{mn}(t) \equiv P(m, t+1 | n, t), \tag{3.2.11}
$$

as the probability that the system transitions from state $n$ at time $t$ to state $m$ at time $t+1$. Thus, $Q_{mn}(t)$ describes a transition probability matrix, generally being a function of time.

If we use this notation in (3.2.9), with $P_n(t)$ denoting the same as $p(n, t)$ (i.e. the probability that the system is in state $n$ at time $t$), then we have

$$
P_n(t+1) = \sum_m Q_{nm}(t)\, P_m(t). \tag{3.2.12}
$$

Therefore, given the transition matrix and the probability distribution over states at time $t$, one can find the probability that the system is in state $n$ at time $t+1$.
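A minimal sketch of (3.2.12) and of the discrete Chapman-Kolmogorov sum follows. It assumes a time-independent transition matrix; the particular two-state `Q` and the function names `step` and `two_step` are made up for illustration.

```python
from fractions import Fraction as F

def step(Q, P):
    """One application of (3.2.12): P_n(t+1) = sum_m Q[n][m] * P[m]."""
    return [sum(Q[n][m] * P[m] for m in range(len(P))) for n in range(len(Q))]

def two_step(Q):
    """Two-step transition probabilities via the Chapman-Kolmogorov sum."""
    size = len(Q)
    return [[sum(Q[n][k] * Q[k][m] for k in range(size)) for m in range(size)]
            for n in range(size)]

# Hypothetical time-independent 2-state transition matrix (columns sum to 1).
Q = [[F(9, 10), F(1, 2)],
     [F(1, 10), F(1, 2)]]
P0 = [F(1), F(0)]

P2_by_steps = step(Q, step(Q, P0))   # apply (3.2.12) twice
P2_by_ck = step(two_step(Q), P0)     # apply the two-step matrix once
print(P2_by_steps, P2_by_ck)
```

Both routes give the same distribution, which is just the statement that the two-step transition matrix is the matrix product of two one-step matrices.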

Let us consider some examples, and how to formulate the transition matrix.

3.2.2.1 Random One-Dimensional Walk

Fig. 3.2. The 1D random walker. Starting at some position $n_0$, there is an associated probability of moving one step to the left ($q$) or right ($p$).

Consider a line which has been divided into discrete sites, each of unit length. Each site has an associated integer $n$, with $n = 0, \pm 1, \pm 2, \ldots$

Now, suppose that a "walker" starts at a given point $n_0$, say, and at each time step moves a single place to either the left or the right. The probability of moving left is $q$ and of moving right is $p$, so that $p + q = 1$. Here, we model the probabilities as being independent of time.

So then, what is the probability of the walker being at site $n$ at time $t$?

The way we have formulated this problem is in a "nearest neighbour" way, although there are many other ways in which one could formulate it. That is, we have stipulated that if we start at $n'$, then we can only move to $n = n' \pm 1$ in a single step.

So, the transition probability is

$$
Q_{nn'} = \begin{cases} p, & n = n' + 1, \\ q, & n = n' - 1, \\ 0, & \text{else.} \end{cases}
$$


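Then, we see that if we write $Q$ as a matrix, this simply reads

$$
Q = \begin{pmatrix}
0 & q & 0 & 0 & \ldots \\
p & 0 & q & 0 & \\
0 & p & 0 & q & \\
0 & 0 & p & 0 & \\
\vdots & & & & \ddots
\end{pmatrix},
$$

where the dimension of $Q$ is set by the number of allowed values of $n$. We could also set up the problem so that the walker could stay put. That is, with probability $r$, the walker ends up at $n = n'$; in this case, we have

$$
Q_{nn'} = \begin{cases} p, & n = n' + 1, \\ q, & n = n' - 1, \\ r, & n = n', \\ 0, & \text{else,} \end{cases}
\qquad \Rightarrow \qquad
Q = \begin{pmatrix}
r & q & 0 & 0 & \ldots \\
p & r & q & 0 & \\
0 & p & r & q & \\
0 & 0 & p & r & \\
\vdots & & & & \ddots
\end{pmatrix}.
$$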

Given this extra probability, we just need to make sure that $p + q + r = 1$.

Just to hammer the point home: $Q_{nn'}$ is the probability that the system transitions from state $n'$ to state $n$ between times $t$ and $t+1$.

In this way, we can predict the state of the system at any time, given an initial state. That is, we can compute $P_n(t)$, the probability that the system is in the state $n$ at time $t$, for all $n$, given the probabilities $P_n(0)$ at time $t = 0$.

Typically, suppose the system is in a pure state $n_0$ at time $t = 0$; then we have the initial condition that

$$
P_n(0) = \delta_{n n_0},
$$

where we notice that

$$
\sum_n P_n(0) = \sum_n \delta_{n n_0} = 1,
$$

confirming normalisation.
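The walk on the unbounded line can be evolved without ever writing down the infinite matrix, by storing only the sites that have non-zero probability. The following is a sketch under that assumption; the values of `p`, `q`, `n0` and `t_max`, and the function name `walk_step`, are illustrative choices.

```python
from fractions import Fraction as F

def walk_step(P, p, q):
    """One time step of the nearest-neighbour walk; P maps site -> probability."""
    new = {}
    for site, prob in P.items():
        new[site + 1] = new.get(site + 1, 0) + p * prob  # step right
        new[site - 1] = new.get(site - 1, 0) + q * prob  # step left
    return new

p, q = F(2, 3), F(1, 3)   # illustrative values with p + q = 1
n0, t_max = 0, 4
P = {n0: F(1)}            # pure initial state, P_n(0) = delta_{n, n0}
for _ in range(t_max):
    P = walk_step(P, p, q)

for site in sorted(P):
    print(site, P[site])
```

After an even number of steps only even displacements from $n_0$ are populated, and the probabilities sum to one at every step.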

3.2.2.2 Urn Models

Suppose we have 2 pots, A and B, and 3 red balls and 2 white balls. The balls are to be distributed so that A always contains 2 balls, and B always 3 balls. Let us write down the states of the system, given these simple rules. We denote a white ball by W, and red by R.

• A contains 2W; B has 3R: (W,W), (R,R,R). Denote this as n = 1.


• A contains 1W, 1R; B has 1W, 2R: (W,R), (W,R,R). Denote this as n = 2.

• A contains 2R; B has 2W, 1R: (R,R), (R,W,W). Denote this as n = 3.

Fig. 3.3. The possible configurations in the urn model (n = 1, 2, 3). All pots on the left are A, and those on the right B. Notice that there are two types of "ball": red and white.

So, what are the transition probabilities? At each time step we pick one ball at random from each of A and B, and place each ball into the other pot.

Suppose we start in the state $n = 1$. There is only one other state we can possibly go to, by our dynamics rule. Thus, the probability of going from state $n = 1$ to $n = 1$ is 0, from $1 \to 2$ is 1, and from $1 \to 3$ is 0. Thus, we easily see that

$$
Q_{11} = 0, \qquad Q_{21} = 1, \qquad Q_{31} = 0.
$$

Suppose we start in state $n = 2$. We can go $2 \to 1$ by picking the W from pot B (with probability 1/3) and the R from pot A (with probability 1/2). Then, the probability is just $1/3 \times 1/2 = 1/6$. To make the transition $2 \to 2$, we require picking up a W from each (with probability 1/2 from A and 1/3 from B) or picking an R from each (1/2 from A, and 2/3 from B). Thus, the total probability of transition is $1/2 \times 1/3 + 1/2 \times 2/3 = 1/2$, so $Q_{22} = 1/2$. To make the transition $2 \to 3$, we require picking the W from A, and an R from B; thus $Q_{32} = 1/2 \times 2/3 = 1/3$. Hence

$$
Q_{12} = 1/6, \qquad Q_{22} = 1/2, \qquad Q_{32} = 1/3.
$$

And finally, by similar logic, we can compute that

$$
Q_{13} = 0, \qquad Q_{23} = 2/3, \qquad Q_{33} = 1/3.
$$

And therefore, the transition matrix is

$$
Q = \begin{pmatrix} 0 & 1/6 & 0 \\ 1 & 1/2 & 2/3 \\ 0 & 1/3 & 1/3 \end{pmatrix}.
$$
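The entries of this matrix can be cross-checked by brute force: for each starting state, enumerate every equally likely pair of balls (one from A, one from B), swap them, and see which state results. The sketch below does this; the string encoding of the pots and the helper names `label` and `transition_column` are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product

# States as (contents of A, contents of B); labels follow the text.
STATES = {1: ("WW", "RRR"), 2: ("WR", "WRR"), 3: ("RR", "WWR")}

def label(a, b):
    """Find the state label whose pots hold the same balls (order ignored)."""
    for n, (sa, sb) in STATES.items():
        if sorted(a) == sorted(sa) and sorted(b) == sorted(sb):
            return n
    raise ValueError("unknown configuration")

def transition_column(start):
    """Return {n: Q[n, start]} by enumerating every equally likely swap."""
    a, b = STATES[start]
    col = {n: F(0) for n in STATES}
    for i, j in product(range(len(a)), range(len(b))):
        new_a = a[:i] + a[i + 1:] + b[j]   # A loses ball i, gains B's ball j
        new_b = b[:j] + b[j + 1:] + a[i]   # B loses ball j, gains A's ball i
        col[label(new_a, new_b)] += F(1, len(a) * len(b))
    return col

Q = {m: transition_column(m) for m in STATES}   # Q[m][n] is Q_{nm} in the text
for m in STATES:
    print(m, Q[m])
```

Each dictionary `Q[m]` is one column of the transition matrix, so its values necessarily sum to one.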

Notice that columns add to unity (which they should).

So, consider that the system starts off in a pure state. Say it is in state $n = 3$. Then,

$$
P(0) = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
$$

Hence, the probability distribution over states at time $t = 1$ is given by the multiplication

$$
P(1) = Q P(0) = \begin{pmatrix} 0 & 1/6 & 0 \\ 1 & 1/2 & 2/3 \\ 0 & 1/3 & 1/3 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 2/3 \\ 1/3 \end{pmatrix}.
$$

Further, consider the distribution of probabilities over the states of the system after another time step,

$$
P(2) = Q P(1) = Q^2 P(0) = \begin{pmatrix} 1/9 \\ 5/9 \\ 1/3 \end{pmatrix}.
$$

That is, if the system starts off in state $n = 3$, then after 2 "goes" one will find it in state $n = 1$ with probability 1/9, state $n = 2$ with probability 5/9 (= 20/36), and state $n = 3$ with probability 1/3.


3.2.3 Stochastic Matrices

One may ask: is there a state that the system settles down in? That is, does

$$
P(t) = Q^t P(0)
$$

settle down to a distribution which no longer changes with time? Let us consider this. In fact, writing $P(t)$ as a power of $Q$ in this way only works if the transition matrix is independent of time. To continue, we must have a little aside on right- and left-eigenvectors of matrices.

3.2.3.1 Right & Left Eigenvectors

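Let us proceed to calculate the right and left eigenvectors of some matrix. We will do this by example. Consider the matrix

$$
M = \begin{pmatrix} -1 & 2 \\ -3 & 4 \end{pmatrix},
$$

which clearly is not a stochastic matrix (its columns don't sum to unity, and it has negative entries).

So, we compute its eigenvalues in the usual way; subtract $\lambda$ from the diagonal entries, and set the determinant to zero;

$$
\begin{vmatrix} -1 - \lambda & 2 \\ -3 & 4 - \lambda \end{vmatrix} = 0 \quad \Rightarrow \quad \lambda^2 - 3\lambda + 2 = 0.
$$

Thus, the characteristic equation (the one on the right, above) has solutions

$$
\lambda^{(1)} = 1, \qquad \lambda^{(2)} = 2.
$$

We find what we will now call the right eigenvectors the way we "normally" find eigenvectors. That is, by solving

$$
\begin{pmatrix} -1 & 2 \\ -3 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}.
$$

We shall denote the right eigenvector as $\psi^{(i)}$, corresponding to eigenvalue $\lambda^{(i)}$. So, we can fairly easily find that

$$
\lambda^{(1)} = 1 \quad \Rightarrow \quad \psi^{(1)} = \begin{pmatrix} 1 \\ 1 \end{pmatrix},
$$

and

$$
\lambda^{(2)} = 2 \quad \Rightarrow \quad \psi^{(2)} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}.
$$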


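Now, to find the left eigenvectors $\chi^{(i)}$, which correspond to the eigenvalues $\lambda^{(i)}$, we solve

$$
(x \;\; y) \begin{pmatrix} -1 & 2 \\ -3 & 4 \end{pmatrix} = \lambda^{(i)} (x \;\; y),
$$

for each $\lambda^{(i)}$. So, for $\lambda^{(1)} = 1$, we see that

$$
-x - 3y = x \quad \Rightarrow \quad (x \;\; y) = (3 \;\; {-2}),
$$

and that for $\lambda^{(2)} = 2$, we see that

$$
-x - 3y = 2x \quad \Rightarrow \quad x = -y \quad \Rightarrow \quad (x \;\; y) = (1 \;\; {-1}).
$$

Now, one will notice that these eigenvectors are rows, rather than columns. We therefore denote the things above as $(\chi^{(i)})^T$. That is,

$$
(\chi^{(1)})^T = (3 \;\; {-2}) \;\Rightarrow\; \chi^{(1)} = \begin{pmatrix} 3 \\ -2 \end{pmatrix}, \qquad
(\chi^{(2)})^T = (1 \;\; {-1}) \;\Rightarrow\; \chi^{(2)} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
$$

Now, by way of convenient notation, we denote right eigenvectors as kets $|\psi^{(i)}\rangle$, and left eigenvectors as bras $\langle\chi^{(i)}|$. Thus, notice the orthogonality of the two sets of eigenvectors:

$$
\langle\chi^{(1)}|\psi^{(2)}\rangle = 0, \qquad \langle\chi^{(2)}|\psi^{(1)}\rangle = 0.
$$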
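The eigenvectors quoted above are easy to verify numerically. The following sketch checks $M\psi = \lambda\psi$ and $\chi^T M = \lambda\chi^T$ for both eigenvalues, and then prints the two cross overlaps. The helper names (`right_apply`, `left_apply`, `dot`) are illustrative, not taken from the notes.

```python
from fractions import Fraction as F

M = [[F(-1), F(2)],
     [F(-3), F(4)]]

def right_apply(M, v):
    """Column-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def left_apply(v, M):
    """Row-vector product v^T M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def dot(u, v):
    """Overlap <u|v> of a row vector with a column vector."""
    return sum(a * b for a, b in zip(u, v))

# Eigenvalue -> right (psi) and left (chi) eigenvectors, as found in the text.
eigen = {1: {"psi": [1, 1], "chi": [3, -2]},
         2: {"psi": [2, 3], "chi": [1, -1]}}

for lam, vecs in eigen.items():
    assert right_apply(M, vecs["psi"]) == [lam * x for x in vecs["psi"]]
    assert left_apply(vecs["chi"], M) == [lam * x for x in vecs["chi"]]

cross_12 = dot(eigen[1]["chi"], eigen[2]["psi"])
cross_21 = dot(eigen[2]["chi"], eigen[1]["psi"])
print(cross_12, cross_21)
```

Note that the diagonal overlaps are $\langle\chi^{(1)}|\psi^{(1)}\rangle = 1$ and $\langle\chi^{(2)}|\psi^{(2)}\rangle = -1$, so these particular vectors are orthogonal but not yet normalised against each other.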

3.2.3.2 Properties of Stochastic Matrices

As we saw with our urn-model example, the columns of a stochastic matrix sum to unity. This is obviously the case, as something must happen. Mathematically, using the definition of $Q$, we see that this corresponds to

$$
\sum_n Q_{nm} = \sum_n P(n, t+1 | m, t) = 1,
$$

so that the probability that the system transitions from state $m$ at time $t$ to some state $n$ (including $n = m$) at time $t+1$ is unity.

Also, all entries of a stochastic matrix are non-negative. This is fairly obvious, as probabilities are non-negative;

$$
Q_{nm} \geq 0.
$$

Following this, if $Q_1$ and $Q_2$ are two stochastic matrices, then so is their product $Q_1 Q_2$. Furthermore, if $Q$ is a stochastic matrix, then so is any power of that matrix, $Q^t$.
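The closure under multiplication is stated without proof; a short argument, using only the two properties above (the index $k$ is a dummy summation label introduced here), might run as follows. The entries of $Q_1 Q_2$ are sums of products of non-negative numbers, so they are non-negative, and the column sums are

```latex
\sum_n (Q_1 Q_2)_{nm}
  = \sum_n \sum_k (Q_1)_{nk} (Q_2)_{km}
  = \sum_k \Big( \sum_n (Q_1)_{nk} \Big) (Q_2)_{km}
  = \sum_k (Q_2)_{km}
  = 1 .
```

Applying this repeatedly with $Q_1 = Q_2 = Q$ gives the statement for powers $Q^t$.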

In our brief example, we saw that an eigenvalue $\lambda = 1$ appeared. This is a general property of stochastic matrices: every stochastic matrix has at least one eigenvalue which is unity. Corresponding to that eigenvalue, the left eigenvector is "unit". That is,

$$
\lambda^{(1)} = 1, \qquad (\chi^{(1)})^T = (1 \;\; 1 \;\; \ldots \;\; 1).
$$

The eigenvalue equation for this is just

$$
\sum_n \chi^{(1)}_n Q_{nm} = 1 \cdot \chi^{(1)}_m,
$$

which is simply the statement that the columns of $Q$ sum to unity. The right eigenvector belonging to this eigenvalue is called the stationary state.

Note that our brief example was not a stochastic matrix, and therefore this left eigenvector did not appear.

If the system approaches a time-independent state, $P^{\rm st}$, say, then this is an equilibrium state. When the system does tend towards such a state regardless of its initial state, that state must be left unchanged by the evolution equation

$$
P(t+1) = Q P(t),
$$

which then becomes

$$
P^{\rm st} = Q P^{\rm st},
$$

which is merely an eigenvalue equation, corresponding to eigenvalue 1. That is,

$$
Q P^{\rm st} = 1 \cdot P^{\rm st}.
$$

Therefore, for the eigenvalue $\lambda = 1$, we see that the right eigenvector is $P^{\rm st}$ and the left is $\chi^{(1)}$, whereby their product is unity,

$$
(1 \;\; 1 \;\; \ldots \;\; 1) \begin{pmatrix} P^{\rm st}_1 \\ P^{\rm st}_2 \\ \vdots \end{pmatrix} = \sum_n P^{\rm st}_n = 1.
$$

Finally, all eigenvalues of a stochastic matrix have modulus $\leq 1$,

$$
|\lambda^{(i)}| \leq 1, \qquad \forall i.
$$

If the matrix is symmetric, $Q^T = Q$, then the left and right eigenvectors are the same.

3.2.3.3 Example: Urn Model

Let us compute the right and left eigenvectors for our previous transition matrix

$$
Q = \begin{pmatrix} 0 & 1/6 & 0 \\ 1 & 1/2 & 2/3 \\ 0 & 1/3 & 1/3 \end{pmatrix},
$$

for the urn model. We compute eigenvalues via

$$
\begin{vmatrix} -\lambda & 1/6 & 0 \\ 1 & 1/2 - \lambda & 2/3 \\ 0 & 1/3 & 1/3 - \lambda \end{vmatrix} = 0,
$$

which results in the characteristic equation

$$
\lambda^3 - \tfrac{5}{6}\lambda^2 - \tfrac{2}{9}\lambda + \tfrac{1}{18} = 0.
$$

To solve this cubic, we first note that we know that one root is $\lambda = 1$ (it being a stochastic matrix implies that there is one unit eigenvalue). Thus, we factorise;

$$
(\lambda - 1)(\lambda^2 + a\lambda - \tfrac{1}{18}) = 0.
$$

If we expand this out, and compare the coefficients of $\lambda^2$ with the original characteristic equation, we find that $a = \tfrac{1}{6}$. Therefore,

$$
(\lambda - 1)(\lambda^2 + \tfrac{1}{6}\lambda - \tfrac{1}{18}) = 0,
$$

which further factorises to

$$
(\lambda - 1)(\lambda + \tfrac{1}{3})(\lambda - \tfrac{1}{6}) = 0.
$$

Therefore, the 3 eigenvalues are

$$
\lambda^{(1)} = 1, \qquad \lambda^{(2)} = -\tfrac{1}{3}, \qquad \lambda^{(3)} = \tfrac{1}{6}.
$$

Now, corresponding to $\lambda^{(1)} = 1$, we know that the left eigenvector is just

$$
(\chi^{(1)})^T = (1 \;\; 1 \;\; 1), \qquad \lambda^{(1)} = 1.
$$

So, to find the corresponding right eigenvector, $\psi^{(1)}$, we solve in the usual way;

$$
\begin{pmatrix} 0 & 1/6 & 0 \\ 1 & 1/2 & 2/3 \\ 0 & 1/3 & 1/3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 1 \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix},
$$

to give

$$
\psi^{(1)} \propto \begin{pmatrix} 1 \\ 6 \\ 3 \end{pmatrix}.
$$

This is the stationary state; but, to correctly normalise it so that $\langle\chi^{(1)}|\psi^{(1)}\rangle = 1$, we note that

$$
(1 \;\; 1 \;\; 1) \begin{pmatrix} 1 \\ 6 \\ 3 \end{pmatrix} = 10 \quad \Rightarrow \quad \psi^{(1)} = P^{\rm st} = \begin{pmatrix} 1/10 \\ 3/5 \\ 3/10 \end{pmatrix}.
$$

And thus, we have found the stationary state. The other eigenvectors are fairly easily found to be

$$
\lambda^{(2)} = -\tfrac{1}{3} \quad \Rightarrow \quad \psi^{(2)} = \begin{pmatrix} 1/6 \\ -1/3 \\ 1/6 \end{pmatrix}, \quad \chi^{(2)} = \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix};
$$

$$
\lambda^{(3)} = \tfrac{1}{6} \quad \Rightarrow \quad \psi^{(3)} = \begin{pmatrix} -4/15 \\ -4/15 \\ 8/15 \end{pmatrix}, \quad \chi^{(3)} = \begin{pmatrix} -3/2 \\ -1/4 \\ 1 \end{pmatrix}.
$$

Remember that we get the $\chi^{(i)}$ in the form of a row vector, $(\chi^{(i)})^T$. To continue, we make the identification

$$
\psi^{(i)} \mapsto |\psi^{(i)}\rangle, \qquad (\chi^{(i)})^T \mapsto \langle\chi^{(i)}|.
$$

And therefore, using this notation, we see orthogonality between left and right eigenvectors belonging to different eigenvalues; for example,

$$
\langle\chi^{(1)}|\psi^{(3)}\rangle = 0.
$$

Also note, as $\psi^{(2,3)}$ are orthogonal to $\chi^{(1)}$, their entries must sum to zero (which they do).
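The full set of urn-model eigenvectors can be checked in a few lines. The sketch below verifies each eigenvalue equation with exact fractions and then tabulates every overlap $\langle\chi^{(j)}|\psi^{(i)}\rangle$; the sign of the first entry of $\psi^{(3)}$ is taken as negative, which is what the eigenvalue equation requires. Helper names are illustrative.

```python
from fractions import Fraction as F

Q = [[F(0), F(1, 6), F(0)],
     [F(1), F(1, 2), F(2, 3)],
     [F(0), F(1, 3), F(1, 3)]]

lam = [F(1), F(-1, 3), F(1, 6)]
psi = [[F(1, 10), F(3, 5), F(3, 10)],      # right eigenvectors (columns)
       [F(1, 6), F(-1, 3), F(1, 6)],
       [F(-4, 15), F(-4, 15), F(8, 15)]]
chi = [[F(1), F(1), F(1)],                 # left eigenvectors (rows)
       [F(3), F(-1), F(1)],
       [F(-3, 2), F(-1, 4), F(1)]]

def right_apply(Q, v):
    """Column-vector product Q v."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def left_apply(v, Q):
    """Row-vector product v^T Q."""
    return [sum(v[i] * Q[i][j] for i in range(3)) for j in range(3)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

for i in range(3):
    assert right_apply(Q, psi[i]) == [lam[i] * x for x in psi[i]]
    assert left_apply(chi[i], Q) == [lam[i] * x for x in chi[i]]

# overlaps[j][i] = <chi_j | psi_i>
overlaps = [[dot(chi[j], psi[i]) for i in range(3)] for j in range(3)]
print(overlaps)
```

With these normalisations the table of overlaps is the identity matrix, i.e. the two sets of vectors are orthonormal against each other.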

3.2.3.4 General Theory

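We now prove various relations, in a very similar fashion to quantum theory. This theory does not rely on the matrix being stochastic; it holds for any matrix whose eigenvectors form a complete set.

Orthogonality. Let $Q$ be an $m \times m$ matrix. Then, the eigenvalue equations (for both right and left eigenvectors) are

$$
Q |\psi^{(i)}\rangle = \lambda^{(i)} |\psi^{(i)}\rangle, \qquad \langle\chi^{(i)}| Q = \lambda^{(i)} \langle\chi^{(i)}|.
$$

Then, forming the product with a bra-state on the first, and a ket-state on the second, we have that

$$
\langle\chi^{(j)}| Q |\psi^{(i)}\rangle = \lambda^{(i)} \langle\chi^{(j)}|\psi^{(i)}\rangle, \qquad
\langle\chi^{(i)}| Q |\psi^{(j)}\rangle = \lambda^{(i)} \langle\chi^{(i)}|\psi^{(j)}\rangle.
$$

Swapping the indices over, on the second expression, results in

$$
\langle\chi^{(j)}| Q |\psi^{(i)}\rangle = \lambda^{(j)} \langle\chi^{(j)}|\psi^{(i)}\rangle.
$$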


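Therefore, subtracting, we have

$$
0 = \left( \lambda^{(i)} - \lambda^{(j)} \right) \langle\chi^{(j)}|\psi^{(i)}\rangle.
$$

So, if $\lambda^{(i)} \neq \lambda^{(j)}$, then this simply reads

$$
\langle\chi^{(j)}|\psi^{(i)}\rangle = 0, \qquad \lambda^{(i)} \neq \lambda^{(j)}, \quad i \neq j.
$$

If we have also normalised the eigenvectors, then we can say that

$$
\langle\chi^{(j)}|\psi^{(i)}\rangle = \delta_{ij}, \tag{3.2.13}
$$

which is the statement of orthonormality.

Completeness. We can expand an arbitrary state (probability) into eigenstates of the matrix;

$$
|P\rangle = \sum_{i=1}^{m} \alpha_i |\psi^{(i)}\rangle.
$$

So, forming the product of this with a bra-state gives

$$
\langle\chi^{(j)}|P\rangle = \sum_{i=1}^{m} \alpha_i \langle\chi^{(j)}|\psi^{(i)}\rangle,
$$

which, by our orthonormality statement, is simply

$$
\langle\chi^{(j)}|P\rangle = \sum_{i=1}^{m} \alpha_i \langle\chi^{(j)}|\psi^{(i)}\rangle = \sum_{i=1}^{m} \alpha_i \delta_{ij} = \alpha_j.
$$

Therefore, we have the coefficients:

$$
\alpha_i = \langle\chi^{(i)}|P\rangle. \tag{3.2.14}
$$

Then, using this in our original expansion,

$$
|P\rangle = \sum_{i=1}^{m} |\psi^{(i)}\rangle \alpha_i = \sum_{i=1}^{m} |\psi^{(i)}\rangle \langle\chi^{(i)}|P\rangle,
$$

which leads us to state that

$$
\sum_i |\psi^{(i)}\rangle \langle\chi^{(i)}| = 1. \tag{3.2.15}
$$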
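To see (3.2.14) and (3.2.15) at work, here is a sketch using the eigenvectors of the $2 \times 2$ example matrix $M = \begin{pmatrix} -1 & 2 \\ -3 & 4 \end{pmatrix}$ from earlier. One assumption is made explicit in the code: the second left eigenvector is multiplied by $-1$ relative to the text so that $\langle\chi^{(i)}|\psi^{(i)}\rangle = 1$ for both $i$. The test vector `P` is arbitrary.

```python
from fractions import Fraction as F

# Eigenvectors of M = [[-1, 2], [-3, 4]] from the earlier example.
# chi[1] carries an extra sign relative to the text so that <chi_i|psi_i> = 1.
psi = [[F(1), F(1)], [F(2), F(3)]]
chi = [[F(3), F(-2)], [F(-1), F(1)]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def outer(col, row):
    """Matrix |col><row|."""
    return [[c * r for r in row] for c in col]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Completeness (3.2.15): sum_i |psi_i><chi_i| should be the identity matrix.
identity = mat_add(outer(psi[0], chi[0]), outer(psi[1], chi[1]))

# Expansion coefficients (3.2.14): alpha_i = <chi_i|P> for an arbitrary vector P.
P = [F(5), F(-7)]
alpha = [dot(c, P) for c in chi]
rebuilt = [sum(alpha[i] * psi[i][k] for i in range(2)) for k in range(2)]
print(identity, alpha, rebuilt)
```

The rebuilt vector agrees with `P`, which is the content of the completeness relation.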


3.2.3.5 Powers of a Matrix

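If we multiply (3.2.15) by $Q$ from the left, we have

$$
Q \sum_i |\psi^{(i)}\rangle \langle\chi^{(i)}| = Q \cdot 1 \quad \Rightarrow \quad Q = \sum_i \lambda^{(i)} |\psi^{(i)}\rangle \langle\chi^{(i)}|.
$$

Now, consider that

$$
Q^2 |\psi^{(i)}\rangle = Q Q |\psi^{(i)}\rangle = Q \lambda^{(i)} |\psi^{(i)}\rangle = (\lambda^{(i)})^2 |\psi^{(i)}\rangle,
$$

then it is clear that

$$
Q^t |\psi^{(i)}\rangle = (\lambda^{(i)})^t |\psi^{(i)}\rangle.
$$

Then, we easily see that

$$
Q^t = \sum_i (\lambda^{(i)})^t |\psi^{(i)}\rangle \langle\chi^{(i)}|.
$$

Just to make notation a little easier, we shall express this as

$$
Q^t = \sum_i \lambda_{(i)}^t |\psi^{(i)}\rangle \langle\chi^{(i)}|. \tag{3.2.16}
$$

Therefore, we have a way in which we can compute the powers of a matrix: by forming the products of right with left eigenvectors.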

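Example: Urn Model. Let us return to the urn model's matrix and eigenvectors, to compute an arbitrary power of the matrix.

So, for $\lambda^{(1)} = 1$, we see that

$$
|\psi^{(1)}\rangle \langle\chi^{(1)}| = \begin{pmatrix} 1/10 \\ 3/5 \\ 3/10 \end{pmatrix} (1 \;\; 1 \;\; 1) = \begin{pmatrix} 1/10 & 1/10 & 1/10 \\ 3/5 & 3/5 & 3/5 \\ 3/10 & 3/10 & 3/10 \end{pmatrix} \equiv Q_1.
$$

For $\lambda^{(2)} = -1/3$, we have that

$$
|\psi^{(2)}\rangle \langle\chi^{(2)}| = \begin{pmatrix} 1/6 \\ -1/3 \\ 1/6 \end{pmatrix} (3 \;\; {-1} \;\; 1) = \begin{pmatrix} 1/2 & -1/6 & 1/6 \\ -1 & 1/3 & -1/3 \\ 1/2 & -1/6 & 1/6 \end{pmatrix} \equiv Q_2.
$$

And finally, for $\lambda^{(3)} = 1/6$, we see that

$$
|\psi^{(3)}\rangle \langle\chi^{(3)}| = \begin{pmatrix} 2/5 & 1/15 & -4/15 \\ 2/5 & 1/15 & -4/15 \\ -4/5 & -2/15 & 8/15 \end{pmatrix} \equiv Q_3.
$$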


So, an arbitrary power of $Q$ may be found from

$$
Q^t = Q_1 + \left( -\tfrac{1}{3} \right)^t Q_2 + \left( \tfrac{1}{6} \right)^t Q_3.
$$

Now, we can notice a few things from this. First, note that the eigenvalue of smallest modulus will have little effect on the late-time behaviour of the system. That is, for large $t$, the last term will be negligible. The eigenvalue with the next-largest modulus after $\lambda = 1$ (here $-\tfrac{1}{3}$) controls how the system approaches its large-$t$ limit. Second, notice that if any of the eigenvalues had had modulus $> 1$, then the system would have diverged. Finally, the very-large $t$ behaviour is completely determined by $Q_1$.

Thus, we have a way of computing

$$
P(t) = Q^t P(0).
$$
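As a check on the decomposition, the sketch below builds $Q_1$, $Q_2$, $Q_3$ from the urn-model eigenvectors and compares the spectral formula for $Q^t$ with direct repeated multiplication. It uses $\psi^{(3)} = (-4/15, -4/15, 8/15)^T$, which is the choice consistent with the matrix $Q_3$ above; function names are illustrative.

```python
from fractions import Fraction as F

Q = [[F(0), F(1, 6), F(0)],
     [F(1), F(1, 2), F(2, 3)],
     [F(0), F(1, 3), F(1, 3)]]
lam = [F(1), F(-1, 3), F(1, 6)]
psi = [[F(1, 10), F(3, 5), F(3, 10)],
       [F(1, 6), F(-1, 3), F(1, 6)],
       [F(-4, 15), F(-4, 15), F(8, 15)]]
chi = [[F(1), F(1), F(1)],
       [F(3), F(-1), F(1)],
       [F(-3, 2), F(-1, 4), F(1)]]

def outer(col, row):
    """Matrix |col><row|."""
    return [[c * r for r in row] for c in col]

def mat_mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

projectors = [outer(psi[i], chi[i]) for i in range(3)]   # Q_1, Q_2, Q_3

def spectral_power(t):
    """Q^t from (3.2.16): sum_i lambda_i^t |psi_i><chi_i|."""
    return [[sum(lam[i] ** t * projectors[i][r][c] for i in range(3))
             for c in range(3)] for r in range(3)]

def direct_power(t):
    """Q^t by t successive matrix multiplications."""
    result = [[F(int(r == c)) for c in range(3)] for r in range(3)]
    for _ in range(t):
        result = mat_mul(Q, result)
    return result

print(spectral_power(5) == direct_power(5))
# The third column of Q^2 is P(2) for a start in the pure state n = 3.
print([str(row[2]) for row in spectral_power(2)])
```

Setting `t = 0` in `spectral_power` returns the identity matrix, which is the completeness relation (3.2.15) again.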

3.2.4 Examples of Markov Chains

Here we shall consider some more transition matrices, and how boundary conditions affect their structure.

The Gambler's Ruin. Suppose a gambler starts out with £$n_0$, and makes a series of £1 bets against the house. The probability of winning each bet is $p$, and of losing is $q = 1 - p$. If the gambler's capital ever reaches zero, he is ruined and stops playing; he remains at zero.

This is obviously a random walk, but with an absorbing boundary at $n = 0$. As it stands, the transition matrix is of infinite dimension: there is no upper stopping point. A variant could be that once the gambler reaches a given amount, £$N$, he stops again. Thus, there are now two absorbing boundaries.

The transition matrix, naively (i.e. incorrectly), is

$$
Q = \begin{pmatrix}
0 & q & 0 & \ldots & & 0 \\
p & 0 & q & & & \\
0 & p & 0 & q & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & \\
 & & & p & 0 & q \\
0 & \ldots & & 0 & p & 0
\end{pmatrix}.
$$

Now, one will notice that the first and last columns do not add to unity. Also, consider that the absorbing boundaries are at the matrix positions $n = 1$ and $n = N$. Then, as the matrix stands, the element $Q_{21} = p$. That is, there is a probability of the walker moving from state $1 \to 2$. This is in contradiction with our absorbing boundary. Also, the matrix currently has $Q_{N-1,N} = q$; again, this is in contradiction with our absorbing boundary. The elements mentioned should be set to zero, and the elements $Q_{11} = Q_{NN} = 1$. Thus,

$$
Q = \begin{pmatrix}
1 & q & 0 & \ldots & & 0 \\
0 & 0 & q & & & \\
0 & p & 0 & q & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & \\
 & & & p & 0 & 0 \\
0 & \ldots & & 0 & p & 1
\end{pmatrix}
$$

is the correct transition matrix for the case of two absorbing boundaries.
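A small numerical sketch of the two-boundary gambler's ruin follows. Here the states are labelled by the capital $0, 1, \ldots, N$ (so the matrix has $N + 1$ rows, a labelling choice made for this example rather than the $1, \ldots, N$ matrix positions used above), and the values of `N`, `n0` and `p` are illustrative. For a fair game, the standard result is that the probability of eventual ruin from capital $n_0$ is $1 - n_0/N$, which the iteration should approach.

```python
def absorbing_walk_matrix(N, p):
    """(N+1)x(N+1) matrix for capital 0..N with absorbing states 0 and N."""
    q = 1.0 - p
    Q = [[0.0] * (N + 1) for _ in range(N + 1)]
    Q[0][0] = 1.0          # ruined: stays at 0
    Q[N][N] = 1.0          # reached the target: stays at N
    for m in range(1, N):  # interior states move up (win) or down (lose)
        Q[m + 1][m] = p
        Q[m - 1][m] = q
    return Q

def evolve(Q, P, steps):
    """Apply P(t+1) = Q P(t) the given number of times."""
    for _ in range(steps):
        P = [sum(q * x for q, x in zip(row, P)) for row in Q]
    return P

N, n0, p = 4, 1, 0.5
Q = absorbing_walk_matrix(N, p)
column_sums = [sum(Q[n][m] for n in range(N + 1)) for m in range(N + 1)]
P0 = [1.0 if n == n0 else 0.0 for n in range(N + 1)]
P_late = evolve(Q, P0, 200)
print(column_sums)
print(P_late[0], P_late[N])   # probabilities of ruin and of reaching N
```

At late times all of the probability sits in the two absorbing states, as expected.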

Random Walk with Reflecting Boundaries. Suppose that now, at a boundary, the walker "bounces off" or stays put. That is, if the walker is at the boundary, then on the next "go" he either moves off it or stays on it; there is no probability that he moves through the barrier. Recall that $p$ is the probability of moving right, and $q$ of moving left. At the left boundary a step to the left is not possible, so with probability $q$ the walker stays put: $Q_{11} = q$, while $Q_{21} = p$ as before. Similarly, at the right boundary a step to the right is not possible, so $Q_{NN} = p$. Therefore, this transition matrix looks like

$$
Q = \begin{pmatrix}
q & q & 0 & \ldots & & 0 \\
p & 0 & q & & & \\
0 & p & 0 & q & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & \\
 & & & p & 0 & q \\
0 & \ldots & & 0 & p & p
\end{pmatrix}.
$$
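The reflecting-boundary matrix can be built and tested in the same spirit. The sketch below also checks a fact not derived in these notes: for this chain the stationary state satisfies $p P_n = q P_{n+1}$, so that $P^{\rm st}_n \propto (p/q)^n$. The values of `N` and `p` and the function name are illustrative assumptions.

```python
from fractions import Fraction as F

def reflecting_walk_matrix(N, p):
    """N x N matrix (sites 1..N stored at indices 0..N-1), reflecting ends."""
    q = 1 - p
    Q = [[F(0)] * N for _ in range(N)]
    for m in range(N):
        if m != N - 1:
            Q[m + 1][m] = p      # step right
        else:
            Q[m][m] += p         # blocked at the right wall: stay put
        if m != 0:
            Q[m - 1][m] = q      # step left
        else:
            Q[m][m] += q         # blocked at the left wall: stay put
    return Q

N, p = 5, F(2, 3)
Q = reflecting_walk_matrix(N, p)
column_sums = [sum(Q[n][m] for n in range(N)) for m in range(N)]

# Candidate stationary state, P_n proportional to (p/q)^n.
ratio = p / (1 - p)
weights = [ratio ** n for n in range(N)]
P_st = [w / sum(weights) for w in weights]
QP = [sum(Q[n][m] * P_st[m] for m in range(N)) for n in range(N)]
print(column_sums, QP == P_st)
```

For $p > q$ the stationary distribution piles up against the right-hand wall, as one would expect for a walker that drifts to the right.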

Birth & Death Processes. Suppose that at $t = 0$, there are $n_0$ bacteria in a colony. At each time step there is a probability $\mu$ that one dies, and $\lambda$ that one is born.

This is clearly another example of a random walk.

3.2.4.1 The Ehrenfest Urn

Consider two containers, A and B, which contain molecules of the same gas. There are a total of $N$ molecules in A and B.

The dynamics of the system is such that at each time step, one of the $N$ molecules is chosen at random, and moved to the other container.


So, suppose that there are $n'$ molecules in A at time $t$. Then, at time $t + 1$, urn A will have one less molecule with probability $n'/N$. It will have increased in number by one with probability $1 - n'/N$. So, the transition matrix looks like

$$
Q_{nn'} = \begin{cases} \dfrac{n'}{N}, & n = n' - 1, \\[2mm] 1 - \dfrac{n'}{N}, & n = n' + 1, \\[2mm] 0, & \text{else.} \end{cases}
$$

That is, we have the transition probability for urn A to go from having $n'$ molecules to having $n$, where $n = n' \pm 1$. We shall look at the problem from the point of view of urn A. Obviously, B will be directly linked to A.

Notice that now, the transition probability depends on the current state of the system. That is, if there are more molecules in A, then there is a higher probability of choosing a molecule from A than from B.

We can deduce the elements of the transition matrix for the case $N = 3$. That is, a system where there are 3 molecules in total.

• $Q_{00}$: probability of A going from 0 molecules to 0 molecules is $Q_{00} = 0$. This is obvious, as a molecule must be placed in the container.

• $Q_{01}$: probability of A going from 1 molecule to 0 is $Q_{01} = 1/3$. Only 1 of the 3 molecules is in A to begin with, so the probability that this single molecule is chosen is just 1/3.

• $Q_{21}$: probability of A going from 1 to 2 molecules. This is the chance of picking one of the two molecules in B, and moving it over. Obviously, this is 2/3.

We continue until we have filled the whole transition matrix,

$$
Q = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1 & 0 & 2/3 & 0 \\ 0 & 2/3 & 0 & 1 \\ 0 & 0 & 1/3 & 0 \end{pmatrix}.
$$

Now, we state (without proof) the eigenvalues of such a matrix, for general $N$:

$$
\lambda^{(i)} = 1 - \frac{2i}{N}, \qquad i = 0, 1, \ldots, N.
$$

The right eigenvector corresponding to the eigenvalue $\lambda = 1$ (i.e. the stationary state) has elements

$$
P^{\rm st}_n = \frac{N!}{(N-n)!\, n!} \frac{1}{2^N}.
$$

Notice that this is a binomial distribution, which tends towards a normal distribution for $N \to \infty$. Its mean is $N/2$ and its standard deviation is $\sqrt{N}/2$, so the width of the distribution relative to its mean goes as $1/\sqrt{N}$. Therefore, for a system with many molecules, the system tends towards a state where the number of molecules is the same in each urn. This will be equilibrium, but with relative fluctuations of the order $1/\sqrt{N}$.
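The quoted stationary state can be checked directly for small $N$. This sketch builds the Ehrenfest matrix for general $N$, reproduces the $N = 3$ case above, and confirms that the binomial distribution is left unchanged by $Q$. Function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction as F
from math import comb

def ehrenfest_matrix(N):
    """(N+1)x(N+1) matrix; Q[n][m] = probability that A goes from m to n molecules."""
    Q = [[F(0)] * (N + 1) for _ in range(N + 1)]
    for m in range(N + 1):
        if m != 0:
            Q[m - 1][m] = F(m, N)       # a molecule in A is chosen
        if m != N:
            Q[m + 1][m] = 1 - F(m, N)   # a molecule in B is chosen
    return Q

def binomial_state(N):
    """Candidate stationary state P_n = C(N, n) / 2^N."""
    return [F(comb(N, n), 2 ** N) for n in range(N + 1)]

def apply(Q, P):
    return [sum(q * x for q, x in zip(row, P)) for row in Q]

N = 3
Q = ehrenfest_matrix(N)
P_st = binomial_state(N)
print([[str(x) for x in row] for row in Q])
print(apply(Q, P_st) == P_st)
```

Note that $\lambda = -1$ is also an eigenvalue of this matrix (take $i = N$ in the formula above), so a start from a pure state alternates between even and odd occupation numbers rather than relaxing monotonically.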

3.2.4.2 The Wright-Fisher Model

Suppose we have a population of individuals, at time $t$, who mate randomly to produce a new generation at the next time step $t + 1$.

We shall focus on one particular gene, which may be in one of two states (which are called alleles). The two alleles are A and B.

The dynamics of the system are such that we select a gene at random, copy it, place the copy in the new generation, and return the original to the parent population. We continue until there are the same number of genes in the new generation as in the old (i.e. gene number conservation); the new generation is then the population at time $t + 1$.

The idea is that if the gene pool at time $t$ is saturated with one allele, with very few copies of the second allele, then that second allele will be unlikely to be "picked" for copying into the next generation; thus, eventually, that allele is wiped out. It is basically random genetic drift.

Suppose that at time $t$ we have $n'$ A alleles, and $N - n'$ B alleles (i.e. $N$ in total). Then, the probability that in $N$ trials we get $n$ A alleles is given by the binomial distribution

$$
\binom{N}{n} p^n q^{N-n},
$$

where $p$ is the probability of picking A and $q$ of picking B,

$$
p = \frac{n'}{N}, \qquad q = 1 - p = 1 - \frac{n'}{N}.
$$

Then, it is clear that the transition matrix for allele A (whereby we can trivially obtain that for B) is

$$
Q_{nn'} = \binom{N}{n} p^n q^{N-n} = \binom{N}{n} \left( \frac{n'}{N} \right)^n \left( 1 - \frac{n'}{N} \right)^{N-n}. \tag{3.2.17}
$$

Now, we can start to analyse this expression a little. Consider the expectation value at time $t + 1$,

$$
\langle n(t+1) \rangle = \sum_{n=0}^{N} n P_n(t+1).
$$

Now, as the process is a Markov one, the probability of its current state can be written in terms of that at the previous time step using (3.2.12), so that

$$
\begin{aligned}
\langle n(t+1) \rangle &= \sum_{n=0}^{N} n P_n(t+1) \\
&= \sum_{n=0}^{N} n \sum_{n'=0}^{N} Q_{nn'} P_{n'}(t) \\
&= \sum_{n'} \left( \sum_n n Q_{nn'} \right) P_{n'}(t).
\end{aligned}
$$

Now, we can write the bracketed quantity using (3.2.17);

$$
\sum_n n Q_{nn'} = \sum_n n \binom{N}{n} p^n (1 - p)^{N-n},
$$

but this is just the expectation value of the binomial distribution, which is $Np$:

$$
\sum_n n Q_{nn'} = Np = N \frac{n'}{N} = n'.
$$

Therefore, the expectation value takes on a rather simple form,

$$
\langle n(t+1) \rangle = \sum_{n'} n' P_{n'}(t) = \sum_n n P_n(t) = \langle n(t) \rangle.
$$

That is,

$$
\langle n(t+1) \rangle = \langle n(t) \rangle.
$$

So, if we start off with $n_0$ A alleles, then

$$
\langle n(t) \rangle = n_0.
$$

This suggests that the system will not change over time, but one must understand that this is an "ensemble average": a value which is an average over many runs ("realisations") of the system.
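The conservation of the mean can be confirmed numerically for a small population. The sketch below builds the Wright-Fisher matrix (3.2.17) with exact fractions, checks that $\sum_n n Q_{nn'} = n'$ for every column, and then follows $\langle n(t) \rangle$ for a few generations. The values `N = 6` and `n0 = 2` and the function names are illustrative; `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction as F
from math import comb

def wright_fisher_matrix(N):
    """Q[n][m] = C(N, n) (m/N)^n (1 - m/N)^(N-n), as in (3.2.17)."""
    Q = []
    for n in range(N + 1):
        row = []
        for m in range(N + 1):
            p = F(m, N)
            row.append(comb(N, n) * p ** n * (1 - p) ** (N - n))
        Q.append(row)
    return Q

def mean(P):
    """Expectation value of n in the distribution P."""
    return sum(n * prob for n, prob in enumerate(P))

N, n0 = 6, 2
Q = wright_fisher_matrix(N)
column_means = [sum(n * Q[n][m] for n in range(N + 1)) for m in range(N + 1)]

P = [F(int(n == n0)) for n in range(N + 1)]   # pure initial state
means = []
for _ in range(5):
    P = [sum(Q[n][m] * P[m] for m in range(N + 1)) for n in range(N + 1)]
    means.append(mean(P))
print(column_means, means)
```

Although the mean stays fixed at $n_0$, the distribution itself spreads out and probability accumulates at the two ends $n = 0$ and $n = N$.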

We can also ask what the system looks like after a long time. Intuitively,it seems clear that the system will be entirely one allele or the other. Thus,

P st =

1−Π

0...0Π

,

Page 247: Advanced Theoretical Physics Semester 1 Jonathan · PDF fileAdvanced Theoretical Physics Semester 1 Jonathan Pearson School of Physics & Astronomy, University of Manchester August

3.2 Markov Processes 239

where Π is the probability that A is fixed. That is,

N∑n′=0

Qnn′Pstn′ = P st

n .

We can intuitively calculate $\Pi$ (this stationary state part has all “only been intuitive”), by considering the expectation value, and what happens to it as $t \to \infty$; the probability will go to the stationary state,

\[ \langle n(t) \rangle \to \sum_{n=0}^{N} n P^{\mathrm{st}}_{n} = 0 \cdot (1 - \Pi) + N \cdot \Pi. \]

But, we also have that $\langle n(t) \rangle = n_0$; therefore,

\[ \Pi = \frac{n_0}{N}. \]
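The two results above (conservation of the mean, and $\Pi = n_0/N$) can be checked numerically. The following is a minimal sketch, not part of the lectures: the population size `N = 10`, the starting count `n0 = 3`, the number of generations and the helper names are all illustrative assumptions.

```python
from math import comb

def wright_fisher_matrix(N):
    """Q[n][m]: probability of n copies of A next generation, given m now (3.2.17)."""
    Q = [[0.0] * (N + 1) for _ in range(N + 1)]
    for m in range(N + 1):
        p = m / N
        for n in range(N + 1):
            Q[n][m] = comb(N, n) * p**n * (1 - p)**(N - n)
    return Q

def step(Q, P):
    """One generation of the Markov chain: P_n(t+1) = sum_m Q[n][m] P_m(t)."""
    size = len(P)
    return [sum(Q[n][m] * P[m] for m in range(size)) for n in range(size)]

N, n0 = 10, 3                       # illustrative values
Q = wright_fisher_matrix(N)
P = [1.0 if n == n0 else 0.0 for n in range(N + 1)]
for _ in range(500):                # long enough for the interior states to empty
    P = step(Q, P)

mean = sum(n * P[n] for n in range(N + 1))
fixation = P[N]                     # probability that allele A has fixed
print(mean, fixation, n0 / N)
```

After many generations essentially all of the probability sits in the states $n = 0$ and $n = N$, while the ensemble mean is still $n_0$.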

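Now, the Wright-Fisher model has two independent right eigenvectors, both corresponding to $\lambda = 1$;

\[ \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}; \]

that is, all entries zero except the top one in the first vector and the bottom one in the second. The left eigenvectors are

\[ (1 \;\; 1 \;\; \ldots \;\; 1), \qquad (0 \;\; 1 \;\; 2 \;\; \ldots \;\; N). \]

So, for repeated eigenvalues, a linear combination of the eigenstates is also an eigenstate; but the combination must be chosen so that orthogonality is preserved.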

3.2.5 The Master Equation

This is basically the continuous time version of Markov chains.

3.2.5.1 Derivation

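The Chapman-Kolmogorov (CK) equation is

\[ P(n, t+\Delta t | n_0, t_0) = \sum_{n'} P(n, t+\Delta t | n', t) \, P(n', t | n_0, t_0). \]

Now, we assume that

\[ P(n, t+\Delta t | n', t) = \begin{cases} 1 - \kappa_n(t) \Delta t + O(\Delta t^2) & n = n', \\ w_{nn'} \Delta t + O(\Delta t^2) & n \neq n'. \end{cases} \tag{3.2.18} \]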


It will become clear that $w_{nn'}$ is a transition rate. Now, by normalisation,

\begin{align*}
1 &= \sum_{n} P(n, t+\Delta t | n', t) \\
&= 1 - \kappa_{n'}(t) \Delta t + \sum_{n \neq n'} w_{nn'} \Delta t + O(\Delta t^2) \\
\Rightarrow \quad \kappa_{n'}(t) &= \sum_{n \neq n'} w_{nn'}.
\end{align*}

Alternatively, switching the indices,

\[ \kappa_n(t) = \sum_{n' \neq n} w_{n'n}. \tag{3.2.19} \]

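Therefore, using (3.2.18), we see that the CK equation reads

\begin{align*}
P(n, t+\Delta t | n_0, t_0) &= (1 - \kappa_n(t) \Delta t + \ldots) P(n, t | n_0, t_0) \\
&\quad + \sum_{n' \neq n} w_{nn'}(t) P(n', t | n_0, t_0) \Delta t + O(\Delta t^2),
\end{align*}

which is easily rearranged to

\begin{align*}
\frac{P(n, t+\Delta t | n_0, t_0) - P(n, t | n_0, t_0)}{\Delta t} &= -\kappa_n(t) P(n, t | n_0, t_0) \\
&\quad + \sum_{n' \neq n} w_{nn'}(t) P(n', t | n_0, t_0) + O(\Delta t).
\end{align*}

Now, in the limit that $\Delta t \to 0$, the LHS becomes a derivative, and the RHS easily becomes (suppressing the initial condition, so that $P(n,t) \equiv P(n, t | n_0, t_0)$)

\[ \frac{dP(n,t)}{dt} = -\kappa_n(t) P(n,t) + \sum_{n' \neq n} w_{nn'}(t) P(n',t). \]

Using (3.2.19), the first term on the right is rewritten,

\[ \frac{dP(n,t)}{dt} = -\sum_{n' \neq n} w_{n'n}(t) P(n,t) + \sum_{n' \neq n} w_{nn'}(t) P(n',t). \]

Therefore, we have our master equation,

\[ \frac{dP(n,t)}{dt} = \sum_{n' \neq n} w_{nn'}(t) P(n',t) - \sum_{n' \neq n} w_{n'n}(t) P(n,t). \tag{3.2.20} \]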


The interpretation of this is pretty simple. The first term is the rate at which probability flows into state $n$ through transitions $n' \to n$, and the second is the rate at which it flows out through transitions $n \to n'$.

So, we say that the rate of change of the probability of being in a state $n$ is equal to the rate of transitions into $n$, minus the rate of transitions out of $n$.

This all relies on the Markov property: $w_{n'n}$ does not depend on previous states.
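To make the gain and loss structure of (3.2.20) concrete, here is a minimal sketch that evaluates the right-hand side of the master equation and steps it forward with a simple forward-Euler rule. The three-state rate table `w`, the step size and the function names are hypothetical choices made only for illustration.

```python
def master_rhs(w, P):
    """dP_n/dt = sum_{m != n} ( w[n][m] P[m] - w[m][n] P[n] ), as in (3.2.20).
    w[n][m] is the rate for the jump m -> n."""
    size = len(P)
    return [
        sum(w[n][m] * P[m] - w[m][n] * P[n] for m in range(size) if m != n)
        for n in range(size)
    ]

def euler_step(w, P, dt):
    """Advance the probabilities by one forward-Euler step of length dt."""
    dP = master_rhs(w, P)
    return [p + dt * d for p, d in zip(P, dP)]

# Hypothetical three-state rate table (not taken from the notes).
w = [[0.0, 2.0, 0.5],
     [1.0, 0.0, 1.5],
     [0.3, 0.7, 0.0]]

P = [1.0, 0.0, 0.0]                 # start with certainty in the first state
for _ in range(2000):
    P = euler_step(w, P, dt=0.005)

total = sum(P)                                      # gain and loss terms cancel in the sum
residual = max(abs(x) for x in master_rhs(w, P))    # small once P has stopped changing
print(P, total, residual)
```

Because every gain term for one state is a loss term for another, the total probability is unchanged by each step, and at late times the right-hand side vanishes as the distribution settles down.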

3.2.5.2 Relation to Markov Chains

Here we shall provide another derivation; but one that is much less rigorous than the previous one.

So, let us start from

\[ P_n(t+1) = \sum_{n'} Q_{nn'} P_{n'}(t), \]

where the columns of the transition matrix add to unity;

\[ \sum_{n} Q_{nn'} = 1. \]

So, using this to write $P_n(t) = \sum_{n'} Q_{n'n} P_n(t)$, we have

\[ P_n(t+1) - P_n(t) = \sum_{n'} Q_{nn'} P_{n'}(t) - \sum_{n'} Q_{n'n} P_n(t), \]

and further that

\[ P_n(t+1) - P_n(t) = \sum_{n' \neq n} Q_{nn'} P_{n'}(t) - \sum_{n' \neq n} Q_{n'n} P_n(t), \]

whereby we notice that the $n = n'$ terms cancel. Now, let us take the time step to be $\Delta t$, rather than unity, and divide through by the time step;

\[ \frac{P_n(t+\Delta t) - P_n(t)}{\Delta t} = \sum_{n' \neq n} \frac{Q_{nn'}}{\Delta t} P_{n'}(t) - \sum_{n' \neq n} \frac{Q_{n'n}}{\Delta t} P_n(t). \]

Then, it is clear that taking the time step to zero reduces the LHS to a derivative of the probability with respect to time. Furthermore, let us assume that on average, one event happens in the time step. To achieve this, let us assume

\[ Q_{nn'} = w_{nn'} \Delta t + O(\Delta t^2), \qquad n \neq n'; \]


then, we have arrived at the master equation again;

\[ \frac{dP_n}{dt} = \sum_{n' \neq n} w_{nn'}(t) P_{n'}(t) - \sum_{n' \neq n} w_{n'n}(t) P_n(t). \]

3.2.6 One Step Processes

These are a special class of processes, and are easier to analyse than the general case.

These systems have dynamical laws which only allow movement to the nearest states, rather than all states as in the general case. So, we have that $w_{nn'}$ and $w_{n'n}$ are zero, unless $n' = n \pm 1$. Hence, under this, we see that the master equation becomes

\[ \frac{dP_n(t)}{dt} = w_{n,n+1} P_{n+1}(t) + w_{n,n-1} P_{n-1}(t) - w_{n+1,n} P_n(t) - w_{n-1,n} P_n(t). \]

Now, by way of convenient notation, we write

\[ w_{n+1,n} \equiv g_n, \qquad w_{n-1,n} \equiv r_n, \]

then the master equation reads

\[ \frac{dP_n(t)}{dt} = r_{n+1} P_{n+1}(t) + g_{n-1} P_{n-1}(t) - (g_n + r_n) P_n(t). \tag{3.2.21} \]

Now, let us consider some examples.

3.2.6.1 The Decay Process

Suppose we have a sample of radioactive material, with $n_0$ undecayed nuclei at $t = 0$. How many are there at time $t$, given some decay rate $\gamma$? Here,

\[ w_{nn'} = \begin{cases} \gamma n' & n = n' - 1, \\ 0 & \text{else}. \end{cases} \]

Therefore, we see that $w_{n-1,n} = r_n = \gamma n$ and $w_{n+1,n} = g_n = 0$. Thus, the master equation (3.2.21) reads

\[ \frac{dP_n}{dt} = \gamma (n+1) P_{n+1}(t) - \gamma n P_n(t). \]

We shall state that the initial condition is

\[ P_n(0) = \delta_{n,n_0}. \]

Now, the mean, over the ensemble, is

\[ \langle n(t) \rangle = \sum_{n} n P_n(t), \]


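differentiating,

\[ \frac{d}{dt} \langle n(t) \rangle = \sum_{n} n \frac{dP_n}{dt}. \]

We can substitute our master equation into the RHS. Thus,

\[ \frac{d}{dt} \langle n(t) \rangle = \gamma \sum_{n=0}^{\infty} n(n+1) P_{n+1}(t) - \gamma \sum_{n=0}^{\infty} n^2 P_n(t). \]

Now, we shall change variables, so that $m = n+1$ in the first summation;

\[ \frac{d}{dt} \langle n(t) \rangle = \gamma \sum_{m=1}^{\infty} m(m-1) P_m(t) - \gamma \sum_{n=0}^{\infty} n^2 P_n(t). \]

Then, shift the start of the summation back to zero (the $m = 0$ term vanishes), and rename back to $n$,

\[ \frac{d}{dt} \langle n(t) \rangle = \gamma \sum_{n=0}^{\infty} n(n-1) P_n(t) - \gamma \sum_{n=0}^{\infty} n^2 P_n(t). \]

So, collecting terms, we see that

\[ \frac{d}{dt} \langle n(t) \rangle = -\gamma \sum_{n=0}^{\infty} n P_n(t), \]

which is of course just

\[ \frac{d}{dt} \langle n(t) \rangle = -\gamma \langle n(t) \rangle. \]

Now, let $x(t) \equiv \langle n(t) \rangle$; then, we have the familiar result

\[ \frac{dx(t)}{dt} = -\gamma x(t). \]

That is, exponential decay. So, we see that this familiar result is only for an ensemble average: take a lot of experiments, and average over them.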
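As a check on this result, one can integrate the decay master equation directly and compare the mean with $n_0 e^{-\gamma t}$. This is only a sketch: the values of `gamma`, `n0`, the step size and the forward-Euler scheme are illustrative assumptions, not anything prescribed in the lectures.

```python
from math import exp

def decay_rhs(P, gamma):
    """dP_n/dt = gamma (n+1) P_{n+1} - gamma n P_n, with no state above n0."""
    top = len(P) - 1
    return [
        (gamma * (n + 1) * P[n + 1] if n < top else 0.0) - gamma * n * P[n]
        for n in range(top + 1)
    ]

gamma, n0 = 0.5, 20                 # illustrative decay rate and initial number
dt, steps = 1e-4, 20000             # integrate up to t = 2
P = [0.0] * n0 + [1.0]              # P_n(0) = delta_{n, n0}
for _ in range(steps):
    dP = decay_rhs(P, gamma)
    P = [p + dt * d for p, d in zip(P, dP)]

t = steps * dt
mean = sum(n * p for n, p in enumerate(P))
expected = n0 * exp(-gamma * t)     # solution of dx/dt = -gamma x
print(mean, expected)
```

The full distribution $P_n(t)$ spreads over many values of $n$, yet its mean follows the deterministic exponential law to within the accuracy of the integration.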

Random Walk Here, we have $r_n = \mu$ and $g_n = \lambda$. That is, a fixed rate to move left or right, by one step. If the random walk is symmetric, then $\mu = \lambda$.

Birth & Death Processes Here, we shall take $g_n = bn$ (a birth rate), and $r_n = dn$ (a death rate). Again, this is a model choice. In reality, birth rate may decrease with $n$, as resources become scarce. For example, if

\[ g_n = b \left(1 - \frac{n}{N}\right) n, \]

then, once $n = N$, no more birth occurs, and the population can grow no further. That is, $N$ represents the maximum number of individuals that the system can have.


3.2.6.2 Linear One-step Processes & Their Solutions

A linear one-step process is one for which

\[ r_n = an + b, \qquad g_n = cn + d. \]

These can be solved by introducing the generating function,

\[ F(z,t) = \sum_{n} z^{n} P_n(t). \]

Note that before, we introduced the characteristic function (although we brought it in by considering integrals, we can just use a sum instead),

\[ G(k,t) = \sum_{n} e^{ikn} P_n(t). \]

Then, we identify $z$ with $e^{ik}$. So, differentiating the generating function,

\[ \frac{\partial F}{\partial z} = \sum_{n} n z^{n-1} P_n(t), \]

and, setting $z = 1$, then we see that

\[ \left. \frac{\partial F}{\partial z} \right|_{z=1} = \sum_{n} n P_n(t), \]

that is, the mean,

\[ \left. \frac{\partial F}{\partial z} \right|_{z=1} = \langle n(t) \rangle. \]

To get this result for the characteristic function, $G$, we had to set $k = 0$. Let us differentiate the generating function a second time,

\[ \frac{\partial^2 F}{\partial z^2} = \sum_{n} n(n-1) z^{n-2} P_n(t), \]

setting $z = 1$ again results in

\[ \left. \frac{\partial^2 F}{\partial z^2} \right|_{z=1} = \sum_{n} n^2 P_n(t) - \sum_{n} n P_n(t) = \langle n^2(t) \rangle - \langle n(t) \rangle. \]

Note that these are not quite cumulants (they are the factorial moments). Let us solve a problem.

Example: Symmetric Random Walk Here, we suppose that the probability of going to the left or right is the same; $g_n = r_n = \alpha$ (also independent of $n$). So, the master equation for one step processes (3.2.21) reads

\[ \frac{dP_n(t)}{dt} = \alpha P_{n+1}(t) + \alpha P_{n-1}(t) - 2\alpha P_n(t). \]


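We can get rid of $\alpha$, by defining

\[ \tau \equiv \alpha t, \]

so that

\[ \frac{d}{dt} = \frac{d\tau}{dt} \frac{d}{d\tau} = \alpha \frac{d}{d\tau}. \]

Therefore, the master equation reads

\[ \frac{dP_n}{d\tau} = P_{n+1} + P_{n-1} - 2 P_n. \]

We shall also impose the initial condition $P_n(0) = \delta_{n,0}$; that is, to start at the origin. So then, if we multiply the master equation by $z^n$, and sum,

\[ \sum_{n=-\infty}^{\infty} z^{n} \frac{dP_n}{d\tau} = \sum_{n=-\infty}^{\infty} z^{n} P_{n+1} + \sum_{n=-\infty}^{\infty} z^{n} P_{n-1} - 2 \sum_{n=-\infty}^{\infty} z^{n} P_n. \]

Let us then redefine the summation variable on the first and second terms, to $m = n+1$ and $m = n-1$,

\[ \sum_{n=-\infty}^{\infty} z^{n} \frac{dP_n}{d\tau} = \sum_{m=-\infty}^{\infty} z^{m-1} P_m + \sum_{m=-\infty}^{\infty} z^{m+1} P_m - 2 \sum_{n=-\infty}^{\infty} z^{n} P_n. \]

Now, if we note that in the generating function, only the probability is a function of time, differentiating it will result in the LHS of the above. So,

\[ \frac{\partial F}{\partial \tau} = \sum_{m=-\infty}^{\infty} z^{m-1} P_m + \sum_{m=-\infty}^{\infty} z^{m+1} P_m - 2 \sum_{n=-\infty}^{\infty} z^{n} P_n. \]

In a similar vein, each sum on the RHS is a power of $z$ multiplying $F$;

\[ \frac{\partial F}{\partial \tau} = (z^{-1} + z - 2) F. \]

Therefore, we see that

\[ \frac{1}{F} \frac{\partial F}{\partial \tau} = z^{-1} + z - 2. \]

This is simply integrated to give

\[ \ln F = (z^{-1} + z - 2) \tau + \Omega(z), \]

where we note we have an arbitrary function of integration, rather than constant, as we integrated a partial derivative. This gives

\[ F(z,\tau) = \phi(z) e^{(z^{-1} + z - 2)\tau}, \]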


where $\phi(z)$ is an arbitrary function. Putting $t$ back in,

\[ F(z,t) = \phi(z) e^{(z^{-1} + z - 2)\alpha t}. \]

We can figure out the function $\phi(z)$, by considering that $P_n(0) = \delta_{n,0}$. So,

\[ F(z,0) = \sum_{n} z^{n} \delta_{n,0} = z^{0} \cdot 1 = 1. \]

Therefore,

\[ F(z,0) = \phi(z) e^{0} = \phi(z), \]

hence,

\[ \phi(z) = 1. \]

Therefore, the generating function is

\[ F(z,t) = e^{(z^{-1} + z - 2)\alpha t}. \]

Now, to find the probability function, $P_n(t)$, we either Taylor expand the exponential, or use contour integration. We shall not go through how to do it; but the answer we get is

\[ P_n(t) = e^{-2\alpha t} I_{|n|}(2\alpha t), \]

where $I_n$ is a modified Bessel function. Now, the mean is just

\[ \langle n(t) \rangle = \left. \frac{\partial F}{\partial z} \right|_{z=1} = \left. (1 - z^{-2}) \, \alpha t \, e^{(z^{-1} + z - 2)\alpha t} \right|_{z=1} = 0. \]

Thus, the mean is zero. This can be intuitively seen; if we consider that the probability of going right is the same as going left, and that the system starts at zero, then we go left “as much” as right.

Similarly, since the mean vanishes,

\[ \langle n(n-1)(t) \rangle = \langle n^2(t) \rangle = \left. \frac{\partial^2 F}{\partial z^2} \right|_{z=1} = 2\alpha t. \]

Notice that

\[ \sigma^2 = \langle n^2 \rangle - \langle n \rangle^2 = \langle n^2 \rangle, \]

and therefore that the root-mean-square distance goes as $t^{1/2}$.
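The quoted solution $P_n(t) = e^{-2\alpha t} I_{|n|}(2\alpha t)$ can be checked against the moments just derived. The sketch below evaluates the modified Bessel function from its standard power series $I_\nu(x) = \sum_k (x/2)^{2k+\nu} / (k!\,(k+\nu)!)$; the truncation lengths, the site cutoff and the parameter values are illustrative assumptions.

```python
from math import exp, factorial

def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), integer order >= 0, from its power series."""
    return sum(
        (x / 2) ** (2 * k + order) / (factorial(k) * factorial(k + order))
        for k in range(terms)
    )

def walk_probability(n, alpha, t):
    """P_n(t) = exp(-2 alpha t) I_|n|(2 alpha t) for a walker started at n = 0."""
    return exp(-2 * alpha * t) * bessel_i(abs(n), 2 * alpha * t)

alpha, t, cutoff = 1.0, 1.5, 40     # illustrative values; sites with |n| > cutoff are neglected
P = {n: walk_probability(n, alpha, t) for n in range(-cutoff, cutoff + 1)}

norm = sum(P.values())
mean = sum(n * p for n, p in P.items())
second_moment = sum(n * n * p for n, p in P.items())
print(norm, mean, second_moment, 2 * alpha * t)
```

The distribution is normalised, symmetric about the origin, and its second moment matches $2\alpha t$, so the spread grows as $t^{1/2}$.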


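Example: Simple Birth-Death Processes Here, let

\[ r_n = dn, \qquad g_n = bn; \]

then, the one-step master equation reads

\[ \frac{dP_n}{dt} = d(n+1) P_{n+1}(t) + b(n-1) P_{n-1}(t) - n(b+d) P_n(t), \qquad n > 0. \]

Now, we must be careful at the boundary. Consider $n = 0$, then $P_{-1}$ makes “no sense”. So, we impose

\[ \frac{dP_0}{dt} = d P_1(t). \]

So then, to solve our master equation, we do the usual thing. Multiply by $z^n$ and sum over $n$,

\[ \sum_{n=0}^{\infty} z^{n} \frac{dP_n}{dt} = d \sum_{n=0}^{\infty} z^{n} (n+1) P_{n+1}(t) + b \sum_{n=1}^{\infty} z^{n} (n-1) P_{n-1}(t) - (b+d) \sum_{n=1}^{\infty} n z^{n} P_n(t), \]

then, we switch summation indices;

\[ \sum_{n=0}^{\infty} z^{n} \frac{dP_n}{dt} = d \sum_{n=0}^{\infty} z^{n-1} n P_n(t) + b \sum_{n=0}^{\infty} z^{n+1} n P_n(t) - (b+d) \sum_{n=0}^{\infty} n z^{n} P_n(t). \]

Now, notice that

\begin{align*}
\sum_{n} n z^{n-1} P_n &= \frac{\partial}{\partial z} \sum_{n} z^{n} P_n, \\
\sum_{n} n z^{n+1} P_n &= z^2 \sum_{n} n z^{n-1} P_n = z^2 \frac{\partial}{\partial z} \sum_{n} z^{n} P_n, \\
\sum_{n} n z^{n} P_n &= z \sum_{n} n z^{n-1} P_n = z \frac{\partial}{\partial z} \sum_{n} z^{n} P_n.
\end{align*}

Finally, we know that

\[ F = \sum_{n} z^{n} P_n, \]

as that is how we defined the generating function. Therefore, the master equation reads

\[ \frac{\partial F}{\partial t} = d \frac{\partial F}{\partial z} + b z^2 \frac{\partial F}{\partial z} - (b+d) z \frac{\partial F}{\partial z}, \]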


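which is easily rearranged into

\[ \frac{\partial F}{\partial t} = (d - bz)(1 - z) \frac{\partial F}{\partial z}. \]

The solution to this, we verify, rather than prove. Let us change variables,

\[ u \equiv \frac{1 - z}{d - bz} e^{(b-d)t}, \qquad v \equiv z. \]

Then,

\begin{align*}
\frac{\partial}{\partial z} &= \frac{\partial u}{\partial z} \frac{\partial}{\partial u} + \frac{\partial v}{\partial z} \frac{\partial}{\partial v} = \frac{\partial u}{\partial z} \frac{\partial}{\partial u} + \frac{\partial}{\partial v}, \\
\frac{\partial}{\partial t} &= \frac{\partial u}{\partial t} \frac{\partial}{\partial u} + \frac{\partial v}{\partial t} \frac{\partial}{\partial v} = \frac{\partial u}{\partial t} \frac{\partial}{\partial u}.
\end{align*}

Since $\partial u / \partial t = (b-d) u = (d - bz)(1 - z) \, \partial u / \partial z$, the $\partial / \partial u$ terms cancel. Thus, the partial differential equation that is our master equation can be seen to just become

\[ \frac{\partial F}{\partial v} = 0 \quad \Rightarrow \quad F(u,v) = \Phi(u), \]

where $\Phi(u)$ is an arbitrary function of $u$ only. Then,

\[ F(z,t) = \Phi\left( \frac{1 - z}{d - bz} e^{(b-d)t} \right). \]

We now use the initial condition that at $t = 0$, $P_n(0) = \delta_{n,n_0}$. Then,

\[ F(z,0) = \sum_{n=0}^{\infty} z^{n} \delta_{n,n_0} = z^{n_0}. \]

Therefore,

\[ \Phi\left( \frac{1 - z}{d - bz} \right) = z^{n_0}. \]

Now, if we let

\[ x \equiv \frac{1 - z}{d - bz} \quad \Rightarrow \quad z = \frac{1 - dx}{1 - bx}, \]

hence,

\[ \Phi(x) = \left( \frac{1 - dx}{1 - bx} \right)^{n_0}. \]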


Thus, substituting $z$ back in, we arrive at

\[ F(z,t) = \left( \frac{d(z-1) e^{(b-d)t} - (bz - d)}{b(z-1) e^{(b-d)t} - (bz - d)} \right)^{n_0}. \]

If one were so inclined, then one would find $P_n(t)$ from this; however, we shall not!

We can deduce a couple of things from this expression for $F(z,t)$. First, consider that

\[ F(z,t) = \sum_{n=0}^{\infty} z^{n} P_n(t), \]

then, notice that setting $z = 0$ has the effect of picking out a single component from the sum,

\[ F(0,t) = P_0(t). \]

That is, setting $z = 0$ in the generating function gives the probability of attaining the zero-state. Thus, for our generating function,

\[ F(0,t) = P_0(t) = \left( \frac{d - d e^{(b-d)t}}{d - b e^{(b-d)t}} \right)^{n_0}, \]

which is obviously the extinction probability.

Now, suppose that $d > b$ (i.e. death rate greater than birth rate). Then, as $t \to \infty$, $e^{(b-d)t} \to 0$. Hence, $P_0(t) \to 1$. This is an intuitively expected result.

Suppose that $b > d$. Then,

\[ \lim_{t \to \infty} P_0(t) = \left( \frac{d}{b} \right)^{n_0}. \]
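The two limits of the extinction probability can be confirmed by evaluating the generating function numerically. This is a small sketch under stated assumptions: the rates, the initial population `n0 = 3` and the late time `t = 50` are illustrative, and the formula as written requires $b \neq d$.

```python
from math import exp

def generating_function(z, t, b, d, n0):
    """F(z, t) for the simple birth-death process with P_n(0) = delta_{n, n0}; needs b != d."""
    growth = exp((b - d) * t)
    numerator = d * (z - 1) * growth - (b * z - d)
    denominator = b * (z - 1) * growth - (b * z - d)
    return (numerator / denominator) ** n0

def extinction_probability(t, b, d, n0):
    """P_0(t) = F(0, t)."""
    return generating_function(0.0, t, b, d, n0)

n0 = 3                                                               # illustrative initial population
dying = extinction_probability(50.0, b=1.0, d=2.0, n0=n0)           # d > b
growing = extinction_probability(50.0, b=2.0, d=1.0, n0=n0)         # b > d
initial = generating_function(0.7, 0.0, b=2.0, d=1.0, n0=n0)        # should equal 0.7 ** n0
normalised = generating_function(1.0, 1.0, b=2.0, d=1.0, n0=n0)     # F(1, t) = sum_n P_n(t)
print(dying, growing, initial, normalised)
```

Even when births outpace deaths, a finite fraction $(d/b)^{n_0}$ of realisations still dies out, because early fluctuations can carry a small population to zero.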

3.2.6.3 The Macroscopic Equation

Recall that for the decay process we found that

\[ \frac{d}{dt} \langle n(t) \rangle = -\gamma \langle n(t) \rangle, \]

which is entirely deterministic. If we let $x(t) \equiv \langle n(t) \rangle$, then the solution to this is the well known

\[ x(t) = x_0 e^{-\gamma t}. \]

These are non-stochastic, deterministic equations, obtained after taking an ensemble average. They will not tell you how each particle moves in a given “run” of the experiment, but they will tell you how that particle will move “on average”. We can compute an analogous equation for the general one-step process.

Recall the master equation for one-step processes, (3.2.21). Then, let us multiply it by $n$, and sum over all $n$. Thus,

\[ \sum_{n} n \frac{dP_n}{dt} = \sum_{n} n r_{n+1} P_{n+1}(t) + \sum_{n} n g_{n-1} P_{n-1}(t) - \sum_{n} n (g_n + r_n) P_n(t). \]

Let us change summation indices on the first and second sums. For the first, let $m \equiv n+1$, and in the second let $q \equiv n-1$. Then,

\[ \frac{d}{dt} \sum_{n} n P_n = \sum_{m} (m-1) r_m P_m(t) + \sum_{q} (q+1) g_q P_q(t) - \sum_{n} n (g_n + r_n) P_n(t). \]

It is clear that we may change the indices back to $n$'s. On doing that, we see that terms cancel, leaving

\[ \frac{d}{dt} \sum_{n} n P_n = \sum_{n} g_n P_n(t) - \sum_{n} r_n P_n(t), \]

which is just the expectation values (ensemble average),

\[ \frac{d}{dt} \langle n(t) \rangle = \langle g_n \rangle - \langle r_n \rangle. \tag{3.2.22} \]

To check that this is consistent with the previous case for the decay process, recall that $g_n = 0$, and $r_n = \gamma n$. We thus see that this general expression holds for our specifically derived case of the decay process. This equation is called the macroscopic equation.

We have been a little sloppy in reference to boundaries. Suppose we have boundaries at $n = 0, N$. Then, if we define $r_0 \equiv 0$ (i.e. no transitions from 0 to $-1$), $g_{-1} \equiv 0$ (i.e. no transitions from $-1$ to 0), $r_{N+1} = g_N \equiv 0$ (i.e. no transitions from $N+1$ to $N$ or $N$ to $N+1$), then all results hold.

Let us consider the standard few examples again.

Example: Random Walk We have that $r_n = \mu$, $g_n = \lambda$. Therefore, the macroscopic equation is

\[ \frac{d}{dt} \langle n(t) \rangle = \lambda - \mu. \]

This is easy to solve; resulting in

\[ \langle n(t) \rangle = (\lambda - \mu) t + n_0. \]


Example: Birth & Death Processes Here, we have that $r_n = dn$, $g_n = bn$. Hence, the macroscopic equation is

\[ \frac{d}{dt} \langle n(t) \rangle = (b - d) \langle n(t) \rangle, \]

which has solution

\[ \langle n(t) \rangle = n_0 e^{(b-d)t}. \]

Now, for a more complicated example.

3.2.6.4 Long Term Behaviour of Non-linear Processes

Consider

\[ r_n = \alpha n^2, \qquad g_n = \beta, \]

where $\alpha, \beta$ are constants. Then,

\[ \frac{d}{dt} \langle n \rangle = \beta - \alpha \langle n^2 \rangle; \]

where we suppress implied time-dependence. Now this is hard to solve, as we do not know the expectation value of $n^2$. So, we instead look at the long-term behaviour of $P_n(t)$. The central limit theorem tells us that for a big enough system, the probability distribution will tend towards Gaussianity. Therefore, the initial $\delta$-style peak will eventually “widen” into a Gaussian, probably moving position as it does so. And, as a simplification, we set all fluctuations (i.e. the variance) to zero. Hence,

\[ \sigma^2 = \langle n^2 \rangle - \langle n \rangle^2 = 0 \quad \Rightarrow \quad \langle n^2 \rangle = \langle n \rangle^2. \]

This carries on to the other moments as well;

\[ \langle n^3 \rangle \to \langle n \rangle^3, \qquad \langle n^4 \rangle \to \langle n \rangle^4, \qquad \ldots \]

Then, our macroscopic equation reads

\[ \frac{d}{dt} \langle n \rangle = \beta - \alpha \langle n \rangle^2. \]
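The closed macroscopic equation can now be integrated like any ordinary differential equation. As a rough sketch (the constants, the starting value and the forward-Euler scheme are assumptions made for illustration), the mean relaxes to the value at which the right-hand side vanishes, $\langle n \rangle = \sqrt{\beta/\alpha}$.

```python
from math import sqrt

def integrate_macroscopic(x0, alpha, beta, dt, steps):
    """Forward-Euler integration of dx/dt = beta - alpha x^2, with x standing for <n>."""
    x = x0
    for _ in range(steps):
        x += dt * (beta - alpha * x * x)
    return x

alpha, beta = 0.01, 4.0             # illustrative constants
x_late = integrate_macroscopic(x0=0.0, alpha=alpha, beta=beta, dt=0.01, steps=20000)
x_fixed = sqrt(beta / alpha)        # where beta - alpha x^2 = 0
print(x_late, x_fixed)
```

This only describes the mean under the zero-variance approximation; it says nothing about the width of the distribution around that value.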

3.2.6.5 Boundaries

Let us consider the boundary conditions of a one-step process. There are obviously a few types. The ranges of $n$ are specified as being one of 3 types.

• All integers $-\infty < n < \infty$. Such a system needs no boundary conditions, as there are no boundaries.
• Semi-infinite $n = 0, 1, 2, \ldots$. An example of such a system is a bacteria colony obeying the simple birth-death rules.


[Figure: $P_n(t)$ plotted against $n$ at successive times, starting from a spike at $n_0$ at $t = 0$, with the means $\langle n(t) \rangle$ marked.]

Fig. 3.4. Evolution of the probability distribution, to a Gaussian. The distribution starts off as an infinite spike, then spreads out over time. The shape changes, as does the position.

• Finite $0 \le n \le N$. That is, some sort of boundary at $n = 0, N$. For example, when we wrote $r_n = dn$, $g_n = bn(1 - \frac{n}{N})$.

Then, we can have a few types of boundary: reflecting, where the walker is reflected from the boundary (whereby total probability is conserved); and absorbing, where the walker is removed from the system as soon as he reaches the boundary.

Absorbing boundaries do not conserve the total probability.

3.2.6.6 Stationary States of the Master Equation

Consider the difference between a stationary state & the equilibrium state. The equilibrium state is the state the system will “settle down to”, if all external influences are removed. Now, consider the stationary state. Suppose a metal bar is heated at both ends, with each end held at a different temperature. Then, given enough time, the temperature of the bar will have settled down, so that the temperature at a given point is constant; and the variation of temperature along the bar is only a function of position, and not of time. Then, this state is the stationary state, and is obviously not the equilibrium state (as we are applying heat to the bar ends). We shall denote the stationary state $P^{\mathrm{st}}_n$. Obviously, as the stationary state is not a function of time (as we just discussed), then

\[ \frac{dP^{\mathrm{st}}_n}{dt} = 0. \]

[Figure: states $n-1$, $n$, $n+1$ in a row, with rates $g_{n-1}$, $g_n$ for steps to the right and $r_n$, $r_{n+1}$ for steps to the left.]

Fig. 3.5. The movement from one state to the next, for a one-step process.

Therefore, the master equation reads

\[ r_{n+1} P^{\mathrm{st}}_{n+1} + g_{n-1} P^{\mathrm{st}}_{n-1} - (g_n + r_n) P^{\mathrm{st}}_{n} = 0, \]

which trivially rearranges to

\[ r_{n+1} P^{\mathrm{st}}_{n+1} - g_n P^{\mathrm{st}}_{n} = r_n P^{\mathrm{st}}_{n} - g_{n-1} P^{\mathrm{st}}_{n-1}, \qquad \forall n. \]

Now, let us define the LHS as $-J_{n+1}$ and the RHS as $-J_n$ (where the signs are introduced by convention, which we will soon see). Then, we see that

\[ J_{n+1} = J_n, \qquad \forall n. \]

Therefore, we see that $J_n$ is independent of $n$. So then, we have that

\[ J_{n+1} = g_n P^{\mathrm{st}}_{n} - r_{n+1} P^{\mathrm{st}}_{n+1}, \]

which has the interpretation of being the net flow of probability from state $n$ to state $n+1$. Hence, we call $J$ the probability current (without the subscript as it is independent of $n$). Thus,

\[ r_{n+1} P^{\mathrm{st}}_{n+1} - g_n P^{\mathrm{st}}_{n} = -J. \]

Now, a reflecting boundary at $n = 0$ is one for which $J = 0$. This must be the case, as there is no probability flow from $n = 0$ to $n = -1$; thus, as the current is conserved, we must have $J = 0$ everywhere. Therefore, we see that

\[ r_1 P^{\mathrm{st}}_{1} = g_0 P^{\mathrm{st}}_{0}, \quad r_2 P^{\mathrm{st}}_{2} = g_1 P^{\mathrm{st}}_{1}, \quad \ldots, \quad r_n P^{\mathrm{st}}_{n} = g_{n-1} P^{\mathrm{st}}_{n-1}, \quad \ldots \]


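Therefore, from the first expression, we see that

\[ P^{\mathrm{st}}_{1} = \frac{g_0}{r_1} P^{\mathrm{st}}_{0}, \]

and from the second that

\[ P^{\mathrm{st}}_{2} = \frac{g_1}{r_2} P^{\mathrm{st}}_{1}, \]

thus, combining,

\[ P^{\mathrm{st}}_{2} = \frac{g_1 g_0}{r_2 r_1} P^{\mathrm{st}}_{0}. \]

Therefore, iterating up, we see that

\[ P^{\mathrm{st}}_{n} = \frac{g_{n-1} g_{n-2} \ldots g_0}{r_n r_{n-1} \ldots r_1} P^{\mathrm{st}}_{0}. \tag{3.2.23} \]

Now then, we have that all probabilities must sum to unity,

\[ 1 = \sum_{n=0}^{N} P^{\mathrm{st}}_{n}. \]

Therefore, taking out the 0-state,

\[ 1 = P^{\mathrm{st}}_{0} + \sum_{n=1}^{N} P^{\mathrm{st}}_{n}. \]

So, inserting our expression for $P^{\mathrm{st}}_{n}$,

\[ 1 = P^{\mathrm{st}}_{0} \left( 1 + \sum_{n>0} \frac{g_{n-1} g_{n-2} \ldots g_0}{r_n r_{n-1} \ldots r_1} \right), \]

which easily rearranges to

\[ P^{\mathrm{st}}_{0} = \frac{1}{1 + \sum_{n>0} \frac{g_{n-1} g_{n-2} \ldots g_0}{r_n r_{n-1} \ldots r_1}}. \tag{3.2.24} \]

Hence, between this and (3.2.23), we are able to compute all parts of the stationary state probability.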
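Equations (3.2.23) and (3.2.24) translate directly into a short routine. The sketch below uses exact fractions; the function name and argument layout are assumptions, and the rates are those of the four-site random walker treated later in this section, with its sites 1 to 4 relabelled as $n = 0, \ldots, 3$.

```python
from fractions import Fraction

def stationary_state(g, r):
    """Stationary distribution of a one-step process on n = 0..N with reflecting
    boundaries, using (3.2.23) and (3.2.24).
    g[n] is the rate n -> n+1 for n = 0..N-1; r[n] is the rate n -> n-1 for
    n = 1..N (r[0] is unused)."""
    weights = [Fraction(1)]                     # P_n / P_0, starting with n = 0
    for n in range(1, len(r)):
        weights.append(weights[-1] * g[n - 1] / r[n])
    total = sum(weights)                        # equals 1 / P_0
    return [weight / total for weight in weights]

g = [Fraction(1), Fraction(3, 4), Fraction(3, 4)]
r = [Fraction(0), Fraction(1, 4), Fraction(1, 4), Fraction(1)]
P_st = stationary_state(g, r)

# Zero probability current between every pair of neighbouring states.
zero_current = all(g[n] * P_st[n] == r[n + 1] * P_st[n + 1] for n in range(len(g)))
print(P_st, zero_current)
```

Building the ratios $P_n / P_0$ first and normalising at the end avoids ever needing $P_0$ in advance.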

3.2.7 Solution to Master Equation Under Detailed Balance

Let us return to the general master equation,

\[ \frac{dP_n(t)}{dt} = \sum_{n' \neq n} w_{nn'} P_{n'}(t) - \sum_{n' \neq n} w_{n'n} P_n(t), \]


then, the stationary state clearly is when

\[ \sum_{n' \neq n} w_{nn'} P^{\mathrm{st}}_{n'} = \sum_{n' \neq n} w_{n'n} P^{\mathrm{st}}_{n}. \]

This is because the stationary state is the state which does not depend upon time.

Now, suppose that the transition rates, $w_{nn'}$, satisfy detailed balance, where detailed balance is the statement that

\[ w_{nn'} P^{\mathrm{st}}_{n'} = w_{n'n} P^{\mathrm{st}}_{n}, \qquad \forall n, n'. \]

In words, this statement is: “at stationarity, the flow of probability into a state $n$ from $n'$ is equal to the flow of probability into $n'$ from $n$”.

For a one-step process, we have that $n' = n+1$, which clearly corresponds to

\[ r_{n+1} P^{\mathrm{st}}_{n+1} = g_n P^{\mathrm{st}}_{n}, \]

which is exactly what we found for the system with reflecting boundaries ($J = 0$).

Now, let us write the master equation in a more concise form. Let us introduce the transition matrix (taking $w_{nn} \equiv 0$)

\[ W_{mn}(t) \equiv w_{mn}(t) - \delta_{mn} \sum_{n' \neq n} w_{n'n}(t). \tag{3.2.25} \]

Then, multiplying this by $P_n(t)$, and summing over $n$,

\begin{align*}
\sum_{n} W_{mn}(t) P_n(t) &= \sum_{n \neq m} P_n(t) w_{mn}(t) - \sum_{n} \sum_{n' \neq n} \delta_{mn} P_n(t) w_{n'n}(t) \\
&= \sum_{n \neq m} P_n(t) w_{mn}(t) - \sum_{n' \neq m} P_m(t) w_{n'm}(t) \\
&= \frac{dP_m}{dt}.
\end{align*}

Therefore, we see that we can write the master equation as

\[ \frac{dP_m}{dt} = \sum_{n} W_{mn}(t) P_n(t), \tag{3.2.26} \]

which is a form very close to the previous Markov chains.

Let us consider the properties of the matrix $W$.

• As $w_{nm} \ge 0$, then, if $n \neq m$, the Kronecker-delta term does not contribute, leaving $W_{nm} \ge 0$. That is, all off-diagonal elements of $W$ are greater than or equal to zero.


• Consider summing $W_{nm}$ up a column. That is,

\[ \sum_{n} W_{nm} = \sum_{n} w_{nm} - \sum_{n} \delta_{nm} \sum_{n' \neq m} w_{n'm}, \]

which is,

\[ \sum_{n} W_{nm} = \sum_{n \neq m} w_{nm} - \sum_{n' \neq m} w_{n'm} = 0. \]

Thus, we see that all columns of $W$ add to zero.

To see what the last property “does” to an example $W$, consider a 3x3 case. Its form must be

\[ W = \begin{pmatrix} -a-b & c & e \\ a & -c-d & f \\ b & d & -e-f \end{pmatrix}. \]

Notice that we can write the continuous Markov equation, (3.2.26), in ket-notation,

\[ \frac{d}{dt} |P(t)\rangle = W(t) |P(t)\rangle, \]

where $W(t)$ is a time dependent operator. Notice that this equation is of very similar form to the Schrödinger equation. In writing this ket-form, we use

\[ P_n(t) = \langle n | P(t) \rangle, \qquad W_{nn'} = \langle n | W | n' \rangle. \]

Now, in principle, we can solve the continuous Markov equation as we did the discrete equation. When we solved the discrete equation, we implicitly assumed that there existed a complete set of eigenvalues and eigenstates. This is not always guaranteed. However, if the matrix is symmetric, then this method will always work.

Now, $W$, in general, is not symmetric. Then, constructing from $W$ some matrix, $V$ say, that is symmetric will enable us to use previous methods. In fact, as we shall see, a process satisfying the detailed balance condition allows this.

Now, let us define a new matrix,

\begin{align*}
V_{mn} &\equiv \sqrt{\frac{P^{\mathrm{st}}_{n}}{P^{\mathrm{st}}_{m}}} \, W_{mn} \tag{3.2.27} \\
&= \sqrt{\frac{P^{\mathrm{st}}_{n}}{P^{\mathrm{st}}_{m}}} \left( w_{mn} - \delta_{mn} \sum_{n' \neq n} w_{n'n} \right).
\end{align*}


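Now, to see that $V_{mn}$ is symmetric, we shall use the detailed balance condition

\[ w_{nm} P^{\mathrm{st}}_{m} = w_{mn} P^{\mathrm{st}}_{n}, \]

which we rewrite in the form

\[ w_{nm} \sqrt{\frac{P^{\mathrm{st}}_{m}}{P^{\mathrm{st}}_{n}}} = w_{mn} \sqrt{\frac{P^{\mathrm{st}}_{n}}{P^{\mathrm{st}}_{m}}}. \]

Now then, notice that $V_{mn}$ can be rewritten, using this,

\begin{align*}
V_{mn} &= \sqrt{\frac{P^{\mathrm{st}}_{n}}{P^{\mathrm{st}}_{m}}} \, w_{mn} - \delta_{mn} \sqrt{\frac{P^{\mathrm{st}}_{n}}{P^{\mathrm{st}}_{m}}} \sum_{n' \neq n} w_{n'n} \\
&= \sqrt{\frac{P^{\mathrm{st}}_{n}}{P^{\mathrm{st}}_{m}}} \, w_{mn} - \delta_{mn} \sum_{n' \neq n} w_{n'n} \\
&= \sqrt{\frac{P^{\mathrm{st}}_{m}}{P^{\mathrm{st}}_{n}}} \, w_{nm} - \delta_{mn} \sum_{n' \neq n} w_{n'n} \\
&= V_{nm}.
\end{align*}

The steps follow as the Kronecker-delta reduces the square-root term to unity. Therefore, we see that $V_{mn}$ is symmetric, if the detailed balance assumption holds. Thus, $V = V^{T}$.

Hence, using (3.2.27), the master equation reads

\[ \frac{dP_m}{dt} = \sum_{n} V_{mn} \sqrt{\frac{P^{\mathrm{st}}_{m}}{P^{\mathrm{st}}_{n}}} \, P_n, \]

which is easily rewritten as

\[ \frac{d}{dt} \frac{P_m}{\sqrt{P^{\mathrm{st}}_{m}}} = \sum_{n} V_{mn} \frac{P_n}{\sqrt{P^{\mathrm{st}}_{n}}}. \]

If we then define

\[ \tilde{P}_n \equiv \frac{P_n}{\sqrt{P^{\mathrm{st}}_{n}}}, \]

the master equation reads

\[ \frac{d\tilde{P}_m}{dt} = \sum_{n} V_{mn} \tilde{P}_n. \]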


Now this is in a form which we can solve using previous methods. This is because the operator $V$ is symmetric. Note that this, in ket-notation, reads

\[ \frac{d}{dt} |\tilde{P}(t)\rangle = V |\tilde{P}(t)\rangle. \]

The formal solution to this is

\[ |\tilde{P}(t)\rangle = e^{Vt} |\tilde{P}(0)\rangle, \]

assuming that $V$ is time independent. We can check this easily,

\[ \frac{d}{dt} |\tilde{P}(t)\rangle = V e^{Vt} |\tilde{P}(0)\rangle = V |\tilde{P}(t)\rangle. \]

This formal solution is actually pretty useless. We can explicitly solve the equation using eigenvalues and eigenvectors.

Now, as $V$ is symmetric, the left- and right-eigenvectors are identical. That is,

\[ V |\phi^{(i)}\rangle = \mu^{(i)} |\phi^{(i)}\rangle, \qquad \langle \phi^{(i)} | V = \langle \phi^{(i)} | \mu^{(i)}. \]

The “superscript with bracket” notation reads: the $i$th right-eigenvector is $|\phi^{(i)}\rangle$, with eigenvalue $\mu^{(i)}$. The eigenvectors are orthonormal, and satisfy the completeness relation,

\[ \langle \phi^{(i)} | \phi^{(j)} \rangle = \delta_{ij}, \qquad \sum_{i} |\phi^{(i)}\rangle \langle \phi^{(i)}| = I. \]

Also, since the matrix $V$ is symmetric, all eigenvalues are real.

The stationary state is the state which is independent of time. Therefore, the time derivative of the stationary state is zero. Hence, the stationary state satisfies

\[ \sum_{n} V_{mn} \tilde{P}^{\mathrm{st}}_{n} = 0. \]

We can write this as an eigenvalue equation, so that

\[ V |\tilde{P}^{\mathrm{st}}\rangle = 0 \cdot |\tilde{P}^{\mathrm{st}}\rangle. \]

That is, the state $\tilde{P}^{\mathrm{st}}_{n}$ is an eigenstate of $V$, with eigenvalue 0. Notice the difference in notation between $|P\rangle$ and $P_n$. We generally use the former to refer to the entire state $P$, whereas the latter is referring to a single component of $P$.

Now, we can take our “formal solution”,

\[ |\tilde{P}(t)\rangle = e^{Vt} |\tilde{P}(0)\rangle, \]


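insert a unity,

\[ |\tilde{P}(t)\rangle = e^{Vt} \sum_{i} |\phi^{(i)}\rangle \langle \phi^{(i)} | \tilde{P}(0) \rangle. \]

Now, we note that for an operator $A$ with eigenstates $|\psi^{(i)}\rangle$ and eigenvalues $a^{(i)}$,

\[ e^{At} |\psi^{(i)}\rangle = e^{a^{(i)} t} |\psi^{(i)}\rangle, \]

then

\[ |\tilde{P}(t)\rangle = \sum_{i} e^{\mu^{(i)} t} |\phi^{(i)}\rangle \langle \phi^{(i)} | \tilde{P}(0) \rangle. \]

Now, by completeness, we can write that a single component of $\tilde{P}$ is

\begin{align*}
\tilde{P}_n(t) &= \langle n | \tilde{P}(t) \rangle \\
&= \langle n | \sum_{i} e^{\mu^{(i)} t} |\phi^{(i)}\rangle \langle \phi^{(i)} | \tilde{P}(0) \rangle \\
&= \langle n | \sum_{i,m} e^{\mu^{(i)} t} |\phi^{(i)}\rangle \langle \phi^{(i)} | m \rangle \langle m | \tilde{P}(0) \rangle \\
&= \sum_{i,m} e^{\mu^{(i)} t} \langle n | \phi^{(i)} \rangle \langle \phi^{(i)} | m \rangle \langle m | \tilde{P}(0) \rangle \\
&= \sum_{i} e^{\mu^{(i)} t} \phi^{(i)}_{n} \sum_{m} \phi^{(i)}_{m} \tilde{P}_m(0).
\end{align*}

We now see that the $n$th component of the state at some time $t$ is found by summing, over all eigenstates, the $n$th component of each eigenvector multiplied by the overlap of that eigenvector with the original state. Thus, we have derived that

\[ \tilde{P}_n(t) = \sum_{i} e^{\mu^{(i)} t} \phi^{(i)}_{n} \sum_{m} \phi^{(i)}_{m} \tilde{P}_m(0). \tag{3.2.28} \]

Notice that the only time-dependent term is the exponential, which carries the eigenvalue $\mu^{(i)}$.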
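The spectral solution (3.2.28) can be tried out on the smallest possible case. The sketch below is not from the lectures: it assumes a hypothetical two-state system with rate `a` for the jump $1 \to 2$ and `b` for $2 \to 1$, for which $P^{\mathrm{st}} = (b, a)/(a+b)$, detailed balance holds automatically, and the eigenvectors of $V$ can be written down by hand. The result is compared with a direct forward-Euler integration of $dP/dt = WP$.

```python
from math import exp, sqrt

def spectral_solution(a, b, P0, t):
    """Two-state master equation solved through the symmetric matrix V, as in (3.2.28)."""
    P_st = [b / (a + b), a / (a + b)]
    norm = sqrt(a + b)
    # Orthonormal eigenvectors of V = [[-a, sqrt(ab)], [sqrt(ab), -b]] with their eigenvalues.
    modes = [
        (0.0, [sqrt(b) / norm, sqrt(a) / norm]),
        (-(a + b), [sqrt(a) / norm, -sqrt(b) / norm]),
    ]
    P_tilde0 = [P0[m] / sqrt(P_st[m]) for m in range(2)]
    P_tilde = [
        sum(exp(mu * t) * phi[n] * sum(phi[m] * P_tilde0[m] for m in range(2))
            for mu, phi in modes)
        for n in range(2)
    ]
    return [P_tilde[n] * sqrt(P_st[n]) for n in range(2)]   # undo the rescaling

def euler_solution(a, b, P0, t, steps=20000):
    """Direct forward-Euler integration of dP/dt = W P, for comparison."""
    P1, P2 = P0
    dt = t / steps
    for _ in range(steps):
        flow = dt * (b * P2 - a * P1)       # net probability flowing into state 1
        P1, P2 = P1 + flow, P2 - flow
    return [P1, P2]

a, b, t = 1.0, 3.0, 0.4                     # illustrative rates and time
spectral = spectral_solution(a, b, [1.0, 0.0], t)
direct = euler_solution(a, b, [1.0, 0.0], t)
print(spectral, direct)
```

The zero mode reproduces the stationary state, and the single decaying mode, with $\mu = -(a+b)$, carries all of the time dependence.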

This is, again, very close to the discrete Markov equation solution. The difference between this continuous solution, and the discrete solution, is that $e^{\mu^{(i)} t}$ replaces $(\lambda^{(i)})^{t}$. That is, the correspondence between continuous and discrete is

\[ \underbrace{\lambda^{(i)}}_{\text{discrete}} \longleftrightarrow \underbrace{e^{\mu^{(i)}}}_{\text{continuous}}. \]

In the discrete case, we had that all eigenvalues lay between zero and unity, $0 \le \lambda^{(i)} \le 1$. In the continuous case, we thus have that all eigenvalues are less than or equal to zero, $\mu^{(i)} \le 0$, with $\mu = 0$ belonging to the stationary state. In fact, if one recalls that the case $-1 \le \lambda^{(i)} \le 0$ corresponded to a solution which flicked between states, then this transfers over, as it corresponds to complex $\mu^{(i)}$, which is an oscillatory term.

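We can also investigate the correspondence between our eigenstates $\phi^{(i)}$ and the left- and right-eigenstates $\chi^{(i)}, \psi^{(i)}$.

Before we continue, it is worth stressing that the bracketed-superscript denotes the “number” of the vector (i.e. first, second, third, etc), and the subscript the component of that vector.

Now, the continuous eigenstates satisfy

\[ \sum_{n} V_{mn} \phi^{(i)}_{n} = \mu^{(i)} \phi^{(i)}_{m}, \]

or, in terms of $W$,

\begin{align*}
\sum_{n} \sqrt{\frac{P^{\mathrm{st}}_{n}}{P^{\mathrm{st}}_{m}}} \, W_{mn} \phi^{(i)}_{n} &= \mu^{(i)} \phi^{(i)}_{m} \\
\Rightarrow \quad \sum_{n} \sqrt{P^{\mathrm{st}}_{n}} \, W_{mn} \phi^{(i)}_{n} &= \sqrt{P^{\mathrm{st}}_{m}} \, \mu^{(i)} \phi^{(i)}_{m}. \tag{3.2.29}
\end{align*}

Notice that if we define

\[ \psi^{(i)}_{n} \equiv \sqrt{P^{\mathrm{st}}_{n}} \, \phi^{(i)}_{n}, \]

then (3.2.29) reads

\[ \sum_{n} W_{mn} \psi^{(i)}_{n} = \psi^{(i)}_{m} \mu^{(i)}. \]

Now, if we use the symmetry of $V$, then we also have that

\begin{align*}
\sum_{n} V_{nm} \phi^{(i)}_{n} &= \mu^{(i)} \phi^{(i)}_{m} \\
\Rightarrow \quad \sum_{n} \sqrt{\frac{P^{\mathrm{st}}_{m}}{P^{\mathrm{st}}_{n}}} \, W_{nm} \phi^{(i)}_{n} &= \mu^{(i)} \phi^{(i)}_{m} \\
\Rightarrow \quad \sum_{n} W_{nm} \frac{\phi^{(i)}_{n}}{\sqrt{P^{\mathrm{st}}_{n}}} &= \mu^{(i)} \frac{\phi^{(i)}_{m}}{\sqrt{P^{\mathrm{st}}_{m}}}. \tag{3.2.30}
\end{align*}

Then, if we define

\[ \chi^{(i)}_{n} \equiv \frac{\phi^{(i)}_{n}}{\sqrt{P^{\mathrm{st}}_{n}}}, \]

then (3.2.30) reads

\[ \sum_{n} \chi^{(i)}_{n} W_{nm} = \mu^{(i)} \chi^{(i)}_{m}. \]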


Therefore, we have an interpretation of the left- and right-eigenvectors of $W$, in terms of the eigenvectors of $V$. To summarise the different states of each matrix $V, W$:

• The symmetric $V$ has eigenstates whose $i$th vector has $n$th component $\phi^{(i)}_{n}$. The right- and left-eigenstates of $V$ are identical, so we do not distinguish between them.
• The matrix $W$ has different right- and left-eigenstates. The left-eigenstate is denoted $\chi^{(i)}_{n}$, and the right $\psi^{(i)}_{n}$. They relate back to the eigenstates $\phi^{(i)}_{n}$ of $V$ via

\[ \psi^{(i)}_{n} \equiv \sqrt{P^{\mathrm{st}}_{n}} \, \phi^{(i)}_{n}, \qquad \chi^{(i)}_{n} \equiv \frac{\phi^{(i)}_{n}}{\sqrt{P^{\mathrm{st}}_{n}}}. \]

• All eigenstates of both $V, W$ correspond to eigenvalues $\mu^{(i)}$.

As before, the long-term behaviour of the system is characterised by the eigenstates corresponding to the eigenvalues closest to 0. This is because all non-zero eigenvalues of the continuous master equation are negative, and a large negative eigenvalue will cause its contribution to die off exponentially quickly.

Also as before, the stationary state is such that

\[ \chi^{(1)}_{n} = 1, \qquad \psi^{(1)}_{n} = P^{\mathrm{st}}_{n}, \qquad \forall n. \]

Example: Random Walker Consider a random walker, with 4 lattice sites. The transition rates are (a given)

\[ w_{21} = 1, \qquad w_{32} = \tfrac{3}{4}, \qquad w_{43} = \tfrac{3}{4}, \]

\[ w_{34} = 1, \qquad w_{23} = \tfrac{1}{4}, \qquad w_{12} = \tfrac{1}{4}. \]

Fig. 3.6. The 1D random walker, with 4 lattice sites. Also shown are the transition rates from each state.

Then, using (3.2.25),

\[ W_{mn}(t) \equiv w_{mn}(t) - \delta_{mn} \sum_{n' \neq n} w_{n'n}(t), \]

we see that the matrix $W$ is

\[ W = \begin{pmatrix} -1 & 1/4 & 0 & 0 \\ 1 & -1 & 1/4 & 0 \\ 0 & 3/4 & -1 & 1 \\ 0 & 0 & 3/4 & -1 \end{pmatrix}. \]

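Notice that the columns add to zero.

Let us find whether or not detailed balance holds. The stationary state is the state for which

\[
W \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\]

the solution of which is

\[
\begin{pmatrix} a \\ 4a \\ 12a \\ 9a \end{pmatrix} .
\]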

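Now, to find $a$, we consider the normalisation condition between the stationary state and the unit state. That is,

\[
\begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} a \\ 4a \\ 12a \\ 9a \end{pmatrix} = 1,
\]

thus $26a = 1$, and hence $a = 1/26$. Therefore, the stationary state is

\[
\left| P^{\mathrm{st}} \right\rangle = \begin{pmatrix} 1/26 \\ 2/13 \\ 6/13 \\ 9/26 \end{pmatrix} .
\]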

That is, after enough time, the system will be found at lattice point 1 once in 26 goes, at lattice point 2 twice in 13 goes, at lattice point 3 six times in 13 goes, and at lattice point 4 nine times in 26 goes. That is, we have the probability of finding the system at a given lattice site, after a long time.
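The stationary state above can be checked with exact rational arithmetic. The following is a minimal Python sketch, not part of the lectures: the helper names `build_W` and `stationary_nearest_neighbour` are our own, and the row-by-row solve assumes a chain with nearest-neighbour hops only.

```python
from fractions import Fraction as F

# Rates w[(n, m)] = rate of hopping from site m to site n (sites 1..4),
# copied from the example above.
w = {(2, 1): F(1), (3, 2): F(3, 4), (4, 3): F(3, 4),
     (3, 4): F(1), (2, 3): F(1, 4), (1, 2): F(1, 4)}

def build_W(w, size):
    """W[m][n] = w_mn - delta_mn * sum_{n' != n} w_{n'n} (0-based indices)."""
    W = [[w.get((m + 1, n + 1), F(0)) for n in range(size)] for m in range(size)]
    for n in range(size):
        W[n][n] = -sum(W[m][n] for m in range(size) if m != n)
    return W

def stationary_nearest_neighbour(W):
    """Solve W P = 0 row by row for a nearest-neighbour chain, then normalise."""
    size = len(W)
    P = [F(1)]  # unnormalised: fix the first component to 1
    P.append(-W[0][0] * P[0] / W[0][1])  # row 0
    for k in range(1, size - 1):         # rows 1 .. size-2
        P.append(-(W[k][k - 1] * P[k - 1] + W[k][k] * P[k]) / W[k][k + 1])
    total = sum(P)
    return [p / total for p in P]

W = build_W(w, 4)
P_st = stationary_nearest_neighbour(W)
column_sums = [sum(W[m][n] for m in range(4)) for n in range(4)]
# The last row of W P = 0 was not used in the solve, so it is a genuine check.
residual = [sum(W[m][n] * P_st[n] for n in range(4)) for m in range(4)]
print(P_st)
```

Exact fractions are used so that the comparison with $1/26$, $2/13$, $6/13$, $9/26$ does not depend on floating-point rounding.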

To determine if detailed balance holds, we must test

\[
w_{nm} P^{\mathrm{st}}_m = w_{mn} P^{\mathrm{st}}_n ,
\]

for all pairs $n, m$. So,

\[
w_{21} P^{\mathrm{st}}_1 = w_{12} P^{\mathrm{st}}_2 \;\Rightarrow\; 1 \cdot \frac{1}{26} = \frac{1}{4} \cdot \frac{2}{13},
\]

\[
w_{23} P^{\mathrm{st}}_3 = w_{32} P^{\mathrm{st}}_2 \;\Rightarrow\; \frac{1}{4} \cdot \frac{6}{13} = \frac{3}{4} \cdot \frac{2}{13},
\]

\[
\ldots
\]

all of which we find to be true. Therefore, detailed balance holds.

To find the eigenvalues of $W$, we must determine

\[
\begin{vmatrix}
-1-\mu & 1/4 & 0 & 0 \\
1 & -1-\mu & 1/4 & 0 \\
0 & 3/4 & -1-\mu & 1 \\
0 & 0 & 3/4 & -1-\mu
\end{vmatrix} = 0.
\]

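The characteristic equation is thus

\[
(1+\mu)^4 - \frac{19}{16}(1+\mu)^2 + \frac{3}{16} = 0.
\]

We can "read off" some eigenvalues. We can see that $\mu = 0, -2$ (that is, $(1+\mu)^2 = 1$) are easily attained from this. To get the other two, define $x \equiv (1+\mu)^2$. Then, the characteristic equation is

\[
x^2 - \frac{19}{16}x + \frac{3}{16} = 0,
\]

which factorises to

\[
(x - 1)\left(x - \frac{3}{16}\right) = 0.
\]

The root $x = 1$ reproduces $\mu = 0, -2$, and the root $x = 3/16$ gives two more values of $\mu$:

\[
\mu = -1 \pm \frac{\sqrt{3}}{4}.
\]

Therefore, all the eigenvalues are

\[
\mu = 0, \quad -1 + \frac{\sqrt{3}}{4}, \quad -1 - \frac{\sqrt{3}}{4}, \quad -2.
\]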
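As a sanity check on these four eigenvalues, one can evaluate $\det(W - \mu I)$ directly. The sketch below is our own illustration (the function names `det` and `char_poly` are not from the lectures); it uses a plain Laplace expansion, which is perfectly adequate for a $4\times 4$ matrix.

```python
import math

W = [[-1.0, 0.25, 0.0, 0.0],
     [1.0, -1.0, 0.25, 0.0],
     [0.0, 0.75, -1.0, 1.0],
     [0.0, 0.0, 0.75, -1.0]]

def det(matrix):
    """Determinant by Laplace expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def char_poly(mu):
    """det(W - mu*I), which vanishes at every eigenvalue mu of W."""
    shifted = [[W[i][j] - (mu if i == j else 0.0) for j in range(4)]
               for i in range(4)]
    return det(shifted)

eigenvalues = [0.0, -1 + math.sqrt(3) / 4, -1 - math.sqrt(3) / 4, -2.0]
residuals = [char_poly(mu) for mu in eigenvalues]

# Compare with the quartic in (1 + mu) at an arbitrary test point.
mu_test = 0.5
quartic = (1 + mu_test) ** 4 - 19 / 16 * (1 + mu_test) ** 2 + 3 / 16
print(residuals, char_poly(mu_test), quartic)
```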

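To get $V$ from $W$, we must multiply each element $W_{mn}$ of $W$ by $\sqrt{P^{\mathrm{st}}_n / P^{\mathrm{st}}_m}$. This results in

\[
V = \begin{pmatrix}
-1 & 1/2 & 0 & 0 \\
1/2 & -1 & \sqrt{3}/4 & 0 \\
0 & \sqrt{3}/4 & -1 & \sqrt{3}/2 \\
0 & 0 & \sqrt{3}/2 & -1
\end{pmatrix} .
\]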
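The symmetrisation can be verified numerically. This short sketch (our own, using only the numbers of this example) builds $V_{mn} = \sqrt{P^{\mathrm{st}}_n / P^{\mathrm{st}}_m}\, W_{mn}$ and compares it with the matrix quoted above.

```python
import math

P_st = [1 / 26, 2 / 13, 6 / 13, 9 / 26]
W = [[-1.0, 0.25, 0.0, 0.0],
     [1.0, -1.0, 0.25, 0.0],
     [0.0, 0.75, -1.0, 1.0],
     [0.0, 0.0, 0.75, -1.0]]

# V_mn = sqrt(P_n / P_m) * W_mn
V = [[math.sqrt(P_st[n] / P_st[m]) * W[m][n] for n in range(4)]
     for m in range(4)]

s3 = math.sqrt(3)
expected = [[-1.0, 0.5, 0.0, 0.0],
            [0.5, -1.0, s3 / 4, 0.0],
            [0.0, s3 / 4, -1.0, s3 / 2],
            [0.0, 0.0, s3 / 2, -1.0]]

max_asymmetry = max(abs(V[m][n] - V[n][m]) for m in range(4) for n in range(4))
max_mismatch = max(abs(V[m][n] - expected[m][n])
                   for m in range(4) for n in range(4))
print(max_asymmetry, max_mismatch)
```

That $V$ comes out symmetric is exactly the statement that detailed balance holds for this walker.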


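To find the eigenvectors, say $\phi^{(i)}$ (i.e. of $V$), we use the relation

\[
\chi^{(i)}_n = \frac{\phi^{(i)}_n}{\sqrt{P^{\mathrm{st}}_n}} .
\]

Now, we can easily read off the eigenvector $\phi^{(1)}$, as we know that $\chi^{(1)}_n = 1$, and we know what the $P^{\mathrm{st}}_n$ are (the components of the stationary state). Therefore,

\[
\left| \phi^{(1)} \right\rangle = \begin{pmatrix} (1/26)^{1/2} \\ (2/13)^{1/2} \\ (6/13)^{1/2} \\ (9/26)^{1/2} \end{pmatrix} .
\]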

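Suppose that the system starts in state 1 at $t = 0$. That is, $P_n(0) = \delta_{1n}$. Then, from the expression

\[
\tilde{P}_n(t) = \sum_{i,m} e^{\mu^{(i)} t} \phi^{(i)}_n \phi^{(i)}_m \tilde{P}_m(0),
\]

and

\[
\tilde{P}_n = \frac{P_n}{\sqrt{P^{\mathrm{st}}_n}},
\]

then

\[
P_n(t) = \sum_{m,i} e^{\mu^{(i)} t} \phi^{(i)}_n \phi^{(i)}_m \sqrt{\frac{P^{\mathrm{st}}_n}{P^{\mathrm{st}}_m}}\, P_m(0).
\]

If we put the initial condition in, we see that only the case with $m = 1$ contributes,

\[
P_n(t) = \sum_{i=1}^{4} e^{\mu^{(i)} t} \phi^{(i)}_n \phi^{(i)}_1 \sqrt{\frac{P^{\mathrm{st}}_n}{P^{\mathrm{st}}_1}} .
\]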

We can simplify this. Note that

\[
\phi^{(i)}_n = \chi^{(i)}_n \sqrt{P^{\mathrm{st}}_n}, \qquad \chi^{(1)}_n = 1,
\]

then

\[
\phi^{(1)}_n = \sqrt{P^{\mathrm{st}}_n}, \qquad \phi^{(1)}_1 = \sqrt{P^{\mathrm{st}}_1} .
\]

Hence,

\[
P_n(t) = e^{\mu^{(1)} t} \phi^{(1)}_n \phi^{(1)}_1 \sqrt{\frac{P^{\mathrm{st}}_n}{P^{\mathrm{st}}_1}} + \sum_{i=2}^{4} e^{\mu^{(i)} t} \phi^{(i)}_n \phi^{(i)}_1 \sqrt{\frac{P^{\mathrm{st}}_n}{P^{\mathrm{st}}_1}}
= e^{\mu^{(1)} t} P^{\mathrm{st}}_n + \sum_{i=2}^{4} e^{\mu^{(i)} t} \phi^{(i)}_n \phi^{(i)}_1 \sqrt{\frac{P^{\mathrm{st}}_n}{P^{\mathrm{st}}_1}} .
\]

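Finally, if we have ordered our eigenvalues in order of "least negativity", then $\mu^{(1)} = 0$, so that $e^{\mu^{(1)} t} = 1$, and

\[
P_n(t) = P^{\mathrm{st}}_n + \sum_{i=2}^{4} e^{\mu^{(i)} t} \phi^{(i)}_n \phi^{(i)}_1 \sqrt{\frac{P^{\mathrm{st}}_n}{P^{\mathrm{st}}_1}} .
\]

Then, if $\mu^{(2)}$ is less negative than $\mu^{(3)}, \mu^{(4)}$, that term will be the main contributor to the approach to the stationary state at large time. Therefore, at large time,

\[
P_n(t) \approx P^{\mathrm{st}}_n + e^{\mu^{(2)} t} \phi^{(2)}_n \phi^{(2)}_1 \sqrt{\frac{P^{\mathrm{st}}_n}{P^{\mathrm{st}}_1}} .
\]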
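The relaxation towards the stationary state can also be seen by integrating $dP/dt = WP$ directly. The sketch below is only an illustration: it uses an explicit Euler scheme with a step size `dt` and step count chosen by us, rather than the eigenvector expansion derived above.

```python
W = [[-1.0, 0.25, 0.0, 0.0],
     [1.0, -1.0, 0.25, 0.0],
     [0.0, 0.75, -1.0, 1.0],
     [0.0, 0.0, 0.75, -1.0]]
P_st = [1 / 26, 4 / 26, 12 / 26, 9 / 26]

def euler_step(P, W, dt):
    """One explicit Euler step of the master equation dP/dt = W P."""
    size = len(P)
    return [P[m] + dt * sum(W[m][n] * P[n] for n in range(size))
            for m in range(size)]

def evolve(P0, W, dt, steps):
    """Apply `steps` Euler steps starting from the distribution P0."""
    P = list(P0)
    for _ in range(steps):
        P = euler_step(P, W, dt)
    return P

# Start in state 1, P_n(0) = delta_{1n}, and evolve to t = 50.
P_final = evolve([1.0, 0.0, 0.0, 0.0], W, dt=0.01, steps=5000)
max_deviation = max(abs(p - q) for p, q in zip(P_final, P_st))
print(P_final, max_deviation)
```

Since the slowest non-zero eigenvalue is $-1 + \sqrt{3}/4 \approx -0.57$, by $t = 50$ the transient has decayed far below the tolerance used here.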

Example: Periodic Lattice Let us consider a random walk on a periodic lattice. That is, we identify one end with the other. Then, the walker walks on a circular line. We define the transition rates

\[
w_{21} = 3/4, \quad w_{32} = 3/4, \quad w_{43} = 3/4, \quad w_{14} = 3/4,
\]

\[
w_{12} = 1/4, \quad w_{23} = 1/4, \quad w_{34} = 1/4, \quad w_{41} = 1/4.
\]

Therefore, from these, we can write the transition matrix,

\[
W = \begin{pmatrix}
-1 & 1/4 & 0 & 3/4 \\
3/4 & -1 & 1/4 & 0 \\
0 & 3/4 & -1 & 1/4 \\
1/4 & 0 & 3/4 & -1
\end{pmatrix},
\]

whereby we introduce the diagonal elements so that columns sum to zero.

Now, the stationary state is found by solving

\[
W \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\]

which easily solves to give

\[
a = b = c = d,
\]

thus,

\[
P^{\mathrm{st}} = \begin{pmatrix} a \\ a \\ a \\ a \end{pmatrix} .
\]

We find the value of $a$ by requiring normalisation,

\[
\begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} a \\ a \\ a \\ a \end{pmatrix} = 1.
\]

Therefore, we see that $a = 1/4$, and the stationary state is

\[
P^{\mathrm{st}} = \begin{pmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{pmatrix} .
\]

Now, we must check that the condition for detailed balance holds. That is, we must check that

\[
w_{nn'} P^{\mathrm{st}}_{n'} = w_{n'n} P^{\mathrm{st}}_n
\]

holds for all $n, n'$. Now, as $P^{\mathrm{st}}_{n'} = P^{\mathrm{st}}_n = 1/4$, this reduces to the requirement that

\[
w_{nn'} = w_{n'n},
\]

which clearly is not true (for example, $w_{21} = 3/4$ but $w_{12} = 1/4$). Therefore, we see that detailed balance does not hold. Therefore, we cannot construct a symmetric matrix from which to solve the master equation. That is, there is always some current of probability.
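To make the "current of probability" concrete, the following sketch (our own; the helper `net_current` is a hypothetical name) checks that the uniform state is stationary on the ring and evaluates the net flow $w_{mn}P^{\mathrm{st}}_n - w_{nm}P^{\mathrm{st}}_m$ across each bond.

```python
from fractions import Fraction as F

# w[(n, m)] = rate of hopping from site m to site n on the 4-site ring.
w = {(2, 1): F(3, 4), (3, 2): F(3, 4), (4, 3): F(3, 4), (1, 4): F(3, 4),
     (1, 2): F(1, 4), (2, 3): F(1, 4), (3, 4): F(1, 4), (4, 1): F(1, 4)}
size = 4
P_st = [F(1, 4)] * size

# Transition matrix with diagonal chosen so that columns sum to zero.
W = [[w.get((m + 1, n + 1), F(0)) for n in range(size)] for m in range(size)]
for n in range(size):
    W[n][n] = -sum(W[m][n] for m in range(size) if m != n)

residual = [sum(W[m][n] * P_st[n] for n in range(size)) for m in range(size)]

def net_current(m, n):
    """Net stationary probability flow from site n to site m (1-based)."""
    return w.get((m, n), F(0)) * P_st[n - 1] - w.get((n, m), F(0)) * P_st[m - 1]

# Flow across each bond n -> n+1, with site 4 joined back to site 1.
currents = {(n, n % size + 1): net_current(n % size + 1, n)
            for n in range(1, size + 1)}
detailed_balance = all(c == 0 for c in currents.values())
print(currents, detailed_balance)
```

Every bond carries the same net flow, $1/8$, in the direction $1 \to 2 \to 3 \to 4 \to 1$: the state is stationary, yet probability circulates, which is precisely the failure of detailed balance.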

3.2.8 Summary

The theory of Markov processes came from the Chapman-Kolmogorov equation,

\[
p(x_f, t_f | x_i, t_i) = \int dx_m \, p(x_f, t_f | x_m, t_m)\, p(x_m, t_m | x_i, t_i),
\]

which can be thought of as being the probability of going from state $i$ to state $f$, over any possible intermediate state $m$.


Discrete Time For cases where time is discrete, we wrote the CK equation as

\[
P_n(t+1) = \sum_m Q_{nm}(t) P_m(t),
\]

where the transition matrix has elements

\[
Q_{mn}(t) = P(m, t+1 | n, t),
\]

describing the probability of making a transition from state $n$ to state $m$. The right eigenvectors of $Q$ are denoted $|\psi\rangle$, and the left $\langle\chi|$. We showed that powers of the matrix $Q$ can be computed via

\[
Q^t = \sum_i \lambda_i^t \left| \psi_i \right\rangle \left\langle \chi_i \right| ,
\]

where $\lambda_i$ is the $i$th eigenvalue of $Q$. All eigenvalues are less than or equal to one, in magnitude. The right-eigenvector corresponding to an eigenvalue of 1 is the stationary state. The elements of the left-eigenvector, with this eigenvalue, are all unity. Given the initial state of the system, $P(0)$, all subsequent states may be computed via

\[
P(t) = Q^t P(0).
\]

Continuous Time We wrote the CK equation, for systems with continuous time, in the form of a master equation,

\[
\frac{dP_n(t)}{dt} = \sum_{n' \neq n} w_{nn'}(t) P_{n'}(t) - \sum_{n' \neq n} w_{n'n}(t) P_n(t),
\]

where $w_{nn'}$ is the rate of transitions from the state $n'$ to the state $n$.

One-Step Processes If we have systems whose states only transition to other states in their immediate vicinity, then we write the master equation as

\[
\frac{dP_n(t)}{dt} = r_{n+1} P_{n+1}(t) + g_{n-1} P_{n-1}(t) - (g_n + r_n) P_n(t),
\]

where

\[
w_{n+1,n} \equiv g_n, \qquad w_{n-1,n} \equiv r_n .
\]

Linear one-step processes are solved by introducing the generating function,

\[
F(z, t) = \sum_n z^n P_n(t),
\]

and solving the subsequent differential equation. The rate of change of the expectation value of a stochastic variable was shown to be

\[
\frac{d}{dt} \langle n(t) \rangle = \langle g_n \rangle - \langle r_n \rangle .
\]

The stationary state is the state which is independent of time,

\[
\frac{dP^{\mathrm{st}}_n}{dt} = 0.
\]

Detailed Balance We can solve the master equation, if detailed balance holds,

\[
w_{nn'} P^{\mathrm{st}}_{n'} = w_{n'n} P^{\mathrm{st}}_n \qquad \forall n, n'.
\]

To aid solving the master equation, we define the transition matrix

\[
W_{mn}(t) = w_{mn}(t) - \delta_{mn} \sum_{n' \neq n} w_{n'n}(t),
\]

then the master equation can be written as

\[
\frac{dP_m(t)}{dt} = \sum_n W_{mn}(t) P_n(t).
\]

We further write

\[
V_{mn} = \sqrt{\frac{P^{\mathrm{st}}_n}{P^{\mathrm{st}}_m}}\, W_{mn} = V_{nm},
\]

which, if detailed balance holds, allows us to compute the solution to the master equation. The stationary state is the state which corresponds to a zero eigenvalue of $V$. If the eigenvectors of $V$ are $\phi^{(i)}$, with eigenvalues $\mu^{(i)}$, then the solution to the master equation is

\[
P_n(t) = \sum_{i,m} e^{\mu^{(i)} t} \phi^{(i)}_n \phi^{(i)}_m P_m(0) \sqrt{\frac{P^{\mathrm{st}}_n}{P^{\mathrm{st}}_m}},
\]

only if detailed balance holds.

We have so far discussed Markov processes for which time is either continuous or discrete. However, in both cases, the state space has been discrete. That is, there has always been a set number of values that the system can be in. We shall now consider systems for which the state space is continuous.


3.3 Drift & Diffusion

3.3.1 Introduction

Frequently, we do not want to work with a discrete state space such as

\[
n = \ldots, -2, -1, 0, 1, 2, \ldots .
\]

Now, we shall introduce the continuous description by example, but will go on to derive the main equation in a more rigorous manner.

In the last section, we introduced the Wright-Fisher model, which in essence reduces to picking $N$ genes to form the next generation of a population. The model worked via the set of rules

• Choose an individual at random,
• Copy it,
• Put copy into generation $t+1$,
• Put original back into generation $t$,
• Repeat until generation $t+1$ has as many individuals as generation $t$.

We then found that

\[
Q_{nn'} = \binom{N}{n} p^n q^{N-n}, \qquad p \equiv \frac{n'}{N}, \quad q \equiv 1 - p.
\]

A model which is in some sense similar, but is a one-step process, was introduced by Moran in 1958.

3.3.1.1 The Moran Model

At a given time, two individuals are chosen. One is designated the parent, which is copied to create an offspring; the other is sacrificed in order to make way for the new offspring.

Now, suppose that there are $n'$ individuals of type A at time $t$, and that there are $N - n'$ of type B. Then, the different ways in which we can pick out two individuals are

• Pick two A's. Then, the composition of the population is unchanged. This happens

\[
\left(\frac{n'}{N}\right)\left(\frac{n'-1}{N-1}\right)
\]

of the time.

• Pick two B's. Then, again, the composition does not change. This happens

\[
\left(\frac{N-n'}{N}\right)\left(\frac{(N-1)-n'}{N-1}\right)
\]

of the time.

• If A and B are picked, then this happens

\[
\frac{n'}{N}\left(\frac{(N-1)-(n'-1)}{N-1}\right) + \left(\frac{N-n'}{N}\right)\left(\frac{n'}{N-1}\right)
\]

of the time. This may be written as

\[
2\,\frac{n'}{N}\,\frac{N-n'}{N-1},
\]

and corresponds to either $n = n'+1$ or $n = n'-1$, where $n$ is the number of A's.

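Then, we can write the transition matrix as

\[
Q_{nn'} = \begin{cases}
\dfrac{n'}{N}\left(\dfrac{N-n'}{N-1}\right) & n = n'+1, \\[2ex]
1 - \dfrac{2n'(N-n')}{N(N-1)} & n = n', \\[2ex]
\dfrac{n'}{N}\left(\dfrac{N-n'}{N-1}\right) & n = n'-1.
\end{cases}
\]

We can see that these all sum to unity. Now, under the master equation formulation,

\[
w_{nn'} = \begin{cases}
\dfrac{n'}{N}\left(\dfrac{N-n'}{N-1}\right) & n = n'+1, \\[2ex]
\dfrac{n'}{N}\left(\dfrac{N-n'}{N-1}\right) & n = n'-1.
\end{cases}
\]

That is, a symmetric one-step process, with

\[
g_n = r_n = \frac{n}{N}\left(\frac{N-n}{N-1}\right).
\]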

Hence, notice that the macroscopic equation looks like

\[
\frac{d}{dt}\langle n \rangle = \langle g_n \rangle - \langle r_n \rangle = 0,
\]

thus, the ensemble average is a constant, $\langle n \rangle = n_0$, say.

We can now derive the diffusion approximation, for this specific model. The master equation reads

\[
\frac{dP(n,t)}{dt} = r_{n+1} P(n+1,t) + g_{n-1} P(n-1,t) - (r_n + g_n) P(n,t),
\]

where we have used the notation $P(n,t) \equiv P_n(t)$. Therefore, using our $g_n, r_n$,

\[
\frac{dP(n,t)}{dt} = \frac{N}{N-1}\left[ \left(\frac{n+1}{N}\right)\left(\frac{N-n-1}{N}\right) P(n+1,t)
+ \left(\frac{n-1}{N}\right)\left(\frac{N-n+1}{N}\right) P(n-1,t)
- 2\left(\frac{n}{N}\right)\left(\frac{N-n}{N}\right) P(n,t) \right].
\]

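Now, we make the substitution $x \equiv n/N$. In making this substitution, we see that when we had $n = 0, 1, 2, 3, \ldots$, we now have $x = \frac{0}{N}, \frac{1}{N}, \frac{2}{N}, \ldots$. Then, as $N \to \infty$, it is clear that $x$ becomes continuous. Before we substitute $x$ into the above master equation, we shall make clear what it is we are doing to $P(n+1,t)$. So,

\[
P(n+1,t) \longmapsto P\left(x + \tfrac{1}{N}, t\right) = P(x,t) + \frac{1}{N}\frac{\partial P}{\partial x} + \frac{1}{2N^2}\frac{\partial^2 P}{\partial x^2} + \ldots,
\]

that is, more or less a Taylor expansion. Note that we have changed the definition of the argument of $P$ slightly. Hence, using this, the RHS of the master equation reads

\[
\frac{N}{N-1}\left[ \left(x + \frac{1}{N}\right)\left(1 - x - \frac{1}{N}\right)\left(P(x,t) + \frac{1}{N}\frac{\partial P}{\partial x} + \frac{1}{2N^2}\frac{\partial^2 P}{\partial x^2} + \ldots\right)
+ \left(x - \frac{1}{N}\right)\left(1 - x + \frac{1}{N}\right)\left(P(x,t) - \frac{1}{N}\frac{\partial P}{\partial x} + \frac{1}{2N^2}\frac{\partial^2 P}{\partial x^2} + \ldots\right)
- 2x(1-x)P(x,t) \right].
\]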

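Upon expansion, one finds that

\[
\frac{\partial P(x,t)}{\partial t} = \frac{N}{N-1}\left[ \frac{1}{N^2}\frac{\partial^2}{\partial x^2}\left(x(1-x)P\right) + O\left(\frac{1}{N^3}\right) \right].
\]

Now, let us redefine time,

\[
\tau \equiv \frac{2t}{N(N-1)} .
\]

Hence, the master equation easily becomes

\[
\frac{\partial P(x,\tau)}{\partial \tau} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[x(1-x)P(x,\tau)\right] + O\left(\frac{1}{N}\right),
\]

then, if we take $N \to \infty$, we can neglect the last term. Therefore, we have

\[
\frac{\partial P(x,\tau)}{\partial \tau} = \frac{\partial^2}{\partial x^2}\left[D(x)P(x,\tau)\right], \qquad D(x) \equiv \frac{1}{2}x(1-x).
\]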


That is, a diffusion equation, with diffusion "constant" $D(x)$. Therefore, we have made the system's state space, and time, continuous, on $0 \leq x \leq 1$.
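The quality of this approximation can be probed numerically. In the sketch below (an illustration of ours, not from the lectures) we check that the Moran transition probabilities sum to unity, and compare the right-hand side of the discrete master equation with $\frac{1}{N(N-1)}\partial_x^2[x(1-x)P]$ for a smooth trial function. The trial function $P(x) = x^2$ and the values $N = 1000$, $x = 0.25$ are arbitrary assumptions made only for the test.

```python
from fractions import Fraction as F

def moran_column(n_prime, N):
    """Transition probabilities Q_{n n'} out of a state with n' type-A individuals."""
    step = F(n_prime * (N - n_prime), N * (N - 1))  # n' -> n' + 1 or n' - 1
    return {n_prime + 1: step, n_prime: 1 - 2 * step, n_prime - 1: step}

def trial_P(x):
    """Hypothetical smooth test function (not a solution of the model)."""
    return x * x

def master_rhs(x, N):
    """RHS of the Moran master equation with n = x*N, evaluated on trial_P."""
    h = 1.0 / N
    fP = lambda y: y * (1.0 - y) * trial_P(y)
    return N / (N - 1) * (fP(x + h) + fP(x - h) - 2.0 * fP(x))

def diffusion_rhs(x, N):
    """(1 / (N (N-1))) d^2/dx^2 [x(1-x) x^2], with the derivative done by hand."""
    return (6.0 * x - 12.0 * x * x) / (N * (N - 1))

N = 1000
column_total = sum(moran_column(250, N).values())
exact = master_rhs(0.25, N)
approx = diffusion_rhs(0.25, N)
relative_error = abs(exact - approx) / abs(approx)
print(column_total, relative_error)
```

For this trial function the two expressions differ only by a term of relative size $\sim 1/N^2$, consistent with the neglected higher-order terms.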

We shall now discuss a general derivation of the diffusion equation.

3.3.2 The Fokker-Planck Equation

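Let us start from the Chapman-Kolmogorov equation,

\[
P(x, t+\Delta t | x_0, t_0) = \int dx' \, P(x, t+\Delta t | x', t)\, P(x', t | x_0, t_0).
\]

In the following discussion, we shall ignore the initial conditions, as they will not play a role.

Now, let $x' = x - \Delta x$; then, the integrand of the CK equation is written

\[
P(x, t+\Delta t | x - \Delta x, t)\, P(x - \Delta x, t),
\]

or, alternatively, as

\[
P(x - \Delta x + \Delta x, t+\Delta t | x - \Delta x, t)\, P(x - \Delta x, t).
\]

Now, consider Taylor expanding a function of the form,

\[
f(x - \Delta x; \Delta x, t, t+\Delta t) = \sum_{\ell=0}^{\infty} \frac{(-\Delta x)^\ell}{\ell!} \frac{\partial^\ell}{\partial x^\ell} f(x; \Delta x, t, t+\Delta t).
\]

Also, notice that since $x' = x - \Delta x$ with $x$ held constant, integrating over $x'$ is the same as integrating over $\Delta x$, so that

\[
\int dx' \longmapsto \int d(\Delta x).
\]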

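Now, we shall introduce jump moments, defined as

\[
M_\ell(x, t, \Delta t) \equiv \int d(\Delta x)\, (\Delta x)^\ell P(x + \Delta x, t + \Delta t | x, t). \tag{3.3.1}
\]

Therefore, using this, the CK equation reads

\[
P(x, t+\Delta t) = \sum_{\ell=0}^{\infty} \frac{(-1)^\ell}{\ell!} \frac{\partial^\ell}{\partial x^\ell} \left[ M_\ell(x, t, \Delta t) P(x, t) \right]. \tag{3.3.2}
\]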

Let us consider what the jump moments look like. Changing variables, $\Delta x = z - x$, we see that

\[
M_\ell(x, t, \Delta t) = \int dz\, (z - x)^\ell P(z, t+\Delta t | x, t),
\]

taking $\ell = 0$,

\[
M_0(x, t, \Delta t) = \int dz\, P(z, t+\Delta t | x, t),
\]

which is just the sum over all probabilities, which is unity. That is, $M_0 = 1$. Moreover, at $\Delta t = 0$ the transition probability is just a delta-function,

\[
P(z, t | x, t) = \delta(z - x),
\]

so that the jump moments with $\ell \geq 1$ vanish at $\Delta t = 0$.

Now, we can expand the jump moments in powers of $\Delta t$, such that

\[
M_\ell(x, t, \Delta t) = D^{(\ell)}(x, t)\Delta t + O(\Delta t)^2, \qquad \ell \geq 1. \tag{3.3.3}
\]

Compare this with the transition matrix expansion in terms of the transition rates

\[
Q_{nn'} = w_{nn'}\Delta t + O(\Delta t)^2 .
\]

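Therefore, using this expansion, the CK equation reads

\[
P(x, t+\Delta t) = P(x, t) + \sum_{\ell=1}^{\infty} \frac{(-1)^\ell}{\ell!} \frac{\partial^\ell}{\partial x^\ell}\left( D^{(\ell)}(x,t) P(x,t) \right)\Delta t + O(\Delta t)^2,
\]

where we had to be careful of the $\ell = 0$ term. Taking the $P(x,t)$ term to the other side, and dividing by $\Delta t$ results in

\[
\frac{P(x, t+\Delta t) - P(x, t)}{\Delta t} = \sum_{\ell=1}^{\infty} \frac{(-1)^\ell}{\ell!} \frac{\partial^\ell}{\partial x^\ell}\left( D^{(\ell)}(x,t) P(x,t) \right) + O(\Delta t).
\]

Then, if we let $\Delta t \to 0$, the LHS becomes a partial derivative, and we ignore the $O(\Delta t)$-term on the RHS. That is,

\[
\frac{\partial P}{\partial t} = \sum_{\ell=1}^{\infty} \frac{(-1)^\ell}{\ell!} \frac{\partial^\ell}{\partial x^\ell}\left( D^{(\ell)}(x,t) P(x,t) \right), \tag{3.3.4}
\]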

which is known as the Kramers-Moyal expansion. In deriving this, all we assumed was the Markov assumption, and that the Taylor expansion holds.

In many cases, calculation of the jump moments $M_\ell$ shows that they are negligible for $\ell > 2$. Therefore, we may truncate the Kramers-Moyal expansion after second order, to obtain the Fokker-Planck equation,

\[
\frac{\partial P}{\partial t} = -\frac{\partial}{\partial x}\left( A(x,t) P \right) + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left( B(x,t) P \right), \tag{3.3.5}
\]

where

\[
A \equiv D^{(1)}, \qquad B \equiv D^{(2)}.
\]

Although we have included the time dependence of $A, B$, they are frequently independent of time.

Examples of $A$ and $B$ come from simple diffusion and the Moran model

• Simple diffusion: has $A = 0$ and $B = 2D$, where $D$ is a constant.

• Moran model: under $N \to \infty$, has $A = 0$ and $B = x(1-x)$.

3.3.2.1 Computing Jump Moments

So, we must be able to compute the jump moments, for a given process. Let us consider how to do so.

Let us use the previous notation for a conditional mean,

\[
\langle x(t) \rangle_{x(t_0)=x_0} = \int dx\, x\, P(x, t | x_0, t_0),
\]

where the notation reads "the mean value of the stochastic variable $x$ at time $t$, given that it had value $x_0$ at time $t_0$".

So, let us go back to the definition of the jump moment, (3.3.1), making the substitution $z \equiv x + \Delta x$, so that

\[
M_\ell(x, t, \Delta t) = \int dz\, (z - x)^\ell P(z, t+\Delta t | x, t).
\]

Now, notice that we can take the binomial expansion

\[
(z - x)^\ell = \sum_{\ell_1=0}^{\ell} \binom{\ell}{\ell_1} z^{\ell_1} (-x)^{\ell-\ell_1},
\]

so that the jump moment is

\[
M_\ell(x, t, \Delta t) = \int dz \sum_{\ell_1=0}^{\ell} \binom{\ell}{\ell_1} z^{\ell_1} (-x)^{\ell-\ell_1} P(z, t+\Delta t | x, t).
\]

Now, we take out of the integral things that do not depend upon $z$, giving

\[
M_\ell(x, t, \Delta t) = \sum_{\ell_1=0}^{\ell} \binom{\ell}{\ell_1} (-x)^{\ell-\ell_1} \int dz\, z^{\ell_1} P(z, t+\Delta t | x, t),
\]

however, we notice that the resulting integral is just the conditional mean of $z^{\ell_1}$, at time $t+\Delta t$, given that it had value $x$ at time $t$. That is,

\[
\int dz\, z^{\ell_1} P(z, t+\Delta t | x, t) = \left\langle z^{\ell_1}(t+\Delta t) \right\rangle_{z(t)=x},
\]

we change variables so that we use $x$ rather than $z$ for the stochastic variable (inside the average, $x(t+\Delta t)$ is the stochastic variable and $x$ is its fixed value at time $t$),

\[
\int dx\, x^{\ell_1} P(x, t+\Delta t | x, t) = \left\langle x^{\ell_1}(t+\Delta t) \right\rangle_{x(t)=x}.
\]

Therefore, the jump moment can be written as

\[
M_\ell(x, t, \Delta t) = \sum_{\ell_1=0}^{\ell} \binom{\ell}{\ell_1} (-x)^{\ell-\ell_1} \left\langle x^{\ell_1}(t+\Delta t) \right\rangle_{x(t)=x}.
\]

Now, notice that we can pull the $(-x)^{\ell-\ell_1}$ inside the expectation value (recall that it was just a constant with respect to the integral), so that

\[
M_\ell(x, t, \Delta t) = \sum_{\ell_1=0}^{\ell} \binom{\ell}{\ell_1} \left\langle (-x)^{\ell-\ell_1} x^{\ell_1}(t+\Delta t) \right\rangle_{x(t)=x},
\]

the RHS of which we notice is just the binomial expansion of

\[
M_\ell(x, t, \Delta t) = \left\langle \left[ x(t+\Delta t) - x \right]^\ell \right\rangle_{x(t)=x}.
\]

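Therefore, to calculate the jump moments, we need to compute the moments at time $t + \Delta t$, given that the system is fixed to be in state $x$ at time $t$.

To see what this means, suppose that a stochastic process is described by a one-step master equation, with the transition rates $r_n, g_n$. Recall that for a one-step process, we expand the probability out as

\[
P(n, t+\Delta t | n', t) = \begin{cases}
1 - (r_{n'} + g_{n'})\Delta t + O(\Delta t)^2 & n = n', \\
g_{n'}\Delta t + O(\Delta t)^2 & n = n'+1, \\
r_{n'}\Delta t + O(\Delta t)^2 & n = n'-1.
\end{cases}
\]

Then, the associated conditional mean is written

\[
\langle n(t+\Delta t) \rangle_{n(t)=n'} = \sum_n n\, P(n, t+\Delta t | n', t)
= n'\left(1 - (r_{n'} + g_{n'})\Delta t\right) + (n'+1) g_{n'}\Delta t + (n'-1) r_{n'}\Delta t + O(\Delta t)^2
= n' + \left(g_{n'} - r_{n'}\right)\Delta t + O(\Delta t)^2,
\]

hence, one can easily see that

\[
\left\langle n(t+\Delta t) - n' \right\rangle_{n(t)=n'} = \left(g_{n'} - r_{n'}\right)\Delta t + O(\Delta t)^2 .
\]

In fact, it is not too hard to see that

\[
\left\langle \left[ n(t+\Delta t) - n' \right]^\ell \right\rangle_{n(t)=n'} = \left(g_{n'} + r_{n'}(-1)^\ell\right)\Delta t + O(\Delta t)^2 .
\]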

Hence, dropping the prime, we arrive at a very useful relation for one-step processes,

\[
M_\ell(n, t, \Delta t) = \left\langle \left[ n(t+\Delta t) - n \right]^\ell \right\rangle_{n(t)=n} = \left(g_n + r_n(-1)^\ell\right)\Delta t + O(\Delta t)^2 . \tag{3.3.6}
\]

We can then read off that the coefficient of $\Delta t$ is just $D^{(\ell)}$, using (3.3.3),

\[
D^{(\ell)}(n, t) = \left(g_n + r_n(-1)^\ell\right); \tag{3.3.7}
\]

in most cases $D^{(\ell)}$ will not depend on $t$. If the state variable is rescaled, for instance $x = n/N$, each factor of $[x(t+\Delta t) - x]$ carries a factor $1/N$, so the jump moments of $x$ pick up an overall factor $1/N^\ell$.
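The relation for one-step processes is simple enough to encode directly. The sketch below is ours: the function name `kramers_moyal_coefficient` and its `scale` argument (the factor relating the continuous variable to $n$, e.g. a lattice spacing $L$ or $1/N$) are assumptions introduced for illustration, and the numerical values are arbitrary.

```python
from fractions import Fraction as F

def kramers_moyal_coefficient(ell, g, r, scale=1):
    """D^(ell) = scale**ell * (g + (-1)**ell * r) for a one-step process.

    `scale` relates the continuous variable to n via x = scale * n.
    """
    return scale ** ell * (g + (-1) ** ell * r)

# Symmetric walk with g = r = alpha on a lattice of spacing L.
alpha, L = F(1, 2), F(1, 10)
D1_walk = kramers_moyal_coefficient(1, alpha, alpha, L)
D2_walk = kramers_moyal_coefficient(2, alpha, alpha, L)

# Moran rates g_n = r_n = n (N - n) / (N (N - 1)), with x = n / N.
N, n = 10, 3
g = F(n * (N - n), N * (N - 1))
D1_moran = kramers_moyal_coefficient(1, g, g, F(1, N))
D2_moran = kramers_moyal_coefficient(2, g, g, F(1, N))
print(D1_walk, D2_walk, D1_moran, D2_moran)
```

For symmetric rates all odd coefficients vanish, so there is no drift term; only the even coefficients survive.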

3.3.2.2 Simple Diffusion

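Let us consider a simple, symmetric, random walk; with $r_n = g_n = \alpha$.

To make the diffusion approximation, let $x = nL$, where $L$ is a step size, so that

\[
M_\ell(x, t, \Delta t) = \left\langle \left[ x(t+\Delta t) - x \right]^\ell \right\rangle_{x(t)=x}
= L^\ell \left\langle \left[ n(t+\Delta t) - n \right]^\ell \right\rangle_{n(t)=n}
= L^\ell\left(\alpha + (-1)^\ell\alpha\right)\Delta t + O(\Delta t)^2,
\]

which allows us to read off that

\[
D^{(\ell)} = L^\ell\left(\alpha + (-1)^\ell\alpha\right),
\]

where we note that there is neither $x$ nor $t$ dependence. Hence, a few values of $D^{(\ell)}$ are

\[
D^{(1)} = 0, \quad D^{(2)} = 2L^2\alpha, \quad D^{(3)} = 0, \quad D^{(4)} = 2L^4\alpha, \quad \ldots
\]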

Therefore, the Kramers-Moyal expansion is

\[
\frac{\partial P}{\partial t} = \frac{2L^2\alpha}{2!}\frac{\partial^2 P}{\partial x^2} + \frac{2L^4\alpha}{4!}\frac{\partial^4 P}{\partial x^4} + \ldots
\]

Now, we want to let $L \to 0$, without killing off the entire equation. So, we rescale time,

\[
\tau \equiv L^2 t \quad\Rightarrow\quad \frac{\partial}{\partial t} = \frac{\partial \tau}{\partial t}\frac{\partial}{\partial \tau} = L^2\frac{\partial}{\partial \tau}.
\]

Hence, the Kramers-Moyal expansion looks like

\[
\frac{\partial P}{\partial \tau} = \frac{2\alpha}{2!}\frac{\partial^2 P}{\partial x^2} + \frac{2L^2\alpha}{4!}\frac{\partial^4 P}{\partial x^4} + \ldots,
\]

letting $L \to 0$ results in

\[
\frac{\partial P}{\partial \tau} = \alpha\frac{\partial^2 P}{\partial x^2}.
\]

This is a diffusion equation, with diffusion constant $\alpha$. Therefore, we have derived that a simple symmetric one-step process can be modelled as a diffusion process.
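A simple consequence of the diffusion equation is that the variance of $x$ grows as $2\alpha\tau$, i.e. the variance of $n$ grows as $2\alpha t$. The sketch below (our own illustration, with arbitrary values for $\alpha$, the time step and the lattice size) integrates the symmetric one-step master equation by explicit Euler steps and checks this growth. The lattice is chosen wide enough that no probability reaches the edges during the run.

```python
def evolve_symmetric_walk(alpha, dt, steps, half_width):
    """Euler-integrate dP_n/dt = alpha (P_{n+1} + P_{n-1} - 2 P_n), P_n(0) = delta_{n0}."""
    size = 2 * half_width + 1
    P = [0.0] * size
    P[half_width] = 1.0  # walker starts at n = 0 (the middle of the array)
    for _ in range(steps):
        new = P[:]
        for i in range(1, size - 1):
            new[i] = P[i] + dt * alpha * (P[i + 1] + P[i - 1] - 2.0 * P[i])
        P = new
    return P

def variance(P, half_width):
    """Variance of the site index n under the distribution P."""
    mean = sum((i - half_width) * p for i, p in enumerate(P))
    second = sum((i - half_width) ** 2 * p for i, p in enumerate(P))
    return second - mean ** 2

alpha, dt, steps, half_width = 1.0, 0.01, 50, 60
P = evolve_symmetric_walk(alpha, dt, steps, half_width)
var_n = variance(P, half_width)   # measured in units of lattice sites
expected = 2.0 * alpha * dt * steps
print(var_n, expected)
```

Each Euler step moves probability by at most one site, so with 50 steps and 60 sites either side of the origin the boundaries play no role.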

3.3.2.3 The Moran Model

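In the Moran model discussed above, we have that

\[
g_n = r_n = \frac{n(N-n)}{N(N-1)} .
\]

Now, we introduce

\[
x \equiv \frac{n}{N},
\]

so that the jump moments may be written as

\[
M_\ell(x, t, \Delta t) = \left\langle \left[ x(t+\Delta t) - x \right]^\ell \right\rangle_{x(t)=x} = \frac{1}{N^\ell}\left\langle \left[ n(t+\Delta t) - n \right]^\ell \right\rangle_{n(t)=n} .
\]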

Furthermore, notice from (3.3.7) that the only non-zero $D^{(\ell)}$ are those with $\ell$ even (as $r_n = g_n$).

Now, notice that with $n = xN$,

\[
r_n = g_n = \frac{xN(1-x)}{N-1},
\]

so that

\[
g_n - r_n = 0, \qquad g_n + r_n = \frac{2xN(1-x)}{N-1} .
\]

And then that

\[
D^{(\ell)} = \frac{1}{N^\ell}\left(g_n + (-1)^\ell r_n\right).
\]

Hence, a few of the $D^{(\ell)}$ are

\[
D^{(1)} = 0, \qquad D^{(2)} = \frac{2}{N(N-1)}x(1-x),
\]

\[
D^{(3)} = 0, \qquad D^{(4)} = \frac{2}{N^3(N-1)}x(1-x).
\]


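Therefore, the Kramers-Moyal expansion is

\[
\frac{\partial P}{\partial t} = \frac{2}{N(N-1)}\frac{1}{2!}\frac{\partial^2}{\partial x^2}\left(x(1-x)P\right) + O\left(\frac{1}{N^3(N-1)}\right).
\]

Now, we want to be able to let $N \to \infty$, without killing off the entire equation. So, to do so, we rescale time

\[
\tau \equiv \frac{2t}{N(N-1)},
\]

so that the Kramers-Moyal expansion reads

\[
\frac{\partial P}{\partial \tau} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left(x(1-x)P\right),
\]

after letting $N \to \infty$. This is the same result we arrived at earlier.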

3.3.2.4 The Moran Model: With Mutation

Now, in the original Moran model, we had $n'$ A alleles and $N - n'$ B alleles. We then picked two at random, one of which was copied, the other destroyed. Both the copy and the parent were then returned to the big pot of alleles.

Let us now suppose that the copy process is not perfect. That is, we introduce a small mutation probability, so that the offspring can contain a different allele to the parent.

For example, if the parent is type A, then the offspring is of type A with probability $(1-u)$, but of type B with probability $u$. That is, $u$ is the mutation probability from A to B. Similarly, we let $v$ be the mutation probability from B to A.

Then, what are the $g_n, r_n$ in this case? Let us be systematic in setting this up.

When does the number of A alleles increase by one, in a single time step? There are two ways this can happen

(i) Eliminate B and pick A as parent, with no mutation. Thus, the probability of this happening is

\[
(1-u)\frac{n'(N-n')}{N(N-1)} .
\]

(ii) Eliminate B and pick B as parent, with mutation. Then, the probability of this happening is

\[
v\frac{(N-n')(N-1-n')}{N(N-1)} .
\]

Therefore, the total probability of increasing is the sum of these two cases,

\[
g_{n'} = (1-u)\frac{n'(N-n')}{N(N-1)} + v\frac{(N-n')(N-1-n')}{N(N-1)} .
\]

When does the number of A alleles decrease by one, in a single time step? Again, there are two ways in which this can happen

(i) Eliminate A and pick B as parent, with no mutation; with probability

\[
(1-v)\frac{n'(N-n')}{N(N-1)} .
\]

(ii) Eliminate A and pick A as parent, with mutation; with probability

\[
u\frac{n'(n'-1)}{N(N-1)} .
\]

Therefore, the probability of decrease is the sum of these two probabilities

\[
r_{n'} = (1-v)\frac{n'(N-n')}{N(N-1)} + u\frac{n'(n'-1)}{N(N-1)} .
\]

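From here on, everything starts to get a little messy.

Now, we use the re-scaling $n \equiv xN$, so that

\[
g_n = (1-u)\frac{N}{N-1}x(1-x) + v\frac{N}{N-1}(1-x)\left(1 - x - \frac{1}{N}\right),
\]

\[
r_n = (1-v)\frac{N}{N-1}x(1-x) + u\frac{N}{N-1}x\left(x - \frac{1}{N}\right).
\]

Hence, their difference and sum are found to be

\[
g_n - r_n = \frac{N}{N-1}\left[ -ux\left(1 - \frac{1}{N}\right) + v(1-x)\left(1 - \frac{1}{N}\right) \right],
\]

\[
g_n + r_n = \frac{N}{N-1}\left[ 2x(1-x) + ux\left(2x - 1 - \frac{1}{N}\right) + v(1-x)\left(1 - 2x - \frac{1}{N}\right) \right].
\]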
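Since the algebra is messy, it is worth checking the closed forms against the definitions of $g_n$ and $r_n$. The following sketch is ours, with arbitrary test values for $N$, $u$, $v$; it does so exactly, using rational arithmetic. Note that the $v$-dependent part of the sum is taken here as $v(1-x)(1-2x-\frac{1}{N})$, which is what the definitions give and which mirrors the $u$-term under $x \leftrightarrow 1-x$.

```python
from fractions import Fraction as F

def g_rate(n, N, u, v):
    """Probability that the number of A alleles increases by one."""
    return ((1 - u) * F(n * (N - n), N * (N - 1))
            + v * F((N - n) * (N - 1 - n), N * (N - 1)))

def r_rate(n, N, u, v):
    """Probability that the number of A alleles decreases by one."""
    return ((1 - v) * F(n * (N - n), N * (N - 1))
            + u * F(n * (n - 1), N * (N - 1)))

def difference_closed_form(x, N, u, v):
    c = 1 - F(1, N)
    return F(N, N - 1) * (-u * x * c + v * (1 - x) * c)

def sum_closed_form(x, N, u, v):
    inv = F(1, N)
    return F(N, N - 1) * (2 * x * (1 - x)
                          + u * x * (2 * x - 1 - inv)
                          + v * (1 - x) * (1 - 2 * x - inv))

N, u, v = 20, F(1, 100), F(3, 100)   # arbitrary test values
checks = []
for n in range(N + 1):
    x = F(n, N)
    g, r = g_rate(n, N, u, v), r_rate(n, N, u, v)
    checks.append(g - r == difference_closed_form(x, N, u, v)
                  and g + r == sum_closed_form(x, N, u, v))
print(all(checks))
```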

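So, if we write the Kramers-Moyal expansion (up to second order), we see that

\[
\frac{\partial P}{\partial t} = -\frac{1}{N}\frac{N}{N-1}\frac{\partial}{\partial x}\left\{ \left[ -ux\left(1 - \frac{1}{N}\right) + v(1-x)\left(1 - \frac{1}{N}\right) \right] P \right\}
+ \frac{1}{2!\,N^2}\frac{N}{N-1}\frac{\partial^2}{\partial x^2}\left\{ \left[ 2x(1-x) + ux\left(2x - 1 - \frac{1}{N}\right) + v(1-x)\left(1 - 2x - \frac{1}{N}\right) \right] P \right\}.
\]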


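Now, if we rescale time, such that

\[
\tau \equiv \frac{2t}{N(N-1)},
\]

then the Kramers-Moyal expansion reads

\[
\frac{\partial P}{\partial \tau} = -\frac{N}{2}\frac{\partial}{\partial x}\left\{ \left[ -ux\left(1 - \frac{1}{N}\right) + v(1-x)\left(1 - \frac{1}{N}\right) \right] P \right\}
+ \frac{1}{4}\frac{\partial^2}{\partial x^2}\left\{ \left[ 2x(1-x) + ux\left(2x - 1 - \frac{1}{N}\right) + v(1-x)\left(1 - 2x - \frac{1}{N}\right) \right] P \right\}.
\]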

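Now, let us consider the first term,

\[
-\frac{N}{2}\frac{\partial}{\partial x}\left\{ \left[ -ux\left(1 - \frac{1}{N}\right) + v(1-x)\left(1 - \frac{1}{N}\right) \right] P \right\},
\]

then, if we do our usual thing of letting $N \to \infty$, this expression diverges. So, let us rescale the $u, v$ such that

\[
\tilde{u} \equiv \frac{uN}{2}, \qquad \tilde{v} \equiv \frac{vN}{2},
\]

then, this first term reads

\[
-\frac{\partial}{\partial x}\left\{ \left[ -\tilde{u}x\left(1 - \frac{1}{N}\right) + \tilde{v}(1-x)\left(1 - \frac{1}{N}\right) \right] P \right\}.
\]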

Now, we require that $\tilde{u}, \tilde{v}$ are finite as $N \to \infty$. This is equivalent to requiring that $u, v$ are very small. For example, if $N \sim 10^5$ and $u, v \sim 10^{-6}$, then this holds. Only under this condition does the diffusion approximation hold.

So, letting $N \to \infty$, this first term becomes

\[
-\frac{\partial}{\partial x}\left\{ \left[ -\tilde{u}x + \tilde{v}(1-x) \right] P \right\}.
\]

Furthermore, notice that the second term of the expansion has terms of order $u, v$, which go to zero under this approximation (since $u = 2\tilde{u}/N$ and $v = 2\tilde{v}/N$). Therefore, the Kramers-Moyal expansion reads

\[
\frac{\partial P}{\partial \tau} = -\frac{\partial}{\partial x}\left\{ \left[ -\tilde{u}x + \tilde{v}(1-x) \right] P \right\} + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left( x(1-x)P \right),
\]

which is a Fokker-Planck equation with

\[
A(x) = -\tilde{u}x + \tilde{v}(1-x), \qquad B(x) = x(1-x).
\]

Notice that without mutation, $A(x)$ vanishes. Also recall that this is only valid if the diffusion approximation is valid.


3.3.3 Properties of the Fokker-Planck Equation

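We use the acronym FPE to denote the Fokker-Planck equation.

We have seen that in one dimension, the FPE takes on the form

\[
\frac{\partial P}{\partial t} = -\frac{\partial}{\partial x}\left[ A(x,t)P \right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[ B(x,t)P \right], \tag{3.3.8}
\]

where $B(x,t)$ is defined via

\[
B(x,t)\Delta t + O(\Delta t)^2 = \left\langle \left[ x(t+\Delta t) - x \right]^2 \right\rangle_{x(t)=x} \geq 0.
\]

Hence, the first thing we see is that $B(x,t) \geq 0$.

Also, for most processes, $A$ and $B$ will not have an explicit time dependence.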

3.3.3.1 Probability Current

Now, we can rewrite the FPE in the form

\[
\frac{\partial P}{\partial t} + \frac{\partial J}{\partial x} = 0, \tag{3.3.9}
\]

where we have defined

\[
J(x,t) \equiv A(x,t)P - \frac{1}{2}\frac{\partial}{\partial x}\left[ B(x,t)P \right]. \tag{3.3.10}
\]

Now, it is clear that (3.3.9) is a continuity equation, where $J$ is the probability current.

If we compare this continuity equation with the master equation for a one-step process,

\[
\frac{dP_n}{dt} = J_n - J_{n+1},
\]

where $J_n \equiv g_{n-1}P_{n-1} - r_nP_n$, then we see an analogy with the above continuity equation. That is, there is a direct correspondence between the continuous derivative of $J$ and this discrete difference. That is, we can write this master equation using some discrete difference operator,

\[
\frac{dP_n}{dt} + \Delta_n J = 0, \qquad \Delta_n J \equiv J_{n+1} - J_n .
\]

3.3.3.2 Boundary Conditions

Suppose the system is defined on some interval, so that $x \in (a, b)$. Then, one can conceive of several types of boundary, distinguished by how the system behaves there.


Reflecting Boundaries Here, there is no net flow of probability across the boundaries $x = a$ and $x = b$. Hence, the current at the boundary is zero,

\[
J(a,t) = J(b,t) = 0.
\]

A consequence of this is that the normalisation of $P(x,t)$ is preserved over time. Let us integrate the continuity equation,

\[
\int_a^b \frac{\partial P}{\partial t}dx + \int_a^b \frac{\partial J}{\partial x}dx = 0,
\]

which easily gives

\[
\frac{d}{dt}\int_a^b P(x,t)dx + J(b,t) - J(a,t) = 0.
\]

Now, by having reflecting boundaries, the final two terms are zero, leaving the statement that

\[
\frac{d}{dt}\int_a^b P(x,t)dx = 0,
\]

which is the statement that the normalisation of the probability is constant (i.e. independent of time). Hence, if the probability is initially normalised,

\[
\int_a^b P(x,0)dx = 1,
\]

then it retains that normalisation throughout its evolution,

\[
\int_a^b P(x,t)dx = 1, \qquad \forall t.
\]

Absorbing Boundary Conditions Here, we say that the probability of being at the boundary is zero,

\[
P(a,t) = P(b,t) = 0.
\]

However, if the boundaries are at $x = \pm\infty$, then we stipulate that

\[
\lim_{x \to \pm\infty} P(x,t) = 0,
\]

that is, the probability must decay for large $x$. This decay means that $P$ is still normalisable. This also immediately implies that

\[
\lim_{x \to \pm\infty} \frac{\partial P}{\partial x} = 0, \qquad \lim_{x \to \pm\infty} J(x,t) = 0.
\]

In fact, this is only true if neither $A$ nor $B$ diverges as $x \to \pm\infty$.


3.3.3.3 The Adjoint Operator

We may write the FPE as

\[
\frac{\partial P}{\partial t} = LP,
\]

where

\[
L \equiv -\frac{\partial}{\partial x}A(x,t) + \frac{1}{2}\frac{\partial^2}{\partial x^2}B(x,t). \tag{3.3.11}
\]

So, now we have an equation which looks very much like Schrödinger's equation of quantum mechanics. The main difference is that there are no $i$'s or $\hbar$'s. In fact, one can view this as the Schrödinger equation, but in imaginary time.

Now, the operator $L$ is not necessarily Hermitian (it is real, but may not be symmetric). Its Hermitian conjugate is

\[
L^\dagger = A(x,t)\frac{\partial}{\partial x} + \frac{1}{2}B(x,t)\frac{\partial^2}{\partial x^2} .
\]

3.3.3.4 The Stationary State

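From now on, we shall assume that $A$ and $B$ are independent of time $t$. That is, they are $A(x), B(x)$ only.

The stationary state $P^{\mathrm{st}}$ is that state which is independent of time. Therefore, the FPE reduces to

\[
-\frac{d}{dx}\left[ A(x)P^{\mathrm{st}}(x) \right] + \frac{1}{2}\frac{d^2}{dx^2}\left[ B(x)P^{\mathrm{st}}(x) \right] = 0,
\]

where we have deliberately moved from partial to ordinary derivatives. Now, notice that the continuity equation then becomes,

\[
\frac{\partial P^{\mathrm{st}}}{\partial t} + \frac{\partial J}{\partial x} = 0 \quad\Rightarrow\quad \frac{\partial J}{\partial x} = 0,
\]

that is, the current $J$ is constant. Now, if we look at the definition of $J$, (3.3.10), then we see that

\[
J = A(x)P^{\mathrm{st}}(x) - \frac{1}{2}\frac{\partial}{\partial x}\left[ B(x)P^{\mathrm{st}}(x) \right] = \text{const}.
\]

Now, in the case of reflecting boundaries, we have that $J = 0$. Hence, the above simply reads

\[
A(x)P^{\mathrm{st}}(x) = \frac{1}{2}\frac{\partial}{\partial x}\left[ B(x)P^{\mathrm{st}}(x) \right].
\]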


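This is just

\[
2A(x)P^{\mathrm{st}}(x) = B'(x)P^{\mathrm{st}}(x) + B(x)\frac{dP^{\mathrm{st}}(x)}{dx},
\]

which rearranges easily into

\[
\frac{2A(x)}{B(x)} - \frac{B'(x)}{B(x)} = \frac{1}{P^{\mathrm{st}}(x)}\frac{dP^{\mathrm{st}}(x)}{dx} .
\]

We can then integrate this,

\[
\int dx\left( \frac{2A(x)}{B(x)} - \frac{B'(x)}{B(x)} \right) = \int dx\,\frac{1}{P^{\mathrm{st}}(x)}\frac{dP^{\mathrm{st}}(x)}{dx},
\]

which gives

\[
\int dx\left( \frac{2A(x)}{B(x)} - \frac{B'(x)}{B(x)} \right) = \int \frac{dP^{\mathrm{st}}(x)}{P^{\mathrm{st}}(x)} = \ln P^{\mathrm{st}}(x);
\]

which is

\[
\ln P^{\mathrm{st}}(x) = \int \frac{2A(x)}{B(x)}dx - \int \frac{dB(x)}{dx}\frac{dx}{B(x)}
= \int \frac{2A(x)}{B(x)}dx - \int \frac{dB(x)}{B(x)} .
\]

Hence,

\[
\ln P^{\mathrm{st}}(x) = \int \frac{2A(x)}{B(x)}dx - \ln B(x).
\]

Therefore, this gives (also putting primes on dummy variables),

\[
P^{\mathrm{st}}(x) = \frac{\mathcal{N}}{B(x)}\exp\left( 2\int_a^x \frac{A(x')}{B(x')}dx' \right), \tag{3.3.12}
\]

where $\mathcal{N}$ is some normalisation.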

Brownian motion Consider a pollen grain in the overdamped limit, whereby inertial forces are small compared to viscous forces. The pollen grain moves in a potential $V(x)$ and is described by an FPE, with

\[
A(x) = -\frac{dV}{dx}, \qquad B(x) = 2\alpha k_B T;
\]

where $T$ is the temperature of the fluid the grains are immersed in. So, the stationary state (3.3.12) is given by

\[
P^{\mathrm{st}}(x) = \frac{\mathcal{N}}{2\alpha k_B T}\exp\left( 2\int_a^x \frac{-\frac{dV}{dx'}}{2\alpha k_B T}dx' \right) = C\exp\left( -\frac{V(x)}{\alpha k_B T} \right).
\]

Notice how similar this final expression is to the Boltzmann distribution.

3.3.3.5 Transformation to Schrödinger-like Equation

Based on our experience with the master equation, let us define

$$\tilde{P}(x,t) \equiv \frac{P(x,t)}{\sqrt{P^{\rm st}(x)}}. \tag{3.3.13}$$

Now, for simplicity, let us assume that $B$ is independent of $x$. Then, we find that

$$-B\frac{\partial\tilde{P}}{\partial t} = -\frac{B^2}{2}\frac{\partial^2\tilde{P}}{\partial x^2} + U(x)\tilde{P},$$

where

$$U(x) \equiv \frac{1}{2}\left[A(x)\right]^2 + \frac{B}{2}\frac{dA}{dx}. \tag{3.3.14}$$

This is equivalent to

$$\frac{\partial\tilde{P}}{\partial t} = H\tilde{P}, \tag{3.3.15}$$

where

$$H \equiv \frac{B}{2}\frac{\partial^2}{\partial x^2} - B^{-1}U(x). \tag{3.3.16}$$

This plays the role of the Hamiltonian in the Schrödinger equation of quantum mechanics. Notice that this $H$ is Hermitian, $H^\dagger = H$.

Hence, a one-dimensional stochastic process, under certain conditions (such as $A$ and $B$ having no explicit time dependence, and $B$ being independent of $x$), is equivalent to quantum mechanics in imaginary time. The correspondence is

$$\hbar \longleftrightarrow B,$$

with unit mass. In fact, recall that fluctuations in quantum mechanics are of the order $\hbar$; and we had an example (Brownian motion) in which $B \sim k_BT$. Hence, we see that there are thermal fluctuations of order $k_BT$.


One can show that

$$H = \left(P^{\rm st}\right)^{-1/2}L\left(P^{\rm st}\right)^{1/2},$$

which bears strong resemblance to

$$V_{mn} = \left(P^{\rm st}_m\right)^{-1/2}W_{mn}\left(P^{\rm st}_n\right)^{1/2},$$

of the master equation formulation. We had that $W$ was not symmetric (hence, non-Hermitian), and that $V$ was Hermitian. Similarly, here, we have that $L$ is non-Hermitian, but $H$ is Hermitian.

Let us reconsider the overdamped Brownian particle, with

$$A(x) = -V'(x), \qquad B = 2D.$$

Hence, the FPE looks like

$$\frac{\partial P}{\partial t} = \frac{\partial}{\partial x}\left[V'(x)P\right] + D\frac{\partial^2P}{\partial x^2}, \tag{3.3.17}$$

and the “potential” $U(x)$ in the “Schrödinger equation” is

$$U(x) = \frac{1}{2}\left[V'(x)\right]^2 - DV''(x).$$
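As a quick worked instance of (3.3.14) and (3.3.16), consider the harmonic choice $V(x) = \frac{1}{2}\gamma x^2$. This particular potential is an illustrative assumption and is not part of the lectures at this point; the sketch only substitutes it into the definitions above.

```latex
\begin{align*}
A(x) &= -V'(x) = -\gamma x, \qquad B = 2D, \\
U(x) &= \tfrac{1}{2}\left[V'(x)\right]^2 - D V''(x)
      = \tfrac{1}{2}\gamma^2 x^2 - D\gamma, \\
H &= \frac{B}{2}\frac{\partial^2}{\partial x^2} - B^{-1}U(x)
   = D\frac{\partial^2}{\partial x^2} - \frac{\gamma^2 x^2}{4D} + \frac{\gamma}{2}.
\end{align*}
```

With $\hbar \to B = 2D$ and unit mass, $-\frac{B^2}{2}\partial_x^2 + U(x)$ is a harmonic oscillator of frequency $\gamma$ shifted by $-D\gamma$, so its eigenvalues are $2D\gamma\left(n + \frac{1}{2}\right) - D\gamma = 2D\gamma n$. Dividing by $B$, the decay rates of $\tilde{P}$ are $n\gamma$ with $n = 0, 1, 2, \ldots$, and the lowest one vanishes, as it must for a stationary state.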

3.3.3.6 Transformation to Adjoint Equation

Suppose we define

$$Q(x,t) \equiv \frac{P(x,t)}{P^{\rm st}(x)}, \tag{3.3.18}$$

then, we find that $Q(x,t)$ satisfies the adjoint equation

$$\frac{\partial Q}{\partial t} = L^\dagger Q. \tag{3.3.19}$$

3.3.3.7 Time Independent FPE

Suppose we can write

$$P(x,t) = P^{(\mu)}(x)e^{-\mu t},$$

then, as

$$\frac{\partial P}{\partial t} = -\mu P^{(\mu)}(x)e^{-\mu t} = -\mu P(x,t),$$

that is,

$$\frac{\partial P}{\partial t} = -\mu P(x,t), \tag{3.3.20}$$

we therefore see that

$$LP^{(\mu)} = -\mu P^{(\mu)},$$

where we have used the operator $L$ defined in (3.3.11). Therefore, we have turned the FPE into an eigenvalue problem (compare with the time independent Schrödinger equation $H\psi = E\psi$). Now, due to the linearity of the FPE, we can expand any solution as a sum over eigenfunctions,

$$P(x,t) = \sum_\mu c_\mu P^{(\mu)}(x)e^{-\mu t}, \tag{3.3.21}$$

where $P^{(\mu)}(x)$ are the eigenfunctions, $\mu$ the eigenvalues, and $c_\mu$ constants. Since the operator $L$ is non-Hermitian, we need to define right- and left-eigenfunctions,

$$L\left|P^{(\mu)}\right\rangle = -\mu\left|P^{(\mu)}\right\rangle, \tag{3.3.22}$$

$$\left\langle Q^{(\mu)}\right|L = -\mu\left\langle Q^{(\mu)}\right|, \tag{3.3.23}$$

$$L^\dagger\left|Q^{(\mu)}\right\rangle = -\mu\left|Q^{(\mu)}\right\rangle, \tag{3.3.24}$$

$$\left\langle P^{(\mu)}\right|L^\dagger = -\mu\left\langle P^{(\mu)}\right|, \tag{3.3.25}$$

where $P$ and $Q$ are the right- and left-eigenfunctions respectively. We can, as usual, prove orthogonality and completeness of the eigenstates.

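Orthogonality. Let us take (3.3.22) and multiply through by $\left\langle Q^{(\mu')}\right|$, so that

$$\left\langle Q^{(\mu')}\right|L\left|P^{(\mu)}\right\rangle = -\mu\left\langle Q^{(\mu')}\middle|P^{(\mu)}\right\rangle.$$

Similarly, let us multiply (3.3.23) by $\left|P^{(\mu)}\right\rangle$, and prime unprimed quantities, such that

$$\left\langle Q^{(\mu')}\right|L\left|P^{(\mu)}\right\rangle = -\mu'\left\langle Q^{(\mu')}\middle|P^{(\mu)}\right\rangle.$$

First, notice that the LHS of both expressions are identical. Let us then subtract these two expressions, leaving

$$(\mu - \mu')\left\langle Q^{(\mu')}\middle|P^{(\mu)}\right\rangle = 0.$$

Hence, if $\mu \neq \mu'$, we must have that

$$\left\langle Q^{(\mu')}\middle|P^{(\mu)}\right\rangle = 0, \qquad \mu \neq \mu'.$$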


Now, we cannot say anything about those states with $\mu = \mu'$. If we normalise the states, by the Gram-Schmidt orthogonalisation procedure (say), then we can say that

$$\left\langle Q^{(\mu')}\middle|P^{(\mu)}\right\rangle = \delta_{\mu\mu'}. \tag{3.3.26}$$

Thus the statement of orthogonality has been found. Now, we can go a little further. Consider that

$$I = \int dx\,|x\rangle\langle x|,$$

where $I$ is just the identity operator. The integral must be understood to go over the entire range upon which the system is defined. Then, inserting the identity operator between the bra- and ket-state of the orthogonality statement,

$$\int dx\,\left\langle Q^{(\mu')}\middle|x\right\rangle\left\langle x\middle|P^{(\mu)}\right\rangle = \delta_{\mu\mu'}.$$

Now, we use the notation for projecting a state onto a particular coordinate representation,

$$\left\langle Q^{(\mu')}\middle|x\right\rangle = Q^{(\mu')}(x).$$

Technically, this should be the complex conjugate, but the functions are real. So, we hence have

$$\int dx\,Q^{(\mu')}(x)P^{(\mu)}(x) = \delta_{\mu\mu'}. \tag{3.3.27}$$

This “orthogonality relation” is ok, but it is between two “different” functions. Let us get it in terms of the same functions. Now, recall the definition of $Q$ from (3.3.18), which gives $Q^{(\mu)}(x) = P^{(\mu)}(x)/P^{\rm st}(x)$. So, using this, the above easily becomes

$$\int dx\,\frac{P^{(\mu')}(x)P^{(\mu)}(x)}{P^{\rm st}(x)} = \delta_{\mu\mu'}. \tag{3.3.28}$$

Equivalently, we have that

$$\int dx\,P^{\rm st}(x)Q^{(\mu')}(x)Q^{(\mu)}(x) = \delta_{\mu\mu'}. \tag{3.3.29}$$

Now, both of these expressions are “true” orthonormality expressions. That is, the eigenfunctions $P^{(\mu)}$ and $Q^{(\mu)}$ are orthonormal, up to a weight function. That is, the weight function for the $P^{(\mu)}$ is $1/P^{\rm st}$; the weight function for the $Q^{(\mu)}$ is $P^{\rm st}$. In fact, if we recall the definition of $\tilde{P}$ from (3.3.13), we can see that (3.3.28) can be written

$$\int dx\,\tilde{P}^{(\mu')}(x)\tilde{P}^{(\mu)}(x) = \delta_{\mu\mu'}.$$

Hence, we see that the $\tilde{P}^{(\mu)}$ are orthonormal, without the need for a weight function.

Completeness. If we recall the general solution (3.3.21), and if we multiply through by $\int dx\,Q^{(\mu')}(x)$, then we have

$$\int dx\,Q^{(\mu')}(x)P(x,t) = \sum_\mu c_\mu e^{-\mu t}\int dx\,Q^{(\mu')}(x)P^{(\mu)}(x),$$

the RHS integral of which is just the orthonormality relation (3.3.27), so that

$$\int dx\,Q^{(\mu')}(x)P(x,t) = \sum_\mu c_\mu e^{-\mu t}\delta_{\mu\mu'}.$$

Hence,

$$\int dx\,Q^{(\mu')}(x)P(x,t) = c_{\mu'}e^{-\mu't}. \tag{3.3.30}$$

So, if we insert this back into the general solution (being careful in priming integration variables), then

$$P(x,t) = \sum_\mu\int dx'\,Q^{(\mu)}(x')P(x',t)P^{(\mu)}(x),$$

which we rearrange slightly into

$$P(x,t) = \int dx'\,P(x',t)\left[\sum_\mu Q^{(\mu)}(x')P^{(\mu)}(x)\right].$$

Now, as this must hold for all $P(x,t)$, then we must have that the bracketed quantity is a delta-function,

$$\sum_\mu Q^{(\mu)}(x')P^{(\mu)}(x) = \delta(x - x'); \tag{3.3.31}$$

whereby

$$P(x,t) = \int dx'\,P(x',t)\delta(x - x') = P(x,t).$$

Hence, we have the completeness relation.


Imposing the Initial Condition. Suppose that $P(x,t|x_0,t_0)$ satisfies the initial condition

$$P(x,t_0|x_0,t_0) = \delta(x - x_0).$$

Furthermore, we also have that

$$P(x,t|x_0,t_0) = \sum_\mu c_\mu e^{-\mu t}P^{(\mu)}(x).$$

Now, from (3.3.30), we therefore have that

$$c_\mu e^{-\mu t_0} = \int dx\,Q^{(\mu)}(x)P(x,t_0|x_0,t_0) = \int dx\,Q^{(\mu)}(x)\delta(x - x_0) = Q^{(\mu)}(x_0).$$

Therefore, this determines $c_\mu$,

$$c_\mu = Q^{(\mu)}(x_0)e^{\mu t_0}.$$

Hence, putting this back into the general solution,

$$P(x,t|x_0,t_0) = \sum_\mu Q^{(\mu)}(x_0)P^{(\mu)}(x)e^{-\mu(t - t_0)}.$$

Hence, we now have a general solution, with the initial conditions “built in”.

Stationary State. The stationary state is the state for which the probability distribution is independent of time. That is, from (3.3.20), we see that this corresponds to $\mu = 0$. Hence, keeping only the $\mu = 0$ term, the general solution reduces to

$$P(x) = P^{\rm st}(x) = Q^{(0)}(x_0)P^{(0)}(x).$$

From which we see that

$$\frac{Q^{(0)}(x_0)P^{(0)}(x)}{P^{\rm st}(x)} = 1;$$

however, the fraction $P^{(0)}(x)/P^{\rm st}(x)$ is just the definition of $Q^{(0)}(x)$, so that this reads

$$Q^{(0)}(x_0)Q^{(0)}(x) = 1.$$

From which we say that

$$Q^{(0)}(x) = 1.$$


3.3.3.8 Example: Brownian Motion in a Potential

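Consider the FPE

$$\frac{\partial P}{\partial t} = -\frac{\partial}{\partial x}\left[A(x,t)P\right] + \frac{B}{2}\frac{\partial^2P}{\partial x^2},$$

where $B$ is a constant. Suppose we have that

$$A(x) \equiv -\gamma x + \frac{D}{x}, \qquad B \equiv 2D,$$

where $\gamma, D$ are constants. This is just like Brownian motion in the potential

$$V'(x) = -A(x) = \gamma x - \frac{D}{x}.$$

That is, integrating,

$$V(x) = \frac{1}{2}\gamma x^2 - D\ln x.$$

Fig. 3.7. The potential $V(x)$ that the Brownian particle “feels”.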

The minimum of the potential is easily seen to be at

$$x = \sqrt{\frac{D}{\gamma}}.$$

Let us say that the system is defined on $x \in (0,\infty)$, so that $J = 0$. So, the stationary state, from (3.3.17), is just

$$\frac{d}{dx}\left[V'(x)P^{\rm st}\right] + D\frac{d^2P^{\rm st}}{dx^2} = 0,$$

or, integrating,

$$V'(x)P^{\rm st} + D\frac{dP^{\rm st}}{dx} = 0.$$

Solving this gives

$$P^{\rm st}(x) = Ne^{-V(x)/D},$$

or, using our potential,

$$P^{\rm st}(x) = Ne^{-\gamma x^2/2D}e^{\ln x} = Nx\,e^{-\gamma x^2/2D},$$

where $N$ is a normalisation constant. Now, due to normalisation, the integral of the stationary state over all space is unity. Hence,

$$\int_0^\infty dx\,Nx\,e^{-\gamma x^2/2D} = 1.$$

To do the integral, make the substitution

$$z \equiv \frac{\gamma x^2}{2D} \quad\Rightarrow\quad dz = \frac{\gamma x}{D}\,dx.$$

Hence, using this, the integral allows us to find that

$$N = \frac{\gamma}{D}.$$

Therefore, the stationary state is

$$P^{\rm st}(x) = \frac{\gamma x}{D}e^{-\gamma x^2/2D}.$$
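The normalisation $N = \gamma/D$ and the position of the peak can be checked numerically. This is a small sketch using trapezoidal quadrature for the density $P^{\rm st}(x) = (\gamma x/D)\,e^{-\gamma x^2/2D}$; the parameter values, the cut-off `x_max` and the helper name `p_st` are illustrative assumptions.

```python
import math

def p_st(x, gamma, D):
    """Stationary density (gamma x / D) exp(-gamma x^2 / 2D) on x > 0."""
    return (gamma * x / D) * math.exp(-gamma * x * x / (2.0 * D))

gamma, D = 2.0, 0.5          # illustrative values, not from the notes
x_max, n = 10.0, 100000      # the tail beyond x_max is negligible here
h = x_max / n
xs = [i * h for i in range(n + 1)]
ps = [p_st(x, gamma, D) for x in xs]

# Trapezoidal estimate of the normalisation integral over (0, x_max).
total = h * (sum(ps) - 0.5 * (ps[0] + ps[-1]))

# Grid location of the maximum, to compare with sqrt(D / gamma).
x_peak = xs[max(range(n + 1), key=lambda i: ps[i])]
```

If the normalisation is right, `total` should be very close to 1, and `x_peak` should sit at $\sqrt{D/\gamma}$, the minimum of the potential.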

Fig. 3.8. The stationary probability distribution $P^{\rm st}(x)$ of the Brownian particle. Also shown is the potential (dashed line) for comparison. Notice that the probability is peaked where the potential is at its minimum, as one would expect: most particles will be found near the minimum of the potential.

The adjoint equation, with our expressions for $A(x)$ and $B$, looks like

$$\frac{\partial Q}{\partial t} = \left(-\gamma x + \frac{D}{x}\right)\frac{\partial Q}{\partial x} + D\frac{\partial^2Q}{\partial x^2}.$$

If we then write that

$$Q(x,t) = Q^{(\mu)}(x)e^{-\mu t},$$

then the adjoint equation can be written

$$D\frac{d^2Q^{(\mu)}}{dx^2} + \left(-\gamma x + \frac{D}{x}\right)\frac{dQ^{(\mu)}}{dx} + \mu Q^{(\mu)} = 0.$$

We can make this look nicer by making the same change of variables to $z = \gamma x^2/2D$. Using this, the above differential equation looks like

$$z\frac{d^2Q^{(\mu)}}{dz^2} + (1 - z)\frac{dQ^{(\mu)}}{dz} + \left(\frac{\mu}{2\gamma}\right)Q^{(\mu)} = 0.$$

This equation is of the form of Laguerre's equation. This equation also crops up in the radial part of the hydrogen atom.

The relevant solutions (i.e. those which satisfy the boundary conditions) only exist if

$$\frac{\mu}{2\gamma} = n, \qquad n = 0, 1, 2, \ldots.$$

The solutions to Laguerre's equation are called Laguerre polynomials, denoted $L_n(z)$. The first few are

$$L_0(z) = 1, \qquad L_1(z) = 1 - z, \qquad L_2(z) = 1 - 2z + \frac{1}{2}z^2, \quad\ldots$$

We have the correspondence that

$$Q^{(n)}(x) = L_n\left(\gamma x^2/2D\right), \qquad n = 0, 1, 2, \ldots.$$

Notice that these states are discrete.
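Both the differential equation and the orthonormality relation (3.3.29) can be checked numerically for these eigenfunctions. With $z = \gamma x^2/2D$ one has $P^{\rm st}(x)\,dx = e^{-z}dz$, so (3.3.29) becomes $\int_0^\infty e^{-z}L_m(z)L_n(z)\,dz = \delta_{mn}$. The sketch below builds $L_n$ from its explicit power series and uses trapezoidal quadrature; the helper names and the cut-off `z_max` are assumptions made for illustration.

```python
import math

def laguerre(n, z):
    """Return (L_n(z), L_n'(z), L_n''(z)) from the explicit power series
    L_n(z) = sum_k C(n, k) (-z)^k / k!."""
    coeffs = [math.comb(n, k) * (-1) ** k / math.factorial(k) for k in range(n + 1)]
    value = sum(c * z ** k for k, c in enumerate(coeffs))
    first = sum(k * c * z ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)
    second = sum(k * (k - 1) * c * z ** (k - 2) for k, c in enumerate(coeffs) if k >= 2)
    return value, first, second

def ode_residual(n, z):
    """Residual of z Q'' + (1 - z) Q' + n Q for Q = L_n."""
    q, dq, d2q = laguerre(n, z)
    return z * d2q + (1.0 - z) * dq + n * q

def weighted_overlap(m, n, z_max=40.0, steps=6000):
    """Trapezoidal estimate of int_0^inf e^{-z} L_m(z) L_n(z) dz,
    truncated at z_max where the integrand is negligible."""
    h = z_max / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(-z) * laguerre(m, z)[0] * laguerre(n, z)[0]
    return h * total

residuals = [abs(ode_residual(n, z)) for n in range(5) for z in (0.3, 1.0, 2.5)]
overlaps = {(m, n): weighted_overlap(m, n) for m in range(3) for n in range(3)}
```

The residuals should vanish to rounding error, and `overlaps` should be close to the identity matrix, which is the statement that the $Q^{(n)}$ are orthonormal with weight $P^{\rm st}$.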

3.4 Stochastic Differential Equations

We shall introduce stochastic differential equations by example. We consider the classical example of the Brownian motion of a particle.

3.4.1 Brownian Motion Described by the Langevin Equation

The classic example of Brownian motion is that of the motion of a pollen grain in water. The pollen grain appears to undergo random motion, because it is kicked by the water molecules. This is the stochastic motion of the pollen grain.

So, we could say that at time $t = 0$, the velocity of the pollen grain is $v(t = 0) = v_0$. What do we expect the average values of $v(t)$ and $v^2(t)$ to be, at large times?

The average of $v(t)$ is zero; any motion in one direction will probably be cancelled out by motion in the opposite direction, over an ensemble average. We can make a bit of a guess at the value of $\langle v^2\rangle$. Recall that

$$\frac{1}{2}m\langle v^2\rangle_{\rm eq} = \frac{3}{2}k_BT,$$

where we use the equipartition theorem for gases. Hence, we have

$$\langle v^2\rangle_{\rm eq} = \frac{3k_BT}{m}.$$

So, the particle moves deterministically, but with random collisions with molecules of the fluid. Let us write the equation of motion, for a single Brownian particle:

$$m\ddot{\mathbf{r}} = -\alpha\dot{\mathbf{r}} - \nabla V + \mathbf{F}(t).$$

The $\alpha$-term is a friction term, $-\nabla V$ the force due to the potential, and $\mathbf{F}$ some random force, due to collisions with other molecules. Let us use a simple model, whereby the particle does not feel a potential, so that the equation of motion is just

$$m\ddot{\mathbf{r}} = -\alpha\dot{\mathbf{r}} + \mathbf{F}(t),$$

or, using the notation that $\mathbf{v} \equiv \dot{\mathbf{r}}$, then

$$m\dot{\mathbf{v}} = -\alpha\mathbf{v} + \mathbf{F}(t). \tag{3.4.1}$$

This is known as a Langevin equation, and must be solved, subject to $\mathbf{v}(0) = \mathbf{v}_0$. This is a stochastic differential equation, due to the presence of the “random term”. In component form, this reads

$$m\dot{v}_i = -\alpha v_i + F_i(t).$$

Now, to complete the description of the system, the force $F_i(t)$ has to be specified stochastically. That is, we must define the moments,

$$\langle F_i(t)\rangle, \quad \langle F_i(t)F_j(t')\rangle, \quad\ldots, \qquad i, j \in \{1, 2, 3\}.$$

We shall assume that

$$\langle F_i(t)\rangle = 0, \qquad \langle F_i(t)F_j(t')\rangle = 2D\delta_{ij}\delta(t - t'), \tag{3.4.2}$$

where $D$ is a constant, and all higher order cumulants are zero. The delta-functions mean that the collision force is uncorrelated between different components and between different times. That is, the force $F_i$ becomes uncorrelated over times of a few $\tau$, where

$$\tau \sim \frac{\text{mean molecular separation}}{\text{mean molecular velocity}} \sim 10^{-13}\ \text{seconds}.$$


That is,

$$\langle F_i(t)F_j(t + s)\rangle \approx 0, \qquad s \gg \tau.$$

We can write this a little better, as

$$\langle F_i(t)F_j(t + \Delta t)\rangle = k(\Delta t),$$

where the function $k$ is effectively a spike, centred on $\Delta t = 0$, with width of order $\tau$. Such a specification makes the collision force $F_i(t)$ Gaussian; we call such a force white noise, or just the noise term. That is, if one took the Fourier transform of the force, one would find contributions of equal weight from all modes.

So, to summarise, we have the stochastic differential equation with initial condition

$$m\dot{v}_i = -\alpha v_i + F_i(t), \qquad v_i(0) = v_{i,0}.$$

We also impose the Gaussian white noise condition on the collision force,

$$\langle F_i(t)\rangle = 0, \qquad \langle F_i(t)F_j(t')\rangle = 2D\delta_{ij}\delta(t - t').$$

This gives a complete description of the Brownian motion of a particle.

3.4.2 The Solution to the Langevin Equation Describing Brownian Motion

Let us restrict our attention to one dimension, then the Langevin equation becomes

$$\dot{v} = -\gamma v + \eta(t),$$

where we rescale the damping and noise term as

$$\gamma \equiv \frac{\alpha}{m}, \qquad \eta(t) \equiv \frac{F(t)}{m}.$$

We subject the noise term to

$$\langle\eta(t)\rangle = 0, \qquad \langle\eta(t)\eta(t')\rangle = \frac{2D}{m^2}\delta(t - t').$$

Now, we have to solve $\dot{v} + \gamma v = \eta(t)$ subject to the initial condition $v(0) = v_0$. So, if we multiply the Langevin equation by the integrating factor

$$e^{\int\gamma\,dt} = e^{\gamma t},$$

the Langevin equation becomes

$$\frac{d}{dt}\left[v(t)e^{\gamma t}\right] = e^{\gamma t}\eta(t).$$


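We can solve this by integrating both sides from $0$ to $t$,

$$\int_0^t dt'\,\frac{d}{dt'}\left[v(t')e^{\gamma t'}\right] = \int_0^t dt'\,e^{\gamma t'}\eta(t'),$$

which gives

$$v(t)e^{\gamma t} - v(0) = \int_0^t dt'\,e^{\gamma t'}\eta(t').$$

Using the initial condition that $v(0) = v_0$, and rearranging, this is just

$$v(t) = v_0e^{-\gamma t} + e^{-\gamma t}\int_0^t dt'\,e^{\gamma t'}\eta(t').$$

Let us find the ensemble average $\langle v(t)\rangle$. The expectation value of the first term on the RHS is just itself (it is a fixed number). Now, the ensemble average and integral commute, so that

$$\langle v(t)\rangle = v_0e^{-\gamma t} + e^{-\gamma t}\int_0^t dt'\,e^{\gamma t'}\langle\eta(t')\rangle,$$

but $\langle\eta(t')\rangle = 0$, hence

$$\langle v(t)\rangle = v_0e^{-\gamma t}.$$

Let us now compute $\langle v^2(t)\rangle$. So,

$$v^2(t) = v_0^2e^{-2\gamma t} + 2v_0e^{-2\gamma t}\int_0^t dt'\,e^{\gamma t'}\eta(t') + e^{-2\gamma t}\int_0^t dt'\,e^{\gamma t'}\eta(t')\int_0^t dt''\,e^{\gamma t''}\eta(t''),$$

and hence the expectation value

$$\langle v^2(t)\rangle = v_0^2e^{-2\gamma t} + 2v_0e^{-2\gamma t}\left\langle\int_0^t dt'\,e^{\gamma t'}\eta(t')\right\rangle + e^{-2\gamma t}\left\langle\int_0^t dt'\,e^{\gamma t'}\eta(t')\int_0^t dt''\,e^{\gamma t''}\eta(t'')\right\rangle,$$

which we write as

$$\langle v^2(t)\rangle = v_0^2e^{-2\gamma t} + 2v_0e^{-2\gamma t}\int_0^t dt'\,e^{\gamma t'}\langle\eta(t')\rangle + e^{-2\gamma t}\int_0^t dt'\int_0^t dt''\,e^{\gamma(t' + t'')}\langle\eta(t')\eta(t'')\rangle.$$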


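We can then use our conditions for $\langle\eta(t)\rangle$ and $\langle\eta(t)\eta(t')\rangle$, so that

$$\begin{aligned}
\langle v^2(t)\rangle &= v_0^2e^{-2\gamma t} + e^{-2\gamma t}\frac{2D}{m^2}\int_0^t dt'\int_0^t dt''\,e^{\gamma(t' + t'')}\delta(t' - t'') \\
&= v_0^2e^{-2\gamma t} + e^{-2\gamma t}\frac{2D}{m^2}\int_0^t dt'\,e^{2\gamma t'} \\
&= v_0^2e^{-2\gamma t} + e^{-2\gamma t}\frac{2D}{m^2}\frac{1}{2\gamma}\left[e^{2\gamma t'}\right]_0^t \\
&= v_0^2e^{-2\gamma t} + \frac{D}{\gamma m^2}e^{-2\gamma t}\left(e^{2\gamma t} - 1\right).
\end{aligned}$$

By the definition of our constants, we can write

$$\frac{D}{m^2\gamma} = \frac{Dm}{m^2\alpha} = \frac{D}{m\alpha},$$

hence,

$$\langle v^2(t)\rangle = v_0^2e^{-2\gamma t} + \frac{D}{m\alpha}e^{-2\gamma t}\left(e^{2\gamma t} - 1\right) = v_0^2e^{-2\gamma t} + \frac{D}{m\alpha}\left(1 - e^{-2\gamma t}\right).$$

So, suppose that $t \to \infty$, then this becomes

$$\lim_{t\to\infty}\langle v^2(t)\rangle = \frac{D}{m\alpha}.$$

Thus, we see that as $t \to \infty$, $\langle v(t)\rangle \to 0$ but $\langle v^2(t)\rangle \neq 0$. This is because of the underlying equilibrium. The other terms show the dynamics as the system approaches equilibrium. Also, as $t \to \infty$, the Brownian particle thermally equilibrates with the fluid. Hence, let us notate

$$\lim_{t\to\infty}\langle v^2(t)\rangle = v_{\rm eq}^2.$$

By the equipartition theorem, in 1D, we have that

$$\frac{1}{2}mv_{\rm eq}^2 = \frac{1}{2}k_BT,$$

where $T$ is the temperature of the fluid. Hence,

$$\frac{1}{2}m\frac{D}{m\alpha} = \frac{1}{2}k_BT \quad\Rightarrow\quad D = \alpha k_BT.$$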


Therefore, we have a rather interesting relation. We are able to link the random fluctuations of the fluid to the damping and temperature of the fluid. That is, we have $D$, which is an “unknown microscopic” quantity, and $\alpha, T$, which are both “known macroscopic” quantities. This is an example of the dissipation/fluctuation theorem.

Using this dissipation/fluctuation relation, we have that

$$\langle v^2(t)\rangle = v_0^2e^{-2\gamma t} + \frac{k_BT}{m}\left(1 - e^{-2\gamma t}\right).$$
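These moments can be compared against a direct simulation of the one-dimensional Langevin equation $\dot{v} = -\gamma v + \eta(t)$. The sketch below uses the Euler-Maruyama scheme, which is a choice made here rather than a method from the lectures, and writes `D_tilde` for the noise strength $D/m^2$, so that $D/m\alpha = k_BT/m$ corresponds to `D_tilde / gamma`. All parameter values are illustrative.

```python
import math
import random

def simulate_velocities(v0, gamma, D_tilde, t_final, dt, n_paths, seed=0):
    """Euler-Maruyama integration of dv = -gamma v dt + sqrt(2 D_tilde dt) xi,
    with xi a unit Gaussian; D_tilde stands for D / m^2.
    Returns the list of final velocities v(t_final), one per path."""
    rng = random.Random(seed)
    n_steps = int(round(t_final / dt))
    kick = math.sqrt(2.0 * D_tilde * dt)
    finals = []
    for _ in range(n_paths):
        v = v0
        for _ in range(n_steps):
            v += -gamma * v * dt + kick * rng.gauss(0.0, 1.0)
        finals.append(v)
    return finals

# Illustrative parameters (assumptions, not from the notes).
v0, gamma, D_tilde, t_final = 2.0, 1.0, 0.5, 1.0
finals = simulate_velocities(v0, gamma, D_tilde, t_final, dt=0.005, n_paths=4000)

mean_v = sum(finals) / len(finals)
mean_v2 = sum(v * v for v in finals) / len(finals)

# Predictions from the text: <v> = v0 e^{-gamma t} and
# <v^2> = v0^2 e^{-2 gamma t} + (D_tilde / gamma)(1 - e^{-2 gamma t}).
predicted_mean = v0 * math.exp(-gamma * t_final)
predicted_v2 = (v0 ** 2 * math.exp(-2.0 * gamma * t_final)
                + (D_tilde / gamma) * (1.0 - math.exp(-2.0 * gamma * t_final)))
```

The sample averages agree with the predictions only up to statistical error of order $1/\sqrt{n_{\rm paths}}$ and a small time-step bias, so any comparison should allow for that.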

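We can also compute the correlation function,

$$\langle v(t_1)v(t_2)\rangle = v_0^2e^{-\gamma(t_1 + t_2)} + e^{-\gamma(t_1 + t_2)}\int_0^{t_1}dt'\int_0^{t_2}dt''\,e^{\gamma(t' + t'')}\langle\eta(t')\eta(t'')\rangle.$$

Now, we apply a “trick” to the integrals. First, we put in our correlation between the $\eta$'s;

$$\int_0^{t_1}dt'\int_0^{t_2}dt''\,e^{\gamma(t' + t'')}\langle\eta(t')\eta(t'')\rangle = \frac{2D}{m^2}\int_0^{t_1}dt'\int_0^{t_2}dt''\,e^{\gamma(t' + t'')}\delta(t' - t'').$$

Then, suppose that $t_2 > t_1$, then we see that

$$\int_0^{t_1}dt'\int_0^{t_2}dt''\,e^{\gamma(t' + t'')}\delta(t' - t'') = \int_0^{t_1}dt'\int_0^{t_1}dt''\,e^{\gamma(t' + t'')}\delta(t' - t'') + \int_0^{t_1}dt'\int_{t_1}^{t_2}dt''\,e^{\gamma(t' + t'')}\delta(t' - t'').$$

Now, the second term is always zero, as $t'$ is never equal to $t''$. Hence,

$$\int_0^{t_1}dt'\int_0^{t_2}dt''\,e^{\gamma(t' + t'')}\delta(t' - t'') = \int_0^{t_1}dt'\int_0^{t_1}dt''\,e^{\gamma(t' + t'')}\delta(t' - t'').$$

Therefore, using this, we see that

$$\langle v(t_1)v(t_2)\rangle = v_0^2e^{-\gamma(t_1 + t_2)} + \frac{D}{\alpha m}\left(e^{-\gamma|t_1 - t_2|} - e^{-\gamma(t_1 + t_2)}\right).$$

Notice that as $t_1, t_2 \to \infty$ with $|t_1 - t_2|$ held fixed,

$$\lim_{t_1,t_2\to\infty}\langle v(t_1)v(t_2)\rangle = \frac{k_BT}{m}e^{-\gamma|t_1 - t_2|}.$$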

This is the correlation function in equilibrium.

As this is such a “useful trick”, let us write the main steps again. We take the double integral, and split up the integral we deem to have the largest upper limit,

$$\int_0^{t_1}dt'\int_0^{t_2}dt''\,f(t',t'')\delta(t' - t'') = \int_0^{t_1}dt'\left[\int_0^{t_1}dt''\,f(t',t'')\delta(t' - t'') + \int_{t_1}^{t_2}dt''\,f(t',t'')\delta(t' - t'')\right].$$

In the second integral, we see that $t' \in (0, t_1)$ and $t'' \in (t_1, t_2)$, where there is no overlap in the limits. So, an integral over such limits, with a delta-function as an integrand, results in zero. Therefore,

$$\int_0^{t_1}dt'\int_0^{t_2}dt''\,f(t',t'')\delta(t' - t'') = \int_0^{t_1}dt'\int_0^{t_1}dt''\,f(t',t'')\delta(t' - t'') = \int_0^{t_1}dt'\,f(t',t').$$

This will generally greatly simplify integrals.
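As a worked instance of the trick, take $f(t',t'') = e^{\gamma(t' + t'')}$ with $t_1 < t_2$, which is the case needed for the velocity correlation function. The following sketch fills in the intermediate algebra, using only the constants $D$, $m$, $\alpha$ and $\gamma = \alpha/m$ defined in the text.

```latex
\begin{align*}
\frac{2D}{m^2}\int_0^{t_1}dt'\int_0^{t_2}dt''\,e^{\gamma(t'+t'')}\delta(t'-t'')
  &= \frac{2D}{m^2}\int_0^{t_1}dt'\,e^{2\gamma t'}
   = \frac{D}{\gamma m^2}\left(e^{2\gamma t_1}-1\right), \\
e^{-\gamma(t_1+t_2)}\,\frac{D}{\gamma m^2}\left(e^{2\gamma t_1}-1\right)
  &= \frac{D}{\alpha m}\left(e^{-\gamma(t_2-t_1)}-e^{-\gamma(t_1+t_2)}\right).
\end{align*}
```

Since $t_2 - t_1 = |t_1 - t_2|$ here, and the same argument with the roles of $t_1$ and $t_2$ exchanged covers $t_1 > t_2$, this gives the noise contribution $\frac{D}{\alpha m}\left(e^{-\gamma|t_1 - t_2|} - e^{-\gamma(t_1 + t_2)}\right)$ to $\langle v(t_1)v(t_2)\rangle$.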

3.4.3 Comments on the Langevin Equation

Here we shall present some generalisations, comments, and other examples of using the Langevin equation.

The Langevin equation, allowing for an external potential $V(x)$ (in 1D), reads

$$m\ddot{x} = -\alpha\dot{x} - V'(x) + F(t).$$

Frequently, the $m\ddot{x}$ term is unimportant compared to $\alpha\dot{x}$. This is called the overdamped limit, so that the Langevin equation reads

$$\alpha\dot{x} = -V'(x) + F(t).$$

If there is no external potential, then this simply reads

$$\alpha\dot{x} = F(t).$$

If one finds the variance of $x(t)$, then one finds that

$$\langle\langle x^2(t)\rangle\rangle = 2D't, \qquad D' \equiv \frac{k_BT}{\alpha},$$

which is the usual random walk result.

Let us consider whether the stochastic process described by the Langevin equation is a Markov process. For example, consider the Langevin equation

$$m\dot{v} + \alpha v = F(t).$$


This is a Markov process if $F(t)$ is white noise, and is not Markovian if $F(t)$ is not white noise. Now, notice that the derivative may be written

$$\dot{v}(t) = \frac{v(t + \tau) - v(t)}{\tau},$$

so that the Langevin equation reads

$$mv(t + \tau) - mv(t) = -\alpha\tau v(t) + \tau F(t), \qquad \tau \to 0.$$

As this equation is of first order, and if $F(t)$ is uncorrelated (i.e. has no memory of past times), then this is Markovian. Similarly,

$$\dot{x} = -V'(x) + F(t)$$

is Markovian if $F(t)$ is white noise. Now,

$$m\ddot{x} + \alpha\dot{x} + V'(x) = F(t)$$

is not Markovian in $x$ alone, because upon expansion of the derivative, one finds

$$m\left[x(t + \tau) - 2x(t) + x(t - \tau)\right] = -\tau^2V'(x) + \ldots,$$

which depends not only upon the current state of the system, but also on the state of the system one time-step in the past. This second order property makes the Langevin equation non-Markovian. To get around this problem, we could define $v(t) \equiv \dot{x}(t)$, so that the 1D Langevin equation becomes 2D,

$$\dot{x} = v(t), \qquad \dot{v} = -\gamma v - \frac{V'(x)}{m} + \frac{F(t)}{m},$$

that is, two coupled first order differential equations. If $F(t)$ is white noise, then this system is Markovian.

3.4.4 Equivalence to the Fokker-Planck Equation

Recall the Kramers-Moyal expansion (3.3.4),

$$\frac{\partial P}{\partial t} = \sum_{\ell=1}^{\infty}\frac{(-1)^\ell}{\ell!}\frac{\partial^\ell}{\partial x^\ell}\left(D^{(\ell)}(x,t)P\right),$$

where the $D^{(\ell)}$ were defined through the jump moments

$$\left\langle\left[x(t + \Delta t) - x(t)\right]^\ell\right\rangle_{x(t)=x} = M_\ell(x,t,\Delta t) = D^{(\ell)}(x,t)\Delta t + O(\Delta t)^2.$$

Now, we can use the Langevin equation to compute the jump moments. As an example, consider the Langevin equation

$$\dot{v} = -\gamma v + \eta(t),$$

where

$$\langle\eta(t)\rangle = 0, \qquad \langle\eta(t')\eta(t'')\rangle = \frac{2D}{m^2}\delta(t' - t'').$$

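So, let us integrate the Langevin equation from $t$ to $t + \Delta t$, giving

$$\int_t^{t+\Delta t}\dot{v}(t')\,dt' = -\gamma\int_t^{t+\Delta t}v(t')\,dt' + \int_t^{t+\Delta t}\eta(t')\,dt',$$

which easily gives

$$v(t + \Delta t) - v(t) = -\gamma v(t)\Delta t + O(\Delta t)^2 + f(t), \qquad f(t) \equiv \int_t^{t+\Delta t}\eta(t')\,dt'.$$

Now, notice that

$$\langle f(t)\rangle = 0,$$

and that

$$\langle f^2(t)\rangle = \int_t^{t+\Delta t}dt'\int_t^{t+\Delta t}dt''\,\langle\eta(t')\eta(t'')\rangle = \frac{2D}{m^2}\int_t^{t+\Delta t}dt' = \frac{2D}{m^2}\Delta t.$$

Furthermore, it is easy to convince oneself that, for Gaussian noise, the higher moments of $f$ contribute nothing at first order in $\Delta t$,

$$\langle f^\ell(t)\rangle = O(\Delta t)^2 \text{ or higher order}, \qquad \ell \geq 3.$$

Therefore, using these, one can easily see that the moments are

$$\left\langle\left[v(t + \Delta t) - v(t)\right]\right\rangle_{v(t)=v} = -\gamma v\Delta t + O(\Delta t)^2,$$

$$\left\langle\left[v(t + \Delta t) - v(t)\right]^2\right\rangle_{v(t)=v} = \frac{2D}{m^2}\Delta t + O(\Delta t)^2,$$

$$\left\langle\left[v(t + \Delta t) - v(t)\right]^\ell\right\rangle_{v(t)=v} = O(\Delta t)^2, \qquad \ell \geq 3.$$

Now, the $D^{(\ell)}$ is the coefficient of $\Delta t$. Hence, we read off that

$$D^{(1)} = -\gamma v, \qquad D^{(2)} = 2\tilde{D}, \quad \tilde{D} \equiv \frac{D}{m^2}, \qquad D^{(\ell)} = 0, \quad \ell \geq 3.$$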
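The way the coefficients are read off from the jump moments can be illustrated by sampling. The sketch below draws many increments of $v$ over one short step $\Delta t$, using a single Euler step as a stand-in for the exact integral over $[t, t + \Delta t]$ (an approximation made here for illustration), and divides the sample moments by $\Delta t$. The name `D_tilde` denotes $D/m^2$, and the parameter values are arbitrary.

```python
import math
import random

def jump_moments(v, gamma, D_tilde, dt, n_samples, seed=1):
    """Estimate <dv>/dt and <dv^2>/dt for one short step of
    dv = -gamma v dt + f, where f is Gaussian with variance 2 D_tilde dt."""
    rng = random.Random(seed)
    kick = math.sqrt(2.0 * D_tilde * dt)
    first = 0.0
    second = 0.0
    for _ in range(n_samples):
        dv = -gamma * v * dt + kick * rng.gauss(0.0, 1.0)
        first += dv
        second += dv * dv
    return first / (n_samples * dt), second / (n_samples * dt)

# Illustrative values (assumptions, not from the notes).
gamma, D_tilde, v = 1.5, 0.4, 0.8
d1, d2 = jump_moments(v, gamma, D_tilde, dt=1e-2, n_samples=200000)
```

Up to sampling noise and corrections of order $\Delta t$, `d1` estimates $-\gamma v$ and `d2` estimates $2D/m^2$, the two coefficients that survive in the Kramers-Moyal expansion.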


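Therefore, we see that the Kramers-Moyal expansion naturally collapses into the Fokker-Planck equation

$$\frac{\partial P}{\partial t} = -\frac{\partial}{\partial v}\left(-\gamma vP\right) + \frac{1}{2!}\frac{\partial^2}{\partial v^2}\left(2\tilde{D}P\right), \qquad \tilde{D} \equiv \frac{D}{m^2},$$

or, cleaning up slightly,

$$\frac{\partial P}{\partial t} = \frac{\partial}{\partial v}\left(\gamma vP\right) + \frac{\partial^2}{\partial v^2}\left(\tilde{D}P\right).$$

The solution to this, which may be verified, is

$$P(v,t|v_0,0) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{\left(v - v_0e^{-\gamma t}\right)^2}{2\sigma^2}\right], \qquad \sigma^2(t) \equiv \frac{D}{\alpha m}\left(1 - e^{-2\gamma t}\right),$$

which is a Gaussian. That the solution is a Gaussian is a consequence of the equation being linear in $v$.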
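The claim that this Gaussian solves the Fokker-Planck equation can be checked numerically by central finite differences. In the sketch below `D_tilde` denotes $D/m^2$, so that $\sigma^2(t) = (D/\alpha m)(1 - e^{-2\gamma t})$ is written as `(D_tilde / gamma) * (1 - exp(-2 gamma t))`. The function names, step size `h` and sample points are illustrative assumptions.

```python
import math

def sigma2(t, gamma, D_tilde):
    """Variance sigma^2(t) = (D_tilde / gamma)(1 - e^{-2 gamma t})."""
    return (D_tilde / gamma) * (1.0 - math.exp(-2.0 * gamma * t))

def density(v, t, v0, gamma, D_tilde):
    """Gaussian P(v, t | v0, 0) with mean v0 e^{-gamma t} and variance sigma2(t)."""
    s2 = sigma2(t, gamma, D_tilde)
    mean = v0 * math.exp(-gamma * t)
    return math.exp(-(v - mean) ** 2 / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)

def fpe_residual(v, t, v0, gamma, D_tilde, h=1e-4):
    """Central-difference estimate of dP/dt - d/dv(gamma v P) - D_tilde d2P/dv2."""
    def P(vv, tt):
        return density(vv, tt, v0, gamma, D_tilde)
    dP_dt = (P(v, t + h) - P(v, t - h)) / (2.0 * h)
    drift = gamma * ((v + h) * P(v + h, t) - (v - h) * P(v - h, t)) / (2.0 * h)
    diffusion = D_tilde * (P(v + h, t) - 2.0 * P(v, t) + P(v - h, t)) / (h * h)
    return dP_dt - drift - diffusion

v0, gamma, D_tilde = 1.0, 0.7, 0.3   # illustrative values only
residuals = [fpe_residual(v, t, v0, gamma, D_tilde)
             for v in (-0.5, 0.2, 1.1) for t in (0.5, 1.0, 2.0)]
```

Each individual term in the residual is of order one at these points, so residuals that are small compared with one indicate that the Gaussian satisfies the equation to within finite-difference error.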

Now, as another example, consider the overdamped Brownian particle, in potential $V(x)$,

$$\dot{x} = -V'(x) + F(t).$$

Now, an identical argument to that above finds that

$$D^{(1)} = -V'(x), \qquad D^{(2)} = 2D, \qquad D^{(\ell)} = 0, \quad \ell \geq 3,$$

which gives the Fokker-Planck equation

$$\frac{\partial P}{\partial t} = \frac{\partial}{\partial x}\left(V'(x)P\right) + D\frac{\partial^2P}{\partial x^2}.$$

This is the FPE (3.3.17) claimed to be describing Brownian motion, from the previous section.