gravitation och kosmologi lecture notes - uppsala · pdf filegravitation och kosmologi...

Gravitation och Kosmologi

Lecture Notes

Joseph A. Minahan c©

Uppsala, 2002-2012

Chapter 0

Overview

This course is an introduction to Einstein’s theory of general relativity. It is assumedthat you are already familiar with special relativity, which is relevant when particles aretraveling near to the speed of light1. General relativity takes the concepts from specialrelativity but then generalizes them so that one may also include gravity. Hence, ingeneral relativity we will see many notions already introduced in special relativity, suchas 4-vectors, tensors, proper times etc. But we will also learn some new concepts such asmetrics, local inertial frames, parallel transport and curvature. Eventually we will puttogether these new pieces to construct the Einstein equations.

Once we have the equations, we will look for solutions. When Einstein first formulatedhis equations, he assumed that exact solutions would never be found. However, withinsix months a 42 year old German physicist named Karl Schwarzchild did find an exactsolution while serving in the German army. Unfortunately for Schwarzchild he died thenext year at the Russian front (although from disease). At first, Schwarzchild’s solutionwas considered a curiosity, for it had many strange properties. But it was later realizedthat it was the solution for a black hole. We will spend some time in this class studyingSchwarzchild’s solution.

It is also possible to find another class of exact solutions to Einstein’s equations.These are the solutions that describe a uniform expanding universe, which happily, isvery close to how our own universe has evolved through time. We will spend a significantpart of time on these solutions, which makes up the “Kosmologi” part of the course.

0.1 A note on units

The speed of light is a natural constant that appears throughout relativity, so physicistsoften use units to simplify this constant. In this course we will adopt this conventionand choose units where c, the speed of light, is c = 1. So for example, if we use onemeter as our unit of length, then the unit of time is also a meter. In other words, this isthe amount of time that it takes for light to travel one meter. As you can see, the speed

1A fairly complete set of notes on special relativity can be found athttp://www.teorfys.uu.se/people/minahan/Courses/SR/notes.pdf

i

Gravi. o. Kosm. Lecture Notes — J. Minahan

is c = 1 meter1 meter = 1. If we measure masses using kilograms, then energy and momentum

are also measured in kilograms. Velocities are dimensionless.The big advantage of using these units, is that it is no longer necessary to write down

factors of c. Note that we can always go back to the more conventional units at the endof any computation. So for example, suppose you are computing a time, T and youranswer is 100 meters. The conversion rate is 1 second = 3 × 108 meters. The time inseconds is then T = 100 meters/(3× 108 meters/second) ≈ 3.3× 10−7 seconds.

ii

Contents

0 Overview i0.1 A note on units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

1 Review of Special Relativity 11.1 The basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Lorentz invariants and the metric in Minkowski space . . . . . . . . . . . 51.3 Relativistic physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.3.1 More on space-time diagrams . . . . . . . . . . . . . . . . . . . . 61.3.2 The relativity of simultaneity and causality . . . . . . . . . . . . . 71.3.3 Length contraction . . . . . . . . . . . . . . . . . . . . . . . . . . 81.3.4 Time dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.3.5 The twin paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.3.6 Velocity transformations . . . . . . . . . . . . . . . . . . . . . . . 101.3.7 Doppler shifts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 The equivalence principle 142.1 Noninertial frames: the accelerating frame . . . . . . . . . . . . . . . . . 14

2.1.1 Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.1.2 The accelerating frame . . . . . . . . . . . . . . . . . . . . . . . . 17

2.2 The weak equivalence principle . . . . . . . . . . . . . . . . . . . . . . . 20

3 Tensors and basis vectors 223.1 One-forms and tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.2 Lowering indices, the inverse metric and raising indices. . . . . . . . . . . 243.3 Scalar functions and derivatives . . . . . . . . . . . . . . . . . . . . . . . 243.4 Tensors for noninertial frames . . . . . . . . . . . . . . . . . . . . . . . . 253.5 Basis vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.6 The velocity vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.7 The covariant derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4 Equations of motion in curved space 314.1 Free-falling and geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.1.1 The metric is covariantly constant . . . . . . . . . . . . . . . . . . 314.1.2 The local inertial frame . . . . . . . . . . . . . . . . . . . . . . . . 324.1.3 Free-falling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

iii


4.1.4 Geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.1.5 The particle action . . . . . . . . . . . . . . . . . . . . . . . . . . 364.1.6 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364.1.7 Light-like trajectories . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.2 Nonrelativistic motion in weak gravitational fields . . . . . . . . . . . . . 394.3 Isometries and Killing vectors . . . . . . . . . . . . . . . . . . . . . . . . 40

4.3.1 Existence of Killing vectors . . . . . . . . . . . . . . . . . . . . . 404.3.2 The example again . . . . . . . . . . . . . . . . . . . . . . . . . . 414.3.3 The conserved quantities . . . . . . . . . . . . . . . . . . . . . . . 42

5 Curvature and the Riemann tensor 435.1 Parallel transport and curvature . . . . . . . . . . . . . . . . . . . . . . . 43

5.1.1 Parallel transport . . . . . . . . . . . . . . . . . . . . . . . . . . . 435.1.2 Properties of the Riemann tensor . . . . . . . . . . . . . . . . . . 475.1.3 The Ricci tensor, Ricci scalar and Einstein tensor . . . . . . . . . 495.1.4 Independent components . . . . . . . . . . . . . . . . . . . . . . . 49

5.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505.2.1 A round two-dimensional sphere . . . . . . . . . . . . . . . . . . . 505.2.2 Weak gravitational field . . . . . . . . . . . . . . . . . . . . . . . 515.2.3 The Schwarzschild metric . . . . . . . . . . . . . . . . . . . . . . 52

6 The energy-momentum tensor 556.1 Construction and properties . . . . . . . . . . . . . . . . . . . . . . . . . 556.2 Perfect fluids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576.3 Radiation as a perfect fluid . . . . . . . . . . . . . . . . . . . . . . . . . . 58

7 Einstein equations 607.1 Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607.2 Einstein’s “greatest blunder” — Not! . . . . . . . . . . . . . . . . . . . . 61

8 The Schwarzschild solution to Einstein’s equations 638.1 The Schwarzschild metric . . . . . . . . . . . . . . . . . . . . . . . . . . . 638.2 Tests of general relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

8.2.1 Bending of light . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648.2.2 The precession of the perihelion of mercury . . . . . . . . . . . . . 66

8.3 The effective potential and the stability of orbits . . . . . . . . . . . . . . 708.3.1 Precession once again . . . . . . . . . . . . . . . . . . . . . . . . . 71

8.4 The horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728.5 Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

9 Introduction to cosmology 799.1 The Friedmann-Robertson-Walker Universe . . . . . . . . . . . . . . . . . 79

9.1.1 Flat, closed and open . . . . . . . . . . . . . . . . . . . . . . . . . 799.1.2 The Robertson-Walker metric . . . . . . . . . . . . . . . . . . . . 819.1.3 Scale dependence of the energy density . . . . . . . . . . . . . . . 81

iv


9.1.4 The Friedmann equations . . . . . . . . . . . . . . . . . . . . . . 829.1.5 Solutions for a definite equation of state . . . . . . . . . . . . . . 85

9.2 Observing the Universe . . . . . . . . . . . . . . . . . . . . . . . . . . . . 879.2.1 Red-shifts, proper distances and the age of the universe . . . . . . 879.2.2 Brightness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

10 The thermal history of the universe 9610.1 The thermal universe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

10.1.1 Photons in the early universe . . . . . . . . . . . . . . . . . . . . 9610.1.2 Other contributions to g∗ . . . . . . . . . . . . . . . . . . . . . . . 9910.1.3 The recombination temperature . . . . . . . . . . . . . . . . . . . 10110.1.4 Nucleosynthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

11 Cosmology continued 10411.1 Inflation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

11.1.1 The particle horizon . . . . . . . . . . . . . . . . . . . . . . . . . 10411.1.2 Inflation and the particle horizon . . . . . . . . . . . . . . . . . . 10511.1.3 The free lunch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

11.2 Other things to do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

v

Chapter 1

Review of Special Relativity

Before turning to general relativity we will first review some aspects of special relativitywhere gravity is absent.

1.1 The basics

In special relativity we learned that observers in different reference frames can havedifferent measurements of physical quantities. For us, a reference frame S will be aset of coordinates in 4 space-time dimensions (t, x, y, z), but in order to uniformize thecoordinates we will instead write this as

(x0, x1, x2, x3) , (1.1.1)

where x0 = ct = t is the time coordinate. Notice that we have replaced c with 1. A pointin space-time is called an event and the displacement between two events we will write

as−→∆x where

−→∆x is a 4-vector with components ∆xµ where µ = 0 . . . 3. An observer at

rest in a different reference frame S′ will have a different set of coordinates

(x0′ , x1′ , x2′ , x3′) , (1.1.2)

and will measure different components for the displacement 4-vector, ∆xµ′. We are

assuming that both observers are referring to the same two events so the displacement4-vector is the same in both frames. What differs is how the vector is written in thecomponents of the respective frames. To perhaps make this clearer, consider the twodimensional vector ~A in figure 1.1. Two sets of coordinates are given (x, y) and (x′, y′)and the primed coordinates are related to the unprimed coordinates by a rotation ofangle θ. In the unprimed coordinates the components of ~A are (a, 0), while in the

primed coordinates they are given by (cos θ a,− sin θ a), where a is the length of ~A. Butthe vector itself has not changed.

In special relativity we are usually concerned with a particular type of reference framecalled an inertial frame. An inertial frame has the properties:

1


Ax

x’

y

y’

!

Figure 1.1: Vector ~A in terms of two different coordinate systems.

1. There is a universal time coordinate that can be synchronized everywhere in theinertial frame. This means is that at every spatial point in the reference frame wecan place a clock and that all the clocks agree with each other.

2. The spatial components are Euclidean. That is, the spatial components satisfy allaxioms of Euclidean geometry.

3. A body with no forces acting on it will travel at constant velocity according to theclocks and measuring sticks in the inertial frame.

An important question is how a measurement differs for two observers who are atrest in different inertial frames, which we call S and S′. To answer this question we turnto Einstein’s two postulates for special relativity,

1. The laws of physics are identical in any inertial frame.

2. The speed of light in a vacuum, c, is the same in any inertial frame.

The first postulate tells us that the relation between coordinates in S and S′ is linear.To see this, suppose that there is a body with no forces on it and so moves at constantvelocity in both frames. The body moves on a space-time trajectory called a world line.In figure 1.2 we show the world-line for this body on a space-time diagram. Since thereare no forces on the body, the world-line is a straight line.

To measure a velocity we observe the displacement between any two events on theworld-line, say for instance P and Q. The velocity components in S are vi = ∆xi

∆x0where

i = 1, 2, 3 (we will often use latin indices to signify spatial components) while those in S′

are vi′

= ∆xi′

∆x0′. The velocity components for either frame are the same no matter which

two events we choose. In order for this to be true for any choice of the two events, wemust have

∆xµ′= Λµ′

ν ∆xν , (1.1.3)

2


Q

x

x0

1

P

Figure 1.2: A space-time diagram showing the world-line for a body passing throught thespace-time points P at the origin and Q. The dashed lines are light-like trajectories.

where Λµ′ν are constants in xµ and we use the repeated index convention where an

identical index in an up and down position is summed over. Such repeated indices aresometimes called dummy indices, while an unrepeated index is called a free index. TheΛµ′

ν are components of the Lorentz transformation matrix relating frame S to frameS′. The different frames can be related to each other through boosts, or by rotationsin space, or combinations of the two. For a boost in the x-direction, we say that S′ ismoving with velocity v x with respect to (wrt) S. Einstein’s second postulate tells usthat if two observers were to measure the speed of a light ray, they would both find theuniversal constant c, which in our units is 1. In order for this to be true, the Lorentztransformation matrix for the boost must be

Λ =

γ −v γ 0 0−v γ γ 0 0

0 0 1 00 0 0 1

, where γ =1√

1− v2. (1.1.4)

The inverse Lorentz transformation is

∆xµ = (Λ−1)µν′ ∆xν′ , (1.1.5)

where for a boost in the x direction is

Λ−1 =

γ +v γ 0 0

+v γ γ 0 00 0 1 00 0 0 1

. (1.1.6)

It is obvious from the placement of the primed and unprimed indices that (Λ−1)µν′ isthe inverse, so we drop the exponent “−1” and simply write it as Λµ

ν′ .

3


A boost in the y direction with velocity v would look like

Λy−boost =

γ 0 −v γ 00 1 0 0−v γ 0 γ 0

0 0 0 1

, (1.1.7)

which is a different Lorentz transformation. A Lorentz transformations that is a rotationof angle φ in the x− y plane leaves the x0 and x3 coordinates alone and only mixes thex1 and x2 coordinates. The matrix for this Lorentz transformation is given by

Λrot =

1 0 0 00 cosφ − sinφ 00 sinφ cosφ 00 0 0 1

, (1.1.8)

There is a way of reformulating boosts so that they look like the rotations in (1.1.8).We can define the rapidity, ξ, as cosh ξ = γ and sinh ξ = v γ. Notice that the identitycosh2 ξ − sinh2 ξ = 1 is automatically satisfied. Then the boost matrix Λ in (1.1.4)becomes

Λ =

cosh ξ − sinh ξ 0 0− sinh ξ cosh ξ 0 0

0 0 1 00 0 0 1

, (1.1.9)

with an obvious similarity to the transformation in (1.1.8).

Instead of finite displacements−→∆x we will have occasion to consider infinitesimal

displacements−→d x. In fact, once we consider gravity we will be forced to do so. The

infinitesimal displacement has the same Lorentz transformation properties,

d xµ′= Λµ′

ν d xν . (1.1.10)

By the rules of differential calculus with several variables, we know that when we changethe variables the relation between the differentials is given by

d xµ′=∂ xµ

′

∂ xνd xν , (1.1.11)

hence we identify Λµ′ν = ∂ xµ

′

∂ xν. Notice that (1.1.11) applies for any two reference frames;

neither S nor S′ has to be an inertial frame in order for this to be true. However, (1.1.10)is true only if S and S′ are both inertial frames.

For a general 4-vector ~A, the Lorentz transformation properties for the componentsare exactly the same

Aµ′= Λµ′

ν Aν . (1.1.12)

A 4-vector with these transformation properties is called contravariant.

4


1.2 Lorentz invariants and the metric in Minkowski

space

We are often interested in quantities that are the same as measured by any observer, nomatter what the reference frame. If we are talking about inertial frames then we callsuch quantities Lorentz invariants, but if we are considering noninertial frames, then wesimply call these invariants. One of the most important invariants is the length squared∆ s2. For an inertial frame S this is given as

∆ s2 ≡ −(∆x0)2 + (∆x1)2 + (∆x2)2 + (∆x3)2 ≡−→∆x ·

−→∆x , (1.2.1)

or in differential form,

d s2 = −(d x0)2 + (d x1)2 + (d x2)2 + (d x3)2 ≡−→d x ·−→d x . (1.2.2)

As you can see (1.2.1) looks like the length squared for a Euclidean space, except forthe minus sign in front of (∆x0)2. This is clearly invariant under rotations in the spatialdirections. It is also invariant under boosts, as is most easy to check using the form ofthe transformations in (1.1.9).

We can also define ∆s2 ≡ −∆τ 2, where τ is called the proper time. The propertime is the time elapsed in the rest frame of the body whose time we are considering.In the rest frame, the spatial components of the displacement are ∆xi = 0, so clearly∆x0 = ∆τ .

We can write (1.2.1) and (1.2.2) as

∆ s2 = ηµν ∆xµ ∆xν

d s2 = ηµν d xµ d xν ,

where we again use the repeated index notation and where

ηµν = diag(−1,+1,+1,+1) . (1.2.3)

ηµν is called the metric for flat Minkowski space. We will explain what we mean by flatin a later lecture. A more general metric for a noninertial frame, that is one that is notflat Minkowski space, will be written as gµν and the length squared as

ds2 = gµν dxµdxν . (1.2.4)

It is not uncommon to refer to ds2 as the metric since it contains the metric components.Usually when we are speaking of noninertial frames we will need to consider the infinites-imal displacements. Notice that since dxµdxν = dxνdxµ, only the symmetric part of gµνwill contribute, so without any loss of generality we can let gµν = gνµ.

As an example, the invariant length squared for flat Euclidean space in three dimen-sions can be written as

ds2 = (d x1)2 + (d x2)2 + (d x3)2 = gij d xi dxj , g = diag(+1,+1,+1) . (1.2.5)

5


(It is convential to use latin letters i, j, k etc. for spatial indices). However, if we usecylindrical coordinates (r, φ, z) where x1 = r cosφ, x2 = r sinφ, x3 = z, then

dx1 = cosφ dr − r sinφ dφ , dx2 = sinφ dr + r cosφ dφ dx3 = dz. (1.2.6)

Hence the invariant length squared is

ds2 = dr2 + r2dφ2 + dz2 , (1.2.7)

and the various components of the metric are

grr = 1 gφφ = r2 gzz = 1 . (1.2.8)

It is possible to have off-diagonal terms, such as grφ, but in this case they are zero. Wecan also see for the cylindrical coordinates that it was necessary to restrict to infinitesimaldisplacements, since the metric is not constant but depends on the coordinate r.

1.3 Relativistic physics

Throughout this section we assume that S′ is moving with velocity vx wrt S.

1.3.1 More on space-time diagrams

Let us return to the space-time diagram in figure 1.2. The diagram shows two trajectories(dashed lines) emanating from P at 45 degree angles. On these trajectories, we havethat x1 = ±x0. Hence, if these are trajectories of particles we see that they have velocityu = x1/x0 = ±1. Therefore, these particles are traveling at the speed of light and wecall the trajectories light-like trajectories. They are also called null trajectories since fora displacement along this trajectory, (∆s)2 = (∆x1)2 − (∆x0)2 = 0.

The x0 axis is itself the trajectory for a body at rest in frame S. Hence this is calleda time-like trajectory. A time-like trajectory has (∆s)2 < 0. The x1 axis is called aspace-like trajectory. These trajectories have (∆s)2 > 0. Trajectories for bodies are alsoknown as world-lines.

We can also include the axes for a different frame on our space-time diagram. Noticethat for frame S the x1 axis is defined by the line x0 = 0, while the x0 axis is definedby the line x1 = 0. Hence, we can draw axes for a different frame S′ by finding the linesx0′ = 0 and x1′ = 0. Let us assume that the origin is the same for both frames. Then theline x0′ = 0 is γx0 − v γx1 = 0. Hence, this is the line x0 = v x1. Assuming that v < 1,then the slope of this line is less than 1. Likewise, the line x1′ = 0 is γx1 − v γx0 = 0which leads to the line x0 = 1

vx1 which has a slope greater than one. The space-time

diagram showing both sets of axes is shown in figure 1.3. Notice that as v gets closerto c, the slopes of the x0′ axis and the x1′ axis get closer to one. In other words theyapproach the null trajectory from opposite sides. If S′ is the rest frame of a body, thenthe x0′ axis is the body’s trajectory, assuming that the trajectory goes through the originat P.

6


1’

0

x1P

x0’

x

x

Figure 1.3: A space-time diagram showing the axes for reference frames S and S′.

1.3.2 The relativity of simultaneity and causality

Figure 1.4 shows the space-time diagram with events R and Q included. Notice thatevent R occurs at the same time as event P according to an observer in S since bothevents sit on the x1 axis which has x0 = 0. But an observer in S′ would see somethingdifferent. According to this observer, the event R happened before P since R is belowthe x1′ axis and hence occurred for some time when t′ < 0. On the other hand, event Qis simultaneous with P according to the S′ observer, but occurs after P according to theS observer. Thus we see that the notion of simultaneous events is a relative concept.

We can now ask if event P can cause event Q. We can say that this causality wouldoccur if a signal can emanate from P and travel to Q, thus causing it. Let us supposethat the signal is some body whose world-line is the x1′ axis. One consequence of this isthat the body’s speed is greater than the speed of light. One can already see that troubleoccurs if we try to boost to a frame moving faster than the speed of light, because γwill be imaginary. But there is a more fundamental problem. According to the thirdreference frame in figure 1.4, S′′, event Q happened before event P. Hence, there is noway that P could cause Q. Hence, we must conclude that there is no way to send a signalalong x1′ . In fact, we cannot send a signal along any space-like trajectory, only alongtime-like or null trajectories. From this we also must conclude that no physical particlecan have a world-line along a space-like trajectory. Looking again at figure 1.4, we seethat event W is connected to P by a time-like trajectory. Hence, P can cause W. Whenthis occurs for two events we say that they are causally connected. It is also true thatW and P are not simultaneous in any inertial frame.

Figure 1.5 is known as a light-cone diagram. Any event in the shaded region belowP, including the boundary can cause P, because there is a time-like or null world-linethat connects this event to P. This region is called the past light-cone. Similarly, anyevent in shaded region above P can be caused by P because there is a time-like or nullworld-line connecting P to the event.

7


W

0

x1

x1’’

P

x0’

x1’

R

Q

x0’’x

Figure 1.4: Event R is simultaneous with P according to an observer in S but not accordingto an observer in S′. Similarly, event Q is simultaneous with event P according to an observerin S′, but not according to an observer in S. Event W is causally connected to P.

1.3.3 Length contraction

Suppose we have a bar of length L that is stationary in S′ and is aligned along the xaxis. What length would an observer in S measure?

In problems of this type, the key to solving this question is properly setting up theequations. In this case one should ask “how would an observer in S measure the lengthof the bar?” A reasonable thing to do is to measure the positions of the front and back ofthe bar simultaneously, that is find their positions at the same time t, and then measurethe displacement, ∆x ≡ ∆x1. Since the observer is making a simultaneous measurementaccording to his clocks, we have that ∆t ≡ ∆x0 = 0. We also know that the displacementin S′ is ∆x′ = L since the bar is stationary in this frame.

We now use the relations in (1.1.5) and (1.1.6) to write

∆t = 0 = γ (∆t′ + v∆x′) ⇒ ∆t′ = −v∆x′

∆x = γ (∆x′ + v∆t′) = γ(1− v2

)∆x′ =

L

γ. (1.3.1)

Hence the observer in S measures a contracted length since γ > 1.This is particularly clear if we look at the corresponding space-time diagram shown

in figure 1.6. In this diagram we have shown the trajectories for the front and back ofthe bar. We are still assuming that S′ is the rest-frame. The length of the bar in therest-frame is the length between the intersection points of the trajectories with the x1′

axis, while the length in the S frame is the length between the intersection points on thex1 axis. Clearly the distances are not the same and in fact the distance is longer on thex1′ axis.

8


P

0

x1

x

Figure 1.5: Light-cone diagram for event P. Those events in the past light-cone can cause Pand those events in the future light-cone can be caused by P.

1.3.4 Time dilation

Another interesting phenomenon is time dilation. Suppose we have a clock at a fixedpoint in S′ and it measures a time interval ∆t′. What is the time interval measured byan observer in S? Note that ∆t′ = ∆τ is the elapsed proper time on the clock.

Again, the key to solving this question is setting up the problem correctly. Since theclock is stationary in S′ we have that there is no spatial displacement in this frame, thus∆x′ = 0. Then we just use (1.1.3) and (1.1.4) to obtain

∆t = γ∆t′ . (1.3.2)

Since γ > 1, the observer in S measures a longer elapsed time than the proper time ofthe clock. Therefore, the clock in S′ seems to be running slow according the observer’sclocks in S.

1.3.5 The twin paradox

In our discussion of time dilation, you might have noticed the possibility of a contradic-tion. We argued that an observer in S would measure the stationary clocks in S′ to berunning slow. But an observer in S′ would also think the clocks in S are running slow.We can then spin this into a paradox as follows:

Suppose there are two clocks A and B and the world-lines of the clocks both passthrough the space-time point P. We assume that A is at rest in S and B is at rest in S′.B then travels to a space-time point Q at which point it accelerates into a new framewith velocity −v wrt S and travels back to meet up with A again at the space-time pointR. Which clock has more elapsed time? The paradox is that naive thinking would saythat they both think the other clock is slower, since at all times one clock was movingwith velocity ±v wrt the other clock. But that can’t be right. But a little bit of more

9


!

0

x1P

x0’

x1’

Back Front

L

L

x

Figure 1.6: Space-time diagram for a bar stationary in S′. The world-lines for the front andback of the bar are shown.

careful thinking shows that the problem is not entirely symmetric, since B undergoesan instantaneous acceleration halfway through its journey, while A does not. This isespecially clear if we look at the space-time diagram in figure 1.7, where obviously theworld-line for A looks different than the world-line for B.

We can then compare the times of the clocks by measuring the invariant lengths alongthe trajectories. From the diagram, it is clear that B’s invariant length is twice that ingoing from P to Q. Letting ∆tA be the elapsed time along A’s trajectory between Pand R, then the elapsed time in going on B’s trajectory is

∆tB = 2

√(∆tA

2

)2

− v2

(∆tA

2

)2

=∆tAγ

. (1.3.3)

Hence, B’s clock has less elapsed time and both sides would agree. Thus, there is noparadox.

Notice that while B’s trajectory looks longer in the diagram, the elapsed time isshorter. This is because of the relative minus sign that appears in the invariant in(1.2.1).

We can also obtain our result another way. Suppose that the distance that B travelsaway from A is L according to an observer in S. Then A would think the total time forB’s journey is ∆tA = 2vL. But B would see this length contracted to L/γ, so accordingto his clock the time for the journey is ∆tB = 2vL/γ = ∆tA/γ.

1.3.6 Velocity transformations

Suppose that a body is moving with 3-velocity ~u ′ in inertial frame S′. What is the3-velocity ~u measured by an observer in S? To measure a velocity, one measures thespatial displacement and divides by the elapsed time. Thus, the velocity components in

10


!

0

x1P

x0’

x1’

"

!

Q

Rx

Figure 1.7: Space-time diagram for two clocks A and B. B has an instaneous acceleration atpoint Q.

S′ are given by

u′1 =∆x1′

∆x0′, u′2 =

∆x2′

∆x0′, u′3 =

∆x1′

∆x0′, (1.3.4)

while the components in S are

u1 =∆x1

∆x0, u2 =

∆x2

∆x0, u3 =

∆x1

∆x0, (1.3.5)

We again make use of the Lorentz transformations to write the velocity components in(1.3.5) as

u1 =∆x1′ + v∆x0′

∆x0′ + v∆x1′=

u′1 + v

1 + u′1v,

u2 =∆x2′

γ(∆x0′ + v∆x1′)=

u′2γ(1 + u′1v)

,

u3 =∆x3′

γ(∆x0′ + v∆x1′)=

u′3γ(1 + u′1v)

, (1.3.6)

From (1.3.6) one can show that if |~u ′| < c and v < c, then |~u| < c. To see this,consider the combination

1− (~u ′)2 =(∆x0′)2 − (∆x1′)2 − ((∆x2′)2 − ((∆x3′)2

(∆x0′)2> 0 . (1.3.7)

The numerator is the invariant −∆s2 and since (∆x0′)2 > 0, the numerator must alsobe greater than zero. Therefore

1− (~u)2 =(∆x0′)2 − (∆x1′)2 − ((∆x2′)2 − ((∆x3′)2

(∆x0′)2> 0 . (1.3.8)

11


1.3.7 Doppler shifts

Suppose we have a light source with wavelength λ = 1/ν whose rest-frame is S′ and isshining light at an observer in S. We can think of the light source as a clock which issending a signal at a regular time interval, ∆t′ = 1/ν. In other words, this is the timefor one wavelength of light to be emitted. The signal of course travels at the speed oflight. This clock is time dilated in S to ∆t = γ∆t′. But the question is more involvedthan just time dilation. The light being emitted is being observed by an observer at afixed position in S. We are not comparing the clock in S′ with a series of clocks it passesin S as the light source moves along.

Let us say that at t = 0 the light source is at x = 0 moving with velocity v in the xdirection. The observer is fixed at x = 0. If the light source sends a signal at t = 0 thenthe observer receives it instantaneously because the signal has zero distance to travel.The source then sends another signal at time t = ∆t. But the light source is movingaway from the observer and this signal is sent from position x = v∆t, and so the observerdoes not receive it immediately, but at a later time

t = ∆T = ∆t+ v∆t = γ∆t′ (1 + v) = ∆t′√

1 + v

1− v. (1.3.9)

Hence the time it takes the observer to see one wavelength is ∆T and so the lightfrequency as seen by the observer is

νobs =1

∆T= ν

√1− v1 + v

, (1.3.10)

and the wavelength is

λobs =1

νobs

= λ

√1 + v

1− v. (1.3.11)

Since the wavelength increases, we call this a red-shift. To find the shift if the source ismoving toward the observer, all we need to do is replace v with −v, hence we have

λobs = λ

√1− v1 + v

. (1.3.12)

Since the wavelength is smaller, we call this a blue-shift.It is instructive to look at the space-time diagram for this process. Figure 1.8 shows

the world-line of the source as the x0′ axis. Included are the world-lines for light signalssent a time ∆t′ apart according to the source’s clock and sent back toward the observerat x = 0. The diagram shows the difference between ∆t and ∆T .

12


T

0

x1

x0’

x1’

!c

!c

Source

Observer t

x

Figure 1.8: A space-time diagram for a light source emitting light toward an observer. Thesource’s world-line is the x0′ axis, while the observer’s world-line is the x0 axis.

13

Chapter 2

The equivalence principle

In this chapter we discuss Einstein’s equivalence principle in the weak form that equatesa constant gravitational field with a uniformly accelerating reference frame.

2.1 Noninertial frames: the accelerating frame

2.1.1 Acceleration

Let us now consider the case of acceleration and how an observer in a frame differentfrom the accelerating object would see this. The proper acceleration, ~α is defined asthe 3-acceleration in the rest frame of the accelerated body. Now since this rest frameis accelerating with respect to the observer’s rest frame, which we are assuming is aninertial frame, it means that the body’s rest frame is not an inertial frame. However,at any time there is an inertial frame where the body is at rest in. We call this inertialframe its instantaneous rest frame. This can also be called the momentary rest frame.At a later time, the instantaneous inertial frame is different from the one it is in now.

To avoid too many complications, let us simplify the problem a bit and assume thatthe velocities and accelerations are in the x1 ≡ x direction only, so ~α = αx. We also uset for x0. We then let S be the frame of the observer and S′(t) be the instantaneous restframe at time t. If ~u = u x is the velocity of the body as seen by the observer, then v = uat t. In S′(t) we have that u′ = 0. At a slightly later time t+ dt, where dt is assumed tobe an infinitesimal displacement, we have that u′ has an infinitesimal change, du′, sinceS′(t) is no longer the instantaneous rest frame, S′(t + dt) is. But from the definition ofthe proper acceleration, we have that

du′ = α dt′ . (2.1.1)

Since S′(t) is the instantaneous rest frame, dt is related to dt′ by time dilation:

dt = γu dt′ , γu =

1√1− u2

. (2.1.2)

The change in velocity du as seen by the observer in S is then found using the velocitytransformations in (1.3.6) of chapter 1. Given the velocity in S′(t) is du′ and the velocity

14


in S is u+ du then the transformation formula leads to

du =du′ + u

1 + u du′− u ≈ (du′ + u) (1− u du′)− u ≈ du′

(1− u2

), (2.1.3)

where we used that |u| < c and |du′| << 1 to justify the approximations.The acceleration, a, as measured by the observer in S is then

a =du

dt=du′

dt′(1− u2

)3/2=

α

γu3. (2.1.4)

Notice that a is approaching 0 as u approaches c, which is reasonable since we shouldnot be able to accelerate the body through the speed of light.

Let us take the further special case that the proper acceleration α is constant. Wethen observe that

d

dt(γu u) = γu

du

dt+ u

d

dt

1√1− u2

=(γu + u2γu

3) dudt

= γu3 du

dt. (2.1.5)

Hence, we can write (2.1.4) as

d

dt(γu u) = α (2.1.6)

which has the solution

γu u = αt+ u0 (2.1.7)

where u0 is a constant. If we take the initial condition that u = 0 at t = 0, then u0 = 0.Squaring both sides of the above equation gives

u2

1− u2= α2t2 , (2.1.8)


u =αt√

1 + α2t2. (2.1.9)

Now using u = dxdt

we get

dx =αt dt√

1 + α2t2, (2.1.10)

which can be integrated to give

x =1

α

√1 + α2t2 + x0, (2.1.11)

where x0 is a constant. Since the constant is just a shift in x we drop it. Hence, thissolution can be written as

x2 − t2 =1

α2, (2.1.12)

15


which is the equation for a hyperbola. This solution is graphed in figure 2.1, where thehorizontal axis is the x axis and the vertical axis is the t axis. The hyperbola crossesthe x axis at x = α−1. Notice that as t becomes large, the hyperbola approaches thelight-like trajectory which is the dashed line in the plot. Interestingly, if at t = 0 a lightray is emitted at x = 0, we see that it will never catch up with the accelerating bodysince the dashed line never intersects the hyperbola. By the way, notice that α has unitsof inverse length as is clear from (2.1.12). Hence αx and αt are dimensionless.

Figure 2.1: Graph of the hyperbolic trajectory in (2.1.12). The horizontal axis is the x coor-dinate and the vertical axis is the x0 = t coordinate. The dashed lines are light-like lines thatdefine the limits of the hyperbola. The hyperbola intersects the x-axis at x = 1/α.

It is of interest to find the proper time τ of the accelerating body in terms of thetime t. By time dilation we can relate dτ to dt by

dτ =dt

γ(t)= dt

√1−

(dx

dt

)2

= dt

√1− (αt)2

(αt)2 + 1=

dt√(αt)2 + 1

. (2.1.13)

We can integrate the first and the last expressions to give∫ τ

0

dτ =

∫ t

0

dt√(αt)2 + 1

τ =1

αarcsinh(ατ) =

1

αarctanh

(αt√

(αt)2 + 1

)=

1

αarctanh

(t

x

).(2.1.14)

The arctanh is the inverse of the tanh function, where tanh(ξ) is given by

tanh(ξ) =eξ − e−ξ

eξ + e−ξ. (2.1.15)

It then follows that

t

x= tanh(ατ) , (2.1.16)

16


and so using (2.1.15)

τ =1

2αlog

(x+ t

x− t

). (2.1.17)

As t→∞ we see by Taylor expanding (2.1.12) that x ≈ t+ 12α2t

in which case

τ → 1

2αlog(4α2t2

)=

1

αlog(2αt) . (2.1.18)

Hence as t→∞, τ also diverges, but only logarithmically.

2.1.2 The accelerating frame

Let us now assume that we have many bodies with different but fixed proper acceler-ations. Let us further assume that these bodies are separated from each other in thex direction as in figure 2.2, such that their intersection on the x axis is their properacceleration. The difference between the different curves is just a different choice of α in(2.1.12). Let us now choose a reference frame where all of these accelerating bodies areat rest. We will call this the accelerating frame. We can then use x ≡ α−1 as a spatialcoordinate for this frame, where the world-lines for the accelerating bodies correspondto curves of constant x.

For t, the time coordinate in the accelerating frame, we will use the proper timefor one particular accelerating body, say one with constant acceleration α0, which isthus fixed at spatial coordinate x = 1/α0. Hence we see from (2.1.14) that the lines ofconstant t are those that keep the ratio t/x fixed, in other words these are straight linesemanating from the origin. These are shown by the dashed red lines in the figure, withthe thicker lines showing the limiting cases. The values of t range from t = −∞ wheret/x = −1 to t =∞ where t/x = +1. To summarize, the coordinates in the acceleratingframe with respect to the coordinates in the original inertial frame are given by

t =1

α0

arctanh

(t

x

)x = sgn(x)

√x2 − t2

y = y z = z . (2.1.19)

These coordinates are called Rindler coordinates and the accelerating frame is also calledRindler space.

Let us now suppose that there are a series of light sources fixed at different positionsin x and with frequency ν. We further assume that there is an observer at x = 1/α0,and so has proper time t. This observer can measure the clock rates at the other spatialpoints by measuring the frequency of the observed light emanating from those points.We want to show that this corresponds to the proper time at the position of the lightsource. Without any loss of generality we focus on one source located at x = x1 whichemits a light ray at t = 0. This corresponds to t = 0, x = x1 and so the momentaryinertial frame when the light is emitted is S. If x1 < 1/α0 then the world-line of the light

17


Figure 2.2: The world-lines for several accelerating bodies. Each curve corresponds to a con-stant value of the spatial coordinate x in the accelerating frame. The curves on the left handside correspond to bodies with negative constant accelerations. Along the x axis x = x. Thedashed red lines are the lines of constant t. The blue lines are forward and backward world-linesfor light rays emitted at t = 0

seen by the observer at x = 1/α0 is parameterized by x = t+ x1, while if x > 1/α0 thenthe world-line of the light is x = −t + x1 (see figure 2.2). For x < 1/α0 the light rayintersects the observer’s world-line when

(t+ x1)2 = t2 + α−20 (2.1.20)

which we can solve for but don’t actually need. At this time t the velocity of theaccelerating body wrt S is

v =dx

dt=t

x=

α0t√(α0t)2 + 1

, (2.1.21)

Therefore the light is redshifted by the factor (see the Doppler shift section of the firstchapter) √

1− v1 + v

=√

(α0t)2 + 1− α0t = α0x− α0t = α0x1 , (2.1.22)

where we inserted the expression for the light’s world-line in the last step. Hence, theobserver thinks the clock at x1 is slow by a factor of α0x1. We can repeat the steps forx > 0, or we can just continue the result in (2.1.22) to x1 > 1/α0. In any case, if we haddirectly computed the proper time at x1 we would have found

τx1 = x1arctanh(t/x) = α0x1t , (2.1.23)

which is consistent with the redshift result.

18


We now want to use this information to construct ds2 for these coordinates. From(2.1.22) we know that the proper time for a body at fixed x will come with a factor ofα0x as compared to t. Hence, we expect ds2 to have the form

ds2 = −α20 x

2dt 2 + f 2(α0x)dx2 + dy2 + dz2 , (2.1.24)

where f(α0x) is a yet to be determined function. A straightforward calculation (whichis left as an exercise) using the expressions for the Rindler coordinates and comparingto the metric in terms of t and x,

ds2 = −dt2 + dx2 + dy2 + dz2 , (2.1.25)

shows that f(α0x) = 1.For later purposes it will be convenient to do one further change of variables. To this

end, we define a new coordinate x such that

1 + 2α0x = (α0x)2 . (2.1.26)

It then follows that the differentials are related by

2α0dx = 2α20 x dx . (2.1.27)

Letting t = t, y = y, z = z and substituting into ds2 we then find

ds2 = −(1 + 2α0x)dt 2 +dx 2

1 + 2α0x+ dy2 + dz2 . (2.1.28)

Actually this transformation is only good if x > −1/(2α0). If x < −1/(2α0) then t > xand so the argument of the arctanh in (2.1.19) is greater than 1, in which case thearctanh is complex. Hence for t > x we choose t = 1

α0arctanh(x/t) which differs from t

in (2.1.19) by the imaginary constant 12α0

log(−1), as you can see by looking at (2.1.17).Let us now make some observations of the metrics in (2.1.28) and (2.1.24):

1. The metric in (2.1.28) looks singular when x = −1/(2α0). This however is a fake.We know that nothing untoward happens for this space-time since by a changeof coordinates we can go back to the (t, x) basis where the metric is that of flatMinkowski space which has no singularities. The singularity that we are seeinghere is is known as a coordinate singularity. Later on we will see more serioussingularities, as well as find out how we can distinguish these real singularitiesfrom the coordinate singularities.

2. Even more peculiar is that if we go to region where x < −1/(2α0) then the tcoordinate becomes space-like and the x coordinate becomes time-like, since theoverall sign changes in the metric.

3. As a light source is moved closer to x = −1/(2α0) it becomes more and moreredshifted until it reaches the horizon where it’s frequency drops to zero. Whathappens as the source goes beyond x = −1/(2α0)? Then the observer at x = 0will never see it; the source has gone beyond the horizon where the light is trappedinside.

19


4. The metric in (2.1.24) looks like the metric for cylindrical coordinates discussedin the last lecture. In that case, r =

√x2 + y2 and φ = arctan(y/x). If you

compare to the Rindler coordinates you can see why the metrics look similar. Infact, if you replace t with iy, x with r and t with iφ then they are the same. Thistransformation is called a rotation to Euclidean coordinates.

2.2 The weak equivalence principle

Suppose we have an observer at rest at x = 0 in the accelerated frame using the metricin (2.1.28). Let there also be a body that is moving with velociy vy in the frame S andthat intersects the observer’s world line at t = t = 0. What does the motion of the bodylook like to the observer? We can convert the coordinates to determine the motion inthe accelerated frame. In particular, the world-line in S is parameterized by x = 1/α0,y = vt. Therefore, in the accelerated coordinates we can parameterize x in terms of y as

x = − α0

2v2y2 . (2.2.1)

In other words, the trajectory as seen by the observer in the accelerated coordinates is adownward falling parabolic arc, where downward means the negative x direction. Thisis exactly what an observer would see if they were near the surface of the earth and werewatching a free-falling body. The observer thinks the body has a downward accelerationof α0, so if α0 = g where g is the acceleration of gravity at the earth’s surface, then tothe observer this would look exactly the same as if he or she were at rest near the earth’ssurface.

Furthermore, it did not matter what the mass of the body is, or what its compositionis. Given its velocity in the inertial frame, it would have the same trajectory as observedby the fixed observer in the accelerated frame. We know that gravity has this sameproperty, namely that the trajectory of any falling body is the same (ignoring of coursewind resistance).

We can now turn this argument around. Suppose that we had an observer sitting atthe surface of the earth. Because of the gravity, this is not an inertial frame. However,a body that is free-falling toward the earth is in an inertial frame. We can now makethe statement that a body free-falling in a uniform gravitational field is in an inertialframe. Furthermore the observer who is at a fixed position in the gravitational field is inan accelerated frame. This statement is known as the principle of equivalence. Actually,this is the weak version of the equivalence. The strong version states that all laws ofphysics are the same in the free-falling frame as in special relativity.

We should explain what we mean by a uniform gravitational field. It means that thegravitational acceleration in x with respect to the inertial time t is constant. It is notconstant with respect to the proper time of a fixed observer in the accelerating frame.Note that in terms of the time t, the position x is given by

x = −1

2α0 t

2 = − 1

2α0

tanh2(α0 t) , (2.2.2)

20


and so the acceleration with respect to t is −α0 while the acceleration with respect to tis

d2x

dt2= −α0

(3− 2sech2(α0 t)

)sech2(α0 t) . (2.2.3)

For α0 t << 1 we can approximate (2.2.3) as

d2x

dt2≈ −α0 , (2.2.4)

which is the constant free-falling form, but for α0 t >> 1 the acceleration falls off expo-nentially as x approaches the horizon at x = − 1

2α0.

Actually, since the acceleration of gravity toward the earth depends on the distanceto the earth’s center, a free falling body is not exactly in an inertial frame. In fact,whenever a gravitating body is present an inertial frame does not exist. Instead, we cansay that the free-falling body is in a local inertial frame. This basically means that ifwe only look at very short distances and times, it looks the same as an inertial frame,but as the distances get larger it is no longer inertial. A frame that is an inertial frameeverywhere is sometimes called a global inertial frame.

The equivalence principle also tells us what will happen to a light ray that is directedupward at the earth’s surface. Since it is equivalent to being in an accelerated frame (atleast locally), then we know it will be red-shifted. This was confirmed by Pound andRebka in 1960. One of the exercises concerns this effect.

One final comment. The metric in (2.1.28) has the form

ds2 = −(1 + 2Φ(x))dt 2 +dx 2

1 + 2Φ(x)+ dy2 + dz2 . (2.2.5)

Note that Φ(x) = α0x is the gravitational potential for a constant gravitational fieldwith acceleration α0. We will see later on that metrics in the presence of gravitationalfields will have this general form even for non-constant gravitational fields.

21

Chapter 3

Tensors and basis vectors

3.1 One-forms and tensors

Let ~A be a general 4-vector, with the displacement vector being one example of this.We define a one-form as a linear map of all 4-vectors to the real numbers, the result ofwhich is an invariant. The word map is just a fancy way of saying it takes a vector asan input and spits out a real number as an output. We will write this as Φ( ~A). Sincethe map is linear, we have that

Φ(c1~A1 + c2

~A2) = c1 Φ( ~A1) + c2 Φ( ~A2) , (3.1.1)

where c1 and c2 are arbitrary real numbers and ~A1 and ~A2 are any two 4-vectors. HenceΦ( ~A) must have the form

Φ( ~A) = ΦµAµ (3.1.2)

where Φµ are the components of the one-form. The index is down so that we know to sumover it. In order for this to be invariant the components must have special transformationproperties. Given the transformation properties for the components of ~A in (1.1.12) ofchapter 1, the components of Φ transform under the inverse Lorentz transformation

Φµ′ = Λνµ′Φν . (3.1.3)

Since Φ has 4 components it is often called a covariant 4-vector. In fact, the down indexon the component tells us that it is covariant.

A nice way to illustrate a one-form is shown in figure 3.1 for vectors in two spatialdimensions. The dashed lines represent the one-form Φ. The map gives the number oftimes the vector crosses the lines, so from the figure we see that Φ( ~A) = 2 and Φ( ~B) = 6.Notice that these numbers are the same whether we use frame S or S′, even though thevectors have different components in the two frames. Since A1′ = cos θA1 + sin θA2 andA2′ = − sin θA1 + cos θA2, we have that Φ1′ = cos θΦ1− sin θΦ2, Φ1′ = sin θΦ1 + cos θΦ2.Notice decreasing the spacing between the dashed lines increases the value of the map.

We can generalize the one-forms to other maps called(

0n

)tensors. These are linear

maps of n vectors to the real numbers, which we write as T ( ~A1, ~A2, . . . ~An). The map is

22


!

A

x

x’

y

y’

B

Figure 3.1: The one-form Φ is represented by the dashed lines. The map gives the number oftimes a vector crosses a dashed line.

assumed to be linear for every vector input, in other words

T (a ~A1 + b ~B1, ~A2, . . . ~An) = a T ( ~A1, ~A2, . . . ~An) + b T ( ~B1, ~A2, . . . ~An)

T ( ~A1, a ~A2 + b ~B2, . . . ~An) = a T ( ~A1, ~A2, . . . ~An) + b T ( ~A1, ~B2, . . . ~An)...

T ( ~A1, ~A2, . . . ,~a ~An + b ~Bn) = a T ( ~A1, ~A2, . . . ~An) + b T ( ~A1, ~A2, . . . ~Bn) . (3.1.4)

In order for this to be true we must have

T ( ~A1, ~A2, . . . ~An) = Tµ1µ2...µnAµ1Aµ2 . . . Aµn , (3.1.5)

where the Tµ1µ2...µn are the components of the map. In order for this to be Lorentzinvariant we also require that each index transform accordingly under the inverse Lorentztransformation:

Tµ′1µ′2...µ′n = Λν1µ′1

Λν2µ′2. . .Λνn

µ′nTν1ν2...νn . (3.1.6)

Notice that we have already seen a(

02

)tensor, namely the η-tensor with components

ηµν . In other words, we have the linear map

η( ~A1, ~A2) = ηµνAµ1A

ν2 = ~A1 · ~A2 (3.1.7)

that results in a Lorentz invariant. As an exercise, show that under a Lorentz transfor-mation

ηµ′1µ′2 = Λν1µ′1

Λν2µ′2ην1ν2 (3.1.8)

has the same form, diag(−1,+1,+1,+1), in S′ as it does in S.

23


3.2 Lowering indices, the inverse metric and raising

indices.

Notice that in (3.1.7) we can group the expression as

η( ~A, ~B) = (ηµνAµ)Bν , (3.2.1)

where the part inside the parentheses has one free index ν which is down. Hence we canwrite this as Aν ≡ ηµνA

µ where we know that Aν must transform as a covariant vector.Thus it follows that with the η-tensor we can lower an index on a vector. If we can lowerthe index we should be able to reverse the process and raise it again. This requires theinverse metric ηµν where ηµνηνλ = δµλ, where δµλ is the Kronecker δ which equals 1 ifµ = λ and is 0 otherwise. So starting with Aν we have

ηµνAν = ηµν(ηνλAλ) = δµλA

λ = Aµ . (3.2.2)

Hence, with an inverse metric we can raise the index. In fact, we could have started witha one-form with components Φµ and raised its index, Φν ≡ ηµνΦµ. For a general

(0n

)tensor, we can raise one or more down indices using an inverse metric for each index.Hence after raising m of the indices for a

(0

m+n

)tensor we can consider an

(mn

)tensor

whose transformation properties are

Tµ′1µ′2...µ

′m

ν′1ν′2...ν

′n

= Λµ′1µ1Λ

µ′2µ2 . . .Λ

µ′mµmΛν1

ν′1Λν2

ν′2. . .Λνn

ν′nTµ1µ2...µmν1ν2...νn

, (3.2.3)

3.3 Scalar functions and derivatives

A scalar function φ(xλ) is a function of the coordinates xλ that is invariant under Lorentztransformations (don’t confuse the index on xλ in the function with a free index). Inother words,

φ′(xλ′) = φ(xλ) (3.3.1)

where the different coordinates are related by Lorentz transformations. Notice thatthe form of the function could change under the transformation, but the new functionevaluated at the new coordinates is the same as the old function evaluated at the oldcoordinates.

Now suppose we take a derivative on this function with respect to one of the coordi-nates, ∂µφ(xλ) ≡ ∂

∂xµφ(xλ) ≡ φ,µ(xλ). Then in S′ this is

∂µ′φ′(xλ

′) ≡ ∂

∂xµ′φ′(xλ

′) =

∂xν

∂xµ′∂νφ(xλ) = Λν

µ′∂νφ(xλ) , (3.3.2)

In other words φ,µ transforms as a covariant vector.We can generalize this to derivatives acting on vectors and tensors, but we will

postpone this discussion until later.

24


3.4 Tensors for noninertial frames

Lorentz transformations are linear transformations between different inertial frames. Aswas argued in the first chapter, the components of the Lorentz transformation matrixbetween two inertial frames S and S′ are given by

Λµ′ν =

∂xµ′

∂xν. (3.4.1)

So we could have replaced Λµ′ν with ∂xµ

′

∂xνin (3.2.3). However, once in this form there is

no reason why either of the frames needs to be an inertial frame. Hence, we can havetensors in any frame, not just inertial frames which transform between any two framesas

Tµ′1µ′2...µ

′m

ν′1ν′2...ν

′n

=∂xµ

′1

∂xµ1∂xµ

′2

∂xµ2. . .

∂xµ′m

∂xµm∂xν1

∂xν′1

∂xν2

∂xν′2. . .

∂xνn

∂xν′nT µ1µ2...µmν1ν2...νn

, (3.4.2)

If all components of a tensor are zero in one frame, then they are zero in any frameas is obvious from (3.4.2). This is a useful property that we will exploit when derivingEinstein’s equations

In the general case we can also raise indices with the inverse metric, gµν , whichsatisfies

gµνgνλ = δµλ . (3.4.3)

Since

ds2 = gµν dxµ dxν , (3.4.4)

is an invariant, it is clear that

gµ′ν′ =∂xµ

∂xµ′∂xν

∂xν′gµν . (3.4.5)

Hence, if Aµ are the components of a contravariant vector transforming as

Aµ′=∂xµ

′

∂xµAµ , (3.4.6)

then Aν = gµνAν transforms covariantly:

Aµ′ = gµ′ν′Aν′ =

∂xµ

∂xµ′∂xν

∂xν′∂xµ

′

∂xµgµν A

µ =∂xν

∂xν′Aν . (3.4.7)

3.5 Basis vectors

A general 4-vector−→A can be expressed as a linear combination of basis vectors −→e µ, as

−→A = −→e µAµ . (3.5.1)

25


1’

0

x1e1P

x0’

x1’e0 e

e

0’

x

Figure 3.2: Basis vectors for the inertial frame S (red) and S′ (blue).

The basis vectors are also 4-vectors, where the coordinate index µ is a label for the basisvector. Hence for each value of µ there is a different basis vector. The basis vectorssatisfy the orthonormality relations

−→e µ · −→e ν = ηµν , (3.5.2)

for an inertial frame and

−→e µ · −→e ν = gµν , (3.5.3)

for a more general frame. The basis vectors for different frames are different 4-vectors.For example consider two inertial frames S and S′ with basis vectors −→e µ and −→e µ′ . The4-vectors −→e 0 and −→e 1 point in different directions in space-time than −→e 0′ and −→e 1′ , asyou can see in figure 3.2. Hence they are different 4-vectors. It might seem that thebasis vectors just add a trivial complication, but they turn out to be quite useful whenwe are considering spaces that are not flat Minkowski.

If we are discussing an inertial frame, then the basis vectors are constant in all space.However, for a general frame the basis vectors may not be constant. Hence if we take aderivative with respect to the coordinates on a basis vector, ∂µ

−→e ν , in general this will benonzero. The derivative of 4-vector is also a 4-vector, and since the basis vectors form acomplete set, it must be true that the derivative has the form

∂µ−→e ν = Γλµν

−→e λ , (3.5.4)

where the repeated λ index is summed over. The objects Γλµν are called Christoffelsymbols.

As a simple example, let us consider the basis vectors for the cylindrical coordinateswith metric

ds2 = dr2 + r2 dφ2 + dz2 . (3.5.5)

26


The basis vectors we label by −→e r, −→e φ and −→e z. In figure 3.3 we show −→e r and −→e φ at twodifferent values of φ in the r, φ plane. Clearly, as we move from one point to another,the basis vectors have changed. Since −→e φ · −→e φ = r2 and −→e r · −→e r = 1, we can see fromthe figure that

∂φ−→e φ = −r−→e r , ∂φ

−→e r =1

r−→e φ . (3.5.6)

We also have that as we change r

∂r−→e φ =

1

r−→e φ ∂r

−→e r = 0 . (3.5.7)

Therefore, the nonzero Christoffel symbols are

Γrφφ = −r , Γφφr =1

r, Γφrφ =

1

r. (3.5.8)

Notice that Γφφr = Γφrφ. In fact for all spaces that that will concern us in this course the

Christoffel symbols will satisfy Γλµν = Γλνµ. Such spaces are called torsion free.

r

e!

er

e!ere!r!

e

Figure 3.3: The basis vectors ~er and ~eφ at two different values of φ and fixed r (green, blue)and the basis vectors at two different values of r and fixed φ (blue, red). For decreasing r thelength of ~eφ is proportionally shorter but ~er does not change.

3.6 The velocity vector

We have previously considered velocity 3-vectors in space, but we would like to also have

a velocity 4-vector,−→U that transforms as a contravariant vector. Suppose then we have a

body moving through space-time. If we look at an infinitesimal change in its space-time

position then we can construct the 4-vector−→d x. If we then divide this by a differential

of the proper time, dτ , where recall dτ 2 = −−→d x ·−→d x, then we can define

−→U as

−→U =

−→d x

dτ. (3.6.1)

27


Clearly−→U satisfies

−→U ·−→U = −1.

Let us suppose that S′(τ) is the momentary rest-frame of the moving body at propertime τ . Recall that the momentary rest-frame is an inertial frame, or at least a local

inertial frame. In the rest frame the spatial components of−→d x are zero, while dτ =

dx0′ . Hence, U0′ = 1 and the spatial components satisfy U i′ = 0. So it follows that−→U = −→e 0′(τ), the basis vector for the time component in the momentary rest frame. Infigure 3.4 we show the world-line of a body moving through space-time with nonconstantvelocity. In the momentary rest frame the trajectory points along the time direction of

the frame. Hence−→U is the tangent vector of the world-line.

!

0

x1

e0’

e0’e0’

x

Figure 3.4: A world-line with nonconstant velocity parameterized by the proper time τ . Thetangent vectors (red) are ~e0′(τ), the basis vector along the time direction for the momentaryrest frame.

3.7 The covariant derivative

Let us return to the case of derivatives on vectors and more general tensors. As was

emphasized in the first lecture, a 4-vector−→A is invariant under a coordinate transforma-

tion, although the components of−→A can change as the coordinates are changed. Since−→

A is invariant, −→e µ must tranform as

−→e µ′ =∂xµ

∂xµ′−→e µ′ . (3.7.1)

If we then consider the derivative ∂λ−→A , this transforms as

∂λ′−→A =

∂xλ

∂xλ′∂λ−→A . (3.7.2)

But we also have that

∂λ−→A = ∂λ(

−→e µAµ) = −→e µ∂λAµ + Γνλµ−→e νAµ . (3.7.3)

28


Relabeling the indices we can rewrite this as

∂λ−→A = −→e µ(∂λA

µ + ΓµλνAν) . (3.7.4)

Defining the covariant derivative as

∇λAµ ≡ ∂λA

µ + ΓµλνAν ≡ Aµ;λ (3.7.5)

we see from (3.7.1) and (3.7.2) that it transforms as the components of a(

11

)tensor. For

the inertial frame the covariant derivative is the same as the ordinary derivative. It isalso clear that Γµλν by itself is not a tensor, only when it is combined with a derivativeterm. Actually, we can see this another way as well. We know that if we have an inertialframe Γµλν = 0, but if we choose a different set of coordinates then in general at leastsome of the Γµλν are nonzero. But if all components of a tensor are zero in one framethen they are zero in all frames. Clearly Γµλν does not satisfy this property.

Since a derivative acting on a scalar φ is a tensor, we define its covariant derivative as∇λφ = ∂λφ. If we next consider a derivative acting on the scalar made from a one-formacting on a vector Φ( ~A) = ΦµA

µ and we assume that the covariant derivative satisfiesthe Leibniz rule, then we have

∇λ(ΦµAµ) = (∇λΦµ)Aµ + Φµ∇λA

µ

= (∂λΦµ)Aµ + Φµ ∂λAµ . (3.7.6)

It thus follows from (3.7.5) that

∇λΦµ = ∂λΦµ − ΓνλµΦν ≡ Φµ;λ . (3.7.7)

For the covariant derivative of a general tensor one should add a Christoffel symbol witha + sign for every up index and a Christoffel symbol with a − sign for every down index:

∇λ Tµ1...µmν1...νn

= ∂λ Tµ1...µmν1...νn

+m∑j=1

ΓµjλσT

µ1...σ...µmν1...νn

−n∑j=1

ΓσλνjTµ1...µmν1...σ...νn

≡ T µ1...µmν1...νn;λ . (3.7.8)

We close this lecture by finding the expression for the Christoffel symbols in terms ofthe metric gµν . From their definitions we have that

∂λ(−→e µ · −→e ν) = ∂λ gµν

= (Γσλµ−→e σ) · −→e ν +−→e µ · (Γσλν−→e σ) = Γσλµgσν + Γσλνgµσ . (3.7.9)

If we then rotate indices and take the following combination of terms we have

∂λ gµν + ∂µ gλν − ∂ν gµλ = Γσλµgσν + Γσλνgµσ + Γσµλgσν + Γσµνgλσ − Γσνλgσµ − Γσνµgλσ

= 2Γσλµgσν , (3.7.10)

where we used the symmetry properties of Γσλµ and gσν . Hence we find using the inversemetric that

Γσλµ = 12gσν (∂λ gµν + ∂µ gλν − ∂ν gµλ) . (3.7.11)

29


As an example, let’s again consider the cylindrical coordinates. The components ofthe inverse metric are grr = 1, gφφ = 1/r2 and gzz = 1. Let us consider Γφrφ. Accordingto (3.7.11) this is

Γφrφ = 12gµφ (∂rgµφ + ∂φgµr − ∂νgrφ)

=1

2

1

r2

(∂rr

2 + 0 + 0)

=1

r, (3.7.12)

which agrees with our result from before. In doing this calculation we used the fact thatgµν is diagonal so there is only a contribution if µ = φ. The computation of the othercomponents is left as an exercise.

30

Chapter 4

Equations of motion in curved space

In this chapter we compute the trajectories for freely falling bodies and light rays ingravitational fields.

4.1 Free-falling and geodesics

In an inertial frame, a body with no forces acting on it travels with constant velocity. Itfollows that its world-line is a straight line. We saw that when we go to the acceleratingframe the trajectory is a parabola, precisely what we would expect for free-falling in aconstant gravitational field. In this chapter we will study the trajectories of bodies inspace-times where a global inertial frame does not exist. This happens when nonconstantgravitational fields are present.

The metric itself is found by solving Einstein’s equations, which will be discussed inlater lectures. For the present we just assume there is a generic metric gµν that affects themotion of a particle. If the particle itself has a mass, then it too should be a gravitationalsource and hence change the metric. However, if the mass is small enough, then the effecton the metric is practically negligible and can be ignored. In this case we say the particleis a test particle and that it does not backreact on the metric.

4.1.1 The metric is covariantly constant

Before getting into the main parts of this section, we begin with two remarks aboutcovariant derivatives. We first observe that ∇λgµν = 0, or in words, “the metric iscovariantly constant”. To show this, consider the derivative

∂λgµν = ∂λ(~eµ · ~eν) = Γσλµ~eσ · ~eν + Γσλν~eµ · ~eσ = Γσλµgσν + Γσλνgµσ . (4.1.1)

Thus,

∂λgµν − Γσλµgσν − Γσλνgµσ = ∇λgµν = 0 . (4.1.2)

Secondly, since the metric is covariantly constant, we can raise and lower indices

31


through a covariant derivative. In other words

∇λAµ = ∇λ(gµνAν) = gµν∇λA

ν

∇λAµ = ∇λ(g

µνAν) = gµν∇λAν

∇µAµ = ∇µ(gµνAν) = gµν∇µAν = ∇νAν , (4.1.3)

et cetera.

4.1.2 The local inertial frame

We mentioned in a previous lecture that while a global inertial frame might not exist,there exists a local inertial frame at each space-time point. Let us show that this istrue at any space-time point, which witihout any loss of generality we can set to bexµ = 0. If the frame is locally inertial then the metric gµν(x) satisfies gµν(0) = ηµνand the Christoffel symbols evaluated at xµ = 0 are Γµνλ(0) = 0 (gµν(x) and Γµνλ(x) areassumed to depend on all 4 position components, but we abbreviate the notation anddrop the index inside the parentheses).

Let us first start with a frame that is not locally inertial at xµ = 0, but instead hasan arbitrary gµν(0) and Γµνλ(0). We then look for transformations that can take us tothe local inertial frame. Let us assume that we make a small change in our coordinatesystem and write the new variables xµ

′in terms of the old variables xµ as

xµ′

= xµ + ξµ(x) , (4.1.4)

where ξµ(x) are the components of a spatially dependent 4-vector which we are assumingto be small, at least near xµ = 0. Let us investigate how this changes the metric. Considerthe invariant

ds2 = gµ′ν′(x′)dxµ

′dxν

′= gµν(x)dxµdxν . (4.1.5)

The metric in the new coordinates is a different tensor function than the metric in theold coordinates. Assuming that ξµ(x) is small, we can then approximate the new metricas

gµ′ν′(x′) ≈ gµν(x) + ξλ(x)∂λgµν(x) + δgµν(x) , (4.1.6)

where the second term comes from Taylor expanding the function about the old coor-dinates and the third term is the change in the function. If we expand the invariant in(4.1.5) to first order we then find

gµ′ν′(x′)dxµ

′dxν

′ ≈ gµν(x)dxµdxν + gµν(x)∂λξµ(x)dxλdxν + gµν(x)∂λξ

ν(x)dxµdxλ

+ ξλ(x)∂λgµν(x)dxµdxν + δgµν(x)dxµdxν . (4.1.7)

32


Hence it follows after changing some indices that

δgµν = −gλν∂µξλ − gµλ∂νξλ − ξλ∂λgµν= −∂µξν − ∂νξµ + ξλ

(∂µgλν + ∂νgµλ − ∂λgµν

)= −∂µξν − ∂νξµ + ξλ

(Γσµλgσν + Γσµνgλσ + Γσνµgσλ + Γσνλgµσ − Γσλµgσν − Γσλνgµσ

)= −∂µξν − ∂νξµ + 2ξσΓσνµ= −∇µξν −∇νξν . (4.1.8)

In going from the first line to the second we brought gλν and gµλ inside the derivativesso that we could lower the index on ξλ. The third line follows from the second becausegµν is covariantly constant. Finally the last line follows from the fourth because of theform of covariant derivatives acting on covariant vectors. Notice that the change in gµνis also a symmetric

(02

)tensor. The transformations in (4.1.4) and (4.1.8) are called

diffeomorphisms. They are just changes in the coordinate systems and do not changethe physics. These are the analogs of gauge transformations in electrodynamics, wherea change in the gauge field Aµ by a derivative ∂µφ does not change the electric andmagnetic fields.

Let us now assume that ξµ(0) = 0 so that the origin of the first set of coordinatesstays the origin for the second set. Then the expansion of ξµ(x) about xµ = 0 has theform

ξµ(x) = εµν xν + 1

2bµνλ x

νxλ + . . . (4.1.9)

where εµν and bµνλ are constants in x. Because of the symmetry in (4.1.9) we can assumethat bµνλ is symmetric in the last two indices. Hence we have that

δgµν(0) = −∇µξν(0)−∇νξν(0) = −∂µξν(0)− ∂νξµ(0) = −εµν − ενµ , (4.1.10)

where we used that ξµ(0) = 0 so that the covariant derivatives become ordinary deriva-tives. Hence, by choosing εµν appropriately we can shift the metric closer to ηµν . Actually,in order for the expansion in (4.1.7) to be valid we need that ∂λξ

µ is small, at least nearxµ = 0. This means that the εµν are small. However, if gµν(0) starts far from ηµν , we canshift to ηµν in small steps, where we define new coordinates for each step, and eventuallyreach the desired form of the metric.

Notice that only the symmetric part of εµν contributes to δgµν . To understand why theantisymmetric part does not shift the metric, let us suppose we have the transformationin (4.1.9) but with εµν small and antisymmetric. Then to leading order near xµ = 0 wehave that

xµ′ = xµ + εµνxν . (4.1.11)

Now consider an infinitesimal Lorentz boost in the x direction, where we use the rapidityvariable γ = cosh ζ ≈ 1, vγ = sinh ζ ≈ ζ, then

x0′ = η0′0′Λ0′µx

µ ≈ x0 − ζx1

x1′ = η1′1′Λ1′µx

µ ≈ x1 + ζ x0 . (4.1.12)

33


Comparing this to (4.1.11) we see that ε01 = −ε10 = −ζ. Hence the antisymmetric εµνcorrespond to local Lorentz transformations which we previously knew did not changethe metric.

Once we have set gµν(0) = ηµν we can then deal with the Christoffel symbols. Nowwe assume that

ξµ(x) = 12bµνλ x

νxλ + . . . (4.1.13)

so that we leave the origin and the metric at the origin unchanged. Notice that forarbitrary bµνλ the expansion in (4.1.7) is valid so long as xµ is small enough. With thisform, the change to the Christoffel symbols at the origin is

δΓµνλ(0) =1

2ηµσ (∂λδgνσ + ∂νδgσλ − ∂σδgνλ)

= −1

2ηµσ (bσνλ + bνσλ + bλσν + bσλν − bλνσ − bνλσ)

= −ηµσ bσνλ (4.1.14)

Hence, by choosing bσνλ = ηµσ Γµνλ(0), δΓµνλ(0) will cancel off the Christoffel symbols atthe origin, leaving us with the local inertial frame.

4.1.3 Free-falling

Now that we have established the existence of a local inertial frame, let us find thetrajectories for free-falling in the noninertial frame. The key point is that free-falling isthe same as constant velocity in the local inertial frame. To this end, let us consider the

4-velocity−→U =

−→d xdτ

, where τ is the proper time of the falling body. This is an invariantsince we are considering the full vector and not just its components. We can then make

an acceleration 4-vector−→A = d

dτ

−→U by taking another τ derivative. Since this is the

4-acceleration for the free-falling body, we know that in the local inertial frame it is

zero. But then it must be zero in all frames. Now write the vector−→d x in components,−→

d x = ~eµdxµ. If we take a τ derivative on this we find

d

dτ

−→d x

dτ=

d

dτ(~eµ x

µ)

= ~eµxµ +

(d

dτ~eµ

)xµ = ~eµx

µ + (xν∂ν~eµ) xµ

= ~eµ(xµ + Γµνλx

ν xλ)

= 0 , (4.1.15)

where xµ ≡ ddτxµ. We used here that ~eµ does not explicitly depend on τ , it only depends

on τ indirectly through the coordinates xσ. Since the ~eµ are a complete set of basisvectors it must be true that all components in the last line are zero. Therefore, we areled to the equations of motion for free-fall,

xµ + Γµνλxν xλ = 0 (4.1.16)

34


4.1.4 Geodesics

Equation (4.1.16) is called the geodesic equation. Why do we call it this? Instead ofa space-time, let us consider a Euclidean space1 with some metric and let us supposethat we want to find the shortest path between two points in this space. For example,suppose our space is the earth’s surface and the two points are Logan airport in Bostonand Ciampano airport in Rome. Boston and Rome are practically at the same latitude(42 degrees North). Yet when an airplane flies from Boston to Rome it does not godirectly east, but starts out heading northeast. Eventually the flight starts turningtoward the south and is heading in a southeasterly direction by the time it approachesCiampano. Why does it take this route? Because it is the shortest! The route it takesis called a geodesic, or sometimes a great circle.

Let us see how this is related to (4.1.16). Let the metric on the earth’s surface beds2 = gijdx

idxj and suppose we have a continuous path between two points A and B.We parameterize the path by a variable λ, where λ = 0 at A and λ = 1 at B, althoughwe could have chosen a different initial or final value. We write the coordinates on thepath as xi(λ). To find the length L between A and B we then integrate ds along thepath, since ds is giving us the infinitesimal length of the path between λ and λ + dλ.Hence we have

L =

∫ B

A

ds =

∫ B

A

√gijdxidxj =

∫ 1

0

√gijxixj dλ , (4.1.17)

where xi ≡ dxi

dλ. Let us assume that xi(λ) is the path that minimizes L. Then if we add

a small correction δxi(λ) to the path, then to leading order δL is zero, since L is theminimum. Hence we have

0 =

∫ 1

0

g−1/2

(1

2∂kgijx

ixjδxk + gijxi d

dλδxj)dλ , (4.1.18)

where g ≡ gijxixj. Integrating the second term by parts we find

0 =

∫ 1

0

g−1/2

(1

2∂kgijx

ixj − gikxi − ∂jgikxixj +1

2ggikx

i d

dλg

)δxkdλ , (4.1.19)

where there is no contribution from boundary terms since δxi(0) = δxi(1) = 0. Since(4.1.19) is 0 for any δxi, the term inside the parentheses is 0. Let us first assume thatddλg = 0, showing afterward that this is consistent. Then we have

0 = gikxi + ∂jgikx

ixj − 1

2∂kgijx

ixj

= gikxi +

1

2

(∂jgikx

ixj + ∂igjkxixj − ∂kgijxixj

). (4.1.20)

1The convention is to use the phrase “Euclidean space” when are all components are spatial. It doesnot need to be a flat space.

35


Multiplying through by a metric factor gkl we get,

0 = gkl(gikx

i +1

2

(∂jgilx


))= xl +

gkl

2

(∂jgilx


)= xl + Γlijx

ixj , (4.1.21)

matching the form of the equation in (4.1.16). Hence, the free fall equation in space-timeis the analog of the shortest path between two points on the earth’s surface. Actually,nothing in this derivation is specific to the surface of the earth. These arguments wouldhave worked for any Euclidean space. Let us now show that d

dλg = 0 is consistent. To

this end we have

d

dλg = 2gijx

ixj + ∂jgikxixkxj

= 2gikxixk + ∂jgikx

ixkxj + ∂igjkxixkxj − ∂kgijxixkxj

= 2

[gikx

i +1

2

(∂jgikx


)]xk . (4.1.22)

But the term inside the square brackets is the same term as the last line of (4.1.20),which is zero. Hence d

dλg = 0 is consistent.

4.1.5 The particle action

We have just shown that by minimizing the length of a path we get the geodesic equation.We know that in particle mechanics that by minimizing an action we should get theequations of motion. The equations of motion are the geodesic equations, so the actionshould be closely related to (4.1.17). For the particle motion, the particle is on a time-liketrajectory, while the measured length is space-like. Hence, instead of ds we should usedτ and instead of g we should use −g = −gµν xµxν , where xµ = d

dτxµ. Finally, the action

should have dimension of (energy)×(time). In our units where c = 1, mass has the sameunits as energy, so to get an action with appropriate dimension we will multiply by theparticle’s rest mass m. Hence, the particle action is

S = m

∫ τf

τi

√−g dτ , (4.1.23)

where τi and τf are the initial and final proper times for the particle’s space-time trajec-tory. By the same arguments as the last section, minimizing S will lead to the equationsof motion in (4.1.16).

4.1.6 An example

Suppose we have a metric in polar coordinates that looks like2

ds2 = − r2

R2dt2 +

R2

r2dr2 +R2(dθ2 + sin2 θ dφ2) = −dτ 2 (4.1.24)

2This is the metric for the space-time AdS2 × S2. We will learn the origin of this name later in thecourse.

36


where R is a constant. Let us also suppose that a body that is free-falling radially, suchthat θ = φ = 0, in which case we can ignore the θ and φ part of the metric and workwith the two dimensional space-time parameterized by (t, r).

Since the metric is diagonal, the components of the inverse metric are also diagonaland have the form gaa = 1/gaa, where the repeated indices are not summed over. TheChristoffel symbols are also easier to compute with a diagonal metric and have the form

Γabb = −12gaa∂agbb , Γbab = 1

2gbb∂agbb , Γaaa = 1

2gaa∂agaa , Γabc = 0 , (4.1.25)

where a, b and c are assumed to be different indices and repeated indices are not summedover. Hence, for the metric in (4.1.24) the nonzero Christoffel symbols involving r and tare

Γrrr =1

2grr∂rgrr =

1

2

r2

R2∂rR2

r2= −1

r

Γrtt = −1

2grr∂rgtt = −1

2

r2

R2∂r

(− r

2

R2

)=

r3

R4

Γttr =1

2gtt∂rgtt =

1

2

(−R

2

r2

)∂r(−

r2

R2) =

1

r(4.1.26)

Hence we find

r − 1

rr2 +

r3

R4t2 = 0

t+ 21

rrt = 0 . (4.1.27)

At first glance (4.1.27) looks very complicated since we have two coupled nonlinearsecond order differential equations. However, if we multiply the last equation by r2, wecan rewrite this as

d

dτ(r2t) = 0 , (4.1.28)


r2t = C (4.1.29)

where C is an integration constant. Hence, we can then write the first equation as

r − 1

rr2 +

C2

rR4= 0 , (4.1.30)

which after multiplying by r/r2 becomes

rr

r2− r3

r3+C2r

r3R4=

d

dτ

(1

2

r2

r2− 1

2

C2

r2R4

)= 0 . (4.1.31)

It then follows that

r2 =C2

R4− Cr2 , (4.1.32)

37


where C is another integration constant.However, there is a faster way of deriving (4.1.32) which in addition shows that C

must be fixed to a particular value. If we take the metric in (4.1.24) and divide by dτ 2,mulitply by r2/R2, set φ = θ = 0 and use (4.1.29) we get the relation(

dr

dτ

)2

=r4

R4

(dt

dτ

)2

− r2

R2

(dτ

dτ

)2

=C2

R4− r2

R2, (4.1.33)

which is (4.1.32) with C = 1/R2.Equation (4.1.33) is solved by quadrature,

τ = R

∫ r

ri

dr√C2

R2 − r2

, (4.1.34)

where ri is the initial value of r. In this case the solution is

τ = R

(arcsin

(R

Cr

)− arcsin

(R

Cri

)), (4.1.35)

which we can invert to

r =C

Rsin

(1

R(τ + τ0)

), (4.1.36)

where τ0 is chosen to give ri at τ = 0. Notice that there is a maximum value for r,rmax = C

Rwhere r = 0, hence the body can never escape to infinity for any finite value

of ri and C.We can now go back and solve for t, which has the form

t =

∫ τ

0

Cdτ

r2(τ)=R2

C

∫ τ

0

dτ

sin2(R−1(τ + τ0))

=R3

C

(cot

(1

Rτ0

)− cot

(1

R(τ + τ0)

))= R

√R2

r2i

− R4

C2−R

√R2

r2− R4

C2. (4.1.37)

Assuming that r starts out positive, then we first use the positive branch of the squareroot to evaluate t. Once r reaches rmax we switch to the negative branch so that t isalways positive. Notice that it takes an infinite amount of coordinate time t to reachr = 0.

4.1.7 Light-like trajectories

The geodesic equations are also valid for light rays, although in this case we need toreplace τ with a variable λ that can not be interpreted as the proper time, but is stilla parameter that marks the points on the space-time trajectory. However, it is actually

38


easier to just use the metric, setting ds2 = 0 on the trajectory to make it light-like.Returning to the example in the previous section, we find for the metric in (4.1.24) that

dt = ±R2

r2dr , (4.1.38)

assuming that the motion is along fixed θ and φ. We then integrate (4.1.38) to find

t = ±∫ r

ri

R2dr

r2= ±

(R2

ri− R2

r

), (4.1.39)

where we have + for an outgoing trajectory and − for an ingoing trajectory. Notice thatfor the outgoing trajectory the light reaches r =∞ in a finite amount of coordinate time,t = R2/ri. For the ingoing trajectory it takes an infinite amount of time to reach r = 0.One further observation is that if we take the limit C →∞ in (4.1.37) then the relationbetween t and r approaches (4.1.39).

4.2 Nonrelativistic motion in weak gravitational fields

In a previous chapter we suggested that the g00 component of the metric was related tothe gravitational potential. Let us now explicitly show this by considering nonrelativisticmotion for a test particle of rest mass m and show that the geodesic equations areconsistent with this. We assume that the metric has the form

ds2 = −(1 + 2Φ( x−→))dt2 + d2xi , (4.2.1)

where the repeated i index is summed over and x−→ symbolizes the spatial 3-vector. Φ( x−→)

only depends on the spatial coordinates and is assumed to be small, that is |Φ( x−→)| << 1.It is possible to include small corrections to the spatial components of the metric, but theywill be unimportant for free-fall in the nonrelativistic limit (In a subsequent lecture wewill see that we need to include such corrections when considering space-time curvature).

For nonrelativistic motion, we can approximate the proper time to be τ = t so thatt ≈ 1 and xi ≈ d

dtxi and |xi| << 1. From the geodesic equations and our approximations

we have

md2

dt2xi ≈ mxi = −mΓiµν x

µxν ≈ −mΓi00 . (4.2.2)

The relevant Christoffel symbol is easily computed, where we find

Γi00 = −1

2gij∂jg00 = ∂iΦ( x−→) , (4.2.3)

where we used that gij = δij. Hence the equation of motion is

md2

dt2x−→ = −m∇Φ( x−→) . (4.2.4)

39


If we identify Φ( x−→) with the gravitational potential then −m∇Φ( x−→) is the gravitational

force and so we find that (4.2.4) is Newton’s force equation for motion in a gravitationalfield.

For example, suppose we consider the gravitational potential outside a sphericallysymmetric potential of mass M . In this case Φ is given by

Φ(r) = −MG

r, (4.2.5)

where G is Newton’s gravitational constant, G ≈ 6.674 × 10−11 m3 kg−1s−2 ≈ 7.42 ×10−28 m kg−1 where the latter value is given in units where c = 1. If this is to be a weakpotential with |Φ(r)| << 1, then r >> MG. For a solar mass where M ≈ 2 × 1030 kg,then r >> 1.5×103 m. The radius of the sun is about 7×108 m, so even at this distancethe potential is sufficiently weak.

4.3 Isometries and Killing vectors

4.3.1 Existence of Killing vectors

It might seem miraculous that we are able to solve for the equations in (4.1.27). Itturns out that there is an underlying reason why we could do it — there exists constantsof the motion. The idea should be familiar to you from ordinary mechanics, whereconservation of energy can be used to convert a second order differential equation to afirst-order equation. Motion in three spatial dimensions can also be solved if there isanother conserved quantity. For example, a central force conserves angular momentum.With the angular momentum conserved, we can convert a differential equation in termsof time derivatives on r, θ and φ to one where only r appears.

For motion in curved space there are conserved quantities if the metric has isometries.An isometry is coordinate transformation that leaves the metric unchanged. Hence ifour coordinate transformation is xµ → xµ + ξµ(x), then this is an isometry if

∇µξν +∇νξµ = 0 . (4.3.1)

Notice that because this is a tensor equation, if it is true for one frame then it is truefor all frames. A vector ~ξ that satisfies (4.3.1) is called a Killing vector.

Now let us see how the existence of a Killing vector leads to a conserved quantity.Consider the combination

~ξ · ~U = ~ξ · ddτ

−→d x = gµν ξ

µxν , (4.3.2)

where ~U is the velocity 4-vector, and take a derivative on this combination with respect

40


to τ . This gives

d

dτ

(~ξ · ddτ

−→d x

)= ∂λgµν ξ

µxλxν + gµν ∂λξµxλxν + gµν ξ

µxν

= ∂λgµν ξµxλxν + gµν ∂λξ

µxλxν − gµν ξµΓνλσxλxσ

= ∂λgµν ξµxλxν + gµν ∂λξ

µxλxν − 1

2ξµ (∂λgµσ + ∂σgλµ − ∂µgλσ) xλxσ

=1

2

(gλν∂µξ

λ + gµλ∂νξλ + ξλ∂λgµν

)xµxν , (4.3.3)

where in going from the first to the second line we used the geodesic equation (4.1.16), andfrom the second to the third we wrote the Christoffel symbols in terms of the metric. Thefourth line follows from the third after a relabeling of indices and a cancellation of terms.If we now consult the first line of (4.1.8), we see that the term inside the parentheses in

the last line of (4.3.3) is −δgµν . Since ~ξ is a Killing vector, δgµν = 0 and so ~ξ · ddτ

−→d x is

constant in τ .

4.3.2 The example again

Let us see how this works with our example in section 4.1.6. If we let ξt ≡ ξ0 = 1 andset all spatial components to zero, then

∇µξν +∇νξµ = gνλ∇µξλ + gλµ∇νξ

λ = gνλΓλµσξ

σ + gλµΓλνσξσ . (4.3.4)

Most combinations of µ and ν obviously give 0. If µ is r and ν is t, then we have

∇rξt +∇tξr = gttΓtrtξ

t + grrΓrttξ

t = − r2

R2

1

r+R2

r2

r3

R4= 0 . (4.3.5)

Hence ~ξ is a Killing vector. Actually, there is an easier way to see that ~ξ is a Killingvector just by inspection of the metric. The only time dependence in ds2 is in the dt2

term. Hence, if we shift t by a constant the metric is unchanged. Therefore ~ξ is a Killingvector.

The constant of the motion that we extract from ~ξ is

~ξ · ddτ

−→d x = gµν ξ

µxν = − r2

R2t , (4.3.6)

which up to a constant factor is the expression in (4.1.29).

Since ~ξ · ~ξ = gµν ξµξν = − r2

R2 < 0, we call ~ξ a time-like Killing vector. If we inspectthe metric in (4.1.24), we see that the only φ dependence is in dφ2, hence the metric is

invariant under a constant shift in φ and so ~ζ is a Killing vector that has ζφ = 1 and allother components zero. This Killing vector is space like and the conserved quantity thatwe make from it is

~ζ · ~x = gµνζµxν = R2 sin2 θ φ . (4.3.7)

41


4.3.3 The conserved quantities

Physically, how should one understand the conserved quantities? For this, let us considermotion of a body with rest mass m in an inertial frame. In this case any constant vector~ξ is a Killing vector and ~ξ · ~U = ηµνξ

µUν is a constant of the motion. If we take ξ0 = 1and all other components zero, and further multiply by a factor of −m we have

−m~ξ · ~U = mU0 , (4.3.8)

which is the energy of the particle. In the previous example which does not have a globalinertial frame, the conserved quantity derived from the time-like Killing vector is thenrelated to the energy of the particle. However, since gravity is built into the metric, theenergy that is conserved is the combined kinetic plus gravitational potential energy.

The space-like Killing vectors in the inertial frame lead to other conserved quantitieswhich are the components of the linear momentum. If we had used polar coordinateswith metric

ds2 = −dt2 + dr2 + r2(dθ2 + sin2 θ dφ2) , (4.3.9)

then we can have the space-like Killing vector with ζφ = 1 and all other componentszero, which leads to the conserved quantity r2φ. Multiplying by m, this is the angularmomentum. Hence Killing vectors along angular directions lead to conserved angularmomentum.

42

Chapter 5

Curvature and the Riemann tensor

In this chapter we give a mathematical description of curvature. This culminates in theconstruction of a tensor called the Riemann tensor.

5.1 Parallel transport and curvature

In the last chapter we discussed how to determine the motion of massive bodies or lightgiven a metric. The metric itself is determined by solving Einstein’s equations whichencodes the gravitational interactions. The source of the gravitational fields can bemassive bodies, radiation or even something called dark energy. These gravitationalsources end up curving the space, which changes the metric and hence effects the motionof test bodies in the gravitational fields.

It thus behooves us to learn how space-time gets curved by the gravitational sources.In order to do this we require a precise definition of curvature which will enter intoEinstein’s equations, which roughly speaking will be of the form

“curvature=gravitational sources” .

Obviously, we will need to make this more precise.

5.1.1 Parallel transport

Suppose we have a vector at a particular point in space-time and we want to move it toanother point in a series of infinitesimal steps, such that the vector at the beginning ofthe step is parallel to the vector at the end of the step. Such a process is called “paralleltransport”. If the space-time is flat, then the result of the transport is independent ofthe path taken to get from the initial point to the final point. However, if the space iscurved, then the result will be path dependent.

Figure 5.1 shows parallel transport in a two dimensional flat space along two differentpaths. As you can see, the arrow stays parallel to the preceding arrow in the transport.You can also see that at the end of the transport, the arrow points in the same directionfor both paths. Another way to look at this transport as parallel transporting the vectoraround a loop, where the transport goes out along one path and them comes back to the

43


Figure 5.1: Parallel transport along two different paths in a flat two dimensional space. Alter-natively, this is parallel transport around a loop with the red arrows for the transport goingout (say) and the blue arrows for the transport going in.

original point along the other path. In this case the vector that returns is pointing inthe same direction as the starting vector.

Next consider transport along the surface of a two-dimensional sphere, such as thesurface of the earth. Figure 5.2 shows parallel transport of a vector that starts outpointing north on a point on the earth’s equator. It is then parallel transported 90degrees east along the equator, after which it is still pointing north. Then we transportthe vector northward all the way to the North pole. Finally, we transport southwardtoward the origin of the transport. As you can see, the vector is not the original vector,but has been rotated by 90 degrees and is now pointing west. This change in the vectorunder the parallel transport is due to the earth’s curvature. Curvature is one of thosethings where you know it and when you see it, but parallel transport will give us away of defining curvature precisely. In particular, it will give us a way of defining thelocal curvature, that is the curvature at a particular point in space-time. The earthhas roughly the same curvature everywhere on its surface, but we will consider manyspace-times where the curvature is position dependent.

To proceed, suppose we have a vector ~A, which we write in terms of basis vectorsas ~eµA

µ. We want to parallel transport the vector along some path parameterized bya variable λ. If the vector is parallel transported along the path then it does not varyalong the path, so ∂

∂λ~A = 0. Since we can parallel transport along any path we choose,

it might just seem that one way to insure parallel transport is for ~A to have constantcomponents everywhere in space-time. But the components depend on the basis vectorsand the basis vectors themselves could change as we move along the path. Hence, themore general case might be that ~A is made up of components where each component is asingle valued function on space-time, although not necessarily a constant function. Buteven this is wrong as can be seen by our example of parallel transport along the earth’ssurface, where the vector is clearly not single valued under parallel transport around the

44


Figure 5.2: Parallel transport on a two dimensional sphere. The path first goes east (red), andthen north to the North pole (blue) and then south back to the original point on the equator(green). After the transport he vector has turned by 90 degrees.

loop (the original vector is different from the final vector even though both are at thesame point.).

To sort this out, let us say we have some vector ~A12(x) and some coordinate systemsuch that the components Aµ12(x) are single valued functions in space-time (as in the lasthand-out, the functions depend on all components of the coordinates xµ, but we drop theindex inside the functions). Let ~A = ~A12(x) for some particular point with coordinates

xµ and let us parallel transport ~A a short distance away to the point with coordinatesxµ + δ xµ1 and let us further assume that this parallel transported vector is ~A12(x+ δ x1).Under parallel transport we have

δxσ1∂σ ~A12(x) = 0 = δxσ1~eµ(∂σA

µ12(x) + Γµσν(x)Aν12(x)

)= δxσ1~eµ∇σ A

µ(x) , (5.1.1)

that is, the covariant derivative of Aµ12 is zero along the parallel transport direction. Thenthe components of the parallel transported vector, Aµ12(x+ δ x1), satisfy to leading order

Aµ12(x+ δ x1) ≈ Aµ12(x) + δxσ1∂σAµ12(x) = Aµ12(x)− δxσ1Γµσν(x)Aν12(x) = Aµ − δxσ1Γµσν(x)Aν .

(5.1.2)

Now suppose we take the parallel transported vector at xµ + δ xµ1 and parallel transportit a short distance away to xµ + δ xµ1 + δ xµ2 where we assume that δ ~x2 is not parallel to

δ ~x1. Let us further assume that ~A12(x + δ x1 + δ x2) is the parallel transported vector.

45


Since we got to this point by parallel transport, we have that

~A12(x+ δ x1 + δ x2) ≈ Aµ12(x+ δ x1) + δxσ2∂σAµ12(x+ δ x1)

= Aµ12(x+ δ x1)− δxσ2Γµσν(x+ δ x1)Aν12(x+ δ x1)

≈ Aµ − δxσ1Γµσν Aν − δxσ2Γµσν A

ν

−δxσ2δxρ1 ∂ρΓ

µσν A

ν + δxσ2δxρ1 ΓµσνΓ

νρκA

κ

= Aµ − δxσ1Γµσν Aν − δxσ2Γµσν A

ν

−δxσ2δxρ1

(∂ρΓ

µσν − ΓµκσΓκνρ

)Aν (5.1.3)

where the Christoffel symbols with no explicit argument are evaluated at xµ. In goingto the last line we exchanged the ν and the κ dummy indices and used the symmetryproperties of the Christoffel symbols.

Now suppose that we had parallel transported in the opposite order, that is firstwe parallel transport along δ xµ2 and then δ xµ1 . Then we would have defined a different

single valued vector function ~A21(x) such that ~A21(x + δ x2) is the vector after parallel

transporting from xµ to xµ + δ x2 and ~A21(x + δ x2 + δ x1) is the vector after paralleltransporting to xµ + δ x2 + δ x1. Exchanging 1 and 2 in (5.1.3) we get

~A21(x+ δ x2 + δ x1) = Aµ − δxσ2Γµσν Aν − δxσ1Γµσν A

ν

−δxσ1δxρ2

(∂ρΓ

µσν − ΓµκσΓκνρ

)Aν

= Aµ − δxσ2Γµσν Aν − δxσ1Γµσν A

ν

−δxσ2δxρ1

(∂σΓµρν − ΓµκρΓ

κνσ

)Aν , (5.1.4)

where we exchanged the σ and ρ indices in the last line. If we now consider the paralleltransport around the vanishingly small loop where first we transport in the δ ~x1 direction,then δ ~x2, then −δ ~x1 and finally −δ ~x2, then the change in the components of the vectorδAµ are given by the difference between the two vector functions Aµ12 and Aµ21 at xµ +δ xµ2 + δ xµ1 . This is illustrated in figure 5.3. Thus, the resulting change in the vectorfrom parallel transport around the loop is

δAµ = Aµ12(x+ δ x1 + δ x2)− Aµ21(x+ δ x2 + δ x1)

=(∂σΓµρν − ∂ρΓµσν + ΓµκσΓκνρ − ΓµκρΓ

κνσ

)Aν δxσ2δx

ρ1

≡ RµνσρA

ν δxσ2δxρ1 . (5.1.5)

Rµνσρ is called the Riemann curvature tensor, or just Riemann tensor, and as it names

implies it is a tensor. To show this, consider the commutator of covariant derivatives[∇σ,∇ρ] acting on Aµ, where we find

[∇σ,∇ρ]Aµ ≡ (∇σ∇ρ −∇ρ∇σ)Aµ

= ∂σ(∇ρAµ)−Γνρσ∇νA

µ + Γµνσ∇ρAν − ∂ρ(∇σA

µ) +Γνσρ∇νAµ − Γµνρ∇σA

ν

= ∂σ∂ρAµ + ∂σ(ΓµνρA

ν) + Γµνσ∂ρAν + ΓµνσΓνκρA

κ

−∂ρ∂σAµ − ∂ρ(ΓµνσAν)− Γµνρ∂σA

ν − ΓµνρΓνκσA

κ

=(∂σΓµνρ − ∂ρΓµνσ + ΓµκσΓκνρ − ΓµκσΓκνρ

)Aν

= RµνσρA

ν . (5.1.6)

46


x! 1

x! 1

x! 2x! 2

A A12

A12A21A21

Figure 5.3: Parallel transport around a small loop. The blue shows parallel transport alongthe δ ~x1 then δ ~x2 directions while the red is along the δ ~x2 then δ ~x1 directions. The result isthe difference between ~A12 and ~A21 at xµ + δ xµ1 + δ xµ2 in the top right corner.

By construction [∇σ,∇ρ]Aµ is a tensor and since Aν is a contravariant vector then Rµ

νσρ

is a(

13

)tensor.

Notice that it does not matter what the vector is. We could have chosen ~A12(x) or~A21(x) or any other single valued vector function, we would still find the same relationwith the Riemann tensor which only depends on the local geometry of the space-time.Also note that if we had chosen ~A = A12(x), and considered xµ to be the starting pointused in the preceding discussion, we would have

RµνσρA

ν12 δx

σ2δx

ρ1 = [∇σ,∇ρ]A

µ12 δx

σ2δx

ρ1 = −∇ρ∇σA

µ12 δx

σ2δx

ρ1 , (5.1.7)

On the other hand, choosing ~A = A21(x) at the same xµ gives

RµνσρA

ν21 δx

σ2δx

ρ1 = [∇σ,∇ρ]A

µ21 δx

σ2δx

ρ1 = +∇σ∇ρA

µ21 δx

σ2δx

ρ1 , (5.1.8)

A nonzero-Riemann tensor tells us that we can’t parallel transport a single valued vectorfunction around the loop because it is not possible to set all covariant derivatives to zero.

5.1.2 Properties of the Riemann tensor

Now that we know that the Riemann tensor is indeed a tensor, we can use this fact toestablish several important properties about Rµ

νσρ. For instance, for a global inertialframe the Christoffel symbols are zero everywhere and so Rµ

νσρ = 0. Hence, there is nocurvature. This is why we referred to such a space as flat Minkowski space in an earlierlecture. Furthermore, since Rµ

νσρ = 0 is a tensor equation, all components are zero forany frame on this space-time, even in frames where some of the Christoffel symbols arenonzero, such as the constant accelerating frame. We can also argue backwards to saythat if some components of Rµ

νσρ are nonzero then there does not exist a frame where

47


all components are zero. Hence, unlike the Christoffel symbols where a frame alwaysexists at any given point where all Γµνλ are zero, the same is not true for the Riemanntensor. In other words, the curvature is not removed, not even locally, by a good choiceof coordinates.

Another property that we can read off immediately from (5.1.6) is that Rµνσρ is

antisymmetric in its last two indices, Rµνσρ = −Rµ

νρσ. To find more properties let usconsider Rµ

νσρ at one point xµ where the coordinates are a local inertial frame. In thiscase gµν = ηµν and Γµνσ = 0 for all components at xµ. Since the metric is covariantlyconstant this also means that ∂σgµν = 0 at xµ. Therefore in the local inertial frame

Rµνσρ = gµλ(∂σΓλνρ − ∂ρΓλνσ + ΓλκσΓκνρ − ΓλκσΓκνρ

)= ηµλ

(1

2ηλκ∂σ

(∂νgκρ +∂ρgκν − ∂κgνρ

)− 1

2ηλκ∂ρ (∂νgκσ +∂σgκν − ∂κgνσ)

)=

1

2(∂σ∂νgµρ − ∂σ∂µgνρ − ∂ρ∂νgµσ + ∂ρ∂µgνσ) (5.1.9)

We next notice that for this frame Rµνσρ = −Rνµσρ and Rµνσρ = Rσρµν . However, if atensor is symmetric (antisymmetric) under interchange of indices in one frame then itis symmetric (antisymmetric) in all frames. It is also straightforward to check that in(5.1.9)

Rµνσρ +Rµρνσ +Rµσρν = 0 , (5.1.10)

and so this too is true in all frames. Finally, using Rµνσρ;λ ≡ ∇λRµνσρ = ∂λRµνσρ for thelocal inertial frame, we find

Rµνσρ;λ +Rµνλσ;ρ +Rµνρλ;σ =1

2∂λ (∂σ∂νgµρ − ∂σ∂µgνρ − ∂ρ∂νgµσ + ∂ρ∂µgνσ)

+1

2∂ρ (∂λ∂νgµσ − ∂λ∂µgνσ − ∂σ∂νgµλ + ∂σ∂µgνλ)

+1

2∂σ (∂ρ∂νgµλ − ∂ρ∂µgνλ − ∂λ∂νgµρ + ∂λ∂µgνρ) = 0 ,

(5.1.11)

where one sees that the terms pair up (by color) and cancel each other. This lastrelation is called the Bianchi identity. Actually, there is another relatively simple wayof establishing the Bianchi identity that does not use the local inertial frame. Using(∇λR

µνσρ)A

ν = ∇λ(RµνσρA

ν)−Rµνσρ∇λA

ν and (5.1.6) we find

(Rµνσρ;λ +Rµ

νλσ;ρ +Rµνρλ;σ)Aν = [∇λ, [∇σ,∇ρ]]A

µ + [∇ρ, [∇λ,∇σ]]Aµ + [∇σ, [∇ρ,∇λ]]Aµ

= 0 (5.1.12)

by the Jacobi identity:

[A, [B,C]] + [C, [A,B]] + [B, [C,A]] = 0 . (5.1.13)

48


5.1.3 The Ricci tensor, Ricci scalar and Einstein tensor

From the Riemann tensor Rµνσρ we can construct other tensors. The Ricci tensor Rµν

is defined as

Rµν ≡ Rλµλν , (5.1.14)

and the Ricci scalar R is

R ≡ Rµµ . (5.1.15)

From these two tensors we can build another tensor Gµν called the Einstein tensor,

Gµν ≡ Rµν − 12gµνR . (5.1.16)

The Einstein tensor satisfies an extremely important property, namely

Gµν;µ ≡ ∇µG

µν = 0 . (5.1.17)

Let’s now show this:

∇µGµν = ∇µR

µν − 1

2∇νR

= ∇µRλµλν − 1

2∇νR

µλµλ


2

(−∇λR

µλνµ −∇µR

µλλν

)Bianchi Identity


2

(∇µR

λµλν +∇µR

λµλν

)Relabeling, anti-symmetry

= 0 . (5.1.18)

5.1.4 Independent components

The Riemann tensor comes with 4 indices, so if there are D spatial (or space-time)dimensions, then there are D4 different ways of choosing the indices. However, becauseof the symmetries or anti-symmetries, the number of independent components is actuallyless. We know that Rµνσρ = −Rνµσρ = −Rµνρσ, so that there are only D(D − 1)/2independent choices for the first two indices and D(D−1)/2 independent choices for thelast two indices. Moreover, Rµνσρ = Rσρµν so we have to symmetrize over the two sets

of D(D− 1)/2 choices, which gives us 12D(D−1)

2(D(D−1)

2+ 1) independent choices. Lastly,

we have the relation in (5.1.10) which reduces the number of independent componentsby 1 for every choice of 4 different indices (if any two indices are the same then one cansee that (5.1.10) follows from the other symmetries). For example, consider Rabcd whereall indices are different. Then

Rabcd = −Racdb −Radbc . (5.1.19)

There is no extra relation for, say, Rbcda since by the other symmetries we can write thisas

Rbcda = Rdabc = −Radbc (5.1.20)

49


which appeared in our relation for Rabcd. Hence the number of independent componentsof the Riemann tensor is reduced by the number of different ways of choosing 4 differentindices, that is

(D4

), and so the number of independent components is

D(D − 1)(D(D − 1) + 2)

8− D(D − 1)(D − 2)(D − 3)

24

=1

12D2(D − 1)(D + 1) . (5.1.21)

For D = 2 we see there is 1, for D = 3 there are 6, for D = 4 there are 20, etc.For the Ricci tensor Rµν and the Einstein tensor Gµν the number of independent

components for each is D(D + 1)/2, since they are symmetric in the indices1.

5.2 Examples

5.2.1 A round two-dimensional sphere

In this case the metric is

ds2 = R2 dθ2 +R2 sin2 θ dφ2 , (5.2.1)

and so the metric components are gθθ = R2, gφφ = R2 sin2 θ. Since the metric is diagonal,the inverse metric has

gθθ =1

gθθ=

1

R2gφφ =

1

gφφ=

1

R2 sin2 θ. (5.2.2)

The nonzero Christoffel symbols are

Γθφφ = 12gθθ(−∂θgφφ) =

1

2R2(−2R2 sin θ cos θ) = − sin θ cos θ

Γφθφ = 12gφφ(∂θgφφ) =

1

2R2 sin2 θ(2R2 sin θ cos θ) =

cos θ

sin θ= cot θ . (5.2.3)

Since the space is two-dimensional, the Riemann tensor has only one independentcomponent Rθφθφ. All other possibilities are either zero or related to this one by symme-tries. We then have

Rθφθφ = gθθRθφθφ = gθθ

(∂θΓ

θφφ − ∂φΓθφθ + ΓθµθΓ

µφφ − ΓθµφΓµφθ

)= R2

((sin2 θ − cos2 θ)− 0 + 0− (− sin θ cos θ)(cot θ)

)= R2 sin2 θ ,(5.2.4)

where the last term has a contribution if µ = φ. The nonzero components of the Riccitensor are

Rθθ = gφφRθφθφ = 1 , Rφφ = gθθRθφθφ = sin2 θ , (5.2.5)

1If D = 2 then the components of the Ricci tensor are related to each other by factors of the metricsince there is only one independent component for the Riemann tensor.

50


while the Ricci scalar is

R = gθθRθθ + gφφRφφ =1

R2+

1

R2 sin2 θsin2 θ =

2

R2. (5.2.6)

The Ricci scalar is an invariant, thus it gives us information about the curvature thatis independent of the frame. Firstly, we see that R is constant so the curvature is thesame everywhere. This seems sensible since a sphere appears equally curved everywhere.Second, we see that the curvature falls off with R. This again seems sensible; the biggerthe radius of the sphere the less curved it is (the earth looks pretty flat over distanceson orders of meters, but a basketball does not).

5.2.2 Weak gravitational field

In the last chapter we considered the metric

ds2 = −(1 + 2Φ( x−→))dt2 + d2xi , (5.2.7)

where |Φ( x−→)| << 1, which was relevant for nonrelativistic free-fall in a weak gravita-tional field. Let us modify the metric by including small corrections to the spatial partof the metric as well. Assuming small |Φ| we will consider a metric of the form

ds2 = −(1 + 2Φ( x−→))dt2 + (1− 2K Φ( x−→))d2xi , (5.2.8)

where K is a dimensionless constant that we will leave undetermined for now. The metricis diagonal and so the relevant Christoffel symbols to linear order in Φ are

Γi00 = −1

2gij∂jg00 = ∂iΦ( x−→) ,

Γ0i0 =

1

2g00∂ig00 ≈ ∂iΦ( x−→) (5.2.9)

Γijk =1

2gil (∂jglk + ∂kglj − ∂lgjk) ≈ −K

(δik∂jΦ( x−→) + δij∂kΦ( x−→)− δjk∂iΦ( x−→)

).

Hence the relevant curvature tensor components are

Ri0j0 = gik(∂jΓ

k00 − ∂0Γk0j + Γkµ0Γµ0j − ΓkµjΓ

µ00

)≈ ∂i∂jΦ( x−→) (5.2.10)

Rijkl = gim(∂kΓ

mjl − ∂lΓmjk + ΓmµkΓ

µjl − ΓmµlΓ

µjk

)≈ K

(δjl∂i∂kΦ( x−→)− δjk∂i∂lΦ( x−→)− δil∂j∂kΦ( x−→) + δik∂j∂lΦ( x−→)

),

where we only kept terms linear in Φ( x−→). The components of the Ricci tensor that arelinear in Φ are

R00 = gijRi0j0 = ∇2Φ( x−→) ,

Rij = g00R0i0j + gklRkilj = −∂i∂jΦ( x−→) +K(δij∇2Φ( x−→) + ∂i∂jΦ( x−→)

),(5.2.11)

51


while the Ricci scalar is

R = g00R00 + gijRij = −∇2Φ( x−→)−∇2Φ( x−→) +K(

3∇2Φ( x−→) +∇2Φ( x−→))

= (4K − 2)∇2Φ( x−→) . (5.2.12)

By examining free-fall equations we had previously identified Φ( x−→) with the gravitational

potential and so −~∇Φ( x−→) is proportional to the gravitational force. Here we see that it

is not so much the gravitational force, but its divergence, ~∇ · (−~∇Φ( x−→)), that leads tothe curvature of space-time.

Finally, for the Einstein tensor we find

G00 = R00 − 12g00R = 2K∇2Φ( x−→)

Gij = Rij − 12gijR = (K − 1)∂i∂jΦ( x−→) + (1−K)δij∇2Φ( x−→) . (5.2.13)

We will come back to these results to complete the derivation of Einstein’s equations.

5.2.3 The Schwarzschild metric

The Schwarzschild metric is given by

ds2 = −(

1− 2MG

r

)dt2 +

(1− 2MG

r

)−1

dr2 + r2 dθ2 + r2 sin2 θ dφ2 , (5.2.14)

where G is Newton’s gravitational constant, G ≈ 6.674 × 10−11 m3 kg−1s−2 ≈ 7.42 ×10−28 m kg−1 and M is a mass. Later we will see that this is a solution to Einstein’sequations for r > r0, where r0 is the radius of a spherically symmetric static objectwith mass M . If the mass is entirely inside r = 2MG then we will later show that thisdescribes a black hole.

The nonzero metric components are

gtt = −(

1− 2MG

r

), grr =

(1− 2MG

r

)−1

, gθθ = r2 gφφ = r2 sin2 θ .(5.2.15)

Since the metric is diagonal, the inverse metric components are

gtt =1

gtt= −

(1− 2MG

r

)−1

, grr =1

grr=

(1− 2MG

r

),

gθθ =1

gθθ=

1

r2, gφφ =

1

gφφ=

1

r2 sin2 θ. (5.2.16)

The Christoffel symbols Γθφφ and Γφθφ are the same as in (5.2.3). The other nonzero

52


symbols are (using the rules for diagonal metrics)

Γrtt = −12grr∂rgtt =

MG

r2

(1− 2MG

r

), Γtrt = 1

2gtt∂rgtt =

MG

r2

(1− 2MG

r

)−1

,

Γrrr = 12grr∂rgrr = −MG

r2

(1− 2MG

r

)−1

Γrθθ = −12grr∂rgθθ = −r

(1− 2MG

r

)Γθrθ = 1

2gθθ∂rgθθ =

1

r

Γrφφ = −12grr∂rgφφ = −r sin2 θ

(1− 2MG

r

)Γφrφ = 1

2gφφ∂rgφφ =

1

r. (5.2.17)

The components of the Riemann tensor are unfortunately rather tedious to compute.Here we list and compute the independent nonzero components:

Rrtrt = grr(∂rΓ

rtt −

∂tΓrrt + ΓrµrΓ

µtt − ΓrµtΓ

µtr

)= grr

(∂rΓ

rtt + ΓrrrΓ

rtt − ΓrttΓ

ttr

)=

(1− 2MG

r

)−1(−2

MG

r3

(1− 2MG

r

)+ 2

(MG

r2

)2

−(MG

r2

)2

−(MG

r2

)2)

= −2MG

r3= 2

MG

r3grrgtt ,

Rrθrθ = grr(∂rΓ

rθθ −

∂θΓrrθ + ΓrµrΓ

µθθ − ΓrµθΓ

µθr

)= grr

(∂rΓ

rθθ + ΓrrrΓ

rθθ − ΓrθθΓ

θθr

)= grr

(−1 +

MG

r+

(1− 2MG

r

))= −MG

r

(1− 2MG

r

)−1

= −MG

r3grrgθθ

Rrφrφ = grr(∂rΓ

rφφ −∂φΓrrφ + ΓrµrΓ

µφφ − ΓrµφΓµφr

)= grr

(∂rΓ

rφφ + ΓrrrΓ

rφφ − ΓrφφΓφφr

)= grr

(−1 +

MG

r+

(1− 2MG

r

))sin2 θ = −MG

r3grrgφφ

Rtθtθ = gtt

(∂tΓ

tθθ −∂θΓ

ttθ + ΓtµtΓ

µθθ −

ΓtµθΓµθt

)= gttΓ

trtΓ

rθθ = −MG

r3gttgθθ

Rtφtφ = gtt

(∂tΓ

tφφ −∂φΓttφ + ΓtµtΓ

µφφ −ΓtµφΓµφt

)= gttΓ

trtΓ

rφφ = −MG

r3gttgφφ

Rθφθφ = r2 sin2 θ + gθθΓθrθΓ

rφφ = r2 sin2 θ − r2 sin2 θ

(1− 2MG

r

)= 2

MG

r3gθθgφφ ,(5.2.18)

where in the last line we took the two-dimensional sphere result for Rθφθφ, replacing Rwith r, and added an extra term that must be included because r, unlike R, is not heldfixed. Notice that the final result (blue) for each term has a very similar form, with theindices on the metric matching the indices on the Riemann tensor.

Because Γrφφ has both r and θ dependence, the component Rrφθφ is not obviouslyzero; let’s check that it is

Rrφθφ = grr(∂θΓ

rφφ −

∂φΓrθφ + ΓrµθΓµφφ − ΓrµφΓµφθ

)= grr

(∂θΓ

rφφ + ΓrθθΓ

θφφ − ΓrφφΓφφθ

)= grr

(−2 sin θ cos θ + sin θ cos θ + sin2 θ cot θ

)r

(1− 2MG

r

)= 0 . (5.2.19)

53


We can now take the nonzero components and compute the Ricci tensor. Becausethe metric is diagonal and because each nonzero component of the Riemann tensor hastwo repeated indices, the Ricci tensor will also be diagonal. We thus find

Rtt = grr Rrtrt + gθθ Rθtθt + gφφRφtφt = 2MG

r3gtt −

MG

r3gtt −

MG

r3gtt = 0 ,

Rrr = gttRtrtr + gθθ Rθrθr + gφφRφtφt = 2MG

r3grr −

MG

r3grr −

MG

r3grr = 0 ,

Rθθ = gttRtθtθ + grr Rrθrθ + gφφRφθφθ = −MG

r3gθθ −

MG

r3gθθ + 2

MG

r3gθθ = 0 ,

Rφφ = gttRtφtφ + grr Rrφrφ + gθθ Rθφθφ = −MG

r3gφφ −

MG

r3gφφ + 2

MG

r3gφφ = 0 .

(5.2.20)

Hence all components of the Ricci tensor are zero! Obviously, the Ricci scalar R = Rµµ

is also zero, and so too is the Einstein tensor Gµν = Rµν − 12gµν R.

It might seem that because R = 0 there is no invariant information about the curva-ture. However, we can also construct an invariant by squaring two Riemann tensors andcontracting indices. This gives

RµνσρRµνσρ = 4∑a<b

RababRabab = 4∑a<b

(gaagbbRabab)2

= 4(4 + 1 + 1 + 1 + 1 + 4)

(MG

r3

)2

= 48M2G2

r6. (5.2.21)

We have ordered the indices a and b in the sum as t first, r second, θ third and φ fourth.As you can see, the invariant is blowing up as r → 0, signaling a real singularity where thespace-time is becoming infinitely curved. However, there is no singularity at r = 2MGwhere the metric is becoming singular. This point is only a coordinate singularity. Asr → ∞ the invariant is going to zero, indicating the space-time is becoming flat. Thisis clear from the metric where in the r →∞ limit the metric approaches flat Minkowskispace.

54

Chapter 6

The energy-momentum tensor

In this chapter we consider the energy-momentum tensor. We discuss its form in variousinstances.

6.1 Construction and properties

In the previous two chapters we introduced the tools necessary to describe the geometryof space-time. This will lead to one side of Einstein’s equations. On the other side ofthe equations we need the part that acts as a source for this geometry. This is likeelectrodynamics, where half of Maxwell’s equations can be written as

∂µFµν = jν , (6.1.1)

where F µν contains the field strengths ~E and ~B, with Ei = F0i and Bi = 12εijkFjk. The

right hand side has the sources, where j0 is the charge density ρ and ji are the componentsof the current. An important feature in these equations is current conservation, ∂νj

ν = 0,which the field equations respect because of the antisymmetry of Fµν .

Hence for gravity, we are looking for the analog of jν . To this end, let us supposewe are in flat Minkowski space and that we have a uniform distribution of static “dust”,where dust can mean any nonrelativistic matter, including hydrogen molecules. In fact,let us assume that the dust is made up of hydrogen molecules that are nonrelativisticand so for all intents and purposes can be assumed to be at rest. Each molecule has arest mass of m and a velocity 4-vector ~U with components ~U = (1, 0, 0, 0) in the rest

frame. The momentum 4-vector for each molecule is ~p = m ~U .Now let n be the number density of the dust in the dust’s rest frame, that is the

number of hydrogen molecules per unit volume. We can then construct a number 4-current, ~N = n ~U , whose spatial components give the rate that the molecules passthrough a unit area per unit time, while the number density is N0 = nU0. In the restframe all spatial components are zero and the number density is N0 = n. If we boost toa new frame S′ which is moving with velocity v x wrt the rest frame, then the currentin this frame is ~N = n(γ,−vγ, 0, 0). We can easily understand the different factors here.Because of length contraction in the x direction, the density will go up by a factor of γ.

55


Q

!

!Figure 6.1: The figure shows a charge Q inside a volume Σ. The rate of change of Q withrespect to time is the negative of the current flux through the Σ’s boundary ∂Σ.

Also, there is now a current component in the x direction because the dust is movingbackward with velocity −v in S′. The current is then −v times the density.

The components of the 4-momentum for each molecule in S′ are ~p = m(γ,−vγ, 0, 0).The energy of each molecule is mU0 and so the energy density is E = mnU0U0. Sincethis has two Lorentz time-like indices, this suggests that E = T 00 where T µν are thecomponents of a

(20

)tensor called the energy-momentum tensor. In the case of the

uniform dust, the full energy momentum tensor is given by T µν = mnUµUν . Clearly,T µν = T νµ. We can think of the T 0i components as the energy current in the i direction.

But we can also think of it as the density of the ith component of the momentum. Eitherinterpretation is consistent, since if there is an energy current then the dust must havesome momentum to move the energy from one place to another. The components T ij

correspond to the current in the jth direction for the ith component of the momentum,and vice versa.

By construction T µν is symmetric. Since it is a tensor it will be symmetric in anyframe. Even if no global inertial frame is present, at every point there exists a localinertial frame where T µν can be expressed locally in terms of the product of velocityvectors and so T µν is symmetric in this case as well.

The energy-momentum tensor satisfies another crucial property. It is a conservedcurrent. Let us review current conservation in electrodynamics. The current conservationequation ∂νj

ν = 0 can be rewritten as

∂

∂tρ+ ~∇ ·~j = 0 . (6.1.2)

To put this equation into words, it tells us that net electric charge is conserved, it is notcreated or destroyed out of thin air. If the charge density is changing at one particularpoint, it must be because charge is flowing into or away from that point. To see this alittle differently, consider the situation as shown in figure 6.1. Here we have a charge Q

56


in a certain volume Σ and with a boundary ∂Σ. Q is found by integrating the chargedensity ρ over Σ,

Q =

∫Σ

d3x ρ . (6.1.3)

The rate at which the charge in Σ is changing is

d

dtQ =

∫Σ

d3x∂

∂tρ = −

∫Σ

d3x ~∇ ·~j , (6.1.4)

where the last step uses the current conservation in (6.1.2). Then by Gauss’ law it followsthat the rate at which the charge in Σ is changing is

d

dtQ = −

∫∂Σ

d~S ·~j , (6.1.5)

where the last integral is the negative of the current flux on the boundary of Σ. Whatwe learn from this is that if the charge Q is decreasing in Σ it is because charge is flowingout of the boundary. The larger the rate of change for Q, the larger the rate it flows out.

We can now do the same with energy and momentum. Instead of the charge density ρwe can consider the energy density E = T 00 and instead of charge currents ji we considerenergy currents T i0. Then if energy is conserved we must have the same sort of currentconservation equation, ∂µT

µ0 = 0. We can do the same thing with each component ofmomentum so we also have ∂µT

µi = 0, hence we have the four equations ∂µTµν = 0.

These equations are valid for inertial frames, but the derivatives have to be replaced bycovariant derivatives so that we have a tensor equation that applies for any frame. Henceour full set of current conservation equations are

T µν ;µ ≡ ∇µTµν = 0 . (6.1.6)

As we have emphasized in previous chapters, a tensor where all components are zero inone frame has all components zero in any frame.

6.2 Perfect fluids

When we consider cosmological solutions to Einstein’s equations we will assume that theuniverse is composed of a perfect fluid. A perfect fluid could be made up of dust, radia-tion, or something else. A perfect fluid is assumed to be uniform and isotropic. Uniformmeans that the components of the energy-momentum tensor are the same everywhere.Isotropic means there is no preferred direction, so the fluid appears the same no matterwhich direction an observer looks. Because it is isotropic, the energy momentum tensoris invariant under rotations in the spatial directions. Hence, it must have the form

T µν =

E 0 0 00 p 0 00 0 p 00 0 0 p

. (6.2.1)

57


1x

Figure 6.2: A screen inserted in a fluid. The fluid exerts a pressure on the screen.

To understand why this is so, note that we can think of T 0i as the three componentsof a spatial 3-vector. The only vector that is invariant under rotations has all of itscomponents 0. Likewise, the components T ij make up a 3 by 3 matrix. The only matricesthat are invariant under general rotations are proportional to the identity matrix. HenceT µν must have the form in (6.2.1).

We have already discussed the energy density E = T 00. What is p? The T ii com-

ponent is the current for the ith component of momentum in the ith direction. Sincewe are assuming that the fluid is isotropic, there cannot be any net momentum for thefluid. So any positive momentum flowing in the positive direction is cancelled by negativemomentum in the negative direction, but still leaving a net current. Now suppose weinsert an opaque screen into this fluid that is orthogonal to the x1 direction (see figure6.2.) Then each side of the screen is bombarded by the fluid and thus feels a force onit. Assuming that the fluid bounces off the screen, the impulse from one fluid particleon the negative side of the screen is 2 p1, assuming p1 > 0. Then the force per unit areais 2 p1 times the rate per unit area (aka the flux) the particles that make up the fluid hitthe screen. However 2 p1 multiplied by the flux is T 11, the current in the x1 direction forthe first momentum component. The factor of 2 appears because the backward movingparticles contribute equally to the current. p is then the force per unit area, which is thepressure.

6.3 Radiation as a perfect fluid

We can see that static dust in the rest frame is a perfect fluid with p = 0. Anotherexample of a perfect fluid is radiation. The radiation is made up of photons and in thiscase for each photon we have |~p| = p0. The fluid itself is made up of many photonssuch that the total spatial momentum is zero and the distribution of the momentum isisotropic. This means that the average value of each component is pi = 0 for a given p0.

58


However, the average of the squares of the momentum components satisfy

(p0)2 = (p1)2 + (p2)2 + (p3)2 = (p1)2 + (p2)2 + (p1)3 . (6.3.1)

Because the distribution of momentum is isotropic, we have (p1)2 = (p2)2 = (p3)2, andtherefore (pi)2 = 1

3(p0)2 for each spatial component i. Also, since there is no correlation

between the different components, pipj = 0 if i 6= j.Now let n(p0) be the photon density per unit energy p0. Then the energy density is

E = T 00 =

∫p0 n(p0) dp0 . (6.3.2)

The contribution to the photon number current in the ith direction is pi

p0n(p0) which

averages to zero. The components of the momentum currents are then

T ij =

∫pipj

p0n(p0) dp0 =

1

3δij

∫p0 n(p0)dp0 =

1

3δij E . (6.3.3)

Hence we find p = 13E .

For a general perfect fluid, we will write the relation between the pressure and theenergy density as p = w E . This is referred to as an equation of state. Nonrelativisticmatter has w = 0, while radiation has w = 1/3.

59

Chapter 7

Einstein equations

In this chapter we find Einstein’s equations by matching a tensor equation to the New-tonian limit.

7.1 Derivation

Einstein’s equations have the schematic form

Curvature = Sources . (7.1.1)

We will use as the source term the energy-momentum tensor T µν . As we have just seen,T µν is symmetric and conserved, meaning that it satisfies T µν ;µ = 0. Hence for thecurvature part we are looking for a tensor with the same properties so that we can havea consistent tensor equation. In fact, we have seen such a tensor, namely the Einsteintensor, Gµν . Hence

Gµν = C Tµν (7.1.2)

is a consistent tensor equation, where C is a constant.In order to determine C let us consider the weak field limit with gravitational potential

Φ( x−→) and where T µν is comprised of nonrelativistic matter. We previously showed thatthe components of Gµν are given by

G00 = 2K∇2Φ( x−→) ,

Gij = (K − 1)∂i∂jΦ( x−→) + (1−K)δij∇2Φ( x−→) , (7.1.3)

(7.1.4)

where K is a yet to be determined constant that appears in the spatial part of the metric.For the energy-momentum tensor, since the motion is nonrelativistic, the energy densityT 00 = E is approximately the mass density ρ. The other components of the energymomentum tensor are negligible compared to T 00 so we can set them to zero. Hence,the relation in (7.1.2) gives the equations

2K∇2Φ( x−→) ≈ C ρ

(K − 1)∂i∂jΦ( x−→) + (1−K)δij∇2Φ( x−→) ≈ 0 . (7.1.5)

60


Thus the second equation forces us to set K = 1.C is then set by recalling the potential equation for Newtonian gravity. Given a mass

density ρ, the potential satisfies the Poisson equation

∇2Φ = 4π Gρ (7.1.6)

where G is Newton’s gravitational constant. Comparing to the first equation in (7.1.5)we find that C = 8π G and thus we can write the tensor equation

Gµν = 8π GTµν . (7.1.7)

Finally, we have derived the long anticipated Einstein equations, one of the great in-tellectual achievements in science, so far-reaching that we have even put a box aroundit.

7.2 Einstein’s “greatest blunder” — Not!

Actually, Einstein realized that there is another term he could add to the left hand sideof his equations that is also a symmetric

(02

)tensor and whose covariant derivative when

contracted with one of the indices is zero. It’s the metric multiplied by a constant.Clearly the covariant derivative condition holds since ∇λgµν = 0. Hence, we could havewritten the equations as

Gµν + Λ gµν = 8π GTµν , (7.2.1)

where Λ is a constant called the cosmological constant.In fact Einstein thought he needed this term to explain the universe as it was un-

derstood in the early 20th century. At that time the universe was believed to be static,at least on a cosmological scale. But since the universe is also filled with matter, thereshould be a gravitational attraction pulling the matter inward. However, it turns outthat the cosmological constant can counteract this attraction and leave the matter fixedin place.

But in the 1920’s Hubble discovered that the universe is not static but is expanding.Hence the gravitational attraction is still there, slowing the expansion down, but thecosmological constant is no longer necessary. When Einstein learned of the expansionhe referred to the cosmological constant term as his greatest blunder since without theterm he might have been forced to predict an expanding universe before Hubble madehis observations. Or, he might have thought that it was a blunder because it sullied hisbeautiful equations with an extra term.

However, there is another way to look at the cosmological constant. If we bring it tothe other side of Einstein’s equations so that we get

Gµν = 8π GTµν − Λ gµν (7.2.2)

then we can think of it not as a correction to the curvature part of Einstein’s equations,but instead as an extra contribution to T µν . The cosmological constant contribution T µνcc

61


is read off from (7.2.2) to be

T µνcc = − Λ

8πGgµν . (7.2.3)

If we are in a space-time that is very close to flat such that gµν ≈ ηµν , then we can seethat the energy density is

T 00 =Λ

8πG(7.2.4)

while the pressure is

T ii = − Λ

8πG, i = 1, 2, 3 (7.2.5)

and all off-diagonal terms of T µν are zero. Hence this is a perfect fluid with a ratherstrange equation of state, p = −E , that is w = −1 and either the pressure or the energydensity is negative!

But it turns out that Einstein’s blunder was in fact a stroke of genius. Observationsof distant supernovae over the last 15 years have confirmed that 70% of the energydensity in the universe is of this cosmological constant type. The popular name for thisenergy source is “Dark Energy” or sometimes “Vacuum Energy”, and we will explore itsconsequences more fully in the cosmology part of the course.

62

Chapter 8

The Schwarzschild solution toEinstein’s equations

In this chapter we consider Schwarzschild’s solutions to the Einstein equations in emptyspace. We use this solution to do various tests of general relativity. We will also showthat the solution can also describe a black holed.

8.1 The Schwarzschild metric

The Einstein equations are

Gµν = 8π GTµν . (8.1.1)

One place to look for solutions is in empty space where Tµν = 0. In this case Gµν = 0implies R = 0 and Rµν = 0. To see this consider the contraction of the indices of Gµν

(this is also called “taking the trace”),

0 = Gµµ = Rµ

µ − 12gµµR = R− 1

24R = −R (8.1.2)

and so R = 0. The factor of 4 comes from gµµ = δµµ = 4 since there are 4 indices tosum over. From this, we have

Rµν = 12gµν R = 0 . (8.1.3)

It might seem that Rµν = 0 has only trivial solutions, but this is not true. It is possibleto have nonzero Riemann tensor components Rµνσρ but still have all components of thethe Ricci tensor Rµν = Rλ

µλν be zero. In fact, we have already found such a solution. Itis the Schwarzschild metric,

ds2 = −(

1− 2MG

r

)dt2 +

(1− 2MG

r

)−1

dr2 + r2(dθ2 + sin2 θdφ2) (8.1.4)

where it was demonstrated in chapter 5 that all components of the Ricci tensor satisfyRµν = 0, even though many of the Riemann tensor components are nonzero. If weexamine the structure of (8.1.4) we see that gtt has the form

gtt = −(1 + 2Φ(r)) , Φ(r) = −MG

r, (8.1.5)

63


which is the gravitational potential outside a sperically symmetric massive object withmass M . Hence the Schwarzschild metric gives the general relativistic solution for anobject like the sun.

When the gravitational field is weak, we previously showed that to leading order themetric has the form

ds2 = − (1 + 2Φ) dt2 + (1− 2Φ) (dxi)2 . (8.1.6)

If we go to polar coordinates and assume that the Φ = −MGr

the metric becomes

ds2 = −(

1− 2MG

r

)dt2 +

(1 +

2MG

r

)(dr2 + r2(dθ2 + sin2 θdφ2)

), (8.1.7)

which is not exactly the Schwarzschild metric. However, to this order in the expansionwe can let r = r(1 +MG/r) = r +MG, in which case

dr = dr

(1 + 2MG/r) ≈ (1− 2MG/r)−1

r2(1 + 2MG/r) ≈ r2(1 +MG/r)2 = r2

(1− 2MG/r) ≈ (1− 2MG/r) . (8.1.8)

The metric then becomes the Schwarzschild metric with r replaced by r. However, thisis just a coordinate transformation so the physics is the same as if we had used r.

8.2 Tests of general relativity

8.2.1 Bending of light

A significant test of general relativity was made in 1919 when Eddington measured thebending of starlight around the sun during a solar eclipse. Figure 8.1 shows the bendingof starlight around the sun. This bending leads to an apparent shift outward for theposition of the star as seen by an observer on the earth. Notice that the incomingstarlight is shifted in the perpendicular direction by a distance equal to the sun’s radiusR in order to reach the observer. This shift is called the impact parameter and is oftenwritten as b.

We can compute the bending using the Schwarzschild metric (8.1.4) with M as themass of the sun, since the light is completely outside the solar mass and the sun is veryclose to being spherically symmetric. Without any loss of generality we can assume thatthe light is in the equatorial plane of the sun at θ = π/2. Our goal is to compute ∆φ,the angle at which the light bends from a straight line, as it comes in from r = ∞,reaches a minimum distance at the sun’s radius R, and then goes back out to r = ∞.The angular shift in the sky of the star’s apparent position is also ∆φ, since the star isso much farther from the sun than an observer on earth that all light rays emanatingfrom the star that intersect the solar system can be treated as parallel.

64


R!"

!" b

Figure 8.1: The bending of starlight from the sun. Since the star is much farther from the sunthan an observer on earth, the two incoming light rays from the star (solid and dashed) canbe treated as parallel. The perpendicular separation between the parallel lines is the impactparameter b = R, where R is the sun’s radius. The apparent angular shift of the star is due tothe angular deflection of the light and for an observer on earth it is the same as if there werea star emitting a light ray following the dotted line with no deflection.

Since θ is fixed at θ = π/2, we have sin θ = 1 and dθ = 0, and so along a light-liketrajectory the metric satisfies

0 = −f(r) dt2 + f−1(r) dr2 + r2 dφ2 f(r) = 1− 2MG

r, (8.2.1)

which can be rewritten as

dφ2 = r−2f(r) dt2 − r−2f−1(r) dr2 . (8.2.2)

Notice that this expression has the same form as free-fall of a massive body in a radialdirection, with φ replacing τ and gtt and grr replaced by gtt ≡ gtt/r

2 and grr ≡ grr/r2

respectively. Hence, instead of t and r we will consider t′ ≡ dtdφ

and r′ ≡ drdφ

. Thus, sincethe metric is invariant under constant shifts of t, we have that gttt

′ is a constant of themotion,

r−2f(r) t′ = C . (8.2.3)

Hence, we have that

(r′)2 = f 2(r)(t′)2 − r2 f(r) = r4C2 − r2 f(r) . (8.2.4)

The constant C is then determined by setting r′ = 0 at the minimum r = R, where wefind

C2 = R−2 f(R) . (8.2.5)

Plugging this into (8.2.4) and solving by quadratures we end up with the integral equation

2

∫ ∞R

Rdr

r√r2 f(R)−R2 f(r)

=

∫dφ = π + ∆φ , (8.2.6)

where we have the factor of 2 to account for the light coming in and then going outagain. Substituting f(r) into the r integral gives

π + ∆φ = 2

∫ ∞R

Rdr√r(r −R) ((r2 +Rr)f(R)− 2MGR)

, (8.2.7)

65


Defining the dimensionless constant κ = 2MGR

and the dimensionless variable z = r/R,we can write the integral as

π + ∆φ = 2

∫ ∞1

dz√z(z − 1)(z2 + z − κ(z2 + z + 1))

. (8.2.8)

This integral is an example of an elliptic integral and can be written in terms of specialfunctions called elliptic functions. However, most of us are not that familiar with suchfunctions, so instead we will make an approximation. In a previous chapter we sawthat for the sun R >> MG and so κ << 1. In this case, the integral (8.2.8) can beapproximated by

π + ∆φ ≈ 2

∫ ∞1

dz

z√z2 − 1

+ κ

∫ ∞1

(z2 + z + 1)dz

z2(z + 1)√z2 − 1

(8.2.9)

The integrals are solved in (8.5.1) and (8.5.2) in the appendix, giving us

π + ∆φ ≈ π + 2κ . (8.2.10)

Hence we find that

∆φ ≈ 2κ =4GM

R(8.2.11)

Let us plug in the numbers to see how big this effect is. The solar mass is M =1.99 × 1030 kg, while G = 6.67 × 10−11 m3 kg−1 s−2. Thus using units where c = 1, wehave that MG = 1.485× 103 m. The solar radius is R = 6.96× 108 m, therefore

∆φ = 41.485× 103 m

6.96× 108 m= 8.5× 10−6 . (8.2.12)

This is the result in radians. In arcseconds it is

∆φ = 8.5× 10−6 rad (360 deg/2π rad)(3600 arcsec/deg) = 1.76 arcsec . (8.2.13)

Eddington’s expedition claimed to show this result for the bending, although therewas some controversy about systematic errors in the experiment. In fact many claimedthat the errors were so big that it was not possible to distinguish between the generalrelativistic result and the result using Newtonian gravity with special relativity (seeexercise). It was not until much later that experiments were able to definitely verify theGR result.

8.2.2 The precession of the perihelion of mercury

Another famous test of Einstein’s theory of general relativity involves the precession ofMercury’s orbit. The orbit of Mercury around the sun is close to an ellipse, and if Mercurywere alone in the solar system with the sun and if Mercury were a point-like object, thenNewtonian gravity would predict an elliptical orbit, as in figure 8.2. However, Mercuryis not point-like nor is it alone with the sun in the solar system, so the effect of this is for

66


1r r2

Figure 8.2: Ideal elliptical orbit for a single planet orbiting the sun. r1 is the distance to theperihelion and r2 to the aphelion.

Figure 8.3: Due to precession of equinoxes, the gravitation pull of the other planets and gen-eral relativity, the orbit precesses, giving a flower shape. The position of the perihelion (theminimum distance) itself then orbits around the sun. The figure is greatly exaggerated, withan actual precession of 1.5 degrees every 400 revolutions for mercury.

the direction in which the ellipse points to precess, as in figure 8.3. These effects can beaccurately computed and it was found that the theoretical precession rate was off by atiny, but statistically significant amount from the experimental result. There appearedto be an extra 43 arcsecs/century precession that could not be accounted for. Since theprecession effects are small, we can treat the different contributions as independent, sowe will ignore all precession except for the part coming from general relativity. In otherwords, we assume that the orbit would have been elliptical if not for general relativity.

We again use the Schwarzschild metric, treating Mercury as a test-body of mass mmoving in the sun’s equatorial plane (θ = π/2.) If the orbit were perfectly elliptical,then the angle φ would change by π as r changes from the minimum to the maximum.However, if the orbit is precessing in the same direction as Mercury is orbiting the sun,then the change in φ will be slightly greater than π and if it is precessing in the oppositedirection then the change in φ is slightly less than π. Since the metric is invariant under

67


a constant shift in t or in φ, there are two Killing vectors and the constants of the motionare

f(r) t = C r2 φ = L . (8.2.14)

Hence, we have the equation

1 = f−1(r)C2 − f−1(r) r2 − L2

r2, (8.2.15)

which we rewrite after multiplying through by a factor of 12m as

−12m(1− C2) = 1

2mr2 − mMG

r+mL2

2r2− mMGL2

r3

= 12mr2 + Veff (r) . (8.2.16)

We immediately notice that this equation is very similar to the usual Newtonian equa-tions of motion if we treat−1

2m(1−C2) as the total energy, mL as the angular momentum

and r as the radial derivative with respect to the time t. In fact, this would be the New-tonian equation if it were not for the extra term in the effective potential, −mMGL2

r3,

which we therefore interpret as a general relativistic correction.We will present two ways of computing the precession, the second of which is given

in the next section. The two approaches are complementary to each other in that theymake different approximations to do the calculation.

In the first approach an elliptic orbit of arbitrary eccentricity is considered, but thegeneral relativistic correction is assumed to be small. The constants of the motion aredetermined by the two extrema of the orbit r1 and r2 (r2 > r1) where r = 0, so that

1 = f−1(r1)C2 − L2

r21

1 = f−1(r2)C2 − L2

r22

. (8.2.17)

This leads to

L2 =f(r2)− f(r1)

f(r1) r−21 − f(r2) r−2

2

= r21r

22

f(r2)− f(r1)

f(r1) r22 − f(r2) r2

1

C2 =r2

2 − r21

r22 f−1(r2)− r2

1 f−1(r1)

= f(r1)f(r2)r2

2 − r21

f(r1) r22 − f(r2) r2

1

(8.2.18)

68


We will also define the constants

L2 ≡ r21r

22

f(r2)− f(r1)

r22 − r2

1

= 2M Gr1r2

r2 + r1

δL2 ≡ L2 − L2 ≈ (2MG)2 r22 + r1r2 + r2

1

(r1 + r2)2

δC2 ≡ C2 − 1 +2MG

r1 + r2

≈ 2MG

r1 + r2

+ 2MGr2

2 + r1r2 + r21

r1r2(r1 + r2)− 2MG

r1 + r2

r1r2

+(2MG)2

r1r2

− (2MG)2(r22 + r1r2 + r2

1)

r21r

22

+(2MG)2(r2

2 + r1r2 + r21)2

r21r

22(r1 + r2)2

=(2MG)2

(r1 + r2)2, (8.2.19)

where mL would be Mercury’s angular momentum for Newtonian gravity assuming r2

and r1 are the maximum and minimum values of the orbit.Instead of r, we are really interested in r′ = dr

dφ. This is related to r and φ by

r′ =r

φ=

r2

L

√C2 − f(r)

(1 +

L2

r2

)=

1

L

√− 2MG

r1 + r2

r4 + 2M Gr3 − L2 r2 + (δC2r4 − δL2r2 + 2MGL2r)

≈ 1

L

√2MG

r1 + r2

r2(r2 − r)(r − r1)− (2MG)2

(r1 + r2)2r(r + r1 + r2)(r2 − r)(r − r1)

≈√

(r2 − r)(r − r1)√r1r2

(r − MG(r + r1 + r2)

(r1 + r2)− MGr(r2

1 + r1r2 + r22)

r1r2(r1 + r2)

)=

√(r2 − r)(r − r1)√r1r2

(r −MG

(1 +

r(r1 + r2)

r1r2

))(8.2.20)

where in the second to last line the second term inside the parentheses comes fromexpanding the square root while the third term comes from expanding 1

Labout 1

L. In

the last line, the first term inside the parentheses gives the Newtonian contribution, whilethe second term is the leading order general relativistic correction.

For an elliptical orbit, φ will change by 2π as r goes from the minimum at r1 to themaximum at r2 and back again. Hence to find the precession rate per revolution we nowintegrate (8.2.20) from r1 to r2 and multiply by 2 to get

2

∫ r2

r1

√r1r2 dr

r√

(r2 − r)(r − r1)

(1 +MG

(1

r+r1 + r2

r1r2

))≈ 2π + ∆φ

(8.2.21)

69


Using the integrals (8.5.4) and (8.5.5) we find that

∆φ = 6πMG1

r, (8.2.22)

where (r)−1 is the average of the inverse radius,

1

r=

1

2

(1

r1

+1

r2

)=r1 + r2

2 r1r2

. (8.2.23)

If we now plug in numbers, we have that MG = 1.485 km for the sun, while for Mercury,the minimum distance (the distance to the perihelion) is r1 = 4.60 × 107 km, whilethe maximum distance (the distance to the aphelion) is r2 = 6.98 × 107 km, thereforer = 5.55× 107 km. Hence for each revolution the precession is

∆φ = 6π1.485 km

5.55× 107 km= 5.04× 10−7 , (8.2.24)

The period of mercury’s orbit is 88 days= 0.241 years, so over a century the amount ofprecession is

∆φcentury =100

0.2415.05× 10−7 = 2.09× 10−4 = 43.1 arc-secs . (8.2.25)

This is exactly what is needed to account for the missing precession factor.Einstein himself computed this result in 1915 when he formulated general relativity.

Note that we could have used the metric in (8.1.7) to this order of the approximationwhere we would have found the same thing. In fact, this is what Einstein did since atthe time he did not know about the Schwarzschild metric!

8.3 The effective potential and the stability of orbits

The effective potential that appears in (8.2.16) can be readily used to look for circularorbits around a massive object and to also decide their stability. The circular orbits haver = 0, which is consistent when Veff (r) is at an extremum, in other words d

drVeff (r) = 0.

Figure 8.4 shows the effective potential for a particular choice of L2. As one can see, thegeneral relativistic correction is most pronounced at small r, drastically changing thebehavior of Veff (r).

The extrema then satisfy the equation

d

drVeff (r) = 0 =

mMG

r2− mL2

r3+

3mMGL2

r4, (8.3.1)

which has the solutions

r± =1

2MG

(L2 ± L

√L2 − 12(MG)2

). (8.3.2)

Note, that there are no real solutions if L2 < 12(MG)2, so L2 ≥ 12(MG)2 is a conditionto have circular orbits. Assuming that the orbits exist, then from figure 8.4 it is clear

70


Figure 8.4: The effective potential Veff (r). The Newtonian potential would have Veff →∞ asr → 0. With the general relativistic correction the behavior is Veff → −∞ as r → 0..

that the extremum at r+ is stable and the the extremum at r− is unstable. This canalso be verified by showing that d2

dr2Veff (r) > 0 at r+ and d2

dr2Veff (r) < 0 at r−. Indeed,

we have

d2

dr2Veff (r) = −2mMG

r3+

3mL2

r4− 12mMGL2

r5, (8.3.3)

and so substituting r± for r gives

d2

dr2Veff (r)

∣∣∣∣∣r=r±

=mL2

r4±

(1− 6MG

r±

). (8.3.4)

Since r+ > 6MG and r− < 6MG we see that d2

dr2Veff (r) has the claimed property. Notice

that the presence of the smaller but nonstable orbit is a direct consequence of generalrelativity.

8.3.1 Precession once again

In the previous section we computed the precession assuming the general relativisticcorrections are small. We now present a simpler way of computing the precession of anorbit, although it assumes that the orbit is almost circular. However, it is not necessaryto assume that the general relativistic corrections are small, although we will do so atthe end to compare our result with that in the previous section.

Suppose that the orbit is very close to being circular, with r = r+ + ρ where r+

is the stable extremum and ρ is assumed to be much smaller than r+. Then we canapproximate Veff (r) as

Veff (r) ≈ Veff (r+) + 12ρ2 d2

dr2Veff (r)

∣∣∣∣∣r=r+

= Veff (r0) + 12

mL2

r4+

(1− 6MG

r+

)ρ2 . (8.3.5)

71


In this approximation, Veff (r) looks like the potential for an harmonic oscillator withangular frequency

ω =L

r2+

√1− 6MG

r+

, (8.3.6)

and so ρ = ρ0 cos(ωτ + C), where C is a constant phase. Since we are very close to acircular orbit, we can approximate φ ≈ L

r2+and so φ = τ L

r2++ C ′. Hence,

ρ = ρ0 cos

(φ

√1− 6MG

r+

+ φ0

), (8.3.7)

where we combined the constants into one constant phase.Note that result in (8.3.7) does not require r+ >> MG. However, in order to compare

with the previous result let us now assume that r+ >> 6MG. Then we can approximateρ as

ρ ≈ ρ0 cos

(φ

(1 +

3MG

r+

)−1

+ φ0

), (8.3.8)

and so the difference from 2π in one revolution from minimum to minimum for ρ is

∆φ = 6πMG

r+

(8.3.9)

which agrees with our previous result since r = r+ in the limit of a circular orbit.

8.4 The horizon

If we examine the Schwarzschild metric in (8.1.4) we notice some similarities to theaccelerating frame metric

ds2 = −(1 + 2α0x)dt 2 +dx 2

1 + 2α0x+ dy2 + dz2 , (8.4.1)

which we introduced in the second chapter. In particular, they both have coordinatesingularities, in the case of the Schwarzschild metric this happens at r = 2MG. Fur-thermore, just as in the case of the accelerating frame, as a light source gets closer andcloser to the singularity, the light as observed by an observer farther out is more andmore red-shifted and so the source’s clock appears to run slower and slower. Anothersimilarity between the two metrics is that for space-time points beyond the singularity,that is r < 2MG for Schwarzschild and x < −1/(2α0) for the accelerating frame, thetime coordinate becomes space-like, while the spatial coordinate becomes time-like sincethe sign of the metric changes.

72


Other things we can investigate with Schwarzschild that we previously studied withthe accelerating frame are free-fall of massive objects or light. For example, for a light-rayheading inward we have that

dr

dt= −f(r) . (8.4.2)

Assuming that r = r0 at t = 0, we can integrate this equation so that

t = −∫ r

r0

dr

f(r)= −

∫ r

r0

r dr

r − 2MG= r0 − r + 2MG log

r0 − 2MG

r − 2MG. (8.4.3)

Since the right side diverges logarithmically as r → 2MG, it will take an infinite amountof coordinate time t to reach r = 2MG. If the light is heading outward then

t = r − r0 + 2MG logr − 2MG

r0 − 2MG. (8.4.4)

If we consider an infalling massive body then we have that

1 = f(r)t2 − f(r)−1r2 = f(r)−1(C2 − r2

). (8.4.5)

If we make the assumption that r = 0 at r = r0, then C2 = f(r0) and so we find

r = −√f(r0)− f(r) = −

√2MG

r0

√r0 − rr

. (8.4.6)

We can integrate this equation to give

τ =

√2MG

r0

(√r(r0 − r) + r0

(π

2− arctan

√r

r0 − r

)). (8.4.7)

The main thing to observe is that if takes a finite amount of proper time τ to reachr = 2MG and that the infalling body continues falling into the region r < 2MG.However, it must take an infinite amount of coordinate time to reach this point since themassive body has to move slower than a light ray, which itself takes an infinite amountof coordinate time to reach r = 2MG.

As in the case of the accelerating frame, we therefore suspect that r = 2MG is not areal singularity but is only a coordinate singularity. If this is the case then there shouldbe another set of coordinates where there is no singularity at r = 2MG. Such a set ofcoordinates were found by Kruskal in the 1960’s and are now called Kruskal coordinates(what else). These coordinates are defined by

u ≡ MG

√r

2MG− 1 er/4MG cosh

t

4MG

v ≡ MG

√r

2MG− 1 er/4MG sinh

t

4MG(8.4.8)

73


if r ≥ 2MG and

u ≡ MG

√1− r

2MGer/4MG sinh

t

4MG

v ≡ MG

√1− r

2MGer/4MG cosh

t

4MG(8.4.9)

if r ≤ 2MG. With these coordinates we observe that

u2 − v2 = (MG)2( r

2MG− 1)er/2MG , (8.4.10)

We also see that r = 2MG corresponds to u2 − v2 = 0 which has the two solutions,u = ±v. We also have that

t = 4MG arctanhv

ur ≥ 2MG

t = 4MG arctanhu

vr ≤ 2MG. (8.4.11)

The differentials du and dv are related to dt and dr by

du =v

4MGdt+

(1 +

2MG

r − 2MG

)u

4MGdr =

v

4MGdt+ f(r)−1 u

4MGdr

dv =u

4MGdt+

(1 +

2MG

r − 2MG

)v

4MGdr =

u

4MGdt+ f(r)−1 v

4MGdr(8.4.12)

which is true for any value of r. We then see that

du2 − dv2 = (u2 − v2)1

(4MG)2(f(r)−2 dr2 − dt2)

=r

32MGer/2MG

(f(r)−1 dr2 − f(r) dt2

). (8.4.13)

Hence, the Schwarzschild metric can be written as

ds2 =32MG

rer/2MG(−dv2 + du2) + r2(dθ2 + sin2 θ dφ2) , (8.4.14)

where r can be in principle written in terms of u2 − v2 by inverting (8.4.10). From thisform of the metric, we can see that nothing bad happens as r = 2MG. We also see thatu is always a space-like coordinate while v is a time-like coordinate.

The transformations in (8.4.8) and (8.4.8) are singular transformations at r = 2MG.The coordinates u and v are continuous functions of r and t, but their derivatives arenot continuous. However, it is this singularity of the transformation that cancels out thecoordinate singularity of the metric, leaving a nonsingular metric at r = 2MG for the uand v coordinates

The metric however, is still singular since the point at r = 0 is a real singularity.We have previously seen in chapter 5 that the Schwarzschild metric has RµνλσRµνλσ =

74


u

v

Figure 8.5: The Schwarzschild geometry in terms of Kruskal coordinates. Dashed red lines areconstant t, solid black are constant r, solid orange is the real singularity at r = 0. The thickdashed red lines are the past and future horizons. The solid red line is a time like line, while theblue and the green solid lines are light-like trajectories outside and inside the future horizon.

48 (MG)2

r6, so the curvature is blowing up at r = 0. In terms of u and v we can see from

(8.4.10) that this corresponds to the equation u2 − v2 = −(MG)2.Figure 8.5 shows the Schwarzschild geometry in terms of the Kruskal variables u and

v. The dashed red lines are curves of constant t. The thicker dashed red lines are theconstant t lines at t = −∞,∞, r = 2MG. These lines are light-like and are called thepast and future horizons. The solid black curves are at constant r, while the solid orangeline is at r = 0 and is the real singularity of the Schwarzschild metric. The figure alsoshows a time-like world-line (solid red) which we can think of as some astronaut with alight-source. Before the astronaut crosses the future horizon it emits a light ray (solidblue). As you can see the blue line is crossing the lines of constant r so it is able to goout to r = ∞. The astronaut then crosses the future horizon in finite proper time andemits another light ray (solid green). This light ray cannot cross the horizon since it isabove the horizon line and the horizon itself is light-like. In fact, as the figure shows,the light ray itself will hit the singularity. If the light ray must hit the singularity, theneverything else is doomed to hit as well, including the unfortunate astronaut.

Since no light can escape once the horizon is crossed, the Schwarzschild geometryis also called a black hole. The size of the black hole is given by 2MG, so a blackhole with mass equal to the sun’s would be 1.5 km in size, smaller than the distancebetween Polacksbacken and Uppsala C. Even though black holes are black, they aredetectable indirectly because they are busy sucking in a large amount of matter thatemits x-rays as the matter is pulled in. There is also clear evidence for a black hole

75


at the center of our own Milky Way galaxy which is millions of solar masses in size.Around the black hole one can see stars orbiting an extremely heavy object. The massis easily determined from the orbits of the stars (you can see the evidence on youtube athttp://www.youtube.com/user/deholz).

One irony about black holes is that the curvature at the horizon gets smaller as theblack-hole gets bigger. Indeed, we have

RµνλσRµνλσ = 48(MG)2

r6=

3

4(MG)4(8.4.15)

at r = 2MG, so the bigger M , the smaller RµνλσRµνλσ. Recall from chapter 5 that weargued that the curvature was related to the derivative of the gravitational force, so forexample for the Schwarzschild metric

RµνλσRµνλσ = 12

(d2

dr2Φ(r)

)2

. (8.4.16)

This change in the force is called the tidal force and is responsible for ripping apartbodies of nonzero extent. Without getting into two many details we can determine thesize of this tidal force by making a few simplifying assumptions. For example, supposeour astronaut going through the horizon goes in feet first and he is ` =2 meters tall.Let us further assume that he weighs m =100 kg and that half the mass is at his feet(m/2) and half is at his head (this is obviously not true, but it will not affect the answerbeyond an order of magnitude). Then the tension T on the astronaut is the force at hisfeet minus that at his head which is

T ∼ 2MG` (m/2)

r3=

`m

2(2MG)2(8.4.17)

For a black hole with mass equal to that of the sun we have

T ∼ 10−4 kg m−1 ∼ 1013 Newtons (8.4.18)

which is a huge tension, obviously beyond the limits of a human body. However, if theastronaut is crossing the horizon of a black hole with mass equal to 106 solar masses, thethe tension will come down by a factor of 1012, leaving the long suffering astronaut witha tension he won’t even notice.

8.5 Integrals

The following integrals are referred to in the text:These integrals can be solved by trigonometric substitution, with z = sec ξ, dz =

sec ξ tan ξ dξ,

2

∫ ∞1

dz

z√z2 − 1

= 2

∫ π/2

0

sec ξ tan ξ dξ

sec ξ tan ξ= 2

∫ π/2

0

dξ = π , (8.5.1)

76


r

C’

C

Figure 8.6: Integration contours in the complex r plane. The original integral is the sum of theintegral from r1 to r2 (blue) minus the integral from r2 to r1 just below the cut (blue). This isequivalent to the contour integral on C which goes around the branch cut at r1 ≤ r ≤ r2. Thiscan be deformed into the contour C′ at r = 0 since there is no other singularities in the plane.

∫ ∞1

(z2 + z + 1)dz

z2(z + 1)√z2 − 1

=

∫ π/2

0

(sec2 ξ + sec ξ + 1) sec ξ tan ξ dξ

(sec ξ + 1) sec2 ξ tan ξ

=

∫ π/2

0

(1 + cos ξ + cos2 ξ) dξ

(1 + cos ξ)

=

∫ π/2

0

(d

dξ

(√1− cos ξ

1 + cos ξ

)+ cos ξ

)dξ

=

(√1− cos ξ

1 + cos ξ+ sin ξ

)∣∣∣∣∣π/2

0

= 2 . (8.5.2)

The next integrals we solve by contour integration in the complex r plane. The firstis

2

∫ r2

r1

√r1r2 dr

r√

(r2 − r)(r − r1)=

∫ r2+iε

r1+iε

√r1r2 dr

r√

(r2 − r)(r − r1)+

∫ r1−iε

r2−iε

√r1r2 dr

r√

(r2 − r)(r − r1)

=

∮C

√r1r2 dr

r√

(r2 − r)(r − r1), (8.5.3)

where C is the contour around the branch cut in figure 8.6. We can deform this contourto the contour C ′ that circles the singularity at r = 0. Hence we have

2

∫ r2

r1

√r1r2 dr

r√

(r2 − r)(r − r1)=

∮C′

√r1r2 dr

r√

(r2 − r)(r − r1)= 2π i

√r1r2√−r1r2

= 2π . (8.5.4)

77


The other integral is

2

∫ r2

r1

√r1r2 dr

r2√

(r2 − r)(r − r1)=

∮C′

√r1r2 dr

r2√

(r2 − r)(r − r1)

=

∮C′

√r1r2 dr

r2√−r1r2

(1 +

1

2

r1 + r2

r1r2

r + . . .

)= 0 + π

r1 + r2

r1r2

.

(8.5.5)

78

Chapter 9

Introduction to cosmology

In this chapter we start the cosmology part of the course. There has been a hugeamount of progress in cosmology over the last 15 years. One reason for this is better andbetter telescopes and other detection devices have become available, and with this newequipment has come new and often surprising results. Some would say that cosmologyis the most exciting field in physics right now.

9.1 The Friedmann-Robertson-Walker Universe

It turns out that we can make a surprising amount of progress if we assume that theuniverse is homogenous and isotropic. Recall that homogenous means that any onepoint in the universe is the same as any other point. Now of course, this is not true. Forexample, the universe at the surface of the earth is drastically different from the universein the region just outside the solar system. But over large distance scales, and for uslarge means millions of light years, the average density of stuff is homogenous. In otherwords, if we were to look at a spherical volume with radius of ten million light years attwo different regions in the universe, we would find about the same number of galaxiesinside each volume. To say that the universe is isotropic means that no matter whatdirection we look, it looks the same.

9.1.1 Flat, closed and open

There are three types of three-dimensional spaces that are homogenous and isotropic.All three can be summarized by the metric

ds2 = a2

(dr2

1− kr2+ r2(dθ2 + sin2 θ dφ2)

), (9.1.1)

where a is assumed to be independent of the spatial components. Assuming that rhas the dimensions of length then k has dimensions of inverse length squared and a isdimensionless (some books use the convention that r and k are dimensionless). The threepossible classes of metrics come from choosing k = 0, k > 0 or k < 0.

79


k = 0: If we absorb the factor of a into r then this is the usual flat three-dimensionalmetric written in polar coordinates.k > 0: To see what this is, we let r = k−1/2 sinχ, where χ is an angle. If r is between 0and k−1/2 then χ goes from 0 to π/2. With this substitution the metric becomes

ds2 = a2k−1(dχ2 + sin2 χ(dθ2 + sin2 θ dφ2)) . (9.1.2)

This is the metric for a three-dimensional sphere (called a 3-sphere, which is also writtenas S3) with radius a/

√k. This is a closed space so we would call a universe with this

sort of spatial geometry closed. Actually, the range 0 ≤ χ ≤ π/2 only covers half the3-sphere. To cover the entire sphere we have to extend χ to the range 0 ≤ χ ≤ π.k < 0: In this case we let r = (−k)−1/2 sinh ξ, where ξ ranges from 0 to ∞. The metricthen becomes

ds2 = a2k−1(dξ2 + sinh2 ξ(dθ2 + sin2 θ dφ2)) . (9.1.3)

This looks a lot like the metric for the 3-sphere, except now we have a sinh functioninstead of a sin. This is the metric for a hyperbolic space, and since r is unbounded thiswould be the space-time for an open universe.

To become better acquainted with these spaces let us compute their 3-dimensionalcurvatures. Because the metric in (9.1.1) is diagonal we have that

Γrθθ = −12grr∂rgθθ = −(1− kr2)r

Γrφφ = −12grr∂rgφφ = −(1− kr2)r sin2 θ

Γθrθ = Γφrφ =1

rΓrrr = 1

2grr∂rgrr = kr(1− kr2)−1 (9.1.4)

while Γθφφ and Γφθφ are the same as for the 2-sphere (see section 2.1 of chapter 5). Wethen find that

Rrθrθ = ∂rΓ

rθθ − ∂θΓrθr + ΓrµrΓ

µθθ − ΓrµθΓ

µrθ = ∂rΓ

rθθ + ΓrrrΓ

rθθ − ΓrθθΓ

θrθ

= −(1− kr2) + 2kr2 − kr2 + (1− kr2) = kr2

Rrφrφ = ∂rΓ

rφφ − ∂φΓrφr + ΓrµrΓ

µφφ − ΓrµφΓµrφ = ∂rΓ

rφφ + ΓrrrΓ

rφφ − ΓrφφΓφrφ

= −(1− kr2) sin2 θ + 2kr2 sin2 θ − kr2 sin2 θ + (1− kr2) sin2 θ = kr2 sin2 θ

Rθφθφ = Rθ

φθφ + ΓθrθΓrφφ = sin2 θ − 1

r(1− kr2)r sin2 θ = kr2 sin2 θ , (9.1.5)

where Rθφθφ is the Riemann tensor for the 2-sphere. It is then clear that for all combi-

nations we have

Riljm =k

a2(gijglm − gimgjl) . (9.1.6)

from which it follows that

Rij =k

a2(3gij − gij) =

2k

a2gij , R =

6k

a2. (9.1.7)

Hence we see that the 3-sphere has constant positive scalar curvature, flat space has zeroscalar curvature and the hyperbolic space has constant negative spatial curvature.

80


9.1.2 The Robertson-Walker metric

We now bring the time coordinate into the picture. We still assume that the spatial partof the metric has the form in (9.1.1), but now we assume that a is a time dependentfunction a(t). Hence the full 4-dimensional space-time metric is

ds2 = −dt2 + a2(t)

(dr2

1− kr2+ r2(dθ2 + sin2 θ dφ2)

). (9.1.8)

This is known as the Robertson-Walker metric. The spatial coordinates (r, θ, φ) areknown as the co-moving coordinates. We then look for solutions to Einstein’s equationsassuming that the space is filled with a homogenous isotropic substance, in other wordsa perfect fluid. The fluid can be one perfect fluid obeying the equation of state p = wρ,or it can be a combination of perfect fluids with different equations of state.

The other Christoffel symbols we need besides the ones in (9.1.4) have the form

Γtii = −12gtt∂tgii =

a

agii Γiti = 1

2gii∂tgii =

a

a, (9.1.9)

where a ≡ ddta. Hence, the extra Riemann tensors are

Rtiti = ∂tΓ

tii + ΓtµtΓ

µii − ΓtµiΓ

µit =

a

agii +

(a

a

)2

gii + 0−(a

a

)2

gii =a

agii , (9.1.10)

while the Riemann tensor Rijij gets modified to

Rijij = Ri

jij + ΓitiΓtjj =

k

a2gjj +

(a

a

)2

gjj , (9.1.11)

where Rijij are the Riemann tensor components in (9.1.5).

9.1.3 Scale dependence of the energy density

Recall that the energy-momentum tensor satisfies the conservation equation ∇µTµν = 0.

If we choose ν = 0 then we have

∇µTµ0 = ∂tT

00 + ΓµµλTλ0 + Γ0

µλTµλ = ρ+

∑i

(Γiit T

00 + Γtii Tii)

(9.1.12)

where the sum over i is on the spatial coordinates and ρ = ddtρ. Each component of giiT

ii

equals the pressure p, hence, using (9.1.9) we have

0 = ρ+ 3a

aρ+ 3

a

ap = ρ+ 3(ρ+ p)

a

a. (9.1.13)

If the perfect fluid has an equation of state p = w ρ then (9.1.13) can be rewritten as

d

dtlog ρ = −3(1 + w)

d

dtlog a , (9.1.14)

81



ρ =(a0

a

)3(1+w)

ρ0 , (9.1.15)

where ρ0 is an integration constant which fixes the density when a = a0.Let us now give some physical motivation for this result by considering some cases of

perfect fluids. For dust, which has w = 0, we have nonrelativistic particles. Suppose wehave a fixed number of particles N in a volume V0 when a = a0. If the rest mass is mthen the energy density is ρ = mN

V0≡ ρ0. If at a later time we have a different value of

a, then the number N and mass m stay the same, but the physical volume has changedto (a/a0)3 V0 and so the energy density becomes

ρ = mN

V0

(a0

a

)3

=(a0

a

)3

ρ0 , (9.1.16)

which agrees with (9.1.15) when w = 0.For radiation we have w = 1/3. Let us suppose that we have N photons with

wavelength λ0 inside a volume V0 when a = a0. In this case the energy density isρ = N h

λ0V0≡ ρ0, where h is Planck’s constant. At a later time where we have a different

value of a the volume again changes to (a/a0)3 V0 and the wavelength also changes to(a/a0)λ0. Hence the energy density becomes

ρ =N h

λ0V0

(a0

a

)4

=(a0

a

)4

ρ0 , (9.1.17)

which agrees with (9.1.15) when w = 1/3.For vacuum energy which has w = −1, we have that ρ = − Λ

8πGg00 = Λ

8πG= ρ0 for

all values of a, where Λ is the cosmological constant. This too agrees with (9.1.15) whenw = −1.

9.1.4 The Friedmann equations

We are now ready to solve Einstein equations for our homogenous, isotropic universe.We first find the Ricci tensors and Ricci scalar for the Robertson-Walker metric. Thevarious nonzero components of the Ricci tensor are

Rtt =∑i

Ritit =

∑i

giia

agiigtt = −3

a

a

Rjj = Rtjtj +

∑i 6=j

Rijij =

a

agjj + 2

k

a2gjj + 2

(a

a

)2

gjj , (9.1.18)

where we used that gtt = −1 and that gii = (gii)−1, while the Ricci scalar is

R = gttRtt +∑j

gjjRjj = 6

(a

a+k

a2+

(a

a

)2). (9.1.19)

82


The nonzero components of the Einstein tensor are then

Gtt = Rtt − 12gttR = 3

(k

a2+

(a

a

)2)

Gjj = Rjj − 12gjjR = −

(2a

a+k

a2+

(a

a

)2)gjj . (9.1.20)

Using T00 = (gtt)2T 00 = ρ and Tjj = gjjgjjT

jj = gjj p, Einstein’s equations give us thefollowing equations:

3

(k

a2+

(a

a

)2)

= 8π Gρ

−

(2a

a+k

a2+

(a

a

)2)

= 8π Gp , (9.1.21)

One can show that the second equation follows from the first equation and eq. (9.1.13)(see exercise). We can rewrite the first equation as

H2 ≡(a

a

)2

=8πG

3ρ− k

a2, (9.1.22)

H is the Hubble constant which measures the expansion rate of the universe. We willhave much more to say about this later. Equation (9.1.22) is known as the Friedmannequation. Notice this equation assumes a perfect fluid but makes no assumption aboutthe composition of the fluid. It could be a mixture of perfect fluids or one perfect fluidwith a particular equation of state p = w ρ.

All indications are that the universe is flat and so k = 0. If H0 is the Hubble constanttoday, then in this case the energy density has to equal a critical density ρ = ρc where

ρc =3(H0)2

8πG. (9.1.23)

If ρ > ρc then k > 0 and the universe is closed. If ρ < ρc then k < 0 and the universeis open. The present Hubble constant is about 70 km/sec/Mpsc, where Mpsc is ”mega-parsec”. A megaparsec is 3.26 × 106 ly = 3.09 × 1022 m, so plugging in numbers givestoday’s critical density to be

ρc =3(7× 104 m s−1/3.09× 1022 m)2

8π(6.67× 10−11 m3 kg−1s−2)≈ 9.2× 10−27 kg m−3 , (9.1.24)

which is a little less than the energy density of 6 hydrogen atoms for every cubic meter.While this might not seem like much, consider how much visible matter is in the universe.There are about 8× 1010 galaxies in the universe and about 4× 1011 stars per galaxy. Ifwe assume that every star has the mass of the sun, this gives a total mass of 6× 1052 kg.The size of the universe is about 14000 Mpsc and so its volume is roughly 3 × 1080 m3.

83


This gives an overall density from stars as 2× 10−28 kg m−3. Hence the stars contain atmost 5% of the critical density.

Going back to the Friedmann equation we can reexpress (9.1.22) as

12

(a)2 − 4πG

3a2ρ = −1

2k , (9.1.25)

As we have seen, ρ is a function of a and so the equation in (9.1.25) looks like theequation for the total energy for one dimensional motion of a coordinate a, where 1

2(a)2

is the kinetic term, −4πG3a2ρ ≡ V (a) is the potential term and 1

2k is the total energy.

Let us assume that the density is comprised of the three types of fluids, matter (akadust), radiation and vacuum. The density then has the form

ρ = ρm

(a0

a

)3

+ ρ r

(a0

a

)4

+ ρv , (9.1.26)

where ρm, ρ r and ρv are the energy densities for matter, radiation and vacuum energyrespectively, when a = a0. The effective potential is then

V (a) = −4πG

3(a0)2

(ρm

a0

a+ ρ r

(a0

a

)2

+ ρv

(a

a0

)2). (9.1.27)

Figure 9.1 shows the effective potential as a function of a in the case when ρv = 0 andfor the cases of k > 0, k = 0 and k < 0. We assume that both ρm and ρ r are nonzero.As you can see from the effective potential, the radiation term dominates for small a andthe matter term dominates for large a. We call these regions the radiation dominateduniverse and the matter dominated universe respectively. We assume that the universestarts at t = 0 with a = 0. This is the big bang where the universe starts at zero size (atleast classically). In the case where k > 0, the universe “rolls up the hill”, slowing downuntil it reaches the k > 0 line where it stops and turns around. It keeps acceleratingdownward until it reaches a = 0 again. This is the big crunch where the universe shrinksto zero size. If k = 0 then the universe rolls up the hill, slowing down until reaches zerovelocity at a = ∞. Hence, in this case the universe expands forever with the Hubbleconstant decreasing in time until it reaches zero. In the k < 0 the universe also expandsforever but approaches a nonzero a in the limit of large a.

If there is a vacuum energy density present then there can be drastic changes. Figure9.2 shows the effective potential with ρv > 0. The figure shows two different values withk > 0. In one case with the larger value of k, the universe is not expanding fast enoughto overcome the hump, in which case the expansion stops and then the universe contractsback to the big crunch. However, for a smaller value of k, the universe can get over thehump, at which point the expansion starts accelerating outward. This will also be truefor the k = 0 and the k < 0 cases as well.

We can now also see why Einstein needed the cosmological constant to get a staticuniverse. If we look at figure 9.2 again we see that there can be a solution where a = 0.This is at the top of the hump where d

daV (a) = 0.

84


a

ΡHaL

Figure 9.1: The effective potential for a with ρv = 0. The green line is k > 0, the red line isk = 0 and the blue line is k < 0.

a

ΡHaL

Figure 9.2: The effective potential for a with ρv > 0. The green line and orange lines are twodifferent values with k > 0, the red line is k = 0 and the blue line is k < 0.

9.1.5 Solutions for a definite equation of state

Let us now assume that the universe is made up of a material with a definite equationof state p = wρ. Then from (9.1.15) and the Friedmann equation (9.1.22) we have

(a)2 =8πGρ0

3

(a0)3+3w

a1+3w− k , (9.1.28)

where ρ0 is the density when a = a0. If we further assume that k = 0, then ρ0 is thecritical density ρc and we can reduce the equation to

a a1/2+3w/2 = H0 (a0)3/2+3w/2 , H0 =

√8πGρc

3, (9.1.29)


a = a0

(3(1 + w)

2H0 t

) 23(1+w)

, (9.1.30)

85


where we have assumed that a = 0 at t = 0. The Hubble constant as a function of timeis then

H(t) =a

a=

2

3(1 + w)t−1 , (9.1.31)

which is inversely proportional to the age of the universe t. Another interesting parameteris the deceleration parameter

q = − aa

(a

a

)−2

=1 + 3w

2. (9.1.32)

Let us now consider a few cases. One is a matter dominated universe with w = 0. Inthis case the Hubble constant and deceleration parameters are

H(t) =2

3t−1 q =

1

2. (9.1.33)

As we see, the expansion is positive (H > 0), but slowing down (q > 0). Using thepresent measured value of the Hubble constant, H0 = 70 km/sec/Mpsc translates intoan age t0 = 2

3H−1

0 = 2.9 × 1017 sec = 9.3 × 109 years. However, there are galaxies thatare older than this, so a matter dominated universe does not seem consistent with thisHubble constant.

For a radiation dominated universe we have w = 1/3 and so

H(t) =1

2t−1 q = 1 . (9.1.34)

Again this has a positive expansion which is decelerating. Given the present measuredvalue of the Hubble constant, this would correspond to an age of the universe which isonly 3/4 the age found for a matter dominated universe, which itself is too small. In anycase, we know that there is now far more energy density in matter than in radiation.

Another interesting value for w is w = −1/3, in which case q = 0. For w < −1/3then q < 0, which means the expansion rate is accelerating.

The vacuum dominated case has to be treated separately. In this case w = −1 andwe see that the equation in (9.1.31) is singular. However, if we go back to (9.1.29) andset w = −1, we find that the solution is

a(t) = av exp(H0t), . (9.1.35)

In this case the Hubble constant is really a constant with H(t) = H0. Notice that thevacuum universe never has a = 0 in the finite past. However, if we combine a matterdominated or radiation dominated universe at early times with a vacuum dominateduniverse at later times then we can have a universe that started at a = 0 in the finitepast.

86


9.2 Observing the Universe

The expansion of the universe affects the observations we make of stars, supernovae andgalaxies. In this section we will study how this happens. We shall see that the expansionleads to a red shift where the red-shift depends on the distance the object is away fromus. We will also see how measuring the brightness of supernovae helps us determine thecomposition of the universe.

9.2.1 Red-shifts, proper distances and the age of the universe

Astronomers describe red-shifts in terms of a quantity Z,

Z ≡ λobs

λ em− 1 , (9.2.1)

where λ em is the emitted wavelength and λobs is the observed wavelength. Let us assumethat we are the observers at r = 0 and the light source is at comoving radial coordinater. Before doing anything else, let us quickly show that a body with fixed co-movingcoordinates satisfies the geodesic equation. If we assume that r = θ = φ = 0, then

r + Γrtt tt = r = 0 , (9.2.2)

with a similar equation for θ and φ. Hence the body stays fixed on the co-movingcoordinates.

Assume that the light source emits light with wavelength λ em. Let us further assumethat the light is first emitted at t1 and one cycle later the emitted time is t1 + λ em. Thegeodesic for the light satisfies dr

dt= −a−1(t)

√1− kr2 and so r is related to the time the

light is first received, t0 by ∫ r

0

dr√1− kr2

=

∫ t0

t1

dt

a(t), (9.2.3)

while after one cycle the relation is∫ r

0

dr√1− kr2

=

∫ t0+λobs

t1+λem

dt

a(t). (9.2.4)

Assuming that t0 − t1 >> λ em, λobs, and so a(t0 + λ em) ≈ a(t0), a(t1 + λobs) ≈ a(t1),we then have from (9.2.3) and (9.2.4)

λobs

a(t0)− λ em

a(t1)= 0 , (9.2.5)

and hence it follows that

Z =a(t0)

a(t1)− 1 . (9.2.6)

Intuitively we can understand this result as follows: While the co-moving distances havestayed constant, the physical distances, also known as the proper distances have increased

87


linearly with a as a increased. In particular if the proper wavelength were a(t1)∆r at t1,then it would be a(t0)∆r at t0, giving the result in (9.2.6).

Let us now suppose that t1 is not too far in the distance past, at least compared tothe age of the universe. Then we can approximate Z as

Z ≈ a(t0)

a(t0)+ (t0 − t1)

a(t0)

a(t0)− 1 ≈ H0 d , (9.2.7)

where

d = a(t0)

∫ r

0

dr√1− kr2

(9.2.8)

is the proper distance between the light source and the observer and H0 is today’sHubble constant. Over relatively short times the Hubble constant is actually close tobeing constant and so one sees that the red-shift is proportional to the proper distancebetween the light source and the observer. This approximation is valid when Z << 1.For example, for a source that is 10 Mpscs away we would have Z = 700 km/s = 0.0023.Hubble first measured H0 by comparing the red-shifts of light sources to their distances.The original measurements were not very accurate because it is notoriously difficult tomake accurate distance measurements. In fact, not that long ago the Hubble constantwas usually quoted to be somewhere between 50 and 100 km/s/Mpsc. It is only over thelast 15 years that the accuracy has dramatically improved.

For larger red-shifts the relation between the proper distance and the red-shift requiresfurther information about a(t). From (9.2.3) and (9.2.8) we see that the proper distanceis given by

d = a0

∫ t0

t1

dt

a(t)= a0

∫ a0

a1

da

aa= a0

∫ a0

a1

da

H(a)a2, (9.2.9)

where a0 = a(t0) and a1 = a(t1). H(a) is determined from the Friedmann equation in(9.1.22) and hence we are left with an integral over a without any explicit t dependence.The integral will then only depend on ρ and Z. In fact, we can express H as H(Z) bysubstituting (9.2.6) and (9.1.26) into the Friedmann equation. If we assume that thereis only matter and vacuum energy, then we have

H2(Z) =8πG

3

(ρm(Z + 1)3 + ρv

)− k(Z + 1)2 , (9.2.10)

where ρm and ρv are the present values of the matter and vacuum energy densities andwhere k = k/(a0)2. We can then turn the integral over a to an integral over Z by alsousing the relation between the differentials,

da = −a0dZ

(Z + 1)2. (9.2.11)

Actually, from (9.2.10) we see that the presence of k has the same effect on H(Z)and d(Z) as having a perfect fluid with w = −1/3. To put this term on a more even

88


footing with the matter and vacuum energy contributions we will introduce a quasienergy density ρq which is related to k by

ρq = − 3

8πGk . (9.2.12)

Putting everything together we can express the proper distance d as the integral

d =

(3

8πG

)1/2 ∫ Z

0

dZ ′√ρm(Z ′ + 1)3 + ρq(Z ′ + 1)2 + ρv

. (9.2.13)

It is convenient to express these integrals in terms of the ratios of energy densities tothe critical density, by defining Ωm ≡ ρm/ρc, Ωq ≡ ρq/ρc and Ωv ≡ ρv/ρc, and soΩm + Ωq + Ωv = 1. The integral then has the form

d =1

H0

∫ Z

0

dZ ′√Ωm(Z ′ + 1)3 + Ωq(Z ′ + 1)2 + Ωv

. (9.2.14)

For small Z, we see that

d ≈ 1

H0

∫ Z

0

dZ ′√Ωm + Ωq + Ωv

=Z

H0

, (9.2.15)

which agrees with (9.2.7). For larger values of Z, the integral in (9.2.14) is in generalan elliptic integral, so we will instead consider special cases. For instance, for a matterdominated flat universe with Ωq = Ωv = 0 the integral is

d =1

H0

(2− 2√

Z + 1

). (9.2.16)

Observe that in the limit Z →∞, d = 2H0

. Notice that this is different than the universe’s

age in a matter dominated universe, t0 = 23H0

.Let us next consider the case of a vacuum dominated flat universe with Ωm = Ωq = 0.

In this case (9.2.14) is

d =1

H0

Z . (9.2.17)

Hence, in this case the large Z behavior is the same as the small Z behavior. We canalso see that for a fixed value of Z, d is larger for a vacuum dominated universe than amatter dominated universe.

If we have a combination of these last two examples, we can approximate the integralsby assuming that the universe is vacuum dominated up to a crossover red-shift Zc andis matter dominated beyond that. Hence for a Z > Zc we can approximate d as

d ≈ =1

H0

Zc +1

H0

∫ Z

Zc

dZ ′√Ωm(Z ′ + 1)3

≈ 1

H0

(Zc + 2(Zc + 1)

(1−

√Zc + 1

Z + 1

)), (9.2.18)

89


where in this approximation we used that Ωm(1 + Zc)3 ≈ 1.

Before closing this section let us make a final note about the universe’s age in auniverse that has a crossover from matter dominated to vacuum dominated at a latertime. If the crossover time is at tc and today’s time is t0, then using that H stays constantduring the vacuum dominated period, we have that

tc ≈2

3H0

t0 = tc + ∆t , (9.2.19)

where ∆t is however long it has been in the vacuum dominated universe. We alreadyknow that tc ≈ 9.3×109 years, so we can add time onto the age of the universe by havingan extended vacuum dominated period. The present estimate is that the universe is flatwhich means Ωq = 0, while the vacuum and matter parts are Ωv = 0.74 and Ωm = 0.26.Assuming at the crossover time the densities were the same, we have that

Ωm

Ωv=

(a(tc)

a(t0)

)3

= exp(−3H0∆t) . (9.2.20)

Hence we find that

∆t ≈ 1

3H0

log(7.4/2.6) ≈ 4.8× 109 years . (9.2.21)

Which puts the age of the universe to around 14 billion years old. From various experi-ments the agreed upon age is about 13.7± 0.13 billion years. To get this more accuratevalue, from the Friedmann equation we can express the age as

t0 =

(3

8πG

)1/2 ∫ a0

0

da

a√ρ(a)

, (9.2.22)

where ρ(a) = ρc(Ωm(a0a

)3+ Ωv). Defining a variable y = a/a0 we can write the integral

as

t0 =1

H0

∫ 1

0

dy√Ωm y−1 + Ωv y2

. (9.2.23)

Taking the above values for Ωm and Ωv, as well as the best measurement of the Hubbleconstant, which is H0 = 70.4 ± 1.5 km s−1/Mpsc, and inserting these into (9.2.23) wefind

t0 =1

H0

∫ 1

0

dy√(0.26) y−1 + (0.74) y2

≈ 1

H0

(1.003) ≈ 13.89± 0.29× 109 years ,(9.2.24)

where the integral has been done numerically.

90


9.2.2 Brightness

Suppose we have a light source with luminosity L, the total energy released by theobject per unit time. The brightness as measured by an observer is equal to the flux,the energy per unit time per unit area. Assuming that the light source radiates equallyin all directions, then the flux F measured by an observer today is

F =L

4π(a0r)2

1

1 + Z

1

1 + Z. (9.2.25)

Let us explain each of these factors. First, the flux should be inversely proportional tothe area of a sphere whose physical radius, according to the Robertson-Walker metric in(9.1.8), is a0r. In the case where k = 0 then this is the proper distance d separating thesource and the observer. If k > 0 then a0r = (k)−1/2 sin((k)1/2d), while for k < 0 therelation is a0r = (−k)−1/2 sinh((−k)1/2d). One factor of (1 + Z)−1 in (9.2.25) accountsfor the red-shift of the radiation. Since the energy of a photon is inversely proportionalto the wavelength, the energy flux has to be rescaled accordingly. The other factor of(1 + Z)−1 takes into account time dilation. The luminosity is proportional to the ratethat photons are being emitted. But there is a time dilation factor that is proportionalto the red-shift, so the photon emission rate goes down by the same factor.

It is convenient to express the flux in terms of the apparent distance, D(Z)

F =L

4π(D(Z))2, (9.2.26)

where

D(Z) =1

H0

(1 + Z)|Ωq|−1/2S(|Ωq|1/2H0 d(Z)

). (9.2.27)

Here we used that k = −Ωq(H0)2 which follows from (9.2.12). The function S(x) isS(x) = x for k = 0, S(x) = sin(x) for k > 0 and S(x) = sinh(x) for k < 0.

As an example, for a matter dominated flat universe we have from (9.2.16) that

F =L(H0)2/4

4π(1 + Z)2 (1− (1 + Z)−1/2)2 ≈ L(H0)2

4πZ2Z << 1

≈ L(H0)2

16πZ2Z >> 1 . (9.2.28)

For a vacuum dominated flat universe we have from (9.2.17) that the flux is

F =L(H0)2

4π(1 + Z)2Z2≈ L(H0)2

4πZ2Z << 1

≈ L(H0)2

4πZ4Z >> 1 . (9.2.29)

Hence, for large values of Z the light source will appear dimmer in a vacuum dominateduniverse than in a matter dominated universe.

91


Figure 9.3: Light curves for various supernovae. The left graph shows that the shorter thelifetime the dimmer the supernova. The right graph shows the light curves after rescaling fortheir lifetime. The results are remarkably uniform. The absolute magnitude is how bright thesupernova would appear if it were 10 parsecs away from us. An absolute brightness of -20means that if the supernova were 10 parsecs away from us it would be more than 600 timesbrighter than the full moon (which has apparent magnitude -13).

Astronomers give brightness measurements in terms of apparent magnitude, m. Forhistorical reasons its definition has now become

m ≡ −2.5 log10(F/F0) +m0 (9.2.30)

where m0 and F0 are the apparent magnitude and flux for some standard source. So forexample, the star Vega has apparent magnitude m = 0.03. If we measured a light sourcewith m = 5.03 then this object is 100 times dimmer than Vega. Given the form of theflux in (9.2.26), we see that the apparent magnitude for our light source is

m = +5 log10

(D(Z)

)− 2.5 log10(L/(4πF0) +m0 . (9.2.31)

Hence, by measuring m and by knowing L we can determine D. If we make the mea-surements for many different values of Z then D(Z) can be determined and so then canthe various energy densities.

The big problem is finding a light source whose luminosity is known. In order tosee over significant red-shifts, this light source also has to be very bright. Over the last15 or 20 years such a light source has been identified. These are what are known astype Ia supernovae. They are a particular type of exploding star whose luminositiesas a function of the time following the explosion have become well understood. Somesupernovae are brighter than others, but it was observed that the brighter ones stayedbrighter longer and that a simple rescaling of luminosity by the supernova lifetime gavea remarkably uniform result. The plots of luminosity versus time for these supernovaeare known as their light curves (see figure 9.3). So by observing a supernova over itslifetime (typically weeks) one can determine a “standard candle”, that is its luminosity.Moreover, supernovae are very bright. They are often brighter than the galaxy they sitin and the Hubble space telescope has been able to observe supernovae with Z up to 4.

92


0.1 0.2 0.5 1.0 2.0 5.0Z

20

22

24

26

28

30

m

Figure 9.4: Hubble plots for (Ωm,Ωv)=(0.3, 0.7) [blue)], (0.3, 0.0] [red], and (1.0, 0.0) [green].On the vertical axis we show the change in brightness from a source at Z = 0.05. The Z axisis shown on a log scale with 0.05 < Z < 5. The middle case has k < 0 while the other two havek = 0.

1.000.500.20 0.300.15 0.70Z

21

22

23

24

25

m

Figure 9.5: The same Hubble plots as in figure 9.4 but over the narrower range 0.1 < Z < 1.0.Here there is a clearer distinction between the different scenarios.

In figure 9.4 we show plots of magnitude versus Z for 3 possible scenarios for theenergy density of the universe. This type of plot is known as a Hubble plot. The particularcases here have (Ωm,Ωv) given by (0.3, 0.7), (0.3, 0) and (1.0, 0.0). The first and thethird are flat since they have Ωm + Ωv = 1, while the middle case is open since it hasΩm + Ωv < 1. The magnitudes are those for rescaled supernovae brightness, whereZ = 0.1 is about m = 20. The plot shows Z on a log scale ranging from 0.05 to 5. Noticethat once Z becomes greater than 1 the first and second plots start to approach other.Hence, the best place to distinguish between these two behaviors is for 0.1 < Z < 1.Figure 9.5 shows the same Hubble plots in this narrower range. Another useful plot isshown in figure 9.6 which shows the differences in magnitude between the three Hubbleplots and the middle one.

The actual data is shown in figure 9.7, which combines the results of two major groups,the High-Z Supernova Search Team which uses the Hubble Space Telescope (HST) and

93


0.01 0.02 0.05 0.10 0.20 0.50 1.00Z

-0.5

0.0

0.5

1.0Dm

Figure 9.6: The same Hubble plots as in figure 9.4 but shown as a difference between the(Ωm,Ωv) = (0.3, 0.0) case and the other cases.

the Supernova Cosmology Project using various telescopes including the HST. While itmight not seem completely obvious from the plots, there is a huge statistical significancethat the points on the plot follow the curve for a vacuum dominated universe withΩv ≈ 0.70. This is made more clear in the difference plot at the bottom, where clearlymore points are above the 0.0 line than below it.

94


Figure 9.7: Hubble plots from the actual data. The data points track closest to the (0.3, 0.7)plot.

95

Chapter 10

The thermal history of the universe

In this chapter we discuss the thermal history of the universe. This includes the black-body spectrum of the cosmic microwave background.

10.1 The thermal universe

10.1.1 Photons in the early universe

Since the universe has been expanding for a long while, at some time in the past it musthave been highly compressed. This means that the particles in the universe were veryclose together and thus interacting with each other. Since the particles could interactthey would be continuously exchanging their energies and hence would be expected tobe in thermal equilibrium.

Among the particles that were interacting are the photons. Photons are masslessbosons that have two degrees of freedom, its two independent polarizations. Assumingthat the photons were in a thermal bath with temperature T , then their energy densityper wavelength λ would have been given by

ρ(λ, T ) =4πhc g∗λ5

1

exp(hc/λkT )− 1(10.1.1)

where g∗ is the number of degrees of freedom which is two if only photons contributeto ρ(λ, T ), h is Planck’s constant and k is the Boltzmann constant. ρ(λ, T ) is Planck’senergy density distribution for a blackbody. ρ(λ, T ) is peaked where ∂

∂λρ(λ, T ) = 0,

which occurs when

∂

∂λρ(λ, T ) =

4πhc g∗λ5

1

exp(hc/λkT )− 1

(−5

λ+

hc

λ2kT

exp(hc/λkT )

exp(hc/λkT )− 1

)= 0(10.1.2)

This can be solved numerically where one finds that the peak is at

λc =hc

bkTb ≈ 4.9651 . (10.1.3)

96


Figure 10.1: The blackbody spectrum of the CMB as observed by Cobe.

The integrated energy density is

ρ(T ) =

∫ ∞0

ρ(λ, T ) dλ =π2g∗30

k4

(~ c)3T 4 =

2g∗cσ T 4 , (10.1.4)

where σ = 5.67× 10−8 watts m−2 K−4 is the Stefan-Boltzmann constant.It is often common to present the data in terms of ρ(ν, T ) where ν is the frequency

and ρ(T ) =∫∞

0dν ρ(ν, T ). In this case the peak in the frequency is at

νc =kT

bhb ≈ 2.821 . (10.1.5)

As the universe expands, the wavelengths of the photons are redshifted and hencethe photons and the particles they interact with cool down. Eventually, the temperatureis low enough that the electrons recombine with the protons to form hydrogen atomsand there are no longer charged particles available to scatter with the photons. Theseleftover photons can then essentially travel forever without scattering until we see themin our detectors.

Let us say that the temperature that this recombination occurs is at T = 3740 K,which corresponds to kTrec = 0.323 eV. Let us call the universe’s age at this point trec.Then at time trec, the photons have the energy density spectrum ρ(λ, Trec). After thistime the photons stop scattering, but their energy density will stay intact although itwill be redshifted. In particular, a wavelength λ at trec becomes a(t)

a(trec)λ at a later time

97


t. The combination ρ(λ, Trec) dλ then rescales to

ρ(λ, Trec) dλ →(a(trec)

a(t)

)4

ρ

(a(trec)

a(t)λ, Trec

)a(trec)

a(t)dλ

=4πhc g∗λ5

1

exp(hc a(t)/a(trec)λkTrec)− 1dλ

= ρ

(λ,a(trec)

a(t)Trec

)dλ , (10.1.6)

where we multiplied by the scale factor for the radiation energy density and we replacedλ in ρ(λ, T ) with its value at trec. Hence, the new energy density still has the form

of a blackbody, but now the temperature is red-shifted to a(trec)a(t)

Trec. Hence, today weshould expect to see the remnant radiation from this period when the photons werescattering off of the electrons and protons just before recombination. This radiation iscalled the cosmic microwave background (CMB). Figure 10.1 shows the actual spectrumfor the CMB as observed by the COBE satellite. What is presented is the intensity perfrequency which is given by 1

4c ρ(ν, T ). Eyeballing the figure you can see that the peak

is at νc = 160 GHz which corresponds to a wavelength of λc = 1.875 mm. Plugging thefrequency into (10.1.3) we find that the temperature today is

T0 =(6.626× 10−34 j-s)(160× 109 s−1)

(2.821)(1.381× 10−23 j K−1)= 2.73 K . (10.1.7)

As you can see from the figure, T0 is known even more accurately than our estimate sincethe position of the peak can be more accurately determined.

Comparing T0 to Trec, we then find that Z at recombination is

Zrec =TrecT0

− 1 ≈ 3740

2.73− 1 ≈ 1369 . (10.1.8)

Hence, these photons must have been last scattered when the universe was very young.To find out how young this was, let us suppose that the universe was matter dominated atrecombination (we will justify this soon). From the Friedmann equation and analogouslyto eq. (2.22) in chapter 9, we have that trec is

trec =

(3

8πG

)1/2 ∫ arec

0

da

a√ρ(a)

=1

H0

∫ arec/a0

0

dy√Ωm y−1 + Ωv y2

≈ 1

H0

√Ωm

∫ Z−1rec

0

y1/2 dy =2

3H0

√Ωm Z3

rec

. (10.1.9)

Using H0 = 70 km/s/Mpsc, Ωm = .26 and Zrec = 1369 gives

trec =2 (3× 108 m/s)

3(7× 104 m/s Mpsc−1√.26(1369)3

≈ 3.6× 105 years . (10.1.10)

Hence the age of the universe at recombination is a small fraction of its age today.

98


10.1.2 Other contributions to g∗

Up to now we have been assuming that the radiation inside the universe is only comprisedof photons. But there are other light particles in nature that are relativistic throughoutthe history of the universe. Their contribution to the energy density is partly determinedby when these particles fell out of thermal equilibrium. For example, in addition to thephotons there should be neutrinos present as well. We can take account of these extradegrees of freedom by modifying the g∗ that appears in (10.1.4) to

g∗ =∑

bosons

gi

(TiT0

)4

+7

8

∑fermions

gi

(TiT0

)4

(10.1.11)

where gi is the number of degrees of freedom of each species and Ti is its temperaturetoday. Fermions have the extra factor of 7/8 because they have Fermi-Dirac statisticsand so their spectra have the form

ρi(λ, T ) =4πhc giλ5

1

exp(hc/λkT ) + 1. (10.1.12)

The + sign in the denominator then leads to the 7/8 factor after doing the integral in(10.1.4). Note that the particles can have different temperatures today because they areno longer interacting with each other, so they cannot exchange energies to equalize theirtemperatures if they are different.

Aside from the photons, the only other particles contributing to g∗ today are theneutrinos. There are three types of neutrinos (νe, νµ, ντ ) and antineutrinos (νe, νµ,ντ ), where the neutrinos are lefthanded meaning that their spin is antialigned with theirmomentum while the antineutrinos are righthanded meaning that their spin is alignedwith its momentum. Hence the number of degrees of freedom from all neutrino andantineutrino species is gi = 6.

We next derive the neutrino temperature. The neutrinos are weakly interacting andso they stopped interacting with other particles at a temperature around kT = 1 MeV.At temperatures above 1 MeV electrons and positrons are highly relativistic since theirmass is 0.511 MeV and so they would have been contributing to g∗. Now the entropydensity s is given by

s(T ) =1

T

(ρ(T )− f(T )

), (10.1.13)

where f(T ) is the free energy density. For highly relativistic particles fi(T ) = −13ρi(T ),

where i labels the particle species. Hence the contribution to the entropy from speciesi is si(T ) = 4

3ρi(T )/T . At temperatures above kT = 1 MeV, the contribution to the

entropy density of the electrons and positrons will therefore be

sep(T ) = 78

42sγ(T ) , (10.1.14)

where sγ(T ) is the entropy density of the photons and the factor of 4 counts the number ofdegrees of freedom for the electrons and positrons (2 spins for each). As kT drops below

99


1 MeV, it is no longer energetically favorable to have electron positron pairs, so they willannihilate into photons. Let us say that this conversion takes place at temperature Ta.However, during this annihilation process the entropy density cannot drop so all of theentropy goes into the photons, forcing them to heat up. Hence we have

sep(Ta) + sγ(Ta) =11

4sγ(Ta) = sγ(Trh) , (10.1.15)

where Trh is the temperature of the reheated photons after the annihilation. Since the

entropy densities are proportional to T 3, we have that Trh =(

114

)1/3Ta. However, the

neutrinos, which are no longer in thermal equilibrium will remain at the temperatureTa since there cannot be any energy transfer over to them. Hence we learn that for the

neutrinos, Ti =(

114

)−1/3T0. Applying this result, we find that the effective number of

degrees of freedom g∗ is

g∗ = 2 +7

86

(4

11

)4/3

= 3.36 . (10.1.16)

Let us now justify that the universe was matter dominated at recombination. Theenergy density of the photons and neutrinos at a temperature Trec is given by

ρ r(Trec) =2g∗cσ T 4

rec =2(3.36)(5.67× 10−8 watts m−2 K−4)(3740 K)4

3× 108 m s−1

= 0.249 joules m−3 = 2.76× 10−18 kg m−3 . (10.1.17)

The matter density at recombination was

ρm(Trec) = ρc Ωm Z3rec = 9.2× 10−27 kg m−3(0.26)(1369)3 ≈ 6.2× 10−18 kg m−3(10.1.18)

and so ρm(Trec) > ρ r(Trec), although not by much.To see when there was crossover to the radiation dominated period at time trc, we

note that this occurs when

ρ r(T ) = ρm(T ) ⇒ 2g∗cσ T 4 = ρcΩm

(T

T0

)3

(10.1.19)

and so

Z ≈ T

T0

=cρcΩm

2g∗σ(T0)4=

(3× 108 m s−1)(9.2× 10−27 kg m−3)(0.26)

2(3.36)(5.67× 10−8 watts m−2 K−4)(2.73 K)4≈ 3075 .(10.1.20)

Hence, if we do the same calculation as we did for trec we find

trc ≈2(3× 108) m/s

3(7× 104 m/s Mpsc−1√.26(3075)3

≈ 1.1× 105 years . (10.1.21)

100


10.1.3 The recombination temperature

In a previous section we stated that kTrec = 0.323 eV. Here we compute this result. Theidea is that the matter in the universe, at least the matter that we can see is mainlymade up of electrons, protons and neutrons. We will ignore the neutrons for now andconsider the electrons and protons only.

An electron can combine with the proton to form a hydrogen atom, and in order toconserve energy and momentum a photon must also be created. Likewise, an energeticphoton can hit an hydrogen atom, ionizing it into a separate electron and proton. Hencewe have the chemical reactions

H + γ ↔ e− + p . (10.1.22)

The number densities of the H, e− and p are all given by the Boltzmann density

ni(T ) =

∫d3p

(2π~)3e−p

2/2mikT e(µi−mi)/kT =

(mikT

2π~2

)3/2

e(µi−mi)/kT , (10.1.23)

where mi is the mass and µi is the chemical potential for whichever of the three we arediscussing. We could also include a factor for the spin degrees of freedom, but its effectwill divide out below so we ignore it. The chemical potential of the photons is zero sincethey are not conserved. Thus, if the particles are in chemical equilibrium then theirchemical potentials satisfy the relation µH = µe + µp. Therefore, we find that

nH(T )

ne(T )np(T )=

(2π~2mH

mempkT

)3/2

eQ/kT , (10.1.24)

where Q = me +mp −mH = 13.6 eV is the binding energy of the hydrogen atom.

We next define the quantity X =np(T )

nB(T ), where nB(T ) = np(T )+nH(T ) is the baryon

number density (protons and neutrons both have baryon number 1, as does the hydrogenatom since it has one proton in the nucleus). We further assume that ne = np since thetotal charge is zero and also that mH ≈ mp. Hence, we find the equation for X,

1−XX2

= nB(T )

(2π~2

mekT

)3/2

eQ/kT . (10.1.25)

This equation is known as the Saha equation. We then assume that at recombination,nH = np, and so X = 1/2. The energy density today of the baryons is about 4-5% of ρc,which gives a baryon number density of about

nB(T ) ≈ (0.04)ρcmp

(T

T0

)3

= (0.04)9.2× 10−27 kg m−3

1.67× 10−27 kg

(T

T0

)3

≈ 0.22 m−3

(T

T0

)3

,(10.1.26)

although the exact number is not terribly important for determing Trec. Plugging in thenumbers for me = 9.11× 10−31 kg, ~ = 1.05× 10−34 joules-s, and kT0 = 2.35× 10−4 eVwe find that 2π~2

me(kT0)2= 8.6× 10−12 eV−1 m2. This leaves us with the equation

2 = (0.22)(8.6× 10−12 eV−1)3/2(kTrec)3/2 exp((13.6 eV)/kTrec) , (10.1.27)

101


which can be solved numerically to give kTrec ≈ 0.323 eV. Note that the actual baryondensity does not have to be too accurate because of the sharp exponential behaviorcoming from the eQ/kT term. If we had used 5% instead of 4% the answer would havebeen kTrec = 0.325 eV.

10.1.4 Nucleosynthesis

Not all of the baryons in the universe are protons. There are also neutrons, all of whichare contained in the nuclei of heavier atoms. Most of the neutrons are in fact in heliumatoms. Helium is a byproduct of stellar burning, but most of the helium in the universewas produced soon after the big bang during a stage called nucleosynthesis. In thissection we will estimate the density of this primordial helium.

When the universe was very hot, there was thermal equilibrium between neutrons,protons, neutrinos, electrons and positrons. Among the interactions that thermalizedthis soup were

n + νe ↔ e− + p

n + e+ ↔ νe + p (10.1.28)

where νe and νe represent a neutrino and antineutrino respectively of electron type.While in thermal equilibrium, the number density of neutrons to protons is

nn

np= e−Q/kT , (10.1.29)

where Q is the mass difference, Q = mn −mp = 1.29 MeV. Hence, it would seem thatonce kT fell well below 1.29MeV the number of neutrons to protons would be negligiblysmall.

However, the interactions in (10.1.28) are weak interactions, meaning that do notoccur too often. In fact the interaction rate Γ for a neutron to turn into an electron anda proton is given by

Γ = nνσweakc , (10.1.30)

where nν is the number density of the neutrinos and σweak is the weak cross-section forthis interaction. As the universe cools the neutrino number density falls off as nν ∼ T 3

while the cross-section falls off as σweak ∼ T 2, hence the rate falls off as Γ ∼ T 5. Sincethe universe is radiation dominated during this epoch, we have that a(t) ∼ t1/2 (seesection (1.5) in chapter 9). Therefore, Γ ∼ t−5/2 since a(t) is inversely proportional to T .Because the rate falls off faster than 1/t, only a finite number of the neutrons will interactwith the neutrinos beyond a certain time. This time can be reasonably estimated as thepoint where the interaction rate Γ equals the expansion rate H = 1

2t−1 for a radiation

dominated universe. After this point the neutrons stop interacting and they essentiallyfreeze out.

To estimate this temperature, we have that the cross-section is approximately,

σweak ∼ 10−47 m2

(kT

1 MeV

)2

(10.1.31)

102


while the electron neutrino density is

nν =

∫ ∞0

dλ4π

λ4

1

exp(hc/λkT ) + 1=

3 ζ(3)

4π2

(kT

~c

)3

. (10.1.32)

From the Friedmann equation we also have that

H =

√8πG

3ρ r(T ) , (10.1.33)

where ρ r(T ) has the form in (10.1.4). At these temperatures g∗ is approximately

g∗ ≈ 2 + 78(6 + 2 + 2) ≈ 11 , (10.1.34)

where we have counted 6 neutrino species and the two spins for the electrons and thepositrons. Hence, we find that

(kT )3 ≈√

4π3Gg∗ ~3

45 c

4π2

3 ζ(3)× 1047 MeV2 m−2 , (10.1.35)

where we have inserted some factors of c to convert kg to joules. Plugging in numberswe find that

(kT )3 ≈ 19 MeV3 kT ≈ 2.7 MeV . (10.1.36)

Actually, our approximations have been a little too crude, the actual temperature forthe freezeout is closer to kT ≈ 0.8 MeV. Hence, we expect that the number density ofneutrons to protons is nn ≈ np exp(−1.29/0.8) ≈ 0.2np.

We can estimate the age of the universe when the freeze-out occurred. Using thatt = 1

2H−1 we have that

t =1

2

√45 ~3 c5

4π3Gg∗ (kT )4=

1

2

√45 (1.05× 10−34 j-s)3 (3× 108 m s−1)5

4π3(6.67× 10−11 m3 kg−1 s−2 (11) (1.29× 1.6× 10−13 j)4

∼ 1 sec . (10.1.37)

After the freeze-out the neutrons can interact with the protons to form heavier nu-clei. The process goes forward, first making deuterium nuclei and then tritium, beforeproducing helium. After helium the process more or less stops since there are no stablenuclei with 5 nucleons. Hence, we expect that most of the neutrons end up in helium.Since helium has 2 protons and 2 neutrons, we expect that the number of helium atomsto hydrogen atoms is approximately

nHe ≈nn/2

np − nnnH ≈

1

8nH . (10.1.38)

This puts the mass fraction of helium for all baryons to be

Y ≡ 4nHe

4nHe + nH=

1

3. (10.1.39)

The observed value is about Y ≈ 0.24. With a more careful analysis along the lines ofthe recombination computations it is possible to get this value, but we won’t do thishere.

103

Chapter 11

Cosmology continued

In this chapter we discuss the uniformity of the cosmic microwave background and in-troduce the concept of inflation.

11.1 Inflation

11.1.1 The particle horizon

Recall from chapter 9 that the proper distance in a spatially flat universe is given by

d =1

H0

∫ Z

0

dZ ′√Ωm(Z ′ + 1)3 + Ωv

. (11.1.1)

The maximum proper distance, d0 that a light ray can travel over the age of the universeis then found by setting Z =∞. If we use the values Ωm = 0.26 and Ωv = 0.74 then wefind that d0 = 3.50H−1

0 = 3.50 (3 × 108 m s−1)/(7 × 104 m s−1 Mpsc−1) ≈ 15, 000 Mpsc.We call this the particle horizon of the universe today, or just the horizon.

Now let us go back to the time of recombination and compute the particle horizonthen. In this case the proper distance traveled is

drec =1

H0

areca0

∫ ∞Zrec

dZ ′√Ωm(Z ′ + 1)3 + Ωv

≈ 1

ZrecH0

√Ωm

∫ ∞Zrec

Z−3/2dZ =2

H0

√ΩmZ3

rec

. (11.1.2)

If we compare the coordinate distances of these particle horizons, we find that today itis given by ∆r0 = d0/a0, while at recombination it was ∆rrec = drec/arec. Hence theirratios is given by

δ =∆rrec∆r0

=2

(3.5)√

ΩmZrec≈ 2

(3.5)√

(0.26)(1369)≈ 3.0× 10−2 . (11.1.3)

The result in (11.1.3) presents a problem. Supposedly at recombination the photonswere in thermal equilibrium. But in order for the photons in one region of the universe

104


!r0

rrec

Figure 11.1: Particle horizons at recombination and today in terms of the comoving coordinates.The angle in the sky inside the recombination particle horizon is δ.

to have the same temperature as those in another region, it must have been possible forsome particles to travel from one region to another, or to at least scatter off particlesthat scattered off of other particles, etc. until energy was transfered from the first regionto the second or vice versa. However, the absolute fastest that such an energy transfercan happen is the speed of light. Hence, we seem to have no reason to expect that overcoordinate distances greater than rrec that the photon spectrum would have the sametemperature. The radiation after the photons last scatter emerge from regions of theuniverse that are more or less fixed in the comoving coordinates. This means that foran observer looking at the CMB today, we would expect the temperature to be uniformonly over an angular separation δ or less (see figure 11.1). From (11.1.3) we find thatδ ≈ 1.7 degrees. But what do we see? The temperature is remarkably uniform over theentire sky, with a relative deviation in temperature ∆T ∼ 10−5T0. Therefore, it seemsthat the entire region of the sky was at one time in thermal equilibrium, even thoughthis does not seem possible for the scenario we have described.

11.1.2 Inflation and the particle horizon

To find a way around this dilemma, let us consider what happens if there were a shortperiod at a very early stage when the universe was vacuum dominated. Assume that att = 0 the universe starts out radiation dominated until a time t1. Then at t = t1 theparticle horizon is at

d1 = a1

∫ t1

0

dt

a(t)=

∫ t1

0

√t1tdt = 2 t1 . (11.1.4)

105


The coordinate size of the universe at this point is ∆r1 = 2t1/a1. Then at t1 the universeswitches over to a vacuum dominated universe until a time t = t2. The particle horizonthen “inflates” to a new value d2,

d2 = d1eH1(t2−t1) , (11.1.5)

where H1 is the hubble constant during this expansion. After time t2 the universe goesback to a radiation dominated universe with expansion

a(t) = eH1(t2−t1)a1

(t

t2

)1/2

. (11.1.6)

Now let us say we knew nothing about the inflationary period and we just observed theexpansion as in (11.1.6). Then we would say that the particle horizon at time t is atd = 2 t and that the coordinate horizon is at

∆r = e−H1(t2−t1)a−11 (t2 t)

1/2 . (11.1.7)

Thus it is possible for ∆r << ∆r1.For the sake of argument, let us suppose that we live in a radiation dominated

universe. This will get the age of the universe wrong by a factor of two, but that won’tmatter much for our discussion but it will simplify the calculations. Then the coordinatedistance of the particle horizon today is

∆r0 = e−H1(t2−t1)a−11 (t2 t0)1/2 . (11.1.8)

Now if ∆r1 > ∆r0, then the entire visible universe could have been in thermal equilibriumbefore the inflationary period. In order for this to be true, we must have that

eH1(t2−t1) >

(t2 t0t21

)1/2

. (11.1.9)

If we assume that t2 ∼ t1, and using that the universe is radiation dominated afterward,we find that

eH1(t2−t1) >

(t0t1

)1/2

= Z2 , (11.1.10)

where Z2 is the Z factor at t2. For various particle physics reasons, it is often assumedthat the temperature during inflation is kT1 ∼ 1014 GeV. The temperature today iskT0 = 2.35× 10−4 eV, thus we have

H(t2 − t1) > log

(1014 GeV

2.35× 10−4 eV

)∼ 60 . (11.1.11)

Hence, the inflationary period has to go through 60 e-foldings to perhaps explain theuniform temperature of the universe.

106


Φ

VHΦL

Figure 11.2: A slowly rolling potential that can lead to inflation in the early universe.

11.1.3 The free lunch

A little thought shows that there could be a problem with having an inflationary period.The problem is that during inflation the radiation in the universe rapidly cools due tothe exponential expansion. The universe would thus be much too cold since after 60e-foldings the universe would have dropped to the present temperature, even though weare assuming that the inflation ends a tiny fraction of a second after the beginning ofthe universe. Actually, this problem is related to a question: How does one get out ofinflation, since up to now we have been thinking that once we are in a vacuum dominateduniverse the vacuum energy density will dominate at all later times? It turns out thatthe answer to this question will also resolve our problem.

Let us suppose that we have a scalar field φ which has a potential V (φ). The contri-bution to the energy momentum tensor from this potential is ∆Tµν = −gµνV (φ). Hencethis potential is a source of vacuum energy. Let us suppose that the potential has theform as shown in figure 11.2 and let us suppose that the field starts at φ = 0. At somepoint in time, say t1 this term will dominate the energy density and the universe willstart to inflate. However, let us also assume that during the inflation, the field φ isrolling down the hill. The rolling starts slowly since the slope is very gentle at the top,but eventually it approaches the edge and rolls down to the bottom. This is where theinflation is ending since the vacuum energy is disappearing. But this energy must gosomewhere, and what happens is that the energy is transferred to all of the radiation.Hence the energy density of the radiation after this graceful exit is roughly the energydensity that the vacuum started with at t1 which was also the energy density of theradiation, since the inflation started when their energy densities were equal. Hence, thetemperature of the radiation after the graceful exit is the same as at the beginning ofinflation and so we have solved our problem. This resolution was originally called the“free lunch” by Alan Guth, the inventor of the inflationary paradigm.

Let us make this story a little more concrete. A scalar field has an action given by

S = −∫d4x√−g[

12gµν∂µφ ∂νφ+ V (φ)

]. (11.1.12)

107


The equation of motion is then given by

− 1√−g

∂µgµν ∂νφ+ V ′(φ) = 0 . (11.1.13)

If we assume that we have a spatially flat space with the Robertson-Walker metric, theng00 = −1 and

√−g = a3(t). If we further assume that φ has only time dependence, then

the equation of motion reduces to

1

a3∂0(a3∂0φ) + V ′(φ) = 0

φ+ 3Hφ+ V ′(φ) = 0 , (11.1.14)

where we used that H = aa. Examining the second equation in (11.1.14), we see that it

looks like the motion of a particle in one dimension, with H acting like a friction term.If the scalar field is slowly rolling, then |φ| 3H|φ| and (11.1.14) reduces to

φ ' −V′(φ)

3H(11.1.15)

We can find a condition between V (φ) and the Hubble parameter so that this slow rollingholds. If we take another time derivative on (11.1.14) we find

...φ + 3Hφ+ φ V ′′(φ) = 0 , (11.1.16)

where we assumed that H = 0. Ignoring the...φ term and using that |φ| << |3Hφ| leads

to the condition

|φ V ′′(φ)| = |3Hφ| 9H2|φ| ⇒ |V ′′(φ)| << 9H2 . (11.1.17)

This is known as the slow roll condition. From the Friedman equation we can write thisanother way. Namely, at the top of the hill, ρ = V (φ), therefore H2 = 8πG

3V (φ) and so

the slow roll condition becomes

|V ′′(φ)| 24π GV (φ) . (11.1.18)

It turns out that there is another slow-roll condition. The changing of the scalar field alsocontributes to the energy momentum tensor, such that the energy density and pressureare

ρ = 12φ2 + V (φ) , p = 1

2φ2 − V (φ) . (11.1.19)

Hence we want 12φ2 V (φ) in order to have a vacuum dominated energy density. From

the equation in (11.1.15) we then have that

12φ2 ≈ 1

18H2(V ′(φ))2 =

1

48π G

(V ′)2

V, (11.1.20)

and so

|V ′(φ)| (48π G)1/2|V (φ)|, . (11.1.21)

108


These two slow roll conditions are normally expressed in terms of two dimensionlessparameters,

ε =1

16π G

(V ′(φ)

V (φ)

)2

η = =1

8π G

(V ′′(φ)

V (φ)

)(11.1.22)

If the slow roll conditions are met then ε, |η| 1.

11.2 Other things to do

Given more time, I would eventually like to include discussions on the following topics:

• More thorough discussion of nucleosynthesis

• Jeans instability and galaxy formation

• Fluctuations in CMB

• Spectrum of multipole moments in CMB

• Combining supernova, CMB data to get the best fit for the present cosmology.

• Gaussian fluctuations in inflation; ”ringing of the bell”

• Other black holes including Reisner-No rdst”om and Kerr.

• Gravitational waves

109

gravitation och kosmologi lecture notes - uppsala · pdf filegravitation och kosmologi...

Documents