general relativity

401

Upload: abdilatif-mohamud

Post on 12-Jan-2016

139 views

Category:

Documents


0 download

DESCRIPTION

This is a textbook on general relativity for upper-division undergraduates majoring in physics, at roughly the same level as Rindler's Essential Relativity or Hartle's Gravity. The book is meant to be especially well adapted for self-study, and answers are given in the back of the book for almost all the problems. The ratio of conceptual to mathematical problems is higher than in most books. The focus is on "index-gymnastics" techniques, to the exclusion of index-free notation. Knowledge of first-year calculus and lower-division mechanics and electromagnetism is assumed. Special relativity is introduced from scratch, but it will be very helpful to have a thorough previous knowledge of SR, at the level of a book such as Taylor and Wheeler's Spacetime Physics or my own text Special Relativity. Book authored by Benjamin Crowell, titled General Relativity.

TRANSCRIPT

Page 1: General Relativity
Page 2: General Relativity

2

Page 3: General Relativity

General RelativityBenjamin Crowellwww.lightandmatter.com

Page 4: General Relativity

Fullerton, Californiawww.lightandmatter.com

Copyright c©2009 Benjamin Crowell

rev. July 14, 2015

Permission is granted to copy, distribute and/or modify this docu-ment under the terms of the Creative Commons Attribution Share-Alike License, which can be found at creativecommons.org. Thelicense applies to the entire text of this book, plus all the illustra-tions that are by Benjamin Crowell. All the illustrations are byBenjamin Crowell except as noted in the photo credits or in paren-theses in the caption of the figure. This book can be downloadedfree of charge from www.lightandmatter.com in a variety of formats,including editable formats.

Page 5: General Relativity

Brief Contents1 A geometrical theory of spacetime 112 Geometry of flat spacetime 413 Differential geometry 874 Tensors 1235 Curvature 1596 Vacuum solutions 2037 Symmetries 2438 Sources 2679 Gravitational waves 343

5

Page 6: General Relativity

6

Page 7: General Relativity

Contents

1 A geometrical theory of spacetime 111.1 Time and causality . . . . . . . . . . . . . . . . 121.2 Experimental tests of the nature of time . . . . . . . . 14

The Hafele-Keating experiment, 15.—Muons, 16.—Gravitationalred-shifts, 16.

1.3 Non-simultaneity and the maximum speed of cause and ef-fect . . . . . . . . . . . . . . . . . . . . . . . . . 171.4 Ordered geometry. . . . . . . . . . . . . . . . . 181.5 The equivalence principle. . . . . . . . . . . . . . 20

Proportionality of inertial and gravitational mass, 20.—Geometricaltreatment of gravity, 21.—Eotvos experiments, 22.—The equiv-alence principle, 23.—Gravitational red-shifts, 32.—The Pound-Rebka experiment, 34.

Problems . . . . . . . . . . . . . . . . . . . . . . 38

2 Geometry of flat spacetime 412.1 Affine properties of Lorentz geometry. . . . . . . . . 42

Parallelism and measurement, 42.—Vectors, 46.

2.2 Relativistic properties of Lorentz geometry . . . . . . 52Geodesics and stationary action, 63.

2.3 The light cone . . . . . . . . . . . . . . . . . . 64Velocity addition, 66.—Logic, 68.

2.4 Experimental tests of Lorentz geometry . . . . . . . . 68Dispersion of the vacuum, 69.—Observer-independence of c, 69.—Lorentz violation by gravitational forces, 71.

2.5 Three spatial dimensions . . . . . . . . . . . . . . 72Lorentz boosts in three dimensions, 72.—Gyroscopes and the equiv-alence principle, 74.—Boosts causing rotations, 75.—An experi-mental test: Thomas precession in hydrogen, 82.

Problems . . . . . . . . . . . . . . . . . . . . . . 84

3 Differential geometry 873.1 Tangent vectors . . . . . . . . . . . . . . . . . . 883.2 Affine notions and parallel transport . . . . . . . . . 89

The affine parameter in curved spacetime, 89.—Parallel transport,89.

3.3 Models . . . . . . . . . . . . . . . . . . . . . 913.4 Intrinsic quantities . . . . . . . . . . . . . . . . . 95

Coordinate independence, 96.

3.5 The metric . . . . . . . . . . . . . . . . . . . . 98The Euclidean metric, 100.—The Lorentz metric, 105.—Isometry,inner products, and the Erlangen program, 106.—Einstein’s carousel,108.

3.6 The metric in general relativity. . . . . . . . . . . . 114The hole argument, 114.—A Machian paradox, 115.

7

Page 8: General Relativity

3.7 Interpretation of coordinate independence. . . . . . . 116Is coordinate independence obvious?, 116.—Is coordinate indepen-dence trivial?, 117.—Coordinate independence as a choice of gauge,118.

Problems . . . . . . . . . . . . . . . . . . . . . . 119

4 Tensors 1234.1 Lorentz scalars . . . . . . . . . . . . . . . . . . 1234.2 Four-vectors . . . . . . . . . . . . . . . . . . . 124

The velocity and acceleration four-vectors, 124.—The momentumfour-vector, 126.—The frequency vector and the relativistic Dopplershift, 133.—A non-example: electric and magnetic fields, 136.—The electromagnetic potential four-vector, 137.

4.3 The tensor transformation laws . . . . . . . . . . . 1384.4 Experimental tests . . . . . . . . . . . . . . . . 142

Universality of tensor behavior, 142.—Speed of light differing fromc, 142.—Degenerate matter, 143.

4.5 Conservation laws. . . . . . . . . . . . . . . . . 148No general conservation laws, 148.—Conservation of angular mo-mentum and frame dragging, 149.

4.6 Things that aren’t quite tensors . . . . . . . . . . . 151Area, volume, and tensor densities, 151.—The Levi-Civita symbol,153.—Spacetime volume, 155.—Angular momentum, 155.

Problems . . . . . . . . . . . . . . . . . . . . . . 156

5 Curvature 1595.1 Tidal curvature versus curvature caused by local sources 1605.2 The stress-energy tensor . . . . . . . . . . . . . . 1615.3 Curvature in two spacelike dimensions . . . . . . . . 1625.4 Curvature tensors . . . . . . . . . . . . . . . . . 1685.5 Some order-of-magnitude estimates . . . . . . . . . 170

The geodetic effect, 170.—Deflection of light rays, 171.

5.6 The covariant derivative . . . . . . . . . . . . . . 172The covariant derivative in electromagnetism, 173.—The covariantderivative in general relativity, 174.

5.7 The geodesic equation . . . . . . . . . . . . . . . 178Characterization of the geodesic, 178.—Covariant derivative withrespect to a parameter, 179.—The geodesic equation, 179.—Uniqueness,180.

5.8 Torsion . . . . . . . . . . . . . . . . . . . . . 181Are scalars path-dependent?, 181.—The torsion tensor, 183.—Experimentalsearches for torsion, 184.

5.9 From metric to curvature . . . . . . . . . . . . . . 187Finding the Christoffel symbol from the metric, 187.—Numericalsolution of the geodesic equation, 188.—The Riemann tensor interms of the Christoffel symbols, 190.—Some general ideas aboutgauge, 190.

5.10 Manifolds . . . . . . . . . . . . . . . . . . . . 193Why we need manifolds, 193.—Topological definition of a manifold,194.—Local-coordinate definition of a manifold, 196.—Differentiable

8

Page 9: General Relativity

manifolds, 198.—The tangent space, 198.

Problems . . . . . . . . . . . . . . . . . . . . . . 200

6 Vacuum solutions 2036.1 Event horizons . . . . . . . . . . . . . . . . . . 203

The event horizon of an accelerated observer, 203.—Informationparadox, 205.—Radiation from event horizons, 206.

6.2 The Schwarzschild metric . . . . . . . . . . . . . 207The zero-mass case, 208.—Geometrized units, 210.—A large-r limit,210.—The complete solution, 211.—Geodetic effect, 214.—Orbits,217.—Deflection of light, 222.

6.3 Black holes. . . . . . . . . . . . . . . . . . . . 225Singularities, 225.—Event horizon, 226.—Expected formation, 227.—Observational evidence, 227.—Singularities and cosmic censorship,229.—Hawking radiation, 232.—Black holes in d dimensions, 234.

6.4 Degenerate solutions . . . . . . . . . . . . . . . 236Problems . . . . . . . . . . . . . . . . . . . . . . 240

7 Symmetries 2437.1 Killing vectors. . . . . . . . . . . . . . . . . . . 243

Inappropriate mixing of notational systems, 247.—Conservationlaws, 248.

7.2 Spherical symmetry . . . . . . . . . . . . . . . . 2497.3 Penrose diagrams . . . . . . . . . . . . . . . . . 250

Flat spacetime, 251.—Schwarzschild spacetime, 251.—Astrophysicalblack hole, 252.

7.4 Static and stationary spacetimes. . . . . . . . . . . 253Stationary spacetimes, 253.—Isolated systems, 254.—A station-ary field with no other symmetries, 254.—A stationary field withadditional symmetries, 256.—Static spacetimes, 256.—Birkhoff’stheorem, 256.—The gravitational potential, 258.

7.5 The uniform gravitational field revisited . . . . . . . . 260Closed timelike curves, 263.

Problems . . . . . . . . . . . . . . . . . . . . . . 265

8 Sources 2678.1 Sources in general relativity . . . . . . . . . . . . . 267

Point sources in a background-independent theory, 267.—The Ein-stein field equation, 268.—Energy conditions, 281.—The cosmolog-ical constant, 291.

8.2 Cosmological solutions. . . . . . . . . . . . . . . 295Evidence for the finite age of the universe, 295.—Evidence forexpansion of the universe, 296.—Evidence for homogeneity andisotropy, 297.—The FRW cosmologies, 298.—A singularity at theBig Bang, 303.—Observability of expansion, 306.—The vacuum-dominated solution, 314.—The matter-dominated solution, 319.—The radiation-dominated solution, 322.—Local effects of expan-sion, 323.—Observation, 326.

8.3 Mach’s principle revisited . . . . . . . . . . . . . . 330The Brans-Dicke theory, 330.—Predictions of the Brans-Dicke the-

9

Page 10: General Relativity

ory, 333.—Hints of empirical support, 334.—Mach’s principle isfalse., 334.

8.4 Historical note: the steady-state model . . . . . . . . 335Problems . . . . . . . . . . . . . . . . . . . . . . 340

9 Gravitational waves 3439.1 The speed of gravity . . . . . . . . . . . . . . . . 3439.2 Gravitational radiation . . . . . . . . . . . . . . . 344

Empirical evidence, 344.—Energy content, 346.—Expected prop-erties, 348.—Some exact solutions, 350.—Rate of radiation, 352.

Problems . . . . . . . . . . . . . . . . . . . . . . 356

Appendix 1: Excerpts from three papers by Einstein . . . . . . . . . 357“On the electrodynamics of moving bodies” . . . . . . . . . . . . . . . . . . .357“Does the inertia of a body depend upon its energy content?” 370“The foundation of the general theory of relativity” . . . . . . . . . . .372Appendix 2: Hints and solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377

10

Page 11: General Relativity

Chapter 1

A geometrical theory ofspacetime

“I always get a slight brain-shiver, now [that] space and time appearconglomerated together in a gray, miserable chaos.” – Sommerfeld

This is a book about general relativity, at a level that is meantto be accessible to advanced undergraduates.

This is mainly a book about general relativity, not special rel-ativity. I’ve heard the sentiment expressed that books on specialrelativity generally do a lousy job on special relativity, compared tobooks on general relativity. This is undoubtedly true, for someonewho already has already learned special relativity — but wants tounlearn the parts that are completely wrong in the broader contextof general relativity. For someone who has not already learned spe-cial relativity, I strongly recommend mastering it first, from a booksuch as Taylor and Wheeler’s Spacetime Physics.

In the back of this book I’ve included excerpts from three papersby Einstein — two on special relativity and one on general relativity.They can be read before, after, or along with this book. There arefootnotes in the papers and in the main text linking their contentwith each other.

I should reveal at the outset that I am not a professional rela-tivist. My field of research was nonrelativistic nuclear physics untilI became a community college physics instructor. I can only hopethat my pedagogical experience will compensate to some extent formy shallow background, and that readers who find mistakes will bekind enough to let me know about them using the contact informa-tion provided at http://www.lightandmatter.com/area4author.html.

11

Page 12: General Relativity

1.1 Time and causalityUpdating Plato’s allegory of the cave, imagine two super-intelligenttwins, Alice and Betty. They’re raised entirely by a robotic tutoron a sealed space station, with no access to the outside world. Therobot, in accord with the latest fad in education, is programmed toencourage them to build up a picture of all the laws of physics basedon their own experiments, without a textbook to tell them the rightanswers. Putting yourself in the twins’ shoes, imagine giving upall your preconceived ideas about space and time, which may turnout according to relativity to be completely wrong, or perhaps onlyapproximations that are valid under certain circumstances.

Causality is one thing the twins will notice. Certain events re-sult in other events, forming a network of cause and effect. Onegeneral rule they infer from their observations is that there is anunambiguously defined notion of betweenness: if Alice observes thatevent 1 causes event 2, and then 2 causes 3, Betty always agrees that2 lies between 1 and 3 in the chain of causality. They find that thisagreement holds regardless of whether one twin is standing on herhead (i.e., it’s invariant under rotation), and regardless of whetherone twin is sitting on the couch while the other is zooming aroundthe living room in circles on her nuclear fusion scooter (i.e., it’s alsoinvariant with respect to different states of motion).

You may have heard that relativity is a theory that can be inter-preted using non-Euclidean geometry. The invariance of between-ness is a basic geometrical property that is shared by both Euclideanand non-Euclidean geometry. We say that they are both orderedgeometries. With this geometrical interpretation in mind, it willbe useful to think of events not as actual notable occurrences butmerely as an ambient sprinkling of points at which things could hap-pen. For example, if Alice and Betty are eating dinner, Alice couldchoose to throw her mashed potatoes at Betty. Even if she refrains,there was the potential for a causal linkage between her dinner andBetty’s forehead.

Betweenness is very weak. Alice and Betty may also make anumber of conjectures that would say much more about causality.For example: (i) that the universe’s entire network of causality isconnected, rather than being broken up into separate parts; (ii) thatthe events are globally ordered, so that for any two events 1 and 2,either 1 could cause 2 or 2 could cause 1, but not both; (iii) not onlyare the events ordered, but the ordering can be modeled by sortingthe events out along a line, the time axis, and assigning a number t,time, to each event. To see what these conjectures would entail, let’sdiscuss a few examples that may draw on knowledge from outsideAlice and Betty’s experiences.

Example: According to the Big Bang theory, it seems likely thatthe network is connected, since all events would presumably connect

12 Chapter 1 A Geometrical Theory of Spacetime

Page 13: General Relativity

back to the Big Bang. On the other hand, if (i) were false we mighthave no way of finding out, because the lack of causal connectionswould make it impossible for us to detect the existence of the otheruniverses represented by the other parts disconnected from our ownuniverse.

Example: If we had a time machine,1 we could violate (ii), butthis brings up paradoxes, like the possibility of killing one’s owngrandmother when she was a baby, and in any case nobody knowshow to build a time machine.

Example: There are nevertheless strong reasons for believingthat (ii) is false. For example, if we drop Alice into one black hole,and Betty into another, they will never be able to communicateagain, and therefore there is no way to have any cause and effectrelationship between Alice’s events and Betty’s.2

Since (iii) implies (ii), we suspect that (iii) is false as well. ButAlice and Betty build clocks, and these clocks are remarkably suc-cessful at describing cause-and-effect relationships within the con-fines of the quarters in which they’ve lived their lives: events withhigher clock readings never cause events with lower clock readings.They announce to their robot tutor that they’ve discovered a uni-versal thing called time, which explains all causal relationships, andwhich their experiments show flows at the same rate everywherewithin their quarters.

“Ah,” the tutor sighs, his metallic voice trailing off.

“I know that ‘ah’, Tutorbot,” Betty says. “Come on, can’t youjust tell us what we did wrong?”

“You know that my pedagogical programming doesn’t allow that.”

“Oh, sometimes I just want to strangle whoever came up withthose stupid educational theories,” Alice says.

The twins go on strike, protesting that the time theory worksperfectly in every experiment they’ve been able to imagine. Tutor-bot gets on the commlink with his masters and has a long, inaudibleargument, which, judging from the hand gestures, the twins imagineto be quite heated. He announces that he’s gotten approval for afield trip for one of the twins, on the condition that she remain in asealed environment the whole time so as to maintain the conditionsof the educational experiment.

1The possibility of having time come back again to the same point is oftenreferred to by physicists as a closed timelike curve (CTC). Kip Thorne, in hispopularization Black Holes and Time Warps, recalls experiencing some anxietyafter publishing a paper with “Time Machines” in the title, and later beingembarrassed when a later paper on the topic was picked up by the NationalEnquirer with the headline PHYSICISTS PROVE TIME MACHINES EXIST.“CTC” is safer because nobody but physicists know what it means.

2This point is revisited in section 6.1.

Section 1.1 Time and causality 13

Page 14: General Relativity

“Who gets to go?” Alice asks.

“Betty,” Tutorbot replies, “because of the mashed potatoes.”

“But I refrained!” Alice says, stamping her foot.

“Only one time out of the last six that I served them.”

The next day, Betty, smiling smugly, climbs aboard the sealedspaceship carrying a duffel bag filled with a large collection of clocksfor the trip. Each clock has a duplicate left behind with Alice. Theclock design that they’re proudest of consists of a tube with twomirrors at the ends. A flash of light bounces back and forth betweenthe ends, with each round trip counting as one “tick,” one unit oftime. The twins are convinced that this one will run at a constantrate no matter what, since it has no moving parts that could beaffected by the vibrations and accelerations of the journey.

Betty’s field trip is dull. She doesn’t get to see any of the outsideworld. In fact, the only way she can tell she’s not still at home is thatshe sometimes feels strong sensations of acceleration. (She’s grownup in zero gravity, so the pressing sensation is novel to her.) She’sout of communication with Alice, and all she has to do during thelong voyage is to tend to her clocks. As a crude check, she verifiesthat the light clock seems to be running at its normal rate, judgedagainst her own pulse. The pendulum clock gets out of synch withthe light clock during the accelerations, but that doesn’t surpriseher, because it’s a mechanical clock with moving parts. All of thenonmechanical clocks seem to agree quite well. She gets hungry forbreakfast, lunch, and dinner at the usual times.

When Betty gets home, Alice asks, “Well?”

“Great trip, too bad you couldn’t come. I met some cute boys,went out dancing, . . . ”

“You did not. What about the clocks?”

“They all checked out fine. See, Tutorbot? The time theory stillholds up.”

“That was an anticlimax,” Alice says. “I’m going back to bednow.”

“Bed?” Betty exclaims. “It’s three in the afternoon.”

The twins now discover that although all of Alice’s clocks agreeamong themselves, and similarly for all of Betty’s (except for theones that were obviously disrupted by mechanical stresses), Alice’sand Betty’s clocks disagree with one another. A week has passedfor Alice, but only a couple of days for Betty.

1.2 Experimental tests of the nature of time

14 Chapter 1 A Geometrical Theory of Spacetime

Page 15: General Relativity

a / The clock took up two seats,and two tickets were bought for itunder the name of “Mr. Clock.”

1.2.1 The Hafele-Keating experiment

In 1971, J.C. Hafele and R.E. Keating3 of the U.S. Naval Obser-vatory brought atomic clocks aboard commercial airliners and wentaround the world, once from east to west and once from west to east.(The clocks had their own tickets, and occupied their own seats.) Asin the parable of Alice and Betty, Hafele and Keating observed thatthere was a discrepancy between the times measured by the trav-eling clocks and the times measured by similar clocks that stayedat the lab in Washington. The result was that the east-going clocklost an amount of time ∆tE = −59 ± 10 ns, while the west-goingone gained ∆tW = +273 ± 7 ns. This establishes that time is notuniversal and absolute.

Nevertheless, causality was preserved. The nanosecond-scale ef-fects observed were small compared to the three-day lengths of theplane trips. There was no opportunity for paradoxical situationssuch as, for example, a scenario in which the east-going experimenterarrived back in Washington before he left and then proceeded toconvince himself not to take the trip.

Hafele and Keating were testing specific quantitative predictionsof relativity, and they verified them to within their experiment’serror bars. At this point in the book, we aren’t in possession ofenough relativity to be able to make such calculations, but, likeAlice and Betty, we can inspect the empirical results for clues as tohow time works.

The opposite signs of the two results suggests that the rate atwhich time flows depends on the motion of the observer. The east-going clock was moving in the same direction as the earth’s rotation,so its velocity relative to the earth’s center was greater than that ofthe ones that remained in Washington, while the west-going clock’svelocity was correspondingly reduced.4 The signs of the ∆t’s showthat moving clocks were slower.

On the other hand, the asymmetry of the results, with |∆tE | 6=|∆tW |, implies that there was a second effect involved, simply dueto the planes’ being up in the air. Relativity predicts that time’srate of flow also changes with height in a gravitational field. Thedeeper reasons for such an effect are given in section ?? on page ??.

Although Hafele and Keating’s measurements were on the raggededge of the state of the art in 1971, technology has now progressedto the point where such effects have everyday consequences. Thesatellites of the Global Positioning System (GPS) orbit at a speed

3Hafele and Keating, Science, 177 (1972), 1684These differences in velocity are not simply something that can be eliminated

by choosing a different frame of reference, because the clocks’ motion isn’t ina straight line. The clocks back in Washington, for example, have a certainacceleration toward the earth’s axis, which is different from the accelerationsexperienced by the traveling clocks.

Section 1.2 Experimental tests of the nature of time 15

Page 16: General Relativity

of 1.9 × 103 m/s, an order of magnitude faster than a commercialjet. Their altitude of 20,000 km is also much greater than that ofan aircraft. For both these reasons, the relativistic effect on time isstronger than in the Hafele-Keating experiment. The atomic clocksaboard the satellites are tuned to a frequency of 10.22999999543MHz, which is perceived on the ground as 10.23 MHz. (This fre-quency shift will be calculated in example 12 on page 59.)

1.2.2 Muons

Although the Hafele-Keating experiment is impressively direct,it was not the first verification of relativistic effects on time, it didnot completely separate the kinematic and gravitational effects, andthe effect was small. An early experiment demonstrating a large andpurely kinematic effect was performed in 1941 by Rossi and Hall,who detected cosmic-ray muons at the summit and base of MountWashington in New Hampshire. The muon has a mean lifetime of2.2 µs, and the time of flight between the top and bottom of themountain (about 2 km for muons arriving along a vertical path)at nearly the speed of light was about 7 µs, so in the absence ofrelativistic effects, the flux at the bottom of the mountain shouldhave been smaller than the flux at the top by about an order ofmagnitude. The observed ratio was much smaller, indicating thatthe “clock” constituted by nuclear decay processes was dramaticallyslowed down by the motion of the muons.

1.2.3 Gravitational red-shifts

The first experiment that isolated the gravitational effect on timewas a 1925 measurement by W.S. Adams of the spectrum of lightemitted from the surface of the white dwarf star Sirius B. The grav-itational field at the surface of Sirius B is 4 × 105g, and the gravi-tational potential is about 3,000 times greater than at the Earth’ssurface. The emission lines of hydrogen were red-shifted, i.e., re-duced in frequency, and this effect was interpreted as a slowing oftime at the surface of Sirius relative to the surface of the Earth. His-torically, the mass and radius of Sirius were not known with betterthan order of magnitude precision in 1925, so this observation didnot constitute a good quantitative test.

The first such experiment to be carried out under controlledconditions, by Pound and Rebka in 1959, is analyzed quantitativelyin example 7 on page 129.

The first high-precision experiment of this kind was GravityProbe A, a 1976 experiment5 in which a space probe was launchedvertically from Wallops Island, Virginia, at less than escape veloc-ity, to an altitude of 10,000 km, after which it fell back to earth andcrashed down in the Atlantic Ocean. The probe carried a hydro-gen maser clock which was used to control the frequency of a radio

5Vessot at al., Physical Review Letters 45 (1980) 2081

16 Chapter 1 A Geometrical Theory of Spacetime

Page 17: General Relativity

b / Gravity Probe A.

signal. The radio signal was received on the ground, the nonrela-tivistic Doppler shift was subtracted out, and the residual blueshiftwas interpreted as the gravitational effect effect on time, matchingthe relativistic prediction to an accuracy of 0.01%.

1.3 Non-simultaneity and the maximum speedof cause and effect

We’ve seen that time flows at different rates for different observers.Suppose that Alice and Betty repeat their Hafele-Keating-style ex-periment, but this time they are allowed to communicate duringthe trip. Once Betty’s ship completes its initial acceleration awayfrom Betty, she cruises at constant speed, and each girl has her ownequally valid inertial frame of reference. Each twin considers herselfto be at rest, and says that the other is the one who is moving.Each one says that the other’s clock is the one that is slow. If theycould pull out their phones and communicate instantaneously, withno time lag for the propagation of the signals, they could resolvethe controversy. Alice could ask Betty, “What time does your clockread right now?” and get an immediate answer back.

By the symmetry of their frames of reference, however, it seemsthat Alice and Betty should not be able to resolve the controversyduring Betty’s trip. If they could, then they could release two radarbeacons that would permanently establish two inertial frames ofreference, A and B, such that time flowed, say, more slowly in Bthan in A. This would violate the principle that motion is relative,and that all inertial frames of reference are equally valid. The best

Section 1.3 Non-simultaneity and the maximum speed of cause and effect 17

Page 18: General Relativity

that they can do is to compare clocks once Betty returns, and verifythat the net result of the trip was to make Betty’s clock run moreslowly on the average.

Alice and Betty can never satisfy their curiosity about exactlywhen during Betty’s voyage the discrepancies accumulated or atwhat rate. This is information that they can never obtain, butthey could obtain it if they had a system for communicating in-stantaneously. We conclude that instantaneous communication isimpossible. There must be some maximum speed at which signalscan propagate — or, more generally, a maximum speed at whichcause and effect can propagate — and this speed must for examplebe greater than or equal to the speed at which radio waves propa-gate. It is also evident from these considerations that simultaneityitself cannot be a meaningful concept in relativity.

1.4 Ordered geometryLet’s try to put what we’ve learned into a general geometrical con-text.

Euclid’s familiar geometry of two-dimensional space has the fol-lowing axioms,6 which are expressed in terms of operations that canbe carried out with a compass and unmarked straightedge:

E1 Two points determine a line.

E2 Line segments can be extended.

E3 A unique circle can be constructed given any point as its centerand any line segment as its radius.

E4 All right angles are equal to one another.

E5 Parallel postulate: Given a line and a point not on the line,no more than one line can be drawn through the point andparallel to the given line.7

The modern style in mathematics is to consider this type ofaxiomatic system as a self-contained sandbox, with the axioms, andany theorems proved from them, being true or false only in relationto one another. Euclid and his contemporaries, however, believedthem to be empirical facts about physical reality. For example, theyconsidered the fifth postulate to be less obvious than the first four,because in order to verify physically that two lines were parallel,one would theoretically have to extend them to an infinite distanceand make sure that they never crossed. In the first 28 theorems of

6These axioms are summarized for quick reference in the back of the book onpage 400.

7This is a form known as Playfair’s axiom, rather than the version of thepostulate originally given by Euclid.

18 Chapter 1 A Geometrical Theory of Spacetime

Page 19: General Relativity

the Elements, Euclid restricts himself entirely to propositions thatcan be proved based on the more secure first four postulates. Themore general geometry defined by omitting the parallel postulate isknown as absolute geometry.

What kind of geometry is likely to be applicable to general rel-ativity? We can see immediately that Euclidean geometry, or evenabsolute geometry, would be far too specialized. We have in mindthe description of events that are points in both space and time.Confining ourselves for ease of visualization to one dimension worthof space, we can certainly construct a plane described by coordi-nates (t,x), but imposing Euclid’s postulates on this plane resultsin physical nonsense. Space and time are physically distinguishablefrom one another. But postulates 3 and 4 describe a geometry inwhich distances measured along non-parallel axes are comparable,and figures may be freely rotated without affecting the truth orfalsehood of statements about them; this is only appropriate for aphysical description of different spacelike directions, as in an (x, y)plane whose two axes are indistinguishable.

We need to throw most of the specialized apparatus of Euclideangeometry overboard. Once we’ve stripped our geometry to a bareminimum, then we can go back and build up a different set of equip-ment that will be better suited to relativity.

The stripped-down geometry we want is called ordered geometry,and was developed by Moritz Pasch around 1882. As suggested bythe parable of Alice and Betty, ordered geometry does not haveany global, all-encompassing system of measurement. When Bettygoes on her trip, she traces out a particular path through the spaceof events, and Alice, staying at home, traces another. Althoughevents play out in cause-and-effect order along each of these paths,we do not expect to be able to measure times along paths A and Band have them come out the same. This is how ordered geometryworks: points can be put in a definite order along any particularline, but not along different lines. Of the four primitive conceptsused in Euclid’s E1-E5 — point, line, circle, and angle — only thenon-metrical notions of point (i.e., event) and line are relevant inordered geometry. In a geometry without measurement, there is noconcept of measuring distance (hence no compasses or circles), or ofmeasuring angles. The notation [ABC] indicates that event B lieson a line segment joining A and C, and is strictly between them.

The axioms of ordered geometry are as follows:8

8The axioms are summarized for convenient reference in the back of the bookon page 400. This is meant to be an informal, readable summary of the system,pitched to the same level of looseness as Euclid’s E1-E5. Modern mathematicianshave found that systems like these actually need quite a bit more technicalmachinery to be perfectly rigorous, so if you look up an axiomatization of orderedgeometry, or a modern axiomatization of Euclidean geometry, you’ll typicallyfind a much more lengthy list of axioms than the ones presented here. The

Section 1.4 Ordered geometry 19

Page 20: General Relativity

a / Axioms O2 (left) and O3(right).

b / Stephen Hawking (1942-).

O1 Two events determine a line.

O2 Line segments can be extended: given A and B, there is atleast one event such that [ABC] is true.

O3 Lines don’t wrap around: if [ABC] is true, then [BCA] is false.

O4 Betweenness: For any three distinct events A, B, and C lyingon the same line, we can determine whether or not B is betweenA and C (and by statement 3, this ordering is unique exceptfor a possible over-all reversal to form [CBA]).

O1-O2 express the same ideas as Euclid’s E1-E2. Not all linesin the system will correspond physically to chains of causality; wecould have a line segment that describes a snapshot of a steel chain,and O3-O4 then say that the order of the links is well defined. ButO3 and O4 also have clear physical significance for lines describingcausality. O3 forbids time travel paradoxes, like going back in timeand killing our own grandmother as a child; figure a illustrates why aviolation of O3 is referred to as a closed timelike curve. O4 says thatevents are guaranteed to have a well-defined cause-and-effect orderonly if they lie on the same line. This is completely different fromthe attitude expressed in Newton’s famous statement: “Absolute,true and mathematical time, of itself, and from its own nature flowsequably without regard to anything external . . . ”

If you’re dismayed by the austerity of a system of geometry with-out any notion of measurement, you may be more appalled to learnthat even a system as weak as ordered geometry makes some state-ments that are too strong to be completely correct as a foundationfor relativity. For example, if an observer falls into a black hole, atsome point he will reach a central point of infinite density, called asingularity. At this point, his chain of cause and effect terminates,violating O2. It is also an open question whether O3’s prohibitionon time-loops actually holds in general relativity; this is StephenHawking’s playfully named chronology protection conjecture. We’llalso see that in general relativity O1 is almost always true, but thereare exceptions.

1.5 The equivalence principle1.5.1 Proportionality of inertial and gravitational mass

What physical interpretation should we give to the “lines” de-scribed in ordered geometry? Galileo described an experiment (which

axioms I’m omitting take care of details like making sure that there are morethan two points in the universe, and that curves can’t cut through one anotherwithout intersecting. The classic, beautifully written book on these topics isH.S.M. Coxeter’s Introduction to Geometry, which is “introductory” in the sensethat it’s the kind of book a college math major might use in a first upper-divisioncourse in geometry.

20 Chapter 1 A Geometrical Theory of Spacetime

Page 21: General Relativity

a / The cannonball and the mus-ketball have identical parabolicworld-lines. On this type ofspace-time plot, space is conven-tionally shown on the horizontalaxis, so the tower has to bedepicted on its side.

b / A piece of string held tauton a globe forms a geodesicfrom Mexico City to London.Although it appears curved, itis the analog of a straight linein the non-Euclidean geometryconfined to the surface of theEarth. Similarly, the world-lines offigure a appear curved, but theyare the analogs of straight linesin the non-Euclidean geometryused to describe gravitationalfields in general relativity.

he may or may not have actually performed) in which he simultane-ously dropped a cannonball and a musket ball from a tall tower. Thetwo objects hit the ground simultaneously, disproving Aristotle’s as-sertion that objects fell at a speed proportional to their weights. Ona graph of spacetime with x and t axes, the curves traced by the twoobjects, called their world-lines, are identical parabolas. (The pathsof the balls through x− y − z space are straight, not curved.) Oneway of explaining this observation is that what we call “mass” is re-ally two separate things, which happen to be equal. Inertial mass,which appears in Newton’s a = F/m, describes how difficult it isto accelerate an object. Gravitational mass describes the strengthwith which gravity acts. The cannonball has a hundred times moregravitational mass than the musket ball, so the force of gravity act-ing on it is a hundred times greater. But its inertial mass is alsoprecisely a hundred times greater, so the two effects cancel out, andit falls with the same acceleration. This is a special property of thegravitational force. Electrical forces, for example, do not behavethis way. The force that an object experiences in an electric fieldis proportional to its charge, which is unrelated to its inertial mass,so different charges placed in the same electric field will in generalhave different motions.

1.5.2 Geometrical treatment of gravity

Einstein realized that this special property of the gravitationalforce made it possible to describe gravity in purely geometricalterms. We define the world-lines of small9 objects acted on by grav-ity to be the lines described by the axioms of the geometry. Sincewe normally think of the “lines” described by Euclidean geometryand its kin as straight lines, this amounts to a redefinition of whatit means for a line to be straight. By analogy, imagine stretching apiece of string taut across a globe, as we might do in order to planan airplane flight or aim a directional radio antenna. The stringmay not appear straight as viewed from the three-dimensional Eu-clidean space in which the globe is embedded, but it is as straight aspossible in the sense that it is the path followed by a radio wave,10

or by an airplane pilot who keeps her wings level and her rudderstraight. The world-“line” of an object acted on by nongravita-tional forces is not considered to be a straight “line” in the sense ofO1-O4. When necessary, one eliminates this ambiguity in the over-loaded term “line” by referring to the lines of O1-O4 as geodesics.The world-line of a low-mass object acted on only by gravity is one

9The reason for the restriction to small objects is essentially gravitationalradiation. The object should also be electrically neutral, and neither the ob-ject nor the surrounding spacetime should contain any exotic forms of negativeenergy. This is discussed in more detail on p. 286. See also problem 1 on p. 356.

10Radio waves in the HF band tend to be trapped between the ground andthe ionosphere, causing them to curve over the horizon, allowing long-distancecommunication.

Section 1.5 The equivalence principle 21

Page 22: General Relativity

c / Lorand Eotvos (1848-1919).

d / If the geodesics definedby an airplane and a radio wavediffer from one another, thenit is not possible to treat bothproblems exactly using the samegeometrical theory. In generalrelativity, this would be analogousto a violation of the equivalenceprinciple. General relativity’svalidity as a purely geometricaltheory of gravity requires that theequivalence principle be exactlysatisfied in all cases.

type of geodesic.11

We can now see the deep physical importance of statement O1,that two events determine a line. To predict the trajectory of agolf ball, we need to have some initial data. For example, we couldmeasure event A when the ball breaks contact with the club, andevent B an infinitesimal time after A.12 This pair of observations canbe thought of as fixing the ball’s initial position and velocity, whichshould be enough to predict a unique world-line for the ball, sincerelativity is a deterministic theory. With this interpretation, we canalso see why it is not necessarily a disaster for the theory if O1fails sometimes. For example, event A could mark the launching oftwo satellites into circular orbits from the same place on the Earth,heading in opposite directions, and B could be their subsequentcollision on the opposite side of the planet. Although this violatesO1, it doesn’t violate determinism. Determinism only requires thevalidity of O1 for events infinitesimally close together. Even forrandomly chosen events far apart, the probability that they willviolate O1 is zero.

1.5.3 Eotvos experiments

Einstein’s entire system breaks down if there is any violation, nomatter how small, of the proportionality between inertial and grav-itational mass, and it therefore becomes very interesting to searchexperimentally for such a violation. For example, we might won-der whether neutrons and protons had slightly different ratios ofgravitational and inertial mass, which in a Galileo-style experimentwould cause a small difference between the acceleration of a leadweight, with a large neutron-to-proton ratio, and a wooden one,which consists of light elements with nearly equal numbers of neu-trons and protons. The first high-precision experiments of this typewere performed by Eotvos around the turn of the twentieth century,and they verified the equivalence of inertial and gravitational massto within about one part in 108. These are generically referred toas Eotvos experiments.

Figure e shows a strategy for doing Eotvos experiments that al-lowed a test to about one part in 1012. The top panel is a simplifiedversion. The platform is balanced, so the gravitational masses ofthe two objects are observed to be equal. The objects are madeof different substances. If the equivalence of inertial and gravita-tional mass fails to hold for these two substances, then the forceof gravity on each mass will not be exact proportion to its inertia,and the platform will experience a slight torque as the earth spins.The bottom panel shows a more realistic drawing of an experimentby Braginskii and Panov.13 The whole thing was encased in a tall

11For more justification of this statement, see ch. 9, problem 1, on page 356.12Regarding infinitesimals, see p. 93.13V.B. Braginskii and V.I. Panov, Soviet Physics JETP 34, 463 (1972).

22 Chapter 1 A Geometrical Theory of Spacetime

Page 23: General Relativity

e / An Eotvos experiment.Top: simplified version. Bottom:realistic version by Braginskii andPanov. (Drawing after Braginskiiand Panov.)

vacuum tube, which was placed in a sealed basement whose tem-perature was controlled to within 0.02 C. The total mass of theplatinum and aluminum test masses, plus the tungsten wire andthe balance arms, was only 4.4 g. To detect tiny motions, a laserbeam was bounced off of a mirror attached to the wire. There wasso little friction that the balance would have taken on the order ofseveral years to calm down completely after being put in place; tostop these vibrations, static electrical forces were applied throughthe two circular plates to provide very gentle twists on the ellipsoidalmass between them.

Equivalence of gravitational fields and accelerations

One consequence of the Eotvos experiments’ null results is thatit is not possible to tell the difference between an acceleration anda gravitational field. At certain times during Betty’s field trip, shefeels herself pressed against her seat, and she interprets this as ev-idence that she’s in a space vessel that is undergoing violent accel-erations and decelerations. But it’s equally possible that Tutorbothas simply arranged for her capsule to be hung from a rope anddangled into the gravitational field of a planet. Suppose that thefirst explanation is correct. The capsule is initially at rest in outerspace, where there is no gravity. Betty can release a pencil and alead ball in the air inside the cabin, and they will stay in place. Thecapsule then accelerates, and to Betty, who has adopted a frameof reference tied to its deck, ceiling and walls, it appears that thepencil and the ball fall to the deck. They are guaranteed to stayside by side until they hit the deckplates, because in fact they aren’taccelerating; they simply appear to accelerate, when in reality it’sthe deckplates that are coming up and hitting them. But now con-sider the second explanation, that the capsule has been dipped intoa gravitational field. The ball and the pencil will still fall side byside to the floor, because they have the same ratio of gravitationalto inertial mass.

1.5.4 The equivalence principle

This leads to one way of stating a central principle of relativityknown as the equivalence principle: Accelerations and gravitationalfields are equivalent. There is no experiment that can distinguishone from the other.14

To see what a radical departure this is, we need to compare withthe completely different picture presented by Newtonian physics andspecial relativity. Newton’s law of inertia states that “Every objectperseveres in its state of rest, or of uniform motion in a straightline, unless it is compelled to change that state by forces impressed

14This statement of the equivalence principle is summarized, along with someother forms of it to be encountered later, in the back of the book on page 401.

Section 1.5 The equivalence principle 23

Page 24: General Relativity

thereon.”15 Newton’s intention here was to clearly state a contra-diction of Aristotelian physics, in which objects were supposed tonaturally stop moving and come to rest in the absence of a force. ForAristotle, “at rest” meant at rest relative to the Earth, which repre-sented a special frame of reference. But if motion doesn’t naturallystop of its own accord, then there is no longer any way to single outone frame of reference, such as the one tied to the Earth, as beingspecial. An equally good frame of reference is a car driving in astraight line down the interstate at constant speed. The earth andthe car both represent valid inertial frames of reference, in whichNewton’s law of inertia is valid. On the other hand, there are other,noninertial frames of reference, in which the law of inertia is vio-lated. For example, if the car decelerates suddenly, then it appearsto the people in the car as if their bodies are being jerked forward,even though there is no physical object that could be exerting anytype of forward force on them. This distinction between inertial andnoninertial frames of reference was carried over by Einstein into histheory of special relativity, published in 1905.

But by the time he published the general theory in 1915, Einsteinhad realized that this distinction between inertial and noninertialframes of reference was fundamentally suspect. How do we knowthat a particular frame of reference is inertial? One way is to verifythat its motion relative to some other inertial frame, such as theEarth’s, is in a straight line and at constant speed. But how doesthe whole thing get started? We need to bootstrap the processwith at least one frame of reference to act as our standard. Wecan look for a frame in which the law of inertia is valid, but nowwe run into another difficulty. To verify that the law of inertiaholds, we have to check that an observer tied to that frame doesn’tsee objects accelerating for no reason. The trouble here is that bythe equivalence principle, there is no way to determine whether theobject is accelerating “for no reason” or because of a gravitationalforce. Betty, for example, cannot tell by any local measurement(i.e., any measurement carried out within the capsule) whether sheis in an inertial or a noninertial frame.

We could hope to resolve the ambiguity by making non-localmeasurements instead. For example, if Betty had been allowed tolook out a porthole, she could have tried to tell whether her capsulewas accelerating relative to the stars. Even this possibility ends upnot being satisfactory. The stars in our galaxy are moving in circularorbits around the galaxy. On an even larger scale, the universe isexpanding in the aftermath of the Big Bang. It spent about thefirst half of its history decelerating due to gravitational attraction,but the expansion is now observed to be accelerating, apparentlydue to a poorly understood phenomenon referred to by the catch-allterm “dark energy.” In general, there is no distant background of

15paraphrased from a translation by Motte, 1729

24 Chapter 1 A Geometrical Theory of Spacetime

Page 25: General Relativity

g / An artificial horizon.

f / Wouldn’t it be nice if we could define the meaning of a Newtonian inertial frame of reference? New-ton makes it sound easy: to define an inertial frame, just find some object that is not accelerating because it isnot being acted on by any external forces. But what object would we use? The earth? The “fixed stars?” Ourgalaxy? Our supercluster of galaxies? All of these are accelerating — relative to something.

physical objects in the universe that is not accelerating.

Lorentz frames

The conclusion is that we need to abandon the entire distinctionbetween Newton-style inertial and noninertial frames of reference.The best that we can do is to single out certain frames of referencedefined by the motion of objects that are not subject to any non-gravitational forces. A falling rock defines such a frame of reference.In this frame, the rock is at rest, and the ground is accelerating. Therock’s world-line is a straight line of constant x = 0 and varying t.Such a free-falling frame of reference is called a Lorentz frame. Theframe of reference defined by a rock sitting on a table is an inertialframe of reference according to the Newtonian view, but it is not aLorentz frame.

In Newtonian physics, inertial frames are preferable because theymake motion simple: objects with no forces acting on them movealong straight world-lines. Similarly, Lorentz frames occupy a privi-leged position in general relativity because they make motion simple:objects move along “straight” world-lines if they have no nongravi-tational forces acting on them.

The artificial horizon Example: 1The pilot of an airplane cannot always easily tell which way is up.The horizon may not be level simply because the ground has anactual slope, and in any case the horizon may not be visible if theweather is foggy. One might imagine that the problem could besolved simply by hanging a pendulum and observing which wayit pointed, but by the equivalence principle the pendulum cannottell the difference between a gravitational field and an acceler-ation of the aircraft relative to the ground — nor can any otheraccelerometer, such as the pilot’s inner ear. For example, whenthe plane is turning to the right, accelerometers will be tricked intobelieving that “down” is down and to the left. To get around this

Section 1.5 The equivalence principle 25

Page 26: General Relativity

h / Bars of upsidasium arekept in special warehouses,bolted to the ground. CopyrightJay Ward Productions, usedunder U.S. fair use exception tocopyright law.

problem, airplanes use a device called an artificial horizon, whichis essentially a gyroscope. The gyroscope has to be initializedwhen the plane is known to be oriented in a horizontal plane. Nogyroscope is perfect, so over time it will drift. For this reason theinstrument also contains an accelerometer, and the gyroscope isautomatically restored to agreement with the accelerometer, witha time-constant of several minutes. If the plane is flown in cir-cles for several minutes, the artificial horizon will be fooled intoindicating that the wrong direction is vertical.

No antigravity Example: 2This whole chain of reasoning was predicated on the null resultsof Eotvos experiments. In the Rocky and Bullwinkle cartoons,there is a non-Eotvosian substance called upsidasium, which fallsup instead of down. Its ratio of gravitational to inertial mass isapparently negative. If such a substance could be found, it wouldfalsify the equivalence principle. Cf. example 10, p. 289.

Operational definition of a Lorentz frame

We can define a Lorentz frame in operational terms using an ide-alized variation (figure i) on a device actually built by Harold Waageat Princeton as a lecture demonstration to be used by his partnerin crime John Wheeler. Build a sealed chamber whose contents areisolated from all nongravitational forces. Of the four known forcesof nature, the three we need to exclude are the strong nuclear force,the weak nuclear force, and the electromagnetic force. The strongnuclear force has a range of only about 1 fm (10−15 m), so to ex-clude it we merely need to make the chamber thicker than that, andalso surround it with enough paraffin wax to keep out any neutronsthat happen to be flying by. The weak nuclear force also has a shortrange, and although shielding against neutrinos is a practical impos-sibility, their influence on the apparatus inside will be negligible. Toshield against electromagnetic forces, we surround the chamber witha Faraday cage and a solid sheet of mu-metal. Finally, we make surethat the chamber is not being touched by any surrounding matter,so that short-range residual electrical forces (sticky forces, chem-ical bonds, etc.) are excluded. That is, the chamber cannot besupported; it is free-falling.

Crucially, the shielding does not exclude gravitational forces.There is in fact no known way of shielding against gravitational ef-fects such as the attraction of other masses (example 10, p. 289) orthe propagation of gravitational waves (ch. 9). Because the shield-ing is spherical, it exerts no gravitational force of its own on theapparatus inside.

Inside, an observer carries out an initial calibration by firingbullets along three Cartesian axes and tracing their paths, whichshe defines to be linear.

26 Chapter 1 A Geometrical Theory of Spacetime

Page 27: General Relativity

i / The spherical chamber, shownin a cutaway view, has layersof shielding to exclude all knownnongravitational forces. Oncethe chamber has been calibratedby marking the three dashed-linetrajectories under free-fall con-ditions, an observer inside thechamber can always tell whethershe is in a Lorentz frame.

We’ve gone to elaborate lengths to show that we can really de-termine, without reference to any external reference frame, thatthe chamber is not being acted on by any nongravitational forces,so that we know it is free-falling. In addition, we also want theobserver to be able to tell whether the chamber is rotating. Shecould look out through a porthole at the stars, but that would bemissing the whole point, which is to show that without reference toany other object, we can determine whether a particular frame is aLorentz frame. One way to do this would be to watch for precessionof a gyroscope. Or, without having to resort to additional appara-tus, the observer can check whether the paths traced by the bulletschange when she changes the muzzle velocity. If they do, then sheinfers that there are velocity-dependent Coriolis forces, so she mustbe rotating. She can then use flywheels to get rid of the rotation,and redo the calibration.

After the initial calibration, she can always tell whether or notshe is in a Lorentz frame. She simply has to fire the bullets, and seewhether or not they follow the precalibrated paths. For example,she can detect that the frame has become non-Lorentzian if thechamber is rotated, allowed to rest on the ground, or accelerated bya rocket engine.

It may seem that the detailed construction of this elaboratethought-experiment does nothing more than confirm something ob-vious. It is worth pointing out, then, that we don’t really knowwhether it works or not. It works in general relativity, but there areother theories of gravity, such as Brans-Dicke gravity (p. 330), thatare also consistent with all known observations, but in which the

Section 1.5 The equivalence principle 27

Page 28: General Relativity

j / Two local Lorentz frames.

k / One planet rotates aboutits axis and the other does not.As discussed in more detail onp. 115, Einstein believed thatgeneral relativity was even moreradically egalitarian about framesof reference than it really is. Hethought that if the planets werealone in an otherwise emptyuniverse, there would be no wayto tell which planet was reallyrotating and which was not, sothat B’s equatorial bulge wouldhave to disappear. There wouldbe no way to tell which planet’ssurface was a Lorentz frame.

apparatus in figure i doesn’t work. Two of the assumptions madeabove fail in this theory: gravitational shielding effects exist, andCoriolis effects become undetectable if there is not enough othermatter nearby.

Locality of Lorentz frames

It would be convenient if we could define a single Lorentz framethat would cover the entire universe, but we can’t. In figure j, twogirls simultaneously drop down from tree branches — one in Los An-geles and one in Mumbai. The girl free-falling in Los Angeles definesa Lorentz frame, and in that frame, other objects falling nearby willalso have straight world-lines. But in the LA girl’s frame of refer-ence, the girl falling in Mumbai does not have a straight world-line:she is accelerating up toward the LA girl with an acceleration ofabout 2g.

A second way of stating the equivalence principle is that it isalways possible to define a local Lorentz frame in a particular neigh-borhood of spacetime.16 It is not possible to do so on a universalbasis.

The locality of Lorentz frames can be understood in the anal-ogy of the string stretched across the globe. We don’t notice thecurvature of the Earth’s surface in everyday life because the radiusof curvature is thousands of kilometers. On a map of LA, we don’tnotice any curvature, nor do we detect it on a map of Mumbai, butit is not possible to make a flat map that includes both LA andMumbai without seeing severe distortions.

Terminology

The meanings of words evolve over time, and since relativity isnow a century old, there has been some confusing semantic driftin its nomenclature. This applies both to “inertial frame” and to“special relativity.”

Early formulations of general relativity never refer to “inertialframes,” “Lorentz frames,” or anything else of that flavor. The veryfirst topic in Einstein’s original systematic presentation of the the-ory17 is an example (figure k) involving two planets, the purposeof which is to convince the reader that all frames of reference arecreated equal, and that any attempt to make some of them intosecond-class citizens is invidious. Other treatments of general rel-ativity from the same era follow Einstein’s lead.18 The trouble is

16This statement of the equivalence principle is summarized, along with someother forms of it, in the back of the book on page 401.

17Einstein, “The Foundation of the General Theory of Relativity,” 1916. Anexcerpt is given on p. 372.

18Two that I believe were relatively influential are Born’s 1920 Einstein’s The-ory of Relativity and Eddington’s 1924 The Mathematical Theory of Relativity.Born follows Einstein’s “Foundation” paper slavishly. Eddington seems only to

28 Chapter 1 A Geometrical Theory of Spacetime

Page 29: General Relativity

that this example is more a statement of Einstein’s aspirations forhis theory than an accurate depiction of the physics that it actu-ally implies. General relativity really does allow an unambiguousdistinction to be made between Lorentz frames and non-Lorentzframes, as described on p. 26. Einstein’s statement should havebeen weaker: the laws of physics (such as the Einstein field equa-tion, p. 269) are the same in all frames (Lorentz or non-Lorentz).This is different from the situation in Newtonian mechanics and spe-cial relativity, where the laws of physics take on their simplest formonly in Newton-inertial frames.

Because Einstein didn’t want to make distinctions between frames,we ended up being saddled with inconvenient terminology for them.The least verbally awkward choice is to hijack the term “inertial,”redefining it from its Newtonian meaning. We then say that theEarth’s surface is not an inertial frame, in the context of generalrelativity, whereas in the Newtonian context it is an inertial frameto a very good approximation. This usage is fairly standard,19 butwould have made Newton confused and Einstein unhappy. If wefollow this usage, then we may sometimes have to say “Newtonian-inertial” or “Einstein-inertial.” A more awkward, but also moreprecise, term is “Lorentz frame,” as used in this book; this seems tobe widely understood.20

The distinction between special and general relativity has under-gone a similar shift over the decades. Einstein originally defined thedistinction in terms of the admissibility of accelerated frames of ref-erence. This, however, puts us in the absurd position of saying thatspecial relativity, which is supposed to be a generalization of Newto-nian mechanics, cannot handle accelerated frames of reference in thesame way that Newtonian mechanics can. In fact both Newtonianmechanics and special relativity treat Newtonian-noninertial framesof reference in the same way: by modifying the laws of physics sothat they do not take on their most simple form (e.g., violating New-ton’s third law), while retaining the ability to change coordinatesback to a preferred frame in which the simpler laws apply. It wasrealized fairly early on21 that the important distinction was betweenspecial relativity as a theory of flat spacetime, and general relativityas a theory that described gravity in terms of curved spacetime. Allrelativists writing since about 1950 seem to be in agreement on thismore modern redefinition of the terms.22

mention inertial frames in a few places where the context is Newtonian.19Misner, Thorne, and Wheeler, Gravitation, 1973, p. 1820ibid, p. 1921Eddington, op. cit.22Misner, Thorne, and Wheeler, op. cit., pp.163-164. Penrose, The Road to

Reality, 2004, p. 422. Taylor and Wheeler, Spacetime Physics, 1992, p. 132.Schutz, A First Course in General Relativity, 2009, pp. 3, 141. Hobson, GeneralRelativity: An Introduction for Physicists, 2005, sec. 1.14.

Section 1.5 The equivalence principle 29

Page 30: General Relativity

l / Chiao’s paradox: a chargedparticle and a neutral particle arein orbit around the earth. Will thecharged particle radiate, violatingthe equivalence principle?

In an accelerating frame, the equivalence principle tells us thatmeasurements will come out the same as if there were a gravitationalfield. But if the spacetime is flat, describing it in an acceleratingframe doesn’t make it curved. (Curvature is a physical property ofspacetime, and cannot be changed from zero to nonzero simply bya choice of coordinates.) Thus relativity allows us to have gravita-tional fields in flat space — but only for certain special configura-tions like this one. Special relativity is capable of operating just finein this context. For example, Chung et al.23 did a high-precisiontest of special relativity in 2009 using a matter interferometer in avertical plane, specifically in order to test whether there was anyviolation of special relativity in a uniform gravitational field. Theirexperiment is interpreted purely as a test of special relativity, notgeneral relativity.

Chiao’s paradox

The remainder of this subsection deals with the subtle ques-tion of whether and how the equivalence principle can be applied tocharged particles. You may wish to skip it on a first reading. Theshort answer is that using the equivalence principle to make con-clusions about charged particles is like the attempts by slaveholdersand abolitionists in the 19th century U.S. to support their positionsbased on the Bible: you can probably prove whichever conclusionwas the one you set out to prove.

The equivalence principle is not a single, simple, mathemati-cally well defined statement.24 As an example of an ambiguity thatis still somewhat controversial, 90 years after Einstein first proposedthe principle, consider the question of whether or not it applies tocharged particles. Raymond Chiao25 proposes the following thoughtexperiment, which I’ll refer to as Chiao’s paradox. Let a neutral par-ticle and a charged particle be set, side by side, in orbit around theearth. Assume (unrealistically) that the space around the earth hasno electric or magnetic field. If the equivalence principle appliesregardless of charge, then these two particles must go on orbitingamicably, side by side. But then we have a violation of conservationof energy, since the charged particle, which is accelerating, will radi-ate electromagnetic waves (with very low frequency and amplitude).It seems as though the particle’s orbit must decay.

The resolution of the paradox, as demonstrated by hairy cal-culations26 is interesting because it exemplifies the local nature of

23arxiv.org/abs/0905.192924A good recent discussion of this is “Theory of gravitation theories: a no-

progress report,” Sotiriou, Faraoni, and Liberati, http://arxiv.org/abs/0707.2748

25arxiv.org/abs/quant-ph/0601193v726The first detailed calculation appears to have been by Cecile and Bryce

DeWitt, “Falling Charges,” Physics 1 (1964) 3. This paper is unfortunatelyvery difficult to obtain now. A more recent treatment by Grøn and Næss is

30 Chapter 1 A Geometrical Theory of Spacetime

Page 31: General Relativity

the equivalence principle. When a charged particle moves through agravitational field, in general it is possible for the particle to experi-ence a reaction from its own electromagnetic fields. This might seemimpossible, since an observer in a frame momentarily at rest withrespect to the particle sees the radiation fly off in all directions atthe speed of light. But there are in fact several different mechanismsby which a charged particle can be reunited with its long-lost elec-tromagnetic offspring. An example (not directly related to Chiao’sscenario) is the following.

Bring a laser very close to a black hole, but not so close that ithas strayed inside the event horizon, which is the spherical point ofno return from within which nothing can escape. Example 15 onpage 65 gives a plausibility argument based on Newtonian physicsthat the radius27 of the event horizon should be something like rH =GM/c2, and section 6.3.2 on page 226 derives the relativisticallycorrect factor of 2 in front, so that rH = 2GM/c2. It turns outthat at r = (3/2)RH , a ray of light can have a circular orbit aroundthe black hole. Since this is greater than RH , we can, at least intheory, hold the laser stationary at this value of r using a powerfulrocket engine. If we point the laser in the azimuthal direction, itsown beam will come back and hit it.

Since matter can experience a back-reaction from its own elec-tromagnetic radiation, it becomes plausible how the paradox can beresolved. The equivalence principle holds locally, i.e., within a smallpatch of space and time. If Chiao’s charged and neutral particle arereleased side by side, then they will obey the equivalence principlefor at least a certain amount of time — and “for at least a certainamount of time” is all we should expect, since the principle is local.But after a while, the charged particle will start to experience aback-reaction from its own electromagnetic fields, and this causesits orbit to decay, satisfying conservation of energy. Since Chiao’sparticles are orbiting the earth, and the earth is not a black hole, themechanism clearly can’t be as simple as the one described above,but Grøn and Næss show that there are similar mechanisms thatcan apply here, e.g., scattering of light waves by the nonuniformgravitational field.

It is worth keeping in mind the DeWitts’ caution that “The ques-tions answered by this investigation are of conceptual interest only,since the forces involved are far too small to be detected experimen-tally” (see problem 8, p. 39).

accessible at arxiv.org/abs/0806.0464v1. A full exposition of the techniquesis given by Poisson, “The Motion of Point Particles in Curved Spacetime,” www.

livingreviews.org/lrr-2004-6.27Because relativity describes gravitational fields in terms of curvature of

spacetime, the Euclidean relationship between the radius and circumference ofa circle fails here. The r coordinate should be understood here not as the radiusmeasured from the center but as the circumference divided by 2π.

Section 1.5 The equivalence principle 31

Page 32: General Relativity

m / 1. A photon is emittedupward from the floor of the ele-vator. The elevator acceleratesupward. 2. By the time thephoton is detected at the ceiling,the elevator has changed itsvelocity, so the photon is detectedwith a Doppler shift.

n / An electromagnetic wavestrikes an ohmic surface. Thewave’s electric field excites anoscillating current density J. Thewave’s magnetic field then actson these currents, producinga force in the direction of thewave’s propagation. This is apre-relativistic argument thatlight must possess inertia. Thefirst experimental confirmation ofthis prediction is shown in figureo. See Nichols and Hull, ”Thepressure due to radiation,” Phys.Rev. (Series I) 17 (1903) 26.

1.5.5 Gravitational red-shifts

Starting on page 15, we saw experimental evidence that the rateof flow of time changes with height in a gravitational field. We cannow see that this is required by the equivalence principle.

By the equivalence principle, there is no way to tell the differencebetween experimental results obtained in an accelerating laboratoryand those found in a laboratory immersed in a gravitational field.28

In a laboratory accelerating upward, a photon emitted from thefloor and would be Doppler-shifted toward lower frequencies whenobserved at the ceiling, because of the change in the receiver’s ve-locity during the photon’s time of flight. The effect is given by∆E/E = ∆f/f = ay/c2, where a is the lab’s acceleration, y is theheight from floor to ceiling, and c is the speed of light.

Self-check: Verify this statement.

By the equivalence principle, we find that when such an experi-ment is done in a gravitational field g, there should be a gravitationaleffect on the energy of a photon equal to ∆E/E = gy/c2. Since thequantity gy is the gravitational potential (gravitational energy perunit mass), the photon’s fractional loss of energy is the same as the(Newtonian) loss of energy experienced by a material object of massm and initial kinetic energy mc2.

The interpretation is as follows. Classical electromagnetism re-quires that electromagnetic waves have inertia. For example, if aplane wave strikes an ohmic surface, as in figure n, the wave’s elec-tric field excites oscillating currents in the surface. These currentsthen experience a magnetic force from the wave’s magnetic field,and application of the right-hand rule shows that the resulting forceis in the direction of propagation of the wave. Thus the light waveacts as if it has momentum. The equivalence principle says thatwhatever has inertia must also participate in gravitational interac-tions. Therefore light waves must have weight, and must lose energywhen they rise through a gravitational field.

Self-check: Verify the application of the right-hand rule de-scribed above.

Further interpretation:

• The quantity mc2 is famous, even among people who don’tknow what m and c stand for. This is the first hint of whereit comes from. The full story is given in section 4.2.2.

• The relation p = E/c between the energy and momentumof a light wave follows directly from Maxwell’s equations, bythe argument above; however, we will see in section 4.2.2 that

28Problem 4 on p. 38 verifies, in one specific example, that this way of statingthe equivalence principle is implied by the one on p. ??.

32 Chapter 1 A Geometrical Theory of Spacetime

Page 33: General Relativity

o / A simplified drawing ofthe 1903 experiment by Nicholsand Hull that verified the pre-dicted momentum of light waves.Two circular mirrors were hungfrom a fine quartz fiber, insidean evacuated bell jar. A 150mW beam of light was shoneon one of the mirrors for 6 s,producing a tiny rotation, whichwas measurable by an opticallever (not shown). The force waswithin 0.6% of the theoreticallypredicted value of 0.001 µN.For comparison, a short clippingof a single human hair weighs∼ 1 µN.

according to relativity this relation must hold for any masslessparticle

• What we have found agrees with Niels Bohr’s correspondenceprinciple, which states that when a new physical theory, suchas relativity, replaces an older one, such as Newtonian physics,the new theory must agree with the old one under the experi-mental conditions in which the old theory had been verified byexperiments. The gravitational mass of a beam of light withenergy E is E/c2, and since c is a big number, it is not sur-prising that the weight of light rays had never been detectedbefore Einstein trying to detect it.

• This book describes one particular theory of gravity, Einstein’stheory of general relativity. There are other theories of grav-ity, and some of these, such as the Brans-Dicke theory, dojust as well as general relativity in agreeing with the presentlyavailable experimental data. Our prediction of gravitationalDoppler shifts of light only depended on the equivalence princi-ple, which is one ingredient of general relativity. Experimentaltests of this prediction only test the equivalence principle; theydo not allow us to distinguish between one theory of gravityand another if both theories incorporate the equivalence prin-ciple.

• If an object such as a radio transmitter or an atom in an ex-cited state emits an electromagnetic wave with a frequency f ,then the object can be considered to be a type of clock. Wecan therefore interpret the gravitational red-shift as a gravi-tational time dilation: a difference in the rate at which timeitself flows, depending on the gravitational potential. This isconsistent with the empirical results presented in section ??,p. ??.

Chiao’s paradox revisited Example: 3The equivalence principle says that electromagnetic waves havegravitational mass as well as inertial mass, so it seems clear thatthe same must hold for static fields. In Chiao’s paradox (p. 39), theorbiting charged particle has an electric field that extends out toinfinity. When we measure the mass of a charged particle such asan electron, there is no way to separate the mass of this field froma more localized contribution. The electric field “falls” through thegravitational field, and the equivalence principle, which is local,cannot guarantee that all parts of the field rotate uniformly aboutthe earth, even in distant parts of the universe. The electric fieldpattern becomes distorted, and this distortion causes a radiationreaction which back-reacts on the particle, causing its orbit to de-cay.

Section 1.5 The equivalence principle 33

Page 34: General Relativity

p / The Pound-Rebka experi-ment.

q / Emission of 14 keV gamma-rays by 57Fe. The parent nucleus57Co absorbs an electron andundergoes a weak-force decayprocess that converts it into 57Fe,in an excited state. With 85%probability, this state decays to astate just above the ground state,with an excitation energy of 14keV and a half-life of 10−7 s. Thisstate finally decays, either bygamma emission or emission ofan internal conversion electron,to the ground state.

1.5.6 The Pound-Rebka experiment

The 1959 Pound-Rebka experiment at Harvard29 was one of thefirst high-precision, relativistic tests of the equivalence principle tobe carried out under controlled conditions, and in this section wewill discuss it in detail.

When y is on the order of magnitude of the height of a building,the value of ∆E/E = gy/c2 is ∼ 10−14, so an extremely high-precision experiment is necessary in order to detect a gravitationalred-shift. A number of other effects are big enough to obscure it en-tirely, and must somehow be eliminated or compensated for. Theseare listed below, along with their orders of magnitude in the exper-imental design finally settled on by Pound and Rebka.

(1) Classical Doppler broadening due to temper-ature. Thermal motion causes Doppler shifts ofemitted photons, corresponding to the randomcomponent of the emitting atom’s velocity vec-tor along the direction of emission.

∼ 10−6

(2) The recoil Doppler shift. When an atomemits a photon with energy E and momentump = E/c, conservation of momentum requiresthat the atom recoil with momentum p = −E/cand energy p2/2m. This causes a downwardDoppler shift of the energy of the emitted pho-ton. A similar effect occurs on absorption, dou-bling the problem.

∼ 10−12

(3) Natural line width. The Heisenberg uncer-tainty principle says that a state with a half-lifeτ must have an uncertainty in its energy of atleast ∼ h/τ , where h is Planck’s constant.

∼ 10−12

(4) Special-relativistic Doppler shift due to tem-perature. Section 1.2 presented experimental ev-idence that time flows at a different rate depend-ing on the motion of the observer. Thereforethe thermal motion of an atom emitting a pho-ton has an effect on the frequency of the photon,even if the atom’s motion is not along the line ofemission. The equations needed in order to cal-culate this effect will not be derived until section2.2; a quantitative estimate is given in example14 on page 62. For now, we only need to knowthat this leads to a temperature-dependence inthe average frequency of emission, in additionto the broadening of the bell curve described byeffect (1) above.

∼ 10−14 perdegree C

The most straightforward way to mitigate effect (1) is to use

29Phys. Rev. Lett. 4 (1960) 337

34 Chapter 1 A Geometrical Theory of Spacetime

Page 35: General Relativity

r / Top: A graph of velocityversus time for the source. Thevelocity has both a constantcomponent and an oscillating onewith a frequency of 10-50 Hz.The constant component vo wasused as a way of determining thecalibration of frequency shift as afunction of count rates. Data wereacquired during the quarter-cycleperiods of maximum oscillatoryvelocity, 1 and 2. Bottom: Countrates as a function of velocity,for vo = 0 and v1 6= 0. Thedashed curve and black circlesrepresent the count rates thatwould have been observed ifthere were no gravitational effect.The gravitational effect shiftsthe resonance curve to one side(solid curve), resulting in anasymmetry of the count rates(open circles). The shift, and theresulting asymmetry, are greatlyexaggerated for readability; inreality, the gravitational effect was500 times smaller than the widthof the resonance curve.

photons emitted from a solid. At first glance this would seem likea bad idea, since electrons in a solid emit a continuous spectrum oflight, not a discrete spectrum like the ones emitted by gases; thisis because we have N electrons, where N is on the order of Avo-gadro’s number, all interacting strongly with one another, so by thecorrespondence principle the discrete quantum-mechanical behaviormust be averaged out. But the protons and neutrons within onenucleus do not interact much at all with those in other nuclei, sothe photons emitted by a nucleus do have a discrete spectrum. Theenergy scale of nuclear excitations is in the keV or MeV range, sothese photons are x-rays or gamma-rays. Furthermore, the time-scale of the random vibrations of a nucleus in a solid are extremelyshort. For a velocity on the order of 100 m/s, and vibrations with anamplitude of ∼ 10−10 m, the time is about 10−12 s. In many cases,this is much shorter than the half-life of the excited nuclear stateemitting the gamma-ray, and therefore the Doppler shift averagesout to nearly zero.

Effect (2) is still much bigger than the 10−14 size of the effect tobe measured. It can be avoided by exploiting the Mossbauer effect,in which a nucleus in a solid substance at low temperature emits orabsorbs a gamma-ray photon, but with significant probability therecoil is taken up not by the individual nucleus but by a vibrationof the atomic lattice as a whole. Since the recoil energy varies asp2/2m, the large mass of the lattice leads to a very small dissipationof energy into the recoiling lattice. Thus if a photon is emitted andabsorbed by identical nuclei in a solid, and for both emission andabsorption the recoil momentum is taken up by the lattice as awhole, then there is a negligible energy shift. One must pick anisotope that emits photons with energies of about 10-100 keV. X-rays with energies lower than about 10 keV tend to be absorbedstrongly by matter and are difficult to detect, whereas for gamma-ray energies & 100 keV the Mossbauer effect is not sufficient toeliminate the recoil effect completely enough.

If the Mossbauer effect is carried out in a horizontal plane, reso-nant absorption occurs. When the source and absorber are alignedvertically, p, gravitational frequency shifts should cause a mismatch,destroying the resonance. One can move the source at a small ve-locity (typically a few mm/s) in order to add a Doppler shift ontothe frequency; by determining the velocity that compensates for thegravitational effect, one can determine how big the gravitationaleffect is.

The typical half-life for deexcitation of a nucleus by emissionof a gamma-ray with energy E is in the nanosecond range. Tomeasure an gravitational effect at the 10−14 level, one would like tohave a natural line width, (3), with ∆E/E . 10−14, which wouldrequire a half-life of & 10 µs. In practice, Pound and Rebka foundthat other effects, such as (4) and electron-nucleus interactions that

Section 1.5 The equivalence principle 35

Page 36: General Relativity

s / Pound and Rebka at thetop and bottom of the tower.

depended on the preparation of the sample, tended to put nucleiin one sample “out of tune” with those in another sample at the10−13-10−12 level, so that resonance could not be achieved unless thenatural line width gave ∆E/E & 10−12. As a result, they settled onan experiment in which 14 keV gammas were emitted by 57Fe nuclei(figure q) at the top of a 22-meter tower, and absorbed by 57Fenuclei at the bottom. The 100-ns half-life of the excited state leadsto ∆E/E ∼ 10−12. This is 500 times greater than the gravitationaleffect to be measured, so, as described in more detail below, theexperiment depended on high-precision measurements of small up-and-down shifts of the bell-shaped resonance curve.

The absorbers were seven iron films isotopically enhanced in57Fe, applied directly to the faces of seven sodium-iodide scintil-lation detectors (bottom of figure p). When a gamma-ray impingeson the absorbers, a number of different things can happen, of whichwe can get away with considering only the following: (a) the gamma-ray is resonantly absorbed in one of the 57Fe absorbers, after whichthe excited nucleus decays by re-emission of another such photon (ora conversion electron), in a random direction; (b) the gamma-raypasses through the absorber and then produces ionization directlyin the sodium iodide crystal. In case b, the gamma-ray is detected.In case a, there is a 50% probability that the re-emitted photon willcome out in the upward direction, so that it cannot be detected.Thus when the conditions are right for resonance, a reduction incount rate is expected. The Mossbauer effect never occurs with100% probability; in this experiment, about a third of the gammasincident on the absorbers were resonantly absorbed.

The choice of y = 22 m was dictated mainly by systematic er-rors. The experiment was limited by the strength of the gamma-raysource. For a source of a fixed strength, the count rate in the de-tector at a distance y would be proportional to y−2, leading to sta-tistical errors proportional to 1/

√count rate ∝ y. Since the effect

to be measured is also proportional to y, the signal-to-noise ratiowas independent of y. However, systematic effects such as (4) wereeasier to monitor and account for when y was fairly large. A labbuilding at Harvard happened to have a 22-meter tower, which wasused for the experiment. To reduce the absorption of the gammasin the 22 meters of air, a long, cylindrical mylar bag full of heliumgas was placed in the shaft, p.

The resonance was a bell-shaped curve with a minimum at thenatural frequency of emission. Since the curve was at a minimum,where its derivative was zero, the sensitivity of the count rate tothe gravitational shift would have been nearly zero if the source hadbeen stationary. Therefore it was necessary to vibrate the sourceup and down, so that the emitted photons would be Doppler shiftedonto the shoulders of the resonance curve, where the slope of thecurve was large. The resulting asymmetry in count rates is shown

36 Chapter 1 A Geometrical Theory of Spacetime

Page 37: General Relativity

in figure r. A further effort to cancel out possible systematic effectswas made by frequently swapping the source and absorber betweenthe top and bottom of the tower.

For y = 22.6 m, the equivalence principle predicts a fractionalfrequency shift due to gravity of 2.46 × 10−15. Pound and Rebkameasured the shift to be (2.56± 0.25)× 10−15. The results were instatistical agreement with theory, and verified the predicted size ofthe effect to a precision of 10%.

Section 1.5 The equivalence principle 37

Page 38: General Relativity

Problems1 In classical mechanics, one hears the term “the acceleration ofgravity,” which doesn’t literally make sense, since it is objects thataccelerate. Explain why this term’s usefulness is dependent on theequivalence principle.

2 The New Horizons space probe communicates with the earthusing microwaves with a frequency of about 10 GHz. Estimate thesizes of the following frequency shifts in this signal, when the probeflies by Pluto in 2015, at a velocity of ∼ 10 A.U./year: (a) theDoppler shift due to the probe’s velocity; (b) the Doppler shift dueto the Earth’s orbital velocity; (c) the gravitational Doppler shift.

3 Euclid’s axioms E1-E5 (p. 18) do not suffice to prove thatthere are an infinite number of points in the plane, and thereforethey need to be supplemented by an extra axiom that states this(unless one finds the nonstandard realizations with finitely manypoints to be interesting enough to study for their own sake). Provethat the axioms of ordered geometry O1-O4 on p. 19 do not havethis problem. . Solution, p. 377

4 In the science fiction novel Have Spacesuit — Will Travel, byRobert Heinlein, Kip, a high school student, answers a radio distresscall, encounters a flying saucer, and is knocked out and kidnappedby aliens. When he wakes up, he finds himself in a locked cellwith a young girl named Peewee. Peewee claims they’re aboard anaccelerating spaceship. “If this was a spaceship,” Kip thinks. “Thefloor felt as solid as concrete and motionless.”

The equivalence principle can be stated in a variety of ways. Onp. ??, I stated it as (1) gravitational and inertial mass are alwaysproportional to one another. An alternative formulation (p. ??)is (2) that Kip has no way, by experiments or observarions insidehis sealed prison cell, to determine whether he’s in an acceleratingspaceship or on the surface of a planet, experiencing its gravitationalfield.

(a) Show that any violation of statement 1 also leads to a violation ofstatement 2. (b) If we’d intended to construct a geometrical theoryof gravity roughly along the lines of axioms O1-O4 on p. 19, whichaxiom is violated in this scenario? . Solution, p. 377

5 Clock A sits on a desk. Clock B is tossed up in the air fromthe same height as the desk and then comes back down. Comparethe elapsed times. . Hint, p. 377 . Solution, p. 377

6 (a) Find the difference in rate between a clock at the centerof the earth and a clock at the south pole. (b) When an antennaon earth receives a radio signal from a space probe that is in ahyperbolic orbit in the outer solar system, the signal will show both

38 Chapter 1 A Geometrical Theory of Spacetime

Page 39: General Relativity

a kinematic red-shift and a gravitational blueshift. Compare theorders of magnitude of these two effects. . Solution, p. 377

7 Consider the following physical situations: (1) a chargedobject lies on a desk on the planet earth; (2) a charged object orbitsthe earth; (3) a charged object is released above the earth’s surfaceand dropped straight down; (4) a charged object is subjected to aconstant acceleration by a rocket engine in outer space. In each case,we want to know whether the charge radiates. Analyze the physicsin each case (a) based on conservation of energy; (b) by determiningwhether the object’s motion is inertial in the sense intended by IsaacNewton; (c) using the most straightforward interpretation of theequivalence principle (i.e., not worrying about the issues discussedon p. that surround the ambiguous definition of locality).

. Solution, p. 378

8 Consider the physical situation depicted in figure l, p. 30.Let ag be the gravitational acceleration and ar the acceleration ofthe charged particle due to radiation. Then ar/ag measures the vi-olation of the equivalence principle. The goal of this problem is tomake an order-of-magnitude estimate of this ratio in the case of aneutron and a proton in low earth orbit.(a) Let m the mass of each particle, and q the charge of the chargedparticle. Without doing a full calculation like the ones by the De-Witts and Grøn and Næss, use general ideas about the frequency-scaling of radiation (see section 9.2.5, p. 352) to find the proportion-ality that gives the dependence of ar/ag on q, m, and any convenientparameters of the orbit.(b) Based on considerations of units, insert the necessary universalconstants into your answer from part a.(c) The result from part b will still be off by some unitless factor, butwe expect this to be of order unity. Under this assumption, makean order-of-magnitude estimate of the violation of the equivalenceprinciple in the case of a neutron and a proton in low earth orbit.

. Solution, p. 378

Problems 39

Page 40: General Relativity

40 Chapter 1 A Geometrical Theory of Spacetime

Page 41: General Relativity

a / Hendrik Antoon Lorentz(1853-1928)

Chapter 2

Geometry of flat spacetime

The geometrical treatment of space, time, and gravity only requiresas its basis the equivalence of inertial and gravitational mass. Giventhis assumption, we can describe the trajectory of any free-fallingtest particle as a geodesic. Equivalence of inertial and gravitationalmass holds for Newtonian gravity, so it is indeed possible to redoNewtonian gravity as a theory of curved spacetime. This project wascarried out by the French mathematician Cartan. The geometry ofthe local reference frames is very simple. The three space dimensionshave an approximately Euclidean geometry, and the time dimensionis entirely separate from them. This is referred to as a Euclideanspacetime with 3+1 dimensions. Although the outlook is radicallydifferent from Newton’s, all of the predictions of experimental resultsare the same.

The experiments in section 1.2 show, however, that there arereal, experimentally verifiable violations of Newton’s laws. In New-tonian physics, time is supposed to flow at the same rate everywhere,which we have found to be false. The flow of time is actually depen-dent on the observer’s state of motion through space, which showsthat the space and time dimensions are intertwined somehow. Thegeometry of the local frames in relativity therefore must not be assimple as Euclidean 3+1. Their actual geometry was implicit inEinstein’s 1905 paper on special relativity, and had already beendeveloped mathematically, without the full physical interpretation,by Hendrik Lorentz. Lorentz’s and Einstein’s work were explicitlyconnected by Minkowski in 1907, so a Lorentz frame is often referredto as a Minkowski frame.

To describe this Lorentz geometry, we need to add more struc-ture on top of the axioms O1-O4 of ordered geometry, but it will notbe the additional Euclidean structure of E3-E4, it will be somethingdifferent. To see how to proceed, let’s start by thinking about whatbare minimum of geometrical machinery is needed in order to setup frames of reference.

41

Page 42: General Relativity

a / Objects are released atrest at spacetime events P andQ. They remain at rest, and theirworld-lines define a notion ofparallelism.

b / There is no well-definedangular measure in this ge-ometry. In a different frame ofreference, the angles are notright angles.

c / Simultaneity is not welldefined. The constant-time linesPQ and RS from figure a arenot constant-time lines whenobserved in a different frame ofreference.

2.1 Affine properties of Lorentz geometry2.1.1 Parallelism and measurement

We think of a frame of reference as a body of measurementsor possible measurements to be made by some observer. Orderedgeometry lacks measure. The following argument shows that merelyby adding a notion of parallelism to our geometry, we automaticallygain a system of measurement.

We only expect Lorentz frames to be local, but we do need themto be big enough to cover at least some amount of spacetime. IfBetty does an Eotvos experiment by releasing a pencil and a leadball side by side, she is essentially trying to release them at thesame event A, so that she can observe them later and determinewhether their world-lines stay right on top of one another at pointB. That was all that was required for the Eotvos experiment, butin order to set up a Lorentz frame we need to start dealing withobjects that are not right on top of one another. Suppose we re-lease two lead balls in two different locations, at rest relative to oneanother. This could be the first step toward adding measurementto our geometry, since the balls mark two points in space that areseparated by a certain distance, like two marks on a ruler, or thegoals at the ends of a soccer field. Although the balls are separatedby some finite distance, they are still close enough together so thatif there is a gravitational field in the area, it is very nearly the samein both locations, and we expect the distance defined by the gapbetween them to stay the same. Since they are both subject only togravitational forces, their world-lines are by definition straight lines(geodesics). The goal here is to end up with some kind of coordi-nate grid defining a (t,x) plane, and on such a grid, the two balls’world-lines are vertical lines. If we release them at events P andQ, then observe them again later at R and S, PQRS should form arectangle on such a plot. In the figure, the irregularly spaced tickmarks along the edges of the rectangle are meant to suggest thatalthough ordered geometry provides us with a well-defined orderingalong these lines, we have not yet constructed a complete system ofmeasurement.

The depiction of PQSR as a rectangle, with right angles at itsvertices, might lead us to believe that our geometry would havesomething like the concept of angular measure referred to in Euclid’sE4, equality of right angles. But this is too naive even for theEuclidean 3+1 spacetime of Newton and Galileo. Suppose we switchto a frame that is moving relative to the first one, so that the ballsare not at rest. In the Euclidean spacetime, time is absolute, soevents P and Q would remain simultaneous, and so would R andS; the top and bottom edges PQ and RS would remain horizontalon the plot, but the balls’ world-lines PR and QS would becomeslanted. The result would be a parallelogram. Since observers in

42 Chapter 2 Geometry of Flat Spacetime

Page 43: General Relativity

different states of motion do not agree on what constitutes a rightangle, the concept of angular measure is clearly not going to beuseful here. Similarly, if Euclid had observed that a right angledrawn on a piece of paper no longer appeared to be a right anglewhen the paper was turned around, he would never have decidedthat angular measure was important enough to be enshrined in E4.

In the context of relativity, where time is not absolute, there isnot even any reason to believe that different observers must agree onthe simultaneity of PQ and RS. Our observation that time flows dif-ferently depending on the observer’s state of motion tells us specifi-cally to expect this not to happen when we switch to a frame movingto the relative one. Thus in general we expect that PQRS will bedistorted into a form like the one shown in figure c. We do expect,however, that it will remain a parallelogram; a Lorentz frame is onein which the gravitational field, if any, is constant, so the propertiesof spacetime are uniform, and by symmetry the new frame shouldstill have PR=QS and PQ=RS.

With this motivation, we form the system of affine geometry byadding the following axioms to set O1-O4.1 The notation [PQRS]means that events P, Q, S, and R form a parallelogram, and isdefined as the statement that the lines determined by PQ and RSnever meet at a point, and similarly for PR and QS.

A1 Constructibility of parallelograms: Given any P, Q, and R,there exists S such that [PQRS], and if P, Q, and R are distinctthen S is unique.

A2 Symmetric treatment of the sides of a parallelogram: If [PQRS],then [QRSP], [QPSR], and [PRQS].

A3 Lines parallel to the same line are parallel to one another: If[ABCD] and [ABEF], then [CDEF].

The following theorem is a stronger version of Playfair’s axiomE5, the interpretation being that affine geometry describes a space-time that is locally flat.

Theorem: Given any line ` and any point P not on the line, thereexists a unique line through P that is parallel to `.

This is stronger than E5, which only guarantees uniqueness, notexistence. Informally, the idea here is that A1 guarantees the exis-tence of the parallel, and A3 makes it unique.2

1The axioms are summarized for convenient reference in the back of the bookon page 400. This formulation is essentially the one given by Penrose, The Roadto Reality, in section 14.1.

2Proof: Pick any two distinct points A and B on `, and construct the uniquelydetermined parallelogram [ABPQ] (axiom A1). Points P and Q determine a line(axiom O1), and this line is parallel to ` (definition of the parallelogram). To

Section 2.1 Affine properties of Lorentz geometry 43

Page 44: General Relativity

d / Construction of an affineparameter.

e / Affine geometry gives awell-defined centroid for thetriangle.

Although these new axioms do nothing more than to introducethe concept of parallelism lacking in ordered geometry, it turns outthat they also allow us to build up a concept of measurement. Let` be a line, and suppose we want to define a number system on thisline that measures how far apart events are. Depending on the typeof line, this could be a measurement of time, of spatial distance, ora mixture of the two. First we arbitrarily single out two distinctpoints on ` and label them 0 and 1. Next, pick some auxiliary pointq0 not lying on `. By A1, construct the parallelogram 01q0q1. Nextconstruct q01q12. Continuing in this way, we have a scaffolding ofparallelograms adjacent to the line, determining an infinite lattice ofpoints 1, 2, 3, . . . on the line, which represent the positive integers.Fractions can be defined in a similar way. For example, 1

2 is definedas the point such that when the initial lattice segment 01

2 is extendedby the same construction, the next point on the lattice is 1.

The continuously varying variable constructed in this way iscalled an affine parameter. The time measured by a free-fallingclock is an example of an affine parameter, as is the distance mea-sured by the tick marks on a free-falling ruler. Since light rays travelalong geodesics, the wave crests on a light wave can even be usedanalogously to the ruler’s tick marks.

Centroids Example: 1The affine parameter can be used to define the centroid of a setof points. In the simplest example, finding the centroid of twopoints, we simply bisect the line segment as described above inthe construction of the number 1

2 . Similarly, the centroid of a tri-angle can be defined as the intersection of its three medians, thelines joining each vertex to the midpoint of the opposite side.

Conservation of momentum Example: 2In nonrelativistic mechanics, the concept of the center of massis closely related to the law of conservation of momentum. Forexample, a logically complete statement of the law is that if a sys-tem of particles is not subjected to any external force, and wepick a frame in which its center of mass is initially at rest, then itscenter of mass remains at rest in that frame. Since centroids arewell defined in affine geometry, and Lorentz frames have affineproperties, we have grounds to hope that it might be possible togeneralize the definition of momentum relativistically so that thegeneralized version is conserved in a Lorentz frame. On the otherhand, we don’t expect to be able to define anything like a global

prove that this line is unique, we argue by contradiction. Suppose some otherparallel m to exist. If m crosses the infinite line BQ at some point Z, then both[ABPQ] and [ABPZ], so by A1, Q=Z, so the ` and m are the same. The onlyother possibility is that m is parallel to BQ, but then the following chain ofparallelisms holds: PQ ‖ AB ‖ m ‖ BQ. By A3, lines parallel to another line areparallel to each other, so PQ ‖ BQ, but this is a contradiction, since they haveQ in common.

44 Chapter 2 Geometry of Flat Spacetime

Page 45: General Relativity

f / Example 3. The area ofthe viola can be determinedby counting the parallelogramsformed by the lattice. The areacan be determined to any desiredprecision, by dividing the parallel-ograms into fractional parts thatare as small as necessary.

g / Example 4.

Lorentz frame for the entire universe, so there is no such naturalexpectation of being able to define a global principle of conser-vation of momentum. This is an example of a general fact aboutrelativity, which is that conservation laws are difficult or impossibleto formulate globally.

Although the affine parameter gives us a system of measurementfor free in a geometry whose axioms do not even explicitly mentionmeasurement, there are some restrictions:

The affine parameter is defined only along straight lines, i.e.,geodesics. Alice’s clock defines an affine parameter, but Betty’sdoes not, since it is subject to nongravitational forces.

We cannot compare distances along two arbitrarily chosenlines, only along a single line or two parallel lines.

The affine parameter is arbitrary not only in the choice of itsorigin 0 (which is to be expected in any case, since any frameof reference requires such an arbitrary choice) but also in thechoice of scale. For example, there is no fundamental way ofdeciding how fast to make a clock tick.

We will eventually want to lift some of these restrictions byadding to our kit a tool called a metric, which allows us to de-fine distances along arbitrary curves in space time, and to comparedistances in different directions. The affine parameter, however, willnot be entirely superseded. In particular, we’ll find that the metrichas a couple of properties that are not as nice as those of the affineparameter. The square of a metric distance can be negative, and themetric distance measured along a light ray is precisely zero, whichis not very useful.

Self-check: By the construction of the affine parameter above,affine distances on the same line are comparable. By another con-struction, verify the claim made above that this can be extended todistances measured along two different parallel lines.

Area and volume Example: 3It is possible to define area and volume in affine geometry. This

is a little surprising, since distances along different lines are noteven comparable. However, we are already accustomed to mul-tiplying and dividing numbers that have different units (a conceptthat would have given Euclid conniptions), and the situation inaffine geometry is really no different. To define area, we extendthe one-dimensional lattice to two dimensions. Any planar figurecan be superimposed on such a lattice, and dissected into paral-lelograms, each of which has a standard area.

Area on a graph of v versus t Example: 4If an object moves at a constant velocity v for time t, the distance

Section 2.1 Affine properties of Lorentz geometry 45

Page 46: General Relativity

it travels can be represented by the area of a parallelogram in anaffine plane with sides having lengths v and t. These two lengthsare measured by affine parameters along two different directions,so they are not comparable. For example, it is meaningless toask whether 1 m/s is greater than, less than, or equal to 1 s. Ifwe were graphing velocity as a function of time on a conventionalCartesian graph, the v and t axes would be perpendicular, butaffine geometry has no notion of angular measure, so this is irrel-evant here.

Self-check: If multiplication is defined in terms of affine area,prove the commutative property ab = ba and the distributive rulea(b+ c) = ab+ bc from axioms A1-A3.

2.1.2 Vectors

Vectors distinguished from scalars

We’ve been discussing subjects like the center of mass that infreshman mechanics would be described in terms of vectors andscalars, the distinction being that vectors have a direction in spaceand scalars don’t. As we make the transition to relativity, we areforced to refine this distinction. For example, we used to considertime as a scalar, but the Hafele-Keating experiment shows that timeis different in different frames of reference, which isn’t somethingthat’s supposed to happen with scalars such as mass or temperature.In affine geometry, it doesn’t make much sense to say that a vectorhas a magnitude and direction, since non-parallel magnitudes aren’tcomparable, and there is no system of angular measurement in whichto describe a direction.

A better way of defining vectors and scalars is that scalars areabsolute, vectors relative. If I have three apples in a bowl, then allobservers in all frames of reference agree with me on the numberthree. But if my terrier pup pulls on the leash with a certain forcevector, that vector has to be defined in relation to other things. Itmight be three times the strength of some force that we define asone newton, and in the same direction as the earth’s magnetic field.

In general, measurement means comparing one thing to another.The number of apples in the bowl isn’t a measurement, it’s a count.

Affine measurement of vectors

Before even getting into the full system of affine geometry, let’sconsider the one-dimensional example of a line of time. We coulduse the hourly emergence of a mechanical bird from a pendulum-driven cuckoo clock to measure the rate at which the earth spins,but we could equally well take our planet’s rotation as the standardand use it to measure the frequency with which the bird pops out ofthe door. Once we have two things to compare against one another,measurement is reduced to counting (figure d, p. 44). Schematically,

46 Chapter 2 Geometry of Flat Spacetime

Page 47: General Relativity

let’s represent this measurement process with the following notation,which is part of a system called called birdtracks:3

c e = 24

Here c represents the cuckoo clock and e the rotation of the earth.Although the measurement relationship is nearly symmetric, thearrow has a direction, because, for example, the measurement ofthe earth’s rotational period in terms of the clock’s frequency isc e = (24 hr)(1 hr−1) = 24, but the clock’s period in terms of theearth’s frequency is e c = 1/24. We say that the relationship isnot symmetric but “dual.” By the way, it doesn’t matter how wearrange these diagrams on the page. The notations c e and e cmean exactly the same thing, and expressions like this can even bedrawn vertically.

Suppose that e is a displacement along some one-dimensionalline of time, and we want to think of it as the thing being measured.Then we expect that the measurement process represented by c pro-duces a real-valued result and is a linear function of e. Since therelationship between c and e is dual, we expect that c also belongsto some vector space. For example, vector spaces allow multiplica-tion by a scalar: we could double the frequency of the cuckoo clockby making the bird come out on the half hour as well as on thehour, forming 2c. Measurement should be a linear function of bothvectors; we say it is “bilinear.”

Duality

The two vectors c and e have different units, hr−1 and hr, andinhabit two different one-dimensional vector spaces. The “flavor” ofthe vector is represented by whether the arrow goes into it or comesout. Just as we used notation like −→v in freshman physics to tellvectors apart from scalars, we can employ arrows in the birdtracksnotation as part of the notation for the vector, so that instead ofwriting the two vectors as c and e, we can notate them as c ande . Performing a measurement is like plumbing. We join the two

“pipes” in c e and simplify to c e .

A confusing and nonstandardized jungle of notation and termi-nology has grown up around these concepts. For now, let’s refer to avector such as e , with the arrow coming in, simply as a “vector,”and the type like c as a “dual vector.” In the one-dimensionalexample of the earth and the cuckoo clock, the roles played by thetwo vectors were completely equivalent, and it didn’t matter whichone we expressed as a vector and which as a dual vector. Example5 shows that it is sometimes more natural to take one quantity as

3The system used in this book follows the one defined by Cvitanovic, whichwas based closely on a graphical notation due to Penrose. For a more com-plete exposition, see the Wikipedia article “Penrose graphical notation” andCvitanovic’s online book at birdtracks.eu.

Section 2.1 Affine properties of Lorentz geometry 47

Page 48: General Relativity

h / 1. A displacement vector.2. A vector from the space dualto the space of displacements.3. Measurement is reducedto counting. The cuckoo clockchimes 24 times in one rotationof the earth.

i / Constant-temperature curvesfor January in North America, atintervals of 4 C. The tempera-ture gradient at a given point is adual vector.

a vector and another as a dual vector. Example 6 shows that wesometimes have no choice at all as to which is which.

Scaling

In birdtracks notation, a scalar is a quantity that has no externalarrows at all. Since the expression c e = 24 has no external arrows,only internal ones, it represents a scalar. This makes sense becauseit’s a count, and a count is a scalar.

A convenient way of summarizing all of our categories of vari-ables is by their behavior when we convert units, i.e., when we rescaleour space. If we switch our time unit from hours to minutes, thenumber of apples in a bowl is unchanged, the earth’s period of ro-tation gets 60 times bigger, and the frequency of the cuckoo clockchanges by a factor of 1/60. In other words, a quantity u underrescaling of coordinates by a factor α becomes αpu, where the ex-ponents −1, 0, and +1 correspond to dual vectors, scalars, and vec-tors, respectively. We can therefore see that these distinctions are ofinterest even in one dimension, contrary to what one would have ex-pected from the freshman-physics concept of a vector as somethingtransforming in a certain way under rotations.

Geometrical visualization

In two dimensions, there are natural ways of visualizing the dif-ferent vector spaces inhabited by vectors and dual vectors. We’vealready been describing a vector like e as a displacement. Itsvector space is the space of such displacements.4 A vector in thedual space such as c can be visualized as a set of parallel, evenlyspaced lines on a topographic map, h/2, with an arrowhead to showwhich way is “uphill.” The act of measurement consists of countinghow many of these lines are crossed by a certain vector, h/3.

Given a scalar field f , its gradient grad f at any given pointis a dual vector. In birdtracks notation, we have to indicate this

by writing it with an outward-pointing arrow, (grad f) . Because

gradients occur so frequently, we have a special shorthand for them,which is simply a circle:

f

In the context of spacetime with a metric and curvature, we’ll seethat the usual definition of the gradient in terms of partial deriva-tives should be modified with correction terms to form somethingcalled a covariant derivative. When we get to that point on p. 178,we’ll commandeer the circle notation for that operation.

Force is a dual vector Example: 5The dot product dW = F · dx for computing mechanical work

4In terms of the primitive notions used in the axiomatization in section 2.1,a displacement could be described as an equivalence class of segments such thatfor any two segments in the class AB and CD, AB and CD form a parallelogram.

48 Chapter 2 Geometry of Flat Spacetime

Page 49: General Relativity

j / There is no natural metricon the space (θ,φ).

becomes, in birdtracks notation,

dW = F dx .

This shows that force is more naturally considered to be a dualvector rather than a vector. The symmetry between vectors anddual vectors is broken by considering displacements like dx tobe vectors, and this asymmetry then spreads to other quantitiessuch as force.

The same result can be obtained from Newton’s second law; seeexample 21 on p. 141.

Systems without a metric Example: 6The freshman-mechanics way of thinking about vectors and scalarproducts contains the hidden assumption that we have, besidesaffine measurement, an additional piece of measurement appa-ratus called the metric (section 3.5, p. 98). Without yet having toformally define what we mean by a metric, we can say roughlythat it supplies the conveniences that we’re used to having in theEuclidean plane, but that are not present in affine geometry. Inparticular, it allows us to define the notion that one vector is per-pendicular to another vector, or that one dual vector is perpendic-ular to another dual vector.

Let’s start with an example where the hidden assumption is valid,and we do have a metric. Let a billiard ball of unit mass be con-strained by a diagonal wall to have C ≤ 0, where C = y − x . TheLagrangian formalism just leads to the expected Newtonian ex-pressions for the momenta conjugate to x and y , px = x , py = y ,and these form a dual vector p . The force of constraint is

F = dp /dt . Let w = (grad C) be the gradient of the con-

straint function. The vectors F and w both belong to thespace of dual vectors, and they are parallel to each other. Sincewe do happen to have a metric in this example, it is also possibleto say, as most people would, that the force is perpendicular tothe wall.

Now consider the example shown in figure j. The arm’s weight isnegligible compared to the unit mass of the gripped weight, andboth the upper and lower arm have unit length. Elbows don’t bendbackward, so we have a constraint C ≤ 0, where C = θ− φ, andas before we can define define a dual vector w = (grad C)that is parallel to the line of constraint in the (θ,φ) plane. Theconjugate momenta (which are actually angular momenta) turnout to be pθ = θ+cos(φ−θ)φ and a similar expression for pφ. As inthe example of the billiard ball, the force of constraint is parallel tow . There is no metric that naturally applies to the (θ,φ) plane,so we have no notion of perpendicularity, and it doesn’t makesense to say that F is perpendicular to the line of constraint.

Section 2.1 Affine properties of Lorentz geometry 49

Page 50: General Relativity

k / The free-falling observerconsiders P and Q to be simulta-neous.

l / Points on the graph satisfythe dispersion relation C = 0 forwater waves. At a given pointon the graph, the dual vector(grad C) tells us the groupvelocity.

Finally we remark that since four-dimensional Galilean spacetimelacks a metric (see p. 100), the distinction between vectors anddual vectors in Galilean relativity is a real and physically importantone. The only reason people were historically able to ignore thisdistinction was that Galilean spacetime splits into independenttime and spatial parts, with the spatial part having a metric.

No simultaneity without a metric Example: 7We’ll see in section 2.2 that one way of defining the distinction be-tween Galilean and Lorentz geometry is that in Lorentzian space-time, simultaneity is observer-dependent. Without a metric, therecan be no notion of simultaneity at all, not even a frame-dependentone. In figure k, the fact that the observer considers events Pand Q to be simultaneous is represented by the fact that theobserver’s displacement vector o is perpendicular to the dis-placement s from P to Q. In affine geometry, we can’t expressperpendicularity.

Waves Example: 8Depictions of dual vectors as in figure h/2 on p. 48 evoke planewaves. This makes sense, because the phase θ of a wave isa scalar, and the gradient of a scalar is a dual vector. The dualvector k = (grad θ) = (ω, kx , . . .) is called the frequency vector.If o represents the displacement of an observer from one eventin spacetime to another, then k o is the number of radians ofthe wave’s phase that wash over the observer’s position. Thefrequency vector is discussed further in section 4.2.3 on p. 133.

There is a dispersion relation between a wave’s frequency andwavenumber. For example, surface waves in deep water obeythe constraint C = 0, where C = ω4 − α2k2 and α is a constantwith units of acceleration, relating to the acceleration of gravity.(Since the water is infinitely deep, there is no other scale thatcould enter into the constraint.)

When a wave is modulated, it can transport energy and momen-tum and transmit information, i.e., act as an agent of cause andeffect between events. How fast does it go? If a certain bump onthe envelope with which the wave is modulated visits spacetimeevents P and Q, then whatever frequency and wavelength thewave has near the bump are observed to be the same at P and Q.In general, k and ω are constant along the spacetime displace-ment of any point on the envelope, so the spacetime displace-ment s from P to Q must satisfy the condition (grad C) s = 0.The set of solutions to this equation is the world-line of the bump,and the inverse slope of this world-line is called the group velocity.In our example of water waves, the group velocity is α/2ω.

The quantity ω/k is referred to as the phase velocity. In ourexample of the water waves, the phase velocity is ω/k = α/ω,

50 Chapter 2 Geometry of Flat Spacetime

Page 51: General Relativity

which is twice the group velocity. The phase velocity lacks phys-ical interest, because it is not the velocity at which any “stuff”moves. The birdtracks notation expresses this fact by refusingto let us use information about the phase θ to decide whetherevents P and Q could have been linked by the wave’s motion.The only quantity we can extract from the phase information isthe frequency vector k , and one can say loosely that this is thephase velocity vector. But it’s not really a vector at all, it’s a dualvector. The displacement s from P to Q is a vector, not a dualvector. Since k and s belong to different vector spaces, wecan’t say whether they are parallel.

Abstract index notation

Expressions in birdtracks notation such as

C s

can be awkward to type on a computer, which is why we’ve al-ready been occasionally resorting to more linear notations such as

(gradC) s. As we encounter more complicated birdtracks, the di-

agrams will sometimes look like complicated electrical schematics,and the problem of generating them on a keyboard will get moreacute. There is in fact a systematic way of representing any suchexpression using only ordinary subscripts and superscripts. This iscalled abstract index notation, and was introduced by Roger Pen-rose at around the same time he invented birdtracks. For practicalreasons, it was the abstract index notation that caught on.

The idea is as follows. Suppose we wanted to describe a compli-cated birdtrack verbally, so that someone else could draw it. Thediagram would be made up of various smaller parts, a typical onelooking something like the scalar product u v. The verbal instruc-tions might be: “We have an object u with an arrow coming out ofit. For reference, let’s label this arrow as a. Now remember thatother object v I had you draw before? There was an arrow cominginto that one, which we also labeled a. Now connect up the twoarrows labeled a.”

Shortening this lengthy description to its bare minimum, Penroserenders it like this: uav

a. Subscripts depict arrows coming out ofa symbol (think of water flowing from a tank out through a pipebelow). Superscripts indicate arrows going in. When the same letteris used as both a superscript and a subscript, the two arrows are tobe piped together.

Abstract index notation evolved out of an earlier one calledthe Einstein summation convention, in which superscripts and sub-scripts referred to specific coordinates. For example, we might take0 to be the time coordinate, 1 to be x, and so on. A symbol like uγwould then indicate a component of the dual vector u, which could

Section 2.1 Affine properties of Lorentz geometry 51

Page 52: General Relativity

be its x component if γ took on the value 1. Repeated indices weresummed over.

The advantage of the birdtrack and abstract index notations isthat they are coordinate-independent, so that an equation writtenin them is valid regardless of the choice of coordinates. The Einsteinand Penrose notations look very similar, so for example if we wantto take a general result expressed in Penrose notation and apply itin a specific coordinate system, there is essentially no translationrequired. In fact, the two notations look so similar that we need anexplicit way to tell which is which, so that we can tell whether ornot a particular result is coordinate-independent. We therefore usethe convention that Latin indices represent abstract indices, whereasGreek ones imply a specific coordinate system and can take on nu-merical values, e.g., γ = 1.

2.2 Relativistic properties of Lorentz geometryWe now want to pin down the properties of the Lorentz geometrythat are left unspecified by the affine treatment. We need somefurther input from experiments in order to show us how to proceed.We take the following as empirical facts about flat spacetime:5

L1 Spacetime is homogeneous and isotropic. No time or placehas special properties that make it distinguishable from otherpoints, nor is one direction in space distinguishable from an-other.6

L2 Inertial frames of reference exist. These are frames in whichparticles move at constant velocity if not subject to any forces.We can construct such a frame by using a particular particle,which is not subject to any forces, as a reference point.

L3 Equivalence of inertial frames: If a frame is in constant-velocitytranslational motion relative to an inertial frame, then it is alsoan inertial frame. No experiment can distinguish one preferredinertial frame from all the others.

L4 Causality: There exist events 1 and 2 such that t1 < t2 in allframes.

L5 Relativity of time: There exist events 1 and 2 and frames ofreference (t,x) and (t′,x′) such that t1 < t2, but t′1 > t′2.

L4 makes it possible to have an event 1 that causes an event 2,with all observers agreeing on which caused which. L5 is supported

5These facts are summarized for convenience on page 400 in the back of thebook.

6For the experimental evidence on isotropy, see http://www.

edu-observatory.org/physics-faq/Relativity/SR/experiments.html#

Tests_of_isotropy_of_space.

52 Chapter 2 Geometry of Flat Spacetime

Page 53: General Relativity

a / Two objects at rest haveworld-lines that define a rect-angle. In a second frame ofreference in motion relative to thefirst one, the rectangle becomesa parallelogram.

by the experimental evidence in section 1.2; if L5 were false, thenspace and time could work as imagined by Galileo and Newton.

Define affine parameters t and x for time and position, and con-struct a (t,x) plane. Axiom L1 guarantees that spacetime is flat,allowing us to do this; if spacetime had, for example, a curvaturelike that of a sphere, then the axioms of affine geometry would fail,and it would be impossible to lay out such a global grid of paral-lels. Although affine geometry treats all directions symmetrically,we’re going beyond the affine aspects of the space, and t does playa different role than x here, as shown, for example, by L4 and L5.

In the (t,x) plane, consider a rectangle with one corner at theorigin O. We can imagine its right and left edges as representing theworld-lines of two objects that are both initially at rest in this frame;they remain at rest (L2), so the right and left edges are parallel.

How do we know that this is a rectangle and not some other kindof parallelogram? In purely affine geometry, there is no notion ofperpendicularity, so this distinction is meaningless. But implicit inthe existence of inertial frames (L2) is the assumption that spacetimehas some additional structure that allows a particular observer todecide what events he considers to be simultaneous (example 7,p. 50). He then considers his own world-line, i.e., his t axis, tobe perpendicular to a proposed x axis if points on the x axis aresimultaneous to him.

We now define a second frame of reference such that the originsof the two frames coincide, but they are in motion relative to oneanother with velocity v. The transformation L from the first frameto the second is referred to as a Lorentz boost with velocity v. Ldepends on v. By equivalence of inertial frames (L3), an observer inthe new frame considers his own t axis to be perpendicular to his ownx, even though they don’t look that way in figure a. Thus, althoughwe assume some notion of perpendicularity, we do not assume thatit looks the same as the Euclidean one.

By homogeneity of spacetime (L1), L must be linear, so theoriginal rectangle will be transformed into a parallelogram in thenew frame; this is also consistent with L3, which requires that theworld-lines on the right and left edges remain parallel. The left edgehas inverse slope v. By L5 (no simultaneity), the top and bottomedges are no longer horizontal.

For simplicity, let the original rectangle have unit area. Thenthe area of the new parallelogram is still 1, by the following argu-ment. Let the new area be A, which is a function of v. By isotropyof spacetime (L1), A(v) = A(−v). Furthermore, the function A(v)must have some universal form for all geometrical figures, not justfor a figure that is initially a particular rectangle; this follows be-cause of our definition of affine area in terms of a dissection by

Section 2.2 Relativistic properties of Lorentz geometry 53

Page 54: General Relativity

a two-dimensional lattice, which we can choose to be a lattice ofsquares. Applying boosts +v and −v one after another results in atransformation back into our original frame of reference, and sinceA is universal for all shapes, it doesn’t matter that the second trans-formation starts from a parallelogram rather than a square. Scalingthe area once by A(v) and again by A(−v) must therefore give backthe original square with its original unit area, A(v)A(−v) = 1, andsince A(v) = A(−v), A(v) = ±1 for any value of v. Since A(0) = 1,we must have A(v) = 1 for all v. The argument is independent ofthe shape of the region, so we conclude that all areas are preservedby Lorentz boosts. (See subsection 4.6.3 on p. 155 for further inter-pretation of A.)

If we consider a boost by an infinitesimal velocity dv, then thevanishing change in area comes from the sum of the areas of the fourinfinitesimally thin slivers where the rectangle lies either outside theparallelogram (call this negative area) or inside it (positive). (Wedon’t worry about what happens near the corners, because sucheffects are of order dv2.) In other words, area flows around in thex − t plane, and the flows in and out of the rectangle must cancel.Let v be positive; the flow at the sides of the rectangle is then to theright. The flows through the top and bottom cannot be in oppositedirections (one up, one down) while maintaining the parallelism ofthe opposite sides, so we have the following three possible cases:

b / Flows of area: (I) a shearthat preserves simultaneity, (II) arotation, (III) upward flow at alledges.

I There is no flow through the top and bottom. This case cor-responds to Galilean relativity, in which the rectangle shearshorizontally under a boost, and simultaneity is preserved, vi-olating L5.

II Area flows downward at both the top and the bottom. Theflow is clockwise at both the positive t axis and the positivex axis. This makes it plausible that the flow is clockwise ev-erywhere in the (t,x) plane, and the proof is straightforward.7

7Proof: By linearity of L, the flow is clockwise at the negative axes as well.Also by linearity, the handedness of the flow is the same at all points on a rayextending out from the origin in the direction θ. If the flow were counterclockwisesomewhere, then it would have to switch handedness twice in that quadrant, atθ1 and θ2. But by writing out the vector cross product r × dr, where dr is thedisplacement caused by L(dv), we find that it depends on sin(2θ+δ), which doesnot oscillate rapidly enough to have two zeroes in the same quadrant.

54 Chapter 2 Geometry of Flat Spacetime

Page 55: General Relativity

c / Unit square PQRS is Lorentz-boosted to the parallelogramP′Q′R′S′.

As v increases, a particular element of area flows continuallyclockwise. This violates L4, because two events with a causeand effect relationship could be time-reversed by a Lorentzboost.

III Area flows upward at both the top and the bottom.

Only case III is possible, and given case III, there must be at leastone point P in the first quadrant where area flows neither clockwisenor counterclockwise.8 The boost simply increases P’s distance fromthe origin by some factor. By the linearity of the transformation,the entire line running through O and P is simply rescaled. Thisspecial line’s inverse slope, which has units of velocity, apparentlyhas some special significance, so we give it a name, c. We’ll see laterthat c is the maximum speed of cause and effect whose existencewe inferred in section 1.3. Any world-line with a velocity equal toc retains the same velocity as judged by moving observers, and byisotropy the same must be true for −c.

For convenience, let’s adopt time and space units in which c = 1,and let the original rectangle be a unit square. The upper righttip of the parallelogram must slide along the line through the originwith slope +1, and similarly the parallelogram’s other diagonal musthave a slope of −1. Since these diagonals bisected one another onthe original square, and since bisection is an affine property thatis preserved when we change frames of reference, the parallelogrammust be equilateral.

We can now determine the complete form of the Lorentz transfor-mation. Let unit square PQRS, as described above, be transformedto parallelogram P′Q′R′S′ in the new coordinate system (x′, t′). Letthe t′ coordinate of R′ be γ, interpreted as the ratio between thetime elapsed on a clock moving from P′ to R′ and the correspondingtime as measured by a clock that is at rest in the (x′, t′) frame. Bythe definition of v, R′ has coordinates (vγ, γ), and the other geo-metrical facts established above place Q′ symmetrically on the otherside of the diagonal, at (γ, vγ). Computing the cross product of vec-tors P′R′ and P′Q′, we find the area of P′Q′R′S′ to be γ2(1 − v2),and setting this equal to 1 gives

γ =1√

1− v2.

Self-check: Interpret the dependence of γ on the sign of v.

8This follows from the fact that, as shown in the preceding footnote, thehandedness of the flow depends only on θ.

Section 2.2 Relativistic properties of Lorentz geometry 55

Page 56: General Relativity

d / The behavior of the γ factor.

The result for the transformation L, a Lorentz boost along thex axis with velocity v, is:

t′ = γt+ vγx

x′ = vγt+ γx

The symmetry of P′Q′R′S′ with respect to reflection across thediagonal indicates that the time and space dimensions are treatedsymmetrically, although they are not entirely interchangeable asthey would have been in case II.

A measuring rod, unlike a clock, sweeps out a two-dimensionalstrip on an x− t graph. As in Galilean relativity, the two observersdisagree on the positions of events at the two ends of their rods,but in addition they disagree on the simultaneity of such events.Calculation shows that a moving rod appears contracted by a factorγ.

In summary, a clock runs fastest according to an observer whois at rest relative to the clock, and a measuring rod likewise appearslongest in its own rest frame.

The lack of a universal notion of simultaneity has a similarlysymmetric interpretation. In prerelativistic physics, points in spacehave no fixed identity. A brass plaque commemorating a Civil Warbattle is not at the same location as the battle, according to anobserver who perceives the Earth has having been hurtling through

56 Chapter 2 Geometry of Flat Spacetime

Page 57: General Relativity

e / Example 9. Flashes oflight travel along P′T′ and Q′T′.The observer in this frame ofreference judges them to havebeen emitted at different times,and to have traveled differentdistances.

space for the intervening centuries. By symmetry, points in timehave no fixed identity either.

In everyday life, we don’t notice relativistic effects like time di-lation, so apparently γ ≈ 1, and v 1, i.e., the speed c must bevery large when expressed in meters per second. By setting c equalto 1, we have chosen a the distance unit that is extremely long inproportion to the time unit. This is an example of the correspon-dence principle, which states that when a new physical theory, suchas relativity, replaces an old one, such as Galilean relativity, it mustremain “backward-compatible” with all the experiments that ver-ified the old theory; that is, it must agree with the old theory inthe appropriate limit. Despite my coyness, you probably know thatthe speed of light is also equal to c. It is important to emphasize,however, that light plays no special role in relativity, nor was itnecessary to assume the constancy of the speed of light in order toderive the Lorentz transformation; we will in fact prove on page 68that photons must travel at c, and on page 129 that this must betrue for any massless particle.

On the other hand, Einstein did originally develop relativitybased on a different set of assumptions than our L1-L5. His treat-ment, given in his 1905 paper “On the electrodynamics of movingbodies,” is reproduced on p. 357. It starts from the following twopostulates:

P1 The principle of relativity: “. . . the phenomena of electrody-namics as well as of mechanics possess no properties corre-sponding to the idea of absolute rest.”

P2 “. . . light is always propagated in empty space with a definitevelocity c which is independent of the state of motion of theemitting body.”

Einstein’s P1 is essentially the same as our L3 (equivalence of in-ertial frames). He implicitly assumes something equivalent to ourL1 (homogeneity and isotropy of spacetime). In his system, ourL5 (relativity of time) is a theorem proved from the axioms P1-P2,whereas in our system, his P2 is a theorem proved from the axiomsL1-L5.

Section 2.2 Relativistic properties of Lorentz geometry 57

Page 58: General Relativity

Example: 9Let the intersection of the parallelogram’s two diagonals be T inthe original (rest) frame, and T′ in the Lorentz-boosted frame. Anobserver at T in the original frame simultaneously detects thepassing by of the two flashes of light emitted at P and Q, andsince she is positioned at the midpoint of the diagram in space,she infers that P and Q were simultaneous. Since the arrival ofboth flashes of light at the same point in spacetime is a concreteevent, an observer in the Lorentz-boosted frame must agree ontheir simultaneous arrival. (Simultaneity is well defined as longas no spatial separation is involved.) But the distances traveledby the two flashes in the boosted frame are unequal, and sincethe speed of light is the same in all cases, the boosted observerinfers that they were not emitted simultaneously.

Example: 10A different kind of symmetry is the symmetry between observers.If observer A says observer B’s time is slow, shouldn’t B say thatA’s time is fast? This is what would happen if B took a pill thatslowed down all his thought processes: to him, the rest of theworld would seem faster than normal. But this can’t be correctfor Lorentz boosts, because it would introduce an asymmetry be-tween observers. There is no preferred, “correct” frame corre-sponding to the observer who didn’t take a pill; either observercan correctly consider himself to be the one who is at rest. It mayseem paradoxical that each observer could think that the otherwas the slow one, but the paradox evaporates when we considerthe methods available to A and B for resolving the controversy.They can either (1) send signals back and forth, or (2) get to-gether and compare clocks in person. Signaling doesn’t estab-lish one observer as correct and one as incorrect, because aswe’ll see in the following section, there is a limit to the speed ofpropagation of signals; either observer ends up being able to ex-plain the other observer’s observations by taking into account thefinite and changing time required for signals to propagate. Meet-ing in person requires one or both observers to accelerate, as inthe original story of Alice and Betty, and then we are no longerdealing with pure Lorentz frames, which are described by non-accelerating observers.

Einstein’s goof Example: 11Einstein’s original 1905 paper on special relativity, reproduced onp. 357, contains a famous incorrect prediction, that “a spring-clockat the equator must go more slowly, by a very small amount, thana precisely similar clock situated at one of the poles under other-wise identical conditions” (p. 367). This was a reasonable predic-tion at the time, but we now know that it was incorrect becauseit neglected gravitational time dilation. In the description of theHafele-Keating experiment using atomic clocks aboard airplanes

58 Chapter 2 Geometry of Flat Spacetime

Page 59: General Relativity

(p. ??), we saw that both gravity and motion had effects on therate of flow of time. On p. 32 we found based on the equivalenceprinciple that the gravitational redshift of an electromagnetic waveis ∆E/E = ∆Φ (where c = 1 and Φ is the gravitational potentialgy ), and that this could also be interpreted as a gravitational timedilation ∆t/t = ∆Φ.

The clock at the equator suffers a kinematic time dilation thatwould tend to cause it to run more slowly than the one at thepole. However, the earth is not a sphere, so the two clocks areat different distances from the earth’s center, and the field theyinhabit is also not the simple field of a sphere. This suggests thatthere may be an additional gravitational effect due to ∆Φ 6= 0.Expanding the Lorentz gamma factor in a Taylor series, we findthat the kinematic effect amounts to ∆t/t = γ − 1 ≈ v2/2. Themismatch in rates between the two clocks is

∆tt≈ 1

2v2 − ∆Φ ,

where ∆Φ = Φequator −Φpole, and a factor of 1/c2 on the right issuppressed because c = 1. But this expression for ∆t/t vanishes,because the surface of the earth’s oceans is in equilibrium, andtherefore a mass of water m can be brought from the north pole tothe equator, with the change in potential energy −m∆Φ being ex-actly sufficient to supply the necessary kinetic energy (1/2)mv2.We therefore find that a change of latitude should have no effecton the rate of a clock, provided that it remains at sea level.

This has been verified experimentally by Alley et al.9 Alley’s groupflew atomic clocks from Washington, DC to Thule, Greenland, leftthem there for four days, and brought them back. The differencebetween the clocks that went to Greenland and other clocks thatstayed in Washington was 38 ± 5 ns, which was consistent withthe 35±2 ns effect predicted purely based on kinematic and grav-itational time dilation while the planes were in the air. If Einstein’s1905 prediction had been correct, then there would have been anadditional difference of 224 ns due to the difference in latitude.

The perfect cancellation of kinematic and gravitational effects isnot as fortuitous as it might seem; this is discussed in example17 on p. 113.

GPS Example: 12In the GPS system, as in example 11, both gravitational and kine-matic time dilation must be considered. Let’s determine the direc-tions and relative strengths of the two effects in the case of a GPSsatellite.9C.O. Alley, et al., in NASA Goddard Space Flight Center, Proc. of the

13th Ann. Precise Time and Time Interval (PTTI) Appl. and PlanningMeeting, p. 687-724, 1981, available online at http://www.pttimeeting.org/

archivemeetings/index9.html

Section 2.2 Relativistic properties of Lorentz geometry 59

Page 60: General Relativity

A radio photon emitted by a GPS satellite gains energy as it fallsto the earth’s surface, so its energy and frequency are increasedby this effect. The observer on the ground, after accounting forall non-relativistic effects such as Doppler shifts and the Sagnaceffect (p. 74), would interpret the frequency shift by saying thattime aboard the satellite was flowing more quickly than on theground.

However, the satellite is also moving at orbital speeds, so there isa Lorentz time dilation effect. According to the observer on earth,this causes time aboard the satellite to flow more slowly than onthe ground.

We can therefore see that the two effects are of opposite sign.Which is stronger?

For a satellite in low earth orbit, we would have v2/r = g, wherer is only slightly greater than the radius of the earth. The relativeeffect on the flow of time is γ − 1 ≈ v2/2 = gr/2. The gravi-tational effect, approximating g as a constant, is −gy , where yis the satellite’s altitude above the earth. For such a satellite, thegravitational effect is down by a factor of 2y/r , so the Lorentz timedilation dominates.

GPS satellites, however, are not in low earth orbit. They orbitat an altitude of about 20,200 km, which is quite a bit greaterthan the radius of the earth. We therefore expect the gravitationaleffect to dominate. To confirm this, we need to generalize theequation ∆t/t = ∆Φ (with c = 1) from example 11 to the casewhere g is not a constant. Integrating the equation dt/t = dΦ,we find that the time dilation factor is equal to e∆Φ. When ∆Φ issmall, e∆Φ ≈ 1 + ∆Φ, and we have a relative effect equal to ∆Φ.The total effect for a GPS satellite is thus (inserting factors of c forcalculation with SI units, and using positive signs for blueshifts)

1c2

(+∆Φ− v2

2

)= 5.2× 10−10 − 0.9× 10−10 ,

where the first term is gravitational and the second kinematic. Amore detailed analysis includes various time-varying effects, butthis is the constant part. For this reason, the atomic clocks aboardthe satellites are set to a frequency of 10.22999999543 MHz be-fore launching them into orbit; on the average, this is perceivedon the ground as 10.23 MHz. A more complete analysis of thegeneral relativity involved in the GPS system can be found in thereview article by Ashby.10

Self-check: Suppose that positioning a clock at a certain dis-tance from a certain planet produces a fractional change δ in the

10N. Ashby, “Relativity in the Global Positioning System,” http://www.

livingreviews.org/lrr-2003-1

60 Chapter 2 Geometry of Flat Spacetime

Page 61: General Relativity

rate at which time flows. In other words, the time dilation factoris 1 + δ. Now suppose that a second, identical planet is broughtinto the picture, at an equal distance from the clock. The clock ispositioned on the line joining the two planets’ centers, so that thegravitational field it experiences is zero. Is the fractional time dila-tion now approximately 0, or approximately 2δ? Why is this onlyan approximation?

f / Apparatus used for the test ofrelativistic time dilation describedin example 13. The promi-nent black and white blocks arelarge magnets surrounding a cir-cular pipe with a vacuum inside.(c) 1974 by CERN.

Section 2.2 Relativistic properties of Lorentz geometry 61

Page 62: General Relativity

g / Muons accelerated to nearly cundergo radioactive decay muchmore slowly than they wouldaccording to an observer at restwith respect to the muons. Thefirst two data-points (unfilledcircles) were subject to largesystematic errors.

h / The change in the frequencyof x-ray photons emitted by 57Feas a function of temperature,drawn after Pound And Rebka(1960). Dots are experimentalmeasurements. The solid curveis Pound and Rebka’s theoreticalcalculation using the Debye the-ory of the lattice vibrations witha Debye temperature of 420 de-grees C. The dashed line is onewith the slope calculated in thetext using a simplified treatmentof the thermodynamics. There isan arbitrary vertical offset in theexperimental data, as well as thetheoretical curves.

Large time dilation Example: 13The time dilation effect in the Hafele-Keating experiment was verysmall. If we want to see a large time dilation effect, we can’t doit with something the size of the atomic clocks they used; thekinetic energy would be greater than the total megatonnage ofall the world’s nuclear arsenals. We can, however, acceleratesubatomic particles to speeds at which γ is large. An early, low-precision experiment of this kind was performed by Rossi and Hallin 1941, using naturally occurring cosmic rays. Figure f shows a1974 experiment11 of a similar type which verified the time dila-tion predicted by relativity to a precision of about one part perthousand.

Muons were produced by an accelerator at CERN, near Geneva.A muon is essentially a heavier version of the electron. Muons un-dergo radioactive decay, lasting an average of only 2.197 µs be-fore they evaporate into an electron and two neutrinos. The 1974experiment was actually built in order to measure the magneticproperties of muons, but it produced a high-precision test of timedilation as a byproduct. Because muons have the same electriccharge as electrons, they can be trapped using magnetic fields.Muons were injected into the ring shown in figure f, circling aroundit until they underwent radioactive decay. At the speed at whichthese muons were traveling, they had γ = 29.33, so on the av-erage they lasted 29.33 times longer than the normal lifetime. Inother words, they were like tiny alarm clocks that self-destructedat a randomly selected time. Figure g shows the number of ra-dioactive decays counted, as a function of the time elapsed af-ter a given stream of muons was injected into the storage ring.The two dashed lines show the rates of decay predicted with andwithout relativity. The relativistic line is the one that agrees withexperiment.

Time dilation in the Pound-Rebka experiment Example: 14In the description of the Pound-Rebka experiment on page ??, Ipostponed the quantitative estimation of the frequency shift dueto temperature. Classically, one expects only a broadening ofthe line, since the Doppler shift is proportional to v‖/c, wherev‖, the component of the emitting atom’s velocity along the lineof sight, averages to zero. But relativity tells us to expect that ifthe emitting atom is moving, its time will flow more slowly, so thefrequency of the light it emits will also be systematically shifteddownward. This frequency shift should increase with tempera-ture. In other words, the Pound-Rebka experiment was designedas a test of general relativity (the equivalence principle), but thisspecial-relativistic effect is just as strong as the relativistic one,and needed to be accounted for carefully.

11Bailey at al., Nucl. Phys. B150(1979) 1

62 Chapter 2 Geometry of Flat Spacetime

Page 63: General Relativity

In Pound and Rebka’s paper describing their experiment,12 theyrefer to a preliminary measurement13 in which they carefully mea-sured this effect, showed that it was consistent with theory, andpointed out that a previous claim by Cranshaw et al. of havingmeasured the gravitational frequency shift was vitiated by theirfailure to control for the temperature dependence.

It turns out that the full Debye treatment of the lattice vibrations isnot really necessary near room temperature, so we’ll simplify thethermodynamics. At absolute temperature T , the mean transla-tional kinetic energy of each iron nucleus is (3/2)kT . The velocityis much less than c(= 1), so we can use the nonrelativistic expres-sion for kinetic energy, K = (1/2)mv2, which gives a mean valuefor v2 of 3kT/m. In the limit of v 1, time dilation produces achange in frequency by a factor of 1/γ, which differs from unityby approximately −v2/2. The relative time dilation is therefore−3kT/2m, or, in metric units, −3kT/2mc2. The vertical scalein figure h contains an arbitrary offset, since Pound and Rebka’smeasurements were the best absolute measurements to date ofthe frequency. The predicted slope of −3k/2mc2, however, isnot arbitrary. Plugging in 57 atomic mass units for m, we findthe slope to be 2.4 × 10−15, which, as shown in the figure is anexcellent approximation (off by only 10%) near room temperature.

2.2.1 Geodesics and stationary action

One way of characterizing geodesics in spacetime is by usingan action principle. This is similar to characterizing a geodesic inEuclidean space as a line of minimum length between two points.For a timelike geodesic from event P to event Q, we have a propertime τ . In Lorentz spacetime, this proper time is greater than itwould have been for any non-geodesic motion from P to Q. In curvedspacetime, we must weaken this statement somewhat. The propertime may not be a global maximum, but it is stationary. Stationaritymeans that if we vary the curve by some small amount, not movingany part of it by a coordinate distance greater than ε, then thechange in τ is of order ε2.

For spacelike geodesics in Lorentz spacetime, the proper lengthis stationary, but small deformations of the curve can either increaseor decrease the proper length. The stationary action approach doesnot work well for lightlike geodesics.

12Phys. Rev. Lett. 4 (1960) 33713Phys. Rev. Lett. 4 (1960) 274

Section 2.2 Relativistic properties of Lorentz geometry 63

Page 64: General Relativity

a / The light cone in 2+1 di-mensions.

b / The circle plays a privi-leged role in Euclidean geometry.When rotated, it stays the same.The pie slice is not invariant asthe circle is. A similar privilegedplace is occupied by the lightcone in Lorentz geometry. Undera Lorentz boost, the spacetimeparallelograms change, but thelight cone doesn’t.

2.3 The light coneGiven an event P, we can now classify all the causal relationships inwhich P can participate. In Newtonian physics, these relationshipsfell into two classes: P could potentially cause any event that lay inits future, and could have been caused by any event in its past. In aLorentz spacetime, we have a trichotomy rather than a dichotomy.There is a third class of events that are too far away from P in space,and too close in time, to allow any cause and effect relationship, sincecausality’s maximum velocity is c. Since we’re working in units inwhich c = 1, the boundary of this set is formed by the lines withslope ±1 on a (t,x) plot. This is referred to as the light cone, and inthe generalization from 1+1 to 3+1 dimensions, it literally becomesa (four-dimensional) cone. The terminology comes from the factthat light happens to travel at c, the maximum speed of cause andeffect. If we make a cut through the cone defined by a surface of con-stant time in P’s future, the resulting section is a sphere (analogousto the circle formed by cutting a three-dimensional cone), and thissphere is interpreted as the set of events on which P could have hada causal effect by radiating a light pulse outward in all directions.

Events lying inside one another’s light cones are said to havea timelike relationship. Events outside each other’s light cones arespacelike in relation to one another, and in the case where they lieon the surfaces of each other’s light cones the term is lightlike.

The light cone plays the same role in the Lorentz geometry thatthe circle plays in Euclidean geometry. The truth or falsehood ofpropositions in Euclidean geometry remains the same regardless ofhow we rotate the figures, and this is expressed by Euclid’s E3 as-serting the existence of circles, which remain invariant under rota-tion. Similarly, Lorentz boosts preserve light cones and truth ofpropositions in a Lorentz frame.

Self-check: Under what circumstances is the time-ordering ofevents P and Q preserved under a Lorentz boost?

In a uniform Lorentz spacetime, all the light cones line up likesoldiers with their axes parallel with one another. When gravity ispresent, however, this uniformity is disturbed in the vicinity of themasses that constitute the sources. The light cones lying near thesources tip toward the sources. Superimposed on top of this gravi-tational tipping together, recent observations have demonstrated asystematic tipping-apart effect which becomes significant on cosmo-logical distance scales. The parameter Λ that sets the strength ofthis effect is known as the cosmological constant. The cosmologi-cal constant is not related to the presence of any sources (such asnegative masses), and can be interpreted instead as a tendency forspace to expand over time on its own initiative. In the present era,the cosmological constant has overpowered the gravitation of theuniverse’s mass, causing the expansion of the universe to accelerate.

64 Chapter 2 Geometry of Flat Spacetime

Page 65: General Relativity

c / Light cones tip over fortwo reasons in general relativity:because of the presence ofmasses, which have gravita-tional fields, and because ofthe cosmological constant. Thetime and distance scales in thebottom figure are many orders ofmagnitude greater than those inthe top.

d / Example 15. Matter islifted out of a Newtonian blackhole with a bucket. The dashedline represents the point at whichthe escape velocity equals thespeed of light.

Self-check: In the bottom panel of figure c, can an observer lookat the properties of the spacetime in her immediate vicinity andtell how much her light cones are tipping, and in which direction?Compare with figure j on page 28.

A Newtonian black hole Example: 15In the case of a black hole, the light cone tips over so far that

the entire future timelike region lies within the black hole. If anobserver is present at such an event, then that observer’s en-tire potential future lies within the black hole, not outside it. Byexpanding on the logical consequences of this statement, we ar-rive at an example of relativity’s proper interpretation as a theoryof causality, not a theory of objects exerting forces on one an-other as in Newton’s vision of action at a distance, or Lorentz’soriginal ether-drag interpretation of the factor γ, in which lengthcontraction arose from a physical strain imposed on the atomscomposing a physical body.

Imagine a black hole from a Newtonian point of view, as proposedin 1783 by geologist John Michell. Setting the escape velocityequal to the speed of light, we find that this will occur for any grav-itating spherical body compact enough to have M/r > c2/2G.(A fully relativistic argument, as given in section 6.2, agrees onM/r ∝ c2/G, which is fixed by units. The correct unitless factordepends on the definition of r , which is flexible in general relativ-ity.) A flash of light emitted from the surface of such a Newtonianblack hole would fall back down like water from a fountain, butit would nevertheless be possible for physical objects to escape,e.g., if they were lifted out in a bucket dangling from a cable. If thecable is to support its own weight, it must have a tensile strengthper unit density of at least c2/2, which is about ten orders of mag-nitude greater than that of carbon nanotube fibers. (The factor of1/2 is not to be taken seriously, since it comes from a nonrela-tivistic calculation.)

The cause-and-effect interpretation of relativity tells us that thisNewtonian picture is incorrect. A physical object that approachesto within a distance r of a concentration of mass M, with M/rsufficiently large, has no causal future lying at larger values of r .The conclusion is that there is a limit on the tensile strength ofany substance, imposed purely by general relativity, and we canstate this limit without having to know anything about the physicalnature of the interatomic forces. A more complete treatment ofthe tension in the rope is given in example 5 on p. 279. Cf. alsohomework problem 4 and section 3.5.4, as well as some refer-ences given in the remark following problem 4.

Section 2.3 The light cone 65

Page 66: General Relativity

e / The rapidity, η = tanh−1 v , asa function of v .

f / Example 16.

2.3.1 Velocity addition

In classical physics, velocities add in relative motion. For exam-ple, if a boat moves relative to a river, and the river moves relativeto the land, then the boat’s velocity relative to the land is foundby vector addition. This linear behavior cannot hold relativistically.For example, if a spaceship is moving at 0.60c relative to the earth,and it launches a probe at 0.60c relative to itself, we can’t have theprobe moving at 1.20c relative to the earth, because this would begreater than the maximum speed of cause and effect, c. To see howto add velocities relativistically, we start by rewriting the Lorentztransformation as the matrix(

cosh η sinh ηsinh η cosh η

),

where η = tanh−1 v is called the rapidity. We are guaranteed thatthe matrix can be written in this form, because its area-preservingproperty says that the determinant equals 1, and cosh2 η−sinh2 η =1 is an identity of the hyperbolic trig functions. It is now straight-forward to verify that multiplication of two matrices of this formgives a third matrix that is also of this form, with η = η1 + η2. Inother words, rapidities add linearly; velocities don’t. In the exampleof the spaceship and the probe, the rapidities add as tanh−1 .60 +tanh−1 .60 = .693 + .693 = 1.386, giving the probe a velocity oftanh 1.386 = 0.88 relative to the earth. Any number of velocitiescan be added in this way, η1 + η2 + . . .+ ηn.

Self-check: Interpret the asymptotes of the graph in figure e.

Bell’s spaceship paradox Example: 16A difficult philosophical question is whether the time dilation and

length contractions predicted by relativity are “real.” This de-pends, of course, on what one means by “real.” They are frame-dependent, i.e., observers in different frames of reference dis-agree about them. But this doesn’t tell us much about their reality,since velocities are frame-dependent in Newtonian mechanics,but nobody worries about whether velocities are real. John Bell(1928-1990) proposed the following thought experiment to physi-cists in the CERN cafeteria, and found that nearly all of them gotit wrong. He took this as evidence that their intuitions had beenmisguided by the standard way of approaching this question ofthe reality of Lorentz contractions.

Let spaceships A and B accelerate as shown in figure f along astraight line. Observer C does not accelerate. The accelerations,as judged by C, are constant, and equal for the two ships. Eachship is equipped with a yard-arm, and a thread is tied betweenthe two arms. Does the thread break, due to Lorentz contraction?(We assume that the acceleration is gentle enough that the threaddoes not break simply because of its own inertia.)

66 Chapter 2 Geometry of Flat Spacetime

Page 67: General Relativity

The popular answer in the CERN cafeteria was that the threadwould not break, the reasoning being that Lorentz contraction isa frame-dependent effect, and no such contraction would be ob-served in A and B’s frame. The ships maintain a constant dis-tance from one another, so C merely disagrees with A and Babout the length of the thread, as well as other lengths like thelengths of the spaceships.

The error in this reasoning is that the accelerations of A and Bwere specified to be equal and constant in C’s frame, not in A andB’s. Bell’s interpretation is that the frame-dependence is a distrac-tion, that Lorentz contraction is in some sense a real effect, andthat it is therefore immediately clear that the thread must break,without even having to bother going into any other frame. To con-vince his peers in the cafeteria, however, Bell presumably neededto satisfy them as to the specific errors in their reasoning, and thisrequires that we consider the frame-dependence explicitly.

We can first see that it is impossible, in general, for different ob-servers to agree about what is meant by constant acceleration.Suppose that A and B agree with C about the constancy of theiracceleration. Then A and B experience a voyage in which therapidities of the stars around them (and of observer C) increaselinearly with time. As the rapidity approaches infinity, both C andthe stars approach the speed of light. But since A and C agree onthe magnitude of their velocity relative to one another, this meansthat A’s velocity as measured by C must approach c, and thiscontradicts the premise that C observes constant acceleration forboth ships. Therefore A and B do not consider their own acceler-ations to be constant.

A and B do not agree with C about simultaneity, and since theyalso do not agree that their accelerations are constant, they donot consider their own accelerations to be equal at a given mo-ment of time. Therefore the string changes its length, and thisis consistent with Bell’s original, simple answer, which did not re-quire comparing different frames of reference. To establish thatthe string comes under tension, rather than going slack, we canapply the equivalence principle. By the equivalence principle, anyexperiments done by A and B give the same results as if theywere immersed in a gravitational field. The leading ship B sees Aas experiencing a gravitational time dilation. According to B, theslowpoke A isn’t accelerating as rapidly as it should, causing thestring to break.

These ideas are closely related to the fact that general relativitydoes not admit any spacetime that can be interpreted as a uni-form gravitational field (see problem 6, p. 200).

Section 2.3 The light cone 67

Page 68: General Relativity

2.3.2 Logic

The trichotomous classification of causal relationships has in-teresting logical implications. In classical Aristotelian logic, everyproposition is either true or false, but not both, and given proposi-tions p and q, we can form propositions such as p ∧ q (both p andq) or p ∨ q (either p or q). Propositions about physical phenomenacan only be verified by observation. Let p be the statement thata certain observation carried out at event P gives a certain result,and similarly for q at Q. If PQ is spacelike, then the truth or false-hood of p ∧ q cannot be checked by physically traveling to P andQ, because no observer would be able to attend both events. Thetruth-value of p∧ q is unknown to any observer in the universe untila certain time, at which the relevant information has been able topropagate back and forth. What if P and Q lie inside two differentblack holes? Then the truth-value of p∧ q can never be determinedby any observer. Another example is the case in which P and Qare separated by such a great distance that, due to the acceleratingexpansion of the universe, their future light cones do not overlap.

We conclude that Aristotelian logic cannot be appropriately ap-plied to relativistic observation in this way. Some workers attempt-ing to construct a quantum-mechanical theory of gravity have sug-gested an even more radically observer-dependent logic, in whichdifferent observers may contradict one another on the truth-value ofa single proposition p1, unless they agree in advance on the list p2,p3, . . . of all the other propositions that they intend to test as well.We’ll return to these questions on page 234.

2.4 Experimental tests of Lorentz geometryWe’ve already seen, in section 1.2, a variety of evidence for the non-classical behavior of spacetime. We’re now in a position to discusstests of relativity more quantitatively. An up-to-date review of suchtests is given by Mattingly.14

One such test is that relativity requires the speed of light tobe the same in all frames of reference, for the following reasons.Compare with the speed of sound in air. The speed of sound is notthe same in all frames of reference, because the wave propagatesat a fixed speed relative to the air. An observer at who is movingrelative to the air will measure a different speed of sound. Light, onthe other hand, isn’t a vibration of any physical medium. Maxwell’sequations predict a definite value for the speed of light, regardlessof the motion of the source. This speed also can’t be relative toany medium. If the speed of light isn’t fixed relative to the source,and isn’t fixed relative to a medium, then it must be fixed relativeto anything at all. The only speed in relativity that is equal in all

14livingreviews.org/lrr-2005-5

68 Chapter 2 Geometry of Flat Spacetime

Page 69: General Relativity

a / An artist’s conception of agamma-ray burst, resulting froma supernova explosion.

frames of reference is c, so light must propagate at c. We will seeon page 129 that there is a deeper reason for this; relativity requiresthat any massless particle propagate at c. The requirement of v = cfor massless particles is so intimately hard-wired into the structureof relativity that any violation of it, no matter how tiny, would be ofgreat interest. Essentially, such a violation would disprove Lorentzinvariance, i.e., the invariance of the laws of physics under Lorentztransformations. There are two types of tests we could do: (1)test whether photons of all energies travel at the same speed, i.e.,whether the vacuum is dispersive; (2) test whether observers in allframes of reference measure the same speed of light.

2.4.1 Dispersion of the vacuum

Some candidate quantum-mechanical theories of gravity, suchas loop quantum gravity, predict a granular structure for spacetimeat the Planck scale,

√~G/c3 = 10−35 m, which one could imagine

might lead to deviations from v = 1 that would become more andmore significant for photons with wavelengths getting closer andcloser to that scale. Lorentz-invariance would then be an approxi-mation valid only at large scales. It turns out that the state of theart in loop quantum gravity is not yet sufficient to say whether ornot such an effect should exist.

Presently the best experimental tests of the invariance of thespeed of light with respect to wavelength come from astronomicalobservations of gamma-ray bursts, which are sudden outpourings ofhigh-energy photons, believed to originate from a supernova explo-sion in another galaxy. One such observation, in 2009,15 collectedphotons from such a burst, with a duration of 2 seconds, indicatingthat the propagation time of all the photons differed by no morethan 2 seconds out of a total time in flight on the order of ten bil-lion years, or about one part in 1017! A single superlative photon inthe burst had an energy of 31 GeV, and its arrival within the same2-second time window demonstrates Lorentz invariance over a vastrange of photon energies, contradicting heuristic estimates that hadbeen made by some researchers in loop quantum gravity.

2.4.2 Observer-independence of c

The constancy of the speed of light for observers in all frames ofreference was originally detected in 1887 when Michelson and Morleyset up a clever apparatus to measure any difference in the speed oflight beams traveling east-west and north-south. The motion ofthe earth around the sun at 110,000 km/hour (about 0.01% of thespeed of light) is to our west during the day. Michelson and Morleybelieved that light was a vibration of a physical medium, the ether,so they expected that the speed of light would be a fixed valuerelative to the ether. As the earth moved through the ether, they

15http://arxiv.org/abs/0908.1832

Section 2.4 Experimental tests of Lorentz geometry 69

Page 70: General Relativity

thought they would observe an effect on the velocity of light alongan east-west line. For instance, if they released a beam of light ina westward direction during the day, they expected that it wouldmove away from them at less than the normal speed because theearth was chasing it through the ether. They were surprised whenthey found that the expected 0.01% change in the speed of light didnot occur.

b / The Michelson-Morley experi-ment, shown in photographs, anddrawings from the original 1887paper. 1. A simplified draw-ing of the apparatus. A beam oflight from the source, s, is par-tially reflected and partially trans-mitted by the half-silvered mirrorh1. The two half-intensity parts ofthe beam are reflected by the mir-rors at a and b, reunited, and ob-served in the telescope, t. If theearth’s surface was supposed tobe moving through the ether, thenthe times taken by the two lightwaves to pass through the mov-ing ether would be unequal, andthe resulting time lag would bedetectable by observing the inter-ference between the waves whenthey were reunited. 2. In the realapparatus, the light beams werereflected multiple times. The ef-fective length of each arm wasincreased to 11 meters, whichgreatly improved its sensitivity tothe small expected difference inthe speed of light. 3. In anearlier version of the experiment,they had run into problems withits “extreme sensitiveness to vi-bration,” which was “so great thatit was impossible to see the in-terference fringes except at briefintervals . . . even at two o’clockin the morning.” They thereforemounted the whole thing on amassive stone floating in a pool ofmercury, which also made it pos-sible to rotate it easily. 4. A photoof the apparatus. Note that it isunderground, in a room with solidbrick walls.

Although the Michelson-Morley experiment was nearly two dec-ades in the past by the time Einstein published his first paper onrelativity in 1905, and Einstein did know about it,16 it’s unclear howmuch it influenced him. Michelson and Morley themselves were un-certain about whether the result was to be trusted, or whether sys-tematic and random errors were masking a real effect from the ether.There were a variety of competing theories, each of which couldclaim some support from the shaky data. Some physicists believedthat the ether could be dragged along by matter moving through it,which inspired variations on the experiment that were conducted onmountaintops in thin-walled buildings, (figure), or with one arm ofthe apparatus out in the open, and the other surrounded by massivelead walls. In the standard sanitized textbook version of the historyof science, every scientist does his experiments without any pre-conceived notions about the truth, and any disagreement is quicklysettled by a definitive experiment. In reality, this period of confu-

16J. van Dongen, http://arxiv.org/abs/0908.1545

70 Chapter 2 Geometry of Flat Spacetime

Page 71: General Relativity

sion about the Michelson-Morley experiment lasted for four decades,and a few reputable skeptics, including Miller, continued to believethat Einstein was wrong, and kept trying different variations of theexperiment as late as the 1920’s. Most of the remaining doubterswere convinced by an extremely precise version of the experimentperformed by Joos in 1930, although you can still find kooks on theinternet who insist that Miller was right, and that there was a vastconspiracy to cover up his results.

c / Dayton Miller thought that the result of the Michelson-Morley ex-periment could be explained because the ether had been pulled along bythe dirt, and the walls of the laboratory. This motivated him to carry out aseries of experiments at the top of Mount Wilson, in a building with thinwalls.

Before Einstein, some physicists who did believe the negativeresult of the Michelson-Morley experiment came up with explana-tions that preserved the ether. In the period from 1889 to 1895, bothLorentz and George FitzGerald suggested that the negative resultof the Michelson-Morley experiment could be explained if the earth,and every physical object on its surface, was contracted slightly bythe strain of the earth’s motion through the ether. Thus althoughLorentz developed all the mathematics of Lorentz frames, and gotthem named after himself, he got the interpretation wrong.

2.4.3 Lorentz violation by gravitational forces

The tests described in sections 2.4.1 and 2.4.2 both involve thebehavior of light, i.e., they test whether or not electromagnetismreally has the exact Lorentz-invariant behavior contained implicitlyin Maxwell’s equations. In the jargon of the field, they test Lorentzinvariance in the “photon sector.” Since relativity is a theory ofgravity, it is natural to ask whether the Lorentz invariance holdsfor gravitational forces as well as electromagnetic ones. If Lorentzinvariance is violated by gravity, then the strength of gravitationalforces might depend on the observer’s motion through space, rela-tive to some fixed reference frame analogous to that of the ether.Historically, gravitational Lorentz violations have been much moredifficult to test, since gravitational forces are so weak, and the firsthigh-precision data were obtained by Nordtvedt and Will in 1957,70 years after Michelson and Morley. Nordtvedt and Will measured

Section 2.4 Experimental tests of Lorentz geometry 71

Page 72: General Relativity

e / The matter interferometerused by Chung et al. Each atom’swavefunction is split into twoparts, which travel along twodifferent paths (solid and dashedlines).

d / The results of the measure-ment of g by Chung et al., sec-tion 2.4.3. The experiment wasdone on the Stanford Universitycampus, surrounded by the Pa-cific ocean and San FranciscoBay, so it was subject to vary-ing gravitational from both astro-nomical bodies and the rising andfalling ocean tides. Once both ofthese effects are subtracted outof the data, there is no Lorentz-violating variation in g due tothe earth’s motion through space.Note that the data are broken upinto three periods, with gaps ofthree months and four years sep-arating them. (c) APS, used un-der the U.S. fair use exception tocopyright.

the strength of the earth’s gravitational field as a function of time,and found that it did not vary on a 24-hour cycle with the earth’srotation, once tidal effects had been accounted for. Further con-straints come from data on the moon’s orbit obtained by reflectinglaser beams from a mirror left behind by the Apollo astronauts.

A recent high-precision laboratory experiment was done in 2009by Chung et al.17 They constructed an interferometer in a verti-cal plane that is conceptually similar to a Michelson interferometer,except that it uses cesium atoms rather than photons. That is,the light waves of the Michelson-Morley experiment are replaced byquantum-mechanical matter waves. The roles of the half-silveredand fully silvered mirrors are filled by lasers, which kick the atomselectromagnetically. Each atom’s wavefunction is split into twoparts, which travel by two different paths through spacetime, even-tually reuniting and interfering. The result is a measurement of gto about one part per billion. The results, shown in figure d, put astrict limit on violations of Lorentz geometry by gravity.

2.5 Three spatial dimensionsNew and nontrivial phenomena arise when we generalize from 1+1dimensions to 3+1.

2.5.1 Lorentz boosts in three dimensions

How does a Lorentz boost along one axis, say x, affect the othertwo spatial coordinates y and z?

First, we can rule out the possibility that such a transformationcould have various terms such as t′ = . . .+(. . .)y+ . . .. For example,

17arxiv.org/abs/0905.1929

72 Chapter 2 Geometry of Flat Spacetime

Page 73: General Relativity

a / A boost along x followedby a boost along y results intangling up of the x and y coordi-nates, so the result is not just aboost but a boost plus a rotation.

if the t coefficient was positive for v > 0, then the laws of physicswould be different from the laws that applied in a universe wherethe y or t axis was inverted, but this would violate parity or time-reversal symmetry. This establishes that observers in the two framesagree on the directions of the y and z axes and on simultaneity alongthose axes when they coincide.

Now suppose that two observers, in motion relative to one an-other along the x axis, each carry a stick, represented by line seg-ments AB and CD, oriented along the y axis, such that the basesof the sticks A and C coincide at some time. Due to the vanishingof the types of terms in the transformation referred to above, theyagree that B and D are collinear with A (and C) at this time. Thenby O3 and O4, either B lies between A and D, D lies between A andB, or B=D. That is, they must agree whether the sticks are equalin length or, if not, then on whose is longer. This would violate L1,isotropy of space, since it would distinguish +x from −x.

Another simple way to obtain this result is as follows. We havealready proved that area in the (t,x) plane is preserved. The sameproof applies to volume in the spaces (t,x, y) and (t,x, z), hencelengths in the y and z directions are preserved. (The proof does notapply to volume in, e.g., (x, y, z) space, because the x transformationdepends on t, and therefore if we are given a region in (x, y, z), wedo not have enough information to say how it will change under aLorentz boost.)

The complete form of the transformation L(vx), a Lorentz boostalong the x axis with velocity v, is therefore:

t′ = γt+ vγx

x′ = vγt+ γx

y′ = y

z′ = z

Based on the trivial nature of this generalization, it might seemas though no qualitatively new considerations would arise in 3+1dimensions as compared with 1+1. To see that this is not the case,consider figure a. A boost along the x axis tangles up the x andt coordinates. A y-boost mingles y and t. Therefore consecutiveboosts along x and y can cause x and y to mix. The result, aswe’ll see in more detail below, is that two consecutive boosts alongnon-collinear axes are not equivalent to a single boost; they areequivalent to a boost plus a spatial rotation. The remainder of thissection discusses this effect, known as Thomas precession, in moredetail; it can be omitted on a first reading.

Self-check: Apply similar reasoning to a Galilean boost.

Section 2.5 Three spatial dimensions 73

Page 74: General Relativity

b / Inertial devices for main-taining a direction in space: 1.A ring laser. 2. The photon ina perfectly reflective sphericalcavity. 3. A gyroscope.

c / A ring laser gyroscopebuilt for use in inertial guidance ofaircraft.

2.5.2 Gyroscopes and the equivalence principle

To see how this mathematical fact would play out as a physicaleffect, we need to consider how to make a physical manifestation ofthe concept of a direction in space.

In two space dimensions, we can construct a ring laser, b/1,which in its simplest incarnation is a closed loop of optical fiberwith a bidirectional laser inserted in one place. Coherent light tra-verses the loop simultaneously in both directions, interfering in abeat pattern, which can be observed by sampling the light at somepoint along the loop’s circumference. If the loop is rotated in itsown plane, the interference pattern is altered, because the beam-sampling device is in a different place, and the path lengths traveledby the two beams has been altered. This phase shift is called theSagnac effect, after M. Georges Sagnac, who observed the effect in1913 and interpreted it, incorrectly, as evidence for the existence ofthe aether.18 The loop senses its own angular velocity relative toan inertial reference frame. If we transport the loop while alwayscarefully adjusting its orientation so as to prevent phase shifts, thenits orientation has been preserved. The atomic clocks used in theHafele-Keating atomic-clock experiment described on page ?? weresubject to the Sagnac effect.

In three spatial dimensions, we could build a spherical cavitywith a reflective inner surface, and release a photon inside, b/2.

In reality, the photon-in-a-cavity is not very practical. The pho-ton would eventually be absorbed or scattered, and it would also bedifficult to accurately initialize the device and read it out later. Amore practical tool is a gyroscope. For example, one of the classictests of general relativity is the 2007 Gravity Probe B experiment(discussed in detail on pages 170 and 214), in which four gyro-scopes aboard a satellite were observed to precess due to special-and general-relativistic effects.

The gyroscope, however, is not so obviously a literal implementa-tion of our basic concept of a direction. How, then, can we be surethat its behavior is equivalent to that of the photon-in-a-cavity?We could, for example, carry out a complete mathematical develop-ment of the angular momentum vector in relativity.19 The equiva-lence principle, however, allows us to bypass such technical details.Suppose that we seal the two devices inside black boxes, with iden-tical external control panels for initializing them and reading themout. We initialize them identically, and then transport them alongside-by-side world-lines. Nonrelativistically, both the mechanicalgyroscope and the photon-gyroscope would maintain absolute, fixeddirections in space. Relativistically, they will not necessarily main-

18Comptes rendus de l’Academie des science 157 (1913) 70819This is done, for example, in Misner, Thorne, and Wheeler, Gravitation, pp.

157-159.

74 Chapter 2 Geometry of Flat Spacetime

Page 75: General Relativity

d / Nonrelativistically, the gy-roscope should not rotate as longas the forces from the hammerare all transmitted to it at itscenter of mass.

tain their orientations. For example, we’ve already seen in section2.5.1 that there are reasons to expect that their orientations willchange if they are subjected to accelerations that are not all alongthe same line. Because relativity is a geometrical theory of space-time, this difference between the classical and relativistic behaviormust be determinable from purely geometrical considerations, suchas the shape of the world-line. If it depended on something else,then we could conceivably see a disagreement in the outputs of thetwo instruments, but this would violate the equivalence principle.

Suppose there were such a discrepancy. That discrepancy wouldbe a physically measurable property of the spacetime region throughwhich the two gyroscopes had been transported. The effect wouldhave a certain magnitude and direction, so by collecting enough datawe could map it out as vector field covering that region of spacetime.This field evidently causes material particles to accelerate, since ithas an effect on the mechanical gyroscope. Roughly speaking (thereasoning will be filled in more rigorously on page 142), the factthat this field acts differently on the two gyroscopes is like getting anon-null result from an Eotvos experiment, and it therefore violatesthe equivalence principle. We conclude that gyroscopes b/2 andb/3 are equivalent. In other words, there can only be one uniquelydefined notion of direction, and the details of how it is implementedare irrelevant.

2.5.3 Boosts causing rotations

As a quantitative example, consider the following thought ex-periment. Put a gyroscope in a box, and send the box around thesquare path shown in figure d at constant speed. The gyroscope de-fines a local coordinate system, which according to classical physicswould maintain its orientation. At each corner of the square, thebox has its velocity vector changed abruptly, as represented by thehammer. We assume that the hits with the hammer are transmittedto the gyroscope at its center of mass, so that they do not resultin any torque. Nonrelativistically, if the set of gyroscopes travelsonce around the square, it should end up at the same place andin the same orientation, so that the coordinate system it defines isidentical with the original one.

For notation, let L(vx) indicate the boost along the x axis de-scribed by the transformation on page 72. This is a transformationthat changes to a frame of reference moving in the negative x direc-tion compared to the original frame. A particle considered to be atrest in the original frame is described in the new frame as movingin the positive x direction. Applying such an L to a vector p, wecalculate Lp, which gives the coordinates of the event as measuredin the new frame. An expression like MLp is equivalent by asso-ciativity to M(Lp), i.e., ML represents applying L first, and thenM .

Section 2.5 Three spatial dimensions 75

Page 76: General Relativity

In this notation, the hammer strikes can be represented by aseries of four Lorentz boosts,

T = L(vx) L(vy) L(−vx) L(−vy) ,

where we assume that the square has negligible size, so that all fourLorentz boosts act in a way that preserves the origin of the coordi-nate systems. (We have no convenient way in our notation L(. . .) todescribe a transformation that does not preserve the origin.) Thefirst transformation, L(−vy), changes coordinates measured by theoriginal gyroscope-defined frame to new coordinates measured bythe new gyroscope-defined frame, after the box has been acceler-ated in the positive y direction.

e / A page from one of Einstein’snotebooks.

The calculation of T is messy, and to be honest, I made a seriesof mistakes when I tried to crank it out by hand. Calculations inrelativity have a reputation for being like this. Figure e shows a pagefrom one of Einstein’s notebooks, written in fountain pen around1913. At the bottom of the page, he wrote “zu umstaendlich,”meaning “too involved.” Luckily we live in an era in which this sortof thing can be handled by computers. Starting at this point in thebook, I will take appropriate opportunities to demonstrate how touse the free and open-source computer algebra system Maxima to

76 Chapter 2 Geometry of Flat Spacetime

Page 77: General Relativity

keep complicated calculations manageable. The following Maximaprogram calculates a particular element of the matrix T .

1 /* For convenience, define gamma in terms of v: */

2 gamma:1/sqrt(1-v*v);

3 /* Define Lx as L(x-hat), Lmx as L(-x-hat), etc.: */

4 Lx:matrix([gamma, gamma*v, 0],

5 [gamma*v, gamma, 0],

6 [0, 0, 1]);

7 Ly:matrix([gamma, 0, gamma*v],

8 [0, 1, 0],

9 [gamma*v, 0, gamma]);

10 Lmx:matrix([gamma, -gamma*v, 0],

11 [-gamma*v, gamma, 0],

12 [0, 0, 1]);

13 Lmy:matrix([gamma, 0, -gamma*v],

14 [0, 1, 0],

15 [-gamma*v, 0, gamma]);

16 /* Calculate the product of the four matrices: */

17 T:Lx.Ly.Lmx.Lmy;

18 /* Define a column vector along the x axis: */

19 P:matrix([0],[1],[0]);

20 /* Find the result of T acting on this vector,

21 expressed as a Taylor series to second order in v: */

22 taylor(T.P,v,0,2);

Statements are terminated by semicolons, and comments are writ-ten like /* ... */ On line 2, we see a symbolic definition of thesymbol gamma in terms of the symbol v. The colon means “is de-fined as.” Line 2 does not mean, as it would in most programminglanguages, to take a stored numerical value of v and use it to cal-culate a numerical value of γ. In fact, v does not have a numericalvalue defined at this point, nor will it ever have a numerical valuedefined for it throughout this program. Line 2 simply means thatwhenever Maxima encounters the symbol gamma, it should take it asan abbreviation for the symbol 1/sqrt(1-v*v). Lines 5-16 definesome 3×3 matrices that represent the L transformations. The basisis t, x, y. Line 18 calculates the product of the four matrices; thedots represent matrix multiplication. Line 23 defines a vector alongthe x axis, expressed as a column matrix (three rows of one columneach) so that Maxima will know how to operate on it using matrixmultiplication by T .

Finally line 26 outputs20 the result of T acting on P:

19 [ 0 + . . . ]

20I’ve omitted some output generated automatically from the earlier steps inthe computation. The (%o9) indicates that this is Maxima’s output from theninth and final step.

Section 2.5 Three spatial dimensions 77

Page 78: General Relativity

20 [ ]

21 (%o9)/T/ [ 1 + . . . ]

22 [ ]

23 [ 2 ]

24 [ - v + . . . ]

In other words,

T

010

=

01−v2

+ . . . ,

where . . . represents higher-order terms in v. Suppose that we usethe initial frame of reference, before T is applied, to determine thata particular reference point, such as a distant star, is along the xaxis. Applying T , we get a new vector TP, which we find has a non-vanishing y component approximately equal to −v2. This result isentirely unexpected classically. It tells us that the gyroscope, ratherthan maintaining its original orientation as it would have done clas-sically, has rotated slightly. It has precessed in the counterclockwisedirection in the x−y plane, so that the direction to the star, as mea-sured in the coordinate system defined by the gyroscope, appearsto have rotated clockwise. As the box moved clockwise around thesquare, the gyroscope has apparently rotated by a counterclockwiseangle χ ≈ v2 about the z axis. We can see that this is a purelyrelativistic effect, since for v 1 the effect is small. For historicalreasons discussed in section 2.5.4, this phenomenon is referred to asthe Thomas precession.

The particular features of this square geometry are not necessary.I chose them so that (1) the boosts would be along the Cartesianaxes, so that we would be able to write them down easily; (2) it isclear that the effect doesn’t arise from any asymmetric treatmentof the spatial axes; and (3) the change in the orientation of thegyroscope can be measured at the same point in space, e.g., bycomparing it with a twin gyroscope that stays at home. In general:

A gyroscope transported around a closed loop in flat space-time changes its orientation compared with one that is notaccelerated.

This is a purely relativistic effect, since a Newtonian gyro-scope does not change its axis of rotation unless subjected toa torque; if the boosts are accomplished by forces that act atthe gyroscope’s center of mass, then there is no nonrelativisticexplanation for the effect.

The effect can occur in the absence of any gravitational fields.That is, this is a phenomenon of special relativity.

78 Chapter 2 Geometry of Flat Spacetime

Page 79: General Relativity

f / The velocity disk.

g / Two excursions in a rocket-ship: one along the y axis andone along x .

The composition of two or more Lorentz boosts along differentaxes is not equivalent to a single boost; it is equivalent to aboost plus a spatial rotation.

Lorentz boosts do not commute, i.e., it makes a difference whatorder we perform them in. Even if there is almost no time lagbetween the first boost and the second, the order of the boostsmatters. If we had applied the boosts in the opposite order,the handedness of the effect would have been reversed.

Self-check: If Lorentz boosts did commute, what would be theconsequences for the expression L(vx) L(vy) L(−vx) L(−vy)?

The velocity disk

Figure f shows a useful way of visualizing the combined effectsof boosts and rotations in 2+1 dimensions. The disk depicts allpossible states of motion relative to some arbitrarily chosen frameof reference. Lack of motion is represented by the point at thecenter. A point at distance v from the center represents motion atvelocity v in a particular direction in the x − y plane. By drawinglittle axes at a particular point, we can represent a particular frameof reference: the frame is in motion at some velocity, with its ownx and y axes are oriented in a particular way.

It turns out to be easier to understand the qualitative behaviorof our mysterious rotations if we switch from the low-velocity limitto the contrary limit of ultrarelativistic velocities. Suppose we havea rocket-ship with an inertial navigation system consisting of twogyroscopes at right angles to one another. We first accelerate theship in the y direction, and the acceleration is steady in the sensethat it feels constant to observers aboard the ship. Since it is rapidi-ties, not velocities, that add linearly, this means that as an observeraboard the ship reads clock times τ1, τ2, . . . , all separated by equalintervals ∆τ , the ship’s rapidity changes at a constant rate, η1, η2,. . . . This results in a series of frames of reference that appear closerand closer together on the diagram as the ship approaches the speedof light, at the edge of the disk. We can start over from the centeragain and repeat the whole process along the x axis, resulting ina similar succession of frames. In both cases, the boosts are beingapplied along a single line, so that there is no rotation of the x andy axes.

Now suppose that the ship were to accelerate along a route likethe one shown in figure h. It first accelerates along the y axis at aconstant rate (again, as judged by its own sensors), until its velocityis very close to the speed of light, A. It then accelerates, again ata self-perceived constant rate and with thrust in a fixed directionas judged by its own gyroscopes, until it is moving at the sameultrarelativistic speed in the x direction, B. Finally, it deceleratesin the x direction until it is again at rest, O. This motion traces out

Section 2.5 Three spatial dimensions 79

Page 80: General Relativity

h / A round-trip involving ul-trarelativistic velocities. All threelegs are at constant acceleration.

i / In the limit where A and Bare ultrarelativistic velocities, legAB is perpendicular to the edgeof the velocity disk. The result isthat the x − y frame determinedby the ship’s gyroscopes hasrotated by 90 degrees by the timeit gets home.

a clockwise loop on the velocity disk. The motion in space is alsoclockwise.

We might naively think that the middle leg of the trip, from Ato B, would be a straight line on the velocity disk, but this can’t bethe case. First, we know that non-collinear boosts cause rotations.Traveling around a clockwise path causes counterclockwise rotation,and vice-versa. Therefore an observer in the rest frame O sees theship (and its gyroscopes) as rotating as it moves from A to B. Theship’s trajectory through space is clockwise, so according to O theship rotates counterclockwise as it goes A to B. The ship is alwaysfiring its engines in a fixed direction as judged by its gyroscopes, butaccording to O the ship is rotating counterclockwise, its thrust isprogressively rotating counterclockwise, and therefore its trajectoryturns counterclockwise. We conclude that leg AB on the velocitydisk is concave, rather than being a straight-line hypotenuse of atriangle OAB.

We can also determine, by the following argument, that leg ABis perpendicular to the edge of the disk where it touches the edge ofthe disk. In the transformation from frame A to frame O, y coordi-nates are dilated by a factor of γ, which approaches infinity in thelimit we’re presently considering. Observers aboard the rocket-ship,occupying frame A, believe that their task is to fire the rocket’sengines at an angle of 45 degrees with respect to the y axis, so asto eliminate their velocity with respect to the origin, and simulta-neously add an equal amount of velocity in the x direction. This45-degree angle in frame A, however, is not a 45-degree angle inframe O. From the stern of the ship to its bow we have displace-ments ∆x and ∆y, and in the transformation from A to O, ∆yis magnified almost infinitely. As perceived in frame O, the ship’sorientation is almost exactly antiparallel to the y axis.21

As the ship travels from A to B, its orientation (as judged inframe O) changes from −y to x. This establishes, in a much moredirect fashion, the direction of the Thomas precession: its handed-ness is contrary to the handedness of the direction of motion. Wecan now also see something new about the fundamental reason forthe effect. It has to do with the fact that observers in differentstates of motion disagree on spatial angles. Similarly, imagine thatyou are a two-dimensional being who was told about the existenceof a new, third, spatial dimension. You have always believed thatthe cosine of the angle between two unit vectors u and v is given by

21Although we will not need any more than this for the purposes of our presentanalysis, a longer and more detailed discussion by Rhodes and Semon, www.

bates.edu/~msemon/RhodesSemonFinal.pdf, Am. J. Phys. 72(7)2004, showsthat this type of inertially guided, constant-thrust motion is always representedon the velocity disk by an arc of a circle that is perpendicular to the disk at itsedge. (We consider a diameter of the disk to be the limiting case of a circle withinfinite radius.)

80 Chapter 2 Geometry of Flat Spacetime

Page 81: General Relativity

the vector dot product uxvx + uyvy. If you were allowed to explorea two-dimensional projection of a three-dimensional scene, e.g., onthe flat screen of a television, it would seem to you as if all theangles had been distorted. You would have no way to interpret thevisual conventions of perspective. But once you had learned aboutthe existence of a z axis, you would realize that these angular dis-tortions were happening because of rotations out of the x− y plane.Such rotations really conserve the quantity uxvx+uyvy +uzvz; onlybecause you were ignoring the uzvz term did it seem that angleswere not being preserved. Similarly, the generalization from threeEuclidean spatial dimensions to 3+1-dimensional spacetime meansthat three-dimensional dot products are no longer conserved.

The general low-v limit

Let’s find the low-v limit of the Thomas precession in general,not just in the highly artificial special case of χ ≈ v2 for the exampleinvolving the four hammer hits. To generalize to the case of smoothacceleration, we first note that the rate of precession dχ/ dt musthave the following properties.

It is odd under a reversal of the direction of motion, v→ −v.(This corresponds to sending the gyroscope around the squarein the opposite direction.)

It is odd under a reversal of the acceleration due to the secondboost, a→ −a.

It is a rotation about the spatial axis perpendicular to theplane of the v and a vectors, in the opposite direction com-pared to the handedness of the curving trajectory.

It is approximately linear in v and a, for small v and a.

The only rotationally invariant mathematical operation that hasthese symmetry properties is the vector cross product, so the rateof precession must be ka× v, where k > 0 is nearly independent ofv and a for small v and a.

To pin down the value of k, we need to find a connection be-tween our two results: χ ≈ v2 for the four hammer hits, anddχ/ dt ≈ ka×v for smooth acceleration. We can do this by consider-ing the physical significance of areas on the velocity disk. As shownin figure j, the rotation χ due to carrying the velocity around theboundary of a region is additive when adjacent regions are joined to-gether. We can therefore find χ for any region by breaking the regiondown into elements of area dA and integrating their contributionsdχ. What is the relationship between dA and dχ? The velocitydisk’s structure is nonuniform, in the sense that near the edge ofthe disk, it takes a larger boost to move a small distance. But we’reinvestigating the low-velocity limit, and in the low-velocity region

Section 2.5 Three spatial dimensions 81

Page 82: General Relativity

j / If the crack between thetwo areas is squashed flat, thetwo pieces of the path on theinterior coincide, and their contri-butions to the precession cancelout (v → −v, but a → +a, soa × v → −a × v). Therefore theprecession χ obtained by goingaround the outside is equal to thesum χ1 + χ2 of the precessionsthat would have been obtained bygoing around the two parts.

k / States in hydrogen are la-beled with their ` and s quantumnumbers, representing theirorbital and spin angular momentain units of ~. The state withs = +1/2 has its spin angularmomentum aligned with its orbitalangular momentum, while thes = −1/2 state has the twoangular momenta in oppositedirections. The direction andorder of magnitude of the splittingbetween the two ` = 1 statesis successfully explained bymagnetic interactions with theproton, but the calculated effectis too big by a factor of 2. Therelativistic Thomas precessioncancels out half of the effect.

near the center of the disk, the disk’s structure is approximatelyuniform. We therefore expect that there is an approximately con-stant proportionality factor relating dA and dχ at low velocities.The example of the hammer corresponds geometrically to a squarewith area v2, so we find that this proportionality factor is unity,dA ≈ dχ.

To relate this to smooth acceleration, consider a particle per-forming circular motion with period T , which has |a×v| = 2πv2/T .Over one full period of the motion, we have χ =

∫k|a × v| dt =

2πkv2, and the particle’s velocity vector traces a circle of area A =πv2 on the velocity disk. Equating A and χ, we find k = 1/2. Theresult is that in the limit of low velocities, the rate of rotation is

Ω ≈ 1

2a× v ,

where Ω is the angular velocity vector of the rotation. In the specialcase of circular motion, this can be written as Ω = (1/2)v2ω, whereω = 2π/T is the angular frequency of the motion.

2.5.4 An experimental test: Thomas precession in hydrogen

If we want to see this precession effect in real life, we should lookfor a system in which both v and a are large. An atom is such asystem.

The Bohr model, introduced in 1913, marked the first quantita-tively successful, if conceptually muddled, description of the atomicenergy levels of hydrogen. Continuing to take c = 1, the over-allscale of the energies was calculated to be proportional to mα2, wherem is the mass of the electron, and α = ke2/~ ≈ 1/137, known as thefine structure constant, is essentially just a unitless way of express-ing the coupling constant for electrical forces. At higher resolution,each excited energy level is found to be split into several sub-levels.The transitions among these close-lying states are in the millime-ter region of the microwave spectrum. The energy scale of this finestructure is ∼ mα4. This is down by a factor of α2 compared to thevisible-light transitions, hence the name of the constant. Uhlenbeckand Goudsmit showed in 1926 that a splitting on this order of mag-nitude was to be expected due to the magnetic interaction betweenthe proton and the electron’s magnetic moment, oriented along itsspin. The effect they calculated, however, was too big by a factorof two.

The explanation of the mysterious factor of two had in fact beenimplicit in a 1916 calculation by Willem de Sitter, one of the firstapplications of general relativity. De Sitter treated the earth-moonsystem as a gyroscope, and found the precession of its axis of rota-tion, which was partly due to the curvature of spacetime and partlydue to the type of rotation described earlier in this section. Theeffect on the motion of the moon was noncumulative, and was only

82 Chapter 2 Geometry of Flat Spacetime

Page 83: General Relativity

about one meter, which was much too small to be measured at thetime. In 1927, however, Llewellyn Thomas applied similar reason-ing to the hydrogen atom, with the electron’s spin vector playingthe role of gyroscope. Since gravity is negligible here, the effect hasnothing to do with curvature of spacetime, and Thomas’s effect cor-responds purely to the special-relativistic part of de Sitter’s result.It is simply the rotation described above, with Ω = (1/2)v2ω. Al-though Thomas was not the first to calculate it, the effect is knownas Thomas precession. Since the electron’s spin is ~/2, the energysplitting is ±(~/2)Ω, depending on whether the electron’s spin is inthe same direction as its orbital motion, or in the opposite direc-tion. This is less than the atom’s gross energy scale ~ω by a factorof v2/2, which is ∼ α2. The Thomas precession cancels out half ofthe magnetic effect, bringing theory in agreement with experiment.

Uhlenbeck later recalled: “...when I first heard about [the Thomasprecession], it seemed unbelievable that a relativistic effect couldgive a factor of 2 instead of something of order v/c... Even thecognoscenti of relativity theory (Einstein included!) were quite sur-prised.”

Section 2.5 Three spatial dimensions 83

Page 84: General Relativity

Problems

1 Suppose that we don’t yet know the exact form of the Lorentztransformation, but we know based on the Michelson-Morley exper-iment that the speed of light is the same in all inertial frames, andwe’ve already determined, e.g., by arguments like those on p. 72,that there can be no length contraction in the direction perpendic-ular to the motion. We construct a “light clock,” consisting simplyof two mirrors facing each other, with a light pulse bouncing backand forth between them.(a) Suppose this light clock is moving at a constant velocity v in thedirection perpendicular to its own optical arm, which is of length L.Use the Pythagorean theorem to prove that the clock experiences atime dilation given by γ = 1/

√1− v2, thereby fixing the time-time

portion of the Lorentz transformation.(b) Why is it significant for the interpretation of special relativitythat the result from part a is independent of L?(c) Carry out a similar calculation in the case where the clock moveswith constant acceleration a as measured in some inertial frame. Al-though the result depends on L, prove that in the limit of small L,we recover the earlier constant-velocity result, with no explicit de-pendence on a.

Remark: Some authors state a “clock postulate” for special relativity, whichsays that for a clock that is sufficiently small, the rate at which it runs de-pends only on v, not a (except in the trivial sense that v and a are relatedby calculus). The result of part c shows that the clock “postulate” is really atheorem, not a statement that is logically independent of the other postulatesof special relativity. Although this argument only applies to a particular fam-ily of light clocks of various sizes, one can also make any small clock into anacceleration-insensitive clock, by attaching an accelerometer to it and apply-ing an appropriate correction to compensate for the clock’s observed sensitivityto acceleration. (It’s still necessary for the clock to be small, since otherwisethe lack of simultaneity in relativity makes it impossible to describe the wholeclock as having a certain acceleration at a certain instant.) Farley at al.22 haveverified the “clock postulate” to within 2% for the radioactive decay of muonswith γ ∼ 12 being accelerated by magnetic fields at 5 × 1018 m/s2. Some peo-ple get confused by this acceleration-independent property of small clocks andthink that it contradicts the equivalence principle. For a good explanation, seehttp://math.ucr.edu/home/baez/physics/Relativity/SR/clock.html.

. Solution, p. 379

22Nuovo Cimento 45 (1966) 281

84 Chapter 2 Geometry of Flat Spacetime

Page 85: General Relativity

A graph from the paper byIijima, showing the time differ-ence between the two clocks.One clock was kept at MitakaObservatory, at 58 m above sealevel. The other was movedback and forth between a secondobservatory, Norikura CoronaStation, and the peak of theNorikura volcano, 2876 m abovesea level. The plateaus on thegraph are data from the periodswhen the clocks were comparedside by side at Mitaka. Thedifference between one plateauand the next is the gravitationaltime dilation accumulated duringthe period when the mobile clockwas at the top of Norikura.

2 Some of the most conceptually direct tests of relativistic timedilation were carried out by comparing the rates of twin atomicclocks, one left on a mountaintop for a certain amount of time, theother in a nearby valley below.23 Unlike the clocks in the Hafele-Keating experiment, these are stationary for almost the entire dura-tion of the experiment, so any time dilation is purely gravitational,not kinematic. One could object, however, that the clocks are notreally at rest relative to one another, due to the earth’s rotation.This is an example of how the distinction between gravitationaland kinematic time dilations is frame-dependent, since the effect ispurely gravitational in the rotating frame, where the gravitationalfield is reduced by the fictitious centrifugal force. Show that, in thenon-rotating frame, the ratio of the kinematic effect to the gravi-tational one comes out to be 2.8 × 10−3 at the latitude of Tokyo.This small value indicates that the experiment can be interpretedas a very pure test of the gravitational time dilation effect. To cal-culate the effect, you will need to use the fact that, as discussedon p. 33, gravitational redshifts can be interpreted as gravitationaltime dilations. . Solution, p. 379

3 (a) On p. 82 (see figure j), we showed that the Thomas pre-cession is proportional to area on the velocity disk. Use a similarargument to show that the Sagnac effect (p. 74) is proportional tothe area enclosed by the loop.(b) Verify this more directly in the special case of a circular loop.(c) Show that a light clock of the type described in problem 1 isinsensitive to rotation with constant angular velocity.(d) Connect these results to the commutativity and transitivity as-sumptions in the Einstein clock synchronization procedure describedon p. 359. . Solution, p. 379

4 Example 15 on page 65 discusses relativistic bounds on theproperties of matter, using the example of pulling a bucket out of ablack hole. Derive a similar bound by considering the possibility ofsending signals out of the black hole using longitudinal vibrations ofa cable, as in the child’s telephone made of two tin cans connectedby a piece of string.

Remark: Surprisingly subtle issues can arise in such calculations; see A.Y.Shiekh, Can. J. Phys. 70, 458 (1992). For a quantitative treatment of a danglingrope in relativity, see Greg Egan, “The Rindler Horizon,” http://gregegan.

customer.netspace.net.au/SCIENCE/Rindler/RindlerHorizon.html.

5 The Maxima program on page 77 demonstrates how to mul-tiply matrices and find Taylor series. Apply this technique to thefollowing problem. For successive Lorentz boosts along the same

23L. Briatore and S. Leschiutta, “Evidence for the earth gravitational shift bydirect atomic-time-scale comparison,” Il Nuovo Cimento B, 37B (2): 219 (1977).Iijima et al., “An experiment for the potential blue shift at the Norikura CoronaStation,” Annals of the Tokyo Astronomical Observatory, Second Series, Vol.XVII, 2 (1978) 68.

Problems 85

Page 86: General Relativity

axis with rapidities η1 and η2, find the matrix representing the com-bined Lorentz transformation, in a Taylor series up to the first non-classical terms in each matrix element. A mixed Taylor series intwo variables can be obtained simply by nesting taylor functions.The taylor function will happily work on matrices, not just scalars.

. Solution, p. 379

86 Chapter 2 Geometry of Flat Spacetime

Page 87: General Relativity

Chapter 3

Differential geometry

General relativity is described mathematically in the language ofdifferential geometry. Let’s take those two terms in reverse order.

The geometry of spacetime is non-Euclidean, not just in thesense that the 3+1-dimensional geometry of Lorentz frames is dif-ferent than that of 4 interchangeable Euclidean dimensions, but alsoin the sense that parallels do not behave in the way described byE5 or A1-A3. In a Lorentz frame, which describes space withoutany gravitational fields, particles whose world-lines are initially par-allel will continue along their parallel world-lines forever. But inthe presence of gravitational fields, initially parallel world-lines offree-falling particles will in general diverge, approach, or even cross.Thus, neither the existence nor the uniqueness of parallels can beassumed. We can’t describe this lack of parallelism as arising fromthe curvature of the world-lines, because we’re using the world-linesof free-falling particles as our definition of a “straight” line. Instead,we describe the effect as coming from the curvature of spacetime it-self. The Lorentzian geometry is a description of the case in whichthis curvature is negligible.

What about the word differential? The equivalence principlestates that even in the presence of gravitational fields, local Lorentzframes exist. How local is “local?” If we use a microscope to zoom inon smaller and smaller regions of spacetime, the Lorentzian approx-imation becomes better and better. Suppose we want to do experi-ments in a laboratory, and we want to ensure that when we comparesome physically observable quantity against predictions made basedon the Lorentz geometry, the resulting discrepancy will not be toolarge. If the acceptable error is ε, then we should be able to get theerror down that low if we’re willing to make the size of our labora-tory no bigger than δ. This is clearly very similar to the Weierstrassstyle of defining limits and derivatives in calculus. In calculus, theidea expressed by differentiation is that every smooth curve can beapproximated locally by a line; in general relativity, the equivalenceprinciple tells us that curved spacetime can be approximated locallyby flat spacetime. But consider that no practitioner of calculus ha-bitually solves problems by filling sheets of scratch paper with ep-silons and deltas. Instead, she uses the Leibniz notation, in which dyand dx are interpreted as infinitesimally small numbers. You maybe inclined, based on your previous training, to dismiss infinitesi-

87

Page 88: General Relativity

a / A vector can be thoughtof as lying in the plane tangent toa certain point.

mals as neither rigorous nor necessary. In 1966, Abraham Robinsondemonstrated that concerns about rigor had been unfounded; we’llcome back to this point in section 3.3. Although it is true that anycalculation written using infinitesimals can also be carried out usinglimits, the following example shows how much more well suited theinfinitesimal language is to differential geometry.

Areas on a sphere Example: 1The area of a region S in the Cartesian plane can be calculatedas∫

S dA, where dA = dx dy is the area of an infinitesimal rectan-gle of width dx and height dy . A curved surface such as a spheredoes not admit a global Cartesian coordinate system in which theconstant coordinate curves are both uniformly spaced and per-pendicular to one another. For example, lines of longitude on theearth’s surface grow closer together as one moves away from theequator. Letting θ be the angle with respect to the pole, and φ theazimuthal angle, the approximately rectangular patch bounded byθ, θ+dθ, φ, and φ+dφ has width r sin θdθ and height r dφ, givingdA = r2 sin θdθdφ. If you look at the corresponding derivation inan elementary calculus textbook that strictly eschews infinitesi-mals, the technique is to start from scratch with Riemann sums.This is extremely laborious, and moreover must be carried outagain for every new case. In differential geometry, the curvatureof the space varies from one point to the next, and clearly wedon’t want to reinvent the wheel with Riemann sums an infinitenumber of times, once at each point in space.

3.1 Tangent vectorsIt’s not immediately clear what a vector means in the context ofcurved spacetime. The freshman physics notion of a vector carriesall kinds of baggage, including ideas like rotation of vectors and amagnitude that is positive for nonzero vectors. We also used toassume the ability to represent vectors as arrows, i.e., geometricalfigures of finite size that could be transported to other places —but in a curved geometry, it is not in general possible to transporta figure to another location without distorting its shape, so there isno notion of congruence. For this reason, it’s better to visualize vec-tors as tangents to the underlying space, as in figure a. Intuitively,we want to think of these vectors as arrows that are infinitesimallysmall, so that they fit on the curved surface without having to bebent. In the pictures, we simply scale them up to make them visi-ble without an infinitely powerful microscope, and this scaling onlymakes them appear to rise out of the space in which they live.

A more formal definition of the notion of a tangent vector isgiven on p. 198.

88 Chapter 3 Differential Geometry

Page 89: General Relativity

a / Construction of an affineparameter in curved spacetime.

3.2 Affine notions and parallel transport3.2.1 The affine parameter in curved spacetime

An important example of the differential, i.e., local, nature of ourgeometry is the generalization of the affine parameter to a contextbroader than affine geometry.

Our construction of the affine parameter with a scaffolding ofparallelograms depended on the existence and uniqueness of paral-lels expressed by A1, so we might imagine that there was no point intrying to generalize the construction to curved spacetime. But theequivalence principle tells us that spacetime is locally affine to someapproximation. Concretely, clock-time is one example of an affineparameter, and the curvature of spacetime clearly can’t prevent usfrom building a clock and releasing it on a free-fall trajectory. Togeneralize the recipe for the construction (figure a), the first obsta-cle is the ambiguity of the instruction to construct parallelogram01q0q1, which requires us to draw 1q1 parallel to 0q0. Suppose weconstrue this as an instruction to make the two segments initiallyparallel, i.e., parallel as they depart the line at 0 and 1. By the timethey get to q0 and q1, they may be converging or diverging.

Because parallelism is only approximate here, there will be acertain amount of error in the construction of the affine parameter.One way of detecting such an error is that lattices constructed withdifferent initial distances will get out of step with one another. Forexample, we can define 1

2 as before by requiring that the latticeconstructed with initial segment 01

2 line up with the original latticeat 1. We will find, however, that they do not quite line up atother points, such as 2. Let’s use this discrepancy ε = 2 − 2′ asa numerical measure of the error. It will depend on both δ1, thedistance 01, and on δ2, the distance between 0 and q0. Since εvanishes for either δ1 = 0 or δ2 = 0, and since the equivalenceprinciple guarantees smooth behavior on small scales, the leadingterm in the error will in general be proportional to the productδ1δ2. In the language of infinitesimals, we can replace δ1 and δ2

with infinitesimally short distances, which for simplicity we assumeto be equal, and which we call dλ. Then the affine parameter λis defined as λ =

∫dλ, where the error of order dλ2 is, as usual,

interpreted as the negligible discrepancy between the integral andits approximation as a Riemann sum.

3.2.2 Parallel transport

If you were alert, you may have realized that I cheated you ata crucial point in this construction. We were to make 1q1 and 0q0

“initially parallel” as they left 01. How should we even define thisidea of “initially parallel?” We could try to do it by making anglesq001 and q112 equal, but this doesn’t quite work, because it doesn’tspecify whether the angle is to the left or the right on the two-

Section 3.2 Affine notions and parallel transport 89

Page 90: General Relativity

b / Parallel transport is path-dependent. On the surface ofthis sphere, parallel-transportinga vector along ABC gives adifferent answer than transportingit along AC.

c / Bad things happen if wetry to construct an affine param-eter along a curve that isn’t ageodesic. This curve is similarto path ABC in figure b. Par-allel transport doesn’t preservethe vectors’ angle relative tothe curve, as it would with ageodesic. The errors in theconstruction blow up in a waythat wouldn’t happen if the curvehad been a geodesic. The fourthdashed parallel flies off wildlyaround the back of the sphere,wrapping around and meetingthe curve at a point, 4, that isessentially random.

dimensional plane of the page. In three or more dimensions, theissue becomes even more serious. The construction workers buildingthe lattice need to keep it all in one plane, but how do they do thatin curved spacetime?

A mathematician’s answer would be that our geometry lackssome additional structure called a connection, which is a rule thatspecifies how one locally flat neighborhood is to be joined seamlesslyonto another locally flat neighborhood nearby. If you’ve ever boughttwo maps and tried to tape them together to make a big map, you’veformed a connection. If the maps were on a large enough scale,you also probably noticed that this was impossible to do perfectly,because of the curvature of the earth.

Physically, the idea is that in flat spacetime, it is possible toconstruct inertial guidance systems like the ones discussed on page74. Since they are possible in flat spacetime, they are also possiblein locally flat neighborhoods of spacetime, and they can then becarried from one neighborhood to another.

In three space dimensions, a gyroscope’s angular momentum vec-tor maintains its direction, and we can orient other vectors, such as1q1, relative to it. Suppose for concreteness that the constructionof the affine parameter above is being carried out in three space di-mensions. We place a gyroscope at 0, orient its axis along 0q0, slideit along the line to 1, and then construct 1q1 along that axis.

In 3+1 dimensions, a gyroscope only does part of the job. Wenow have to maintain the direction of a four-dimensional vector.Four-vectors will not be discussed in detail until section 4.2, butsimilar devices can be used to maintain their orientations in space-time. These physical devices are ways of defining a mathematicalnotion known as parallel transport, which allows us to take a vectorfrom one point to another in space. In general, specifying a notionof parallel transport is equivalent to specifying a connection.

Parallel transport is path-dependent, as shown in figure b.

Affine parameters defined only along geodesics

In the context of flat spacetime, the affine parameter was definedonly along lines, not arbitrary curves, and could not be comparedbetween lines running in different directions. In curved spacetime,the same limitation is present, but with “along lines” replaced by“along geodesics.” Figure c shows what goes wrong if we try toapply the construction to a world-line that isn’t a geodesic. Onedefinition of a geodesic is that it’s the course we’ll end up followingif we navigate by keeping a fixed bearing relative to an inertial guid-ance device such as gyroscope; that is, the tangent to a geodesic,when parallel-transported farther along the geodesic, is still tangent.A non-geodesic curve lacks this property, and the effect on the con-struction of the affine parameter is that the segments nqn drift more

90 Chapter 3 Differential Geometry

Page 91: General Relativity

a / Tullio Levi-Civita (1873-1941) worked on models ofnumber systems possessinginfinitesimals and on differentialgeometry. He invented thetensor notation, which Einsteinlearned from his textbook. Hewas appointed to prestigiousendowed chairs at Padua and theUniversity of Rome, but was firedin 1938 because he was a Jewand an anti-fascist.

and more out of alignment with the curve.

3.3 ModelsA typical first reaction to the phrase “curved spacetime” — or even“curved space,” for that matter — is that it sounds like nonsense.How can featureless, empty space itself be curved or distorted? Theconcept of a distortion would seem to imply taking all the pointsand shoving them around in various directions as in a Picasso paint-ing, so that distances between points are altered. But if space hasno identifiable dents or scratches, it would seem impossible to deter-mine which old points had been sent to which new points, and thedistortion would have no observable effect at all. Why should weexpect to be able to build differential geometry on such a logicallydubious foundation? Indeed, historically, various mathematicianshave had strong doubts about the logical self-consistency of bothnon-Euclidean geometry and infinitesimals. And even if an authori-tative source assures you that the resulting system is self-consistent,its mysterious and abstract nature would seem to make it difficultfor you to develop any working picture of the theory that couldplay the role that mental sketches of graphs play in organizing yourknowledge of calculus.

Models provide a way of dealing with both the logical issues andthe conceptual ones. Figure a on page 89 “pops” off of the page,presenting a strong psychological impression of a curved surface ren-dered in perspective. This suggests finding an actual mathematicalobject, such as a curved surface, that satisfies all the axioms of acertain logical system, such as non-Euclidean geometry. Note thatthe model may contain extrinsic elements, such as the existence ofa third dimension, that are not connected to the system being mod-eled.

Let’s focus first on consistency. In general, what can we sayabout the self-consistency of a mathematical system? To start with,we can never prove anything about the consistency or lack of consis-tency of something that is not a well-defined formal system, e.g., theBible. Even Euclid’s Elements, which was a model of formal rigor forthousands of years, is loose enough to allow considerable ambiguity.If you’re inclined to scoff at the silly Renaissance mathematicianswho kept trying to prove the parallel postulate E5 from postulatesE1-E4, consider the following argument. Suppose that we replaceE5 with E5′, which states that parallels don’t exist: given a line anda point not on the line, no line can ever be drawn through the pointand parallel to the given line. In the new system of plane geometryE′ consisting of E1-E4 plus E5′, we can prove a variety of theorems,and one of them is that there is an upper limit on the area of any fig-ure. This imposes a limit on the size of circles, and that appears tocontradict E3, which says we can construct a circle with any radius.

Section 3.3 Models 91

Page 92: General Relativity

We therefore conclude that E′ lacks self-consistency. Oops! As yourhigh school geometry text undoubtedly mentioned in passing, E′ isa perfectly respectable system called elliptic geometry. So what’swrong with this supposed proof of its lack of self-consistency? Theissue is the exact statement of E3. E3 does not say that we canconstruct a circle given any real number as its radius. Euclid couldnot have intended any such interpretation, since he had no notion ofreal numbers. To Euclid, geometry was primary, and numbers weregeometrically constructed objects, being represented as lengths, an-gles, areas, and volumes. A literal translation of Euclid’s statementof the axiom is “To describe a circle with any center and distance.”1

“Distance” means a line segment. There is therefore no contradic-tion in E′, because E′ has a limit on the lengths of line segments.

Now suppose that such ambiguities have been eliminated fromthe system’s basic definitions and axioms. In general, we expectit to be easier to prove an inconsistent system’s inconsistency thanto demonstrate the consistency of a consistent one. In the formercase, we can start cranking out theorems, and if we can find a wayto prove both proposition P and its negation ¬P, then obviouslysomething is wrong with the system. One might wonder whethersuch a contradiction could remain contained within one corner ofthe system, like nuclear waste. It can’t. Aristotelian logic allowsproof by contradiction: if we prove both P and ¬P based on certainassumptions, then our assumptions must have been wrong. If wecan prove both P and ¬P without making any assumptions, thenproof by contradiction allows us to establish the truth of any ran-domly chosen proposition. Thus a single contradiction is sufficient,in Aristotelian logic, to invalidate the entire system. This goes bythe Latin rubric ex falso quodlibet, meaning “from a falsehood, what-ever you please.” Thus any contradiction proves the inconsistencyof the entire system.

Proving consistency is harder. If you’re mathematically sophisti-cated, you may be tempted to leap directly to Godel’s theorem, andstate that nobody can ever prove the self-consistency of a mathemat-ical system. This would be a misapplication of Godel. Godel’s theo-rem only applies to mathematical systems that meet certain techni-cal criteria, and some of the interesting systems we’re dealing withdon’t meet those criteria; in particular, Godel’s theorem doesn’tapply to Euclidean geometry, and Euclidean geometry was provedself-consistent by Tarski and his students around 1950. Further-more, we usually don’t require an absolute proof of self-consistency.Usually we’re satisfied if we can prove that a certain system, suchas elliptic geometry, is at least as self-consistent as another system,such as Euclidean geometry. This is called equiconsistency. Thegeneral technique for proving equiconsistency of two theories is toshow that a model of one can be constructed within the other.

1Heath, pp. 195-202

92 Chapter 3 Differential Geometry

Page 93: General Relativity

Suppose, for example, that we construct a geometry in which thespace of points is the surface of a sphere, and lines are understoodto be the geodesics, i.e., the great circles whose centers coincide atthe sphere’s center. This geometry, called spherical geometry, isuseful in cartography and navigation. It is non-Euclidean, as wecan demonstrate by exhibiting at least one proposition that is falsein Euclidean geometry. For example, construct a triangle on theearth’s surface with one corner at the north pole, and the othertwo at the equator, separated by 90 degrees of longitude. The sumof its interior angles is 270 degrees, contradicting Euclid, book I,proposition 32. Spherical geometry must therefore violate at leastone of the axioms E1-E5, and indeed it violates both E1 (becauseno unique line is determined by two antipodal points such as thenorth and south poles) and E5 (because parallels don’t exist at all).

A closely related construction gives a model of elliptic geometry,in which E1 holds, and only E5 is thrown overboard. To accomplishthis, we model a point using a diameter of the sphere,2 and a line asthe set of all diameters lying in a certain plane. This has the effectof identifying antipodal points, so that there is now no violation ofE1. Roughly speaking, this is like lopping off half of the sphere, butmaking the edges wrap around. Since this model of elliptic geometryis embedded within a Euclidean space, all the axioms of ellipticgeometry can now be proved as theorems in Euclidean geometry. If acontradiction arose from them, it would imply a contradiction in theaxioms of Euclidean geometry. We conclude that elliptic geometryis equiconsistent with Euclidean geometry. This was known longbefore Tarski’s 1950 proof of Euclidean geometry’s self-consistency,but since nobody was losing any sleep over hidden contradictionsin Euclidean geometry, mathematicians stopped wasting their timelooking for contradictions in elliptic geometry.

Infinitesimals Example: 2Consider the following axiomatically defined system of numbers:

1. It is a field, i.e., it has addition, subtraction, multiplication, anddivision with the usual properties.

2. It is an ordered geometry in the sense of O1-O4 on p. 19, andthe ordering relates to addition and multiplication in the usualway.

3. Existence of infinitesimals: There exists a positive number dsuch that d < 1, d < 1/2, d < 1/3, . . .

A model of this system can be constructed within the real numbersystem by defining d as the identity function d(x) = x and formingthe set of functions of the form f (d) = P(d)/Q(d), where P and Q2The term “elliptic” may be somewhat misleading here. The model is still

constructed from a sphere, not an ellipsoid.

Section 3.3 Models 93

Page 94: General Relativity

b / An Einstein’s ring is formedwhen there is a chance alignmentof a distant source with a closergravitating body. Here, a quasar,MG1131+0456, is seen as aring due to focusing of light byan unknown object, possibly asupermassive black hole. Be-cause the entire arrangementlacks perfect axial symmetry, thering is nonuniform; most of itsbrightness is concentrated in twolumps on opposite sides. Thistype of gravitational lensing isdirect evidence for the curvatureof space predicted by gravita-tional lensing. The two geodesicsform a lune, which is a figurethat cannot exist in Euclideangeometry.

are polynomials with real coefficients. The ordering of functions fand g is defined according to the sign of limx→0+ f (x)− g(x). Ax-ioms 1-3 can all be proved from the real-number axioms. There-fore this system, which includes infinitesimals, is equiconsistentwith the reals. More elaborate constructions can extend this tosystems that have more of the properties of the reals, and abrowser-based calculator that implements such a system is avail-able at lightandmatter.com/calc/inf. Abraham Robinson ex-tended this in 1966 to all of analysis, and thus there is nothing in-trinsically nonrigorous about doing analysis in the style of Gaussand Euler, with symbols like dx representing infinitesimally smallquantities.3

Besides proving consistency, these models give us insight intowhat’s going on. The model of elliptic geometry suggests an in-sight into the reason that there is an upper limit on lengths andareas: it is because the space wraps around on itself. The model ofinfinitesimals suggests a fact that is not immediately obvious fromthe axioms: the infinitesimal quantities compose a hierarchy, so thatfor example 7d is in finite proportion to d, while d2 is like a “lesserflea” in Swift’s doggerel: “Big fleas have little fleas/ On their backsto ride ’em,/ and little fleas have lesser fleas,/And so, ad infinitum.”

Spherical and elliptic geometry are not valid models of a general-relativistic spacetime, since they are locally Euclidean rather thanLorentzian, but they still provide us with enough conceptual guid-ance to come up with some ideas that might never have occurred tous otherwise:

• In spherical geometry, we can have a two-sided polygon calleda lune that encloses a nonzero area. In general relativity, alune formed by the world-lines of two particles represents mo-tion in which the particles separate but are later reunited,presumably because of some mass between them that createda gravitational field. An example is gravitational lensing.

• Both spherical models wraps around on themselves, so thatthey are not topologically equivalent to infinite planes. Wetherefore form a conjecture there may be a link between cur-vature, which is a local property, and topology, which is global.Such a connection is indeed observed in relativity. For exam-ple, cosmological solutions of the equations of general relativ-ity come in two flavors. One type has enough matter in it toproduce more than a certain critical amount of curvature, andthis type is topologically closed. It describes a universe thathas finite spatial volume, and that will only exist for a finite

3More on this topic is available in, for example, Keisler’s Elementary Calcu-lus: An Infinitesimal Approach, Stroyan’s A Brief Introduction to InfinitesimalCalculus, or my own Calculus, all of which are available for free online.

94 Chapter 3 Differential Geometry

Page 95: General Relativity

time before it recontracts in a Big Crunch. The other type,corresponding to the universe we actually inhabit, has infinitespatial volume, will exist for infinite time, and is topologicallyopen.

• There is a distance scale set by the size of the sphere, with itsinverse being a measure of curvature. In general relativity,we expect there to be a similar way to measure curvaturenumerically, although the curvature may vary from point topoint.

Self-check: Prove from the axioms E′ that elliptic geometry, un-like spherical geometry, cannot have a lune with two distinct ver-tices. Convince yourself nevertheless, using the spherical model ofE′, that it is possible in elliptic geometry for two lines to enclose aregion of space, in the sense that from any point P in the region,a ray emitted in any direction must intersect one of the two lines.Summarize these observations with a characterization of lunes inelliptic geometry versus lunes in spherical geometry.

3.4 Intrinsic quantitiesModels can be dangerous, because they can tempt us to imputephysical reality to features that are purely extrinsic, i.e., that areonly present in that particular model. This is as opposed to intrinsicfeatures, which are present in all models, and which are thereforelogically implied by the axioms of the system itself. The existenceof lunes is clearly an intrinsic feature of non-Euclidean geometries,because intersection of lines was defined before any model has evenbeen proposed.

Curvature in elliptic geometry Example: 3What about curvature? In the spherical model of elliptic geom-etry, the size of the sphere is an inverse measure of curvature.Is this a valid intrinsic quantity, or is it extrinsic? It seems sus-pect, because it is a feature of the model. If we try to define“size” as the radius R of the sphere, there is clearly reason forconcern, because this seems to refer to the center of the sphere,but existence of a three-dimensional Euclidean space inside andoutside the surface is clearly an extrinsic feature of the model.There is, however, a way in which a creature confined to the sur-face can determine R, by constructing geodesic and an affineparameter along that geodesic, and measuring the distance λ ac-cumulated until the geodesic returns to the initial point. Sinceantipodal points are identified, λ equals half the circumference ofthe sphere, not its whole circumference, so R = λ/π, by whollyintrinsic methods.

Extrinsic curvature Example: 4Euclid’s axioms E1-E5 refer to explicit constructions. If a two-

Section 3.4 Intrinsic quantities 95

Page 96: General Relativity

dimensional being can physically verify them all as descriptions ofthe two-dimensional space she inhabits, then she knows that herspace is Euclidean, and that propositions such as the Pythagoreantheorem are physically valid in her universe. But the diagram ina/1 illustrating illustrating the proof of the Pythagorean theorem inEuclid’s Elements (proposition I.47) is equally valid if the page isrolled onto a cylinder, 2, or formed into a wavy corrugated shape,3. These types of curvature, which can be achieved without tear-ing or crumpling the surface, are extrinsic rather than intrinsic. Ofthe curved surfaces in figure a, only the sphere, 4, has intrinsiccurvature; the diagram can’t be plastered onto the sphere withoutfolding or cutting and pasting.

a / Example 4.

Self-check: How would the ideas of example 4 apply to a cone?

Example 4 shows that it can be difficult to sniff out bogus ex-trinsic features that seem intrinsic, and example 3 suggests the de-sirability of developing methods of calculation that never refer toany extrinsic quantities, so that we never have to worry whether asymbol like R staring up at us from a piece of paper is intrinsic.This is why it is unlikely to be helpful to a student of general rel-ativity to pick up a book on differential geometry that was writtenwithout general relativity specifically in mind. Such books have atendency to casually mix together intrinsic and extrinsic notation.For example, a vector cross product a× b refers to a vector pokingout of the plane occupied by a and b, and the space outside theplane may be extrinsic; it is not obvious how to generalize this op-eration to the 3+1 dimensions of relativity (since the cross productis a three-dimensional beast), and even if it were, we could not beassured that it would have any intrinsically well defined meaning.

3.4.1 Coordinate independence

To see how to proceed in creating a manifestly intrinsic notation,consider the two types of intrinsic observations that are available in

96 Chapter 3 Differential Geometry

Page 97: General Relativity

general relativity:

• 1. We can tell whether events and world-lines are incident :whether or not two lines intersect, two events coincide, or anevent lies on a certain line.

Incidence measurements, for example detection of gravitational lens-ing, are global, but they are the only global observations we cando.4 If we were limited entirely to incidence, spacetime would bedescribed by the austere system of projective geometry, a geometrywithout parallels or measurement. In projective geometry, all propo-sitions are essentially statements about combinatorics, e.g., that itis impossible to plant seven trees so that they form seven lines ofthree trees each.

But:

• 2. We can also do measurements in local Lorentz frames.

This gives us more power, but not as much as we might expect.Suppose we define a coordinate such as t or x. In Newtonian me-chanics, these coordinates would form a predefined background, apreexisting stage for the actors. In relativity, on the other hand,consider a completely arbitrary change of coordinates of the formx → x′ = f(x), where f is a smooth one-to-one function. For ex-ample, we could have x→ x+ px3 + q sin(rx) (with p and q chosensmall enough so that the mapping is always one-to-one). Since themapping is one-to-one, the new coordinate system preserves all theincidence relations. Since the mapping is smooth, the new coordi-nate system is still compatible with the existence of local Lorentzframes. The difference between the two coordinate systems is there-fore entirely extrinsic, and we conclude that a manifestly intrinsicnotation should avoid any explicit reference to a coordinate system.That is, if we write a calculation in which a symbol such as x ap-pears, we need to make sure that nowhere in the notation is thereany hidden assumption that x comes from any particular coordinatesystem. For example, the equation should still be valid if the genericsymbol x is later taken to represent the distance r from some centerof symmetry. This coordinate-independence property is also knownas general covariance, and this type of smooth change of coordinatesis also called a diffeomorphism.

The Dehn twist Example: 5As an exotic example of a change of coordinates, take a torusand label it with coordinates (θ,φ), where θ+2π is taken to be thesame as θ, and similarly for φ. Now subject it to the coordinatetransformation T defined by θ → θ + φ, which is like opening the4Einstein referred to incidence measurements as “determinations of space-

time coincidences.” For his presentation of this idea, see p. 376.

Section 3.4 Intrinsic quantities 97

Page 98: General Relativity

b / A series of Lorentz boostsacts on a square.

torus, twisting it by a full circle, and then joining the ends backtogether. T is known as the “Dehn twist,” and it is different frommost of the coordinate transformations we do in relativity becauseit can’t be done smoothly, i.e., there is no continuous function f (x)on 0 ≤ x ≤ 1 such that every value of f is a smooth coordinatetransformation, f (0) is the identity transformation, and f (1) = T .

Frames moving at c?

A good application of these ideas is to the question of what theworld would look like in a frame of reference moving at the speedof light. This question has a long and honorable history. As ayoung student, Einstein tried to imagine what an electromagneticwave would look like from the point of view of a motorcyclist ridingalongside it. We now know, thanks to Einstein himself, that it reallydoesn’t make sense to talk about such observers.

The most straightforward argument is based on the positivistidea that concepts only mean something if you can define how tomeasure them operationally. If we accept this philosophical stance(which is by no means compatible with every concept we ever discussin physics), then we need to be able to physically realize this framein terms of an observer and measuring devices. But we can’t. Itwould take an infinite amount of energy to accelerate Einstein andhis motorcycle to the speed of light.

Since arguments from positivism can often kill off perfectly in-teresting and reasonable concepts, we might ask whether there areother reasons not to allow such frames. There are. Recall thatwe placed two technical conditions on coordinate transformations:they are supposed to be smooth and one-to-one. The smoothnesscondition is related to the inability to boost Einstein’s motorcycleinto the speed-of-light frame by any continuous, classical process.(Relativity is a classical theory.) But independent of that, we havea problem with the one-to-one requirement. Figure b shows whathappens if we do a series of Lorentz boosts to higher and highervelocities. It should be clear that if we could do a boost up to a ve-locity of c, we would have effected a coordinate transformation thatwas not one-to-one. Every point in the plane would be mapped ontoa single lightlike line.

3.5 The metricConsider a coordinate x defined along a certain curve, which is notnecessarily a geodesic. For concreteness, imagine this curve to existin two spacelike dimensions, which we can visualize as the surfaceof a sphere embedded in Euclidean 3-space. These concrete featuresare not strictly necessary, but they drive home the point that weshould not expect to be able to define x so that it varies at a steadyrate with elapsed distance; for example, we know that it will not be

98 Chapter 3 Differential Geometry

Page 99: General Relativity

a / The tick marks on the linedefine a coordinate measuredalong the line. It is not possible toset up such a coordinate systemglobally so that the coordinateis uniform everywhere. Thearrows represent changes in thevalue of the coordinate; since thechanges in the coordinate areall equal, the arrows are all thesame length.

possible to define a two-dimensional Cartesian grid on the surfaceof a sphere. In the figure, the tick marks are therefore not evenlyspaced. This is perfectly all right, given the coordinate invariance ofgeneral relativity. Since the incremental changes in x are equal, I’verepresented them below the curve as little vectors of equal length.They are the wrong length to represent distances along the curve,but this wrongness is an inevitable fact of life in relativity.

Now suppose we want to integrate the arc length of a segmentof this curve. The little vectors are infinitesimal. In the integratedlength, each little vector should contribute some amount, which isa scalar. This scalar is not simply the magnitude of the vector,ds 6=

√dx · dx, since the vectors are the wrong length. Figure a

is clearly reminiscent of the geometrical picture of vectors and dualvectors developed on p. 48. But the purely affine notion of vectorsand their duals is not enough to define the length of a vector ingeneral; it is only sufficient to define a length relative to other lengthsalong the same geodesic. When vectors lie along different geodesics,we need to be able to specify the additional conversion factor thatallows us to compare one to the other. The piece of machinery thatallows us to do this is called a metric.

Fixing a metric allows us to define the proper scaling of the tickmarks relative to the arrows at a given point, i.e., in the birdtracksnotation it gives us a natural way of taking a displacement vectorsuch as s , with the arrow pointing into the symbol, and makinga corresponding dual vector s , with the arrow coming out. Thisis a little like cloning a person but making the clone be of the op-posite sex. Hooking them up like s s then tells us the squared

magnitude of the vector. For example, if dx is an infinitesimal

timelike displacement, then dx dx is the squared time interval dx2

measured by a clock traveling along that displacement in spacetime.(Note that in the notation dx2, it’s clear that dx is a scalar, because

unlike dx and dx it doesn’t have any arrow coming in or outof it.) Figure b shows the resulting picture.

b / The vectors dx and dxare duals of each other.

In the abstract index notation introduced on p. 51, the vectorsdx and dx are written dxa and dxa. When a specific coordinate

system has been fixed, we write these with concrete, Greek indices,dxµ and dxµ. In an older and conceptually incompatible notation

Section 3.5 The metric 99

Page 100: General Relativity

and terminology due to Sylvester (1853), one refers to dxµ as a con-travariant vector, and dxµ as covariant. The confusing terminologyis summarized on p. 401.

The assumption that a metric exists is nontrivial. There is nometric in Galilean spacetime, for example, since in the limit c→∞the units used to measure timelike and spacelike displacements arenot comparable. Assuming the existence of a metric is equivalent toassuming that the universe holds at least one physically manipulableclock or ruler that can be moved over long distances and acceleratedas desired. In the distant future, large and causally isolated regionsof the cosmos may contain only massless particles such as photons,which cannot be used to build clocks (or, equivalently, rulers); thephysics of these regions will be fully describable without a metric.If, on the other hand, our world contains not just zero or one buttwo or more clocks, then the metric hypothesis requires that theseclocks maintain a consistent relative rate when accelerated alongthe same world-line. This consistency is what allows us to thinkof relativity as a theory of space and time rather than a theory ofclocks and rulers. There are other relativistic theories of gravitybesides general relativity, and some of these violate this hypothesis.

Given a dxµ, how do we find its dual dxµ, and vice versa? Inone dimension, we simply need to introduce a real number g as acorrection factor. If one of the vectors is shorter than it should bein a certain region, the correction factor serves to compensate bymaking its dual proportionately longer. The two possible mappings(covariant to contravariant and contravariant to covariant) are ac-complished with factors of g and 1/g. The number g is the metric,and it encodes all the information about distances. For example, ifφ represents longitude measured at the arctic circle, then the metricis the only source for the datum that a displacement dφ correspondsto 2540 km per radian.

Now let’s generalize to more than one dimension. Because glob-ally Cartesian coordinate systems can’t be imposed on a curvedspace, the constant-coordinate lines will in general be neither evenlyspaced nor perpendicular to one another. If we construct a localset of basis vectors lying along the intersections of the constant-coordinate surfaces, they will not form an orthonormal set. Wewould like to have an expression of the form ds2 = Σ dxµ dxµ forthe squared arc length, and using the Einstein summation notationthis becomes

ds2 = dxµ dxµ .

3.5.1 The Euclidean metric

For Cartesian coordinates in a Euclidean plane, where one doesn’tnormally bother with the distinction between covariant and con-travariant vectors, this expression for ds2 is simply the Pythagorean

100 Chapter 3 Differential Geometry

Page 101: General Relativity

theorem, summed over two values of µ for the two coordinates:

ds2 = dxµ dxµ = dx2 + dy2

The symbols dx, ds0, dx0, and dx0 are all synonyms, and likewisefor dy, ds1, dx1, and dx1. (Because notations such as ds1 force thereader to keep track of which digits have been assigned to whichletters, it is better practice to use notation such as dy or dsy; thelatter notation could in principle be confused with one in which ywas a variable taking on values such as 0 or 1, but in reality weunderstand it from context, just as we understand that the d’s indy/dx are not referring to some variable d that stands for a number.)

In the non-Euclidean case, the Pythagorean theorem is false; dxµ

and dxµ are no longer synonyms, so their product is no longer simplythe square of a distance. To see this more explicitly, let’s write theexpression so that only the covariant quantities occur. By localflatness, the relationship between the covariant and contravariantvectors is linear, and the most general relationship of this kind isgiven by making the metric a symmetric matrix gµν . Substitutingdxµ = gµνx

ν , we have

ds2 = gµν dxµ dxν ,

where there are now implied sums over both µ and ν. Notice howimplied sums occur only when the repeated index occurs once asa superscript and once as a subscript; other combinations are un-grammatical.

Self-check: Why does it make sense to demand that the metricbe symmetric?

On p. 46 we encountered the distinction among scalars, vectors,and dual vectors. These are specific examples of tensors, which canbe expressed in the birdtracks notation as objects with m arrowscoming in and n coming out, or. In index notation, we have msuperscripts and n subscripts. A scalar has m = n = 0. A dualvector has (m,n) = (0, 1), a vector (1, 0), and the metric (0, 2). Werefer to the number of indices as the rank of the tensor. Tensors arediscussed in more detail, and defined more rigorously, in chapter 4.For our present purposes, it is important to note that just becausewe write a symbol with subscripts or superscripts, that doesn’t meanit deserves to be called a tensor. This point can be understood inthe more elementary context of Newtonian scalars and vectors. Forexample, we can define a Euclidean “vector” u = (m,T , e), wherem is the mass of the moon, T is the temperature in Chicago, ande is the charge of the electron. This creature u doesn’t deserveto be called a vector, because it doesn’t behave as a vector underrotation. The general philosophy is that a tensor is something thathas certain properties under changes of coordinates. For example,we’ve already seen on p. 48 the different scaling behavior of tensorswith ranks (1, 0), (0, 0), and (0, 1).

Section 3.5 The metric 101

Page 102: General Relativity

When discussing the symmetry of rank-2 tensors, it is convenientto introduce the following notation:

T(ab) =1

2(Tab + Tba)

T[ab] =1

2(Tab − Tba)

Any Tab can be split into symmetric and antisymmetric parts. Thisis similar to writing an arbitrary function as a sum of and oddfunction and an even function. The metric has only a symmetricpart: g(ab) = gab, and g[ab] = 0. This notation is generalized toranks greater than 2 on page 184.

Self-check: Characterize an antisymmetric rank-2 tensor in twodimensions.

A change of scale Example: 6. How is the effect of a uniform rescaling of coordinates repre-sented in g?

. If we change our units of measurement so that xµ → αxµ, whiledemanding that ds2 come out the same, then we need gµν →α−2gµν.

Comparing with p. 48, we deduce the general rule that a tensorof rank (m, n) transforms under scaling by picking up a factor ofαm−n.

Polar coordinates Example: 7Consider polar coordinates (r , θ) in a Euclidean plane. The const-ant-coordinate curves happen to be orthogonal everywhere, sothe off-diagonal elements of the metric grθ and gθr vanish. In-finitesimal coordinate changes dr and dθ correspond to infinitesi-mal displacements dr and r dθ in orthogonal directions, so by thePythagorean theorem, ds2 = dr2 + r2 dθ2, and we read off theelements of the metric gr r = 1 and gθθ = r2.

Notice how in example 7 we started from the generally validrelation ds2 = gµν dxµ dxν , but soon began writing down facts likegθθ = r2 that were only valid in this particular coordinate system.To make it clear when this is happening, we maintain the distinctionbetween abtract Latin indices and concrete Greek indices introducedon p. 51. For example, we can write the general expression forsquared differential arc length with Latin indices,

ds2 = gij dxi dxj ,

because it holds regardless of the coordinate system, whereas thevanishing of the off-diagonal elements of the metric in Euclideanpolar coordinates has to be written as gµν = 0 for µ 6= ν, since itwould in general be false if we used a different coordinate system todescribe the same Euclidean plane.

102 Chapter 3 Differential Geometry

Page 103: General Relativity

c / Example 8.

Oblique Cartesian coordinates Example: 8. Oblique Cartesian coordinates are like normal Cartesian coor-dinates in the plane, but their axes are at at an angle φ 6= π/2 toone another. Find the metric in these coordinates. The space isglobally Euclidean.

. Since the coordinates differ from Cartesian coordinates only inthe angle between the axes, not in their scales, a displacementdx i along either axis, i = 1 or 2, must give ds = dx , so for the diag-onal elements we have g11 = g22 = 1. The metric is always sym-metric, so g12 = g21. To fix these off-diagonal elements, considera displacement by ds in the direction perpendicular to axis 1. Thischanges the coordinates by dx1 = −ds cotφ and dx2 = ds cscφ.We then have

ds2 = gi j dx i dx j

= ds2(cot2φ + csc2φ− 2g12 cosφ cscφ)g12 = cosφ .

Area Example: 9In one dimension, g is a single number, and lengths are given byds =

√g dx . The square root can also be understood through

example 6 on page 102, in which we saw that a uniform rescalingx → αx is reflected in gµν → α−2gµν.

In two-dimensional Cartesian coordinates, multiplication of thewidth and height of a rectangle gives the element of area dA =√

g11g22 dx1 dx2. Because the coordinates are orthogonal, g isdiagonal, and the factor of

√g11g22 is identified as the square root

of its determinant, so dA =√|g|dx1 dx2. Note that the scales on

the two axes are not necessarily the same, g11 6= g22.

The same expression for the element of area holds even if the co-ordinates are not orthogonal. In example 8, for instance, we have√|g| =

√1− cos2φ = sinφ, which is the right correction factor

corresponding to the fact that dx1 and dx2 form a parallelepipedrather than a rectangle.

Area of a sphere Example: 10For coordinates (θ,φ) on the surface of a sphere of radius r , wehave, by an argument similar to that of example 7 on page 102,gθθ = r2, gφφ = r2 sin2 θ, gθφ = 0. The area of the sphere is

A =∫

dA

=∫ ∫ √

|g|dθdφ

= r2∫ ∫

sin θdθdφ

= 4πr2

Section 3.5 The metric 103

Page 104: General Relativity

Inverse of the metric Example: 11. Relate g i j to gi j .

. The notation is intended to treat covariant and contravariantvectors completely symmetrically. The metric with lower indicesgi j can be interpreted as a change-of-basis transformation from acontravariant basis to a covariant one, and if the symmetry of thenotation is to be maintained, g i j must be the corresponding in-verse matrix, which changes from the covariant basis to the con-travariant one. The metric must always be invertible.

In the one-dimensional case, p. 99, the metric at any givenpoint was simply some number g, and we used factors of g and1/g to convert back and forth between covariant and contravariantvectors. Example 11 makes it clear how to generalize this to moredimensions:

xa = gabxb

xa = gabxb

This is referred to as raising and lowering indices. There is no needto memorize the positions of the indices in these rules; they arethe only ones possible based on the grammatical rules, which arethat summation only occurs over top-bottom pairs, and upper andlower indices have to match on both sides of the equals sign. Thiswhole system, introduced by Einstein, is called “index-gymnastics”notation.

Raising and lowering indices on a rank-two tensor Example: 12In physics we encounter various examples of matrices, such asthe moment of inertia tensor from classical mechanics. Thesehave two indices, not just one like a vector. Again, the rules forraising and lowering indices follow directly from grammar. Forexample,

Aab = gacAcb

andAab = gacgbdAcd .

A matrix operating on a vector Example: 13The row and column vectors from linear algebra are the covari-ant and contravariant vectors in our present terminology. (Theconvention is that covariant vectors are row vectors and con-travariant ones column vectors, but I don’t find this worth memo-rizing.) What about matrices? A matrix acting on a column vectorgives another column vector, q = Up. Translating this into index-gymnastics notation, we have

qa = U ......p

b ,

104 Chapter 3 Differential Geometry

Page 105: General Relativity

where we want to figure out the correct placement of the indiceson U. Grammatically, the only possible placement is

qa = Uab pb .

This shows that the natural way to represent a column-vector-to-column-vector linear operator is as a rank-2 tensor with one upperindex and one lower index.

In birdtracks notation, a rank-2 tensor is something that has twoarrows connected to it. Our example becomes → q =→ U → p.That the result is itself an upper-index vector is shown by the factthat the right-hand-side taken as a whole has a single externalarrow coming into it.

The distinction between vectors and their duals may seem ir-relevant if we can always raise and lower indices at will. We can’talways do that, however, because in many perfectly ordinary situa-tions there is no metric. See example 6, p. 49.

3.5.2 The Lorentz metric

In a locally Euclidean space, the Pythagorean theorem allows usto express the metric in local Cartesian coordinates in the simpleform gµµ = +1, gµν = 0, i.e., g = diag(+1, +1, . . . , +1). This isnot the appropriate metric for a locally Lorentz space. The axiomsof Euclidean geometry E3 (existence of circles) and E4 (equality ofright angles) describe the theory’s invariance under rotations, andthe Pythagorean theorem is consistent with this, because it gives thesame answer for the length of a vector even if its components arereexpressed in a new basis that is rotated with respect to the originalone. In a Lorentzian geometry, however, we care about invarianceunder Lorentz boosts, which do not preserve the quantity t2 + x2.It is not circles in the (t,x) plane that are invariant, but light cones,and this is described by giving gtt and gxx opposite signs and equalabsolute values. A lightlike vector (t,x), with t = x, therefore hasa magnitude of exactly zero,

s2 = gttt2 + gxxx

2 = 0 ,

and this remains true after the Lorentz boost (t,x) → (γt, γx). Itis a matter of convention which element of the metric to make pos-itive and which to make negative. In this book, I’ll use gtt = +1and gxx = −1, so that g = diag(+1,−1). This has the advan-tage that any line segment representing the timelike world-line of aphysical object has a positive squared magnitude; the forward flowof time is represented as a positive number, in keeping with thephilosophy that relativity is basically a theory of how causal rela-tionships work. With this sign convention, spacelike vectors havepositive squared magnitudes, timelike ones negative. The same con-vention is followed, for example, by Penrose. The opposite version,

Section 3.5 The metric 105

Page 106: General Relativity

with g = diag(−1, +1) is used by authors such as Wald and Misner,Thorne, and Wheeler.

Our universe does not have just one spatial dimension, it hasthree, so the full metric in a Lorentz frame is given byg = diag(+1,−1,−1,−1).

Mixed covariant-contravariant form of the metric Example: 14In example 12 on p. 104, we saw how to raise and lower indiceson a rank-two tensor, and example 13 showed that it is sometimesnatural to consider the form in which one index is raised and onelowered. The metric itself is a rank-two tensor, so let’s see whathappens when we compute the mixed form ga

b from the lower-index form. In general, we have

Aab = gacAcb ,

and substituting g for A gives

gab = gacgcb .

But we already know that g... is simply the inverse matrix of g...(example 11, p. 104), which means that ga

b is simply the identitymatrix. That is, whereas a quantity like gab or gab carries all theinformation about our system of measurement at a given point,ga

b carries no information at all. Where gab or gab can have bothpositive and negative elements, elements that have units, andoff-diagonal elements, ga

b is just a generic symbol carrying noinformation other than the dimensionality of the space.

The metric tensor is so commonly used that it is simply left out ofbirdtrack diagrams. Consistency is maintained because becausega

b is the identity matrix, so→ g → is the same as→→.

3.5.3 Isometry, inner products, and the Erlangen program

In Euclidean geometry, the dot product of vectors a and b isgiven by gxxaxbx + gyyayby + gzzazbz = axbx + ayby + azbz, and inthe special case where a = b we have the squared magnitude. Inthe tensor notation, aµbν = a1b1 + a2b2 + a3b3. Like magnitudes,dot products are invariant under rotations. This is because know-ing the dot product of vectors a and b entails knowing the valueof a · b = |a||a| cos θab, and Euclid’s E4 (equality of right angles)implies that the angle θab is invariant. the same axioms also entailinvariance of dot products under translation; Euclid waits only untilthe second proposition of the Elements to prove that line segmentscan be copied from one location to another. This seeming triviality isactually false as a description of physical space, because it amountsto a statement that space has the same properties everywhere.

The set of all transformations that can be built out of succes-sive translations, rotations, and reflections is called the group of

106 Chapter 3 Differential Geometry

Page 107: General Relativity

isometries. It can also be defined as the group5 that preserves dotproducts, or the group that preserves congruence of triangles.

In Lorentzian geometry, we usually avoid the Euclidean termdot product and refer to the corresponding operation by the moregeneral term inner product. In a specific coordinate system we haveaµbν = a0b0−a1b1−a2b2−a3b3. The inner product is invariant underLorentz boosts, and also under the Euclidean isometries. The groupfound by making all possible combinations of continuous transfor-mations6 from these two sets is called the Poincare group. ThePoincare group is not the symmetry group of all of spacetime, sincecurved spacetime has different properties in different locations. Theequivalence principle tells us, however, that space can be approxi-mated locally as being flat, so the Poincare group is locally valid,just as the Euclidean isometries are locally valid as a description ofgeometry on the Earth’s curved surface.

The triangle inequality Example: 15In Euclidean geometry, the triangle inequality |b + c| < |b| + |c|follows from

(|b| + |c|)2 − (b + c) · (b + c) = 2(|b||c| − b · c) ≥ 0 .

The reason this quantity always comes out positive is that for twovectors of fixed magnitude, the greatest dot product is alwaysachieved in the case where they lie along the same direction.

In Lorentzian geometry, the situation is different. Let b and c betimelike vectors, so that they represent possible world-lines. Thenthe relation a = b+c suggests the existence of two observers whotake two different paths from one event to another. A goes by adirect route while B takes a detour. The magnitude of each time-like vector represents the time elapsed on a clock carried by theobserver moving along that vector. The triangle equality is nowreversed, becoming |b + c| > |b| + |c|. The difference from theEuclidean case arises because inner products are no longer nec-essarily maximized if vectors are in the same direction. E.g., fortwo lightlike vectors, bicj vanishes entirely if b and c are paral-lel. For timelike vectors, parallelism actually minimizes the innerproduct rather than maximizing it.7

5In mathematics, a group is defined as a binary operation that has an identity,inverses, and associativity. For example, addition of integers is a group. In thepresent context, the members of the group are not numbers but the transforma-tions applied to the Euclidean plane. The group operation on transformationsT1 and T2 consists of finding the transformation that results from doing one andthen the other, i.e., composition of functions.

6The discontinuous transformations of spatial reflection and time reversal arenot included in the definition of the Poincare group, although they do preserveinner products. General relativity has symmetry under spatial reflection (calledP for parity), time reversal (T), and charge inversion (C), but the standardmodel of particle physics is only invariant under the composition of all three,CPT, not under any of these symmetries individually.

7Proof: Let b and c be parallel and timelike, and directed forward in time.

Section 3.5 The metric 107

Page 108: General Relativity

d / Observer A, rotating withthe carousel, measures anazimuthal distance with a ruler.

In his 1872 inaugural address at the University of Erlangen, FelixKlein used the idea of groups of transformations to lay out a gen-eral classification scheme, known as the Erlangen program, for allthe different types of geometry. Each geometry is described by thegroup of transformations, called the principal group, that preservesthe truth of geometrical statements. Euclidean geometry’s principalgroup consists of the isometries combined with arbitrary changes ofscale, since there is nothing in Euclid’s axioms that singles out aparticular distance as a unit of measurement. In other words, theprincipal group consists of the transformations that preserve simi-larity, not just those that preserve congruence. Affine geometry’sprincipal group is the transformations that preserve parallelism; itincludes shear transformations, and there is therefore no invariantnotion of angular measure or congruence. Unlike Euclidean andaffine geometry, elliptic geometry does not have scale invariance.This is because there is a particular unit of distance that has specialstatus; as we saw in example 3 on page 95, a being living in an el-liptic plane can determine, by entirely intrinsic methods, a distancescale R, which we can interpret in the hemispherical model as theradius of the sphere. General relativity breaks this symmetry evenmore severely. Not only is there a scale associated with curvature,but the scale is different from one point in space to another.

3.5.4 Einstein’s carousel

Non-Euclidean geometry observed in the rotating frame

The following example was historically important, because Ein-stein used it to convince himself that general relativity should bedescribed by non-Euclidean geometry.8 Its interpretation is alsofairly subtle, and the early relativists had some trouble with it.

Suppose that observer A is on a spinning carousel while observerB stands on the ground. B says that A is accelerating, but by theequivalence principle A can say that she is at rest in a gravitationalfield, while B is free-falling out from under her. B measures theradius and circumference of the carousel, and finds that their ratiois 2π. A carries out similar measurements, but when she puts hermeter-stick in the azimuthal direction it becomes Lorentz-contractedby the factor γ = (1−ω2r2)−1/2, so she finds that the ratio is greaterthan 2π. In A’s coordinates, the spatial geometry is non-Euclidean,

Adopt a frame of reference in which every spatial component of each vectorvanishes. This entails no loss of generality, since inner products are invariantunder such a transformation. Since the time-ordering is also preserved undertransformations in the Poincare group, each is still directed forward in time, notbackward. Now let b and c be pulled away from parallelism, like opening a pairof scissors in the x − t plane. This reduces btct , while causing bx cx to becomenegative. Both effects increase the inner product.

8The example is described in Einstein’s paper “The Foundation of the GeneralTheory of Relativity.” An excerpt, which includes the example, is given onp. 372.

108 Chapter 3 Differential Geometry

Page 109: General Relativity

and the metric differs from the Euclidean one found in example 7on page 102.

Observer A feels a force that B considers to be fictitious, butthat, by the equivalence principle, A can say is a perfectly realgravitational force. According to A, an observer like B is free-fallingaway from the center of the disk under the influence of this gravita-tional field. A also observes that the spatial geometry of the carouselis non-Euclidean. Therefore it seems reasonable to conjecture thatgravity can be described by non-Euclidean geometry, rather than asa physical force in the Newtonian sense.

At this point, you know as much about this example as Einsteindid in 1912, when he began using it as the seed from which generalrelativity sprouted, collaborating with his old schoolmate, mathe-matician Marcel Grossmann, who knew about differential geometry.The remainder of subsection 3.5.4, which you may want to skip on afirst reading, goes into more detail on the interpretation and math-ematical description of the rotating frame of reference. Even moredetailed treatments are given by Grøn9 and Dieks.10.

Ehrenfest’s paradox

Ehrenfest11 described the following paradox. Suppose that ob-server B, in the lab frame, measures the radius of the disk to be rwhen the disk is at rest, and r′ when the disk is spinning. B canalso measure the corresponding circumferences C and C ′. BecauseB is in an inertial frame, the spatial geometry does not appear non-Euclidean according to measurements carried out with his metersticks, and therefore the Euclidean relations C = 2πr and C ′ = 2πr′

both hold. The radial lines are perpendicular to their own motion,and they therefore have no length contraction, r = r′, implyingC = C ′. The outer edge of the disk, however, is everywhere tan-gent to its own direction of motion, so it is Lorentz contracted, andtherefore C ′ < C. The resolution of the paradox is that it rests onthe incorrect assumption that a rigid disk can be made to rotate.If a perfectly rigid disk was initially not rotating, one would haveto distort it in order to set it into rotation, because once it wasrotating its outer edge would no longer have a length equal to 2πtimes its radius. Therefore if the disk is perfectly rigid, it can neverbe rotated. As discussed on page 65, relativity does not allow theexistence of infinitely rigid or infinitely strong materials. If it did,then one could violate causality. If a perfectly rigid disk existed, vi-brations in the disk would propagate at infinite velocity, so tappingthe disk with a hammer in one place would result in the transmis-

9Relativistic description of a rotating disk, Am. J. Phys. 43 (1975) 86910Space, Time, and Coordinates in a Rotating World, http://www.phys.uu.

nl/igg/dieks11P. Ehrenfest, Gleichformige Rotation starrer Korper und Relativitatstheorie,

Z. Phys. 10 (1909) 918, available in English translation at en.wikisource.org.

Section 3.5 The metric 109

Page 110: General Relativity

e / Einstein and Ehrenfest.

sion of information at v > c to other parts of the disk, and thenthere would exist frames of reference in which the information wasreceived before it was transmitted. The same applies if the hammertap is used to impart rotational motion to the disk.

Self-check: What if we build the disk by assembling the buildingmaterials so that they are already rotating properly before they arejoined together?

The metric in the rotating frame

What if we try to get around these problems by applying torqueuniformly all over the disk, so that the rotation starts smoothly andsimultaneously everywhere? We then run into issues identical to theones raised by Bell’s spaceship paradox (p. 66). In fact, Ehrenfest’sparadox is nothing more than Bell’s paradox wrapped around intoa circle. The same question of time synchronization comes up.

To spell this out mathematically, let’s find the metric accordingto observer A by applying the change of coordinates θ′ = θ − ωt.First we take the Euclidean metric of example 7 on page 102 andrewrite it as a (globally) Lorentzian metric in spacetime for observerB,

[1] ds2 = dt2 − dr2 − r2 dθ2 .

Applying the transformation into A’s coordinates, we find

[2] ds2 = (1− ω2r2) dt2 − dr2 − r2 dθ′2 − 2ωr2 dθ′ dt .

Recognizing ωr as the velocity of one frame relative to another, and(1−ω2r2)−1/2 as γ, we see that we do have a relativistic time dilationeffect in the dt2 term. But the dr2 and dθ′2 terms look Euclidean.Why don’t we see any Lorentz contraction of the length scale in theazimuthal direction?

The answer is that coordinates in general relativity are arbitrary,and just because we can write down a certain set of coordinates, thatdoesn’t mean they have any special physical interpretation. Thecoordinates (t, r, θ′) do not correspond physically to the quantitiesthat A would measure with clocks and meter-sticks. The tip-offis the dθ′ dt cross-term. Suppose that A sends two cars drivingaround the circumference of the carousel, one clockwise and onecounterclockwise, from the same point. If (t, r, θ′) coordinates cor-responded to clock and meter-stick measurements, then we wouldexpect that when the cars met up again on the far side of the disk,their dashboards would show equal values of the arc length rθ′ ontheir odometers and equal proper times ds on their clocks. But thisis not the case, because the sign of the dθ′ dt term is opposite for thetwo world-lines. The same effect occurs if we send beams of lightin both directions around the disk, and this is the Sagnac effect (p.74).

110 Chapter 3 Differential Geometry

Page 111: General Relativity

This is a symptom of the fact that the coordinate t is not prop-erly synchronized between different places on the disk. We alreadyknow that we should not expect to be able to find a universal timecoordinate that will match up with every clock, regardless of theclock’s state of motion. Suppose we set ourselves a more modestgoal. Can we find a universal time coordinate that will match upwith every clock, provided that the clock is at rest relative to therotating disk?

The spatial metric and synchronization of clocks

A trick for improving the situation is to eliminate the dθ′ dtcross-term by completing the square in the metric [2]. The result is

ds2 = (1− ω2r2)

[dt+

ωr2

1− ω2r2dθ′]2

− dr2 − r2

1− ω2r2dθ′2 .

The interpretation of the quantity in square brackets is as follows.Suppose that two observers situate themselves on the edge of thedisk, separated by an infinitesimal angle dθ′. They then synchronizetheir clocks by exchanging light pulses. The time of flight, measuredin the lab frame, for each light pulse is the solution of the equationds2 = 0, and the only difference between the clockwise result dt1and the counterclockwise one dt2 arises from the sign of dθ′. Thequantity in square brackets is the same in both cases, so the amountby which the clocks must be adjusted is dt = (dt2 − dt1)/2, or

dt =ωr2

1− ω2r2dθ′.

Substituting this into the metric, we are left with the purely spatialmetric

[3] ds2 = −dr2 − r2

1− ω2r2dθ′2 .

The factor of (1 − ω2r2)−1 = γ2 in the dθ′2 term is simply theexpected Lorentz-contraction factor. In other words, the circumfer-ence is, as expected, greater than 2πr by a factor of γ.

Does the metric [3] represent the same non-Euclidean spatialgeometry that A, rotating with the disk, would determine by meter-stick measurements? Yes and no. It can be interpreted as theone that A would determine by radar measurements. That is, ifA measures a round-trip travel time dt for a light signal betweenpoints separated by coordinate distances dr and dθ′, then A can saythat the spatial separation is dt/2, and such measurements will bedescribed correctly by [3]. Physical meter-sticks, however, presentsome problems. Meter-sticks rotating with the disk are subject toCoriolis and centrifugal forces, and this problem can’t be avoidedsimply by making the meter-sticks infinitely rigid, because infinitelyrigid objects are forbidden by relativity. In fact, these forces will in-evitably be strong enough to destroy any meter stick that is brought

Section 3.5 The metric 111

Page 112: General Relativity

out to r = 1/ω, where the speed of the disk becomes equal to thespeed of light.

It might appear that we could now define a global coordinate

T = t+ωr2

1− ω2r2θ′ ,

interpreted as a time coordinate that was synchronized in a con-sistent way for all points on the disk. The trouble with this inter-pretation becomes evident when we imagine driving a car aroundthe circumference of the disk, at a speed slow enough so that thereis negligible time dilation of the car’s dashboard clock relative tothe clocks tied to the disk. Once the car gets back to its originalposition, θ′ has increased by 2π, so it is no longer possible for thecar’s clock to be synchronized with the clocks tied to the disk. Weconclude that it is not possible to synchronize clocks in a rotatingframe of reference; if we try to do it, we will inevitably have to havea discontinuity somewhere. This problem is present even locally, asdemonstrated by the possibility of measuring the Sagnac effect withapparatus that is small compared to the disk. The only reason wewere able to get away with time synchronization in order to establishthe metric [3] is that all the physical manifestations of the impossi-bility of synchronization, e.g., the Sagnac effect, are proportional tothe area of the region in which synchronization is attempted. Sincewe were only synchronizing two nearby points, the area enclosed bythe light rays was zero.

GPS Example: 16As a practical example, the GPS system is designed mainly toallow people to find their positions relative to the rotating surfaceof the earth (although it can also be used by space vehicles). Thatis, they are interested in their (r , θ′,φ) coordinates. The frame ofreference defined by these coordinates is referred to as ECEF, forEarth-Centered, Earth-Fixed.

The system requires synchronization of the atomic clocks carriedaboard the satellites, and this synchronization also needs to beextended to the (less accurate) clocks built into the receiver units.It is impossible to carry out such a synchronization globally in therotating frame in order to create coordinates (T , r , θ′,φ). If wetried, it would result in discontinuities (see problem 8, p. 120).Instead, the GPS system handles clock synchronization in coor-dinates (t , r , θ′,φ), as in equation [2]. These are known as theEarth-Centered Inertial (ECI) coordinates. The t coordinate inthis system is not the one that users at neighboring points onthe earth’s surface would establish if they carried out clock syn-chronization using electromagnetic signals. It is simply the timecoordinate of the nonrotating frame of reference tied to the earth’scenter. Conceptually, we can imagine this time coordinate as onethat is established by sending out an electromagnetic “tick-tock”

112 Chapter 3 Differential Geometry

Page 113: General Relativity

signal from the earth’s center, with each satellite correcting thephase of the signal based on the propagation time inferred fromits own r . In reality, this is accomplished by communication with amaster control station in Colorado Springs, which communicateswith the satellites via relays at Kwajalein, Ascension Island, DiegoGarcia, and Cape Canaveral.

Einstein’s goof, in the rotating frame Example: 17Example 11 on p. 58 recounted Einstein’s famous mistake in pre-dicting that a clock at the pole would experience a time dilationrelative to a clock at the equator, and the empirical test of this factby Alley et al. using atomic clocks. The perfect cancellation ofgravitational and kinematic time dilations might seem fortuitous,but it fact it isn’t. When we transform into the frame rotating alongwith the earth, there is no longer any kinematic effect at all, be-cause neither clock is moving. In this frame, the surface of theearth’s oceans is an equipotential, so the gravitational time di-lation vanishes as well, assuming both clocks are at sea level.In the transformation to the rotating frame, the metric picks up adθ′ dt term, but since both clocks are fixed to the earth’s surface,they have dθ′ = 0, and there is no Sagnac effect.

Impossibility of rigid rotation, even with external forces

The determination of the spatial metric with rulers at rest rel-ative to the disk is appealing because of its conceptual simplicitycompared to complicated procedures involving radar, and this waspresumably why Einstein presented the concept using ruler measure-ments in his 1916 paper laying out the general theory of relativity.12

In an effort to recover this simplicity, we could propose using exter-nal forces to compensate for the centrifugal and Coriolis forces towhich the rulers would be subjected, causing them to stay straightand maintain their correct lengths. Something of this kind is car-ried out with the large mirrors of some telescopes, which have activesystems that compensate for gravitational deflections and other ef-fects. The first issue to worry about is that one would need someway to monitor a ruler’s length and straightness. The monitoringsystem would presumably be based on measurements with beamsof light, in which case the physical rulers themselves would becomesuperfluous.

In addition, we would need to be able to manipulate the rulers inorder to place them where we wanted them, and these manipulationswould include angular accelerations. If such a thing was possible,then it would also amount to a loophole in the resolution of theEhrenfest paradox. Could Ehrenfest’s rotating disk be acceleratedand decelerated with help from external forces, which would keep itfrom contorting into a potato chip? The problem we run into with

12The paper is reproduced in the back of the book, and the relevant part ison p. 374.

Section 3.5 The metric 113

Page 114: General Relativity

such a strategy is one of clock synchronization. When it was time toimpart an angular acceleration to the disk, all of the control systemswould have to be activated simultaneously. But we have alreadyseen that global clock synchronization cannot be realized for anobject with finite area, and therefore there is a logical contradictionin this proposal. This makes it impossible to apply rigid angularacceleration to the disk, but not necessarily the rulers, which couldin theory be one-dimensional.

3.6 The metric in general relativitySo far we’ve considered a variety of examples in which the metricis predetermined. This is not the case in general relativity. Forexample, Einstein published general relativity in 1915, but it wasnot until 1916 that Schwarzschild found the metric for a spherical,gravitating body such as the sun or the earth.

When masses are present, finding the metric is analogous tofinding the electric field made by charges, but the interpretation ismore difficult. In the electromagnetic case, the field is found ona preexisting background of space and time. In general relativity,there is no preexisting geometry of spacetime. The metric tells ushow to find distances in terms of our coordinates, but the coordi-nates themselves are completely arbitrary. So what does the metriceven mean? This was an issue that caused Einstein great distressand confusion, and at one point, in 1914, it even led him to pub-lish an incorrect, dead-end theory of gravity in which he abandonedcoordinate-independence.

With the benefit of hindsight, we can consider these issues interms of the general description of measurements in relativity givenon page 97:

1. We can tell whether events and world-lines are incident.

2. We can do measurements in local Lorentz frames.

3.6.1 The hole argument

The main factor that led Einstein to his false start is known asthe hole argument. Suppose that we know about the distribution ofmatter throughout all of spacetime, including a particular region offinite size — the “hole” — which contains no matter. By analogywith other classical field theories, such as electromagnetism, we ex-pect that the metric will be a solution to some kind of differentialequation, in which matter acts as the source term. We find a metricg(x) that solves the field equations for this set of sources, where x issome set of coordinates. Now if the field equations are coordinate-independent, we can introduce a new set of coordinates x′, which isidentical to x outside the hole, but differs from it on the inside. If

114 Chapter 3 Differential Geometry

Page 115: General Relativity

a / Einstein’s hole argument.

b / A paradox? Planet A hasno equatorial bulge, but B does.What cause produces this effect?Einstein reasoned that the causecouldn’t be B’s rotation, becauseeach planet rotates relative to theother.

we reexpress the metric in terms of these new coordinates as g′(x′),then we are guaranteed that g′(x′) is also a solution. But further-more, we can substitute x for x′, and g′(x) will still be a solution.For outside the hole there is no difference between the primed andunprimed quantities, and inside the hole there is no mass distribu-tion that has to match the metric’s behavior on a point-by-pointbasis.

We conclude that in any coordinate-invariant theory, it is impos-sible to uniquely determine the metric inside such a hole. Einsteininitially decided that this was unacceptable, because it showed alack of determinism; in a classical theory such as general relativity,we ought to be able to predict the evolution of the fields, and itwould seem that there is no way to predict the metric inside thehole. He eventually realized that this was an incorrect interpreta-tion. The only type of global observation that general relativity letsus do is measurements of the incidence of world-lines. Relabeling allthe points inside the hole doesn’t change any of the incidence rela-tions. For example, if two test particles sent into the region collideat a point x inside the hole, then changing the point’s name to x′

doesn’t change the observable fact that they collided.

3.6.2 A Machian paradox

Another type of argument that made Einstein suffer is also re-solved by a correct understanding of measurements, this time theuse of measurements in local Lorentz frames. The earth is in hy-drostatic equilibrium, and its equator bulges due to its rotation.Suppose that the universe was empty except for two planets, eachrotating about the line connecting their centers.13 Since there areno stars or other external points of reference, the inhabitants of eachplanet have no external reference points against which to judge theirrotation or lack of rotation. They can only determine their rotation,Einstein said, relative to the other planet. Now suppose that oneplanet has an equatorial bulge and the other doesn’t. This seems toviolate determinism, since there is no cause that could produce thediffering effect. The people on either planet can consider themselvesas rotating and the other planet as stationary, or they can describethe situation the other way around. Einstein believed that this ar-gument proved that there could be no difference between the sizesof the two planets’ equatorial bulges.

The flaw in Einstein’s argument was that measurements in localLorentz frames do allow one to make a distinction between rotationand a lack of rotation. For example, suppose that scientists onplanet A notice that their world has no equatorial bulge, while planetB has one. They send a space probe with a clock to B, let it stay

13The example is described in Einstein’s paper “The Foundation of the GeneralTheory of Relativity.” An excerpt, which includes the example, is given onp. 372.

Section 3.6 The metric in general relativity 115

Page 116: General Relativity

on B’s surface for a few years, and then order it to return. Whenthe clock is back in the lab, they compare it with another clock thatstayed in the lab on planet A, and they find that less time has elapsedaccording to the one that spent time on B’s surface. They concludethat planet B is rotating more quickly than planet A, and that themotion of B’s surface was the cause of the observed time dilation.This resolution of the apparent paradox depends specifically on theLorentzian form of the local geometry of spacetime; it is not availablein, e.g., Cartan’s curved-spacetime description of Newtonian gravity(see page 41).

Einstein’s original, incorrect use of this example sprang from hisinterest in the ideas of the physicist and philosopher Ernst Mach.Mach had a somewhat ill-defined idea that since motion is only awell-defined notion when we speak of one object moving relativeto another object, the inertia of an object must be caused by theinfluence of all the other matter in the universe. Einstein referredto this as Mach’s principle. Einstein’s false starts in constructinggeneral relativity were frequently related to his attempts to make histheory too “Machian.” Section 8.3 on p. 330 discusses an alternative,more Machian theory of gravity proposed by Brans and Dicke in1951.

3.7 Interpretation of coordinate independenceThis section discusses some of the issues that arise in the inter-pretation of coordinate independence. It can be skipped on a firstreading.

3.7.1 Is coordinate independence obvious?

One often hears statements like the following from relativists:“Coordinate independence isn’t really a physical principle. It’smerely an obvious statement about the relationship between math-ematics and the physical universe. Obviously the universe doesn’tcome equipped with coordinates. We impose those coordinates onit, and the way in which we do so can never be dictated by nature.”The impressionable reader who is tempted to say, “Ah, yes, that isobvious,” should consider that it was far from obvious to Newton(“Absolute, true and mathematical time, of itself, and from its ownnature flows equably without regard to anything external . . . ”), norwas it obvious to Einstein. Levi-Civita nudged Einstein in the di-rection of coordinate independence in 1912. Einstein tried hard tomake a coordinate-independent theory, but for reasons described insection 3.6.1 (p. 114), he convinced himself that that was a deadend. In 1914-15 he published theories that were not coordinate-independent, which you will hear relativists describe as “obvious”dead ends because they lack any geometrical interpretation. It seemsto me that it takes a highly refined intuition to regard as intuitively

116 Chapter 3 Differential Geometry

Page 117: General Relativity

“obvious” an issue that Einstein struggled with like Jacob wrestlingwith Elohim.

3.7.2 Is coordinate independence trivial?

It has also been alleged that coordinate independence is trivial.To gauge the justice of this complaint, let’s distinguish between tworeasons for caring about coordinate independence:

1. Coordinate independence tells us that when we solve problems,we should avoid writing down any equations in notation thatisn’t manifestly intrinsic, and avoid interpreting those equa-tions as if the coordinates had intrinsic meaning. Violatingthis advice doesn’t guarantee that you’ve made a mistake, butit makes it much harder to tell whether or not you have.

2. Coordinate independence can be used as a criterion for judgingwhether a particular theory is likely to be successful.

Nobody questions the first justification. The second is a little trick-ier. Laying out the general theory systematically in a 1916 paper,14

Einstein wrote “The general laws of nature are to be expressed byequations which hold good for all the systems of coordinates, that is,are covariant with respect to any substitutions whatever (generallycovariant).” In other words, he was explaining why, with hindsight,his 1914-1915 coordinate-dependent theory had to be a dead end.

The only trouble with this is that Einstein’s way of posing thecriterion didn’t quite hit the nail on the head mathematically. AsHilbert famously remarked, “Every boy in the streets of Gottingenunderstands more about four-dimensional geometry than Einstein.Yet, in spite of that, Einstein did the work and not the mathemati-cians.” What Einstein had in mind was that a theory like Newtonianmechanics not only lacks coordinate independence, but would alsobe impossible to put into a coordinate-independent form withoutmaking it look hopelessly complicated and ugly, like putting lipstickon a pig. But Kretschmann showed in 1917 that any theory couldbe put in coordinate independent form, and Cartan demonstrated in1923 that this could be done for Newtonian mechanics in a way thatdidn’t come out particularly ugly. Physicists today are more apt topose the distinction in terms of “background independence” (mean-ing that a theory should not be phrased in terms of an assumed ge-ometrical background) or lack of a “prior geometry” (meaning thatthe curvature of spacetime should come from the solution of fieldequations rather than being imposed by fiat). But these concepts aswell have resisted precise mathematical formulation.15 My feelingis that this general idea of coordinate independence or background

14see p. 37615Giulini, “Some remarks on the notions of general covariance and background

independence,” arxiv.org/abs/gr-qc/0603087v1

Section 3.7 Interpretation of coordinate independence 117

Page 118: General Relativity

a / Since magnetic field linescan never intersect, a magneticfield pattern contains coordinate-independent information in theform of the knotting of the lines.This figure shows the mag-netic field pattern of the starSU Aurigae, as measured byZeeman-Doppler imaging (Petitat al.). White lines representmagnetic field lines that closeupon themselves in the immedi-ate vicinity of the star; blue linesare those that extend out into theinterstellar medium.

independence is like the equivalence principle: a crucial conceptualprinciple that doesn’t lose its importance just because we can’t putit in a mathematical box with a ribbon and a bow. For example,string theorists take it as a serious criticism of their theory that it isnot manifestly background independent, and one of their goals is toshow that it has a background independence that just isn’t obviouson the surface.

3.7.3 Coordinate independence as a choice of gauge

It is instructive to consider coordinate independence from thepoint of view of a field theory. Newtonian gravity can be describedin three equivalent ways: as a gravitational field g, as a gravitationalpotential φ, or as a set of gravitational field lines. The field lines arenever incident on one another, and locally the field satisfies Poisson’sequation.

The electromagnetic field has polarization properties differentfrom those of the gravitational field, so we describe it using eitherthe two fields (E, B), a pair of potentials,16 or two sets of fieldlines. There are similar incidence conditions and local field equations(Maxwell’s equations).

Gravitational fields in relativity have polarization properties un-known to Newton, but the situation is qualitatively similar to thetwo foregoing cases. Now consider the analogy between electromag-netism and relativity. In electromagnetism, it is the fields that aredirectly observable, so we expect the potentials to have some extrin-sic properties. We can, for example, redefine our electrical ground,Φ→ Φ + C, without any observable consequences. As discussed inmore detail in section 5.6.1 on page 173, it is even possible to modifythe electromagnetic potentials in an entirely arbitrary and nonlinearway that changes from point to point in spacetime. This is called agauge transformation. In relativity, the gauge transformations arethe smooth coordinate transformations. These gauge transforma-tions distort the field lines without making them cut through oneanother.

16There is the familiar electrical potential φ, measured in volts, but also avector potential A, which you may or may not have encountered. Briefly, theelectric field is given not by −∇φ but by −∇φ−∂A/∂t, while the magnetic fieldis the curl of A. This is introduced at greater length in section 4.2.5 on page137.

118 Chapter 3 Differential Geometry

Page 119: General Relativity

Problems1 Consider a spacetime that is locally exactly like the stan-dard Lorentzian spacetime described in ch. 2, but that has a globalstructure differing in the following way from the one we have im-plicitly assumed. This spacetime has global property G: Let twomaterial particles have world-lines that coincide at event A, withsome nonzero relative velocity; then there may be some event B inthe future light-cone of A at which the particles’ world-lines coincideagain. This sounds like a description of something that we wouldexpect to happen in curved spacetime, but let’s see whether thatis necessary. We want to know whether this violates the flat-spaceproperties L1-L5 on page 400, if those properties are taken as local.(a) Demonstrate that it does not violate them, by using a model inwhich space “wraps around” like a cylinder.(b) Now consider the possibility of interpreting L1-L5 as global state-ments. Do spacetimes with property G always violate L3 if L3 istaken globally? . Solution, p. 380

2 Usually in relativity we pick units in which c = 1. Suppose,however, that we want to use SI units. The convention is that co-ordinates are written with upper indices, so that, fixing the usualCartesian coordinates in 1+1 dimensions of spacetime, an infinites-imal displacement between two events is notated (dst, dsx). In SIunits, the two components of this vector have different units, whichmay seem strange but is perfectly legal. Describe the form of themetric, including the units of its elements. Describe the lower-indexvector dsa. . Solution, p. 380

3 (a) Explain why the following expressions ain’t got goodgrammar: Uaa, x

aya, pa−qa. (Recall our notational convention thatLatin indices represent abstract indices, so that it would not makesense, for example, to interpret Uaa as U ’s ath diagonal elementrather than as an implied sum.)(b) Which of these could also be nonsense in terms of units?

. Solution, p. 381

4 Suppose that a mountaineer describes her location using co-ordinates (θ,φ,h), representing colatitude, longitude, and altitude.Infer the units of the components of dsa and of the elements of gaband gab. Given that the units of mechanical work should be newton-meters (cf 5, p. 48), infer the components of a force vector Fa andits upper-index version F a. . Solution, p. 381

5 Generalize figure h/2 on p. 48 to three dimensions.. Solution, p. 381

Problems 119

Page 120: General Relativity

6 Suppose you have a collection of pencils, some of whichhave been sharpened more times than others so that they they’reshorter. You toss them all on the floor in random orientations,and you’re then allowed to slide them around but not to rotatethem. Someone asks you to make up a definition of whether ornot a given set of three pencils “cancels.” If all pencils are treatedequally (i.e., order doesn’t matter), and if we respect the rotationalinvariance of Euclidean geometry, then you will be forced to reinventvector addition and define cancellation of pencils p, q, and r asp + q + r = 0. Do something similar with “pencil” replaced by “anoriented pairs of lines as in figure h/2 on p. 48.

7 Describe the quantity gaa. (Note the repeated index.). Solution, p. 381

8 Example 16 on page 112 discusses the discontinuity thatwould result if one attempted to define a time coordinate for theGPS system that was synchronized globally according to observersin the rotating frame, in the sense that neighboring observers couldverify the synchronization by exchanging electromagnetic signals.Calculate this discontinuity at the equator, and estimate the re-sulting error in position that would be experienced by GPS users.

. Solution, p. 381

9 Resolve the following paradox.

Equation [3] on page 111 claims to give the metric obtained by an ob-server on the surface of a rotating disk. This metric is shown to leadto a non-Euclidean value for the ratio of the circumference of a circleto its radius, so the metric is clearly non-Euclidean. Therefore a lo-cal observer should be able to detect violations of the Pythagoreantheorem.

And yet this metric was originally derived by a series of changesof coordinates, starting from the Euclidean metric in polar coordi-nates, as derived in example 7 on page 102. Section 3.4 (p. 95)argued that the intrinsic measurements available in relativity arenot capable of detecting an arbitrary smooth, one-to-one change ofcoordinates. This contradicts our earlier conclusion that there arelocally detectable violations of the Pythagorean theorem.

. Solution, p. 381

10 This problem deals with properties of the metric [3] on page111. (a) A pulse of collimated light is emitted from the center ofthe disk in a certain direction. Does the spatial track of the pulseform a geodesic of this metric? (b) Characterize the behavior of thegeodesics near r = 1/ω. (c) An observer at rest with respect to thesurface of the disk proposes to verify the non-Euclidean nature ofthe metric by doing local tests in which right triangles are formedout of laser beams, and violations of the Pythagorean theorem aredetected. Will this work? . Solution, p. 382

120 Chapter 3 Differential Geometry

Page 121: General Relativity

11 In the early decades of relativity, many physicists were in thehabit of speaking as if the Lorentz transformation described what anobserver would actually “see” optically, e.g., with an eye or a camera.This is not the case, because there is an additional effect due to opti-cal aberration: observers in different states of motion disagree aboutthe direction from which a light ray originated. This is analogousto the situation in which a person driving in a convertible observesraindrops falling from the sky at an angle, even if an observer on thesidewalk sees them as falling vertically. In 1959, Terrell and Penroseindependently provided correct analyses,17 showing that in realityan object may appear contracted, expanded, or rotated, dependingon whether it is approaching the observer, passing by, or receding.The case of a sphere is especially interesting. Consider the followingfour cases:

A The sphere is not rotating. The sphere’s center is at rest. Theobserver is moving in a straight line.

B The sphere is not rotating, but its center is moving in a straightline. The observer is at rest.

C The sphere is at rest and not rotating. The observer movesaround it in a circle whose center coincides with that of thesphere.

D The sphere is rotating, with its center at rest. The observer isat rest.

Penrose showed that in case A, the outline of the sphere is stillseen to be a circle, although regions on the sphere’s surface appeardistorted.

What can we say about the generalization to cases B, C, and D?. Solution, p. 382

12 This problem involves a relativistic particle of mass m whichis also a wave, as described by quantum mechanics. Let c = 1 and~ = 1 throughout. Starting from the de Broglie relations E = ωand p = k, where k is the wavenumber, find the dispersion relationconnecting ω to k. Calculate the group velocity, and verify that it isconsistent with the usual relations p = mγv and E = mγ for m > 0.What goes wrong if you instead try to associate v with the phasevelocity? . Solution, p. 382

17James Terrell, “Invisibility of the Lorentz Contraction,” Physical Review 116(1959) 1045. Roger Penrose, “The Apparent Shape of a Relativistically MovingSphere,” Proceedings of the Cambridge Philosophical Society 55 (1959) 139.

Problems 121

Page 122: General Relativity

122 Chapter 3 Differential Geometry

Page 123: General Relativity

Chapter 4

Tensors

We now have enough machinery to be able to calculate quite a bit ofinteresting physics, and to be sure that the results are actually mean-ingful in a relativistic context. The strategy is to identify relativisticquantities that behave as Lorentz scalars and Lorentz vectors, andthen combine them in various ways. The notion of a tensor has beenintroduced on page 101. A Lorentz scalar is a tensor of rank 0, anda Lorentz vector is a rank-1 tensor.

4.1 Lorentz scalarsA Lorentz scalar is a quantity that remains invariant under both spa-tial rotations and Lorentz boosts. Mass is a Lorentz scalar.1 Elec-tric charge is also a Lorentz scalar, as demonstrated to extremelyhigh precision by experiments measuring the electrical neutrality ofatoms and molecules to a relative precision of better than 10−20; theelectron in a hydrogen atom has typically velocities of about 1/100,and those in heavier elements such as uranium are highly relativis-tic, so any violation of Lorentz invariance would give the atoms anonvanishing net electric charge.

The time measured by a clock traveling along a particular world-line from one event to another is something that all observers willagree upon; they will simply note the mismatch with their ownclocks. It is therefore a Lorentz scalar. This clock-time as measuredby a clock attached to the moving body in question is often referredto as proper time, “proper” being used here in the somewhat archaicsense of “own” or “self,” as in “The Vatican does not lie within Italyproper.” Proper time, which we notate τ , can only be defined fortimelike world-lines, since a lightlike or spacelike world-line isn’tpossible for a material clock.

More generally, when we express a metric as ds2 = . . ., thequantity ds is a Lorentz scalar. In the special case of a timelikeworld-line, ds and dτ are the same thing. (In books that use a−+ ++ metric, one has ds = −dτ .)

Even more generally, affine parameters, which exist independentof any metric at all, are scalars. As a trivial example, if τ is aparticular object’s proper time, then τ is a valid affine parameter,

1Some older books define mass as transforming according to m→ γm, whichcan be made to give a self-consistent theory, but is ugly.

123

Page 124: General Relativity

but so is 2τ+7. Less trivially, a photon’s proper time is always zero,but one can still define an affine parameter along its trajectory. Wewill need such an affine parameter, for example, in section 6.2.7,page 222, when we calculate the deflection of light rays by the sun,one of the early classic experimental tests of general relativity.

Another example of a Lorentz scalar is the pressure of a perfectfluid, which is often assumed as a description of matter in cosmo-logical models.

Infinitesimals and the clock “postulate” Example: 1At the beginning of chapter 3, I motivated the use of infinitesimalsas useful tools for doing differential geometry in curved space-time. Even in the context of special relativity, however, infinitesi-mals can be useful. One way of expressing the proper time accu-mulated on a moving clock is

s =∫

ds

=∫ √

gi j dx i dx j

=∫ √

1−(

dxdt

)2

−(

dydt

)2

−(

dzdt

)2

dt ,

which only contains an explicit dependence on the clock’s veloc-ity, not its acceleration. This is an example of the clock “postulate”referred to in the remark at the end of homework problem 1 onpage 84. Note that the clock postulate only applies in the limit ofa small clock. This is represented in the above equation by theuse of infinitesimal quantities like dx .

4.2 Four-vectors4.2.1 The velocity and acceleration four-vectors

Our basic Lorentz vector is the spacetime displacement dxi. Anyother quantity that has the same behavior as dxi under rotationsand boosts is also a valid Lorentz vector. Consider a particle movingthrough space, as described in a Lorentz frame. Since the particlemay be subject to nongravitational forces, the Lorentz frame can-not be made to coincide (except perhaps momentarily) with theparticle’s rest frame. If dxi is not lightlike, then the correspondinginfinitesimal proper time interval dτ is nonzero. As with Newtonianthree-vectors, dividing a four-vector by a Lorentz scalar producesanother quantity that transforms as a four-vector, so dividing theinfinitesimal displacement by a nonzero infinitesimal proper timeinterval, we have the four-velocity vector vi = dxi/dτ , whose com-ponents in a Lorentz coordinate system are (γ, γu1, γu2, γu3), where(u1,u2,u3) is the ordinary three-component velocity vector as de-fined in classical mechanics. The four-velocity’s squared magnitude

124 Chapter 4 Tensors

Page 125: General Relativity

vivi is always exactly 1, even though the particle is not moving atthe speed of light. (If it were moving at the speed of light, we wouldhave dτ = 0, and v would be undefined.)

When we hear something referred to as a “vector,” we usuallytake this is a statement that it not only transforms as a vector, butalso that it adds as a vector. But we have already seen in section2.3.1 on page 66 that even collinear velocities in relativity do notadd linearly; therefore they clearly cannot add linearly when dressedin the clothing of four-vectors. We’ve also seen in section 2.5.3 thatthe combination of non-collinear boosts is noncommutative, and isgenerally equivalent to a boost plus a spatial rotation; this is alsonot consistent with linear addition of four-vectors. At the risk ofbeating a dead horse, a four-velocity’s squared magnitude is always1, and this is not consistent with being able to add four-velocityvectors.

A zero velocity vector? Example: 2. Suppose an object has a certain four-velocity v i in a certainframe of reference. Can we transform into a different frame inwhich the object is at rest, and its four-velocity is zero?

. No. In general, the Lorentz transformation preserves the mag-nitude of vectors, so it can never transform a vector with a zeromagnitude into one with nonzero magnitude. Since this is a ma-terial object (not a ray of light) we can transform into a frame inwhich the object is at rest, but an object at rest does not have avanishing four-velocity. It has a four-velocity of (1, 0, 0, 0).

Example 2 suggests a nice way of thinking about velocity vectors,which is that every velocity vector represents a potential observer.An observer is a material object, and therefore has a timelike veloc-ity vector. This observer writes her own velocity vector as (1, 0, 0, 0),i.e., as the unit vector in the timelike direction. Often when we seean expression involving a velocity vector, we can interpret it as de-scribing a measurement taken by a specific observer.

Orthogonality as simultaneity Example: 3In a space where the inner product can be negative, orthogonalitydoesn’t mean what our euclidean intuition thinks it means. For ex-ample, a lightlike vector can be orthogonal to itself — a situationthat never occurs in a euclidean space.

Suppose we have a timelike vector t and a spacelike one x. Whatwould it mean for t and x to be orthogonal, with t·x = 0? Since t istimelike, we can make a unit vector t = t/|t| out of it, and interprett as the velocity vector of some hypothetical observer. We thenknow that in that observer’s frame, t is simply a unit vector alongthe time axis. It now becomes clear that x must be parallel to thex axis, i.e., it represents a displacement between two events thatthis observer considers to be simultaneous.

Section 4.2 Four-vectors 125

Page 126: General Relativity

This is an example of the idea that expressions involving velocityvectors can be interpreted as measurements taken by a certainobserver. The expression t · x = 0 can be interpreted as meaningthat according to an observer whose world-line is tangent to t, xrepresents a relationship of simultaneity.

The four-acceleration is found by taking a second derivative withrespect to proper time. Its squared magnitude is only approximatelyequal to minus the squared magnitude of the Newtonian accelerationthree-vector, in the limit of small velocities.

Constant acceleration Example: 4. Suppose a spaceship moves so that the acceleration is judgedto be the constant value a by an observer on board. Find themotion x(t) as measured by an observer in an inertial frame.

. Let τ stand for the ship’s proper time, and let dots indicatederivatives with respect to τ. The ship’s velocity has magnitude1, so

t2 − x2 = 1 .

An observer who is instantaneously at rest with respect to theship judges is to have a four-acceleration (0, a, 0, 0) (because thelow-velocity limit applies). The observer in the (t , x) frame agreeson the magnitude of this vector, so

t2 − x2 = −a2 .

The solution of these differential equations is t = 1a sinh aτ,

x = 1a cosh aτ, and eliminating τ gives

x =1a

√1 + a2t2 .

As t approaches infinity, dx/dt approaches the speed of light.

4.2.2 The momentum four-vector

Definition for a material particle

If we hope to find something that plays the role of momentumin relativity, then the momentum three-vector probably needs tobe generalized to some kind of four-vector. If so, then the law ofconservation of momentum will be valid regardless of one’s frame ofreference, which is necessary.2

If we are to satisfy the correspondence principle then the rela-tivistic definition of momentum should probably look as much aspossible like the nonrelativistic one. In subsection 4.2.1, we definedthe velocity four-vector in the case of a particle whose dxi is not

2We are not guaranteed that this is the right way to proceed, since the con-verse is not true: some three-vectors such as the electric and magnetic fields areembedded in rank-2 tensors in more complicated ways than this. See section4.2.4, p. 136.

126 Chapter 4 Tensors

Page 127: General Relativity

lightlike. Let’s assume for the moment that it makes sense to thinkof mass as a scalar. As with Newtonian three-vectors, multiplyinga Lorentz scalar by a four-vector vector produces another quantitythat transforms as a four-vector. We therefore conjecture that thefour-momentum of a material particle can be defined as pi = mvi,which in Lorentz coordinates is (mγ,mγv1,mγv2,mγv3). There isno a priori guarantee that this is right, but it’s the most reasonablething to guess. It needs to be checked against experiment, and alsofor consistency with the other parts of our theory.

The spacelike components look like the classical momentum vec-tor multiplied by a factor of γ, the interpretation being that to anobserver in this frame, the moving particle’s inertia is increased rel-ative to its value in the particle’s rest frame. Such an effect is indeedobserved experimentally. This is why particle accelerators are so bigand expensive. As the particle approaches the speed of light, γ di-verges, so greater and greater forces are needed in order to producethe same acceleration. In relativistic scattering processes with ma-terial particles, we find empirically that the four-momentum we’vedefined is conserved, which confirms that our conjectures above arevalid, and in particular that the quantity we’re calling m can betreated as a Lorentz scalar, and this is what all physicists do today.The reader is cautioned, however, that up until about 1950, it wascommon to use the word “mass” for the combination mγ (whichis what occurs in the Lorentz-coordinate form of the momentumvector), while referring to m as the “rest mass.” This archaic termi-nology is only used today in some popular-level books and low-levelschool textbooks.

Equivalence of mass and energy

The momentum four-vector has locked within it the reason forEinstein’s famous E = mc2, which in our relativistic units becomessimply E = m. To see why, consider the experimentally measuredinertia of a physical object made out of atoms. The subatomicparticles are all moving, and many of the velocities, e.g., the ve-locities of the electrons, are quite relativistic. This has the effectof increasing the experimentally determined inertial mass of thewhole object, by a factor of γ averaged over all the particles — eventhough the masses of the individual particles are invariant Lorentzscalars. (This same increase must also be observed for the gravita-tional mass, based on the equivalence principle as verified by Eotvosexperiments.)

Now if the object is heated, the velocities will increase on theaverage, resulting in a further increase in its mass. Thus, a certainamount of heat energy is equivalent to a certain amount of mass.But if heat energy contributes to mass, then the same must be truefor other forms of energy. For example, suppose that heating leads toa chemical reaction, which converts some heat into electromagnetic

Section 4.2 Four-vectors 127

Page 128: General Relativity

binding energy. If one joule of binding energy did not convert tothe same amount of mass as one joule of heat, then this would allowthe object to spontaneously change its own mass, and then by con-servation of momentum it would have to spontaneously change itsown velocity, which would clearly violate the principle of relativity.We conclude that mass and energy are equivalent, both inertiallyand gravitationally. In relativity, neither is separately conserved;the conserved quantity is their sum, referred to as the mass-energy,E. An alternative derivation, by Einstein, is given in example 16 onpage 135.

Energy is the timelike component of the four-momentum

The Lorentz transformation of a zero vector is always zero. Thismeans that the momentum four-vector of a material object can’tequal zero in the object’s rest frame, since then it would be zeroin all other frames as well. So for an object of mass m, let itsmomentum four-vector in its rest frame be (f(m), 0, 0, 0), where fis some function that we need to determine, and f can depend onlyon m since there is no other property of the object that can bedynamically relevant here. Since conservation laws are additive, fhas to be f(m) = km for some universal constant k. In where c = 1,k is unitless. Since we want to recover the appropriate Newtonianlimit for massive bodies, and since vt = 1 in that limit, we need k =1. Transforming the momentum four-vector from the particle’s restframe into some other frame, we find that the timelike componentis no longer m. We interpret this as the relativistic mass-energy, E.

Since the momentum four-vector was obtained from the magni-tude-1 velocity four-vector through multiplication by m, its squaredmagnitude pipi is equal to the square of the particle’s mass. Writingp for the magnitude of the momentum three-vector, and E for themass-energy, we find the useful relation m2 = E2−p2. We take thisto be the relativistic definition the mass of any particle, includingone whose dxi is lightlike.

Particles traveling at c

The definition of four-momentum as pi = mvi only works forparticles that move at less than c. For those that move at c, thefour-velocity is undefined. As we’ll see in example 6 on p. 129, thisclass of particles is exactly those that are massless. As shown onp. 32, the three-momentum of a light wave is given by p = E. Thefact that this momentum is nonzero implies that for light pi = mvi

represents an indeterminate form. The fact that this momentumequals E is consistent with our definition of mass as m2 = E2 − p2.

Mass is not additive

Since the momentum four-vector pa is additive, and our defini-tion of mass as papa depends on the vector in a nonlinear way, itfollows that mass is not additive (even for particles that are not

128 Chapter 4 Tensors

Page 129: General Relativity

interacting but are simply considered collectively).

Mass of two light waves Example: 5Let the momentum of a certain light wave be (pt , px ) = (E , E),and let another such wave have momentum (E ,−E). The totalmomentum is (2E , 0). Thus this pair of massless particles has acollective mass of 2E .

Massless particles travel at c Example: 6We demonstrate this by showing that if we suppose the opposite,then there are two different consequences, either of which wouldbe physically unacceptable.

When a particle does have a nonvanishing mass, we have

limE/m→∞

|v | = limE/m→∞

|p|E

= 1 .

Thus if we had a massless particle with |v | 6= 1, its behaviorwould be different from the limiting behavior of massive particles.But this is physically unacceptable because then we would havea magic method for detecting arbitrarily small masses such as10−10000000000 kg. We don’t actually know that the photon, forexample, is exactly massless; see example 13 on p. 131.

Furthermore, suppose that a massless particle had |v | < 1 inthe frame of some observer. Then some other observer couldbe at rest relative to the particle. In such a frame, the particle’sthree-momentum p is zero by symmetry, since there is no pre-ferred direction for it. Then E2 = p2 + m2 is zero as well, sothe particle’s entire energy-momentum four-vector is zero. Buta four-vector that vanishes in one frame also vanishes in everyother frame. That means we’re talking about a particle that can’tundergo scattering, emission, or absorption, and is therefore un-detectable by any experiment. This is physically unacceptablebecause we don’t consider phenomena (e.g., invisible fairies) tobe of physical interest if they are undetectable even in principle.

Gravitational redshifts Example: 7Since a photon’s energy E is equivalent to a certain gravitationalmass m, photons that rise or fall in a gravitational field mustlose or gain energy, and this should be observed as a redshiftor blueshift in the frequency. We expect the change in gravita-tional potential energy to be E∆φ, giving a corresponding oppo-site change in the photon’s energy, so that ∆E/E = ∆φ. In metricunits, this becomes ∆E/E = ∆φ/c2, and in the field near theEarth’s surface we have ∆E/E = gh/c2. This is the same re-sult that was found in section ?? based only on the equivalenceprinciple, and verified experimentally by Pound and Rebka as de-scribed in section ??.

Constraints on polarization Example: 8We observe that electromagnetic waves are always polarized

Section 4.2 Four-vectors 129

Page 130: General Relativity

transversely, never longitudinally. Such a constraint can only ap-ply to a wave that propagates at c. If it applied to a wave thatpropagated at less than c, we could move into a frame of refer-ence in which the wave was at rest. In this frame, all directions inspace would be equivalent, and there would be no way to decidewhich directions of polarization should be permitted. For a wavethat propagates at c, there is no frame in which the wave is at rest(see p. 98).

Relativistic work-energy theorem Example: 9In Einstein’s original 1905 paper on relativity, he assumed withoutproviding any justification that the Newtonian work-energy rela-tion W = Fd was valid relativistically. One way of justifying this isthat we can construct a simple machine with a mechanical advan-tage A and a reduction of motion by 1/A, with these ratios beingexact relativistically.3 One can then calculate, as Einstein did,

W =∫

dpdt

dx =∫

dpdv

dxdt

dv = m(γ− 1) ,

which is consistent with our result for E as a function of γ if weequate it to E(γ)− E(1).

The Dirac sea Example: 10A great deal of physics can be derived from the T.H. White’s

principle that “whatever is not forbidden in compulsory” — orig-inally intended for ants but applied to particles by Gell-Mann.In quantum mechanics, any process that is not forbidden by aconservation law is supposed to occur. The relativistic relationE = ±

√p2 + m2 has two roots, a positive one and a negative one.

The positive-energy and negative-energy states are separated bya no-man’s land of width 2m, so no continuous classical processcan lead from one side to the other. But quantum-mechanically, ifan electron exists with energy E = +

√p2 + m2, it should be able to

make a quantum leap into a state with E = −√

p2 + m2, emittingthe energy difference of 2E in the form of photons. Why doesn’tthis happen? One explanation is that the states with E < 0 are allalready occupied. This is the “Dirac sea,” which we now interpretas being full of electrons. A vacancy in the sea manifests itself asan antielectron.

Massive neutrinos Example: 11Neutrinos were long thought to be massless, but are now believedto have masses in the eV range. If they had been massless, theywould always have had to propagate at the speed of light. Al-though they are now thought to have mass, that mass is six or-ders of magnitude less than the MeV energy scale of the nuclearreactions in which they are produced, so all neutrinos observedin experiments are moving at velocities very close to the speed oflight.3For an explicit example, see bit.ly/1aUXIa8.

130 Chapter 4 Tensors

Page 131: General Relativity

No radioactive decay of massless particles Example: 12A photon cannot decay into an electron and a positron, γ→ e+ +e−, in the absence of a charged particle to interact with. To seethis, consider the process in the frame of reference in which theelectron-positron pair has zero total momentum. In this frame, thephoton must have had zero (three-)momentum, but a photon withzero momentum must have zero energy as well. This means thatconservation of relativistic four -momentum has been violated: thetimelike component of the four-momentum is the mass-energy,and it has increased from 0 in the initial state to at least 2mc2 inthe final state.

To demonstrate the consistency of the theory, we can arrive at thesame conclusion by a different method. Whenever a particle hasa small mass (small compared to its energy, say), it must travelat close to c. It must therefore have a very large time dilation,and will take a very long time to undergo radioactive decay. Inthe limit as the mass approaches zero, the time required for thedecay approaches infinity. Another way of saying this is that therate of radioactive decay must be fixed in terms of proper time,but there is no such thing as proper time for a massless particle.Thus it is not only this specific process that is forbidden, but anyradioactive decay process involving a massless particle.

There are various loopholes in this argument. The question isinvestigated more thoroughly by Fiore and Modanese.4

Massive photons Example: 13Continuing in the same vein as example 11, we can consider thepossibility that the photon has some nonvanishing mass. A 2003experiment by Luo et al.5 has placed a limit of about 10−54 kgon this mass. This is incredibly small, but suppose that future ex-perimental work using improved techniques shows that the massis less than this, but actually nonzero. A naive reaction to thisscenario is that it would shake relativity to its core, since relativityis based upon the assumption that the speed of light is a con-stant, whereas for a massive particle it need not be constant. Butthis is a misinterpretation of the role of c in relativity. As shouldbe clear from the approach taken in section 2.2, c is primarily ageometrical property of spacetime, not a property of light.

In reality, such a discovery would be more of a problem for parti-cle physicists than for relativists, as we can see by the followingsketch of an argument. Imagine two charged particles, at rest,interacting via an electrical attraction. Quantum mechanics de-

4http://arxiv.org/abs/hep-th/95080185Luo et al., “New Experimental Limit on the Photon Rest Mass with a Ro-

tating Torsion Balance,” Phys. Rev. Lett. 90 (2003) 081801. The interpretationof such experiments is difficult, and this paper attracted a series of comments. Aweaker but more universally accepted bound is 8× 10−52 kg, Davis, Goldhaber,and Nieto, Phys. Rev. Lett. 35 (1975) 1402.

Section 4.2 Four-vectors 131

Page 132: General Relativity

scribes this as an exchange of photons. Since the particles areat rest, there is no source of energy, so where do we get theenergy to make the photons? The Heisenberg uncertainty prin-ciple, ∆E∆t & h, allows us to steal this energy, provided that wegive it back within a time ∆t . This time limit imposes a limit onthe distance the photons can travel, but by using photons of lowenough energy, we can make this distance limit as large as welike, and there is therefore no limit on the range of the force. Butsuppose that the photon has a mass. Then there is a minimummass-energy mc2 required in order to create a photon, the max-imum time is h/mc2, and the maximum range is h/mc. Refiningthese crude arguments a little, one finds that exchange of zero-mass particles gives a force that goes like 1/r2, while a nonzeromass results in e−µr/r2, where µ−1 = ~/mc. For the photon,the best current mass limit corresponds to µ−1 & 1011 m, so thedeviation from 1/r2 would be difficult to measure in earthboundexperiments.

Now Gauss’s law is a specific characteristic of 1/r2 fields. It wouldbe violated slightly if photons had mass. We would have to modifyMaxwell’s equations, and it turns out6 that the necessary changeto Gauss’s law would be of the form ∇ · E = (. . .)ρ − (. . .)µ2Φ,where Φ is the electrical potential, and (. . . ) indicates factorsthat depend on the choice of units. This tells us that Φ, whichin classical electromagnetism can only be measured in terms ofdifferences between different points in space, can now be mea-sured in absolute terms. Gauge symmetry has been broken. Butgauge symmetry is indispensible in creating well-behaved rela-tivistic field theories, and this is the reason that, in general, parti-cle physicists have a hard time with forces arising from the ex-change of massive particles. The hypothetical Higgs particle,which may be observed at the Large Hadron Collider in the nearfuture, is essentially a mechanism for wriggling out of this difficultyin the case of the massive W and Z particles that are responsiblefor the weak nuclear force; the mechanism cannot, however, beextended to allow a massive photon.

Dust and radiation in cosmological models Example: 14In cosmological models, one needs an equation of state that re-

lates the pressure P to the mass-energy density ρ. The pressureis a Lorentz scalar. The mass-energy density is not (since mass-energy is just the timelike component of a particular vector), butin a coordinate system without any net flow of mass, we can ap-proximate it as one.

The early universe was dominated by radiation. A photon in abox contributes a pressure on each wall that is proportional to

6Goldhaber and Nieto, ”Terrestrial and Extraterrestrial Limits on The Pho-ton Mass,” Rev. Mod. Phys. 43 (1971) 277

132 Chapter 4 Tensors

Page 133: General Relativity

a / Example 14.

|pµ|, where µ is a spacelike index. In thermal equilibrium, each ofthese three degrees of freedom carries an equal amount of en-ergy, and since momentum and energy are equal for a masslessparticle, the average momentum along each axis is equal to 1

3E .The resulting equation of state is P = 1

3ρ. As the universe ex-panded, the wavelengths of the photons expanded in proportionto the stretching of the space they occupied, resulting in λ ∝ a−1,where a is a distance scale describing the universe’s intrinsic cur-vature at a fixed time. Since the number density of photons isdiluted in proportion to a−3, and the mass per photon varies asa−1, both ρ and P vary as a−4.

Cosmologists refer to noninteracting, nonrelativistic materials as“dust,” which could mean many things, including hydrogen gas,actual dust, stars, galaxies, and some forms of dark matter. Fordust, the momentum is negligible compared to the mass-energy,so the equation of state is P = 0, regardless of ρ. The mass-energy density is dominated simply by the mass of the dust, sothere is no red-shift scaling of the a−1 type. The mass-energydensity scales as a−3. Since this is a less steep dependence ona than the a−4, there was a point, about a thousand years afterthe Big Bang, when matter began to dominate over radiation. Atthis point, the rate of expansion of the universe made a transitionto a qualitatively different behavior resulting from the change inthe equation of state.

In the present era, the universe’s equation of state is dominatedby neither dust nor radiation but by the cosmological constant(see page 291). Figure a shows the evolution of the size of theuniverse for the three different regimes. Some of the simplercases are derived in sections 8.2.7 and 8.2.8, starting on page314.

4.2.3 The frequency vector and the relativistic Doppler shift

The frequency vector was introduced in example 8 on p. 50. Inthe spirit of index-gymnastics notation, frequency is to time as thewavenumber k = 1/λ is to space, so when treating waves relativis-tically it is natural to conjecture that there is a four-frequency famade by assembling (f , k), which behaves as a Lorentz vector. Thisis correct, since we already know that ∂a transforms as a covariantvector, and for a scalar wave of the form A = Ao exp [2πifax

a] thepartial derivative operator is identical to multiplication by 2πfa.

As an application, consider the relativistic Doppler shift of a lightwave. For simpicity, let’s restrict ourselves to one spatial dimension.For a light wave, f = k, so the frequency vector in 1+1 dimensionsis simply (f , f). Putting this through a Lorentz transformation, wefind

f ′ = (1 + v)γf =

√1 + v

1− vf ,

Section 4.2 Four-vectors 133

Page 134: General Relativity

where the second form displays more clearly the symmetic formof the relativistic relationship, such that interchanging the roles ofsource and observer is equivalent to flipping the sign of v. That is,the relativistic version only depends on the relative motion of thesource and the observer, whereas the Newtonian one also dependson the source’s motion relative to the medium (i.e., relative to thepreferred frame in which the waves have the “right” velocity). InNewtonian mechanics, we have f ′ = (1 + v)f for a moving observer.Relativistically, there is also a time dilation of the oscillation of thesource, providing an additional factor of γ.

This analysis is extended to 3+1 dimensions in problem 11.

Ives-Stilwell experiments Example: 15The relativistic Doppler shift differs from the nonrelativistic one bythe time-dilation factor γ, so that there is still a shift even whenthe relative motion of the source and the observer is perpen-dicular to the direction of propagation. This is called the trans-verse Doppler shift. Einstein suggested this early on as a testof relativity. However, such experiments are difficult to carry outwith high precision, because they are sensitive to any error inthe alignment of the 90-degree angle. Such experiments wereeventually performed, with results that confirmed relativity,7 butone-dimensional measurements provided both the earliest testsof the relativistic Doppler shift and the most precise ones to date.The first such test was done by Ives and Stilwell in 1938, using thefollowing trick. The relativistic expression Sv =

√(1 + v )/(1− v )

for the Doppler shift has the property that Sv S−v = 1, which differsfrom the nonrelativistic result of (1 + v )(1− v ) = 1− v2. One cantherefore accelerate an ion up to a relativistic speed, measureboth the forward Doppler shifted frequency ff and the backwardone fb, and compute

√ff fb. According to relativity, this should

exactly equal the frequency fo measured in the ion’s rest frame.

In a particularly exquisite modern version of the Ives-Stilwell idea,8

Saathoff et al. circulated Li+ ions at v = .064 in a storage ring.An electron-cooler technique was used in order to reduce thevariation in velocity among ions in the beam. Since the identitySv S−v = 1 is independent of v , it was not necessary to mea-sure v to the same incredible precision as the frequencies; it wasonly necessary that it be stable and well-defined. The natural linewidth was 7 MHz, and other experimental effects broadened it fur-ther to 11 MHz. By curve-fitting the line, it was possible to achieveresults good to a few tenths of a MHz. The resulting frequencies,

7See, e.g., Hasselkamp, Mondry, and Scharmann, Zeitschrift fur Physik A:Hadrons and Nuclei 289 (1979) 151.

8G. Saathoff et al., “Improved Test of Time Dilation in Relativity,” Phys.Rev. Lett. 91 (2003) 190403. A publicly available description of the experimentis given in Saathoff’s PhD thesis, www.mpi-hd.mpg.de/ato/homes/saathoff/

diss-saathoff.pdf.

134 Chapter 4 Tensors

Page 135: General Relativity

in units of MHz, were:

ff = 582490203.44± .09fb = 512671442.9± 0.5√

ff fb = 546466918.6± 0.3fo = 546466918.8± 0.4 (from previous experimental work)

The spectacular agreement with theory has made this experimenta lightning rod for anti-relativity kooks.

If one is searching for small deviations from the predictions ofspecial relativity, a natural place to look is at high velocities. Ives-Stilwell experiments have been performed at velocities as high as0.84, and they confirm special relativity.9

Einstein’s derivation of E = mc2 Example: 16On page 126, we showed that the celebrated E = mc2 follows di-rectly from the form of the Lorentz transformation. An alternativederivation was given by Einstein in one of his classic 1905 pa-pers laying out the theory of special relativity; the paper is short,and is reproduced in English translation on page 370 of this book.Having laid the groundwork of four-vectors and relativistic Dopplershifts, we can give an even shorter version of Einstein’s argument.The discussion is also streamlined by restricting the discussion to1+1 dimensions and by invoking photons.

Suppose that a lantern, at rest in the lab frame, is floating weight-lessly in outer space, and simultaneously emits two pulses oflight in opposite directions, each with energy E/2 and frequencyf . By symmetry, the momentum of the pulses cancels, and thelantern remains at rest. An observer in motion at velocity v rel-ative to the lab sees the frequencies of the beams shifted tof ′ = (1 ± v )γf . The effect on the energies of the beams canbe found purely classically, by transforming the electric and mag-netic fields to the moving frame, but as a shortcut we can ap-ply the quantum-mechanical relation Eph = hf for the energies ofthe photons making up the beams. The result is that the mov-ing observer finds the total energy of the beams to be not E but(E/2)(1 + v )γ + (E/2)(1− v )γ = Eγ.

Both observers agree that the lantern had to use up some of theenergy stored in its fuel in order to make the two pulses. Butthe moving observer says that in addition to this energy E , therewas a further energy E(γ − 1). Where could this energy havecome from? It must have come from the kinetic energy of thelantern. The lantern’s velocity remained constant throughout theexperiment, so this decrease in kinetic energy seen by the movingobserver must have come from a decrease in the lantern’s inertialmass — hence the title of Einstein’s paper, “Does the inertia of abody depend upon its energy content?”

9MacArthur et al., Phys. Rev. Lett. 56 (1986) 282 (1986)

Section 4.2 Four-vectors 135

Page 136: General Relativity

b / Magnetism is a purely rel-ativistic effect.

To figure out how much mass the lantern has lost, we have todecide how we can even define mass in this new context. InNewtonian mechanics, we had K = (1/2)mv2, and by the corre-spondence principle this must still hold in the low-velocity limit.Expanding E(γ − 1) in a Taylor series, we find that it equalsE(v2/2) + . . ., and in the low-velocity limit this must be the sameas ∆K = (1/2)∆mv2, so ∆m = E . Reinserting factors of c to getback to nonrelativistic units, we have E = ∆mc2.

4.2.4 A non-example: electric and magnetic fields

It is fairly easy to see that the electric and magnetic fields cannotbe the spacelike parts of two four-vectors. Consider the arrangementshown in figure b/1. We have two infinite trains of moving chargessuperimposed on the same line, and a single charge alongside theline. Even though the line charges formed by the two trains aremoving in opposite directions, their currents don’t cancel. A nega-tive charge moving to the left makes a current that goes to the right,so in frame 1, the total current is twice that contributed by eitherline charge.

In frame 1 the charge densities of the two line charges cancel out,and the electric field experienced by the lone charge is therefore zero.Frame 2 shows what we’d see if we were observing all this from aframe of reference moving along with the lone charge. Both linecharges are in motion in both frames of reference, but in frame1, the line charges were moving at equal speeds, so their Lorentzcontractions were equal, and their charge densities canceled out. Inframe 2, however, their speeds are unequal. The positive chargesare moving more slowly than in frame 1, so in frame 2 they areless contracted. The negative charges are moving more quickly, sotheir contraction is greater now. Since the charge densities don’tcancel, there is an electric field in frame 2, which points into thewire, attracting the lone charge.

We appear to have a logical contradiction here, because an ob-server in frame 2 predicts that the charge will collide with the wire,whereas in frame 1 it looks as though it should move with constantvelocity parallel to the wire. Experiments show that the charge doescollide with the wire, so to maintain the Lorentz-invariance of elec-tromagnetism, we are forced to invent a new kind of interaction, onebetween moving charges and other moving charges, which causes theacceleration in frame 2. This is the magnetic interaction, and if wehadn’t known about it already, we would have been forced to inventit. That is, magnetism is a purely relativistic effect. The reason arelativistic effect can be strong enough to stick a magnet to a re-frigerator is that it breaks the delicate cancellation of the extremelylarge electrical interactions between electrically neutral objects.

Although the example shows that the electric and magnetic fieldsdo transform when we change from one frame to another, it is easy

136 Chapter 4 Tensors

Page 137: General Relativity

c / The charged particle fol-lows a trajectory that extremizes∫

fb dxb compared to othernearby trajectories. Relativis-tically, the trajectory should beunderstood as a world-line in3+1-dimensional spacetime.

d / The magnetic field (top)and vector potential (bottom) ofa solenoid. The lower diagram isin the plane cutting through thewaist of the solenoid, as indicatedby the dashed line in the upperdiagram. For an infinite solenoid,the magnetic field is uniformon the inside and zero on theoutside, while the vector potentialis proportional to r on the insideand to 1/r on the outside.

to show that they do not transform as the spacelike parts of a rela-tivistic four-vector. This is because transformation between frames1 and 2 is along the axis parallel to the wire, but it affects the com-ponents of the fields perpendicular to the wire. The electromagneticfield actually transforms as a rank-2 tensor.

4.2.5 The electromagnetic potential four-vector

An electromagnetic quantity that does transform as a four-vectoris the potential. On page 118, I mentioned the fact, which may ormay not already be familiar to you, that whereas the Newtoniangravitational field’s polarization properties allow it to be describedusing a single scalar potential φ or a single vector field g = −∇φ,the pair of electromagnetic fields (E, B) needs a pair of potentials,Φ and A. It’s easy to see that Φ can’t be a Lorentz scalar. Elec-tric charge q is a scalar, so if Φ were a scalar as well, then theproduct qΦ would be a scalar. But this is equal to the energy ofthe charged particle, which is only the timelike component of theenergy-momentum four-vector, and therefore not a Lorentz scaleritself. This is a contradiction, so Φ is not a scalar.

To see how to fit Φ into relativity, consider the nonrelativisticquantum mechanical relation qΦ = hf for a charged particle in apotential Φ. Since f is the timelike component of a four-vector inrelativity, we need Φ to be the timelike component of some fourvector, Ab. For the spacelike part of this four-vector, let’s write A,so that Ab = (Φ, A). We can see by the following argument thatthis mysterious A must have something to do with the magneticfield.

Consider the example of figure c from a quantum-mechanicalpoint of view. The charged particle q has wave properties, but let’ssay that it can be well approximated in this example as following aspecific trajectory. This is like the ray approximation to wave optics.A light ray in classical optics follows Fermat’s principle, also knownas the principle of least time, which states that the ray’s path frompoint A to point B is one that extremizes the optical path length(essentially the number of oscillations). The reason for this is thatthe ray approximation is only an approximation. The ray actuallyhas some width, which we can visualize as a bundle of neighboringtrajectories. Only if the trajectory follows Fermat’s principle willthe interference among the neighboring paths be constructive. Theclassical optical path length is found by integrating k · ds, where kis the wavenumber. To make this relativistic, we need to use thefrequency four-vector to form fb dxb, which can also be expressedas fbv

b dτ = γ(f − k · v) dτ . If the charge is at rest and thereare no magnetic fields, then the quantity in parentheses is f =E/h = (q/h)Φ. The correct relativistic generalization is clearlyfb = (q/h)Ab.

Since Ab’s spacelike part, A, results in the velocity-dependent

Section 4.2 Four-vectors 137

Page 138: General Relativity

effects, we conclude that A is a kind of potential that relates to themagnetic field, in the same way that the potential Φ relates to theelectric field. A is known as the vector potential, and the relationbetween the potentials and the fields is

E = −∇Φ− ∂A

∂tB = ∇A .

An excellent discussion of the vector potential from a purely classicalpoint of view is given in the classic Feynman Lectures.10 Figure dshows an example.

4.3 The tensor transformation lawsWe may wish to represent a vector in more than one coordinatesystem, and to convert back and forth between the two represen-tations. In general relativity, the transformation of the coordinatesneed not be linear, as in the Lorentz transformations; it can be anysmooth, one-to-one function. For simplicity, however, we start byconsidering the one-dimensional case, and by assuming the coordi-nates are related in an affine manner, x′µ = axµ + b. The additionof the constant b is merely a change in the choice of origin, so ithas no effect on the components of the vector, but the dilation bythe factor a gives a change in scale, which results in v′µ = avµ for acontravariant vector. In the special case where v is an infinitesimaldisplacement, this is consistent with the result found by implicit dif-ferentiation of the coordinate transformation. For a contravariantvector, v′µ = 1

avµ. Generalizing to more than one dimension, and toa possibly nonlinear transformation, we have

v′µ = vκ∂x′µ

∂xκ[1]

v′µ = vκ∂xκ

∂x′µ.[2]

Note the inversion of the partial derivative in one equation comparedto the other. Because these equations describe a change from onecoordinate system to another, they clearly depend on the coordinatesystem, so we use Greek indices rather than the Latin ones thatwould indicate a coordinate-independent equation. Note that theletter µ in these equations always appears as an index referring tothe new coordinates, κ to the old ones. For this reason, we can getaway with dropping the primes and writing, e.g., vµ = vκ∂x′µ/∂xκ

rather than v′, counting on context to show that vµ is the vectorexpressed in the new coordinates, vκ in the old ones. This becomesespecially natural if we start working in a specific coordinate system

10The Feynman Lectures on Physics, Feynman, Leighton, and Sands, AddisonWesley Longman, 1970

138 Chapter 4 Tensors

Page 139: General Relativity

where the coordinates have names. For example, if we transformfrom coordinates (t,x, y, z) to (a, b, c, d), then it is clear that vt isexpressed in one system and vc in the other.

Self-check: Recall that the gauge transformations allowed in gen-eral relativity are not just any coordinate transformations; theymust be (1) smooth and (2) one-to-one. Relate both of these re-quirements to the features of the vector transformation laws above.

In equation [2], µ appears as a subscript on the left side of theequation, but as a superscript on the right. This would appearto violate our rules of notation, but the interpretation here is thatin expressions of the form ∂/∂xi and ∂/∂xi, the superscripts andsubscripts should be understood as being turned upside-down. Sim-ilarly, [1] appears to have the implied sum over κ written ungram-matically, with both κ’s appearing as superscripts. Normally weonly have implied sums in which the index appears once as a super-script and once as a subscript. With our new rule for interpretingindices on the bottom of derivatives, the implied sum is seen to bewritten correctly. This rule is similar to the one for analyzing theunits of derivatives written in Leibniz notation, with, e.g., d2 x/dt2

having units of meters per second squared. That is, the flipping ofthe indices like this is required for consistency so that everythingwill work out properly when we change our units of measurement,causing all our vector components to be rescaled.

A quantity v that transforms according to [1] or [2] is referredto as a rank-1 tensor, which is the same thing as a vector.

The identity transformation Example: 17In the case of the identity transformation x ′µ = xµ, equation [1]clearly gives v ′ = v , since all the mixed partial derivatives ∂x ′µ/∂xκ

with µ 6= κ are zero, and all the derivatives for κ = µ equal 1.

In equation [2], it is tempting to write

∂xκ

∂x ′µ=

1∂x ′µ∂xκ

(wrong!) ,

but this would give infinite results for the mixed terms! Only in thecase of functions of a single variable is it possible to flip deriva-tives in this way; it doesn’t work for partial derivatives. To evalu-ate these partial derivatives, we have to invert the transformation(which in this example is trivial to accomplish) and then take thepartial derivatives.

The metric is a rank-2 tensor, and transforms analogously:

gµν = gκλ∂xκ

∂x′µ∂xλ

∂x′ν

(writing g rather than g′ on the left, because context makes thedistinction clear).

Section 4.3 The tensor transformation laws 139

Page 140: General Relativity

Self-check: Write the similar expressions for gµν , gµν , and gνµ,which are entirely determined by the grammatical rules for writingsuperscripts and subscripts. Interpret the case of a rank-0 tensor.

An accelerated coordinate system? Example: 18Let’s see the effect on Lorentzian metric g of the transformation

t ′ = t x ′ = x +12

at2 .

The inverse transformation is

t = t ′ x = x ′ − 12

at ′2 .

The tensor transformation law gives

g′t ′t ′ = 1− (at ′)2

g′x ′x ′ = −1g′x ′t ′ = −at ′ .

Clearly something bad happens at at ′ = ±1, when the relativevelocity surpasses the speed of light: the t ′t ′ component of themetric vanishes and then reverses its sign. This would be physi-cally unreasonable if we viewed this as a transformation from ob-server A’s Lorentzian frame into the accelerating reference frameof observer B aboard a spaceship who feels a constant acceler-ation. Several things prevent such an interpretation: (1) B cannotexceed the speed of light. (2) Even before B gets to the speedof light, the coordinate t ′ cannot correspond to B’s proper time,which is dilated. (3) Due to time dilation, A and B do not agreeon the rate at which B is accelerating. If B measures her ownacceleration to be a′, A will judge it to be a < a′, with a→ 0 as Bapproaches the speed of light. There is nothing invalid about thecoordinate system (t ′, x ′), but neither does it have any physicallyinteresting interpretation.

Physically meaningful constant acceleration Example: 19To make a more physically meaningful version of example 18, weneed to use the result of example 4 on page 126. The some-what messy derivation of the coordinate transformation is givenby Semay.11 The result is

t ′ =(

x +1a

)sinh at

x ′ =(

x +1a

)cosh at

Applying the tensor transformation law gives (problem 7, page156):

g′t ′t ′ = (1 + ax ′)2

g′x ′x ′ = −111arxiv.org/abs/physics/0601179

140 Chapter 4 Tensors

Page 141: General Relativity

Unlike the result of example 18, this one never misbehaves.

The closely related topic of a uniform gravitational field in generalrelativity is considered in problem 6 on page 200.

Accurate timing signals Example: 20The relation between the potential A and the fields E and B

given on page 137 can be written in manifestly covariant form asFi j = ∂[iAj ], where F , called the electromagnetic tensor, is an an-tisymmetric rank-two tensor whose six independent componentscorrespond in a certain way with the components of the E andB three-vectors. If F vanishes completely at a certain point inspacetime, then the linear form of the tensor transformation lawsguarantees that it will vanish in all coordinate systems, not justone. The GPS system takes advantage of this fact in the trans-mission of timing signals from the satellites to the users. Theelectromagnetic wave is modulated so that the bits it transmitsare represented by phase reversals of the wave. At these phasereversals, F vanishes, and this vanishing holds true regardless ofthe motion of the user’s unit or its position in the earth’s gravita-tional field. Cf. problem 17 on p. 157.

Momentum wants a lower index Example: 21In example 5 on p. 48, we saw that once we arbitrarily chose towrite ruler measurements in Euclidean three-space as ∆xa ratherthan ∆xa, it became natural to think of the Newtonian force three-vector as “wanting” to be notated with a lower index. We cando something similar with the momentum 3- or 4-vector. TheLagrangian is a relativistic scalar, and in Lagrangian mechanicsmomentum is defined by pa = ∂L/∂va. The upper index in thedenominator on the right becomes a lower index on the left bythe same reasoning as was employed in the notation of the ten-sor transformation laws. Newton’s second law shows that this isconsistent with the result of example 5 on p. 48.

Section 4.3 The tensor transformation laws 141

Page 142: General Relativity

4.4 Experimental tests4.4.1 Universality of tensor behavior

The techniques developed in this chapter allow us to make a vari-ety of new predictions that can be tested by experiment. In general,the mathematical treatment of all observables in relativity as ten-sors means that all observables must obey the same transformationlaws. This is an extremely strict statement, because it requires thata wide variety of physical systems show identical behavior. For ex-ample, we already mentioned on page 74 the 2007 Gravity ProbeB experiment (discussed in detail on pages 170 and 214), in whichfour gyroscopes aboard a satellite were observed to precess due tospecial- and general-relativistic effects. The gyroscopes were compli-cated electromechanical systems, but the predicted precession wasentirely independent of these complications. We argued that if twodifferent types of gyroscopes displayed different behaviors, then theresulting discrepancy would allow us to map out some mysteriousvector field. This field would be a built-in characteristic of space-time (not produced by any physical objects nearby), and since allobservables in general relativity are supposed to be tensors, the fieldwould have to transform as a tensor. Let’s say that this tensor wasof rank 1. Since the tensor transformation law is linear, a nonzerotensor can never be transformed into a vanishing tensor in anothercoordinate system. But by the equivalence principle, any special,local property of spacetime can be made to vanish by transforminginto a free-falling frame of reference, in which the spacetime is has ageneric Lorentzian geometry. The mysterious new field should there-fore vanish in such a frame. This is a contradiction, so we concludethat different types of gyroscopes cannot differ in their behavior.

This is an example of a new way of stating the equivalence prin-ciple: there is no way to associate a preferred tensor field with space-time.12

4.4.2 Speed of light differing from c

In a Lorentz invariant theory, we interpret c as a property ofthe underlying spacetime, not of the particles that inhabit it. Oneway in which Lorentz invariance could be violated would be if dif-ferent types of particles had different maximum velocities. In 1997,Coleman and Glashow suggested a sensitive test for such an effect.13

Assuming Lorentz invariance, a photon cannot decay into anelectron and a positron, γ → e+ + e− (example 12, page 131).Suppose, however, that material particles have a maximum speedcm = 1, while photons have a maximum speed cp > 1. Then the pho-ton’s momentum four-vector, (E,E/cp) is timelike, so a frame does

12This statement of the equivalence principle, along with the others we haveencountered, is summarized in the back of the book on page 401.

13arxiv.org/abs/hep-ph/9703240

142 Chapter 4 Tensors

Page 143: General Relativity

exist in which its three-momentum is zero. The detection of cosmic-ray gammas from distant sources with energies on the order of 10TeV puts an upper limit on the decay rate, implying cp−1 . 10−15.

An even more stringent limit can be put on the possibility ofcp < 1. When a charged particle moves through a medium at a speedhigher than the speed of light in the medium, Cerenkov radiationresults. If cp is less than 1, then Cerenkov radiation could be emittedby high-energy charged particles in a vacuum, and the particleswould rapidly lose energy. The observation of cosmic-ray protonswith energies ∼ 108 TeV requires cp − 1 & −10−23.

4.4.3 Degenerate matter

The straightforward properties of the momentum four-vectorhave surprisingly far-reaching implications for matter subject to ex-treme pressure, as in a star that uses up all its fuel for nuclear fusionand collapses. These implications were initially considered too ex-otic to be taken seriously by astronomers. For historical perspective,consider that in 1916, when Einstein published the theory of gen-eral relativity, the Milky Way was believed to constitute the entireuniverse; the “spiral nebulae” were believed to be inside it, ratherthan being similar objects exterior to it. The only types of starswhose structure was understood even vaguely were those that wereroughly analogous to our own sun. (It was not known that nuclearfusion was their source of energy.) The term “white dwarf” had notbeen invented, and neutron stars were unknown.

An ordinary, smallish star such as our own sun has enough hy-drogen to sustain fusion reactions for billions of years, maintainingan equilibrium between its gravity and the pressure of its gases.When the hydrogen is used up, it has to begin fusing heavier el-ements. This leads to a period of relatively rapid fluctuations instructure. Nuclear fusion proceeds up until the formation of ele-ments as heavy as oxygen (Z = 8), but the temperatures are nothigh enough to overcome the strong electrical repulsion of these nu-clei to create even heavier ones. Some matter is blown off, but finallynuclear reactions cease and the star collapses under the pull of itsown gravity.

To understand what happens in such a collapse, we have to un-derstand the behavior of gases under very high pressures. In gen-eral, a surface area A within a gas is subject to collisions in a time tfrom the n particles occupying the volume V = Avt, where v is thetypical velocity of the particles. The resulting pressure is given byP ∼ npv/V , where p is the typical momentum.

Nondegenerate gas: In an ordinary gas such as air, the parti-cles are nonrelativistic, so v = p/m, and the thermal energyper particle is p2/2m ∼ kT , so the pressure is P ∼ nkT/V .

Section 4.4 Experimental tests 143

Page 144: General Relativity

a / Subrahmanyan Chan-drasekhar (1910-1995)

Nonrelativistic, degenerate gas: When a fermionic gas is sub-ject to extreme pressure, the dominant effects creating pres-sure are quantum-mechanical. Because of the Pauli exclu-sion principle, the volume available to each particle is ∼ V/n,so its wavelength is no more than ∼ (V/n)1/3, leading top = h/λ ∼ h(n/V )1/3. If the speeds of the particles are stillnonrelativistic, then v = p/m still holds, so the pressure be-comes P ∼ (h2/m)(n/V )5/3.

Relativistic, degenerate gas: If the compression is strong enoughto cause highly relativistic motion for the particles, then v ≈ c,and the result is P ∼ hc(n/V )4/3.

As a star with the mass of our sun collapses, it reaches a pointat which the electrons begin to behave as a degenerate gas, andthe collapse stops. The resulting object is called a white dwarf. Awhite dwarf should be an extremely compact body, about the sizeof the Earth. Because of its small surface area, it should emit verylittle light. In 1910, before the theoretical predictions had beenmade, Russell, Pickering, and Fleming discovered that 40 Eridani Bhad these characteristics. Russell recalled: “I knew enough aboutit, even in these paleozoic days, to realize at once that there wasan extreme inconsistency between what we would then have called‘possible’ values of the surface brightness and density. I must haveshown that I was not only puzzled but crestfallen, at this exceptionto what looked like a very pretty rule of stellar characteristics; butPickering smiled upon me, and said: ‘It is just these exceptionsthat lead to an advance in our knowledge,’ and so the white dwarfsentered the realm of study!”

S. Chandrasekhar showed in that 1930’s that there was an upperlimit to the mass of a white dwarf. We will recapitulate his calcu-lation briefly in condensed order-of-magnitude form. The pressureat the core of the star is P ∼ ρgr ∼ GM2/r4, where M is the totalmass of the star. The star contains roughly equal numbers of neu-trons, protons, and electrons, so M = Knm, where m is the mass ofthe electron, n is the number of electrons, and K ≈ 4000. For starsnear the limit, the electrons are relativistic. Setting the pressure atthe core equal to the degeneracy pressure of a relativistic gas, wefind that the Chandrasekhar limit is ∼ (hc/G)3/2(Km)−2 = 6M.A less sloppy calculation gives something more like 1.4M. The self-consistency of this solution is investigated in homework problem 15on page 157.

What happens to a star whose mass is above the Chandrasekharlimit? As nuclear fusion reactions flicker out, the core of the star be-comes a white dwarf, but once fusion ceases completely this cannot

144 Chapter 4 Tensors

Page 145: General Relativity

be an equilibrium state. Now consider the nuclear reactions

n→ p + e− + ν

p + e− → n + ν ,

which happen due to the weak nuclear force. The first of these re-leases 0.8 MeV, and has a half-life of 14 minutes. This explainswhy free neutrons are not observed in significant numbers in ouruniverse, e.g., in cosmic rays. The second reaction requires an inputof 0.8 MeV of energy, so a free hydrogen atom is stable. The whitedwarf contains fairly heavy nuclei, not individual protons, but sim-ilar considerations would seem to apply. A nucleus can absorb anelectron and convert a proton into a neutron, and in this context theprocess is called electron capture. Ordinarily this process will onlyoccur if the nucleus is neutron-deficient; once it reaches a neutron-to-proton ratio that optimizes its binding energy, neutron capturecannot proceed without a source of energy to make the reaction go.In the environment of a white dwarf, however, there is such a source.The annihilation of an electron opens up a hole in the “Fermi sea.”There is now an state into which another electron is allowed to dropwithout violating the exclusion principle, and the effect cascadesupward. In a star with a mass above the Chandrasekhar limit, thisprocess runs to completion, with every proton being converted into aneutron. The result is a neutron star, which is essentially an atomicnucleus (with Z = 0) with the mass of a star!

Observational evidence for the existence of neutron stars camein 1967 with the detection by Bell and Hewish at Cambridge of amysterious radio signal with a period of 1.3373011 seconds. The sig-nal’s observability was synchronized with the rotation of the earthrelative to the stars, rather than with legal clock time or the earth’srotation relative to the sun. This led to the conclusion that its originwas in space rather than on earth, and Bell and Hewish originallydubbed it LGM-1 for “little green men.” The discovery of a secondsignal, from a different direction in the sky, convinced them that itwas not actually an artificial signal being generated by aliens. Bellpublished the observation as an appendix to her PhD thesis, andit was soon interpreted as a signal from a neutron star. Neutronstars can be highly magnetized, and because of this magnetizationthey may emit a directional beam of electromagnetic radiation thatsweeps across the sky once per rotational period — the “lighthouseeffect.” If the earth lies in the plane of the beam, a periodic signalcan be detected, and the star is referred to as a pulsar. It is fairlyeasy to see that the short period of rotation makes it difficult toexplain a pulsar as any kind of less exotic rotating object. In theapproximation of Newtonian mechanics, a spherical body of densityρ, rotating with a period T =

√3π/Gρ, has zero apparent gravity

at its equator, since gravity is just strong enough to accelerate anobject so that it follows a circular trajectory above a fixed point on

Section 4.4 Experimental tests 145

Page 146: General Relativity

the surface (problem 14). In reality, astronomical bodies of plane-tary size and greater are held together by their own gravity, so wehave T & 1/

√Gρ for any body that does not fly apart spontaneously

due to its own rotation. In the case of the Bell-Hewish pulsar, thisimplies ρ & 1010 kg/m3, which is far larger than the density of nor-mal matter, and also 10-100 times greater than the typical densityof a white dwarf near the Chandrasekhar limit.

An upper limit on the mass of a neutron star can be found in amanner entirely analogous to the calculation of the Chandrasekharlimit. The only difference is that the mass of a neutron is muchgreater than the mass of an electron, and the neutrons are the onlyparticles present, so there is no factor of K. Assuming the moreprecise result of 1.4M for the Chandrasekhar limit rather thanour sloppy one, and ignoring the interaction of the neutrons via thestrong nuclear force, we can infer an upper limit on the mass of aneutron star:

1.4M

(Kme

mn

)2

≈ 5M

The theoretical uncertainties in such an estimate are fairly large.Tolman, Oppenheimer, and Volkoff originally estimated it in 1939as 0.7M, whereas modern estimates are more in the range of 1.5to 3M. These are significantly lower than our crude estimate of5M, mainly because the attractive nature of the strong nuclearforce tends to pull the star toward collapse. Unambiguous resultsare presently impossible because of uncertainties in extrapolatingthe behavior of the strong force from the regime of ordinary nuclei,where it has been relatively well parametrized, into the exotic envi-ronment of a neutron star, where the density is significantly differentand no protons are present. There are a variety of effects that maybe difficult to anticipate or to calculate. For example, Brown andBethe found in 199414 that it might be possible for the mass limit tobe drastically revised because of the process e− → K−+νe, which isimpossible in free space due to conservation of energy, but might bepossible in a neutron star. Observationally, nearly all neutron starsseem to lie in a surprisingly small range of mass, between 1.3 and1.45M, but in 2010 a neutron star with a mass of 1.97 ± .04 Mwas discovered, ruling out most neutron-star models that includedexotic matter.15

For stars with masses above the Tolman-Oppenheimer-Volkofflimit, theoretical predictions become even more speculative. A va-riety of bizarre objects has been proposed, including black stars,gravastars, quark stars, boson stars, Q-balls, and electroweak stars.

14H.A. Bethe and G.E. Brown, “Observational constraints on the maximumneutron star mass,” Astrophys. J. 445 (1995) L129. G.E. Brown and H.A.Bethe, “A Scenario for a Large Number of Low-Mass Black Holes in the Galaxy,”Astrophys. J. 423 (1994) 659. Both papers are available at adsabs.harvard.edu.

15Demorest et al., arxiv.org/abs/1010.5788v1.

146 Chapter 4 Tensors

Page 147: General Relativity

It seems likely, however, both on theoretical and observational grounds,that objects with masses of about 3 to 20 solar masses end up asblack holes; see section 6.3.3.

Section 4.4 Experimental tests 147

Page 148: General Relativity

a / Two spacelike surfaces.

b / We define a boundaryaround a region whose chargewe want to measure.

c / This boundary cuts thesphere into equal parts.

4.5 Conservation laws4.5.1 No general conservation laws

Some of the first tensors we discussed were mass and charge, bothrank-0 tensors, and the rank-1 momentum tensor, which containsboth the classical energy and the classical momentum. Physicistsoriginally decided that mass, charge, energy, and momentum wereinteresting because these things were found to be conserved. Thismakes it natural to ask how conservation laws can be formulatedin relativity. We’re used to stating conservation laws casually interms of the amount of something in the whole universe, e.g., thatclassically the total amount of mass in the universe stays constant.Relativity does allow us to make physical models of the universe asa whole, so it seems as though we ought to be able to talk aboutconservation laws in relativity.

We can’t.

First, how do we define “stays constant?” Simultaneity isn’twell-defined, so we can’t just take two snapshots, call them initialand final, and compare the total amount of, say, electric charge ineach snapshot. This difficulty isn’t insurmountable. As in figurea, we can arbitrarily pick out three-dimensional spacelike surfaces— one initial and one final — and integrate the charge over eachone. A law of conservation of charge would say that no matter whatspacelike surface we picked, the total charge on each would be thesame.

Next there’s the issue that the integral might diverge, especiallyif the universe was spatially infinite. For now, let’s assume a spa-tially finite universe. For simplicity, let’s assume that it has thetopology of a three-sphere (see section 8.2 for reassurance that thisisn’t physically unreasonable), and we can visualize it as a two-sphere.

In the case of the momentum four-vector, what coordinate sys-tem would we express it in? In general, we do not even expect tobe able to define a smooth, well-behaved coordinate system thatcovers the entire universe, and even if we did, it would not makesense to add a vector expressed in that coordinate system at pointA to another vector from point B; the best we could do would beto parallel-transport the vectors to one point and then add them,but parallel transport is path-dependent. (Similar issues occur withangular momentum.) For this reason, let’s restrict ourselves to theeasier case of a scalar, such as electric charge.

But now we’re in real trouble. How would we go about actuallymeasuring the total electric charge of the universe? The only way todo it is to measure electric fields, and then apply Gauss’s law. Thisrequires us to single out some surface that we can integrate the fluxover, as in b. This would really be a two-dimensional surface on the

148 Chapter 4 Tensors

Page 149: General Relativity

d / A relativistic jet.

three-sphere, but we can visualize it as a one-dimensional surface —a closed curve — on the two-sphere. But now suppose this curve isa great circle, c. If we measure a nonvanishing total flux across it,how do we know where the charge is? It could be on either side.

The conclusion is that conservation laws only make sense in rela-tivity under very special circumstances.16 We do not have anythinglike over-arching, global principles of conservation. As an exampleof the appropriate special circumstances, section 6.2.6, p. 218 showshow to define conserved quantities, which behave like energy andmomentum, for the motion of a test particle in a particular metricthat has a certain symmetry. This is generalized on p. 248 to ageneral, global conservation law corresponding to every continuoussymmetry of a spacetime.

4.5.2 Conservation of angular momentum and framedragging

Another special case where conservation laws work is that ifthe spacetime we’re studying gets very flat at large distances from asmall system we’re studying, then we can define a far-away boundarythat surrounds the system, measure the flux through that boundary,and find the system’s charge. For such asymptotically flat space-times, we can also get around the problems that crop up with con-served vectors, such as momentum. If the spacetime far away isnearly flat, then parallel transport loses its path-dependence, so wecan unambiguously define a notion of parallel-transporting all thecontributions to the flux to one arbitrarily chosen point P and thenadding them. Asymptotic flatness also allows us to define an ap-proximate notion of a global Lorentz frame, so that the choice of Pdoesn’t matter.

As an example, figure d shows a jet of matter being ejected fromthe galaxy M87 at ultrarelativistic fields. The blue color of the jet inthe visible-light image comes from synchrotron radiation, which isthe electromagnetic radiation emitted by relativistic charged parti-cles accelerated by a magnetic field. The jet is believed to be comingfrom a supermassive black hole at the center of M87. The emissionof the jet in a particular direction suggests that the black hole is notspherically symmetric. It seems to have a particular axis associatedwith it. How can this be? Our sun’s spherical symmetry is brokenby the existence of externally observable features such as sunspotsand the equatorial bulge, but the only information we can get abouta black hole comes from its external gravitational (and possibly elec-tromagnetic) fields. It appears that something about the spacetimemetric surrounding this black hole breaks spherical symmetry, butpreserves symmetry about some preferred axis. What aspect of theinitial conditions in the formation of the hole could have determined

16For another argument leading to the same conclusion, see subsection 7.5.1,p. 263.

Section 4.5 Conservation laws 149

Page 150: General Relativity

such an axis? The most likely candidate is the angular momentum.We are thus led to suspect that black holes can possess angular mo-mentum, that angular momentum preserves information about theirformation, and that angular momentum is externally detectable viaits effect on the spacetime metric.

What would the form of such a metric be? Spherical coordinatesin flat spacetime give a metric like this:

ds2 = dt2 − dr2 − r2 dθ2 − r2 sin2 θ dφ2 .

We’ll see in chapter 6 that for a non-rotating black hole, the metricis of the form

ds2 = (. . .) dt2 − (. . .) dr2 − r2 dθ2 − r2 sin2 θ dφ2 ,

where (. . .) represents functions of r. In fact, there is nothing spe-cial about the metric of a black hole, at least far away; the sameexternal metric applies to any spherically symmetric, non-rotatingbody, such as the moon. Now what about the metric of a rotatingbody? We expect it to have the following properties:

1. It has terms that are odd under time-reversal, correspondingto reversal of the body’s angular momentum.

2. Similarly, it has terms that are odd under reversal of the dif-ferential dφ of the azimuthal coordinate.

3. The metric should have axial symmetry, i.e., it should be in-dependent of φ.

Restricting our attention to the equatorial plane θ = π/2, the sim-plest modification that has these three properties is to add a termof the form

f(. . .)Ldφ dt ,

where (. . .) again gives the r-dependence and L is a constant, inter-preted as the angular momentum. A detailed treatment is beyondthe scope of this book, but solutions of this form to the relativisticfield equations were found by New Zealand-born physicist Roy Kerrin 1963 at the University of Texas at Austin.

The astrophysical modeling of observations like figure d is com-plicated, but we can see in a simplified thought experiment that ifwe want to determine the angular momentum of a rotating bodyvia its gravitational field, it will be difficult unless we use a measur-ing process that takes advantage of the asymptotic flatness of thespace. For example, suppose we send two beams of light past theearth, in its equatorial plane, one on each side, and measure theirdeflections, e. The deflections will be different, because the sign ofdφ dt will be opposite for the two beams. But the entire notion of a

150 Chapter 4 Tensors

Page 151: General Relativity

e / Two light rays travel in theearth’s equatorial plane from A toB. Due to frame-dragging, the raymoving with the earth’s rotationis deflected by a greater amountthan the one moving contrary toit. As a result, the figure has anasymmetric banana shape. Boththe deflection and its asymmetryare greatly exaggerated.

f / Gravity Probe B verifiedthe existence of frame-dragging.The rotational axis of the gy-roscope precesses in twoperpendicular planes due to thetwo separate effects: geodeticand frame-dragging.

“deflection” only makes sense if we have an asymptotically flat back-ground, as indicated by the dashed tangent lines. Also, if spacetimewere not asymptotically flat in this example, then there might beno unambiguous way to determine whether the asymmetry was dueto the earth’s rotation, to some external factor, or to some kind ofinteraction between the earth and other bodies nearby.

It also turns out that a gyroscope in such a gravitational fieldprecesses. This effect, called frame dragging, was predicted by Lenseand Thirring in 1918, and was finally verified experimentally in 2008by analysis of data from the Gravity Probe B experiment, to a pre-cision of about 15%. The experiment was arranged so that the rela-tively strong geodetic effect (6.6 arc-seconds per year) and the muchweaker Lense-Thirring effect (.041 arc-sec/yr) produced precessionsin perpendicular directions. Again, the presence of an asymptoti-cally flat background was involved, because the probe measured theorientations of its gyroscopes relative to the guide star IM Pegasi.

4.6 Things that aren’t quite tensorsThis section can be skipped on a first reading.

4.6.1 Area, volume, and tensor densities

We’ve embarked on a program of redefining every possible phys-ical quantity as a tensor, but so far we haven’t tackled area andvolume. Is there, for example, an area tensor in a locally Euclideanplane? We are encouraged to hope that there is such a thing, be-cause on p. 45 we saw that we could cook up a measure of areawith no other ingredients than the axioms of affine geometry. Whatkind of tensor would it be? The notions of vector and scalar fromfreshman mechanics are distinguished from one another by the factthat one has a direction in space and the other does not. Thereforewe expect that area would be a scalar, i.e., a rank-0 tensor. Butthis can’t be right, for the following reason. Under a rescaling ofcoordinates by a factor k, area should change by a factor of k2. Butby the tensor transformation laws, a rank-0 tensor is supposed tobe invariant under a change of coordinates. We therefore concludethat quantities like area and volume are not tensors.

In the language of ordinary vectors and scalars in Euclideanthree-space, one way to express area and volume is by using dot andcross products. The area of the parallelogram spanned by u and vis measured by the area vector u × v, and similarly the volume ofthe parallelepiped formed by u, v, and w can be computed as thescalar triple product u ·(v×w). Both of these quantities are definedsuch that interchanging two of the inputs negates the output. Indifferential geometry, we do have a scalar product, which is definedby contracting the indices of two vectors, as in uava. If we also had aa tensorial cross product, we would be able to define area and volume

Section 4.6 Things that aren’t quite tensors 151

Page 152: General Relativity

a / A Mobius strip is not anorientable surface.

tensors, so we conclude that there is no tensorial cross product, i.e.,an operation that would multiply two rank-1 tensors to produce arank-1 tensor. Since one of the most important physical applicationsof the cross product is to calculate the angular momentum L = r×p,we find that angular momentum in relativity is either not a tensoror not a rank-1 tensor.

When someone tells you that it’s impossible to do a seeminglystraightforward thing, the typical response is to look for a way to getaround the supposed limitation. In the case of a locally Euclideanplane, what is to stop us from making a small, standard square, andthen sliding the square around to any desired location? If we havesome figure whose area we wish to measure, we can then dissect itinto squares of that size and count the number of squares.

There are two problems with this plan, neither of which is com-pletely insurmountable. First, the area vector u×v is a vector, withits orientation specified by the direction of the normal to the surface.We need this orientation, for example, when we calculate the elec-tric flux as

∫E · dA. Figure a shows that we cannot always define

such an orientation in a consistent way. When the x− y coordinatesystem is slid around the Mobius strip, it ends up with the oppo-site orientation. In general relativity, there is not any guarantee oforientability in space — or even in time! But the vast majority ofspacetimes of physical interest are in fact orientable in every desiredway, and even for those that aren’t, orientability still holds in anysufficiently small neighborhood.

The other problem is that area has the wrong scaling propertiesto be a rank-0 tensor. We can get around this problem by being will-ing to discuss quantities that don’t transform exactly like tensors.Often we only care about transformations, such as rotations andtranslations, that don’t involve any scaling. We saw in section 2.2 onp. 52 that Lorentz boosts also have the special property of preserv-ing area in a space-time plane containing the boost. We thereforedefine a tensor density as a quantity that transforms like a tensorunder rotations, translations, and boosts, but that rescales and pos-sibly flips its sign under other types of coordinate transformations.In general, the additional factor comes from the determinant d ofthe matrix consisting of the partial derivatives ∂x′µ/∂xν (called theJacobian matrix). This determinant is raised to a power W , knownas the weight of the tensor density. Weight zero corresponds to thecase of a real tensor.

Area as a tensor density Example: 22In a Euclidean plane, making our rulers longer by a factor of kcauses the area measured in the new coordinates to increase bya factor of k2. The rescaling is represented by a matrix of partialderivatives that is simply kI, where I is the identity matrix. Thedeterminant is k2. Therefore area is a tensor density of weight

152 Chapter 4 Tensors

Page 153: General Relativity

+1.

Mass density Example: 23A piece of aluminum foil as a certain number of milligrams persquare centimeter. Shrinking rulers by 1/k causes this number todecrease by k−2, so this mass density has W = −1.

In Weyl’s apt characterization,17 tensors represent intensities,while tensor densities measure quantity.

4.6.2 The Levi-Civita symbol

Although there is no tensorial vector cross product, we can definea similar operation whose output is a tensor density. This is mosteasily expressed in terms of the Levi-Civita symbol ε. (See p. 91 forbiographical information about Levi-Civita.)

In n dimensions, the Levi-Civita symbol has n indices. It isdefined so as to be totally asymmetric, in the sense that if any two ofthe indices are interchanged, its sign flips. This is sufficient to definethe symbol completely except for an over-all scaling, which is fixedby arbitrarily taking one of the nonvanishing elements and settingit to +1. To see that this is enough to define ε completely, first notethat it must vanish when any index is repeated. For example, inthree dimensions labeled by κ, λ, and µ, εκλλ is unchanged underan interchange of the second and third indices, but it must also flipits sign under this operation, which means that it must be zero. Ifwe arbitrarily fix εκλµ = +1, then interchange of the second andthird indices gives εκµλ = −1, and a further interchange of thefirst and second yields εµκλ = +1. Any permutation of the threedistinct indices can be reached from any other by a series of suchpairwise swaps, and the number of swaps is uniquely odd or even.18

In Cartesian coordinates in three dimensions, it is conventional tochoose εxyz = +1 when x, y, and z form a right-handed spatialcoordinate system. In four dimensions, we take εtxyz = +1 when tis future-timelike and (x, y, z) are right-handed.

In Euclidean three-space, in coordinates such that g = diag(1, 1, 1),the vector cross product A = u× v, where we have in mind the in-terpretation of A as area, can be expressed as Aµ = εµκλu

κvλ.

Self-check: Check that this matches up with the more familiardefinition of the vector cross product.

Now suppose that we want to generalize to curved spaces, whereg cannot be constant. There are two ways to proceed.

17Hermann Weyl, “Space-Time-Matter,” 1922, p. 109, available online atarchive.org/details/spacetimematter00weyluoft.

18For a proof, see the Wikipedia article “Parity of a permutation.”

Section 4.6 Things that aren’t quite tensors 153

Page 154: General Relativity

Tensorial ε

One is to let ε have the values 0 and ±1 at some arbitrarilychosen point, in some arbitrarily chosen coordinate system, but tolet it transform like a tensor. Then Aµ = εµκλu

κvλ needs to bemodified, since the right-hand side is a tensor, and that would makeA a tensor, but if A is an area we don’t want it to transform likea 1-tensor. We therefore need to revise the definition of area tobe Aµ = g−1/2εµκλu

κvλ, where g is the determinant of the lower-index form of the metric. The following two examples justify thisprocedure in a locally Euclidean three-space.

Scaling coordinates with tensorial ε Example: 24Then scaling of coordinates by k scales all the elements of themetric by k−2, g by k−6, g−1/2 by k3, εµκλ by k−3, and uκvλ byk2. The result is to scale Aµ by k+3−3+2 = k2, which makes senseif A is an area.

Oblique coordinates with tensorial ε Example: 25In oblique coordinates (example 8, p. 103), the two basis vectorshave unit length but are at an angle φ 6= π/2 to one another. Thedeterminant of the metric is g = sin2φ, so

√g = sinφ, which is

exactly the correction factor needed in order to get the right areawhen u and v are the two basis vectors.

This procedure works more generally, the sole modification beingthat in a space such as a locally Lorentzian one where g < 0 we needto use

√−g as the correction factor rather than

√g.

Tensor-density ε

The other option is to let ε have the same 0 and ±1 values atall points. Then ε is clearly not a tensor, because it doesn’t scaleby a factor of kn when the coordinates are scaled by k; ε is a tensordensity with weight −1 for the upper-index version and +1 for thelower-index one. The relation Aµ = εµκλu

κvλ gives an area that isa tensor density, not a tensor, because A is not written in terms ofpurely tensorial quantities. Scaling the coordinates by k leaves εµκλunchanged, scales up uκvλ by k2, and scales up the area by k2, asexpected.

Unfortunately, there is no consistency in the literature as towhether ε should be a tensor or a tensor density. Some authorsdefine both a tensor and a nontensor version, with notations likeε and ε, or19 ε0123 and [0123]. Others avoid writing the letter εcompletely.20 The tensor-density version is convenient because wealways know that its value is 0 or ±1. The tensor version has theadvantage that it transforms as a tensor.

19Misner, Thorne, and Wheeler20Hawking and Ellis

154 Chapter 4 Tensors

Page 155: General Relativity

4.6.3 Spacetime volume

We saw on p. 54 that area in the 1 + 1-dimensional plane of flatspacetime is preserved by a Lorentz boost. This makes sense becausewhen we express the area spanned by a parallelogram with edges pand q as εabpasb, all the indices have been contracted, leaving arank-0 tensor density. In 3 + 1 dimensions, we have the spacetimevolume V = εabcdpaqbrcsd spanned by the paralellepiped with edgesp, q, r, and s. A typical situation in which this volume is nonzerowould be that in which one of the vectors is timelike and the otherthree spacelike. Let the timelike one be p. Assume |p| = 1, sincean example with |p| 6= 1 can be reduced to this by scaling. Thenp can be interpreted as the velocity vector of some observer, andV as the spatial volume that the observer says is spanned by the3-paralellepiped with edges q, r, and s.

4.6.4 Angular momentum

As discussed above, angular momentum cannot be a rank-1 ten-sor. One approach is to define a rank-2 angular momentum tensorLab = rapb.

In a frame whose origin is instantaneously moving along witha certain system’s center of mass at a certain time, the time-spacecomponents of L vanish, and the components Lyz, Lzx, and Lxy

coincide in the nonrelativistic limit with the x, y, and z componentsof the Newtonian angular momentum vector. We can also definea three-dimensional object La = εabcL

bc (with three-dimensionaltensor-density ε in the spatial dimensions) that doesn’t transformlike a tensor.

Section 4.6 Things that aren’t quite tensors 155

Page 156: General Relativity

Problems1 Describe the four-velocity of a photon.

. Solution, p. 382

2 The Large Hadron Collider is designed to accelerate protonsto energies of 7 TeV. Find 1− v for such a proton.

. Solution, p. 382

3 Prove that an electron in a vacuum cannot absorb a photon.(This is the reason that the ability of materials to absorb gamma-rays is strongly dependent on atomic number Z. The case of Z = 0corresponds to the vacuum.)

4 (a) For an object moving in a circle at constant speed, thedot product of the classical three-vectors v and a is zero. Givean interpretation in terms of the work-kinetic energy theorem. (b)In the case of relativistic four-vectors, viai = 0 for any world-line.Give a similar interpretation. Hint: find the rate of change of thefour-velocity’s squared magnitude.

5 Starting from coordinates (t,x) having a Lorentzian metricg, transform the metric tensor into reflected coordinates (t′,x′) =(t,−x), and verify that g′ is the same as g.

6 Starting from coordinates (t,x) having a Lorentzian metric g,transform the metric tensor into Lorentz-boosted coordinates (t′,x′),and verify that g′ is the same as g.

7 Verify the transformation of the metric given in example 19on page 140.

8 A skeptic claims that the Hafele-Keating experiment can onlybe explained correctly by relativity in a frame in which the earth’saxis is at rest. Prove mathematically that this is incorrect. Does itmatter whether the frame is inertial? . Solution, p. 382

9 Assume the metric g = diag(+1, +1, +1). Which of the fol-lowing correctly expresses the noncommutative property of ordinarymatrix multiplication?

A ji Bjk 6= BjkA

ji

A ji Bjk 6= B j

i Ajk

10 Example 10 on page 130 introduced the Dirac sea, whoseexistence is implied by the two roots of the relativistic relation E =±√p2 +m2. Prove that a Lorentz boost will never transform a

positive-energy state into a negative-energy state.. Solution, p. 383

11 On page 133, we found the relativistic Doppler shift in 1+1dimensions. Extend this to 3+1 dimensions, and check your resultagainst the one given by Einstein on page 370.

156 Chapter 4 Tensors

Page 157: General Relativity

. Solution, p. 383

12 Estimate the energy contained in the electric field of anelectron, if the electron’s radius is r. Classically (i.e., assumingrelativity but no quantum mechanics), this energy contributes to theelectron’s rest mass, so it must be less than the rest mass. Estimatethe resulting lower limit on r, which is known as the classical electronradius. . Solution, p. 383

13 For gamma-rays in the MeV range, the most frequent mode ofinteraction with matter is Compton scattering, in which the photonis scattered by an electron without being absorbed. Only part ofthe gamma’s energy is deposited, and the amount is related to theangle of scattering. Use conservation of four-momentum to showthat in the case of scattering at 180 degrees, the scattered photonhas energy E′ = E/(1+2E/m), where m is the mass of the electron.

14 Derive the equation T =√

3π/Gρ given on page 146 for theperiod of a rotating, spherical object that results in zero apparentgravity at its surface.

15 Section 4.4.3 presented an estimate of the upper limit on themass of a white dwarf. Check the self-consistency of the solutionin the following respects: (1) Why is it valid to ignore the contri-bution of the nuclei to the degeneracy pressure? (2) Although theelectrons are ultrarelativistic, spacetime is approximated as beingflat. As suggested in example 15 on page 65, a reasonable order-of-magnitude check on this result is that we should have M/r c2/G.

16 The laws of physics in our universe imply that for bodies witha certain range of masses, a neutron star is the unique equilibriumstate. Suppose we knew of the existence of neutron stars, but didn’tknow the mass of the neutron. Infer upper and lower bounds on themass of the neutron.

17 Example 20 on p. 141 briefly introduced the electromagneticpotential four-vector Fij , and this implicitly defines the transforma-tion properties of the electric and magnetic fields under a Lorentzboost v. To lowest order in v, this transformation is given by

E′ ≈ E + v ×B and

B′ ≈ B− v ×E .

I’m not a historian of science, but apparently ca. 1905 people likeHertz believed that these were the exact transformations of thefield.21 Show that this can’t be the case, because performing twosuch transformations in a row does not in general result in a trans-formation of the same form. . Solution, p. 383

21Montigny and Rousseaux, arxiv.org/abs/physics/0512200.

Problems 157

Page 158: General Relativity

18 We know of massive particles, whose velocity vectors alwayslie inside the future light cone, and massless particles, whose veloc-ities lie on it. In principle, we could have a third class of particles,called tachyons, with spacelike velocity vectors. Tachyons wouldhave m2 < 0, i.e., their masses would have to be imaginary. Showthat it is possible to pick momentum four-vectors p1 and p2 fora pair of tachyons such that p1 + p2 = 0. This implies that thevacuum would be unstable with respect to spontaneous creation oftachyon-antitachyon pairs.

158 Chapter 4 Tensors

Page 159: General Relativity

a / The expected structure ofthe field equations in generalrelativity.

b / Two rocks are droppedside by side. The curvatures oftheir world-lines are not intrinsic.In a free-falling frame, both wouldappear straight. If initially parallelworld-lines became non-parallel,that would be evidence of intrinsiccurvature.

Chapter 5

Curvature

General relativity describes gravitation as a curvature of spacetime,with matter acting as the source of the curvature in the same waythat electric charge acts as the source of electric fields. Our goal isto arrive at Einstein’s field equations, which relate the local intrin-sic curvature to the locally ambient matter in the same way thatGauss’s law relates the local divergence of the electric field to thecharge density. The locality of the equations is necessary becauserelativity has no action at a distance; cause and effect propagate ata maximum velocity of c(= 1).

The hard part is arriving at the right way of defining curvature.We’ve already seen that it can be tricky to distinguish intrinsiccurvature, which is real, from extrinsic curvature, which can neverproduce observable effects. E.g., example 4 on page 95 showed thatspheres have intrinsic curvature, while cylinders do not. The mani-festly intrinsic tensor notation protects us from being misled in thisrespect. If we can formulate a definition of curvature expressed usingonly tensors that are expressed without reference to any preordainedcoordinate system, then we know it is physically observable, and notjust a superficial feature of a particular model.

As an example, drop two rocks side by side, b. Their trajectoriesare vertical, but on a (t,x) coordinate plot rendered in the Earth’sframe of reference, they appear as parallel parabolas. The curva-ture of these parabolas is extrinsic. The Earth-fixed frame of refer-ence is defined by an observer who is subject to non-gravitationalforces, and is therefore not a valid Lorentz frame. In a free-fallingLorentz frame (t′,x′), the two rocks are either motionless or movingat constant velocity in straight lines. We can therefore see that thecurvature of world-lines in a particular coordinate system is not anintrinsic measure of curvature; it can arise simply from the choiceof the coordinate system. What would indicate intrinsic curvaturewould be, for example, if geodesics that were initially parallel wereto converge or diverge.

Nor is the metric a measure of intrinsic curvature. In example19 on page 140, we found the metric for an accelerated observer tobe

g′t′t′ = (1 + ax′)2 gx′x′ = −1 ,

where the primes indicate the accelerated observer’s frame. The factthat the timelike element is not equal to −1 is not an indication of

159

Page 160: General Relativity

a / Tidal forces disrupt cometShoemaker-Levy.

intrinsic curvature. It arises only from the choice of the coordinates(t′,x′) defined by a frame tied to the accelerating rocket ship.

The fact that the above metric has nonvanishing derivatives, un-like a constant Lorentz metric, does indicate the presence of a grav-itational field. However, a gravitational field is not the same thingas intrinsic curvature. The gravitational field seen by an observeraboard the ship is, by the equivalence principle, indistinguishablefrom an acceleration, and indeed the Lorentzian observer in theearth’s frame does describe it as arising from the ship’s accelera-tion, not from a gravitational field permeating all of space. Bothobservers must agree that “I got plenty of nothin’ ” — that theregion of the universe to which they have access lacks any stars,neutrinos, or clouds of dust. The observer aboard the ship must de-scribe the gravitational field he detects as arising from some sourcevery far away, perhaps a hypothetical vast sheet of lead lying billionsof light-years aft of the ship’s deckplates. Such a hypothesis is fine,but it is unrelated to the structure of our hoped-for field equation,which is to be local in nature.

Not only does the metric tensor not represent the gravitationalfield, but no tensor can represent it. By the equivalence princi-ple, any gravitational field seen by observer A can be eliminated byswitching to the frame of a free-falling observer B who is instanta-neously at rest with respect to A at a certain time. The structure ofthe tensor transformation law guarantees that A and B will agree onwhether a given tensor is zero at the point in spacetime where theypass by one another. Since they agree on all tensors, and disagreeon the gravitational field, the gravitational field cannot be a tensor.

We therefore conclude that a nonzero intrinsic curvature of thetype that is to be included in the Einstein field equations is notencoded in any simple way in the metric or its first derivatives.Since neither the metric nor its first derivatives indicate curvature,we can reasonably conjecture that the curvature might be encodedin its second derivatives.

5.1 Tidal curvature versus curvature causedby local sources

A further complication is the need to distinguish tidal curva-ture from curvature caused by local sources. Figure a shows CometShoemaker-Levy, broken up into a string of fragments by Jupiter’stidal forces shortly before its spectacular impact with the planet in1994. Immediately after each fracture, the newly separated chunkshad almost zero velocity relative to one another, so once the cometfinished breaking up, the fragments’ world-lines were a sheaf ofnearly parallel lines separated by spatial distances of only 1 km.These initially parallel geodesics then diverged, eventually fanning

160 Chapter 5 Curvature

Page 161: General Relativity

b / Tidal forces cause the ini-tially parallel world-lines of thefragments to diverge. The space-time occupied by the comet hasintrinsic curvature, but it is notcaused by any local mass; it iscaused by the distant mass ofJupiter.

c / The moon’s gravitationalfield causes the Earth’s oceans tobe distorted into an ellipsoid. Thesign of the sectional curvature isnegative in the x − t plane, butpositive in the y − t plane.

d / A cloud of test masses isreleased at rest in a sphericalshell around the earth, shownhere as a circle because the zaxis is omitted. The volume ofthe shell contracts over time,which demonstrates that thelocal curvature of spacetime isgenerated by a local source —the earth — rather than somedistant one.

out to span millions of kilometers.

If initially parallel lines lose their parallelism, that is clearly anindication of intrinsic curvature. We call it a measure of sectionalcurvature, because the loss of parallelism occurs within a particularplane, in this case the (t,x) plane represented by figure b.

But this curvature was not caused by a local source lurking inamong the fragments. It was caused by a distant source: Jupiter.We therefore see that the mere presence of sectional curvature is notenough to demonstrate the existence of local sources. Even the signof the sectional curvature is not a reliable indication. Although thisexample showed a divergence of initially parallel geodesics, referredto as a negative curvature, it is also possible for tidal forces exertedby distant masses to create positive curvature. For example, theocean tides on earth oscillate both above and below mean sea level,c.

As an example that really would indicate the presence of a localsource, we could release a cloud of test masses at rest in a spheri-cal shell around the earth, and allow them to drop, d. We wouldthen have positive and equal sectional curvature in the t− x, t− y,and t − z planes. Such an observation cannot be due to a distantmass. It demonstrates an over-all contraction of the volume of aninitially parallel sheaf of geodesics, which can never be induced bytidal forces. The earth’s oceans, for example, do not change theirtotal volume due to the tides, and this would be true even if theoceans were a gas rather than an incompressible fluid. It is a uniqueproperty of 1/r2 forces such as gravity that they conserve volumein this way; this is essentially a restatement of Gauss’s law in avacuum.

5.2 The stress-energy tensorIn general, the curvature of spacetime will contain contributionsfrom both tidal forces and local sources, superimposed on one an-other. To develop the right formulation for the Einstein field equa-tions, we need to eliminate the tidal part. Roughly speaking, wewill do this by averaging the sectional curvature over all three of theplanes t−x, t−y, and t−z, giving a measure of curvature called theRicci curvature. The “roughly speaking” is because such a prescrip-tion would treat the time and space coordinates in an extremelyasymmetric manner, which would violate local Lorentz invariance.

To get an idea of how this would work, let’s compare with theNewtonian case, where there really is an asymmetry between thetreatment of time and space. In the Cartan curved-spacetime the-ory of Newtonian gravity (page 41), the field equation has a kind ofscalar Ricci curvature on one side, and on the other side is the den-sity of mass, which is also a scalar. In relativity, however, the source

Section 5.2 The stress-energy tensor 161

Page 162: General Relativity

a / This curve has no intrin-sic curvature.

b / A surveyor on a mountaintopuses a heliotrope.

c / A map of a triangulationsurvey such as the one Gausscarried out. By measuring theinterior angles of the triangles,one can determine not just thetwo-dimensional projection ofthe grid but its complete three-dimensional form, including boththe curvature of the earth (notethe curvature of the lines of lat-itude) and the height of featuresabove and below sea level.

term in the equation clearly cannot be the scalar mass density. Weknow that mass and energy are equivalent in relativity, so for exam-ple the curvature of spacetime around the earth depends not juston the mass of its atoms but also on all the other forms of energy itcontains, such as thermal energy and electromagnetic and nuclearbinding energy. Can the source term in the Einstein field equationstherefore be the mass-energy E? No, because E is merely the time-like component of a particle’s momentum four-vector. To single itout would violate Lorentz invariance just as much as an asymmetrictreatment of time and space in constructing a Ricci measure of cur-vature. To get a properly Lorentz invariant theory, we need to find away to formulate everything in terms of tensor equations that makeno explicit reference to coordinates. The proper generalization ofthe Newtonian mass density in relativity is the stress-energy tensorT ij , whose 16 elements measure the local density of mass-energyand momentum, and also the rate of transport of these quantitiesin various directions. If we happen to be able to find a frame ofreference in which the local matter is all at rest, then T tt representsthe mass density. The reason for the word “stress” in the name isthat, for example, the flux of x-momentum in the x direction is ameasure of pressure.

For the purposes of the present discussion, it’s not necessary tointroduce the explicit definition of T ; the point is merely that weshould expect the Einstein field equations to be tensor equations,which tells us that the definition of curvature we’re seeking clearlyhas to be a rank-2 tensor, not a scalar. The implications in four-dimensional spacetime are fairly complex. We’ll end up with a rank-4 tensor that measures the sectional curvature, and a rank-2 Riccitensor derived from it that averages away the tidal effects. TheEinstein field equations then relate the Ricci tensor to the energy-momentum tensor in a certain way. The stress-energy tensor isdiscussed further in section 8.1.2 on page 269.

5.3 Curvature in two spacelike dimensionsSince the curvature tensors in 3+1 dimensions are complicated, let’sstart by considering lower dimensions. In one dimension, a, thereis no such thing as intrinsic curvature. This is because curvaturedescribes the failure of parallelism to behave as in E5, but there isno notion of parallelism in one dimension.

The lowest interesting dimension is therefore two, and this casewas studied by Carl Friedrich Gauss in the early nineteenth century.Gauss ran a geodesic survey of the state of Hanover, inventing anoptical surveying instrument called a heliotrope that in effect wasused to cover the Earth’s surface with a triangular mesh of lightrays. If one of the mesh points lies, for example, at the peak of amountain, then the sum Σθ of the angles of the vertices meeting at

162 Chapter 5 Curvature

Page 163: General Relativity

d / Example 1.

e / Example 2.

that point will be less than 2π, in contradiction to Euclid. Althoughthe light rays do travel through the air above the dirt, we can thinkof them as approximations to geodesics painted directly on the dirt,which would be intrinsic rather than extrinsic. The angular defectaround a vertex now vanishes, because the space is locally Euclidean,but we now pick up a different kind of angular defect, which is thatthe interior angles of a triangle no longer add up to the Euclideanvalue of π.

A polygonal survey of a soccer ball Example: 1Figure d applies similar ideas to a soccer ball, the only differencebeing the use of pentagons and hexagons rather than triangles.

In d/1, the survey is extrinsic, because the lines pass below thesurface of the sphere. The curvature is detectable because theangles at each vertex add up to 120 + 120 + 110 = 350 degrees,giving an angular defect of 10 degrees.

In d/2, the lines have been projected to form arcs of great circleson the surface of the sphere. Because the space is locally Eu-clidean, the sum of the angles at a vertex has its Euclidean valueof 360 degrees. The curvature can be detected, however, be-cause the sum of the internal angles of a polygon is greater thanthe Euclidean value. For example, each spherical hexagon givesa sum of 6× 124.31 degrees, rather than the Euclidean 6× 120.The angular defect of 6× 4.31 degrees is an intrinsic measure ofcurvature.

Angular defect on the earth’s surface Example: 2Divide the Earth’s northern hemisphere into four octants, withtheir boundaries running through the north pole. These octantshave sides that are geodesics, so they are equilateral triangles.Assuming Euclidean geometry, the interior angles of an equilat-eral triangle are each equal to 60 degrees, and, as with any tri-angle, they add up to 180 degrees. The octant-triangle in figuree has angles that are each 90 degrees, and the sum is 270. Thisshows that the Earth’s surface has intrinsic curvature.

This example suggests another way of measuring intrinsic curva-ture, in terms of the ratio C/r of the circumference of a circle toits radius. In Euclidean geometry, this ratio equals 2π. Let ρ bethe radius of the Earth, and consider the equator to be a circlecentered on the north pole, so that its radius is the length of oneof the sides of the triangle in figure e, r = (π/2)ρ. (Don’t confuser , which is intrinsic, with ρ, the radius of the sphere, which is ex-trinsic and not equal to r .) Then the ratio C/r is equal to 4, whichis smaller than the Euclidean value of 2π.

Let ε = Σθ − π be the angular defect of a triangle, and forconcreteness let the triangle be in a space with an elliptic geometry,so that it has constant curvature and can be modeled as a sphere of

Section 5.3 Curvature in two spacelike dimensions 163

Page 164: General Relativity

f / Proof that the angular defectof a triangle in elliptic geometryis proportional to its area. Eachwhite circle represents the en-tire elliptic plane. The dashedline at the edge is not really aboundary; lines that go off theedge simply wrap back around.In the spherical model, the whitecircle corresponds to one hemi-sphere, which is identified withthe opposite hemisphere.

g / Gaussian normal coordi-nates on a sphere.

radius ρ, with antipodal points identified.

Self-check: In elliptic geometry, what is the minimum possiblevalue of the quantity C/r discussed in example 2? How does thisdiffer from the case of spherical geometry?

We want a measure of curvature that is local, but if our spaceis locally flat, we must have ε → 0 as the size of the triangles ap-proaches zero. This is why Euclidean geometry is a good approx-imation for small-scale maps of the earth. The discrete nature ofthe triangular mesh is just an artifact of the definition, so we wanta measure of curvature that, unlike ε, approaches some finite limitas the scale of the triangles approaches zero. Should we expect thisscaling to go as ε ∝ ρ? ρ2? Let’s determine the scaling. Firstwe prove a classic lemma by Gauss, concerning a slightly differentversion of the angular defect, for a single triangle.

Theorem: In elliptic geometry, the angular defect ε = α+β+γ−πof a triangle is proportional to its area A.Proof: By axiom E2, extend each side of the triangle to form a line,figure f/1. Each pair of lines crosses at only one point (E1) anddivides the plane into two lunes with their four vertices touching atthis point, figure f/2. Of the six lunes, we focus on the three shadedones, which overlap the triangle. In each of these, the two interiorangles at the vertex are the same (Euclid I.15). The area of a luneis proportional to its interior angle, as follows from dissection intonarrower lunes; since a lune with an interior angle of π covers theentire area P of the plane, the constant of proportionality is P/π.The sum of the areas of the three lunes is (P/π)(α + β + γ), butthese three areas also cover the entire plane, overlapping three timeson the given triangle, and therefore their sum also equals P + 2A.Equating the two expressions leads to the desired result.

This calculation was purely intrinsic, because it made no use ofany model or coordinates. We can therefore construct a measureof curvature that we can be assured is intrinsic, K = ε/A. This iscalled the Gaussian curvature, and in elliptic geometry it is constantrather than varying from point to point. In the model on a sphereof radius ρ, we have K = 1/ρ2.

Self-check: Verify the equation K = 1/ρ2 by considering a tri-angle covering one octant of the sphere, as in example 2.

It is useful to introduce normal or Gaussian normal coordi-nates, defined as follows. Through point O, construct perpendiculargeodesics, and define affine coordinates x and y along these. Forany point P off the axis, define coordinates by constructing the linesthrough P that cross the axes perpendicularly. For P in a suffi-ciently small neighborhood of O, these lines exist and are uniquelydetermined. Gaussian polar coordinates can be defined in a similarway.

164 Chapter 5 Curvature

Page 165: General Relativity

h / 1. Gaussian curvaturecan be interpreted as the failureof parallelism represented byd2 α/dx dy .

i / 2. Gaussian curvature asL 6= rθ.

Here are two useful interpretations of K.

1. The Gaussian curvature measures the failure of parallelism inthe following sense. Let line ` be constructed so that it crosses thenormal y axis at (0, dy) at an angle that differs from perpendicularby the infinitesimal amount dα (figure h). Construct the line x′ =dx, and let dα′ be the angle its perpendicular forms with `. Then1

the Gaussian curvature at O is

K =d2 α

dx dy,

where d2 α = dα′ − dα.

2. From a point P, emit a fan of rays at angles filling a certainrange θ of angles in Gaussian polar coordinates (figure i). Let thearc length of this fan at r be L, which may not be equal to itsEuclidean value LE = rθ. Then2

K = −3d2

dr2

(L

LE

).

Let’s now generalize beyond elliptic geometry. Consider a spacemodeled by a surface embedded in three dimensions, with geodesicsdefined as curves of extremal length, i.e., the curves made by a pieceof string stretched taut across the surface. At a particular pointP, we can always pick a coordinate system (x, y, z) such that thesurface z = 1

2k1x2 + 1

2k2y2 locally approximates the surface to the

level of precision needed in order to discuss curvature. The surfaceis either paraboloidal or hyperboloidal (a saddle), depending on thesigns of k1 and k2. We might naively think that k1 and k2 could beindependently determined by intrinsic measurements, but as we’veseen in example 4 on page 95, a cylinder is locally indistinguishablefrom a Euclidean plane, so if one k is zero, the other k clearly cannotbe determined. In fact all that can be measured is the Gaussiancurvature, which equals the product k1k2. To see why this shouldbe true, first consider that any measure of curvature has units ofinverse distance squared, and the k’s have units of inverse distance.The only possible intrinsic measures of curvature based on the k’sare therefore k2

1 +k22 and k1k2. (We can’t have, for example, just k2

1,because that would change under an extrinsic rotation about the zaxis.) Only k1k2 vanishes on a cylinder, so it is the only possibleintrinsic curvature.

1Proof: Since any two lines cross in elliptic geometry, ` crosses the x axis. Thecorollary then follows by application of the definition of the Gaussian curvatureto the right triangles formed by `, the x axis, and the lines at x = 0 and x = dx,so that K = dε/dA = d2 α/dxdy, where third powers of infinitesimals havebeen discarded.

2In the spherical model, L = ρθ sinu, where u is the angle subtended at thecenter of the sphere by an arc of length r. We then have L/LE = sinu/u, whosesecond derivative with respect to u is −1/3. Since r = ρu, the second derivativeof the same quantity with respect to r equals −1/3ρ2 = −K/3.

Section 5.3 Curvature in two spacelike dimensions 165

Page 166: General Relativity

j / A triangle in a space withnegative curvature has anglesthat add to less than π.

k / A flea on the football can-not orient himself by intrinsic,local measurements.

Eating pizza Example: 3When people eat pizza by folding the slice lengthwise, they aretaking advantage of the intrinsic nature of the Gaussian curva-ture. Once k1 is fixed to a nonzero value, k2 can’t change withoutvarying K , so the slice can’t droop.

Elliptic and hyperbolic geometry Example: 4We’ve seen that figures behaving according to the axioms of el-liptic geometry can be modeled on part of a sphere, which is asurface of constant K > 0. The model can be made into globalone satisfying all the axioms if the appropriate topological prop-erties are ensured by identifying antipodal points. A paraboloidalsurface z = k1x2 + k2y2 can be a good local approximation toa sphere, but for points far from its apex, K varies significantly.Elliptic geometry has no parallels; all lines meet if extended farenough.

A space of constant negative curvature has a geometry called hy-perbolic, and is of some interest because it appears to be the onethat describes the spatial dimensions of our universe on a cosmo-logical scale. A hyperboloidal surface works locally as a model,but its curvature is only approximately constant; the surface ofconstant curvature is a horn-shaped one created by revolving amountain-shaped curve called a tractrix about its axis. The trac-trix of revolution is not as satisfactory a model as the sphere isfor elliptic geometry, because lines are cut off at the cusp of thehorn. Hyperbolic geometry is richer in parallels than Euclideangeometry; given a line ` and a point P not on `, there are infinitelymany lines through P that do not pass through `.

A flea on a football Example: 5We might imagine that a flea on the surface of an American foot-ball could determine by intrinsic, local measurements which di-rection to go in order to get to the nearest tip. This is impossible,because the flea would have to determine a vector, and curvaturecannot be a vector, since z = 1

2k1x2 + 12k2y2 is invariant under the

parity inversion x → −x , y → −y . For similar reasons, a measureof curvature can never have odd rank.

Without violating reflection symmetry, it is still conceivable that theflea could determine the orientation of the tip-to-tip line runningthrough his position. Surprisingly, even this is impossible. Theflea can only measure the single number K , which carries noinformation about directions in space.

166 Chapter 5 Curvature

Page 167: General Relativity

l / Example 6. In 1 and 2,charges that are visible on thefront surface of the conductor areshown as solid dots; the otherswould have to be seen throughthe conductor, which we imagineis semi-transparent.

The lightning rod Example: 6Suppose you have a pear-shaped conductor like the one in figurel/1. Since the pear is a conductor, there are free charges every-where inside it. Panels 1 and 2 of the figure show a computer sim-ulation with 100 identical electric charges. In 1, the charges arereleased at random positions inside the pear. Repulsion causesthem all to fly outward onto the surface and then settle down intoan orderly but nonuniform pattern.

We might not have been able to guess the pattern in advance, butwe can verify that some of its features make sense. For example,charge A has more neighbors on the right than on the left, whichwould tend to make it accelerate off to the left. But when welook at the picture as a whole, it appears reasonable that this isprevented by the larger number of more distant charges on its leftthan on its right.

There also seems to be a pattern to the nonuniformity: the chargescollect more densely in areas like B, where the Gaussian curva-ture is large, and less densely in areas like C, where K is nearlyzero (slightly negative).

To understand the reason for this pattern, consider l/3. It’s straight-forward to show that the density of charge σ on each sphere isinversely proportional to its radius, or proportional to K 1/2. LordKelvin proved that on a conducting ellipsoid, the density of chargeis proportional to the distance from the center to the tangentplane, which is equivalent3 to σ ∝ K 1/4; this result looks simi-lar except for the different exponent. McAllister showed in 19904

that this K 1/4 behavior applies to a certain class of examples, butit clearly can’t apply in all cases, since, for example, K could benegative, or we could have a deep concavity, which would forma Faraday cage. Problem 1 on p. 200 discusses the case of aknife-edge.

Similar reasoning shows why Benjamin Franklin used a sharp tipwhen he invented the lightning rod. The charged stormclouds in-duce positive and negative charges to move to opposite ends ofthe rod. At the pointed upper end of the rod, the charge tendsto concentrate at the point, and this charge attracts the lightning.The same effect can sometimes be seen when a scrap of alu-minum foil is inadvertently put in a microwave oven. Modern ex-periments5 show that although a sharp tip is best at starting aspark, a more moderate curve, like the right-hand tip of the pearin this example, is better at successfully sustaining the spark forlong enough to connect a discharge to the clouds.

3http://math.stackexchange.com/questions/112662/

gaussian-curvature-of-an-ellipsoid-proportional-to-fourth-power-of-the-distance4I W McAllister 1990 J. Phys. D: Appl. Phys. 23 3595Moore et al., Journal of Applied Meteorology 39 (1999) 593

Section 5.3 Curvature in two spacelike dimensions 167

Page 168: General Relativity

a / The definition of the Riemanntensor. The vector vb changesby dvb when parallel-transportedaround the approximate parallel-ogram. (vb is drawn on a scalethat makes its length comparableto the infinitesimals dpc , dqd , anddvb; in reality, its size would begreater than theirs by an infinitefactor.)

5.4 Curvature tensorsThe example of the flea suggests that if we want to express curvatureas a tensor, it should have even rank. Also, in a coordinate systemin which the coordinates have units of distance (they are not angles,for instance, as in spherical coordinates), we expect that the units ofcurvature will always be inverse distance squared. More elegantly,we expect that under a uniform rescaling of coordinates by a factorof µ, a curvature tensor should scale down by µ−2.

Combining these two facts, we find that a curvature tensor shouldhave one of the forms Rab, R

abcd, . . . , i.e., the number of lower in-

dices should be two greater than the number of upper indices. Thefollowing definition has this property, and is equivalent to the earlierdefinitions of the Gaussian curvature that were not written in tensornotation.

Definition of the Riemann curvature tensor: Let dpc and dqd

be two infinitesimal vectors, and use them to form a quadrilateralthat is a good approximation to a parallelogram.6 Parallel-transportvector vb all the way around the parallelogram. When it comes backto its starting place, it has a new value vb → vb + dvb. Then theRiemann curvature tensor is defined as the tensor that computes dva

according to dva = Rabcdvb dpc dqd. (There is no standardization in

the literature of the order of the indices.)

A symmetry of the Riemann tensor Example: 7If vectors dpc and dqd lie along the same line, then dva must van-ish, and interchanging dpc and dqd simply reverses the directionof the circuit around the quadrilateral, giving dva → −dva. Thisshows that Ra

bcd must be antisymmetric under interchange of theindices c and d , Ra

bcd = −Rabdc .

In local normal coordinates, the interpretation of the Riemanntensor becomes particularly transparent. The constant-coordinatelines are geodesics, so when the vector vb is transported along them,it maintains a constant angle with respect to them. Any rotationof the vector after it is brought around the perimeter of the quadri-lateral can therefore be attributed to something that happens atthe vertices. In other words, it is simply a measure of the angulardefect. We can therefore see that the Riemann tensor is really justa tensorial way of writing the Gaussian curvature K = dε/dA.

In normal coordinates, the local geometry is nearly Cartesian,and when we take the product of two vectors in an antisymmetricmanner, we are essentially measuring the area of the parallelogramthey span, as in the three-dimensional vector cross product. We cantherefore see that the Riemann tensor tells us something about theamount of curvature contained within the infinitesimal area spannedby dpc and dqd. A finite two-dimensional region can be broken

6Section 5.8 discusses the sense in which this approximation is good enough.

168 Chapter 5 Curvature

Page 169: General Relativity

b / The change in the vectordue to parallel transport aroundthe octant equals the integralof the Riemann tensor over theinterior.

down into infinitesimal elements of area, and the Riemann tensorintegrated over them. The result is equal to the finite change ∆vb

in a vector transported around the whole boundary of the region.

Curvature tensors on a sphere Example: 8Let’s find the curvature tensors on a sphere of radius ρ.

Construct normal coordinates (x , y ) with origin O, and let vec-tors dpc and dqd represent infinitesimal displacements along xand y , forming a quadrilateral as described above. Then Rx

yxyrepresents the change in the x direction that occurs in a vectorthat is initially in the y direction. If the vector has unit magni-tude, then Rx

yxy equals the angular deficit of the quadrilateral.Comparing with the definition of the Gaussian curvature, we findRx

yxy = K = 1/ρ2. Interchanging x and y , we find the same resultfor Ry

xyx . Thus although the Riemann tensor in two dimensionshas sixteen components, only these two are nonzero, and theyare equal to each other.

This result represents the defect in parallel transport around aclosed loop per unit area. Suppose we parallel-transport a vectoraround an octant, as shown in figure b. The area of the octantis (π/2)ρ2, and multiplying it by the Riemann tensor, we find thatthe defect in parallel transport is π/2, i.e., a right angle, as is alsoevident from the figure.

The above treatment may be somewhat misleading in that it maylead you to believe that there is a single coordinate system inwhich the Riemann tensor is always constant. This is not thecase, since the calculation of the Riemann tensor was only validnear the origin O of the normal coordinates. The character ofthese coordinates becomes quite complicated far from O; we endup with all our constant-x lines converging at north and southpoles of the sphere, and all the constant-y lines at east and westpoles.

Angular coordinates (φ, θ) are more suitable as a large-scale de-scription of the sphere. We can use the tensor transformation lawto find the Riemann tensor in these coordinates. If O, the originof the (x , y ) coordinates, is at coordinates (φ, θ), then dx/dφ =ρ sin θ and dy/dθ = ρ. The result is Rφ

θφθ = Rxyxy (dy/dθ)2 = 1

and Rθφθφ = Ry

xyx (dx/dφ)2 = sin2 θ. The variation in Rθφθφ is

not due to any variation in the sphere’s intrinsic curvature; it rep-resents the behavior of the coordinate system.

The Riemann tensor only measures curvature within a particularplane, the one defined by dpc and dqd, so it is a kind of sectional cur-vature. Since we’re currently working in two dimensions, however,there is only one plane, and no real distinction between sectionalcurvature and Ricci curvature, which is the average of the sectionalcurvature over all planes that include dqd: Rcd = Racad. The Ricci

Section 5.4 Curvature tensors 169

Page 170: General Relativity

a / The geodetic effect asmeasured by Gravity Probe B.

curvature in two spacelike dimensions, expressed in normal coordi-nates, is simply the diagonal matrix diag(K,K).

5.5 Some order-of-magnitude estimatesAs a general proposition, calculating an order-of-magnitude estimateof a physical effect requires an understanding of 50% of the physics,while an exact calculation requires about 75%.7 We’ve reachedthe point where it’s reasonable to attempt a variety of order-of-magnitude estimates.

5.5.1 The geodetic effect

How could we confirm experimentally that parallel transportaround a closed path can cause a vector to rotate? The rotationis related to the amount of spacetime curvature contained withinthe path, so it would make sense to choose a loop going arounda gravitating body. The rotation is a purely relativistic effect, sowe expect it to be small. To make it easier to detect, we shouldgo around the loop many times, causing the effect to accumulate.This is essentially a description of a body orbiting another body. Agyroscope aboard the orbiting body is expected to precess. This isknown as the geodetic effect. In 1916, shortly after Einstein pub-lished the general theory of relativity, Willem de Sitter calculatedthe effect on the earth-moon system. The effect was not directlyverified until the 1980’s, and the first high-precision measurementwas in 2007, from analysis of the results collected by the GravityProbe B satellite experiment. The probe carried four gyroscopesmade of quartz, which were the most perfect spheres ever manu-factured, varying from sphericity by no more than about 40 atoms.

Let’s estimate the size of the effect. The first derivative of themetric is, roughly, the gravitational field, whereas the second deriva-tive has to do with curvature. The curvature of spacetime aroundthe earth should therefore vary as GMr−3, where M is the earth’smass and G is the gravitational constant. The area enclosed by acircular orbit is proportional to r2, so we expect the geodetic effectto vary as nGM/r, where n is the number of orbits. The angle ofprecession is unitless, and the only way to make this result unitlessis to put in a factor of 1/c2. In units with c = 1, this factor is un-necessary. In ordinary metric units, the 1/c2 makes sense, becauseit causes the purely relativistic effect to come out to be small. Theresult, up to unitless factors that we didn’t pretend to find, is

∆θ ∼ nGM

c2r.

7This statement is itself only a rough estimate. Anyone who has taughtphysics knows that students will often calculate an effect exactly while not un-derstanding the underlying physics at all.

170 Chapter 5 Curvature

Page 171: General Relativity

We might also expect a Thomas precession. Like the spacetimecurvature effect, it would be proportional to nGM/c2r. Since we’renot worrying about unitless factors, we can just lump the Thomasprecession together with the effect already calculated.

The data for Gravity Probe B are r = re+(650 km) and n ≈ 5000(orbiting once every 90 minutes for the 353-day duration of theexperiment), giving ∆θ ∼ 3 × 10−6 radians. Figure b shows theactual results8 the four gyroscopes aboard the probe. The precessionwas about 6 arc-seconds, or 3 × 10−5 radians. Our crude estimatewas on the right order of magnitude. The missing unitless factor onthe right-hand side of the equation above is 3π, which brings the tworesults into fairly close quantitative agreement. The full derivation,including the factor of 3π, is given on page 214.

b / Precession angle as a function of time as measured by the four gyroscopes aboard Gravity Probe B.

5.5.2 Deflection of light rays

In the discussion of the momentum four vector in section 4.2.2,we saw that due to the equivalence principle, light must be affectedby gravity. There are two ways in which such an effect could occur.Light can gain and lose momentum as it travels up and down ina gravitational field, or its momentum vector can be deflected bya transverse gravitational field. As an example of the latter, a rayof starlight can be deflected by the sun’s gravity, causing the star’sapparent position in the sky to be shifted. The detection of thiseffect was one of the first experimental tests of general relativity.Ordinarily the bright light from the sun would make it impossibleto accurately measure a star’s location on the celestial sphere, butthis problem was sidestepped by Arthur Eddington during an eclipseof the sun in 1919.

Let’s estimate the size of this effect. We’ve already seen that

8arxiv.org/abs/1105.3456

Section 5.5 Some order-of-magnitude estimates 171

Page 172: General Relativity

c / One of the photos from Ed-dington’s observations of the1919 eclipse. This is a pho-tographic negative, so the cir-cle that appears bright is actuallythe dark face of the moon, andthe dark area is really the brightcorona of the sun. The stars,marked by lines above and be-low them, appeared at positionsslightly different than their nor-mal ones, indicating that their lighthad been bent by the sun’s gravityon its way to our planet.

the Riemann tensor is essentially just a tensorial way of writingthe Gaussian curvature K = dε/dA. Suppose, for the sake of thisrough estimate, that the sun, earth, and star form a non-Euclideantriangle with a right angle at the sun. Then the angular deflectionis the same as the angular defect ε of this triangle, and equals theintegral of the curvature over the interior of the triangle. Ignoringunitless constants, this ends up being exactly the same calculationas in section 5.5.1, and the result is ε ∼ GM/c2r, where r is thelight ray’s distance of closest approach to the sun. The value of rcan’t be less than the radius of the sun, so the maximum size of theeffect is on the order of GM/c2r, where M is the sun’s mass, and ris its radius. We find ε ∼ 10−5 radians, or about a second of arc. Tomeasure a star’s position to within an arc second was well withinthe state of the art in 1919, under good conditions in a comfortableobservatory. This observation, however, required that Eddington’steam travel to the island of Principe, off the coast of West Africa.The weather was cloudy, and only during the last 10 seconds of theseven-minute eclipse did the sky clear enough to allow photographicplates to be taken of the Hyades star cluster against the backgroundof the eclipse-darkened sky. The observed deflection was 1.6 secondsof arc, in agreement with the relativistic prediction. The relativisticprediction is derived on page 222.

5.6 The covariant derivativeIn the preceding section we were able to estimate a nontrivial generalrelativistic effect, the geodetic precession of the gyroscopes aboardGravity Probe B, up to a unitless constant 3π. Let’s think about

172 Chapter 5 Curvature

Page 173: General Relativity

a / A double-slit experimentwith electrons. If we add anarbitrary constant to the potential,no observable changes result.The wavelength is shortened, butthe relative phase of the two partsof the waves stays the same.

b / Two wavefunctions withconstant wavelengths, and athird with a varying wavelength.None of these are physicallydistinguishable, provided that thesame variation in wavelength isapplied to all electrons in theuniverse at any given point inspacetime. There is not evenany unambiguous way to pick outthe third one as the one with avarying wavelength. We couldchoose a different gauge in whichthe third wave was the only onewith a constant wavelength.

what additional machinery would be needed in order to carry outthe calculation in detail, including the 3π.

First we would need to know the Einstein field equation, but in avacuum this is fairly straightforward: Rab = 0. Einstein posited thisequation based essentially on the considerations laid out in section5.1.

But just knowing that a certain tensor vanishes identically in thespace surrounding the earth clearly doesn’t tell us anything explicitabout the structure of the spacetime in that region. We want toknow the metric. As suggested at the beginning of the chapter, weexpect that the first derivatives of the metric will give a quantityanalogous to the gravitational field of Newtonian mechanics, but thisquantity will not be directly observable, and will not be a tensor.The second derivatives of the metric are the ones that we expect torelate to the Ricci tensor Rab.

5.6.1 The covariant derivative in electromagnetism

We’re talking blithely about derivatives, but it’s not obvious howto define a derivative in the context of general relativity in such away that taking a derivative results in well-behaved tensor.

To see how this issue arises, let’s retreat to the more familiarterrain of electromagnetism. In quantum mechanics, the phase of acharged particle’s wavefunction is unobservable, so that for examplethe transformation Ψ → −Ψ does not change the results of experi-ments. As a less trivial example, we can redefine the ground of ourelectrical potential, Φ→ Φ + δΦ, and this will add a constant ontothe energy of every electron in the universe, causing their phases tooscillate at a greater rate due to the quantum-mechanical relationE = hf . There are no observable consequences, however, becausewhat is observable is the phase of one electron relative to another,as in a double-slit interference experiment. Since every electron hasbeen made to oscillate faster, the effect is simply like letting the con-ductor of an orchestra wave her baton more quickly; every musicianis still in step with every other musician. The rate of change of thewavefunction, i.e., its derivative, has some built-in ambiguity.

For simplicity, let’s now restrict ourselves to spin-zero parti-cles, since details of electrons’ polarization clearly won’t tell usanything useful when we make the analogy with relativity. For aspin-zero particle, the wavefunction is simply a complex number,and there are no observable consequences arising from the transfor-mation Ψ→ Ψ′ = eiαΨ, where α is a constant. The transformationΦ → Φ − δΦ is also allowed, and it gives α(t) = (qδΦ/~)t, so thatthe phase factor eiα(t) is a function of time t. Now from the pointof view of electromagnetism in the age of Maxwell, with the elec-tric and magnetic fields imagined as playing their roles against abackground of Euclidean space and absolute time, the form of this

Section 5.6 The covariant derivative 173

Page 174: General Relativity

time-dependent phase factor is very special and symmetrical; it de-pends only on the absolute time variable. But to a relativist, there isnothing very nice about this function at all, because there is nothingspecial about a time coordinate. If we’re going to allow a functionof this form, then based on the coordinate-invariance of relativity, itseems that we should probably allow α to be any function at all ofthe spacetime coordinates. The proper generalization of Φ→ Φ−δΦis now Ab → Ab − ∂bα, where Ab is the electromagnetic potentialfour-vector (section 4.2.5, page 137).

Self-check: Suppose we said we would allow α to be a functionof t, but forbid it to depend on the spatial coordinates. Prove thatthis would violate Lorentz invariance.

The transformation has no effect on the electromagnetic fields,which are the direct observables. We can also verify that the changeof gauge will have no effect on observable behavior of charged par-ticles. This is because the phase of a wavefunction can only bedetermined relative to the phase of another particle’s wavefunction,when they occupy the same point in space and, for example, inter-fere. Since the phase shift depends only on the location in spacetime,there is no change in the relative phase.

But bad things will happen if we don’t make a correspondingadjustment to the derivatives appearing in the Schrodinger equation.These derivatives are essentially the momentum operators, and theygive different results when applied to Ψ′ than when applied to Ψ:

∂bΨ→ ∂b(eiαΨ

)= eiα∂bΨ + i∂bα

(eiαΨ

)=(∂b +A′b −Ab

)Ψ′

To avoid getting incorrect results, we have to do the substitution∂b → ∂b + ieAb, where the correction term compensates for thechange of gauge. We call the operator ∇ defined as

∇b = ∂b + ieAb

the covariant derivative. It gives the right answer regardless of achange of gauge.

5.6.2 The covariant derivative in general relativity

Now consider how all of this plays out in the context of gen-eral relativity. The gauge transformations of general relativity arearbitrary smooth changes of coordinates. One of the most basicproperties we could require of a derivative operator is that it mustgive zero on a constant function. A constant scalar function remainsconstant when expressed in a new coordinate system, but the sameis not true for a constant vector function, or for any tensor of higherrank. This is because the change of coordinates changes the units

174 Chapter 5 Curvature

Page 175: General Relativity

c / These three rulers representthree choices of coordinates. Asin figure b on page 173, switchingfrom one set of coordinates toanother has no effect on anyexperimental observables. It ismerely a choice of gauge.

in which the vector is measured, and if the change of coordinates isnonlinear, the units vary from point to point.

Consider the one-dimensional case, in which a vector va has onlyone component, and the metric is also a single number, so that wecan omit the indices and simply write v and g. (We just have toremember that v is really a covariant vector, even though we’releaving out the upper index.) If v is constant, its derivative dv/dx,computed in the ordinary way without any correction term, is zero.If we further assume that the coordinate x is a normal coordinate, sothat the metric is simply the constant g = 1, then zero is not just theanswer but the right answer. (The existence of a preferred, globalset of normal coordinates is a special feature of a one-dimensionalspace, because there is no curvature in one dimension. In more thanone dimension, there will typically be no possible set of coordinatesin which the metric is constant, and normal coordinates only give ametric that is approximately constant in the neighborhood arounda certain point. See figure g pn page 164 for an example of normalcoordinates on a sphere, which do not have a constant metric.)

Now suppose we transform into a new coordinate system X,which is not normal. The metric G, expressed in this coordinatesystem, is not constant. Applying the tensor transformation law,we have V = v dX/ dx, and differentiation with respect to X willnot give zero, because the factor dX/ dx isn’t constant. This is thewrong answer: V isn’t really varying, it just appears to vary becauseG does.

We want to add a correction term onto the derivative operatord/ dX, forming a covariant derivative operator ∇X that gives theright answer. This correction term is easy to find if we considerwhat the result ought to be when differentiating the metric itself.In general, if a tensor appears to vary, it could vary either becauseit really does vary or because the metric varies. If the metric itselfvaries, it could be either because the metric really does vary or. . . because the metric varies. In other words, there is no sensibleway to assign a nonzero covariant derivative to the metric itself, sowe must have ∇XG = 0. The required correction therefore consistsof replacing d/ dX with

∇X =d

dX−G−1 dG

dX.

Applying this to G gives zero. G is a second-rank contravarianttensor. If we apply the same correction to the derivatives of othersecond-rank contravariant tensors, we will get nonzero results, andthey will be the right nonzero results. For example, the covariantderivative of the stress-energy tensor T (assuming such a thing couldhave some physical significance in one dimension!) will be ∇XT =dT/ dX −G−1(dG/dX)T .

Physically, the correction term is a derivative of the metric, and

Section 5.6 The covariant derivative 175

Page 176: General Relativity

we’ve already seen that the derivatives of the metric (1) are the clos-est thing we get in general relativity to the gravitational field, and(2) are not tensors. In 1+1 dimensions, suppose we observe that afree-falling rock has dV/ dT = 9.8 m/s2. This acceleration cannot bea tensor, because we could make it vanish by changing from Earth-fixed coordinates X to free-falling (normal, locally Lorentzian) co-ordinates x, and a tensor cannot be made to vanish by a change ofcoordinates. According to a free-falling observer, the vector v isn’tchanging at all; it is only the variation in the Earth-fixed observer’smetric G that makes it appear to change.

Mathematically, the form of the derivative is (1/y) dy/dx, whichis known as a logarithmic derivative, since it equals d(ln y)/dx. Itmeasures the multiplicative rate of change of y. For example, ify scales up by a factor of k when x increases by 1 unit, then thelogarithmic derivative of y is ln k. The logarithmic derivative ofecx is c. The logarithmic nature of the correction term to ∇X is agood thing, because it lets us take changes of scale, which are mul-tiplicative changes, and convert them to additive corrections to thederivative operator. The additivity of the corrections is necessary ifthe result of a covariant derivative is to be a tensor, since tensorsare additive creatures.

What about quantities that are not second-rank covariant ten-sors? Under a rescaling of contravariant coordinates by a factor ofk, covariant vectors scale by k−1, and second-rank covariant tensorsby k−2. The correction term should therefore be half as much forcovariant vectors,

∇X =d

dX− 1

2G−1 dG

dX.

and should have an opposite sign for contravariant vectors.

Generalizing the correction term to derivatives of vectors in morethan one dimension, we should have something of this form:

∇avb = ∂avb + Γbacv

c

∇avb = ∂avb − Γcbavc ,

where Γbac, called the Christoffel symbol, does not transform likea tensor, and involves derivatives of the metric. (“Christoffel” ispronounced “Krist-AWful,” with the accent on the middle syllable.)The explicit computation of the Christoffel symbols from the metricis deferred until section 5.9, but the intervening sections 5.7 and 5.8can be omitted on a first reading without loss of continuity.

An important gotcha is that when we evaluate a particular com-ponent of a covariant derivative such as ∇2v

3, it is possible for theresult to be nonzero even if the component v3 vanishes identically.This can be seen in example 5 on p. 279 and example 21 on p. 318.

176 Chapter 5 Curvature

Page 177: General Relativity

d / Example 9.

Christoffel symbols on the globe Example: 9As a qualitative example, consider the geodesic airplane trajec-tory shown in figure d, from London to Mexico City. In physicsit is customary to work with the colatitude, θ, measured downfrom the north pole, rather then the latitude, measured from theequator. At P, over the North Atlantic, the plane’s colatitude hasa minimum. (We can see, without having to take it on faith fromthe figure, that such a minimum must occur. The easiest way toconvince oneself of this is to consider a path that goes directlyover the pole, at θ = 0.)

At P, the plane’s velocity vector points directly west. At Q, overNew England, its velocity has a large component to the south.Since the path is a geodesic and the plane has constant speed,the velocity vector is simply being parallel-transported; the vec-tor’s covariant derivative is zero. Since we have vθ = 0 at P, theonly way to explain the nonzero and positive value of ∂φvθ is thatwe have a nonzero and negative value of Γθφφ .

By symmetry, we can infer that Γθφφ must have a positive valuein the southern hemisphere, and must vanish at the equator.

Γθφφ is computed in example 10 on page 188.

Symmetry also requires that this Christoffel symbol be indepen-dent of φ, and it must also be independent of the radius of thesphere.

Example 9 is in two spatial dimensions. In spacetime, Γ is es-sentially the gravitational field (see problem 6, p. 200), and earlypapers in relativity essentially refer to it that way.9 This may feellike a joyous reunion with our old friend from freshman mechanics,g = 9.8 m/s. But our old friend has changed. In Newtonian me-chanics, accelerations like g are frame-invariant (considering onlyinertial frames, which are the only legitimate ones in that theory).In general relativity they are frame-dependent, and as we saw onpage 176, the acceleration of gravity can be made to equal anythingwe like, based on our choice of a frame of reference.

To compute the covariant derivative of a higher-rank tensor, wejust add more correction terms, e.g.,

∇aUbc = ∂aUbc − ΓdbaUdc − ΓdcaUbd

or

∇aU cb = ∂aUcb − ΓdbaU

cd + ΓcadU

db .

With the partial derivative ∂µ, it does not make sense to use the

9“On the gravitational field of a point mass according to Einstein’s the-ory,” Sitzungsberichte der Koniglich Preussischen Akademie der Wissenschaften1 (1916) 189, translated in arxiv.org/abs/physics/9905030v1.

Section 5.6 The covariant derivative 177

Page 178: General Relativity

e / Birdtracks notation for thecovariant derivative.

a / The geodesic, 1, preservestangency under parallel trans-port. The non-geodesic curve,2, doesn’t have this property;a vector initially tangent to thecurve is no longer tangent to itwhen parallel-transported alongit.

metric to raise the index and form ∂µ. It does make sense to do sowith covariant derivatives, so ∇a = gab∇b is a correct identity.

Comma, semicolon, and birdtracks notation

Some authors use superscripts with commas and semicolons toindicate partial and covariant derivatives. The following equationsgive equivalent notations for the same derivatives:

∂µXν = Xν,µ

∇aXb = Xb;a

∇aXb = X ;ab

Figure e shows two examples of the corresponding birdtracks no-tation. Because birdtracks are meant to be manifestly coordinate-independent, they do not have a way of expressing non-covariantderivatives. We no longer want to use the circle as a notation fora non-covariant gradient as we did when we first introduced it onp. 48.

5.7 The geodesic equationIn this section, which can be skipped at a first reading, we show howthe Christoffel symbols can be used to find differential equations thatdescribe geodesics.

5.7.1 Characterization of the geodesic

A geodesic can be defined as a world-line that preserves tangencyunder parallel transport, a. This is essentially a mathematical wayof expressing the notion that we have previously expressed moreinformally in terms of “staying on course” or moving “inertially.”

A curve can be specified by giving functions xi(λ) for its coor-dinates, where λ is a real parameter. A vector lying tangent to thecurve can then be calculated using partial derivatives, T i = ∂xi/∂λ.There are three ways in which a vector function of λ could change:(1) it could change for the trivial reason that the metric is changing,so that its components changed when expressed in the new metric;(2) it could change its components perpendicular to the curve; or(3) it could change its component parallel to the curve. Possibility1 should not really be considered a change at all, and the definitionof the covariant derivative is specifically designed to be insensitiveto this kind of thing. 2 cannot apply to T i, which is tangent byconstruction. It would therefore be convenient if T i happened tobe always the same length. If so, then 3 would not happen either,and we could reexpress the definition of a geodesic by saying thatthe covariant derivative of T i was zero. For this reason, we willassume for the remainder of this section that the parametrizationof the curve has this property. In a Newtonian context, we couldimagine the xi to be purely spatial coordinates, and λ to be a uni-

178 Chapter 5 Curvature

Page 179: General Relativity

versal time coordinate. We would then interpret T i as the velocity,and the restriction would be to a parametrization describing motionwith constant speed. In relativity, the restriction is that λ must bean affine parameter. For example, it could be the proper time of aparticle, if the curve in question is timelike.

5.7.2 Covariant derivative with respect to a parameter

The notation of section 5.6 is not quite adapted to our presentpurposes, since it allows us to express a covariant derivative withrespect to one of the coordinates, but not with respect to a param-eter such as λ. We would like to notate the covariant derivative ofT i with respect to λ as ∇λT i, even though λ isn’t a coordinate. Toconnect the two types of derivatives, we can use a total derivative.To make the idea clear, here is how we calculate a total derivativefor a scalar function f(x, y), without tensor notation:

df

dλ=∂f

∂x

∂x

∂λ+∂f

∂y

∂y

∂λ.

This is just the generalization of the chain rule to a function of twovariables. For example, if λ represents time and f temperature,then this would tell us the rate of change of the temperature asa thermometer was carried through space. Applying this to thepresent problem, we express the total covariant derivative as

∇λT i = (∇bT i)dxb

=(∂bT

i + ΓibcTc) dxb

dλ.

5.7.3 The geodesic equation

Recognizing ∂bTi dxb/dλ as a total non-covariant derivative, we

find

∇λT i =dT i

dλ+ ΓibcT

cdxb

dλ.

Substituting ∂xi/∂λ for T i, and setting the covariant derivativeequal to zero, we obtain

d2 xi

dλ2+ Γibc

dxc

dxb

dλ= 0.

This is known as the geodesic equation.

If this differential equation is satisfied for one affine parameterλ, then it is also satisfied for any other affine parameter λ′ = aλ+ b,where a and b are constants (problem 4). Recall that affine param-eters are only defined along geodesics, not along arbitrary curves.We can’t start by defining an affine parameter and then use it tofind geodesics using this equation, because we can’t define an affineparameter without first specifying a geodesic. Likewise, we can’t

Section 5.7 The geodesic equation 179

Page 180: General Relativity

do the geodesic first and then the affine parameter, because if wealready had a geodesic in hand, we wouldn’t need the differentialequation in order to find a geodesic. The solution to this chicken-and-egg conundrum is to write down the differential equations andtry to find a solution, without trying to specify either the affine pa-rameter or the geodesic in advance. We will seldom have occasionto resort to this technique, an exception being example 19 on page316.

5.7.4 Uniqueness

The geodesic equation is useful in establishing one of the neces-sary theoretical foundations of relativity, which is the uniqueness ofgeodesics for a given set of initial conditions. This is related to ax-iom O1 of ordered geometry, that two points determine a line, andis necessary physically for the reasons discussed on page 22; briefly,if the geodesic were not uniquely determined, then particles wouldhave no way of deciding how to move. The form of the geodesicequation guarantees uniqueness. To see this, consider the followingalgorithm for determining a numerical approximation to a geodesic:

1. Initialize λ, the xi and their derivatives dxi/ dλ. Also, set asmall step-size ∆λ by which to increment λ at each step below.

2. For each i, calculate d2 xi/ dλ2 using the geodesic equation.

3. Add (d2 xi/ dλ2)∆λ to the currently stored value of dxi/ dλ.

4. Add (dxi/dλ)∆λ to xi.

5. Add ∆λ to λ.

6. Repeat steps 2-5 until the geodesic has been extended to thedesired affine distance.

Since the result of the calculation depends only on the inputs atstep 1, we find that the geodesic is uniquely determined.

To see that this is really a valid way of proving uniqueness, itmay be helpful to consider how the proof could have failed. Omittingsome of the details of the tensors and the multidimensionality of thespace, the form of the geodesic equation is essentially x+ fx2 = 0,where dots indicate derivatives with respect to λ. Suppose that ithad instead had the form x2 + fx = 0. Then at step 2 we wouldhave had to pick either a positive or a negative square root for x.Although continuity would usually suffice to maintain a consistentsign from one iteration to the next, that would not work if we evercame to a point where x vanished momentarily. An equation of thisform therefore would not have a unique solution for a given set ofinitial conditions.

The practical use of this algorithm to compute geodesics numer-ically is demonstrated in section 5.9.2 on page 188.

180 Chapter 5 Curvature

Page 181: General Relativity

5.8 TorsionThis section describes the concept of gravitational torsion. It canbe skipped without loss of continuity, provided that you accept thesymmetry property Γa[bc] = 0 without worrying about what it meansphysically or what empirical evidence supports it.

Self-check: Interpret the mathematical meaning of the equationΓa[bc] = 0, which is expressed in the notation introduced on page102.

5.8.1 Are scalars path-dependent?

It seems clear that something like the covariant derivative isneeded for vectors, since they have a direction in spacetime, andthus their measures vary when the measure of spacetime itself varies.Since scalars don’t have a direction in spacetime, the same reasoningdoesn’t apply to them, and this is reflected in our rules for covariantderivatives. The covariant derivative has one Γ term for every indexof the tensor being differentiated, so for a scalar there should be noΓ terms at all, i.e., ∇a is the same as ∂a.

But just because derivatives of scalars don’t require special treat-ment for this particular reason, that doesn’t mean they are guaran-teed to behave as we intuitively expect, in the strange world ofcoordinate-invariant relativity.

One possible way for scalars to behave counterintuitively wouldbe by analogy with parallel transport of vectors. If we stick a vectorin a box (as with, e.g., the gyroscopes aboard Gravity Probe B) andcarry it around a closed loop, it changes. Could the same happenwith a scalar? This is extremely counterintuitive, since there is noreason to imagine such an effect in any of the models we’ve con-structed of curved spaces. In fact, it is not just counterintuitive butmathematically impossible, according to the following argument.The only reason we can interpret the vector-in-a-box effect as aris-ing from the geometry of spacetime is that it applies equally to allvectors. If, for example, it only applied to the magnetic polariza-tion vectors of ferromagnetic substances, then we would interpretit as a magnetic field living in spacetime, not a property of space-time itself. If the value of a scalar-in-a-box was path-dependent,and this path-dependence was a geometric property of spacetime,then it would have to apply to all scalars, including, say, massesand charges of particles. Thus if an electron’s mass increased by 1%when transported in a box along a certain path, its charge wouldhave to increase by 1% as well. But then its charge-to-mass ra-tio would remain invariant, and this is a contradiction, since thecharge-to-mass ratio is also a scalar, and should have felt the same1% effect. Since the varying scalar-in-a-box idea leads to a contra-diction, it wasn’t a coincidence that we couldn’t find a model thatproduced such an effect; a theory that lacks self-consistency doesn’t

Section 5.8 Torsion 181

Page 182: General Relativity

a / Measuring ∂2T/∂x∂y fora scalar T .

b / The gyroscopes both ro-tate when transported from Ato B, causing Helen to navigatealong BC, which does not forma right angle with AB. The anglebetween the two gyroscopes’axes is always the same, so therotation is not locally observable,but it does produce an observablegap between C and E.

have any models.

Self-check: Explain why parallel transporting a vector can onlyrotate it, not change its magnitude.

There is, however, a different way in which scalars could behavecounterintuitively, and this one is mathematically self-consistent.Suppose that Helen lives in two spatial dimensions and owns a ther-mometer. She wants to measure the spatial variation of tempera-ture, in particular its mixed second derivative ∂2T/∂x∂y. At homein the morning at point A, she prepares by calibrating her gyro-compass to point north and measuring the temperature. Then shetravels ` = 1 km east along a geodesic to B, consults her gyro-compass, and turns north. She continues one kilometer north to C,samples the change in temperature ∆T1 relative to her home, andthen retraces her steps to come home for lunch. In the afternoon,she checks her work by carrying out the same process, but this timeshe interchanges the roles of north and east, traveling along ADE.If she were living in a flat space, this would form the other two sidesof a square, and her afternoon temperature sample ∆T2 would beat the same point in space C as her morning sample. She actuallydoesn’t recognize the landscape, so the sample points C and E aredifferent, but this just confirms what she already knew: the spaceisn’t flat.10

None of this seems surprising yet, but there are now two quali-tatively different ways that her analysis of her data could turn out,indicating qualitatively different things about the laws of physicsin her universe. The definition of the derivative as a limit requiresthat she repeat the experiment at smaller scales. As ` → 0, theresult for ∂2T/∂x∂y should approach a definite limit, and the er-ror should diminish in proportion to `. In particular the differencebetween the results inferred from ∆T1 and ∆T2 indicate an error,and the discrepancy between the second derivatives inferred fromthem should shrink appropriately as ` shrinks. Suppose this doesn’thappen. Since partial derivatives commute, we conclude that hermeasuring procedure is not the same as a partial derivative. Let’scall her measuring procedure ∇, so that she is observing a discrep-ancy between ∇x∇y and ∇y∇x. The fact that the commutator∇x∇y −∇y∇x doesn’t vanish cannot be explained by the Christof-fel symbols, because what she’s differentiating is a scalar. Since thediscrepancy arises entirely from the failure of ∆T1 − ∆T2 to scaledown appropriately, the conclusion is that the distance δ betweenthe two sampling points is not scaling down as quickly as we ex-pect. In our familiar models of two-dimensional spaces as surfacesembedded in three-space, we always have δ ∼ `3 for small `, but shehas found that it only shrinks as quickly as `2.

10This point was mentioned on page 168, in connection with the definition ofthe Riemann tensor.

182 Chapter 5 Curvature

Page 183: General Relativity

c / Three gyroscopes are ini-tially aligned with the x , y , andz axes. After parallel transportalong the geodesic x axis, thex gyro is still aligned with the xaxis, but the y and z gyros haverotated.

For a clue as to what is going on, note that the commutator∇x∇y − ∇y∇x has a particular handedness to it. For example,it flips its sign under a reflection across the line y = x. When we“parallel”-transport vectors, they aren’t actually staying parallel. Inthis hypothetical universe, a vector in a box transported by a smalldistance ` rotates by an angle proportional to `. This effect is calledtorsion. Although no torsion effect shows up in our familiar models,that is not because torsion lacks self-consistency. Models of spaceswith torsion do exist. In particular, we can see that torsion doesn’tlead to the same kind of logical contradiction as the varying-scalar-in-a-box idea. Since all vectors twist by the same amount whentransported, inner products are preserved, so it is not possible toput two vectors in one box and get the scalar-in-a-box paradox bywatching their inner product change when the box is transported.

Note that the elbows ABC and ADE are not right angles. IfHelen had brought a pair of gyrocompasses with her, one for x andone for y, she would have found that the right angle between thegyrocompasses was preserved under parallel transport, but that agyrocompass initially tangent to a geodesic did not remain so. Thereare in fact two inequivalent definitions of a geodesic in a space withtorsion. The shortest path between two points is not necessarilythe same as the straightest possible path, i.e., the one that parallel-transports its own tangent vector.

5.8.2 The torsion tensor

Since torsion is odd under parity, it must be represented by anodd-rank tensor, which we call τ cab and define according to

(∇a∇b −∇b∇a)f = −τ cab∇cf ,

where f is any scalar field, such as the temperature in the preced-ing section. There are two different ways in which a space can benon-Euclidean: it can have curvature, or it can have torsion. Fora full discussion of how to handle the mathematics of a spacetimewith both curvature and torsion, see the article by Steuard Jensen athttp://www.slimy.com/~steuard/teaching/tutorials/GRtorsion.

pdf. For our present purposes, the main mathematical fact worthnoting is that vanishing torsion is equivalent to the symmetry Γabc =Γacb of the Christoffel symbols. Using the notation introduced onpage 102, Γa[bc] = 0 if τ = 0.

Self-check: Use an argument similar to the one in example 5on page 166 to prove that no model of a two-space embedded in athree-space can have torsion.

Generalizing to more dimensions, the torsion tensor is odd underthe full spacetime reflection xa → −xa, i.e., a parity inversion plusa time-reversal, PT.

In the story above, we had a torsion that didn’t preserve tan-gent vectors. In three or more dimensions, however, it is possible

Section 5.8 Torsion 183

Page 184: General Relativity

to have torsion that does preserve tangent vectors. For example,transporting a vector along the x axis could cause only a rotation inthe y-z plane. This relates to the symmetries of the torsion tensor,which for convenience we’ll write in an x-y-z coordinate system andin the fully covariant form τλµν . The definition of the torsion tensorimplies τλ(µν) = 0, i.e., that the torsion tensor is antisymmetric inits two final indices. Torsion that does not preserve tangent vectorswill have nonvanishing elements such as τxxy, meaning that parallel-transporting a vector along the x axis can change its x component.Torsion that preserves tangent vectors will have vanishing τλµν un-less λ, µ, and ν are all distinct. This is an example of the typeof antisymmetry that is familiar from the vector cross product, inwhich the cross products of the basis vectors behave as x × y = z,y× z = x, y× z = x. Generalizing the notation for symmetrizationand antisymmetrization of tensors from page 102, we have

T(abc) =1

3!ΣTabc

T[abc] =1

3!ΣεabcTabc ,

where the sums are over all permutations of the indices, and in thesecond line we have used the Levi-Civita symbol. In this notation,a totally antisymmetric torsion tensor is one with τλµν=τ[λµν], andtorsion of this type preserves tangent vectors under translation.

In two dimensions, there are no totally antisymmetric objectswith three indices, because we can’t write three indices withoutrepeating one. In three dimensions, an antisymmetric object withthree indices is simply a multiple of the Levi-Civita tensor, so atotally antisymmetric torsion, if it exists, is represented by a singlenumber; under translation, vectors rotate like either right-handedor left-handed screws, and this number tells us the rate of rotation.In four dimensions, we have four independently variable quantities,τxyz, τtyz, τtxz, and τtxy. In other words, an antisymmetric torsion of3+1 spacetime can be represented by a four-vector, τa = εabcdτbcd.

5.8.3 Experimental searches for torsion

One way of stating the equivalence principle (see p. 142) is thatit forbids spacetime from coming equipped with a vector field thatcould be measured by free-falling observers, i.e., observers in localLorentz frames. A variety of high-precision tests of the equivalenceprinciple have been carried out. From the point of view of an ex-perimenter doing this kind of test, it is important to distinguishbetween fields that are “built in” to spacetime and those that livein spacetime. For example, the existence of the earth’s magneticfield does not violate the equivalence principle, but if an experi-ment was sensitive to the earth’s field, and the experimenter didn’tknow about it, there would appear to be a violation. Antisymmet-ric torsion in four dimensions acts like a vector. If it constitutes

184 Chapter 5 Curvature

Page 185: General Relativity

d / The University of Wash-ington torsion pendulum used tosearch for torsion. The light graywedges are Alnico, the darkerones SmCo5. The arrows withthe filled heads represent thedirections of the electron spins,with denser arrows indicatinghigher polarization. The arrowswith the open heads show thedirection of the B field.

a universal background effect built into spacetime, then it violatesthe equivalence principle. If it instead arises from specific materialsources, then it may still show up as a measurable effect in exper-imental tests designed to detect Lorentz-invariance. Let’s considerthe latter possibility.

Since curvature in general relativity comes from mass and en-ergy, as represented by the stress-energy tensor Tab, we could askwhat would be the sources of torsion, if it exists in our universe.The source can’t be the rank-2 stress-energy tensor. It would haveto be an odd-rank tensor, i.e., a quantity that is odd under PT, andin theories that include torsion it is commonly assumed that thesource is the quantum-mechanical angular momentum of subatomicparticles. If this is the case, then torsion effects are expected to beproportional to ~G, the product of Planck’s constant and the gravi-tational constant, and they should therefore be extremely small andhard to measure. String theory, for example, includes torsion, butnobody has found a way to test string theory empirically because itessentially makes predictions about phenomena at the Planck scale,√~G/c3 ∼ 10−35 m, where both gravity and quantum mechanics

are strong effects.

There are, however, some high-precision experiments that havea reasonable chance of detecting whether our universe has torsion.Torsion violates the equivalence principle, and by the turn of thecentury tests of the equivalence principle had reached a level ofprecision sufficient to rule out some models that include torsion.Figure d shows a torsion pendulum used in an experiment by theEot-Wash group at the University of Washington.11 If torsion exists,then the intrinsic spin σ of an electron should have an energy σ · τ ,where τ is the spacelike part of the torsion vector. The torsioncould be generated by the earth, the sun, or some other object at agreater distance. The interaction σ ·τ will modify the behavior of atorsion pendulum if the spins of the electrons in the pendulum arepolarized nonrandomly, as in a magnetic material. The pendulumwill tend to precess around the axis defined by τ .

This type of experiment is extremely difficult, because the pen-dulum tends to act as an ultra-sensitive magnetic compass, resultingin a measurement of the ambient magnetic field rather than the hy-pothetical torsion field τ . To eliminate this source of systematicerror, the UW group first eliminated the ambient magnetic fieldas well as possible, using mu-metal shielding and Helmholtz coils.They also constructed the pendulum out of a combination of twomagnetic materials, Alnico 5 and SmCo5, in such a way that themagnetic dipole moment vanished, but the spin dipole moment didnot; Alnico 5’s magnetic field is due almost entirely to electron spin,whereas the magnetic field of SmCo5 contains significant contribu-

11http://arxiv.org/abs/hep-ph/0606218

Section 5.8 Torsion 185

Page 186: General Relativity

tions from orbital motion. The result was a nonmagnetic objectwhose spins were polarized. After four years of data collection, theyfound |τ | . 10−21 eV. Models that include torsion typically predictsuch effects to be of the order of m2

e/mP ∼ 10−17 eV, where me isthe mass of the electron and mP =

√~c/G ≈ 1019 GeV ≈ 20 µg is

the Planck mass. A wide class of these models is therefore ruled outby these experiments.

Since there appears to be no experimental evidence for the exis-tence of gravitational torsion in our universe, we will assume fromnow on that it vanishes identically. Einstein made the same as-sumption when he originally created general relativity, although heand Cartan later tinkered with non-torsion-free theories in a failedattempt to unify gravity with electromagnetism. Some models thatinclude torsion remain viable. For example, it has been argued thatthe torsion tensor should fall off quickly with distance from thesource.12

12Carroll and Field, http://arxiv.org/abs/gr-qc/9403058

186 Chapter 5 Curvature

Page 187: General Relativity

5.9 From metric to curvature5.9.1 Finding the Christoffel symbol from the metric

We’ve already found the Christoffel symbol in terms of the metricin one dimension. Expressing it in tensor notation, we have

Γdba =1

2gcd (∂?g??) ,

where inversion of the one-component matrix G has been replacedby matrix inversion, and, more importantly, the question marks indi-cate that there would be more than one way to place the subscriptsso that the result would be a grammatical tensor equation. Themost general form for the Christoffel symbol would be

Γbac =1

2gdb (L∂cgab +M∂agcb +N∂bgca) ,

where L, M , and N are constants. Consistency with the one-dimensional expression requires L + M + N = 1, and vanishingtorsion gives L = M . The L and M terms have a different physicalsignificance than the N term.

Suppose an observer uses coordinates such that all objects aredescribed as lengthening over time, and the change of scale accu-mulated over one day is a factor of k > 1. This is described by thederivative ∂tgxx < 1, which affects the M term. Since the metric isused to calculate squared distances, the gxx matrix element scalesdown by 1/

√k. To compensate for ∂tv

x < 0, so we need to add apositive correction term, M > 0, to the covariant derivative. Whenthe same observer measures the rate of change of a vector vt withrespect to space, the rate of change comes out to be too small, be-cause the variable she differentiates with respect to is too big. Thisrequires N < 0, and the correction is of the same size as the Mcorrection, so |M | = |N |. We find L = M = −N = 1.

Self-check: Does the above argument depend on the use of spacefor one coordinate and time for the other?

The resulting general expression for the Christoffel symbol interms of the metric is

Γcab =1

2gcd (∂agbd + ∂bgad − ∂dgab) .

One can readily go back and check that this gives∇cgab = 0. In fact,the calculation is a bit tedious. For that matter, tensor calculationsin general can be infamously time-consuming and error-prone. Anyreasonable person living in the 21st century will therefore resort toa computer algebra system. The most widely used computer alge-bra system is Mathematica, but it’s expensive and proprietary, andit doesn’t have extensive built-in facilities for handling tensors. It

Section 5.9 From metric to curvature 187

Page 188: General Relativity

turns out that there is quite a bit of free and open-source tensor soft-ware, and it falls into two classes: coordinate-based and coordinate-independent. The best open-source coordinate-independent facil-ity available appears to be Cadabra, and in fact the verification of∇cgab = 0 is the first example given in the Leo Brewin’s handy guideto applications of Cadabra to general relativity.13

Self-check: In the case of 1 dimension, show that this reduces tothe earlier result of −(1/2) dG/dX.

Since Γ is not a tensor, it is not obvious that the covariant deriva-tive, which is constructed from it, is a tensor. But if it isn’t obvious,neither is it surprising – the goal of the above derivation was to getresults that would be coordinate-independent.

Christoffel symbols on the globe, quantitatively Example: 10In example 9 on page 177, we inferred the following propertiesfor the Christoffel symbol Γθφφ on a sphere of radius R: Γθφφ isindependent of φ and R, Γθφφ < 0 in the northern hemisphere(colatitude θ less than π/2), Γθφφ = 0 on the equator, and Γθφφ >0 in the southern hemisphere.

The metric on a sphere is ds2 = R2 dθ2 + R2 sin2 θdφ2. The onlynonvanishing term in the expression for Γθφφ is the one involving∂θgφφ = 2R2 sin θ cos θ. The result is Γθφφ = − sin θ cos θ, whichcan be verified to have the properties claimed above.

5.9.2 Numerical solution of the geodesic equation

On page 180 I gave an algorithm that demonstrated the unique-ness of the solutions to the geodesic equation. This algorithm canalso be used to find geodesics in cases where the metric is known.The following program, written in the computer language Python,carries out a very simple calculation of this kind, in a case wherewe know what the answer should be; even without any previousfamiliarity with Python, it shouldn’t be difficult to see the corre-spondence between the abstract algorithm presented on page 180and its concrete realization below. For polar coordinates in a Eu-clidean plane, one can compute Γrφφ = −r and Γφrφ = 1/r (problem2, page 200). Here we compute the geodesic that starts out tangentto the unit circle at φ = 0.

1 import math

2

3 l = 0 # affine parameter lambda

4 dl = .001 # change in l with each iteration

5 l_max = 100.

6

7 # initial position:

13http://arxiv.org/abs/0903.2085

188 Chapter 5 Curvature

Page 189: General Relativity

8 r=1

9 phi=0

10 # initial derivatives of coordinates w.r.t. lambda

11 vr = 0

12 vphi = 1

13

14 k = 0 # keep track of how often to print out updates

15 while l<l_max:

16 l = l+dl

17 # Christoffel symbols:

18 Grphiphi = -r

19 Gphirphi = 1/r

20 # second derivatives:

21 ar = -Grphiphi*vphi*vphi

22 aphi = -2.*Gphirphi*vr*vphi

23 # ... factor of 2 because G^a_bc=G^a_cb and b

24 # is not the same as c

25 # update velocity:

26 vr = vr + dl*ar

27 vphi = vphi + dl*aphi

28 # update position:

29 r = r + vr*dl

30 phi = phi + vphi*dl

31 if k%10000==0: # k is divisible by 10000

32 phi_deg = phi*180./math.pi

33 print "lambda=%6.2f r=%6.2f phi=%6.2f deg." % (l,r,phi_deg)

34 k = k+1

It is not necessary to worry about all the technical details of thelanguage (e.g., line 1, which makes available such conveniences asmath.pi for π). Comments are set off by pound signs. Lines 16-34are indented because they are all to be executed repeatedly, until itis no longer true that λ < λmax (line 15).

Self-check: By inspecting lines 18-22, find the signs of r and φat λ = 0. Convince yourself that these signs are what we expectgeometrically.

The output is as follows:

1 lambda= 0.00 r= 1.00 phi= 0.06 deg.

2 lambda= 10.00 r= 10.06 phi= 84.23 deg.

3 lambda= 20.00 r= 20.04 phi= 87.07 deg.

4 lambda= 30.00 r= 30.04 phi= 88.02 deg.

5 lambda= 40.00 r= 40.04 phi= 88.50 deg.

6 lambda= 50.00 r= 50.04 phi= 88.78 deg.

7 lambda= 60.00 r= 60.05 phi= 88.98 deg.

8 lambda= 70.00 r= 70.05 phi= 89.11 deg.

9 lambda= 80.00 r= 80.06 phi= 89.21 deg.

Section 5.9 From metric to curvature 189

Page 190: General Relativity

10 lambda= 90.00 r= 90.06 phi= 89.29 deg.

We can see that φ → 90 deg. as λ → ∞, which makes sense,because the geodesic is a straight line parallel to the y axis.

A less trivial use of the technique is demonstrated on page 222,where we calculate the deflection of light rays in a gravitational field,one of the classic observational tests of general relativity.

5.9.3 The Riemann tensor in terms of the Christoffel symbols

The covariant derivative of a vector can be interpreted as the rateof change of a vector in a certain direction, relative to the result ofparallel-transporting the original vector in the same direction. Wecan therefore see that the definition of the Riemann curvature tensoron page 168 is a measure of the failure of covariant derivatives tocommute:

(∇a∇b −∇b∇a)Ac = AdRcdab

A tedious calculation now gives R in terms of the Γs:

Rabcd = ∂cΓadb − ∂dΓacb + ΓaceΓ

edb − ΓadeΓ

ecb

This is given as another example later in Brewin’s manual for apply-ing Cadabra to general relativity.14 (Brewin writes the upper indexin the second slot of R.)

5.9.4 Some general ideas about gauge

Let’s step back now for a moment and try to gain some physi-cal insight by looking at the features that the electromagnetic andrelativistic gauge transformations have in common. We have thefollowing analogies:

14http://arxiv.org/abs/0903.2085

190 Chapter 5 Curvature

Page 191: General Relativity

a / The Aharonov-Bohm ef-fect. An electron enters a beamsplitter at P, and is sent out intwo different directions. The twoparts of the wave are reflected sothat they reunite at Q. The arrowsrepresent the vector potential A.The observable magnetic fieldB is zero everywhere outsidethe solenoid, and yet the inter-ference observed at Q dependson whether the field is turnedon. See page 137 for furtherdiscussion of the A and B fieldsof a solenoid.

electromagnetism differentialgeometry

global symmetry A constant phaseshift α has no ob-servable effects.

Adding a con-stant onto acoordinate has noobservable effects.

local symmetry A phase shift αthat varies frompoint to point hasno observable ef-fects.

An arbitrary co-ordinate transfor-mation has no ob-servable effects.

The gauge is de-scribed by . . . α gµν

. . . and differentia-tion of this gives thegauge field. . . Ab Γcab

A second differen-tiation gives thedirectly observablefield(s) . . . E and B Rcdab

The interesting thing here is that the directly observable fieldsdo not carry all of the necessary information, but the gauge fields arenot directly observable. In electromagnetism, we can see this fromthe Aharonov-Bohm effect, shown in figure a.15 The solenoid hasB = 0 externally, and the electron beams only ever move throughthe external region, so they never experience any magnetic field. Ex-periments show, however, that turning the solenoid on and off doeschange the interference between the two beams. This is because thevector potential does not vanish outside the solenoid, and as we’veseen on page 137, the phase of the beams varies according to thepath integral of the Ab. We are therefore left with an uncomfort-able, but unavoidable, situation. The concept of a field is supposedto eliminate the need for instantaneous action at a distance, whichis forbidden by relativity; that is, (1) we want our fields to have onlylocal effects. On the other hand, (2) we would like our fields to bedirectly observable quantities. We cannot have both 1 and 2. Thegauge field satisfies 1 but not 2, and the electromagnetic fields give2 but not 1.

15We describe the effect here in terms of an idealized, impractical experiment.For the actual empirical status of the Aharonov-Bohm effect, see Batelaan andTonomura, Physics Today 62 (2009) 38.

Section 5.9 From metric to curvature 191

Page 192: General Relativity

b / The cone has zero intrin-sic curvature everywhere exceptat its tip. An observer who nevervisits the tip can neverthelessdetect its existence, becauseparallel transport around a paththat encloses the tip causes avector to change its direction.

Figure b shows an analog of the Aharonov-Bohm experiment indifferential geometry. Everywhere but at the tip, the cone has zerocurvature, as we can see by cutting it and laying it out flat. But evenan observer who never visits the tightly curved region at the tip candetect its existence, because parallel-transporting a vector arounda closed loop can change the vector’s direction, provided that theloop surrounds the tip.

In the electromagnetic example, integrating A around a closedloop reveals, via Stokes’ theorem, the existence of a magnetic fluxthrough the loop, even though the magnetic field is zero at everylocation where A has to be sampled. In the relativistic example,integrating Γ around a closed loop shows that there is curvatureinside the loop, even though the curvature is zero at all the placeswhere Γ has to be sampled.

The fact that Γ is a gauge field, and therefore not locally ob-servable, is simply a fancy way of expressing the ideas introducedon pp. 176 and 177, that due to the equivalence principle, the gravi-tational field in general relativity is not locally observable. This non-observability is local because the equivalence principle is a statementabout local Lorentz frames. The example in figure b is non-local.

Geodetic effect and structure of the source Example: 11. In section 5.5.1 on page 170, we estimated the geodetic effecton Gravity Probe B and found a result that was only off by a factorof 3π. The mathematically pure form of the 3π suggests that thegeodetic effect is insensitive to the distribution of mass inside theearth. Why should this be so?

. The change in a vector upon parallel transporting it around aclosed loop can be expressed in terms of either (1) the area inte-gral of the curvature within the loop or (2) the line integral of theChristoffel symbol (essentially the gravitational field) on the loopitself. Although I expressed the estimate as 1, it would have beenequally valid to use 2. By Newton’s shell theorem, the gravita-tional field is not sensitive to anything about its mass distributionother than its near spherical symmetry. The earth spins, and thisdoes affect the stress-energy tensor, but since the velocity withwhich it spins is everywhere much smaller than c, the resultingeffect, called frame dragging, is much smaller.

192 Chapter 5 Curvature

Page 193: General Relativity

a / In Asteroids, space “wrapsaround.”

5.10 ManifoldsThis section can be omitted on a first reading.

5.10.1 Why we need manifolds

General relativity doesn’t assume a predefined background met-ric, and this creates a chicken-and-egg problem. We want to definea metric on some space, but how do we even specify the set of pointsthat make up that space? The usual way to define a set of pointswould be by their coordinates. For example, in two dimensions wecould define the space as the set of all ordered pairs of real numbers(x, y). But this doesn’t work in general relativity, because space isnot guaranteed to have this structure. For example, in the classic1979 computer game Asteroids, space “wraps around,” so that ifyour spaceship flies off the right edge of the screen, it reappearson the left, and similarly at the top and bottom. Even before weimpose a metric on this space, it has topological properties that dif-fer from those of the Euclidean plane. By “topological” we meanproperties that are preserved if the space is thought of as a sheetof rubber that can be stretched in any way, but not cut or gluedback together. Topologically, the space in Asteroids is equivalent toa torus (surface of a doughnut), but not to the Euclidean plane.

b / A coffee cup is topologicallyequivalent to a torus.

Another useful example is the surface of a sphere. In example10 on page 188, we calculated Γθφφ. A similar calculation gives

Γφθφ = cot θ/R. Now consider what happens as we drive our dogslednorth along the line of longitude φ = 0, cross the north pole atθ = 0, and continue along the same geodesic. As we cross the pole,our longitude changes discontinuously from 0 to π. Consulting thegeodesic equation, we see that this happens because Γφθφ blows upat θ = 0. Of course nothing really special happens at the pole.The bad behavior isn’t the fault of the sphere, it’s the fault of the(θ,φ) coordinates we’ve chosen, that happen to misbehave at thepole. Unfortunately, it is impossible to define a pair of coordinateson a two-sphere without having them misbehave somewhere. (Thisfollows from Brouwer’s famous 1912 “Hairy ball theorem,” whichstates that it is impossible to comb the hair on a sphere withoutcreating a cowlick somewhere.)

Section 5.10 Manifolds 193

Page 194: General Relativity

c / General relativity doesn’tassume a predefined backgroundmetric. Therefore all we canreally know before we calculateanything is that we’re workingon a manifold, without a metricimposed on it.

5.10.2 Topological definition of a manifold

This motivates us to try to define a “bare-bones” geometricalspace in which there is no predefined metric or even any predefinedset of coordinates.

There is a general notion of a topological space, which is toogeneral for our purposes. In such a space, the only structure we areguaranteed is that certain sets are defined as “open,” in the samesense that an interval like 0 < x < 1 is called “open.” Any pointin an open set can be moved around without leaving the set. Anopen set is essentially a set without a boundary, for in a set like0 ≤ x ≤ 1, the boundary points 0 and 1 can only be moved in onedirection without taking them outside.

A topological space is too general for us because it can includespaces like fractals, infinite-dimensional spaces, and spaces that havedifferent numbers of dimensions in different regions. It is neverthe-less useful to recognize certain concepts that can be defined usingonly the generic apparatus of a topological space, so that we knowthey do not depend in any way on the presence of a metric. Anopen set surrounding a point is called a neighborhood of that point.In a topological space we have a notion of getting arbitrarily closeto a certain point, which means to take smaller and smaller neigh-borhoods, each of which is a subset of the last. But since there isno metric, we do not have any concept of comparing distances ofdistant points, e.g., that P is closer to Q than R is to S. A con-tinuous function is a purely topological idea; a continuous functionis one such that for any open subset U of its range, the set V ofpoints in its domain that are mapped to points in U is also open.Although some definitions of continuous functions talk about realnumbers like ε and δ, the notion of continuity doesn’t depend onthe existence of any structure such as the real number system. Ahomeomorphism is a function that is invertible and continuous inboth directions. Homeomorphisms formalize the informal notion of“rubber-sheet geometry without cutting or gluing.” If a homeomor-phism exists between two topological spaces, we say that they arehomeomorphic; they have the same structure and are in some sensethe same space.

The more specific type of topological space we want is called amanifold. Without attempting any high level of mathematical rigor,we define an n-dimensional manifold M according to the followinginformal principles:16

16For those with knowledge of topology, these can be formalized a little more:we want a completely normal, second-countable, locally connected topologicalspace that has Lebesgue covering dimension n, is a homogeneous space underits own homeomorphism group, and is a complete uniform space. I don’t knowwhether this is sufficient to characterize a manifold completely, but it suffices torule out all the counterexamples of which I know.

194 Chapter 5 Curvature

Page 195: General Relativity

M1 Dimension: M’s dimension is n.

M2 Homogeneity: No point has any property that distinguishes itfrom any other point.

M3 Completeness: M is complete, in the sense that specifying anarbitrarily small neighborhood gives a unique definition of apoint.

Lines Example: 12The set of all real numbers is a 1-manifold. Similarly, any line withthe properties specified in Euclid’s Elements is a 1-manifold. Allsuch lines are homeomorphic to one another, and we can there-fore speak of “the line.”

A circle Example: 13A circle (not including its interior) is a 1-manifold, and it is nothomeomorphic to the line. To see this, note that deleting a pointfrom a circle leaves it in one connected piece, but deleting a pointfrom a line makes two. Here we use the fact that a homeomor-phism is guaranteed to preserve “rubber-sheet” properties like thenumber of pieces.

No changes of dimension Example: 14A “lollipop” formed by gluing an open 2-circle (i.e., a circle notincluding its boundary) to an open line segment is not a manifold,because there is no n for which it satisfies M1.

It also violates M2, because points in this set fall into three distinctclasses: classes that live in 2-dimensional neighborhoods, thosethat live in 1-dimensional neighborhoods, and the point where theline segment intersects the boundary of the circle.

No manifolds made from the rational numbers Example: 15The rational numbers are not a manifold, because specifying anarbitrarily small neighborhood around

√2 excludes every rational

number, violating M3.

Similarly, the rational plane defined by rational-number coordinatepairs (x , y ) is not a 2-manifold. It’s good that we’ve excludedthis space, because it has the unphysical property that curvescan cross without having a point in common. For example, thecurve y = x2 crosses from one side of the line y = 2 to the other,but never intersects it. This is physically undesirable because itdoesn’t match up with what we have in mind when we talk aboutcollisions between particles as intersections of their world-lines,or when we say that electric field lines aren’t supposed to inter-sect.

No boundary Example: 16The open half-plane y > 0 in the Cartesian plane is a 2-manifold.The closed half-plane y ≥ 0 is not, because it violates M2; the

Section 5.10 Manifolds 195

Page 196: General Relativity

boundary points have different properties than the ones on theinterior.

Disconnected manifolds Example: 17Two nonintersecting lines are a 1-manifold. Physically, discon-nected manifolds of this type would represent a universe in whichan observer in one region would never be able to find out aboutthe existence of the other region.

No bad glue jobs Example: 18Hold your hands like you’re pretending you know karate, and thenuse one hand to karate-chop the other. Suppose we want to jointwo open half-planes in this way. As long as they’re separate,then we have a perfectly legitimate disconnected manifold. But ifwe want to join them by adding the point P where their boundariescoincide, then we violate M2, because this point has special prop-erties not possessed by any others. An example of such a prop-erty is that there exist points Q and R such that every continuouscurve joining them passes through P. (Cf. problem 5, p. 340.)

5.10.3 Local-coordinate definition of a manifold

An alternative way of characterizing an n-manifold is as an ob-ject that can locally be described by n real coordinates. That is,any sufficiently small neighborhood is homeomorphic to an open setin the space of real-valued n-tuples of the form (x1,x2, . . . ,xn). Forexample, a closed half-plane is not a 2-manifold because no neigh-borhood of a point on its edge is homeomorphic to any open set inthe Cartesian plane.

Self-check: Verify that this alternative definition of a manifoldgives the same answers as M1-M3 in all the examples above.

Roughly speaking, the equivalence of the two definitions occursbecause we’re using n real numbers as coordinates for the dimensionsspecified by M1, and the real numbers are the unique number systemthat has the usual arithmetic operations, is ordered, and is completein the sense of M3.

As usual when we say that something is “local,” a question arisesas to how local is local enough. The language in the definitionabove about “any sufficiently small neighborhood” is logically akinto the Weierstrass ε-δ approach: if Alice gives Bob a manifold anda point on a manifold, Bob can always find some neighborhoodaround that point that is compatible with coordinates, but it maybe an extremely small neighborhood.

Coordinates on a circle Example: 19If we are to define coordinates on a circle, they should be contin-uous functions. The angle φ about the center therefore doesn’tquite work as a global coordinate, because it has a discontinuitywhere φ = 0 is identified with φ = 2π. We can get around this by

196 Chapter 5 Curvature

Page 197: General Relativity

d / Example 22.

using different coordinates in different regions, as is guaranteedto be possible by the local-coordinate definition of a manifold. Forexample, we can cover the circle with two open sets, one on theleft and one on the right. The left one, L, is defined by deletingonly the φ = 0 point from the circle. The right one, R, is definedby deleting only the one at φ = π. On L, we use coordinates0 < φL < 2π, which are always a continuous function from L tothe real numbers. On R, we use −π < φR < π.

In examples like this one, the sets like L and R are referred toas patches. We require that the coordinate maps on the differentpatches match up smoothly. In this example, we would like allfour of the following functions, known as transition maps, to becontinuous:

• φL as a function of φR on the domain 0 < φR < π

• φL as a function of φR on the domain −π < φR < 0

• φR as a function of φL on the domain 0 < φL < π

• φR as a function of φL on the domain π < φL < 2π

The local-coordinate definition only states that a manifold canbe coordinatized. That is, the functions that define the coordinatemaps are not part of the definition of the manifold, so, for example,if two people define coordinates patches on the unit circle in differentways, they are still talking about exactly the same manifold.

Open line segment homeomorphic to a line Example: 20Let L be an open line segment, such as the open interval (0, 1). Lis homeomorphic to a line, because we can map (0, 1) to the realline through the function f (x) = tan(πx − π/2).

Closed line segment not homeomorphic to a line Example: 21A closed line segment (which is not a manifold) is not homeomor-phic to a line. If we map it to a line, then the endpoints have togo to two special points A and B. There is then no way for themapping to visit the points exterior to the interval [A, B] withoutvisiting A and B more than once.

Open line segment not homeomorphic to the interior of a circleExample: 22If the interior of a circle could be mapped by a homeomorphism fto an open line segment, then consider what would happen if wetook a closed curve lying inside the circle and found its image. Bythe intermediate value theorem, f would not be one-to-one, butthis is a contradiction since f was assumed to be a homeomor-phism. This is an example of a more general fact that homeomor-phism preserves the dimensionality of a manifold.

Section 5.10 Manifolds 197

Page 198: General Relativity

e / “Bob, your manifold isn’tsmooth!”

5.10.4 Differentiable manifolds

A differentiable manifold means a manifold with enough extrastructure so you can do calculus on it, but this extra structuredoesn’t necessarily include anything as fancy as a metric. As aconcrete example, suppose that in a 1 + 1-dimensional Galilean uni-verse, observer Alice constructs a global coordinate system (t,x).Her spacetime is clearly a manifold, based on the local-coordinatedefinition, and this is true even though Galilean spacetime doesn’thave a metric. Meanwhile, observer Bob constructs his own coordi-nate system (t′,x′). But something disturbing happens when Aliceconstructs the transition map from Bob’s coordinate grid to hers.As shown in figure e, Bob’s grid has a kink in it. “Bob,” says Alice,“something is wrong with your coordinate system. I hypothesizethat at a certain time, which we can call t = 0, an invisible gi-ant struck your body with an invisible croquet mallet and suddenlychanged your state of motion.” “No way, Alice,” Bob answers. “Ididn’t feel anything happen at t = 0. I think you’re the one whogot whacked.”

By a differentiable manifold we mean one in which this sort ofcontroversy never happens. The manifold comes with an a collectionof local coordinate systems, called charts, and wherever these chartsoverlap, the transition map is differentiable. Every coordinate is adifferentiable function of every other coordinate. In fact, we will as-sume for convenience that not just the first derivative but derivativesof all orders are defined. This makes our manifold not just a dif-ferentiable manifold but a smooth manifold. This definition soundscoordinate-dependent, but it isn’t. Our collection of charts (calledan atlas) can contain infinitely many possible coordinate systems;we can even specify that it contains all possible coordinate systemsthat could be obtained from one another by any diffeomorphism.

5.10.5 The tangent space

We now formalize the intuitive notion of a tangent vector (p. 88),following Nowik and Katz.17 Let M be an n-dimensional smoothmanifold, so that locally it looks like Euclidean space, describableby real-number coordinates x, y, . . . We now enhance M to forma new topological space, in which the coordinates can include notonly real numbers, but numbers that differ infinitesimally from reals,as outlined in example 2 on p. 93. From now on when we saythings like “the manifold,” we mean this enhanced version. Fixsome infinitesimal number ε for once and for all, and define thenotation x = O(ε) to mean that x/ε is not infinite.18

17“Differential geometry via infinitesimal displacements,” arxiv.org/abs/

1405.098418As usual in this type of “big O” notation, we abuse the equals sign somewhat.

In particular, the equals sign here is not symmetric. For more detail, see theWikipedia article “Big O notation.”

198 Chapter 5 Curvature

Page 199: General Relativity

f / 1. A tangent vector canbe thought of as an infinitesimaldisplacement. 2. For a sphereembedded in three-space, thespace of tangent vectors is visu-alized as a plane tangent to thesphere at a certain point.

Points in the manifold are considered close if the Euclidean dis-tance between them in coordinate space is O(ε). This definitionsounds coordinate-dependent, but isn’t, and sounds like it’s assum-ing an actual Euclidean metric, but isn’t.19 Define a prevector atpoint P as a pair (P,Q) of points that are close, figure f/1. De-fine prevectors to be equivalent if the difference between them isinfinitesimal even compared to ε.

Definition: A tangent vector at point P is the set of all prevectorsat P that are equivalent to a particular prevector at P.

The tangent space TP is the set of all tangent vectors at P. Thetangent space has the structure of a vector space over the realssimply by using the coordinate differences to define the vector-spaceoperations, just as we would do if (P ,Q) meant an arrow extendingfrom P to Q, as in freshman physics.

In practice, we don’t really care about the details of the con-struction of the tangent space, and different people don’t even haveto use the same construction. All we care about is that the tangentspace has a certain structure. In particular, it has n dimensions, aswe would expect intuitively. Since we’re going to forget the detailsof the construction, it doesn’t matter that we’ve made all tangentvectors infinitesimal by definition. The vector space’s internal struc-ture only has to do with how big the vectors are compared to eachother. (If we wanted to, we could scale up all the tangent vectorsby a factor of 1/ε.) This justifies the visualization in figure f/2.

Actually it’s not quite true that we only care about the tangentspace’s internal structure, because then we could have avoided thefancy definition and simply used the ordinary vector space consistingof n-tuples of real numbers. The fancy definition is needed becauseit ties the tangent space in a natural way to the structure of themanifold at a particular point. Therefore it will allow us (1) to defineparallel transport, which brings a vector from one tangent space toanother, and (2) to define components of vectors in a particularcoordinate system.

For an alternative definition of the tangent space, see ch. 2 ofCarroll.20 Briefly, this involves taking a tangent vector to be some-thing that behaves like a directional derivative. In particular, apartial derivative with respect to a coordinate such as ∂/∂x qual-ifies as a tangent vector, which we think of as pointing in the xdirection. The set of such coordinate derivatives forms a basis forthe tangent space and gives a convenient way of notating tangentvectors. We will find this notation convenient in section 7.1, p. 243.

19An equivalent and manifestly coordinate-independent definition is that forevery smooth real function in a neighborhood of the points, the function differsat these points by an amount that is O(ε).

20Lecture Notes on General Relativity, http://ned.ipac.caltech.edu/

level5/March01/Carroll3/Carroll_contents.html.

Section 5.10 Manifolds 199

Page 200: General Relativity

Problems1 Example 6 on p. 167 discussed some examples in electrostaticswhere the charge density on the surface of a conductor depends onthe Gaussian curvature, when the curvature is positive. In the caseof a knife-edge formed by two half-planes at an exterior angle β > π,there is a standard result21 that the charge density at the edge blowsup to infinity as Rπ/β−1. Does this match up with the hypothesisthat Gaussian curvature determines the charge density?

. Solution, p. 383

2 Show, as claimed on page 188, that for polar coordinates ina Euclidean plane, Γrφφ = −r and Γφrφ = 1/r.

3 Partial derivatives commute with partial derivatives. Co-variant derivatives don’t commute with covariant derivatives. Docovariant derivatives commute with partial derivatives?

4 Show that if the differential equation for geodesics on page178 is satisfied for one affine parameter λ, then it is also satisfied forany other affine parameter λ′ = aλ+ b, where a and b are constants.

5 Equation [2] on page 110 gives a flat-spacetime metric inrotating polar coordinates. (a) Verify by explicit computation thatthis metric represents a flat spacetime. (b) Reexpress the metric inrotating Cartesian coordinates, and check your answer by verifyingthat the Riemann tensor vanishes.

6 The purpose of this problem is to explore the difficultiesinherent in finding anything in general relativity that represents auniform gravitational field g. In example 12 on page 59, we found,based on elementary arguments about the equivalence principle andphotons in elevators, that gravitational time dilation must be givenby eΦ, where Φ = gz is the gravitational potential. This results ina metric

[1] ds2 = e2gz dt2 − dz2 .

On the other hand, example 19 on page 140 derived the metric

[2] ds2 = (1 + gz)2 dt2 − dz2 .

by transforming from a Lorentz frame to a frame whose origin moveswith constant proper acceleration g. (These are known as Rindlercoordinates.) Prove the following facts. None of the calculationsare so complex as to require symbolic math software, so you mightwant to perform them by hand first, and then check yourself on acomputer.(a) The metrics [1] and [2] are approximately consistent with oneanother for z near 0.

21Jackson, Classical Electrodynamics

200 Chapter 5 Curvature

Page 201: General Relativity

(b) When a test particle is released from rest in either of these met-rics, its initial proper acceleration is g.(c) The two metrics are not exactly equivalent to one another underany change of coordinates.(d) Both spacetimes are uniform in the sense that the curvature isconstant. (In both cases, this can be proved without an explicitcomputation of the Riemann tensor.)

Remark: The incompatibility between [1] and [2] can be interpreted as showingthat general relativity does not admit any spacetime that has all the globalproperties we would like for a uniform gravitational field. This is related to Bell’sspaceship paradox (example 16, p. 66). Some further properties of the metric [1]are analyzed in subsection 7.5 on page 260. . Solution, p. 383

7 In a topological space T, the complement of a subset U isdefined as the set of all points in T that are not members of U. Aset whose complement is open is referred to as closed. On the realline, give (a) one example of a closed set and (b) one example ofa set that is neither open nor closed. (c) Give an example of aninequality that defines an open set on the rational number line, buta closed set on the real line.

8 Prove that a double cone (e.g., the surface r = z in cylindricalcoordinates) is not a manifold. . Solution, p. 384

9 Prove that a torus is a manifold. . Solution, p. 384

10 Prove that a sphere is not homeomorphic to a torus.. Solution, p. 384

11 Curvature on a Riemannian space in 2 dimensions is atopic that goes back to Gauss and has a simple interpretation: theonly intrinsic measure of curvature is a single number, the Gaussiancurvature. What about 1+1 dimensions? The simplest metrics Ican think of are of the form ds2 = dt2 − f(t)dx2. (Something likeds2 = f(t)dt2−dx2 is obviously equivalent to Minkowski space undera change of coordinates, while ds2 = f(x)dt2 − dx2 is the same asthe original example except that we’ve swapped x and t.) Playingaround with simple examples, one stumbles across the seeminglymysterious fact that the metric ds2 = dt2− t2dx2 is flat, while ds2 =dt2 − tdx2 is not. This seems to require some simple explanation.Consider the metric ds2 = dt2 − tpdx2.(a) Calculate the Christoffel symbols by hand.(b) Use a computer algebra system such as Maxima to show thatthe Ricci tensor vanishes only when p = 2.

Remark: The explanation is that in the case p = 2, the x coordinate is expandingin proportion to the t coordinate. This can be interpreted as a situation in whichour length scale is defined by a lattice of test particles that expands inertially.Since their motion is inertial, no gravitational fields are required in order toexplain the observed change in the length scale; cf. the Milne universe, p. 305.

. Solution, p. 384

Problems 201

Page 202: General Relativity

202 Chapter 5 Curvature

Page 203: General Relativity

a / A Swiss commemorativecoin shows the vacuum fieldequation.

Chapter 6

Vacuum solutions

In this chapter we investigate general relativity in regions of spacethat have no matter to act as sources of the gravitational field.We will not, however, limit ourselves to calculating spacetimes incases in which the entire universe has no matter. For example,we will be able to calculate general-relativistic effects in the regionsurrounding the earth, including a full calculation of the geodeticeffect, which was estimated in section 5.5.1 only to within an orderof magnitude. We can have sources, but we just won’t describe themetric in the regions where the sources exist, e.g., inside the earth.The advantage of accepting this limitation is that in regions of emptyspace, we don’t have to worry about the details of the stress-energytensor or how it relates to curvature. As should be plausible basedon the physical motivation given in section 5.1, page 160, the fieldequations in a vacuum are simply Rab = 0.

6.1 Event horizonsOne seemingly trivial way to generate solutions to the field equationsin vacuum is simply to start with a flat Lorentzian spacetime and doa change of coordinates. This might seem pointless, since it wouldsimply give a new description (and probably a less convenient anddescriptive one) of the same old, boring, flat spacetime. It turnsout, however, that some very interesting things can happen whenwe do this.

6.1.1 The event horizon of an accelerated observer

Consider the uniformly accelerated observer described in exam-ples 4 on page 126 and 19 on page 140. Recalling these earlier results,we have for the ship’s equation of motion in an inertial frame

x =1

a

(√1 + a2t2 − 1

),

and for the metric in the ship’s frame

g′t′t′ = (1 + ax′)2

g′x′x′ = −1 .

Since this metric was derived by a change of coordinates from a flat-space metric, and the Ricci curvature is an intrinsic property, we

203

Page 204: General Relativity

a / A spaceship (curved world-line) moves with an accelerationperceived as constant by itspassengers. The photon (straightworld-line) comes closer andcloser to the ship, but will neverquite catch up.

expect that this one also has zero Ricci curvature. This is straight-forward to verify. The nonvanishing Christoffel symbols are

Γt′x′t′ =

a

1 + ax′and Γx

′t′t′ = a(1 + ax′) .

The only elements of the Riemann tensor that look like they mightbe nonzero are Rt

′t′x′x′ and Rx

′t′x′t′ , but both of these in fact vanish.

Self-check: Verify these facts.

This seemingly routine exercise now leads us into some very in-teresting territory. Way back on page 12, we conjectured that not allevents could be time-ordered: that is, that there might exists eventsin spacetime 1 and 2 such that 1 cannot cause 2, but neither can 2cause 1. We now have enough mathematical tools at our disposalto see that this is indeed the case.

We observe that x(t) approaches the asymptote x = t−1/a. Thisasymptote has a slope of 1, so it can be interpreted as the world-lineof a photon that chases the ship but never quite catches up to it.Any event to the left of this line can never have a causal relationshipwith any event on the ship’s world-line. Spacetime, as seen by anobserver on the ship, has been divided by a curtain into two causallydisconnected parts. This boundary is called an event horizon. Itsexistence is relative to the world-line of a particular observer. Anobserver who is not accelerating along with the ship does consideran event horizon to exist. Although this particular example of theindefinitely accelerating spaceship has some physically implausiblefeatures (e.g., the ship would have to run out of fuel someday), eventhorizons are real things. In particular, we will see in section 6.3.2that black holes have event horizons.

Interpreting everything in the (t′,x′) coordinates tied to the ship,the metric’s component g′t′t′ vanishes at x′ = −1/a. An observeraboard the ship reasons as follows. If I start out with a head-startof 1/a relative to some event, then the timelike part of the metric atthat event vanishes. If the event marks the emission of a materialparticle, then there is no possible way for that particle’s world-lineto have ds2 > 0. If I were to detect a particle emitted at that event,it would violate the laws of physics, since material particles musthave ds2 > 0, so I conclude that I will never observe such a particle.Since all of this applies to any material particle, regardless of itsmass m, it must also apply in the limit m→ 0, i.e., to photons andother massless particles. Therefore I can never receive a particleemitted from this event, and in fact it appears that there is no wayfor that event, or any other event behind the event horizon, to haveany effect on me. In my frame of reference, it appears that lightcones near the horizon are tipped over so far that their future light-cones lie entirely in the direction away from me.

We’ve already seen in example 15 on page 65 that a naive New-tonian argument suggests the existence of black holes; if a body is

204 Chapter 6 Vacuum Solutions

Page 205: General Relativity

sufficiently compact, light cannot escape from it. In a relativistictreatment, this should be described as an event horizon.

6.1.2 Information paradox

The existence of event horizons in general relativity has deepimplications, and in particular it helps to explain why it is so diffi-cult to reconcile general relativity with quantum mechanics, despitenearly a century of valiant attempts. Quantum mechanics has aproperty called unitarity. Mathematically, this says that if the stateof a quantum mechanical system is given, at a certain time, in theform of a vector, then its state at some point in the future can bepredicted by applying a unitary matrix to that vector. A unitarymatrix is the generalization to complex numbers of the ordinaryconcept of an orthogonal matrix, and essentially it just represents achange of basis, in which the basis vectors have unit length and areperpendicular to one another.

To see what this means physically, consider the following nonex-amples. The matrix (

1 00 0

)is not unitary, because its rows and columns are not orthogonal vec-tors with unit lengths. If this matrix represented the time-evolutionof a quantum mechanical system, then its meaning would be thatany particle in state number 1 would be left alone, but any particlein state 2 would disappear. Any information carried by particles instate 2 is lost forever and can never be retrieved. This also violatesthe time-reversal symmetry of quantum mechanics.

Another nonunitary matrix is:(1 0

0√

2

)Here, any particle in state 2 is increased in amplitude by a factor of√

2, meaning that it is doubled in probability. That is, the particleis cloned. This is the opposite problem compared to the one posedby the first matrix, and it is equally problematic in terms of time-reversal symmetry and conservation of information. Actually, if wecould clone a particle in this way, it would violate the Heisenberguncertainty principle. We could make two copies of the particle,and then measure the position of one copy and the momentum ofthe other, each with unlimited precision. This would violate theuncertainty principle, so we believe that it cannot be done. This isknown as the no-cloning theorem.1

The existence of event horizons in general relativity violates uni-tarity, because it allows information to be destroyed. If a particle isthrown behind an event horizon, it can never be retrieved.

1Ahn et al. have shown that the no-cloning theorem is violated in the presenceof closed timelike curves: arxiv.org/abs/1008.0221v1

Section 6.1 Event horizons 205

Page 206: General Relativity

b / Bill Unruh (1945-).

6.1.3 Radiation from event horizons

In interesting twist on the situation was introduced by Bill Un-ruh in 1976. Observer B aboard the accelerating spaceship believesin the equivalence principle, so she knows that the local proper-ties of space at the event horizon would seem entirely normal andLorentzian to a local observer A. (The same applies to a black hole’shorizon.) In particular, B knows that A would see pairs of virtualparticles being spontaneously created and destroyed in the local vac-uum. This is simply a manifestation of the time-energy form of theuncertainty principle, ∆E∆t . h. Now suppose that a pair of par-ticles is created, but one is created in front of the horizon and onebehind it. To A these are virtual particles that will have to be an-nihilated within the time ∆t, but according to B the one created infront of the horizon will eventually catch up with the spaceship, andcan be observed there, although it will be red-shifted. The amountof redshift is given by

√g′t′t′ =

√(1 + ax′)2. Say the pair is created

right near the horizon, at x′ = −1/a. By the uncertainty princi-ple, each of the two particles is spread out over a region of spaceof size ∆x′. Since these are photons, which travel at the speed oflight, the uncertainty in position is essentially the same as the un-certainty in time. The forward-going photon’s redshift comes outto be a∆x′ = a∆t′, which by the uncertainty principle should be atleast ha/E, so that when the photon is observed by B, its energy isE(ha/E) = ha.

Now B sees a uniform background of photons, with energies ofaround ha, being emitted randomly from the horizon. They arebeing emitted from empty space, so it seems plausible to believethat they don’t encode any information at all; they are completelyrandom. A surface emitting a completely random (i.e., maximum-entropy) hail of photons is a black-body radiator, so we expect thatthe photons will have a black-body spectrum, with its peak at anenergy of about ha. This peak is related to the temperature ofthe black body by E ∼ kT , where k is Boltzmann’s constant. Weconclude that the horizon acts like a black-body radiator with atemperature T ∼ ha/k. The more careful treatment by Unruh showsthat the exact relation is T = ha/4π2k, or ha/4π2kc in SI units.

An important observation here is that not only do different ob-servers disagree about the number of quanta that are present (whichis true in the case of ordinary Doppler shifts), but about the numberof quanta in the vacuum as well. B sees photons that according toA do not exist.

Let’s consider some real-world examples of large accelerations:

206 Chapter 6 Vacuum Solutions

Page 207: General Relativity

a / The field equations of generalrelativity are nonlinear.

acceleration(m/s2)

temperature ofhorizon (K)

bullet fired from a gun 103 10−17

electron in a CRT 107 10−13

plasmas produced by intenselaser pulses

1021 10

proton in a helium nucleus 1027 108

To detect Unruh radiation experimentally, we would ideally like tobe able to accelerate a detector and let it detect the radiation. Thisis clearly impractical. The third line shows that it is possible toimpart very large linear accelerations to subatomic particles, butthen one can only hope to infer the effect of the Unruh radiationindirectly by its effect on the particles. As shown on the final line,examples of extremely large nonlinear accelerations are not hard tofind, but the interpretation of Unruh radiation for nonlinear motionis unclear. A summary of the prospects for direct experimental de-tection of this effect is given by Rosu.2 This type of experiment isclearly extremely difficult, but it is one of the few ways in which onecould hope to get direct empirical insight, under controlled condi-tions, into the interface between gravity and quantum mechanics.

6.2 The Schwarzschild metricWe now set ourselves the goal of finding the metric describing thestatic spacetime outside a spherically symmetric, nonrotating, bodyof mass m. This problem was first solved by Karl Schwarzschildin 1915.3 One byproduct of finding this metric will be the abilityto calculate the geodetic effect exactly, but it will have more far-reaching consequences, including the existence of black holes.

The problem we are solving is similar to calculating the spher-ically symmetric solution to Gauss’s law in a vacuum. The solu-tion to the electrical problem is of the form r/r2, with an arbitraryconstant of proportionality that turns out to be proportional tothe charge creating the field. One big difference, however, is thatwhereas Gauss’s law is linear, the equation Rab = 0 is highly non-linear, so that the solution cannot simply be scaled up and down inproportion to m.

The reason for this nonlinearity is fundamental to general rela-tivity. For example, when the earth condensed out of the primordialsolar nebula, large amounts of heat were produced, and this energywas then gradually radiated into outer space, decreasing the totalmass of the earth. If we pretend, as in figure a, that this process

2http://xxx.lanl.gov/abs/gr-qc/96050323“On the gravitational field of a point mass according to Einstein’s the-

ory,” Sitzungsberichte der K oniglich Preussischen Akademie der Wissenschaften1 (1916) 189. An English translation is available at http://arxiv.org/abs/

physics/9905030v1.

Section 6.2 The Schwarzschild metric 207

Page 208: General Relativity

involved the merging of only two bodies, each with mass m, thenthe net result was essentially to take separated masses m and m atrest, and bring them close together to form close-neighbor massesm and m, again at rest. The amount of energy radiated away wasproportional to m2, so the gravitational mass of the combined sys-tem has been reduced from 2m to 2m−(. . .)m2, where . . . is roughlyG/c2r. There is a nonlinear dependence of the gravitational field onthe masses.

Self-check: The signature of a metric is defined as the list ofpositive and negative signs that occur when it is diagonalized.4 Theequivalence principle requires that the signature be + − −− (or−+ ++, depending on the choice of sign conventions). Verify thatany constant metric (including a metric with the “wrong” signature,e.g., 2+2 dimensions rather than 3+1) is a solution to the Einsteinfield equation in vacuum.

The correspondence principle tells us that our result must havea Newtonian limit, but the only variables involved are m and r, sothis limit must be the one in which r/m is large. Large comparedto what? There is nothing else available with which to compare,so it can only be large compared to some expression composed ofthe unitless constants G and c. We have already chosen units suchthat c = 1, and we will now set G = 1 as well. Mass and distanceare now comparable, with the conversion factor being G/c2 = 7 ×10−28 m/kg, or about a mile per solar mass. Since the earth’s radiusis thousands of times more than a mile, and its mass hundreds ofthousands of times less than the sun’s, its r/m is very large, andthe Newtonian approximation is good enough for all but the mostprecise applications, such as the GPS network or the Gravity ProbeB experiment.

6.2.1 The zero-mass case

First let’s demonstrate the trivial solution with flat spacetime.In spherical coordinates, we have

ds2 = dt2 − dr2 − r2 dθ2 − r2 sin2 θ dφ2 .

The nonvanishing Christoffel symbols (ignoring swaps of the lowerindices) are:

Γθrθ =1

r

Γφrφ =1

rΓrθθ = −rΓrφφ = −r sin2 θ

Γθφφ = − sin θ cos θ

Γφθφ = cot θ

4See p. 236 for a different but closely related use of the same term.

208 Chapter 6 Vacuum Solutions

Page 209: General Relativity

Self-check: If we’d been using the (− + ++) metric instead of(+−−−), what would have been the effect on the Christoffel sym-bols? What if we’d expressed the metric in different units, rescalingall the coordinates by a factor k?

Use of ctensor

In fact, when I calculated the Christoffel symbols above by hand,I got one of them wrong, and missed calculating one other because Ithought it was zero. I only found my mistake by comparing againsta result in a textbook. The computation of the Riemann tensor isan even bigger mess. It’s clearly a good idea to resort to a com-puter algebra system here. Cadabra, which was discussed earlier, isspecifically designed for coordinate-independent calculations, so itwon’t help us here. A good free and open-source choice is ctensor,which is one of the standard packages distributed along with thecomputer algebra system Maxima, introduced on page 76.

The following Maxima program calculates the Christoffel sym-bols found in section 6.2.1.

1 load(ctensor);

2 ct_coords:[t,r,theta,phi];

3 lg:matrix([1,0,0,0],

4 [0,-1,0,0],

5 [0,0,-r^2,0],

6 [0,0,0,-r^2*sin(theta)^2]);

7 cmetric();

8 christof(mcs);

Line 1 loads the ctensor package. Line 2 sets up the names of thecoordinates. Line 3 defines the gab, with lg meaning “the version ofg with lower indices.” Line 7 tells Maxima to do some setup workwith gab, including the calculation of the inverse matrix gab, whichis stored in ug. Line 8 says to calculate the Christoffel symbols.The notation mcs refers to the tensor Γ′ a

bc with the indices swappedaround a little compared to the convention Γabc followed in thisbook. On a Linux system, we put the program in a file flat.mac

and run it using the command maxima -b flat.mac. The relevantpart of the output is:

1 1

2 (%t6) mcs = -

3 2, 3, 3 r

4

5 1

6 (%t7) mcs = -

7 2, 4, 4 r

8

Section 6.2 The Schwarzschild metric 209

Page 210: General Relativity

9 (%t8) mcs = - r

10 3, 3, 2

11

12 cos(theta)

13 (%t9) mcs = ----------

14 3, 4, 4 sin(theta)

15

16 2

17 (%t10) mcs = - r sin (theta)

18 4, 4, 2

19

20 (%t11) mcs = - cos(theta) sin(theta)

21 4, 4, 3

Adding the command ricci(true); at the end of the program re-sults in the output THIS SPACETIME IS EMPTY AND/OR FLAT, whichsaves us hours of tedious computation. The tensor ric (which herehappens to be zero) is computed, and all its nonzero elements areprinted out. There is a similar command riemann(true); to com-pute the Riemann rensor riem. This is stored so that riem[i,j,k,l]is what we would call Rlikj . Note that l is moved to the end, and jand k are also swapped.

6.2.2 Geometrized units

If the mass creating the gravitational field isn’t zero, then weneed to decide what units to measure it in. It has already provedvery convenient to adopt units with c = 1, and we will now also setthe gravitational constant G = 1. Previously, with only c set to 1,the units of time and length were the same, [T ] = [L], and so werethe units of mass and energy, [M ] = [E]. With G = 1, all of thesebecome the same units, [T ] = [L] = [M ] = [E].

Self-check: Verify this statement by combining Newton’s law ofgravity with Newton’s second law of motion.

The resulting system is referred to as geometrized, because unitslike mass that had formerly belonged to the province of mechanicsare now measured using the same units we would use to do geometry.

6.2.3 A large-r limit

Now let’s think about how to tackle the real problem of findingthe non-flat metric. Although general relativity lets us pick any co-ordinates we like, the spherical symmetry of the problem suggestsusing coordinates that exploit that symmetry. The flat-space coor-dinates θ and φ can stil be defined in the same way, and they havethe same interpretation. For example, if we drop a test particletoward the mass from some point in space, its world-line will haveconstant θ and φ. The r coordinate is a little different. In curvedspacetime, the circumference of a circle is not equal to 2π times the

210 Chapter 6 Vacuum Solutions

Page 211: General Relativity

distance from the center to the circle; in fact, the discrepancy be-tween these two is essentially the definition of the Ricci curvature.This gives us a choice of two logical ways to define r. We’ll de-fine it as the circumference divided by 2π, which has the advantagethat the last two terms of the metric are the same as in flat space:−r2 dθ2−r2 sin2 θ dφ2. Since we’re looking for static solutions, noneof the elements of the metric can depend on t. Also, the solution isgoing to be symmetric under t → −t, θ → −θ, and φ → −φ, so wecan’t have any off-diagonal elements.5 The result is that we havenarrowed the metric down to something of the form

ds2 = h(r) dt2 − k(r) dr2 − r2 dθ2 − r2 sin2 θ dφ2 ,

where both h and k approach 1 for r →∞, where spacetime is flat.

For guidance in how to construct h and k, let’s consider theacceleration of a test particle at r m, which we know to be−m/r2, since nonrelativistic physics applies there. We have

∇tvr = ∂tvr + Γrtcv

c .

An observer free-falling along with the particle observes its acceler-ation to be zero, and a tensor that is zero in one coordinate systemis zero in all others. Since the covariant derivative is a tensor, weconclude that ∇tvr = 0 in all coordinate systems, including the(t, r, . . .) system we’re using. If the particle is released from rest,then initially its velocity four-vector is (1, 0, 0, 0), so we find that itsacceleration in (t, r) coordinates is −Γrtt = −1

2grr∂rgtt = −1

2h′/k.

Setting this equal to −m/r2, we find h′/k = 2m/r2 for r m.Since k ≈ 1 for large r, we have

h′ ≈ 2m

r2for r m .

The interpretation of this calculation is as follows. We assert theequivalence principle, by which the acceleration of a free-falling par-ticle can be said to be zero. After some calculations, we find thatthe rate at which time flows (encoded in h) is not constant. It isdifferent for observers at different heights in a gravitational poten-tial well. But this is something we had already deduced, withoutthe index gymnastics, in example 7 on page 129.

Integrating, we find that for large r, h = 1− 2m/r.

6.2.4 The complete solution

A series solution

We’ve learned some interesting things, but we still have an ex-tremely nasty nonlinear differential equation to solve. One way to

5For more about time-reversal symmetry, see p. 213.

Section 6.2 The Schwarzschild metric 211

Page 212: General Relativity

attack a differential equation, when you have no idea how to pro-ceed, is to try a series solution. We have a small parameter m/r toexpand around, so let’s try to write h and k as series of the form

h = Σ∞n=0ak

(mr

)nk = Σ∞n=0bk

(mr

)nWe already know a0, a1, and b0. Let’s try to find b1. In the

following Maxima code I omit the factor of m in h1 for convenience.In other words, we’re looking for the solution for m = 1.

1 load(ctensor);

2 ct_coords:[t,r,theta,phi];

3 lg:matrix([(1-2/r),0,0,0],

4 [0,-(1+b1/r),0,0],

5 [0,0,-r^2,0],

6 [0,0,0,-r^2*sin(theta)^2]);

7 cmetric();

8 ricci(true);

I won’t reproduce the entire output of the Ricci tensor, whichis voluminous. We want all four of its nonvanishing components tovanish as quickly as possible for large values of r, so I decided tofiddle with Rtt, which looked as simple as any of them. It appearsto vary as r−4 for large r, so let’s evaluate limr→∞

(r4Rtt

):

9 limit(r^4*ric[1,1],r,inf);

The result is (b1−2)/2, so let’s set b1 = 2. The approximate solutionwe’ve found so far (reinserting the m’s),

ds2 ≈(

1− 2m

r

)dt2 −

(1 +

2m

r

)dr2 − r2 dθ2 − r2 sin2 θ dφ2 ,

was first derived by Einstein in 1915, and he used it to solve theproblem of the non-Keplerian relativistic correction to the orbit ofMercury, which was one of the first empirical tests of general rela-tivity.

Continuing in this fashion, the results are as follows:

a0 = 1 b0 = 1a1 = −2 b1 = 2a2 = 0 b2 = 4a3 = 0 b2 = 8

The closed-form solution

The solution is unexpectedly simple, and can be put into closedform. The approximate result we found for h was in fact exact.

212 Chapter 6 Vacuum Solutions

Page 213: General Relativity

For k we have a geometric series 1/(1− 2/r), and when we reinsertthe factor of m in the only way that makes the units work, we get1/(1− 2m/r). The result for the metric is

ds2 =

(1− 2m

r

)dt2−

(1

1− 2m/r

)dr2−r2 dθ2−r2 sin2 θ dφ2 .

This is called the Schwarzschild metric. A quick calculation in Max-ima demonstrates that it is an exact solution for all r, i.e., the Riccitensor vanishes everywhere, even at r < 2m, which is outside theradius of convergence of the geometric series.

Time-reversal symmetry

The Schwarzschild metric is invariant under time reversal, sincetime occurs only in the form of dt2, which stays the same underdt→ −dt. This is the same time-reversal symmetry that occurs inNewtonian gravity, where the field is described by the gravitationalacceleration g, and accelerations are time-reversal invariant.

Fundamentally, this is an example of general relativity’s coordi-nate independence. The laws of physics provided by general rela-tivity, such as the vacuum field equation, are invariant under anysmooth coordinate transformation, and t→ −t is such a coordinatetransformation, so general relativity has time-reversal symmetry.Since the Schwarzschild metric was found by imposing time-reversal-symmetric boundary conditions on a time-reversal-symmetric differ-ential equation, it is an equally valid solution when we time-reverseit. Furthermore, we expect the metric to be invariant under timereversal, unless spontaneous symmetry breaking occurs (see p. 321).

This suggests that we ask the more fundamental question of whatglobal symmetries general relativity has. Does it have symmetryunder parity inversion, for example? Or can we take any solutionsuch as the Schwarzschild spacetime and transform it into a frameof reference in which the source of the field is moving uniformly in acertain direction? Because general relativity is locally equivalent tospecial relativity, we know that these symmetries are locally valid.But it may not even be possible to define the corresponding globalsymmetries. For example, there are some spacetimes on which itis not even possible to define a global time coordinate. On such aspacetime, which is described as not time-orientable, there does notexist any smooth vector field that is everywhere timelike, so it isnot possible to define past versus future light-cones at all points inspace without having a discontinuous change in the definition occursomewhere. This is similar to the way in which a Mobius strip doesnot allow an orientation of its surface (an “up” direction as seen byan ant) to be defined globally.

Suppose that our spacetime is time-orientable, and we are ableto define coordinates (p, q, r, s) such that p is always the timelikecoordinate. Because q → −q is a smooth coordinate transforma-

Section 6.2 The Schwarzschild metric 213

Page 214: General Relativity

tion, we are guaranteed that our spacetime remains a valid solutionof the field equations under this change. But that doesn’t meanthat what we’ve found is a symmetry under parity inversion in aplane. Our coordinate q is not necessarily interpretable as distancealong a particular “q axis.” Such axes don’t even exist globally ingeneral relativity. A coordinate does not even have to have unitsof time or distance; it could be an angle, for example, or it mightnot have any geometrical significance at all. Similarly, we could doa transformation q → q′ = q + kp. If we think of q as measuringspatial position and p time, then this looks like a Galilean transfor-mation, with k being the velocity. The solution to the field equationsobtained after performing this transformation is still a valid solu-tion, but that doesn’t mean that relativity has Galilean symmetryrather than Lorentz symmetry. There is no sensible way to definea Galilean transformation acting on an entire spacetime, becausewhen we talk about a Galilean transformation we assume the exis-tence of things like global coordinate axes, which do not even existin general relativity.

6.2.5 Geodetic effect

As promised in section 5.5.1, we now calculate the geodetic effecton Gravity Probe B, including all the niggling factors of 3 and π. Tomake the physics clear, we approach the actual calculation througha series of warmups.

Flat space

As a first warmup, consider two spatial dimensions, representedby Euclidean polar coordinates (r,φ). Parallel-transport of a gyro-scope’s angular momentum around a circle of constant r gives

∇φLφ = 0

∇φLr = 0 .

Computing the covariant derivatives, we have

0 = ∂φLφ + ΓφφrL

r

0 = ∂φLr + ΓrφφL

φ .

The Christoffel symbols are Γφφr = 1/r and Γrφφ = −r. This is

all made to look needlessly complicated because Lφ and Lr are ex-pressed in different units. Essentially the vector is staying the same,but we’re expressing it in terms of basis vectors in the r and φ di-rections that are rotating. To see this more transparently, let r = 1,and write P for Lφ and Q for Lr, so that

P ′ = −QQ′ = P ,

which have solutions such as P = sinφ, Q = cosφ. For each orbit(2π change in φ), the basis vectors rotate by 2π, so the angular

214 Chapter 6 Vacuum Solutions

Page 215: General Relativity

momentum vector once again has the same components. In otherwords, it hasn’t really changed at all.

Spatial curvature only

The flat-space calculation above differs in two ways from theactual result for an orbiting gyroscope: (1) it uses a flat spatialgeometry, and (2) it is purely spatial. The purely spatial nature ofthe calculation is manifested in the fact that there is nothing in theresult relating to how quickly we’ve moved the vector around thecircle. We know that if we whip a gyroscope around in a circle onthe end of a rope, there will be a Thomas precession (section 2.5.4),which depends on the speed.

As our next warmup, let’s curve the spatial geometry, but con-tinue to omit the time dimension. Using the Schwarzschild metric,we replace the flat-space Christoffel symbol Γrφφ = −r with−r+2m.The differential equations for the components of the L vector, againevaluated at r = 1 for convenience, are now

P ′ = −QQ′ = (1− ε)P ,

where ε = 2m. The solutions rotate with frequency ω′ =√

1− ε.The result is that when the basis vectors rotate by 2π, the compo-nents no longer return to their original values; they lag by a factorof√

1− ε ≈ 1−m. Putting the factors of r back in, this is 1−m/r.The deviation from unity shows that after one full revolution, the Lvector no longer has quite the same components expressed in termsof the (r,φ) basis vectors.

To understand the sign of the effect, let’s imagine a counter-clockwise rotation. The (r,φ) rotate counterclockwise, so relativeto them, the L vector rotates clockwise. After one revolution, it hasnot rotated clockwise by a full 2π, so its orientation is now slightlycounterclockwise compared to what it was. Thus the contributionto the geodetic effect arising from spatial curvature is in the samedirection as the orbit.

Comparing with the actual results from Gravity Probe B, we seethat the direction of the effect is correct. The magnitude, however,is off. The precession accumulated over n periods is 2πnm/r, or,in SI units, 2πnGm/c2r. Using the data from section 2.5.4, we find∆θ = 2 × 10−5 radians, which is too small compared to the datashown in figure b on page 171.

2+1 Dimensions

To reproduce the experimental results correctly, we need to in-clude the time dimension. The angular momentum vector now hascomponents (Lφ,Lr,Lt). The physical interpretation of the Lt com-ponent is obscure at this point; we’ll return to this question later.

Section 6.2 The Schwarzschild metric 215

Page 216: General Relativity

Writing down the total derivatives of the three components, andnotating dt/dφ as ω−1, we have

dLφ

dφ= ∂φL

φ + ω−1∂tLφ

dLr

dφ= ∂φL

r + ω−1∂tLr

dLt

dφ= ∂φL

t + ω−1∂tLt

Setting the covariant derivatives equal to zero gives

0 = ∂φLφ + ΓφφrL

r

0 = ∂φLr + ΓrφφL

φ

0 = ∂tLr + ΓrttL

t

0 = ∂tLt + ΓttrL

r .

Self-check: There are not just four but six covariant derivativesthat could in principle have occurred, and in these six covariantderivatives we could have had a total of 18 Christoffel symbols. Ofthese 18, only four are nonvanishing. Explain based on symmetryarguments why the following Christoffel symbols must vanish: Γφφt,

Γttt.

Putting all this together in matrix form, we have L′ = ML,where

M =

0 −1 01− ε 0 −ε(1− ε)/2ω

0 −ε/2ω(1− ε) 0

.

The solutions of this differential equation oscillate like eiΩt, whereiΩ is an eigenvalue of the matrix.

Self-check: The frequency in the purely spatial calculation wasfound by inspection. Verify the result by applying the eigenvaluetechnique to the relevant 2× 2 submatrix.

To lowest order, we can use the Newtonian relation ω2r = Gm/rand neglect terms of order ε2, so that the two new off-diagonal ma-trix elements are both approximated as

√ε/2. The three resulting

eigenfrequencies are zero and Ω = ±[1− (3/2)m/r].

The presence of the mysterious zero-frequency solution can nowbe understood by recalling the earlier mystery of the physical inter-pretation of the angular momentum’s Lt component. Our resultscome from calculating parallel transport, and parallel transport isa purely geometric process, so it gives the same result regardlessof the physical nature of the four-vector. Suppose that we hadinstead chosen the velocity four-vector as our guinea pig. The def-inition of a geodesic is that it parallel-transports its own tangent

216 Chapter 6 Vacuum Solutions

Page 217: General Relativity

vector, so the velocity vector has to stay constant. If we inspectthe eigenvector corresponding to the zero-frequency eigenfrequency,we find a timelike vector that is parallel to the velocity four-vector.In our 2+1-dimensional space, the other two eigenvectors, whichare spacelike, span the subspace of spacelike vectors, which are theones that can physically be realized as the angular momentum ofa gyroscope. These two eigenvectors, which vary as e±iΩ, can besuperposed to make real-valued spacelike solutions that match theinitial conditions, and these lag the rotation of the basis vectorsby ∆Ω = (3/2)mr. This is greater than the purely spatial resultby a factor of 3/2. The resulting precession angle, over n orbits ofGravity Probe B, is 3πnGm/c2r = 3 × 10−5 radians, in excellentagreement with experiment.

One will see apparently contradictory statements in the litera-ture about whether Thomas precession occurs for a satellite: “TheThomas precession comes into play for a gyroscope on the surfaceof the Earth . . . , but not for a gyroscope in a freely moving satel-lite.”6 But: “The total effect, geometrical and Thomas, gives thewell-known Fokker-de Sitter precession of 3πm/r, in the same senseas the orbit.”7 The second statement arises from subtracting thepurely spatial result from the 2+1-dimensional result, and notingthat the absolute value of this difference is the same as the Thomasprecession that would have been obtained if the gyroscope had beenwhirled at the end of a rope. In my opinion this is an unnaturalway of looking at the physics, for two reasons. (1) The signs don’tmatch, so one is forced to say that the Thomas precession has adifferent sign depending on whether the rotation is the result ofgravitational or nongravitational forces. (2) Referring to observa-tion, it is clearly artificial to treat the spatial curvature and Thomaseffects separately, since neither one can be disentangled from theother by varying the quantities n, m, and r. For more discussion,see tinyurl.com/me3qf8o.

6.2.6 Orbits

The main event of Newton’s Principia Mathematica is his proofof Kepler’s laws. Similarly, Einstein’s first important application ingeneral relativity, which he began before he even had the exact formof the Schwarzschild metric in hand, was to find the non-Newtonianbehavior of the planet Mercury. The planets deviate from Keplerianbehavior for a variety of Newtonian reasons, and in particular thereis a long list of reasons why the major axis of a planet’s ellipticalorbit is expected to gradually rotate. When all of these were takeninto account, however, there was a remaining discrepancy of about40 seconds of arc per century, or 6.6× 10−7 radians per orbit. Thedirection of the effect was in the forward direction, in the sense that

6Misner, Thorne, and Wheeler, Gravitation, p. 11187Rindler, Essential Relativity, 1969, p. 141

Section 6.2 The Schwarzschild metric 217

Page 218: General Relativity

if we view Mercury’s orbit from above the ecliptic, so that it orbitsin the counterclockwise direction, then the gradual rotation of themajor axis is also counterclockwise.

As a very rough hand-wavy explanation for this effect, considerthe spatial part of the curvature of the spacetime surrounding thesun. This spatial curvature is positive, so a circle’s circumferenceis less than 2π times its radius. We could imagine that this wouldcause Mercury to get back to a previously visited angular positionbefore it has had time to complete its Newtonian cycle of radialmotion. Arguments such as this one, however, should not be takentoo seriously. A mathematical analysis is required.

Based on the examples in section 5.5, we expect that the effectwill be of order m/r, where m is the mass of the sun and r is theradius of Mercury’s orbit. This works out to be 2.5 × 10−8, whichis smaller than the observed precession by a factor of about 26.

Conserved quantities

If Einstein had had a computer on his desk, he probably wouldsimply have integrated the motion numerically using the geodesicequation. But it is possible to simplify the problem enough to at-tack it with pencil and paper, if we can find the relevant conservedquantities of the motion. Nonrelativistically, these are energy andangular momentum.

Consider a rock falling directly toward the sun. The Schwarzschildmetric is of the special form

ds2 = h(r) dt2 − k(r) dr2 − . . . .

The rock’s trajectory is a geodesic, so it extremizes the proper times between any two events fixed in spacetime, just as a piece of stringstretched across a curved surface extremizes its length. Let the rockpass through distance r1 in coordinate time t1, and then through r2

in t2. (These should really be notated as ∆r1, . . . or dr1, . . . , but weavoid the ∆’s or d’s for convenience.) Approximating the geodesicusing two line segments, the proper time is

s = s1 + s2

=√h1t21 − k1r2

1 +√h2t22 − k2r2

2

=√h1t21 − k1r2

1 +√h2(T − t1)2 − k2r2

2 ,

where T = t1 + t2 is fixed. If this is to be extremized with respectto t1, then ds/dt1 = 0, which leads to

0 =h1t1s1− h2t2

s2,

which means that

hdt

ds= gtt

dxt

ds=

dxtds

218 Chapter 6 Vacuum Solutions

Page 219: General Relativity

b / Proof that if the metric’scomponents are independent oft , the geodesic of a test particleconserves pt .

is a constant of the motion. Except for an irrelevant factor of m,this is the same as pt, the timelike component of the covariant mo-mentum vector. We’ve already seen that in special relativity, thetimelike component of the momentum four-vector is interpreted asthe mass-energy E, and the quantity pt has a similar interpretationhere. Note that no special assumption was made about the form ofthe functions h and k. In addition, it turns out that the assumptionof purely radial motion was unnecessary. All that really matteredwas that h and k were independent of t. Therefore we will havea similar conserved quantity pµ any time the metric’s components,expressed in a particular coordinate system, are independent of xµ.(This is generalized on p. 248.) In particular, the Schwarzschildmetric’s components are independent of φ as well as t, so we have asecond conserved quantity pφ, which is interpreted as angular mo-mentum.

Writing these two quantities out explicitly in terms of the con-travariant coordinates, in the case of the Schwarzschild spacetime,we have

E =

(1− 2m

r

)dt

ds

and

L = r2 dφ

ds

for the conserved energy per unit mass and angular momentum perunit mass.

In interpreting the energy per unit mass E, it is important tounderstand that in the general-relativistic context, there is no use-ful way of separating the rest mass, kinetic energy, and potentialenergy into separate terms, as we could in Newtonian mechanics.E includes contributions from all of these, and turns out to be lessthan the contribution due to the rest mass (i.e., less than 1) for aplanet orbiting the sun. It turns out that E can be interpreted as ameasure of the additional gravitational mass that the solar systempossesses as measured by a distant observer, due to the presence ofthe planet. It then makes sense that E is conserved; by analogywith Newtonian mechanics, we would expect that any gravitationaleffects that depended on the detailed arrangement of the masseswithin the solar system would decrease as 1/r4, becoming negligibleat large distances and leaving a constant field varying as 1/r2.

One way of seeing that it doesn’t make sense to split E into partsis that although the equation given above for E involves a specific setof coordinates, E can actually be expressed as a Lorentz-invariantscalar (see p. 248). This property makes E especially interesting anduseful (and different from the energy in Newtonian mechanics, whichis conserved but not frame-independent). On the other hand, the

Section 6.2 The Schwarzschild metric 219

Page 220: General Relativity

kinetic and potential energies depend on the velocity and position.These are completely dependent on the coordinate system, and thereis nothing physically special about the coordinate system we’ve usedhere. Suppose a particle is falling directly toward the earth, and anastronaut in a space-suit is free-falling along with it and monitoringits progress. The astronaut judges the particle’s kinetic energy tobe zero, but other observers say it’s nonzero, so it’s clearly not aLorentz scalar. And suppose the astronaut insists on defining apotential energy to go along with this kinetic energy. The potentialenergy must be decreasing, since the particle is getting closer to theearth, but then there is no way that the sum of the kinetic andpotential energies could be constant.

Perihelion advance

For convenience, let the mass of the orbiting rock be 1, while mstands for the mass of the gravitating body.

The unit mass of the rock is a third conserved quantity, andsince the magnitude of the momentum vector equals the square ofthe mass, we have for an orbit in the plane θ = π/2,

1 = gttp2t − grrp2

r − gφφp2φ

= gttp2t − grr(pr)2 − gφφp2

φ

=1

1− 2m/rE2 − 1

1− 2m/r

(dr

ds

)2

− 1

r2L2 .

Rearranging terms and writing r for dr/ds, this becomes

r2 = E2 − (1− 2m/r)(1 + L2/r2)

or

r2 = E2 − U2

where

U2 = (1− 2m/r)(1 + L2/r2) .

There is a varied and strange family of orbits in the Schwarzschildfield, including bizarre knife-edge trajectories that take several nearlycircular turns before suddenly flying off. We turn our attention in-stead to the case of an orbit such as Mercury’s which is nearlyNewtonian and nearly circular.

Nonrelativistically, a circular orbit has radius r = L2/m andperiod T = 2πL3/m2.

Relativistically, a circular orbit occurs when there is only oneturning point at which r = 0. This requires that E2 equal the

220 Chapter 6 Vacuum Solutions

Page 221: General Relativity

minimum value of U2, which occurs at

r =L2

2m

(1 +

√1− 12m2/L2

)≈ L2

m(1− ε) ,

where ε = 3(m/L)2. A planet in a nearly circular orbit oscillatesbetween perihelion and aphelion with a period that depends on thecurvature of U2 at its minimum. We have

k =d2(U2)

dr2

=d2

dr2

(1− 2m

r+L2

r2− 2mL2

r3

)= −4m

r3+

6L2

r4− 24mL2

r5

= 2L−6m4(1 + 2ε)

The period of the oscillations is

∆sosc = 2π√

2/k

= 2πL3m−2(1− ε) .

The period of the azimuthal motion is

∆saz = 2πr2/L

= 2πL3m−2(1− 2ε) .

The periods are slightly mismatched because of the relativistic cor-rection terms. The period of the radial oscillations is longer, sothat, as expected, the perihelion shift is in the forward direction.The mismatch is ε∆s, and because of it each orbit rotates the ma-jor axis by an angle 2πε = 6π(m/L)2 = 6πm/r. Plugging in thedata for Mercury, we obtain 5.8 × 10−7 radians per orbit, whichagrees with the observed value to within about 10%. Eliminatingsome of the approximations we’ve made brings the results in agree-ment to within the experimental error bars, and Einstein recalledthat when the calculation came out right, “for a few days, I wasbeside myself with joyous excitement.”

Further attempts were made to improve on the precision of thishistorically crucial test of general relativity. Radar now gives themost precise orbital data for Mercury. At the level of about one partper thousand, however, an effect creeps in due to the oblateness ofthe sun, which is difficult to measure precisely.

In 1974, astronomers J.H. Taylor and R.A. Hulse of Princeton,working at the Arecibo radio telescope, discovered a binary starsystem whose members are both neutron stars. The detection of

Section 6.2 The Schwarzschild metric 221

Page 222: General Relativity

the system was made possible because one of the neutron stars isa pulsar: a neutron star that emits a strong radio pulse in thedirection of the earth once per rotational period. The orbit is highlyelliptical, and the minimum separation between the two stars is verysmall, about the same as the radius of our sun. Both because ther is small and because the period is short (about 8 hours), the rateof perihelion advance per unit time is very large, about 4.2 degreesper year. The system has been compared in great detail with thepredictions of general relativity,8 giving extremely good agreement,and as a result astronomers have been confident enough to reason inthe opposite direction and infer properties of the system, such as itstotal mass, from the general-relativistic analysis. The system’s orbitis decaying due to the radiation of energy in the form of gravitationalwaves, which are predicted to exist by relativity.

6.2.7 Deflection of light

As discussed on page 171, one of the first tests of general rel-ativity was Eddington’s measurement of the deflection of rays oflight by the sun’s gravitational field. The deflection measured byEddington was 1.6 seconds of arc. For a light ray that grazes thesun’s surface, the only physically relevant parameters are the sun’smass m and radius r. Since the deflection is unitless, it can onlydepend on m/r, the unitless ratio of the sun’s mass to its radius.Expressed in SI units, this is Gm/c2r, which comes out to be about10−6. Roughly speaking, then, we expect the order of magnitude ofthe effect to be about this big, and indeed 10−6 radians comes outto be in the same ball-park as a second of arc. We get a similarestimate in Newtonian physics by treating a photon as a (massive)particle moving at speed c.

It is possible to calculate a precise value for the deflection us-ing methods very much like those used to determine the perihelionadvance in section 6.2.6. However, some of the details would haveto be changed. For example, it is no longer possible to parametrizethe trajectory using the proper time s, since a light ray has ds = 0;we must use an affine parameter. Let us instead use this an anexample of the numerical technique for solving the geodesic equa-tion, first demonstrated in section 5.9.2 on page 188. Modifying ourearlier program, we have the following:

1 import math

2

3 # constants, in SI units:

4 G = 6.67e-11 # gravitational constant

5 c = 3.00e8 # speed of light

6 m_kg = 1.99e30 # mass of sun

7 r_m = 6.96e8 # radius of sun

8http://arxiv.org/abs/astro-ph/0407149

222 Chapter 6 Vacuum Solutions

Page 223: General Relativity

8

9 # From now on, all calculations are in units of the

10 # radius of the sun.

11

12 # mass of sun, in units of the radius of the sun:

13 m_sun = (G/c**2)*(m_kg/r_m)

14 m = 1000.*m_sun

15 print "m/r=",m

16

17 # Start at point of closest approach.

18 # initial position:

19 t=0

20 r=1 # closest approach, grazing the sun’s surface

21 phi=-math.pi/2

22 # initial derivatives of coordinates w.r.t. lambda

23 vr = 0

24 vt = 1

25 vphi = math.sqrt((1.-2.*m/r)/r**2)*vt # gives ds=0, lightlike

26

27 l = 0 # affine parameter lambda

28 l_max = 20000.

29 epsilon = 1e-6 # controls how fast lambda varies

30 while l<l_max:

31 dl = epsilon*(1.+r**2) # giant steps when farther out

32 l = l+dl

33 # Christoffel symbols:

34 Gttr = m/(r**2-2*m*r)

35 Grtt = m/r**2-2*m**2/r**3

36 Grrr = -m/(r**2-2*m*r)

37 Grphiphi = -r+2*m

38 Gphirphi = 1/r

39 # second derivatives:

40 # The factors of 2 are because we have, e.g., G^a_bc=G^a_cb

41 at = -2.*Gttr*vt*vr

42 ar = -(Grtt*vt*vt + Grrr*vr*vr + Grphiphi*vphi*vphi)

43 aphi = -2.*Gphirphi*vr*vphi

44 # update velocity:

45 vt = vt + dl*at

46 vr = vr + dl*ar

47 vphi = vphi + dl*aphi

48 # update position:

49 r = r + vr*dl

50 t = t + vt*dl

51 phi = phi + vphi*dl

52

53 # Direction of propagation, approximated in asymptotically flat coords.

54 # First, differentiate (x,y)=(r cos phi,r sin phi) to get vx and vy:

55 vx = vr*math.cos(phi)-r*math.sin(phi)*vphi

Section 6.2 The Schwarzschild metric 223

Page 224: General Relativity

56 vy = vr*math.sin(phi)+r*math.cos(phi)*vphi

57 prop = math.atan2(vy,vx) # inverse tan of vy/vx, in the proper quadrant

58 prop_sec = prop*180.*3600/math.pi

59 print "final direction of propagation = %6.2f arc-seconds" % prop_sec

At line 14, we take the mass to be 1000 times greater than themass of the sun. This helps to make the deflection easier to calcu-late accurately without running into problems with rounding errors.Lines 17-25 set up the initial conditions to be at the point of closestapproach, as the photon is grazing the sun. This is easier to setup than initial conditions in which the photon approaches from faraway. Because of this, the deflection angle calculated by the pro-gram is cut in half. Combining the factors of 1000 and one half, thefinal result from the program is to be interpreted as 500 times theactual deflection angle.

The result is that the deflection angle is predicted to be 870seconds of arc. As a check, we can run the program again withm = 0; the result is a deflection of −8 seconds, which is a measureof the accumulated error due to rounding and the finite incrementused for λ.

Dividing by 500, we find that the predicted deflection angle is1.74 seconds, which, expressed in radians, is exactly 4Gm/c2r. Theunitless factor of 4 is in fact the correct result in the case of smalldeflections, i.e., for m/r 1.

Although the numerical technique has the disadvantage that itdoesn’t let us directly prove a nice formula, it has some advantagesas well. For one thing, we can use it to investigate cases for whichthe approximation m/r 1 fails. For m/r = 0.3, the numericaltechique gives a deflection of 222 degrees, whereas the weak-fieldapproximation 4Gm/c2r gives only 69 degrees. What is happeninghere is that we’re getting closer and closer to the event horizon of ablack hole. Black holes are the topic of section 6.3, but it should beintuitively reasonable that something wildly nonlinear has to happenas we get close to the point where the light wouldn’t even be ableto escape.

The precision of Eddington’s original test was only about ± 30%,and has never been improved on significantly with visible-light as-tronomy. A better technique is radio astronomy, which allows mea-surements to be carried out without waiting for an eclipse. Onemerely has to wait for the sun to pass in front of a strong, compactradio source such as a quasar. These techniques have now verifiedthe deflection of light predicted by general relativity to a relativeprecision of about 10−5.9

9For a review article on this topic, see Clifford Will, “The Con-frontation between General Relativity and Experiment,” http://relativity.

livingreviews.org/Articles/lrr-2006-3/.

224 Chapter 6 Vacuum Solutions

Page 225: General Relativity

6.3 Black holes6.3.1 Singularities

A provocative feature of the Schwarzschild metric is that it haselements that blow up at r = 0 and at r = 2m. If this is a descriptionof the sun, for example, then these singularities are of no physicalsignificance, since we only solved the Einstein field equation for thevacuum region outside the sun, whereas r = 2m would lie about 3km from the sun’s center. Furthermore, it is possible that one orboth of these singularities is nothing more than a spot where ourcoordinate system misbehaves. This would be known as a coordinatesingularity. For example, the metric of ordinary polar coordinatesin a Euclidean plane has gθθ →∞ as r → 0.

One way to test whether a singularity is a coordinate singularityis to calculate a scalar measure of curvature, whose value is inde-pendent of the coordinate system. We can take the trace of theRicci tensor, Raa, known as the scalar curvature or Ricci scalar,but since the Ricci tensor is zero, it’s not surprising that that iszero. A different scalar we can construct is the product RabcdRabcdof the Riemann tensor with itself. This is known as the Kretch-mann invariant. The Maxima command lriemann(true) displaysthe nonvanishing components of Rabcd. The component that misbe-haves the most severely at r = 0 is Rtrrt = 2m/r3. Because of this,the Kretchmann invariant blows up like r−6 as r → 0. This showsthat the singularity at r = 0 is a real, physical singularity.

The singularity at r = 2m, on the other hand, turns out tobe only a coordinate singularity. To prove this, we have to usesome technique other than constructing scalar measures of curva-ture. Even if every such scalar we construct is finite at r = 2m, thatdoesn’t prove that every such scalar we could construct is also wellbehaved. We can instead search for some other coordinate systemin which to express the solution to the field equations, one in whichno such singularity appears. A partially successful change of coordi-nates for the Schwarzschild metric, found by Eddington in 1924, ist→ t′ = t−2m ln(r−2m) (see problem 8 on page 225). This makesthe covariant metric finite at r = 2m, although the contravariantmetric still blows up there. A more complicated change of coor-dinates that completely eliminates the singularity at r = 2m wasfound by Eddington and Finkelstein in 1958, establishing that thesingularity was only a coordinate singularity. Thus, if an observeris so unlucky as to fall into a black hole, he will not be subjectedto infinite tidal stresses — or infinite anything — at r = 2m. Hemay not notice anything special at all about his local environment.(Or he may already be dead because the tidal stresses at r > 2m,although finite, were nevertheless great enough to kill him.)

Section 6.3 Black holes 225

Page 226: General Relativity

6.3.2 Event horizon

Even though r = 2m isn’t a real singularity, interesting thingsdo happen there. For r < 2m, the sign of gtt becomes negative,while grr is positive. In our + − −− signature, this has the fol-lowing interpretation. For the world-line of a material particle, ds2

is supposed to be the square of the particle’s proper time, and itmust always be positive. If a particle had a constant value of r, forr < 2m, it would have ds2 < 0, which is impossible.

The timelike and spacelike characters of the r and t coordinateshave been swapped, so r acts like a time coordinate.

Thus for an object compact enough that r = 2m is exterior,r = 2m is an event horizon: future light cones tip over so far thatthey do not allow causal relationships to connect with the spacetimeoutside. In relativity, event horizons do not occur only in the contextof black holes; their properties, and some of the implications forblack holes, have already been discussed in section 6.1.

The gravitational time dilation in the Schwarzschild field, rel-ative to a clock at infinity, is given by the square root of the gttcomponent of the metric. This goes to zero at the event horizon,meaning that, for example, a photon emitted from the event horizonwill be infinitely redshifted when it reaches an observer at infinity.This makes sense, because the photon is then undetectable, just asit would be if it had been emitted from inside the event horizon.If matter is falling into a black hole, then due to time dilation anobserver at infinity “sees” that matter as slowing down more andmore as it approaches the horizon. This has some counterintuitiveeffects. A radially infalling particle has d2 r/dt2 > 0 once it fallspast a certain point, which could be interpreted as a gravitationalrepulsion. The observer at infinity may also be led to describe theblack hole as consisting of an empty, spherical shell of matter thatnever quite made it through the horizon. If asked what holds theshell up, the observer could say that it is held up by gravitationalrepulsion.

There is actually nothing wrong with any of this, but one shouldrealize that it is only one possible description in one possible coor-dinate system. An observer hovering just outside the event horizonsees a completely different picture, with matter falling past at ve-locities that approach the speed of light as it comes to the eventhorizon. If an atom emits a photon from the event horizon, thehovering observer sees it as being infinitely red-shifted, but explainsthe red-shift as a kinematic one rather than a gravitational one.

We can imagine yet a third observer, one who free-falls alongwith the infalling matter. According to this observer, the gravita-tional field is always zero, and it takes only a finite time to passthrough the event horizon.

226 Chapter 6 Vacuum Solutions

Page 227: General Relativity

If a black hole has formed from the gravitational collapse of acloud of matter, then some of our observers can say that “right now”the matter is located in a spherical shell at the event horizon, whileothers can say that it is concentrated at an infinitely dense singular-ity at the center. Since simultaneity isn’t well defined in relativity,it’s not surprising that they disagree about what’s happening “rightnow.” Regardless of where they say the matter is, they all agree onthe spacetime curvature. In fact, Birkhoff’s theorem (p. 256) tells usthat any spherically symmetric vacuum spacetime is Schwarzschildin form, so it doesn’t matter where we say the matter is, as long asit’s distributed in a spherically symmetric way and surrounded byvacuum.

6.3.3 Expected formation

Einstein and Schwarzschild did not believe, however, that any ofthese features of the Schwarzschild metric were more than a mathe-matical curiosity, and the term “black hole” was not invented untilthe 1967, by John Wheeler. Although there is quite a bit of evi-dence these days that black holes do exist, there is also the relatedquestion of what sizes they come in.

We might expect naively that since gravity is an attractive force,there would be a tendency for any primordial cloud of gas or dustto spontaneously collapse into a black hole. But clouds of less thanabout 0.1M (0.1 solar masses) form planets, which achieve a per-manent equilibrium between gravity and internal pressure. Heavierobjects initiate nuclear fusion, but those with masses above about100M are immediately torn apart by their own solar winds. In therange from 0.1 to 100M, stars form. As discussed in section 4.4.3,those with masses greater than about a few M are expected toform black holes when they die. We therefore expect, on theoreticalgrounds, that the universe should contain black holes with massesranging from a few solar masses to a few tens of solar masses.

6.3.4 Observational evidence

A black hole is expected to be a very compact object, with astrong gravitational field, that does not emit any of its own light. Abare, isolated black hole would be difficult to detect, except perhapsvia its lensing of light rays that happen to pass by it. But if a blackhole occurs in a binary star system, it is possible for mass to betransferred onto the black hole from its companion, if the compan-ion’s evolution causes it to expand into a giant and intrude uponthe black hole’s gravity well. The infalling gas would then get hotand emit radiation before disappearing behind the event horizon.The object known as Cygnus X-1 is the best-studied example. ThisX-ray-emitting object was discovered by a rocket-based experimentin 1964. It is part of a double-star system, the other member beinga blue supergiant. They orbit their common center of mass witha period of 5.6 days. The orbit is nearly circular, and has a semi-

Section 6.3 Black holes 227

Page 228: General Relativity

a / A black hole accretes matterfrom a companion star.

major axis of about 0.2 times the distance from the earth to the sun.Applying Kepler’s law of periods to these data constrains the sumof the masses, and knowledge of stellar structure fixes the mass ofthe supergiant. The result is that the mass of Cygnus X-1 is greaterthan about 10 solar masses, and this is confirmed by multiple meth-ods. Since this is far above the Tolman-Oppenheimer-Volkoff limit,Cygnus X-1 is believed to be a black hole, and its X-ray emissionsare interpreted as the radiation from the disk of superheated ma-terial accreting onto it from its companion. It is believed to havemore than 90% of the maximum possible spin for a black hole of itsmass.10

Around the turn of the 21st century, new evidence was foundfor the prevalence of supermassive black holes near the centers ofnearly all galaxies, including our own. Near our galaxy’s center isan object called Sagittarius A*, detected because nearby stars orbitaround it. The orbital data show that Sagittarius A* has a massof about four million solar masses, confined within a sphere witha radius less than 2.2 × 107 km. There is no known astrophysicalmodel that could prevent the collapse of such a compact object intoa black hole, nor is there any plausible model that would allow thismuch mass to exist in equilibrium in such a small space, withoutemitting enough light to be observable.

The existence of supermassive black holes is surprising. Gasclouds with masses greater than about 100 solar masses cannot formstable stars, so supermassive black holes cannot be the end-point ofthe evolution of heavy stars. Mergers of multiple stars to form moremassive objects are generally statistically unlikely, since a star issuch a small target in relation to the distance between the stars.Once astronomers were confronted with the empirical fact of theirexistence, a variety of mechanisms was proposed for their formation.Little is known about which of these mechanisms is correct, althoughthe existence of quasars in the early universe is interpreted as evi-dence that mass accreted rapidly onto supermassive black holes inthe early stages of the evolution of the galaxies.

A skeptic could object that although Cygnus X-1 and Sagittar-ius A* are more compact than is believed possible for a neutronstar, this does not necessarily prove that they are black holes. In-deed, speculative theories have been proposed in which exotic ob-jects could exist that are intermediate in compactness between blackholes and neutron stars. These hypothetical creatures have nameslike black stars, gravastars, quark stars, boson stars, Q-balls, andelectroweak stars. Although there is no evidence that these the-ories are right or that these objects exist, we are faced with thequestion of how to determine whether a given object is really a

10Gou et al., “The Extreme Spin of the Black Hole in Cygnus X-1,” http:

//arxiv.org/abs/1106.3690

228 Chapter 6 Vacuum Solutions

Page 229: General Relativity

black hole or one of these other species. The defining character-istic of a black hole is that it has an event horizon rather thana physical surface. If an object is not a black hole, then by con-servation of energy any matter that falls onto it must release itsgravitational potential energy when it hits that surface. Cygnus X-1 has a copious supply of matter falling onto it from its supergiantcompanion, and Sagittarius A* likewise accretes a huge amount ofgas from the stellar wind of nearby stars. By analyzing millimeterand infrared very-long-baseline-interferometry observations, Brod-erick, Loeb, and Narayan11 have shown that if Sagittarius A* had asurface, then the luminosity of this surface must be less than 0.3% ofthe luminosity of the accretion disk. But this is not physically pos-sible, because there are fundamental limits on the efficiency withwhich the gas can radiate away its energy before hitting the sur-face. We can therefore conclude that Sagittarius A* must have anevent horizon. Its event horizon may be imaged directly in the nearfuture.12

6.3.5 Singularities and cosmic censorship

Informal ideas

Since we observe that black holes really do exist, maybe weshould take the singularity at r = 0 seriously. Physically, it saysthat the mass density and tidal forces blow up to infinity there.

Generally when a physical theory says that observable quantitiesblow up to infinity at a particular point, it means that the theory hasreached the point at which it can no longer make physical predic-tions. For instance, Maxwell’s theory of electromagnetism predictsthat the electric field blows up like r−2 near a point charge, andthis implies that infinite energy is stored in the field within a finiteradius around the charge. Physically, this can’t be right, becausewe know it only takes 511 keV of energy to create an electron outof nothing, e.g., in nuclear beta decay. The paradox is resolved byquantum electrodynamics, which modifies the description of the vac-uum around the electron to include a sea of virtual particles poppinginto and out of existence.

In the case of the singularity at the center of a black hole, it ispossible that quantum mechanical effects at the Planck scale preventthe formation of a singularity. Unfortunately, we are unlikely to findany empirical evidence about this, since black holes always seemto come clothed in event horizons, so we outside observers cannotextract any data about the singularity inside. Even if we take asuicidal trip into a black hole, we get no data about the singularity,because the singularity in the Schwarzschild metric is spacelike, nottimelike, and therefore it always lies in our future light cone, neverin our past.

11arxiv.org/abs/0903.110512arxiv.org/abs/0906.4040

Section 6.3 Black holes 229

Page 230: General Relativity

In a way, the inaccessibility of singularities is a good thing. If asingularity exists, it is a point at which all the known laws of physicsbreak down, and physicists therefore have no way of predicting any-thing about its behavior. There is likewise no great crisis for physicsdue to the Big Bang singularity or the Big Crunch singularity thatoccurs in some cosmologies in which the universe recollapses; wehave no reasonable expectation of being able to make and test pre-dictions or retrodictions that extend beyond the beginning or endof the universe.

What would be a crushing blow to the enterprise of physics wouldbe a singularity that could sit on someone’s desk. As John Earmanof the University of Pittsburgh puts it, anything could pop out ofsuch a singularity, including green slime or your lost socks. In moretechnical language, a singularity would constitute an extreme viola-tion of unitarity and an acute instance of the information paradox(see page 205).

There is no obvious reason that general relativity should not al-low naked singularities, but neither do we know of any real-worldprocess by which one could be formed by gravitational collapse. Pen-rose’s cosmic censorship hypothesis states that the laws of physicsprevent the formation of naked singularities from nonsingular andgeneric initial conditions. “Generic” is a necessary addition to Pen-rose’s original 1969 formulation, since Choptuik showed in 1993 thatcertain perfectly fine-tuned initial conditions allowed collapse to anaked singularity.13

Formal definitions

The remainder of this subsection provides a more formal expo-sition of the definitions relating to singularities. It can be skippedwithout loss of continuity.

The reason we care about singularities is that they indicate anincompleteness of the theory, and the theory’s inability to makepredictions. One of the simplest things we could ask any theoryto do would be to predict the trajectories of test particles. Forexample, Maxwell’s equations correctly predict the motion of anelectron in a uniform magnetic field, but they fail to predict themotion of an electron that collides head-on with a positron. It mighthave been natural for someone in Maxwell’s era (assuming they wereinformed about the existence of positrons and told to assume thatboth particles were pointlike) to guess that the two particles wouldscatter through one another at θ = 0, their velocities momentarilybecoming infinite. But it would have been equally natural for thisperson to refuse to make a prediction.

Similarly, if a particle hits a black hole singularity, we should notexpect general relativity to make a definite prediction. It doesn’t.

13Phys. Rev. Lett. 70, p. 9

230 Chapter 6 Vacuum Solutions

Page 231: General Relativity

b / A conical singularity. Thecone has zero intrinsic curvatureeverywhere except at its tip.Geodesic 1 can be extended in-finitely far, but geodesic 2 cannot;since the metric is undefined atthe tip, there is no sensible wayto define how geodesic 2 shouldbe extended.

Not only does the geodesic equation break down, but if we were tonaively continue the particle’s geodesic by assuming that it scattersin the forward direction, the continuation would be a world-linewhose future-time direction pointed into the singularity rather thanback out of it.

We would therefore like to define a singularity as a situation inwhich the geodesics of test particles can’t be extended indefinitely.But what does “indefinitely” mean? If the test particle is a photon,then the metric length of its world-line is zero. We get around thisby defining length in terms of an affine parameter.

Definition: A spacetime is said to be geodesically incomplete ifthere exist timelike or lightlike geodesics that cannot be extendedpast some finite affine parameter into the past or future.

Geodesic incompleteness defines what we mean by a singularity.A geodesically incomplete spacetime has one or more singularitiesin it. The Schwarzschild spacetime has a singularity at r = 0, butnot at the event horizon, since geodesics continue smoothly pastthe event horizon. Cosmological spacetimes contain a Big Bangsingularity which prevents geodesics from being extended beyond acertain point in the past.

There are two types of singularities, curvature singularities andconical singularities. The examples above are curvature singulari-ties. Figure b shows an example of a conical singularity. (Cf. figureb, 192.) As one approaches a curvature singularity, the curvature ofspacetime diverges to infinity, as measured by a curvature invariantsuch as the Ricci scalar. In 2+1-dimensional relativity, curvaturevanishes identically, and the only kind of gravity that exists is dueto conical singularities. Conical singularities are not expected tobe present in our universe, since there is no known mechanism bywhich they could be formed by gravitational collapse.

Actual singularities involving geodesic incompleteness are to bedistinguished from coordinate singularities, which are not really sin-gularities at all. In the Schwarzschild spacetime, as described inSchwarzschild’s original coordinates, some components of the met-ric blow up at the event horizon, but this is not an actual singularity.This coordinate system can be replaced with a different one in whichthe metric is well behaved.

The reason curvature scalars are useful as tests for an actual cur-vature singularity is that since they’re scalars, they can’t diverge inone coordinate system but stay finite in another. We define a singu-larity to be a curvature singularity if timelike or lightlike geodesicscan only be extended to some finite affine parameter, and some cur-vature scalar (not necessarily every such scalar) approaches infinityas we approach this value of the affine parameter. Anything that isnot a curvature singularity is considered a conical singularity.

Section 6.3 Black holes 231

Page 232: General Relativity

A singularity is not considered to be a point in a spacetime; it’smore like a hole in the topology of the manifold. For example, theBig Bang didn’t occur at a point.

Because a singularity isn’t a point or a point-set, we can’t defineits timelike or spacelike character in quite the way we would with,say, a curve. A timelike singularity is one such that an observerwith a timelike world-line can have the singularity sometimes in hisfuture light-cone and sometimes in his past light-cone.14

Schwarzschild and Big Bang singularities are spacelike. (Notethat in the Schwarzschild metric, the Schwarzschild r and t coordi-nates swap their timelike and spacelike characters inside the eventhorizon.)

The definition of a timelike singularity is local. A timelike singu-larity would be one that you could have sitting on your desk, whereyou could look at it and poke it with a stick.

A naked singularity is one from which timelike or lightlike world-lines can originate and then escape to infinity. The Schwarzschildmetric’s singularity is not naked. This notion is global.

If either a timelike or a naked singularity can be formed by grav-itational collapse from realistic initial conditions, then it would cre-ate severe difficulties for physicists wishing to make predictions usingthe laws of physics.

6.3.6 Hawking radiation

Radiation from black holes

Since event horizons are expected to emit blackbody radiation,a black hole should not be entirely black; it should radiate. This iscalled Hawking radiation. Suppose observer B just outside the eventhorizon blasts the engines of her rocket ship, producing enough ac-celeration to keep from being sucked in. By the equivalence princi-ple, what she observes cannot depend on whether the accelerationshe experiences is actually due to a gravitational field. She there-fore detects radiation, which she interprets as coming from the eventhorizon below her. As she gets closer and closer to the horizon, theacceleration approaches infinity, so the intensity and frequency ofthe radiation grows without limit.

A distant observer A, however, sees a different picture. Accord-ing to A, B’s time is extremely dilated. A sees B’s accelerationas being only ∼ 1/m, where m is the mass of the black hole; Adoes not perceive this acceleration as blowing up to infinity as Bapproaches the horizon. When A detects the radiation, it is ex-tremely red-shifted, and it has the spectrum that one would expect

14Penrose, Gravitational radiation and gravitational collapse; Proceedings ofthe Symposium, Warsaw, 1973. Dordrecht, D. Reidel Publishing Co. pp. 82-91,free online at adsabs.harvard.edu/full/1974IAUS...64...82P

232 Chapter 6 Vacuum Solutions

Page 233: General Relativity

for a horizon characterized by an acceleration a ∼ 1/m. The resultfor a 10-solar-mass black hole is T ∼ 10−8 K, which is so low thatthe black hole is actually absorbing more energy from the cosmicmicrowave background radiation than it emits.

Direct observation of black-hole radiation is therefore probablyonly possible for black holes of very small masses. These may havebeen produced soon after the big bang, or it is conceivable thatthey could be created artificially, by advanced technology. If black-hole radiation does exist, it may help to resolve the informationparadox, since it is possible that information that goes into a blackhole is eventually released via subtle correlations in the black-bodyradiation it emits.

Particle physics

Hawking radiation has some intriguing properties from the pointof view of particle physics. In a particle accelerator, the list ofparticles one can create in appreciable quantities is determined bycoupling constants. In Hawking radiation, however, we expect tosee a representative sampling of all types of particles, biased onlyby the fact that massless or low-mass particles are more likely to beproduced than massive ones. For example, it has been speculatedthat some of the universe’s dark matter exists in the form of “sterile”particles that do not couple to any force except for gravity. Suchparticles would never be produced in particle accelerators, but wouldbe seen in radiation.

Hawking radiation would violate many cherished conservationlaws of particle physics. Let a hydrogen atom fall into a black hole.We’ve lost a lepton and a baryon, but if we want to preserve con-servation of lepton number and baryon number, we cover this upwith a fig leaf by saying that the black hole has simply increasedits lepton number and baryon number by +1 each. But eventuallythe black hole evaporates, and the evaporation is probably mostlyinto zero-mass particles such as photons. Once the hole has evapo-rated completely, our fig leaf has evaporated as well. There is nowno physical object to which we can attribute the +1 units of leptonand baryon number.

Black-hole complementarity

A very difficult question about the relationship between quan-tum mechanics and general relativity occurs as follows. In our ex-ample above, observer A detects an extremely red-shifted spectrumof light from the black hole. A interprets this as evidence thatthe space near the event horizon is actually an intense maelstromof radiation, with the temperature approaching infinity as one getscloser and closer to the horizon. If B returns from the region nearthe horizon, B will agree with this description. But suppose thatobserver C simply drops straight through the horizon. C does not

Section 6.3 Black holes 233

Page 234: General Relativity

feel any acceleration, so by the equivalence principle C does notdetect any radiation at all. Passing down through the event hori-zon, C says, “A and B are liars! There’s no radiation at all.” Aand B, however, C see as having entered a region of infinitely in-tense radiation. “Ah,” says A, “too bad. C should have turnedback before it got too hot, just as I did.” This is an example of aprinciple we’ve encountered before, that when gravity and quantummechanics are combined, different observers disagree on the numberof quanta present in the vacuum. We are presented with a paradox,because A and B believe in an entirely different version of realitythat C. A and B say C was fricasseed, but C knows that that didn’thappen. One suggestion is that this contradiction shows that theproper logic for describing quantum gravity is nonaristotelian, asdescribed on page 68. This idea, suggested by Susskind et al., goesby the name of black-hole complementarity, by analogy with NielsBohr’s philosophical description of wave-particle duality as being“complementary” rather than contradictory. In this interpretation,we have to accept the fact that C experiences a qualitatively differ-ent reality than A and B, and we comfort ourselves by recognizingthat the contradiction can never become too acute, since C is lostbehind the event horizon and can never send information back out.

6.3.7 Black holes in d dimensions

It has been proposed that our universe might actually have notd = 4 dimensions but some higher number, with the d − 4 “extra”ones being spacelike, and curled up on some small scale ρ so thatwe don’t see them in ordinary life. One candidate for such a scaleρ is the Planck length, and we then have to talk about theories ofquantum gravity such as string theory. On the other hand, it couldbe the 1 TeV electroweak scale; the motivation for such an idea isthat it would allow the unification of electroweak interactions withgravity. This idea goes by the name of “large extra dimensions” —“large” because ρ is bigger than the Planck length. In fact, in suchtheories the Planck length is the electroweak unification scale, andthe number normally referred to as the Planck length is not reallythe Planck length.15

In d dimensions, there are d−1 spatial dimensions, and a surfaceof spherical symmetry has d− 2. In the Newtonian weak-field limit,the density of gravitational field lines falls off like m/rd−2 with dis-tance from a source m, and we therefore find that Newton’s law ofgravity has an exponent of −(d − 2). If d 6= 3, we can integrate tofind that the gravitational potential varies as Φ ∼ −mr−(d−3). Pass-ing back to the weak-field limit of general relativity, the equivalenceprinciple dictates that the gtt term of the metric be approximately1 + 2Φ, so we find that the metric has the form

ds2 ≈ (1− 2mr−(d−3)) dt2 − (. . .) dr2 − r2 dθ2 − r2 sin2 θ dφ2 .

15Kanti, arxiv.org/abs/hep-ph/0402168

234 Chapter 6 Vacuum Solutions

Page 235: General Relativity

This looks like the Schwarzschild form with no other change thana generalization of the exponent, and in fact Tangherlini showedin 1963 that for d > 4, one obtains the exact solution simply byapplying the same change of exponent to grr as well.16

If large extra dimensions do exist, then this is the actual formof any black-hole spacetime for r ρ, where the background curva-ture of the extra dimensions is negligible. Since the exponents areall changed, gravitational forces become stronger than otherwise ex-pected at small distances, and it becomes easier to make black holes.It has been proposed that if large extra dimensions exist, microscopicblack holes would be observed at the Large Hadron Collider. Theywould immediately evaporate into Hawking radiation (p. 232), withan experimental signature of violating the standard conservationlaws of particle physics. As of 2010, the empirical results seem tobe negative.17

The reasoning given above fails in the case of d = 3, i.e., 2+1-dimensional spacetime, both because the integral of r−1 is not r0

and because the Tangherlini-Schwarzschild metric is not a vacuumsolution. As shown in problem 12 on p. 241, there is no counter-part of the Schwarzschild metric in 2+1 dimensions. This is essen-tially because for d = 3 mass is unitless, so given a source havinga certain mass, there is no way to set the distance scale at whichNewtonian weak-field behavior gives way to the relativistic strongfield. Whereas for d ≥ 4, Newtonian gravity is the limiting caseof relativity, for d = 3 they are unrelated theories. In fact, therelativistic theory of gravity for d = 3 is somewhat trivial. Space-time does not admit curvature in vacuum solutions,18 so that theonly nontrivial way to make non-Minkowski 2+1-dimensional space-times is by gluing together Minkowski pieces in various topologies,like gluing pieces of paper to make things like cones and Mobiusstrips. 2+1-dimensional gravity has conical singularities, but notSchwarzschild-style ones that are surrounded by curved spacetime.

If black-hole solutions exist in d dimensions, then one can extendsuch a solution to d+1 dimensions with cylindrical symmetry, form-ing a “black string.” The nonexistence of d = 3 black holes impliesthat black string solutions do not exist in our own d = 4 universe.However, different considerations arise in a universe with a negativecosmological constant (p. 291). There are then 2+1-dimensional so-lutions known as BTZ black holes.19 Since our own universe has apositive cosmological constant, not a negative one, we still find thatblack strings cannot exist.

16Emparan and Reall, “Black Holes in Higher Dimensions,” relativity.

livingreviews.org/Articles/lrr-2008-6/17http://arxiv.org/abs/1012.337518arxiv.org/abs/gr-qc/0503022v419arxiv.org/abs/gr-qc/9506079v1

Section 6.3 Black holes 235

Page 236: General Relativity

6.4 Degenerate solutionsThis section can be omitted on a first reading.

At the event horizon of the Schwarzschild spacetime, the timelikeand spacelike roles of the Schwarzschild r and t coordinates getswapped around, so that the signs in the metric change from +−−−to − + −−. In discussing cases like this, it becomes convenient todefine a new usage of the term “signature,” as s = p− q, where p isthe number of positive signs and q the number of negative ones. Thiscan also be represented by the pair of numbers (p, q). The exampleof the Schwarzschild horizon is not too disturbing, both becausethe funny behavior arises at a singularity that can be removed by achange of coordinates and because the signature stays the same. Anobserver who free-falls through the horizon observes that the localproperties of spacetime stay the same, with |s| = 2, as required bythe equivalence principle.

But this only makes us wonder whether there are other examplesin which an observer would actually detect a change in the metric’ssignature. We are encouraged to think of the signature as somethingempirically observable because, for example, it has been proposedthat our universe may have previously unsuspected additional space-like dimensions, and these theories make testable predictions. Sincewe don’t notice the extra dimensions in ordinary life, they wouldhave to be wrapped up into a cylindrical topology. Some such the-ories, like string theory, are attempts to create a theory of quantumgravity, so the cylindrical radius is assumed to be on the order ofthe Planck length, which corresponds quantum-mechanically to anenergy scale that we will not be able to probe using any foresee-able technology. But it is also possible that the radius is large —a possibility that goes by the name of “large extra dimensions” —so that we could see an effect at the Large Hadron Collider. Noth-ing in the formulation of the Einstein field equations requires a 3+1(i.e., (1, 3)) signature, and they work equally well if the signatureis instead 4+1, 5+1, . . . . Newton’s inverse-square law of gravity isdescribed by general relativity as arising from the three-dimensionalnature of space, so on small scales in a theory with n large extra di-mensions, the 1/r2 behavior changes over to 1/r2+n, and it becomespossible that the LHC could produce microscopic black holes, whichwould immediately evaporate into Hawking radiation in a charac-teristic way.

So it appears that the signature of spacetime is something thatis not knowable a priori, and must be determined by experiment.When a thing is supposed to be experimentally observable, generalrelativity tells us that it had better be coordinate-independent. Isthis so? A proposition from linear algebra called Sylvester’s law ofinertia encourages us to believe that it is. The theorem states thatwhen a real matrix A is diagonalized by a real, nonsingular change

236 Chapter 6 Vacuum Solutions

Page 237: General Relativity

a / The change of coordinates isdegenerate at t = 0.

of basis (a similarity transformation S−1AS), the number of posi-tive, negative, and zero diagonal elements is uniquely determined.Since a change of coordinates has the effect of applying a similar-ity transformation on the metric, it appears that the signature iscoordinate-independent.

This is not quite right, however, as shown by the following para-dox. The coordinate invariance of general relativity tells us that ifall clocks, everywhere in the universe, were to slow down simulta-neously (with simultaneity defined in any way we like), there wouldbe no observable consequences. This implies that the spacetimeds2 = −tdt2 − d`2, where d`2 = dx2 + dy2 + dz2, is empiricallyindistinguishable from a flat spacetime. Starting from t = −∞,the positive gtt component of the metric shrinks uniformly, whichshould be harmless. We can indeed verify by direct evaluation of theRiemann tensor that this is a flat spacetime (problem 10, p. 241).But for t > 0 the signature of the metric switches from +−−− to−−−−, i.e., from Lorentzian (|s| = 2) to Euclidean (|s| = 4). Thisis disquieting. For t < 0, the metric is a perfectly valid descriptionof our own universe (which is approximately flat). Time passes, andthere is no sign of any impending disaster. Then, suddenly, at somepoint in time, the entire structure of spacetime undergoes a horriblespasm. This is a paradox, because we could just as well have posedour initial conditions using some other coordinate system, in whichthe metric had the familiar form ds2 = dt2−d`2. General relativityis supposed to be agnostic about coordinates, but a choice of coor-dinate leads to a differing prediction about the signature, which isa coordinate-independent quantity.

We are led to the resolution of the paradox if we explicitlyconstruct the coordinate transformation involved. In coordinates(t,x, y, z), we have ds2 = −tdt2 − d`2. We would like to find therelationship between t and some other coordinate u such that werecover the familiar form ds2 = du2−d`2 for the metric. The tensortransformation law gives

gtt =

(∂u

∂t

)2

guu

−t =

(∂u

∂t

)2

with solution

u = ±2

3t3/2 , t < 0 .

There is no solution for t > 0.

If physicists living in this universe, at t < 0, for some reasonchoose t as their time coordinate, there is in fact a way for them totell that the cataclysmic event at t = 0 is not a reliable prediction.

Section 6.4 Degenerate solutions 237

Page 238: General Relativity

At t = 0, their metric’s time component vanishes, so its signaturechanges from + − −− to 0 − −−. At that moment, the machin-ery of the standard tensor formulation of general relativity breaksdown. For example, one can no longer raise indices, because gab isthe matrix inverse of gab, but gab is not invertible. Since the fieldequations are ultimately expressed in terms of the metric using ma-chinery that includes raising and lowering of indices, there is no wayto apply them at t = 0. They don’t make a false prediction of theend of the world; they fail to make any prediction at all. Physicistsaccustomed to working in terms of the t coordinate can simply throwup their hands and say that they have no way to predict anythingat t > 0. But they already know that their spacetime is one whoseobservables, such as curvature, are all constant with respect to time,so they should ask why this perfect symmetry is broken by singlingout t = 0. There is physically nothing that should make one mo-ment in time different than any other, so choosing a particular timeto call t = 0 should be interpreted merely as an arbitrary choice ofthe placement of the origin of the coordinate system. This suggeststo the physicists that all of the problems they’ve been having are notproblems with any physical meaning, but merely problems arisingfrom a poor choice of coordinates. They carry out the calculationabove, and discover the u time coordinate. Expressed in terms of u,the metric is well behaved, and the machinery of prediction neverbreaks down.

The paradox posed earlier is resolved because Sylvester’s lawof inertia only applies to a nonsingular transformation S. If S hadbeen singular, then the S−1 referred to in the theorem wouldn’t evenhave existed. But the transformation from u to t has ∂t/∂u = 0 atu = t = 0, so it is singular. This is all in keeping with the generalphilosophy of coordinate-invariance in relativity, which is that onlysmooth, one-to-one coordinate transformations are allowed. Some-one who has found a lucky coordinate like u, and who then con-templates transforming to t, should realize that it isn’t a good idea,because the transformation is not smooth and one-to-one. Someonewho has started by working with an unlucky coordinate like t findsthat the machinery breaks down at t = 0, and concludes that itwould be a good idea to search for a more useful set of coordinates.This situation can actually arise in practical calculations.

What about our original question: could the signature of space-time actually change at some boundary? The answer is now clear.Such a change of signature is something that could conceivablyhave intrinsic physical meaning, but if so, then the standard for-mulation of general relativity is not capable of making predictionsabout it. There are other formulations of general relativity, such asAshtekar’s, that are ordinarily equivalent to Einstein’s, but that arecapable of making predictions about changes of signature. However,there is more than one such formulation, and they do not agree on

238 Chapter 6 Vacuum Solutions

Page 239: General Relativity

their predictions about signature changes.

Section 6.4 Degenerate solutions 239

Page 240: General Relativity

Problems1 Show that in geometrized units, power is unitless. Find theequivalent in watts of a power that equals 1 in geometrized units.

2 The metric of coordinates (θ,φ) on the unit sphere is ds2 =dθ2 + sin2 θ dφ2. (a) Show that there is a singular point at whichgab → ∞. (b) Verify directly that the scalar curvature R = Raaconstructed from the trace of the Ricci tensor is never infinite. (c)Prove that the singularity is a coordinate singularity.

3 (a) Space probes in our solar system often use a slingshotmaneuver. In the simplest case, the probe is scattered gravitation-ally through an angle of 180 degrees by a planet. Show that in someother frame such as the rest frame of the sun, in which the planethas speed u toward the incoming probe, the maneuver adds 2u tothe speed of the probe. (b) Suppose that we replace the planet witha black hole, and the space probe with a light ray. Why doesn’t thisaccelerate the ray to a speed greater than c? . Solution, p. 385

4 An observer outside a black hole’s event horizon can neverobserve a test particle falling past the event horizon and later hittingthe singularity. We could therefore wonder whether general relativ-ity’s predictions about the interior of a black hole, and the singular-ity in particular, are even a testable scientific theory. However, theobserver could herself fall into the black hole. The question is thenwhether she would reach the singularity within a finite proper time;if so, then it is observable to her. The purpose of this problem is toprove that this is so, using the techniques of section 6.2.6, p. 217.Suppose for simplicity that the observer starts at rest far away fromthe black hole, and falls directly inward toward it. (a) In the nota-tion of section 6.2.6, what are the values of E and L in this case?(b) Find the function r(s), i.e., the observer’s Schwarzschild radialcoordinate as a function of her proper time, and show that she doesreach the singularity in finite proper time. . Solution, p. 385

5 The curve given parametrically by (cos3 t, sin3 t) is called anastroid. The arc length along this curve is given by s = (3/2) sin2 t,and its curvature by k = −(2/3) csc 2t. By rotating this astroidabout the x axis, we form a surface of revolution that can be de-scribed by coordinates (t,φ), where φ is the angle of rotation. (a)Find the metric on this surface. (b) Identify any singularities, andclassify them as coordinate or intrinsic singularities.

. Solution, p. 385

240 Chapter 6 Vacuum Solutions

Page 241: General Relativity

6 (a) Section 3.5.4 (p. 108) gave a flat-spacetime metric inrotating polar coordinates,

ds2 = (1− ω2r2) dt2 − dr2 − r2 dθ′2 − 2ωr2 dθ′ dt .

Identify the two values of r at which singularities occur, and classifythem as coordinate or non-coordinate singularities.(b) The corresponding spatial metric was found to be

ds2 = − dr2 − r2

1− ω2r2dθ′2 .

Identify the two values of r at which singularities occur, and classifythem as coordinate or non-coordinate singularities.(c) Consider the following argument, which is intended to providean answer to part b without any computation. In two dimensions,there is only one measure of curvature, which is equivalent (up to aconstant of proportionality) to the Gaussian curvature. The Gaus-sian curvature is proportional to the angular deficit ε of a triangle.Since the angular deficit of a triangle in a space with negative cur-vature satisfies the inequality −π < ε < 0, we conclude that theGaussian curvature can never be infinite. Since there is only onemeasure of curvature in a two-dimensional space, this means thatthere is no non-coordinate singularity. Is this argument correct,and is the claimed result consistent with your answers to part b?

. Solution, p. 385

7 The first experimental verification of gravitational redshiftswas a measurement in 1925 by W.S. Adams of the spectrum of lightemitted from the surface of the white dwarf star Sirius B. Sirius Bhas a mass of 0.98M and a radius of 5.9×106 m. Find the redshift.

8 Show that, as claimed on page 225, applying the change ofcoordinates t′ = t−2m ln(r−2m) to the Schwarzschild metric resultsin a metric for which grr and gt′t′ never blow up, but that gt

′t′ doesblow up.

9 Use the geodesic equation to show that, in the case of acircular orbit in a Schwarzschild metric, d2 t/ ds2 = 0. Explain whythis makes sense.

10 Verify by direct calculation, as asserted on p. 237, that theRiemann tensor vanishes for the metric ds2 = −tdt2 − d`2, whered`2 = dx2 + dy2 + dz2. . Solution, p. 386

11 Suppose someone proposes that the vacuum field equationof general relativity isn’t Rab = 0 but rather Rab = k, where k issome constant that describes an innate tendency of spacetime tohave tidal distortions. Explain why this is not a good proposal.

. Solution, p. 386

12 Prove, as claimed on p. 234, that in 2+1 dimensions, with avanishing cosmological constant, there is no nontrivial Schwarzschildmetric. . Solution, p. 386

Problems 241

Page 242: General Relativity

13 On p. 213 I argued that there is no way to define a time-reversal operation in general relativity so that it applies to all space-times. Why can’t we define it by picking some arbitrary space-like surface that covers the whole universe, flipping the velocity ofevery particle on that surface, and evolving a new version of thespacetime backward and forward from that surface using the fieldequations? . Solution, p. 387

242 Chapter 6 Vacuum Solutions

Page 243: General Relativity

a / The two-dimensional spacehas a symmetry which can bevisualized by imagining it as asurface of revolution embeddedin three-space. Without referenceto any extrinsic features suchas coordinates or embedding,an observer on this surface candetect the symmetry, becausethere exists a vector field ξdusuch that translation by ξdudoesn’t change the distancebetween nearby points.

b / Wilhelm Killing (1847-1923).

Chapter 7

Symmetries

This chapter is not required in order to understand the later mate-rial.

7.1 Killing vectorsThe Schwarzschild metric is an example of a highly symmetric space-time. It has continuous symmetries in space (under rotation) and intime (under translation in time). In addition, it has discrete sym-metries under spatial reflection and time reversal. In section 6.2.6,we saw that the two continuous symmetries led to the existence ofconserved quantities for the trajectories of test particles, and thatthese could be interpreted as mass-energy and angular momentum.

Generalizing, we want to consider the idea that a metric maybe invariant when every point in spacetime is systematically shiftedby some infinitesimal amount. For example, the Schwarzschild met-ric is invariant under t → t + dt. In coordinates (x0,x1,x2,x3) =(t, r, θ,φ), we have a vector field (dt, 0, 0, 0) that defines the time-translation symmetry, and it is conventional to split this into twofactors, a finite vector field ξ and an infinitesimal scalar, so that thedisplacement vector is

ξ dt = (1, 0, 0, 0) dt .

Such a field is called a Killing vector field, or simply a Killing vector,after Wilhelm Killing. When all the points in a space are displacedas specified by the Killing vector, they flow without expansion orcompression. The path of a particular point, such as the dashed linein figure a, under this flow is called its orbit. Although the term“Killing vector” is singular, it refers to the entire field of vectors,each of which differs in general from the others. For example, theξ shown in figure a has a greater magnitude than a ξ near the neckof the surface.

The infinitesimal notation is designed to describe a continuoussymmetry, not a discrete one. For example, the Schwarzschild space-time also has a discrete time-reversal symmetry t→ −t. This can’tbe described by a Killing vector, because the displacement in timeis not infinitesimal.

243

Page 244: General Relativity

c / Vectors at a point P on asphere can be visualized asoccupying a Euclidean plane thatis particular to P.

The Euclidean plane Example: 1The Euclidean plane has two Killing vectors corresponding totranslation in two linearly independent directions, plus a third Killingvector for rotation about some arbitrarily chosen origin O. In Carte-sian coordinates, one way of writing a complete set of these is is

ξ1 = (1, 0)ξ2 = (0, 1)ξ3 = (−y , x) .

A theorem from classical geometry1 states that any transforma-tion in the Euclidean plane that preserves distances and hand-edness can be expressed either as a translation or as a rota-tion about some point. The transformations that do not preservehandedness, such as reflections, are discrete, not continuous.This theorem tells us that there are no more Killing vectors to befound beyond these three, since any translation can be accom-plished using ξ1 and ξ2, while a rotation about a point P can bedone by translating P to O, rotating, and then translating O backto P.

In the example of the Schwarzschild spacetime, the componentsof the metric happened to be independent of t when expressed inour coordinates. This is a sufficient condition for the existence of aKilling vector, but not a necessary one. For example, it is possibleto write the metric of the Euclidean plane in various forms suchas ds2 = dx2 + dy2 and ds2 = dr2 + r2 dφ2. The first form isindependent of x and y, which demonstrates that x → x + dx andy → y + dy are Killing vectors, while the second form gives usφ→ φ+dφ. Although we may be able to find a particular coordinatesystem in which the existence of a Killing vector is manifest, itsexistence is an intrinsic property that holds regardless of whether weeven employ coordinates. In general, we define a Killing vector not interms of a particular system of coordinates but in purely geometricalterms: a space has a Killing vector ξ if translation by an infinitesimalamount ξ du doesn’t change the distance between nearby points.Statements such as “the spacetime has a timelike Killing vector” aretherefore intrinsic, since both the timelike property and the propertyof being a Killing vector are coordinate-independent.

Killing vectors, like all vectors, have to live in some kind ofvector space. On a manifold, this vector space is particular to agiven point, figure c. A different vector space exists at every point,so that vectors at different points, occupying different spaces, canbe compared only by parallel transport. Furthermore, we reallyhave two such spaces at a given point, a space of contravariantvectors and a space of covariant ones. These are referred to as thetangent and cotangent spaces.The infinitesimal displacements we’ve

1Coxeter, Introduction to Geometry, ch. 3

244 Chapter 7 Symmetries

Page 245: General Relativity

been discussing belong to the contravariant (upper-index) space, butby lowering and index we can just as well discuss them as covariantvectors. The customary way of notating Killing vectors makes use ofthe fact, mentioned in passing on p. 199, that the partial derivativeoperators ∂0, ∂1, ∂2, ∂3 form the basis for a vector space. In thisnotation, the Killing vector of the Schwarzschild metric we’ve beendiscussing can be notated simply as

ξ = ∂t .

The partial derivative notation, like the infinitesimal notation,implicitly refers to continuous symmetries rather than discrete ones.If a discrete symmetry carries a point P1 to some distant point P2,then P1 and P2 have two different tangent planes, so there is not auniquely defined notion of whether vectors ξ1 and ξ2 at these twopoints are equal — or even approximately equal. There can there-fore be no well-defined way to construe a statement such as, “P1 andP2 are separated by a displacement ξ.” In the case of a continuoussymmetry, on the other hand, the two tangent planes come closerand closer to coinciding as the distance s between two points on anorbit approaches zero, and in this limit we recover an approximatenotion of being able to compare vectors in the two tangent planes.They can be compared by parallel transport, and although paralleltransport is path-dependent, the difference bewteen paths is pro-portional to the area they enclose, which varies as s2, and thereforebecomes negligible in the limit s→ 0.

Self-check: Find another Killing vector of the Schwarzschild met-ric, and express it in the tangent-vector notation.

It can be shown that an equivalent condition for a field to be aKilling vector is ∇aξb +∇bξa = 0. This relation, called the Killingequation, is written without reference to any coordinate system, inkeeping with the coordinate-independence of the notion.

When a spacetime has more than one Killing vector, any lin-ear combination of them is also a Killing vector. This means thatalthough the existence of certain types of Killing vectors may beintrinsic, the exact choice of those vectors is not.

Euclidean translations Example: 2The Euclidean plane has two translational Killing vectors (1, 0)and (0, 1), i.e., ∂x and ∂y . These same vectors could be ex-pressed as (1, 1) and (1,−1) in coordinate system that was rescaledand rotated by 45 degrees.

A cylinder Example: 3The local properties of a cylinder, such as intrinsic flatness, arethe same as the local properties of a Euclidean plane. Since thedefinition of a Killing vector is local and intrinsic, a cylinder hasthe same three Killing vectors as a plane, if we consider only a

Section 7.1 Killing vectors 245

Page 246: General Relativity

d / Example 3: A cylinderhas three local symmetries, butonly two that can be extendedglobally to make Killing vectors.

patch on the cylinder that is small enough so that it doesn’t wrapall the way around. However, only two of these — the translations— can be extended to form a smooth vector field on the entiresurface of the cylinder. These might be more naturally notated in(φ, z) coordinates rather than (x , y ), giving ∂z and ∂φ.

A sphere Example: 4A sphere is like a plane or a cylinder in that it is a two-dimensionalspace in which no point has any properties that are intrinsicallydifferent than any other. We might expect, then, that it wouldhave two Killing vectors. Actually it has three, ξx , ξy , and ξz , cor-responding to infinitesimal rotations about the x , y , and z axes.To show that these are all independent Killing vectors, we needto demonstrate that we can’t, for example, have ξx = c1ξy + c2ξzfor some constants c1 and c2. To see this, consider the actions ofξy and ξz on the point P where the x axis intersects the sphere.(References to the axes and their intersection with the sphere areextrinsic, but this is only for convenience of description and vi-sualization.) Both ξy and ξz move P around a little, and thesemotions are in orthogonal directions, wherease ξx leaves P fixed.This proves that we can’t have ξx = c1ξy + c2ξz . All three Killingvectors are linearly independent.

This example shows that linear independence of Killing vectorscan’t be visualized simply by thinking about the vectors in thetangent plane at one point. If that were the case, then we couldhave at most two linearly independent Killing vectors in this two-dimensional space. When we say “Killing vector” we’re really re-ferring to the Killing vector field, which is defined everywhere onthe space.

Proving nonexistence of Killing vectors Example: 5. Find all Killing vectors of these two metrics:

ds2 = e−x dx2 + ex dy2

ds2 = dx2 + x2 dy2 .

. Since both metrics are manifestly independent of y , it followsthat ∂y is a Killing vector for both of them. Neither one has anyother manifest symmetry, so we can reasonably conjecture thatthis is the only Killing vector either one of them has. However, onecan have symmetries that are not manifest, so it is also possiblethat there are more.

One way to attack this would be to use the Killing equation to finda system of differential equations, and then determine how manylinearly independent solutions there were.

But there is a simpler approach. The dependence of these met-rics on x suggests that the spaces may have intrinsic propertiesthat depend on x ; if so, then this demonstrates a lower symmetry

246 Chapter 7 Symmetries

Page 247: General Relativity

than that of the Euclidean plane, which has three Killing vectors.One intrinsic property we can check is the scalar curvature R.The following Maxima code calculates R for the first metric.

1 load(ctensor);

2 dim:2;

3 ct_coords:[x,y];

4 lg:matrix([exp(-x),0],[0,exp(x)]);

5 cmetric();

6 R:scurvature(); /* scalar curvature */

The result is R = −ex , which demonstrates that points that differin x have different intrinsic properties. Since the flow of a Killingfield ξ can never connect points that have different properties,we conclude that ξx = 0. If only ξy can be nonzero, the Killingequation ∇aξb +∇bξa = 0 simplifies to ∇xξy = ∇yξy = 0. Theseequations constrain both ∂xξy and ∂yξy , which means that givena value of ξy at some point in the plane, its value everywhereelse is determined. Therefore the only possible Killing vectorsare scalar multiples of the Killing vector already found. Since wedon’t consider Killing vectors to be distinct unless they are linearlyindependent, the first metric only has one Killing vector.

A similar calculation for the second metric shows that R = 0, andan explicit calculation of its Riemann tensor shows that in factthe space is flat. It is simply the Euclidean plane written in funnycoordinates. This metric has the same three Killing vectors as theEuclidean plane.

It would have been tempting to leap to the wrong conclusion aboutthe second metric by the following reasoning. The signature of ametric is an intrinsic property. The metric has signature ++ every-where in the plane except on the y axis, where it has signature+0. This shows that the y axis has different intrinsic propertiesthan the rest of the plane, and therefore the metric must have alower symmetry than the Euclidean plane. It can have at most twoKilling vectors, not three. This contradicts our earlier conclusion.The resolution of this paradox is that this metric has a removabledegeneracy of the same type as the one described in section 6.4.As discussed in that section, the signature is invariant only undernonsingular transformations, but the transformation that convertsthese coordinates to Cartesian ones is singular.

7.1.1 Inappropriate mixing of notational systems

Confusingly, it is customary to express vectors and dual vectorsby summing over basis vectors like this:

v = vµ∂µ

ω = ωµ dxµ .

Section 7.1 Killing vectors 247

Page 248: General Relativity

This is an abuse of notation, driven by the desire to have up-downpairs of indices to sum according to the usual rules of the Einsteinnotation convention. But by that convention, a quantity like v orω with no indices is a scalar, and that’s not the case here. Theproducts on the right are not tensor products, i.e., the indices aren’tbeing contracted.

This muddle is the result of trying to make the Einstein notationdo too many things at once and of trying to preserve a clumsy andoutdated system of notation and terminology originated by Sylvesterin 1853. In pure abstract index notation, there are not six flavors ofobjects as in the two equations above but only two: vectors like va

and dual vectors like ωa. The Sylvester notation is the prevalent oneamong mathematicians today, because their predecessors committedthemselves to it a century before the development of alternativeslike abstract index notation and birdtracks. The Sylvester system isinconsistent with the way physicists today think of vectors and dualvectors as being defined by their transformation properties, becauseSylvester considers v and ω to be invariant.

Mixing the two systems leads to the kinds of notational clashesdescribed above. As a particularly absurd example, a physicist whois asked to suggest a notation for a vector will typically pick up apen and write vµ. We are then led to say that a vector is written ina concrete basis as a linear combination of dual vectors ∂µ!

7.1.2 Conservation laws

Whenever a spacetime has a Killing vector, geodesics have aconstant value of vbξb, where vb is the velocity four-vector. Forexample, because the Schwarzschild metric has a Killing vector ξ =∂t, test particles have a conserved value of vt, and therefore we alsohave conservation of pt, interpreted as the mass-energy.

Energy-momentum in flat 1+1 spacetime Example: 6A flat 1+1-dimensional spacetime has Killing vectors ∂x and ∂t .Corresponding to these are the conserved momentum and mass-energy, p and E . If we do a Lorentz boost, these two Killing vec-tors get mixed together by a linear transformation, correspondingto a transformation of p and E into a new frame.

In addition, one can define a globally conserved quantity foundby integrating the flux density P a = T abξb over the boundary of anycompact orientable region.2 In case of a flat spacetime, there areenough Killing vectors to give conservation of energy-momentumand angular momentum.

2Hawking and Ellis, The Large Scale Structure of Space-Time, p. 62, givea succinct treatment that describes the flux densities and proves that Gauss’stheorem, which ordinarily fails in curved spacetime for a non-scalar flux, holds inthe case where the appropriate Killing vectors exist. For an explicit descriptionof how one can integrate to find a scalar mass-energy, see Winitzki, Topics inGeneral Relativity, section 3.1.5, available for free online.

248 Chapter 7 Symmetries

Page 249: General Relativity

7.2 Spherical symmetryA little more work is required if we want to link the existence ofKilling vectors to the existence of a specific symmetry such as spher-ical symmetry. When we talk about spherical symmetry in the con-text of Newtonian gravity or Maxwell’s equations, we may say, “Thefields only depend on r,” implicitly assuming that there is an r coor-dinate that has a definite meaning for a given choice of origin. Butcoordinates in relativity are not guaranteed to have any particu-lar physical interpretation such as distance from a particular origin.The origin may not even exist as part of the spacetime, as in theSchwarzschild metric, which has a singularity at the center. Anotherpossibility is that the origin may not be unique, as on a Euclideantwo-sphere like the earth’s surface, where a circle centered on thenorth pole is also a circle centered on the south pole; this can alsooccur in certain cosmological spacetimes that describe a universethat wraps around on itself spatially.

We therefore define spherical symmetry as follows. A spacetimeS is spherically symmetric if we can write it as a union S = ∪sr,t ofnonintersecting subsets sr,t, where each s has the structure of a two-sphere, and the real numbers r and t have no preassigned physicalinterpretation, but sr,t is required to vary smoothly as a functionof them. By “has the structure of a two-sphere,” we mean that nointrinsic measurement on s will produce any result different from theresult we would have obtained on some two-sphere. A two-spherehas only two intrinsic properties: (1) it is spacelike, i.e., locally itsgeometry is approximately that of the Euclidean plane; (2) it hasa constant positive curvature. If we like, we can require that theparameter r be the corresponding radius of curvature, in which caset is some timelike coordinate.

To link this definition to Killing vectors, we note that condition2 is equivalent to the following alternative condition: (2′) The sets should have three Killing vectors (which by condition 1 are bothspacelike), and it should be possible to choose these Killing vectorssuch that algebraically they act the same as the ones constructedexplicitly in example 4 on p. 246. As an example of such an algebraicproperty, figure a shows that rotations are noncommutative.

Section 7.2 Spherical symmetry 249

Page 250: General Relativity

a / Performing the rotations in oneorder gives one result, 3, while re-versing the order gives a differentresult, 5.

A cylinder is not a sphere Example: 7. Show that a cylinder does not have the structure of a two-sphere.

. The cylinder passes condition 1. It fails condition 2 because itsGaussian curvature is zero. Alternatively, it fails condition 2′ be-cause it has only two independent Killing vectors (example 3).

A plane is not a sphere Example: 8. Show that the Euclidean plane does not have the structure of atwo-sphere.

. Condition 2 is violated because the Gaussian curvature is zero.Or if we wish, the plane violates 2′ because ∂x and ∂y commute,but none of the Killing vectors of a 2-sphere commute.

7.3 Penrose diagramsMany of the most difficult and interesting topics in general relativ-ity can be studied within the context of two types of sphericallysymmetric spacetimes: (1) a nonrotating black hole, such as theSchwarzschild spacetime; and (2) a cosmological spacetime that ishomogeneous and isotropic. (In the latter case, there is a higherlevel of symmetry, since there is no central point.) When a space-time is spherically symmetric, it can be represented using a Penrosediagram, also known as a Penrose-Carter diagram or causal dia-gram. (A Penrose diagram can also be constructed in the case of aspacetime in which tidal forces vanish, even if there is no sphericalsymmetry.)

250 Chapter 7 Symmetries

Page 251: General Relativity

a / Penrose diagram for flatspacetime.

b / Penrose diagram forSchwarzschild spacetime: ablack hole that didn’t form bygravitational collapse.

7.3.1 Flat spacetime

As a warmup, figure a shows a Penrose diagram for flat (Minkowski)spacetime. The diagram looks 1 + 1-dimensional, but the conven-tion is that spherical symmetry is assumed, so two more dimensionsare hidden, and we’re really portraying 3 + 1 dimensions. A typicalpoint on the interior of the diamond region represents a 2-sphere.On this type of diagram, light cones look just like they would on anormal spacetime diagram of Minkowski space, but distance scalesare highly distorted. The diamond represents the entire spacetime,with the distortion fitting this entire infinite region into that finitearea on the page. Despite the distortion, the diagram shows lightlikesurfaces as 45-degree diagonals. Spacelike and timelike geodesics,however, are distorted, as shown by the curves in the diagram.

The distortion becomes greater as we move away from the centerof the diagram, and becomes infinite near the edges. Because ofthis infinite distortion, the points i− and i+ actually represent 3-spheres. All timelike curves start at i− and end at i+, which areidealized points at infinity, like the vanishing points in perspectivedrawings. We can think of i+ as the “Elephants’ graveyard,” wheremassive particles go when they die. Similarly, lightlike curves endon I + (which includes its mirror image on the left), referred to asnull infinity.3 The point at i0 is an infinitely distant endpoint forspacelike curves. Because of the spherical symmetry, the left andright halves of the diagram are redundant.

It is possible to make up explicit formulae that translate backand forth between Minkowski coordinates and points on the dia-mond, but in general this is not necessary. In fact, the utility ofthe diagrams is that they let us think about causal relationships incoordinate-independent ways. A light cone on the diagram looksexactly like a normal light cone.

Since this particular spacetime is homogeneous, it makes no dif-ference what spatial location on the diagram we pick as our axisof symmetry. For example, we could arbitrarily pick the left-handcorner, the central timelike geodesic (drawn straight) or one of theother timelike geodesics (represented as if it were curved).

7.3.2 Schwarzschild spacetime

Figure b is a Penrose diagram for the Schwarzschild spacetime,i.e., a Minkowski spacetime in which a black hole exists, but inwhich the black hole did not form by gravitational collapse. Thisspacetime isn’t homogeneous; it has a specific location that is itscenter of spherical symmetry, and this is the vertical line on theleft marked r = 0. The triangle is the spacetime inside the eventhorizon; we could have copied it across the r = 0 line if we had sodesired, but the copies would have been redundant.

3See also p. 258.

Section 7.3 Penrose diagrams 251

Page 252: General Relativity

c / Penrose diagram for ablack hole formed by gravitationalcollapse.

d / According to the distantobserver, does the infalling mat-ter ever reach the singularity?

The Penrose diagram makes it easy to reason about causal re-lationships. For example, we can see that if a particle reaches apoint inside the event horizon, its entire causal future lies insidethe horizon, and all of its possible future world-lines intersect thesingularity. The horizon is a lightlike surface, which makes sense,because it’s defined as the boundary of the set of points from whicha light ray could reach I +.

7.3.3 Astrophysical black hole

Figure c is a Penrose diagram for a black hole that has formed bygravitational collapse. Using this type of diagram, we can addressone of the most vexing FAQs about black holes. If a distant observerwatches the collapsing cloud of matter from which the black holeforms, her optical observations will show that the light from thematter becomes more and more gravitationally redshifted, and ifshe wishes, she can interpret this as an example of gravitationaltime dilation. As she waits longer and longer, the light signals fromthe infalling matter take longer and longer to arrive. The redshiftapproaches infinity as the matter approaches the horizon, so thelight waves ultimately become too low in energy to be detectable byany given instrument. Furthermore, her patience (or her lifetime)will run out, because the time on her clock approaches infinity asshe waits to get signals from matter that is approaching the horizon.This is all exactly as it should be, since the horizon is by definitionthe boundary of her observable universe. (A light ray emitted fromthe horizon will end up at i+, which is an end-point of timelike world-lines reached only by observers who have experienced an infiniteamount of proper time.)

People who are bothered by these issues often acknowledge theexternal unobservability of matter passing through the horizon, andthen want to pass from this to questions like, “Does that mean theblack hole never really forms?” This presupposes that our distantobserver has a uniquely defined notion of simultaneity that appliesto a region of space stretching from her own position to the interiorof the black hole, so that she can say what’s going on inside the blackhole “now.” But the notion of simultaneity in general relativity iseven more limited than its counterpart in special relativity. Notonly is simultaneity in general relativity observer-dependent, as inspecial relativity, but it is also local rather than global.

In figure d, E is an event on the world-line of an observer. Thespacelike surface S1 is one possible “now” for this observer. Accord-ing to this surface, no particle has ever fallen in and reached thehorizon; every such particle has a world-line that intersects S1, andtherefore it’s still on its way in.

S2 is another possible “now” for the same observer at the sametime. According to this definition of “now,” all the particles havepassed the event horizon, but none have hit the singularity yet.

252 Chapter 7 Symmetries

Page 253: General Relativity

Finally, S3 is a “now” according to which all the particles have hitthe singularity.

If this was special relativity, then we could decide which surfacewas the correct notion of simultaneity for the observer, based onthe observer’s state of motion. But in general relativity, this onlyworks locally (which is why I made all three surfaces coincide nearE). There is no well-defined way of deciding which is the correct wayof globally extending this notion of simultaneity.

Although it may seem strange that we can’t say whether thesingularity has “already” formed according to a distant observer,this is really just an inevitable result of the fact that the singularityis spacelike. The same thing happens in the case of a Schwarzschildspacetime, which we think of as a description of an eternal blackhole, i.e., one that has always existed and always will. On thesimilar Penrose diagram for an eternal black hole, we can still drawa spacelike surface like S1 or S2, representing a definition of “now”such that the singularity doesn’t exist yet.

7.4 Static and stationary spacetimes7.4.1 Stationary spacetimes

When we set out to describe a generic spacetime, the Alicein Wonderland quality of the experience is partly because coordi-nate invariance allows our time and distance scales to be arbitrarilyrescaled, but also partly because the landscape can change from onemoment to the next. The situation is drastically simplified when thespacetime has a timelike Killing vector. Such a spacetime is said tobe stationary. Two examples are flat spacetime and the spacetimesurrounding the rotating earth (in which there is a frame-draggingeffect). Non-examples include the solar system, cosmological mod-els, gravitational waves, and a cloud of matter undergoing gravita-tional collapse.

Can Alice determine, by traveling around her spacetime andcarrying out observations, whether it is stationary? If it’s not, thenshe might be able to prove it. For example, suppose she visits acertain region and finds that the Kretchmann invariant RabcdRabcdvaries with time in her frame of reference. Maybe this is becausean asteroid is coming her way, in which case she could readjust hervelocity vector to match that of the asteroid. Even if she can’t seethe asteroid, she can still try to find a velocity that makes her localgeometry stop changing in this particular way. If the spacetime istruly stationary, then she can always “tune in” to the right velocityvector in this way by searching systematically. If this procedure everfails, then she has proved that her spacetime is not stationary.

Self-check: Why is the timelike nature of the Killing vector im-portant in this story?

Section 7.4 Static and stationary spacetimes 253

Page 254: General Relativity

Proving that a spacetime is stationary is harder. This is partlyjust because spacetime is infinite, so it will take an infinite amountof time to check everywhere. We aren’t inclined to worry too muchabout this limitation on our geometrical knowledge, which is of atype that has been familiar since thousands of years ago, when itupset the ancient Greeks that the parallel postulate could only bechecked by following lines out to an infinite distance. But there is anew type of limitation as well. The Schwarzschild spacetime is notstationary according to our definition. In the coordinates used insection 6.2, ∂t is a Killing vector, but is only timelike for r > 2m; forr < 2m it is spacelike. Although the solution describes a black holethat is going to sit around forever without changing, no observercan ever verify that fact, because once she strays inside the horizonshe must follow a timelike world-line, which will end her program ofobservation within some finite time.

7.4.2 Isolated systems

Asymptotic flatness

This unfortunate feature of our definition of stationarity — itsempirical unverifiability — is something that in general we just haveto live with. But there is an alternative in the special case of anisolated system, such as our galaxy or a black hole. It may be agood approximation to ignore distant matter, modeling such a sys-tem with a spacetime that is almost flat everywhere except in theregion nearby. Such a spacetime is called asymptotically flat. For-mulating the definition of this term rigorously and in a coordinate-invariant way involves a large amount of technical machinery, sincewe are not guaranteed to be presented in advance with a special,physically significant set of coordinates that would lead directly toa quantitative way of defining words like “nearby.” The reader whowants a rigorous definition is referred to Hawking and Ellis.

Asymptotically stationary spacetimes

In the case of an asymptotically flat spacetime, we say that it isalso asymoptotically stationary if it has a Killing vector that be-comes timelike far away. Some authors (e.g., Ludvigsen) define“stationary” to mean what I’m calling “asymoptotically stationary,”others (Hawking and Ellis) define it the same way I do, and still oth-ers (Carroll) are not self-consistent. The Schwarzschild spacetime isasymptotically stationary, but not stationary.

7.4.3 A stationary field with no other symmetries

Consider the most general stationary case, in which the onlyKilling vector is the timelike one. The only ambiguity in the choiceof this vector is a rescaling; its direction is fixed. At any given pointin space, we therefore have a notion of being at rest, which is tohave a velocity vector parallel to the Killing vector. An observer atrest detects no time-dependence in quantities such as tidal forces.

254 Chapter 7 Symmetries

Page 255: General Relativity

Points in space thus have a permanent identity. The gravita-tional field, which the equivalence principle tells us is normally anelusive, frame-dependent concept, now becomes more concrete: it isthe proper acceleration required in order to stay in one place. Wecan therefore use phrases like “a stationary field,” without the usualcaveats about the coordinate-dependent meaning of “field.”

Space can be sprinkled with identical clocks, all at rest. Fur-thermore, we can compare the rates of these clocks, and even com-pensate for mismatched rates, by the following procedure. Since thespacetime is stationary, experiments are reproducible. If we send aphoton or a material particle from a point A in space to a point B,then identical particles emitted at later times will follow identicaltrajectories. The time lag between the arrival of two such particlstells an observer at B the amount of time at B that corresponds toa certain interval at A. If we wish, we can adjust all the clocks sothat their rates are matched. An example of such rate-matching isthe GPS satellite system, in which the satellites’ clocks are tunedto 10.22999999543 MHz, matching the ground-based clocks at 10.23MHz. (Strictly speaking, this example is out of place in this subsec-tion, since the earth’s field has an additional azimuthal symmetry.)

It is tempting to conclude that this type of spacetime comesequipped with a naturally preferred time coordinate that is uniqueup to a global affine transformation t → at + b. But to constructsuch a time coordinate, we would have to match not just the rates ofthe clocks, but also their phases. The best method relativity allowsfor doing this is Einstein synchronization (p. 359), which involvestrading a photon back and forth between clocks A and B and ad-justing the clocks so that they agree that each clock gets the photonat the mid-point in time between its arrivals at the other clock. Thetrouble is that for a general stationary spacetime, this procedure isnot transitive: synchronization of A with B, and of B with C, doesnot guarantee agreement between A with C. This is because the timeit takes a photon to travel clockwise around triangle ABCA may bedifferent from the time it takes for the counterclockwise itineraryACBA. In other words, we may have a Sagnac effect, which is gen-erally interpreted as a sign of rotation. Such an effect will occur,for example, in the field of the rotating earth, and it cannot beeliminated by choosing a frame that rotates along with the earth,because the surrounding space experiences a frame-dragging effect,which falls off gradually with distance.

Although a stationary spacetime does not have a uniquely pre-ferred time, it does prefer some time coordinates over others. In astationary spacetime, it is always possible to find a “nice” t suchthat the metric can be expressed without any t-dependence in itscomponents.

Section 7.4 Static and stationary spacetimes 255

Page 256: General Relativity

7.4.4 A stationary field with additional symmetries

Most of the results given above for a stationary field with noother symmetries also hold in the special case where additional sym-metries are present. The main difference is that we can make linearcombinations of a particular timelike Killing vector with the otherKilling vectors, so the timelike Killing vector is not unique. Thismeans that there is no preferred notion of being at rest. For exam-ple, in a flat spacetime we cannot define an observer to be at rest ifshe observes no change in the local observables over time, becausethat is true for any inertial observer. Since there is no preferred restframe, we can’t define the gravitational field in terms of that frame,and there is no longer any preferred definition of the gravitationalfield.

7.4.5 Static spacetimes

In addition to synchronizing all clocks to the same frequency, wemight also like to be able to match all their phases using Einsteinsynchronization, which requires transitivity. Transitivity is frame-dependent. For example, flat spacetime allows transitivity if we usethe usual coordinates. However, if we change into a rotating frame ofreference, transitivity fails (see p. 108). If coordinates exist in whicha particular spacetime has transitivity, then that spacetime is saidto be static. In these coordinates, the metric is diagonalized, andsince there are no space-time cross-terms like dx dt in the metric,such a spacetime is invariant under time reversal. Roughly speaking,a static spacetime is one in which there is no rotation.

7.4.6 Birkhoff’s theorem

Birkhoff’s theorem, proved below, states that in the case ofspherical symmetry, the vacuum field equations have a solution,the Schwarzschild spacetime, which is unique up to a choice of co-ordinates and the value of m. Let’s enumerate the assumptionsthat went into our derivation of the Schwarzschild metric on p. 211.These were: (1) the vacuum field equations, (2) spherical symmetry,(3) asymptotic staticity, (4) a certain choice of coordinates, and (5)Λ = 0. Birkhoff’s theorem says that the assumption of staticity wasnot necessary. That is, even if the mass distribution contracts andexpands over time, the exterior solution is still the Schwarzschildsolution. Birkhoff’s theorem holds because gravitational waves aretransverse, not longitudinal (see p. 349), so the mass distribution’sradial throbbing cannot generate a gravitational wave. Birkhoff’stheorem can be viewed as the simplest of the no-hair theorems de-scribing black holes. The most general no-hair theorem states thata black hole is completely characterized by its mass, charge, andangular momentum. Other than these three numbers, nobody onthe outside can recover any information that was possessed by thematter and energy that were sucked into the black hole.

256 Chapter 7 Symmetries

Page 257: General Relativity

It has been proposed4 that the no-hair theorem for nonzero an-gular momentum and zero charge could be tested empirically byobservations of Sagittarius A*. If the observations are consistentwith the no-hair theorem, it would be taken as supporting the va-lidity of general relativity and the interpretation of this object as asupermassive black hole. If not, then there are various possibilities,including a failure of general relativity to be the correct theory ofstrong gravitational fields, or a failure of one of the theorem’s otherassumptions, such as the nonexistence of closed timelike curves inthe surrounding universe.

Proof of Birkhoff’s theorem: Spherical symmetry guaranteesthat we can introduce coordinates r and t such that the surfacesof constant r and t have the structure of a sphere with radius r. Onone such surface we can introduce colatitude and longitude coordi-nates θ and φ. The (θ,φ) coordinates can be extended in a naturalway to other values of r by choosing the radial lines to lie in thedirection of the covariant derivative vector5 ∇ar, and this ensuresthat the metric will not have any nonvanishing terms in dr dθ ordr dφ, which could only arise if our choice had broken the symme-try between positive and negative values of dθ and dφ. Just as wewere free to choose any way of threading lines of constant (θ,φ, t)between spheres of different radii, we can also choose how to threadlines of constant (θ,φ, r) between different times, and this can bedone so as to keep the metric free of any time-space cross-terms suchas dθ dt. The metric can therefore be written in the form6

ds2 = h(t, r) dt2 − k(t, r) dr2 − r2(dθ2 + sin2 θ dφ2) .

This has to be a solution of the vacuum field equations, Rab = 0,and in particular a quick calculation with Maxima shows that Rrt =−∂tk/k2r, so k must be independent of time. With this restriction,we find Rrr = −∂rh/hkr − 1/r2 − 1/kr2 = 0, and since k is time-independent, ∂rh/h is also time-independent. This means that fora particular time to, the function f(r) = h(to, r) has some universalshape set by a differential equation, with the only possible ambiguitybeing an over-all scaling that depends on to. But since h is the time-time component of the metric, this scaling corresponds physically to

4Johannsen and Psaltis, http://arxiv.org/abs/1008.3902v15It may seem backwards to start talking about the covariant derivative of

a particular coordinate before a complete coordinate system has even been in-troduced. But (excluding the trivial case of a flat spacetime), r is not just anarbitrary coordinate, it is something that an observer at a certain point in space-time can determine by mapping out a surface of geometrically identical points,and then determining that surface’s radius of curvature. Another worry is that itis possible for ∇ar to misbehave on certain surfaces, such as the event horizon ofthe Schwarzschild spacetime, but we can simply require that radial lines remaincontinuous as they pass through these surfaces.

6On the same surfaces referred to in the preceding footnote, the functions hand k may to go to 0 or ∞. These turn out to be nothing more serious thancoordinate singularities.

Section 7.4 Static and stationary spacetimes 257

Page 258: General Relativity

a situation in which every clock, all over the universe, speeds up andslows down in unison. General relativity is coordinate-independent,so this has no observable effects, and we can absorb it into a re-definition of t that will cause h to be time-independent. Thus themetric can be expressed in the time-independent diagonal form

ds2 = h(r) dt2 − k(r) dr2 − r2(dθ2 + sin2 θ dφ2) .

We have already solved the field equations for a metric of this formand found as a solution the Schwarzschild spacetime.7 Since themetric’s components are all independent of t, ∂t is a Killing vec-tor, and it is timelike for large r, so the Schwarzschild spacetime isasymptotically static.

The no-hair theorems say that relativity only has a small reper-toire of types of black-hole singularities, defined as singularities in-side regions of space that are causally disconnected from the uni-verse, in the sense that future light-cones of points in the regiondo not extend to infinity.8 That is, a black hole is defined as asingularity hidden behind an event horizon, and since the defini-tion of an event horizon is dependent on the observer, we specifyan observer infinitely far away. The theorems cannot classify nakedsingularities, i.e., those not hidden behind horizons, because the roleof naked singularities in relativity is the subject of the cosmic cen-sorship hypothesis, which is an open problem. The theorems donot rule out the Big Bang singularity, because we cannot define thenotion of an observer infinitely far from the Big Bang. We can alsosee that Birkhoff’s theorem does not prohibit the Big Bang, becausecosmological models are not vacuum solutions with Λ = 0. Blackstring solutions are not ruled out by Birkhoff’s theorem because theywould lack spherical symmetry, so we need the arguments given onp. 234 to show that they don’t exist.

7.4.7 The gravitational potential

When Pound and Rebka made the first observation of gravita-tional redshifts, these shifts were interpreted as evidence of gravita-tional time dilation, i.e., a mismatch in the rates of clocks. We areaccustomed to connecting these two ideas by using the expression

7The Schwarzschild spacetime is the uniquely defined geometry found by re-moving the coordinate singularities from this form of the Schwarzschild metric.

8For a more formal statement of this, see Hawking and Ellis, “The Large ScaleStructure of Space-Time,” p. 315. Essentially, the region must be a connectedregion on a spacelike three-surface, and there must be no lightlike world-linesthat connect points in that region to null infinity. Null infinity was introducedbriefly on p. 251 is defined formally using conformal techniques, but basicallyrefers to points that are infinitely far away in both space and time, and havethe two infinities equal in a certain sense, so that a free light ray could end upthere. The definition is based on the assumption that the surrounding spacetimeis asymptotically flat, since otherwise null infinity can’t be defined. It is notactually necessary to assume a singularity as part of the definition; the no-hairtheorems guarantee that one exists.

258 Chapter 7 Symmetries

Page 259: General Relativity

e−∆Φ for the ratio of the rates of two clocks (example 12, p. 59),where Φ is a function of the spatial coordinates, and this is in factthe most general possible definition of a gravitational potential Φin relativity. Since a stationary field allows us to compare ratesof clocks, it seems that we should be able to define a gravitationalpotential for any stationary field. There is a problem, however, be-cause when we talk about a potential, we normally have in mindsomething that has encoded within it all there is to know about thefield. We would therefore expect to be able to find the metric fromthe potential. But the example of the rotating earth shows that thisneed not be the case for a general stationary field. In that example,there are effects like frame-dragging that clearly cannot be deducedfrom Φ; for by symmetry, Φ is independent of azimuthal angle, andtherefore it cannot distinguish between the direction of rotation andthe contrary direction. In a static spacetime, these rotational effectsdon’t exist; a static vacuum spacetime can be described completelyin terms of a single scalar potential plus information about the spa-tial curvature.

There are two main reasons why relativity does not offer a grav-itational potential with the same general utility as its Newtoniancounterpart.

The Einstein field equations are nonlinear. Therefore one can-not, in general, find the field created by a given set of sources byadding up the potentials. At best this is a possible weak-field ap-proximation. In particular, although Birkhoff’s theorem is in someways analogous to the Newtonian shell theorem, it cannot be usedto find the metric of an arbitrary spherically symmetric mass distri-bution by breaking it up into spherical shells.

It is also not meaningful to talk about any kind of gravitationalpotential for spacetimes that aren’t static or stationary. For exam-ple, consider a cosmological model describing our expanding uni-verse. Such models are usually constructed according to the Coper-nican principle that no position in the universe occupies a privilegedplace. In other words, they are homogeneous in the sense that theyhave Killing vectors describing arbitrary translations and rotations.Because of this high degree of symmetry, a gravitational potentialfor such a model would have to be independent of position, and thenit clearly could not encode any information about the spatial part ofthe metric. Even if we were willing to make the potential a functionof time, Φ(t), the results would still be nonsense. The gravitationalpotential is defined in terms of rate-matching of clocks, so a poten-tial that was purely a function of time would describe a situationin which all clocks, everywhere in the universe, were changing theirrates in a uniform way. But this is clearly just equivalent to a redefi-nition of the time coordinate, which has no observable consequencesbecause general relativity is coordinate-invariant. A corollary is thatin a cosmological spacetime, it is not possible to give a natural pre-

Section 7.4 Static and stationary spacetimes 259

Page 260: General Relativity

scription for deciding whether a particular redshift is gravitational(measured by Φ) or kinematic, or some combination of the two (seealso p. 312).

7.5 The uniform gravitational field revisitedThis section gives a somewhat exotic example. It is not necessaryto read it in order to understand the later material.

In problem 6 on page 200, we made a wish list of desired proper-ties for a uniform gravitational field, and found that they could notall be satisfied at once. That is, there is no global solution to theEinstein field equations that uniquely and satisfactorily embodiesall of our Newtonian ideas about a uniform field. We now revisitthis question in the light of our new knowledge.

The 1+1-dimensional metric

ds2 = e2gz dt2 − dz2

is the one that uniquely satisfies our expectations based on theequivalence principle (example 12, p. 59), and it is a vacuum so-lution. We might logically try to generalize this to 3+1 dimensionsas follows:

ds2 = e2gz dt2 − dx2 − dy2 − dz2 .

But a funny thing happens now — simply by slapping on the twonew Cartesian axes x and y, it turns out that we have made ourvacuum solution into a non-vacuum solution, and not only that,but the resulting stress-energy tensor is unphysical (ch. 8, problem8, p. 341).

One way to proceed would be to relax our insistence on makingthe spacetime one that exactly embodies the equivalence principle’srequirements for a uniform field.9 This can be done by taking gtt =e2Φ, where Φ is not necessarily equal to 2gz. By requiring that themetric be a 3+1 vacuum solution, we arrive at a differential equationwhose solution is Φ = ln(z + k1) + k2, which recovers the flat-spacemetric that we found in example 19 on page 140 by applying achange of coordinates to the Lorentz metric.

What if we want to carry out the generalization from 1+1 to 3+1without violating the equivalence principle? For physical motivationin how to get past this obstacle, consider the following argumentmade by Born in 1920.10 Take a frame of reference tied to a rotat-ing disk, as in the example from which Einstein originally took much

9Thanks to physicsforums.com user Mentz114 for suggesting this approachand demonstrating the following calculation.

10Max Born, Einstein’s Theory of Relativity, 1920. In the 1962 Dover edition,the relevant passage is on p. 320

260 Chapter 7 Symmetries

Page 261: General Relativity

of the motivation for creating a geometrical theory of gravity (sub-section 3.5.4, p. 108). Clocks near the edge of the disk run slowly,and by the equivalence principle, an observer on the disk interpretsthis as a gravitational time dilation. But this is not the only rela-tivistic effect seen by such an observer. Her rulers are also Lorentzcontracted as seen by a non-rotating observer, and she interpretsthis as evidence of a non-Euclidean spatial geometry. There aresome physical differences between the rotating disk and our defaultconception of a uniform field, specifically in the question of whetherthe metric should be static (i.e., lacking in cross-terms between thespace and time variables). But even so, these considerations makeit natural to hypothesize that the correct 3+1-dimensional metricshould have transverse spatial coefficients that decrease with height.

With this motivation, let’s consider a metric of the form

ds2 = e2z dt2 − e−2jz dx2 − e−2kz dy2 − dz2 ,

where j and k are constants, and I’ve taken g = 1 for convenience.11

The following Maxima code calculates the scalar curvature and theEinstein tensor:

1 load(ctensor);

2 ct_coords:[t,x,y,z];

3 lg:matrix([exp(2*z),0,0,0],

4 [0,-exp(-2*j*z),0,0],

5 [0,0,-exp(-2*k*z),0],

6 [0,0,0,-1]

7 );

8 cmetric();

9 scurvature();

10 leinstein(true);

The output from line 9 shows that the scalar curvature is constant,which is a necessary condition for any spacetime that we want tothink of as representing a uniform field. Inspecting the Einsteintensor output by line 10, we find that in order to get Gxx and Gyyto vanish, we need j and k to be (1±

√3i)/2. By trial and error, we

find that assigning the complex-conjugate values to j and k makesGtt and Gzz vanish as well, so that we have a vacuum solution.This solution is, unfortunately, complex, so it is not of any obviousvalue as a physically meaningful result. Since the field equationsare nonlinear, we can’t use the usual trick of forming real-valuedsuperpositions of the complex solutions. We could try simply tak-ing the real part of the metric. This gives gxx = e−z cos

√3z and

gyy = e−z sin√

3z, and is unsatisfactory because the metric becomes

11A metric of this general form is referred to as a Kasner metric. One usuallysees it written with a logarithmic change of variables, so that z appears in thebase rather than in the exponent.

Section 7.5 The uniform gravitational field revisited 261

Page 262: General Relativity

degenerate (has a zero determinant) at z = nπ/2√

3, where n is aninteger.

It turns out, however, that there is a very similar solution, foundby Petrov in 1962,12 that is real-valued. The Petrov metric, whichdescribes a spacetime with cylindrical symmetry, is:

ds2 = −dr2 − e−2r dz2 + er[2 sin√

3r dφ dt− cos√

3r(dφ2 − dt2)]

Note that it has many features in common with the complex oscilla-tory solution we found above. There are transverse length contrac-tions that decay and oscillate in exactly the same way. The presenceof the dφ dt term tells us that this is a non-static, rotating solution— exactly like the one that Einstein and Born had in mind in theirprototypical example! We typically obtain this type of effect dueto frame dragging by some rotating massive body (see p. 149), andthe Petrov solution can indeed be interpreted as the spacetime thatexists in the vacuum on the exterior of an infinite, rigidly rotatingcylinder of “dust” (see p. 132).

The complicated Petrov metric might seem like the furthest pos-sible thing from a uniform gravitational field, but in fact it is aboutthe closest thing general relativity provides to such a field. Wefirst note that the metric has Killing vectors ∂z, ∂φ, and ∂r, so ithas at least three out of the four translation symmetries we ex-pect from a uniform field. By analogy with electromagnetism, wewould expect this symmetry to be absent in the radial direction,since by Gauss’s law the electric field of a line of charge falls offlike 1/r. But surprisingly, the Petrov metric is also uniform ra-dially. It is possible to give the fourth killing vector explicitly (itis ∂r + z∂z + (1/2)(

√3t − φ)∂φ − (1/2)(

√3φ + t)∂t), but it is per-

haps more transparent to check that it represents a field of constantstrength (problem 4, p. 265).

For insight into this surprising result, recall that in our attemptat constructing the Cartesian version of this metric, we ran into theproblem that the metric became degenerate at z = nπ/2

√3. The

presence of the dφ dt term prevents this from happening in Petrov’scylindrical version; two of the metric’s diagonal components canvanish at certain values of r, but the presence of the off-diagonalcomponent prevents the determinant from going to zero. (The de-terminant is in fact equal to −1 everywhere.) What is happeningphysically is that although the labeling of the φ and t coordinatessuggests a time and an azimuthal angle, these two coordinates arein fact treated completely symmetrically. At values of r where thecosine factor equals 1, the metric is diagonal, and has signature

12Petrov, in Recent Developments in General Relativity, 1962, Pergamon, p.383. For a presentation that is freely accessible online, see Gibbons and Gielen,“The Petrov and Kaigorodov-Ozsvath Solutions: Spacetime as a Group Mani-fold,” arxiv.org/abs/0802.4082.

262 Chapter 7 Symmetries

Page 263: General Relativity

(t,φ, r, z) = (+,−,−,−), but when the cosine equals −1, this be-comes (−, +,−,−), so that φ is now the timelike coordinate. Thisperfect symmetry between φ and t is an extreme example of frame-dragging, and is produced because of the specially chosen rate ofrotation of the dust cylinder, such that the velocity of the dust atthe outer surface is exactly c (or approaches it).

Classically, we would expect that a test particle released closeenough to the cylinder would be pulled in by the gravitational at-traction and destroyed on impact, while a particle released fartheraway would fly off due to the centrifugal force, escaping and even-tually approaching a constant velocity. Neither of these would beanything like the experience of a test particle released in a uniformfield. But consider a particle released at rest in the rotating frameat a radius r1 for which cos

√3r1 = 1, so that t is the timelike co-

ordinate. The particle accelerates (let’s say outward), but at somepoint it arrives at an r2 where the cosine equals zero, and the φ− tpart of the metric is purely of the form dφ dt. At this location, wecan define local coordinates u = φ − t and v = φ + t, so that themetric depends only on du2− dv2. One of the coordinates, say u, isnow the timelike one. Since our particle is material, its world-linemust be timelike, so it is swept along in the −φ direction. Gibbonsand Gielen show that the particle will now come back inward, andcontinue forever by oscillating back and forth between two radii atwhich the cosine vanishes.

7.5.1 Closed timelike curves

This oscillation still doesn’t sound like the motion of a particlein a uniform field, but another strange thing happens, as we cansee by taking another look at the values of r at which the cosinevanishes. At such a value of r, construct a curve of the form (t =constant, r = constant,φ, z = constant). This is a closed curve, andits proper length is zero, i.e., it is lightlike. This violates causality.A photon could travel around this path and arrive at its startingpoint at the same time when it was emitted. Something similarlyweird hapens to the test particle described above: whereas it seemsto fall sometimes up and sometimes down, in fact it is always fallingdown — but sometimes it achieves this by falling up while movingbackward in time!

Although the Petov metric violates causality, Gibbons and Gie-len have shown that it satisfies the chronology protection conjecture:“In the context of causality violation we have shown that one cannotcreate CTCs [closed timelike curves] by spinning up a cylinder be-yond its critical angular velocity by shooting in particles on timelikeor null curves.”

We have an exact vacuum solution to the Einstein field equationsthat violates causality. This raises troublesome questions about thelogical self-consistency of general relativity. A very readable and

Section 7.5 The uniform gravitational field revisited 263

Page 264: General Relativity

entertaining overview of these issues is given in the final chapter ofKip Thorne’s Black Holes and Time Warps: Einstein’s OutrageousLegacy. In a toy model constructed by Thorne’s students, involv-ing a billiard ball and a wormhole, it turned out that there alwaysseemed to be self-consistent solutions to the ball’s equations of mo-tion, but they were not unique, and they often involved disquietingpossibilities in which the ball went back in time and collided with itsearlier self. Among other things, this seems to lead to a violation ofconservation of mass-energy, since no mass was put into the systemto create extra copies of the ball. This would then be an exampleof the fact that, as discussed in section 4.5.1, general relativity doesnot admit global conservation laws. However, there is also an ar-gument that the mouths of the wormhole change in mass in such away as to preserve conservation of energy.13

13http://golem.ph.utexas.edu/string/archives/000550.html

264 Chapter 7 Symmetries

Page 265: General Relativity

Problems1 Example 3 on page 245 gave the Killing vectors ∂z and ∂φ ofa cylinder. If we express these instead as two linearly independentKilling vectors that are linear combinations of these two, what isthe geometrical interpretation?

2 Section 7.4 told the story of Alice trying to find evidence thather spacetime is not stationary, and also listed the following exam-ples of spacetimes that were not stationary: (a) the solar system,(b) cosmological models, (c) gravitational waves propagating at thespeed of light, and (d) a cloud of matter undergoing gravitationalcollapse. For each of these, show that it is possible for Alice toaccomplish her mission.

3 If a spacetime has a certain symmetry, then we expect thatsymmetry to be detectable in the behavior of curvature scalars suchas the scalar curvature R = Raa and the Kretchmann invariantk = RabcdRabcd.(a) Show that the metric

ds2 = e2gz dt2 − dx2 − dy2 − dz2

from page 260 has constant values of R = 1/2 and k = 1/4. Notethat Maxima’s ctensor package has built-in functions for these; youhave to call the lriemann and uriemann before calling them.(b) Similarly, show that the Petrov metric

ds2 = −dr2 − e−2r dz2 + er[2 sin√

3r dφ dt− cos√

3r(dφ2 − dt2)]

(p. 262) has R = 0 and k = 0.

Remark: Surprisingly, one can have a spacetime on which every possible curva-ture invariant vanishes identically, and yet which is not flat. See Coley, Hervik,and Pelavas, “Spacetimes characterized by their scalar curvature invariants,”arxiv.org/abs/0901.0791v2.

4 Section 7.5 on page 260 presented the Petrov metric. Thepurpose of this problem is to verify that the gravitational field itrepresents does not fall off with distance. For simplicity, let’s restrictour attention to a particle released at an r such that cos

√3r = 1,

so that t is the timelike coordinate. Let the particle be released atrest in the sense that initially it has z = r = φ = 0, where dotsrepresent differentiation with respect to the particle’s proper time.Show that the magnitude of the proper acceleration is independentof r. . Solution, p. 387

5 The idea that a frame is “rotating” in general relativity canbe formalized by saying that the frame is stationary but not static.Suppose someone says that any rotation must have a center. Givea counterexample. . Solution, p. 387

Problems 265

Page 266: General Relativity

266 Chapter 7 Symmetries

Page 267: General Relativity

Chapter 8

Sources

8.1 Sources in general relativity8.1.1 Point sources in a background-independent theory

The Schrodinger equation and Maxwell’s equations treat space-time as a stage on which particles and fields act out their roles.General relativity, however, is essentially a theory of spacetime it-self. The role played by atoms or rays of light is so peripheralthat by the time Einstein had derived an approximate version ofthe Schwarzschild metric, and used it to find the precession of Mer-cury’s perihelion, he still had only vague ideas of how light and mat-ter would fit into the picture. In his calculation, Mercury played therole of a test particle: a lump of mass so tiny that it can be tossedinto spacetime in order to measure spacetime’s curvature, withoutworrying about its effect on the spacetime, which is assumed to benegligible. Likewise the sun was treated as in one of those orches-tral pieces in which some of the brass play from off-stage, so as toproduce the effect of a second band heard from a distance. Its massappears simply as an adjustable parameter m in the metric, and ifwe had never heard of the Newtonian theory we would have had noway of knowing how to interpret m.

When Schwarzschild published his exact solution to the vacuumfield equations, Einstein suffered from philosophical indigestion. Hisstrong belief in Mach’s principle led him to believe that there was aparadox implicit in an exact spacetime with only one mass in it. IfEinstein’s field equations were to mean anything, he believed thatthey had to be interpreted in terms of the motion of one body rela-tive to another. In a universe with only one massive particle, therewould be no relative motion, and so, it seemed to him, no motionof any kind, and no meaningful interpretation for the surroundingspacetime.

Not only that, but Schwarzschild’s solution had a singularityat its center. When a classical field theory contains singularities,Einstein believed, it contains the seeds of its own destruction. Aswe’ve seen on page 230, this issue is still far from being resolved, acentury later.

However much he might have liked to disown it, Einstein wasnow in possession of a solution to his field equations for a pointsource. In a linear, background-dependent theory like electromag-

267

Page 268: General Relativity

netism, knowledge of such a solution leads directly to the ability towrite down the field equations with sources included. If Coulomb’slaw tells us the 1/r2 variation of the electric field of a point charge,then we can infer Gauss’s law. The situation in general relativityis not this simple. The field equations of general relativity, unlikethe Gauss’s law, are nonlinear, so we can’t simply say that a planetor a star is a solution to be found by adding up a large number ofpoint-source solutions. It’s also not clear how one could represent amoving source, since the singularity is a point that isn’t even partof the continuous structure of spacetime (and its location is alsohidden behind an event horizon, so it can’t be observed from theoutside).

8.1.2 The Einstein field equation

The Einstein tensor

Given these difficulties, it’s not surprising that Einstein’s firstattempt at incorporating sources into his field equation was a deadend. He postulated that the field equation would have the Riccitensor on one side, and the stress-energy tensor T ab (page 161) onthe other,

Rab = 8πTab ,

where a factor of G/c4 on the right is suppressed by our choiceof units, and the 8π is determined on the basis of consistency withNewtonian gravity in the limit of weak fields and low velocities. Theproblem with this version of the field equations can be demonstratedby counting variables. R and T are symmetric tensors, so the fieldequation contains 10 constraints on the metric: 4 from the diagonalelements and 6 from the off-diagonal ones.

In addition, local conservation of mass-energy requires the div-ergence-free property ∇bT ab = 0. In order to construct an example,we recall that the only component of T for which we have so farintroduced any physical interpretation is T tt, which gives the den-sity of mass-energy. Suppose we had a stress-energy tensor whosecomponents were all zero, except for a time-time component varyingas T tt = kt. This would describe a region of space in which mass-energy was uniformly appearing or disappearing everywhere at aconstant rate. To forbid such examples, we need the divergence-free property to hold. This is exactly analogous to the continuityequation in fluid mechanics or electromagnetism, ∂ρ/∂t+∇ · J = 0(or ∇aJa = 0), which states that the quantity of fluid or charge isconserved.

But imposing the divergence-free condition adds 4 more con-straints on the metric, for a total of 14. The metric, however, is asymmetric rank-2 tensor itself, so it only has 10 independent com-ponents. This overdetermination of the metric suggests that theproposed field equation will not in general allow a solution to beevolved forward in time from a set of initial conditions given on a

268 Chapter 8 Sources

Page 269: General Relativity

spacelike surface, and this turns out to be true. It can in fact beshown that the only possible solutions are those in which the tracesR = Raa and T = T aa are constant throughout spacetime.

The solution is to replace Rab in the field equations with adifferent tensor Gab, called the Einstein tensor, defined by Gab =Rab − (1/2)Rgab,

Gab = 8πTab .

The Einstein tensor is constructed exactly so that it is divergence-free, ∇bGab = 0. (This is not obvious, but can be proved by directcomputation.) Therefore any stress-energy tensor that satisfies thefield equation is automatically divergenceless, and thus no additionalconstraints need to be applied in order to guarantee conservation ofmass-energy.

Self-check: Does replacing Rab with Gab invalidate the Schwarz-schild metric?

This procedure of making local conservation of mass-energy “bakedin” to the field equations is analogous to the way conservation ofcharge is treated in electricity and magnetism, where it follows fromMaxwell’s equations rather than having to be added as a separateconstraint.

Interpretation of the stress-energy tensor

The stress-energy tensor was briefly introduced in section 5.2 onpage 161. By applying the Newtonian limit of the field equationto the Schwarzschild metric, we find that T tt is to be identified asthe mass density ρ. The Schwarzschild metric describes a spacetimeusing coordinates in which the mass is at rest. In the cosmologicalapplications we’ll be considering shortly, it also makes sense to adopta frame of reference in which the local mass-energy is, on average,at rest, so we can continue to think of T tt as the (average) massdensity. By symmetry, T must be diagonal in such a frame. Forexample, if we had T tx 6= 0, then the positive x direction wouldbe distinguished from the negative x direction, but there is nothingthat would allow such a distinction.

Dust in a different frame Example: 1As discussed in example 14 on page 132, it is convenient incosmology to distinguish between radiation and “dust,” meaningnoninteracting, nonrelativistic materials such as hydrogen gas orgalaxies. Here “nonrelativistic” means that in the comoving frame,in which the average flow of dust vanishes, the dust particles allhave |v | 1. What is the stress-energy tensor associated withdust?

Since the dust is nonrelativistic, we can obtain the Newtonian limitby using units in which c 6= 1, and letting c approach infinity. InCartesian coordinates, the components of the stress-energy have

Section 8.1 Sources in general relativity 269

Page 270: General Relativity

units that cause them to scale like

T µν ∝

1 1/c 1/c 1/c

1/c 1/c2 1/c2 1/c2

1/c 1/c2 1/c2 1/c2

1/c 1/c2 1/c2 1/c2

.

In the limit of c → ∞, we can therefore take the only source ofgravitational fields to be T t t , which in Newtonian gravity must bethe mass density ρ, so

T µν =

ρ 0 0 00 0 0 00 0 0 00 0 0 0

.

Under a Lorentz boost by v in the x direction, the tensor transfor-mation law gives

T µ′ν′ =

γ2ρ γ2vρ 0 0γ2vρ γ2v2ρ 0 0

0 0 0 00 0 0 0

.

The over-all factor of γ2 arises because of the combination oftwo effects: each dust particle’s mass-energy is increased by afactor of γ, and length contraction also multiplies the density ofdust particles by a factor of γ. In the limit of small boosts, thestress-energy tensor becomes

T µ′ν′ ≈

ρ vρ 0 0

vρ 0 0 00 0 0 00 0 0 0

.

This motivates the interpretation of the time-space componentsof T as the flux of mass-energy along each axis. In the primedframe, mass-energy with density ρ flows in the x direction at ve-locity v , so that the rate at which mass-energy passes through awindow of area A in the y − z plane is given by ρvA.

This is also consistent with our imposition of the divergence-freeproperty, by which we were essentially stating T tx to be the rateof flow of T t t .

The center of mass-energy Example: 2In Newtonian mechanics, for motion in one dimension, the to-

tal momentum of a system of particles is given by ptot = Mvcm,where M is the total mass and vcm the velocity of the center ofmass. Is there such a relation in relativity?

270 Chapter 8 Sources

Page 271: General Relativity

Since mass and energy are equivalent, we expect that the rela-tivistic equivalent of the center of mass would have to be a centerof mass-energy.

It should also be clear that a center of mass-energy can only bewell defined for a region of spacetime that is small enough so thateffects due to curvature are negligible. For example, we can havecosmological models in which space is finite, and expands like thesurface of a balloon being blown up. If the model is homogeneous(there are no “special points” on the surface of the balloon), thenthere is no point in space that could be a center. (A real balloonhas a center, but in our metaphor only the balloon’s spherical sur-face correponds to physical space.) The fundamental issue hereis the same geometrical one that caused us to conclude that thereis no global conservation of mass-energy in general relativity (seesection 4.5.1). In a curved spacetime, parallel transport is path-dependent, so we can’t unambiguously define a way of addingvectors that occur in different places. The center of mass is de-fined by a sum of position vectors. From these considerations weconclude that the center of mass-energy is only well defined inspecial relativity, not general relativity.

For simplicity, let’s restrict ourselves to 1+1 dimensions, and adopta frame of reference in which the center of mass is at rest at x = 0.

Since T t t is interpreted as the density of mass-energy, the posi-tion of the center of mass must be given by

0 =∫

xT t t dx .

By analogy with the Newtonian relation ptot = Mvcm, let’s seewhat happens when we differentiate with respect to time. The ve-locity of the center of mass is then 0 = dxcm/dt =

∫∂tT t tx dx .

Applying the divergence-free property ∂tT t t + ∂xT tx = 0, this be-comes 0 = −

∫∂xT txx dx . Integration by parts gives us finally

0 =∫

T tx dx .

We’ve already interpreted T tx as the rate of flow of mass-energy,which is another way of describing momentum. We can thereforeinterpret T tx as the density of momentum, and the right-hand sideof this equation as the total momentum. The interpretation is thata system’s center of mass-energy is at rest if and only if it haszero total momentum.

Suppose, for example, that we prepare a uniform metal rod sothat one end is hot and the other cold. We then deposit it in outerspace, initially motionless relative to some observer. Althoughthe rod itself is uniform, its mass-energy is very slightly nonuni-form, so its center of mass-energy must be displaced a tiny bit

Section 8.1 Sources in general relativity 271

Page 272: General Relativity

away from the center, toward the hot end. As the rod approachesthermal equilibrium, the observer sees it accelerate very slightlyand then come to rest again, so that its center of mass-energyremains fixed! An even stranger case is described in example 9on p. 286.

Since the Einstein tensor is symmetric, the Einstein field equa-tion requires that the stress-energy tensor be symmetric as well. Itis reassuring that according to example 1 the tensor is symmetricfor dust, and that symmetry is preserved by changes of coordinatesand by superpositions of sources. Besides dust, the other cosmo-logically significant sources of gravity are electromagnetic radiationand the cosmological constant, and one can also check that thesegive symmetry. Belinfante noted in 1939 that symmetry seemed tofail in the case of fields with intrinsic spin, but he found that thisproblem could be avoided by modifying the previously assumed wayof connecting T to the properties of the field. This shows that it canbe rather subtle to interpret the stress-energy tensor and connect itto experimental observables. For more on this connection, and thecase of electromagnetic fields, see examples 7 and 8 on p. 283.

In example 1, we found that T xt had to be interpreted as theflux of T tt (i.e., the flux of mass-energy) across the x axis. Lorentzinvariance requires that we treat t, x, y, and z symmetrically, andthis forces us to adopt the following interpretation: Tµν , where µ isspacelike, is the flux of the density of the mass-energy four-vectorin the µ direction. In the comoving frame, in Cartesian coordi-nates, this means that T xx, T yy, and T zz should be interpreted aspressures. For example, T xx is the flux in the x direction of x-momentum. This is simply the pressure, P , that would be exertedon a surface with its normal in the x direction, so in the comov-ing frame we have Tµν = diag(ρ,P ,P ,P ). For a fluid that is notin equilibrium, the pressure need not be isotropic, and the stressexerted by the fluid need not be perpendicular to the surface onwhich it acts. The space-space components of T would then be theclassical stress tensor, whose diagonal elements are the anisotropicpressure, and whose off-diagonal elements are the shear stress. Thisis the reason for calling T the stress-energy tensor.

The prediction of general relativity is then that pressure acts as agravitational source with exactly the same strength as mass-energydensity. This has important implications for cosmology, since theearly universe was dominated by radiation, and a photon gas hasP = ρ/3 (example 14, p. 132).

Experimental tests

But how do we know that this prediction is even correct? Canit be verified in the laboratory? The classic laboratory test of thestrength of a gravitational source is the 1797 Cavendish experiment,in which a torsion balance was used to measure the very weak grav-

272 Chapter 8 Sources

Page 273: General Relativity

a / A Cavendish balance, usedto determine the gravitationalconstant.

itational attractions between metal spheres. We could test this as-pect of general relativity by doing a Cavendish experiment withboxes full of photons, so that the pressure is of the same orderof magnitude as the mass-energy. This is unfortunately utterly im-practical, since both P and ρ for a well-lit box are ridiculously smallcompared to ρ for a metal ball.

However, the repulsive electromagnetic pressure inside an atomicnucleus is quite large by ordinary standards — about 1033 Pa! Tosee how big this is compared to the nuclear mass density of ρ ∼1018 kg/m3, we need to take into account the factor of c2 6= 1 in SIunits, the result being that P/ρ is about 10−2, which is not too small.Thus if we measure gravitational interactions of nuclei with differentvalues of P/ρ, we should be able to test this prediction of generalrelativity. This was done in a Princeton PhD-thesis experiment byKreuzer1 in 1966.

Before we can properly describe and interpret the Kreuzer ex-periment, we need to distinguish the several different types of massthat could in principle be different from one another in a theory ofgravity. We’ve already encountered the distinction between inertialand gravitational mass, which Eotvos experiments (p. ??) show tobe equivalent to about one part in 1012. But there is also a dis-tinction between an object’s active gravitational mass ma, whichmeasures its ability to create gravitational fields, and its passivegravitational mass mp, which measures the force it feels when placedin an externally generated field. For experiments using laboratory-scale material objects at nonrelativistic velocities, the Newtonianlimit applies, and we can think of active gravitational mass as ascalar, with a density T tt = ρ.

To understand how this relates to pressure as a source of gravi-tational fields, it is helpful to consider a case where P is about thesame as ρ, which occurs for light. Light is inherently relativistic, sothe Newtonian concept of a scalar gravitational mass breaks down,but we can still use “mass” in quotes to talk qualitatively aboutan electromagnetic wave’s active and passive participation in grav-itational effects. Experiments show that general relativity correctlypredicts the deflection of light by the sun to about one part in 105

(p. 222). This is the electromagnetic equivalent of an Eotvos exper-iment; it shows that general relativity predicts the right thing aboutthe proportion between a light wave’s inertial and passive gravita-tional “masses.” Now suppose that general relativity was wrong,and pressure was not a source of gravitational fields. This wouldcause a drastic decrease in the active gravitational “mass” of anelectromagnetic wave.

The Kreuzer experiment actually dealt with static electric fieldsinside nuclei, not electromagnetic waves, but it is still clear what we

1Kreuzer, Phys. Rev. 169 (1968) 1007

Section 8.1 Sources in general relativity 273

Page 274: General Relativity

b / A simplified diagram ofKreuzer’s modification. Themoving teflon mass is submergedin a liquid with nearly the samedensity.

should expect in general: if pressure does not act as a gravitationalsource, then the ratio ma/mp should be different for different nuclei.Specifically, it should be lower for a nucleus with a higher atomicnumber Z, in which the electrostatic pressures are higher.

Kreuzer did a Cavendish experiment, figure b, using massesmade of two different substances. The first substance was teflon.The second substance was a mixture of the liquids trichloroethyleneand dibromoethane, with the proportions chosen so as to give a den-sity as close as possible to that of teflon. Teflon is 76% fluorine byweight, and the liquid is 74% bromine. Fluorine has atomic numberZ = 9, bromine Z = 35, and since the electromagnetic force has along range, the pressure within a nucleus scales upward roughly likeZ1/3 (because any given proton is acted on by Z − 1 other protons,and the size of a nucleus scales like Z1/3, so P ∝ Z/(Z1/3)2). Thesolid mass was immersed in the liquid, and the combined gravita-tional field of the solid and the liquid was detected by a Cavendishbalance.

Ideally, one would formulate the liquid mixture so that its passive-mass density was exactly equal to that of teflon, as determined bybuoyancy. Any oscillation in the torque measured by the Cavendishbalance would then indicate an inequivalence between active andpassive gravitational mass.

c / The Kreuzer experiment. 1. There are two passive masses, P, and an active mass A consisting ofa single 23-cm diameter teflon cylinder immersed in a fluid. The teflon cylinder is driven back and forthwith a period of 400 s. The resulting deflection of the torsion beam is monitored by an optical lever andcanceled actively by electrostatic forces from capacitor plates (not shown). The voltage required for this activecancellation is a measure of the torque exerted by A on the torsion beam. 2. Active mass as a function oftemperature. 3. Passive mass as a function of temperature. In both 2 and 3, temperature is measured in unitsof ohms, i.e., the uncalibrated units of a thermistor that was immersed in the liquid.

In reality, the two substances involved had different coefficientsof thermal expansion, so slight variations in temperature made theirpassive-mass densities unequal. Kreuzer therefore measured boththe buoyant force and the gravitational torque as functions of tem-perature. He determined that these became zero at the same tem-

274 Chapter 8 Sources

Page 275: General Relativity

perature, to within experimental errors, which verified the equiva-lence of active and passive gravitational mass to within a certainprecision,

mp ∝ ma

to within 5× 10−5.

Kreuzer intended this exeriment only as a test of mp ∝ ma,but it was reinterpreted in 1976 by Will2 as a test of the couplingof sources to gravitational fields as predicted by general relativityand other theories of gravity. Crudely, we’ve already argued thatmp ∝ ma would be substance-dependent if pressure did not cou-ple to gravitational fields. Will actually carried out a more carefulcalculation, of which I present a simplified summary. Suppose thatpressure does not contribute as much to gravitational fields as isclaimed by general relativity; its coupling is reduced by a factor1 − x, where x = 0 in general relativity.3 Will considers a modelconsisting of pointlike particles interacting through static electricalforces, and shows that for such a system,

ma = mp +1

2xUe ,

where Ue is the electrical energy. The Kreuzer experiment thenrequires |x| < 6 × 10−2, meaning that pressure does contribute togravitational fields as predicted by general relativity, to within aprecision of 6%.

One of the important ways in which Will’s calculation goes be-yond my previous crude argument is that it shows that when x = 0,as it does for general relativity, the correction term xUe/2 vanishes,and ma = mp exactly. This is interpreted as follows. Let a brominenucleus be referred to with a capital M , fluorine with the lowercasem. Then when a bromine nucleus and a fluorine nucleus interactgravitationally at a distance r, the Newtonian approximation ap-plies, and the total internal force acting on the pair of nuclei takenas a whole equals (mpMa − Mpma)/r

2 (in units where the New-tonian gravitational constant G equals 1). This vanishes only ifmpMa −Mpma = 0, which is equivalent to mp/Mp = ma/Ma. Ifthis proportionality fails, then the system violates Newton’s thirdlaw and conservation of momentum; its center of mass will acceler-ate along the line connecting the two nuclei, either in the directionof M or in the direction of m, depending on the sign of x.

2Will, “Active mass in relativistic gravity: Theoretical interpretation ofthe Kreuzer experiment,” Ap. J. 204 (1976) 234, available online at adsabs.

harvard.edu. A broader review of experimental tests of general relativity isgiven in Will, “The Confrontation between General Relativity and Experiment,”relativity.livingreviews.org/Articles/lrr-2006-3/. The Kreuzer experi-ment is discussed in section 3.7.3.

3In Will’s notation, ζ4 measures nonstandard coupling to pressure, ζ3 tointernal energy, and ζ1 to kinetic energy. By requiring that point-particle modelsagree with perfect-fluid models, one obtains (−2/3)ζ1 = ζ3 = −ζ4 = x.

Section 8.1 Sources in general relativity 275

Page 276: General Relativity

d / The Apollo 11 mission leftbehind this mirror, which in thisphoto shows the reflection of theblack sky. The mirror is usedfor lunar laser ranging measure-ments, which have an accuracyof about a centimeter.

Thus the vanishing of the correction term xUe/2 tells us thatgeneral relativity predicts exact conservation of momentum in thisinteraction. This is comforting, but a little susprising on the faceof it. Newtonian gravity treats active and passive massive perfectlysymmetrically, so that there is a perfect guarantee of conservation ofmomentum. But relativity incorporates them in a completely asym-metric manner, so there is no obvious reason that we should haveperfect conservation of momentum. In fact we don’t have any gen-eral guarantee of conservation of momentum, since, as discussed insection 4.5.1 on page 148, the language of general relativity doesn’teven give us the symbols we would need in order to state a globalconservation law for a vector. General relativity does, however,allow local conservation laws. We will have local conservation ofmass-energy and momentum provided that the stress-energy ten-sor’s divergence ∇bT ab vanishes.

Bartlett and van Buren4 used this connection to conservation ofmomentum in 1986 to derive a tighter limit on x. Since the moonhas an asymmetrical distribution of iron and aluminum, a nonzerox would cause it to have an anomalous acceleration along a certainline. Because lunar laser ranging gives extremely accurate data onthe moon’s orbit, the constraint is tightened to |x| < 1× 10−8.

These are tests of general relativity’s predictions about the grav-itational fields generated by the pressure of a static electric field. Inaddition, there is indirect confirmation (p. 305) that general relativ-ity is correct when it comes to electromagnetic waves.

Energy of gravitational fields not included in the stress-energytensor

Summarizing the story of the Kreuzer and Bartlett-van Burenresults, we find that observations verify to high precision one of thedefining properties of general relativity, which is that all forms ofenergy are equivalent to mass. That is, Einstein’s famous E = mc2

can be extended to gravitational effects, with the proviso that thesource of gravitational fields is not really a scalar m but the stress-energy tensor T .

But there is an exception to this even-handed treatment of alltypes of energy, which is that the energy of the gravitational fielditself is not included in T , and is not even generally a well-definedconcept locally. In Newtonian gravity, we can have conservation ofenergy if we attribute to the gravitational field a negative potentialenergy density −g2/8π. But the equivalence principle tells us thatg is not a tensor, for we can always make g vanish locally by goinginto the frame of a free-falling observer, and yet the tensor trans-formation laws will never change a nonzero tensor to a zero tensor

4Phys. Rev. Lett. 57 (1986) 21. The result is summarized in section 3.7.3 ofthe review by Will.

276 Chapter 8 Sources

Page 277: General Relativity

under a change of coordinates. Since the gravitational field is nota tensor, there is no way to add a term for it into the definition ofthe stress-energy, which is a tensor. The grammar and vocabularyof the tensor notation are specifically designed to prevent writingdown such a thing, so that the language of general relativity is noteven capable of expressing the idea that gravitational fields wouldthemselves contribute to T .

Self-check: (1) Convince yourself that the negative sign in theexpression −g2/8π makes sense, by considering the case where twoequal masses start out far apart and then fall together and combineto make a single body with twice the mass. (2) The Newtoniangravitational field is the gradient of the gravitational potential φ,which corresponds in the Newtonian limit to the time-time compo-nent of the metric. With this motivation, suppose someone proposesgeneralizing the Newtonian energy density −(∇φ)2/8π to a similarexpression such as −(∇agab)(∇cg b

c ), where ∇ is now the covariantderivative, and g is the metric, not the Newtonian field strength.What goes wrong?

As a concrete example, we observe that the Hulse-Taylor binarypulsar system (p. 222) is gradually losing orbital energy, and thatthe rate of energy loss exactly matches general relativity’s predictionof the rate of gravitational radiation. There is a net decrease inthe forms of energy, such as rest mass and kinetic energy, that areaccounted for in the stress energy tensor T . We can account forthe missing energy by attributing it to the outgoing gravitationalwaves, but that energy is not included in T , and we have to developspecial techniques for evaluating that energy. Those techniques onlyturn out to apply to certain special types of spacetimes, such asasymptotically flat ones, and they do not allow a uniquely definedenergy density to be attributed to a particular small region of space(for if they did, that would violate the equivalence principle).

Gravitational energy is locally unmeasurable. Example: 3When a new form of energy is discovered, the way we estab-lish that it is a form of energy is that it can be transformed to orfrom other forms of energy. For example, Becquerel discoveredradioactivity by noticing that photographic plates left in a deskdrawer along with radium salts became clouded: some new formof energy had been converted into previously known forms suchas chemical energy. It is only in this limited sense that energy isever locally observable, and this limitation prevents us from mean-ingfully defining certain measures of energy. For example we cannever measure the local electrical potential in the same sensethat we can measure the local barometric pressure; a potentialof 137 volts only has meaning relative to some other region ofspace taken to be at ground. Let’s use the acronym MELT to re-fer to measurement of energy by the local transformation of thatenergy from one form into another.

Section 8.1 Sources in general relativity 277

Page 278: General Relativity

The reason MELT works is that energy (or actually the momen-tum four-vector) is locally conserved, as expressed by the zero-divergence property of the stress-energy tensor. Without con-servation, there is no notion of transformation. The Einstein fieldequations imply this zero-divergence property, and the field equa-tions have been well verified by a variety of observations, includ-ing many observations (such as solar system tests and observa-tion of the Hulse-Taylor system) that in Newtonian terms wouldbe described as involving (non-local) transformations between ki-netic energy and the energy of the gravitational field. This agree-ment with observation is achieved by taking T = 0 in vacuum,regardless of the gravitational field. Therefore any local trans-formation of gravitational field energy into another form of energywould be inconsistent with previous observation. This implies thatMELT is impossible for gravitational field energy.

In particular, suppose that observer A carries out a local MELTof gravitational field energy, and that A sees this as a process inwhich the gravitational field is reduced in intensity, causing therelease of some other form of energy such as heat. Now con-sider the situation as seen by observer B, who is free-falling inthe same local region. B says that there was never any gravita-tional field in the first place, and therefore sees heat as violatinglocal conservation of energy. In B’s frame, this is a nonzero di-vergence of the stress-energy tensor, which falsifies the Einsteinfield equations.

Some examples

We conclude this introduction to the stress-energy tensor withsome illustrative examples.

A perfect fluid Example: 4For a perfect fluid, we have

Tab = (ρ + P)vavb − sPgab ,

where s = 1 for our + − −− signature or −1 for the signature− + ++, and v represents the coordinate velocity of the fluid’s restframe.

Suppose that the metric is diagonal, but its components are vary-ing, gαβ = diag(A2,−B2, . . .). The properly normalized velocityvector of an observer at (coordinate-)rest is vα = (A−1, 0, 0, 0).Lowering the index gives vα = (sA, 0, 0, 0). The various forms ofthe stress-energy tensor then look like the following:

T00 = A2ρ T11 = B2P

T 00 = sρ T 1

1 = −sP

T 00 = A−2ρ T 11 = B−2P .

278 Chapter 8 Sources

Page 279: General Relativity

A rope dangling in a Schwarzschild spacetime Example: 5Suppose we want to lower a bucket on a rope toward the eventhorizon of a black hole. We have already made some qualitativeremarks about this idea in example 15 on p. 65. This seeminglywhimsical example turns out to be a good demonstration of sometechniques, and can also be used in thought experiments thatillustrate the definition of mass in general relativity and that probesome ideas about quantum gravity.5

The Schwarzschild metric (p. 212) is

ds2 = f 2 dt2 − f−2 dr2 + . . . ,

where f = (1 − 2m/r )1/2, and . . . represents angular terms. Wewill end up needing the following Christoffel symbols:

Γ ttr = f ′/f

Γθθr = Γφφr = r−1

Since the spacetime has spherical symmetry, it ends up beingmore convenient to consider a rope whose shape, rather thanbeing cylindrical, is a cone defined by some set of (θ,φ). Forconvenience we take this set to cover unit solid angle. The finalresults obtained in this way can be readily converted into state-ments about a cylindrical rope. We let µ be the mass per unitlength of the rope, and T the tension. Both of these may dependon r . The corresponding energy density and tensile stress areρ = µ/A = µ/r2 and S = T/A. To connect this to the stress-energytensor, we start by comparing to the case of a perfect fluid fromexample 4. Because the rope is made of fibers that have stengthonly in the radial direction, we will have T θθ = Tφφ = 0. Further-more, the stress is tensile rather than compressional, correspond-ing to a negative pressure. The Schwarzschild coordinates areorthogonal but not orthonormal, so the properly normalized ve-locity of a static observer has a factor of f in it: vα = (f−1, 0, 0, 0),or, lowering an index, vα = (f , 0, 0, 0). The results of example 4show that the mixed-index form of T will be the most convenient,since it can be expressed without messy factors of f . We have

T κν = diag(ρ, S, 0, 0) = r−2 diag(µ, T , 0, 0) .

By writing the stress-energy tensor in this form, which is indepen-dent of t , we have assumed static equilibrium outside the eventhorizon. Inside the horizon, the r coordinate is the timelike one,the spacetime itself is not static, and we do not expect to findstatic solutions, for the reasons given on p. 65.

5Brown, “Tensile Strength and the Mining of Black Holes,” arxiv.org/abs/

1207.3342

Section 8.1 Sources in general relativity 279

Page 280: General Relativity

Conservation of energy is automatically satisfied, since there isno time dependence. Conservation of radial momentum is ex-pressed by

∇κT κr = 0 ,

or0 = ∇r T r

r +∇tT tr +∇θT θr +∇φTφr .

It would be tempting to throw away all but the first term, sinceT is diagonal, and therefore T t

r = T θr = Tφr = 0. However, acovariant derivative can be nonzero even when the symbol beingdifferentiated vanishes identically. Writing out these four terms,we have

0 =∂r T rr + Γ r

r r T rr − Γ r

r r T rr

+Γ ttr T r

r − Γ ttr T t

t

+Γθθr T rr

+Γφφr T rr ,

where each line corresponds to one covariant derivative. Evalu-ating this, we have

0 = T ′ +f ′

fT − f ′

fµ ,

where primes denote differentiation with respect to r . Note thatsince no terms of the form ∂r T t

t occur, this expression is validregardless of whether we take µ to be constant or varying. Thuswe are free to take ρ ∝ r−2, so that µ is constant, and this meansthat our result is equally applicable to a uniform cylindrical rope.This result is checked using computer software in example 6.

This is a differential equation that tells us how the tensile stress inthe rope varies along its length. The coefficient f ′/f = m/r (r−2m)blows up at the event horizon, which is as expected, since we donot expect to be able to lower the rope to or below the horizon.

Let’s check the Newtonian limit, where the gravitational field is gand the potential is Φ. In this limit, we have f ≈ 1 −Φ, f ′/f ≈ g(with g > 0), and µ T , resulting in

0 = T ′ − gµ .

which is the expected Newtonian relation.

Returning to the full general-relativistic result, it can be shown thatfor a loaded rope with no mass of its own, we have a finite resultfor limr→∞ S, even when the bucket is brought arbitrarily close tothe horizon. However, this is misleading without the caveat thatfor µ < T , the speed of transverse waves in the rope is greaterthan c, which is not possible for any known form of matter — itwould violate the null energy condition, discussed in the followingsection.

280 Chapter 8 Sources

Page 281: General Relativity

The rope, using computer algebra Example: 6The result of example 5 can be checked with the following Maximacode:

1 load(ctensor);

2 ct_coords:[t,r,theta,phi];

3 depends(f,r);

4 depends(ten,r); /* tension depends on r */

5 depends(mu,r); /* mass/length depends on r */

6 lg:matrix([f^2,0,0,0],

7 [0,-f^-2,0,0],

8 [0,0,-r^2,0],

9 [0,0,0,-r^2*sin(theta)^2]);

10 cmetric();

11 christof(mcs);

12 /* stress-energy tensor, T^mu_nu */

13 t:r^-2*matrix(

14 [mu,0,0,0],

15 [0,ten,0,0],

16 [0,0,0,0],

17 [0,0,0,0]

18 );

19 /*

20 Compute covariant derivative of the stress-energy

21 tensor with respect to its first index. The

22 function checkdiv is defined so that the first

23 index has to be covariant (lower); the T I’m

24 putting in is T^mu_nu, and since it’s symmetric,

25 that’s the same as T_mu^nu.

26 */

27 checkdiv(t);

8.1.3 Energy conditions

Physical theories are supposed to answer questions. For exam-ple:

1. Does a small enough physical object always have a world-linethat is approximately a geodesic?

2. Do massive stars collapse to form black-hole singularities?

3. Did our universe originate in a Big Bang singularity?

4. If our universe doesn’t currently have violations of causalitysuch as the closed timelike curves exhibited by the Petrov met-ric (p. 262), can we be assured that it will never develop causal-ity violation in the future?

Section 8.1 Sources in general relativity 281

Page 282: General Relativity

We would like to “prove” whether the answers to questions like theseare yes or no, but physical theories are not formal mathematicalsystems in which results can be “proved” absolutely. For example,the basic structure of general relativity isn’t a set of axioms but alist of ingredients like the equivalence principle, which has evadedformal definition.6

Even the Einstein field equations, which appear to be completelywell defined, are not mathematically formal predictions of the be-havior of a physical system. The field equations are agnostic onthe question of what kinds of matter fields contribute to the stress-energy tensor. In fact, any spacetime at all is a solution to theEinstein field equations, provided we’re willing to admit the corre-sponding stress-energy tensor. We can never answer questions likethe ones above without assuming something about the stress-energytensor.

In example 14 on page 132, we saw that radiation has P = ρ/3and dust has P = 0. Both have ρ ≥ 0. If the universe is made outof nothing but dust and radiation, then we can obtain the followingfour constraints on the energy-momentum tensor:

trace energy condition ρ− 3P ≥ 0strong energy condition ρ+ 3P ≥ 0 and ρ+ P ≥ 0dominant energy condition ρ ≥ 0 and |P | ≤ ρweak energy condition ρ ≥ 0 and ρ+ P ≥ 0null energy condition ρ+ P ≥ 0

These are arranged roughly in order from strongest to weakest. Theyall have to do with the idea that negative mass-energy doesn’t seemto exist in our universe, i.e., that gravity is always attractive ratherthan repulsive. With this motivation, it would seem that thereshould only be one way to state an energy condition: ρ > 0. Butthe symbols ρ and P refer to the form of the stress-energy tensorin a special frame of reference, interpreted as the one that is at restrelative to the average motion of the ambient matter. (Such a frameis not even guaranteed to exist unless the matter acts as a perfectfluid.) In this frame, the tensor is diagonal. Switching to some otherframe of reference, the ρ and P parts of the tensor would mix, andit might be possible to end up with a negative energy density. Theweak energy condition is the constraint we need in order to makesure that the energy density is never negative in any frame.

The dominant energy condition is like the weak energy condition,but it also guarantees that no observer will see a flux of energyflowing at speeds greater than c.

The strong energy condition essentially states that gravity isnever repulsive; it is violated by the cosmological constant (see

6“Theory of gravitation theories: a no-progress report,” Sotiriou, Faraoni,and Liberati, http://arxiv.org/abs/0707.2748

282 Chapter 8 Sources

Page 283: General Relativity

p. 295).

An electromagnetic wave Example: 7In example 1 on p. 269, we saw that dust boosted along the x

axis gave a stress-energy tensor

Tµν = γ2ρ

(1 vv v2

),

where we now suppress the y and z parts, which vanish. Forv → 1, this becomes

Tµν = ρ′(

1 11 1

),

where ρ′ is the energy density as measured in the new frame. Asa source of gravitational fields, this ultrarelativistic dust is indis-tinguishable from any other form of matter with v = 1 along thex axis, so this is also the stress-energy tensor of an electromag-netic wave with local energy-density ρ′, propagating along the xaxis. (For the full expression for the stress-energy tensor of anarbitrary electromagnetic field, see the Wikipedia article “Electro-magnetic stress-energy tensor.”)

This is a stress-energy tensor that represents a flux of energy at aspeed equal to c, so we expect it to lie at exactly the limit imposedby the dominant energy condition (DEC). Our statement of theDEC, however, was made for a diagonal stress-energy tensor,which is what is seen by an observer at rest relative to the matter.But we know that it’s impossible to have an observer who, as theteenage Einstein imagined, rides alongside an electromagneticwave on a motorcycle. One way to handle this is to generalize ourdefinition of the energy condition. For the DEC, it turns out thatthis can be done by requiring that the matrix T , when multipliedby a vector on or inside the future light-cone, gives another vectoron or inside the cone.

A less elegant but more concrete workaround is as follows. Re-turning to the original expression for the T of boosted dust atvelocity v , we let v = 1 + ε, where |ε| 1. This gives a stress-energy tensor that (ignoring multiplicative constants) looks like:(

1 1 + ε1 + ε 1 + 2ε

).

If ε is negative, we have ultrarelativistic dust, and we can verifythat it satisfies the DEC by un-boosting back to the rest frame.To do this explicitly, we can find the matrix’s eigenvectors, which(ignoring terms of order ε2) are (1, 1+ε) and (1, 1−ε), with eigen-values 2 + 2ε and 0, respectively. For ε < 0, the first of these istimelike, the second spacelike. We interpret them simply as the

Section 8.1 Sources in general relativity 283

Page 284: General Relativity

e / Example 8.

t and x basis vectors of the rest frame in which we originally de-scribed the dust. Using them as a basis, the stress-energy tensortakes on the form diag(2 + 2ε, 0). Except for a constant factor thatwe didn’t bother to keep track of, this is the original form of theT in the dust’s rest frame, and it clearly satisfies the DEC, sinceP = 0.

For ε > 0, v = 1 + ε is a velocity greater than the speed of light,and there is no way to construct a boost corresponding to−v . Wecan nevertheless find a frame of reference in which the stress-energy tensor is diagonal, allowing us to check the DEC. Theexpressions found above for the eigenvectors and eigenvaluesare still valid, but now the timelike and spacelike characters of thetwo basis vectors have been interchanged. The stress-energytensor has the form diag(0, 2 + 2ε), with ρ = 0 and P > 0, whichviolates the DEC. As in this example, any flux of mass-energy atspeeds greater than c will violate the DEC.

The DEC is obeyed for ε < 0 and violated for ε > 0, and since ε =0 gives a stress-energy tensor equal to that of an electromagneticwave, we can tell that light is exactly on the border between formsof matter that fulfill the DEC and those that don’t. Since the DECis formulated as a non-strict inequality, it follows that light obeysthe DEC.

No “speed of flux” Example: 8The foregoing discussion may have encouraged the reader to be-lieve that it is possible in general to read off a “speed of energyflux” from the value of T at a point. This is not true.

The difficulty lies in the distinction between flow with and withoutaccumulation, which is sometimes valid and sometimes not. Inspringtime in the Sierra Nevada, snowmelt adds water to alpinelakes more rapidly than it can flow out, and the water level rises.This is flow with accumulation. In the fall, the reverse happens,and we have flow with depletion (negative accumulation).

Figure e/1 shows a second example in which the distinction seemsvalid. Charge is flowing through the lightbulb, but because thereis no accumulation of charge in the DC circuit, we can’t detect theflow by an electrostatic measurement; the wire does not attractthe tiny bits of paper below it on the table.

But we know that with different measurements, we could detectthe flow of charge in e/1. For example, the magnetic field fromthe wire would deflect a nearby magnetic compass. This showsthat the distinction between flow with and without accumulationmay be sometimes valid and sometimes invalid. Flow withoutaccumulation may or may not be detectable; it depends on thephysical context.

284 Chapter 8 Sources

Page 285: General Relativity

In figure e/2, an electric charge and a magnetic dipole are super-imposed at a point. The Poynting vector P defined as E × B isused in electromagnetism as a measure of the flux of energy, andit tells the truth, for example, when the sun warms your sun ona hot day. In e/2, however, all the fields are static. It seems asthough there can be no flux of energy. But that doesn’t mean thatthe Poynting vector is lying to us. It tells us that there is a pat-tern of flow, but it’s flow without accumulation; the Poynting vectorforms circular loops that close upon themselves, and electromag-netic energy is transported in and out of any volume at the samerate. We would perhaps prefer to have a mathematical rule thatgave zero for the flux in this situation, but it’s acceptable that ourrule P = E× B gives a nonzero result, since it doesn’t incorrectlypredict an accumulation, which is what would be detectable.

Now suppose we’re presented with this stress-energy tensor, mea-sured at a single point and expressed in some units:

T µν =(

4.037± 0.002 4.038± 0.0024.036± 0.002 4.036± 0.002

).

To within the experimental error bars, it has the right form to bemany different things: (1) We could have a universe filled withperfectly uniform dust, moving along the x axis at some ultrarel-ativistic speed v so great that the ε in v = 1 − ε, as in example7, is not detectably different from zero. (2) This could be a pointsampled from an electromagnetic wave traveling along the x axis.(3) It could be a point taken from figure e/2. (In cases 2 and 3,the off-diagonal elements are simply the Poynting vector.)

In cases 1 and 2, we would be inclined to interpret this stress-energy tensor by saying that its off-diagonal part measures theflux of mass-energy along the x axis, while in case 3 we would re-ject such an interpretation. The trouble here is not so much in ourinterpretation of T as in our Newtonian expectations about what isor isn’t observable about fluxes that flow without accumulation. InNewtonian mechanics, a flow of mass is observable, regardlessof whether there is accumulation, because it carries momentumwith it; a flow of energy, however, is undetectable if there is noaccumulation. The trouble here is that relativistically, we can’tmaintain this distinction between mass and energy. The Einsteinfield equations tell us that a flow of either will contribute equally tothe stress-energy, and therefore to the surrounding gravitationalfield.

The flow of energy in e/2 contributes to the gravitational field, andits contribution is changed, for example, if the magnetic field is re-versed. The figure is in fact not a bad qualitative representation ofthe spacetime around a rotating, charged black hole. At large dis-tances, however, the gravitational effect of the off-diagonal termsin T becomes small, because they average to nearly zero over a

Section 8.1 Sources in general relativity 285

Page 286: General Relativity

sufficiently large spherical region. The distant gravitational fieldapproaches that of a point mass with the same mass-energy.

Momentum in static fields Example: 9Continuing the train of thought described in example 8, we can

come up with situations that seem even more paradoxical. In fig-ure e/2, the total momentum of the fields vanishes by symmetry.This symmetry can, however, be broken by displacing the electriccharge by ∆R perpendicular to the magnetic dipole vector D. Thetotal momentum no longer vanishes, and now lies in the directionof D × ∆R. But we have proved in example 2 on p. 270 that asystem’s center of mass-energy is at rest if and only if its totalmomentum is zero. Since this system’s center of mass-energy iscertainly at rest, where is the other momentum that cancels thatof the electric and magnetic fields?

Suppose, for example, that the magnetic dipole consists of a loopof copper wire with a current running around it. If we open aswitch and extinguish the dipole, it appears that the system mustrecoil! This seems impossible, since the fields are static, and anelectric charge does not interact with a magnetic dipole.

Babson et al.7 have analyzed a number of examples of this type.In the present one, the mysterious “other momentum” can be at-tributed to a relativistic imbalance between the momenta of theelectrons in the different parts of the wire. A subtle point aboutthese examples is that even in the case of an idealized dipole ofvanishingly small size, it makes a difference what structure we as-sume for the dipole. In particular, the field’s momentum is nonzerofor a dipole made from a current loop of infinitesimal size, but zerofor a dipole made out of two magnetic monopoles.8

Geodesic motion of test particles

Question 1 on p. 281 was: “Does a small enough physical objectalways have a world-line that is approximately a geodesic?” In otherwords, do Eotvos experiments give null results when carried out inlaboratories using real-world apparatus of small enough size? Wewould like something of this type to be true, since general relativityis based on the equivalence principle, and the equivalence principle ismotivated by the null results of Eotvos experiments. Nevertheless, itis fairly easy to show that the answer to the question is no, unless wemake some more specific assumption, such as an energy condition,about the system being modeled.

Before we worry about energy conditions, let’s consider why thesmall size of the apparatus is relevant. Essentially this is because ofgravitational radiation. In a gravitationally radiating system suchas the Hulse-Taylor binary pulsar (p. 222), the material bodies lose

7Am. J. Phys. 77 (2009) 8268Milton and Meille, arxiv.org/abs/1208.4826

286 Chapter 8 Sources

Page 287: General Relativity

energy, and as with any radiation process, the radiated power de-pends on the square of the strength of the source. The world-line ofa such a body therefore depends on its mass, and this shows that itsworld-line cannot be an exact geodesic, since the initially tangentworld-lines of two different masses diverge from one another, andthese two world-lines can’t both be geodesics.

Let’s proceed to give a rough argument for geodesic motion andthen try to poke holes in it. When we test geodesic motion, we doan Eotvos experiment that is restricted to a certain small regionof spacetime S. Our test-body’s world-line enters S with a certainenergy-momentum vector p and exits with p′. If spacetime wasflat, then Gauss’s theorem would hold exactly, and the vanishingdivergence ∇bT ab of the stress-energy tensor would require that theincoming flux represented by p be exactly canceled by the outgoingflux due to p′. In reality spacetime isn’t flat, and it isn’t even possibleto compare p and p′ except by parallel-transporting one into thesame location as the other. Parallel transport is path-dependent,but if we make the reasonable restriction to paths that stay within S,we expect the ambiguity due to path-dependence to be proportionalto the area enclosed by any two paths, so that if S is small enough,the ambiguity can be made small. Ignoring this small ambiguity,we can see that one way for the fluxes to cancel would be for theparticle to travel along a geodesic, since both p and p′ are tangentto the test-body’s world-line, and a geodesic is a curve that parallel-transports its own tangent vector. Geodesic motion is therefore onesolution, and we expect the solution to be nearly unique when S issmall.

Although this argument is almost right, it has some problems.First we have to ask whether “geodesic” means a geodesic of the fullspacetime including the object’s own fields, or of the backgroundspacetime B that would have existed without the object. The latteris the more sensible interpretation, since the question is basicallyasking whether a spacetime can really be defined geometrically, asthe equivalence principle claims, based on the motion of test par-ticles inserted into it. We also have to define words like “smallenough” and “approximately;” to do this, we imagine a sequence ofobjects On that get smaller and smaller as n increases. We thenform the following conjecture, which is meant to formulate question1 more exactly: Given a vacuum background spacetime B, and atimelike world-line ` in B, consider a sequence of spacetimes Sn,formed by inserting the On into B, such that: (i) the metric of Sn isdefined on the same points as the metric of B; (ii) On moves along`, and for any r > 0, there exists some n such that for m ≥ n, Om

is smaller than r;9 (iii) the metric of Sn approaches the metric of B

9i.e., at any point P on `, an observer moving along ` at P defines a surfaceof simultaneity K passing through P, and sees the stress-energy tensor of On asvanishing outside of a three-sphere of radius r within K and centered on P

Section 8.1 Sources in general relativity 287

Page 288: General Relativity

f / Negative mass.

g / The black sphere is madeof ordinary matter. Thecrosshatched sphere has positivegravitational mass and negativeinertial mass. If the two of themare placed side by side in emptyspace, they will both acceleratesteadily to the right, graduallyapproaching the speed of light.Conservation of momentum ispreserved, because the exoticsphere has leftward momentumwhen it moves to the right, so thetotal momentum is always zero.

as n→∞. Then ` is a geodesic of B.

This is almost right but not quite, as shown by the followingcounterexample. Papapetrou10 has shown that a spinning body ina curved background spacetime deviates from a geodesic with anacceleration that is proportional to LR, where L is its angular mo-mentum and R is the Riemann curvature. Let all the On have a fixedvalue of L, but let the spinning mass be concentrated into a smallerand smaller region as n increases, so as to satisfy (ii). As the radiusr decreases, the motion of the particles composing an On eventuallyhas to become ultrarelativistic, so that the main contribution to thegravitational field is from the particles’ kinetic energy rather thantheir rest mass. We then have L ∼ pr ∼ Er, so that in order tokeep L constant, we must have E ∝ 1/r. This causes two problems.First, it makes the gravitational field blow up at small distances,violating (iii). Also, we expect that for any known form of matter,there will come a point (probably the Tolman-Oppenheimer-Volkofflimit) at which we get a black hole; the singularity is then not partof the spacetime Sn, violating (i). But our failed counterexamplecan be patched up. We obtain a supply of exotic matter, whosegravitational mass is negative, and we mix enough of this mysteri-ous stuff into each On so that the gravitational field shrinks ratherthan growing as n increases, and no black hole is ever formed.

Ehlers and Geroch11 have proved that it suffices to require anadditional condition: (iv) The On satisfy the dominant energy con-dition. This rules out our counterexample.

The Newtonian limit

In units with c 6= 1, a quantity like ρ+P is expressed as ρ+P/c2.The Newtonian limit is recovered as c→∞, which makes the pres-sure term negligible, so that all the energy conditions reduce toρ ≥ 0. What would it mean if this was violated? Would ρ < 0describe an object with negative inertial mass, which would accel-erate east when you pushed it to the west? Or would it describesomething with negative gravitational mass, which would repel or-dinary matter? We can imagine various possiblities, as shown infigure f. Anything that didn’t lie on the main diagonal would vi-olate the equivalence principle, and would therefore be impossibleto accomodate within general relativity’s geometrical description ofgravity. If we had “upsidasium” matter such as that described bythe second quadrant of the figure (example 2, p. 26), gravity wouldbe like electricity, except that like masses would attract and oppo-sites repel; we could have gravitational dielectrics and gravitationalFaraday cages. The fourth quadrant leads to amusing possibilitieslike figure g.

10Proc. Royal Soc. London A 209 (1951) 248. The relevant result is summa-rized in Misner, Thorne, and Wheeler, Gravitation, p. 1121.

11arxiv.org/abs/gr-qc/0309074v1

288 Chapter 8 Sources

Page 289: General Relativity

h / Nulling out a gravitationalfield is impossible in one dimen-sion without exotic matter. 1. Theplanet imposes a nonvanishinggravitational field with a nonvan-ishing gradient. 2. We can nullthe field at one point in space, byplacing a sphere of very dense,but otherwise normal, matteroverhead. The stick figure stillexperiences a tidal force, g′ 6= 0.3. To change the field’s derivativewithout changing the field, wecan place two additional massesabove and below the given point.But to change its derivative in thedesired direction — toward zero— we would have to make thesemasses negative.

No gravitational shielding Example: 10Electric fields can be completely excluded from a Faraday cage,

and magnetic fields can be very strongly blocked with high-perme-ability materials such as mu-metal. It would be fun if we could dothe same with gravitational fields, so that we could have zero-gravity or near-zero-gravity parties in a specially shielded room.It would be a form of antigravity, but a different one than the “upsi-dasium” type. Unfortunately this is difficult to do, and the reasonit’s difficult turns out to be related to the unavailability of materialsthat violate energy conditions.

First we need to define what we mean by shielding. We restrictourselves to the Newtonian limit, and to one dimension, so that agravitational field is specified by a function of one variable g(x).The best kind of shielding would be some substance that wecould cut with shears and form into a box, and that would ex-clude gravitational fields from the interior of the box. This wouldbe analogous to a Faraday cage; no matter what external field itwas embedded in, it would spontaneously adjust itself so that theinternal field was canceled out. A less desirable kind of shieldingwould be one that we could set up on an ad hoc basis to null out aspecific, given, externally imposed field. Once we know what theexternal field is, we try to choose some arrangement of massessuch that the field is nulled out. We will show that even this kindof shielding is unachievable, if nulling out the field is interpretedto mean this: at some point, which for convenience we take to bethe origin, we wish to have a gravitational field such that g(0) = 0,dg/dx(0) = 0, . . . dn g/dxn(0) = 0, where n is arbitrarily specified.For comparison, magnetic fields can be nulled out according tothis definition by building an appropriately chosen configuration ofcoils such as a Helmholtz coil.

Since we’re only doing the Newtonian limit, the gravitational fieldis the sum of the fields made by all the sources, and we can takethis as a sum over point sources. For a point source m placed atxo, the field g(x) is odd under reflection about xo. The derivativeof the field g′(x) is even. Since g′ is even, we can’t control its signat x = 0 by choosing xo > 0 or xo < 0. The only way to controlthe sign of g′ is by choosing the sign of m. Therefore if the signof the externally imposed field’s derivative is wrong, we can nevernull it out. Figure h shows a special case of this theorem.

The theorem does not apply to three dimensions, and it does notprove that all fields are impossible to null out, only that some are.For example, the field inside a hemispherical shell can be nulledby adding another hemispherical shell to complete the sphere. Ithank P. Allen for helpful discussion of this topic.

Section 8.1 Sources in general relativity 289

Page 290: General Relativity

Singularity theorems

An important example of the use of the energy conditions is thatHawking and Ellis have proved that under the assumption of thestrong energy condition, any body that becomes sufficiently com-pact will end up forming a singularity. We might imagine thatthe formation of a black hole would be a delicate thing, requiringperfectly symmetric initial conditions in order to end up with theperfectly symmetric Schwarzschild metric. Many early relativiststhought so, for good reasons. If we look around the universe atvarious scales, we find that collisions between astronomical bodiesare extremely rare. This is partly because the distances are vastcompared to the sizes of the objects, but also because conservationof angular momentum has a tendency to make objects swing pastone another rather than colliding head-on. Starting with a cloud ofobjects, e.g., a globular cluster, Newton’s laws make it extremelydifficult, regardless of the attractive nature of gravity, to pick initialconditions that will make them all collide the future. For one thing,they would have to have exactly zero total angular momentum.

Most relativists now believe that this is not the case. Generalrelativity describes gravity in terms of the tipping of light cones.When the field is strong enough, there is a tendency for the lightcones to tip over so far that the entire future light-cone points at thesource of the field. If this occurs on an entire surface surroundingthe source, it is referred to as a trapped surface.

To make this notion of light cones “pointing at the source” morerigorous, we need to define the volume expansion Θ. Let the set ofall points in a spacetime (or some open subset of it) be expressed asthe union of geodesics. This is referred to as a foliation in geodesics,or a congruence. Let the velocity vector tangent to such a curvebe ua. Then we define Θ = ∇aua. This is exactly analogous tothe classical notion of the divergence of the velocity field of a fluid,which is a measure of compression or expansion. Since Θ is a scalar,it is coordinate-independent. Negative values of Θ indicate that thegeodesics are converging, so that volumes of space shrink. A trappedsurface is one on which Θ is negative when we foliate with lightlikegeodesics oriented outward along normals to the surface.

When a trapped surface forms, any lumpiness or rotation inthe initial conditions becomes irrelevant, because every particle’sentire future world-line lies inward rather than outward. A possi-ble loophole in this argument is the question of whether the lightcones will really tip over far enough. We could imagine that un-der extreme conditions of high density and temperature, mattermight demonstrate unusual behavior, perhaps including a negativeenergy density, which would then give rise to a gravitational repul-sion. Gravitational repulsion would tend to make the light cones tipoutward rather than inward, possibly preventing the collapse to a

290 Chapter 8 Sources

Page 291: General Relativity

singularity. We can close this loophole by assuming an appropriateenergy condition. Penrose and Hawking have formalized the aboveargument in the form of a pair of theorems, known as the singularitytheorems. One of these applies to the formation of black holes, andanother one to cosmological singularities such as the Big Bang.

In a cosmological model, it is natural to foliate using world-lines that are at rest relative to the Hubble flow (or, equivalently,the world-lines of observers who see a vanishing dipole moment inthe cosmic microwave background). The Θ we then obtain is pos-itive, because the universe is expanding. The volume expansionis Θ = 3Ho, where Ho ≈ 2.3 × 10−18 s−1 is the Hubble constant(the fractional rate of change of the scale factor of cosmological dis-tances). The factor of three occurs because volume is proportionalto the cube of the linear dimensions.

Current status

The current status of the energy conditions is shaky. Althoughit is clear that all of them hold in a variety of situations, there arestrong reasons to believe that they are violated at both microscopicand cosmological scales, for reasons both classical and quantum-mechanical.12 We will see such a violation in the following section.

8.1.4 The cosmological constant

Having included the source term in the Einstein field equations,our most important application will be to cosmology. Some of therelevant ideas originate long before Einstein. Once Newton hadformulated a theory of gravity as a universal attractive force, herealized that there would be a tendency for the universe to collapse.He resolved this difficulty by assuming that the universe was infinitein spatial extent, so that it would have no center of symmetry, andtherefore no preferred point to collapse toward. The trouble withthis argument is that the equilibrium it describes is unstable. Anyperturbation of the uniform density of matter breaks the symmetry,leading to the collapse of some pocket of the universe. If the radiusof such a collapsing region is r, then its gravitational is proportionalto r3, and its gravitational field is proportional to r3/r2 = r. Sinceits acceleration is proportional to its own size, the time it takes tocollapse is independent of its size. The prediction is that the uni-verse will have a self-similar structure, in which the clumping onsmall scales behaves in the same way as clumping on large scales;zooming in or out in such a picture gives a landscape that appearsthe same. With modern hindsight, this is actually not in bad agree-ment with reality. We observe that the universe has a hierarchicalstructure consisting of solar systems, galaxies, clusters of galaxies,superclusters, and so on. Once such a structure starts to condense,

12Barcelo and Visser, “Twilight for the energy conditions?,” http://arxiv.

org/abs/gr-qc/0205066v1.

Section 8.1 Sources in general relativity 291

Page 292: General Relativity

the collapse tends to stop at some point because of conservation ofangular momentum. This is what happened, for example, when ourown solar system formed out of a cloud of gas and dust.

Einstein confronted similar issues, but in a more acute form.Newton’s symmetry argument, which failed only because of its in-stability, fails even more badly in relativity: the entire spacetimecan simply contract uniformly over time, without singling out anyparticular point as a center. Furthermore, it is not obvious thatangular momentum prevents total collapse in relativity in the sameway that it does classically, and even if it did, how would that applyto the universe as a whole? Einstein’s Machian orientation wouldhave led him to reject the idea that the universe as a whole couldbe in a state of rotation, and in any case it was sensible to start thestudy of relativistic cosmology with the simplest and most symmet-ric possible models, which would have no preferred axis of rotation.

Because of these issues, Einstein decided to try to patch up hisfield equation so that it would allow a static universe. Looking backover the considerations that led us to this form of the equation,we see that it is very nearly uniquely determined by the followingcriteria:

1. It should be consistent with experimental evidence for localconservation of energy-momentum.

2. It should satisfy the equivalence principle.

3. It should be coordinate-independent.

4. It should be equivalent to Newtonian gravity or “plain” generalrelativity in the appropriate limit.

5. It should not be overdetermined.

This is not meant to be a rigorous proof, just a general observationthat it’s not easy to tinker with the theory without breaking it.

A failed attempt at tinkering Example: 11As an example of the lack of “wiggle room” in the structure of thefield equations, suppose we construct the scalar T a

a , the trace ofthe stress-energy tensor, and try to insert it into the field equa-tions as a further source term. The first problem is that the fieldequation involves rank-2 tensors, so we can’t just add a scalar.To get around this, suppose we multiply by the metric. We thenhave something like Gab = c1Tab + c2gabT c

c , where the two con-stants c1 and c2 would be constrained by the requirement that thetheory agree with Newtonian gravity in the classical limit.

To see why this attempt fails, note that the stress-energy tensor ofan electromagnetic field is traceless, T c

c = 0. Therefore the beamof light’s coupling to gravity in the c2 term is zero. As discussed on

292 Chapter 8 Sources

Page 293: General Relativity

pp. 273-276, empirical tests of conservation of momentum wouldtherefore constrain c2 to be . 10−8.

One way in which we can change the field equation withoutviolating any of these requirements is to add a term Λgab, giving

Gab = 8πTab + Λgab ,

which is what we will refer to as the Einstein field equation.13 Aswe’ll see in example 12 on p. 294, this is consistent with conserva-tion of energy-momentum (requirement 1 above) if and only if Λ isconstant. In example 13 we find that its effects are only significanton the largest scales, which makes it undetectable, for example, insolar-system tests (criterion 4). For these reasons Λ is referred to asthe cosmological constant. As we’ll see below, Einstein introducedit in order to make a certain type of cosmology work.

We could also choose to absorb the Λgab term in the field equa-tions into the 8πTab, as if the cosmological constant term were dueto some form of matter. It would then be a perfect fluid (example4, p. 278) with a negative pressure, and it would violate the strongenergy condition (example 14, p. 294). When we think of it thisway, it’s common these days to refer to it as dark energy. But evenif we think of it as analogous to a matter field, its constancy meansthat it has none of its own independent degrees of freedom. It can’tvibrate, rotate, flow, be compressed or rarefied, heated or cooled.It acts like a kind of energy that is automatically built in to everycubic centimeter of space. This is closely related to the fact thatits contribution to the stress-energy tensor is proportional to themetric. One way of stating the equivalence principle (requirement2 above) is that space itself does not come equipped with any othertensor besides the metric.

Einstein originally introduced a positive cosmological constantbecause he wanted relativity to be able to describe a static universe.To see why it would have this effect, compare its behavior with thatof an ordinary fluid. When an ordinary fluid, such as the explodingair-gas mixture in a car’s cylinder, expands, it does work on its en-vironment, and therefore by conservation of energy its own internalenergy is reduced. A positive cosmological constant, however, actslike a certain amount of mass-energy built into every cubic meter ofvacuum. Thus when it expands, it releases energy. Its pressure isnegative.

Now consider the following semi-relativistic argument. Althoughwe’ve already seen (page 219) that there is no useful way to sepa-rate the roles of kinetic and potential energy in general relativity,suppose that there are some quantities analogous to them in thedescription of the universe as a whole. (We’ll see below that the

13In books that use a −+ ++ metric rather then our +−−−, the sign of thecosmological constant term is reversed relative to ours.

Section 8.1 Sources in general relativity 293

Page 294: General Relativity

universe’s contraction and expansion is indeed described by a set ofdifferential equations that can be interpreted in essentially this way.)If the universe contracts, a cubic meter of space becomes less thana cubic meter. The cosmological-constant energy associated withthat volume is reduced, so some energy has been consumed. Thekinetic energy of the collapsing matter goes down, and the collapseis decelerated.

Cosmological constant must be constant Example: 12If Λ is thought of as a form of matter, then it becomes naturalto ask whether it’s spread more thickly in some places than oth-ers: is the cosmological “constant” really constant? The follow-ing argument shows that it cannot vary. The field equations areGab = 8πTab +Λgab. Taking the divergence of both sides, we have∇aGab = 8π∇aTab +∇a(Λgab). The left-hand side vanishes (seep. 269). Since laboratory experiments have verified conservationof mass-energy to high precision for all the forms of matter rep-resented by T , we have ∇aTab = 0 as well. Applying the productrule to the term ∇a(Λgab), we get gab∇aΛ +Λ∇agab. But the co-variant derivative of the metric vanishes, so the result is simply∇bΛ. Thus any variation in the cosmological constant over spaceor time violates the field equations, and the violation is equivalentto the violation we would get from a form of matter than didn’tconserve mass-energy locally.

Cosmological constant is cosmological Example: 13The addition of the Λ term constitutes a change to the vacuumfield equations, and the good agreement between theory and ex-periment in the case of, e.g., Mercury’s orbit puts an upper limit onΛ then implies that Λ must be small. For an order-of-magnitudeestimate, consider that Λ has units of mass density, and the onlyparameters with units that appear in the description of Mercury’sorbit are the mass of the sun, m, and the radius of Mercury’s or-bit, r . The relativistic corrections to Mercury’s orbit are on theorder of v2, or about 10−8, and they come out right. Therefore wecan estimate that the cosmological constant could not have beengreater than about (10−8)m/r3 ∼ 10−10 kg/m3, or it would havecaused noticeable discrepancies. This is a very poor bound; if Λwas this big, we might even be able to detect its effects in labora-tory experiments. Looking at the role played by r in the estimate,we see that the upper bound could have been made tighter byincreasing r . Observations on galactic scales, for example, con-strain it much more tightly. This justifies the description of Λ ascosmological: the larger the scale, the more significant the effectof a nonzero Λ would be.

Energy conditions Example: 14Since the right-hand side of the field equation is 8πTab + Λgab,it is possible to consider the cosmological constant as a type ofmatter contributing to the stress-energy tensor. We then have

294 Chapter 8 Sources

Page 295: General Relativity

ρ = −P = Λ/8π. As described in more detail in section 8.2.11 onp. 326, we now know that Λ is positive. With Λ > 0, the weakand dominant energy conditions are both satisfied, so that in ev-ery frame of reference, ρ is positive and there is no flux of energyflowing at speeds greater than c. The negative pressure does vi-olate the strong energy condition, meaning that the constant actsas a form of gravitational repulsion. If the cosmological constantis a product of the quantum-mechanical structure of the vacuum,then this violation is not too surprising, because quantum fieldsare known to violate various energy conditions. For example, theenergy density between two parallel conducting plates is negativedue to the Casimir effect.

8.2 Cosmological solutionsWe are thus led to pose two interrelated questions. First, whatcan empirical observations about the universe tell us about the lawsof physics, such as the zero or nonzero value of the cosmologicalconstant? Second, what can the laws of physics, combined withobservation, tell us about the large-scale structure of the universe,its origin, and its fate?

8.2.1 Evidence for the finite age of the universe

We have a variety of evidence that the universe’s existence doesnot stretch for an unlimited time into the past.

When astronomers view light from the deep sky that has beentraveling through space for billions of years, they observe a uni-verse that looks different from today’s. For example, quasars werecommon in the early universe but are uncommon today.

In the present-day universe, stars use up deuterium nuclei, butthere are no known processes that could replenish their supply. Wetherefore expect that the abundance of deuterium in the universeshould decrease over time. If the universe had existed for an infinitetime, we would expect that all its deuterium would have been lost,and yet we observe that deuterium does exist in stars and in theinterstellar medium.

The second law of thermodynamics predicts that any systemshould approach a state of thermodynamic equilibrium, and yet ouruniverse is very far from thermal equilibrium, as evidenced by thefact that our sun is hotter than interstellar space, or by the existenceof functioning heat engines such as your body or an automobileengine.

With hindsight, these observations suggest that we should notlook for cosmological models that persist for an infinite time intothe past.

Section 8.2 Cosmological solutions 295

Page 296: General Relativity

a / The horn antenna usedby Penzias and Wilson.

8.2.2 Evidence for expansion of the universe

We don’t only see time-variation in locally observable quantitiessuch as quasar abundance, deuterium abundance, and entropy. Inaddition, we find empirical evidence for global changes in the uni-verse. By 1929, Edwin Hubble at Mount Wilson had determinedthat the universe was expanding, and historically this was the firstconvincing evidence that Einstein’s original goal of modeling a staticcosmology had been a mistake. Einstein later referred to the cos-mological constant as the “greatest blunder of my life,” and for thenext 70 years it was commonly assumed that Λ was exactly zero.

Since we observe that the universe is expanding, the laws of ther-modynamics require that it also be cooling, just as the explodingair-gas mixture in a car engine’s cylinder cools as it expands. If theuniverse is currently expanding and cooling, it is natural to imaginethat in the past it might have been very dense and very hot. This isconfirmed directly by looking up in the sky and seeing radiation fromthe hot early universe. In 1964, Penzias and Wilson at Bell Labora-tories in New Jersey detected a mysterious background of microwaveradiation using a directional horn antenna. As with many acciden-tal discoveries in science, the important thing was to pay attentionto the surprising observation rather than giving up and moving onwhen it confounded attempts to understand it. They pointed theantenna at New York City, but the signal didn’t increase. The ra-diation didn’t show a 24-hour periodicity, so it couldn’t be from asource in a certain direction in the sky. They even went so far as tosweep out the pigeon droppings inside. It was eventually establishedthat the radiation was coming uniformly from all directions in thesky and had a black-body spectrum with a temperature of about 3K.

This is now interpreted as follows. At one time, the universewas hot enough to ionize matter. An ionized gas is opaque to light,since the oscillating fields of an electromagnetic wave accelerate thecharged particles, depositing kinetic energy into them. Once theuniverse became cool enough, however, matter became electricallyneutral, and the universe became transparent. Light from this timeis the most long-traveling light that we can detect now. The latestdata show that transparency set in when the temperature was about3000 K. The surface we see, dating back to this time, is known asthe surface of last scattering. Since then, the universe has expandedby about a factor of 1000, causing the wavelengths of photons tobe stretched by the same amount due to the expansion of the un-derlying space. This is equivalent to a Doppler shift due to thesource’s motion away from us; the two explanations are equivalent.We therefore see the 3000 K optical black-body radiation red-shiftedto 3 K, in the microwave region.

It is logically possible to have a universe that is expanding but

296 Chapter 8 Sources

Page 297: General Relativity

whose local properties are nevertheless static, as in the steady-statemodel of Fred Hoyle, in which some novel physical process sponta-neously creates new hydrogen atoms, preventing the infinite dilutionof matter over the universe’s history, which in this model extendsinfinitely far into the past. But we have already seen strong empiri-cal evidence that the universe’s local properties (quasar abundance,etc.) are changing over time. The CMB is an even more extremeand direct example of this; the universe full of hot, dense gas thatemitted the CMB is clearly nothing like today’s universe. A briefdiscussion of the steady-state model is given in section 8.4, p. 335.

8.2.3 Evidence for homogeneity and isotropy

These observations demonstrate that the universe is not homo-geneous in time, i.e., that one can observe the present conditions ofthe universe (such as its temperature and density), and infer whatepoch of the universe’s evolution we inhabit. A different questionis the Copernican one of whether the universe is homogeneous inspace. Surveys of distant quasars show that the universe has verylittle structure at scales greater than a few times 1025 m. (This canbe seen on a remarkable logarithmic map constructed by Gott etal., astro.princeton.edu/universe.) This suggests that we can,to a good approximation, model the universe as being isotropic (thesame in all spatial directions) and homogeneous (the same at alllocations in space).

Further evidence comes from the extreme uniformity of the cos-mic microwave background radiation, once one subtracts out thedipole anisotropy due to the Doppler shift arising from our galaxy’smotion relative to the CMB. When the CMB was first discovered,there was doubt about whether it was cosmological in origin (ratherthan, say, being associated with our galaxy), and it was expectedthat its isotropy would be as large as 10%. As physicists began tobe convinced that it really was a relic of the early universe, interestfocused on measuring this anisotropy, and a series of measurementsput tighter and tighter upper bounds on it.

Other than the dipole term, there are two ways in which onemight naturally expect anisotropy to occur. There might havebeen some lumpiness in the early universe, which might have servedas seeds for the condensation of galaxy clusters out of the cosmicmedium. Furthermore, we might wonder whether the universe as awhole is rotating. The general-relativistic notion of rotation is verydifferent from the Newtonian one, and in particular, it is possibleto have a cosmology that is rotating without having any center ofrotation (see problem 5, p. 265). In fact one of the first exact solu-tions discovered for the Einstein field equations was the Godel met-ric, which described a bizarre rotating universe with closed timelikecurves, i.e., one in which causality was violated. In a rotating uni-verse, one expects that radiation received from great cosmological

Section 8.2 Cosmological solutions 297

Page 298: General Relativity

distances will have a transverse Doppler shift, i.e., a shift originat-ing from the time dilation due to the motion of the distant matteracross the sky. This shift would be greatest for sources lying in theplane of rotation relative to us, and would vanish for sources lyingalong the axis of rotation. The CMB would therefore show varia-tion with the form of a quadrupole term, 3 cos2 θ− 1. In 1977 a U-2spyplane (the same type involved in the 1960 U.S.-Soviet incident)was used by Smoot et al.14 to search for anisotropies in the CMB.This experiment was the first to definitively succeed in detectingthe dipole anisotropy. After subtraction of the dipole component,the CMB was found to be uniform at the level of ∼ 3× 10−4. Thisprovided strong support for homogeneous cosmological models, andruled out rotation of the universe with ω & 10−22 Hz.

8.2.4 The FRW cosmologies

The FRW metric and the standard coordinates

Motivated by Hubble’s observation that the universe is expand-ing, we hypothesize the existence of solutions of the field equationin which the properties of space are homogeneous and isotropic, butthe over-all scale of space is increasing as described by some scalefunction a(t). Because of coordinate invariance, the metric can stillbe written in a variety of forms. One such form is

ds2 = dt2 − a(t)2 d`2 ,

where the spatial part is

d`2 = f(r) dr2 + r2 dθ2 + r2 sin2 θ dφ2 .

To interpret the coordinates, we note that if an observer is ableto determine the functions a and f for her universe, then she canalways measure some scalar curvature such as the Ricci scalar orthe Kretchmann invariant, and since these are proportional to araised to some power, she can determine a and t. This shows thatt is a “look-out-the-window” time, i.e., a time coordinate that wecan determine by looking out the window and observing the presentconditions in the universe. Because the quantity being measured di-rectly is a scalar, the result is independent of the observer’s state ofmotion. (In practice, these scalar curvatures are difficult to measuredirectly, so we measure something else, like the sky-wide averagetemperature of the cosmic microwave background.) Simultaneity issupposed to be ill-defined in relativity, but the look-out-the-window

14G. F. Smoot, M. V. Gorenstein, and R. A. Muller, “Detection of Anisotropyin the Cosmic Blackbody Radiation,” Phys. Rev. Lett. 39 (1977) 898. Theinterpretation of the CMB measurements is somewhat model-dependent; in theearly years of observational cosmology, it was not even universally accepted thatthe CMB had a cosmological origin. The best model-independent limit on therotation of the universe comes from observations of the solar system, Clemence,“Astronomical Time,” Rev. Mod. Phys. 29 (1957) 2.

298 Chapter 8 Sources

Page 299: General Relativity

time defines a notion of simultaneity that is the most naturally in-teresting one in this spacetime. With this particular definition ofsimultaneity, we can also define a preferred state of rest at any loca-tion in spacetime, which is the one in which t changes as slowly aspossible relative to one’s own clock. This local rest frame, which ismore easily determined in practice as the one in which the microwavebackground is most uniform across the sky, can also be interpretedas the one that is moving along with the Hubble flow, i.e., the av-erage motion of the galaxies, photons, or whatever else inhabits thespacetime. The time t is interpreted as the proper time of a particlethat has always been locally at rest. The spatial distance measuredby L =

∫a d` is called the proper distance. It is the distance that

would be measured by a chain of rulers, each of them “at rest” inthe above sense.

These coordinates are referred as the “standard” cosmologicalcoordinates; one will also encounter other choices, such as the co-moving and conformal coordinates, which are more convenient forcertain purposes. Historically, the solution for the functions a andf was found by de Sitter in 1917.

The spatial metric

The unknown function f(r) has to give a 3-space metric d`2

with a constant Einstein curvature tensor. The following Maximaprogram computes the curvature.

1 load(ctensor);

2 dim:3;

3 ct_coords:[r,theta,phi];

4 depends(f,t);

5 lg:matrix([f,0,0],

6 [0,r^2,0],

7 [0,0,r^2*sin(theta)^2]);

8 cmetric();

9 einstein(true);

Line 2 tells Maxima that we’re working in a space with three di-mensions rather than its default of four. Line 4 tells it that f is afunction of time. Line 9 uses its built-in function for computing theEinstein tensor Gab. The result has only one nonvanishing compo-nent, Gtt = (1− 1/f)/r2. This has to be constant, and since scalingcan be absorbed in the factor a(t) in the 3+1-dimensional metric,we can just set the value of Gtt more or less arbitrarily, except forits sign. The result is f = 1/(1− kr2), where k = −1, 0, or 1.

The resulting metric, called the Robertson-Walker metric, is

ds2 = dt2 − a2

(dr2

1− kr2+ r2 dθ2 + r2 sin2 θ dφ2

).

Section 8.2 Cosmological solutions 299

Page 300: General Relativity

b / 1. In the Euclidean plane, thistriangle can be scaled by anyfactor while remaining similar toitself. 2. In a plane with positivecurvature, geometrical figureshave a maximum area and max-imum linear dimensions. Thistriangle has almost the maximumarea, because the sum of itsangles is nearly 3π. 3. In a planewith negative curvature, figureshave a maximum area but nomaximum linear dimensions. Thistriangle has almost the maximumarea, because the sum of itsangles is nearly zero. Its vertices,however, can still be separatedfrom one another without limit.

The form of d`2 shows us that k can be interpreted in terms ofthe sign of the spatial curvature. We recognize the k = 0 metricas a flat spacetime described in spherical coordinates. To interpretthe k 6= 0 cases, we note that a circle at coordinate r has propercircumference C = 2πar and proper radius R = a

∫ r0

√f(r′) dr′.

For k < 0, we have f < 1 and C > 2πR, indicating negative spatialcurvature. For k > 0 there is positive curvature.

Let’s examine the positive-curvature case more closely. Supposewe select a particular plane of simultaneity defined by t = constantand φ = π/2, and we start doing geometry in this plane. In two spa-tial dimensions, the Riemann tensor only has a single independentcomponent, which can be identified with the Gaussian curvature(sec. 5.4, p. 168), and when this Gaussian curvature is positive andconstant, it can be interpreted as the angular defect of a triangleper unit area (sec. 5.3, p. 162). Since the sum of the interior an-gles of a triangle can never be greater than 3π, we have an upperlimit on the area of any triangle. This happens because the positive-curvature Robertson-Walker metric represents a cosmology that isspatially finite. At a given t, it is the three-dimensional analogueof a two-sphere. On a two-sphere, if we set up polar coordinateswith a given point arbitrarily chosen as the origin, then we knowthat the r coordinate must “wrap around” when we get to the an-tipodes. That is, there is a coordinate singularity there. (We knowit can only be a coordinate singularity, because if it wasn’t, then theantipodes would have special physical characteristics, but the FRWmodel was constructed to be spatially homogeneous.) This “wrap-around” behavior is described by saying that the model is closed.

In the negative-curvature case, there is no limit on distances,b/3. Such a universe is called open. In the case of an open universe,it is particularly easy to demonstrate a fact that bothers many stu-dents, which is that proper distances can grow at rates exceeding c.Let particles A and B both be at rest relative to the Hubble flow.The proper distance between them is then given by L = a`, where` =

∫ BA d` is constant. Then differentiating L with respect to the

look-out-the-window time t gives dL/dt = a`. In an open universe,there is no limit on the size of `, so at any given time, we can makedL/dt as large as we like. This does not violate special relativity,since it is only locally that special relativity is a valid approximationto general relativity. Because GR only supplies us with frames ofreference that are local, the velocity of two objects relative to oneanother is not even uniquely defined; our choice of dL/dt was justone of infinitely many possible definitions.

The distinction between closed and open universes is not justa matter of geometry, it’s a matter of topology as well. Just as atwo-sphere cannot be made into a Euclidean plane without cuttingor tearing, a closed universe is not topologically equivalent to an

300 Chapter 8 Sources

Page 301: General Relativity

open one. The correlation between local properties (curvature) andglobal ones (topology) is a general theme in differential geometry.A universe that is open is open forever, and similarly for a closedone.

The Friedmann equations

Having fixed f(r), we can now see what the field equation tellsus about a(t). The next program computes the Einstein tensor forthe full four-dimensional spacetime:

1 load(ctensor);

2 ct_coords:[t,r,theta,phi];

3 depends(a,t);

4 lg:matrix([1,0,0,0],

5 [0,-a^2/(1-k*r^2),0,0],

6 [0,0,-a^2*r^2,0],

7 [0,0,0,-a^2*r^2*sin(theta)^2]);

8 cmetric();

9 einstein(true);

The result is

Gtt = 3

(a

a

)2

+ 3ka−2

Grr = Gθθ = Gφφ = 2a

a+

(a

a

)2

+ ka−2 ,

where dots indicate differentiation with respect to time.

Since we have Gab with mixed upper and lower indices, we eitherhave to convert it into Gab, or write out the field equations in thismixed form. The latter turns out to be simpler. In terms of mixedindices, gab is always simply diag(1, 1, 1, 1). Arbitrarily singling outr = 0 for simplicity, we have g = diag(1,−a2, 0, 0). The stress-energy tensor is Tµν = diag(ρ,−P ,−P ,−P ). (See example 4 onp. 278 for the signs.) Substituting into Gab = 8πT ab + Λgab, we find

3

(a

a

)2

+ 3ka−2 − Λ = 8πρ

2a

a+

(a

a

)2

+ ka−2 − Λ = −8πP .

Rearranging a little, we have a set of differential equations knownas the Friedmann equations,

a

a=

1

3Λ− 4π

3(ρ+ 3P )(

a

a

)2

=1

3Λ +

3ρ− ka−2 .

Section 8.2 Cosmological solutions 301

Page 302: General Relativity

c / Alexander Friedmann (1888-1925).

d / Example 15.

The cosmology that results from a solution of these differentialequations is known as the Friedmann-Robertson-Walker (FRW) orFriedmann-Lemaıtre-Robertson-Walker (FLRW) cosmology.

The first Friedmann equation describes the rate at which cosmo-logical expansion accelerates or decelerates. Let’s refer to it as theacceleration equation. It expresses the basic idea of the field equa-tions, which is that non-tidal curvature (left-hand side) is causedby the matter that is present locally (right-hand side). Example 15illustrates this in a simple case.

The second Friedmann equation tells us the magnitude of therate of expansion or contraction. Call it the velocity equation. Thequantity a/a, evaluated at the present cosmological time, is theHubble constant Ho (which is constant only in the sense that at afixed time, it is a constant of proportionality between distance andrecession velocity).

To the practiced eye, it seems odd to have two dynamical laws,one predicting velocity and one acceleration. The analogous laws infreshman mechanics would be Newton’s second law, which predictsacceleration, and conservation of energy, which predicts velocity.Newton’s laws and conservation of energy are not independent, andfor mechanical systems either can be derived from the other. TheFriedmann equations, however, are not overdetermined or redun-dant. They are underdetermined, because we want to predict threeunknown functions of time: a, ρ, and P . Since there are only twoequations, they are not sufficient to uniquely determine a solutionfor all three functions. The third constraint comes in the form ofsome type of equation of state for the matter described by ρ and P ,which in simple models can often be written in the form P = wρ.For example, dust has w = 0.

Unlike a, ρ, and P , the cosmological constant Λ is not free tovary with time; if it did, then the stress-energy tensor would have anonvanishing divergence, which is not consistent with the Einsteinfield equations (see p. 294).

Although general relativity does not provide any scalar, globallyconserved measure of mass-energy that is conserved in all space-times, the Friedmann velocity equation can be loosely interpretedas a statement of conservation of mass-energy in an FRW spacetime.The left-hand side acts like kinetic energy. In a cosmology that ex-pands and then recontracts in a Big Crunch, the turn-around pointis defined by the time at which the right-hand side equals zero. Theorigin of the velocity equation is in fact the time-time part of thefield equations, whose source term is the mass-energy component ofthe stress-energy tensor.

302 Chapter 8 Sources

Page 303: General Relativity

e / Georges Lemaıtre (1894-1966) proposed in 1927 that ouruniverse be modeled in generalrelativity as a spacetime in whichspace expanded over time.Lemaıtre’s ideas were initiallytreated skeptically by Eddingtonand Einstein, who told him,“Your calculations are correct,but your physics is abominable.”Later, as Hubble’s observationalevidence for cosmological expan-sion became widely accepted,both Einstein and Eddingtonbecame converts, helping tobring Lemaıtre’s ideas to theattention of the community. In1931, an emboldened Lemaıtredescribed the idea that theuniverse began from a “PrimevalAtom” or “Cosmic Egg.” Thename that eventually stuck was“Big Bang,” coined by Fred Hoyleas a derisive term.

Scooping out a hole Example: 15This example illustrates the connection between cosmological ac-celeration and local density of matter given by the Friedmann ac-celeration equation. Consider two cosmologies, each with Λ = 0.Cosmology 1 is an FRW spacetime in which all matter is in theform of nonrelativistic particles such as atoms or galaxies. 2 isidentical to 1, except that all the matter has been scooped out of asmall spherical region S, leaving a vacuum. (“Small” means smallcompared to the Hubble scale 1/Ho.) Within S, we introduce testparticles A and B. Because an FRW spacetime is homogeneousand isotropic, cosmology 2 retains spherical symmetry about thecenter of S. Since Λ = 0, Birkhoff’s theorem applies to 2, so 2 isflat inside S. Therefore in 2, the relative acceleration a of the testparticles equals zero.

Because S is small compared to cosmological distances, and be-cause the dust is nonrelativistic, local observers can accuratelyattibute the difference in behavior between 1 and 2 to the Newto-nian gravitational force from the dust that was present in 1 but notin 2. For convenience, let A and B both be initially at rest relativeto the local dust (i.e., having θ = φ = 0). By the definition of thescale factor (i.e., by inspection of the FRW metric), the distancebetween them varies as const × a(t). If one of these particles isan observer, she sees a “force” acting on the other particle thatcauses an acceleration (a/a)r, where r is the displacement be-tween the particles.

Since a = 0 in 2, it follows that the acceleration in 1 can be calcu-lated accurately by finding the Newtonian gravitational force dueto the added dust. This results in a connection between a/a, onthe left-hand side of the Friedmann acceleration equation, and ρ,on the right side.

For consistency, we can verify that the Newtonian gravitationalforce exerted by a uniform sphere, at a point on its interior, isproportional to r. This is a classic result that is easily derivedfrom Newton’s shell theorem.

8.2.5 A singularity at the Big Bang

The Friedmann equations only allow a constant a in the casewhere Λ is perfectly tuned relative to the other parameters, andeven this artificially fine-tuned equilibrium turns out to be unsta-ble. These considerations make a static cosmology implausible ontheoretical grounds, and they are also consistent with the observedHubble expansion (p. 296).

Since the universe is not static, what happens if we use generalrelativity to extrapolate farther and farther back in time?

If we extrapolate the Friedmann equations backward in time, wefind that they always have a = 0 at some point in the past, and this

Section 8.2 Cosmological solutions 303

Page 304: General Relativity

occurs regardless of the details of what we assume about the mat-ter and radiation that fills the universe. To see this, note that, asdiscussed in example 14 on page 132, radiation is expected to dom-inate the early universe, for generic reasons that are not sensitiveto the (substantial) observational uncertainties about the universe’spresent-day mixture of ingredients. Under radiation-dominated con-ditions, we can approximate Λ = 0 and P = ρ/3 (example 14, p. 132)in the first Friedmann equation, finding

a

a= −8π

where ρ is the density of mass-energy due to radiation. Since a/ais always negative, the graph of a(t) is always concave down, andsince a is currently increasing, there must be some time in the pastwhen a = 0. One can readily verify that this is not just a coordi-nate singularity; the Ricci scalar curvature Raa diverges, and thesingularity occurs at a finite proper time in the past.

In section 6.3.1, we saw that a black hole contains a singularity,but it appears that such singularities are always hidden behind eventhorizons, so that we can never observe them from the outside. TheFRW singularity, however, is not hidden behind an event horizon.It lies in our past light-cone, and our own world-lines emerged fromit. The universe, it seems, originated in a Big Bang, a conceptthat originated with the Belgian Roman Catholic priest GeorgesLemaıtre.

Self-check: Why is it not correct to think of the Big Bang as anexplosion that occurred at a specific point in space?

Does the FRW singularity represent something real about ouruniverse?

One thing to worry about is the accuracy of our physical model-ing of the radiation-dominated universe. The presence of an initialsingularity in the FRW solutions does not depend sensitively on onassumptions like P = ρ/3, but it is still disquieting that no labo-ratory experiment has ever come close to attaining the conditionsunder which we could test whether a gas of photons produces grav-itational fields as predicted by general relativity. We saw on p. 273that static electric fields do produce gravitational fields as predicted,but this is not the same as an empirical confirmation that elec-tromagnetic waves also act as gravitational sources in exactly themanner that general relativity claims. We do, however, have a con-sistency check in the form of the abundances of nuclei. Calculationsof nuclear reactions in the early, radiation-dominated universe pre-dict certain abundances of hydrogen, helium, and deuterium. Inparticular, the relative abundance of helium and deuterium is a sen-sitive test of the relationships among a, a, and a predicted by theFRW equations, and they confirm these relationships to a precision

304 Chapter 8 Sources

Page 305: General Relativity

of about 5± 4%.15

An additional concern is whether the Big Bang singularity isjust a product of the unrealistic assumption of perfect symmetrythat went into the FRW cosmology. One of the Penrose-Hawkingsingularity theorems proves that it is not.16 This particular sin-gularity theorem requires three conditions: (1) the strong energycondition holds; (2) there are no closed timelike curves; and (3) atrapped surface exists in the past timelike geodesics originating atsome point. The requirement of a trapped surface can fail if the uni-verse is inhomogeneous to & 10−4, but observations of the cosmicmicrowave background rule out any inhomogeneity this large (seep. 297). The other possible failure of the assumptions is that if thecosmological constant is large enough, it violates the strong energyequation, and we can have a Big Bounce rather than a Big Bang(see p. 318).

An exceptional case: the Milne universe

There is still a third loophole in our conclusion that the Big Bangsingularity must have existed. Consider the special case of the FRWanalysis, found by Milne in 1932 (long before FRW), in which theuniverse is completely empty, with ρ = 0 and Λ = 0. This is ofcourse not consistent with the fact that the universe contains starsand galaxies, but we might wonder whether it could tell us anythinginteresting as a simplified approximation to a very dilute universe.The result is that the scale factor a varies linearly with time (prob-lem 3, p. 340). If a is not constant, then there exists a time at whicha = 0, but this doesn’t turn out to be a real singularity (which isn’tsurprising, since there is no matter to create gravitational fields).Let this universe have a scattering of test particles whose massesare too small to invalidate the approximation of ρ = 0, and letthe test particles be at rest in the (r, θ,φ) coordinates. The lineardependence of a on t means that these particles simply move in-ertially and without any gravitational interactions, spreading apartfrom one another at a constant rate like the raisins in a rising loafof raisin bread. The Friedmann equations require k = −1, so thespatial geometry is one of constant negative curvature.

The Milne universe is in fact flat spacetime described in trickycoordinates. The connection can be made as follows. Let a spher-ically symmetric cloud of test particles be emitted by an explosionthat occurs at some arbitrarily chosen event in flat spacetime. Makethe cloud’s density be nonuniform in a certain specific way, so that

15Steigman, Ann. Rev. Nucl. Part. Sci. 57 (2007) 463. These tests arestated in terms of the Hubble “constant” H = a/a, which is actually varyingover cosmological time-scales. The nuclear helium-deuterium ratio is sensitiveto H/H.

16Hawking and Ellis, “The Cosmic Black-Body Radiation and the Existence ofSingularities in Our Universe,” Astrophysical Journal, 152 (1968) 25. Availableonline at articles.adsabs.harvard.edu.

Section 8.2 Cosmological solutions 305

Page 306: General Relativity

every observer moving along with a test particle (called a comovingobserver) sees the same local conditions in his own frame; due toLorentz contraction by a factor γ, this requires that the density beproportional to γ as described by the observer O who remained atthe origin. This scenario turns out to be identical to the Milne uni-verse under the change of coordinates from spatially flat coordinates(T ,R) to FRW coordinates (t, r), where t = T/γ is the proper timeand r = vγ. (Cf. problem 11, p. 201.)

The Milne universe may be useful as an innoculation againstthe common misconception that the Big Bang was an explosion ofmatter spreading out into a preexisting vacuum. Such a descriptionseems obviously incompatible with homogeneity, since, for example,an observer at the edge of the cloud sees the cloud filling only halfof the sky. But isn’t this a logical contradiction, since the Milneuniverse does have an explosion into vacuum, and yet it was derivedas a special case of the FRW analysis, which explicitly assumed ho-mogeneity? It is not a contradiction, because a comoving observernever actually sees an edge. In the limit as we approach the edge,the density of the cloud (as seen by the observer who stayed atthe origin) approaches infinity, and the Lorentz contraction also ap-proaches infinity, so that O considers them to be like Hamlet saying,“I could be bounded in a nutshell, and count myself a king of infinitespace.” This logic only works in the case of the Milne universe. Theexplosion-into-preexisting-vacuum interpretation fails in Big Bangcosmologies with ρ 6= 0.

8.2.6 Observability of expansion

Brooklyn is not expanding!

The proper interpretation of the expansion of the universe, asdescribed by the Friedmann equations, can be tricky. The exampleof the Milne universe encourages us to imagine that the expansionwould be undetectable, since the Milne universe can be described aseither expanding or not expanding, depending on the choice of coor-dinates. A more general consequence of coordinate-independence isthat relativity does not pick out any preferred distance scale. Thatis, if all our meter-sticks expand, and the rest of the universe expandsas well, we would have no way to detect the expansion. The flawin this reasoning is that the Friedmann equations only describe theaverage behavior of spacetime. As dramatized in the classic WoodyAllen movie “Annie Hall:” “Well, the universe is everything, andif it’s expanding, someday it will break apart and that would bethe end of everything!” “What has the universe got to do with it?You’re here in Brooklyn! Brooklyn is not expanding!”

To organize our thoughts, let’s consider the following hypotheses:

1. The distance between one galaxy and another increases at therate given by a(t) (assuming the galaxies are sufficiently dis-

306 Chapter 8 Sources

Page 307: General Relativity

tant from one another that they are not gravitationally boundwithin the same galactic cluster, supercluster, etc.).

2. The wavelength of a photon increases according to a(t) as ittravels cosmological distances.

3. The size of the solar system increases at this rate as well (i.e.,gravitationally bound systems get bigger, including the earthand the Milky Way).

4. The size of Brooklyn increases at this rate (i.e., electromag-netically bound systems get bigger).

5. The size of a helium nucleus increases at this rate (i.e., systemsbound by the strong nuclear force get bigger).

We can imagine that:

• All the above hypotheses are true.

• All the above hypotheses are false, and in fact none of thesesizes increases at all.

• Some are true and some false.

If all five hypotheses were true, the expansion would be unde-tectable, because all available meter-sticks would be expanding to-gether. Likewise if no sizes were increasing, there would be nothingto detect. These two possibilities are really the same cosmology,described in two different coordinate systems. But the Ricci andEinstein tensors were carefully constructed so as to be intrinsic.The fact that the expansion affects the Einstein tensor shows thatit cannot interpreted as a mere coordinate expansion. Specifically,suppose someone tells you that the FRW metric can be made intoa flat metric by a change of coordinates. (I have come across thisclaim on internet forums.) The linear structure of the tensor trans-formation equations guarantees that a nonzero tensor can never bemade into a zero tensor by a change of coordinates. Since the Ein-stein tensor is nonzero for an FRW metric, and zero for a flat metric,the claim is false.

Self-check: The reasoning above implicitly assumed a non-emptyuniverse. Convince yourself that it fails in the special case of theMilne universe.

We can now see some of the limitations of a common metaphorused to explain cosmic expansion, in which the universe is visual-ized as the surface of an expanding balloon. The metaphor correctlygets across several ideas: that the Big Bang is not an explosion thatoccurred at a preexisting point in empty space; that hypothesis 1above holds; and that the rate of recession of one galaxy relative

Section 8.2 Cosmological solutions 307

Page 308: General Relativity

to another is proportional to the distance between them. Neverthe-less the metaphor may be misleading, because if we take a laundrymarker and draw any structure on the balloon, that structure willexpand at the same rate. But this implies that hypotheses 1-5 allhold, which cannot be true.

Since some of the five hypotheses must be true and some false,and we would like to sort out which are which. It should also beclear by now that these are not five independent hypotheses. Forexample, we can test empirically whether the ratio of Brooklyn’ssize to the distances between galaxies changes like a(t), remainsconstant, or changes with some other time dependence, but it isonly the ratio that is actually observable.

Empirically, we find that hypotheses 1 and 2 are true (i.e., thephoton’s wavelength maintains a constant ratio with the intergalac-tic distance scale), while 3, 4, and 5 are false. For example, theorbits of the planets in our solar system have been measured ex-tremely accurately by radar reflection and by signal propagationtimes to space probes, and no expanding trend is detected.

General-relativistic predictions

Does general relativity correctly reproduce these observations?General relativity is mainly a theory of gravity, so it should be wellwithin its domain to explain why the solar system does not ex-pand detectably while intergalactic distances do. It is impracticalto solve the Einstein field equations exactly so as to describe theinternal structure of all the bodies that occupy the universe: galax-ies, superclusters, etc. We can, however, handle simple cases, asin example 20 on page 317, where we display an exact solution forthe case of a universe containing only two things: an isolated blackhole, and an energy density described by a cosmological constant.We find that the characteristic scale of the black hole, i.e., the radiusof its event horizon, does not increase with time. A fuller treatmentof these issues is given on p. 323, after some facts about realis-tic cosmologies have been established. The result is that althoughbound systems like the solar system are in some cases predicted toexpand, the expansion is absurdly small, too small to measure, andmuch smaller than the rate of expansion of the universe in general asrepresented by the scale factor a(t). This agrees with observation.

It is easy to show that atoms and nuclei do not steadily ex-pand over time. because such an expansion would violate eitherthe equivalence principle or the basic properties of quantum me-chanics. One way of stating the equivalence principle is that thelocal geometry of spacetime is always approximately Lorentzian, sothat the laws of physics do not depend on one’s position or state ofmotion. Among these laws of physics are the principles of quantummechanics, which imply that an atom or a nucleus has a well-definedground state, with a certain size that depends only on fundamental

308 Chapter 8 Sources

Page 309: General Relativity

constants such as Planck’s constant and the masses of the particlesinvolved. Atoms and nuclei do experience deformation due to gravi-tational strains (examples 24-25, p. 324), but these deformations donot increase with time, and would only be detectable if cosmologicalexpansion were to accelerate radically (example 26, p. 324).

This is different from the case of a photon traveling across theuniverse. The argument given above fails, because the photon doesnot have a ground state. The photon does expand, and this isrequired by the correspondence principle. If the photon did not ex-pand, then its wavelength would remain constant, and this wouldbe inconsistent with the classical theory of electromagnetism, whichpredicts a Doppler shift due to the relative motion of the sourceand the observer. One can choose to describe cosmological redshiftseither as Doppler shifts or as expansions of wavelength due to cos-mological expansion.

A nice way of discussing atoms, nuclei, photons, and solar sys-tems all on the same footing is to note that in geometrized units,the units of mass and length are the same. Therefore the existenceof any fundamental massive particle sets a universal length scale,one that will be known to any intelligent species anywhere in theuniverse. Since photons are massless, they can’t be used to set auniversal scale in this way; a photon has a certain mass-energy, butthat mass-energy can take on any value. Similarly, a solar systemsets a length scale, but not a universal one; the radius of a planet’sorbit can take on any value. A universe without massive fundamen-tal particles would be a universe without length measurement. Itwould obey the laws of conformal geometry, in which angles andlight-cones were the only measures. This is the reason that atomsand nuclei, which are made of massive fundamental particles, do notexpand.

More than one dimension required

Another good way of understanding why a photon expands,while an atom does not, is to recall that a one-dimensional spacecan never have any intrinsic curvature. If the expansion of atomswere to be detectable, we would need to detect it by comparingagainst some other meter-stick. Let’s suppose that a hydrogen atomexpands more, while a more tightly bound uranium atom expandsless, so that over time, we can detect a change in the ratio of the twoatoms’ sizes. The world-lines of the two atoms are one-dimensionalcurves in spacetime. They are housed in a laboratory, and althoughthe laboratory does have some spatial extent, the equivalence prin-ciple guarantees that to a good approximation, this small spatialextent doesn’t matter. This implies an intrinsic curvature in a one-dimensional space, which is mathematically impossible, so we havea proof by contradiction that atoms do not expand streadily.

Now why does this one-dimensionality argument fail for photons

Section 8.2 Cosmological solutions 309

Page 310: General Relativity

and galaxies? For a pair of galaxies, it fails because the galaxies arenot sufficiently close together to allow them both to be covered bya single Lorentz frame, and therefore the set of world-lines com-prising the observation cannot be approximated well as lying withina one-dimensional space. Similar reasoning applies for cosmologi-cal redshifts of photons received from distant galaxies. One couldinstead propose flying along in a spaceship next to an electromag-netic wave, and monitoring the change in its wavelength while it isin flight. All the world-lines involved in such an experiment wouldindeed be confined to a one-dimensional space. The experiment isimpossible, however, because the measuring apparatus cannot beaccelerated to the speed of light. In reality, the speed of the lightwave relative to the measuring apparatus will always equal c, so thetwo world-lines involved in the experiment will diverge, and will notbe confined to a one-dimensional region of spacetime.

A cosmic girdle Example: 16Since cosmic expansion has no significant effect on Brooklyn, nu-clei, and solar systems, we might be tempted to infer that its ef-fect on any solid body would also be negligible. To see that thisis not true, imagine that we live in a closed universe, and the uni-verse has a leather belt wrapping around it on a closed spacelikegeodesic. All parts of the belt are initially at rest relative to thelocal galaxies, and the tension is initially zero everywhere. Thebelt must stretch and eventually break: for if not, then it couldnot remain everywhere at rest with respect to the local galaxies,and this would violate the symmetry of the initial conditions, sincethere would be no way to pick the direction in which a certain partof the belt should begin accelerating.

Østvang’s quasi-metric relativity Example: 17Over the years, a variety of theories of gravity have been pro-posed as alternatives to general relativity. Some of these, suchas the Brans-Dicke theory, remain viable, i.e., they are consis-tent with all the available experimental data that have been usedto test general relativity. One of the most important reasons fortrying to construct such theories is that it can be impossible tointerpret tests of general relativity’s predictions unless one alsopossesses a theory that predicts something different. This issue,for example, has made it impossible to test Einstein’s century-oldprediction that gravitational effects propagate at c, since there isno viable theory available that predicts any other speed for them(see section 9.1).

Østvang (arxiv.org/abs/gr-qc/0112025v6) has proposed an al-ternative theory of gravity, called quasi-metric relativity, which,unlike general relativity, predicts a significant cosmological ex-pansion of the solar system, and which is claimed to be ableto explain the observation of small, unexplained accelerations ofthe Pioneer space probes that remain after all accelerations due

310 Chapter 8 Sources

Page 311: General Relativity

to known effects have been subtracted (the “Pioneer anomaly”).We’ve seen above that there are a variety of arguments againstsuch an expansion of the solar system, and that many of thesearguments do not require detailed technical calculations but onlyknowledge of certain fundamental principles, such as the struc-ture of differential geometry (no intrinsic curvature in one dimen-sion), the equivalence principle, and the existence of ground statesin quantum mechanics. We therefore expect that Østvang’s the-ory, if it is logically self-consistent, will probably violate these as-sumptions, but that the violations must be relatively small if thetheory is claimed to be consistent with existing observations. Thisis in fact the case. The theory violates the strictest form of theequivalence principle.

Over the years, a variety of explanations have been proposedfor the Pioneer anomaly, including both glamorous ones (a mod-ification of the 1/r2 law of gravitational forces) and others morepedestrian (effects due to outgassing of fuel, radiation pressurefrom sunlight, or infrared radiation originating from the space-crafts radioisotope thermoelectric generator). Calculations by Io-rio17 in 2006-2009 show that if the force law for gravity is modifiedin order to explain the Pioneer anomalies, and if gravity obeys theequivalence principle, then the results are inconsistent with theobserved orbital motion of the satellites of Neptune. This makesgravitational explanations unlikely, but does not obviously rule outØstvang’s theory, since the theory is not supposed to obey theequivalence principle. Østvang says18 that his theory predicts anexpansion of ∼ 1m/yr in the orbit of Triton’s moon Nereid, whichis consistent with observation.

In December 2010, the original discoverers of the effect madea statement in the popular press that they had a new analysis,which they were preparing to publish in a scientific paper, in whichthe size of the anomaly would be drastically revised downward,with a far greater proportion of the acceleration being accountedfor by thermal effects. In my opinion this revision, combined withthe putative effect’s violation of the equivalence principle, make itclear that the anomaly is not gravitational.

Does space expand?

Finally, the balloon metaphor encourages us to interpret cosmo-logical expansion as a phenomenon in which space itself expands,or perhaps one in which new space is produced. Does space reallyexpand? Without posing the question in terms of more rigorouslydefined, empirically observable quantities, we can’t say yes or no. Itis merely a matter of which definitions one chooses and which con-ceptual framework one finds easier and more natural to work within.

17http://arxiv.org/abs/0912.2947v118private communication, Jan. 4, 2010

Section 8.2 Cosmological solutions 311

Page 312: General Relativity

Bunn and Hogg have stated the minority view against expansion ofspace19, while the opposite opinion is given by Francis et al.20

As an example of a self-consistent set of definitions that leadto the conclusion that space does expand, Francis et al. give thefollowing. Define eight observers positioned at the corners of a cube,at cosmological distances from one another. Let each observer beat rest relative to the local matter and radiation that were used asingredients in the FRW cosmology. (For example, we know that ourown solar system is not at rest in this sense, because we observethat the cosmic microwave background radiation is slightly Dopplershifted in our frame of reference.) Then these eight observers willobserve that, over time, the volume of the cube grows as expectedaccording to the cube of the function a(t) in the FRW model.

This establishes that expansion of space is a plausible interpreta-tion. To see that it is not the only possible interpretation, considerthe following example. A photon is observed after having traveledto earth from a distant galaxy G, and is found to be red-shifted. Al-ice, who likes expansion, will explain this by saying that while thephoton was in flight, the space it occupied expanded, lengtheningits wavelength. Betty, who dislikes expansion, wants to interpret itas a kinematic red shift, arising from the motion of galaxy G rela-tive to the Milky Way Malaxy, M. If Alice and Betty’s disagreementis to be decided as a matter of absolute truth, then we need someobjective method for resolving an observed redshift into two terms,one kinematic and one gravitational. But we’ve seen in section 7.4on page 253 that this is only possible for a stationary spacetime,and cosmological spacetimes are not stationary: regardless of anobserver’s state of motion, he sees a change over time in observablessuch as density of matter and curvature of spacetime. As an ex-treme example, suppose that Betty, in galaxy M, receives a photonwithout realizing that she lives in a closed universe, and the pho-ton has made a circuit of the cosmos, having been emitted from herown galaxy in the distant past. If she insists on interpreting this asa kinematic red shift, the she must conclude that her galaxy M ismoving at some extremely high velocity relative to itself. This is infact not an impossible interpretation, if we say that M’s high veloc-ity is relative to itself in the past. An observer who sets up a frameof reference with its origin fixed at galaxy G will happily confirmthat M has been accelerating over the eons. What this demonstratesis that we can split up a cosmological red shift into kinematic andgravitational parts in any way we like, depending on our choice ofcoordinate system (see also p. 260).

A cosmic whip Example: 18The cosmic girdle of example 16 on p. 310 does not transmit any

19http://arxiv.org/abs/0808.1081v220http://arxiv.org/abs/0707.0380v1

312 Chapter 8 Sources

Page 313: General Relativity

information from one part of the universe to another, for its stateis the same everywhere by symmetry, and therefore an observernear one part of the belt gets no information that is any differentfrom what would be available to an observer anywhere else.

Now suppose that the universe is open rather than closed, butwe have a rope that, just like the belt, stretches out over cosmicdistances along a spacelike geodesic. If the rope is initially atrest with respect to a particular galaxy G (or, more strictly speak-ing, with respect to the locally averaged cosmic medium), then bysymmetry the rope will always remain at rest with respect to G,since there is no way for the laws of physics to pick a direction inwhich it should accelerate. Now the residents of G cut the rope,release half of it, and tie the other half securely to one of G’s spi-ral arms using a square knot. If they do this smoothly, withoutvarying the rope’s tension, then no vibrations will propagate, andeverything will be as it was before on that half of the rope. (Weassume that G is so massive relative to the rope that the ropedoes not cause it to accelerate significantly.)

Can observers at distant points observe the tail of the rope whip-ping by at a certain speed, and thereby infer the velocity of Grelative to them? This would produce all kinds of strange con-clusions. For one thing, the Hubble law says that this velocity isdirectly proportional to the length of the rope, so by making therope long enough we could make this velocity exceed the speedof light. We’ve also convinced ourselves that the relative veloc-ity of cosmologically distant objects is not even well defined ingeneral relativity, so it clearly can’t make sense to interpret therope-end’s velocity in that way.

The way out of the paradox is to recognize that disturbances canonly propagate along the rope at a certain speed v . Let’s say thatthe information is transmitted in the form of longitudinal vibrations,in which case it propagates at the speed of sound. For a ropemade out of any known material, this is far less than the speedof light, and we’ve also seen in example 15 on page 65 and inproblem 4 on page 85 that relativity places fundamental limitson the properties of all possible materials, guaranteeing v < c.We can now see that all we’ve accomplished with the rope is torecapitulate using slower sound waves the discussion that wascarried out on page 312 using light waves. The sound wavesmay perhaps preserve some information about the state of motionof galaxy G long ago, but all the same ambiguities apply to itsinterpretation as in the case of light waves — and in addition, wesuspect that the rope has long since parted somewhere along itslength.

Section 8.2 Cosmological solutions 313

Page 314: General Relativity

8.2.7 The vacuum-dominated solution

For 70 years after Hubble’s discovery of cosmological expansion,the standard picture was one in which the universe expanded, butthe expansion must be decelerating. The deceleration is predictedby the special cases of the FRW cosmology that were believed tobe applicable, and even if we didn’t know anything about generalrelativity, it would be reasonable to expect a deceleration due to themutual Newtonian gravitational attraction of all the mass in theuniverse.

But observations of distant supernovae starting around 1998 in-troduced a further twist in the plot. In a binary star system con-sisting of a white dwarf and a non-degenerate star, as the non-degenerate star evolves into a red giant, its size increases, and itcan begin dumping mass onto the white dwarf. This can cause thewhite dwarf to exceed the Chandrasekhar limit (page 144), resultingin an explosion known as a type Ia supernova. Because the Chan-drasekhar limit provides a uniform set of initial conditions, the be-havior of type Ia supernovae is fairly predictable, and in particulartheir luminosities are approximately equal. They therefore providea kind of standard candle: since the intrinsic brightness is known,the distance can be inferred from the apparent brightness. Giventhe distance, we can infer the time that was spent in transit by thelight on its way to us, i.e. the look-back time. From measurementsof Doppler shifts of spectral lines, we can also find the velocity atwhich the supernova was receding from us. The result is that wecan measure the universe’s rate of expansion as a function of time.Observations show that this rate of expansion has been accelerating.The Friedmann equations show that this can only occur for Λ & 4ρ.This picture has been independently verified by measurements of thecosmic microwave background (CMB) radiation. A more detaileddiscussion of the supernova and CMB data is given in section 8.2.11on page 326.

With hindsight, we can see that in a quantum-mechanical con-text, it is natural to expect that fluctuations of the vacuum, requiredby the Heisenberg uncertainty principle, would contribute to the cos-mological constant, and in fact models tend to overpredict Λ by afactor of about 10120! From this point of view, the mystery is whythese effects cancel out so precisely. A correct understanding of thecosmological constant presumably requires a full theory of quantumgravity, which is presently far out of our reach.

The latest data show that our universe, in the present epoch, isdominated by the cosmological constant, so as an approximation we

314 Chapter 8 Sources

Page 315: General Relativity

can write the Friedmann equations as

a

a=

1

3Λ(

a

a

)2

=1

3Λ .

This is referred to as a vacuum-dominated universe. The solution is

a = exp

[√Λ

3t

],

where observations show that Λ ∼ 10−26 kg/m3, giving√

3/Λ ∼1011 years.

The implications for the fate of the universe are depressing. Allparts of the universe will accelerate away from one another faster andfaster as time goes on. The relative separation between two objects,say galaxy A and galaxy B, will eventually be increasing faster thanthe speed of light. (The Lorentzian character of spacetime is local,so relative motion faster than c is only forbidden between objectsthat are passing right by one another.) At this point, an observerin either galaxy will say that the other one has passed behind anevent horizon. If intelligent observers do actually exist in the farfuture, they may have no way to tell that the cosmos even exists.They will perceive themselves as living in island universes, such aswe believed our own galaxy to be a hundred years ago.

When I introduced the standard cosmological coordinates onpage 299, I described them as coordinates in which events thatare simultaneous according to this t are events at which the localproperties of the universe are the same. In the case of a perfectlyvacuum-dominated universe, however, this notion loses its meaning.The only observable local property of such a universe is the vacuumenergy described by the cosmological constant, and its density is al-ways the same, because it is built into the structure of the vacuum.Thus the vacuum-dominated cosmology is a special one that maxi-mally symmetric, in the sense that it has not only the symmetries ofhomogeneity and isotropy that we’ve been assuming all along, butalso a symmetry with respect to time: it is a cosmology withouthistory, in which all times appear identical to a local observer. Inthe special case of this cosmology, the time variation of the scalingfactor a(t) is unobservable, and may be thought of as the unfortu-nate result of choosing an inappropriate set of coordinates, whichobscure the underlying symmetry. When I argued in section 8.2.6for the observability of the universe’s expansion, note that all myarguments assumed the presence of matter or radiation. These arecompletely absent in a perfectly vacuum-dominated cosmology.

For these reasons de Sitter originally proposed this solution as astatic universe in 1927. But by 1920 it was realized that this was an

Section 8.2 Cosmological solutions 315

Page 316: General Relativity

oversimplification. The argument above only shows that the timevariation of a(t) does not allow us to distinguish one epoch of theuniverse from another. That is, we can’t look out the window andinfer the date (e.g., from the temperature of the cosmic microwavebackground radiation). It does not, however, imply that the uni-verse is static in the sense that had been assumed until Hubble’sobservations. The r-t part of the metric is

ds2 = dt2 − a2 dr2 ,

where a blows up exponentially with time, and the k-dependencehas been neglected, as it was in the approximation to the Friedmannequations used to derive a(t).21 Let a test particle travel in the radialdirection, starting at event A = (0, 0) and ending at B = (t′, r′). Inflat space, a world-line of the linear form r = vt would be a geodesicconnecting A and B; it would maximize the particle’s proper time.But in the this metric, it cannot be a geodesic. The curvature ofgeodesics relative to a line on an r-t plot is most easily understoodin the limit where t′ is fairly long compared to the time-scale T =√

3/Λ of the exponential, so that a(t′) is huge. The particle’s beststrategy for maximizing its proper time is to make sure that its dris extremely small when a is extremely large. The geodesic musttherefore have nearly constant r at the end. This makes it sound asthough the particle was decelerating, but in fact the opposite is true.If r is constant, then the particle’s spacelike distance from the originis just ra(t), which blows up exponentially. The near-constancy ofthe coordinate r at large t actually means that the particle’s motionat large t isn’t really due to the particle’s inertial memory of itsoriginal motion, as in Newton’s first law. What happens insteadis that the particle’s initial motion allows it to move some distanceaway from the origin during a time on the order of T , but after that,the expansion of the universe has become so rapid that the particle’smotion simply streams outward because of the expansion of spaceitself. Its initial motion only mattered because it determined howfar out the particle got before being swept away by the exponentialexpansion.

Geodesics in a vacuum-dominated universe Example: 19In this example we confirm the above interpretation in the specialcase where the particle, rather than being released in motion atthe origin, is released at some nonzero radius r , with dr/dt = 0initially. First we recall the geodesic equation

d2 x i

dλ2 = Γ ijk

dx j

dλdxk

dλ.

21A computation of the Einstein tensor with ds2 = dt2 − a2(1 − kr2)−1 dr2

shows that k enters only via a factor the form (. . .)e(...)t+(. . .)k. For large t, thek term becomes negligible, and the Einstein tensor becomes Gab = gabΛ, This isconsistent with the approximation we used in deriving the solution, which wasto ignore both the source terms and the k term in the Friedmann equations.The exact solutions with Λ > 0 and k = −1, 0, and 1 turn out in fact to beequivalent except for a change of coordinates.

316 Chapter 8 Sources

Page 317: General Relativity

from page 178. The nonvanishing Christoffel symbols for the 1+1-dimensional metric ds2 = dt2−a2 dr2 are Γ r

tr = a/a and Γ tr r = aa.

Setting T = 1 for convenience, we have Γ rtr = 1 and Γ t

r r = e−2t .

We conjecture that the particle remains at the same value of r .Given this conjecture, the particle’s proper time

∫ds is simply the

same as its time coordinate t , and we can therefore use t as anaffine coordinate. Letting λ = t , we have

d2 tdt2 − Γ

tr r

(drdt

)2

= 0

0− Γ tr r r2 = 0

r = 0r = constant

This confirms the self-consistency of the conjecture that r = constantis a geodesic.

Note that we never actually had to use the actual expressions forthe Christoffel symbols; we only needed to know which of themvanished and which didn’t. The conclusion depended only on thefact that the metric had the form ds2 = dt2 − a2 dr2 for somefunction a(t). This provides a rigorous justification for the inter-pretation of the cosmological scale factor a as giving a universaltime-variation on all distance scales.

The calculation also confirms that there is nothing special aboutr = 0. A particle released with r = 0 and r = 0 initially stays atr = 0, but a particle released at any other value of r also staysat that r . This cosmology is homogeneous, so any point couldhave been chosen as r = 0. If we sprinkle test particles, all atrest, across the surface of a sphere centered on this arbitrarilychosen point, then they will all accelerate outward relative to oneanother, and the volume of the sphere will increase. This is ex-actly what we expect. The Ricci curvature is interpreted as thesecond derivative of the volume of a region of space defined bytest particles in this way. The fact that the second derivative ispositive rather than negative tells us that we are observing thekind of repulsion provided by the cosmological constant, not theattraction that results from the existence of material sources.

Schwarzschild-de Sitter space Example: 20

The metric

ds2 =(

1− 2mr− 1

3Λr2

)dt2− dr2

1− 2mr −

13Λr2

−r2 dθ2−r2 sin2 θdφ2

is an exact solution to the Einstein field equations with cosmo-logical constant Λ, and can be interpreted as a universe in which

Section 8.2 Cosmological solutions 317

Page 318: General Relativity

the only mass is a black hole of mass m located at r = 0. Nearthe black hole, the Λ terms become negligible, and this is simplythe Schwarzschild metric. As argued in section 8.2.6, page 306,this is a simple example of how cosmological expansion does notcause all structures in the universe to grow at the same rate.

Conservation of energy-momentum Example: 21Suppose that we assume the de Sitter geometry, and ask whattype of matter fields are necessary to create it. We know that acosmological constant will do the job, but could we have someother matter field that would also work? Suppose that the matterfield is constrained to be a perfect fluid. The total stress-energyis then of the form T µν = diag(ρ,−P,−P,−P) in Cartesian coordi-nates. (See example 4 on p. 278 for the signs, some of which de-pend on our use of the +−−− signature.) The divergence ∇µT µtmeasures the rate at which an observer says energy is being cre-ated, and we need this to be zero. This expression is one of thosetricky examples where the covariant derivative can be nonzeroeven when the thing being differentiated vanishes identically. Thedivergence is ∇tT t

t +∇xT xt , and the term that doesn’t vanish is

the second one, even though T xt = 0. Using the nonvanishing

Christoffel symbols this becomes Γ xxtT

tt − Γ x

txT xx = a

a (ρ + P), sothat ρ + P = 0. This condition is satisfied by a cosmological con-stant. Our result is that the only way to get a de Sitter geometryis with matter fields that exactly mimic a cosmological constant.This is of some historical interest in the context of the steady-statecosmologies, section 8.4, p. 335. It may seem mysterious thatwe have obtained this result by requiring conservation of energy-momentum, but we could also have done it using the Einstein fieldequations. In fact these are not two separate requirements, sincethe field equations require conservation of energy-momentum inorder to be consistent.

The Big Bang singularity in a universe with a cosmologicalconstant

On page 305 we discussed the possibility that the Big Bangsingularity was an artifact of the unrealistically perfect symmetryassumed by our cosmological models, and we found that this wasnot the case: the Penrose-Hawking singularity theorems demon-strate that the singularity is real, provided that the cosmologicalconstant is zero. The cosmological constant is not zero, however.Models with a very large positive cosmological constant can alsodisplay a Big Bounce rather than a Big Bang. If we imagine us-ing the Friedmann equations to evolve the universe backward intime from its present state, the scaling arguments of example 14 onpage 132 suggest that at early enough times, radiation and mat-ter should dominate over the cosmological constant. For a largeenough value of the cosmological constant, however, it can happen

318 Chapter 8 Sources

Page 319: General Relativity

that this switch-over never happens. In such a model, the universeis and always has been dominated by the cosmological constant,and we get a Big Bounce in the past because of the cosmologicalconstant’s repulsion. In this book I will only develop simple cos-mological models in which the universe is dominated by a singlecomponent; for a discussion of bouncing models with both matterand a cosmological constant, see Carroll, “The Cosmological Con-stant,” http://www.livingreviews.org/lrr-2001-1. By 2008, avariety of observational data had pinned down the cosmological con-stant well enough to rule out the possibility of a bounce caused bya very strong cosmological constant.

8.2.8 The matter-dominated solution

Our universe is not perfectly vacuum-dominated, and in the pastit was even less so. Let us consider the matter-dominated epoch,in which the cosmological constant was negligible compared to thematerial sources. The equation of state for nonrelativistic matter(p. 132) is

P = 0 .

The dilution of the dust with cosmological expansion gives

ρ ∝ a−3

(see example 23). The Friedmann equations become

a

a= −4π

3ρ(

a

a

)2

=8π

3ρ− ka−2 ,

where for compactness ρ’s dependence on a, with some constantof proportionality, is not shown explicitly. A static solution, withconstant a, is impossible, and a is negative, which we can interpretin Newtonian terms as the deceleration of the matter in the universedue to gravitational attraction. There are three cases to consider,according to the value of k.

The closed universe

We’ve seen that k = +1 describes a universe in which the spatialcurvature is positive, i.e., the circumference of a circle is less thanits Euclidean value. By analogy with a sphere, which is the two-dimensional surface of constant positive curvature, we expect thatthe total volume of this universe is finite.

The second Friedmann equation also shows us that at some valueof a, we will have a = 0. The universe will expand, stop, and thenrecollapse, eventually coming back together in a “Big Crunch” whichis the time-reversed version of the Big Bang.

Suppose we were to describe an initial-value problem in thiscosmology, in which the initial conditions are given for all points in

Section 8.2 Cosmological solutions 319

Page 320: General Relativity

the universe on some spacelike surface, say t = constant. Since theuniverse is assumed to be homogeneous at all times, there are reallyonly three numbers to specify, a, a, and ρ: how big is the universe,how fast is it expanding, and how much matter is in it? But thesethree pieces of data may or may not be consistent with the secondFriedmann equation. That is, the problem is overdetermined. Inparticular, we can see that for small enough values of ρ, we donot have a valid solution, since the square of a/a would have to benegative. Thus a closed universe requires a certain amount of matterin it. The present observational evidence (from supernovae and thecosmic microwave background, as described above) is sufficient toshow that our universe does not contain this much matter.

The flat universe

The case of k = 0 describes a universe that is spatially flat.It represents a knife-edge case lying between the closed and openuniverses. In a Newtonian analogy, it represents the case in whichthe universe is moving exactly at escape velocity; as t approachesinfinity, we have a → ∞, ρ → 0, and a → 0. This case, unlike theothers, allows an easy closed-form solution to the motion. Let theconstant of proportionality in the equation of state ρ ∝ a−3 be fixedby setting −4πρ/3 = −ca−3. The Friedmann equations are

a = −ca−2

a =√

2ca−1/2 .

Looking for a solution of the form a ∝ tp, we find that by choosingp = 2/3 we can simultaneously satisfy both equations. The constantc is also fixed, and we can investigate this most transparently byrecognizing that a/a is interpreted as the Hubble constant, H, whichis the constant of proportionality relating a far-off galaxy’s velocityto its distance. Note that H is a “constant” in the sense that it isthe same for all galaxies, in this particular model with a vanishingcosmological constant; it does not stay constant with the passageof cosmological time. Plugging back into the original form of theFriedmann equations, we find that the flat universe can only exist ifthe density of matter satisfies ρ = ρcrit = 3H2/8π = 3H2/8πG. Theobserved value of the Hubble constant is about 1/(14× 109 years),which is roughly interpreted as the age of the universe, i.e., theproper time experienced by a test particle since the Big Bang. Thisgives ρcrit ∼ 10−26 kg/m3.

As discussed in subsection 8.2.11, our universe turns out tobe almost exactly spatially flat. Although it is presently vacuum-dominated, the flat and matter-dominated FRW cosmology is a use-ful description of its matter-dominated era.

The open universe

The k = −1 case represents a universe that has negative spatialcurvature, is spatially infinite, and is also infinite in time, i.e., even

320 Chapter 8 Sources

Page 321: General Relativity

if the cosmological constant had been zero, the expansion of the uni-verse would have had too little matter in it to cause it to recontractand end in a Big Crunch.

The time-reversal symmetry of general relativity was discussedon p. 213 in connection with the Schwarzschild metric.22 Becauseof this symmetry, we expect that solutions to the field equationswill be symmetric under time reversal (unless asymmetric boundaryconditions were imposed). The closed universe has exactly this typeof time-reversal symmetry. But the open universe clearly breaks thissymmetry, and this is why we speak of the Big Bang as lying in thepast, not in the future. This is an example of spontaneous symmetrybreaking. Spontaneous symmetry breaking happens when we try tobalance a pencil on its tip, and it is also an important phenomenonin particle physics. The time-reversed version of the open universeis an equally valid solution of the field equations. Another exampleof spontaneous symmetry breaking in cosmological solutions is thatthe solutions have a preferred frame of reference, which is the one atrest relative to the cosmic microwave background and the averagemotion of the galaxies. This is referred to as the Hubble flow.

Size and age of the observable universe Example: 22The observable universe is defined by the region from which

light has had time to reach us since the Big Bang. Many peopleare inclined to assume that its radius in units of light-years musttherefore be equal to the age of the universe expressed in years.This is not true. Cosmological distances like these are not evenuniquely defined, because general relativity only has local framesof reference, not global ones.

Suppose we adopt the proper distance L defined on p. 299 asour measure of radius. By this measure, realistic cosmologicalmodels say that our 14-billion-year-old universe has a radius of46 billion light years.

For a flat universe, f = 1, so by inspecting the FRW metric wefind that a photon moving radially with ds = 0 has |dr/dt | =a−1, giving r = ±

∫ t2t1

dt/a. Suppressing signs, the proper dis-tance the photon traverses starting soon after the Big Bang isL = a(t2)

∫d` = a(t2)

∫dr = a(t2)r = a(t2)

∫ t2t1

dt/a.

In the matter-dominated case, a ∝ t2/3, so this results in L = 3t2in the limit where t1 is small. Our universe has spent most ofits history being matter-dominated, so it’s encouraging that thematter-dominated calculation seems to do a pretty good job ofreproducing the actual ratio of 46/14=3.3 between L and t2.

While we’re at it, we can see what happens in the purely vacuum-dominated case, which has a ∝ et/T , where T =

√3/Λ. This

22Problem 5 on p. 340 shows that this symmetry is also exhibited by theFriedmann equations.

Section 8.2 Cosmological solutions 321

Page 322: General Relativity

cosmology doesn’t have a Big Bang, but we can think of it as anapproximation to the more recent history of the universe, gluedon to an earlier matter-dominated solution. Here we find L =[e(t2−t1)/T − 1

]T , where t1 is the time when the switch to vacuum-

domination happened. This function grows more quickly with t2than the one obtained in the matter-dominated case, so it makessense that the real-world ratio of L/t2 is somewhat greater thanthe matter-dominated value of 3.

The radiation-dominated version is handled in problem 12 on p. 341.

Local conservation of mass-energy Example: 23Any solution to the Friedmann equations is a solution of the fieldequations, and therefore locally conserves mass-energy. We savedwork above by applying this condition in advance in the formρ ∝ a−3 to make the dust dilute itself properly with cosmologi-cal expansion. In this example we prove the same proportionalityby explicit calculation.

Local conservation of mass-energy is expressed by the zero di-vergence of the stress-energy tensor, ∇jT jb = 0. The definition ofthe covariant derivative gives

∇jT bc = ∂jT bc + ΓbjdT dc + Γ c

jdT bd .

For convenience, we carry out the calculation at r = 0; if conser-vation holds here, then it holds everywhere by homogeneity.

In a local Cartesian frame (t ′, x ′, y ′, z ′) at rest relative to the dust,the stress-energy tensor is diagonal with T t ′t ′ = ρ. At r = 0,the transformation from FLRW coordinates into these coordinatesdoesn’t mix t or t ′ with the other coordinates, so by the tensortransformation law we still have T t t = ρ.

There are a number of Christoffel symbols involved, but the onlythree of relevance that don’t vanish at r = 0 turn out to be Γ r

r t =Γθθt = Γφφt = a/a. The result is

∇µT tµ = ∂tT t t + 3aa

T t t ,

or ρ/ρ = −3a/a, which can be rewritten as

ddt

ln ρ = −3ddt

ln a ,

producing the proportionality originally claimed.

8.2.9 The radiation-dominated solution

For the reasons discussed in example 14 on page 132, the earlyuniverse was dominated by radiation. The solution of the Friedmannequations for this case is taken up in problem 11 on page 341.

322 Chapter 8 Sources

Page 323: General Relativity

8.2.10 Local effects of expansion

In this section we discuss the predictions of general relativityconcerning the effect of cosmological expansion on small, gravita-tionally bound systems such as the solar system or clusters of galax-ies. The short answer is that in most realistic cosmologies (but notnecessarily in “Big Rip” scenarios, p. 324) the effect of expansionis not zero, but is many orders of magnitude too small to measure.Many readers will probably be willing to accept these assertionswhile skipping the following demonstrations.

To begin with, we observe that there are two qualitatively dis-tinct types of effects that could exist. Suppose that a loaf of raisinbread is rising. Let’s say that the loaf’s scale factor a doubles bythe time the yeast’s efforts are spent. By definition, this means thatthe raisins (galaxies, test particles) get farther apart by a factor of2. We could imagine that in addition: (1) the strain of expansioncould cause each raisin to puff up by, say, 1%, and to maintain thisincreased size over the entire course of expansion; or that (2) expan-sion could could cause each raisin to expand gradually, to 0.2% morethan its original size, then 0.4% more than its original size, and soon, until, at the end of the process, each had grown beyond its orig-inal size by some amount such as 3.8%, which, while less than the100% growth of the inter-raisin distances, was nevertheless nonzero.Astronomers refer to the second possibility as a “secular” trend. Forexample, simulations of solar systems often show that over billionsof years, planets gradually migrate either inward or outward, underthe influence of their gravitational interactions with other planets.As an example of an expansion without a secular trend, asteroidsmay experience a nonnegligible 1/r2 force due to radiation pressurefrom the sun. The effect is exactly as if the sun’s mass or the gravi-tational constant had been slightly reduced. Kepler’s elliptical orbitlaw holds, the law of periods is slightly off, and the orbital radiusshows zero trend over time.

If either type of effect exists, an observer in some local inertialframe will interpret it as a “force.” (The scare quotes are a reminderthat general relativity doesn’t describe gravity as Newton-style lin-early additive, instantaneous action at a distance.) Such a force,if it exists, cannot simply be proportional to the rate of expansiona/a. As a counterexample, the Milne universe is just flat spacetimedescribed in silly coordinates, and it has a 6= 0.

It would make more sense for the force to depend on the secondderivative of the scale factor. To justify this more precisely, imaginereleasing two test particles, initially separated by some distance thatis much less than the Hubble scale. They are initially at rest relativeto the Hubble flow, and no locally gravitating bodies are present.As discussed in example 15 on p. 303, the acceleration of one testparticle relative to the other is given by (a/a)r, where r is their

Section 8.2 Cosmological solutions 323

Page 324: General Relativity

relative displacement.

Thus if we are to observe any nonzero effects of expansion on alocal system, they are not really effects of expansion at all, but effectsof the acceleration of expansion. The factor a/a is on the order ofthe inverse square of the age of the universe, i.e., H2

o ∼ 10−35 s−2.The smallness of this factor is what makes the effect on a systemsuch as the solar system so absurdly tiny.

A human body Example: 24Let’s estimate the effect of cosmological expansion on the lengthL of your thigh bone. The body is made of atoms, and for the rea-sons given on p. 309, there can be no steady trend in the sizes ofthese atoms or the lengths of the chemical bonds between them.The bone experiences a stress due to cosmological expansion,but it is in equilibrium, and the strain will disappear if the gravi-tational stress is removed (e.g., if other gravitational stresses aresuperimposed on top of the cosmological one in order to cancelit). The anomalous acceleration between the ends of the boneis (a/a)L, which is observed as an anomalous stress. Takinga/a ∼ H2

o , the anomalous acceleration of one end of the bonerelative to the other is ∼ LH2. The corresponding compressionor tension is ∼ mLH2, where m is your body mass. The resultingstrain is ε ∼ mLH2/AE , where E is the Young’s modulus of bone(about 1010 Pa) and A is the bone’s cross-sectional area.

Putting in numbers, the result for the strain is about 10−40, whichis much too small to be measurable by any imaginable technique,and would in reality be swamped by other effects. Since the signof a is currently positive, this strain is tensile, not compressive. Inthe earlier, matter-dominated era of the universe, it would havebeen compressive.

There is no “secular trend,” i.e., your leg bone is not expandingover time. It’s in equilibrium, and is simply elongated impercepti-bly compared to the length if would have had without the effect ofcosmological expanson.

Strain on an atomic nucleus Example: 25The estimate in example 24 can also be applied to an atomicnucleus, which has a nuclear “Young’s modulus” on the order of1 MeV/fm3 ∼ 1032 Pa. The result is a strain ε ∼ 10−52.

A Big Rip Example: 26Known forms of matter are believed to have equations of state

P = wρ with w ≥ −1. The value for a vacuum-dominated uni-verse would be w = −1. Cosmological observations23 show thatempirically the present-day universe behaves as if it is made outof stuff with w = −1.03± .16, and this leaves open the possibilityof w < −1. In this case, the solution to the Friedmann equations

23Carnero et al., arxiv.org/abs/1104.5426

324 Chapter 8 Sources

Page 325: General Relativity

gives a scale factor a(t) that blows up to infinity at some finite t . Insuch a scenario, known as a “Big Rip,” (d/dt)(a/a) diverges, andany system, no matter how tightly bound, is ripped apart.24

Examples 24-26 show that except under hypothetical extremecosmological conditions, there is no hope of detecting any effect ofcosmological expansion on systems made of condensed matter. Weneed to look at much larger systems to see any effect, and suchsystems are held together by gravity. For concreteness, let’s keeptalking about the earth-sun system. Not only is the anomalousforce on the earth small, it is not guaranteed to produce any seculartrend, which is what would be most likely to be detectable. Thedirection of the anomalous force on the earth is outward for anaccelerating cosmological expansion, as we now know is the case forthe present epoch. As an example in which no secular trend occurs, avacuum-dominated cosmology gives a constant value for a/a, so theoutward force is constant. As with the effect of radiation pressure,the existence of this constant, outward force is very nearly equivalentto rescaling the sun’s gravitational force by a tiny amount, so themotion is still very nearly Keplerian, but with a slightly “wrong”constant of proportionality in Kepler’s law of periods. The rate ofchange r in the radius of the circular orbit is therefore zero in thiscase.

But in most cosmologies a/a is not exactly constant, and theanomalous force on the earth varies. In a matter-dominated cos-mology with Λ = 0, in its expanding phase, the force is inwardbut decreasing over time, so the orbit expands over time. Whatreally matters then, is (d/dt)(a/a). If we were free to pick anyfunction for a(t), we could make up examples in which a > 0 but(d/ dt)(a/a) < 0, so that the solar system would respond to cosmo-logical expansion by shrinking!

The function a(t), however, has to satisfy the Friedmann equa-tions, one of which is (in units with G 6= 1)

a

a= G

[1

3Λ− 4π

3(ρ+ 3P )

].

The present epoch of the universe seems to be well modeled by darkenergy described by a constant Λ plus dust with P ρ. Differenti-ating both sides with respect to time gives

d

dt

(a

a

)∝ ρ ,

with a negative constant of proportionality. This ensures that thesign of the effect is always as expected from the naive Manichean im-age of binding forces struggling against cosmological expansion (or

24Caldwell et al., arxiv.org/abs/astro-ph/0302506

Section 8.2 Cosmological solutions 325

Page 326: General Relativity

f / The angular scale of fluc-tuations in the cosmic microwavebackground can be used to inferthe curvature of the universe.

perhaps cooperating during the contracting phase of a Big Crunchcosmology).

One way of understanding why this reduces so nicely to a de-pendence on ρ is the reasoning given in example 15 on p. 303,in which we found that the relative acceleration of two test par-ticles A and B in a matter-dominated FRW cosmology could becalculated accurately by pretending that it was due to the pres-ence of the dust in any given sphere S surrounding the two par-ticles. We now let A be the sun, B the earth, and S a spherecentered on the sun whose radius equals the radius of the earth’scircular orbit. Due to cosmological expansion, the dust inside Sthins out with time, reducing its density ρ. Applying Newton’slaws to the orbit of the earth gives ω2r = GM/r2, and conserva-tion of angular momentum results in ωr2 = const. A calculationgives r/ro = [M + (4π/3)ρor

3o]/[M + (4π/3)ρr3], which results in

r/ro ≈ −(4π/3)Gω−2o ρ. Application of the Friedmann equations

yieldsr/ro = ω−2

o (d/dt)(a/a) ,

which is valid generally, not just for P = 0. The ω−2o factor shows

that the effect is smaller for more tightly bound systems.

We know that the universe in the present era has (d/dt)(a/a) > 0because ρ < 0, and for purposes of an order-of-magnitude estimatewe can take (d/dt)(a/a) ∼ H3

o . Plugging in numbers for the earth-sun system, we find that since the age of the dinosaurs, the radius ofthe earth’s orbit has grown by less than the diameter of an atomicnucleus.25

8.2.11 Observation

Historically, it was believed that the cosmological constant waszero, that nearly all matter in the universe was in the form of atoms,and that there was therefore only one interesting cosmological pa-rameter to measure, which was the average density of matter. Thisdensity was very difficult to determine, even to within an order ofmagnitude, because most of the matter in the universe probablydoesn’t emit light, making it difficult to detect. Astronomical dis-tance scales were also very poorly calibrated against absolute unitssuch as the SI. Starting around 1995, however, a new set of tech-niques led to an era of high-precision cosmology.

Spatial curvature from CMB fluctuations

A strong constraint on the models comes from accurate mea-surements of the cosmic microwave background, especially by the1989-1993 COBE probe, and its 2001-2009 successor, the WilkinsonMicrowave Anisotropy Probe, positioned at the L2 Lagrange point

25The picturesque image comes from Cooperstock et al., http://arxiv.org/abs/astro-ph/9803097v1, who give a different calculation leading to a result forr exactly equivalent to the one derived here.

326 Chapter 8 Sources

Page 327: General Relativity

g / A Hubble plot for distantsupernovae. Each data pointrepresents an average overseveral different supernovae withnearly the same z.

h / The acoustic peak in theBAO correlation function.Redrawn from Eisenstein,New Astronomy Reviews. 49(2005) 360, as reproducedin Bassett and Hlozek, 2009,arxiv.org/abs/0910.5224.

of the earth-sun system, beyond the Earth on the line connectingsun and earth.26 The temperature of the cosmic microwave back-ground radiation is not the same in all directions, and it can bemeasured at different angles. In a universe with negative spatialcurvature, the sum of the interior angles of a triangle is less thanthe Euclidean value of 180 degrees. Therefore if we observe a varia-tion in the CMB over some angle, the distance between two pointson the surface of last scattering is actually greater than would havebeen inferred from Euclidean geometry. The distance scale of suchvariations is limited by the speed of sound in the early universe, soone can work backward and infer the universe’s spatial curvaturebased on the angular scale of the anisotropies. The measurementsof spatial curvature are usually stated in terms of the parameter Ω,defined as the total average density of all source terms in the Ein-stein field equations, divided by the critical density that results ina flat universe. Ω includes contributions from matter, ΩM , the cos-mological constant, ΩΛ, and radiation (negligible in the present-dayunverse). The results from WMAP, combined with other data fromother methods, gives Ω = 1.005± .006. In other words, the universeis very nearly spatially flat.

Accelerating expansion from supernova data

The supernova data described on page 314 complement the CMBdata because they are mainly sensitive to the difference ΩΛ − ΩM ,rather than their sum Ω = ΩΛ + ΩM . This is because these datameasure the acceleration or deceleration of the universe’s expansion.Matter produces deceleration, while the cosmological constant givesacceleration. Figure g shows some recent supernova data.27 Thehorizontal axis gives the redshift factor z = (λ′ − λ)/λ, where λ′ isthe wavelength observed on earth and λ the wavelength originallyemitted. It measures how fast the supernova’s galaxy is recedingfrom us. The vertical axis is ∆(m−M) = (m−M)− (m−M)empty,where m is the apparent magnitude, M is the absolute magnitude,and (m −M)empty is the value expected in a model of an emptyuniverse, with Ω = 0. The difference m−M is a measure of distance,so essentially this is a graph of distance versus recessional velocity, ofthe same general type used by Hubble in his original discovery of theexpansion of the universe. Subtracting (m−M)empty on the verticalaxis makes it easier to see small differences. Since the WMAP datarequire Ω = 1, we need to fit the supernova data with values of ΩM

and ΩΛ that add up to one. Attempting to do so with ΩM = 1 andΩΛ = 0 is clearly inconsistent with the data, so we can concludethat the cosmological constant is definitely positive.

26Komatsu et al., 2010, arxiv.org/abs/1001.453827Riess et al., 2007, arxiv.org/abs/astro-ph/0611572. A larger data set is

analyzed in Kowalski et al., 2008, arxiv.org/abs/0804.4142.

Section 8.2 Cosmological solutions 327

Page 328: General Relativity

i / The cosmological pa-rameters of our universe,after Perlmutter, 1998,arxiv.org/abs/astro-ph/

9812133 and Kowalski, 2008,arxiv.org/abs/0804.4142. Thethree shaded regions representthe 95% confidence regions forthe three types of observations.

Density of matter from baryonic acoustic oscillations

Efforts such as the Sloan Digital Sky Survey have made three-dimensional maps of the density of luminous matter in the uni-verse.28 The distribution is clumpy. Measuring the average corre-lation ξ between the density at points separated by some distances (measured in the comoving frame), one would expect that thefunction ξ(s) would be largest when s was small and would sim-ply taper off with increasing s. By analogy, we don’t usually finda Manhattan-style landscape of skyscrapers side by side with anuninhabited mountainous wilderness. On the other hand, imagineconstructing such a correlation function for houses in a subdivisionin which the roads do not form any regular grid, but zoning regu-lations prohibit construction of houses on lots of less than a certainsize. In this situation, there would be zero probability of findinghouses separated by very small distances, and ξ(s) would exhibita peak at some larger scale set by the legal code. The actual re-sults of the sky surveys do show such a peak, which is due to wellknown physics referred to as baryon acoustic oscillations (BAO).29

In the early universe, any region of overdensity would tend to createa radiating sound wave like the bang of a firecracker. Such wavespropagated at a known speed (about half the speed of light) for aknown time (about 400,000 years, until matter became deionizedand transparent to radiation, making it immune to the photon pres-sure that drove the oscillations). This leads to a known distances, which forms a standard ruler at which the peak in ξ(s) occurs.In cosmological models, these results strongly constrain ΩM , whilebeing relatively insensitive to ΩΛ, and they are therefore comple-mentary to both the supernova data and the CMB results.

Conclusions about cosmology

Figure i summarizes what we can conclude about our universe,parametrized in terms of a model with both ΩM and ΩΛ nonzero.30

We can tell that it originated in a Big Bang singularity, that it willgo on expanding forever, and that it is very nearly flat. Note thatin a cosmology with nonzero values for both ΩM and ΩΛ, there isno strict linkage between the spatial curvature and the question ofrecollapse, as there is in a model with only matter and no cosmo-logical constant; therefore even though we know that the universewill not recollapse, we do not know whether its spatial curvature isslightly positive (closed) or negative (open).

Consistency checks

Astrophysical considerations provide further constraints and con-sistency checks. In the era before the advent of high-precision cos-

28Sanchez et al., 2012, arxiv.org/abs/1203.661629Bassett and Hlozek, 2009, arxiv.org/abs/0910.522430See Carroll, “The Cosmological Constant,” http://www.livingreviews.

org/lrr-2001-1 for a full mathematical treatment of such models.

328 Chapter 8 Sources

Page 329: General Relativity

mology, estimates of the age of the universe ranged from 10 billionto 20 billion years, and the low end was inconsistent with the ageof the oldest globular clusters. This was believed to be a problemeither for observational cosmology or for the astrophysical modelsused to estimate the age of the clusters: “You can’t be older thanyour ma.” Current data have shown that the low estimates of theage were incorrect, so consistency is restored.

That only a small fraction of the universe’s matter was luminoushad been suspected by astronomers such as Zwicky as early as 1933,based on the inability to reconcile the observed kinematics withNewton’s laws if all matter was assumed to be luminous.

Dark matter

Another constraint comes from models of nucleosynthesis dur-ing the era shortly after the Big Bang (before the formation of thefirst stars). The observed relative abundances of hydrogen, helium,and deuterium cannot be reconciled with the density of “dust” (i.e.,nonrelativistic matter) inferred from the observational data. If theinferred mass density were entirely due to normal “baryonic” matter(i.e., matter whose mass consisted mostly of protons and neutrons),then nuclear reactions in the dense early universe should have pro-ceeded relatively efficiently, leading to a much higher ratio of heliumto hydrogen, and a much lower abundance of deuterium. The con-clusion is that most of the matter in the universe must be made ofan unknown type of exotic non-baryonic matter, known genericallyas “dark matter.”

The existence of nonbaryonic matter is also required in order toreconcile the observed density of galaxies with the observed strengthof the CMB fluctuations, and in merging galaxy clusters it has beenobserved that the gravitational potential is offset from the radiatingplasma. A 2012 review paper on dark matter is Roos, arxiv.org/abs/1208.3662.

A number of experiments are under way to detect dark matterdirectly. As of 2013, the most sensitive experiment has given nullresults: arxiv.org/abs/1310.8214.

At one time it was widely expected that dark matter would con-sist of the lightest supersymmetric particle, which might for examplebe the neutralino. However, results from the LHC seem to make itunlikely that our universe exhibits supersymmetry, assuming thatthe energy scale is the electroweak scale, which is the only scale thathas strong motivation. It now appears more likely that dark matterconsists of some other particle such as sterile neutrinos or axions.

Even with the inclusion of dark matter, there is a problemwith the abundance of lithium-7 relative to hydrogen, which models

Section 8.2 Cosmological solutions 329

Page 330: General Relativity

greatly overpredict.31

8.3 Mach’s principle revisited8.3.1 The Brans-Dicke theory

Mach himself never succeeded in stating his ideas in the form ofa precisely testable physical theory, and we’ve seen that to the ex-tent that Einstein’s hopes and intuition had been formed by Mach’sideas, he often felt that his own theory of gravity came up short.The reader has so far encountered Mach’s principle in the context ofcertain thought experiments that are obviously impossible to realize,involving a hypothetical universe that is empty except for certainapparatus (e.g., section 3.6.2, p. 115). It would be easy, then, to getan impression of Mach’s principle as one of those theories that is“not even wrong,” i.e., so ill-defined that it cannot even be falsifiedby experiment, any more than Christianity can be.

But in 1961, Robert Dicke and his student Carl Brans came upwith a theory of gravity that made testable predictions, and thatwas specifically designed to be more Machian than general relativity.Their paper32 is extremely readable, even for the non-specialist.In this theory, the seemingly foolproof operational definition of aLorentz frame given on p. ?? fails. On the first page, Brans andDicke propose one of those seemingly foolish thought experimentsabout a nearly empty universe:

The imperfect expression of [Mach’s ideas] in generalrelativity can be seen by considering the case of a spaceempty except for a lone experimenter in his laboratory.[...] The observer would, according to general relativity,observe normal behavior of his apparatus in accordancewith the usual laws of physics. However, also accord-ing to general relativity, the experimenter could set hislaboratory rotating by leaning out a window and firinghis 22-caliber rifle tangentially. Thereafter the delicategyroscope in the laboratory would continue to point in adirection nearly fixed relative to the direction of motionof the rapidly receding bullet. The gyroscope would ro-tate relative to the walls of the laboratory. Thus, fromthe point of view of Mach, the tiny, almost massless,very distant bullet seems to be more important than themassive, nearby walls of the laboratory in determininginertial coordinate frames and the orientation of the gy-roscope.

31arxiv.org/abs/0808.2818, arxiv.org/abs/1107.111732C. Brans and R. H. Dicke, “Mach’s Principle and a Relativistic Theory of

Gravitation,” Physical Review 124 (1961) 925

330 Chapter 8 Sources

Page 331: General Relativity

They then proceed to construct a mathematical and more Mach-ian theory of gravity. From the Machian point of view, the correctlocal definition of an inertial frame must be determined relativeto the bulk of the matter in the universe. We want to retain theLorentzian local character of spacetime, so this influence can’t betransmitted via instantaneous action at a distance. It must prop-agate via some physical field, at a speed less than or equal to c.It is implausible that this field would be the gravitational field asdescribed by general relativity. Suppose we divide the cosmos upinto a series of concentric spherical shells centered on our galaxy.In Newtonian mechanics, the gravitational field obeys Gauss’s law,so the field of such a shell vanishes identically on the interior. Inrelativity, the corresponding statement is Birkhoff’s theorem, whichstates that the Schwarzschild metric is the unique spherically sym-metric solution to the vacuum field equations. Given this solutionin the exterior universe, we can set a boundary condition at the out-side surface of the shell, use the Einstein field equations to extendthe solution through it, and find a unique solution on the interior,which is simply a flat space.

Since the Machian effect can’t be carried by the gravitationalfield, Brans and Dicke took up an idea earlier proposed by PascualJordan33 of hypothesizing an auxiliary field φ. The fact that such afield has never been detected directly suggests that it has no mass orcharge. If it is massless, it must propagate at exactly c, and this alsomakes sense because if it were to propagate at speeds less than c,there would be no obvious physical parameter that would determinethat speed. How many tensor indices should it have? Since Mach’sprinciple tries to give an account of inertia, and inertial mass is ascalar,34 φ should presumably be a scalar (quantized by a spin-zeroparticle). Theories of this type are called tensor-scalar theories,because they use a scalar field in addition to the metric tensor.

The wave equation for a massless scalar field, in the absence ofsources, is simply ∇i∇iφ = 0. The solutions of this wave equa-tion fall off as φ ∼ 1/r. This is gentler than the 1/r2 variation ofthe gravitational field, so results like Newton’s shell theorem andBirkhoff’s theorem no longer apply. If a spherical shell of mass actsas a source of φ, then φ can be nonzero and varying inside the shell.The φ that you experience right now as you read this book should bea sum of wavelets originating from all the masses whose world-linesintersected the surface of your past light-cone. In a static universe,this sum would diverge linearly, so a self-consistency requirement forBrans-Dicke gravity is that it should produce cosmological solutions

33Jordan was a member of the Nazi Sturmabteilung or “brown shirts” whonevertheless ran afoul of the Nazis for his close professional relationships withJews.

34A limit of 5× 10−23 has been placed on the anisotropy of the inertial massof the proton: R.W.P. Drever, “A search for anisotropy of inertial mass using afree precession technique,” Philosophical Magazine, 6:687 (1961) 683.

Section 8.3 Mach’s principle revisited 331

Page 332: General Relativity

that avoid such a divergence, e.g., ones that begin with Big Bangs.

Masses are the sources of the field φ. How should they couple toit? Since φ is a scalar, we need to construct a scalar as its source,and the only reasonable scalar that can play this role is the trace ofthe stress-energy tensor, T ii . As discussed in example 11 on page292, this vanishes for light, so the only sources of φ are materialparticles.35 Even so, the Brans-Dicke theory retains a form of theequivalence principle. As discussed on pp. 39 and 33, the equiva-lence principle is a statement about the results of local experiments,and φ at any given location in the universe is dominated by con-tributions from matter lying at cosmological distances. Objects ofdifferent composition will have differing fractions of their mass thatarise from internal electromagnetic fields. Two such objects will stillfollow identical geodesics, since their own effect on the local valueof φ is negligible. This is unlike the behavior of electrically chargedobjects, which experience significant back-reaction effects in curvedspace (p. 39). However, the strongest form of the equivalence princi-ple requires that all experiments in free-falling laboratories produceidentical results, no matter where and when they are carried out.Brans-Dicke gravity violates this, because such experiments coulddetect differences between the value of φ at different locations —but of course this is part and parcel of the purpose of the theory.

We now need to see how to connect φ to the local notion ofinertia so as to produce an effect of the kind that would tend tofulfill Mach’s principle. In Mach’s original formulation, this wouldentail some kind of local rescaling of all inertial masses, but Bransand Dicke point out that in a theory of gravity, this is equivalent toscaling the Newtonian gravitational constant G down by the samefactor. The latter turns out to be a better approach. For one thing,it has a natural interpretation in terms of units. Since φ’s amplitudefalls off as 1/r, we can write φ ∼ Σmi/r, where the sum is over thepast light cone. If we then make the identification of φ with 1/G(or c2/G in a system wher c 6= 1), the units work out properly, andthe coupling constant between matter and φ can be unitless. If thiscoupling constant, notated 1/ω, were not unitless, then the theory’spredictive value would be weakened, because there would be no wayto know what value to pick for it. For a unitless constant, however,there is a reasonable way to guess what it should be: “in any sensibletheory,” Brans and Dicke write, “ω must be of the general order ofmagnitude of unity.” This is, of course, assuming that the Brans-Dicke theory was correct. In general, there are other reasonablevalues to pick for a unitless number, including zero and infinity. The

35This leads to an exception to the statement above that all Brans-Dickespacetimes are expected to look like Big Bang cosmologies. Any solution of theGR field equations containing nothing but vacuum and electromagnetic fields(known as an “elevtrovac” solution) is also a valid Brans-Dicke spacetime. Insuch a spacetime, a constant φ can be set arbitrarily. Such a spacetime is insome sense not generic for Brans-Dicke gravity.

332 Chapter 8 Sources

Page 333: General Relativity

limit of ω →∞ recovers the special case of general relativity. ThusMach’s principle, which once seemed too vague to be empiricallyfalsifiable, comes down to measuring a specific number, ω, whichquantifies how non-Machian our universe is.36

8.3.2 Predictions of the Brans-Dicke theory

Returning to the example of the spherical shell of mass, we cansee based on considerations of units that the value of φ inside shouldbe ∼ m/r, where m is the total mass of the shell and r is its radius.There may be a unitless factor out in front, which will depend on ω,but for ω ∼ 1 we expect this constant to be of order 1. Solving thenasty set of field equations that result from their Lagrangian, Bransand Dicke indeed found φ ≈ [2/(3 + 2ω)](m/r), where the constantin square brackets is of order unity if ω is of order unity. In thelimit of ω → ∞, φ = 0, and the shell has no physical effect on itsinterior, as predicted by general relativity.

Brans and Dicke were also able to calculate cosmological models,and in a typical model with a nearly spatially flat universe, theyfound φ would vary according to

φ = 8π4 + 3ω

6 + 4ωρot

2o

(t

to

)2/(4+3ω)

,

where ρo is the density of matter in the universe at time t = to.When the density of matter is small, G is large, which has the sameobservational consequences as the disappearance of inertia; this isexactly what one expects according to Mach’s principle. For ω →∞,the gravitational “constant” G = 1/φ really is constant.

Returning to the thought experiment involving the 22-caliber ri-fle fired out the window, we find that in this imaginary universe, witha very small density of matter, G should be very large. This causesa frame-dragging effect from the laboratory on the gyroscope, one

36Another good technical reasons for thinking of φ as relating to the gravita-tional constant is that general relativity has a standard prescription for describ-ing fields on a background of curved spacetime. The vacuum field equations ofgeneral relativity can be derived from the principle of least action, and althoughthe details are beyond the scope of this book (see, e.g., Wald, General Relativ-ity, appendix E), the general idea is that we define a Lagrangian density LGthat depends on the Ricci scalar curvature, and then extremize its integral overall possible histories of the evolution of the gravitational field. If we want todescribe some other field, such as matter, light, or φ, we simply take the special-relativistic Lagrangian LM for that field, change all the derivatives to covariantderivatives, and form the sum (1/G)LG + LM . In the Brans-Dicke theory, wehave three pieces, (1/G)LG + LM + Lφ, where LM is for matter and Lφ for φ.If we were to interpret φ as a rescaling of inertia, then we would have to have φappearing as a fudge factor modifying all the inner workings of LM . If, on theother hand, we think of φ as changing the value of the gravitational constant G,then the necessary modification is extremely simple. Brans and Dicke introduceone further modification to Lφ so that the coupling constant ω between matterand φ can be unitless. This modification has no effect on the wave equation ofφ in flat spacetime.

Section 8.3 Mach’s principle revisited 333

Page 334: General Relativity

a / The apparatus used byDicke and Goldenberg to mea-sure the oblateness of the sunwas essentially a telescope witha disk inserted in order to blackout most of the light from the sun.

much stronger than we would see in our universe. Brans and Dickecalculated this effect for a laboratory consisting of a spherical shell,and although technical difficulties prevented the reliable extrapo-lation of their result to ρo → 0, the trend was that as ρo becamesmall, the frame-dragging effect would get stronger and stronger,presumably eventually forcing the gyroscope to precess in lock-stepwith the laboratory. There would thus be no way to determine, oncethe bullet was far away, that the laboratory was rotating at all —in perfect agreement with Mach’s principle.

8.3.3 Hints of empirical support

Only six years after the publication of the Brans-Dicke theory,Dicke himself, along with H.M. Goldenberg37 carried out a measure-ment that seemed to support the theory empirically. Fifty yearsbefore, one of the first empirical tests of general relativity, which ithad seemed to pass with flying colors, was the anomalous perihelionprecession of Mercury. The word “anomalous,” which is often leftout in descriptions of this test, is required because there are manynonrelativistic reasons why Mercury’s orbit precesses, including in-teractions with the other planets and the sun’s oblate shape. It isonly when these other effects are subtracted out that one sees thegeneral-relativistic effect calculated on page 217. The sun’s oblate-ness is difficult to measure optically, so the original analysis of thedata had proceeded by determining the sun’s rotational period byobserving sunspots, and then assuming that the sun’s bulge was theone found for a rotating fluid in static equilibrium. The result was anassumed oblateness of about 1× 10−5. But we know that the sun’sdynamics are more complicated than this, since it has convectioncurrents and magnetic fields. Dicke, who was already a renownedexperimentalist, set out to determine the oblateness by direct op-tical measurements, and the result was (5.0 ± 0.7) × 10−5, which,although still very small, was enough to put the observed perihelionprecession out of agreement with general relativity by about 8%.The perihelion precession predicted by Brans-Dicke gravity differsfrom the general relativistic result by a factor of (4 + 3ω)/(6 + 3ω).The data therefore appeared to require ω ≈ 6 ± 1, which would beinconsistent with general relativity.

8.3.4 Mach’s principle is false.

The trouble with the solar oblateness measurements was thatthey were subject to a large number of possible systematic errors,and for this reason it was desirable to find a more reliable test ofBrans-Dicke gravity. Not until about 1990 did a consensus arise,based on measurements of oscillations of the solar surface, that thepre-Dicke value was correct. In the interim, the confusion had thesalutary effect of stimulating a renaissance of theoretical and ex-

37Dicke and Goldenberg, “Solar Oblateness and General Relativity,” PhysicalReview Letters 18 (1967) 313

334 Chapter 8 Sources

Page 335: General Relativity

perimental work in general relativity. Often if one doesn’t have analternative theory, one has no reasonable basis on which to designand interpret experiments to test the original theory.

Currently, the best bound on ω is based on measurements38

of the propagation of radio signals between earth and the Cassini-Huygens space probe in 2003, which require ω > 4 × 104. This isso much greater than unity that it is reasonable to take Brans andDicke at their word that “in any sensible theory, ω must be of thegeneral order of magnitude of unity.” Brans-Dicke fails this test, andis no longer a “sensible” candidate for a theory of gravity. We cannow see that Mach’s principle, far from being a fuzzy piece of philo-sophical navel-gazing, is a testable hypothesis. It has been testedand found to be false, in the following sense. Brans-Dicke gravityis about as natural a formal implementation of Mach’s principle ascould be hoped for, and it gives us a number ω that parametrizeshow Machian the universe is. The empirical value of ω is so largethat it shows our universe to be essentially as non-Machian as gen-eral relativity.

8.4 Historical note: the steady-state modelFrom 1948 until around the mid-1960s, the Big Bang theory hadviable competition in the form of the steady-state model, originatedby the British trio of Fred Hoyle, Hermann Bondi, and ThomasGold. Legend has it that they came up with the idea after seeing ahorror movie called Dead of Night, in which events from the begin-ning of the story repeat themselves later. This led them to imaginethat the universe could, although expanding, remain locally in thesame state at all times. If this were to happen, the empty spacebeing opened up between the galaxies would have to be filled backin by the spontaneous creation of matter. The model holds a strongphilosophical appeal because it generalizes the Copernican principleso that it applies not just to conditions everywhere in space but alsoat all times.

They published the idea in a pair of back-to-back papers, oneby Bondi and Gold39 and one by Hoyle,40 with comments appendedto the former on the differences between the two approaches. TheBondi-Gold paper is especially fun to read, because it is written innontechnical language and shows a type of daring and creative sci-ence that is not often encountered today. Much of it reads like a

38Bertotti, Iess, and Tortora, “A test of general relativity using radio linkswith the Cassini spacecraft,” Nature 425 (2003) 374

39“The Steady-State Theory of the Expanding Universe,” MNRAS 108 (1948)252; adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1948MNRAS.108.

.252B40“A New Model for the Expanding Universe,” MNRAS 108 (1948)

372, adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1948MNRAS.108.

.372H

Section 8.4 Historical note: the steady-state model 335

Page 336: General Relativity

catalog of cherished principles of physics that were to be given up, in-cluding Lorentz invariance, general relativity, the equivalence princi-ple, and possibly the laws of conservation of charge and mass-energy.The following is a brief presentation (in slightly different notation)of the Hoyle’s more mathematically detailed ideas, as sketched inhis original paper. Although Hoyle eventually fleshed out the ideasmore thoroughly, by the time he had done so the steady-state the-ory was already on its way to being crushed under the weight ofcontrary observations.

Since the model is always to be in the same state, the quan-tity a/a must always be the same, i.e., the Hubble constant reallyis a constant over time. This requires exponential growth, whichmeans that the geometry is that of de Sitter space. In any modelthat assigns an infinite age to the universe, one must explain whythe universe has not undergone heat death due to the second lawof thermodynamics. The steady-state model successfully addressesthis problem, because the exponential expansion is rapid enough toprevent thermal equilibrium from happening.

Hoyle sets out to preserve local conservation of energy-momentum,without which the Einstein field equations become inconsistent. (Thiswas Hoyle’s more conservative approach. Bondi and Gold advocatedreplacing general relativity completely rather than modifying it.) Hepostulates a massless, chargeless, scalar field C, called the “C field,”the letter “C” standing for “creation.” Suppose that the C field’scontribution to the stress-energy tensor ends up being that of a per-fect fluid with the same rest frame as the ordinary matter. The rateof creation of mass-energy is then given by the divergence ∇µTµt ,and we need this to be zero. As shown in example 21 on p. 318, thisrequires that our total stress-energy mimic that of a cosmologicalconstant, with ρ+ P = 0. Since the ordinary matter has ρ > 0 andP > 0, the C field will either need to contribute negative energydensity or negative pressure. We’ll see below that Hoyle’s model isconstructed so that the C field has zero energy and negative pres-sure. One will often see the C field described incorrectly as havingnegative energy to cancel out the positive energy of the matter beingcreated. That wouldn’t have worked, because then the total energydensity ρ would always be zero, which is not what we observe. (Forexample, the Friedmann equation for a/a relates ρ to the square ofthe Hubble constant.)

To understand more about why the theory took the form it did,it is helpful to look at some general physical considerations aboutsymmetry. As Bondi and Gold admit candidly, any theory of thistype is likely to violate Lorentz invariance. We can observe an evac-uated box and wait for hydrogen atoms to appear. When they ap-pear, they’re in some state of motion, at least on the average. Thisstate of motion defines a preferred frame. In addition to breakingsymmetry under Lorentz transformations, the theory lacks time-

336 Chapter 8 Sources

Page 337: General Relativity

reversal invariance (because matter appears but never disappears)and charge-conjugation symmetry (because matter appears but an-timatter doesn’t). All of these asymmetries arise because in thisapproach, we try to explain the observed asymmetries of the cosmo-logical state of the universe as arising directly from asymmetries inthe underlying local laws of physics. Such an approach is very dif-ferent from the modern one, in which we expect the asymmetries toarise from either boundary conditions or instabilities (spontaneoussymmetry breaking).

Because the C field is massless and chargeless, we would normallyexpect it to obey the wave equation ∇a∇aC = 0. Hoyle’s field doesnot, however, evolve according to any Lorentz-invariant dynamicallaw. Instead it simply evolves as C = t, where t is a preferred timecoordinate. In any cosmological model in which the matter fieldsare modeled as perfect fluids, we have a preferred time coordinatewhich is the proper time of an observer at rest with respect to thefluid, and in the Hoyle model we do assume that this is the time twe should use in defining C. However, Hoyle’s theory is differentbecause it gives this preferred time a role in the local laws of physics,thereby breaking Lorentz invariance.

The value of the scalar field C cannot have any directly ob-servable effects, since then its time-evolution would distinguish oneepoch of the universe from another. Instead we form the gradient∇aC. This gives a vector, which can be interpreted as a velocityvector defining the preferred frame of reference. An observer in thisframe is at rest relative to the local cosmological fluid, observes theuniverse to be homogeneous, and also observes that when new atomsare created from the vacuum, they are on the average at rest. Thus∇aC is observable.

The contribution of the C field to the stress-energy must be arank-two tensor, and if we want to construct such a tensor, theonly good possibility that occurs to me41 is an k∇a∇bC, with ka positive constant. If the derivatives had been ordinary partialderivatives, the second derivative would have vanished because C islinear in time, but the covariant derivatives do not vanish, and in factthe second derivative is a tensor measuring the rate of cosmologicalexpansion; the trace ∇a∇aC is the volume expansion Θ defined onp. 290. For de Sitter space, Θ has a constant value equal to threetimes the Hubble constant H0. We can now see why we could nottake the C field to evolve according to the usual wave equation∇a∇aC = 0; if it did, then we would have Θ = 0, and the universewould not be expanding.

When we evaluate the second derivative for the de Sitter metric,

41The only other obvious possibility would have been something like−k∇aC∇bC. This would be the stress-energy of a negative-mass dust, whichwould be unacceptable for the reasons discussed earlier.

Section 8.4 Historical note: the steady-state model 337

Page 338: General Relativity

the only nonvanishing Christoffel symbols that occur are Γtxx =Γtyy = Γtzz = aa. We find T tt = 0 and T xx = T yy = T zz = kΘ/3.Thus the C field’s mass-energy density is zero, while for its pressurewe have P = −kΘ/3, which is negative.

For simplicity, we take the ordinary matter to be dust. The totalstress-energy then consists of an energy density ρ that is due only tothe dust, and a negative pressure P that comes only from the C field.If we require both of these to be constant, take the cosmologicalconstant to be zero, set a = eH0t = eΘt/3, and substitute into theFriedmann equations on p. 302, we find P = −ρ, or k = 3ρ/Θ =ρ/H0.

Like the cosmological constant, the C field is taken to be auniversally prescribed property of the vacuum. There is a differ-ence, however, because the cosmological constant’s contribution tothe stress-energy is proportional to the metric, which preserves theequivalence principle. As remarked at the end of the Bondi-Goldpaper, the C field violates the equivalence principle. No calcula-tion is spelled out, but they say based on a personal communicationfrom Hoyle that the field exerts a force on matter which produces asignificant acceleration in an atom, but a negligible one in a star.

A claimed selling point of the C field was that it would preventthe formation of singularities, including both a Big Bang singularityand black holes. This is reasonable, since the C field violates all ofthe energy conditions listed on p. 282 except for the trace energycondition. The Penrose singularity theorem depends on the null en-ergy condition, and the Hawking singularity theorem requires eitherthe strong or the null energy condition.

Because we can’t make the C field obey the proper wave equationfor a massless, spin-zero particle, there is no obvious way to makeup a dynamical law for its evolution in order to replace the fixedrelation C = t, and we do not expect to have any classical fieldtheory for the C field. Hoyle did attempt to add dynamics to themodel by making it into what is known as a “direct field,” which wasa type of action-at-a-distance theory that in the 1960s was believedto be a good candidate for the fundamental description of the forcesof physics. (Quantum field theory had not developed to the pointwhere it could handle the strong or weak nuclear forces.) Suchtheories were shown to be nonviable as quantum theories in 1963 byCurrie, Jordan, and Sudarshan.

The steady-state model began to succumb to contrary evidencewhen Ryle and coworkers counted radio sources and found that theydid not show the statistical behavior predicted by the model. Thecoup de grace came with the discovery of the cosmic microwave back-ground, which demonstrated directly that the universe had oncebeen much hotter than it is now. Attempts have been made toproduce variations on the model that are consistent with these ob-

338 Chapter 8 Sources

Page 339: General Relativity

servations, but they have not succeeded; for a detailed discussionsee http://www.astro.ucla.edu/~wright/stdystat.htm.

Section 8.4 Historical note: the steady-state model 339

Page 340: General Relativity

Problems1 Verify, as claimed on p. 273, that the electromagnetic pressureinside a medium-weight atomic nucleus is on the order of 1033 Pa.

2 Is the Big Bang singularity removable by the coordinatetransformation t→ 1/t? . Solution, p. 388

3 Verify the claim made on p. 305 that a is a linear function oftime in the case of the Milne universe, and that k = −1.

4 Examples 16 on page 310 and 18 on page 312 discussed ropeswith cosmological lengths. Reexamine these examples in the case ofthe Milne universe. . Solution, p. 388

5 (a) Show that the Friedmann equations are symmetric undertime reversal. (b) The spontaneous breaking of this symmetry inperpetually expanding solutions was discussed on page 321. Usethe definition of a manifold to show that this symmetry cannot berestored by gluing together an expanding solution and a contractingone “back to back” to create a single solution on a single, connectedmanifold. . Solution, p. 388

6 The Einstein field equations are

Gab = 8πTab + Λgab ,

and when it is possible to adopt a frame of reference in which thelocal mass-energy is at rest on average, we can interpret the stress-energy tensor as

Tµν = diag(−ρ,P ,P ,P ) ,

where ρ is the mass-energy density and P is the pressure. Fix somepoint as the origin of a local Lorentzian coordinate system. Analyzethe properties of these relations under a reflection such as x → −xor t→ −t. . Solution, p. 389

7 (a) Show that a positive cosmological constant violates thestrong energy condition in a vacuum. In applying the definition ofthe strong energy condition, treat the cosmological constant as aform of matter, i.e., “roll in” the cosmological constant term to thestress-energy term in the field equations. (b) Comment on how thisaffects the results of the following paper: Hawking and Ellis, “TheCosmic Black-Body Radiation and the Existence of Singularities inOur Universe,” Astrophysical Journal, 152 (1968) 25,http://articles.adsabs.harvard.edu/f...pJ...152...25H.

340 Chapter 8 Sources

Page 341: General Relativity

8 In problem 6 on page 200, we analyzed the properties of themetric

ds2 = e2gz dt2 − dz2 .

(a) In that problem we found that this metric had the same proper-ties at all points in space. Verify in particular that it has the samescalar curvature R at all points in space.(b) Show that this is a vacuum solution in the two-dimensional (t, z)space.(c) Suppose we try to generalize this metric to four dimensions as

ds2 = e2gz dt2 − dx2 − dy2 − dz2 .

Show that this requires an Einstein tensor with unphysical proper-ties.

. Solution, p. 389

9 Consider the following proposal for defeating relativity’s prohi-bition on velocities greater than c. Suppose we make a chain billionsof light-years long and attach one end of the chain to a particulargalaxy. At its other end, the chain is free, and it sweeps past thelocal galaxies at a very high speed. This speed is proportional tothe length of the chain, so by making the chain long enough, we canmake the speed exceed c.

Debunk this proposal in the special case of the Milne universe.

10 Make a rigorous definition of the volume V of the observ-able universe. Suppose someone asks whether V depends on theobserver’s state of motion. Does this question have a well-definedanswer? If so, what is it? Can we calculate V ’s observer-dependenceby applying a Lorentz contraction? . Solution, p. 390

11 For a perfect fluid, we have P = wρ, where w is a constant.The cases w = 0 and w = 1/3 correspond, respectively, to dust andradiation. Show that for a flat universe with Λ = 0 dominated by asingle component that is a perfect fluid, the solution to the Fried-mann equations is of the form a ∝ tδ, and determine the exponentδ. Check your result in the dust case against the one on p. 320,then find the exponent in the radiation case. Although the w = −1case corresponds to a cosmological constant, show that the solutionis not of this form for w = −1. . Solution, p. 391

12 Apply the result of problem 11 to generalize the resultof example 22 on p. 321 for the size of the observable universe.What is the result in the case of the radiation-dominated universe?

. Solution, p. 391

Problems 341

Page 342: General Relativity

342 Chapter 8 Sources

Page 343: General Relativity

Chapter 9

Gravitational waves

9.1 The speed of gravityIn Newtonian gravity, gravitational effects are assumed to propagateat infinite speed, so that for example the lunar tides correspond atany time to the position of the moon at the same instant. Thisclearly can’t be true in relativity, since simultaneity isn’t somethingthat different observers even agree on. Not only should the “speed ofgravity” be finite, but it seems implausible that it would be greaterthan c; in section 2.2 (p. 52), we argued based on empirically wellestablished principles that there must be a maximum speed of causeand effect. Although the argument was only applicable to specialrelativity, i.e., to a flat spacetime, it seems likely to apply to generalrelativity as well, at least for low-amplitude waves on a flat back-ground. As early as 1913, before Einstein had even developed thefull theory of general relativity, he had carried out calculations inthe weak-field limit showing that gravitational effects should prop-agate at c. This seems eminently reasonable, since (a) it is likely tobe consistent with causality, and (b) G and c are the only constantswith units that appear in the field equations (obscured by our choiceof units, in which G = 1 and c = 1), and the only velocity-scale thatcan be constructed from these two constants is c itself.1

Although extremely well founded theoretically, this turns out tobe extremely difficult to test empirically. In a 2003 experiment,2 Fo-malont and Kopeikin used a world-wide array of radio telescopes toobserve a conjunction in which Jupiter passed within 3.7′ of a quasar,so that the quasar’s radio waves came within about 3 light-secondsof the planet on their way to the earth. Since Jupiter moves with

1High-amplitude waves need not propagate at c. For example, general rela-tivity predicts that a gravitational-wave pulse propagating on a background ofcurved spacetime develops a trailing edge that propagates at less than c (Misner,Thorne, and Wheeler, p. 957). This effect is weak when the amplitude is smallor the wavelength is short compared to the scale of the background curvature.It makes sense that the effect vanishes when background curvature is absent,since there is then no fixed scale. Dispersion requires that different wavelengthspropagate at different speeds, but without a scale there is no reason for anywavelength to behave any differently from any other. At very high amplitudes,one can even have such exotic phenomena as the formation of black holes whenenough wave energy is focused into a small region. None of these phenomena isever likely to be observed empirically, since all gravitational waves in our universehave extremely small amplitudes.

2http://arxiv.org/abs/astro-ph/0302294

343

Page 344: General Relativity

v = 4×10−5, one expects naively that the radio waves passing by itshould be deflected by the field produced by Jupiter at the position ithad 3 seconds earlier. This position differs from its present positionby about 10−4 light-seconds, and the result should be a differencein propagation time, which should be different when observed fromdifferent locations on earth. Fomalont and Kopeikin measured thesephase differences with picosecond precision, and found them to bein good agreement with the predictions of general relativity. Thereal excitement started when they published their result with theinterpretation that they had measured, for the first time, the speedof gravity, and found it to be within 20% error bars of c. Samuel3

and Will4 published refutations, arguing that Kopeikin’s calcula-tions contained mistakes, and that what had really been measuredwas the speed of light, not the speed of gravity.

The reason that the interpretation of this type of experimentis likely to be controversial is that although we do have theories ofgravity that are viable alternatives to general relativity (e.g., theBrans-Dicke theory, in which the gravitational constant is a dynam-ically changing variable), such theories have generally been carefullydesigned to agree with general relativity in the weak-field limit, andin particular every such theory (or at least every theory that remainsviable given current experimental data) predicts that gravitationaleffects propagate at c in the weak-field limit. Without an alternativetheory to act as a framework — one that disagrees with relativityabout the speed of gravity — it is difficult to know whether an ob-servation that agrees with relativity is a test of this specific aspectof relativity.

9.2 Gravitational radiation9.2.1 Empirical evidence

So we still don’t know, a century after Einstein found the fieldequations, whether gravitational “ripples” travel at c. Nevertheless,we do have strong empirical evidence that such ripples exist. TheHulse-Taylor system (page 222) contains two neutron stars orbitingaround their common center of mass, and the period of the orbitis observed to be decreasing gradually over time (figure a). This isinterpreted as evidence that the stars are losing energy to radiationof gravitational waves.5 As we’ll see in section 9.2.5, the rate ofenergy loss is in excellent agreement with the predictions of generalrelativity.

An even more dramatic, if less clearcut, piece of evidence is

3http://arxiv.org/abs/astro-ph/03040064http://arxiv.org/abs/astro-ph/03011455Stairs, “Testing General Relativity with Pulsar Timing,” http://

relativity.livingreviews.org/Articles/lrr-2003-5/

344 Chapter 9 Gravitational Waves

Page 345: General Relativity

a / The Hulse-Taylor pulsar’s or-bital motion is gradually losingenergy due to the emission ofgravitational waves. The lineardecrease of the period is inte-grated on this plot, resulting ina parabola. From Weisberg andTaylor, http://arxiv.org/abs/

astro-ph/0211217.

Komossa, Zhou, and Lu’s observation6 of a supermassive black holethat appears to be recoiling from its parent galaxy at a velocity of2650 km/s (projected along the line of sight). They interpret this asevidence for the following scenario. In the early universe, galaxiesform with supermassive black holes at their centers. When two suchgalaxies collide, the black holes can merge. The merger is a violentprocess in which intense gravitational waves are emitted, and thesewaves carry a large amount of momentum, causing the black holesto recoil at a velocity greater than the escape velocity of the mergedgalaxy.

Although the energy loss from systems such as the Hulse-Taylorbinary provide strong evidence that gravitational waves exist andcarry enery, we would also like to detect them directly. This wouldnot only be a definitive test of a century-old prediction of general rel-ativity, it would also open a window into a completely new method

6http://arxiv.org/abs/0804.4585

Section 9.2 Gravitational radiation 345

Page 346: General Relativity

of astronomical observation. A series of attempts is under way toobserve gravitational waves directly using interferometers, which de-tect oscillations in the lengths of their own arms. These can be eitherground-based or space-based. The early iterations of the ground-based LIGO interferometer did not detect gravitational waves, buta new version, Advanced LIGO, is under construction. A space-based system, LISA, has been proposed for launch in 2020, but itsfunding is uncertain. The two devices would operate in complemen-tary frequency ranges (figure b). A selling point of LISA is that ifit is launched, there are a number of known sources in the sky thatare known to be easily within its range of sensitivity.7 One excel-lent candidate is HM Cancri, a pair of white dwarfs with an orbitalperiod of 5.4 minutes, shorter than that of any other known binarystar.8

b / Predicted sensitivities of LISAand LIGO to gravitational wavesof various frequences.

9.2.2 Energy content

Even without performing the calculations for a system like theHulse-Taylor binary, it is easy to show that if such waves exist, theymust be capable of carrying away energy. Consider two equal massesin highly elliptical orbits about their common center of mass, figurec. The motion is nearly one-dimensional. As the masses recede fromone another, they feel a delayed version of the gravitational forceoriginating from a time when they were closer together and the forcewas stronger. The result is that in the near-Newtonian limit, theylose more kinetic and gravitational energy than they would havelost in the purely Newtonian theory. Now they come back inward

7G. Nelemans, “The Galactic Gravitational wave foreground,” arxiv.org/

abs/0901.1778v18Roelofs et al., “Spectroscopic Evidence for a 5.4-Minute Orbital Period in

HM Cancri,” arxiv.org/abs/1003.0658v1

346 Chapter 9 Gravitational Waves

Page 347: General Relativity

c / As the two planets recedefrom one another, each feels thegravitational attraction that theother one exerted in its previousposition, delayed by the timeit takes gravitational effects topropagate at c. At time t , theright-hand planet experiencesthe stronger deceleration corre-sponding to the left-hand planet’scloser position at the earliertime t ′, not its current positionat t . Mechanical energy is notconserved, and the orbits willdecay.

d / The sticky bead argumentfor the reality of gravitationalwaves. As a gravitational wavewith the appropriate polarizationpasses by, the bead vibrates backand forth on the rod. Friction cre-ates heat. This demonstrates thatgravitational waves carry energy,and are thus real, observablephenomena.

in their orbits. As they approach one another, the time-delayedforce is anomalously weak, so they gain less mechanical energy thanexpected. The result is that with each cycle, mechanical energy islost. We expect that this energy is carried by the waves, in thesame way that radio waves carry the energy lost by a transmittingantenna.9

Not only can these waves remove mechanical energy from a sys-tem, they can also deposit energy in a detector, as shown by thenonmathematical “sticky bead argument” (figure d), which was orig-inated by Feynman in 1957 and later popularized by Bondi.

Now strictly speaking, we have only shown that gravitationalwaves can extract or donate mechanical energy, but not that thewaves themselves transmit this energy. The distinction isn’t onethat normally occurs to us, since we are trained to believe thatenergy is always conserved. But we know that, for fundamental rea-sons, general relativity doesn’t have global conservation laws thatapply to all spacetimes (p. 148). Perhaps the energy lost by theHulse-Taylor system is simply gone, never to reappear, and the en-ergy imparted to the sticky bead is simply generated out of nowhere.On the other hand, general relativity does have global conservationlaws for certain specific classes of spacetimes, including, for example,a conserved scalar mass-energy in the case of a stationary spacetime(p. 248). Spacetimes containing gravitational waves are not sta-tionary, but perhaps there is something similar we can do in someappropriate special case.

Suppose we want an expression for the energy of a gravita-tional wave in terms of its amplitude. This seems like it oughtto be straightforward. We have such expressions in other classi-cal field theories. In electromagnetism, we have energy densities+(1/8πk)|E|2 and +(1/2µo)|B|2 associated with the electric andmagnetic fields. In Newtonian gravity, we can assign an energydensity −(1/8πG)|g|2 to the gravitational field g; the minus signindicates that when masses glom onto each other, they produce agreater field, and energy is released.

In general relativity, however, the equivalence principle tells usthat for any gravitational field measured by one observer, we can

9One has to be careful with this type of argument. In particular, one canobtain incorrect correct results by attempting to generalize this one-dimensionalargument to motion in more than one dimension, because the effective semi-Newtonian interaction is not just a time-delayed version of Newton’s law; it alsoincludes velocity-dependent forces. It is easy to see why such velocity-dependencemust occur in the simpler case of electromagnetism. Suppose that charges Aand B are not at rest relative to one another. In B’s frame, the electric fieldfrom A must come from the direction of the position that an observer comovingwith B would extrapolate linearly from A’s last known position and velocity,as determined by light-speed calculation. This follows from Lorentz invariance,since this is the direction that will be seen by an observer comoving with A. Afull discussion is given by Carlip, arxiv.org/abs/gr-qc/9909087v2.

Section 9.2 Gravitational radiation 347

Page 348: General Relativity

find another observer, one who is free-falling, who says that the localfield is zero. It follows that we cannot associate an energy with thecurvature of a particular region of spacetime in any exact way. Thebest we can do is to find expressions that give the energy density (1)in the limit of weak fields, and (2) when averaged over a region ofspace that is large compared to the wavelength. These expressionsare not unique. There are a number of ways to write them in termsof the metric and its derivatives, and they all give the same resultin the appropriate limit. The reader who is interested in seeing thesubject developed in detail is referred to Carroll’s Lecture Notes onGeneral Relativity, http://arxiv.org/abs/gr-qc/?9712019. Al-though this sort of thing is technically messy, we can accomplishquite a bit simply by knowing that such results do exist, and thatalthough they are non-unique in general, they are uniquely well de-fined in certain cases. Specifically, when one wants to discuss grav-itational waves, it is usually possible to assume an asymptoticallyflat spacetime. In an asymptotically flat spacetime, there is a scalarmass-energy, called the ADM mass, that is conserved. In this re-stricted sense, we are assured that the books balance, and that theemission and absorption of gravitational waves really does mean thetransmission of a fixed amount of energy.

9.2.3 Expected properties

To see what properties we should expect for gravitational radia-tion, first consider the reasoning that led to the construction of theRicci and Einstein tensors. If a certain volume of space is filled withtest particles, then the Ricci and Einstein tensors measure the ten-dency for this volume to “accelerate;” i.e., −d2 V/ dt2 is a measureof the attraction of any mass lying inside the volume. A distantmass, however, will exert only tidal forces, which distort a regionwithout changing its volume. This suggests that as a gravitationalwave passes through a certain region of space, it should distort theshape of a given region, without changing its volume.

When the idea of gravitational waves was first discussed, therewas some skepticism about whether they represented an effect thatwas observable, even in principle. The most naive such doubt is ofthe same flavor as the one discussed in section 8.2.6 about the ob-servability of the universe’s expansion: if everything distorts, thendon’t our meter-sticks distort as well, making it impossible to mea-sure the effect? The answer is the same as before in section 8.2.6;systems that are gravitationally or electromagnetically bound donot have their scales distorted by an amount equal to the change inthe elements of the metric.

A less naive reason to be skeptical about gravitational waves isthat just because a metric looks oscillatory, that doesn’t mean its

348 Chapter 9 Gravitational Waves

Page 349: General Relativity

e / As the gravitational wavepropagates in the z direction, themetric oscillates in the x and ydirections, preserving volume.

oscillatory behavior is observable. Consider the following example.

ds2 = dt2 −(

1 +1

10sinx

)dx2 − dy2 − dz2

The Christoffel symbols depend on derivatives of the form ∂agbc, sohere the only nonvanishing Christoffel symbol is Γxxx. It is thenstraightforward to check that the Riemann tensor Rabcd = ∂cΓ

adb −

∂dΓacb + ΓaceΓ

edb − ΓadeΓ

ecb vanishes by symmetry. Therefore this

metric must really just be a flat-spacetime metric that has beensubjected to a silly change of coordinates.

Self-check: R vanishes, but Γ doesn’t. Is there a reason forpaying more attention to one or the other?

To keep the curvature from vanishing, it looks like we need ametric in which the oscillation is not restricted to a single variable.For example, the metric

ds2 = dt2 −(

1 +1

10sin y

)dx2 − dy2 − dz2

does have nonvanishing curvature. In other words, it seems likewe should be looking for transverse waves rather than longitudi-nal ones.10 On the other hand, this metric cannot be a solutionto the vacuum field equations, since it doesn’t preserve volume. Italso stands still, whereas we expect that solutions to the field equa-tions should propagate at the velocity of light, at least for smallamplitudes. These conclusions are self-consistent, because a wave’spolarization can only be constrained if it propagates at c (see p. 129).

Based on what we’ve found out, the following seems like a metricthat might have a fighting chance of representing a real gravitationalwave:

ds2 = dt2 − (1 +A sin(z − t)) dx2 − dy2

1 +A sin(z − t)− dz2

It is transverse, it propagates at c(= 1), and the fact that gxx is thereciprocal of gyy makes it volume-conserving. The following Maximaprogram calculates its Einstein tensor:

1 load(ctensor);

2 ct_coords:[t,x,y,z];

10A more careful treatment shows that longitudinal waves can always be in-terpreted as physically unobservable coordinate waves, in the limit of large dis-tances from the source. On the other hand, it is clear that no such prohibitionagainst longitudinal waves could apply universally, because such a constraintcan only be Lorentz-invariant if the wave propagates at c (see p. 129), whereashigh-amplitude waves need not propagate at c. Longitudinal waves near thesource are referred to as Type III solutions in a classification scheme due toPetrov. Transverse waves, which are what we could actually observe in practicalexperiments, are type N.

Section 9.2 Gravitational radiation 349

Page 350: General Relativity

3 lg:matrix([1,0,0,0],

4 [0,-(1+A*sin(z-t)),0,0],

5 [0,0,-1/(1+A*sin(z-t)),0],

6 [0,0,0,-1]);

7 cmetric();

8 einstein(true);

For a representative component of the Einstein tensor, we find

Gtt = − A2 cos2(z − t)2 + 4A sin(z − t) + 2A2 sin2(z − t)

For small values of A, we have |Gtt| . A2/2. The vacuum fieldequations require Gtt = 0, so this isn’t an exact solution. But allthe components of G, not just Gtt, are of order A2, so this is anapproximate solution to the equations.

It is also straightforward to check that propagation at approx-imately c was a necessary feature. For example, if we replace thefactors of sin(z− t) in the metric with sin(z−2t), we get a Gxx thatis of order unity, not of order A2.

To prove that gravitational waves are an observable effect, wewould like to be able to display a metric that (1) is an exact solutionof the vacuum field equations; (2) is not merely a coordinate wave;and (3) carries momentum and energy. As late as 1936, Einstein andRosen published a paper claiming that gravitational waves were amathematical artifact, and did not actually exist.11

9.2.4 Some exact solutions

In this section we study several examples of exact solutions tothe field equations. Each of these can readily be shown not to bea mere coordinate wave, since in each case the Riemann tensor hasnonzero elements.

An exact solution Example: 1We’ve already seen, e.g., in the derivation of the Schwarzschildmetric in section 6.2.4, that once we have an approximate solu-tion to the equations of general relativity, we may be able to find aseries solution. Historically this approach was only used as a lastresort, because the lack of computers made the calculations toocomplex to handle, and the tendency was to look for tricks thatwould make a closed-form solution possible. But today the seriesmethod has the advantage that any mere mortal can have somereasonable hope of success with it — and there is nothing moreboring (or demoralizing) than laboriously learning someone else’sspecial trick that only works for a specific problem. In this exam-ple, we’ll see that such an approach comes tantalizingly close11Some of the history is related at http://en.wikipedia.org/wiki/Sticky_

bead_argument.

350 Chapter 9 Gravitational Waves

Page 351: General Relativity

to providing an exact, oscillatory plane wave solution to the fieldequations.

Our best solution so far was of the form

ds2 = dt2 − (1 + f ) dx2 − dy2

1 + f− dz2 ,

where f = A sin(z − t). This doesn’t seem likely to be an exactsolution for large amplitudes, since the x and y coordinates aretreated asymmetrically. In the extreme case of |A| ≥ 1, therewould be singularities in gyy , but not in gxx . Clearly the metric willhave to have some kind of nonlinear dependence on f , but we justhaven’t found quite the right nonlinear dependence. Suppose wetry something of this form:

ds2 = dt2 −(

1 + f + cf 2)

dx2 −(

1− f + df 2)

dy2 − dz2

This approximately conserves volume, since (1+ f + . . .)(1− f + . . .)equals unity, up to terms of order f 2. The following program teststhis form.

1 load(ctensor);

2 ct_coords:[t,x,y,z];

3 f : A*exp(%i*k*(z-t));

4 lg:matrix([1,0,0,0],

5 [0,-(1+f+c*f^2),0,0],

6 [0,0,-(1-f+d*f^2),0],

7 [0,0,0,-1]);

8 cmetric();

9 einstein(true);

In line 3, the motivation for using the complex exponential ratherthan a sine wave in f is the usual one of obtaining simpler ex-pressions; as we’ll see, this ends up causing problems. In lines 5and 6, the symbols c and d have not been defined, and have notbeen declared as depending on other variables, so Maxima treatsthem as unknown constants. The result is Gt t ∼ (4d + 4c − 3)A2

for small A, so we can make the A2 term disappear by an appro-priate choice of d and c. For symmetry, we choose c = d = 3/8.With these values of the constants, the result for Gt t is of orderA4. This technique can be extended to higher and higher ordersof approximation, resulting in an exact series solution to the fieldequations.

Unfortunately, the whole story ends up being too good to be true.The resulting metric has complex-valued elements. If general rel-ativity were a linear field theory, then we could apply the usualtechnique of forming linear combinations of expressions of theform e+i... and e−i..., so as to give a real result. Unfortunately thefield equations of general relativity are nonlinear, so the resultinglinear combination is no longer a solution. The best we can do isto make a non-oscillatory real exponential solution (problem 2).

Section 9.2 Gravitational radiation 351

Page 352: General Relativity

An exact, oscillatory, non-monochromatic solution Example: 2Assume a metric of the form

ds2 = dt2 − p(z − t)2 dx2 − q(z − t)2 dy2 − dz2 ,

where p and q are arbitrary functions. Such a metric would clearlyrepresent some kind of transverse-polarized plane wave travelingat velocity c(= 1) in the z direction. The following Maxima codecalculates its Einstein tensor.

1 load(ctensor);

2 ct_coords:[t,x,y,z];

3 depends(p,[z,t]);

4 depends(q,[z,t]);

5 lg:matrix([1,0,0,0],

6 [0,-p^2,0,0],

7 [0,0,-q^2,0],

8 [0,0,0,-1]);

9 cmetric();

10 einstein(true);

The result is proportional to q/q + p/p, so any functions p andq that satisfy the differential equation q/q + p/p = 0 will resultin a solution to the field equations. Setting p(u) = 1 + A cos u, forexample, we find that q is oscillatory, but with a period longer than2π (problem 3).

An exact, plane, monochromatic wave Example: 3Any metric of the form

ds2 = (1− h) dt2 − dx2 − dy2 − (1 + h) dz2 + 2h dz dt ,

where h = f (z − t)xy , and f is any function, is an exact solution ofthe field equations (problem 4).

Because h is proportional to xy , this does not appear at firstglance to be a uniform plane wave. One can verify, however, thatall the components of the Riemann tensor depend only on z − t ,not on x or y . Therefore there is no measurable property of thismetric that varies with x and y .

9.2.5 Rate of radiation

How can we find the rate of gravitational radiation from a systemsuch as the Hulse-Taylor pulsar?

Let’s proceed by analogy. The simplest source of sound wavesis something like the cone of a stereo speaker. Since typical soundwaves have wavelengths measured in meters, the entire speaker isgenerally small compared to the wavelength. The speaker cone isa surface of oscillating displacement x = xo sinωt. Idealizing such

352 Chapter 9 Gravitational Waves

Page 353: General Relativity

f / The power emitted by amultipole source of order m isproportional to ω2(m+1), whenthe size of the source is smallcompared to the wavelength. Themain reason for the ω depen-dence is that at low frequencies,the wavelength is long, so thenumber of wavelengths traveledto a particular point in space isnearly the same from any pointin the source; we therefore getstrong cancellation.

a source to a radially pulsating spherical surface, we have an oscil-lating monopole that radiates sound waves uniformly in all direc-tions. To find the power radiated, we note that the velocity of thesource-surface is proportional to xoω, so the kinetic energy of the airimmediately in contact with it is proportional to ω2x2

o. The powerradiated is therefore proportional to ω2x2

o.

In electromagnetism, conservation of charge forbids the existenceof an oscillating electric monopole. The simplest radiating source istherefore an oscillating electric dipole D = Do sinωt. If the dipole’sphysical size is small compared to a wavelength of the radiation,then the radiation is an inefficient process; at any point in space,there is only a small difference in path length between the positiveand negative portions of the dipole, so there tends to be strongcancellation of their contributions, which were emitted with oppositephases. The result is that the wave’s electromagnetic potential four-vector (section 4.2.5) is proportional to Doω, the fields to Doω

2, andthe radiated power to D2

oω4. The factor of ω4 can be broken down

into (ω2)(ω2), where the first factor of ω2 occurs for reasons similarto the ones that explain the ω2 factor for the monopole radiationof sound, while the second ω2 arises because the smaller ω is, thelonger the wavelength, and the greater the inefficiency in radiationcaused by the small size of the source compared to the wavelength.

AM radio Example: 4Commercial AM radio uses wavelengths of several hundred me-ters, so AM dipole antennas are usually orders of magnitudeshorter than a wavelength. This causes severe attentuation inboth transmission and reception. (There are theorems calledreciprocity theorems that relate efficiency of transmission to ef-ficiency of reception.) Receivers therefore need to use of a largeamount of amplification. This doesn’t cause problems, becausethe ambient sources of RF noise are attenuated by the short an-tenna just as severely as the signal.

Since our universe doesn’t seem to have particles with negativemass, we can’t form a gravitational dipole by putting positive andnegative masses on opposite ends of a stick — and furthermore,such a stick will not spin freely about its center, because its centerof mass does not lie at its center! In a more realistic system, such asthe Hulse-Taylor pulsar, we have two unequal masses orbiting abouttheir common center of mass. By conservation of momentum, themass dipole moment of such a system is constant, so we cannot havean oscillating mass dipole. The simplest source of gravitational ra-diation is therefore an oscillating mass quadrupole, Q = Qo sinωt.As in the case of the oscillating electric dipole, the radiation is sup-pressed if, as is usually the case, the source is small compared tothe wavelength. The suppression is even stronger in the case of aquadrupole, and the result is that the radiated power is proportional

Section 9.2 Gravitational radiation 353

Page 354: General Relativity

to Q2oω

6.

This result has the interesting property of being invariant un-der a rescaling of coordinates. In geometrized units, mass, dis-tance, and time all have the same units, so that Q2

o has unitsof (length3)2 while ω6 has units of (length)−6. This is exactlywhat is required, because in geometrized units, power is unitless,energy/time = length/length = 1.

We can also tie the ω6 dependence to our earlier argument, onp. 346, for the dissipation of energy by gravitational waves. Theargument was that gravitating bodies are subject to time-delayedgravitational forces, with the result that orbits tend to decay. Thisargument only works if the forces are time-varying; if the forcesare constant over time, then the time delay has no effect. For ex-ample, in the semi-Newtonian limit the field of a sheet of mass isindependent of distance from the sheet. (The electrical analog ofthis fact is easily proved using Gauss’s law.) If two parallel sheetsfall toward one another, then neither is subject to a time-varyingforce, so there will be no radiation. In general, we expect that therewill be no gravitational radiation from a particle unless the thirdderivative of its position d3 x/ dt3 is nonzero. (The same is true forelectric quadrupole radiation.) In the special case where the positionoscillates sinusoidally, the chain rule tells us that taking the thirdderivative is equivalent to bringing out a factor of ω3. Since theamplitude of gravitational waves is proportional to d3 x/ dt3, theirenergy varies as (d3 x/ dt3)2, or ω6.

The general pattern we have observed is that for multipole radi-ation of order m (0=monopole, 1=dipole, 2=quadrupole), the radi-ated power depends on ω2(m+1). Since gravitational radiation mustalways have m = 2 or higher, we have the very steep ω6 depen-dence of power on frequency. This demonstrates that if we wantto see strong gravitational radiation, we need to look at systemsthat are oscillating extremely rapidly. For a binary system withunequal masses of order m, with orbits having radii of order r, wehave Qo ∼ mr2. Newton’s laws give ω ∼ m1/2r−3/2, which is essen-tially Kepler’s law of periods. The result is that the radiated powershould depend on (m/r)5. Reinserting the proper constants to givean equation that allows practical calculation in SI units, we have

P = kG4

c5

(mr

)5,

where k is a unitless constant of order unity.

For the Hulse-Taylor pulsar,12 we have m ∼ 3× 1030 kg (aboutone and a half solar masses) and r ∼ 109 m. The binary pulsar ismade to order our purposes, since m/r is extremely large comparedto what one sees in almost any other astronomical system. Theresulting estimate for the power is about 1024 watts.

12http://arxiv.org/abs/astro-ph/0407149

354 Chapter 9 Gravitational Waves

Page 355: General Relativity

The pulsar’s period is observed to be steadily lengthening at arate of α = 2.418× 10−12 seconds per second. To compare this withour crude theoretical estimate, we take the Newtonian energy of thesystem Gm2/r and multiply by ωα, giving 1025 W, which checks towithin an order of magnitude. A full general-relativistic calculationreproduces the observed value of α to within the 0.1% error bars ofthe data.

Section 9.2 Gravitational radiation 355

Page 356: General Relativity

Problems1 (a) Starting on page 21, we have associated geodesics withthe world-lines of low-mass objects (test particles). Use the Hulse-Taylor pulsar as an example to show that the assumption of lowmass was a necessary one. How is this similar to the issues encoun-tered on pp. 39ff involving charged particles?(b) Show that if low-mass, uncharged particles did not follow geodesics(in a spacetime with no ambient electromagnetic fields), it would vi-olate Lorentz invariance. Make sure that your argument explicitlyinvokes the low mass and the lack of charge, because otherwise yourargument is wrong. . Solution, p. 391

2 Show that the metric ds2 = dt2 −Adx2 −B dy2 − dz2 with

A = 1− f +3

8f2 − 25

416f3 +

15211

10729472f5

B = 1 + f +3

8f2 +

25

416f3 − 15211

10729472f5

f = Aek(t−z)

is an approximate solution to the vacuum field equations, providedthat k is real — which prevents this from being a physically realistic,oscillating wave. Find the next nonvanishing term in each series.

3 Verify the claims made in example 2. Characterize the (some-what complex) behavior of the function q obtained when p(u) =1 +A cosu.

4 Verify the claims made in example 3 using Maxima. Althoughthe result holds for any function f , you may find it more convenientto use some specific form of f , such as a sine wave, so that Maximawill be able to simplify the result to zero at the end. Note thatwhen the metric is expressed in terms of the line element, there is afactor of 2 in the 2hdz dt term, but when expressing it as a matrix,the 2 is not present in the matrix elements, because there are twoelements in the matrix that each contribute an equal amount.

356 Chapter 9 Gravitational Waves

Page 357: General Relativity

Appendix 1: Excerpts from three papers byEinstein

The following English translations of excerpts from three papers by Einstein were originallypublished in “The Principle of Relativity,” Methuen and Co., 1923. The translation was byW. Perrett and G.B. Jeffery, and notes were provided by A. Sommerfeld. John Walker (www.fourmilab.ch) has provided machine-readable versions of the first two and placed them in thepublic domain. Some notation has been modernized, British spelling has been Americanized,etc. Footnotes by Sommerfeld, Walker, and B. Crowell are marked with initials. B. Crowell’smodifications to the present version are also in the public domain.

The paper “On the electrodynamics of moving bodies” contains two parts, the first dealingwith kinematics and the second with electrodynamics. I’ve given only the first part here, sincethe second one is lengthy, and painful to read because of the cumbersome old-fashioned notation.The second section can be obtained from John Walker’s web site.

The paper “Does the inertia of a body depend upon its energy content?,” which begins onpage 370, is very short and readable. A shorter and less general version of its main argument isgiven on p. 135.

“The foundation of the general theory of relativity” is a long review article in which Einsteinsystematically laid out the general theory, which he had previously published in a series ofshorter papers. The first three sections of the paper give the general physical reasoning behindcoordinate independence, referred to as general covariance. It begins on page 372.

The reader who is interested in seeing these papers in their entirety can obtain them inex-pensively in a Dover reprint of the original Methuen anthology.

On the electrodynamics of moving bodiesA. Einstein, Annalen der Physik 17 (1905) 891.

It is known that Maxwell’s electrodynamics—as usually understood at the present time—when applied to moving bodies, leads to asymmetries which do not appear to be inherent inthe phenomena.13 Take, for example, the reciprocal electrodynamic action of a magnet anda conductor. The observable phenomenon here depends only on the relative motion of theconductor and the magnet, whereas the customary view draws a sharp distinction between thetwo cases in which either the one or the other of these bodies is in motion. For if the magnetis in motion and the conductor at rest, there arises in the neighbourhood of the magnet anelectric field with a certain definite energy, producing a current at the places where parts ofthe conductor are situated. But if the magnet is stationary and the conductor in motion, noelectric field arises in the neighbourhood of the magnet. In the conductor, however, we find anelectromotive force, to which in itself there is no corresponding energy, but which gives rise—assuming equality of relative motion in the two cases discussed—to electric currents of the samepath and intensity as those produced by the electric forces in the former case.

Examples of this sort, together with the unsuccessful attempts to discover any motion of theearth relative to the “light medium,” suggest that the phenomena of electrodynamics as well as

13Einstein begins by giving an example involving electromagnetic induction, considered in two different framesof reference. With modern hindsight, we would describe this by saying that a Lorentz boost mixes the electricand magnetic fields, as described in section 4.2.4, p. 136. —BC

Problems 357

Page 358: General Relativity

of mechanics possess no properties corresponding to the idea of absolute rest.14 They suggestrather that, as has already been shown to the first order of small quantities,15 the same lawsof electrodynamics and optics will be valid for all frames of reference for which the equationsof mechanics hold good.16 We will raise this conjecture (the purport of which will hereafter becalled the “Principle of Relativity”) to the status of a postulate, and also introduce anotherpostulate, which is only apparently irreconcilable with the former, namely, that light is alwayspropagated in empty space with a definite velocity c which is independent of the state of motionof the emitting body.17 These two postulates suffice for the attainment of a simple and consistenttheory of the electrodynamics of moving bodies based on Maxwell’s theory for stationary bodies.The introduction of a “luminiferous ether” will prove to be superfluous inasmuch as the viewhere to be developed will not require an “absolutely stationary space” provided with specialproperties, nor assign a velocity-vector to a point of the empty space in which electromagneticprocesses take place.

The theory to be developed is based—like all electrodynamics—on the kinematics of therigid body, since the assertions of any such theory have to do with the relationships betweenrigid bodies (systems of coordinates), clocks, and electromagnetic processes.18 Insufficient con-sideration of this circumstance lies at the root of the difficulties which the electrodynamics ofmoving bodies at present encounters.

I. KINEMATICAL PART

§1. Definition of Simultaneity

Let us take a system of coordinates in which the equations of Newtonian mechanics holdgood.19 In order to render our presentation more precise and to distinguish this system ofcoordinates verbally from others which will be introduced hereafter, we call it the “stationary

14Einstein knew about the Michelson-Morley experiment by 1905 (J. van Dongen, arxiv.org/abs/0908.1545),but it isn’t cited specifically here. The 1881 and 1887 Michelson-Morley papers are available online at en.

wikisource.org. —BC15I.e., to first order in v/c. Experimenters as early as Fresnel (1788-1827) had shown that there were no effects

of order v/c due to the earth’s motion through the aether, but they were able to interpret this without jettisoningthe aether, by contriving models in which solid substances dragged the aether along with them. The negativeresult of the Michelson-Morley experiment showed a lack of an effect of order (v/c)2. —BC

16The preceding memoir by Lorentz was not at this time known to the author. —AS17The second postulate is redundant if we take the “laws of electrodynamics and optics” to refer to Maxwell’s

equations. Maxwell’s equations require that light move at c in any frame of reference in which they are valid, andthe first postulate has already claimed that they are valid in all inertial frames of reference. Einstein probablystates constancy of c as a separate postulate because his audience is accustomed to thinking of Maxwell’s equationsas a partial mathematical representation of certain aspects of an underlying aether theory. Throughout part I ofthe paper, Einstein is able to derive all his results without assuming anything from Maxwell’s equations other thanthe constancy of c. The use of the term “postulate” suggests the construction of a formal axiomatic system likeEuclidean geometry, but Einstein’s real intention here is to lay out a set of philosophical criteria for evaluatingcandidate theories; he freely brings in other, less central, assumptions later in the paper, as when he invokeshomogeneity of spacetime on page 362. —BC

18Essentially what Einstein means here is that you can’t have Maxwell’s equations without establishing positionand time coordinates, and you can’t have position and time coordinates without clocks and rulers. Therefore eventhe description of a purely electromagnetic phenomenon such as a light wave depends on the existence of materialobjects. He doesn’t spell out exactly what he means by “rigid,” and we now know that relativity doesn’t actuallyallow the existence of perfectly rigid solids (see p. 109). Essentially he wants to be able to talk about rulers thatbehave like solids rather than liquids, in the sense that if they are accelerated sufficiently gently from rest andlater brought gently back to rest, their properties will be unchanged. When he derives the length contractionlater, he wants it to be clear that this isn’t a dynamical phenomenon caused by an effect such as the drag of theaether.—BC

19i.e., to the first approximation.—AS

358 Chapter 9 Gravitational Waves

Page 359: General Relativity

system.”

If a material point is at rest relative to this system of coordinates, its position can be definedrelative thereto by the employment of rigid standards of measurement and the methods ofEuclidean geometry, and can be expressed in Cartesian coordinates.

If we wish to describe the motion of a material point, we give the values of its coordinatesas functions of the time. Now we must bear carefully in mind that a mathematical descriptionof this kind has no physical meaning unless we are quite clear as to what we understand by“time.” We have to take into account that all our judgments in which time plays a part arealways judgments of simultaneous events. If, for instance, I say, “That train arrives here at 7o’clock,” I mean something like this: “The pointing of the small hand of my watch to 7 and thearrival of the train are simultaneous events.”20

It might appear possible to overcome all the difficulties attending the definition of “time”by substituting “the position of the small hand of my watch” for “time.” And in fact such adefinition is satisfactory when we are concerned with defining a time exclusively for the placewhere the watch is located; but it is no longer satisfactory when we have to connect in timeseries of events occurring at different places, or—what comes to the same thing—to evaluatethe times of events occurring at places remote from the watch.

We might, of course, content ourselves with time values determined by an observer stationedtogether with the watch at the origin of the coordinates, and coordinating the correspondingpositions of the hands with light signals, given out by every event to be timed, and reaching himthrough empty space. But this coordination has the disadvantage that it is not independent ofthe standpoint of the observer with the watch or clock, as we know from experience. We arriveat a much more practical determination along the following line of thought.

If at the point A of space there is a clock, an observer at A can determine the time valuesof events in the immediate proximity of A by finding the positions of the hands which aresimultaneous with these events. If there is at the point B of space another clock in all respectsresembling the one at A, it is possible for an observer at B to determine the time values ofevents in the immediate neighbourhood of B. But it is not possible without further assumptionto compare, in respect of time, an event at A with an event at B. We have so far defined onlyan “A time” and a “B time.” We have not defined a common “time” for A and B, for the lattercannot be defined at all unless we establish by definition that the “time” required by light totravel from A to B equals the “time” it requires to travel from B to A. Let a ray of light startat the “A time” tA from A towards B, let it at the “B time” tB be reflected at B in the directionof A, and arrive again at A at the “A time” t′A.

In accordance with definition the two clocks synchronize21 if

tB − tA = t′A − tB.

We assume that this definition of synchronism is free from contradictions, and possible forany number of points; and that the following relations are universally valid:—

1. If the clock at B synchronizes with the clock at A, the clock at A synchronizes with theclock at B.

20We shall not here discuss the inexactitude which lurks in the concept of simultaneity of two events at approx-imately the same place, which can only be removed by an abstraction.—AS

21The procedure described here is known as Einstein synchronization.—BC

Problems 359

Page 360: General Relativity

2. If the clock at A synchronizes with the clock at B and also with the clock at C, the clocksat B and C also synchronize with each other.22

Thus with the help of certain imaginary physical experiments we have settled what is tobe understood by synchronous stationary clocks located at different places, and have evidentlyobtained a definition of “simultaneous,” or “synchronous,” and of “time.” The “time” of an eventis that which is given simultaneously with the event by a stationary clock located at the placeof the event, this clock being synchronous, and indeed synchronous for all time determinations,with a specified stationary clock.

In agreement with experience we further assume the quantity

2AB

t′A − tA= c,

to be a universal constant—the velocity of light in empty space.

It is essential to have time defined by means of stationary clocks in the stationary system,and the time now defined being appropriate to the stationary system we call it “the time of thestationary system.”

§ 2. On the Relativity of Lengths and Times

The following reflections are based on the principle of relativity and on the principle of theconstancy of the velocity of light. These two principles we define as follows:—

1. The laws by which the states of physical systems undergo change are not affected, whetherthese changes of state be referred to the one or the other of two systems of coordinates in uniformtranslatory motion.

2. Any ray of light moves in the “stationary” system of coordinates with the determinedvelocity c, whether the ray be emitted by a stationary or by a moving body. Hence

velocity =light path

time interval

where time interval is to be taken in the sense of the definition in § 1.

Let there be given a stationary rigid rod; and let its length be l as measured by a measuring-rod which is also stationary. We now imagine the axis of the rod lying along the axis of x ofthe stationary system of coordinates, and that a uniform motion of parallel translation withvelocity v along the axis of x in the direction of increasing x is then imparted to the rod. Wenow inquire as to the length of the moving rod, and imagine its length to be ascertained by thefollowing two operations:—

(a) The observer moves together with the given measuring-rod and the rod to be measured,and measures the length of the rod directly by superposing the measuring-rod, in just the sameway as if all three were at rest.

(b) By means of stationary clocks set up in the stationary system and synchronizing inaccordance with § 1, the observer ascertains at what points of the stationary system the two

22This assumption fails in a rotating frame (see p. 111), but Einstein has restricted himself here to an approx-imately inertial frame of reference.—BC

360 Chapter 9 Gravitational Waves

Page 361: General Relativity

ends of the rod to be measured are located at a definite time. The distance between these twopoints, measured by the measuring-rod already employed, which in this case is at rest, is also alength which may be designated “the length of the rod.”

In accordance with the principle of relativity the length to be discovered by the operation(a)—we will call it “the length of the rod in the moving system”—must be equal to the lengthl of the stationary rod.

The length to be discovered by the operation (b) we will call “the length of the (moving)rod in the stationary system.” This we shall determine on the basis of our two principles, andwe shall find that it differs from l.

Current kinematics tacitly assumes that the lengths determined by these two operations areprecisely equal, or in other words, that a moving rigid body at the epoch t may in geometricalrespects be perfectly represented by the same body at rest in a definite position.

We imagine further that at the two ends A and B of the rod, clocks are placed which syn-chronize with the clocks of the stationary system, that is to say that their indications correspondat any instant to the “time of the stationary system” at the places where they happen to be.These clocks are therefore “synchronous in the stationary system.”

We imagine further that with each clock there is a moving observer, and that these observersapply to both clocks the criterion established in § 1 for the synchronization of two clocks. Leta ray of light depart from A at the time23 tA, let it be reflected at B at the time tB, and reachA again at the time t′A. Taking into consideration the principle of the constancy of the velocityof light we find that

tB − tA =rAB

c− vand t′A − tB =

rAB

c+ v

where rAB denotes the length of the moving rod—measured in the stationary system. Observersmoving with the moving rod would thus find that the two clocks were not synchronous, whileobservers in the stationary system would declare the clocks to be synchronous.

So we see that we cannot attach any absolute signification to the concept of simultaneity, butthat two events which, viewed from a system of coordinates, are simultaneous, can no longer belooked upon as simultaneous events when envisaged from a system which is in motion relativeto that system.

§ 3. Theory of the Transformation of coordinates and Times from a Stationary System toanother System in Uniform Motion of Translation Relative to the Former

Let us in “stationary” space take two systems of coordinates, i.e., two systems, each of threerigid material lines, perpendicular to one another, and issuing from a point. Let the axes of Xof the two systems coincide, and their axes of Y and Z respectively be parallel. Let each systembe provided with a rigid measuring-rod and a number of clocks, and let the two measuring-rods,and likewise all the clocks of the two systems, be in all respects alike.

Now to the origin of one of the two systems (k) let a constant velocity v be imparted inthe direction of the increasing x of the other stationary system (K), and let this velocity becommunicated to the axes of the coordinates, the relevant measuring-rod, and the clocks. To

23“Time” here denotes “time of the stationary system” and also “position of hands of the moving clock situatedat the place under discussion.”—AS

Problems 361

Page 362: General Relativity

any time of the stationary system K there then will correspond a definite position of the axesof the moving system, and from reasons of symmetry we are entitled to assume that the motionof k may be such that the axes of the moving system are at the time t (this “t” always denotesa time of the stationary system) parallel to the axes of the stationary system.

We now imagine space to be measured from the stationary system K by means of thestationary measuring-rod, and also from the moving system k by means of the measuring-rodmoving with it; and that we thus obtain the coordinates x, y, z, and ξ, η, ζ respectively. Further,let the time t of the stationary system be determined for all points thereof at which there areclocks by means of light signals in the manner indicated in § 1; similarly let the time τ of themoving system be determined for all points of the moving system at which there are clocks atrest relative to that system by applying the method, given in § 1, of light signals between thepoints at which the latter clocks are located.

To any system of values x, y, z, t, which completely defines the place and time of an eventin the stationary system, there belongs a system of values ξ, η, ζ, τ , determining that eventrelative to the system k, and our task is now to find the system of equations connecting thesequantities.

In the first place it is clear that the equations must be linear on account of the properties ofhomogeneity which we attribute to space and time.

If we place x′ = x− vt, it is clear that a point at rest in the system k must have a system ofvalues x′, y, z, independent of time. We first define τ as a function of x′, y, z, and t. To do thiswe have to express in equations that τ is nothing else than the summary of the data of clocksat rest in system k, which have been synchronized according to the rule given in § 1.

From the origin of system k let a ray be emitted at the time τ0 along the X-axis to x′, and atthe time τ1 be reflected thence to the origin of the coordinates, arriving there at the time τ2; wethen must have 1

2(τ0 + τ2) = τ1, or, by inserting the arguments of the function τ and applyingthe principle of the constancy of the velocity of light in the stationary system:—

1

2

[τ(0, 0, 0, t) + τ

(0, 0, 0, t+

x′

c− v+

x′

c+ v

)]= τ

(x′, 0, 0, t+

x′

c− v

).

Hence, if x′ be chosen infinitesimally small,

1

2

(1

c− v+

1

c+ v

)∂τ

∂t=

∂τ

∂x′+

1

c− v∂τ

∂t,

or

∂τ

∂x′+

v

c2 − v2

∂τ

∂t= 0.

It is to be noted that instead of the origin of the coordinates we might have chosen any otherpoint for the point of origin of the ray, and the equation just obtained is therefore valid for allvalues of x′, y, z.

An analogous consideration—applied to the axes of Y and Z—it being borne in mind thatlight is always propagated along these axes, when viewed from the stationary system, with thevelocity

√c2 − v2 gives us

362 Chapter 9 Gravitational Waves

Page 363: General Relativity

∂τ

∂y= 0,

∂τ

∂z= 0.

Since τ is a linear function, it follows from these equations that

τ = a

(t− v

c2 − v2x′)

where a is a function φ(v) at present unknown, and where for brevity it is assumed that at theorigin of k, τ = 0, when t = 0.

With the help of this result we easily determine the quantities ξ, η, ζ by expressing inequations that light (as required by the principle of the constancy of the velocity of light, incombination with the principle of relativity) is also propagated with velocity c when measuredin the moving system. For a ray of light emitted at the time τ = 0 in the direction of theincreasing ξ

ξ = cτ or ξ = ac

(t− v

c2 − v2x′)

.

But the ray moves relative to the initial point of k, when measured in the stationary system,with the velocity c− v, so that

x′

c− v= t.

If we insert this value of t in the equation for ξ, we obtain

ξ = ac2

c2 − v2x′.

In an analogous manner we find, by considering rays moving along the two other axes, that

η = cτ = ac

(t− v

c2 − v2x′)

when

y√c2 − v2

= t, x′ = 0.

Thus

η = ac√

c2 − v2y and ζ = a

c√c2 − v2

z.

Substituting for x′ its value, we obtain

Problems 363

Page 364: General Relativity

τ = φ(v)β(t− vx/c2),

ξ = φ(v)β(x− vt),η = φ(v)y,

ζ = φ(v)z,

where

β =1√

1− v2/c2,

and φ is an as yet unknown function of v. If no assumption whatever be made as to the initialposition of the moving system and as to the zero point of τ , an additive constant is to be placedon the right side of each of these equations.

We now have to prove that any ray of light, measured in the moving system, is propagatedwith the velocity c, if, as we have assumed, this is the case in the stationary system; for wehave not as yet furnished the proof that the principle of the constancy of the velocity of light iscompatible with the principle of relativity.

At the time t = τ = 0, when the origin of the coordinates is common to the two systems,let a spherical wave be emitted therefrom, and be propagated with the velocity c in system K.If (x, y, z) be a point just attained by this wave, then

x2 + y2 + z2 = c2t2.

Transforming this equation with the aid of our equations of transformation we obtain aftera simple calculation

ξ2 + η2 + ζ2 = c2τ2.

The wave under consideration is therefore no less a spherical wave with velocity of propaga-tion c when viewed in the moving system. This shows that our two fundamental principles arecompatible.24

In the equations of transformation which have been developed there enters an unknownfunction φ of v, which we will now determine.

For this purpose we introduce a third system of coordinates K′, which relative to the systemk is in a state of parallel translatory motion parallel to the axis of Ξ,25 such that the origin of

24The equations of the Lorentz transformation may be more simply deduced directly from the condition thatin virtue of those equations the relation x2 + y2 + z2 = c2t2 shall have as its consequence the second relationξ2 + η2 + ζ2 = c2τ2.—AS

25 In Einstein’s original paper, the symbols (Ξ, H, Z) for the coordinates of the moving system k were introducedwithout explicitly defining them. In the 1923 English translation, (X, Y, Z) were used, creating an ambiguitybetween X coordinates in the fixed system K and the parallel axis in moving system k. Here and in subsequentreferences we use Ξ when referring to the axis of system k along which the system is translating with respect toK. In addition, the reference to system K′ later in this sentence was incorrectly given as “k” in the 1923 Englishtranslation.—JW

364 Chapter 9 Gravitational Waves

Page 365: General Relativity

coordinates of system K′ moves with velocity −v on the axis of Ξ. At the time t = 0 let allthree origins coincide, and when t = x = y = z = 0 let the time t′ of the system K′ be zero. Wecall the coordinates, measured in the system K′, x′, y′, z′, and by a twofold application of ourequations of transformation we obtain

t′ = φ(−v)β(−v)(τ + vξ/c2) = φ(v)φ(−v)t,x′ = φ(−v)β(−v)(ξ + vτ) = φ(v)φ(−v)x,y′ = φ(−v)η = φ(v)φ(−v)y,z′ = φ(−v)ζ = φ(v)φ(−v)z.

Since the relations between x′, y′, z′ and x, y, z do not contain the time t, the systems Kand K′ are at rest with respect to one another, and it is clear that the transformation from Kto K′ must be the identical transformation. Thus

φ(v)φ(−v) = 1.

We now inquire into the signification of φ(v). We give our attention to that part of the axis ofY of system k which lies between ξ = 0, η = 0, ζ = 0 and ξ = 0, η = l, ζ = 0. This part of theaxis of Y is a rod moving perpendicularly to its axis with velocity v relative to system K. Itsends possess in K the coordinates

x1 = vt, y1 =l

φ(v), z1 = 0

andx2 = vt, y2 = 0, z2 = 0.

The length of the rod measured in K is therefore l/φ(v); and this gives us the meaning of thefunction φ(v). From reasons of symmetry it is now evident that the length of a given rod movingperpendicularly to its axis, measured in the stationary system, must depend only on the velocityand not on the direction and the sense of the motion. The length of the moving rod measured inthe stationary system does not change, therefore, if v and −v are interchanged. Hence followsthat l/φ(v) = l/φ(−v), or

φ(v) = φ(−v).

It follows from this relation and the one previously found that φ(v) = 1, so that the transfor-mation equations which have been found become

τ = β(t− vx/c2),

ξ = β(x− vt),η = y,

ζ = z,

where

Problems 365

Page 366: General Relativity

β = 1/√

1− v2/c2.

§ 4. Physical Meaning of the Equations Obtained in Respect to Moving Rigid Bodies andMoving Clocks

We envisage a rigid sphere26 of radius R, at rest relative to the moving system k, and withits centre at the origin of coordinates of k. The equation of the surface of this sphere movingrelative to the system K with velocity v is

ξ2 + η2 + ζ2 = R2.

The equation of this surface expressed in x, y, z at the time t = 0 is

x2

(√

1− v2/c2)2+ y2 + z2 = R2.

A rigid body which, measured in a state of rest, has the form of a sphere, therefore has in astate of motion—viewed from the stationary system—the form of an ellipsoid of revolution withthe axes

R√

1− v2/c2, R, R.

Thus, whereas the Y and Z dimensions of the sphere (and therefore of every rigid body of nomatter what form) do not appear modified by the motion, the X dimension appears shortenedin the ratio 1 :

√1− v2/c2, i.e., the greater the value of v, the greater the shortening. For v = c

all moving objects—viewed from the “stationary” system—shrivel up into plane figures.27 Forvelocities greater than that of light our deliberations become meaningless; we shall, however,find in what follows, that the velocity of light in our theory plays the part, physically, of aninfinitely great velocity.

It is clear that the same results hold good of bodies at rest in the “stationary” system,viewed from a system in uniform motion.

Further, we imagine one of the clocks which are qualified to mark the time t when at restrelative to the stationary system, and the time τ when at rest relative to the moving system,to be located at the origin of the coordinates of k, and so adjusted that it marks the time τ .What is the rate of this clock, when viewed from the stationary system?

Between the quantities x, t, and τ , which refer to the position of the clock, we have, evidently,x = vt and

τ =1√

1− v2/c2(t− vx/c2).

Therefore,

26That is, a body possessing spherical form when examined at rest.—AS27 In the 1923 English translation, this phrase was erroneously translated as “plain figures”. I have used the

correct “plane figures” in this edition.—JW

366 Chapter 9 Gravitational Waves

Page 367: General Relativity

τ = t√

1− v2/c2 = t− (1−√

1− v2/c2)t

whence it follows that the time marked by the clock (viewed in the stationary system) is slow by1−

√1− v2/c2 seconds per second, or—neglecting magnitudes of fourth and higher order—by

12v

2/c2.

From this there ensues the following peculiar consequence. If at the points A and B of Kthere are stationary clocks which, viewed in the stationary system, are synchronous; and if theclock at A is moved with the velocity v along the line AB to B, then on its arrival at B thetwo clocks no longer synchronize, but the clock moved from A to B lags behind the other whichhas remained at B by 1

2 tv2/c2 (up to magnitudes of fourth and higher order), t being the time

occupied in the journey from A to B.

It is at once apparent that this result still holds good if the clock moves from A to B in anypolygonal line, and also when the points A and B coincide.

If we assume that the result proved for a polygonal line is also valid for a continuously curvedline, we arrive at this result: If one of two synchronous clocks at A is moved in a closed curvewith constant velocity until it returns to A, the journey lasting t seconds, then by the clockwhich has remained at rest the travelled clock on its arrival at A will be 1

2 tv2/c2 second slow.

Thence we conclude that a spring-clock at the equator must go more slowly, by a very smallamount, than a precisely similar clock situated at one of the poles under otherwise identicalconditions.28

§ 5. The Composition of Velocities

In the system k moving along the axis of X of the system K with velocity v, let a point movein accordance with the equations

ξ = wξτ , η = wητ , ζ = 0,

where wξ and wη denote constants.

Required: the motion of the point relative to the system K. If with the help of the equationsof transformation developed in § 3 we introduce the quantities x, y, z, t into the equations ofmotion of the point, we obtain

x =wξ + v

1 + vwξ/c2t,

y =

√1− v2/c2

1 + vwξ/c2wηt,

z = 0.

28Einstein specifies a spring-clock (“unruhuhr”) because the effective gravitational field is weaker at the equatorthan at the poles, so a pendulum clock at the equator would run more slowly by about two parts per thousandthan one at the north pole, for nonrelativistic reasons. This would completely mask any relativistic effect, whichhe expected to be on the order of v2/c2, or about 10−13. In any case, it later turned out that Einstein wasmistaken about this example. There is also a gravitational time dilation that cancels the kinematic effect. Seeexample 11, p. 58. The two clocks would actually agree.—BC

Problems 367

Page 368: General Relativity

Thus the law of the parallelogram of velocities is valid according to our theory only to a firstapproximation. We set29

V 2 =

(dx

dt

)2

+

(dy

dt

)2

,

w2 = w2ξ + w2

η,

a = tan−1wη/wξ,

a is then to be looked upon as the angle between the velocities v and w. After a simple calculationwe obtain

V =

√(v2 + w2 + 2vw cos a)− (vw sin a/c)2

1 + vw cos a/c2.

It is worthy of remark that v and w enter into the expression for the resultant velocity in asymmetrical manner. If w also has the direction of the axis of X, we get

V =v + w

1 + vw/c2.

It follows from this equation that from a composition of two velocities which are less than c,there always results a velocity less than c. For if we set v = c − κ,w = c − λ, κ and λ beingpositive and less than c, then

V = c2c− κ− λ

2c− κ− λ+ κλ/c< c.

It follows, further, that the velocity of light c cannot be altered by composition with avelocity less than that of light. For this case we obtain

V =c+ w

1 + w/c= c.

We might also have obtained the formula for V, for the case when v and w have the samedirection, by compounding two transformations in accordance with § 3. If in addition to thesystems K and k figuring in § 3 we introduce still another system of coordinates k′ movingparallel to k, its initial point moving on the axis of Ξ30 with the velocity w, we obtain equationsbetween the quantities x, y, z, t and the corresponding quantities of k′, which differ from theequations found in § 3 only in that the place of “v” is taken by the quantity

v + w

1 + vw/c2;

from which we see that such parallel transformations—necessarily—form a group.

29This equation was incorrectly given in Einstein’s original paper and the 1923 English translation as a =tan−1 wy/wx.—JW

30 “X” in the 1923 English translation.—JW

368 Chapter 9 Gravitational Waves

Page 369: General Relativity

We have now deduced the requisite laws of the theory of kinematics corresponding to ourtwo principles, and we proceed to show their application to electrodynamics.31

31The remainder of the paper is not given here, but can be obtained from John Walker’s web site at www.

fourmilab.ch.—BC

Problems 369

Page 370: General Relativity

Does the inertia of a body depend upon its energy content?A. Einstein, Annalen der Physik. 18 (1905) 639.

The results of the previous investigation lead to a very interesting conclusion, which is hereto be deduced.

I based that investigation on the Maxwell-Hertz equations for empty space, together withthe Maxwellian expression for the electromagnetic energy of space, and in addition the principlethat:—

The laws by which the states of physical systems alter are independent of the alternative, towhich of two systems of coordinates, in uniform motion of parallel translation relative to eachother, these alterations of state are referred (principle of relativity).

With these principles32 as my basis I deduced inter alia the following result (§ 8):—

Let a system of plane waves of light, referred to the system of coordinates (x, y, z), possessthe energy l; let the direction of the ray (the wave-normal) make an angle φ with the axis of xof the system. If we introduce a new system of coordinates (ξ, η, ζ) moving in uniform paralleltranslation with respect to the system (x, y, z), and having its origin of coordinates in motionalong the axis of x with the velocity v, then this quantity of light—measured in the system(ξ, η, ζ)—possesses the energy33

l∗ = l1− v

c cosφ√1− v2/c2

where c denotes the velocity of light. We shall make use of this result in what follows.

Let there be a stationary body in the system (x, y, z), and let its energy—referred to thesystem (x, y, z) be E0. Let the energy of the body relative to the system (ξ, η, ζ) moving asabove with the velocity v, be H0.

Let this body send out, in a direction making an angle φ with the axis of x, plane wavesof light, of energy 1

2L measured relative to (x, y, z), and simultaneously an equal quantity oflight in the opposite direction. Meanwhile the body remains at rest with respect to the system(x, y, z). The principle of energy must apply to this process, and in fact (by the principle ofrelativity) with respect to both systems of coordinates. If we call the energy of the body afterthe emission of light E1 or H1 respectively, measured relative to the system (x, y, z) or (ξ, η, ζ)respectively, then by employing the relation given above we obtain

E0 = E1 +1

2L +

1

2L,

H0 = H1 +1

2L

1− vc cosφ√

1− v2/c2+

1

2L

1 + vc cosφ√

1− v2/c2

= H1 +L√

1− v2/c2.

By subtraction we obtain from these equations

32The principle of the constancy of the velocity of light is of course contained in Maxwell’s equations.—AS33See homework problem 11, p. 156.—BC

370 Chapter 9 Gravitational Waves

Page 371: General Relativity

H0 − E0 − (H1 − E1) = L

1√

1− v2/c2− 1

.

The two differences of the form H − E occurring in this expression have simple physical signi-fications. H and E are energy values of the same body referred to two systems of coordinateswhich are in motion relative to each other, the body being at rest in one of the two systems(system (x, y, z)). Thus it is clear that the difference H − E can differ from the kinetic energyK of the body, with respect to the other system (ξ, η, ζ), only by an additive constant C, whichdepends on the choice of the arbitrary additive constants34 of the energies H and E. Thus wemay place

H0 − E0 = K0 + C,

H1 − E1 = K1 + C,

since C does not change during the emission of light. So we have

K0 −K1 = L

1√

1− v2/c2− 1

.

The kinetic energy of the body with respect to (ξ, η, ζ) diminishes as a result of the emissionof light, and the amount of diminution is independent of the properties of the body. Moreover,the difference K0 −K1, like the kinetic energy of the electron (§ 10), depends on the velocity.

Neglecting magnitudes of fourth and higher orders35 we may place

K0 −K1 =1

2

L

c2v2.

From this equation it directly follows36 that:—

If a body gives off the energy L in the form of radiation, its mass diminishes by L/c2. Thefact that the energy withdrawn from the body becomes energy of radiation evidently makes nodifference, so that we are led to the more general conclusion that

The mass of a body is a measure of its energy-content; if the energy changes by L, the masschanges in the same sense by L/9 × 1020, the energy being measured in ergs, and the mass ingrammes.

It is not impossible that with bodies whose energy-content is variable to a high degree (e.g.with radium salts) the theory may be successfully put to the test.

If the theory corresponds to the facts, radiation conveys inertia between the emitting andabsorbing bodies.

34A potential energy U is only defined up to an additive constant. If, for example, U depends on the distancebetween particles, and the distance undergoes a Lorentz contraction, there is no reason to imagine that theconstant will stay the same.—BC

35The purpose of making the approximation is to show that under realistic lab conditions, the effect exactlymimics a change in Newtonian mass.

36The object has the same velocity v before and after emission of the light, so this reduction in kinetic energyhas to be attributed to a change in mass.—BC

Problems 371

Page 372: General Relativity

The foundation of the general theory of relativityA. Einstein, Annalen der Physik 49 (1916) 769.

[A one-page introduction relating to history and personalities is omitted.—BC]

A. FUNDAMENTAL CONSIDERATIONS ON THE POSTULATE OF RELATIVITY

§1. Observations on the Special Theory of Relativity

The special theory of relativity is based on the following postulate, which is also satisfied bythe mechanics of Galileo and Newton. If a system of coordinates K is chosen so that, in relationto it, physical laws hold good in their simplest form, the same laws also hold good in relation toany other system of coordinates K ′ moving in uniform translation relative to K. This postulatewe call the “special principle of relativity.” The word “special” is meant to intimate that theprinciple is restricted to the case when K ′ has a motion of uniform translation37 relative to K,but that the equivalence of K ′ and K does not extend to the case of non-uniform motion of K ′

relative to K.

Thus the special theory of relativity does not depart from classical mechanics through thepostulate of relativity, but through the postulate of the constancy of the velocity of light invacuo, from which, in combination with the special principle of relativity, there follow, in thewell-known way, the relativity of simultaneity, the Lorentzian transformation and the relatedlaws for the behaviour of moving bodies and clocks.

The modification to which the special theory of relativity has subjected the theory of spaceand time is indeed far-reaching, but one important point has remained unaffected. For the laws ofgeometry, even according to the special theory of relativity, are to be interpreted directly as lawsrelating to the possible relative positions of solid bodies at rest; and, in a more general way, thelaws of kinematics are to be interpreted as laws which describe the relations of measuring bodiesand clocks. To two selected material points of a stationary rigid body there always correspondsa distance of quite definite length, which is independent of the locality and orientation of thebody, and is also independent of the time. To two selected positions of the hands of a clock atrest relative to the privileged system of reference there always corresponds an interval of timeof a definite length, which is independent of place and time. We shall soon see that the generaltheory of relativity cannot adhere to this simple physical interpretation of space and time.38

§2. The Need for an Extension of the Postulate of Relativity

In classical mechanics, and no less in the special theory of relativity, there is an inherentepistemological defect which was, perhaps for the first time, clearly pointed out by Ernst Mach.We will elucidate it by the following example:39 — Two fluid bodies of the same size and nature

37Here Einstein defines the distinction between special and general relativity according to whether acceleratedframes of reference are allowed. The modern tendency is to pose this distinction in terms of flat versus curvedspacetime, so that accelerated frames of reference in flat spacetime are considered to be part of special relativity.None of this has anything to do with the ability to describe accelerated objects. For example, special relativity isperfectly capable of describing the twin paradox.—BC

38Einstein is just starting to lay out his argument, and has not yet made clear in what sense these statementsabout location-independence of clocks and rulers could be empirically tested. It becomes more clear later thathe means something like this. We could try to fill spacetime with a lattice of clocks and rulers, to synchronizethe clocks, and to construct the lattice so that it consisted of right angles and equal-length line segments. Thissucceeds in special relativity, so that the geometry of spacetime is compatible with frames of reference that splitup spacetime into 3+1 dimensions, where the three dimensions are Euclidean. The same prescription fails ingeneral relativity.—BC

39This example was described on p. 115.—BC

372 Chapter 9 Gravitational Waves

Page 373: General Relativity

hover freely in space at so great a distance from each other and from all other masses that onlythose gravitational forces need be taken into account which arise from the interaction of differentparts of the same body. Let the distance between the two bodies be invariable, and in neitherof the bodies let there be any relative movements of the parts with respect to one another.

But let either mass, as judged by an observer at rest relative to the other mass, rotatewith constant angular velocity about the line joining the masses. This is a verifiable relativemotion of the two bodies. Now let us imagine that each of the bodies has been surveyed bymeans of measuring instruments at rest relative to itself, and let the surface of S1 prove to be asphere, and that of S2 an ellipsoid of revolution. Thereupon we put the question — What is thereason for this difference in the two bodies? No answer can be admitted as epistemologicallysatisfactory,40 unless the reason given is an observable fact of experience. The law of causalityhas not the significance of a statement as to the world of experience, except when observablefacts ultimately appear as causes and effects.

Newtonian mechanics does not give a satisfactory answer to this question. It pronounces asfollows: — The laws of mechanics apply to the space R1, in respect to which the body S1 is atrest, but not to the space R2 , in respect to which the body S2 is at rest. But the privilegedspace R1 of Galileo, thus introduced, is a merely factitious41 cause, and not a thing that can beobserved. It is therefore clear that Newton’s mechanics does not really satisfy the requirementof causality in the case under consideration but only apparently does so, since it makes thefactitious cause R1 responsible for the observable difference in the bodies S1 and S2 .

The only satisfactory answer must be that the physical system consisting of S1 and S2

reveals within itself no imaginable cause to which the differing behaviour of S1 and S2 can bereferred. The cause must therefore lie outside this system. We have to take it that the generallaws of motion, which in particular determine the shapes of S1 and S2 , must be such that themechanical behaviour of S1 and S2 is partly conditioned in quite essential respects, by distantmasses which we have not included in the system under consideration. These distant massesand their motions relative to S1 and S2 must then be regarded as the seat of the causes (whichmust be susceptible to observation) of the different behaviour of our two bodies S1 and S2. Theytake over the role of the factitious cause R1. Of all imaginable spaces R1, R2, etc., in any kindof motion relative to one another there is none which we may look upon as privileged a prioriwithout reviving the above-mentioned epistemological objection. The laws of physics must beof such a nature that they apply to systems reference in any kind of motion.42 Along this roadwe arrive at an extension at the postulate of relativity.

In addition to this weighty argument from the theory of knowledge, there is a well-knownphysical fact which favours an extension of the theory of relativity. Let K be a Galilean systemof reference, i.e., a system relative to which (at least in the four-dimensional region underconsideration) a mass, sufficiently distant from other masses, is moving with uniform motionin a straight line. Let K ′ be a second system of reference which is moving relative to K inuniformly accelerated translation. Then, relative to K ′, a mass sufficiently distant from other

40Of course an answer may be satisfactory from the point of view of epistemology, and yet be unsound hysically,if it is in conflict with other experiences. —AS

41i.e., artificial —BC42At this time, Einstein had high hopes that his theory would be fully Machian. He was already aware of the

Schwarzschild solution (he refers to it near the end of the paper), which offended his Machian sensibilities becauseit imputed properties to spacetime in a universe containing only a single point-mass. In the present example ofthe bodies S1 and S2, general relativity actually turns out to give the non-Machian result which Einstein heresays would be unsatisfactory.—BC

Problems 373

Page 374: General Relativity

masses would have an accelerated motion such that its acceleration and direction of accelerationare independent of the material composition and physical state of the mass.

Does this permit an observer at rest relative to K ′ to infer that he is on a “really” acceleratedsystem of reference? The answer is in the negative; for the above-mentioned relation of freelymovable masses to K ′ may be interpreted equally well in the following way. The system ofreference K ′ is unaccelerated, but the space-time territory in question is under the sway of agravitational field, which generates the accelerated motion of the bodies relative to K ′.

This view is made possible for us by the teaching of experience as to the existence of a fieldof force, namely, the gravitational field, which possesses the remarkable property of impartingthe same acceleration to all bodies.43 The mechanical behaviour of bodies relative to K ′ isthe same as presents itself to experience in the case of systems which we are wont to regardas “stationary” or as “privileged.” Therefore, from the physical standpoint, the assumptionreadily suggests itself that the systems K and K ′ may both with equal right be looked uponas “stationary” that is to say, they have an equal title as systems of reference for the physicaldescription of phenomena.

It will be seen from these reflections that in pursuing the general theory of relativity we shallbe led to a theory of gravitation, since we are able to “produce” a gravitational field merely bychanging the system of coordinates. It will also be obvious that the principle of the constancyof the velocity of light in vacuo must be modified, since we easily recognize that the path ofa ray of light with respect to K ′ must in general be curvilinear, if with respect to K light ispropagated in a straight line with a definite constant velocity.

§3. The Space-Time Continuum. Requirement of General Covariance for the EquationsExpressing General Laws of Nature

In classical mechanics, as well as in the special theory of relativity, the coordinates of spaceand time have a direct physical meaning. To say that a point-event has the X1 coordinate x1

means that the projection of the point-event on the axis of X1, determined by rigid rods andin accordance with the rules of Euclidean geometry, is obtained by measuring off a given rod(the unit of length) x1 times from the origin of coordinates along the axis of X1. To say thata point-event has the X4 coordinate x4 = t, means that a standard clock, made to measuretime in a definite unit period, and which is stationary relative to the system of coordinates andpractically coincident in space with the point-event,44 will have measured off x4 = t periods atthe occurrence of the event.

This view of space and time has always been in the minds of physicists, even if, as a rule, theyhave been unconscious of it. This is clear from the part which these concepts play in physicalmeasurements; it must also have underlain the reader’s reflections on the preceding paragraphfor him to connect any meaning with what he there read. But we shall now show that we mustput it aside and replace it by a more general view, in order to be able to carry through thepostulate of general relativity, if the special theory of relativity applies to the special case of theabsence of a gravitational field.

In a space which is free of gravitational fields we introduce a Galilean system of reference

43Eotvos has proved experimentally that the gravitational field has this property in great accuracy.—AS44We assume the possibility of verifying “simultaneity” for events immediately proximate in space, or — to

speak more precisely — for immediate proximity or coincidence in space-time, without giving a definition of thisfundamental concept.—AS

374 Chapter 9 Gravitational Waves

Page 375: General Relativity

K (x, y, z, t), and also a system of coordinates K ′ (x′, y′, z′, t′) in uniform rotation45 relative toK. Let the origins of both systems, as well as their axes of Z, permanently coincide. We shallshow that for a space-time measurement in the system K ′ the above definition of the physicalmeaning of lengths and times cannot be maintained. For reasons of symmetry it is clear that acircle around the origin in the X, Y plane of K may at the same time be regarded as a circlein the X ′, Y ′ plane of K ′. We suppose that the circumference and diameter of this circle havebeen measured with a unit measure infinitely small compared with the radius, and that we havethe quotient of the two results. If this experiment were performed with a measuring-rod46 atrest relative to the Galilean system K, the quotient would be π. With a measuring-rod at restrelative to K ′, the quotient would be greater than π. This is readily understood if we envisagethe whole process of measuring from the “stationary” system K, and take into considerationthat the measuring-rod applied to the periphery undergoes a Lorentzian contraction, while theone applied along the radius does not.47 Hence Euclidean geometry does not apply to K ′.The notion of coordinates defined above, which presupposes the validity of Euclidean geometry,therefore breaks down in relation to the system K ′. So, too, we are unable to introduce atime corresponding to physical requirements in K ′, indicated by clocks at rest relative to K ′.To convince ourselves of this impossibility, let us imagine two clocks of identical constitutionplaced, one at the origin of coordinates, and the other at the circumference of the circle, andboth envisaged from the “stationary” system K. By a familiar result of the special theory ofrelativity, the clock at the circumference — judged from K — goes more slowly than the other,because the former is in motion and the latter at rest. An observer at the common originof coordinates, capable of observing the clock at the circumference by means of light, wouldtherefore see it lagging behind the clock beside him. As he will not make up his mind to let thevelocity of light along the path in question depend explicitly on the time, he will interpret hisobservations as showing that the clock at the circumference “really” goes more slowly than theclock at the origin. So he will be obliged to define time in such a way that the rate of a clockdepends upon where the clock may be.

We therefore reach this result: — In the general theory of relativity, space and time cannotbe defined in such a way that differences of the spatial coordinates can be directly measured bythe unit measuring-rod, or differences in the time coordinate by a standard clock.

The method hitherto employed for laying coordinates into the space-time continuum in adefinite manner thus breaks down, and there seems to be no other way which would allow usto adapt systems of coordinates to the four-dimensional universe so that we might expect fromtheir application a particularly simple formulation of the laws of nature. So there is nothing forit but to regard all imaginable systems of coordinates, on principle, as equally suitable for thedescription of nature.48 This comes to requiring that: —

45This example of a rotating frame of reference was discussed on p. 108.—BC46Einstein implicitly assumes that the measuring rods are perfectly rigid, but it is not obvious that this is

possible. This issue is discussed on p. 113.—BC47As described on p. 109, Ehrenfest originally imagined that the circumference of the disk would be reduced

by its rotation. His argument was incorrect, because it assumed the ability to start the disk rotating when ithad originally been at rest. The present paper marks the first time that Einstein asserted the opposite, that thecircumference is increased.—BC

48This is a conceptual leap, not a direct inference from the argument about the rotating frame. Einstein startedthinking about this argument in 1912, and concluded from it that he should base a theory of gravity on non-Euclidean geometry. Influenced by Levi-Civita, he tried to carry out this project in a coordinate-independent way,but he failed at first, and for a while explored a theory that was not coordinate-independent. Only later did hereturn to coordinate-independence. It should be clear, then, that the link between the rotating-frame argumentand coordinate-independence was not as clearcut as Einstein makes out here, since he himself lost faith in it for

Problems 375

Page 376: General Relativity

The general laws of nature are to be expressed by equations which hold good for all thesystems of coordinates, that is, are covariant with respect to any substitutions whatever (generallycovariant).49

It is clear that a physical theory which satisfies this postulate will also be suitable for thegeneral postulate of relativity.50 For the sum of all substitutions in any case includes thosewhich correspond to all relative motions of three-dimensional systems of coordinates. Thatthis requirement of general covariance, which takes away from space and time the last remnantof physical objectivity,51 is a natural one, will be seen from the following reflection. All ourspace-time verifications invariably amount to a determination of space-time coincidences.52 If,for example, events consisted merely in the motion of material points, then ultimately nothingwould be observable but the meetings of two or more of these points. Moreover, the resultsof our measurings are nothing but verifications of such meetings of the material points of ourmeasuring instruments with other material points, coincidences between the hands of a clockand points on the clock-dial, and observed point-events happening at the same place at the sametime.

The introduction of a system of reference serves no other purpose than to facilitate thedescription of the totality of such coincidences. We allot to the universe four space-time variablesx1, x2, x3, x4 in such a way that for every point-event there is a corresponding system of valuesof the variables x1 . . .x4. To two coincident point-events there corresponds one system of valuesof the variables x1 . . .x4, i.e., coincidence is characterized by the identity of the coordinates.If, in place of the variables x1 . . .x4, we introduce functions of them, x′1, x′2, x′3, x′4, as a newsystem of coordinates, so that the systems of values are made to correspond to one anotherwithout ambiguity, the equality of all four coordinates in the new system will also serve as anexpression for the space-time coincidence of the two point-events. As all our physical experiencecan be ultimately reduced to such coincidences, there is no immediate reason for preferringcertain systems of coordinates to others, that is to say, we arrive at the requirement of generalcovariance.

a while.—BC49In this book I’ve used the more transparent terminology “coordinate independence” rather than “general

covariance.”—BC50For more on this point, see p. 117.—BC51This is an extreme interpretation of general covariance, and one that Einstein himself didn’t hew closely to

later on. He presented an almost diametrically opposed interpretation in a philosophical paper, “On the aether,”Schweizerische naturforschende Gesellschaft 105 (1924) 85.—BC

52i.e., what this book refers to as incidence measurements (p. 97)—BC

376 Chapter 9 Gravitational Waves

Page 377: General Relativity

Appendix 2: Hints and solutions

Hints

Hints for Chapter 1

Page 38, problem 5: Apply the equivalence principle.

Solutions to Selected Homework Problems

Solutions for Chapter 1

Page 38, problem 3:

Pick two points P1 and P2. By O2, there is another point P3 that is distinct from P1 andP2. (Recall that the notation [ABC] was defined so that all three points must be distinct.)Applying O2 again, there must be a further point P4 out beyond P3, and by O3 this can’t bethe same as P1. Continuing in this way, we can produce as many points as there are integers.

Page 38, problem 4:

(a) If the violation of (1) is tiny, then of course Kip won’t really have any practical way toviolate (2), but the idea here is just to illustrate the idea, so to make things easy, let’s imagine anunrealistically large violation of (1). Suppose that neutrons have about the same inertial mass asprotons, but zero gravitational mass, in extreme violation of (1). This implies that neutron-richelements like uranium would have a much lower gravitational acceleration on earth than oneslike oxygen that are roughly 50-50 mixtures of neutrons and protons. Let’s also simplify bymaking a second unrealistically extreme assumption: let’s say that Kip has a keychain in hispocket made of neutronium, a substance composed of pure neutrons. On earth, the keychainhovers in mid-air. Now he can release his keychain in the prison cell. If he’s on a planet, itwill hover. If he’s in an accelerating spaceship, then the keychain will follow Newton’s first law(its tendency to do so being measured by its nonzero inertial mass), while the deck of the shipaccelerates up to hit it.

(b) It violates O1. O1 says that objects prepared in identical inertial states (as defined bytwo successive events in their motion) are predicted to have identical motion in the future. Thisfails in the case where Kip releases the neutronium keychain side by side with a penny.

Page 38, problem 5: By the equivalence principle, we can adopt a frame tied to the tossedclock, B, and in this frame there is no gravitational field. We see a desk and clock A go by.The desk applies a force to clock A, decelerating it and then reaccelerating it so that it comesback. We’ve already established that the effect of motion is to slow down time, so clock A readsa smaller time interval.

Page 38, problem 6: (a) Generalizing the expression gy/c2 for the fractional time dilation tothe case of a nonuniform field, we find Φ/c2, where Φ is the Newtonian gravitational potential,i.e., the gravitational energy per unit mass. The shell theorem gives a gravitational field g =Mr/R3. Integration shows Φ = Mr2/2R3. The difference in the gravitational potential betweenthese two points, divided by c2, is Φ/c2 = M/2c2R, which comes out to be 3.5×10−10. This is thefractional difference in clock rates. (b) The probe’s velocity is on the same order of magnitude

Problems 377

Page 378: General Relativity

as escape velocity from the inner solar system, so very roughly we can say that v ∼√|∆Φ|,

where ∆Φ is the difference between the gravitational potential at the earth’s orbit and infinity.This gives a Doppler shift v/c ∼

√|∆Φ|/c. We saw in part a that the gravitational Doppler

shift was ∆Φ/c2, which is the square of this quantity, and therefore much smaller.

Page 39, problem 7: (a) In case 1 there is no source of energy, so the particle cannot radiate.In case 2-4, the particle radiates, because there are sources of energy (loss of gravitational energyin 2 and 3, the rocket fuel in 4).

(b) In 1, Newton says the object is subject to zero net force, so its motion is inertial. In2-4, he says the object is subject to a nonvanishing net force, so its motion is noninertial. Thismatches up with the results of the energy analysis.

(c) The equivalence principle, as discussed on page 39, is vague, and is particularly difficultto apply successful and unambiguously to situations involving electrically charged objects, dueto the difficulty of defining locality. Applying the equivalence principle in the most naive way,we predict that there can be no radiation in cases 2 and 3 (because the object is following ageodesic, minding its own business). In case 4, everyone agrees that there will be radiationobservable back on earth (although it’s possible that it would not be observable to an observermomentarily matching velocities with the rocket). The naive equivalence principle says that 1and 4 must give the same result, so we should have radiation in 1 as well. These predictions arewrong in two out of the four equations, which tells us that we had better either not apply theequivalence principle to charged objects, or not apply it in such a naive way.

Page 39, problem 8:

(a) The dominant form of radiation from the orbiting charge will be the lowest-order non-vanishing multipole, which in this case is a dipole. The power radiated from a dipole scales liked2ω4, where d is the dipole moment. For an orbit of radius r, this becomes q2r2ω4. To findthe reaction force on the charged particle, we can use the relation p = E/c for electromagneticwaves (section ??), which tells us that the force is equal to the power, up to a proportionalityconstant c. Therefore ar ∝ q2r2ω4/m. The gravitational acceleration is ag = ω2r, so we havear/ag ∝ (q2/m)ω2r, or ar/ag ∝ (q2/m)ag, where the ag on the right can be taken as an orbitalparameter, and for a low-earth orbit is very nearly equal to the usual acceleration of gravity atthe earth’s surface.

(b) In SI units, ar/ag ∼ (k/c4)(q2/m)ag, where k is the Coulomb constant.

(c) The result is 10−34. If one tried to do this experiment in reality, the effect would beimpossible to detect, because the proton would be affected much more strongly by ambientelectric and magnetic fields than by the effect we’ve calculated.

Remark: It is odd that the result depends on q2/m, rather than on the charge-to-mass ratioq/m, as is usually the case for a test particle’s trajectory. This means that we get a differentanswer if we take two identical objects, place them side by side, and consider them as one bigobject! This is not as unphysical as it sounds. The two side-by-side objects radiate coherently,so the field they radiate is doubled, and the radiated power is quadrupled. Each object’s rateof orbital decay is doubled, with the extra effect coming from electromagnetic interactions withthe other object’s fields.

Solutions for Chapter 2

378 Chapter 9 Gravitational Waves

Page 379: General Relativity

Page 84, problem 1:

(a) Let t be the time taken in the lab frame for the light to go from one mirror to the other,and t′ the corresponding interval in the clock’s frame. Then t′ = L, and (vt)2+L2 = t2, where theuse of the same L in both equations makes use of our prior knowledge that there is no transverselength contraction. Eliminating L, we find the expected expression for γ, which is independentof L (b) If the result of a were independent of L, then the relativistic time dilation would dependon the details of the construction of the clock measuring the time dilation. We would be forcedto abandon the geometrical interpretation of special relativity. (c) The effect is to replace vt withvt+at2/2 as the quantity inside the parentheses in the expression (. . .)2 +L2 = t2. The resultingcorrection terms are of higher order in t than the ones appearing in the original expression, andcan therefore be made as small in relative size as desired by shortening the time t. But this isexactly what happens when we make the clock sufficiently small.

Page 85, problem 2:

Since gravitational redshifts can be interpreted as gravitational time dilations, the gravi-tational time dilation is given by the difference in gravitational potential g dr (in units wherec = 1). The kinematic effect is given by dγ = d(v2)/2 = ω2r dr. The ratio of the two effects isω2R cosλ/g, where R is the radius of the Earth and λ is the latitude. Tokyo is at 36 degreeslatitude, and plugging this in gives the claimed result.

Page 85, problem 3:

(a) Reinterpret figure j on p. 82 as a picture of a Sagnac ring interferometer. Let light waves1 and 2 move around the loop in opposite senses. Wave 1 takes time t1i to move inward alongthe crack, and time t1o to come back out. Wave 2 takes times t2i and t2o. But t1i = t2i (sincethe two world-lines are identical), and similarly t1o = t2o. Therefore creating the crack has noeffect on the interference between 1 and 2, and splitting the big loop into two smaller loopsmerely splits the total phase shift between them. (b) For a circular loop of radius r, the timeof flight of each wave is proportional to r, and in this time, each point on the circumferenceof the rotating interferometer travels a distance v(time) = (ωr)(time) ∝ r2. (c) The effect isproportional to area, and the area is zero. (d) The light clock in c has its two ends synchronizedaccording to the Einstein prescription, and the success of this synchronization verifies Einstein’sassumption of commutativity in this particular case. If we make a Sagnac interferometer in theshape of a triangle, then the Sagnac effect measures the failure of Einstein’s assumption that allthree corners can be synchronized with one another.

Page 85, problem 5:

Here is the program:

1 L1:matrix([cosh(h1),sinh(h1)],[sinh(h1),cosh(h1)]);

2 L2:matrix([cosh(h2),sinh(h2)],[sinh(h2),cosh(h2)]);

3 T:L1.L2;

4 taylor(taylor(T,h1,0,2),h2,0,2);

The diagonal components of the result are both 1 + η21/2 + η2

2/2 + η1η2 + . . . Everything afterthe 1 is nonclassical. The off-diagonal components are η1 + η2 + η1η

22/2 + η2η

21/2 + . . ., with the

third-order terms being nonclassical.

Solutions for Chapter 3

Problems 379

Page 380: General Relativity

Page 119, problem 1:

(a) As discussed in example 4 on page 95, a cylinder has local, intrinsic properties identicalto those of flat space. The cylindrical model therefore has the same properties L1-L5 as ourstandard model of Lorentzian space, provided that L1-L5 are taken as purely local statements.

(b) The cylindrical model does violate L3. In this model, the doubly-intersecting world-linesdescribed by property G will not occur if the world-lines are oriented exactly parallel to thecylinder. This picks out a preferred direction in space, violating L3 if L3 is interpreted globally.Frames moving parallel to the axis have different properties from frames moving perpendicularto the axis.

But just because this particular model violates the global interpretation of L3, that doesn’tmean that all models of G violate it. We could instead construct a model in which space wrapsaround in every direction. In the 2+1-dimensional case, we can visualize the spatial part of sucha model as the surface of a doughnut embedded in three-space, with the caveat that we don’twant to think of the doughnut hole’s circumference as being shorter than the doughnut’s outerradius. Giving up the idea of a visualizable model embedded in a higher-dimensional space,we can simply take a three-dimensional cube and identify its opposite faces. Does this modelviolate L3? It’s not quite as obvious, but actually it does. The spacelike great-circle geodesics ofthis model come in different circumferences, with the shortest being those parallel to the cube’saxes.

We can’t prove by constructing a finite number of models that every possible model of Gviolates L3. The two models we’ve found, however, can make us suspect that this is true, andcan give us insight into how to prove it. For any pair of world-lines that provide an example ofG, we can fix a coordinate system K in which the two particles started out at A by flying offback-to-back. In this coordinate system, we can measure the sum of the distances traversed bythe two particles from A to B. (If homegeneity, L1, holds, then they make equal contributionsto this sum.) The fact that the world-lines were traversed by material particles means that wecan, at least in principle, visit every point on them and measure the total distance using rigidrulers. We call this the circumference of the great circle AB, as measured in a particular frame.The set of all such circumferences has some greatest lower bound. If this bound is zero, thensuch geodesics can exist locally, and this would violate even the local interpretation of L1-L5.If the bound is nonzero, then let’s fix a circle that has this minimum circumference. Mark thespatial points this circle passes through, in the frame of reference defined above. This set ofpoints is a spacelike circle of minimum radius. Near a given point on the circle, the circle lookslike a perfectly straight axis, whose orientation is presumably random. Now let some observerK′ travel around this circle at a velocity v relative to K, measuring the circumference with aLorentz-contracted ruler. The circumference is greater than the minimal one measured by K.Therefore for any axis with a randomly chosen orientation, we have a preferred rest frame inwhich the corresponding great circle has minimum circumference. This violates L3. Thanks tophysicsforums user atyy for suggesting this argument.

More detailed discussions of these issues are given in Bansal et al., arxiv.org/abs/gr-qc/0503070v1, and Barrow and Levin, arxiv.org/abs/gr-qc/0101014v1.

Page 119, problem 2:

In these Cartesian coordinates, the metric is diagonal and has elements with opposite signs.Due to the SI units, it is not possible for the two nonzero elements of the metric to have thesame units. Let’s arbitrarily fix gtt = 1. Then we must have gxx = c−2. Using the metric to

380 Chapter 9 Gravitational Waves

Page 381: General Relativity

lower the index on dsa, we find dsa = (dst, c−2 dsx).

Page 120, problem 7:

According to the Einstein summation convention, the repeated index implies a sum, so theresult is a scalar. As shown in example 14 on p. 106, each term in the sum equals 1, so theresult is unitless and simply equals the number of dimensions.

Page 119, problem 3:

(a) The first two violate the rule that summation only occurs over up-down pairs of indices.The third expression would result in a quantity that couldn’t be classified as either contravariantor covariant. (b) In differential geometry, different elements of the same tensor can have differentunits. Since, as remarked in the problem, Uaa were to be interpreted as a sum, this meanadding things that had different units. In the expression pa − qa, even if we suppose thatp and q both represent the same type of physical quantity, e.g., force, their covariant andcontravariant versions would not necessarily have the same units unless we happened to beworking in coordinates such that the metric was unitless.

Page 119, problem 4:

Assuming the mountaineer uses radians and the metric system, the coordinates have units1, 1, and m (where 1 means a unitless quantity and m means meters — radians are not reallyunits). Therefore the units of an infinitesimal difference in coordinates dsa are also (1, 1, m).Because the coordinates are orthogonal, the metric is diagonal. If we want gab dsa dsb to haveunits of m2, then its diagonal elements must have units of (m2, m2, 1). The upper-index metricgab is the inverse of its lower-index version gab, so its units are (m−2, m−2, 1). Mechanical workhas units of N ·m, so given dW = Fa dsa, the units of Fa must be (N ·m, N ·m, N). Raising theindex on the force using gab gives (N/m, N/m, N).

Page 119, problem 5:

The only aspect of the geometrical representation that needs to be changed is that instead ofrepresenting an upper-index vector using a pair of parallel lines, we should use a pair of parallelplanes.

Page 120, problem 8:

The coordinate T would have a discontinuity of 2πωr2/(1− ω2r2). Reinserting factors of cto make it work out in SI units, we have 2πωr2c−2/(1− ω2r2c−2) ≈ 207 ns. The exact error inposition that would result is dependent on the geometry of the current position of the satellites,but it would be on the order of c∆T , which is ∼ 100 m. This is considerably worse than civilianGPS’s 20-meter error bars.

Page 120, problem 9:

The process that led from the Euclidean metric of example 7 on page 102 to the non-Euclidean one of equation [3] on page 111 was not just a series of coordinate transformations.At the final step, we got rid of the variable t, reducing the number of dimensions by one.Similarly, we could take a Euclidean three-dimensional space and eliminate all the points exceptfor the ones on the surface of the unit sphere; the geometry of the embedded sphere is non-Euclidean, because we’ve redefined geodesics to be lines that are “as straight as they can be”(i.e., have minimum length) while restricted to the sphere. In the example of the carousel, thefinal step effectively redefines geodesics so that they have minimal length as determined by achain of radar measurements.

Problems 381

Page 382: General Relativity

Page 120, problem 10:

(a) No. The track is straight in the lab frame, but curved in the rotating frame. Since thespatial metric in the rotating frame is symmetric with respect to clockwise and counterclockwise,the metric can never result in geodesics with a specific handedness. (b) The dθ′2 term of themetric blows up here. A geodesic connecting point A, at r = 1/ω, with point B, at r < 1/ω,must have minimum length. This requires that the geodesic be directly radial at A, so thatdθ′ = 0; for if not, then we could vary the curve slightly so as to reduce |dθ′|, and the resultingincrease in the dr2 term would be negligible compared to the decrease in the dθ′2 term. (c) No.As we found in part a, laser beams can’t be used to form geodesics.

Page 121, problem 11: A and B are equivalent under a Lorentz transformation, so the Penroseresult clearly includes B. The outline of the sphere is still spherical. C is also equivalent to Aand B, because there are only two effects (Lorentz contraction and optical aberration), and bothof them depend only on the observer’s instantaneous velocity, not on his history of motion. Dis not a well-defined question. When asking this question, we’re implicitly assuming that thesphere has some “real” shape, which appears different because the sphere has been set intomotion. But you can’t impart an angular acceleration to a perfectly rigid body in relativity.

Page 121, problem 12: Applying the de Broglie relations to the relativistic identity m2 =E2 − p2, we find the dispersion relation to be m2 = ω2 − k2. The group velocity is dω/dk =√

1− (m/ω)2. Applying the de Broglie relations to this, and associating the group velocitywith v, we have v =

√1− (m/E)2, which is equivalent to E = mγ. Since E = mγ has been

established, and m2 = E2− p2 was assumed, it follows immediately that p = mγv holds as well.All hell breaks loose if we try to associate v with the phase velocity, which is ω/k =

√1 + (m/k)2.

For example, the phase velocity is always greater than c(= 1) for m > 0.

Solutions for Chapter 4

Page 156, problem 1:

The four-velocity of a photon (or of any massless particle) is undefined. One way to see thisis that dτ = 0 for a massless particle, so vi = dxi/ dτ involves division by zero. Alternatively,pi = mvi would always give an energy and momentum of zero if vi were well defined, yet weknow that massless particles can have both energy and momentum.

Page 156, problem 2:

To avoid loss of precision in numerical operations like subtracting v from 1, it’s betterto derive an ultrarelativistic approximation. The velocity corresponding to a given γ is v =√

1− γ−2 ≈ 1 − 1/2γ2, so 1 − v ≈ 1/2γ2 = (m/E)2/2. Reinserting factors of c so as to makethe units come out right in the SI system, this becomes (mc2/E)2/2 = 9× 10−9.

Page 156, problem 8:

The time on the clock is given by s =∫

ds, where the integral is over the clock’s world-line.The quantity ds is our prototypical Lorentz scalar, so it’s frame-independent. An integral isjust a sum, and the tensor transformation laws are linear, so the integral of a Lorentz scalar isstill a Lorentz scalar. Therefore s is frame-independent. There is no requirement that we use aninertial frame. It would also work fine, for example, in a frame rotating with the earth. We don’teven need to have a frame of reference. All of the above applies equally well to any coordinatesystem at all, even one that doesn’t have any sensible interpretation as some observer’s frame

382 Chapter 9 Gravitational Waves

Page 383: General Relativity

of reference.

Page 156, problem 10:

Such a transformation would take an energy-momentum four-vector (E, p), with E > 0, toa different four-vector (E′, p′), with E′ < 0. That transformation would also have the effect oftransforming a timelike displacement vector from the future light cone to the past light cone. Butthe Lorentz transformations were specifically constructed so as to preserve causality (propertyL5 on p. 52), so this can’t happen.

Page 156, problem 11:

A spatial plane is determined by the light’s direction of propagation and the relative ve-locity of the source and observer, so the 3+1 case reduces without loss of generality to 2+1dimensions. The frequency four-vector must be lightlike, so its most general possible formis (f , f cos θ, f sin θ), where θ is interpreted as the angle between the direction of propaga-tion and the relative velocity. Putting this through a Lorentz boost along the x axis, we findf ′ = γf(1 +v cos θ), which agrees with Einstein’s equation on page 370, except for the arbitraryconvention involved in defining the sign of v.

Page 157, problem 12:

The exact result depends on how one assumes the charge is distributed, so this can’t be anymore than a rough estimate. The energy density is (1/8πk)E2 ∼ ke2/r4, so the total energyis an integral of the form

∫r−4 dV ∼

∫r−2 dr, which diverges like 1/r as the lower limit of

integration approaches zero. This tells us that most of the energy is at small values of r, so toa rough approximation we can just take the volume of integration to be r3 and multiply by afixed energy density of ke2/r4. This gives an energy of ∼ ke2/r. Setting this equal to mc2 andsolving for r, we find r ∼ ke2/mc2 ∼ 10−15 m.

Remark: Since experiments have shown that electrons do not have internal structure on thisscale, we conclude that quantum-mechanical effects must prevent the energy from blowing upas r → 0.

Page 157, problem 17:

Doing a transformation first by u and then by v results in E′′ = E−v×(u×E)+(u+v)×B.This is not of the same form, because if B = 0, we can have E′′ 6= E.

Solutions for Chapter 5

Page 200, problem 1:

The answer to this is a little subtle, since it depends on how we take the limit. Suppose wejoin two planes with a section of a cylinder having radius ρ, and let ρ go to zero. The Gaussiancurvature of a cylinder is zero, so in this limit we fail to reproduce the correct result. On theother hand, suppose we take a discus of radius ρ1 whose edge has a curve of radius ρ2. in thelimit ρ1 → +∞, ρ2 → 0+, we can get either K = 1/(ρ1ρ2)→ 0 or K → +∞, depending on howquickly ρ1 and ρ2 approach their limits.

Page 200, problem 6:

(a) Expanding in a Taylor series, they both have gtt = 1 + 2gz + . . .

(b) This property holds for [2] automatically because of the way it was constructed. In [1],

Problems 383

Page 384: General Relativity

the nonvanishing Christoffel symbols (ignoring permutations of the lower indices) are Γtzt = gand Γztt = ge2gz. We can apply the geodesic equation with the affine parameter taken to be theproper time, and this gives z = −ge2gz t2, where dots represent differentiation with respect toproper time. For a particle instantaneously at rest, t = 1/

√gtt = e−2gz, so z = −g.

(c) [2] was constructed by performing a change of coordinates on a flat-space metric, so itis flat. The Riemann tensor of [1] has Rtztz = −g2, so [1] isn’t flat. Therefore the two can’t bethe same under a change of coordinates.

(d) [2] is flat, so its curvature is constant. [1] has the property that under the transformationz → z + c, where c is a constant, the only change is a rescaling of the time coordinate; bycoordinate invariance, such a rescaling is unobservable.

Page 201, problem 7: (a) 0 ≤ x ≤ 1(b) 0 ≤ x < 1(c) x2 ≤ 2

Page 201, problem 8: The double cone fails to satisfy axiom M2, because the apex hasproperties that differ topologically from those of other points: deleting it chops the space intotwo disconnected pieces.

Page 201, problem 9: When we use a word like “torus,” there is some hidden ambiguity. Wecould mean something strange like the following. Suppose we construct the three-dimensionalspace of coordinates (x, y, z) in which all three coordinates are rational numbers. Then let atorus be the set of all such points lying at a distance of 1/2 from the nearest point on a unitcircle. This is in some sense a torus, but it doesn’t have the topological properties one usuallyassumes. For example, two continuous curves on its surface can cross without having a point ofintersection. We can’t get anywhere without assuming that the word “torus” refers to a surfacethat has the usual topological properties.

Now let’s prove that it’s a manifold using both definitions.

Using the topological definition, M1 is satisfied with n = 2, because every point on the surfacelives in a two-dimensional neighborhood. M2 holds because the only differences between pointsare those that are not topological, e.g., Gaussian curvature. M3 holds due to the interpretationoutlined in the first paragraph.

Alternatively, we can use the local-coordinate definition. We have already shown that acircle is a 1-manifold, which can be coordinatized in two patches by an angle φ. The torus cantherefore be coordinatized by a pair of such angles, (φ1,φ2), in four patches. Again we need toassume the interpretation given above, since otherwise real-number pairs like (φ1,φ2) wouldn’thave the same topology as points on the rational-number torus.

Page 201, problem 10: In the torus, we can construct a closed curve C that encircles thehole. If we have a homeomorphism, C must have an image C′ under that homeomorphismthat is a closed curve in the sphere. C′ can then be contracted continuously to a point, andsince the inverse of the homeomorphism is also continuous, it would be possible to contract Ccontinuously to a point. But this is impossible because C encircles the hole.

Page 201, problem 11: (a) The Christoffel symbols are (assuming I didn’t make a mistakein calculating them by hand) Γtxx = (1/2)ptp−1 and Γxxt = Γxtx = (1/2)pt−1. (b) After that, Iresorted to a computer algebra system (Maxima), which told me that, for example, the Riccitensor has Rtt = (p/2− p2/4)t−2.

384 Chapter 9 Gravitational Waves

Page 385: General Relativity

Solutions for Chapter 6

Page 240, problem 3: (a) In the center of mass frame, symmetry guarantees that the testparticle exits with a speed equal to the speed with which it entered, and the entry and exitvelocities are v and −v. Now let’s switch to the sun’s frame. This involves adding u to allvelocities, so the entry and exit velocities become v + u and −v + u. The difference in speed is2u.

(b) The derivation assumed that velocities add linearly when you change frames of reference,which is a nonrelativistic approximation. Relativistically, velocities combine not like u+ v butlike (u+ v)/(1 + uv). If you put in v = 1, the result for the combined velocity is always 1.

This is a funny case where we can get the answer to a gravitational problem purely throughspecial relativity. We might worry that the SR-based answer is wrong, because we really needGR for gravity. But we can get the same answer from GR, since GR says that a test particlealways follows a geodesic, and a lightlike geodesic always remains lightlike. The reason SRworked is that an observer could watch a patch of flat space far away from the black hole,observe a wave-packet of light passing through that patch on the way to the black hole, andthen observe it again on the way back out. Since the patch is flat, SR works.

Page 240, problem 4: (a) L = 0 by symmetry. The quantity E can be interpreted as theenergy per unit mass that is added to the entire system by inserting the test particle. Since thetest particle starts at rest and far away, the added energy is simply the mass of the particle,and E = 1.

(b) In the special case L = 0, E = 1, the general equation of motion for a test particle in aSchwarzschild spacetime becomes simply r2 = 2m/r. Separating variables and integrating, wehave r ∝ s2/3, where the constant of integration is chosen to be zero. This clearly shows thatwe move from any given r to r = 0 in a finite proper time s.

Page 240, problem 5: (a) For a displacement with dφ = 0, we have ds2 = gtt dt2, so gtt =√ds/dt =

√3 sin t cos t. For an azimuthal displacement, ds = y dφ, so gφφ =

√y = sin3/2 t.

(b) At places on the surface of revolution corresponding to the cusps of the astroid, one orboth of the lower-index elements of the metric go to zero, which means that the correspondingupper-index elements blow up. These are the sharp points of the surface at the x axis and thesharp edge at its waist. There are at least coordinate singularities there, but the question iswhether they are intrinsic. The only intrinsic measure of curvature in two dimensions is theGaussian curvature, which can be interpreted as (minus) the product of the curvatures alongthe two principal axes, here k1 = −(2/3) csc 2t and k2 = 1/y = sin−3 t. At the waist, bothfactors blow up, so the Gaussian curvature, which is intrinsic, blows up, and this is not just acoordinate singularity. The same thing happens at the tips. Interestingly, a geodesic that hitsone of these singularities can still be traced through in a continuous way and extended onwardsuch that its arc length remains finite. This property is called geodesic completeness.

Page 241, problem 6: (a) There are singularities at r = 0, where gθ′θ′ = 0, and r = 1/ω,where gtt = 0. These are considered singularities because the inverse of the metric blows up.They’re coordinate singularities, because they can be removed by a change of coordinates backto the original non-rotating frame.(b) This one has singularities in the same places. The one at r = 0 is a coordinate singularity,because at small r the ω dependence is negligible, and the metric is simply that of ordinaryplane polar coordinates in flat space. The one at r = 1/ω is not a coordinate singularity. The

Problems 385

Page 386: General Relativity

following Maxima code calculates its scalar curvature R = Raa, which is esentially just theGaussian curvature, since this is a two-dimensional space.

1 load(ctensor);

2 dim:2;

3 ct_coords:[r,theta];

4 lg:matrix([-1,0],

5 [0,-r^2/(1-w^2*r^2)]);

6 cmetric();

7 ricci(true);

8 scurvature();

The result is R = 6ω2/(1− 2ω2r2 + ω4r4). This blows up at r = 1/ω, which shows that this isnot a coordinate singularity. The fact that R does not blow up at r = 0 is consistent with ourearlier conclusion that r = 0 is a coordinate singularity, but would not have been sufficient toprove that conclusion.(c) The argument is incorrect. The Gaussian curvature is not just proportional to the angulardeficit ε, it is proportional to the limit of ε/A, where A is the area of the triangle. The area ofthe triangle can be small, so there is no upper bound on the ratio ε/A. Debunking the argumentrestores consistency with the answer to part b.

Page 241, problem 10: The only nonvanishing Christoffel symbol is Γttt = −1/2t. Theantisymmetric treatment of the indices in Rabcd = ∂cΓ

adb−∂dΓacb+ΓaceΓ

edb−ΓadeΓ

ecb guarantees

that the Riemann tensor must vanish when there is only one nonvanishing Christoffel symbol.

Page 241, problem 11: The first thing one notices is that the equation Rab = k isn’t writtenaccording to the usual rules of grammar for tensor equations. The left-hand side has two lowerindices, but the right-hand side has none. In the language of freshman physics, this is likesetting a vector equal to a scalar. Suppose we interpret it as meaning that each of R’s 16components should equal k in a vacuum. But this still isn’t satisfactory, because it violatescoordinate-independence. For example, suppose we are initially working with some coordinatesxµ, and we then rescale all four of them according to xµ

′= 2xµ. Then the components of Rab

all scale down by a factor of 4. But this would violate the proposed field equation.

Page 241, problem 12: The following Maxima code calculates the Ricci tensor for a metricwith gtt = h and grr = k.

1 load(ctensor);

2 dim:3;

3 ct_coords:[t,r,phi];

4 depends(h,r);

5 depends(k,r);

6 lg:matrix([h,0,0],

7 [0,-k,0],

8 [0,0,-r^2]);

9 cmetric();

10 ricci(true);

Inspecting the output (not reproduced here), we see that Rφφ = 0 requires k′/k = h′/h. Since

386 Chapter 9 Gravitational Waves

Page 387: General Relativity

the logarithmic derivatives of h and k are the same, the two functions can differ by at most aconstant factor c. So now we do a second iteration of the calculation:

1 load(ctensor);

2 dim:3;

3 ct_coords:[t,r,phi];

4 depends(h,r);

5 lg:matrix([h,0,0],

6 [0,-c*h,0],

7 [0,0,-r^2]);

8 cmetric();

9 ricci(true);

The result for Rrr is independent of c. Since h is essentially the gravitational potential, we havethe requirements h′ > 0 (because gravity is attractive) and h′′ < 0 (because gravity weakenswith distance). Therefore we find that Rrr is positive, and we do not obtain a vacuum solution.

Page 242, problem 13: This idea is not well defined because it implicitly assumes that wecan fix a global frame of reference. The notion of reversing velocity vectors (i.e., reversingthe spacelike components of 4-velocities) implies that there are some velocity vectors whosespacelike parts are zero, so that they aren’t changed by a flip. This amounts to choosing aframe of reference. To be able to do the flip globally, you’d have to have some sensible notionof a global frame of reference, but we don’t necessarily have that. (In a spacetime with closedtimelike curves, there is also the issue that we don’t have complete freedom to choose initialconditions on a spacelike surface, because these conditions might end up not being consistentwith themselves when evolved around a CTC.)

Solutions for Chapter 7

Page 265, problem 2: (a) If she makes herself stationary relative to the sun, she will stillexperience local geometrical changes because of the planets. (b) If it was to be impossiblefor her to prove the universe’s nonstationarity, then any world-line she picked would have toexperience constant local geometrical conditions. A counterexample is any world-line extendingback to the Big Bang, which is a singularity with drastically different conditions than any otherregion of spacetime. (c) To maintain a constant local geometry, she would have to “surf” thewave, but she can’t do that, because it propagates at the speed of light. (d) There are placeswhere the local mass-energy density is increasing, and the field equations link this to a changein the local geometry.

Page 265, problem 4:

Under these special conditions, the geodesic equations become r = Γrttt2, φ = 0, t = 0,

where the dots can in principle represent differentation with respect to any affine parameter welike, but we intend to use the proper time s. By symmetry, there will be no motion in the zdirection. The Christoffel symbol equals −(1/2)er(cos

√3r −

√3 sin

√3r). At a location where

the cosine equals 1, this is simply −er/2. For t, we have dt/ds = 1/√gtt = e−r/2. The result of

the calculation is simply r = −1/2, which is independent of r.

Page 265, problem 5:

The Petrov metric is one example. The metric has no singularities anywhere, so the r

Problems 387

Page 388: General Relativity

coordinate can be extended from −∞ to +∞, and there is no point that can be considered thecenter. The existence of a dφ dt term in the metric shows that it is not static.

A simpler example is a spacetime made by taking a flat Lorentzian space and making it wraparound topologically into a cylinder, as in problem 1 on p. 119. As discussed in the solutionto that problem, this spacetime has a preferred state of rest in the azimuthal direction. Ina frame that is moving azimuthally relative to this state of rest, the Lorentz transformationrequires that the phase of clocks be adjusted linearly as a function of the azimuthal coordinateφ. As described in section 3.5.4, this will cause a discontinuity once we wrap around by 2π, andtherefore clock synchronization fails, and this frame is not static.

Solutions for Chapter 8

Page 340, problem 2: No. General relativity only allows coordinate transformations that aresmooth and one-to-one (see p. 97). This transformation is not smooth at t = 0.

Page 340, problem 5: (a) The Friedmann equations are

a

a=

1

3Λ− 4π

3(ρ+ 3P )

and (a

a

)2

=1

3Λ +

3ρ− ka−2 .

The first equation is time-reversal invariant because the second derivative stays the same undertime reversal. The second equation is also time-reversal invariant, because although the firstderivative flips its sign under time reversal, it is squared.(b) We typically do not think of a singularity as being a point belonging to a manifold at all. Ifwe want to create this type of connected, symmetric back-to-back solution, then we need the BigBang singularity to be a point in the manifold. But this violates the definition of a manifold,because then the Big Bang point would have topological characteristics different from those ofother points: deleting it separates the spacetime into two pieces.

Page 340, problem 4: Example 16 on page 310, the cosmic girdle, showed that a rope thatstretches over cosmological distances does expand significantly, unlike Brooklyn, nuclei, andsolar systems. Since the Milne universe is nothing but a flat spacetime described in funnycoordinates, something about that argument must fail. The argument used in that examplerelied on the use of a closed cosmology, but the Milne universe is not closed. This is not acompletely satisfying resolution, however, because we expect that a rope in an open universewill also expand, except in the special case of the Milne universe.

In a nontrivial open universe, every galaxy is accelerating relative to every other galaxy.By the equivalence principle, these accelerations can also be seen as gravitational fields, andtidal forces are what stretch the rope. In the special case of the Milne universe, there is noacceleration of test particles relative to other test particles, so the rope doesn’t stretch.

Example 18 on page 312, the cosmic whip, resulted in the conclusion that the velocity ofthe rope-end passing by cannot be interpreted as a measure of the velocity of the distant galaxyto which the rope’s other end is hitched, which makes sense because cosmological solutions arenonstationary, so there is no uniquely defined notion of the relative velocity of distant objects.

388 Chapter 9 Gravitational Waves

Page 389: General Relativity

The Milne universe, however, is stationary, so such velocities are well defined. The key here isthat nothing is accelerating, so the time delays in the propagation of information do not lead toambiguities in extrapolating to a distant object’s velocity “now.”

The Milne case also avoids the paradox in which we could imagine that if the rope is suf-ficiently long, its end would be moving at more than the speed of light. Although there is nolimit to the length of a rope in the Milne universe (there being no tidal forces), the Hubblelaw cannot be extrapolated arbitrarily, since the expanding cloud of test particles has an edge,beyond which there is only vacuum.

Page 340, problem 6: The cosmological constant is a scalar, so it doesn’t change underreflection. The metric is also invariant under reflection of any coordinate. This follows becausewe have assumed that the coordinates are locally Lorentzian, so that the metric is diagonal.It can therefore be written as a line element in which the differentials are all squared. Thisestablishes that the Λgab is invariant under any spatial or temporal reflection.

The specialized form of the energy-momentum tensor diag(−ρ,P ,P ,P ) is also clearly in-variant under any reflection, since both pressure and mass-energy density are scalars.

The form of the tensor transformation law for a rank-2 tensor guarantees that the diagonalelements of such a tensor stay the same under a reflection. The off-diagonal elements will flipsign, but since only the G and T terms in the field equation have off-diagonal terms, the fieldequations remain valid under reflection.

In summary, the Einstein field equations retain the same form under reflection in any co-ordinate. This important symmetry property, which is part of the Poincare group in specialrelativity, is retained when we make the transition to general relativity. It’s a discrete sym-metry, so it wasn’t guaranteed to exist simply because of general covariance, which relates tocontinuous coordinate transformations.

Page 340, problem 7: (a) The Einstein field equations are Gab = 8πTab + Λgab. That meansthat in a vacuum, where T = 0, a cosmological constant is equivalent to ρ = (1/8π)Λ andP = −(1/8π)Λ. This gives ρ+ 3P = (1/8π)(−2Λ), which violates the SEC for Λ > 0, since partof the SEC is ρ+ 3P ≥ 0.(a) Since our universe appears to have a positive cosmological constant, and the paper byHawking and Ellis assumes the strong energy condition, doubts are raised about the conclusionof the paper as applied to our universe. However, the theorem is being applied to the earlyuniverse, which was not a vacuum. Both P and ρ were large and positive in the early, radiation-dominated universe, and therefore the SEC was not violated.

Page 341, problem 8:

(a) The Ricci tensor is Rtt = g2e2gz, Rzz = −g2. The scalar curvature is 2g2, which isconstant, as expected.

(b) Both Gtt and Gzz vanish by a straightforward computation.

(c) The Einstein tensor is Gtt = 0, Gxx = Gyy = g2, Gzz = 0. It is unphysical because it hasa zero mass-energy density, but a nonvanishing pressure.

Page 341, problem 9:

This proposal is an ingenious attempt to propose a concrete method for getting around thefact that in relativity, there is no unique way of defining the relative velocities of objects thatlie at cosmological distances from one another.

Problems 389

Page 390: General Relativity

Because the Milne universe is a flat spacetime, there is nothing to prevent us from layingout a chain of arbitrary length. The chain will not, for example, be subject to the kind of tidalforces that would inevitably break a chain that was lowered through the event horizon of ablack hole. But this only guarantees us that we can have a chain of a certain length as measuredin the chain’s frame. An observer at rest with respect to the chain describes all the links ofthe chain as existing simultaneously at a certain set of locations. But this is a description in(T ,R) coordinates. To an observer who prefers the FRW coordinates, the links do not existsimultaneously at these locations. This observer says that the supposed locations of distantpoints on the chain occurred far in the past, and suspects that the chain has broken since then.

The paradox can also be resolved from the point of view of the (T ,R) coordinates. Thechain is long enough that its end hangs out beyond the edges of the expanding cloud of galaxies.Since there are no galaxies beyond the edge, so there are no galaxies near the end of the chainwith respect to which the chain could be moving at > c.

Page 341, problem 10: Frames are local, not global. One of the things we have to specify inorder to define a frame of reference is a state of motion. To define the volume of the observableuniverse, there end up being three spots in the definition at which we might need to pick a stateof motion. I’ve labeled these 1-2-3 below.

Observer O is in some state of motion [1] at event A. O’s past light-cone intersects thesurface of last scattering (or some other surface where some other physically well-defined thinghappens) in a spacelike two-surface S. S does not depend on O’s state of motion. At everyevent P on S, we define a state of motion [2] that is at rest relative to the Hubble flow, andwe construct a world-line that starts out in this state of motion and extends forward in timeinertially. One of these world-lines intersects O’s world-line at A. Let the proper time intervalalong this world-line be t. We extend all the other world-lines from all the other P by the sameinterval of proper time t. The end-points of all these world-lines constitute a spacelike 2-surfaceB that we can define as the boundary of the observable universe according to O. Let R be the3-surface contained inside B. In order to define R, we need to define some notion of simultaneity,which depends on one’s state of motion [3]. If we like, we can pick this state of motion to beone at rest with respect to the Hubble flow. Given this choice, we can define the volume V ofR (e.g., by chopping R up into pieces and measuring those pieces using rulers that are in thisstate of motion).

State of motion 1 had absolutely no effect on V , but states of motion 2 and 3 did. If O isnot at rest relative to the Hubble flow at A, then 2 and 3 do not match O’s state of motion atA. This probably means that O will object that V is not the answer in his frame but in someoneelse’s. However, there is no clear way to satisfy O by modifying the above definition. We can’tjust say that 2 and 3 should be chosen to be the same as O’s state of motion at A, becauseframes are local things, so matching them to O’s motion at A isn’t the same as matching themat points far from A. In a cosmological solution there is no well-defined notion of whether ornot two cosmologically distant objects are at rest relative to one another.

In particular, it is not meaningful to try to calculate a reduced value of V based on Lorentzcontraction for O’s velocity relative to the Hubble flow. Lorentz contractions can’t be appliedto a curved spacetime.

390 Chapter 9 Gravitational Waves

Page 391: General Relativity

Page 341, problem 11: The Friedmann equations reduce to

a

a= −4π

3(1 + 3w)ρ(

a

a

)2

=8π

3ρ .

Eliminating ρ, we finda

a2= −β ,

where β = (1 + 3w)/2. For a solution of the form a ∝ tδ, calculation of the derivatives results inδ = 1/(1 +β) = (2/3)/(1 +w). For dust, δ = 2/3, which checks out against the result on p. 320.For radiation, δ = 1/2. For a cosmological constant, w = −1 gives δ =∞, so the solution has adifferent form.

Page 341, problem 12: The integral is exactly the same as the one in example 22 on p. 321for the dust case, except that the exponent 2/3 is generalized to δ = (2/3)/(1 + w), as shownin the solution to problem 11. The result is L/t = 1/(1 − δ) = (w + 1)/(w + 1/3). In theradiation-dominated case, we have L/t = 2.

Solutions for Chapter 9

Page 356, problem 1: (a) The members of the Hulse-Taylor system are spiraling toward oneanother as they lose energy to gravitational radiation. If one of them were replaced with alow-mass test particle, there would be negligible radiation, and the motion would no longer bea spiral. This is similar to the issues encountered on pp. 39ff because the neutron stars in theHulse-Taylor system suffer a back-reaction from their own gravitational radiation.

(b) If this occurred, then the particle’s world-line would be displaced in space relative to ageodesic of the spacetime that would have existed without the presence of the particle. Whatwould determine the direction of that displacement? It can’t be determined by properties ofthis preexisting, ambient spacetime, because the Riemann tensor is that spacetime’s only local,intrinsic, observable property. At a fixed point in spacetime, the Riemann tensor is even underspatial reflection, so there’s no way it can distinguish a certain direction in space from theopposite direction.

What else could determine this mysterious displacement? By assumption, it’s not deter-mined by a preexisting, ambient electromagnetic field. If the particle had charge, the directioncould be one imposed by the back-reaction from the electromagnetic radiation it had emittedin the past. If the particle had a lot of mass, then we could have something similar with gravi-tational radiation, or some other nonlinear interaction of the particle’s gravitational field withthe ambient field. But these nonlinear or back-reaction effects are proportional to q2 and m2,so they vanish when q = 0 and m→ 0.

The only remaining possibility is that the result violates the symmetry of space expressed byL1 on p. 52; the Lorentzian geometry is the result of L1-L5, so violating L1 should be considereda violation of Lorentz invariance.

Problems 391

Page 392: General Relativity

Photo CreditsCover Galactic center: NASA, ESA, SSC, CXC, and STScI 15 Atomic clock on plane:Copyright 1971, Associated press, used under U.S. fair use exception to copyright law. 17Gravity Probe A: I believe this diagram to be public domain, due to its age and the improba-bility of its copyright having been renewed. 20 Stephen Hawking: unknown NASA photog-rapher, 1999, public-domain product of NASA. 22 Eotvos: Unknown source. Since Eotvosdied in 1919, the painting itself would be public domain if done from life. Under U.S. law,this makes photographic reproductions of the painting public domain. 25 Earth: NASA,Apollo 17. Public domain. 25 Orion: Wikipedia user Mouser, GFDL. 25 M100: EuropeanSouthern Observatory, CC-BY-SA. 25 Supercluster: Wikipedia user Azcolvin429, CC-BY-SA.25 Artificial horizon: NASA, public domain. 26 Upsidasium: Copyright Jay Ward Produc-tions, used under U.S. fair use exception to copyright law.. 36 Pound and Rebka photo:Harvard University. I presume this photo to be in the public domain, since it is unlikely tohave had its copyright renewed. 41 Lorentz: Jan Veth (1864-1925), public domain. 61Muon storage ring at CERN: (c) 1974 by CERN; used here under the U.S. fair use doctrine.65 Galaxies: Hubble Space Telescope. Hubble material is copyright-free and may be freelyused as in the public domain without fee, on the condition that NASA and ESA is creditedas the source of the material. The material was created for NASA by STScI under ContractNAS5-26555 and for ESA by the Hubble European Space Agency Information Centre. 69Gamma-Ray burst: NASA/Swift/Mary Pat Hrybyk-Keith and John Jones. 85 Graph fromIijima paper: Used here under the U.S. fair use doctrine. 91 Levi-Civita: Believed to be publicdomain. Source: http://www-history.mcs.st-and.ac.uk/PictDisplay/Levi-Civita.html.74 Ring laser gyroscope: Wikimedia commons user Nockson, CC-BY-SA licensed. 94 Ein-stein’s ring: I have lost the information about the source of the bitmapped image. I wouldbe grateful to anyone who could put me in touch with the copyright owners. 48 Map ofisotherms: J. Hanns, 1910, public domain. 49 Human arm: Gray’s Anatomy, 1918, publicdomain. 118 SU Aurigae’s field lines: P. Petit, GFDL 1.2. 115 Galaxies: Hubble SpaceTelescope. Hubble material is copyright-free and may be freely used as in the public domainwithout fee, on the condition that NASA and ESA is credited as the source of the material.The material was created for NASA by STScI under Contract NAS5-26555 and for ESA bythe Hubble European Space Agency Information Centre. 144 Chandrasekhar: Universityof Chicago. I believe the use of this photo in this book falls under the fair use exception tocopyright in the U.S. 149 Relativistic jet: Biretta et al., NASA/ESA, public domain. 159Rocks: Siim Sepp, CC-BY-SA 3.0. 160 Jupiter and comet: Hubble Space Telescope, NASA,public domain. 161 Earth: NASA, Apollo 17. Public domain. 161 Moon: Luc Vi-atour, CC-BY-SA 3.0. 162 Heliotrope: ca. 1878, public domain. 162 Triangulationsurvey: Otto Lueger, 1904, public domain. 166 Triangle in a space with negative curva-ture: Wikipedia user Kieff, public domain. 172 Eclipse: Eddington’s original 1919 photo,public domain. 185 Torsion pendulum: University of Washington Eot-Wash group, http://www.npl.washington.edu/eotwash/publications/pdf/lowfrontier2.pdf. 193 Aster-oids: I believe the use of this photo in this book falls under the fair use exception to copyrightin the U.S. 193 Coffee cup to doughnut: Wikipedia user Kieff, public domain. 203 Coin:Kurt Wirth, public-domain product of the Swiss government. 206 Bill Unruh: Wikipedia userChildrenofthedragon, public domain. 228 Accretion disk: Public-domain product of NASAand ESA. 243 Wilhelm Killing: I believe this to be public domain the US, since Killing died inearly 1923.. 243 Surface of revolution: Shaded rendering by Oleg Alexandrov, public domain.273 Cavendish experiment: Based on a public-domain drawing by Wikimedia commons user

392 Chapter 9 Gravitational Waves

Page 393: General Relativity

Chris Burks. 274 Simplified diagram of Kreuzer experiment: Based on a public-domain draw-ing by Wikimedia commons user Chris Burks. 274 Kreuzer experiment: The diagram of theapparatus is redrawn from the paper, and the two graphs are taken directly from the paper. Ibelieve the use of these images in this book falls under the fair use exception to copyright in theU.S.. 276 Apollo 11 mirror: NASA, public domain. 284 Magnetic fipole: based on a fig-ure by Wikimedia Commons user Geek3, CC-BY-SA licensed. 296 Penzias-Wilson antenna:NASA, public domain. 302 Friedmann: Public domain. 303 Lemaıtre: Ca. 1933, publicdomain.. 326 Cosmic microwave background image: NASA/WMAP Science Team, public do-main. 334 Dicke’s apparatus: Dicke, 1967. Used under the US fair-use doctrine. 346 LIGOand LISA sensitivities: NASA, public domain. 345 Graph of pulsar’s period: Weisberg andTaylor, http://arxiv.org/abs/astro-ph/0211217.

Problems 393

Page 394: General Relativity

Indexaberration, 121absolute geometry, 19abstract index notation, 51, 102

equivalent to birdtracks, 51acceleration four-vector, 126action, 63Adams, W.S., 16ADM mass, 348affine geometry, 43affine parameter, 44Aharonov-Bohm effect, 191angular defect, 163angular momentum, 153, 155antigravity, 26, 289antisymmetrization, 102Aristotelian logic, 68Ashtekar formulation of relativity, 238asymptotically flat, 254atlas, 198atomic clocks, 15, 74

background independence, 117baryon acoustic oscillations, 328Bell, John, 66

spaceship paradox, 66, 201Big Bang, 304Big Crunch, 95Big Rip, 323, 324birdtracks, 47birdtracks notation, 47

covariant derivative, 178equivalent to abstract index notation, 51metric omitted in, 106rank-2 tensor, 105

Birkhoff’s theorem, 256, 259, 331black body spectrum, 206black hole, 225

definition, 258event horizon, 226formation, 147, 227Newtonian, 65no-hair theorem, 256observational evidence, 227radiation from, 232

singularity, 225black string, 235, 258Bohr model, 82Bondi, Hermann, 335boost, 53Brans-Dicke theory, 27, 33, 330, 344Brown-Bethe scenario, 146BTZ black hole, 235

cadabra, 188, 190Cartan, 186

curved-spacetime theory of Newtonian grav-ity, 41, 116, 161

Casimir effect, 295causal diagram, 250center of mass-energy, 270, 286Cerenkov radiation, 143Chandrasekhar limit, 144charge inversion, 107chart, 198Chiao’s paradox, 30, 33Christoffel symbol, 176chronology protection conjecture, 20, 263clock “postulate”, 84, 124cloning of particles, 205closed cosmology, 300closed set, 201closed timelike curves, 13, 20, 263

violate no-cloning theorem, 205comoving cosmological coordinates, 299completeness

geodesic, 385Compton scattering, 157conformal cosmological coordinates, 299conformal geometry, 309congruence, 290conical singularity, 231connection, 90conservation laws, 148

from Killing vectors, 248continuous function, 194contravariant vector, 100

summarized, 401coordinate independence, 97

Page 395: General Relativity

coordinate singularity, 225

coordinate transformation, 97

correspondence principle, 33, 35, 57, 208

cosmic censorship, 229, 230

cosmic microwave background, 314

discovery of, 296

isotropy of, 297

cosmic rays, 16

cosmological constant, 64, 293

no variation of, 294

observation, 326

cosmological coordinates

comoving, 299

conformal, 299

standard, 299

covariant derivative, 173, 174

in electromagnetism, 173, 174

in relativity, 174

covariant vector, 100

summarized, 401

ctensor, 209

curvature, 159

Gaussian, 164

in two spacelike dimensions, 162

intrinsic versus extrinsic, 95

Kretchmann invariant, 225

none in one dimension, 162

of spacetime, 87

Ricci, 161

Ricci scalar, 225

Riemann tensor, 168

scalar, 225

sectional, 161

tensors, 168

tidal versus local sources, 160

Cvitanovic, Predrag, 47

Cygnus X-1, 227

dark energy, 24

dark matter, 329

de Sitter, Willem, 82, 299

deflection of light, 171, 222

degeneracy, 236

Dehn twist, 97

derivative

covariant, 173, 174

in electromagnetism, 173

in relativity, 174

deuteriumevidence for finite age of universe, 295test of cosmological models, 304

diffeomorphism, 97Dirac sea, 130, 156dominant energy condition, 282Doppler shift, 133dual vector, 47

summarized, 401dust, 132

Eotvos experiments, 22Eddington, 171Ehrenfest’s paradox, 109Einstein field equation, 269, 293Einstein summation convention, 51Einstein synchronization, 255, 359Einstein tensor, 269Einstein-Cartan theory, 186electromagnetic fields

transformation properties of, 157electromagnetic potential four-vector, 137, 141,

157electromagnetic tensor, 141electron capture, 145elliptic geometry, 92energy, see also conservation laws

of gravitational fields, 276energy conditions, 281

violated by cosmological constant, 294equiconsistency, 92equivalence principle

accelerations and fields equivalent, 23application to charged particles, 30, 33no preferred field, 142not mathematically well defined, 30, 282spacetime locally Lorentzian, 28

Erlangen program, 108ether, 70event horizon, 203expansion scalar, 290extra dimensions, 234extrinsic quantity, 95

Fermat’s principle, 137field equation, Einstein, 269fine structure constant, 82foliation, 290force

Index 395

Page 396: General Relativity

is a dual vector, 48

four-vector, 124

acceleration, 126

momentum, 126

velocity, 124

frame dragging, 151, 192, 262

frame of reference

inertial, 24

ambiguity in definition, 29

frequency vector, 50, 133

Friedmann equations, 301

Friedmann-Robertson-Walker cosmology, 302

Godel metric, 297

Godel’s theorem, 92

Godel, Kurt, 92

gauge transformation, 118, 173

Gaussian curvature, 164

Gaussian normal coordinates, 164

Gell-Mann, Murray, 130

general covariance, 97

general relativity

defined, 29

geodesic, 21

as world-line of a test particle, 21, 356

differential equation for, 178

stationary action, 63

geodesic completeness, 385

geodesic equation, 178

geodesic incompleteness, 231

geodetic effect, 170, 214

geometrized units, 210

geometry

elliptic, 92

hyperbolic, 166

spherical, 93

Gold, Thomas, 335

Goudsmit, 82

GPS

frames of reference used in, 112

timing signals, 141

gravitational constant, 210

gravitational field

uniform, 200, 260, 341

gravitational mass, 21, 273

active, 273

passive, 273

gravitational potential, see potential

gravitational red-shift, see red-shiftgravitational shielding, 289gravitational waves

empirical evidence for, 344energy content, 346propagation at c, 343propagation at less than c, for high ampli-

tudes, 343rate of radiation, 352transverse nature, 349

Gravity Probe A, 17Gravity Probe B, 74, 142

frame dragging, 151geodetic effect calculated, 214geodetic effect estimated, 170

group, 107group velocity, 51

Hafele-Keating experiment, 15, 74Hawking radiation, 232Hawking, Stephen, 20hole argument, 114homeomorphism, 194Hoyle, Fred, 297, 335Hubble constant, 291, 302, 320Hubble flow, 321Hubble, Edwin, 296Hulse, R.A., 222Hulse-Taylor pulsar, 222, 354hyperbolic geometry, 166

index gymnastics notation, 104indices

raising and lowering, 104inertial frame, see frame, inertialinertial mass, 21, 273information paradox, 205, 230inner product, 107intrinsic quantity, 95isometry, 107Ives-Stilwell experiments, 134

Jacobian matrix, 152

Kasner metric, 261Killing equation, 245Killing vector, 243

orbit, 243Kretchmann invariant, 225, 253, 265

396 Index

Page 397: General Relativity

Kreuzer experiment, 273

large extra dimensions, 234Lemaıtre, Georges, 304length contraction, 56Lense-Thirring effect, 151, 262Levi-Civita symbol, 153, 184Levi-Civita, Tullio, 91, 116, 153light

deflection by sun, 171, 222light clock, 84light cone, 64lightlike, 64logic

Aristotelian, 68loop quantum gravity, 69Lorentz boost, 53lowering an index, 104lune, 94

Mossbauer effect, 35Mach’s principle, 116, 267, 330manifold, 193

differentiable, 198geodesically complete, 385smooth, 198

massactive gravitational, 273ADM, 348defined, 128gravitational, 21, 273inertial, 21, 273passive gravitational, 273

mass-energy, 128ADM, 348

Maxima, 76, 209Mercury, orbit of, 212, 217metric, 99

none in Galilean spacetime, 100Michelson-Morley experiment, 70Milne universe, 305Minkowski, 41model

mathematical, 92momentum four-vector, 126muon, 16

neighborhood, 194neutrino, 130

neutron star, 145, 222no-cloning theorem, 205no-hair theorems, 256normal coordinates, 164null energy condition, 282null infinity, 258

observable universe, 321size and age, 321

open cosmology, 300open set, 194optical effects, 121orbit

Killing vector, 243orientability, 152orientable

in time, 213orthogonality, 125

parallel postulate, 18parallel transport, 89, 90parity, 107Pasch, Moritz, 19patch, 197Penrose

graphical notation for tensors, 47Penrose diagram, 250Penrose, Roger, 121, 230Penrose-Hawking singularity theorems, 290, 305Penzias, Arno, 296Petrov classification, 349Petrov metric, 262, 265phase velocity, 51photon

mass, 131Pioneer anomaly, 311Planck mass, 186Planck scale, 185Playfair’s axiom, 18Poincare group, 107, 389polarization

of gravitational waves, 349of light, 129

potential, 32relativistic vs. Newtonian, 259

Pound-Rebka experiment, 16, 34Poynting vector, 285principal group, 108prior geometry, 117

Index 397

Page 398: General Relativity

projective geometry, 97proper distance, 299proper time, 123pulsar, 145, 222

raising an idex, 104rank of a tensor, 101rapidity, 66red-shift

cosmologicalkinematic versus gravitational, 260, 312

gravitational, 16, 34Ricci curvature, 161

defined, 169Ricci scalar, 225Riemann curvature tensor, 168Riemann tensor

defined, 168rigid-body rotation, 109Rindler coordinates, 200ring laser, 74Robinson

Abraham, 94Robinson, Abraham, 88rotating frame of reference, 108, 261rotation

rigid, 109

Sagittarius A*, 228, 257Sagnac effect, 110, 255

defined, 74in GPS, 60proportional to area, 85

scalardefined, 46

scalar curvature, 225Schwarzschild metric, 213

in d dimensions, 234Schwarzschild, Karl, 207shielding

gravitational, 289signature

change of, 236defined as a list of signs, 208defined as an integer, 236

singularity, 20, 229conical, 231coordinate, 225formal definition, 231

naked, 232

timelike, 232

singularity theorems, 290

Sirius B, 16

spacelike, 64

spaceship paradox, 66, 201

special relativity

defined, 29

spherical geometry, 93

spherical symmetry, 249

spontaneous symmetry breaking, 321

standard cosmological coordinates, 299

static spacetime, 256

stationary, 253

asymptotically, 254

stationary action, 63

steady-state cosmology, 297, 335

stress-energy tensor, 162, 269

divergence-free, 268

interpretation of, 272

of an electromagnetic wave, 283

symmetry of, 272

string theory, 185

strong energy condition, 282

surface of last scattering, 296

Susskind, Leonard, 234

Sylvester’s law of inertia, 236

symmetrization, 102

symmetry

spherical, 249

symmetry breaking

spontaneous, 321

synchronization

Einstein convention, 255, 359

tachyon, 158

tangent space, see tangent vector

tangent vector, 88, 198, 244

Tarski, Alfred, 92

Taylor, J.H., 222

tensor, 101, 139

antisymmetric, 102

Penrose graphical notation, 47

rank, 101

symmetric, 102

transformation law, 139

tensor density, 152

tensor transformation laws, 138

398 Index

Page 399: General Relativity

Terrell, James, 121Thomas precession, 73, 171, 215Thomas, Llewellyn, 83time dilation

gravitational, 15, 33nonuniform field, 60

kinematic, 15, 56time reversal, 107

of the Schwarzschild metric, 213symmetry of general relativity, 213

time-orientable, 213timelike, 64Tolman-Oppenheimer-Volkoff limit, 146topology, 193torsion, 181

tensor, 183trace energy condition, 282transformation laws, 138transition map, 197transverse polarization

of gravitational waves, 349of light, 129

trapped surface, 290triangle inequality, 107Type III solution, 349Type N solution, 349

Uhlenbeck, 82uniform gravitational field, 200, 260, 341unitarity, 205, 230units

geometrized, 210universe

observable, 321size and age, 321

upsidasium, 26

vectordefined, 46dual, 47Penrose graphical notation, 47summarized, 401

vectors and dual vectors, 100summarized, 401

velocity addition, 66velocity four-vector, 124velocity vector, 124volume

spacetime, 155

volume expansion, 290

Waage, Harold, 26wavenumber, 133waves

gravitational, see gravitational wavesweak energy condition, 282weight of a tensor density, 152Wheeler, John, 26white dwarf, 144Wilson, Robert, 296world-line, 21

Index 399

Page 400: General Relativity

Euclidean geometry (page 18):

E1 Two points determine a line.

E2 Line segments can be extended.

E3 A unique circle can be constructed given any point as its center and any line segment asits radius.

E4 All right angles are equal to one another.

E5 Parallel postulate: Given a line and a point not on the line, exactly one line can be drawnthrough the point and parallel to the given line.53

Ordered geometry (page 19):

O1 Two events determine a line.

O2 Line segments can be extended: given A and B, there is at least one event such that [ABC]is true.

O3 Lines don’t wrap around: if [ABC] is true, then [BCA] is false.

O4 Betweenness: For any three distinct events A, B, and C lying on the same line, we candetermine whether or not B is between A and C (and by statement 3, this ordering isunique except for a possible over-all reversal to form [CBA]).

Affine geometry (page 43):In addition to O1-O4, postulate the following axioms:

A1 Constructibility of parallelograms: Given any P, Q, and R, there exists S such that [PQRS],and if P, Q, and R are distinct then S is unique.

A2 Symmetric treatment of the sides of a parallelogram: If [PQRS], then [QRSP], [QPSR],and [PRQS].

A3 Lines parallel to the same line are parallel to one another: If [ABCD] and [ABEF], then[CDEF].

Experimentally motivated statements about Lorentzian geometry (page 400):

L1 Spacetime is homogeneous and isotropic. No point has special properties that make itdistinguishable from other points, nor is one direction distinguishable from another.

L2 Inertial frames of reference exist. These are frames in which particles move at constantvelocity if not subject to any forces. We can construct such a frame by using a particularparticle, which is not subject to any forces, as a reference point.

53This is a form known as Playfair’s axiom, rather than the version of the postulate originally given by Euclid.

400 Index

Page 401: General Relativity

L3 Equivalence of inertial frames: If a frame is in constant-velocity translational motionrelative to an inertial frame, then it is also an inertial frame. No experiment can distinguishone inertial frame from another.

L4 Causality: There exist events 1 and 2 such that t1 < t2 in all frames.

L5 Relativity of time: There exist events 1 and 2 and frames of reference (t,x) and (t′,x′)such that t1 < t2, but t′1 > t′2.

Statements of the equivalence principle:

Accelerations and gravitational fields are equivalent. There is no experiment that candistinguish one from the other (page 23).

It is always possible to define a local Lorentz frame in a particular neighborhood of space-time (page 28).

There is no way to associate a preferred tensor field with spacetime (page 142).

Vectors

Coordinates cannot in general be added on a manifold, so they don’t form a vector space,but infinitesimal coordinate differences can and do. The vector space in which the coordinatedifferences exist is a different space at every point, referred to as the tangent space at that point(see p. 244).

Vectors are written in abstract index notation with upper indices, xa, and are representedby column vectors, arrows, or birdtracks with incoming arrows, → x.

Dual vectors, also known as covectors or 1-forms, are written in abstract index notation withlower indices, xa, and are represented by row vectors, ordered pairs of parallel lines (see p. 48),or birdtracks with outgoing arrows, ← x.

In concrete-index notation, the xµ are a list of numbers, referred to as the vector’s con-travariant components, while xµ would be the covariant components of a dual vector.

Fundamentally the distinction between the two types of vectors is defined by the tensortransformation laws, p. 138. For example, an odometer reading is contravariant because con-verting it from kilometers to meters increases it. A temperature gradient is covariant becauseconverting it from degrees/km to degrees/m decreases it.

In the absence of a metric, every physical quantity has a definite vector or dual vectorcharacter. Infinitesimal coordinate differences dxa and velocities dxa/ dτ are vectors, whilemomentum pa and force Fa are dual (see p. 141). Many ordinary and interesting real-worldsystems lack a metric (see p. 49). When a metric is present, we can raise and lower indices atwill. There is a perfect duality symmetry between the two types of vectors, but this symmetryis broken by the convention that a measurement with a ruler is a ∆xa, not a ∆xa.

For consistency with the transformation laws, differentiation with respect to a quantity flipsthe index, e.g., ∂µ = ∂/∂xµ. The operators ∂µ are often used as basis vectors for the tangentplane. In general, expressing vectors in a basis using the Einstein notation convention resultsin an ugly notational clash described on p. 247.

Index 401