
SIAM REVIEW © 2006 Society for Industrial and Applied Mathematics, Vol. 48, No. 2, pp. 393–436

Book Reviews

Edited by Robert E. O’Malley, Jr.

Featured Review: Textbooks on Linear Algebra

Practical Linear Algebra: A Geometric Toolbox. By Gerald Farin and Dianne Hansford. A. K. Peters, Wellesley, MA, 2004. $67.00. xvi+384 pp., hardcover. ISBN 1-56881-234-5.

Transform Linear Algebra. By Frank Uhlig. Prentice-Hall, Upper Saddle River, NJ, 2001. $120.00. xx+503 pp., hardcover. ISBN 0-13-041535-9.

Applied Linear Algebra. By Peter J. Olver and Chehrzad Shakiban. Prentice-Hall, Upper Saddle River, NJ, 2005. $120.00. xxii+714 pp., hardcover. ISBN 0-13-147382-4.

The three textbooks reviewed here pursue contrasting approaches to the teaching of linear algebra. Practical Linear Algebra (PLA), by Farin and Hansford, uses geometry to motivate every single concept. As the authors write in the preface, their objective is to “present the material in an intuitive, geometric manner that will lead to retention of the ideas and methods.” Indeed the book presents many fine geometric insights. To give just one example, the geometric interpretation of Cramer’s rule (2 × 2 case) on page 90 is quite pleasing; each of the determinants is interpreted as the area of a certain parallelogram. I had not seen this before.
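For reference, here is the 2 × 2 Cramer’s rule in its standard form (a sketch of the usual statement, not Farin and Hansford’s exact presentation). Each determinant below is, up to sign, the area of the parallelogram spanned by its two columns, which is presumably the interpretation the book develops. To solve $Ax = b$:

$$x_1 = \frac{\det\begin{pmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{pmatrix}}{\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}}, \qquad x_2 = \frac{\det\begin{pmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{pmatrix}}{\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}}.$$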

I am all in favor of geometry. I think it should be used liberally in the explanation of linear algebra concepts, but I also think that PLA carries the idea too far. For example, on pages 91–92 a shear map is used to introduce Gaussian elimination (2 × 2 case). Here (and in many other places in the book) the insistence on explaining everything geometrically has led to an unnecessarily complicated development. It would surely be better to give the easy algebraic explanation first. The connection with shear maps can be made later, if desired. Linear algebra is, after all, algebra. But I would also object to a presentation that was purely algebraic. A healthy mix of geometry and algebra is preferable to a steady diet of one or the other.

The style of PLA is deliberately informal. Again, from the preface, “we replaced mathematical proofs with motivations, examples, or graphics ....” Unfortunately the informality just looks like sloppiness to me. For example, the following passage appears on page 95:

the solution of the system is $u = S_3S_2S_1b$. This leads to the definition of the inverse matrix $A^{-1}$ of a matrix $A$:

$$A^{-1} = S_3S_2S_1.$$

This is the definition of the inverse? For many other examples of this sort of style, open the book at random and start reading.
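(For context: the fact being gestured at is a consequence of the usual definition rather than the definition itself. If $S_1$, $S_2$, $S_3$ are the matrices of the elimination steps, then

$$S_3S_2S_1A = I \quad\Longrightarrow\quad A^{-1} = S_3S_2S_1,$$

because the inverse, characterized by $A^{-1}A = AA^{-1} = I$, is unique when it exists. This reconstruction is mine, not the book’s.)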

Publishers are invited to send books for review to Book Reviews Editor, SIAM, 3600 University City Science Center, Philadelphia, PA 19104-2688.


At some point students need to learn precision of thought. This is true not just for math majors but for science and engineering majors as well. A linear algebra course is a good opportunity to showcase a few careful, logically correct arguments in simple contexts. It is also an opportunity to introduce a modest amount of abstraction. PLA offers none of this. In fact, the style of the book discourages precision of thought.

This might make little difference for weak students; it is the stronger students who are really being shortchanged here. We should not ignore this important constituency, for these are the people who will make the difference in engineering firms and scientific laboratories in the future.

In the first nine chapters of PLA (with the exception of section 8.7) all of the geometry is two-dimensional. Chapters 10–13 recapitulate everything in three dimensions, adding new three-dimensional concepts (e.g., cross product) as well. Not until Chapter 14 (p. 241) does a 4 × 4 matrix appear. Few instructors will feel that they have time for such a leisurely development.

In summary, I cannot recommend PLA as the primary text for a linear algebra course. However, I do recommend it to teachers of linear algebra as a source of geometric insights.

Uhlig’s book, Transform Linear Algebra (TLA), takes a completely different approach. As the title suggests, the central concept of the book is linear transformation. Indeed, the very first chapter is titled “Linear Transformations,” and the reader is plunged immediately into a discussion of linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. The presentation is quite abstract, but examples are given.

I’ll begin by mentioning a few things I liked about TLA. First, it presents two introductions to eigenvalues, one that uses determinants and one that ignores them. It is good to let the students know that we can get along fine without determinants here; indeed the standard numerical methods for solving eigenvalue problems make no use of them. I also liked the applications in section 7.3, in which linear algebra is used to compute an integral and to solve a simple differential equation. Of course these are things that one probably would not cover in a first course, but they could be used in a second course. The notation

$$\begin{pmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{pmatrix},$$

which is used to denote the matrix whose columns are $a_1, \ldots, a_n$, is worth emulating. It is very clear and suggestive.
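To illustrate the point that standard numerical eigenvalue methods avoid determinants entirely, here is a minimal power-iteration sketch (my illustration, not code from TLA); it estimates a dominant eigenpair using only matrix–vector products.

    import numpy as np

    def power_iteration(A, iters=500, seed=0):
        # Estimate the dominant eigenpair of A using only
        # matrix-vector products: no determinants, no
        # characteristic polynomial.
        rng = np.random.default_rng(seed)
        v = rng.standard_normal(A.shape[0])
        for _ in range(iters):
            w = A @ v
            v = w / np.linalg.norm(w)
        # The Rayleigh quotient of the unit vector v estimates
        # the dominant eigenvalue.
        return v @ A @ v, v

    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    lam, v = power_iteration(A)
    print(lam)  # approximately 3.618, the larger eigenvalue of A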

Although I could point out other pleasing features, my overall impression of TLA was negative. Its greatest shortcoming is that it fails to provide motivation. Applications are discussed, but they are always at the end of the chapter. At least some of them should have been placed earlier to help the student understand what the subject is about. Section 1.2, “Tasks and Methods of Linear Algebra,” begins as follows: “We study linear algebra so that we can understand linear transformations $f : \mathbb{R}^n \to \mathbb{R}^m$. Studying linear transformations is the same as trying to understand matrices $A_{mn}$ ....” Nothing is said about what these objects might be good for. This opening is followed by a list of essential tasks of linear algebra, which includes, for example, the following entries: “To find images and preimages of and for linear transformations ...,” and “Where do the images $y = Ax$ come from? Which $x$ map to a given $y$ under $A$?” No student is going to see the point of this, much less find it interesting.


Again and again I was unhappy with the order of presentation of material. On page 24 the dot product is used before it is defined. Then on the next page, “we rewrite the dot product

$$a \cdot x = a_1x_1 + \cdots + a_nx_n$$

as

$$(a_1, \ldots, a_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$

i.e., as the more convenient product ....” This is before matrix multiplication has been discussed or matrices even mentioned.

Chapter 2, “Row-Echelon Form Reduction,” begins by writing down an $m \times (n+1)$ augmented matrix and then saying that it can be interpreted as a system of $m$ linear equations. Isn’t that putting the cart before the horse? I would be inclined to write down a system first, then represent it compactly as an augmented matrix. Better yet, I’d start with some small application that gives us a reason for wanting to solve a system of equations, then I’d represent it as an augmented matrix and solve it. Then I’d write down a general system and discuss it. TLA does have some simple illustrative applications, but they are at the end of Chapter 3. Perhaps Chapters 2 and 3 should be swapped.

Let us now move on to Applied Linear Algebra (ALA) by Olver and Shakiban. In the preface the authors tell us that their inspiration came from the books [1, 2] of Gilbert Strang. In the authors’ words, “our goal has been to present a more linearly ordered and less ambitious development of the subject, while retaining the excitement and interconnectedness of theory and applications that is evident in Strang’s works.”

The authors do not recommend ALA for a first course, except for classes of especially well motivated and mathematically mature students. However, the book starts from scratch and presents even the most elementary material, although briefly. For a second course, the elementary material can be skipped over or reviewed quickly.

The authors insist that abstraction is important. The linear algebra that is useful for the study of algebraic equations is also useful for differential equations. Abstraction facilitates understanding of the underlying unity of the subject. Thus abstraction is pursued because it is useful. Vector spaces are introduced in Chapter 2, and theorems about basis and dimension are proved. Students are urged to keep the concrete examples in mind. Not wanting to overwhelm students with abstraction, ALA delays the introduction of linear transformations until Chapter 7. Kernel and range are introduced in Chapter 2, but only in relation to matrices. Determinants are de-emphasized.

Although applications are paramount, the authors ask for the students’ patience as they work through five chapters of theory and computational techniques. The first five chapters contain few applications indeed. However, the authors do make an effort to tell the readers where the concepts will be useful later. Moreover, many of the computational techniques are introduced by beginning with a simplified version. For example, in section 4.2, in which multivariate quadratic functions are minimized by completing the square, the authors begin by reminding the readers how to minimize a quadratic polynomial in one variable by that technique.
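The one-variable reminder is presumably the familiar computation (my reconstruction, not a quotation from ALA): for $a > 0$,

$$p(x) = ax^2 + 2bx + c = a\left(x + \frac{b}{a}\right)^2 + c - \frac{b^2}{a},$$

so the minimum value $c - b^2/a$ is attained at $x^\ast = -b/a$. The multivariate version replaces the scalar $a$ by a positive definite matrix $K$: the quadratic function $p(x) = x^T K x - 2x^T f + c$ is minimized at $x^\ast = K^{-1}f$.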

In Chapter 6, “Equilibrium,” we finally have an entire chapter of applications. Equilibrium of mass-spring systems, electrical circuits, and trusses is discussed.


Chapter 8 on eigenvalues is followed by a chapter of dynamical examples. Chapter 10, “Iteration of Linear Systems,” gives an overview of iterative methods for solving linear systems and eigenvalue problems. Markov processes are discussed briefly, and so are the conjugate gradient method and the QR algorithm. Chapter 11, the final chapter, is titled “Boundary Value Problems in One Dimension.” Among the topics covered, again briefly, are generalized functions, Green’s functions, beams and splines, and the finite element method.

Any instructor who is looking for a text for a second course in linear algebra should take a look at ALA.

REFERENCES

[1] G. Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, Wellesley, MA, 1986.

[2] G. Strang, Linear Algebra and Its Applications, 3rd ed., Brooks/Cole, Pacific Grove, CA, 1988.

DAVID S. WATKINS

Washington State University

The Mathematics of Internet Congestion Control. By R. Srikant. Birkhäuser Boston, Boston, MA, 2004. $54.95. xii+164 pp., hardcover. ISBN 0-8176-3227-1.

We can say without exaggeration that during the last decade the Internet transformed the scientific world, the economy, and even the way people interact in the society. The Internet can be broadly defined as a system of interconnected computer networks that provides means for the distribution and exchange of information and services. At present, most data is transferred over the Internet in a reliable manner; that is, even the loss of a single bit is not tolerated. To ensure such reliability, Transmission Control Protocol (TCP) was developed in the mid-1970s. The main idea behind TCP design is the concept of a sliding congestion window that defines an amount of data which the sender is allowed to inject into the network without receiving an acknowledgment from the receiver. Initially, the size of the congestion window was determined only by the processing capacity of the receiver. However, in the mid-1980s, the transmission capacity of links became a bottleneck rather than the processing capacity of end hosts. Together with the rapid growth in the number of Internet users, the limitation of transmission capacity resulted in a phenomenon known as congestion collapse. To overcome the congestion collapse phenomenon, Van Jacobson suggested adding a congestion control algorithm to TCP [6]. The congestion control algorithm of [6] detects congestion in the network by data packet losses. Upon detecting congestion, the source reduces the sending rate; otherwise, it increases the sending rate. In particular, during its main phase, the congestion control algorithm of [6] uses the so-called Additive Increase Multiplicative Decrease (AIMD) scheme. Namely, in the absence of congestion, the size of the sliding window increases additively, and upon the detection of congestion the size of the sliding window decreases by a factor. Chiu and Jain [4] with the help of a simple mathematical model showed that the AIMD scheme provides convergence to fairness and efficiency when several TCP flows compete for available bandwidth. Chiu and Jain [4] only considered a single bottleneck link network. In this single link model, efficiency is naturally defined as the full utilization of the link capacity and fairness is defined as a situation when competing TCP flows share equally the link in the steady state. The concepts of fairness and efficiency become much less obvious in the case of a network with complex topology.

The investigation of fairness, efficiency, and dynamics of TCP led to the development of a new fast-growing research domain of mathematical models for congestion control.
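As a concrete illustration of the AIMD rule just described, here is a minimal simulation sketch (my own, with an assumed link capacity and a synchronized-loss rule; it is not code from the book): each window grows by one unit per round trip while the shared link has spare capacity and is halved when the link is congested.

    # AIMD sketch: two flows share a link of capacity C.
    # Additive increase while uncongested; multiplicative
    # decrease (halving) when the link is overloaded.
    C = 100.0
    w = [10.0, 80.0]              # deliberately unequal initial windows
    for t in range(200):
        if sum(w) > C:            # congestion signal (synchronized losses)
            w = [x / 2.0 for x in w]
        else:
            w = [x + 1.0 for x in w]
    print(w)  # the two windows end up nearly equal

This is the single-bottleneck setting of Chiu and Jain: the additive phase preserves the difference between the windows and each halving cuts it in half, so the flows approach an equal share of the link.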

Page 5: SIAM to review

BOOK REVIEWS 397

The Mathematics of Internet Congestion Control by R. Srikant provides a valuable comprehensive introduction to this new exciting research area. As the title suggests, the book demonstrates the application of different mathematical tools to understanding and designing congestion control algorithms. The tools come from the following three major mathematical disciplines: convex optimization, control theory, and probability. Using mathematical models, Srikant successfully explains the principal ideas behind Internet congestion control. The publication of the book is very timely. Currently, there are no other books on mathematical models for congestion control.

The book starts with a presentation of the Chiu and Jain model [4] in Chapter 1. Indeed, the presentation of this model is a perfect way to introduce and explain the notions of fairness and efficiency in data networks.

Chapters 2–6 constitute the core of the book. In Chapter 2 the congestion control problem is presented as a utility function–based resource allocation problem. With a natural choice of a concave nondecreasing utility function for the sending rate, the resource allocation problem becomes an instance of convex (concave, in fact) optimization. The utility function–based approach allows one to define fairness for general topology networks. Then, in Chapter 3 it is shown that congestion control algorithms can be viewed as gradient-type optimization methods. Fortunately, these algorithms appear to have a decentralized form, which is a very important property for control of large complex networks. More fine details about the relation between the mathematical models of Chapters 2 and 3 and real TCP implementations are presented in Chapter 4.
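In symbols, the utility-based formulation is presumably the now-standard one (my paraphrase, not a quotation from the book): each source $r$ sends at rate $x_r$, and the network solves

$$\max_{x \ge 0} \sum_r U_r(x_r) \qquad \text{subject to} \qquad \sum_{r \,:\, r \text{ uses link } l} x_r \le c_l \quad \text{for every link } l,$$

where each $U_r$ is concave and nondecreasing, so this is a concave maximization over a convex set; different choices of $U_r$ encode different notions of fairness.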

In Chapters 5 and 6, the stability of congestion control algorithms in the presence of feedback delays is investigated. Again, the decentralized form of the stability conditions is emphasized. In Chapters 5 and 6 only local stability is considered. Then, in Chapter 7 some advances in global stability of congestion control algorithms are presented.

It is a pity that Chapters 8 and 9 are light. However, the author declares in the preface: “I have chosen to focus in this book on those approaches which are rooted in the view that congestion control is a mechanism for resource allocation in a network.” In Chapter 8 it is shown that when a large number of TCP flows compete for available bandwidth of the bottleneck link, the evolution of the average sending rate can be described by a deterministic equation derived in Chapter 3. However, there are many other sources of randomness in the real Internet: random arrival and departure of TCP flows, transmission losses on optical and wireless links, and asynchronous updates among sources, just to name a few. There is a large number of publications dedicated to the analysis of different stochastic aspects of congestion control. For example, articles [1] and [3] have a good review of the results obtained in this area. In Chapters 1–8, the study of congestion control is carried out under the assumption that the sources have an infinite amount of data to send. Of course, in reality TCP flows stay in the Internet for finite durations. Furthermore, many flows in the Internet are short and they typically do not even have enough time to respond to congestion notifications. Even though a large proportion of the load is created by large flows, the effect of small flows on the Internet performance is not negligible. After all, the most popular Internet application, the World Wide Web hypertext, produces mostly short TCP flows. It seems that the processor sharing model [7] is quite appropriate for the analysis of the congestion control dynamics at connection level. An interested reader can find more material on connection-level models in, e.g., [2, 5]. In particular, in [5] there is a suggestion of giving priority to short connections. A discussion on differentiation between short and long TCP flows would definitely enrich section 9.2.

The final chapter, Chapter 10, discusses the problem of coexistence in the Internet of elastic and performance-guaranteed users. Performance-guaranteed users typically require some minimal guaranteed level of quality of service. The problem studied is very important due to the steady increase in the number of Internet telephone users.

All material in the book is accessible to graduate or even undergraduate students of mathematical and electrical engineering disciplines.


Many chapters are supplied with appendices which review the necessary background material and make the book self-contained. In conclusion, this book can be equally recommended for beginners as well as for expert researchers working in the domain of congestion control. For beginners, the book will be a good starting point from which to explore a vast and rapidly growing body of literature on the subject of Internet congestion control. For expert researchers, the book will definitely help to place congestion control in perspective and will point to new avenues in this exciting research domain.

REFERENCES

[1] C. Barakat, TCP/IP modeling and validation, IEEE Network, May/June 2001, pp. 38–47.

[2] S. Ben Fredj, T. Bonald, A. Proutiere, G. Regnie, and J. W. Roberts, Statistical bandwidth sharing: A study of congestion at flow level, in Proceedings of ACM SIGCOMM 2001, pp. 111–122.

[3] A. Budhiraja, F. Hernandez-Campos, V. G. Kulkarni, and F. D. Smith, Stochastic differential equation for TCP window size: Analysis and experimental validation, Probab. Engrg. Inform. Sci., 18 (2004), pp. 111–140.

[4] D.-M. Chiu and R. Jain, Analysis of the increase and decrease algorithms for congestion avoidance in computer networks, Comput. Networks ISDN Systems, 17 (1989), pp. 1–14.

[5] L. Guo and I. Matta, The war between mice and elephants, in Proceedings of the 9th IEEE International Conference on Network Protocols (ICNP), 2001, pp. 180–188.

[6] V. Jacobson, Congestion avoidance and control, in Proceedings of ACM SIGCOMM 1988, pp. 314–329.

[7] L. Kleinrock, Queueing Systems: Volume II: Computer Applications, John Wiley, New York, 1976.

K. AVRACHENKOV

INRIA Sophia Antipolis

Introduction to Operator Space Theory. By Gilles Pisier. Cambridge University Press, Cambridge, UK, 2003. $65.00. viii+478 pp., softcover. ISBN 0-521-81165-1.

When asked by SIAM to review Gilles Pisier’s Introduction to Operator Space Theory I was stumped. After all, I am not an expert in the subject—far from it. In fact, though I had heard a number of lectures on operator spaces, that old bugaboo, inertia (aka laziness), always held me back from studying the subject in earnest. After reflecting on the proposal, I realized that here was my chance to try learning about operator spaces; what’s more, I had lectures scheduled in August/September at the University of Pretoria to an audience knowledgeable in C∗-algebras and Banach spaces. I was in the perfect position to dig in and learn!

There was a huge (and unanticipated) drawback: no longer young, I was to find learning new things more difficult and frustrating. Nevertheless, after several false starts I was on my way. I added a number of the references in Pisier’s Introduction to my browsing material and dutifully bought the (fine) books of Effros and Ruan [1] and Paulsen [2].

Once started I was soon hooked. Already, in Chapter 1, Pisier let me know that the rules of the game were different. This shouldn’t have been a complete surprise; after all, it is a different game than I was used to trying to play. In Banach space theory one is interested in isomorphic or isometric classification. What about in operator space theory?

Well, here we had better get some definitions in hand to try to make some mathematical sense. An operator space, “simply put,” is any closed linear subspace of a space $B(H)$ of all bounded linear operators on some Hilbert space $H$. Now every Banach space is an operator space, but when we talk about operator spaces we understand that operator spaces are Banach spaces that carry with them added algebraic character inherited from the ambient $B(H)$. If $X$ is the operator space in question and $X$ sits inside $B(H)$ as a closed linear subspace, then for each $n \in \mathbb{N}$ we can view $M_n(X)$, the space of $n \times n$ matrices with entries in $X$, as sitting inside $M_n(B(H))$—which we identify as $B(H^n)$—and herein lies the true tale of operator spaces. If $Y$ is another operator space sitting inside $B(K)$, say, and $u : X \to Y$ is a linear operator, $u$ is said to be completely bounded if for each $n$, the linear map $u_n : M_n(X) \to M_n(Y)$ induced by $u$,

$$u_n((x_{i,j})) = (u(x_{i,j})),$$

is bounded with $\sup_n \|u_n\| < \infty$. Two operator spaces $X$ and $Y$ are completely isomorphic if there is a linear isomorphism $u : X \to Y$ of $X$ onto $Y$ such that both $u$ and $u^{-1}$ are completely bounded.
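(Equivalently, in a standard reformulation added here for reference rather than quoted from the book, one packages the condition into the completely bounded norm

$$\|u\|_{cb} = \sup_n \|u_n : M_n(X) \to M_n(Y)\|,$$

and $u$ is completely bounded precisely when this supremum is finite.)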

Back to Pisier’s book. With these definitions in hand, Pisier quickly gets the ball rolling. He states the fundamental factorization/extension theorem (without proof) and uses it. He indicates some basic, frequently used computations and presents these in enough detail to raise the active reader’s interest. He also indicates that in operator spaces, Hilbert spaces (even of the same dimension) can be completely different (pun intended, if you please). In fact, an early exercise deals with “row Hilbert space” $R$ and “column Hilbert space” $C$, two spaces that are isometrically isomorphic to $\ell_2$. Each lives inside $B(\ell_2)$, which is viewed as consisting of infinite matrices. If $E_{ij}$ denotes the infinite ($\mathbb{N} \times \mathbb{N}$) matrix with a 1 in the $i$th row and $j$th column and zero elsewhere, then $R$ is the closed linear span of $\{E_{1j} : j \in \mathbb{N}\}$ and $C$ is the closed linear span of $\{E_{i1} : i \in \mathbb{N}\}$. If $u : C \to R$ is a completely bounded linear map, then $u$ is Hilbert–Schmidt; hence $C$ and $R$ are not completely isomorphic.

Having been raised in classical functional analysis, it was intriguing, to say the least, to see Hilbert spaces of the same dimension viewed as being so different. Pisier’s approach caught my attention. This is a new game and its rules worth learning. There was an immediate concern—how to recognize how to embed the Banach space into the right $B(H)$ in the right way for it to be “useful.” But nagging me in the back of my mind was the memory of the same result of Ruan that I’d seen years ago. Well, Pisier wastes no time, and in the second chapter he states Ruan’s representation theorem and reformulates it for future (and frequent) use. He again skips proving it but gives one a sense of immediacy by applying it to several basic operations. He defines quotient operator spaces, he develops operator space duality, and he demonstrates how Ruan’s representation theorem is to be used in other important and nontrivial ways.

Here I must interject what I believe is proper warning about “reading Pisier”: he is always “fair,” he never spoon feeds, he’s always interesting, and he never wastes time (his or the reader’s). Bring to any reading the four P’s: pencil, paper, patience, and persistence. You will be rewarded by deeper and fuller understanding the old-fashioned way—you’ll have earned it.

Pisier’s rendition of operator spaces moves along lickety split. It reads well even if it requires (at least) a second and third reading. He has weighed his words. Much is done in a short period/space. When he spends a chapter on the “Operator Hilbert Space,” you sense it’s important. He follows that chapter with one on Haagerup’s tensor product of operator spaces, a truly magical construct. The initial chapters accustom the reader to working through material that is at one and the same time fresh, interesting, and challenging.

Keep in mind that the subject of operator spaces is still in its relative infancy; that so much has been done is testimony to the energy and talent of a solid corps of dedicated mathematicians. Its novelty is verified when Pisier spends a whole chapter on examples of operator spaces, examples that Pisier believes “are the best candidates to appear on a list of classical operator spaces.” What a rare opportunity, to engage a flourishing subject so close to its beginnings that its “classical” examples are still being sorted out!

At this stage I’ve given the merest hint of the first half of this text. The second half develops many themes close to the mathematical heart of the author, including two chapters on an operator space “take” on similarity problems. In the second half one is struck by the variety of tools that are brought to bear on operator space problems. There seems to be something for everyone.

Is the investment worthwhile for you? Before deciding, you might want to test the waters. Once again the industry, energy, and talent of practicing operator spacers comes to the fore. David Blecher has given a course for undergraduates on “Noncommutative Functional Analysis.” Try it. If you like it, find it useful, or if it further pricks your curiosity, then Pisier’s Introduction to Operator Space Theory is a fine long-term investment.


Backed by the books of Effros and Ruan and Paulsen, you’ll have lots to keep you off the streets at night and out of trouble.

REFERENCES

[1] E. G. Effros and Z.-J. Ruan, Operator Spaces, London Math. Soc. Monogr. (N.S.) 23, Clarendon Press, Oxford University Press, New York, 2000.

[2] V. Paulsen, Completely Bounded Maps and Operator Algebras, Cambridge Stud. Adv. Math. 78, Cambridge University Press, Cambridge, UK, 2002.

JOSEPH DIESTEL

Kent State University

Principles of Constraint Programming. By Krzysztof R. Apt. Cambridge University Press, Cambridge, UK, 2003. $35.00. xii+407 pp., hardcover. ISBN 0-521-82583-0.

Constraint Processing. By Rina Dechter. Morgan-Kaufmann, San Francisco, CA, 2003. $65.95. xx+481 pp., hardcover. ISBN 1-55860-890-7.

Of the most recent disciplines to arise in computer science, one is devoted to modeling with constraints and to solving the resulting constraint satisfaction problems. Some authors call it “constraint programming,” which I will avoid to prevent confusion with “programming” as used in, say, “linear programming” or “object-oriented programming.” Other authors prefer “constraint processing,” which I will avoid in favor of the more specific constraint satisfaction.

Constraint satisfaction started as part of artificial intelligence. In 1993 annual workshops started, which became the CP (for “Constraint Processing”) series of conferences. Since 1996 Kluwer has published a journal named Constraints exclusively devoted to the subject. The blurb for the journal lists as application domains artificial intelligence, discrete mathematics, neural networks, operations research, design and configuration, graphics, visualization and interfaces, hardware verification and software engineering, molecular biology, scheduling, planning, and resource allocation. Though it sounds like a blurb, it is a justified claim.

Early books on the subject [4, 5] were focused on constraint satisfaction as a basis for a programming language. Tsang’s [10] is an exception: it is a pioneering precursor to the books under review. It is fitting that they were published shortly after [10] went out of print.

Considering how constraint satisfaction arose in artificial intelligence, which is the conventionally accepted misnomer of what is, in effect, the interdisciplinary and experimental branch of computing, one might consider it to be marginal to the interests of SIAM Review. As I hope to convince readers of this review, this is marginality of the beneficial kind. Accordingly, I include some background, showing how constraint programming originated in numerical computation, diverged from its origins, and seems to be settling down in a way that makes its relevance to optimization easy to overlook.

Logically, optimality and feasibility are equally important in constrained optimization. But to get anything done, one needs to concentrate on one or the other. In optimization, optimality is put first. Constraints are incorporated in the objective function, whether by Lagrange multipliers or by the use of penalty functions.

The opposite point of view regards the constraints as primary. As an exercise to learn to see the world from the point of view of constraint satisfaction rather than of optimization, let us consider an underdetermined system of equations. If the system were linear and algebraic, then it is clear how to obtain a neat characterization of the set of solutions. In the presence of transcendental functions no such characterization is available; one should be content with a single solution. Which? If we select this single solution by a preference function, then we have arrived at a constrained optimization problem with the preference function as objective function. And, of course, even with an underdetermined system of linear algebraic equations one may prefer to select a single solution by means of a preference function. If this function is linear, then we have arrived at a linear programming problem.
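In symbols (my rendering of this thought experiment): selecting one solution of an underdetermined system $g(x) = 0$ by a preference function $p$ is the constrained problem

$$\min_x \; p(x) \qquad \text{subject to} \qquad g(x) = 0,$$

and when $g(x) = Ax - b$ with fewer equations than unknowns and $p(x) = c^T x$ (together with, say, sign constraints $x \ge 0$), this is exactly a linear program.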


The two books under review do not seem to be connected to the numerical world of optimization. To make such a connection, one has to consider a constraint satisfaction setting where the variables do not necessarily range over the reals and where the relations in the constraints are not necessarily equality or inequality. Typically, variables range over small finite sets; often variables are boolean. The relations are often not binary. For example, in the modeling of digital circuits, gates can be modeled by relations between the boolean values on the terminals. For the inverter this relation is binary, but that is an exception. Dechter’s book is completely combinatorial, emphasizing search and propagation techniques. Though Apt does not exclude real-valued variables and has a section on arithmetic constraints on reals, this book is also far from the world of optimization.

Yet the history of constraint satisfaction is rooted in the classical notion of constraint. The oldest scientific use of the term “constraint” is in the sense of “constrained motion.” Such motion presents a difficulty in Newtonian dynamics, which is satisfactory for the analysis of dynamical systems such as that of the Moon orbiting Earth. New techniques were needed to describe the motion of systems like a pendulum. If the bob were to follow the force of gravity only, it would go straight down. However, its motion is constrained by the distance to the hinge being constant.

The new technique, analytical dynamics, appeared a century after Newton’s Principia. The principle of d’Alembert, which was first stated in its full generality by Lagrange, recognizes two kinds of forces: “impressed forces” and “constraint forces.” The principle uses the fact that the latter do no work under virtual displacements. As pointed out by Lagrange, the principle provides a much-needed simplification, which enables easy description of the accelerations of the constrained system.

Though the books under review are far removed also from the world of analytical dynamics, it seems likely that the word “constraint” in the sense of constraint satisfaction was borrowed from analytical dynamics by the engineers at MIT who authored the theses that can be regarded as the origin of constraint satisfaction. The first of these was Ivan Sutherland. In his 1963 thesis [9] he defined “constraint” as

A specific storage representation [in computer memory] of a relationship between variables which limits the freedom of the variables, i.e., reduces the number of degrees of freedom of the system.

The subject of Sutherland’s thesis was Sketchpad, an interactive drawing program. It allows the user to create, replicate, and modify geometric objects. The relations between these objects give rise to sizable systems of equations. Their provenance caused Sutherland to think of them as constraints in the sense of analytical dynamics.

The two salient features of the system of equations generated by the program were (a) that many equations were not linear, and (b) the system was sparse. Sketchpad employed two methods for solving it, both exploiting the sparseness of the equations. The first method used a graph representing dependencies among the variables. In favorable cases, variables were found whose values did not depend on those of any other variables. Their values were then “propagated” through the graph. For the propagation algorithm Sutherland acknowledged inspiration provided by Moore’s shortest path algorithm [6].

What to do if the system of equations does not unravel in this way? It is unlikely that Newton’s method would have worked. Instead, Sutherland applied to his nonlinear systems the Gauss–Seidel method that had been popularized in engineering by R. V. Southwell under the term “relaxation” [8]. After all, linearity is not essential for relaxation to work: it is just one way that enables one to isolate one variable in each equation and to ensure that each of the variables is isolated in at least one of the equations.

In typical applications of relaxation, the system of equations is sparse. But the typical system is linear, and the usual way of dealing with sparseness is to order the equations and the variables in such a way that the coefficient matrix gets an advantageous block or band structure. In the case of nonlinear systems, as in Sutherland’s case, the matrix would be the Jacobian.
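A minimal sketch of the relaxation idea just described (my illustration with a made-up two-equation system; it is not Sketchpad’s code): each equation is solved for “its” variable in turn, with the others held fixed, and the sweeps repeat until the values stop changing.

    import math

    # Nonlinear Gauss-Seidel ("relaxation"): each equation is
    # solved for one variable with the others held fixed,
    # sweeping until the updates become negligible.
    # The system (invented for illustration) is
    #   x = 0.5*cos(y),   y = 0.5*sin(x) + 0.25,
    # whose right-hand sides are contractions, so the sweep converges.
    def solve(tol=1e-12, max_sweeps=100):
        x, y = 0.0, 0.0                           # initial guess
        for _ in range(max_sweeps):
            x_new = 0.5 * math.cos(y)             # equation 1, isolated for x
            y_new = 0.5 * math.sin(x_new) + 0.25  # equation 2, isolated for y
            if abs(x_new - x) + abs(y_new - y) < tol:
                return x_new, y_new
            x, y = x_new, y_new
        return x, y

    print(solve())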


The other way of dealing with sparseness is to consider what could be called the “dependency network,” where the nodes are variables and where nodes are linked if the corresponding variables occur in the same equation. Instead of first determining the Jacobian and attempting to transform it into a favorable block- or band-structured form, one can avoid both steps and let the solving process automatically determine what its structure is: note which of the variables have changed sufficiently in the previous step and pick for the next equation one of the few that contains at least one of these variables. In this way, change in variable values propagates by means of constraints. Refinements of the idea, under the name of “constraint propagation,” have become an important part of research in constraint satisfaction. It is data-directed control of the relaxation algorithm.

To conclude that Sutherland used constraint propagation, one has to read between the lines. I think it likely that he did. If so, he is the first that I know of. Whoever was the first started a rich new research area that is central to constraint satisfaction. Thus we see in Sutherland’s 1963 thesis, which was focused on the development of a practical drawing tool, the emergence of two themes that recurred throughout the new discipline of constraint satisfaction: constraint propagation and relaxation.

However, the setting of Sketchpad—real variables and equality as only constraint relation—is not representative of what constraint programming came to be. It is typical for variables to range over finite sets, often very small ones, like the two truth values or the colors of a graph-coloring problem. It is also typical for the constraint relations to be taken from a large assortment that includes some recently invented ones, such as the seven relations between time intervals [1]. An important relation on small finite sets is the disequality relation (a name chosen to distinguish it from ≤ and ≥ among numbers). Especially important is the version of the disequality relation that takes an arbitrary number of arguments.

An important step away from numerical constraint satisfaction problems was taken in the 1972 MIT thesis by David Waltz [15, 14]. It was concerned with computer recognition of three-dimensional polyhedral bodies in two-dimensional images. This requires the lines in the image to be labeled according to whether they arise from a shadow, from one body obscuring another, or from the intersection of two faces of the same body, among other possibilities.

In Waltz’s thesis, variables range over small, application-specific finite sets. In addition, his relations are also application-specific, in this case arising from the various ways in which lines in the image can intersect. This is about as different from equations in real-valued variables as one can get. Yet, as a constraint satisfaction problem, Waltz’s had in common with Sutherland’s that it was sparse: most constraints related but a small subset of the variables. That again suggested data-directed control.

Waltz’s algorithm contains another important innovation. In conventional relaxation, one always associates one value with each variable, even though these values may not be anywhere near their solution values. That is, one starts with a guess. Each step in the relaxation changes (for the better, as one hopes) the guess for one variable. In favorable cases, these guesses approach a solution.

Waltz did not guess; he knew what he did not know. Accordingly, he associated with every variable a “domain”: the set of its possible values. Using a constraint in x and y could at best mean that the nonoccurrence of certain values in the domain for x allowed the removal of one or more values from the domain of y. If y occurs in another constraint, say, with a variable z, then this reduction of its domain may trigger a reduction of the domain of z. This propagation process is often referred to as “relaxation,” even though it is far removed from the setting of equations in real variables.

A typical example of reasoning à la Waltz is the situation where all of the variables x, y, and z have to be different. If the sets of values associated with these variables are {a, b, c}, {b, c}, and {b, c}, respectively, then one can eliminate b and c as possible values for x. In the terminology of the field, these values can be eliminated as being inconsistent—inconsistent with the constraint that the three variables be different.
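Here is that three-variable example as a small propagation sketch (my code, not from either book), using a simple form of the rule behind all-different pruning: if some k variables have domains whose union has exactly k values, those values are inconsistent for every other variable in the constraint.

    from itertools import combinations

    # Domain pruning for an all-different constraint.
    def prune_alldiff(domains):
        changed = True
        while changed:
            changed = False
            names = list(domains)
            for k in range(1, len(names)):
                for group in combinations(names, k):
                    union = set().union(*(domains[v] for v in group))
                    if len(union) == k:  # k variables need exactly these k values
                        for v in names:
                            if v not in group and domains[v] & union:
                                domains[v] -= union
                                changed = True
        return domains

    doms = {"x": {"a", "b", "c"}, "y": {"b", "c"}, "z": {"b", "c"}}
    print(prune_alldiff(doms))  # x is reduced to {'a'}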


If a constraint satisfaction problem has one solution, then all one has to do to find that solution is initially to allow as many values as possible for all variables and then to remove all inconsistent values. A typical constraint satisfaction problem is NP-hard, so this is easier said than done. The field has advanced by identifying various feasibly computable levels of consistency short of complete absence of inconsistent values. These are identified by qualifying the word “consistency” in various ways. A bewildering variety of such terms is used. Apt’s chapter “Local Consistency Notions” lists nine varieties of local consistency, followed by a final section, “Graphs and Consistency.” In Dechter this last section is expanded into an entire chapter. It represents the culmination, so far, of Sutherland’s new way of approaching sparseness by exploiting the structure of the dependency graph of the constraints.

Next to consistency, the propagation introduced by Waltz is an important topic in constraint satisfaction. Both Apt and Dechter devote a chapter to it. Apt’s is notable for his surprisingly general framework, going far beyond the confines of constraint satisfaction. He starts with properties of sets of functions on partially ordered sets that have one or more of the properties of being inflationary, monotonic, idempotent, and mutually commutative. He gives a compelling development of fixpoint algorithms that iterate such functions. One would expect results such as these to be included in set theory texts, were it not for the fact that they were only recently obtained (by Apt) in order to better understand the wide variety of ad-hoc propagation algorithms found in the literature.
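A minimal sketch of that abstract view (my illustration of the idea, not Apt’s algorithm verbatim): propagation is the iteration of a set of inflationary, monotone functions until a common fixpoint is reached; termination is guaranteed here because the domains are finite and can only shrink.

    # Generic propagation as fixpoint iteration: apply each
    # domain-narrowing rule until none of them changes the state.
    def fixpoint(state, functions):
        pending = list(functions)
        while pending:
            f = pending.pop()
            new_state = f(state)
            if new_state != state:
                state = new_state
                pending = list(functions)  # naively re-schedule everything
        return state

    # Example: two rules enforcing the constraint x < y on finite domains.
    def rule_x(s):  # x must lie below max of y's domain
        dx, dy = s
        return (frozenset(v for v in dx if v < max(dy)), dy)

    def rule_y(s):  # y must lie above min of x's domain
        dx, dy = s
        return (dx, frozenset(v for v in dy if v > min(dx)))

    s = (frozenset({1, 2, 3}), frozenset({1, 2, 3}))
    print(fixpoint(s, [rule_x, rule_y]))  # domains narrow to {1, 2} and {2, 3}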

Next to consistency and propagation, the main topic of constraint satisfaction is search. In any nontrivial constraint satisfaction problem, propagation to some level of consistency is not enough to obtain a solution. When propagation has exhausted all possibilities of removing inconsistent values from the domains, further progress depends on splitting and repeating propagation in the resulting constraint satisfaction problems. The properties of the resulting tree of constraint satisfaction problems, and the algorithms for traversing it, constitute the topic of search. Apt has an excellent one-chapter introduction; Dechter digs deeper with three chapters: “Look-Ahead,” “Look-Back,” and “Stochastic Search” (which includes simulated annealing).

We have seen how constraint satisfaction arose with Sutherland’s work on numerical problems and how Waltz adopted Sutherland’s propagation for his combinatorial problem in scene recognition. In the combinatorial setting constraint satisfaction developed into a new discipline. As a sign of its maturity, it started competing with integer programming as the method of choice for solving combinatorial problems. Take, for example, the notorious 10-machine/10-job benchmark mt10 in job-shop scheduling. It was proposed by Muth and Thompson [7] in 1963. For some time various researchers succeeded in finding ever better solutions. These improvements came to a halt after about ten years. For another ten years failure to improve the world record on mt10 hardened the suspicion that the optimum had been found. At the end of this period, Carlier and Pinson [2] succeeded in reducing the search space sufficiently to prove that the long-standing record is indeed the optimum.

All this work was done in integer programming: it consisted of improvements in search, in generating cutting planes, and, of course, it benefited from increased computer performance. One sign of the coming of age of constraint satisfaction in the 1990s was that mt10 was more and more routinely solved by constraint satisfaction methods, without reliance on integer programming.

Although in this instance a purely combinatorial constraint satisfaction attack was successful, it is likely that there is great potential in developing constraint satisfaction for numerical problems, that is, to exploit the possibility that the variables range over the reals and that the domains take the form of intervals bounded by floating-point numbers. So far this approach to numerical computation has remained outside of the mainstream of constraint satisfaction. It is not mentioned in Dechter. In Apt there is a brief section on arithmetic constraints between real-valued variables.


For this neglected area, the best source is still Numerica by Van Hentenryck, Michel, and Deville [13], where constraint satisfaction can be seen to be a competitive alternative to the continuation method for solving nonlinear equations. The methods in this book are applicable to systems of nonlinear inequalities as well and also to nonconvex global optimization.

Constraint satisfaction as understood by Apt and by Dechter seems to have no connection with optimization as understood by the applied mathematics community. Yet it is highly relevant. It is the great merit of Hooker’s Logic-Based Methods for Optimization [3] to make the connection. This is not obvious from the title because of the emphasis on logic. Indeed, Hooker’s motivation is to break out of the classical optimization framework by the use of logic as a more flexible modeling tool. In this Hooker was preceded by Van Hentenryck who noticed that languages like AMPL and GAMS lacked the scope for the enlarged modeling capabilities of constraint satisfaction. This led him to the language OPL [12], now used in some of the ILOG software. Early implementations of constraint satisfaction used logic as the programming framework [11, 5]. This connection led Hooker to include an excellent account of constraint satisfaction in [3].

For those in the optimization field to benefit from the developments in constraint satisfaction, my summary recommendation is to study both books under review first, to obtain the necessary distance from the traditional optimization mindset. There is surprisingly little overlap between the two books. Apt emphasizes the roots in logic; Dechter, the algorithmic complexity aspects. It is best to start with Apt, for the larger picture and for an introduction to the topics where they overlap. Then Numerica [13] for a glimpse of the potential of constraint satisfaction in classical numerical computation. Then back to optimization with Hooker [3].

REFERENCES

[1] J. Allen, Maintaining knowledge about temporal intervals, Comm. ACM, 26 (1983), pp. 832–843.

[2] J. Carlier and E. Pinson, An algorithm for solving the job-shop problem, Management Sci., 35 (1989), pp. 164–176.

[3] J. Hooker, Logic-Based Methods for Optimization: Combining Optimization and Constraint Satisfaction, Wiley-Intersci. Ser. Discrete Math. Optim., John Wiley, New York, 2000.

[4] W. Leler, Constraint Programming Languages: Their Specification and Generation, Addison-Wesley, Reading, MA, 1988.

[5] K. Marriott and P. J. Stuckey, Programming with Constraints: An Introduction, MIT Press, Cambridge, MA, 1998.

[6] E. F. Moore, On the shortest path through a maze, in Proceedings of the International Symposium on the Theory of Switching, Harvard Ann., 3 (1959), pp. 285–292.

[7] J. F. Muth and G. L. Thompson, Industrial Scheduling, Prentice-Hall, Englewood Cliffs, NJ, 1963.

[8] R. V. Southwell, Relaxation Methods in Engineering, Oxford University Press, New York, 1940.

[9] I. Sutherland, Sketchpad: A Man-Machine Graphical Communication System, Ph.D. thesis, Department of Electrical Engineering, MIT, Cambridge, MA, 1963.

[10] E. Tsang, Foundations of Constraint Satisfaction, Academic Press, New York, 1993.

[11] P. Van Hentenryck, Constraint Satisfaction in Logic Programming, MIT Press, Cambridge, MA, 1989.

[12] P. Van Hentenryck, The OPL Optimization Programming Language, MIT Press, Cambridge, MA, 1999.

[13] P. Van Hentenryck, L. Michel, and Y. Deville, Numerica: A Modeling Language for Global Optimization, MIT Press, Cambridge, MA, 1997.

[14] D. Waltz, Understanding line drawings in scenes with shadows, in The Psychology of Computer Vision, P. H. Winston, ed., McGraw-Hill, New York, 1975, pp. 19–91.

[15] D. L. Waltz, Generating Semantic Descriptions from Drawings of Scenes with Shadows, Technical report AITR-271, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, 1972.

MAARTEN VAN EMDEN

University of Victoria


Global Smoothness and Shape Preserving Interpolation by Classical Operators. By Sorin G. Gal. Birkhäuser Boston, Boston, MA, 2005. $79.95. xiv+146 pp., hardcover. ISBN 0-8176-4387-7.

This is a short book (146 pages), and it covers interpolation methods under the aspects of smoothness and shape preservation. Global smoothness is defined in terms of the modulus of continuity of a function, either univariate or bivariate. Global shape preservation addresses recovery of monotone or convex regions by interpolatory operators. Convexity for bivariate functions is defined as $f_{xxyy} > 0$.

The author covers these aspects for “classical” interpolation operators, such as those of the type Lagrange, Grünwald, Hermite–Fejér, Shepard, and Jackson trigonometric, again for both the univariate and bivariate cases.

The book has the following chapters:

Chapter 1: “Global Smoothness Preservation, Univariate Case.” Chapter 2: “Partial Shape Preservation, Univariate Case.” Chapter 3: “Global Smoothness Preservation, Bivariate Case.” Chapter 4: “Partial Shape Preservation, Bivariate Case.”

the five interpolation operators mentionedabove. Many of the presented results arepublished in this monograph for the firsttime.The book is written in a very nonverbal

The book is written in a very nonverbal style. Theorems are stated as equations, without text indicating their meaning or purpose. This style makes for extremely hard reading for anyone not fully immersed in the field.

An appendix shows 20 images of applications of various Shepard interpolants to a suite of five bivariate functions, given by simple polynomial or exponential expressions over [0, 1] × [0, 1]. There is no discussion about the quality of these interpolants (some look awful). No comparisons in terms of smoothness or shape preservation are given.

This book is written for researchers in approximation theory, and it contains a large number of results, presented in a concise way. The back cover claims it is also valuable for more applied fields, such as data fitting, computer aided geometric design, or engineering; this reviewer cannot support that claim. Smoothness and shape preservation are of significant importance in those applied fields, but they are typically addressed in the context of spline-type functions, an area left untouched by this monograph.

GERALD E. FARIN

Arizona State University

Large Deviations and Metastability. By Enzo Olivieri and Maria Eulalia Vares. Cambridge University Press, Cambridge, UK, 2005. $140.00. xv+512 pp., hardcover. ISBN 0-521-59163-5.

The Least Improbable Path. The scene is a fishing boat off the Alaskan coast. A fish thrashes around in a bucket; eventually it manages to flip out of the bucket and back into the ocean. This is a large deviation: lots of futile attempts, then one big success.

Large deviation theory provides a way to calculate the probabilities of improbable events. Why would one want to do this? The idea is that if one waits for a long time, such an event will happen. The theory predicts that with overwhelming probability it will happen in a special way, the way that is the least improbable. Thus large deviation theory gives precise predictions about how interesting things happen. What could have more charm?

The book under review begins with a thorough introduction to large deviation theory. This subject has already been treated in a number of books (for instance, [1, 2, 3]). The new feature is that the authors apply it to models of metastability. In their approach there is a system with some geometric structure, perhaps a physical system. On one time scale the system appears stable, but on another, much longer, time scale the system makes a transition to another, more stable, equilibrium state. This transition is the large deviation. One possible example is a liquid-gas system. The metastable state is the gas saturated with vapor.


Eventually spontaneous fluctuations produce a condensation nucleus, and the system makes a relatively rapid transition to a combined liquid-gas phase.

The Cramér–Chernoff Theorem. The classic result for typical fluctuations or moderate deviations is the central limit theorem. There are independent identically distributed random variables $X_1, X_2, X_3, \ldots$. Suppose that each has mean $m$ and standard deviation $\sigma < +\infty$. For sample size $n$ the sample mean is

(1) $\overline{X}_n = \dfrac{X_1 + \cdots + X_n}{n}.$

Consider the normalized sample mean

(2) $Z_n = \dfrac{\overline{X}_n - m}{\sigma/\sqrt{n}}.$

Then the distribution of $Z_n$ approaches the standard normal (Gaussian) distribution. In particular, moderate fluctuations of the sample mean tend to be proportional to $\sigma/\sqrt{n}$, which is small for large $n$.

This is what makes statistical inference possible.

The corresponding classic result for large deviations is what the authors call the Cramér–Chernoff theorem. (This result had predecessors in the context of equilibrium statistical mechanics.) The theorem describes fluctuations of the sample mean of constant magnitude, independent of $n$. Again there are independent identically distributed random variables $X_1, X_2, X_3, \ldots$, each with probability distribution $\mu$. The theorem gives information about the probability distribution $\mu_n$ of the sample mean $\overline{X}_n$. Let $\hat{\mu}$ be the moment generating function of $\mu$, that is,

(3) $\hat{\mu}(\beta) = \displaystyle\int_{-\infty}^{\infty} e^{\beta x} \, d\mu(x).$

Consider the quantity

(4) $f(\beta) = \log(\hat{\mu}(\beta)).$

From this construct the rate function $I(x)$ by the Legendre transform:

(5) $I(x) = \sup_{\beta}\,(\beta x - f(\beta)).$

The theorem gives the exponential asymptotics of the probability distributions $\mu_n$ in terms of the rate function $I(x)$. That is, for suitable subsets $S$ we have the exponential asymptotics

(6) $\mu_n(S) \approx e^{-n \inf_{x \in S} I(x)}$

as $n \to \infty$.

Such estimates reduce the calculation of events of low probability to the minimization of the rate function $I(x)$. The relation of $I(x)$ to $f(\beta)$ is given by the Legendre transform, which may at first seem an odd kind of relation. However, in the simple case when the supremum is assumed to be at some $\beta$ and everything is smooth, we have $f'(\beta) = x$ and $I'(x) = \beta$. So the Legendre transform relation between two functions just says that their derivatives are inverse functions to each other.

The Legendre transform formulas resemble those of equilibrium statistical mechanics [4]. In the role of the sample mean would be (minus) the energy function. The quantities $\beta$ and $x$ would become the inverse temperature and (minus) the expected energy. The rate function would correspond to (minus) the entropy. This analogy may be carried much further, and in fact equilibrium statistical mechanics may be thought of as a domain for a very general kind of large deviation theory. Some of this story is reviewed in the book.

is that µn(S) ≈ exp(−n infx∈S I(x)) asn→∞ for suitable subsets S. This occursthroughout large deviation theory, and itdeserves a more careful statement. Thiswas provided by Varadhan’s large devi-ation principle [5]. The context for thisdefinition is a family of probability mea-sures µn defined on the Borel subsets of ametric space M . There is a rate functionI : M → [0,+∞] that is lower semicon-tinuous. This means that xn → x impliesthat lim infn→∞ I(xn) ≥ I(x). (In the limitit can jump down, but not up.) The con-dition is that there are lower bounds andupper bounds for µn probabilities in termsof I(x). In particular, ifG is an open subset,then there is a lower bound

(7) lim infn→+∞

1nlog µn(G) ≥ − inf

x∈GI(x).

Furthermore, if C is a closed subset, thenthere is an upper bound

(8) lim supn→+∞

1nlog µn(C) ≤ − inf

x∈CI(x).


Notice that the asymptotics of the $\mu_n$ probabilities for the sample mean are given by the infimum of the rate function over a suitable subset. This is the mathematical expression of the fact that the least improbable situation dominates. If the subset is made larger, but the infimum of the rate function over the subset is the same, then the asymptotic result for the probability is the same. (This is analogous to the result in equilibrium statistical mechanics that says that the largest contribution to the entropy defined by an energy constraint comes from a region close to the energy surface.) The suitable subsets for which the asymptotic calculation works need not include all subsets for which the probability is defined. For instance, it would be a mistake to apply the result to a subset $\{a\}$ with only one point. For a continuous probability measure $\mu$ the probability $\mu_n(\{a\}) = 0$, while the value $I(a)$ could well be finite. As we have seen, this problem cannot arise for an open subset.

The most elementary illustration of the Cramér–Chernoff theorem is the case when each $X_i$ is one or zero with probability $\frac{1}{2}$ for each case. In this case the sample mean is just the sample proportion. Then $f(\beta) = \log(\frac{1}{2}e^{\beta} + \frac{1}{2})$. This has derivative $x = e^{\beta}/(e^{\beta} + 1)$. Invert this to get $\beta = \log(x/(1-x))$. Insert to get the rate function $I(x) = x \log(x) + (1-x)\log(1-x) + \log 2$. This has an interior minimum with value zero at $x = \frac{1}{2}$ and rises to a value of $\log 2$ at $x = 0$ and $x = 1$. The values at the end points have an obvious meaning: the probability of all ones (or of all zeros) is $e^{-n \log 2} = (\frac{1}{2})^n$. Outside this interval the rate function has the value $+\infty$, corresponding to the fact that such deviations are impossible. (This rate function is not continuous at the end points, but it is lower semicontinuous.) If the subset is a strictly positive distance away from $\frac{1}{2}$, then the infimum will be strictly positive, and the probability will be exponentially small in $n$.
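This example is also easy to check numerically. The following minimal sketch (my own illustration, not code from the book) compares the exact probability that the sample proportion of $n$ fair coin flips is at least $a > \frac{1}{2}$ with the Chernoff prediction $e^{-nI(a)}$; the infimum of $I$ over $[a, 1]$ is $I(a)$, since $I$ is increasing there.

    import math

    def rate(x):
        # I(x) = x log x + (1 - x) log(1 - x) + log 2 for fair coin flips
        if x in (0.0, 1.0):
            return math.log(2)
        return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

    def tail(n, a):
        # exact P(sample proportion >= a) for n fair coin flips
        k0 = math.ceil(n * a)
        return sum(math.comb(n, k) for k in range(k0, n + 1)) / 2.0**n

    a = 0.7
    for n in (50, 200, 800):
        # -(1/n) log P should approach I(a) as n grows
        print(n, -math.log(tail(n, a)) / n, rate(a))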

The Freidlin and Wentzell Theory. One great success of large deviation theory is its application to Markov diffusion processes by Freidlin and Wentzell. They considered nonlinear stochastic differential equations of the form

(9)   $\dfrac{dX_t}{dt} = b(X_t) + \varepsilon\,\alpha_t.$

The solution $X_t$ is a function of time with values in $n$-dimensional Euclidean space. There is an initial condition $X_0 = x$, where $x$ is the starting point. The vector field $b$ gives the deterministic part of the motion. The real parameter $\varepsilon$ determines the amount of randomness. Thus when $\varepsilon = 0$ we get $dx_t/dt = b(x_t)$, an autonomous system of ordinary nonlinear differential equations that determines the flow. The $\alpha_t$ in the driving term is white noise, and that is what makes the equation stochastic or random. In some formal sense this is a similar situation to that of the Cramér–Chernoff theorem, in that one is dealing with a sequence of independent random variables. In the present case these are the magnitudes of the Fourier coefficients of the white noise.

The large deviations come in when we consider $\varepsilon$ as a small parameter. Then the motion does something unusual only after a long wait. What it does at that time is determined by large deviation theory, and so it may also be deterministic in a certain sense. However, this determinism results from the fact that the least improbable solution of the random equation dominates all the others.

The stochastic differential equation needs care in interpretation, as the Fourier series that defines white noise does not converge. (After all, the Fourier coefficients are random with identical distributions, and so they tend to be of the same general size.) However, with the aid of the Itô integral, the equation defines a probability measure $P^{\varepsilon}$ on the space of continuous functions $\phi$ on the time interval $[0, T]$ satisfying the initial condition $\phi(0) = x$. This space is equipped with the usual metric that defines uniform convergence; the open subsets and closed subsets that occur in the large deviation principle are defined in terms of this metric.

The large deviation principle result continues to hold with a certain choice of rate function. The role of $n$ is played by $1/\varepsilon^2$, and $\mu_n$ is replaced by $P^{\varepsilon}$. We are dealing with sets of functions instead of sets of numbers, but the result expressing the probabilities in terms of the rate function is the same. The rate function is given explicitly by

(10)   $I(\phi) = \dfrac{1}{2} \displaystyle\int_0^T \left| \dfrac{d\phi(t)}{dt} - b(\phi(t)) \right|^2 dt.$

It is considered to be infinite if $\phi$ is not absolutely continuous or if the initial condition is not satisfied. The asymptotic result is then that

(11)   $P^{\varepsilon}(S) \approx e^{-\frac{1}{\varepsilon^2} \inf_{\phi \in S} I(\phi)}$

in the small $\varepsilon$ limit for suitable subsets $S$.

To use the rate function, one has to minimize it over certain subsets. In general this can be a nasty variational problem. However, there is one case in which everything is relatively simple, that of a time reversible process, that is, one in which the detailed balance condition is satisfied. This occurs in the context of a gradient system, in which $b(x) = -\nabla U(x)$. The deterministic motion is gradient flow downward toward a minimum of the function $U(x)$. There is a wonderful identity

(12)   $I(\phi) = \dfrac{1}{2} \displaystyle\int_0^T \left| \dfrac{d\phi(t)}{dt} + b(\phi(t)) \right|^2 dt + 2\,[\,U(\phi(T)) - U(\phi(0))\,].$
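(The identity is nothing more than completing the square. Since $|\dot\phi - b|^2 = |\dot\phi + b|^2 - 4\,\dot\phi \cdot b$, and $b = -\nabla U$ gives $-4\,\dot\phi \cdot b = 4\,\dot\phi \cdot \nabla U(\phi) = 4\,\frac{d}{dt}U(\phi(t))$, integrating over $[0, T]$ turns (10) into (12). This small verification is my addition.)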

In particular, $I(\phi) \geq 2\,[\,U(\phi(T)) - U(\phi(0))\,]$. This already shows that going uphill against the gradient is improbable. A closer analysis of the identity leads to a description of the least improbable paths, at least for the case of $\varepsilon \neq 0$ but very small. The uphill part of such a path follows a reversed deterministic trajectory where $d\phi(t)/dt = -b(\phi(t))$, an upward gradient flow. This would seem an unlikely thing to happen, but it may be the least unlikely of all.

The authors consider the situation in which $U(x)$ grows at infinity, but is bounded below with several distinct minima. In the simplest case there are two minima; this is the famous double-well system. Here is the description of the motion in the limit of small $\varepsilon$. The diffusing particle is started in the shallower well. For a long time it appears to be in equilibrium with a stationary measure that is concentrated around the minimum point in this well. This is what might be called a metastable state. Finally, after many, many futile random attempts, the particle makes it over a saddle point and falls into the deeper well. The time to make this transition, the tunneling time, is long and highly random. On the other hand, the mechanism of the transition is simple and well determined: up the shallow well against the gradient; down the deep well with the gradient. The fish is out of the bucket.
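The double-well story is easy to watch on a computer. Here is a minimal Euler-Maruyama sketch (my own illustration, with a hypothetical potential, not code from the book); the shallow well of $U(x) = x^4/4 - x^2/2 + x/10$ is near $x \approx 0.93$ and the deep well is near $x \approx -1.03$, and shrinking eps makes the printed transition time explode.

    import math, random

    def grad_U(x):
        # U(x) = x**4/4 - x**2/2 + x/10, so U'(x) = x**3 - x + 0.1
        return x**3 - x + 0.1

    eps, dt = 0.35, 1e-3
    x, t = 0.93, 0.0             # start at the bottom of the shallow well
    random.seed(1)
    while x > -0.5 and t < 1e4:  # stop once past the saddle (or give up)
        # Euler-Maruyama step for dX = -grad U(X) dt + eps dW
        x += -grad_U(x) * dt + eps * math.sqrt(dt) * random.gauss(0.0, 1.0)
        t += dt
    print("one sample of the transition time:", t)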

The Harris Contact Process. The examples of metastability that most interest the authors are those where the process has a spatial structure. One example they consider is the Harris contact process. Consider the finite set $\{-N, -N+1, \ldots, N-1, N\}$. Each site in this set is either empty or occupied by at most one particle. The evolution is Markovian: each particle disappears after an exponential waiting time of rate 1, independently of all the rest; at any time, each particle has the possibility to create a new particle at each empty neighboring site, with rate $\lambda$, also independently of everything else. The authors explain (p. 243): "In the biological interpretation, occupied sites correspond to infected individuals, while empty sites correspond to healthy individuals. The recovery rate is one, $\lambda$ is the rate of propagation of the infection (in each direction), and the process can be thought of as a very simplified mathematical model for the spread of an infection."

Consider the situation when $N$ is large. If $\lambda$ is small, then the particles disappear quite rapidly. However, if $\lambda$ is sufficiently large, then there is a rough equilibrium between infection and recovery, at least for a long time. Eventually an unusual event occurs where there are many recoveries that are not compensated for by new infections, and the infection completely dies out.

The authors present a theorem to the effect that the time when this happens is very random. More precisely, suppose that initially all the individuals are infected. Let $T_N$ be the time when all the individuals recover. Suppose $\lambda$ is larger than a certain critical value. Then there is a normalization $\beta_N$ such that the distribution of $T_N/\beta_N$ converges to the distribution of an exponential random variable with mean 1. An exponential random variable has a standard deviation equal to its mean, so this is an indication of the asymptotic unpredictability of $T_N$.
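A minimal Gillespie-style simulation of this process is easy to write; the sketch below is my own illustration, not anything from the book. With $\lambda$ below the critical value the infection dies out quickly, while pushing $\lambda$ above it and increasing $N$ makes the returned time $T_N$ grow explosively, which is the metastability.

    import random

    def extinction_time(N=20, lam=0.8, seed=None):
        # contact process on {-N, ..., N}: recovery at rate 1,
        # infection of each empty neighbor at rate lam
        rng = random.Random(seed)
        occ = set(range(-N, N + 1))      # start with everyone infected
        t = 0.0
        while occ:
            events = []
            for s in occ:
                events.append((1.0, 'recover', s))
                for nb in (s - 1, s + 1):
                    if -N <= nb <= N and nb not in occ:
                        events.append((lam, 'infect', nb))
            total = sum(r for r, _, _ in events)
            t += rng.expovariate(total)  # waiting time to the next event
            u = rng.uniform(0.0, total)  # pick an event ~ its rate
            for r, kind, s in events:
                u -= r
                if u <= 0.0:
                    if kind == 'recover':
                        occ.discard(s)
                    else:
                        occ.add(s)
                    break
        return t

    print(extinction_time(N=20, lam=0.8, seed=1))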


The Stochastic Ising Model. The goal of the book is to study systems with even more complicated spatial structure. The prototype is the stochastic Ising model. A configuration of the model is an assignment of a $\pm 1$ spin to each site in a two-dimensional square lattice with periodic boundary conditions. The energy function on this configuration space favors alignment of adjacent spins and also favors $+1$ spins over $-1$ spins. The discrete time stochastic dynamics is given by a Metropolis Markov chain, reversible with respect to the Gibbs equilibrium measure defined by the energy function and the temperature parameter. The small parameter that makes this a large deviation problem is the temperature.

In this system the metastable state is one in which most spins have the value $-1$. Since the temperature is low, there is a strong tendency to alignment, and the spins will tend to remain mostly with the $-1$ value. However, eventually there will be a transition to the stable equilibrium state, in which the spins have mostly $+1$ values. For this to happen, regions of $+1$ values will have to grow and consolidate. The new feature of such a problem is that the energy function has many local minima. During the decay from the metastable to the stable situation, the system will typically visit many basins of local minima, remaining there for exponentially long intervals of time.

The authors derive such results from a general theory of a certain class of time reversible Markov chains. There is an energy function $H(x)$ defined for the configurations $x$ of the system. The transition probability for a transition from $x$ to $y \neq x$ is of the Metropolis form

(13)   $P(x, y) = q(x, y) \exp(-\beta \max[H(y) - H(x),\, 0]).$

It is required that $q(x, y) = q(y, x)$. The parameter $\beta$ is the inverse temperature. Thus if $H(y) > H(x)$, the system is discouraged from making a transition from the lower energy state to the higher energy state. If $\beta$ is large, then it is highly discouraged. Otherwise, if $H(y) < H(x)$, then the transition probability is just $q(x, y)$; there is no particular problem making a transition to a lower energy configuration. It is not hard to see that the equilibrium measure for this chain is the Gibbs measure with discrete density $\pi(x)$, that is, that $\pi(x)$ is proportional to $\exp(-\beta H(x))$. Furthermore, the time reversibility condition, or detailed balance condition, is satisfied:

(14)   $\pi(x) P(x, y) = \pi(y) P(y, x).$
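For concreteness, here is a minimal sketch of one step of the dynamics (13) for the Ising prototype described above (my own illustration, not the authors' code). The proposal $q$ picks one site uniformly at random, which is symmetric as required, and the field $h > 0$ makes the mostly $+1$ configuration stable, so the all $-1$ start is metastable at large $\beta$.

    import math, random

    def metropolis_step(spins, beta, J=1.0, h=0.2):
        # one step of P(x, y) = q(x, y) exp(-beta max[H(y) - H(x), 0])
        # for H(x) = -J sum_<ij> s_i s_j - h sum_i s_i on an L x L torus,
        # where q proposes flipping one uniformly chosen spin
        L = len(spins)
        i, j = random.randrange(L), random.randrange(L)
        s = spins[i][j]
        nbrs = (spins[(i - 1) % L][j] + spins[(i + 1) % L][j] +
                spins[i][(j - 1) % L] + spins[i][(j + 1) % L])
        dH = 2.0 * s * (J * nbrs + h)   # H(y) - H(x) for flipping s
        if dH <= 0.0 or random.random() < math.exp(-beta * dH):
            spins[i][j] = -s

    L = 32
    spins = [[-1] * L for _ in range(L)]  # the metastable state
    for _ in range(10**6):
        metropolis_step(spins, beta=2.0)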

The analysis is for large $\beta$ values. For this system the mechanism of the large deviation is no longer so straightforward. Consider again the scene on the fishing boat. The fish might thrash around in the bucket, leap from the bucket to the deck, struggle all over the deck, and only then find a path over the side and into the sea. The authors describe a situation that is even more complicated (pp. 337–338):

   In the case of a set $G$ 'completely attracted' by a unique stable point, it turns out that the exit from $G$, when it occurs, follows a quite fast path, taking place in an interval of time $T$ independent of $\beta$, without 'hesitations'. The exponentially long (in $\beta$) time needed by the process to exit from $G$ is a consequence of the exponentially small probability that an excursion outside $G$ takes place in a time independent of $\beta$. So the typical time needed to see the first excursion outside $G$ is very long as a consequence of a very long series of unlikely attempts before the successful one, but the time spent during this first excursion is relatively short.

   On the contrary, in the case of a set $G$ containing many minima for the energy, the time needed to see the first excursion is still exponentially large in $\beta$ but, in general, also the time spent during the first excursion from $G$ will typically be exponentially large in $\beta$, even though with a smaller rate. Indeed the first excursion will involve random fluctuations inside suitable 'permanence sets' during exponentially long random intervals of times ('resistance times'). As will appear clear, a mechanism of exit in a finite interval of time and without hesitation is, in general, very low in probability.

   We shall see that when $G$ contains many minima for $H$, even the formulation of the problem of typical paths of exit from $G$ changes drastically with respect to the case of completely attracted domains. Whereas in the completely attracted case we have to determine single optimal paths of exit, in the general case we have to introduce generalized paths given by sequences of permanence sets with relative permanence times. The single trajectory inside a permanence set cannot be specified. The specification of a typical tube of exit from a general domain $G$ is intrinsically stochastic: the exit from a permanence set is exponentially long in $\beta$ and tends to be unpredictable for large $\beta$; moreover the behavior inside a permanence set is well described by a restricted equilibrium (conditional Gibbs) measure.

These general assertions are backed up by precise definitions and theorems. In summary, this book is a serious and scholarly work dealing with deep and difficult issues at the frontier of probability theory. Its subject matter is not only of interest for interacting particles or spins, but also for any system with a random evolution in an environment defined by a function with many local minima.

Acknowledgment. I thank Rabi Bhattacharya, Maria Reznikoff, Daniel Ueltschi, and Jan Wehr for comments on drafts of this review.

REFERENCES

[1] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd ed., Springer-Verlag, New York, 1998.

[2] F. den Hollander, Large Deviations, Fields Inst. Monogr. 14, AMS, Providence, RI, 2000.

[3] J.-D. Deuschel and D. W. Stroock, Large Deviations, Academic Press, San Diego, CA, 1989.

[4] H.-O. Georgii, Gibbs Measures and Phase Transitions, Walter de Gruyter, Berlin, 1988.

[5] S. R. S. Varadhan, Asymptotic probabilities and differential equations, Comm. Pure Appl. Math., 19 (1966), pp. 261–286.

WILLIAM G. FARIS

University of Arizona

Mathematical Inequalities. By B. G. Pachpatte. Elsevier Science & Technology Books, Amsterdam, 2005. $198.00. xii+591 pp., hardcover. ISBN 0-444-51795-2.

On another occasion, this reviewer wrote a history of inequalities in a volume [1] dedicated to Richard Bellman. There I argued that certain books were vital to research in this area. Here we have yet another book on inequalities, so I think it appropriate, in setting the context, to devote a short essay to the history of books on inequalities.

Discounting an early Greek book, Isoperimetric Figures, by Zenodorus, which is lost, the first book on inequalities is the renowned book of Hardy, Littlewood, and Polya [2]. Hardy, in his retiring presidential address to the London Mathematical Society [3], noted that "to find all three inequalities in one volume is impossible," so he, together with Littlewood and Polya, would undertake the writing of a book. (The three inequalities were Holder's, Minkowski's, and the arithmetic-geometric mean inequality.) The intention was for the book to be encyclopedic.

The next books were by Beckenbach and Bellman [4] and Mitrinovic [5]. The former has an initial chapter that overlaps with [2], but the rest of the book is largely disjoint. On the other hand, [5] contains many of the general results of [2] and a great deal of work by Eastern European mathematicians that was missed by [2] and [4].

Mitrinovic was a writer and collector of histories of inequalities. He had written several shorter monographs (in Serbo-Croatian) on the histories of particular inequalities with various colleagues. Toward the end of his life he began a project to document all inequalities in books to be published in English. With various collaborators, the results were the four volumes, [6] on means, [7] on geometric inequalities, [8] on inequalities between functions and their integrals or derivatives, and [9] on the classical inequalities. This reviewer was a joint author for the last two. In addition, I was asked to collaborate on a fifth volume to be titled Particular Inequalities. I declined. It was assumed that some sort of preliminary manuscript existed, but after Mitrinovic's death it was never found.

In the meantime, other books on inequalities were published. They range from the elementary Geometric Inequalities [10] by Kazarinoff to sophisticated applications, of which we can mention a few. Among those written very closely with applications in mind are Marcus and Minc, A Survey of Matrix Theory and Matrix Inequalities [11]; Lakshmikantham and Leela, Differential and Integral Inequalities [12, 13]; Walter, Differential and Integral Inequalities [14]; Protter and Weinberger, Maximum Principles in Differential Equations [15]; Bainov and Simeonov, Integral Inequalities and Applications [16]; and Agarwal, Differential Equations and Inequalities: Theory, Methods and Applications [17].

Then there are books centered around a particular idea or ideas of inequality. Bottema, Geometric Inequalities [18] was a precursor of [5]. The beautiful book on majorization by Marshall and Olkin [19] and that of Pecaric on convex functions [20] contain the theory and many of the applications, particularly to statistics. Meanwhile, books centered on a particular inequality include Kwong and Zettl [21] on the Kolmogorov–Landau inequalities, Agarwal and Pang [22] on Opial's inequality, and the two on Hardy's inequality by Opic and Kufner [23] and Kufner and Persson [24].

One other book worth mentioning is Bullen's A Dictionary of Inequalities [25], which is a selection of classical and the author's favorite inequalities. The current book under review might be described in a similar way, although its scope is much narrower. It is a collection of five chapters of inequalities which come under the titles of convex functions, Hardy's inequalities, Opial's inequalities, Poincare–Sobolev inequalities, and finally Levin–Lyapunov inequalities. Each chapter gives proofs of the basic inequalities and then concludes with a list of citations of other inequalities.

The preface and the introductions to the chapters refer to the tradition of [1, 4, 5] as a guide to the present book, and do not recognize the numerous other books where the proved inequalities might be found. The inequalities that are simply cited and unproved are a hodgepodge collection of "newer" and older results with no discernible pattern. The author claims in the preface and introduction that applications are kept in mind. This reviewer believes that too many of the inequalities in these sections are generalizations for generalization's sake and have too complicated hypotheses and/or conclusions to be of much interest in the applications. The reviewer has found this was already the case in the encyclopedic accounts found in [8] or [9]. To test the author's claim to include new results, I looked at the chapter on Poincare–Sobolev inequalities and found that only one citation was to a paper published after 1990.

In summary, I believe that practitioners would be better served to read the more comprehensive titles cited above.

REFERENCES

[1] A. M. Fink, An essay on the history of inequalities, J. Math. Anal. Appl., 249 (2000), pp. 118–134.

[2] G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities, Cambridge University Press, Cambridge, UK, 1934.

[3] G. H. Hardy, Prolegomena to a chapter on inequalities, J. London Math. Soc., 4 (1929), pp. 61–78.

[4] E. F. Beckenbach and R. Bellman, Inequalities, Springer-Verlag, Berlin, 1961.

[5] D. S. Mitrinovic, Analytic Inequalities, Springer-Verlag, New York, 1970.

[6] P. S. Bullen, D. S. Mitrinovic, and P. M. Vasic, Means and Their Inequalities, D. Reidel, Dordrecht, The Netherlands, 1988; updated as Handbook of Means and Their Inequalities, Kluwer Academic, Dordrecht, The Netherlands, 2003.

[7] D. S. Mitrinovic, J. E. Pecaric, and V. Volenec, Recent Advances in Geometric Inequalities, Kluwer Academic, Dordrecht, The Netherlands, 1989.

[8] D. S. Mitrinovic, J. E. Pecaric, and A. M. Fink, Inequalities Involving Functions and Their Integrals and Derivatives, Kluwer Academic, Dordrecht, The Netherlands, 1991.

[9] D. S. Mitrinovic, J. E. Pecaric, and A. M. Fink, Classical and New Inequalities in Analysis, Kluwer Academic, Dordrecht, The Netherlands, 1993.

[10] N. D. Kazarinoff, Geometric Inequalities, Random House, New York, 1961.

[11] M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Allyn and Bacon, Boston, 1964.

[12] V. Lakshmikantham and S. Leela, Differential and Integral Inequalities: Theory and Applications, Vol. 1, Academic Press, New York, London, 1969.

[13] V. Lakshmikantham and S. Leela, Differential and Integral Inequalities: Theory and Applications, Vol. 2, Academic Press, New York, London, 1969.

[14] W. Walter, Differential and Integral Inequalities, Springer-Verlag, New York, 1970.

[15] M. Protter and H. Weinberger, Maximum Principles in Differential Equations, Springer-Verlag, New York, 1984.

[16] D. Bainov and P. Simeonov, Integral Inequalities and Applications, Kluwer Academic, Dordrecht, The Netherlands, 1992.

[17] R. Agarwal, Differential Equations and Inequalities: Theory, Methods and Applications, Marcel Dekker, New York, 2000.

[18] O. Bottema, R. Z. Djordjevic, R. R. Janic, D. S. Mitrinovic, and P. M. Vasic, Geometric Inequalities, Wolters-Noordhoff, Groningen, The Netherlands, 1969.

[19] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, New York, 1979.

[20] J. Pecaric and Y. L. Tong, Convex Functions, Partial Ordering and Statistical Applications, Academic Press, New York, London, 1992.

[21] M. K. Kwong and A. Zettl, Norm Inequalities for Derivatives and Differences, Springer-Verlag, Berlin, 1980.

[22] R. P. Agarwal and P. Y. H. Pang, Opial Inequalities with Applications in Differential and Difference Equations, Kluwer Academic, Dordrecht, The Netherlands, 1995.

[23] B. Opic and A. Kufner, Hardy-Type Inequalities, Longman Scientific & Technical, Harlow, 1990.

[24] A. Kufner and L. Persson, Weighted Inequalities of Hardy Type, World Scientific, River Edge, NJ, 2003.

[25] P. Bullen, A Dictionary of Inequalities, Longman Scientific & Technical, Harlow, 1998.

A. M. FINK

Iowa State University

Multiplicative Invariant Theory. By Martin Lorenz. Springer-Verlag, Berlin, 2005. $109.00. xii+177 pp., hardcover. ISBN 3-540-24323-2.

Invariant theory has a long, rich, and well-documented history. Some aspects date to the ancient Greeks. The theory of elementary symmetric functions goes back at least to Gauss. Abel and Galois contributed to the theory in different ways. Many treatises have been written about the subject. So, when the current book arrived, one might have disregarded it as "just another book on invariant theory." This would have been a mistake. Martin Lorenz has written an excellent book treating the theory of invariants of groups acting on lattices.

As Lorenz explains in his introduction, multiplicative invariant theory concerns an integral representation of a group $G$ on a finite rank free $\mathbb{Z}$-module $L$, also called a lattice or a $G$-module. Thus one has a group homomorphism $G \to \mathrm{GL}(L)$ that extends to an action of $G$ by $k$-automorphisms of the $k$-algebra $k[L]$, the group algebra of $L$ over the field $k$. One is interested in the multiplicative invariant algebra

   $k[L]^G = \{ f \in k[L] \mid g(f) = f \text{ for all } g \in G \}.$

According to the author, the terminology was introduced by Dan Farkas [1] and is due to the fact that when considered as elements in the group algebra, elements in the lattice are "multiplied." Let $k$ be a commutative ring and $L$ a $G$-module. The group ring $k[L]$ is constructed as the set of functions from $L$ to $k$ with finite support. Addition is given pointwise and multiplication is the convolution product. So $k[L] = \mathrm{Set}_c(L, k)$, where the subscript denotes finite support (compact). If $f, g$ are two such functions, then

   $(f * g)(l) = \sum_{m+n=l} f(m)\, g(n)$

is the convolution product. The group $G$ acts according to the rule $(\gamma f)(l) = f(\gamma^{-1} l)$. It then follows that $\gamma(f * g) = (\gamma f) * (\gamma g)$, and similarly for addition. For $l \in L$, denote by $x^l$ the function $x^l(m) = \delta_{l,m}$ ($\delta$ is the Kronecker $\delta$). Then $x^l * x^m = x^{l+m}$ and $\gamma x^l = x^{\gamma l}$ for $\gamma \in G$. The multiplicative invariants are by definition the elements in the subring $k[L]^G$.
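(The smallest example may help orient the reader; it is standard and is my addition rather than a quotation from the book. For $L = \mathbb{Z}$, the group algebra $k[L]$ is the Laurent polynomial ring $k[x, x^{-1}]$, with $x^l * x^m = x^{l+m}$. If $G = \mathbb{Z}/2$ acts on $L$ by $l \mapsto -l$, then $\gamma x^l = x^{-l}$, and the multiplicative invariant algebra $k[L]^G = k[x + x^{-1}]$ is a polynomial ring in the single variable $x + x^{-1}$.)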

The first two chapters form an excellent and quick review of the properties of $G$-modules that are required for the remainder of the book. Several examples that will arise later are introduced. (The reader should beware that the definition on page 18 of induction from a subgroup $H$ to $G$ has the rings interchanged. Later, on page 21, one sees an unadorned LaTeX escape sequence creep into the text.) In these two introductory chapters the author introduces all the notation that he will use. (There are a welcome three pages of "Notations and Conventions" that follow the introduction and precede the main text, where one learns that $G$ always denotes a finite group while $\mathcal{G}$ denotes a possibly infinite group.)

A principal goal in the study of invariants, including multiplicative invariants, is to determine the properties of an invariant algebra. One asks whether the invariant algebra is Noetherian, factorial, Cohen–Macaulay, or a polynomial algebra, among others.

The third chapter begins the study of the invariant algebras by focusing on the properties of the group action on the group algebra of a lattice. The usual topics are considered: reduction to finite groups, finite generation, units, Hopf structure. The main result here (Proposition 3.3.1) is that to study the invariant algebra of a $G$-lattice, one is immediately reduced to the study of a faithful lattice over a finite quotient group of $G$ and that one can work over the integers as the base ring. This then further reduces the study to a finite supply of finite groups, namely, the conjugacy classes of finite subgroups of $\mathrm{GL}_n(\mathbb{Z})$ [2]. Many examples of group actions and related structures are included in this important chapter.

The remainder of the book examines the more restrictive properties of the invariant algebra chapter by chapter. One recurring theme is the study of group actions generated by reflections. In this context a reflection is the generalization of what we normally think of as a reflection through a hyperplane. This theme was begun by Shephard and Todd [3] and Chevalley [4] and has been continued by many others. It plays an important role in this book. An example is a theorem due to Reichstein [5] that states that a finite group $G$ acts on a lattice $L$ as a reflection group if and only if the invariant algebra has a SAGBI basis (don't ask).

The penultimate chapter considers such diverse but related topics as Noether's problem: If $F$ is a finitely generated purely transcendental extension of the field $K$ and the group $G$ acts on $F$, does the extension $F^G$ over $K^G$ have the same property? Another discussion asks for the invariants of general linear groups acting on the coordinate algebra of the variety of $r$-tuples of $n \times n$ matrices.

The last chapter is devoted to open problems, many of which are related to group actions generated by reflections.

This book is excellent. The choice of topics and the order in which they are presented is very good. The proofs are easy to follow, and the references are many and thorough. The author brings many diverse topics together in one place. For SIAM members, it is probably a book that only former algebraists who have come over from the dark side to become practicing members of SIAM would really appreciate.

REFERENCES

[1] D. R. Farkas, Multiplicative invariants, Enseign. Math., 30 (1984), pp. 141–157.

[2] C. Jordan, Mémoire sur les équations différentielles linéaires à intégrale algébrique, J. Reine Angew. Math., 84 (1878), pp. 89–215.

[3] G. C. Shephard and J. A. Todd, Finite unitary reflection groups, Canad. J. Math., 6 (1954), pp. 274–304.

[4] C. Chevalley, Invariants of finite groups generated by reflections, Amer. J. Math., 77 (1955), pp. 778–782.

[5] Z. Reichstein, SAGBI bases in rings of multiplicative invariants, Comment. Math. Helv., 78 (2003), pp. 185–202.

ROBERT M. FOSSUM

University of Illinois at Urbana–Champaign

Conflicts between Generalization, Rigor, and Intuition: Number Concepts Underlying the Development of Analysis in 17–19th Century France and Germany. By Gert Schubring. Springer-Verlag, New York, 2005. $129.00. xiv+678 pp., hardcover. ISBN 0-387-22836-5.


This is a very ambitious book, both in its methodology and in the amount of material it addresses. It is an immensely learned book as well. The topic is of major interest, focusing on two concepts that extended the idea of what was meant by "number": negative numbers and infinitesimals. Schubring documents how hard it was to define negative numbers, let alone justify the rule $(-a)(-b) = ab$; it was equally hard to reconcile the heuristic power of infinitesimals like $dy$ and $dx$ with proving which manipulations with infinitesimals were acceptable and which not. The history of these concepts ranges over many centuries and interacts with both the social conditions of mathematical practice and the changing circumstances of mathematics teaching and textbooks. Schubring's book is about all these important things. For these reasons, historians of mathematics will need to read this book to acquaint themselves with the sources Schubring has uncovered and with the relationship he traces among them. But mathematicians and scientists will probably find the 617 pages of narrative in this book overly dense with detail, and readers can easily lose sight of the forest while trudging through the trees.

Of the many insights that are interesting and valuable, I shall mention those I found the most so. First, to illustrate both Schubring's attention to detail and the way his in-depth research illuminates important historical questions, consider the commonplace among historians that Descartes didn't have Cartesian coordinates, but that 18th-century French textbooks developed them. Schubring discusses how, in order to have Cartesian coordinates in all four quadrants, one needs to develop a full understanding of negative numbers, their algebra, and their geometric interpretation. Charles-Rene Reyneau, hardly a household name, is, according to Schubring, "probably the first" to introduce all four quadrants in the Cartesian plane, in a posthumously published book of 1736; Schubring reproduces Reyneau's diagrams and further observes that Reyneau did not fully address the question of negative areas that arises from his own work.

Again, consider Newton's influential characterization of symbolic algebra as "universal arithmetic." Schubring calls our attention to how "the contradiction between the intended generality of algebra and the restriction of its understanding to [positive] quantities became an element of the mathematical crisis. . . in the 1750s," a crisis he describes as occasioned by the need to incorporate logarithms into the mathematics of the Paris Academie des Sciences.

Schubring describes in a sure-footed way the approaches to the foundations of the calculus by Newton, Leibniz, and the Bernoullis. He makes clear the distinction between indivisibles (for instance, two areas can be shown to be equal if there is a one-to-one correspondence of the lengths of indivisible lines making them up) and infinitesimals (unlike the indivisible lines, the infinitesimally-wide rectangles making up an area are, though of infinitesimal width, nonetheless areas). He gives a valuable "aside" on mathematical teaching institutions in England, France, and Germany and on the rise of Jesuit high schools on the Continent. He makes effective use of manuscript materials, especially those of Ampere and, even more extensively, of Lazare Carnot, and he gives in an appendix a list of Cauchy's surviving correspondence, including many letters still unpublished, with their locations. He also analyzes the contents of a large number of textbooks of the eighteenth and nineteenth centuries, to provide a rich, multidimensional context for the work of Cauchy.

Schubring strikingly observes that Kant "successfully deontologized the 'nothing' in mathematics, attributing to it the exclusively relational character of the zero." He points out how

   In a way analogous to our use of negative numbers in the development of the concept of number as red flags to indicate conflict about the tendency toward algebraization, the concept of infinitely small quantities will serve as indicator for the modes of algebraization concerning limit processes.

He tellingly relates Cauchy's use of infinitely small quantities, defined in Cauchy's Cours d'analyse as variables with zero for a limit, to major debates resulting from the reestablishment in 1811 of infinitesimals in science teaching at the Ecole Polytechnique. He also effectively rebuts the partisans of nonstandard analysis who wish to make Cauchy one of them, using the work of Cauchy's disciple the Abbe Moigno to argue for Cauchy's own intentions.

Carefully reading Gauss's 1831 paper on biquadratic residues, Schubring shows that "Gauss described the development of algebra as a process of abstraction from number domains to which an object in quantity form can be assigned. . . . Gauss legitimized both negative and imaginary numbers." Schubring adds that this discussion by Gauss was not even mentioned by the German literature of its time. And Schubring argues that Dirichlet, using a relatively little-known work of Dirksen as a stepping-off point, "powerfully derived the concept [of the integral], definition, and proof geometrically," adding that Dirichlet showed "how thorough and clear is the analytical method of proof . . . [and] that intuitiveness and rigor do not have to be in conflict with each other."

introduction, which I found interesting butsomewhat polemical. Schubring refers of-ten to “the historiography” as though therewere a single-voiced received wisdom amonghistorians of mathematics. In the interestof full disclosure, I would record that I amamong the historians whose views he crit-icizes and, I believe, somewhat mischarac-terizes, as when he asserts that by pointingto Cauchy’s use of some ideas about conver-gence in Lacroix’s textbook I meant to im-ply that Cauchy’s theory of convergence wasnot both novel and monumental, and thenlater asserts that I see Cauchy as conjuringmathematical rigor out of thin air. This re-viewer also does not share the author’s viewof the great importance of Lazare Carnot’sfoundations for the calculus, and I disagreewith Schubring’s dismissal of the influenceof Lagrange’s inequality-based proof meth-ods on Cauchy’s calculus. I also questionhis emphasizing Cauchy’s famous error (anargument that an infinite series of contin-uous functions must be continuous) while

not expounding Cauchy’s clear definitionof convergence as a basis for his inequality-based proofs of many important and fruitfulconvergence tests. And though the authorhas quite legitimately chosen to emphasizeFrench and German developments, he doesoccasionally discuss Britain, so it is sur-prising that, though he discusses HermanHankel’s Princip der Permanenz der for-malen Gesetze, he does not mention GeorgePeacock’s similar and prior work on theaxiomatic basis for algebra.The bibliography is a marvelous source

in itself, a multilingual list of sources con-sulted that runs for forty pages; Schubringhas found and effectively exploited virtuallyevery important French or German text-book for the subjects and periods covered.Still, I wish he had included, as part of thestory of negative numbers, the physics ofpositive and negative electrical charges inthe eighteenth century, and, for infinitesi-mals, Joseph Dauben’s work on AbrahamRobinson in the twentieth.There are other difficulties not of the au-

thor’s making. What Schubring calls “therelatively independent former Chapter C”about the context of the 1811 switch toinfinitesimals at the Polytechnique has, hesays, been published separately in German,though its “principal results” are summa-rized in a twelve-page section of the book;the reviewer would have liked this includedin full. Further, the editing of Schubring’sbook is not what one would expect from aquality press like Springer. The book wastranslated from the German, but sometimesjarringly (just three examples: “autonom-ization”; “intermediate value quality” for“property”; “fundament” for “basis” or“foundation”), and a few things aren’ttranslated at all (for instance, on severaloccasions, “und” for “and”). The proof-readers missed many errors (one example:“de Gelder’s intention had been two [sic]refute both”). And I wish there were anindex of subjects to go with the eight-pageindex of names.Nevertheless, this book remains a major

contribution. One need not accept each ofSchubring’s interpretations of the evidenceto value the work as a whole. The richand detailed account of textbooks and edu-

Page 24: SIAM to review

416 BOOK REVIEWS

cational institutions, and the key passagesand events Schubring highlights from thehistory of the concepts of negative numbersand infinitesimals, add greatly to our un-derstanding of the history of mathematicsin one of its most exciting periods.

JUDITH V. GRABINER

Pitzer College

Understanding Search Engines: Mathematical Modeling and Text Retrieval. Second Edition. By Michael W. Berry and Murray Browne. SIAM, Philadelphia, PA, 2005. $35.00. xviii+117 pp., softcover. ISBN 0-89871-581-4.

This short book (the main text covers a mere 101 pages) describes the steps in the process of designing and building a search engine, whether for Web or database applications. It highlights decisions that need to be made along the way, drawing attention to their ramifications for the end product. It does so in an informal, conversational dialogue with only the basics and highlights of the mathematical modeling referenced in the subtitle.

An example of one of the steps is document preparation and the requirements for automatic and manual indexing schemes. The authors step through various options and the associated benefits and pitfalls of each approach. In a similar style they step through the choices of vector space models for representing the text collection, the linear algebra technology that can be used, and many other considerations. The string of seven chapters on designing search engines ends with a thoughtful emphasis on user interfaces, a topic that impacts all users of Internet search engines.
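To give a concrete flavor of what a vector space model means here (my own toy illustration, not an excerpt from the book): documents become vectors of term counts, and a query is ranked against them by the cosine of the angle between vectors.

    import math
    from collections import Counter

    docs = ["the cat sat on the mat", "the dog sat", "cats and dogs"]
    vecs = [Counter(d.split()) for d in docs]   # raw term-count vectors

    def cosine(u, v):
        # cosine of the angle between two sparse term vectors
        dot = sum(u[w] * v[w] for w in u)
        nu = math.sqrt(sum(c * c for c in u.values()))
        nv = math.sqrt(sum(c * c for c in v.values()))
        return dot / (nu * nv)

    query = Counter("the cat".split())
    for d, v in zip(docs, vecs):
        print(round(cosine(query, v), 3), d)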

the beginning that it is not going to at-tempt to describe how to implement searchengines nor delve into any algorithmic is-sues. Instead the authors liberally sprinklereferences to technical articles and booksfor the nitty gritty details. The book closeswith a list of sources for those wanting todig deeper into various topics.I found the book easy to read and un-

derstand. As I am unfamiliar with books inthis area, indeed even with the first editionof this book, I cannot comment on where

this book stands in the field. I did find ituseful to me in understanding some of theunderpinnings of Internet search engines.

ROGER GRIMES

Livermore Software Technology Corporation

The Four Pillars of Geometry. By John Stillwell. Springer-Verlag, New York, 2005. $49.95. xii+227 pp., hardcover. ISBN 0-387-25530-3.

This is an introductory book on geometry, easy to read, written in an engaging style. The author's goal is to present geometry from several different points of view, to increase one's overall understanding and appreciation of the subject. The result is four essays, in eight chapters, more or less independent of each other, representing the "four pillars," which are Euclidean geometry, vectors, projective geometry, and transformation groups. Along the way, he presents elegant proofs of well-known theorems using these different approaches, such as the delightfully simple proof using vectors that the three altitudes of a triangle meet in a point (pp. 75–76).
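(For readers who have not seen it, that proof fits in two lines; I sketch the standard argument here, and the book's version is on the cited pages. Let $A$, $B$, $C$ be the position vectors of the vertices, and let $H$ be the intersection of the altitudes from $A$ and from $B$, so that $(H - A) \cdot (B - C) = 0$ and $(H - B) \cdot (C - A) = 0$. Since the three quantities $(H - A) \cdot (B - C)$, $(H - B) \cdot (C - A)$, and $(H - C) \cdot (A - B)$ sum identically to zero, the third one also vanishes, which says exactly that $H$ lies on the altitude from $C$.)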

to Euclidean geometry, including parallels,similar triangles, the Pythagorean theorem,the use of real numbers, and proofs of thetheorems of Desargues and Pappus.Chapters 3 and 4 are an introduction

to analytic geometry, viewing points of theplane as pairs of real numbers, and the con-sequent applications of linear algebra togeometry. This includes a study of theisometries of the plane and the theoremthat any isometry is a product of three re-flections.Chapters 5 and 6 introduce projective ge-

ometry of the plane, the importance of thecross-ratio as a projective invariant, and acomplete proof of the theorem of introduc-tion of coordinates in the projective plane.Chapters 7 and 8 discuss transformation

groups and illustrate the ideas of Klein’sErlanger Program, that a geometry is char-acterized by the group of transformationsthat leave its structures invariant. The lastchapter is an unusual introduction to thehyperbolic plane of non-Euclidean geom-etry showing that its associated group is

Page 25: SIAM to review

BOOK REVIEWS 417

the same as the group of fractional lineartransformations of a projective line.The advantage of the author’s approach

The advantage of the author's approach is clear: in a short space he gives a brief introduction to many sides of geometry and includes many beautiful results, each explained from a perspective that makes it easy to understand. As he points out, different theorems are often best explained in different contexts.

The disadvantage is that the reader may come away from the book with some false impressions and a lack of solid foundations. For example, in Chapter 3 the reader may get the impression that Descartes invented analytic geometry as we know it, namely, that a point in the plane corresponds to an ordered pair of real numbers. It is true that Descartes' Géométrie (1637) was the starting point of a series of developments that led to analytic geometry. However, Descartes did not use numbers. He assigned letters to line segments and then used algebra to solve equations, leading to a geometrical construction of a given problem. This could properly be called applications of algebra to geometry. The use of numbers appeared only gradually over the next 300 years, and the first books bearing the words "analytic geometry" in their titles were written in the early 19th century. The idea of defining a plane as the set of ordered pairs of real numbers is even more recent: it appears clearly in Hilbert's Grundlagen (1899), and I doubt if it could have been thought of much before then!

Chapter 6 may give another false impression: that it is simpler to introduce coordinates in geometry using projective geometry than Euclidean. To be sure, one can introduce coordinates in a projective plane, as is done here, by taking the theorems of Desargues and Pappus as axioms. But if one wants coordinates in a Euclidean plane, one needs to prove Desargues and Pappus. This is done earlier in Chapters 1 and 2 using Thales' theorem, but Thales' theorem is proved using real numbers, i.e., presupposing the existence of coordinates. So, from the material presented in this book, one cannot obtain the introduction of coordinates in a Euclidean plane.

I do not intend these comments to be taken too negatively. I merely wish to emphasize that this book is a well-written introduction, but only an introduction, and the discerning reader will have to go elsewhere for more depth. This need has been anticipated by the author in his list of references, and I could not complete this review without acknowledging the kind remarks he makes about my book Geometry: Euclid and Beyond, which is one of those he recommends for further reading.

ROBIN HARTSHORNE

University of California, Berkeley

Discrete and Continuous Nonlinear Schrödinger Systems. By M. J. Ablowitz, B. Prinari, and A. D. Trubatch. Cambridge University Press, Cambridge, UK, 2004. $60.00. x+257 pp., hardcover. ISBN 0-521-53437-2.

This book is devoted to the description and solution of a wide class of one-dimensional, integrable, nonlinear Schrödinger (NLS) systems. The distinguishing features of these systems are that they usually show up as envelope equations for almost monochromatic waves which have a weak second-order dispersion (or weak diffraction, if discrete) and they have a weak cubic nonlinearity. There are no limitations on how large these systems can be, although there will be stern limitations on the coefficients of these larger systems, if the system is to be integrable. There is a further limitation in that the eigenvalue problem for the Lax pair of these systems must be at worst a matrix AKNS (Ablowitz–Kaup–Newell–Segur) form. The book uses the well-established AKNS pattern to mathematically treat these general integrable NLS systems. It starts with the well-known scalar integrable case, then covers the discrete integrable case, and after that, the matrix form of the continuous and discrete NLS systems.

Applications are discussed. The importance of the scalar continuous NLS is well known, particularly for optical fibers. Unfortunately, the integrable discrete case of NLS has no current applications, although the introduction does briefly discuss the nonintegrable form of the discrete NLS (where the nonlinearity is only local), due to its technological importance.
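(For orientation, and as my addition rather than the book's notation: the scalar continuous NLS and its integrable discretization, the Ablowitz–Ladik lattice, may be written

   $i q_t + q_{xx} \pm 2|q|^2 q = 0, \qquad i\,\dfrac{dq_n}{dt} + (q_{n+1} - 2q_n + q_{n-1}) \pm |q_n|^2 (q_{n+1} + q_{n-1}) = 0,$

while the nonintegrable discrete NLS mentioned above replaces the last term by the purely local $\pm 2|q_n|^2 q_n$.)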


The treatment of each system closely follows the classical AKNS IST (inverse scattering transform) formulation. The book logically proceeds from one system to the next, starting with the simple scalar continuous NLS, then the scalar, integrable, discrete NLS. After this, it jumps to a matrix formulation of the same two systems, and treats those two. In each case, one first solves the forward scattering problem (which is to determine the analytics and asymptotics of the Jost functions and the scattering coefficients). Then one uses this data to construct the "linear dispersion relations" (solution of the Riemann–Hilbert problem), which relate any Jost function to other Jost functions and the "scattering data" (a subset of information contained in the scattering coefficients). Then, upon taking Fourier transforms of these linear dispersion relations, one can construct a set of integral equations which solves the inverse scattering problem. The authors also discuss and treat the possible symmetry reductions, trace formulas, N-soliton solutions, time evolution of the scattering data, conservation laws, and the Hamiltonian structure. This is done thoroughly each time for all four classes of these NLS systems. Upon completion of these treatments, one can well appreciate the power of the mathematics involved in the IST, as well as the generalities and similarities between these four cases.

What is absent? Well, the constraint on the eigenvalue problem excludes other NLS systems which can have different forms of the nonlinearity. In particular, the derivative NLS is excluded, as well as any other integrable form which would have a different eigenvalue problem. Also absent are the treatment of the perturbation theory of each case, along with a discussion of the closure relations for each system.

D. J. KAUP

University of Central Florida

Support Vector Machines for Pattern Classification. By Shigeo Abe. Springer-Verlag, New York, 2005. $89.95. xiv+343 pp., hardcover. ISBN 1-85233-929-2.

This broad and deep, as well as original, book is organized around the highly significant concept of pattern recognition by support vector machines (SVMs). In an information age when giga- and terabytes of high-dimensional data (information) are produced daily, one must use pieces of software ("learning machines") to analyze such huge data sets. Humans can't do it. Instead, they design clever machines that learn from huge data sets. SVMs are one of the newest and one of the best models developed for such tasks. How they recognize, learn, and/or approximate some unknown dependencies is nicely shown in this book. In particular, the topic of this innovative volume is a presentation of all the math and constructive algorithms for using SVMs in pattern recognition. The book is praxis and application oriented but with strong theoretical backing and support. Many tiny, but in other books hidden, details are presented and discussed, thereby making the SVM both an easy-to-understand learning machine and a more likable data modeling (mining) tool.

Shigeo Abe has produced the book that will become the standard in the field. In order to achieve this, he has made a respectable attempt at making the topics and techniques both modern and applicable in the best sense. In particular, he has delivered a well-balanced book. All the basic theory needed is presented in a very pleasant way, and at the same time all the experimental properties of the algorithms have been thoroughly investigated. He is a professional with experience that tells him: as soon as you start iterating (and learning from data is an iterative process), all the nice equations behave strangely, and all of them can lead you to the strangest sensations, phenomena, and results. And, just to make things sound, he follows each model with quite a number of simulation experiments and presents them in many tables. Hence, this is both a theoretically strongly grounded introduction to SVM methods and techniques and a good practical book having all the needed equations and sets of rules for an easy implementation of the models introduced. The book presents various SVM models, problems, approaches, techniques, and methods and consists of the preface and eleven chapters as follows:

1. Introduction (8 pp.)
2. Two-Class Support Vector Machines (62 pp.)
3. Multiclass Support Vector Machines (44 pp.)
4. Variants of Support Vector Machines (26 pp.)
5. Training Methods (34 pp.)
6. Feature Selection and Extraction (6 pp.)
7. Clustering (6 pp.)
8. Kernel-Based Methods (14 pp.)
9. Maximum-Margin Multilayer Neural Networks (11 pp.)
10. Maximum-Margin Fuzzy Classifiers (28 pp.)
11. Function Approximation (30 pp.)

These are followed by four appendices on conventional classifiers, matrices, quadratic programming and positive semidefinite kernels, and reproducing kernel Hilbert space, and by the references and an index. I deliberately stated the number of pages devoted to each chapter in order to convey the author's weighted approach to each particular set of techniques and models. Just from the numbers given, one can see that the book's title is right to contain the words "pattern classification," because the SVMs for solving classification tasks have been given three times as many pages as the SVM models for regression (function approximation). Short descriptions of the chapters should give slightly deeper insight into the content and coverage of the book.

Chapter 1 discusses two types of decision functions: the direct decision function, in which the class boundary is given by the curve where the decision function vanishes; and the indirect decision function, in which the class boundary is given by the curve where two decision functions take on the same value. This is a classic piece of knowledge used to set the stage for further development of the book. In addition, the benchmark data sets used across the whole book are presented here too.

In Chapter 2, the architecture of SVMs for two-class classification problems is presented. The presentation is gradual—from the hard-margin SVMs for handling linearly separable classification problems to the soft-margin SVMs for data with overlap, where the introduction of slack variables for the training data is required. A nice original part of the book begins here by presenting both types of SVMs: L1 soft-margin SVMs and L2 soft-margin SVMs, where L1 and L2 denote the linear sum and the square sum of the slack variables that are added to the objective function for training, respectively. Then a useful experimental part comes, in which the author investigates various techniques and solutions for improving the generalization ability of L1 and L2 classifiers. Several popular benchmarking data sets are used. Such an end to Chapter 2 makes the book unique—no other books go into such minute but very important details. Other authors eventually present the L1 and L2 models, but then leave it to the reader to search the references for information about their performance. With his presentation of both L1 and L2 models and their consequences, Shigeo Abe is saving readers a huge amount of time.
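(For reference, in the standard formulation, which I add here rather than quote from the book: with slack variables $\xi_i \geq 0$ in the constraints $y_i(w^{\top}\phi(x_i) + b) \geq 1 - \xi_i$, the L1 and L2 soft-margin machines train by minimizing

   $\tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i \qquad \text{and} \qquad \tfrac{1}{2}\|w\|^2 + \tfrac{C}{2} \sum_i \xi_i^2,$

respectively.)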

ered very often in SVM books. It discussesseveral methods for multiclass problems—one against-all SVMs (each class is sepa-rated from the remaining classes), pairwiseSVMs (one class is separated from anotherclass), the use of error-correcting outputcodes for resolving unclassifiable regions,and all-at-once SVMs (in which decisionfunctions for all classes are determined atonce). To resolve unclassifiable regions, inaddition to error-correcting codes, fuzzySVMs and decision-tree-based SVMs arepresented. To compare several methods formulticlass problems, the book gives a per-formance evaluation of these methods forthe benchmark data sets. Such a treatmentof the multiclass SVM, as well as detailedpresentation of several concepts and theirbenchmarking on popular and known datasets, cannot be found in other SVM books.This section enriches the book a lot.Chapter 4 is devoted to the presenta-

tion of many variants of SVMs such as theleast squares SVMswhose training results insolving a set of linear equations, linear pro-gramming SVMs, robust SVMs, Bayesian

Page 28: SIAM to review

420 BOOK REVIEWS

SVMs, and committee machines. Again, itis unusual to find such broad coverage ofvarious SVM models in a single volume.That’s why this chapter is a very useful,good, and distinctive one.There are many training methods for

There are many training methods for SVMs. Chapter 5 discusses some of them. The whole variety of different methods comes from the fact that learning in SVMs means solving a quadratic optimization problem (QP) with the number of variables equal to the amount of training data. Learning in SVMs scales with the size of the training data set, and this brings worries, since having thousands of training data pairs leads to a huge Hessian matrix. In addition, and unlike in classic QP problems, the Hessian matrices here are very, very dense. This all requires the introduction of various decomposition techniques, and Chapter 5 presents a few of them. Such a problem setting may be a particular challenge for mathematicians working in the nonlinear optimization field, for there are still many open questions unresolved.

In Chapter 6 several methods for selecting optimal features are introduced. Again, a lot of simulations using the benchmark data sets show that feature selection is important for SVMs too. It is explained that the SVMs do not require clustering because mapping into a feature space results in a clustering in the input space.

In Chapter 7, the author discusses how one can realize SVM-based clustering. One of the features of SVMs is that by mapping the input space into the feature space, nonlinear separation of class data is realized, and this fact is a basis for a clustering method. Thus the conventional linear models become nonlinear if the linear models are formulated in the feature space. The mapping into the feature space is done by kernels, and such methods are usually called kernel-based methods. Then, Chapter 8 introduces various kernel-based methods for solving different tasks: kernel least squares, kernel principal component analysis, and the kernel Mahalanobis distance. This is a valuable chapter because one does not often find topics from different fields presented in a single volume.

Chapter 9 is an original attempt to connect the maximal margin idea with classic neural networks, and the author presents methods for maximizing margins of multilayer neural networks.

Chapter 10 is a continuation of this parallel covering of related models. Here, maximum-margin fuzzy classifiers with ellipsoidal regions and with polyhedral regions are introduced. This chapter gives hints about other (possibly previous) interests of the author, and together with the material in Chapter 9 it points to possible useful extensions of the powerful intuitive idea of the maximal margin to other classification models and approaches.

Chapter 11 is devoted to regression. It presents how to extend SVMs to function approximation, and it compares the performance of SVMs with that of other function approximators. Many regressors are presented—LP SVMs, ν-SVMs, and least-squares SVMs. Robustness with respect to outliers is also discussed here. It is a pity that the author did not have time to treat regression SVMs with the same vigor as the classification ones. However, this would have meant investing almost double the time in preparing a double-size book, and one cannot ask that much of a single author.

Finally, it is worth repeating that each chapter contains both a basic theoretical presentation of the models and extensive experimental comparisons. This should help provide a better understanding of sometimes difficult concepts, but it should also sharpen the reader's skills. A distinctive feature of this volume is that it also comes with an extensive and reliable list of good references. Newcomers and cautious beginners in the field will find this list very useful.

Support Vector Machines for Pattern Classification is aimed at a broad spectrum of readers: senior undergraduate and graduate students, a wide audience of researchers, and practicing engineers and professionals in computer and information sciences, biomedical informatics, and business information systems who are willing to deepen their understanding of this broad subject. It also presents a huge area for mathematicians, whether in devising novel kernel functions suitable for different tasks or in developing novel, fast, and reliable QP solvers for dense Hessian matrices that are 1 million or more dimensional.

The area of machine learning is a huge field, and even if one focuses only on SVM models, it is extremely hard to cover all issues, questions, and open problems in a single volume. Writing a book covering such a number of issues in SVMs is a risky endeavor, but somehow the author has produced an excellent volume that covers the whole vast area of the constructive part of SVMs. Without a doubt, the most relevant aspects of tools, methods, and approaches are covered in detail and, I would say, with utmost elegance. I am not aware of a single book that covers both the basic theory of and experiments in the SVM field in this way. Therefore, this serious attempt to address these vast issues is warmly welcomed, and the author deserves all the credit.

There is no doubt that this precious volume will become a standard reference, a reliable guide, and a very useful source of information for all interested in the fields of pattern recognition and function approximation by SVMs, i.e., kernel methods. I like it and therefore highly recommend this book to all interested in the vast area of machine learning.

VOJISLAV KECMAN

The University of Auckland

Branch-and-Bound Applications in Combinatorial Data Analysis. By Michael J. Brusco and Stephanie Stahl. Springer-Verlag, New York, 2005. $69.95. xii+221 pp., hardcover. ISBN 0-387-25037-9.

This monograph is true to its purpose: developing and carefully explaining the application of branch-and-bound (B&B) algorithms for the solution of specific combinatorial optimization problems in statistics. The problems discussed involve cluster analysis (partitioning objects into clusters), seriation (ordering objects to reveal structure), and variable selection for cluster and regression analyses. These can be nasty problems, especially when the number of objects or variables is large. Clever algorithms are essential to deal with them efficiently.

Regression analysis is surely among the most widely practiced statistical methods, and applications of cluster analysis have been growing rapidly in the scientific literature. Moreover, the increasing massiveness of data analysis problems is forcing practitioners to be more aware of and concerned about computational alternatives. One of the bright features of this book is the even-handed way in which the B&B methods are presented. In each case, the authors clearly spell out not only the process but also its strengths and limitations. For example, the performance of a B&B algorithm may be impaired when the data are not well enough structured. These caveats are important because B&B is not a panacea for hard data analysis optimization problems!

An underlying theme throughout is the comparison with dynamic programming as an alternative approach. Indeed, the authors indicate that their contribution "was largely inspired by" the Hubert, Arabie, and Meulman monograph [1] on dynamic programming, which they recommend as "excellent companion reading." After explaining that dynamic programming and B&B are "the two principal enumeration strategies in the combinatorial optimization literature," Brusco and Stahl observe that the latter method tends to require much less computer storage and hence may be more appealing for large problems.

The B&B algorithms are developed in terms of four components: branching (for creating subproblems), bounding (for evaluating partial solutions), pruning (for eliminating partial solutions), and retracting (for moving backward in a partial solution). These concepts are spelled out in detail in the context of the specific applications. Moreover, pointers are provided to computer programs for implementing many of the B&B procedures, and these are amply illustrated and discussed.
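To make the interplay of these four components concrete, here is a minimal generic depth-first sketch of my own (not code from the monograph); branch, bound, is_complete, and cost are problem-specific callbacks:

```python
def branch_and_bound(root, branch, bound, is_complete, cost):
    """Generic depth-first branch-and-bound for a minimization problem."""
    best, best_cost = None, float("inf")
    stack = [root]                       # retracting = popping the stack
    while stack:
        node = stack.pop()
        if bound(node) >= best_cost:     # pruning: the lower bound on any
            continue                     # completion cannot beat the incumbent
        if is_complete(node):
            c = cost(node)
            if c < best_cost:            # new incumbent solution
                best, best_cost = node, c
        else:
            stack.extend(branch(node))   # branching: create subproblems
    return best, best_cost
```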

The K-means clustering method is one of the favorite approaches to cluster analysis. It plays a major role in this monograph. Here K refers to the number of clusters, which is unknown in advance. A simple iterative algorithm for K-means is implemented in most general-purpose statistical packages. It works roughly as follows: with K specified, initial "centers" for the clusters are chosen in some arbitrary way, objects are assigned to the closest centers based on Euclidean distance, centers are updated to be the centroids of the objects currently assigned, and the assignment process is repeated until (local) convergence is achieved. The arbitrary starting point is known to be a stumbling block, and practitioners are encouraged to experiment with different choices. As for K, the hope is that the right choice will reveal itself eventually after different values are tested.
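That iteration is only a few lines of code; the following NumPy sketch of my own shows the loop just described:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Lloyd's iteration for K-means; X has one row per object."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]   # arbitrary start
    for _ in range(n_iter):
        # assign every object to its closest center (Euclidean distance)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # update each center to the centroid of its assigned objects
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):                   # (local) convergence
            break
        centers = new
    return labels, centers
```

Different seeds correspond to the different arbitrary starting points mentioned above.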

In discussing the performance of the B&B approach for K-means clustering, Brusco and Stahl observe that it can yield optimal solutions for "hundreds of objects into as many as eight clusters, provided that the clusters are well-separated." Then they caution that the computations are "infeasible" when attempting to cluster 60 random data points with K = 6. This can be frustrating, because the practitioner will only know after the fact whether B&B will be effective on a pure clustering problem with no advance information about the presence of cluster structure. It also raises the question of how much more effective B&B will be on large problems with clear cluster structure than the simpler algorithms in today's statistical packages.

Later in the monograph, the authors tackle the really tricky problem of variable selection for clustering and develop a B&B approach to it that effectively embeds one B&B algorithm (variable selection) inside another (partitioning the objects into clusters). For the limited situation where K is fixed and one is trying to find an optimal subset of fixed size from the full set of variables under consideration, the approach is described as "computationally optimal." This presumes that the numerous runs of the K-means algorithm involved in the process will all be solved optimally. This could happen, but serious experimentation with various starting solutions would be required for each situation. And then there is the additional challenge of optimizing with respect to both K and the size of the subset.

Various multiobjective extensions of the clustering problem are also developed. One example is the search for clusters using different criteria. The B&B approach can be applied to find solutions that are optimal in a combined sense.

Under the heading of seriation, the problems involve searching for optimal permutations of sequences to reveal structure. For example, the goal might be to find the permutation of rows and columns to maximize the sum of elements above the main diagonal of an asymmetric matrix, or to push large entries away from the diagonal of a symmetric matrix, as in a quest for so-called anti-Robinson structure.

Optimization algorithms, such as B&B, are of growing importance in data analysis. Still, they are only one part of the picture. For example, in regression analysis, in addition to variable selection (perhaps using B&B), one needs to worry about the impact of outliers and other oddities in the data, transformations of variables, and the statistical significance of results. Fortunately, Brusco and Stahl do a nice job of keeping the optimization challenges in perspective while dishing out all that you probably want to know about the state of the art of B&B techniques for data analysis.

REFERENCE

[1] L. Hubert, P. Arabie, and J. Meulman, Combinatorial Data Analysis: Optimization by Dynamic Programming, SIAM, Philadelphia, 2001.

JON R. KETTENRING

Drew University

Graphs and Networks: Transfinite and Nonstandard. By Armen H. Zemanian. Birkhauser Boston, Boston, MA, 2004. $84.95. xii+202 pp., softcover. ISBN 0-8176-4292-7.

For about thirty years Zemanian has been developing a theory of infinite electrical networks. This book is the latest in a series of books [2], [3], [4] on the subject. The subject is necessarily abstract and sophisticated because infinite objects are the main objects of discourse. It then comes as no surprise that set-theoretic concepts play a central role. A paper of Flanders [1] seems to have been the source of much of the interest in infinite electrical networks. Flanders showed that some of the results of finite network theory can be wrong if not carefully phrased. For example, Kirchhoff's loop and node laws may not uniquely determine a voltage-current regime in an infinite network. However, if it is required that the power dissipated be finite, then the regime is uniquely determined in an "infinite network" with an appropriate meaning of the term infinite network. Zemanian has suggested at least three different ways to define the terms infinite graph and infinite network.

The book that is the subject of this review is the latest development of this subject. The first four chapters review concepts and terminology introduced earlier in [2], [3], and [4]. However, some changes have been made in certain definitions (such as the definition of bordering and boundary nodes) and proofs, so the first few chapters are important not only to remind the reader of the terms, but also to give an improved or alternate treatment of some earlier results. Chapter 4 introduces the idea of an ordinal-number-valued distance in a transfinite graph, but it is somewhat defective since the relation of being connected (by a k-path) is not transitive. Chapter 5 remedies this defect by introducing an alternate definition of transfinite graph that relies on walks and not paths. With this definition the ordinal-number-valued distance is a metric. Chapters 6 and 7 introduce hyperreal numbers as a way of making calculations involving Kirchhoff's laws. The graphs are still standard, even when infinite (transfinite). The real leap is in Chapter 8, where the graphs themselves are nonstandard. The ideas of Chapter 8 are not fully developed, in the sense that there seem to be many questions which might be addressed in future work. Only nonstandard 0-graphs and nonstandard 1-graphs are discussed, and it may be a long story to develop a theory of nonstandard µ-graphs, where µ is a limit ordinal.

The development is too technical to discuss in this review. An interested reader should get oriented by consulting Zemanian's earlier works and papers. There does not yet seem to be a large following of researchers in this area, but it seems very attractive and ripe for investigation. It's intriguing to see the connections between set theory and electrical network problems.

To give the reader a taste of this subject, a sparse introduction will now be given. The "simplest" types of transfinite graphs (they might be called standard path-based) are constructed by a method that closely resembles transfinite induction. They are constructed in order of rank. The first rank (actually rank $(-1)$) consists of the $(-1)$-graphs.

The author introduces some terminology that is not common in graph theory. The basic object is a branch $b$ (think "edge"). These are the building blocks. Each branch $b = \{t_{b_1}, t_{b_2}\}$ is a set with two distinct elements called tips. All branches are disjoint, and the union $\mathcal{T}^{-1}$ of the branches is the set of all $(-1)$-tips. That's all there is in rank $(-1)$. The next rank is a 0-graph. The set of all $(-1)$-tips is partitioned into disjoint nonempty subsets, $\mathcal{T}^{-1} = \cup_\tau \mathcal{T}_\tau$. These subsets may contain infinitely many $(-1)$-tips. Each of these sets is called a 0-node of the 0-graph. The set of 0-nodes is denoted by $\mathcal{X}^0$. A branch $b \in \mathcal{B}$ is incident to a node $x \in \mathcal{X}^0$ if $b \cap x \neq \emptyset$. Notice that $x = \mathcal{T}_\tau$ for some $\tau$. This is the description of a standard (possibly infinite) graph $G = \{\mathcal{B}, \mathcal{X}^0\}$. The construction of 1-graphs is the typical induction step. The crucial idea is that of a 0-tip. A 0-tip is an equivalence class of infinite paths of the form $x_1, b_1, x_2, b_2, \ldots$, where the 0-nodes and branches are all distinct (no self-intersections). Two such paths are equivalent if they agree except for finitely many branches and nodes. This describes the set $\mathcal{T}^0$ of 0-tips. This set is now partitioned into a union of disjoint nonempty sets $\mathcal{T}^0 = \cup_\tau \mathcal{T}^0_\tau$. For each $\tau$ in the index set there is given a set $\mathcal{X}^0_\tau$ which is either empty or consists of a single 0-node $x^0_\tau$. The sets $\mathcal{X}^0_\tau$ are disjoint. A 1-node $x^1_\tau$ is defined by $x^1_\tau = \mathcal{T}^0_\tau \cup \mathcal{X}^0_\tau$. The set of 1-nodes is denoted by $\mathcal{X}^1$. A 1-graph is the triple $G^1 = \{\mathcal{B}, \mathcal{X}^0, \mathcal{X}^1\}$. We now have graphs of ranks $-1$, $0$, $1$.

The first few steps don't adequately describe the inductive procedure. The crucial concept is that of an infinite k-path. To oversimplify, an infinite k-path is an infinite sequence of distinct k-nodes, each connected by a sequence of nodes of lower rank. This process leads to a (k+1)-graph for all finite ordinal numbers k. To reach a countable limit ordinal ν, an intermediate preceding graph is constructed which forms a partial step to this rank. Unfortunately, this single step is different from the other steps, so the process is not truly transfinite recursion, but it is very close.

Figure 1 shows an example of a 2-graph.

The solid circles such as a are 0-nodes; middle-sized open circles such as b and c are 1-nodes; and the large open circle, d, is a 2-node. The dashes between b and c designate an infinite sequence of 0-nodes. There is an infinite sequence of 1-nodes between c and d. Node d embraces a 1-node and a 0-node.

Fig. 1. A 2-graph (node labels a, b, c, d as described above).

To understand these concepts fully the reader must consult the book under review. The reviewer highly recommends devoting the effort needed to understand these original and surprising concepts.

REFERENCES

[1] H. Flanders, Infinite Networks: I. Resistive Networks, IEEE Trans. Circuit Theory, 18 (1971), pp. 326–331.

[2] A. Zemanian, Infinite Electrical Networks, Cambridge University Press, New York, 1991.

[3] A. Zemanian, Transfiniteness for Graphs, Electrical Networks, and Random Walks, Birkhauser Boston, Boston, 1996.

[4] A. Zemanian, Pristine Transfinite Graphs and Permissive Electrical Networks, Birkhauser Boston, Boston, 2001.

JAMES ALLEN MORROW

University of Washington

A First Course in Differential Equations. By J. David Logan. Springer-Verlag, New York, 2005. $79.95. xvi+289 pp., hardcover. ISBN 0-387-25963-5.

David Logan laments the size of the typical ODE textbook used for the standard one-semester course that follows calculus. He's done a remarkably good job at this assigned writing task. I'm reminded of Ince's classic [1], which sought to teach the material from a short list of problems and a very parsimonious description of the techniques needed.

Since then, Boyce and DiPrima's classic [2], now in its eighth edition, has dominated the market and spawned lots of lookalikes, all getting more obese with time.

Logan, the Willa Cather Professor of Mathematics at Nebraska, has been a successful applied mathematician and the author of fine texts on applied mathematics and partial differential equations. Here he's found a way to present the important traditional topics from a fresh perspective. In part, this results because this ODE course is an introduction to both continuous modeling and the use of software such as Maple and MATLAB, while the book still contains most basic analytical methods and illustrates all results graphically.

Lecturers can certainly add to Logan's outline. Overall, I think it has been wisely selected. I, for one, would do somewhat more on the variation-of-parameters formula for second-order linear equations, since sophomores are always tempted to misinterpret it. Likewise, I would do a bit more with classifying first-order planar systems, since the explicit results are so descriptive and teach one about stability. On the other hand, I wouldn't spend time on Laplace transforms, since these students can only invert them by using tables. The final chapter on nonlinear systems is particularly impressive. It makes real progress by concentrating on simple (largely biological) models and by outlining the essential big picture.

gan’s book, but they can instead learn muchof practical value. Maybe they’ll also bemotivated to continue to learn more aboutthe subject after that semester, so they cansolve more significant applied problems.

REFERENCES

[1] E. L. Ince, Integration of Ordinary Differential Equations, Oliver and Boyd, Edinburgh, UK, 1939.

[2] W. E. Boyce and R. C. DiPrima, Elementary Differential Equations, John Wiley, New York, 1965.

ROBERT E. O’MALLEY

University of Washington


Nonparametric Statistical Methods for Complete and Censored Data. By M. M. Desu and D. Raghavarao. Chapman and Hall/CRC, Boca Raton, FL, 2004. $79.95. xiv+367 pp., softcover. ISBN 1-58488-319-7.

In the context of literary criticism, Edgar Allan Poe suggested that no review should ever be either completely favorable or completely unfavorable. In the case of the book by Desu and Raghavarao, referred to as D&R in the following discussion, this advice is easy to follow, since the book has both a number of strong points and some very weak ones. The strengths of the book are the range of nonparametric analysis methods it covers, the reasonably extensive and up-to-date references cited, and its novel treatment of nonparametric methods for censored samples. Unfortunately, a critical weakness of this book is its poor organization: many interesting nonparametric tests are described, in varying levels of detail, but enormous effort is required on the part of the reader to see the connections between them, or to relate them to other statistical ideas.

For example, the following is a typical definition of a nonparametric procedure [5, p. 63]: "any statistical procedure which has certain properties holding true under very few assumptions made about the underlying population from which the data are obtained." One of the surprising features of D&R is that the authors never define nonparametric procedures, assuming that the reader knows, first, what they are and, second, that better-known parametric procedures are inadequate to their particular analytical task. Thus, while the authors give reasonably detailed descriptions of at least eight alternatives to the classical t-test, this test is only discussed in two contexts: first (on page 125), as a basis for computing asymptotic relative efficiencies for nonparametric tests, and second (on page 192), as the basis for the nonparametric rank-transformed t-test.

The standard t-test is a parametric procedure for testing the hypothesis that the means of two random samples are distinct, based on the working assumption that both are Gaussian and statistically independent. That is, suppose $\{x_k\}$ is a sequence of $n_x$ observations with distribution $N(\mu_x, \sigma_x^2)$ and $\{y_k\}$ is a sequence of $n_y$ observations with distribution $N(\mu_y, \sigma_y^2)$. Under the additional assumption that both variances—while unknown—are equal, the test statistic $u = (\bar{x} - \bar{y})/\Delta$ has a t-distribution with $n_x + n_y - 2$ degrees of freedom under the null hypothesis that $\mu_x = \mu_y$ [4, p. 216], where $\bar{x}$ is the mean of the sequence $\{x_k\}$, $\bar{y}$ is the mean of the sequence $\{y_k\}$, and the denominator $\Delta$ is given by

(1)  $\Delta = \sqrt{\dfrac{\sum_{k=1}^{n_x}(x_k-\bar{x})^2 + \sum_{k=1}^{n_y}(y_k-\bar{y})^2}{n_x+n_y-2}\left(\dfrac{1}{n_x}+\dfrac{1}{n_y}\right)}.$

Since this distribution is known, we can compute $u$ and reject the null hypothesis of equal means if the resulting probability is unacceptably small (e.g., less than 5%). Extensions of the t-test are available for cases where the two sample variances are both unknown and unequal [4, p. 216], but nonparametric alternatives are appropriate in cases where the data distribution is either unknown or known to be strongly non-Gaussian.

One of the well-known nonparametric alternatives to the t-test discussed in D&R is the Wilcoxon rank-sum test, based on the following idea. We are given two samples as before, but we now assume only that $\{x_k\}$ has a distribution $F(x)$—not necessarily Gaussian—and that $\{y_k\}$ has the distribution $F(y - \Delta)$, of the same functional form but with a possibly different median value. The objective here is to test the null hypothesis that both distributions are the same (i.e., that $\Delta = 0$) against an alternative hypothesis like $\Delta > 0$, $\Delta < 0$, or $\Delta \neq 0$. The Wilcoxon rank-sum test proceeds by first forming the combined sample of size $N = m + n$,

(2)  $\{Z_1, Z_2, \ldots, Z_N\} = \{x_1, \ldots, x_m, y_1, \ldots, y_n\},$

and then computing the ranks $R_i$ of the $y_i$ values in this combined sample. These ranks are summed to obtain the statistic

(3)  $W_Y = \sum_{i=1}^{n} R_i,$

and unusually large or unusually small values of $W_Y$ are viewed as evidence that $\Delta > 0$ or $\Delta < 0$, respectively. The question of what constitutes "unusually large" or "unusually small" can be decided on the basis of probabilities computed exactly for small samples under the assumption of statistical independence of the individual observations $\{Z_k\}$, but it is more commonly decided by appealing to asymptotic normality arguments. D&R discusses both approaches for this test and many of the other nonparametric alternatives considered, frequently giving published rules of thumb concerning minimum sample sizes for asymptotic normality assumptions to be reasonable.
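As a quick illustration of how little computation (2) and (3) require, here is a short sketch of my own (SciPy's rankdata assigns midranks in the event of ties):

```python
import numpy as np
from scipy.stats import rankdata

def wilcoxon_rank_sum(x, y):
    """W_Y of eq. (3): the sum of the ranks of the y-observations
    in the combined sample of eq. (2)."""
    z = np.concatenate([x, y])      # combined sample {Z_1, ..., Z_N}
    ranks = rankdata(z)             # ranks R_i (midranks for ties)
    return ranks[len(x):].sum()     # keep only the ranks of the y_i
```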

One of the strongest points of D&R is its unified treatment of a number of nonparametric tests like this one, for three cases: the case of continuous random variables where ties are not possible (i.e., $Z_i = Z_j$ is a zero-probability event), the practically important case where ties do occur, and the case of censored random variables that is particularly important in biostatistics applications. For the Wilcoxon rank-sum test, the basis for this unified treatment is Noether's alternative reformulation in terms of w-ranks, defined as [D&R, p. 92]

(4)  $w(Z_h) = (\text{Number of } Z_k < Z_h) - (\text{Number of } Z_k > Z_h) = \sum_{k=1}^{N} \operatorname{sign}(Z_h, Z_k),$

where the sign function is defined as

(5)  $\operatorname{sign}(a, b) = \begin{cases} -1, & a < b, \\ 0, & a = b, \\ +1, & b < a. \end{cases}$

Under the null hypothesis, the w-ranks represent a sequence of zero-mean random variables with an asymptotically normal limiting distribution. This observation leads to the test statistic

(6)  $Z_{W^*} = \dfrac{\sum_{h=m+1}^{N} w(Z_h)}{\left[\dfrac{m(N-m)}{N(N-1)} \sum_{h=1}^{N} w^2(Z_h)\right]^{1/2}},$

interpreted as a zero-mean, unit-variance Gaussian random variable under the null hypothesis. One advantage of this formulation over the original Wilcoxon rank-sum test is that, in the event of ties between the observations $Z_k$, it is necessary to modify the definition of ranks on which the Wilcoxon statistic is based. Since the sign function remains well defined in the event of ties, no modification of the w-rank formulation is required.
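Definitions (4)–(6) translate almost literally into code; the following is an illustrative sketch of my own, not code from D&R:

```python
import numpy as np

def w_ranks(z):
    """Eq. (4): w(Z_h) = #{Z_k < Z_h} - #{Z_k > Z_h}; the sign function
    of eq. (5) is np.sign(zh - zk), which stays well defined under ties."""
    return np.array([np.sign(zh - z).sum() for zh in z])

def z_w_star(x, y):
    """Eq. (6): asymptotically standard normal under the null hypothesis."""
    z = np.concatenate([x, y])
    m, N = len(x), len(x) + len(y)
    w = w_ranks(z)
    num = w[m:].sum()       # w-ranks of the y-observations
    den = np.sqrt(m * (N - m) / (N * (N - 1)) * (w ** 2).sum())
    return num / den
```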

Another advantage of the w-rank formulation is that it extends easily to the case of censored observations. Again, it is ironic that although the term appears in the title of D&R, the authors never define "censored data," which can take a variety of forms. (For example, Hajek, Sidak, and Sen [2] devoted 16 pages to the treatment of censoring, considering five different censoring models.) The model assumed by D&R is random right-censoring, in which we observe either the true value $x$ of a random variable or a censored value $x^+$ that is smaller than the (unknown) true value $x$ by a random amount. One of the most common applications of this model is in medical studies where the continuous variable of interest is the time to some significant event, such as death, heart attack, or organ transplant failure. Patients who experience these events during the course of the study have a known survival time relative to the start of the study, while those who are fortunate enough not to experience these events during the study, or who dropped out during the study, exhibit censored survival times. That is, we know only that they survived longer than some known time $x^+$. Further, the censoring model adopted by D&R assumes that the unobserved difference $x - x^+$ is statistically independent of the true value $x$, leading to what is known as noninformative censoring. Hajek, Sidak, and Sen [2, p. 148] noted that this assumption, while widely made in practice, "deserves a careful appraisal."

The w-rank reformulation of Wilcoxon's rank-sum test can be extended easily to the censored data model just described by simply redefining the sign function as follows. Let $a \prec b$ denote "$a$ is decidedly below $b$," meaning that either $a < b$ (i.e., both values are uncensored) or $a \leq b^+$ (i.e., the censored value $b^+$ is at least as large as the uncensored value $a$). Note that two censored values are not comparable under this partial order, since $a < b$, $a = b$, or $a > b$ are all consistent with $a^+ < b^+$ for two censored values. The modified sign function is obtained by replacing the arithmetic inequality $<$ appearing in (5) with the partial order relation $\prec$, i.e.,

(7)  $\operatorname{sign}(a, b) = \begin{cases} -1, & a \prec b, \\ +1, & b \prec a, \\ 0, & \text{otherwise.} \end{cases}$

Substituting this definition of $\operatorname{sign}(a, b)$ into (4) leads to a simple modification of the test statistic defined in (6) known as Gehan's Wilcoxon test, which again exhibits a standard normal limiting distribution, under the assumption that the censoring distributions are the same for $\{x_k\}$ and $\{y_k\}$. Relaxing this assumption leads to the Tarone and Ware test, which the authors show can be cast into a common framework with Gehan's Wilcoxon test and the logrank test, another nonparametric test that can be applied to test the distributional equality of censored data samples.
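In code, only the comparison changes relative to the sketch above. Here is an illustrative implementation of (7) (my own; each observation is represented as a hypothetical (value, censored) pair, with value being the observed, possibly censored, time):

```python
def decidedly_below(a, b):
    """The partial order of the text: a is "decidedly below" b.  Works on
    (value, censored) pairs under the random right-censoring model."""
    va, ca = a
    vb, cb = b
    if ca and cb:              # two censored values are not comparable
        return False
    if not ca and not cb:      # both uncensored: ordinary inequality
        return va < vb
    if not ca and cb:          # uncensored a, censored b: a <= b+
        return va <= vb
    return False               # censored a, uncensored b: not decided

def sign_censored(a, b):
    """Eq. (7): -1 if a precedes b in the partial order, +1 if b precedes a,
    and 0 otherwise."""
    if decidedly_below(a, b):
        return -1
    if decidedly_below(b, a):
        return +1
    return 0
```

Substituting this sign function into the w-rank computation yields a Gehan-style statistic.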

The preceding discussion illustrates one of the stated objectives of D&R and probably its greatest strength: the unified treatment of nonparametric procedures for both censored and uncensored real-valued data observations. Unfortunately, this strength is greatly diminished by the book's haphazard organization. For example, the Wilcoxon rank-sum procedure is introduced on page 88, after which w-ranks are introduced on page 92 in a discussion of the treatment of ties, where (6) is given explicitly. Gehan's Wilcoxon test for the censored case is then treated on page 114, following discussions of proportional hazards models, the Savage scores test, P-P plots for two-sample problems, and detailed treatments of linear rank statistics for a variety of different distributions. When Gehan's Wilcoxon test is presented, a slightly different notation is used so that (6) becomes two equations, and a forward reference is given in Remark 2.17 to section 2.6.3 describing Tarone and Ware's extension for unequal censoring distributions, but these names are not mentioned; instead, it is only noted that "this modification is based on the work of Breslow (1970)," a reference that is not cited in section 2.6.3. Similarly, the authors conclude Chapter 3, "Procedures for Paired Samples," with a one-paragraph subsection on the "paired Prentice-Wilcoxon test," noting that this test, proposed by O'Brien and Fleming in 1987, was recommended by Woolson and O'Gorman in a comparative study published in 1992. Unfortunately, the paragraph says nothing about what tests were compared in this study, and it provides only the following description of the Prentice-Wilcoxon test: "This is a generalization of the procedure discussed in Subsection 3.3.3 for the censored data, where instead of ranks, Prentice–Wilcoxon scores are used. For further details, see O'Brien and Fleming (1987)." Section 3.3.3 is a half-page description of the rank-transformed t-test mentioned at the beginning of this review. Despite repeated attempts to do so, I have been unable to find any discussion of the Prentice–Wilcoxon scores in D&R: the closest approximation seems to be a discussion of Prentice–Peto Wilcoxon scores for randomized complete block designs with censored data given two chapters later, on page 293.

Ultimately, the greatest weakness of D&R is that it provides little or no help in placing the wide variety of ideas and results it presents in some larger statistical perspective. The book is almost exclusively concerned with hypothesis testing (indeed, even the short sections on linear and logistic regression given in Chapter 6, "Independence, Correlation, and Regression," are primarily concerned with testing the significance of estimated parameters), and it adopts the working assumption that the reader knows both that they need to use a nonparametric test procedure and why. For example, the notion of asymptotic relative efficiency (ARE) is introduced and specific results are given for two nonparametric tests relative to the t-test for Gaussian data distributions. Since the t-test is optimal for Gaussian data, these efficiencies are less than 1. These observations, together with an uncritical acceptance of the historically popular Gaussian data hypothesis, would lead to the clear conclusion that these nonparametric procedures are inferior to the better-known t-test: Why use them? The answer—which is not discussed in D&R—is that nonparametric procedures provide significant protection against the nonnormality of real data. Indeed, this observation is the reason that rank-based procedures closely related to many of those discussed by D&R form an important class of robust estimation procedures (specifically, R-estimators [3, sect. 3.4]), designed to be resistant to departures from normality assumptions. To illustrate the importance of addressing these departures, Huber [3, p. 2] discussed a debate between Eddington and Fisher in the early 20th century over the relative merits of two scale estimates—the mean absolute deviation for a data sequence $\{x_k\}$ with mean $\bar{x}$,

(8)  $d = \dfrac{1}{N} \sum_{k=1}^{N} |x_k - \bar{x}|,$

and the corresponding standard deviation estimate,

(9)  $s = \left[\dfrac{1}{N} \sum_{k=1}^{N} (x_k - \bar{x})^2\right]^{1/2}.$

For Gaussian data, the ARE of $d$ relative to $s$ is 0.876, which seemed to settle the debate for a long time: $s$ was more efficient, hence better. Conversely, for Gaussian data contaminated with outliers having the same mean but three times larger standard deviation, 0.2% contamination is enough to offset this efficiency advantage, while at 5% contamination the ARE of $d$ relative to $s$ is 2.035, demonstrating the clear advantage of this robust alternative for realistic departures from Gaussianity. Even a single example, preferably near the beginning of the book, demonstrating when and how a nonparametric test procedure is preferable to its traditional parametric counterpart would significantly enhance D&R, especially since it is offered as a textbook [D&R, Preface, p. vii]. Similarly, potential students are likely to struggle with results like those given for Example 5.7 on page 293, which presents a small case study comparing the effectiveness of two treatments with a control. Results are presented for two different nonparametric test procedures, one of which declares the treatments different at the 5% significance level, while the other does not. Unfortunately, no further discussion of this contradiction is given, nor are recommendations or advice offered to help the student decide what to do next.
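Huber's contamination numbers are easy to probe by simulation. The sketch below (my own rough finite-sample Monte Carlo proxy for the ARE, not a computation from either book) compares the standardized variability of $d$ and $s$ when each observation is, with probability eps, drawn from a Gaussian with the same mean but three times the standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)

def efficiency_d_vs_s(eps, n=100, reps=5000):
    """Monte Carlo proxy for the efficiency of d (mean absolute deviation)
    relative to s (standard deviation) under eps-contaminated Gaussian data:
    the ratio of the squared coefficients of variation of the two estimates."""
    bad = rng.random((reps, n)) < eps                  # contamination mask
    x = rng.standard_normal((reps, n)) * np.where(bad, 3.0, 1.0)
    xbar = x.mean(axis=1, keepdims=True)
    d = np.abs(x - xbar).mean(axis=1)                  # eq. (8)
    s = np.sqrt(((x - xbar) ** 2).mean(axis=1))        # eq. (9)
    return (s.var() / s.mean() ** 2) / (d.var() / d.mean() ** 2)

print(efficiency_d_vs_s(0.00))   # roughly 0.88: s wins for pure Gaussian data
print(efficiency_d_vs_s(0.05))   # roughly 2: d wins at 5% contamination
```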

Another important limitation of D&R is the authors' exclusive focus on the use of the SAS computing environment. While it is true that SAS is very widely used, especially in biostatistics, the consequences of this restriction in focus are most evident in the final chapter of the book. Chapter 7, "Computer-Intensive Methods," is only seven pages long, with two and a half pages devoted to permutation and randomization tests and three pages devoted to bootstrap methods. A number of useful references are cited, but the chapter concludes with the following sentence: "It may be noted that many of the computer programs for implementing the bootstrap methods are written in S+." This effective dismissal of bootstrap methods is most unfortunate, particularly in view of their utility in testing relatively complex hypotheses, such as the multimodality hypothesis discussed by Efron and Tibshirani, where simple alternatives are not available [1, sect. 16.5]. An example more directly related to the topics considered by D&R is the bootstrap-based extension of the Wilcoxon–Mann–Whitney (WMW) test recently described by Reiczigel, Zakarias, and Rozsa [6]. Like the Wilcoxon rank-sum test described above, the WMW procedure tests the null hypothesis of equality of sample distributions ($\Delta = 0$ in the preceding discussion) against the shift alternative $\Delta \neq 0$. The bootstrap test described by Reiczigel, Zakarias, and Rozsa tests the null hypothesis $P(X < Y) = P(X > Y)$, analogous to but not equivalent to $\Delta = 0$, against the alternative hypothesis $P(X < Y) \neq P(X > Y)$, which is much more general than $\Delta \neq 0$ since it does not assume that the two sequence distributions have the same shape and differ only in a location parameter. Motivation for this development was the need for a useful alternative to the WMW test when significant differences in scale or distributional shape (e.g., skewness) were possible, since it is known that the WMW test can perform poorly under these circumstances.
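To convey the flavor of such a test, here is a generic bootstrap sketch of my own for H0: $P(X < Y) = P(X > Y)$; it is not the specific procedure of Reiczigel, Zakarias, and Rozsa, and it uses the centered-statistic device that is one common way to mimic the null:

```python
import numpy as np

rng = np.random.default_rng(0)

def stat(x, y):
    """Estimate of P(X < Y) - P(X > Y), averaged over all pairs."""
    return np.sign(y[None, :] - x[:, None]).mean()

def bootstrap_p_value(x, y, B=2000):
    """Two-sided bootstrap p-value for H0: P(X < Y) = P(X > Y).
    Each sample is resampled separately; the bootstrap distribution is
    centered at the observed statistic to mimic the null hypothesis."""
    t0 = stat(x, y)
    ts = np.empty(B)
    for b in range(B):
        xs = rng.choice(x, size=len(x), replace=True)
        ys = rng.choice(y, size=len(y), replace=True)
        ts[b] = stat(xs, ys) - t0
    return float(np.mean(np.abs(ts) >= abs(t0)))
```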

Overall, it would be difficult to recommend the book by Desu and Raghavarao as a textbook, despite the fact that it contains some very useful results and some novel ways of relating these results. With the aid of other references, like some of those cited below, to give a broader perspective to the material, and by consulting the original references cited in the book, I believe a dedicated reader could benefit significantly from working through the material presented here. Unfortunately, much of this material must be "pondered, weak and weary" like Poe's quaint and curious volumes of forgotten lore.

REFERENCES

[1] B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall/CRC, New York, 1993.

[2] J. Hajek, Z. Sidak, and P. K. Sen, Theory of Rank Tests, Academic Press, San Diego, CA, 1999.

[3] P. J. Huber, Robust Statistics, John Wiley, New York, 1981.

[4] A. Madansky, Prescriptions for Working Statisticians, Springer-Verlag, New York, 1988.

[5] E. B. Manoukian, Modern Concepts and Theorems of Mathematical Statistics, Springer-Verlag, New York, 1986.

[6] J. Reiczigel, I. Zakarias, and L. Rozsa, A bootstrap test of stochastic equality of two populations, Amer. Statist., 59 (2005), pp. 1–6.

RONALD K. PEARSON

ProSanos Corp.

Mathematical Modeling in Continuum Mechanics. Second Edition. By Roger M. Temam and Alain M. Miranville. Cambridge University Press, Cambridge, UK, 2005. $50.00. xii+342 pp., hardcover. ISBN 0-521-61723-5.

This text is addressed to beginning graduate students. It begins with an introduction to the basic concepts of continuum mechanics. This is followed by a chapter on fluids, which includes sections on magnetohydrodynamics, combustion, and modeling of the atmosphere and ocean. The third part of the book is devoted to linear and nonlinear elasticity. The final part of the book focuses in some depth on linear and nonlinear waves.

The choice of topics is well matched with the research interests of applied mathematicians, and the presentation of the material emphasizes issues of interest to analysts, more so than traditional texts on continuum mechanics. While no technical background in functional analysis or partial differential equations is required or provided, the book lays a foundation which will help the student put these fields into an applied context. The text includes many useful exercises.

I have a few criticisms. The back cover refers to a "brisk" style. In places, I find it so brisk that it is not clear what is accomplished. For instance, a section titled "Chemistry of the Atmosphere and the Ocean" really does nothing more than state a complex system of coupled equations. The section on boundary layers nicely presents a simple ODE example, but does little to convey a sense of the complexity associated with actual high-Reynolds-number boundary layers. The coverage of more physical aspects is not always as strong as the more mathematical side. A section on "Hyperelastic Materials in Biomechanics" contains nothing about any specific problem or material in biomechanics; it introduces some special elastic constitutive models without discussion of alternatives, plusses and minuses, etc. A very short section on non-Newtonian fluids says that "even though this model is not necessarily very realistic, we usually consider . . . [the Reiner–Rivlin fluid]." What is usual and who is we?

Notwithstanding such shortcomings, the book will be a valuable addition to the literature. For other texts at the same level, Gurtin's text, An Introduction to Continuum Mechanics, comes to mind. In comparing the two, Gurtin places more emphasis on the foundations of continuum mechanics, while this text places more emphasis on continuum mechanics as an applied field for partial differential equations.

MICHAEL RENARDY

Virginia Polytechnic Institute and State University

Constrained Optimization and Image Space Analysis. Volume 1. Separation of Sets and Optimality Conditions. By Franco Giannessi. Springer SBM, New York, 2005. $149.00. xii+395 pp., hardcover. ISBN 0-387-24770-X.

This is a monograph on the mathematics of optimization, primarily in finite-dimensional spaces. It begins with a chapter introducing the reader to constrained optimization, then covers in succession the elements of convex analysis in finite dimensions, the image space analysis mentioned in the title, results on separation (much more general and extensive than those often seen in optimization books), and the development of optimality conditions. As this list indicates, the primary emphasis is on nonlinear continuous optimization, but there is also mention of some topics in discrete optimization. As written, this is a very substantial and useful monograph, but it is probably less suitable for use as a text without supplementation (in particular, exercises). However, it would be an excellent reference in a course at the graduate level in engineering or the advanced undergraduate level in mathematics, as well as for working mathematicians interested in optimization. This book is of unusually fine quality, and though it requires careful reading it will amply repay the time required.

One thing that makes this book stand out from many others is, of course, its emphasis on the image space viewpoint: that is, looking at the images of the objective and constraint functions and deducing optimality conditions (and other information) from their properties, e.g., by the use of various kinds of separation in the space of those images rather than in the space of the problem's decision variables. In my view, the optimization community has not sufficiently appreciated the power of this approach to an optimization problem, and I hope this book will reawaken interest in it.

A notable feature of this work is the depth and completeness of coverage of many of the topics. The author not only presents the ideas, but also places them in perspective and connects them in ways that are only possible if one has understood the material well, thought carefully about it over time, and then formulated a multidimensional view that includes the links and relationships among concepts as well as the facts about them. The author explores these links and relationships at some length in the sections titled "Comments," at the ends of the chapters, as well as elsewhere. In fact, these comment sections often present minihistories of the development of some of the basic ideas, with the author's incisive comments on previous work and on desirable future directions. A good example is the discussion of the Farkas lemma on page 294, where the author points out how the basic idea has been generalized, sometimes by workers who seemed to be less than completely familiar with the original.

Another aspect of depth and completeness is the extensive use of examples. Good examples are excellent teaching tools, but they are also at least as useful to the working research mathematician as they are to the student. They abound in this book, and we are in the author's debt for them. Just to cite one such example, the presentation of the Peano function in Example 2.4.1 (p. 121) and its discussion in the comment on page 137 point out an error made by Lagrange and by many others subsequently. It's extremely instructive, yet I wonder how many students of optimization these days know about it. The author has done the community a real service in presenting this and similar examples to help us appreciate the finer points of the subject.

Although I think the author has done very well indeed in his part of the task, I can't say the same for the publisher, which appears to have done no copyediting at all. Typographical errors of all kinds abound: for example, "istance" for "instance" (p. 53), "Griinbaum" for "Grunbaum" (p. 138, with the correct spelling on p. 139!), "Sacks" for "Saks" (p. 141), "Theirie" for "Theorie" (p. 305), "ta" for "the" and "temptative" for "tentative" (p. 364), "Foudas" for "Floudas," "Kluver Academie" for "Kluwer Academic," and "od" for "of" (p. 376). In addition to these there are many other slips that a good copyeditor would have caught, such as bad spacing, incorrect accents, and nonstandard journal abbreviations. With the author's having put so much effort and care into the writing of this book, it is regrettable that the publisher has taken so little care about its part of the job. It is especially sad to see this in a book under the imprint of Springer, which the community has been accustomed to regard as a house with high standards and a distinguished publisher of mathematics.


This book should be welcome on the shelf of any mathematician who wants to understand important parts of the theory underlying continuous optimization and who enjoys learning about the context as well as the content of that theory. The title suggests that Volume 2 may be underway; let us hope it will appear soon.

STEPHEN M. ROBINSON

University of Wisconsin–Madison

Orthogonal Polynomials: Computation and Approximation. By Walter Gautschi. Oxford University Press, New York, 2004. $119.50. x+301 pp., hardcover. ISBN 0-19-850672-4.

The occasional user of the computational aspects of orthogonal polynomials is familiar with the orthogonal polynomials for the classical weight functions $w(x) = (1-x)^{\alpha}(1+x)^{\beta}$ ($\alpha, \beta > -1$) on the interval $[-1, 1]$ (Jacobi, Gegenbauer, Legendre, and Chebyshev polynomials), $w(x) = x^{\alpha}e^{-x}$ ($\alpha > -1$) on $[0, \infty)$ (Laguerre), or $w(x) = e^{-x^2}$ on $(-\infty, \infty)$ (Hermite). In particular, their application in Gaussian quadrature and function approximation (particularly for the Chebyshev case) is a topic described in many books. In these and related cases many properties of the orthogonal polynomials are known. For their numerical evaluation, explicit relations or the three-term recursion relations can be used. The same recurrence relation, with known analytical formulas for its coefficients, suffices to compute the zeros and weights for the associated Gauss quadrature by solving an eigenvalue problem. In addition, extensive tables are available in the literature or on the Web. In contrast to these very classical cases, when one needs to compute orthogonal polynomials and related quadrature formulas for other weights, explicit formulas for the recurrence coefficients are not available and numerical methods are needed to compute these coefficients and/or the nodes and weights of Gaussian quadrature.
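The eigenvalue problem alluded to here is that of the symmetric tridiagonal Jacobi matrix built from the recurrence coefficients (the classical Golub–Welsch construction). A minimal NumPy sketch of my own for the Legendre weight $w(x) = 1$ on $[-1, 1]$, whose coefficients are known in closed form:

```python
import numpy as np

def gauss_legendre(n):
    """Golub-Welsch: nodes and weights of n-point Gauss-Legendre quadrature.
    For w(x) = 1 on [-1, 1], the monic three-term recurrence has alpha_k = 0
    and beta_k = k^2 / (4k^2 - 1), with mu_0 = integral of w = 2."""
    k = np.arange(1, n)
    off = np.sqrt(k ** 2 / (4.0 * k ** 2 - 1.0))   # off-diagonal entries
    J = np.diag(off, 1) + np.diag(off, -1)         # symmetric Jacobi matrix
    nodes, V = np.linalg.eigh(J)
    weights = 2.0 * V[0, :] ** 2                   # mu_0 * (first components)^2
    return nodes, weights

x, w = gauss_legendre(5)
assert abs((w * x ** 2).sum() - 2.0 / 3.0) < 1e-12   # integral of x^2 is 2/3
```

For a nonclassical weight, the whole point of the book is that the alphas and betas must themselves be computed first.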

In the present book these and other topics are considered, not only for the occasional user.

In the author's preface the first lines are, "The main purpose of this book is to present an account of computational methods for generating orthogonal polynomials on the real line (or part thereof), to discuss a number of related applications, and to provide software necessary for implementing all methods and applications." The main objective is the discussion of numerical algorithms for computing nonclassical orthogonal polynomials, but the book offers much more. Walter Gautschi is a top expert in the field, and his new book gives a wide and up-to-date account of the theory of orthogonal polynomials and their numerical aspects.

The book is organized in three chapters.

The first chapter is devoted to the basic theory of orthogonal polynomials. The topics covered include standard topics in the classical theory, starting from the basic definitions and properties and continuing with the analysis of the three-term recurrence relations, Christoffel–Darboux formulas, the connection with continued fractions, and their application in Gaussian quadrature. A brief section collects basic properties of the so-called classical orthogonal polynomials. The chapter closes with three not-so-classical topics: kernel polynomials, Sobolev orthogonal polynomials (which later reappear in Chapter 2), and orthogonal polynomials on the semicircle. This first chapter, apart from being necessary for the consistency of the book, is a very nice and concise review for readers who have a basic knowledge of the theory of orthogonal polynomials and are interested in a quick and up-to-date review.

The second chapter ("Computational Methods") describes, among other topics, numerical methods for solving the central problem considered in this book: the computation of the first n coefficients of the three-term recurrence relations satisfied by orthogonal polynomials with respect to a given positive measure. Both their computation via moment information and their computation by means of discretization methods are described. The chapter starts with a section devoted to moment-based methods. In the first 23 pages, the condition of the problem of computing the nodes and weights of n-point Gaussian quadrature from moment information, as well as the condition of the related problem of determining the first n coefficients of the three-term recurrence satisfied by the polynomials, is analyzed. It is shown how the use of modified moments helps in improving the condition of the problem. After this theoretical introduction, the modified Chebyshev algorithm for computing the coefficients of the recurrence in terms of the modified moments is described in detail; in this algorithm, the concept of mixed moments is introduced, which also finds application in the conversion problem (i.e., change of basis in a polynomial expansion from one set of orthogonal polynomials to another). This first section ends with several examples comparing the actual error growth with that predicted at the beginning of the chapter. The second section of the chapter deals with the discretization methods. These methods use the explicit formulas for the recursion coefficients in terms of inner products involving the orthogonal polynomials; the resulting integrals are computed by an adequate Gaussian formula, and the orthogonal polynomials are generated in parallel with this computation. Numerical examples are provided. Finally, the last three sections of the chapter deal, respectively, with the computation of Cauchy integrals of orthogonal polynomials (with applications), the modification algorithms (for computing orthogonal polynomials for weight functions modified by rational factors), and the computation of Sobolev orthogonal polynomials.

In some sense, the second chapter is the core chapter of the book. It will be of interest to researchers in the area. It is also a useful chapter for more general users because it contains references to available software, and useful algorithms are described in detail (the modified Chebyshev algorithm, the Clenshaw algorithm for computing sums, the modification algorithms). Maybe an index of algorithms, programs, and exercises, apart from the general index, would have been desirable in order to identify the algorithms more easily when they are mentioned.

Depending on their research interests, some readers will be more interested in some sections than others of the last chapter, which offers an "a la carte menu" composed of the computation of quadrature rules, least squares problems, moment-preserving spline interpolation, and the computation of slowly convergent series. This chapter contains not only applications of the previously described methods and algorithms but also new computational topics, particularly related to the computation of quadrature rules. The first section of this chapter describes how to compute Gauss and related quadrature formulas (including Gauss–Radau and Gauss–Lobatto) from the coefficients of the recurrence. Gauss–Kronrod, Gauss–Turan, and rational quadrature formulas (of Gauss, Gauss–Radau, and Gauss–Lobatto type) are also considered in the following subsections. The computation of Cauchy principal value integrals, polynomials orthogonal on several intervals, and the quadrature estimation of matrix functionals are also applications of Gauss quadrature included in this first section. The second section of the chapter deals with the application of orthogonal polynomials to least squares approximation, including constrained approximation and least squares approximation in Sobolev spaces (useful for approximating both function and derivative values). The third section deals with moment-preserving spline interpolation. The last section discusses the application of quadratures to the evaluation of slowly convergent power series, in particular of series whose general term $a_k$ can be expressed as the Laplace transform of a known function $f(t)$; for computing these series, Gauss quadrature based on the weight function $t/(e^t - 1)$ is used. Also considered are alternating series of Laplace transforms and of derivatives of Laplace transforms, as well as some series inspired by plate contact problems.

The range of topics covered in this book is considerably wide and goes beyond the central problem of computing recurrence coefficients given a weight function (or its moments). In the preface the author says: "The choice of topics, admittedly, is influenced by the author's own past involvement in this area, but it is hoped that the treatment given, and especially the software provided, will be useful to a large segment of the readership." Given the wide range of applications of orthogonal polynomials and the usefulness of the methods described, this will certainly be the case.

JAVIER SEGURA

Universidad de Cantabria, Spain

A Guide to Monte Carlo Simulations in Statistical Physics. Second Edition. By David P. Landau and Kurt Binder. Cambridge University Press, Cambridge, UK, 2005. $70.00. xv+432 pp., hardcover. ISBN 0-521-84238-7.

The first edition of this book came out in 2000 and since then has already been cited nearly as often as the now-classic earlier collections of reviews on Monte Carlo methods, edited by K. Binder, were cited in that same time period. This monograph gives a coherent overview of the field, including 20 pages of Fortran 77 programs to learn the details.

The new edition is larger by nearly 50 pages. I counted 18 new sections or subsections in the already existing 12 chapters. The main addition, in my opinion, is the entirely new Chapter 13 on "Monte Carlo Methods Outside Physics." This chapter summarizes protein folding and then, more briefly, other biologically inspired physics, mathematics/statistics, sociophysics, econophysics, traffic simulations, and medicine. (Networks of Watts–Strogatz or Barabasi–Albert type are not mentioned here.) More detailed reviews are cited in nearly all of these sections, and the authors also state that some of these simulations have only tenuous relations to reality. Expert readers may see some work more positively and some more negatively than the two authors, but certainly this new chapter gives a good introduction to this previously "exotic" branch of physics.

Thus I wish this book a third edition (in which the German variable names in the bond fluctuation algorithm should be translated) and hope that the authors will then refrain from adding too many new pages to this very useful book.

DIETRICH STAUFFER

University of Cologne

Practical Fourier Analysis for Multigrid Methods. By Roman Wienands and Wolfgang Joppich. Chapman and Hall/CRC, Boca Raton, FL, 2005. $71.96. xiv+217 pp., hardcover. ISBN 1-58488-492-4.

Myth: Multigrid is inefficient for three-dimensional problems.

[Several mouse clicks] 3D Poisson problem, standard second-order finite-difference discretization, V(1,1) cycle with Red-Black relaxation . . . . And the spectral radius of a three-grid cycle is . . . 0.23. Not so bad after all. We expect the complete multigrid V cycle to perform nearly as well, that is, to asymptotically reduce the error by nearly a factor of 0.23 per cycle.

Myth: Successive over-relaxation and multigrid don't mix so well.

[Click click] Same problem and methods, but with over-relaxation parameter ω = 1.15, and the spectral radius drops to 0.095, a rather substantial improvement.

Myth: Multigrid is robust for the anisotropic diffusion problem if alternating line relaxation is used.

[Mouse at work] Anisotropic diffusion, rotated at 45° relative to the grid, alternating line relaxation in a V(1,1) cycle . . . . And the spectral radius for the three-grid cycle is . . . 0.94. Disappointing! The two-grid spectral radius is 0.75, and the one-grid spectral radius (smoothing factor) is an enticing 0.08. This ominous deterioration with respect to the number of grids is a sure sign that the complete multigrid cycle is going to perform very poorly, due to inadequate coarse-grid correction. A W(1,1) cycle should do somewhat better . . . [click] . . . 0.89. Better, but still unacceptable. Wait, I read somewhere that overweighting the residuals can help . . . [click click] . . . with a residual weighting factor of 1.75, the three-grid spectral radius drops to 0.70. This is worthy of further investigation. But first, let's try a Galerkin coarse-grid discretization, W(1,1) cycle, no residual over-weighting . . . [click] . . . 0.72 for the three-grid cycle. OK, one more try: didn't someone claim that higher-order transfers should be employed in (Petrov–)Galerkin discretization of singular perturbation problems? Clicking over to bicubic interpolation we obtain . . . 0.33 for the three-grid spectral radius.

The above describes excerpts of a session using the multigrid Fourier analysis tool that accompanies the recent book by Wienands and Joppich. Fourier analysis is the main quantitative tool for the development and assessment of multigrid algorithms and for debugging multigrid code. First introduced by Brandt in the 1970s (see [1]) as a predictor of the smoothing properties of relaxation, local mode analysis, as it is commonly called, has been utilized by multigrid developers and practitioners to select subprocesses and parameters in multigrid algorithms and to verify correctness of codes for a very wide range of problems. This includes nonlinear, nonsymmetric, and/or singular-perturbation equations and systems that often prove difficult to analyze, even qualitatively, by other means. Though rigorous only in special cases, the practical utility of local mode analysis is tremendous, and textbooks on multigrid methods often feature one or several chapters on these techniques. On top of that, many papers have been published on various aspects of this subject. And yet, we have before us a complete book devoted solely to this.
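To give a taste of what such a tool computes, here is a tiny local Fourier analysis sketch of my own (not the LFA package): the smoothing factor of ω-Jacobi relaxation for the five-point 2D Poisson stencil, i.e., the worst amplification over the high-frequency Fourier modes:

```python
import numpy as np

def jacobi_smoothing_factor(omega, n=512):
    """Smoothing factor of omega-Jacobi for the 2D five-point Laplacian.
    The relaxation symbol is S(t1, t2) = 1 - omega*(4 - 2cos t1 - 2cos t2)/4;
    the smoothing factor is max |S| over the high frequencies, i.e., the
    modes with max(|t1|, |t2|) >= pi/2 that coarse grids cannot represent."""
    t = np.linspace(-np.pi, np.pi, n)
    t1, t2 = np.meshgrid(t, t)
    S = 1.0 - omega * (4.0 - 2.0 * np.cos(t1) - 2.0 * np.cos(t2)) / 4.0
    high = np.maximum(np.abs(t1), np.abs(t2)) >= np.pi / 2
    return np.abs(S[high]).max()

print(jacobi_smoothing_factor(0.8))   # the classical value 3/5 for omega = 4/5
```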

Why is such a book warranted? The authors ask this very question on the first page of the preface. Their answer has to do with the software that accompanies the book, LFA, and its associated graphical user interface (GUI), xlfa. This software provides the user with a first-class quantitative analysis tool for multigrid algorithms. It is reasonably straightforward to apply, especially with the extensive explanations and examples provided in the book, and it includes a substantial number of features. Among these is an impressive and diverse collection of problems already implemented, featuring scalar partial differential equations (PDEs) in two and three dimensions, as well as PDE systems in two dimensions. The features include a wide variety of relaxation methods and intergrid transfers, various coarsening strategies, direct and Galerkin coarse-grid discretizations, and optional parameters for relaxation, residual weighting, and correction weighting. In addition to the problems already implemented, the user can introduce new problems and new coarse-grid correction methods. This is a remarkably useful tool for testing new ideas, verifying old ones, exploring parameter regimes, debugging code, and also for aiding in the teaching of basic and intermediate-level multigrid techniques.

The book itself comprises two distinct parts of approximately equal volume. The first part is made up of four chapters. Chapter 1 introduces notation and some basics: iterative methods, Fourier components, and a very brief look at multigrid methods. It also introduces the GUI. Here and later, the book acts in part as a user's guide to the software. Chapter 2 is a very brief description of Fourier analysis for multigrid algorithms, its scope and limitations. Chapter 3 begins with a rather brief introduction to multigrid methods, including a formal definition of the relevant algorithms. It then goes back to the software, in particular the GUI, and describes its features in relation to the different parts of the multigrid algorithm. Thus, at the end of the first three chapters, the reader has a pretty good idea of how to use the software and what it can do. Nevertheless, I found this manner of introduction somewhat unusual. I might have preferred the introduction to multigrid to have been given in a single early chapter (which most potential users could probably skip), followed by a basic chapter on Fourier analysis, and then a user's-guide-type chapter that could easily be employed both for learning how to use the software and for back-reference during use. Also, the English, especially in the parts that describe the software, could benefit from some polishing. Chapter 4, which concludes the first part of the book, is an excellent collection of case studies that demonstrate the features of the software while also providing a crash course in how to adapt multigrid algorithms to various special difficulties. The problems covered start with simple examples but quickly go on to anisotropic problems, high-order discretizations, fourth-order equations, singular perturbation problems, three-dimensional applications, and coupled systems of equations.

The second part of the book describes the theory of local mode analysis.

While much of the material (excluding the three-grid analysis) can be found in other sources, nowhere is it given in such scope and detail. Chapter 5 is devoted to single-grid (i.e., smoothing) analysis, including the concept of h-ellipticity. Chapter 6 describes two-grid and three-grid analysis in great detail and with accuracy. Finally, Chapter 7 discusses rather briefly some more specialized aspects, namely, orders of intergrid transfers, a simplified multigrid Fourier analysis, analysis for cell-centered discretizations, and analysis of GMRES preconditioned by multigrid. These subjects are only touched upon, and the interested reader is referred to the literature.
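
To make the single-grid analysis of Chapter 5 concrete, the following minimal script (my own illustration, not part of the book's LFA software; all names are mine) computes the smoothing factor of damped Jacobi for the five-point discretization of the 2D Poisson equation by sampling the symbol over the high-frequency region:

import numpy as np

def jacobi_symbol(t1, t2, omega):
    # Five-point Laplacian symbol: L(theta) = 4 - 2 cos(theta1) - 2 cos(theta2)
    # (up to a factor 1/h^2); damped Jacobi divides by the diagonal entry 4,
    # so S(theta) = 1 - omega * L(theta) / 4.
    return 1.0 - omega * (4.0 - 2.0 * np.cos(t1) - 2.0 * np.cos(t2)) / 4.0

def smoothing_factor(omega, n=512):
    # Sample theta in [-pi, pi)^2 and keep the high frequencies, i.e., the
    # modes outside (-pi/2, pi/2]^2 that standard coarsening cannot represent.
    t = np.linspace(-np.pi, np.pi, n, endpoint=False)
    T1, T2 = np.meshgrid(t, t)
    high = (np.abs(T1) > np.pi / 2) | (np.abs(T2) > np.pi / 2)
    return np.abs(jacobi_symbol(T1, T2, omega))[high].max()

print(smoothing_factor(1.0))  # ~1.0: undamped Jacobi does not smooth at all
print(smoothing_factor(0.8))  # ~0.6: the classical value for omega = 4/5

Red-Black relaxation, line relaxation, and the two- and three-grid analyses exercised in the session above involve matrix rather than scalar symbols; automating exactly that bookkeeping is what LFA offers.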

To summarize, I would recommend this book to anyone seriously interested in the development of multigrid algorithms for PDEs and systems. The subject of local mode analysis is treated accurately and in great detail, including a large and diverse collection of examples. The software that accompanies the book is of great use and is not difficult to master. The book is a fairly easy read for the most part, but it suffers from the lack of a more thorough proofreading. Fortunately, the benefits greatly outweigh this minor deficiency.

REFERENCE

[1] A. Brandt, Multi-level adaptive solutions to boundary-value problems, Math. Comp., 31 (1977), pp. 333–390.

IRAD YAVNEH

Technion, Israel

Real Analysis: Measure Theory, Integration, and Hilbert Spaces. By E. Stein and M. Shakarchi. Princeton University Press, Princeton, NJ, 2005. $59.95. xx+402 pp., hardcover. ISBN 0-691-11386-6.

The book under review is a text for a first-year graduate course in analysis. However, this text has a distinctive perspective; indeed, as is stated in the foreword, Real Analysis is the third volume in the Princeton Lectures in Analysis, a series of four textbooks that aim to present, in an integrated manner, the core areas of analysis.

The authors also state that the book serves as a basis for a 48-lecture-hour course.

This book therefore is not designed to cover all topics often found in a first-year graduate course in analysis. Nevertheless, all the topics that are covered are certainly core topics in analysis. The first three chapters deal with Lebesgue measure theory and integration theory, including optional sections on the Brunn–Minkowski inequality, a Fourier inversion formula, the Minkowski content of a curve, and the isoperimetric inequality in the plane.
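
For reference, the two named inequalities read, in their standard forms (stated here from general knowledge, not quoted from the text): the Brunn–Minkowski inequality asserts that for nonempty compact sets A, B \subset \mathbb{R}^d,

\[
m(A + B)^{1/d} \;\ge\; m(A)^{1/d} + m(B)^{1/d},
\qquad A + B = \{\, a + b : a \in A,\ b \in B \,\},
\]

where m denotes Lebesgue measure, and the isoperimetric inequality in the plane states that a simple closed curve of length L encloses an area of at most L^2/(4\pi).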

The next two chapters deal with Hilbert spaces, including sections on Fatou's theorem, the Riesz representation theorem, compact operators, and the Fourier transform on L^2. As a nice application of these methods, the authors prove a result concerning the existence of solutions to partial differential equations with constant coefficients.

Chapter 5 concludes with an optional section on the Dirichlet principle, including a treatment of finding a solution by the direct method, which results in a weakly harmonic solution. By employing previously proved results, it is concluded that the solution is a classical one. Finally, it is shown that the solution continuously assumes its desired boundary values. This is done only in the plane because the higher-dimensional analogue requires material beyond the book's scope. However, there is a very accessible treatment of the Wiener criterion for harmonic functions (which is not as well known as it should be) by L. C. Evans and R. Jensen [2]. Also see G. C. Evans [1].
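
The principle in question can be stated as follows (a standard formulation, not necessarily the authors' exact one): the solution of the boundary value problem \Delta u = 0 in \Omega, u = f on \partial\Omega, is characterized as the minimizer of the Dirichlet energy

\[
E(u) \;=\; \frac{1}{2} \int_{\Omega} |\nabla u|^2
\]

over all admissible functions taking the boundary values f. The direct method extracts a convergent minimizing sequence; its limit is the weakly harmonic solution referred to above, which is then upgraded to a classical one.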

Chapter 6 serves as a complement to the first three chapters by covering abstract integration theory, which includes outer measures, Fubini's theorem, the Radon–Nikodym theorem, as well as an optional section on ergodic theorems.

The last chapter, Chapter 7, deals with fractals and Hausdorff measure. Inevitably, there is a discussion of Hausdorff dimension, self-similarity, and space-filling curves. There is also an interesting treatment of the Radon transform and Besicovitch's Kakeya sets. Their proof of the existence of such sets relies on the concept of self-replicating sets.

As one would expect from these authors, the exposition is, in general, excellent. The explanations are clear and concise, with many well-focused examples as well as an abundance of exercises covering the full range of difficulty. While this monograph may not be the book of choice for the typical first-year graduate course in analysis (for reasons stated above), it certainly must be on the instructor's bookshelf as a first-rate reference book.

REFERENCES

[1] G. C. Evans, A necessary and sufficient condition of Wiener, Amer. Math. Monthly, 54 (1947), pp. 151–155.

[2] L. C. Evans and R. Jensen, A boundary gradient estimate for harmonic functions and applications, in Nonlinear Partial Differential Equations and Their Applications, Collège de France Seminar, Vol. I, Res. Notes in Math. 53, Pitman, Boston, London, 1981, pp. 160–176.

WILLIAM P. ZIEMER

Indiana University