

THE SCIENCE OF DERIVING STABILITY ANALYSES ∗

PAOLO BIENTINESI† AND ROBERT A. VAN DE GEIJN‡

Abstract. We introduce a methodology for obtaining inventories of error results for families of numerical dense linear algebra algorithms. The approach for deriving the analyses is goal-oriented, systematic, and layered. The presentation places the analysis side-by-side with the algorithm so that it is obvious where error is introduced, making it attractive for use in the classroom. Yet it is sufficiently powerful to support the analysis of more complex algorithms, such as blocked variants that attain high performance.


1. Introduction. Numerical stability analysis related to dense linear algebra operations is one of the most important topics in numerical analysis. It is part of almost any introductory and advanced text on numerical linear algebra and earned one of its pioneers, J. H. Wilkinson, a Turing Award in 1970. Yet few practitioners and students will admit to liking the topic. With this paper, we hope to inject some joy into the subject.

Parlett [?] observes that "One of the major difficulties in a practical analysis is that of description. An ounce of analysis follows a pound of preparation." This suggests the desirability of a standardization of the description of error analyses. Higham [?] echoes this sentiment: "Two reasons why rounding error analysis can be hard to understand are that, first, there is no standard notation and, second, error analyses are often cluttered with re-derivations of standard results." His statement also suggests that an inventory of results, to be used in subsequent analyses, is desirable. We ourselves assert that texts and papers alike focus on the error results and their proofs rather than on exposing a methodology that can be used by practitioners and students to obtain error results in a systematic way. Especially in a classroom setting, assignments related to stability analysis reward the recognition of tricks because no systematic methodology is explicitly passed from a typical instructor to the student.

As part of the FLAME project, we approach such problems differently. As computer scientists, we are inherently as much, if not more, interested in the methodology for deriving results as in the results themselves. The goal for us is to identify notation and a procedure that, in principle mechanically, yields the solution. In the case of deriving algorithms for linear algebra operations we were able to identify notation together with a procedure that is sufficiently systematic that a mechanical (automated) system resulted [1]. In this paper, we show that this notation and procedure can be extended to equally systematically (although not yet mechanically) derive stability analyses.

This paper makes the following contributions:

• The notation we use deviates from tradition. As much as possible we abstract away from details like indices in an effort to avoid clutter, both in the presentation of the algorithms and in the error analyses of those algorithms.

†Department of Computer Science, P.O. Box 90129, Duke University, Durham, NC 27708-0129; [email protected].

‡Department of Computer Sciences, 1 University Station C0500, The University of Texas, Austin, TX 78712; [email protected].

∗This work was supported in part by NSF grants ????. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


We are not alone in this: many texts (e.g., [4]) partition matrices in analyses. We do believe we do so more systematically and more consistently with how we present algorithms.

• In most texts and courses the error results are presented as theorems with proofs that incorporate one or more obscure tricks, often with different tricks for different operations. In contrast, we show that the derivation of error results is a goal-oriented exercise: given an operation and an algorithm that computes it, a possible error result is motivated and is subsequently established hand-in-hand with its proof. The methodology naturally leads to a sequence of error results that can be cataloged for future reuse. If in an analysis a subresult has not yet been cataloged, the methodology can be applied to derive that subresult.

• We motivate and demonstrate the methodology by applying it to a sequence of progressively more difficult algorithms for which results were already known. The presentation is such that it can be used as a supplement to a text being used in a class, including exercises.

• We apply the methodology to analyze a blocked LU factorization algorithm that is closely related to the most commonly used high-performance algorithm for LU factorization with partial pivoting. To our knowledge, no complete analysis of that algorithm has been previously given.

We do not claim that the approach is different from what an expert does as he/she analyzes an algorithm. We do claim that we provide structure so that such analyses can be performed by people with considerably less expertise.

Experience tells us that there are many, including experts, in this particular field who think at the index level and for whom the abstractions themselves are clutter. However, we have noticed that our approach often arouses a reaction like "As much as I have been trying to avoid [stability] analysis for the last ten years, I can only say that I like [this]." Indeed, this pretty much summarizes our own reaction as we wrote this paper. One should not dismiss the approach without at least trying some of the more advanced exercises.

While the paper is meant to be self-contained, full understanding will come from first reading our paper on the systematic derivation of algorithms in this problem domain [?]. The reason is that the derivation of the analysis of an algorithm mirrors the derivation of the algorithm itself, as discussed in the conclusion.

It is impossible for us to give a complete discussion of related work. We refer the reader to Higham's book [?], which lists no less than 1134 citations.

The organization of this paper ...

2. Preliminaries. In this section we present the minimal notation and background required for this paper.

2.1. Notation. We adopt the common convention that scalars, vectors, and matrices will generally be denoted by lower case Greek, lower case Roman, and upper case Roman letters, respectively. Exceptions include the letters i, j, k, m, and n, which will denote scalar integers.

The notation δχ, where the symbol δ touches a scalar variable χ, indicates a perturbation associated with the variable χ. Similarly, δx and ∆X (∆ touches X) indicate a perturbation vector and matrix associated with vector x and matrix X, respectively.

The functions m(X) and n(X) return the row and column dimension of matrix (or vector) X, respectively.


2.2. Floating point computation. Much of the notation related to error analysis in this paper was taken and/or inspired by the notation in [6].

We introduce definitions and results regarding floating point arithmetic. In this paper, we focus on real valued arithmetic only. Extensions to complex arithmetic are straightforward.

As is customary when discussing error analyses, we distinguish between exact and computed quantities. The function [expression] returns the result of the evaluation of expression, where every operation is executed in floating point arithmetic¹. Equality between the quantities lhs and rhs is denoted by lhs = rhs. Assignment of rhs to lhs is denoted by lhs := rhs (lhs becomes rhs). In the context of a program, the statements lhs := rhs and lhs := [rhs] are equivalent. Given an assignment κ := exp, we use the notation κ̂ to denote the quantity resulting from [exp], which is actually stored in the variable κ.

We will denote the machine epsilon or unit roundoff by u. It is typically of the order of 10^−8 and 10^−16 for single and double precision arithmetic, respectively². The unit roundoff is defined as the maximum positive floating point number which can be added to the number stored as 1 without changing the number stored as 1: [1 + u] = 1.
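As a quick sanity check (a sketch in Python, assuming IEEE double precision with round-to-nearest), u = 2^−53 behaves exactly as this definition requires:

```python
# Unit roundoff for IEEE double precision (assumed): u = 2**-53.
u = 2.0 ** -53

# [1 + u] = 1: the tie rounds back to 1.0 (round-to-nearest-even),
# so u is the largest number whose addition leaves 1.0 unchanged.
assert 1.0 + u == 1.0
# Anything as large as 2u already perturbs the stored 1.
assert 1.0 + 2.0 * u > 1.0
```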

The Standard Computational Model (SCM) assumes that, for any two floating point numbers χ and ψ, the basic arithmetic operations satisfy the equality

[χ op ψ] = (χ op ψ)(1 + ε), |ε| ≤ u, and op ∈ {+,−, ∗, /}.

The quantity ε is a function of χ, ψ and op. Sometimes we add a subscript (ε+, ε∗, . . .) to indicate what operation generated the (1 + ε) error factor. We always assume that all the input variables to an operation are floating point numbers.
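The SCM bound can be checked empirically. The sketch below (Python assumed; exact reference values via the standard-library Fraction type) recovers ε = [χ op ψ]/(χ op ψ) − 1 for random operands and verifies |ε| ≤ u:

```python
from fractions import Fraction
import operator
import random

u = Fraction(1, 2 ** 53)  # unit roundoff, IEEE double precision (assumed)
random.seed(1)

for op in (operator.add, operator.sub, operator.mul, operator.truediv):
    for _ in range(250):
        chi, psi = random.random(), random.random()
        computed = op(chi, psi)                    # [chi op psi]: one rounding
        exact = op(Fraction(chi), Fraction(psi))   # chi op psi, exact rational
        if exact != 0:
            # computed = exact * (1 + eps); recover eps exactly.
            eps = (Fraction(computed) - exact) / exact
            assert abs(eps) <= u
```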

For certain problems it is convenient to use the Alternative Computational Model (ACM) [6], which assumes for the basic arithmetic operations that

[χ op ψ] = (χ op ψ)/(1 + ε), |ε| ≤ u, and op ∈ {+, −, ∗, /}.

As for the standard computational model, the quantity ε is a function of χ, ψ and op. Note that the ε's produced using the standard and alternative models are generally not equal.

2.3. Stability of a numerical algorithm. In the presence of round-off error, it cannot be expected that an implementation of a function using a specified algorithm yields the exact result. Thus, the notion of "correctness" applies only to the execution of the algorithm in exact arithmetic. Here we briefly introduce the notion of "stability."

Let f : D → R be a mapping from the domain D to the range R and let f̂ : D → R represent the mapping that captures an implementation of f using a given algorithm and executing using floating point arithmetic so that error accumulates. The algorithm is said to be backward stable if for all x ∈ D there exists a perturbed input x̂ ∈ D that is close to x such that f̂(x) = f(x̂). In other words, the computed result

¹Note that [x + y + z/w] = [[[x] + [y]] + [[z]/[w]]], provided the operations are performed in that order.

²u is machine dependent; it is a function of the parameters characterizing the machine arithmetic: u = ½ β^(1−t), where β is the base and t is the precision of the floating point number system for the machine.


equals the result obtained when the exact function is applied to slightly perturbed data. The difference between x and x̂, δx = x̂ − x, is the perturbation to the original input x.

The reasoning is as follows. The input to a function typically has error associated with it. This could be due to measurement error when obtaining the input and/or the result of converting real numbers to floating point numbers when storing the input on a computer. If it can be shown that an implementation is backward stable, then it has been shown that the result of the computation could have been obtained from slightly corrupted input. Thus, one can think of the error introduced by the implementation as being similar in nature to the error introduced when obtaining the input data in the first place.

In the above discussion the difference between x and x̂, δx, is the backward error³ and the forward error is given by f̂(x) − f(x). Throughout the remainder of this paper we will be concerned with bounding the backward and/or forward errors introduced by the algorithms executed with floating point arithmetic.
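For a single floating point addition, the backward error can be exhibited explicitly: the SCM gives [α + β] = (α + β)(1 + ε) = α(1 + ε) + β(1 + ε), so the computed sum is the exact sum of inputs each perturbed by relative amount ε. A sketch (Python assumed, exact arithmetic via Fraction):

```python
from fractions import Fraction

a, b = 0.1, 0.7
computed = a + b                                       # f_hat(a, b)

# Recover eps from computed = (a + b)(1 + eps), exactly.
eps = Fraction(computed) / (Fraction(a) + Fraction(b)) - 1

# Perturbed inputs whose exact sum equals the computed result.
a_hat = Fraction(a) * (1 + eps)
b_hat = Fraction(b) * (1 + eps)
assert a_hat + b_hat == Fraction(computed)   # f(a_hat, b_hat) = f_hat(a, b)
assert abs(eps) <= Fraction(1, 2 ** 53)      # the perturbation is tiny
```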

2.4. Absolute value of vectors and matrices. In the discussion of error, the vague notions of "near" and "slightly perturbed" are used. Making these notions exact usually requires the introduction of measures of size for vectors and matrices, i.e., norms. Instead, for the operations analyzed in this paper, all bounds are given in terms of the absolute values of the individual elements of the vectors and/or matrices. While it is easy to convert such bounds to bounds involving norms, the converse is not true.

Definition 2.1. Given x ∈ R^n and A ∈ R^{m×n},

|x| = (|χ0|, . . . , |χn−1|)^T and |A| = (|αi,j|), i = 0, . . . , m−1, j = 0, . . . , n−1,

that is, |x| and |A| denote the vector and matrix of elementwise absolute values.

Definition 2.2. Given x, y ∈ R^n, |x| ≤ |y| iff |χi| ≤ |ψi| for i = 0, . . . , n−1. Given A, B ∈ R^{m×n}, |A| ≤ |B| iff |αij| ≤ |βij| for i = 0, . . . , m−1 and j = 0, . . . , n−1. Similarly, <, >, and ≥ are defined over vectors and matrices.

The following result is exploited in later sections:

Lemma 2.3. Let A ∈ R^{m×k} and B ∈ R^{k×n}. Then |AB| ≤ |A||B|.

Exercise 2.4. Prove Lemma 2.3.

The fact that the bounds that we establish can be easily converted into bounds involving norms is a consequence of the following theorem:

Theorem 2.5. Let A, B ∈ R^{m×n}. If |A| ≤ |B| then ‖A‖1 ≤ ‖B‖1, ‖A‖∞ ≤ ‖B‖∞, and ‖A‖F ≤ ‖B‖F.

Exercise 2.6. Prove Theorem 2.5.
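Lemma 2.3 is the triangle inequality applied entrywise; a quick numerical spot-check (a sketch in Python; integer matrices so that all arithmetic is exact):

```python
import random

def check_abs_bound(A, B):
    """Verify |A B| <= |A| |B| entrywise (Lemma 2.3) for integer matrices."""
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            lhs = abs(sum(A[i][p] * B[p][j] for p in range(k)))
            rhs = sum(abs(A[i][p]) * abs(B[p][j]) for p in range(k))
            assert lhs <= rhs  # triangle inequality, term by term
    return True

random.seed(2)
A = [[random.randint(-9, 9) for _ in range(4)] for _ in range(3)]
B = [[random.randint(-9, 9) for _ in range(5)] for _ in range(4)]
assert check_abs_bound(A, B)
```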

2.5. Deriving dense linear algebra algorithms. In various papers, we have shown that given a dense linear algebra operation, one can systematically derive algorithms [5] for computing it. The primary vehicle for this is a worksheet that is filled in a prescribed order [2]. We do not discuss this worksheet for deriving algorithms in order to keep the focus of this paper on the derivation of error analyses. However, we encourage the reader to compare the error worksheet, introduced next, to the worksheet for deriving algorithms in our other publications. We do note that

³The backward error is not necessarily unique. We will be interested in showing that there is an x̂ "near" x such that f̂(x) = f(x̂).


Algorithm Dot: κ := x^T y

  κ = 0
  Partition x → (xT ; xB), y → (yT ; yB)
    where xT and yT are empty
  while m(xT) < m(x) do
    Repartition (xT ; xB) → (x0 ; χ1 ; x2), (yT ; yB) → (y0 ; ψ1 ; y2)
      where χ1 and ψ1 are scalars
    κ := κ + χ1ψ1
    Continue with (xT ; xB) ← (x0 ; χ1 ; x2), (yT ; yB) ← (y0 ; ψ1 ; y2)
  endwhile

Fig. 3.1. Algorithm for computing κ := x^T y. Here ";" stacks blocks vertically.

the order in which the error worksheet, discussed next, is filled mirrors the order in which the worksheet for derivation is filled.

3. Stability of the dot product operation and an Introduction to the Error Worksheet. The triangular solve algorithm in the next section will require the computation of the dot (inner) product operation (Dot) of vectors x, y ∈ R^n defined by κ := x^T y. In this section, we give an algorithm for this operation and the related error results. This section also introduces the error worksheet as a framework for presenting the error analysis side-by-side with the algorithm.

3.1. An algorithm for computing Dot. We will consider the algorithm given in Fig. 3.1. It uses the FLAME notation [5, ?] to express the computation

κ := (((χ0ψ0 + χ1ψ1) + · · ·) + χn−2ψn−2) + χn−1ψn−1,   (3.1)

in the indicated order.
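In scalar form the algorithm of Fig. 3.1 is the familiar left-to-right accumulation loop. A sketch (Python assumed):

```python
def dot(x, y):
    """kappa := x^T y, accumulated strictly left to right as in (3.1)."""
    kappa = 0.0
    for chi, psi in zip(x, y):
        kappa = kappa + chi * psi   # the update kappa := kappa + chi_1 psi_1
    return kappa

# Round-off accumulates: ten copies of 0.1 do not sum to exactly 1.0.
assert dot([0.1] * 10, [1.0] * 10) != 1.0
assert abs(dot([0.1] * 10, [1.0] * 10) - 1.0) < 1e-15
```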

3.2. Preparation. Under the computational model given in Section 2.2, the computed result of (3.1), κ̂, satisfies

κ̂ = (((χ0ψ0(1 + ε∗^(0)) + χ1ψ1(1 + ε∗^(1)))(1 + ε+^(1)) + · · ·)(1 + ε+^(n−2)) + χn−1ψn−1(1 + ε∗^(n−1)))(1 + ε+^(n−1))   (3.2)

  = Σ_{i=0}^{n−1} χiψi (1 + ε∗^(i)) Π_{j=i}^{n−1} (1 + ε+^(j)),   (3.3)

where ε+^(0) = 0 and |ε∗^(0)|, |ε∗^(j)|, |ε+^(j)| ≤ u for j = 1, . . . , n−1. Clearly, a notation to keep expressions from becoming unreadable is desirable.

Lemma 3.1. Let εi ∈ R, 0 ≤ i ≤ n−1, nu < 1, and |εi| ≤ u. Then ∃ θn ∈ R such that Π_{i=0}^{n−1} (1 + εi)^{±1} = 1 + θn, with |θn| ≤ nu/(1 − nu).


Proof: By Mathematical Induction.

Base case. n = 1. Trivial.

Inductive Step. Inductive Hypothesis (I.H.): Assume that for all εi ∈ R, 0 ≤ i ≤ n−1, with nu < 1 and |εi| ≤ u, there exists a θn ∈ R such that Π_{i=0}^{n−1}(1 + εi)^{±1} = 1 + θn, with |θn| ≤ nu/(1 − nu).

We will show that if εi ∈ R, 0 ≤ i ≤ n, (n+1)u < 1, and |εi| ≤ u, there exists a θn+1 ∈ R such that Π_{i=0}^{n}(1 + εi)^{±1} = 1 + θn+1, with |θn+1| ≤ (n+1)u/(1 − (n+1)u).

Case 1: Π_{i=0}^{n}(1 + εi)^{±1} = (Π_{i=0}^{n−1}(1 + εi)^{±1})(1 + εn). See Exercise 3.2.

Case 2: Π_{i=0}^{n}(1 + εi)^{±1} = (Π_{i=0}^{n−1}(1 + εi)^{±1})/(1 + εn). By the I.H. there exists a θn such that 1 + θn = Π_{i=0}^{n−1}(1 + εi)^{±1} and |θn| ≤ nu/(1 − nu). Then

(Π_{i=0}^{n−1}(1 + εi)^{±1})/(1 + εn) = (1 + θn)/(1 + εn) = 1 + (θn − εn)/(1 + εn),

which tells us how to pick θn+1 = (θn − εn)/(1 + εn). Now

|θn+1| = |(θn − εn)/(1 + εn)| ≤ (|θn| + u)/(1 − u) ≤ (nu/(1 − nu) + u)/(1 − u)
       = (nu + (1 − nu)u)/((1 − nu)(1 − u)) = ((n+1)u − nu²)/(1 − (n+1)u + nu²) ≤ (n+1)u/(1 − (n+1)u).

By the Principle of Mathematical Induction, the result holds.

Exercise 3.2. Complete the proof of Lemma 3.1.

The quantity θn will be used throughout this paper. It is not intended to be a specific number, but instead is an order of magnitude identified by the subscript n, which indicates how many error factors of the form (1 + εi) and/or (1 + εi)^{−1} are grouped together to form (1 + θn).

The bound on |θn| occurs throughout and we assign it a symbol:

Definition 3.3. For all n ≥ 1 and nu < 1, define γn := nu/(1 − nu).

With this notation, (3.3) simplifies to

κ̂ = χ0ψ0(1 + θn) + χ1ψ1(1 + θn) + · · · + χn−1ψn−1(1 + θ2),   (3.4)

where |θj| ≤ γj, j = 2, . . . , n.

Two instances of the symbol θn, appearing even in the same expression, typically do not represent the same number. For example, in (3.4) a (1 + θn) multiplies each of the terms χ0ψ0 and χ1ψ1, but these two instances of θn, as a rule, do not denote the same quantity.

As part of the analyses the following bounds will be useful to bound error that accumulates:

Lemma 3.4. If n, b ≥ 1 then γn ≤ γn+b and γn + γb + γnγb ≤ γn+b.

Exercise 3.5. Prove Lemma 3.4.

3.3. Target result. It is of interest to accumulate the roundoff error encountered during computation as a perturbation of input and/or output parameters:

• κ̂ = (x + δx)^T y,
• κ̂ = x^T (y + δy), and/or
• κ̂ = x^T y + δκ.


The first two are backward error results (error is accumulated onto input parameters) while the last one is a forward error result (error is accumulated onto the answer). Under different circumstances, a different error result may be needed by analyses of operations that require a dot product.

Let us focus on the second result. Ideally one would show that each of the entries of y is slightly perturbed relative to that entry:

δy = (σ0ψ0, . . . , σn−1ψn−1)^T = diag(σ0, . . . , σn−1) y = Σ^(n) y,

where each σi is "small" and Σ^(n) = diag(σ0, . . . , σn−1).

The following special case of Σ^(n) will be used in the remainder of the paper:

Σ^(n) =  the 0 × 0 matrix                   if n = 0,
         θ1                                 if n = 1,   (3.5)
         diag(θn, θn, θn−1, . . . , θ2)     otherwise.

Here, and in the future, recall that θj is an order of magnitude variable with |θj| ≤ γj.

Exercise 3.6. Let k ≥ 0 and assume that |ε1|, |ε2| ≤ u, with ε1 = 0 if k = 0. Show that

diag(I + Σ^(k), 1 + ε1)(1 + ε2) = I + Σ^(k+1).

Hint: reason the case where k = 0 separately from the case where k > 0.

We now state a theorem that captures how error is accumulated by the algorithm and give a more traditional proof.

Theorem 3.7. Let x, y ∈ R^n and let κ := x^T y be computed by executing the algorithm in Fig. 3.1. Then κ̂ = [x^T y] = x^T (I + Σ^(n)) y.

3.4. A proof in a more traditional format. In the below proof, we will pick symbols to denote vectors so that the proof can be easily related to the alternative framework to be presented in Section 3.5.

Proof: By Mathematical Induction.

Base case. m(x) = m(y) = 0. Trivial.

Inductive Step. I.H.: Assume that if xT, yT ∈ R^k then [xT^T yT] = xT^T (I + ΣT) yT, where ΣT = Σ^(k).

We will show that if xT, yT ∈ R^{k+1} the result again holds. Assume xT, yT ∈ R^{k+1} and partition

xT → (x0 ; χ1) and yT → (y0 ; ψ1).

Then

[(x0 ; χ1)^T (y0 ; ψ1)]
  = [[x0^T y0] + χ1ψ1]                                     (definition)
  = [x0^T (I + Σ0) y0 + [χ1ψ1]]                            (I.H. with xT = x0, yT = y0, and Σ0 = Σ^(k))
  = (x0^T (I + Σ0) y0 + χ1ψ1(1 + ε∗))(1 + ε+)              (SCM, twice)
  = (x0 ; χ1)^T diag(I + Σ0, 1 + ε∗)(1 + ε+) (y0 ; ψ1)     (rearrangement)
  = xT^T (I + ΣT) yT                                       (renaming),

Page 8: 1. Introduction.1. Introduction. Numerical stability analysis related to dense linear algebra operations is one of the most important topics in numerical analysis. It is part of almost

Step 1a: { Σ = 0 }
Step 4:  κ = 0
         Partition x → (xT ; xB), y → (yT ; yB), Σ → diag(ΣT, ΣB)
           where xT is empty and ΣT is 0 × 0
Step 2a: { k = m(xT) ∧ κ̂ = xT^T (I + ΣT) yT ∧ ΣT = Σ^(k) }
Step 3:  while m(xT) < m(x) do
Step 2b:   { k = m(xT) ∧ κ̂ = xT^T (I + ΣT) yT ∧ ΣT = Σ^(k) }
Step 5a:   Repartition (xT ; xB) → (x0 ; χ1 ; x2), (yT ; yB) → (y0 ; ψ1 ; y2),
             diag(ΣT, ΣB) → diag(Σ0^old, σ1^old, Σ2)
             where χ1, ψ1, and σ1^old are scalars
Step 6:    { k = m(x0) ∧ κ̂^old = x0^T (I + Σ0^old) y0 ∧ Σ0^old = Σ^(k) }
Step 8:    κ := κ + χ1ψ1
             κ̂^new = (κ̂^old + χ1ψ1(1 + ε∗))(1 + ε+)                 SCM, twice (ε+ = 0 if k = 0)
                   = (x0^T (I + Σ0^(k)) y0 + χ1ψ1(1 + ε∗))(1 + ε+)   Step 6: I.H.
                   = (x0 ; χ1)^T diag(I + Σ0^(k), 1 + ε∗)(1 + ε+) (y0 ; ψ1)   rearrange
                   = (x0 ; χ1)^T (I + Σ^(k+1)) (y0 ; ψ1)             Exercise 3.6
Step 7:    { (k + 1) = m((x0 ; χ1)) ∧ κ̂^new = (x0 ; χ1)^T (I + diag(Σ0^new, σ1^new)) (y0 ; ψ1)
             ∧ diag(Σ0^new, σ1^new) = Σ^(k+1) }
Step 5b:   Continue with (xT ; xB) ← (x0 ; χ1 ; x2), (yT ; yB) ← (y0 ; ψ1 ; y2),
             diag(ΣT, ΣB) ← diag(Σ0^new, σ1^new, Σ2)
Step 2c:   { k = m(xT) ∧ κ̂ = xT^T (I + ΣT) yT ∧ ΣT = Σ^(k) }
         enddo
Step 2d: { k = m(xT) ∧ κ̂ = xT^T (I + ΣT) yT ∧ ΣT = Σ^(k) ∧ m(xT) = m(x) }
Step 1b: { n = m(x) ∧ κ̂ = x^T (I + Σ^(n)) y }

Fig. 3.2. Error worksheet completed to establish the backward error result for the given algorithm that computes the Dot operation.

where |ε∗|, |ε+| ≤ u, ε+ = 0 if k = 0, and (I + ΣT) = diag(I + Σ0, 1 + ε∗)(1 + ε+), so that ΣT = Σ^(k+1).

By the Principle of Mathematical Induction, the result holds.

3.5. A Weapon of Math Induction: the error worksheet. We focus the reader's attention on Fig. 3.2, in which we present a framework, which we will call the error worksheet, for presenting the inductive proof of Theorem 3.7 side-by-side with the algorithm for Dot. This framework, in a slightly different form, was first introduced in [1]. The expressions enclosed by { } (in the grey boxes) are predicates that indicate the state of the variables.

The inductive proof of Theorem 3.7 is captured by the error worksheet as follows:


Base case. The predicate

κ̂ = xT^T (I + ΣT) yT ∧ k = m(xT) ∧ ΣT = Σ^(k)

holds before the loop starts and appears in Step 2a. Here and throughout "∧" stands for the logical AND operator.

Inductive step. Assume that

κ̂ = xT^T (I + ΣT) yT ∧ k = m(xT) ∧ ΣT = Σ^(k)

holds at the top of the loop at Step 2b. Then Steps 6, 7, and 8 in Fig. 3.2 prove it again holds at the bottom of the loop at Step 2c. Specifically,

• Step 6 holds since there x0 = xT, y0 = yT, and Σ0^old = ΣT.
• The update in Step 8-left introduces the error indicated in Step 8-right (SCM, twice), yielding the results for Σ0^new and σ1^new, leaving the variables in the state indicated in Step 7.
• Finally, the redefinition of ΣT in Step 5b completes the inductive step.

By the Principle of Mathematical Induction,

κ̂ = xT^T (I + ΣT) yT ∧ k = m(xT) ∧ ΣT = Σ^(k)

holds for all iterations and in particular, if the loop terminates,

κ̂ = xT^T (I + ΣT) yT ∧ k = m(xT) ∧ ΣT = Σ^(k),

with m(xT) = m(x), holds after the loop.

Definition 3.8. We will call the predicate involving the operands and error operands in Steps 2a–d the error-invariant for the analysis. Notice that this predicate is true before and after each iteration of the loop.

The reader will likely think that the error worksheet is like a sledgehammer that is used to kill a fly (the analysis of Dot). In later sections we will see that this same sledgehammer can also be used to kill a cow (the analysis of the solution of a triangular system), an ox (the analysis of a standard algorithm for the LU factorization of a matrix) and an elephant (the analysis of a blocked algorithm for the LU factorization of a matrix).

3.6. Results. We now state a number of useful consequences of Theorem 3.7. These will be used later as an inventory (library) of error results from which to draw when analyzing operations and algorithms that utilize Dot.

The goals set in Section 3.3, formalized with bounds on the perturbations, areconsequences of Theorem 3.7:

Corollary 3.9. Under the assumptions of Theorem 3.7 the following hold:
R1. κ̂ = (x + δx)^T y, where |δx| ≤ max(γ2, γn)|x|,
R2. κ̂ = x^T (y + δy), where |δy| ≤ max(γ2, γn)|y|, and
R3. κ̂ = x^T y + δκ, where |δκ| ≤ max(γ2, γn)|x|^T |y|.

Proof: We leave the proofs of R1 and R2 as an exercise. For R3, let δκ = x^T Σ^(n) y, where Σ^(n) is as in Theorem 3.7. Then

|δκ| = |x^T Σ^(n) y| ≤ |χ0||θn||ψ0| + |χ1||θn||ψ1| + · · · + |χn−1||θ2||ψn−1|
     ≤ γn|χ0||ψ0| + γn|χ1||ψ1| + · · · + γ2|χn−1||ψn−1| ≤ max(γ2, γn)|x|^T |y|.
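Result R3 can be verified numerically; the sketch below (Python assumed) runs the left-to-right dot product of Fig. 3.1 in double precision, computes δκ = κ̂ − x^T y exactly with rationals, and checks |δκ| ≤ γn |x|^T |y|:

```python
from fractions import Fraction
import random

u = Fraction(1, 2 ** 53)                 # unit roundoff (IEEE double assumed)
def gamma(n):
    return n * u / (1 - n * u)           # gamma_n of Definition 3.3

def dot(x, y):                           # left-to-right order of Fig. 3.1
    kappa = 0.0
    for chi, psi in zip(x, y):
        kappa = kappa + chi * psi
    return kappa

random.seed(3)
n = 100
x = [random.uniform(-1.0, 1.0) for _ in range(n)]
y = [random.uniform(-1.0, 1.0) for _ in range(n)]

kappa_hat = Fraction(dot(x, y))
exact = sum(Fraction(c) * Fraction(p) for c, p in zip(x, y))
abs_xy = sum(abs(Fraction(c)) * abs(Fraction(p)) for c, p in zip(x, y))
assert abs(kappa_hat - exact) <= gamma(n) * abs_xy   # R3, with max(g2, gn) = gn
```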


Algorithm Ltrsv: Compute x s.t. Lx = b

  Partition L → (LTL 0 ; LBL LBR), x → (xT ; xB), b → (bT ; bB)
    where LTL, xT, and bT are empty
  while m(xT) < m(x) do
    Repartition
      (LTL 0 ; LBL LBR) → (L00 0 0 ; l10^T λ11 0 ; L20 l21 L22),
      (xT ; xB) → (x0 ; χ1 ; x2), (bT ; bB) → (b0 ; β1 ; b2)
      where λ11, χ1, and β1 are scalars
    χ1 := (β1 − l10^T x0)/λ11
    Continue with
      (LTL 0 ; LBL LBR) ← (L00 0 0 ; l10^T λ11 0 ; L20 l21 L22),
      (xT ; xB) ← (x0 ; χ1 ; x2), (bT ; bB) ← (b0 ; β1 ; b2)
  endwhile

Fig. 4.1. An algorithm for solving Lx = b.

Exercise 3.10. Prove R1 and R2.

The corollary holds true (with minor tightening of the bounds) when λ = 1, as will be encountered when, in the operation that computes the solution of a triangular system, the matrix has ones on the diagonal.

4. Analysis of an Algorithm for Solving a Triangular System and an Introduction to Goal-Oriented Error Analysis. In this section, we discuss a goal-oriented methodology for analyzing the error accumulated by an algorithm when executed under the model of floating point arithmetic given in Section 2.2. We introduce it hand-in-hand with the analysis of an algorithm for the solution of a lower triangular linear system, which serves both as an illustration of the methodology and is itself of interest as a key operation encountered when solving linear systems.

4.1. An algorithm for solving a lower triangular linear system. The solution of a lower triangular system of equations (Ltrsv) can be stated as Lx = b, where L ∈ R^{n×n} is a nonsingular lower triangular matrix; L and b are input operands and x is the output operand. In Fig. 4.1 we show a specific algorithm for this operation. This particular algorithm assumes that the state (loop-invariant in the terminology of [5, ?]) LTL xT = bT is maintained in the variables before and after each iteration of the loop.
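In scalar form the algorithm of Fig. 4.1 is forward substitution; each iteration computes χ1 := (β1 − l10^T x0)/λ11 using the part of x already computed. A sketch (Python assumed, rows of L as lists):

```python
def ltrsv(L, b):
    """Solve L x = b for nonsingular lower triangular L (forward substitution),
    mirroring Fig. 4.1: chi_1 := (beta_1 - l10^T x0) / lambda_11."""
    x = []
    for i in range(len(b)):
        l10_x0 = sum(L[i][j] * x[j] for j in range(i))  # l10^T x0, a Dot
        x.append((b[i] - l10_x0) / L[i][i])
    return x

L = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, -1.0, 5.0]]
b = [4.0, 10.0, 9.0]
assert ltrsv(L, b) == [2.0, 2.0, 1.0]
```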

4.2. Preparation. The following error result for computing [(α + β)/λ] will be needed for the error analysis of the solution of a triangular system. In that setting, β is the result of a dot product.

Lemma 4.1. Let α, β and λ be scalars and consider the assignment σ := (α + β)/λ. The following three relations hold:
R1. σ̂ = ((α + δα) + (β + δβ))/λ, where |δα| ≤ γ2|α| and |δβ| ≤ γ2|β|;
R2. σ̂ = (α + β)/(λ + δλ), where |δλ| ≤ γ2|λ|;
R3. σ̂λ = α + β + ε, with |ε| ≤ γ2|σ̂||λ|.

If in the above assignment λ = 1, then the results hold with γ1 in place of γ2.


Proof: R1 follows directly from the definition of the SCM. R2 follows directly from the definition of the ACM (twice) and Lemma 3.1:

σ = [(α + β)/λ] = [[α + β]/λ] = [(α + β)/(λ(1 + ε+))]
  = (α + β)/(λ(1 + ε+)(1 + ε/)) = (α + β)/(λ(1 + θ2)) = (α + β)/(λ + δλ), where δλ = λθ2.

R3 is an immediate consequence of R2.

R1 and R2 state that the computed result σ is equal to the exact computation with slightly perturbed inputs. They indicate how the error generated in performing such an assignment can be "thrown back" in different ways at the α, β, and/or λ input variables. In other words, they establish that the computation of [(α + β)/λ] is backward stable. R3 is an example of a bound on the forward error (α + β)/λ − σ, formulated in a way that avoids problems if λ = 0.
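Because Lemma 4.1 is a statement about standard floating point arithmetic, it can be spot-checked numerically. The following sketch (ours, not part of the paper) verifies R3 in IEEE double precision (unit roundoff u = 2^−53), measuring the error exactly with rational arithmetic:

```python
import random
from fractions import Fraction

U = Fraction(1, 2**53)                 # unit roundoff, IEEE double precision
def gamma(n):                          # gamma_n = n u / (1 - n u), exactly
    return n * U / (1 - n * U)

def r3_holds(alpha, beta, lam):
    """R3: sigma*lam = alpha + beta + eps with |eps| <= gamma_2 |sigma||lam|,
    where sigma is the computed [(alpha + beta)/lam]."""
    sigma = (alpha + beta) / lam       # floating-point evaluation
    eps = Fraction(sigma) * Fraction(lam) - (Fraction(alpha) + Fraction(beta))
    return abs(eps) <= gamma(2) * abs(Fraction(sigma)) * abs(Fraction(lam))

random.seed(0)
trials = ((random.uniform(-1, 1), random.uniform(-1, 1),
           random.choice([-1, 1]) * random.uniform(0.5, 2.0))
          for _ in range(10000))
assert all(r3_holds(a, b, l) for a, b, l in trials)
```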

The following theorem prepares us for the analyses of algorithms that use the Dot as a suboperation.

Theorem 4.2. Consider the assignments

μ := α − (xT y)   and   ν := (α − (xT y))/λ,

where α, λ, μ, ν ∈ R and x, y ∈ Rn, and the parentheses identify the order of evaluation. Then
R1. xT y + μ = α + δα, where |δα| ≤ γn+1|x|T |y| + γ1|α| ≤ γn+1(|x|T |y| + |α|).
R2. xT y + μ = α + δμ, where |δμ| ≤ γn|x|T |y| + γ1|μ| ≤ γn(|x|T |y| + |μ|).
R3. xT y + νλ = α + δα, where |δα| ≤ γn|x|T |y| + γ2|ν||λ| ≤ max(γ2, γn)(|x|T |y| + |ν||λ|).
R4. (x + δx)T y + (λ + δλ)ν = α, where |δx| ≤ γn|x| and |δλ| ≤ γ2|λ|.
R5. (x + δx)T y + (λ + δλ)ν = α, where |(δx; δλ)| ≤ γn+1 |(x; λ)|.

Proof: R1 follows from Theorem 3.7 and the SCM: μ = (α − xTΣ(n)y)(1 + ε1), algebraic manipulation, and bounding. R2 follows from Theorem 3.7 and the ACM: μ = (α − xTΣ(n)y)/(1 + ε2), algebraic manipulation, and bounding. R3 and R4 follow from similarly invoking Theorem 3.7 and Lemma 4.1 R2. R5 follows from R4 if n ≥ 1, since then γ2, γn ≤ γn+1. When n = 0 it follows from the ACM.

Exercise 4.3. Work out the details of the proof of Theorem 4.2.
Exercise 4.4. Under the conditions of Theorem 4.2, show that
1. (x + δx)T y + λν = α + δα, where |δα| ≤ γ2|α| and |δx| ≤ γn+2|x|;
2. (x + δx)T y + (λ + δλ)ν = α + δα, where |δα| ≤ γ1|α|, |δx| ≤ γn+1|x|, and |δλ| ≤ γ1|λ|.
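Theorem 4.2 R1 can be spot-checked in the same spirit. The sketch below (ours, not part of the paper) computes μ := α − xT y with a left-to-right inner product and verifies the bound exactly with rational arithmetic:

```python
import random
from fractions import Fraction

U = Fraction(1, 2**53)                     # unit roundoff, IEEE double
gamma = lambda n: n * U / (1 - n * U)      # gamma_n = n u / (1 - n u)

def r1_holds(alpha, x, y):
    """Theorem 4.2 R1: x^T y + mu = alpha + d_alpha with
    |d_alpha| <= gamma_{n+1}(|x|^T|y| + |alpha|), for mu := alpha - x^T y."""
    dot = 0.0
    for xi, yi in zip(x, y):
        dot += xi * yi                     # left-to-right floating-point dot
    mu = alpha - dot
    exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
    d_alpha = exact + Fraction(mu) - Fraction(alpha)
    absdot = sum(abs(Fraction(a) * Fraction(b)) for a, b in zip(x, y))
    return abs(d_alpha) <= gamma(len(x) + 1) * (absdot + abs(Fraction(alpha)))

random.seed(1)
vec = lambda n: [random.uniform(-1, 1) for _ in range(n)]
assert all(r1_holds(random.uniform(-1, 1), vec(8), vec(8)) for _ in range(2000))
```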

4.3. Analysis. The reader should pretend that, at this point of the discussion, all the grey-shaded boxes in Fig. 4.2 are empty, as is the analysis in Step 8-right, since we are going to constructively derive the contents of those fields, in the order indicated in the "Steps" column.

Step 1: The error-precondition and error-postcondition. The first question is what error result is desired. We will show that the computed solution, x, satisfies the backward error result (L + ∆L)x = b with |∆L| ≤ γn|L|.

The reasoning for the factor γn is as follows: we would expect the maximal error to be incurred during the final iteration, when computing χ1 = (β1 − lT10x0)/λ11. Notice



                            Error side                                     Step
{ ∆L = 0 }                                                                  1a
Partition
  L → (LTL 0; LBL LBR), x → (xT; xB), b → (bT; bB), ∆L → (∆LTL 0; ∆LBL ∆LBR)
  where LTL, bT, xT, and ∆LTL are empty                                     4
{ k = m(xT) ∧ (LTL + ∆LTL)xT = bT ∧ |∆LTL| ≤ γk|LTL| }                      2a
while m(xT) < m(x) do                                                       3
  { k = m(xT) ∧ (LTL + ∆LTL)xT = bT ∧ |∆LTL| ≤ γk|LTL| }                    2b
  Repartition
    (LTL 0; LBL LBR) → (L00 0 0; lT10 λ11 0; L20 l21 L22),
    (xT; xB) → (x0; χ1; x2), (bT; bB) → (b0; β1; b2),
    (∆LTL 0; ∆LBL ∆LBR) → (∆L00 0 0; δlT10 δλ11 0; ∆L20 δl21 ∆L22)
    where λ11, δλ11, χ1, and β1 are scalars                                 5a
  { k = m(x0) ∧ (L00 + ∆L00)x0 = b0 ∧ |∆L00| ≤ γk|L00| }                    6
  χ1 := (β1 − lT10 x0)/λ11
      b0 = (L00 + ∆L00)x0 ∧ |∆L00| ≤ γk|L00|               (Step 6: I.H.)
      β1 = (lT10 + δlT10)x0 + (λ11 + δλ11)χ1
        ∧ |(δlT10 δλ11)| ≤ γk+1 |(lT10 λ11)|               (Theorem 4.2 R5)  8
  { k + 1 = m((x0; χ1))
    ∧ ((L00 0; lT10 λ11) + (∆L00 0; δlT10 δλ11))(x0; χ1) = (b0; β1)
    ∧ |(∆L00 0; δlT10 δλ11)| ≤ γk+1 |(L00 0; lT10 λ11)| }                   7
  Continue with
    (∆LTL 0; ∆LBL ∆LBR) ← (∆L00 0 0; δlT10 δλ11 0; ∆L20 δl21 ∆L22), . . .   5b
  { k = m(xT) ∧ (LTL + ∆LTL)xT = bT ∧ |∆LTL| ≤ γk|LTL| }                    2c
enddo
{ k = m(xT) ∧ (LTL + ∆LTL)xT = bT ∧ |∆LTL| ≤ γk|LTL| ∧ m(xT) = m(x) }       2d
{ n = m(x) ∧ (L + ∆L)x = b ∧ |∆L| ≤ γn|L| }                                 1b

Fig. 4.2. Error worksheet for deriving the backward stability error result for the algorithm in Fig. 4.1.

that this involves a dot product with vectors of length n − 1. Thus, from Theorem 4.2 R5, it seems reasonable to expect the indicated factor.

In practice, the analysis typically proceeds in two stages. In the first stage, the analysis proceeds without the error bounds. It is merely used to show that there is a ∆L such that (L + ∆L)x = b. In the process, it becomes obvious which error results regarding suboperations are used to demonstrate the existence of ∆L. This in turn allows one to make an educated guess as to the bound that can be established. Subsequent to this, the analysis as presented below proceeds. For space considerations, we have incorporated the proof of the bounds in the derivation presented below.

We will call the predicate that describes the state of the error operand before the algorithm is executed the error-precondition, and the predicate that describes the target error result the error-postcondition. In Fig. 4.2 these are given in Steps 1a



and 1b by {∆L = 0} and {n = m(x) ∧ (L+ ∆L)x = b ∧ |∆L| ≤ γn|L|}, respectively.

Step 2: The error-invariant. Next, we decide what the state of the error operands, the error-invariant, should be before and after the execution of each iteration. It makes sense that this equals the error-postcondition restricted to the computation performed so far. The algorithm has the property that before and after each iteration the contents of xT are described by the predicate {LTLxT = bT}. Thus, we expect the accumulated error to satisfy the error-invariant

{k = m(xT ) ∧ (LTL + ∆LTL)xT = bT ∧ |∆LTL| ≤ γk|LTL|},

which appears in Steps 2a–d.

Step 3: The loop-guard. The loop-guard is determined as part of the derivation of the algorithm and there is nothing to be done for this step here. We retain this step number so that the error worksheet closely resembles the worksheet for deriving the algorithm.

Step 4: Initialization. In Step 4, the error operand, ∆L, is partitioned conformal to the operands in the algorithm.

Step 5: Moving through the error operand. In Steps 5a and 5b, the error operand is repartitioned conformal to the repartitioning of the operands in the algorithm.

Step 6: State of the variables before the update in Step 8. Step 5a relabels

LTL → L00, xT → x0, bT → b0, and ∆LTL → ∆L00.

Thus, given the state of the variables in Step 2b, the contents of the submatrices and subvectors exposed in Step 5a are described by

k = m(x0) ∧ (L00 + ∆L00)x0 = b0 ∧ |∆L00| ≤ γk|L00| (4.1)

at Step 6. This replacement of symbols is called textual substitution.

Step 7: State of the variables after the update in Step 8. At the bottom of the loop, the variables must again be in a state where the loop-invariant holds, as indicated by Step 2c. In Step 5b, the different quadrants of L and ∆L, and the subvectors of x and b, are redefined so that

LTL ← (L00 0; lT10 λ11),  xT ← (x0; χ1),  bT ← (b0; β1),  and  ∆LTL ← (∆L00 0; δlT10 δλ11).

Thus, given the state in which the variables must be in Step 2c, the contents of the submatrices and subvectors before Step 5b must be

k + 1 = m((x0; χ1)) ∧ ((L00 0; lT10 λ11) + (∆L00 0; δlT10 δλ11))(x0; χ1) = (b0; β1)   (4.2)
∧ |(∆L00 0; δlT10 δλ11)| ≤ γk+1 |(L00 0; lT10 λ11)|   (4.3)

at Step 7. Again, this predicate is attained via simple textual substitution.


Step 8: The inductive step. The question now is whether the update

χ1 := (β1 − lT10x0)/λ11,

together with an appropriate error result for this operation and a clever choice of updates to ∆L00, δlT10, and δλ11, will take us from the state described in (4.1) to the state described in (4.2)–(4.3).

Manipulating (4.2)–(4.3), we find that we must choose the error variables to satisfy

b0 = (L00 + ∆L00)x0 ∧ |∆L00| ≤ γk+1|L00|
β1 = (lT10 + δlT10)x0 + (λ11 + δλ11)χ1 ∧ |(δlT10 δλ11)| ≤ γk+1 |(lT10 λ11)|.

Since L00, x0, and b0 are not updated in Step 8-left, the top constraint is satisfied by virtue of (4.1). (In the language of an inductive proof: this is true by the I.H.) For the second, we consult our inventory of error results and find Theorem 4.2 Result R5. The analysis is also given in Step 8-right of Fig. 4.2.

Wrapping it all up. Fig. 4.2, together with the above discussion, proves R1 of

Theorem 4.5. Let L ∈ Rn×n be a nonsingular lower triangular matrix and x, b ∈ Rn. Assume that the x that solves Lx = b is computed via the algorithm in Fig. 4.2 (left side). Then
R1. x solves (L + ∆L)x = b, where |∆L| ≤ γn|L|.
R2. x satisfies Lx = b + δb, where |δb| ≤ γn|L||x|.

Exercise 4.6. Prove Result R2 of the above theorem.
Exercise 4.7. Prove Theorem 4.5 with γn replaced by max(γ2, γn−1). To do so, carefully consider what changes must be made to Section 4.3 and Fig. 4.2, and what result from Theorem 4.2 should be used.
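Result R2 of Theorem 4.5 can be spot-checked numerically. In the sketch below (ours, not part of the paper), plain forward substitution stands in for the algorithm of Fig. 4.1, and the residual is measured exactly with rational arithmetic:

```python
import random
from fractions import Fraction

U = Fraction(1, 2**53)                     # unit roundoff, IEEE double
gamma = lambda n: n * U / (1 - n * U)

def ltrsv(L, b):                           # forward substitution
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        x[i] = (b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
    return x

def r2_holds(L, b):
    """Theorem 4.5 R2: |L xhat - b| <= gamma_n |L||xhat| elementwise."""
    n, x = len(b), ltrsv(L, b)
    for i in range(n):
        resid = sum(Fraction(L[i][j]) * Fraction(x[j])
                    for j in range(i + 1)) - Fraction(b[i])
        bound = gamma(n) * sum(abs(Fraction(L[i][j]) * Fraction(x[j]))
                               for j in range(i + 1))
        if abs(resid) > bound:
            return False
    return True

random.seed(2)
n = 6
L = [[random.uniform(1.0, 2.0) if j == i else
      (random.uniform(0.5, 2.0) if j < i else 0.0) for j in range(n)]
     for i in range(n)]
b = [random.uniform(-1, 1) for _ in range(n)]
assert r2_holds(L, b)
```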

4.4. Other algorithmic variants. There is another algorithmic variant for computing the triangular solve [1], for which the error accumulates slightly differently if b is overwritten by the solution x without the use of extra workspace. Theorems for that algorithm can be derived in a similar manner [1].

5. Analysis of an unblocked algorithm for LU factorization. Having introduced the methodology for deriving analyses as well as a few error results for simpler operations, we now move on to a more difficult operation, the LU factorization of a square matrix.

5.1. An algorithm for computing the LU factorization. We analyze what is often called the Crout variant [3]. This choice allows the reader, if so inclined, to compare and contrast our exposition with the one in the book by Nick Higham [6]. This variant computes L and U so that before and after each iteration LTL, UTL, LBL, and UTR satisfy

(LTLUTL = ATL   LTLUTR = ATR;
 LBLUTL = ABL   no computation has occurred with ABR).

This algorithm is given in Fig. 5.1.

5.2. Preparation. The Crout variant casts the computation in terms of uT12 := aT12 − lT10U02 and l21 := (a21 − L20u01)/υ11. Both of these involve matrix-vector multiplication, since the first is equivalent to u12 := a12 − UT02 l10. The following theorem paves the way for the analysis of this LU factorization algorithm.



Algorithm: LU = A

Partition A → (ATL ATR; ABL ABR), L → (LTL 0; LBL LBR), U → (UTL UTR; 0 UBR)
  where ATL, LTL, UTL are 0 × 0
while m(ABR) > 0 do
  Repartition
    (ATL ATR; ABL ABR) → (A00 a01 A02; aT10 α11 aT12; A20 a21 A22),
    (LTL 0; LBL LBR) → (L00 0 0; lT10 1 0; L20 l21 L22), · · ·
    where α11 is a scalar
  υ11 := α11 − lT10 u01
  uT12 := aT12 − lT10 U02
  l21 := (a21 − L20 u01)/υ11
  Continue with
    (ATL ATR; ABL ABR) ← (A00 a01 A02; aT10 α11 aT12; A20 a21 A22),
    (LTL 0; LBL LBR) ← (L00 0 0; lT10 1 0; L20 l21 L22), · · ·
endwhile

Fig. 5.1. The unblocked Crout variant for computing the LU factorization of a matrix.
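For concreteness, a plain-Python transcription of the Crout variant of Fig. 5.1 (ours; the function name and list-of-lists representation are our own choices, and no pivoting is performed):

```python
def lu_crout(A):
    """Crout variant of LU factorization (cf. Fig. 5.1), no pivoting.
    Step i computes upsilon11 := alpha11 - l10^T u01 (stored in Umat[i][i]),
    u12^T := a12^T - l10^T U02, and l21 := (a21 - L20 u01)/upsilon11.
    L has a unit diagonal."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    Umat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):      # row i of U, including the pivot
            Umat[i][j] = A[i][j] - sum(L[i][k] * Umat[k][j] for k in range(i))
        for j in range(i + 1, n):  # column i of L below the diagonal
            L[j][i] = (A[j][i] - sum(L[j][k] * Umat[k][i]
                                     for k in range(i))) / Umat[i][i]
    return L, Umat
```

For example, lu_crout([[4.0, 3.0], [6.0, 3.0]]) yields L = [[1, 0], [1.5, 1]] and U = [[4, 3], [0, -1.5]].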

Error results for matrix-vector multiplication. Theorem 5.1. Let A ∈ Rm×n, x ∈ Rn, y, v, w ∈ Rm, and λ ∈ R. Assume that v := y − Ax and w := (y − Ax)/λ are computed one element at a time by taking inner products of the corresponding row of A and x and subtracting the result from the corresponding element of y, and then dividing it by λ in the second operation. Then
R1. Ax + v = y + δy, where |δy| ≤ γn+1|A||x| + γ1|y| ≤ γn+1(|A||x| + |y|).
R2. Ax + v = y + δv, where |δv| ≤ γn|A||x| + γ1|v| ≤ γn(|A||x| + |v|).
R3. Ax + λw = y + δy, where |δy| ≤ γn|A||x| + γ2|λ||w| ≤ max(γ2, γn)(|A||x| + |λ||w|).
R4. Ax + λw = y + δy, where |δy| ≤ γn+1(|A||x| + |λ||w|).
R5. (A + ∆A)x + λw = y + δy, where |∆A| ≤ γn+1|A| and |δy| ≤ γ2|y|.

Proof: All these results follow from Theorem 4.2.
Exercise 5.2. Prove Theorem 5.1.

5.3. Analysis. We will show that the computed factors L and U satisfy the equality LU = A + ∆A with |∆A| ≤ γn|L||U|. This is given in Step 1b. What this implies about stability is discussed in Section 6.

The justification for the bound is not as straightforward as it was for the solution of the triangular system. The factor γn comes from the fact that each element of L and U is exposed to the effects of at most n individual operations. In addition, experience tells us that bounds for the backward error of various algorithms for LU factorization have the form C1γn|A| + C2γn|L||U|, where C1 ≥ 0 and C2 ≥ 1. Strictly speaking, one wants to start the analysis with this more general bound and then refine it once it becomes obvious which constants fit the analysis. For space considerations, we take the easy way out and start our analysis knowing what the bound should be.

In Fig. 5.2, we show both the algorithm (on the left) and the error side of the


                            Error side                                     Step
{ }                                                                         1a
Partition
  A → (ATL ATR; ABL ABR), L → (LTL 0; LBL LBR), . . . ,
  ∆A → (∆ATL ∆ATR; ∆ABL ∆ABR)
  where ATL, LTL, . . . are empty                                           4
{ k = m(ATL) ∧ (LTLUTL = ATL + ∆ATL   LTLUTR = ATR + ∆ATR;
                LBLUTL = ABL + ∆ABL)
  ∧ (|∆ATL| ≤ γk|LTL||UTL|   |∆ATR| ≤ γk|LTL||UTR|;
     |∆ABL| ≤ γk|LBL||UTL|) }                                               2a
while m(ATL) < m(A) do                                                      3
  Repartition
    (ATL ATR; ABL ABR) → (A00 a01 A02; aT10 α11 aT12; A20 a21 A22), . . . ,
    (∆ATL ∆ATR; ∆ABL ∆ABR) → (∆A00 δa01 ∆A02; δaT10 δα11 δaT12; ∆A20 δa21 ∆A22)
    where α11, υ11, . . . are scalars                                       5a
  { k = m(A00)
    ∧ (L00U00 = A00 + ∆A00   L00 (u01 U02) = (a01 A02) + (δa01 ∆A02);
       (lT10; L20) U00 = (aT10; A20) + (δaT10; ∆A20))
    ∧ (|∆A00| ≤ γk|L00||U00|   |(δa01 ∆A02)| ≤ γk|L00||(u01 U02)|;
       |(δaT10; ∆A20)| ≤ γk |(lT10; L20)||U00|) }                           6
  υ11 := α11 − lT10 u01      lT10 u01 + υ11 = α11 + δα11
                             ∧ |δα11| ≤ γk+1(|lT10||u01| + |υ11|)   (Theorem 4.2 R2)
  uT12 := aT12 − lT10 U02    lT10 U02 + uT12 = aT12 + δaT12
                             ∧ |δaT12| ≤ γk+1(|lT10||U02| + |uT12|) (Theorem 5.1 R2)
  l21 := (a21 − L20 u01)/υ11 L20 u01 + l21 υ11 = a21 + δa21
                             ∧ |δa21| ≤ γk+1(|L20||u01| + |l21||υ11|) (Theorem 5.1 R4)  8
  { k + 1 = m((A00 a01; aT10 α11))
    ∧ ((L00 0; lT10 1)(U00 u01; 0 υ11) = (A00 a01; aT10 α11) + (∆A00 δa01; δaT10 δα11)
       (L00 0; lT10 1)(U02; uT12) = (A02; aT12) + (∆A02; δaT12)
       (L20 l21)(U00 u01; 0 υ11) = (A20 a21) + (∆A20 δa21))
    ∧ (|(∆A00 δa01; δaT10 δα11)| ≤ γk+1 |(L00 0; lT10 1)||(U00 u01; 0 υ11)|
       |(∆A02; δaT12)| ≤ γk+1 |(L00 0; lT10 1)||(U02; uT12)|
       |(δaT10; ∆A20)| ≤ γk+1 |(lT10; L20)||U00|) }                         7
  Continue with
    (∆ATL ∆ATR; ∆ABL ∆ABR) ← (∆A00 δa01 ∆A02; δaT10 δα11 δaT12; ∆A20 δa21 ∆A22), . . .  5b
enddo
{ n = m(A) ∧ LU = A + ∆A ∧ |∆A| ≤ γn|L||U| }                                1b

Fig. 5.2. LU = A. Right side of the error worksheet for proving the backward stability of the LU factorization computed via the Crout variant. Step 6 contains the inductive hypothesis and Step 7 the thesis. The center field of Step 8 contains the error analysis for the computational updates and the right field shows the assignments that maintain the error-invariant true.



error worksheet, with the error-invariant given in Step 2:

k = m(ATL) ∧ (LTLUTL = ATL + ∆ATL   LTLUTR = ATR + ∆ATR;
              LBLUTL = ABL + ∆ABL)
∧ (|∆ATL| ≤ γk|LTL||UTL|   |∆ATR| ≤ γk|LTL||UTR|;
   |∆ABL| ≤ γk|LBL||UTL|),

which results from restricting the target error result to the computation already performed before and after each iteration. (Note that this error-invariant previously appears in four different places in the error worksheet. To save space we only list it before the loop.) Given the error-invariant, Steps 4–7 are prescribed by the procedure outlined in Section 4. In particular, the predicates indicated in Steps 6 and 7 follow immediately from this error-invariant by substituting the submatrices from Steps 5a and 5b, respectively. We now show that there is an analysis that takes one from Step 6 to Step 7, the Inductive Step.

Algebraic manipulation of Step 7 yields

k + 1 = m((A00 a01; aT10 α11)) ∧   (5.1)
(L00U00 = A00 + ∆A00      L00u01 = a01 + δa01             L00U02 = A02 + ∆A02;
 lT10U00 = aT10 + δaT10   lT10u01 + υ11 = α11 + δα11      lT10U02 + uT12 = aT12 + δaT12;
 L20U00 = A20 + ∆A20      L20u01 + l21υ11 = a21 + δa21) ∧
(|∆A00| ≤ γk+1|L00||U00|     |δa01| ≤ γk+1|L00||u01|                  |∆A02| ≤ γk+1|L00||U02|;
 |δaT10| ≤ γk+1|lT10||U00|   |δα11| ≤ γk+1(|lT10||u01| + |υ11|)       |δaT12| ≤ γk+1(|lT10||U02| + |uT12|);
 |∆A20| ≤ γk+1|L20||U00|     |δa21| ≤ γk+1(|L20||u01| + |l21||υ11|)).

This is the predicate that must be true at Step 7 if the I.H. (Step 2) is to again be true at the bottom of the loop (in Step 2c, had it been listed).

Step 2 tells us that the predicate in Step 6 is true. Algebraic manipulation of this predicate yields

k = m(A00) ∧
(L00U00 = A00 + ∆A00      L00u01 = a01 + δa01   L00U02 = A02 + ∆A02;
 lT10U00 = aT10 + δaT10;
 L20U00 = A20 + ∆A20) ∧
(|∆A00| ≤ γk|L00||U00|     |δa01| ≤ γk|L00||u01|   |∆A02| ≤ γk|L00||U02|;
 |δaT10| ≤ γk|lT10||U00|;
 |∆A20| ≤ γk|L20||U00|).

Now, the constraints on the values in the submatrices that are not highlighted in (5.1) are already satisfied in Step 6 ("by the I.H.", in the terminology of an inductive proof), since γk ≤ γk+1. We merely need to show that there exist δα11, δaT12, and δa21 that satisfy the constraints in the grey-shaded boxes.

We now examine the error introduced by the computational updates to determine how error is contributed to each of these variables:

Determining δα11: The assignment υ11 := α11 − lT10u01 and Theorem 4.2 R2 imply that there exists δα11 such that

lT10u01 + υ11 = α11 + δα11, where |δα11| ≤ γk(|lT10||u01| + |υ11|) ≤ γk+1(|lT10||u01| + |υ11|).

Determining δaT12: uT12 := aT12 − lT10U02 and Theorem 5.1 imply that

lT10U02 + uT12 = aT12 + δaT12, where |δaT12| ≤ γk(|lT10||U02| + |uT12|) ≤ γk+1(|lT10||U02| + |uT12|).


Determining δa21: l21 := (a21 − L20u01)/υ11 and Theorem 5.1 R4 yield

L20u01 + υ11l21 = a21 + δa21, where |δa21| ≤ γk+1(|L20||u01| + |l21||υ11|).

This completes the proof of the Inductive Step. It is also listed in Step 8 of Fig. 5.2.

5.4. Results. The above discussion, summarized in Fig. 5.2, proves the following theorem:

Theorem 5.3. Let L and U be computed via the Crout variant for computing the LU factorization, as given in Fig. 5.1. Then the computed factors L and U satisfy LU = A + ∆A, where |∆A| ≤ γn|L||U|.

The backward stability result in this theorem agrees with the one in [6].
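Theorem 5.3 can be spot-checked elementwise. The sketch below (ours, not part of the paper) factors a diagonally dominant matrix, so that no pivoting is needed, with a plain-Python Crout variant, and verifies |LU − A| ≤ γn|L||U| using exact rational arithmetic:

```python
import random
from fractions import Fraction

U_ = Fraction(1, 2**53)                  # unit roundoff, IEEE double
gamma = lambda n: n * U_ / (1 - n * U_)

def lu_crout(A):                         # unblocked Crout variant (cf. Fig. 5.1)
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    Um = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            Um[i][j] = A[i][j] - sum(L[i][k] * Um[k][j] for k in range(i))
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * Um[k][i]
                                     for k in range(i))) / Um[i][i]
    return L, Um

def thm53_holds(A):
    """Check |LU - A| <= gamma_n |L||U| elementwise, in exact rationals."""
    n = len(A)
    L, Um = lu_crout(A)
    for i in range(n):
        for j in range(n):
            lu = sum(Fraction(L[i][k]) * Fraction(Um[k][j]) for k in range(n))
            bnd = gamma(n) * sum(abs(Fraction(L[i][k]) * Fraction(Um[k][j]))
                                 for k in range(n))
            if abs(lu - Fraction(A[i][j])) > bnd:
                return False
    return True

random.seed(3)
n = 5
A = [[random.uniform(-1, 1) + (2.0 * n if i == j else 0.0) for j in range(n)]
     for i in range(n)]                  # diagonally dominant
assert thm53_holds(A)
```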

5.5. Other algorithmic variants. The described approach can be similarly applied to the other four variants for computing the LU factorization. In [?] it is shown how to use the error worksheet to derive a similar error result for Variant 1.

5.6. Solution of a linear system. The result of an LU factorization is most often used to solve a linear system of equations: if Ax = b, then L(Ux) = b, so that, after factoring A, the triangular solves Lz = b followed by Ux = z can be performed, yielding the solution vector x. The following theorem summarizes how the computed solution, x, is the exact solution of a perturbed linear system (A + ∆A)x = b:

Theorem 5.4. Let A be a nonsingular matrix. Let L and U be the LU factorization computed by the algorithm in Fig. 5.1. Let z be the solution to Lz = b computed with the algorithm in Fig. 4.1. Let x be the solution to Ux = z computed with an algorithm for solving an upper triangular system similar to the algorithm in Fig. 4.1. Then x satisfies (A + ∆A)x = b with |∆A| ≤ (3γn + γn²)|L||U|.

Proof: From previous sections we have learned that

LU = A + E1     ∧ |E1| ≤ γn|L||U|   (Theorem 5.3)
(L + E2)z = b   ∧ |E2| ≤ γn|L|      (Theorem 4.5)
(U + E3)x = z   ∧ |E3| ≤ γn|U|      (Theorem 4.5).

Now,

b = (L + E2)(U + E3)x = (LU + LE3 + E2U + E2E3)x = (A + ∆A)x,

where ∆A = E1 + LE3 + E2U + E2E3 and

|∆A| ≤ |E1| + |L||E3| + |E2||U| + |E2||E3|
     ≤ γn|L||U| + γn|L||U| + γn|L||U| + γn²|L||U| = (3γn + γn²)|L||U|.
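The bound of Theorem 5.4 can likewise be spot-checked. The sketch below (ours, not part of the paper) factors, performs the two triangular solves, and verifies |Ax̂ − b| ≤ (3γn + γn²)|L||U||x̂| elementwise with exact rational arithmetic:

```python
import random
from fractions import Fraction

U_ = Fraction(1, 2**53)                  # unit roundoff, IEEE double
gamma = lambda n: n * U_ / (1 - n * U_)

def lu_crout(A):                         # unblocked Crout variant (cf. Fig. 5.1)
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    Um = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            Um[i][j] = A[i][j] - sum(L[i][k] * Um[k][j] for k in range(i))
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * Um[k][i]
                                     for k in range(i))) / Um[i][i]
    return L, Um

def solve(A, b):
    """Solve Ax = b via LU, then Lz = b (forward), Ux = z (backward)."""
    n = len(b)
    L, Um = lu_crout(A)
    z = [0.0] * n
    for i in range(n):                   # forward substitution, unit diagonal
        z[i] = b[i] - sum(L[i][j] * z[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):         # backward substitution
        x[i] = (z[i] - sum(Um[i][j] * x[j]
                           for j in range(i + 1, n))) / Um[i][i]
    return x, L, Um

def thm54_holds(A, b):
    """|A xhat - b| <= (3 gamma_n + gamma_n^2) |L||U||xhat| elementwise."""
    n = len(b)
    x, L, Um = solve(A, b)
    g = 3 * gamma(n) + gamma(n) ** 2
    for i in range(n):
        resid = sum(Fraction(A[i][j]) * Fraction(x[j])
                    for j in range(n)) - Fraction(b[i])
        bnd = g * sum(sum(abs(Fraction(L[i][k]) * Fraction(Um[k][j]))
                          for k in range(n)) * abs(Fraction(x[j]))
                      for j in range(n))
        if abs(resid) > bnd:
            return False
    return True

random.seed(4)
n = 5
A = [[random.uniform(-1, 1) + (2.0 * n if i == j else 0.0) for j in range(n)]
     for i in range(n)]                  # diagonally dominant
b = [random.uniform(-1, 1) for _ in range(n)]
assert thm54_holds(A, b)
```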

6. A Comment about LU Factorization with Partial Pivoting. We now remind the reader of how the analysis of LU factorization without partial pivoting is related to that of LU factorization with partial pivoting. Invariably, it is argued that LU with partial pivoting is equivalent to the LU factorization without partial pivoting of a prepermuted matrix [?], PA = LU, where P is a permutation matrix. The permutation does not involve any floating point operations and therefore does not generate error. It is then argued that, as a result, the error that is accumulated is equivalent with or without partial pivoting.



                            Error side                                     Step
{ }                                                                         1a
B = A
Partition
  X → (XTL XTR; XBL XBR), X ∈ {A, L, U, B}, ∆A → (∆ATL ∆ATR; ∆ABL ∆ABR)
  where XTL and ∆ATL are empty                                              4
{ k = m(ATL)
  ∧ (LTLUTL = ATL + ∆ATL   LTLUTR = ATR + ∆ATR;
     LBLUTL = ABL + ∆ABL   BBR + LBLUTR = ABR + ∆ABR)
  ∧ (|∆ATL| ≤ γk+1(|ATL| + |LTL||UTL|)   |∆ATR| ≤ γk+1(|ATR| + |LTL||UTR|);
     |∆ABL| ≤ γk+1(|ABL| + |LBL||UTL|)   |∆ABR| ≤ γk+1(|ABR| + |LBL||UTR|)) }  2a
while m(ATL) < m(A) do                                                      3
  Repartition
    (XTL XTR; XBL XBR) → (X00 X01 X02; X10 X11 X12; X20 X21 X22), X ∈ {A, L, U, B},
    (∆ATL ∆ATR; ∆ABL ∆ABR) → (∆A00 ∆A01 ∆A02; ∆A10 ∆Aold11 ∆Aold12; ∆A20 ∆Aold21 ∆Aold22)
    where X11 and ∆A11 are b × b                                            5a
  { See Fig. 7.2 }                                                          6
  [L11, U11] := LU(B11)
  U12 := L11⁻¹ B12
  L21 := B21 U11⁻¹
  B22 := B22 − L21 U12             See Fig. 7.2                             8
  { See Fig. 7.2 }                                                          7
  Continue with
    (XTL XTR; XBL XBR) ← (X00 X01 X02; X10 X11 X12; X20 X21 X22), X ∈ {A, L, U, B},
    (∆ATL ∆ATR; ∆ABL ∆ABR) ← (∆A00 ∆A01 ∆A02; ∆A10 ∆Anew11 ∆Anew12; ∆A20 ∆Anew21 ∆Anew22)  5b
enddo
{ n = m(A) ∧ LU = A + ∆A ∧ |∆A| ≤ γn+1(|A| + |L||U|) }                      1b

Fig. 7.1. LU = A. Error worksheet for proving the backward stability of the blocked LU factorization computed via the right-looking variant.

7. Analysis of a Blocked Algorithm for LU Factorization. In this section we go where no analysis has gone before. We derive the backward stability result for the blocked right-looking LU factorization, which is closely related to the most frequently used blocked algorithm for LU factorization with partial pivoting. Blocked algorithms cast computation in terms of matrix-matrix operations, like matrix-matrix multiplication, in order to achieve high performance on architectures with complex multi-level memories [?].

7.1. A blocked algorithm for computing the LU factorization. While in practice A is overwritten, for the analysis we assume new matrices L and U are computed, and we introduce an auxiliary matrix B, which is initially equal to A and in which the updated ABR will be stored. The right-looking algorithm then has the property that before and after each iteration LTL, UTL, LBL, UTR, and BBR have



Fig. 7.2 details Steps 2a, 6, 7, and 8 of the error worksheet in Fig. 7.1. Step 2a states the error-invariant, and Steps 6 and 7 contain the predicates that appear as (7.2) and (7.1), respectively, in the text. Step 8 pairs each computational update with its error result:

  [L11, U11] := LU(B11)   L11U11 = B11 + ∆B11 ∧ |∆B11| ≤ γb|L11||U11|      (Theorem 5.3)
  U12 := L11⁻¹ B12        L11U12 = B12 + ∆B12 ∧ |∆B12| ≤ γb|L11||U12|      (Corollary 7.3)
  L21 := B21 U11⁻¹        L21U11 = B21 + ∆B21 ∧ |∆B21| ≤ γb|L21||U11|      (Corollary 7.3)
  B22 := B22 − L21U12     Bnew22 + L21U12 = Bold22 + ∆B22
                          ∧ |∆B22| ≤ γb+1(|Bold22| + |L21||U12|)           (Theorem 7.1)

Fig. 7.2. LU = A. Detail of the error worksheet for proving the backward stability of the blocked LU factorization computed via the right-looking variant.

been computed to satisfy

(LTLUTL = ATL   LTLUTR = ATR;
 LBLUTL = ABL   BBR = ABR − LBLUTR).

The algorithm is given on the left in Fig. 7.1.
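For illustration only, a plain-Python sketch (ours) of the right-looking blocked algorithm of Fig. 7.1, overwriting A in place with the unit-lower factor L and with U, and performing no pivoting:

```python
def lu_blocked_right_looking(A, nb):
    """Blocked right-looking LU sketch on a list-of-lists matrix.
    Each pass: factor the pivot block B11 (unblocked LU), triangular
    solves for U12 and L21, then a rank-nb update of B22."""
    n = len(A)
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # [L11, U11] := LU(B11): unblocked right-looking factorization
        for i in range(k, e):
            for j in range(i + 1, e):
                A[j][i] /= A[i][i]
                for c in range(i + 1, e):
                    A[j][c] -= A[j][i] * A[i][c]
        # U12 := L11^{-1} B12: forward solves, unit diagonal
        for j in range(e, n):
            for i in range(k, e):
                for r in range(k, i):
                    A[i][j] -= A[i][r] * A[r][j]
        # L21 := B21 U11^{-1}: row-wise solves against U11
        for i in range(e, n):
            for j in range(k, e):
                for r in range(k, j):
                    A[i][j] -= A[i][r] * A[r][j]
                A[i][j] /= A[j][j]
        # B22 := B22 - L21 U12: matrix-matrix multiplication
        for i in range(e, n):
            for j in range(e, n):
                for r in range(k, e):
                    A[i][j] -= A[i][r] * A[r][j]
    return A
```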

7.2. Preparation. Examining the left side of Fig. 7.1 shows that the computation is cast in terms of an LU factorization (of B11), two triangular solves with multiple right-hand sides (solve L11U12 = B12 and L21U11 = B21 for U12 and L21, respectively), and the matrix-matrix multiplication B22 := B22 − L21U12. An error result for the first, if the Crout variant is used for this subproblem, is given in Theorem 5.3. We now present theorems related to the other operations.

Error results for matrix-matrix multiplication. Theorem 7.1. Let C ∈ Rm×n, A ∈ Rm×k, and B ∈ Rk×n. Assume that Z := C − AB is computed one column at a time by the matrix-vector multiplication discussed in Theorem 5.1:

Z = (z0 · · · zn−1) = (c0 − Ab0 · · · cn−1 − Abn−1) = C − AB.

Then
R1. AB + Z = C + ∆Z, where |∆Z| ≤ γk(|A||B| + |Z|);
R2. AB + Z = C + ∆C, where |∆C| ≤ γk+1|A||B| + γ1|C|.

Proof: This follows from Theorem 5.1.
Exercise 7.2. Prove Theorem 7.1.

Error results for triangular solve with multiple right-hand sides. A closely related operation is the triangular solve with multiple right-hand sides, LX = B, where the m × m lower triangular matrix L and the m × n matrix B are given and the matrix X is to be computed. We will see how this operation is used as part of a blocked LU factorization in Section 7.

Partition X → (x0 · · · xn−1) and B → (b0 · · · bn−1). Then

LX = L(x0 · · · xn−1) = (Lx0 · · · Lxn−1) = (b0 · · · bn−1),

so that Lxj = bj. This explains the name of the operation: each column of X is computed by solving a lower triangular system with the corresponding column of B as right-hand side.

The following result follows immediately from Theorem 4.5:

Corollary 7.3. Let L ∈ Rm×m be a nonsingular lower triangular matrix and X, B ∈ Rm×n. Assume that the X that solves LX = B is computed one column at a time via the algorithm in Fig. 4.2 (left side). Then X satisfies LX = B + ∆B, where |∆B| ≤ γm|L||X|.

Exercise 7.4. Discuss why in Cor. 7.3 there is no result that corresponds to Theorem 4.5 R1.

There are a number of different algorithms for computing this operation [?], eachof which has a similar error result.
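For illustration, a column-at-a-time sketch (ours, with a hypothetical function name) of the triangular solve with multiple right-hand sides, to which Corollary 7.3 applies column by column:

```python
def trsm_lower(L, B):
    """Solve L X = B for X, one column at a time: column j is the lower
    triangular solve L x_j = b_j (cf. the left side of Fig. 4.2)."""
    m, n = len(B), len(B[0])
    X = [[0.0] * n for _ in range(m)]
    for j in range(n):                   # for each right-hand side b_j
        for i in range(m):               # forward substitution
            X[i][j] = (B[i][j] - sum(L[i][k] * X[k][j]
                                     for k in range(i))) / L[i][i]
    return X
```

For example, trsm_lower([[2.0, 0.0], [1.0, 4.0]], [[2.0, 4.0], [9.0, 6.0]]) returns [[1.0, 2.0], [2.0, 1.0]].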

7.3. Analysis. As for the unblocked algorithm discussed in the previous section, the right-looking LU factorization algorithm, in the presence of round-off error, computes factors L and U. We wish to establish that there exists a perturbation, ∆A, to the matrix A ∈ Rn×n such that

LU = A + ∆A, where |∆A| ≤ γn+1(|A| + |L||U|).


This is a slightly weaker bound than necessary, but it makes the analysis fall out more smoothly. In Exercise ?? we then challenge the reader to modify the analysis to tighten the bound.

In Fig. 7.1, we show the error analysis on the right, with the error-invariant givenin Step 2:

k = m(ATL)
∧ (LTLUTL = ATL + ∆ATL   LTLUTR = ATR + ∆ATR;
   LBLUTL = ABL + ∆ABL   BBR + LBLUTR = ABR + ∆ABR)
∧ (|∆ATL| ≤ γk+1(|ATL| + |LTL||UTL|)   |∆ATR| ≤ γk+1(|ATR| + |LTL||UTR|);
   |∆ABL| ≤ γk+1(|ABL| + |LBL||UTL|)   |∆ABR| ≤ γk+1(|ABR| + |LBL||UTR|)),

which results from restricting the target error result to the computation already performed before and after each iteration. We focus on the predicates in Steps 6 and 7, which follow immediately from the error-invariant by substituting the submatrices from Steps 5a and 5b, respectively. Our procedure now requires us to show that there is an analysis that takes one from Step 6 to Step 7, the Inductive Step, Step 8.

Algebraic manipulation of Step 7 yields

k + b = m((A00 A01; A10 A11)) ∧   (7.1)
(?   ?                                   ?;
 ?   L10U01 + L11U11 = A11 + ∆Anew11     L10U02 + L11U12 = A12 + ∆Anew12;
 ?   L20U01 + L21U11 = A21 + ∆Anew21     Bnew22 + L20U02 + L21U12 = A22 + ∆Anew22) ∧
(?   ?   ?;
 ?   |∆Anew11| ≤ γk+b+1(|A11| + |L10||U01| + |L11||U11|)   |∆Anew12| ≤ γk+b+1(|A12| + |L10||U02| + |L11||U12|);
 ?   |∆Anew21| ≤ γk+b+1(|A21| + |L20||U01| + |L21||U11|)   |∆Anew22| ≤ γk+b+1(|A22| + |L20||U02| + |L21||U12|)).

Here ? indicates that the expression is identical to that in the corresponding result in (7.2) below, which means no additional error is accumulated in ∆A00, ∆A01, ∆A02, ∆A10, and ∆A20. This is the predicate that must be true at Step 7 if the I.H. (Step 2) is to again be true at the bottom of the loop (in Step 2c, had it been listed).

The I.H. (Step 2) tells us that the predicate in Step 6 is true. Algebraic manipulation of this predicate yields

k = m(A00) ∧   (7.2)
(L00U00 = A00 + ∆A00   L00U01 = A01 + ∆A01              L00U02 = A02 + ∆A02;
 L10U00 = A10 + ∆A10   B11 + L10U01 = A11 + ∆Aold11     B12 + L10U02 = A12 + ∆Aold12;
 L20U00 = A20 + ∆A20   B21 + L20U01 = A21 + ∆Aold21     Bold22 + L20U02 = A22 + ∆Aold22) ∧
(|∆A00| ≤ γk+1(|A00| + |L00||U00|)   |∆A01| ≤ γk+1(|A01| + |L00||U01|)       |∆A02| ≤ γk+1(|A02| + |L00||U02|);
 |∆A10| ≤ γk+1(|A10| + |L10||U00|)   |∆Aold11| ≤ γk+1(|A11| + |L10||U01|)    |∆Aold12| ≤ γk+1(|A12| + |L10||U02|);
 |∆A20| ≤ γk+1(|A20| + |L20||U00|)   |∆Aold21| ≤ γk+1(|A21| + |L20||U01|)    |∆Aold22| ≤ γk+1(|A22| + |L20||U02|)).

Now, the constraints on the values in the submatrices in the first block row and first block column of (7.1) are already satisfied in Step 6, since $\gamma_{k+1} \leq \gamma_{k+b+1}$ (``by the I.H.,'' in the terminology of an inductive proof). We merely need to show that there exist $\Delta A^{\mathrm{new}}_{11}$, $\Delta A^{\mathrm{new}}_{12}$, $\Delta A^{\mathrm{new}}_{21}$, and $\Delta A^{\mathrm{new}}_{22}$ that satisfy the remaining constraints in (7.1). We now examine the error introduced by the computational updates to determine how error is contributed to each of these variables.

Determining $\Delta A^{\mathrm{new}}_{11}$: The update $[L_{11}, U_{11}] := \mathrm{LU}(B_{11})$ together with Theorem 7.1 yields $L_{11}U_{11} = B_{11} + \Delta B_{11}$ with $|\Delta B_{11}| \leq \gamma_b|L_{11}||U_{11}|$. Let $\Delta A^{\mathrm{new}}_{11} = \Delta A^{\mathrm{old}}_{11} + \Delta B_{11}$. Then
\begin{align*}
|\Delta A^{\mathrm{new}}_{11}| = |\Delta A^{\mathrm{old}}_{11} + \Delta B_{11}|
  &\leq |\Delta A^{\mathrm{old}}_{11}| + |\Delta B_{11}| && \text{(triangle inequality)} \\
  &\leq \gamma_{k+1}(|A_{11}| + |L_{10}||U_{01}|) + \gamma_b|L_{11}||U_{11}| && \text{(Step 6: I.H.)} \\
  &\leq \gamma_{k+b+1}(|A_{11}| + |L_{10}||U_{01}| + |L_{11}||U_{11}|) && (\gamma_b, \gamma_{k+1} \leq \gamma_{k+b+1}).
\end{align*}
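The absorption in the last step repeatedly relies on $\gamma_b, \gamma_{k+1} \leq \gamma_{k+b+1}$, an instance of the standard result $\gamma_j + \gamma_k + \gamma_j\gamma_k \leq \gamma_{j+k}$ (see, e.g., Higham [6]). As a sanity check, the inequality can be verified in exact rational arithmetic; the `gamma` helper below is our own illustration, with `u` set to the double-precision unit roundoff:

```python
from fractions import Fraction

def gamma(m, u=Fraction(1, 2**53)):
    """gamma_m = m*u / (1 - m*u), computed exactly with rationals."""
    mu = m * u
    assert mu < 1  # the constant is only defined while m*u < 1
    return mu / (1 - mu)

# gamma_j + gamma_k + gamma_j*gamma_k <= gamma_{j+k}; in particular
# gamma_b and gamma_{k+1} are each bounded by gamma_{k+b+1}.
for j, k in [(1, 5), (8, 8), (3, 100), (64, 1000)]:
    assert gamma(j) + gamma(k) + gamma(j) * gamma(k) <= gamma(j + k)
```

Because `Fraction` arithmetic is exact, the assertions test the mathematical inequality itself, not a floating-point approximation of it.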



Determining $\Delta A^{\mathrm{new}}_{12}$ and $\Delta A^{\mathrm{new}}_{21}$: The update $U_{12} := L_{11}^{-1}B_{12}$, which is implemented as a triangular solve with multiple right-hand sides, together with Theorem 7.1 yields $L_{11}U_{12} = B_{12} + \Delta B_{12}$ with $|\Delta B_{12}| \leq \gamma_b|L_{11}||U_{12}|$. Let $\Delta A^{\mathrm{new}}_{12} = \Delta A^{\mathrm{old}}_{12} + \Delta B_{12}$. Then
\begin{align*}
|\Delta A^{\mathrm{new}}_{12}| = |\Delta A^{\mathrm{old}}_{12} + \Delta B_{12}|
  &\leq |\Delta A^{\mathrm{old}}_{12}| + |\Delta B_{12}| && \text{(triangle inequality)} \\
  &\leq \gamma_{k+1}(|A_{12}| + |L_{10}||U_{02}|) + \gamma_b|L_{11}||U_{12}| && \text{(Step 6: I.H.)} \\
  &\leq \gamma_{k+b+1}(|A_{12}| + |L_{10}||U_{02}| + |L_{11}||U_{12}|) && (\gamma_b, \gamma_{k+1} \leq \gamma_{k+b+1}).
\end{align*}

The same analysis determines and bounds $\Delta A^{\mathrm{new}}_{21}$ by transposing both sides: $U_{11}^T L_{21}^T = B_{21}^T + \Delta B_{21}^T$.
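Spelled out (no new content, just the transposed version of the previous derivation, with $|\Delta B_{21}| \leq \gamma_b|L_{21}||U_{11}|$ from the transposed solve):

```latex
\begin{align*}
|\Delta A^{\mathrm{new}}_{21}|
  &\leq |\Delta A^{\mathrm{old}}_{21}| + |\Delta B_{21}|
      && \text{(triangle inequality)} \\
  &\leq \gamma_{k+1}\bigl(|A_{21}| + |L_{20}||U_{01}|\bigr) + \gamma_b |L_{21}||U_{11}|
      && \text{(Step 6: I.H.)} \\
  &\leq \gamma_{k+b+1}\bigl(|A_{21}| + |L_{20}||U_{01}| + |L_{21}||U_{11}|\bigr)
      && (\gamma_b,\ \gamma_{k+1} \leq \gamma_{k+b+1}),
\end{align*}
```

which matches the corresponding entry in (7.1).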

Determining $\Delta A^{\mathrm{new}}_{22}$: The update $B_{22} := B_{22} - L_{21}U_{12}$ together with Theorem 7.1 yields $B^{\mathrm{new}}_{22} = B^{\mathrm{old}}_{22} + \Delta B_{22} - L_{21}U_{12}$ with $|\Delta B_{22}| \leq \gamma_1|B^{\mathrm{old}}_{22}| + \gamma_{b+1}|L_{21}||U_{12}|$. Let $\Delta A^{\mathrm{new}}_{22} = \Delta A^{\mathrm{old}}_{22} + \Delta B_{22}$. Then
\begin{align*}
|\Delta A^{\mathrm{new}}_{22}| = |\Delta A^{\mathrm{old}}_{22} + \Delta B_{22}|
  &\leq |\Delta A^{\mathrm{old}}_{22}| + |\Delta B_{22}| && \text{(triangle inequality)} \\
  &\leq |\Delta A^{\mathrm{old}}_{22}| + \gamma_1|B^{\mathrm{old}}_{22}| + \gamma_{b+1}|L_{21}||U_{12}| && \text{(Theorem 7.1)} \\
  &= |\Delta A^{\mathrm{old}}_{22}| + \gamma_1|A_{22} - L_{20}U_{02} + \Delta A^{\mathrm{old}}_{22}| + \gamma_{b+1}|L_{21}||U_{12}| && \text{(substituting $B^{\mathrm{old}}_{22}$)} \\
  &\leq |\Delta A^{\mathrm{old}}_{22}| + \gamma_1|A_{22}| + \gamma_1|L_{20}||U_{02}| + \gamma_1|\Delta A^{\mathrm{old}}_{22}| + \gamma_{b+1}|L_{21}||U_{12}| && \text{(triangle inequality)} \\
  &= (1 + \gamma_1)|\Delta A^{\mathrm{old}}_{22}| + \gamma_1(|A_{22}| + |L_{20}||U_{02}|) + \gamma_{b+1}|L_{21}||U_{12}| && \text{(rearranging)} \\
  &\leq (1 + \gamma_1)\gamma_{k+1}(|A_{22}| + |L_{20}||U_{02}|) && \text{(Step 6: I.H.)} \\
  &\quad + \gamma_1(|A_{22}| + |L_{20}||U_{02}|) + \gamma_{b+1}|L_{21}||U_{12}| \\
  &= (\gamma_{k+1} + \gamma_1\gamma_{k+1} + \gamma_1)(|A_{22}| + |L_{20}||U_{02}|) + \gamma_{b+1}|L_{21}||U_{12}| && \text{(combining terms)} \\
  &\leq \gamma_{k+2}(|A_{22}| + |L_{20}||U_{02}|) + \gamma_{b+1}|L_{21}||U_{12}| && \text{(Cor. ??)} \\
  &\leq \gamma_{k+b+1}(|A_{22}| + |L_{20}||U_{02}| + |L_{21}||U_{12}|) && \text{(Cor. ??)}
\end{align*}

The discussion in this section proves the error result summarized in the following theorem.

Theorem 7.5. Given $A \in \mathbb{R}^{n \times n}$, assume that the blocked right-looking algorithm discussed in this section completes. Then the computed factors $L$ and $U$ are such that $LU = A + \Delta A$, where $|\Delta A| \leq \gamma_{n+1}(|A| + |L||U|)$.
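The componentwise bound of Theorem 7.5 is easy to check numerically. The sketch below is our own illustration: it uses a minimal unblocked (rather than blocked) unpivoted right-looking LU, applied to a diagonally dominant matrix so that the factorization completes, and verifies the residual against the bound entry by entry:

```python
import numpy as np

def lu_right_looking(A):
    """Unpivoted right-looking LU: returns unit lower-triangular L and upper-triangular U."""
    A = A.copy()
    n = A.shape[0]
    for k in range(n):
        A[k+1:, k] /= A[k, k]                               # column of L
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])   # rank-1 update of B22
    return np.tril(A, -1) + np.eye(n), np.triu(A)

def gamma(m, u=np.finfo(np.float64).eps / 2):
    """gamma_m = m*u / (1 - m*u) with u the unit roundoff."""
    return m * u / (1 - m * u)

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant: safe without pivoting

L, U = lu_right_looking(A)
residual = np.abs(L @ U - A)
bound = gamma(n + 1) * (np.abs(A) + np.abs(L) @ np.abs(U))
assert np.all(residual <= bound)
```

In practice the residual sits far below the worst-case bound; the assertion merely confirms that no entry violates $\gamma_{n+1}(|A| + |L||U|)$.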

8. Conclusion. The methodology presented in this paper proceeds in four steps:
1. The error property that is desired for the algorithm is stated.
2. From this a predicate, the error-invariant, is determined that indicates how a partial error should accumulate.
3. The predicate that describes how error has accumulated before and after the computational updates in the algorithm is determined.
4. How error is incurred by the computational updates is examined and is shown to satisfy the desired predicates.

REFERENCES

[1] P. Bientinesi, Mechanical Derivation and Systematic Analysis of Correct Linear Algebra Algorithms, PhD thesis, 2006.

[2] P. Bientinesi, J. A. Gunnels, M. E. Myers, E. S. Quintana-Ortí, and R. A. van de Geijn, The science of deriving dense linear algebra algorithms, ACM Trans. Math. Softw., 31 (2005), pp. 1–26.

[3] J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst, Solving Linear Systems on Vector and Shared Memory Computers, SIAM, Philadelphia, PA, 1991.



[4] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 2nd ed., 1989.

[5] J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn, FLAME: Formal Linear Algebra Methods Environment, ACM Trans. Math. Softw., 27 (2001), pp. 422–455.

[6] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, PA, 2nd ed., 2002.
