modern methods of numerical linear algebra – principles ...strakos/download/2008_kamenice.… ·...

Modern Methods of Numerical Linear Algebra –

Principles, Trends and Open Problems

Zdenek Strakos

Institute of Computer Science, Academy of Sciences,

and

Faculty of Mathematics and Physics, Charles University,

Prague, Czech Republic

http://www.cs.cas.cz/~strakos

May 19-23, 2008

1

Basic idea of the course:

An attempt for an advanced “essay” course. It should not be a

replacement for any basic course. The goal is to promote under-

standing of principles and connections, not to derive formulas

(they can be found elsewhere).

Prerequisites:

Standard math, a bit of numerical math, good linear algebra.

Basic assumption:

This is a dialogue. Please ask questions, react whenever some-

thing is unclear. Do not throw flowers (or other object), but,

please, feel free to be very active.

2

NLA as a simple (linear ???) tool ?

A possible stand might ask for “NUMERICAL RECIPES” i.e.

clear descriptions what to do in specific situations.

Such approach works in textbook situations. Life is more compli-

cated. No numerical recipes approach works without achieving

fundamental understanding.

Part of our course have been used elsewhere before. The content

and speed our exposition depend on all of us.

3

SYLLABUS

I. General setting and introduction

I./1. Computational mathematics and numerical linear algebra

I./2. Eigenvalues

I./3. Linear models

I./4. General methodology

II. Model reduction and SVD, bidiagonalization

II./1. SVD and what does it reveal

II./2. TSVD solution of the deblurring problem

II./3. Computation of SVD – bidiagonalization

II./4. Application to solving Ax ≈ b

4

III. Principles of Krylov subspace methods

III./1. Power method and finding the dominance

III./2. The essence of Krylov subspace methods

III./3. A symmetric positive definite example

IV. Gauß quadrature

IV./1. Interpolatory quadrature on n points

IV./2. Gauß-Christoffel quadrature

IV./3. Relationship with the SPD example in III./3.

5

V. Lanczos algorithm and the conjugate gradient method

V./1. Lanczos, CG and Gauß quadrature

V./2. Characterization of convergence

V./3. Measuring convergence

VI. GMRES behaviour and eigenvalues

VI./1. A counterintuitive theory

VI./2. Convection – diffusion model problem

6

VII. Numerical behaviour of Krylov subspace methods

VII./1. Delay of convergence

VII./2. Maximal attainable accuracy

VIII. Roundoff error analysis a mathematical discipline?

VIII./1. Tedious bounds

VIII./2. Mathematical rigor and beauty

7

Motto:

• Folklore (Oxford User’s Guide to Mathematics, E. Zeidler

(ed.), OUP, 2004, 1.14.7 The calculus of residues and the cal-

culation of integrals, p. 549)

Mathematics is the art of avoidingcalculations.

What about computational mathematics

(numerical linear algebra)?

8

• E. Study (1862-1930, Leipzig, Marburg, Gottingen, Greifswald,

Bonn, successor of Lipschitz, Einleitung in die Theorie der In-

varianten lineare Transformationen auf Grund der Vektorenrech-

nung, Friedr. Vieweg & Sohn Akt.-Ges., Braunschweig, 1923)

Mathematics is neither the art of calculationnor the art of avoiding calculations.

Mathematics, however, entails the artof avoiding superflous calculations

and conducting the necessary ones skilfully.In this respect one could have learned

from the older authors.

9

• A. Einstein (Oxford User’s Guide to Mathematics, E. Zeidler

(ed.), OUP, 2004, p. 3)

Everything should be made as simple as possible,but not simpler.

Simple does not mean easy.

Simple does not contradict rigor.

The problem is often in interpretation.

(See, e.g., CG convergence behavior and its estimates via the

Chebyshev polynomials).

10

Lecture I

GENERAL SETTING

AND INTRODUCTION

11

I/1 Computational mathematics and NLA

Scientific computing, computational sciences, numerical analy-

sis (INA at UCLA, founded in 1947 by the National Bureau of

Standards), numerical linear algebra a part of these.

[Baxter, Iserles-03], [Trefethen-92]:

Given a real-world problem, the goal is to produce reliably, ro-

bustly and affordably a solution which is within a user spec-

ified tolerance. A quest for mathematical structure, rigor and

beauty.

12

• mathematical aspects: pattern, rigor and beauty are not

lost with discretization

With discretization, the mathematical problem is, if anything,

more intricate. Fundamental analytical tools such as limit often

need a special setting (a sequence of problems), or they can not

be applied at all.

Example: Analysis of the “transient convergence” in Krylov sub-

space (Ksp) methods

15

• computational aspects:

– errors and inaccuracy are inherent parts of computations,

see the goals above.

∗ can we eliminate all errors by computing exactly?

No – in principle, see the eigenvalue computations, or

solving ill-posed problems below.

We do not wish to compute “exact solution” to the

wrong problem!

16

∗ can we eliminate rounding errors by computing interme-

diate results to an arbitrary accuracy?

In principle yes, but we fail to reach the goals above.

There is no need for it if we understand the mathe-

matics behind computation: Numerical stability.

↓We can stay with finite precision arithmetic and a fixed

round off unit. The axioms of arithmetic are, however,

lost.

↓Mathematical concepts such as perturbation theory,

conditioning, backward error analysis and errors-in-variables,

adaptivity etc.

17

– computer architectures and sw tools

principles of parallel comp. architectures (important both

for control of operations and for processing the data):

∗ parallel processing – pipeliningarray processingmultiprocessing

∗ memory organization – interleavingdistributed memoryhierarchical memory

∗ information exchange – interconnection networks

18

• practical aspects

– structure of the problem (discretized PDE, control, image

processing, signal processing);

– user-specified tollerance and accuracy of the data.

Mathematical concepts of complexity (computational cost).

Rounding error analysis – numerical stability (stability get

a new content).

19

I. Babuska, SINUM 9, No 1, 1972, pp. 53–77,

“Numerical stability of problems of linear algebra” p. 66

“STABLE PROCESS does not significantly increase the uncer-

tainty of the results caused by the UNCERTAINTY OF THE

INPUT DATA”

“The notion GOOD (or BAD) is relative to the uncertainty of

the results caused by uncertainty of the input data alone”

20

I/2 Eigenvalues

Definition. A ∈ CN,N , λ ∈ C. If (A − λI) is singular,

then λ is called an eigenvalue of A.

A − λI singular ⇔ ∃v ∈ CN , Av = λv, v 6= 0

⇔ det(A − λI) = 0

Computation of all eigenvalues?

A general math principle: translate the problem by a proper map-

ping into the form which reveals the solution

REMEMBER THE FOLKLORE

21

A proper mapping?

– unitary similarity transformation ZAZ−1, Z−1 = Z∗

if A contains errors, then for an unitarity invariant norm

A = Aex + E, UAU∗ = UAexU∗ + UEU∗

‖UAexU∗‖ = ‖Aex‖ ,

‖UEU∗‖ = ‖E‖

The least constrained matrix form (with the smallest amount of

prescribed constraints) which reveals the spectrum?

A nice example that world makes a sense:

22

Schur theorem

Let A ∈ CN,N . Then there exist a unitary matrix U ∈ CN,N and

an upper triangular matrix R ∈ CN,N such that

U∗AU = R .

U can be chosen in such a way that R contains on its diagonal

the eigenvalues of A in an arbitrary prescribed order.

Proof: By induction – nonconstructive.

23

Definition. A ∈ CN,N is normal if A∗A = AA∗

Theorem.

Let A ∈ CN,N normal. Then its Schur decomposition gives

a diagonal matrix U∗AU = diag (λi) .

U∗AU = D ⇔ AU = UD ⇔ U = (u1, . . . , uN) is com-

posed of the normalized eigenvectors of A ⇔ the eigenvec-

tors of A form an orthonormal basis of CN

A = UDU∗ ⇔ A =N∑

i=1ui λi u∗

i dyadic form

‖ui λi u∗i ‖ = |λi| .

24

Definition. A Hermitian iff A∗ = A.

A Hermitian ⇔ A normal and λi ∈ R, i = 1, . . . , N .

Definition. A = CN,N diagonalizable if there exist X ∈ CN,N

such that X−1AX = diag (λi) .

Theorem. The set of all (diagonalizable) matrices with distinct

eigenvalues is a dense and open subset of CN,N .

Proof: Schur theorem, continuity of eigenvalues.

25

Theorem (Jordan canonical form).

For any A ∈ CN,N there exist X ∈ CN,N such that

X−1AX = diag (Jn1, . . . , Jnl) ,

Jnk = Cnk,nk, Jnk = bidiag (λk,1) .

Jordan canonical form reveals the structure of invariant sub-

spaces (functional analytic view, Marek, Zitny, ... ).

26

Observation.

There is no (computer, numerical) algorithm which would give

Schur decomposition of a general matrix A in a finite number

of steps. It follows from the Abel-Galois theorem (1822, 1830)

which states that polynomial equations are for N ≥ 5 generally

unsolvable in radicals

(there is no formula using +, −, ×, m√

giving the roots as func-

tions of the coefficients).

27

I/3 Linear models:

• the concept of solution

A x ≈ b, in general A ∈ CN,M , N R M

Numerical PDE: A ∈ RN,N , A nonsingular

More general setting: statistics, control, image proc.

an input x . . . unknownsystem A . . . givenoutput b . . . observed

28

b might contain errors; noise (control, image procesing), mea-

surement errors (Gauß: geodetic measurements).

A might not be known accurately:

- inaccuracy in the model description (physical constants . . . ),

- uncertainty in choosing the model.

29

Linear least squares:

closest approximation to b in R(A),

A x = b + g , ‖g‖ min .

Total least squares (Errors-in-variables):

Find the smallest change of both A and b such that the modified

problem is compatible

(A + E)x = b + g , ‖[g, E]‖F min .

30

• numerical difficulty

Consider A square, nonsingular, Ax = b → x = A−1b.

Let b is corrupted by small errors (noise). Can A−1b change the

proportion of the useful information to the noise?

↓conditioning

κ(A) = max‖z‖=1

‖A z‖ / min‖z‖=1

‖A z‖

= maxmag(A) / minmag(A)

κ(A−1) = (1/minmag(A)) / (1 / maxmag(A)) = κ(A)

It describes how the geometry of the space is deformed by the

mapping defined by the operator A.

31

If the noise is magnified as (1/minmag(A)) while the useful infor-

mation is mainly in the directions magnified only as ∼ (1/maxmag(A)),

and κ(A) is large ⇒ TROUBLE

Ill-conditioning is sometimes identified with conceptual mistakes

in setting the model. In numerical PDE it can be related to

wrong model or improper discretization. Not always, however.

See [Babuska 72], some adaptive mesh refinements in the pres-

ence of singularity.

Ill conditioning can not always be avoided!

Example: ill posed inverse problems,

32

Ill-posed problem

Example

Numerical deblurring (Computer Tomography, Image Reconstruc-

tion and Restoration)

A x + η = b

b . . . observed data (output signal)A . . . describes the system,

(in IRR the Point Spread Function)η . . . noise (unknown)x . . . original image (input data)

which is to be reconstructed

33

This problem is correctly very ill-conditioned, it can not be solved

naively by computation of

x ≈ A−1 b .

- b − η unknown ,

- singular values of A clustered near zero; no numerical rank

and no simple model reduction.

Such problem must be regularized – useful information must be

found without an uncontrolled amplification of the noise present

in the observed data.

34

Example (J. Nagy, Emory University)

Original image (the unknown x) x = true image

35

Observed image

(the right hand side b)

b = blurred, noisy image

Matrix A describing

the Point Spread Function

A = matrix

36

The naive exact solution

of Ax = b

x = inverse solution

Regularized solution

via TSVD or CGLS

659 iterations

37

• size and sparsity

– Large scale computations means

Large scale mathematical approach, not using large scale

computing environment with standard methods and algo-

rithms (sometimes SW) which are in use for decades!

↑This extensive way is used in sciences (physics, comp. chem-

istry, medical imaging) too often!

38

– Sparsity and structure adds another dimension, partial goals

... matching the computer architecture,

... keeping the cost under control – computational efficiency,

... numerical stability,

often in conflict! See SPD/indefinite direct solvers.

A nice historical perspective:

D. Heller, “A Survey of Parallel Algorithms in Numerical Lin-

ear Algebra”, SIAM Review, 20, No 4, 1978, pp. 740-773.

39

I/4 General methodology

• correct definition and rigorous analysis of the problem

Even with Ax ≈ b it is not always simple! For example

min ‖[g, , E]‖F subject to (A + E)x = b + g

IS NOT in general a correctly defined problem!

• design, choice and application of methods and algorithms,

computing an approximate solution

40

• error analysis. Though often ignored, it is crucial. A com-

putation without an error analysis is like jumping out of the

plane without checking whether the backpack contains a

parachute. In numerical solution of PDE, algebraic errors are

very rarely considered.

A key for finding a good approximate solutions in large scale NLA

MODEL REDUCTION

41

Lecture II

MODEL REDUCTION AND

SVD, BIDIAGONALIZATION

42

II/1 SVD and what does it reveal

A symmetric, then A = V diag (λj)VT , V V T = V TV = I

A

v1λ1→ v1

v2λ2→ v2...

vNλN→ vN

One dimensional invariant subspaces, mutually orthogonal.

43

A ∈ RN,M , consider with no loss of generality N ≥ M . Then

A = UΣV T = UrΣrVTr ,

UUT = UTU = I, V TV = V V T = IM , Σ = diag (σ1, . . . , σr,0) ,

σ1 ≥ σ2 ≥ · · · ≥ σr > 0 ,

U = [Ur, . . . ], V = [Vr, . . . ], Σr = diag (σ1, . . . , σr) .

44

A AT

R(A) R(AT )

v1σ1→ u1

σ1→ v1

v2σ2→ u2

σ2→ v2... ... . . . ... ...

vrσr→ ur

σr→ vr

N (A)vr+1

...vM

→ 0 , N (AT )ur+1

...uN

→ 0 ,

45

A =r∑

1uiσiv

Ti , ‖uiσiv

Ti ‖ = σi .

Enormously powerful, theoretically and computationally.

Theorem. The closest rank-k approximation to A, measured by

‖A − Ak‖, is given by

Ak =k∑

1

uiσivTi , ‖A − Ak‖ = σk+1 .

Proof: the minimality needs some exercise, Watkins, Fundamen-

tals of Matrix Comp., Thm. 7.3.6.

46

Theorem. The set of matrices of full rank is an open dense

subset of RN,M .

‖A‖ = σ1 , κ(A) = σ1/σM (consider σM > 0) .

Distance to a nearest singular matrix (rank-deficient matrix):

‖A − AM−1‖ = σm ,‖A − AM−1‖

‖A‖ =σm

σ1= 1/κ(A) .

47

Numerical rank and reduction of the model:

σ1 ≥ σ2 ≥ · · · ≥ σk︸︷︷︸

Ak=∑k

1 uiσivTi

≫ σk+1 ≥ · · · ≥ σr ,

we have a good rank-k approx. to A

If σ1 ≥ σ2 ≥ · · · ≥ σk ≫ σk+1 ∼ O(ε)‖A‖︸︷︷︸

proportional to machine precision ×σ1,

then k is called the numerical rank of A.

48

II/2 TSVD solution of the deblurring problem

Ax ≈ b, b corrupted by noise, A ∈ RN,N ,

Ax + η = b, η is unknown

denote b = b + η , b = Ax the noise free response .

Try using SVD for solving Ax = b

UΣV T x = b ⇒ x =N∑

1

viuT

i b

σiis getting large.

49

x =N∑

1

viuT

i b

σi︸︷︷︸

x, but b unknown.

+N∑

1

viuT

i η

σi

The discrete Piccard condition guaranteesuT

i bσi

→ 0 .

On average, the size of the components of the exact observation

vector b in the individual left singular vector subspaces of A

decay faster than the corresponding singular values.

The problem is inuT

i ησi

for small σi .

50

Truncated SVD:

x =k∑

1

viuT

i b

σi+

k∑

1

viuT

i η

σi︸︷︷︸

≈ x

+N∑

k+1

viuT

i b

σi+

N∑

k+1

viuT

i η

σi︸︷︷︸

Neglect

Here k chosen so that the effect of noise is still small.

Piccard condition: the neglected information about the true so-

lution x,

N∑

k+1

viuT

i b

σi≈ 0 ?

51

II/3 Computation of SVD – bidiagonalization

Observation: SVD decomposition can not be in general com-

puted by a finite algorithm. Computation must in general be

iterative (singular values of A are the nonzero eigenvalues of

ATA, AAT is the argument complete?).

Main point: SVD computation is numerically reliable. For some

structured matrices, all singular values can be determined (for

a given machine precision) to the full number of digits.

52

Two steps:

• A unitary reduction of A to lower (upper) bidiagonal ma-trix, UTAV = B, B = bidiag (βj, αj)

• SVD of B via the implicit QR algorithmout of the scope here.

Bidiagonalization:

• Householder reflections

• (Lanczos)-Golub-Kahan iterative algorithm.

53

Lanczos, Golub, Kahan bidiagonalization

[Golub, Kahan - 65]

Given u1, ‖u1‖ = 1, β1v0 ≡ 0 compute

αivi = AT ui − βivi−1 , ‖vi‖ = 1

βi+1ui+1 = Avi − αiui , ‖ui+1‖ = 1

stop whenever αi = 0 or βi+1 = 0.

54

Suppose αi, βi+1 nonzero for i = 1, . . . , k,

Uk = [u1, . . . , uk], Vk = [v1, . . . , vk], Lk = ℓ − bidiag (βi, αi) ∈ Rk,k.

Then

AT Uk = VkLTk , AVk = UkLk + βk+1 uk+1eT

k ,

UTk AVk = UT

k UkLk + βk+1UTk uk+1eT

k = LkV Tk Vk .

55

Let αk+1 6= 0, i.e. vk+1 can also be computed. Then, appending

one more column to Uk, Vk we get Uk+, Vk+, and appending

a row [0, βk+1] to Lk, giving Lk+,

AT Uk+ = V LTk+ + αk+1vk+1eT

k+1, AVk = Uk+Lk+

V Tk AT Uk+ = V T

k VkLTk+ + αk+1V T

k vk+1ek+1 = LTk+UT

k+Uk+ .

56

Now magic:

By induction: UTk Uk = V T

k Vk = I, then

UTk uk+1 = 0 V T

k vk+1 = 0

i.e. the algorithm generates orthonormal vectors

{u1, . . . , uk+1} , {v1, . . . , vk+1} .

57

II/4 Application to solving Ax ≈ b

Assume

[b, A] =

[

b1 A11 0

0 0 A22

]

.

Then

Ax ≈ b ⇔ A11 x1 ≈ b1

A22 x2 ≈ 0

with the meaningful solution x2 = 0, x =

[

x10

]

.

58

Core problem within A x ≈ b

Our suggestion is to find an orthogonal transformation

Ax ≡ (PT AQ) (QT x) ≈ PT b ≡ b ,

PT [b , A Q] =

[

b1 A11 00 0 A22

]

, P−1 = PT , Q−1 = QT

so that A11 has minimal dimensions. Then solve A11x1 ≈ b1 ,

and take the original problem solution to be

x = Q

[

x10

]

.

59

Such an orthogonal transformation is given (assuming AT b 6= 0 )

by reducing [b, A] to an upper bidiagonal matrix. In fact, A22

need not be bidiagonalized, [b1, A11] = PT1 [b, A Q1] has nonzero

bidiagonal elements and is either

[b1 | A11] =

β1 α1β2 α2

· ·βp αp

, βiαi 6= 0, i = 1, . . . , p

if βp+1 = 0 or p = N , or

60

[b1 | A11] =

β1 α1β2 α2

· ·βp αp

βp+1

, βiαi 6= 0 , βp+1 6= 0

if αp+1 = 0 or p = M < N .

[b1, A11] has full row rank and A11 has full column rank.

The first case happens if and only if the original problem is

compatible, the second case if and only if the original problem is

incompatible.

61

Theorem

(a) A11 has no zero or multiple singular values, so any zero

singular values or repeats that A has must appear in A22 ;

(b) A11 has minimal dimensions, and A22 maximal dimensions,

over all orthogonal transformations of the form given above;

(c) All components of b1 in the left singular vector subspaces

of A11 are nonzero.

62

The core problem approach consists of three steps:

1. Orthogonal transformation [b, A] = PT [b , A Q] , where the

upper bidiagonal block [b1, A11] is as above, and A22 is

not bidiagonalized. All irrelevant and multiple information is

filtered out to A22 .

2. Solving the minimally dimensioned A11 x1 ≈ b1 .

3. Setting x = Q x ≡ Q

[

x10

]

.

63

The core problem approach does not need to complete the SVD

of all of [b , A] . When the bidiagonalization stops, we use only

the necessary (and sufficient) information for computing the so-

lution.

For orthogonally invariant approximation problems, the formu-

lations for the original data [b, A] and the orthogonally trans-

formed data [b, A] are equivalent. Consequently, the core prob-

lem approach always gives meaningful solutions by setting x2 =

0 (neglecting the A22 block).

64

Approximating TSVD

Stopping the bidiagonalization when TSVD can be well approxi-

mated from the computed part of A11: it can approximate soon

the large singular values, more slowly the smalest singular values

→regularization effect of Lanczos – TSVD

[Fierro, Golub, Hansen, O’Leary – 97]

[Hansen, Kilmer, Jensen – 07]

[Paige, Strakos – 05],

[Hnnetynkova, Strakos – 07], . . .

65

Filtration factors

xr =N∑

1

viuT

i b

σiφi

Do we solve linear algebraic systems arising from PDE in this

way?

Yes, we do! A key is

APPROXIMATING DOMINANT INFORMATION FAST

66

Lecture III

PRINCIPLES OF KRYLOV

SUBSPACE METHODS

67

User wants to know, which method should be applied under

given circumstances. No simple template answer. Modern itera-

tive methods are complicated.

Consider A ∈ RN,N , b ∈ RN ;

We restrict our exposition to A nonsingular.

68

III/1 Power method for finding an eigenvector correspond-

ing to the dominant eigenvalue

v1; (Avi/‖Avi‖) = vi+1, i − 1,2, . . .

If |µ1| > |µj|, j 6= 1, {µ1, . . . , µN} eigenvalues of A with {w1, . . . , wN}the corresponding eigenvectors or Jordan canonical vectors, and

if v1 has nonzero component in the direction of w1,

then vi converges to w1 with the asymptotic rate of convergence

maxj 6=1

|µj|/|µ1| .

Local procedure (i → i + 1), it aims at a single dominant infor-

mation.

69

Krylov subspace methods represent global procedures which at

each iteration step remember the history and anticipate the

future.

Though we use them as iterative methods, they are finite (when

we consider Ax = b). Unlike the Power method and unlike the

stationary iterative methods, they work with the global informa-

tion, see later.

70

III/2 The essence of Krylov subspace methods

The essence of KSM

Projections of the N-dimensional problem onto nested Krylov

subspaces of increasing dimension.

Step n : Model reduction from dimension Nto dimension n, n ≪ N .

71

A x = b

An xn = bn

xn approximates the solution x

using the subspace of small dimension.

72

Projection processes

xn ∈ x0 + Sn, r0 ≡ b − Ax0

where the constraints needed to determine xn are given by

rn ≡ b − Axn ∈ r0 + ASn, rn ⊥ Cn .

Here Sn is called the search space, Cn is called the constraint

space.

r0 decomposed to rn + the part in ASn . It should be

called orthogonal projection if Cn = ASn , oblique otherwise.

73

Krylov subspace methods:

Sn ≡ Kn ≡ Kn(A, r0) ≡ span {r0, · · · , An−1r0}.

Algebraic and polynomial formulation:

xn ∈ x0 + Kn(A, r0) ,

x − xn = pn(A) (x − x0) ,

rn ≡ b − Axn = pn(A) r0

∈ r0 + AKn(A, r0) , pn(0) = 1 .

More general setting possible.

74

Krylov subspaces tend to contain the dominant information of

A with respect to r0. Unlike in the power method for computing

the dominant eigenspace, here all the information accumulated

along the way is used [Parlett - 80, Example 12.1.1].

Discretization means approximation of a continuous problem by

a finite dimensional one. Computation using Krylov subspace

methods means nothing but further model reduction. Well-tuned

combination has a chance for being efficient.

The idea of Krylov subspaces is in a fundamental way linked with

the problem of moments.

75

Riemann-Stieltjes integral:

ω(λ) a nondecreasing function on [a, b],

∫ b

af(λ) dω(λ)

is defined as the Riemann integral with the size of the elemen-

tary intervals [ti, ti+1] given by

ω(ti+1) − ω(ti) .

76

In Stieltjes’ formulation, a sequence of numbers ξk, k = 0,1, . . . ,

is given and a non-decreasing distribution function ω(λ), λ ≥ 0,

is sought such that the Riemann-Stieltjes integrals satisfy

∫ ∞

0λk dω(λ) = ξk, k = 0,1, . . . .

Here∫∞0 λkdω(λ) represents the k-th moment of the distribu-

tion function ω(λ).

[Shohat, Tamarkin - 43], [Akhiezer - 65], [Karlin, Shapley - 53]

77

Vector moment problem of Vorobyev:

Find a linear operator An on Kn such that

An r0 = Ar0 ,

An (Ar0) = A2r0 ,...

An (An−2r0) = An−1r0 ,

An (An−1r0) = Qn (Anr0) ,

where Qn projects onto Kn orthogonally to Cn.

[Vorobyev - 65], [Brezinski - 97], [S - 08]

78

Please notice that here Anr0 is decomposed into the part

Qn (Anr0) ∈ Kn and a part orthogonal to Cn .

Therefore Qn is the orthogonal projector if Cn = Kn ,

oblique otherwise.

Some important comments:

79

Direct and iterative

• Some practitioners: when enough computer resources are

available, then direct methods should be preferred to iter-

ative. They compute accurately. Do they?

• If our goal is to improve methodology for solving given prob-

lem, then the questions about having or not having enough

computer resources do not make sense.

• The statement should be read: “We wish to focus on a par-

ticular step and not to be disturbed by possible problems of

the other steps.”

80

• “and” means combination for eliminating the disadvantages

and strengthening the advantages.

• Principal advantage of the iterative part is the possibility of

stopping the computation at the desired accuracy level.

• It requires a meaningful stopping criterion. The errors of

the model, discretization error and the computational error

should be of the same order.

• Due to difficulties with the previous point this (potential)

principal advantage is often presented as a disadvantage (a need

for a stopping criteria . . . ).

81

The point was presented by the founding fathers, it is well un-

derstood in the context of multilevel methods. But it is not ac-

cepted in practical computational mathematics in general. See

the following quote from a negative referee report:

“... the author give a misguided argument. The main advantage

of iterative methods over direct methods does not primarily lie

in the fact that the iteration can be stopped early [whatever

this means], but that their memory (mostly) and computational

requirements are moderate.”

It is sad to see that - how clearly things were explained, e.g. in

the introduction to the [Hestenes, Stiefel - 52]!

82

Validation

Verification

[Babuska - 03], [Oden et al - 03]

83

. . . a linear problem?

Methods and phenomena in NLA need not be linear!

How fast we get an acceptable approximate solution?

• In modern iterative methods we have to study transient phase(early stage of computation) which represents a nonlinearphenomenon in a finite dimensional space of small dimen-sionality. [Ptak, Trefethen, Baxter and Iserles, ... ]

• Operator approach can not describe the transient phase! Lin-earization by asymptotic tools is (even with adaptation) oflimited use.

84

Another look

An information-based argument: A−1 uses global information

from A. Consequently, good iterative solution requires that:

• Either the global information is taken care for by a good pre-

conditioner. An extremely good preconditioner, transforming

the system matrix almost to identity, reduces the number of

iterations to O(1)

• Or the global information is gathered together by many (close

to N) matrix-vector multiplications.

What is wrong?

85

Solving Ax = b for some particular meaningful b can be different

from solving the system with the matrix A and some worst-case

right hand side! The data have typically some meaning and are

correlated.

For an acceptable accuracy we may not need the full global

communication.

Operator approach (in analytical considerations working with an

approximation of A−1) leads to asymptotics. Solving Ax = b for

the particular meaningful {A, b} is different from computing A−1.

[Beckermann, Kuijlaars – 02], [Kuijlaars – 06]

86

Principal question:

When does Kn(A, r0) contain enough information about the orig-

inal N-dimensional linear algebraic problem in order to provide a

good approximate solution xn?

↓

Convergence (better Behavior).

87

Orthogonality in generating Krylov subspaces

• Refining the multidimensional information (unlike in the power

method), and computing projections from b, Ab, . . . , An−1b ?

↓

Mutual orthogonalization, or orthogonalization against some

auxiliary vectors.

Goal: getting in an affordable way a good basis (”good”

does not necessarily mean “the best possible”)

• Practical computations means limited accuracy and must be

justified by numerical stability analysis.

88

Consistency of the whole solution process

Elliptic PDE can serve as a nice example. The weak formulation

leads to a SPD bilinear form, with the energy as the quantity

in charge. The Galerkin FEM discretized problem is again SPD,

and, consequently, an algebraic iterative method consistent with

the whole solution process should minimize the energy norm of

the error of the finite-dimensional approximate solution at each

iteration step. The world makes a sense - the conjugate gradient

method represents such consistency.

For some pioneering work on ”cascadic CG” see [Deuflhard – 93

(94)]. Recently, M. Arioli and his co-workers, see [Arioli et al. –

04]. Algebraic level – [Hestenes and Stiefel - 52], [S, Tichy - 02],

see below.

89

III/3 A symmetric positive definite example

Ax = b, r0 = b − Ax0

q1 = r0/‖r0‖, {q1, q2, . . . , qn} an orthonormal basis of Kn(A, r0)

Using Gram-Schmidt orthogonalization:

AQn = QnHn,n + βn+1qn+1eT1

QTnAQn = Hn,n

If A is symmetric, then Hn,n must be symmetric tridiagonal.

90

For A symmetric, the Lanczos orthonormal basis is computed

via the 3-term recurrence

βn+1qn+1 = Aqn − αnqn − βnqn−1

Where

β1q0 = 0, αn = (Aqn − βnqn−1, qn), ‖qn+1‖ = 1 .

Matrix form (we will see it in detail below):

AQn = QnTn + βn+1qn+1eTn , Tn = tridiag (βj, αj, βj+1) ∈ Rn,n .

91

In terms of polynomials (here A SPD)

qn+1 = ψn(A)q1/(β2 . . . βn+1) , ψn monic

(ψi(A)q1, ψj(A)q1) = δij

With the spectral decomposition A = U diag (λl)UT

ψi(A)q1 =N∑

l=1

(q1, ul)ψi(λl) ul

δij = (qi+1, qj+1) =N∑

l=1

(q1, ul)2 ψi(λl)ψj(λl)

92

{1, ψ1, . . . , ψn} monic orthogonal polynomials with respect to

(ϕ, ψ) =N∑

l=1

ωl ϕ(λl)ψ(λl) , ωl = (q1, ul)2

Polynomials orthogonal with respect to the R-S integral with

the distribution function ω(λ) with the finite points of increase

{λ1, . . . , λN} and weights {ω1, . . . , ωN}.

What does it offer? See next.

93

Lecture IV

GAUß QUADRATURE

94

IV/1 Interpolatory quadrature formula on n points

Consider a nondecreasing function ω(λ) on [ζ, ξ]. Let µ1, µ2, . . . , µn

be n distinct points in [ζ, ξ]. Then there exist weights ω1, . . . , ωn

such that the quadratic formula

∫ ξ

ζf(λ) dω(λ) =

n∑

l=1

ω(n)l f(µl)

is exact for any polynomial of degree at most n − 1.

95

Indeed, take Lf(µ1, . . . , µn) the Lagrange interpolating polyno-

mial on the points µ1, . . . , µn:

Lj(µ1, . . . , µn) =n∑

l=1

f(µl)

∏

m6=l(λ − µm)

∏

m6=l(µl − µm)

≡n∑

l=1

f(µl)φl(λ)

Then

ω(n)l =

∫ ξ

ζφl(λ) dω(λ)

and since every polynomial of degree ≤ (n − 1) is identical with

its Lagrange interpolating polynomial, we get the result.

96

IV/2 Gauß-Christoffel Quadrature

Interpolatory quadrature formula on the n points µ1, . . . , µn,

weights

ω(n)l =

∫ ξ

ζφl(λ) dω(λ)

Add a new node η1 (different from µ1, . . . , µn).

Observation: its weight proportional to

∫ ξ

ζψn(λ) dω(λ) ,

ψn(λ) = (λ − µ1)(λ − µ2) . . . (λ − µn) .

97

Add another new node η2 (different from µ1, . . . , µn, η1),

its weight proportional to

∫ ξ

ζψn(λ)(λ − η1) dω(λ) ;

Add 2nth new node ηn (different from all previous nodes)

its weight proportional to

∫ ξ

ζψn(λ)(λ − η1) . . . (λ − ηn−1) dω(λ)

The resulting quadrature is of degree 2n − 1 (2n nodes).

98

The ingenial idea of Gauß :

Take µ1, . . . , µn the roots of the monic orthogonal polynomial

ψn(λ) defined by the inner product

(ϕ, ψ) =∫ ξ

ζϕ(λ)ψ(λ) dω(λ) ,

THEN ALL WEIGHTS CORRESPONDING TO THE ADDI-

TIONAL n NODES η1, . . . , ηn are 0 , WE EVEN DO

NOT NEED TO KNOW η1, . . . , ηn!

How does it show up in SPD Krylov subspace methods

for solving Ax = b?

99

Lecture V

LANCZOS ALGORITHM

AND THE CONJUGATE

GRADIENT METHOD

100

V/1 Lanczos, CG and Gauß quadrature

Lanczos orthonormal basis of Kn(A, r0) is generated by the

three–term recurrence

AQn = QnTn + βn+1qn+1 eTn , Qn = [q1, . . . , qn] .

101

Tn =

α1 β2β2 α2

. . . . . . . . .

βn

βn αn

Jacobi matrix

Tn = Sn Θn S∗n ,

Θn = diag (θ(n)1 , . . . , θ

(n)n ) ,

Sn = [s(n)1 , . . . , s

(n)n ], S∗

nSn = SnS∗n = I .

102

Relationship with orthogonal polynomials

qn+1 = ψn(A) q1 / (β2β3 . . . βn+1) ,

{1, ψ1, . . . , ψn} are monic orthogonal polynomials wrt

(ϕ, ψ) =N∑

i=1

ωi ϕ(λi)ψ(λi) , ωi = |(ui, q1)|2 .

103

From orthogonality

ψn : ‖ ψn(A) q1 ‖2 = minψ∈Mn

‖ ψ(A) q1 ‖2 ,

↓

N∑

i=1

|(ui, q1)|2 ψ2n(λi) = min

ψ∈Mn

n∑

i=1

|(ui, q1)|2 ψ2(λi) .

104

Riemann-Stieltjes integral determined by A, q1

N∑

1

ωi f(λi) =∫ ξ

ζf(λ) dω(λ) ,

ω(λ) = 0 ζ ≤ λ < λ1 ,

ω(λ) =l∑

j=1

ωj λl ≤ λ < λl+1 ,

ω(λ) =N∑

j=1

ωj λN ≤ λ ≤ ξ .

105

Piecewise constant distribution function ω(λ) with the finite

number of points of increase, recall the spectral decomposition

of the corresponding operator,

...

0

1

ω1

ω2ω3

ω4

ωN

ζ λ1λ2 λ3. . . . . . λN ξ

106

Gauß quadrature interpretation?

Concentrate for a second on the matrix Tn computed at the

step n of the Lanczos algorithm.

Given the matrix Tn , consider the Lanczos algorithm ap-

plied to Tn with the starting vector e1 . Then we get the

Lanczos recurrence for n−dimensional vectors, but with exactly

the same recurrence coefficients as before. Consequently, the

n−dimensional Lanczos recurrence must in the steps 1 to n

generate the same monic orthogonal polynomials as the original

N−dimensional Lanczos recurrence for A, q1.

107

Summarizing,

Tn is determined by the Lanczos process for the matrix Tn and

the starting vector e1 , the monic polynomials {1, ψ1, . . . , ψn}are orthogonal with respect to the (new) innerproduct deter-

mined by the eigenvalues of Tn and the corresponding weights

(ϕ, ψ)n =n∑

i=1

ω(n)i ϕ

(

θ(n)i

)

ψ

(

θ(n)i

)

, ω(n)i =

∣∣∣∣(s

(n)i , e1

)

|2 .

108

The n-th Riemann-Stieltjes integral,

n∑

i=1

ω(n)i f

(

θ(n)i

)

=∫ ξ

ζf(λ) dω(n) (λ) ,

ω(n)(λ) = 0 ζ ≤ λ < θ(n)1 ,

ω(n)(λ) =l∑

j=1

ω(n)j θ

(n)l ≤ λ < θ

(n)l+1 ,

ω(n)(λ) =n∑

j=1

ω(n)j θ

(n)n ≤ λ < ξ .

109

Lanczos algorithm:

• sequence of orthonormal vectors {q1, . . . , qn}

• sequence of Jacobi matrices {T1, . . . , Tn}

• sequence of monic orthogonal polynomials {1, . . . , ψn}

• sequence of R-S integrals with {ω(1), . . . , ω(n)}

• sequence of continued fractions {C1, . . . , Cn}

110

Relationship between∫ ξ

ζf(λ) dω(λ) given by A, q1

and∫ ξ

ζf(λ) dω(n)(λ) =

n∑

i=1

ω(n)i f

(

θ(n)i

)

given by Tn, e1

is nothing but the Gauß quadrature !

The Lanczos process determining the orthonormal basis of Krylov

subspaces is therefore the matrix formulation of the Gauss quadra-

ture. [S, Tichy - 02], [S, Liesen - 05], [Meurant, S - 06].

111

Conjugate gradient method (CG)

The unique method which minimizes the discrete energy norm

of the error over Krylov subspaces (see Appendix)

‖x − xn‖A = minu∈x0+Kn(A,r0)

‖x − u‖A

minz∈Kn(A,r0)

‖(x − x0) − z‖A ,

x − xn = (x − x0) − zn ⊥A Kn(A, r0) ,

rn = b − Axn = A(x − xn) ⊥ Kn(A, r0), rn ⊥ span {q1, . . . , qn} .

112

The CG approximation is determined by

0 = QTn (b − Axn) = ‖r0‖e1 − QT

nAQn yn ,

xn = x0 + Qn yn, Tn yn = ‖r0‖ e1 .

Consequence:

Again, the essence of CG is nothing but the Gauß quadrature!

Everything is determined by ω(λ) . The way the eigenvalues

are linked to convergence is given by the way ω(λ) determines

the sequence ω(n)(λ) , n = 1,2, . . . This relationship is nothing

but trivial !

113

The essence of the CG method

Ax = b , x0 −→∫ ξ

ζf(λ) dω(λ)

↑ ↑

Tn yn = ‖r0‖ e1 ←→n∑

i=1

ω(n)i f

(

θ(n)i

)

xn = x0 + Qn yn

Gauß quadrature !

ω(n)(λ) −→ ω(λ)

114

V/2 Characterization of convergence

In practical applications, preconditioned Krylov subspace meth-

ods search for the sufficiently accurate approximate solution of

the finite dimensional problem in a small number of steps (much

smaller than the system dimension).

“Convergence” must be understood differently from the classical

iterative methods [Hackbush - 94]. We must study the behavior

from the very beginning. No limit, no escape to infinity. We

are interested in the transition period itself [Driscoll, Toh and

Trefethen - 98].

115

In early iterations convergence behavior can strongly depend on

the initial residual (right hand side). Consequently, no analysis

based on the operator (system matrix) only can be sufficient for

achieving a complete understanding.

Very complex phenomenon. In general, no single approach is

sufficient.

Role of the most frequently used eigenvalue - eigenvector struc-

ture in relation to the particular initial residual ?

116

Nick Trefethen [Trefethen-97]:

Any use of eigenvalues to derive physical predictions relies on an

implicit transformation to eigenvector coordinates. If the matrix

is (even moderately) far from normal, the change to eigenvector

coordinates may involve an extreme distortion with a superpo-

sition of huge eigen-components that nearly cancel. The state

of the system may be determined by the pattern of cancellation,

rather than by the size of the individual eigen-components.

Without further transformation the eigenvalue - eigenvector

structure can in such cases hardly be useful!

117

Spectral decompositions

A Hermitian: A = UΛU∗ , UU∗ = U∗U = I , Λ = Λ .

A Normal: A = UΛU∗ , UU∗ = U∗U = I .

A Diagonalizable: A = X ΛX−1 .

A General: A = S J S−1 .

Goal: Show the difference in our understanding when the system

matrix changes from Hermitian to general nonnormal.

118

Conjugate gradient method, A HPD

• ‖ x − xn ‖A = ‖ b − Axn ‖A−1 minimal

• xn = x0 + Qn yn, Tn yn = ‖ r0 ‖ e1

• ‖ rCGn ‖A−1 / ‖ rCG

0 ‖A−1 ≤ minp∈Πn

maxi

|p(λi)|

119

Miminal residual method (MINRES), A Hermitian

• ‖ b − Axn ‖ minimal

• xn = x0 + Qn yn,

where ‖‖ r0 ‖ e1 − Tn+1,n yn ‖ = miny

‖‖ r0 ‖ e1 − Tn+1,n y ‖

• ‖ rMn ‖ / ‖ rM

0 ‖ ≤ minp∈Πn

‖ p(A) ‖ = minp∈Πn

maxi

|p(λi)|

Here Tn+1,n represents the upper Hessenberg tridiagonal matrix

obtained from Tn,n by appending a row [0, . . . ,0, βn+1] .

120

Please notice that ‖ rCGn ‖A−1, ‖ rCG

n ‖A−2 and ‖ rMn ‖ decrease

monotonically, but ‖ rCGn ‖ does not. The CG residual can

exhibit erratic behavior or increase in norm until the last step!

[Hestenes and Stiefel - 52], [Gutknecht, S - 01]

‖ rCGn ‖ =

‖ rMn ‖

√

1 −(

‖ rMn ‖ / ‖ rM

n−1 ‖)2

[Cullum, Greenbaum - 96], (previously [Brown - 91] for FOM –

GMRES). Residual as a measure of convergence for CG? For

a HPD system, MINRES is strictly monotonic.

121

Conclusion :

All is determined by the eigenvalues and by the components

of the initial residual in the individual (invariant) eigenspaces.

The last factor can play a significant role only if the individual

components differ in magnitude.

[Beckerman, Kuijlaars – 02], [Liesen, Tichy - 04]

However, they are initial residuals which can pathologically affect

convergence, see [Scott – 79]. The same residuals suppress com-

pletely the loss of orthogonality in finite precision computations!

122

Ritz values

Roots of the normalized Lanczos polynomial (which is a multiple

of the CG polynomial)

pCGn (µ) = ψn(µ)/ψn(0)

are given by the eigenvalues of Tn i.e. Ritz values. Roots of the

MINRES polynomial are harmonic Ritz values.

[Paige, Parlett and van der Vorst - 95]

Convergence of Ritz values (harmonic Ritz values) explains the

acceleration of convergence of CG (MINRES).

[van der Sluis, van der Vorst - 86]

123

Worst case (operator) bound

For CG and MINRES, minimizing the matrix polynomial (inde-

pendent on the initial residual) gives the worst case bound. The

worst case initial residual may differ for different n . MINRES

example – for each n ,

‖rn‖‖r0‖

= minp∈Πn

‖ p(A) q1 ‖ ≤ max‖q‖=1

minp∈Πn

‖ p(A) q ‖

= minp∈Πn

max‖q‖=1

‖ p(A) q ‖

= minp∈Πn

‖ p(A) ‖ .

124

Linear bounds based on the Chebyshev method, see, e.g. [Hage-

man, Young - 80], [Fischer - 96], [Saad, van der Vorst - 00],

‖x − xn‖A

‖x − x0‖A≤ 2

√

κ(A) − 1√

κ(A) + 1

n

can not be identified, except for some special cases, with the

true behavior of the CG method. Various misleading conclusions

about “complexity of CG”.

125

Unless κ(A) is close to one, the distribution of eigenvalues

between the maximal and minimal ones (not only κ(A)) is im-

portant;

(here we can see a trouble with the term ”preconditioning”).

Does CG significantly outperform Chebyshev?

Unless the spectrum is very special, the global character of CG

takes advantage of the eigenvalue distribution between the min-

imal and the maximal eigenvalue !

126

V/3 Measuring convergence

The CG example (see Appendix)

Given x0, r0 = b − Ax0, p0 = r0

For n = 1,2, . . .

γn−1 = (rn−1, rn−1)/(pn−1, Apn−1)

xn = xn−1 + γn−1 pn−1

rn = rn−1 − γn−1 Apn−1

δn = (rn, rn)/(rn−1, rn−1)

pn = rn + δn pn−1.

127

For most elliptic self-adjoint PDEs, it is natural to measure con-

vergence in solving the discretized problem by ‖x − xn‖A .

The idea of estimating ‖x − xn‖A at the price of d extra steps

comes from [Golub, S - 94]. It was developed into a practical

algorithm in [Golub, Meurant - 97], all based on Gauß quadrature

relationship,

‖x − xn‖2A = EST2 + ‖x − xn+d ‖2A .

When ‖x − xn‖2A ≫ ‖x − xn+d ‖2A , EST gives a tight (lower)

estimate for ‖x − xn‖A, with the inaccuracy determined by

‖x − xn+d ‖A .

128

Mathematically equivalent formulas for EST2 :

[Golub, S - 94], [Golub, Meurant - 97] ‖r0‖2 [Cn+d − Cn ]

[Warnick - 00] rT0 (xn+d − xn)

[Hestenes, Stiefel - 52], after fifty years found and extended in

[S, Tichy 2002, 04] with justification for finite precision compu-

tations

EST2 =n+d−1

∑

l=n

γl ‖rl‖2

129

R. Kouhia, collection Cylshell, N = 90449, κ(A) = 3.62e + 11

130

Incomplete Choleski preconditioned CG, convergence character-

istics and estimate for the A-norm of the error

0 500 1000 1500 2000 2500 300010

−12

10−11

10−10

10−9

10−8

10−7

10−6

10−5

10−4

10−3

10−2 s3dkt3m2: convergence characteristics and the A−norm of the error estimate

true residual normnormwise backward errorA−norm of the errorestimate

131

Estimates for the relative A-norm of the error with different

values of the parameter d

0 500 1000 1500 2000 2500 300010

−9

10−8

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

101

s3dkt3m2: estimates of the relative A−norm of the error for different d

d = 200

d = 80

d = 10

d = 1

relative A−norm of the errorestimate

132

Will the spectral information still determine the convergence be-

haviour when we move to the non-Hermitian case?

See next.

133

Lecture VI

GMRES BEHAVIOUR

AND EIGENVALUES

134

Spectral information and convergence

Normal matrices have a full set of eigenvectors forming a basis

of CN which can be chosen orthonormal. Therefore the change

to (orthonormal) eigenvector coordinates does not involve any

distortion of geometry.

Substantial difference from the Hermitian case which causes

enormous technical difficulties in proofs and in deriving bounds -

the eigenvalues are not real. However, principal difficulties come

with nonnormality.

We restrict ourselves to the GMRES method.

135

Given A ∈ CN×N , b ∈ CN , A nonsingular, we wish to solve Ax = b.

Consider x0 ∈ CN , r0 = b − Ax0,

construct the sequence of Krylov subspaces

Kj(A, r0) = span{

r0, Ar0, . . . , Aj−1r0}

, j = 1,2, . . .

and look for xj ∈ x0 + Kj(A, r0).

136

Minimal residual methods

‖rn‖ = minu∈x0+Kn(A,r0)

‖b − Au‖ = minz ∈ AKn(A,r0)

‖r0 − z‖

⇔ rn ⊥ AKn(A, r0) .

(Hermitian) MINRES [Paige, Saunders - 75]

and its general simplification GMRES [Saad, Schultz - 86];

mathematically equivalent to GCR analyzed in [Elman - 1982]

and to many other (mostly numerically inferior) methods.

MINRES IS NOT a symmetric variant of GMRES!

137

Implementation of GMRES [Saad, Schultz - 86]

• Arnoldi basis

{

v1 ≡ r0‖r0‖, v2, . . . , vn

}

, AVn = Vn+1Hn+1,n .

• xn = x0 + Vn yn ,

‖‖r0‖e1 − Hn+1,n yn‖ = miny

‖‖r0‖e1 − Hn+1,n y‖ .

Other iplementations (GCR, simpler GMRES, ORTHODIR)

suffer from possible numerical difficulties.

138

Bound by Elman step by step for A normal:

‖rn‖ = ‖pn(A)r0‖ = minp∈Πn

‖p(A)r0‖ = minp∈Πn

‖ Y [p(Λ)Y ∗r0]‖

= minp∈Πn

‖p(Λ)Y ∗r0‖ = minp∈Πn

{∑

i

| (y∗i r0) p(λi) |2 }12

≤ ‖r0‖ minp∈Πn

maxi

|p(λi)| .

pn(λi) represents a multiplicative correction to the values of the

individual components of r0 in the orthonormal basis {y1, . . . , yN}in order to minimize the sum of squares.

139

Bound by Elman step by step for A diagonalizable:

‖rn‖ = ‖pn(A)r0‖ = minp∈Πn

‖p(A)r0‖ = minp∈Πn

‖ Y [ p(Λ)Y −1r0]‖

≤ ‖Y ‖ minp∈Πn

‖p(Λ)Y −1r0‖

= ‖Y ‖ minp∈Πn

{∑

i

| [Y −1r0]i p(λi) |2 }12

≤ ‖Y ‖ ‖Y −1r0‖ minp∈Πn

maxi

|p(λi)|

≤ ‖r0‖ κ(Y ) minp∈Πn

maxi

|p(λi)| .

140

For any v, ‖v‖ = 1

minp∈Πn

‖ p(A)v ‖ ≤ minp∈Πn

‖ p(A) ‖ ≡ minp∈Πn

max‖v‖=1

‖ p(A)w ‖ ,

therefore

max‖v‖=1

minp∈Πn

‖ p(A)v ‖ ≤ minp∈Πn

max‖w‖=1

‖ p(A)w ‖ .

In the normal case ≤ can be replaced by =

and the polynomial approximation problem is attainable

[Joubert -93], [Gurvits, Greenbaum - 93], [Trefethen - 93]

‖rn‖‖r0‖

= minp∈Πn

‖ p(A)v1 ‖ ≤ max‖v‖=1

minp∈Πn

‖ p(A)v ‖ = minp∈Πn

‖ p(A) ‖ .

141

For a general Y , some of the components Y −1r0 can become

very large. In such case Y [ p(Λ)Y −1r0] represents a significant

cancelation. The minimization problem

‖rn‖ = minp∈Πn

‖ Y [ p(Λ)Y −1r0]‖

reflects that, while the term in the bound

‖Y ‖ minp∈Πn

‖ p(Λ)Y −1r0 ‖

does not (cf. [Trefethen-97]).

142

Convergence to the exact solution:

GMRES converges to the exact solution at the step m

(xm ≡ x, rm ≡ 0)

if and only if

the vectors r0, Ar0, . . . , Am−1r0 are linearly independent and the

vectors r0, Ar0, . . . , Amr0 are linearly dependent

(Krylov sequence r0, Ar0, . . . , ANr0 has length m).

143

Spectrum of A and convergence of GMRES

In practical computations the rate of convergence is linked to

the distribution of eigenvalues of the matrix A.

There are, however, examples showing that any (nonincreasing)

convergence curve is possible for GMRES with matrix A having

any given (nonzero) eigenvalues.

[Greenbaum, S - 94], [Greenbaum, Ptak, S - 96]

144

Assume convergence exactly in N steps (generalization to m < N

possible). For simplicity of notation r0 = 0 (x0 = 0).

Question I:

Given convergence curve, describe the set of all {A, b} such that

GMRES (A, b) generates the prescribed curve.

Question II:

Given convergence curve, given N nonzero eigenvalues (not

necessarily distinct), describe the set of all {A, b} such that

GMRES (A, b) generates the curve while the spectrum

of A is prescribed.

145

Question III:

Given A, denote by m the degree of the minimal polynomial

of A. Describe those b for which GMRES (A, b) converges in

m steps.

146

Convergence curve

‖r0‖ ≥ ‖r1‖ ≥ · · · ≥ ‖rN−1‖ > ‖rN‖ = 0,

h ≡ (η1, . . . , ηN)T , ηj ≡ ((‖rj−1‖)2 − ‖rj‖2)1/2.

d ≡ (ν1, . . . , νN), ν1 =1

ηN, ν2 = − η1

ηN, . . . , νN = −ηN−1

ηN.

Meaning? Let W = (w1, . . . , wj) be the orthonormal basis of

AKj(A, r0). Then

rn = r0 −n∑

j=1

wjηj, r0 =n∑

j=1

wjηj + rn, ‖r0‖2 =n∑

j=1

η2j + ‖rn‖2

147

Convergence curve companion matrix

H =

0 1/ηN1 .. . −η1/ηN

. . . 0 ...1 −ηN−1/ηN

=

01 .. .

. . . 01

d

H−1 =

η1 1... 0 . . .... . . . 1

ηN 0

=

h

10 .. .

. . . 10

148

Eigenvalues:

{λ1, λ2, . . . , λN}, λj 6= 0, j = 1, . . . , n .

qN(z) ≡ zN −N−1∑

j=0

αjzj = (z − λ1)(z − λ2) . . . (z − λN),

pN(z) ≡ 1 −N∑

j=1

ξjzj = − 1

α0qN(z), ξN =

1

α0, ξj = − αj

α0,

s ≡ (ξ1, . . . , ξN)T , a = (α0, . . . , αN−1)T

149

Spectral comparison matrix: qN(z) = det(C − zI)

C =

0 α01 .. . α1

. . . 0 ...1 αN−1

=

01 .. .

. . . 01

a

C−1 =

−α1/α0 1−α2/α0 0 .. .

... . . .

... 11/α0 0

=

s

10 .. .

. . . 10

150

Theorem 1 (Question I)

The following assertions are equivalent:

1◦ Residual vectors norms of GMRES(A, b) form a prescribednonincreasing sequence ‖r0‖ ≥ ‖r1‖ ≥ · · · ≥ ‖rN−1‖ > ‖rN‖ = 0.

2◦ Matrix A is of the form A = WRHW ∗ and b satisfiesW ∗b = h, where W is a unitary matrix, R is a nonsingularupper triangular matrix and

H =

01 . . .

. . . 01

d

.

151

Proof. Consider the QR decomposition

B ≡ (Ab, A2b, . . . , ANb) = W R

.

Then the columns Wj = (w1, . . . , wj) represent an orthonormal

basis of

AKj = A span{b, . . . , Aj−1b} = span{Ab, . . . , Ajb}.therefore

ηj = |ηj| =(

‖rj−1‖2 − ‖rj‖2)1/2

.

152

Rescaling

b = W (Γh) = (WΓ)h = Wh

where

Γ = diag (γi), |γi| = 1,

we can write

B = WR, R = Γ∗R.

153

1◦ is equivalent to

A(b, WN−1) = AW

h

10 .. .

. . . 10

= AWH−1 .

Since for some nonsingular upper triangular R

A(b, WN−1) = (Ab, AWN−1) = WR ,

the identity AWH = WR finishes the proof.

154

Theorem 2 (Question II)

The following two assertions are equivalent:

1◦ The spectrum of A is {λ1, . . . , λN} and GMRES(A, b) yields

residuals with the prescribed nonincreasing sequence

‖r0‖ ≥ ‖r1‖ ≥ · · · ≥ ‖rN−1‖ > ‖rN‖ = 0 .

2◦ Matrix A is of the form A = WRCR−1W ∗ and b = Whwhere C is the companion matrix corresponding to the

polynomial qN(z), W is unitary and R a nonsingular upper

triangular matrix such that Rs = h.

Corollary: Any noninreasing convergence curve can be generated

by GMRES for a matrix having any prescribed eigenvalues.

155

Proof. Assume 1◦. A is annihilated by qN(z), therefore

B = (A−1B)C = (b, . . . , AN−1b)C ,

AB = BC and b = BC−1e1 = Bs.

Similarly to Theorem 1, b = Wh, B = WR, which gives Rs = h.

Moreover,

AWR = AB = BC = WRC

proves 2◦.

Assume 2◦.

Then sp(A) = {λ1, . . . , λN}, and, by induction, {w1, . . . , wk} rep-

resents the unitary basis of AKk which proves 1◦.156

Remark: W represents a change of the basis.

Denote

S1 = S1(f) the set of all pairs {A, b} determined by Theorem 1,

S2 = S2(f, {λ1, . . . , λN}) the set of all pairs {A, b} determined by

Theorem 2.

Clearly S2 ⊂ S1.

Parametrization?

157

Proposition

The set S2 is parametrized by W and by the nonsingular upper

triangular matrix R satisfying the relation

Rs = h.

The set S1 is parametrized by W and an arbitrary nonsingular

upper triangular matrix R. If, in addition, the spectrum of the

matrix A is prescribed, then this additional condition is equivalent

to

RCR−1 = RH,

or, equivalently, R is given by (for the proof see [Arioli, Ptak,

S-98]

R = R

1 0

0 R−11,N−1

.

158

Since ξN 6= 0, any nonsingular upper triangular matrix R sat-

isfying

Rs = h

has its last column uniquely determined by the entries in the left

principal submatrix R1,N−1 representing free parameters.

159

Denoting

Y ≡ RC−1 = R

s

10 .. .

. . . 10

=

h

R1,N−1

0

,

Then

A = WRCR−1W ∗ = W (RC−1)C(CR−1)W ∗ =

= W (RC−1)C(RC−1)−1W ∗ = WY CY −1W ∗

Assertions 1◦ and 2◦ of Theorem 2 are equivalent to

160

Theorem 2 (continuation)

3◦ Matrix A is of the form A = WY CY −1W ∗ and b = Wh

where C is the companion matrix corresponding to the

polynomial q(λ) , W is unitary and R1,N−1 part of Y

is any (N−1) by (N−1) nonsingular upper triangular matrix.

161

The problem of “constants” in the bounds of the type

‖ rn ‖ ≤ ω(A, r0) Fn(sp(A), N) .

If conclusion is based only on Fn(sp(A), N) and the dependence

of ω(A, r0) on the data is not included, then the bound must

hold for any data. Consequently, the bound is for any finite di-

mensional problem irrelevant, otherwise we get a contradiction

with the Theorem.

162

The bound Const Fn(sp(A), N) does not intersect the rectangle

(1,0) − (1, N) − (0, N) − (0,0) .

163

Relationship to minimal polynomial

Theorem 3

Let m denotes the degree of the minimal polynomial qA(λ) of

the matrix A. Then, for any right hand side b, GMRES(A, b)

converges to the exact solution x on or before the step m. More-

over, there exist a right hand side b, for which GMRES(A, b)

converges to x exactly in m steps.

Characterization of right hand sides, for which Krylov sequences

have the maximal length?

164

Minimal polynomial qA(λ) = (λ − λ1)n1 . . . (λ − λk)

nk.

Denote the nullspaces of (λjI − A)nj by E(λj).

Then any b can be decomposed as

b = t1 + t2 + · · · + tnk, tj ∈ E(λj).

The vector b yields the Krylov sequence of the length m if and

only if

(λjI − A)nj−1 tj 6= 0

for each j, j = 1, . . . , k. Equivalently, the vector b have for each

j nonzero component in the direction of at least one last Jordan

principal vector conformed to any of the Jordan blocks largest

in size corresponding to λj.

165

Pathological initial residuals?

The presented cautious view seems to be in conflict with the

common wisdom – convergence is commonly related to eigen-

value distribution even for general matrices without examining

eigenvectors. The proved facts should not be ignored (even a

common knowledge can be wrong), but they should be under-

stood and interpreted correctly! There are good reasons for link-

ing convergence to eigenvalues in many cases, but the reasons

must be given and examined (contrary to common practice).

The role of “pathological initial residuals”; just academic exam-

ples ? Not true. Convection-diffusion examples were described

by Trefethen long ago, see also [Ernst - 00].

166

VI/2 Convection-diffusion model problem

Convection dominated: ν ≪ ‖w‖

167

Discretization

• regular h × h grid, h = 1/(N + 1) , bilinear finite elements,

mesh Peclet number Ph ≡ (h‖w‖)/(2ν) ;

• Ph > 1 , then Galerkin discretization produces wiggles (non-

physical oscillations near the boundary layers);

• Streamline Upwind Petrov Galerkin (SUPG) equivalent to

adding stabilizing diffusion in the direction of the flow (wind);

• wind parallel to the mesh; here the vertical wind

w = [0,1]T .

168

The coefficient matrix of the linear algebraic system is

A = ν Ad + Ac + δ As ,

Ad = (∇φj, ∇φi) ,

Ac = (w · ∇φj, φi) ,

As = (w · ∇φj, w · ∇φi) , δ = δ∗ h/‖w‖ .

A =(

(ν I + δ w wT )∇φj, ∇φi

)

+ (w · ∇φj, φi) .

170

The constituent matrix stencil for A

m4 m3 m4տ ↑ ր

m2 ← m1 → m2ւ ↓ ց

m6 m5 m6

171

has numerical values

−ν3 + h

12(1 − 2δ) −ν3 + h

3(1 − 2δ) −ν3 + h

12(1 − 2δ)տ ↑ ր

−ν3 + δh

3 ← 83ν + 4

3δh → −ν3 + δh

3ւ ↓ ց

−ν3 − h

12(1 + 2δ) −ν3 − h

3(1 + 2δ) −ν3 − h

12(1 + 2δ)

172

With our choice of w, the differential equation is separable, and

the eigendecompozition of the discretized operator is known an-

alytically.

173

Consider the mass (M) , stiffness (K) and gradient (C)

matrices of the corresponding 1D convection-diffusion model

problem discretized using linear elements with the mesh size h,

M =h

6tridiag (1,4,1) , K =

1

htridiag (−1,2,−1) ,

C =1

2tridiag (−1,0,1) .

Let ‘ ⊗ ’ denote the Kronecker product of matrices.

174

Then the 2D SUPG discretized N2 × N2 operator is for the

horizontal line ordering of unknowns

AH = νM ⊗ K + ((ν + δh)K + C) ⊗ M ,

for the vertical line ordering of unknowns

AV = νK ⊗ M + M ⊗ ((ν + δh)K + C) .

AH and AV are orthogonally similar,

AV = P AH P , P = [I ⊗ e1, . . . , I ⊗ en] , P = PT , P2 = I .

175

≈ optimal stabilization parameter δ∗ ≡ 12

(

1 − 1Ph

)

affects

• smoothing of the discretized solution,

• behavior of the linear algebraic solver (convergence behavior

of GMRES).

Examples of boundary conditions:

• Raithby (discontinuous inflow),

• partial right side of the domain.

176

00.2

0.40.6

0.81

0

0.2

0.4

0.6

0.8

10

0.2

0.4

0.6

0.8

1

x−axisy−axis

solu

tion

177

00.2

0.40.6

0.81

0

0.2

0.4

0.6

0.8

10

0.2

0.4

0.6

0.8

1

x−axisy−axis

solu

tion

178

Long list of authors, papers and books

Brooks, Hughes, Raithby, Roos, Stynes, Tobiska,

Morton, Axelsson, . . .

GMRES convergence studied using the field of values and the

eigendecompozition of the system matrix in particular by

[Eiermann - 89], [Ernst - 00], [Eiermann, Ernst - 02], [Fisher,

Ramage, Silvester and Wathen - 99], [Elman, Ramage - 01, 02].

Different approach suggested in [Liesen, S - 04], [Liesen, S - 05].

179

Eigendecompozition of AH (AV ) does not lead to useful bounds

due to the ill-conditioned eigenvectors and cancelation. Instead:

The matrices K and M are symmetric tridiagonal Toeplitz.

The matrix of eigenvectors

U = [u1, . . . , uN ], U = UT , U2 = I

uj = (2h)1/2 [sin(jhπ), . . . , sin(Njhπ)]T , j = 1, . . . , N ,

represents the Fourier basis. Consider the Fourier transformation

of unknowns in the direction perpendicular to the wind. Subse-

quent reordering of the transformed unknowns by vertical lines

gives

P (I ⊗ U)AH (I ⊗ U)P P (I ⊗ U)xH = P (I ⊗ U) bH .

180

The transformation above corresponds to the simultaneous di-

agonalization of the symmetric tridiagonal Toeplitz blocks in the

block tridiagonal matrix AH , with the subsequent permutation

of the rows and columns.

The approach using AV xV = bV is even more straight, AH was

used here for historical reasons. Resulting system:

Ty = f

181

T = diag (Tj) , Tj = tridiag (γj, λj, µj) , j = 1, . . . , N .

Thus, the original discretized system transformes to N non-

symmetric tridiagonal Toeplitz systems

Tj yj = fj , j = 1, . . . , N

representing N discretized one-dimensional convection-diffusion

problems (in the vertical direction of the original mesh, but ac-

counting for the diffusion in the horizontal direction).

182

GMRES convergence behavior:

‖rn‖2 = minp∈Πn

‖ p (A) b ‖2 = minp∈Πn

‖ p (T ) f ‖2= minp∈Πn

N∑

j=1

‖ p (Tj) fj ‖2 .

GMRES for non-symmetric tridiagonal Toeplitz systems? Inter-

esting case: the superdiagonal (µj) substantially smaller in mag-

nitude than the two others, |γj| ≈ λj ≫ µj . Relating the problem

for Tj, fj to convergence of GMRES for scaled Jordan blocks,

we proved (and quantified)

Theorem

Let l be the index of the first significant nonzero entry in fj .

Let |γj| ≈ λj ≫ µj . Then GMRES for Tjyj = fj must converge

slowly for at least N − l steps.

183

Slow initial convergence:

‖rn‖2 = minp∈Πn

N∑

j=1

‖ p (Tj) fj ‖2 ≥N∑

j=1

minp∈Πn

‖ p (Tj) fj ‖2 .

If the theorem applies at least for one j, then the convergence

of GMRES for A x = b must be slow for at least N − l steps.

Acceleration of convergence:

The technique developed in [Greenbaum, Duintjer Tebbens and

S - 05?] leads to tight upper and lower bounds which capture

the sharp convergence acceleration.

184

Nonzero boundary conditions on the full right side of the domain,

GMRES convergence for the whole system (solid line) and for

the individual tridiagonal Toeplitz blocks.

0 2 4 6 8 10 12 14 16 18 2010

−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

12 3 4

56789

10

11

12

13

14

15

185

Nonzero boundary conditions on the part of the right side of the

domain, GMRES convergence for the whole system (solid line)

and for the individual tridiagonal Toeplitz blocks.

0 2 4 6 8 10 12 14 16 18 2010

−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

12 3 4 5

67891011121314

15

186

Discontinuous inflow boundary conditions (Raithby), two differ-

ent values of the diffusion coefficient ν = 0.01 and ν =

0.0001 correspond to the solid and to the dashed line, respec-

tively.

0 5 10 15 20 25 30 35 40 45 50 5510

−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

187

σjk = λj + (γj µj)1/2 ωk , ωk = 2cos (khπ), k = 1, . . . , N .

Which spectrum corresponds to which convergence curve?

λj > 0 , γj µj < 0 .

0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05 0.055 0.06 0.065−0.025

−0.02

−0.015

−0.01

−0.005

0

0.005

0.01

0.015

0.02

0.025

188

GMRES Convergence curve, upper and lower bounds for

ν = 0.01 .

0 5 10 15 20 25 30 3510

−20

10−15

10−10

10−5

100

189

GMRES Convergence curve, upper and lower bounds

for ν = 0.0001 .

0 10 20 30 40 5010

−20

10−15

10−10

10−5

100

190

Concluding remarks

• initial phase is important, it depends on the right hand side!

• technique: orthonormal transformation to Jordan-like-structure

(the problem is diagonalizable!)

• generalizations? Many ways . . . ?

• analytical study of preconditioning?

191

What can be saved in finite precision arithmetic? Does the anal-

ysis of finite precision behaviour belong to mathematics?

See in Lectures VII and VIII.

192

Lecture VII

NUMERICAL BEHAVIOUR

I.

193

General considerations

1. Can’t we compute exactly?

2. Intermediate quantities and desired accuracy

3. Computational cost and numerical stability

194

3.1 Can’t we compute exactly?

No, some problems cannot be solved exactly in principle. For

example, eigenvalues cannot in general be computed exactly be-

cause of the Abel theorem. Consequently, the Schur decomposi-

tion cannot in general be computed exactly - in a finite number

of steps. There is an unavoidable truncation error.

Limited accuracy of performing elementary computer operations

(storing data, + , − , ∗ , /) leads to rounding errors. This we

call precision, and speak about finite precision arithmetic. We

can emulate arbitrary precision arithmetic, but we cannot use it

widely in solving practical problems.

195

Will that issue be resolved by the progress in technology? Hardly.

Accuracy of a computed result is determined by the way the

elementary rounding errors on the machine precision level are

amplified in the computational process. Machine precision will

always be limited, and it influences the resulting accuracy linearly,

while the growth can be exponential.

However, the amplification of elementary rounding errors is not

random, it can be analyzed and understood!

Rounding errors are not always bad;

see, e.g., breaking the symmetry in shifted QR algorithm which

can theoretically suffer from infinite oscillations (relation to dy-

namical systems, [Batterson, Smilie -90]), or generating nonzero

components in invariant subspaces in the Lanczos method.

196

If not under control, elementary rounding errors can grow and

cause a large computational (numerical) error, which can invali-

date the whole solution process. Interestingly, this fact was well

understood by the founding fathers Von Neumann, Goldstine,

Turing, Wilkinson, Forsythe . . . However, it has largely been

ignored in most numerical PDE literature.

[Nash, Golub - 90, quote by Parlett], [Babuska - 03], [Oden et

al - 03], [Wohlmuth, Hoppe - 99], [Stein(ed) - 03]

Possible consequences of not including computational error in

the error analysis of the whole solution process?

197

• Either the computation of the approximate solution of the

algebraic problem consumes unnecessary time and resources

due to aiming at unnecessary high accuracy,

• Or the computational error which is not under control can

impinge the other stages of the solution process and spoil

the numerical solution.

Efficient and computable a-posteriori error bounds which include

algebraic errors [Jiranek, S, Vohralık - 08].

A philosophical difficulty of rounding error analysis - it can not

be done mechanically without a deep knowledge of the analyzed

method and algorithm.

198

3.2 Intermediate quantities and desired accuracy

Do we need in general highly accurate intermediate quantities

in order to guarantee a required (high or not) accuracy of the

computed final result? No, we do not.

Surprising observation Parlett, Wilkinson, see [Parlett - 90]

The number of significant digits in the intermediate quantities

generated in a computation may be quite irrelevant to the accu-

racy of the final output.

199

Vital correlations between (inaccurately) computed quantities

(recall, amplification of rounding errors is not random) can lead

to highly accurate final results. Understanding gained via round-

ing error analysis can guarantee final accuracy close to the ma-

chine precision level.

Such understanding is based on deep mathematical knowledge

about the analyzed method and algorithm.

Example:

The Lanczos method for solving Hermitian eigenvalue problems.

200

Principle of the Lanczos method

Ideally, find in steps 1 through n an N by n matrix Qn having

orthonormal columns such that

Q∗n A Qn = Tn ,

where Tn is Hermitian tridiagonal. Eigenvalues of Tn are then

considered approximations of the (dominant) eigenvalues of A .

Computationally, in the presence of rounding errors, Qn does

not have orthonormal columns. The columns may even become

(numerically) linearly dependent.

201

Even worse, for the computed quantities

Q∗n A Qn 6= Tn ,

and Tn may even not represent a matrix of the operator Aprojected on the Krylov subspace generated by the computed

Lanczos vectors. Most of the entries in Tn may even not have

a single digit of accuracy, i.e.

Tn − Tn can be large.

Does this mean a total disaster? No! The magic is called back-

ward error, and we know it from the work of Wilkinson, Paige

and Greenbaum.

202

For steps 1 to n of a given Lanczos FP computation there exist:

• An M by M matrix A having all its eigenvalues close to

the eigenvalues of A , M ≥ N , possibly M ≫ N ;

• An M by n matrix Qn having orthonormal columns such that

Q∗n A Qn = Tn

Results of the finite precision Lanczos computation for the matrix

A are equivalent to the results of the exact Lanczos computation

for the matrix A having nearby eigenvalues.

203

Consequently, as we will see, Tn is used for computing the

eigenvalues of A to close to full machine precision!

The bad part of the story is that this remarkable success is

not without possible side effects. The eigenvalues of A are

not approximated in the same order and with the same speed

as it would be ideally (in exact arithmetic). This is caused by

the fact that single eigenvalues can in finite precision Lanczos

computation be approximated by multiple computed copies.

In order to prevent the side effects, we must pay the price -

here not by computing the intermediate quantities using higher

precision, but by applying some correction procedure such as

partial reorthogonalization; for an overview see [Parlett - 92].

204

3.3 Computational cost and numerical stability

Towards a mathematical foundation of numerical analysis

– quest for a formal mathematical model of computing with real

numbers, see [Blum, Cucker, Shub, Smale - 99], [Smale - 97]:

• Complexity theory of numerical analysis – study of the num-

ber of arithmetic operations required to pass from the input

to the output of a numerical problem;

• Upper bounds aspect – worst or average case analysis of

basic algorithms;

• Lower bounds aspect – examination of efficiency for all al-

gorithms solving a given problem (the intrinsic difficulty of

solving a problem).

205

Complications

• Ill-posed problems,

• Conditioning,

• Round-off errors,

• Problems are by their nature solved only to a certain accuracy

(eigenvalue problems, iterative methods in general . . . ).

Conclusion: There are practically no results linking complexity

and numerical stability of computing over real numbers.

[Cucker 99]

206

However, we should keep in mind that in numerical analysis, al-

gorithms are tools for solving practical problems - see also [CBSS

- 99, p.23], [Iserles - 00], [Baxter, Iserles - 03].

We should consider, that a practical problem means some par-

ticular (class of) data, for which we seek the approximate so-

lution(s). The specific properties of the data (inner correlations

etc.) are used in order to get the approximate solution efficiently.

Consequently, questions related to a particular problem (includ-

ing data) are much more specific than worst-case or average-case

bounds.

We do not focus on complexity and restrict ourselves to the cost

of particular computations. There are many results linking the

cost of a particular computation to numerical stability!

207

Short recurrences

1. Loss of orthogonality leads to delay

1.1 Hermitian systems

1.2 Non-Hermitian systems

2. Maximal attainable accuracy

3. Measuring convergence in FP computations

208

Example: In finite precision conjugate gradients orthogonality is

lost, convergence is delayed and final accuracy is limited

0 20 40 60 80 100 120

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

(E) || x−xj ||

A

(FP) || x−xj ||

A

(E) || I−VTjV

j ||

F(FP) || I−VT

jV

j ||

F

209

4.1 Loss of orthogonality leads to delay

In the Hermitian case, the underlying basis (though it may not

be normalized) is the Lanczos basis. Therefore we must study

rounding error effects in computing Lanczos vectors.

Lanczos vectors are computed using a three-term recurrence, or,

possibly, using two coupled two-term recurrences. Consequently,

orthogonality (even linear independence) may be lost quickly. For

a long time it was concluded that loss of orthogonality meant also

loss of all elegant mathematical structure of orthogonal polyno-

mials (and Gauss quadrature) which could not be extended to

computational behavior of the Lanczos process.

210

However, [Paige - 71, 76, 80]: Loss of orthogonality follows a

regular structure, which can be revealed!

In finite precision computation

AQn = QnTn + βn+1qn+1eTn + Fn, ‖ Fn ‖ ≤ n1/2 ‖ A ‖ ε1 .

QTnQn 6= I , Tn computed by FP L(A, q1) may be far from the

theoretical counterpart.

211

Tn = Sn diag (θ(n)j ) S∗

n , Sn = [s(n)1 , . . . , s

(n)n ] , s

(n)j =

s(n)1j...

s(n)nj

s(n)1j top element - weight,

s(n)nj bottom element - approx. bound, δnj = βn+1|s(n)

nj | ,θ(n)j Ritz value,

z(n)j = Qns

(n)j Ritz vector.

212

Accuracy of the Ritz values computed in the Lanczos method

Exact arithmetic: minl |λl − θ(n)j | ≤ ‖Az

(n)j − θ

(n)j z

(n)j ‖ ≤ δnj .

Finite precision arithmetic:

minl

|λl − θ(n)j | ≤

‖Az(n)j − θ

(n)j z

(n)j ‖

‖z(n)j ‖

≤ ( δnj + n1/2‖A‖ε1 )

‖z(n)j ‖

Due to the loss of orthogonality it can happen ‖z(n)j ‖ → 0 !

The quantity δnj is easy to compute with negligible additional

rounding errors. Does it tell anything about convergence of θ(n)j

in finite precision computations?

213

|λi − θ(n)j | ≤ max

{

2.5(δnj + n1/2 ‖ A ‖ ε1), (n + 1)3 ‖ A ‖ ε2}

,

‖ z(n)j − (z

(n)j , ui)ui ‖ ≤ (δnj + n1/2 ‖ A ‖ ε1)

minl 6=i

|λl − θ(n)j |

δnj = βn+1|s(n)nj |

Fascinating result! Result of FP computation verified at no

cost! Please notice that without the theory developed by Paige,

the ideal relations would imply nothing about the result of FP

computations!

214

Bounds for |s(n)nj |, δnj? [Parlett - 80], [Greenbaum, S - 90]

Loss of orthogonality among the Lanczos vectors? Paige:

|(z(n)j , qn+1)| =

|ε(n)jj |δnj

, |ε(n)jj | ≤ n ‖ A ‖ ε2 .

As long as there is no converged Ritz value, orthogonality must

be well preserved. If the orthogonality among z(n)j , qn+1 is lost,

then θ(n)j have converged to some λi .

New related results

[Wulling - 04, 05?], [Zemke - 04], [Meurant - 05?]

215

0

20

40

60

80

010

2030

4050

6070

80

0

0.2

0.4

0.6

0.8

1

Orthogonality among the Lanczos

vectors.

0

20

40

60

80

010

2030

4050

6070

80

0

0.5

1

1.5

Orthogonality of the (n + 1)st

Lanczos vector (right) against the

Ritz vectors from the nth step

(left).

216

Other work on the loss of orthogonality:

• In [Grcar - 81, never published] Forward analysis - when

the forward error of the computed Lanczos vectors is not

exceeding√

ε , the computed Krylov subspace is correct to

the ε level (the error is largely within the exact subspace).

This was called projection property. In order to maintain the

projection property, Grcar suggested periodic reorthogonal-

ization. It makes sense only until the forward error is below√ε . Not a formal mathematical theory.

• Berkeley, Under the influence of Parlett, Kahan, see [Parlett

- 80, 92, 94], [Parlett, Reid - 81], [Greenbaum, S - 92], [S,

Greenbaum - 92]

217

• In [Parlett, Scott - 79]: Maintaining the strong linear indepen-

dence of the Lanczos vectors - semiorthogonality. Orthog-

onalize only against converged Ritz vectors (when δnj ≈‖A‖√ε .

• In [Scott -79]: Ideally, for any matrix A there is always a

starting vector q1 such that the Lanczos method does not

converge to any eigenvalue until the last step. Construction

- Ritz values at step n − 1 prescribed as the midpoints of

the intervals given by the eigenvalues.

Works also computationally (from experiments). Consequence:

Rounding error amplification can strongly depend on the ini-

tial vector!

218

• In [Simon - 84, 84]: Monitoring semiorthogonality via simple

scalar recurrence, partial reorthogonalization. Semiorthogo-

nality ensures, that the computed matrix Tn represents, up

to the terms ≈ ‖A‖ε , the orthogonal projection of A onto

the computed Krylov subspace.

• In [Parlett - 92]: Full reorthogonalization makes sense only

until the semiorthogonality is maintained.

• Tight clusters of eigenvalues - [Dhillon - 97], [Parlett - 96],

[Ye - 95], [Dhillon, Parlett - 04, 04]

219

Delay of convergence: Backward error - like analysis

of the symmetric Lanczos and CG

[Greenbaum - 89]

Finite precision behavior is explained using exact precision results

for a larger problem.

It uses the relationship between Lanczos method, Jacobi matri-

ces and Orthogonal polynomials.

220

First, recall that any n distinct points {θ(n)j }n

j=1 with weights

{ω(n)j }n

j=1, ω(n)j > 0,

n∑

j=1ω

(n)j = 1 , define the unique set of

monic polynomials

1, ψ1, . . . , ψn

orthogonal with respect to the innerproduct

(ϕ, ψ)n =n∑

j=1

ω(n)j ϕ(θ

(n)j ) ψ(θ

(n)j ) .

Recall the R-S integrals!

221

If ω(n)j = (s

(n)1j )2 , then ψl are the characteristic polynomials of

Tl (Lanczos polynomials), satisfying the minimization property

‖ ψl ‖n = min { ‖ ψ ‖n , ψ monic of degree ≤ l } , l = 1, . . . , n .

Please notice the interpretation of top elements of Tn’s eigen-

vectors.

Selection of related work: [Karlin, Shapley - 53], [Fischer, Freund

- 93], [Freund, Hochbruck - 93], [Golub, S - 94], [Golub, Meurant

- 94, 97], [Gautschi - 03], ...

222

Second, please notice that exact or FP L(A, q1) generates in

steps 1 to n a sequence T1 − Tn which is exactly the same

as in exact L(B, p1) ,

B = V diag (θ(n)j ) V ∗, V ∗V = I, p1 = V

(

s(n)11 , s

(n)12 , . . . , s

(n)1n

)T,

e.g., for V ≡ Sn, p1 = e1, B ≡ Tn , Tn is generated by the

exact L(Tn, e1) .

FP Lanczos in steps 1 to n → Exact Lanczos

223

[Greenbaum - 89] much stronger:

Let J steps of FP L(A, q1) produce TJ . Then TJ is generated

in J steps of exact Lanczos algorithm applied to some AJ , q1J .

AJ is of dimension N + l(J) ; all its eigenvalues lie within tiny

intervals about the eigenvalues of A .

Similarly for the norm of the residuals in the CG method.

[S - 91]: For any eigenvalue of A there must be at least one

eigenvalue of AJ close to it.

Exact distribution of AJ’s eigenvalues depends on the actual

rounding errors.

224

[Greenbaum, S - 92]:

FP L(A, q1) and FP CG(A, q1) behave very similarly as the

exact algorithms applied to any A, q1 from a certain class A

is of dimension Nl ,

where Nl eigenvalues are spread throughout tiny intervals about

the eigenvalues of A while each tight cluster has the total weight

of the original eigenvalue.

This model is valid for any reasonable number of steps, practi-

cally no dependence on l (if sufficiently large), small dependence

on the size of intervals.

In terms of the R-S integrals (theory still not finished):

225

Finite precision Lanczos (CG) is (with some inaccuracy) the ma-

trix formulation of the exact Gauss quadrature of the R-S integral

for some blurred distribution function ω(λ) , which represents the

spectral decomposition of some infinitely dimensional problem.

...

0

1

ω1

ω2ω3

ω4

ωN

ζ λ1λ2 λ3. . . . . . λN ξ

226

Consequences:

• a small Hermitian (HPD) perturbation causes only a small

change of the Lanczos (CG) behavior.

• Finite precision L (CG) for A, q1 corresponds to the exact

precision L (CG) for A, q1 .

By applying exact precision theory (convergence bounds) to

A, q1 we obtain a quantitative description of FP L (CG) for

A, q1 . [Greenbaum, S - 92], [Notay - 93]

• Approximation to the minimal polynomial is for the case with

individual well-separated eigenvalues very different from the

approximation in the case of tight clusters of eigenvalues.

227

0 10 20 30 40 50 60 70−40

−30

−20

−10

0

10

20

30

40

lambda

valu

e of

the

poly

nom

ial

Several close roots are placed in well separated tight clusters

due to the minimization property. But it means that, at the

given step, we do not have enough Ritz values to approximate

some eigenvalues in the other parts of the spectrum; the CG

convergence can be for these two cases very different.

228

Linear independence of the computed generating vectors is lost

with multiple Ritz values. Delay of convergence: each loss of

linear independence costs one iteration!

iteration . . . n ,

dimension of computed Kn . . . n − i ,

delay . . . i .

It is extremely difficult to estimate for the given step n

∣∣∣‖rFP

n ‖ − ‖rn‖∣∣∣ ≤ ‖rFP

n − rn‖

even for HH GMRES (it assumes solving the question about the

stability of Krylov subspaces). We have to rotate our view.

229

0 20 40 60 80 100 12010

−18

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

iteration number

resi

dual

nor

ms,

bac

kwar

d er

ror,

err

or n

orm

and

loss

of o

rtho

gona

lity

CG − Lanczos

For example, for n = 95 dim (Kn) is not 95, but only 43. When

we shift back each point on each convergence curve i = i(n)

steps, we obtain

230

0 20 40 60 80 100 12010

−18

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

iteration number

resi

dual

nor

ms,

bac

kwar

d er

ror,

err

or n

orm

and

loss

of o

rtho

gona

lity

rank−shifted CG − Lanczos

shifted CG convergence curves,

which can be compared with

0 20 40 60 80 100 12010

−18

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

iteration number

resi

dual

nor

ms,

bac

kwar

d er

ror,

err

or n

orm

and

loss

of o

rtho

gona

lity

reorhogonalized CG − Lanczos

the double reorhogonalized CG

[Paige - 80].

231

4.1.2 Non-Hermitian systems

Small perturbation of A can cause a large perturbation of the

spectrum - the role of eigenvalues?!

• Though some results formally correspond to those of Paige,

their interpretation must be different. [Bai - 94]

• Similarly the re-biorthogonalization (maintaining semidual-

ity). [Day - 99]

• Backward error - like result?

232

4.2 Maximal attainable accuracy

recursively computed residual × true residual b − Axn

recursive residuals −→ 0 . Final accuracy?

Two term recurrences:

xn+1 = xn + ωn pn

rn+1 = rn − ωn Apn , pn+1 = rn+1 + ψn pn

233

[Sleijpen et al. - 94], [Greenbaum - 97]:

en = (b − Axn) − rn , en = en−1 + ln−1 ,

where ln−1 counts for the local errors in computing xn, rn

from xn−1, rn−1 .

Consequently,

en+1 = e0 +n∑

j=0

lj ,

global error is given as the sum of local errors.

234

Bound:

‖ek‖‖A‖‖x‖ ≤ const k θk ε + O(ε2) , θk = max

j≤k‖xj‖/‖x0‖ .

Consequence: Oscillations of the size of the approximate so-

lution may damage the final accuracy (BiCG - like methods!).

Extension to LS: [Bjorck, Elfving, S - 97].

Backward stability (based on the assumption ‖rk‖ → 0).

235

Three-term recurrences - a different story:

Coupled two-term recurrences replaced by

xn+1 = −( rn + αn xn + βn−1 xn−1 )/γn,

rn+1 = −(Arn + αn rn + βn−1 rn−1 )/γn,

where γn = −(αn + βn−1).

Examples: Hestenes & Stiefel CG × Rutishauser CG; BiCG

× BIORes; (QMR variants).

236

Observation: three-term implementations “less stable” than

the coupled two-term ones (for nonsingular systems) - the final

accuracy can be much worse.

Explanation: [Gutknecht, S - 97]

en = (b − Axn) − rn , ln−1 local error analogous to two-term

case (not equal!).

en+1 = −(

enαn

γn+ en−1

βn−1

γn+ ln

)

Local errors are potentially amplified by the recurrence.

237

Global error in terms of local errors - multiplicative factors may

become large!

en+1 = e0 −n∑

j=0

lj

− l0

(

β0

γ1+ · · · + β0 . . . βn−1

γ1 . . . γn

)

− l1

(

β1

γ2+ · · · + β1 . . . βn−1

γ2 . . . γn

)

...

− ln−1βn−1

γn.

238

Example: three term CG (HPD case)

Amplification factors:

(1−ϑ)1

κ(A)

‖rk‖2‖ri−1‖2 ≤

k∏

j=i

βj−1

γj≤ (1+ϑ)κ(A)

‖rk‖2‖ri−1‖2 , ϑ ≪ 1 .

Note: holds for the computed values;

here [Greenbaum - 89], [Greenbaum, S - 92] results used.

Consequence: Oscillations of the size of the recursive residuals

may extensively damage the final accuracy.

239

0 20 40 60 80 100 12010

−18

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

Iteration Number

Tru

e R

esid

ual N

orm

s

Comparison of HS, R and HY Variants

The CG relative residual can indeed very strongly oscillate!

240

4.3 Measuring convergence in FP CG

We present a CG example, matrix s3rmt3m3 from the Cyllshell

collection by Reijo Kouhia, incomplete Choleski preconditioner.

Ideally (in exact arithmetic)

EST2 =n+d−1

∑

l=n

γl ‖rl‖2 = rT0 (xn+d − xn) .

Computationally, though the second estimate is evaluated accu-

rately, it gives misleading information.

241

0 100 200 300 400 500 600 700

10−12

10−10

10−8

10−6

10−4

10−2

100

relative A−norm of the errornumerically stable estimatenumerically unstable estimate

For the numerically unstable estimate, the identity is in finite

precision computations not valid. Rounding error analysis is fun-

damental, it should not be ignored!

242

Among the issues not covered:

• Breakdowns and their influence [Van Den Eshof - 03];

• Closeness to singularity and incompatible systems;

• Can short recurrences produce a well-conditioned basis? (in-

terpretation of look-ahead techniques in FP computations);

• Inaccurate Krylov subspace methods [Sleijpen et al. - 02],

[Simoncini, Szyld - 02].

243

Lecture VIII

NUMERICAL BEHAVIOUR

II.

244

1. Orthogonality and numerical stability

2. Householder GMRES

3. Modified Gram-Schmidt GMRES

4. What should follow

245

5.1 Orthogonality and numerical stability

[Liesen, Rozloznık, S - 02]: The choice of the subspace (basis)

which is used in computations can have a fundamental impact.

‖rn‖ = minu∈x0+Kn(A,r0)

‖b − Au‖ = minz∈A Kn(A,r0)

‖r0 − z‖

⇔ rn ⊥ A Kn(A, r0) .

246

The straightforward approach

Based on the orthonormal basis of AKn(A, r0) . Define w1 ≡Ar0/‖Ar0‖ , v1 = r0/‖r0‖ . Then the recursive columnwise QR-

factorization yields

[Av1, AWn−1] = A [v1, Wn−1] ≡ Wn Rn ,

Wn ≡ [w1, . . . , wn], WTn Wn = In ,

span {w1, . . . , wn} = AKn(A, r0) ,

κ([v1, Wn−1]) ≤ κ(Rn) ≤ κ(A)κ([v1, Wn−1]) .

247

Using the orthonormal basis w1, . . . , wn of AKn(A, r0) :

xn = x0 + [v1, Wn−1] tn ∈ x0 + Kn(A, r0) ;

⇒ rn = r0 − A [v1, Wn−1] tn = r0 − WnRn tn ;

⇒ tn = (WnRn)+ r0 = R−1

n WTn r0 .

How does this affect the numerical stability?

248

Theorem

‖rn‖‖r0‖

= σn+1([v1, Wn])σ1([v1, Wn]) =2κ([v1, Wn])

κ([v1, Wn])2 + 1,

‖r0‖‖rn‖

≤ κ([v1, Wn]) ≤ 2‖r0‖‖rn‖

,

‖r0‖‖rn‖

≤ κ(Rn) ≤ 2κ(A)‖r0‖‖rn‖

.

249

Consequently, κ(Rn) must inevitably increase as ‖rn‖ de-

creases, even for small κ(A) and with the most stable way of

computing w1, . . . , wn.

⇒ Computation of tn = R−1n WT

n r0 is inherently unstable!

Surprise: Numerical behaviour gets worse when orthogonality

of w1, . . . , wn is maintained better; Householder implementation

performs worse than Modified-Gram-Schmidt implementation.

The straightforward approach is used in “Simpler GMRES” [Walker,

Lu Zhou - 94], and it is related to other implementations, e.g.

Orthodir.

250

Classical GMRES implementation

Based on the orthonormal basis of Kn(A, r0) . Let v1 ≡ r0/‖r0‖.Then the Arnoldi process yields

AVn = Vn+1 Hn+1,n ,

Vn ≡ [v1, . . . , vn], V Tn Vn = In ,

span {v1, . . . , vn} = Kn(A, r0) ,

κ(Hn+1,n) ≤ κ(A) .

251

Using the orthonormal basis v1, . . . , vn of Kn(A, r0) :

xn = x0 + Vn zn ∈ Kn(A, r0) ,

⇒ rn = r0 − AVn zn = r0 − Vn+1Hn+1,n zn ,

⇒ zn = (Vn+1Hn+1,n)+ r0 = (Hn+1,n)

+ V Tn+1 r0 .

How does this affect the numerical stability?

252

0 10 20 30 40 50 60 70 8010

−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

iteration number

rela

tive

resi

dual

nor

ms

Householder implementations of classical and simpler GMRES

FS1836(183), cond(A)=1.7367e+11

Householder implementations

(residual and backward error).

Excessive ill-conditioning of

computed Rn leads in the

straightforward implementation

to divergence.

0 10 20 30 40 50 60 70 8010

−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

iteration number

rela

tive

resi

dual

nor

ms

MGS implementations of classical and simpler GMRES

FS1836(183), cond(A)=1.7367e+11

MGS implementations (residual

and backward error). Due to

rounding errors the identity is vi-

olated, and the computed Rn is

not so badly ill-conditioned as it

ideally should be!

253

Moral

The choice of the right subspace (basis) is fundamental for nu-

merical stability of the Krylov subspace methods.

Even the best orthogonalization technique in computing the ba-

sis (here Householder reflections) can not compensate for insta-

bilities artificially created due to a bad choice of the subspace.

Paradoxically - preserving orthogonality of the computed basis

can even make things worse!

In the rest of the text we concentrate on classical GMRES.

254

Numerical stability of the GMRES implementation based on the

(ideally) orthonormal basis of Kn(A, r0) – many related publi-

cations, e.g.

[Bjorck - 67], [Bjorck, Paige - 92], [Karlson - 91], [Arioli, Fassino

- 96], [Drkosova, Greenbaum, Rozloznık, S - 95], [Greenbaum,

Rozloznık, S - 96], [Rozloznık 1997], [Paige, S - 02, 02, 02],

[Giraud, Langou, Rozloznık, van der Eshof - 05], [Rozloznık,

Paige, S - in progress]

255

Arnoldi process ≡ recursive columnwise QR decomposition

256

5.2 Householder GMRES

Implementation using Householder reflections was suggested in

[Walker 1988, 89].

A tedious but straightforward proof in [DGRS -95]:

Householder - reflections based implementation of GMRES com-

putes a backward stable approximate solution.

257

5.3 Modified Gram-Schmidt GMRES

vn+1 = (I − vnv∗n − . . . − v1v∗1) Avn

= (I − vnv∗n) . . . (I − v1v∗1) Avn

A common general belief: MGS orthogonalization is a goodcompromise between propagation of errors (loss of orthogonal-ity) and algorithm efficiency (computational cost). Price - thecomputation is recursive!

Comparison with classical Gram-Schmidt, Householder reflec-tions, Givens rotations.

258

However:

Despite the loss of orthogonality, some algorithms with MGS

provide results as good as algorithms using the most stable or-

thogonalization processes (with the loss of orthogonality among

the basis vectors kept close to the machine precision level).

Theoretical justification?

• Linear least squares: [Bjorck, Paige - 92];

• Our case: MGS GMRES.

259

In MGS GMRES, loss of orthogonality is controlled by conver-

gence.

Statement:

Loss of orthogonality ‖I − V ∗n+1Vn+1‖F in the modified

Gram–Schmidt Arnoldi process

is inversely proportional

to the value of the GMRES backward error

‖b − Axn‖‖b‖ + ‖A‖‖xn‖

.

260

FS1836, b = ones, x0 = 0

0 5 10 15 20 25 30 35 40 45 50

10−15

10−10

10−5

100

iteration number

See the difference between the relative residual and backward

error (it is interesting to see the same for x0 6= 0).

261

FS1836, b = ones, x0 = randn

0 5 10 15 20 25 30 35 40 45 50

10−15

10−10

10−5

100

iteration number

residual smooth ubound backward error loss of orthogonalityapproximate solution error

262

link the spectrum with convergence ?

10−2

100

102

104

106

108

1010

−0.04

−0.03

−0.02

−0.01

0

0.01

0.02

0.03

0.04Spectrum of the matrix FS1836

263

Sherman 2, b MM, x0 = 0

0 100 200 300 400 500 600 700 800 900

10−15

10−10

10−5

100

iteration number

residualsmooth uboundbackward errorloss of orthogonalityapproximate solutionerror

264

Proof? A long and yet unfinished story, links with Scaled Total

Least Squares and other strange problems.

[Paige, Rozloznık, S - 07]

MGS GMRES is mbackward stable !

265

Sherman 2, b MM, x0 = 0

0 100 200 300 400 500 600 700 800 90010

−2

10−1

100

101

102

103

104

iteration number

beta * loss of orthogonality / epsilonbeta * kappa − nearly opt. scalingbeta * kappa − scaling for elegancebeta * kappa − no scaling

266

5.4 What should follow

Loss of orthogonality in MGS Arnoldi

‖I − V Tk+1Vk+1‖F ≈ κ([r0γ, AVkDk])O(ε) .

Loss of orthogonality in CGS Arnoldi (could be based on recent

result [GLRvdE - 05])

‖I − V Tk+1Vk+1‖F ≈ κ([r0γ, AVkDk])

2O(ε) .

Numerical stability of restarted CGS GMRES.

267

Closure

268

Repeatedly appearing motives:

• Nonlinear problems in numerical linear algebra;

• Backward stability;

• Intermediate quantities and accuracy of final results;

• Theoretical results link computational cost and numerical

stability;

269

• Analysis of iterative methods: Convergence bounds, round-

ing error estimates are tools, not the goals;

• CG, Lanczos – Gauss Q of R-S integral; Scaled TLS fun-

damentals - GMRES: Unexpected, revealing links, analytic

and computational (finite precision arithmetic)parts deeply

connected.

• Lanczos, MGS GMRES: Loss of orthogonality means con-

vergence.

Possible work (in progress):

270

• RS integral model of FP CG and Lanczos;

• Instability of Krylov subspaces;

• Core problem in FP arithmetic, its use for regularization;

• Regularization property of Ksp methods;

• Using GMRES parametrization in [Arioli, Ptak, S - 98] for

mapping the sets of problems;

• Restarted CGS GMRES ...

271

The goal is understanding, and putting it into service of practical

computations.

From practical point of view, we need

• Analytical theory of acceleration (preconditioning),

• Reliable and efficient stopping criteria,

• Theoretical justification of numerical stability.

272

Selected references

The presented papers can be downloaded from

http://www.cs.cas.cz/strakos, or obtained upon request from

the author. They represent a complementary material for the

course, and most of them contain an extensive list of references,

which can be used for further reading.

Z.S. and J. Liesen, On numerical stability in large scale numerical

computations, Z. Angew. Math. Mech. 85, pp. 307–325, 2005.

Relevant to parts II, VII, VIII.

P. Jiranek, Z.S. and M. Vohralık, A posteriori error estimates

including algebraic error: computable upper bounds and stopping

criteria for iterative solvers, submitted to SIAM J. Sci. Comput.,

2008. Relevant to part II.

C. C. Paige and Z. S., Core problems in linear algebraic systems,

SIAM J. Matrix Anal. Appl. 27, pp. 861–875, 2007. Relevant to

part II.

C. C. Paige and and Z. S., Scaled total least squares funda-

mentals, Numer. Math. 91, pp. 117–146, 2002. Relevant to part

II.

I. Hnetynkova, and Z. S., Lanczos tridiagonalization and core

problems, Linear Algebra Appl. 421, pp. 243–251, 2007. Relevant

to parts II, V.

Z. S. and Petr Tichy, On efficient numerical approximation of the

scattering amplitude of c∗A−1b via matching moments, submitted

to SIAM J. Sci. Comput., 2009. Relevant to parts III, IV, V.

Z. S., Model reduction using the Vorobyev moment problem,

Numerical Alg., published electronically in September 2008. Rel-

evant to parts III, IV, V.

J. Liesen and Z. S., On optimal short recurrences for generating

orthogonal Krylov subspace bases, SIAM Review, 50, pp. 485-

503, 2008. Relevant to part III.

D. P. O’Leary, Z. S. and Petr Tichy, On sensitivity of Gauss-

Christoffel quadrature, Num. Math. 107, pp. 147–174, 2007.

Relevant to parts IV, V.

G. Meurant and Z. S., The Lanczos and conjugate gradient algo-

rithms in finite precision arithmetic, Acta Numerica 15, pp. 471–

542, 2006. Relevant to parts IV, V, VII, VIII.

Z. S. and P. Tichy, On Error Estimation in the Conjugate Gradi-

ent Method and Why It Works In Finite Precision Computations,

Electronic Trans. Numer. Anal. (ETNA) 13, pp. 56-80, published

online, 2002. Relevant to parts IV, V, VII.

Z. S. and P. Tichy, Error Estimation in Preconditioned Conju-

gate Gradients, BIT Numerical Mathematics, 45, pp. 789-817,

2005. Relevant to parts V, VII.

M. Arioli, V. Ptak and Z. S., Krylov Sequences of Maximal

Length and Convergence of GMRES, BIT 38, pp. 636–643.

1998. Relevant to part VI.

A. Greenbaum, V. Ptak, and Z. S., Any Convergence Curve is

Possible for GMRES, SIAM Matrix Anal. Appl. 17, pp. 465-470,


J. Liesen and Z. S., GMRES Convergence Analysis for a Convection-

Diffusion Model Problem, SIAM J. on Sci. Comput., 26, pp.

1989-2009, 2005. Relevant to part VI.

J. Liesen and Z. S., Convergence of GMRES for Tridiagonal

Toeplitz Matrices, SIAM J. Matrix Anal. Appl., 26, pp. 233-251,


M. H. Gutknecht and Z. S., Accuracy of Two Three-Term and

Three Two-Term Recurrences for Krylov Space Solvers, SIAM

J. Matrix Anal. Appl. 22, pp. 213–229, 2001. Relevant to parts

VII, VIII.

J. Liesen, M. Rozloznık and Z. S., Least Squares Residuals and

Minimal Residual Methods, SIAM J. Sci. Comput. 23, pp. 1503-

1525, 2002. Relevant to parts VII, VIII.

C.C. Paige, M. Rozloznik and Z. S., Modified Gram-Schmidt

(MGS), Least Squares, and Backward Stability of MGS-GMRES,

SIAM J. Matrix Anal. Appl. 28, pp. 264-284, 2006. Relevant to

parts VII, VIII.

modern methods of numerical linear algebra – principles ...strakos/download/2008_kamenice.… ·...

Documents