TRANSCRIPT
Linear Algebra
Article
Winter School 2017
Contents

1 Why Worry?
2 Beyond Square Matrices
3 Alternatives to Direct Methods
4 Building on Simple Methods
5 Review Material
1 Why Worry?
References
• Numerical Linear Algebra by Lloyd N. Trefethen and David Bau, III. SIAM 1997.
• Applied Numerical Linear Algebra by James W. Demmel. SIAM 1997.
Arithmetic Disasters
The following reference lists some real-life consequences of round-off errors.
• http://ta.twi.tudelft.nl/nw/users/vuik/wi211/disasters.html
– Patriot Missile Failure.
– Explosion of the Ariane 5.
– EURO page: Conversion Arithmetics.
– The Vancouver Stock Exchange.
– Rounding error changes Parliament makeup.
– The sinking of the Sleipner A offshore platform.
– Tacoma bridge failure (wrong design).
– Collection of Software Bugs.
Poor Programming
These examples are due more to poor programming practices than to round-off errors, but they do highlight the need to take care over numerical considerations.
• http://www5.in.tum.de/~huckle/bugse.html
– Hammer throwing, London Olympics (software would not accept an athlete's result because it was exactly the same as the previous athlete's result, 2012).
– Mars Climate Orbiter, loss (mixture of lb and kg, 1999).
– Green Party convention fails (by a rounding error and erroneous use of Excel the wrong number of delegates is computed, 2002).
– London Millennium Bridge, wobbling (compare Tacoma Bridge; simulation fails because of wrong estimates for pedestrian forces, 2000).
– Vancouver Stock Exchange Index (rounding error, 1983).
– Shut down of nuclear reactors (use of wrong norm in CAD system, 1979).
– Ozone hole ignored until 1985 (software had to set aside data points that deviated greatly from expected measurements).
Sample Variance Example
Definition 1 (Two pass sample variance calculation).

    x̄ = (1/n) Σ_{i=1}^n x_i,

    s_n^2 = (1/(n − 1)) Σ_{i=1}^n (x_i − x̄)^2.
Definition 2 (One pass sample variance calculation).

    s_n^2 = (1/(n − 1)) [ Σ_{i=1}^n x_i^2 − (1/n) (Σ_{i=1}^n x_i)^2 ].
The above two pass and one pass definitions for calculating the sample variance are mathematically equivalent. The one pass definition may seem more effective because it only accesses the data once. However, the one pass definition gives bad values in the presence of round-off errors because it contains the subtraction of two large positive numbers. Note that the relative error for the subtraction of two numbers a and b is large if |a − b| ≪ |a| + |b|; see the discussion on cancellation errors below. Example 1 shows that due to cancellation it is possible to get negative answers with the one pass method, which is clearly incorrect. The two pass formulation gives accurate results unless n is large. Some calculators have used the one pass formula.
Example 1 (One pass method). I wrote a C code to calculate the sample variance of the three numbers 784318, 784319, 784320. When using single floating point precision the variance calculated from the one pass method was −65536. The sample variance using the two pass method was 1 (the correct answer). When I tried double precision both methods gave an answer of 1.
Some Sources of Error
Definition 3 (Truncation Error). Truncation (or discretisation) error is the difference between the true result and the result that would be given if exact arithmetic were used, e.g. truncating an infinite series.
Definition 4 (Rounding Error). The rounding error is the difference between the results obtained by a particular algorithm using exact arithmetic and the results obtained by the same algorithm using finite precision.
Example 2 (Simpson's Rule). Consider, for example, Simpson's Rule. We know that it is an O(h^5) method, and saying that it is O(h^5) gives information about the truncation error. But when we try to solve such problems on the computer, extra sources of error due to floating point arithmetic are introduced.
Floating Point Arithmetic
Due to rounding error, arithmetic operations on computers are not (always) exact.
Definition 5 (Floating Point Arithmetic). We shall denote an evaluation of an expression in floating point arithmetic by fl. If ∘ represents one of the basic arithmetic operations +, −, ×, /, then

    fl(x ∘ y) = (x ∘ y)(1 + δ), |δ| ≤ u,

where u is the 'unit roundoff'. The round-off error is then |x ∘ y − fl(x ∘ y)|.
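As a quick illustration (a sketch using Python's `sys.float_info`; for IEEE double precision u = 2^{−53} ≈ 1.1 × 10^{−16}):

```python
import sys

eps = sys.float_info.epsilon  # gap between 1.0 and the next double = 2u = 2**-52
u = eps / 2                   # unit roundoff u = 2**-53

# A perturbation of size u is rounded away at 1.0 (round-to-nearest-even),
# while a perturbation of size 2u is not.
print(1.0 + u == 1.0)    # True
print(1.0 + eps == 1.0)  # False
```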
Example 3 (Inner product). Consider the inner product

    s_n = x^T y = x_1y_1 + · · · + x_ny_n.    (1)

Let's assume that we are summing from left to right. Define the partial sum s_i by s_i = x_1y_1 + x_2y_2 + · · · + x_iy_i. Now,

    ŝ_1 := fl(x_1y_1) = x_1y_1(1 + δ_1).

    ŝ_2 := fl(ŝ_1 + fl(x_2y_2))
         = fl(ŝ_1 + x_2y_2(1 + δ_2))
         = (ŝ_1 + x_2y_2(1 + δ_2))(1 + δ_3)
         = (x_1y_1(1 + δ_1) + x_2y_2(1 + δ_2))(1 + δ_3)
         = x_1y_1(1 + δ_1)(1 + δ_3) + x_2y_2(1 + δ_2)(1 + δ_3),   where |δ_i| < u.

Drop the subscripts and let 1 + δ_i ≡ 1 ± δ. Then,

    ŝ_3 := fl(ŝ_2 + x_3y_3)
         = (ŝ_2 + x_3y_3(1 ± δ))(1 ± δ)
         = (x_1y_1(1 ± δ)^2 + x_2y_2(1 ± δ)^2 + x_3y_3(1 ± δ))(1 ± δ)
         = x_1y_1(1 ± δ)^3 + x_2y_2(1 ± δ)^3 + x_3y_3(1 ± δ)^2

and in general,

    ŝ_n = x_1y_1(1 ± δ)^n + x_2y_2(1 ± δ)^n + x_3y_3(1 ± δ)^{n−1} + · · · + x_ny_n(1 ± δ)^2.

Finally, by using the lemma given below we get

    ŝ_n = x_1y_1(1 + θ_n) + x_2y_2(1 + θ′_n) + x_3y_3(1 + θ_{n−1}) + · · · + x_ny_n(1 + θ_2),

where |θ_n| ≤ nu/(1 − nu). In other words ŝ_n = x^T ŷ where

    ŷ = (y_1(1 + θ_n), y_2(1 + θ′_n), y_3(1 + θ_{n−1}), · · · , y_n(1 + θ_2))^T.
Lemma 1.1. If |δ_i| ≤ u and p_i = ±1 for i = 1, · · · , n and nu < 1, then

    Π_{i=1}^n (1 + δ_i)^{p_i} = 1 + θ_n,

where

    |θ_n| ≤ nu/(1 − nu) =: γ_n.
Forward and Backward Errors
Forward Errors: Relative and Absolute Errors.
Backward Errors: What is x̂ such that f(x̂) = f̂(x)?
[Diagram: the exact map sends x to y = f(x); the computed map sends x to ŷ = f(x + δx) for some perturbed input x + δx. The backward error is the distance from x to x + δx; the forward error is the distance from y to ŷ.]
For example,

    ŝ_n = x_1y_1(1 + θ_n) + x_2y_2(1 + θ′_n) + x_3y_3(1 + θ_{n−1}) + · · · + x_ny_n(1 + θ_2)

is a backward error result: ŝ_n is the exact result for the perturbed set of data points

    x_1, x_2, · · · , x_n, y_1(1 + θ_n), y_2(1 + θ′_n), · · · , y_n(1 + θ_2).
The forward error is

    |x^T y − fl(x^T y)| ≤ γ_n Σ_{i=1}^n |x_iy_i|.
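This bound can be checked empirically. The sketch below (an assumption: float32 plays the role of finite precision, while float64 stands in for exact arithmetic) confirms that the computed inner product stays within the bound:

```python
import numpy as np

# Forward error bound: |x^T y - fl(x^T y)| <= gamma_n * sum |x_i y_i|.
rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n).astype(np.float32)
y = rng.standard_normal(n).astype(np.float32)

computed = float(np.dot(x, y))                 # "finite precision" result
x64, y64 = x.astype(np.float64), y.astype(np.float64)
exact = float(np.dot(x64, y64))                # near-exact reference

u = float(np.finfo(np.float32).eps) / 2        # unit roundoff for float32
gamma_n = n * u / (1 - n * u)
bound = gamma_n * float(np.sum(np.abs(x64 * y64)))

assert abs(computed - exact) <= bound
```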
Example 4 (Exponential).

    f(x) = e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + · · ·

    f̂(x) = 1 + x + x^2/2! + x^3/3!.

If x = 1, then to seven decimal places,

    f(x) = 2.718282 and f̂(x) = 2.666667.

Furthermore, x̂ = log(2.666667) = 0.980829.

So, the forward error is |f(x) − f̂(x)| = 0.051615 and the backward error is |x − x̂| = 0.019171.
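The numbers in Example 4 can be checked directly (a sketch; `fhat` is the series truncated after the x^3/3! term):

```python
import math

# Verify Example 4: truncate the exponential series after the x^3/3! term, x = 1.
f = math.exp(1.0)                 # f(1)  = 2.718282 to 7 s.f.
fhat = 1.0 + 1.0 + 1.0/2 + 1.0/6  # truncated series = 2.666667 to 7 s.f.
xhat = math.log(fhat)             # the input that the exact f maps onto fhat

forward = abs(f - fhat)           # forward error  ~ 0.051615
backward = abs(1.0 - xhat)        # backward error ~ 0.019171
```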
Conditioning
How sensitive is the solution to perturbations in the data?

• insensitive ←→ well-conditioned.
• sensitive ←→ ill-conditioned.
Conditioning relates forward and backward errors. If we include a perturbation δx in the data, the relative error is

    |f(x + δx) − f(x)|/|f(x)| = (1/|f(x)|) · (|f(x + δx) − f(x)|/|δx|) · |δx|
                              ≈ (1/|f(x)|) · |f′(x)| · |δx|
                              = (|f′(x)| |x|/|f(x)|) · (|δx|/|x|)
                              = Condition Number × |δx|/|x|.
Definition 6 (Condition Number).

    Condition Number = |relative change in solution| / |relative change in input data|
                     = |(f(x̂) − f(x))/f(x)| / |(x̂ − x)/x|
                     ≈ |(x̂ − x)f′(x)/f(x)| / |(x̂ − x)/x|
                     = |x f′(x)/f(x)|.

Example 5 (Logarithm). For example, consider f(x) = ln(x). Then
    c = Condition Number = (|1/x| · |x|)/|ln(x)| = 1/|ln(x)|,

which is large if x ≈ 1.
Suppose x = 2 and set x̂ = 2.02, so that the relative input error is 0.01. Then |f(x) − f(x̂)|/|f(x)| is 0.0143553.
If x = 1.5 and x̂ = 1.515 (relative input error still 0.01), then |f(x) − f(x̂)|/|f(x)| is 0.0245405.
If we set x = 1.01, moving closer to 1, and take x̂ = 1.0201, then |f(x) − f(x̂)|/|f(x)| is 1.00000.
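These figures are easy to reproduce (a sketch; `rel_err` is a helper name introduced here):

```python
import math

# Relative forward error of f(x) = ln(x) under a perturbed input xhat.
def rel_err(x, xhat):
    return abs(math.log(xhat) - math.log(x)) / abs(math.log(x))

# A fixed 1% relative input error is amplified more and more as x approaches 1,
# where the condition number 1/|ln x| blows up.
print(rel_err(2.0, 2.02))      # ~0.0143553
print(rel_err(1.5, 1.515))     # ~0.0245405
print(rel_err(1.01, 1.0201))   # ~1.00000
```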
Horner's Method
Let

    P_{n−1}(t) = x_0 + x_1t + · · · + x_{n−1}t^{n−1}
Figure 1: Plot of p(t) = (t − 2)^9 evaluated at 8000 equidistant points
and set a_{n−1} = x_{n−1}. If a_k = x_k + a_{k+1}t_0 for k = n − 2, n − 3, · · · , 0, then

    a_0 = P_{n−1}(t_0).

Moreover, if

    Q_{n−2}(t) = a_{n−1}t^{n−2} + a_{n−2}t^{n−3} + · · · + a_2t + a_1

then

    P_{n−1}(t) = (t − t_0)Q_{n−2}(t) + a_0.
Algorithm 1. Horner's Method
1: p = x_n
2: for i = n − 1 down to 0 do
3:   p = t ∗ p + x_i
4: end for
Let's apply the algorithm to p(t) = (t − 2)^9 = t^9 − 18t^8 + 144t^7 − 672t^6 + 2016t^5 − 4032t^4 + 5376t^3 − 4608t^2 + 2304t − 512.
Figure 1 shows a plot of p(t) = (t − 2)^9. Figure 2 shows a plot of p(t) using Horner's method. Recall
    Condition Number = |t f′(t)/f(t)| = |t × 9(t − 2)^8/(t − 2)^9| = |9t/(t − 2)|,

so we would expect this to be ill-conditioned around t = 2. That is what we have seen in this case.
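The oscillation in Figure 2 can be reproduced in a few lines (a sketch: Horner's rule on the expanded coefficients, compared with the directly computed (t − 2)^9):

```python
import numpy as np

# Evaluate p(t) = (t - 2)^9 from its expanded coefficients using Horner's rule.
coeffs = [1, -18, 144, -672, 2016, -4032, 5376, -4608, 2304, -512]  # t^9 ... t^0

def horner(t, c):
    p = 0.0
    for a in c:
        p = t * p + a
    return p

ts = np.linspace(1.92, 2.08, 8000)
horner_vals = np.array([horner(t, coeffs) for t in ts])
direct_vals = (ts - 2.0) ** 9          # at most ~1.3e-10 on this interval

# Near the ill-conditioned point t = 2, Horner's rule on the expanded form
# returns round-off noise comparable in size to the function values themselves.
noise = np.max(np.abs(horner_vals - direct_vals))
```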
Figure 2: Plot of p(t) evaluated at 8000 equidistant points using Horner's method
Note that the condition number depends on the problem, not on the method that is used.
Let's rewrite Horner's method as
Algorithm 2. Horner's Method - Rewrite
1: p_n = x_n
2: for i = n − 1 down to 0 do
3:   p_i = t ∗ p_{i+1} + x_i
4: end for
Now use floating point arithmetic to give
Algorithm 3. Horner's Method - fp
1: p_n = x_n
2: for i = n − 1 down to 0 do
3:   p_i = ((t ∗ p_{i+1})(1 + δ_i) + x_i)(1 + δ′_i)
4: end for
where |δ_i|, |δ′_i| < u. Expanding that out we get
    p̂(t) = Σ_{i=0}^{n−1} [ (1 + δ′_i) Π_{j=0}^{i−1} (1 + δ_j)(1 + δ′_j) ] x_i t^i + [ Π_{j=0}^{n−1} (1 + δ_j)(1 + δ′_j) ] x_n t^n.
This can be simplified to give

    p̂(t) = Σ_{i=0}^{n} (1 + 2θ_i) x_i t^i = Σ_{i=0}^{n} x̂_i t^i,

where |θ_i| ≤ iu/(1 − iu) ≤ nu/(1 − nu) = γ_n. So the computed solution p̂(t) is the exact solution of a slightly perturbed polynomial with coefficients x̂_i. This is a backward stable method and the relative backward error is 2γ_n.
Let us verify the above equation.
Polynomial of degree 0:

    p_0 = x_0.
Polynomial of degree 1: p_1 = x_1.

    p_0 = fl(t × p_1 + x_0)
        = ((t × p_1)(1 + δ_0) + x_0)(1 + δ′_0)
        = (t × x_1)(1 + δ_0)(1 + δ′_0) + x_0(1 + δ′_0).

Polynomial of degree 2: p_2 = x_2.

    p_1 = fl(t × p_2 + x_1)
        = ((t × p_2)(1 + δ_1) + x_1)(1 + δ′_1)
        = (t × x_2)(1 + δ_1)(1 + δ′_1) + x_1(1 + δ′_1).

    p_0 = fl(t × p_1 + x_0)
        = ((t × p_1)(1 + δ_0) + x_0)(1 + δ′_0)
        = (t × [(t × x_2)(1 + δ_1)(1 + δ′_1) + x_1(1 + δ′_1)](1 + δ_0))(1 + δ′_0) + x_0(1 + δ′_0)
        = (t^2 × x_2)(1 + δ_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0) + (t × x_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0) + x_0(1 + δ′_0).

Polynomial of degree 3: p_3 = x_3.

    p_2 = fl(t × p_3 + x_2)
        = (t × x_3)(1 + δ_2)(1 + δ′_2) + x_2(1 + δ′_2).

    p_1 = fl(t × p_2 + x_1)
        = (t^2 × x_3)(1 + δ_2)(1 + δ′_2)(1 + δ_1)(1 + δ′_1) + (t × x_2)(1 + δ′_2)(1 + δ_1)(1 + δ′_1) + x_1(1 + δ′_1).

    p_0 = fl(t × p_1 + x_0)
        = ((t × p_1)(1 + δ_0) + x_0)(1 + δ′_0)
        = (t^3 × x_3)(1 + δ_2)(1 + δ′_2)(1 + δ_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0)
          + (t^2 × x_2)(1 + δ′_2)(1 + δ_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0)
          + (t × x_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0) + x_0(1 + δ′_0)
        = Σ_{i=0}^{2} [ (1 + δ′_i) Π_{j=0}^{i−1} (1 + δ_j)(1 + δ′_j) ] x_i t^i + [ Π_{j=0}^{2} (1 + δ_j)(1 + δ′_j) ] x_3 t^3.

Cancellation
Example 6 (Cancellation error). Consider the function

    f(x) = (1 − cos x)/x^2.
The plot in Figure 3 suggests that f(x) < 0.5.
If x = 1.2 × 10^{−5}, then c = cos(x) to 10 significant figures is 0.9999999999 and 1 − c = 0.0000000001. Hence

    (1 − c)/x^2 = 10^{−10}/(1.44 × 10^{−10}) > 0.6944.
But f(x) < 0.5 for all x ≠ 0. We can get around this problem by noting that

    cos x = 1 − 2 sin^2(x/2)

and rewriting f(x) as

    f(x) = (1/2) (sin(x/2)/(x/2))^2.
Evaluating this formula using 10 significant digits gives f(x) = 0.5.
What went wrong in the above example? Consider x̂ = â − b̂ where â = a(1 + δa), b̂ = b(1 + δb), which is an approximation to x = a − b. Then

    |x − x̂|/|x| = |a − b − (a(1 + δa) − b(1 + δb))|/|a − b|
                = |−aδa + bδb|/|a − b|
                ≤ max{|δa|, |δb|} · (|a| + |b|)/|a − b|.
Figure 3: Plot of f(x) = (1 − cos x)/x^2.
So if (|a| + |b|)/|a − b| is large, the relative error in the subtraction of a and b will be large. ⇒ heavy cancellation magnifies uncertainties already present in â and b̂.
The cause of the error in Example 6 is that cos(x) ≈ 1 when x ≈ 0.
Cancellation is not always bad. Perhaps the values of the initial numbers are known exactly (i.e. δa = δb = 0), or the cancellation may not affect the remaining calculation. For example, if x ≫ y ≈ z > 0 then the cancellation in x + (y − z) is harmless.
Quadratic Equation Example
Example 7 (Cancellation in the quadratic equation). Consider the equation ax^2 + bx + c = 0. The two roots are
    x = (−b ± √(b^2 − 4ac))/(2a) if a ≠ 0.
If b^2 ≫ |4ac| then √(b^2 − 4ac) ≈ |b|, so for one choice of the sign the above equation suffers from bad cancellation errors.
To avoid the problem, obtain the larger root, say x1, from
    x_1 = −(b + sign(b)√(b^2 − 4ac))/(2a) if a ≠ 0,
and x2 by
    x_1x_2 = c/a.
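A sketch of this remedy (the function name `stable_roots` and the test coefficients are chosen here for illustration):

```python
import math

def stable_roots(a, b, c):
    # Compute the root of larger magnitude without cancellation, then recover
    # the other root from the product x1 * x2 = c / a.
    disc = math.sqrt(b * b - 4.0 * a * c)
    q = -(b + math.copysign(disc, b)) / 2.0
    x1 = q / a        # larger-magnitude root
    x2 = c / q        # from x1 * x2 = c / a
    return x1, x2

# With b^2 >> |4ac| the naive formula loses most digits in the small root:
a, b, c = 1.0, 1e8, 1.0
naive_small = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
x1, x2 = stable_roots(a, b, c)   # roots are approximately -1e8 and -1e-8
```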
Stability
Recall that the relative error is defined to be

    ‖f̂(x) − f(x)‖/‖f(x)‖.
Definition 7 (Stable). We say that an algorithm f̂ is stable if for each x ∈ X

    ‖f̂(x) − f(x̂)‖/‖f(x̂)‖ = O(u)

for some x̂ with

    ‖x̂ − x‖/‖x‖ = O(u).

Here u is the unit roundoff.
Backward and Forward Stability
Definition 8 (Backward Stable). We say that an algorithm f̂ is backward stable if for each x ∈ X

    f̂(x) = f(x̂)

for some x̂ with

    ‖x̂ − x‖/‖x‖ = O(u).
Definition 9 (Forward Stable). A method is forward stable if it gives forward errors of similar magnitude (taking the condition number into account) to those produced by a backward stable method. That is,

    forward error ≲ backward error × condition number.
Inner Product
Recall that

    fl(x∗y) = (x + ∆x)∗y = x∗(y + ∆y)

where

    ‖∆x‖ ≤ γ_n‖x‖ and ‖∆y‖ ≤ γ_n‖y‖,

with γ_n = nu/(1 − nu). Hence the inner product is backward stable.
Outer Product
The outer product A = xy∗ is stable but not backward stable. Consider Â = fl(xy∗). By definition

    â_ij = x_iy_j(1 + δ_ij).

So

    Â = xy∗ + ∆,

where ∆_ij = x_iy_jδ_ij and ‖∆‖ = O(u)‖xy∗‖. So the algorithm is stable. That is,

    ‖Â − A‖/‖A‖ = ‖xy∗ + ∆ − xy∗‖/‖xy∗‖ = ‖∆‖/‖xy∗‖ = O(u).

But it is not backward stable: Â will not in general be of rank 1, so it is not possible to find x̂ = x + ∆x and ŷ = y + ∆y such that Â = (x + ∆x)(y + ∆y)∗.
Backward Stability and Relative Error
Theorem 1. Suppose a backward stable algorithm is applied to solve a problem f : X → Y with condition number κ. Then the relative errors satisfy

    ‖f̂(x) − f(x)‖/‖f(x)‖ = O(κ(x)u).

Proof: The condition number is

    κ(x) = ‖f′(x)‖ ‖x‖/‖f(x)‖ ≈ (‖f(x + δx) − f(x)‖/‖f(x)‖) · (‖x‖/‖δx‖).

By definition of backward stability, f̂(x) = f(x̂) for some x̂ ∈ X such that

    ‖x̂ − x‖/‖x‖ = O(u).

Take x̂ = x + δx. From the definition of the condition number,

    ‖f̂(x) − f(x)‖/‖f(x)‖ = ‖f(x + δx) − f(x)‖/‖f(x)‖ ≈ (‖δx‖/‖x‖) κ(x) = O(κ(x)u).
Matrix Norms
Definition 10 (Matrix Norms). Given a vector norm, we can define the corresponding matrix norm as follows:

    ‖A‖ = max_{‖x‖≠0} ‖Ax‖/‖x‖.
These norms are subordinate to the vector norms. For the 1-norm and ∞-norm they simplify to:

• ‖A‖_1 = max_j Σ_{i=1}^n |a_ij| (maximum column sum).
• ‖A‖_∞ = max_i Σ_{j=1}^n |a_ij| (maximum row sum).
Condition Number of a matrix
Let b be fixed and consider the problem of computing x = A^{−1}b, where A is square and nonsingular.
Definition 11 (Condition Number). The condition number of this problem with respect to perturbations in A is
κ = ‖A‖∥∥A−1∥∥ = κ(A).
If ‖·‖ = ‖·‖_2, then ‖A‖ = σ_1 and ‖A^{−1}‖ = 1/σ_m, where σ_1 is the maximum singular value and σ_m is the minimum singular value. So

    κ(A) = σ_1/σ_m.
For a rectangular matrix A ∈ C^{m,n} of full rank, m ≥ n,

    κ = ‖A‖ ‖A^+‖ = κ(A).
Theorem 1.2. Now let E = Â − A = ∆A. Consider Ax = b and Âx̂ = b, where b is non-zero. Then

    ‖x − x̂‖/‖x̂‖ ≤ ‖A^{−1}E‖ ≤ κ(A) ‖E‖/‖A‖.

Proof: Consider (A + E)x̂ = b. Then

    A^{−1}(A + E)x̂ = A^{−1}b
    ⇔ A^{−1}(A + E)x̂ = x
    ⇔ (I + A^{−1}E)x̂ = x
    ⇔ x − x̂ = A^{−1}Ex̂
    ⇒ ‖x − x̂‖ ≤ ‖A^{−1}E‖ ‖x̂‖.

Hence

    ‖x − x̂‖/‖x̂‖ ≤ ‖A^{−1}E‖ ≤ ‖A^{−1}‖ ‖E‖ = κ(A) ‖E‖/‖A‖.
As an aside, note that if

    ‖y − x‖/‖x‖ ≤ ρ < 1

then

    ‖x − y‖/‖y‖ ≤ ρ/(1 − ρ),

because

    ρ‖x‖ ≥ ‖y − x‖ ≥ ‖x‖ − ‖y‖,

so

    (1 − ρ)‖x‖ ≤ ‖y‖

and

    ‖x − y‖/‖y‖ ≤ ‖x − y‖/((1 − ρ)‖x‖) ≤ ρ/(1 − ρ).
Therefore

    ‖x − x̂‖/‖x̂‖ ≤ κ(A) ‖E‖/‖A‖

gives

    ‖x − x̂‖/‖x‖ ≤ (κ(A)‖E‖/‖A‖) / (1 − κ(A)‖E‖/‖A‖) ≲ κ(A) ‖E‖/‖A‖,

assuming κ(A)‖E‖/‖A‖ is small.
Suppose ‖E‖ ≤ u‖A‖; then

    ‖x − x̂‖/‖x‖ ≲ κ(A)u.
Table 1: Example data points

    i   t_i   y_i        i   t_i   y_i
    1   2     2          3   6     28
    2   4     11         4   8     40
2 Beyond Square Matrices
References
• Numerical Linear Algebra by Lloyd N. Trefethen and David Bau, III. SIAM 1997.
• Applied Numerical Linear Algebra by James W. Demmel. SIAM 1997.
Data Fitting
Example 8 (Data fitting). Suppose we want to find a line at + b that best fits the data given in Table 1.
In linear least squares we minimise

    Σ_{i=1}^m [y_i − (at_i + b)]^2.

To find the minimum use the derivatives

    0 = ∂/∂a ( Σ_{i=1}^m [y_i − (at_i + b)]^2 )

and

    0 = ∂/∂b ( Σ_{i=1}^m [y_i − (at_i + b)]^2 ).
Evaluating the derivatives and solving the resulting system of equations gives the solution.
    0 = Σ_{i=1}^m 2[y_i − (at_i + b)](−t_i)  ⇒  Σ_{i=1}^m y_it_i = (Σ_{i=1}^m t_i^2) a + (Σ_{i=1}^m t_i) b.

    0 = Σ_{i=1}^m 2[y_i − (at_i + b)](−1)  ⇒  Σ_{i=1}^m y_i = (Σ_{i=1}^m t_i) a + (Σ_{i=1}^m 1) b.

As a system of equations (sums over i = 1, · · · , m),

    [ n     Σt_i   ] [ b ]   [ Σy_i    ]
    [ Σt_i  Σt_i^2 ] [ a ] = [ Σy_it_i ].
Normal Equations
In matrix form, linear least squares is equivalent to minimising
||r||22 = r∗r, where r = b−Ax.
Now,

    ‖r‖_2^2 = (b − Ax)∗(b − Ax)
            = (b∗ − x∗A∗)(b − Ax)
            = b∗b − 2x∗A∗b + x∗A∗Ax

and

    d(‖r‖_2^2)/dx = 2A∗Ax − 2A∗b.

To minimise ‖r‖_2^2 we want to solve the normal equations

    A∗Ax = A∗b.
Note that A∗A ∈ C^{n,n}, so its size is usually small. If rank(A) = n then A∗A is nonsingular, so a solution exists. A∗A is symmetric and positive definite (SPD), so Cholesky factorisation can be used to find
A∗A = LL∗.
However, there is a problem with the condition number; cond(A∗A) = [cond(A)]2.
Example 9 (Data fitting continued). Consider the example on the first slide. Then

    A = [ 1  2
          1  4
          1  6
          1  8 ],   x = [ b
                          a ],   b = [ 2
                                       11
                                       28
                                       40 ].

    A∗A = [ 4   20
            20  120 ] = [ n     Σt_i
                          Σt_i  Σt_i^2 ],   A∗b = [ 81
                                                    536 ] = [ Σy_i
                                                              Σy_it_i ].

Hence

    x = [ b     [ −12.5
          a ] =   6.55  ],

i.e. the best fitting line is y = 6.55t − 12.5.
QR factorisation
We are taught in school to solve equations like Ax = b by using Gaussian elimination. Gaussian elimination is also known as the LU factorisation because it decomposes A into a lower and an upper triangular matrix, A = LU (see the revision chapter).
In the QR factorisation A is decomposed into a matrix Q with orthonormal columns and an upper triangular matrix R.
Consider

    Ax = b ⇐⇒ QRx = b ⇐⇒ Rx = Q∗b.

Now, y = Q∗b is easy to calculate and Rx = y is easy to solve since R is an upper triangular matrix.
Full QR factorisation
Definition 1 (Full QR factorisation). In the full QR factorisation we write A ∈ Cm,n as
A = QR
where Q is an m × m unitary matrix and R is m × n upper triangular (the last m − n rows are zero, if m ≥ n).
Reduced QR Factorisation
Let q_1, q_2, · · · , q_j be a set of orthonormal vectors. Suppose we have a matrix A with columns a_1, a_2, · · · , a_n, and suppose that A has full rank. Can we find q_1, q_2, · · · , q_j such that

    ⟨q_1, q_2, · · · , q_j⟩ = ⟨a_1, a_2, · · · , a_j⟩ for j = 1, · · · , n?
In particular we would like to write the columns of A as

    [ a_1  a_2  · · ·  a_n ] = [ q_1  q_2  · · ·  q_n ] [ r_11  r_12  · · ·  r_1n
                                                                r_22  · · ·  r_2n
                                                                       ⋱     ⋮
                                                                             r_nn ]

where r_kk ≠ 0.
Note that this is equivalent to saying
a1 = r11q1,
a2 = r12q1 + r22q2,
a3 = r13q1 + r23q2 + r33q3
So A = QR where the columns of Q are orthonormal and R is a square upper triangular matrix. This is the reduced QR factorisation of A.
Orthonormal Vectors
Using the ideas given in the Review chapter, we know that given, say, three linearly independent vectors a_1, a_2 and a_3 we can produce three orthonormal vectors q_1, q_2 and q_3 as follows:

1. q_1 = a_1/‖a_1‖_2.
2. v_2 = a_2 − (a_1∗a_2)/(a_1∗a_1) a_1 = a_2 − (q_1∗a_2)q_1.
3. q_2 = v_2/‖v_2‖_2.
4. v_3 = a_3 − (q_1∗a_3)q_1 − (q_2∗a_3)q_2.
5. q_3 = v_3/‖v_3‖_2.
Gram-Schmidt Orthogonalisation
We can generalise the ideas from the previous slide to construct a set of orthonormal vectors given a set of linearly independent vectors.
Let us rewrite this as

    q_1 = a_1/r_11,
    q_2 = (a_2 − r_12q_1)/r_22,
    q_3 = (a_3 − r_13q_1 − r_23q_2)/r_33,
    ...
    q_n = (a_n − Σ_{i=1}^{n−1} r_in q_i)/r_nn.
This implies that an appropriate choice for r_ij, if i ≠ j, is r_ij = q_i∗a_j. The coefficients r_jj are chosen to ensure that q_j is normalised:
    |r_jj| = ‖v_j‖_2 = ‖ a_j − Σ_{i=1}^{j−1} r_ij q_i ‖_2.
Consequently we have the following (classical) Gram-Schmidt algorithm:

Algorithm 4. Gram-Schmidt
1: for j = 1 to n do
2:   v_j = a_j
3:   for i = 1 to j − 1 do
4:     r_ij = q_i∗a_j
5:     v_j = v_j − r_ij q_i
6:   end for
7:   r_jj = ‖v_j‖_2
8:   q_j = v_j/r_jj
9: end for
This method unfortunately turns out to be numerically unstable if the columns are nearly linearly dependent. Later we will look at a modified version of this algorithm.
Example 10.

    A = [ 1  −1.0  1.00
          1  −0.5  0.25
          1   0.0  0.00
          1   0.5  0.25
          1   1.0  1.00 ].
j = 1:

    v_1 = [1 1 1 1 1]^T,
    r_11 = √5,
    q_1 = (1/√5)[1 1 1 1 1]^T.

j = 2:

    v_2 = [−1.0 −0.5 0.0 0.5 1.0]^T.

• i = 1: r_12 = 0, so v_2 is unchanged.

    r_22 = √5/√2,
    q_2 = (√2/√5)[−1.0 −0.5 0.0 0.5 1.0]^T.

j = 3:

    v_3 = [1.0 0.25 0.0 0.25 1.0]^T.

• i = 1: r_13 = √5/2,

    v_3 = [1.0 0.25 0.0 0.25 1.0]^T − (√5/2)(1/√5)[1 1 1 1 1]^T = [0.5 −0.25 −0.5 −0.25 0.5]^T.

• i = 2: r_23 = 0, so v_3 is unchanged.

    r_33 = √7/√8,
    q_3 = (√8/√7)[0.5 −0.25 −0.5 −0.25 0.5]^T.
Therefore

    A = [ 1/√5  −√2/√5      √8/(2√7)
          1/√5  −√2/(2√5)  −√8/(4√7)
          1/√5   0.0       −√8/(2√7)
          1/√5   √2/(2√5)  −√8/(4√7)
          1/√5   √2/√5      √8/(2√7) ] [ √5  0      √5/2
                                         0   √5/√2  0
                                         0   0      √7/√8 ].
Existence
Theorem 2. Every A ∈ Cm,n (m ≥ n) has a reduced (and full) QR factorisation.
Proof: Suppose A has full rank. The Gram-Schmidt algorithm gives a proof of existence by construction. The only place where the algorithm could fail is if r_jj = ‖v_j‖_2 = 0, so that q_j = v_j/r_jj is not defined. This would in turn imply that

    a_j ∈ ⟨q_1, · · · , q_{j−1}⟩ = ⟨a_1, · · · , a_{j−1}⟩,

a contradiction to A having full rank. (Recall |r_jj| = ‖a_j − Σ_{i=1}^{j−1} r_ij q_i‖_2, so if r_jj = 0 then a_j = Σ_{i=1}^{j−1} r_ij q_i.)
Suppose A does not have full rank. Then at some stage of the algorithm we will get r_jj = 0. So we just pick q_j arbitrarily to be any normalised vector orthogonal to ⟨q_1, · · · , q_{j−1}⟩ and continue the process. Recall a_1 = r_11q_1, a_2 = r_12q_1 + r_22q_2, a_3 = r_13q_1 + r_23q_2 + r_33q_3, etc. If, for example, a_2 were a multiple of a_1 then a_2 = r_12q_1, hence r_22 = 0 and q_2 may be chosen arbitrarily.
Modified Gram-Schmidt
In the classical Gram-Schmidt algorithm a_j appears in the computations only at the jth stage. The method can be rearranged so that as soon as q_i is computed, all of the remaining vectors are orthogonalised against q_i. Another way of looking at this is that we generate R by rows rather than by columns.
Algorithm 5. Modified Gram-Schmidt
1: for i = 1 to n do
2:   v_i = a_i
3: end for
4: for i = 1 to n do
5:   r_ii = ‖v_i‖_2
6:   q_i = v_i/r_ii
7:   for j = i + 1 to n do
8:     r_ij = q_i∗v_j
9:     v_j = v_j − r_ij q_i
10:  end for
11: end for
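Both variants can be sketched in a few lines (an assumption: this NumPy version is written for this article, arranged column-at-a-time, with a `modified` flag switching the projection between the classical and modified rules):

```python
import numpy as np

def gram_schmidt(A, modified=True):
    # Reduced QR of A (m x n, full rank) by classical or modified Gram-Schmidt.
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            # Classical projects against the original column a_j; modified
            # projects against the partially orthogonalised v_j.
            R[i, j] = Q[:, i] @ (v if modified else A[:, j])
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

# The matrix from Example 10:
A = np.array([[1, -1.0, 1.00],
              [1, -0.5, 0.25],
              [1,  0.0, 0.00],
              [1,  0.5, 0.25],
              [1,  1.0, 1.00]])
Q, R = gram_schmidt(A)
```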
Householder Transformations
A Householder transformation matrix F is of the form

    F = I − 2vv∗/(v∗v).

Note that F = F∗ = F^{−1}. Given a vector a we wish to choose the vector v such that

    Fa = [α 0 · · · 0]^T = αe_1.

Then

    αe_1 = Fa = (I − 2vv∗/(v∗v)) a = a − v (2v∗a)/(v∗v),

and

    v = (a − αe_1) (v∗v)/(2v∗a).

We can ignore the (v∗v)/(2v∗a) factor since it cancels when v is substituted into F. Thus

    v = a − αe_1.

From

    ‖a‖_2^2 = ‖Fa‖_2^2 = ‖αe_1‖_2^2 = |α|^2,

we get α = ±‖a‖_2. The sign is chosen to avoid cancellation errors.
Example 11 (Householder).

    a = [2 1 2]^T.

    v = a − αe_1 = [2 1 2]^T − α[1 0 0]^T = [2 − α, 1, 2]^T.

α = ±‖a‖_2 = ±3. Since a_1 is positive we take α = −3. Therefore

    v = [5 1 2]^T.
Table 2: Example data set for Householder algorithm.

    i   x_i    y_i
    0   0.00   1.0000
    1   0.25   1.2840
    2   0.50   1.6487
    3   0.75   2.1170
    4   1.00   2.7183
Note

    Fa = a − (2v^Ta/(v^Tv)) v = [2 1 2]^T − (2 × 15/30)[5 1 2]^T = [−3 0 0]^T.
Given a matrix A we use Householder transformations to introduce zeros column by column below the diagonal:

    Q_n · · · Q_1A = [ R′
                       0  ],

where R′ is upper triangular and

    Q_i = [ I  0
            0  F_i ].

Take Q∗ = Q_n · · · Q_1; then A = QR where

    R = [ R′
          0  ].
Example 12 (Householder). Let's try to fit the quadratic a + bx + cx^2 to the data given in Table 2.
    Ax = [ 1.0  0.00  0.0000
           1.0  0.25  0.0625
           1.0  0.50  0.2500
           1.0  0.75  0.5625
           1.0  1.00  1.0000 ] [a b c]^T = [1.0000 1.2840 1.6487 2.1170 2.7183]^T.
    v_1 = [1.0 1.0 1.0 1.0 1.0]^T − (−2.23607)[1.0 0.0 0.0 0.0 0.0]^T = [3.23607 1.0 1.0 1.0 1.0]^T.

From

    Fa = a − (2v^Ta/(v^Tv)) v,

we need

    v_1^Tv_1 = 14.47214,  v_1^Ta_2 = 2.5,  v_1^Ta_3 = 1.875,  v_1^Tb = 11.00407.
    F_1[0.00 0.25 0.50 0.75 1.00]^T
        = [0.00 0.25 0.50 0.75 1.00]^T − (2 × 2.5/14.47214)[3.23607 1.0 1.0 1.0 1.0]^T
        = [−1.11803 −0.09549 0.15451 0.40451 0.65451]^T.

    F_1[0.0000 0.0625 0.2500 0.5625 1.0000]^T
        = [0.0000 0.0625 0.2500 0.5625 1.0000]^T − (2 × 1.875/14.47214)[3.23607 1.0 1.0 1.0 1.0]^T
        = [−0.83853 −0.19662 −0.00912 0.30338 0.74088]^T.

    F_1[1.0000 1.2840 1.6487 2.1170 2.7183]^T
        = [1.0000 1.2840 1.6487 2.1170 2.7183]^T − (2 × 11.00407/14.47214)[3.23607 1.0 1.0 1.0 1.0]^T
        = [−3.92117 −0.23672 0.12798 0.59628 1.19758]^T.

    F_1A = [ −2.23607  −1.11803  −0.83853
              0.00000  −0.09549  −0.19662
              0.00000   0.15451  −0.00912
              0.00000   0.40451   0.30338
              0.00000   0.65451   0.74088 ],   F_1b = [ −3.92117
                                                        −0.23672
                                                         0.12798
                                                         0.59628
                                                         1.19758 ].
    v_2 = [−0.09549 0.15451 0.40451 0.65451]^T − 0.79057[1 0 0 0]^T = [−0.88607 0.15451 0.40451 0.65451]^T.

    v_2^Tv_2 = 1.40099,  v_2^Ta_3 = 0.78044,  v_2^Tb = 1.25455.
    F_2[−0.19662 −0.00912 0.30338 0.74088]^T
        = [−0.19662 −0.00912 0.30338 0.74088]^T − (2 × 0.78044/1.40099)[−0.88607 0.15451 0.40451 0.65451]^T
        = [0.79056 −0.18126 −0.14730 0.01167]^T.

    F_2[−0.23672 0.12798 0.59628 1.19758]^T
        = [−0.23672 0.12798 0.59628 1.19758]^T − (2 × 1.25455/1.40099)[−0.88607 0.15451 0.40451 0.65451]^T
        = [1.35017 −0.14874 −0.12818 0.02539]^T.

    F_2F_1A = [ −2.23607  −1.11803  −0.83853
                 0.00000   0.79056   0.79056
                 0.00000   0.00000  −0.18126
                 0.00000   0.00000  −0.14730
                 0.00000   0.00000   0.01167 ],   F_2F_1b = [ −3.92117
                                                               1.35017
                                                              −0.14874
                                                              −0.12818
                                                               0.02539 ].
    v_3 = [−0.18126 −0.14730 0.01167]^T − 0.23386[1 0 0]^T = [−0.41512 −0.14730 0.01167]^T.

    v_3^Tv_3 = 0.19416,  v_3^Tb = 0.08092.
    F_3[−0.14874 −0.12818 0.02539]^T
        = [−0.14874 −0.12818 0.02539]^T − (2 × 0.08092/0.19416)[−0.41512 −0.14730 0.01167]^T
        = [0.19728 −0.00540 0.01566]^T.

    F_3F_2F_1A = [ −2.23607  −1.11803  −0.83853
                    0.00000   0.79056   0.79056
                    0.00000   0.00000   0.23386
                    0.00000   0.00000   0.00000
                    0.00000   0.00000   0.00000 ],   F_3F_2F_1b = [ −3.92117
                                                                     1.35017
                                                                     0.19728
                                                                    −0.00540
                                                                     0.01566 ].

So, back substitution gives

    c = 0.19728/0.23386 = 0.8436,
    b = (1.35017 − 0.8436 × 0.79056)/0.79056 = 0.8643,
    a = (−3.92117 − (−1.11803 × 0.8643) − (−0.83853 × 0.8436))/(−2.23607) = 1.0051.
QR Factorisation
Given an m × n matrix A with m ≥ n, find an m × n matrix Q̂ with orthonormal columns such that

    A = Q̂R̂,

where R̂ is an n × n upper triangular matrix. Then

    ‖b − Ax‖_2^2 = ‖b − Q̂R̂x‖_2^2 = ‖Q̂∗b − R̂x‖_2^2.
• Householder transformations.
• Givens transformations.
• Gram-Schmidt orthogonalisations.
Solving Overdetermined Systems using Normal Equations
1. Form A∗A and the vector A∗b.
2. Compute the Cholesky factorisation A∗A = LL∗.
3. Solve the lower-triangular system Lw = A∗b for w.
4. Solve the upper-triangular system L∗x = w for x.
Work for algorithm ∼ mn^2 + (1/3)n^3 flops.
Symmetric Positive Definite Matrices
If A is SPD then A = LL∗ for some L.
Algorithm 6. Cholesky factorisation
1: for j = 1 to n do
2:   for k = 1 to j − 1 do
3:     for i = j to n do
4:       a_ij = a_ij − a_ik a_jk
5:     end for
6:   end for
7:   a_jj = √(a_jj)
8:   for k = j + 1 to n do
9:     a_kj = a_kj/a_jj
10:  end for
11: end for
Solving Overdetermined Systems using QR Factorisation
1. Compute the reduced QR factorisation A = QR.
2. Compute the vector Q∗b.
3. Solve the upper-triangular system Rx = Q∗b for x.
Work for algorithm ∼ 2mn^2 − (2/3)n^3 flops.
Solving Overdetermined Systems using SVD
1. Compute the reduced SVD A = UΣV ∗.
2. Compute the vector U∗b.
3. Solve the diagonal system Σw = U∗b for w.
4. Set x = Vw.
Work for algorithm ∼ 2mn^2 + 11n^3 flops.
3 Alternatives to Direct Methods
References
Jonathan Richard Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain.
https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
Direct and Iterative methods
We shall look at the solution of the linear system of equations Ax = b.
Direct Methods solve the system exactly. Examples of direct methods include Gaussian Elimination (LU factorisation), Cholesky factorisation, etc.
Iterative Methods solve the system approximately. Examples include Krylov subspace techniques such as the Conjugate Gradient method, Arnoldi methods, GMRES, Lanczos iterations; and stationary methods like Jacobi, SOR and SSOR.
Direct vs. Iterative

Neither approach beats the other in all cases, hence research is still very active in both fields. What follows are some pros and cons of each.
• Direct methods, in general, require O(m3) work.
• Iterative methods exist which require O(m) or O(m logm) work.
• Many large systems arise from the discretisation of differential or integral equations. Exactsolution is not required. These systems have certain structures which can be exploited byiterative methods.
• Iterative methods preserve sparsity (but there are fast direct methods that reduce fill-in).
• Direct methods require no initial guess, but they take no advantage of it if a good estimatehappens to be available.
• Iterative methods are often dependent on special properties, such as the matrix being symmetric positive definite, and are subject to slow convergence for badly conditioned systems.
• Iterative methods are less readily employed in standard software packages, since the best representation of the matrix is often problem-dependent, whereas direct methods employ more standard storage schemes.
Overview
We use the Conjugate Gradient (CG) method to solve systems of the form Ax = b where A is symmetric and positive definite, i.e. A^T = A and x^TAx > 0 for all x ≠ 0.
Quadratic Form
The quadratic form of a function is

    f(x) = (1/2)x^TAx − b^Tx + c.
Figure 4: Geometrical representation of the solution of a system of equations
The gradient of f(x) is

    f′(x) = [∂f/∂x_1, ∂f/∂x_2, · · · , ∂f/∂x_n]^T = (1/2)A^Tx + (1/2)Ax − b = Ax − b,

since A is symmetric.
Hence the minimum of f(x) is the solution of Ax = b. The solution of the system of equations represented in Figure 4 is the point where the two lines cross, that is (2, −2). The minimum of the quadratic equation shown in Figure 5 is also the solution of the system of equations. The contour plot in Figure 6 shows more clearly that the minimum occurs at (2, −2).
Method of Steepest Descent
In the method of steepest descent we take steps x_1, x_2, · · · down the slope of f(x) until we reach the minimum.
The direction of the step is given by −f ′(x). But,
−f ′(xi) = b−Axi = ri.
Figure 5: A plot of the quadratic equation
Figure 6: A contour plot of the quadratic equation
Figure 7: Take a step in the direction ri
So we can take a step by evaluating
xi+1 = xi + αri.
The next question is how big a step should we take? In other words, how big should α be? Recalling that we want to minimise f(x),

    ∂f(x_{i+1})/∂α = f′(x_{i+1})^T ∂x_{i+1}/∂α = f′(x_{i+1})^T r_i.

So the minimum is reached when f′(x_{i+1}) is orthogonal to r_i. Finally, noting that f′(x_{i+1}) = −r_{i+1}, we can show that

    α = r_i^Tr_i/(r_i^TAr_i).

To see why, consider

    0 = r_{i+1}^Tr_i
      = (b − Ax_{i+1})^Tr_i
      = (b − A(x_i + αr_i))^Tr_i
      = (r_i − αAr_i)^Tr_i
      = r_i^Tr_i − αr_i^TAr_i.

So α = r_i^Tr_i/(r_i^TAr_i).
Figures 7 and 8 show that the method takes a step along the direction r_i until it reaches the minimum of the curve f(x_i + αr_i).
Hence, the method of steepest descent is
Figure 8: Take a step in the direction ri until reaching the minimum of the curve f(xi + αri)
Algorithm 7. Steepest descent
1: r_i = b − Ax_i.
2: α_i = r_i^Tr_i/(r_i^TAr_i).
3: x_{i+1} = x_i + α_ir_i.
Note that x_{i+1} = x_i + αr_i, so −Ax_{i+1} = −Ax_i − αAr_i and r_{i+1} = r_i − αAr_i. We need to calculate Ar_i in Step 2 anyway, so we could replace Step 1 with r_{i+1} = r_i − αAr_i. The problem is that round-off errors will then accumulate, so we should periodically correct the calculation by explicitly computing r_i = b − Ax_i.
The method of steepest descent repeatedly follows the curve down in the steepest direction until it reaches the minimum. See Figure 9.
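A sketch of the complete iteration, including the periodic residual correction discussed above (an assumption: the 2 × 2 test system is the classic example from the Shewchuk reference, whose solution is (2, −2)):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    # Algorithm 7 with the cheap residual recurrence r <- r - alpha*A r,
    # periodically refreshed by the explicit r = b - A x.
    x = np.array(x0, dtype=float)
    r = b - A @ x
    bnorm = np.linalg.norm(b)
    for i in range(max_iter):
        if np.linalg.norm(r) <= tol * bnorm:
            break
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)
        x = x + alpha * r
        if i % 50 == 49:
            r = b - A @ x        # correct accumulated round-off
        else:
            r = r - alpha * Ar
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # SPD test system
b = np.array([2.0, -8.0])                # exact solution x = (2, -2)
x = steepest_descent(A, b, np.zeros(2))
```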
Convergence Analysis for Steepest Descent
It is easier to analyse the method if we work in the energy norm,

    ‖x‖_A = (x^TAx)^{1/2}.

Let v_j be a set of orthonormal eigenvectors of A with corresponding eigenvalues λ_j. The error, e_i = x − x_i, can then be written as a linear combination of the v_j,

    e_i = Σ_{j=1}^n ξ_jv_j.

By using this notation we can show that

    ‖e_{i+1}‖_A^2 = ω^2‖e_i‖_A^2,

where

    ω^2 = 1 − (Σ_j ξ_j^2λ_j^2)^2 / ((Σ_j ξ_j^2λ_j^3)(Σ_j ξ_j^2λ_j)).
Figure 9: Diagrammatic representation of the method of steepest descent
Firstly, note that

    e_i^TAe_i = (Σ_j ξ_jv_j)^T A (Σ_j ξ_jv_j) = (Σ_j ξ_jv_j)^T (Σ_j ξ_jλ_jv_j) = Σ_j ξ_j^2λ_j,

and

    r_i^Tr_i = (Ae_i)^T(Ae_i) = (Σ_j ξ_jλ_jv_j)^T(Σ_j ξ_jλ_jv_j) = Σ_j ξ_j^2λ_j^2.
Now

    ‖e_{i+1}‖_A^2 = e_{i+1}^TAe_{i+1}
                  = (e_i − α_ir_i)^TA(e_i − α_ir_i)
                  = e_i^TAe_i − 2α_ir_i^TAe_i + α_i^2r_i^TAr_i
                  = ‖e_i‖_A^2 − 2 (r_i^Tr_i/(r_i^TAr_i))(r_i^Tr_i) + (r_i^Tr_i/(r_i^TAr_i))^2 r_i^TAr_i
                  = ‖e_i‖_A^2 − (r_i^Tr_i)^2/(r_i^TAr_i)
                  = ‖e_i‖_A^2 (1 − (r_i^Tr_i)^2/((r_i^TAr_i)(e_i^TAe_i)))
                  = ‖e_i‖_A^2 (1 − (Σ_j ξ_j^2λ_j^2)^2/((Σ_j ξ_j^2λ_j^3)(Σ_j ξ_j^2λ_j)))
                  = ω^2‖e_i‖_A^2,

where

    ω^2 = 1 − (Σ_j ξ_j^2λ_j^2)^2 / ((Σ_j ξ_j^2λ_j^3)(Σ_j ξ_j^2λ_j)).
If we set

    κ = λ_max/λ_min,

it can also be shown (see reference) that

    ω ≤ (κ − 1)/(κ + 1).
So the convergence rate of steepest descent depends on the condition number as well as the initial guess. See Figures 10, 11 and 12.
Method of Conjugate Directions
Steepest descent often takes several steps in the same direction. Let's pick a set of orthogonal search directions d_0, d_1, · · · , d_{n−1} and only take one step in each direction. So,
xi+1 = xi + αidi.
After n steps we are done. See Figure 13.

What is the value of α_i? We want e_{i+1} and d_i to be orthogonal, so that we do not go back in the direction of d_i again. Then
d_i^T e_{i+1} = 0
d_i^T (e_i − α_i d_i) = 0
α_i = (d_i^T e_i) / (d_i^T d_i).
Convergence of Steepest Descent (per iteration) worsens as the condition number of the matrix increases.
Figure 10: The convergence rate depends on the condition number
Figure 11: (a) Large κ, small µ; (b) An example of poor convergence, κ and µ are both large
Figure 12: (c) small κ, small µ; (d) small κ, large µ
Figure 13: The idea behind the method of conjugate directions is to only take one step in each direction
Figure 14: A plot of A-orthogonal vectors
But we don't know e_i. So instead we make the two vectors A-orthogonal, i.e. set
dTi Aei+1 = 0.
Figures 14 and 15 show the difference between orthogonal and A-orthogonal vectors. Recall
x_{i+1} = x_i + α_i d_i

so

e_{i+1} = e_i − α_i d_i.

If

d_i^T A e_{i+1} = 0,

then

d_i^T A (e_i − α_i d_i) = 0,
d_i^T A e_i − α_i d_i^T A d_i = 0,

and

α_i = (d_i^T A e_i) / (d_i^T A d_i).

Since A e_i = r_i, this gives

α_i = (d_i^T r_i) / (d_i^T A d_i).
If d_i = r_i this is the steepest descent method.
Conjugate Directions and Error
Let's express the initial error as a linear combination of the search directions:
e_0 = Σ_{j=0}^{n−1} σ_j d_j.
Figure 15: Compare these orthogonal vectors with the A-orthogonal vectors in Figure 14
Now, d_k^T A e_0 = Σ_{j=0}^{n−1} σ_j d_k^T A d_j = σ_k d_k^T A d_k.
So,

σ_k = (d_k^T A e_0) / (d_k^T A d_k)
= d_k^T A (e_0 − Σ_{i=0}^{k−1} α_i d_i) / (d_k^T A d_k)    (the added terms vanish by A-orthogonality)
= (d_k^T A e_k) / (d_k^T A d_k)    using e_{i+1} = e_i − α_i d_i
= (d_k^T r_k) / (d_k^T A d_k)
= α_k.
This gives

e_i = e_0 − Σ_{j=0}^{i−1} α_j d_j = Σ_{j=0}^{n−1} σ_j d_j − Σ_{j=0}^{i−1} σ_j d_j = Σ_{j=i}^{n−1} σ_j d_j.
Hence, after n iterations every component of the error has been cut away.
Gram-Schmidt Conjugation
How do we find the search directions d_i? We can use the conjugate Gram-Schmidt process.
Suppose we have n linearly independent vectors u_0, ..., u_{n−1}. To construct d_i we take u_i and subtract any components not A-orthogonal to the previous d vectors. Set d_0 = u_0 and
d_i = u_i + Σ_{k=0}^{i−1} β_{ik} d_k,    i > 0.
Using A-orthogonality,

0 = d_i^T A d_j = u_i^T A d_j + Σ_{k=0}^{i−1} β_{ik} d_k^T A d_j = u_i^T A d_j + β_{ij} d_j^T A d_j

(j < i), and

β_{ij} = −(u_i^T A d_j) / (d_j^T A d_j).
The problem with this approach is that we need to keep all search directions in memory.
Conjugate Gradients
How do we choose the conjugate directions? The CG method is the method of conjugate directions where the search directions are constructed by conjugation of the residuals (setting u_i = r_i). Note that
dTi rj = 0 if i < j,
so the residuals are orthogonal to the previous search directions, and each residual is thus guaranteed to produce a new linearly independent search direction, unless the residual is zero.
Let D_i be the i-dimensional subspace spanned by d_0, d_1, ..., d_{i−1}. Then r_i is orthogonal to D_i:
d_i^T r_j = d_i^T A e_j = d_i^T A (Σ_{k=j}^{n−1} α_k d_k) = Σ_{k=j}^{n−1} α_k d_i^T A d_k = 0 if i < j, by A-orthogonality.
Also

d_i^T r_j = u_i^T r_j + Σ_{k=0}^{i−1} β_{ik} d_k^T r_j,

so 0 = u_i^T r_j if i < j. This means that r_i^T r_j = 0 if i ≠ j, since we set u_i = r_i. Recall

β_{ij} = −(u_i^T A d_j) / (d_j^T A d_j) = −(r_i^T A d_j) / (d_j^T A d_j).
Now, r_{i+1} = A e_{i+1} = A(e_i − α_i d_i) = r_i − α_i A d_i. So

r_i^T r_{j+1} = r_i^T r_j − α_j r_i^T A d_j,
α_j r_i^T A d_j = r_i^T r_j − r_i^T r_{j+1}.

Using the orthogonality of the residuals,

r_i^T A d_j = (1/α_i) r_i^T r_i           if i = j,
            = −(1/α_{i−1}) r_i^T r_i      if i = j + 1,
            = 0                           otherwise.

So, if i > j,

β_{ij} = −(r_i^T A d_j) / (d_j^T A d_j) = (1/α_{i−1}) (r_i^T r_i) / (d_j^T A d_j)   if i = j + 1,
                                        = 0                                          otherwise.
Hence we only need to keep the previous search direction. Let β_i = β_{i,i−1} and evaluate:
β_i = (1/α_{i−1}) (r_i^T r_i) / (d_{i−1}^T A d_{i−1})
= [(d_{i−1}^T A d_{i−1}) / (d_{i−1}^T r_{i−1})] (r_i^T r_i) / (d_{i−1}^T A d_{i−1})    using the definition of α_{i−1}
= (r_i^T r_i) / (d_{i−1}^T r_{i−1})
= (r_i^T r_i) / [(u_{i−1} + Σ_{k=0}^{i−2} β_{i−1,k} d_k)^T r_{i−1}]
= (r_i^T r_i) / (u_{i−1}^T r_{i−1})
= (r_i^T r_i) / (r_{i−1}^T r_{i−1}).
Finally, the method of Conjugate Gradients is
Algorithm 8. Conjugate Gradient
1: d_0 = r_0 = b − A x_0.
2: α_i = (r_i^T r_i) / (d_i^T A d_i).
3: x_{i+1} = x_i + α_i d_i.
4: r_{i+1} = r_i − α_i A d_i.
5: β_{i+1} = (r_{i+1}^T r_{i+1}) / (r_i^T r_i).
6: d_{i+1} = r_{i+1} + β_{i+1} d_i.
When solving a two dimensional problem using the Conjugate Gradient method the solution will be found in two steps. See Figure 16.
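Algorithm 8 can be sketched in Python with NumPy (a hedged translation; the function name and the small SPD test system are our own). In exact arithmetic the method terminates in at most n steps, so the 2×2 example below is solved in two iterations:

```python
import numpy as np

def conjugate_gradient(A, b, x0, max_it=None, tol=1e-12):
    """Conjugate Gradient for SPD A (sketch of Algorithm 8)."""
    x = x0.astype(float)
    r = b - A @ x                          # Step 1: d_0 = r_0 = b - A x_0
    d = r.copy()
    max_it = max_it or len(b)              # at most n steps in exact arithmetic
    for _ in range(max_it):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)         # Step 2
        x = x + alpha * d                  # Step 3
        r_new = r - alpha * Ad             # Step 4
        beta = (r_new @ r_new) / (r @ r)   # Step 5
        d = r_new + beta * d               # Step 6
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])     # SPD example (illustrative)
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, np.zeros(2))
print(np.allclose(A @ x, b))               # True
```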
It can be shown (see reference) that the convergence rate of the CG method is given by
||e_i||_A ≤ 2 [(√κ − 1)/(√κ + 1)]^i ||e_0||_A.
Figure 16: The solution of the model problem using the Conjugate Gradient method
Recall that the convergence rate of the steepest descent method depended on κ, whereas the convergence rate for the CG method depends on √κ. As shown in Figure 17, this makes a big difference to the convergence rate.
Preconditioners
Consider the linear system Ax = b.
We have shown that the convergence rate of iterative methods depends on the condition number of A.
Now consider the linear system M^{-1}Ax = M^{-1}b.
This has the same solution as the original system. If we can find an M with favourable properties (M^{-1}A has a small condition number) then we can solve the system a lot faster. Such an M is called a preconditioner.
Left and Right Preconditioners
In general, left and right preconditioners can be used. Write M = M_1 M_2 and transform the system to

M_1^{-1} A M_2^{-1} (M_2 x) = M_1^{-1} b,

where M_1 and M_2 are the left and right preconditioners. This formulation makes it easier to preserve certain properties of A, such as symmetry and positive definiteness.
Left panel: Convergence of Conjugate Gradients (per iteration) as a function of condition number. Right panel: Number of iterations of Steepest Descent required to match one iteration of CG.
Figure 17: Comparison of the convergence rate of the steepest descent with the CG method
The following slides describe some preconditioners. This is a very short (superficial) overview; many other preconditioners are available.
Jacobi Preconditioner
• Point Jacobi Preconditioner:
The simplest preconditioner is to let M = diag(A). This is cheap, easy and can be effective for some problems, although more sophisticated methods are available.
• Block Jacobi Preconditioner:
Let
m_{i,j} = a_{i,j}   if i, j are in some index set,
        = 0         otherwise.
Incomplete Factorisation
Suppose A is sparse. The factors formed by LU decomposition or Cholesky factorisation may not be very sparse.
Suppose L is computed by Cholesky-like formulas but allowed to have non-zeros only in positions where A has non-zeros. Define M = LL^T.
This is incomplete Cholesky. Similar ILU (incomplete LU) decompositions are available in the nonsymmetric case, as well as other modifications of the above idea.
This method is not cheap, but improved convergence rate may be enough to recover the cost.
Preconditioners that depend on the PDE
• Low-Order Discretisations:
A high-order method gives a more accurate approximation, but the discretisation stencils are large and the matrix less sparse. A lower-order discretisation can be used as a preconditioner.
• Constant Coefficients:
Fast solvers are available for certain PDEs with constant coefficients. For problems with variable coefficients, a constant coefficient approximation may be a good preconditioner.
• Splitting of multi-term operator:
Many applications involve a combination of physical processes. It may be possible to write A as A = A_1 + A_2. If A_1 or A_2 is easily invertible it may be a good preconditioner.
• The multigrid method is a good preconditioner (see Bramble)
If the problem contains long and short range couplings, it may be possible to ignore the long range couplings in the preconditioner.
The Transformed Preconditioned Conjugate Gradient Method uses

M_1^{-1} A M_1^{-T} x̂ = M_1^{-1} b,    x̂ = M_1^T x,    M = M_1 M_1^T.
Algorithm 9. Transformed Preconditioned CG Method
1: d̂_0 = r̂_0 = M_1^{-1} b − M_1^{-1} A M_1^{-T} x̂_0.
2: α_i = (r̂_i^T r̂_i) / (d̂_i^T M_1^{-1} A M_1^{-T} d̂_i).
3: x̂_{i+1} = x̂_i + α_i d̂_i.
4: r̂_{i+1} = r̂_i − α_i M_1^{-1} A M_1^{-T} d̂_i.
5: β_{i+1} = (r̂_{i+1}^T r̂_{i+1}) / (r̂_i^T r̂_i).
6: d̂_{i+1} = r̂_{i+1} + β_{i+1} d̂_i.
Problem: we need to know M_1.

Set r̂_i = M_1^{-1} r_i, d̂_i = M_1^T d_i and M = M_1 M_1^T.
The Untransformed Preconditioned Conjugate Gradient Method is

Algorithm 10. Untransformed Preconditioned CG Method
1: r_0 = b − A x_0.
2: d_0 = M^{-1} r_0.
3: α_i = (r_i^T M^{-1} r_i) / (d_i^T A d_i).
4: x_{i+1} = x_i + α_i d_i.
5: r_{i+1} = r_i − α_i A d_i.
6: β_{i+1} = (r_{i+1}^T M^{-1} r_{i+1}) / (r_i^T M^{-1} r_i).
7: d_{i+1} = M^{-1} r_{i+1} + β_{i+1} d_i.
Preconditioners
Statements of the form s = M^{-1} r mean that you should apply the preconditioner; you need not explicitly form the matrix M.
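Algorithm 10 can be sketched in Python with NumPy (a hedged illustration; names are our own). The preconditioner is passed as a function that applies M^{-1}, so M is never formed; here a point Jacobi preconditioner M = diag(A) is used on the model system from the stationary-methods section:

```python
import numpy as np

def preconditioned_cg(A, b, x0, apply_Minv, max_it=None, tol=1e-12):
    """Untransformed preconditioned CG (sketch of Algorithm 10).
    apply_Minv(r) applies the preconditioner; M is never formed explicitly."""
    x = x0.astype(float)
    r = b - A @ x                          # Step 1
    s = apply_Minv(r)                      # s = M^{-1} r
    d = s.copy()                           # Step 2
    max_it = max_it or len(b)
    for _ in range(max_it):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ s) / (d @ Ad)         # Step 3: (r^T M^{-1} r) / (d^T A d)
        x = x + alpha * d                  # Step 4
        r_new = r - alpha * Ad             # Step 5
        s_new = apply_Minv(r_new)
        beta = (r_new @ s_new) / (r @ s)   # Step 6
        d = s_new + beta * d               # Step 7
        r, s = r_new, s_new
    return x

A = np.array([[10.0, -1.0, 2.0, 0.0],
              [-1.0, 11.0, -1.0, 3.0],
              [2.0, -1.0, 10.0, -1.0],
              [0.0, 3.0, -1.0, 8.0]])     # symmetric, diagonally dominant (SPD)
b = np.array([6.0, 25.0, -11.0, 15.0])
diag = np.diag(A)
x = preconditioned_cg(A, b, np.zeros(4), lambda r: r / diag)  # point Jacobi M
print(np.allclose(x, [1.0, 2.0, -1.0, 1.0]))                  # True
```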
Example
Example 13 (Preconditioned CG). The A matrix is the block tri-diagonal matrix given by the 5-point stencil on a uniform grid. It is of size 63² × 63².
Note: the number of non-zeros in the original A matrix is 2063. The number of non-zeros in M if it is formed using Incomplete Cholesky with a tolerance of 10^{-2} is 2293, but if the tolerance is tightened to 10^{-3} the number of non-zeros increases to 4835.
Convergence rate
Figure 18 shows how the right choice of preconditioner can greatly influence the convergence rate. What this diagram does not show is that, to decrease the overall run time, we also need a preconditioner that is cheap and easy to apply.
Preconditioners compared (relative residual ||r||/||b|| against iteration number): no preconditioner, Incomplete Cholesky (10^{-2}), Incomplete Cholesky (10^{-3}), Block GS, Block SOR(1.8).
Figure 18: Convergence rate when using various preconditioners for the model problem
4 Building on Simple Methods
References
A Multigrid Tutorial by William Briggs, Van Emden Henson and Steve McCormick, 2nd Edition,SIAM, 2000.
Stationary Iterative Methods
An iterative method for solving a system Ax = b has the form
x(k+1) = Rx(k) + c
where R and c are chosen so that any fixed point of the equation x = Rx + c is a solution of Ax = b. This is called stationary if R and c are constant over all iterates.
Matrix Splitting
Definition 2 (Matrix Splitting). A splitting of A is a decomposition A = M − K, with M nonsingular.
Ax = Mx − Kx = b ⇒ Mx = Kx + b ⇒ x = M^{-1}Kx + M^{-1}b = Rx + c,

where R = M^{-1}K and c = M^{-1}b. We take x_{m+1} = Rx_m + c as our iterative method.
Theorem 3. If ||R|| < 1, then xm+1 = Rxm + c converges for any initial guess x0.
Theorem 4. The iteration xm+1 = Rxm + c converges for any initial guess x0 if and only ifρ(R) < 1.
Jacobi Method
Let D be the diagonal of A, −L the strictly lower triangular part of A, and −U the strictly upper triangular part of A.
Set M = D and K = L + U, so R_J = D^{-1}(L + U) and c_J = D^{-1}b. One step of the Jacobi method is then
Dxm+1 = (L+ U)xm + b.
The Jacobi algorithm is
Algorithm 11. Jacobi
1: for j = 1 to n do
2:   x^{m+1}_j = (1/a_{jj}) (b_j − Σ_{k≠j} a_{jk} x^m_k)
3: end for
Jacobi Example
Table 3: Example output from model problem
k     x1      x2       x3       x4
0     0       0        0        0
1     0.6000  2.2727  -1.1000   1.8750
2     1.0473  1.7159  -0.8052   0.8852
3     0.9326  2.0533  -1.0493   1.1309
4     1.0152  1.9537  -0.9681   0.9738
5     0.9890  2.0114  -1.0103   1.0214
6     1.0032  1.9922  -0.9945   0.9944
7     0.9981  2.0023  -1.0020   1.0036
8     1.0006  1.9987  -0.9990   0.9989
9     0.9997  2.0004  -1.0004   1.0006
10    1.0001  1.9998  -0.9998   0.9998
function iterative(maxn)

A = [10, -1,  2,  0;
     -1, 11, -1,  3;
      2, -1, 10, -1;
      0,  3, -1,  8]        % define the A matrix

b = [6; 25; -11; 15]        % define the RHS

x = [0; 0; 0; 0]            % find an initial guess

max_it = 10;                % maximum number of iterations
tol = 0.001;                % solution tolerance

y = jacobi(A, x, b, max_it, tol)
Table 3 shows the convergence rate of the Jacobi method applied to the model problem.
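The Jacobi sweep of Algorithm 11 can also be sketched in Python with NumPy (an illustrative translation of the MATLAB driver above; names are our own). Running it on the model problem reproduces the iterates of Table 3:

```python
import numpy as np

def jacobi(A, b, x0, max_it):
    """Jacobi sweeps: x_j <- (b_j - sum_{k!=j} a_jk x_k) / a_jj, all from the OLD iterate."""
    D = np.diag(A)
    x = x0.astype(float)
    history = [x.copy()]
    for _ in range(max_it):
        x = (b - (A @ x - D * x)) / D   # A@x - D*x is the off-diagonal sum
        history.append(x.copy())
    return history

A = np.array([[10.0, -1.0, 2.0, 0.0],
              [-1.0, 11.0, -1.0, 3.0],
              [2.0, -1.0, 10.0, -1.0],
              [0.0, 3.0, -1.0, 8.0]])
b = np.array([6.0, 25.0, -11.0, 15.0])
hist = jacobi(A, b, np.zeros(4), 10)
print(np.round(hist[1], 4))   # [ 0.6     2.2727 -1.1     1.875 ], row k=1 of Table 3
```

After 10 sweeps the iterate agrees with the solution [1, 2, -1, 1] to about three decimal places, as in the table.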
Gauss-Seidel Method
Set M = D − L and K = U, so R_GS = (D − L)^{-1}U and c_GS = (D − L)^{-1}b. One step of the Gauss-Seidel method is then
(D − L)xm+1 = Uxm + b.
The Gauss-Seidel algorithm is
Algorithm 12. Gauss-Seidel
1: for j = 1 to n do
2:   x^{m+1}_j = (1/a_{jj}) (b_j − Σ_{k<j} a_{jk} x^{m+1}_k − Σ_{k>j} a_{jk} x^m_k)
3: end for
Table 4: Example output from model problem
k     x1      x2       x3       x4
0     0       0        0        0
1     0.6000  2.3273  -0.9873   0.8789
2     1.0302  2.0369  -1.0145   0.9843
3     1.0066  2.0036  -1.0025   0.9984
4     1.0009  2.0003  -1.0003   0.9998
5     1.0001  2.0000  -1.0000   1.0000
The order in which we update the elements of x_m is important in the Gauss-Seidel method. Two common orderings are the lexicographical ordering and the red-black ordering.
Gauss-Seidel Example
Table 4 shows the convergence rate of the Gauss-Seidel method applied to the model problem.
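A Gauss-Seidel sweep (Algorithm 12) differs from Jacobi only in using each new value as soon as it is computed. The sketch below (Python/NumPy, names our own) reproduces Table 4 on the same model problem:

```python
import numpy as np

def gauss_seidel(A, b, x0, max_it):
    """Gauss-Seidel sweeps: new values are used as soon as they are computed."""
    n = len(b)
    x = x0.astype(float)
    history = [x.copy()]
    for _ in range(max_it):
        for j in range(n):
            sigma = A[j, :j] @ x[:j] + A[j, j+1:] @ x[j+1:]   # mix of new and old values
            x[j] = (b[j] - sigma) / A[j, j]
        history.append(x.copy())
    return history

A = np.array([[10.0, -1.0, 2.0, 0.0],
              [-1.0, 11.0, -1.0, 3.0],
              [2.0, -1.0, 10.0, -1.0],
              [0.0, 3.0, -1.0, 8.0]])
b = np.array([6.0, 25.0, -11.0, 15.0])
hist = gauss_seidel(A, b, np.zeros(4), 5)
print(np.round(hist[1], 4))   # [ 0.6     2.3273 -0.9873  0.8789], row k=1 of Table 4
```

Note that only 5 sweeps are needed to match the accuracy that Jacobi reached in 10, consistent with the "twice as fast" remark later in this section.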
Successive Overrelaxation
Consider ωAx = ωb. Set M = D − ωL and K = (1 − ω)D + ωU, where ω is the relaxation factor. Then R_SOR = (D − ωL)^{-1}((1 − ω)D + ωU) and c_SOR = (D − ωL)^{-1}ωb. One step of the SOR method is then

(D − ωL)x_{m+1} = ((1 − ω)D + ωU)x_m + ωb.
ω modifies the size of the correction. If ω = 1 this is the same as the GS method. If ω > 1 it iscalled overrelaxation. If ω < 1 it is called underrelaxation.
The SOR algorithm is
Algorithm 13. SOR
1: for j = 1 to n do
2:   x^{m+1}_j = (1 − ω)x^m_j + (ω/a_{jj}) (b_j − Σ_{k<j} a_{jk} x^{m+1}_k − Σ_{k>j} a_{jk} x^m_k)
3: end for
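Algorithm 13 is a one-line change to the Gauss-Seidel sweep; the sketch below (Python/NumPy, names our own) shows the weighted update, and checks that ω = 1 recovers Gauss-Seidel on the model problem:

```python
import numpy as np

def sor(A, b, x0, omega, max_it):
    """SOR sweeps: a weighted Gauss-Seidel update (omega = 1 recovers GS)."""
    n = len(b)
    x = x0.astype(float)
    for _ in range(max_it):
        for j in range(n):
            sigma = A[j, :j] @ x[:j] + A[j, j+1:] @ x[j+1:]
            x[j] = (1 - omega) * x[j] + omega * (b[j] - sigma) / A[j, j]
    return x

A = np.array([[10.0, -1.0, 2.0, 0.0],
              [-1.0, 11.0, -1.0, 3.0],
              [2.0, -1.0, 10.0, -1.0],
              [0.0, 3.0, -1.0, 8.0]])
b = np.array([6.0, 25.0, -11.0, 15.0])
print(np.round(sor(A, b, np.zeros(4), 1.0, 1), 4))   # omega = 1: same as one GS sweep
```

Since this A is symmetric positive definite, SOR converges for any 0 < ω < 2 (Ostrowski-Reich); mild overrelaxation such as ω = 1.1 also converges here.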
Some Convergence Properties
Theorem 5. If A is strictly diagonally dominant, then for any choice of x_0 both the Jacobi and Gauss-Seidel methods converge to the solution of Ax = b.
Theorem 6 (Stein-Rosenberg). If a_{ij} ≤ 0 for each i ≠ j and a_{ii} > 0 for each i = 1, 2, ..., n, then one and only one of the following statements holds:
1. 0 < ρ(RGS) < ρ(RJ) < 1,
2. 1 < ρ(RJ) < ρ(RGS),
3. ρ(RJ) = ρ(RGS) = 0,
4. ρ(RJ) = ρ(RGS) = 1.
Model Problem
Let's suppose A has been formed by the standard 5-point finite difference stencil. Then
ρ(G_J) = cos(π/(N + 1)) ≈ 1 − π²/(2(N + 1)²),

where N is the number of nodal points. This means that as N increases, ρ(G_J) approaches 1 and convergence slows down.
If the iterate is, for example, updated in red/black Gauss-Seidel order then ρ(G_GS) = ρ(G_J)². So one sweep of Gauss-Seidel decreases the error as much as two sweeps of Jacobi; thus Gauss-Seidel is twice as fast.
If we use red/black SOR and choose ω = 2/(1 + sin(π/(N + 1))) then

ρ(G_SOR(ω)) ≈ 1 − 2π/(N + 1),

compared to ρ(G_J) = 1 − O(1/N²).
1D Poisson Example
Example 14 (3-point stencil). Number of iterations required to solve the system of equations Ax = b, where A is the 3-point stencil from the finite difference approximation of Poisson's equation.
          [ -2   1   0  ...        0 ]
          [  1  -2   1   0         : ]
(1/h^2)   [  0   .   .   .         : ]  x = b.
          [  :       1  -2   1   0   ]
          [  :           1  -2   1   ]
          [  0  ...      0   1  -2   ]
Figure 19 shows that as the matrix size is increased, the number of iterations required to solve the system of equations also increases.

The number of iterations also depends on the initial guess. Figure 20 shows that using the solution from a coarse grid as an initial guess for the fine grid solution can reduce the number of iterations.
Weighted Jacobi
The weighted Jacobi method is given by

R_ω = (1 − ω)I + ωR_J,    c_ω = ωD^{-1}b.
Number of iterations against matrix size, using the initial guess x = 0 (Jacobi and Gauss-Seidel).
Figure 19: Convergence rate compared to the problem size
Figure 20: Number of iterations required to solve the system if the coarse grid solution is used as an initial guess
Figure 21: Plot of the sine function for different frequencies
So D x_{m+1} = (1 − ω)D x_m + ω(L + U)x_m + ωb.
Therefore one step of the algorithm is given by
x^{m+1}_j = (1/a_{jj}) [ (1 − ω)a_{jj} x^m_j + ω(b_j − Σ_{k≠j} a_{jk} x^m_k) ]
          = (1/a_{jj}) [ a_{jj} x^m_j + ω(b_j − Σ_k a_{jk} x^m_k) ].
The weighted Jacobi method is not used in practice; it is included here to motivate the multigrid method.
Weighted Jacobi and the Three Point Stencil
Consider the following system of equations
−u(xi − h) + 2u(xi)− u(xi + h) = 0, u0 = un = 0.
Now suppose we tried to solve the system of equations using the Jacobi method and we picked v^0 as our initial guess, where
v^0_j = sin(jkπ/n),    0 ≤ j ≤ n,
for some k (1 ≤ k ≤ n − 1). The value k is the wave number. Small values of k give smooth functions, large values give more oscillatory functions. See Figure 21.
Figure 22: Plot of the error for 4 iterations if k = 2
Smoothers
Smoothers like the Jacobi and Gauss-Seidel methods are good at removing the high frequency components of the error but have difficulty removing the low frequency components.
Figure 22 shows that if the error is smooth, i.e. k = 2, then the Jacobi method does a poor job of decreasing the error.
If we increase the frequency slightly to k = 4, then we can see some improvement in the convergence rate. See Figure 23.
If the error is made up of high frequency components, k = 8, then the Jacobi method does a good job of decreasing the error. See Figure 24.
The same results apply to two dimensional problems. Figure 25 shows the results if the initial
Figure 23: Increasing the frequency improves the convergence rate
Figure 24: High frequency components of the error are removed quickly
guess (and in this example the error) is given by
v^0_{j_1 j_2} = sin(j_1 k_1 π/n) sin(j_2 k_2 π/n),    0 ≤ j_1, j_2 ≤ n,
with k_1 = k_2 = 2. Figure 26 shows how the Jacobi method reduces the error if the initial guess has k_1 = k_2 = 4.
Jacobi and Eigenvalues
Recall that stationary methods converge if and only if ρ(R) < 1. Let A be the matrix formed by the 3-point stencil. Then
R_ω = I − (ω/2)A.

So,

λ(R_ω) = 1 − (ω/2)λ(A).
The eigenvalues of A are
λ_k(A) = 4 sin²(kπ/(2n)),    1 ≤ k ≤ n − 1,

with corresponding eigenvectors,

w_{k,j} = sin(jkπ/n),    1 ≤ k ≤ n − 1,    0 ≤ j ≤ n.
The Jacobi method converges on this example if 0 < ω ≤ 1.
Jacobi and Eigenvectors
Let e0 be the error of the initial guess used in the weighted Jacobi method. We can write e0 as
e_0 = Σ_{k=1}^{n−1} c_k w_k,

where c_k is the amount of error contributing to each mode. Now, the error at the mth iteration is

e_m = R_ω^m e_0 = Σ_{k=1}^{n−1} c_k R_ω^m w_k = Σ_{k=1}^{n−1} c_k λ_k^m(R_ω) w_k.
After m iterations the kth mode of the error has been reduced by a factor of λ_k^m. That is, highly oscillatory components are reduced more quickly. See Figure 27.
Figure 25: Reduction in the error when applying the Jacobi method to a smooth problem
Figure 26: Once again, the Jacobi method does a good job of removing the high frequency components of the error
Figure 27: Example where the error is a combination of different frequency components
Figure 28: Plot of λk for different values of ω
Note that
λ_1 = 1 − 2ω sin²(π/(2n)) = 1 − 2ω sin²(πh/2) ≈ 1 − ωπ²h²/2.
In other words, the smoothest mode will always be close to 1. See Figure 28.
Coarse Grids
We can use a coarse grid to find a good initial guess to an iterative scheme. The coarse gridcontains fewer nodes and as we have seen, Jacobi and SOR methods converge faster on coarse grids.
Let Ω^{2h} be a coarse grid defined by taking the even numbered nodes of the fine grid Ω^h. If 1 ≤ k < n/2, then

w^h_{k,2j} = sin(2jkπ/n) = sin(jkπ/(n/2)) = w^{2h}_{k,j},    1 ≤ k < n/2.
So modes 'moved' down to the coarse grid become more oscillatory. This leads to the idea of the 2-grid multigrid method.
• Apply a few iterations of the iterative method to remove the high frequency components of the error.
• Move the problem down to the coarse grid.
• Solve the coarse grid problem.
• Move the problem back up to the fine grid to improve the fine grid solution.
Coarse to Fine Grid Transfer Ih2h
How do we transfer the problem from the coarse grid to the fine grid?

Linear Interpolation: 1D
v^h_{2j} = v^{2h}_j,
v^h_{2j+1} = (1/2)(v^{2h}_j + v^{2h}_{j+1}),    0 ≤ j ≤ n/2 − 1.
Bilinear Interpolation: 2D
v^h_{2i,2j} = v^{2h}_{i,j},
v^h_{2i+1,2j} = (1/2)(v^{2h}_{i,j} + v^{2h}_{i+1,j}),
v^h_{2i,2j+1} = (1/2)(v^{2h}_{i,j} + v^{2h}_{i,j+1}),
v^h_{2i+1,2j+1} = (1/4)(v^{2h}_{i,j} + v^{2h}_{i+1,j} + v^{2h}_{i,j+1} + v^{2h}_{i+1,j+1}),    0 ≤ i, j ≤ n/2 − 1.
Fine to Coarse Grid Transfer I2hh
Restriction: 1D

v^{2h}_j = v^h_{2j},    1 ≤ j ≤ n/2 − 1.

Full Weighting: 1D

v^{2h}_j = (1/4)(v^h_{2j−1} + 2v^h_{2j} + v^h_{2j+1}),    1 ≤ j ≤ n/2 − 1.

Full Weighting: 2D

v^{2h}_{i,j} = (1/16)[ v^h_{2i−1,2j−1} + v^h_{2i−1,2j+1} + v^h_{2i+1,2j−1} + v^h_{2i+1,2j+1}
             + 2(v^h_{2i,2j−1} + v^h_{2i,2j+1} + v^h_{2i−1,2j} + v^h_{2i+1,2j})
             + 4v^h_{2i,2j} ],    1 ≤ i, j ≤ n/2 − 1.
Coarse Grid Approximation of the Error
What is meant by 'move the problem down to the coarse grid'? Recall,

r^h = b^h − A^h v^h = A^h u^h − A^h v^h = A^h e^h.
Algorithm 14. Two-Grid Correction Scheme v^h ← MG(v^h, b^h)
1: Apply smoother µ1 times to A^h u^h = b^h on Ω^h with initial guess v^h.
2: Compute the residual r^h = b^h − A^h v^h.
3: Restrict the residual down to the coarse grid r^{2h} = I^{2h}_h r^h.
4: Solve A^{2h} e^{2h} = r^{2h} on Ω^{2h}.
5: Interpolate the coarse grid error to the fine grid e^h = I^h_{2h} e^{2h}.
6: Update the fine grid approximation v^h = v^h + e^h.
7: Apply smoother µ2 times to A^h u^h = b^h on Ω^h with initial guess v^h.
The multigrid method firstly applies the pre-smoothing steps to remove the high frequency components of the error. It then projects the residual equation down to the coarse grid to find an approximation to the error. See Figure 29. That approximation is interpolated back up to the fine grid and used to improve the current solution. See Figure 30. Finally, a post-smoothing step is applied to remove the remaining high frequency components of the error.
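The two-grid correction scheme can be sketched for the 1D model problem in Python with NumPy. This is a minimal illustration under our own choices: a weighted Jacobi smoother (ω = 2/3, three sweeps each side), full-weighting restriction, linear interpolation, and a direct solve on the coarse grid; all names are ours. With f = 0 the exact solution is zero, so the iterate itself is the error:

```python
import numpy as np

def poisson_matrix(n):
    """3-point stencil for 1D Poisson on n subintervals (h = 1/n), interior points only."""
    h2 = 1.0 / n**2
    T = 2*np.eye(n-1) - np.eye(n-1, k=1) - np.eye(n-1, k=-1)
    return T / h2

def weighted_jacobi(A, v, f, omega=2.0/3.0, sweeps=3):
    D = np.diag(A)
    for _ in range(sweeps):
        v = v + omega * (f - A @ v) / D
    return v

def restrict_fw(r):
    """Full weighting: fine interior (n-1 values) -> coarse interior (n/2-1 values)."""
    return 0.25 * (r[:-2:2] + 2*r[1:-1:2] + r[2::2])

def interpolate(e2h, n):
    """Linear interpolation: coarse interior -> fine interior (zero boundary values)."""
    eh = np.zeros(n - 1)
    eh[1::2] = e2h                                 # coincident grid points
    padded = np.concatenate(([0.0], e2h, [0.0]))
    eh[0::2] = 0.5 * (padded[:-1] + padded[1:])    # in-between points
    return eh

def two_grid(A, v, f, n):
    v = weighted_jacobi(A, v, f)                           # pre-smooth
    r2h = restrict_fw(f - A @ v)                           # restrict the residual
    e2h = np.linalg.solve(poisson_matrix(n // 2), r2h)     # solve A^{2h} e^{2h} = r^{2h}
    v = v + interpolate(e2h, n)                            # correct
    return weighted_jacobi(A, v, f)                        # post-smooth

n = 32
A = poisson_matrix(n)
j = np.arange(1, n)
v = np.sin(2*np.pi*j/n) + np.sin(16*np.pi*j/n)   # smooth + oscillatory error, f = 0
v_new = two_grid(A, v, np.zeros(n - 1), n)
print(np.linalg.norm(v_new) / np.linalg.norm(v) < 0.5)   # True: error shrinks substantially
```

The smoother removes the k = 16 component and the coarse-grid correction removes most of the k = 2 component, so one cycle reduces the error well beyond what either ingredient manages alone.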
Multiple Layers of Grids
In the two grid correction method we said that we should solve the system of equations A^{2h} e^{2h} = r^{2h} to find a coarse grid approximation to the error. How do we solve the coarse grid equations?

If the coarse grid is small we could use a direct method like Gaussian Elimination. If the coarse grid is still large we could add another, even coarser, grid and apply Algorithm 14 again. This leads to the multigrid idea.
V-Cycle Scheme
There are many different variations of the multigrid idea; the most common is the V-Cycle Scheme. Given a nested sequence of grids

Ω^{kh} ⊂ Ω^{(k−1)h} ⊂ · · · ⊂ Ω^{2h} ⊂ Ω^h,
the V-Cycle Scheme is:
Algorithm 15. V-Cycle Scheme: v^h ← V^h(v^h, b^h)
1: Apply smoother µ1 times to A^h u^h = b^h on Ω^h with initial guess v^h.
2: if Ω^h is the coarsest grid then
3:   goto step 12
4: else
5:   Compute the residual r^h = b^h − A^h v^h.
6:   Restrict the residual to the coarse grid b^{2h} = I^{2h}_h r^h.
7:   Set the initial coarse grid guess v^{2h} = 0.
8:   Solve v^{2h} = V^{2h}(v^{2h}, b^{2h}).
9:   Interpolate the coarse grid error to the fine grid e^h = I^h_{2h} v^{2h}.
10:  Update the fine grid approximation v^h = v^h + e^h.
11: end if
12: Apply smoother µ2 times to A^h u^h = b^h on Ω^h with initial guess v^h.
W-Cycle Scheme
Another variation of the multigrid idea is the W-cycle Scheme:
Figure 29: Apply the pre-smoothing steps to remove the high frequency components of the error and then approximate the error on the coarse grid.
Figure 30: The coarse grid approximation to the error is used to improve the current estimate
Algorithm 16. W-Cycle Scheme: v^h ← W^h(v^h, b^h)
1: Apply smoother µ1 times to A^h u^h = b^h on Ω^h with initial guess v^h.
2: if Ω^h is not the coarsest grid then
3:   Compute the residual r^h = b^h − A^h v^h.
4:   Restrict the residual to the coarse grid b^{2h} = I^{2h}_h r^h.
5:   Set the initial coarse grid guess v^{2h} = 0.
6:   Repeat v^{2h} = W^{2h}(v^{2h}, b^{2h}) twice.
7:   Interpolate the coarse grid error to the fine grid e^h = I^h_{2h} v^{2h}.
8:   Update the fine grid approximation v^h = v^h + e^h.
9: end if
10: Apply smoother µ2 times to A^h u^h = b^h on Ω^h with initial guess v^h.
Full Multigrid V-Cycle (FMV)
In Chapter 2 we mentioned the idea of using the coarse grid to find a good initial guess. With the FMV scheme we incorporate that idea into the multigrid method.
Algorithm 17. FMV Scheme: v^h ← FMG^h(b^h)
1: if Ω^h is the coarsest grid then
2:   v^h = 0
3: else
4:   Find the RHS vector b^{2h} = I^{2h}_h b^h.
5:   Solve the coarse grid problem v^{2h} = FMG^{2h}(b^{2h}).
6:   Find the coarse grid approximation v^h = I^h_{2h} v^{2h}.
7: end if
8: Solve on the current grid v^h = V^h(v^h, b^h).
Memory Requirements
Let's consider the solution of Poisson's equation on a square domain. We need to store a copy of v^h and b^h on each grid level.
Consider a d-dimensional grid with n^d points and assume n is a power of 2. Two arrays are stored on each level. The finest grid, Ω^h, requires 2n^d storage. The next grid Ω^{2h} requires 2^{-d} times as much storage. The third grid Ω^{4h} requires 4^{-d} = 2^{-2d} times as much storage, etc. So,

Storage = 2n^d (1 + 2^{-d} + 2^{-2d} + · · ·) < 2n^d / (1 − 2^{-d}).
Computational Load
Let WU be the cost of performing one relaxation sweep on the finest grid. We will ignore inter-grid transfer costs, which are relatively low.
Consider the case where we do one relaxation sweep on each level. Then
V-Cycle computation costs = 2WU(1 + 2−d + 2−2d + · · ·+ 2−nd)
<2WU
1− 2−d.
5 Review Material
Identity Matrix
Definition 3 (Unit Vector). We shall represent the unit vectors by e_j ∈ C^n, where the jth element of e_j is 1 and all other elements are zero.
Example 15 (Unit vector in R3). For example;
e1 = [1, 0, 0]T , e2 = [0, 1, 0]T , e3 = [0, 0, 1]T .
Identity Matrix
Definition 4 (Identity Matrix). The identity matrix is given by I ∈ C^{n,n}. The row i, column j entry of I is
I_{i,j} = 1 if i = j,
        = 0 otherwise.
Another way of saying this is that the jth column of I is ej .
Example 16 (Identity matrix). For example, if n = 3;
I = [ 1 0 0 ]
    [ 0 1 0 ] = [ e_1 | e_2 | e_3 ].
    [ 0 0 1 ]

Non-singular
Definition 5 (Matrix Inverse). Consider a matrix A ∈ C^{n,n}. If there is a matrix Z ∈ C^{n,n} such that AZ = I then Z is the inverse of A and is written as A^{-1}.

Definition 6 (Non-Singular). A matrix A ∈ C^{n,n} is non-singular or invertible if there is a matrix Z ∈ C^{n,n} such that AZ = I.

If A is non-singular then A^{-1} exists and Ax = b always has a unique solution x = A^{-1}b.
Theorem 7. For A ∈ Cn,n, the following conditions are equivalent;
1. A has an inverse A−1.
2. rank(A) = n.
3. range(A) = Cn.
4. null(A) = 0.
5. 0 is not an eigenvalue of A.
6. 0 is not a singular value of A.
7. det(A) 6= 0.
Example 17 (Non-singular system). For example, consider the 3 × 3 system

x_1 + 3x_2 + 2x_3 = b_1
2x_1 + 6x_2 + 9x_3 = b_2
2x_1 + 8x_2 + 8x_3 = b_3

This can be written in matrix form as

[ 1 3 2 ] [ x_1 ]   [ b_1 ]
[ 2 6 9 ] [ x_2 ] = [ b_2 ].
[ 2 8 8 ] [ x_3 ]   [ b_3 ]

The inverse is

(1/10) [ 24   8  -15 ]
       [ -2  -4    5 ].
       [ -4   2    0 ]

Since Ax = b ⟺ x = A^{-1}b,

[ x_1 ]          [ 24b_1 + 8b_2 − 15b_3 ]
[ x_2 ] = (1/10) [ −2b_1 − 4b_2 + 5b_3  ].
[ x_3 ]          [ −4b_1 + 2b_2         ]

In practice we do not explicitly find A^{-1} since this is very expensive (as will be shown later). Instead we find ways to solve the original system of equations.
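We can check the stated inverse numerically (a quick NumPy sketch; the right-hand side vector is an arbitrary choice of ours):

```python
import numpy as np

A = np.array([[1.0, 3.0, 2.0],
              [2.0, 6.0, 9.0],
              [2.0, 8.0, 8.0]])
Z = np.array([[24.0, 8.0, -15.0],
              [-2.0, -4.0, 5.0],
              [-4.0, 2.0, 0.0]]) / 10.0

print(np.allclose(A @ Z, np.eye(3)))               # True: Z is A^{-1}

# Solving directly agrees with applying the inverse, but is cheaper in general.
b = np.array([1.0, 2.0, 3.0])
print(np.allclose(np.linalg.solve(A, b), Z @ b))   # True
```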
Example 18 (Singular system). Now let's consider a slightly different system

x_1 + 3x_2 + 2x_3 = b_1
2x_1 + 6x_2 + 9x_3 = b_2
3x_1 + 9x_2 + 8x_3 = b_3

This can be written in matrix form as

[ 1 3 2 ] [ x_1 ]   [ b_1 ]
[ 2 6 9 ] [ x_2 ] = [ b_2 ].
[ 3 9 8 ] [ x_3 ]   [ b_3 ]

In this case

A = [ 1 3 2 ]
    [ 2 6 9 ]
    [ 3 9 8 ]

is singular. That is, there is no Z such that AZ = I.
In fact, let's try to solve the system
x_1 + 3x_2 + 2x_3 = b_1    (2)
2x_1 + 6x_2 + 9x_3 = b_2   (3)
3x_1 + 9x_2 + 8x_3 = b_3   (4)

(2) × −2:  −2x_1 − 6x_2 − 4x_3 = −2b_1    (5)
(5) + (3):  5x_3 = −2b_1 + b_2  ⇒  x_3 = −(2/5)b_1 + (1/5)b_2
(2) × −3:  −3x_1 − 9x_2 − 6x_3 = −3b_1    (6)
(6) + (4):  2x_3 = −3b_1 + b_3  ⇒  x_3 = −(3/2)b_1 + (1/2)b_3.
Hence a solution does not exist unless −(2/5)b_1 + (1/5)b_2 = −(3/2)b_1 + (1/2)b_3. So it is possible there is no solution.

Suppose −(2/5)b_1 + (1/5)b_2 = −(3/2)b_1 + (1/2)b_3. If we substitute the result back into (2) we get

x_1 + 3x_2 = b_1 − 2x_3 = b_1 + 3b_1 − b_3 = 4b_1 − b_3.

So, if x_2 = α, then x_1 = 4b_1 − b_3 − 3α. In other words, there are infinitely many solutions.

If A is singular then either
1. there is no solution;
2. there are an infinite number of solutions.
Let's suppose b_1 = b_2 = b_3 = 0; then −(2/5)b_1 + (1/5)b_2 = −(3/2)b_1 + (1/2)b_3 and
x_3 = 0,  x_2 = α,  x_1 = −3α.
Hence

[ 1 3 2 ] [ −3α ]   [ 0 ]
[ 2 6 9 ] [  α  ] = [ 0 ].
[ 3 9 8 ] [  0  ]   [ 0 ]

If A is singular then there exists x ≠ 0 such that Ax = 0, which means null(A) ≠ {0}.
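These singularity checks are easy to confirm numerically (a NumPy sketch of ours, using the null vector found above with α = 1):

```python
import numpy as np

A = np.array([[1.0, 3.0, 2.0],
              [2.0, 6.0, 9.0],
              [3.0, 9.0, 8.0]])

print(np.linalg.matrix_rank(A))           # 2: rank < n = 3, so A is singular
print(abs(np.linalg.det(A)) < 1e-9)       # True: the determinant is zero
x = np.array([-3.0, 1.0, 0.0])            # null vector from above (alpha = 1)
print(np.allclose(A @ x, 0.0))            # True
```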
Linear Independence
Let v1,v2,v3, · · · ,vk be a set of vectors.
Definition 7 (Linear Independence). The set is said to be linearly dependent if there are c_1, c_2, ..., c_k (not all zero) with

0 = c_1 v_1 + c_2 v_2 + · · · + c_k v_k.
A set of vectors that is not linearly dependent is called linearly independent .
Example 19 (Vectors in R³). Two vectors are dependent if they lie on the same line, and three vectors are dependent if they lie in the same plane.
Example 20 (Linearly dependent columns).
A = [  1  3  3  2 ]
    [  2  6  9  5 ]
    [ -1 -3  3  0 ]

The columns of A are linearly dependent since the second column is a multiple of the first. The rows are also linearly dependent since Row 3 = 2·Row 2 − 5·Row 1.
Example 21 (Linearly independent columns). If
A = [ 3 4 2 ]
    [ 0 1 5 ]
    [ 0 0 2 ]

then the columns are linearly independent. To see why, suppose

[ 0 ]       [ 3 ]       [ 4 ]       [ 2 ]
[ 0 ] = c_1 [ 0 ] + c_2 [ 1 ] + c_3 [ 5 ].
[ 0 ]       [ 0 ]       [ 0 ]       [ 2 ]

Then c_3 = 0, c_2 = 0, c_1 = 0, and hence the vectors are linearly independent.
Range
Definition 8 (Range). Consider A ∈ C^{m,n}. The range of A, range(A), is the set of vectors which can be expressed as Ax for some x.
This is also called the column space of A. The row space of A is the range of A∗.
Example 22 (Consistent system). For example, if
A = [ 1 0 ]
    [ 5 4 ]
    [ 2 4 ]

then Ax = b can be solved if and only if b lies in the plane that is spanned by the two column vectors. That is, b ∈ range(A).
Nullspace
Definition 9 (Null Space). The nullspace of A ∈ C^{m,n}, null(A), is the set of vectors x such that Ax = 0.
Example 23 (Trivial null space). For example, if
A = [ 1 0 ]
    [ 5 4 ]
    [ 2 4 ]

then solving

[ 1 0 ] [ u ]   [ 0 ]
[ 5 4 ] [ v ] = [ 0 ]
[ 2 4 ]         [ 0 ]

implies u = 0 (from the first equation) and v = 0 (from the second equation).
Example 24 (Non-trivial null space). Now consider
B = [ 1 0 1 ]
    [ 5 4 9 ].
    [ 2 4 6 ]

Column 3 is a linear combination of Columns 1 and 2 (Column 3 = Column 1 + Column 2), thus Column 3 lies in the same plane as that spanned by Columns 1 and 2. In other words, range(A) = range(B). Also, the null space of B contains any multiple of the vector [1, 1, −1]^T (Column 1 + Column 2 − Column 3 = 0):

[ 1 0 1 ] [  c ]   [ 0 ]
[ 5 4 9 ] [  c ] = [ 0 ].
[ 2 4 6 ] [ −c ]   [ 0 ]
Rank
Definition 10 (Rank). The rank of A ∈ Cm,n is the dimension of the column space.
dim(nullspace(A)) + rank(A) = n.
Norms
Suppose we have the system of equations Ax = b and we find x̄ as an approximation to x. How do we measure the distance between x and x̄? We will concentrate on the p-norms:
‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}.
• 1-norm:

‖x‖_1 = Σ_{i=1}^n |x_i|.

• 2-norm:

‖x‖_2 = (Σ_{i=1}^n |x_i|²)^{1/2} = √(x*x).
• ∞-norm:

‖x‖_∞ = max_i |x_i|.
Now, ‖x‖_1 ≥ ‖x‖_2 ≥ ‖x‖_∞, and ‖x‖_1 ≤ √n ‖x‖_2, ‖x‖_2 ≤ √n ‖x‖_∞ and ‖x‖_1 ≤ n ‖x‖_∞. Hence all norms are equivalent.
(Figure: the unit balls of the 1-, 2- and ∞-norms in the plane.)
Example 25 (Vector norms). For example, consider

x^T = [2.6, 5.9, 3.4],    x̄^T = [2.8, 5.8, 3.0].

Then

(x − x̄)^T = [−0.2, 0.1, 0.4].

So

‖x − x̄‖_1 = 0.2 + 0.1 + 0.4 = 0.7,
‖x − x̄‖_2 = (0.04 + 0.01 + 0.16)^{1/2} = 0.4583 to 4 dp,
‖x − x̄‖_∞ = 0.4.
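The example values can be checked with NumPy's norm routine (a quick sketch; rounding hides the usual floating-point noise):

```python
import numpy as np

x    = np.array([2.6, 5.9, 3.4])
xbar = np.array([2.8, 5.8, 3.0])
d = x - xbar                               # [-0.2, 0.1, 0.4] up to round-off

print(round(np.linalg.norm(d, 1), 4))      # 0.7
print(round(np.linalg.norm(d, 2), 4))      # 0.4583
print(round(np.linalg.norm(d, np.inf), 4)) # 0.4
```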
Definition 11 (Vector Norms). If x and y are vectors, then ‖·‖ is a vector norm if all of the following properties hold:

1. ‖x‖ > 0 if x ≠ 0.

2. ‖αx‖ = |α| ‖x‖ for scalar α.

3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.
Matrix Norms
Given a vector norm, we can define the corresponding matrix norm as follows:

‖A‖ = max_{‖x‖ ≠ 0} ‖Ax‖ / ‖x‖.

These norms are subordinate to the vector norms. For the 1-norm and ∞-norm these simplify to:

• ‖A‖_1 = max_j Σ_{i=1}^n |a_{ij}|.

• ‖A‖_∞ = max_i Σ_{j=1}^n |a_{ij}|.
Example 26 (Matrix norms - ‖·‖_1). For example

A = [ 1  0 -1 ]
    [ 0  3 -1 ].
    [ 5 -1  1 ]

Then

j = 1:  Σ_i |a_{i1}| = 6,
j = 2:  Σ_i |a_{i2}| = 4,
j = 3:  Σ_i |a_{i3}| = 3,

so ‖A‖_1 = 6.
Example 27 (Matrix norms - ‖·‖_∞). For example

A = [ 1  0 -1 ]
    [ 0  3 -1 ].
    [ 5 -1  1 ]

Then

i = 1:  Σ_j |a_{1j}| = 2,
i = 2:  Σ_j |a_{2j}| = 4,
i = 3:  Σ_j |a_{3j}| = 7,

so ‖A‖_∞ = 7.
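Both matrix-norm examples can be verified with NumPy, where `ord=1` gives the maximum absolute column sum and `ord=np.inf` the maximum absolute row sum:

```python
import numpy as np

A = np.array([[1.0, 0.0, -1.0],
              [0.0, 3.0, -1.0],
              [5.0, -1.0, 1.0]])

print(np.linalg.norm(A, 1))        # 6.0, as in Example 26
print(np.linalg.norm(A, np.inf))   # 7.0, as in Example 27
```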
Definition 12 (Matrix Norms). If A and B are matrices, then ‖·‖ is a matrix norm if all of the following properties hold:
1. ‖A‖ > 0, if A 6= 0.
2. ‖αA‖ = |α| ‖A‖ for scalar α.
3. ‖A+B‖ ≤ ‖A‖+ ‖B‖.
The subordinate matrix norms defined above also have the following additional properties
4. ‖AB‖ ≤ ‖A‖ ‖B‖.
5. ‖Ax‖ ≤ ‖A‖ ‖x‖ for any vector x.
Orthogonal Vectors
Definition 13 (Orthogonal Vectors). Vectors x and y are orthogonal if x*y = 0. A set of vectors S is orthogonal if x*y = 0 for all distinct x, y ∈ S. The set is orthonormal if it is orthogonal and ‖x‖_2 = 1 for all x ∈ S.
Theorem 8. The nonzero vectors in an orthogonal set are linearly independent.
Proof of linear independence. Suppose the set is not linearly independent. Then there exists a vector v_k ∈ S that can be written as a linear combination of the other vectors in S, that is
\[
v_k = \sum_{j \neq k} c_j v_j,
\]
and we can always choose k such that v_k ≠ 0. Now
\[
0 < \|v_k\|_2^2 = v_k^* v_k
= v_k^* \Big( \sum_{j \neq k} c_j v_j \Big)
= \sum_{j \neq k} c_j \, v_k^* v_j
= 0,
\]
since v_k^* v_j = 0 for every j ≠ k. A contradiction.
Orthogonal Basis
Any set of linearly independent vectors q_1, q_2, · · · , q_k can be converted into a set of orthogonal vectors x_1, x_2, · · · , x_k. First set x_1 = q_1 and then define x_i by
\[
x_i = q_i - \frac{x_1^* q_i}{x_1^* x_1} x_1 - \cdots - \frac{x_{i-1}^* q_i}{x_{i-1}^* x_{i-1}} x_{i-1}.
\]
Theorem 9. The subspace spanned by the original set of vectors q_1, q_2, · · · , q_k is also spanned by x_1, x_2, · · · , x_k.
For the first few vectors,
\[
x_1 = q_1,
\]
\[
x_2 = q_2 - \frac{x_1^* q_2}{x_1^* x_1} x_1 = q_2 - c_{12} q_1,
\]
\[
x_3 = q_3 - \frac{x_1^* q_3}{x_1^* x_1} x_1 - \frac{x_2^* q_3}{x_2^* x_2} x_2
= q_3 - c_{13} q_1 - c_{23} q_2 + c_{23} c_{12} q_1,
\]
so each x_i is a linear combination of q_1, · · · , q_i, and conversely each q_i is a linear combination of x_1, · · · , x_i. Hence x_1, x_2, · · · , x_k and q_1, q_2, · · · , q_k span the same space.
To see that the x_i are orthogonal, consider x_j where j < i:
\[
x_j^* x_i = x_j^* \Big( q_i - \frac{x_1^* q_i}{x_1^* x_1} x_1 - \frac{x_2^* q_i}{x_2^* x_2} x_2 - \cdots - \frac{x_{i-1}^* q_i}{x_{i-1}^* x_{i-1}} x_{i-1} \Big)
\]
\[
= x_j^* q_i - \frac{x_1^* q_i}{x_1^* x_1} x_j^* x_1 - \frac{x_2^* q_i}{x_2^* x_2} x_j^* x_2 - \cdots - \frac{x_{i-1}^* q_i}{x_{i-1}^* x_{i-1}} x_j^* x_{i-1}
\]
\[
= x_j^* q_i - \frac{x_j^* q_i}{x_j^* x_j} x_j^* x_j \qquad \text{(by induction, } x_j^* x_l = 0 \text{ for } l < i, \ l \neq j\text{)}
\]
\[
= x_j^* q_i - x_j^* q_i = 0.
\]
For example,
\[
q_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad
q_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
q_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.
\]
First,
\[
x_1 = q_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.
\]
Then x_1^* q_2 = 1 and x_1^* x_1 = 2, so
\[
x_2 = q_2 - \frac{x_1^* q_2}{x_1^* x_1} x_1
= \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}.
\]
Next x_1^* q_3 = 1, x_1^* x_1 = 2, x_2^* q_3 = 1/2 and x_2^* x_2 = 3/2, so
\[
x_3 = q_3 - \frac{x_1^* q_3}{x_1^* x_1} x_1 - \frac{x_2^* q_3}{x_2^* x_2} x_2
= \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} - \frac{1}{3} \cdot \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}
= \frac{2}{3} \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}.
\]
Unitary Matrix
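The construction above (classical Gram–Schmidt, without normalisation) is straightforward to code. The sketch below reproduces the worked example; the function name is our own:

```python
import numpy as np

def gram_schmidt(Q):
    """Orthogonalise the columns of Q by classical Gram-Schmidt.

    Each x_i is q_i minus its projections onto the previous x's;
    the vectors are left unnormalised, as in the notes.
    """
    X = []
    for q in Q.T:
        x = q.astype(float).copy()
        for xprev in X:
            x -= (xprev @ q) / (xprev @ xprev) * xprev
        X.append(x)
    return np.column_stack(X)

Q = np.column_stack(([1, 1, 0], [1, 0, 1], [0, 1, 1]))
X = gram_schmidt(Q)
# Columns of X: [1, 1, 0], [1/2, -1/2, 1], [-2/3, 2/3, 2/3]
```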
Definition 14 (Unitary Matrix). A square matrix Q ∈ C^{n×n} is unitary (or orthogonal if Q ∈ R^{n×n}) if Q*Q = I (or Q^T Q = I).
The important feature of unitary matrices is that they preserve length. That is,
\[
\|Qx\|_2^2 = (Qx)^*(Qx) = x^* Q^* Q x = x^* x = \|x\|_2^2.
\]
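Length preservation is easy to check numerically for a concrete orthogonal matrix such as a plane rotation (a sketch; the angle and test vector are arbitrary choices of ours):

```python
import numpy as np

theta = 0.7                                   # any angle works
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, -4.0])
len_x = np.linalg.norm(x)        # 5.0
len_Qx = np.linalg.norm(Q @ x)   # also 5.0: the rotation preserves length
```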
Example 28 (Unitary matrices).
\[
\text{Rotation matrix } Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},
\qquad
\text{Permutation matrix } P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Transformation Matrices
The aim is to transform a general linear system into a triangular linear system, which is easier to solve. Given an n-vector a, remove all entries below the kth position (a_k ≠ 0) by using the following:
\[
M_k a =
\begin{bmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & \cdots & -m_{k+1} & 1 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & -m_n & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} a_1 \\ \vdots \\ a_k \\ a_{k+1} \\ \vdots \\ a_n \end{bmatrix}
=
\begin{bmatrix} a_1 \\ \vdots \\ a_k \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\]
where m_i = a_i/a_k, i = k + 1, · · · , n (a_k is the pivot). M_k is called an elementary elimination matrix or Gauss transform. Note:
1. M_k is a lower triangular matrix with unit diagonal ⇒ M_k is non-singular.
2. M_k = I − m e_k^T, where m = [0, · · · , 0, m_{k+1}, · · · , m_n]^T and e_k is the kth column of the identity matrix.
3. M_k^{-1} = I + m e_k^T.
4. If M_j, j > k, is another elementary elimination matrix, with multiplier vector t, then
\[
M_k M_j = I - m e_k^T - t e_j^T + m e_k^T t e_j^T = I - m e_k^T - t e_j^T
\]
since e_k^T t = 0.
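The Gauss transform and its properties can be illustrated directly. A minimal sketch (0-based indexing rather than the 1-based indexing of the notes; the helper name is ours):

```python
import numpy as np

def gauss_transform(a, k):
    """Return M_k = I - m e_k^T that zeros the entries of a below position k.

    Uses 0-based indexing; requires a[k] != 0 (the pivot).
    """
    n = len(a)
    m = np.zeros(n)
    m[k + 1:] = a[k + 1:] / a[k]     # multipliers m_i = a_i / a_k
    M = np.eye(n)
    M[:, k] -= m                     # M = I - m e_k^T
    return M, m

a = np.array([2.0, 4.0, 6.0])
M, m = gauss_transform(a, 0)         # M @ a == [2, 0, 0]
Minv = np.eye(3)
Minv[:, 0] += m                      # property 3: M_k^{-1} = I + m e_k^T
```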
Gaussian Elimination
To reduce an n × n system Ax = b to upper triangular form, multiply both sides by M = M_{n−1} M_{n−2} · · · M_1. For example, consider a 4 × 4 matrix (the marked symbols indicate the entries affected at each stage):
\[
A = \begin{bmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix},
\]
\[
M_1 A = \begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{bmatrix},
\qquad
M_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -m_2 & 1 & 0 & 0 \\ -m_3 & 0 & 1 & 0 \\ -m_4 & 0 & 0 & 1 \end{bmatrix},
\]
\[
M_2 M_1 A = \begin{bmatrix} \times & \triangle & \triangle & \triangle \\ 0 & \triangle & \triangle & \triangle \\ 0 & 0 & \triangle & \triangle \\ 0 & 0 & \triangle & \triangle \end{bmatrix},
\qquad
M_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -t_3 & 1 & 0 \\ 0 & -t_4 & 0 & 1 \end{bmatrix},
\]
\[
M_3 M_2 M_1 A = \begin{bmatrix} \times & \triangle & \circ & \circ \\ 0 & \triangle & \circ & \circ \\ 0 & 0 & \circ & \circ \\ 0 & 0 & 0 & \circ \end{bmatrix},
\qquad
M_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -s_4 & 1 \end{bmatrix}.
\]
LU Factorisation
Gaussian elimination is also known as LU factorisation. Set L = M^{-1} = (M_{n−1} · · · M_1)^{-1} = M_1^{-1} · · · M_{n−1}^{-1} and set U = MA. Then LU = M^{-1}(MA) = A, where L is lower triangular and U is upper triangular.
If we know L and U then we can express Ax = b as L(Ux) = b and solve for x by
\[
Ly = b \quad \text{(forward substitution)},
\]
\[
Ux = y \quad \text{(back substitution)}.
\]
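Given the triangular factors, each substitution sweep is only a few lines. A sketch (our own function names; no pivoting considerations):

```python
import numpy as np

def forward_sub(L, b):
    # Solve L y = b where L is lower triangular, working top to bottom.
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    # Solve U x = y where U is upper triangular, working bottom to top.
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0], [2.0, 1.0]])
U = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([3.0, 12.0])
x = back_sub(U, forward_sub(L, b))   # solves (L U) x = b
```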
Complexity
Algorithm 18 (LU factorisation).
1: for k = 1 to n do
2:   for i = k + 1 to n do
3:     for j = k + 1 to n do
4:       a_{ij} ← a_{ij} − (a_{ik}/a_{kk}) a_{kj}
5:     end for
6:   end for
7: end for
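Algorithm 18 translates almost line for line into code. The sketch below additionally stores the multipliers a_ik/a_kk in the strict lower triangle, so both factors can be recovered afterwards; it assumes nonzero pivots (no pivoting):

```python
import numpy as np

def lu_inplace(A):
    """LU factorisation without pivoting, as in Algorithm 18.

    The multipliers are stored below the diagonal, so on return the
    strict lower triangle holds L (with an implicit unit diagonal)
    and the upper triangle holds U.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n):
        for i in range(k + 1, n):
            A[i, k] /= A[k, k]                  # multiplier a_ik / a_kk
            for j in range(k + 1, n):
                A[i, j] -= A[i, k] * A[k, j]    # a_ij <- a_ij - (a_ik/a_kk) a_kj
    return A

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
F = lu_inplace(A)
L = np.tril(F, -1) + np.eye(3)   # unit lower triangular factor
U = np.triu(F)                   # upper triangular factor
```

In practice one would use a library routine with partial pivoting; this loop is the textbook form only.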
We may interchange the order in which we move through the indices (e.g. to suit the Fortran or C method of storage).
Gaussian elimination requires O(n³) operations for the factorisation and O(n²) operations for the back (and forward) substitution. Explicit matrix inversion is equivalent to solving n systems of equations and thus requires O(n³) operations for the factorisation and O(n³) for the back (and forward) substitutions.