TRANSCRIPT
Linear Algebra
Article
Winter School 2017
Contents

1 Why Worry?
2 Beyond Square Matrices
3 Alternatives to Direct Methods
4 Building on Simple Methods
5 Review Material
1 Why Worry?
References
• Numerical Linear Algebra by Lloyd N. Trefethen and David Bau, III. SIAM 1997.
• Applied Numerical Linear Algebra by James W. Demmel. SIAM 1997.
Arithmetic Disasters
The following reference lists some real-life consequences of round-off errors.
• http://ta.twi.tudelft.nl/nw/users/vuik/wi211/disasters.html
– Patriot Missile Failure.
– Explosion of the Ariane 5.
– EURO page: Conversion Arithmetics.
– The Vancouver Stock Exchange.
– Rounding error changes Parliament makeup.
– The sinking of the Sleipner A offshore platform.
– Tacoma bridge failure (wrong design).
– Collection of Software Bugs.
Poor Programming
These examples are due more to poor programming practices than to round-off errors, but they do highlight the need to take care over numerical considerations.
• http://www5.in.tum.de/~huckle/bugse.html
– Hammer throwing, London Olympics (software would not accept an athlete's result because it was exactly the same as the previous athlete's result, 2012).
– Mars Climate Orbiter, loss (mixture of lb and kg, 1999).
– Green Party convention fails (by a rounding error and erroneous use of Excel the wrong number of delegates is computed, 2002).
– London Millennium Bridge, wobbling (compare Tacoma Bridge; simulation fails because of wrong estimates for pedestrian forces, 2000).
– Vancouver Stock Exchange Index (rounding error, 1983).
– Shut down of nuclear reactors (use of wrong norm in CAD system, 1979).
– Ozone hole ignored until 1985 (software had to set aside data points that deviated greatly from expected measurements).
Sample Variance Example
Definition 1 (Two pass sample variance calculation).

    x̄ = (1/n) Σ_{i=1}^n x_i,

    s_n^2 = (1/(n − 1)) Σ_{i=1}^n (x_i − x̄)^2.
Definition 2 (One pass sample variance calculation).

    s_n^2 = (1/(n − 1)) [ Σ_{i=1}^n x_i^2 − (1/n) (Σ_{i=1}^n x_i)^2 ].
The above two pass and one pass definitions for calculating the sample variance are mathematically equivalent. The one pass definition may seem more effective because it only accesses the data once. However, the one pass definition gives bad values in the presence of round-off errors because it contains the subtraction of two large positive numbers. Note that the relative error for the subtraction of two numbers a and b is large if |a − b| ≪ |a| + |b|; see the discussion on cancellation errors below. Example 1 shows that due to cancellation it is possible to get negative answers with the one pass method, which is clearly incorrect. The two pass formulation gives accurate results unless n is large. Some calculators have used the one pass formula.
Example 1 (One pass method). I wrote a C code to calculate the sample variance of the three numbers 784318, 784319, 784320. When using single floating point precision the variance calculated from the one pass method was −65536. The sample variance using the two pass method was 1 (the correct answer). When I tried double precision both methods gave an answer of 1.
Some Sources of Error
Definition 3 (Truncation Error). Truncation (or discretisation) error is the difference between the true result and the result that would be given if exact arithmetic were used, e.g. truncating an infinite series.
Definition 4 (Rounding Error). The rounding error is the difference between the results obtained by a particular algorithm using exact arithmetic and the results obtained by the same algorithm using finite precision.
Example 2 (Simpson's Rule). Consider, for example, Simpson's Rule. We know that it is an O(h^5) method, and saying that it is O(h^5) gives information about the truncation error. But when we try to solve such problems on the computer, extra sources of error due to floating point arithmetic are introduced.
Floating Point Arithmetic
Due to rounding error, arithmetic operations on computers are not (always) exact.
Definition 5 (Floating Point Arithmetic). We shall denote an evaluation of an expression in floating point arithmetic by fl. If ∘ represents one of the basic arithmetic operations +, −, ×, /, then

    fl(x ∘ y) = (x ∘ y)(1 + δ), |δ| ≤ u,

where u is the 'unit roundoff'. The round-off error is then |x ∘ y − fl(x ∘ y)|.
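As a quick illustration (a sketch using Python's `sys.float_info`; for IEEE double precision u = 2^{−53} ≈ 1.1 × 10^{−16}):

```python
import sys

eps = sys.float_info.epsilon  # gap between 1.0 and the next double = 2u = 2**-52
u = eps / 2                   # unit roundoff u = 2**-53

# A perturbation of size u is rounded away at 1.0 (round-to-nearest-even),
# while a perturbation of size 2u is not.
print(1.0 + u == 1.0)    # True
print(1.0 + eps == 1.0)  # False
```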
Example 3 (Inner product). Consider the inner product

    s_n = x^T y = x_1y_1 + · · · + x_ny_n.    (1)

Let's assume that we are summing from left to right. Define the partial sum s_i by s_i = x_1y_1 + x_2y_2 + · · · + x_iy_i. Now,

    ŝ_1 := fl(x_1y_1) = x_1y_1(1 + δ_1).

    ŝ_2 := fl(ŝ_1 + fl(x_2y_2))
         = fl(ŝ_1 + x_2y_2(1 + δ_2))
         = (ŝ_1 + x_2y_2(1 + δ_2))(1 + δ_3)
         = (x_1y_1(1 + δ_1) + x_2y_2(1 + δ_2))(1 + δ_3)
         = x_1y_1(1 + δ_1)(1 + δ_3) + x_2y_2(1 + δ_2)(1 + δ_3),   where |δ_i| < u.

Drop the subscripts and let 1 + δ_i ≡ 1 ± δ. Then,

    ŝ_3 := fl(ŝ_2 + x_3y_3)
         = (ŝ_2 + x_3y_3(1 ± δ))(1 ± δ)
         = (x_1y_1(1 ± δ)^2 + x_2y_2(1 ± δ)^2 + x_3y_3(1 ± δ))(1 ± δ)
         = x_1y_1(1 ± δ)^3 + x_2y_2(1 ± δ)^3 + x_3y_3(1 ± δ)^2

and in general,

    ŝ_n = x_1y_1(1 ± δ)^n + x_2y_2(1 ± δ)^n + x_3y_3(1 ± δ)^{n−1} + · · · + x_ny_n(1 ± δ)^2.

Finally, by using the lemma given below we get

    ŝ_n = x_1y_1(1 + θ_n) + x_2y_2(1 + θ′_n) + x_3y_3(1 + θ_{n−1}) + · · · + x_ny_n(1 + θ_2),

where |θ_n| ≤ nu/(1 − nu). In other words ŝ_n = x^T ŷ where

    ŷ = (y_1(1 + θ_n), y_2(1 + θ′_n), y_3(1 + θ_{n−1}), · · · , y_n(1 + θ_2))^T.
Lemma 1.1. If |δ_i| ≤ u and p_i = ±1 for i = 1, · · · , n and nu < 1, then

    Π_{i=1}^n (1 + δ_i)^{p_i} = 1 + θ_n,

where

    |θ_n| ≤ nu/(1 − nu) =: γ_n.
Forward and Backward Errors
Forward Errors: Relative and Absolute Errors.
Backward Errors: What is x̂ such that f(x̂) = f̂(x)?
[Diagram: the exact map sends x to y = f(x); the computed map sends x to ŷ = f(x + δx) for some perturbed input x + δx. The backward error is the distance from x to x + δx; the forward error is the distance from y to ŷ.]
For example,

    ŝ_n = x_1y_1(1 + θ_n) + x_2y_2(1 + θ′_n) + x_3y_3(1 + θ_{n−1}) + · · · + x_ny_n(1 + θ_2)

is a backward error result: ŝ_n is the exact result for the perturbed set of data points

    x_1, x_2, · · · , x_n, y_1(1 + θ_n), y_2(1 + θ′_n), · · · , y_n(1 + θ_2).
The forward error is

    |x^T y − fl(x^T y)| ≤ γ_n Σ_{i=1}^n |x_iy_i|.
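This bound can be checked empirically. The sketch below (an assumption: float32 plays the role of finite precision, while float64 stands in for exact arithmetic) confirms that the computed inner product stays within the bound:

```python
import numpy as np

# Forward error bound: |x^T y - fl(x^T y)| <= gamma_n * sum |x_i y_i|.
rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n).astype(np.float32)
y = rng.standard_normal(n).astype(np.float32)

computed = float(np.dot(x, y))                 # "finite precision" result
x64, y64 = x.astype(np.float64), y.astype(np.float64)
exact = float(np.dot(x64, y64))                # near-exact reference

u = float(np.finfo(np.float32).eps) / 2        # unit roundoff for float32
gamma_n = n * u / (1 - n * u)
bound = gamma_n * float(np.sum(np.abs(x64 * y64)))

assert abs(computed - exact) <= bound
```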
Example 4 (Exponential).

    f(x) = e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + · · ·

    f̂(x) = 1 + x + x^2/2! + x^3/3!.

If x = 1, then to seven decimal places,

    f(x) = 2.718282 and f̂(x) = 2.666667.

Furthermore, x̂ = log(2.666667) = 0.980829.

So, the forward error is |f(x) − f̂(x)| = 0.051615 and the backward error is |x − x̂| = 0.019171.
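The numbers in Example 4 can be checked directly (a sketch; `fhat` is the series truncated after the x^3/3! term):

```python
import math

# Verify Example 4: truncate the exponential series after the x^3/3! term, x = 1.
f = math.exp(1.0)                 # f(1)  = 2.718282 to 7 s.f.
fhat = 1.0 + 1.0 + 1.0/2 + 1.0/6  # truncated series = 2.666667 to 7 s.f.
xhat = math.log(fhat)             # the input that the exact f maps onto fhat

forward = abs(f - fhat)           # forward error  ~ 0.051615
backward = abs(1.0 - xhat)        # backward error ~ 0.019171
```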
Conditioning
How sensitive is the solution to perturbations in the data?

• insensitive ←→ well-conditioned.
• sensitive ←→ ill-conditioned.
Conditioning relates forward and backward errors. If we include a perturbation δx in the data, the relative error is

    |f(x + δx) − f(x)|/|f(x)| = (1/|f(x)|) · (|f(x + δx) − f(x)|/|δx|) · |δx|
                              ≈ (1/|f(x)|) · |f′(x)| · |δx|
                              = (|f′(x)| |x|/|f(x)|) · (|δx|/|x|)
                              = Condition Number × |δx|/|x|.
Definition 6 (Condition Number).

    Condition Number = |relative change in solution| / |relative change in input data|
                     = |(f(x̂) − f(x))/f(x)| / |(x̂ − x)/x|
                     ≈ |(x̂ − x)f′(x)/f(x)| / |(x̂ − x)/x|
                     = |x f′(x)/f(x)|.

Example 5 (Logarithm). For example, consider f(x) = ln(x). Then
    c = Condition Number = (|1/x| · |x|)/|ln(x)| = 1/|ln(x)|,

which is large if x ≈ 1.
Suppose x = 2 and set x̂ = 2.02, so that the relative input error is 0.01. Then |f(x) − f(x̂)|/|f(x)| is 0.0143553.
If x = 1.5 and x̂ = 1.515 (relative input error still 0.01), then |f(x) − f(x̂)|/|f(x)| is 0.0245405.
If we set x = 1.01, moving closer to 1, and take x̂ = 1.0201, then |f(x) − f(x̂)|/|f(x)| is 1.00000.
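These figures are easy to reproduce (a sketch; `rel_err` is a helper name introduced here):

```python
import math

# Relative forward error of f(x) = ln(x) under a perturbed input xhat.
def rel_err(x, xhat):
    return abs(math.log(xhat) - math.log(x)) / abs(math.log(x))

# A fixed 1% relative input error is amplified more and more as x approaches 1,
# where the condition number 1/|ln x| blows up.
print(rel_err(2.0, 2.02))      # ~0.0143553
print(rel_err(1.5, 1.515))     # ~0.0245405
print(rel_err(1.01, 1.0201))   # ~1.00000
```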
Horner's Method
Let

    P_{n−1}(t) = x_0 + x_1t + · · · + x_{n−1}t^{n−1}
Figure 1: Plot of p(t) = (t − 2)^9 evaluated at 8000 equidistant points
and set a_{n−1} = x_{n−1}. If a_k = x_k + a_{k+1}t_0 for k = n − 2, n − 3, · · · , 0, then

    a_0 = P_{n−1}(t_0).

Moreover, if

    Q_{n−2}(t) = a_{n−1}t^{n−2} + a_{n−2}t^{n−3} + · · · + a_2t + a_1

then

    P_{n−1}(t) = (t − t_0)Q_{n−2}(t) + a_0.
Algorithm 1. Horner's Method
1: p = x_n
2: for i = n − 1 down to 0 do
3:   p = t ∗ p + x_i
4: end for
Let's apply the algorithm to p(t) = (t − 2)^9 = t^9 − 18t^8 + 144t^7 − 672t^6 + 2016t^5 − 4032t^4 + 5376t^3 − 4608t^2 + 2304t − 512.
Figure 1 shows a plot of p(t) = (t − 2)^9. Figure 2 shows a plot of p(t) using Horner's method. Recall
    Condition Number = |t f′(t)/f(t)| = |t × 9(t − 2)^8/(t − 2)^9| = |9t/(t − 2)|,

so we would expect this to be ill-conditioned around t = 2. That is what we have seen in this case.
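The oscillation in Figure 2 can be reproduced in a few lines (a sketch: Horner's rule on the expanded coefficients, compared with the directly computed (t − 2)^9):

```python
import numpy as np

# Evaluate p(t) = (t - 2)^9 from its expanded coefficients using Horner's rule.
coeffs = [1, -18, 144, -672, 2016, -4032, 5376, -4608, 2304, -512]  # t^9 ... t^0

def horner(t, c):
    p = 0.0
    for a in c:
        p = t * p + a
    return p

ts = np.linspace(1.92, 2.08, 8000)
horner_vals = np.array([horner(t, coeffs) for t in ts])
direct_vals = (ts - 2.0) ** 9          # at most ~1.3e-10 on this interval

# Near the ill-conditioned point t = 2, Horner's rule on the expanded form
# returns round-off noise comparable in size to the function values themselves.
noise = np.max(np.abs(horner_vals - direct_vals))
```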
Figure 2: Plot of p(t) evaluated at 8000 equidistant points using Horner's method
Note that the condition number depends on the problem, not on the method that is used.
Let's rewrite Horner's method as
Algorithm 2. Horner's Method - Rewrite
1: p_n = x_n
2: for i = n − 1 down to 0 do
3:   p_i = t ∗ p_{i+1} + x_i
4: end for
Now use floating point arithmetic to give
Algorithm 3. Horner's Method - fp
1: p_n = x_n
2: for i = n − 1 down to 0 do
3:   p_i = ((t ∗ p_{i+1})(1 + δ_i) + x_i)(1 + δ′_i)
4: end for
where |δ_i|, |δ′_i| < u. Expanding that out we get
    p̂(t) = Σ_{i=0}^{n−1} [ (1 + δ′_i) Π_{j=0}^{i−1} (1 + δ_j)(1 + δ′_j) ] x_i t^i + [ Π_{j=0}^{n−1} (1 + δ_j)(1 + δ′_j) ] x_n t^n.
This can be simplified to give

    p̂(t) = Σ_{i=0}^{n} (1 + 2θ_i) x_i t^i = Σ_{i=0}^{n} x̂_i t^i,

where |θ_i| ≤ iu/(1 − iu) ≤ nu/(1 − nu) = γ_n. So the computed solution p̂(t) is the exact solution of a slightly perturbed polynomial with coefficients x̂_i. This is a backward stable method and the relative backward error is 2γ_n.
Let us verify the above equation.
Polynomial of degree 0:

    p_0 = x_0.
Polynomial of degree 1: p_1 = x_1.

    p_0 = fl(t × p_1 + x_0)
        = ((t × p_1)(1 + δ_0) + x_0)(1 + δ′_0)
        = (t × x_1)(1 + δ_0)(1 + δ′_0) + x_0(1 + δ′_0).

Polynomial of degree 2: p_2 = x_2.

    p_1 = fl(t × p_2 + x_1)
        = ((t × p_2)(1 + δ_1) + x_1)(1 + δ′_1)
        = (t × x_2)(1 + δ_1)(1 + δ′_1) + x_1(1 + δ′_1).

    p_0 = fl(t × p_1 + x_0)
        = ((t × p_1)(1 + δ_0) + x_0)(1 + δ′_0)
        = (t × [(t × x_2)(1 + δ_1)(1 + δ′_1) + x_1(1 + δ′_1)](1 + δ_0))(1 + δ′_0) + x_0(1 + δ′_0)
        = (t^2 × x_2)(1 + δ_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0) + (t × x_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0) + x_0(1 + δ′_0).

Polynomial of degree 3: p_3 = x_3.

    p_2 = fl(t × p_3 + x_2)
        = (t × x_3)(1 + δ_2)(1 + δ′_2) + x_2(1 + δ′_2).

    p_1 = fl(t × p_2 + x_1)
        = (t^2 × x_3)(1 + δ_2)(1 + δ′_2)(1 + δ_1)(1 + δ′_1) + (t × x_2)(1 + δ′_2)(1 + δ_1)(1 + δ′_1) + x_1(1 + δ′_1).

    p_0 = fl(t × p_1 + x_0)
        = ((t × p_1)(1 + δ_0) + x_0)(1 + δ′_0)
        = (t^3 × x_3)(1 + δ_2)(1 + δ′_2)(1 + δ_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0)
          + (t^2 × x_2)(1 + δ′_2)(1 + δ_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0)
          + (t × x_1)(1 + δ′_1)(1 + δ_0)(1 + δ′_0) + x_0(1 + δ′_0)
        = Σ_{i=0}^{2} [ (1 + δ′_i) Π_{j=0}^{i−1} (1 + δ_j)(1 + δ′_j) ] x_i t^i + [ Π_{j=0}^{2} (1 + δ_j)(1 + δ′_j) ] x_3 t^3.

Cancellation
Example 6 (Cancellation error). Consider the function

    f(x) = (1 − cos x)/x^2.
The plot in Figure 3 suggests that f(x) < 0.5.
If x = 1.2 × 10^{−5}, then c = cos(x) to 10 significant figures is 0.9999999999 and 1 − c = 0.0000000001. Hence

    (1 − c)/x^2 = 10^{−10}/(1.44 × 10^{−10}) > 0.6944.
But f(x) < 0.5 for all x ≠ 0. We can get around this problem by noting that

    cos x = 1 − 2 sin^2(x/2)

and rewriting f(x) as

    f(x) = (1/2) (sin(x/2)/(x/2))^2.
Evaluating this formula using 10 significant digits gives f(x) = 0.5.
What went wrong in the above example? Consider x̂ = â − b̂ where â = a(1 + δa), b̂ = b(1 + δb), which is an approximation to x = a − b. Then

    |x − x̂|/|x| = |a − b − (a(1 + δa) − b(1 + δb))|/|a − b|
                = |−aδa + bδb|/|a − b|
                ≤ max{|δa|, |δb|} · (|a| + |b|)/|a − b|.
Figure 3: Plot of f(x) = (1 − cos x)/x^2.
So if (|a| + |b|)/|a − b| is large, the relative error in the subtraction of a and b will be large. ⇒ heavy cancellation magnifies uncertainties already present in â and b̂.
The cause of the error in Example 6 is that cos(x) ≈ 1 when x ≈ 0.
Cancellation is not always bad. Perhaps the values of the initial numbers are known exactly (i.e. δa = δb = 0), or the cancellation may not affect the remaining calculation. For example, if x ≫ y ≈ z > 0 then the cancellation in x + (y − z) is harmless.
Quadratic Equation Example
Example 7 (Cancellation in the quadratic equation). Consider the equation ax^2 + bx + c = 0. The two roots are
    x = (−b ± √(b^2 − 4ac))/(2a) if a ≠ 0.
If b^2 ≫ |4ac| then √(b^2 − 4ac) ≈ |b|, so for one choice of the sign the above equation suffers from bad cancellation errors.
To avoid the problem, obtain the larger root, say x1, from
    x_1 = −(b + sign(b)√(b^2 − 4ac))/(2a) if a ≠ 0,
and x2 by
    x_1x_2 = c/a.
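A sketch of this remedy (the function name `stable_roots` and the test coefficients are chosen here for illustration):

```python
import math

def stable_roots(a, b, c):
    # Compute the root of larger magnitude without cancellation, then recover
    # the other root from the product x1 * x2 = c / a.
    disc = math.sqrt(b * b - 4.0 * a * c)
    q = -(b + math.copysign(disc, b)) / 2.0
    x1 = q / a        # larger-magnitude root
    x2 = c / q        # from x1 * x2 = c / a
    return x1, x2

# With b^2 >> |4ac| the naive formula loses most digits in the small root:
a, b, c = 1.0, 1e8, 1.0
naive_small = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
x1, x2 = stable_roots(a, b, c)   # roots are approximately -1e8 and -1e-8
```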
Stability
Recall that the relative error is defined to be

    ‖f̂(x) − f(x)‖/‖f(x)‖.
Definition 7 (Stable). We say that an algorithm f̂ is stable if for each x ∈ X

    ‖f̂(x) − f(x̂)‖/‖f(x̂)‖ = O(u)

for some x̂ with

    ‖x̂ − x‖/‖x‖ = O(u).

Here u is the unit roundoff.
Backward and Forward Stability
Definition 8 (Backward Stable). We say that an algorithm f̂ is backward stable if for each x ∈ X

    f̂(x) = f(x̂)

for some x̂ with

    ‖x̂ − x‖/‖x‖ = O(u).
Definition 9 (Forward Stable). A method is forward stable if it gives forward errors of similar magnitude (taking the condition number into account) to those produced by a backward stable method. That is,

    forward error ≲ backward error × condition number.
Inner Product
Recall that

    fl(x∗y) = (x + ∆x)∗y = x∗(y + ∆y)

where

    ‖∆x‖ ≤ γ_n‖x‖ and ‖∆y‖ ≤ γ_n‖y‖,

with γ_n = nu/(1 − nu). Hence the inner product is backward stable.
Outer Product
The outer product A = xy∗ is stable but not backward stable. Consider Â = fl(xy∗). By definition

    â_ij = x_iy_j(1 + δ_ij).

So

    Â = xy∗ + ∆,

where ∆_ij = x_iy_jδ_ij and ‖∆‖ = O(u)‖xy∗‖. So the algorithm is stable. That is,

    ‖Â − A‖/‖A‖ = ‖xy∗ + ∆ − xy∗‖/‖xy∗‖ = ‖∆‖/‖xy∗‖ = O(u).

But it is not backward stable: Â will not in general be of rank 1, so it is not possible to find x̂ = x + ∆x and ŷ = y + ∆y such that Â = (x + ∆x)(y + ∆y)∗.
Backward Stability and Relative Error
Theorem 1. Suppose a backward stable algorithm is applied to solve a problem f : X → Y with condition number κ. Then the relative errors satisfy

    ‖f̂(x) − f(x)‖/‖f(x)‖ = O(κ(x)u).

Proof: The condition number is

    κ(x) = ‖f′(x)‖ ‖x‖/‖f(x)‖ ≈ (‖f(x + δx) − f(x)‖/‖f(x)‖) · (‖x‖/‖δx‖).

By definition of backward stability, f̂(x) = f(x̂) for some x̂ ∈ X such that

    ‖x̂ − x‖/‖x‖ = O(u).

Take x̂ = x + δx. From the definition of the condition number,

    ‖f̂(x) − f(x)‖/‖f(x)‖ = ‖f(x + δx) − f(x)‖/‖f(x)‖ ≈ (‖δx‖/‖x‖) κ(x) = O(κ(x)u).
Matrix Norms
Definition 10 (Matrix Norms). Given a vector norm, we can define the corresponding matrix norm as follows:

    ‖A‖ = max_{‖x‖≠0} ‖Ax‖/‖x‖.
These norms are subordinate to the vector norms. For the 1-norm and ∞-norm they simplify to:

• ‖A‖_1 = max_j Σ_{i=1}^n |a_ij| (maximum column sum).
• ‖A‖_∞ = max_i Σ_{j=1}^n |a_ij| (maximum row sum).
Condition Number of a matrix
Let b be fixed and consider the problem of computing x = A^{−1}b, where A is square and nonsingular.
Definition 11 (Condition Number). The condition number of this problem with respect to perturbations in A is
κ = ‖A‖∥∥A−1∥∥ = κ(A).
If ‖·‖ = ‖·‖_2, then ‖A‖ = σ_1 and ‖A^{−1}‖ = 1/σ_m, where σ_1 is the maximum singular value and σ_m is the minimum singular value. So

    κ(A) = σ_1/σ_m.
For a rectangular matrix A ∈ C^{m,n} of full rank, m ≥ n,

    κ = ‖A‖ ‖A^+‖ = κ(A).
Theorem 1.2. Now let E = Â − A = ∆A. Consider Ax = b and Âx̂ = b, where b is non-zero. Then

    ‖x − x̂‖/‖x̂‖ ≤ ‖A^{−1}E‖ ≤ κ(A) ‖E‖/‖A‖.

Proof: Consider (A + E)x̂ = b. Then

    A^{−1}(A + E)x̂ = A^{−1}b
    ⇔ A^{−1}(A + E)x̂ = x
    ⇔ (I + A^{−1}E)x̂ = x
    ⇔ x − x̂ = A^{−1}Ex̂
    ⇒ ‖x − x̂‖ ≤ ‖A^{−1}E‖ ‖x̂‖.

Hence

    ‖x − x̂‖/‖x̂‖ ≤ ‖A^{−1}E‖ ≤ ‖A^{−1}‖ ‖E‖ = κ(A) ‖E‖/‖A‖.
As an aside, note that if

    ‖y − x‖/‖x‖ ≤ ρ < 1

then

    ‖x − y‖/‖y‖ ≤ ρ/(1 − ρ),

because

    ρ‖x‖ ≥ ‖y − x‖ ≥ ‖x‖ − ‖y‖,

so

    (1 − ρ)‖x‖ ≤ ‖y‖

and

    ‖x − y‖/‖y‖ ≤ ‖x − y‖/((1 − ρ)‖x‖) ≤ ρ/(1 − ρ).
Therefore

    ‖x − x̂‖/‖x̂‖ ≤ κ(A) ‖E‖/‖A‖

gives

    ‖x − x̂‖/‖x‖ ≤ (κ(A)‖E‖/‖A‖) / (1 − κ(A)‖E‖/‖A‖) ≲ κ(A) ‖E‖/‖A‖,

assuming κ(A)‖E‖/‖A‖ is small.
Suppose ‖E‖ ≤ u‖A‖; then

    ‖x − x̂‖/‖x‖ ≲ κ(A)u.
Table 1: Example data points

    i   t_i   y_i        i   t_i   y_i
    1   2     2          3   6     28
    2   4     11         4   8     40
2 Beyond Square Matrices
References
• Numerical Linear Algebra by Lloyd N. Trefethen and David Bau, III. SIAM 1997.
• Applied Numerical Linear Algebra by James W. Demmel. SIAM 1997.
Data Fitting
Example 8 (Data fitting). Suppose we want to find a line at + b that best fits the data given in Table 1.
In linear least squares we minimise

    Σ_{i=1}^m [y_i − (at_i + b)]^2.

To find the minimum use the derivatives

    0 = ∂/∂a ( Σ_{i=1}^m [y_i − (at_i + b)]^2 )

and

    0 = ∂/∂b ( Σ_{i=1}^m [y_i − (at_i + b)]^2 ).
Evaluating the derivatives and solving the resulting system of equations gives the solution.
    0 = Σ_{i=1}^m 2[y_i − (at_i + b)](−t_i)  ⇒  Σ_{i=1}^m y_it_i = (Σ_{i=1}^m t_i^2) a + (Σ_{i=1}^m t_i) b.

    0 = Σ_{i=1}^m 2[y_i − (at_i + b)](−1)  ⇒  Σ_{i=1}^m y_i = (Σ_{i=1}^m t_i) a + (Σ_{i=1}^m 1) b.

As a system of equations (sums over i = 1, · · · , m),

    [ n     Σt_i   ] [ b ]   [ Σy_i    ]
    [ Σt_i  Σt_i^2 ] [ a ] = [ Σy_it_i ].
Normal Equations
In matrix form, linear least squares is equivalent to minimising
||r||22 = r∗r, where r = b−Ax.
Now,

    ‖r‖_2^2 = (b − Ax)∗(b − Ax)
            = (b∗ − x∗A∗)(b − Ax)
            = b∗b − 2x∗A∗b + x∗A∗Ax

and

    d(‖r‖_2^2)/dx = 2A∗Ax − 2A∗b.

To minimise ‖r‖_2^2 we want to solve the normal equations

    A∗Ax = A∗b.
Note that A∗A ∈ C^{n,n}, so its size is usually small. If rank(A) = n then A∗A is nonsingular, so a solution exists. A∗A is symmetric and positive definite (SPD), so Cholesky factorisation can be used to find
A∗A = LL∗.
However, there is a problem with the condition number; cond(A∗A) = [cond(A)]2.
Example 9 (Data fitting continued). Consider the example on the first slide. Then

    A = [ 1  2
          1  4
          1  6
          1  8 ],   x = [ b
                          a ],   b = [ 2
                                       11
                                       28
                                       40 ].

    A∗A = [ 4   20
            20  120 ] = [ n     Σt_i
                          Σt_i  Σt_i^2 ],   A∗b = [ 81
                                                    536 ] = [ Σy_i
                                                              Σy_it_i ].

Hence

    x = [ b     [ −12.5
          a ] =   6.55  ],

i.e. the best fitting line is y = 6.55t − 12.5.
QR factorisation
We are taught in school to solve equations like Ax = b by using Gaussian elimination. Gaussian elimination is also known as the LU factorisation because it decomposes A into a lower and an upper triangular matrix, A = LU (see the revision chapter).
In the QR factorisation A is decomposed into a matrix Q with orthonormal columns and an upper triangular matrix R.
Consider

    Ax = b ⇐⇒ QRx = b ⇐⇒ Rx = Q∗b.

Now, y = Q∗b is easy to calculate and Rx = y is easy to solve since R is an upper triangular matrix.
Full QR factorisation
Definition 1 (Full QR factorisation). In the full QR factorisation we write A ∈ Cm,n as
A = QR
where Q is an m × m unitary matrix and R is m × n upper triangular (the last m − n rows are zero, if m ≥ n).
Reduced QR Factorisation
Let q_1, q_2, · · · , q_j be a set of orthonormal vectors. Suppose we have a matrix A with columns a_1, a_2, · · · , a_n, and suppose that A has full rank. Can we find q_1, q_2, · · · , q_j such that

    ⟨q_1, q_2, · · · , q_j⟩ = ⟨a_1, a_2, · · · , a_j⟩ for j = 1, · · · , n?
In particular we would like to write the columns of A as

    [ a_1  a_2  · · ·  a_n ] = [ q_1  q_2  · · ·  q_n ] [ r_11  r_12  · · ·  r_1n
                                                                r_22  · · ·  r_2n
                                                                       ⋱     ⋮
                                                                             r_nn ]

where r_kk ≠ 0.
Note that this is equivalent to saying
a1 = r11q1,
a2 = r12q1 + r22q2,
a3 = r13q1 + r23q2 + r33q3
So A = QR where the columns of Q are orthonormal and R is a square upper triangular matrix. This is the reduced QR factorisation of A.
Orthonormal Vectors
Using the ideas given in the Review chapter, we know that given, say, three linearly independent vectors a_1, a_2 and a_3 we can produce three orthonormal vectors q_1, q_2 and q_3 as follows:

1. q_1 = a_1/‖a_1‖_2.
2. v_2 = a_2 − (a_1∗a_2)/(a_1∗a_1) a_1 = a_2 − (q_1∗a_2)q_1.
3. q_2 = v_2/‖v_2‖_2.
4. v_3 = a_3 − (q_1∗a_3)q_1 − (q_2∗a_3)q_2.
5. q_3 = v_3/‖v_3‖_2.
Gram-Schmidt Orthogonalisation
We can generalise the ideas from the previous slide to construct a set of orthonormal vectors given a set of linearly independent vectors.
Let us rewrite this as

    q_1 = a_1/r_11,
    q_2 = (a_2 − r_12q_1)/r_22,
    q_3 = (a_3 − r_13q_1 − r_23q_2)/r_33,
    ...
    q_n = (a_n − Σ_{i=1}^{n−1} r_in q_i)/r_nn.
This implies that an appropriate choice for r_ij, if i ≠ j, is r_ij = q_i∗a_j. The coefficients r_jj are chosen to ensure that q_j is normalised:
    |r_jj| = ‖v_j‖_2 = ‖ a_j − Σ_{i=1}^{j−1} r_ij q_i ‖_2.
Consequently we have the following (classical) Gram-Schmidt algorithm:

Algorithm 4. Gram-Schmidt
1: for j = 1 to n do
2:   v_j = a_j
3:   for i = 1 to j − 1 do
4:     r_ij = q_i∗a_j
5:     v_j = v_j − r_ij q_i
6:   end for
7:   r_jj = ‖v_j‖_2
8:   q_j = v_j/r_jj
9: end for
This method unfortunately turns out to be numerically unstable if the columns are nearly linearly dependent. Later we will look at a modified version of this algorithm.
Example 10.

    A = [ 1  −1.0  1.00
          1  −0.5  0.25
          1   0.0  0.00
          1   0.5  0.25
          1   1.0  1.00 ].
j = 1:

    v_1 = [1 1 1 1 1]^T,
    r_11 = √5,
    q_1 = (1/√5)[1 1 1 1 1]^T.

j = 2:

    v_2 = [−1.0 −0.5 0.0 0.5 1.0]^T.

• i = 1: r_12 = 0, so v_2 is unchanged.

    r_22 = √5/√2,
    q_2 = (√2/√5)[−1.0 −0.5 0.0 0.5 1.0]^T.

j = 3:

    v_3 = [1.0 0.25 0.0 0.25 1.0]^T.

• i = 1: r_13 = √5/2,

    v_3 = [1.0 0.25 0.0 0.25 1.0]^T − (√5/2)(1/√5)[1 1 1 1 1]^T = [0.5 −0.25 −0.5 −0.25 0.5]^T.

• i = 2: r_23 = 0, so v_3 is unchanged.

    r_33 = √7/√8,
    q_3 = (√8/√7)[0.5 −0.25 −0.5 −0.25 0.5]^T.
Therefore

    A = [ 1/√5  −√2/√5      √8/(2√7)
          1/√5  −√2/(2√5)  −√8/(4√7)
          1/√5   0.0       −√8/(2√7)
          1/√5   √2/(2√5)  −√8/(4√7)
          1/√5   √2/√5      √8/(2√7) ] [ √5  0      √5/2
                                         0   √5/√2  0
                                         0   0      √7/√8 ].
Existence
Theorem 2. Every A ∈ Cm,n (m ≥ n) has a reduced (and full) QR factorisation.
Proof: Suppose A has full rank. The Gram-Schmidt algorithm gives a proof of existence by construction. The only place where the algorithm could fail is if r_jj = ‖v_j‖_2 = 0, so that q_j = v_j/r_jj is not defined. This would in turn imply that

    a_j ∈ ⟨q_1, · · · , q_{j−1}⟩ = ⟨a_1, · · · , a_{j−1}⟩,

a contradiction to A having full rank. (Recall |r_jj| = ‖a_j − Σ_{i=1}^{j−1} r_ij q_i‖_2, so if r_jj = 0 then a_j = Σ_{i=1}^{j−1} r_ij q_i.)
Suppose A does not have full rank. Then at some stage of the algorithm we will get r_jj = 0. So we just pick q_j arbitrarily to be any normalised vector orthogonal to ⟨q_1, · · · , q_{j−1}⟩ and continue the process. Recall a_1 = r_11q_1, a_2 = r_12q_1 + r_22q_2, a_3 = r_13q_1 + r_23q_2 + r_33q_3, etc. If, for example, a_2 were a multiple of a_1 then a_2 = r_12q_1, hence r_22 = 0 and q_2 may be chosen arbitrarily.
Modified Gram-Schmidt
In the classical Gram-Schmidt algorithm a_j appears in the computations only at the jth stage. The method can be rearranged so that as soon as q_i is computed, all of the remaining vectors are orthogonalised against q_i. Another way of looking at this is that we generate R by rows rather than by columns.
Algorithm 5. Modified Gram-Schmidt
1: for i = 1 to n do
2:   v_i = a_i
3: end for
4: for i = 1 to n do
5:   r_ii = ‖v_i‖_2
6:   q_i = v_i/r_ii
7:   for j = i + 1 to n do
8:     r_ij = q_i∗v_j
9:     v_j = v_j − r_ij q_i
10:  end for
11: end for
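Both variants can be sketched in a few lines (an assumption: this NumPy version is written for this article, arranged column-at-a-time, with a `modified` flag switching the projection between the classical and modified rules):

```python
import numpy as np

def gram_schmidt(A, modified=True):
    # Reduced QR of A (m x n, full rank) by classical or modified Gram-Schmidt.
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            # Classical projects against the original column a_j; modified
            # projects against the partially orthogonalised v_j.
            R[i, j] = Q[:, i] @ (v if modified else A[:, j])
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

# The matrix from Example 10:
A = np.array([[1, -1.0, 1.00],
              [1, -0.5, 0.25],
              [1,  0.0, 0.00],
              [1,  0.5, 0.25],
              [1,  1.0, 1.00]])
Q, R = gram_schmidt(A)
```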
Householder Transformations
A Householder transformation matrix F is of the form

    F = I − 2vv∗/(v∗v).

Note that F = F∗ = F^{−1}. Given a vector a we wish to choose the vector v such that

    Fa = [α 0 · · · 0]^T = αe_1.

Then

    αe_1 = Fa = (I − 2vv∗/(v∗v)) a = a − v (2v∗a)/(v∗v),

and

    v = (a − αe_1) (v∗v)/(2v∗a).

We can ignore the (v∗v)/(2v∗a) factor since it cancels when v is substituted into F. Thus

    v = a − αe_1.

From

    ‖a‖_2^2 = ‖Fa‖_2^2 = ‖αe_1‖_2^2 = |α|^2,

we get α = ±‖a‖_2. The sign is chosen to avoid cancellation errors.
Example 11 (Householder).

    a = [2 1 2]^T.

    v = a − αe_1 = [2 1 2]^T − α[1 0 0]^T = [2 − α, 1, 2]^T.

α = ±‖a‖_2 = ±3. Since a_1 is positive we take α = −3. Therefore

    v = [5 1 2]^T.
Table 2: Example data set for Householder algorithm.

    i   x_i    y_i
    0   0.00   1.0000
    1   0.25   1.2840
    2   0.50   1.6487
    3   0.75   2.1170
    4   1.00   2.7183
Note

    Fa = a − (2v^Ta/(v^Tv)) v = [2 1 2]^T − (2 × 15/30)[5 1 2]^T = [−3 0 0]^T.
Given a matrix A we use Householder transformations to introduce zeros column by column below the diagonal:

    Q_n · · · Q_1A = [ R′
                       0  ],

where R′ is upper triangular and

    Q_i = [ I  0
            0  F_i ].

Take Q∗ = Q_n · · · Q_1; then A = QR where

    R = [ R′
          0  ].
Example 12 (Householder). Let's try to fit the quadratic a + bx + cx^2 to the data given in Table 2.
    Ax = [ 1.0  0.00  0.0000
           1.0  0.25  0.0625
           1.0  0.50  0.2500
           1.0  0.75  0.5625
           1.0  1.00  1.0000 ] [a b c]^T = [1.0000 1.2840 1.6487 2.1170 2.7183]^T.
    v_1 = [1.0 1.0 1.0 1.0 1.0]^T − (−2.23607)[1.0 0.0 0.0 0.0 0.0]^T = [3.23607 1.0 1.0 1.0 1.0]^T.

From

    Fa = a − (2v^Ta/(v^Tv)) v,

we need

    v_1^Tv_1 = 14.47214,  v_1^Ta_2 = 2.5,  v_1^Ta_3 = 1.875,  v_1^Tb = 11.00407.
    F_1[0.00 0.25 0.50 0.75 1.00]^T
        = [0.00 0.25 0.50 0.75 1.00]^T − (2 × 2.5/14.47214)[3.23607 1.0 1.0 1.0 1.0]^T
        = [−1.11803 −0.09549 0.15451 0.40451 0.65451]^T.

    F_1[0.0000 0.0625 0.2500 0.5625 1.0000]^T
        = [0.0000 0.0625 0.2500 0.5625 1.0000]^T − (2 × 1.875/14.47214)[3.23607 1.0 1.0 1.0 1.0]^T
        = [−0.83853 −0.19662 −0.00912 0.30338 0.74088]^T.

    F_1[1.0000 1.2840 1.6487 2.1170 2.7183]^T
        = [1.0000 1.2840 1.6487 2.1170 2.7183]^T − (2 × 11.00407/14.47214)[3.23607 1.0 1.0 1.0 1.0]^T
        = [−3.92117 −0.23672 0.12798 0.59628 1.19758]^T.

    F_1A = [ −2.23607  −1.11803  −0.83853
              0.00000  −0.09549  −0.19662
              0.00000   0.15451  −0.00912
              0.00000   0.40451   0.30338
              0.00000   0.65451   0.74088 ],   F_1b = [ −3.92117
                                                        −0.23672
                                                         0.12798
                                                         0.59628
                                                         1.19758 ].
    v_2 = [−0.09549 0.15451 0.40451 0.65451]^T − 0.79057[1 0 0 0]^T = [−0.88607 0.15451 0.40451 0.65451]^T.

    v_2^Tv_2 = 1.40099,  v_2^Ta_3 = 0.78044,  v_2^Tb = 1.25455.
    F_2[−0.19662 −0.00912 0.30338 0.74088]^T
        = [−0.19662 −0.00912 0.30338 0.74088]^T − (2 × 0.78044/1.40099)[−0.88607 0.15451 0.40451 0.65451]^T
        = [0.79056 −0.18126 −0.14730 0.01167]^T.

    F_2[−0.23672 0.12798 0.59628 1.19758]^T
        = [−0.23672 0.12798 0.59628 1.19758]^T − (2 × 1.25455/1.40099)[−0.88607 0.15451 0.40451 0.65451]^T
        = [1.35017 −0.14874 −0.12818 0.02539]^T.

    F_2F_1A = [ −2.23607  −1.11803  −0.83853
                 0.00000   0.79056   0.79056
                 0.00000   0.00000  −0.18126
                 0.00000   0.00000  −0.14730
                 0.00000   0.00000   0.01167 ],   F_2F_1b = [ −3.92117
                                                               1.35017
                                                              −0.14874
                                                              −0.12818
                                                               0.02539 ].
    v_3 = [−0.18126 −0.14730 0.01167]^T − 0.23386[1 0 0]^T = [−0.41512 −0.14730 0.01167]^T.

    v_3^Tv_3 = 0.19416,  v_3^Tb = 0.08092.
    F_3[−0.14874 −0.12818 0.02539]^T
        = [−0.14874 −0.12818 0.02539]^T − (2 × 0.08092/0.19416)[−0.41512 −0.14730 0.01167]^T
        = [0.19728 −0.00540 0.01566]^T.

    F_3F_2F_1A = [ −2.23607  −1.11803  −0.83853
                    0.00000   0.79056   0.79056
                    0.00000   0.00000   0.23386
                    0.00000   0.00000   0.00000
                    0.00000   0.00000   0.00000 ],   F_3F_2F_1b = [ −3.92117
                                                                     1.35017
                                                                     0.19728
                                                                    −0.00540
                                                                     0.01566 ].

So, back substitution gives

    c = 0.19728/0.23386 = 0.8436,
    b = (1.35017 − 0.8436 × 0.79056)/0.79056 = 0.8643,
    a = (−3.92117 − (−1.11803 × 0.8643) − (−0.83853 × 0.8436))/(−2.23607) = 1.0051.
QR Factorisation
Given an m × n matrix A with m ≥ n, find an m × n matrix Q̂ with orthonormal columns such that

    A = Q̂R̂,

where R̂ is an n × n upper triangular matrix. Then

    ‖b − Ax‖_2^2 = ‖b − Q̂R̂x‖_2^2 = ‖Q̂∗b − R̂x‖_2^2.
• Householder transformations.
• Givens transformations.
• Gram-Schmidt orthogonalisations.
Solving Overdetermined Systems using Normal Equations
1. Form A∗A and the vector A∗b.
2. Compute the Cholesky factorisation A∗A = LL∗.
3. Solve the lower-triangular system Lw = A∗b for w.
4. Solve the upper-triangular system L∗x = w for x.
Work for algorithm ∼ mn^2 + (1/3)n^3 flops.
Symmetric Positive Definite Matrices
If A is SPD then A = LL∗ for some L.
Algorithm 6. Cholesky factorisation
1: for j = 1 to n do
2:   for k = 1 to j − 1 do
3:     for i = j to n do
4:       a_ij = a_ij − a_ik a_jk
5:     end for
6:   end for
7:   a_jj = √(a_jj)
8:   for k = j + 1 to n do
9:     a_kj = a_kj/a_jj
10:  end for
11: end for
Solving Overdetermined Systems using QR Factorisation
1. Compute the reduced QR factorisation A = QR.
2. Compute the vector Q∗b.
3. Solve the upper-triangular system Rx = Q∗b for x.
Work for algorithm ∼ 2mn^2 − (2/3)n^3 flops.
Solving Overdetermined Systems using SVD
1. Compute the reduced SVD A = UΣV ∗.
2. Compute the vector U∗b.
3. Solve the diagonal system Σw = U∗b for w.
4. Set x = Vw.
Work for algorithm ∼ 2mn^2 + 11n^3 flops.
3 Alternatives to Direct Methods
References
Jonathan Richard Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain.
https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
Direct and Iterative methods
We shall look at the solution of the linear system of equations Ax = b.
Direct Methods solve the system exactly. Examples of direct methods include Gaussian Elimination (LU factorisation), Cholesky factorisation, etc.
Iterative Methods solve the system approximately. Examples include Krylov subspace techniques such as the Conjugate Gradient method, Arnoldi methods, GMRES, Lanczos iterations; and stationary methods like Jacobi, SOR and SSOR.
Direct vs. Iterative

Neither approach beats the other in all cases, hence research is still very active in both fields. What follows are some pros and cons of each.
• Direct methods, in general, require O(m3) work.
• Iterative methods exist which require O(m) or O(m logm) work.
• Many large systems arise from the discretisation of differential or integral equations. Exactsolution is not required. These systems have certain structures which can be exploited byiterative methods.
• Iterative methods preserve sparsity (but there are fast direct methods that reduce fill-in).
• Direct methods require no initial guess, but they take no advantage of it if a good estimatehappens to be available.
• Iterative methods are often dependent on special properties, such as the matrix being symmetric positive definite, and are subject to slow convergence for badly conditioned systems.
• Iterative methods are less readily employed in standard software packages, since the best representation of the matrix is often problem-dependent, whereas direct methods employ more standard storage schemes.
Overview
We use the Conjugate Gradient (CG) method to solve systems of the form Ax = b where A is symmetric and positive definite, i.e. A^T = A and x^TAx > 0 for all x ≠ 0.
Quadratic Form
The quadratic form of a function is

    f(x) = (1/2)x^TAx − b^Tx + c.
Figure 4: Geometrical representation of the solution of a system of equations
The gradient of f(x) is

    f′(x) = [∂f/∂x_1, ∂f/∂x_2, · · · , ∂f/∂x_n]^T = (1/2)A^Tx + (1/2)Ax − b = Ax − b,

since A is symmetric.
Hence the minimum of f(x) is the solution of Ax = b. The solution of the system of equations represented in Figure 4 is the point where the two lines cross, that is (2, −2). The minimum of the quadratic equation shown in Figure 5 is also the solution of the system of equations. The contour plot in Figure 6 shows more clearly that the minimum occurs at (2, −2).
Method of Steepest Descent
In the method of steepest descent we take steps x_1, x_2, · · · down the slope of f(x) until we reach the minimum.
The direction of the step is given by −f ′(x). But,
−f ′(xi) = b−Axi = ri.
Figure 5: A plot of the quadratic equation
Figure 6: A contour plot of the quadratic equation
Figure 7: Take a step in the direction ri
So we can take a step by evaluating
xi+1 = xi + αri.
The next question is how big a step should we take? In other words, how big should α be? Recalling that we want to minimise f(x),

    ∂f(x_{i+1})/∂α = f′(x_{i+1})^T ∂x_{i+1}/∂α = f′(x_{i+1})^T r_i.

So the minimum is reached when f′(x_{i+1}) is orthogonal to r_i. Finally, noting that f′(x_{i+1}) = −r_{i+1}, we can show that

    α = r_i^Tr_i/(r_i^TAr_i).

To see why, consider

    0 = r_{i+1}^Tr_i
      = (b − Ax_{i+1})^Tr_i
      = (b − A(x_i + αr_i))^Tr_i
      = (r_i − αAr_i)^Tr_i
      = r_i^Tr_i − αr_i^TAr_i.

So α = r_i^Tr_i/(r_i^TAr_i).
Figures 7 and 8 show that the method takes a step along the direction r_i until it reaches the minimum of the curve f(x_i + αr_i).
Hence, the method of steepest descent is
Figure 8: Take a step in the direction ri until reaching the minimum of the curve f(xi + αri)
Algorithm 7. Steepest descent
1: r_i = b − Ax_i.
2: α_i = r_i^Tr_i/(r_i^TAr_i).
3: x_{i+1} = x_i + α_ir_i.
Note that x_{i+1} = x_i + αr_i, so −Ax_{i+1} = −Ax_i − αAr_i and r_{i+1} = r_i − αAr_i. We need to calculate Ar_i in Step 2 anyway, so we could replace Step 1 with r_{i+1} = r_i − αAr_i. The problem is that round-off errors will then accumulate, so we should periodically correct the calculation by explicitly computing r_i = b − Ax_i.
The method of steepest descent repeatedly follows the curve down in the steepest direction until it reaches the minimum. See Figure 9.
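A sketch of the complete iteration, including the periodic residual correction discussed above (an assumption: the 2 × 2 test system is the classic example from the Shewchuk reference, whose solution is (2, −2)):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    # Algorithm 7 with the cheap residual recurrence r <- r - alpha*A r,
    # periodically refreshed by the explicit r = b - A x.
    x = np.array(x0, dtype=float)
    r = b - A @ x
    bnorm = np.linalg.norm(b)
    for i in range(max_iter):
        if np.linalg.norm(r) <= tol * bnorm:
            break
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)
        x = x + alpha * r
        if i % 50 == 49:
            r = b - A @ x        # correct accumulated round-off
        else:
            r = r - alpha * Ar
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # SPD test system
b = np.array([2.0, -8.0])                # exact solution x = (2, -2)
x = steepest_descent(A, b, np.zeros(2))
```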
Convergence Analysis for Steepest Descent
It is easier to analyse the method if we work in the energy norm,

    ‖x‖_A = (x^TAx)^{1/2}.

Let v_j be a set of orthonormal eigenvectors of A with corresponding eigenvalues λ_j. The error, e_i = x − x_i, can then be written as a linear combination of the v_j,

    e_i = Σ_{j=1}^n ξ_jv_j.

By using this notation we can show that

    ‖e_{i+1}‖_A^2 = ω^2‖e_i‖_A^2,

where

    ω^2 = 1 − (Σ_j ξ_j^2λ_j^2)^2 / ((Σ_j ξ_j^2λ_j^3)(Σ_j ξ_j^2λ_j)).
Figure 9: Diagrammatic representation of the method of steepest descent
Firstly, note that

    e_i^TAe_i = (Σ_j ξ_jv_j)^T A (Σ_j ξ_jv_j) = (Σ_j ξ_jv_j)^T (Σ_j ξ_jλ_jv_j) = Σ_j ξ_j^2λ_j,

and

    r_i^Tr_i = (Ae_i)^T(Ae_i) = (Σ_j ξ_jλ_jv_j)^T(Σ_j ξ_jλ_jv_j) = Σ_j ξ_j^2λ_j^2.
Now

    ‖e_{i+1}‖_A^2 = e_{i+1}^TAe_{i+1}
                  = (e_i − α_ir_i)^TA(e_i − α_ir_i)
                  = e_i^TAe_i − 2α_ir_i^TAe_i + α_i^2r_i^TAr_i
                  = ‖e_i‖_A^2 − 2 (r_i^Tr_i/(r_i^TAr_i))(r_i^Tr_i) + (r_i^Tr_i/(r_i^TAr_i))^2 r_i^TAr_i
                  = ‖e_i‖_A^2 − (r_i^Tr_i)^2/(r_i^TAr_i)
                  = ‖e_i‖_A^2 (1 − (r_i^Tr_i)^2/((r_i^TAr_i)(e_i^TAe_i)))
                  = ‖e_i‖_A^2 (1 − (Σ_j ξ_j^2λ_j^2)^2/((Σ_j ξ_j^2λ_j^3)(Σ_j ξ_j^2λ_j)))
                  = ω^2‖e_i‖_A^2,

where

    ω^2 = 1 − (Σ_j ξ_j^2λ_j^2)^2 / ((Σ_j ξ_j^2λ_j^3)(Σ_j ξ_j^2λ_j)).
If we set

    κ = λ_max/λ_min,

it can also be shown (see reference) that

    ω ≤ (κ − 1)/(κ + 1).
So the convergence rate of steepest descent depends on the condition number as well as the initial guess. See Figures 10, 11 and 12.
Method of Conjugate Directions
Steepest descent often takes several steps in the same direction. Let's pick a set of orthogonal search directions d_0, d_1, · · · , d_{n−1} and only take one step in each direction. So,
xi+1 = xi + αidi.
After n steps we are done. See Figure 13.

What is the value of α_i? We want e_{i+1} and d_i to be orthogonal, so that we do not go back in the direction of d_i again. Then
d_i^T e_{i+1} = 0
d_i^T (e_i − α_i d_i) = 0
α_i = (d_i^T e_i) / (d_i^T d_i).
Convergence of Steepest Descent (per iteration) worsens as the condition number of the matrix increases.
Figure 10: The convergence rate depends on the condition number
Figure 11: (a) Large κ, small µ; (b) An example of poor convergence, κ and µ are both large
Figure 12: (c) small κ, small µ; (d) small κ, large µ
Figure 13: The idea behind the method of conjugate directions is to only take one step in each direction
Figure 14: A plot of A-orthogonal vectors
But we don't know e_i. So instead we make the two vectors A-orthogonal, i.e. set
dTi Aei+1 = 0.
Figures 14 and 15 show the difference between orthogonal and A-orthogonal vectors. Recall
x_{i+1} = x_i + α_i d_i

so

e_{i+1} = e_i − α_i d_i.

If

d_i^T A e_{i+1} = 0,

then

d_i^T A (e_i − α_i d_i) = 0,
d_i^T A e_i − α_i d_i^T A d_i = 0,

and

α_i = (d_i^T A e_i) / (d_i^T A d_i).

Since A e_i = r_i, this gives

α_i = (d_i^T r_i) / (d_i^T A d_i).
If d_i = r_i this is the steepest descent method.
Conjugate Directions and Error
Let's express the initial error as a linear combination of the search directions:
e_0 = Σ_{j=0}^{n−1} σ_j d_j.
Figure 15: Compare these orthogonal vectors with the A-orthogonal vectors in Figure 14
Now, d_k^T A e_0 = Σ_{j=0}^{n−1} σ_j d_k^T A d_j = σ_k d_k^T A d_k.
So,

σ_k = (d_k^T A e_0) / (d_k^T A d_k)
= d_k^T A (e_0 − Σ_{i=0}^{k−1} α_i d_i) / (d_k^T A d_k)    (the added terms vanish by A-orthogonality)
= (d_k^T A e_k) / (d_k^T A d_k)    using e_{i+1} = e_i − α_i d_i
= (d_k^T r_k) / (d_k^T A d_k)
= α_k.
This gives

e_i = e_0 − Σ_{j=0}^{i−1} α_j d_j = Σ_{j=0}^{n−1} σ_j d_j − Σ_{j=0}^{i−1} σ_j d_j = Σ_{j=i}^{n−1} σ_j d_j.
Hence, after n iterations every component of the error has been cut away.
Gram-Schmidt Conjugation
How do we find the search directions d_i? We can use the conjugate Gram-Schmidt process.
Suppose we have n linearly independent vectors u_0, ..., u_{n−1}. To construct d_i we take u_i and subtract any components not A-orthogonal to the previous d vectors. Set d_0 = u_0 and
d_i = u_i + Σ_{k=0}^{i−1} β_{ik} d_k,    i > 0.
Using A-orthogonality,

0 = d_i^T A d_j = u_i^T A d_j + Σ_{k=0}^{i−1} β_{ik} d_k^T A d_j = u_i^T A d_j + β_{ij} d_j^T A d_j

(j < i), and

β_{ij} = −(u_i^T A d_j) / (d_j^T A d_j).
The problem with this approach is that we need to keep all search directions in memory.
Conjugate Gradients
How do we choose the conjugate directions? The CG method is the method of conjugate directions where the search directions are constructed by conjugation of the residuals (setting u_i = r_i). Note that
dTi rj = 0 if i < j,
so the residuals are orthogonal to the previous search directions, and each residual is thus guaranteed to produce a new linearly independent search direction, unless the residual is zero.
Let D_i be the i-dimensional subspace spanned by d_0, d_1, ..., d_{i−1}. Then r_i is orthogonal to D_i:
d_i^T r_j = d_i^T A e_j = d_i^T A (Σ_{k=j}^{n−1} α_k d_k) = Σ_{k=j}^{n−1} α_k d_i^T A d_k = 0 if i < j, by A-orthogonality.
Also

d_i^T r_j = u_i^T r_j + Σ_{k=0}^{i−1} β_{ik} d_k^T r_j,

so 0 = u_i^T r_j if i < j. This means that r_i^T r_j = 0 if i ≠ j, since we set u_i = r_i. Recall

β_{ij} = −(u_i^T A d_j) / (d_j^T A d_j) = −(r_i^T A d_j) / (d_j^T A d_j).
Now, r_{i+1} = A e_{i+1} = A(e_i − α_i d_i) = r_i − α_i A d_i. So

r_i^T r_{j+1} = r_i^T r_j − α_j r_i^T A d_j,
α_j r_i^T A d_j = r_i^T r_j − r_i^T r_{j+1}.

Using the orthogonality of the residuals,

r_i^T A d_j = (1/α_i) r_i^T r_i           if i = j,
            = −(1/α_{i−1}) r_i^T r_i      if i = j + 1,
            = 0                           otherwise.

So, if i > j,

β_{ij} = −(r_i^T A d_j) / (d_j^T A d_j) = (1/α_{i−1}) (r_i^T r_i) / (d_j^T A d_j)   if i = j + 1,
                                        = 0                                          otherwise.
Hence we only need to keep the previous search direction. Let β_i = β_{i,i−1} and evaluate:
β_i = (1/α_{i−1}) (r_i^T r_i) / (d_{i−1}^T A d_{i−1})
= [(d_{i−1}^T A d_{i−1}) / (d_{i−1}^T r_{i−1})] (r_i^T r_i) / (d_{i−1}^T A d_{i−1})    using the definition of α_{i−1}
= (r_i^T r_i) / (d_{i−1}^T r_{i−1})
= (r_i^T r_i) / [(u_{i−1} + Σ_{k=0}^{i−2} β_{i−1,k} d_k)^T r_{i−1}]
= (r_i^T r_i) / (u_{i−1}^T r_{i−1})
= (r_i^T r_i) / (r_{i−1}^T r_{i−1}).
Finally, the method of Conjugate Gradients is
Algorithm 8. Conjugate Gradient
1: d_0 = r_0 = b − A x_0.
2: α_i = (r_i^T r_i) / (d_i^T A d_i).
3: x_{i+1} = x_i + α_i d_i.
4: r_{i+1} = r_i − α_i A d_i.
5: β_{i+1} = (r_{i+1}^T r_{i+1}) / (r_i^T r_i).
6: d_{i+1} = r_{i+1} + β_{i+1} d_i.
When solving a two dimensional problem using the Conjugate Gradient method the solution will be found in two steps. See Figure 16.
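Algorithm 8 can be sketched in Python with NumPy (a hedged translation; the function name and the small SPD test system are our own). In exact arithmetic the method terminates in at most n steps, so the 2×2 example below is solved in two iterations:

```python
import numpy as np

def conjugate_gradient(A, b, x0, max_it=None, tol=1e-12):
    """Conjugate Gradient for SPD A (sketch of Algorithm 8)."""
    x = x0.astype(float)
    r = b - A @ x                          # Step 1: d_0 = r_0 = b - A x_0
    d = r.copy()
    max_it = max_it or len(b)              # at most n steps in exact arithmetic
    for _ in range(max_it):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)         # Step 2
        x = x + alpha * d                  # Step 3
        r_new = r - alpha * Ad             # Step 4
        beta = (r_new @ r_new) / (r @ r)   # Step 5
        d = r_new + beta * d               # Step 6
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])     # SPD example (illustrative)
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, np.zeros(2))
print(np.allclose(A @ x, b))               # True
```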
It can be shown (see reference) that the convergence rate of the CG method is given by
||e_i||_A ≤ 2 [(√κ − 1)/(√κ + 1)]^i ||e_0||_A.
Figure 16: The solution of the model problem using the Conjugate Gradient method
Recall that the convergence rate of the steepest descent method depended on κ, whereas the convergence rate for the CG method depends on √κ. As shown in Figure 17, this makes a big difference to the convergence rate.
Preconditioners
Consider the linear system Ax = b.
We have shown that the convergence rate of iterative methods depends on the condition number of A.
Now consider the linear system M^{-1}Ax = M^{-1}b.
This has the same solution as the original system. If we can find an M with favourable properties (M^{-1}A has a small condition number) then we can solve the system a lot faster. Such an M is called a preconditioner.
Left and Right Preconditioners
In general, left and right preconditioners can be used. Write M = M_1 M_2 and transform the system to

M_1^{-1} A M_2^{-1} (M_2 x) = M_1^{-1} b,

where M_1 and M_2 are the left and right preconditioners. This formulation makes it easier to preserve certain properties of A, such as symmetry and positive definiteness.
Left panel: Convergence of Conjugate Gradients (per iteration) as a function of condition number. Right panel: Number of iterations of Steepest Descent required to match one iteration of CG.
Figure 17: Comparison of the convergence rate of the steepest descent with the CG method
The following slides describe some preconditioners. This is a very short (superficial) overview; many other preconditioners are available.
Jacobi Preconditioner
• Point Jacobi Preconditioner:
The simplest preconditioner is to let M = diag(A). This is cheap, easy and can be effective for some problems, although more sophisticated methods are available.
• Block Jacobi Preconditioner:
Let
m_{i,j} = a_{i,j}   if i, j are in some index set,
        = 0         otherwise.
Incomplete Factorisation
Suppose A is sparse. The factors formed by LU decomposition or Cholesky factorisation may not be very sparse.
Suppose L is computed by Cholesky-like formulas but allowed to have non-zeros only in positions where A has non-zeros. Define M = LL^T.
This is incomplete Cholesky. Similar ILU (incomplete LU) decompositions are available in the nonsymmetric case, as well as other modifications of the above idea.
This method is not cheap, but improved convergence rate may be enough to recover the cost.
Preconditioners that depend on the PDE
• Low-Order Discretisations:
A high-order method gives a more accurate approximation, but the discretisation stencils are large and the matrix less sparse. A lower-order discretisation can be used as a preconditioner.
• Constant Coefficients:
Fast solvers are available for certain PDEs with constant coefficients. For problems with variable coefficients, a constant coefficient approximation may be a good preconditioner.
• Splitting of multi-term operator:
Many applications involve a combination of physical processes. It may be possible to write A as A = A_1 + A_2. If A_1 or A_2 is easily invertible it may be a good preconditioner.
• The multigrid method is a good preconditioner (see Bramble)
If the problem contains long and short range couplings, it may be possible to ignore the long range couplings in the preconditioner.
The Transformed Preconditioned Conjugate Gradient Method uses

M_1^{-1} A M_1^{-T} x̂ = M_1^{-1} b,    x̂ = M_1^T x,    M = M_1 M_1^T.
Algorithm 9. Transformed Preconditioned CG Method
1: d̂_0 = r̂_0 = M_1^{-1} b − M_1^{-1} A M_1^{-T} x̂_0.
2: α_i = (r̂_i^T r̂_i) / (d̂_i^T M_1^{-1} A M_1^{-T} d̂_i).
3: x̂_{i+1} = x̂_i + α_i d̂_i.
4: r̂_{i+1} = r̂_i − α_i M_1^{-1} A M_1^{-T} d̂_i.
5: β_{i+1} = (r̂_{i+1}^T r̂_{i+1}) / (r̂_i^T r̂_i).
6: d̂_{i+1} = r̂_{i+1} + β_{i+1} d̂_i.
Problem: we need to know M_1.

Set r̂_i = M_1^{-1} r_i, d̂_i = M_1^T d_i and M = M_1 M_1^T.
The Untransformed Preconditioned Conjugate Gradient Method is

Algorithm 10. Untransformed Preconditioned CG Method
1: r_0 = b − A x_0.
2: d_0 = M^{-1} r_0.
3: α_i = (r_i^T M^{-1} r_i) / (d_i^T A d_i).
4: x_{i+1} = x_i + α_i d_i.
5: r_{i+1} = r_i − α_i A d_i.
6: β_{i+1} = (r_{i+1}^T M^{-1} r_{i+1}) / (r_i^T M^{-1} r_i).
7: d_{i+1} = M^{-1} r_{i+1} + β_{i+1} d_i.
Preconditioners
Statements of the form s = M^{-1} r mean that you should apply the preconditioner; you need not explicitly form the matrix M.
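Algorithm 10 can be sketched in Python with NumPy (a hedged illustration; names are our own). The preconditioner is passed as a function that applies M^{-1}, so M is never formed; here a point Jacobi preconditioner M = diag(A) is used on the model system from the stationary-methods section:

```python
import numpy as np

def preconditioned_cg(A, b, x0, apply_Minv, max_it=None, tol=1e-12):
    """Untransformed preconditioned CG (sketch of Algorithm 10).
    apply_Minv(r) applies the preconditioner; M is never formed explicitly."""
    x = x0.astype(float)
    r = b - A @ x                          # Step 1
    s = apply_Minv(r)                      # s = M^{-1} r
    d = s.copy()                           # Step 2
    max_it = max_it or len(b)
    for _ in range(max_it):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ s) / (d @ Ad)         # Step 3: (r^T M^{-1} r) / (d^T A d)
        x = x + alpha * d                  # Step 4
        r_new = r - alpha * Ad             # Step 5
        s_new = apply_Minv(r_new)
        beta = (r_new @ s_new) / (r @ s)   # Step 6
        d = s_new + beta * d               # Step 7
        r, s = r_new, s_new
    return x

A = np.array([[10.0, -1.0, 2.0, 0.0],
              [-1.0, 11.0, -1.0, 3.0],
              [2.0, -1.0, 10.0, -1.0],
              [0.0, 3.0, -1.0, 8.0]])     # symmetric, diagonally dominant (SPD)
b = np.array([6.0, 25.0, -11.0, 15.0])
diag = np.diag(A)
x = preconditioned_cg(A, b, np.zeros(4), lambda r: r / diag)  # point Jacobi M
print(np.allclose(x, [1.0, 2.0, -1.0, 1.0]))                  # True
```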
Example
Example 13 (Preconditioned CG). The A matrix is the block tri-diagonal matrix given by the 5-point stencil on a uniform grid. It is of size 63² × 63².
Note: the number of non-zeros in the original A matrix is 2063. The number of non-zeros in M if it is formed using Incomplete Cholesky with a tolerance of 10^{-2} is 2293, but if the tolerance is tightened to 10^{-3} the number of non-zeros increases to 4835.
Convergence rate
Figure 18 shows how the right choice of preconditioner can greatly influence the convergence rate. What this diagram does not show is that, to decrease the overall run time, we also need a preconditioner that is cheap and easy to apply.
Preconditioners compared (relative residual ||r||/||b|| against iteration number): no preconditioner, Incomplete Cholesky (10^{-2}), Incomplete Cholesky (10^{-3}), Block GS, Block SOR(1.8).
Figure 18: Convergence rate when using various preconditioners for the model problem
4 Building on Simple Methods
References
A Multigrid Tutorial by William Briggs, Van Emden Henson and Steve McCormick, 2nd Edition,SIAM, 2000.
Stationary Iterative Methods
An iterative method for solving a system Ax = b has the form
x(k+1) = Rx(k) + c
where R and c are chosen so that any fixed point of the equation x = Rx + c is a solution of Ax = b. This is called stationary if R and c are constant over all iterates.
Matrix Splitting
Definition 2 (Matrix Splitting). A splitting of A is a decomposition A = M − K, with M nonsingular.
Ax = Mx − Kx = b ⇒ Mx = Kx + b ⇒ x = M^{-1}Kx + M^{-1}b = Rx + c,

where R = M^{-1}K and c = M^{-1}b. We take x_{m+1} = Rx_m + c as our iterative method.
Theorem 3. If ||R|| < 1, then xm+1 = Rxm + c converges for any initial guess x0.
Theorem 4. The iteration xm+1 = Rxm + c converges for any initial guess x0 if and only ifρ(R) < 1.
Jacobi Method
Let D be the diagonal of A, −L the strictly lower triangular part of A, and −U the strictly upper triangular part of A.
Set M = D and K = L + U, so R_J = D^{-1}(L + U) and c_J = D^{-1}b. One step of the Jacobi method is then
Dxm+1 = (L+ U)xm + b.
The Jacobi algorithm is
Algorithm 11. Jacobi
1: for j = 1 to n do
2:   x^{m+1}_j = (1/a_{jj}) (b_j − Σ_{k≠j} a_{jk} x^m_k)
3: end for
Jacobi Example
Table 3: Example output from model problem
k     x1      x2       x3       x4
0     0       0        0        0
1     0.6000  2.2727  -1.1000   1.8750
2     1.0473  1.7159  -0.8052   0.8852
3     0.9326  2.0533  -1.0493   1.1309
4     1.0152  1.9537  -0.9681   0.9738
5     0.9890  2.0114  -1.0103   1.0214
6     1.0032  1.9922  -0.9945   0.9944
7     0.9981  2.0023  -1.0020   1.0036
8     1.0006  1.9987  -0.9990   0.9989
9     0.9997  2.0004  -1.0004   1.0006
10    1.0001  1.9998  -0.9998   0.9998
function iterative(maxn)

A = [10, -1,  2,  0;
     -1, 11, -1,  3;
      2, -1, 10, -1;
      0,  3, -1,  8]        % define the A matrix

b = [6; 25; -11; 15]        % define the RHS

x = [0; 0; 0; 0]            % find an initial guess

max_it = 10;                % maximum number of iterations
tol = 0.001;                % solution tolerance

y = jacobi(A, x, b, max_it, tol)
Table 3 shows the convergence rate of the Jacobi method applied to the model problem.
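The Jacobi sweep of Algorithm 11 can also be sketched in Python with NumPy (an illustrative translation of the MATLAB driver above; names are our own). Running it on the model problem reproduces the iterates of Table 3:

```python
import numpy as np

def jacobi(A, b, x0, max_it):
    """Jacobi sweeps: x_j <- (b_j - sum_{k!=j} a_jk x_k) / a_jj, all from the OLD iterate."""
    D = np.diag(A)
    x = x0.astype(float)
    history = [x.copy()]
    for _ in range(max_it):
        x = (b - (A @ x - D * x)) / D   # A@x - D*x is the off-diagonal sum
        history.append(x.copy())
    return history

A = np.array([[10.0, -1.0, 2.0, 0.0],
              [-1.0, 11.0, -1.0, 3.0],
              [2.0, -1.0, 10.0, -1.0],
              [0.0, 3.0, -1.0, 8.0]])
b = np.array([6.0, 25.0, -11.0, 15.0])
hist = jacobi(A, b, np.zeros(4), 10)
print(np.round(hist[1], 4))   # [ 0.6     2.2727 -1.1     1.875 ], row k=1 of Table 3
```

After 10 sweeps the iterate agrees with the solution [1, 2, -1, 1] to about three decimal places, as in the table.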
Gauss-Seidel Method
Set M = D − L and K = U, so R_GS = (D − L)^{-1}U and c_GS = (D − L)^{-1}b. One step of the Gauss-Seidel method is then
(D − L)xm+1 = Uxm + b.
The Gauss-Seidel algorithm is
Algorithm 12. Gauss-Seidel
1: for j = 1 to n do
2:   x^{m+1}_j = (1/a_{jj}) (b_j − Σ_{k<j} a_{jk} x^{m+1}_k − Σ_{k>j} a_{jk} x^m_k)
3: end for
Table 4: Example output from model problem
k     x1      x2       x3       x4
0     0       0        0        0
1     0.6000  2.3273  -0.9873   0.8789
2     1.0302  2.0369  -1.0145   0.9843
3     1.0066  2.0036  -1.0025   0.9984
4     1.0009  2.0003  -1.0003   0.9998
5     1.0001  2.0000  -1.0000   1.0000
The order in which we update the elements of x_m is important in the Gauss-Seidel method. Two common orderings are the lexicographical ordering and the red-black ordering.
Gauss-Seidel Example
Table 4 shows the convergence rate of the Gauss-Seidel method applied to the model problem.
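A Gauss-Seidel sweep (Algorithm 12) differs from Jacobi only in using each new value as soon as it is computed. The sketch below (Python/NumPy, names our own) reproduces Table 4 on the same model problem:

```python
import numpy as np

def gauss_seidel(A, b, x0, max_it):
    """Gauss-Seidel sweeps: new values are used as soon as they are computed."""
    n = len(b)
    x = x0.astype(float)
    history = [x.copy()]
    for _ in range(max_it):
        for j in range(n):
            sigma = A[j, :j] @ x[:j] + A[j, j+1:] @ x[j+1:]   # mix of new and old values
            x[j] = (b[j] - sigma) / A[j, j]
        history.append(x.copy())
    return history

A = np.array([[10.0, -1.0, 2.0, 0.0],
              [-1.0, 11.0, -1.0, 3.0],
              [2.0, -1.0, 10.0, -1.0],
              [0.0, 3.0, -1.0, 8.0]])
b = np.array([6.0, 25.0, -11.0, 15.0])
hist = gauss_seidel(A, b, np.zeros(4), 5)
print(np.round(hist[1], 4))   # [ 0.6     2.3273 -0.9873  0.8789], row k=1 of Table 4
```

Note that only 5 sweeps are needed to match the accuracy that Jacobi reached in 10, consistent with the "twice as fast" remark later in this section.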
Successive Overrelaxation
Consider ωAx = ωb. Set M = D − ωL and K = (1 − ω)D + ωU, where ω is the relaxation factor. Then R_SOR = (D − ωL)^{-1}((1 − ω)D + ωU) and c_SOR = (D − ωL)^{-1}ωb. One step of the SOR method is then

(D − ωL)x_{m+1} = ((1 − ω)D + ωU)x_m + ωb.
ω modifies the size of the correction. If ω = 1 this is the same as the GS method. If ω > 1 it iscalled overrelaxation. If ω < 1 it is called underrelaxation.
The SOR algorithm is
Algorithm 13. SOR
1: for j = 1 to n do
2:   x^{m+1}_j = (1 − ω)x^m_j + (ω/a_{jj}) (b_j − Σ_{k<j} a_{jk} x^{m+1}_k − Σ_{k>j} a_{jk} x^m_k)
3: end for
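Algorithm 13 is a one-line change to the Gauss-Seidel sweep; the sketch below (Python/NumPy, names our own) shows the weighted update, and checks that ω = 1 recovers Gauss-Seidel on the model problem:

```python
import numpy as np

def sor(A, b, x0, omega, max_it):
    """SOR sweeps: a weighted Gauss-Seidel update (omega = 1 recovers GS)."""
    n = len(b)
    x = x0.astype(float)
    for _ in range(max_it):
        for j in range(n):
            sigma = A[j, :j] @ x[:j] + A[j, j+1:] @ x[j+1:]
            x[j] = (1 - omega) * x[j] + omega * (b[j] - sigma) / A[j, j]
    return x

A = np.array([[10.0, -1.0, 2.0, 0.0],
              [-1.0, 11.0, -1.0, 3.0],
              [2.0, -1.0, 10.0, -1.0],
              [0.0, 3.0, -1.0, 8.0]])
b = np.array([6.0, 25.0, -11.0, 15.0])
print(np.round(sor(A, b, np.zeros(4), 1.0, 1), 4))   # omega = 1: same as one GS sweep
```

Since this A is symmetric positive definite, SOR converges for any 0 < ω < 2 (Ostrowski-Reich); mild overrelaxation such as ω = 1.1 also converges here.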
Some Convergence Properties
Theorem 5. If A is strictly diagonally dominant, then for any choice of x_0 both the Jacobi and Gauss-Seidel methods converge to the solution of Ax = b.
Theorem 6 (Stein-Rosenberg). If a_{ij} ≤ 0 for each i ≠ j and a_{ii} > 0 for each i = 1, 2, ..., n, then one and only one of the following statements holds:
1. 0 < ρ(RGS) < ρ(RJ) < 1,
2. 1 < ρ(RJ) < ρ(RGS),
3. ρ(RJ) = ρ(RGS) = 0,
4. ρ(RJ) = ρ(RGS) = 1.
Model Problem
Let's suppose A has been formed by the standard 5-point finite difference stencil. Then
ρ(G_J) = cos(π/(N + 1)) ≈ 1 − π²/(2(N + 1)²),

where N is the number of nodal points. This means that as N increases, ρ(G_J) approaches 1 and convergence slows down.
If the iterate is, for example, updated in red/black Gauss-Seidel order then ρ(G_GS) = ρ(G_J)². So one sweep of Gauss-Seidel decreases the error as much as two sweeps of Jacobi; thus Gauss-Seidel is twice as fast.
If we use red/black SOR and choose ω = 2/(1 + sin(π/(N + 1))) then

ρ(G_SOR(ω)) ≈ 1 − 2π/(N + 1),

compared to ρ(G_J) = 1 − O(1/N²).
1D Poisson Example
Example 14 (3-point stencil). Number of iterations required to solve the system of equations Ax = b, where A is the 3-point stencil from the finite difference approximation of Poisson's equation.
          [ -2   1   0  ...        0 ]
          [  1  -2   1   0         : ]
(1/h^2)   [  0   .   .   .         : ]  x = b.
          [  :       1  -2   1   0   ]
          [  :           1  -2   1   ]
          [  0  ...      0   1  -2   ]
Figure 19 shows that as the matrix size is increased, the number of iterations required to solve the system of equations also increases.

The number of iterations also depends on the initial guess. Figure 20 shows that using the solution from a coarse grid as an initial guess for the fine grid solution can reduce the number of iterations.
Weighted Jacobi
The weighted Jacobi method is given by

R_ω = (1 − ω)I + ωR_J,    c_ω = ωD^{-1}b.
Number of iterations against matrix size, using the initial guess x = 0 (Jacobi and Gauss-Seidel).
Figure 19: Convergence rate compared to the problem size
Figure 20: Number of iterations required to solve the system if the coarse grid solution is used as an initial guess
Figure 21: Plot of the sine function for different frequencies
So D x_{m+1} = (1 − ω)D x_m + ω(L + U)x_m + ωb.
Therefore one step of the algorithm is given by
x^{m+1}_j = (1/a_{jj}) [ (1 − ω)a_{jj} x^m_j + ω(b_j − Σ_{k≠j} a_{jk} x^m_k) ]
          = (1/a_{jj}) [ a_{jj} x^m_j + ω(b_j − Σ_k a_{jk} x^m_k) ].
The weighted Jacobi method is not used in practice; it is included here to motivate the multigrid method.
Weighted Jacobi and the Three Point Stencil
Consider the following system of equations
−u(xi − h) + 2u(xi)− u(xi + h) = 0, u0 = un = 0.
Now suppose we tried to solve the system of equations using the Jacobi method and we picked v^0 as our initial guess, where
v^0_j = sin(jkπ/n),    0 ≤ j ≤ n,
for some k (1 ≤ k ≤ n − 1). The value k is the wave number. Small values of k give smooth functions, large values give more oscillatory functions. See Figure 21.
Figure 22: Plot of the error for 4 iterations if k = 2
Smoothers
Smoothers like the Jacobi and Gauss-Seidel methods are good at removing the high frequency components of the error but have difficulty removing the low frequency components.
Figure 22 shows that if the error is smooth, i.e. k = 2, then the Jacobi method does a poor job of decreasing the error.
If we increase the frequency slightly to k = 4, then we can see some improvement in the convergence rate. See Figure 23.
If the error is made up of high frequency components, k = 8, then the Jacobi method does a good job of decreasing the error. See Figure 24.
The same results apply to two dimensional problems. Figure 25 shows the results if the initial
Figure 23: Increasing the frequency improves the convergence rate
Figure 24: High frequency components of the error are removed quickly
guess (and in this example the error) is given by
v^0_{j_1 j_2} = sin(j_1 k_1 π/n) sin(j_2 k_2 π/n),    0 ≤ j_1, j_2 ≤ n,
with k_1 = k_2 = 2. Figure 26 shows how the Jacobi method reduces the error if the initial guess has k_1 = k_2 = 4.
Jacobi and Eigenvalues
Recall that stationary methods converge if and only if ρ(R) < 1. Let A be the matrix formed by the 3-point stencil. Then
R_ω = I − (ω/2)A.

So,

λ(R_ω) = 1 − (ω/2)λ(A).
The eigenvalues of A are
λ_k(A) = 4 sin²(kπ/(2n)),    1 ≤ k ≤ n − 1,

with corresponding eigenvectors,

w_{k,j} = sin(jkπ/n),    1 ≤ k ≤ n − 1,    0 ≤ j ≤ n.
The Jacobi method converges on this example if 0 < ω ≤ 1.
Jacobi and Eigenvectors
Let e0 be the error of the initial guess used in the weighted Jacobi method. We can write e0 as
e_0 = Σ_{k=1}^{n−1} c_k w_k,

where c_k is the amount of error contributing to each mode. Now, the error at the mth iteration is

e_m = R_ω^m e_0 = Σ_{k=1}^{n−1} c_k R_ω^m w_k = Σ_{k=1}^{n−1} c_k λ_k^m(R_ω) w_k.
After m iterations the kth mode of the error has been reduced by a factor of λ_k^m. That is, highly oscillatory components are reduced more quickly. See Figure 27.
Figure 25: Reduction in the error when applying the Jacobi method to a smooth problem
Figure 26: Once again, the Jacobi method does a good job of removing the high frequency components of the error
Figure 27: Example where the error is a combination of different frequency components
Figure 28: Plot of λk for different values of ω
Note that
λ_1 = 1 − 2ω sin²(π/(2n)) = 1 − 2ω sin²(πh/2) ≈ 1 − ωπ²h²/2.
In other words, the smoothest mode will always be close to 1. See Figure 28.
Coarse Grids
We can use a coarse grid to find a good initial guess to an iterative scheme. The coarse gridcontains fewer nodes and as we have seen, Jacobi and SOR methods converge faster on coarse grids.
Let Ω^{2h} be a coarse grid defined by taking the even numbered nodes of the fine grid Ω^h. If 1 ≤ k < n/2, then

w^h_{k,2j} = sin(2jkπ/n) = sin(jkπ/(n/2)) = w^{2h}_{k,j},    1 ≤ k < n/2.
So modes 'moved' down to the coarse grid become more oscillatory. This leads to the idea of the 2-grid multigrid method.
• Apply a few iterations of the iterative method to remove the high frequency components of the error.
• Move the problem down to the coarse grid.
• Solve the coarse grid problem.
• Move the problem back up to the fine grid to improve the fine grid solution.
Coarse to Fine Grid Transfer Ih2h
How do we transfer the problem from the coarse grid to the fine grid?

Linear Interpolation: 1D
v^h_{2j} = v^{2h}_j,
v^h_{2j+1} = (1/2)(v^{2h}_j + v^{2h}_{j+1}),    0 ≤ j ≤ n/2 − 1.
Bilinear Interpolation: 2D
v^h_{2i,2j} = v^{2h}_{i,j},
v^h_{2i+1,2j} = (1/2)(v^{2h}_{i,j} + v^{2h}_{i+1,j}),
v^h_{2i,2j+1} = (1/2)(v^{2h}_{i,j} + v^{2h}_{i,j+1}),
v^h_{2i+1,2j+1} = (1/4)(v^{2h}_{i,j} + v^{2h}_{i+1,j} + v^{2h}_{i,j+1} + v^{2h}_{i+1,j+1}),    0 ≤ i, j ≤ n/2 − 1.
Fine to Coarse Grid Transfer I2hh
Restriction: 1D

v^{2h}_j = v^h_{2j},    1 ≤ j ≤ n/2 − 1.

Full Weighting: 1D

v^{2h}_j = (1/4)(v^h_{2j−1} + 2v^h_{2j} + v^h_{2j+1}),    1 ≤ j ≤ n/2 − 1.

Full Weighting: 2D

v^{2h}_{i,j} = (1/16)[ v^h_{2i−1,2j−1} + v^h_{2i−1,2j+1} + v^h_{2i+1,2j−1} + v^h_{2i+1,2j+1}
             + 2(v^h_{2i,2j−1} + v^h_{2i,2j+1} + v^h_{2i−1,2j} + v^h_{2i+1,2j})
             + 4v^h_{2i,2j} ],    1 ≤ i, j ≤ n/2 − 1.
Coarse Grid Approximation of the Error
What is meant by 'move the problem down to the coarse grid'? Recall,

r^h = b^h − A^h v^h = A^h u^h − A^h v^h = A^h e^h.
Algorithm 14. Two-Grid Correction Scheme v^h ← MG(v^h, b^h)
1: Apply smoother µ1 times to A^h u^h = b^h on Ω^h with initial guess v^h.
2: Compute the residual r^h = b^h − A^h v^h.
3: Restrict the residual down to the coarse grid r^{2h} = I^{2h}_h r^h.
4: Solve A^{2h} e^{2h} = r^{2h} on Ω^{2h}.
5: Interpolate the coarse grid error to the fine grid e^h = I^h_{2h} e^{2h}.
6: Update the fine grid approximation v^h = v^h + e^h.
7: Apply smoother µ2 times to A^h u^h = b^h on Ω^h with initial guess v^h.
The multigrid method firstly applies the pre-smoothing steps to remove the high frequency components of the error. It then projects the residual equation down to the coarse grid to find an approximation to the error. See Figure 29. That approximation is interpolated back up to the fine grid and used to improve the current solution. See Figure 30. Finally, a post-smoothing step is applied to remove the remaining high frequency components of the error.
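The two-grid correction scheme can be sketched for the 1D model problem in Python with NumPy. This is a minimal illustration under our own choices: a weighted Jacobi smoother (ω = 2/3, three sweeps each side), full-weighting restriction, linear interpolation, and a direct solve on the coarse grid; all names are ours. With f = 0 the exact solution is zero, so the iterate itself is the error:

```python
import numpy as np

def poisson_matrix(n):
    """3-point stencil for 1D Poisson on n subintervals (h = 1/n), interior points only."""
    h2 = 1.0 / n**2
    T = 2*np.eye(n-1) - np.eye(n-1, k=1) - np.eye(n-1, k=-1)
    return T / h2

def weighted_jacobi(A, v, f, omega=2.0/3.0, sweeps=3):
    D = np.diag(A)
    for _ in range(sweeps):
        v = v + omega * (f - A @ v) / D
    return v

def restrict_fw(r):
    """Full weighting: fine interior (n-1 values) -> coarse interior (n/2-1 values)."""
    return 0.25 * (r[:-2:2] + 2*r[1:-1:2] + r[2::2])

def interpolate(e2h, n):
    """Linear interpolation: coarse interior -> fine interior (zero boundary values)."""
    eh = np.zeros(n - 1)
    eh[1::2] = e2h                                 # coincident grid points
    padded = np.concatenate(([0.0], e2h, [0.0]))
    eh[0::2] = 0.5 * (padded[:-1] + padded[1:])    # in-between points
    return eh

def two_grid(A, v, f, n):
    v = weighted_jacobi(A, v, f)                           # pre-smooth
    r2h = restrict_fw(f - A @ v)                           # restrict the residual
    e2h = np.linalg.solve(poisson_matrix(n // 2), r2h)     # solve A^{2h} e^{2h} = r^{2h}
    v = v + interpolate(e2h, n)                            # correct
    return weighted_jacobi(A, v, f)                        # post-smooth

n = 32
A = poisson_matrix(n)
j = np.arange(1, n)
v = np.sin(2*np.pi*j/n) + np.sin(16*np.pi*j/n)   # smooth + oscillatory error, f = 0
v_new = two_grid(A, v, np.zeros(n - 1), n)
print(np.linalg.norm(v_new) / np.linalg.norm(v) < 0.5)   # True: error shrinks substantially
```

The smoother removes the k = 16 component and the coarse-grid correction removes most of the k = 2 component, so one cycle reduces the error well beyond what either ingredient manages alone.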
Multiple Layers of Grids
In the two grid correction method we said that we should solve the system of equations A^{2h} e^{2h} = r^{2h} to find a coarse grid approximation to the error. How do we solve the coarse grid equations?

If the coarse grid is small we could use a direct method like Gaussian Elimination. If the coarse grid is still large we could add another, even coarser, grid and apply Algorithm 14 again. This leads to the multigrid idea.
V-Cycle Scheme
There are many different variations of the multigrid idea; the most common is the V-Cycle Scheme. Given a nested sequence of grids

Ω^{kh} ⊂ Ω^{(k−1)h} ⊂ · · · ⊂ Ω^{2h} ⊂ Ω^h,
the V-Cycle Scheme is:
Algorithm 15. V-Cycle Scheme: v^h ← V^h(v^h, b^h)
1: Apply smoother µ1 times to A^h u^h = b^h on Ω^h with initial guess v^h.
2: if Ω^h is the coarsest grid then
3:   goto step 12
4: else
5:   Compute the residual r^h = b^h − A^h v^h.
6:   Restrict the residual to the coarse grid b^{2h} = I^{2h}_h r^h.
7:   Set the initial coarse grid guess v^{2h} = 0.
8:   Solve v^{2h} = V^{2h}(v^{2h}, b^{2h}).
9:   Interpolate the coarse grid error to the fine grid e^h = I^h_{2h} v^{2h}.
10:  Update the fine grid approximation v^h = v^h + e^h.
11: end if
12: Apply smoother µ2 times to A^h u^h = b^h on Ω^h with initial guess v^h.
W-Cycle Scheme
Another variation of the multigrid idea is the W-cycle Scheme:
Figure 29: Apply the pre-smoothing steps to remove the high frequency components of the error and then approximate the error on the coarse grid.
Figure 30: The coarse grid approximation to the error is used to improve the current estimate
Algorithm 16. W-Cycle Scheme: v^h ← W^h(v^h, b^h)
1: Apply smoother µ1 times to A^h u^h = b^h on Ω^h with initial guess v^h.
2: if Ω^h is not the coarsest grid then
3:   Compute the residual r^h = b^h − A^h v^h.
4:   Restrict the residual to the coarse grid b^{2h} = I^{2h}_h r^h.
5:   Set the initial coarse grid guess v^{2h} = 0.
6:   Repeat v^{2h} = W^{2h}(v^{2h}, b^{2h}) twice.
7:   Interpolate the coarse grid error to the fine grid e^h = I^h_{2h} v^{2h}.
8:   Update the fine grid approximation v^h = v^h + e^h.
9: end if
10: Apply smoother µ2 times to A^h u^h = b^h on Ω^h with initial guess v^h.
Full Multigrid V-Cycle (FMV)
In Chapter 2 we mentioned the idea of using the coarse grid to find a good initial guess. With the FMV scheme we incorporate that idea into the multigrid method.
Algorithm 17. FMV Scheme: v^h ← FMG^h(b^h)
1: if Ω^h is the coarsest grid then
2:   v^h = 0
3: else
4:   Find the RHS vector b^{2h} = I^{2h}_h b^h.
5:   Solve the coarse grid problem v^{2h} = FMG^{2h}(b^{2h}).
6:   Find the coarse grid approximation v^h = I^h_{2h} v^{2h}.
7: end if
8: Solve on the current grid v^h = V^h(v^h, b^h).
Memory Requirements
Let's consider the solution of Poisson's equation on a square domain. We need to store a copy of v^h and b^h on each grid level.
Consider a d-dimensional grid with n^d points and assume n is a power of 2. Two arrays are stored on each level. The finest grid, Ω^h, requires 2n^d storage. The next grid Ω^{2h} requires 2^{-d} times as much storage. The third grid Ω^{4h} requires 4^{-d} = 2^{-2d} times as much storage, etc. So,

Storage = 2n^d (1 + 2^{-d} + 2^{-2d} + · · ·) < 2n^d / (1 − 2^{-d}).
Computational Load
Let WU be the cost of performing one relaxation sweep on the finest grid. We will ignore inter-grid transfer costs, which are relatively low.
Consider the case where we do one relaxation sweep on each level. Then
V-Cycle computation costs = 2WU(1 + 2−d + 2−2d + · · ·+ 2−nd)
<2WU
1− 2−d.
5 Review Material
Identity Matrix
Definition 3 (Unit Vector). We shall represent the unit vectors by e_j ∈ C^n, where the jth element of e_j is 1 and all other elements are zero.
Example 15 (Unit vector in R3). For example;
e1 = [1, 0, 0]T , e2 = [0, 1, 0]T , e3 = [0, 0, 1]T .
Identity Matrix
Definition 4 (Identity Matrix). The identity matrix is given by I ∈ C^{n,n}. The row i, column j entry of I is
I_{i,j} = 1 if i = j,
        = 0 otherwise.
Another way of saying this is that the jth column of I is ej .
Example 16 (Identity matrix). For example, if n = 3;
I = [ 1 0 0 ]
    [ 0 1 0 ] = [ e_1 | e_2 | e_3 ].
    [ 0 0 1 ]

Non-singular
Definition 5 (Matrix Inverse). Consider a matrix A ∈ C^{n,n}. If there is a matrix Z ∈ C^{n,n} such that AZ = I then Z is the inverse of A and is written as A^{-1}.

Definition 6 (Non-Singular). A matrix A ∈ C^{n,n} is non-singular or invertible if there is a matrix Z ∈ C^{n,n} such that AZ = I.

If A is non-singular then A^{-1} exists and Ax = b always has a unique solution x = A^{-1}b.
Theorem 7. For A ∈ Cn,n, the following conditions are equivalent;
1. A has an inverse A−1.
2. rank(A) = n.
3. range(A) = Cn.
4. null(A) = 0.
5. 0 is not an eigenvalue of A.
6. 0 is not a singular value of A.
7. det(A) 6= 0.
Example 17 (Non-singular system). For example, consider the 3 × 3 system

x_1 + 3x_2 + 2x_3 = b_1
2x_1 + 6x_2 + 9x_3 = b_2
2x_1 + 8x_2 + 8x_3 = b_3

This can be written in matrix form as

[ 1 3 2 ] [ x_1 ]   [ b_1 ]
[ 2 6 9 ] [ x_2 ] = [ b_2 ].
[ 2 8 8 ] [ x_3 ]   [ b_3 ]

The inverse is

(1/10) [ 24   8  -15 ]
       [ -2  -4    5 ].
       [ -4   2    0 ]

Since Ax = b ⟺ x = A^{-1}b,

[ x_1 ]          [ 24b_1 + 8b_2 − 15b_3 ]
[ x_2 ] = (1/10) [ −2b_1 − 4b_2 + 5b_3  ].
[ x_3 ]          [ −4b_1 + 2b_2         ]

In practice we do not explicitly find A^{-1} since this is very expensive (as will be shown later). Instead we find ways to solve the original system of equations.
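We can check the stated inverse numerically (a quick NumPy sketch; the right-hand side vector is an arbitrary choice of ours):

```python
import numpy as np

A = np.array([[1.0, 3.0, 2.0],
              [2.0, 6.0, 9.0],
              [2.0, 8.0, 8.0]])
Z = np.array([[24.0, 8.0, -15.0],
              [-2.0, -4.0, 5.0],
              [-4.0, 2.0, 0.0]]) / 10.0

print(np.allclose(A @ Z, np.eye(3)))               # True: Z is A^{-1}

# Solving directly agrees with applying the inverse, but is cheaper in general.
b = np.array([1.0, 2.0, 3.0])
print(np.allclose(np.linalg.solve(A, b), Z @ b))   # True
```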
Example 18 (Singular system). Now let's consider a slightly different system

x_1 + 3x_2 + 2x_3 = b_1
2x_1 + 6x_2 + 9x_3 = b_2
3x_1 + 9x_2 + 8x_3 = b_3

This can be written in matrix form as

[ 1 3 2 ] [ x_1 ]   [ b_1 ]
[ 2 6 9 ] [ x_2 ] = [ b_2 ].
[ 3 9 8 ] [ x_3 ]   [ b_3 ]

In this case

A = [ 1 3 2 ]
    [ 2 6 9 ]
    [ 3 9 8 ]

is singular. That is, there is no Z such that AZ = I.
In fact, let's try to solve the system
x_1 + 3x_2 + 2x_3 = b_1    (2)
2x_1 + 6x_2 + 9x_3 = b_2   (3)
3x_1 + 9x_2 + 8x_3 = b_3   (4)

(2) × −2:  −2x_1 − 6x_2 − 4x_3 = −2b_1    (5)
(5) + (3):  5x_3 = −2b_1 + b_2  ⇒  x_3 = −(2/5)b_1 + (1/5)b_2
(2) × −3:  −3x_1 − 9x_2 − 6x_3 = −3b_1    (6)
(6) + (4):  2x_3 = −3b_1 + b_3  ⇒  x_3 = −(3/2)b_1 + (1/2)b_3.
Hence a solution does not exist unless −(2/5)b_1 + (1/5)b_2 = −(3/2)b_1 + (1/2)b_3. So it is possible there is no solution.

Suppose −(2/5)b_1 + (1/5)b_2 = −(3/2)b_1 + (1/2)b_3. If we substitute the result back into (2) we get

x_1 + 3x_2 = b_1 − 2x_3 = b_1 + 3b_1 − b_3 = 4b_1 − b_3.

So, if x_2 = α, then x_1 = 4b_1 − b_3 − 3α. In other words, there are infinitely many solutions.

If A is singular then either
1. there is no solution;
2. there are an infinite number of solutions.
Let's suppose b_1 = b_2 = b_3 = 0; then −(2/5)b_1 + (1/5)b_2 = −(3/2)b_1 + (1/2)b_3 and
x_3 = 0,  x_2 = α,  x_1 = −3α.
Hence

[ 1 3 2 ] [ −3α ]   [ 0 ]
[ 2 6 9 ] [  α  ] = [ 0 ].
[ 3 9 8 ] [  0  ]   [ 0 ]

If A is singular then there exists x ≠ 0 such that Ax = 0, which means null(A) ≠ {0}.
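These singularity checks are easy to confirm numerically (a NumPy sketch of ours, using the null vector found above with α = 1):

```python
import numpy as np

A = np.array([[1.0, 3.0, 2.0],
              [2.0, 6.0, 9.0],
              [3.0, 9.0, 8.0]])

print(np.linalg.matrix_rank(A))           # 2: rank < n = 3, so A is singular
print(abs(np.linalg.det(A)) < 1e-9)       # True: the determinant is zero
x = np.array([-3.0, 1.0, 0.0])            # null vector from above (alpha = 1)
print(np.allclose(A @ x, 0.0))            # True
```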
Linear Independence
Let v1,v2,v3, · · · ,vk be a set of vectors.
Definition 7 (Linear Independence). The set is said to be linearly dependent if there are c_1, c_2, ..., c_k (not all zero) with

0 = c_1 v_1 + c_2 v_2 + · · · + c_k v_k.
A set of vectors that is not linearly dependent is called linearly independent .
Example 19 (Vectors in R³). Two vectors are dependent if they lie on the same line, and three vectors are dependent if they lie in the same plane.
Example 20 (Linearly dependent columns).
A = [  1  3  3  2 ]
    [  2  6  9  5 ]
    [ -1 -3  3  0 ]

The columns of A are linearly dependent since the second column is a multiple of the first. The rows are also linearly dependent since Row 3 = 2·Row 2 − 5·Row 1.
Example 21 (Linearly independent columns). If
A = [ 3 4 2 ]
    [ 0 1 5 ]
    [ 0 0 2 ]

then the columns are linearly independent. To see why, suppose

[ 0 ]       [ 3 ]       [ 4 ]       [ 2 ]
[ 0 ] = c_1 [ 0 ] + c_2 [ 1 ] + c_3 [ 5 ].
[ 0 ]       [ 0 ]       [ 0 ]       [ 2 ]

Then c_3 = 0, c_2 = 0, c_1 = 0, and hence the vectors are linearly independent.
Range
Definition 8 (Range). Consider A ∈ C^{m,n}. The range of A, range(A), is the set of vectors which can be expressed as Ax for some x.
This is also called the column space of A. The row space of A is the range of A∗.
Example 22 (Consistent system). For example, if
A = [ 1 0 ]
    [ 5 4 ]
    [ 2 4 ]

then Ax = b can be solved if and only if b lies in the plane that is spanned by the two column vectors. That is, b ∈ range(A).
Nullspace
Definition 9 (Null Space). The nullspace of A ∈ C^{m,n}, null(A), is the set of vectors x such that Ax = 0.
Example 23 (Trivial null space). For example, if
A = [ 1 0 ]
    [ 5 4 ]
    [ 2 4 ]

then solving

[ 1 0 ] [ u ]   [ 0 ]
[ 5 4 ] [ v ] = [ 0 ]
[ 2 4 ]         [ 0 ]

implies u = 0 (from the first equation) and v = 0 (from the second equation).
Example 24 (Non-trivial null space). Now consider
B = [ 1 0 1 ]
    [ 5 4 9 ].
    [ 2 4 6 ]

Column 3 is a linear combination of Columns 1 and 2 (Column 3 = Column 1 + Column 2), thus Column 3 lies in the same plane as that spanned by Columns 1 and 2. In other words, range(A) = range(B). Also, the null space of B contains any multiple of the vector [1, 1, −1]^T (Column 1 + Column 2 − Column 3 = 0):

[ 1 0 1 ] [  c ]   [ 0 ]
[ 5 4 9 ] [  c ] = [ 0 ].
[ 2 4 6 ] [ −c ]   [ 0 ]
Rank
Definition 10 (Rank). The rank of A ∈ Cm,n is the dimension of the column space.
dim(nullspace(A)) + rank(A) = n.
Norms
Suppose we have the system of equations Ax = b and we find x̄ as an approximation to x. How do we measure the distance between x and x̄? We will concentrate on the p-norms:
‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}.
• 1-norm:

‖x‖_1 = Σ_{i=1}^n |x_i|.

• 2-norm:

‖x‖_2 = (Σ_{i=1}^n |x_i|²)^{1/2} = √(x*x).
• ∞-norm:

‖x‖_∞ = max_i |x_i|.
Now, ‖x‖_1 ≥ ‖x‖_2 ≥ ‖x‖_∞, and ‖x‖_1 ≤ √n ‖x‖_2, ‖x‖_2 ≤ √n ‖x‖_∞ and ‖x‖_1 ≤ n ‖x‖_∞. Hence all norms are equivalent.
(Figure: the unit balls of the 1-, 2- and ∞-norms in the plane.)
Example 25 (Vector norms). For example, consider

x^T = [2.6, 5.9, 3.4],    x̄^T = [2.8, 5.8, 3.0].

Then

(x − x̄)^T = [−0.2, 0.1, 0.4].

So

‖x − x̄‖_1 = 0.2 + 0.1 + 0.4 = 0.7,
‖x − x̄‖_2 = (0.04 + 0.01 + 0.16)^{1/2} = 0.4583 to 4 dp,
‖x − x̄‖_∞ = 0.4.
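The example values can be checked with NumPy's norm routine (a quick sketch; rounding hides the usual floating-point noise):

```python
import numpy as np

x    = np.array([2.6, 5.9, 3.4])
xbar = np.array([2.8, 5.8, 3.0])
d = x - xbar                               # [-0.2, 0.1, 0.4] up to round-off

print(round(np.linalg.norm(d, 1), 4))      # 0.7
print(round(np.linalg.norm(d, 2), 4))      # 0.4583
print(round(np.linalg.norm(d, np.inf), 4)) # 0.4
```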
Definition 11 (Vector Norms). If x and y are vectors, then ‖·‖ is a vector norm if all of the following properties hold:

1. ‖x‖ > 0 if x ≠ 0.

2. ‖αx‖ = |α| ‖x‖ for scalar α.

3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.
Matrix Norms
Given a vector norm, we can define the corresponding matrix norm as follows:

‖A‖ = max_{‖x‖ ≠ 0} ‖Ax‖ / ‖x‖.

These norms are subordinate to the vector norms. For the 1-norm and ∞-norm these simplify to:

• ‖A‖_1 = max_j Σ_{i=1}^n |a_{ij}|.

• ‖A‖_∞ = max_i Σ_{j=1}^n |a_{ij}|.
Example 26 (Matrix norms - ‖·‖_1). For example

A = [ 1  0 -1 ]
    [ 0  3 -1 ].
    [ 5 -1  1 ]

Then

j = 1:  Σ_i |a_{i1}| = 6,
j = 2:  Σ_i |a_{i2}| = 4,
j = 3:  Σ_i |a_{i3}| = 3,

so ‖A‖_1 = 6.
Example 27 (Matrix norms - ‖·‖_∞). For example

A = [ 1  0 -1 ]
    [ 0  3 -1 ].
    [ 5 -1  1 ]

Then

i = 1:  Σ_j |a_{1j}| = 2,
i = 2:  Σ_j |a_{2j}| = 4,
i = 3:  Σ_j |a_{3j}| = 7,

so ‖A‖_∞ = 7.
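Both matrix-norm examples can be verified with NumPy, where `ord=1` gives the maximum absolute column sum and `ord=np.inf` the maximum absolute row sum:

```python
import numpy as np

A = np.array([[1.0, 0.0, -1.0],
              [0.0, 3.0, -1.0],
              [5.0, -1.0, 1.0]])

print(np.linalg.norm(A, 1))        # 6.0, as in Example 26
print(np.linalg.norm(A, np.inf))   # 7.0, as in Example 27
```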
Definition 12 (Matrix Norms). If A and B are matrices, then ‖·‖ is a matrix norm if all of the following properties hold:
1. ‖A‖ > 0, if A 6= 0.
2. ‖αA‖ = |α| ‖A‖ for scalar α.
3. ‖A+B‖ ≤ ‖A‖+ ‖B‖.
The subordinate matrix norms defined above also have the following additional properties
4. ‖AB‖ ≤ ‖A‖ ‖B‖.
5. ‖Ax‖ ≤ ‖A‖ ‖x‖ for any vector x.
Orthogonal Vectors
Definition 13 (Orthogonal Vectors). Vectors x and y are orthogonal if x*y = 0. A set of vectors S is orthogonal if x*y = 0 for all distinct x, y ∈ S. The set is orthonormal if it is orthogonal and ‖x‖_2 = 1 for all x ∈ S.
Theorem 8. The nonzero vectors in an orthogonal set are linearly independent.
Proof of linear independence. Suppose the set is not linearly independent. Then there exists a vector v_k ∈ S that can be written as a linear combination of the other vectors in S, that is
\[
v_k = \sum_{j \neq k} c_j v_j,
\]
and we can always choose k such that v_k ≠ 0. Now
\[
0 < \|v_k\|_2^2 = v_k^* v_k
= v_k^* \Big( \sum_{j \neq k} c_j v_j \Big)
= \sum_{j \neq k} c_j \, v_k^* v_j
= 0,
\]
since v_k^* v_j = 0 for every j ≠ k. A contradiction.
Orthogonal Basis
Any set of linearly independent vectors q_1, q_2, · · · , q_k can be converted into a set of orthogonal vectors x_1, x_2, · · · , x_k. First set x_1 = q_1 and then define x_i by
\[
x_i = q_i - \frac{x_1^* q_i}{x_1^* x_1} x_1 - \cdots - \frac{x_{i-1}^* q_i}{x_{i-1}^* x_{i-1}} x_{i-1}.
\]
Theorem 9. The subspace spanned by the original set of vectors q_1, q_2, · · · , q_k is also spanned by x_1, x_2, · · · , x_k.
For the first few vectors,
\[
x_1 = q_1,
\]
\[
x_2 = q_2 - \frac{x_1^* q_2}{x_1^* x_1} x_1 = q_2 - c_{12} q_1,
\]
\[
x_3 = q_3 - \frac{x_1^* q_3}{x_1^* x_1} x_1 - \frac{x_2^* q_3}{x_2^* x_2} x_2
= q_3 - c_{13} q_1 - c_{23} q_2 + c_{23} c_{12} q_1,
\]
so each x_i is a linear combination of q_1, · · · , q_i, and conversely each q_i is a linear combination of x_1, · · · , x_i. Hence x_1, x_2, · · · , x_k and q_1, q_2, · · · , q_k span the same space.
To see that the x_i are orthogonal, consider x_j where j < i:
\[
x_j^* x_i = x_j^* \Big( q_i - \frac{x_1^* q_i}{x_1^* x_1} x_1 - \frac{x_2^* q_i}{x_2^* x_2} x_2 - \cdots - \frac{x_{i-1}^* q_i}{x_{i-1}^* x_{i-1}} x_{i-1} \Big)
\]
\[
= x_j^* q_i - \frac{x_1^* q_i}{x_1^* x_1} x_j^* x_1 - \frac{x_2^* q_i}{x_2^* x_2} x_j^* x_2 - \cdots - \frac{x_{i-1}^* q_i}{x_{i-1}^* x_{i-1}} x_j^* x_{i-1}
\]
\[
= x_j^* q_i - \frac{x_j^* q_i}{x_j^* x_j} x_j^* x_j \qquad \text{(by induction, } x_j^* x_l = 0 \text{ for } l < i, \ l \neq j\text{)}
\]
\[
= x_j^* q_i - x_j^* q_i = 0.
\]
For example,
\[
q_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad
q_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
q_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.
\]
First,
\[
x_1 = q_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.
\]
Then x_1^* q_2 = 1 and x_1^* x_1 = 2, so
\[
x_2 = q_2 - \frac{x_1^* q_2}{x_1^* x_1} x_1
= \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}.
\]
Next x_1^* q_3 = 1, x_1^* x_1 = 2, x_2^* q_3 = 1/2 and x_2^* x_2 = 3/2, so
\[
x_3 = q_3 - \frac{x_1^* q_3}{x_1^* x_1} x_1 - \frac{x_2^* q_3}{x_2^* x_2} x_2
= \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} - \frac{1}{3} \cdot \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}
= \frac{2}{3} \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}.
\]
Unitary Matrix
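The construction above (classical Gram–Schmidt, without normalisation) is straightforward to code. The sketch below reproduces the worked example; the function name is our own:

```python
import numpy as np

def gram_schmidt(Q):
    """Orthogonalise the columns of Q by classical Gram-Schmidt.

    Each x_i is q_i minus its projections onto the previous x's;
    the vectors are left unnormalised, as in the notes.
    """
    X = []
    for q in Q.T:
        x = q.astype(float).copy()
        for xprev in X:
            x -= (xprev @ q) / (xprev @ xprev) * xprev
        X.append(x)
    return np.column_stack(X)

Q = np.column_stack(([1, 1, 0], [1, 0, 1], [0, 1, 1]))
X = gram_schmidt(Q)
# Columns of X: [1, 1, 0], [1/2, -1/2, 1], [-2/3, 2/3, 2/3]
```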
Definition 14 (Unitary Matrix). A square matrix Q ∈ C^{n×n} is unitary (or orthogonal if Q ∈ R^{n×n}) if Q*Q = I (or Q^T Q = I).
The important feature of unitary matrices is that they preserve length. That is,
\[
\|Qx\|_2^2 = (Qx)^*(Qx) = x^* Q^* Q x = x^* x = \|x\|_2^2.
\]
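Length preservation is easy to check numerically for a concrete orthogonal matrix such as a plane rotation (a sketch; the angle and test vector are arbitrary choices of ours):

```python
import numpy as np

theta = 0.7                                   # any angle works
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, -4.0])
len_x = np.linalg.norm(x)        # 5.0
len_Qx = np.linalg.norm(Q @ x)   # also 5.0: the rotation preserves length
```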
Example 28 (Unitary matrices).
\[
\text{Rotation matrix } Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},
\qquad
\text{Permutation matrix } P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Transformation Matrices
The aim is to transform a general linear system into a triangular linear system, which is easier to solve. Given an n-vector a, remove all entries below the kth position (a_k ≠ 0) by using the following:
\[
M_k a =
\begin{bmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & \cdots & -m_{k+1} & 1 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & -m_n & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} a_1 \\ \vdots \\ a_k \\ a_{k+1} \\ \vdots \\ a_n \end{bmatrix}
=
\begin{bmatrix} a_1 \\ \vdots \\ a_k \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\]
where m_i = a_i/a_k, i = k + 1, · · · , n (a_k is the pivot). M_k is called an elementary elimination matrix or Gauss transform. Note:
1. M_k is a lower triangular matrix with unit diagonal ⇒ M_k is non-singular.
2. M_k = I − m e_k^T, where m = [0, · · · , 0, m_{k+1}, · · · , m_n]^T and e_k is the kth column of the identity matrix.
3. M_k^{-1} = I + m e_k^T.
4. If M_j, j > k, is another elementary elimination matrix, with multiplier vector t, then
\[
M_k M_j = I - m e_k^T - t e_j^T + m e_k^T t e_j^T = I - m e_k^T - t e_j^T
\]
since e_k^T t = 0.
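The Gauss transform and its properties can be illustrated directly. A minimal sketch (0-based indexing rather than the 1-based indexing of the notes; the helper name is ours):

```python
import numpy as np

def gauss_transform(a, k):
    """Return M_k = I - m e_k^T that zeros the entries of a below position k.

    Uses 0-based indexing; requires a[k] != 0 (the pivot).
    """
    n = len(a)
    m = np.zeros(n)
    m[k + 1:] = a[k + 1:] / a[k]     # multipliers m_i = a_i / a_k
    M = np.eye(n)
    M[:, k] -= m                     # M = I - m e_k^T
    return M, m

a = np.array([2.0, 4.0, 6.0])
M, m = gauss_transform(a, 0)         # M @ a == [2, 0, 0]
Minv = np.eye(3)
Minv[:, 0] += m                      # property 3: M_k^{-1} = I + m e_k^T
```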
Gaussian Elimination
To reduce an n × n system Ax = b to upper triangular form, multiply both sides by M = M_{n−1} M_{n−2} · · · M_1. For example, consider a 4 × 4 matrix (the marked symbols indicate the entries affected at each stage):
\[
A = \begin{bmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix},
\]
\[
M_1 A = \begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{bmatrix},
\qquad
M_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -m_2 & 1 & 0 & 0 \\ -m_3 & 0 & 1 & 0 \\ -m_4 & 0 & 0 & 1 \end{bmatrix},
\]
\[
M_2 M_1 A = \begin{bmatrix} \times & \triangle & \triangle & \triangle \\ 0 & \triangle & \triangle & \triangle \\ 0 & 0 & \triangle & \triangle \\ 0 & 0 & \triangle & \triangle \end{bmatrix},
\qquad
M_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -t_3 & 1 & 0 \\ 0 & -t_4 & 0 & 1 \end{bmatrix},
\]
\[
M_3 M_2 M_1 A = \begin{bmatrix} \times & \triangle & \circ & \circ \\ 0 & \triangle & \circ & \circ \\ 0 & 0 & \circ & \circ \\ 0 & 0 & 0 & \circ \end{bmatrix},
\qquad
M_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -s_4 & 1 \end{bmatrix}.
\]
LU Factorisation
Gaussian elimination is also known as LU factorisation. Set L = M^{-1} = (M_{n−1} · · · M_1)^{-1} = M_1^{-1} · · · M_{n−1}^{-1} and set U = MA. Then LU = M^{-1}(MA) = A, where L is lower triangular and U is upper triangular.
If we know L and U then we can express Ax = b as L(Ux) = b and solve for x by
\[
Ly = b \quad \text{(forward substitution)},
\]
\[
Ux = y \quad \text{(back substitution)}.
\]
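Given the triangular factors, each substitution sweep is only a few lines. A sketch (our own function names; no pivoting considerations):

```python
import numpy as np

def forward_sub(L, b):
    # Solve L y = b where L is lower triangular, working top to bottom.
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    # Solve U x = y where U is upper triangular, working bottom to top.
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0], [2.0, 1.0]])
U = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([3.0, 12.0])
x = back_sub(U, forward_sub(L, b))   # solves (L U) x = b
```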
Complexity
Algorithm 18 (LU factorisation).
1: for k = 1 to n do
2:   for i = k + 1 to n do
3:     for j = k + 1 to n do
4:       a_{ij} ← a_{ij} − (a_{ik}/a_{kk}) a_{kj}
5:     end for
6:   end for
7: end for
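Algorithm 18 translates almost line for line into code. The sketch below additionally stores the multipliers a_ik/a_kk in the strict lower triangle, so both factors can be recovered afterwards; it assumes nonzero pivots (no pivoting):

```python
import numpy as np

def lu_inplace(A):
    """LU factorisation without pivoting, as in Algorithm 18.

    The multipliers are stored below the diagonal, so on return the
    strict lower triangle holds L (with an implicit unit diagonal)
    and the upper triangle holds U.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n):
        for i in range(k + 1, n):
            A[i, k] /= A[k, k]                  # multiplier a_ik / a_kk
            for j in range(k + 1, n):
                A[i, j] -= A[i, k] * A[k, j]    # a_ij <- a_ij - (a_ik/a_kk) a_kj
    return A

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
F = lu_inplace(A)
L = np.tril(F, -1) + np.eye(3)   # unit lower triangular factor
U = np.triu(F)                   # upper triangular factor
```

In practice one would use a library routine with partial pivoting; this loop is the textbook form only.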
We may interchange the order in which we move through the indices (e.g. to suit the Fortran or C method of storage).
Gaussian elimination requires O(n³) operations for the factorisation and O(n²) operations for the back (and forward) substitution. Explicit matrix inversion is equivalent to solving n systems of equations and thus requires O(n³) operations for the factorisation and O(n³) for the back (and forward) substitutions.