Chapter V Selected Topics in Numerical Linear Algebra


  • Chapter V Selected Topics in Numerical Linear Algebra

    5.0 Review of Basic Concepts
    5.1 Matrix Eigenvalue Problems: Power Method
    5.2 Schur’s and Gershgorin’s Theorems
    5.3 Orthogonal Factorizations and Least-Squares Problems
    5.4 Singular-Value Decomposition and Pseudo-inverses
    5.5* QR-Algorithm of Francis for the Eigenvalue Problem


  • 5.3 Orthogonal Factorizations and Least-Squares Problems

    Recall that the inner product for complex vectors $x, y \in \mathbb{C}^n$ is defined as
    $$\langle x, y \rangle = \bar{y}^T x = \sum_{i=1}^{n} x_i \bar{y}_i$$
    and satisfies these axioms:
    1. $\langle x, x \rangle > 0$ if $x \neq 0$
    2. $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle \quad \forall \alpha, \beta \in \mathbb{C}$
    3. $\langle x, y \rangle = \overline{\langle y, x \rangle}$

    If $\langle x, y \rangle = 0$, then
    $$\|x + y\|_2^2 = \|x\|_2^2 + \|y\|_2^2.$$
    This is the so-called Pythagorean rule.

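
    As a quick numerical sanity check (a NumPy sketch; the helper name `inner` is ours), build a vector orthogonal to a random complex $x$ and verify the Pythagorean rule:

```python
import numpy as np

def inner(x, y):
    """Complex inner product <x, y> = conj(y)^T x = sum_i x_i * conj(y_i)."""
    return np.vdot(y, x)  # np.vdot conjugates its first argument

rng = np.random.default_rng(0)
x = rng.normal(size=3) + 1j * rng.normal(size=3)
# Make y orthogonal to x by projecting a random z away from x.
z = rng.normal(size=3) + 1j * rng.normal(size=3)
y = z - inner(z, x) / inner(x, x) * x

print(abs(inner(y, x)))  # ~0, so <y, x> = 0
lhs = np.linalg.norm(x + y) ** 2
rhs = np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2
print(np.isclose(lhs, rhs))  # True: the Pythagorean rule
```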

  • Suppose that $\{v_1, v_2, \cdots, v_n\}$ is an orthonormal basis for $\mathbb{C}^n$. Then each element $x \in \mathbb{C}^n$ has a unique representation in the form
    $$x = \sum_{i=1}^{n} c_i v_i \in \mathbb{C}^n$$
    where $c_i = \langle x, v_i \rangle$ $(1 \le i \le n)$. Thus,
    $$x = \sum_{i=1}^{n} \langle x, v_i \rangle v_i \in \mathbb{C}^n$$

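
    A minimal sketch of this expansion, using the columns of a unitary matrix from NumPy's QR as the orthonormal basis:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# The columns of a unitary Q form an orthonormal basis for C^n.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))

x = rng.normal(size=n) + 1j * rng.normal(size=n)
c = Q.conj().T @ x   # c_i = <x, v_i> = conj(v_i)^T x, collected as Q* x
x_rebuilt = Q @ c    # sum_i c_i v_i
print(np.allclose(x, x_rebuilt))  # True
```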

  • Gram-Schmidt Process

    It can be used to obtain orthonormal systems in an inner-product space. Suppose that $[x_1, x_2, \cdots]$ is a linearly independent sequence of vectors in an inner-product space (the sequence can be finite or infinite). We can generate an orthonormal sequence $\{u_1, u_2, \cdots\}$ by the formula
    $$u_k = \frac{x_k - \sum_{i=1}^{k-1} \langle x_k, u_i \rangle u_i}{\left\| x_k - \sum_{i=1}^{k-1} \langle x_k, u_i \rangle u_i \right\|_2} \qquad (k \ge 1)$$

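
    A direct transcription of the $u_k$ formula into NumPy (a sketch; the helper name `gram_schmidt` is ours):

```python
import numpy as np

def gram_schmidt(xs):
    """Orthonormalize a linearly independent sequence via the u_k formula."""
    us = []
    for x in xs:
        w = x.astype(complex)
        for u in us:
            w = w - np.vdot(u, w) * u    # subtract <w, u_i> u_i (= <x_k, u_i> u_i by orthogonality)
        us.append(w / np.linalg.norm(w)) # normalize
    return us

xs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
U = np.column_stack(gram_schmidt(xs))
print(np.allclose(U.conj().T @ U, np.eye(3)))  # True: orthonormal columns
```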

  • Theorem (on Gram-Schmidt Sequence)
    The Gram-Schmidt sequence $[u_1, u_2, \cdots]$ has the property that $\{u_1, u_2, \cdots, u_k\}$ is an orthonormal basis for the linear span of $\{x_1, x_2, \cdots, x_k\}$ for $k \ge 1$.

    Proof.
    Hint: by induction on $k$. See also P275.

    Remark
    The theorem tells us that
    $$\mathrm{span}\{x_1, x_2, \cdots, x_k\} = \mathrm{span}\{u_1, u_2, \cdots, u_k\}$$
    for $k \ge 1$.


  • For example, applying the Gram-Schmidt process to the columns $A_1, A_2, \cdots, A_n$ of $A_{m \times n}$ ($m \ge n$), we arrive after $n$ steps at an $m \times n$ matrix $B$ whose columns form an orthonormal set (if $\{A_1, A_2, \cdots, A_n\}$ is a linearly independent set).

    Remark
    Note: the Gram-Schmidt process, when applied to the columns of a matrix, can be interpreted as a matrix factorization.


  • Gram-Schmidt Algorithm

    for j = 1 to n do
        for i = 1 to j − 1 do
            tij ← ⟨Aj, Bi⟩
        end do
        Cj ← Aj − Σ_{i=1}^{j−1} tij Bi
        tjj ← ‖Cj‖2
        Bj ← Cj / tjj
    end do

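
    The algorithm in runnable form (a NumPy sketch; `gs_factor` is our name), checked on the worked example matrix that appears later in this section:

```python
import numpy as np

def gs_factor(A):
    """Classical Gram-Schmidt: A = B T with orthonormal columns in B
    and upper triangular T with positive diagonal."""
    m, n = A.shape
    B = np.zeros((m, n))
    T = np.zeros((n, n))
    for j in range(n):
        for i in range(j):
            T[i, j] = np.vdot(B[:, i], A[:, j])  # t_ij = <A_j, B_i>
        C = A[:, j] - B[:, :j] @ T[:j, j]        # C_j
        T[j, j] = np.linalg.norm(C)              # t_jj = ||C_j||_2
        B[:, j] = C / T[j, j]                    # B_j
    return B, T

A = np.array([[63., 41., -88.],
              [42., 60., 51.],
              [0., -28., 56.],
              [126., 82., -71.]])
B, T = gs_factor(A)
print(np.allclose(B @ T, A), np.allclose(B.T @ B, np.eye(3)))  # True True
```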

  • Theorem (on Gram-Schmidt Factorization)
    The Gram-Schmidt process, when applied to the columns of an $m \times n$ ($m \ge n$) matrix $A$ of rank $n$, produces a factorization
    $$A_{m \times n} = B_{m \times n} T_{n \times n}$$
    in which $B$ is an $m \times n$ matrix with orthonormal columns and $T$ is an $n \times n$ upper triangular matrix with positive diagonal.

  • Proof.
    We complete the definition of the matrix $T$ by setting $t_{ij} = 0$ when $i > j$. By the preceding theorem, the columns of $B$ form an orthonormal set of $n$ vectors in $\mathbb{R}^m$, and each $A_j$ is a linear combination of $B_1, B_2, \cdots, B_j$. In fact,
    $$A_j = \sum_{i=1}^{j} \langle A_j, B_i \rangle B_i = \sum_{i=1}^{j-1} \langle A_j, B_i \rangle B_i + \langle A_j, B_j \rangle B_j = \sum_{i=1}^{j-1} t_{ij} B_i + \langle A_j, B_j \rangle B_j$$


  • Proof (continued).
    Next, writing $C_j = A_j - \sum_{i<j} t_{ij} B_i = t_{jj} B_j$ as in the algorithm,
    $$\langle A_j, B_j \rangle = \Big\langle C_j + \sum_{i<j} t_{ij} B_i,\; B_j \Big\rangle = \langle C_j, B_j \rangle = t_{jj} > 0$$
    So
    $$A_j = \sum_{i=1}^{j-1} t_{ij} B_i + \langle A_j, B_j \rangle B_j = \sum_{i=1}^{j} t_{ij} B_i = \sum_{i=1}^{n} t_{ij} B_i$$
    i.e., $A = BT$. $\square$


  • To avoid the square roots involved in computing $\|\cdot\|_2$, the Gram-Schmidt Algorithm can be modified as follows:

    Modified Gram-Schmidt Algorithm

    for k = 1 to n do
        dk ← ‖Ak‖2²
        tkk ← 1
        for j = k + 1 to n do
            tkj ← ⟨Aj, Ak⟩ / dk
            Aj ← Aj − tkj Ak
        end do
    end do

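
    A runnable sketch of the modified algorithm (the names are ours); the transformed columns land in B, with $B^*B = \mathrm{diag}\{d_1, \cdots, d_n\}$ and $T$ unit upper triangular:

```python
import numpy as np

def mgs_factor(A):
    """Modified Gram-Schmidt: A = B T, B with orthogonal (unnormalized)
    columns, T unit upper triangular, d_k the squared column norms."""
    B = A.astype(float).copy()   # columns are transformed in place
    n = B.shape[1]
    T = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        d[k] = B[:, k] @ B[:, k]                  # d_k = ||A_k||_2^2
        for j in range(k + 1, n):
            T[k, j] = (B[:, j] @ B[:, k]) / d[k]  # t_kj = <A_j, A_k> / d_k
            B[:, j] -= T[k, j] * B[:, k]
    return B, T, d

A = np.array([[63., 41., -88.],
              [42., 60., 51.],
              [0., -28., 56.],
              [126., 82., -71.]])
B, T, d = mgs_factor(A)
print(np.allclose(B @ T, A))             # True
print(np.allclose(B.T @ B, np.diag(d)))  # True
```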

  • Theorem (on Modified Gram-Schmidt Factorization)
    If the modified Gram-Schmidt process is applied to the columns of an $m \times n$ ($m \ge n$) matrix $A$ of rank $n$, the transformed $m \times n$ matrix $B$ has an orthogonal set of columns and satisfies
    $$A_{m \times n} = B_{m \times n} T_{n \times n}$$
    where $T$ is a unit $n \times n$ upper triangular matrix whose elements $t_{kj}$ ($j > k$) are generated in the algorithm.

    Proof.
    Hint: by induction on $k$. See also P277-278.


  • Least-Squares Problems

    An important application of the orthogonal factorizations being discussed is the least-squares problem for a linear system of equations. Consider
    $$Ax = b \qquad (1)$$
    where $A$ is $m \times n$, $x$ is $n \times 1$, $b$ is $m \times 1$. Assume the rank of $A$ is $n$; hence, $m \ge n$. Usually, system (1) will have no solution if
    $$b \notin \mathrm{span}\{A_1, \cdots, A_n\} \subset \mathbb{C}^m.$$
    In such cases, it is often required to find an $x$ that minimizes the norm of the residual vector, $b - Ax$. The least-squares “solution” of (1) is the vector $x$ that makes $\|b - Ax\|_2$ a minimum. (If $\mathrm{rank}(A) = n$, then this $x$ will be unique.)


  • Some Motivations of Least-Squares Problems

    Suppose we wish to fit a line $y = \alpha x + \beta$ through four data points $(x_i, y_i)$. Then
    $$A_{4 \times 2}\, z \equiv \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \\ x_4 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} \equiv b,$$
    an overdetermined system that in general has no exact solution.

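
    For instance (a sketch with made-up data), fitting $y = \alpha x + \beta$ to four noisy samples:

```python
import numpy as np

# Noisy samples of y = 2x + 1; the 4x2 system has no exact solution.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0 + np.array([0.1, -0.05, 0.08, -0.02])

A = np.column_stack([x, np.ones_like(x)])  # the matrix A_{4x2} above
z, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution
alpha, beta = z
print(alpha, beta)  # close to 2 and 1
```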

  • Lemma (on the Least-Squares Problem)
    If $x$ is a point such that $A^*(Ax - b) = 0$, then $x$ solves the least-squares problem.

    Proof.
    Note that $A^*(Ax - b) = 0$ implies that $b - Ax$ is orthogonal to the column space of $A$. For any vector $y$, since $A(x - y)$ is in the column space of $A$, we have $\langle b - Ax, A(x - y) \rangle = 0$. Then, the Pythagorean rule gives
    $$\|b - Ay\|_2^2 = \|b - Ax + A(x - y)\|_2^2 = \|b - Ax\|_2^2 + \|A(x - y)\|_2^2 \ge \|b - Ax\|_2^2 \quad \forall y$$


  • If $A_{m \times n}$ has been factored in the form $A = BT$ as described in the preceding theorem, then the least-squares solution of $Ax = b$ will be the exact solution of the $n \times n$ system
    $$Tx = (B^*B)^{-1} B^* b$$
    This can be verified by the above Lemma:
    $$A^*Ax = (BT)^* B T x = T^* B^* B (B^*B)^{-1} B^* b = T^* B^* b = A^* b$$
    The matrix
    $$(B^*B)^{-1} = \mathrm{diag}\{d_1^{-1}, \cdots, d_n^{-1}\},$$
    the numbers $d_i$ being those computed in the modified Gram-Schmidt algorithm.

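
    Put together (a sketch; `mgs_factor` repeats the modified Gram-Schmidt routine above so the block is self-contained):

```python
import numpy as np

def mgs_factor(A):
    B, n = A.astype(float).copy(), A.shape[1]
    T, d = np.eye(n), np.zeros(n)
    for k in range(n):
        d[k] = B[:, k] @ B[:, k]
        for j in range(k + 1, n):
            T[k, j] = (B[:, j] @ B[:, k]) / d[k]
            B[:, j] -= T[k, j] * B[:, k]
    return B, T, d

A = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])
b = np.array([1.1, 1.9, 3.2, 3.9])

B, T, d = mgs_factor(A)
rhs = (B.T @ b) / d          # (B*B)^{-1} B* b, since B*B = diag(d)
x = np.linalg.solve(T, rhs)  # solve the triangular system T x = rhs
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```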

  • On the other hand, by the preceding lemma, $\|b - Ax\|_2$ will be a minimum if
    $$A^*(Ax - b) = 0$$
    If $\mathrm{rank}(A) = n$, then the $n \times n$ square matrix $A^*A$ is nonsingular, and in this case there is exactly one least-squares solution; it is determined uniquely by solving the $n \times n$ system of so-called normal equations:
    $$A^*Ax = A^*b$$
    It is easy to see that $A^*A$ is Hermitian and positive definite. (So, Cholesky factorization may be used.)


  • The direct use of the normal equations for solving a least-squares problem seems very appealing because of its conceptual simplicity. However, it is regarded as one of the least satisfactory methods to use on this problem. One reason is that the condition number of $A^*A$ may be considerably worse than that of $A$. For example,
    $$A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix}, \qquad A^*A = \begin{pmatrix} 1 + \epsilon^2 & 1 & 1 \\ 1 & 1 + \epsilon^2 & 1 \\ 1 & 1 & 1 + \epsilon^2 \end{pmatrix}$$
    For small $\epsilon$, in a computer one may have
    $$\mathrm{rank}(A) = 3 \quad \text{but} \quad \mathrm{rank}(A^*A) = 1$$

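
    This loss of rank is easy to reproduce in double precision (a sketch with $\epsilon = 10^{-8}$, so that $1 + \epsilon^2$ rounds to exactly 1):

```python
import numpy as np

eps = 1e-8
A = np.array([[1., 1., 1.],
              [eps, 0., 0.],
              [0., eps, 0.],
              [0., 0., eps]])

# In float64, 1 + eps**2 rounds to 1.0, so A^T A becomes the rank-1
# all-ones matrix even though A itself has full rank.
G = A.T @ A
print(np.linalg.matrix_rank(A))  # 3
print(np.linalg.matrix_rank(G))  # 1
```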

  • Householder’s QR-Factorization

    One of the most useful orthogonal factorizations is called the QR-factorization. The objective is to factor an $m \times n$ matrix $A$ into a product
    $$A_{m \times n} = Q_{m \times m} R_{m \times n}$$
    where $Q$ is a unitary matrix and $R$ is an $m \times n$ upper triangular matrix. The factorization algorithm actually produces
    $$Q^* A_{m \times n} = R_{m \times n}$$


  • $Q^*$ is built up step-by-step as
    $$Q^* = U^{(n-1)} U^{(n-2)} \cdots U^{(1)}$$
    where
    $$U^{(k)}_{m \times m} = \begin{pmatrix} I_{(k-1) \times (k-1)} & 0 \\ 0 & I_{(m-k+1) \times (m-k+1)} - vv^* \end{pmatrix}$$
    with $v \in \mathbb{C}^{m-k+1}$, $\|v\|_2 = \sqrt{2}$. So $U^{(1)} = I_{m \times m} - vv^*$ and we want $U^{(1)} A_1 = \beta_1 e^{(1)}$ with $e^{(1)} = (1, 0, \cdots, 0)^T$. Next, we want
    $$U^{(2)} U^{(1)} A_1 = \beta_1 e^{(1)}, \qquad U^{(2)} U^{(1)} A_2 = (*, \beta_2, 0, \cdots, 0)^T$$
    Finally, $U^{(n-1)} U^{(n-2)} \cdots U^{(1)} A = R_{m \times n}$.


  • Recall (in Sect. 5.2)

    Lemma (1st Lemma on Unitary Matrix)
    For any vector $v \in \mathbb{C}^n$, the matrix $I - vv^*$ is unitary if and only if $\|v\|_2 = \sqrt{2}$ or $v = 0$.

    Lemma (2nd Lemma on Unitary Matrix)
    Let $x, y \in \mathbb{C}^n$ be such that $\|x\|_2 = \|y\|_2$ and $\langle x, y \rangle = y^* x$ is real. Then there exists a unitary matrix $U$ of the form $I - vv^*$ such that $Ux = y$. Here, $v = \dfrac{\sqrt{2}\,(x - y)}{\|x - y\|_2}$.


  • For example, in the 1st step of the QR-factorization, we can put
    $$v = \alpha (A_1 - \beta_1 e^{(1)}) \quad \text{with} \quad \alpha = \sqrt{2} / \|A_1 - \beta_1 e^{(1)}\|_2.$$
    If we choose
    $$\beta_1 = \begin{cases} -\|A_1\|_2\, a_{11} / |a_{11}| & \text{if } a_{11} \neq 0 \\ -\|A_1\|_2 & \text{if } a_{11} = 0 \end{cases}$$
    then
    $$\|A_1\|_2 = \|\beta_1 e^{(1)}\|_2, \qquad \langle A_1, \beta_1 e^{(1)} \rangle = -|a_{11}|\, \|A_1\|_2 \in \mathbb{R}$$
    From the proof of Lemma 2,
    $$U^{(1)} A_1 \equiv (I - vv^*) A_1 = \beta_1 e^{(1)}$$
    Repeat the process for the $(m-1) \times (n-1)$ sub-matrix of $U^{(1)} A \equiv (I - vv^*) A$, and so on.

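
    The construction above, for real matrices, in a short NumPy sketch (`householder_qr` is our name; $Q$ is recovered as the transpose of the accumulated $Q^*$):

```python
import numpy as np

def householder_qr(A):
    """Householder QR for real A via U^(k) = I - v v^T with ||v||_2 = sqrt(2)."""
    R = A.astype(float).copy()
    m, n = R.shape
    Qstar = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        beta = -np.linalg.norm(x) * (np.sign(x[0]) if x[0] != 0 else 1.0)
        w = x - beta * np.eye(len(x))[0]   # A_1 - beta_1 e^(1) on the block
        v = np.sqrt(2.0) * w / np.linalg.norm(w)
        U = np.eye(m)
        U[k:, k:] -= np.outer(v, v)        # I - v v^T on the trailing block
        R = U @ R
        Qstar = U @ Qstar
    return Qstar.T, R                      # Q = (Q*)^T since Q* is orthogonal

A = np.array([[63., 41., -88.],
              [42., 60., 51.],
              [0., -28., 56.],
              [126., 82., -71.]])
Q, R = householder_qr(A)
print(np.round(R, 4))         # matches the R of the example below
print(np.allclose(Q @ R, A))  # True
```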

  • Example (See also P253)
    Find the QR-factorization of the matrix
    $$A = \begin{pmatrix} 63 & 41 & -88 \\ 42 & 60 & 51 \\ 0 & -28 & 56 \\ 126 & 82 & -71 \end{pmatrix}$$

  • $$U^{(1)} A = \frac{1}{35} \begin{pmatrix} -5145 & -3675 & 2940 \\ 0 & 1078 & 2989 \\ 0 & -980 & 1960 \\ 0 & -196 & 1127 \end{pmatrix}$$
    $$U^{(2)} U^{(1)} A = \begin{pmatrix} -147 & -105 & 84 \\ 0 & -42 & -21 \\ 0 & 0 & 96.9231 \\ 0 & 0 & 40.3846 \end{pmatrix}$$
    $$Q^* A = U^{(3)} U^{(2)} U^{(1)} A = 21 \begin{pmatrix} -7 & -5 & 4 \\ 0 & -2 & -1 \\ 0 & 0 & -5 \\ 0 & 0 & 0 \end{pmatrix} = R$$

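
    The arithmetic can be spot-checked against NumPy's built-in QR; the rows of $R$ are determined only up to sign, so we compare absolute values (a sketch):

```python
import numpy as np

A = np.array([[63., 41., -88.],
              [42., 60., 51.],
              [0., -28., 56.],
              [126., 82., -71.]])
R_expected = 21.0 * np.array([[-7., -5., 4.],
                              [0., -2., -1.],
                              [0., 0., -5.]])
Q, R = np.linalg.qr(A)  # reduced QR: Q is 4x3, R is 3x3
print(np.allclose(np.abs(R), np.abs(R_expected)))  # True
```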

  • Exercise (in class)
    Find the QR-factorization of the matrix
    $$A = \begin{pmatrix} 3 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

  • Some Applications of QR-Factorization

    By the QR-factorization theorem: if $A$ is an $n \times n$ real matrix, then there exist an orthogonal matrix $Q$ and an upper triangular matrix $R$ such that $A = QR$.

    The QR method, built on the QR-factorization, is an effective method for computing all the eigenvalues of a matrix. Its basic idea is: let $A_1 \triangleq A$.

    Compute the QR-factorization of $A_1$: $A_1 = Q_1 R_1$. Let $A_2 \triangleq R_1 Q_1$
    =⇒ compute the QR-factorization of $A_2$: $A_2 = Q_2 R_2$. Let $A_3 \triangleq R_2 Q_2$
    =⇒ compute the QR-factorization of $A_3$: $A_3 = Q_3 R_3$. Let $A_4 \triangleq R_3 Q_3$
    =⇒ $\cdots$

    It is easy to see that the matrices in the sequence $\{A_k\}$ are all similar to $A$; if $A$ satisfies certain conditions, $\{A_k\}$ converges to $\mathrm{diag}\{\lambda_1, \cdots, \lambda_n\}$. Interested students may read Section 5.5 for more.

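
    A minimal sketch of this iteration (unshifted; it assumes a symmetric test matrix with distinct eigenvalues so that the plain iteration converges):

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """Basic (unshifted) QR iteration: factor A_k = Q_k R_k, set
    A_{k+1} = R_k Q_k = Q_k^* A_k Q_k, which stays similar to A."""
    Ak = A.astype(float).copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
Ak = qr_algorithm(A)
print(np.round(np.sort(np.diag(Ak)), 6))            # approximate eigenvalues
print(np.round(np.sort(np.linalg.eigvalsh(A)), 6))  # reference values
```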