
Chapter 4

Orthogonality

Po-Ning Chen, Professor

Department of Electrical and Computer Engineering

National Chiao Tung University

Hsin Chu, Taiwan 30010, R.O.C.


4.1 Orthogonality of four subspaces 4-1

• The general relation of the four subspaces:

Answer:

– C(A) is perpendicular (orthogonal) to N(Aᵀ).

– R(A) is perpendicular (orthogonal) to N(A).

• How are the four subspaces related to the system Ax = b?

Answer:

– The set of all attainable b's in Ax = b is C(A).

– The set of all x's satisfying Ax = 0 is N(A).

• However, what if Ax = b has no solution? Can we find x such that ‖e‖² is minimized, where e = b − Ax?

Answer: Yes, we certainly can. This problem has a huge number of applications in practice.


4.1 Orthogonality of four subspaces 4-2

Definition (Orthogonality of subspaces): Two subspaces V and W are said to be orthogonal if

vᵀw = 0 for all v ∈ V and all w ∈ W.

• For orthogonal V and W, V ∩ W = {0} is a set consisting of the single element 0, not the empty set.

• Note that 0ᵀ0 = 0; so, in a sense, 0 is orthogonal to itself.

In fact, 0 is the only vector that is orthogonal to itself.

Exercise. Prove that R(A) ⊥ N(A).

Answer. R(A) contains all vectors of the form Aᵀy, and every vector x in N(A) satisfies Ax = 0. Hence,

(Aᵀy) · x = xᵀAᵀy = (Ax)ᵀy = 0ᵀy = 0.


4.1 Orthogonality of four subspaces 4-3

Exercise (Problem 9). Prove that AᵀA has the same null space as A.

(Problem 9, Section 4.1) If AᵀAx = 0 then Ax = 0. Reason: Ax is in the nullspace of Aᵀ and also in the ______ of A, and those spaces are ______. Conclusion: AᵀA has the same nullspace as A. This key fact is repeated in the next section.

Proof: It suffices to prove that AᵀAx = 0 if, and only if, Ax = 0.

Note that Ax = 0 implying AᵀAx = 0 trivially holds.

• AᵀAx = 0 ⇒ Ax ∈ N(Aᵀ).

• By definition of the column space of A, Ax ∈ C(A).

• So, Ax ⊥ Ax, which implies Ax = 0.

An alternative proof: Let B = AᵀA. Then R(B) = R(A), which implies N(B) = N(A), because the null space is the orthogonal complement (see the next slide) of the row space. □
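This fact is easy to check numerically. A minimal MATLAB sketch (the rank-1 matrix below is an arbitrary example, not from the slides):

% Check that A'A has the same null space as A
A = [1 2; 3 6];    % an arbitrary rank-1 example
null(A)            % orthonormal basis of N(A): approximately [-2; 1]/sqrt(5)
null(A'*A)         % spans the same line (possibly with the opposite sign)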


4.1 Orthogonal complement 4-4

Definition (Orthogonal complement): The orthogonal complement of V consists of all vectors that are perpendicular to every vector in V. It is denoted by V⊥ and is pronounced "V perp."

• R(A) and N(A) are not just orthogonal to each other, but are orthogonal complements of each other!

N(A)⊥ = C(Aᵀ).

Examples of orthogonal subspaces that are not orthogonal complements of each other:

1. {0} and the xy-plane in ℝ³.

2. Two perpendicular lines in ℝ³.


4.1 Orthogonal complement 4-5

Important fact about orthogonal complements V and V⊥:

• Every vector v can be expressed as v = vᵣ + vₙ, where vᵣ ∈ V and vₙ ∈ V⊥.

• vᵣ is the projection of v onto V, and vₙ is the projection of v onto V⊥.

Example.

1. Given A (m×n), every v (n×1) can be expressed as

v = x^(r) + x^(n),

where x^(r) = Aᵀw for some w ∈ ℝᵐ (so x^(r) ∈ R(A)) and Ax^(n) = 0 (so x^(n) ∈ N(A)).

2. Given A (m×n), every v (m×1) can be expressed as

v = x^(c) + x^(ln),

where x^(c) = Aw for some w ∈ ℝⁿ (so x^(c) ∈ C(A)) and Aᵀx^(ln) = 0 (so x^(ln) ∈ N(Aᵀ)).


4.1 Orthogonal complement 4-6

Lemma (Uniqueness of projection): The projection x^(r) of a vector v onto the row space is unique.

Proof:

• Suppose x^(r) and x̃^(r) are both projections of v (onto the row space of A).

• x^(r) ∈ R(A) and x̃^(r) ∈ R(A) ⇒ x^(r) − x̃^(r) ∈ R(A) (property of a vector space).

• Av = A(x^(r) + x^(n)) = Ax^(r), and likewise Av = Ax̃^(r) ⇒ A(x^(r) − x̃^(r)) = 0

⇒ x^(r) − x̃^(r) ∈ N(A).

• So x^(r) − x̃^(r) lies in both R(A) and N(A); since R(A) ⊥ N(A), x^(r) − x̃^(r) = 0. □

A (m×n) defines a (projection, possibly many-to-one) function mapping from ℝⁿ to R(A)!


4.1 Orthogonal complement 4-7

Lemma: A (m×n) defines a (one-to-one) function mapping from R(A) to C(A)!

Proof:

• v = Ax^(r) ∈ C(A), where x^(r) ∈ R(A). (I.e., x^(r) maps to v = Ax^(r).)

• If Ax^(r) = Ax̃^(r), then x^(r) − x̃^(r) ∈ R(A) ∩ N(A) = {0}, so x^(r) = x̃^(r). (I.e., the vector that maps to v = Ax^(r) is unique!) □

• This lemma also agrees with the fact that C(A) and R(A) have the same dimension.

• Since the inverse mapping from C(A) to R(A) exists, A contains an r × r invertible submatrix.

Just pick the r pivot rows and pivot columns to form the invertible submatrix!


4.1 How to find the projection? 4-8

• (Example 5) Find the projections of x = [4; 3] onto R(A) and N(A), where A = [1 2; 3 6].

Answer: Take inner products with orthonormal bases.

– rref(A) = [1 2; 0 0].

– An orthonormal basis for R(A) is b^(r) = (1/√5)[1; 2]. Then

x^(r) = (x · b^(r))b^(r) (= (‖x‖ cos θ)b^(r)) = 2[1; 2] = [2; 4].

– An orthonormal basis for N(A) is b^(n) = (1/√5)[−2; 1]. Then

x^(n) = (x · b^(n))b^(n) = −[−2; 1] = [2; −1].
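The same numbers can be reproduced in MATLAB; a minimal sketch of the computation above:

% Example 5: project x onto R(A) and N(A) via orthonormal bases
x  = [4; 3];
br = [1; 2]/sqrt(5);     % orthonormal basis of R(A)
bn = [-2; 1]/sqrt(5);    % orthonormal basis of N(A)
xr = (br'*x)*br          % = [2; 4]
xn = (bn'*x)*bn          % = [2; -1]; note that xr + xn = x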


4.1 How to find the projection? 4-9

• (Problem 4.1B(d)) Find the projections of x = [6; 4; 5] onto R(A) and N(A), where A = [1 −3 −4].

Answer: Alternatively, solve for the coefficients c in the linear system Bc = x, where the columns of B are bases for N(A) and R(A).

– A basis for R(A) is [1; −3; −4].

– A basis for N(A) is {[3; 1; 0], [4; 0; 1]}.

– Then, Gauss-Jordan on

[3 4 1; 1 0 −3; 0 1 −4] c = [6; 4; 5] gives c = [1; 1; −1],

so

x^(n) = [3 4; 1 0; 0 1][1; 1] = [7; 1; 1] and x^(r) = [1; −3; −4](−1) = [−1; 3; 4].
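A minimal MATLAB sketch of this solve (B is the basis matrix built above):

% Problem 4.1B(d): split x into null-space and row-space components
x = [6; 4; 5];
B = [3 4 1; 1 0 -3; 0 1 -4];   % columns: basis of N(A), then basis of R(A)
c = B\x;                       % = [1; 1; -1]
xn = B(:,1:2)*c(1:2)           % = [7; 1; 1]
xr = B(:,3)*c(3)               % = [-1; 3; 4]; note that xn + xr = x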


4.2 Projections 4-10

Definition (Projection matrix): For a linear subspace V, we can project any vector b onto this subspace.

Such a projection p can be obtained through a projection matrix P, i.e.,

p = Pb.

• If b ∈ V, then its projection onto V is itself.

• Hence, Pp = p and P²p = Pp = p and . . . and Pⁿp = p.

• Let bᵢ be the vector with all components being 0 except the ith component, which is 1. Then, since Pbᵢ ∈ V, we have

P(Pbᵢ) = Pbᵢ.

By collecting all the bᵢ to form the identity matrix, we obtain

P(PI) = PI ⇒ P² = P.

The projection matrix P satisfies P² = P.


4.2 Projections 4-11

Examples of projection matrices.

• Onto the z-axis, P₁ = [0 0 0; 0 0 0; 0 0 1].

• Onto the xy-plane, P₂ = [1 0 0; 0 1 0; 0 0 0].

Since the z-axis and the xy-plane are orthogonal complements, we have

P₁b + P₂b = b = Ib ⇒ (P₁ + P₂ − I)b = 0 for every b

⇒ P₁ + P₂ = I for projection matrices of orthogonal-complement subspaces.
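Both properties can be checked numerically; a tiny MATLAB sketch:

% P1, P2: projection matrices onto the z-axis and the xy-plane
P1 = diag([0 0 1]);
P2 = diag([1 1 0]);
isequal(P1*P1, P1)        % idempotence: P^2 = P
isequal(P1 + P2, eye(3))  % complementary projection matrices sum to I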


4.2 Orthogonality and projections 4-12

• p is the projection of b onto a subspace V.

Then, p and b − p are orthogonal.

Proof:

– Let V⊥ be the orthogonal complement of V.

– Let P and P⊥ be the projection matrices of V and V⊥, respectively.

– Then,

Pb + P⊥b = p + p⊥ = b

⇒ p⊥ = b − p

⇒ p⊥ · p = 0 by definition of the orthogonal complement.


4.2 How to determine the projection based on orthogonal vectors? 4-13

Suppose the subspace V has dimension r and a₁, . . . , aᵣ are linearly independent vectors that span V (i.e., they form a basis of V).

⇒ The projection p (of b) can be written as

p = x̂₁a₁ + · · · + x̂ᵣaᵣ = [a₁ a₂ · · · aᵣ][x̂₁; x̂₂; . . . ; x̂ᵣ] = Ax̂, where A = [a₁ a₂ · · · aᵣ].

⇒ (b − p) ∈ V⊥ gives

a₁ᵀ(b − p) = 0, a₂ᵀ(b − p) = 0, . . . , aᵣᵀ(b − p) = 0,

i.e., Aᵀ(b − Ax̂) = 0 ⇒ AᵀAx̂ = Aᵀb.

This concludes (if AᵀA is invertible):

x̂ = (AᵀA)⁻¹Aᵀb and p = A(AᵀA)⁻¹Aᵀb and P = A(AᵀA)⁻¹Aᵀ.
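A concrete MATLAB sketch of these formulas (the 3×2 matrix and the vector b below are arbitrary illustrative choices):

% Projection onto C(A) when A'A is invertible
A = [1 0; 0 1; 1 1];  b = [1; 2; 4];
xhat = (A'*A)\(A'*b);    % solve the normal equations
p = A*xhat;              % projection of b onto C(A)
P = A*((A'*A)\A');       % projection matrix; P^2 = P and P' = P
A'*(b - p)               % ~ [0; 0]: the error is orthogonal to C(A)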


4.2 How to determine the projection based on orthogonal vectors? 4-14

Example. Find the projection of b onto a one-dimensional subspace V with a ∈ V and a ≠ 0.

Answer: We know A = [a]. Hence,

p = A(AᵀA)⁻¹Aᵀb = a(aᵀa)⁻¹aᵀb = (aᵀb/aᵀa) a = ((b · a)/(a · a)) a.

• P = A(AᵀA)⁻¹Aᵀ implies Pᵀ = P.

The projection matrix onto a space V can always be represented in the form A(AᵀA)⁻¹Aᵀ, where the columns of A form a basis of V. So, Pᵀ = P is always true for a projection matrix.


4.2 Projection matrix and its column space 4-15

Tip: From slide 3-65, P (m×m) = B (m×r) C (r×m) (and the rank of all three matrices is r). Then,

C(P) = C(B) and R(P) = R(C).

Hence,

{p : p = Pb for some b ∈ ℝᵐ} is the column space V of P = A(AᵀA)⁻¹Aᵀ, and also of A,

and

{p : p = P⊥b = (I − P)b for some b ∈ ℝᵐ} is the left nullspace V⊥ of P = A(AᵀA)⁻¹Aᵀ, and also of A.

What if AᵀA is not invertible?


4.2 How to determine the projection onto column space? 4-16

Suppose A (m×n) has rank r.

Note that if r < n, then AᵀA (n×n) is not invertible!

This gives:

Ax̂ = Ā(ĀᵀĀ)⁻¹Āᵀb and p = Ā(ĀᵀĀ)⁻¹Āᵀb and P = Ā(ĀᵀĀ)⁻¹Āᵀ,

where Ā is formed by the r linearly independent (usually, pivot) columns of A, and the matrix P projects b onto the column space of Ā (equivalently, of A).


4.2 How to determine the projection onto column space? 4-17

Example. A = [1 0 1; 0 1 1; 0 0 0]. Find the projection matrix onto C(A).

Answer:

• First, we note the column space of A is the xy-plane.

• Taking Ā = [1 0; 0 1; 0 0] gives P = Ā(ĀᵀĀ)⁻¹Āᵀ = [1 0 0; 0 1 0; 0 0 0].

• Taking Ā = [1 1; 0 1; 0 0] gives P = Ā(ĀᵀĀ)⁻¹Āᵀ = [1 0 0; 0 1 0; 0 0 0].

• Taking Ā = [0 1; 1 1; 0 0] gives P = Ā(ĀᵀĀ)⁻¹Āᵀ = [1 0 0; 0 1 0; 0 0 0].
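A short MATLAB sketch confirming that any two independent columns spanning the xy-plane give the same P:

% Three different bases of the xy-plane, one projection matrix
Pof = @(A) A*((A'*A)\A');
Pof([1 0; 0 1; 0 0])
Pof([1 1; 0 1; 0 0])
Pof([0 1; 1 1; 0 0])     % all three print diag([1 1 0])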


4.2 How to determine the projection onto column space? 4-18

Example. A = [1 1; 1 1; 1 1] and b = [1; 0; 0]. Find x̂ that minimizes ‖b − Ax̂‖².

Answer:

• Ā = [1; 1; 1] ⇒ P = Ā(ĀᵀĀ)⁻¹Āᵀ = (1/3)[1 1 1; 1 1 1; 1 1 1].

• Then, Ax̂ = Pb = [1/3; 1/3; 1/3], i.e., x̂₁ + x̂₂ = 1/3. □
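Here AᵀA = [3 3; 3 3] is singular, so the inverse formula cannot be applied to A itself; a MATLAB sketch of the workaround above:

% Rank-deficient least squares via the pivot column
A = [1 1; 1 1; 1 1];  b = [1; 0; 0];
rank(A'*A)            % = 1 < 2: A'A is not invertible
a = A(:,1);           % the pivot column spans C(A)
P = a*((a'*a)\a')     % = ones(3)/3
P*b                   % = [1/3; 1/3; 1/3]; any xhat with x1 + x2 = 1/3 attains it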


4.2 How to determine the projection onto column space? 4-19

(Problem 21, Section 4.3) Continue from the previous example. Which of the four spaces contains the error vector e (= b − Ax̂)? Which contains p (= b − e)? Which contains x̂? What is the nullspace of A?

Answer: It can be derived that

C(A) = {b ∈ ℝ³ : b₁ = b₂ = b₃},  R(A) = {b ∈ ℝ² : b₁ = b₂},
N(A) = {x ∈ ℝ² : x₁ + x₂ = 0},  N(Aᵀ) = {x ∈ ℝ³ : x₁ + x₂ + x₃ = 0}.

So, e ∈ N(Aᵀ), p ∈ C(A), one of the x̂'s is in R(A) (but x̂ can never be in N(A), since Ax̂ = Pb ≠ 0), and N(A) is a line in ℝ². □

• Final note. When n = r, the answer reduces to what is provided in the textbook, i.e., x̂ ∈ R(A) = ℝⁿ and N(A) = {0}.


4.3 Least squares approximations 4-20

Question: Find the solution of Ax̂ = b given that b ∈ C(A).

Answer:

• The solution can be found using methods taught before (e.g., Gauss-Jordan).

• e = b − Ax̂ satisfies ‖e‖² = 0.

Question: Find x̂ such that ‖b − Ax̂‖² is minimized (where b is not necessarily in C(A)).

Answer:

• This is to find x̂ such that Ax̂ = Pb, where P is the projection matrix of C(A). The solution will minimize ‖e‖² = ‖b − Ax̂‖².

• If AᵀA is invertible, then P = A(AᵀA)⁻¹Aᵀ. So, the problem is equivalent to:

Ax̂ = A(AᵀA)⁻¹Aᵀb ⇔ AᵀAx̂ = Aᵀb.

• If AᵀA is not invertible, then P = Ā(ĀᵀĀ)⁻¹Āᵀ, where Ā consists of r linearly independent (pivot) columns of A. So, the problem is equivalent to:

Ax̂ = Ā(ĀᵀĀ)⁻¹Āᵀb ⇔ ĀᵀAx̂ = Āᵀb ⇔ AᵀAx̂ = Aᵀb.


4.3 Least squares approximations 4-21

Example. A = [1 1; 1 1; 1 1] and b = [1; 0; 0]. Find x̂ that minimizes ‖b − Ax̂‖².

Answer:

• Ā = [1; 1; 1] ⇒ P = Ā(ĀᵀĀ)⁻¹Āᵀ = (1/3)[1 1 1; 1 1 1; 1 1 1].

• Hence,

– Ax̂ = Ā(ĀᵀĀ)⁻¹Āᵀb is equivalent to x̂₁ + x̂₂ = 1/3.

– ĀᵀAx̂ = Āᵀb is equivalent to 3x̂₁ + 3x̂₂ = 1.

– AᵀAx̂ = Aᵀb is equivalent to 3x̂₁ + 3x̂₂ = 1.

So the above three systems are equivalent. □


4.3 Least squares approximations 4-22

Example. A = [1 1 1; 1 1 0] and b = [1; 0]. Find x̂ that minimizes ‖b − Ax̂‖².

Answer:

• Ā = [1 1; 1 0] ⇒ P = Ā(ĀᵀĀ)⁻¹Āᵀ = [1 0; 0 1].

• Hence,

– Ax̂ = Ā(ĀᵀĀ)⁻¹Āᵀb is equivalent to {x̂₁ + x̂₂ = 0, x̂₃ = 1}.

– ĀᵀAx̂ = Āᵀb is equivalent to {x̂₁ + x̂₂ = 0, x̂₃ = 1}.

– AᵀAx̂ = Aᵀb is equivalent to {x̂₁ + x̂₂ = 0, x̂₃ = 1}.

So the above three systems are equivalent. □

Normal equations AᵀAx̂ = Aᵀb give us the least squares approximation.
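For this example, MATLAB's pinv also returns a least squares solution (in fact the minimum-norm one); a sketch, not part of the slides:

% A'A is singular here, but the pseudoinverse still solves the problem
A = [1 1 1; 1 1 0];  b = [1; 0];
xhat = pinv(A)*b      % = [0; 0; 1], which satisfies x1 + x2 = 0 and x3 = 1
A*xhat - b            % = [0; 0], since b lies in C(A)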


4.3 Least squares approximations 4-23

Applications of least squares approximation

• Fitting a straight line: To find the line that minimizes the squared errors to a group of points.

– A line (excluding x₁ = constant) in ℝ² can be parameterized via t, under fixed constants D and C, as

x = [x₁; x₂] = t[1; D] + [0; C] = td + c = [t; C + tD].

– The distance of a point u = [t; b] to this line can be calculated by ‖v‖, where v = (u − c) − q, and q is the projection of u − c onto the line td.

[Figure: the point u and the line td + c; u − c decomposes into its projection q along the direction d plus the perpendicular component v.]


4.3 Least squares approximations 4-24

• So

q = d(dᵀd)⁻¹dᵀ(u − c) = (1/(1 + D²)) [1 D; D D²] (u − c).

Then

‖v‖² = ‖(u − c) − q‖² = ‖(u − c) − d(dᵀd)⁻¹dᵀ(u − c)‖²

= ‖(1/(1 + D²)) [D² −D; −D 1] [t − 0; b − C]‖²   (where [t − 0; b − C] = u − c)

= ‖(b − C − tD) [−D/(1 + D²); 1/(1 + D²)]‖²   (the vector inside the norm is v)

= (b − C − tD)²/(1 + D²) = (1/(1 + D²)) ‖b − [1 t][C; D]‖².

• For a sequence of vectors u₁, u₂, . . . , uₖ, their total squared distance to the line is

∑ᵢ₌₁ᵏ ‖vᵢ‖² = (1/(1 + D²)) ‖b − A[C; D]‖², where b = [b₁; . . . ; bₖ] and A = [1 t₁; . . . ; 1 tₖ].


4.3 Least squares approximations 4-25

However, the error (in the textbook) to be minimized is not

min_{C,D} ∑ᵢ₌₁ᵏ ‖vᵢ‖² = min_{C,D} (1/(1 + D²)) ‖b − A[C; D]‖²,

whose prefactor 1/(1 + D²) is a function of D, but

min_{C,D} (1 + D²) ∑ᵢ₌₁ᵏ ‖vᵢ‖² = min_{C,D} ‖[e₁; . . . ; eₖ]‖² = min_{C,D} ‖b − A[C; D]‖²,

where eᵢ = bᵢ − (C + tᵢD), or e = b − Ax̂.

Note that from the previous slide, vᵢ = eᵢ[−D/(1 + D²); 1/(1 + D²)], or

[v₁ · · · vₖ] = [−D/(1 + D²); 1/(1 + D²)][e₁ · · · eₖ].


4.3 Least squares approximations 4-26

(Problem 1, Section 4.3) Find the best straight line that fits the points

[t; b] = [0; 0], [1; 8], [3; 8] and [4; 20].

Give the errors for each point.

Answer:

• ∑ᵢ₌₁⁴ ‖vᵢ‖² = (1/(1 + D²)) ‖b − A[C; D]‖² and ‖e‖² = ‖b − A[C; D]‖², where b = [0; 8; 8; 20] and A = [1 0; 1 1; 1 3; 1 4].

• Solving AᵀAx̂ = Aᵀb, i.e., [4 8; 8 26] x̂ = [36; 112], yields x̂ = [C; D] = [1; 4].

• [v₁ v₂ v₃ v₄] = [−D/(1 + D²); 1/(1 + D²)](b − Ax̂)ᵀ = [−4/17; 1/17][−1 3 −5 3].

• [e₁ e₂ e₃ e₄] = (b − Ax̂)ᵀ = [−1 3 −5 3].
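A MATLAB sketch of the normal-equation solve for this problem:

% Problem 1, Section 4.3: best line b = C + D*t
A = [1 0; 1 1; 1 3; 1 4];  b = [0; 8; 8; 20];
xhat = (A'*A)\(A'*b)   % = [1; 4], i.e., the best line is b = 1 + 4t
e = b - A*xhat         % = [-1; 3; -5; 3]
norm(e)^2              % = 44, the minimized sum of squared errors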


4.3 Least squares approximations 4-27

[Figure: the four data points and the best line b = 1 + 4t, with t-axis ticks 1-4 and b-axis ticks 4-20; at each point both error segments described below are drawn.]

The red line segment is b − (C + tD) (the direct difference).

The blue line segment is the true error, i.e., the projection of the direct difference onto the vector perpendicular to the line C + tD.

Note that minimizing the two does not necessarily yield the same minimizers, i.e., C∗ and D∗.

For the application of least squares approximation just taught, the textbook defines the error to be eᵢ = bᵢ − (C + tᵢD), and the projection to be pᵢ = C + tᵢD.

See Figure 4.9 on page 227 or the next page.


4.3 Least squares approximations 4-28

(Problem 1, Section 4.3) With b = 0, 8, 8, 20 at t = 0, 1, 3, 4, set up and solve the normal equations AᵀAx̂ = Aᵀb. For the best straight line in Figure 4.9a, find its four heights pᵢ and four errors eᵢ. What is the minimum value E = e₁² + e₂² + e₃² + e₄²?

Figure 4.9: [reproduced from the textbook]


4.3 Least squares approximations 4-29

• Do not be confused: b − Ax̂ = e = [e₁; e₂; e₃; e₄] = [−1; 3; −5; 3] is perpendicular to the columns of A = [1 0; 1 1; 1 3; 1 4], i.e., e is perpendicular to the column space of A.

• In the above, ‖e‖ is the shortest distance from the vector b to the column space of A.


4.3 Least squares approximations 4-30

• For simplicity, in this textbook, we will directly plug the points into the target equation and find the best-fit curve.

Example. Find the best straight line b = C (respectively, the best parabola b = C + tD + t²E) that fits the points

[t; b] = [0; 0], [1; 8], [3; 8] and [4; 20].

Answer. Form the equations as follows:

0 = C|_{t=0}, 8 = C|_{t=1}, 8 = C|_{t=3}, 20 = C|_{t=4} ⇒ [1; 1; 1; 1][C] = [0; 8; 8; 20];

respectively,

0 = C + 0·D + 0·E
8 = C + 1·D + 1·E
8 = C + 3·D + 9·E
20 = C + 4·D + 16·E

⇒ [1 0 0; 1 1 1; 1 3 9; 1 4 16][C; D; E] = [0; 8; 8; 20].


4.3 Least squares approximations 4-31

Question: Find x̂ such that ‖b − Ax̂‖² is minimized (where b is not necessarily in C(A)).

Answer:

• This is to find x̂ satisfying the normal equations AᵀAx̂ = Aᵀb.

Calculus interpretation: Suppose all components are real.

• We wish to minimize E ≜ ‖b − Ax̂‖² = ∑ᵢ₌₁ᵐ (bᵢ − [aᵢ,₁ aᵢ,₂ · · · aᵢ,ₙ]x̂)²; hence, the solution satisfies

∂E/∂x̂ⱼ = ∑ᵢ₌₁ᵐ 2(−aᵢ,ⱼ)(bᵢ − [aᵢ,₁ aᵢ,₂ · · · aᵢ,ₙ]x̂) = −2[a₁,ⱼ a₂,ⱼ · · · aₘ,ⱼ](b − Ax̂) = 0 for j = 1, . . . , n.

I.e., Aᵀ(b − Ax̂) = 0, which coincides with the normal equations.

Summary: The derivative of ‖b − Ax‖² with respect to x is −2Aᵀ(b − Ax).
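The gradient formula can be spot-checked against finite differences; a minimal MATLAB sketch (the data are the earlier line-fitting example, and the test point x is arbitrary):

% Compare -2A'(b - Ax) with a central-difference gradient
A = [1 0; 1 1; 1 3; 1 4];  b = [0; 8; 8; 20];  x = [1; 1];
E = @(x) norm(b - A*x)^2;
g = -2*A'*(b - A*x);              % claimed gradient
h = 1e-6;  gd = zeros(2,1);
for j = 1:2
    d = zeros(2,1);  d(j) = h;
    gd(j) = (E(x + d) - E(x - d))/(2*h);
end
[g gd]                            % the two columns agree to about 1e-6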


4.3 Example 4-32

Example 4.3B. Find the parabola C + Dt + Et² that comes closest (in least squares error) to the five points

[t; b] = [−2; 0], [−1; 0], [0; 1], [1; 0] and [2; 0].

Answer.

• Write down the five equations:

C + D·(−2) + E·(−2)² = 0
C + D·(−1) + E·(−1)² = 0
C + D·(0) + E·(0)² = 1
C + D·(1) + E·(1)² = 0
C + D·(2) + E·(2)² = 0

⇒ Ax̂ = b, where A = [1 (−2) (−2)²; 1 (−1) (−1)²; 1 (0) (0)²; 1 (1) (1)²; 1 (2) (2)²], x̂ = [C; D; E] and b = [0; 0; 1; 0; 0].

• Since Ax̂ = b has no solution, we turn to the normal equations AᵀAx̂ = Aᵀb. The solution will minimize ‖b − Ax̂‖². □
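Carrying out the solve in MATLAB (a sketch; here AᵀA = [5 0 10; 0 10 0; 10 0 34] and Aᵀb = [1; 0; 0]):

% Example 4.3B: least squares parabola through the five points
t = [-2 -1 0 1 2]';  b = [0 0 1 0 0]';
A = [ones(5,1) t t.^2];
xhat = (A'*A)\(A'*b)   % = [17/35; 0; -1/7], i.e., b = 17/35 - t^2/7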


4.4 Orthogonal bases and Gram-Schmidt 4-33

• The problem of solving Ax = b (or minimizing ‖b − Ax‖²) will become easy if the columns of A are orthogonal.

• The reason for the above claim is obvious, since AᵀA becomes a diagonal matrix! The problem then reduces to:

AᵀAx = x = Aᵀb if AᵀA = I.

Definition (Orthonormal basis): The vectors q₁, q₂, . . . , qₙ are orthonormal if

qᵢᵀqⱼ = 0 when i ≠ j (orthogonality) and qᵢᵀqⱼ = 1 when i = j (unit vector).

• Q = [q₁ q₂ · · · qₙ] is sometimes called an orthonormal matrix.

Example (Orthonormal matrix). Every permutation matrix (e.g., [0 0 1; 1 0 0; 0 1 0]) is an orthonormal matrix. □


4.4 Orthogonal bases and Gram-Schmidt 4-34

An important fact about square orthonormal matrices:

• A square orthonormal matrix Q satisfies Qᵀ = Q⁻¹.

Proof: A direct consequence of QᵀQ = I. □

Example (Reflection matrix). Another interesting example of an orthonormal matrix Q arises when, for a unit vector q,

Q = I − 2qqᵀ = (I − qqᵀ) − qqᵀ = P⊥ − P.

Tip: Why is it called a reflection matrix?

• The projection matrix onto the subspace spanned by a unit vector q is

P = A(AᵀA)⁻¹Aᵀ = q(qᵀq)⁻¹qᵀ = qqᵀ.

• The projection matrix P⊥ onto the orthogonal complement of the space spanned by q is

P⊥ = I − qqᵀ.
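A tiny MATLAB sketch of the reflection (the unit vector q below is an arbitrary choice):

% Reflection across the orthogonal complement of q
q = [1; 0];
Q = eye(2) - 2*(q*q');
Q'*Q              % = I, so Q is indeed orthonormal
Q*[3; 5]          % = [-3; 5]: the component along q flips sign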


4.4 Orthogonal bases and Gram-Schmidt 4-35

• So, the reflection matrix gives Qb = P⊥b − Pb = p⊥ − p; hence, Qb is the "mirror" of b with respect to the orthogonal complement of the subspace spanned by q.

[Figure: b decomposed as p + p⊥; flipping p to −p yields Qb = p⊥ − p, the mirror image of b across the orthogonal complement.]


4.4 Properties of orthonormal matrix mapping 4-36

1. Length invariance: ‖Qx‖ = ‖x‖.

2. Angle invariance: (Qx)ᵀ(Qy) = xᵀy.

Proof: We only need to prove the second property (since the first property is a special case of the second property), which can easily be seen from

(Qx)ᵀ(Qy) = xᵀQᵀQy = xᵀIy = xᵀy. □

With A replaced by Q, we have

x̂ = (AᵀA)⁻¹Aᵀb = Qᵀb and p = A(AᵀA)⁻¹Aᵀb = QQᵀb and P = A(AᵀA)⁻¹Aᵀ = QQᵀ.

Further assuming that Q is a square matrix, we have QQᵀ = I and

x̂ = Qᵀb = [q₁ᵀb; . . . ; qₙᵀb] and p = QQᵀb = b and P = QQᵀ = I.

Example. Q = [cos θ −sin θ; sin θ cos θ] rotates b counterclockwise by the angle θ.


4.4 Gram-Schmidt procedure 4-37

Observations

• Under the assumption that Q is not necessarily a square matrix (and hence, possibly, p ≠ b), the error

e (= b − Qx̂) = b − x̂₁q₁ − x̂₂q₂ − · · · − x̂ₙqₙ

= b − (q₁ᵀb)q₁ − (q₂ᵀb)q₂ − · · · − (qₙᵀb)qₙ

= b − [q₁ q₂ · · · qₙ][q₁ᵀ; q₂ᵀ; . . . ; qₙᵀ]b = b − QQᵀb = (I − QQᵀ)b

(with Q of size m×n and I of size m×m) is perpendicular to q₁, q₂, · · ·, qₙ (i.e., is perpendicular to the column space of Q).

• If ‖e‖ ≠ 0, then b cannot be completely represented by Q.

– This means that q₁, q₂, . . . , qₙ are not enough to span the space of interest.

– We can then add a new vector qₙ₊₁ = e/‖e‖ to complete the basis for this space.

• This leads to the basic idea of the Gram-Schmidt procedure.


4.4 Gram-Schmidt procedure 4-38

The Gram-Schmidt procedure is used to identify the dimension spanned by a sequence of vectors b₁, b₂, . . . , bₙ and to find an orthonormal basis for this space.

(Modified) Gram-Schmidt procedure

Initialization: A sequence of (non-zero) vectors b₁, b₂, . . . , bₙ.

1. q₁ = e₁/‖e₁‖, where e₁ = b₁.

2. q₂ = e₂/‖e₂‖, where e₂ = b₂ − (q₁ᵀb₂)q₁, if ‖e₂‖ is not zero.

3. q₃ = e₃/‖e₃‖, where e₃ = b₃ − (q₁ᵀb₃)q₁ − (q₂ᵀb₃)q₂ = b₃ − [q₁ q₂][q₁ᵀ; q₂ᵀ]b₃ = (I − QQᵀ)b₃ with Q = [q₁ q₂], if ‖e₃‖ is not zero.

4. · · ·

5. Repeat the above procedure until bₙ is examined.

Note that, by the basic property of projections, e₂ is perpendicular to q₁, e₃ is perpendicular to q₁ and q₂, etc. See the first observation on the previous slide.


4.4 QR decomposition 4-39

• The Gram-Schmidt procedure actually changes A = [b₁ b₂ · · · bₙ] into QR:

A = [b₁  b₂  b₃  · · ·] (m×n)

= [e₁  e₂ + (q₁ᵀb₂)q₁  e₃ + (q₁ᵀb₃)q₁ + (q₂ᵀb₃)q₂  · · ·] (m×n)

= [‖e₁‖q₁  ‖e₂‖q₂ + (q₁ᵀb₂)q₁  ‖e₃‖q₃ + (q₁ᵀb₃)q₁ + (q₂ᵀb₃)q₂  · · ·] (m×n)

= [q₁ q₂ q₃ · · ·] (m×r) [‖e₁‖ q₁ᵀb₂ q₁ᵀb₃ · · ·; 0 ‖e₂‖ q₂ᵀb₃ · · ·; 0 0 ‖e₃‖ · · ·; . . .] (r×n)

= [q₁ q₂ q₃ · · ·] (m×r) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃ · · ·; 0 q₂ᵀb₂ q₂ᵀb₃ · · ·; 0 0 q₃ᵀb₃ · · ·; . . .] (r×n),

where the last step follows from, for example:

‖e₃‖ = (‖e₃‖q₃ᵀ)q₃ = e₃ᵀq₃ = (b₃ᵀ − (q₁ᵀb₃)q₁ᵀ − (q₂ᵀb₃)q₂ᵀ)q₃ = b₃ᵀq₃.


4.4 QR decomposition 4-40

• Can we decompose or factorize A into QR such that Q is an orthonormal matrix and R is an upper triangular matrix?

Answer: Yes, by the Gram-Schmidt procedure.

Notably, the upper triangular R is often taken to be a square matrix when m ≥ n in, e.g., MATLAB. Hence,

A (m×n) = Q (m×n) R (n×n).

What if r < n? Suppose r = 2 < n = 3, and

e₁ = (q₁ᵀb₁)q₁ = b₁ ≠ 0 ⇒ q₁ = e₁/‖e₁‖

e₂ = b₂ − (q₁ᵀb₂)q₁ = 0 ⇒ q₂ ??

e₃ = (q₃ᵀb₃)q₃ = b₃ − (q₁ᵀb₃)q₁ ≠ 0 ⇒ q₃ = e₃/‖e₃‖

⇒ b₁ = (q₁ᵀb₁)q₁, b₂ = (q₁ᵀb₂)q₁, b₃ = (q₁ᵀb₃)q₁ + (q₃ᵀb₃)q₃

⇒ A = [b₁ b₂ b₃] (m×3) = [q₁ q₃] (m×2) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 q₃ᵀb₃] (2×3)

= [q₁ q₂ q₃] (m×3) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 0; 0 0 q₃ᵀb₃] (3×3)


4.4 QR decomposition 4-41

Example. A = [1 1 1; 0 0 1; 0 0 0].

e₁ = b₁ = [1; 0; 0] ⇒ q₁ = e₁/‖e₁‖ = [1; 0; 0]

e₂ = b₂ − (q₁ᵀb₂)q₁ = [1; 0; 0] − [1; 0; 0] = [0; 0; 0] ⇒ q₂ ??

e₃ = b₃ − (q₁ᵀb₃)q₁ = [1; 1; 0] − [1; 0; 0] = [0; 1; 0] ⇒ q₃ = e₃/‖e₃‖ = [0; 1; 0]


4.4 QR decomposition 4-42

Choose q₂ to be a unit vector orthogonal to both q₁ and q₃. Then

⇒ A = [b₁ b₂ b₃] (3×3) = [q₁ q₃] (3×2) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 q₃ᵀb₃] (2×3)

(Answer 1) = [1 0; 0 1; 0 0] [1 1 1; 0 0 1]

= [q₁ q₂ q₃] (3×3) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 0; 0 0 q₃ᵀb₃] (3×3)

(Answer 2) = [1 0 0; 0 0 1; 0 1 0] [1 1 1; 0 0 0; 0 0 1]

(Answer 3) = [1 0 0; 0 1 0; 0 0 1] [1 1 1; 0 0 1; 0 0 0]


4.4 QR decomposition 4-43

Example. A = [1 1 1; 0 0 1].

e₁ = b₁ = [1; 0] ⇒ q₁ = e₁/‖e₁‖ = [1; 0]

e₂ = b₂ − (q₁ᵀb₂)q₁ = [1; 0] − [1; 0] = [0; 0] ⇒ q₂ ??

e₃ = b₃ − (q₁ᵀb₃)q₁ = [1; 1] − [1; 0] = [0; 1] ⇒ q₃ = e₃/‖e₃‖ = [0; 1]

⇒ A = [b₁ b₂ b₃] (2×3) = [q₁ q₃] (2×2) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 q₃ᵀb₃] (2×3) = [1 0; 0 1] [1 1 1; 0 0 1]


4.4 QR decomposition 4-44

• In most cases A (m×n) has rank r = n; hence, A (m×n) = Q (m×n) R (n×n) and R is invertible. This simplifies

AᵀAx̂ = Aᵀb

to

AᵀAx̂ = Aᵀb ⇔ RᵀQᵀQRx̂ = RᵀQᵀb ⇔ Rx̂ = Qᵀb,

which, by the upper triangularity of R, can be solved directly by back substitution!
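In MATLAB this is the economy-size QR followed by a triangular solve; a sketch reusing the earlier line-fitting data:

% Least squares via QR: R*xhat = Q'*b, solved by back substitution
A = [1 0; 1 1; 1 3; 1 4];  b = [0; 8; 8; 20];
[Q, R] = qr(A, 0);     % economy-size: Q is 4x2, R is 2x2 upper triangular
xhat = R\(Q'*b)        % = [1; 4], same answer as the normal equations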


4.4 (Modified) Gram-Schmidt procedure 4-45

The MATLAB program to do the QR decomposition (or Gram-Schmidt process):

% Modified Gram-Schmidt
R = zeros(n);                  % Initialization
for j = 1:n
    v = A(:,j);                % v = b_j
    for i = 1:j-1              % The columns up to j-1 of Q are now q_1, ..., q_{j-1}.
        R(i,j) = Q(:,i)'*v;    % r_{i,j} = q_i'*b_j (m multiplications)  (12a)
        v = v - R(i,j)*Q(:,i); % v = v - (q_i'*b_j)*q_i (m multiplications)  (12b)
    end                        % v = e_j is now perpendicular to all q_1, ..., q_{j-1}.
    R(j,j) = norm(v);          % Set the diagonal entry of R. (m multiplications)  (11)
    Q(:,j) = v/R(j,j);         % Normalization to obtain q_j
end

(See Problem 28) In total, we need

m·∑ⱼ₌₁ⁿ 1 + 2m·∑ⱼ₌₁ⁿ ∑ᵢ₌₁ʲ⁻¹ 1 = mn + m(n − 1)n = mn² multiplications.

(Problem 28, Section 4.4) Where are the mn² multiplications in equations (11) and (12)?

A direct way to do the QR decomposition is:

[Q,R]=qr(A)


4.4 Gram-Schmidt procedure 4-46

The "modified" Gram-Schmidt procedure that we just introduced is numerically more stable than the "unmodified" (original) Gram-Schmidt procedure below (because the division of the former is only performed at the end).

Initialization: A sequence of (non-zero) vectors b₁, b₂, . . . , bₙ.

1. c₁ = b₁

2. c₂ = b₂ − (c₁ᵀb₂/c₁ᵀc₁)c₁ (= b₂ − c₁(c₁ᵀc₁)⁻¹c₁ᵀb₂)

3. c₃ = b₃ − (c₁ᵀb₃/c₁ᵀc₁)c₁ − (c₂ᵀb₃/c₂ᵀc₂)c₂ (= b₃ − [c₁ c₂]([c₁ c₂]ᵀ[c₁ c₂])⁻¹[c₁ c₂]ᵀb₃)

4. · · ·

5. Repeat the above procedure until bₙ is examined.


4.4 Naming convention 4-47

• The textbook uses A, B, C, . . . (uppercase letters) to denote the c₁, c₂, c₃, . . . obtained in the (original) Gram-Schmidt procedure on the previous slide. I personally do not like using uppercase boldface letters to denote vectors, so I change them to c₁, c₂, c₃, . . ..

– You may need to know this naming convention in order to solve the problems at the end of the section, for example, Problem 22.

(Problem 22, Section 4.4) Find orthogonal vectors A, B and C by Gram-Schmidt from

a = [1; 1; 2] and b = [1; −1; 0] and c = [1; 0; 4].


4.4 Naming convention 4-48

• (Definition from the textbook) Orthogonal matrix: Q is called an orthogonal matrix when Q is square and its columns form an orthonormal basis of the column space of Q.

– This naming convention does cause confusion. It should perhaps be called an orthonormal square matrix.

– So when being asked "Is Q⁻¹ an orthogonal matrix when Q is?", the answer should be YES according to the definition in the textbook. (See Problem 20.)

(Problem 20, Section 4.4) True or false (give an example in either case):

(a) Q⁻¹ is an orthogonal matrix when Q is an orthogonal matrix.

(b) If Q (3 by 2) has orthonormal columns, then ‖Qx‖ always equals ‖x‖.