Chapter 4
Orthogonality
Po-Ning Chen, Professor
Department of Electrical and Computer Engineering
National Chiao Tung University
Hsin Chu, Taiwan 30010, R.O.C.
4.1 Orthogonality of four subspaces 4-1
• The general relations among the four subspaces:
Answer:
– C(A) is perpendicular/orthogonal to N(A^T).
– R(A) is perpendicular/orthogonal to N(A).
• How are the four subspaces related to the system Ax = b?
Answer:
– The set of all b's for which Ax = b is solvable gives C(A).
– The set of all x's satisfying Ax = 0 gives N(A).
• However, what if Ax = b has no solution? Can we find x such that ‖e‖² is minimized, where e = b − Ax?
Answer: Yes, we certainly can. This problem has a huge number of applications in reality.
4.1 Orthogonality of four subspaces 4-2
Definition (Orthogonality of subspaces): Two subspaces V and W are
said to be orthogonal if
v^T w = 0 for all v ∈ V and all w ∈ W.
• For orthogonal V and W, V ∩ W = {0} is a set consisting of the single element 0, not the empty set.
• Note that 0^T 0 = 0; so, in a sense, 0 is orthogonal to itself.
In fact, 0 is the only vector that is orthogonal to itself.
Exercise. Prove that R(A) ⊥ N(A).
Answer. R(A) contains all vectors of the form A^T y, and every vector x in N(A) satisfies Ax = 0. Hence,
(A^T y) · x = x^T A^T y = (Ax)^T y = 0^T y = 0.
□
4.1 Orthogonality of four subspaces 4-3
Exercise (Problem 9). Prove that A^T A has the same null space as A.
(Problem 9, Section 4.1) If A^T Ax = 0 then Ax = 0. Reason: Ax is in the nullspace of A^T and also in the column space of A, and those spaces are orthogonal. Conclusion: A^T A has the same nullspace as A. This key fact is repeated in the next section.
Proof: It suffices to prove that A^T Ax = 0 if, and only if, Ax = 0.
Note that Ax = 0 implying A^T Ax = 0 holds trivially.
• A^T Ax = 0 ⇒ Ax ∈ N(A^T).
• By definition of the column space of A, Ax ∈ C(A).
• So, Ax ⊥ Ax, which implies Ax = 0.
□
An alternative proof:
• Let B = A^T A. Then R(B) = R(A), which implies N(B) = N(A) because the null space is the orthogonal complement (see the next slide) of the row space. □
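A quick numerical sanity check of this fact in MATLAB (a sketch; the rank-1 matrix A below is an arbitrary example chosen here, not from the text):
A = [1 2; 3 6];        % rank-1 example matrix (assumption for illustration)
N1 = null(A);          % orthonormal basis for N(A)
N2 = null(A'*A);       % orthonormal basis for N(A'A)
disp(norm(A*N2))       % ~0: the basis of N(A'A) is annihilated by A
disp(norm((A'*A)*N1))  % ~0: the basis of N(A) is annihilated by A'A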
4.1 Orthogonal complement 4-4
Definition (Orthogonal complement): The orthogonal complement of V consists of all vectors that are perpendicular to every vector in V. It is denoted by V⊥ and is pronounced "V perp."
• R(A) and N(A) are not just orthogonal to each other; they are orthogonal complements of each other!
N(A)⊥ = C(A^T).
Examples of orthogonal subspaces that are not orthogonal complements of each other:
1. {0} and the xy-plane in ℝ³.
2. Two perpendicular lines in ℝ³.
4.1 Orthogonal complement 4-5
Important facts about orthogonal complements V and V⊥:
• Every vector v can be expressed as v = vr + vn, where vr ∈ V and vn ∈ V⊥.
• vr is the projection of v onto V, and vn is the projection of v onto V⊥.
Example.
1. Given A_{m×n}, every v ∈ ℝ^n can be expressed as
v = x^{(r)} + x^{(n)},
where x^{(r)} = A^T w for some w ∈ ℝ^m (so x^{(r)} ∈ R(A)) and Ax^{(n)} = 0 (so x^{(n)} ∈ N(A)).
2. Given A_{m×n}, every v ∈ ℝ^m can be expressed as
v = x^{(c)} + x^{(ln)},
where x^{(c)} = Aw for some w ∈ ℝ^n (so x^{(c)} ∈ C(A)) and A^T x^{(ln)} = 0 (so x^{(ln)} ∈ N(A^T)).
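A minimal MATLAB sketch of decomposition 1 (the matrix A and vector v below are assumed examples): the row-space component x^{(r)} is the projection of v onto C(A^T), and the nullspace component is the remainder.
A = [1 0 2; 0 1 1];      % example 2-by-3 matrix (assumption)
v = [1; 2; 3];           % vector to decompose
B = A';                  % columns of B span R(A)
xr = B*((B'*B)\(B'*v));  % x^(r): projection of v onto the row space
xn = v - xr;             % x^(n): the remainder
disp(norm(A*xn))         % ~0 confirms x^(n) lies in N(A)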
4.1 Orthogonal complement 4-6
Lemma (Uniqueness of projection): The projection x^{(r)} of a vector v onto the row space is unique.
Proof:
• Suppose x^{(r)} and x̃^{(r)} are both projections of v (onto the row space of A).
• x^{(r)} ∈ R(A) and x̃^{(r)} ∈ R(A) ⇒ x^{(r)} − x̃^{(r)} ∈ R(A) (property of a vector space).
• Av = A(x^{(r)} + x^{(n)}) = Ax^{(r)} and likewise Av = Ax̃^{(r)} ⇒ A(x^{(r)} − x̃^{(r)}) = 0
⇒ x^{(r)} − x̃^{(r)} ∈ N(A).
• R(A) ⊥ N(A) ⇒ x^{(r)} − x̃^{(r)} = 0 (only 0 is orthogonal to itself). □
A_{m×n} defines a (projection, possibly many-to-one) function mapping from ℝ^n to R(A)!
4.1 Orthogonal complement 4-7
Lemma: A_{m×n} defines a (one-to-one) function mapping from R(A) to C(A)!
Proof:
• v = Ax^{(r)} ∈ C(A), where x^{(r)} ∈ R(A). (I.e., x^{(r)} maps to v = Ax^{(r)}.)
• Ax^{(r)} = Ax̃^{(r)} ⇒ x^{(r)} = x̃^{(r)}. (I.e., the vector that maps to v = Ax^{(r)} is unique!) □
• This lemma also conforms to the fact that the dimensions of C(A) and R(A) are the same.
• Since the inverse mapping from C(A) to R(A) exists, A contains an r × r invertible submatrix.
Just pick the r pivot columns and pivot rows to form the invertible submatrix!
4.1 How to find the projection? 4-8
• (Example 5) Find the projections of x = [4; 3] onto R(A) and N(A), where A = [1 2; 3 6].
Answer: Take inner products with orthonormal basis vectors.
– rref(A) = [1 2; 0 0].
– An orthonormal basis for R(A) is b^{(r)} = (1/√5)[1; 2]. Then
x^{(r)} = (x · b^{(r)})b^{(r)} (= (‖x‖ cos θ)b^{(r)}) = 2[1; 2] = [2; 4].
– An orthonormal basis for N(A) is b^{(n)} = (1/√5)[−2; 1]. Then
x^{(n)} = (x · b^{(n)})b^{(n)} = −[−2; 1] = [2; −1].
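A minimal MATLAB sketch of this computation (same A and x as in Example 5):
A = [1 2; 3 6];
x = [4; 3];
br = [1; 2]/sqrt(5);   % orthonormal basis vector for R(A)
bn = [-2; 1]/sqrt(5);  % orthonormal basis vector for N(A)
xr = (x'*br)*br        % gives [2; 4]
xn = (x'*bn)*bn        % gives [2; -1]; note that xr + xn = x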
4.1 How to find the projection? 4-9
• (Problem 4.1B(d)) Find the projections of x = [6; 4; 5] onto R(A) and N(A), where A = [1 −3 −4].
Answer: Alternatively, solve for the coefficients c in the linear system Bc = x, where the columns of B are bases for N(A) and R(A).
– A basis for R(A) is [1; −3; −4].
– A basis for N(A) consists of the columns of [3 4; 1 0; 0 1].
– Then, Gauss-Jordan gives
[3 4 1; 1 0 −3; 0 1 −4]c = [6; 4; 5] ⇒ c = [1; 1; −1]
⇒ x^{(n)} = [3 4; 1 0; 0 1][1; 1] = [7; 1; 1] and x^{(r)} = [1; −3; −4](−1) = [−1; 3; 4].
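In MATLAB, the coefficients c can be obtained by solving Bc = x directly (a sketch with the data of this problem):
B = [3 4 1; 1 0 -3; 0 1 -4];  % columns: basis of N(A), then basis of R(A)
x = [6; 4; 5];
c = B\x               % gives c = [1; 1; -1]
xn = B(:,1:2)*c(1:2)  % projection onto N(A): [7; 1; 1]
xr = B(:,3)*c(3)      % projection onto R(A): [-1; 3; 4]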
4.2 Projections 4-10
Definition (Projection matrix): For a linear subspace V, we can project any
vector b onto this subspace.
Such a projection p can be obtained through a projection matrix P, i.e.,
p = Pb.
• If b ∈ V, then its projection onto V is itself.
• Hence, Pp = p, P²p = Pp = p, ..., Pⁿp = p.
• Let b_i be the vector with all components 0 except the ith component, which is 1. Then, since Pb_i ∈ V, we have
P(Pb_i) = Pb_i.
Collecting all the b_i to form the identity matrix, we obtain
P(PI) = PI ⇒ P² = P.
The projection matrix P satisfies P² = P.
4.2 Projections 4-11
Examples of projection matrices.
• Onto the z-axis: P1 = [0 0 0; 0 0 0; 0 0 1].
• Onto the xy-plane: P2 = [1 0 0; 0 1 0; 0 0 0].
Since the z-axis and the xy-plane are orthogonal complements of each other, we have
P1 b + P2 b = b = Ib ⇒ (P1 + P2 − I)b = 0 for every b
⇒ P1 + P2 = I for projection matrices of orthogonal-complement subspaces.
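A short MATLAB check of these identities:
P1 = diag([0 0 1]);           % projection onto the z-axis
P2 = diag([1 1 0]);           % projection onto the xy-plane
disp(norm(P1 + P2 - eye(3)))  % 0: P1 + P2 = I
disp(norm(P1^2 - P1))         % 0: P1 is idempotent (P1^2 = P1)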
4.2 Orthogonality and projections 4-12
• p is the projection of b onto a subspace V. Then, p and b − p are orthogonal.
Proof:
– Let V⊥ be the orthogonal complement of V.
– Let P and P⊥ be the projection matrices of V and V⊥, respectively.
– Then,
Pb + P⊥b = p + p⊥ = b
⇒ p⊥ = b − p
⇒ p⊥ · p = 0 by the definition of orthogonal complement.
4.2 How to determine the projection based on orthogonal vectors? 4-13
Suppose the subspace V has dimension r and a1, ..., ar are linearly independent vectors that span V (i.e., they form a basis of V).
⇒ The projection p (of b) can be written as
p = x̂1 a1 + ··· + x̂r ar = [a1 a2 ··· ar][x̂1; x̂2; ...; x̂r] = Ax̂, where A = [a1 a2 ··· ar].
⇒ (b − p) ∈ V⊥ gives
a1^T(b − p) = 0, a2^T(b − p) = 0, ..., ar^T(b − p) = 0,
i.e., A^T(b − Ax̂) = 0 ⇒ A^T Ax̂ = A^T b.
This concludes (if A^T A is invertible):
x̂ = (A^T A)^{-1}A^T b and p = A(A^T A)^{-1}A^T b and P = A(A^T A)^{-1}A^T.
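These three formulas translate directly into MATLAB. A minimal sketch, assuming the columns of A are linearly independent so that A^T A is invertible (the numbers below are borrowed from Problem 1 of Section 4.3, which appears later in this chapter):
A = [1 0; 1 1; 1 3; 1 4];  % columns a_1, ..., a_r span V
b = [0; 8; 8; 20];
xhat = (A'*A)\(A'*b);      % solve the normal equations A'A xhat = A'b
p = A*xhat;                % projection of b onto V
P = A/(A'*A)*A';           % projection matrix A (A'A)^{-1} A'
disp(norm(A'*(b - p)))     % ~0: the error b - p is perpendicular to V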
4.2 How to determine the projection based on orthogonal vectors? 4-14
Example. Find the projection of b onto a one-dimensional subspace V with a ∈ V and a ≠ 0.
Answer: We know A = [a]. Hence,
p = A(A^T A)^{-1}A^T b = a(a^T a)^{-1}a^T b = (a^T b / a^T a)a = ((b · a)/(a · a))a. □
• P = A(A^T A)^{-1}A^T implies P^T = P.
The projection matrix onto a space V can always be represented in the form A(A^T A)^{-1}A^T, where the columns of A span V. So, P^T = P is always true for a projection matrix.
4.2 Projection matrix and its column space 4-15
Tip: From slide 3-65, P_{m×m} = B_{m×r}C_{r×m} (and all three matrices have rank r). Then,
C(P_{m×m}) = C(B_{m×r}) and R(P_{m×m}) = R(C_{r×m}).
Hence,
{p : p = Pb for some b ∈ ℝ^m} is the column space V of P = A(A^T A)^{-1}A^T and also of A,
and
{p : p = P⊥b = (I − P)b for some b ∈ ℝ^m} is the left nullspace V⊥ of P = A(A^T A)^{-1}A^T and also of A.
What if A^T A is not invertible?
4.2 How to determine the projection onto the column space? 4-16
Suppose C(A_{m×n}) has dimension r (i.e., A has rank r).
Note that if r < n, then (A^T A)_{n×n} is not invertible!
This gives:
Ax̂ = Ā(Ā^T Ā)^{-1}Ā^T b and p = Ā(Ā^T Ā)^{-1}Ā^T b and P = Ā(Ā^T Ā)^{-1}Ā^T,
where Ā is formed by r linearly independent (usually, the pivot) columns of A, and the matrix P projects b onto the column space of Ā (equivalently, of A).
4.2 How to determine the projection onto the column space? 4-17
Example. A = [1 0 1; 0 1 1; 0 0 0]. Find the projection matrix onto C(A).
Answer:
• First, we note that the column space of A is the xy-plane.
• Taking Ā = [1 0; 0 1; 0 0] gives P = Ā(Ā^T Ā)^{-1}Ā^T = [1 0 0; 0 1 0; 0 0 0].
• Taking Ā = [1 1; 0 1; 0 0] gives P = Ā(Ā^T Ā)^{-1}Ā^T = [1 0 0; 0 1 0; 0 0 0].
• Taking Ā = [0 1; 1 1; 0 0] gives P = Ā(Ā^T Ā)^{-1}Ā^T = [1 0 0; 0 1 0; 0 0 0].
□
4.2 How to determine the projection onto the column space? 4-18
Example. A = [1 1; 1 1; 1 1] and b = [1; 0; 0]. Find x̂ that minimizes ‖b − Ax̂‖².
Answer:
• Ā = [1; 1; 1] ⇒ P = Ā(Ā^T Ā)^{-1}Ā^T = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3].
• Then, Ax̂ = Pb = [1/3; 1/3; 1/3], i.e., x̂1 + x̂2 = 1/3. □
4.2 How to determine the projection onto the column space? 4-19
(Problem 21, Section 4.3) Continuing from the previous example: which of the four subspaces contains the error vector e (= b − Ax̂)? Which contains p (= b − e)? Which contains x̂? What is the nullspace of A?
Answer: It can be derived that
C(A) = {b ∈ ℝ³ : b1 = b2 = b3}, R(A) = {b ∈ ℝ² : b1 = b2},
N(A) = {x ∈ ℝ² : x1 + x2 = 0}, N(A^T) = {x ∈ ℝ³ : x1 + x2 + x3 = 0}.
So, e ∈ N(A^T), p ∈ C(A), and one particular x̂ lies in R(A) (but x̂ can never lie in N(A), since Ax̂ = Pb ≠ 0); N(A) is a line in ℝ². □
• Final note. When n = r, the answer reduces to what is provided in the textbook, i.e., x̂ ∈ R(A) = ℝ^n and N(A) = {0}.
4.3 Least square approximations 4-20
Question: Find the solution of Ax̂ = b given that b ∈ C(A).
Answer:
• The solution can be found using methods taught before (e.g., Gauss-Jordan).
• e = b − Ax̂ satisfies ‖e‖² = 0.
Question: Find x̂ such that ‖b − Ax̂‖² is minimized (where b is not necessarily in C(A)).
Answer:
• This is to find x̂ such that Ax̂ = Pb, where P is the projection matrix of C(A). The solution will minimize ‖e‖² = ‖b − Ax̂‖².
• If A^T A is invertible, then P = A(A^T A)^{-1}A^T. So, the problem is equivalent to:
Ax̂ = A(A^T A)^{-1}A^T b ⇔ A^T Ax̂ = A^T b.
• If A^T A is not invertible, then P = Ā(Ā^T Ā)^{-1}Ā^T. So, the problem is equivalent to:
Ax̂ = Ā(Ā^T Ā)^{-1}Ā^T b ⇔ Ā^T Ax̂ = Ā^T b ⇔ A^T Ax̂ = A^T b.
4.3 Least square approximations 4-21
Example. A = [1 1; 1 1; 1 1] and b = [1; 0; 0]. Find x̂ that minimizes ‖b − Ax̂‖².
Answer:
• Ā = [1; 1; 1] ⇒ P = Ā(Ā^T Ā)^{-1}Ā^T = [1/3 1/3 1/3; 1/3 1/3 1/3; 1/3 1/3 1/3].
• Hence,
– Ax̂ = Ā(Ā^T Ā)^{-1}Ā^T b is equivalent to x̂1 + x̂2 = 1/3.
– Ā^T Ax̂ = Ā^T b is equivalent to 3x̂1 + 3x̂2 = 1.
– A^T Ax̂ = A^T b is equivalent to 3x̂1 + 3x̂2 = 1.
So the above three systems are equivalent. □
4.3 Least square approximations 4-22
Example. A = [1 1 1; 1 1 0] and b = [1; 0]. Find x̂ that minimizes ‖b − Ax̂‖².
Answer:
• Ā = [1 1; 1 0] ⇒ P = Ā(Ā^T Ā)^{-1}Ā^T = [1 0; 0 1].
• Hence,
– Ax̂ = Ā(Ā^T Ā)^{-1}Ā^T b is equivalent to {x̂1 + x̂2 = 0, x̂3 = 1}.
– Ā^T Ax̂ = Ā^T b is equivalent to {x̂1 + x̂2 = 0, x̂3 = 1}.
– A^T Ax̂ = A^T b is equivalent to {x̂1 + x̂2 = 0, x̂3 = 1}.
So the above three systems are equivalent. □
The normal equations A^T Ax̂ = A^T b give us the least square approximation.
4.3 Least square approximations 4-23
Applications of least square approximation
• Fitting a straight line: to find the line that minimizes the total squared error to a group of points.
– A line (excluding vertical lines x1 = constant) in ℝ² can be parameterized via t, under fixed constants D and C, as
x = [x1; x2] = t[1; D] + [0; C] = td + c = [t; C + tD].
– The distance of a point u = [t; b] to this line can be calculated by ‖v‖, where v = (u − c) − q, and q is the projection of u − c onto the line td.
[Figure: the line td + c and a point u; the vector u − c, its projection q onto the direction d, and the error v = (u − c) − q.]
4.3 Least square approximations 4-24
• So
q = d(d^T d)^{-1}d^T(u − c) = (1/(1 + D²))[1 D; D D²](u − c).
Then
‖v‖² = ‖(u − c) − q‖² = ‖(u − c) − d(d^T d)^{-1}d^T(u − c)‖²
= ‖(1/(1 + D²))[D² −D; −D 1](u − c)‖² with (u − c) = [t − 0; b − C]
= ‖(b − C − tD)[−D/(1 + D²); 1/(1 + D²)]‖² (the vector being normed is v)
= (b − C − tD)²/(1 + D²) = (1/(1 + D²))‖b − [1 t][C; D]‖².
• For a sequence of vectors u1, u2, ..., uk, their total squared distance to the line is
Σ_{i=1}^k ‖v_i‖² = (1/(1 + D²))‖b − Ax̂‖², where b = [b1; ...; bk], A = [1 t1; ...; 1 tk] and x̂ = [C; D].
4.3 Least square approximations 4-25
However, the error (in the textbook) to be minimized is not
min_{C,D} Σ_{i=1}^k ‖v_i‖² = min_{C,D} (1/(1 + D²))‖b − Ax̂‖² (the factor 1/(1 + D²) is a function of D)
but
min_{C,D} (1 + D²) Σ_{i=1}^k ‖v_i‖² = min_{C,D} ‖[e1; ...; ek]‖² = min_{C,D} ‖b − Ax̂‖²,
where e_i = b_i − (C + t_i D), or e = b − Ax̂, with b = [b1; ...; bk], A = [1 t1; ...; 1 tk] and x̂ = [C; D].
Note that from the previous slide, v_i = e_i[−D/(1 + D²); 1/(1 + D²)], or
[v1 ··· vk] = [−D/(1 + D²); 1/(1 + D²)][e1 ··· ek].
4.3 Least square approximations 4-26
(Problem 1, Section 4.3) Find the best straight line that fits the points
[t; b] = [0; 0], [1; 8], [3; 8] and [4; 20].
Give the errors for each point.
Answer:
• Σ_{i=1}^4 ‖v_i‖² = (1/(1 + D²))‖b − Ax̂‖² and ‖e‖² = ‖b − Ax̂‖², where b = [0; 8; 8; 20], A = [1 0; 1 1; 1 3; 1 4] and x̂ = [C; D].
• Solving A^T Ax̂ = A^T b, i.e., [4 8; 8 26]x̂ = [36; 112], yields x̂ = [C; D] = [1; 4].
• [v1 v2 v3 v4] = [−D/(1 + D²); 1/(1 + D²)](b − Ax̂)^T = [−4/(1 + 4²); 1/(1 + 4²)](b − Ax̂)^T = [−4/17; 1/17][−1 3 −5 3].
• [e1 e2 e3 e4] = (b − Ax̂)^T = [−1 3 −5 3].
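A MATLAB sketch reproducing this fit:
t = [0; 1; 3; 4];  b = [0; 8; 8; 20];
A = [ones(4,1) t];                    % columns for C and D
xhat = (A'*A)\(A'*b)                  % gives C = 1, D = 4
e = b - A*xhat                        % errors: [-1; 3; -5; 3]
v = [-xhat(2); 1]/(1 + xhat(2)^2)*e'  % columns are the perpendicular errors v_1, ..., v_4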
4.3 Least square approximations 4-27
[Figure: the four data points and the best line b = 1 + 4t; for each point, a red vertical segment and a blue segment perpendicular to the line.]
The red line segment is b − (C + tD) (the direct difference).
The blue line segment is the true error, i.e., the projection of the direct difference onto the direction perpendicular to the line C + tD.
Note that minimizing these two quantities does not necessarily yield the same minimizers C* and D*.
For the application of least square approximation just taught, the textbook defines the error to be e_i = b_i − (C + t_i D), and the projection to be p_i = C + t_i D.
See Figure 4.9 on page 227 or the next page.
4.3 Least square approximations 4-28
(Problem 1, Section 4.3) With b = 0, 8, 8, 20 at t = 0, 1, 3, 4, set up and solve the normal equations A^T Ax̂ = A^T b. For the best straight line in Figure 4.9a, find its four heights p_i and four errors e_i. What is the minimum value E = e1² + e2² + e3² + e4²?
[Figure 4.9 of the textbook: the best line with its heights p_i and errors e_i, and the projection of b onto the column space of A.]
4.3 Least square approximations 4-29
• Do not be confused: b − Ax̂ = e = [e1; e2; e3; e4] = [−1; 3; −5; 3] is perpendicular to the columns of A = [1 0; 1 1; 1 3; 1 4], i.e., e is perpendicular to the column space of A.
• In the above, ‖e‖ is the shortest distance from the vector b to the column space of A.
4.3 Least square approximations 4-30
• For simplicity, in this textbook, we will directly plug the points into the target equation and find the best-fit curve.
Example. Find the best line b = C (respectively, b = C + tD + t²E) that fits the points
[t; b] = [0; 0], [1; 8], [3; 8] and [4; 20].
Answer. Form the equations as follows:
0 = C|_{t=0}, 8 = C|_{t=1}, 8 = C|_{t=3}, 20 = C|_{t=4} ⇒ [1; 1; 1; 1][C] = [0; 8; 8; 20];
respectively,
0 = C + 0·D + 0·E, 8 = C + 1·D + 1·E, 8 = C + 3·D + 9·E, 20 = C + 4·D + 16·E
⇒ [1 0 0; 1 1 1; 1 3 9; 1 4 16][C; D; E] = [0; 8; 8; 20]. □
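Both fits can be computed in MATLAB; a minimal sketch (for a tall system with full column rank, the backslash operator returns the least-square solution):
t = [0; 1; 3; 4];  b = [0; 8; 8; 20];
C = ones(4,1)\b           % best constant fit b = C (the mean of the b's)
x = [ones(4,1) t t.^2]\b  % best parabola fit b = C + D*t + E*t^2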
4.3 Least square approximations 4-31
Question: Find x̂ such that ‖b − Ax̂‖² is minimized (where b is not necessarily in C(A)).
Answer:
• This is to find x̂ satisfying the normal equations A^T Ax̂ = A^T b.
Calculus interpretation: Suppose all components are real.
• We wish to minimize E ≜ ‖b − Ax̂‖² = Σ_{i=1}^m (b_i − [a_{i,1} a_{i,2} ··· a_{i,n}]x̂)²; hence, the solution satisfies, for each j,
∂E/∂x̂_j = Σ_{i=1}^m 2(−a_{i,j})(b_i − [a_{i,1} a_{i,2} ··· a_{i,n}]x̂) = −2[a_{1,j} a_{2,j} ··· a_{m,j}](b − Ax̂) = 0.
I.e., A^T(b − Ax̂) = 0, which coincides with the normal equations.
Summary: The derivative of ‖b − Ax‖² with respect to x is −2A^T(b − Ax).
4.3 Example 4-32
Example 4.3B. Find the parabola C + Dt + Et² that comes closest (in the least-square-error sense) to the five points
[t; b] = [−2; 0], [−1; 0], [0; 1], [1; 0] and [2; 0].
Answer.
• Write down the five equations:
C + D·(−2) + E·(−2)² = 0
C + D·(−1) + E·(−1)² = 0
C + D·(0) + E·(0)² = 1
C + D·(1) + E·(1)² = 0
C + D·(2) + E·(2)² = 0
⇒ Ax̂ = b with A = [1 (−2) (−2)²; 1 (−1) (−1)²; 1 (0) (0)²; 1 (1) (1)²; 1 (2) (2)²], x̂ = [C; D; E] and b = [0; 0; 1; 0; 0].
• Since Ax̂ = b has no solution, we turn to the normal equations A^T Ax̂ = A^T b. The solution will minimize ‖b − Ax̂‖². □
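A sketch of the normal-equation solve for this parabola in MATLAB:
t = [-2; -1; 0; 1; 2];  b = [0; 0; 1; 0; 0];
A = [ones(5,1) t t.^2];
xhat = (A'*A)\(A'*b)  % least-square coefficients [C; D; E]
% one can check that xhat = [34/70; 0; -10/70]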
4.4 Orthogonal bases and Gram-Schmidt 4-33
• The problem of solving Ax = b (or minimizing ‖b − Ax‖²) becomes easy if the columns of A are orthogonal.
• The reason for the above claim is obvious, since A^T A becomes a diagonal matrix! The problem then reduces to:
A^T Ax = x = A^T b if A^T A = I.
Definition (Orthonormal basis): The vectors q1, q2, ..., qn are orthonormal if
q_i^T q_j = 0 when i ≠ j (orthogonality) and q_i^T q_j = 1 when i = j (unit vectors).
• Q = [q1 q2 ··· qn] is sometimes called an orthonormal matrix.
Example (Orthonormal matrix). Every permutation matrix (e.g., [0 0 1; 1 0 0; 0 1 0]) is an orthonormal matrix. □
4.4 Orthogonal bases and Gram-Schmidt 4-34
An important fact about square orthonormal matrices:
• A square orthonormal matrix Q satisfies Q^T = Q^{-1}.
Proof: A direct consequence of Q^T Q = I. □
Example (Reflection matrix). Another interesting example of an orthonormal matrix Q is
Q = I − 2qq^T = (I − qq^T) − qq^T = P⊥ − P.
Tip: Why is it called a reflection matrix?
• The projection matrix onto the subspace spanned by a unit vector q is
P = A(A^T A)^{-1}A^T = q(q^T q)^{-1}q^T = qq^T.
• The projection matrix P⊥ onto the orthogonal complement of the space spanned by q is
P⊥ = I − qq^T.
4.4 Orthogonal bases and Gram-Schmidt 4-35
• So, the reflection matrix gives Qb = P⊥b − Pb = p⊥ − p; hence, Qb is the "mirror" of b with respect to the orthogonal complement of the subspace spanned by q.
[Figure: b decomposed as p + p⊥, with p along q; replacing p by −p yields the mirror image Qb = p⊥ − p.]
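A numerical sketch of the reflection (the unit vector q below is an arbitrary example chosen here):
q = [1; 2]/sqrt(5);         % unit vector (assumption for illustration)
Q = eye(2) - 2*(q*q');      % reflection matrix
disp(norm(Q'*Q - eye(2)))   % ~0: Q is an orthonormal matrix
b = [4; 3];
disp(Q*b)                   % the mirror of b with respect to the line perpendicular to q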
4.4 Properties of orthonormal matrix mapping 4-36
1. Length invariance: ‖Qx‖ = ‖x‖.
2. Angle invariance: (Qx)^T(Qy) = x^T y.
Proof: We only need to prove the second property (since the first property is a special case of the second), which can be easily seen from
(Qx)^T(Qy) = x^T Q^T Qy = x^T Iy = x^T y. □
With A replaced by Q, we have
x̂ = (A^T A)^{-1}A^T b = Q^T b and p = A(A^T A)^{-1}A^T b = QQ^T b and P = A(A^T A)^{-1}A^T = QQ^T.
Further assuming that Q is a square matrix, we have QQ^T = I and
x̂ = Q^T b = [q1^T b; ...; qn^T b] and p = QQ^T b = b and P = QQ^T = I.
Example. Q = [cos θ −sin θ; sin θ cos θ] rotates b counterclockwise by the angle θ.
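A quick MATLAB check of both invariance properties with this rotation matrix:
theta = pi/3;               % an arbitrary angle
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];
x = [1; 2];  y = [3; -1];
disp(norm(Q*x) - norm(x))   % ~0: length invariance
disp((Q*x)'*(Q*y) - x'*y)   % ~0: angle (inner-product) invariance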
4.4 Gram-Schmidt procedure 4-37
Observations
• Under the assumption that Q is not necessarily a square matrix (and hence, possibly, p ≠ b), the error
e (= b − Qx̂) = b − x̂1 q1 − x̂2 q2 − ··· − x̂n qn
= b − (q1^T b)q1 − (q2^T b)q2 − ··· − (qn^T b)qn
= b − [q1 q2 ··· qn][q1^T; q2^T; ...; qn^T]b = b − Q_{m×n}Q_{n×m}^T b = (I_{m×m} − Q_{m×n}Q_{n×m}^T)b
is perpendicular to q1, q2, ..., qn (i.e., is perpendicular to the column space of Q).
• If ‖e‖ ≠ 0, then b cannot be completely represented by Q.
– This means that q1, q2, ..., qn are not enough to span the space of interest.
– We can then add a new vector q_{n+1} = e/‖e‖ toward completing the basis for this space.
• This leads to the basic idea of the Gram-Schmidt procedure.
4.4 Gram-Schmidt procedure 4-38
The Gram-Schmidt procedure is used to identify the dimension of the space spanned by a sequence of vectors b1, b2, ..., bn and to find an orthonormal basis for this space.
(Modified) Gram-Schmidt procedure
Initialization: A sequence of (non-zero) vectors b1, b2, ..., bn.
1. q1 = e1/‖e1‖, where e1 = b1.
2. q2 = e2/‖e2‖, where e2 = b2 − (q1^T b2)q1, if ‖e2‖ is not zero.
3. q3 = e3/‖e3‖, where e3 = b3 − (q1^T b3)q1 − (q2^T b3)q2 = b3 − [q1 q2][q1^T; q2^T]b3 = (I_{m×m} − Q_{m×2}Q_{2×m}^T)b3, if ‖e3‖ is not zero.
4. ···
5. Repeat the above procedure until bn is examined.
Note that by the basic property of projections, e2 is perpendicular to q1, e3 is perpendicular to q1 and q2, etc. See the first observation on the previous slide.
4.4 QR decomposition 4-39
• The Gram-Schmidt procedure actually changes A = [b1 b2 ··· bn] into QR:
A = [b1 b2 b3 ···]_{m×n}
= [e1, e2 + (q1^T b2)q1, e3 + (q1^T b3)q1 + (q2^T b3)q2, ···]_{m×n}
= [‖e1‖q1, ‖e2‖q2 + (q1^T b2)q1, ‖e3‖q3 + (q1^T b3)q1 + (q2^T b3)q2, ···]_{m×n}
= [q1 q2 q3 ···]_{m×r} [‖e1‖ q1^T b2 q1^T b3 ···; 0 ‖e2‖ q2^T b3 ···; 0 0 ‖e3‖ ···; ⋮]_{r×n}
= [q1 q2 q3 ···]_{m×r} [q1^T b1 q1^T b2 q1^T b3 ···; 0 q2^T b2 q2^T b3 ···; 0 0 q3^T b3 ···; ⋮]_{r×n},
where the last step follows from, for example:
‖e3‖ = (‖e3‖q3^T)q3 = e3^T q3 = (b3^T − (q1^T b3)q1^T − (q2^T b3)q2^T)q3 = b3^T q3.
4.4 QR decomposition 4-40
• Can we decompose or factorize A into QR such that Q is an orthonormal matrix and R is an upper triangular matrix?
Answer: Yes, by the Gram-Schmidt procedure.
Notably, the upper triangular R is often taken to be a square matrix when m ≥ n in, e.g., MATLAB. Hence,
A_{m×n} = Q_{m×n}R_{n×n}.
What if r < n? Suppose r = 2 < n = 3, and
e1 = (q1^T b1)q1 = b1 ≠ 0 ⇒ q1 = e1/‖e1‖
e2 = (q2^T b2)q2 = b2 − (q1^T b2)q1 = 0 ⇒ q2 ??
e3 = (q3^T b3)q3 = b3 − (q1^T b3)q1 ≠ 0 ⇒ q3 = e3/‖e3‖
⇒ b1 = (q1^T b1)q1, b2 = (q1^T b2)q1, b3 = (q1^T b3)q1 + (q3^T b3)q3
⇒ A = [b1 b2 b3]_{m×3} = [q1 q3]_{m×2} [q1^T b1 q1^T b2 q1^T b3; 0 0 q3^T b3]_{2×3}
= [q1 q2 q3]_{m×3} [q1^T b1 q1^T b2 q1^T b3; 0 0 0; 0 0 q3^T b3]_{3×3}
4.4 QR decomposition 4-41
Example. A = [1 1 1; 0 0 1; 0 0 0].
e1 = b1 = [1; 0; 0] ⇒ q1 = e1/‖e1‖ = [1; 0; 0]
e2 = b2 − (q1^T b2)q1 = [1; 0; 0] − [1; 0; 0] = [0; 0; 0] ⇒ q2 ??
e3 = b3 − (q1^T b3)q1 = [1; 1; 0] − [1; 0; 0] = [0; 1; 0] ⇒ q3 = e3/‖e3‖ = [0; 1; 0]
4.4 QR decomposition 4-42
Choose q2 to be a unit vector orthogonal to both q1 and q3, e.g., q2 = [0; 0; 1]. Then
⇒ A = [b1 b2 b3]_{3×3} = [q1 q3]_{3×2} [q1^T b1 q1^T b2 q1^T b3; 0 0 q3^T b3]_{2×3}
(Answer 1) = [1 0; 0 1; 0 0]_{3×2} [1 1 1; 0 0 1]_{2×3}
= [q1 q2 q3]_{3×3} [q1^T b1 q1^T b2 q1^T b3; 0 0 0; 0 0 q3^T b3]_{3×3}
(Answer 2) = [1 0 0; 0 0 1; 0 1 0]_{3×3} [1 1 1; 0 0 0; 0 0 1]_{3×3}
(Answer 3) = [1 0 0; 0 1 0; 0 0 1]_{3×3} [1 1 1; 0 0 1; 0 0 0]_{3×3}
□
4.4 QR decomposition 4-43
Example. A = [1 1 1; 0 0 1].
e1 = b1 = [1; 0] ⇒ q1 = e1/‖e1‖ = [1; 0]
e2 = b2 − (q1^T b2)q1 = [1; 0] − [1; 0] = [0; 0] ⇒ q2 ??
e3 = b3 − (q1^T b3)q1 = [1; 1] − [1; 0] = [0; 1] ⇒ q3 = e3/‖e3‖ = [0; 1]
⇒ A = [b1 b2 b3]_{2×3} = [q1 q3]_{2×2} [q1^T b1 q1^T b2 q1^T b3; 0 0 q3^T b3]_{2×3} = [1 0; 0 1]_{2×2} [1 1 1; 0 0 1]_{2×3}
□
4.4 QR decomposition 4-44
• In most cases A_{m×n} has rank r = n; hence, A_{m×n} = Q_{m×n}R_{n×n} and R is invertible. This simplifies
A^T Ax̂ = A^T b
to
R^T Q^T QRx̂ = R^T Q^T b ⇔ Rx̂ = Q^T b,
which, by the upper triangularity of R, can be solved directly by back substitution!
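A sketch of this QR-based least-square solve in MATLAB (the built-in qr(A,0) returns the economy-size factorization; the data reuse Problem 1 of Section 4.3):
A = [1 0; 1 1; 1 3; 1 4];  b = [0; 8; 8; 20];
[Q,R] = qr(A,0);    % Q is m-by-n with orthonormal columns, R is n-by-n
xhat = R\(Q'*b)     % back substitution on R xhat = Q'b; gives [1; 4]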
4.4 (Modified) Gram-Schmidt procedure 4-45
The MATLAB program to do the QR decomposition (i.e., the Gram-Schmidt process):
% Modified Gram-Schmidt
[m,n]=size(A);        % dimensions of A = [b_1 ... b_n]
R=zeros(n);           % Initialization
for j=1:n
  v=A(:,j);           % v = b_j
  for i=1:j-1         % The columns up to j-1 of Q are now q_1, ..., q_{j-1}.
    R(i,j)=Q(:,i)'*v; % r_{i,j} = q_i^T b_j (m multiplications) (12a)
    v=v-R(i,j)*Q(:,i);% Perform v = v - (q_i^T b_j) q_i (m multiplications) (12b)
  end                 % v = e_j is now perpendicular to all q_1, ..., q_{j-1}.
  R(j,j)=norm(v);     % Set the diagonal entry of R. (m multiplications) (11)
  Q(:,j)=v/R(j,j);    % Normalization to obtain q_j
end
(See Problem 28.) In total, we need
m·Σ_{j=1}^n 1 + 2m·Σ_{j=1}^n Σ_{i=1}^{j−1} 1 = mn + m(n−1)n = mn² multiplications.
(Problem 28, Section 4.4) Where are the mn² multiplications in equations (11) and (12)?
A direct way to do the QR decomposition is:
[Q,R]=qr(A)
4.4 Gram-Schmidt procedure 4-46
The "modified" Gram-Schmidt procedure that we just introduced is numerically more stable than the "unmodified" (original) Gram-Schmidt procedure below (because the division in the former is only performed at the end).
Initialization: A sequence of (non-zero) vectors b1, b2, ..., bn.
1. c1 = b1
2. c2 = b2 − (c1^T b2/c1^T c1)c1 (= b2 − c1(c1^T c1)^{-1}c1^T b2)
3. c3 = b3 − (c1^T b3/c1^T c1)c1 − (c2^T b3/c2^T c2)c2 (= b3 − [c1 c2]([c1^T; c2^T][c1 c2])^{-1}[c1^T; c2^T]b3)
4. ···
5. Repeat the above procedure until bn is examined.
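For comparison, a direct MATLAB transcription of this original procedure (a sketch; the matrix name B is mine, and the columns are assumed linearly independent so that no c_i is zero):
B = [1 1 1; 1 -1 0; 2 0 4];  % columns: the vectors a, b, c of Problem 22 below
[m,n] = size(B);
C = zeros(m,n);
for j = 1:n
  v = B(:,j);                % start from b_j
  for i = 1:j-1
    v = v - (C(:,i)'*B(:,j))/(C(:,i)'*C(:,i))*C(:,i);  % subtract the projection of b_j onto c_i
  end
  C(:,j) = v;                % c_j is orthogonal to c_1, ..., c_{j-1}
end
% one can check that the resulting columns are [1;1;2], [1;-1;0] and [-1;-1;1]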
4.4 Naming convention 4-47
• The textbook uses A, B, C, ... (uppercase letters) to denote the vectors c1, c2, c3, ... obtained in the (original) Gram-Schmidt procedure on the previous slide. I personally do not like using uppercase boldface letters to denote vectors, so I change them to c1, c2, c3, ....
– You may need to know this naming convention in order to solve the problems at the end of the section, for example, Problem 22.
(Problem 22, Section 4.4) Find orthogonal vectors A, B and C by Gram-Schmidt from
a = [1; 1; 2], b = [1; −1; 0] and c = [1; 0; 4].
4.4 Naming convention 4-48
• (Definition from the textbook) Orthogonal matrix: Q is called an orthogonal matrix when Q is square and its columns are an orthonormal basis of the column space of Q.
– This naming convention does cause confusion. It should rather be called an orthonormal square matrix.
– So when asked "Is Q^{-1} an orthogonal matrix when Q is?", the answer should be YES according to the definition in the textbook. (See Problem 20.)
(Problem 20, Section 4.4) True or false (give an example in either case):
(a) Q^{-1} is an orthogonal matrix when Q is an orthogonal matrix.
(b) If Q (3 by 2) has orthonormal columns, then ‖Qx‖ always equals ‖x‖.