
Chapter 4

Orthogonality

Po-Ning Chen, Professor

Department of Electrical and Computer Engineering

National Chiao Tung University

Hsin Chu, Taiwan 30010, R.O.C.


4.1 Orthogonality of four subspaces 4-1

• The general relation of the four subspaces:

Answer:

– C(A) is perpendicular (orthogonal) to N(Aᵀ).

– R(A) is perpendicular (orthogonal) to N(A).

• How are the four subspaces related to the system Ax = b?

Answer:

– The set of all attainable b's in Ax = b is C(A).

– The set of all x's satisfying Ax = 0 is N(A).

• However, what if Ax = b has no solution? Can we find x such that ‖e‖² is minimized, where e = b − Ax?

Answer: Yes, we certainly can. This problem has a huge number of applications in practice.


4.1 Orthogonality of four subspaces 4-2

Definition (Orthogonality of subspaces): Two subspaces V and W are said to be orthogonal if

vᵀw = 0 for all v ∈ V and all w ∈ W.

• For orthogonal V and W, V ∩ W = {0} is a set consisting of the single element 0, not the empty set.

• Note that 0ᵀ0 = 0; so, in a sense, 0 is orthogonal to itself.

In fact, 0 is the only vector that is orthogonal to itself.

Exercise. Prove that R(A) ⊥ N(A).

Answer. R(A) contains all vectors of the form Aᵀy, and every vector x in N(A) satisfies Ax = 0. Hence,

(Aᵀy) · x = xᵀAᵀy = (Ax)ᵀy = 0ᵀy = 0.


4.1 Orthogonality of four subspaces 4-3

Exercise (Problem 9). Prove that AᵀA has the same null space as A.

(Problem 9, Section 4.1) If AᵀAx = 0 then Ax = 0. Reason: Ax is in the nullspace of Aᵀ and also in the ______ of A, and those spaces are ______. Conclusion: AᵀA has the same nullspace as A. This key fact is repeated in the next section.

Proof: It suffices to prove that AᵀAx = 0 if, and only if, Ax = 0.

Note that Ax = 0 implying AᵀAx = 0 trivially holds.

• AᵀAx = 0 ⇒ Ax ∈ N(Aᵀ).

• By definition of the column space of A, Ax ∈ C(A).

• So, Ax ⊥ Ax, which implies Ax = 0.

An alternative proof: Let B = AᵀA. Then R(B) = R(A), which implies N(B) = N(A), because the null space is the orthogonal complement (see the next slide) of the row space. □
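This fact is easy to check numerically. A minimal MATLAB sketch (the rank-1 matrix below is an arbitrary example, not from the slides):

% Check that A'A has the same null space as A
A = [1 2; 3 6];    % an arbitrary rank-1 example
null(A)            % orthonormal basis of N(A): approximately [-2; 1]/sqrt(5)
null(A'*A)         % spans the same line (possibly with the opposite sign)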


4.1 Orthogonal complement 4-4

Definition (Orthogonal complement): The orthogonal complement of V consists of all vectors that are perpendicular to every vector in V. It is denoted by V⊥ and is pronounced "V perp."

• R(A) and N(A) are not just orthogonal to each other, but are orthogonal complements of each other!

N(A)⊥ = C(Aᵀ).

Examples of orthogonal subspaces that are not orthogonal complements of each other:

1. {0} and the xy-plane in ℝ³.

2. Two perpendicular lines in ℝ³.


4.1 Orthogonal complement 4-5

Important fact about orthogonal complements V and V⊥:

• Every vector v can be expressed as v = vᵣ + vₙ, where vᵣ ∈ V and vₙ ∈ V⊥.

• vᵣ is the projection of v onto V, and vₙ is the projection of v onto V⊥.

Example.

1. Given A (m×n), every v (n×1) can be expressed as

v = x^(r) + x^(n),

where x^(r) = Aᵀw for some w ∈ ℝᵐ (so x^(r) ∈ R(A)) and Ax^(n) = 0 (so x^(n) ∈ N(A)).

2. Given A (m×n), every v (m×1) can be expressed as

v = x^(c) + x^(ln),

where x^(c) = Aw for some w ∈ ℝⁿ (so x^(c) ∈ C(A)) and Aᵀx^(ln) = 0 (so x^(ln) ∈ N(Aᵀ)).


4.1 Orthogonal complement 4-6

Lemma (Uniqueness of projection): The projection x^(r) of a vector v onto the row space is unique.

Proof:

• Suppose x^(r) and x̃^(r) are both projections of v (onto the row space of A).

• x^(r) ∈ R(A) and x̃^(r) ∈ R(A) ⇒ x^(r) − x̃^(r) ∈ R(A) (property of a vector space).

• Av = A(x^(r) + x^(n)) = Ax^(r), and likewise Av = Ax̃^(r) ⇒ A(x^(r) − x̃^(r)) = 0

⇒ x^(r) − x̃^(r) ∈ N(A).

• So x^(r) − x̃^(r) lies in both R(A) and N(A); since R(A) ⊥ N(A), x^(r) − x̃^(r) = 0. □

A (m×n) defines a (projection, possibly many-to-one) function mapping from ℝⁿ to R(A)!


4.1 Orthogonal complement 4-7

Lemma: A (m×n) defines a (one-to-one) function mapping from R(A) to C(A)!

Proof:

• v = Ax^(r) ∈ C(A), where x^(r) ∈ R(A). (I.e., x^(r) maps to v = Ax^(r).)

• If Ax^(r) = Ax̃^(r), then x^(r) − x̃^(r) ∈ R(A) ∩ N(A) = {0}, so x^(r) = x̃^(r). (I.e., the vector that maps to v = Ax^(r) is unique!) □

• This lemma also agrees with the fact that C(A) and R(A) have the same dimension.

• Since the inverse mapping from C(A) to R(A) exists, A contains an r × r invertible submatrix.

Just pick the r pivot rows and pivot columns to form the invertible submatrix!


4.1 How to find the projection? 4-8

• (Example 5) Find the projections of x = [4; 3] onto R(A) and N(A), where A = [1 2; 3 6].

Answer: Take inner products with orthonormal bases.

– rref(A) = [1 2; 0 0].

– An orthonormal basis for R(A) is b^(r) = (1/√5)[1; 2]. Then

x^(r) = (x · b^(r))b^(r) (= (‖x‖ cos θ)b^(r)) = 2[1; 2] = [2; 4].

– An orthonormal basis for N(A) is b^(n) = (1/√5)[−2; 1]. Then

x^(n) = (x · b^(n))b^(n) = −[−2; 1] = [2; −1].
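The same numbers can be reproduced in MATLAB; a minimal sketch of the computation above:

% Example 5: project x onto R(A) and N(A) via orthonormal bases
x  = [4; 3];
br = [1; 2]/sqrt(5);     % orthonormal basis of R(A)
bn = [-2; 1]/sqrt(5);    % orthonormal basis of N(A)
xr = (br'*x)*br          % = [2; 4]
xn = (bn'*x)*bn          % = [2; -1]; note that xr + xn = x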


4.1 How to find the projection? 4-9

• (Problem 4.1B(d)) Find the projections of x = [6; 4; 5] onto R(A) and N(A), where A = [1 −3 −4].

Answer: Alternatively, solve for the coefficients c in the linear system Bc = x, where the columns of B are bases for N(A) and R(A).

– A basis for R(A) is [1; −3; −4].

– A basis for N(A) is {[3; 1; 0], [4; 0; 1]}.

– Then, Gauss-Jordan on

[3 4 1; 1 0 −3; 0 1 −4] c = [6; 4; 5] gives c = [1; 1; −1],

so

x^(n) = [3 4; 1 0; 0 1][1; 1] = [7; 1; 1] and x^(r) = [1; −3; −4](−1) = [−1; 3; 4].
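A minimal MATLAB sketch of this solve (B is the basis matrix built above):

% Problem 4.1B(d): split x into null-space and row-space components
x = [6; 4; 5];
B = [3 4 1; 1 0 -3; 0 1 -4];   % columns: basis of N(A), then basis of R(A)
c = B\x;                       % = [1; 1; -1]
xn = B(:,1:2)*c(1:2)           % = [7; 1; 1]
xr = B(:,3)*c(3)               % = [-1; 3; 4]; note that xn + xr = x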


4.2 Projections 4-10

Definition (Projection matrix): For a linear subspace V, we can project any vector b onto this subspace.

Such a projection p can be obtained through a projection matrix P, i.e.,

p = Pb.

• If b ∈ V, then its projection onto V is itself.

• Hence, Pp = p and P²p = Pp = p and . . . and Pⁿp = p.

• Let bᵢ be the vector with all components being 0 except the ith component, which is 1. Then, since Pbᵢ ∈ V, we have

P(Pbᵢ) = Pbᵢ.

By collecting all the bᵢ to form the identity matrix, we obtain

P(PI) = PI ⇒ P² = P.

The projection matrix P satisfies P² = P.


4.2 Projections 4-11

Examples of projection matrices.

• Onto the z-axis, P₁ = [0 0 0; 0 0 0; 0 0 1].

• Onto the xy-plane, P₂ = [1 0 0; 0 1 0; 0 0 0].

Since the z-axis and the xy-plane are orthogonal complements, we have

P₁b + P₂b = b = Ib ⇒ (P₁ + P₂ − I)b = 0 for every b

⇒ P₁ + P₂ = I for projection matrices of orthogonal-complement subspaces.
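Both properties can be checked numerically; a tiny MATLAB sketch:

% P1, P2: projection matrices onto the z-axis and the xy-plane
P1 = diag([0 0 1]);
P2 = diag([1 1 0]);
isequal(P1*P1, P1)        % idempotence: P^2 = P
isequal(P1 + P2, eye(3))  % complementary projection matrices sum to I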


4.2 Orthogonality and projections 4-12

• p is the projection of b onto a subspace V.

Then, p and b − p are orthogonal.

Proof:

– Let V⊥ be the orthogonal complement of V.

– Let P and P⊥ be the projection matrices of V and V⊥, respectively.

– Then,

Pb + P⊥b = p + p⊥ = b

⇒ p⊥ = b − p

⇒ p⊥ · p = 0 by definition of the orthogonal complement.


4.2 How to determine the projection based on orthogonal vectors? 4-13

Suppose the subspace V has dimension r and a₁, . . . , aᵣ are linearly independent vectors that span V (i.e., they form a basis of V).

⇒ The projection p (of b) can be written as

p = x̂₁a₁ + · · · + x̂ᵣaᵣ = [a₁ a₂ · · · aᵣ][x̂₁; x̂₂; . . . ; x̂ᵣ] = Ax̂, where A = [a₁ a₂ · · · aᵣ].

⇒ (b − p) ∈ V⊥ gives

a₁ᵀ(b − p) = 0, a₂ᵀ(b − p) = 0, . . . , aᵣᵀ(b − p) = 0,

i.e., Aᵀ(b − Ax̂) = 0 ⇒ AᵀAx̂ = Aᵀb.

This concludes (if AᵀA is invertible):

x̂ = (AᵀA)⁻¹Aᵀb and p = A(AᵀA)⁻¹Aᵀb and P = A(AᵀA)⁻¹Aᵀ.
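A concrete MATLAB sketch of these formulas (the 3×2 matrix and the vector b below are arbitrary illustrative choices):

% Projection onto C(A) when A'A is invertible
A = [1 0; 0 1; 1 1];  b = [1; 2; 4];
xhat = (A'*A)\(A'*b);    % solve the normal equations
p = A*xhat;              % projection of b onto C(A)
P = A*((A'*A)\A');       % projection matrix; P^2 = P and P' = P
A'*(b - p)               % ~ [0; 0]: the error is orthogonal to C(A)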


4.2 How to determine the projection based on orthogonal vectors? 4-14

Example. Find the projection of b onto a one-dimensional subspace V with a ∈ V and a ≠ 0.

Answer: We know A = [a]. Hence,

p = A(AᵀA)⁻¹Aᵀb = a(aᵀa)⁻¹aᵀb = (aᵀb/aᵀa) a = ((b · a)/(a · a)) a.

• P = A(AᵀA)⁻¹Aᵀ implies Pᵀ = P.

The projection matrix onto a space V can always be represented in the form A(AᵀA)⁻¹Aᵀ, where the columns of A form a basis of V. So, Pᵀ = P is always true for a projection matrix.


4.2 Projection matrix and its column space 4-15

Tip: From slide 3-65, P (m×m) = B (m×r) C (r×m) (and the rank of all three matrices is r). Then,

C(P) = C(B) and R(P) = R(C).

Hence,

{p : p = Pb for some b ∈ ℝᵐ} is the column space V of P = A(AᵀA)⁻¹Aᵀ, and also of A,

and

{p : p = P⊥b = (I − P)b for some b ∈ ℝᵐ} is the left nullspace V⊥ of P = A(AᵀA)⁻¹Aᵀ, and also of A.

What if AᵀA is not invertible?


4.2 How to determine the projection onto column space? 4-16

Suppose A (m×n) has rank r.

Note that if r < n, then AᵀA (n×n) is not invertible!

This gives:

Ax̂ = Ā(ĀᵀĀ)⁻¹Āᵀb and p = Ā(ĀᵀĀ)⁻¹Āᵀb and P = Ā(ĀᵀĀ)⁻¹Āᵀ,

where Ā is formed by the r linearly independent (usually, pivot) columns of A, and the matrix P projects b onto the column space of Ā (equivalently, of A).


4.2 How to determine the projection onto column space? 4-17

Example. A = [1 0 1; 0 1 1; 0 0 0]. Find the projection matrix onto C(A).

Answer:

• First, we note the column space of A is the xy-plane.

• Taking Ā = [1 0; 0 1; 0 0] gives P = Ā(ĀᵀĀ)⁻¹Āᵀ = [1 0 0; 0 1 0; 0 0 0].

• Taking Ā = [1 1; 0 1; 0 0] gives P = Ā(ĀᵀĀ)⁻¹Āᵀ = [1 0 0; 0 1 0; 0 0 0].

• Taking Ā = [0 1; 1 1; 0 0] gives P = Ā(ĀᵀĀ)⁻¹Āᵀ = [1 0 0; 0 1 0; 0 0 0].
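A short MATLAB sketch confirming that any two independent columns spanning the xy-plane give the same P:

% Three different bases of the xy-plane, one projection matrix
Pof = @(A) A*((A'*A)\A');
Pof([1 0; 0 1; 0 0])
Pof([1 1; 0 1; 0 0])
Pof([0 1; 1 1; 0 0])     % all three print diag([1 1 0])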


4.2 How to determine the projection onto column space? 4-18

Example. A = [1 1; 1 1; 1 1] and b = [1; 0; 0]. Find x̂ that minimizes ‖b − Ax̂‖².

Answer:

• Ā = [1; 1; 1] ⇒ P = Ā(ĀᵀĀ)⁻¹Āᵀ = (1/3)[1 1 1; 1 1 1; 1 1 1].

• Then, Ax̂ = Pb = [1/3; 1/3; 1/3], i.e., x̂₁ + x̂₂ = 1/3. □
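Here AᵀA = [3 3; 3 3] is singular, so the inverse formula cannot be applied to A itself; a MATLAB sketch of the workaround above:

% Rank-deficient least squares via the pivot column
A = [1 1; 1 1; 1 1];  b = [1; 0; 0];
rank(A'*A)            % = 1 < 2: A'A is not invertible
a = A(:,1);           % the pivot column spans C(A)
P = a*((a'*a)\a')     % = ones(3)/3
P*b                   % = [1/3; 1/3; 1/3]; any xhat with x1 + x2 = 1/3 attains it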


4.2 How to determine the projection onto column space? 4-19

(Problem 21, Section 4.3) Continue from the previous example. Which of the four spaces contains the error vector e (= b − Ax̂)? Which contains p (= b − e)? Which contains x̂? What is the nullspace of A?

Answer: It can be derived that

C(A) = {b ∈ ℝ³ : b₁ = b₂ = b₃},  R(A) = {b ∈ ℝ² : b₁ = b₂},
N(A) = {x ∈ ℝ² : x₁ + x₂ = 0},  N(Aᵀ) = {x ∈ ℝ³ : x₁ + x₂ + x₃ = 0}.

So, e ∈ N(Aᵀ), p ∈ C(A), one of the x̂'s is in R(A) (but x̂ can never be in N(A), since Ax̂ = Pb ≠ 0), and N(A) is a line in ℝ². □

• Final note. When n = r, the answer reduces to what is provided in the textbook, i.e., x̂ ∈ R(A) = ℝⁿ and N(A) = {0}.


4.3 Least squares approximations 4-20

Question: Find the solution of Ax̂ = b given that b ∈ C(A).

Answer:

• The solution can be found using methods taught before (e.g., Gauss-Jordan).

• e = b − Ax̂ satisfies ‖e‖² = 0.

Question: Find x̂ such that ‖b − Ax̂‖² is minimized (where b is not necessarily in C(A)).

Answer:

• This is to find x̂ such that Ax̂ = Pb, where P is the projection matrix of C(A). The solution will minimize ‖e‖² = ‖b − Ax̂‖².

• If AᵀA is invertible, then P = A(AᵀA)⁻¹Aᵀ. So, the problem is equivalent to:

Ax̂ = A(AᵀA)⁻¹Aᵀb ⇔ AᵀAx̂ = Aᵀb.

• If AᵀA is not invertible, then P = Ā(ĀᵀĀ)⁻¹Āᵀ, where Ā consists of r linearly independent (pivot) columns of A. So, the problem is equivalent to:

Ax̂ = Ā(ĀᵀĀ)⁻¹Āᵀb ⇔ ĀᵀAx̂ = Āᵀb ⇔ AᵀAx̂ = Aᵀb.


4.3 Least squares approximations 4-21

Example. A = [1 1; 1 1; 1 1] and b = [1; 0; 0]. Find x̂ that minimizes ‖b − Ax̂‖².

Answer:

• Ā = [1; 1; 1] ⇒ P = Ā(ĀᵀĀ)⁻¹Āᵀ = (1/3)[1 1 1; 1 1 1; 1 1 1].

• Hence,

– Ax̂ = Ā(ĀᵀĀ)⁻¹Āᵀb is equivalent to x̂₁ + x̂₂ = 1/3.

– ĀᵀAx̂ = Āᵀb is equivalent to 3x̂₁ + 3x̂₂ = 1.

– AᵀAx̂ = Aᵀb is equivalent to 3x̂₁ + 3x̂₂ = 1.

So the above three systems are equivalent. □


4.3 Least squares approximations 4-22

Example. A = [1 1 1; 1 1 0] and b = [1; 0]. Find x̂ that minimizes ‖b − Ax̂‖².

Answer:

• Ā = [1 1; 1 0] ⇒ P = Ā(ĀᵀĀ)⁻¹Āᵀ = [1 0; 0 1].

• Hence,

– Ax̂ = Ā(ĀᵀĀ)⁻¹Āᵀb is equivalent to {x̂₁ + x̂₂ = 0, x̂₃ = 1}.

– ĀᵀAx̂ = Āᵀb is equivalent to {x̂₁ + x̂₂ = 0, x̂₃ = 1}.

– AᵀAx̂ = Aᵀb is equivalent to {x̂₁ + x̂₂ = 0, x̂₃ = 1}.

So the above three systems are equivalent. □

Normal equations AᵀAx̂ = Aᵀb give us the least squares approximation.
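For this example, MATLAB's pinv also returns a least squares solution (in fact the minimum-norm one); a sketch, not part of the slides:

% A'A is singular here, but the pseudoinverse still solves the problem
A = [1 1 1; 1 1 0];  b = [1; 0];
xhat = pinv(A)*b      % = [0; 0; 1], which satisfies x1 + x2 = 0 and x3 = 1
A*xhat - b            % = [0; 0], since b lies in C(A)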


4.3 Least squares approximations 4-23

Applications of least squares approximation

• Fitting a straight line: To find the line that minimizes the squared errors to a group of points.

– A line (excluding x₁ = constant) in ℝ² can be parameterized via t, under fixed constants D and C, as

x = [x₁; x₂] = t[1; D] + [0; C] = td + c = [t; C + tD].

– The distance of a point u = [t; b] to this line can be calculated by ‖v‖, where v = (u − c) − q, and q is the projection of u − c onto the line td.

[Figure: the point u and the line td + c; u − c decomposes into its projection q along the direction d plus the perpendicular component v.]


4.3 Least squares approximations 4-24

• So

q = d(dᵀd)⁻¹dᵀ(u − c) = (1/(1 + D²)) [1 D; D D²] (u − c).

Then

‖v‖² = ‖(u − c) − q‖² = ‖(u − c) − d(dᵀd)⁻¹dᵀ(u − c)‖²

= ‖(1/(1 + D²)) [D² −D; −D 1] [t − 0; b − C]‖²   (where [t − 0; b − C] = u − c)

= ‖(b − C − tD) [−D/(1 + D²); 1/(1 + D²)]‖²   (the vector inside the norm is v)

= (b − C − tD)²/(1 + D²) = (1/(1 + D²)) ‖b − [1 t][C; D]‖².

• For a sequence of vectors u₁, u₂, . . . , uₖ, their total squared distance to the line is

∑ᵢ₌₁ᵏ ‖vᵢ‖² = (1/(1 + D²)) ‖b − A[C; D]‖², where b = [b₁; . . . ; bₖ] and A = [1 t₁; . . . ; 1 tₖ].


4.3 Least squares approximations 4-25

However, the error (in the textbook) to be minimized is not

min_{C,D} ∑ᵢ₌₁ᵏ ‖vᵢ‖² = min_{C,D} (1/(1 + D²)) ‖b − A[C; D]‖²,

whose prefactor 1/(1 + D²) is a function of D, but

min_{C,D} (1 + D²) ∑ᵢ₌₁ᵏ ‖vᵢ‖² = min_{C,D} ‖[e₁; . . . ; eₖ]‖² = min_{C,D} ‖b − A[C; D]‖²,

where eᵢ = bᵢ − (C + tᵢD), or e = b − Ax̂.

Note that from the previous slide, vᵢ = eᵢ[−D/(1 + D²); 1/(1 + D²)], or

[v₁ · · · vₖ] = [−D/(1 + D²); 1/(1 + D²)][e₁ · · · eₖ].


4.3 Least squares approximations 4-26

(Problem 1, Section 4.3) Find the best straight line that fits the points

[t; b] = [0; 0], [1; 8], [3; 8] and [4; 20].

Give the errors for each point.

Answer:

• ∑ᵢ₌₁⁴ ‖vᵢ‖² = (1/(1 + D²)) ‖b − A[C; D]‖² and ‖e‖² = ‖b − A[C; D]‖², where b = [0; 8; 8; 20] and A = [1 0; 1 1; 1 3; 1 4].

• Solving AᵀAx̂ = Aᵀb, i.e., [4 8; 8 26] x̂ = [36; 112], yields x̂ = [C; D] = [1; 4].

• [v₁ v₂ v₃ v₄] = [−D/(1 + D²); 1/(1 + D²)](b − Ax̂)ᵀ = [−4/17; 1/17][−1 3 −5 3].

• [e₁ e₂ e₃ e₄] = (b − Ax̂)ᵀ = [−1 3 −5 3].
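A MATLAB sketch of the normal-equation solve for this problem:

% Problem 1, Section 4.3: best line b = C + D*t
A = [1 0; 1 1; 1 3; 1 4];  b = [0; 8; 8; 20];
xhat = (A'*A)\(A'*b)   % = [1; 4], i.e., the best line is b = 1 + 4t
e = b - A*xhat         % = [-1; 3; -5; 3]
norm(e)^2              % = 44, the minimized sum of squared errors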


4.3 Least squares approximations 4-27

[Figure: the four data points and the best line b = 1 + 4t, with t-axis ticks 1-4 and b-axis ticks 4-20; at each point both error segments described below are drawn.]

The red line segment is b − (C + tD) (the direct difference).

The blue line segment is the true error, i.e., the projection of the direct difference onto the vector perpendicular to the line C + tD.

Note that minimizing the two does not necessarily yield the same minimizers, i.e., C∗ and D∗.

For the application of least squares approximation just taught, the textbook defines the error to be eᵢ = bᵢ − (C + tᵢD), and the projection to be pᵢ = C + tᵢD.

See Figure 4.9 on page 227 or the next page.


4.3 Least squares approximations 4-28

(Problem 1, Section 4.3) With b = 0, 8, 8, 20 at t = 0, 1, 3, 4, set up and solve the normal equations AᵀAx̂ = Aᵀb. For the best straight line in Figure 4.9a, find its four heights pᵢ and four errors eᵢ. What is the minimum value E = e₁² + e₂² + e₃² + e₄²?

Figure 4.9: [reproduced from the textbook]


4.3 Least squares approximations 4-29

• Do not be confused: b − Ax̂ = e = [e₁; e₂; e₃; e₄] = [−1; 3; −5; 3] is perpendicular to the columns of A = [1 0; 1 1; 1 3; 1 4], i.e., e is perpendicular to the column space of A.

• In the above, ‖e‖ is the shortest distance from the vector b to the column space of A.


4.3 Least squares approximations 4-30

• For simplicity, in this textbook, we will directly plug the points into the target equation and find the best-fit curve.

Example. Find the best straight line b = C (respectively, the best parabola b = C + tD + t²E) that fits the points

[t; b] = [0; 0], [1; 8], [3; 8] and [4; 20].

Answer. Form the equations as follows:

0 = C|_{t=0}, 8 = C|_{t=1}, 8 = C|_{t=3}, 20 = C|_{t=4} ⇒ [1; 1; 1; 1][C] = [0; 8; 8; 20];

respectively,

0 = C + 0·D + 0·E
8 = C + 1·D + 1·E
8 = C + 3·D + 9·E
20 = C + 4·D + 16·E

⇒ [1 0 0; 1 1 1; 1 3 9; 1 4 16][C; D; E] = [0; 8; 8; 20].


4.3 Least squares approximations 4-31

Question: Find x̂ such that ‖b − Ax̂‖² is minimized (where b is not necessarily in C(A)).

Answer:

• This is to find x̂ satisfying the normal equations AᵀAx̂ = Aᵀb.

Calculus interpretation: Suppose all components are real.

• We wish to minimize E ≜ ‖b − Ax̂‖² = ∑ᵢ₌₁ᵐ (bᵢ − [aᵢ,₁ aᵢ,₂ · · · aᵢ,ₙ]x̂)²; hence, the solution satisfies

∂E/∂x̂ⱼ = ∑ᵢ₌₁ᵐ 2(−aᵢ,ⱼ)(bᵢ − [aᵢ,₁ aᵢ,₂ · · · aᵢ,ₙ]x̂) = −2[a₁,ⱼ a₂,ⱼ · · · aₘ,ⱼ](b − Ax̂) = 0 for j = 1, . . . , n.

I.e., Aᵀ(b − Ax̂) = 0, which coincides with the normal equations.

Summary: The derivative of ‖b − Ax‖² with respect to x is −2Aᵀ(b − Ax).
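The gradient formula can be spot-checked against finite differences; a minimal MATLAB sketch (the data are the earlier line-fitting example, and the test point x is arbitrary):

% Compare -2A'(b - Ax) with a central-difference gradient
A = [1 0; 1 1; 1 3; 1 4];  b = [0; 8; 8; 20];  x = [1; 1];
E = @(x) norm(b - A*x)^2;
g = -2*A'*(b - A*x);              % claimed gradient
h = 1e-6;  gd = zeros(2,1);
for j = 1:2
    d = zeros(2,1);  d(j) = h;
    gd(j) = (E(x + d) - E(x - d))/(2*h);
end
[g gd]                            % the two columns agree to about 1e-6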


4.3 Example 4-32

Example 4.3B. Find the parabola C + Dt + Et² that comes closest (in least squares error) to the five points

[t; b] = [−2; 0], [−1; 0], [0; 1], [1; 0] and [2; 0].

Answer.

• Write down the five equations:

C + D·(−2) + E·(−2)² = 0
C + D·(−1) + E·(−1)² = 0
C + D·(0) + E·(0)² = 1
C + D·(1) + E·(1)² = 0
C + D·(2) + E·(2)² = 0

⇒ Ax̂ = b, where A = [1 (−2) (−2)²; 1 (−1) (−1)²; 1 (0) (0)²; 1 (1) (1)²; 1 (2) (2)²], x̂ = [C; D; E] and b = [0; 0; 1; 0; 0].

• Since Ax̂ = b has no solution, we turn to the normal equations AᵀAx̂ = Aᵀb. The solution will minimize ‖b − Ax̂‖². □
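Carrying out the solve in MATLAB (a sketch; here AᵀA = [5 0 10; 0 10 0; 10 0 34] and Aᵀb = [1; 0; 0]):

% Example 4.3B: least squares parabola through the five points
t = [-2 -1 0 1 2]';  b = [0 0 1 0 0]';
A = [ones(5,1) t t.^2];
xhat = (A'*A)\(A'*b)   % = [17/35; 0; -1/7], i.e., b = 17/35 - t^2/7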


4.4 Orthogonal bases and Gram-Schmidt 4-33

• The problem of solving Ax = b (or minimizing ‖b − Ax‖²) will become easy if the columns of A are orthogonal.

• The reason for the above claim is obvious, since AᵀA becomes a diagonal matrix! The problem then reduces to:

AᵀAx = x = Aᵀb if AᵀA = I.

Definition (Orthonormal basis): The vectors q₁, q₂, . . . , qₙ are orthonormal if

qᵢᵀqⱼ = 0 when i ≠ j (orthogonality) and qᵢᵀqⱼ = 1 when i = j (unit vector).

• Q = [q₁ q₂ · · · qₙ] is sometimes called an orthonormal matrix.

Example (Orthonormal matrix). Every permutation matrix (e.g., [0 0 1; 1 0 0; 0 1 0]) is an orthonormal matrix. □


4.4 Orthogonal bases and Gram-Schmidt 4-34

An important fact about square orthonormal matrices:

• A square orthonormal matrix Q satisfies Qᵀ = Q⁻¹.

Proof: A direct consequence of QᵀQ = I. □

Example (Reflection matrix). Another interesting example of an orthonormal matrix Q arises when, for a unit vector q,

Q = I − 2qqᵀ = (I − qqᵀ) − qqᵀ = P⊥ − P.

Tip: Why is it called a reflection matrix?

• The projection matrix onto the subspace spanned by a unit vector q is

P = A(AᵀA)⁻¹Aᵀ = q(qᵀq)⁻¹qᵀ = qqᵀ.

• The projection matrix P⊥ onto the orthogonal complement of the space spanned by q is

P⊥ = I − qqᵀ.
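A tiny MATLAB sketch of the reflection (the unit vector q below is an arbitrary choice):

% Reflection across the orthogonal complement of q
q = [1; 0];
Q = eye(2) - 2*(q*q');
Q'*Q              % = I, so Q is indeed orthonormal
Q*[3; 5]          % = [-3; 5]: the component along q flips sign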


4.4 Orthogonal bases and Gram-Schmidt 4-35

• So, the reflection matrix gives Qb = P⊥b − Pb = p⊥ − p; hence, Qb is the "mirror" of b with respect to the orthogonal complement of the subspace spanned by q.

[Figure: b decomposed as p + p⊥; flipping p to −p yields Qb = p⊥ − p, the mirror image of b across the orthogonal complement.]


4.4 Properties of orthonormal matrix mapping 4-36

1. Length invariance: ‖Qx‖ = ‖x‖.

2. Angle invariance: (Qx)ᵀ(Qy) = xᵀy.

Proof: We only need to prove the second property (since the first property is a special case of the second property), which can easily be seen from

(Qx)ᵀ(Qy) = xᵀQᵀQy = xᵀIy = xᵀy. □

With A replaced by Q, we have

x̂ = (AᵀA)⁻¹Aᵀb = Qᵀb and p = A(AᵀA)⁻¹Aᵀb = QQᵀb and P = A(AᵀA)⁻¹Aᵀ = QQᵀ.

Further assuming that Q is a square matrix, we have QQᵀ = I and

x̂ = Qᵀb = [q₁ᵀb; . . . ; qₙᵀb] and p = QQᵀb = b and P = QQᵀ = I.

Example. Q = [cos θ −sin θ; sin θ cos θ] rotates b counterclockwise by the angle θ.


4.4 Gram-Schmidt procedure 4-37

Observations

• Under the assumption that Q is not necessarily a square matrix (and hence, possibly, p ≠ b), the error

e (= b − Qx̂) = b − x̂₁q₁ − x̂₂q₂ − · · · − x̂ₙqₙ

= b − (q₁ᵀb)q₁ − (q₂ᵀb)q₂ − · · · − (qₙᵀb)qₙ

= b − [q₁ q₂ · · · qₙ][q₁ᵀ; q₂ᵀ; . . . ; qₙᵀ]b = b − QQᵀb = (I − QQᵀ)b

(with Q of size m×n and I of size m×m) is perpendicular to q₁, q₂, · · ·, qₙ (i.e., is perpendicular to the column space of Q).

• If ‖e‖ ≠ 0, then b cannot be completely represented by Q.

– This means that q₁, q₂, . . . , qₙ are not enough to span the space of interest.

– We can then add a new vector qₙ₊₁ = e/‖e‖ to complete the basis for this space.

• This leads to the basic idea of the Gram-Schmidt procedure.


4.4 Gram-Schmidt procedure 4-38

The Gram-Schmidt procedure is used to identify the dimension spanned by a sequence of vectors b₁, b₂, . . . , bₙ and to find an orthonormal basis for this space.

(Modified) Gram-Schmidt procedure

Initialization: A sequence of (non-zero) vectors b₁, b₂, . . . , bₙ.

1. q₁ = e₁/‖e₁‖, where e₁ = b₁.

2. q₂ = e₂/‖e₂‖, where e₂ = b₂ − (q₁ᵀb₂)q₁, if ‖e₂‖ is not zero.

3. q₃ = e₃/‖e₃‖, where e₃ = b₃ − (q₁ᵀb₃)q₁ − (q₂ᵀb₃)q₂ = b₃ − [q₁ q₂][q₁ᵀ; q₂ᵀ]b₃ = (I − QQᵀ)b₃ with Q = [q₁ q₂], if ‖e₃‖ is not zero.

4. · · ·

5. Repeat the above procedure until bₙ is examined.

Note that, by the basic property of projections, e₂ is perpendicular to q₁, e₃ is perpendicular to q₁ and q₂, etc. See the first observation on the previous slide.


4.4 QR decomposition 4-39

• The Gram-Schmidt procedure actually changes A = [b₁ b₂ · · · bₙ] into QR:

A = [b₁  b₂  b₃  · · ·] (m×n)

= [e₁  e₂ + (q₁ᵀb₂)q₁  e₃ + (q₁ᵀb₃)q₁ + (q₂ᵀb₃)q₂  · · ·] (m×n)

= [‖e₁‖q₁  ‖e₂‖q₂ + (q₁ᵀb₂)q₁  ‖e₃‖q₃ + (q₁ᵀb₃)q₁ + (q₂ᵀb₃)q₂  · · ·] (m×n)

= [q₁ q₂ q₃ · · ·] (m×r) [‖e₁‖ q₁ᵀb₂ q₁ᵀb₃ · · ·; 0 ‖e₂‖ q₂ᵀb₃ · · ·; 0 0 ‖e₃‖ · · ·; . . .] (r×n)

= [q₁ q₂ q₃ · · ·] (m×r) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃ · · ·; 0 q₂ᵀb₂ q₂ᵀb₃ · · ·; 0 0 q₃ᵀb₃ · · ·; . . .] (r×n),

where the last step follows from, for example:

‖e₃‖ = (‖e₃‖q₃ᵀ)q₃ = e₃ᵀq₃ = (b₃ᵀ − (q₁ᵀb₃)q₁ᵀ − (q₂ᵀb₃)q₂ᵀ)q₃ = b₃ᵀq₃.


4.4 QR decomposition 4-40

• Can we decompose or factorize A into QR such that Q is an orthonormal matrix and R is an upper triangular matrix?

Answer: Yes, by the Gram-Schmidt procedure.

Notably, the upper triangular R is often taken to be a square matrix when m ≥ n in, e.g., MATLAB. Hence,

A (m×n) = Q (m×n) R (n×n).

What if r < n? Suppose r = 2 < n = 3, and

e₁ = (q₁ᵀb₁)q₁ = b₁ ≠ 0 ⇒ q₁ = e₁/‖e₁‖

e₂ = b₂ − (q₁ᵀb₂)q₁ = 0 ⇒ q₂ ??

e₃ = (q₃ᵀb₃)q₃ = b₃ − (q₁ᵀb₃)q₁ ≠ 0 ⇒ q₃ = e₃/‖e₃‖

⇒ b₁ = (q₁ᵀb₁)q₁, b₂ = (q₁ᵀb₂)q₁, b₃ = (q₁ᵀb₃)q₁ + (q₃ᵀb₃)q₃

⇒ A = [b₁ b₂ b₃] (m×3) = [q₁ q₃] (m×2) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 q₃ᵀb₃] (2×3)

= [q₁ q₂ q₃] (m×3) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 0; 0 0 q₃ᵀb₃] (3×3)


4.4 QR decomposition 4-41

Example. A = [1 1 1; 0 0 1; 0 0 0].

e₁ = b₁ = [1; 0; 0] ⇒ q₁ = e₁/‖e₁‖ = [1; 0; 0]

e₂ = b₂ − (q₁ᵀb₂)q₁ = [1; 0; 0] − [1; 0; 0] = [0; 0; 0] ⇒ q₂ ??

e₃ = b₃ − (q₁ᵀb₃)q₁ = [1; 1; 0] − [1; 0; 0] = [0; 1; 0] ⇒ q₃ = e₃/‖e₃‖ = [0; 1; 0]


4.4 QR decomposition 4-42

Choose q₂ to be a unit vector orthogonal to both q₁ and q₃. Then

⇒ A = [b₁ b₂ b₃] (3×3) = [q₁ q₃] (3×2) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 q₃ᵀb₃] (2×3)

(Answer 1) = [1 0; 0 1; 0 0] [1 1 1; 0 0 1]

= [q₁ q₂ q₃] (3×3) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 0; 0 0 q₃ᵀb₃] (3×3)

(Answer 2) = [1 0 0; 0 0 1; 0 1 0] [1 1 1; 0 0 0; 0 0 1]

(Answer 3) = [1 0 0; 0 1 0; 0 0 1] [1 1 1; 0 0 1; 0 0 0]


4.4 QR decomposition 4-43

Example. A = [1 1 1; 0 0 1].

e₁ = b₁ = [1; 0] ⇒ q₁ = e₁/‖e₁‖ = [1; 0]

e₂ = b₂ − (q₁ᵀb₂)q₁ = [1; 0] − [1; 0] = [0; 0] ⇒ q₂ ??

e₃ = b₃ − (q₁ᵀb₃)q₁ = [1; 1] − [1; 0] = [0; 1] ⇒ q₃ = e₃/‖e₃‖ = [0; 1]

⇒ A = [b₁ b₂ b₃] (2×3) = [q₁ q₃] (2×2) [q₁ᵀb₁ q₁ᵀb₂ q₁ᵀb₃; 0 0 q₃ᵀb₃] (2×3) = [1 0; 0 1] [1 1 1; 0 0 1]


4.4 QR decomposition 4-44

• In most cases A (m×n) has rank r = n; hence, A (m×n) = Q (m×n) R (n×n) and R is invertible. This simplifies

AᵀAx̂ = Aᵀb

to

AᵀAx̂ = Aᵀb ⇔ RᵀQᵀQRx̂ = RᵀQᵀb ⇔ Rx̂ = Qᵀb,

which, by the upper triangularity of R, can be solved directly by back substitution!
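In MATLAB this is the economy-size QR followed by a triangular solve; a sketch reusing the earlier line-fitting data:

% Least squares via QR: R*xhat = Q'*b, solved by back substitution
A = [1 0; 1 1; 1 3; 1 4];  b = [0; 8; 8; 20];
[Q, R] = qr(A, 0);     % economy-size: Q is 4x2, R is 2x2 upper triangular
xhat = R\(Q'*b)        % = [1; 4], same answer as the normal equations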


4.4 (Modified) Gram-Schmidt procedure 4-45

The MATLAB program to do the QR decomposition (or Gram-Schmidt process):

% Modified Gram-Schmidt
R = zeros(n);                  % Initialization
for j = 1:n
    v = A(:,j);                % v = b_j
    for i = 1:j-1              % The columns up to j-1 of Q are now q_1, ..., q_{j-1}.
        R(i,j) = Q(:,i)'*v;    % r_{i,j} = q_i'*b_j (m multiplications)  (12a)
        v = v - R(i,j)*Q(:,i); % v = v - (q_i'*b_j)*q_i (m multiplications)  (12b)
    end                        % v = e_j is now perpendicular to all q_1, ..., q_{j-1}.
    R(j,j) = norm(v);          % Set the diagonal entry of R. (m multiplications)  (11)
    Q(:,j) = v/R(j,j);         % Normalization to obtain q_j
end

(See Problem 28) In total, we need

m·∑ⱼ₌₁ⁿ 1 + 2m·∑ⱼ₌₁ⁿ ∑ᵢ₌₁ʲ⁻¹ 1 = mn + m(n − 1)n = mn² multiplications.

(Problem 28, Section 4.4) Where are the mn² multiplications in equations (11) and (12)?

A direct way to do the QR decomposition is:

[Q,R]=qr(A)


4.4 Gram-Schmidt procedure 4-46

The "modified" Gram-Schmidt procedure that we just introduced is numerically more stable than the "unmodified" (original) Gram-Schmidt procedure below (because the division of the former is only performed at the end).

Initialization: A sequence of (non-zero) vectors b₁, b₂, . . . , bₙ.

1. c₁ = b₁

2. c₂ = b₂ − (c₁ᵀb₂/c₁ᵀc₁)c₁ (= b₂ − c₁(c₁ᵀc₁)⁻¹c₁ᵀb₂)

3. c₃ = b₃ − (c₁ᵀb₃/c₁ᵀc₁)c₁ − (c₂ᵀb₃/c₂ᵀc₂)c₂ (= b₃ − [c₁ c₂]([c₁ c₂]ᵀ[c₁ c₂])⁻¹[c₁ c₂]ᵀb₃)

4. · · ·

5. Repeat the above procedure until bₙ is examined.


4.4 Naming convention 4-47

• The textbook uses A, B, C, . . . (uppercase letters) to denote the c₁, c₂, c₃, . . . obtained in the (original) Gram-Schmidt procedure on the previous slide. I personally do not like using uppercase boldface letters to denote vectors, so I change them to c₁, c₂, c₃, . . ..

– You may need to know this naming convention in order to solve the problems at the end of the section, for example, Problem 22.

(Problem 22, Section 4.4) Find orthogonal vectors A, B and C by Gram-Schmidt from

a = [1; 1; 2] and b = [1; −1; 0] and c = [1; 0; 4].


4.4 Naming convention 4-48

• (Definition from the textbook) Orthogonal matrix: Q is called an orthogonal matrix when Q is square and its columns form an orthonormal basis of the column space of Q.

– This naming convention does cause confusion. It should perhaps be called an orthonormal square matrix.

– So when being asked "Is Q⁻¹ an orthogonal matrix when Q is?", the answer should be YES according to the definition in the textbook. (See Problem 20.)

(Problem 20, Section 4.4) True or false (give an example in either case):

(a) Q⁻¹ is an orthogonal matrix when Q is an orthogonal matrix.

(b) If Q (3 by 2) has orthonormal columns, then ‖Qx‖ always equals ‖x‖.