Chapter 7

The Singular Value Decomposition

In an earlier chapter we looked at a procedure for diagonalizing a square matrix by using a change of basis. At that time we saw that not every square matrix could be diagonalized. In this chapter we will look at a generalization of that diagonalization procedure that will allow us to “diagonalize” any matrix, square or not square, invertible or not invertible. This procedure is called the singular value decomposition.

7.1 Singular Values

Let A be an m × n matrix. Then we know that A^T A will be a symmetric positive semi-definite n × n matrix. We can therefore find an orthonormal basis of R^n consisting of eigenvectors of A^T A. Let this orthonormal basis be {v_1, v_2, ..., v_n} and let λ_i be the eigenvalue of A^T A corresponding to the eigenvector v_i. Since A^T A is positive semi-definite we must have λ_i ≥ 0.

Now notice that

‖Av_i‖² = (Av_i)^T Av_i = v_i^T A^T A v_i = λ_i v_i^T v_i = λ_i

Therefore the length of Av_i is √λ_i. In other words, √λ_i is the factor by which the length of each eigenvector of A^T A is scaled when multiplied by A.

Furthermore, notice that for i ≠ j we have

Av_i · Av_j = (Av_i)^T Av_j = v_i^T A^T A v_j = λ_j v_i · v_j = 0

so {Av_1, Av_2, ..., Av_n} is an orthogonal set of vectors. If we want to normalize a non-zero vector Av_i in this set we just have to scale it by 1/√λ_i. Note also that some of the vectors in this set could be the zero vector if 0 happens to be an eigenvalue of A^T A. In fact one of these vectors will definitely be the zero vector whenever Nul A ≠ {0} (that is, whenever the columns of A are linearly dependent). The reason is as follows:

Nul A ≠ {0} ⟹ Ax = 0 for some x ≠ 0 ⟹ A^T Ax = 0 ⟹ x is an eigenvector of A^T A with eigenvalue 0.

The implication also works in the other direction:

0 is an eigenvalue of A^T A ⟹ A^T Ax = 0 for some x ≠ 0 ⟹ x^T A^T Ax = 0 ⟹ ‖Ax‖ = 0 ⟹ Ax = 0 ⟹ the columns of A are linearly dependent.
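The scaling fact ‖Av_i‖ = √λ_i is easy to check numerically. The following is a minimal Maple sketch only, using the LinearAlgebra commands that appear in the Maple sections of this chapter; the 3 × 2 matrix chosen is the one that also appears in Example 7.1.1 below.

>with(LinearAlgebra):
>A:=<<1,1,1>|<1,1,-1>>:                       # an illustrative 3 x 2 matrix
>lambda,W:=Eigenvectors(Transpose(A).A):      # eigenvalues and eigenvectors of A^T A
>v1:=Column(W,1)/Norm(Column(W,1),2):         # a unit eigenvector of A^T A
>simplify(Norm(A.v1,2));                      # the length of A.v1 ...
>simplify(sqrt(lambda[1]));                   # ... agrees with sqrt(lambda_1)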


The above comments lead to the following definition.

Definition 22 Let A be an m × n matrix. Then the singular values of A are defined to be the square roots of the eigenvalues¹ of A^T A. The singular values of A will be denoted by σ_1, σ_2, ..., σ_n. It is customary to list the singular values in decreasing order, so it will be assumed that

σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n ≥ 0

¹Some textbooks prefer to define the singular values of A as the square roots of the non-zero eigenvalues of A^T A.

Example 7.1.1

What are the singular values of A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & -1 \end{bmatrix}?

The first step is to compute A^T A, which gives \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}. This matrix has the characteristic polynomial

λ² − 6λ + 8 = (λ − 4)(λ − 2)

which gives us the two eigenvalues 4 and 2. We take the square roots of these to get the singular values, σ_1 = 2 and σ_2 = √2.

The vectors v_1 = \begin{bmatrix} √2/2 \\ √2/2 \end{bmatrix} and v_2 = \begin{bmatrix} √2/2 \\ -√2/2 \end{bmatrix} would be orthonormal eigenvectors of A^T A. What happens when these two vectors are multiplied by A?

Av_1 = \begin{bmatrix} √2 \\ √2 \\ 0 \end{bmatrix} and this vector has length σ_1 = 2.

Av_2 = \begin{bmatrix} 0 \\ 0 \\ √2 \end{bmatrix} and this vector has length σ_2 = √2.

So the lengths of v_1 and v_2 are scaled by the corresponding singular values when these vectors are multiplied by A. Note also, as mentioned earlier, that Av_1 and Av_2 are orthogonal.

Now consider the following problem: let B = A^T. What are the singular values of B? B^T B will be a 3 × 3 matrix, so B has 3 singular values. It was shown earlier that A^T A and AA^T will have the same non-zero eigenvalues, so the singular values of B will be 2, √2, and 0.
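As a quick numerical cross-check of this example, here is a minimal Maple sketch (an illustration only, using the LinearAlgebra commands shown in the Maple sections later in this chapter); the eigenvalues may be returned in either order.

>with(LinearAlgebra):
>A:=<<1,1,1>|<1,1,-1>>:
>ev:=Eigenvalues(Transpose(A).A);          # the eigenvalues 4 and 2 of A^T A
>map(sqrt,ev);                             # the singular values 2 and sqrt(2)
>map(sqrt,Eigenvalues(A.Transpose(A)));    # for B = A^T: 2, sqrt(2) and 0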

Example 7.1.2

Let A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}. What are the singular values of A? (Note that in this case the columns of A are not linearly independent so, for reasons mentioned earlier in this section, 0 will turn out to be a singular value.)


The procedure is straightforward. First we compute

A^T A = \begin{bmatrix} 2 & 4 \\ 4 & 8 \end{bmatrix}

and this matrix has characteristic polynomial λ² − 10λ, which gives eigenvalues of 10 and 0, and so we get σ_1 = √10 and σ_2 = 0 as the singular values of A.

For λ = 10 we would have a unit eigenvector of v_1 = \begin{bmatrix} 1/√5 \\ 2/√5 \end{bmatrix}.

Then Av_1 = \begin{bmatrix} √5 \\ √5 \end{bmatrix}, which has length σ_1 = √10.

For λ = 0 we would have a unit eigenvector of v_2 = \begin{bmatrix} -2/√5 \\ 1/√5 \end{bmatrix}.

Then Av_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, and this vector has length σ_2 = 0.

The Singular Value Decomposition

Here is the main theorem for this chapter.

Theorem 7.1 (The Singular Value Decomposition) Let A be any m × n matrix. Then we can write A = UΣV^T where U is an m × m orthogonal matrix, V is an n × n orthogonal matrix, and Σ is an m × n matrix whose first r diagonal entries are the non-zero singular values σ_1, σ_2, ..., σ_r of A and all other entries are zero. The columns of V are called the right singular vectors. The columns of U are called the left singular vectors.

Proof. Let A be any m × n matrix. Let σ_1, σ_2, ..., σ_n be the singular values of A (with σ_1, σ_2, ..., σ_r the non-zero singular values) and let v_1, v_2, ..., v_n be the corresponding orthonormal eigenvectors of A^T A. Let V = [v_1 v_2 ⋯ v_n]. So V is an orthogonal matrix and

AV = [Av_1 Av_2 ⋯ Av_n] = [Av_1 Av_2 ⋯ Av_r 0 ⋯ 0]

We will mention here (the proof is left as an exercise) that r will be the rank of A. So it is possible that r = n, in which case there will not be any columns of zeroes in AV.

Now let u_i = (1/σ_i) Av_i for 1 ≤ i ≤ r. As we saw earlier these vectors will form an orthonormal set of r vectors in R^m. Extend this set to an orthonormal basis of R^m by adding m − r appropriate vectors u_{r+1}, ..., u_m, and let U = [u_1 u_2 ⋯ u_r u_{r+1} ⋯ u_m]. Then U will be an orthogonal matrix and

UΣ = [u_1 u_2 ⋯ u_m] \begin{bmatrix} σ_1 & 0 & 0 & ⋯ & 0 \\ 0 & σ_2 & 0 & ⋯ & 0 \\ 0 & 0 & σ_3 & ⋯ & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \end{bmatrix} = [σ_1 u_1 σ_2 u_2 ⋯ σ_r u_r 0 ⋯ 0] = [Av_1 Av_2 ⋯ Av_r 0 ⋯ 0]

(In case the above reasoning is unclear, remember that in the product UΣ the columns of Σ contain the weights given to the columns of U, and after the rth column all the entries in Σ are zeroes.)


Therefore AV = UΣ, and multiplying on the right by V^T gives us the singular value decomposition A = UΣV^T.

The singular value decomposition (SVD) can also be written as

A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ⋯ + σ_r u_r v_r^T

You should see a similarity between the singular value decomposition and the spectral decomposition. In fact, if A is symmetric and positive definite they are equivalent.

The singular value decomposition of a matrix is not unique. The right singular vectors are orthonormal eigenvectors of A^T A. If an eigenspace of this matrix is 1-dimensional there are two choices for the corresponding singular vector; these choices are negatives of each other. If an eigenspace has dimension greater than 1 then there are infinitely many choices for the (orthonormal) eigenvectors, but any of these choices would be an orthonormal basis of the same eigenspace. Furthermore, as seen in the above proof, it might be necessary to add columns² to U to make up an orthonormal basis for R^m. There will be a certain amount of freedom in choosing these vectors.

²Suppose we let W be the span of {u_1, u_2, ..., u_r}. Then the columns that we add are an orthonormal basis of W^⊥.
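A numerical SVD, and the rank-one expansion above, can be checked directly in Maple. The following sketch mirrors the SingularValues usage in the Maple sections of this chapter; the floating-point matrix is the one from Example 7.1.1 and is only an illustration.

>with(LinearAlgebra):
>A:=<<1.,1.,1.>|<1.,1.,-1.>>:
>U,S,Vt:=SingularValues(A,output=['U','S','Vt']):
>U.DiagonalMatrix(S[1..2],3,2).Vt;                 # reassembles A up to rounding
>add(S[i]*Column(U,i).Row(Vt,i),i=1..2);           # the same result as a sum of rank-1 terms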

Example 7.1.3

To illustrate the proof of Theorem 7.1 we will outline the steps required to find the SVD of

A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}

In Example 7.1.2 we found the singular values of A to be σ_1 = √10 and σ_2 = 0, so we know that

Σ = \begin{bmatrix} √10 & 0 \\ 0 & 0 \end{bmatrix}

If we take the right singular vectors (in the appropriate order) as columns then we have

V = \begin{bmatrix} 1/√5 & -2/√5 \\ 2/√5 & 1/√5 \end{bmatrix}

Take a moment to consider the following questions:

i. Are there any other possible answers for Σ in this example?

ii. Are there any other possible answers for V in this example?

The answer is no to the first question, and yes to the second. There are four possible choices for V. (What are they?)

Now how can we find U? From the proof of Theorem 7.1 we see that

u_1 = (1/σ_1) Av_1 = (1/√10) \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1/√5 \\ 2/√5 \end{bmatrix} = (1/√10) \begin{bmatrix} √5 \\ √5 \end{bmatrix} = \begin{bmatrix} 1/√2 \\ 1/√2 \end{bmatrix}


This gives us the first column of U, but we can't find u_2 the same way since σ_2 = 0. To find u_2 we just have to extend u_1 to an orthonormal basis of R^2. It should be clear that letting u_2 = \begin{bmatrix} -1/√2 \\ 1/√2 \end{bmatrix} will work. So we now have

U = \begin{bmatrix} 1/√2 & -1/√2 \\ 1/√2 & 1/√2 \end{bmatrix}

Again, stop now and ask yourself if there are any other possible choices for U at this stage. (The answer is yes: for any particular choice of V there are 2 choices for U.) We now have the SVD

A = UΣV^T = \begin{bmatrix} 1/√2 & -1/√2 \\ 1/√2 & 1/√2 \end{bmatrix} \begin{bmatrix} √10 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1/√5 & 2/√5 \\ -2/√5 & 1/√5 \end{bmatrix}

You should recognize U and V as rotation matrices. This SVD can also be written in the form

σ_1 u_1 v_1^T + σ_2 u_2 v_2^T = √10 \begin{bmatrix} 1/√2 \\ 1/√2 \end{bmatrix} \begin{bmatrix} 1/√5 & 2/√5 \end{bmatrix}

Example 7.1.4

Find the SVD of A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & -1 \end{bmatrix}.

We used this matrix for an earlier example and so we already have most of the important information. From the earlier results we know that

Σ = \begin{bmatrix} 2 & 0 \\ 0 & √2 \\ 0 & 0 \end{bmatrix}   and   V = \begin{bmatrix} √2/2 & √2/2 \\ √2/2 & -√2/2 \end{bmatrix}

The last step is to find U. The first column of U will be Av_1 normalized, so u_1 = \begin{bmatrix} √2/2 \\ √2/2 \\ 0 \end{bmatrix}. Similarly u_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. What about u_3? First notice that at this point we can write the SVD as follows:

UΣV^T = \begin{bmatrix} √2/2 & 0 & ∗ \\ √2/2 & 0 & ∗ \\ 0 & 1 & ∗ \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & √2 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} √2/2 & √2/2 \\ √2/2 & -√2/2 \end{bmatrix} = \begin{bmatrix} √2/2 & 0 & ∗ \\ √2/2 & 0 & ∗ \\ 0 & 1 & ∗ \end{bmatrix} \begin{bmatrix} √2 & √2 \\ 1 & -1 \\ 0 & 0 \end{bmatrix}

If we now carry out the last matrix multiplication, the entries in the third column of U all get multiplied by 0. So in a sense it doesn't matter what entries go in that last column.

This can also be seen if we write the SVD in the form σ_1 u_1 v_1^T + σ_2 u_2 v_2^T. Since there is no σ_3 it follows that the value of u_3 is not relevant when the SVD is expressed in this form. In this form the SVD gives

σ_1 u_1 v_1^T + σ_2 u_2 v_2^T = 2 \begin{bmatrix} √2/2 \\ √2/2 \\ 0 \end{bmatrix} \begin{bmatrix} √2/2 & √2/2 \end{bmatrix} + √2 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} √2/2 & -√2/2 \end{bmatrix}
= 2 \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \\ 0 & 0 \end{bmatrix} + √2 \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ √2/2 & -√2/2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & -1 \end{bmatrix}

But having said all this, U should have a third column, and if you wanted to find it how could you do it? The set {u_1, u_2} is an orthonormal basis for a plane in R^3. To extend these two vectors to an orthonormal basis for all of R^3 we want a third vector, u_3, that is normal to this plane. One way of doing this would be to let u_3 = u_1 × u_2. This would give u_3 = \begin{bmatrix} √2/2 \\ -√2/2 \\ 0 \end{bmatrix}.
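The cross-product step, and the orthogonality of the resulting U, can be checked with a short Maple sketch (an illustration only; CrossProduct is part of the same LinearAlgebra package used throughout this chapter):

>with(LinearAlgebra):
>u1:=<sqrt(2)/2,sqrt(2)/2,0>: u2:=<0,0,1>:
>u3:=CrossProduct(u1,u2);            # gives <sqrt(2)/2, -sqrt(2)/2, 0>
>U:=<u1|u2|u3>:
>simplify(Transpose(U).U);           # the 3 x 3 identity, so U is orthogonal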


Exercises

1. Find a singular value decomposition of the following matrices.

(a) \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}   (b) \begin{bmatrix} 6 & 3 \\ -1 & 2 \end{bmatrix}   (c) \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}

2. Find a singular value decomposition of the following matrices.

(a) \begin{bmatrix} 0 & 2 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}   (b) \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}   (c) \begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}

3. What are the singular values of the matrix \begin{bmatrix} cos(θ) & sin(θ) \\ sin(θ) & cos(θ) \end{bmatrix}?

4. Let A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 2 & 2 \end{bmatrix}. Find a SVD for A and A^T.

5. (a) Let A = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}. This is a symmetric indefinite matrix. Find a spectral decomposition and a singular value decomposition for this matrix.

(b) Let A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}. This is a symmetric indefinite matrix. Find a spectral decomposition and a singular value decomposition for this matrix.

(c) If A is a symmetric matrix show that the singular values of A are just the absolute value of the eigenvalues of A.

6. Find a singular value decomposition for the following matrices. Note that these matrices have different sizes, but they are all of rank 1 so in each case the SVD can be written σ_1 u_1 v_1^T.

(a) \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}   (b) \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}   (c) \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}   (d) \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}

7. Find a singular value decomposition for the following matrices. Note that these matrices have different sizes, but they are all of rank 2 so in each case the SVD can be written σ_1 u_1 v_1^T + σ_2 u_2 v_2^T.

(a) \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}   (b) \begin{bmatrix} 1 & 0 & 2 \\ 1 & 0 & 2 \\ 0 & 1 & 0 \end{bmatrix}   (c) \begin{bmatrix} 1 & 0 & 2 & 0 \\ 1 & 0 & 2 & 0 \\ 0 & 1 & 0 & 2 \end{bmatrix}

8. Find the SVD of A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & -1 \end{bmatrix}.

9. The matrix A = \begin{bmatrix} 1 & 3/2 \\ 0 & 1 \end{bmatrix} is not diagonalizable. What is the singular value decomposition of this matrix?

10. Let A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}. Find the singular value decomposition A = UΣV^T. How many choices are there for the second and third column of U?

11. Let A = UΣV^T be the singular value decomposition of A. Express the following in terms of U, Σ and V.

(a) A^T A
(b) AA^T
(c) (A^T A)^{-1}A^T (assuming A has linearly independent columns)
(d) A(A^T A)^{-1}A^T (assuming A has linearly independent columns)

12. Suppose A is a square matrix with singular value decomposition A = UΣV^T.

(a) What is the SVD of A^T?
(b) If A is invertible, what is the SVD of A^{-1}?
(c) Show that |det(A)| is the product of the singular values of A.

13. Let A = UΣV^T be the singular value decomposition of the m × n matrix A with U = [u_1 u_2 ⋯ u_m] and V = [v_1 v_2 ⋯ v_n]. Show that

σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ⋯ + σ_k u_k v_k^T

has rank k. (Hint: show that {v_{k+1}, v_{k+2}, ..., v_n} is a basis for Nul A.)

14. (a) Suppose A is a symmetric matrix with the spectral decomposition A = PDP^T. Show that the spectral decomposition of A + I is P(D + I)P^T.

(b) Suppose A is a square matrix with the SVD A = UΣV^T. Is the SVD of A + I given by U(Σ + I)V^T?

15. Let Q be a matrix with orthonormal columns. What does a SVD of Q look like?


Using MAPLE

Example 1

The Maple command for computing the SVD is SingularValues and is illustrated below. We will find the SVD of

A = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 0 & 1 \\ 2 & 3 & 0 \\ 1 & 2 & 3 \end{bmatrix}

>A:=<<0,3,2,1>|<1,0,3,2>|<2,1,0,3>>;

>U,S,Vt:=SingularValues(A,output=['U','S','Vt']):

>U;

\begin{bmatrix} -0.32302 & 0.49999 & 0.03034 & -0.80296 \\ -0.41841 & -0.49999 & 0.74952 & -0.11471 \\ -0.55065 & -0.50000 & -0.65850 & -0.11471 \\ -0.64604 & 0.49999 & 0.06068 & 0.57354 \end{bmatrix}

>S;

\begin{bmatrix} 5.35768 \\ 2.82843 \\ 2.30115 \\ 0.0 \end{bmatrix}

>Vt;

\begin{bmatrix} -0.56043 & -0.60979 & -0.56043 \\ -0.70711 & -2.6895 × 10^{-16} & 0.70711 \\ 0.43119 & -0.79256 & 0.43119 \end{bmatrix}

The singular values are returned as a vector, not in the form of a diagonal matrix. If you want the singular values in a matrix you can enter

>DiagonalMatrix(S[1..3],4,3);

>U.%.Vt;

This last command returns the following matrix

\begin{bmatrix} 0.00000000036 & 1.0 & 2.0 \\ 3.0 & 0 & 0.9999999997 \\ 2.0 & 3.0 & 0.0000000001 \\ 1.000000001 & 2.0 & 3.000000001 \end{bmatrix}


This is matrix A with some small differences due to the accumulation of rounding errors in the floating point arithmetic. The precision of our result could be improved by increasing the value of the Digits variable in Maple. We could also write the SVD in the form

\sum_{i=1}^{3} σ_i u_i v_i^T

In Maple this sum could be entered as

>simplify(add(S[i]*Column(U,i).Row(Vt,i),i=1..3));

This will again give matrix A with some rounding errors.

Example 2

We will use Maple to find the singular values of

A = \begin{bmatrix} 1 & a \\ 1 & 0 \\ 1 & a \end{bmatrix}

and we will investigate how these singular values relate to the parameter a.

>A:=<<1,1,1>|<a,0,a>>;

>U,S,Vt:=SingularValues(A,output=['U','S','Vt'],conjugate=false);

We now have the two singular values of A expressed in terms of the parameter a. We can visualize the relationship between a and these singular values as follows:

>plot({ [a,S[1],a=-4..4], [a,S[2],a=-4..4] });

We get Figure 7.1.

Figure 7.1: The singular values of A versus a.

The plot seems to indicate that one of the singular values, S[2], approaches a limit as a becomes large. We can compute this limit in Maple as follows

>limit(S[2], a=infinity);

1

We look at a variation on the same type of problem. Suppose we want to investigate the singular values of matrices of the form

B = \begin{bmatrix} cos(t) & sin(t) \\ sin(t) & cos(t) \end{bmatrix}

We will first define

>f:=t-><<cos(t),sin(t)>|<sin(t),cos(t)>>;

This defines a function in Maple which returns a matrix of the desired form for any specified value of t. For example the command

>f(1);

will return \begin{bmatrix} cos(1) & sin(1) \\ sin(1) & cos(1) \end{bmatrix} and

>f(k);

will return \begin{bmatrix} cos(k) & sin(k) \\ sin(k) & cos(k) \end{bmatrix}

Next we enter

>g:=t->map( sqrt, eigenvals(transpose(f(t))&*f(t)) );

This will compute the singular values of our matrix for any specified value of t. For example, the command

>g(.3);

[ .659816, 1.250857 ]

returns the singular values of \begin{bmatrix} cos(.3) & sin(.3) \\ sin(.3) & cos(.3) \end{bmatrix}

So we can enter

>sv:=g(t):

>plot( [ sv[1], sv[2] ], t=-3..3);

These commands give Figure 7.2, which plots the singular values of our matrix as a function of t.

Example 3

We have seen that the SVD of matrix A can be expressed as

A = \sum_{i=1}^{r} σ_i u_i v_i^T


Figure 7.2: The singular values of B versus t.

where r is the rank of A. For any integer n with 0 < n ≤ r the sum

\sum_{i=1}^{n} σ_i u_i v_i^T

is called the rank n singular value approximation of A. Before getting to the main problem we will look at a simple example to illustrate the basic idea.

Let A = \begin{bmatrix} 1.4 & 0.0 & 3.0 \\ 1.1 & 0.0 & 0.0 \\ 2.1 & 2.1 & 2.1 \end{bmatrix}. This is a 3 × 3 matrix of rank 3. We will find the SVD of A using Maple.

>A:=<<1.4, 1.1, 2.1>|<0.0,0.0,2.1>|<2.1,2.1,2.1>>;
>U,S,Vt:=SingularValues(A,output=['U','S','Vt']);
>u1:=Column(U,1): ### the left singular vectors
>u2:=Column(U,2):
>u3:=Column(U,3):
>v1:=Row(Vt,1): ### the right singular vectors as row vectors
>v2:=Row(Vt,2):
>v3:=Row(Vt,3):

The rank 1 singular value approximation would be

>A1:=S[1]*u1.v1;

>A1:=U.DiagonalMatrix(<S[1],0,0>).Vt; ### another way to get the same result

A1 = \begin{bmatrix} 1.719 & 1.023 & 2.309 \\ .348 & .207 & .468 \\ 1.953 & 1.163 & 2.624 \end{bmatrix}

How close is matrix A1 to A? This question makes sense only relative to an inner product. We will use the inner product 〈A, B〉 = trace(A^T B).

The distance from A to A1 can now be computed as

>sqrt(Trace((A-A1)^%T.(A-A1)));

1.9046

We will mention without proof that, in fact, matrix A1 is the closest you can get to A by a matrix of rank 1 relative to this inner product.

The rank 2 approximation would be

>A2:=S[1]*u1.v1 + S[2]*u2.v2; ### one way
>A2:=U.DiagonalMatrix(<S[1],S[2],0>).Vt; ### another way

A2 = \begin{bmatrix} 1.365 & .0232 & 3.016 \\ .433 & .447 & .299 \\ 2.249 & 1.000 & 2.033 \end{bmatrix}

If you compare the entries in this matrix with those in A you can see that it appears to be closer to A than matrix A1 is. How far is A2 from A?

>sqrt(Trace((A-A2)^%T.(A-A2)));

.8790

So we see that A2 is a better approximation to A than A1. A2 will be the closest you can get to A by a rank 2 matrix.

If we were to continue this for one more step and compute the rank 3 singular value approximation we would get A exactly. The distance from A3 to A would be 0.

We will extend this idea to a larger matrix. In this example we will choose a random 12 × 12 matrix and compute the distance between the rank n singular value approximation of A and A itself for n = 1..12. The distance will be computed relative to the inner product 〈A, B〉 = trace(A^T B).

>A:=RandomMatrix(12,12, generator=0.0..9.0):

>U,S,Vt:=SingularValues(A,output=['U','S','Vt']);

>ip:=(A,B)->Trace(A^%T.B); ### our inner product

We will now define our rank n approximations in Maple. Then we compute the distances (i.e., the errors of our approximations) using the inner product.

>for n to 12 do

B[n]:=eval(add(S[i]*Column(U,i).Row(Vt,i),i=1..n)) od:

>for n to 12 do

err[n]:=sqrt(ip(A-B[n],A-B[n])) od;

We can visualize these errors using a plot.

>plot([seq( [i,err[i]],i=1..12)],style=point);

This gives Figure 7.3. Of course B12 = A so the final error must be 0. The above pattern is typical for any matrix. As n increases the approximations become better and better, and the approximation becomes exact when n = r.

There is another interesting aspect to this example. We have found the singular values and placed them in the vector S. From these values we will define the following quantities

e_1 = √(σ_2² + σ_3² + σ_4² + ⋯ + σ_12²)
e_2 = √(σ_3² + σ_4² + ⋯ + σ_12²)
e_3 = √(σ_4² + ⋯ + σ_12²)
⋮
e_11 = σ_12
e_12 = 0

Figure 7.3: The errors of the SVD approximations.

and plot them.

>for i to 12 do e[i]:=sqrt( add( S[j]^2, j=i+1..12)) od:
>plot( [seq( [i, e[i]], i=1..12)], style=point);

This plot turns out to be exactly the same as Figure 7.3. This illustrates a fact that is true in general and whose proof is left as an exercise³: the error of the rank n singular value approximation is the square root of the sum of the squares of the unused singular values. That is, if you look at the unused singular values as a vector, then the error is the length of this vector.

³The trickiest part of the proof depends on the fact that if v is a unit vector then the trace of vv^T is 1.
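For the 3 × 3 matrix used earlier in this example the fact can be checked directly. The following two Maple lines are a sketch only; they reuse the names A, A1 and S defined above and the same Trace-based distance used there.

>sqrt(Trace((A-A1)^%T.(A-A1)));      # error of the rank 1 approximation, about 1.9046
>sqrt(S[2]^2+S[3]^2);                # length of the vector of unused singular values, the same number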


7.2 Geometry of the Singular Value Decomposition

Let A = \begin{bmatrix} 2 & -1 \\ 2 & 2 \end{bmatrix}. This matrix has the following SVD:

A = UΣV^T = \begin{bmatrix} 1/√5 & -2/√5 \\ 2/√5 & 1/√5 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2/√5 & -1/√5 \\ 1/√5 & 2/√5 \end{bmatrix}^T

The matrices U and V^T are orthogonal matrices, and in this case they are simple rotation matrices (i.e., there is no reflection). U corresponds to a counter-clockwise rotation by 63.4° and V^T corresponds to a clockwise rotation of 26.6°. Finally, Σ is a diagonal matrix so it corresponds to a scaling by the factors of 3 and 2 along the two axes. So what happens to the unit circle when it is multiplied by A? We will look at the effect of multiplying the unit circle by each of the factors of the SVD in turn. The steps are illustrated in Figures 7.4 - 7.7.

Figure 7.4: The unit circle with the right singular vectors.

Figure 7.5: The unit circle is rotated by V^T. The right singular vectors now lie on the axes.

Figure 7.6: The unit circle is scaled by Σ resulting in an ellipse.

Figure 7.7: The ellipse is rotated by U.

In Figure 7.4 we see the unit circle with the right singular vectors (the columns of V) plotted.

In Figure 7.5 the unit circle has been multiplied by V^T, which means it has been rotated clockwise. There is something you should understand about this result. First, recall that the columns of V form an orthonormal set of vectors, the right singular vectors. When these vectors (arranged in matrix V) are multiplied by V^T we get the identity matrix. This means that the right singular vectors have been reoriented (by a rotation and possibly a reflection) to lie along the axes of the original coordinate system. So in Figure 7.5 we see that the right singular vectors have been rotated to lie on the x and y axes. (This happens in every case: multiplying by V^T rotates, and possibly flips, the right singular vectors so that they line up along the original axes.)

In Figure 7.6 the rotated unit circle is multiplied by Σ. Since Σ is a diagonal matrix we see the expected result. The circle has been scaled by a factor of 3 along the x axis and by a factor of 2 along the y axis. The circle has now been transformed into an ellipse.

Finally in Figure 7.7 we multiply by U. This is a rotation matrix, so the ellipse in Figure 7.6 is rotated so that it is no longer oriented along the x and y axes. The axes of the ellipse are now the left singular vectors. The vectors shown in Figure 7.7 are not the left singular vectors; they are the vectors Av_1 and Av_2. The left singular vectors would be the result of normalizing these two vectors.

To summarize the above: the unit circle is transformed into an ellipse when it is multiplied by A. The axes of the ellipse are in the directions of u_1 and u_2. The points on the ellipse that are furthest from the origin are Av_1 and its negative. The points on the ellipse that are closest to the origin are Av_2 and its negative.

PROBLEM. Repeat the above example with A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}. Use the fact that the SVD for this matrix is

A = UΣV^T = \begin{bmatrix} 1/√2 & -1/√2 \\ 1/√2 & 1/√2 \end{bmatrix} \begin{bmatrix} √10 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1/√5 & -2/√5 \\ 2/√5 & 1/√5 \end{bmatrix}^T

Suppose we try a similar analysis with the matrix A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & -1 \end{bmatrix}. We have already computed the SVD of A:

A = UΣV^T = \begin{bmatrix} √2/2 & 0 & √2/2 \\ √2/2 & 0 & -√2/2 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & √2 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} √2/2 & √2/2 \\ √2/2 & -√2/2 \end{bmatrix}

In this case notice that A is a 3 × 2 matrix, so multiplication by A would correspond to a linear transformation from R^2 to R^3. In the SVD we have A = UΣV^T where U is a 3 × 3 matrix, V is a 2 × 2 matrix, and Σ is 3 × 2. So U corresponds to a transformation from R^3 to R^3, V^T corresponds to a transformation from R^2 to R^2, and Σ corresponds to a transformation from R^2 to R^3.

So suppose we start with the unit circle in R^2. When we multiply by V^T the circle looks the same; it has just been rotated so that the right singular vectors lie along the axes. Next we multiply by Σ. Notice that for any vector in R^2 we have

Σ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & √2 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2x \\ √2 y \\ 0 \end{bmatrix}

So what happens here? We see that the x value is scaled by 2 and the y value is scaled by √2, so again the circle is stretched into an ellipse. But something else happens: there is a third coordinate of 0 that gets added on. In other words we still have an ellipse in the xy plane, but the ellipse is now located in 3 dimensional space. In this case multiplying by Σ has the effect of scaling and zero-padding. It is the zero-padding that results in the change of dimension.

Finally we multiply by U which again will be a rotation matrix, but now the rotation is in R^3 so the ellipse is rotated out of the xy plane. The unit circle is again transformed into an ellipse, but the resulting ellipse is located in 3 dimensional space. These transformations are illustrated in Figures 7.8 - 7.11.


Figure 7.8: The unit circle with the right singular vectors.

Figure 7.9: The unit circle is multiplied by V^T. The right singular vectors now lie on the axes.

Figure 7.10: The unit circle is scaled into an ellipse by Σ and inserted into R^3.

Figure 7.11: The ellipse is rotated in R^3 by U.

PROBLEM. Do a similar analysis for multiplying the unit sphere by A^T. (There are a couple of major differences with this example. In particular, what exactly do you end up with in this case?)

In summary, you should understand that finding the SVD of a matrix A can be interpreted as factoring the matrix into a rotation followed by a scaling followed by another rotation. This last sentence is a bit of an oversimplification in that there could also be reflections involved in the orthogonal matrices. Also, if A is not a square matrix then multiplying by Σ will involve truncation (decreasing the dimension) or zero padding (increasing the dimension).

The SVD and Linear Transformations

If A is an m × n matrix then T(x) = Ax would be a linear transformation from R^n to R^m:

        A
  R^n -----> R^m

Now when we find the singular value decomposition A = UΣV^T the matrices U and V^T can be looked at as change of basis matrices giving the following diagram.

        A
  R^n -----> R^m
   |           ^
   | V^T       | U
   v           |
  R^n -----> R^m
        Σ

From this point of view you can look at A and Σ as corresponding to the same linear transformation relative to different bases in the domain and codomain. More specifically, if the vectors in the domain are expressed in terms of the columns of V and vectors in the codomain are expressed in terms of the columns of U, then multiplication by A (in the standard basis) corresponds to multiplication by Σ.

If the domain and codomain have different dimensions then the change in dimension is a result of the operation of Σ. If the dimension is increased via the transformation, this is accomplished through zero padding. If the dimension is decreased, this is accomplished through truncation.⁴

⁴In fact we can write Σ = \begin{bmatrix} D & O \end{bmatrix} = D \begin{bmatrix} I & O \end{bmatrix} or Σ = \begin{bmatrix} D \\ O \end{bmatrix} = \begin{bmatrix} I \\ O \end{bmatrix} D, where D is a square diagonal matrix (with possibly some zeroes on the diagonal). So Σ can be written as the product of a square matrix which scales the entries in a vector and a truncation matrix \begin{bmatrix} I & O \end{bmatrix} or a zero padding matrix \begin{bmatrix} I \\ O \end{bmatrix}.
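As a concrete instance of this factorization, using the 3 × 2 matrix Σ from Example 7.1.4:

\[
\Sigma = \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix}
       = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
         \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \end{bmatrix}
       = \begin{bmatrix} I \\ O \end{bmatrix} D
\]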


Exercises

1. For A = \begin{bmatrix} 6 & 2 \\ -7 & 6 \end{bmatrix} we have the SVD

A = UΣV^T = \begin{bmatrix} 1/√5 & 2/√5 \\ -2/√5 & 1/√5 \end{bmatrix} \begin{bmatrix} 10 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} 2/√5 & -1/√5 \\ 1/√5 & 2/√5 \end{bmatrix}

Plot the unit circle with the right singular vectors, then show the result of successively multiplying this circle by V^T, Σ, and U.

2. Let matrix A be the same as in question (1). Repeat the steps of question 1 for

(a) A^T
(b) A^{-1}

3. For A = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix} we have the SVD

A = UΣV^T = \begin{bmatrix} 2/√5 & -1/√5 \\ 1/√5 & 2/√5 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1/√5 & 2/√5 \\ -2/√5 & 1/√5 \end{bmatrix}

Plot the unit circle with the right singular vectors, then show the result of successively multiplying this circle by V^T, Σ, and U.

4. Let matrix A be the same as in question (3). Repeat the steps of question 1 for

(a) A^T
(b) A^{-1}

5. Let A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}. This matrix has the following SVD

A = UΣV^T = \begin{bmatrix} 1/√3 & 1/√6 & 1/√2 \\ 1/√3 & 1/√6 & -1/√2 \\ 1/√3 & -2/√6 & 0 \end{bmatrix} \begin{bmatrix} √6 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} √2/2 & √2/2 \\ -√2/2 & √2/2 \end{bmatrix}

(a) Describe the effect of multiplying the unit circle by A by looking at the effect of multiplying successively by each factor of the SVD.
(b) The unit circle gets transformed into a line segment in R^3 with what end points?
(c) What is a basis for Col A? How does this relate to the answer for (b)?
(d) What are the furthest points from the origin on the transformed unit circle? How far are these points from the origin? What does this have to do with the singular values of A?

6. Let A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}. This matrix has the following SVD

A = UΣV^T = \begin{bmatrix} √2/2 & -√2/2 \\ √2/2 & √2/2 \end{bmatrix} \begin{bmatrix} √6 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1/√3 & 1/√3 & 1/√3 \\ 1/√6 & 1/√6 & -2/√6 \\ 1/√2 & -1/√2 & 0 \end{bmatrix}

(a) Describe the effect of multiplying the unit sphere by A by looking at the effect of multiplying successively by each factor of the SVD.
(b) The unit sphere gets transformed into a line segment in R^2 with what end points?
(c) What is a basis for Col A? How does this relate to the answer for (b)?
(d) What are the furthest points from the origin on the transformed unit circle? How far are these points from the origin? What does this have to do with the singular values of A?

7. Let A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 1 \\ -1 & 0 & 0 \end{bmatrix}.

(a) Find the SVD of A.
(b) The unit sphere will be transformed into a filled ellipse in R^3. What is the equation of the plane containing this ellipse?
(c) What are the points on the ellipse that are furthest from the origin? What is the distance of these points from the origin?

8. Are the following statements TRUE or FALSE?

(a) A 2 × 2 matrix of rank 1 transforms the unit circle into a line segment in R^2.
(b) A 3 × 2 matrix of rank 1 transforms the unit circle into a line segment in R^3.
(c) A 2 × 2 matrix of rank 2 transforms the unit circle into an ellipse in R^2.
(d) A 3 × 2 matrix of rank 2 transforms the unit circle into an ellipse in R^3.
(e) A 3 × 3 matrix of rank 3 transforms the unit sphere into an ellipsoid in R^3.
(f) A 3 × 3 matrix of rank 1 transforms the unit sphere into a line segment in R^3.


Using MAPLE

Example 1

In this example we will use Maple to illustrate the geometry of the SVD in R^3. We will let

A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & -.5 \end{bmatrix}

and show the effects of multiplying the unit sphere by this matrix. We will use the following basic fact: the unit sphere can be plotted using the vector

v = \begin{bmatrix} cos(s) sin(t) \\ sin(s) sin(t) \\ cos(t) \end{bmatrix}

and letting the parameter s range over the interval [0, 2π] and the parameter t range over the interval [0, π]. We will in fact write a Maple procedure that will plot the top and bottom halves in different colors.

>showsphere:=proc(matr)
local A,v1,v2,p1,p2;
A:=matr:
v1:=<cos(s)*sin(t), sin(s)*sin(t), cos(t)>:
v2:=A.v1:
p1:=plot3d(v2,s=0..2*Pi,t=0..Pi/2,color=grey):
p2:=plot3d(v2,s=0..2*Pi,t=Pi/2..Pi,color=blue):
plots[display]([p1,p2],scaling=constrained,orientation=[100,70]);
end:

In this procedure, the input matr is assumed to be a 3 × 3 matrix and the procedure plots the unit sphere after being multiplied by the matrix.

Next we will enter matrix A and find the SVD.

>A:=<<1,0,1>|<0,1,0>|<1,1,-.5>>;
>U,S,Vt:=SingularValues(A,output=['U','S','Vt']):
>I3:=IdentityMatrix(3):

Next we will use the showsphere procedure above and apply the various transformations to a sphere. Now to plot the results we just have to enter the following:

>showsphere(I3); #### the original sphere

>showsphere(Vt); #### apply Vt

>showsphere(DiagonalMatrix(S).Vt); ### now apply S

>showsphere(U.DiagonalMatrix(S).Vt); ### and finally apply U

This gives Figures 7.12 - 7.15. Note that one of the singular values is .3099, which results in the sphere being flattened a lot in one direction. To see this it is a good idea to use the mouse to rotate the plots once they have been drawn in order to see them from different viewing angles.

Figure 7.12: The unit sphere.
Figure 7.13: Multiply by V^T.
Figure 7.14: Multiply by Σ.
Figure 7.15: Multiply by U.

Example 2

In this example we will let A = \begin{bmatrix} 1.2 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix}. This corresponds to a transformation from R^3 to R^2.

Finding the SVD with Maple we get:

>A:=<<1.2,1>|<1,-1>|<1,1>>;

>U,S,Vt:=SingularValues(A, output=['U','S','Vt']);

So we have Σ = \begin{bmatrix} 2.107 & 0 & 0 \\ 0 & 1.414 & 0 \end{bmatrix}, which involves scaling and truncation (dimension reduction). In this case we will have to modify our approach since after multiplying by Σ we will be in R^2. We will still use the plot3d command by adding a third component of zero, and choosing an appropriate viewing angle.

>S1:=DiagonalMatrix(S,2,3):

>v:=<cos(s)*sin(t),sin(s)*sin(t),cos(t)>:

>SV:=S1.Vt.v;

>USV:=U.S1.Vt.v;

We now have a slight problem when it comes to plotting. The vectors ΣV^T v and UΣV^T v are vectors in R^2 using 2 parameters. Maple doesn't have a command for plotting in two dimensions with 2 parameters, so we will use a trick as shown below.

>showsphere(Vt);

>plot3d( [SV[1],SV[2],0],s=0..2*Pi,t=0..Pi,

orientation=[90,0],scaling=constrained);

>plot3d( [USV[1],USV[2],0],s=0..2*Pi,t=0..Pi,

orientation=[90,0],scaling=constrained);

This gives Figures 7.16 - 7.19. Multiplication by V^T gives, as expected, a rotation in R^3. Multiplication by Σ truncates the third coordinate and scales the result. This gives a filled ellipse in R^2. Multiplying by U rotates this ellipse in R^2. Notice that the plotting method we used makes it clear where the “north pole” and “south pole” of the original sphere have ended up. They are in the interior of the ellipse at the points

\begin{bmatrix} 1.2 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

and

\begin{bmatrix} 1.2 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}

Figure 7.16: The unit sphere.
Figure 7.17: Multiply by V^T.
Figure 7.18: Multiply by Σ.
Figure 7.19: Multiply by U.


7.3 The Singular Value Decomposition and the Pseudoinverse

Consider the matrix A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}. This matrix has no inverse, but the pseudoinverse as defined in Chapter 5 would be

A† = (A^T A)^{-1}A^T = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1/3 & 2/3 & -1/3 \\ 1/3 & -1/3 & 2/3 \end{bmatrix}

Now look at the SVD of A. From A^T A we get singular values of √3 and 1. Omitting the rest of the details we get

A = UΣV^T = \begin{bmatrix} 2/√6 & 0 & 1/√3 \\ 1/√6 & -1/√2 & -1/√3 \\ 1/√6 & 1/√2 & -1/√3 \end{bmatrix} \begin{bmatrix} √3 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1/√2 & 1/√2 \\ -1/√2 & 1/√2 \end{bmatrix}

Now suppose we ask ourselves why matrix A cannot be inverted. If we look at the SVD we see that A can be decomposed into three factors. Of those three both U and V are invertible (since they are orthogonal, their inverse is just their transpose), so the reason that A is not invertible must have something to do with Σ. What is the effect of Σ, the middle factor? It scales the first component by √3, and this scaling can be inverted (just divide the first component by √3). It scales the second component by 1, and again this scaling can be undone. There is a third effect of the Σ matrix: it takes vectors in R^2 and places them in R^3 by adding a 0 as a third component (zero padding). It is this last effect of Σ that lies behind the non-invertibility of A in that it changes the dimension of the vector. Every vector in R^2 gets transformed into a unique vector in R^3 by A, but the reverse is not true. Not every vector in R^3 has a pre-image in R^2, since the column space of A is two dimensional. It is precisely the vectors in R^3 that are not in the column space of A that do not have a pre-image in R^2.

So we have A = UΣV^T, and if each factor were invertible the inverse of A would be VΣ^{-1}U^T. This should be a 2 × 3 matrix which corresponds to a linear transformation from R^3 to R^2 that will undo the effects of matrix A. The problem is the middle term: the matrix Σ has no inverse. How close can we come to finding an inverse of Σ? To undo the effects of matrix A we want to do three things: scale the first component by 1/√3, scale the second component by 1, and chop off (truncate) the third component of an input vector in R^3. The matrix that would do this is

\begin{bmatrix} 1/√3 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}

and, for reasons that will become clear shortly, we will call this matrix Σ†. If we evaluate VΣ†U^T we get

\begin{bmatrix} 1/√2 & -1/√2 \\ 1/√2 & 1/√2 \end{bmatrix} \begin{bmatrix} 1/√3 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 2/√6 & 1/√6 & 1/√6 \\ 0 & -1/√2 & 1/√2 \\ 1/√3 & -1/√3 & -1/√3 \end{bmatrix} = \begin{bmatrix} 1/√6 & -1/√2 & 0 \\ 1/√6 & 1/√2 & 0 \end{bmatrix} \begin{bmatrix} 2/√6 & 1/√6 & 1/√6 \\ 0 & -1/√2 & 1/√2 \\ 1/√3 & -1/√3 & -1/√3 \end{bmatrix} = \begin{bmatrix} 1/3 & 2/3 & -1/3 \\ 1/3 & -1/3 & 2/3 \end{bmatrix}

In other words we get A†, the pseudoinverse of A.

Now, in general, when you find the SVD of an m × n matrix A = UΣV^T the matrix Σ will be an m × n matrix of the form \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} where D stands for a square diagonal matrix with all non-zero diagonal entries.


We will define the pseudoinverse of Σ to be the n × m matrix \begin{bmatrix} D^{-1} & 0 \\ 0 & 0 \end{bmatrix}. The matrix D^{-1} will undo the scalings of D.

The principle behind the pseudoinverse is essentially how we deal with Σ: invert all scalings, and undo any zero padding by a truncation and vice versa.

To clarify the point we are trying to make in this section, suppose A is an m × n matrix with linearly independent columns with the singular value decomposition A = UΣV^T. The pseudoinverse of A as defined in Chapter 5 would be

A† = (A^T A)^{-1}A^T
   = (VΣ^T U^T UΣV^T)^{-1} VΣ^T U^T
   = (VΣ^T ΣV^T)^{-1} VΣ^T U^T
   = \left( V \begin{bmatrix} σ_1² & & & \\ & σ_2² & & \\ & & ⋱ & \\ & & & σ_n² \end{bmatrix} V^T \right)^{-1} VΣ^T U^T
   = V \begin{bmatrix} 1/σ_1² & & & \\ & 1/σ_2² & & \\ & & ⋱ & \\ & & & 1/σ_n² \end{bmatrix} V^T V Σ^T U^T
   = V \begin{bmatrix} 1/σ_1² & & & \\ & 1/σ_2² & & \\ & & ⋱ & \\ & & & 1/σ_n² \end{bmatrix} \begin{bmatrix} σ_1 & & & & ⋯ \\ & σ_2 & & & ⋯ \\ & & ⋱ & & \\ & & & σ_n & ⋯ \end{bmatrix} U^T
   = V \begin{bmatrix} 1/σ_1 & & & & ⋯ \\ & 1/σ_2 & & & ⋯ \\ & & ⋱ & & \\ & & & 1/σ_n & ⋯ \end{bmatrix} U^T
   = V Σ† U^T

In other words, the pseudoinverse as defined in this section in terms of the singular value decomposition is consistent with our previous definition. But this new definition is more powerful because it is always defined. It is not restricted to matrices with linearly independent columns.
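The consistency of the two definitions can also be checked numerically. Here is a minimal Maple sketch (an illustration only) using the full-column-rank matrix from the start of this section; MatrixInverse is the LinearAlgebra routine for the ordinary inverse, and the second computation builds VΣ†U^T directly from the SVD.

>with(LinearAlgebra):
>A:=<<1.,1.,0.>|<1.,0.,1.>>:
>MatrixInverse(Transpose(A).A).Transpose(A);                            # (A^T A)^(-1) A^T
>U,S,Vt:=SingularValues(A,output=['U','S','Vt']):
>Transpose(Vt).DiagonalMatrix(map(x->1/x,S[1..2]),2,3).Transpose(U);    # V.Sigma^dagger.U^T, the same matrix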

Example 7.3.5

What is the pseudoinverse of A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}?

We have already found the SVD of this matrix in Example 7.1.3:

A = UΣV^T = \begin{bmatrix} √2/2 & -√2/2 \\ √2/2 & √2/2 \end{bmatrix} \begin{bmatrix} √10 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1/√5 & 2/√5 \\ -2/√5 & 1/√5 \end{bmatrix}


From the above discussion we have the pseudoinverse

A† = VΣ†U^T = \begin{bmatrix} 1/√5 & -2/√5 \\ 2/√5 & 1/√5 \end{bmatrix} \begin{bmatrix} 1/√10 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} √2/2 & √2/2 \\ -√2/2 & √2/2 \end{bmatrix} = \begin{bmatrix} 1/10 & 1/10 \\ 1/5 & 1/5 \end{bmatrix}

What happens if you multiply A by its pseudoinverse? Do you get the identity? No. Simple computation gives

AA† = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1/10 & 1/10 \\ 1/5 & 1/5 \end{bmatrix} = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}

and

A†A = \begin{bmatrix} 1/10 & 1/10 \\ 1/5 & 1/5 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1/5 & 2/5 \\ 2/5 & 4/5 \end{bmatrix}

Suppose we write the SVD of an m × n matrix A as

A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + σ_3 u_3 v_3^T + ⋯ + σ_r u_r v_r^T

where r is the number of non-zero singular values of A. Then the above comments mean that the pseudoinverse of A can be written as

A† = (1/σ_1) v_1 u_1^T + (1/σ_2) v_2 u_2^T + (1/σ_3) v_3 u_3^T + ⋯ + (1/σ_r) v_r u_r^T

Notice what happens when these two expressions are multiplied together. We leave it as a simple exercise to show that

AA† = u_1 u_1^T + u_2 u_2^T + u_3 u_3^T + ⋯ + u_r u_r^T

and

A†A = v_1 v_1^T + v_2 v_2^T + v_3 v_3^T + ⋯ + v_r v_r^T

These are just projectors onto Col U and Col V respectively⁵.

⁵We will soon see that Col U = Col A and Col V = Row A.

Example 7.3.6

Consider the matrix A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, which has the following SVD

A = \begin{bmatrix} √2/2 & -√2/2 \\ √2/2 & √2/2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} √2/2 & √2/2 \\ -√2/2 & √2/2 \end{bmatrix}

Now it should be obvious that A is not invertible; in fact, since the columns of A are not linearly independent you can't find the pseudoinverse of A from the formula (A^T A)^{-1}A^T. But why is A not invertible? What insights can the SVD give us into this question? We have A = UΣV^T, so it might seem that to invert A all we have to do is to invert each of the factors of the SVD and then reverse the order of multiplication. If we try this there is certainly no problem with U or V; since these matrices are orthogonal they are certainly invertible. But what about Σ? This is just a scaling matrix, and so it might seem that to invert it we just have to undo the scalings.


In particular the x coordinate is scaled by 2, so to undo that scaling we just have to multiply by 1/2. But the y value is scaled by 0 and that means all y values are mapped to 0, so there is no way to undo this scaling. That is one way of understanding why A is not invertible: one of the singular values is equal to 0, and a scaling by 0 cannot be inverted.

If we proceed as outlined above, the pseudoinverse of A should be given by VΣ†U^T, which gives

\begin{bmatrix} √2/2 & -√2/2 \\ √2/2 & √2/2 \end{bmatrix} \begin{bmatrix} 1/2 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} √2/2 & √2/2 \\ -√2/2 & √2/2 \end{bmatrix} = \begin{bmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{bmatrix}

Now suppose you had the following system of equations

x_1 + x_2 = 1
x_1 + x_2 = 3

This system is obviously inconsistent. The normal equations would be

2x_1 + 2x_2 = 4
2x_1 + 2x_2 = 4

The normal equations have infinitely many solutions, so the system we are looking at doesn't have a unique least squares solution. It has infinitely many least squares solutions.

Figure 7.20: The two solid parallel lines represent the inconsistent system. The dotted line represents the least-squares solutions to the system.

The normal equations imply that all the points on the line x_1 + x_2 = 2 would be least squares solutions. This is illustrated in Figure 7.20. If we write this system as Ax = b, suppose we tried to find a least squares solution by multiplying by the pseudoinverse found above. In this case we get

A†b = \begin{bmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

What is so special about this result? First of all it lies on the line x_1 + x_2 = 2, so it is a least squares solution. More than that, it is the least squares solution of minimum length (i.e., it is the least squares solution that is closest to the origin).


Although we won't prove it, what happened in this example will always happen. If A has linearly independent columns then A†b will give the unique least squares solution to the system Ax = b. If the columns of A are linearly dependent then the system will have many least squares solutions, and A†b will give the least squares solution to the system of minimum norm.
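A small Maple sketch of this last point, using the inconsistent system above (an illustration only; the pseudoinverse is built from the SVD exactly as described in this section, with the zero singular value left uninverted):

>with(LinearAlgebra):
>A:=<<1.,1.>|<1.,1.>>: b:=<1.,3.>:
>U,S,Vt:=SingularValues(A,output=['U','S','Vt']):
>Adag:=Transpose(Vt).DiagonalMatrix(<1/S[1],0>).Transpose(U):   # V.Sigma^dagger.U^T
>Adag.b;                                                         # the minimum-norm least squares solution (1, 1)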

You should see the pseudoinverse as a generalization of the idea of a matrix inverse. The following points should clarify this.

• If A is square with independent columns then A is invertible and the pseudoinverse of A would be the same as the inverse. That is,

A† = A^{-1}

In this case, a linear system Ax = b would have the unique solution A^{-1}b.

• If A is not square but has linearly independent columns then A is not invertible but A does have a pseudoinverse. The pseudoinverse can be computed as

A† = (A^T A)^{-1}A^T

In this case A†b gives the unique least-squares solution to Ax = b.

• If A does not have linearly independent columns then the pseudoinverse can be computed using the SVD. In this case A†b gives the least-squares solution of minimum norm to the system Ax = b.


Exercises

1. Suppose you are given the following SVD of A

A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/√2 & -1/√2 \\ 0 & 1/√2 & 1/√2 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 2/√5 & -1/√5 \\ 1/√5 & 2/√5 \end{bmatrix}

What is A†?

2. Suppose

A = 3 \begin{bmatrix} 1/√2 \\ 1/√2 \\ 0 \end{bmatrix} \begin{bmatrix} 2/3 & 1/3 & 2/3 \end{bmatrix}

What is A†?

3. Suppose

A = 3 \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}

What is A†?

4. Use the SVD to find the pseudoinverse of

(a) \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}   (b) \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}   (c) \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}

5. Find the pseudoinverse of

(a) \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}   (b) \begin{bmatrix} 1 & 0 & 0 \\ 2 & 0 & 0 \\ 3 & 0 & 0 \end{bmatrix}   (c) \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 2 \\ 0 & 0 & 3 \end{bmatrix}

6. Let Σ = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \end{bmatrix}. Evaluate ΣΣ† and Σ†Σ.

7. Let Σ = \begin{bmatrix} 6 & 0 \\ 0 & 4 \\ 0 & 0 \end{bmatrix}. Evaluate ΣΣ† and Σ†Σ.

8. Let Σ = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. Evaluate ΣΣ† and Σ†Σ.

9. Use the pseudoinverse to find a least squares solution to the following system:

x_1 + x_2 + x_3 = 0
x_1 + x_2 + x_3 = 6

10. The system

x_1 + 2x_2 + x_3 = 3
x_2 − x_3 = 0

is consistent and has infinitely many solutions.

(a) Find an expression for the general solution of this system.
(b) Find an expression for the magnitude squared of the general solution and use calculus to determine the smallest possible value of the magnitude squared.
(c) If you write this system as Ax = b, evaluate A†b. How does this relate to the answer from (b)?

11. (a) What is the pseudoinverse of any n × 1 matrix A = \begin{bmatrix} a_1 \\ a_2 \\ ⋮ \\ a_n \end{bmatrix}? (Hint: use the fact that this matrix has rank 1.)

(b) What is the pseudoinverse of any 1 × n matrix A = \begin{bmatrix} a_1 & a_2 & ⋯ & a_n \end{bmatrix}?

12. Let A be an m × n matrix of rank r with SVD A = UΣV^T.

(a) What is A†u_i for 1 ≤ i ≤ r?
(b) What is A†u_i for r + 1 ≤ i ≤ m?

13. If A has orthonormal columns what is A†?

14. Show that if A is an invertible matrix then A† = A^{-1}.

15. Show that A†A and AA† are symmetric.

16. Show that AA†A = A and A†AA† = A†. (Note: this result along with the previous problem shows that AA† and A†A are projectors.)


Using MAPLE

Example 1.

In this example we will use Maple to find the least-squares solution to an overdetermined system with the pseudoinverse. Our system of equations will represent an attempt to write e^x as a linear combination of 1, x, x², and x³. We will convert this into a discrete problem by sampling these functions 41 times on the interval [-2, 2].

>f:=x->exp(x):

>g[1]:=x->1:

>g[2]:=x->x:

>g[3]:=x->x^2:

>g[4]:=x->x^3:

>xvals:=Vector(41,i->-2+.1*(i-1)):

>u:=map(f,xvals):

>for i to 4 do v[i]:=map(g[i],xvals) od:

We will now try to write u as a linear combination of the v_i. Now vector u is a discrete approximation to e^x and vectors v_1, v_2, v_3, and v_4 are approximations to 1, x, x² and x³, so our problem is the discrete version of trying to write e^x as a cubic polynomial. Setting up this problem will result in an inconsistent system of 41 equations in 4 unknowns.

In the following Maple commands we compute the pseudoinverse from the fact that if

A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ⋯ + σ_r u_r v_r^T

then

A† = (1/σ_1) v_1 u_1^T + (1/σ_2) v_2 u_2^T + ⋯ + (1/σ_r) v_r u_r^T

>A:=<v[1]|v[2]|v[3]|v[4]>:

>U,S,Vt:=SingularValues(A,output=['U','S','Vt']);
>pinvA:=eval( add( 1/S[i] * Column(Vt^%T,i) . Row(U^%T,i), i=1..4)):

>soln:=pinvA.u;

soln = [.92685821055486, .9606063839232, .6682692476746, .209303723666]

>p1:=add(soln[i]*x^(i-1),i=1..4);

>p2:=1+x+1/2*x^2+1/6*x^3;

>plot([exp(x),p1,p2],x=-2..2,color=[black,red,blue]);

The resulting plot is shown in Figure 7.21. By looking at the graphs it appears that by using the weights computed above we get a better approximation to e^x than the Taylor polynomial. We can quantify this a bit more clearly as follows:

>int((exp(x)-p1)^2,x=-2..2);

.015882522833790110294

>int((exp(x)-p2)^2,x=-2..2);

.27651433188568219389

These values show that p1 is closer to e^x than p2.

Figure 7.21: The plot of e^x, p1, and p2.

Example 2.

In this example we will illustrate how to write a Maple procedure that will compute the pseudoinverse of a matrix. We will call our procedure pinv. We will first give the procedure and make some comments afterwards. When you are entering the procedure you should end each line (until you are finished) with SHIFT-ENTER rather than ENTER. This prevents a new prompt from appearing on each line.

>pinv:=proc(A)
local U,S,Vt,sv2,i:
U,S,Vt:=SingularValues(A,output=['U','S','Vt']);
sv2:=select(x->x>10^(-8),S);
eval(add( 1/sv2[i]*Column(Vt^%T,i).Row(U^%T,i),i=1..Dimension(sv2)));
end;

• The first line gives the name of the procedure and indicates that the procedure will require one input parameter. The A in this line is a dummy variable; it stands for whatever matrix is input to the procedure.

• The second line lists the local variables used in the procedure. These are basically all the symbols used within the procedure.

• The third line computes the SVD of the input matrix.

• The fourth line is a bit tricky. Some of the singular values from the previous line could be zero. We just want the non-zero singular values. But, unfortunately, due to rounding errors sometimes singular values that should be 0 turn out to be small non-zero decimals. This line selects all the singular values that are greater than 10^{-8}. Even if a singular value is not zero but very small, its reciprocal will be very large and this can result in numerical instability in the computation.

• The fifth line computes the pseudoinverse as

\sum (1/σ_i) v_i u_i^T

for the non-zero singular values, or at least for the singular values greater than our cut-off value.


• The last line indicates that the procedure is finished. You can now use the pinv command to find the pseudoinverse of any (numerical) matrix.

For example:

>M:=<<1,5>|<2,6>|<3,7>|<4,8>>;
>pinv(M);

This returns the matrix:

\begin{bmatrix} -0.5500000002 & 0.2500000001 \\ -0.2250000001 & 0.1250000000 \\ 0.1000000000 & -1.0 × 10^{-11} \\ 0.4250000002 & -0.1250000001 \end{bmatrix}


7.4 The SVD and the Fundamental Subspaces of a Matrix

Suppose A is an m × n matrix of rank r with the following SVD

A = \begin{bmatrix} u_1 & ⋯ & u_r & u_{r+1} & ⋯ & u_m \end{bmatrix} \begin{bmatrix} σ_1 & & & & \\ & ⋱ & & & \\ & & σ_r & & \\ & & & 0 & \\ & & & & ⋱ \\ & & & & & 0 \end{bmatrix} \begin{bmatrix} v_1 & ⋯ & v_r & v_{r+1} & ⋯ & v_n \end{bmatrix}^T

which can be written as

A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ⋯ + σ_r u_r v_r^T

Now since A is m × n with rank r it follows that Nul A has dimension n − r. If we look at the product Av_k where k > r then we have

Av_k = (σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ⋯ + σ_r u_r v_r^T) v_k = 0

since the columns of V are orthogonal. It follows then that {v_{r+1}, ..., v_n} is an orthonormal basis of Nul A. Since the row space of A is the orthogonal complement of the null space, it then follows that {v_1, ..., v_r} is an orthonormal basis of Row A.

If we apply the above argument to A^T we then get that {u_{r+1}, ..., u_m} is an orthonormal basis of Nul A^T, and {u_1, ..., u_r} is an orthonormal basis for Row A^T (which is the same as Col A).

Given any matrix A, the four fundamental subspaces of A are: Col A, Nul A, Col A^T, and Nul A^T. So the SVD of A gives orthonormal bases for each of these subspaces.

The SVD also gives us projectors onto these four fundamental subspaces.

• AA† projects onto Col A.
• A†A projects onto Row A.
• I − AA† projects onto Nul A^T.
• I − A†A projects onto Nul A.

(A short Maple sketch of these four projectors is given below.)
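The following Maple lines sketch how the four projectors can be computed numerically; this is an illustration only, building the pseudoinverse from the SVD as in the pinv procedure of the previous Maple section, and using the rank 1 matrix of Example 7.4.7 below.

>with(LinearAlgebra):
>A:=<<1.,1.,-1.,-1.>|<1.,1.,-1.,-1.>|<1.,1.,-1.,-1.>>:
>U,S,Vt:=SingularValues(A,output=['U','S','Vt']):
>Adag:=add(1/S[i]*Column(Vt^%T,i).Row(U^%T,i),i=1..1):   # only the one non-zero singular value is used
>A.Adag;                                                  # projector onto Col A
>Adag.A;                                                  # projector onto Row A
>IdentityMatrix(4)-A.Adag;                                # projector onto Nul A^T
>IdentityMatrix(3)-Adag.A;                                # projector onto Nul A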

The following may help clarify some of the above comments:

• If A is an n × n matrix with linearly independent columns then A is invertible and

A^{-1}A = I
AA^{-1} = I

In this case we have Col A = Row A = R^n and Nul A = Nul A^T = {0}.

• If A is not square but has linearly independent columns then A has a pseudoinverse and

A†A = I
AA† = the projector onto Col A

• If the columns of A are not linearly independent then A has a pseudoinverse and

A†A = the projector onto Row A
AA† = the projector onto Col A


Example 7.4.7

Let

A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ -1 & -1 & -1 \\ -1 & -1 & -1 \end{bmatrix}

It should be perfectly clear that A has rank 1, that \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} is a basis for the row space of A, and that \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix} is a basis for the column space of A. If we find the SVD we get

V = \begin{bmatrix} 1/√3 & -1/√2 & 1/√6 \\ 1/√3 & 0 & -2/√6 \\ 1/√3 & 1/√2 & 1/√6 \end{bmatrix}

The first column is a unit vector that is a basis of Row A. Because the columns are orthonormal, the second and third columns form a basis for the plane orthogonal to the row space, and that is precisely Nul A. We also have

U = \begin{bmatrix} 1/2 & 1/√2 & 1/√6 & 1/√12 \\ 1/2 & 0 & 0 & -3/√12 \\ -1/2 & 1/√2 & -1/√6 & -1/√12 \\ -1/2 & 0 & 2/√6 & -1/√12 \end{bmatrix}

Again it should be easy to see that the first column is a unit vector that is a basis for Col A, and so the remaining columns must be an orthonormal basis of Nul A^T.

Example 7.4.8

L et A =

0 1 01 0 20 2 0

. Find the matrix that projects vectors orthogonally onto Col A.

One way of doing this would be to find an explicit orthonormal basis for the columnspace. In this particular case this is easy because it is clear that the first two columnsform an orthogonal basis for the column space. If we normalize these columns then wecan compute the projector as

0 1/√

51 0

0 2/√

5

0 1/√

51 0

0 2/√

5

T

=

1/5 0 2/50 1 0

2/5 0 4/5

(If you look at this projector it should be clear that it has rank 2. You should rememberthat this corresponds to the fact that it projects vectors onto a 2 dimensional subspace.)Another way of finding the projector is by the SVD. In this case the SVD would be givenby

UΣV^T =
[ 0   1/√5   −2/√5 ] [ √5   0   0 ] [ 1/√5   0   −2/√5 ]^T
[ 1    0       0   ] [  0  √5   0 ] [  0     1     0   ]
[ 0   2/√5    1/√5 ] [  0   0   0 ] [ 2/√5   0    1/√5 ]


The pseudoinverse is then given by

A† = V Σ† U^T =
[  0    1/5    0  ]
[ 1/5    0    2/5 ]
[  0    2/5    0  ]

The projector onto Col A will then be

AA† =
[ 1/5   0   2/5 ]
[  0    1    0  ]
[ 2/5   0   4/5 ]

Note that in this case the first method seems simpler because it was very easy to find an orthonormal basis for the column space. The second method has an advantage in that it allows you to define the projector strictly in terms of the matrix A regardless of the size of A.


Exercises

1. Let A =
[ 1  1  0 ]
[ 1  1  0 ]
[ 0  0  1 ]

(a) Find the SVD of A.

(b) Find a basis for Col A and Row A.

(c) Find a basis for Nul A and Nul A^T.

(d) Evaluate A†A and AA†.

2. Let A =
[ 1  0 ]
[ 1  0 ]
[ 1  0 ]
[ 1  2 ]

(a) Find the SVD of A.

(b) Find a basis for Col A and Row A.

(c) Find a basis for Nul A and Nul A^T.

(d) Evaluate A†A and AA†.


7.5 The SVD and Statistics

There are deep connections between linear algebra and statistics. In this section we want to take a brief look at the relationship between the SVD of a matrix and several statistical concepts.

Suppose a series of measurements results in several lists of related data. For example, in a studyof plant growth biologists might collect data about the temperature, the acidity of the soil, theheight of the plants, and the surface area of the leaves. The data collected can be arranged in theform of a matrix called the matrix of observations. Each parameter that is measured can bearranged along one row of the matrix, so an m×n matrix of observations consists of n observations(i.e., measurements) of m different parameters.

Let X = [X1 X2 · · · Xn] be an m × n matrix of observations. The sample mean, M, is given by

M = (1/n)(X1 + X2 + · · · + Xn)

If we define X̂k = Xk − M then the matrix

B = [ X̂1 X̂2 · · · X̂n ]

is said to represent the data in mean-deviation form. The covariance matrix, S, is defined to be

S = (1/(n − 1)) B B^T

As an example suppose we measured the weights and heights of 10 individuals and got the resultsshown in the following table.

weight (kg)   23.1   16.2   18.4   24.2   12.4   20.0   25.2   11.1   19.3   25.1
height (m)    1.10    .92    .98   1.24    .86    .99   1.21    .75   1.00   1.35

This would give a 2 × 10 matrix of observations. Each observation would involve the measurementof 2 parameters.

The sample mean is the vector whose entries are the average weight and average height. Computing these averages we get

M =
[ 19.5 ]
[ 1.04 ]

If we now subtract this mean from each of the observations we get the data in mean-deviation form

B =
[ 3.6   −3.3   −1.1   4.7   −7.1    .5    5.7   −8.4   −.2    5.6 ]
[  .06  −.12   −.06    .20  −.18   −.05    .17  −.29   −.04    .31 ]

If we look at each column of the matrix of observations as a point in R2 we can plot these points

in what is called a scatter plot. Figure 7.22 is a scatter plot of our data. In this plot the samplemean is also plotted as a cross, it is located at the “center” of the data. For comparison, Figure7.23 is a plot of the data in mean deviation form. The only difference is that the data has beenshifted so that it is now centered around the origin. In mean deviation form the sample mean willbe the origin. The entries in matrix B indicate how much above or below average each value lies.

The covariance matrix would be

S = (1/9) B B^T =
[ 25.807   .891 ]
[   .891   .034 ]
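As a sketch (this is our own code, not the text's), the covariance matrix above could be reproduced in Maple roughly as follows; the names wt, ht, X, B and S are ours.

>with(LinearAlgebra):
>wt:=<23.1,16.2,18.4,24.2,12.4,20.0,25.2,11.1,19.3,25.1>:     ### the weights
>ht:=<1.10,.92,.98,1.24,.86,.99,1.21,.75,1.00,1.35>:          ### the heights
>X:=Transpose(<wt|ht>):                                        ### the 2 x 10 matrix of observations
>M:=<add(wt[i],i=1..10)/10,add(ht[i],i=1..10)/10>;             ### the sample mean
>B:=Matrix(2,10,(i,j)->X[i,j]-M[i]):                           ### the data in mean-deviation form
>S:=1/9*B.Transpose(B);                                        ### the covariance matrix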


Figure 7.22: Scatter plot of original data.

Figure 7.23: Data in mean-deviation form.

The entries down the diagonal of the covariance matrix represent the variance of the data. Inparticular, the diagonal entry sjj of matrix S is the variance of the jth parameter.

So, in the above example, 25.807 is the variance of the weight and .034 is the variance of the height.

The variance can be interpreted as a measure of the spread of the values of a certain parameter around the average value. For example, the average of 9 and 11 is 10, but the average of −100 and 120 is also 10. The difference is that the first pair of numbers lie much closer to 10 than the second pair, i.e., the variance of the first pair is much less than the variance of the second pair.

The total variance of the data is the sum of all the separate variances. That is, the totalvariance of the data is the sum of the diagonal entries of S (this is also the trace of S).

Each off-diagonal entry of matrix S, sij for i ≠ j, is called the covariance between parameters xi and xj of the data matrix X. Notice that the covariance matrix is symmetric so sij = sji. If the covariance is 0 it is said that the corresponding parameters are uncorrelated.

Principal Components

The covariance matrix is symmetric and positive semi-definite so, as we've seen before, it can be diagonalized. To diagonalize S we would find the (non-negative) eigenvalues and then the corresponding eigenvectors. The eigenvectors of S determine a set of orthogonal lines. If ui is one of these eigenvectors then the vector B^T ui is called a principal component of the data. The principal component corresponding to the largest eigenvalue is called the first principal component. The second principal component corresponds to the second largest eigenvalue, and so on.

For our earlier example of weights and heights we would get the following eigenvalues and unit eigenvectors

λ1 = 25.838,   λ2 = .0032

u1 = [ .9994, .0345 ]^T,   u2 = [ −.0345, .9994 ]^T

The first principal component would then be

B^T u1 = [ 3.600  −3.302  −1.101  4.704  −7.102  .498  5.702  −8.405  −.201  5.607 ]^T

The second principal component would be

B^T u2 = [ −.064  −.006  −.022  .038  .065  −.067  −.027  −.000  −.033  .117 ]^T


Now this might seem confusing but all that is going on is a change of basis. We have our data inmean-deviation form and we are converting it to our eigenbasis6 which has been ordered accordingto the size of the eigenvalues. The first principal component is a vector that contains all the firstcoordinates of our data points relative to the eigenbasis. The second principal component is madeup of the second coordinates of our data points relative to the eigenbasis.

In the above example the entries in the second principal component are fairly small. This meansthat most of the data points lie very near the first eigenspace. That is, relative to the eigenbasis ourdata is approximately 1-dimensional. This is connected to the relative sizes of the eigenvalues. Thesum of the eigenvalues of S will equal the total variance.

In the following plot we see the data in mean-deviation form and the eigenspace of S correspond-ing to the first principal component (λ1 = 25.8374).

Figure 7.24: The data in mean-deviation form and the first principal component

The line through the origin along the principal component has slope .0345. This would have equation h̄ = .0345w̄ where w̄ and h̄ are the weight and height in mean-deviation form. Is this just the least-squares line? No7, the significance of this line and how it relates to the least-squares line will be explained in the next section.

6 The eigenbasis consists of the eigenvectors of S, which are the right singular vectors of X.
7 The least-squares line would be w̄ = .0383h̄. You should try deriving this equation for a bit of review.


Exercises

1. Given the following data points (in mean-deviation form)

x   −2   −1    1    2
y   −3    0    2    1

(a) Find the least-squares line for this data.

(b) Find the total least-squares line for this data.

(c) Plot the data points and the two lines from (a) and (b) on the same set of axes.

(d) Consider the line y = x. Find the square root of the sum of the squares of the verticaldistances of the data points to this line. Find the square root of the sum of the squares ofthe perpendicular distances of the data points to this line.

2. Given the following data points (in mean-deviation form)

x   −2   −1    0    1    2
y    1    1    0   −2    0

(a) Find the least-squares line for this data.

(b) Find the total least-squares line for this data.

(c) Plot the data points and the two lines from (a) and (b) on the same set of axes.

(d) Consider the line y = x. Find the square root of the sum of the squares of the verticaldistances of the data points to this line. Find the square root of the sum of the squares ofthe perpendicular distances of the data points to this line.

3. Given the following data points

x   1   3   5
y   3   1   2

(a) Find the least-squares line for this data.

(b) Find the total least-squares line for this data.

4. Let A =
[ 3  4  1 ]
[ 1  2  5 ]
be a data matrix.

(a) Convert A to mean-deviation form.

(b) Find the covariance matrix.

(c) Find the principal components.

(d) What fraction of the total variance is due to the first principal component?

5. Let A =
[ 1   1   2   2   1   2   1   2 ]
[ 3   5   7   9  11  13  15  17 ]
[ 1  −1   1  −1   1  −1   1  −1 ]
be a data matrix.

(a) Convert A to mean-deviation form.

(b) Find the covariance matrix.

(c) Find the principal components.

(d) What fraction of the total variance is due to the first principal component?


Using MAPLE

Example 1.

We will use Maple to illustrate the idea of principal components.We begin by generating 200 points using one of the random number routines in Maple .

>with(stats[random]):

>xv:=[seq( normald(),i=1..200)]: ### the x coordinates

>yv:=[seq(.9*xv[i]+normald(),i=1..200)]: ### the y coordinates

>mx:=add(xv[i],i=1..200)/200: ### the average x value

>my:=add(yv[i],i=1..200)/200: ### the average y value

>mxv:=[seq(xv[i]-mx,i=1..200)]: ### x in mean deviation form

>myv:=[seq(yv[i]-my,i=1..200)]: ### y in mean deviation form

>data:=[seq( [mxv[i],myv[i]],i=1..200)]:

>p1:=plot(data,style=point,color=black):

>B:=<convert(mxv,Vector)|convert(myv,Vector)>;

>M:=1/199*B^%T.B;

[ 1.084   .9079 ]
[  .9079  1.790 ]

>SingularValues(M,output=[’U’,’S’,’Vt’]);

[ 2.4115, .4631 ]

[  .5646   .8254 ]
[  .8254  −.5646 ]

The first row of Vt gives the first principal component. We will compute the corresponding slope.

>m1:=Vt[1,2]/Vt[1,1]:

>p2:=plot([m1*x,-1/m1*x],x=-3..3,thickness=2,color=black):

>plots[display]([p1,p2],scaling=constrained);

Figure 7.25: The 200 data points and the principal components.


This gives Figure 7.25. We have a cloud of data points centered at the origin. These points lie ina roughly elliptical region. The principal components correspond to the axes of that ellipse.

Example 2.

In this example we will begin by generating 30 data points and then put the data in mean-deviation form. The steps are similar to the first example.

>xv:=[seq(normald(),i=1..30)]:yv:=[seq(.5*normald()+.6*xv[i], i=1..30)]:

>mx:=add(xv[i],i=1..30)/30:

>my:=add(yv[i],i=1..30)/30:

>mxv:=convert([seq(xv[i]-mx,i=1..30)],Vector):

>myv:=convert([seq(yv[i]-my,i=1..30)],Vector):

>data:=[seq( [mxv[i],myv[i]],i=1..30)]:

We now have a collection of points centered at the origin. Look at any straight line drawn through the origin at angle θ (the slope of this line would be tan θ). We will find the sum of the squares of the orthogonal distances to this line and the sum of the squares of the vertical distances to this line.

A unit vector in the direction of the line would be [ cos(θ), sin(θ) ]^T. A unit vector normal to the line would be [ −sin(θ), cos(θ) ]^T. The orthogonal distance from a point [ xi, yi ]^T to this line is the length of the projection onto the normal vector, and this would be | −xi sin(θ) + yi cos(θ) |. The vertical distance from [ xi, yi ]^T to this line would be | yi − xi tan(θ) |.

We will use Maple to compute the sum of the squares of these distances and plot the results as functions of θ. We will call the sum of the squares of the orthogonal distances D1, and the sum of the squares of the vertical distances D2.

>D1:=expand(add( (-mxv[i]*sin(t)+myv[i]*cos(t))^2,i=1..30));

D1 = 19.42397 cos²(θ) + 33.01234 sin²(θ) − 40.59837 sin(θ) cos(θ)

>D2:=expand(add( ( myv[i]-mxv[i]*tan(t) )^2,i=1..30));

D2 = 33.01234 tan²(θ) − 40.59837 tan(θ) + 19.42397

>plot( [ D1, D2 ], t=-Pi/2..Pi/2, 0..60, thickness=2);

The plots of D1 and D2 are shown in Figure 7.26. The plot shows that both of these functions take on a minimum at around θ = .5. Using Maple we can find where these minima occur. We will find the derivatives (using the diff command), and find the critical values.

>fsolve(diff(D1,t)=0,t=0..1);

.62391

>fsolve(diff(D2,t)=0,t=0..1);

.55130

So the line which minimizes the sum of the squares of the orthogonal distances would lie at an angle of .62391 radians. For vertical distances the minimum would occur when the line lies at an angle of .55130 radians.


Figure 7.26: The plot of D1 and D2

Now the line which minimizes the sum of the squares of the vertical distances would be the least-squares line. If our x coordinates are in a vector x and our y coordinates are in a vector y then the least-squares line through the origin fitting these points would have slope

(y · x)/(x · x)

To find the angle at which this line lies we then apply the inverse tangent. In Maple we have

>arctan(DotProduct(mxv,myv)/DotProduct(mxv,mxv));

.55130

>VectorAngle(mxv,myv); ### an easier way

Amazing! This is the same result that we obtained above using calculus to determine the minimum value of D2.

Now what about the minimum of D1? How do we find this using linear algebra? The minimum line here will be the eigenspace of the covariance matrix corresponding to the largest eigenvalue.

>B:=<mxv|myv>:

>S:=1/29*B^%T.B;

>U,S,Vt:=SingularValues(S,output=['U','S','Vt']);

The line we are looking for is determined by the first row of Vt. We will find the slope of this line and then apply the inverse tangent.

>arctan(Vt[1,2]/Vt[1,1]);

.62391

This agrees with the previous result obtained using calculus.


7.6 Total Least Squares

Suppose you want to find the straight line that gives the best fit to a collection of data. Oneapproach, as we saw earlier, is to find the least squares line. The assumption of this approach isthat all the error of the data is located in the y values8. In some cases this assumption is valid, butin many cases it will turn out that there are errors of measurement in both the x and y values.

Suppose we have a data matrix X = [ x1 x2 · · · xn ] where each column is a point in R² and the data is already in mean-deviation form. Let u be a unit vector, and let x = tu be the line through the origin in the direction of u. We can find the square of the orthogonal distance of each point, xi, to the line x = tu by using the projector I − uu^T.

‖(I − uu^T)xi‖² = xi^T (I − uu^T)(I − uu^T) xi = xi^T (I − uu^T) xi

The sum of all such squared distances is therefore

∑_{i=1}^{n} xi^T (I − uu^T) xi = ∑_{i=1}^{n} ‖xi‖² − ∑_{i=1}^{n} xi^T u u^T xi

Look at the expression on the right hand side of the above equation. This represents the value thatwe want to minimize as the difference of two sums. If we wish to find the vector u which minimizesthis quantity we must maximize the second sum. This is because the value of the first sum is fixedby the given data points so we want to subtract as much as possible from this sum. But the secondsum is

∑ xi^T u u^T xi = ∑ u^T xi xi^T u = u^T ( ∑ xi xi^T ) u = u^T X X^T u

and this can be seen as a quadratic form with the unknown u and the maximum will be taken onwhen u is a unit eigenvector corresponding to the largest eigenvalue of the matrix XXT . Finally,this is just the first principal component of X .

As an example suppose we have the following data values:

x   1.1   1.2   1.3   1.8   1.9   2.1   2.3   2.4   2.5   3.0   3.3   3.8
y    .2    .4    .4    .6    .7    .9    .8   1.0   1.3   1.1   2.5   2.8

We can put this data in mean-deviation form and then find the least squares line and total leastsquares line. We will outline the steps.

First find the average x and y values.

(∑_{i=1}^{12} xi)/12 = 2.225        (∑_{i=1}^{12} yi)/12 = 1.0583

8When we find the least squares line by solving a system of the form Xβ = y we remove the error from y byprojecting y onto the column space of X. The column space of X is determined by the x coordinates of the datapoints. If there are errors in these x values then this method would be flawed.


We subtract these averages from the x and y values to put the data in mean-deviation form and create matrix X.

X =
[ −1.125   −1.025   −0.925   −0.425   −0.325   −0.125    0.075    0.175    0.275    0.775    1.075    1.575 ]
[ −0.8583  −0.6583  −0.6583  −0.4583  −0.3583  −0.1583  −0.2583  −0.0583   0.2417   0.0417   1.4417   1.7417 ]

This gives

XX^T =
[ 7.82250   6.94250 ]
[ 6.94250   7.21791 ]

This matrix has eigenvalues of 14.4693 and .5711. A basis for the eigenspace corresponding to the largest eigenvalue is [ .72232, .69156 ]^T. This eigenspace would be a line whose slope is

.69156/.72232 = .95741

If we find the least squares line through these points in the usual way we would get a line whose slope is .88750.
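A rough Maple sketch of this comparison (our own code, not the text's; xv, yv, xm, ym and X are made-up names):

>with(LinearAlgebra):
>xv:=<1.1,1.2,1.3,1.8,1.9,2.1,2.3,2.4,2.5,3.0,3.3,3.8>:
>yv:=<.2,.4,.4,.6,.7,.9,.8,1.0,1.3,1.1,2.5,2.8>:
>mx:=add(xv[i],i=1..12)/12: my:=add(yv[i],i=1..12)/12:
>xm:=Vector(12,i->xv[i]-mx): ym:=Vector(12,i->yv[i]-my):    ### mean-deviation form
>X:=Transpose(<xm|ym>):                                      ### the 2 x 12 data matrix
>U,sig,Vt:=SingularValues(evalf(X),output=['U','S','Vt']):
>U[2,1]/U[1,1];                                              ### slope of the total least squares line
>DotProduct(xm,ym)/DotProduct(xm,xm);                        ### slope of the least squares line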

If we plot the data and the lines we obtain Figure 7.27.

Figure 7.27: Comparison of the total least squares line (TLS) and the least squares line (LS).


Exercises

1. Find the pseudoinverse of

A =
[  1   2 ]
[ −1  −2 ]

and use this pseudoinverse to find the least-squares solution of the system Ax = [ 1, 0 ]^T.

2. Find the pseudoinverse of

A =
[ 1   2 ]
[ 2   1 ]
[ 1  −1 ]

and use this pseudoinverse to find the least-squares solution of the system Ax = [ 1, 1, 1 ]^T.

3. Find the least-squares line and the total least-squares line for the following data points. Plot both lines on the same set of axes.

x   0   1   2
y   0   0   1

4. Find the least-squares line and the total least-squares line for the following data points. Plot both lines on the same set of axes.

x   −1   0   1   2
y    0   0   1   1


Using MAPLE

Example 1.

We will use Maple to illustrate another use of the SVD - data compression. We will begin by defininga 30 × 30 matrix of data.

>with(plots):

>f:=(x,y)-> if abs(x)+abs(y)<1 then 1 else 0 fi:

>A:=Matrix(30,30,(i,j)->f((i-15)/12,(j-15)/12)-f((i-15)/4,(j-15)/4)):

>matrixplot(A);

Figure 7.28: Matrixplot of A.

There is nothing special about this matrix other than the fact that it generates a nice picture.The matrixplot command in Maple generates a 3 dimensional image where the values in the matrixcorrespond to the heights of a surface. This gives us a nice way of visualizing the data in the matrix.

Now we will find the singular value decomposition of A and we will define new matrices

B1 = σ1 u1 v1^T
B2 = σ1 u1 v1^T + σ2 u2 v2^T
B3 = σ1 u1 v1^T + σ2 u2 v2^T + σ3 u3 v3^T
...

and plot them. The second line of Maple code below is a bit tricky but it just corresponds to the formula

Bi = ∑_{j=1}^{i} σj uj vj^T

>U,S,Vt:=SingularValues(A,output=[’U’,’S’,’Vt’]);

>for i to 12 do
B[i]:=value(add(S[j]*Column(U,j).Row(Vt,j), j=1..i)) od:

>matrixplot(B[1]);

>matrixplot(B[2]);

>matrixplot(B[3]);

>matrixplot(B[10]);


Figure 7.29: Matrixplot of B1.
Figure 7.30: Matrixplot of B2.
Figure 7.31: Matrixplot of B3.
Figure 7.32: Matrixplot of B10.

Now matrix A contains 900 entries. Matrix B1 was computed from only 61 numbers - σ1, the 30 entries in u1 and the 30 entries in v1. With less than 7% of the original amount of information we were able to reconstruct a poor approximation to A - the plot of A is not recognizable from the plot of B1. With matrix B2 we use 122 numbers, about 14% of the original amount of data. The plot of B2 is beginning to reveal the basic 3d structure of the original data. By adding more and more of the components of the SVD we can get closer and closer to A, and the plot of B10 is very close to the plot of A. If you look at the singular values they become very small after the first 12, so all the components after σ12 should contribute very little to the reconstruction of A. To construct B12 we need 732 numbers, which is about 80% of the amount of data in A. The point is that if we wanted to transmit the information in matrix A to another location we could save time by sending the 732 numbers needed to reconstruct B12 rather than the 900 numbers of A, thereby reducing the amount of data that must be transferred. It is true that the matrix reconstructed would not be exactly the same as A but, depending on the context, it might be acceptably close.

How close is matrix B1 to matrix A. If they were the same then A − B1 would be the zero matrix.If B1 is close to A then the entries in A − B1 should all be small. The distance from A to B1 couldbe measured in a way similar to how we measure the distance from one vector to another. We couldsubtract the matrices, square the entries, add them, and then take the square root9. In Maple we can dothis with the Norm command with the frobenius option. (There are various ways of finding the normof a matrix. The method we are using here is called the Frobenius norm.)

>Norm(A-B[1],frobenius);

>Norm(A-B[2],frobenius);

>Norm(A-B[3],frobenius);

>Norm(A-B[12],frobenius);

9This is in fact the same as finding the distance relative to the inner product 〈A, B〉 = trace(AT B)


This gives us the values 6.791, 2.648, 2.278, .013426353, .764e-8. The conclusion is that B12 is very

close to A.We can plot the error of the successive approximations and get the following graph. We also plot

the singular values for comparison. Notice how the decrease in the errors parallels the decrease in thesingular values.

Figure 7.33: Errors of the SVD reconstructions.
Figure 7.34: The singular values of A.

There is another way to visualize how the matrices Bj approximate A using an animation in Maple .

>for i to 12 do p[i]:=matrixplot(B[i]) od:

>display( [ seq( p[i],i=1..12) ], insequence=true );


Example 2.

We will now look at another example of data compression. We begin by defining an 8 × 8 matrix:

>M:=Matrix(8,8,[[0,0,0,0,0,0,0,0],

[0,1,1,1,1,1,1,0],

[0,1,0,0,0,0,1,0],

[0,1,0,1,1,0,1,0],

[0,1,0,0,0,0,1,0],

[0,1,0,0,0,0,1,0],

[0,1,1,1,1,1,1,0],

[0,0,0,0,0,0,0,0]]):

This matrix would correspond to the following image where 0=black and 1=white.

Figure 7.35: An 8 by 8 image.

The JPEG method of compressing an image involves converting it to the Discrete Cosine basis that we mentioned in Chapter 1. We will write a procedure that converts an 8 × 8 matrix to the Discrete Cosine basis and the corresponding inverse transformation. First we define a function f that gives cosine functions at various frequencies. We then generate a basis for R^8 by sampling f. We place these basis vectors in matrix A and let A1 be the inverse of A (these are the change of basis matrices). Then we define dct for the Discrete Cosine transform and idct for the inverse transform.

>f:=(k,t)->evalf(cos(Pi*k*(t-1/2)/8)):

>A:=Matrix(8,8, (i,j)->f(i-1,j));

>A1:=A^(-1):

>dct:=proc(mat)

local m1;

m1:=mat:

A1.m1.A1^%T;

end:

>idct:=proc(mat)

local m1;

m1:=mat:

A.m1.A^%T;

end:

We now will apply the dct procedure to M and call the result TM. This matrix contains all the information from the original image but relative to a different basis. Image compression is performed by reducing the amount of information in TM by making all “small” entries equal to 0. The following Maple code scans through the entries in TM and if an entry is less than 0.2 in absolute value then that entry is made 0.


>TM:=dct(M);

>for i to 8 do

for j to 8 do if abs(TM[i,j])<.2 then TM[i,j]:=0 fi

od; od;

>print(TM);

This gives the following matrix

[  0.3437500000   0   0              0   −0.2209708692   0   −0.3027230266   0 ]
[  0              0   0              0    0              0    0              0 ]
[  0              0   0              0    0              0    0.3093592166   0 ]
[  0              0   0              0    0              0    0              0 ]
[ −0.2209708691   0   0              0    0              0    0              0 ]
[  0              0   0              0    0              0    0              0 ]
[ −0.3027230267   0   0.3093592167   0    0              0    0              0 ]
[  0              0   0              0    0              0    0              0 ]

Notice that we are keeping only 7 of the 64 entries in TM. We now transform back to the original basis.

>M2:=idct(TM):

This would correspond to Figure 7.36.

Figure 7.36: The DCT compressed image.
Figure 7.37: The SVD compressed image.

Now we will compare this with using the SVD to compress the image:

>U,S,Vt:=SingularValues(M,output=['U','S','Vt']):

>M3:=value(add(S[i]*Column(U,i).Row(Vt,i), i=1..2));

Here we are reconstructing the image from just two components of the SVD and we get the image shown in Figure 7.37. The idea is not to recreate the original image exactly. The idea is to create a “reasonably” good reproduction of the image by using significantly less data than that contained in the original image. Since some of the original information is lost in this process, this type of compression is called lossy compression.


Example 3.

We will use Maple to compare the total least squares line and the least squares line for a set of data.We begin by generating two lists called xv and yv which contain the coordinates of our data points.

>with(stats[random],normald):

>f1:=x->.2*x+normald[0,.2]():

>f2:=x->2*sin(.1*x)+normald[0,.2]():

>xv:=[seq( f1(i), i=1..20)]: ## noisy x values

>yv:=[seq( f2(i),i=1..20)]: ## noisy y values

Next we have to put the data in mean-deviation form. We will write a procedure called mdform whichwill take any list as an input and return the mean-deviation form of that list.

>mdform:=proc(L)

local n,m;

n:=nops(L): ## n is the number of points

m:=add(L[i],i=1..n)/n: ## m is the mean

convert([seq( L[i]-m, i=1..n)],Vector);

end:

>mx:=mdform(xv):

>my:=mdform(yv):

>A:=<mx|my>:

>M:=A^%T.A:

>U,S,Vt:=SingularValues(M,output=[’U’,’S’,’Vt’]);

The direction of the total least squares line is determined by the first row of Vt computed above. We will define the slope of this line and then define the plots of the TLS line and the data points. The plots won't be displayed until we find the least squares line as well.

>v1:=Row(Vt,1):

>mtls:=v1[2]/v1[1]: ### the slope of the TLS line

>data:=[seq( [mx[i],my[i]],i=1..20)]:

>p1:=plot(data,style=point):

>p2:=plot(mtls*x,x=-2..2): ### plot the TLS line

We can find the least squares line as follows:

>mls:=DotProduct(mx,my)/DotProduct(mx,mx); ## slope of the LS line

>p3:=plot(mls*x,x=-2..2): ### plot the LS line

>plots[display]([p1,p2,p3]);

This gives the plot shown in Figure 7.38.

Now the least squares line should minimize the sum of the squares of the vertical distances from the data points to the line. We will use Maple to compute these distances for the TLS line and the LS line:

>yls:=mls*mx:

>ytls:=mtls*mx:

>Norm(my-yls,2);

1.29529

>Norm(my-ytls,2);

1.30000

Each time you execute the above commands the numerical values obtained should vary because the data points were generated using random numbers, but the first value should always be smaller than the second.

We leave it as a final Maple exercise for the reader to compare the sum of the squares of theorthogonal distances from the data points to the lines. (See the discussion in section 7.6).


Figure 7.38: The TLS line, LS line, and the data points.


Chapter 8

Calculus and Linear Algebra

In this chapter we will look at some connections between techniques of calculus (differentiation andintegration) and the methods of linear algebra we have covered in this course.

8.1 Calculus with Discrete Data

Suppose we have the following experimental data which gives the vapor pressure (in torr) of ethanol at various temperatures (in degrees Centigrade)

Temperature T   20.0    25.0    30.0    35.0    40.0     45.0     50.0     55.0     60.0
Pressure P      42.43   55.62   69.25   93.02   116.95   153.73   190.06   241.26   303.84

We have plotted this data in Figure 8.1 which shows that the pressure is clearly an increasingfunction of temperature, P = f(T ), but any precise formulation of this function is unknown. Nowsuppose we want to answer the following questions:

• What is the value of P when T = 42.7?

• What is the value of dP/dT when T = 42.7?

• What is ∫_{T=20.0}^{T=40.0} f(T) dT ?

How could we answer these questions? We have seen two major approaches that could be taken.We could find an interpolating polynomial and use that to answer each of the above questions.

In Figure 8.1 we show the data, the interpolating polynomial, and the derivative of this polynomial.Looking at these plots should convince you that this approach is unsatisfactory. As the plot of thederivative clearly shows the interpolating polynomial is not strictly increasing and so it violates ourbasic intuition of the physics of the problem. This is typical of what happens when you try to fit ahigh degree polynomial to a set of data.

The problem with the above approach is that we tried to find a curve that fit the data valuesexactly but experimental data is almost always guaranteed to contain errors of measurement. An-other method would be to find a function that gives the best least-squares fit to the data. Theproblem here is to determine what type of function to fit. In many cases understanding the physicaltheory behind the phenomenon can indicate what type of function should be used. In the currentexample we can see that P grows with T but should that growth be exponential, linear, quadratic,or some other form. If we assume quadratic growth and find the best fitting function of the form


Figure 8.1: Vapor pressure versus temperature (interpolation)

P = c0 + c1T + c2T² we would get the plots shown in Figure 8.2. Clearly this is a more satisfactory result than simple interpolation.
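A hedged sketch of how such a quadratic least-squares fit could be done in Maple (this is our own code, not the text's; Tv, Pv, V and c are made-up names):

>with(LinearAlgebra):
>Tv:=<20.0,25.0,30.0,35.0,40.0,45.0,50.0,55.0,60.0>:
>Pv:=<42.43,55.62,69.25,93.02,116.95,153.73,190.06,241.26,303.84>:
>V:=Matrix(9,3,(i,j)->Tv[i]^(j-1)):        ### columns 1, T, T^2
>c:=LeastSquares(V,Pv);                    ### the coefficients c0, c1, c2
>c[1]+c[2]*42.7+c[3]*42.7^2;               ### the fitted value of P at T = 42.7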

result than simple interpolation.

We now want to take a different approach to this type of problem.

You should recall from calculus that if P = f(T ) then the derivative dP/dT is defined as

f ′(T ) = lim∆T→0

f(T + ∆T ) − f(T )

∆T

This means that we have the following approximation

dP

dT≈ f(T + ∆T ) − f(T )

∆T

This is not saying anything new or complicated. It is stating the obvious fact that the instantaneousrate of change can be approximated by an average rate of change over an interval (generally thesmaller the interval the better the approximation). There is an interval of ∆T = 5.0 between eachof our data values in the above table.


Figure 8.2: Vapor pressure versus temperature (least squares fit)

If we store the P values in a vector v then the finite differences can be computed by matrix multiplication

(1/5) ·
[ −1   1    0    0    0    0    0    0    0 ]
[  0  −1    1    0    0    0    0    0    0 ]
[  0   0   −1    1    0    0    0    0    0 ]
[  0   0    0   −1    1    0    0    0    0 ]
[  0   0    0    0   −1    1    0    0    0 ]
[  0   0    0    0    0   −1    1    0    0 ]
[  0   0    0    0    0    0   −1    1    0 ]
[  0   0    0    0    0    0    0   −1    1 ]
·
[ 42.43, 55.62, 69.25, 93.02, 116.95, 153.73, 190.06, 241.26, 303.84 ]^T
=
[ 2.64, 2.73, 4.75, 4.78, 7.36, 7.27, 10.24, 12.52 ]^T

Notice that since it takes 2 data values to compute each finite difference, our nine data values give only eight finite differences.

The matrix in the above example could be called a differentiation matrix since for any vectorgenerated by sampling a function (with, in this case, an interval of 5 between the samples) multipli-cation by the matrix results in a vector containing approximations to the derivative of the function.What is the null space of this matrix?
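As a sketch (our own lines, not the text's), the differentiation matrix and its null space could be examined in Maple as follows; Dmat and v are made-up names.

>with(LinearAlgebra):
>v:=<42.43,55.62,69.25,93.02,116.95,153.73,190.06,241.26,303.84>:   ### the pressure values
>Dmat:=1/5*Matrix(8,9,(i,j)->piecewise(j=i,-1,j=i+1,1,0)):          ### the differentiation matrix
>Dmat.v;                                                             ### the finite differences above
>NullSpace(Dmat);                                                    ### spanned by the constant vector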


8.2 Differential Equations and Dynamical Systems

In this section we want to look at differential equations of the form

dy/dt = f(y)

where f(y) is linear. The left hand side of this equation represents the instantaneous rate of change of y with respect to t. If we evaluate y at a sequence of equidistant values of t then this instantaneous rate of change at the particular value yk can be approximated by

dy/dt ≈ (yk+1 − yk)/∆t

where ∆t represents the t interval from yk to yk+1.

For example, if we had the function y = 3t² then the value of dy/dt at t = 1 could be approximated by

[ y(1.1) − y(1) ] / .1 = [ 3(1.1)² − 3(1)² ] / .1 = 6.3

where we are using a value of ∆t = .1. It should be clear that in general we will get a better approximation by using a smaller value of ∆t. So if we used ∆t = .02 we would have

[ y(1.02) − y(1) ] / .02 = [ 3(1.02)² − 3(1)² ] / .02 = 6.06

Now suppose we have the differential equation

dy/dt = 5y

At the particular value yk this equation could be approximated by

(yk+1 − yk)/∆t = 5 yk

which can be rewritten as

yk+1 = (1 + 5∆t) yk

If we had some initial value (say y0 = 3) and some fixed interval (say ∆t = .1) then we could approximate subsequent values of y from the order 1 difference equation yk+1 = 1.5 yk. This would give y0 = 3, y1 = (1.5)(3) = 4.5, y2 = (1.5)²(3) = 6.75, and so on. In general we would have yn = 3(1.5)ⁿ. Remember yn stands for the value of y after n intervals of length .1.
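A quick sketch (not from the text) of this iteration in Maple, with the exact solution for comparison:

>y:=3.0:
>for k to 6 do y:=1.5*y od:
>y;                          ### 34.171875, the discrete value after 6 steps (t = .6)
>evalf(3*exp(5*0.6));        ### 60.2566..., the exact solution 3 e^(5t) at t = .6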


Exercises

1. Given the differential equation dy/dt = −2y. We will look at several ways of solving this by a discrete approximation. The right hand side tells us that the rate of change at time k is given by −2yk, but how can we approximate this rate of change by a finite difference? For the following problems use a time interval of ∆t = .25 with the initial values y0 = 1, y1 = 1/2.

(a) The rate of change at time k can be approximated by the forward difference

(yk+1 − yk)/∆t

Use this approximation to solve the differential equation. (Remember a solution in this context is a sequence of values y2, y3, y4, . . . . This sequence of values will be generated by a discrete dynamical system.)

(b) The rate of change at time k can be approximated by the backward difference

(yk − yk−1)/∆t

Use this approximation to solve the differential equation.

(c) The rate of change at time k can be approximated by the centered difference

(yk+1 − yk−1)/(2∆t)

Use this approximation to solve the system.

(d) Plot the three approximate solutions along with the exact solution.

2. Repeat the previous problem for dy/dt = −2y + 1. Use the same ∆t and initial values.

3. Repeat for d²y/dt² = −2y + 1.

8.3 An Oscillating Spring

In this section we will consider a system composed of a mass connected to a spring with theother end of the spring connected to a fixed support. We will assume that the only relevant forceis the spring force and will ignore gravity and friction. The mass is displaced from its equilibriumposition and released. The result is that the mass will oscillate.

On a conceptual level this is one of the most important examples in this chapter. It illustrateshow a problem can be analyzed in terms of basic laws of physics. These laws of physics can thenbe expressed in the form of a differential equation which can be solved in continuous time usingcalculus. Finally it shows how it can be converted to discrete time and solved using linear algebra.This last step might appear to be redundant but in many applications it turns out that the equations


involved are too complicated for a calculus solution and they have to be converted to discrete timeapproximations.

The use of calculus in such problems results in what is called an analytical solution; the solution is given as a function. The use of linear algebra as shown here is called a numerical solution; the solution is a list of numbers. The development of fast computers with large memories has had a revolutionary impact on applied mathematics. These technological improvements have made quick and accurate numerical solutions possible where they would have been impossible 30 years ago.

Figure 8.3: A mass on a spring

Continuous Time

First we will analyze this as a continuous time system. From physics you know that the positionof the mass is governed by an equation of the form F = ma. Furthermore, in this example, we areassuming the only relevant force is the spring force which is given by F = −Kx where K > 0 is thespring constant and x is the displacement of the mass from equilibrium. Combining these equationswe get

ma = −Kx

m d²x/dt² = −Kx

d²x/dt² = −(K/m) x

This last equation has a general solution of the following form1:

1 In fact, the set of solutions to this differential equation form a vector space (i.e., the sum of any two solutions is also a solution, and a scalar multiple of a solution is a solution). This vector space is two dimensional and has a basis consisting of cos(√(K/m) t) and sin(√(K/m) t). So you can look at the general solution as being any possible combination of these basis vectors.

x = C1 cos(√(K/m) t) + C2 sin(√(K/m) t)

Problem. Given that K = 1, m = 1, and at t = 0 you know that x = 0 and dx/dt = 1, find the values of C1 and C2.

Solution. Given the values of K and m the above equation would become

x = C1 cos t + C2 sin t

Substituting x = 0 and t = 0 into this equation we have

0 = C1 cos(0) + C2 sin(0) = C1

Hence C1 = 0. Now find the derivative and substitute t = 0 and dx/dt = 1:

dx/dt = −C1 sin t + C2 cos t

1 = −C1 sin(0) + C2 cos(0)

1 = C2

Therefore the motion of the oscillating mass is described by

x = sin t

Discrete Time

We’ve seen before that a first derivative can be approximated by a finite difference. The secondderivative can be approximated in a similar way. Using the fact that the second derivative is thederivative of the first derivative we get

d²x/dt² ≈ [ (xk+2 − xk+1)/∆t − (xk+1 − xk)/∆t ] / ∆t = (xk+2 − 2xk+1 + xk)/∆t²

This finite difference expression would give an approximation to the second derivative at time k + 1 (the midpoint of the values used). So if we use this discrete approximation in the place of the second derivative then the equation describing the motion becomes

(xk+2 − 2xk+1 + xk)/∆t² = −(K/m) xk+1

Solving this for xk+2 we get

xk+2 = ( 2 − (K/m)∆t² ) xk+1 − xk

Or in matrix form:

[ xk+1 ]   [  0       1    ] [ xk   ]
[ xk+2 ] = [ −1     2 − p  ] [ xk+1 ]

where p = (K/m)∆t².


As an example of this discrete model suppose that K/m = 1 and ∆t = .8. We then have the dynamical system

xk+1 =
[  0      1   ]
[ −1    1.36  ]  xk

To actually use this to compute values we would need x0, which would require knowledge of the position of the mass at times t = 0 and t = .8 (i.e., at k = 0 and k = 1). From previous results we know that the position of the object is described by x = sin t, so at t = .8 we have x1 = sin(.8) = .71736. The initial state of the dynamical system is then x0 = [ 0, .71736 ]^T. Repeated multiplication by A gives the following values:

[ 0, .71736 ]^T → [ .71736, .97560 ]^T → [ .97560, .60947 ]^T → [ .60947, −.14673 ]^T → [ −.14673, −.80902 ]^T → [ −.80902, −.95354 ]^T → [ −.95354, −.48779 ]^T → [ −.48779, .29014 ]^T → [ .29014, .88238 ]^T → · · ·
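These values can be reproduced with a short Maple loop (a sketch, not the book's code):

>with(LinearAlgebra):
>A:=Matrix([[0,1],[-1,1.36]]):
>x:=<0,0.71736>:                      ### the initial state x0
>for k to 8 do x:=A.x; print(x) od:   ### reproduces the states listed above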

If we draw a time plot of the discrete system along with the solution of the continuous model we get Figure 8.4. The values given by the discrete system also lie on a sine wave but at a slightly different frequency from that of the continuous time solution.

Figure 8.4: Plots of the discrete time and continuous time solutions with ∆t = .8. The horizontal axis is indexed by k, not by t.

The continuous solution has a period of 2π. What is the period of the discrete solution? The characteristic polynomial would be λ² − 1.36λ + 1 = 0, giving eigenvalues of

( 1.36 ± √(1.36² − 4) ) / 2 = .68 ± .7333212i

These complex eigenvalues have magnitude 1, and correspond to a rotation of arccos(.68) at each multiplication by A. One complete cycle would therefore require 2π/arccos(.68) steps. Since each step is .8 seconds the total period is

.8 · 2π/arccos(.68) = 1.6π/arccos(.68) ≈ 6.107342014

This is slightly less than the period of the continuous time model which had a period of 2π ≈6.283185308.

Problem. You should realize that the finite difference approximation that we are using togenerate our linear dynamical system becomes more exact as the time interval becomes smaller.


Compute the period of the discrete time solution for ∆t = .4, and ∆t = .1 and compare the resultwith the period of the continuous time solution.

Here are the plots of these solutions for ∆t = .4, and ∆t = .1.

Figure 8.5: Plots of the discrete time and continuous time solutions with ∆t = .4.

Figure 8.6: Plots of the discrete time and continuous time solutions with ∆t = .1.


Exercises

1. In our analysis of the oscillating spring-mass system we ignored gravity. How would the analysis change if we include gravity?

2. The dynamical system which modelled the spring-mass system ignored friction. We can modify the system as follows to include the effect of friction:

[ xk+1 ]   [   0           1       ] [ xk   ]
[ xk+2 ] = [ q − 1     2 − p − q   ] [ xk+1 ]

Here the parameter q is a (small) positive value that models the presence of friction.

(a) What are the magnitudes of the (complex) eigenvalues of this system?

(b) Let p = .64 and x0 = [ 0, .717 ]^T. Use Maple to draw time plots of this system for q = 0.1, 0.2, 0.3, . . . , 0.9, 1.0. At what value of q do the eigenvalues become real? How does the behavior of the system change at that point?

3. Set up the differential equations for a system of two equal masses connected by three springs totwo fixed supports. Assume the springs all have the same spring constant.

8.4 Differential Equations and Linear Algebra

We have already looked at dynamical systems of the form xk+1 = Axk. Dynamical systems ofthis type are sometimes called discrete dynamical systems because the time variable (k) evolvesin steps of some fixed finite size. There is another fundamental way of modeling systems which evolveover time using continuous dynamical systems which are described by differential equations.

The simplest example of a continuous linear dynamical system would be an equation like

dx/dt = .06x

The left side of this equation is a derivative which represents the instantaneous rate of change of xwith respect to time, t. The equation says that this rate of change is equal to 6% of the value of x.The solution to this equation would be

x = Ce^(.06t)

Checking this by taking the derivative we would get

dx/dt = Ce^(.06t)(.06) = x(.06) = .06x

One key idea here is that the set of all possible solutions of the above differential equation can be seen as a 1 dimensional vector space2 with basis e^(.06t).

We now show how the same dynamical system can be modeled as a discrete system.For simplicitywe will choose some specific value for C, say C = 10. We would then have the solution x = 10e.06t.If we let x0 stand for the value of x at time 0 we have x0 = 10. Choose a time interval, say ∆t = .5

2 As a generalization of this example the solution of dx/dt = kx would be x = Ce^(kt) – a one dimensional vector space consisting of all scalar multiples of e^(kt).


and let xk be the value of x at time t = .5k (that is, k time intervals after t = 0). Then the derivative can be approximated by a difference quotient

(xk+1 − xk)/.5 = .06 xk

Solving this for xk+1 we get xk+1 = 1.03 xk

This would now be a discrete time approximation to the original continuous time system. If we startoff with x0 = 10 and use this difference equation we get the following values

x0    10
x1    10.3
x2    10.609
x3    10.927
x4    11.255
x5    11.595
x6    11.941

Now remember that x6 is the value after 6 time intervals, which corresponds to t = 3. In the continuous time model the value of x at t = 3 would be 10e^(.06(3)) = 11.972. So the continuous and the discrete model DO NOT AGREE. Here is a plot of the continuous time model and the values of the discrete time model. The gap between the discrete and continuous models will generally increase

Figure 8.7: Comparison of the continuous and discrete models with ∆t = .5.

as time goes on. At t = 10 the difference between the two models would be

10e^(.06(10)) − 10(1.03)^20 ≈ .16008
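A sketch (our own lines, not the text's) of this comparison in Maple:

>xd:=10.0:
>for k to 20 do xd:=1.03*xd od:       ### twenty steps of length .5, so t = 10
>evalf(10*exp(.06*10))-xd;            ### the gap at t = 10, about .16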

In general, as the time interval gets smaller the solution given by the discrete model gets closerto the solution of the continuous model. To illustrate this we show the plots that would result if weused a larger time interval of ∆t = 1 and a smaller time interval of ∆t = .2.

By choosing a smaller time interval the discrete model becomes a better approximation to thecontinuous model. When we use a time interval of ∆t = .2 the difference between the two models


Figure 8.8: Comparison of the continuous and discrete models with ∆t = 1.

Figure 8.9: Comparison of the continuous and discrete models with ∆t = .2.

at t = 10 would be .06496 (approximately one third of what it was with an interval of .5). Whenthe time interval is ∆t = 1 the difference between the two models at t = 10 would be .31271.

There are two drawbacks to using very small intervals. First, due to the finite precision ofcomputers there is a limit as to how small the interval can be and the smaller the interval the moreserious the rounding errors will be. Second, the smaller the time interval the greater the amount ofdata that will be produced. For example, if you chose ∆t to be a millionth of a second, it wouldtake a million steps to compute just one second of the solution.

Systems of Two Differential Equations

Next suppose we had a system of differential equations like

dx1/dt = .01 x1
dx2/dt = .07 x2

This system would be easy to solve based on the earlier example. The solution here would be

x1 = C1 e^(.01t)
x2 = C2 e^(.07t)

That was pretty simple, but now look at a more complicated example

dx1/dt = x1 + 2x2
dx2/dt = −x1 + 4x2

The big difference in this example is that the variables are coupled. That is the formula for therate of change of x1 involves the values of both x1 and x2, and similarly for the rate of change of x2.To solve this problem we want to uncouple the equations, and this will just involve diagonalizinga matrix.

We begin by letting x = [ x1, x2 ]^T and then we have

dx/dt =
[ dx1/dt ]   [  x1 + 2x2 ]   [  1   2 ] [ x1 ]
[ dx2/dt ] = [ −x1 + 4x2 ] = [ −1   4 ] [ x2 ] = Ax


The matrix A in this case can be diagonalized in the usual way. The matrix that diagonalizes A is

P =
[ 2  1 ]
[ 1  1 ]

and

P⁻¹AP =
[ 2  0 ]
[ 0  3 ]

So we introduce a new variable

. So we introduce a new variable

y =

[

y1

y2

]

= P−1x

We then get

dx/dt = Ax
dx/dt = PDP⁻¹x
P⁻¹ dx/dt = P⁻¹PDP⁻¹x
dy/dt = Dy

In this case that leaves us with

dy1/dt = 2y1
dy2/dt = 3y2

The equations have been uncoupled. The solution now is simple:

y1 = C1 e^(2t)
y2 = C2 e^(3t)

But this is the solution in terms of our new variables. What is the solution in terms of the original variables? For this we just evaluate the following

x = Py =
[ 2  1 ] [ C1 e^(2t) ]
[ 1  1 ] [ C2 e^(3t) ]

So

x1 = 2C1 e^(2t) + C2 e^(3t)
x2 = C1 e^(2t) + C2 e^(3t)

This solution can be written as

x = C1 [ 2e^(2t), e^(2t) ]^T + C2 [ e^(3t), e^(3t) ]^T

When the solutions are written this way it is easier to see that they form a two dimensional vector space with basis [ 2e^(2t), e^(2t) ]^T and [ e^(3t), e^(3t) ]^T.
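As a sketch (not the text's code), the diagonalization used above can be checked in Maple; note that Eigenvectors may order and scale the eigenvectors differently than the P chosen in the text, but P⁻¹AP will still be diagonal.

>with(LinearAlgebra):
>A:=Matrix([[1,2],[-1,4]]):
>lam,P:=Eigenvectors(A);              ### eigenvalues 2 and 3 with corresponding eigenvectors
>MatrixInverse(P).A.P;                ### the diagonal matrix D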

Now how can the solutions be visualized? First, to use a specific example, let's choose C1 = 1 and C2 = −1. Then we can draw time plots of the solutions where we view x1 and x2 as functions of time; this gives Figure 8.10. We can also draw a phase plot where we plot x1 values against the corresponding x2 values; this gives Figure 8.11. In the phase plot we see the origin as a repellor.

Can we model this system as a discrete-time system? First, to simplify the notation 3, we willintroduce new variables: let a = x1 and b = x2.Well, if we let ∆t = .1 then using a similar argumentas our earlier example we get

3If we keep the variable x1, it is most common to represent the value of this variable at time t by xt1


Figure 8.10: The time plots of x1 = 2e^(2t) − e^(3t) and x2 = e^(2t) − e^(3t)

Figure 8.11: The phase plot of x1 = 2e^(2t) − e^(3t) and x2 = e^(2t) − e^(3t).


Appendix A

Linear Algebra with Maple

We will summarize the basics of using the LinearAlgebra package to do linear algebra. For all the commands in thissection it will be assumed that you have entered the following command to load the LinearAlgebra package.

>with(LinearAlgebra):

Defining Vectors in Maple

There are several ways of defining a vector in Maple. Suppose we have the vector

u =
[ 4 ]
[ 3 ]
[ 0 ]
[ 8 ]

The easiest way of defining this vector in Maple is

>u:=<4, 3, 0, 8>;

Note: You have to enclose the vector entries in angled brackets, < and >.

The Vector command also allows you to define a vector by giving a rule to generate the nth entry in the vector.This method of defining a vector requires two input parameters. The first parameter is the size of the vector. Thesecond is the rule to generate entries in the vector.

>u:=Vector(4, n->n^2);

u=[1, 4, 9, 16]

>v:=Vector(5, j->t/(j+1));

v=[t/2, t/3, t/4, t/5, t/6]

A vector is a one-dimensional array in Maple which means individual entries in the vector can be accessed byspecifying the index of that entry as shown below

>u:=<x,y,x*y,-1>:

>u[2];

y

>u[4];

-1

>u[1]-u[3];

x - xy

>v:=Vector(4, n->u[5-n]);

v=[-1, xy, y, x]


Defining Matrices in Maple

There are many ways of defining a matrix in Maple. Suppose we have the matrix

A =
[ 1  0 ]
[ 3  5 ]
[ 8  2 ]

Either of the first two commands below could be used to define this matrix; the last two lines show that matrices can also be joined side by side (with |) or stacked (with a comma).

>A:=<<1|0>,<3|5>,<8|2>>; #### row by row

>A:=<<1,3,8>|<0,5,2>>; #### column by column

><A|A>;

><A,A>;

A matrix is a two-dimensional array in Maple and each entry in the matrix can be accessed by specifying thetwo indices (row and column) of the entry. For example

>A:=<<2,0,8>|<5,1,4>>;

[ 2  5 ]
[ 0  1 ]
[ 8  4 ]

>A[2,1];

0

>A[3,1]*A[1,2];

40

Matrices can also be generated by giving a rule which generates the entries. You first specify the size of the matrix and then give the rule

>B:=Matrix(2,3, (i,j)->i/j);

>C:=Matrix(2,2, (i,j)->t^(i*j));

B =
[ 1   1/2   1/3 ]
[ 2    1    2/3 ]

C =
[ t    t² ]
[ t²   t⁴ ]

Patterned Matrices

Some matrices have entries that fall into a particular pattern. For example, the following matrices are square diagonalmatrices:

B =
[ 4  0  0   0 ]
[ 0  8  0   0 ]
[ 0  0  3   0 ]
[ 0  0  0  −1 ]

C =
[ 1  0  0  0  0 ]
[ 0  1  0  0  0 ]
[ 0  0  1  0  0 ]
[ 0  0  0  2  0 ]
[ 0  0  0  0  5 ]

The maple command DiagonalMatrix can be used for this type of matrix. With this command you just have to inputthe entries on the diagonal of the matrix. So we could define B and C as

>B:=DiagonalMatrix(<4,8,3,-1>);

>C:=DiagonalMatrix(<1,1,1,2,5>);

The 10 × 10 identity matrix could be defined as

>I10:=IdentityMatrix(10);

Another type of patterned matrix is called a band matrix. The following are examples of band matrices:

B1 =
[  4   1   0   0   0 ]
[ −2   4   1   0   0 ]
[  0  −2   4   1   0 ]
[  0   0  −2   4   1 ]
[  0   0   0  −2   4 ]

B2 =
[ b  c  d  0  0  0 ]
[ a  b  c  d  0  0 ]
[ 0  a  b  c  d  0 ]
[ 0  0  a  b  c  d ]
[ 0  0  0  a  b  c ]
[ 0  0  0  0  a  b ]

In Maple we can enter


>B1:=BandMatrix(<-2,4,1>,1, 5);

>B2:=BandMatrix(<0,a,b,c,d>,2,6);

The BandMatrix command requires three inputs. The first input must be a vector containing the entries down theband. The next entry specifies how many diagonal bands extend below the main diagonal. The third entry (or thirdand fourth) specifies the size of the matrix.

Solving Systems of Equations

As usual with Maple there are many ways of solving a system of equations. We will only mention three ways.

Suppose we want to solve the system Ax = b where

A =
[ 1  2  1 ]
[ 2  1  2 ]
[ 3  4  5 ]

and b =
[  2 ]
[  0 ]
[ −5 ]

In this case A is a square invertible matrix so the solution is given by x = A⁻¹b. The Maple commands for this are as follows:

>A:=<<1,2,3>|<2,1,4>|<1,2,5>>;

>b:=<2,0,-5>:

>sol:=A^(-1).b;

[7/2, 4/3, -25/6]

The method that was just used only works if A is invertible. We can solve any system by setting up the augmentedmatrix of the system and then putting it in reduced row echelon form. The last column of the reduced matrix willcontain the solution.

>A:=<<1,2,3>|<2,1,4>|<1,2,5>>;

>b:=<2,0,-5>:

>ReducedRowEchelonForm(<A|b>):

>Column(%, 4);

[7/2, 4/3, -25/6]

>LinearSolve(A,b); ### another option

Suppose A =
[ 1  2  2  3 ]
[ 2  1  3  4 ]

and b =
[ 3 ]
[ 0 ]

then Ax = b must have infinitely many solutions. We can find these solutions in Maple as follows:

>A:=<<1,2>|<2,1>|<2,3>|<3,4>>:

>b:=<3,0>:

>LinearSolve(A,b);

[-1-4/3*s-5/3*t, 2-1/3*s-2/3*t, s, t]

Matrix and Vector Operations

The simplest operations on matrices and vectors are scalar multiplication and addition. These two operationsallow you to create linear combinations.

We will use the following matrices and vectors for our examples in this section:

A =
[ 2   1 ]
[ 1  −1 ]
[ 0   2 ]

B =
[  3  3 ]
[ −1  4 ]
[  2  1 ]

u =
[ 1 ]
[ 3 ]

v =
[  3 ]
[ −1 ]

then we can evaluate 5A, −3u, 2A + 3B, 4u − 8v as follows

>A:=<<2|1>,<1|-1>,<0|2>>:

>B:=<<3|3>,<-1|4>,<2|1>>:

>u:=<1,3>:

>v:=<3,-1>:

>5*A;

>-3*u;

>2*A+3*B;

>4*u-8*v;

So addition and scalar multiplication are computed using the symbols + and *.Matrix multiplication is not the same as scalar multiplication and is represented by a different symbol in Maple

. Matrix multiplication is indicated by a dot, that is the . symbol. So if we wanted to compute AB, BA, A2, Au,B(u + v) using the same matrices and vectors as above we could enter

Page 72: The singular value decomposition

362 A. Linear Algebra with Maple

>A.B;

>B.A;

>A.A; ### one way of finding A^2

>A^2; ### an alternate way of finding A^2

>A.u;

>A.(u+v);

The transpose or inverse of a matrix can be found as follows.

>Transpose(A);

>A^(-1); ### this stands for the inverse of A but does not compute it

For example, using the same matrices as above suppose we want to find a matrix C such that

A(C + B) = B^T A

Solving this equation symbolically we would get

C = A⁻¹B^T A − B

We can then compute this result in Maple

>C:=A^(-1).Transpose(B).A-B;

The dot product can be found in two ways. To find u · v we can enter either

>DotProduct(u,v);

>Transpose(u).v;

These two methods result from the equation u · v = uT v.There is a similar command for the cross product. A cross product can be evaluated using

>CrossProduct(<1,2,3>,<4,5,6>);

[-3, 6, -3]

>CrossProduct(<A,B,C>,<X,Y,Z>);

[B*Z-C*Y, C*X-A*Z, A*Y-B*X]

Determinants

A determinant can be computed in Maple with the Determinant command. For example suppose we want to use Cramer's Rule to solve

2x1 + 3x2 + 4x3 = a

3x1 + 2x2 + 3x3 = b

5x1 + 5x2 + 9x3 = c

for x2.Cramer’s Rule says that

       | 2  a  4 |       | 2  3  4 |
x2 =   | 3  b  3 |   ÷   | 3  2  3 |
       | 5  c  9 |       | 5  5  9 |

In Maple we could do

>a1:=<2,3,5>:

>a2:=<3,2,5>:

>a3:=<4,3,9>:

>y:=<a,b,c>:

>A:=<a1|a2|a3>:

>A2:=<a1|y|a3>:

>Determinant(A2)/Determinant(A); ### Cramer’s Rule for x2


Examples

Example 1

We will solve the system

x + y + z = 3

3x − 2y + z = 1

4x − y + 2z = 4

First we will show how to plot these equations.

>e1:=x+y+z=3:

>e2:=3*x-2*y+z=1:

>e3:=4*x-y+2*z=4:

>plots[implicitplot3d]({e1,e2,e3},x=-4..4,y=-4..4,z=-4..4,axes=boxed,style=patchnogrid,shading=zgrayscale);

Figure A.1: The plot of the system.

The plot shows that the three planes making up the system intersect along a line.

We could solve this system by

>solve({e1,e2,e3}, {x,y,z});

{y = 8/5 − 2/5 z, z = z, x = 7/5 − 3/5 z}

This result means that z is free and so the solution would correspond to the line

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 7/5 \\ 8/5 \\ 0 \end{bmatrix} + t \begin{bmatrix} -3/5 \\ -2/5 \\ 1 \end{bmatrix}$$

We could also solve this system by setting up the augmented matrix and reducing.

>A:=<<1,3,4>|<1,-2,-1>|<1,1,2>|<3,1,4>>;

>ReducedRowEchelonForm(A);

$$\begin{bmatrix} 1 & 0 & 3/5 & 7/5 \\ 0 & 1 & 2/5 & 8/5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

It should be clear that this reduced form gives the same solution as the previous method.


Example 2

Given

$$v_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ 3 \\ 4 \\ 5 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 3 \\ 4 \\ 5 \\ 6 \end{bmatrix}, \quad v_4 = \begin{bmatrix} 4 \\ 5 \\ 6 \\ 7 \end{bmatrix}$$

find a basis for Span$(v_1, v_2, v_3, v_4)$ from among these vectors.

We can solve this in Maple as follows

>v1:=<1,2,3,4>:

>v2:=<2,3,4,5>:

>v3:=<3,4,5,6>:

>v4:=<4,5,6,7>:

>A:=<v1|v2|v3|v4>:

>ReducedRowEchelonForm(A);

$$\begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Maple has done the computation but it is up to us to give the correct interpretation of this result. In this case we see that two columns of the reduced form contain pivots. The corresponding columns of the original matrix form the basis we are looking for. So our basis is {v1, v2}.
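The non-pivot columns of the reduced form also tell us how the remaining vectors depend on this basis (a worked reading of the result that is easy to verify directly): the third column gives $v_3 = -v_1 + 2v_2$ and the fourth gives $v_4 = -2v_1 + 3v_2$.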

Example 3

Find all 2 × 2 matrices satisfying $A^2 = 0$.

We start by defining

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

>A:=<<a,c>|<b,d>>:

>B:=A^2:

$$B = \begin{bmatrix} a^2 + bc & ab + bd \\ ca + dc & bc + d^2 \end{bmatrix}$$

Now we want each entry in B to equal 0. The next line shows how we can refer to these entries in Maple and have Maple solve the desired equations.

>solve( {B[1,1]=0, B[1,2]=0, B[2,1]=0, B[2,2]=0}, {a,b,c,d} );

$$\{c = 0,\ d = 0,\ b = b,\ a = 0\}, \qquad \left\{c = c,\ d = d,\ a = -d,\ b = -\frac{d^2}{c}\right\}$$

This result means that there are two basic solutions. If c = 0 then there is a solution of the form

$$\begin{bmatrix} 0 & b \\ 0 & 0 \end{bmatrix}$$

where b is free.

If $c \neq 0$ then there is a solution of the form

$$\begin{bmatrix} -d & -\dfrac{d^2}{c} \\ c & d \end{bmatrix}$$

where c and d are free.
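As a quick check (a sketch using the same Matrix constructor as above; C1 and C2 are names introduced here for the two families), squaring each family should give the zero matrix:

>C1:=<<0,0>|<b,0>>:              ### the c = 0 family

>C1^2;

>C2:=<<-d,c>|<-d^2/c,d>>:        ### the c <> 0 family

>map(simplify, C2^2);

Both results are the 2 × 2 zero matrix.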


Example 4

For what values of a and b do the vectors

$$\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \quad \begin{bmatrix} 2 \\ 1 \\ a \end{bmatrix}, \quad \begin{bmatrix} 1 \\ a \\ b \end{bmatrix}$$

form a basis of $\mathbb{R}^3$? We will illustrate two methods of answering this question.

>A:=<<1,2,2>|<2,1,a>|<1,a,b>>:

>GaussianElimination(A);

$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & -3 & a - 2 \\ 0 & 0 & b + \frac{1}{3}a^2 - 2a + \frac{2}{3} \end{bmatrix}$$

In order for these vectors to be a basis of $\mathbb{R}^3$ we need the entry in the third row, third column to be non-zero. We can state this condition as

$$b \neq -\frac{1}{3}a^2 + 2a - \frac{2}{3}$$

We could also do the following

>Determinant(A);

$$-3b - a^2 + 6a - 2$$

For these vectors to be a basis of $\mathbb{R}^3$ we want the determinant to be non-zero. This gives the same result as the first method.


Appendix B

Complex Numbers

Consider the equation $x^2 + 1 = 0$. If you try to solve this equation, the first step would be to isolate the $x^2$ term, giving $x^2 = -1$. You would then take the square root and get $x = \sqrt{-1}$. Algebraically this would be the solution (or rather one of the solutions, the other being $-\sqrt{-1}$). However, there is no real number which satisfies this condition, since when you square a real number the result can never be negative. In the 16th century mathematicians introduced the symbol i to represent this algebraic solution, and referred to this solution as an imaginary number. In general, an imaginary number is any real multiple of i.

A complex number is a number that is the sum of a real number and an imaginary number. A complex numberis usually represented as a + bi where a and b are real numbers. In this notation a is referred to as the real part, andb is referred to as the imaginary part of the complex number. There are special symbols that are commonly used torefer to the real and imaginary parts of a complex number. If z is a complex number then ℜz indicates the real partof z and ℑz indicates the imaginary part of z.

Complex numbers satisfy the usual rules of addition and multiplication. The one complication is that any occurrence of $i^2$ can be replaced by $-1$. Look at the following computations, for example:

(2 + 5i) + (7 − 2i) = 9 + 3i

$(2 + 5i)(7 - 2i) = 14 - 4i + 35i - 10i^2 = 14 + 31i - 10(-1) = 24 + 31i$

$i^3 = i^2 \cdot i = (-1)i = -i$
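Maple can confirm arithmetic like this directly; as described in the Using MAPLE section at the end of this appendix, Maple writes $\sqrt{-1}$ as the symbol I:

>(2+5*I)+(7-2*I);

9+3*I

>(2+5*I)*(7-2*I);

24+31*I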

Geometry of Complex Numbers

A correspondence can be set up between complex numbers and points in the plane. The real part gives the horizontal coordinate, and the imaginary part gives the vertical coordinate. So, for example, the complex number 3 + 2i would correspond to the point (3, 2). A purely real number would lie somewhere on the horizontal axis and a purely imaginary number would lie on the vertical axis. When plotting complex numbers in this way it is standard to call the horizontal axis the real axis and the vertical axis the imaginary axis. If we associate points in the plane with position vectors (that is, vectors whose starting point is the origin), then adding complex numbers is like adding the corresponding vectors. Multiplying a complex number by a real number is like multiplying the vector by a scalar.

Given a complex number z = a + bi, the complex conjugate of that number is $\bar{z} = a - bi$. So the conjugate of a complex number is formed by changing the sign of the imaginary part. Geometrically, the conjugate of z is the mirror image of z through the real axis. Notice that $\bar{z} = z$ if and only if z is purely real. Two basic properties of the conjugate are:

$$\overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}$$

$$\overline{z_1 z_2} = \overline{z_1}\,\overline{z_2}$$

We will give a proof of the second of these properties. Let z1 = a + bi and z2 = c + di, then

$$z_1 z_2 = (a + bi)(c + di) = ac + adi + bci + bdi^2 = ac - bd + (ad + bc)i$$


Figure B.1: The points a + bi and a − bi in the complex plane.

and so we have

$$\overline{z_1}\,\overline{z_2} = (a - bi)(c - di) = ac - adi - bci + bdi^2 = ac - bd - (ad + bc)i = \overline{z_1 z_2}$$

The above result can be generalized to matrices and vectors with complex entries. For a complex matrix A and complex vector x we have:

$$\overline{Ax} = \overline{A}\,\overline{x}$$

Or, more particularly, if $Ax = \lambda x$ then $\overline{A}\,\overline{x} = \overline{\lambda}\,\overline{x}$. From this it follows that if A has only real entries then $A\overline{x} = \overline{\lambda}\,\overline{x}$. In other words, if A has only real entries and has complex eigenvalues then the eigenvalues and eigenvectors come in conjugate pairs: if $\lambda$ is an eigenvalue then so is $\overline{\lambda}$, and if x is an eigenvector corresponding to $\lambda$ then $\overline{x}$ is an eigenvector corresponding to $\overline{\lambda}$.

Another important property of the conjugate is that if z = a + bi then

$$z\bar{z} = (a + bi)(a - bi) = a^2 - abi + abi - b^2 i^2 = a^2 + b^2$$

which you should recognize as the square of the distance of z from the origin (or the square of the length of the corresponding vector). This distance is called the magnitude (or length, or absolute value) of the complex number and written $|z| = \sqrt{z\bar{z}}$. This equation has an important consequence when dealing with complex vectors: recall that if v is a real vector then $\|v\|^2 = v^T v$, but if v is a complex vector then $\|v\|^2 = \bar{v}^T v$ (see the note below on the notation $A^*$).

For example, suppose $v = \begin{bmatrix} 1 \\ i \end{bmatrix}$, then

$$v^T v = \begin{bmatrix} 1 & i \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix} = 1^2 + i^2 = 1 - 1 = 0$$

which would clearly be incorrect for the length. But

$$\bar{v}^T v = \begin{bmatrix} 1 & -i \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix} = 1^2 - i^2 = 1 + 1 = 2$$

Taking the square root we then get the correct length, $\|v\| = \sqrt{2}$.

Note: the conjugate of the transpose of a complex matrix A is usually written $A^*$. So if v is a complex vector then $\|v\|^2 = v^* v$. This equation is also valid for real vectors since $v^* = v^T$ if all the entries are real.
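A minimal Maple sketch of this computation (assuming the LinearAlgebra package is loaded as in Appendix A; HermitianTranspose is the LinearAlgebra command for the conjugate transpose $v^*$):

>v:=<1,I>:

>Transpose(v).v;            ### gives 0, not the squared length

>HermitianTranspose(v).v;   ### gives 2, the squared length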


The conjugate also has some use with division of complex numbers. To rationalize the denominator of a complex fraction means to eliminate any imaginary terms from the denominator. This can be done by multiplying the numerator and denominator of the fraction by the conjugate of the denominator. For example:

$$\frac{1 + i}{2 + i} = \frac{(1 + i)(2 - i)}{(2 + i)(2 - i)} = \frac{3 + i}{5} = \frac{3}{5} + \frac{1}{5}i$$
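In Maple the same simplification can be requested with evalc, which puts a complex expression into the form a + bi (a small check, not a required step):

>evalc((1+I)/(2+I));

3/5+1/5*I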

Polar Representation of Complex Numbers

Any complex number (or, more generally, any point in the plane) can be characterized by the distance of the point from the origin, r, and the angle measured from the positive x axis, θ. So, for example, the complex number 1 + i has $r = \sqrt{2}$ and $\theta = \pi/4$. If we square this complex number we get $(1 + i)^2 = 1 + 2i + i^2 = 2i$. In this case the value of r would be 2 and θ would be $\pi/2$.

In general, if a complex number lies at a distance r and an angle θ, the real coordinate would be given by $r\cos\theta$ and the imaginary coordinate would be $r\sin\theta$. So this complex number could be written as $r\cos\theta + ir\sin\theta = r(\cos\theta + i\sin\theta)$.

There is another important notation for complex numbers that is related to the idea of power series. From calculus you should recall that

$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \cdots$$

Substituting $x = i\theta$ into the above and simplifying we get

$$e^{i\theta} = 1 + i\theta + \frac{i^2\theta^2}{2} + \frac{i^3\theta^3}{6} + \frac{i^4\theta^4}{24} + \cdots = 1 + i\theta - \frac{\theta^2}{2} - \frac{i\theta^3}{6} + \frac{\theta^4}{24} + \cdots$$

The real part of this last expression is $1 - \frac{\theta^2}{2} + \frac{\theta^4}{24} + \cdots$, which is the power series for $\cos\theta$. The imaginary part is $\theta - \frac{\theta^3}{6} + \frac{\theta^5}{120} + \cdots$, which is the power series for $\sin\theta$. As a result we get what is called Euler's Formula:

$$e^{i\theta} = \cos\theta + i\sin\theta$$

As a result we have the fact that any complex number can be represented as $re^{i\theta}$. The conjugate of this complex number would be $re^{-i\theta}$. The value of r is just the magnitude of the complex number, and the angle θ is called the argument of the complex number.

This notation makes one important aspect of multiplication of complex numbers easy to see. Suppose we have a complex number $z_1 = r_1 e^{i\theta}$. This point is located at a distance $r_1$ from the origin and at an angle θ from the positive real axis. Now suppose we multiply this complex number by another complex number $z_2 = r_2 e^{i\phi}$. We get $z_1 z_2 = r_1 e^{i\theta} r_2 e^{i\phi} = r_1 r_2 e^{i(\theta + \phi)}$. What has happened to the original complex number? Its length has been scaled by $r_2$ and its angle has been rotated to $\theta + \phi$. In other words, multiplication by a complex number can be seen as a combination of a scaling and a rotation.
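As a small worked example, multiplying $1 + i = \sqrt{2}\,e^{i\pi/4}$ by $i = e^{i\pi/2}$ gives $\sqrt{2}\,e^{i3\pi/4} = -1 + i$: the length is unchanged (scaled by 1) and the point is rotated by a quarter turn.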

Roots of Unity

Suppose you have the equation $z^3 = 1$. One solution is clearly z = 1. This is the real cube root of 1, but there are two other complex solutions. If we write $z = re^{i\theta}$, then we want $r^3 e^{i3\theta} = 1 = e^{i2\pi N}$ for any integer N. This implies that r = 1 and that $3\theta = 2\pi N$. We then have $\theta = \frac{2\pi N}{3}$, and this gives three different solutions $\theta = 0, 2\pi/3, -2\pi/3$. (All the other values of θ would be coterminal with these angles.) If we plot these points in the complex plane along with the unit circle we get Figure B.2.

In general, if we want the Nth roots of 1 we can start with $w = e^{i2\pi/N}$. Then $w^N = \left(e^{i2\pi/N}\right)^N = e^{i2\pi} = 1$, so w is an Nth root of 1. Then $w^k$ is also an Nth root of 1 for any integer k. Thus $1, w, w^2, w^3, \ldots, w^{N-1}$ are the Nth roots. By the earlier remarks, these will be evenly spaced points on the unit circle.
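Maple's solve command (used again at the end of this appendix for $z^{20} = 1$) returns these roots directly; for the cube roots, for instance,

>solve(z^3=1, z);

returns 1 together with $-\frac{1}{2} \pm \frac{\sqrt{3}}{2}i$.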


Figure B.2: The cube roots of 1.

Exercises

1. Let $z_1 = 2 + i$ and $z_2 = 1 + 2i$. Find

(a) $z_1 z_2$

(b) $z_1 - \overline{z_1}$

(c) $z_2^2$

2. Let $z = 1 + \sqrt{3}\,i$.

(a) Write z in the form $re^{i\theta}$.

(b) Write $\bar{z}$ in the form $re^{i\theta}$.

(c) Write $z^2$ in the form $re^{i\theta}$.

(d) Write $z^6$ in the form $re^{i\theta}$.

3. Find all solutions of $z^3 = 1$. Do this by rewriting the equation as $z^3 - 1 = 0$. Then factor the left hand side: $(z - 1)(z^2 + z + 1) = 0$. You should get 3 solutions. Give your solutions in both the standard form $a + bi$ and in exponential form $re^{i\theta}$.

4. Find all four solutions to $z^4 = 1$.

5. Start with the equation $e^{i\theta} = \cos\theta + i\sin\theta$. Square both sides of this equation. Use this result to find trigonometric identities for $\cos 2\theta$ and $\sin 2\theta$.

6. Show that $\dfrac{1}{z} = \dfrac{\bar{z}}{|z|^2}$ for any complex number $z \neq 0$.

7. (a) Find $|e^i|$ and $|i^e|$.

(b) Plot the two points $e^i$ and $i^e$ in the complex plane.


Using MAPLE

We will use Maple to illustrate some of the aspects of complex numbers discussed in this section. In Maple the symbol I is used to stand for $\sqrt{-1}$. In the following example we will begin by defining the complex number z = 7.4 + 3.2i.

>z:=7.4+3.2*I:

>abs(z);

8.062257748

>Re(z);

7.4

>Im(z);

3.2

>conjugate(z);

7.4-3.2*I

>conjugate(z)*z;

65.00

>sqrt(conjugate(z)*z);

8.062257748

>convert(z,polar);

polar(8.062257748,.4081491038)

>8.062257748*exp(.4081491038*I);

7.399999999+3.200000000*I

>argument(z);

.4081491038

>convert(z^2,polar);

polar(65.00, .816298)

The command abs(z) computes |z|, the magnitude of z. The commands Re and Im return the real and imaginary parts of a complex number. The conjugate command returns $\bar{z}$. Notice that the product conjugate(z)*z returns the square of abs(z).

The command convert(z,polar) returns the values of r and θ required to write z in the form $re^{i\theta}$. The following command computes this exponential form and returns the original z (with some rounding error). Notice that the values returned by convert(z^2,polar) show that when z is squared the magnitude gets squared and the argument gets doubled. The Maple command argument(z) returns just the argument of z.

Next we will use Maple to illustrate Euler’s Formula.

>f:=exp(I*t);

>plot([Re(f), Im(f)],t=-9..9,linestyle=[1,2],thickness=2);

This gives Figure B.3. You should understand where these plots came from. Since $e^{it} = \cos t + i\sin t$, plotting the real and imaginary parts results in plots of a cosine and a sine function. Compare the above with the following:

>w:=.3+.9*I;

>g:=exp(w*t);

>plot([Re(g), Im(g)],t=-9..9,linestyle=[1,2],thickness=2);

This gives Figure B.4. To understand this result notice that we have

$$e^{(.3+.9i)t} = e^{.3t}e^{.9it} = e^{.3t}\left(\cos(.9t) + i\sin(.9t)\right) = e^{.3t}\cos(.9t) + ie^{.3t}\sin(.9t)$$

So plotting the real and imaginary parts returns a cosine and a sine function, but now they are being scaled by a function which is increasing exponentially.

For one last example we will use Maple to plot the solutions to $z^{20} = 1$ (that is, to plot the 20 twentieth roots of 1). The first command below uses Maple to compute the roots and place them in a list called sols. The second line uses the complexplot procedure in Maple, which can be used to plot a list of complex numbers.


Figure B.3: The real and imaginary parts of $e^{it}$.

Figure B.4: The real and imaginary parts of $e^{(.3+.9i)t}$.

>sols:=[solve(z^20=1,z)];

>plots[complexplot](sols,style=point);

This gives Figure B.5.

Figure B.5: The solutions to $z^{20} = 1$.


Appendix C

Linear Transformations

Let U and V be vector spaces and let T be a transformation (or function, or mapping) from U to V. That is, T is a rule that associates each vector, u, in U with a unique vector, T(u), in V. The space U is called the domain of the transformation and V is called the co-domain. The vector T(u) is called the image of the vector u under the transformation T.

Definition 23 A transformation is linear if:

1. T (u + v) = T (u) + T (v) for all u and v in the domain of T .

2. T (cu) = cT (u) for all u in the domain of T and all scalars c.

The combination of the two properties of a linear transformation implies that

T (c1v1 + c2v2 + · · · + cnvn) = c1T (v1) + c2T (v2) + · · · + cnT (vn)

for any set of vectors, $v_i$, and scalars, $c_i$. We will just make a few observations about linear transformations.

Theorem C.1 If T is a linear transformation then T (0) = 0.

Proof Let T : U → V be a linear transformation and let u be any vector in U , then

T (0U ) = T (u − u) = T (u) − T (u) = 0V

In the above 0U stands for the zero vector in U and 0V is the zero vector in V .

Theorem C.2 If T is a linear transformation from Rn to Rm then T (u) = Au for some m × n matrix A.

Proof Let $u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$ be any vector in $\mathbb{R}^n$, then

$$T(u) = T(u_1 e_1 + u_2 e_2 + \cdots + u_n e_n) = u_1 T(e_1) + u_2 T(e_2) + \cdots + u_n T(e_n) = \begin{bmatrix} T(e_1) & T(e_2) & \cdots & T(e_n) \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = Au$$


The matrix A in the above theorem is called the standard matrix of the linear transformation T. The above proof in fact gives a method for finding the matrix A: it shows that the columns of A will be the images of the standard basis vectors under the transformation.

For example, suppose $A = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix}$ and $u = \begin{bmatrix} 3 \\ -1 \\ -2 \end{bmatrix}$. The linear transformation $T(x) = Ax$ would be from $\mathbb{R}^3$ to $\mathbb{R}^2$. This is sometimes written $T : \mathbb{R}^3 \to \mathbb{R}^2$. The image of u under this transformation would be

$$T(u) = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ -1 \\ -2 \end{bmatrix} = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$$
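In Maple the same image is just a matrix-vector product (a quick check in the style of Appendix A, using the row-wise Matrix constructor):

>A:=<<1|1|2>,<2|2|1>>:

>u:=<3,-1,-2>:

>A.u;

[-2, 2]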

So any linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is equivalent to a matrix multiplication. What happens with other vector spaces? There are many familiar operations which qualify as linear transformations. For example, in vector spaces of differentiable functions the operation of finding a derivative is a linear transformation because

(f + g)′ = f ′ + g′

(cf)′ = cf ′

where f and g are functions and c is a scalar. Or, in the vector spaces of matrices, taking the transpose is a linear transformation because

(A + B)T = AT + BT

(cA)T = cAT

When you take the determinant of a matrix the inputs are square matrices and the outputs are real numbers, so computing a determinant is a transformation from the vector space of n × n matrices to $\mathbb{R}$, but it is not linear since

$$\det(A + B) \neq \det(A) + \det(B), \qquad \det(cA) \neq c\det(A)$$
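(For a concrete instance with 2 × 2 matrices: taking A = B = I gives det(A + B) = det(2I) = 4, while det(A) + det(B) = 2, and likewise det(2I) = 4 ≠ 2 det(I).)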

It turns out that we can say something specific about linear transformations between finite-dimensional vector spaces.

Suppose T is a linear transformation where the domain and co-domain are both finite dimensional vector spaces. In this case, if we represent each vector by coordinates in terms of some basis, then the vector spaces will look like $\mathbb{R}^n$ for some value of n (the dimension of the spaces).

For example, suppose we had $T : P_3 \to P_3$ defined by $T(p(x)) = p'(x)$. If we use the basis $\{1, x, x^2, x^3\}$ then the polynomial $c_0 + c_1 x + c_2 x^2 + c_3 x^3$ would be represented by

$$\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}$$

and $T(p) = c_1 + 2c_2 x + 3c_3 x^2$ would be represented by

$$\begin{bmatrix} c_1 \\ 2c_2 \\ 3c_3 \\ 0 \end{bmatrix}$$

and this transformation would be equivalent to multiplying by the matrix

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
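We can check this coordinate version of the derivative in Maple (a small sketch: M is a name introduced here for the matrix above, and c0, ..., c3 are unassigned symbols):

>M:=<<0,0,0,0>|<1,0,0,0>|<0,2,0,0>|<0,0,3,0>>:

>M.<c0,c1,c2,c3>;

[c1, 2*c2, 3*c3, 0]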

It is also possible for one or both of the domain and co-domain to be infinite dimensional, and in this case the transformation is usually not represented by a matrix multiplication. But even here it is possible. Suppose for example we had an infinite dimensional vector space where the transformation is just a shift in the coordinates, i.e.

$$T : \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \end{bmatrix} \mapsto \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \end{bmatrix}$$

This could be seen as multiplication by the matrix

$$\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & & & & \ddots \end{bmatrix}$$

In this case the matrix would have an infinite number of rows and columns.

Finally, we point out why they are called "linear" transformations.


Theorem C.3 If T : U → V is a linear transformation and L is a straight line in U , then T (L) is either a straight

line in V or a single point in V .

Proof Any straight line L in U must have an equation of the form $x = u_0 + tu_1$. This is a line through $u_0$ in the direction of $u_1$. If we apply T to this line we get:

$$T(L) = T(u_0 + tu_1) = T(u_0) + T(tu_1) = T(u_0) + tT(u_1)$$

This result can be seen as a line through $T(u_0)$ in the direction of $T(u_1)$. If $T(u_1) = 0$ then the transformation gives just a single point.

You have to be careful in interpreting the above. For example, in the vector space of differentiable functions the expression $t\sin x$ would correspond to a straight line through the origin. The "points" on this line would be expressions such as $2\sin x$, $3\sin x$, $-3.7\sin x$.

• It is a straight line because it corresponds to all scalar multiples of a vector. The usual plot of $\sin x$ as a waveform is totally irrelevant in this case.

• The origin in this case is not the point (0,0). The origin would be the zero function, f(x) = 0.

As pointed out earlier, taking the derivative of a function is a linear transformation. If we apply this linear transformation to this line (by differentiating with respect to x) we get $t\cos x$, which is another straight line.

Here's another example. The expression $t\sin x + (1 - t)\cos x$ gives a straight line in the vector space of differentiable functions. The "points" in this space are functions. When you plug in t = 0 you get $\cos x$. When you plug in t = 1 you get $\sin x$. So this is a straight line passing through the points $\cos x$ and $\sin x$. This type of abstraction is one of the basic features of higher mathematics. Here we have taken a simple, intuitive geometric idea from $\mathbb{R}^2$ (the idea of a line through two points) and extended it to an abstract space.


Appendix D

Partitioned Matrices

Suppose you have the 5 × 5 matrix

$$A = \begin{bmatrix} 1 & 1 & 4 & 3 & 2 \\ 6 & 3 & 1 & 7 & 8 \\ 9 & 0 & 1 & 2 & 2 \\ 8 & 7 & 6 & 5 & 8 \\ 1 & 1 & 3 & 4 & 2 \end{bmatrix}$$

This matrix can be partitioned, for example, as follows:

$$A = \left[\begin{array}{ccc|cc} 1 & 1 & 4 & 3 & 2 \\ 6 & 3 & 1 & 7 & 8 \\ 9 & 0 & 1 & 2 & 2 \\ \hline 8 & 7 & 6 & 5 & 8 \\ 1 & 1 & 3 & 4 & 2 \end{array}\right] = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

The entries in A can be divided into a group of submatrices. In this example $A_{11}$ is a 3 × 3 matrix, $A_{12}$ is a 3 × 2 matrix, $A_{21}$ is a 2 × 3 matrix, and $A_{22}$ is a 2 × 2 matrix. (This would not be the only way of partitioning A. Draw any collection of horizontal and vertical lines through the matrix and you can create a partition.)

For another example let I3 be the 3 × 3 identity matrix. The following are all ways of partitioning I3:

$$\begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix}, \qquad \begin{bmatrix} e_1^T \\ e_2^T \\ e_3^T \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 \\ 0 & I_2 \end{bmatrix}$$

The important thing about partitioned matrices is that if the partitions have compatible sizes then the usual rules for matrix addition and multiplication can be used with the partitions. For example, we could write

$$A + B = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} + \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11} + B_{11} & A_{12} + B_{12} \\ A_{21} + B_{21} & A_{22} + B_{22} \end{bmatrix}$$

if the various submatrices have compatible sizes for the additions to be defined (i.e., $A_{11}$ and $B_{11}$ must have the same size, etc.)

Similarly we could write

$$AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}$$

provided that all the subsequent multiplications and additions are defined.

For example, suppose A is an invertible n × n matrix, I is the n × n identity matrix, and O is the n × n zero matrix, then

$$\begin{bmatrix} O & A \\ I & O \end{bmatrix}\begin{bmatrix} O & I \\ A^{-1} & O \end{bmatrix} = \begin{bmatrix} I & O \\ O & I \end{bmatrix}$$
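A small numeric sketch of this identity in Maple (assuming the LinearAlgebra package is loaded; the angle-bracket constructor accepts Matrix blocks, and A here is just an arbitrary invertible 2 × 2 matrix):

>A:=<<1,2>|<3,4>>:

>Z:=ZeroMatrix(2): Id:=IdentityMatrix(2):

>M:=<<Z|A>,<Id|Z>>:

>N:=<<Z|Id>,<A^(-1)|Z>>:

>M.N;    ### returns the 4 x 4 identity matrix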

Or suppose that matrix B is a 3 × 7 matrix. If you can find a pivot in each of the first 3 columns of B then the reduced row echelon form of B would have the form $\begin{bmatrix} I & C \end{bmatrix}$ where I is the 3 × 3 identity matrix and C is a 3 × 4 matrix. Now notice that

$$\begin{bmatrix} I & C \end{bmatrix}\begin{bmatrix} -C \\ I \end{bmatrix} = O$$


Ask yourself: what are the dimensions of the matrices in the above equation? The above equation also implies that the columns of

$$\begin{bmatrix} -C \\ I \end{bmatrix}$$

form a basis for Nul B. (Why?)

Two other familiar examples of multiplying partitioned matrices are when each row or column is a partition. For example, if we have the matrix product AB and we let $a_i^T$ be the rows of A and $b_i$ be the columns of B then we can write

$$AB = \begin{bmatrix} a_1^T \\ a_2^T \\ a_3^T \\ \vdots \end{bmatrix}\begin{bmatrix} b_1 & b_2 & b_3 & \cdots \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & a_1^T b_2 & a_1^T b_3 & \cdots \\ a_2^T b_1 & a_2^T b_2 & a_2^T b_3 & \cdots \\ a_3^T b_1 & a_3^T b_2 & a_3^T b_3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$

This is just the inner product form for matrix multiplication. On the other hand, if we have the matrix product CD and we partition C into columns and D into rows we have

$$CD = \begin{bmatrix} c_1 & c_2 & c_3 & \cdots \end{bmatrix}\begin{bmatrix} d_1^T \\ d_2^T \\ d_3^T \\ \vdots \end{bmatrix} = c_1 d_1^T + c_2 d_2^T + c_3 d_3^T + \cdots$$

This is the outer product form for matrix multiplication.
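A quick numeric check of the outer product form in Maple (a sketch with small 2 × 2 matrices; the names c1, c2, d1, d2 are introduced here, and the letter D itself is avoided because it is a protected Maple name):

>c1:=<1,2>: c2:=<3,4>: d1:=<5,6>: d2:=<7,8>:

>(<c1|c2>).Transpose(<d1|d2>);

>c1.Transpose(d1) + c2.Transpose(d2);

Both commands return the same 2 × 2 matrix.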

As a last example of using partitioned matrices we will give a proof that a symmetric matrix, A, is orthogonally diagonalizable by some matrix P.

We will prove this by induction on the size of the matrix. If A is 1 × 1 then it is already diagonal and we can let P = [1].

Now assume the statement is true for matrices of size (n − 1) × (n − 1). We have to show that it is true for n × n matrices. We know that A has only real eigenvalues, so let $\lambda_1$ be some real eigenvalue of A with a corresponding unit eigenvector $v_1$. We can find an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ for $\mathbb{R}^n$ (any such basis will do) and let $P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$. Now

$$P^T A P = \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix}\begin{bmatrix} Av_1 & Av_2 & \cdots & Av_n \end{bmatrix} = \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix}\begin{bmatrix} \lambda_1 v_1 & Av_2 & \cdots & Av_n \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix}$$

where B is an (n − 1) × (n − 1) matrix. Furthermore, $P^T A P$ is symmetric, so B must be symmetric. By the induction hypothesis we now have

$$Q^T B Q = D$$

for some orthogonal matrix Q and diagonal matrix D.

Let $R = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}$. We then have

$$R^T \begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix} R = \begin{bmatrix} 1 & 0 \\ 0 & Q^T \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & Q^T B Q \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & D \end{bmatrix}$$

Finally, this means that $R^T P^T A P R = (PR)^T A (PR) = \begin{bmatrix} \lambda_1 & 0 \\ 0 & D \end{bmatrix}$. But PR is an orthogonal matrix, since the product of two orthogonal matrices is orthogonal. Let's define S = PR. We then get that $S^T A S$ is diagonal and so A is orthogonally diagonalizable.