Singular Value Decomposition - Lecture Notes
Definitions and Notations Singular Value Decomposition Theorem Applications Summary
Singular Value Decomposition: Why and How
Royi Avital
Signal Processing Algorithms Department, Electromagnetic Area, Missile Division, Rafael
June 2011
1. Definitions and Notations: Notations, Definitions, Introduction
2. Singular Value Decomposition Theorem: SVD Theorem, Proof of the SVD Theorem, SVD Properties, SVD Example
3. Applications: Order Reduction, Solving Linear Equation System, Total Least Squares, Principal Component Analysis
Notations
A capital letter stands for a matrix:
A ∈ C^{m×n}, A ∈ R^{m×n}
A small letter stands for a column vector:
a ∈ C^{m×1}, a ∈ R^{m×1}
Referring to a row of a matrix:
A_i - the i-th row of the matrix
Referring to a column of a matrix:
A^j - the j-th column of the matrix
Definitions
Unless written otherwise, the complex field is the default.

Conjugate operator:

A^*

Transpose operator:

A^T : (A^T)_{ij} = A_{ji}

Complex conjugate transpose operator:

A^H : (A^H)_{ij} = (A_{ji})^*

Range space and null space of an operator: Let L : X → Y be an operator (linear or otherwise). The range space R(L) ⊆ Y is

R(L) = {y = Lx : x ∈ X}

The null space N(L) ⊆ X is

N(L) = {x ∈ X : Lx = 0}
Introduction
Each linear operator A : C^n → C^m defines range and null spaces as follows. The following properties hold:

R(A) ⊥ N(A^H),  R(A^H) ⊥ N(A)

rank(A) = dim(R(A)) = dim(R(A^H)) = rank(A^H)
The action of a linear operator A ∈ C^{m×n}. The following properties hold:

rank(A) = rank(AA^H) = rank(A^H A) = rank(A^H)

R(A) = R(AA^H),  R(A^H) = R(A^H A)
SVD Theorem
Theorem (SVD Theorem): Every matrix A ∈ C^{m×n} can be factored as A = UΣV^H, where U ∈ C^{m×m} and V ∈ C^{n×n} are unitary and Σ ∈ C^{m×n} has the form Σ = diag(σ_1, σ_2, ..., σ_p), p = min(m, n), with σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0.

Corollary (i): The columns of U are the eigenvectors of AA^H (the left singular vectors):

AA^H = UΣV^H (UΣV^H)^H = UΣV^H V Σ^H U^H = U ΣΣ^H U^H

The columns of V are the eigenvectors of A^H A (the right singular vectors):

A^H A = (UΣV^H)^H UΣV^H = V Σ^H U^H U ΣV^H = V Σ^H Σ V^H
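The factorization and the corollary can be checked numerically. A minimal NumPy sketch (Python rather than the MATLAB used later in these notes; the complex matrix is an arbitrary illustrative example, not from the slides):

```python
import numpy as np

# Verify A = U Σ V^H and the corollary: AA^H = U ΣΣ^H U^H, A^H A = V Σ^H Σ V^H.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(A)          # U is 4x4, s holds σ_1 ≥ σ_2 ≥ σ_3, Vh is 3x3
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)

assert np.allclose(A, U @ Sigma @ Vh)                                     # A = U Σ V^H
assert np.allclose(A @ A.conj().T, U @ (Sigma @ Sigma.T) @ U.conj().T)    # AA^H
assert np.allclose(A.conj().T @ A, Vh.conj().T @ (Sigma.T @ Sigma) @ Vh)  # A^H A
```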
Corollary (ii): The p singular values on the diagonal of Σ are the square roots of the nonzero eigenvalues of both AA^H and A^H A.

The SVD is unique up to permutations of the triplets (u_i, σ_i, v_i) as long as the singular values are distinct (σ_i ≠ σ_j for i ≠ j). If the algebraic multiplicity of a certain eigenvalue of A^H A / AA^H is larger than 1, then there is freedom in choosing the vectors which span the null space of AA^H − λI / A^H A − λI.
Proof of the SVD Theorem
In order to prove the SVD theorem, two propositions are needed.

Proposition I: For every A ∈ C^{m×n}, both A^H A and AA^H are Hermitian matrices.

Proof. Let C = A^H A. Then

C_{ij} = (A^i)^H A^j = ((A^j)^H A^i)^* = (C_{ji})^*

and similarly for AA^H.
Proposition II (Spectral Decomposition): Every A ∈ C^{n×n} with A_{ij} = (A_{ji})^* (a Hermitian matrix) can be diagonalized by a unitary matrix U ∈ C^{n×n}, s.t. U^H A U = Λ.

Proof. The spectral decomposition is a result of a few properties of Hermitian matrices:

For Hermitian matrices, the eigenvectors of distinct eigenvalues are orthogonal.

Schur's Lemma: ∀A ∈ C^{n×n} ∃ U ∈ C^{n×n} unitary s.t. U^H A U = T, where T ∈ C^{n×n} is an upper triangular matrix.

When A has n distinct eigenvalues, Proposition II is immediate. Otherwise, it can be shown that if A is Hermitian then T is Hermitian, and since T is also upper triangular it must be a diagonal matrix.
Proof. Let

A^H A V = V diag(λ_1, λ_2, ..., λ_n)

be the spectral decomposition of A^H A, where the columns of V = [v_1, v_2, ..., v_n] are eigenvectors and λ_1, λ_2, ..., λ_r > 0, λ_{r+1} = λ_{r+2} = ... = λ_n = 0, with r ≤ p. For 1 ≤ i ≤ r, let

u_i = A v_i / √λ_i
Proof (continued). Notice that

⟨u_i, u_j⟩ = δ_{ij}

The set {u_i, i = 1, 2, ..., r} can be extended using the Gram-Schmidt procedure to form an orthonormal basis for C^m. Let

U = [u_1, u_2, ..., u_m]

Then the u_i are eigenvectors of AA^H.
Proof (continued). This is clear for the nonzero eigenvalues of AA^H. For the zero eigenvalues, the eigenvectors must come from the null space of AA^H: by construction they are orthogonal to the eigenvectors with nonzero eigenvalues, which lie in the range of AA^H, hence they must be in the null space of AA^H.
Proof (continued). Examine the elements of U^H A V. For i ≤ r, the (i, j) element of U^H A V is

u_i^H A v_j = (1/√λ_i) v_i^H A^H A v_j = (λ_j/√λ_i) v_i^H v_j = √λ_j δ_{ij}

For i > r we get

AA^H u_i = 0

Thus A^H u_i ∈ N(A), and also A^H u_i ∈ R(A^H) as a linear combination of the columns of A^H. Yet R(A^H) ⊥ N(A), hence A^H u_i = 0.
Proof (continued). Since A^H u_i = 0 for i > r, we get u_i^H A v_j = (v_j^H A^H u_i)^* = 0. Thus U^H A V = Σ, where Σ is diagonal (along the main diagonal).
Alternative Proof. Notice that A^H A and AA^H share the same nonzero eigenvalues (this can be proved independently of the SVD). Let

AA^H u_i = σ_i^2 u_i for i = 1, 2, ..., m

By the spectral theorem:

U = [u_1, u_2, ..., u_m], U ∈ C^{m×m}, UU^H = U^H U = I_m

Thus ‖A^H u_i‖ = σ_i for i = 1, 2, ..., m.
Alternative Proof (continued). Let

A^H A v_i = σ_i^2 v_i for i = 1, 2, ..., n

By the spectral theorem:

V = [v_1, v_2, ..., v_n], V ∈ C^{n×n}, VV^H = V^H V = I_n

Utilizing the above for the nonzero σ_i^2:

AA^H u_i = σ_i^2 u_i ⇒ A^H A (A^H u_i) = σ_i^2 (A^H u_i)

Denoting z_i = A^H u_i, this means the z_i and σ_i^2 are eigenvectors and eigenvalues of A^H A.
Alternative Proof (continued). Examining the z_i yields:

z_j^H z_i = u_j^H A A^H u_i = σ_i^2 u_j^H u_i ⇒ ‖z_i‖ = σ_i ⇒ v_i = z_i/σ_i = A^H u_i / σ_i

Consider the following equations for i = 1, 2, ..., m:

A v_i = A A^H u_i / σ_i = σ_i u_i  (or A v_i = 0 when σ_i = 0)
Alternative Proof (continued). These equations can be written as:

AV = UΣ ⇔ A = UΣV^H

where U and V are as defined above and Σ is an m×n matrix whose top-left block is diagonal with the σ_i on the diagonal and whose remaining entries are zero.
SVD Properties
It is often convenient to break the matrices in the SVD into two parts, corresponding to the nonzero singular values and the zero singular values. Let

Σ = [ Σ_1  0
      0    Σ_2 ]

where

Σ_1 = diag(σ_1, σ_2, ..., σ_r) ∈ R^{r×r} with σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0,

Σ_2 = diag(σ_{r+1}, σ_{r+2}, ..., σ_p) = diag(0, 0, ..., 0) ∈ R^{(m−r)×(n−r)}

Then the SVD can be written as

A = [U_1 U_2] [ Σ_1  0     [ V_1^H
                0    Σ_2 ]   V_2^H ]  = U_1 Σ_1 V_1^H

where U_1 ∈ C^{m×r}, U_2 ∈ C^{m×(m−r)}, V_1 ∈ C^{n×r} and V_2 ∈ C^{n×(n−r)}.
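The partitioned ("economy") form A = U_1 Σ_1 V_1^H can be sketched in NumPy (the rank-1 matrix below is an illustrative assumption, not from the slides):

```python
import numpy as np

# Keep only the r columns/rows associated with the nonzero singular values:
# A = U1 Σ1 V1^H with U1 ∈ C^{m×r}, Σ1 ∈ R^{r×r}, V1 ∈ C^{n×r}.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])            # rank-1 example: column 2 = 2x column 1

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-12 * s[0]))     # numerical rank

U1, S1, V1h = U[:, :r], np.diag(s[:r]), Vh[:r, :]
assert np.allclose(A, U1 @ S1 @ V1h)  # U1 Σ1 V1^H reproduces A exactly
```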
The SVD can also be written as

A = Σ_{i=1}^r σ_i u_i v_i^H

The SVD can also be used to compute two matrix norms.

Hilbert-Schmidt / Frobenius norm:

‖A‖_F^2 = Σ_{i,j} |A_{ij}|^2 = Σ_{i=1}^r σ_i^2

l2 norm:

‖A‖_2 = sup_{x ≠ 0} ‖Ax‖/‖x‖ = √(λ_max(A^H A)) = σ_1

which implies

argmax_{x ≠ 0} ‖Ax‖/‖x‖ = v_1,   argmax_{x ≠ 0} ‖x^H A‖/‖x‖ = u_1
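Both norm identities are easy to confirm numerically; a small sketch (random matrix assumed for illustration):

```python
import numpy as np

# ‖A‖_F^2 = Σ σ_i^2 and ‖A‖_2 = σ_1.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

assert np.isclose(np.sum(np.abs(A) ** 2), np.sum(s ** 2))  # Frobenius norm
assert np.isclose(np.linalg.norm(A, 2), s[0])              # l2 (spectral) norm
```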
The rank of a matrix is the number of nonzero singular values along the main diagonal of Σ. Using the notation used before,

rank(A) = r

The SVD is a numerically stable way of computing the rank of a matrix. The range (column space) of a matrix is

R(A) = {b ∈ C^m : b = Ax}
     = {b ∈ C^m : b = UΣV^H x}
     = {b ∈ C^m : b = UΣy}
     = {b ∈ C^m : b = U_1 y} = span(U_1)

The range of a matrix is spanned by the orthogonal set of vectors in U_1, the first r columns of U.
Generally, the other fundamental spaces of a matrix A can also be determined from the SVD:

R(A) = span(U_1) = R(AA^H)
N(A) = span(V_2)
R(A^H) = span(V_1) = R(A^H A)
N(A^H) = span(U_2)
The SVD thus provides an explicit orthogonal basis and acomputable dimensionality for each of the fundamental spacesof a matrix.
Since the SVD decomposes a given matrix into two unitary matrices and a diagonal matrix, every matrix can be described as a rotation, a scaling and another rotation. This intuition follows from the properties of unitary matrices, which essentially rotate the vectors they multiply. This property is further examined when dealing with linear equations.
SVD Example
Finding the SVD of a matrix numerically using the MATLAB command [U S V] = svd(A). Let

A = [ 1 2 3
      6 5 4 ]

Then A = UΣV^H, where

U = [ −0.355 −0.934
      −0.934  0.355 ]

Σ = [ 9.362 0     0
      0     1.831 0 ]

V = [ −0.637 −0.653  0.408
      −0.575 −0.050 −0.816
      −0.513 −0.754  0.408 ]
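The same example in NumPy, equivalent to MATLAB's [U S V] = svd(A) (column signs may differ between implementations):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [6.0, 5.0, 4.0]])

U, s, Vh = np.linalg.svd(A)     # s matches the diagonal of Σ above up to rounding
Sigma = np.zeros((2, 3))
Sigma[:2, :2] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vh)
```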
Let A be a diagonal matrix:

A = [2 0; 0 −4] = [0 1; −1 0] [4 0; 0 2] [0 1; 1 0]

In this case the U and V matrices just shuffle the columns around and change the signs to make the singular values positive.

Let A be a square symmetric matrix:

A = [ 5 6 2
      6 1 4
      2 4 7 ] = UΣV^H

where

U = V = [ 0.592 −0.616  0.518
          0.526 −0.191  0.828
          0.610  0.763 −0.211 ],   Σ = [ 12.391 0     0
                                         0      4.383 0
                                         0      0     3.774 ]

In this case the SVD coincides with the regular eigendecomposition (up to the signs needed to make the singular values positive).
Order Reduction
The SVD of a matrix can be used to determine how near (in the sense of the l2 norm) the matrix is to a matrix of lower rank. It can also be used to find the nearest matrix of a given lower rank.
Theorem: Let A be an m×n matrix with rank(A) = r and let A = UΣV^H. Let k < r and let

A_k = Σ_{i=1}^k σ_i u_i v_i^H = U Σ_k V^H

where

Σ_k = diag(σ_1, σ_2, ..., σ_k, 0, ..., 0)

Then ‖A − A_k‖_2 = σ_{k+1}, and A_k is the nearest rank-k matrix to A (in the sense of the l2 / Frobenius norm):

min_{rank(B)=k} ‖A − B‖_2 = ‖A − A_k‖_2
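A sketch of the rank-k truncation and the ‖A − A_k‖_2 = σ_{k+1} identity (random matrix assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, s, Vh = np.linalg.svd(A)

k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]   # A_k = Σ_{i=1}^k σ_i u_i v_i^H

assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])  # σ_{k+1} (index k, 0-based)
```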
Proof. Since A − A_k = U diag(0, ..., 0, σ_{k+1}, ..., σ_r, 0, ..., 0) V^H, it follows that ‖A − A_k‖_2 = σ_{k+1}.

The second part of the proof is a proof by inequality. By definition of the matrix norm, for any unit vector z the following holds:

‖A − B‖_2^2 ≥ ‖(A − B) z‖_2^2

Let B be a rank-k matrix of size m×n. Then there exist vectors {x_1, x_2, ..., x_{n−k}} that span N(B), where x_i ∈ C^n. Consider the vectors {v_1, v_2, ..., v_{k+1}} from the matrix V of the SVD, where v_i ∈ C^n.
Proof (continued). The intersection span(x_1, ..., x_{n−k}) ∩ span(v_1, ..., v_{k+1}) ⊆ C^n cannot be zero, since there is a total of n + 1 vectors. Let z be a vector from this intersection, normalized s.t. ‖z‖_2 = 1. Then:

‖A − B‖_2^2 ≥ ‖(A − B) z‖_2^2 = ‖Az‖_2^2

Since z ∈ span(v_1, v_2, ..., v_{k+1}),

Az = Σ_{i=1}^{k+1} σ_i (v_i^H z) u_i

Now

‖A − B‖_2^2 ≥ ‖Az‖_2^2 = Σ_{i=1}^{k+1} σ_i^2 |v_i^H z|^2 ≥ σ_{k+1}^2

The lower bound is achieved by B = Σ_{i=1}^k σ_i u_i v_i^H, with z = v_{k+1}.
Applications of Order Reduction: Image Compression
Applications of Order Reduction: Noise Reduction. Basic assumption - noise is mainly pronounced in the small singular values.

[Figure: noiseless matrix vs. noisy matrices with noise std 1, 6 and 11]
[Figure: analyzing the effect of noise on the singular values]
[Figure: ground truth, with added noise (std 6), and reconstruction using 140 singular values]
Solving Linear Equation System
Consider the solution of the equation Ax = b.

If b ∈ R(A), there is at least one solution:

- If dim(N(A)) = 0, there is exactly one solution, x_r ∈ R(A^H), s.t. A x_r = b.
- If dim(N(A)) ≥ 1, the columns of A are not independent and there are infinitely many solutions: any vector of the form x = x_r + x_n, where x_r ∈ R(A^H) is the solution from the previous case and x_n ∈ N(A), satisfies A(x_r + x_n) = b. Which solution should be chosen? Usually the solution with the minimum norm, x_r.

If b ∉ R(A), there is no solution. Usually one then searches for the vector x̂ that minimizes ‖Ax − b‖_2.
Assume x̂ = argmin_x ‖Ax − b‖_2. By definition b̂ = A x̂ ∈ R(A). Meaning, the search is for b̂ s.t. ‖b − b̂‖_2 is minimized.
According to the projection theorem, only one vector b̂ exists s.t. ‖b − b̂‖_2 is minimized. This vector is the projection of b on R(A), b̂ = Proj_{R(A)}(b), characterized by the residual being orthogonal to the column space: A^H (b − b̂) = 0. Moreover,

x̂ = argmin_x ‖Ax − b‖_2 ⇔ (A^H A) x̂ = A^H b

Intuitively, the procedure is as follows: project b onto the column space R(A) to obtain b̂ = Proj_{R(A)}(b); the solution x̂ then satisfies A x̂ = b̂, and applying A^H to both sides yields exactly the condition (A^H A) x̂ = A^H b.
The equation (A^H A) x̂ = A^H b is called the Normal Equations. If the columns of A are independent, then A^H A is invertible and x̂ can be calculated as:

x̂ = (A^H A)^{−1} A^H b

This is the least squares solution using the pseudo inverse of A:

A† = (A^H A)^{−1} A^H
Yet, if the columns of A are linearly dependent, the pseudo inverse of A can't be calculated this way. If A has dependent columns, then the null space of A is not trivial and there is no unique solution. The problem becomes selecting one solution out of the infinite number of possible solutions. As mentioned, the commonly accepted approach is to select the solution with the smallest norm (length). This problem can be solved using the SVD and the definition of the generalized pseudo inverse of a matrix.
Definition: The pseudo inverse of a matrix A = UΣV^H, denoted A†, is given by

A† = V Σ† U^H

where Σ† is obtained by transposing Σ and inverting all nonzero entries.

Proposition III: Let A = UΣV^H and x† = A†b = V Σ† U^H b. Then A^H A x† = A^H b.

Namely, the solution given by the pseudo inverse calculated via the SVD satisfies the normal equations. This definition of the pseudo inverse exists for any matrix.
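The definition can be implemented directly; a sketch using the matrix of Example I below (NumPy's built-in np.linalg.pinv computes the same quantity):

```python
import numpy as np

# A† = V Σ† U^H, with Σ† obtained by transposing Σ and inverting nonzero entries.
A = np.array([[8.0, 10.0, 3.0, 30.0],
              [9.0, 6.0, 6.0, 18.0],
              [1.0, 1.0, 10.0, 3.0]])    # rank 3: column 4 is 3x column 2

U, s, Vh = np.linalg.svd(A)
tol = 1e-12 * s[0]
s_inv = np.array([1.0 / x if x > tol else 0.0 for x in s])

Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))   # n x m
Sigma_pinv[:len(s), :len(s)] = np.diag(s_inv)

A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```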
Proof. It is sufficient to show that A^H (A x† − b) = 0.

A x† − b = (UΣV^H) V Σ† U^H b − b
         = (U Σ Σ† U^H − I) b
         = U (Σ Σ† − I) U^H b
Proof (continued). Thus,

A^H (A x† − b) = V Σ^H U^H U (Σ Σ† − I) U^H b
              = V Σ^H (Σ Σ† − I) U^H b

One should observe that

Σ^H = [ Σ_r^H        0_{r×(m−r)}
        0_{(n−r)×r}  0_{(n−r)×(m−r)} ]

where Σ_r is the r×r submatrix of nonzero diagonal entries in Σ, and

Σ Σ† − I = [ 0_{r×r}      0_{r×(m−r)}
             0_{(m−r)×r}  −I_{(m−r)×(m−r)} ]

Hence the multiplication Σ^H (Σ Σ† − I) yields the zero matrix.
Proposition IV: The vector x† = A†b is the shortest least squares solution to Ax = b, namely,

‖x†‖_2 = min {‖x‖_2 : ‖Ax − b‖_2 is minimal}

Proof. Using the fact that both U and V are unitary, for any x let y = V^H x, so that ‖y‖_2 = ‖x‖_2. Then

‖Ax − b‖_2 = ‖UΣV^H x − b‖_2 = ‖ΣV^H x − U^H b‖_2 = ‖Σy − U^H b‖_2

so finding the minimum-norm minimizer x of ‖Ax − b‖_2 is equivalent to finding the minimum-norm minimizer y of ‖Σy − U^H b‖_2.
Proof (continued). Observing min_y ‖Σy − U^H b‖_2: since Σ is diagonal (along the main diagonal at least), the minimum-norm least squares solution is y = Σ† U^H b. Thus

x† = V y = V Σ† U^H b

attains the minimum norm.
As written previously, any solution which satisfies the normal equations is a least squares solution:

x̂ = argmin_x ‖Ax − b‖_2 ⇔ (A^H A) x̂ = A^H b

Yet one should observe that x† ∈ R(A^H), namely, the solution lies in the row space of A. Hence its norm is minimal among all solutions. In short, the pseudo inverse simultaneously minimizes the norm of the error as well as the norm of the solution.
Example I. Examining the following linear system:

Ax = b

where

A = [ 8 10 3  30
      9 6  6  18
      1 1  10 3 ],   x = [x_1; x_2; x_3; x_4] = [1; 2; 3; 6],   b = [217; 147; 51]

Obviously, A^{−1} can't be calculated. Moreover, since rank(A) = 3, (A^H A)^{−1} does not exist either. Yet the pseudo inverse using the SVD does exist.
Using the SVD approach, A = UΣV^H, hence A† = V Σ† U^H. Using MATLAB to calculate the SVD yields:

Σ = [ 39.378 0      0     0
      0      10.002 0     0
      0      0      3.203 0 ]   →   Σ† = [ 0.025 0   0
                                           0     0.1 0
                                           0     0   0.312
                                           0     0   0 ]

Calculating x yields:

x̂ = V Σ† U^H b = [1; 2; 3; 6] = x

The SVD cancelled the 4th column, which is dependent on the 2nd column of A. Since b ∈ R(A), the exact solution could be calculated.
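Example I reproduced in NumPy:

```python
import numpy as np

A = np.array([[8.0, 10.0, 3.0, 30.0],
              [9.0, 6.0, 6.0, 18.0],
              [1.0, 1.0, 10.0, 3.0]])
b = np.array([217.0, 147.0, 51.0])

x = np.linalg.pinv(A) @ b     # SVD-based pseudo inverse solution
print(np.round(x, 6))         # → [1. 2. 3. 6.]
```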
Example II. In this case

A = [ 5 0 0 0
      0 2 0 0
      0 0 0 0 ],   x = [x_1; x_2; x_3; x_4] = [1; 2; 3; 6],   b = [5; 4; 3]

Obviously b ∉ R(A), and neither A^{−1} nor (A^H A)^{−1} exist. Using the SVD pseudo inverse:

x̂ = V Σ† U^H b = [1; 2; 0; 0]
Examining the solution using the SVD. First, since rank(A) = 2, its column space is spanned by the first 2 columns of U. The projection of b onto the column space of A is given by

b̂ = Proj_{R(A)}(b) = Σ_{i=1}^2 (u_i^H b) u_i = [5; 4; 0]

Now the updated linear system A x = b̂ has an infinite number of solutions. One can calculate that N(A) = span{[0; 0; 1; 0], [0; 0; 0; 1]}. Hence

x = [b̂_1/A_{1,1}; b̂_2/A_{2,2}; 0; 0] + s [0; 0; 1; 0] + t [0; 0; 0; 1] = x_r + x_n,   s, t ∈ R
The target is the solution with the minimum norm. Since x_r ⊥ x_n, the norm of this solution satisfies

‖x‖_2^2 = ‖x_r‖_2^2 + ‖x_n‖_2^2

The minimum-norm solution is obtained by taking x_n = 0. This results in the pseudo inverse solution as above.
Numerically Sensitive Problems. Systems of equations that are poorly conditioned are sensitive to small changes in values. Since, practically speaking, there are always inaccuracies in measured data, the solution to such equations may be almost meaningless. The SVD can help with the solution of ill-conditioned equations by identifying the direction of sensitivity and discarding that portion of the problem. The procedure is illustrated by the following example.
Example III. Examining the following system of equations Ax = b:

[ 1+3ε 1−3ε   [ x_1     [ b_1
  3−ε  3+ε ]    x_2 ] =   b_2 ]

The SVD of A is

A = (1/√20) [ 1 3     [ 2√5 0       [ 1 1
              3 −1 ]    0   2ε√5 ]    1 −1 ]

From which the exact inverse of A is

A^{−1} = (1/√20) [ 1 1     [ 1/(2√5) 0          [ 1 3
                   1 −1 ]    0       1/(2ε√5) ]   3 −1 ]

       = (1/20) [ 1+3/ε 3−1/ε
                  1−3/ε 3+1/ε ]

One can easily see that for small ε the matrix A^{−1} has large entries, which makes x = A^{−1}b unstable.
Observe that the entry 1/(2ε√5) multiplies the column [1; −1]. This is the sensitive direction. As b changes slightly, the solution changes in a direction mostly along the sensitive direction. If ε is small, σ_2 = 2ε√5 may be set to zero to approximate A:

A ≈ (1/√20) [ 1 3     [ 2√5 0    [ 1 1
              3 −1 ]    0   0 ]    1 −1 ]

The pseudo inverse is

A† = (1/√20) [ 1 1     [ 1/(2√5) 0    [ 1 3
               1 −1 ]    0       0 ]    3 −1 ]

   = (1/20) [ 1 3
              1 3 ]

In this case the multiplier of the sensitive direction vector is zero, so no motion in the sensitive direction occurs. Any least squares solution to the equation Ax = b is of the form x = A†b, so that

x = c [1; 1] for some c ∈ R

meaning perpendicular to the sensitive direction.
As this example illustrates, the SVD identifies the stable and unstable directions of the problem and, by zeroing small singular values, eliminates the unstable directions. The SVD can be used both to diagnose poor conditioning and to provide a cure for the ailment. For the equation Ax = b with solution x = A^{−1}b, writing the solution using the SVD:

x = A^{−1}b = (UΣV^H)^{−1} b = Σ_{i=1}^r (u_i^H b / σ_i) v_i

If the singular value σ_i is small, then a small change in b, or a small change in either U or V, may be amplified into a large change in the solution x. A small singular value corresponds to a matrix which is nearly singular and thus more difficult to invert accurately.
Another point of view: consider the equation

A x_0 = b_0 ⇒ x_0 = A^{−1} b_0

Let b = b_0 + δb, where δb is the error, noise, etc. Therefore

A x = b_0 + δb ⇒ x = A^{−1} b_0 + A^{−1} δb = x_0 + δx

To investigate how small or large the error in the answer is for a given amount of error in the data, note that

δx = A^{−1} δb ⇒ ‖δx‖ ≤ ‖A^{−1}‖ ‖δb‖

Or, since ‖A^{−1}‖ = σ_max(A^{−1}) = 1/σ_min(A), the following holds:

‖δx‖ ≤ ‖δb‖ / σ_min(A)
However, recalling that x_0 = A^{−1} b_0,

‖x_0‖ ≥ σ_min(A^{−1}) ‖b_0‖ = ‖b_0‖ / σ_max(A)

Combining the equations yields

‖δx‖/‖x_0‖ ≤ (‖δb‖/‖b_0‖) · σ_max(A)/σ_min(A)

The last fraction, σ_max(A)/σ_min(A), is called the Condition Number of A. This number is indicative of the magnification of error in the linear equation of interest. In most problems, a matrix with a very large condition number is called ill conditioned and will result in severe numerical difficulties.
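A quick numerical check of the condition number, using the nearly singular matrix of Example III (the value of ε is an assumed illustration):

```python
import numpy as np

eps = 1e-6
A = np.array([[1 + 3 * eps, 1 - 3 * eps],
              [3 - eps,     3 + eps]])

s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                          # σ_max / σ_min
assert np.isclose(kappa, np.linalg.cond(A))   # NumPy's 2-norm condition number
```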
The solution to these numerical difficulties using the SVD is basically rank reduction:

1. Compute the SVD of A.
2. Examine the singular values of A and zero out any that are "small" to obtain a new approximate Σ matrix.
3. Compute the solution by x = V Σ† U^H b.

Determining which singular values are "small" is problem dependent and requires some judgement.
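The three steps above can be sketched as a function; the threshold deciding which singular values are "small" (rel_tol below) is an assumed, problem-dependent parameter:

```python
import numpy as np

def truncated_svd_solve(A, b, rel_tol=1e-3):
    """Least squares solve that zeroes singular values below rel_tol * sigma_max."""
    U, s, Vh = np.linalg.svd(A)
    s_inv = np.array([1.0 / x if x > rel_tol * s[0] else 0.0 for x in s])
    Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))
    Sigma_pinv[:len(s), :len(s)] = np.diag(s_inv)
    return Vh.conj().T @ Sigma_pinv @ U.conj().T @ b

eps = 1e-9                               # Example III: nearly singular matrix
A = np.array([[1 + 3 * eps, 1 - 3 * eps],
              [3 - eps,     3 + eps]])
x = truncated_svd_solve(A, np.array([2.0, 6.0]))
assert np.isclose(x[0], x[1])            # solution stays along the stable [1; 1]
```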
Total Least Squares
In the classic least squares problem, the solution minimizing ‖Ax − b‖_2 is sought. The hidden assumption is that the matrix A is correct and any error in the problem is in b. The least squares problem finds a vector x s.t.

‖Ax − b‖_2 = min

which is accomplished by finding some perturbation r of the right hand side, of minimum norm, s.t.

Ax = b + r,  (b + r) ∈ R(A)

In the Total Least Squares (TLS) problem, both the right and left sides of the equation are assumed to have errors. The solution of the perturbed equation

(A + E) x = b + r

is sought s.t. (b + r) ∈ R(A + E) and the norm of the perturbations is minimized.
Intuitively, the right hand side is "bent" toward the left hand side while the left hand side is "bent" toward the right hand side.
Let A be an m×n matrix. To find the solution of the TLS problem, one may observe the homogeneous form:

[A + E | b + r] [x; −1] = 0  →  ([A|b] + [E|r]) [x; −1] = 0

Let C = [A|b] ∈ C^{m×(n+1)} and let Δ = [E|r] be the perturbation of the data. In order for the homogeneous form to have a solution, the vector [x; −1] must lie in the null space of C + Δ, and in order for the solution not to be trivial, the perturbation Δ must be such that C + Δ is rank deficient.
Analyzing the TLS problem using the SVD, bring (A + E) x = b + r into the form

[A + E | b + r] [x; −1] = 0

Let [A|b] = UΣV^H be the SVD of the augmented data matrix. If σ_{n+1} ≠ 0, then rank([A|b]) = n + 1, which means the row space of [A|b] is all of C^{n+1}; hence there is no nonzero vector in the orthogonal complement of the row space and the set of equations is incompatible. To obtain a solution, the rank of [A|b] must be reduced to n. As shown before, the best approximation of rank n in both the Frobenius and l2 norms is given by the SVD:

[Â|b̂] = U Σ̂ V^H,  Σ̂ = diag(σ_1, σ_2, ..., σ_n, 0)
The minimal TLS correction is given by

σ_{n+1} = min_{rank([Â|b̂])=n} ‖[A|b] − [Â|b̂]‖_F

attained for

[E|r] = −σ_{n+1} u_{n+1} v_{n+1}^H

Note that the TLS correction matrix has rank one. It is clear that the approximate set [Â|b̂] [x; −1] = 0 is compatible, and the solution is given by the only vector, v_{n+1}, that belongs to N([Â|b̂]). The TLS solution is obtained by scaling v_{n+1} until its last component equals −1:

[x; −1] = (−1/V_{n+1,n+1}) v_{n+1}
For simplicity it is assumed that $V_{n+1,n+1} \neq 0$ and $\sigma_n > \sigma_{n+1}$, hence the solution exists and is unique. Otherwise, the solution might not exist or might not be unique (any superposition of several columns of $V$). For a complete analysis of the existence and uniqueness of the solution see [].
A basic TLS algorithm: given $Ax \approx b$, where $A \in \mathbb{C}^{m \times n}$, $b \in \mathbb{C}^m$, the TLS solution is obtained as follows:

Compute the SVD $\left[ A \mid b \right] = U \Sigma V^H$.
If $V_{n+1,n+1} \neq 0$, the TLS solution is

$$x_{TLS} = \frac{-1}{V_{n+1,n+1}} v_{n+1}\left(1:n\right)$$
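The two steps above can be sketched in NumPy. This is a real-valued sketch; `tls_solve` is an illustrative helper name, not from the slides:

```python
import numpy as np

def tls_solve(A, b):
    """Total Least Squares solution of A x ~ b via the SVD of [A | b].

    Real-valued sketch of the algorithm above: take the last right
    singular vector of the augmented matrix and scale it so that its
    last component equals -1.
    """
    m, n = A.shape
    C = np.column_stack([A, b])        # the augmented matrix [A | b]
    _, _, Vh = np.linalg.svd(C)        # rows of Vh are v_i^H
    v = Vh[-1, :]                      # v_{n+1}
    if np.isclose(v[-1], 0.0):
        raise ValueError("V_{n+1,n+1} = 0: TLS solution does not exist")
    return -v[:n] / v[-1]              # x_TLS = -v_{n+1}(1:n) / V_{n+1,n+1}

# With consistent (noise-free) data the TLS solution recovers x exactly.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = A @ np.array([2.0, -1.0])
print(tls_solve(A, b))                 # close to [ 2. -1.]
```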
The geometric properties of the solution can be described as follows: the TLS solution minimizes the distance between the vector $b$ and the plane defined by the solution $x_{TLS}$. Let $C = U \Sigma V^H$. From the definition of the $l_2$ norm of a matrix,

$$\frac{\left\| C v \right\|_2}{\left\| v \right\|_2} \geq \sigma_{n+1}$$

where $\left\| v \right\|_2 \neq 0$. Equality holds if and only if $v \in S_c$, where $S_c = \operatorname{span}\left\{ v_i \right\}$ and $v_i$ are the columns of $V$ which satisfy $u_i^H C v_i = \sigma_{n+1}$.

The TLS problem amounts to finding a vector $x$ s.t.

$$\frac{\left\| \left[ A \mid b \right] \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2} = \sigma_{n+1}$$
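This characterization can be checked numerically. The sketch below builds the TLS solution from the SVD of an arbitrary random augmented matrix and verifies that the quotient attains $\sigma_{n+1}$:

```python
import numpy as np

# Sketch: at the TLS solution, the quotient ||[A|b][x;-1]|| / ||[x;-1]||
# attains sigma_{n+1}. Example data is arbitrary random numbers.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

C = np.column_stack([A, b])
_, s, Vh = np.linalg.svd(C)
v = Vh[-1, :]
x = -v[:-1] / v[-1]                 # TLS solution (assumes v[-1] != 0)

z = np.append(x, -1.0)              # the vector [x; -1]
quotient = np.linalg.norm(C @ z) / np.linalg.norm(z)
print(np.isclose(quotient, s[-1]))  # True
```

Since $[x;\,-1]$ is a scalar multiple of $v_{n+1}$, the quotient is exactly the one minimized by the last right singular vector.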
Squaring everywhere:

$$\min_x \frac{\left\| \left[ A \mid b \right] \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2} = \min_x \sum_{i=1}^{m} \frac{\left| A_i^H x - b_i \right|^2}{x^H x + 1}$$
The quantity $\frac{\left| A_i^H x - b_i \right|^2}{x^H x + 1}$ is the square of the distance from the point $\begin{bmatrix} A_i^H \\ b_i \end{bmatrix} \in \mathbb{C}^{n+1}$ to the nearest point on the hyperplane $P$ defined by

$$P = \left\{ \begin{bmatrix} a \\ b \end{bmatrix} \;\middle|\; a \in \mathbb{C}^n,\; b \in \mathbb{C},\; b = x^H a \right\}$$

So the TLS problem amounts to finding the closest hyperplane to the set of points

$$\left\{ \begin{bmatrix} A_1^H \\ b_1 \end{bmatrix}, \begin{bmatrix} A_2^H \\ b_2 \end{bmatrix}, \ldots, \begin{bmatrix} A_m^H \\ b_m \end{bmatrix} \right\}$$
The minimum distance property can be shown as follows. Let $P$ be the plane orthogonal to the normal vector $n \in \mathbb{C}^{n+1}$ s.t.

$$P = \left\{ r \in \mathbb{C}^{n+1} : r^H n = 0 \right\}$$

and let $n$ have the form $n = \begin{bmatrix} x \\ -1 \end{bmatrix}$. Let $p = \begin{bmatrix} A_m^H \\ b_m \end{bmatrix}$ be a point in $\mathbb{C}^{n+1}$. Finding a point $q \in \mathbb{C}^{n+1}$ which belongs to the plane $P$ and is closest to the point $p$ is a constrained optimization problem: minimize $\left\| p - q \right\|$ subject to $n^H q = 0$. The minimization function is

$$J\left(q\right) = \left\| p - q \right\|^2 + 2 \lambda n^H q = p^H p - 2 p^H q + 2 \lambda n^H q + q^H q = \left(q - p + \lambda n\right)^H \left(q - p + \lambda n\right) + 2 \lambda p^H n - \lambda^2 n^H n$$

This is clearly minimized when $q = p - \lambda n$.
Determining $\lambda$ by the constraint:

$$n^H q = n^H p - \lambda n^H n = 0 \;\rightarrow\; \lambda = \frac{n^H p}{n^H n}$$

Inserting the result into the minimization function yields

$$J\left(q\right) = 2 \lambda p^H n - \lambda^2 n^H n = \frac{2\, n^H p \, p^H n}{n^H n} - \frac{n^H p \, p^H n}{n^H n} = \frac{\left| n^H p \right|^2}{n^H n} = \frac{\left| A_m^H x - b_m \right|^2}{x^H x + 1}$$
An alternative solution uses the Projection Theorem: the distance from the point $p$ to the plane $P$ can be found from the length of the projection of $p$ onto $n$, which yields

$$d_{\min}^2\left(p, P\right) = \frac{\left| \left\langle p, n \right\rangle \right|^2}{\left\| n \right\|^2} = \frac{\left| \begin{bmatrix} x^H & -1 \end{bmatrix} \begin{bmatrix} A_m^H \\ b_m \end{bmatrix} \right|^2}{x^H x + 1}$$
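The point-to-hyperplane derivation can be verified numerically; the values of $x$, $n$ and $p$ below are arbitrary examples:

```python
import numpy as np

# Sketch: the squared distance from a point p to the hyperplane
# P = { r : r^H n = 0 } with normal n = [x; -1] is |<p, n>|^2 / ||n||^2,
# checked against the explicit foot of the perpendicular q = p - lambda*n.
x = np.array([1.0, 2.0])
n = np.append(x, -1.0)              # normal vector [x; -1]
p = np.array([3.0, -1.0, 0.5])      # an arbitrary point [a_m; b_m]

lam = (n @ p) / (n @ n)             # lambda from the constraint n^H q = 0
q = p - lam * n                     # closest point on the plane
assert np.isclose(n @ q, 0.0)       # q indeed lies on P

d2_direct = np.sum((p - q) ** 2)            # ||p - q||^2
d2_formula = (n @ p) ** 2 / (n @ n)         # projection-theorem formula
print(np.isclose(d2_direct, d2_formula))    # True
```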
Principal Component Analysis
Principal Component Analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called Principal Components. The number of Principal Components is less than or equal to the number of original variables. This transformation is defined in such a way that the first component has as high a variance as possible (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (uncorrelated with) the preceding components.
PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first Principal Component), the second greatest variance on the second coordinate, and so on.

Assume we are given a collection of data column vectors $a_1, a_2, \ldots, a_m \in \mathbb{R}^n$. The projection of the data onto a subspace $U \subseteq \mathbb{R}^n$ of dimension $r \leq n$, spanned by the orthonormal basis $u_1, u_2, \ldots, u_r$, is given by

$$\hat{a}_i = f_{i1} u_1 + f_{i2} u_2 + \ldots + f_{ir} u_r, \quad i = 1 : m$$

for some coefficients $f_{ij}$. Note that $f_{ij} = a_i^H u_j$, the projection of $a_i$ along the direction of $u_j$. By the Projection Theorem this projection is the closest, in the $l_2$ norm sense, to the data given by $a_i$.
The search is for the orthonormal basis $u_1, u_2, \ldots, u_r$. Formulating the constraint of maximization of the variance along the direction of $u_1$ yields

$$\max_{\left\| w \right\| = 1} \sum_{i=1}^{m} \left| a_i^H w \right|^2 = \left\| A^H w \right\|^2 = \left(A^H w\right)^H \left(A^H w\right) = w^H A A^H w$$

Using the SVD of $A = U \Sigma V^H$, we have $A A^H = U \Sigma \Sigma^H U^H$. Observe that

$$\frac{w^H A A^H w}{w^H w} = \frac{\left(U^H w\right)^H \Sigma \Sigma^H \left(U^H w\right)}{\left(U^H w\right)^H \left(U^H w\right)}$$

Notice that there are only $r$ nonzero entries in $\Sigma$, by the properties of the SVD. Defining $x = U^H w$ yields

$$\frac{w^H A A^H w}{w^H w} = \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2}$$
Now we have

$$\max_{w \neq 0} \frac{w^H A A^H w}{w^H w} = \max_{w \neq 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2}$$

Assuming $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r$, then

$$\max_{w \neq 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2} = \sigma_1^2 = \lambda_1$$

which is the largest eigenvalue of $A A^H$. The vector $x$ which attains the maximum is given by $x_1 = 1$ and $x_i = 0$ for $i = 2 : n$, which corresponds to $w = U x = u_1$. The first Principal Component is indeed achieved by the first eigenvector $u_1$ of $A A^H$.
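A quick numerical check of this claim, using arbitrary random data: the first left singular vector $u_1$ attains the maximal quotient $\sigma_1^2$, and no other direction exceeds it.

```python
import numpy as np

# Sketch: w = u_1 (first left singular vector of A) maximizes
# w^H A A^H w / w^H w, with maximal value sigma_1^2.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 10))    # columns play the role of a_1..a_m in R^4

U, s, Vh = np.linalg.svd(A, full_matrices=False)
u1 = U[:, 0]

# The quotient at u1 equals sigma_1^2 ...
r_u1 = u1 @ (A @ A.T) @ u1 / (u1 @ u1)
print(np.isclose(r_u1, s[0] ** 2))  # True

# ... and no random direction does better.
for _ in range(100):
    w = rng.standard_normal(4)
    assert w @ (A @ A.T) @ w / (w @ w) <= s[0] ** 2 + 1e-9
```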
Calculating the second Principal Component, under the constraint of being orthogonal to the first, while maximizing the projection:

$$\max_{\left\| w \right\| = 1,\; w^H u_1 = 0} \sum_{i=1}^{m} \left| a_i^H w \right|^2 = \max_{w \neq 0,\; w^H u_1 = 0} \frac{w^H \left(A A^H\right) w}{w^H w}$$

Using the definitions from above yields

$$\max_{x \neq 0,\; x^H U^H u_1 = 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2} = \max_{x \neq 0,\; x_1 = 0} \frac{\sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \ldots + \sigma_r^2 x_r^2}{x_1^2 + x_2^2 + \ldots + x_n^2} = \sigma_2^2 = \lambda_2$$

which is the second largest eigenvalue of $A A^H$. The vector $x$ which attains the maximum is given by $x_2 = 1$ and $x_i = 0$ for $i = 1, 3 : n$. This corresponds to $w = U x = u_2$, the second eigenvector of $A A^H$.
Continuing this pattern, $u_i$ is the $i$-th Principal Component. The set of orthogonal vectors which spans the subspace the data is projected onto, and which maximizes the variance of the data, consists of the first $r$ columns of the orthogonal matrix $U$ from the SVD. Observing the SVD yields the result immediately:

$$A = U \Sigma V^H \;\rightarrow\; Y = U^H A = \Sigma V^H$$

Observing the scatter matrix of $Y$:

$$C_Y = Y Y^H = U^H A \left(U^H A\right)^H = U^H A A^H U = U^H C_X U$$

Since the matrix $U$ is the eigenvector matrix of $C_X = A A^H$, by the Diagonalization Theorem $C_Y$ is diagonal. Another look yields

$$Y Y^H = \Sigma V^H \left(\Sigma V^H\right)^H = \Sigma V^H V \Sigma^H = \Sigma \Sigma^H = \operatorname{diag}\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2, 0, \ldots, 0\right)$$

Namely, the scatter matrix, and hence the covariance matrix, of $Y$ is diagonal. Moreover, the constraint on the variance holds.
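This decorrelation property can be checked numerically; the data below is an arbitrary random example:

```python
import numpy as np

# Sketch: rotating the data by U^H from the SVD decorrelates it --
# the scatter matrix of Y = U^H A is diagonal with entries sigma_i^2.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 50))    # data matrix, columns are samples

U, s, Vh = np.linalg.svd(A, full_matrices=False)
Y = U.conj().T @ A                  # principal-component scores, Y = Sigma V^H

C_Y = Y @ Y.conj().T                # scatter matrix of Y
off_diag = C_Y - np.diag(np.diag(C_Y))
print(np.allclose(off_diag, 0.0))           # True: C_Y is diagonal
print(np.allclose(np.diag(C_Y), s ** 2))    # True: entries are sigma_i^2
```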
The SVD is a decomposition which can be applied to any matrix.
The SVD exposes fundamental properties of a linear operator, such as the fundamental spaces, the Frobenius norm and the $l_2$ norm.
The SVD can be utilized in many applications, such as solving linear systems (Least Squares, Total Least Squares) and order reduction (compression, noise reduction, Principal Component Analysis).
To Be Continued: Regularizing Linear Equation Systems.