
Lecture 6: Singular Value Decomposition - I

November 6, 2017


Introduction

n data points in d space: each is a row of the data matrix A (an n × d matrix).

Singular Value Decomposition (SVD) gives the best-fit k-dimensional subspace for A, for every k = 1, 2, ..., rank(A). "Best fit" in the sense of minimum sum of squared (perpendicular) distances of the data points to the subspace. We will see the best fit for every k simultaneously. Equivalently: maximum sum of squares of the lengths of the projections of the data points onto the subspace. (See the numerical sketch below.)
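As a sanity check (my addition, not from the slides), here is a minimal numpy sketch: the best-fit k-dimensional subspace is spanned by the first k right singular vectors, and squared projections plus squared perpendicular distances account for all of the data's squared norm.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))       # n = 50 data points in d = 5 space, one per row

k = 2
_, _, Vt = np.linalg.svd(A, full_matrices=False)
P = Vt[:k].T @ Vt[:k]                  # projector onto the best-fit k-dim subspace

sq_proj = np.sum((A @ P) ** 2)         # sum of squared projection lengths
sq_dist = np.sum((A - A @ P) ** 2)     # sum of squared perpendicular distances
assert np.isclose(sq_proj + sq_dist, np.sum(A ** 2))   # Pythagoras (next slide)
```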


Minimize distances ≡ Maximize projections

Figure: The projection of the point ai onto the line through the origin in the direction of v, showing the distance disti and the projection proji.

Min ∑i disti² ≡ Max ∑i proji², courtesy Pythagoras.

Contrast: “Least-Squares Fit”: given (xi, yi), i = 1, 2, ..., n, minimize over a, b the sum ∑i (a·xi + b − yi)². The distances are “vertical”, not perpendicular to the line. [PICTURE]

Least squares: the line need not pass through 0. But SVD fits a subspace, so it contains 0. See later: the best-fit affine subspace passes through the centroid of the data, so we can translate to make the centroid 0. (A small comparison sketch follows.)
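To make the contrast concrete, a hedged numpy sketch (mine, not the lecture's): the first right singular vector gives the best line through the origin for perpendicular distances, while np.polyfit minimizes vertical distances and allows an intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = 2 * x + 0.1 * rng.standard_normal(100)
A = np.column_stack([x, y])            # data points as rows

# Best-fit line through the origin (perpendicular distances): first right singular vector.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
v = Vt[0]                              # unit direction, roughly proportional to (1, 2)

# Least-squares fit y ~ a*x + b (vertical distances, intercept allowed).
a, b = np.polyfit(x, y, 1)
print("SVD direction:", v, "  least-squares slope/intercept:", a, b)
```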


Will Show: Greedy works

First find the best-fit 1-dimensional subspace to the data: a line (through the origin).

Then find the best-fit line (through 0) perpendicular to the first line. At the i-th step, find the best-fit line perpendicular to the i − 1 lines found so far. Stop after rank(A) lines.

Will show: when done, we can write A = UDV^T, where the columns of V are unit vectors along the lines found above, D is a diagonal matrix with positive entries, and the columns of U and V are orthonormal. [A = UDV^T is called the SVD.] For now, focus on just the best-fit lines, not on the matrix factorization.


First Singular Vector

Notation: A is an n × d data matrix; each row ai is a data point.

If v is the unit vector along the best-fit line, v minimizes among all unit-length vectors x the quantity ∑_{i=1}^n (|ai|² − (ai · x)²) = ∑_{i=1}^n |ai|² − |Ax|².

Now, ∑i |ai|² = ||A||F² does not depend on x. So (as we said earlier) this is equivalent to maximizing |Ax|².

Define the first singular vector of A by v1 = argmax_{|v|=1} |Av|. There can be ties. [What is an obvious tie for v1?] Break them arbitrarily; we use argmax with arbitrarily broken ties.

The value σ1(A) = |Av1| is called the first singular value of A. (See the numerical check below.)

If all data points lie on a line through the origin, the line on which projections are maximized is precisely that line, so it is the first singular vector.

Further singular vectors: think about what happens if the data points are coplanar. We would like two perpendicular vectors spanning the plane.


Greedy and Definition of further singular vectors

How to define further singular vectors? Again, think of coplanar data points: we would like the 2-dimensional subspace maximizing the sum of squared projections.

Try greedy: define the second singular vector v2 as the unit vector (break ties arbitrarily) maximizing the sum of squared projections subject to being perpendicular to the first. Algebra: this is the same as v2 = argmax_{v ⊥ v1, |v|=1} |Av|.

v3 = argmax_{v ⊥ v1, v2, |v|=1} |Av|.

Define σ2(A) = |Av2|, σ3(A) = |Av3|, ... as the singular values of A.

Stop when v1, v2, ..., vr have been found and max_{v ⊥ v1,...,vr, |v|=1} |Av| = 0. (A sketch of this greedy procedure follows.)

Will prove: r = rank(A), and even if there are ties, the singular values σ1(A), σ2(A), ... are unique.
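One possible implementation sketch of the greedy definition (my assumption, not code from the lecture): power iteration on A^T A, restricted at each stage to the orthogonal complement of the vectors already found, stopping when max |Av| over the remaining directions is essentially zero.

```python
import numpy as np

def greedy_right_singular_vectors(A, iters=500, tol=1e-10):
    """Sketch of the greedy definition: v_i maximizes |Av| over unit v
    perpendicular to v_1, ..., v_{i-1}, via projected power iteration."""
    rng = np.random.default_rng(3)
    d = A.shape[1]
    V = []                                   # right singular vectors found so far
    while True:
        P = np.eye(d)
        for v in V:
            P -= np.outer(v, v)              # projector onto the complement of span(V)
        if np.linalg.norm(A @ P) < tol:      # max |Av| over remaining directions is ~0
            break
        v = P @ rng.standard_normal(d)
        for _ in range(iters):
            v = P @ (A.T @ (A @ v))          # power step for A^T A, kept orthogonal to V
            v /= np.linalg.norm(v)
        V.append(v)
    return np.array(V)                       # r = rank(A) rows

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
V = greedy_right_singular_vectors(A)
print(len(V), np.linalg.matrix_rank(A))      # both equal rank(A), here 5
```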


Greedy Works

Define the best-fit k-dimensional subspace for A as the one maximizing, over all k-dimensional subspaces, the sum of squared projection lengths of the data points onto the subspace.

Theorem (The Greedy Algorithm Works). Let A be an n × d matrix with singular vectors v1, v2, ..., vr. For 1 ≤ k ≤ r, let Vk be the subspace spanned by v1, v2, ..., vk. For each k, Vk is the best-fit k-dimensional subspace for A.


Proof

Proof by induction on k. The statement is obvious for k = 1.

Let's do k = 2. Assume W is the best-fit 2-dimensional subspace.

Claim: there is a w2 ∈ W with |w2| = 1 and w2 · v1 = 0. Because: the projection of v1 onto W spans an (at most) one-dimensional subspace W′ of W; take w2 ∈ W perpendicular to W′.

Choose w1 ∈ W of unit length perpendicular to w2. Then w1, w2 form an (orthonormal) basis for W. [Convention: basis means orthonormal.]

The sum of squared projections of the data points onto W equals |Aw1|² + |Aw2|². Why? Algebra.

|Aw1|² ≤ |Av1|² (why?). |Aw2|² ≤ |Av2|² (why?). Add to get: V2 is as good as W.


Proof for general k > 2

The inductive hypothesis implies: Vk−1 is the best-fit (k − 1)-dimensional subspace.

Suppose W is the best-fit k-dimensional subspace. Claim: there is a unit-length vector wk in W perpendicular to Vk−1. Because: the projections of v1, v2, ..., vk−1 onto W span an (at most) (k − 1)-dimensional subspace of W, so there is a wk in W perpendicular to it.

Choose a basis w1, w2, ..., wk of W (with wk as above).

|Aw1|² + |Aw2|² + ··· + |Awk−1|² ≤ |Av1|² + |Av2|² + ··· + |Avk−1|². Why? Induction.

|Awk|² ≤ |Avk|². Why? Add to get: Vk is as good as W. QED


Consequences

The theorem proves: σ1(A), σ2(A), ... are unique even if there were ties for the singular vectors.

Proof: σ1(A) obviously exists (the maximum is over a closed, bounded set, etc.) and is unique.

Now, there is a unique value of the maximum, over all 2-dimensional subspaces, of the sum of squared projections onto the subspace, because the set of 2-dimensional subspaces is closed, etc.; call this value μ2.

The theorem says: σ1(A)² + σ2(A)² = |Av1|² + |Av2|² = μ2. So σ2(A)² = μ2 − σ1(A)² is unique.

General k: assume σ1(A), σ2(A), ..., σk−1(A) are unique. Let μk be the maximum over all k-dimensional subspaces of the sum of squared projections onto the subspace. Then the theorem implies σ1(A)² + σ2(A)² + ··· + σk(A)² = μk. Using the inductive hypothesis, σk(A) is unique. Provided μk exists: prove this.


Aside: Convergence of Subspaces

Suppose V1, V2, ... is an infinite sequence of k-dimensional subspaces of R^d. There is a convergent subsequence. [Caution: subspaces seem to be unbounded objects.]

Choose a basis for each Vi. First take a subsequence of {Vi} in which the first basis vector converges.

Then take a subsequence of that subsequence in which the second basis vector converges. Repeat.

Finally, we get a subsequence in which every basis vector converges. Prove: in the limit, each “basis vector” has length 1 and they are orthonormal. [Just convergent sequences of reals.]


Singular Values and Norm

Av1 is the list of lengths (with signs) of the projections of the rows of A onto v1.

So σ1(A) = |Av1| may be viewed as the “component” of A along v1; σ2(A) is the “component” of A along v2, and so on. Analogous to decomposing a vector into its components along the basis vectors.

To complete the analogy, we had better have: sum of squares of the components = the whole.

Since ai · v = 0 for all v ⊥ v1, v2, ..., vr (why?), we have ∑_{t=1}^r (ai · vt)² = |ai|².

∑_{j=1}^n |aj|² = ∑_{j=1}^n ∑_{i=1}^r (aj · vi)² = ∑_{i=1}^r ∑_{j=1}^n (aj · vi)² = ∑_{i=1}^r |Avi|² = ∑_{i=1}^r σi(A)².

Lemma: ∑_{t=1}^r σt(A)² = ||A||F². (A one-line numerical check follows.)
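The lemma is easy to check numerically; a minimal sketch (my addition):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 7))

s = np.linalg.svd(A, compute_uv=False)   # singular values of A
assert np.isclose(np.sum(s ** 2), np.linalg.norm(A, "fro") ** 2)  # sum sigma_t^2 = ||A||_F^2
```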


Right and left singular vectors

v1, v2, ..., vr are called the right-singular vectors.

The vectors Avi form a fundamental set of vectors. Normalize them to length 1: ui = (1/σi(A)) Avi. (See the numerical check below.)

We will show later that ui similarly maximizes |u^T A| over all u perpendicular to u1, ..., ui−1.

The ui are called the left-singular vectors.

By definition, the right-singular vectors are orthogonal. We will show later that the left-singular vectors are also orthogonal.
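A sketch (my addition, assuming numpy's convention A = U diag(s) V^T) checking that ui = (1/σi(A)) Avi recovers the columns of U, and that these columns are orthonormal:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((15, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
for i in range(len(s)):
    ui = (A @ Vt[i]) / s[i]                     # u_i = (1 / sigma_i(A)) A v_i
    assert np.allclose(ui, U[:, i])             # matches numpy's left-singular vectors
assert np.allclose(U.T @ U, np.eye(len(s)))     # left-singular vectors are orthonormal
```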


Singular Value Decomposition

Let A be any matrix, with right singular vectors vt, left singular vectors ut, and singular values σt, t = 1, 2, ..., r.

Theorem (Singular Value Decomposition): A = ∑_{t=1}^r σt ut vt^T. A sum of r outer products. (See the numerical check below.)

Claim: matrices A, B are identical iff Av = Bv for all v. If Av = Bv for all v, this holds in particular for each unit vector ej, and so the j-th columns of A and B are the same for all j.

Let B = ∑_{t=1}^r σt ut vt^T. We want to show Av = Bv for all v. It is enough to show this for a set of v forming a basis of the space. Take a convenient basis v1, v2, ..., vr, vr+1, ..., vd containing the r singular vectors of A. [Such a basis exists. Why?]

For t = 1, 2, ..., r: Avt = σt ut, and Bvt = σt ut too, by the orthonormality of v1, ..., vr.

For t ≥ r + 1, Avt = 0 (why?) and so is Bvt. QED
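A minimal sketch (my addition) reconstructing A as a sum of r outer products from numpy's SVD:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((10, 8))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = sum(s[t] * np.outer(U[:, t], Vt[t]) for t in range(len(s)))  # sum of r outer products
assert np.allclose(A, B)                                         # A = sum_t sigma_t u_t v_t^T
```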
