CSCI B609: βFoundations of Data Scienceβ
Grigory Yaroslavtsev http://grigory.us
Lecture 8/9: Faster Power Method and Applications of SVD
Slides at http://grigory.us/data-science-class.html
Faster Power Method
• PM drawback: A^T A is dense even for sparse A
• Fix: pick a random Gaussian x and compute B^k x for B = A^T A
• Write x = Σ_{i=1}^d c_i v_i (augment the v_i's to an o.n.b. if r < d)
• Then B^k x = Σ_{i=1}^d σ_i^{2k} c_i v_i
• B^k x = (A^T A)(A^T A) … (A^T A) x: evaluate right to left, so each step is two matrix-vector products with the (sparse) A and A^T, and A^T A is never formed
• Theorem: If x is a unit d-vector with |x^T v_1| ≥ δ:
  – V = subspace spanned by the v_i's with σ_i ≥ (1 − ε)σ_1
  – w = unit vector after k = (1/2ε) ln(1/εδ) iterations of PM
  – Then w has a component of at most ε orthogonal to V
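A minimal numpy sketch of this trick (illustrative only: the matrix, its size, and the iteration count below are arbitrary choices, not from the slides):

```python
import numpy as np

def faster_power_method(A, k, rng):
    """Approximate the top right singular vector v_1 of A.

    Computes B^k x for B = A^T A without ever forming A^T A
    (which can be dense even when A is sparse): each iteration
    is two matrix-vector products, A x and then A^T (A x).
    """
    _, d = A.shape
    x = rng.standard_normal(d)      # random spherical Gaussian start
    for _ in range(k):
        x = A.T @ (A @ x)           # one application of B = A^T A
        x /= np.linalg.norm(x)      # renormalize to avoid overflow
    return x

# Usage: compare with the exact top right singular vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))
w = faster_power_method(A, 100, rng)
v1 = np.linalg.svd(A)[2][0]         # first right singular vector
print(abs(w @ v1))                  # close to 1 (w matches v_1 up to sign)
```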
Faster Power Method: Analysis
• A = Σ_{i=1}^r σ_i u_i v_i^T and x = Σ_{i=1}^d c_i v_i
• B^k x = (Σ_{i=1}^r σ_i² v_i v_i^T)^k (Σ_{i=1}^d c_i v_i) = Σ_{i=1}^d σ_i^{2k} c_i v_i
• |B^k x|² = Σ_{i=1}^d σ_i^{4k} c_i² ≥ σ_1^{4k} c_1² ≥ σ_1^{4k} δ²
• (Squared) component of B^k x orthogonal to V:
  Σ_{i=m+1}^d σ_i^{4k} c_i² ≤ (1 − ε)^{4k} σ_1^{4k} Σ_{i=m+1}^d c_i² ≤ (1 − ε)^{4k} σ_1^{4k}
  (here v_{m+1}, …, v_d are the right singular vectors with σ_i < (1 − ε)σ_1)
• Component of w orthogonal to V is ≤ (1 − ε)^{2k} σ_1^{2k} / (σ_1^{2k} δ) = (1 − ε)^{2k}/δ ≤ e^{−2εk}/δ = ε for k = (1/2ε) ln(1/εδ)
Choice of x
• x = random spherical Gaussian with unit variance, y = x/|x|:
  Pr[ |y^T v_1| ≤ 1/(20√d) ] ≤ 1/10 + 3e^{−d/64}
• Pr[ |x| ≥ 2√d ] ≤ 3e^{−d/64} (Gaussian Annulus Theorem)
• x^T v_1 ∼ N(0,1) ⇒ Pr[ |x^T v_1| ≤ 1/10 ] ≤ 1/10
• If |x| ≤ 2√d and |x^T v_1| ≥ 1/10 then |y^T v_1| ≥ 1/(20√d); a union bound gives the claim
• Can set δ = 1/(20√d) in the "faster power method"
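A quick Monte Carlo sanity check of this guarantee (the dimension and trial count below are arbitrary choices):

```python
import numpy as np

# Empirical check of the starting-vector guarantee: for a random
# spherical Gaussian x and any fixed unit vector v_1, the normalized
# y = x/|x| satisfies |y^T v_1| >= 1/(20 sqrt(d)) with probability
# close to 9/10 or better.
rng = np.random.default_rng(1)
d, trials = 200, 2000
v1 = np.zeros(d)
v1[0] = 1.0                         # w.l.o.g. v_1 = e_1 by symmetry
hits = 0
for _ in range(trials):
    x = rng.standard_normal(d)
    y = x / np.linalg.norm(x)
    if abs(y @ v1) >= 1 / (20 * np.sqrt(d)):
        hits += 1
print(hits / trials)                # well above 0.9
```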
Singular Vectors and Eigenvectors
• Right singular vectors v_i are eigenvectors of A^T A
• The σ_i² are the eigenvalues of A^T A
• Left singular vectors u_i are eigenvectors of A A^T
• B = A^T A satisfies ∀x: x^T B x ≥ 0:
  – B = Σ_i σ_i² v_i v_i^T
  – ∀x: x^T B x = Σ_i σ_i² (x^T v_i)² ≥ 0
  – Such matrices are called positive semi-definite
• Any p.s.d. matrix can be decomposed as A^T A
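These correspondences are easy to verify numerically; a sketch (the matrix size is an arbitrary choice):

```python
import numpy as np

# Check: the right singular vectors of A are eigenvectors of A^T A with
# eigenvalues sigma_i^2, and A^T A is positive semi-definite.
rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = A.T @ A
eigvals, eigvecs = np.linalg.eigh(B)        # eigenvalues in ascending order
# Eigenvalues of A^T A match sigma_i^2 (svd returns sigma in descending order).
print(np.allclose(np.sort(s**2), eigvals))  # True
# x^T B x = sum_i sigma_i^2 (x^T v_i)^2 >= 0 for any x.
x = rng.standard_normal(5)
print(x @ B @ x >= 0)                        # True
```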
Application of SVD: Centering Data
• Minimize the sum of squared distances from the points a_i to a k-dim. subspace S_k
• SVD gives the best-fitting S_k if the data is centered
• What if it is not?
• Thm. The S_k that minimizes the sum of squared distances goes through the centroid of the point set: (1/n) Σ_i a_i
• Will only prove for k = 1; the proof for arbitrary k is analogous (see textbook)
Application of SVD: Centering Data
• Thm. The line that minimizes the sum of squared distances goes through the centroid
• Line: ℓ = {a + λv} with |v| = 1; w.l.o.g. parametrize so that a ⊥ v; distance dist(a_i, ℓ)
• Pythagoras (using a ⊥ v): |a_i − a|² = dist(a_i, ℓ)² + ⟨v, a_i⟩²
• Center so that Σ_{i=1}^n a_i = 0 by subtracting the centroid
• Σ_{i=1}^n dist(a_i, ℓ)² = Σ_{i=1}^n (|a_i − a|² − ⟨v, a_i⟩²)
  = Σ_{i=1}^n (|a_i|² + |a|² − 2⟨a_i, a⟩ − ⟨v, a_i⟩²)
  = Σ_{i=1}^n |a_i|² + n|a|² − 2⟨Σ_{i=1}^n a_i, a⟩ − Σ_{i=1}^n ⟨v, a_i⟩²
  = Σ_{i=1}^n |a_i|² + n|a|² − Σ_{i=1}^n ⟨v, a_i⟩²
• Minimized when a = 0, i.e. the line goes through the centroid (the origin after centering)
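A small numpy illustration of the theorem (the point cloud and the offset below are arbitrary choices): shifting the line off the centroid can only increase the cost.

```python
import numpy as np

# Best-fit line through the centroid vs. a parallel line through
# another point: the centroid line has the smaller sum of squared
# distances, as the centering theorem states.
rng = np.random.default_rng(3)
pts = rng.standard_normal((100, 3)) + np.array([5.0, -2.0, 1.0])
centroid = pts.mean(axis=0)
v = np.linalg.svd(pts - centroid)[2][0]  # best-fit direction of centered data

def ssd(points, a, v):
    """Sum of squared distances from points to the line {a + t*v}, |v| = 1."""
    diff = points - a
    return np.sum(diff**2) - np.sum((diff @ v)**2)

# Shifting the line off the centroid does not decrease the cost.
print(ssd(pts, centroid, v) <= ssd(pts, centroid + np.array([1.0, 0.0, 0.0]), v))
```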
Principal Component Analysis
• n × d matrix: customers × movies preferences
• n = #customers, d = #movies
• A_ij = how much customer i likes movie j
• Assumption: A_ij can be described with k factors
  – Customers and movies: vectors u_i and v_j ∈ ℝ^k
  – A_ij = ⟨u_i, v_j⟩
• Solution: A_k, the best rank-k approximation obtained from the SVD
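A sketch of this setup in numpy (sizes n, d, k and the random factors are arbitrary illustrative choices): when A_ij = ⟨u_i, v_j⟩ exactly, the rank-k truncated SVD recovers A; with noise it would give the best rank-k approximation in Frobenius norm.

```python
import numpy as np

# Generate a preference matrix that satisfies the k-factor assumption,
# then recover it with the truncated SVD A_k.
rng = np.random.default_rng(4)
n, d, k = 40, 25, 3
customers = rng.standard_normal((n, k))   # u_i: hidden customer factors
movies = rng.standard_normal((d, k))      # v_j: hidden movie factors
A = customers @ movies.T                  # A_ij = <u_i, v_j>, rank k
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] * s[:k] @ Vt[:k]           # truncated (rank-k) SVD
print(np.allclose(A, A_k))                # True: rank-k data is recovered
```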
Class Project
• Survey of 3-5 research papers
  – Closely related to the topics of the class:
    • Algorithms for high-dimensional data
    • Fast algorithms for numerical linear algebra
    • Algorithms for machine learning and/or clustering
    • Algorithms for streaming and massive data
  – Office hours if you need suggestions
  – Individual (not a group) project
  – 1-page Proposal Due: October 31, 2016 at 23:59 EST
  – Final Deadline: December 09, 2016 at 23:59 EST
• Submission by e-mail to Lisul Islam (IU id: islammdl)
  – Submission Email Title: Project + Space + "Your Name"
  – Submission format: PDF from LaTeX
Separating mixture of π Gaussians
• Sample origin problem:
  – Given samples from k well-separated spherical Gaussians
  – Q: Did they come from the same Gaussian?
• Δ = distance between centers
• For two Gaussians naïve separation requires Δ = ω(d^{1/4})
• Thm. Δ = Ω(k^{1/4}) suffices
• Idea:
  – Project on the k-dimensional subspace through the centers
  – Key fact: this subspace can be found via SVD
  – Apply the naïve algorithm
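The idea above can be sketched numerically (the dimensions, separation, and sample sizes below are arbitrary choices): project samples onto the top-k right singular vectors of the sample matrix and check that the separation between centers survives the projection.

```python
import numpy as np

# Two spherical Gaussians in d = 100 dimensions with centers Delta = 6
# apart; projecting onto the top-k SVD subspace of the sample matrix
# preserves the distance between the (empirical) centers.
rng = np.random.default_rng(5)
k, d, m = 2, 100, 500
mu1 = np.zeros(d)
mu2 = np.zeros(d)
mu2[0] = 6.0                              # separation Delta = 6
samples = np.vstack([rng.standard_normal((m, d)) + mu1,
                     rng.standard_normal((m, d)) + mu2])
V = np.linalg.svd(samples, full_matrices=False)[2][:k]  # top-k subspace
proj = samples @ V.T                      # project down to k dimensions
c1, c2 = proj[:m].mean(axis=0), proj[m:].mean(axis=0)
print(np.linalg.norm(c1 - c2))            # close to 6: separation survives
```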
Separating mixture of k Gaussians
• Easy fact: Projection preserves the property of being a unit-variance spherical Gaussian
• Def. If p is a probability distribution, the best fit line {λv : λ ∈ ℝ} is given by:
  v = argmax_{|v| = 1} E_{x∼p}[(v^T x)²]
• Thm: Best fit line for a Gaussian centered at μ passes through μ and the origin
Best fit line for a Gaussian
• Thm: Best fit line for a Gaussian centered at μ passes through μ and the origin
  E_{x∼p}[(v^T x)²] = E_{x∼p}[(v^T (x − μ) + v^T μ)²]
  = E_{x∼p}[(v^T (x − μ))² + 2(v^T μ) v^T (x − μ) + (v^T μ)²]
  = E_{x∼p}[(v^T (x − μ))²] + 2(v^T μ) E_{x∼p}[v^T (x − μ)] + (v^T μ)²
  = E_{x∼p}[(v^T (x − μ))²] + (v^T μ)²
  = σ² + (v^T μ)²
• Where we used:
  – E_{x∼p}[v^T (x − μ)] = 0
  – E_{x∼p}[(v^T (x − μ))²] = σ²
• Best fit line maximizes (v^T μ)², so v is parallel to μ
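The identity E[(v^T x)²] = σ² + (v^T μ)² can be checked by Monte Carlo (μ, σ, v, and the sample size below are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of E[(v^T x)^2] = sigma^2 + (v^T mu)^2
# for x ~ N(mu, sigma^2 I).
rng = np.random.default_rng(6)
d, sigma, m = 10, 1.0, 200_000
mu = rng.standard_normal(d)
v = rng.standard_normal(d)
v /= np.linalg.norm(v)                    # unit direction
x = sigma * rng.standard_normal((m, d)) + mu
empirical = np.mean((x @ v)**2)
print(abs(empirical - (sigma**2 + (v @ mu)**2)) < 0.1)   # True: small error
```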
Best fit subspace for one Gaussian
• Best fit k-dimensional subspace V_k:
  V_k = argmax_{V : dim(V) = k} E_{x∼p}[|proj(x, V)|²]
• For a spherical Gaussian, V is a best-fit k-dimensional subspace iff it contains μ
• If μ = 0 then any k-dim. subspace is best fit
• If μ ≠ 0 then the best fit line goes through μ
  – The same greedy process as in SVD projects onto this line
  – After projecting it out we are left with a Gaussian with μ = 0
  – Any (k − 1)-dimensional subspace would then do
Best fit subspace for π Gaussians
• Thm. If p is a mixture of k spherical Gaussians, then the best fit k-dim. subspace contains their centers
• p = w_1 p_1 + w_2 p_2 + ⋯ + w_k p_k
• Let V be a subspace of dimension ≤ k:
  E_{x∼p}[|proj(x, V)|²] = Σ_{i=1}^k w_i E_{x∼p_i}[|proj(x, V)|²]
• Each term is maximized if V contains all the centers μ_i
• With only a finite number of samples the accuracy has to be analyzed carefully
HITS Algorithm for Hubs and Authorities
• Document ranking: project on the 1st singular vector
• WWW: directed graph with links = edges
• n authorities: pages containing original info
• d hubs: collections of links to authorities
  – Authority depends on the importance of the pointing hubs
  – Hub quality depends on how authoritative its links are
• Authority vector: a_j, j = 1, …, n: a_j ∼ Σ_{i=1}^d h_i A_ij
• Hub vector: h_i, i = 1, …, d: h_i ∼ Σ_{j=1}^n A_ij a_j
• Use power method: h = A a, a = A^T h
• Converges to the first left/right singular vectors of A
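The HITS iteration above can be sketched in a few lines (the toy link matrix below is an arbitrary choice, with A_ij = 1 when hub i links to authority j):

```python
import numpy as np

# HITS: alternate h = A a and a = A^T h with normalization; the iterates
# converge to the first left/right singular vectors of the link matrix.
rng = np.random.default_rng(7)
A = (rng.random((6, 8)) < 0.4).astype(float)   # random hub -> authority links
a = np.ones(8)                                 # initial authority scores
for _ in range(100):
    h = A @ a
    h /= np.linalg.norm(h)                     # hub scores
    a = A.T @ h
    a /= np.linalg.norm(a)                     # authority scores
U, s, Vt = np.linalg.svd(A)
print(abs(a @ Vt[0]), abs(h @ U[:, 0]))        # both close to 1
```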