Randomized SVD, CUR Decomposition,and SPSD Matrix Approximation
Shusen Wang
Outline
• CX Decomposition & Approximate SVD
• CUR Decomposition
• SPSD Matrix Approximation
CX Decomposition
• Given any matrix A ∈ ℝ^{m×n}
• The CX decomposition of A:
  1. Sketching: C = AP ∈ ℝ^{m×c}
  2. Find X such that A ≈ CX
• E.g. X⋆ = argmin_X ‖A − CX‖_F² = C†A
• It costs O(mnc) time
• CX decomposition ⇔ approximate SVD:
  A ≈ CX = U_C Σ_C V_C^T X = U_C Z = U_C U_Z Σ_Z V_Z^T
CX Decomposition
• Let the sketching matrix P ∈ ℝ^{n×c} be defined in the table.
• min_{rank(X)≤k} ‖A − CX‖_F² ≤ (1 + ε) ‖A − A_k‖_F² holds for the following sketch sizes:
  • Uniform sampling: c ≥ O(νk log k + k/ε)
  • Leverage score sampling: c = O(k log k + k/ε)
  • Gaussian projection: c = O(k/ε)
  • SRHT: c = O((k + log n) log k + k/ε)
  • Count sketch: c = O(k² + k/ε)
• ν is the column coherence of A_k
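As a concrete illustration of the CX decomposition above, here is a minimal NumPy sketch using uniform column sampling as the sketching matrix P. The matrix sizes, rank, and seed are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, c = 100, 80, 20

# Synthetic rank-10 test matrix (illustrative data).
A = rng.standard_normal((m, 10)) @ rng.standard_normal((10, n))

# Sketching: C = A P, with P a uniform column-selection matrix.
idx = rng.choice(n, size=c, replace=False)
C = A[:, idx]                        # shape (m, c)

# X* = argmin_X ||A - C X||_F^2 = C^+ A, costing O(mnc).
X = np.linalg.pinv(C) @ A            # shape (c, n)

err = np.linalg.norm(A - C @ X) / np.linalg.norm(A)
print(err)
```

Since rank(A) = 10 ≤ c here, the sampled columns generically span the column space of A and the error is essentially zero.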
CX Decomposition ⇔ Approximate SVD
• CX decomposition ⇔ approximate SVD:
  A ≈ CX = U_C Σ_C V_C^T X = U_C Z = U_C U_Z Σ_Z V_Z^T
• Step 1. SVD: C = U_C Σ_C V_C^T ∈ ℝ^{m×c}. Time cost: O(mc²)
• Step 2. Let Z = Σ_C V_C^T X ∈ ℝ^{c×n}. Time cost: O(nc²)
• Step 3. SVD: Z = U_Z Σ_Z V_Z^T ∈ ℝ^{c×n}. Time cost: O(nc²)
• Step 4. Multiply U_C U_Z. Time cost: O(mc²)
  • U_C U_Z: m×s matrix with orthonormal columns
  • Σ_Z: diagonal matrix
  • V_Z^T: s×n matrix with orthonormal rows
• Done! Approximate rank-c SVD: A ≈ (U_C U_Z) Σ_Z V_Z^T
• Total time cost: O(mc² + nc² + nc² + mc²) = O(mc² + nc²)
• Given A ∈ ℝ^{m×n} and C ∈ ℝ^{m×c}, the approximate SVD costs
  • O(mnc) time (dominated by computing X = C†A)
  • O(mc + nc) memory
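The four steps above can be sketched in NumPy as follows; only thin SVDs of the small matrices C and Z are computed, and the resulting factors reproduce CX exactly by construction. Test sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, c = 200, 150, 15
A = rng.standard_normal((m, 8)) @ rng.standard_normal((8, n))
C = A[:, rng.choice(n, size=c, replace=False)]
X = np.linalg.pinv(C) @ A

# Step 1: thin SVD of C, O(mc^2).
Uc, Sc, Vct = np.linalg.svd(C, full_matrices=False)
# Step 2: Z = Sigma_C V_C^T X, O(nc^2).
Z = (Sc[:, None] * Vct) @ X
# Step 3: thin SVD of Z, O(nc^2).
Uz, Sz, Vzt = np.linalg.svd(Z, full_matrices=False)
# Step 4: U = U_C U_Z has orthonormal columns, O(mc^2).
U = Uc @ Uz

# A ≈ U diag(Sz) Vzt -- identical to C X by construction.
gap = np.linalg.norm(U @ (Sz[:, None] * Vzt) - C @ X)
print(gap)
```

The expensive O(mnc) step is computing X; everything after it touches only m×c and c×n matrices.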
CX Decomposition
• The CX decomposition of A ∈ ℝ^{m×n}
• Optimal solution: X⋆ = argmin_X ‖A − CX‖_F² = C†A
• How to make it more efficient? It is a regression problem!
Fast CX Decomposition
• Fast CX [Drineas, Mahoney, Muthukrishnan, 2008] [Clarkson & Woodruff, 2013]
• Draw another sketching matrix S ∈ ℝ^{m×s}
• Compute X̃ = argmin_X ‖S^T(A − CX)‖_F² = (S^T C)†(S^T A)
• Time cost: O(ncs) + TimeOfSketch
• When s = Õ(c/ε),
  ‖A − CX̃‖_F² ≤ (1 + ε) · min_X ‖A − CX‖_F²
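A minimal sketch of fast CX with a Gaussian sketching matrix S (the sizes, the noise level, and the choice of Gaussian projection over other sketches are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, c, s = 500, 300, 20, 80
A = rng.standard_normal((m, 12)) @ rng.standard_normal((12, n)) \
    + 0.01 * rng.standard_normal((m, n))
C = A[:, rng.choice(n, size=c, replace=False)]

# Gaussian sketching matrix S in R^{m x s}.
S = rng.standard_normal((m, s)) / np.sqrt(s)

# X_tilde = (S^T C)^+ (S^T A): only an s x n sketched regression is solved.
X_tilde = np.linalg.pinv(S.T @ C) @ (S.T @ A)

opt = np.linalg.norm(A - C @ (np.linalg.pinv(C) @ A))
fast = np.linalg.norm(A - C @ X_tilde)
print(opt, fast)
```

With s = 4c, the sketched solution is typically within a small factor of the optimal residual, as the (1 + ε) bound suggests.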
Outline
• CX Decomposition & Approximate SVD
• CUR Decomposition
• SPSD Matrix Approximation
CUR Decomposition
• Sketching:
  • C = A P_C ∈ ℝ^{m×c}
  • R = P_R^T A ∈ ℝ^{r×n}
• Find U such that CUR ≈ A
• CUR ⇔ approximate SVD, in the same way as “CX ⇔ approximate SVD”
• 3 types of U
CUR Decomposition
• Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]: U = (P_R^T A P_C)†
• Recall the fast CX decomposition:
  A ≈ CX̃ = C (P_R^T C)† (P_R^T A) = CUR
• They’re equivalent: CX̃ = CUR
• Require c = Õ(k/ε) and r = Õ(c/ε) such that
  ‖A − CUR‖_F² ≤ (1 + ε) ‖A − A_k‖_F²
• Efficient: O(rc²) + TimeOfSketch
• Loose bound: sketch size ∝ ε⁻²
• Bad empirical performance
CUR Decomposition
• Type 2: Optimal CUR
  U⋆ = argmin_U ‖A − CUR‖_F² = C† A R†
• Theory [W & Zhang, 2013], [Boutsidis & Woodruff, 2014]:
  • C and R are selected by the adaptive sampling algorithm
  • c = O(k/ε) and r = O(k/ε)
  • ‖A − CU⋆R‖_F² ≤ (1 + ε) ‖A − A_k‖_F²
• Inefficient: O(mnc) + TimeOfSketch
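The optimal (Type 2) rule U⋆ = C†AR† can be sketched directly; uniform sampling of columns and rows is used below as a simplifying assumption (the theory calls for adaptive sampling):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, c, r = 120, 100, 25, 25
A = rng.standard_normal((m, 10)) @ rng.standard_normal((10, n))

C = A[:, rng.choice(n, size=c, replace=False)]   # sampled columns
R = A[rng.choice(m, size=r, replace=False), :]   # sampled rows

# Optimal U = C^+ A R^+, costing O(mnc).
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)

err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(err)
```

Here rank(A) = 10 ≤ min(c, r), so C and R generically capture the column and row spaces of A and the reconstruction is essentially exact.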
CUR Decomposition
• Type 3: Fast CUR [W, Zhang, Zhang, 2015]
• Draw 2 sketching matrices S_C ∈ ℝ^{m×s_c} and S_R ∈ ℝ^{n×s_r}
• Solve the problem
  Ũ = argmin_U ‖S_C^T (A − CUR) S_R‖_F² = (S_C^T C)† (S_C^T A S_R) (R S_R)†
• Intuition: the optimal U is obtained from the optimization problem
  U⋆ = argmin_U ‖CUR − A‖_F²;
  approximately solve it by sketching (e.g. by column selection), then solve the small-scale problem.
• Theory:
  • s_c = O(c/ε) and s_r = O(r/ε)
  • ‖A − CŨR‖_F² ≤ (1 + ε) · min_U ‖A − CUR‖_F²
• Efficient: O(s_c s_r (c + r)) + TimeOfSketch
• Good empirical performance
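A sketch of fast (Type 3) CUR with Gaussian S_C and S_R: only the small sketched problem is solved. The sizes s_c = 4c and s_r = 4r, the synthetic data, and the use of Gaussian sketches are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, c, r = 300, 250, 20, 20
sc, sr = 4 * c, 4 * r
A = rng.standard_normal((m, 10)) @ rng.standard_normal((10, n)) \
    + 0.01 * rng.standard_normal((m, n))

C = A[:, rng.choice(n, size=c, replace=False)]
R = A[rng.choice(m, size=r, replace=False), :]

# Gaussian sketches S_C (m x s_c) and S_R (n x s_r).
Sc = rng.standard_normal((m, sc)) / np.sqrt(sc)
Sr = rng.standard_normal((n, sr)) / np.sqrt(sr)

# U_tilde = (S_C^T C)^+ (S_C^T A S_R) (R S_R)^+ : a small-scale problem.
U_tilde = np.linalg.pinv(Sc.T @ C) @ (Sc.T @ A @ Sr) @ np.linalg.pinv(R @ Sr)

opt = np.linalg.norm(A - C @ (np.linalg.pinv(C) @ A @ np.linalg.pinv(R)) @ R)
fast = np.linalg.norm(A - C @ U_tilde @ R)
print(opt, fast)
```

The sketched residual is typically close to the optimal one, while A is touched only through the s_c×s_r sketch S_C^T A S_R.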
[Figure: CUR approximations of an image matrix A (m = 1920, n = 1168), with C and R formed from c = r = 100 columns/rows chosen by uniform sampling. Panels: Original; Type 2: Optimal CUR; Type 1: Fast CX; Type 3: Fast CUR with s_c = 2c, s_r = 2r; Type 3: Fast CUR with s_c = 4c, s_r = 4r.]
Conclusions
• Approximate truncated SVD
• CX decomposition
• CUR decomposition (3 types)
  • Fast CUR is the best
Outline
• CX Decomposition & Approximate SVD
• CUR Decomposition
• SPSD Matrix Approximation
Motivation 1: Kernel Matrix
• Given n samples x₁, …, x_n ∈ ℝ^d and a kernel function κ(·,·).
• E.g. the Gaussian RBF kernel
  κ(x_i, x_j) = exp(−‖x_i − x_j‖₂² / σ²).
• Computing the kernel matrix K ∈ ℝ^{n×n}
  • where k_ij = κ(x_i, x_j)
  • costs O(n²d) time
Motivation 2: Matrix Inversion
• Solve the linear system (K + αI_n) w = y to find w ∈ ℝ^n.
  • K ∈ ℝ^{n×n} is the kernel matrix
  • y = (y₁, …, y_n) ∈ ℝ^n contains the labels
• Solution: w⋆ = (K + αI_n)⁻¹ y
• It costs
  • O(n³) time
  • O(n²) memory.
• Performed by
  • Gaussian process regression (equivalently, kernel ridge regression)
  • Least squares kernel SVM
Motivation 3: Eigenvalue Decomposition
• Find the top k (≪ n) eigenvectors of K.
• It costs
  • Õ(n²k) time
  • O(n²) memory.
• Performed by
  • Kernel PCA (k is the target rank)
  • Manifold learning (k is the target rank)
Computational Challenges
• Time costs
  • Computing the kernel matrix: O(n²d)
  • Matrix inversion: O(n³)
  • Rank-k eigenvalue decomposition: Õ(n²k)
  • At least quadratic time!
• Memory costs
  • Inversion and eigenvalue decomposition: O(n²)
  • Because the numerical algorithms are pass-inefficient ⇒ form K and keep it in memory
• When n = 10⁵, the n×n matrix costs 80 GB of memory!
How to Speed Up?
• Efficiently form the low-rank approximation K ≈ C U C^T
• Equivalently, K ≈ L L^T with L ∈ ℝ^{n×c}
Efficient Matrix Inversion
• Approximately solve the linear system (K + αI_n) w = y
• Replace K by LL^T:
  w⋆ = (K + αI_n)⁻¹ y ≈ (LL^T + αI_n)⁻¹ y
• Expand the inverse by the Woodbury identity
  (A + BCD)⁻¹ = A⁻¹ − A⁻¹B (C⁻¹ + DA⁻¹B)⁻¹ DA⁻¹:
  w⋆ ≈ α⁻¹y − α⁻¹L (αI + L^T L)⁻¹ L^T y
• Time cost: O(nc²), linear in n, much better than O(n³)
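The Woodbury-based solve can be sketched as follows; for checking purposes K is taken to be exactly LL^T, so the O(nc²) fast solution must agree with a direct O(n³) solve. Sizes and α are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
n, c, alpha = 500, 30, 0.1
L = rng.standard_normal((n, c))
K = L @ L.T                      # here K = L L^T exactly, for checking
y = rng.standard_normal(n)

# Direct solve: O(n^3) time, O(n^2) memory.
w_direct = np.linalg.solve(K + alpha * np.eye(n), y)

# Woodbury: w = a^{-1} y - a^{-1} L (a I_c + L^T L)^{-1} L^T y : O(n c^2).
t = np.linalg.solve(alpha * np.eye(c) + L.T @ L, L.T @ y)
w_fast = (y - L @ t) / alpha

gap = np.linalg.norm(w_direct - w_fast)
print(gap)
```

Only the c×c system (αI_c + L^T L) is ever factored, so the cost is linear in n.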
Efficient Eigenvalue Decomposition
• Approximately compute the k-eigenvalue decomposition of K
  • SVD: L = U_L Σ_L V_L^T
  • K ≈ LL^T = U_L Σ_L² U_L^T
• Approximate k-eigenvalue decomposition of K:
  • eigenvectors: the first k columns of U_L
  • eigenvalues: the first k diagonal entries of Σ_L²
• Time cost: O(nc²), much lower than Õ(n²k)
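A sketch of this shortcut: one thin SVD of L yields the approximate top-k eigenpairs of K ≈ LL^T (sizes are arbitrary; the direct eigendecomposition appears only to verify the result):

```python
import numpy as np

rng = np.random.default_rng(7)
n, c, k = 400, 30, 5
L = rng.standard_normal((n, c))

# Thin SVD of L: O(n c^2).
UL, SL, _ = np.linalg.svd(L, full_matrices=False)

# Approximate top-k eigenpairs of K ~ L L^T:
# eigenvectors = first k columns of U_L,
# eigenvalues  = squares of the first k singular values of L.
eigvecs, eigvals = UL[:, :k], SL[:k] ** 2

# Check against a direct O(n^3) eigendecomposition of L L^T (testing only).
direct = np.linalg.eigvalsh(L @ L.T)[::-1][:k]
gap = np.max(np.abs(direct - eigvals))
print(gap)
```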
Sketching Based Models
• How to find such an approximation K ≈ C U C^T?
• Sketching based methods: C = KS ∈ ℝ^{n×c} is a sketch of K.
  • S ∈ ℝ^{n×c} can be a column selection or random projection matrix
• Three methods:
  • The prototype model [HMT11, WZ13, WLZ16]
  • The fast model [WZZ15]
  • The Nyström method [WS15, GM13]
The Prototype Model
• Objective: K ≈ C U C^T
• Minimize the approximation error by
  U⋆ = argmin_U ‖K − C U C^T‖_F² = C† K (C†)^T.
• Extension of the randomized SVD to SPSD matrices [HMT11]
• Time: O(n²c)
  • The time complexity is nearly the same as the k-eigenvalue decomposition.
  • It is much faster than the k-eigenvalue decomposition in practice.
• #Passes: one
• Memory: O(nc)
  • Put k_ij in memory only when it is visited
  • Keep C† in memory
• Error bound:
  • k ≪ n is an arbitrary integer
  • P samples c = O(k/ε) columns by adaptive sampling
  • 𝔼 ‖K − CU⋆C^T‖_F² ≤ (1 + ε) ‖K − K_k‖_F²
• Limitations:
  • U⋆ = C† K (C†)^T costs O(n²c) time
  • Requires observing the whole of K
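A minimal NumPy sketch of the prototype model, with S a uniform column-selection matrix (adaptive sampling is omitted as a simplification); K is a synthetic low-rank SPSD matrix, so the approximation is essentially exact:

```python
import numpy as np

rng = np.random.default_rng(8)
n, c = 300, 40
Z = rng.standard_normal((n, 15))
K = Z @ Z.T                          # SPSD, rank 15

idx = rng.choice(n, size=c, replace=False)
C = K[:, idx]                        # C = K S, S a column-selection matrix

# Prototype model: U* = C^+ K (C^+)^T, costing O(n^2 c).
Cp = np.linalg.pinv(C)
U = Cp @ K @ Cp.T

err = np.linalg.norm(K - C @ U @ C.T) / np.linalg.norm(K)
print(err)
```

Note the whole of K is read once to form U⋆, which is exactly the limitation stated above.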
The Fast Model
• Column/row selection: form P^T K P and P^T C
• K ≈ C Ũ C^T, where Ũ = argmin_U ‖P^T (K − C U C^T) P‖_F².
The Fast Model
• Prototype model: U⋆ = argmin_U ‖K − C U C^T‖_F² = C† K (C†)^T
• Fast model: Ũ = argmin_U ‖P^T (K − C U C^T) P‖_F² = (P^T C)† (P^T K P) (C^T P)†.
• Theory:
  • p = O(√(nc)/ε)
  • P is a column selection matrix (according to the row leverage scores of C)
  • Then ‖K − CŨC^T‖_F² ≤ (1 + ε) ‖K − CU⋆C^T‖_F²
• The fast model is nearly as good as the prototype model!
• Overall time cost: O(p²c + nc²) = O(nc²/ε²), linear in n
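A sketch of the fast model, compared against the prototype model. Uniform selection of the p rows/columns is a simplifying assumption here (the theory calls for leverage-score sampling of the rows of C), and the synthetic K is incoherent enough for uniform sampling to behave well.

```python
import numpy as np

rng = np.random.default_rng(10)
n, c, p = 400, 30, 120
Z = rng.standard_normal((n, 15))
K = Z @ Z.T + 0.01 * np.eye(n)        # SPSD with a small full-rank tail

idx = rng.choice(n, size=c, replace=False)
C = K[:, idx]

# P: uniform selection of p indices (assumption; theory uses leverage scores).
pidx = rng.choice(n, size=p, replace=False)
PC = C[pidx, :]                       # P^T C
PKP = K[np.ix_(pidx, pidx)]           # P^T K P

# Fast model: U_tilde = (P^T C)^+ (P^T K P) (C^T P)^+.
U_tilde = np.linalg.pinv(PC) @ PKP @ np.linalg.pinv(PC.T)

# Prototype model for reference, O(n^2 c).
Cp = np.linalg.pinv(C)
U_star = Cp @ K @ Cp.T

proto = np.linalg.norm(K - C @ U_star @ C.T)
fast = np.linalg.norm(K - C @ U_tilde @ C.T)
print(proto, fast)
```

Only the p×p block P^T K P and the p×c block P^T C are needed to form Ũ, which is what makes the model linear in n.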
The Nyström Method
• S (n×c): column selection matrix
• C = KS (n×c), W = S^T K S = S^T C (c×c)
• The Nyström method: K ≈ C W† C^T
• New explanation:
  • Recall the fast model: X̃ = argmin_X ‖P^T (K − C X C^T) P‖_F²
  • Setting P = S, then
    X̃ = argmin_X ‖S^T (K − C X C^T) S‖_F² = (S^T C)† (S^T K S) (C^T S)† = W† W W† = W†
  • The Nyström method is a special instance of the fast model.
  • It is an approximate solution to the prototype model.
• Cost:
  • Time: O(nc²)
  • Memory: O(nc)
  • Very efficient!
• Error bound: weak
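A minimal Nyström sketch. Note that only c columns of K are ever needed; here K is formed explicitly only to measure the error, and W's pseudo-inverse is truncated for numerical stability (an implementation choice, not from the slides).

```python
import numpy as np

rng = np.random.default_rng(9)
n, c = 300, 40
Z = rng.standard_normal((n, 15))
K = Z @ Z.T                          # SPSD, rank 15 <= c

idx = rng.choice(n, size=c, replace=False)
C = K[:, idx]                        # C = K S : c columns of K
W = C[idx, :]                        # W = S^T K S = S^T C

# Nystrom: K ~ C W^+ C^T. Truncating tiny singular values of W avoids
# blow-up when W is numerically rank-deficient.
K_nys = C @ np.linalg.pinv(W, rcond=1e-10) @ C.T

err = np.linalg.norm(K - K_nys) / np.linalg.norm(K)
print(err)
```

Since rank(K) ≤ c and W generically has the same rank as K, the Nyström approximation is exact here; on full-rank kernel matrices it only approximates, with the weak error bound noted above.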
Comparisons
• C = KS ∈ ℝ^{n×c}, W = S^T K S = S^T C ∈ ℝ^{c×c}
• SPSD matrix approximation: K ≈ C U C^T
  • The prototype model: U = C† K (C†)^T
  • The fast model: U = (P^T C)† (P^T K P) (C^T P)†
  • The Nyström method: U = W†
• When P = I_n, the prototype model ⇔ the fast model
• When P = S, the Nyström method ⇔ the fast model
Comparisons
• c = 150, n = 100c; vary p from 2c to 40c
• [Figure: relative error ‖K − C U C^T‖_F² / ‖K‖_F² vs. p/c for the Nyström method (O(nc²) time), the fast model (O(nc² + p²c) time), and the prototype model (O(n²c) time).]
Conclusions
• Motivations:
  • Avoid forming the kernel matrix
  • Avoid inversion/decomposition
• Prototype model, fast model, Nyström
  • They have connections
  • The fast model and Nyström are practical