
Page 1

Low Rank Approximation, Lecture 1

Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
[email protected]

1

Page 2

Organizational aspects

- Lecture dates: 16.4., 23.4., 30.4., 14.5., 28.5., 4.6., 11.6., 18.6., 25.6., 2.7. (tentative)
- Exam: To be discussed next week (most likely oral exam).
- Webpage: https://www5.in.tum.de/wiki/index.php/Low_Rank_Approximation
  Slides on http://anchp.epfl.ch.
- EFY = Exercise For You.

2

Page 3

From http://www.niemanlab.org

... his [Aleksandr Kogan's] message went on to confirm that his approach was indeed similar to SVD or other matrix factorization methods, like in the Netflix Prize competition, and the Kosinski-Stillwell-Graepel Facebook model. Dimensionality reduction of Facebook data was the core of his model.

3

Page 4

Rank and matrix factorizations

For a field F, let A ∈ F^{m×n}. Then

    rank(A) := dim(range(A)).

For simplicity, F = R throughout the lecture, and often m ≥ n.
Let B = {b_1, . . . , b_r} ⊂ R^m with r = rank(A) be a basis of range(A). Then each of the columns of A = (a_1, a_2, . . . , a_n) can be expressed as a linear combination of B:

    a_j = ∑_{i=1}^{r} b_i c_{ij}   for some coefficients c_{ij} ∈ R, i = 1, . . . , r, j = 1, . . . , n.

Defining B = (b_1, b_2, . . . , b_r) ∈ R^{m×r}:

    a_j = B [c_{1j}; . . . ; c_{rj}]   ⇒   A = B [c_{11}, . . . , c_{1n}; . . . ; c_{r1}, . . . , c_{rn}].

4

Page 5

Rank and matrix factorizations

Lemma. A matrix A ∈ R^{m×n} of rank r admits a factorization of the form

    A = B C^T,   B ∈ R^{m×r},   C ∈ R^{n×r}.

We say that A has low rank if rank(A) ≪ m, n.
Illustration of low-rank factorization: A (m×n) is stored as the two thin factors B (m×r) and C^T (r×n).

    #entries:   A: mn     B C^T: mr + nr

- Generically (and in most applications), A has full rank, that is, rank(A) = min{m, n}.
- Aim instead at approximating A by a low-rank matrix (see the sketch below).

5
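
A minimal MATLAB sketch of the entry counts above; the sizes m, n, r are illustrative, not from the lecture.

    m = 2000; n = 1000; r = 20;         % illustrative sizes with r << m, n
    B = randn(m, r); C = randn(n, r);   % random factors
    A = B*C';                           % rank-r matrix, formed here only for checking
    fprintf('rank(A) = %d\n', rank(A));
    fprintf('entries of A:       %d\n', m*n);
    fprintf('entries of B and C: %d\n', m*r + n*r);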

Page 6

Questions addressed in lecture series

What?  Theoretical foundations of low-rank approximation.
When?  A priori and a posteriori estimates for low-rank approximation. Situations that allow for low-rank approximation techniques.
Why?   Applications in engineering, scientific computing, data analysis, ... where low-rank approximation plays a central role.
How?   State-of-the-art algorithms for performing and working with low-rank approximations.

Will cover both matrices and tensors.

6

Page 7

Contents of Lecture 1

1. Fundamental tools (SVD, relation to eigenvalues, norms, best low-rank approximation)
2. Overview of applications
3. Fundamental tools (stability, QR)
4. Extensions (weighted approximation, bivariate functions)
5. Subspace iteration

7

Page 8

Literature for Lecture 1

Golub/Van Loan'2013: Golub, Gene H.; Van Loan, Charles F. Matrix Computations. Fourth edition. Johns Hopkins University Press, Baltimore, MD, 2013.

Horn/Johnson'2013: Horn, Roger A.; Johnson, Charles R. Matrix Analysis. Second edition. Cambridge University Press, 2013.

+ References on slides.

8

Page 9

1. Fundamental tools
- SVD
- Relation to eigenvalues
- Norms
- Best low-rank approximation

9

Page 10

The singular value decomposition

Theorem (SVD). Let A ∈ R^{m×n} with m ≥ n. Then there are orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} such that

    A = U Σ V^T,   with   Σ = [diag(σ_1, . . . , σ_n); 0] ∈ R^{m×n}

and σ_1 ≥ σ_2 ≥ · · · ≥ σ_n ≥ 0.

- σ_1, . . . , σ_n are called singular values.
- u_1, . . . , u_m are called left singular vectors.
- v_1, . . . , v_n are called right singular vectors.
- A v_i = σ_i u_i and A^T u_i = σ_i v_i for i = 1, . . . , n (see the check below).
- Singular values are always uniquely defined by A.
- Singular vectors are never unique. If σ_1 > σ_2 > · · · > σ_n > 0, then they are unique up to u_i ← ±u_i, v_i ← ±v_i.

10
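
A short MATLAB check of the relations A v_i = σ_i u_i and A^T u_i = σ_i v_i above, on a small random matrix (sizes are illustrative).

    m = 8; n = 5;                        % illustrative sizes, m >= n
    A = randn(m, n);
    [U, S, V] = svd(A);                  % full SVD: U is m-by-m, V is n-by-n
    sigma = diag(S);                     % singular values, decreasing
    res1 = norm(A*V - U(:,1:n)*diag(sigma));     % A*v_i - sigma_i*u_i, all i
    res2 = norm(A'*U(:,1:n) - V*diag(sigma));    % A'*u_i - sigma_i*v_i, all i
    fprintf('residuals: %.2e, %.2e\n', res1, res2);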

Page 11

SVD: Sketch of proof

Induction over n; the case n = 1 is trivial.
For general n, let v_1 solve max{‖Av‖_2 : ‖v‖_2 = 1} =: ‖A‖_2. Set σ_1 := ‖A‖_2 and u_1 := A v_1 / σ_1.¹ By definition,

    A v_1 = σ_1 u_1.

After completion to orthogonal matrices U_1 = (u_1, U_⊥) ∈ R^{m×m} and V_1 = (v_1, V_⊥) ∈ R^{n×n}:

    U_1^T A V_1 = [u_1^T A v_1, u_1^T A V_⊥; U_⊥^T A v_1, U_⊥^T A V_⊥] = [σ_1, w^T; 0, A_1],

with w := V_⊥^T A^T u_1 and A_1 := U_⊥^T A V_⊥. Since ‖·‖_2 is invariant under orthogonal transformations,

    σ_1 = ‖A‖_2 = ‖U_1^T A V_1‖_2 = ‖ [σ_1, w^T; 0, A_1] ‖_2 ≥ sqrt(σ_1^2 + ‖w‖_2^2),

where the last inequality follows from applying the matrix to the unit vector [σ_1; w] / sqrt(σ_1^2 + ‖w‖_2^2). Hence, w = 0. The proof is completed by applying the induction hypothesis to A_1.

¹ If σ_1 = 0, choose an arbitrary unit vector u_1.

11

Page 12

Very basic properties of the SVD

- r = rank(A) is the number of nonzero singular values of A.
- kernel(A) = span{v_{r+1}, . . . , v_n}
- range(A) = span{u_1, . . . , u_r}

12

Page 13

SVD: Computation (for small dense matrices)

Computation of the SVD proceeds in two steps:

1. Reduction to bidiagonal form: By applying n Householder reflectors from the left and n − 1 Householder reflectors from the right, compute orthogonal matrices U_1, V_1 such that

       U_1^T A V_1 = B = [B_1; 0],

   that is, B_1 ∈ R^{n×n} is an upper bidiagonal matrix.

2. Reduction to diagonal form: Use divide & conquer to compute orthogonal matrices U_2, V_2 such that Σ = U_2^T B_1 V_2 is diagonal.

Set U = U_1 U_2 and V = V_1 V_2.
Step 1 is usually the most expensive. Remarks on Step 1:

- If m is significantly larger than n, say m ≥ 3n/2, first computing a QR decomposition of A reduces the cost (see the sketch below).
- Most modern implementations reduce A successively via banded form to bidiagonal form.²

² Bischof, C. H.; Lang, B.; Sun, X. A framework for symmetric band reduction. ACM Trans. Math. Software 26 (2000), no. 4, 581–601.

13
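
A hedged MATLAB sketch of the QR-first idea from the first remark above for a tall matrix; the sizes are illustrative and this is only one way to organize the computation.

    m = 10000; n = 50;                   % tall-and-skinny, illustrative sizes
    A = randn(m, n);
    [Q, R] = qr(A, 0);                   % economy size QR: Q is m-by-n, R is n-by-n
    [UR, S, V] = svd(R);                 % SVD of the small n-by-n factor
    U = Q*UR;                            % left singular vectors of A
    fprintf('reconstruction error: %.2e\n', norm(A - U*S*V', 'fro'));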

Page 14

SVD: Computation (for small dense matrices)

In most applications, the vectors u_{n+1}, . . . , u_m are not of interest. By omitting these vectors one obtains the following variant of the SVD.

Theorem (Economy size SVD). Let A ∈ R^{m×n} with m ≥ n. Then there are a matrix U ∈ R^{m×n} with orthonormal columns and an orthogonal matrix V ∈ R^{n×n} such that

    A = U Σ V^T,   with   Σ = diag(σ_1, . . . , σ_n) ∈ R^{n×n}

and σ_1 ≥ σ_2 ≥ · · · ≥ σ_n ≥ 0.

Computed by MATLAB's [U,S,V] = svd(A,'econ').

Complexity:
                           memory          operations
  singular values only     O(mn)           O(mn^2)
  economy size SVD         O(mn)           O(mn^2)
  (full) SVD               O(m^2 + mn)     O(m^2 n + mn^2)

14

Page 15

SVD: Computation (for small dense matrices)

Beware of roundoff error when interpreting singular value plots.
Example: semilogy(svd(hilb(100)))

[Figure: semilogy plot of the computed singular values of the 100×100 Hilbert matrix; vertical axis from 10^{-20} to 10^0, horizontal axis from 0 to 100.]

- The kink is caused by roundoff error and does not reflect the true behavior of the singular values.
- The exact singular values are known to decay exponentially.³
- Sometimes more accuracy is possible.⁴

³ Beckermann, B. The condition number of real Vandermonde, Krylov and positive definite Hankel matrices. Numer. Math. 85 (2000), no. 4, 553–577.
⁴ Drmač, Z.; Veselić, K. New fast and accurate Jacobi SVD algorithm. I. SIAM J. Matrix Anal. Appl. 29 (2007), no. 4, 1322–1342.

15

Page 16

Singular/eigenvalue relations: symmetric matrices

A symmetric matrix A = A^T ∈ R^{n×n} admits a spectral decomposition

    A = U diag(λ_1, λ_2, . . . , λ_n) U^T

with an orthogonal matrix U.
After reordering, we may assume |λ_1| ≥ |λ_2| ≥ · · · ≥ |λ_n|. The spectral decomposition can be turned into an SVD A = U Σ V^T by defining

    Σ = diag(|λ_1|, . . . , |λ_n|),   V = U diag(sign(λ_1), . . . , sign(λ_n)).

Remark: This extends to the more general case of normal matrices (e.g., orthogonal or symmetric) via complex spectral or real Schur decompositions.

16

Page 17

Singular/eigenvalue relations: general matrices

Consider the SVD A = U Σ V^T of A ∈ R^{m×n} with m ≥ n. We then have:

1. Spectral decomposition of the Gramian A^T A:

       A^T A = V Σ^T Σ V^T = V diag(σ_1^2, . . . , σ_n^2) V^T

   ⇒ A^T A has eigenvalues σ_1^2, . . . , σ_n^2, and the right singular vectors of A are eigenvectors of A^T A.

2. Spectral decomposition of the Gramian A A^T:

       A A^T = U Σ Σ^T U^T = U diag(σ_1^2, . . . , σ_n^2, 0, . . . , 0) U^T

   ⇒ A A^T has eigenvalues σ_1^2, . . . , σ_n^2 and, additionally, m − n zero eigenvalues; the first n left singular vectors of A are eigenvectors of A A^T.

3. Decomposition of the Golub-Kahan matrix

       𝒜 := [0, A; A^T, 0] = [U, 0; 0, V] [0, Σ; Σ^T, 0] [U, 0; 0, V]^T.

   EFY. Prove that 𝒜 has eigenvalues ±σ_j with eigenvectors (1/√2) [±u_j; v_j].

17

Page 18

Norms: Spectral and Frobenius norm

Given the SVD A = U Σ V^T, one defines:

- Spectral norm: ‖A‖_2 = σ_1.
- Frobenius norm: ‖A‖_F = sqrt(σ_1^2 + · · · + σ_n^2).

Basic properties:
- ‖A‖_2 = max{‖Av‖_2 : ‖v‖_2 = 1} (see proof of SVD).
- ‖·‖_2 and ‖·‖_F are both (submultiplicative) matrix norms.
- ‖·‖_2 and ‖·‖_F are both unitarily invariant, that is,

      ‖QAZ‖_2 = ‖A‖_2,   ‖QAZ‖_F = ‖A‖_F

  for any orthogonal matrices Q, Z.
- ‖A‖_2 ≤ ‖A‖_F ≤ √r ‖A‖_2, where r = rank(A).
- ‖AB‖_F ≤ min{‖A‖_2 ‖B‖_F, ‖A‖_F ‖B‖_2}

EFY. Prove these two inequalities. Hint for the second inequality: Use the relations on the next slide to first show that ‖B‖_F = ‖(‖b_1‖_2, . . . , ‖b_n‖_2)‖_2.

EFY. Find a matrix A ∈ R^{m_1×n} and a nonzero matrix B ∈ R^{m_2×n} such that ‖A‖_2 = ‖[A; B]‖_2. Classify the set of matrices A ∈ R^{m_1×n} such that ‖A‖_2 < ‖[A; B]‖_2 for every nonzero matrix B ∈ R^{m_2×n}. Investigate analogous questions for the Frobenius norm.

18

Page 19

Euclidean geometry on matrices

Let B ∈ R^{n×n} have eigenvalues λ_1, . . . , λ_n ∈ C. Then

    trace(B) := b_{11} + · · · + b_{nn} = λ_1 + · · · + λ_n.

In turn,

    ‖A‖_F^2 = trace(A^T A) = trace(A A^T) = ∑_{i,j} a_{ij}^2.

Two simple consequences:
- ‖·‖_F is the norm induced by the matrix inner product

      ⟨A, B⟩ := trace(A B^T),   A, B ∈ R^{m×n}.

- Partition A = (a_1, a_2, . . . , a_n) and define the vectorization

      vec(A) = [a_1; . . . ; a_n] ∈ R^{mn}.

  Then ⟨A, B⟩ = ⟨vec(A), vec(B)⟩ and ‖A‖_F = ‖vec(A)‖_2 (see the check below).

19
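
A quick MATLAB check of the inner-product and vectorization identities above (random data, purely illustrative).

    m = 4; n = 3;
    A = randn(m, n); B = randn(m, n);
    ip1 = trace(A*B');                   % matrix inner product <A,B>
    ip2 = dot(A(:), B(:));               % <vec(A), vec(B)>; A(:) is vec(A)
    fprintf('|trace(A*B'') - vec inner product| = %.2e\n', abs(ip1 - ip2));
    fprintf('| ||A||_F - ||vec(A)||_2 |          = %.2e\n', abs(norm(A,'fro') - norm(A(:))));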

Page 20

Von Neumann's trace inequality

Theorem. For m ≥ n, let A, B ∈ R^{m×n} have singular values σ_1(A) ≥ · · · ≥ σ_n(A) and σ_1(B) ≥ · · · ≥ σ_n(B), respectively. Then

    |⟨A, B⟩| ≤ σ_1(A) σ_1(B) + · · · + σ_n(A) σ_n(B).

Consequence:

    ‖A − B‖_F^2 = ⟨A − B, A − B⟩ = ‖A‖_F^2 − 2⟨A, B⟩ + ‖B‖_F^2
                ≥ ‖A‖_F^2 − 2 ∑_{i=1}^n σ_i(A) σ_i(B) + ‖B‖_F^2
                = ∑_{i=1}^n (σ_i(A) − σ_i(B))^2.

EFY. Use von Neumann's trace inequality and the SVD to show for 1 ≤ k ≤ n that

    max{ |⟨A, P Q^T⟩| : P ∈ R^{m×k}, Q ∈ R^{n×k}, P^T P = Q^T Q = I_k } = σ_1(A) + · · · + σ_k(A).

20

Page 21

Proof of von Neumann's trace inequality⁵

The singular value vector σ(A) can be written as a nonnegative combination

    σ(A) = σ_n(A) f_n + (σ_{n−1}(A) − σ_n(A)) f_{n−1} + · · ·

with f_j = e_1 + · · · + e_j. Decompose A analogously via its SVD A = U_A Σ_A V_A^T:

    A = σ_n(A) A_n + (σ_{n−1}(A) − σ_n(A)) A_{n−1} + · · · ,   A_j := U_A diag(f_j) V_A^T.

Insert this into the left-hand side of the trace inequality:

    |⟨A, B⟩| ≤ σ_n(A) |⟨A_n, B⟩| + (σ_{n−1}(A) − σ_n(A)) |⟨A_{n−1}, B⟩| + · · · .

The right-hand side of the trace inequality is linear with respect to σ(A) ⇒ we may assume A = A_k for some k = 1, . . . , n. Analogously for B.

⁵ This proof follows [Grigorieff, R. D. Note on von Neumann's trace inequality. Math. Nachr. 151 (1991), 327–328]. For Mirsky's ingenious proof based on doubly stochastic matrices, see Theorem 8.7.6 in [Horn/Johnson'2013].

21

Page 22

Proof of von Neumann's trace inequality

Let A = U_A diag(f_k) V_A^T, B = U_B diag(f_ℓ) V_B^T, and k ≤ ℓ. Then, using trace(A B^T) = trace(A^T B),

    ⟨A, B⟩ = trace( (∑_{i=1}^k v_{A,i} u_{A,i}^T) (∑_{j=1}^ℓ u_{B,j} v_{B,j}^T) )
           = ∑_{i=1}^k ∑_{j=1}^ℓ trace( v_{A,i} u_{A,i}^T u_{B,j} v_{B,j}^T )
           = ∑_{i=1}^k ∑_{j=1}^ℓ (u_{A,i}^T u_{B,j}) (v_{B,j}^T v_{A,i}).

By Cauchy-Schwarz (applied to the sum over j),

    |⟨A, B⟩| ≤ ∑_{i=1}^k ‖U_B^T u_{A,i}‖_2 ‖V_B^T v_{A,i}‖_2 ≤ k.

Since A = A_k and B = B_ℓ have singular values equal to 1 (k resp. ℓ times) and 0, the right-hand side of the trace inequality equals min{k, ℓ} = k, which completes the proof.

22

Page 23

Schatten norms

There are other unitarily invariant matrix norms.⁶
Let s(A) = (σ_1, . . . , σ_n). The p-Schatten norm defined by

    ‖A‖_(p) := ‖s(A)‖_p

is a matrix norm for any 1 ≤ p ≤ ∞.
p = ∞: spectral norm, p = 2: Frobenius norm, p = 1: nuclear norm.

EFY. What is lim_{p→0+} ‖A‖_(p)?

Definition. The dual of a matrix norm ‖·‖ on R^{m×n} is defined by

    ‖A‖^D = max{⟨A, B⟩ : ‖B‖ = 1}.

Lemma. Let p, q ∈ [1, ∞] such that p^{-1} + q^{-1} = 1. Then

    ‖A‖_(p)^D = ‖A‖_(q).

EFY. Prove this lemma for p = ∞. Hint: von Neumann's trace inequality.

⁶ Complete characterization via symmetric gauge functions in [Horn/Johnson'2013].

23

Page 24

Best low-rank approximation

Consider k < n and let

    U_k := (u_1 · · · u_k),   Σ_k := diag(σ_1, . . . , σ_k),   V_k := (v_1 · · · v_k).

Then

    T_k(A) := U_k Σ_k V_k^T

has rank at most k. For any unitarily invariant norm ‖·‖:

    ‖T_k(A) − A‖ = ‖diag(0, . . . , 0, σ_{k+1}, . . . , σ_n)‖.

In particular, for the spectral norm and the Frobenius norm:

    ‖A − T_k(A)‖_2 = σ_{k+1},   ‖A − T_k(A)‖_F = sqrt(σ_{k+1}^2 + · · · + σ_n^2).

The two are nearly equal if and only if the singular values decay sufficiently quickly.

24
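
A minimal MATLAB sketch of the truncation T_k(A) and the two error formulas above; the Hilbert matrix is just an illustrative test case.

    A = hilb(50); k = 5;                 % illustrative test matrix and rank
    [U, S, V] = svd(A);
    Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';  % T_k(A)
    s = diag(S);
    fprintf('||A - T_k(A)||_2 = %.2e, sigma_{k+1} = %.2e\n', norm(A - Ak), s(k+1));
    fprintf('||A - T_k(A)||_F = %.2e, tail norm   = %.2e\n', norm(A - Ak, 'fro'), norm(s(k+1:end)));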

Page 25

Best low-rank approximation

Theorem (Schmidt-Mirsky). Let A ∈ R^{m×n}. Then

    ‖A − T_k(A)‖ = min{ ‖A − B‖ : B ∈ R^{m×n} has rank at most k }

holds for any unitarily invariant norm ‖·‖.

Proof⁷ for ‖·‖_F: Follows directly from the consequence of von Neumann's trace inequality.
Proof for ‖·‖_2: For any B ∈ R^{m×n} of rank ≤ k, kernel(B) has dimension ≥ n − k. Hence, there exists w ∈ kernel(B) ∩ range(V_{k+1}) with ‖w‖_2 = 1. Then

    ‖A − B‖_2^2 ≥ ‖(A − B) w‖_2^2 = ‖A w‖_2^2 = ‖A V_{k+1} V_{k+1}^T w‖_2^2 = ‖U_{k+1} Σ_{k+1} V_{k+1}^T w‖_2^2
               = ∑_{j=1}^{k+1} σ_j^2 |v_j^T w|^2 ≥ σ_{k+1}^2 ∑_{j=1}^{k+1} |v_j^T w|^2 = σ_{k+1}^2,

so that ‖A − B‖_2 ≥ σ_{k+1} = ‖A − T_k(A)‖_2.

⁷ See Section 7.4.9 in [Horn/Johnson'2013] for the general case.

25

Page 26

Best low-rank approximation

Uniqueness:
- If σ_k > σ_{k+1}, the best rank-k approximation with respect to the Frobenius norm is unique.
- If σ_k = σ_{k+1}, the best rank-k approximation is never unique. For example, I_3 has several best rank-two approximations:

      diag(1, 1, 0),   diag(1, 0, 1),   diag(0, 1, 1).

- With respect to the spectral norm, the best rank-k approximation is only unique if σ_{k+1} = 0. For example, diag(2, 1, ε) with 0 < ε < 1 has infinitely many best rank-two approximations:

      diag(2, 1, 0),   diag(2 − ε/2, 1 − ε/2, 0),   diag(2 − ε/3, 1 − ε/3, 0),   . . . .

EFY. Given a symmetric matrix A ∈ R^{n×n} and 1 ≤ k < n, show that there is always a best rank-k approximation that is symmetric. Is every best rank-k approximation (with respect to the Frobenius norm) symmetric? What about the spectral norm?

26

Page 27

Approximating the range of a matrix

Aim at finding a matrix Q ∈ R^{m×k} with orthonormal columns such that

    range(Q) ≈ range(A).

I − Q Q^T is the orthogonal projector onto range(Q)^⊥ ⇒ aim at minimizing

    ‖(I − Q Q^T) A‖ = ‖A − Q Q^T A‖

for a unitarily invariant norm ‖·‖. Because rank(Q Q^T A) ≤ k,

    ‖A − Q Q^T A‖ ≥ ‖A − T_k(A)‖.

Setting Q = U_k one obtains

    U_k U_k^T A = U_k U_k^T U Σ V^T = U_k Σ_k V_k^T = T_k(A).

⇒ Q = U_k is optimal.

27

Page 28

Approximating the range of a matrix

Variation:

    max{ ‖Q^T A‖_F : Q^T Q = I_k }.

Equivalent to

    max{ |⟨A A^T, Q Q^T⟩| : Q^T Q = I_k }.

By von Neumann's trace inequality and the equivalence between eigenvectors of A A^T and left singular vectors of A, an optimal Q is given by U_k.

EFY. When replacing the Frobenius norm by the spectral norm in this formulation, does one obtain the same result?

28

Page 29

2. Applications
- Principal Component Analysis
- Matrix Completion
- Some other applications

29

Page 30

Principal Component Analysis (PCA)

- Most popular method for dimensionality reduction in statistics, data science, . . .

Consider N independently drawn observations for K random variables X_1, . . . , X_K.

[Figure: scatter plot illustrating N = 100 observations for K = 2.]

30

Page 31

Principal Component Analysis (PCA)

Each of the observations is arranged in a vector x_j ∈ R^K with j = 1, . . . , N.
Subtract the sample mean

    x̄ := (1/N)(x_1 + · · · + x_N).

[Figure: scatter plot of the data with the mean subtracted.]

31

Page 32

Principal Component Analysis (PCA)

Covariance matrix:

    C := (1/(N − 1)) ∑_{j=1}^N (x_j − x̄)(x_j − x̄)^T.

The diagonal entry c_{ii} estimates the variance of X_i, while the off-diagonal entry c_{ik} estimates the covariance between X_i and X_k.
Defining A := [x_1 − x̄, . . . , x_N − x̄] ∈ R^{K×N}, we can equivalently write

    C = (1/(N − 1)) A A^T.

32

Page 33

Principal Component Analysis (PCA)

Reduce data to dimension 1:
Find a linear combination Y_1 = w_1 X_1 + · · · + w_K X_K with w_1, . . . , w_K ∈ R and w_1^2 + · · · + w_K^2 = 1 that captures most of the observed variation.
⇒ Maximize the variance of the new variable Y_1.
The corresponding observations of Y_1 are given by w^T x_1, . . . , w^T x_N with sample mean w^T x̄ ⇒ maximization of the variance corresponds to

    max_{w ∈ R^K, ‖w‖_2 = 1} ∑_{j=1}^N (w^T x_j − w^T x̄)^2 = max_{w ∈ R^K, ‖w‖_2 = 1} ‖w^T A‖_2^2.

The optimal vector w is given by the dominant left singular vector of A! (It corresponds to the eigenvector for the largest eigenvalue of A A^T.)
This is the first principal vector (see the sketch below).

33
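
A hedged MATLAB sketch of the first principal vector computed via the SVD of the centered data matrix; the correlated random data is purely illustrative.

    K = 2; N = 100;
    X = randn(K, N);
    X(2,:) = 2*X(1,:) + 0.3*randn(1, N); % correlated toy data
    xbar = mean(X, 2);
    A = X - xbar;                        % centered data matrix (K-by-N, implicit expansion)
    [U, S, ~] = svd(A, 'econ');
    w = U(:,1);                          % first principal vector
    fprintf('w = [%.3f, %.3f], captured variance = %.3f\n', w(1), w(2), S(1,1)^2/(N-1));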

Page 34

Principal Component Analysis (PCA)

Data with the first principal vector: [scatter plot]

Projection of the data onto the first principal vector: [scatter plot]

34

Page 35

Principal Component Analysis (PCA)

- Analogously, the first k principal vectors are given by the dominant k left singular vectors u_1, . . . , u_k. Equivalent to a best rank-k approximation of the data matrix:

      min_{U^T U = I_k} ‖A − U C^T‖

- PCA is not robust with respect to outliers in the data. Robust PCA⁸ uses the model

      A ≈ low rank + sparse,

  obtained via the solution of

      min{ ‖L‖_(1) + λ ‖S‖_1 : A = L + S },

  for a multiplier λ > 0 and ‖S‖_1 = 1-norm of vec(S).

⁸ Emmanuel J. Candès; Xiaodong Li; Yi Ma; John Wright. Robust Principal Component Analysis?

35

Page 36

Matrix Completion

Assume that the data matrix is modeled by (low) rank k. Two popular approaches to deal with missing entries:

1. Impute data (insert 0 or row/column means in missing entries). Apply the SVD to get the best low-rank approximation B C^T of the imputed data matrix.
2. Find the rank-k matrix B C^T that fits the known entries best, measured in a (weighted) Euclidean norm.

Predict unknown entries from B C^T.
The Netflix Prize was won by a combination of matrix completion with other techniques.

36

Page 37

Applications in Scientific Computing and Engineering

- POD, reduced basis method, reduced-order modelling.
- High-dimensional integration.
- Solution of large-scale matrix equations. Optimal control.
- Solution of high-dimensional PDEs.
- Uncertainty quantification.
- . . .

Several of these will be covered in later parts of the course.

37

Page 38

3. Fundamental Tools
- Stability of SVD
- Canonical angles
- Stability of low-rank approximation
- QR decomposition

38

Page 39

Stability of SVD

What happens to the SVD if A is perturbed by noise?

Lemma. Let A, E ∈ R^{m×n}. Then

    |σ_i(A + E) − σ_i(A)| ≤ ‖E‖_2.

Proof. Using the characterization

    σ_i(A + E) = min{ ‖B‖_2 : rank(A + E − B) ≤ i − 1 }

and setting B = A − T_{i−1}(A) + E, we obtain

    σ_i(A + E) ≤ ‖B‖_2 ≤ ‖A − T_{i−1}(A)‖_2 + ‖E‖_2 = σ_i(A) + ‖E‖_2;

exchanging the roles of A and A + E implies the result.
The result is also a special case of Weyl's famous inequality.

EFY. Show that the matrix rank is a lower semi-continuous function.

39
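
A quick MATLAB check of the singular value perturbation bound above (random matrices, illustrative).

    m = 50; n = 30;
    A = randn(m, n);
    E = 1e-3*randn(m, n);                % small perturbation
    d = abs(svd(A + E) - svd(A));        % |sigma_i(A+E) - sigma_i(A)|, i = 1,...,n
    fprintf('max deviation %.2e <= ||E||_2 = %.2e\n', max(d), norm(E));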

Page 40

Stability of SVD

Singular values are perfectly well conditioned.
Singular vectors tend to be less stable! Example:

    A = [1, 0; 0, 1 + ε],   E = [0, ε; ε, −ε].

- A has right singular vectors [1; 0], [0; 1].
- A + E has right singular vectors (1/√2) [1; 1], (1/√2) [1; −1].

To formulate a perturbation bound, we need to measure distances between subspaces.

40

Page 41

Canonical angles

Let the columns of X, Y ∈ C^{n×k} contain orthonormal bases of two k-dimensional subspaces 𝒳, 𝒴 ⊂ C^n, respectively. Denote the singular values (in reverse order) of X^T Y by

    0 ≤ σ_1 ≤ · · · ≤ σ_k ≤ 1.

We call

    θ_i(𝒳, 𝒴) := arccos σ_i,   i = 1, . . . , k,

the canonical angles between 𝒳 and 𝒴. Note: For k = 1, θ_1 is the usual angle θ(x, y) between vectors.
Geometric characterization:

    θ_1(𝒳, 𝒴) = max_{x ∈ 𝒳, x ≠ 0} min_{y ∈ 𝒴, y ≠ 0} θ(x, y).

It follows that θ_1(𝒳, 𝒴) = π/2 if and only if 𝒳 ∩ 𝒴^⊥ ≠ {0}.

41
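
A hedged MATLAB sketch of canonical angles computed from the definition above, for two random subspaces (illustrative). MATLAB's built-in subspace(X,Y) returns a single angle, the largest one.

    n = 10; k = 3;
    [X, ~] = qr(randn(n, k), 0);         % orthonormal basis of a random k-dimensional subspace
    [Y, ~] = qr(randn(n, k), 0);
    c = svd(X'*Y);                       % cosines of the canonical angles (decreasing)
    theta = acos(min(max(c, 0), 1));     % clamp against roundoff before acos
    fprintf('largest canonical angle theta_1 = %.4f\n', max(theta));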

Page 42

Canonical angles

Note that X X^T and Y Y^T are the orthogonal projectors onto 𝒳 and 𝒴, respectively.

Lemma (Projector characterization). Define sin Θ(𝒳, 𝒴) := diag(sin θ_1(𝒳, 𝒴), . . . , sin θ_k(𝒳, 𝒴)). Then

    sin θ_1(𝒳, 𝒴) = ‖sin Θ(𝒳, 𝒴)‖_2 = ‖X X^T − Y Y^T‖_2.

Proof. See Theorem I.5.5 in [Stewart/Sun'1990].

Lemma. Let Q ∈ R^{(n−k)×k}, and 𝒳 = range([I_k; 0]), 𝒴 = range([I_k; Q]). Then θ_1(𝒳, 𝒴) = arctan ‖Q‖_2.

Proof. The columns of [I_k; Q](I + Q^T Q)^{−1/2} form an orthonormal basis of 𝒴. By definition, this implies that cos θ_1(𝒳, 𝒴) is the smallest singular value of (I + Q^T Q)^{−1/2}. By the SVD of Q, it follows that

    cos θ_1(𝒳, 𝒴) = 1 / sqrt(1 + ‖Q‖_2^2),

and hence tan θ_1(𝒳, 𝒴) = ‖Q‖_2.

42

Page 43

Stability of SVD

Theorem (Wedin). Let k < n and assume

    δ := σ_k(A + E) − σ_{k+1}(A) > 0.

Let U_k / Ũ_k / V_k / Ṽ_k denote the subspaces spanned by the first k left/right singular vectors of A / A + E. Then

    sqrt( ‖sin Θ(U_k, Ũ_k)‖_F^2 + ‖sin Θ(V_k, Ṽ_k)‖_F^2 ) ≤ √2 ‖E‖_F / δ.    (1)

Θ: diagonal matrix containing the canonical angles between two subspaces.

- The perturbation of the input is multiplied by δ^{-1} ≈ [σ_k(A) − σ_{k+1}(A)]^{-1}.
- Bad news for the stability of low-rank approximations?

43

Page 44

Stability of low-rank approximation

Lemma. Let A ∈ R^{m×n} have rank ≤ k. Then

    ‖T_k(A + E) − A‖ ≤ C ‖E‖

holds with C = 2 for any unitarily invariant norm ‖·‖. For the Frobenius norm, the constant can be improved to C = (1 + √5)/2.

Proof. Schmidt-Mirsky gives ‖T_k(A + E) − (A + E)‖ ≤ ‖E‖. The triangle inequality implies

    ‖T_k(A + E) − (A + E) + (A + E) − A‖ ≤ 2 ‖E‖.

The second part is a result by Hackbusch.⁹

Implication for a general matrix A:

    ‖T_k(A + E) − T_k(A)‖ = ‖T_k( T_k(A) + (A − T_k(A)) + E ) − T_k(A)‖
                          ≤ C ‖(A − T_k(A)) + E‖ ≤ C ( ‖A − T_k(A)‖ + ‖E‖ ).

⇒ Perturbations on the level of the truncation error pose no danger.

⁹ Hackbusch, W. New estimates for the recursive low-rank truncation of block-structured matrices. Numer. Math. 132 (2016), no. 2, 303–328.

44

Page 45

Stability of low-rank approximation: Application

Consider a partitioned matrix

    A = [A_{11}, A_{12}; A_{21}, A_{22}],   A_{ij} ∈ R^{m_i×n_j},

and a desired rank k ≤ m_i, n_j. Let ε := ‖T_k(A) − A‖ and

    E_{ij} := T_k(A_{ij}) − A_{ij}   ⇒   ‖E_{ij}‖ ≤ ε.

By the stability of low-rank approximation,

    ‖ T_k( [T_k(A_{11}), T_k(A_{12}); T_k(A_{21}), T_k(A_{22})] ) − A ‖_F
      = ‖ T_k( A + [E_{11}, E_{12}; E_{21}, E_{22}] ) − A ‖_F ≤ C ε,

with C = (3/2)(1 + √5).

This allows one, e.g., to perform truncations in parallel.

45

Page 46

The QR decomposition

Theorem. Let X ∈ R^{m×n} with m ≥ n. Then there is an orthogonal matrix Q ∈ R^{m×m} such that

    X = Q R,   with   R = [R_1; 0],

that is, R_1 ∈ R^{n×n} is an upper triangular matrix.
MATLAB: [Q,R] = qr(X).

We will use the economy size QR decomposition instead: Letting Q_1 ∈ R^{m×n} contain the first n columns of Q, one obtains

    X = Q_1 R_1.

MATLAB: [Q,R] = qr(X,0).

EFY. Let A = (a_1, a_2, . . . , a_n) with a_i ∈ R^m. Using the QR decomposition, show Hadamard's inequality:

    |det(A)| ≤ ‖a_1‖_2 · ‖a_2‖_2 · · · ‖a_n‖_2.

Characterize the set of all m × n matrices A for which equality holds.

46

Page 47

QR for recompression

Suppose that

    A = B C^T,   with B ∈ R^{m×K}, C ∈ R^{n×K}.    (2)

Goal: Compute the best rank-k approximation of A for k < K.
Typical example: A sum of J matrices of rank k,

    A = ∑_{j=1}^J B_j C_j^T = (B_1 · · · B_J)(C_1 · · · C_J)^T,   B_j ∈ R^{m×k}, C_j ∈ R^{n×k},    (3)

so that (B_1 · · · B_J) ∈ R^{m×Jk} and (C_1 · · · C_J) ∈ R^{n×Jk}.

Algorithm to recompress A (see the sketch below):
1. Compute (economy size) QR decompositions B = Q_B R_B and C = Q_C R_C.
2. Compute the truncated SVD T_k(R_B R_C^T) = U_k Σ_k V_k^T.
3. Set Ũ_k = Q_B U_k, Ṽ_k = Q_C V_k and return T_k(A) := Ũ_k Σ_k Ṽ_k^T.

Returns the best rank-k approximation of A with O((m + n) K^2) operations.

47
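
A hedged MATLAB sketch of the three-step recompression algorithm above; the sizes and the random factors are illustrative.

    m = 500; n = 400; K = 60; k = 10;    % illustrative sizes, k < K
    B = randn(m, K); C = randn(n, K);    % A = B*C' given in factored form
    [QB, RB] = qr(B, 0);                 % step 1: economy size QR of both factors
    [QC, RC] = qr(C, 0);
    [U, S, V] = svd(RB*RC');             % step 2: SVD of the small K-by-K matrix
    Uk = QB*U(:,1:k); Vk = QC*V(:,1:k);  % step 3: map back to length m and n
    Ak = Uk*S(1:k,1:k)*Vk';              % best rank-k approximation of A
    s = svd(B*C');                       % check against direct truncation (small sizes only)
    fprintf('||A - A_k||_2 = %.2e, sigma_{k+1} = %.2e\n', norm(B*C' - Ak), s(k+1));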

Page 48

4. Extensions
- Weighted and structured low-rank approximation
- Semi-separable approximation of bivariate functions

48

Page 49

Weighted low-rank approximation

If some columns or rows are more important than others (e.g., they are known to be less corrupted by noise), replace the low-rank approximation problem by

    min{ ‖D_R (A − B) D_C‖ : B ∈ R^{m×n} has rank at most k }

with suitably chosen positive definite diagonal matrices D_R, D_C. More generally: Given invertible matrices W_R ∈ R^{m×m}, W_C ∈ R^{n×n}, the weighted low-rank approximation problem consists of

    min{ ‖W_R (A − B) W_C‖ : B ∈ R^{m×n} has rank at most k }.

The solution is given by

    B = W_R^{-1} · T_k(W_R A W_C) · W_C^{-1}

(see the sketch below). Proof: EFY.
Remark: A numerically more stable approach is via the generalized SVD [Golub/Van Loan'2013].

49
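
A minimal MATLAB sketch of the closed-form solution above, with diagonal weights and illustrative random data.

    m = 30; n = 20; k = 3;
    A = randn(m, n);
    WR = diag(1 + rand(m, 1));           % invertible diagonal weights
    WC = diag(1 + rand(n, 1));
    [U, S, V] = svd(WR*A*WC);
    Tk = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';  % T_k(WR*A*WC)
    B = WR\Tk/WC;                        % B = WR^{-1} * T_k(WR*A*WC) * WC^{-1}
    fprintf('rank(B) = %d, weighted error = %.2e\n', rank(B), norm(WR*(A - B)*WC, 'fro'));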

Page 50

Limit case: Infinite weights

Choosing diagonal weights that converge to ∞ ⇒ the corresponding rows/columns remain unperturbed.
Case of fixed columns: Consider the block column partition

    A = (A_1, A_2)

and consider

    min{ ‖A_2 − B_2‖ : (A_1, B_2) has rank at most k }.

No/trivial solution if rank(A_1) ≥ k. Assume ℓ := rank(A_1) < k and let X_1 ∈ R^{m×ℓ} contain an orthonormal basis of range(A_1). Then¹⁰

    B_2 = X_1 X_1^T A_2 + T_{k−ℓ}( (I − X_1 X_1^T) A_2 ).

¹⁰ Golub, G. H.; Hoffman, A.; Stewart, G. W. A generalization of the Eckart-Young-Mirsky matrix approximation theorem. Linear Algebra Appl. 88/89 (1987), 317–327.

50

Page 51

General weights

Given an mn × mn symmetric positive definite matrix W, define

    ‖A‖_W := sqrt( vec(A)^T W vec(A) ).

This equals the Frobenius norm for W = I. General weighted low-rank approximation problem:

    min{ ‖A − B‖_W : B ∈ R^{m×n} has rank at most k }.

EFY. Show that this problem can be rephrased as the previously considered (standard) weighted low-rank approximation problem for the case of a Kronecker product W = W_2 ⊗ W_1. Hint: Cholesky decomposition.

- For general W, no expression in terms of the SVD is available ⇒ need to use a general optimization method.
- Similarly, imposing general structures on A (such as nonnegativity, fixing individual entries, ...) usually does not admit solutions in terms of the SVD. One often ends up with NP-hard problems.

51

Page 52

Separable approximation of bivariate functions

Given Ω_x ⊂ R^{d_x} and Ω_y ⊂ R^{d_y}, aim at finding a semi-separable approximation of f ∈ L²(Ω_x × Ω_y) ≅ L²(Ω_x) ⊗ L²(Ω_y):

    f(x, y) ≈ g_1(x) h_1(y) + · · · + g_r(x) h_r(y)

for g_1, . . . , g_r ∈ L²(Ω_x), h_1, . . . , h_r ∈ L²(Ω_y).

Application to higher-dimensional integrals:

    ∫_{Ω_x} ∫_{Ω_y} f(x, y) dμ_y(y) dμ_x(x)
      ≈ ∑_{i=1}^r ∫_{Ω_x} ∫_{Ω_y} g_i(x) h_i(y) dμ_y(y) dμ_x(x)
      = ∑_{i=1}^r [ ∫_{Ω_x} g_i(x) dμ_x(x) ] [ ∫_{Ω_y} h_i(y) dμ_y(y) ]

⇒ a semi-separable approximation breaks down the dimensionality of the integrals (for separable measures).

52

Page 53

Separable approximation of bivariate functions

Given f ∈ L²(Ω_x × Ω_y), consider the linear operator

    L_f : L²(Ω_x) → L²(Ω_y),   w ↦ ∫_{Ω_x} w(x) f(x, y) dx.

It admits an SVD

    L_f(·) = ∑_{i=1}^∞ σ_i u_i ⟨v_i, ·⟩

with L² orthonormal bases {u_1, u_2, . . .} and {v_1, v_2, . . .}.
The best semi-separable approximation of f (in L²(Ω_x × Ω_y)) is given by

    f_r(x, y) = ∑_{i=1}^r σ_i u_i(x) v_i(y),

provided that ∑_{i=1}^∞ σ_i² < ∞ (Hilbert-Schmidt). Moreover,

    ‖f − f_r‖²_{L²} = σ²_{r+1} + σ²_{r+2} + · · · .

53

Page 54

Separable and low-rank approximation

Choose discretization points x_1, . . . , x_m ∈ Ω_x and y_1, . . . , y_n ∈ Ω_y. Define

    F = [ f(x_i, y_j) ]_{i=1,...,m; j=1,...,n} ∈ R^{m×n}

and

    F_r = [ f_r(x_i, y_j) ]_{i=1,...,m; j=1,...,n} = ∑_{i=1}^r [g_i(x_1); . . . ; g_i(x_m)] [h_i(y_1); . . . ; h_i(y_n)]^T.

F_r has rank at most r (see the sketch below).

EFY. Prove ‖F − F_r‖²_F ≤ σ²_{r+1} + σ²_{r+2} + · · · .

54
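
A small MATLAB illustration of the discretization above for one concrete (illustrative) bivariate function; the rapid singular value decay of F reflects how well this particular f can be approximated in semi-separable form.

    m = 200; n = 150;
    x = linspace(0, 1, m)'; y = linspace(0, 1, n)';
    f = @(x, y) 1./(1 + x + y);          % illustrative bivariate function
    F = f(x, y');                        % m-by-n matrix of samples f(x_i, y_j)
    [U, S, V] = svd(F); s = diag(S);
    r = 5;
    Fr = U(:,1:r)*S(1:r,1:r)*V(:,1:r)';  % rank-r approximation of F
    fprintf('||F - F_r||_F = %.2e, sigma_{r+1} = %.2e\n', norm(F - Fr, 'fro'), s(r+1));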

Page 55

5. Subspace Iteration

55

Page 56

Subspace iteration and low-rank approximation

Subspace iteration = extension of the power method.

Input: Matrix A ∈ R^{m×n}.
 1: Choose a starting matrix X^(0) ∈ R^{m×k} with (X^(0))^T X^(0) = I_k.
 2: j = 0.
 3: repeat
 4:   Set j := j + 1.
 5:   Compute Y^(j) := A A^T X^(j−1).
 6:   Compute the economy size QR factorization Y^(j) = Q R.
 7:   Set X^(j) := Q.
 8: until convergence is detected

As will soon be seen, this converges to a basis of the dominant subspace U_k (see the sketch below). A low-rank approximation is obtained from

    T_k(A) ≈ X^(j) (X^(j))^T A.

56
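
A hedged MATLAB sketch of the subspace iteration above, with a fixed number of steps instead of a convergence test; the Hilbert matrix and the sizes are illustrative.

    A = hilb(100); k = 5; maxit = 20;    % illustrative test problem
    [X, ~] = qr(randn(size(A,1), k), 0); % random orthonormal start X^(0)
    for j = 1:maxit
        Y = A*(A'*X);                    % Y^(j) = A*A'*X^(j-1)
        [X, ~] = qr(Y, 0);               % economy size QR, X^(j) = Q
    end
    Ak = X*(X'*A);                       % low-rank approximation X^(j)*X^(j)'*A
    s = svd(A);
    fprintf('||A - X*X''*A||_2 = %.2e, sigma_{k+1} = %.2e\n', norm(A - Ak), s(k+1));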

Page 57

Convergence of subspace iteration

Theorem. Consider the SVD A = U Σ V^T and, for k < n, let U_k = span{u_1, . . . , u_k}. Assume that σ_k > σ_{k+1} and θ_1(U_k, 𝒳^(0)) < π/2. Then the iterates 𝒳^(j) = range(X^(j)) of the subspace iteration satisfy

    tan θ_1(U_k, 𝒳^(j)) ≤ (σ_{k+1} / σ_k)^{2j} tan θ_1(U_k, 𝒳^(0)).

Sketch of proof. As the angles do not depend on the choice of bases, we may omit the QR decompositions: X^(j) = (A A^T)^j X^(0). By the SVD of A, we may set A = Σ and hence U_k = span{e_1, . . . , e_k}. Partition

    Σ = [Σ_1, 0; 0, Σ_2],   X^(0) = [X_1^(0); X_2^(0)],   Σ_1, X_1^(0) ∈ R^{k×k}.

The result follows from applying the expression for the tangent of θ_1.

EFY. Complete the details of the proof.

57

Page 58

Numerical experiments

Convergence of subspace iteration for the 100 × 100 Hilbert matrix.
k = 5 ⇒ σ_{k+1}/σ_k = 0.188. Random starting guess.

[Figure: semilogy convergence plot over 20 iterations; vertical axis from 10^{-30} to 10^0.]
Black curve: tan θ_1(U_k, 𝒳^(j))
Blue curve: ‖T_k(A) − X^(j) (X^(j))^T A‖_2
Red curve: ‖A − X^(j) (X^(j))^T A‖_2

58

Page 59

Numerical experiments

Convergence of subspace iteration for a matrix with singular values

    1, 0.99, 0.98, 1/10, 0.99/10, 0.98/10, 1/100, 0.99/100, 0.98/100, . . .

k = 7 ⇒ σ_{k+1}/σ_k = 0.99. Random starting guess.

[Figure: semilogy convergence plot over 20 iterations; vertical axis from 10^{-3} to 10^1.]
Black curve: tan θ_1(U_k, 𝒳^(j))
Blue curve: ‖T_k(A) − X^(j) (X^(j))^T A‖_2
Red curve: ‖A − X^(j) (X^(j))^T A‖_2

59

Page 60

Numerical experiments

Observations:
- The low-rank approximation is sufficiently good (for most purposes) already after 1 iteration.
- Convergence to the dominant subspace can be arbitrarily slow, but this is not relevant here.
- Classical, asymptotic convergence analysis is insufficient.
- A pre-asymptotic analysis needs to take the randomization of the starting guess into account.

60