Contrasting properties of RSVD and LSQR algorithms for solutions of ill-posed problems:
Approximating the SVD
Rosemary Renaut¹, Anthony Helmstetter¹, Saeed Vatankhah²
1: School of Mathematical and Statistical Sciences, Arizona State University, [email protected], [email protected]
2: Institute of Geophysics, University of Tehran, [email protected]
International Conference on Mathematics of Data Science, November 2018
Outline
Background: TSVD surrogate for the small scale
- Standard Approaches to Estimating the Regularization Parameter
- Convergence of the Regularization Parameter for UPRE
- Algorithm Verification
Methods for the Large Scale: Approximating the SVD
- Krylov: Golub-Kahan Bidiagonalization - LSQR
- Randomized SVD
- Simulations: Hybrid RSVD and Hybrid LSQR
Conclusions: RSVD - LSQR
- Main Results
- Relevance to Data Science
Simple Ill-Posed Problem: Image Restoration
[Figure: panels True, Noise, Naive Solution, PSF.]
Mildly ill-posed problem: slow decay of singular values. SNR 13.
Notation: Spectral Decomposition of the Solution: The SVD
Consider general discrete problem
$$Ax = b, \quad A \in \mathbb{R}^{m\times n}, \; b \in \mathbb{R}^m, \; x \in \mathbb{R}^n.$$
Singular value decomposition (SVD) of $A$, rank $r \le \min(m,n)$:
$$A = U\Sigma V^T = \sum_{i=1}^{r} u_i \sigma_i v_i^T, \quad \Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r).$$
Singular values $\sigma_i$, singular vectors $u_i$, $v_i$, rank $r$. Expansion for the solution:
$$x = \sum_{i=1}^{r} \frac{s_i}{\sigma_i}\, v_i, \qquad s_i = u_i^T b$$
Regularization: Filtering and Truncation
Truncated SVD of size $k$ gives the best rank-$k$ approximation to $A$. The surrogate model is $A \approx A_k = U_k\Sigma_k V_k^T$. Filtered and truncated solution:
$$x = \sum_{i=1}^{k} \gamma_i(\alpha)\,\frac{s_i}{\sigma_i}\, v_i$$
Filter factor $\gamma_i(\alpha)$ ($\gamma_i = 0$ when $i > k$). Regularization parameters (a computational sketch follows this list):
- truncation $k$: find the size for the surrogate model.
- regularization parameter $\alpha$ for the hybrid surrogate.
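Once the SVD is available, the filtered, truncated solution is direct to compute. A minimal NumPy sketch, assuming the Tikhonov filter $\gamma_i(\alpha) = \sigma_i^2/(\sigma_i^2 + \alpha^2)$ introduced on the next slide (the function name is illustrative):

```python
import numpy as np

def filtered_tsvd_solution(A, b, k, alpha):
    """x = sum_{i<=k} gamma_i(alpha) * (s_i / sigma_i) * v_i with
    Tikhonov filter factors gamma_i = sigma_i^2 / (sigma_i^2 + alpha^2)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]    # rank-k TSVD surrogate
    coeffs = U.T @ b                         # s_i = u_i^T b
    gamma = s**2 / (s**2 + alpha**2)         # filter factors; gamma_i = 0 for i > k
    return Vt.T @ (gamma * coeffs / s)
```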
Regularization Parameter Estimation: Find $\alpha_{\mathrm{opt}}$ to minimize $F(\alpha)$
Filter function $\gamma_i(\alpha)$ and complement $\phi_i(\alpha)$:
$$\phi_i(\alpha) = \frac{\alpha^2}{\alpha^2 + \sigma_i^2} = 1 - \gamma_i(\alpha), \quad i = 1:r; \qquad \phi_i = 1, \; i > k.$$
Unbiased Predictive Risk (UPRE): minimize the functional, with noise level $\eta^2$:
$$U_k(\alpha) = \sum_{i=1}^{k} \phi_i^2(\alpha)\, s_i^2 - 2\eta^2 \sum_{i=1}^{k} \phi_i(\alpha)$$
GCV: minimize the rational function, with $m^* = \min(m, n)$:
$$G(\alpha) = \frac{\sum_{i=1}^{m^*} \phi_i^2(\alpha)\, s_i^2}{\left(\sum_{i=1}^{m^*} \phi_i(\alpha)\right)^2}$$
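For fixed $k$, both criteria are inexpensive one-dimensional minimizations in $\alpha$ once the SVD coefficients are known. A sketch in NumPy/SciPy (the names upre, gcv, s_coef, and eta2 are illustrative; eta2 is the noise level $\eta^2$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def upre(alpha, s_coef, sigma, eta2):
    """UPRE functional U_k(alpha) over the k = len(sigma) retained terms."""
    phi = alpha**2 / (alpha**2 + sigma**2)   # complement of the filter factor
    return np.sum(phi**2 * s_coef**2) - 2 * eta2 * np.sum(phi)

def gcv(alpha, s_coef, sigma):
    """GCV rational function G(alpha)."""
    phi = alpha**2 / (alpha**2 + sigma**2)
    return np.sum(phi**2 * s_coef**2) / np.sum(phi)**2

# alpha_k = argmin U_k(alpha), minimized on a bracketed interval:
# res = minimize_scalar(upre, bounds=(1e-6, 1.0), method='bounded',
#                       args=(s_coef, sigma, eta2))
```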
How does $\alpha_{\mathrm{opt}} = \operatorname{argmin} F(\alpha)$ depend on $k$?
Convergence of $\alpha_k$ with $k$ for GCV and UPRE: examples from RestoreTools
Different noise levels, GCV and UPRE: Grain and Satellite.
[Figure: convergence of $\alpha_k$ with $k$ for the two test problems.]
Grain: mildly ill-posed, $\sigma_i = \zeta i^{-\tau}$, $\tfrac{1}{2} \le \tau \le 1$. Satellite: moderately ill-posed, $\sigma_i = \zeta i^{-\tau}$, $\tau > 1$.
$\alpha_k$ converges with $k$ and depends on the noise level. This supports the use of the truncated SVD as a surrogate.
Theory: Assumptions
Assumptions (Normalization): The system is normalized so that we may assume $\sigma_1 = 1$.
Assumptions (Decay Rate): The measured coefficients $s_i$ decay according to $|s_i|^2 = \sigma_i^{2(1+\nu)} > \sigma^2$ for $0 < \nu < 1$, $1 \le i \le \ell$, i.e. the dominant measured coefficients follow the decay rate of the exact coefficients.
Assumptions (Noise in Coefficients): There exists $\ell$ such that $E(|s_i|^2) = \sigma^2$ for all $i > \ell$, i.e. the coefficients $s_i$ are noise dominated for $i > \ell$.
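A hypothetical illustration of these assumptions (the values $\tau = 0.8$, $\nu = 0.5$, and $\eta = 10^{-3}$ are invented for the example): simulate coefficients and locate the index $\ell$ beyond which noise dominates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau, eta = 2000, 0.8, 1e-3                 # mildly ill-posed decay, noise std
sigma = np.arange(1, n + 1) ** -tau           # sigma_i = i^{-tau}, sigma_1 = 1
s_exact = sigma ** 1.5                        # |s_i|^2 = sigma_i^{2(1+nu)}, nu = 0.5
s_meas = s_exact + eta * rng.standard_normal(n)

# Noise dominates once the exact coefficients fall below the noise level:
ell = np.argmax(s_exact < eta)
print(f"coefficients are noise dominated beyond i = {ell}")
```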
Theorems on Convergence of αk for UPRE [RHV18]
Theorem: Suppose Assumptions 2 and 3 hold, and that $U_k(\alpha_k)$ is a minimum of $U_k(\alpha)$. Then $\alpha_k > \alpha_\ell > \sigma_{\ell+1}/\sqrt{1-\sigma_{\ell+1}^2} = \alpha_{\min}$ for $k \ge \ell$.
Theorem: Suppose the decay rate and noise assumptions hold, and that $\alpha_{\mathrm{opt}}$ and each $\alpha_k$, $k > \ell$, are unique on $\sigma_{\ell+1}/\sqrt{1-\sigma_{\ell+1}^2} < \alpha < 1$. Then
- $\{\alpha_k\}_{k>\ell}$ is on average increasing, with $\lim_{k\to r} E(\alpha_k) = E(\alpha_{\mathrm{opt}})$.
- $U_k(\alpha_k)$ is increasing.
Theory can be used to estimate k and αk
Comparing Automatic Parameter Estimates by TSVD and SVD
[Box plots of $\alpha$: Truncated vs. Full, panels (a) noise level 1%, (b) noise level 5%, (c) noise level 10%.]
Figure: Box plots comparing parameter estimates $\alpha_k$ with $\alpha_{\mathrm{opt}}$ for problem Satellite, computed from 100 runs at noise levels 1%, 5%, and 10%.
Robust algorithm verifies choice of k and αk with increasing k
Comparing Automatic Relative Errors: TSVD and SVD
[Box plots of relative reconstruction error (RRE): Truncated vs. Full, panels (a) noise level 1%, (b) noise level 5%, (c) noise level 10%.]
Figure: Box plots comparing relative errors using estimated $k$ and $\alpha_k$ for Full and Truncated SVD, for problem Satellite, computed from 100 runs at noise levels 1%, 5%, and 10%.
The surrogate is found automatically, and its error is less than that of the full space.
Truncated Singular Value Decomposition as surrogate for A
Remark (Observations for UPRE)
1. Find $\alpha_k$ for the surrogate TSVD model $A_k = U_k\Sigma_k V_k^T$ with $k$ terms.
2. Determine the optimal $k$ as $\alpha_k$ converges to $\alpha_{\mathrm{opt}}$ (see the sketch below).
3. With UPRE, for large enough $k$ the full problem is regularized: i.e. $\gamma_i(\alpha_k) \approx 0$ for $i > k$.
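The remark suggests an automatic procedure: increase $k$ until $\alpha_k$ stabilizes. A hypothetical sketch (the tolerance, step size, and name choose_k are invented; upre is the sketch given earlier):

```python
from scipy.optimize import minimize_scalar

def choose_k(U, s, b, eta2, k0=10, step=10, tol=1e-2):
    """Grow the TSVD surrogate until alpha_k converges (relative tolerance tol)."""
    alpha_prev = None
    for k in range(k0, len(s) + 1, step):
        coeffs = U[:, :k].T @ b
        res = minimize_scalar(upre, bounds=(1e-6, 1.0), method='bounded',
                              args=(coeffs, s[:k], eta2))
        if alpha_prev is not None and abs(res.x - alpha_prev) < tol * res.x:
            return k, res.x                  # converged: surrogate size found
        alpha_prev = res.x
    return len(s), alpha_prev
```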
Remark (Extending to Large Scale)
- The TSVD for large problems is not feasible.
- Use iterative methods or the randomized SVD to find the surrogate model of $A$.
Large Scale - Hybrid LSQR: a given $k$ defines the range space
LSQR: Let $\beta_1 := \|b\|_2$, and let $e_1^{(k+1)}$ be the first column of $I_{k+1}$. Generate lower bidiagonal $B_k \in \mathbb{R}^{(k+1)\times k}$ and column-orthonormal $H_{k+1} \in \mathbb{R}^{m\times(k+1)}$, $G_k \in \mathbb{R}^{n\times k}$ such that
$$A G_k = H_{k+1} B_k, \qquad \beta_1 H_{k+1} e_1^{(k+1)} = b.$$
Projected problem on the projected space (standard Tikhonov):
$$w_k(\zeta_k) = \operatorname*{argmin}_{w \in \mathbb{R}^k} \|B_k w - \beta_1 e_1^{(k+1)}\|_2^2 + \zeta_k^2 \|w\|_2^2.$$
Projected solution depends on $\zeta_k^{\mathrm{opt}}$: let $B_k = \bar{U}\bar{\Sigma}\bar{V}^T$. Then
$$x_k(\zeta_k^{\mathrm{opt}}) = G_k w_k(\zeta_k^{\mathrm{opt}}) = \beta_1 G_k \sum_{i=1}^{k} \gamma_i(\zeta_k^{\mathrm{opt}})\,\frac{\bar{u}_i^T e_1^{(k+1)}}{\bar{\sigma}_i}\,\bar{v}_i = \sum_{i=1}^{k} \gamma_i(\zeta_k^{\mathrm{opt}})\,\frac{\bar{u}_i^T (H_{k+1}^T b)}{\bar{\sigma}_i}\, G_k \bar{v}_i = \sum_{i=1}^{k} \gamma_i(\zeta_k^{\mathrm{opt}})\,\frac{\bar{s}_i}{\bar{\sigma}_i}\, G_k \bar{v}_i$$
Approximate SVD: $A_k = (H_{k+1}\bar{U})\bar{\Sigma}(G_k\bar{V})^T$.
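A minimal sketch of the underlying Golub-Kahan process, without the reorthogonalization a practical hybrid LSQR implementation would need (golub_kahan is an illustrative name, not a library API):

```python
import numpy as np

def golub_kahan(A, b, k):
    """Lower bidiagonalization A G_k = H_{k+1} B_k, beta_1 H_{k+1} e_1 = b."""
    m, n = A.shape
    H = np.zeros((m, k + 1)); G = np.zeros((n, k)); B = np.zeros((k + 1, k))
    beta = np.linalg.norm(b); H[:, 0] = b / beta
    g = A.T @ H[:, 0]; alpha = np.linalg.norm(g); G[:, 0] = g / alpha
    B[0, 0] = alpha
    for j in range(k):
        h = A @ G[:, j] - alpha * H[:, j]
        beta_j = np.linalg.norm(h); H[:, j + 1] = h / beta_j
        B[j + 1, j] = beta_j
        if j + 1 < k:
            g = A.T @ H[:, j + 1] - beta_j * G[:, j]
            alpha = np.linalg.norm(g); G[:, j + 1] = g / alpha
            B[j + 1, j + 1] = alpha
    return H, B, G

# Approximate SVD of A: A_k = (H Ubar) Sbar (G Vbar)^T
# H, B, G = golub_kahan(A, b, k)
# Ubar, sbar, Vbart = np.linalg.svd(B, full_matrices=False)
# U_approx, V_approx = H @ Ubar, G @ Vbart.T
```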
Hybrid Randomized Singular Value Decomposition: Prototype [HMT11]
Given $A \in \mathbb{R}^{m\times n}$, target rank $k$, oversampling parameter $p$ with $k + p \ll m$, and power factor $q$, compute $A \approx A_k = \hat{U}_k\hat{\Sigma}_k\hat{V}_k^T$.
1. Generate a Gaussian random matrix $\Omega \in \mathbb{R}^{n\times(k+p)}$.
2. Compute $Y = A\Omega \in \mathbb{R}^{m\times(k+p)}$; $Y = \mathrm{orth}(Y)$.
3. If $q > 0$, repeat $q$ times: $Y = A(A^T Y)$, $Y = \mathrm{orth}(Y)$. (Power iteration)
4. Form $B = Y^T A \in \mathbb{R}^{(k+p)\times n}$. (Set $Q = Y$.)
5. Economy SVD $B = U_B\Sigma_B V_B^T$, $U_B \in \mathbb{R}^{(k+p)\times(k+p)}$, $V_B \in \mathbb{R}^{n\times(k+p)}$.
6. $\hat{U}_k = Q\,U_B(:, 1{:}k)$, $\hat{V}_k = V_B(:, 1{:}k)$, $\hat{\Sigma}_k = \Sigma_B(1{:}k, 1{:}k)$.
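A direct NumPy transcription of this prototype, with np.linalg.qr standing in for orth(·) (rsvd is an illustrative name):

```python
import numpy as np

def rsvd(A, k, p=10, q=0, rng=np.random.default_rng()):
    """Prototype RSVD: target rank k, oversampling p, power factor q."""
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # 1: Gaussian test matrix
    Y, _ = np.linalg.qr(A @ Omega)            # 2: Y = orth(A Omega)
    for _ in range(q):                        # 3: power iteration
        Y, _ = np.linalg.qr(A @ (A.T @ Y))
    B = Y.T @ A                               # 4: (k+p) x n, with Q = Y
    Ub, sb, Vbt = np.linalg.svd(B, full_matrices=False)  # 5: economy SVD
    return (Y @ Ub)[:, :k], sb[:k], Vbt[:k, :]           # 6: truncate to rank k
```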
Projected RSVD Problem
$$x_k(\mu_k) = \operatorname*{argmin}_{x\in\mathbb{R}^n} \|A_k x - b\|_2^2 + \mu_k^2\|x\|_2^2 = \sum_{i=1}^{k} \gamma_i(\mu_k)\,\frac{\hat{u}_i^T b}{\hat{\sigma}_i}\,\hat{v}_i = \sum_{i=1}^{k} \gamma_i(\mu_k)\,\frac{\hat{s}_i}{\hat{\sigma}_i}\,\hat{v}_i.$$
Approximate SVD: $A_k = \hat{U}_k\hat{\Sigma}_k\hat{V}_k^T$.
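Putting the pieces together, a hypothetical hybrid RSVD pipeline combining the rsvd and upre sketches above, with $\mu_k$ selected by UPRE on the projected coefficients (A, b, and eta2 are assumed given):

```python
from scipy.optimize import minimize_scalar

Uk, sk, Vkt = rsvd(A, k=100, p=10, q=2)       # surrogate SVD of A
coeffs = Uk.T @ b                             # hat{s}_i = hat{u}_i^T b
res = minimize_scalar(upre, bounds=(1e-6, 1.0), method='bounded',
                      args=(coeffs, sk, eta2))
mu = res.x                                    # mu_k from UPRE
gamma = sk**2 / (sk**2 + mu**2)               # Tikhonov filter factors
x_k = Vkt.T @ (gamma * coeffs / sk)           # regularized solution
```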
Summary Comparisons: rank-$k$ approximation of $A$
RSVD and LSQR provide approximate TSVD (see references)
|  | TSVD | LSQR | RSVD |
|---|---|---|---|
| Model | $A_k$ | $A_k$ | $A_k$ |
| SVD | $U_k\Sigma_k V_k^T$ | $(H_{k+1}\bar{U})\bar{\Sigma}(G_k\bar{V})^T$ | $\hat{U}_k\hat{\Sigma}_k\hat{V}_k^T$ |
| Terms | $k$ | $k$ | $k$ |
| $s_i$ | $u_i^T b$ | $(H_{k+1}\bar{U})_i^T b$ | $\hat{u}_i^T b$ |
| Basis | $v_i$ | $(G_k\bar{V})_i$ | $\hat{v}_i$ |
| Coeff | $\gamma_i(\alpha_k)\frac{s_i}{\sigma_i}v_i$ | $\gamma_i(\zeta_k)\frac{\bar{s}_i}{\bar{\sigma}_i}(G_k\bar{v})_i$ | $\gamma_i(\mu_k)\frac{\hat{s}_i}{\hat{\sigma}_i}\hat{v}_i$ |
| $\|A - A_k\|$ | $\sigma_{k+1}$ | Theorem for $A_k$ | Theorem for $A_k$ |
| $\sin\angle(V_k, \hat{V}_k)$ | Golub [GvL96] | Jia [Jia17] | Saibaba [Sai] |
Accuracy depends on the surrogate model?
Relative Errors using Approximate LSQR/RSVD with oversampling
[Plot: relative errors vs. $k$ (log-log) for LSQR and RSVD with oversampling 0%, 5%, 10%, 20%, 80%, 100%, compared with TSVD.]
OS LSQR conquers semi-convergence for small k.
Hybrid LSQR: LSQR with regularization
[Plot: relative errors for regularized solutions vs. $k$, LSQR with oversampling 0%, 5%, 10%, 20%, 80%, 100%, compared with TSVD.]
Relative Errors less than TSVD for small k
Hybrid RSVD: RSVD with regularization
[Plot: relative errors for regularized solutions vs. $k$, RSVD with power factor $q = 2$ and oversampling 0%, 5%, 10%, 20%, 80%, 100%, compared with TSVD.]
Relative Errors larger than TSVD for small k
Questions to address
1. Both algorithms show semi-convergence.
2. But what is happening with RSVD accuracy?
3. Why is OS for LSQR effective?
4. What is the relation of $\alpha_k$, $\zeta_k$, $\mu_k$?
5. Can the automatic algorithm be applied?
Investigate the surrogate approximation for RSVD and LSQR
Contrasting RSVD and LSQR spectrum : Mildly Ill-posed
Figure: RSVD: good approximation of the dominant singular values for a problem of size $4096\times 4096$ using the RSVD algorithm with 100% oversampling, compared to the exact singular values of the problem.
[Plot: true singular values and RSVD estimates for $k$ = 10, 20, 50, 100, 200, 400, 500, 750, 1000, 2000.]
Contrasting RSVD and LSQR spectrum : Mildly Ill-posed
Figure: LSQR: good approximation of fewer dominant singular values for a problem of size $4096\times 4096$ using the LSQR algorithm with a Krylov subspace of size $k$, compared to the exact singular values of the problem.
Contrasting RSVD and LSQR spectrum : Mildly Ill-posed
Figure: LSQR: good approximation of fewer dominant singular values for a problem of size $4096\times 4096$ using the LSQR algorithm with a Krylov subspace of size $k$, compared to the exact singular values of the problem. Oversampled 100%.
The LSQR / RSVD spectrum
- The Lanczos algorithm provides good estimates of extremal singular values.
- LSQR exhibits semi-convergence as a result.
- LSQR interior eigenvalue approximations improve with increasing $k$; approximations stabilize with increasing $k$.
- RSVD approximates the dominant singular values but does not capture the ill-conditioning (see the sketch after this list).
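These spectral claims are easy to probe numerically; a sketch comparing the three approximate spectra, using the golub_kahan and rsvd sketches above (A and b assumed given):

```python
import numpy as np

H, B, G = golub_kahan(A, b, 100)              # LSQR: Ritz values of B_k
s_lsqr = np.linalg.svd(B, compute_uv=False)   # extremal values converge first
_, s_rsvd, _ = rsvd(A, k=100, p=100, q=0)     # RSVD with 100% oversampling
s_true = np.linalg.svd(A, compute_uv=False)[:100]  # reference, small problems only
```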
Contrast RSVD-LSQR: singular space approximation with / without OS
Figure: Rank-$k$ approximation error, RSVD power with $q = 2$.
[Plot: rank-$k$ approximation errors vs. $k$ for LSQR, RSVD with oversampling 0%, 5%, 10%, 20%, 80%, 100%, and TSVD.]
Power Iteration assists error reduction.
Contrast RSVD-LSQR: singular space approximation with / without OS
Figure: Rank-$k$ approximation error, RSVD $q = 2$ and OS LSQR.
[Plot: rank-$k$ approximation errors vs. $k$ for LSQR with oversampling 0%, 5%, 10%, 20%, 80%, 100%, RSVD with 5% oversampling, and TSVD.]
Oversampling LSQR improves rank k estimate
Contrasting Subspace Canonical Angles : Mildly Ill-posed
Figure: RSVD: the canonical angles increase exponentially from subspace $j$ to subspace $k$ for the $4096\times 4096$ problem using the RSVD algorithm, and decrease with OS. Example size $k = 400$.
[Plot: canonical angles for $p$ = 0%, 5%, 10%, 20%, 80%, 100%.]
Contrasting Subspace Canonical Angles : Mildly Ill-posed
Figure: RSVD with power iteration $q = 2$: the canonical angles increase exponentially from subspace $j$ to subspace $k$ for the $4096\times 4096$ problem using the RSVD algorithm, and decrease with OS. Example size $k = 400$.
[Plot: canonical angles (log scale) for $p$ = 0%, 5%, 10%, 20%, 80%, 100%.]
Contrasting Subspace Canonical Angles : Mildly Ill-posed
Figure: LSQR: the canonical angles increase after some subspace size $j^*$ to subspace $k$ for the $4096\times 4096$ problem using the LSQR algorithm. Example size $k = 400$.
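The canonical angles shown in these figures can be computed directly. A sketch using scipy.linalg.subspace_angles, assuming A and the rsvd sketch from earlier (the full SVD reference is only feasible at this experimental scale):

```python
import numpy as np
from scipy.linalg import subspace_angles

k = 400
_, _, Vt = np.linalg.svd(A, full_matrices=False)   # exact right singular vectors
_, _, Vkt = rsvd(A, k=k, p=0, q=0)                 # surrogate basis, no oversampling
angles = subspace_angles(Vt[:k, :].T, Vkt.T)       # radians, descending order
print(angles.max())       # large angle: subspace poorly captured by the surrogate
```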
IMPACT: V Basis Matrices (2D) - Lower basis vectors
[Images: the first four right singular basis vectors $v_1, \dots, v_4$ as 2D images for LSQR, RSVD, and RSVD $q = 2$, at $k = 100$ and $k = 400$, $p = 100\%$.]
IMPACT: V Basis Matrices (2D) - Lower basis vectors
[Images: the first four right singular basis vectors $v_1, \dots, v_4$ as 2D images for LSQR, RSVD, and RSVD $q = 2$, at $k = 750$ and $k = 2000$, $p = 100\%$.]
Observations: LSQR and RSVD
1. LSQR: semi-convergence.
2. OS LSQR: overcomes semi-convergence.
3. RSVD has smaller rank-$k$ error than LSQR.
4. BUT RSVD does not capture the rank-$k$ subspace from a $k + p$ estimate as well as LSQR: the canonical angles are larger.
5. Plots of the basis support the reduced accuracy of the RSVD subspaces.
Restored solutions at optimal $k$ = 750 and $k$ = 50 for RSVD and LSQR, respectively
Restored Regularized Solutions: noise level with SNR ≈ 13
Figure: LSQR, $k = 50$; panels $p$ = 0%, 10%, 20%, 80%.
Restored Regularized Solutions: noise level with SNR ≈ 13
Figure: RSVD, $k = 50$, $q = 2$; panels $p$ = 0%, 10%, 20%, 80%.
Restored Regularized Solutions: noise level with SNR ≈ 13
Figure: LSQR, $k = 750$; panels $p$ = 0%, 10%, 20%, 80%.
Restored Regularized Solutions: noise level with SNR ≈ 13
Figure: RSVD, $k = 750$, $q = 2$; panels $p$ = 0%, 10%, 20%, 80%.
Overview Conclusions
Dominant Subspace: Finding the dominant singular space of the model matrix is important: oversampling helps.
RSVD / LSQR: Trade-offs depend on the speed with which the singular values decrease (the degree of ill-posedness).
Cost: While LSQR costs more per iteration, it provides the dominant subspace more accurately for small $k$.
Hybrid: Implementations stabilize the solution errors.
Future: Investigate the transfer of noise to the RSVD subspace, which is apparently inaccurate.
Relevance to Data Science
Remark (Messages of the Analysis)
- SVD plays a role in the analysis of large datasets?
- Impact of approximating the spectrum by surrogates?
- Important to understand the impact of noise on the spectrum.
- Important to analyze the methods.
Some key references
[GvL96] Gene H. Golub and Charles F. van Loan. Matrix Computations. Johns Hopkins Press, Baltimore, 3rd edition, 1996.
[HMT11] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, 2011.
[Jia17] Zhongxiao Jia. The regularization theory of the Krylov iterative solvers LSQR and CGLS for linear discrete ill-posed problems, part I: the simple singular value case. https://arxiv.org/abs/1701.05708, January 2017.
[RHV18] R. A. Renaut, A. Helmstetter, and S. Vatankhah. Convergence of regularization parameters for solutions using the filtered truncated singular value decomposition. http://arxiv.org/abs/1809.00249, 2018. Submitted.
[Sai] Arvind K. Saibaba. Analysis of randomized subspace iteration: Canonical angles and unitarily invariant norms. http://arxiv.org/abs/1804.02614.