Contrasting properties of RSVD and LSQR algorithms for solutions of ill-posed problems:
Approximating the SVD
Rosemary Renaut¹, Anthony Helmstetter¹, Saeed Vatankhah²
1: School of Mathematical and Statistical Sciences, Arizona State University, [email protected], [email protected]
2: Institute of Geophysics, University of Tehran, [email protected]
International Conference on Mathematics of Data Science, November 2018
Outline
Background: TSVD surrogate for the small scale
- Standard Approaches to Estimating the Regularization Parameter
- Convergence of the Regularization Parameter for UPRE
- Algorithm Verification
Methods for the Large Scale: Approximating the SVD
- Krylov: Golub-Kahan Bidiagonalization - LSQR
- Randomized SVD
- Simulations: Hybrid RSVD and Hybrid LSQR
Conclusions: RSVD - LSQR
- Main Results
- Relevance to Data Science
Simple Ill-Posed Problem: Image Restoration
[Figure: panels True, Noise, Naive Solution, PSF.]
Mildly ill-posed problem: slow decay of singular values. SNR 13.
Notation: Spectral Decomposition of the Solution: The SVD
Consider general discrete problem
$$Ax = b, \quad A \in \mathbb{R}^{m\times n}, \; b \in \mathbb{R}^m, \; x \in \mathbb{R}^n.$$
Singular value decomposition (SVD) of $A$, rank $r \le \min(m,n)$:
$$A = U\Sigma V^T = \sum_{i=1}^{r} u_i \sigma_i v_i^T, \quad \Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r).$$
Singular values $\sigma_i$, singular vectors $u_i$, $v_i$, rank $r$. Expansion for the solution:
$$x = \sum_{i=1}^{r} \frac{s_i}{\sigma_i}\, v_i, \qquad s_i = u_i^T b$$
Regularization: Filtering and Truncation
Truncated SVD of size $k$ gives the best rank-$k$ approximation to $A$. The surrogate model is $A \approx A_k = U_k\Sigma_k V_k^T$. Filtered and truncated solution:
$$x = \sum_{i=1}^{k} \gamma_i(\alpha)\,\frac{s_i}{\sigma_i}\, v_i$$
Filter factor $\gamma_i(\alpha)$ ($\gamma_i = 0$ when $i > k$). Regularization parameters (a computational sketch follows this list):
- truncation $k$: find the size for the surrogate model.
- regularization parameter $\alpha$ for the hybrid surrogate.
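Once the SVD is available, the filtered, truncated solution is direct to compute. A minimal NumPy sketch, assuming the Tikhonov filter $\gamma_i(\alpha) = \sigma_i^2/(\sigma_i^2 + \alpha^2)$ introduced on the next slide (the function name is illustrative):

```python
import numpy as np

def filtered_tsvd_solution(A, b, k, alpha):
    """x = sum_{i<=k} gamma_i(alpha) * (s_i / sigma_i) * v_i with
    Tikhonov filter factors gamma_i = sigma_i^2 / (sigma_i^2 + alpha^2)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]    # rank-k TSVD surrogate
    coeffs = U.T @ b                         # s_i = u_i^T b
    gamma = s**2 / (s**2 + alpha**2)         # filter factors; gamma_i = 0 for i > k
    return Vt.T @ (gamma * coeffs / s)
```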
Regularization Parameter Estimation: Find $\alpha_{\mathrm{opt}}$ to minimize $F(\alpha)$
Filter function $\gamma_i(\alpha)$ and complement $\phi_i(\alpha)$:
$$\phi_i(\alpha) = \frac{\alpha^2}{\alpha^2 + \sigma_i^2} = 1 - \gamma_i(\alpha), \quad i = 1:r; \qquad \phi_i = 1, \; i > k.$$
Unbiased Predictive Risk (UPRE): minimize the functional, with noise level $\eta^2$:
$$U_k(\alpha) = \sum_{i=1}^{k} \phi_i^2(\alpha)\, s_i^2 - 2\eta^2 \sum_{i=1}^{k} \phi_i(\alpha)$$
GCV: minimize the rational function, with $m^* = \min(m, n)$:
$$G(\alpha) = \frac{\sum_{i=1}^{m^*} \phi_i^2(\alpha)\, s_i^2}{\left(\sum_{i=1}^{m^*} \phi_i(\alpha)\right)^2}$$
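For fixed $k$, both criteria are inexpensive one-dimensional minimizations in $\alpha$ once the SVD coefficients are known. A sketch in NumPy/SciPy (the names upre, gcv, s_coef, and eta2 are illustrative; eta2 is the noise level $\eta^2$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def upre(alpha, s_coef, sigma, eta2):
    """UPRE functional U_k(alpha) over the k = len(sigma) retained terms."""
    phi = alpha**2 / (alpha**2 + sigma**2)   # complement of the filter factor
    return np.sum(phi**2 * s_coef**2) - 2 * eta2 * np.sum(phi)

def gcv(alpha, s_coef, sigma):
    """GCV rational function G(alpha)."""
    phi = alpha**2 / (alpha**2 + sigma**2)
    return np.sum(phi**2 * s_coef**2) / np.sum(phi)**2

# alpha_k = argmin U_k(alpha), minimized on a bracketed interval:
# res = minimize_scalar(upre, bounds=(1e-6, 1.0), method='bounded',
#                       args=(s_coef, sigma, eta2))
```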
How does $\alpha_{\mathrm{opt}} = \operatorname{argmin} F(\alpha)$ depend on $k$?
Convergence of $\alpha_k$ with $k$ for GCV and UPRE: examples from RestoreTools
Different noise levels, GCV and UPRE: Grain and Satellite.
[Figure: convergence of $\alpha_k$ with $k$ for the two test problems.]
Grain: mildly ill-posed, $\sigma_i = \zeta i^{-\tau}$, $\tfrac{1}{2} \le \tau \le 1$. Satellite: moderately ill-posed, $\sigma_i = \zeta i^{-\tau}$, $\tau > 1$.
$\alpha_k$ converges with $k$ and depends on the noise level. This supports the use of the truncated SVD as a surrogate.
Theory: Assumptions
Assumptions (Normalization): The system is normalized so that we may assume $\sigma_1 = 1$.
Assumptions (Decay Rate): The measured coefficients $s_i$ decay according to $|s_i|^2 = \sigma_i^{2(1+\nu)} > \sigma^2$ for $0 < \nu < 1$, $1 \le i \le \ell$, i.e. the dominant measured coefficients follow the decay rate of the exact coefficients.
Assumptions (Noise in Coefficients): There exists $\ell$ such that $E(|s_i|^2) = \sigma^2$ for all $i > \ell$, i.e. the coefficients $s_i$ are noise dominated for $i > \ell$.
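A hypothetical illustration of these assumptions (the values $\tau = 0.8$, $\nu = 0.5$, and $\eta = 10^{-3}$ are invented for the example): simulate coefficients and locate the index $\ell$ beyond which noise dominates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau, eta = 2000, 0.8, 1e-3                 # mildly ill-posed decay, noise std
sigma = np.arange(1, n + 1) ** -tau           # sigma_i = i^{-tau}, sigma_1 = 1
s_exact = sigma ** 1.5                        # |s_i|^2 = sigma_i^{2(1+nu)}, nu = 0.5
s_meas = s_exact + eta * rng.standard_normal(n)

# Noise dominates once the exact coefficients fall below the noise level:
ell = np.argmax(s_exact < eta)
print(f"coefficients are noise dominated beyond i = {ell}")
```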
Theorems on Convergence of αk for UPRE [RHV18]
Theorem: Suppose Assumptions 2 and 3 hold, and that $U_k(\alpha_k)$ is a minimum of $U_k(\alpha)$. Then $\alpha_k > \alpha_\ell > \sigma_{\ell+1}/\sqrt{1-\sigma_{\ell+1}^2} = \alpha_{\min}$ for $k \ge \ell$.
Theorem: Suppose the decay rate and noise assumptions hold, and that $\alpha_{\mathrm{opt}}$ and each $\alpha_k$, $k > \ell$, are unique on $\sigma_{\ell+1}/\sqrt{1-\sigma_{\ell+1}^2} < \alpha < 1$. Then
- $\{\alpha_k\}_{k>\ell}$ is on average increasing, with $\lim_{k\to r} E(\alpha_k) = E(\alpha_{\mathrm{opt}})$.
- $U_k(\alpha_k)$ is increasing.
Theory can be used to estimate k and αk
Comparing Automatic Parameter Estimates by TSVD and SVD
[Box plots of $\alpha$: Truncated vs. Full, panels (a) noise level 1%, (b) noise level 5%, (c) noise level 10%.]
Figure: Box plots comparing parameter estimates $\alpha_k$ with $\alpha_{\mathrm{opt}}$ for problem Satellite, computed from 100 runs at noise levels 1%, 5%, and 10%.
Robust algorithm verifies choice of k and αk with increasing k
Comparing Automatic Relative Errors: TSVD and SVD
[Box plots of relative reconstruction error (RRE): Truncated vs. Full, panels (a) noise level 1%, (b) noise level 5%, (c) noise level 10%.]
Figure: Box plots comparing relative errors using estimated $k$ and $\alpha_k$ for Full and Truncated SVD, for problem Satellite, computed from 100 runs at noise levels 1%, 5%, and 10%.
The surrogate is found automatically, and its error is less than that of the full space.
Truncated Singular Value Decomposition as surrogate for A
Remark (Observations for UPRE)
1. Find $\alpha_k$ for the surrogate TSVD model $A_k = U_k\Sigma_k V_k^T$ with $k$ terms.
2. Determine the optimal $k$ as $\alpha_k$ converges to $\alpha_{\mathrm{opt}}$ (see the sketch below).
3. With UPRE, for large enough $k$ the full problem is regularized: i.e. $\gamma_i(\alpha_k) \approx 0$ for $i > k$.
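The remark suggests an automatic procedure: increase $k$ until $\alpha_k$ stabilizes. A hypothetical sketch (the tolerance, step size, and name choose_k are invented; upre is the sketch given earlier):

```python
from scipy.optimize import minimize_scalar

def choose_k(U, s, b, eta2, k0=10, step=10, tol=1e-2):
    """Grow the TSVD surrogate until alpha_k converges (relative tolerance tol)."""
    alpha_prev = None
    for k in range(k0, len(s) + 1, step):
        coeffs = U[:, :k].T @ b
        res = minimize_scalar(upre, bounds=(1e-6, 1.0), method='bounded',
                              args=(coeffs, s[:k], eta2))
        if alpha_prev is not None and abs(res.x - alpha_prev) < tol * res.x:
            return k, res.x                  # converged: surrogate size found
        alpha_prev = res.x
    return len(s), alpha_prev
```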
Remark (Extending to Large Scale)
- The TSVD for large problems is not feasible.
- Use iterative methods or the randomized SVD to find the surrogate model of $A$.
Large Scale - Hybrid LSQR: a given $k$ defines the range space
LSQR: Let $\beta_1 := \|b\|_2$, and let $e_1^{(k+1)}$ be the first column of $I_{k+1}$. Generate lower bidiagonal $B_k \in \mathbb{R}^{(k+1)\times k}$ and column-orthonormal $H_{k+1} \in \mathbb{R}^{m\times(k+1)}$, $G_k \in \mathbb{R}^{n\times k}$ such that
$$A G_k = H_{k+1} B_k, \qquad \beta_1 H_{k+1} e_1^{(k+1)} = b.$$
Projected problem on the projected space (standard Tikhonov):
$$w_k(\zeta_k) = \operatorname*{argmin}_{w \in \mathbb{R}^k} \|B_k w - \beta_1 e_1^{(k+1)}\|_2^2 + \zeta_k^2 \|w\|_2^2.$$
Projected solution depends on $\zeta_k^{\mathrm{opt}}$: let $B_k = \bar{U}\bar{\Sigma}\bar{V}^T$. Then
$$x_k(\zeta_k^{\mathrm{opt}}) = G_k w_k(\zeta_k^{\mathrm{opt}}) = \beta_1 G_k \sum_{i=1}^{k} \gamma_i(\zeta_k^{\mathrm{opt}})\,\frac{\bar{u}_i^T e_1^{(k+1)}}{\bar{\sigma}_i}\,\bar{v}_i = \sum_{i=1}^{k} \gamma_i(\zeta_k^{\mathrm{opt}})\,\frac{\bar{u}_i^T (H_{k+1}^T b)}{\bar{\sigma}_i}\, G_k \bar{v}_i = \sum_{i=1}^{k} \gamma_i(\zeta_k^{\mathrm{opt}})\,\frac{\bar{s}_i}{\bar{\sigma}_i}\, G_k \bar{v}_i$$
Approximate SVD: $A_k = (H_{k+1}\bar{U})\bar{\Sigma}(G_k\bar{V})^T$.
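A minimal sketch of the underlying Golub-Kahan process, without the reorthogonalization a practical hybrid LSQR implementation would need (golub_kahan is an illustrative name, not a library API):

```python
import numpy as np

def golub_kahan(A, b, k):
    """Lower bidiagonalization A G_k = H_{k+1} B_k, beta_1 H_{k+1} e_1 = b."""
    m, n = A.shape
    H = np.zeros((m, k + 1)); G = np.zeros((n, k)); B = np.zeros((k + 1, k))
    beta = np.linalg.norm(b); H[:, 0] = b / beta
    g = A.T @ H[:, 0]; alpha = np.linalg.norm(g); G[:, 0] = g / alpha
    B[0, 0] = alpha
    for j in range(k):
        h = A @ G[:, j] - alpha * H[:, j]
        beta_j = np.linalg.norm(h); H[:, j + 1] = h / beta_j
        B[j + 1, j] = beta_j
        if j + 1 < k:
            g = A.T @ H[:, j + 1] - beta_j * G[:, j]
            alpha = np.linalg.norm(g); G[:, j + 1] = g / alpha
            B[j + 1, j + 1] = alpha
    return H, B, G

# Approximate SVD of A: A_k = (H Ubar) Sbar (G Vbar)^T
# H, B, G = golub_kahan(A, b, k)
# Ubar, sbar, Vbart = np.linalg.svd(B, full_matrices=False)
# U_approx, V_approx = H @ Ubar, G @ Vbart.T
```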
Hybrid Randomized Singular Value Decomposition: Prototype [HMT11]
Given $A \in \mathbb{R}^{m\times n}$, target rank $k$, oversampling parameter $p$ with $k + p \ll m$, and power factor $q$, compute $A \approx A_k = \hat{U}_k\hat{\Sigma}_k\hat{V}_k^T$.
1. Generate a Gaussian random matrix $\Omega \in \mathbb{R}^{n\times(k+p)}$.
2. Compute $Y = A\Omega \in \mathbb{R}^{m\times(k+p)}$; $Y = \mathrm{orth}(Y)$.
3. If $q > 0$, repeat $q$ times: $Y = A(A^T Y)$, $Y = \mathrm{orth}(Y)$. (Power iteration)
4. Form $B = Y^T A \in \mathbb{R}^{(k+p)\times n}$. (Set $Q = Y$.)
5. Economy SVD $B = U_B\Sigma_B V_B^T$, $U_B \in \mathbb{R}^{(k+p)\times(k+p)}$, $V_B \in \mathbb{R}^{n\times(k+p)}$.
6. $\hat{U}_k = Q\,U_B(:, 1{:}k)$, $\hat{V}_k = V_B(:, 1{:}k)$, $\hat{\Sigma}_k = \Sigma_B(1{:}k, 1{:}k)$.
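A direct NumPy transcription of this prototype, with np.linalg.qr standing in for orth(·) (rsvd is an illustrative name):

```python
import numpy as np

def rsvd(A, k, p=10, q=0, rng=np.random.default_rng()):
    """Prototype RSVD: target rank k, oversampling p, power factor q."""
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # 1: Gaussian test matrix
    Y, _ = np.linalg.qr(A @ Omega)            # 2: Y = orth(A Omega)
    for _ in range(q):                        # 3: power iteration
        Y, _ = np.linalg.qr(A @ (A.T @ Y))
    B = Y.T @ A                               # 4: (k+p) x n, with Q = Y
    Ub, sb, Vbt = np.linalg.svd(B, full_matrices=False)  # 5: economy SVD
    return (Y @ Ub)[:, :k], sb[:k], Vbt[:k, :]           # 6: truncate to rank k
```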
Projected RSVD Problem
$$x_k(\mu_k) = \operatorname*{argmin}_{x\in\mathbb{R}^n} \|A_k x - b\|_2^2 + \mu_k^2\|x\|_2^2 = \sum_{i=1}^{k} \gamma_i(\mu_k)\,\frac{\hat{u}_i^T b}{\hat{\sigma}_i}\,\hat{v}_i = \sum_{i=1}^{k} \gamma_i(\mu_k)\,\frac{\hat{s}_i}{\hat{\sigma}_i}\,\hat{v}_i.$$
Approximate SVD: $A_k = \hat{U}_k\hat{\Sigma}_k\hat{V}_k^T$.
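Putting the pieces together, a hypothetical hybrid RSVD pipeline combining the rsvd and upre sketches above, with $\mu_k$ selected by UPRE on the projected coefficients (A, b, and eta2 are assumed given):

```python
from scipy.optimize import minimize_scalar

Uk, sk, Vkt = rsvd(A, k=100, p=10, q=2)       # surrogate SVD of A
coeffs = Uk.T @ b                             # hat{s}_i = hat{u}_i^T b
res = minimize_scalar(upre, bounds=(1e-6, 1.0), method='bounded',
                      args=(coeffs, sk, eta2))
mu = res.x                                    # mu_k from UPRE
gamma = sk**2 / (sk**2 + mu**2)               # Tikhonov filter factors
x_k = Vkt.T @ (gamma * coeffs / sk)           # regularized solution
```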
Summary Comparisons: rank-$k$ approximation of $A$
RSVD and LSQR provide approximate TSVD (see references)
|  | TSVD | LSQR | RSVD |
|---|---|---|---|
| Model | $A_k$ | $A_k$ | $A_k$ |
| SVD | $U_k\Sigma_k V_k^T$ | $(H_{k+1}\bar{U})\bar{\Sigma}(G_k\bar{V})^T$ | $\hat{U}_k\hat{\Sigma}_k\hat{V}_k^T$ |
| Terms | $k$ | $k$ | $k$ |
| $s_i$ | $u_i^T b$ | $(H_{k+1}\bar{U})_i^T b$ | $\hat{u}_i^T b$ |
| Basis | $v_i$ | $(G_k\bar{V})_i$ | $\hat{v}_i$ |
| Coeff | $\gamma_i(\alpha_k)\frac{s_i}{\sigma_i}v_i$ | $\gamma_i(\zeta_k)\frac{\bar{s}_i}{\bar{\sigma}_i}(G_k\bar{v})_i$ | $\gamma_i(\mu_k)\frac{\hat{s}_i}{\hat{\sigma}_i}\hat{v}_i$ |
| $\|A - A_k\|$ | $\sigma_{k+1}$ | Theorem for $A_k$ | Theorem for $A_k$ |
| $\sin\angle(V_k, \hat{V}_k)$ | Golub [GvL96] | Jia [Jia17] | Saibaba [Sai] |
Accuracy depends on the surrogate model?
Relative Errors using Approximate LSQR/RSVD with oversampling
[Plot: relative errors vs. $k$ (log-log) for LSQR and RSVD with oversampling 0%, 5%, 10%, 20%, 80%, 100%, compared with TSVD.]
OS LSQR conquers semi-convergence for small k.
Hybrid LSQR: LSQR with regularization
[Plot: relative errors for regularized solutions vs. $k$, LSQR with oversampling 0%, 5%, 10%, 20%, 80%, 100%, compared with TSVD.]
Relative Errors less than TSVD for small k
Hybrid RSVD: RSVD with regularization
[Plot: relative errors for regularized solutions vs. $k$, RSVD with power factor $q = 2$ and oversampling 0%, 5%, 10%, 20%, 80%, 100%, compared with TSVD.]
Relative Errors larger than TSVD for small k
Questions to address
1. Both algorithms show semi-convergence.
2. But what is happening with RSVD accuracy?
3. Why is OS for LSQR effective?
4. What is the relation of $\alpha_k$, $\zeta_k$, $\mu_k$?
5. Can the automatic algorithm be applied?
Investigate the surrogate approximation for RSVD and LSQR
Contrasting RSVD and LSQR spectrum : Mildly Ill-posed
Figure: RSVD: good approximation of the dominant singular values for a problem of size $4096\times 4096$ using the RSVD algorithm with 100% oversampling, compared to the exact singular values of the problem.
[Plot: true singular values and RSVD estimates for $k$ = 10, 20, 50, 100, 200, 400, 500, 750, 1000, 2000.]
Contrasting RSVD and LSQR spectrum : Mildly Ill-posed
Figure: LSQR: good approximation of fewer dominant singular values for a problem of size $4096\times 4096$ using the LSQR algorithm with a Krylov subspace of size $k$, compared to the exact singular values of the problem.
Contrasting RSVD and LSQR spectrum : Mildly Ill-posed
Figure: LSQR: good approximation of fewer dominant singular values for a problem of size $4096\times 4096$ using the LSQR algorithm with a Krylov subspace of size $k$, compared to the exact singular values of the problem. Oversampled 100%.
The LSQR / RSVD spectrum
- The Lanczos algorithm provides good estimates of extremal singular values.
- LSQR exhibits semi-convergence as a result.
- LSQR interior eigenvalue approximations improve with increasing $k$; approximations stabilize with increasing $k$.
- RSVD approximates the dominant singular values but does not capture the ill-conditioning (see the sketch after this list).
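These spectral claims are easy to probe numerically; a sketch comparing the three approximate spectra, using the golub_kahan and rsvd sketches above (A and b assumed given):

```python
import numpy as np

H, B, G = golub_kahan(A, b, 100)              # LSQR: Ritz values of B_k
s_lsqr = np.linalg.svd(B, compute_uv=False)   # extremal values converge first
_, s_rsvd, _ = rsvd(A, k=100, p=100, q=0)     # RSVD with 100% oversampling
s_true = np.linalg.svd(A, compute_uv=False)[:100]  # reference, small problems only
```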
Contrast RSVD-LSQR: singular space approximation with / without OS
Figure: Rank-$k$ approximation error, RSVD power with $q = 2$.
[Plot: rank-$k$ approximation errors vs. $k$ for LSQR, RSVD with oversampling 0%, 5%, 10%, 20%, 80%, 100%, and TSVD.]
Power Iteration assists error reduction.
Contrast RSVD-LSQR: singular space approximation with / without OS
Figure: Rank-$k$ approximation error, RSVD $q = 2$ and OS LSQR.
[Plot: rank-$k$ approximation errors vs. $k$ for LSQR with oversampling 0%, 5%, 10%, 20%, 80%, 100%, RSVD with 5% oversampling, and TSVD.]
Oversampling LSQR improves rank k estimate
Contrasting Subspace Canonical Angles : Mildly Ill-posed
Figure: RSVD: the canonical angles increase exponentially from subspace $j$ to subspace $k$ for the $4096\times 4096$ problem using the RSVD algorithm, and decrease with OS. Example size $k = 400$.
[Plot: canonical angles for $p$ = 0%, 5%, 10%, 20%, 80%, 100%.]
Contrasting Subspace Canonical Angles : Mildly Ill-posed
Figure: RSVD with power iteration $q = 2$: the canonical angles increase exponentially from subspace $j$ to subspace $k$ for the $4096\times 4096$ problem using the RSVD algorithm, and decrease with OS. Example size $k = 400$.
[Plot: canonical angles (log scale) for $p$ = 0%, 5%, 10%, 20%, 80%, 100%.]
Contrasting Subspace Canonical Angles : Mildly Ill-posed
Figure: LSQR: the canonical angles increase after some subspace size $j^*$ to subspace $k$ for the $4096\times 4096$ problem using the LSQR algorithm. Example size $k = 400$.
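The canonical angles shown in these figures can be computed directly. A sketch using scipy.linalg.subspace_angles, assuming A and the rsvd sketch from earlier (the full SVD reference is only feasible at this experimental scale):

```python
import numpy as np
from scipy.linalg import subspace_angles

k = 400
_, _, Vt = np.linalg.svd(A, full_matrices=False)   # exact right singular vectors
_, _, Vkt = rsvd(A, k=k, p=0, q=0)                 # surrogate basis, no oversampling
angles = subspace_angles(Vt[:k, :].T, Vkt.T)       # radians, descending order
print(angles.max())       # large angle: subspace poorly captured by the surrogate
```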
IMPACT: V Basis Matrices (2D) - Lower basis vectors
[Images: the first four right singular basis vectors $v_1, \dots, v_4$ as 2D images for LSQR, RSVD, and RSVD $q = 2$, at $k = 100$ and $k = 400$, $p = 100\%$.]
IMPACT: V Basis Matrices (2D) - Lower basis vectors
[Images: the first four right singular basis vectors $v_1, \dots, v_4$ as 2D images for LSQR, RSVD, and RSVD $q = 2$, at $k = 750$ and $k = 2000$, $p = 100\%$.]
Observations: LSQR and RSVD
1. LSQR: semi-convergence.
2. OS LSQR: overcomes semi-convergence.
3. RSVD has smaller rank-$k$ error than LSQR.
4. BUT RSVD does not capture the rank-$k$ subspace from a $k + p$ estimate as well as LSQR: the canonical angles are larger.
5. Plots of the basis support the reduced accuracy of the RSVD subspaces.
Restored solutions at optimal $k$ = 750 and $k$ = 50 for RSVD and LSQR, respectively
Restored Regularized Solutions: noise level with SNR ≈ 13
Figure: LSQR, $k = 50$; panels $p$ = 0%, 10%, 20%, 80%.
Restored Regularized Solutions: noise level with SNR ≈ 13
Figure: RSVD, $k = 50$, $q = 2$; panels $p$ = 0%, 10%, 20%, 80%.
Restored Regularized Solutions: noise level with SNR ≈ 13
Figure: LSQR, $k = 750$; panels $p$ = 0%, 10%, 20%, 80%.
Restored Regularized Solutions: noise level with SNR ≈ 13
Figure: RSVD, $k = 750$, $q = 2$; panels $p$ = 0%, 10%, 20%, 80%.
Overview Conclusions
Dominant Subspace: Finding the dominant singular space of the model matrix is important: oversampling helps.
RSVD / LSQR: Trade-offs depend on the speed with which the singular values decrease (the degree of ill-posedness).
Cost: While LSQR costs more per iteration, it provides the dominant subspace more accurately for small $k$.
Hybrid: Implementations stabilize the solution errors.
Future: Investigate the transfer of noise to the RSVD subspace, which is apparently inaccurate.
Relevance to Data Science
Remark (Messages of the Analysis)
- SVD plays a role in the analysis of large datasets?
- Impact of approximating the spectrum by surrogates?
- Important to understand the impact of noise on the spectrum.
- Important to analyze the methods.
Some key references
[GvL96] Gene H. Golub and Charles F. van Loan. Matrix Computations. Johns Hopkins Press, Baltimore, 3rd edition, 1996.
[HMT11] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, 2011.
[Jia17] Zhongxiao Jia. The regularization theory of the Krylov iterative solvers LSQR and CGLS for linear discrete ill-posed problems, part I: the simple singular value case. https://arxiv.org/abs/1701.05708, January 2017.
[RHV18] R. A. Renaut, A. Helmstetter, and S. Vatankhah. Convergence of regularization parameters for solutions using the filtered truncated singular value decomposition. http://arxiv.org/abs/1809.00249, 2018. Submitted.
[Sai] Arvind K. Saibaba. Analysis of randomized subspace iteration: Canonical angles and unitarily invariant norms. http://arxiv.org/abs/1804.02614.