probabilistic numerical linear solvers · 2020. 5. 20. · overview 1 probabilistic numerics: the...
![Page 1: Probabilistic Numerical Linear Solvers · 2020. 5. 20. · Overview 1 Probabilistic Numerics: The Big Picture 2 Gaussian Probability Distributions 3 BayesCG, a Probabilistic Numerical](https://reader035.vdocument.in/reader035/viewer/2022071505/612574d933e5e3085447dcac/html5/thumbnails/1.jpg)
Probabilistic Numerical Linear Solvers
Ilse C.F. Ipsen
Raleigh, NC, USA
Research supported in part by NSF DMS: RTG & FRG
Collaborators
Jon Cockayne
Alan Turing Institute, UK
Chris Oates
University of Newcastle upon Tyne, UK
Tim Reid
North Carolina State University, USA
Probabilistic Numerics
Statistical treatment of approximation errors in deterministic numerical methods:
Assign probability distributions to quantities of interest
Express methods as probabilistic inference
Model uncertainty due to limited computational resources
(Truncation errors in discretizations, termination of iterative methods)
But why??? Want error measures that
are more informative than traditional, pessimistic bounds
can be propagated through computational pipelines
Probabilistic numerical methods have been developed for:
Approximation, quadrature, numerical solution of ordinary and partial differential equations, optimization
Here: Computational kernels and linear algebra
Our Focus: Accurate Error Estimation for Iterative Solution of Linear Systems
[Disclaimer: Research still at proof-of-concept stage]
Given: Nonsingular real matrix A, vector b
Want: Solution x∗ of linear system Ax∗ = b
Iterative solver:
From user-specified initial guess x0
computes iterates x1, x2, . . . , xm → x∗
Prior Distribution
Reflects initial knowledge about x∗
Posterior Distribution at iteration m
Reflects knowledge about x∗ after m iterations
What We Can Do So Far (Details Later)
[Figure: relative error (log scale, $10^{-6}$ to $10^{6}$) versus iteration number m (0 to 100). Curves: Error, CG bound, 2-norm bound.]
Exact error ‖xm − x∗‖/‖x∗‖
Traditional error bounds ≥ 1, not informative
Our estimates from posterior distribution ≈ exact error
What We Can Do So Far (Details Later)
[Figure: relative error (log scale, $10^{-6}$ to $10^{6}$) versus iteration number m (0 to 100). Curves: Error, CG bound, 2-norm bound, S Statistic.]
Exact error ‖xm − x∗‖/‖x∗‖
Traditional error bounds ≥ 1, not informative
Our S statistic from posterior distribution ≈ exact error
Overview
1 Probabilistic Numerics: The Big Picture
2 Gaussian Probability Distributions
3 BayesCG, a Probabilistic Numerical Linear Solver
4 Errors, and Metrics on Probability Distributions
5 Two Priors that Reproduce Conjugate Gradient
Inverse prior
Krylov priors
6 Low-Rank Approximations of Krylov Priors
1. Probabilistic Numerics: The Big Picture
http://probabilistic-numerics.org/
Early Probabilistic Numerics
H. Poincaré (1896, 1912): Calcul des Probabilités
A.V. Sul’din (1959): Wiener Measure and its Application to Approximation Methods, I. Izv. Vyss. Ucebn. Zaved. Mat.
F.M. Larkin (1972): Gaussian Measure in Hilbert Space and Applications in Numerical Analysis, Rocky Mountain J. Math.
G.S. Kimeldorf, G. Wahba (1970): A Correspondence between Bayesian Estimation on Stochastic Processes and Smoothing by Splines, Ann. Math. Stat.
P. Diaconis (1988): Bayesian Numerical Analysis, Stat. Decis. Theory Relat. Top. IV
A. O’Hagan (1992): Some Bayesian Numerical Analysis, Bayesian Stat.
Perspectives on Modern Probabilistic Numerics
Bayesian inference in inverse problems
A.M. Stuart (2010): Inverse Problems: A Bayesian Perspective, Acta Numer.
‘A call to arms for probabilistic numerical methods’
P. Hennig, M.A. Osborne, M. Girolami (2015): Probabilistic Numerics and Uncertainty in Computations, Proc. R. Soc. A
Statistical foundations for non-linear, non-Gaussian problems
J. Cockayne, C.J. Oates, T.J. Sullivan, M. Girolami (2019): Bayesian Probabilistic Numerical Methods, SIAM Rev.
Historical development of probabilistic numerics
C.J. Oates, T.J. Sullivan (2019): A Modern Retrospective on Probabilistic Numerics, Stat. Comput.
Probabilistic Numerics at SIAM Conference on UQ
http://probabilistic-numerics.org/meetings/SIAMUQ2020/
Application: Global Optimization
Application: Numerical Integration for Rendering of Glossy Surfaces
(a) Reference (b) LDIS (c) BMC
(RMSE = 0.039, time = 2m18s) (RMSE = 0.026, time = 2m20s)
Fig. 1. Indirect radiance component for the Room scene rendered with low discrepancy Monte Carlo Importance Sampling (LDIS, (b)) and BMC (c). The shown images have been multiplied by a factor of 4. The RMSE and the time are computed by considering only the glossy component. 16 ray samples per visible point were used for the materials with a glossy BRDF (teapot, tea cup and fruit-dish), while 64 samples were used for the diffuse BRDFs.
R. Marques, C. Bouville, M. Ribardiere, L.P. Santos, K. Bouatouch (2013), IEEE Trans. Vis. Comput. Graph.
Probabilistic Numerics & Other Areas
Connections to
Information-based complexity [Traub, Wasilkowski, Wozniakowski 1983]
Average-case analysis [Novak 1988], [Ritter 2000]
Data assimilation, Kalman filters [Law, Stuart, Zygalakis 2015], [Reich, Cotter 2015]
Statistical learning [Hastie, Tibshirani, Friedman 2009], [Rasmussen, Williams 2006]
Difference from Uncertainty Quantification [Smith 2014]
UQ: Problems usually ill-posed, uncertainties from all sources: model, parameter, experimental, measurement
PN: Problems are well-posed, algorithmic uncertainty due to limited computational resources
Probabilistic Numerics in Linear Algebra
P. Hennig (2015): Probabilistic Interpretation of Linear Solvers, SIAM J. Optim.
S. Bartels, P. Hennig (2015): Probabilistic Approximate Least-Squares, Proc. Machine Learning Research
F. Schäfer, T.J. Sullivan, H. Owhadi (2017): Compression, Inversion, and Approximate PCA of Dense Kernel Matrices at Near-Linear Computational Complexity, arXiv
J. Cockayne, C.J. Oates, I.C.F. Ipsen, M. Girolami (2019): A Bayesian Conjugate Gradient Method, Bayesian Anal.
S. Bartels, J. Cockayne, I.C.F. Ipsen, P. Hennig (2019): Probabilistic Linear Solvers: A Unifying View, Stat. Comput.
2. Gaussian Probability Distributions
[The MathWorks R2020a]
Our Probability Distributions are Gaussians
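Gaussian = multivariate normal distribution
Gaussian random vector $x \sim \mathcal{N}(\mu_0, \Sigma_0)$
Mean $\mu_0 \in \mathbb{R}^d$
Covariance $\Sigma_0 \in \mathbb{R}^{d \times d}$ symmetric positive semi-definite
If $\Sigma_0$ is also positive definite, then the probability density function is
$$p(x) = \frac{1}{\sqrt{\det(\Sigma_0)\,(2\pi)^d}} \exp\left(-\tfrac{1}{2}\,\|x - \mu_0\|^2_{\Sigma_0^{-1}}\right)$$
where $\|x - \mu_0\|^2_{\Sigma_0^{-1}} = (x - \mu_0)^T \Sigma_0^{-1} (x - \mu_0)$ and $T$ is transpose
Why Gaussians?
Stability: Linear transformations preserve Gaussianity
If $x \sim \mathcal{N}(\mu_0, \Sigma_0)$ then $Bx + z \sim \mathcal{N}(B\mu_0 + z,\ B\Sigma_0 B^T)$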
[Muirhead 1982], [Stuart 2010]
Stability of Gaussians for Computing Posteriors
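If $x \sim \mathcal{N}(\mu_0, \Sigma_0)$ then $z + Bx \sim \mathcal{N}(z + B\mu_0,\ B\Sigma_0 B^T)$
Prior: $x \sim \mathcal{N}(\mu_0, \Sigma_0)$
Condition on linear information $y = Bx$
Bayesian inference
Posterior: $x \mid y \sim \mathcal{N}(\mu, \Sigma)$ with
$$\mu = \mu_0 + \Sigma_0 B^T (B\Sigma_0 B^T)^{-1}(y - B\mu_0)$$
$$\Sigma = \Sigma_0 - \Sigma_0 B^T (B\Sigma_0 B^T)^{-1} B\Sigma_0$$
Connections to: Schur complements, projectors
Conditioning Gaussian random vector $x \sim \mathcal{N}(\mu_0, \Sigma_0)$ on linear information $y = Bx$
gives another Gaussian random vector $x \mid y \sim \mathcal{N}(\mu, \Sigma)$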
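The conditioning formulas for the posterior mean and covariance can be tried out numerically. The following is a minimal NumPy sketch, not code from the talk: the function name `condition_gaussian` and the random test data are assumptions made for illustration, and B is assumed to have full row rank so that B Σ0 B^T is invertible.

```python
import numpy as np

def condition_gaussian(mu0, Sigma0, B, y):
    """Posterior mean and covariance of x ~ N(mu0, Sigma0) given y = B x."""
    SB = Sigma0 @ B.T                 # Sigma0 B^T, shape (d, k)
    G = B @ SB                        # Gram matrix B Sigma0 B^T, shape (k, k)
    K = np.linalg.solve(G, SB.T).T    # Sigma0 B^T (B Sigma0 B^T)^{-1}
    mu = mu0 + K @ (y - B @ mu0)
    Sigma = Sigma0 - K @ SB.T
    return mu, Sigma

# Hypothetical test data: a positive definite prior covariance and k = 2 observations.
rng = np.random.default_rng(0)
d, k = 5, 2
M = rng.standard_normal((d, d))
Sigma0 = M @ M.T + d * np.eye(d)
mu0 = np.zeros(d)
B = rng.standard_normal((k, d))       # full row rank with probability 1
x_true = rng.standard_normal(d)
y = B @ x_true

mu, Sigma = condition_gaussian(mu0, Sigma0, B, y)
```

After conditioning, the posterior mean reproduces the observed information (B µ = y) and the posterior carries no remaining uncertainty in the observed directions (B Σ B^T = 0), which is the projector structure mentioned on the slide.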
Stability of Gaussians for Computational Sampling
Want: Sample $X \sim \mathcal{N}(\mu, \Sigma)$
Stability: If $x \sim \mathcal{N}(0, I)$ then $\mu + Lx \sim \mathcal{N}(\mu, LL^T)$
Possible factorizations $\Sigma = LL^T$
Square root $L = \Sigma^{1/2}$
Cholesky factorization, if $\Sigma$ nonsingular
Thin Cholesky: $L$ has $\operatorname{rank}(\Sigma)$ columns
Matlab
X = µ + L * randn(size(L, 2), 1)
where randn(size(L, 2), 1) is a vector ∼ N (0, I )
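The same sampling recipe can be written in NumPy. This is a sketch under stated assumptions: the helper names `thin_factor` and `sample_gaussian` are hypothetical, and the thin factor is obtained from a symmetric eigendecomposition as a stand-in for the thin Cholesky factor on the slide (any L with Σ = L L^T works for sampling).

```python
import numpy as np

def thin_factor(Sigma, tol=1e-12):
    """Return L with rank(Sigma) columns such that Sigma = L @ L.T."""
    w, U = np.linalg.eigh(Sigma)          # Sigma = U diag(w) U^T
    keep = w > tol * w.max()              # drop numerically zero eigenvalues
    return U[:, keep] * np.sqrt(w[keep])  # scale the kept eigenvectors

def sample_gaussian(mu, L, n_samples, rng):
    """Draw samples mu + L z with z ~ N(0, I); one sample per column."""
    Z = rng.standard_normal((L.shape[1], n_samples))
    return mu[:, None] + L @ Z

# Hypothetical example: a singular covariance of rank 2 in dimension 4.
rng = np.random.default_rng(1)
V = rng.standard_normal((4, 2))
Sigma = V @ V.T
mu = np.array([1.0, -1.0, 0.5, 2.0])

L = thin_factor(Sigma)
X = sample_gaussian(mu, L, 20000, rng)
```

Because L has only rank(Σ) columns, each sample needs rank(Σ) standard normal numbers rather than d, which matters when the covariance is a low-rank posterior.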
3. BayesCG, a Probabilistic Numerical Linear Solver
Probabilistic Linear System Solution
Given: Symmetric positive-definite $A \in \mathbb{R}^{d \times d}$, vector $b \in \mathbb{R}^d$
Want: Solution of $A x_* = b$
Initial guess $x_0$: $A(x_* - x_0) = b - A x_0$
Solver: Computes iterates $x_1, x_2, \ldots, x_m \to x_*$
Solution-based inference
Probability distribution over solution space $x \in \mathbb{R}^d$
Keeping it simple: $x_m = \mu_m$, $m \ge 0$
Iterates coincide with means of probability distribution
User specifies Gaussian prior $\mathcal{N}(x_0, \Sigma_0)$
Prior reflects initial knowledge about $x_*$
Solver computes Gaussian posteriors $\mathcal{N}(x_m, \Sigma_m)$
Posteriors reflect knowledge about $x_*$ after $m$ iterations
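Computing the Posteriors in Iteration m ≥ 1
[Cockayne, Oates, Sullivan, Girolami 2019]
Information about $x_*$ provided by $m$ search directions
$$S_m = \begin{pmatrix} s_1 & \cdots & s_m \end{pmatrix} \in \mathbb{R}^{d \times m}, \qquad \operatorname{rank}(S_m) = m$$
Condition on $y_m = S_m^T A x_* = S_m^T b$
There exists a unique Bayesian method that outputs the posterior distribution
$$x \mid y_m \sim \mathcal{N}(x_m, \Sigma_m)$$
with
$$x_m = x_0 + \Sigma_0 A S_m \left(S_m^T A \Sigma_0 A S_m\right)^{-1} S_m^T (b - A x_0)$$
$$\Sigma_m = \Sigma_0 - \Sigma_0 A S_m \left(S_m^T A \Sigma_0 A S_m\right)^{-1} S_m^T A \Sigma_0$$
Choose $A\Sigma_0 A$-orthogonal search directions: $S_m^T A \Sigma_0 A S_m = I_m$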
Bayesian Conjugate Gradient Method (BayesCG)
[Cockayne, Oates, Ipsen, Girolami 2019]
r0 = b − A x0 {initial residual}
s1 = r0 / √(r0^T A Σ0 A r0) {initial search direction}
m = 1
while not converged do
Σm = Σm−1 − (Σ0 A sm) (Σ0 A sm)^T {next posterior}
xm = xm−1 + Σ0 A sm (rm−1^T sm) {next iterate}
rm = b − A xm {next residual}
sm+1 = rm − sm (rm^T A Σ0 A sm) {next search direction}
sm+1 = sm+1 / √(sm+1^T A Σ0 A sm+1) {normalize search direction}
m = m + 1 {advance iteration counter}
end while
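The recursion above translates almost line by line into NumPy. The sketch below is an illustration rather than the authors' implementation: the function name `bayescg`, the stopping tolerance `tol`, and the random test problem are assumptions, and the prior covariance is stored as a dense matrix for simplicity.

```python
import numpy as np

def bayescg(A, b, x0, Sigma0, num_iters, tol=1e-12):
    """Sketch of BayesCG: returns the iterate x_m and posterior covariance Sigma_m."""
    x = x0.astype(float).copy()
    Sigma = Sigma0.astype(float).copy()
    r = b - A @ x                                  # initial residual
    s = r / np.sqrt(r @ (A @ (Sigma0 @ (A @ r))))  # initial search direction
    for _ in range(num_iters):
        u = Sigma0 @ (A @ s)                       # Sigma0 A s_m
        Sigma = Sigma - np.outer(u, u)             # next posterior covariance
        x = x + u * (r @ s)                        # next iterate
        r = b - A @ x                              # next residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break                                  # converged, no further direction needed
        s_new = r - s * (r @ (A @ u))              # next search direction
        s = s_new / np.sqrt(s_new @ (A @ (Sigma0 @ (A @ s_new))))
    return x, Sigma

# Hypothetical, well-conditioned test problem.
rng = np.random.default_rng(2)
d = 6
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)          # symmetric positive definite
b = rng.standard_normal(d)
x0 = np.zeros(d)
Sigma0 = np.linalg.inv(A)            # inverse prior, formed explicitly only for illustration

x3, Sigma3 = bayescg(A, b, x0, Sigma0, 3)
xd, Sigmad = bayescg(A, b, x0, Sigma0, d)
```

With the inverse prior each iteration removes one unit from trace(A Σm), so after 3 iterations the trace equals d − 3, and after d iterations the iterate agrees with the solution up to round-off.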
Properties of BayesCG
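Krylov space for iterates: $x_m \in \mathcal{K}_m^*$
$$\mathcal{K}_m^* = x_0 + \mathcal{K}_m\left(\Sigma_0 A^2,\ \Sigma_0 A (b - A x_0)\right)$$
Error minimization in $\Sigma_0^{-1}$ norm
$$\|x_m - x_*\|_{\Sigma_0^{-1}} = \min_{x \in \mathcal{K}_m^*} \|x - x_*\|_{\Sigma_0^{-1}}$$
Convergence in $\Sigma_0^{-1}$ norm depends on conditioning of $\Sigma_0 A^2$
$$\|x_m - x_*\|_{\Sigma_0^{-1}} \le 2 \left(\frac{\sqrt{\kappa_2(\Sigma_0 A^2)} - 1}{\sqrt{\kappa_2(\Sigma_0 A^2)} + 1}\right)^m \|x_0 - x_*\|_{\Sigma_0^{-1}}$$
Prior: If $\Sigma_0 = A^{-1}$ then BayesCG = CG
CG = Bayesian inference with prior $\mathcal{N}(x_0, A^{-1})$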
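As a consistency check on the last two statements, substituting the inverse prior into the properties above recovers the classical CG statements. This worked substitution is added here for illustration and is not part of the original slides:

```latex
\begin{align*}
\Sigma_0 = A^{-1} \;\Longrightarrow\;
\mathcal{K}_m\left(\Sigma_0 A^2,\ \Sigma_0 A r_0\right) &= \mathcal{K}_m(A,\ r_0), \qquad r_0 = b - A x_0,\\
\|x_m - x_*\|_{\Sigma_0^{-1}} &= \|x_m - x_*\|_A,\\
\|x_m - x_*\|_A &\le 2\left(\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1}\right)^m \|x_0 - x_*\|_A .
\end{align*}
```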
Summary: BayesCG
Solve symmetric positive-definite system Ax∗ = b
BayesCG = probabilistic extension of Conjugate Gradient
Models uncertainty in solution x∗ due to early termination
Input: Gaussian prior N (x0, Σ0)
Models initial uncertainty about solution x∗
Mean x0 identical to initial guess for iterates
Output: Gaussian posterior N (xm, Σm) at iteration m
Models uncertainty about solution x∗ after m iterations
Mean identical to iterate xm ≈ x∗
Next: How to use the BayesCG posteriors?
4. Errors, and Metrics on Probability Distributions
Metric on Set of Probability Distributions
Measure distance between two probability distributions
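Any two Gaussian distributions
$G_m = \mathcal{N}(x_m, \Sigma_m)$ and $G_* = \mathcal{N}(x_*, \Sigma_*)$
Wasserstein 2-norm metric: Distance between $G_m$ and $G_*$
$$[W_2(G_m, G_*)]^2 = \|x_m - x_*\|_2^2 + \operatorname{trace}\left(\Sigma_m + \Sigma_* - 2\,(\Sigma_m \Sigma_*)^{1/2}\right)$$
[Dowson, Landau 1982]
Extension to general weighted norms ($B$ is spd)
$$[W_B(G_m, G_*)]^2 = \|x_m - x_*\|_B^2 + \operatorname{trace}\left(B(\Sigma_m + \Sigma_*) - 2\,(B^{1/2}\Sigma_m B \Sigma_* B^{1/2})^{1/2}\right)$$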
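The 2-norm Wasserstein formula is straightforward to evaluate for small examples. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the cross term is computed from the symmetric matrix Σm^{1/2} Σ∗ Σm^{1/2}, which has the same eigenvalues as Σm Σ∗ and therefore gives the same trace after taking the square root.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, U = np.linalg.eigh((S + S.T) / 2)
    w = np.clip(w, 0.0, None)             # remove tiny negative round-off
    return (U * np.sqrt(w)) @ U.T

def wasserstein2_sq(x_m, Sigma_m, x_s, Sigma_s):
    """Squared 2-norm Wasserstein distance between N(x_m, Sigma_m) and N(x_s, Sigma_s)."""
    R = psd_sqrt(Sigma_m)
    cross = psd_sqrt(R @ Sigma_s @ R)      # same trace as (Sigma_m Sigma_s)^{1/2}
    return np.sum((x_m - x_s) ** 2) + np.trace(Sigma_m + Sigma_s - 2.0 * cross)

# Hypothetical 2-dimensional example with diagonal covariances.
x_m = np.array([0.0, 0.0])
Sigma_m = np.diag([4.0, 1.0])
x_s = np.array([1.0, 2.0])
Sigma_s = np.diag([1.0, 4.0])

w_gauss = wasserstein2_sq(x_m, Sigma_m, x_s, Sigma_s)
w_point = wasserstein2_sq(x_m, Sigma_m, x_s, np.zeros((2, 2)))   # point distribution at x_s
w_same = wasserstein2_sq(x_m, Sigma_m, x_m, Sigma_m)
```

For a point distribution (zero covariance) the cross term vanishes and the squared distance reduces to the squared error in the mean plus trace(Σm), the form used for BayesCG posteriors.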
Application of Wasserstein Metric to BayesCG
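Input: Prior $G_0 = \mathcal{N}(x_0, \Sigma_0)$
Output: Posteriors $G_m = \mathcal{N}(x_m, \Sigma_m)$
Errors: $\|x_m - x_*\|_{\Sigma_0^{-1}}$
Represent solution $x_*$ as point distribution $\mathcal{N}(x_*, 0)$
$$[W_{\Sigma_0^{-1}}(G_m, x_*)]^2 = \|x_m - x_*\|^2_{\Sigma_0^{-1}} + \operatorname{trace}\left(\Sigma_0^{-1}\Sigma_m\right) = \|x_m - x_*\|^2_{\Sigma_0^{-1}} + (d - m)$$
$\Sigma_0^{-1}$ distance of posterior $G_m$ to solution $x_*$ equals
$\Sigma_0^{-1}$ norm error + dimension of unexplored space
Computational error estimation with “S-statistic”
$$\|x_m - x_*\|^2_{\Sigma_0^{-1}} \approx \|x_m - X\|^2_{\Sigma_0^{-1}} \qquad \text{where } X \sim G_m$$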
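The step from the trace to d − m follows from the posterior covariance formula together with the normalization of the search directions. A short derivation, added here for illustration and assuming the $A\Sigma_0 A$-orthonormal search directions $S_m$ of BayesCG and a nonsingular $\Sigma_0$:

```latex
\begin{align*}
\Sigma_0^{-1}\Sigma_m
  &= I_d - A S_m \left(S_m^T A \Sigma_0 A S_m\right)^{-1} S_m^T A \Sigma_0
   = I_d - A S_m S_m^T A \Sigma_0,\\
\operatorname{trace}\left(\Sigma_0^{-1}\Sigma_m\right)
  &= d - \operatorname{trace}\left(S_m^T A \Sigma_0 A S_m\right)
   = d - \operatorname{trace}(I_m) = d - m .
\end{align*}
```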
Possible Prior Distributions
“Noninformative”: $\Sigma_0 = I_d$
Inverse: $\Sigma_0 = A^{-1}$, BayesCG = CG
Natural: $\Sigma_0 = A^{-2}$, convergence in 1 iteration
Preconditioner: $\Sigma_0 = (P^T P)^{-1} \approx A^{-2}$
Krylov¹: $\Sigma_0 = V \Phi V^T$, where columns of $V$ are basis for Krylov space
Hierarchical: $\Sigma_0 = \nu\,\Sigma_0$ with Jeffreys' improper $p(\nu) \sim \nu^{-1}$
Priors that reproduce CG: Inverse and Krylov¹
Impractical “academic” priors that already contain all information to compute solution immediately
But: We study them to calibrate our expectations
In order to: Develop practical, low-rank approximations
¹ Different from Krylov prior in [Cockayne, Oates, Ipsen, Girolami 2019]
5a. Priors that Reproduce CG: Inverse Prior
Wasserstein Metric for BayesCG under Inverse Prior
Prior $G_0 = \mathcal{N}(x_0, \Sigma_0)$ with $\Sigma_0 = A^{-1}$
BayesCG under inverse prior = CG + posteriors
Posteriors $G_m = \mathcal{N}(x_m, \Sigma_m)$ with $\Sigma_m = \Sigma_{m-1} - s_m s_m^T$
Downdating with search directions
Distance between posterior and solution
$$[W_A(G_m, x_*)]^2 = \|x_m - x_*\|_A^2 + (d - m)$$
Error estimation with S statistic
$$\|x_m - x_*\|_A^2 \approx \|x_m - X\|_A^2 \qquad \text{where } X \sim G_m$$
Experiments: $A x_* = b$
Dimension $d = 100$, $x_* = \mathbb{1}$ (vector of all ones), $\kappa_2(A) \approx 4 \cdot 10^4$
S statistic samples per iteration: 10
BayesCG under Inverse Prior
Absolute Error, Two-norm Bound, S Statistic
[Figure: squared absolute error (log scale, $10^{-4}$ to $10^{12}$) versus iteration number m (0 to 100). Curves: Error, two-norm bound, S Statistic.]
Squared absolute error $\|x_m - x_*\|_A^2$
S statistic samples
Two-norm bound squared
BayesCG under Inverse Prior
Preliminary Conclusions
Pros
Iterates and convergence identical to those of CG
Samples from S statistic more accurate than norm-wise forward error bounds
Cons
Preserving semi-definiteness of posterior during downdates $\Sigma_m = \Sigma_{m-1} - s_m s_m^T$
Prior is not practical
Existing Work on A-Norm Errors in CG
Estimation approaches:
Ritz values, Gauss-Radau rules, incremental norm estimation, maximal attainable accuracy estimates
Golub, Strakos 1994; Greenbaum 1997; Meurant 1998; Golub, Meurant 1997; Strakos, Tichy 2002; Strakos, Tichy 2005; Meurant, Tichy 2013; Liesen, Strakos 2016; Meurant, Tichy 2019
Probabilistic numerical solvers:
Designed to capture computational uncertainty
5b. Priors that Reproduce CG: Krylov Priors
Krylov Priors $\Gamma_0 = V \Phi V^T$
Columns of $V$ are A-orthonormal basis
{$V^T A V = I_n$, normalized CG search directions}
for Krylov space of maximal dimension $n \le d$
$$\mathcal{K}_n(A, r_0) = \operatorname{span}\{r_0, A r_0, \ldots, A^{n-1} r_0\}$$
$\Phi$ is any diagonal matrix with positive diagonal
Advantages
BayesCG under Krylov prior = CG + posteriors
Trailing submatrices: No explicit downdating of posteriors
$$\Gamma_m = V_{m+1:n}\ \Phi_{m+1:n,\,m+1:n}\ V_{m+1:n}^T$$
$\Gamma_0$ singular for $n < d$ $\Longrightarrow$ cannot use $\Gamma_0$ to define norm for Wasserstein distance
Wasserstein Metric for BayesCG under Specific Krylov Prior
Specific Krylov prior: $\Gamma_0 = V \Phi V^T$ where $\Phi$ has diagonal elements
$$\Phi_{mm} = (v_m^T r_0)^2, \qquad 1 \le m \le n$$
{Computed automatically in our CG implementation}
A-norm error with specific Krylov prior
$$\|x_m - x_*\|_A^2 = \operatorname{trace}(A \Gamma_m)$$
Wasserstein A-norm metric for BayesCG under $\Gamma_0$
$$[W_A(G_m, x_*)]^2 = \|x_m - x_*\|_A^2 + \operatorname{trace}(A \Gamma_m)$$
Wasserstein A-norm metric $\Leftrightarrow$ A-norm error in iterate
$$[W_A(G_m, x_*)]^2 = 2\,\|x_m - x_*\|_A^2$$
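The identity between the A-norm error and trace(A Γm) can be checked numerically on a small problem. The sketch below is illustrative only: the function name `cg_directions`, the test matrix, and the choice n = d are assumptions, and a plain textbook CG loop is used to generate the A-orthonormal search directions.

```python
import numpy as np

def cg_directions(A, b, x0, n):
    """Run n CG steps; return iterates x_0..x_n and A-orthonormal search directions."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    iterates, V = [x.copy()], []
    for _ in range(n):
        Ap = A @ p
        pAp = p @ Ap
        alpha = (r @ r) / pAp
        x = x + alpha * p
        r_new = r - alpha * Ap
        V.append(p / np.sqrt(pAp))          # normalize so that v^T A v = 1
        iterates.append(x.copy())
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return iterates, np.column_stack(V)

# Hypothetical, well-conditioned test problem with an all-ones solution.
rng = np.random.default_rng(3)
d = 8
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)
x_star = np.ones(d)
b = A @ x_star
x0 = np.zeros(d)
r0 = b - A @ x0

iterates, V = cg_directions(A, b, x0, d)    # maximal Krylov dimension n = d here
phi = (V.T @ r0) ** 2                       # diagonal of Phi for the specific Krylov prior

m = 3
Vm = V[:, m:]                               # trailing directions v_{m+1}, ..., v_n
Gamma_m = (Vm * phi[m:]) @ Vm.T             # posterior covariance as a trailing submatrix
e = iterates[m] - x_star
err_sq = e @ (A @ e)                        # squared A-norm error of x_m
trace_term = np.trace(A @ Gamma_m)
```

Note that the columns of `V` are zero-indexed, so `V[:, m:]` holds the directions the slide calls $v_{m+1}, \ldots, v_n$.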
S Statistic for Specific Krylov Prior
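Krylov posterior $G_m = \mathcal{N}(x_m, \Gamma_m)$
Distance of posterior to solution $\Leftrightarrow$ absolute error
$$[W_A(G_m, x_*)]^2 = 2\,\|x_m - x_*\|_A^2$$
Absolute error $\Leftrightarrow$ mean of S statistic
$$2\,\|x_m - x_*\|_A^2 = \mathbb{E}_{X \sim G_m}\left[\|x_m - X\|_A^2\right]$$
Empirical mean of S statistic from 10 samples
$$\mathbb{E}_{X \sim G_m}\left[\|x_m - X\|_A^2\right] \approx \frac{1}{10} \sum_{j=1}^{10} \tfrac{1}{2}\,\|x_m - X_j\|_A^2, \qquad X_j \sim G_m$$
Experiments: $A x_* = b$
Dimension $d = 100$, $x_* = \mathbb{1}$ (vector of all ones), same $A$ as before
Condition number $\kappa_2(A) \approx 4 \cdot 10^4$
S statistic samples per iteration: 10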
BayesCG under Specific Krylov Prior
Empirical Mean of S Statistic ≈ Absolute Error
[Figure: absolute error (log scale, $10^{-4}$ to $10^{6}$) versus iteration number m (0 to 100). Curves: Error, S Statistic mean.]
Squared absolute error $\|x_m - x_*\|_A^2$
Empirical mean of S statistic from 10 samples
BayesCG under Specific Krylov Prior
Relative Error, Bounds, S Statistic
[Figure: relative error (log scale, $10^{-6}$ to $10^{6}$) versus iteration number m (0 to 100). Curves: Error, CG bound, 2-norm bound, S Statistic.]
Exact error $\|x_m - x_*\|_A / \|x_*\|_A$
S statistic samples (square-root, normalized) ≈ exact error
CG bound, and general 2-norm error bound
BayesCG under Specific Krylov Prior
Preliminary Conclusions
Pros:
Errors in iterates $\Leftrightarrow$ distances of posteriors to solution
Errors in iterates $\Leftrightarrow$ mean of S statistic
S statistic produces exact magnitude of errors
Cons: BayesCG under Krylov priors requires
1 Running maximal number n of CG iterations
2 Storing all n search directions V (for factored form $\Gamma_0 = V \Phi V^T$)
3 Matvecs with $\Gamma_m \in \mathbb{R}^{d \times (n-m+1)}$ in iteration m
Again, this is not practical
Next: Low-rank approximations of Krylov priors are promising
6. Low-Rank Approximations of Krylov Priors
Low-Rank Approximation via Look-Ahead
Run k iterations of CG
Look-ahead: Run $\ell = 5$ more CG iterations
Prior has rank $k + \ell = k + 5$
Estimate errors $e_m = \|x_m - x_*\|_A^2$ at iteration $1 \le m \le k$
with empirical mean of 5 samples from S statistic
$$\sigma_m = \frac{1}{5} \sum_{j=1}^{5} \tfrac{1}{2}\,\|x_m - X_{mj}\|_A^2, \qquad X_{mj} \sim \mathcal{N}(x_m, \Gamma_m)$$
Error of empirical mean in last iteration is $|\sigma_k - e_k| / e_k$
3 numerical examples: $A x_* = b$ of dimension $d$
$A = Q \Lambda Q^T$ with $Q$ random orthogonal [Stewart 1980]
Eigenvalue distributions from [Liesen, Strakos 2013]
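The look-ahead idea can be sketched deterministically. The code below is an illustration under several assumptions, not the experiment from the talk: the orthogonal factor comes from a QR factorization of a Gaussian matrix (a simple stand-in, not necessarily the construction of [Stewart 1980]), the spectrum is a smaller linearly spaced one, and instead of sampling the S statistic the sketch evaluates trace(A Γm) for the rank k + ℓ prior built from the first k + ℓ search directions, using the local formula α‖r‖² for the diagonal of Φ, which equals $(v^T r_0)^2$ in exact arithmetic.

```python
import numpy as np

def random_spd(eigvals, rng):
    """A = Q diag(eigvals) Q^T with Q from the QR factorization of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((eigvals.size, eigvals.size)))
    A = (Q * eigvals) @ Q.T
    return (A + A.T) / 2                    # symmetrize against round-off

def cg_with_phi(A, b, x0, n):
    """Run n CG steps; return iterates and the diagonal entries phi of Phi."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    iterates, phi = [x.copy()], []
    for _ in range(n):
        Ap = A @ p
        rr = r @ r
        alpha = rr / (p @ Ap)
        phi.append(alpha * rr)              # equals (v^T r0)^2 in exact arithmetic
        x = x + alpha * p
        r = r - alpha * Ap
        p = r + ((r @ r) / rr) * p
        iterates.append(x.copy())
    return iterates, np.array(phi)

rng = np.random.default_rng(4)
d, k, ell = 200, 20, 5                      # hypothetical sizes, smaller than on the slides
A = random_spd(np.linspace(1.0, 1.0e3, d), rng)
x_star = np.ones(d)
b = A @ x_star

iterates, phi = cg_with_phi(A, b, np.zeros(d), k + ell)

def a_norm_sq(e):
    return e @ (A @ e)

# trace(A Gamma_m) for the rank k + ell prior: sum of the remaining entries of phi
estimates = np.array([phi[m:].sum() for m in range(1, k + 1)])
errors = np.array([a_norm_sq(iterates[m] - x_star) for m in range(1, k + 1)])
tail = a_norm_sq(iterates[k + ell] - x_star)   # error mass the low-rank prior cannot see
```

In this sketch the estimate at iteration m falls short of the true squared error by exactly the squared error at iteration k + ℓ, so the estimate is accurate when CG converges quickly over the look-ahead window and pessimistically small when it stagnates.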
Uniformly Distributed Eigenvalues, Dimension $d = 10^3$
k = 80 Iterations, Prior of Rank 85
$\kappa_2(A) = 10^5$, Eigenvalues $\lambda_j = 1 + \frac{j-1}{d-1}\,(10^5 - 1)$, $1 \le j \le d$
[Figure: absolute error squared (log scale, $10^{-2}$ to $10^{14}$) versus iteration number m (0 to 80). Curves: Error, two-norm bound, S Statistic mean.]
Squared absolute error $\|x_m - x_*\|_A^2$
Empirical mean of S statistic, mean error in last iteration ≤ .72
Large Eigenvalue Cluster, Dimension $d = 10^3$
k = 30 Iterations, Prior of Rank 35
$\kappa_2(A) = 10^5$, Eigenvalues $\lambda_j = 1 + \frac{j-1}{d-1}\,(10^5 - 1)\,0.65^{\,d-j}$
[Figure: absolute error squared (log scale, $10^{-5}$ to $10^{10}$) versus iteration number m (0 to 30). Curves: Error, two-norm bound, S Statistic mean.]
Squared absolute error $\|x_m - x_*\|_A^2$
Empirical mean of S statistic, mean error in last iteration ≤ .18
Smoothly Increasing Eigenvalues, Dimension d = 100k = 12 Iterations, Prior of Rank 17
κ2(A) ≈ 7 · 1016, Eigenvalues λj = log(j)/log(d), 1 ≤ j ≤ d
[Figure: semilog plot over iteration number m = 0, ..., 12; vertical axis "Absolute error squared", $10^{-10}$ to $10^{35}$. Curves: Error, two-norm bound, S statistic mean.]
Squared absolute error $\|x_m - x_*\|_A^2$
Empirical mean of S statistic, mean error in last iteration ≤ 0.2
Relative Errors for Previous Problem
[Figure: semilog plot over iteration number m = 0, ..., 12; vertical axis "Relative Error", $10^{-5}$ to $10^{20}$. Curves: Error, two-norm bound, S statistic mean.]
Relative error $\|x_m - x_*\|_A/\|x_*\|_A$, empirical mean of S statistic, two-norm bound
Last iteration
Two-norm bound: No accuracy ($\kappa_2(A) \approx 7 \cdot 10^{16}$)
Relative error: $\|x_{12} - x_*\|_A/\|x_*\|_A \approx 1.5405 \cdot 10^{-5}$
S statistic empirical mean: $\sigma_{12} \approx 1.1956 \cdot 10^{-5}$
Accuracy of Low-Rank Prior Approximations
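Experiments: Error estimates have correct magnitude,
especially where traditional bounds are not informative
Exact prior $G_0 = \mathcal{N}(x_0, \Gamma_0)$
$\Gamma_0 = V \Phi V^T$, $\operatorname{rank}(\Gamma_0) = n$
Rank $k + \ell$ approximation $\widehat{G}_0 = \mathcal{N}(x_0, \widehat{\Gamma}_0)$
$\widehat{\Gamma}_0 = V_{1:k+\ell}\, \Phi_{1:k+\ell,\,1:k+\ell}\, V_{1:k+\ell}^T$, $\operatorname{rank}(\widehat{\Gamma}_0) = k + \ell$
Distance between low-rank and exact distributions
$[W_A(G_m, \widehat{G}_m)]^2 = \|V_{k+\ell+1:n}^T r_0\|_2^2$, $0 \le m \le k$
Distance ≈ contribution of $r_0$ in search directions ignored by $\widehat{\Gamma}_0$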
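The right-hand side of the distance formula can be evaluated numerically on a small problem. The sketch below is illustrative only: it takes the columns of V to be CG search directions normalized so that $v_i^T A v_i = 1$, runs CG for all n steps on an assumed 8 × 8 diagonal matrix, and sums the squared components of $r_0$ along the directions dropped by the rank $k + \ell$ approximation. In exact arithmetic this quantity coincides with $\|x_{k+\ell} - x_*\|_A^2$, because $x_* - x_0 = V V^T r_0$ when V has n A-orthonormal columns. The function names `cg_directions` and `truncation_distance_sq` are hypothetical.

```python
import numpy as np

def cg_directions(A, b, x0, num_iters):
    """Run CG; return the iterates and the A-normalized search directions."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    iterates, V = [x.copy()], []
    for _ in range(num_iters):
        Ap = A @ p
        pAp = p @ Ap
        rr = r @ r
        V.append(p / np.sqrt(pAp))        # v_i satisfies v_i^T A v_i = 1
        gamma = rr / pAp
        x = x + gamma * p
        r_new = r - gamma * Ap
        p = r_new + (r_new @ r_new) / rr * p
        r = r_new
        iterates.append(x.copy())
    return iterates, np.column_stack(V)

def truncation_distance_sq(V, r0, rank):
    """|| V_{rank+1:n}^T r0 ||_2^2: the part of r0 along the dropped directions."""
    return float(np.sum((V[:, rank:].T @ r0) ** 2))

rng = np.random.default_rng(0)
n = 8
A = np.diag(np.arange(1.0, n + 1))        # small SPD test matrix (assumption)
x_star = rng.standard_normal(n)
b = A @ x_star
x0 = np.zeros(n)
iterates, V = cg_directions(A, b, x0, n)  # all n directions, feasible only for tiny n

k, ell = 3, 2
dist_sq = truncation_distance_sq(V, b - A @ x0, k + ell)
err = iterates[k + ell] - x_star
err_sq = float(err @ A @ err)             # ||x_{k+ell} - x_*||_A^2
print(dist_sq, err_sq)
```

Computing all n directions is done here purely to check the formula; in practice only the first $k + \ell$ directions are available.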
Practical Procedure: Error Estimate for Last Iterate
1 Run CG until convergence, at some iteration $k$
2 Run $\ell$ more CG iterations, store $\ell$ search directions $V_{k+1:k+\ell}$
3 Normalize factor for low-rank posterior $\Gamma_k = L_k L_k^T$
$L_k = V_{k+1:k+\ell}\, \Phi_{k+1:k+\ell}^{1/2} \in \mathbb{R}^{d \times \ell}$
4 S statistic samples $\sigma_j = \|x_k - X_j\|_A^2$, $1 \le j \le n_s$
$X_j = x_k + L_k\, \mathrm{randn}(\ell, 1)$
5 Error estimate = empirical mean of S statistic samples
$\frac{1}{n_s} \sum_{j=1}^{n_s} \sigma_j \approx \|x_k - x_*\|_A^2$
Work in addition to CG: $\ell$ iterations, $c\, n_s$ matvecs
Additional storage: $\ell$ search directions
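The five steps above can be sketched in NumPy as follows. This is a sketch under stated assumptions, not the implementation behind the experiments: the iteration count k is passed in rather than chosen by a convergence test, the columns of V are CG search directions normalized so that $v_i^T A v_i = 1$, and $\Phi$ is assumed diagonal with entries $\phi_i = \gamma_i \|r_{i-1}\|_2^2$, where $\gamma_i$ is the CG step size (this excerpt does not restate the definition of $\Phi$). The function names and the small diagonal test matrix are hypothetical.

```python
import numpy as np

def cg_with_directions(A, b, x0, num_iters):
    """Run CG; return iterates, A-normalized directions v_i, and weights phi_i."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    iterates, V, phi = [x.copy()], [], []
    for _ in range(num_iters):
        Ap = A @ p
        pAp = p @ Ap
        rr = r @ r
        gamma = rr / pAp
        V.append(p / np.sqrt(pAp))   # v_i^T A v_i = 1
        phi.append(gamma * rr)       # assumed weight phi_i = gamma_i ||r_{i-1}||_2^2
        x = x + gamma * p
        r_new = r - gamma * Ap
        p = r_new + (r_new @ r_new) / rr * p
        r = r_new
        iterates.append(x.copy())
    return iterates, np.column_stack(V), np.array(phi)

def s_statistic_error_estimate(A, b, x0, k, ell, num_samples, rng):
    """Steps 1-5: estimate ||x_k - x_*||_A^2 from ell extra CG iterations."""
    iterates, V, phi = cg_with_directions(A, b, x0, k + ell)  # steps 1 and 2
    x_k = iterates[k]
    L_k = V[:, k:] * np.sqrt(phi[k:])                         # step 3, d x ell
    Z = rng.standard_normal((ell, num_samples))
    D = L_k @ Z                                               # columns are X_j - x_k
    sigma = np.einsum("ij,ij->j", D, A @ D)                   # step 4, one matvec per sample
    return x_k, float(sigma.mean())                           # step 5

rng = np.random.default_rng(0)
d = 50
A = np.diag(np.linspace(1.0, 100.0, d))   # small SPD test matrix (assumption)
x_star = rng.standard_normal(d)
b = A @ x_star
x_k, estimate = s_statistic_error_estimate(
    A, b, np.zeros(d), k=10, ell=5, num_samples=2000, rng=rng
)
true_error = float((x_k - x_star) @ A @ (x_k - x_star))
print(estimate, true_error, abs(estimate - true_error) / true_error)
```

Under these assumptions the expected value of each sample is the sum of the $\ell$ retained weights $\phi_{k+1}, \ldots, \phi_{k+\ell}$, so in exact arithmetic the estimate approaches the true squared error from below as $\ell$ grows.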
Take-Home Message
Solution of Ax∗ = b with real symmetric positive definite matrix A
BayesCG: Probabilistic extension of Conjugate Gradient (CG)
BayesCG under low-rank Krylov prior = CG
Accurate error estimates with a few more iterations & matvecs
S statistic: Produces error estimates of correct magnitude,
especially where traditional bounds are not informative
Distance of posterior to solution ⇔ absolute error in iterate
Not discussed here
Numerical implementation (reducing re-orthogonalizations, numerical accuracy of posteriors)
Rigorous & meaningful statistical setting (nonlinear dependence of CG on solution, degenerate distributions)
Future Work
Systematic low-rank approximation of priors
Error bars/variance for S statistic and other statistics
Hardwiring termination criteria into posteriors
Probabilistic numerical Krylov solvers for indefinite/non-symmetric systems, and least squares
Priors and posteriors designed for capturing numerical uncertainty (roundoff)
Disintegration: Non-Gaussian distributions