
Fast Iterative Regularization Methods

Silvia Gazzola (University of Bath), joint work with J. Chung, J. Nagy, M. Sabaté Landman
Department of Mathematical Sciences
CMM-Bath Workshop on Applied and Interdisciplinary Mathematics, CMM, Universidad de Chile, Santiago
March 19, 2019
What is this talk about?
Solution of
$$Ax = b, \quad A \in \mathbb{R}^{M\times N}, \quad b \in \mathbb{R}^{M},$$
coming from a suitable discretization of
$$\int k(s, t)\, f(t)\, dt = g(s).$$
Modeling inverse problems: the process $k$ and the output $g$ ($g = g^{\mathrm{ex}} + \varepsilon$) are known; the input $f$ is unknown.
Example: image deblurring and denoising.
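Concretely, the passage from the integral equation to $Ax = b$ can be sketched as follows. This is a minimal NumPy sketch; the Gaussian kernel, the midpoint quadrature, and all parameter values are illustrative assumptions, not taken from the talk:

```python
import numpy as np

def blur_matrix(n, sigma=0.03):
    # Midpoint quadrature on [0, 1]: A[i, j] = h * k(s_i, t_j), with h = 1/n
    t = (np.arange(n) + 0.5) / n
    S, T = np.meshgrid(t, t, indexing="ij")
    return np.exp(-(S - T) ** 2 / (2 * sigma ** 2)) / n

n = 256
A = blur_matrix(n)                                              # discretized operator
f_exact = (np.abs(np.arange(n) / n - 0.5) < 0.1).astype(float)  # unknown input f
g_exact = A @ f_exact                                           # noise-free output g
rng = np.random.default_rng(0)
e = rng.standard_normal(n)
e *= 1e-2 * np.linalg.norm(g_exact) / np.linalg.norm(e)         # relative noise 1e-2
b = g_exact + e                                                 # measured data
```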
Outline
1 Features of the problem and the need for regularization
2 Iterative Regularization (via Krylov methods)
3 Sparsity constraints (via Krylov methods)
Part 1: Features of the problem and the need for regularization
In a "perfect" world
Consider the SVD of $A \in \mathbb{R}^{N\times N}$ (for this example, $N = 65536$):
$$A = U\Sigma V^T \;\Longrightarrow\; x = \sum_{i=1}^{N} \frac{u_i^T b}{\sigma_i}\, v_i.$$
For this example: $\kappa_2(A) \simeq 1.42 \cdot 10^{12}$. If there is no noise:
[Plot: singular values $\sigma_i$ and coefficients $|u_i^T b|$ versus index $i$; the coefficients decay along with the singular values.]
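The quantities in this plot are cheap to inspect on a small problem. A sketch, reusing the hypothetical `A` and `b` built above (for the talk's $65536 \times 65536$ example a full SVD would of course be unaffordable):

```python
import numpy as np

U, s, Vt = np.linalg.svd(A)        # A = U diag(s) V^T
print("kappa_2(A) =", s[0] / s[-1])

picard = np.abs(U.T @ b)           # |u_i^T b|, the Picard-plot coefficients
x_naive = Vt.T @ ((U.T @ b) / s)   # x = sum_i (u_i^T b / sigma_i) v_i
# With noise, |u_i^T b| levels off while sigma_i keeps decaying, so the
# trailing terms of the sum dominate and x_naive is useless.
```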
In a less "perfect" world
Consider the SVD of $A \in \mathbb{R}^{N\times N}$ (for this example, $N = 65536$):
$$A = U\Sigma V^T \;\Longrightarrow\; x = \sum_{i=1}^{N} \frac{u_i^T b}{\sigma_i}\, v_i.$$
If there is noise, $b = b^{\mathrm{ex}} + e$ (for this example, $\|e\|_2/\|b\|_2 = 10^{-2}$):
[Plot: $\sigma_i$ and $|u_i^T b|$ versus $i$; the coefficients now level off at the noise floor, so the trailing terms of the sum are hugely amplified.]
Applying some direct regularization (filtering) [Hansen. Discrete Inverse Problems. SIAM, 2010]
Truncated SVD (TSVD), $k \ll N$:
$$x_k = \sum_{i=1}^{k} \frac{u_i^T b}{\sigma_i}\, v_i.$$
[Plot: $\sigma_i$ and $|u_i^T b|$ versus $i$, with the components beyond the truncation index discarded.]
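In code, TSVD is one line once the SVD is available (a sketch, same conventions as above):

```python
import numpy as np

def tsvd_solution(U, s, Vt, b, k):
    # x_k = sum_{i=1}^{k} (u_i^T b / sigma_i) v_i, with k << N
    return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])
```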
Applying some direct regularization (filtering) [Hansen. Discrete Inverse Problems. SIAM, 2010]
Tikhonov regularization:
$$x_\lambda = \arg\min_{x\in\mathbb{R}^N} \|Ax - b\|_2^2 + \lambda \|Lx\|_2^2, \quad \lambda > 0.$$
Equivalently (for $L = I$),
$$x_\lambda = \sum_{i=1}^{N} \phi_i\, \frac{u_i^T b}{\sigma_i}\, v_i, \quad \text{where } \phi_i = \frac{\sigma_i^2}{\sigma_i^2 + \lambda}.$$
[Plot: the filtered SVD quantities versus $i$.]
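Tikhonov with $L = I$ is the same expansion with smooth filter factors instead of a hard cut-off (a sketch):

```python
import numpy as np

def tikhonov_solution(U, s, Vt, b, lam):
    phi = s**2 / (s**2 + lam)            # filter factors phi_i
    return Vt.T @ (phi * (U.T @ b) / s)  # sum_i phi_i (u_i^T b / sigma_i) v_i
```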
In a more "real" world: no SVD! Large-scale problems must be tackled iteratively, often with penalties that promote sparsity:
$$x_\lambda = \arg\min_{x\in\mathbb{R}^N} \|Ax - b\|_2^2 + \lambda \|x\|_1,$$
$$x_\lambda = \arg\min_{x\in\mathbb{R}^N} \|Ax - b\|_2^2 + \lambda \|\Psi x\|_1,$$
$$x_\lambda = \arg\min_{x\in\mathbb{R}^N} \|Ax - b\|_2^2 + \lambda\, \mathrm{TV}(x).$$
Part 2: Iterative Regularization (via Krylov methods)
Iterative Regularization Methods: $\min_{x\in\mathbb{R}^N} \|Ax - b\|_2$
Gradient Descent Methods: $x_{m+1} = x_m + \alpha_m A^T(b - Ax_m)$
Krylov subspace methods
[Plot: relative error versus iteration; the error first decreases and then increases as noise is picked up (semiconvergence), so the iteration count itself acts as a regularization parameter.]
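A minimal sketch of the gradient-descent (Landweber-type) iteration. Here `alpha` is a simple safe fixed step, an illustrative choice rather than the talk's $\alpha_m$, and stopping early is the regularization:

```python
import numpy as np

def gradient_descent(A, b, n_iter):
    x = np.zeros(A.shape[1])
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2   # fixed step <= 1/sigma_max^2
    residuals = []
    for _ in range(n_iter):
        x += alpha * (A.T @ (b - A @ x))      # x_{m+1} = x_m + alpha A^T(b - A x_m)
        residuals.append(np.linalg.norm(b - A @ x))
    return x, residuals
```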
A common framework for Krylov methods
Projection onto Krylov subspaces:
$$\mathcal{K}_m(C, d) = \mathrm{span}\{d, Cd, \dots, C^{m-1}d\}, \quad x_m \in \mathcal{K}_m(C, d), \quad r_m = b - Ax_m \perp \mathcal{K}_m(C', d').$$
Popular examples:
Method        $\mathcal{K}_m(C, d)$             $\mathcal{K}_m(C', d')$
GMRES         $\mathcal{K}_m(A, b)$             $A\mathcal{K}_m(A, b)$
RR-GMRES      $\mathcal{K}_m(A, Ab)$            $A\mathcal{K}_m(A, Ab)$
RR-GMRES      $\mathcal{K}_m(A, A^{\ell}b)$     $A\mathcal{K}_m(A, A^{\ell}b)$
CGLS (LSQR)   $\mathcal{K}_m(A^TA, A^Tb)$       $A\mathcal{K}_m(A^TA, A^Tb)$
At step $m$:
1 Expand the Krylov subspace: $AW_m = Z_{m+1}G_m$, with $G_m \in \mathbb{R}^{(m+1)\times m}$, $\mathcal{R}(W_m) = \mathcal{K}_m(C, d)$.
2 Solve a projected LS problem: $y_m = \arg\min_{y\in\mathbb{R}^m} \|f_m - G_m y\|_2$.
3 Take the approximation $x_m = W_m y_m$.
[Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003]
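For CGLS/LSQR, which works in $\mathcal{K}_m(A^TA, A^Tb)$, SciPy's implementation can be used directly on the hypothetical `A`, `b` above; capping the iteration count is the only regularization in this sketch:

```python
from scipy.sparse.linalg import lsqr

# 20 Krylov steps, no explicit penalty: early stopping regularizes
x_m = lsqr(A, b, iter_lim=20)[0]
```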
Gradient Descent approach vs. Krylov Subspaces approach
[Plot: relative error histories for GD and CGLS; the Krylov method attains its minimal error in far fewer iterations.]
Krylov methods regularize because: they "mimic" the TSVD; $\mathcal{K}_m(C, d) \simeq \mathcal{K}_{m+1}(C, d)$ for "small" $m$.
[G., Novati, Russo. On Krylov projection methods and Tikhonov regularization. ETNA, 2015]
Hybrid regularization
Hybrid (Krylov–Tikhonov) methods: $x_{\lambda_m, m} = W_m y_{\lambda_m, m} \in \mathcal{K}_m(C, d)$, where
$$y_{\lambda_m, m} = \arg\min_{y\in\mathbb{R}^m} \|G_m y - f_m\|_2^2 + \lambda_m \|L_m y\|_2^2.$$
Choices to make: $m$, $\lambda_m$; choice of $L_m$.
[Chung, Kilmer, O'Leary. Regularization via Operator Approximation. SISC, 2015]
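Because the projected problem is small ($m \ll N$), the Tikhonov step of a hybrid method is cheap. A sketch for $L_m = I$, solving it via stacking; here `G` and `f` stand for the projected $G_m$ and $f_m$:

```python
import numpy as np

def projected_tikhonov(G, f, lam):
    # min_y ||G y - f||_2^2 + lam ||y||_2^2 via the stacked LS problem
    m = G.shape[1]
    G_aug = np.vstack([G, np.sqrt(lam) * np.eye(m)])
    f_aug = np.concatenate([f, np.zeros(m)])
    return np.linalg.lstsq(G_aug, f_aug, rcond=None)[0]
```

Since $m$ is small, $\lambda_m$ can be re-estimated at every iteration on this small problem at negligible cost.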
Regularization parameter(s) choice
$\|e\|_2$ known: "secant" update approach (aka the discrepancy principle)
[G. and Novati. Automatic parameter setting for Arnoldi–Tikhonov methods. JCAM, 2014]
$\|e\|_2$ unknown: "embedded" approach, with $\|r_m\|_2 = O(\|e\|_2)$ eventually
[G., Novati, Russo. Embedded techniques for choosing the parameter. NLAA, 2014]
[Hnětynková, Plešinger, Strakoš. GKB and revealing the noise level. BIT, 2009]
A new class of adaptive regularization parameter choice rules [based on bi-level optimization ideas; work in progress, with M. Sabaté Landman]
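When $\|e\|_2$ is known, the discrepancy principle in its simplest form says: stop as soon as the residual norm reaches the noise level. A sketch wrapped around the gradient-descent iteration above, with `eta` a safety factor slightly above 1 (the talk's "secant" update applies the same principle to Tikhonov's $\lambda$ instead of the iteration count):

```python
import numpy as np

def iterate_until_discrepancy(A, b, noise_norm, eta=1.05, max_iter=500):
    x = np.zeros(A.shape[1])
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2
    for m in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) <= eta * noise_norm:   # ||r_m|| ~ ||e||: stop here
            return x, m
        x += alpha * (A.T @ r)
    return x, max_iter
```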
Part 3: Sparsity constraints (via Krylov methods)
Beyond the 2-norm: 1-norm
Adopting the Iteratively Reweighted Norm (IRN) strategy [Rodriguez and Wohlberg. An efficient algorithm for sparse representations. IEEE, 2008]:
$$\|x\|_1 \approx \|W_m x\|_2^2, \quad \text{with } W_m = L_m = \mathrm{diag}\!\left(\frac{1}{\sqrt{|x_{m-1}|}}\right).$$
IRN algorithm. Input: $A$, $b$, $x_0$, $L_0\,(= I)$.
For $m = 1, \dots$, till a stopping criterion is satisfied, run an (iterative) solver for
$$x_m = \arg\min_{x\in\mathbb{R}^N} \|b - Ax\|_2^2 + \lambda \|L_m x\|_2^2.$$
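A sketch of the IRN loop. Each inner problem is quadratic; here it is solved by a dense stacked least-squares solve for clarity, where the slide would use an iterative (Krylov) solver, and `eps` is a standard safeguard against division by zero near exact zeros:

```python
import numpy as np

def irn_l1(A, b, lam, n_outer=10, eps=1e-6):
    N = A.shape[1]
    x = np.ones(N)                                  # x_0, so L_0 ~ I
    for _ in range(n_outer):
        w = 1.0 / np.sqrt(np.abs(x) + eps)          # L_m = diag(1/sqrt|x_{m-1}|)
        A_aug = np.vstack([A, np.sqrt(lam) * np.diag(w)])
        b_aug = np.concatenate([b, np.zeros(N)])
        # x_m = argmin ||b - A x||_2^2 + lam ||L_m x||_2^2
        x = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]
    return x
```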
Beyond the 2-norm: 1-norm, $A \in \mathbb{R}^{N\times N}$
[G., Nagy. Generalized Arnoldi–Tikhonov Method for Sparse Reconstruction. SISC, 2014]
Standard form transformation:
$$\min_x \|b - Ax\|_2^2 + \lambda \|L^{(m)}x\|_2^2 \;\Longleftrightarrow\; \min_{\bar{x}} \|b - \bar{A}^{(m)}\bar{x}\|_2^2 + \lambda \|\bar{x}\|_2^2.$$
Preconditioned Krylov subspaces... with variable (iteration-dependent) "preconditioning".
Flexible Arnoldi algorithm [Saad. A flexible inner-outer preconditioned GMRES algorithm. SISC, 1993]:
$$AZ_m = V_{m+1}\bar{H}_m, \quad Z_m = [(L^{(1)})^{-1}v_1, (L^{(2)})^{-1}v_2, \dots, (L^{(m)})^{-1}v_m].$$
Solution $x_m = Z_m y_m$, where
$$y_m = \arg\min_{y\in\mathbb{R}^m} \big\|\,\|b\|_2 e_1 - \bar{H}_m y\big\|_2^2 + \lambda_m \|y\|_2^2.$$
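A sketch of a flexible Arnoldi iteration with IRN-style diagonal "preconditioners" $(L^{(k)})^{-1} = \mathrm{diag}(\sqrt{|x_{k-1}|})$, updated from the current iterate; modified Gram–Schmidt, no breakdown handling. This is illustrative only, not the paper's implementation:

```python
import numpy as np

def flexible_arnoldi_l1(A, b, m, eps=1e-6):
    N = b.size
    V = np.zeros((N, m + 1)); Z = np.zeros((N, m)); H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b); V[:, 0] = b / beta
    x = np.zeros(N)
    for k in range(m):
        Z[:, k] = np.sqrt(np.abs(x) + eps) * V[:, k]   # z_k = (L^(k))^{-1} v_k
        w = A @ Z[:, k]
        for j in range(k + 1):                         # modified Gram-Schmidt
            H[j, k] = V[:, j] @ w
            w -= H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
        # projected LS problem: min || ||b|| e_1 - Hbar_k y ||_2
        e1 = np.zeros(k + 2); e1[0] = beta
        y = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)[0]
        x = Z[:, :k + 1] @ y                           # x_k = Z_k y_k
    return x
```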
The star cluster test problem [G., Hansen, Nagy. IR Tools, 2018]
$A \in \mathbb{R}^{65536\times 65536}$, $\varepsilon = 10^{-2}$. [Images: exact, blurred & noisy, restored.]
[Plots: relative error history and regularization parameter versus iteration.]
The star cluster test problem
[Plot: relative error histories for the compared methods (TwIST, SpaRSA, IRN, and the proposed flexible methods); full references follow on the next slide.]
The star cluster test problem

Method        Relative Error   Iterations   Total Time (s)   Average Time (s/iteration)
SpaRSA        1.1081 · 10⁻²    343          45.16            0.13
TwIST         1.1105 · 10⁻²    102          16.39            0.16
l1_ls         1.1146 · 10⁻²    307          378.61           1.23
IRN-BPDN      1.1146 · 10⁻²    791          112.33           0.14
AT            1.8609 · 10⁻²    12           0.65             0.05
Flexi-AT      1.1610 · 10⁻²    100          5.03             0.05
NN-ReSt-GAT   4.0606 · 10⁻³    40           2.56             0.06

Bioucas-Dias, Figueiredo. A new TwIST: two-step iterative shrinkage/thresholding algorithms. IEEE Trans. Image Process., 2007.
Kim, Koh, Lustig, Boyd, Gorinevsky. An interior-point method for large-scale ℓ1-regularized least squares. IEEE J. Sel. Top. Signal Process., 2007.
Rodriguez, Wohlberg. An Efficient Algorithm for Sparse Representations with ℓp Data Fidelity Term. Proceedings of IEEE ANDESCON, 2008.
Wright, Nowak, Figueiredo. SpaRSA: Sparse Reconstruction by Separable Approximation. IEEE Trans. Signal Process., 2009.
Beyond the 2-norm: 1-norm, $A \in \mathbb{R}^{M\times N}$
[Chung, G. Flexible Krylov methods for ℓp regularization. Submitted.]
We need the Flexible Golub–Kahan (FGK) decomposition:
$$AZ_m = U_{m+1}\bar{M}_m, \quad A^T U_{m+1} = V_{m+1}\bar{T}_{m+1},$$
where $Z_m = [(L^{(1)})^{-1}v_1, (L^{(2)})^{-1}v_2, \dots, (L^{(m)})^{-1}v_m]$.
Flexible LSQR (FLSQR): $x_m = Z_m y_m$, where
$$y_m = \arg\min_{y\in\mathbb{R}^m} \big\|\,\|b\|_2 e_1 - \bar{M}_m y\big\|_2^2.$$
Prop. $x_m = \arg\min_{x\in\mathcal{R}(Z_m)} \|Ax - b\|_2$. In a hybrid fashion, $x_{\lambda_m,m} = Z_m y_{\lambda_m,m}$, where
$$y_{\lambda_m,m} = \arg\min_{y\in\mathbb{R}^m} \big\|\,\|b\|_2 e_1 - \bar{M}_m y\big\|_2^2 + \lambda_m \|y\|_2^2.$$
Analogously, in an LSMR fashion (minimizing the normal-equations residual): $x_m = Z_m y_m$, where
$$y_m = \arg\min_{y\in\mathbb{R}^m} \big\|\,\|A^T b\|_2 e_1 - \bar{M}_{m+1}\bar{T}_m y\big\|_2^2.$$
Prop. $x_m = \arg\min_{x\in\mathcal{R}(Z_m)} \|A^T(Ax - b)\|_2$. In a hybrid fashion, $x_{\lambda_m,m} = Z_m y_{\lambda_m,m}$, where
$$y_{\lambda_m,m} = \arg\min_{y\in\mathbb{R}^m} \big\|\,\|A^T b\|_2 e_1 - \bar{M}_{m+1}\bar{T}_m y\big\|_2^2 + \lambda_m \|y\|_2^2.$$
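A sketch of the flexible Golub–Kahan process with the same diagonal weights, suitable for rectangular $A$; full orthogonalization of both bases, with an FLSQR-style solution update at each step. Again an illustrative sketch under those assumptions, not the paper's implementation:

```python
import numpy as np

def flsqr(A, b, m, eps=1e-6):
    Mrows, N = A.shape
    U = np.zeros((Mrows, m + 1)); V = np.zeros((N, m + 1))
    Z = np.zeros((N, m)); Mb = np.zeros((m + 1, m))
    beta = np.linalg.norm(b); U[:, 0] = b / beta
    s = A.T @ U[:, 0]; V[:, 0] = s / np.linalg.norm(s)
    x = np.zeros(N)
    for k in range(m):
        Z[:, k] = np.sqrt(np.abs(x) + eps) * V[:, k]   # z_k = (L^(k))^{-1} v_k
        w = A @ Z[:, k]
        for j in range(k + 1):                         # builds A Z_m = U_{m+1} Mbar_m
            Mb[j, k] = U[:, j] @ w
            w -= Mb[j, k] * U[:, j]
        Mb[k + 1, k] = np.linalg.norm(w); U[:, k + 1] = w / Mb[k + 1, k]
        s = A.T @ U[:, k + 1]                          # builds A^T U = V Tbar
        for j in range(k + 1):
            s -= (V[:, j] @ s) * V[:, j]
        V[:, k + 1] = s / np.linalg.norm(s)
        # FLSQR: x_k = Z_k y_k, y_k from min || ||b|| e_1 - Mbar_k y ||_2
        e1 = np.zeros(k + 2); e1[0] = beta
        y = np.linalg.lstsq(Mb[:k + 2, :k + 1], e1, rcond=None)[0]
        x = Z[:, :k + 1] @ y
    return x
```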
Beyond the 1-norm: sparsity under transform
Equivalent problems (for $\Psi$ orthogonal):
$$\min_{x\in\mathbb{R}^N} \|b - Ax\|_2^2 + \lambda\|\Psi x\|_1 \;\Longleftrightarrow\; \min_{s} \|b - A\Psi^T s\|_2^2 + \lambda\|s\|_1.$$
Solution subspace for flexible Arnoldi (with $B = \Psi A \Psi^T$ and $d = \Psi b$):
$$s_m \in \mathrm{span}\{(L^{(1)})^{-1}d,\ (L^{(2)})^{-1}B(L^{(1)})^{-1}d,\ \dots,\ (L^{(m)})^{-1}B\cdots(L^{(2)})^{-1}B(L^{(1)})^{-1}d\}$$
equivalently
$$x_m \in \Psi^T \mathrm{span}\{(L^{(1)})^{-1}\Psi b,\ \dots,\ (L^{(m)})^{-1}\Psi A\Psi^T\cdots(L^{(2)})^{-1}\Psi A\Psi^T(L^{(1)})^{-1}\Psi b\}.$$
More straightforward for flexible Golub–Kahan ($\Psi = I$).
A simple 1D example...
[Plots: 1D test signal and its reconstructions.]
A simple 1D example... [– GMRES; – FGMRES]
[Plots: history of relative errors for GMRES and FGMRES, and the corresponding reconstructions.]
An image deblurring example. [Images: true, PSF, observed.]
[Plot: relative error versus iteration.]
An image deblurring example. [Images: restorations by FLSQR, FLSQR-R, HyBR.]
Beyond the 1-norm: TV penalization
$$\min_x \|b - Ax\|_2^2 + \lambda\,\mathrm{TV}(x)$$
[Wohlberg and Rodriguez. An iteratively reweighted norm algorithm for TV. IEEE, 2007]
1D case: $\mathrm{TV}(x) = \|D_{1d}x\|_1 \simeq \|W_{1d}D_{1d}x\|_2^2$, where $D_{1d}$ is the bidiagonal forward-difference matrix (rows of the form $[\cdots\ {-1}\ \ 1\ \cdots]$).
2D case: $\mathrm{TV}(x) = \big\|\big((D_h x)^2 + (D_v x)^2\big)^{1/2}\big\|_1$, with $D_{2d} = \begin{bmatrix} D_h \\ D_v \end{bmatrix}$.
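A sketch of the 1D reweighting: build $D_{1d}$ and IRN-style TV weights from the current iterate, with `eps` again a smoothing safeguard; each outer step then solves $\min_x \|Ax - b\|_2^2 + \lambda\|\mathrm{diag}(w)\,D_{1d}\,x\|_2^2$:

```python
import numpy as np

def tv_weights_1d(x, eps=1e-6):
    n = x.size
    D = np.diff(np.eye(n), axis=0)           # (n-1) x n forward differences
    w = 1.0 / np.sqrt(np.abs(D @ x) + eps)   # diagonal of W_1d
    return D, w
```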
Smoothing Norm, $A \in \mathbb{R}^{N\times N}$
Standard form transformation:
$$\min_{\bar{y}} \|\bar{b} - \bar{A}\bar{y}\|_2^2 + \lambda^2 \|\bar{y}\|_2^2, \quad \text{where} \quad \bar{A} = AL_A^\dagger = A\big[I - \big(A(I - L^\dagger L)\big)^\dagger A\big]L^\dagger, \quad \bar{b} = b - Ax_0,$$
so that $x_L = L_A^\dagger \bar{y}_L + x_0 = \bar{x}_L + x_0$.
[Hansen and Jensen. Smoothing-Norm Preconditioning for Regularizing Minimum-Residual Methods. SIMAX, 2007]
Write:
$$x_L = \bar{x}_L + x_0 = L_A^\dagger \bar{y}_L + Kt_0, \quad \text{where } \mathcal{R}(K) = \mathcal{N}(L), \quad L_A^\dagger \text{ rectangular}.$$
Equivalently:
$$A\,[\,L_A^\dagger \;\; K\,]\begin{bmatrix} \bar{y}_L \\ t_0 \end{bmatrix} = b,$$
and, further:
$$\begin{bmatrix} (L_A^\dagger)^T A L_A^\dagger & (L_A^\dagger)^T A K \\ K^T A L_A^\dagger & K^T A K \end{bmatrix}\begin{bmatrix} \bar{y}_L \\ t_0 \end{bmatrix} = \begin{bmatrix} (L_A^\dagger)^T b \\ K^T b \end{bmatrix}.$$
Schur complement system:
$$(L_A^\dagger)^T P A L_A^\dagger\, \bar{y} = (L_A^\dagger)^T P b, \quad \text{where} \quad P = I - AK(K^T A K)^{-1}K^T \in \mathbb{R}^{N\times N}.$$
Beyond the 1-norm: TV penalization, $A \in \mathbb{R}^{N\times N}$
[G. and Sabaté Landman. Flexible GMRES for Total Variation Regularization. BIT, 2019]
Similar idea, with reweighting... building a better approximation subspace for the solution!
$L = WD$ (with $W = W(x_m)$): flexible GMRES (instead of restarted GMRES). For large-scale computations:
approximating $L^\dagger$ (exploiting structure, and running preconditioned LSQR or LSMR);
thresholding the weights.
A simple 1D example...
[Plots: 1D test signal, reconstructions, and pointwise errors.]
Numerical experiments, small image
[G., Hansen, Nagy. IR Tools: A MATLAB Package of Iterative Regularization Methods and Large-Scale Test Problems. Numer. Algorithms, 2018]
Small geometrical example, 32 × 32 pixels. [Images: blurred and noisy image, and restorations by GMRES, GMRES (D), TV-FGMRES, and TV-FGMRES "0 norm".]
[Plots: relative errors and total variation versus iteration.]
Numerical experiments, larger image
[G., Hansen, Nagy. IR Tools: A MATLAB Package of Iterative Regularization Methods and Large-Scale Test Problems. Numer. Algorithms, 2018]
Cameraman example, 256 × 256 pixels. [Images: corrupted image and SN-GMRES restoration.]
[Plots: relative errors and total variation versus iteration.]
Final remarks & future plans
Keep using Krylov methods!
Thanks for your attention!