Low Complexity Regularization of Inverse Problems

Seminar given at "Data science in particle physics, astrophysics and cosmology" on November 29th, 2013.

Gabriel Peyré
www.numerical-tours.com

Joint works with: Samuel Vaiter, Charles Dossal, Jalal Fadili, Mohammad Golbabaee
Overview
• Compressed Sensing and Inverse Problems
• Convex Regularization with Gauges
• Performance Guarantees
Single Pixel Camera (Rice)

P measures ← N micro-mirrors
y[i] = ⟨x0, φi⟩

[Figure: scene x̃0 and reconstructions at P/N = 1, P/N = 0.16, P/N = 0.02]
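A toy numpy sketch of this acquisition model (not the actual Rice pipeline; the ±1 random mirror patterns and the sparse test scene are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 64 * 64          # number of micro-mirrors (scene resolution)
    P = int(0.16 * N)    # number of measurements, here P/N = 0.16

    x0 = np.zeros(N)     # toy scene: a few bright pixels
    x0[rng.choice(N, size=20, replace=False)] = 1.0

    Phi = rng.choice([-1.0, 1.0], size=(P, N))   # one mirror pattern per row
    y = Phi @ x0                                 # y[i] = <x0, phi_i>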
CS Hardware Model

CS is about designing hardware: input signals f̃ ∈ L²(ℝ²).
Physical hardware resolution limit: target resolution f ∈ ℝᴺ.

[Diagram: x̃0 ∈ L² → (micro-mirrors array resolution, CS hardware) → x0 ∈ ℝᴺ → y ∈ ℝᴾ; the operator Φ stacks the mirror patterns as rows]
Inverse Problems

Recovering x0 ∈ ℝᴺ from noisy observations
    y = Φ x0 + w ∈ ℝᴾ

Examples: inpainting, super-resolution, compressed sensing.
Inverse Problem Regularization

Observations: y = Φ x0 + w ∈ ℝᴾ.
Estimator: x(y) depends only on the observations y and a parameter λ.

Example: variational methods
    x(y) ∈ argmin_{x ∈ ℝᴺ}  (1/2)‖y − Φx‖²  +  λ J(x)
                              data fidelity     regularity

Performance analysis → criteria on (x0, ‖w‖, λ) to ensure:
    L² stability:    ‖x(y) − x0‖ = O(‖w‖)
    Model stability  (e.g. spikes location)
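For J = ‖·‖₁ this estimator can be computed by proximal gradient descent (ISTA). A minimal sketch, assuming a fixed step 1/L with L = ‖Φ‖² and an untuned iteration count:

    import numpy as np

    def soft_threshold(u, t):
        # proximal operator of t * ||.||_1
        return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

    def ista(Phi, y, lam, n_iter=500):
        # minimize 1/2 ||y - Phi x||^2 + lam * ||x||_1
        L = np.linalg.norm(Phi, 2) ** 2   # Lipschitz constant of the gradient
        x = np.zeros(Phi.shape[1])
        for _ in range(n_iter):
            x = soft_threshold(x - Phi.T @ (Phi @ x - y) / L, lam / L)
        return x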
Overview
• Compressed Sensing and Inverse Problems
• Convex Regularization with Gauges
• Performance Guarantees
Union of Linear Models for Data Processing

Union of models: T ∈ 𝒯 linear spaces.

Synthesis sparsity:  coefficients x → image Ψx.
Structured sparsity: grouped coefficients.
Analysis sparsity:   image x → gradient D*x.
Low-rank:            multi-spectral imaging, x_{i,·} = Σ_{j=1}^{r} A_{i,j} S_{j,·}.

[Figure: hyperspectral imagery, measurements at many narrow contiguous wavelength bands give a complete spectrum (S_{1,·}, S_{2,·}, S_{3,·}) for each pixel]
Gauges for Union of Linear Models

Gauge: J : ℝᴺ → ℝ⁺ convex, ∀ α ∈ ℝ⁺, J(αx) = α J(x).

Union of linear models (T)_{T ∈ 𝒯} ⟷ piecewise regular ball.

Examples:
    J(x) = ‖x‖₁               T = sparse vectors
    J(x) = |x₁| + ‖x_{2,3}‖   T = block sparse vectors
    J(x) = ‖x‖_*              T = low-rank matrices
    J(x) = ‖x‖_∞              T = anti-sparse vectors
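These gauges are one-liners in numpy (a sketch; the block grouping in the second line just mirrors the |x₁| + ‖x_{2,3}‖ example above):

    import numpy as np

    J_l1      = lambda x: np.abs(x).sum()                           # sparse
    J_block   = lambda x: abs(x[0]) + np.linalg.norm(x[1:3])        # block sparse
    J_nuclear = lambda X: np.linalg.svd(X, compute_uv=False).sum()  # low-rank
    J_linf    = lambda x: np.abs(x).max()                           # anti-sparse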
Subdifferentials and Models

∂J(x) = {η \ ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩}

Example: J(x) = ‖x‖₁, with I = supp(x) = {i \ x_i ≠ 0}:
    ∂‖x‖₁ = {η \ η_I = sign(x_I), ∀ j ∉ I, |η_j| ≤ 1}

Definition:
    T_x = VectHull(∂J(x))^⊥
    η ∈ ∂J(x) ⟹ Proj_{T_x}(η) = e_x

For J = ‖·‖₁:  T_x = {η \ supp(η) ⊂ I},  e_x = sign(x).
Examples

ℓ¹ sparsity: J(x) = ‖x‖₁
    e_x = sign(x),   T_x = {z \ supp(z) ⊂ supp(x)}

Structured sparsity: J(x) = Σ_b ‖x_b‖
    e_x = (N(x_b))_{b ∈ B} where N(a) = a/‖a‖,   T_x = {z \ supp(z) ⊂ supp(x)}

Nuclear norm: J(x) = ‖x‖_*, with SVD x = U Λ V^*
    e_x = U V^*,   T_x = {z \ U_⊥^* z V_⊥ = 0}

Anti-sparsity: J(x) = ‖x‖_∞, with I = {i \ |x_i| = ‖x‖_∞}
    e_x = |I|⁻¹ sign(x),   T_x = {y \ y_I ∝ sign(x_I)}
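A numerical sketch of the pair (e_x, T_x) for the first and third examples, with the projector onto T_x written explicitly (the rank of the nuclear-norm model is passed in by hand here):

    import numpy as np

    def l1_model(x):
        e = np.sign(x)                               # e_x = sign(x)
        proj_T = lambda z: np.where(x != 0, z, 0.0)  # keep supp(z) in supp(x)
        return e, proj_T

    def nuclear_model(X, r):
        U, S, Vt = np.linalg.svd(X)
        U, Vt = U[:, :r], Vt[:r, :]
        e = U @ Vt                                   # e_x = U V^*
        def proj_T(Z):
            # P_T(Z) = Z - P_{U_perp} Z P_{V_perp}, so U_perp^* P_T(Z) V_perp = 0
            Pu, Pv = U @ U.T, Vt.T @ Vt
            I1, I2 = np.eye(Pu.shape[0]), np.eye(Pv.shape[0])
            return Z - (I1 - Pu) @ Z @ (I2 - Pv)
        return e, proj_T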
Overview
• Compressed Sensing and Inverse Problems
• Convex Regularization with Gauges
• Performance Guarantees
Dual Certificate and L2 Stability

Noiseless recovery:
    min_{Φx = Φx0} J(x)        (P₀)

Dual certificates:        D(x0) = Im(Φ^*) ∩ ∂J(x0)
Tight dual certificates:  D̄(x0) = Im(Φ^*) ∩ ri(∂J(x0))

Proposition: x0 is a solution of (P₀) ⟺ ∃ η ∈ D(x0).

Theorem [Fadili et al. 2013]:
    If ∃ η ∈ D̄(x0), then for λ ∼ ‖w‖ one has ‖x⋆ − x0‖ = O(‖w‖).

[Grasmair, Haltmeier, Scherzer 2010]: J = ‖·‖₁.
[Grasmair 2012]: J(x⋆ − x0) = O(‖w‖).
Compressed Sensing Setting

Random matrix: Φ ∈ ℝ^{P×N}, Φ_{i,j} ∼ 𝒩(0,1) i.i.d.

Theorem [Rudelson, Vershynin 2006], [Chandrasekaran et al. 2011]:
    Sparse vectors, J = ‖·‖₁. Let s = ‖x0‖₀. If
        P ≥ 2 s log(N/s),
    then ∃ η ∈ D̄(x0) with high probability on Φ.

Theorem [Chandrasekaran et al. 2011]:
    Low-rank matrices, J = ‖·‖_*. Let r = rank(x0), x0 ∈ ℝ^{N₁×N₂}. If
        P ≥ 3 r (N₁ + N₂ − r),
    then ∃ η ∈ D̄(x0) with high probability on Φ.

→ Similar results for ‖·‖_{1,2} and ‖·‖_∞.
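Plugging concrete sizes into the two bounds (numbers chosen here for illustration, not taken from the talk):

    import numpy as np

    N, s = 10**6, 100
    print(2 * s * np.log(N / s))     # ~1842 measurements for an s-sparse x0

    N1 = N2 = 1000; r = 5
    print(3 * r * (N1 + N2 - r))     # 29925 measurements for a rank-r x0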
Minimal-norm Certificate

η ∈ D(x0) ⟹ { η = Φ^* q  and  Proj_T(η) = e },  where T = T_{x0}, e = e_{x0}.

Minimal-norm pre-certificate:
    η0 = argmin_{η = Φ^* q, η_T = e} ‖q‖

Proposition: One has η0 = (Φ_T^+ Φ)^* e.

Theorem [Vaiter et al. 2013]:
    If η0 ∈ D̄(x0) and λ ∼ ‖w‖, the unique solution x⋆ of P_λ(y) for
    y = Φ x0 + w satisfies
        T_{x⋆} = T_{x0}   and   ‖x⋆ − x0‖ = O(‖w‖).

[Fuchs 2004]: J = ‖·‖₁.
[Bach 2008]: J = ‖·‖_{1,2} and J = ‖·‖_*.
[Vaiter et al. 2011]: J = ‖D^* ·‖₁.
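For J = ‖·‖₁ this pre-certificate is the classical Fuchs certificate: T is the set of vectors supported on I = supp(x0), e = sign(x0), and η0 ∈ D̄(x0) reduces to ‖η0,I^c‖_∞ < 1. A sketch (dimensions illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    P, N, s = 80, 200, 5
    Phi = rng.standard_normal((P, N))
    I = rng.choice(N, size=s, replace=False)
    x0 = np.zeros(N); x0[I] = rng.standard_normal(s)

    q = np.linalg.pinv(Phi[:, I]).T @ np.sign(x0[I])  # minimal-norm q
    eta0 = Phi.T @ q                                  # eta_0 = Phi^* q
    Ic = np.setdiff1d(np.arange(N), I)
    print(np.abs(eta0[Ic]).max() < 1)                 # certificate check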
Compressed Sensing Setting

Random matrix: Φ ∈ ℝ^{P×N}, Φ_{i,j} ∼ 𝒩(0,1) i.i.d.

Theorem [Wainwright 2009], [Dossal et al. 2011]:
    Sparse vectors, J = ‖·‖₁. Let s = ‖x0‖₀. If
        P ≥ 2 s log(N),
    then η0 ∈ D̄(x0) with high probability on Φ.

→ Similar results for ‖·‖_{1,2}, ‖·‖_*, ‖·‖_∞.
→ Not using RIP techniques (non-uniform result on x0).

Phase transitions:
    P ∼ 2 s log(N/s)  (L² stability)   vs.   P ∼ 2 s log(N)  (model stability)
1-D Sparse Spikes Deconvolution

Φx = Σ_i x_i φ(· − Δi),   J(x) = ‖x‖₁,   I = {j \ x0(j) ≠ 0}

Increasing Δ:
→ reduces correlation.
→ reduces resolution.

η0 ∈ D̄(x0)  ⟺  ‖η0,I^c‖_∞ < 1  ⟺  support recovery.

[Figure: x0 and Φx0; ‖η0,I^c‖_∞ plotted against Δ, crossing the level 1]
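A sketch of this certificate for deconvolution, with Gaussian bumps φ(· − Δi) as the columns of Φ on a fine grid (kernel width, grid step and positive spikes sign(x0) = 1 are illustrative assumptions); ‖η0,I^c‖_∞ drops below 1 once Δ is large enough:

    import numpy as np

    t = np.linspace(0.0, 1.0, 1000)
    sigma = 0.01
    phi = lambda c: np.exp(-(t - c) ** 2 / (2 * sigma ** 2))  # Gaussian bump

    def certificate_sup(delta, n_spikes=3):
        grid = np.arange(0.0, 1.0, 0.001)                # candidate positions
        Phi = np.stack([phi(c) for c in grid], axis=1)
        centers = 0.5 + delta * np.arange(n_spikes)      # spikes Delta apart
        I = np.array([np.argmin(np.abs(grid - c)) for c in centers])
        q = np.linalg.pinv(Phi[:, I]).T @ np.ones(n_spikes)  # sign(x0) = 1
        eta0 = Phi.T @ q
        Ic = np.setdiff1d(np.arange(grid.size), I)
        return np.abs(eta0[Ic]).max()

    for delta in [0.01, 0.03, 0.1]:
        print(delta, certificate_sup(delta))   # < 1 once spikes are separated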
Conclusion

Gauges: encode linear models as singular points.

Performance measures: L² error vs. model recovery → different CS guarantees.

Specific certificate η0.

Open problems:
– Approximate model recovery T_{x⋆} ≈ T_{x0}.
– CS performance with complicated gauges (e.g. TV).