Low Complexity Regularization of Inverse Problems

Seminar given at "Data science in particle physics, astrophysics and cosmology" on November 29th, 2013.

Gabriel Peyré
www.numerical-tours.com

Joint works with: Samuel Vaiter, Charles Dossal, Jalal Fadili, Mohammad Golbabaee
Overview
• Compressed Sensing and Inverse Problems
• Convex Regularization with Gauges
• Performance Guarantees
Single Pixel Camera (Rice)

P measures ← N micro-mirrors
y[i] = ⟨x0, φi⟩

[Figure: scene x̃0 and reconstructions at P/N = 1, P/N = 0.16, P/N = 0.02]
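A toy numpy sketch of this acquisition model (not the actual Rice pipeline; the ±1 random mirror patterns and the sparse test scene are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 64 * 64          # number of micro-mirrors (scene resolution)
    P = int(0.16 * N)    # number of measurements, here P/N = 0.16

    x0 = np.zeros(N)     # toy scene: a few bright pixels
    x0[rng.choice(N, size=20, replace=False)] = 1.0

    Phi = rng.choice([-1.0, 1.0], size=(P, N))   # one mirror pattern per row
    y = Phi @ x0                                 # y[i] = <x0, phi_i>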
CS Hardware Model

CS is about designing hardware: input signals f̃ ∈ L²(ℝ²).
Physical hardware resolution limit: target resolution f ∈ ℝᴺ.

[Diagram: x̃0 ∈ L² → (micro-mirrors array resolution, CS hardware) → x0 ∈ ℝᴺ → y ∈ ℝᴾ; the operator Φ stacks the mirror patterns as rows]
Inverse Problems

Recovering x0 ∈ ℝᴺ from noisy observations
    y = Φ x0 + w ∈ ℝᴾ

Examples: inpainting, super-resolution, compressed sensing.
Inverse Problem Regularization

Observations: y = Φ x0 + w ∈ ℝᴾ.
Estimator: x(y) depends only on the observations y and a parameter λ.

Example: variational methods
    x(y) ∈ argmin_{x ∈ ℝᴺ}  (1/2)‖y − Φx‖²  +  λ J(x)
                              data fidelity     regularity

Performance analysis → criteria on (x0, ‖w‖, λ) to ensure:
    L² stability:    ‖x(y) − x0‖ = O(‖w‖)
    Model stability  (e.g. spikes location)
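For J = ‖·‖₁ this estimator can be computed by proximal gradient descent (ISTA). A minimal sketch, assuming a fixed step 1/L with L = ‖Φ‖² and an untuned iteration count:

    import numpy as np

    def soft_threshold(u, t):
        # proximal operator of t * ||.||_1
        return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

    def ista(Phi, y, lam, n_iter=500):
        # minimize 1/2 ||y - Phi x||^2 + lam * ||x||_1
        L = np.linalg.norm(Phi, 2) ** 2   # Lipschitz constant of the gradient
        x = np.zeros(Phi.shape[1])
        for _ in range(n_iter):
            x = soft_threshold(x - Phi.T @ (Phi @ x - y) / L, lam / L)
        return x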
Overview
• Compressed Sensing and Inverse Problems
• Convex Regularization with Gauges
• Performance Guarantees
Union of Linear Models for Data Processing

Union of models: T ∈ 𝒯 linear spaces.

Synthesis sparsity:  coefficients x → image Ψx.
Structured sparsity: grouped coefficients.
Analysis sparsity:   image x → gradient D*x.
Low-rank:            multi-spectral imaging, x_{i,·} = Σ_{j=1}^{r} A_{i,j} S_{j,·}.

[Figure: hyperspectral imagery, measurements at many narrow contiguous wavelength bands give a complete spectrum (S_{1,·}, S_{2,·}, S_{3,·}) for each pixel]
Gauges for Union of Linear Models

Gauge: J : ℝᴺ → ℝ⁺ convex, ∀ α ∈ ℝ⁺, J(αx) = α J(x).

Union of linear models (T)_{T ∈ 𝒯} ⟷ piecewise regular ball.

Examples:
    J(x) = ‖x‖₁               T = sparse vectors
    J(x) = |x₁| + ‖x_{2,3}‖   T = block sparse vectors
    J(x) = ‖x‖_*              T = low-rank matrices
    J(x) = ‖x‖_∞              T = anti-sparse vectors
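These gauges are one-liners in numpy (a sketch; the block grouping in the second line just mirrors the |x₁| + ‖x_{2,3}‖ example above):

    import numpy as np

    J_l1      = lambda x: np.abs(x).sum()                           # sparse
    J_block   = lambda x: abs(x[0]) + np.linalg.norm(x[1:3])        # block sparse
    J_nuclear = lambda X: np.linalg.svd(X, compute_uv=False).sum()  # low-rank
    J_linf    = lambda x: np.abs(x).max()                           # anti-sparse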
Subdifferentials and Models

∂J(x) = {η \ ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩}

Example: J(x) = ‖x‖₁, with I = supp(x) = {i \ x_i ≠ 0}:
    ∂‖x‖₁ = {η \ η_I = sign(x_I), ∀ j ∉ I, |η_j| ≤ 1}

Definition:
    T_x = VectHull(∂J(x))^⊥
    η ∈ ∂J(x) ⟹ Proj_{T_x}(η) = e_x

For J = ‖·‖₁:  T_x = {η \ supp(η) ⊂ I},  e_x = sign(x).
Examples

ℓ¹ sparsity: J(x) = ‖x‖₁
    e_x = sign(x),   T_x = {z \ supp(z) ⊂ supp(x)}

Structured sparsity: J(x) = Σ_b ‖x_b‖
    e_x = (N(x_b))_{b ∈ B} where N(a) = a/‖a‖,   T_x = {z \ supp(z) ⊂ supp(x)}

Nuclear norm: J(x) = ‖x‖_*, with SVD x = U Λ V^*
    e_x = U V^*,   T_x = {z \ U_⊥^* z V_⊥ = 0}

Anti-sparsity: J(x) = ‖x‖_∞, with I = {i \ |x_i| = ‖x‖_∞}
    e_x = |I|⁻¹ sign(x),   T_x = {y \ y_I ∝ sign(x_I)}
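A numerical sketch of the pair (e_x, T_x) for the first and third examples, with the projector onto T_x written explicitly (the rank of the nuclear-norm model is passed in by hand here):

    import numpy as np

    def l1_model(x):
        e = np.sign(x)                               # e_x = sign(x)
        proj_T = lambda z: np.where(x != 0, z, 0.0)  # keep supp(z) in supp(x)
        return e, proj_T

    def nuclear_model(X, r):
        U, S, Vt = np.linalg.svd(X)
        U, Vt = U[:, :r], Vt[:r, :]
        e = U @ Vt                                   # e_x = U V^*
        def proj_T(Z):
            # P_T(Z) = Z - P_{U_perp} Z P_{V_perp}, so U_perp^* P_T(Z) V_perp = 0
            Pu, Pv = U @ U.T, Vt.T @ Vt
            I1, I2 = np.eye(Pu.shape[0]), np.eye(Pv.shape[0])
            return Z - (I1 - Pu) @ Z @ (I2 - Pv)
        return e, proj_T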
Overview
• Compressed Sensing and Inverse Problems
• Convex Regularization with Gauges
• Performance Guarantees
Dual Certificate and L2 Stability

Noiseless recovery:
    min_{Φx = Φx0} J(x)        (P₀)

Dual certificates:        D(x0) = Im(Φ^*) ∩ ∂J(x0)
Tight dual certificates:  D̄(x0) = Im(Φ^*) ∩ ri(∂J(x0))

Proposition: x0 is a solution of (P₀) ⟺ ∃ η ∈ D(x0).

Theorem [Fadili et al. 2013]:
    If ∃ η ∈ D̄(x0), then for λ ∼ ‖w‖ one has ‖x⋆ − x0‖ = O(‖w‖).

[Grasmair, Haltmeier, Scherzer 2010]: J = ‖·‖₁.
[Grasmair 2012]: J(x⋆ − x0) = O(‖w‖).
Compressed Sensing Setting

Random matrix: Φ ∈ ℝ^{P×N}, Φ_{i,j} ∼ 𝒩(0,1) i.i.d.

Theorem [Rudelson, Vershynin 2006], [Chandrasekaran et al. 2011]:
    Sparse vectors, J = ‖·‖₁. Let s = ‖x0‖₀. If
        P ≥ 2 s log(N/s),
    then ∃ η ∈ D̄(x0) with high probability on Φ.

Theorem [Chandrasekaran et al. 2011]:
    Low-rank matrices, J = ‖·‖_*. Let r = rank(x0), x0 ∈ ℝ^{N₁×N₂}. If
        P ≥ 3 r (N₁ + N₂ − r),
    then ∃ η ∈ D̄(x0) with high probability on Φ.

→ Similar results for ‖·‖_{1,2} and ‖·‖_∞.
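Plugging concrete sizes into the two bounds (numbers chosen here for illustration, not taken from the talk):

    import numpy as np

    N, s = 10**6, 100
    print(2 * s * np.log(N / s))     # ~1842 measurements for an s-sparse x0

    N1 = N2 = 1000; r = 5
    print(3 * r * (N1 + N2 - r))     # 29925 measurements for a rank-r x0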
Minimal-norm Certificate

η ∈ D(x0) ⟹ { η = Φ^* q  and  Proj_T(η) = e },  where T = T_{x0}, e = e_{x0}.

Minimal-norm pre-certificate:
    η0 = argmin_{η = Φ^* q, η_T = e} ‖q‖

Proposition: One has η0 = (Φ_T^+ Φ)^* e.

Theorem [Vaiter et al. 2013]:
    If η0 ∈ D̄(x0) and λ ∼ ‖w‖, the unique solution x⋆ of P_λ(y) for
    y = Φ x0 + w satisfies
        T_{x⋆} = T_{x0}   and   ‖x⋆ − x0‖ = O(‖w‖).

[Fuchs 2004]: J = ‖·‖₁.
[Bach 2008]: J = ‖·‖_{1,2} and J = ‖·‖_*.
[Vaiter et al. 2011]: J = ‖D^* ·‖₁.
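For J = ‖·‖₁ this pre-certificate is the classical Fuchs certificate: T is the set of vectors supported on I = supp(x0), e = sign(x0), and η0 ∈ D̄(x0) reduces to ‖η0,I^c‖_∞ < 1. A sketch (dimensions illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    P, N, s = 80, 200, 5
    Phi = rng.standard_normal((P, N))
    I = rng.choice(N, size=s, replace=False)
    x0 = np.zeros(N); x0[I] = rng.standard_normal(s)

    q = np.linalg.pinv(Phi[:, I]).T @ np.sign(x0[I])  # minimal-norm q
    eta0 = Phi.T @ q                                  # eta_0 = Phi^* q
    Ic = np.setdiff1d(np.arange(N), I)
    print(np.abs(eta0[Ic]).max() < 1)                 # certificate check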
Compressed Sensing Setting

Random matrix: Φ ∈ ℝ^{P×N}, Φ_{i,j} ∼ 𝒩(0,1) i.i.d.

Theorem [Wainwright 2009], [Dossal et al. 2011]:
    Sparse vectors, J = ‖·‖₁. Let s = ‖x0‖₀. If
        P ≥ 2 s log(N),
    then η0 ∈ D̄(x0) with high probability on Φ.

→ Similar results for ‖·‖_{1,2}, ‖·‖_*, ‖·‖_∞.
→ Not using RIP techniques (non-uniform result on x0).

Phase transitions:
    P ∼ 2 s log(N/s)  (L² stability)   vs.   P ∼ 2 s log(N)  (model stability)
1-D Sparse Spikes Deconvolution

Φx = Σ_i x_i φ(· − Δi),   J(x) = ‖x‖₁,   I = {j \ x0(j) ≠ 0}

Increasing Δ:
→ reduces correlation.
→ reduces resolution.

η0 ∈ D̄(x0)  ⟺  ‖η0,I^c‖_∞ < 1  ⟺  support recovery.

[Figure: x0 and Φx0; ‖η0,I^c‖_∞ plotted against Δ, crossing the level 1]
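A sketch of this certificate for deconvolution, with Gaussian bumps φ(· − Δi) as the columns of Φ on a fine grid (kernel width, grid step and positive spikes sign(x0) = 1 are illustrative assumptions); ‖η0,I^c‖_∞ drops below 1 once Δ is large enough:

    import numpy as np

    t = np.linspace(0.0, 1.0, 1000)
    sigma = 0.01
    phi = lambda c: np.exp(-(t - c) ** 2 / (2 * sigma ** 2))  # Gaussian bump

    def certificate_sup(delta, n_spikes=3):
        grid = np.arange(0.0, 1.0, 0.001)                # candidate positions
        Phi = np.stack([phi(c) for c in grid], axis=1)
        centers = 0.5 + delta * np.arange(n_spikes)      # spikes Delta apart
        I = np.array([np.argmin(np.abs(grid - c)) for c in centers])
        q = np.linalg.pinv(Phi[:, I]).T @ np.ones(n_spikes)  # sign(x0) = 1
        eta0 = Phi.T @ q
        Ic = np.setdiff1d(np.arange(grid.size), I)
        return np.abs(eta0[Ic]).max()

    for delta in [0.01, 0.03, 0.1]:
        print(delta, certificate_sup(delta))   # < 1 once spikes are separated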
Conclusion

Gauges: encode linear models as singular points.

Performance measures: L² error vs. model recovery → different CS guarantees.

Specific certificate η0.

Open problems:
– Approximate model recovery T_{x⋆} ≈ T_{x0}.
– CS performance with complicated gauges (e.g. TV).