discrete versus continuous sparse regularizationthe deconvolution problem assumption:the signal m 0...

Discrete versus Continuous sparse regularization

Vincent Duval Gabriel Peyre

CNRS - Universite Paris-Dauphine

August 29, 2014

Vincent Duval, Gabriel Peyre Discrete vs continuous August 29, 2014 1 / 22

Outline

1 The deconvolution problem

2 Discrete regularization

3 Continuous regularization


Outline




Deconvolution

Measuring devices have a non sharp impulse response: our observationsare blurred of a ”true ideal scene”.

Geophysics,

Astronomy,

Microscopy,

Spectroscopy,

. . . Image courtesy of S. Ladjal

Goal: Obtain as much detail as we can from given measurements.


The Deconvolution Problem

Consider a signal m0 defined on T = R/Z (i.e. [0, 1) with periodicboundary condition).

Perturbation model:

Original Signal

m0

t0 1

∗

Low-pass filter

ϕ

t0.5−0.5

+

Noise

w

t0 1

=

Observation

y0 + w

t0 1

Goal: recover m0 from the observation y0 + w = ϕ ∗m0 + w (orsimply y0 = ϕ ∗m0)

Ill-posed problem:I the low pass filter might not be invertible (ϕn = 0 for some

frequency n)I even though, the problem is ill-conditioned (|ϕn| � |ϕ0| for high

frequencies n)



Consider a signal m0 defined on T = R/Z (i.e. [0, 1) with periodicboundary condition).

Perturbation model:

Original Signal

m0

t0 1

∗

Low-pass filter

ϕ

t0.5−0.5

+

Noise

w

t0 1

=

Observation

y0 + w

t0 1

Goal: recover m0 from the observation y0 + w = ϕ ∗m0 + w (orsimply y0 = ϕ ∗m0)Ill-posed problem:

I the low pass filter might not be invertible (ϕn = 0 for somefrequency n)

I even though, the problem is ill-conditioned (|ϕn| � |ϕ0| for highfrequencies n)



Assumption: the signal m0 is sparse

Original Signal

m0

t0 1x1

x2

x3

∗

Low-pass filter

ϕ

t0.5−0.5

+

Noise

w

t0 1

=

Observation

y + w

t0 1

m0 =N∑i=1

aiδxi , where

ai ∈ R,xi ∈ T,N ∈ N is small.

so that we observe y + w =∑N

i=1 aiϕ(· − xi) + w.

Idea: Look for a sparse signal m such that ϕ ∗m ≈ y0 + w (or y0).


Question

How can we use sparsity to solve the inverse problem?

Can we guarantee that the reconstructed signal is close to the originalone?

What can we say about the structure/support of the solution?


Outline




Discretization

Define a finite grid G = { i2M

; 0 6 i 6 2M − 1} ⊂ T,

and consider signals of the form m =∑2M−1

i=0 aiδ i

2M.

Candidate Signal

m

t0 1

12M

32M

2M−12M

Write

ϕ ∗m =

2M−1∑i=0

aiϕ(· − i

2M)

=

ϕ ϕ(·− 12M

) . . . ϕ(· − 2M−12M

)

︸︷︷︸

Φ

a0

a1...

aM−1

︸︷︷︸

a

.

Equivalent paradigm: Look for a sparse vector a ∈ R2M such thatΦa ≈ y0 (or Φa ≈ y0 + w).


Discrete `1 regularizationDefine

‖m‖`1(G) =

{ ∑2M−1i=0 |ai| if m =

∑2M−1i=0 aiδi/2M ,

+∞ otherwise.

m0 =∑2M−1

i=0 a0,iδi/2M

t0 11

2M

32M

2M−12M

Basis Pursuit [Chen & Donoho (94)]

infm∈M(T)

‖m‖`1(G) such that Φm = y0 (PM0 (y0))

LASSO [Tibshirani (96)] or Basis Pursuit Denoising [Chen et al. (99)]

infm∈M(T)

λ‖m‖`1(G) +1

2||Φm− (y0 + w)||22 (PMλ (y0 + w))

`2-robustness ([Grasmair et al.(11)])

If m0 =∑i a0,iδi/2M is the unique solution to PM0 (y0), and mλ =

∑i aλ,i is a

solution to PMλ (y0 + w), then ‖aλ − a0‖2 = O(‖w‖2) for λ = C‖w‖2.


Discrete `1 regularizationDefine

‖m‖`1(G) =

{ ∑2M−1i=0 |ai| if m =

∑2M−1i=0 aiδi/2M ,

+∞ otherwise.

Basis Pursuit [Chen & Donoho (94)]

infm∈M(T)

‖m‖`1(G) such that Φm = y0 (PM0 (y0))

LASSO [Tibshirani (96)] or Basis Pursuit Denoising [Chen et al. (99)]

infm∈M(T)

λ‖m‖`1(G) +1

2||Φm− (y0 + w)||22 (PMλ (y0 + w))

`2-robustness ([Grasmair et al.(11)])

If m0 =∑i a0,iδi/2M is the unique solution to PM0 (y0), and mλ =

∑i aλ,i is a

solution to PMλ (y0 + w), then ‖aλ − a0‖2 = O(‖w‖2) for λ = C‖w‖2.


Robustness of the support (discrete problem)

Can one guarantee that Suppmλ = Suppm0?

Sufficient conditions for Suppmλ ⊆ Suppm0:I Exact Recovery Principle (ERC) [Tropp (06)]

I Weak Exact Recovery Principle (W-ERC) [Dossal & Mallat (05)]

Almost necessary and sufficient Suppmλ = Suppm0

I Fuchs criterion [Fuchs (04)]


Optimality conditionsFrom convex analysis:

a0 is a solution to PM0 (y0) if and only if there exists p ∈ L2(T) such that thefunction η = Φ∗p satisfies:

∀i s.t. a0,i 6= 0, η

(i

2M

)= sign(a0,i) and ∀k,

∣∣∣∣η( k

2M

)∣∣∣∣ 6 1;

aλ is a solution to PMλ (y0 + w) if and only if the functionηλ = 1

λΦ∗(y0 + w − Φ∗aλ) satisfies:

∀i s.t. aλ,i 6= 0, ηλ

(i

2M

)= sign(aλ,i) and ∀k,

∣∣∣∣ηλ( k

2M

)∣∣∣∣ 6 1;

−1

0

1


Fuchs theorem

For m0 =∑Ni=1 a0,iδx0,i , define

ηF = Φ∗pF , where

pF = argmin{‖p‖L2(T); (Φ∗p)(x0,i) = sign(a0,i)}

= Φ+,∗s.−1

0

1

Theorem ([Fuchs (04)])

Assume that {ϕ(· − x0,1), . . . ϕ(· − x0,N )} has full rank.If |ηF ( k

2M)| < 1 for all k such that k

2M/∈ {x0,1, . . . , x0,N}, then m0 is the unique

solution to PM0 (y0), and there exists α > 0, λ0 > 0 such that for 0 6 λ 6 λ0 and‖w‖2 6 αλ,

The solution mMλ to PMλ (y0 + w) is unique.

SuppmM = Suppm0, that is mMλ =

∑Ni=1 a

Mλ,iδx0,i , and sign(aλ,i) = sign(a0,i),

aMλ,I = a0,I + Φ+xIw − λ(Φ∗xIΦxI )−1 sign(a0,I).

If |ηF ( k2M

)| > 1 for some k, the support is not stable.


Fuchs theorem

For m0 =∑Ni=1 a0,iδx0,i , define

ηF = Φ∗pF , where

pF = argmin{‖p‖L2(T); (Φ∗p)(x0,i) = sign(a0,i)}

= Φ+,∗s.

Theorem ([Fuchs (04)])

Assume that {ϕ(· − x0,1), . . . ϕ(· − x0,N )} has full rank.If |ηF ( k

2M)| < 1 for all k such that k

2M/∈ {x0,1, . . . , x0,N}, then m0 is the unique

solution to PM0 (y0), and there exists α > 0, λ0 > 0 such that for 0 6 λ 6 λ0 and‖w‖2 6 αλ,

The solution mMλ to PMλ (y0 + w) is unique.

SuppmM = Suppm0, that is mMλ =

∑Ni=1 a

Mλ,iδx0,i , and sign(aλ,i) = sign(a0,i),

aMλ,I = a0,I + Φ+xIw − λ(Φ∗xIΦxI )−1 sign(a0,I).

If |ηF ( k2M

)| > 1 for some k, the support is not stable.


But. . .

1 1

When the grid is too thin, the Fuchs criterion cannot hold⇒ the support is not stable.

Theorem (D.-Peyre 2013)

If the Non Degenerate Source Condition holds, and M is large enough,the support at low noise contained in

⋃i∈I(i ∪ i+ εi) where εi = ±1 .

In general, the support at low noise is made at mostof pairs of consecutive spikes: the original one andone of its immediate neighbours.

t0 1

12M

32M

2M−12M


Outline




Continuous setting (total variation)

As proposed in [de Castro & Gamboa (12), Candes & Fernandez-Granda (13), Bredies &

Pikkarainen (13), Tang et al. (12)], consider the

Total variation of measures:

|m|(T) = sup

{∫Tψdm;ψ ∈ C(T), ‖ψ‖∞ 6 1

}m0 =

∑Ni=1 a0,iδx0,i

t0 1

x0,1

x0,2

x0,3

Example : If m =∑Ni=0 aiδxi , then |m|(T) =

∑Ni=0 |ai|.

If m = fdL, then |m|(T) =∫T |f(t)|dt.

Rationale: the extreme points of {m ∈M(T), |m|(T) 6 1} are the Diracmasses: δx for x ∈ T.


Total variation regularization

Total variation of measures:|m|(T) = sup

{∫T ψdm;ψ ∈ C(T), ‖ψ‖∞ 6 1

}Basis Pursuit for measures [de Castro & Gamboa (12), Candes &

Fernandez-Granda (13)],

infm∈M(T)

|m|(T) such that Φm = y0 (P0(y0))

LASSO for measures [Bredies & Pikkarainen (13), Azais et al. (13)]

infm∈M(T)

λ|m|(T) +1

2||Φm− (y0 + w)||22 (Pλ(y0 + w))

Numerical methods for solving P0(y0) and Pλ(y0 + w) are proposed in [Bredies &

Pikkarainen (13), Candes & Fernandez-Granda (13)]


Identifiability for discrete measuresMinimum separation distance of a measure m:

∆(m) = minx,x′∈Suppm,x 6=x′

|x− x′|

Ideal Low Pass filter: ϕ(t) = sin(2fc+1)πt)sinπt

i.e ϕn = 1 for |n| 6 fc, 0 otherwise.

m =∑N

i=1 aiδxi

t0 1

x1

x2

x3

∆(m)

Theorem ((Candes and Fernandez-Granda, 2013))

Let ϕ be the ideal low-pass filter. There exists a constant C > 0 suchthat, for any (discrete) measure m0 with ∆(m0) > C

fc, m0 is the unique

solution of

infm∈M(T)

|m|(T) such that Φm = y0 (P0(y0))

where y0 = Φm0.

Remark: 1 6 C 6 1.87.


Robustness?

Weak-* robustness ([Bredies & Pikkarainen (13)])

If m0 =∑i a0,iδx0,i is the unique solution to P0(y0), mλ is a solution to

Pλ(y0 + w), then mλ∗⇀m0 as λ→ 0+, ‖w‖22/λ→ 0.

(see also [Azais et al. (13), Fernandez-Granda (13)] for robustness of local averages in the case of the ideal LPF)


Optimality conditions

From convex analysis:

m0 =∑Ni=1 a0,iδx0,i is a solution to P0(y0) if and only if there exists p ∈ L2(T)

such that the function η = Φ∗p satisfies:

∀1 6 i 6 N, η (x0,i) = sign(a0,i) and ∀t ∈ T, |η (t)| 6 1;

mλ =∑N′

i=1 aλ,iδxλ,i is a solution to Pλ(y0 + w) if and only if the functionηλ = 1

λΦ∗(y0 + w − Φ∗aλ) satisfies:

∀1 6 i 6 N ′, ηλ (xλ,i) = sign(aλ,i) and ∀t ∈ T, |ηλ (t)| 6 1;

−1

0

1


Robustness of the support (continuous problem)

For m0 =∑Ni=1 ai0δxi,0 , define

ηV = Φ∗pV , where

pV = argmin{‖p‖L2(T); (Φ∗p)(x0,i) = sign(ai,0),

and (Φ∗p)′(x0,i) = 0}−1

0

1


Robustness of the support (continuous problem)

For m0 =∑Ni=1 ai0δxi,0 , define

ηV = Φ∗pV , where

pV = argmin{‖p‖L2(T); (Φ∗p)(x0,i) = sign(ai,0),

and (Φ∗p)′(x0,i) = 0}

Theorem ((D.-Peyre 2013))

Assume that Γx0= (ϕ(· − x0,1), . . . ϕ(· − x0,N , ϕ′(· − x0,1), . . . ϕ′(· − x0,1)) has

full rank, and that m0 satisfies the Non Degenerate Source Condition, i.e.ηV = Φ∗pV satisfies

|ηV (t)| < 1 for all t ∈ T \ {x0,1, . . . , x0,N},η′′V (x0,i) 6= 0 for all 1 6 i 6 N .

Then there exists, α > 0, λ0 > 0 such that for 0 6 λ 6 λ0 and ‖w‖2 6 αλ,

the solution mλ to Pλ(y + w) is unique and has exactly N spikes,

mλ =∑Ni=1 aλ,iδxλ,i ,

the mapping (λ,w) 7→ (aλ, xλ) is C1.


Conclusion

Low noise stability analysis for the discrete and continuous frameworks

Imposing a grid leads to the apparition of parasitic spikes

Experiment yourself the Sparse Spikes Deconvolution on Numericaltours!

www.numerical-tours.com

Preprint:Exact Support Recovery for Sparse Spikes Deconvolution (2013),

V. Duval & G. Peyre,submitted to FoCM


www.numerical-tours.com

Thank you for your attention!

Azais, J.-M., De Castro, Y., and Gamboa, F. (2013). Spike detection frominaccurate samplings. Technical report.

Bredies, K. and Pikkarainen, H. (2013). Inverse problems in spaces ofmeasures. ESAIM: Control, Optimisation and Calculus of Variations,19:190–218.

Candes, E. J. and Fernandez-Granda, C. (2013). Towards a mathematicaltheory of super-resolution. Communications on Pure and AppliedMathematics. To appear.

Chen, S. and Donoho, D. (1994). Basis pursuit. Technical report, StanfordUniversity.

Chen, S., Donoho, D., and Saunders, M. (1999). Atomic decomposition bybasis pursuit. SIAM journal on scientific computing, 20(1):33–61.

de Castro, Y. and Gamboa, F. (2012). Exact reconstruction using beurlingminimal extrapolation. Journal of Mathematical Analysis andApplications, 395(1):336–354.

Dossal, C. and Mallat, S. (2005). Sparse spike deconvolution withminimum scale. In Proceedings of SPARS, pages 123–126.

Fuchs, J. (2004). On sparse representations in arbitrary redundant bases.IEEE Transactions on Information Theory, 50(6):1341–1344.

Grasmair, M., Scherzer, O., and Haltmeier, M. (2011). Necessary andsufficient conditions for linear convergence of `1-regularization.Communications on Pure and Applied Mathematics, 64(2):161–182.

Tang, G., Bhaskar, B. N., and Recht, B. (2013). Near minimax linespectral estimation. CoRR, abs/1303.4348.

Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso.Journal of the Royal Statistical Society. Series B. Methodological,58(1):267–288.

Tropp, J. (2006). Just relax: Convex programming methods for identifyingsparse signals in noise. IEEE Transactions on Information Theory,52(3):1030–1051.


discrete versus continuous sparse regularizationthe deconvolution problem assumption:the signal m 0...

Documents