TRANSCRIPT
A statistical physics approach to compressed sensing
Florent Krzakala (ESPCI, PCT and Gulliver CNRS)
in collaboration with Marc Mézard & François Sausset (LPTMS),
Yifan Sun (ESPCI) and Lenka Zdeborová (IPhT Saclay)
http://aspics.krzakala.org
Who are we? What do we do?
Interface between statistical physics, optimization, information theory and algorithms
http://aspics.krzakala.org
Florent Krzakala (ESPCI, Paris)
Lenka Zdeborová (CNRS, Saclay)
Marc Mézard (CNRS, Orsay)
Sparse signals: what is compressed sensing?
Measurements: from 10^6 wavelet coefficients, keep 25,000.
Why do we record a huge amount of data, and then keep only the important bits? Couldn't we record only the relevant information directly?
Most signals of interest are sparse in an appropriate basis ⇒ exploited for data compression (JPEG2000).
How does compressed sensing work?

An image I of n×n pixels is a vector of size N = n×n:
$$\vec I = (I_1, \ldots, I_N)^T$$

M measurements = M linear operations on this vector, collected in a vector of size M:
$$\vec y = (y_1, \ldots, y_M)^T, \qquad \vec y = G\,\vec I, \qquad G \text{ an } M \times N \text{ matrix}$$

Problem: you know y and G; how do you reconstruct I?

If M = N ☞ easy, just use I = G⁻¹y.
If M < N ☞ an under-constrained system of equations: many solutions are possible.

The idea of compressed sensing is to use the a-priori knowledge that the signal is sparse in some appropriate basis. Let Φ be the N×N matrix of the direct (and Φ⁻¹ of the inverse) wavelet transform, so that the image I corresponds to a sparse vector x of size N = n×n with only R non-zeros. The problem to solve is now
$$\vec y = F\,\vec x \quad \text{with} \quad F = G\,\Phi^{-1}, \qquad F \text{ an } M \times N \text{ matrix}$$

• We need a solver that finds sparse solutions of an under-constrained set of equations
• Ideally it works as long as M > R
• It should be robust to noise
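As a concrete illustration of this setup, here is a minimal numerical sketch, assuming (as in the rest of the talk) a Gauss-Bernoulli sparse signal and an iid Gaussian measurement matrix; the variable names and values are ours:

```python
# Minimal compressed-sensing instance: y = F x with M < N and x sparse.
import numpy as np

rng = np.random.default_rng(0)

N = 1000          # signal dimension (e.g. number of wavelet coefficients)
M = 500           # number of linear measurements, M < N
rho = 0.2         # fraction of non-zero components, R = rho * N

# Gauss-Bernoulli signal: (1 - rho) N zeros and rho N Gaussian entries
x = np.where(rng.random(N) < rho, rng.standard_normal(N), 0.0)

# Random measurement matrix with iid entries of variance 1/N
F = rng.standard_normal((M, N)) / np.sqrt(N)

y = F @ x         # the M measurements: an under-constrained linear system
```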
State of the art in CS
• Incoherent sampling (i.e. a random M×N matrix F) in y = Fx
• Reconstruction by minimizing the L1 norm:
$$\|\vec x\|_{L_1} = \sum_i |x_i|$$
Candès & Tao (2005); Donoho and Tanner (2005)
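For reference, a hedged sketch of this L1 reconstruction (basis pursuit) as a linear program; the split x = u − v and the choice of scipy's HiGHS solver are ours, not part of the talk:

```python
# Basis pursuit: min ||x||_1  subject to  F x = y.
# Writing x = u - v with u, v >= 0 turns it into a standard-form LP.
import numpy as np
from scipy.optimize import linprog

def l1_reconstruct(F, y):
    M, N = F.shape
    c = np.ones(2 * N)              # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([F, -F])       # equality constraint: F u - F v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v                    # the reconstructed x
```

On the instance above, `l1_reconstruct(F, y)` recovers x exactly when (ρ, α) lies on the good side of the Donoho-Tanner line discussed next.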
Example: measuring a picture
One measurement = the scalar product of the signal with a random pattern; many measurements = scalar products with many random patterns, i.e. a random matrix F (or G, acting directly on the image I).
From 10^6 points, only 25,000 are non-zero.
State of the art in CS

For a signal with (1−ρ)N zeros and R = ρN non-zeros, and a random iid matrix with M = αN rows, define
$$\rho = \frac{R}{N}, \qquad \alpha = \frac{M}{N}$$

[Phase diagram in the (ρ, α) plane, showing the lines α_L1(ρ), α_EM-BP(ρ), s-BP data for N = 10³ and 10⁴, and the line α = ρ. It splits into three regions: reconstruction impossible (below α = ρ), reconstruction in exponential time, and reconstruction in polynomial time.]

Reconstruction is limited by the Donoho-Tanner transition for L1-norm minimization.
State of the art in CS

A different representation of the same transition, for the same signal and matrix ensemble, uses the coordinates
$$\delta = \frac{M}{N}, \qquad \rho_{DT} = \frac{R}{M}$$

[Donoho-Tanner phase diagram in the (δ, ρ_DT) plane, split into an "Easy" and a "Hard" region.]
Our work
• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

A statistical physics approach to compressed sensing
A probabilistic approach to compressed sensing

We want to sample from this distribution:
$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^N \left[(1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)\right]\; \prod_{\mu=1}^M \delta\!\left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right)$$
The first factor favors a sparse vector; the second enforces the solutions of the linear system.

Theorem: sampling from P(x|y) gives the correct solution in the large-N limit as long as α > ρ₀, if: a) Φ(x) > 0 ∀x, and b) 1 > ρ > 0.

Sampling from P(x|y) is thus optimal, even if we do not know the correct Φ(x) or the correct ρ.

In practice, we use a Gaussian distribution for Φ(x), with mean m and variance σ², and "learn" the best values for ρ, σ and m.
A sketch of the proof

Consider the system constrained to lie at distance larger than D from the solution, with restricted partition function Y(D).
1) Y(0) is infinite if α > ρ₀ (equivalently, if M > R): just count the delta functions! N − R + M deltas versus N integrals...
2) Y(D) is finite for any D > 0 (bounded by a first-moment method, or "annealed" computation).
If α > ρ₀, the measure is therefore always dominated by the solution.

[Figure: (log Y)/N as a function of the distance D.]
A probabilistic approach to compressed sensing

Probabilistic reconstruction using the posterior P(x|y) above. Sampling from P(x|y) is optimal, even if we do not know the correct Φ(x) or the correct ρ, and statistical physics and information theory tools can be readily used for:
• sampling,
• computing the phase diagram,
• etc.
The link with statistical physics and spin glasses

$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^N P(x_i)\; \prod_{\mu=1}^M \delta\!\left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right) \quad \text{with} \quad P(x_i) = (1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)$$

Relaxing each hard constraint into a Gaussian of variance Δ, this takes the Boltzmann form
$$P(\vec x|\vec y) = \frac{1}{Z}\; e^{\sum_{i=1}^N \log P(x_i)\; -\; \frac{1}{2\Delta}\sum_{\mu=1}^M \left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right)^2}$$
with Z a partition sum, the exponent (minus) a Hamiltonian, disordered interactions F_{μi}, and mean-field, long-range interactions.

In physics, this is called a spin glass problem; it has been studied since the early '80s.
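To make this Boltzmann-measure reading concrete, here is a small sketch (our own) of the log of the unnormalized measure, i.e. minus the Hamiltonian above, assuming a Gaussian slab Φ = N(0,1) and noise variance Δ:

```python
import numpy as np

def minus_hamiltonian(x, F, y, rho, Delta):
    # log prior: the delta peak at 0 is treated as a point mass of weight
    # 1 - rho, the slab as a Gaussian density of weight rho (our assumption)
    slab = rho * np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
    log_prior = np.sum(np.where(x == 0.0, np.log1p(-rho), np.log(slab)))
    r = y - F @ x                      # residual of the linear constraints
    return log_prior - r @ r / (2.0 * Delta)
```

Mixing a point mass with a density in the "log prior" is fine here: only energy differences between configurations matter for sampling.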
Our work
• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

A statistical physics approach to compressed sensing
Statistical physics approach (FK et al.) + rigorous proofs (Donoho, Montanari et al.)
How to sample?

$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^N \left[(1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)\right]\; \prod_{\mu=1}^M \delta\!\left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right)$$

Solution number 1: use Markov-chain Monte Carlo. This has been used in the literature for ultrasound imaging (cf. Quinsac et al., 2011...), but it is very slow!

Solution number 2: estimate the marginal probabilities with a message-passing algorithm. If we do it correctly, then the solution is given by
$$a_i = \int dx_i\, P_i(x_i)\, x_i$$

In this model, this can be done exactly (for large N, M) using an approach known as:
1. Thouless-Anderson-Palmer, or the cavity method, in physics (Bethe-Peierls, Onsager '35; Mézard and Parisi '02)
2. Belief propagation in artificial intelligence (Pearl '82)
3. Sum-product in coding theory (Gallager '60)
4. Approximate Message Passing in compressed sensing (Rangan, Montanari, ...)
How does BP work?

Gibbs free-energy approach:
$$\log Z = \max_{\{\mathcal P(\vec x)\}} f_{\rm Gibbs}\big(\{\mathcal P(\vec x)\}\big), \quad \text{with} \quad f_{\rm Gibbs}\big(\{\mathcal P(\vec x)\}\big) = \big\langle \log P(\vec x|\vec y) \big\rangle_{\mathcal P(\vec x)} - \int d\vec x\; \mathcal P(\vec x) \log \mathcal P(\vec x)$$

Mean field ⇒ $\mathcal P(\vec x) = \prod_i \mathcal P_i(x_i)$: not correct, plus convergence problems.

Belief propagation ⇒ $\mathcal P(\vec x) = \prod_{ij} \mathcal P_{ij}(x_i, x_j) \big/ \prod_i \mathcal P_i(x_i)^{M-1}$: (asymptotically) exact in CS with random matrices.

The BP recursion is given by the steepest-ascent method on this free energy.
How does BP work?

Simplification thanks to the large-connectivity limit: projecting on the first two moments is enough,
$$f\big(\{\mathcal P_i(x_i), \mathcal P_{ij}(x_i,x_j)\}\big) \;\to\; f\big(\{\langle x_i\rangle, \langle x_i^2\rangle\}\big)$$
and the belief-propagation equations follow from ascending this free energy:
$$\langle x_i \rangle^{t+1} = \langle x_i \rangle^{t} + \frac{\partial f}{\partial \langle x_i \rangle}, \qquad \langle x_i^2 \rangle^{t+1} = \langle x_i^2 \rangle^{t} + \frac{\partial f}{\partial \langle x_i^2 \rangle}$$

The Belief-Propagation algorithm

Iterate these variables (simple algebraic operations!):
$$U_i^{(t+1)} = -\frac{\alpha}{M} \sum_\mu \frac{1}{\Delta_\mu + \Theta^{(t)}}$$
$$V_i^{(t+1)} = \sum_\mu \frac{F_{\mu i}\,\big(y_\mu - \omega_\mu^{(t)}\big)}{\Delta_\mu + \Theta^{(t)}} + f_a\big(U_i^{(t)}, V_i^{(t)}\big)\, \frac{\alpha}{M} \sum_\mu \frac{1}{\Delta_\mu + \Theta^{(t)}}$$
$$\omega_\mu^{(t+1)} = \sum_i F_{\mu i}\, f_a\big(U_i^{(t+1)}, V_i^{(t+1)}\big) - \frac{y_\mu - \omega_\mu^{(t)}}{\Delta_\mu + \Theta^{(t)}}\; \frac{1}{N} \sum_i \frac{\partial f_a}{\partial Y}\big(U_i^{(t+1)}, V_i^{(t+1)}\big)$$
$$\Theta^{(t+1)} = \frac{1}{N} \sum_i f_c\big(U_i^{(t+1)}, V_i^{(t+1)}\big)$$

Using these functions:
$$f_a(X,Y) = \frac{\rho\, Y}{(1+X)^{3/2}}\, e^{Y^2/(2(1+X))} \left[ (1-\rho) + \frac{\rho}{(1+X)^{1/2}}\, e^{Y^2/(2(1+X))} \right]^{-1}$$
$$f_c(X,Y) = \frac{\rho}{(1+X)^{3/2}}\, e^{Y^2/(2(1+X))} \left(1 + \frac{Y^2}{1+X}\right) \left[ (1-\rho) + \frac{\rho}{(1+X)^{1/2}}\, e^{Y^2/(2(1+X))} \right]^{-1} - f_a(X,Y)^2$$

And finally, at the end:
$$\langle x_i \rangle = f_a(U_i, V_i)$$

Complexity is O(N² × convergence time).
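Since each sweep is just a few algebraic operations, the iteration is easy to sketch in code. Below is a hedged, textbook-style AMP iteration for the Gauss-Bernoulli prior and an iid matrix of variance 1/N; the variable names (omega, V, S2, R), the uniform-variance simplification, and the scalar denoiser playing the role of (f_a, f_c) are our choices, not the authors' exact implementation:

```python
import numpy as np

def gauss_bernoulli_denoiser(R, S2, rho):
    """Posterior mean/variance of x ~ (1-rho) delta_0 + rho N(0,1),
    observed through R = x + noise of variance S2 (the roles of f_a, f_c)."""
    g0 = np.exp(-R**2 / (2 * S2)) / np.sqrt(2 * np.pi * S2)              # spike evidence
    g1 = np.exp(-R**2 / (2 * (S2 + 1))) / np.sqrt(2 * np.pi * (S2 + 1))  # slab evidence
    pi = rho * g1 / ((1 - rho) * g0 + rho * g1)   # posterior prob. of being non-zero
    mean_slab = R / (1 + S2)
    var_slab = S2 / (1 + S2)
    a = pi * mean_slab                            # posterior mean  (f_a)
    c = pi * (var_slab + mean_slab**2) - a**2     # posterior variance (f_c)
    return a, c

def amp_reconstruct(F, y, rho, Delta=1e-8, n_iter=100):
    M, N = F.shape
    alpha = M / N
    a, c = np.zeros(N), rho * np.ones(N)          # initialize at the prior moments
    omega = y.copy()                              # Onsager-corrected estimate of F a
    V = c.mean()
    for _ in range(n_iter):
        V_new = c.mean()                          # average posterior variance
        omega = F @ a - (y - omega) * V_new / (Delta + V)
        V = V_new
        S2 = (Delta + V) / alpha                  # effective scalar-channel noise
        R = a + (F.T @ (y - omega)) / alpha       # pseudo-observation of each x_i
        a, c = gauss_bernoulli_denoiser(R, S2, rho)
    return a
```

On the instance built earlier, `amp_reconstruct(F, y, rho)` typically reaches near machine-precision error when α lies above the BP spinodal, at O(N²) cost per iteration.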
The Belief-Propagation algorithm: how to learn the parameters in the prior?

Take a Gaussian slab with three parameters ρ, x̄, σ:
$$\Phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-(x-\bar x)^2/(2\sigma^2)}$$

Compute the Bethe free entropy F = log Z using the BP messages, compute the gradient
$$\left(\frac{\partial F}{\partial \rho}, \frac{\partial F}{\partial \bar x}, \frac{\partial F}{\partial \sigma}\right)$$
and update the parameters to maximize F at each step of the BP iteration.

Learning makes the algorithm faster (it is equivalent to expectation-maximization).
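A hedged sketch of this learning step in its expectation-maximization reading: re-estimate the prior parameters from the current marginals after each sweep. For the sparsity ρ, the EM fixed point is the average posterior probability of being non-zero; identifying this with the maximizer of the Bethe free entropy F in ρ is the slide's claim, while the code below is our illustration:

```python
import numpy as np

def em_update_rho(R, S2, rho):
    # same spike/slab evidences as in the AMP denoiser sketched above
    g0 = np.exp(-R**2 / (2 * S2)) / np.sqrt(2 * np.pi * S2)
    g1 = np.exp(-R**2 / (2 * (S2 + 1))) / np.sqrt(2 * np.pi * (S2 + 1))
    pi = rho * g1 / ((1 - rho) * g0 + rho * g1)   # P(x_i != 0 | data)
    return pi.mean()                              # updated estimate of rho
```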
Analysis of the algorithm

The performance of the algorithm for a given distribution of signals can be analyzed using a method known as density evolution (coding theory) or the replica method (physics). Define
$$Z(\vec y) = \int \prod_{i=1}^N dx_i\; P(\vec x|\vec y), \qquad F(\vec y) = -\log Z(\vec y)$$

Averaging over a signal distribution (e.g. Gauss-Bernoulli), with F_{μi} iid Gaussian of variance 1/N:
$$y_\mu = \sum_{i=1}^N F_{\mu i}\, x_i^0, \qquad \text{where the } x_i^0 \text{ are iid from } (1-\rho_0)\,\delta(x_i^0) + \rho_0\,\Phi_0(x_i^0)$$

Replica method:
$$\overline{\log Z} = \lim_{n\to 0} \frac{\overline{Z^n} - 1}{n}$$
The replica computation yields the free-entropy potential
$$\Phi(Q,q,m,\hat Q,\hat q,\hat m) = -\frac{1}{2N}\sum_\mu \frac{q - 2m + \rho_0 + \Delta_\mu}{\Delta_\mu + Q - q} - \frac{1}{2N}\sum_\mu \log(\Delta_\mu + Q - q) + \frac{Q\hat Q}{2} - m\hat m + \frac{q\hat q}{2}$$
$$+ \int Dz \int dx^0\, \left[(1-\rho_0)\,\delta(x^0) + \rho_0\,\Phi_0(x^0)\right] \log\left\{ \int dx\; e^{-\frac{\hat Q + \hat q}{2} x^2 + \hat m x x^0 + z\sqrt{\hat q}\, x}\, \left[(1-\rho)\,\delta(x) + \rho\,\Phi(x)\right] \right\}$$

Order parameters:
$$Q = \frac{1}{N}\sum_i \langle x_i^2\rangle, \qquad q = \frac{1}{N}\sum_i \langle x_i\rangle^2, \qquad m = \frac{1}{N}\sum_i x_i^0\, \langle x_i\rangle$$

Mean square error:
$$E = \frac{1}{N}\sum_i \big(\langle x_i\rangle - x_i^0\big)^2 = q - 2m + \langle (x_i^0)^2\rangle_0$$
Computing the free entropy

Example with ρ₀ = 0.4 and Φ₀ a Gaussian distribution with zero mean and unit variance. Plot the free entropy Φ(E) = log Z(E) as a function of the mean square error E = (1/N)Σ_i (⟨x_i⟩ − x_i⁰)².

[Figure: Φ(D) versus the distance D for α = 0.62, 0.6, 0.58, 0.56.]

• The maximum is at E = 0 (as long as α > ρ₀): the equilibrium behavior is dominated by the original signal.
• For α < 0.58, a secondary maximum appears (a meta-stable state): the spinodal point.
• A steepest-ascent dynamics starting from large E reaches the signal for α > 0.58, but stays blocked in the meta-stable state for α < 0.58, even though the true equilibrium is at E = 0.
• Similarity with supercooled liquids.
Computing the phase diagram

[Figures: (i) the thermodynamic potential Φ(D) versus the distance D to the native state, for α = 0.62, 0.6, 0.58, 0.56; (ii) the mean square error reached by L1 and by BEP as a function of α; (iii) tanh[4Φ(E)] versus the mean square error for α = 0.8, 0.6, 0.5, 0.3; (iv) the phase diagram α(ρ) with the lines α_L1(ρ), α_BP(ρ) and α = ρ; (v) the BP convergence time (number of iterations) versus α for BP, seeded BEP with L = 10 and L = 40, and L1, diverging at the spinodal transition (supercooled limit).]

A steepest ascent of the free entropy allows perfect reconstruction down to the spinodal line. This is more efficient than L1 minimization.
Trying different types of signals

The BP limit depends on the type of signal, while the Donoho-Tanner transition is universal.

[Figures: phase diagrams α(ρ) for a Gauss-Bernoulli signal and for binary signals, showing the lines α_L1(ρ), α_EM-BP(ρ), s-BP data for N = 10³ and 10⁴, and the line α = ρ; the same data re-plotted in the Donoho-Tanner coordinates (δ_DT, ρ_DT), where the ℓ1 curve is universal but the EM-BP and s-BP curves are not, reaching ρ_DT = 1.]
BP is robust to noise

[Figures: phase diagrams in the coordinates ρ/α = K/M versus α = M/N for measurement noise Δ = 10⁻⁵ and Δ = 10⁻², with contour lines of the achieved mean square error (from 0.2 down to 10⁻¹²), the DT line and the spinodal line.]
A more complex signal

The Shepp-Logan phantom, in the Haar-wavelet representation (ρ ≃ 0.15): reconstructions by L1, BEP and s-BEP at α = 0.5, 0.4, 0.3, 0.2, 0.1.
BP + probabilistic approach
• Efficient and fast
• Robust to noise
• Very flexible (more information can be put in the prior)

$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^N \left[(1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)\right]\; \prod_{\mu=1}^M \delta\!\left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right)$$
Our work
• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

A statistical physics approach to compressed sensing
This is good, but not good enough

[Figures: the same mean square error, tanh[4Φ(E)], phase diagram and convergence-time plots as above, together with the free entropy Φ(D) for α = 0.62, 0.6, 0.58, 0.56.]

The dynamics is stuck in a metastable state, just as a liquid cooled too fast remains in a supercooled liquid state instead of crystallizing.

How to pass the spinodal point? By nucleation!
Special design of "seeded" matrices

$$\vec y = F\, \vec s$$

The matrix F is made of L × L blocks (here L = 8), with N_i = N/L columns per block and M_i = α_i N/L rows per block row:
• unit coupling on the diagonal blocks,
• coupling J₁ on the blocks just below the diagonal,
• coupling J₂ on the blocks just above the diagonal,
• no coupling (null elements) everywhere else.

The measurement rates are chosen as
$$\alpha_1 > \alpha_{BP}, \qquad \alpha_j = \alpha_0 < \alpha_{BP} \;\;\text{for } j \ge 2, \qquad \alpha = \frac{1}{L}\big(\alpha_1 + (L-1)\,\alpha_0\big)$$
so M₁ = α₁N/L and M_i = α₀N/L for i ∈ {2, ..., L}.

Block 1 has a large value of M, so that the solution arises first in this block... and then propagates through the whole system!
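A sketch of such a seeded measurement matrix, following the block structure drawn above: unit-variance diagonal blocks, J₁ on the first sub-diagonal, J₂ on the first super-diagonal, zeros elsewhere, with a denser first block row. Reading J₁ and J₂ as variances of the block entries (with an overall 1/N scaling) is our assumption:

```python
import numpy as np

def seeded_matrix(N, L, alpha1, alpha0, J1, J2, seed=0):
    rng = np.random.default_rng(seed)
    Ni = N // L                                                 # N_i = N/L columns per block
    Ms = [round(alpha1 * Ni)] + [round(alpha0 * Ni)] * (L - 1)  # M_i = alpha_i N/L rows
    rows = []
    for p, Mp in enumerate(Ms):
        row = []
        for q in range(L):
            if q == p:
                var = 1.0        # unit coupling on the diagonal
            elif q == p - 1:
                var = J1         # coupling J1 below the diagonal
            elif q == p + 1:
                var = J2         # coupling J2 above the diagonal
            else:
                var = 0.0        # no coupling (null elements)
            row.append(np.sqrt(var / N) * rng.standard_normal((Mp, Ni)))
        rows.append(np.hstack(row))
    return np.vstack(rows)

# The next slide's example would read (alpha0 chosen so that alpha = 0.5):
# F = seeded_matrix(N=50000, L=20, alpha1=1.0, alpha0=9/19, J1=20, J2=0.2)
```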
Example with ρ₀ = 0.4 and Φ₀ a Gaussian distribution with zero mean and unit variance: a signal with α = 0.5 and ρ = 0.4.

[Figure: mean square error per block (block index 1 to 20) at times t = 1, 4, 10, 20, 50, 100, 150, 200, 250, 300, showing the reconstruction front propagating through the blocks. Parameters: L = 20, ρ = 0.4, N = 50000, J₁ = 20, J₂ = 0.2, α₁ = 1, α = 0.5.]
[Figure: N = 10000 points, ρ = 0.4, α = 0.5. Faulty reconstruction with L1, perfect reconstruction with BP. Blue is the true signal reconstructed by s-BP; red is the signal found by L1.]
Phase diagrams

[Figures: mean square error and number of iterations versus α for L1, BP and s-BP (L = 10 and L = 40); tanh[4Φ(E)] versus the mean square error for α = 0.8, 0.6, 0.5, 0.3; phase diagrams α(ρ) for Gauss-Bernoulli and binary signals, with the lines α_L1(ρ) (N = 10³ and 10⁴), α_EM-BP(ρ), the s-BP points, and α = ρ. Seeded BP reconstructs down to α = ρ.]
A more interesting example

The Shepp-Logan phantom in the Haar-wavelet representation (ρ ≃ 0.15): reconstructions by L1, BEP and s-BEP at α = 0.5, 0.4, 0.3, 0.2, 0.1.
An even more interesting example

The Lena picture in the Haar-wavelet representation (ρ ≃ 0.24): reconstructions by L1, BEP and s-BEP at α = 0.6, 0.5, 0.4, 0.3, 0.2.
Analytical results for seeding matrices
• One can repeat the replica analysis for the seeded matrices, and the performance of the algorithm can be studied analytically, leading to reconstruction for any α > ρ in the large-N limit: asymptotically optimal measurements.
• These results have recently been confirmed by a rigorous analysis by Donoho, Javanmard and Montanari (arXiv:1112.0708).
• There is a lot of freedom in the design of the seeded matrices.
Conclusions...
• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

... and perspectives:
• More information in the prior?
• Other matrices with asymptotically optimal measurements?
• Calibration noise, additive noise, quasi-sparsity, etc.?
• Applications?
Thank you for your attention!
BONUS
Noise sensitivity

[Figure: MSE versus α for compressed sensing of noisy (Δ_n = 10⁻⁴) Gauss-Bernoulli (ρ₀ = 0.2) signals, comparing theory and BP for L = 4 and L = 1 (N = 5000) with L1 minimization (N = 5000), shown on logarithmic and linear MSE scales.]
Noise sensitivity

[Figure: MSE versus α for compressed sensing of noisy Gauss-Bernoulli (ρ₀ = 0.4) signals: s-BP and theory for noise σ = 10⁻³, 10⁻⁴, 10⁻⁵, together with L = 1 at σ = 10⁻⁵.]
Flow in the (Q, q) space, with E = q − 2m + ⟨(x_i⁰)²⟩₀

[Figures: flow of the order parameters in the (R, V) plane: Gaussian signal with Gaussian inference at ρ = 0.2 (no spinodal) and ρ = 0.33 (with spinodal); binary signal with Gaussian inference at ρ = 0.15 (no spinodal) and ρ = 0.25 (with spinodal).]