
A statistical physics approach to compressed sensing

Florent Krzakala (ESPCI, PCT and Gulliver CNRS)

in collaboration with Marc Mézard & François Sausset (LPTMS),
Yifan Sun (ESPCI) and Lenka Zdeborová (IPhT Saclay)

http://aspics.krzakala.org

Who are we? What do we do?

Interface between statistical physics, optimization, information theory and algorithms

http://aspics.krzakala.org

Florent Krzakala (ESPCI, Paris)

Lenka Zdeborová (CNRS, Saclay)

Marc Mézard (CNRS, Orsay)


Sparse signals: what is compressed sensing?

Measurements

From 10⁶ wavelet coefficients, keep 25,000.

Why do we record a huge amount of data, and then keep only the important bits?

Couldn't we record only the relevant information directly?

Most signals of interest are sparse in an appropriate basis ⇒ exploited for data compression (JPEG2000).
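To illustrate the sparsity idea, here is a small sketch that is not part of the original slides: a single-level 2D Haar transform of a synthetic smooth image, keeping only 2.5% of the coefficients before inverting. A real codec such as JPEG2000 uses deeper, multi-level wavelet transforms, so this is only a toy under that simplifying assumption; all names and the test image are illustrative.

```python
import numpy as np

def haar2d(img):
    """One level of the orthonormal 2D Haar transform (image sides must be even)."""
    a = (img[0::2, :] + img[1::2, :]) / np.sqrt(2)     # row-pair averages
    d = (img[0::2, :] - img[1::2, :]) / np.sqrt(2)     # row-pair details
    rows = np.vstack([a, d])
    a2 = (rows[:, 0::2] + rows[:, 1::2]) / np.sqrt(2)  # column-pair averages
    d2 = (rows[:, 0::2] - rows[:, 1::2]) / np.sqrt(2)  # column-pair details
    return np.hstack([a2, d2])

def ihaar2d(coef):
    """Inverse of haar2d."""
    half = coef.shape[1] // 2
    a2, d2 = coef[:, :half], coef[:, half:]
    rows = np.empty_like(coef)
    rows[:, 0::2] = (a2 + d2) / np.sqrt(2)
    rows[:, 1::2] = (a2 - d2) / np.sqrt(2)
    m = coef.shape[0] // 2
    a, d = rows[:m, :], rows[m:, :]
    img = np.empty_like(coef)
    img[0::2, :] = (a + d) / np.sqrt(2)
    img[1::2, :] = (a - d) / np.sqrt(2)
    return img

# A smooth synthetic test image (stand-in for a photograph)
n = 256
yy, xx = np.mgrid[0:n, 0:n] / n
img = np.sin(6 * np.pi * xx) * np.cos(4 * np.pi * yy) + (xx - 0.5) ** 2

coef = haar2d(img)
k = int(0.025 * coef.size)                      # keep 2.5% of the coefficients, as on the slide
thresh = np.sort(np.abs(coef).ravel())[-k]
coef_sparse = np.where(np.abs(coef) >= thresh, coef, 0.0)
img_rec = ihaar2d(coef_sparse)
rel_err = np.linalg.norm(img_rec - img) / np.linalg.norm(img)  # what is lost by keeping only k coefficients
```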

How does compressed sensing work?

Image I: n×n pixels, i.e. a vector of size N = n×n,
$\vec I = (I_1, \dots, I_N)^T$.

M measurements = M linear operations on the vector:
$\vec y = (y_1, \dots, y_M)^T$, a vector of size M, with $\vec y = G \vec I$ and G an M×N matrix.

Problem: you know y and G, how to reconstruct I?

If M = N ☞ easy, just use I = G⁻¹y.

If M < N ☞ under-constrained system of equations: many solutions are possible.

The idea of compressed sensing is to use the a priori knowledge that the signal is sparse in some appropriate basis.

Going to that basis, with the direct and inverse wavelet transforms (N×N matrices), turns the image into a sparse vector of size N = n×n, $\vec x = (x_1, \dots, x_N)^T$.

The problem to solve is now $\vec y = F \vec x$, where F (G combined with the inverse wavelet transform) is an M×N matrix and $\vec x$ has R non-zero components:

• Needs a solver that finds sparse solutions of an under-constrained set of equations
• Ideally works as long as M > R
• Robust to noise
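To make the setup concrete, here is a minimal sketch, not from the talk, that builds a Gauss-Bernoulli sparse signal and a random i.i.d. measurement matrix and then produces the measurements y = Fx; all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 10000          # signal size
alpha = 0.5        # measurement rate M/N
rho = 0.4          # fraction of non-zero components
M = int(alpha * N)

# Gauss-Bernoulli signal: each component is 0 with probability 1-rho,
# otherwise drawn from a standard Gaussian
support = rng.random(N) < rho
x0 = np.where(support, rng.standard_normal(N), 0.0)

# Random i.i.d. measurement matrix with entries of variance 1/N
F = rng.standard_normal((M, N)) / np.sqrt(N)

# The M linear measurements
y = F @ x0
```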

State of the art in CS

• Incoherent samplings (i.e. a random matrix F)

• Reconstruction by minimizing the L1 norm

$\vec y = F\vec x$, with F an M×N random matrix and $\vec x$ having R non-zero components.

$||\vec x||_{L_1} = \sum_i |x_i|$

Candès & Tao (2005); Donoho & Tanner (2005)
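The talk does not spell out an L1 solver. As a point of comparison, here is a minimal iterative soft-thresholding (ISTA) sketch for the LASSO relaxation of the L1-minimization problem; the step size and the regularization strength lam are illustrative choices of mine, not values from the talk.

```python
import numpy as np

def soft_threshold(v, t):
    """Component-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_l1(F, y, lam=1e-3, n_iter=2000):
    """Minimize 0.5*||y - F x||^2 + lam*||x||_1 by iterative soft-thresholding."""
    # Step size 1/L with L an upper bound on the largest eigenvalue of F^T F
    L = np.linalg.norm(F, 2) ** 2
    x = np.zeros(F.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + (F.T @ (y - F @ x)) / L, lam / L)
    return x
```

Taking lam small pushes the LASSO solution towards the basis-pursuit (pure L1) solution discussed on the slides.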

Example: measuring a picture

One measurement = scalar product of the signal with a random pattern.

Many measurements = scalar products with many random patterns.

[Figure: the signal I, the random matrix G, and the measurements y = GI = Fx.]

From 10⁶ points, only 25,000 are non-zero.

State of the art in CS

For a signal with (1−ρ)N zeros and R = ρN non-zero components (ρ = R/N), and a random i.i.d. matrix with M = αN measurements (α = M/N):

[Figure: phase diagram in the (ρ, α) plane with the curves α_L1(ρ), α_EM-BP(ρ), s-BP data for N = 10³ and 10⁴, and the line α = ρ; the regions are labelled "reconstruction impossible", "reconstruction in exponential time", and "reconstruction in polynomial time".]

Reconstruction by L1-norm minimization is limited by the Donoho-Tanner transition.

A different representation of the same transition uses δ_DT = M/N and ρ_DT = R/M; the Donoho-Tanner line then separates an "easy" region from a "hard" one.

Our work

• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

A probabilistic approach to compressed sensing

We want to sample from this distribution:

$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^{N} \left[(1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)\right] \prod_{\mu=1}^{M} \delta\!\left(y_\mu - \sum_{i=1}^{N} F_{\mu i} x_i\right)$$

The first product enforces a sparse vector; the second product restricts the measure to the solutions of the linear system.

Theorem: sampling from P(x|y) gives the correct solution in the large-N limit as long as α > ρ₀, provided a) Φ(x) > 0 ∀x and b) 1 > ρ > 0.

Sampling from P(x|y) is optimal, even if we do not know the correct Φ(x) or the correct ρ.

In practice, we use a Gaussian distribution for Φ(x), with mean m and variance σ², and "learn" the best values of ρ, σ and m.

A sketch of the proof

Consider the system constrained to be at distance larger than D from the solution, and let Y(D) be the corresponding restricted weight.

1) Y(0) is infinite if α > ρ₀ (equivalently, if M > R): just count the delta functions! N − R + M deltas versus N integrals...

2) Y(D) is finite for any D > 0 (bounded by a first-moment method, or "annealed" computation).

If α > ρ₀, the measure is therefore always dominated by the solution.

[Figure: sketch of (log Y)/N as a function of D.]

A probabilistic approach to compressed sensing

Statistical physics and information theory tools can readily be used for:
• sampling
• computing the phase diagram
• etc.

Sampling from P(x|y) is optimal, even if we do not know the correct Φ(x) or the correct ρ.

Probabilistic reconstruction uses the posterior P(x⃗|y⃗) written above.

The link with statistical physics and spin glasses

$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^{N} P(x_i) \prod_{\mu=1}^{M} \delta\!\left(y_\mu - \sum_{i=1}^{N} F_{\mu i} x_i\right), \qquad \text{with } P(x_i) = (1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)$$

Relaxing the hard constraints into Gaussian ones of variance Δ (i.e. allowing a small measurement noise), this becomes a Boltzmann measure:

$$P(\vec x|\vec y) = \frac{1}{Z}\; e^{\sum_{i=1}^{N} \log P(x_i) \;-\; \frac{1}{2\Delta} \sum_{\mu=1}^{M} \left(y_\mu - \sum_{i=1}^{N} F_{\mu i} x_i\right)^2}$$

Here Z is the partition sum, the exponent plays the role of (minus) a Hamiltonian, the matrix F is a disordered interaction, and every variable is coupled to every measurement: mean-field, long-range interactions.

In physics, this is called a spin-glass problem; it has been studied since the early 80's.

Our work

• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

Statistical physics approach (FK et al.) + rigorous analysis (Donoho, Montanari et al.)

How to sample?

$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^{N} \left[(1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)\right] \prod_{\mu=1}^{M} \delta\!\left(y_\mu - \sum_{i=1}^{N} F_{\mu i} x_i\right)$$

Solution number 1: Markov-chain Monte Carlo. Used in the literature for ultrasound imaging (cf. Quinsac et al., 2011), but very slow.

Solution number 2: estimate the marginal probabilities with a message-passing algorithm. If we do it correctly, the solution is given by

$$a_i = \int dx_i\, P_i(x_i)\, x_i$$

In this model, this can be done exactly (for large N, M) using an approach known as:

1. Thouless-Anderson-Palmer equations, or the cavity method, in physics (Bethe-Peierls, Onsager '35; Parisi and Mézard '02)
2. Belief propagation in artificial intelligence (Pearl '82)
3. Sum-product in coding theory (Gallager '60)
4. Approximate Message Passing in compressed sensing (Rangan, Montanari, ...)

How does BP work? The Gibbs free-energy approach:

$$\log Z = \max_{\{\mathcal{P}(\vec x)\}} f_{\rm Gibbs}\big(\{\mathcal{P}(\vec x)\}\big), \qquad f_{\rm Gibbs}\big(\{\mathcal{P}(\vec x)\}\big) = \big\langle \log P(\vec x|\vec y)\big\rangle_{\mathcal{P}(\vec x)} - \int d\vec x\; \mathcal{P}(\vec x)\, \log \mathcal{P}(\vec x)$$

Mean field ⇒ $\mathcal{P}(\vec x) = \prod_i \mathcal{P}_i(x_i)$: not correct, plus convergence problems.

Belief propagation ⇒ $\mathcal{P}(\vec x) = \dfrac{\prod_{ij} \mathcal{P}_{ij}(x_i, x_j)}{\prod_i \mathcal{P}_i(x_i)^{M-1}}$: (asymptotically) exact in CS with random matrices.

The BP recursion is given by the steepest-ascent method on $f_{\rm Gibbs}$.

Simplification thanks to the large-connectivity limit: a projection on the first two moments is enough,

$$\langle x_i\rangle^{t+1} = \langle x_i\rangle^{t} + \frac{\partial f}{\partial \langle x_i\rangle}\,, \qquad \langle x_i^2\rangle^{t+1} = \langle x_i^2\rangle^{t} + \frac{\partial f}{\partial \langle x_i^2\rangle}\,,$$

and the functional $f\big(\{\mathcal{P}_i(x_i), \mathcal{P}_{ij}(x_i,x_j)\}\big)$ reduces to a function $f\big(\{\langle x_i\rangle, \langle x_i^2\rangle\}\big)$: these are the belief-propagation equations.

The Belief-Propagation algorithm

Iterate these variables (simple algebraic operations):

$$U_i^{(t+1)} = \sum_{\mu=1}^{M} \frac{F_{\mu i}^2}{\Delta_\mu + \Theta^{(t)}}$$

$$V_i^{(t+1)} = \sum_{\mu} \frac{F_{\mu i}\,\big(y_\mu - \omega_\mu^{(t)}\big)}{\Delta_\mu + \Theta^{(t)}} \;+\; f_a\big(U_i^{(t)}, V_i^{(t)}\big)\, \sum_{\mu=1}^{M} \frac{F_{\mu i}^2}{\Delta_\mu + \Theta^{(t)}}$$

$$\omega_\mu^{(t+1)} = \sum_{i} F_{\mu i}\, f_a\big(U_i^{(t+1)}, V_i^{(t+1)}\big) \;-\; \frac{y_\mu - \omega_\mu^{(t)}}{\Delta_\mu + \Theta^{(t)}}\; \frac{1}{N}\sum_i \frac{\partial f_a}{\partial Y}\big(U_i^{(t+1)}, V_i^{(t+1)}\big)$$

$$\Theta^{(t+1)} = \frac{1}{N}\sum_i f_c\big(U_i^{(t+1)}, V_i^{(t+1)}\big)$$

Using these functions:

$$f_a(X, Y) = \frac{\rho\, Y\, (1+X)^{-3/2}\, e^{Y^2/(2(1+X))}}{(1-\rho) + \rho\, (1+X)^{-1/2}\, e^{Y^2/(2(1+X))}}$$

$$f_c(X, Y) = \frac{\rho\, (1+X)^{-3/2}\, e^{Y^2/(2(1+X))}\left(1 + \dfrac{Y^2}{1+X}\right)}{(1-\rho) + \rho\, (1+X)^{-1/2}\, e^{Y^2/(2(1+X))}} \;-\; f_a(X, Y)^2$$

And finally, at the end: $\langle x_i\rangle = f_a(U_i, V_i)$.

Complexity is O(N² × convergence time).
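As an illustration, and not code from the talk, here is a minimal NumPy sketch of a message-passing reconstruction of this type for the Gauss-Bernoulli prior with zero mean and unit variance. It uses the equivalent (R, Σ²) parametrization of the single-site problem rather than the (U_i, V_i) fields above, assumes a homogeneous noise level delta, and all function and variable names are illustrative.

```python
import numpy as np

def gauss_bernoulli_denoiser(R, S2, rho):
    """Posterior mean and variance of x under the prior (1-rho)*delta(x) + rho*N(0,1),
    given a pseudo-observation R = x + Gaussian noise of variance S2."""
    like0 = (1 - rho) * np.exp(-R**2 / (2 * S2)) / np.sqrt(2 * np.pi * S2)
    like1 = rho * np.exp(-R**2 / (2 * (S2 + 1))) / np.sqrt(2 * np.pi * (S2 + 1))
    w = like1 / (like0 + like1)            # posterior weight of the non-zero component
    mean1 = R / (1 + S2)                   # posterior mean of the non-zero component
    var1 = S2 / (1 + S2)                   # posterior variance of the non-zero component
    f_a = w * mean1
    f_c = w * (var1 + mean1**2) - f_a**2
    return f_a, f_c

def amp_reconstruct(F, y, rho, delta=1e-8, n_iter=200):
    """Minimal AMP-style iteration for y = F x, with F i.i.d. Gaussian of variance 1/N."""
    M, N = F.shape
    alpha = M / N
    a = np.zeros(N)            # running estimates of the posterior means <x_i>
    c = np.full(N, rho)        # running estimates of the posterior variances
    omega = y.copy()           # Onsager-corrected estimate of F x
    V = 1.0
    for _ in range(n_iter):
        V_new = c.mean()
        omega = F @ a - (y - omega) * V_new / (delta + V)
        V = V_new
        S2 = (delta + V) / alpha                 # effective noise on each component
        R = a + (F.T @ (y - omega)) / alpha      # pseudo-observation for each component
        a, c = gauss_bernoulli_denoiser(R, S2, rho)
    return a
```

One can check the reconstruction on an instance like the one generated earlier by comparing amp_reconstruct(F, y, rho) with the true x0 via the mean square error.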

The Belief-Propagation algorithm: how to learn the parameters in the prior?

$$\mathcal{F} = \log Z, \qquad \Phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-(x-\bar x)^2/(2\sigma^2)}$$

Three parameters: ρ, x̄, σ.

Compute the Bethe free entropy $\mathcal{F}$ using the BP messages, compute the gradient $\left(\frac{\partial \mathcal{F}}{\partial \rho}, \frac{\partial \mathcal{F}}{\partial \bar x}, \frac{\partial \mathcal{F}}{\partial \sigma}\right)$, and update the parameters to maximize $\mathcal{F}$ at each step of the BP iteration.

Learning makes the algorithm faster (equivalent to Expectation-Maximization).
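For concreteness, here is a hedged sketch of mine, not the talk's code, of an Expectation-Maximization-style update for the sparsity ρ alone, under the assumption that the denoiser exposes the posterior probability that each component is non-zero; updating x̄ and σ would follow the same pattern with the corresponding posterior moments.

```python
import numpy as np

def posterior_nonzero_weight(R, S2, rho):
    """Posterior probability that x_i != 0 under the Gauss-Bernoulli prior,
    given the pseudo-observation R = x + Gaussian noise of variance S2."""
    like0 = (1 - rho) * np.exp(-R**2 / (2 * S2)) / np.sqrt(2 * np.pi * S2)
    like1 = rho * np.exp(-R**2 / (2 * (S2 + 1))) / np.sqrt(2 * np.pi * (S2 + 1))
    return like1 / (like0 + like1)

def em_update_rho(R, S2, rho):
    """One EM step for rho: the M-step sets rho to the average responsibility."""
    w = posterior_nonzero_weight(R, S2, rho)
    return w.mean()
```

This responsibility-averaging step is the EM counterpart of the gradient update on the Bethe free entropy described above.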

Analysis of the algorithm

The performance of the algorithm for a given distribution of signals can be analyzed using a method known as density evolution (coding theory) or the replica method (physics):

$$Z(\vec y) = \int \prod_{i=1}^{N} dx_i\; P(\vec x|\vec y), \qquad F(\vec y) = -\log Z(\vec y)$$

Averaging over a signal distribution (e.g. Gauss-Bernoulli), with F_{μi} i.i.d. Gaussian of variance 1/N and

$$y_\mu = \sum_{i=1}^{N} F_{\mu i}\, x_i^0, \qquad x_i^0 \ \text{i.i.d. from} \ (1-\rho_0)\,\delta(x_i^0) + \rho_0\,\Phi_0(x_i^0)$$

Replica method: $\log Z = \lim_{n\to 0} \dfrac{Z^n - 1}{n}$.

Analysis of the algorithm

The replica free entropy, as a function of the order parameters and their conjugates:

$$\Phi(Q,q,m,\hat Q,\hat q,\hat m) = -\frac{1}{2N}\sum_\mu \frac{q - 2m + \rho_0 + \Delta_\mu}{\Delta_\mu + Q - q} \;-\; \frac{1}{2N}\sum_\mu \log\big(\Delta_\mu + Q - q\big) \;+\; \frac{Q\hat Q}{2} - m\hat m + \frac{q\hat q}{2}$$
$$+ \int \mathcal{D}z \int dx_0\, \big[(1-\rho_0)\delta(x_0) + \rho_0\Phi_0(x_0)\big]\, \log\left\{\int dx\; e^{-\frac{\hat Q + \hat q}{2}x^2 + \hat m\, x\, x_0 + z\sqrt{\hat q}\,x}\; \big[(1-\rho)\delta(x) + \rho\Phi(x)\big]\right\}$$

Order parameters:

$$Q = \frac{1}{N}\sum_i \langle x_i^2\rangle, \qquad q = \frac{1}{N}\sum_i \langle x_i\rangle^2, \qquad m = \frac{1}{N}\sum_i x_i^0\, \langle x_i\rangle$$

Mean square error:

$$E = \frac{1}{N}\sum_i \big(\langle x_i\rangle - x_i^0\big)^2 = q - 2m + \langle (x_i^0)^2\rangle_0$$
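As a toy illustration of density evolution, and not code from the talk, the sketch below iterates the scalar state-evolution map for the mean square error E under the same assumptions as the AMP sketch above (noiseless measurements up to a small regularizer, Gauss-Bernoulli signal with the true ρ₀ used as prior). A Monte Carlo average over the signal and the effective Gaussian noise replaces the analytical integrals, and all names are illustrative.

```python
import numpy as np

def denoiser_mean(R, S2, rho):
    """Posterior mean for the Gauss-Bernoulli prior given R = x + N(0, S2)."""
    like0 = (1 - rho) * np.exp(-R**2 / (2 * S2)) / np.sqrt(2 * np.pi * S2)
    like1 = rho * np.exp(-R**2 / (2 * (S2 + 1))) / np.sqrt(2 * np.pi * (S2 + 1))
    return like1 / (like0 + like1) * R / (1 + S2)

def state_evolution(alpha, rho, delta=1e-8, n_iter=50, n_samples=200000, seed=0):
    """Iterate E_{t+1} = E[(f_a(S2, x0 + sqrt(S2) z) - x0)^2] with S2 = (delta + E_t)/alpha."""
    rng = np.random.default_rng(seed)
    # Samples of the true signal and of the effective Gaussian noise
    x0 = np.where(rng.random(n_samples) < rho, rng.standard_normal(n_samples), 0.0)
    z = rng.standard_normal(n_samples)
    E = rho  # initial MSE when the estimate is zero: E[(x0)^2] = rho
    for _ in range(n_iter):
        S2 = (delta + E) / alpha
        E = np.mean((denoiser_mean(x0 + np.sqrt(S2) * z, S2, rho) - x0) ** 2)
    return E

# Example: compare alpha below and above the spinodal value discussed next for rho = 0.4
# print(state_evolution(alpha=0.56, rho=0.4), state_evolution(alpha=0.62, rho=0.4))
```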


Computing the free entropy

Example with ρ₀ = 0.4 and Φ₀ a Gaussian distribution with zero mean and unit variance.

[Figure: free entropy Φ(D) versus the mean square error D, for α = 0.56, 0.58, 0.60, 0.62, with Φ(E) = log Z(E) and E = (1/N)Σ_i(⟨x_i⟩ − x_i⁰)².]

• The maximum is at E = 0 (as long as α > ρ₀): the equilibrium behaviour is dominated by the original signal.
• For α < 0.58, a secondary maximum appears (a metastable state): the spinodal point.
• A steepest-ascent dynamics starting from large E would reach the signal for α > 0.58, but would stay blocked in the metastable state for α < 0.58, even though the true equilibrium is at E = 0.
• Similarity with supercooled liquids.

Computing the phase diagram

[Figure: thermodynamic potential Φ(D) versus the distance D to the native state for α = 0.56, 0.58, 0.60, 0.62; mean square error versus α for L1 and BEP; tanh[4Φ(E)] versus the mean square error for α = 0.3, 0.5, 0.6, 0.8; phase diagram in the (ρ, α) plane with α_L1(ρ), α_rBP(ρ), s-BEP data and the line α = ρ; BP convergence time (number of iterations versus α) for rBP, seeded BEP with L = 10 and L = 40, and L1; the spinodal line (supercooled limit).]

A steepest ascent of the free entropy allows perfect reconstruction up to the spinodal line.

This is more efficient than L1 minimization.

Trying different types of signals

The BP limit depends on the type of signal (while the Donoho-Tanner transition is universal).

[Figure: phase diagrams for a Gauss-Bernoulli signal and for binary signals, shown both in the (ρ, α) plane (curves α_L1(ρ), α_EM-BP(ρ), s-BP data for N = 10³ and 10⁴, line α = ρ) and in the Donoho-Tanner variables (δ_DT, ρ_DT) (curves ℓ1, EM-BP, s-BP data, line ρ_DT = 1); in each case the Donoho-Tanner and BP lines are indicated.]

BP is robust to noise

[Figure: phase diagrams in the plane (α = M/N, ρ/α = K/M) for measurement noise Δ = 10⁻⁵ and Δ = 10⁻², showing contour lines of the reconstruction MSE, the Donoho-Tanner line and the spinodal line.]

A more complex signal

The Shepp-Logan phantom, in the Haar-wavelet representation.

[Figure: reconstructions by L1, BEP and S-BEP at measurement rates α = 0.5, 0.4, 0.3, 0.2, 0.1, for a signal with ρ ≈ 0.15 (so that α = ρ ≈ 0.15).]

BP + probabilistic approach

• Efficient and fast
• Robust to noise
• Very flexible (more information can be put in the prior)

Our work

• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

This is good, but not good enough

[Figure: mean square error versus α for L1 and BEP; tanh[4Φ(E)] versus the mean square error for α = 0.3, 0.5, 0.6, 0.8; phase diagram with α_L1(ρ), α_rBP(ρ), S-BEP and α = ρ; number of iterations versus α for rBP, seeded BEP (L = 10, 40) and L1; free entropy Φ(D) for α = 0.56 to 0.62.]

The dynamics gets stuck in a metastable state, just as a liquid cooled too fast remains in a supercooled liquid state instead of crystallizing.

How to pass the spinodal point?

By nucleation!

Special design of "seeded" matrices

$\vec y = F\,\vec s$, with F built from L × L blocks (here L = 8): unit coupling on the diagonal blocks, couplings J₁ and J₂ on the blocks adjacent to the diagonal (one on each side), and no coupling (null elements) elsewhere.

Block sizes and measurement rates:

$$N_i = N/L, \qquad M_i = \alpha_i N/L, \qquad M_1 = \alpha_1 N/L, \quad M_j = \alpha_0 N/L \ \ (j \geq 2)$$

$$\alpha_1 > \alpha_{BP}, \qquad \alpha_j = \alpha_0 < \alpha_{BP} \ \ (j \geq 2), \qquad \alpha = \frac{1}{L}\big(\alpha_1 + (L-1)\,\alpha_0\big)$$

Block 1 has a large value of M, so that the solution arises first in this block... and then propagates through the whole system!

[Figure: mean square error per block versus block index, at times t = 1, 4, 10, 20, 50, 100, 150, 200, 250, 300, for L = 20, ρ = 0.4, N = 50000, J₁ = 20, J₂ = 0.2, α₁ = 1, α = 0.5.]
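Below is a minimal sketch of mine, not code from the talk, of how such a seeded, block-structured measurement matrix could be assembled. The choice of putting variance J₁ on the sub-diagonal blocks and J₂ on the super-diagonal blocks, the per-element variance J/N, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def seeded_matrix(N, L=8, alpha_1=1.0, alpha_0=0.5, J1=20.0, J2=0.2, seed=0):
    """Build a block measurement matrix: unit-variance diagonal blocks,
    variance J1 on the sub-diagonal blocks, J2 on the super-diagonal blocks,
    zeros elsewhere. Signal block q has N/L components; measurement block p
    has alpha_p * N/L rows (alpha_1 for the first block, alpha_0 otherwise)."""
    rng = np.random.default_rng(seed)
    n = N // L                                   # columns per block
    alphas = [alpha_1] + [alpha_0] * (L - 1)
    m = [int(a * n) for a in alphas]             # rows per measurement block
    F = np.zeros((sum(m), N))
    row = 0
    for p in range(L):                           # block-row index
        for q in range(L):                       # block-column index
            if q == p:
                var = 1.0
            elif q == p - 1:
                var = J1
            elif q == p + 1:
                var = J2
            else:
                continue                         # null block
            F[row:row + m[p], q * n:(q + 1) * n] = (
                rng.standard_normal((m[p], n)) * np.sqrt(var / N)
            )
        row += m[p]
    return F

# Example roughly matching the figure above (overall alpha close to 0.5):
# F = seeded_matrix(N=50000, L=20, alpha_1=1.0, alpha_0=0.47, J1=20, J2=0.2)
```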

A signal with α = 0.5 and ρ = 0.4

Example with ρ₀ = 0.4 and Φ₀ a Gaussian distribution with zero mean and unit variance.

[Figure: N = 10000 points, ρ = 0.4, α = 0.5. Blue is the true signal, perfectly reconstructed by s-BP; red is the (faulty) signal found by L1.]

Phase diagrams

[Figure: mean square error versus α (L1 vs BEP); tanh[4Φ(E)] versus the mean square error for α = 0.3, 0.5, 0.6, 0.8; number of iterations versus α for BP and s-BP (L = 10, 40); phase diagrams in the (ρ, α) plane for a Gauss-Bernoulli signal and for binary signals, showing α_L1(ρ) (N = 10³ and 10⁴), α_EM-BP(ρ), s-BP data (N = 10³, 10⁴), seeded BP and the line α = ρ.]

A more interesting example

The Shepp-Logan phantom, in the Haar-wavelet representation.

[Figure: reconstructions by L1, BEP and S-BEP at α = 0.5, 0.4, 0.3, 0.2, 0.1, for a signal with ρ ≈ 0.15 (α = ρ ≈ 0.15).]

An even more interesting example

The Lena picture, in the Haar-wavelet representation.

[Figure: reconstructions by L1, BEP and S-BEP at α = 0.6, 0.5, 0.4, 0.3, 0.2, for a signal with ρ ≈ 0.24 (α = ρ ≈ 0.24).]

Analytical results for seeding matrices

• One can repeat the replica analysis for the seeded matrices, and the performance of the algorithm can be studied analytically: reconstruction is possible for any α > ρ in the large-N limit, i.e. asymptotically optimal measurements.

• These results have recently been confirmed by a rigorous analysis by Donoho, Javanmard and Montanari (arXiv:1112.0708).

• There is a lot of freedom in the design of the seeded matrices.

Conclusions...

• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

... and perspectives:

• More information in the prior?
• Other matrices with asymptotically optimal measurements?
• Calibration noise, additive noise, quasi-sparsity, etc.?
• Applications?

Thank you for your attention!

BONUS

Noise sensitivity

[Figure: MSE versus α for CS with Gauss-Bernoulli (ρ₀ = 0.2) noisy signals (noise level 10⁻⁴): theory and BP for L = 4 and L = 1 at N = 5000, compared with L1 minimization, shown on logarithmic and linear MSE scales.]

[Figure: MSE versus α for CS with Gauss-Bernoulli (ρ₀ = 0.4) noisy signals: s-BP and theory for σ = 10⁻³, 10⁻⁴, 10⁻⁵, and L = 1 at σ = 10⁻⁵.]

Flow in the (Q − q) space, with E = q − 2m + ⟨(x_i⁰)²⟩₀

[Figure: flow diagrams in the (R, V) plane. Gaussian signal, Gaussian inference: ρ = 0.2 (no spinodal) and ρ = 0.33 (with spinodal). Binary signal, Gaussian inference: ρ = 0.15 (no spinodal) and ρ = 0.25 (with spinodal).]
