TRANSCRIPT
A statistical physics approach to compressed sensing
Florent Krzakala (ESPCI, PCT and Gulliver CNRS)
in collaboration with Marc Mézard & François Sausset (LPTMS),
Yifan Sun (ESPCI) and Lenka Zdeborová (IPhT Saclay)
http://aspics.krzakala.org
Who are we? What do we do?
Interface between statistical physics, optimization, information theory and algorithms
http://aspics.krzakala.org
Florent Krzakala (ESPCI, Paris)
Lenka Zdeborová (CNRS, Saclay)
Marc Mézard (CNRS, Orsay)
Sparse signals: what is compressed sensing?
Measurements: from 10^6 wavelet coefficients, keep 25,000.
Why do we record a huge amount of data, and then keep only the important bits? Couldn't we record only the relevant information directly?
Most signals of interest are sparse in an appropriate basis ⇒ exploited for data compression (JPEG2000).
How does compressed sensing work?

An image I of n×n pixels is a vector of size N = n×n:
$$\vec I = (I_1, \ldots, I_N)^T$$

M measurements = M linear operations on this vector, collected in a vector of size M:
$$\vec y = (y_1, \ldots, y_M)^T, \qquad \vec y = G\,\vec I, \qquad G \text{ an } M \times N \text{ matrix}$$

Problem: you know y and G; how do you reconstruct I?

If M = N ☞ easy, just use I = G⁻¹y.
If M < N ☞ an under-constrained system of equations: many solutions are possible.

The idea of compressed sensing is to use the a-priori knowledge that the signal is sparse in some appropriate basis. Let Φ be the N×N matrix of the direct (and Φ⁻¹ of the inverse) wavelet transform, so that the image I corresponds to a sparse vector x of size N = n×n with only R non-zeros. The problem to solve is now
$$\vec y = F\,\vec x \quad \text{with} \quad F = G\,\Phi^{-1}, \qquad F \text{ an } M \times N \text{ matrix}$$

• We need a solver that finds sparse solutions of an under-constrained set of equations
• Ideally it works as long as M > R
• It should be robust to noise
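As a concrete illustration of this setup, here is a minimal numerical sketch, assuming (as in the rest of the talk) a Gauss-Bernoulli sparse signal and an iid Gaussian measurement matrix; the variable names and values are ours:

```python
# Minimal compressed-sensing instance: y = F x with M < N and x sparse.
import numpy as np

rng = np.random.default_rng(0)

N = 1000          # signal dimension (e.g. number of wavelet coefficients)
M = 500           # number of linear measurements, M < N
rho = 0.2         # fraction of non-zero components, R = rho * N

# Gauss-Bernoulli signal: (1 - rho) N zeros and rho N Gaussian entries
x = np.where(rng.random(N) < rho, rng.standard_normal(N), 0.0)

# Random measurement matrix with iid entries of variance 1/N
F = rng.standard_normal((M, N)) / np.sqrt(N)

y = F @ x         # the M measurements: an under-constrained linear system
```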
State of the art in CS
• Incoherent sampling (i.e. a random M×N matrix F) in y = Fx
• Reconstruction by minimizing the L1 norm:
$$\|\vec x\|_{L_1} = \sum_i |x_i|$$
Candès & Tao (2005); Donoho and Tanner (2005)
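For reference, a hedged sketch of this L1 reconstruction (basis pursuit) as a linear program; the split x = u − v and the choice of scipy's HiGHS solver are ours, not part of the talk:

```python
# Basis pursuit: min ||x||_1  subject to  F x = y.
# Writing x = u - v with u, v >= 0 turns it into a standard-form LP.
import numpy as np
from scipy.optimize import linprog

def l1_reconstruct(F, y):
    M, N = F.shape
    c = np.ones(2 * N)              # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([F, -F])       # equality constraint: F u - F v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v                    # the reconstructed x
```

On the instance above, `l1_reconstruct(F, y)` recovers x exactly when (ρ, α) lies on the good side of the Donoho-Tanner line discussed next.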
Example: measuring a picture
One measurement = the scalar product of the signal with a random pattern; many measurements = scalar products with many random patterns, i.e. a random matrix F (or G, acting directly on the image I).
From 10^6 points, only 25,000 are non-zero.
State of the art in CS

For a signal with (1−ρ)N zeros and R = ρN non-zeros, and a random iid matrix with M = αN rows, define
$$\rho = \frac{R}{N}, \qquad \alpha = \frac{M}{N}$$

[Phase diagram in the (ρ, α) plane, showing the lines α_L1(ρ), α_EM-BP(ρ), s-BP data for N = 10³ and 10⁴, and the line α = ρ. It splits into three regions: reconstruction impossible (below α = ρ), reconstruction in exponential time, and reconstruction in polynomial time.]

Reconstruction is limited by the Donoho-Tanner transition for L1-norm minimization.
State of the art in CS

A different representation of the same transition, for the same signal and matrix ensemble, uses the coordinates
$$\delta = \frac{M}{N}, \qquad \rho_{DT} = \frac{R}{M}$$

[Donoho-Tanner phase diagram in the (δ, ρ_DT) plane, split into an "Easy" and a "Hard" region.]
Our work
• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

A statistical physics approach to compressed sensing
A probabilistic approach to compressed sensing

We want to sample from this distribution:
$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^N \left[(1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)\right]\; \prod_{\mu=1}^M \delta\!\left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right)$$
The first factor favors a sparse vector; the second enforces the solutions of the linear system.

Theorem: sampling from P(x|y) gives the correct solution in the large-N limit as long as α > ρ₀, if: a) Φ(x) > 0 ∀x, and b) 1 > ρ > 0.

Sampling from P(x|y) is thus optimal, even if we do not know the correct Φ(x) or the correct ρ.

In practice, we use a Gaussian distribution for Φ(x), with mean m and variance σ², and "learn" the best values for ρ, σ and m.
A sketch of the proof

Consider the system constrained to lie at distance larger than D from the solution, with restricted partition function Y(D).
1) Y(0) is infinite if α > ρ₀ (equivalently, if M > R): just count the delta functions! N − R + M deltas versus N integrals...
2) Y(D) is finite for any D > 0 (bounded by a first-moment method, or "annealed" computation).
If α > ρ₀, the measure is therefore always dominated by the solution.

[Figure: (log Y)/N as a function of the distance D.]
A probabilistic approach to compressed sensing

Probabilistic reconstruction using the posterior P(x|y) above. Sampling from P(x|y) is optimal, even if we do not know the correct Φ(x) or the correct ρ, and statistical physics and information theory tools can be readily used for:
• sampling,
• computing the phase diagram,
• etc.
The link with statistical physics and spin glasses

$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^N P(x_i)\; \prod_{\mu=1}^M \delta\!\left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right) \quad \text{with} \quad P(x_i) = (1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)$$

Relaxing each hard constraint into a Gaussian of variance Δ, this takes the Boltzmann form
$$P(\vec x|\vec y) = \frac{1}{Z}\; e^{\sum_{i=1}^N \log P(x_i)\; -\; \frac{1}{2\Delta}\sum_{\mu=1}^M \left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right)^2}$$
with Z a partition sum, the exponent (minus) a Hamiltonian, disordered interactions F_{μi}, and mean-field, long-range interactions.

In physics, this is called a spin glass problem; it has been studied since the early '80s.
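To make this Boltzmann-measure reading concrete, here is a small sketch (our own) of the log of the unnormalized measure, i.e. minus the Hamiltonian above, assuming a Gaussian slab Φ = N(0,1) and noise variance Δ:

```python
import numpy as np

def minus_hamiltonian(x, F, y, rho, Delta):
    # log prior: the delta peak at 0 is treated as a point mass of weight
    # 1 - rho, the slab as a Gaussian density of weight rho (our assumption)
    slab = rho * np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
    log_prior = np.sum(np.where(x == 0.0, np.log1p(-rho), np.log(slab)))
    r = y - F @ x                      # residual of the linear constraints
    return log_prior - r @ r / (2.0 * Delta)
```

Mixing a point mass with a density in the "log prior" is fine here: only energy differences between configurations matter for sampling.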
Our work
• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

A statistical physics approach to compressed sensing
Statistical physics approach (FK et al.) + rigorous proofs (Donoho, Montanari et al.)
How to sample?

$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^N \left[(1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)\right]\; \prod_{\mu=1}^M \delta\!\left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right)$$

Solution number 1: use Markov-chain Monte Carlo. This has been used in the literature for ultrasound imaging (cf. Quinsac et al., 2011...), but it is very slow!

Solution number 2: estimate the marginal probabilities with a message-passing algorithm. If we do it correctly, then the solution is given by
$$a_i = \int dx_i\, P_i(x_i)\, x_i$$

In this model, this can be done exactly (for large N, M) using an approach known as:
1. Thouless-Anderson-Palmer, or the cavity method, in physics (Bethe-Peierls, Onsager '35; Mézard and Parisi '02)
2. Belief propagation in artificial intelligence (Pearl '82)
3. Sum-product in coding theory (Gallager '60)
4. Approximate Message Passing in compressed sensing (Rangan, Montanari, ...)
How does BP work?

Gibbs free-energy approach:
$$\log Z = \max_{\{\mathcal P(\vec x)\}} f_{\rm Gibbs}\big(\{\mathcal P(\vec x)\}\big), \quad \text{with} \quad f_{\rm Gibbs}\big(\{\mathcal P(\vec x)\}\big) = \big\langle \log P(\vec x|\vec y) \big\rangle_{\mathcal P(\vec x)} - \int d\vec x\; \mathcal P(\vec x) \log \mathcal P(\vec x)$$

Mean field ⇒ $\mathcal P(\vec x) = \prod_i \mathcal P_i(x_i)$: not correct, plus convergence problems.

Belief propagation ⇒ $\mathcal P(\vec x) = \prod_{ij} \mathcal P_{ij}(x_i, x_j) \big/ \prod_i \mathcal P_i(x_i)^{M-1}$: (asymptotically) exact in CS with random matrices.

The BP recursion is given by the steepest-ascent method on this free energy.
How does BP work?

Simplification thanks to the large-connectivity limit: projecting on the first two moments is enough,
$$f\big(\{\mathcal P_i(x_i), \mathcal P_{ij}(x_i,x_j)\}\big) \;\to\; f\big(\{\langle x_i\rangle, \langle x_i^2\rangle\}\big)$$
and the belief-propagation equations follow from ascending this free energy:
$$\langle x_i \rangle^{t+1} = \langle x_i \rangle^{t} + \frac{\partial f}{\partial \langle x_i \rangle}, \qquad \langle x_i^2 \rangle^{t+1} = \langle x_i^2 \rangle^{t} + \frac{\partial f}{\partial \langle x_i^2 \rangle}$$

The Belief-Propagation algorithm

Iterate these variables (simple algebraic operations!):
$$U_i^{(t+1)} = -\frac{\alpha}{M} \sum_\mu \frac{1}{\Delta_\mu + \Theta^{(t)}}$$
$$V_i^{(t+1)} = \sum_\mu \frac{F_{\mu i}\,\big(y_\mu - \omega_\mu^{(t)}\big)}{\Delta_\mu + \Theta^{(t)}} + f_a\big(U_i^{(t)}, V_i^{(t)}\big)\, \frac{\alpha}{M} \sum_\mu \frac{1}{\Delta_\mu + \Theta^{(t)}}$$
$$\omega_\mu^{(t+1)} = \sum_i F_{\mu i}\, f_a\big(U_i^{(t+1)}, V_i^{(t+1)}\big) - \frac{y_\mu - \omega_\mu^{(t)}}{\Delta_\mu + \Theta^{(t)}}\; \frac{1}{N} \sum_i \frac{\partial f_a}{\partial Y}\big(U_i^{(t+1)}, V_i^{(t+1)}\big)$$
$$\Theta^{(t+1)} = \frac{1}{N} \sum_i f_c\big(U_i^{(t+1)}, V_i^{(t+1)}\big)$$

Using these functions:
$$f_a(X,Y) = \frac{\rho\, Y}{(1+X)^{3/2}}\, e^{Y^2/(2(1+X))} \left[ (1-\rho) + \frac{\rho}{(1+X)^{1/2}}\, e^{Y^2/(2(1+X))} \right]^{-1}$$
$$f_c(X,Y) = \frac{\rho}{(1+X)^{3/2}}\, e^{Y^2/(2(1+X))} \left(1 + \frac{Y^2}{1+X}\right) \left[ (1-\rho) + \frac{\rho}{(1+X)^{1/2}}\, e^{Y^2/(2(1+X))} \right]^{-1} - f_a(X,Y)^2$$

And finally, at the end:
$$\langle x_i \rangle = f_a(U_i, V_i)$$

Complexity is O(N² × convergence time).
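Since each sweep is just a few algebraic operations, the iteration is easy to sketch in code. Below is a hedged, textbook-style AMP iteration for the Gauss-Bernoulli prior and an iid matrix of variance 1/N; the variable names (omega, V, S2, R), the uniform-variance simplification, and the scalar denoiser playing the role of (f_a, f_c) are our choices, not the authors' exact implementation:

```python
import numpy as np

def gauss_bernoulli_denoiser(R, S2, rho):
    """Posterior mean/variance of x ~ (1-rho) delta_0 + rho N(0,1),
    observed through R = x + noise of variance S2 (the roles of f_a, f_c)."""
    g0 = np.exp(-R**2 / (2 * S2)) / np.sqrt(2 * np.pi * S2)              # spike evidence
    g1 = np.exp(-R**2 / (2 * (S2 + 1))) / np.sqrt(2 * np.pi * (S2 + 1))  # slab evidence
    pi = rho * g1 / ((1 - rho) * g0 + rho * g1)   # posterior prob. of being non-zero
    mean_slab = R / (1 + S2)
    var_slab = S2 / (1 + S2)
    a = pi * mean_slab                            # posterior mean  (f_a)
    c = pi * (var_slab + mean_slab**2) - a**2     # posterior variance (f_c)
    return a, c

def amp_reconstruct(F, y, rho, Delta=1e-8, n_iter=100):
    M, N = F.shape
    alpha = M / N
    a, c = np.zeros(N), rho * np.ones(N)          # initialize at the prior moments
    omega = y.copy()                              # Onsager-corrected estimate of F a
    V = c.mean()
    for _ in range(n_iter):
        V_new = c.mean()                          # average posterior variance
        omega = F @ a - (y - omega) * V_new / (Delta + V)
        V = V_new
        S2 = (Delta + V) / alpha                  # effective scalar-channel noise
        R = a + (F.T @ (y - omega)) / alpha       # pseudo-observation of each x_i
        a, c = gauss_bernoulli_denoiser(R, S2, rho)
    return a
```

On the instance built earlier, `amp_reconstruct(F, y, rho)` typically reaches near machine-precision error when α lies above the BP spinodal, at O(N²) cost per iteration.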
The Belief-Propagation algorithm: how to learn the parameters in the prior?

Take a Gaussian slab with three parameters ρ, x̄, σ:
$$\Phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-(x-\bar x)^2/(2\sigma^2)}$$

Compute the Bethe free entropy F = log Z using the BP messages, compute the gradient
$$\left(\frac{\partial F}{\partial \rho}, \frac{\partial F}{\partial \bar x}, \frac{\partial F}{\partial \sigma}\right)$$
and update the parameters to maximize F at each step of the BP iteration.

Learning makes the algorithm faster (it is equivalent to expectation-maximization).
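A hedged sketch of this learning step in its expectation-maximization reading: re-estimate the prior parameters from the current marginals after each sweep. For the sparsity ρ, the EM fixed point is the average posterior probability of being non-zero; identifying this with the maximizer of the Bethe free entropy F in ρ is the slide's claim, while the code below is our illustration:

```python
import numpy as np

def em_update_rho(R, S2, rho):
    # same spike/slab evidences as in the AMP denoiser sketched above
    g0 = np.exp(-R**2 / (2 * S2)) / np.sqrt(2 * np.pi * S2)
    g1 = np.exp(-R**2 / (2 * (S2 + 1))) / np.sqrt(2 * np.pi * (S2 + 1))
    pi = rho * g1 / ((1 - rho) * g0 + rho * g1)   # P(x_i != 0 | data)
    return pi.mean()                              # updated estimate of rho
```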
Analysis of the algorithm

The performance of the algorithm for a given distribution of signals can be analyzed using a method known as density evolution (coding theory) or the replica method (physics). Define
$$Z(\vec y) = \int \prod_{i=1}^N dx_i\; P(\vec x|\vec y), \qquad F(\vec y) = -\log Z(\vec y)$$

Averaging over a signal distribution (e.g. Gauss-Bernoulli), with F_{μi} iid Gaussian of variance 1/N:
$$y_\mu = \sum_{i=1}^N F_{\mu i}\, x_i^0, \qquad \text{where the } x_i^0 \text{ are iid from } (1-\rho_0)\,\delta(x_i^0) + \rho_0\,\Phi_0(x_i^0)$$

Replica method:
$$\overline{\log Z} = \lim_{n\to 0} \frac{\overline{Z^n} - 1}{n}$$
The replica computation yields the free-entropy potential
$$\Phi(Q,q,m,\hat Q,\hat q,\hat m) = -\frac{1}{2N}\sum_\mu \frac{q - 2m + \rho_0 + \Delta_\mu}{\Delta_\mu + Q - q} - \frac{1}{2N}\sum_\mu \log(\Delta_\mu + Q - q) + \frac{Q\hat Q}{2} - m\hat m + \frac{q\hat q}{2}$$
$$+ \int Dz \int dx^0\, \left[(1-\rho_0)\,\delta(x^0) + \rho_0\,\Phi_0(x^0)\right] \log\left\{ \int dx\; e^{-\frac{\hat Q + \hat q}{2} x^2 + \hat m x x^0 + z\sqrt{\hat q}\, x}\, \left[(1-\rho)\,\delta(x) + \rho\,\Phi(x)\right] \right\}$$

Order parameters:
$$Q = \frac{1}{N}\sum_i \langle x_i^2\rangle, \qquad q = \frac{1}{N}\sum_i \langle x_i\rangle^2, \qquad m = \frac{1}{N}\sum_i x_i^0\, \langle x_i\rangle$$

Mean square error:
$$E = \frac{1}{N}\sum_i \big(\langle x_i\rangle - x_i^0\big)^2 = q - 2m + \langle (x_i^0)^2\rangle_0$$
Computing the free entropy

Example with ρ₀ = 0.4 and Φ₀ a Gaussian distribution with zero mean and unit variance. Plot the free entropy Φ(E) = log Z(E) as a function of the mean square error E = (1/N)Σ_i (⟨x_i⟩ − x_i⁰)².

[Figure: Φ(D) versus the distance D for α = 0.62, 0.6, 0.58, 0.56.]

• The maximum is at E = 0 (as long as α > ρ₀): the equilibrium behavior is dominated by the original signal.
• For α < 0.58, a secondary maximum appears (a meta-stable state): the spinodal point.
• A steepest-ascent dynamics starting from large E reaches the signal for α > 0.58, but stays blocked in the meta-stable state for α < 0.58, even though the true equilibrium is at E = 0.
• Similarity with supercooled liquids.
Computing the phase diagram

[Figures: (i) the thermodynamic potential Φ(D) versus the distance D to the native state, for α = 0.62, 0.6, 0.58, 0.56; (ii) the mean square error reached by L1 and by BEP as a function of α; (iii) tanh[4Φ(E)] versus the mean square error for α = 0.8, 0.6, 0.5, 0.3; (iv) the phase diagram α(ρ) with the lines α_L1(ρ), α_BP(ρ) and α = ρ; (v) the BP convergence time (number of iterations) versus α for BP, seeded BEP with L = 10 and L = 40, and L1, diverging at the spinodal transition (supercooled limit).]

A steepest ascent of the free entropy allows perfect reconstruction down to the spinodal line. This is more efficient than L1 minimization.
Trying different types of signals

The BP limit depends on the type of signal, while the Donoho-Tanner transition is universal.

[Figures: phase diagrams α(ρ) for a Gauss-Bernoulli signal and for binary signals, showing the lines α_L1(ρ), α_EM-BP(ρ), s-BP data for N = 10³ and 10⁴, and the line α = ρ; the same data re-plotted in the Donoho-Tanner coordinates (δ_DT, ρ_DT), where the ℓ1 curve is universal but the EM-BP and s-BP curves are not, reaching ρ_DT = 1.]
BP is robust to noise

[Figures: phase diagrams in the coordinates ρ/α = K/M versus α = M/N for measurement noise Δ = 10⁻⁵ and Δ = 10⁻², with contour lines of the achieved mean square error (from 0.2 down to 10⁻¹²), the DT line and the spinodal line.]
A more complex signal

The Shepp-Logan phantom, in the Haar-wavelet representation (ρ ≃ 0.15): reconstructions by L1, BEP and s-BEP at α = 0.5, 0.4, 0.3, 0.2, 0.1.
BP + probabilistic approach
• Efficient and fast
• Robust to noise
• Very flexible (more information can be put in the prior)

$$P(\vec x|\vec y) = \frac{1}{Z} \prod_{i=1}^N \left[(1-\rho)\,\delta(x_i) + \rho\,\Phi(x_i)\right]\; \prod_{\mu=1}^M \delta\!\left(y_\mu - \sum_{i=1}^N F_{\mu i} x_i\right)$$
Our work
• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

A statistical physics approach to compressed sensing
This is good, but not good enough

[Figures: the same mean square error, tanh[4Φ(E)], phase diagram and convergence-time plots as above, together with the free entropy Φ(D) for α = 0.62, 0.6, 0.58, 0.56.]

The dynamics is stuck in a metastable state, just as a liquid cooled too fast remains in a supercooled liquid state instead of crystallizing.

How to pass the spinodal point? By nucleation!
Special design of "seeded" matrices

$$\vec y = F\, \vec s$$

The matrix F is made of L × L blocks (here L = 8), with N_i = N/L columns per block and M_i = α_i N/L rows per block row:
• unit coupling on the diagonal blocks,
• coupling J₁ on the blocks just below the diagonal,
• coupling J₂ on the blocks just above the diagonal,
• no coupling (null elements) everywhere else.

The measurement rates are chosen as
$$\alpha_1 > \alpha_{BP}, \qquad \alpha_j = \alpha_0 < \alpha_{BP} \;\;\text{for } j \ge 2, \qquad \alpha = \frac{1}{L}\big(\alpha_1 + (L-1)\,\alpha_0\big)$$
so M₁ = α₁N/L and M_i = α₀N/L for i ∈ {2, ..., L}.

Block 1 has a large value of M, so that the solution arises first in this block... and then propagates through the whole system!
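A sketch of such a seeded measurement matrix, following the block structure drawn above: unit-variance diagonal blocks, J₁ on the first sub-diagonal, J₂ on the first super-diagonal, zeros elsewhere, with a denser first block row. Reading J₁ and J₂ as variances of the block entries (with an overall 1/N scaling) is our assumption:

```python
import numpy as np

def seeded_matrix(N, L, alpha1, alpha0, J1, J2, seed=0):
    rng = np.random.default_rng(seed)
    Ni = N // L                                                 # N_i = N/L columns per block
    Ms = [round(alpha1 * Ni)] + [round(alpha0 * Ni)] * (L - 1)  # M_i = alpha_i N/L rows
    rows = []
    for p, Mp in enumerate(Ms):
        row = []
        for q in range(L):
            if q == p:
                var = 1.0        # unit coupling on the diagonal
            elif q == p - 1:
                var = J1         # coupling J1 below the diagonal
            elif q == p + 1:
                var = J2         # coupling J2 above the diagonal
            else:
                var = 0.0        # no coupling (null elements)
            row.append(np.sqrt(var / N) * rng.standard_normal((Mp, Ni)))
        rows.append(np.hstack(row))
    return np.vstack(rows)

# The next slide's example would read (alpha0 chosen so that alpha = 0.5):
# F = seeded_matrix(N=50000, L=20, alpha1=1.0, alpha0=9/19, J1=20, J2=0.2)
```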
Example with ρ₀ = 0.4 and Φ₀ a Gaussian distribution with zero mean and unit variance: a signal with α = 0.5 and ρ = 0.4.

[Figure: mean square error per block (block index 1 to 20) at times t = 1, 4, 10, 20, 50, 100, 150, 200, 250, 300, showing the reconstruction front propagating through the blocks. Parameters: L = 20, ρ = 0.4, N = 50000, J₁ = 20, J₂ = 0.2, α₁ = 1, α = 0.5.]
[Figure: N = 10000 points, ρ = 0.4, α = 0.5. Faulty reconstruction with L1, perfect reconstruction with BP. Blue is the true signal reconstructed by s-BP; red is the signal found by L1.]
Phase diagrams

[Figures: mean square error and number of iterations versus α for L1, BP and s-BP (L = 10 and L = 40); tanh[4Φ(E)] versus the mean square error for α = 0.8, 0.6, 0.5, 0.3; phase diagrams α(ρ) for Gauss-Bernoulli and binary signals, with the lines α_L1(ρ) (N = 10³ and 10⁴), α_EM-BP(ρ), the s-BP points, and α = ρ. Seeded BP reconstructs down to α = ρ.]
A more interesting example

The Shepp-Logan phantom in the Haar-wavelet representation (ρ ≃ 0.15): reconstructions by L1, BEP and s-BEP at α = 0.5, 0.4, 0.3, 0.2, 0.1.
An even more interesting example

The Lena picture in the Haar-wavelet representation (ρ ≃ 0.24): reconstructions by L1, BEP and s-BEP at α = 0.6, 0.5, 0.4, 0.3, 0.2.
Analytical results for seeding matrices
• One can repeat the replica analysis for the seeded matrices, and the performance of the algorithm can be studied analytically, leading to reconstruction for any α > ρ in the large-N limit: asymptotically optimal measurements.
• These results have recently been confirmed by a rigorous analysis by Donoho, Javanmard and Montanari (arXiv:1112.0708).
• There is a lot of freedom in the design of the seeded matrices.
Conclusions...
• A probabilistic approach to reconstruction
• The Belief Propagation algorithm
• Seeded measurement matrices

... and perspectives:
• More information in the prior?
• Other matrices with asymptotically optimal measurements?
• Calibration noise, additive noise, quasi-sparsity, etc.?
• Applications?
Thank you for your attention!
BONUS
Noise sensitivity

[Figure: MSE versus α for compressed sensing of noisy (Δ_n = 10⁻⁴) Gauss-Bernoulli (ρ₀ = 0.2) signals, comparing theory and BP for L = 4 and L = 1 (N = 5000) with L1 minimization (N = 5000), shown on logarithmic and linear MSE scales.]
Noise sensitivity

[Figure: MSE versus α for compressed sensing of noisy Gauss-Bernoulli (ρ₀ = 0.4) signals: s-BP and theory for noise σ = 10⁻³, 10⁻⁴, 10⁻⁵, together with L = 1 at σ = 10⁻⁵.]
Flow in the (Q, q) space, with E = q − 2m + ⟨(x_i⁰)²⟩₀

[Figures: flow of the order parameters in the (R, V) plane: Gaussian signal with Gaussian inference at ρ = 0.2 (no spinodal) and ρ = 0.33 (with spinodal); binary signal with Gaussian inference at ρ = 0.15 (no spinodal) and ρ = 0.25 (with spinodal).]