
Deconvolution with ADMM

Gordon Wetzstein, Stanford University

EE367/CS448I: Computational Imaging and Display, stanford.edu/class/ee367

Lecture 6

Lens as Optical Low-pass Filter

• point source on focal plane maps to point

focal plane

• away from focal plane: out of focus blur

focal plane

blurred point


• shift-invariant convolution

focal plane



point spread function (PSF): c

x: sharp image; b: measured, blurred image

$$b = c * x$$

the convolution kernel is called the point spread function (PSF)



diffraction-limited PSF of circular aperture (aka “Airy” pattern):

PSF, OTF, MTF

• point spread function (PSF) is a fundamental concept in optics

• optical transfer function (OTF) is the (complex) Fourier transform of the PSF

• modulation transfer function (MTF) is the magnitude of the OTF

• example: PSF, $\mathrm{OTF} = \mathcal{F}\{\mathrm{PSF}\}$, $\mathrm{MTF} = |\mathrm{OTF}|$
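The three definitions above can be sketched numerically. This is a minimal illustration, assuming a Gaussian blur as the PSF (the slides show an Airy pattern instead); the helper name `gaussian_psf` and its parameters are hypothetical.

```python
import numpy as np

def gaussian_psf(size=32, sigma=2.0):
    """Hypothetical example PSF: a normalized 2D Gaussian blur kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()  # normalize so the PSF sums to 1

psf = gaussian_psf()
# Move the PSF center to index (0, 0) so the OTF carries no linear phase ramp.
otf = np.fft.fft2(np.fft.ifftshift(psf))  # OTF = F{PSF}, complex in general
mtf = np.abs(otf)                         # MTF = |OTF|
```

Because the PSF is non-negative and sums to 1, the MTF equals 1 at zero frequency and cannot exceed 1 anywhere else, which is why a lens acts as a low-pass filter.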


Deconvolution

• given measurements b and convolution kernel c, what is x?

$$x * c = b, \quad x = \text{?}$$

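Deconvolution with Inverse Filtering

• naive solution: apply inverse kernel

$$\tilde{x} = c^{-1} * b = \mathcal{F}^{-1}\left\{ \frac{\mathcal{F}\{b\}}{\mathcal{F}\{c\}} \right\}$$

[figure: sharp image $x$ and inverse-filtered estimate $\tilde{x}$]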

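Deconvolution with Inverse Filtering & Noise

• naive solution: apply inverse kernel

$$\tilde{x} = c^{-1} * b = \mathcal{F}^{-1}\left\{ \frac{\mathcal{F}\{b\}}{\mathcal{F}\{c\}} \right\}$$

• Gaussian noise, $\sigma = 0.05$

[figure: inverse-filtered estimate $\tilde{x}$ for $\sigma = 0.05$]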

Deconvolution with Inverse Filtering & Noise

• results: terrible!

• why? this is an ill-posed problem (division by (close to) zero in the frequency domain) → noise is drastically amplified!

• need to include prior(s) on images to make up for lost data

• for example: noise statistics (signal-to-noise ratio)
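A minimal sketch of naive inverse filtering and its noise amplification, assuming circular (periodic) convolution so that blur and inverse are exact in the Fourier domain. The helper `psf2otf`, the small cross-shaped kernel, and the random test image are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def psf2otf(psf, shape):
    """Zero-pad a small PSF to `shape` and center it at index (0, 0)."""
    padded = np.zeros(shape)
    kh, kw = psf.shape
    padded[:kh, :kw] = psf
    # Circularly shift so that the kernel center sits at the origin.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def inverse_filter(b, otf):
    """Naive inverse filter: divide the spectra, no regularization."""
    return np.real(np.fft.ifft2(np.fft.fft2(b) / otf))

rng = np.random.default_rng(0)
x = rng.random((64, 64))  # stand-in "sharp" image
# Assumed kernel whose OTF stays away from zero (minimum magnitude 0.2).
psf = np.array([[0., 1., 0.], [1., 6., 1.], [0., 1., 0.]]) / 10.0
otf = psf2otf(psf, x.shape)

b = np.real(np.fft.ifft2(np.fft.fft2(x) * otf))  # circular blur, b = c * x
x_rec = inverse_filter(b, otf)                   # recovers x: b is noise-free

b_noisy = b + 0.05 * rng.standard_normal(b.shape)
x_bad = inverse_filter(b_noisy, otf)             # noise divided by small OTF values
```

Even with this benign kernel, every noise frequency is multiplied by `1 / otf`, whose magnitude is at least 1, so the reconstruction error is larger than the noise that was added. Kernels with OTF values near zero make this far worse.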

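Deconvolution with Wiener Filtering

• apply inverse kernel and don’t divide by 0

$$\tilde{x} = \mathcal{F}^{-1}\left\{ \frac{|\mathcal{F}\{c\}|^2}{|\mathcal{F}\{c\}|^2 + \frac{1}{SNR}} \cdot \frac{\mathcal{F}\{b\}}{\mathcal{F}\{c\}} \right\}$$

• the first fraction is an amplitude-dependent damping factor!

$$SNR = \frac{\text{mean signal} \approx 0.5}{\text{noise std} = \sigma}$$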
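A minimal sketch of the Wiener filter above, assuming periodic boundaries and a kernel already centered at index (0, 0). The function name `wiener_deconv` and the test kernel are assumptions made for illustration.

```python
import numpy as np

def wiener_deconv(b, otf, snr):
    """Wiener filter: |C|^2 / (|C|^2 + 1/SNR) * B / C, evaluated per frequency."""
    B = np.fft.fft2(b)
    power = np.abs(otf) ** 2
    # Algebraically equal to the formula above, but never divides by C itself.
    X = np.conj(otf) * B / (power + 1.0 / snr)
    return np.real(np.fft.ifft2(X))

rng = np.random.default_rng(0)
x = rng.random((64, 64))  # stand-in "sharp" image
# Assumed blur kernel, centered at (0, 0); its OTF is exactly zero at one frequency.
c = np.zeros((64, 64))
c[0, 0] = 0.5
c[0, 1] = c[0, -1] = c[1, 0] = c[-1, 0] = 0.125
otf = np.fft.fft2(c)

sigma = 0.05
b = np.real(np.fft.ifft2(np.fft.fft2(x) * otf)) + sigma * rng.standard_normal(x.shape)
x_wiener = wiener_deconv(b, otf, snr=0.5 / sigma)  # SNR = mean signal / noise std
```

Rewriting the quotient as `conj(C) * B / (|C|^2 + 1/SNR)` is a design choice: it gives the same result wherever `C` is nonzero and stays finite where `C` is zero, which is the "don't divide by 0" behavior the slide asks for.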

Deconvolution with Wiener Filtering

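[figure: sharp image $x$ and estimates $\tilde{x}$ from the naïve inverse filter and the Wiener filter, for $\sigma = 0.01$, $\sigma = 0.05$, $\sigma = 0.1$]

• results: not too bad, but noisy

• this is a heuristic → dampen noise amplification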

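Total Variation

• idea: promote sparse gradients (edges)

$$\underset{x}{\text{minimize}} \; \|Cx - b\|_2^2 + \lambda\, TV(x) = \underset{x}{\text{minimize}} \; \|Cx - b\|_2^2 + \lambda \|\nabla x\|_1$$

$$\|x\|_1 = \sum_i |x_i|$$

• $\nabla$ is the finite differences operator, i.e. the matrix

$$\begin{bmatrix} -1 & 1 & & \\ & -1 & 1 & \\ & & \ddots & \\ & & & -1 \end{bmatrix}$$

Rudin et al. 1992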

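Total Variation

$$\nabla_x x = x * \begin{bmatrix} 0 & 0 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad \nabla_y x = x * \begin{bmatrix} 0 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

express (forward finite difference) gradient as convolution!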
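A minimal matrix-free sketch of the forward-difference gradient and its adjoint, assuming periodic boundaries. The names `grad` and `grad_adjoint` and the sign convention `x[i, j+1] - x[i, j]` are assumptions chosen for illustration.

```python
import numpy as np

def grad(x):
    """Forward differences with periodic boundaries; returns the stack [dx, dy]."""
    dx = np.roll(x, -1, axis=1) - x  # x[i, j+1] - x[i, j]
    dy = np.roll(x, -1, axis=0) - x  # x[i+1, j] - x[i, j]
    return np.stack([dx, dy])

def grad_adjoint(v):
    """Adjoint (transpose) of grad: differences taken in the opposite direction."""
    vx, vy = v
    return (np.roll(vx, 1, axis=1) - vx) + (np.roll(vy, 1, axis=0) - vy)

rng = np.random.default_rng(0)
x = rng.random((5, 7))
v = rng.standard_normal((2, 5, 7))
# The same horizontal difference written as a circular convolution in Fourier space.
hx = np.zeros(x.shape)
hx[0, 0], hx[0, -1] = -1.0, 1.0
dx_fft = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(hx)))
```

Checking that `<grad(x), v>` equals `<x, grad_adjoint(v)>` for random inputs is a cheap way to confirm that the two operators really are transposes of each other.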

Total Variation

• better: isotropic, $\sqrt{(\nabla_x x)^2 + (\nabla_y x)^2}$

• easier: anisotropic, $\sqrt{(\nabla_x x)^2} + \sqrt{(\nabla_y x)^2}$

Total Variation

• for simplicity, this lecture only discusses anisotropic TV:

$$TV(x) = \|\nabla_x x\|_1 + \|\nabla_y x\|_1 = \left\| \begin{bmatrix} \nabla_x \\ \nabla_y \end{bmatrix} x \right\|_1$$

• problem: the l1-norm is not differentiable, so we can’t use inverse filtering

• however: there is a simple solution for data fitting alone and a simple solution for TV alone → split the problem!
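The two TV variants can be compared numerically. This is a sketch assuming forward differences with periodic boundaries; the function names are hypothetical.

```python
import numpy as np

def _forward_diffs(x):
    """Forward differences with periodic boundaries (assumed convention)."""
    dx = np.roll(x, -1, axis=1) - x
    dy = np.roll(x, -1, axis=0) - x
    return dx, dy

def tv_anisotropic(x):
    """TV(x) = ||grad_x x||_1 + ||grad_y x||_1."""
    dx, dy = _forward_diffs(x)
    return np.sum(np.abs(dx)) + np.sum(np.abs(dy))

def tv_isotropic(x):
    """Sum over pixels of sqrt((grad_x x)^2 + (grad_y x)^2)."""
    dx, dy = _forward_diffs(x)
    return np.sum(np.sqrt(dx**2 + dy**2))

# Vertical step edge: left half 0, right half 1.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
tv_a = tv_anisotropic(step)
tv_i = tv_isotropic(step)
```

For an axis-aligned edge both measures agree. They differ for diagonal structures, where the anisotropic sum is larger; note that the periodic boundary adds a second edge where the image wraps around.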

• general form of ADMM (alternating direction method of multipliers):

$$\text{minimize} \; f(x) + g(z) \quad \text{subject to} \; Ax + Bz = c$$

• split deconvolution with TV prior:

$$\text{minimize} \; \|Cx - b\|_2^2 + \lambda \|z\|_1 \quad \text{subject to} \; \nabla x = z$$

i.e. $f(x) = \|Cx - b\|_2^2$, $g(z) = \lambda \|z\|_1$, $A = \nabla$, $B = -I$, $c = 0$

ADMM

• Lagrangian (bring constraints into objective = penalty method):

$$L(x, y, z) = f(x) + g(z) + y^T (Ax + Bz - c)$$

where $y$ is the dual variable or Lagrange multiplier

• augmented Lagrangian:

$$L_\rho(x, y, z) = f(x) + g(z) + y^T (Ax + Bz - c) + (\rho / 2) \|Ax + Bz - c\|_2^2$$

where the last term is an additional penalty term

ADMM

• augmented Lagrangian is differentiable under mild conditions (usually better convergence etc.)

$$L_\rho(x, y, z) = f(x) + g(z) + y^T (Ax + Bz - c) + (\rho / 2) \|Ax + Bz - c\|_2^2$$

ADMM

$$\text{minimize} \; f(x) + g(z) \quad \text{subject to} \; Ax + Bz = c$$

• ADMM consists of 3 steps per iteration k:

$$x^{k+1} := \arg\min_x \; L_\rho(x, y^k, z^k)$$

$$z^{k+1} := \arg\min_z \; L_\rho(x^{k+1}, y^k, z)$$

$$y^{k+1} := y^k + \rho (Ax^{k+1} + Bz^{k+1} - c)$$

ADMM

• scaled dual variable: $u = (1 / \rho)\, y$

$$x^{k+1} := \arg\min_x \left( f(x) + (\rho / 2) \|Ax + Bz^k - c + u^k\|_2^2 \right)$$

$$z^{k+1} := \arg\min_z \left( g(z) + (\rho / 2) \|Ax^{k+1} + Bz - c + u^k\|_2^2 \right)$$

$$u^{k+1} := u^k + Ax^{k+1} + Bz^{k+1} - c$$

• in each update, the variables that are not being optimized are held constant

• split f(x) and g(z) into independent problems! (u connects them)
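The step from the augmented Lagrangian to the scaled form is a completion of the square. As a short worked derivation, writing $r = Ax + Bz - c$ for the constraint residual (the symbol $r$ is introduced here only for brevity):

```latex
\begin{aligned}
y^T r + \frac{\rho}{2}\|r\|_2^2
  &= \frac{\rho}{2}\left( \|r\|_2^2 + 2\,u^T r + \|u\|_2^2 \right) - \frac{\rho}{2}\|u\|_2^2,
     \qquad u = \frac{1}{\rho}\, y \\
  &= \frac{\rho}{2}\,\|r + u\|_2^2 - \frac{\rho}{2}\,\|u\|_2^2 .
\end{aligned}
```

The term $-(\rho/2)\|u\|_2^2$ does not depend on $x$ or $z$, so it drops out of both minimizations. Dividing the update $y^{k+1} = y^k + \rho\,(Ax^{k+1} + Bz^{k+1} - c)$ by $\rho$ gives the $u$-update.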



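$$\text{minimize} \; \tfrac{1}{2} \|Cx - b\|_2^2 + \lambda \|z\|_1 \quad \text{subject to} \; \nabla x - z = 0$$

$$x^{k+1} := \arg\min_x \left( \tfrac{1}{2} \|Cx - b\|_2^2 + (\rho / 2) \|\nabla x - z^k + u^k\|_2^2 \right)$$

$$z^{k+1} := \arg\min_z \left( \lambda \|z\|_1 + (\rho / 2) \|\nabla x^{k+1} - z + u^k\|_2^2 \right)$$

$$u^{k+1} := u^k + \nabla x^{k+1} - z^{k+1}$$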


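1. x-update:

$$x^{k+1} := \arg\min_x \left( \tfrac{1}{2} \|Cx - b\|_2^2 + (\rho / 2) \|\nabla x - z^k + u^k\|_2^2 \right)$$

• $z^k - u^k$ is constant, say $v = z^k - u^k$

• solve normal equations:

$$\left( C^T C + \rho \nabla^T \nabla \right) x = C^T b + \rho \nabla^T v$$

$$\nabla^T v = \begin{bmatrix} \nabla_x \\ \nabla_y \end{bmatrix}^T v = \nabla_x^T v_1 + \nabla_y^T v_2$$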


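1. x-update:

$$x^{k+1} := \arg\min_x \left( \tfrac{1}{2} \|Cx - b\|_2^2 + (\rho / 2) \|\nabla x - z^k + u^k\|_2^2 \right)$$

• $z^k - u^k$ is constant, say $v = z^k - u^k$

• inverse filtering:

$$x = \left( C^T C + \rho \nabla^T \nabla \right)^{-1} \left( C^T b + \rho \nabla^T v \right)$$

→ may blow up, but that’s okay

$$x^{k+1} = \mathcal{F}^{-1}\left\{ \frac{\mathcal{F}\{c\}^* \cdot \mathcal{F}\{b\} + \rho \left( \mathcal{F}\{\nabla_x\}^* \cdot \mathcal{F}\{v_1\} + \mathcal{F}\{\nabla_y\}^* \cdot \mathcal{F}\{v_2\} \right)}{\mathcal{F}\{c\}^* \cdot \mathcal{F}\{c\} + \rho \left( \mathcal{F}\{\nabla_x\}^* \cdot \mathcal{F}\{\nabla_x\} + \mathcal{F}\{\nabla_y\}^* \cdot \mathcal{F}\{\nabla_y\} \right)} \right\}$$

• precompute the terms that do not change between iterations (e.g. the denominator)!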
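A minimal sketch of this Fourier-domain x-update, assuming periodic boundaries and kernels stored with their center at index (0, 0). The function name `x_update`, the test kernel, and the random inputs are assumptions for illustration.

```python
import numpy as np

def x_update(b, otf_c, otf_dx, otf_dy, v, rho):
    """Solve (C^T C + rho * grad^T grad) x = C^T b + rho * grad^T v via FFTs."""
    numer = (np.conj(otf_c) * np.fft.fft2(b)
             + rho * (np.conj(otf_dx) * np.fft.fft2(v[0])
                      + np.conj(otf_dy) * np.fft.fft2(v[1])))
    # The denominator is independent of the iteration and could be precomputed.
    denom = (np.abs(otf_c) ** 2
             + rho * (np.abs(otf_dx) ** 2 + np.abs(otf_dy) ** 2))
    return np.real(np.fft.ifft2(numer / denom))

rng = np.random.default_rng(0)
shape = (32, 32)
# Assumed blur kernel and forward-difference kernels, all centered at (0, 0).
c = np.zeros(shape)
c[0, 0] = 0.5
c[0, 1] = c[0, -1] = c[1, 0] = c[-1, 0] = 0.125
hx = np.zeros(shape); hx[0, 0] = -1.0; hx[0, -1] = 1.0  # x[i, j+1] - x[i, j]
hy = np.zeros(shape); hy[0, 0] = -1.0; hy[-1, 0] = 1.0  # x[i+1, j] - x[i, j]
otf_c, otf_dx, otf_dy = np.fft.fft2(c), np.fft.fft2(hx), np.fft.fft2(hy)

b = rng.random(shape)                  # stand-in measurement
v = rng.standard_normal((2,) + shape)  # stand-in for z^k - u^k
rho = 10.0
x_new = x_update(b, otf_c, otf_dx, otf_dy, v, rho)
```

The gradient term keeps the denominator positive at frequencies where the blur OTF vanishes, so the division is safe here as long as the kernel sums to a nonzero value (the gradient OTFs are zero at DC).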


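2. z-update:

$$z^{k+1} := \arg\min_z \left( \lambda \|z\|_1 + (\rho / 2) \|\nabla x^{k+1} - z + u^k\|_2^2 \right) = \arg\min_z \; \lambda \|z\|_1 + (\rho / 2) \|z - a\|_2^2$$

• $\nabla x^{k+1} + u^k$ is constant, say $a = \nabla x^{k+1} + u^k$

• l1-norm is not differentiable! yet, closed-form solution via element-wise soft thresholding:

$$z^{k+1} := S_{\lambda / \rho}(a), \qquad S_\kappa(a) = \begin{cases} a - \kappa & a > \kappa \\ 0 & |a| \le \kappa \\ a + \kappa & a < -\kappa \end{cases} \; = (a - \kappa)_+ - (-a - \kappa)_+$$

$$\kappa = \lambda / \rho$$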
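The soft-thresholding operator translates almost directly into code. A minimal sketch, with the function name `soft_threshold` and the sample values chosen for illustration:

```python
import numpy as np

def soft_threshold(a, kappa):
    """Element-wise S_kappa(a) = (a - kappa)_+ - (-a - kappa)_+."""
    return np.maximum(a - kappa, 0.0) - np.maximum(-a - kappa, 0.0)

a = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
z = soft_threshold(a, kappa=1.0)  # entries with |a| <= 1 become 0, others shrink by 1
```

Values inside `[-kappa, kappa]` are set to zero and all others move toward zero by `kappa`, which is what makes the gradient field `z` sparse.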


for k = 1:max_iters

x-update (inverse filtering), with $v = z^k - u^k$:

$$x^{k+1} := \arg\min_x \left( \tfrac{1}{2} \left\| \begin{bmatrix} C \\ \sqrt{\rho}\, \nabla \end{bmatrix} x - \begin{bmatrix} b \\ \sqrt{\rho}\, v \end{bmatrix} \right\|_2^2 \right)$$

z-update (element-wise threshold):

$$z^{k+1} := S_{\lambda / \rho}(\nabla x^{k+1} + u^k)$$

u-update (trivial):

$$u^{k+1} := u^k + \nabla x^{k+1} - z^{k+1}$$

→ easy! :)
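Putting the three updates together gives a short loop. The following is a sketch under stated assumptions, not the course's reference solution: it assumes periodic boundaries, a blur OTF with the kernel centered at index (0, 0), zero initialization, and a fixed iteration count. All function names and the synthetic test image are hypothetical.

```python
import numpy as np

def grad(x):
    """Forward differences with periodic boundaries, stacked as [dx, dy]."""
    return np.stack([np.roll(x, -1, axis=1) - x, np.roll(x, -1, axis=0) - x])

def soft_threshold(a, kappa):
    """Element-wise soft thresholding S_kappa(a)."""
    return np.maximum(a - kappa, 0.0) - np.maximum(-a - kappa, 0.0)

def admm_tv_deconv(b, otf_c, lam=0.01, rho=10.0, max_iters=50):
    """ADMM sketch for 0.5*||Cx - b||^2 + lam*||grad x||_1 (anisotropic TV)."""
    shape = b.shape
    # OTFs of the two finite-difference kernels (same convention as grad).
    hx = np.zeros(shape); hx[0, 0] = -1.0; hx[0, -1] = 1.0
    hy = np.zeros(shape); hy[0, 0] = -1.0; hy[-1, 0] = 1.0
    otf_dx, otf_dy = np.fft.fft2(hx), np.fft.fft2(hy)
    # Terms that do not change across iterations are precomputed.
    ctb = np.conj(otf_c) * np.fft.fft2(b)
    denom = np.abs(otf_c) ** 2 + rho * (np.abs(otf_dx) ** 2 + np.abs(otf_dy) ** 2)
    x = np.zeros(shape)
    z = np.zeros((2,) + shape)
    u = np.zeros((2,) + shape)
    for _ in range(max_iters):
        v = z - u
        numer = ctb + rho * (np.conj(otf_dx) * np.fft.fft2(v[0])
                             + np.conj(otf_dy) * np.fft.fft2(v[1]))
        x = np.real(np.fft.ifft2(numer / denom))  # x-update: inverse filtering
        gx = grad(x)
        z = soft_threshold(gx + u, lam / rho)     # z-update: element-wise threshold
        u = u + gx - z                            # u-update: trivial
    return x

# Synthetic example: blurred, noisy step image.
rng = np.random.default_rng(0)
x_true = np.zeros((32, 32)); x_true[:, 16:] = 1.0
c = np.zeros((32, 32))
c[0, 0] = 0.5
c[0, 1] = c[0, -1] = c[1, 0] = c[-1, 0] = 0.125
otf_c = np.fft.fft2(c)
b = np.real(np.fft.ifft2(np.fft.fft2(x_true) * otf_c)) + 0.05 * rng.standard_normal((32, 32))
x_est = admm_tv_deconv(b, otf_c, lam=0.01, rho=10.0)
```

A fixed iteration count keeps the sketch short; in practice one would also monitor the objective or the primal and dual residuals to decide when to stop.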

Deconvolution with ADMM

$$\text{minimize} \; \tfrac{1}{2} \|Cx - b\|_2^2 + \lambda \|z\|_1 \quad \text{subject to} \; \nabla x - z = 0$$

[figure: Wiener filtering vs. ADMM with anisotropic TV, λ = 0.01, ρ = 10]

[figure: ADMM results for λ = 0.01, λ = 0.05, λ = 0.1, each with ρ = 10]

• too much TV: “patchy”, too little TV: noisy

[figure: second example, Wiener filtering vs. ADMM with anisotropic TV, λ = 0.1, ρ = 10]

[figure: second example, ADMM results for λ = 0.01, λ = 0.05, λ = 0.1, each with ρ = 10]

• too much TV: okay because image actually has sparse gradients!

Outlook ADMM

• powerful tool for many computational imaging problems

• include generic prior in g(z), just need to derive proximal operator

$$\underset{x}{\text{minimize}} \; \underbrace{\tfrac{1}{2} \|Ax - b\|_2^2}_{\text{data fidelity}} + \underbrace{\Gamma(x)}_{\text{regularization}}$$

$$\underset{\{x, z\}}{\text{minimize}} \; f(x) + g(z) \quad \text{subject to} \; Ax = z$$

• example priors: noise statistics, sparse gradient, smoothness, …

• weighted sum of different priors also possible

• anisotropic TV is one of the easiest priors

Remember!

• implement matrix-free operations for Ax and A’x if efficient (e.g. multiplications and divisions in frequency space)

• split difficult problems (e.g., inverse problems with non-differentiable priors) into easier subproblems → ADMM

Homework 3

• implement:

  • filtering

  • inverse filtering and Wiener filtering

  • deconvolution with ADMM + (anisotropic) TV prior

• notes for ADMM implementation:

  • initialize U, Z, X with 0

  • implement with matrix-free form: all FT multiplications / divisions

  • in 2D, the finite differences matrix becomes $\nabla = \begin{bmatrix} \nabla_x \\ \nabla_y \end{bmatrix}$ (anisotropic form), use matrix-free operations as well!

  • see notes in HW

  • check ADMM example scripts: http://web.stanford.edu/~boyd/papers/admm/

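Notes for Homework 3

$$I \in \mathbb{R}^{M \times N}, \quad X \in \mathbb{R}^{MN \times 1}, \quad U \in \mathbb{R}^{2MN \times 1}, \quad Z \in \mathbb{R}^{2MN \times 1}$$

• signal-to-noise ratio (SNR):

$$SNR = \frac{P_{signal}}{P_{noise}}, \qquad SNR_{dB} = 10 \cdot \log_{10}\left( \frac{P_{signal}}{P_{noise}} \right)$$

• peak signal-to-noise ratio (PSNR), always in dB:

$$MSE = \frac{1}{mn} \sum_m \sum_n \left( x_{target} - x_{est} \right)^2$$

$$PSNR = 10 \cdot \log_{10}\left( \frac{\max(x_{target})^2}{MSE} \right) = 10 \cdot \log_{10}\left( \frac{1}{MSE} \right)$$

(the last equality holds for images normalized so that $\max(x_{target}) = 1$)

• residual is value of objective function:

not regularized: $\tfrac{1}{2} \|Cx - b\|_2^2$

regularized: $\tfrac{1}{2} \|Cx - b\|_2^2 + \lambda \left\| \begin{bmatrix} \nabla_x \\ \nabla_y \end{bmatrix} x \right\|_1$

• convergence: residual for increasing iterations (should always decrease!)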
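The quality metrics above are straightforward to compute. A minimal sketch, with hypothetical function names and a tiny made-up test image:

```python
import numpy as np

def mse(x_target, x_est):
    """Mean squared error over all pixels."""
    return np.mean((x_target - x_est) ** 2)

def psnr(x_target, x_est):
    """PSNR in dB, using the peak of the target image as the reference."""
    return 10.0 * np.log10(np.max(x_target) ** 2 / mse(x_target, x_est))

def snr_db(p_signal, p_noise):
    """SNR in dB from signal and noise power."""
    return 10.0 * np.log10(p_signal / p_noise)

x_target = np.array([[0.0, 1.0], [1.0, 0.0]])
x_est = x_target + 0.1  # constant error of 0.1, so MSE = 0.01
quality = psnr(x_target, x_est)
```

With a peak value of 1 and an MSE of 0.01, the PSNR evaluates to 20 dB.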

References and Further Reading

• Boyd, Parikh, Chu, Peleato, Eckstein, “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”, Foundations and Trends in Machine Learning, 2011

• A. Chambolle, T. Pock, “A first-order primal-dual algorithm for convex problems with applications in imaging”, Journal of Mathematical Imaging and Vision, 2011

• Boreman, “Modulation Transfer Function in Optical and ElectroOptical Systems”, SPIE Publications, 2001

• Rudin, Osher, Fatemi, “Nonlinear total variation based noise removal algorithms”, Physica D: Nonlinear Phenomena 60, 1

• http://www.imagemagick.org/Usage/fourier/
