
Page 1:

RED: Regularization by Denoising

The Little Engine that Could

Michael Elad

May 18, 2017

Joint work with: Yaniv Romano and Peyman Milanfar

This work was done during the summer of 2016 at Google Research, Mountain View, CA

Page 2:

Background and Main Objective

Page 3:

Image Denoising: Definition

This is the “simplest” inverse problem:

y = x + e

where x is the clean image, e is additive white Gaussian noise (AWGN), and y is the measured image. A denoising filter should produce an output as close as possible to x.

Page 4:

Image Denoising – Past Work

Searching Web-of-Science (May 4th, 2017) with TOPIC = image and noise and (denoising or removal) leads to ~4,700 papers.

Page 5:

Image Denoising – What’s Out There?

[Timeline figure, 1970–2015, grouping methods approximately by family: L2-based regularization (Wiener, Huber-Markov); PDE methods (anisotropic diffusion, TV, Beltrami); robust statistics; heuristic spatially adaptive filtering (bilateral); wavelet methods (thresholding, cycle-spinning, GSM, SURE, frames); sparsity methods (K-SVD, NCSR); self-similarity and patch methods (NLM, kernel regression, NL-Bayes, NLM-PCA, BM3D, EPLL); neural networks (TNRD, DenoiseNet).]

Page 6:

Is Denoising Dead?

To a large extent, removal of additive noise from an image is a solved problem in image processing.

This claim is based on several key observations:
- There are more than 20,000 papers studying this problem
- The methods proposed in the past decade are highly impressive
- Very different algorithms tend to lead to quite similar output quality
- Work investigating performance bounds confirms this [Chatterjee & Milanfar, 2010], [Levin & Nadler, 2011]

Bottom line: improving image denoising algorithms seems to lead to diminishing returns.

Page 7:

Noise Removal, Sharpening

Page 8:

Noise Removal, Sharpening

Page 9:

Compression Artifact Removal

Page 10:

Compression Artifact Removal

Page 11:

Some Visual Effects

[Three portraits: “Old man”, “Younger”, “More fun”; image courtesy of Andrzej Dragan]

Page 12:

Image Denoising – Can we do More?

SO, if improving image denoising algorithms is a dead-end avenue…

THEN let’s seek ways to leverage existing denoising algorithms in order to solve OTHER (INVERSE) PROBLEMS

Page 13:

Inverse Problems?

Given the measurement y, recover its origin signal x. A statistical tie between x and y is given.

Examples: image deblurring, inpainting, tomographic reconstruction, super-resolution, …

Classic inverse problem:

y = Hx + v, where v ~ N(0, σ²I)

The Bayesian approach: use the conditional probability Prob(x|y) in order to estimate x. How? Maximize the conditional probability (MAP), or find the expected x given y (MMSE), …

Page 14:

“The Little Engine that Could”?

As we are about to see, image denoisers (we refer to these as “engines”) are capable of far more than just denoising… and thus this sub-title.

Page 15:

Prior-Art 1: Laplacian Regularization

Page 16:

Pseudo-Linear Denoising

Oftentimes we can describe our denoiser as a pseudo-linear filter:

x̂ = f(x) = W(x)·x

This is true for K-SVD, EPLL, NLM, BM3D and other algorithms, where the overall processing is divided into a non-linear stage of decisions, followed by a linear filtering.

Typical properties of W:
- Row stochastic: 1 is an eigenvector, the spectral radius is 1
- Fast-decaying eigenvalues
- Symmetry or non-negativity of W? Not always…
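To make the W(x)·x structure concrete, here is a minimal NumPy sketch of an NLM-flavored pseudo-linear filter for a 1-D signal. This is an illustration of the form only, not any specific published algorithm; the patch size and bandwidth h are arbitrary choices.

```python
import numpy as np

def nlm_like_weights(x, patch=5, h=0.1):
    """Build a row-stochastic filter matrix W(x) for a 1-D signal x,
    with weights driven by patch similarities (an NLM-flavored sketch)."""
    n = len(x)
    pad = patch // 2
    xp = np.pad(x, pad, mode='reflect')
    # Row i holds the patch centered at sample i.
    P = np.stack([xp[i:i + patch] for i in range(n)])
    # Pairwise mean-squared patch distances -> Gaussian affinities.
    D = ((P[:, None, :] - P[None, :, :]) ** 2).mean(axis=2)
    W = np.exp(-D / h ** 2)
    W /= W.sum(axis=1, keepdims=True)  # row-stochastic: W @ 1 = 1
    return W

rng = np.random.default_rng(0)
x_clean = np.sin(np.linspace(0, 4 * np.pi, 200))
y = x_clean + 0.2 * rng.standard_normal(200)
W = nlm_like_weights(y)
x_hat = W @ y                            # the filter: f(y) = W(y) y
print(np.allclose(W.sum(axis=1), 1.0))   # True: 1 is an eigenvector of W
```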

Page 17:

The Image Laplacian

Given this form:

f(x) = W(x)·x

the “residual” is

x − f(x) = x − W(x)·x = (I − W(x))·x ≜ L(x)·x

so we may propose an image-adaptive (−)Laplacian: L(x) = I − W(x).

Page 18:

Laplacians as Regularization

E(x) = ℓ(x, y) + λ·xᵀL(x)·x

The first term is the log-likelihood (data fidelity); the regularization is the inner product between the signal and its filter residual:

λ·xᵀL(x)·x = λ·xᵀ(x − W(x)·x)

This idea appeared in a series of papers in several variations: [Elmoataz, Lezoray & Bougleux 2008], [Szlam, Maggioni & Coifman 2008], [Peyre, Bougleux & Cohen 2011], [Milanfar 2013], [Kheradmand & Milanfar 2014], [Liu, Zhai, Zhao, Zhai & Gao 2014], [Haque, Pai & Govindu 2014], [Romano & Elad 2015]
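Continuing the numeric sketch from Page 16 (reusing the y and W built there, so this fragment is not self-contained), the image-adaptive Laplacian and the value of the regularizer come out in two lines:

```python
# Reusing y and W from the Page 16 sketch.
L = np.eye(len(y)) - W      # image-adaptive Laplacian L = I - W(y)
rho = y @ (L @ y)           # regularizer x^T L x = x^T (x - W x)
print(rho)                  # small when the signal agrees with its own filtering
```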

Page 19:

Laplacians as Regularization

E(x) = ℓ(x, y) + λ·xᵀ(x − W(x)·x)

The problems with this line of work are that:

1. The regularization term is hard to work with, since L/W is a function of x. This is circumvented by cheating: assuming a fixed W per each iteration
2. If so, what is really the underlying energy that is being minimized?
3. When the denoiser does not admit a pseudo-linear interpretation W(x)·x, this term cannot be used

Page 20:

Prior-Art 2: The Plug-and-Play-Prior (P3) Scheme

Page 21:

The P3 Scheme

The idea of using a denoiser to solve general inverse problems was proposed under the name “Plug-and-Play-Priors” (P3) [Venkatakrishnan, Wohlberg & Bouman, 2013].

Main idea: use ADMM to minimize the MAP energy

x̂_MAP = argminₓ ℓ(x, y) + λ·ρ(x)

→ min_{x,v} ℓ(x, y) + λ·ρ(v)  s.t.  x = v

Page 22:

The P3 Scheme

min_{x,v} ℓ(x, y) + λ·ρ(v)  s.t.  x = v

⇓

min_{x,v} ℓ(x, y) + λ·ρ(v) + (β/2)·‖x − v + u‖₂²

The above relies on a well-known concept in optimization called the augmented Lagrangian algorithm, where u is the scaled Lagrange multiplier vector.

Page 23:

The P3 Scheme

min_{x,v} ℓ(x, y) + λ·ρ(v) + (β/2)·‖x − v + u‖₂²

Minimize the above iteratively, in an alternating-direction fashion, w.r.t. the two unknowns (and update u ← u − v + x):

1. minₓ ℓ(x, y) + (β/2)·‖x − v + u‖₂² : a simple inverse problem
2. min_v λ·ρ(v) + (β/2)·‖x − v + u‖₂² : denoising of (x + u)!

Page 24:

P3 Algorithm

Initialize x, v, u and k ← 0, then repeat:

1. Update x: minₓ ℓ(x, y) + (β/2)·‖x − v + u‖₂²
2. Update v: min_v λ·ρ(v) + (β/2)·‖x − v + u‖₂²
3. Update u: u ← u − v + x

Implicit prior: this is a key feature of P3 – the idea that one can use ANY denoiser as a replacement for the v-update stage, even if ρ(v) is not known.

If the involved terms are convex, this algorithm is guaranteed to converge to the global optimum of the original function.
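As a concrete (toy) rendering of this loop, here is a minimal NumPy sketch for the quadratic data term ℓ(x, y) = ½‖Hx − y‖²; H is a small dense matrix, and β, the iteration count, and the initialization are arbitrary illustrative choices:

```python
import numpy as np

def p3_admm(y, H, denoise, beta=1.0, iters=50):
    """Plug-and-Play sketch for l(x,y) = 0.5||Hx - y||^2: the prior enters
    only through the black-box 'denoise' routine (rho is never written down)."""
    m, n = H.shape
    x = H.T @ y                        # crude initialization
    v = x.copy()
    u = np.zeros(n)
    A = H.T @ H + beta * np.eye(n)     # x-update: (H^T H + beta I) x = ...
    for _ in range(iters):
        x = np.linalg.solve(A, H.T @ y + beta * (v - u))  # simple inverse problem
        v = denoise(x + u)             # ANY denoiser plugs in here
        u = u + x - v                  # scaled multiplier update
    return x
```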


Page 26:

P3 Shortcomings

The P3 scheme is an excellent idea, but it has a few troubling shortcomings:
- Parameter tuning is TOUGH when using a general denoiser
- The method is tightly tied to ADMM, without an option for changing this scheme
- CONVERGENCE? Unclear (steady-state at best)
- For an arbitrary denoiser, there is no underlying & consistent COST FUNCTION

In this work we propose an alternative which is closely related to the above ideas (both Laplacian regularization and P3) and which overcomes the mentioned problems: RED.

Page 27:

RED: First Steps

Page 28:

Regularization by Denoising [RED] [Romano, Elad & Milanfar, 2016]

We suggest, starting from the pseudo-linear form:

ρ(x) = ½·xᵀ(x − W(x)·x)

Note that ρ(x) ≥ 0, and ρ(x) = 0 when:
1. x = 0
2. x = W(x)·x
3. Orthogonality: x ⊥ (x − W(x)·x)

Page 29:

Regularization by Denoising [RED] [Romano, Elad & Milanfar, 2016]

We suggest:

ρ(x) = ½·xᵀ(x − f(x))

… for an arbitrary denoiser f(x)

Note that ρ(x) ≥ 0, and ρ(x) = 0 when:
1. x = 0
2. x = f(x)
3. Orthogonality: x ⊥ (x − f(x))

Page 30:

Which f(x) to Use?

Almost any algorithm you want may be used here, from the simplest median filter (see later) all the way to state-of-the-art CNN-like methods.

We shall require f(x) to satisfy several properties, as follows…

Page 31:

Denoising Filter Property I

Differentiability:

f(x): [0,1]ⁿ → [0,1]ⁿ

Some filters obey this requirement (NLM, Bilateral, Kernel Regression, TNRD). Others can be slightly modified to satisfy it (Median, K-SVD, BM3D, EPLL, CNN, …).

Page 32:

Denoising Filter Property II

Local Homogeneity: for |c − 1| ≪ 1, we have that

f(c·x) = c·f(x)

(scaling the input by c and then filtering equals filtering and then scaling by c)

Page 33:

Are Denoisers Homogeneous?

f((1 + ε)·x) ≈ (1 + ε)·f(x)

[Scatter-plot of the two sides of this equation, along with the STD from the equality; test image: Peppers, ε = 0.01, σ = 5]
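A quick numerical check of local homogeneity, using a median filter as a stand-in denoiser (order statistics commute with positive scaling, so here the identity holds essentially exactly); the same experiment also previews the directional-derivative identity of the next slides:

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(1)
x = rng.random((64, 64))
f = lambda img: median_filter(img, size=3)   # stand-in denoiser
eps = 0.01

# Local homogeneity: f((1+eps) x) ~ (1+eps) f(x)
print(np.abs(f((1 + eps) * x) - (1 + eps) * f(x)).max())

# Consequence (next slides): the directional derivative along x equals f(x),
# i.e., (f(x + eps*x) - f(x)) / eps ~ grad_f(x) x = f(x)
jvp = (f((1 + eps) * x) - f(x)) / eps
print(np.abs(jvp - f(x)).max())
```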

Page 34:

Implication (1)

Directional derivative:

∇ₓf(x)·d = lim_{ε→0} [f(x + ε·d) − f(x)] / ε

Choose d = x and invoke homogeneity:

∇ₓf(x)·x = lim_{ε→0} [f(x + ε·x) − f(x)] / ε = lim_{ε→0} [(1 + ε)·f(x) − f(x)] / ε = f(x)

Page 35:

Looks Familiar?

We got the property

∇ₓf(x)·x = f(x)

where ∇ₓf(x) is an n×n matrix. This is much more general than f(x) = W(x)·x, and applies to any denoiser satisfying the above conditions.

Page 36:

Implication (2)

For small h, by the directional derivative:

f(x + h) ≈ f(x) + ∇ₓf(x)·h = ∇ₓf(x)·x + ∇ₓf(x)·h = ∇ₓf(x)·(x + h) + O(‖h‖²)

Implication: filter stability. Small additive perturbations of the input don’t change the filter matrix.

Page 37:

Denoising Filter Property III

Passivity, via the spectral radius:

r(∇ₓf(x)) = max_k |λ_k(∇ₓf(x))| ≤ 1

so that

‖f(x)‖ = ‖∇ₓf(x)·x‖ ≤ r(∇ₓf(x))·‖x‖ ≤ ‖x‖

(the filter does not amplify its input)

Page 38:

Are Denoisers Passive?

A direct inspection of this property implies computing the Jacobian of f(x) and evaluating its spectral radius r by the Power Method:

h_{k+1} = ∇ₓf(x)·h_k / ‖∇ₓf(x)·h_k‖

After enough iterations, r ≈ ‖∇ₓf(x)·h_k‖ / ‖h_k‖.

The problem: getting the Jacobian.

Page 39:

Are Denoisers Passive?

The alternative is based on the Taylor expansion:

f(x + h) ≈ f(x) + ∇ₓf(x)·h  ⟹  ∇ₓf(x)·h ≈ f(x + h) − f(x)

The Power Method becomes:

h_{k+1} = [f(x + h_k) − f(x)] / ‖f(x + h_k) − f(x)‖

This leads to a value smaller than 1 for K-SVD, BM3D, NLM, EPLL, and TNRD.
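A sketch of this Jacobian-free power method; the denoiser is again an arbitrary black box, and the perturbation scale delta and iteration count are ad-hoc choices (the returned norm-growth factor approximates the spectral radius only after enough iterations):

```python
import numpy as np

def passivity_estimate(f, x, iters=50, delta=1e-3, seed=0):
    """Jacobian-free power method: directional differences of the denoiser
    stand in for Jacobian-vector products, grad_f(x) h ~ (f(x+d*h)-f(x))/d.
    Passivity asks for a returned value <= 1."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(x.shape)
    h /= np.linalg.norm(h)
    fx = f(x)
    r = 0.0
    for _ in range(iters):
        g = (f(x + delta * h) - fx) / delta  # ~ grad_f(x) h
        r = np.linalg.norm(g)                # growth along current direction
        h = g / r                            # renormalize and iterate
    return r
```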

Page 40:

Summary of Properties

The three properties that f(x) should satisfy:

- Differentiability: ∇ₓf(x) exists
- Homogeneity: f(c·x) = c·f(x)  ⟹  directional derivative: ∇ₓf(x)·x = f(x)
- Passivity: r(∇ₓf(x)) ≤ 1  ⟹  filter stability: f(x + h) ≈ ∇ₓf(x)·(x + h)

Why are these needed?

Page 41:

RED: Advancing

Page 42:

Regularization by Denoising (RED)

ρ(x) = ½·xᵀ(x − f(x))

Surprisingly, this expression is differentiable with a simple gradient*:

∇ₓρ(x) = ½·(x − f(x)) + ½·(x − ∇ₓf(x)ᵀ·x) = x − f(x)

where the last step relies on homogeneity (∇ₓf(x)·x = f(x)) and a symmetric Jacobian. The gradient of the regularizer is simply the denoising residual!

* Why not ρ(x) = ½·‖x − f(x)‖₂²? Its gradient would involve the Jacobian ∇ₓf(x) explicitly.

Page 43:

Regularization by Denoising (RED)

ρ(x) = ½·xᵀ(x − f(x))

∇ₓρ(x) = x − f(x)   (relying on the differentiability)

∇ₓ²ρ(x) = I − ∇ₓf(x) ⪰ 0, since r(∇ₓf(x)) ≤ 1

Passivity guarantees positive semi-definiteness of the Hessian, and hence convexity.

Page 44:

RED for Inverse Problems

minₓ ℓ(x, y) + (λ/2)·xᵀ(x − f(x))
[Log-Likelihood / Data Fidelity + Regularization]

Our regularization term is convex, and thus the whole expression is convex if the log-likelihood term is convex as well.

Page 45:

RED for Linear Inverse Problems

minₓ ½·‖Hx − y‖₂² + (λ/2)·xᵀ(x − f(x))
[L2-based Data Fidelity + Regularization]

This energy function is convex. Any reasonable optimization algorithm will get to the global minimum if applied correctly.

Page 46:

Numerical Approach I: Gradient Descent

minₓ ½·‖Hx − y‖₂² + (λ/2)·xᵀ(x − f(x))

x_{k+1} = x_k − μ·[Hᵀ(H·x_k − y) + λ·(x_k − f(x_k))]

Guaranteed to converge for 0 < μ < 1, when H is stable.

Drawback: every iteration applies the denoiser f(·), yet only mildly “inverts” the blur effect.
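A sketch of this gradient-descent scheme for a deblurring-style problem, with a median filter standing in for f(·) and a symmetric blur kernel so that Hᵀ can be applied as H itself; the step size, λ, and iteration count are arbitrary illustrative choices:

```python
import numpy as np
from scipy.ndimage import convolve, median_filter

def red_gradient_descent(y, Hop, f, lam=0.2, mu=0.2, iters=100):
    """Steepest descent on E(x) = 0.5||Hx - y||^2 + (lam/2) x^T (x - f(x)).
    Under RED's conditions, grad E(x) = H^T (Hx - y) + lam (x - f(x)).
    Hop is assumed symmetric here, so H^T is applied as Hop again."""
    x = y.astype(float).copy()
    for _ in range(iters):
        grad = Hop(Hop(x) - y) + lam * (x - f(x))
        x = x - mu * grad
    return x

# Illustrative operators: a uniform 9x9 blur (symmetric) and a median filter
# standing in for the denoising engine f.
blur = lambda img: convolve(img, np.full((9, 9), 1.0 / 81), mode='reflect')
f = lambda img: median_filter(img, size=3)
# usage: x_hat = red_gradient_descent(y_blurred_noisy, blur, f)
```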

Page 47:

Numerical Approach I: Gradient Descent

[Block diagram of the iteration: x_{k+1} = M·x_k + μλ·f(x_k) + b, i.e., a fixed linear map M = (1 − μλ)·I − μ·HᵀH applied to x_k, plus the denoiser output f(x_k) and a constant b = μ·Hᵀy, fed back in a loop.]

Page 48:

Numerical Approach II: ADMM

minₓ ½·‖Hx − y‖₂² + (λ/2)·xᵀ(x − f(x))

First: apply variable splitting, setting x = v in the second term:

min_{x,v} ½·‖Hx − y‖₂² + (λ/2)·vᵀ(v − f(v))  s.t.  x = v

Second: use the augmented Lagrangian:

min_{x,v} ½·‖Hx − y‖₂² + (λ/2)·vᵀ(v − f(v)) + (β/2)·‖x − v + u‖₂²

Page 49:

Numerical Approach II: ADMM

Third: optimize w.r.t. x and v alternatingly.

Update x: minₓ ½·‖Hx − y‖₂² + (β/2)·‖x − v + u‖₂²
→ solve a simple linear system.

Update v: min_v (λ/2)·vᵀ(v − f(v)) + (β/2)·‖x − v + u‖₂²
→ zeroing the gradient gives λ·(v − f(v)) − β·(x − v + u) = 0, solved by the fixed-point iteration

λ·(v_{j+1} − f(v_j)) − β·(x − v_{j+1} + u) = 0  ⟹  v_{j+1} = [λ·f(v_j) + β·(x + u)] / (λ + β)

The P3 differs from this ADMM in this stage, offering to compute v = f(x + u).
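The v-step as a few fixed-point iterations, sketched with the same black-box denoiser convention on NumPy arrays (the inner iteration count is an arbitrary choice; P3 would instead simply return f(x + u)):

```python
def red_v_update(z, f, lam=0.2, beta=1.0, inner=3):
    """RED's v-step: solve lam*(v - f(v)) - beta*(z - v) = 0 with z = x + u,
    via the fixed point v <- (lam * f(v) + beta * z) / (lam + beta)."""
    v = z.copy()
    for _ in range(inner):
        v = (lam * f(v) + beta * z) / (lam + beta)
    return v
```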


Page 51:

Numerical Approach III: Fixed Point

minₓ ½·‖Hx − y‖₂² + (λ/2)·xᵀ(x − f(x))

Zeroing the gradient:

Hᵀ(Hx − y) + λ·(x − f(x)) = 0

Freeze f at the previous iterate:

Hᵀ(H·x_{k+1} − y) + λ·(x_{k+1} − f(x_k)) = 0

⟹  x_{k+1} = (HᵀH + λI)⁻¹·(Hᵀy + λ·f(x_k))

Guaranteed to converge due to the passivity of f(x).

Page 52:

Numerical Approach III: Fixed Point

x_{k+1} = (HᵀH + λI)⁻¹·Hᵀy + λ·(HᵀH + λI)⁻¹·f(x_k) = b + M·f(x_k)

[Block diagram: the denoiser output f(x_k) passes through the fixed linear map M = λ·(HᵀH + λI)⁻¹, the constant b = (HᵀH + λI)⁻¹·Hᵀy is added, and x_{k+1} is fed back.]
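The same recursion in NumPy, with H as a small dense matrix purely for illustration (in practice the inner solve would use a fast operator-aware method); λ and the iteration count are arbitrary choices:

```python
import numpy as np

def red_fixed_point(y, H, f, lam=0.2, iters=50):
    """x_{k+1} = (H^T H + lam I)^{-1} (H^T y + lam f(x_k)) = b + M f(x_k)."""
    n = H.shape[1]
    A = H.T @ H + lam * np.eye(n)
    Hty = H.T @ y
    x = Hty.copy()
    for _ in range(iters):
        x = np.linalg.solve(A, Hty + lam * f(x))  # one linear solve per step
    return x
```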

Page 53:

A Connection to CNN

z_{k+1} = M·f(z_k) + b

[Diagram: the recursion unrolled as a feed-forward chain of f(·) and M stages, resembling a deep network.]

While CNNs use a trivial and weak non-linearity f(●), we propose a very aggressive and image-aware denoiser.

Our scheme is guaranteed to minimize a clear and relevant objective function.


Page 55:

RED Underlying Model

RED assumes the following prior:

Prob(x) = C·exp{−c·xᵀ(x − f(x))}

Theorem: with respect to the above prior, f(x) is the MMSE estimator.

Conjecture: with respect to this prior, minimizing

minₓ ½·‖Hx − y‖₂² + (λ/2)·xᵀ(x − f(x))

gives the optimal MMSE solution.

Page 56:

So, Again, Which f(x) to Use?

Almost any algorithm you want may be used here, from the simplest median filter (see later) all the way to state-of-the-art CNN-like methods.

Comment: our approach has one hidden parameter – the level of the noise (σ) that the denoiser targets. We simply fix this parameter for now, but more work is required to investigate its effect.

Page 57:

RED and P3 Equivalence

Page 58:

RED vs. P3?

ADMM can be used for both P3 and RED, leading to two similar (yet different) algorithms.

These algorithms differ in the update stage of v (with a slight abuse of notation, assign y := x + u):
P3: v = f(y)
RED: v is the solution of λ·(v − f(v)) − β·(y − v) = 0

Question: under which conditions on f(x) (and λ, β) would the two be the same?

Page 59:

RED for Denoising?

Assume f(x) = Wx in these equations, in order to simplify them:

P3: v = f(y) = Wy
RED: λ·(v − Wv) − β·(y − v) = 0

Thus, to get the same outcome we require both v = Wy and λ·(v − Wv) − β·(y − v) = 0, i.e.

λ·(Wy − W²y) − β·(y − Wy) = 0

Page 60:

RED for Denoising?

We got the equation

λ·(Wy − W²y) − β·(y − Wy) = 0  ⟺  (I − W)·(λ·W − β·I)·y = 0

Answer: the two algorithms coincide only if the eigenvalues of W are either 1 or β/λ, i.e., only when the denoiser is highly degenerate.

Page 61:

Denoising via RED

Page 62:

RED for Denoising?

minₓ ½·‖x − y‖₂² + (λ/2)·xᵀ(x − f(x))

Zeroing the gradient: (x − y) + λ·(x − f(x)) = 0. Plugging in f(x) = Wx:

(x − y) + λ·(I − W)·x = 0  ⟹  x = (I + λ·(I − W))⁻¹·y

This is a “sharper” version of the original denoiser, known to modify the trade-off between bias and variance [Elmoataz et al. ’08], [Bougleux et al. ’09].
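A toy comparison of this closed-form RED denoiser against the plain filter Wy, using a random row-stochastic W purely as a stand-in (λ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)      # row-stochastic stand-in filter
y = rng.random(n)
lam = 0.5

x_red = np.linalg.solve(np.eye(n) + lam * (np.eye(n) - W), y)  # RED output
x_base = W @ y                                                  # plain denoiser
print(np.linalg.norm(x_red - x_base))  # generically nonzero: RED != W y
```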

Page 63:

RED for Denoising?

Question: could we claim this just gives Wy? (i.e., that RED is equivalent to the denoiser it is built upon?)

Answer: let’s plug x = Wy into the equation (x − y) + λ·(I − W)·x = 0:

(Wy − y) + λ·(I − W)·Wy = 0  ⟺  (I − W)·(λ·W − I)·y = 0

The two filters are equivalent only if the eigenvalues of W are either 1 or 1/λ.

Surprisingly, this condition is similar (equivalent for β = 1) to the one leading to the equivalence of RED and P3.

Page 64:

RED in Practice

Page 65:

Examples: Deblurring

Uniform 9×9 blur kernel and AWGN with σ² = 2

Page 66:

Examples: Convergence Comparison

Page 67:

Examples: Deblurring

Uniform 9×9 blur kernel and AWGN with σ² = 2

Page 68:

Examples: 3× Super-Resolution

Degradation:
- A Gaussian 7×7 blur with width 1.6
- A 3:1 down-sampling
- AWGN with σ = 5

Page 69:

Examples: 3× Super-Resolution

Degradation:
- A Gaussian 7×7 blur with width 1.6
- A 3:1 down-sampling
- AWGN with σ = 5

Page 70:

Sensitivity to Parameters

Page 71:

Sensitivity to Parameters

Page 72:

Conclusions

Page 73:

What have we Seen Today?

RED – a method that takes a denoiser and uses it sequentially for solving inverse problems.

Main benefits: a clear objective being minimized, convexity, and the flexibility to use almost any denoiser and any optimization scheme.

One could refer to RED as a way to substantiate earlier methods (Laplacian regularization and the P3) and fix them.

Challenges: a trainable version? Compression? The MMSE conjecture?

Page 74:

Relevant Reading

1. “Regularization by Denoising”, Y. Romano, M. Elad, and P. Milanfar, to appear, SIAM J. on Imaging Sciences.

2. “A Tour of Modern Image Filtering”, P. Milanfar, IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 106–128, Jan. 2013.

3. “A General Framework for Regularized, Similarity-Based Image Restoration”, A. Kheradmand and P. Milanfar, IEEE Trans. on Image Processing, vol. 23, no. 12, Dec. 2014.

4. “Boosting of Image Denoising Algorithms”, Y. Romano and M. Elad, SIAM J. on Imaging Sciences, vol. 8, no. 2, 2015.

5. “How to SAIF-ly Boost Denoising Performance”, H. Talebi, X. Zhu, and P. Milanfar, IEEE Trans. on Image Processing, vol. 22, no. 4, 2013.

6. “Plug-and-Play Priors for Model-Based Reconstruction”, S.V. Venkatakrishnan, C.A. Bouman, and B. Wohlberg, GlobalSIP, 2013.

7. “BM3D-AMP: A New Image Recovery Algorithm Based on BM3D Denoising”, C.A. Metzler, A. Maleki, and R.G. Baraniuk, ICIP, 2015.

8. “Is Denoising Dead?”, P. Chatterjee and P. Milanfar, IEEE Trans. on Image Processing, vol. 19, no. 4, 2010.

9. “Symmetrizing Smoothing Filters”, P. Milanfar, SIAM J. on Imaging Sciences, vol. 6, no. 1, pp. 263–284, 2013.

10. “What Regularized Auto-Encoders Learn from the Data-Generating Distribution”, G. Alain and Y. Bengio, JMLR, vol. 15, 2014.

11. “Representation Learning: A Review and New Perspectives”, Y. Bengio, A. Courville, and P. Vincent, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, Aug. 2013.