Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing George Papandreou and Alan Yuille Department of Statistics University of California, Los Angeles ICCV Workshop on Information Theory in Computer Vision November 13, 2011, Barcelona, Spain


Page 1: Efficient Variational Inference in Large-Scale Bayesian …gpapan/pubs/confr/Papandreou... · 2011-11-13 · George Papandreou and Alan Yuille Department of Statistics University

Efficient Variational Inference in Large-Scale

Bayesian Compressed Sensing

George Papandreou and Alan Yuille

Department of Statistics, University of California, Los Angeles

ICCV Workshop on Information Theory in Computer Vision

November 13, 2011, Barcelona, Spain


Inverse Image Problems

Denoising Deblurring Inpainting


The Sparse Linear Model

A hidden vector $x \in \mathbb{R}^N$ and noisy measurements $y \in \mathbb{R}^M$.

Sparse linear model

$$P(x;\theta) \propto \prod_{k=1}^{K} t(g_k^T x)$$

$$P(y|x;\theta) = \mathcal{N}(y; Hx, \sigma^2 I)$$

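[Figure: factor graph linking the sparsity factors $g_1, \dots, g_K$ and the measurement factors $h_1, \dots, h_M$ to the variables $x_1, \dots, x_N$]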

◮ Sparsity directions: $s = Gx$, with $G = [g_1^T; \dots; g_K^T]$

◮ Measurement directions: $H = [h_1^T; \dots; h_M^T]$

◮ Sparse potential: $t(s_k)$, e.g., Laplacian $t(s_k) = e^{-\tau_k |s_k|}$

◮ Model parameters: $\theta = (G, H, \sigma^2)$
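As a concrete illustration of the model above, here is a minimal NumPy sketch of a toy 1-D instance. The finite-difference choice for G, the random Gaussian H, the piecewise-constant signal, and all function names are assumptions made for illustration; none of them come from the talk.

```python
import numpy as np

def make_sparse_linear_model(n=64, m=32, sigma=0.05, seed=0):
    """Toy 1-D instance of the sparse linear model (illustrative only).

    Assumes n is divisible by 4 so the piecewise-constant signal has length n.
    """
    rng = np.random.default_rng(seed)
    # Sparsity directions: first-order differences, s = G x (K = n - 1 rows).
    G = np.diff(np.eye(n), axis=0)
    # Measurement directions: random Gaussian rows (an assumption of this sketch).
    H = rng.standard_normal((m, n)) / np.sqrt(m)
    # Piecewise-constant hidden vector, so that s = G x is sparse.
    x = np.repeat(rng.standard_normal(4), n // 4)
    # Noisy measurements y = H x + noise, with noise ~ N(0, sigma^2 I).
    y = H @ x + sigma * rng.standard_normal(m)
    return G, H, x, y

def log_prior_laplacian(x, G, tau=1.0):
    """Unnormalized log P(x) = sum_k log t(g_k^T x) with t(s) = exp(-tau |s|)."""
    return -tau * np.abs(G @ x).sum()
```

With these defaults the hidden vector has four constant segments, so at most three entries of s = Gx are nonzero.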


Deterministic or Probabilistic Modeling?

Deterministic modeling: Standard Compressive Sensing

◮ Find minimum energy configuration

◮ Same as finding the posterior MAP

Probabilistic modeling: Bayesian Compressive Sensing

◮ Try to capture the full posterior distribution

◮ Suitable for learning parameters by maximum likelihood (ML)

◮ Harder than just point estimate


Deterministic Modeling

MAP estimate as an optimization problem

Estimate is $x_{\mathrm{MAP}} = \arg\min_x \phi_{\mathrm{MAP}}(x)$, where

$$\phi_{\mathrm{MAP}}(x) = \sigma^{-2}\|y - Hx\|^2 - 2\sum_{k=1}^{K} \log t(s_k), \qquad s_k = g_k^T x .$$

Properties

◮ Modern optimization techniques allow us to find $x_{\mathrm{MAP}}$ efficiently for large-scale problems.


◮ How much do we trust the solution? What about error bars?

◮ Is the MAP best in terms of PSNR performance?
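For the Laplacian potential $t(s_k) = e^{-\tau_k |s_k|}$ given as an example earlier, $\log t(s_k) = -\tau_k |s_k|$, so the MAP energy reduces to an $\ell_1$-regularized least-squares objective. The substitution below is a worked example for that one choice of potential, not a statement about every sparse prior:

```latex
\phi_{\mathrm{MAP}}(x)
  = \sigma^{-2}\,\|y - Hx\|^2 + 2\sum_{k=1}^{K} \tau_k\,\bigl|g_k^T x\bigr|
```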


Probabilistic Modeling

Work with the full posterior distribution

$$P(x|y) \propto \mathcal{N}(y; Hx, \sigma^2 I) \prod_{k=1}^{K} t(g_k^T x) .$$

[Figure: panels labeled Prior, Measure, and Posterior]

(Figure from Seeger & Wipf, ’10)


Probabilistic Modeling: Markov Chain Monte-Carlo vs. Variational Bayes

Markov Chain Monte-Carlo

◮ Draw samples from the posterior

◮ Typically model the prior with Gaussian mixtures and perform block Gibbs sampling.

◮ Very general, but can be slow, and convergence is difficult to monitor.

◮ [Schmidt, Rao & Roth ’10], [Papandreou & Yuille, ’10], ...

Variational Bayes

◮ Approximate the posterior distribution with a tractable parametric form

◮ Systematic error, but often guaranteed convergence

◮ [Attias, ’99], [Girolami, ’01], [Lewicki & Sejnowski, ’00], [Palmer et al., ’05], [Levin et al., ’11], [Seeger & Nickisch, ’11], ...


Variational Bounding

◮ Approximate the posterior distribution with a Gaussian

$$Q(x|y) \propto \mathcal{N}(y; Hx, \sigma^2 I)\, e^{-\frac{1}{2} s^T \Gamma^{-1} s} = \mathcal{N}(x; x_Q, A^{-1}) ,$$

with $x_Q = A^{-1} b$, $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$, $\Gamma = \mathrm{diag}(\gamma)$, and $b = \sigma^{-2} H^T y$.

◮ Suitable for super-Gaussian priors

$$t(s_k) = \sup_{\gamma_k > 0} e^{-s_k^2/(2\gamma_k) - h_k(\gamma_k)/2}$$

◮ Optimization problem: Find the variational parameters $\gamma$ that give the tightest fit.
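For a fixed $\gamma$, fitting the Gaussian approximation amounts to one linear solve. The sketch below forms A densely, which is only reasonable at toy scale; the function and argument names are hypothetical, and a large-scale implementation would not build A explicitly.

```python
import numpy as np

def gaussian_approximation(H, G, y, gamma, sigma):
    """Return (x_Q, A) for Q(x|y) = N(x; x_Q, A^{-1}) at a fixed gamma > 0."""
    # A = sigma^{-2} H^T H + G^T Gamma^{-1} G, with Gamma = diag(gamma).
    A = H.T @ H / sigma**2 + G.T @ (G / gamma[:, None])
    # b = sigma^{-2} H^T y.
    b = H.T @ y / sigma**2
    # Variational mean x_Q = A^{-1} b (dense solve, toy scale only).
    x_Q = np.linalg.solve(A, b)
    return x_Q, A
```

The mean x_Q is also the minimizer of the quadratic energy $\sigma^{-2}\|y - Hx\|^2 + s^T \Gamma^{-1} s$, since the gradient of that energy is $2(Ax - b)$.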


Variational Bounding: Double-Loop Algorithm

Outer Loop: Variance Computation

Compute $z = \mathrm{diag}(G A^{-1} G^T)$, i.e., the vector of variances $z_k = \mathrm{Var}_Q(s_k|y)$ along the sparsity directions $s_k = g_k^T x$.

Inner Loop: Smoothed Estimation

◮ Obtain the variational mean $x_Q = \arg\min_x \phi_Q(x; z)$, where

$$\phi_Q(x; z) = \sigma^{-2}\|y - Hx\|^2 - 2\sum_{k=1}^{K} \log t\bigl((s_k^2 + z_k)^{1/2}\bigr)$$

◮ Update the variational parameters

$$\gamma_k^{-1} = -2 \left. \frac{d \log t(\sqrt{v})}{dv} \right|_{v = s_k^2 + z_k}$$

Convex if standard MAP is convex. See [Seeger & Nickisch, ’11].
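The two loops above can be sketched for the Laplacian potential $t(s) = e^{-\tau|s|}$, for which $\log t(\sqrt{v}) = -\tau\sqrt{v}$ and the update becomes $\gamma_k^{-1} = \tau/\sqrt{s_k^2 + z_k}$. This is a toy-scale illustration under stated assumptions: the variances are computed exactly with dense linear algebra, and the inner loop uses iteratively reweighted least squares, a solver chosen here for brevity rather than one named in the talk. All function names are hypothetical.

```python
import numpy as np

def phi_Q(x, H, G, y, z, sigma, tau):
    """Smoothed objective phi_Q(x; z) for Laplacian potentials t(s) = exp(-tau |s|)."""
    s = G @ x
    return np.sum((y - H @ x) ** 2) / sigma**2 + 2 * tau * np.sum(np.sqrt(s**2 + z))

def gamma_update(s, z, tau):
    """Laplacian case of the update: gamma_k^{-1} = tau / sqrt(s_k^2 + z_k)."""
    return np.sqrt(s**2 + z) / tau

def reweighted_step(H, G, y, gamma, sigma):
    """Solve A x = b for the current gamma (dense solve, toy scale only)."""
    A = H.T @ H / sigma**2 + G.T @ (G / gamma[:, None])
    return np.linalg.solve(A, H.T @ y / sigma**2), A

def double_loop(H, G, y, sigma=0.1, tau=1.0, n_outer=5, n_inner=20):
    gamma = np.ones(G.shape[0])
    for _ in range(n_outer):
        # Outer loop: z = diag(G A^{-1} G^T), computed exactly in this sketch.
        _, A = reweighted_step(H, G, y, gamma, sigma)
        z = np.einsum("kn,nk->k", G, np.linalg.solve(A, G.T))
        # Inner loop: reweighted least squares on phi_Q(x; z) (assumed solver).
        for _ in range(n_inner):
            x, _ = reweighted_step(H, G, y, gamma, sigma)
            gamma = gamma_update(G @ x, z, tau)
    return x, gamma, z
```

Each reweighted step minimizes a quadratic upper bound on $\phi_Q$ that touches it at the previous iterate, so the smoothed objective cannot increase from one inner iteration to the next.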


Variance Computation

Goal: Estimate elements of $\Sigma = A^{-1}$, where

$$A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$$

◮ Direct inversion is hopeless ($N \approx 10^6$).

◮ Accurate and fast techniques for problems of special structure [Malioutov et al., ’08].

◮ Lanczos iteration (only matrix-vector multiplications (MVM) required) [Schneider & Willsky, ’01], [Seeger & Nickisch, ’11].


◮ This work: Monte-Carlo variance estimation.


Gaussian Sampling by Local Perturbations

[Figure: two copies of the factor graph with factors $g_1, \dots, g_K$ and $h_1, \dots, h_M$ over the variables $x_1, \dots, x_N$]

Gaussian MRF sampling by local noise injection

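1. Local perturbations: $y \sim \mathcal{N}(0, \sigma^2 I)$ and $\beta \sim \mathcal{N}(0, \Gamma^{-1})$

2. Gaussian mode: $A x = \sigma^{-2} H^T y + G^T \beta$

Then $x \sim \mathcal{N}(0, A^{-1})$, where $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$.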

[Papandreou & Yuille, ’10]
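The two steps above can be checked numerically on a small problem. In this sketch a dense solve stands in for whatever linear solver is used at scale, and the function name and argument layout are assumptions made for illustration. The right-hand side has covariance $\sigma^{-2} H^T H + G^T \Gamma^{-1} G = A$, so solving with A yields covariance $A^{-1} A A^{-1} = A^{-1}$.

```python
import numpy as np

def sample_by_perturbation(H, G, gamma, sigma, n_samples, rng):
    """Draw samples x ~ N(0, A^{-1}), A = H^T H / sigma^2 + G^T diag(1/gamma) G.

    Returns (samples, A), with one sample per row of `samples`.
    """
    M, _ = H.shape
    K = G.shape[0]
    A = H.T @ H / sigma**2 + G.T @ (G / gamma[:, None])
    # 1. Local perturbations: y ~ N(0, sigma^2 I), beta ~ N(0, Gamma^{-1}).
    y = sigma * rng.standard_normal((M, n_samples))
    beta = rng.standard_normal((K, n_samples)) / np.sqrt(gamma)[:, None]
    # 2. Gaussian mode: solve A x = H^T y / sigma^2 + G^T beta (dense solve here).
    rhs = H.T @ y / sigma**2 + G.T @ beta
    return np.linalg.solve(A, rhs).T, A
```

Comparing the empirical second moment of the returned samples with `np.linalg.inv(A)` is a quick sanity check at this scale.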


Monte-Carlo Variance Estimation

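Let $x_i \sim \mathcal{N}(0, A^{-1})$, with $i = 1, \dots, N_s$.

General purpose Monte-Carlo variance estimator

$$\hat{\Sigma} = \frac{1}{N_s} \sum_{i=1}^{N_s} x_i x_i^T , \qquad \hat{z}_k = \frac{1}{N_s} \sum_{i=1}^{N_s} s_{k,i}^2 ,$$

where $s_{k,i} = g_k^T x_i$.

Properties

◮ Marginal distribution of estimates: $\hat{z}_k / z_k \sim \frac{1}{N_s} \chi^2(N_s)$.

◮ Unbiased: $E\{\hat{z}_k\} = z_k$.

◮ Relative error is $r = \Delta(\hat{z}_k)/z_k = \sqrt{2/N_s}$.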
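The estimator and its relative error can be illustrated in a few lines. The relative error follows from the variance of a $\chi^2(N_s)$ variable being $2N_s$, so the variance of the ratio of estimate to true value is $2N_s/N_s^2 = 2/N_s$. The function names below are hypothetical, and the samples are assumed to be zero-mean draws stored one per row.

```python
import numpy as np

def mc_variances(samples, G):
    """Monte-Carlo variance estimates (1/Ns) sum_i (g_k^T x_i)^2.

    `samples` has shape (Ns, N) with zero-mean rows x_i; G has shape (K, N).
    """
    s = samples @ G.T  # s[i, k] = g_k^T x_i
    return np.mean(s**2, axis=0)

def relative_error(n_samples):
    """Predicted relative error sqrt(2 / Ns) of each variance estimate."""
    return np.sqrt(2.0 / n_samples)
```

For example, `relative_error(20)` is about 0.32, so 20 samples give each variance to within roughly 32% (one standard deviation), independent of the problem size.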


Monte-Carlo vs. Lanczos Variance Estimates

[Figure: estimated versus exact variances $z_k$, both axes spanning 0 to $8 \times 10^{-3}$; legend: SAMPLE, LANCZOS, EXACT]


Application: Image Deconvolution

[Figure: blurred image ≈ blur kernel ∗ sharp image]

◮ Measurement equation: y ≈ k ∗ x = Hx.

◮ Non-blind deconvolution (known blur kernel k).

◮ Blind deconvolution (unknown blur kernel k).


Blind Image Deconvolution

Blur kernel recovery by Maximum Likelihood

◮ ML objective: $\hat{k} = \arg\max_k P(y; k) = \arg\max_k \int P(y, x; k)\, dx$.

◮ Variational ML: $\hat{k} = \arg\max_k Q(y; k)$

◮ Contrast with argmaxk (maxx P(x, y; k)).

◮ [Fergus et al., ’06], [Levin et al., ’09].


Variational EM for Maximum Likelihood

Find k by maximizing Q(y; k) [Girolami, ’01], [Levin et al., ’11].

E-Step

Given the current kernel estimate $k_t$, do variational Bayesian inference, i.e., fit $Q(x|y; k_t)$.

M-Step

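Maximize w.r.t. $k$ the expected complete log-likelihood $E_{Q(x|y;k_t)}\{\log Q(x, y; k)\}$. Equivalently, minimize w.r.t. $k$

$$E_{Q(x|y;k_t)}\left\{ \tfrac{1}{2}\|y - Hx\|^2 \right\} = \tfrac{1}{2}\,\mathrm{tr}\bigl( (H^T H)(A^{-1} + x_Q x_Q^T) \bigr) - y^T H x_Q + \text{(const)} = \tfrac{1}{2}\, k^T R_{xx} k - r_{xy}^T k + \text{(const)}$$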

Expected moments Rxx estimated by Gaussian sampling.
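One way to read the quadratic form in $k$, assuming the blur acts by convolution so that $Hx = Xk$ with $X$ the convolution matrix built from $x$: the symbol $X$ and the explicit moment definitions below are an interpretation, not something stated on the slide. Ignoring any constraints on the kernel, setting the gradient to zero gives a small linear system:

```latex
R_{xx} = E_{Q(x|y;k_t)}\{X^T X\}, \qquad
r_{xy} = E_{Q(x|y;k_t)}\{X\}^T y, \qquad
\nabla_k\!\left(\tfrac{1}{2}\, k^T R_{xx} k - r_{xy}^T k\right) = R_{xx} k - r_{xy} = 0 .
```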


Summary of Computational Primitives

Smoothed estimation

Obtain the variational mean $x_Q = \arg\min_x \phi_Q(x; z)$, where

$$\phi_Q(x; z) = \sigma^{-2}\|y - Hx\|^2 - 2\sum_{k=1}^{K} \log t\bigl((s_k^2 + z_k)^{1/2}\bigr)$$

◮ Inner loop of variational inference.

Sparse linear system

$Ax = b$, where $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$.

◮ Estimate variances in the outer loop of variational inference and moments $R_{xx}$ in blind image deconvolution.


◮ Solve with preconditioned conjugate gradients.
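A minimal textbook implementation of preconditioned conjugate gradients is sketched below for reference. It only needs a routine that applies A to a vector and, optionally, one that applies the inverse preconditioner. The function signature, tolerance, and iteration cap are assumptions of this sketch, not details from the talk.

```python
import numpy as np

def pcg(apply_A, b, apply_Pinv=None, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for A x = b, A symmetric positive definite.

    apply_A(v) returns A v; apply_Pinv(r) returns P^{-1} r (identity if None).
    Returns (x, number of iterations performed).
    """
    if apply_Pinv is None:
        apply_Pinv = lambda r: r
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_Pinv(r)
    p = z.copy()
    rz = r @ z
    for it in range(max_iter):
        # Stop once the relative residual is below the tolerance.
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = apply_Pinv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter
```

Because A enters only through `apply_A`, the products with H, G, and their transposes can be implemented with fast operators instead of stored matrices.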


Efficient Circulant Preconditioning

Approximate $A = \sigma^{-2} H^T H + G^T \Gamma^{-1} G$ with $P = \sigma^{-2} H^T H + \bar{\gamma}^{-1} G^T G$,

with $\bar{\gamma}^{-1} \triangleq (1/K) \sum_{k=1}^{K} \gamma_k^{-1}$ [Lefkimmiatis et al., ’12].

Properties

◮ Thanks to stationarity of P, DFT techniques apply.

◮ Optimality: $P = \arg\min_{X \in \mathcal{C}} \|X - A\|$
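To illustrate why DFT techniques apply, the sketch below works in 1-D with periodic boundaries, where H and G are circulant and P is diagonalized by the DFT with eigenvalues $\sigma^{-2}|\hat{h}|^2 + \bar{\gamma}^{-1}|\hat{g}|^2$. The 1-D periodic setting, the single filter g (so K = N), and the helper names are assumptions made for this example.

```python
import numpy as np

def circulant(c):
    """Dense circulant matrix with first column c, so circulant(c) @ x is the
    circular convolution of c and x (used here only for checking)."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

def make_preconditioner(h, g, gamma, sigma):
    """Return a function r -> P^{-1} r for P = H^T H / sigma^2 + mean(1/gamma) G^T G,
    where H and G are circular convolutions with the length-n kernels h and g."""
    gamma_bar_inv = np.mean(1.0 / gamma)
    # Eigenvalues of the circulant matrix P under the DFT.
    spectrum = (np.abs(np.fft.fft(h)) ** 2 / sigma**2
                + gamma_bar_inv * np.abs(np.fft.fft(g)) ** 2)
    # Applying P^{-1} is a pointwise division in the Fourier domain.
    return lambda r: np.real(np.fft.ifft(np.fft.fft(r) / spectrum))
```

Each application costs two FFTs, so the preconditioner is cheap compared with a product by A; the spectrum must be nonzero at every frequency, which holds when the blur kernel does not sum to zero and g is a difference filter.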


Effect of Preconditioner

[Figure: curves for CG and PCG; horizontal axis from 0 to 120, logarithmic vertical axis from $10^{-15}$ to $10^{10}$]


Non-Blind Image Deblurring Example

[Figure panels: ground truth; our result (PSNR = 31.93 dB); blurred (PSNR = 22.57 dB); VB stdev]


Blind Image Deblurring Example

[Figure panels: ground truth; our result (PSNR = 27.54 dB); blurred (PSNR = 22.57 dB); kernel]


Summary

Main Points

◮ Variational Bayesian inference using standard optimization primitives.

◮ Scalable to large-scale problems.

◮ Open question: Monte-Carlo or Variational?


Our software is integrated in the glm-ie open source toolbox.

THANK YOU!