

[Factor-graph figure: quadratic potentials f_1, ..., f_L connected to variables x_1, ..., x_N]

Gaussian Sampling by Local Perturbations

George Papandreou and Alan Yuille

University of California, Los Angeles
http://civs.stat.ucla.edu

{gpapan,yuille}@stat.ucla.edu

1. Summary

- Inference (find mean/covariance): amounts to solving a linear system

Algorithm (GMRF sampling as optimization)

1. Perturb the quadratic potential means

2. Find the minimum of the perturbed energy
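A minimal numerical sketch of this two-step recipe, assuming the usual formulation in which the energy is a sum of univariate quadratic potentials (f_j^T x - mu_j)^2 / (2 sigma_j^2); the names F, mu, sigma2 below are illustrative, not notation fixed by the poster:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy GMRF defined by quadratic potentials (f_j^T x - mu_j)^2 / (2 sigma_j^2).
# F stacks the filter vectors f_j^T as rows (here: unary terms + first differences).
N = 5
F = np.vstack([np.eye(N),                      # unary potentials
               np.diff(np.eye(N), axis=0)])    # pairwise smoothness potentials
mu = np.concatenate([np.linspace(0, 1, N), np.zeros(N - 1)])
sigma2 = np.concatenate([np.full(N, 0.5), np.full(N - 1, 0.1)])

# Information parameters of the implied Gaussian p(x) = N(mean, J^{-1}).
J = F.T @ (F / sigma2[:, None])
mean = np.linalg.solve(J, F.T @ (mu / sigma2))

def sample_by_local_perturbation():
    # 1. Perturb each quadratic potential mean independently.
    mu_tilde = mu + np.sqrt(sigma2) * rng.standard_normal(len(mu))
    # 2. Minimize the perturbed energy (a linear solve / least-squares problem).
    return np.linalg.solve(J, F.T @ (mu_tilde / sigma2))

# Sanity check: the sample mean and covariance match the exact GMRF moments.
X = np.array([sample_by_local_perturbation() for _ in range(20000)])
print(np.allclose(X.mean(0), mean, atol=0.05))
print(np.allclose(np.cov(X.T), np.linalg.inv(J), atol=0.05))
```

Because each perturbed mean carries exactly its potential's variance, the minimizer of the perturbed energy is an exact draw from the GMRF, which the two final checks confirm empirically.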

5. Variance estimation

- Posterior variance (diagonal of the posterior covariance matrix): difficult to compute directly

- Quantifies the estimation uncertainty

- Needed in RVM and compressed sensing computations

- Efficient sampling leads to a Monte-Carlo estimate
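A hedged sketch of that Monte-Carlo estimate: draw S exact GMRF samples (by any exact sampler, e.g. the perturbation one above) and use their empirical per-component variance as an estimate of the diagonal of the posterior covariance. The precision J, potential k, and sample count S below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy GMRF in information form: precision J (sparse in practice) and potential k.
N = 4
A = rng.standard_normal((N, N))
J = A @ A.T + N * np.eye(N)      # illustrative positive-definite precision matrix
k = rng.standard_normal(N)

L = np.linalg.cholesky(J)
mean = np.linalg.solve(J, k)

def draw_sample():
    # Exact GMRF sample (information-form Cholesky here; any exact sampler,
    # e.g. the local-perturbation one, plays the same role).
    return mean + np.linalg.solve(L.T, rng.standard_normal(N))

# Monte-Carlo estimate of the posterior variances (diagonal of J^{-1}) from S samples.
S = 500
samples = np.array([draw_sample() for _ in range(S)])
var_mc = samples.var(axis=0, ddof=1)

print(var_mc)                        # noisy estimate; relative error shrinks like sqrt(2/S)
print(np.diag(np.linalg.inv(J)))     # exact diagonal, for comparison
```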

6. Conditionally Gaussian models

- Gaussian models often inadequate for real-world data

- Rich family of conditionally Gaussian models

7. Conditionally Gaussian examples

Acknowledgments: Work supported by grants NSF 0917141 and AFOSR 9550-08-1-0489.

2. Gaussian Markov random fields

- Exact sampling in Gaussian Markov random fields (GMRFs) as mean/mode computation of a locally perturbed GMRF: sampling reduced to optimization

- Efficient Monte-Carlo estimation of Gaussian variance

- Block-Gibbs sampling in Gaussian mixture models: continuous or discrete, local or distributed latent variables

- Signal analysis: Bayesian total variation (TV)

- Image modeling: inpainting, RBMs with continuous units

3. GMRF inference & standard sampling

4. Sampling by local perturbations

- Energy function defined by quadratic potentials


[Model definitions: filter matrix, information parameters, univariate experts]

- Direct methods (special structure and/or medium scale):
  - sparse Cholesky decomposition
  - FFT techniques for stationary GMRFs

- Iterative methods (generic):
  - multigrid, conjugate gradients
  - loopy BP

- Standard Gaussian sampling by Cholesky factorization:
  - covariance form
  - information form (typically sparser)
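In common GMRF notation (assumed here, not taken from the poster: precision/information matrix J, potential vector k, mean m with J m = k), the two Cholesky-based recipes read:

```latex
% Standard Gaussian sampling via a Cholesky factorization, with z ~ N(0, I):
\[
  \text{covariance form:}\qquad \Sigma = L L^{\top}, \qquad x = m + L z \;\sim\; \mathcal{N}(m, \Sigma),
\]
\[
  \text{information form:}\qquad J = L L^{\top}, \qquad x = m + L^{-\top} z \;\sim\; \mathcal{N}(m, J^{-1}),
\]
% where the mean m itself solves the linear system J m = k.
```

The information-form factorization is usually preferred for GMRFs because J inherits the sparsity of the graph, whereas the covariance matrix is generally dense.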

[Diagram: step 1 = sampling from a simpler GMRF; step 2 = perturbed-GMRF mean inference]

- Sampling reduced to optimization

- Efficient inference algorithms used intact for sampling

- Practical whenever perturbation step is simple

Linear Gaussian model:

Univariate experts: replace the potential means with randomly perturbed copies

Linear Gaussian model: replace the measurements and the prior mean with randomly perturbed copies
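The two replacement rules above can be spelled out as follows; this is a hedged reconstruction in my own notation (f_j, mu_j, sigma_j, H, y, x_0 are assumed symbols, not the poster's), consistent with the two-step algorithm in Section 1:

```latex
% Univariate experts: perturb each potential mean, then minimize the perturbed energy.
\[
  \tilde{\mu}_j \sim \mathcal{N}(\mu_j, \sigma_j^2), \qquad
  \tilde{x} = \arg\min_{x} \tfrac{1}{2}\sum_{j} \frac{(f_j^{\top} x - \tilde{\mu}_j)^2}{\sigma_j^2}
  \quad\Longrightarrow\quad
  \tilde{x} \sim \mathcal{N}\big(m,\, J^{-1}\big),
\]
% with J = \sum_j f_j f_j^T / sigma_j^2 and J m = \sum_j f_j mu_j / sigma_j^2.

% Linear Gaussian model y = H x + n, n ~ N(0, Sigma_n), prior x ~ N(x_0, Sigma_0):
% perturb both the measurement y and the prior mean x_0, then solve the MAP problem.
\[
  \tilde{y} \sim \mathcal{N}(y, \Sigma_n), \quad
  \tilde{x}_0 \sim \mathcal{N}(x_0, \Sigma_0), \quad
  \tilde{x} = \arg\min_{x}\;
  \tfrac{1}{2}\|\tilde{y} - H x\|_{\Sigma_n^{-1}}^{2} + \tfrac{1}{2}\|x - \tilde{x}_0\|_{\Sigma_0^{-1}}^{2},
\]
% which yields an exact draw from the Gaussian posterior p(x | y).
```

In both cases the perturbation step is local and cheap; the heavy lifting is the same optimization/linear solve already needed for mean (MAP) inference.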

Posterior mean/sample in image inpainting (0.25 sec, 498x495 pix)

[Figure panels: original image, intensity at edges, posterior mean, posterior sample]

- Estimation error drops as the number of samples S increases

[Figure panels: masked image, posterior mean, standard deviation (S = 20)]

[Two-layer factor-graph figure: latent variables q_1, ..., q_J; potentials f_1, ..., f_L; variables x_1, ..., x_N]

Latent variables layer:

- continuous GSM or discrete GMM

- local or distributed

- Data are Gaussian given the latent variables q

- Block-Gibbs sampling of (x, q) (rapidly mixing), alternating between:

  - GMRF sample of x given q
  - latent-variable sample of q given x

- Summarize the samples into a final estimate: MMSE (sample mean), a single sample, etc.; the plain sample mean can be improved by Rao-Blackwellization (averaging conditional means instead of raw samples)
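A compact sketch of the block-Gibbs scheme on a toy conditionally Gaussian model (my own illustrative model, not necessarily the one on the poster): noisy observations of a piecewise-constant signal whose first differences carry a two-component Gaussian-mixture prior with one discrete latent q per difference. Step 2 uses an information-form Cholesky sample; the local-perturbation sampler would serve the same role on large models.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy conditionally Gaussian model: y = x + noise, and each first difference of x
# has a 2-component Gaussian mixture prior indexed by a discrete latent q_i.
N, sigma_n = 50, 0.3
x_true = np.concatenate([np.zeros(N // 2), np.ones(N - N // 2)])   # piecewise-constant signal
y = x_true + sigma_n * rng.standard_normal(N)

D = np.diff(np.eye(N), axis=0)                 # first-difference operator
pi = np.array([0.9, 0.1])                      # mixture weights: "smooth" vs "edge"
v = np.array([1e-3, 1.0])                      # mixture variances of the differences

def gibbs(num_iters=200, burn_in=50):
    x = y.copy()
    samples = []
    for it in range(num_iters):
        # 1. Sample discrete latents q | x: independent per difference.
        d = D @ x
        logp = np.log(pi) - 0.5 * (d[:, None] ** 2) / v - 0.5 * np.log(v)
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        q = (rng.random(N - 1) < p[:, 1]).astype(int)
        # 2. Sample the GMRF x | q, y exactly (information-form Cholesky here).
        J = np.eye(N) / sigma_n**2 + D.T @ (D / v[q][:, None])
        k = y / sigma_n**2
        L = np.linalg.cholesky(J)
        mean = np.linalg.solve(J, k)
        x = mean + np.linalg.solve(L.T, rng.standard_normal(N))
        if it >= burn_in:
            samples.append(x)
    return np.mean(samples, axis=0)            # posterior-mean (MMSE) summary

x_mmse = gibbs()
print(float(np.abs(x_mmse - x_true).mean()))   # small average error on the toy signal
```

A Rao-Blackwellized summary would average the conditional means `mean` computed in step 2 instead of the raw samples `x`.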

Bayesian TV denoising

TV prior as GSM:
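The GSM form referred to here is presumably the classical scale-mixture representation of the Laplacian potential (Andrews & Mallows); written out, with z a signal difference and s its latent scale, it would read:

```latex
% Laplacian (TV) potential as a Gaussian scale mixture:
% an exponential mixing density on the latent variance s makes |z| Gaussian conditional on s.
\[
  \frac{\lambda}{2}\, e^{-\lambda |z|}
  \;=\; \int_{0}^{\infty} \mathcal{N}\!\left(z;\, 0,\, s\right)\,
        \frac{\lambda^{2}}{2}\, e^{-\lambda^{2} s / 2}\, \mathrm{d}s .
\]
```

Conditioned on the scales s the TV prior is Gaussian, so the block-Gibbs scheme above alternates between a GMRF sample of the signal and independent updates of the scales.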

Learning dependent GRBM receptive fields

ML fitting:

- Estimated by contrastive divergence, using block-Gibbs sampling

- Can also handle full-image models
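A minimal, heavily hedged sketch of one CD-1 update for a Gaussian-visible RBM in which the visible units are resampled jointly as a Gaussian, so that a non-diagonal visible precision (coupled, "dependent" receptive fields) could be plugged in without changing the structure; the energy, conditionals, and learning rule below follow generic GRBM conventions, assumed rather than taken from the poster.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy Gaussian-visible RBM (unit visible variance): energy
#   E(v, h) = ||v - b||^2 / 2 - c^T h - v^T W h
Nv, Nh, lr = 20, 8, 1e-3
W = 0.01 * rng.standard_normal((Nv, Nh))
b = np.zeros(Nv)
c = np.zeros(Nh)
J = np.eye(Nv)             # visible precision; a non-diagonal J (coupled units) would be
L = np.linalg.cholesky(J)  # handled by the same joint Gaussian sampling step below

def cd1_update(v0):
    global W, b, c
    # Positive phase: hidden units given the data.
    p_h0 = sigmoid(c + v0 @ W)
    h0 = (rng.random(Nh) < p_h0).astype(float)
    # Negative phase, block-Gibbs step: sample all visibles jointly as a Gaussian,
    # then recompute the hidden activations.
    mean_v = np.linalg.solve(J, b + W @ h0)
    v1 = mean_v + np.linalg.solve(L.T, rng.standard_normal(Nv))
    p_h1 = sigmoid(c + v1 @ W)
    # CD-1 parameter updates: data statistics minus reconstruction statistics.
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b += lr * (v0 - v1)
    c += lr * (p_h0 - p_h1)

# One pass over a toy dataset of random "images".
data = rng.standard_normal((100, Nv))
for v in data:
    cd1_update(v)
```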