
John Cunningham and David Knowles

Machine Learning RCC, 08 December 2011

Approximate Inference

Outline

• Motivation
• Taxonomy
• Summations
• Estimators
• Easier Integrals
• Summary

Probabilistic Inference

• Bayes Rule:
• (frequentist/statistical inference)
• (Bayesian/non-Bayesian distinction)
• (conjugate models)
• (enumerable simple cases)
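For reference, Bayes' rule in a standard form; the slide's own equation is not in the transcript, and the notation (z for latent variables, D for data) is assumed here:

p(z \mid D) = \frac{p(D \mid z)\, p(z)}{p(D)}, \qquad p(D) = \int p(D \mid z)\, p(z)\, dz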

Just an Integral

• Many (most?) problems of interest in inference can be written as an integral of the type sketched below.
• Examples:
  • Posterior mean and moments
  • Data likelihood and model selection
  • Prediction

Central Object of Interest
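A minimal sketch of that central object and of the examples listed above, again under the assumed z/D notation (the slides' own equations did not survive extraction):

E_{p(z \mid D)}\big[f(z)\big] = \int f(z)\, p(z \mid D)\, dz

• posterior mean and moments: f(z) = z, \; f(z) = z z^\top, \ldots
• data likelihood / model selection: p(D) = \int p(D \mid z)\, p(z)\, dz
• prediction: p(x^\ast \mid D) = \int p(x^\ast \mid z)\, p(z \mid D)\, dz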

Central Object of Focus

• Why not...
  • message passing on the factor graph?
    • explains {BP, VB, EP, Gibbs, etc.} nicely
    • abstracts approximate inference to message calculation
    • mechanistic, not actually the problem we are trying to solve
  • the posterior?
    • pretty much the same thing, but again not often the core problem

Fool's Errand

• A huge field
  • Bishop, PRML: ~100 pages
  • MacKay, Information Theory, Inference, ...: ~180 pages
  • Murphy, ML: A Probabilistic Perspective: ~110 pages
  • MLSS: ~half a day
• Scope of this talk:
  • tutorial view of the field
  • incorporate by reference where possible
  • details where (hopefully) valuable

Outline: Taxonomy

Approximate Inference Taxonomy

• "Replace hard integrals with summations" (random methods)
  • Sampling methods
  • Central problem: how to sample
  • Monte Carlo, MCMC, Gibbs, etc.
• "Replace hard integrals with easier integrals" (deterministic methods)
  • Message passing on the factor graph
  • Central problem: how to find the approximating distribution
  • VB, EP, etc.
  • Note (cheat): also "replace hard sums with easier sums": BP, LBP, etc.
• "Replace hard integrals with estimators" (deterministic methods)
  • "Non-Bayesian" methods
  • Central problem: how to find the point estimate
  • MAP, ML, Laplace, Nested Laplace, etc.
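One way to read the three branches, in terms of the central integral above (a hedged summary of the taxonomy, not taken verbatim from the slides):

\int f(z)\, p(z \mid D)\, dz \;\approx\;
\begin{cases}
\frac{1}{N} \sum_{i=1}^{N} f(z_i), \quad z_i \sim p(z \mid D) & \text{summations (sampling)} \\[4pt]
\int f(z)\, q(z)\, dz, \quad q \approx p(\cdot \mid D) \text{ tractable} & \text{easier integrals (VB, EP)} \\[4pt]
f(\hat z), \quad \hat z \text{ a point estimate (plus a local Gaussian correction for Laplace)} & \text{estimators}
\end{cases}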

Outline: Summations

Approximate Inference Taxonomy

• "Replace hard integrals with summations"
  • Sampling methods
  • Central problem: how to sample
  • Monte Carlo, MCMC, Gibbs, etc.

Summations

• Two basic types: Sampling and MCMC

• "Instead of choosing [points] randomly, then weighting them..., we choose [points] with a probability... and weight them evenly." - Metropolis et al. (1953)

• Sampling:
  • "pick an arbitrary point and weight it by what you care about."
  • MC, importance, rejection. (A minimal sketch follows below.)
• MH/MCMC:
  • "pick a point from what you care about and weight it evenly."
  • MH, MCMC, AIS, Gibbs, HMC, slice sampling, ESS, RML, ... (A toy sketch follows at the end of this section.)
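A minimal sketch of the "Sampling" bullet above: plain Monte Carlo and self-normalised importance sampling for a toy expectation. The target, proposal, and f are illustrative assumptions, not the talk's example.

import numpy as np

rng = np.random.default_rng(0)

# Toy problem: estimate E[f(z)] under p(z) = N(0, 1) with f(z) = z**2 (true value 1).
f = lambda z: z**2

# Plain Monte Carlo: sample from p itself and average evenly.
z = rng.normal(0.0, 1.0, size=100_000)
mc_estimate = f(z).mean()

# Importance sampling: "pick an arbitrary point and weight it by what you care about."
# Points come from a proposal q(z) = N(0, 2^2); each is weighted by p(z)/q(z).
zq = rng.normal(0.0, 2.0, size=100_000)
log_p = -0.5 * zq**2 - 0.5 * np.log(2 * np.pi)
log_q = -0.5 * (zq / 2.0)**2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)
w = np.exp(log_p - log_q)
is_estimate = (w * f(zq)).sum() / w.sum()   # self-normalised importance sampling

print(mc_estimate, is_estimate)             # both should be close to 1.0

The weights only matter when the proposal mismatches the target; rejection sampling instead thins the arbitrary points rather than weighting them.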

Big Topic, Incorporated by Reference

• Iain Murray’s MLSS lectures: http://videolectures.net/mlss09uk_murray_mcmc/
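And a correspondingly minimal sketch of the "MH/MCMC" bullet: a random-walk Metropolis-Hastings chain whose samples come from the target and are weighted evenly. The unnormalised toy target is an illustrative assumption; see Murray's lectures above for the real treatment.

import numpy as np

rng = np.random.default_rng(1)

# Unnormalised log target: a toy bimodal density (mixture of two unit-variance bumps).
def log_target(z):
    return np.logaddexp(-0.5 * (z - 2.0)**2, -0.5 * (z + 2.0)**2)

z = 0.0
log_p = log_target(z)
samples = []
for _ in range(50_000):
    z_prop = z + rng.normal(0.0, 1.0)                 # symmetric random-walk proposal
    log_p_prop = log_target(z_prop)
    if np.log(rng.uniform()) < log_p_prop - log_p:    # MH accept/reject step
        z, log_p = z_prop, log_p_prop
    samples.append(z)                                 # weighted evenly; repeats allowed

samples = np.array(samples[5_000:])                   # drop burn-in
print(samples.mean(), samples.var())                  # moments under the toy target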

Outline: Estimators

Approximate Inference Taxonomy

• "Replace hard integrals with estimators"
  • "Non-Bayesian" methods
  • Central problem: how to find the point estimate
  • MAP, ML, Laplace, Nested Laplace, etc.

• Laplace
• MAP
• Nested Laplace
  • Rue, Martino, and Chopin (2009), "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations", JRSSB.
  • ...but see Cseke and Heskes (2011), "Approximate marginals in latent Gaussian models", JMLR.
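A minimal sketch of the Laplace idea on a toy 1D posterior: find the mode (the MAP estimate), take the curvature of the log density there, and use the resulting Gaussian both as the approximate posterior and to estimate the normaliser. The toy model is an illustrative assumption, not the talk's example.

import numpy as np
from scipy.optimize import minimize_scalar

# Toy unnormalised log posterior: standard normal prior times a logistic-style factor.
def log_post_unnorm(z):
    return -0.5 * z**2 - np.log1p(np.exp(-3.0 * z))

# 1) Find the mode (MAP).
res = minimize_scalar(lambda z: -log_post_unnorm(z))
z_map = res.x

# 2) Curvature at the mode via a finite-difference second derivative.
eps = 1e-4
h = (log_post_unnorm(z_map + eps) - 2 * log_post_unnorm(z_map)
     + log_post_unnorm(z_map - eps)) / eps**2
var_laplace = -1.0 / h          # Gaussian approximation: N(z_map, var_laplace)

# 3) Laplace estimate of the normaliser Z = \int exp(log_post_unnorm(z)) dz.
log_Z = log_post_unnorm(z_map) + 0.5 * np.log(2 * np.pi * var_laplace)

print(z_map, var_laplace, np.exp(log_Z))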

Outline: Easier Integrals

Approximate Inference Taxonomy

• "Replace hard integrals with easier integrals"
  • Message passing on the factor graph
  • Central problem: how to find the approximating distribution
  • VB, EP, etc.
  • Note (cheat): also "replace hard sums with easier sums": BP, LBP, etc.

Message Passing on Factor Graph

Belief Propagation / Sum-product
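For reference, the standard sum-product messages on a factor graph (standard definitions; the slide's own figures are not in the transcript):

\mu_{x \to f}(x) = \prod_{g \in \mathrm{ne}(x) \setminus \{f\}} \mu_{g \to x}(x), \qquad
\mu_{f \to x}(x) = \sum_{\mathbf{x}_f \setminus x} f(\mathbf{x}_f) \prod_{y \in \mathrm{ne}(f) \setminus \{x\}} \mu_{y \to f}(y)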

Approximate messages

Expectation Propagation (EP)

Instead, EP does this...

• A: form the "cavity" (the current approximation with one approximate factor removed).
• B: add in the corresponding true factor and "project" back onto the approximating family by moment matching.
• At convergence, the product of the approximate factors is approximately the true posterior.
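A minimal numerical sketch of that cavity / project / moment-match step, for a toy 1D model with a Gaussian approximating family. The model (standard normal prior times a single hard indicator factor) and all names are illustrative assumptions, not the talk's example.

import numpy as np

def tilted_moments(mean, var, factor):
    """Mean and variance of the 'tilted' distribution N(x; mean, var) * factor(x), by quadrature."""
    x = np.linspace(mean - 10 * np.sqrt(var), mean + 10 * np.sqrt(var), 20001)
    dx = x[1] - x[0]
    w = np.exp(-0.5 * (x - mean)**2 / var) / np.sqrt(2 * np.pi * var) * factor(x)
    Z = w.sum() * dx
    m1 = (x * w).sum() * dx / Z
    m2 = (x**2 * w).sum() * dx / Z
    return m1, m2 - m1**2

# Toy model: prior N(0, 1) times one intractable factor t(x) = 1[x > 0.5].
t_true = lambda x: (x > 0.5).astype(float)

m, v = 0.0, 1.0             # current approximation q(x) = N(m, v), initialised at the prior
r_site, p_site = 0.0, 0.0   # approximate factor in natural parameters (precision*mean, precision)

for _ in range(5):
    # A: form the cavity by dividing the approximate factor out of q.
    p_cav = 1.0 / v - p_site
    m_cav = (m / v - r_site) / p_cav
    # B: multiply in the true factor and project back by moment matching.
    m_new, v_new = tilted_moments(m_cav, 1.0 / p_cav, t_true)
    # Update the approximate factor so that (cavity * site) has the matched moments.
    p_site = 1.0 / v_new - p_cav
    r_site = m_new / v_new - m_cav * p_cav
    m, v = m_new, v_new

print(m, v)   # Gaussian EP approximation to the truncated prior

With many factors, the same step cycles over the sites; the damping mentioned later in the talk is the usual fix when those updates oscillate.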

Beyond Simple EP

Variational Bayes / Variational Message Passing
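For reference, the standard VB objective (a standard definition, not recovered from the slide images): VB maximises a lower bound on the log evidence over a tractable family, e.g. a factorised (mean-field) q,

\log p(D) \;\ge\; \mathcal{L}(q) = \mathbb{E}_{q(z)}\big[\log p(D, z)\big] - \mathbb{E}_{q(z)}\big[\log q(z)\big]
= \log p(D) - \mathrm{KL}\big(q(z) \,\|\, p(z \mid D)\big),

so maximising \mathcal{L}(q) is equivalent to minimising the exclusive KL divergence.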

Summary of Message Passing Perspective

Things to be aware of

• Exclusive (VB, mode-seeking) vs. inclusive (EP) KL, consequences for multimodality (the two forms are given below)
• Damping for EP
• Power EP
• More structured approximations (GBP, tree EP, structured VB)
• Connection to EM
• Infer.NET
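The two divergences being contrasted (standard definitions):

\mathrm{KL}(q \,\|\, p) = \int q(z) \log \frac{q(z)}{p(z \mid D)}\, dz \quad \text{(exclusive; VB; mode-seeking, tends to lock onto one mode)}

\mathrm{KL}(p \,\|\, q) = \int p(z \mid D) \log \frac{p(z \mid D)}{q(z)}\, dz \quad \text{(inclusive; EP's moment matching minimises this locally; mass-covering)}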

Outline: Summary

Approximate Inference Taxonomy (recap)

• "Replace hard integrals with easier integrals": message passing on the factor graph (VB, EP, etc.; also hard sums with easier sums: BP, LBP, etc.)
• "Replace hard integrals with summations": sampling methods (Monte Carlo, MCMC, Gibbs, etc.)
• "Replace hard integrals with estimators": "non-Bayesian" methods (MAP, ML, Laplace, Nested Laplace, etc.)

Summary of Features

• Summations (sampling):
  • exact (eventually)
  • fast/efficient in big-huge cases (at times the only option)
  • poor for model selection
  • slow error convergence
• Easier integrals (message passing):
  • analytically useful
  • fits into many ML schemes (bounds)
  • fast/efficient in small-medium cases
  • no exactness (ignores some features of the true integral)
• Estimators:
  • quick and dirty
  • often works well
  • quick and dirty (local... ignores many features of the true integral)

Conclusion

• Has many names and duplicate fields, but in the end is just numerical integration
• Disappointingly (necessarily?) fractured field
• Inherently problem-specific

Resources

• Books
  • Bishop (2006), "Pattern Recognition and Machine Learning", Chapters 10-11.
  • Murphy (2012), "Machine Learning: A Probabilistic Perspective", Chapters 18-22.
  • Rasmussen and Williams (2006), "Gaussian Processes for Machine Learning", Chapter 3 (for EP and Laplace).
  • MacKay (2003), "Information Theory, Inference, and Learning Algorithms", Part IV.
• Video
  • MLSS 09: Murray (MCMC): http://videolectures.net/mlss09uk_murray_mcmc/
  • MLSS 09: Minka (Min Divergence): http://videolectures.net/mlss09uk_minka_ai/
• Papers
  • Wainwright and Jordan (2008), "Graphical Models, Exponential Families, and Variational Inference", Foundations and Trends in Machine Learning.
  • Winn and Bishop (2005), "Variational Message Passing", JMLR.
  • Minka and Winn (2009), "Gates", NIPS.
  • Hennig (2011), "Approximate Inference in Graphical Models" (PhD thesis), Chapter 2.
  • Kuss and Rasmussen (2005), "Assessing Approximate Inference for Binary Gaussian Process Classification", JMLR.
