
Extending Expectation Propagation for Graphical Models

Yuan (Alan) Qi

Joint work with Tom Minka

Motivation

• Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics.

• Inference techniques on graphical models often sacrifice efficiency for accuracy or sacrifice accuracy for efficiency.

• Need a method that better balances the trade-off between accuracy and efficiency.

Motivation

[Figure: error vs. computational time, contrasting current techniques with the better trade-off we want.]

Outline

• Background on expectation propagation (EP)

• Extending EP on Bayesian networks for dynamic systems
  – Poisson tracking
  – Signal detection for wireless communications

• Tree-structured EP on loopy graphs

• Conclusions and future work


Graphical Models

Directed (Bayesian networks):

$$p(\mathbf{x}, \mathbf{y}) = \prod_i p(x_i \mid \mathrm{pa}(x_i)) \prod_j p(y_j \mid \mathrm{pa}(y_j))$$

Undirected (Markov networks):

$$p(\mathbf{x}, \mathbf{y}) = \frac{1}{Z} \prod_a \psi_a(\mathbf{x}, \mathbf{y})$$

[Figure: small example graphs over x1, x2, y1, y2 for the two cases.]

Inference on Graphical Models

• Bayesian inference techniques:
  – Belief propagation (BP): Kalman filtering/smoothing, the forward-backward algorithm
  – Monte Carlo: particle filters/smoothers, MCMC

• Loopy BP: typically efficient, but not accurate on general loopy graphs

• Monte Carlo: accurate, but often not efficient

Expectation Propagation in a Nutshell

• Approximate a probability distribution by a product of simpler parametric terms:

$$p(\mathbf{x}) = \prod_a f_a(\mathbf{x}), \qquad q(\mathbf{x}) = \prod_a \tilde f_a(\mathbf{x})$$

  For directed graphs: $f_a(\mathbf{x}) = p(x_i \mid \mathbf{x}_{\mathrm{pa}(i)})$
  For undirected graphs: $f_a(\mathbf{x}) = \psi_a(\mathbf{x}_a)$

• Each approximation term $\tilde f_a(\mathbf{x})$ lives in an exponential family (e.g. Gaussian)

EP in a Nutshell

• The approximate term $\tilde f_a(\mathbf{x})$ minimizes the following KL divergence by moment matching:

$$\tilde f_a(\mathbf{x}) = \arg\min_{\tilde f_a(\mathbf{x})} D\!\left( f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \;\Big\|\; \tilde f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \right)$$

where the leave-one-out approximation is

$$q^{\backslash a}(\mathbf{x}) = \frac{q(\mathbf{x})}{\tilde f_a(\mathbf{x})} = \prod_{b \neq a} \tilde f_b(\mathbf{x}).$$
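To make this update concrete, here is a minimal sketch of the generic EP loop for a one-dimensional toy model (my own illustrative example, not from the talk): a Gaussian prior with a few non-Gaussian soft-threshold factors, with the required moments computed numerically by Gauss-Hermite quadrature (anticipating the extensions discussed below). The function names and factor choices are assumptions made purely for illustration.

```python
import numpy as np

def gauss_moments(log_f, mean, var, n=80):
    """Mean/variance of f(x) * N(x; mean, var), via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n)
    x = mean + np.sqrt(var) * nodes
    w = weights * np.exp(log_f(x))
    z = w.sum()
    m = (w * x).sum() / z
    return m, (w * (x - m) ** 2).sum() / z

def ep(log_factors, prior_mean=0.0, prior_var=4.0, n_iter=20):
    """Generic EP loop: divide out a site, moment-match, update the site."""
    k = len(log_factors)
    tau, nu = np.zeros(k), np.zeros(k)             # site natural parameters of the ~f_a
    q_tau, q_nu = 1.0 / prior_var, prior_mean / prior_var
    for _ in range(n_iter):
        for a, log_f in enumerate(log_factors):
            # leave-one-out (cavity) distribution q^{\a} = q / ~f_a
            c_tau, c_nu = q_tau - tau[a], q_nu - nu[a]
            c_mean, c_var = c_nu / c_tau, 1.0 / c_tau
            # project f_a(x) q^{\a}(x) onto a Gaussian by matching moments
            m, v = gauss_moments(log_f, c_mean, c_var)
            # the new site is the matched Gaussian divided by the cavity
            tau[a], nu[a] = 1.0 / v - c_tau, m / v - c_nu
            q_tau, q_nu = c_tau + tau[a], c_nu + nu[a]
    return q_nu / q_tau, 1.0 / q_tau               # posterior mean and variance

# Example: three soft-threshold likelihood factors f_a(x) = sigmoid(4 (x - t_a)).
factors = [lambda x, t=t: -np.logaddexp(0.0, -4.0 * (x - t)) for t in (-1.0, 0.0, 0.5)]
print(ep(factors))
```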

Limitations of Plain EP

• It can be difficult or expensive to analytically compute the moments needed to minimize the desired KL divergence.

• It can be expensive to compute and maintain a valid approximating distribution q(x) that is coherent under marginalization.
  – Tree-structured q(x): $q(\mathbf{x}) = \prod_{(i,j)\in\mathcal{T}} q(x_i, x_j) \Big/ \prod_i q(x_i)^{d_i - 1}$, i.e. pairwise marginals on the tree edges divided by the shared singleton marginals ($d_i$ is the degree of node $i$ in the tree $\mathcal{T}$).
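As a small sanity check of this tree-structured form (my own example, a three-node chain), the pairwise-over-singleton factorization reproduces the joint exactly:

```python
import numpy as np
import itertools

rng = np.random.default_rng(0)
psi12 = rng.uniform(0.5, 2.0, (2, 2))   # pairwise potentials on the chain x1-x2-x3
psi23 = rng.uniform(0.5, 2.0, (2, 2))

# exact joint p(x1, x2, x3) on the chain (a tree)
p = np.einsum('ab,bc->abc', psi12, psi23)
p /= p.sum()

p12, p23, p2 = p.sum(axis=2), p.sum(axis=0), p.sum(axis=(0, 2))
for x1, x2, x3 in itertools.product(range(2), repeat=3):
    # tree-structured factorization: pairwise marginals over the shared singleton
    assert np.isclose(p[x1, x2, x3], p12[x1, x2] * p23[x2, x3] / p2[x2])
```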

Three Extensions

1. Instead of choosing the approximate term $\tilde f_a(\mathbf{x})$ to minimize the KL divergence
$$D\!\left( f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \;\Big\|\; \tilde f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \right),$$
use other criteria.

2. Use numerical approximation to compute moments: quadrature or Monte Carlo.

3. Allow the tree-structured q(x) to be non-coherent during the iterations; it only needs to be coherent in the end.

Efficiency vs. Accuracy

[Figure: error vs. computational time for Monte Carlo, loopy BP (factorized EP), and the hoped-for extended EP.]

Outline

• Background on expectation propagation (EP)

• Extending EP on Bayesian networks for dynamic systems
  – Poisson tracking
  – Signal detection for wireless communications

• Tree-structured EP on loopy graphs

• Conclusions and future work

Object Tracking

Guess the position of an object given noisy observations

[Figure: an object moving through positions x1–x4, each generating a noisy observation y1–y4.]

Bayesian Network

e.g. a random walk: $x_t = x_{t-1} + \nu_t$, with observations $y_t = x_t + \text{noise}$.

We want the distribution of the x's given the y's.

[Figure: the chain-structured Bayesian network x1 → x2 → … → xT with observations y1, …, yT.]

Approximation

$$p(\mathbf{x}, \mathbf{y}) = p(x_1)\, p(y_1 \mid x_1) \prod_{t>1} p(x_t \mid x_{t-1})\, p(y_t \mid x_t)$$

$$q(\mathbf{x}) = \tilde p(x_1)\, \tilde o(x_1) \prod_{t>1} \tilde p(x_t \mid x_{t-1})\, \tilde o(x_t)$$

where each approximate term is factorized and Gaussian in x.

Message Interpretation

$$q(x_t) = \tilde p_t(x_t)\; \tilde o_t(x_t)\; \tilde p_{t+1}(x_t) = (\text{forward msg})\,(\text{observation msg})\,(\text{backward msg})$$

[Figure: node x_t with its observation y_t, receiving the forward, observation, and backward messages.]

EP on Dynamic Systems

• Filtering: t = 1, …, T
  – Incorporate the forward message
  – Initialize the observation message

• Smoothing: t = T, …, 1
  – Incorporate the backward message
  – Compute the leave-one-out approximation by dividing out the old observation message
  – Re-approximate the new observation message

• Re-filtering: t = 1, …, T
  – Incorporate the forward and observation messages

Extensions of EP

• Instead of matching moments, use any method for approximate filtering.
  – Examples: statistical linearization, the unscented Kalman filter (UKF), a mixture of Kalman filters
  – This turns any deterministic filtering method into a smoothing method!
  – All such methods can be interpreted as finding linear/Gaussian approximations to the original terms.

• Use quadrature or Monte Carlo for the term approximations

Example: Poisson Tracking

• $y_t$ is an integer-valued Poisson variate with mean $\exp(x_t)$

Poisson Tracking Model

$$p(x_1) = \mathcal{N}(x_1;\, 0,\, 100)$$

$$p(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\, x_{t-1},\, 0.01)$$

$$p(y_t \mid x_t) = \exp(y_t x_t)\, e^{-\exp(x_t)} / \, y_t!$$

Extension of EP: Approximate Observation Message

• The product $p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$ is not Gaussian
• The moments of $x_t$ are not analytic
• Two approaches:
  – Gauss-Hermite quadrature for the moments
  – Statistical linearization instead of moment matching (this turns the unscented Kalman filter into a smoothing method)
• Both work well
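Putting the pieces together, here is a compact sketch of the whole scheme for the Poisson tracking model (my own implementation outline; the function and variable names are illustrative). Gaussian forward and backward messages are propagated exactly through the random-walk dynamics, and each observation message is re-approximated by Gauss-Hermite moment matching against its leave-one-out (forward × backward) cavity, following the filtering/smoothing/re-filtering schedule above.

```python
import numpy as np

def poisson_ep_smoother(y, q_var=0.01, prior_var=100.0, n_iter=5, n_quad=60):
    """EP smoothing for x_t = x_{t-1} + N(0, q_var), y_t ~ Poisson(exp(x_t))."""
    T = len(y)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
    o_tau, o_nu = np.zeros(T), np.zeros(T)       # observation sites ~o_t (natural params)
    f_m, f_v = np.zeros(T), np.zeros(T)          # forward messages (mean, variance)
    b_tau, b_nu = np.zeros(T), np.zeros(T)       # backward messages (natural params, 0 = flat)

    def match(c_m, c_v, y_t):
        """Mean/variance of Poisson(y_t | exp(x)) * N(x; c_m, c_v) by quadrature."""
        x = c_m + np.sqrt(c_v) * nodes
        logw = np.log(weights) + y_t * x - np.exp(x)
        w = np.exp(logw - logw.max())
        m = (w * x).sum() / w.sum()
        return m, (w * (x - m) ** 2).sum() / w.sum()

    def refresh_obs(t):
        """Leave-one-out cavity (forward x backward), then moment-match a new site."""
        c_tau = 1.0 / f_v[t] + b_tau[t]
        c_nu = f_m[t] / f_v[t] + b_nu[t]
        m, v = match(c_nu / c_tau, 1.0 / c_tau, y[t])
        o_tau[t], o_nu[t] = 1.0 / v - c_tau, m / v - c_nu

    for _ in range(n_iter):
        for t in range(T):                        # filtering / re-filtering pass
            if t == 0:
                f_m[t], f_v[t] = 0.0, prior_var   # p(x_1) = N(0, 100)
            else:                                 # propagate (forward x obs) at t-1 through the dynamics
                tau = 1.0 / f_v[t - 1] + o_tau[t - 1]
                nu = f_m[t - 1] / f_v[t - 1] + o_nu[t - 1]
                f_m[t], f_v[t] = nu / tau, 1.0 / tau + q_var
            refresh_obs(t)
        for t in range(T - 2, -1, -1):            # smoothing pass
            tau = o_tau[t + 1] + b_tau[t + 1]     # (obs x backward) at t+1, through the dynamics
            nu = o_nu[t + 1] + b_nu[t + 1]
            var = 1.0 / tau + q_var
            b_tau[t], b_nu[t] = 1.0 / var, (nu / tau) / var
            refresh_obs(t)                        # divide out the old obs site, re-approximate

    post_tau = 1.0 / f_v + o_tau + b_tau          # q(x_t) = forward * obs * backward
    post_nu = f_m / f_v + o_nu + b_nu
    return post_nu / post_tau, 1.0 / post_tau

# Example: simulate the model above and smooth the simulated counts.
rng = np.random.default_rng(0)
x_true = np.cumsum(rng.normal(0.0, 0.1, 100))     # sqrt(0.01) = 0.1
mean, var = poisson_ep_smoother(rng.poisson(np.exp(x_true)))
```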

Approximate vs. Exact Posterior

[Figure: the approximate and exact posteriors $p(x_T \mid y_{1:T})$ plotted against $x_T$.]

Extended EP vs. Monte Carlo: Accuracy

[Figure: estimated means and variances from extended EP and Monte Carlo.]

Accuracy/Efficiency Tradeoff

EP for Digital Wireless Communication

• Signal detection problem

• Transmitted signal: $s_t = a \sin(\omega t + \phi)$

• $(a, \phi)$ vary to encode each symbol

• Complex representation: $s = a\, e^{i\phi}$

[Figure: the symbol $a e^{i\phi}$ in the complex plane (Re/Im axes).]

Binary Symbols, Gaussian Noise

• Symbols are 1 and –1 (in the complex plane)

• Received signal: $y_t = s_t + \text{noise}$

• Optimal detection is easy

[Figure: the received signal $y_t$ in the complex plane relative to the two symbol points s0 and s1.]

Fading Channel

• The channel systematically changes the amplitude and phase: $y_t = x_t s_t + \text{noise}$

• $x_t$ changes over time

[Figure: received signals $y_{t-1}, y_t$ in the complex plane, with the symbol points rotated and scaled by the time-varying channel coefficients $x_{t-1}, x_t$.]
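A quick simulation sketch of this observation model (my own choice of a Gauss-Markov fading process and noise level, purely illustrative), together with the differential statistic used by the classical benchmark on the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
s = rng.choice([1.0, -1.0], size=T)                       # binary symbols s_t
x = np.empty(T, dtype=complex)                            # slowly varying complex channel x_t
x[0] = 1.0 + 0.0j
for t in range(1, T):
    x[t] = 0.99 * x[t - 1] + 0.1 * (rng.normal() + 1j * rng.normal())
noise = 0.3 * (rng.normal(size=T) + 1j * rng.normal(size=T))
y = x * s + noise                                         # y_t = x_t s_t + noise

# Differential statistic: the previous observation serves as a channel reference.
# sign(Re(y_t * conj(y_{t-1}))) estimates the transition s_t * s_{t-1}; with
# differentially encoded bits this is the classical detector's decision.
trans_hat = np.sign(np.real(y[1:] * np.conj(y[:-1])))
print((trans_hat == s[1:] * s[:-1]).mean())               # fraction of correct transitions
```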

Benchmark: Differential Detection

• Classical technique

• Use previous observation to estimate state

• Binary symbols only

Bayesian network for Signal Detection

[Figure: dynamic Bayesian network with channel states x1, …, xT, transmitted symbols s1, …, sT, and received signals y1, …, yT.]

Extended-EP Joint Signal Detection and Channel Estimation

• Turn a mixture of Kalman filters into a smoothing method

• Smooth over the last L observations (the smoothing window)

• Observations before the window act as a prior for the current estimate

Computational Complexity

• Expectation propagation: O(nLd²)

• Stochastic mixture of Kalman filters: O(LMd²)

• Rao-Blackwellised particle smoothers: O(LMNd²)

n: number of EP iterations (typically 4 or 5)
d: dimension of the parameter vector
L: smoothing window length
M: number of samples in filtering (often larger than 500)
N: number of samples in smoothing (larger than 50)

EP is about 5,000 times faster than Rao-Blackwellised particle smoothers.
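With the typical values quoted above, the claimed speedup is just the ratio of the two complexities:

$$\frac{O(LMNd^2)}{O(nLd^2)} = \frac{MN}{n} \approx \frac{500 \times 50}{5} = 5{,}000.$$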

Experimental Results

EP outperforms particle smoothers (Chen, Wang & Liu, 2000) in efficiency with comparable accuracy.

[Figure: detection performance vs. signal-to-noise ratio for the two methods.]

Bayesian Networks for Adaptive Decoding

[Figure: dynamic Bayesian network with channel states x1, …, xT, information bits e1, …, eT, and received signals y1, …, yT.]

The information bits e_t are coded by a convolutional error-correcting encoder.

EP Outperforms Viterbi Decoding

[Figure: decoding performance vs. signal-to-noise ratio.]

Outline

• Background on expectation propagation (EP)

• Extending EP on Bayesian networks for dynamic systems
  – Poisson tracking
  – Signal detection for wireless communications

• Tree-structured EP on loopy graphs

• Conclusions and future work

Inference on Loopy Graphs

Problem: estimate marginal distributions of the variables indexed by the nodes in a loopy graph, e.g., p(xi), i = 1, . . . , 16.

[Figure: a 4×4 grid-structured Markov network over X1, …, X16.]

4-node Loopy Graph

The joint distribution is a product of pairwise potentials, one for each edge:

$$p(\mathbf{x}) = \prod_a f_a(\mathbf{x})$$

We want to approximate $p(\mathbf{x})$ by a simpler distribution $q(\mathbf{x})$.

[Figure: a loopy graph over the four nodes x1, x2, x3, x4.]

BP vs. TreeEP

[Figure: the original 4-node loopy graph, the fully factorized approximation used by BP, and the tree-structured approximation used by TreeEP.]

Junction Tree Representation

[Figure: p(x) is approximated by a tree-structured q(x), shown with its junction tree representation.]

Two Kinds of Edges

• On-tree edges, e.g., (x1,x4): exactly incorporated into the junction tree

• Off-tree edges, e.g., (x1,x2): approximated by projecting them onto the tree structure

[Figure: the 4-node graph showing its on-tree and off-tree edges.]

KL Minimization

• KL minimization = moment matching

• Match the single and pairwise marginals of the loopy graph (with the off-tree edge being incorporated) and of the tree-structured approximation

• Reduces to exact inference on single loops
  – Use cutset conditioning
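As a toy illustration of cutset conditioning on a single loop (my own 4-node example and random potentials): conditioning on one node turns the loop into a chain, which is summed exactly for each conditioned value and then mixed back together.

```python
import numpy as np
import itertools

rng = np.random.default_rng(1)
# pairwise potentials psi[(i, j)][xi, xj] on the binary loop x0-x1-x3-x2-x0
edges = [(0, 1), (1, 3), (3, 2), (2, 0)]
psi = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in edges}

def joint(x):
    return np.prod([psi[(i, j)][x[i], x[j]] for (i, j) in edges])

# brute-force marginal of node x3
p_brute = np.zeros(2)
for x in itertools.product(range(2), repeat=4):
    p_brute[x[3]] += joint(x)
p_brute /= p_brute.sum()

# cutset conditioning on x0: for each value of x0, the rest is the chain x1 - x3 - x2
p_cut = np.zeros(2)
for x0 in range(2):
    phi1 = psi[(0, 1)][x0, :]              # conditioned edge (x0, x1) -> unary factor on x1
    phi2 = psi[(2, 0)][:, x0]              # conditioned edge (x2, x0) -> unary factor on x2
    m1_to_3 = phi1 @ psi[(1, 3)]           # eliminate x1: sum_x1 phi1(x1) psi(x1, x3)
    m2_to_3 = phi2 @ psi[(3, 2)].T         # eliminate x2: sum_x2 phi2(x2) psi(x3, x2)
    p_cut += m1_to_3 * m2_to_3             # unnormalized p(x3, x0 = x0)
p_cut /= p_cut.sum()
assert np.allclose(p_brute, p_cut)
```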

Matching Marginals on Graph

(1) Incorporate edge (x3, x4)

(2) Incorporate edge (x6, x7)

[Figure: the junction tree with cliques (x1,x2), (x1,x3), (x1,x4), (x3,x5), (x3,x6), (x5,x7); global propagation updates all cliques after each off-tree edge is incorporated.]

Drawbacks of Global Propagation

• Updates all the cliques even when only one off-tree edge is incorporated
  – Computationally expensive

• Stores each off-tree data message as a whole tree
  – Requires a large amount of memory

Solution: Local Propagation

• Allow q(x) to be non-coherent during the iterations; it only needs to be coherent in the end.

• Exploit the junction tree representation: only locally propagate information within the minimal subtree that is directly connected to the off-tree edge.
  – Reduces computational complexity
  – Saves memory

(1) Incorporate edge (x3, x4)

(2) Propagate evidence

(3) Incorporate edge (x6, x7)

[Figure: at each step, only the cliques in the small subtree attached to the off-tree edge are touched.]

On this simple graph, local propagation runs roughly 2 times faster and uses roughly half the memory for storing messages compared with plain (global) EP.

New Interpretation of TreeEP

• Marries EP with the junction tree algorithm

• Can perform efficiently over hypertrees and hypernodes

4-node Graph

Legend: TreeEP = the proposed method; GBP = generalized belief propagation on triangles; TreeVB = variational tree; BP = loopy belief propagation = factorized EP; MF = mean field.

Fully-connected Graphs

• Results are averaged over 10 graphs with randomly generated potentials.

• TreeEP performs the same as or better than all other methods in both accuracy and efficiency!

8×8 grids, 10 trials

Method           FLOPS        Error
Exact            30,000       0
TreeEP           300,000      0.149
BP/double-loop   15,500,000   0.358
GBP              17,500,000   0.003

TreeEP versus BP and GBP

• TreeEP is always more accurate than BP and is often faster

• TreeEP is much more efficient than GBP and more accurate on some problems

• TreeEP converges more often than BP and GBP

Outline

• Background on expectation propagation (EP)

• Extending EP on Bayesian networks for dynamic systems
  – Poisson tracking
  – Signal detection for wireless communications

• Tree-structured EP on loopy graphs

• Conclusions and future work

Conclusions

• Extensions of EP on graphical models:
  – Instead of minimizing KL divergence, use other sensible criteria to generate messages. This effectively turns any deterministic filtering method into a smoothing method.
  – Use quadrature to approximate messages.
  – Use local propagation to save computation and memory in tree-structured EP.

Conclusions

• Extended EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency.

[Figure: error vs. computational time for extended EP and state-of-the-art techniques.]

Future Work

• More extensions of EP:
  – How to choose a sensible approximation family (e.g. which tree structure)?
  – More flexible approximations: mixtures of EP?
  – Error bounds?
  – Bayesian conditional random fields

• More real-world applications

End

Contact information:

yuanqi@media.mit.edu

Extended EP Accuracy Improves Significantly in only a Few Iterations

EP versus BP

• EP approximation is in a restricted family, e.g. Gaussian

• EP approximation does not have to be factorized

• EP applies to many more problems
  – e.g. mixtures of discrete/continuous variables

EP versus Monte Carlo

• Monte Carlo is general but expensive

• EP exploits underlying simplicity of the problem if it exists

• Monte Carlo is still needed for complex problems (e.g. large isolated peaks)

• Trick is to know what problem you have

(Loopy) Belief Propagation

• Specialize to factorized approximations:

$$\tilde f_a(\mathbf{x}) = \prod_i \tilde f_{ai}(x_i) \quad \text{(the "messages")}$$

• Minimize the KL divergence = match the marginals of $f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x})$ (partially factorized) and $\tilde f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x})$ (fully factorized)
  – "send messages"
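Concretely, for discrete variables (where moment matching reduces to matching marginals), dividing the matched marginal by the cavity marginal recovers the familiar sum-product message from factor a to variable i, since the cavity factorizes:

$$\tilde f_{ai}(x_i) \;\propto\; \frac{\sum_{\mathbf{x} \setminus x_i} f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x})}{q^{\backslash a}(x_i)} \;=\; \sum_{\mathbf{x} \setminus x_i} f_a(\mathbf{x}) \prod_{j \neq i} q^{\backslash a}(x_j),$$

which is the standard BP message $m_{a \to i}(x_i)$ when $q^{\backslash a}(x_j)$ is the product of messages into $x_j$ from the other factors.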

Limitation of BP

• If the dynamics or measurements are not linear and Gaussian*, the complexity of the posterior increases with the number of measurements

• I.e. the BP equations are not "closed"
  – Beliefs need not stay within a given family

* or any other exponential family

Approximate Filtering

• Compute a Gaussian belief that approximates the true posterior: $q(x_t) \approx p(x_t \mid y_{1:t})$

• E.g. the extended Kalman filter, statistical linearization, the unscented filter, the assumed-density filter

EP Perspective

• Approximate filtering is equivalent to replacing the true measurement/dynamics equations with linear/Gaussian equations: in the update

$$p(x_t \mid y_{1:t-1}, y_t) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1}),$$

a Gaussian updated belief implies that the measurement term $p(y_t \mid x_t)$ has effectively been replaced by a Gaussian term.

EP Perspective

• The EKF, UKF, and ADF are all algorithms for mapping the nonlinear, non-Gaussian terms $p(y_t \mid x_t)$ and $p(x_t \mid x_{t-1})$ to linear, Gaussian approximations $\tilde p(y_t \mid x_t)$ and $\tilde p(x_t \mid x_{t-1})$.

Terminology

• Filtering: $p(x_t \mid y_{1:t})$

• Smoothing: $p(x_t \mid y_{1:t+L})$ where $L > 0$

• On-line: old data is discarded (fixed memory)

• Off-line: old data is re-used (unbounded memory)

Kalman Filtering / Belief Propagation

• Prediction:
$$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$$

• Measurement:
$$p(x_t \mid y_{1:t-1}, y_t) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$$

• Smoothing:
$$p(x_t \mid y_{1:T}) = p(x_t \mid y_{1:t}) \int \frac{p(x_{t+1} \mid x_t)\, p(x_{t+1} \mid y_{1:T})}{p(x_{t+1} \mid y_{1:t})}\, dx_{t+1}$$
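For reference, a minimal one-dimensional sketch of these three recursions (my own illustrative model: random-walk dynamics with variance q and Gaussian observation noise with variance r; the smoothing step uses the standard RTS form):

```python
import numpy as np

def kalman_smoother(y, q=0.01, r=1.0, m0=0.0, v0=100.0):
    T = len(y)
    fm, fv = np.zeros(T), np.zeros(T)        # filtered means/variances p(x_t | y_1:t)
    pm, pv = np.zeros(T), np.zeros(T)        # predicted means/variances p(x_t | y_1:t-1)
    for t in range(T):
        # Prediction: propagate the previous filtered belief through the dynamics
        pm[t], pv[t] = (m0, v0) if t == 0 else (fm[t - 1], fv[t - 1] + q)
        # Measurement: combine the prediction with the likelihood p(y_t | x_t)
        k = pv[t] / (pv[t] + r)              # Kalman gain
        fm[t] = pm[t] + k * (y[t] - pm[t])
        fv[t] = (1.0 - k) * pv[t]
    sm, sv = fm.copy(), fv.copy()            # smoothed p(x_t | y_1:T)
    for t in range(T - 2, -1, -1):
        # Smoothing: fold the smoothed belief at t+1 back into the filtered belief at t
        g = fv[t] / pv[t + 1]                # RTS gain (dynamics Jacobian = 1)
        sm[t] = fm[t] + g * (sm[t + 1] - pm[t + 1])
        sv[t] = fv[t] + g ** 2 * (sv[t + 1] - pv[t + 1])
    return sm, sv

# Example: recover a smoothed random walk from noisy observations.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0, 0.1, 200))
sm, sv = kalman_smoother(x + rng.normal(0, 1.0, 200))
```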

Approximate an Edge by a Tree

Each potential $f_a$ in $p$ is projected onto the tree structure of $q$; for example, the off-tree edge potential $f_a(x_1, x_2)$ is replaced by a product of pairwise terms on the tree edges divided by singleton terms, such as

$$f_a(x_1, x_2) \;\approx\; \tilde f_a(\mathbf{x}) \;=\; \frac{\tilde f_a(x_1, x_4)\, \tilde f_a(x_2, x_4)\, \tilde f_a(x_3, x_4)}{\tilde f_a(x_4)^{2}}.$$

Correlations between two nodes are not lost, but projected onto the tree.
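A small numerical check of this projection property (my own example: a random joint over four binary variables and a star-shaped tree with hub x4; the tree choice is an assumption for the illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# a random positive joint p(x1, x2, x3, x4) over binary variables, standing in
# for the 4-node loopy model; tree = star with hub x4: edges (1,4), (2,4), (3,4)
p = rng.uniform(0.5, 2.0, size=(2, 2, 2, 2))
p /= p.sum()

p14 = p.sum(axis=(1, 2))        # p(x1, x4)
p24 = p.sum(axis=(0, 2))        # p(x2, x4)
p34 = p.sum(axis=(0, 1))        # p(x3, x4)
p4 = p.sum(axis=(0, 1, 2))      # p(x4)

# tree projection: product of pairwise marginals over the hub marginal squared
q = np.einsum('ad,bd,cd,d->abcd', p14, p24, p34, 1.0 / p4 ** 2)

# the projection keeps every marginal on the tree (single nodes and tree edges) ...
assert np.allclose(q.sum(axis=(1, 2)), p14)
assert np.allclose(q.sum(axis=(0, 2)), p24)
assert np.allclose(q.sum(axis=(0, 1)), p34)
# ... while the off-tree pairwise marginal p(x1, x2) is only approximated
print(np.abs(q.sum(axis=(2, 3)) - p.sum(axis=(2, 3))).max())
```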

Graphical Models

                               Directed                          Undirected
Generative                     Bayesian networks                 Boltzmann machines
Conditional (discriminative)   Maximum-entropy Markov models     Conditional random fields

EP on Dynamic Systems

[Same taxonomy table as above.]

EP on Boltzmann Machines

[Same taxonomy table as above.]

Future Work

[Same taxonomy table as above.]
