Free Energy Approximation
Solmaz Torabi
Dept. of Electrical and Computer Engineering, Drexel University
[email protected]
Advisor: Dr. John M. Walsh
June 19, 2014
References
[1] M. Opper and D. Saad, “Advanced Mean Field Methods: Theory and Practice,” MIT Press, 2001.
[2] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Constructing free-energy approximations and generalized belief propagation algorithms,” IEEE Transactions on Information Theory, vol. 51, 2005.
[3] M. Welling and Y. W. Teh, “Approximate inference in Boltzmann machines,” Artificial Intelligence, vol. 143, pp. 19–50, 2003.
[4] A. Montanari, “Lecture notes, inference in graphical models,” 2011.
Outline
- Basics of graphical models
- Basics of message passing algorithms
- Variational free energy
- Mean field approximation
- TAP (Thouless, Anderson, and Palmer)
- Region-based approximation
- Bethe free energy
- Kikuchi approximation
Undirected graphical model, Markov random field
Undirected graphical model with random vector $X = (X_1, \dots, X_n)$
- Given an undirected graph $G = (V, E)$, each node $s$ has an associated random variable $X_s$.
- A clique $C \subseteq V$ is a fully connected subset of $V$.
- The distribution $p$ factorizes according to $G$ if it can be expressed as a product over cliques:
$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C)$$
Example:
$$p(x) = \frac{1}{Z} \psi_1(x_1, x_2, x_3)\, \psi_2(x_3, x_4, x_5)\, \psi_3(x_4, x_5, x_6)\, \psi_4(x_4, x_7)$$
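As a minimal sketch of what the clique factorization means computationally, the snippet below enumerates all configurations of the seven-variable example and computes $Z$ and one marginal by brute force. The variables are assumed binary and the potential tables are random illustrative values; neither assumption comes from the slides.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Clique scopes of the example, 0-based: psi_1(x1,x2,x3), psi_2(x3,x4,x5), ...
scopes = [(0, 1, 2), (2, 3, 4), (3, 4, 5), (3, 6)]
# Hypothetical positive potential tables over binary states (illustrative values).
tables = [rng.uniform(0.5, 2.0, size=(2,) * len(s)) for s in scopes]

def unnormalized(x):
    """Product of clique potentials psi_C(x_C) for one configuration x."""
    w = 1.0
    for scope, table in zip(scopes, tables):
        w *= table[tuple(x[i] for i in scope)]
    return w

configs = list(itertools.product([0, 1], repeat=7))
weights = np.array([unnormalized(x) for x in configs])
Z = weights.sum()      # partition function
p = weights / Z        # p(x) for all 2^7 configurations

# Marginal of x4 (index 3): sum p(x) over all other variables.
p_x4 = np.array([
    p[np.array([x[3] == v for x in configs])].sum() for v in (0, 1)
])
print(Z, p_x4)
```

The enumeration costs $2^n$ operations, which is exactly the blow-up that message passing and the free energy approximations try to avoid.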
Graphical model: factor graph
- A factor graph is a bipartite graph $G = (V, F, E)$, where $V$ is the original set of vertices, $F$ is the set of factors, and $(s, a) \in E$ if $x_s$ participates in the factor indexed by $a \in F$.
- We assume that the functions $f_a(x_a)$ are non-negative and finite.
$$P(x) = \frac{1}{Z} \prod_{a} f_a(x_a)$$
Example:
$$P(x) = \frac{1}{Z} f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4)$$
Graphical model: undirected graph and factor graph
- Maximal cliques: $\mathcal{C} = \{\{1, 2, 3, 4\}, \{4, 5, 6\}, \{6, 7\}\}$
- Vertex set $V = \{1, \dots, 7\}$, factor set $F = \{a, b, c\}$
$$P(x) = \frac{1}{Z} f_a(x_1, x_2, x_3, x_4)\, f_b(x_4, x_5, x_6)\, f_c(x_6, x_7)$$
Pairwise graphical model
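- A commonly encountered subclass of Markov networks:
  - Ising model, Boltzmann machines
  - Computer vision
$$P(x_1, x_2, \dots, x_N) = \frac{1}{Z} \prod_{(ij)} \psi_{ij}(x_i, x_j) \prod_{i} \psi_i(x_i)$$
where $\psi_{ij}(x_i, x_j)$ is the compatibility function and $\psi_i(x_i)$ is the evidence at node $i$, with
- $\psi_i : \mathcal{X} \to \mathbb{R}_+$ for each $i \in V$
- $\psi_{ij} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ for each $(i, j) \in E$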
Boltzmann distribution
- Physicists focus on the class of distributions $P$ known as Boltzmann (Gibbs) distributions:
$$P(X) = \frac{e^{-H[X]}}{Z}$$
- $H[X]$ is the energy of state $X$.
- $Z = \sum_{X} e^{-H[X]}$ is the normalizing partition function.
- A pairwise Markov random field can be written in this form:
$$P(X) = \frac{1}{Z} \prod_{(ij)} \psi_{ij}(x_i, x_j) \prod_{i} \psi_i(x_i) = \frac{e^{-H[X]}}{Z}$$
where the energy is
$$H[X] = -\sum_{(ij)} \ln \psi_{ij}(x_i, x_j) - \sum_{i} \ln \psi_i(x_i)$$
Ising model
- An example of a pairwise model, with $\psi_{ij}(x_i, x_j) = \exp\{J_{ij} x_i x_j\}$ and $\psi_i(x_i) = \exp\{\theta_i x_i\}$.
- It is a mathematical model of ferromagnetism in statistical mechanics.
- $x_i \in \{+1, -1\}$ represents the magnetic dipole moment of an atomic spin; any two adjacent sites $i, j$ have an interaction $J_{ij}$.
- Each site $i$ has an external magnetic field $\theta_i$.
- The energy of each configuration, with $(ij)$ ranging over pairs of adjacent sites, is
$$H(X) = -\sum_{(ij)} J_{ij} x_i x_j - \sum_{i} \theta_i x_i$$
- The configuration probability is
$$P(X) = \frac{e^{-H(X)}}{Z} = \frac{1}{Z} \exp\Big( \sum_{(ij)} J_{ij} x_i x_j + \sum_{i} \theta_i x_i \Big)$$
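The energy and configuration probability can be evaluated directly for a very small system. The sketch below assumes a hypothetical three-spin chain with made-up couplings and fields, purely for illustration.

```python
import itertools
import numpy as np

# Hypothetical 3-spin chain: couplings J_ij on edges, external fields theta_i.
J = {(0, 1): 0.8, (1, 2): -0.5}
theta = np.array([0.2, 0.0, -0.1])

def energy(x):
    """H(x) = -sum_(ij) J_ij x_i x_j - sum_i theta_i x_i."""
    pair = sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
    return -pair - float(np.dot(theta, x))

states = list(itertools.product([-1, 1], repeat=3))
boltz = np.array([np.exp(-energy(x)) for x in states])
Z = boltz.sum()        # partition function
P = boltz / Z          # Boltzmann probabilities of all 2^3 states

# Exact magnetizations <x_i> under P.
mag = sum(P[k] * np.array(x) for k, x in enumerate(states))
print(Z, mag)
```

The lowest-energy configuration receives the largest probability, which is the sense in which $H$ plays the role of an energy.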
Inference tasks
- Computing the marginal distribution $p(x_A)$ over a particular subset $A \subset V$ of nodes.
- Computing the conditional distribution $P(x_A \mid x_B)$.
- Computing the most probable configuration (MAP):
$$\hat{x} = \arg\max_{x \in \mathcal{X}^m} P(x)$$
Belief propagation
- BP is a method for computing marginal probability functions.
- The computed marginals are exact if the factor graph has no cycles.
$$m_{i \to a}(x_i) = \prod_{c \in N(i) \setminus a} m_{c \to i}(x_i)$$
$$m_{a \to i}(x_i) = \sum_{x_a \setminus x_i} f_a(x_a) \prod_{j \in N(a) \setminus i} m_{j \to a}(x_j)$$
- $i$ and $j$ are used as general indices over variables, $a$ and $c$ over factors.
Belief propagation
If this iteration converges, the marginals are approximated by
$$b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)$$
$$b_a(x_a) \propto f_a(x_a) \prod_{i \in N(a)} m_{i \to a}(x_i)$$
- In general, loopy BP (LBP) may not converge.
- If it does, $b_i(x_i)$ may not be close to the true marginal $P(x_i)$.
- The set of pseudomarginals $b$ may not be realizable.
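The message and belief equations can be sketched as a small sum-product routine. Everything in the sketch is an illustrative assumption: the function names `run_bp` and `exact_marginals`, the flooding schedule, the per-message normalization, and the random factor tables. The example graph is the tree $f_A(x_1,x_2) f_B(x_2,x_3,x_4) f_C(x_4)$, on which the beliefs should match the brute-force marginals.

```python
import itertools
import numpy as np

def run_bp(factors, n_vars, n_states=2, n_iters=50):
    """Sum-product BP. factors: list of (scope tuple, table ndarray)."""
    nbrs = {i: [a for a, (scope, _) in enumerate(factors) if i in scope]
            for i in range(n_vars)}
    m_vf = {(i, a): np.ones(n_states) for i in nbrs for a in nbrs[i]}
    m_fv = {(a, i): np.ones(n_states) for i in nbrs for a in nbrs[i]}
    for _ in range(n_iters):
        # variable -> factor: product of incoming messages from other factors
        for (i, a) in m_vf:
            msg = np.ones(n_states)
            for c in nbrs[i]:
                if c != a:
                    msg = msg * m_fv[(c, i)]
            m_vf[(i, a)] = msg / msg.sum()
        # factor -> variable: weight the table by other messages, sum them out
        for (a, i) in m_fv:
            scope, table = factors[a]
            t = table.copy()
            for axis, j in enumerate(scope):
                if j != i:
                    shape = [1] * len(scope)
                    shape[axis] = n_states
                    t = t * m_vf[(j, a)].reshape(shape)
            axes = tuple(ax for ax, j in enumerate(scope) if j != i)
            msg = t.sum(axis=axes) if axes else t
            m_fv[(a, i)] = msg / msg.sum()
    beliefs = []
    for i in range(n_vars):
        b = np.ones(n_states)
        for a in nbrs[i]:
            b = b * m_fv[(a, i)]
        beliefs.append(b / b.sum())
    return beliefs

def exact_marginals(factors, n_vars, n_states=2):
    """Brute-force single-variable marginals, for comparison only."""
    p = np.zeros((n_states,) * n_vars)
    for x in itertools.product(range(n_states), repeat=n_vars):
        w = 1.0
        for scope, table in factors:
            w *= table[tuple(x[j] for j in scope)]
        p[x] = w
    p /= p.sum()
    return [p.sum(axis=tuple(k for k in range(n_vars) if k != i))
            for i in range(n_vars)]

rng = np.random.default_rng(0)
factors = [((0, 1), rng.uniform(0.5, 2.0, (2, 2))),        # f_A(x1, x2)
           ((1, 2, 3), rng.uniform(0.5, 2.0, (2, 2, 2))),  # f_B(x2, x3, x4)
           ((3,), rng.uniform(0.5, 2.0, 2))]               # f_C(x4)
beliefs = run_bp(factors, n_vars=4)
exact = exact_marginals(factors, n_vars=4)
```

The same routine runs unchanged on a graph with cycles, but then the returned beliefs are only approximations and convergence is not guaranteed.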
Write down the energy function
Construct an approximation
Find the stationary condition
Variational free energy
- A variational method approximates an intractable distribution $P(X)$ of random variables $X = (S_1, \dots, S_N)$ by a tractable distribution $Q(X)$.
- $Q$ is chosen to minimize a distance measure, here the Kullback-Leibler divergence:
$$KL(Q \| P) = \sum_{X} Q(X) \ln \frac{Q(X)}{P(X)} = \Big\langle \ln \frac{Q}{P} \Big\rangle_Q$$
where $\langle \cdot \rangle_Q$ denotes the expectation with respect to $Q$.
Variational free energy
To find the best approximation to $P = \frac{e^{-H(X)}}{Z}$, write
$$KL(Q \| P) = \ln Z + E[Q] - S[Q]$$
where
- $S[Q] = -\sum_{X} Q(X) \ln Q(X)$ is the entropy of $Q$
- $E[Q] = \sum_{X} Q(X) H[X]$ is called the average energy
$$\Longrightarrow \quad \min_Q KL(Q \| P) = \ln Z + \min_Q \underbrace{\big( E[Q] - S[Q] \big)}_{\text{variational free energy}}$$
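The decomposition $KL(Q\|P) = \ln Z + E[Q] - S[Q]$ can be checked numerically on a toy state space. In the sketch below the energies and the trial distribution $Q$ are random placeholders, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states = 8                            # e.g. all configurations of 3 binary variables

H = rng.normal(size=n_states)           # hypothetical energies H[X]
Z = np.exp(-H).sum()
P = np.exp(-H) / Z                      # Boltzmann distribution

Q = rng.dirichlet(np.ones(n_states))    # an arbitrary trial distribution

kl = np.sum(Q * np.log(Q / P))
avg_energy = np.sum(Q * H)              # E[Q]
entropy = -np.sum(Q * np.log(Q))        # S[Q]
free_energy = avg_energy - entropy      # variational free energy F[Q]

print(kl, np.log(Z) + free_energy)      # the two numbers should agree
```

Because $\ln Z$ does not depend on $Q$, minimizing the KL divergence and minimizing the variational free energy select the same $Q$.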
Variational free energy for Ising model
- The model under consideration is a Boltzmann machine:
$$P(X) = \frac{e^{-H(X)}}{Z} = \frac{1}{Z} \exp\Big( \sum_{(ij)} J_{ij} x_i x_j + \sum_{i} \theta_i x_i \Big)$$
- For binary variables it is convenient to reparametrize the marginals as
$$p_i(x_i = 1) = \frac{1 + m_i}{2}$$
Mean Field approximation
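Find a factorized distribution that best describes the true distribution.
- For binary variables $x_i \in \{+1, -1\}$, the most general factorized distribution has the form
$$Q_{MF}(x) = \prod_{i} Q_i(x_i) = \prod_{i} \frac{1 + x_i m_i}{2}$$
- $KL(Q_{MF} \| P) = E(Q_{MF}) - S(Q_{MF}) + \log Z$
- $E(Q_{MF}) = \sum_{x} Q_{MF}(x) H(x) = -\sum_{(ij)} J_{ij} m_i m_j - \sum_{i} \theta_i m_i$
- $S(Q_{MF}) = -\sum_{x} Q_{MF}(x) \ln Q_{MF}(x) = -\sum_{i} \Big( \frac{1+m_i}{2} \ln \frac{1+m_i}{2} + \frac{1-m_i}{2} \ln \frac{1-m_i}{2} \Big)$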
Mean Field approximation
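How do we solve
$$\min_{m} KL(Q_{MF} \| P) \; ?$$
- Take the derivative with respect to each $m_i$:
$$\frac{\partial}{\partial m_i} \Big\{ -\sum_{(ij)} J_{ij} m_i m_j - \sum_{i} \theta_i m_i + \sum_{i} \Big( \frac{1+m_i}{2} \ln \frac{1+m_i}{2} + \frac{1-m_i}{2} \ln \frac{1-m_i}{2} \Big) + \log Z \Big\}$$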
Mean Field fixed points
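$$\frac{\partial KL}{\partial m_i} = -\sum_{j \in N(i)} J_{ij} m_j - \theta_i + \frac{1}{2} \ln \frac{1 + m_i}{1 - m_i}$$
- Setting the derivative to zero gives the fixed points of the MF approximation:
$$m_i = \frac{\exp\big( \sum_j J_{ij} m_j + \theta_i \big) - \exp\big( -\sum_j J_{ij} m_j - \theta_i \big)}{\exp\big( \sum_j J_{ij} m_j + \theta_i \big) + \exp\big( -\sum_j J_{ij} m_j - \theta_i \big)}$$
$$\Rightarrow \quad m_i = \tanh\Big( \sum_{j} J_{ij} m_j + \theta_i \Big), \quad i = 1, \dots, N$$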
Mean Field
$$m_i = \tanh\Big( \sum_{j} J_{ij} m_j + \theta_i \Big), \quad i = 1, \dots, N \qquad (1)$$
Note
- The intractable task of computing marginals has been replaced by the problem of solving a set of nonlinear equations.
- These MF equations can be run sequentially, i.e. we fix all $m_j$ except $m_i$.
  - In each step the MF free energy is convex in $m_i$, and equation (1) finds its minimum in one step.
  - This procedure can be interpreted as coordinate descent in the $m_i$.
- Alternatively, all parameters $m_i$ can be updated in parallel.
  - This is not guaranteed to decrease the cost function at each iteration.
- There might be many solutions to (1).
- Some of the solutions may not be local minima.
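A minimal sketch of the sequential update of equation (1), compared against exact magnetizations obtained by enumeration. The four-spin coupling matrix and fields are hypothetical values chosen to be weak enough for the iteration to settle quickly.

```python
import itertools
import numpy as np

# Hypothetical symmetric couplings (zero diagonal) and external fields.
J = np.array([[0.0, 0.3, 0.0, 0.2],
              [0.3, 0.0, -0.25, 0.0],
              [0.0, -0.25, 0.0, 0.4],
              [0.2, 0.0, 0.4, 0.0]])
theta = np.array([0.1, -0.2, 0.3, 0.0])

def mean_field(J, theta, sweeps=200):
    """Sequential (coordinate-wise) iteration of m_i = tanh(sum_j J_ij m_j + theta_i)."""
    m = np.zeros(len(theta))
    for _ in range(sweeps):
        for i in range(len(theta)):
            m[i] = np.tanh(J[i] @ m + theta[i])   # J[i, i] = 0, so no self term
    return m

def exact_magnetization(J, theta):
    """Brute-force <x_i> under P(x) proportional to exp(-H(x))."""
    states = np.array(list(itertools.product([-1, 1], repeat=len(theta))))
    # -H(x) = sum_(ij) J_ij x_i x_j + sum_i theta_i x_i; 0.5 undoes double counting
    log_w = 0.5 * np.einsum('si,ij,sj->s', states, J, states) + states @ theta
    w = np.exp(log_w - log_w.max())
    return (w / w.sum()) @ states

m_mf = mean_field(J, theta)
m_exact = exact_magnetization(J, theta)
print(m_mf, m_exact)
```

Replacing the inner loop with a single vectorized `np.tanh(J @ m + theta)` gives the parallel variant mentioned in the notes, which may oscillate for stronger couplings.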
Mean Field
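- Consider the $d$-dimensional Ising model without an external magnetic field ($\theta = 0$) and with the same interaction $J_{ij} = \alpha$ on every edge:
$$m^{(t+1)} = \tanh\big( 2 d \alpha\, m^{(t)} \big)$$
- For $\alpha < \frac{1}{2d}$, the iteration converges to $\lim_{t \to \infty} m^{(t)} = 0$ (left figure).
- For $\alpha > \frac{1}{2d}$, if $m^{(0)} \lessgtr 0$ then $\lim_{t \to \infty} m^{(t)} = \mp m^*$, where $m^* > 0$ is the positive solution of $m = \tanh(2 d \alpha\, m)$.
[4] A. Montanari, Lecture notes for inference in graphical models, 2011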
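The two regimes can be reproduced with a few lines. The values of $d$, $\alpha$ and the starting points below are arbitrary illustrations on either side of the threshold $1/(2d)$.

```python
import numpy as np

def iterate(alpha, d, m0, steps=2000):
    """Run m <- tanh(2 d alpha m) from the starting value m0."""
    m = m0
    for _ in range(steps):
        m = np.tanh(2 * d * alpha * m)
    return m

d = 2                                          # threshold is 1/(2d) = 0.25
m_high_T = iterate(alpha=0.2, d=d, m0=0.9)     # alpha < 1/(2d): decays to 0
m_pos = iterate(alpha=0.3, d=d, m0=0.1)        # alpha > 1/(2d), m0 > 0: +m*
m_neg = iterate(alpha=0.3, d=d, m0=-0.1)       # alpha > 1/(2d), m0 < 0: -m*
print(m_high_T, m_pos, m_neg)
```

Below the threshold the map is a contraction toward zero; above it, zero becomes unstable and the sign of the starting point selects one of the two symmetric fixed points.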
Mean Field
- MF neglects the dependencies between the random variables.
However,
- we get an upper bound on the exact free energy:
$$KL(Q_{MF} \| P) = \underbrace{E(Q_{MF}) - S(Q_{MF})}_{= F[Q_{MF}],\ \text{variational MF free energy}} - \underbrace{(-\log Z)}_{\text{exact free energy}}$$
Since $KL(Q_{MF} \| P) \ge 0$,
$$F[Q_{MF}] \ge -\log Z$$
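The bound holds for every choice of magnetizations, not only at the mean-field fixed point. The sketch below checks it for random $m$ on a hypothetical three-spin triangle; the couplings, fields, and function names are illustrative assumptions.

```python
import itertools
import numpy as np

J = {(0, 1): 0.6, (1, 2): -0.4, (0, 2): 0.3}   # hypothetical couplings on a triangle
theta = np.array([0.1, -0.2, 0.3])

def mf_free_energy(m):
    """F[Q_MF] = E(Q_MF) - S(Q_MF) for the product distribution with means m."""
    energy = -sum(Jij * m[i] * m[j] for (i, j), Jij in J.items()) - theta @ m
    p_up, p_dn = (1 + m) / 2, (1 - m) / 2
    neg_entropy = np.sum(p_up * np.log(p_up) + p_dn * np.log(p_dn))
    return energy + neg_entropy

def log_partition():
    """log Z by enumerating all 2^3 spin configurations."""
    total = 0.0
    for x in itertools.product([-1, 1], repeat=3):
        x = np.array(x)
        pair = sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
        total += np.exp(pair + theta @ x)
    return np.log(total)

rng = np.random.default_rng(0)
trial_ms = rng.uniform(-0.95, 0.95, size=(100, 3))
log_Z = log_partition()
# gap = F[Q_MF] - (-log Z) = KL(Q_MF || P), which must be non-negative
gaps = np.array([mf_free_energy(m) + log_Z for m in trial_ms])
print(gaps.min())
```

Minimizing over $m$ tightens the bound, but the gap generally stays positive because the product family cannot represent the correlations in $P$.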
Mean Field Method in general
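- $P(x) = \frac{1}{Z} \prod_{a \in F} f_a(x_a)$ is the true distribution.
- $Q(x) = \prod_{i} q_i(x_i)$ is the approximating distribution.
$$F_{MF}(Q) = -\sum_{i} S(q_i) - \sum_{a \in F} \sum_{x_a} \prod_{i \in N(a)} q_i(x_i) \log f_a(x_a)$$
where $S(q_i) = -\sum_{x_i} q_i(x_i) \ln q_i(x_i)$, so that $F_{MF} = E - S$ as before.
- The number of free parameters drops from $|\mathcal{X}|^n - 1$ to $n(|\mathcal{X}| - 1)$.
- $F_{MF}$ is no longer convex.
$$\min_Q F_{MF}(Q) \quad \text{subject to} \quad \sum_{x_i} q_i(x_i) = 1 \ \text{for each } i$$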
Mean Field Method in general
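- Add a Lagrange multiplier $\lambda_i$ for each normalization constraint.
- Find the stationarity condition from $\frac{\partial L(Q, \lambda)}{\partial q_i(x_i)} = 0$:
$$q_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)$$
where
$$m_{a \to i}(x_i) = \exp\Big( \sum_{x_j :\, j \in N(a) \setminus i} \log f_a(x_a) \prod_{j \in N(a) \setminus i} q_j(x_j) \Big)$$
- A simple greedy algorithm for finding a stationary point consists of updating the $q_i$ by iterating the above equations until convergence.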
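The greedy iteration can be sketched for a generic factor graph as follows. The function name `mean_field_updates`, the sweep count, and the random tables are assumptions for illustration, and the factor tables are taken to be strictly positive so that $\log f_a$ is finite.

```python
import numpy as np

def mean_field_updates(factors, n_vars, n_states=2, sweeps=100):
    """Iterate q_i(x_i) ∝ prod_a m_{a->i}(x_i), one variable at a time.

    factors: list of (scope tuple, strictly positive table ndarray).
    """
    q = [np.full(n_states, 1.0 / n_states) for _ in range(n_vars)]
    for _ in range(sweeps):
        for i in range(n_vars):
            log_q = np.zeros(n_states)
            for scope, table in factors:
                if i not in scope:
                    continue
                t = np.log(table)
                # weight every other variable j of the factor by q_j(x_j)
                for axis, j in enumerate(scope):
                    if j != i:
                        shape = [1] * len(scope)
                        shape[axis] = n_states
                        t = t * q[j].reshape(shape)
                other = tuple(ax for ax, j in enumerate(scope) if j != i)
                # this is log m_{a->i}(x_i)
                log_q += t.sum(axis=other) if other else t
            q_i = np.exp(log_q - log_q.max())
            q[i] = q_i / q_i.sum()
    return q

rng = np.random.default_rng(0)
factors = [((0, 1), rng.uniform(0.5, 2.0, (2, 2))),
           ((1, 2, 3), rng.uniform(0.5, 2.0, (2, 2, 2))),
           ((3,), rng.uniform(0.5, 2.0, 2))]
q = mean_field_updates(factors, n_vars=4)

# Sanity case: with only single-variable factors the product form is exact.
q_ind = mean_field_updates([((0,), np.array([1.0, 3.0])),
                            ((1,), np.array([2.0, 2.0]))], n_vars=2)
```

Unlike the BP messages, each $m_{a \to i}$ here averages $\log f_a$ rather than $f_a$ itself, which is what makes the result a product-form approximation.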
TAP approximation
The Legendre Transform and Plefka’s Expansion
Plefka Expansion
- Do not restrict the approximating distribution $Q$ to product distributions.
- Minimize the free energy in two steps:
  - Constrained minimization over the family of distributions satisfying $\langle X \rangle_Q = m$ for fixed $m$:
$$G(m) = \min_Q \{ F[Q] = E[Q] - S[Q] \;|\; \langle X \rangle_Q = m \}$$
  - Minimize $G(m)$ with respect to $m$.
Plefka Expansion
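$$G(m) = \min_Q \{ F[Q] \;|\; \langle X \rangle_Q = m \}$$
Adding Lagrange multipliers $\lambda_i$ gives the Lagrangian
$$G(m, \lambda) = E[Q] - S[Q] - \sum_{i} \lambda_i \big( \langle x_i \rangle_Q - m_i \big)$$
$$G(m, \lambda) = \sum_{X} Q(X) H[X] - S[Q] - \sum_{X} \sum_{i} \lambda_i x_i Q(X) + \sum_{i} \lambda_i m_i$$
This has the form of a variational free energy in which $H[X]$ is replaced by $H[X] - \sum_i \lambda_i x_i$. We can construct such a Gibbs free energy by adding a set of external auxiliary fields.
$$\Rightarrow \quad Q_\lambda(X) = \frac{1}{Z(\lambda)} e^{-H[X] + \sum_i \lambda_i x_i}$$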
Plefka Expansion
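The dual function is
$$G(m) = \max_{\lambda} \Big\{ \sum_{i} \lambda_i m_i - \log Z(\lambda) \Big\}$$
- This equation is known as the Legendre transform between $\{\lambda_i\}$ and $\{m_i\}$.
- $Z(\lambda)$ is the normalizing constant of the Gibbs distribution
$$Q_\lambda(X) = \frac{1}{Z(\lambda)} e^{-H[X] + \sum_i \lambda_i x_i} = \frac{1}{Z(\lambda)} \exp\Big( \sum_{(ij)} J_{ij} x_i x_j + \sum_{i} \theta_i x_i + \sum_{i} \lambda_i x_i \Big)$$
- Set $\theta \to 0$ by shifting the Lagrange multipliers, $\lambda_i \to \lambda_i - \theta_i$.
- $Z(\lambda) = \sum_{x} \exp\Big( \sum_{(ij)} J_{ij} x_i x_j + \sum_{i} \lambda_i x_i \Big)$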
Plefka Expansion
$$G(m) = \max_{\lambda} \Big\{ \sum_{i} \lambda_i m_i - \log \sum_{x} \exp\Big( \beta \sum_{(ij)} J_{ij} x_i x_j + \sum_{i} \lambda_i x_i \Big) \Big\}$$
- The Plefka expansion is obtained by scaling $J_{ij} \to \beta J_{ij}$ and Taylor expanding the Gibbs free energy around $\beta = 0$, where $\beta$ plays the role of the inverse temperature in physics.
Notice
- For each term in the Taylor expansion, one has to expand $\log Z$ as well as the maximizing Lagrange multipliers $\lambda_i$.
- The auxiliary field is temperature dependent.
Plefka Expansion
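- With $G_n = \frac{\partial^n}{\partial \beta^n} G(m) \big|_{\beta = 0}$,
$$G(m) = G_0(m) + \beta G_1(m) + \frac{\beta^2}{2!} G_2(m) + \dots$$
- $G_0(m) = \sum_{i} \Big\{ \frac{1+m_i}{2} \ln \frac{1+m_i}{2} + \frac{1-m_i}{2} \ln \frac{1-m_i}{2} \Big\}$. At $\beta = 0$ the spins are entirely controlled by the auxiliary field.
- $G_1(m) = -\sum_{i<j} J_{ij} m_i m_j$
- $G_2(m) = -\frac{1}{2} \sum_{ij} J_{ij}^2 (1 - m_i^2)(1 - m_j^2)$
- ...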
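As a rough numerical check of the expansion, the sketch below treats a hypothetical two-spin system with a single coupling `J12`. It computes $G(m)$ from the Legendre transform by plain gradient ascent on $\lambda$ and compares it with $G_0 + \beta G_1 + \frac{\beta^2}{2} G_2$ at small $\beta$. The step size, iteration count, and all numerical values are assumptions, and the sum over $ij$ in $G_2$ is read as running over ordered pairs, so the single pair contributes $-J_{12}^2 (1 - m_1^2)(1 - m_2^2)$.

```python
import itertools
import numpy as np

J12 = 1.0                               # hypothetical coupling between the two spins
m = np.array([0.3, -0.2])               # prescribed magnetizations
states = np.array(list(itertools.product([-1, 1], repeat=2)), dtype=float)

def gibbs_free_energy(beta, steps=5000, lr=0.5):
    """G(m) = max_lambda { lambda . m - log Z(lambda) } via gradient ascent."""
    lam = np.zeros(2)
    for _ in range(steps):
        log_w = beta * J12 * states[:, 0] * states[:, 1] + states @ lam
        w = np.exp(log_w)
        mean_x = (w / w.sum()) @ states
        lam += lr * (m - mean_x)        # gradient of the concave objective
    log_w = beta * J12 * states[:, 0] * states[:, 1] + states @ lam
    return lam @ m - np.log(np.exp(log_w).sum())

def plefka_second_order(beta):
    """G_0 + beta G_1 + beta^2/2 G_2 for the two-spin system."""
    p_up, p_dn = (1 + m) / 2, (1 - m) / 2
    G0 = np.sum(p_up * np.log(p_up) + p_dn * np.log(p_dn))
    G1 = -J12 * m[0] * m[1]
    G2 = -J12**2 * (1 - m[0]**2) * (1 - m[1]**2)
    return G0 + beta * G1 + beta**2 / 2 * G2

beta = 0.1
exact = gibbs_free_energy(beta)
approx = plefka_second_order(beta)
print(exact, approx)
```

The remaining discrepancy is of third order in $\beta$, so it shrinks quickly as $\beta \to 0$ and grows as the coupling strength increases.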
Plefka Expansion
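With $G_n = \frac{\partial^n}{\partial \beta^n} G(m) \big|_{\beta = 0}$,
$$G(m) = G_0(m) + \beta G_1(m) + \frac{\beta^2}{2!} G_2(m) + \dots$$
- $G_0(m) = \sum_{i} \Big\{ \frac{1+m_i}{2} \ln \frac{1+m_i}{2} + \frac{1-m_i}{2} \ln \frac{1-m_i}{2} \Big\}$ $\Rightarrow$ MF variational entropy (entering with a minus sign, $-S(Q_{MF})$)
- $G_1(m) = -\sum_{i<j} J_{ij} m_i m_j$ $\Rightarrow$ MF variational energy
- $G_2(m) = -\frac{1}{2} \sum_{ij} J_{ij}^2 (1 - m_i^2)(1 - m_j^2)$
- ... $\Rightarrow$ takes into account the higher-order dependencies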
TAP approximation
TAP approximation = minimizing $G(m)$ for $\beta = 1$, keeping only terms up to second order:
$$G_{TAP}(m) = -\sum_{(ij)} J_{ij} m_i m_j + \sum_{i} \Big\{ \frac{1+m_i}{2} \ln \frac{1+m_i}{2} + \frac{1-m_i}{2} \ln \frac{1-m_i}{2} \Big\} \underbrace{- \frac{1}{2} \sum_{(ij)} J_{ij}^2 (1 - m_i^2)(1 - m_j^2)}_{\text{dependencies between rvs}}$$
- TAP takes into account the dependencies between random variables.
- It is exact in the high-temperature regime for certain classes of models (SK models).
TAP approximation
Fixed points of the TAP approximation, obtained from $\partial G_{TAP} / \partial m_i = 0$ for $x_i \in \{+1, -1\}$:
$$m_i = \tanh\Big( \sum_{j \in N(i)} J_{ij} m_j - m_i \sum_{j \in N(i)} J_{ij}^2 (1 - m_j^2) \Big)$$
- Running these equations does not guarantee that the TAP Gibbs free energy decreases ($m_i$ appears on both sides).
- There is a danger that the radius of convergence of the Taylor expansion will be too small to obtain results for the values of $\beta$ we are interested in.
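A damped fixed-point iteration is one simple way to look for a TAP solution. The sketch below assumes $\pm 1$ spins, so the correction takes the Onsager form $-m_i \sum_j J_{ij}^2 (1 - m_j^2)$ that follows from differentiating the second-order Plefka term. It also restores an external field $\theta_i$ as in the mean-field equation, and uses hypothetical weak couplings and a damping factor of 0.5; none of these choices come from the slides.

```python
import numpy as np

# Hypothetical weak symmetric couplings (zero diagonal) and external fields.
J = np.array([[0.0, 0.2, 0.0, 0.15],
              [0.2, 0.0, -0.1, 0.0],
              [0.0, -0.1, 0.0, 0.2],
              [0.15, 0.0, 0.2, 0.0]])
theta = np.array([0.3, -0.1, 0.2, 0.0])

def tap_update(m, J, theta):
    """One parallel TAP update for +/-1 spins."""
    onsager = m * ((J ** 2) @ (1 - m ** 2))     # second-order reaction term
    return np.tanh(J @ m + theta - onsager)

def tap_fixed_point(J, theta, sweeps=500, damping=0.5):
    """Damped iteration m <- damping * m + (1 - damping) * update(m)."""
    m = np.zeros(len(theta))
    for _ in range(sweeps):
        m = damping * m + (1 - damping) * tap_update(m, J, theta)
    return m

m_tap = tap_fixed_point(J, theta)
residual = np.max(np.abs(m_tap - tap_update(m_tap, J, theta)))
print(m_tap, residual)
```

Damping is a common heuristic for stabilizing such iterations; with stronger couplings the iteration may still fail to converge, in line with the caveats listed above.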
Outline
- Standard BP algorithm
- Junction tree algorithm
- Region-based free energy
  - Different types of region graph
- Special case: Bethe free energy
- Stationary points of the Bethe free energy = BP fixed points
- Generalized belief propagation (GBP)
  - Stationary points of the region-based free energy approximation
Message Passing - Computing the marginals
$$p(x_1, x_2, x_3, x_4) = f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4)$$
$$b_1(x_1) = p(x_1) = \;?$$
Message Passing
I $b_1(x_1) = m_{A \to 1}(x_1)$
I $b_1(x_1) = \sum_{x_2} f_A(x_1, x_2)\, m_{2 \to A}(x_2)$
I $b_1(x_1) = \sum_{x_2} f_A(x_1, x_2)\, m_{B \to 2}(x_2)$
I $b_1(x_1) = \sum_{x_2, x_3, x_4} f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, m_{3 \to B}(x_3)\, m_{4 \to B}(x_4)$
I $b_1(x_1) = \sum_{x_2, x_3, x_4} f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, m_{4 \to B}(x_4)$, since the leaf variable $x_3$ sends $m_{3 \to B}(x_3) = 1$
I $b_1(x_1) = \sum_{x_2, x_3, x_4} f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, m_{C \to 4}(x_4)$
I $b_1(x_1) = \sum_{x_2, x_3, x_4} f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4)$
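The chain of substitutions can be checked numerically. The following is a minimal sketch, assuming binary variables and random placeholder factor tables (neither is specified on the slides); the names `m_B_to_2`, `normalize`, and so on are illustrative only. It passes messages from the leaves towards variable 1 and compares the result with the brute-force marginal.

```python
from itertools import product
import random

random.seed(0)
S = (0, 1)  # assumed binary state space

# Hypothetical factor tables with random placeholder values.
fA = {(x1, x2): random.random() for x1, x2 in product(S, S)}
fB = {(x2, x3, x4): random.random() for x2, x3, x4 in product(S, S, S)}
fC = {x4: random.random() for x4 in S}

# Messages, computed from the leaves towards variable 1.
m_C_to_4 = {x4: fC[x4] for x4 in S}      # leaf factor sends its own table
m_4_to_B = m_C_to_4                      # variable 4 forwards its only other message
m_3_to_B = {x3: 1.0 for x3 in S}         # leaf variable sends the constant 1
m_B_to_2 = {
    x2: sum(fB[x2, x3, x4] * m_3_to_B[x3] * m_4_to_B[x4]
            for x3, x4 in product(S, S))
    for x2 in S
}
m_2_to_A = m_B_to_2
m_A_to_1 = {x1: sum(fA[x1, x2] * m_2_to_A[x2] for x2 in S) for x1 in S}


def normalize(table):
    """Scale a table of non-negative weights so that it sums to one."""
    z = sum(table.values())
    return {k: v / z for k, v in table.items()}


b1 = normalize(m_A_to_1)

# Brute-force marginal: sum the full product over x2, x3, x4.
p1 = normalize({
    x1: sum(fA[x1, x2] * fB[x2, x3, x4] * fC[x4]
            for x2, x3, x4 in product(S, S, S))
    for x1 in S
})
```

Because the factor graph is a tree, `b1` and `p1` should agree up to floating-point error.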
Outline
I Standard BP algorithm
I Junction tree algorithm
I Region-based free energy
  I Different types of region graph
  I Special case: Bethe free energy
  I Stationary points of Bethe free energy = BP fixed points
I Generalized belief propagation (GBP)
  I Stationary points of region-based free energy approximation
Junction Tree algorithm
I Works for general graphs:
  I Tree-shaped graphs
  I Graphs with cycles
  I Directed graphs
  I Undirected graphs
I Remove cycles by clustering nodes into cliques.
I Perform belief propagation on the cliques.
I Yields exact inference of the (clique) marginals.
Junction Tree algorithm - Moralization
I First moralize the graph by connecting every pair of unconnected parents that share a child. Then drop the edge directions to obtain an undirected graph.
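Moralization is easy to express in code. The sketch below assumes the directed graph is given as a hypothetical `child -> list of parents` dictionary; the function name `moralize` and the four-node example are illustrative and do not come from the slides.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected edge set of the moral graph.

    parents maps each node to the list of its parents in the directed graph.
    """
    edges = set()
    for child, pa in parents.items():
        # Keep every original edge, without its direction.
        for p in pa:
            edges.add(frozenset((p, child)))
        # "Marry" every pair of parents of the same child.
        for u, v in combinations(pa, 2):
            edges.add(frozenset((u, v)))
    return edges


# Hypothetical DAG: A -> C <- B, C -> D
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
moral = moralize(dag)
```

In this example the only added edge is the one between the co-parents A and B.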
Junction Tree algorithm - Triangulation
I Triangulate the graph, i.e. add edges until every cycle of length four or more has a chord (an edge between two non-successive nodes of the cycle).
Junction Tree algorithm
Clique potentials:
I $\psi_{C_1}(x_A, x_B) = \psi_{A,B}(x_A, x_B)$
I $\psi_{C_2}(x_B, x_C, x_F) = \psi_{B,C}(x_B, x_C)\, \psi_{C,F}(x_C, x_F)$
I $\psi_{C_3}(x_C, x_F, x_G) = \psi_{C,F}(x_C, x_F)\, \psi_{F,G}(x_F, x_G)$
I $\psi_{C_4}(x_C, x_D, x_G, x_H) = \psi_{C,D,H}(x_C, x_D, x_H)\, \psi_{D,G,H}(x_D, x_G, x_H)$
I $\psi_{C_5}(x_C, x_E, x_H) = \psi_{C,E,H}(x_C, x_E, x_H)$
Independence in junction tree
I Suppose:
  I $T$ is a junction tree for the graph $G$.
  I $C_i$ and $C_j$ are neighboring cliques with separator $S_{ij} = C_i \cap C_j$.
  I Variables $X$ and $Y$ are on opposite sides of the separator.
I Then $X$ and $Y$ are conditionally independent given $S_{ij}$.
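Junction Tree algorithm
Given a junction tree and potentials on the cliques, the message from clique $C_i$ to clique $C_j$ is
$$m_{ij}(x_{S_{ij}}) = \sum_{x_{C_i \setminus S_{ij}}} \psi_{C_i}(x_{C_i}) \prod_{k \in N(i) \setminus j} m_{ki}(x_{S_{ki}})$$
I $S_{ij}$: nodes shared by cliques $i$ and $j$
I $N(i)$: neighboring cliques of $i$
I The marginal distributions of the cliques and separators are
$$p(x_{C_i}) \propto \psi_{C_i}(x_{C_i}) \prod_{k \in N(i)} m_{ki}(x_{S_{ki}})$$
$$p(x_{S_{ij}}) \propto m_{ij}(x_{S_{ij}})\, m_{ji}(x_{S_{ij}})$$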
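Junction Tree algorithm
I $m_{12}(x_B) = \sum_{x_A} \psi_{C_1}(x_A, x_B)$
I $m_{23}(x_C, x_F) = \sum_{x_B} \psi_{C_2}(x_B, x_C, x_F)\, m_{12}(x_B)$
I $m_{34}(x_C, x_G) = \sum_{x_F} \psi_{C_3}(x_C, x_F, x_G)\, m_{23}(x_C, x_F)$
I $m_{45}(x_C, x_H) = \sum_{x_D, x_G} \psi_{C_4}(x_C, x_D, x_G, x_H)\, m_{34}(x_C, x_G)$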
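The four forward messages can be traced in a few lines of code. This is a sketch under stated assumptions: all variables are binary, the clique potentials are random placeholder tables rather than the products of original factors, and the cliques form the chain $C_1 - C_2 - C_3 - C_4 - C_5$. It checks that the potential of the last clique times its incoming message is proportional to the exact clique marginal.

```python
from itertools import product
import random

random.seed(1)
S = (0, 1)  # assumed binary state space


def table(n):
    """Hypothetical potential over n binary variables with random entries."""
    return {xs: random.random() for xs in product(S, repeat=n)}


def normalize(t):
    z = sum(t.values())
    return {k: v / z for k, v in t.items()}


psi1 = table(2)  # psi_C1[a, b]
psi2 = table(3)  # psi_C2[b, c, f]
psi3 = table(3)  # psi_C3[c, f, g]
psi4 = table(4)  # psi_C4[c, d, g, h]
psi5 = table(3)  # psi_C5[c, e, h]

# Forward messages along the chain, exactly as on the slide.
m12 = {b: sum(psi1[a, b] for a in S) for b in S}
m23 = {(c, f): sum(psi2[b, c, f] * m12[b] for b in S)
       for c, f in product(S, S)}
m34 = {(c, g): sum(psi3[c, f, g] * m23[c, f] for f in S)
       for c, g in product(S, S)}
m45 = {(c, h): sum(psi4[c, d, g, h] * m34[c, g] for d, g in product(S, S))
       for c, h in product(S, S)}

# C5 has a single neighbor, so its belief is its potential times m45.
belief5 = normalize({(c, e, h): psi5[c, e, h] * m45[c, h]
                     for c, e, h in product(S, repeat=3)})

# Brute-force marginal over (c, e, h) for comparison.
exact5 = {}
for a, b, c, d, e, f, g, h in product(S, repeat=8):
    w = (psi1[a, b] * psi2[b, c, f] * psi3[c, f, g]
         * psi4[c, d, g, h] * psi5[c, e, h])
    exact5[c, e, h] = exact5.get((c, e, h), 0.0) + w
exact5 = normalize(exact5)
```

Marginals of the other cliques would additionally need the backward messages $m_{54}, m_{43}, \dots$, which are omitted here.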
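Variational free energy
To find the best approximation $Q$ to $P(x) = \frac{1}{Z} \prod_{c \in \text{cliques}} \phi_c(x_c)$, minimize
$$KL(Q\|P) = \sum_x Q(x) \ln Q(x) - \sum_x Q(x) \ln P(x) = \ln Z + U[Q] - H[Q]$$
where
I $H[Q] = -\sum_x Q(x) \ln Q(x)$ is the entropy of $Q$
I $U[Q] = -\sum_{c \in \text{cliques}} \sum_{x_c} Q(x_c) \ln \phi_c(x_c)$ is called the average energy
$$\Longrightarrow \min_Q KL(Q\|P) = \ln Z + \min_Q \underbrace{\left( U[Q] - H[Q] \right)}_{\text{variational free energy } F[Q]}$$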
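The decomposition of the KL divergence into $\ln Z$ plus average energy minus entropy can be verified on a toy model. The sketch below assumes three binary variables, two hypothetical pairwise clique potentials with random values, and an arbitrary trial distribution `Q`; none of these come from the slides.

```python
import math
import random
from itertools import product

random.seed(2)
S = (0, 1)  # assumed binary state space

# Hypothetical clique potentials phi_12(x1, x2) and phi_23(x2, x3).
phi12 = {k: random.random() + 0.1 for k in product(S, S)}
phi23 = {k: random.random() + 0.1 for k in product(S, S)}

states = list(product(S, repeat=3))
unnorm = {x: phi12[x[0], x[1]] * phi23[x[1], x[2]] for x in states}
Z = sum(unnorm.values())
P = {x: unnorm[x] / Z for x in states}

# Arbitrary strictly positive trial distribution Q.
w = {x: random.random() + 0.1 for x in states}
total = sum(w.values())
Q = {x: w[x] / total for x in states}

entropy = -sum(Q[x] * math.log(Q[x]) for x in states)
# Average energy: -sum_x Q(x) sum_c ln phi_c(x_c), written over joint states.
avg_energy = -sum(Q[x] * math.log(unnorm[x]) for x in states)
free_energy = avg_energy - entropy

kl = sum(Q[x] * math.log(Q[x] / P[x]) for x in states)
```

Since the KL divergence is non-negative, the same identity shows that the free energy of any `Q` is at least $-\ln Z$, with equality at `Q = P`.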
Variational Free energy
I Two solution methods for $\min_Q F[Q]$:
I Approximate $F[Q]$
  I Region-based approximation $\Longrightarrow F_R(q_R)$
I Choose a simpler form of $Q$
  I Mean field approximation $\Longrightarrow Q = \prod_i q_i$
Region Based free energy
I We decompose the system into subsystems and then approximate the free energy by combining the free energies of the subsystems.
I Group nodes into (possibly overlapping) clusters, called regions.
I Each region must include every variable node connected to any factor node it contains.
I The sets of nodes {1, 2} and {B, C, 2, 3, 4} could be regions.
I {B, 3} could not be a region, because factor B is also attached to variables 2 and 4.
Region Based free energy
I The overall free energy is the sum of the free energies of all the regions.
I If some of the large regions overlap, subtract the free energies of the overlap regions.
I Each factor node and variable node should be counted exactly once.
I For every factor node $a$ and every variable node $i$, the counting numbers $c_R$ of the regions in $\mathcal{R}$ satisfy
$$\sum_{R \in \mathcal{R}} c_R\, \mathbb{I}(a \in F_R) = \sum_{R \in \mathcal{R}} c_R\, \mathbb{I}(i \in V_R) = 1$$
where $\mathbb{I}(x \in S) = 1$ if $x \in S$ and $0$ otherwise.
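Region Based free energy
I The region-based free energy for a set of regions $\mathcal{R}$ is
$$F_{\mathcal{R}}(\{b_R\}) = U_{\mathcal{R}}(\{b_R\}) - H_{\mathcal{R}}(\{b_R\})$$
I Count every node once.
I $U_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R\, U_R(b_R) \Longrightarrow$ region-based average energy
I $H_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R\, H_R(b_R) \Longrightarrow$ region-based approximate entropy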
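Region Based free energy
If
$$\sum_{R \in \mathcal{R}} c_R\, \mathbb{I}(a \in F_R) = 1 \quad \text{for all } a \in F$$
and $b_R(x_R) = p_R(x_R)$, then the average energy becomes exact:
$$U_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R\, U_R(b_R) = -\sum_{R \in \mathcal{R}} c_R \sum_{x_R} b_R(x_R) \sum_{a \in F_R} \ln f_a(x_a)$$
$$\text{Exact energy} \Rightarrow U = \sum_{x \in S} p(x) E(x) = -\sum_a \sum_{x_a} p_a(x_a) \ln f_a(x_a)$$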
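Region Based free energy
I Counting each variable node and factor node exactly once makes the average energy exact.
I However, the region-based entropy is still an approximation:
$$H_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R\, H_R(b_R) = -\sum_{R \in \mathcal{R}} c_R \sum_{x_R} b_R(x_R) \ln b_R(x_R)$$
I We are interested in the accuracy of $H_{\mathcal{R}}(\{b_R\})$ near its maximum.
$$\min_{\{b_R\}} F_{\mathcal{R}}(\{b_R\}) = \min_{\{b_R\}} \left\{ U_{\mathcal{R}}(\{b_R\}) - H_{\mathcal{R}}(\{b_R\}) \right\}$$
I $H_{\mathcal{R}}(\{b_R\})$ should achieve its maximum when all beliefs $b_R(x_R)$ are uniform (the "maxent-normal" property).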
Bethe Free energy
Regions are $\mathcal{R} = \{R_i, R_a : i \in V, a \in F\}$
I $R_i = (\{i\}, \emptyset, \emptyset)$
I $R_a = (N(a), \{a\}, \{(i, a) : i \in N(a)\})$
I Large regions: each contains a single factor node $a$ and all attached variable nodes; $c_R = 1$
I Small regions: each contains a single variable node $i$; $c_R = 1 - d_i$, where $d_i = |N(i)|$
I $R_1$ is a subregion of $R_2$ if $R_1 \subset R_2$
Bethe Free energy
I Bethe region graph for the following factor graph
Bethe Free energy
I Assigning counting numbers to the regions:
  I $c_R = 1$ for each large region $R_a$
  I $c_R = 1 - d_i$ for each small region $R_i$
I Every variable node and factor node is counted once: variable $i$ appears in $d_i$ large regions and in its own small region, so its total count is $d_i + (1 - d_i) = 1$.
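The counting rule is simple to check mechanically. The sketch below assumes the factor graph of the earlier message-passing example, $f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4)$; the dictionary layout and variable names are illustrative choices.

```python
# Factor graph assumed from the message-passing example:
# fA(x1, x2), fB(x2, x3, x4), fC(x4).
factors = {"A": [1, 2], "B": [2, 3, 4], "C": [4]}

# d_i = number of factors attached to variable i.
degree = {}
for a, variables in factors.items():
    for i in variables:
        degree[i] = degree.get(i, 0) + 1

# Bethe counting numbers.
c_large = {a: 1 for a in factors}                  # one large region per factor
c_small = {i: 1 - d for i, d in degree.items()}    # one small region per variable

# Total count of each factor node: it lies only in its own large region.
factor_count = {a: c_large[a] for a in factors}

# Total count of each variable node: its small region plus every
# large region whose factor touches it.
variable_count = {
    i: c_small[i] + sum(c_large[a] for a, vs in factors.items() if i in vs)
    for i in degree
}
```

Variables 2 and 4 each sit in two large regions, so their small regions receive counting number $-1$, which removes the double count.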
Bethe Free energy
I Bethe free energy:
$$F_{Bethe} = U_{Bethe} - H_{Bethe}$$
I Bethe average energy:
$$U_{Bethe} = -\sum_a \sum_{x_a} b_a(x_a) \ln f_a(x_a)$$
I Bethe entropy:
$$H_{Bethe} = -\sum_a \sum_{x_a} b_a(x_a) \ln b_a(x_a) + \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \ln b_i(x_i)$$
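On a tree-structured factor graph the Bethe free energy evaluated at the exact marginals equals $-\ln Z$, which gives a convenient sanity check of the formulas. The sketch below assumes the tree $f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4)$ with binary variables and random placeholder factor values; `scopes`, `marginal`, and the other names are hypothetical.

```python
import math
import random
from itertools import product

random.seed(3)
S = (0, 1)  # assumed binary state space

# Factor scopes as zero-based positions in x = (x1, x2, x3, x4).
scopes = {"A": (0, 1), "B": (1, 2, 3), "C": (3,)}
f = {a: {k: random.random() + 0.1 for k in product(S, repeat=len(sc))}
     for a, sc in scopes.items()}

states = list(product(S, repeat=4))


def weight(x):
    """Unnormalized probability: the product of all factors at state x."""
    return math.prod(f[a][tuple(x[i] for i in sc)] for a, sc in scopes.items())


Z = sum(weight(x) for x in states)


def marginal(idx):
    """Exact marginal over the variables at positions idx."""
    out = {}
    for x in states:
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + weight(x) / Z
    return out


b_fac = {a: marginal(sc) for a, sc in scopes.items()}   # b_a(x_a)
b_var = {i: marginal((i,)) for i in range(4)}           # b_i(x_i)
deg = {i: sum(i in sc for sc in scopes.values()) for i in range(4)}


def sum_b_ln_b(t):
    return sum(v * math.log(v) for v in t.values())


U_bethe = -sum(b_fac[a][k] * math.log(f[a][k]) for a in scopes for k in f[a])
H_bethe = (-sum(sum_b_ln_b(b_fac[a]) for a in scopes)
           + sum((deg[i] - 1) * sum_b_ln_b(b_var[i]) for i in range(4)))
F_bethe = U_bethe - H_bethe
```

On graphs with cycles the same code still runs, but `F_bethe` is then only an approximation to $-\ln Z$.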
Bethe Free energy - Maxent normal
I The global maximum of the Bethe entropy is achieved when the beliefs $b_i(x_i)$, $b_a(x_a)$ are uniform.
$$H_{Bethe} = \sum_i H(b_i) - \sum_a I(b_a)$$
where
$$H(b_i) = -\sum_{x_i} b_i(x_i) \ln b_i(x_i)$$
$$I(b_a) = \sum_{x_a} b_a(x_a) \ln b_a(x_a) + \sum_{i \in N(a)} H(b_i)$$
I The maximum of $H(b_i)$ is achieved when $b_i(x_i)$ is the uniform distribution.
I $I(b_a) \ge 0$, and $I(b_a) = 0$ when the beliefs are uniform.
Constrained Bethe free energy
The constrained Bethe free energy requires the beliefs to obey:
I Normalization constraints:
$$\sum_{x_i} b_i(x_i) = 1, \qquad \sum_{x_a} b_a(x_a) = 1$$
I Consistency constraints:
$$\sum_{x_a \setminus x_i} b_a(x_a) = b_i(x_i)$$
I Inequality constraints, which are inactive ($\Rightarrow$ complementary slackness):
$$0 \le b_i(x_i) \le 1, \qquad 0 \le b_a(x_a) \le 1$$
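Minimizing Constrained Bethe free energy
Theorem: Stationary points of the constrained Bethe free energy are BP fixed points.
$$\begin{aligned}
\underset{b}{\text{minimize}} \quad & F_{Bethe} \\
\text{subject to} \quad & \sum_{x_i} b_i(x_i) = 1 \\
& \sum_{x_a} b_a(x_a) = 1 \\
& \sum_{x_a \setminus x_i} b_a(x_a) = b_i(x_i) \\
& b_a(x_a),\; b_i(x_i) \ge 0
\end{aligned}$$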
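Minimizing Constrained Bethe free energy
I Lagrangian:
$$L = F_{Bethe} + \sum_i \gamma_i \Big\{ 1 - \sum_{x_i} b_i(x_i) \Big\} + \sum_a \sum_{i \in N(a)} \sum_{x_i} \lambda_{ai}(x_i) \Big\{ b_i(x_i) - \sum_{x_a \setminus x_i} b_a(x_a) \Big\}$$
I $\frac{\partial L}{\partial b_i(x_i)} = 0 \Longrightarrow b_i(x_i) \propto \exp\Big( \frac{1}{d_i - 1} \Big\{ 1 - \gamma_i + \sum_{a \in N(i)} \lambda_{ai}(x_i) \Big\} \Big)$
I $\frac{\partial L}{\partial b_a(x_a)} = 0 \Longrightarrow b_a(x_a) \propto \exp\Big( -E_a(x_a) + \sum_{i \in N(a)} \lambda_{ai}(x_i) \Big)$
where $E_a(x_a) = -\ln f_a(x_a)$ and constant factors are absorbed by the normalization constraints.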
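Bethe Fixed points
Define
$$\lambda_{ai}(x_i) = \ln \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i)$$
Obtain the BP equations:
$$b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)$$
$$b_a(x_a) \propto f_a(x_a) \prod_{i \in N(a)} \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i)$$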
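A short worked step shows why this choice of multipliers yields the variable belief. The sketch assumes the stationarity condition for $b_i$ in its proportional form, $b_i(x_i) \propto \exp\big(\frac{1}{d_i - 1} \sum_{a \in N(i)} \lambda_{ai}(x_i)\big)$, with all constants absorbed into the normalization, and $d_i \ge 2$.

```latex
\begin{align*}
\sum_{a \in N(i)} \lambda_{ai}(x_i)
  &= \sum_{a \in N(i)} \sum_{b \in N(i) \setminus a} \ln m_{b \to i}(x_i) \\
  &= (d_i - 1) \sum_{b \in N(i)} \ln m_{b \to i}(x_i)
  && \text{(each $b$ is skipped only when $a = b$)} \\
b_i(x_i)
  &\propto \exp\Big( \frac{1}{d_i - 1} \sum_{a \in N(i)} \lambda_{ai}(x_i) \Big)
   = \prod_{b \in N(i)} m_{b \to i}(x_i)
\end{align*}
```

The factor belief follows in the same way by substituting $\lambda_{ai}$ into $b_a(x_a) \propto f_a(x_a) \exp\big(\sum_{i \in N(a)} \lambda_{ai}(x_i)\big)$.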
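Unrealizable beliefs
I $b_A(x_1, x_2) = \begin{pmatrix} 0.4 & 0.1 \\ 0.1 & 0.4 \end{pmatrix}$
I $b_B(x_2, x_3) = \begin{pmatrix} 0.4 & 0.1 \\ 0.1 & 0.4 \end{pmatrix}$
I $b_C(x_1, x_3) = \begin{pmatrix} 0.1 & 0.4 \\ 0.4 & 0.1 \end{pmatrix}$
I $b_1(x_1) = b_2(x_2) = b_3(x_3) = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}$
I There is no joint distribution $b(x_1, x_2, x_3)$ with these marginals!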
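One way to see that no joint distribution exists is a union-bound argument: in any joint, $x_1 \ne x_3$ forces $x_1 \ne x_2$ or $x_2 \ne x_3$, so $P(x_1 \ne x_3) \le P(x_1 \ne x_2) + P(x_2 \ne x_3)$. The sketch below checks that the pairwise beliefs are locally consistent with the single-variable beliefs and yet violate this bound. The argument and the helper names are an illustration, not taken from the slides.

```python
bA = [[0.4, 0.1], [0.1, 0.4]]   # b_A[x1][x2]
bB = [[0.4, 0.1], [0.1, 0.4]]   # b_B[x2][x3]
bC = [[0.1, 0.4], [0.4, 0.1]]   # b_C[x1][x3]


def marginals(t):
    """Row and column sums of a 2x2 table."""
    rows = [sum(r) for r in t]
    cols = [t[0][c] + t[1][c] for c in range(2)]
    return rows, cols


# Local consistency: every pairwise belief marginalizes to (0.5, 0.5).
locally_consistent = all(
    abs(m - 0.5) < 1e-12
    for t in (bA, bB, bC)
    for side in marginals(t)
    for m in side
)


def prob_differ(t):
    """Probability that the two variables of a pairwise belief disagree."""
    return t[0][1] + t[1][0]


# Necessary condition for a joint distribution to exist (union bound).
lhs = prob_differ(bC)                       # P(x1 != x3)
rhs = prob_differ(bA) + prob_differ(bB)     # P(x1 != x2) + P(x2 != x3)
joint_could_exist = lhs <= rhs + 1e-12
```

The beliefs satisfy every normalization and consistency constraint, which is why constrained minimization can return them even though they are not the marginals of any distribution.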
Region based energy
I How do we select a set of regions $\mathcal{R}$ and counting numbers $c_R$?
I Some methods are:
  I Bethe method
  I Junction graph method
  I Cluster variation method
  I Region graph method
Region Graph
I A region graph is a directed acyclic graph in which an edge $R \to R'$ implies $R' \subseteq R$.
I If there is a directed path from $R$ to $R'$, we say $R$ is an ancestor of $R'$, $R \in A(R')$, and $R'$ is a descendant of $R$, $R' \in D(R)$.
I In a region graph the counting numbers satisfy
$$c_R = 1 - \sum_{R' \in A(R)} c_{R'} \quad \text{for all } R \in \mathcal{R}$$
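The recursion for the counting numbers can be evaluated directly from the parent structure of the graph. The sketch below assumes a hypothetical two-level region graph, the Bethe region graph of $f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4)$, stored as a `child -> parents` dictionary; the region labels are made up for the illustration.

```python
from functools import lru_cache

# Hypothetical region graph: each region maps to the list of its parents.
parents = {
    "A12": [], "B234": [], "C4": [],          # large regions (no parents)
    "v1": ["A12"],
    "v2": ["A12", "B234"],
    "v3": ["B234"],
    "v4": ["B234", "C4"],
}


def ancestors(region):
    """All regions from which `region` can be reached along directed edges."""
    seen = set()
    stack = list(parents[region])
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


@lru_cache(maxsize=None)
def counting_number(region):
    # c_R = 1 - sum of the counting numbers of all ancestors of R.
    return 1 - sum(counting_number(a) for a in ancestors(region))


c = {r: counting_number(r) for r in parents}
```

For this two-level graph the recursion reproduces the Bethe values $c_R = 1$ for large regions and $c_R = 1 - d_i$ for small ones; deeper region graphs are handled by the same recursion.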
Region Graph Condition
I Every node is counted once:
$$\sum_{R \in \mathcal{R}} c_R\, \mathbb{I}(a \in F_R) = \sum_{R \in \mathcal{R}} c_R\, \mathbb{I}(i \in V_R) = 1$$
$\Rightarrow$ ensures that the region graph average energy is exact.
I The regions containing a particular variable node form a connected subgraph $\Rightarrow$ the marginal probabilities are consistent.
Example of an invalid region graph
I This is not a valid region graph: variable 5 is not counted exactly once.
Example of valid region graph
I Bethe region graph for the following factor graph
Region graph
$$c_R = 1 - \sum_{R' \in A(R)} c_{R'} \quad \text{for all } R \in \mathcal{R}$$
Region graph
I Valid region graph (every node is counted once)
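Generalized Belief Propagation
I Theorem: The stationary points of the constrained region-based free energy for a valid region graph are the fixed points of generalized belief propagation for that region graph.
$$\text{Stationary points of } F_{\mathcal{R}}(\{b_R\}) = \sum_{R \in \mathcal{R}} c_R\, F_R(b_R)$$
subject to
$$\sum_{x_R} b_R(x_R) = 1 \quad \text{for all } R \in \mathcal{R}$$
$$\sum_{x_P \setminus x_C} b_P(x_P) = b_C(x_C) \quad \text{for all parent regions } P \text{ and child regions } C \text{ in } \mathcal{R}$$
$$b_R(x_R) \ge 0$$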
Generalized Belief Propagation
I The belief in a region is the product of:
  I Local information (factors in the region)
  I Messages from parent regions
  I Messages into descendant regions from parents that are neither the region itself nor its descendants
I Message update rules are obtained by enforcing the marginalization constraints.
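Generalized Belief Propagation
Belief in a region is:
$$b_R(x_R) \propto \prod_{a \in A_R} f_a(x_a) \times \underbrace{\Big( \prod_{P \in P(R)} m_{P \to R}(x_R) \Big)}_{\text{messages from parent regions}} \times \underbrace{\Big( \prod_{D \in D(R)} \prod_{P' \in P(D) \setminus \varepsilon(R)} m_{P' \to D}(x_D) \Big)}_{\text{messages into descendant regions from parents that are not descendants}}$$
where $\varepsilon(R)$ denotes the region $R$ together with all of its descendants.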
Generalized Belief Propagation
I Bethe region graph for the following graph
[2] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Constructing free-energy approximations and generalized belief propagation algorithms," 2005
Generalized Belief propagation
Use marginalization constraints to derive message-update rules
Thanks
Questions?