
Should We Believe in Numbers Computed by Loopy Belief Propagation?

Pascal O. Vontobel
Information Theory Research Group
Hewlett-Packard Laboratories, Palo Alto

CIOG Workshop, Princeton University, November 3, 2011

© 2011 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice.

Chess Board

Question: in how many ways can we place 8 non-attacking rooks on a chess board?

Row condition: exactly one rook per row.

Column condition: exactly one rook per column.

Question: in how many ways can we place 8 non-attacking rooks on this modified chess board (a board on which some squares are forbidden)?

Sudoku

[Figure: a 9×9 Sudoku array, partitioned into nine 3×3 sub-blocks and filled with the digits 1, ..., 9.]

Row condition: numbers 1, ..., 9 appear exactly once.

Column condition: numbers 1, ..., 9 appear exactly once.

Sub-block condition: numbers 1, ..., 9 appear exactly once.

Question: how many Sudoku arrays are there?
(More technically: how many valid configurations are there?)

Other Sudoku Setups

[Figures: variants of the standard Sudoku setup.]

1D constraints in communications

RLL Constraints

[Figure: a binary sequence such as 0 1 0 1 0 0 0 1 0 0 1 ..., with the runs of zeros between ones highlighted.]

A (d, k) RLL constraint imposes:

At least d zero symbols between two ones.

At most k zero symbols between two ones.

Question: how many sequences of length T fulfill these constraints?

Answer: typically, the answer to such questions looks like

N(T) = exp(C · T + o(T)).

C: “capacity” or “entropy.”
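To make the growth rate concrete, here is a minimal sketch (our own illustration, not from the talk) that counts (d, k)-RLL sequences by dynamic programming over the length of the current zero-run; the normalized log-count log(N(T))/T then stabilizes, as N(T) = exp(C · T + o(T)) predicts. The boundary convention (the sequence starts as if a one had just occurred) is an assumption and only affects the o(T) term.

```python
# Hypothetical helper, not from the talk: count (d, k)-RLL sequences of length T.
import math

def count_rll(d, k, T):
    # State r = number of zeros emitted since the last one (0 <= r <= k).
    counts = {0: 1}
    for _ in range(T):
        nxt = {}
        for r, c in counts.items():
            if r < k:                          # may emit another zero
                nxt[r + 1] = nxt.get(r + 1, 0) + c
            if r >= d:                         # may emit a one
                nxt[0] = nxt.get(0, 0) + c
        counts = nxt
    return sum(counts.values())

# The normalized log-count approaches the capacity C of the constraint.
for T in (25, 50, 100):
    print(T, math.log(count_rll(1, 3, T)) / T)
```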

2D constraints in communications

Two-Dimensional RLL Constraints

A (d1, k1; d2, k2) RLL constraint imposes the one-dimensional (d, k) run-length conditions along every row and along every column (with parameters (d1, k1) and (d2, k2), respectively).

[Figure: a two-dimensional 0/1 array satisfying these constraints.]

Question: how many arrays of size m × n fulfill these constraints?

Answer: typically, the answer to such questions looks like

N(m, n) = exp(C · mn + o(mn)).

C: “capacity” or “entropy.”

Towards a graphical model

Towards a Graphical Model

Question: in how many ways can we place 8 non-attacking rooks on a chess board?

We introduce one indicator variable per square: A_{1,1}, A_{1,2}, ..., A_{1,8}, A_{2,1}, ..., A_{8,8}.

Row function nodes:

g_row,i(a_{i,1}, ..., a_{i,8}) ≜ 1 if row i contains exactly one rook, and 0 otherwise.

Column function nodes:

g_col,j(a_{1,j}, ..., a_{8,j}) ≜ 1 if column j contains exactly one rook, and 0 otherwise.

[Figure: the factor graph built up step by step; every variable node A_{i,j} is connected to the row function node g_row,i and to the column function node g_col,j.]

Global function:

g(a_{1,1}, ..., a_{8,8}) = ∏_j g_col,j(a_{1,j}, ..., a_{8,j}) × ∏_i g_row,i(a_{i,1}, ..., a_{i,8})

Total sum (partition function):

Z = ∑_{a_{1,1}, ..., a_{8,8}} g(a_{1,1}, ..., a_{8,8})

Use of loopy belief propagation for approximating Z?

Some considerations on counting algorithms

Coloring the Surfaces of a Closed Strip

[Figure sequence: counting the colorings of the surfaces of closed strips; the slides show counts of 4 and 2 for the two strips, together with (4 + 2)/2 = 3.]

The permanent of a matrix

Determinant vs. Permanent of a Matrix

Consider the matrix

θ = [ θ11 θ12 θ13 ]
    [ θ21 θ22 θ23 ]
    [ θ31 θ32 θ33 ].

The determinant of θ:

det(θ) = + θ11θ22θ33 + θ12θ23θ31 + θ13θ21θ32
         − θ11θ23θ32 − θ12θ21θ33 − θ13θ22θ31.

The permanent of θ:

perm(θ) = + θ11θ22θ33 + θ12θ23θ31 + θ13θ21θ32
          + θ11θ23θ32 + θ12θ21θ33 + θ13θ22θ31.

The determinant of an n × n matrix θ:

det(θ) = ∑_σ sgn(σ) ∏_{i∈[n]} θ_{i,σ(i)},

where the sum is over all n! permutations σ of the set [n] ≜ {1, ..., n}.

The permanent of an n × n matrix θ:

perm(θ) = ∑_σ ∏_{i∈[n]} θ_{i,σ(i)}.

The permanent turns up in a variety of contexts, especially in combinatorial problems, statistical physics (partition function), ...
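The definition translates directly into code. A minimal sketch (our own, not from the talk; the function name is ours) that sums over all n! permutations, i.e., the O(n · n!) brute-force approach mentioned on the next slide:

```python
# Brute-force permanent straight from perm(theta) = sum_sigma prod_i theta[i][sigma(i)].
import itertools
import math

def permanent_bruteforce(theta):
    n = len(theta)
    total = 0
    for sigma in itertools.permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= theta[i][sigma[i]]
        total += prod
    return total

# Sanity checks: the 2x2 all-ones matrix has permanent 2, and the number of
# non-attacking rook placements on the full 8x8 board is perm(J_8) = 8! = 40320.
assert permanent_bruteforce([[1, 1], [1, 1]]) == 2
assert permanent_bruteforce([[1] * 8 for _ in range(8)]) == math.factorial(8)
```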

Exactly Computing the Permanent

Brute-force computation:

O(n · n!) = O(n^{3/2} · (n/e)^n) arithmetic operations.

Ryser's algorithm:

Θ(n · 2^n) arithmetic operations.

Complexity class [Valiant, 1979]: #P (“sharp P” or “number P”), where #P is the set of the counting problems associated with the decision problems in the set NP. (Note that even the computation of the permanent of zero-one matrices is #P-complete.)
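The talk only cites Ryser's Θ(n · 2^n) operation count; as a hedged illustration, here is one standard way to implement Ryser's inclusion–exclusion formula perm(A) = (−1)^n ∑_{S⊆[n]} (−1)^{|S|} ∏_i ∑_{j∈S} A[i][j] (this particular code is ours):

```python
# Ryser's formula: still exponential in n, but far cheaper than the n! brute force.
from itertools import combinations

def permanent_ryser(a):
    n = len(a)
    total = 0
    for r in range(1, n + 1):                  # subsets S of columns, by size
        for cols in combinations(range(n), r):
            prod = 1
            for i in range(n):                 # prod_i sum_{j in S} a[i][j]
                prod *= sum(a[i][j] for j in cols)
            total += (-1) ** r * prod
    return (-1) ** n * total

assert permanent_ryser([[1, 1], [1, 1]]) == 2
assert permanent_ryser([[1] * 8 for _ in range(8)]) == 40320   # 8!
```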

Estimating the Permanent

More efficient algorithms are possible if one does not want to compute the permanent of a matrix exactly.

For a matrix that contains positive and negative entries:
→ “constructive and destructive interference of terms in the summation.”

For a matrix that contains only non-negative entries:
→ “constructive interference of terms in the summation.”

FROM NOW ON: we focus on the case where all entries of the matrix are non-negative, i.e., θij ≥ 0 for all i, j.

Markov chain Monte Carlo based methods: [Broder, 1986], ...

Godsil-Gutman formula based methods: [Karmarkar et al., 1993], [Barvinok, 1997ff.], [Chien, Rasmussen, Sinclair, 2004], ...

Fully polynomial-time randomized approximation schemes (FPRAS): [Jerrum, Sinclair, Vigoda, 2004], ...

Bethe-approximation-based / sum-product-algorithm-based methods: [Chertkov et al., 2008], [Huang and Jebara, 2009], ...

Estimating the Permanent of a Matrix

[Figures: empirical comparisons of permanent estimates, from [Huang/Jebara, 2009].]

Valid Rook Configurations and Permanents

Number of valid rook configurations on the full board

= perm of
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

= ∑_σ ∏_{i∈[8]} θ_{i,σ(i)}.

For a modified board, forbidden squares correspond to zero entries, e.g.

= perm of
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 0 0 1 1 1
1 1 1 0 0 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

or

= perm of
1 0 1 1 0 1 1 1
1 1 1 0 0 0 1 0
1 1 0 1 1 0 1 1
0 0 1 1 0 1 0 1
0 1 0 1 1 1 0 1
1 1 1 0 1 0 0 1
1 1 1 1 1 0 1 1
1 1 0 1 1 1 1 1

= ∑_σ ∏_{i∈[8]} θ_{i,σ(i)}.

Perfect Matchings and Permanents

Number of valid perfect matchings (in a bipartite graph on 8 + 8 vertices)

= perm of the corresponding 8×8 0/1 matrix (all-ones for the complete bipartite graph; missing edges correspond to zero entries, exactly as for the rook configurations above)

= ∑_σ ∏_{i∈[8]} θ_{i,σ(i)}.

More generally, if edge (i, j) carries the weight θ_{i,j}, then the total sum of weighted perfect matchings

= perm of
θ11 θ12 · · · θ18
 ⋮           ⋮
θ81 θ82 · · · θ88

= ∑_σ ∏_{i∈[8]} θ_{i,σ(i)}.

Graphical Model for Permanent

[Figure: the factor graph from before, with variable nodes A_{1,1}, ..., A_{8,8}, row function nodes g_row,1, ..., g_row,8, and column function nodes g_col,1, ..., g_col,8; the function nodes are suitably defined based on θ. In the later drawings the variable nodes are omitted.]

Global function:

g(a_{1,1}, ..., a_{8,8}) = ∏_j g_col,j(a_{1,j}, ..., a_{8,j}) × ∏_i g_row,i(a_{i,1}, ..., a_{i,8})

Permanent:

perm(θ) = Z = ∑_{a_{1,1}, ..., a_{8,8}} g(a_{1,1}, ..., a_{8,8})

Many short cycles.

The vertex degrees are high.

Both facts might suggest that the application of the sum-product algorithm to this factor graph is rather problematic.

However, luckily this is not the case.

For an SPA suitability assessment, the overall cycle structure and the types of function nodes are at least as important.

Factor graphs and the sum-product algorithm

Factor Graphs

A factor graph can be used to represent a multivariate function, e.g., f(x1, x2, x3).

[Figure: a single function node f (filled square) connected to variable nodes X1, X2, X3 (empty circles).]

Variable nodes: for each variable we draw a variable node (empty circles).

Function nodes: for each function we draw a function node (filled squares).

Edges: there is an edge between a variable node and a function node if the corresponding variable is an argument of the corresponding function.

Bipartite graph: the resulting graph is a bipartite graph, i.e., there are only edges between vertices of different types.

Factor Graphs

General references for factor graphs are:

F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. on Inform. Theory, IT-47, Feb. 2001.

H.-A. Loeliger, “An introduction to factor graphs,” IEEE Signal Processing Magazine, Jan. 2004.

Factor Graphs

We assume that we know more about the internal structure of the function f(x1, x2, x3), e.g.,

f(x1, x2, x3) = fA(x1, x2) · fB(x2, x3).

Then we can take advantage of this fact, and the factor graph represents this structure.

[Figure: variable nodes X1, X2, X3; fA connected to X1 and X2; fB connected to X2 and X3.]

f(·, ·, ·) is called the global function.

fA(·, ·) and fB(·, ·) are called local functions.

Factor Graphs

f(x1, x2, x3, x4, x5) = fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5)

[Figure: two drawings of the corresponding factor graph, with variable nodes X1, ..., X5 and function nodes fA, fB, fC, fD, fE.]

Note: one and the same function can be represented by graphs with different structures; some are more pleasing than others.

Factor Graph of an LDPC Code

In the context of channel coding, we usually take a factor graph to represent the factorization of the joint pmf/pdf of all occurring variables, i.e., uncoded symbols, coded symbols, and received symbols. Here it is shown when using a quasi-cyclic repeat-accumulate LDPC code (a binary [44, 22, 8] linear code).

[Figure: the code's factor graph; the legend distinguishes info-bit, channel-bit, info-/channel-bit, XOR-function, and channel-function nodes.]

The Sum-Product Algorithm

Let us consider again the following factor graph (which is a tree).

[Figure: the tree factor graph with variable nodes X1, ..., X5 and function nodes fA, fB, fC, fD, fE.]

The global function is

f(x1, x2, x3, x4, x5) = fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5).

The Sum-Product Algorithm

Very often one wants to calculate marginal functions, e.g.,

η_X1(x1) = ∑_{x2,x3,x4,x5} f(x1, x2, x3, x4, x5)
         = ∑_{x2,x3,x4,x5} fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5),

η_X3(x3) = ∑_{x1,x2,x4,x5} f(x1, x2, x3, x4, x5)
         = ∑_{x1,x2,x4,x5} fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5),

etc.

The Sum-Product Algorithm

η_X1(x1) = ∑_{x2} ∑_{x3} ∑_{x4} ∑_{x5} fA(x1) · fB(x2) · fC(x1, x2, x3) · fD(x3, x4) · fE(x3, x5)

= fA(x1) · ∑_{x2} ∑_{x3} fC(x1, x2, x3) · fB(x2) · [∑_{x4} fD(x3, x4) · 1] · [∑_{x5} fE(x3, x5) · 1].

On the slides, the factors of this expression are labeled (via underbraces) with the following intermediate quantities, from the inside out:

µ_{X4→fD}(x4) = 1 and µ_{X5→fE}(x5) = 1 (the constant messages from the leaf variable nodes X4 and X5),

µ_{fD→X3}(x3) = ∑_{x4} fD(x3, x4) · µ_{X4→fD}(x4),

µ_{fE→X3}(x3) = ∑_{x5} fE(x3, x5) · µ_{X5→fE}(x5),

µ_{X3→fC}(x3) = µ_{fD→X3}(x3) · µ_{fE→X3}(x3),

µ_{fB→X2}(x2) = fB(x2) and µ_{X2→fC}(x2) = µ_{fB→X2}(x2),

µ_{fC→X1}(x1) = ∑_{x2} ∑_{x3} fC(x1, x2, x3) · µ_{X2→fC}(x2) · µ_{X3→fC}(x3),

µ_{fA→X1}(x1) = fA(x1),

so that η_X1(x1) = µ_{fA→X1}(x1) · µ_{fC→X1}(x1).

The objects µ_{Xi→fj}(xi) and µ_{fj→Xi}(xi):

They are called messages.

They can be associated with the edge between the vertex Xi and the vertex fj.

They are functions of xi, i.e., their domain is the alphabet of Xi.

Note: similar manipulations can be performed for calculating η_X2(x2), η_X3(x3), η_X4(x4), and η_X5(x5).
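As a concrete cross-check, here is a minimal sketch of this message-passing computation on the same tree; the particular local functions fA, ..., fE are arbitrary values chosen by us, and the final assertion verifies the SPA marginal against the brute-force sum:

```python
# Sum-product messages toward X1 on the tree f = fA(x1) fB(x2) fC(x1,x2,x3) fD(x3,x4) fE(x3,x5).
import itertools

fA = {0: 1.0, 1: 2.0}
fB = {0: 1.0, 1: 3.0}
fC = {(x1, x2, x3): 1.0 + x1 + x2 * x3 for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1)}
fD = {(x3, x4): 1.0 + x3 * x4 for x3 in (0, 1) for x4 in (0, 1)}
fE = {(x3, x5): 2.0 - x3 * x5 for x3 in (0, 1) for x5 in (0, 1)}

mu_X4_fD = {x4: 1.0 for x4 in (0, 1)}               # leaf variable node -> function node
mu_X5_fE = {x5: 1.0 for x5 in (0, 1)}
mu_fD_X3 = {x3: sum(fD[x3, x4] * mu_X4_fD[x4] for x4 in (0, 1)) for x3 in (0, 1)}
mu_fE_X3 = {x3: sum(fE[x3, x5] * mu_X5_fE[x5] for x5 in (0, 1)) for x3 in (0, 1)}
mu_X3_fC = {x3: mu_fD_X3[x3] * mu_fE_X3[x3] for x3 in (0, 1)}
mu_X2_fC = dict(fB)                                 # = mu_fB_X2, passed through X2
mu_fC_X1 = {x1: sum(fC[x1, x2, x3] * mu_X2_fC[x2] * mu_X3_fC[x3]
                    for x2 in (0, 1) for x3 in (0, 1)) for x1 in (0, 1)}

eta_X1 = {x1: fA[x1] * mu_fC_X1[x1] for x1 in (0, 1)}   # marginal at X1

# Verify against the brute-force marginalization of the global function.
for x1 in (0, 1):
    brute = sum(fA[x1] * fB[x2] * fC[x1, x2, x3] * fD[x3, x4] * fE[x3, x5]
                for x2, x3, x4, x5 in itertools.product((0, 1), repeat=4))
    assert abs(eta_X1[x1] - brute) < 1e-9
```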

The Sum-Product Algorithm

Messages necessary for calculating η_X1(x1).

[Figure sequence: the tree factor graph, with the messages accumulating step by step from the leaf nodes toward X1.]

The Sum-Product Algorithm

Messages necessary for calculating η_X3(x3).

[Figure: the same tree, now with the messages flowing toward X3.]

The Sum-Product Algorithm

The figure shows the messages that are necessary for calculating η_X1(x1), η_X2(x2), η_X3(x3), η_X4(x4), and η_X5(x5).

[Figure: the tree with messages in both directions along every edge.]

Edges: messages are sent along edges.

Processing: taking products and doing summations is done in the vertices.

Reuse of messages: we see that messages can be “reused” in the sense that many partial calculations are the same; so it suffices to perform them only once.

The Sum-Product Algorithm

Variable-node update:

[Figure: variable node X with incoming messages from f1, f2, f3 and outgoing message µ_{X→f4}(x).]

µ_{X→f4}(x) = µ_{f1→X}(x) · µ_{f2→X}(x) · µ_{f3→X}(x)

Function-node update:

[Figure: function node f with incoming messages from X1, X2, X3 and outgoing message µ_{f→X4}(x4).]

µ_{f→X4}(x4) = ∑_{x1} ∑_{x2} ∑_{x3} f(x1, x2, x3, x4) · µ_{X1→f}(x1) · µ_{X2→f}(x2) · µ_{X3→f}(x3)

The Sum-Product Algorithm

Computation of marginal at a variable node:

[Figure: variable node X with neighbors f1, f2, f3, f4.]

η_X(x) = µ_{f1→X}(x) · µ_{f2→X}(x) · µ_{f3→X}(x) · µ_{f4→X}(x)

Computation of marginal at a function node:

[Figure: function node f with neighbors X1, X2, X3, X4.]

η_f(x1, x2, x3, x4) = f(x1, x2, x3, x4) · µ_{X1→f}(x1) · µ_{X2→f}(x2) · µ_{X3→f}(x3) · µ_{X4→f}(x4)

The Sum-Product Algorithm

Factor graph without loops: in this case it is obvious what messages have to be calculated when. ⇒ Mode of operation 1

Factor graph with loops: one has to decide what update schedule to take. ⇒ Mode of operation 2

Comments on the Sum-Product Algorithm

If the factor graph has no loops, then it is obvious what messages have to be calculated when.

If the factor graph has loops, one has to decide what update schedule to take.

Depending on the underlying semi-ring, one gets different versions of the summary-product algorithm:

For ⟨R, +, ·⟩ one gets the sum-product algorithm. (This is the case discussed above.)

For ⟨R+, max, ·⟩ one gets the max-product algorithm.

For ⟨R, min, +⟩ one gets the min-sum algorithm.

etc.

Partition function (total sum)

Partition Function

Z = ∑_{x1,x2,x3,x4,x5} f(x1, x2, x3, x4, x5)

Recall:

η_X1(x1) = ∑_{x2,x3,x4,x5} f(x1, x2, x3, x4, x5)

η_X2(x2) = ∑_{x1,x3,x4,x5} f(x1, x2, x3, x4, x5)

...

Define:

Z_X1 ≜ ∑_{x1} η_X1(x1) = Z

Z_X2 ≜ ∑_{x2} η_X2(x2) = Z

...

Partition Function

Z = ∑_{x1,x2,x3,x4,x5} f(x1, x2, x3, x4, x5)

Recall:

η_fC(x1, x2, x3) = ∑_{x4,x5} f(x1, x2, x3, x4, x5)

...

Define:

Z_fC ≜ ∑_{x1,x2,x3} η_fC(x1, x2, x3) = Z

...

Partition Function

[Figure: the tree factor graph.]

Z = Z_X1 = Z_X2 = Z_X3 = Z_X4 = Z_X5 = Z_fA = Z_fB = Z_fC = Z_fD = Z_fE

Claim:

Z = Z^{#vertices} / Z^{#edges}

  = (Z_fA · Z_fB · Z_fC · Z_fD · Z_fE · Z_X1 · Z_X2 · Z_X3 · Z_X4 · Z_X5) / (Z_X1² · Z_X2² · Z_X3³ · Z_X4¹ · Z_X5¹)

(Note: the exponents in the denominator equal the variable-node degrees.)

(Here we used the fact that for a graph with one component and no cycles it holds that #vertices = #edges + 1.)

Partition Function

[Figure: the tree factor graph, with some of the function-node-to-variable-node messages, in the region labeled “rescaled by γ,” rescaled by γ.]

Z = (Z_fA · Z_fB · Z_fC · Z_fD · Z_fE · Z_X1 · Z_X2 · Z_X3 · Z_X4 · Z_X5) / (Z_X1² · Z_X2² · Z_X3³ · Z_X4¹ · Z_X5¹)

After this rescaling, the affected node sums each pick up a factor γ:

Z = (Z_fA · Z_fB · Z_fC · γZ_fD · γZ_fE · Z_X1 · Z_X2 · γZ_X3 · γZ_X4 · γZ_X5) / (Z_X1² · Z_X2² · γ³Z_X3³ · γ¹Z_X4¹ · γ¹Z_X5¹)

  = (γ⁵/γ⁵) · (Z_fA · Z_fB · Z_fC · Z_fD · Z_fE · Z_X1 · Z_X2 · Z_X3 · Z_X4 · Z_X5) / (Z_X1² · Z_X2² · Z_X3³ · Z_X4¹ · Z_X5¹)

  = (Z_fA · Z_fB · Z_fC · Z_fD · Z_fE · Z_X1 · Z_X2 · Z_X3 · Z_X4 · Z_X5) / (Z_X1² · Z_X2² · Z_X3³ · Z_X4¹ · Z_X5¹).

Remarkable: this expression is invariant to rescaling of function-node-to-variable-node messages!

Partition Function

[Figure: the tree factor graph.]

In general:

Z = (∏_f Z_f · ∏_X Z_X) / ∏_X Z_X^{deg(X)}

Bethe approximation:

Use the above type of expression also when the factor graph has cycles. → Z′Bethe
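To see the node-sum formula in action, here is a self-contained sketch (our own toy example, not from the talk) that checks Z = (∏_f Z_f · ∏_X Z_X) / ∏_X Z_X^{deg(X)} on the tiny tree f(x1, x2) = fA(x1) · fB(x1, x2), where deg(X1) = 2 and deg(X2) = 1:

```python
# Exact SPA messages on the tree fA -- X1 -- fB -- X2, then the node-sum identity.
fA = {0: 2.0, 1: 3.0}
fB = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 5.0}

mu_fA_X1 = dict(fA)                                              # fA is a leaf factor
mu_X2_fB = {x2: 1.0 for x2 in (0, 1)}                            # X2 is a leaf variable
mu_fB_X1 = {x1: sum(fB[x1, x2] * mu_X2_fB[x2] for x2 in (0, 1)) for x1 in (0, 1)}
mu_X1_fA = dict(mu_fB_X1)
mu_X1_fB = dict(mu_fA_X1)
mu_fB_X2 = {x2: sum(fB[x1, x2] * mu_X1_fB[x1] for x1 in (0, 1)) for x2 in (0, 1)}

Z_X1 = sum(mu_fA_X1[x] * mu_fB_X1[x] for x in (0, 1))            # sum of eta_X1
Z_X2 = sum(mu_fB_X2[x] for x in (0, 1))                          # sum of eta_X2
Z_fA = sum(fA[x] * mu_X1_fA[x] for x in (0, 1))                  # sum of eta_fA
Z_fB = sum(fB[x1, x2] * mu_X1_fB[x1] * mu_X2_fB[x2]
           for x1 in (0, 1) for x2 in (0, 1))                    # sum of eta_fB

Z = sum(fA[x1] * fB[x1, x2] for x1 in (0, 1) for x2 in (0, 1))   # exact partition sum
assert abs((Z_fA * Z_fB * Z_X1 * Z_X2) / (Z_X1**2 * Z_X2**1) - Z) < 1e-9
```

On a tree all node sums equal Z, so the ratio telescopes to Z^{#vertices}/Z^{#edges} = Z; on a graph with cycles, evaluating the same expression instead yields the estimate Z′Bethe.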

Bethe Partition Function

Basically, we can evaluate the expression for Z′Bethe at any iteration of the SPA.

Factor graph without cycles: we have Z′Bethe = Z only at a fixed point of the SPA.

Factor graph with cycles: therefore, we call Z′Bethe a (local) Bethe partition function only if we are at a fixed point of the SPA.

Factor graph with cycles: the SPA can have multiple fixed points. We define the Bethe partition function to be

Z_Bethe ≜ max over all fixed points of the SPA of Z′Bethe.

Graphical Model for Permanent

[Figure: the factor graph with function nodes g_row,1, ..., g_row,8 and g_col,1, ..., g_col,8; the function nodes are suitably defined based on θ; the variable nodes have been omitted.]

Global function:

g(a_{1,1}, ..., a_{8,8}) = ∏_j g_col,j(a_{1,j}, ..., a_{8,j}) × ∏_i g_row,i(a_{i,1}, ..., a_{i,8})

Permanent:

perm(θ) = Z = ∑_{a_{1,1}, ..., a_{8,8}} g(a_{1,1}, ..., a_{8,8})

Bethe Permanent:

perm_B(θ) ≜ Z_Bethe

However, the SPA is a locally operating algorithm and so has its limitations in the conclusions that it can reach.

This locality of the SPA turns out to be well captured by so-called finite graph covers, especially at fixed points of the SPA.

A combinatorial interpretation of the Bethe permanent

Combinatorial Characterization of the Permanent

Consider the matrix

θ = [ θ_{1,1} θ_{1,2} ]
    [ θ_{2,1} θ_{2,2} ]

with perm(θ) = θ_{1,1}θ_{2,2} + θ_{2,1}θ_{1,2}.

In particular,

θ = [ 1 1 ]
    [ 1 1 ]

with perm(θ) = 1 · 1 + 1 · 1 = 2.

Recall that the permanent of a zero/one matrix like this θ equals the number of perfect matchings in the corresponding bipartite graph:

[Figure: the complete bipartite graph on left vertices 1, 2 and right vertices 1, 2, together with its two perfect matchings.]

A Combinatorial Interpretation of the Bethe Permanent

Consider a non-negative matrix θ of size n × n.

Let P_{M×M} be the set of all permutation matrices of size M × M.

For every positive integer M, we define Ψ_M to be the set

Ψ_M ≜ { P = {P^{(i,j)}}_{(i,j)∈[n]²} | P^{(i,j)} ∈ P_{M×M} }.

For P ∈ Ψ_M we define the P-lifting of θ to be the following (nM) × (nM) matrix:

θ^{↑P} ≜ [ θ_{1,1}P^{(1,1)}  · · ·  θ_{1,n}P^{(1,n)} ]
         [        ⋮                        ⋮        ]
         [ θ_{n,1}P^{(n,1)}  · · ·  θ_{n,n}P^{(n,n)} ].

Degree-M Bethe Permanent

Definition: For any positive integer M, we define the degree-M Bethe permanent of θ to be

perm_{B,M}(θ) ≜ ( ⟨ perm(θ^{↑P}) ⟩_{P∈Ψ_M} )^{1/M},

where ⟨·⟩_{P∈Ψ_M} denotes the average over all P ∈ Ψ_M.

Theorem:

perm_B(θ) = lim sup_{M→∞} perm_{B,M}(θ).

Special Case: Deg.-M Bethe Permanent for n = 2

We want to obtain some appreciation of why the Bethe permanent of θ is close to the permanent of θ, and of where the differences are.

As before, consider the matrix

θ = [ θ_{1,1} θ_{1,2} ]
    [ θ_{2,1} θ_{2,2} ]

with perm(θ) = θ_{1,1}θ_{2,2} + θ_{2,1}θ_{1,2}; in particular, the all-ones θ with perm(θ) = 2.

For this θ, a P-lifting looks like

θ^{↑P} = [ 1 · P^{(1,1)}  1 · P^{(1,2)} ] = [ P^{(1,1)}  P^{(1,2)} ]
         [ 1 · P^{(2,1)}  1 · P^{(2,2)} ]   [ P^{(2,1)}  P^{(2,2)} ].

Applying some row and column permutations, we obtain

perm(θ^{↑P}) = perm [ I  I ]
                    [ I  (P^{(2,1)})^{−1} P^{(2,2)} (P^{(1,2)})^{−1} P^{(1,1)} ].

Therefore,

perm_{B,M}(θ) ≜ ( ⟨ perm [ I  I ]
                         [ I  P′^{(2,2)} ] ⟩_{P′^{(2,2)}∈P_{M×M}} )^{1/M}.

Special Case: Degree-2 Bethe Permanent for n = 2

For M = 2 we have

perm_{B,2}(θ) ≜ ( ⟨ perm [ I  I ]
                         [ I  P′^{(2,2)} ] ⟩_{P′^{(2,2)}∈P_{2×2}} )^{1/2},

which corresponds to computing the average number of perfect matchings in the following 2-covers (and taking the 2nd root):

[Figure: the two 2-covers of the base graph, on vertices 1′, 1″, 2′, 2″; the uncoupled cover has 4 perfect matchings, the coupled one has 2.]

perm_{B,2}(θ) = ( (1/2!) · (4 + 2) )^{1/2} = ( (1/2!) · 6 )^{1/2} = √3 ≈ 1.732 < √4 = 2 = perm(θ).

Special Case: Degree-2 Bethe Permanent for n = 2

Let us have a closer look at the perfect matchings in the uncoupled 2-cover.

[Figure: this 2-cover and its four perfect matchings.]

Because this double cover consists of two independent copies of the base graph, the number of perfect matchings is 2² = 4.

Special Case: Degree-2 Bethe Permanent for n = 2

Let us have a closer look at the perfect matchings in the coupled 2-cover.

[Figure: this 2-cover and its two perfect matchings.]

The coupling of the cycles causes this graph to have fewer than 2² perfect matchings!


Special Case: Degree-3 Bethe Permanent for n = 2

On the other hand, for M = 3 we have

perm_{B,3}(θ) ≜ ( ⟨ perm [ I  I ]
                         [ I  P′^{(2,2)} ] ⟩_{P′^{(2,2)}∈P_{3×3}} )^{1/3},

which corresponds to computing the average number of perfect matchings in the following 3-covers (and taking the 3rd root):

[Figure: the six 3-covers of the base graph, on vertices 1′, 1″, 1‴, 2′, 2″, 2‴, with 8, 4, 4, 4, 2, and 2 perfect matchings, respectively.]

perm_{B,3}(θ) = ( (1/3!) · (8 + 4 + 4 + 4 + 2 + 2) )^{1/3} = ( (1/3!) · 24 )^{1/3} = ∛4 ≈ 1.587 < ∛8 = 2 = perm(θ).

Special Case: Degree-3 Bethe Permanent for n = 2

Let us have a closer look at the perfect matchings in two of the coupled 3-covers.

[Figures: two coupled 3-covers together with their perfect matchings.]

The coupling of the cycles causes these graphs to have fewer than 2³ perfect matchings!

Special Case: Deg.-M Bethe Permanent for n = 2

For general M we obtain

perm_{B,M}(θ) = (ζ_{S_M})^{1/M} = (M + 1)^{1/M},

ζ_{S_M}: cycle index of the symmetric group over M elements.
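This claim is easy to check numerically. The following sketch (our own code, not from the talk) enumerates, for the all-ones 2 × 2 matrix, all liftings [I, I; I, P] with P an M × M permutation matrix, averages their permanents, and compares the M-th root with (M + 1)^{1/M}; for M = 2 and M = 3 it reproduces the counts (4 + 2)/2 = 3 and (8 + 4 + 4 + 4 + 2 + 2)/6 = 4 from the previous slides:

```python
# Degree-M Bethe permanent of the 2x2 all-ones matrix via explicit P-liftings.
import itertools
from math import factorial, prod

def permanent(a):
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))

def lifting(sigma, M):
    """The 2M x 2M block matrix [[I, I], [I, P]] with P the matrix of sigma."""
    mat = [[0] * (2 * M) for _ in range(2 * M)]
    for i in range(M):
        mat[i][i] = 1                      # block (1,1) = I
        mat[i][M + i] = 1                  # block (1,2) = I
        mat[M + i][i] = 1                  # block (2,1) = I
        mat[M + i][M + sigma[i]] = 1       # block (2,2) = P
    return mat

for M in (2, 3):
    avg = sum(permanent(lifting(sigma, M))
              for sigma in itertools.permutations(range(M))) / factorial(M)
    print(M, avg, avg ** (1 / M), (M + 1) ** (1 / M))
# Prints averages 3.0 and 4.0, i.e., perm_{B,2} = sqrt(3) and perm_{B,3} = 4**(1/3).
```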

The Gibbs free energy function

Gibbs Free Energy Function

The Gibbs free energy function

F_Gibbs(p) ≜ −∑_a p_a · log(g(a)) + ∑_a p_a · log(p_a)

is defined such that its minimal value is related to the partition function:

perm(θ) = Z = exp( −min_p F_Gibbs(p) ).

[Figure: F_Gibbs(p) as a function of p, with minimizer p* and minimal value −log(Z_Gibbs); next to it, an approximating function F′(p) with minimizer p′ and minimal value −log(Z′).]

Nice, but it does not yield any computational savings by itself.

But it suggests other optimization schemes.
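(The slides do not spell out why min_p F_Gibbs(p) = −log(Z), so here is the standard one-line argument: with p*_a ≜ g(a)/Z one has F_Gibbs(p) = ∑_a p_a · log(p_a / g(a)) = −log(Z) + D(p ∥ p*), where D(p ∥ p*) ≥ 0 is the Kullback–Leibler divergence, with equality iff p = p*. Hence the minimum is −log(Z), attained at the Gibbs distribution p*.)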

The Bethe approximation

Bethe Approximation

The Bethe approximation to the Gibbs free energy function yields such an alternative optimization scheme.

This approximation is interesting because of the following theorem:

Theorem (Yedidia/Freeman/Weiss, 2000): Fixed points of the sum-product algorithm (SPA) correspond to stationary points of the Bethe free energy function.

Definition: We define the Bethe permanent of θ to be

perm_B(θ) = Z_Bethe = exp( −min_β F_Bethe(β) ).

Bethe Approximation

However, in general, this approach of replacing the Gibbs free energy by the Bethe free energy comes with very few guarantees:

The Bethe free energy function might have multiple local minima.

It is unclear how close the (global) minimum of the Bethe free energy is to the minimum of the Gibbs free energy.

It is unclear if the sum-product algorithm converges (even to a local minimum of the Bethe free energy).

Bethe Approximation

Luckily, in the case of the permanent approximation problem, the above-mentioned normal factor graph N(θ) is such that the Bethe free energy function is very well behaved. In particular, one can show that:

The Bethe free energy function (for a suitable parametrization) is convex and therefore has no local minima [V., 2010, 2011].

The minimum of the Bethe free energy is quite close to the minimum of the Gibbs free energy. (More details later.)

The sum-product algorithm converges to the minimum of the Bethe free energy. (More details later.)

Relationship between Permanent and Bethe Permanent

perm_B(θ)  ≤  perm(θ)  ≤  (√2)^n · perm_B(θ)

(The first inequality is a theorem (Gurvits, 2011); the second is a conjecture (Gurvits, 2011).)

This can be rewritten as follows:

(1/n) log perm_B(θ)  ≤  (1/n) log perm(θ)  ≤  (1/n) log perm_B(θ) + log(√2).

Relationship between Permanent and Bethe Permanent

Problem: find large classes of random matrices such that, w.h.p.,

perm_B(θ)  ≤  perm(θ)  ≤  O(√n) · perm_B(θ),

where the first inequality is the theorem of Gurvits (2011) above.

This can be rewritten as follows:

(1/n) log perm_B(θ)  ≤  (1/n) log perm(θ)  ≤  (1/n) log perm_B(θ) + O( (1/n) log(n) ).

Sum-Product Algorithm Convergence

Theorem: Modulo some minor technical conditions on the initial messages, the sum-product algorithm converges to the (global) minimum of the Bethe free energy function [V., 2010, 2011].

Comment: the first part of the proof of the above theorem is very similar to the SPA convergence proof in

Bayati and Nair, “A rigorous proof of the cavity method for counting matchings,” Allerton 2006.

Note that they consider matchings, not perfect matchings. (Although the perfect matching case can be seen as a limiting case of the matching setup, the convergence proof of the SPA is incomplete for that case.)

Conclusions

Loopy belief propagation is no silver bullet.

However, there are interesting setups where it works very well.

The complexity of permanent estimation based on the SPA is remarkably low. (Hard to beat by any standard convex optimization algorithm that minimizes the Bethe free energy.)

If the Bethe approximation does not work well, one can try better approximations, e.g., the Kikuchi approximation.

Note: one can also give a combinatorial interpretation of the Kikuchi partition function.

Inspired by the approaches mentioned in this talk, Ryuhei Mori recently showed that many replica method computations can be simplified and made quite a bit more intuitive.

Thank you!

References

P. O. Vontobel, “Counting in graph covers: a combinatorial characterization of the Bethe entropy function,” submitted to IEEE Trans. Inf. Theory, Nov. 2010, arXiv:1012.0065.

P. O. Vontobel, “The Bethe permanent of a non-negative matrix,” Proc. 48th Allerton Conf. on Communication, Control, and Computing, Allerton House, Monticello, IL, USA, Sep. 29 – Oct. 1, 2010.

P. O. Vontobel, “A combinatorial characterization of the Bethe and the Kikuchi partition functions,” Proc. Inf. Theory Appl. Workshop, UC San Diego, La Jolla, CA, USA, Feb. 6–11, 2011.

R. Mori, “Connection between annealed free energy and belief propagation on random factor graph ensembles,” Proc. ISIT 2011, St. Petersburg, Russia, Jul. 31 – Aug. 5, 2011, arXiv:1102.3132.
