
Two Approximate Algorithms for Belief Updating

Mini-Clustering (MC): Robert Mateescu, Rina Dechter, Kalev Kask. "Tree Approximation for Belief Updating", AAAI-2002

Iterative Join-Graph Propagation (IJGP): Rina Dechter, Kalev Kask and Robert Mateescu. "Iterative Join-Graph Propagation", UAI-2002

What is Mini-Clustering?

Mini-Clustering (MC) is an approximate algorithm for belief updating in Bayesian networks

MC is an anytime version of join-tree clustering

MC applies message passing along a cluster tree

The complexity of MC is controlled by a user-adjustable parameter, the i-bound

Empirical evaluation shows that MC is a very effective algorithm, in many cases superior to other approximate schemes (IBP, Gibbs Sampling)

The belief updating problem is the task of computing the posterior probability $P(Y|e)$ of query nodes $Y \subseteq X$, given evidence $e$. We focus on the basic case where $Y$ is a single variable $X_i$.

[Figure: the example belief network, a DAG over variables A, B, C, D, E, F, G]

A belief network is a quadruple $BN = \langle X, D, G, P \rangle$, where:
- $X = \{X_1, \ldots, X_n\}$ is a set of random variables
- $D = \{D_1, \ldots, D_n\}$ is the set of their domains
- $G$ is a DAG (directed acyclic graph) over $X$
- $P = \{p_1, \ldots, p_n\}$, where $p_i = P(X_i \mid pa(X_i))$, are CPTs (conditional probability tables)

Belief networks
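As a concrete reading of the definition above, here is a minimal Python sketch (ours, not the authors' implementation) representing the example network's CPTs as factors over binary variables; the tables are made-up random numbers.

```python
# A minimal sketch (not from the paper) of one way to represent the
# example belief network BN = <X, D, G, P>: each CPT is a factor with a
# scope (tuple of variable names) and a table whose k-th axis is indexed
# by the k-th scope variable. All variables are binary here.
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(child, parents):
    # P(child | parents): the child's axis (axis 0) is normalized to sum to 1
    table = rng.random((2,) * (1 + len(parents)))
    table /= table.sum(axis=0, keepdims=True)
    return ((child, *parents), table)

# The running example: p(a), p(b|a), p(c|a,b), p(d|b), p(e|b,f), p(f|c,d), p(g|e,f)
cpts = [random_cpt(c, ps) for c, ps in [
    ('A', []), ('B', ['A']), ('C', ['A', 'B']), ('D', ['B']),
    ('E', ['B', 'F']), ('F', ['C', 'D']), ('G', ['E', 'F']),
]]
```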

Tree decompositions

A tree decomposition for a belief network $BN = \langle X, D, G, P \rangle$ is a triple $\langle T, \chi, \psi \rangle$, where $T = (V, E)$ is a tree and $\chi$ and $\psi$ are labeling functions, associating with each vertex $v \in V$ two sets, $\chi(v) \subseteq X$ and $\psi(v) \subseteq P$, satisfying:

1. For each function $p_i \in P$ there is exactly one vertex $v$ such that $p_i \in \psi(v)$ and $scope(p_i) \subseteq \chi(v)$
2. For each variable $X_i \in X$, the set $\{v \in V \mid X_i \in \chi(v)\}$ forms a connected subtree (running intersection property)

Example: for the belief network above, a tree decomposition with vertices 1-4 and separators BC, BF, EF:

Vertex 1: χ(1) = {A, B, C}, ψ(1) = {p(a), p(b|a), p(c|a,b)}
Vertex 2: χ(2) = {B, C, D, F}, ψ(2) = {p(d|b), p(f|c,d)}
Vertex 3: χ(3) = {B, E, F}, ψ(3) = {p(e|b,f)}
Vertex 4: χ(4) = {E, F, G}, ψ(4) = {p(g|e,f)}

[Figure: the belief network (left) and this tree decomposition (right)]
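To make the two conditions concrete, here is a small sanity check (again an illustration of ours, with CPTs named by their child variable) that the example decomposition above is valid:

```python
# Sanity check (illustrative, not the authors' code) of both conditions
# of the tree decomposition definition on the running example.
chi = {1: {'A', 'B', 'C'}, 2: {'B', 'C', 'D', 'F'},
       3: {'B', 'E', 'F'}, 4: {'E', 'F', 'G'}}
psi = {1: ['A', 'B', 'C'], 2: ['D', 'F'], 3: ['E'], 4: ['G']}
edges = [(1, 2), (2, 3), (3, 4)]
scope = {'A': {'A'}, 'B': {'A', 'B'}, 'C': {'A', 'B', 'C'}, 'D': {'B', 'D'},
         'E': {'B', 'E', 'F'}, 'F': {'C', 'D', 'F'}, 'G': {'E', 'F', 'G'}}

# Condition 1: each CPT appears at exactly one vertex, whose chi covers its scope
assert sorted(p for ps in psi.values() for p in ps) == sorted(scope)
assert all(scope[p] <= chi[v] for v, ps in psi.items() for p in ps)

# Condition 2 (running intersection): the vertices mentioning each variable
# induce a connected subtree of T
def connected(vs):
    seen, frontier = {min(vs)}, [min(vs)]
    while frontier:
        u = frontier.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in vs and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen == set(vs)

assert all(connected([v for v in chi if x in chi[v]]) for x in scope)
print("valid tree decomposition")
```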

Cluster Tree Elimination

Cluster Tree Elimination (CTE) is an exact algorithm

It works by passing messages along a tree decomposition

Basic idea:
- Each node sends only one message to each of its neighbors
- Node u sends a message to its neighbor v only when u has received messages from all its other neighbors

Cluster Tree Elimination

Previous work on tree clustering:
- Lauritzen, Spiegelhalter '88 (probabilities)
- Jensen, Lauritzen, Olesen '90 (probabilities)
- Shenoy, Shafer '90; Shenoy '97 (general)
- Dechter, Pearl '89 (constraints)
- Gottlob, Leone, Scarcello '00 (constraints)

[Figure: vertex u with neighbors x_1, ..., x_n and v; u collects the messages h_(x_1,u), ..., h_(x_n,u) and sends h_(u,v) to v]

$cluster(u) = \psi(u) \cup \{h_{(x_1,u)}, h_{(x_2,u)}, \ldots, h_{(x_n,u)}, h_{(v,u)}\}$

Compute the message:

$h_{(u,v)} = \sum_{elim(u,v)} \; \prod_{f \in cluster(u) \setminus \{h_{(v,u)}\}} f$

Belief Propagation

[Figure: the belief network and its tree decomposition with clusters 1 = {A,B,C}, 2 = {B,C,D,F}, 3 = {B,E,F}, 4 = {E,F,G} and separators BC, BF, EF]

$h_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b)$

$h_{(2,1)}(b,c) = \sum_{d,f} p(d|b)\, p(f|c,d)\, h_{(3,2)}(b,f)$

$h_{(2,3)}(b,f) = \sum_{c,d} p(d|b)\, p(f|c,d)\, h_{(1,2)}(b,c)$

$h_{(3,2)}(b,f) = \sum_{e} p(e|b,f)\, h_{(4,3)}(e,f)$

$h_{(3,4)}(e,f) = \sum_{b} p(e|b,f)\, h_{(2,3)}(b,f)$

$h_{(4,3)}(e,f) = p(G = g_e \mid e,f)$

Cluster Tree Elimination - example
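To tie the message rule to the example, here is a short sketch (ours, with random stand-in CPTs rather than the paper's numbers) computing the first two messages with numpy:

```python
# Illustration of the CTE message rule on the example:
# h_(1,2), then h_(2,3), each a function over the arc's separator.
import numpy as np

rng = np.random.default_rng(1)

def cpt(n_parents):
    # random table P(child | parents); axis 0 is the child, normalized
    t = rng.random((2,) * (n_parents + 1))
    return t / t.sum(axis=0, keepdims=True)

p_a, p_ba, p_cab = cpt(0), cpt(1), cpt(2)   # p(a), p(b|a), p(c|a,b)
p_db, p_fcd = cpt(1), cpt(2)                # p(d|b), p(f|c,d)

# h_(1,2)(b,c) = sum_a p(a) p(b|a) p(c|a,b)
h12 = np.einsum('a,ba,cab->bc', p_a, p_ba, p_cab)

# h_(2,3)(b,f) = sum_{c,d} p(d|b) p(f|c,d) h_(1,2)(b,c); elim(2,3) = {C, D}
h23 = np.einsum('db,fcd,bc->bf', p_db, p_fcd, h12)
print(h23)   # a table over sep(2,3) = {B, F}
```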

Cluster Tree Elimination - the messages

Vertex 1 ({A,B,C}, with p(a), p(b|a), p(c|a,b)) sends over separator BC:
$h_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b)$

Vertex 2 ({B,C,D,F}, with p(d|b), p(f|c,d) and the incoming h_(1,2)(b,c)) sends over separator BF, where sep(2,3) = {B, F} and elim(2,3) = {C, D}:
$h_{(2,3)}(b,f) = \sum_{c,d} p(d|b)\, p(f|c,d)\, h_{(1,2)}(b,c)$

Vertex 3 ({B,E,F}) holds p(e|b,f) and the incoming h_(2,3)(b,f); vertex 4 ({E,F,G}) holds p(g|e,f), with separator EF between them.

Cluster Tree Elimination - properties

Correctness and completeness: Algorithm CTE is correct, i.e. it computes the exact joint probability of a single variable and the evidence.

Time complexity: $O(deg \cdot (n+N) \cdot d^{w^*+1})$

Space complexity: $O(N \cdot d^{sep})$

where:
- deg = the maximum degree of a node
- n = number of variables (= number of CPTs)
- N = number of nodes in the tree decomposition
- d = the maximum domain size of a variable
- w* = the induced width
- sep = the separator size

Mini-Clustering - motivation

Time and space complexity of Cluster Tree Elimination depend on the induced width w* of the problem

When the induced width w* is large, the CTE algorithm becomes infeasible

Mini-Clustering - the basic idea

Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables

Accuracy parameter i = maximum number of variables in a mini-cluster

The idea was explored for variable elimination (Mini-Bucket)

Suppose cluster(u) is partitioned into p mini-clusters: mc(1),…,mc(p), each containing at most i variables

CTE computes the exact message:

$h_{(u,v)} = \sum_{elim(u,v)} \; \prod_{k=1}^{p} \; \prod_{f \in mc(k)} f$

We want to process each $\prod_{f \in mc(k)} f$ separately

Mini-Clustering

Approximate each $\prod_{f \in mc(k)} f$, for $k = 2, \ldots, p$, and take it outside the summation

How to process the mini-clusters to obtain approximations or bounds:

Process all mini-clusters by summation - this gives an upper bound on the joint probability

A tighter upper bound: process one mini-cluster by summation and the others by maximization

Can also use mean operator (average) - this gives an approximation of the joint probability

Splitting a cluster into mini-clusters bounds the complexity:

$h^X = \sum_X \prod_{i=1}^{n} h_i \;\le\; g^X = \Big(\sum_X \prod_{i=1}^{r} h_i\Big) \cdot \Big(\max_X \prod_{i=r+1}^{n} h_i\Big)$

Exponential complexity decrease: $O(e^n) \rightarrow O(e^r) + O(e^{n-r})$
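A quick numeric check of this bound (our illustration: four random nonnegative functions of a single variable x, split at r = 2):

```python
# Numeric illustration of the splitting bound
#   sum_x prod_{i=1..n} h_i(x) <= (sum_x prod_{i<=r} h_i(x)) * (max_x prod_{i>r} h_i(x))
import numpy as np

rng = np.random.default_rng(3)
h = rng.random((4, 10))   # n = 4 nonnegative functions of x, |D(x)| = 10
r = 2                     # split point: first r processed by sum, rest by max

exact = h.prod(axis=0).sum()                                  # h^X
upper = h[:r].prod(axis=0).sum() * h[r:].prod(axis=0).max()   # g^X
assert exact <= upper
print(exact, upper)
```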

Idea of Mini-Clustering

[Figure: the tree decomposition (clusters 1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC, BF, EF), each arc now carrying a set H of mini-cluster messages]

$H_{(1,2)} = \{\, h^1_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b) \,\}$

$H_{(2,1)} = \{\, h^1_{(2,1)}(b) = \sum_{d,f} p(d|b)\, h^1_{(3,2)}(b,f), \;\; h^2_{(2,1)}(c) = \max_{d,f} p(f|c,d) \,\}$

$H_{(2,3)} = \{\, h^1_{(2,3)}(b) = \sum_{c,d} p(d|b)\, h^1_{(1,2)}(b,c), \;\; h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d) \,\}$

$H_{(3,2)} = \{\, h^1_{(3,2)}(b,f) = \sum_e p(e|b,f)\, h^1_{(4,3)}(e,f) \,\}$

$H_{(3,4)} = \{\, h^1_{(3,4)}(e,f) = \sum_b p(e|b,f)\, h^1_{(2,3)}(b)\, h^2_{(2,3)}(f) \,\}$

$H_{(4,3)} = \{\, h^1_{(4,3)}(e,f) = p(G = g_e \mid e,f) \,\}$

Mini-Clustering - example

Mini-Clustering - the messages, i=3

Vertex 1 ({A,B,C}, with p(a), p(b|a), p(c|a,b)) sends over separator BC:
$h^1_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b)$

Vertex 2 is partitioned into the mini-clusters {B,C,D} = {p(d|b), h_(1,2)(b,c)} and {C,D,F} = {p(f|c,d)}; with sep(2,3) = {B, F} and elim(2,3) = {C, D} it sends over separator BF:
$h^1_{(2,3)}(b) = \sum_{c,d} p(d|b)\, h^1_{(1,2)}(b,c)$
$h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d)$

Vertex 3 ({B,E,F}) holds p(e|b,f) and the incoming h^1_(2,3)(b), h^2_(2,3)(f); vertex 4 ({E,F,G}) holds p(g|e,f), with separator EF between them.
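The same partition can be checked numerically; this sketch (ours, with random stand-in tables) verifies that the product of the two mini-cluster messages upper-bounds CTE's exact message h_(2,3)(b,f):

```python
# Check that the i=3 mini-cluster messages from vertex 2
# upper-bound CTE's exact message h_(2,3)(b,f).
import numpy as np

rng = np.random.default_rng(2)
p_db = rng.random((2, 2))      # stands in for p(d|b), indexed [d, b]
p_fcd = rng.random((2, 2, 2))  # stands in for p(f|c,d), indexed [f, c, d]
h12 = rng.random((2, 2))       # stands in for h_(1,2)(b,c), indexed [b, c]

exact = np.einsum('db,fcd,bc->bf', p_db, p_fcd, h12)  # sum over c, d

h1 = np.einsum('db,bc->b', p_db, h12)  # h1_(2,3)(b): summation mini-cluster
h2 = p_fcd.max(axis=(1, 2))            # h2_(2,3)(f): maximization mini-cluster
upper = np.outer(h1, h2)               # h1(b) * h2(f), indexed [b, f]

assert np.all(upper >= exact - 1e-12)  # the upper bound holds pointwise
```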

Cluster Tree Elimination vs. Mini-Clustering

Both algorithms pass messages over the same tree decomposition (1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC, BF, EF). CTE sends one exact message per arc; MC sends a set of smaller mini-cluster messages:

Arc     CTE message       MC message set
1 -> 2  h_(1,2)(b,c)      H_(1,2) = { h^1_(1,2)(b,c) }
2 -> 1  h_(2,1)(b,c)      H_(2,1) = { h^1_(2,1)(b), h^2_(2,1)(c) }
2 -> 3  h_(2,3)(b,f)      H_(2,3) = { h^1_(2,3)(b), h^2_(2,3)(f) }
3 -> 2  h_(3,2)(b,f)      H_(3,2) = { h^1_(3,2)(b,f) }
3 -> 4  h_(3,4)(e,f)      H_(3,4) = { h^1_(3,4)(e,f) }
4 -> 3  h_(4,3)(e,f)      H_(4,3) = { h^1_(4,3)(e,f) }

Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) on the joint probability P(Xi,e) of each variable and each of its values.

Time & space complexity: $O(n \cdot hw^* \cdot d^i)$, where $hw^* = \max_u |\{ f \mid scope(f) \cap \chi(u) \neq \emptyset \}|$

Normalization

Algorithms for the belief updating problem compute, in general, the joint probability $P(X_i, e)$, where $X_i$ is a query node and $e$ is the evidence.

Computing the conditional probability $P(X_i \mid e)$:
- is easy to do if exact algorithms can be applied
- becomes an important issue for approximate algorithms

MC can compute an (upper) bound on the joint $P(X_i, e)$

Deriving a bound on the conditional $P(X_i \mid e)$ is not easy when the exact $P(e)$ is not available

If a lower bound on $P(e)$ were available, we could use $P(X_i, e) / P(e)$ as an upper bound on the posterior

In our experiments we normalized the results and regarded them as approximations of the posterior $P(X_i \mid e)$
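In code, this normalization step is one line (made-up values):

```python
# Sketch of the normalization used in the experiments: divide the
# approximate joints P(X_i = x, e) by their sum over x to get an
# approximation of the posterior P(X_i | e).
import numpy as np

approx_joint = np.array([0.012, 0.003])        # MC outputs for X_i = 0, 1
posterior = approx_joint / approx_joint.sum()  # approximates P(X_i | e)
print(posterior)                               # [0.8 0.2]
```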

Experimental results

Algorithms:
- Exact
- IBP
- Gibbs sampling (GS)
- MC with normalization (approximate); we tested MC with max and mean operators

Networks (all variables are binary):
- Coding networks
- CPCS 54, 360, 422
- Grid networks (MxM)
- Random noisy-OR networks
- Random networks

Measures (see the sketch below):
- Normalized Hamming Distance (NHD)
- BER (Bit Error Rate)
- Absolute error
- Relative error
- Time
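For concreteness, here is a hypothetical computation of two of these measures, reading NHD as the fraction of variables whose most probable value disagrees with the exact one (our reading, not a definition quoted from the paper):

```python
# Hypothetical helpers (not from the paper) for two of the measures,
# given exact and approximate marginals of several binary variables.
import numpy as np

exact = np.array([[0.9, 0.1], [0.4, 0.6]])     # rows: exact P(X_i | e)
approx = np.array([[0.8, 0.2], [0.55, 0.45]])  # rows: approximate marginals

abs_err = np.abs(exact - approx).mean()        # average absolute error
# NHD: fraction of variables whose most probable value disagrees
nhd = (exact.argmax(axis=1) != approx.argmax(axis=1)).mean()
print(abs_err, nhd)                            # 0.125 0.5
```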

Random networks - Absolute error

[Plots: absolute error vs. i-bound (0-10) for MC, Gibbs Sampling and IBP. Left: Random networks, N=50, P=2, k=2, evid=0, w*=10, 50 instances. Right: same networks with evid=10.]

Coding networks - Bit Error Rate

[Plots: Bit Error Rate vs. i-bound (0-12) for MC and IBP. Left: Coding networks, N=100, P=4, sigma=0.22, w*=12, 50 instances. Right: same with sigma=0.51.]

Noisy-OR networks - Absolute error

[Plots: absolute error (log scale) vs. i-bound (0-16) for MC, IBP and Gibbs Sampling. Left: Noisy-OR networks, N=50, P=3, evid=10, w*=16, 25 instances. Right: same with evid=20.]

CPCS422 - Absolute error

[Plots: absolute error vs. i-bound (2-18) for MC and IBP. Left: CPCS 422, evid=0, w*=23, 1 instance. Right: same with evid=10.]

Grid 15x15 - 0 evidence

[Plots: NHD, absolute error, relative error and time (seconds) vs. i-bound (0-18) for MC and IBP; Grid 15x15, evid=0, w*=22, 10 instances.]

Grid 15x15 - 10 evidence

[Plots: NHD, absolute error, relative error and time (seconds) vs. i-bound (0-18) for MC and IBP; Grid 15x15, evid=10, w*=22, 10 instances.]

Grid 15x15 - 20 evidence

[Plots: NHD, absolute error and relative error (log scale) and time (seconds) vs. i-bound (0-18) for MC, IBP and Gibbs Sampling; Grid 15x15, evid=20, w*=22, 10 instances.]

Conclusion

MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating

Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms
