
Two Approximate Algorithms for Belief Updating

Mini-Clustering (MC): Robert Mateescu, Rina Dechter, Kalev Kask. "Tree Approximation for Belief Updating", AAAI-2002

Iterative Join-Graph Propagation (IJGP): Rina Dechter, Kalev Kask and Robert Mateescu. "Iterative Join-Graph Propagation", UAI-2002

What is Mini-Clustering?

Mini-Clustering (MC) is an approximate algorithm for belief updating in Bayesian networks

MC is an anytime version of join-tree clustering

MC applies message passing along a cluster tree

The complexity of MC is controlled by a user-adjustable parameter, the i-bound

Empirical evaluation shows that MC is a very effective algorithm, in many cases superior to other approximate schemes (IBP, Gibbs Sampling)

The belief updating problem is the task of computing the posterior probability $P(Y|e)$ of query nodes $Y \subseteq X$, given evidence $e$. We focus on the basic case where $Y$ is a single variable $X_i$.

[Figure: the example belief network, a DAG over variables A, B, C, D, E, F, G]

A belief network is a quadruple $BN = \langle X, D, G, P \rangle$, where:
- $X = \{X_1, \ldots, X_n\}$ is a set of random variables
- $D = \{D_1, \ldots, D_n\}$ is the set of their domains
- $G$ is a DAG (directed acyclic graph) over $X$
- $P = \{p_1, \ldots, p_n\}$, where $p_i = P(X_i \mid pa(X_i))$, are CPTs (conditional probability tables)

Belief networks
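As a concrete reading of the definition above, here is a minimal Python sketch (ours, not the authors' implementation) representing the example network's CPTs as factors over binary variables; the tables are made-up random numbers.

```python
# A minimal sketch (not from the paper) of one way to represent the
# example belief network BN = <X, D, G, P>: each CPT is a factor with a
# scope (tuple of variable names) and a table whose k-th axis is indexed
# by the k-th scope variable. All variables are binary here.
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(child, parents):
    # P(child | parents): the child's axis (axis 0) is normalized to sum to 1
    table = rng.random((2,) * (1 + len(parents)))
    table /= table.sum(axis=0, keepdims=True)
    return ((child, *parents), table)

# The running example: p(a), p(b|a), p(c|a,b), p(d|b), p(e|b,f), p(f|c,d), p(g|e,f)
cpts = [random_cpt(c, ps) for c, ps in [
    ('A', []), ('B', ['A']), ('C', ['A', 'B']), ('D', ['B']),
    ('E', ['B', 'F']), ('F', ['C', 'D']), ('G', ['E', 'F']),
]]
```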

Tree decompositions

A tree decomposition for a belief network $BN = \langle X, D, G, P \rangle$ is a triple $\langle T, \chi, \psi \rangle$, where $T = (V, E)$ is a tree and $\chi$ and $\psi$ are labeling functions, associating with each vertex $v \in V$ two sets, $\chi(v) \subseteq X$ and $\psi(v) \subseteq P$, satisfying:

1. For each function $p_i \in P$ there is exactly one vertex $v$ such that $p_i \in \psi(v)$ and $scope(p_i) \subseteq \chi(v)$
2. For each variable $X_i \in X$, the set $\{v \in V \mid X_i \in \chi(v)\}$ forms a connected subtree (running intersection property)

Example: for the belief network above, a tree decomposition with vertices 1-4 and separators BC, BF, EF:

Vertex 1: χ(1) = {A, B, C}, ψ(1) = {p(a), p(b|a), p(c|a,b)}
Vertex 2: χ(2) = {B, C, D, F}, ψ(2) = {p(d|b), p(f|c,d)}
Vertex 3: χ(3) = {B, E, F}, ψ(3) = {p(e|b,f)}
Vertex 4: χ(4) = {E, F, G}, ψ(4) = {p(g|e,f)}

[Figure: the belief network (left) and this tree decomposition (right)]
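To make the two conditions concrete, here is a small sanity check (again an illustration of ours, with CPTs named by their child variable) that the example decomposition above is valid:

```python
# Sanity check (illustrative, not the authors' code) of both conditions
# of the tree decomposition definition on the running example.
chi = {1: {'A', 'B', 'C'}, 2: {'B', 'C', 'D', 'F'},
       3: {'B', 'E', 'F'}, 4: {'E', 'F', 'G'}}
psi = {1: ['A', 'B', 'C'], 2: ['D', 'F'], 3: ['E'], 4: ['G']}
edges = [(1, 2), (2, 3), (3, 4)]
scope = {'A': {'A'}, 'B': {'A', 'B'}, 'C': {'A', 'B', 'C'}, 'D': {'B', 'D'},
         'E': {'B', 'E', 'F'}, 'F': {'C', 'D', 'F'}, 'G': {'E', 'F', 'G'}}

# Condition 1: each CPT appears at exactly one vertex, whose chi covers its scope
assert sorted(p for ps in psi.values() for p in ps) == sorted(scope)
assert all(scope[p] <= chi[v] for v, ps in psi.items() for p in ps)

# Condition 2 (running intersection): the vertices mentioning each variable
# induce a connected subtree of T
def connected(vs):
    seen, frontier = {min(vs)}, [min(vs)]
    while frontier:
        u = frontier.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in vs and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen == set(vs)

assert all(connected([v for v in chi if x in chi[v]]) for x in scope)
print("valid tree decomposition")
```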

Cluster Tree Elimination

Cluster Tree Elimination (CTE) is an exact algorithm

It works by passing messages along a tree decomposition

Basic idea:
- Each node sends only one message to each of its neighbors
- Node u sends a message to its neighbor v only when u has received messages from all its other neighbors

Cluster Tree Elimination

Previous work on tree clustering:
- Lauritzen, Spiegelhalter '88 (probabilities)
- Jensen, Lauritzen, Olesen '90 (probabilities)
- Shenoy, Shafer '90; Shenoy '97 (general)
- Dechter, Pearl '89 (constraints)
- Gottlob, Leone, Scarcello '00 (constraints)

[Figure: vertex u with neighbors x_1, ..., x_n and v; u collects the messages h_(x_1,u), ..., h_(x_n,u) and sends h_(u,v) to v]

$cluster(u) = \psi(u) \cup \{h_{(x_1,u)}, h_{(x_2,u)}, \ldots, h_{(x_n,u)}, h_{(v,u)}\}$

Compute the message:

$h_{(u,v)} = \sum_{elim(u,v)} \; \prod_{f \in cluster(u) \setminus \{h_{(v,u)}\}} f$

Belief Propagation

[Figure: the belief network and its tree decomposition with clusters 1 = {A,B,C}, 2 = {B,C,D,F}, 3 = {B,E,F}, 4 = {E,F,G} and separators BC, BF, EF]

$h_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b)$

$h_{(2,1)}(b,c) = \sum_{d,f} p(d|b)\, p(f|c,d)\, h_{(3,2)}(b,f)$

$h_{(2,3)}(b,f) = \sum_{c,d} p(d|b)\, p(f|c,d)\, h_{(1,2)}(b,c)$

$h_{(3,2)}(b,f) = \sum_{e} p(e|b,f)\, h_{(4,3)}(e,f)$

$h_{(3,4)}(e,f) = \sum_{b} p(e|b,f)\, h_{(2,3)}(b,f)$

$h_{(4,3)}(e,f) = p(G = g_e \mid e,f)$

Cluster Tree Elimination - example
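To tie the message rule to the example, here is a short sketch (ours, with random stand-in CPTs rather than the paper's numbers) computing the first two messages with numpy:

```python
# Illustration of the CTE message rule on the example:
# h_(1,2), then h_(2,3), each a function over the arc's separator.
import numpy as np

rng = np.random.default_rng(1)

def cpt(n_parents):
    # random table P(child | parents); axis 0 is the child, normalized
    t = rng.random((2,) * (n_parents + 1))
    return t / t.sum(axis=0, keepdims=True)

p_a, p_ba, p_cab = cpt(0), cpt(1), cpt(2)   # p(a), p(b|a), p(c|a,b)
p_db, p_fcd = cpt(1), cpt(2)                # p(d|b), p(f|c,d)

# h_(1,2)(b,c) = sum_a p(a) p(b|a) p(c|a,b)
h12 = np.einsum('a,ba,cab->bc', p_a, p_ba, p_cab)

# h_(2,3)(b,f) = sum_{c,d} p(d|b) p(f|c,d) h_(1,2)(b,c); elim(2,3) = {C, D}
h23 = np.einsum('db,fcd,bc->bf', p_db, p_fcd, h12)
print(h23)   # a table over sep(2,3) = {B, F}
```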

Cluster Tree Elimination - the messages

Vertex 1 ({A,B,C}, with p(a), p(b|a), p(c|a,b)) sends over separator BC:
$h_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b)$

Vertex 2 ({B,C,D,F}, with p(d|b), p(f|c,d) and the incoming h_(1,2)(b,c)) sends over separator BF, where sep(2,3) = {B, F} and elim(2,3) = {C, D}:
$h_{(2,3)}(b,f) = \sum_{c,d} p(d|b)\, p(f|c,d)\, h_{(1,2)}(b,c)$

Vertex 3 ({B,E,F}) holds p(e|b,f) and the incoming h_(2,3)(b,f); vertex 4 ({E,F,G}) holds p(g|e,f), with separator EF between them.

Cluster Tree Elimination - properties

Correctness and completeness: Algorithm CTE is correct, i.e. it computes the exact joint probability of a single variable and the evidence.

Time complexity: $O(deg \cdot (n+N) \cdot d^{w^*+1})$

Space complexity: $O(N \cdot d^{sep})$

where:
- deg = the maximum degree of a node
- n = number of variables (= number of CPTs)
- N = number of nodes in the tree decomposition
- d = the maximum domain size of a variable
- w* = the induced width
- sep = the separator size

Mini-Clustering - motivation

Time and space complexity of Cluster Tree Elimination depend on the induced width w* of the problem

When the induced width w* is large, the CTE algorithm becomes infeasible

Mini-Clustering - the basic idea

Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables

Accuracy parameter i = maximum number of variables in a mini-cluster

The idea was explored for variable elimination (Mini-Bucket)

Suppose cluster(u) is partitioned into p mini-clusters: mc(1),…,mc(p), each containing at most i variables

CTE computes the exact message:

$h_{(u,v)} = \sum_{elim(u,v)} \; \prod_{k=1}^{p} \; \prod_{f \in mc(k)} f$

We want to process each $\prod_{f \in mc(k)} f$ separately

Mini-Clustering

Approximate each $\prod_{f \in mc(k)} f$, for $k = 2, \ldots, p$, and take it outside the summation

How to process the mini-clusters to obtain approximations or bounds:

Process all mini-clusters by summation - this gives an upper bound on the joint probability

A tighter upper bound: process one mini-cluster by summation and the others by maximization

Can also use mean operator (average) - this gives an approximation of the joint probability

Splitting a cluster into mini-clusters bounds the complexity:

$h^X = \sum_X \prod_{i=1}^{n} h_i \;\le\; g^X = \Big(\sum_X \prod_{i=1}^{r} h_i\Big) \cdot \Big(\max_X \prod_{i=r+1}^{n} h_i\Big)$

Exponential complexity decrease: $O(e^n) \rightarrow O(e^r) + O(e^{n-r})$
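A quick numeric check of this bound (our illustration: four random nonnegative functions of a single variable x, split at r = 2):

```python
# Numeric illustration of the splitting bound
#   sum_x prod_{i=1..n} h_i(x) <= (sum_x prod_{i<=r} h_i(x)) * (max_x prod_{i>r} h_i(x))
import numpy as np

rng = np.random.default_rng(3)
h = rng.random((4, 10))   # n = 4 nonnegative functions of x, |D(x)| = 10
r = 2                     # split point: first r processed by sum, rest by max

exact = h.prod(axis=0).sum()                                  # h^X
upper = h[:r].prod(axis=0).sum() * h[r:].prod(axis=0).max()   # g^X
assert exact <= upper
print(exact, upper)
```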

Idea of Mini-Clustering

[Figure: the tree decomposition (clusters 1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC, BF, EF), each arc now carrying a set H of mini-cluster messages]

$H_{(1,2)} = \{\, h^1_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b) \,\}$

$H_{(2,1)} = \{\, h^1_{(2,1)}(b) = \sum_{d,f} p(d|b)\, h^1_{(3,2)}(b,f), \;\; h^2_{(2,1)}(c) = \max_{d,f} p(f|c,d) \,\}$

$H_{(2,3)} = \{\, h^1_{(2,3)}(b) = \sum_{c,d} p(d|b)\, h^1_{(1,2)}(b,c), \;\; h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d) \,\}$

$H_{(3,2)} = \{\, h^1_{(3,2)}(b,f) = \sum_e p(e|b,f)\, h^1_{(4,3)}(e,f) \,\}$

$H_{(3,4)} = \{\, h^1_{(3,4)}(e,f) = \sum_b p(e|b,f)\, h^1_{(2,3)}(b)\, h^2_{(2,3)}(f) \,\}$

$H_{(4,3)} = \{\, h^1_{(4,3)}(e,f) = p(G = g_e \mid e,f) \,\}$

Mini-Clustering - example

Mini-Clustering - the messages, i=3

Vertex 1 ({A,B,C}, with p(a), p(b|a), p(c|a,b)) sends over separator BC:
$h^1_{(1,2)}(b,c) = \sum_a p(a)\, p(b|a)\, p(c|a,b)$

Vertex 2 is partitioned into the mini-clusters {B,C,D} = {p(d|b), h_(1,2)(b,c)} and {C,D,F} = {p(f|c,d)}; with sep(2,3) = {B, F} and elim(2,3) = {C, D} it sends over separator BF:
$h^1_{(2,3)}(b) = \sum_{c,d} p(d|b)\, h^1_{(1,2)}(b,c)$
$h^2_{(2,3)}(f) = \max_{c,d} p(f|c,d)$

Vertex 3 ({B,E,F}) holds p(e|b,f) and the incoming h^1_(2,3)(b), h^2_(2,3)(f); vertex 4 ({E,F,G}) holds p(g|e,f), with separator EF between them.
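The same partition can be checked numerically; this sketch (ours, with random stand-in tables) verifies that the product of the two mini-cluster messages upper-bounds CTE's exact message h_(2,3)(b,f):

```python
# Check that the i=3 mini-cluster messages from vertex 2
# upper-bound CTE's exact message h_(2,3)(b,f).
import numpy as np

rng = np.random.default_rng(2)
p_db = rng.random((2, 2))      # stands in for p(d|b), indexed [d, b]
p_fcd = rng.random((2, 2, 2))  # stands in for p(f|c,d), indexed [f, c, d]
h12 = rng.random((2, 2))       # stands in for h_(1,2)(b,c), indexed [b, c]

exact = np.einsum('db,fcd,bc->bf', p_db, p_fcd, h12)  # sum over c, d

h1 = np.einsum('db,bc->b', p_db, h12)  # h1_(2,3)(b): summation mini-cluster
h2 = p_fcd.max(axis=(1, 2))            # h2_(2,3)(f): maximization mini-cluster
upper = np.outer(h1, h2)               # h1(b) * h2(f), indexed [b, f]

assert np.all(upper >= exact - 1e-12)  # the upper bound holds pointwise
```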

Cluster Tree Elimination vs. Mini-Clustering

Both algorithms pass messages over the same tree decomposition (1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC, BF, EF). CTE sends one exact message per arc; MC sends a set of smaller mini-cluster messages:

Arc     CTE message       MC message set
1 -> 2  h_(1,2)(b,c)      H_(1,2) = { h^1_(1,2)(b,c) }
2 -> 1  h_(2,1)(b,c)      H_(2,1) = { h^1_(2,1)(b), h^2_(2,1)(c) }
2 -> 3  h_(2,3)(b,f)      H_(2,3) = { h^1_(2,3)(b), h^2_(2,3)(f) }
3 -> 2  h_(3,2)(b,f)      H_(3,2) = { h^1_(3,2)(b,f) }
3 -> 4  h_(3,4)(e,f)      H_(3,4) = { h^1_(3,4)(e,f) }
4 -> 3  h_(4,3)(e,f)      H_(4,3) = { h^1_(4,3)(e,f) }

Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) on the joint probability P(Xi,e) of each variable and each of its values.

Time & space complexity: $O(n \cdot hw^* \cdot d^i)$, where $hw^* = \max_u |\{ f \mid scope(f) \cap \chi(u) \neq \emptyset \}|$

Normalization

Algorithms for the belief updating problem compute, in general, the joint probability $P(X_i, e)$, where $X_i$ is a query node and $e$ is the evidence.

Computing the conditional probability $P(X_i \mid e)$:
- is easy to do if exact algorithms can be applied
- becomes an important issue for approximate algorithms

MC can compute an (upper) bound on the joint $P(X_i, e)$

Deriving a bound on the conditional $P(X_i \mid e)$ is not easy when the exact $P(e)$ is not available

If a lower bound on $P(e)$ were available, we could use $P(X_i, e) / P(e)$ as an upper bound on the posterior

In our experiments we normalized the results and regarded them as approximations of the posterior $P(X_i \mid e)$
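In code, this normalization step is one line (made-up values):

```python
# Sketch of the normalization used in the experiments: divide the
# approximate joints P(X_i = x, e) by their sum over x to get an
# approximation of the posterior P(X_i | e).
import numpy as np

approx_joint = np.array([0.012, 0.003])        # MC outputs for X_i = 0, 1
posterior = approx_joint / approx_joint.sum()  # approximates P(X_i | e)
print(posterior)                               # [0.8 0.2]
```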

Experimental results

Algorithms:
- Exact
- IBP
- Gibbs sampling (GS)
- MC with normalization (approximate); we tested MC with max and mean operators

Networks (all variables are binary):
- Coding networks
- CPCS 54, 360, 422
- Grid networks (MxM)
- Random noisy-OR networks
- Random networks

Measures (see the sketch below):
- Normalized Hamming Distance (NHD)
- BER (Bit Error Rate)
- Absolute error
- Relative error
- Time
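For concreteness, here is a hypothetical computation of two of these measures, reading NHD as the fraction of variables whose most probable value disagrees with the exact one (our reading, not a definition quoted from the paper):

```python
# Hypothetical helpers (not from the paper) for two of the measures,
# given exact and approximate marginals of several binary variables.
import numpy as np

exact = np.array([[0.9, 0.1], [0.4, 0.6]])     # rows: exact P(X_i | e)
approx = np.array([[0.8, 0.2], [0.55, 0.45]])  # rows: approximate marginals

abs_err = np.abs(exact - approx).mean()        # average absolute error
# NHD: fraction of variables whose most probable value disagrees
nhd = (exact.argmax(axis=1) != approx.argmax(axis=1)).mean()
print(abs_err, nhd)                            # 0.125 0.5
```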

Random networks - Absolute error

[Plots: absolute error vs. i-bound (0-10) for MC, Gibbs Sampling and IBP. Left: Random networks, N=50, P=2, k=2, evid=0, w*=10, 50 instances. Right: same networks with evid=10.]

Coding networks - Bit Error Rate

[Plots: Bit Error Rate vs. i-bound (0-12) for MC and IBP. Left: Coding networks, N=100, P=4, sigma=0.22, w*=12, 50 instances. Right: same with sigma=0.51.]

Noisy-OR networks - Absolute error

[Plots: absolute error (log scale) vs. i-bound (0-16) for MC, IBP and Gibbs Sampling. Left: Noisy-OR networks, N=50, P=3, evid=10, w*=16, 25 instances. Right: same with evid=20.]

CPCS422 - Absolute error

[Plots: absolute error vs. i-bound (2-18) for MC and IBP. Left: CPCS 422, evid=0, w*=23, 1 instance. Right: same with evid=10.]

Grid 15x15 - 0 evidence

[Plots: NHD, absolute error, relative error and time (seconds) vs. i-bound (0-18) for MC and IBP; Grid 15x15, evid=0, w*=22, 10 instances.]

Grid 15x15 - 10 evidence

[Plots: NHD, absolute error, relative error and time (seconds) vs. i-bound (0-18) for MC and IBP; Grid 15x15, evid=10, w*=22, 10 instances.]

Grid 15x15 - 20 evidence

[Plots: NHD, absolute error and relative error (log scale) and time (seconds) vs. i-bound (0-18) for MC, IBP and Gibbs Sampling; Grid 15x15, evid=20, w*=22, 10 instances.]

Conclusion

MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating

Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms
